ECFS is in "Protocols Write Disable Mode" but system is not full

ECFS Version 3.2.1.x has known issue that can cause the file system to move to “Protocols Write Disable Mode” before the system has reached 100% (95% for single replica).

In version 3.2.1.51, in order to mitigate the issue, before ECFS goes into “Protocols Write Disable Mode” due to PStoreDirectBucketOverflowed (or mitigate the issue faster for systems already in PStoreDirectBucketOverflowed condition) , Elastifile introduced 2 new events (both events alert the Google Elastifile support) -

PStoreDirectBucketOverflowed - System has an over utilized bucket (ECFS internal metadata object) and Google support needs to manual workaround the issue.

Until the issue is resolved the system will stay in “Protocols Write Disable Mode” (reads and deletes are ok).

PStore direct bucket approach overflow - System status is OK but it's approaching “Protocols Write Disable Mode” due to PStoreDirectBucketOverflowed.

When this event is fired, system status is still OK, but Google Google support needs to implement a manual workaround the issue.

Both alerts will fire repeatedly until the issue is resolved.

If your ECFS deployment has any of the above events, contact Google Elastifile support at https://support.elastifile.com and open an P1 ticket with "PStore Direct Bucket Overflow - requesting immediate escalation" as the subject

PStore monitoring script

Elastifile created this monitoring script to help storage admins identify the current status of the Elastifile Cloud File System, and to speed up the mitigation in case system is already in one of the states described above.

The script can be used to check current status, in case any of the events described above, or if requested by Google Elastifile support.

Script will not run if the Elastifile Cloud File system rebuild is in progress.

How-to use the script -

ssh into the EMS instance.

cd /tmp

wget https://storage.googleapis.com/elastifile-software-repo/ECFS/pstore_monitoring/bucket_overflow_monitoring.tar.gz

tar zxvf bucket_overflow_monitoring.tar.gz

python bucket_overflow_monitoring.py > /tmp/output.txt

The tar contains 2 files -

bucket_overflow_monitoring.py
README

Run bucket_overflow_monitoring.py script according to the README file instructions.

The script execution might take some time.

Upom completion run the following command and share the output:

grep "WARN_#" < SCRIPT_OUTPUT_FILE> - is not empty, please contact Google Elastifile support at https://support.elastifile.com and open a ticket with "PStore Direct Bucket Overflow - requesting immediate escalation" as the subject

Was this helpful?

How can we improve it?