Elastifile Storage Nodes Faulty Drive

Introduction

A node in your Elastifile cluster may report a faulty drive even though the node's status is still Active.

To resolve this issue, follow the steps below.

1. SSH into EMS.

2. Get the IP address and ID of each node with a faulty drive by running the command below.

elfs-cli enode list

Make a note of the node IDs and their availability zones.

3. Add an equal number of nodes using ECFS. For example, if you have 2 nodes with faulty drives, add 2 new nodes to the system.

Elastifile GUI -> System View -> Edit Capacity -> + button

** If you have a multi-zone deployment, you should have at least 1 node in every zone.

Now check whether the newly created nodes were deployed in the same zones as the faulty ones.

If the newly created nodes were created in a different zone than the faulty ones, add new nodes again following step 3. You can remove the extra nodes later.

4. Wait for ownership recovery and data rebuild to finish.
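Before removing anything, it can help to confirm that every faulty node has a replacement in the same zone, as step 3 requires. The following is a minimal sketch only: `missing_zones` is a hypothetical helper, not part of elfs-cli, and the zone names are made-up placeholders. It assumes you have written down the zone of each faulty node and of each new node by hand from the output of step 2 and from the GUI.

```shell
#!/bin/sh
# Hypothetical helper (not an Elastifile command): prints each zone that has
# fewer new nodes than faulty nodes, followed by how many are still missing.
# $1: space-separated zones of the faulty nodes (one entry per node)
# $2: space-separated zones of the newly added nodes (one entry per node)
missing_zones() {
    faulty_zones=$1
    new_zones=$2
    # Walk over the distinct zones that contain at least one faulty node.
    for zone in $(printf '%s\n' $faulty_zones | sort -u); do
        # Count exact, whole-line matches of the zone name in each list.
        needed=$(printf '%s\n' $faulty_zones | grep -c -x -F -- "$zone" || true)
        have=$(printf '%s\n' $new_zones | grep -c -x -F -- "$zone" || true)
        if [ "$have" -lt "$needed" ]; then
            echo "$zone $((needed - have))"
        fi
    done
}
```

For example, `missing_zones "zone-a zone-a zone-b" "zone-a zone-c"` prints `zone-a 1` and `zone-b 1`, meaning one more node is needed in each of those zones. The node in `zone-c` is an extra that can be removed later. Empty output means every faulty node is covered.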

Remove nodes

Use the commands below to remove the old nodes from the system. Use the same IDs that you identified in step 2.

** If you created redundant nodes while adding new nodes, remove each one by its node ID, which you can look up from its IP address.

elfs-cli enode delete --id <node_id> --async true

elfs-cli enode delete_multi --ids <node_ids> --async true

** Make a note of the task ID so that you can check its status later.

Node status will change from Active to Pending removal.

Post Checks

1. Check the status of the node removal task by running the command below.

elfs-cli control_task show --id <control_task_id>

2. To check whether the nodes have been removed successfully, run the commands below and confirm that the node count matches between ELFS and ECS.

elfs-cli enode list -t

ecs-cli nodes

** Nodes in Pending removal status can take a few hours to be fully removed from the system.
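Because removal can take hours, you may prefer to poll instead of re-running the post checks by hand. The sketch below is a generic retry loop, not an Elastifile feature: `poll_until` is a hypothetical helper name, and the check command you pass to it is your own. This article does not document the output format of `elfs-cli control_task show`, so the sketch deliberately makes no assumption about it.

```shell
#!/bin/sh
# Hypothetical helper (not an Elastifile command): re-runs a check command
# until it exits with status 0 or the attempt limit is reached.
# $1: maximum number of attempts
# $2: seconds to sleep between attempts
# remaining arguments: the check command and its arguments
poll_until() {
    max_attempts=$1
    interval=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Skip the sleep after the final failed attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$interval"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

A call such as `poll_until 36 300 my_check` tries every 5 minutes for roughly 3 hours. Here `my_check` stands for a shell function you write yourself that exits with status 0 once the removal task has finished or the node counts match, based on what the commands above print on your system.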

Please contact Google Elastifile support at https://support.elastifile.com if you need any further help.
