Overview
Elastifile Cloud File System (ECFS) includes a built-in snapshot scheduling and cooling mechanism that moves consistent snapshots from primary storage to GCS in an ECFS proprietary format.
The proprietary format preserves the file structure as well as all file attributes. Data is compressed while in transit, and the cooling mechanism is deduplication-aware, so only unique data is sent over the wire to GCS.
As with other backup solutions, the first backup is a full backup and the following backups are incremental.
Solution Capabilities and Limitations
- The backup solution is for data backup only; it does not back up the file system or the cluster. Recovering from a backup requires remounting and validating the file system.
- It is recommended to use one snapshot per day per DC to avoid performance degradation as a result of the scanning process and snapshot deletion.
- No more than 30 namespaces (Data Containers) per ECFS cluster should be enabled with the snapshot cooling scheduler.
- The above-mentioned limitations are not enforced by the system, so customers are advised to keep them in mind.
- Up to 100 snapshots per data container can be stored in GCS.
- DR-restoring a full snapshot from GCS is a manual, external procedure that must be performed together with [email protected]. For more information, check this KB article.
Scope
Elastifile provides the ability to define snapshot schedulers at the data container level as a backup mechanism.
When a customer defines a scheduler on multiple DCs, it can lead to a storm of simultaneous snapshot deletions, which is a heavy operation that can impact system performance.
This article describes the procedure for implementing snapshot schedulers through the CLI, which provides additional capabilities that are not available in the GUI.
The most important of these is the ability to define a start time, which helps overcome the challenge described above.
Using the start-time flag, offsets can be defined in order to avoid snapshot deletion contention.
Pre-requisites
1. Make sure "Private Google Access" is enabled on the VPC subnetwork
Read the following article on How To Check and Change the Private Google Access setting
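If you prefer to verify this from the command line, the setting can also be checked and enabled with gcloud. A minimal sketch, where SUBNET_NAME and REGION are placeholders for your subnetwork and region:
# Check whether Private Google Access is enabled on the subnetwork (prints True or False)
gcloud compute networks subnets describe SUBNET_NAME --region=REGION --format="get(privateIpGoogleAccess)"
# Enable it if it is not
gcloud compute networks subnets update SUBNET_NAME --region=REGION --enable-private-ip-google-access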
2. Object Tier is Available
If the object tier feature is not available, please contact [email protected].
Elastifile will evaluate the request and, if approved, a suitable license will be provided.
3. Data ILM is disabled
Data ILM is a feature to move live cold data to GCS buckets, not just snapshots.
This feature is currently not supported and must be disabled.
To disable the feature on all existing DCs at once, run:
for i in $(elfs-cli data_container list | awk '{print $1}' | grep -o '^[0-9]*$'); do elfs-cli data_container update --id $i --data_ilm data_ilm_disabled --automatic_data_ilm auto_data_ilm_disabled; done
For any new data container, please run the following as well:
elfs-cli data_container update --id <NEW_DC_ID> --data_ilm data_ilm_disabled --automatic_data_ilm auto_data_ilm_disabled
4. Activate Object Tier
This must be configured after step 2 has been completed successfully.
5. Set the data tiering policy to 100% primary
Configuration Guide
1. List the data containers in the system and their IDs
[root@schedule ~(elfs_admin)]# elfs-cli data_container list | awk '{print$1, $2}'
id name
------------
1 dc01
2 dc02
3 dc03
4 dc04
2. Create the relevant schedule policies per DC
[root@schedule ~(elfs_admin)]# elfs-cli schedule create --name dc01 --data-container-ids 1
id: 1
name: dc01
state: enabled
data_containers:
id: 1 name: dc01
[root@schedule ~(elfs_admin)]# elfs-cli schedule create --name dc02 --data-container-ids 2
id: 2
name: dc02
state: enabled
data_containers:
id: 2 name: dc02
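When there are many DCs, the per-DC schedules can also be created in a loop. A minimal sketch based on the commands above, assuming you want one schedule per DC named after the DC itself:
# Create one schedule per data container (the grep skips the header and separator lines)
elfs-cli data_container list | awk '{print $1, $2}' | grep '^[0-9]' | while read id name; do
    elfs-cli schedule create --name "$name" --data-container-ids "$id"
done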
3. Define the task policy for each of the schedulers (use a 1-hour gap between schedules)
- The id represents the schedule id created in step 2
- The repeat-after and delete-after values are in minutes
- The start time is defined in UTC
[root@schedule ~(elfs_admin)]# elfs-cli schedule create_task --id 1 --name dc01 --type SnapshotIlmTask --repeat-after 1440 --cool_after 10080 --delete-after 14400 --start-time 2020-04-24T20:00:00
id: 1
name: dc01
schedule_id: 1
type: SnapshotIlmTask
start_time: 2020-04-24T20:00:00.000Z
repeat_after: 1440
cool_after: 10080
delete_after: 14400
created_at: Apr 11, 15:37:58
updated_at: Apr 11, 15:37:58
[root@schedule ~(elfs_admin)]# elfs-cli schedule create_task --id 2 --name dc02 --type SnapshotIlmTask --repeat-after 10080 --cool_after 14400 --delete-after 43200 --start-time 2020-04-27T23:00:00
id: 2
name: dc02
schedule_id: 2
type: SnapshotIlmTask
start_time: 2020-04-24T23:00:00.000Z
repeat_after: 10080
cool_after: 14400
delete_after: 43200
created_at: Apr 11, 15:44:06
updated_at: Apr 11, 15:44:06
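When several schedules need tasks, the same command can be scripted with staggered start times. A minimal sketch, assuming schedule ids 1-4 (named dc01-dc04 as created in step 2), the daily retention values from the first example, and a 1-hour offset between start times:
# Create a daily SnapshotIlmTask per schedule, offsetting start times by 1 hour
# to avoid snapshot deletion contention (schedule ids 1-4 and names dc01-dc04 are assumed)
hour=20
for id in 1 2 3 4; do
    elfs-cli schedule create_task --id $id --name dc0$id --type SnapshotIlmTask \
        --repeat-after 1440 --cool_after 10080 --delete-after 14400 \
        --start-time 2020-04-24T${hour}:00:00
    hour=$((hour + 1))
done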
Manage ECFS Snapshots
DC01 has 4 snapshots: 2 are placed locally (in the SSD tier) and 2 are placed in the object tier.
The ECFS snapshots are reachable by accessing hidden directories, depending on where they are located:
[root@client ~]# mkdir /mnt/DC01
[root@client ~]# mount 10.229.255.1:DC01/root /mnt/DC01
[root@client ~]#
[root@client ~]# df -h /mnt/DC01
Filesystem Size Used Avail Use% Mounted on
10.229.255.1:DC01/root 1000G 0 1000G 0% /mnt/DC01
[root@client ~]#
[root@client ~]# cd /mnt/DC01
[root@client DC01]# ll .snapshot
total 0
drwxr-xr-x. 2 root root 0 Apr 26 11:00 local01
drwxr-xr-x. 2 root root 0 Apr 26 11:00 local02
[root@client DC01]#
[root@client DC01]# ll .object
total 0
drwxr-xr-x. 2 root root 0 Apr 26 11:00 object01
drwxr-xr-x. 2 root root 0 Apr 26 11:00 object02
How to Restore a File/Dir from a Snapshot
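Since snapshots are exposed as read-only hidden directories, a single file or directory can typically be restored by copying it back from the relevant snapshot path. A minimal sketch, assuming the DC01 mount shown above; dir01/file01 is a hypothetical path, adjust it to the object you want to restore:
# Copy a file back from a local-tier snapshot (dir01/file01 is a hypothetical path)
cp -a /mnt/DC01/.snapshot/local01/dir01/file01 /mnt/DC01/dir01/file01
# For a snapshot that has already been cooled to GCS, copy from the .object directory instead
cp -a /mnt/DC01/.object/object01/dir01/file01 /mnt/DC01/dir01/file01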
For the full manual DR-recovery procedure, please contact [email protected].
How to Prevent a Snapshot Deletion
There may be cases where you want to prevent existing snapshots from being deleted by the scheduler, e.g. when an important snapshot of a specific point in time must be kept, or during a performance issue caused by a massive number of snapshots being deleted at the same time.
1. List the snapshots in the system:
elfs-cli snapshot list
2. Choose the relevant snapshot and update it by its id:
elfs-cli snapshot update --id 2 --no-deletion_schedule_mins