Basic guidelines for getting the maximum performance from an Elastifile cluster.
To run the erun performance tool, install the following RPM on a CentOS 7 client:
elfs-tools-2.7.1.2-53085.fc219ee4f9c3.el7.centos.x86_64.rpm
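A minimal install sketch, assuming the RPM above has already been copied to the client's current directory (and that the package places the erun binary on the PATH):

    # Install the elfs-tools package that provides erun (CentOS 7)
    sudo yum install -y ./elfs-tools-2.7.1.2-53085.fc219ee4f9c3.el7.centos.x86_64.rpm
    # Verify the tool is available (assumption: the package installs erun onto the PATH)
    which erun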
For erun tests
- Use the same number of clients/loaders (client machines) as the number of ECFS nodes; for example, 3 nodes should be tested with 3 erun clients/loaders.
- In the erun command:
  - clients (not the number of client/loader machines) = number of cores / 2
  - files (--nr-files) = number of cores
  - Queue size should be tuned to suit the latency demands:
    - Latency too high --> decrease the queue size
    - Latency too low --> increase the queue size
- When starting a new erun test, use the --initial-write-phase flag to create new data. This first builds the working set (performing writes only) and only then starts the requested workload.
- Once the data is available and you need to rerun a test on the same data with different options (such as a different queue size or read/write ratio), use --reuse-existing-files instead; see the rerun sketch after the examples below.
- erun example, OLTP (IOPS) workload for 4 cores per node, 4K block size, 70/30 read/write:
- erun --profile io --data-payload --max-file-size 100M --clients 2 --nr-files 4 --queue-size 8 --readwrites 70 --min-io-size 4K --max-io-size 4K --duration 12000 --erun-dir `hostname` 10.99.0.2:dc/root --initial-write-phase
- erun example, bandwidth (BW) test for 4 cores per node, 64K block size, 70/30 read/write:
- erun --profile io --data-payload --max-file-size 100M --clients 2 --nr-files 4 --queue-size 4 --readwrites 70 --min-io-size 64K --max-io-size 64K --duration 12000 --erun-dir `hostname` 10.99.0.2:dc/root --io-alignment 32768 --initial-write-phase
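To rerun the OLTP example above against the data it has already written, swap --initial-write-phase for --reuse-existing-files and change only the option under test. The sketch below lowers the queue size from 8 to 4 purely as an illustration of the latency-tuning note above; all other values are taken unchanged from the first example:

    erun --profile io --data-payload --max-file-size 100M --clients 2 --nr-files 4 --queue-size 4 --readwrites 70 --min-io-size 4K --max-io-size 4K --duration 12000 --erun-dir `hostname` 10.99.0.2:dc/root --reuse-existing-files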
For any other testing tool on a Linux machine, the rule of thumb is:
- Clients = half the number of cluster cores; i.e., 3 nodes with 4 cores each (12 cluster cores) should be tested with 6 clients.
- Total number of files = the number of cluster cores; i.e., 3 nodes with 4 cores each should be tested with 12 files.
- To reach the maximum number of IOPS at low latency (~2 ms), use 4K or 8K block sizes.
- To reach the maximum bandwidth, where latency is less critical (~10-20 ms is acceptable), use 32K, 64K, or 256K block sizes.
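As a sketch only, the same rule of thumb applied with fio (fio is not part of the Elastifile tooling above; the mount point /mnt/elastifile, the job name, and the 3-node x 4-core sizing are assumptions) for a 4K 70/30 IOPS test:

    # Assumption: the ECFS export is NFS-mounted at /mnt/elastifile; cluster = 3 nodes x 4 cores = 12 cores.
    # Rule of thumb above: clients (jobs) = 12 / 2 = 6; total files = 12, i.e. 2 files per job.
    fio --name=ecfs-4k-7030 \
        --directory=/mnt/elastifile/fio-test \
        --ioengine=libaio --direct=1 \
        --rw=randrw --rwmixread=70 \
        --bs=4k --size=100M \
        --numjobs=6 --nrfiles=2 \
        --iodepth=8 \
        --time_based --runtime=300 \
        --group_reporting

As with erun's queue size, --iodepth can be raised or lowered against the latency target (~2 ms for IOPS tests).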
The current Elastifile configurations in GCP:
ECFS Configuration Type | Cluster Info
---|---
SSD Persistent Disks - SMALL | 4 Cores, 32GB RAM, 4 x 175GB PD SSD
SSD Persistent Disks - MEDIUM | 4 Cores, 42GB RAM, 4 x 1TB PD SSD
SSD Persistent Disks - LARGE | 16 Cores, 96GB RAM, 4 x 5TB PD SSD
Local SSD | 16 Cores, 96GB RAM, 8 x 375GB Local SSD
Standard Persistent Disks (latency not under 2 ms, due to standard drives) | 4 Cores, 64GB RAM, 4 x 1TB Standard PD
MAX Configuration - Local SSD | 
Some examples of the expected performance results from different GCP configurations:
Maximum sustained IOPS (under 2 ms)

ECFS Configuration Type | Read IOPS (Per System) | Read IOPS (Per Node) | Write IOPS (Per System) | Write IOPS (Per Node) | 70/30 Mixed IOPS (Per System) | 70/30 Mixed IOPS (Per Node)
---|---|---|---|---|---|---
SSD Persistent Disks - SMALL - 3 nodes | 40,000 | 13,000 | 10,000 | 3,300 | 20,000 | 6,600
SSD Persistent Disks - MEDIUM - 3 nodes | 40,000 | 13,000 | 10,000 | 3,300 | 20,000 | 6,600
SSD Persistent Disks - MEDIUM - 3 nodes - Single Replication | 42,000 | 14,000 | 24,300 | 8,100 | 30,000 | 10,000
SSD Persistent Disks - LARGE - 3 nodes | 74,000 | 24,000 | 19,000 | 6,300 | 45,000 | 15,000
SSD Persistent Disks - LARGE - 3 nodes - Single Replication | 74,000 | 24,000 | 52,000 | 17,300 | 64,000 | 21,300
Local SSD - 3 nodes | 178,000 | 59,000 | 51,000 | 17,000 | 105,000 | 35,000
Standard Persistent Disks - 6 nodes (not under 2 ms, due to standard drives) | 18,000 | 3,000 | 11,500 | 1,900 | 14,000 | 2,300
MAX Configuration - Local SSD | | | | | | 
Maximum sustained throughput (MB/s)

ECFS Configuration Type | Cluster Info | Read Throughput Per System (MB/s) | Read Throughput Per Node (MB/s) | Write Throughput Per System (MB/s) | Write Throughput Per Node (MB/s)
---|---|---|---|---|---
SSD Persistent Disks - SMALL - 3 nodes | 4 Cores, 32GB RAM, 4 x 175GB PD SSD | 700 | 233 | 200 | 66
SSD Persistent Disks - MEDIUM - 3 nodes | 4 Cores, 56GB RAM, 4 x 1TB PD SSD | 700 | 233 | 200 | 66
SSD Persistent Disks - MEDIUM - 3 nodes - Single Replication | 4 Cores, 56GB RAM, 4 x 1TB PD SSD | 1,100 | 366 | 395 | 131
SSD Persistent Disks - LARGE - 3 nodes | 16 Cores, 96GB RAM, 4 x 5TB PD SSD | 1,700 | 566 | 330 | 110
SSD Persistent Disks - LARGE - 3 nodes - Single Replication | 16 Cores, 96GB RAM, 4 x 5TB PD SSD | 2,000 | 666 | 910 | 303
Local SSD - 3 nodes | 16 Cores, 96GB RAM, 8 x 375GB Local SSD | 3,500 | 1,167 | 1,100 | 367
Standard Persistent Disks - 6 nodes - Default | 4 Cores, 64GB RAM, 4 x 1TB Standard PD | 470 | 80 | 218 | 36
Standard Persistent Disks - 3 nodes | 4 Cores, 64GB RAM, 4 x 1TB Standard PD | 240 | 80 | 112 | 37
Standard Persistent Disks - 6 nodes | 4 Cores, 64GB RAM, 4 x 3TB Standard PD | 500 | 83 | 280 | 45
Standard Persistent Disks - 3 nodes | 4 Cores, 64GB RAM, 4 x 3TB Standard PD | 242 | 80 | 150 | 50
Standard Persistent Disks - 3 nodes | 4 Cores, 32GB RAM, 4 x 175GB Standard PD | 75 | 25 | 50 | 17
MAX Configuration - Local SSD | | | | | 
Single Client comparison tests:
CentOS 7 with NFS (erun). All tests in this table use a single client machine, so per-client values equal the cluster values.

Test (100MB files) | ECFS Configuration Type | Cluster Info | Read IOPS (Cluster) | Read Latency (ms) | Read IOPS (Per Node) | Write IOPS (Cluster) | Write Latency (ms) | Write IOPS (Per Node) | 70/30 IOPS (Cluster) | 70/30 Latency (ms) | 70/30 IOPS (Per Node) | Read MB/s (Cluster) | Read MB/s (Per Node) | Write MB/s (Cluster) | Write MB/s (Per Node)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 Client, 1 Connection, 4 files | SSD Persistent Disks - MEDIUM - 3 nodes | 4 Cores, 56GB RAM, 4 x 1TB PD SSD | | | | | | | | | | | | | 
1 Client, 6 Connections, 4 files | SSD Persistent Disks - MEDIUM - 3 nodes | 4 Cores, 56GB RAM, 4 x 1TB PD SSD | | | | | | | | | | | | | 
1 Client, 1 Connection, 1 file | SSD Persistent Disks - MEDIUM - 3 nodes | 4 Cores, 56GB RAM, 4 x 1TB PD SSD | | | | | | | | | | | | | 
1 Client, 6 Connections, 1 file | SSD Persistent Disks - MEDIUM - 3 nodes | 4 Cores, 56GB RAM, 4 x 1TB PD SSD | | | | | | | | | | | | | 
1 Client, 1 Connection, 20 files | Local SSD - 3 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 23,500 | 1.7 | 7,833 | 9,300 | 1.9 | 3,100 | 12,300 | 1.7 | 4,100 | 1,000 | 333 | 665 | 222
1 Client, 1 Connection, 1 file | Local SSD - 3 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 14,600 | 1.7 | 4,867 | 5,700 | 1.9 | 1,900 | 13,500 | 1.9 | 4,500 | 1,300 | 433 | 240 | 80
1 Client, 30 Connections, 20 files | Local SSD - 3 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 180,000 | 2.6 | 60,000 | 60,000 | 2.7 | 20,000 | 113,000 | 2.7 | 37,667 | 1,600 | 533 | 770 | 257
1 Client, 30 Connections, 1 file | Local SSD - 3 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 194,000 | 2.3 | 64,667 | 62,000 | 2.6 | 20,667 | 117,000 | 2.6 | 39,000 | 1,600 | 533 | 810 | 270

Windows 2016R2 with NFS services (latency taken from the GUI)

Test (100MB files) | ECFS Configuration Type | Cluster Info | Read IOPS (Cluster) | Read Latency (ms) | Read IOPS (Per Node) | Read IOPS (Per Client) | Write IOPS (Cluster) | Write Latency (ms) | Write IOPS (Per Node) | Write IOPS (Per Client) | 70/30 IOPS (Cluster) | 70/30 Latency (ms) | 70/30 IOPS (Per Node) | 70/30 IOPS (Per Client) | Read MB/s (Cluster) | Read MB/s (Per Node) | Read MB/s (Per Client) | Write MB/s (Cluster) | Write MB/s (Per Node) | Write MB/s (Per Client)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 Client, 1 Node, 20 files | Local SSD - 4 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 24,000 | 2.1 | 6,000 | 24,000 | 16,500 | 2.8 | 4,125 | 16,500 | 20,000 | 2.2 | 5,000 | 20,000 | 875 | 218.75 | 875 | 320 | 80 | 320
1 Client, 1 Node, 1 file | Local SSD - 4 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 17,000 | 1.8 | 4,250 | 17,000 | 4,500 | 1.9 | 1,125 | 4,500 | 12,000 | 1.9 | 3,000 | 12,000 | 950 | 237.5 | 950 | 165 | 41.25 | 165
10 Clients, 1 Node, 20 files each | Local SSD - 4 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 95,000 | 3 | 23,750 | 9,500 | 47,000 | 2.9 | 11,750 | 4,700 | 70,000 | 2.8 | 17,500 | 7,000 | 1,800 | 450 | 180 | 680 | 170 | 68
10 Clients, 1 Node, 1 file each | Local SSD - 4 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 91,000 | 1.9 | 22,750 | 9,100 | 35,000 | 2.3 | 8,750 | 3,500 | 60,000 | 1.9 | 15,000 | 6,000 | 1,800 | 450 | 180 | 670 | 167.5 | 67
10 Clients per Node, 4 Nodes (40 clients total), 1 file each | Local SSD - 4 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | | | | 
10 Clients per Node, 4 Nodes (40 clients total), 20 files each | Local SSD - 4 nodes | 20 Cores, 128GB RAM, 8 x 375GB Local SSD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | | | | 