TACC System information:
- NO NVMe
- NO Persistent Memory
- Servers will use only SCM, emulated with tmpfs (only ~90 GB available)
Available nodes by queue:

| Queue Name | Node Type | Max Nodes per Job (associated cores) | Max Duration | Max Jobs in Queue | Charge Rate (per node-hour) |
|---|---|---|---|---|---|
| skx-dev | SKX | 4 nodes (192 cores) | 2 hrs | 1 | 1 SU |
| skx-normal | SKX | 128 nodes (6,144 cores) | 48 hrs | 25 | 1 SU |
| skx-large | SKX | 868 nodes (41,664 cores) | 48 hrs | 3 | 1 SU |

Stampede2 SKX Compute Node Specifications:
- Model: Intel Xeon Platinum 8160 ("Skylake")
- Total cores per SKX node: 48 cores on two sockets (24 cores/socket)
- Hardware threads per core: 2
- Hardware threads per node: 48 x 2 = 96
- Clock rate: 2.1 GHz nominal (1.4-3.7 GHz depending on instruction set and number of active cores)
- RAM: 192 GB (2.67 GHz) DDR4
- Cache: 32 KB L1 data cache per core; 1 MB L2 per core; 33 MB L3 per socket. Each socket can cache up to 57 MB (sum of L2 and L3 capacity).
- Local storage: 144 GB /tmp partition on a 200 GB SSD (size of the /tmp partition as of 14 Nov 2017)
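
The queue limits above determine where each test configuration can run. Every DAOS server and every client occupies its own node, so a run needs servers + clients nodes. The sketch below is minimal and Python-only. It picks the smallest SKX queue whose node-count and wall-clock limits fit a job. It assumes only those two limits matter; the per-queue job-count limit is recorded but not checked. `Queue`, `SKX_QUEUES` and `pick_queue` are hypothetical helper names, not part of any TACC tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Queue:
    name: str
    max_nodes: int    # max nodes per job
    max_hours: float  # max wall-clock duration
    max_jobs: int     # max jobs in queue (recorded only; not checked by pick_queue)


# SKX queue limits copied from the queue table, ordered smallest first.
SKX_QUEUES = [
    Queue("skx-dev", 4, 2, 1),
    Queue("skx-normal", 128, 48, 25),
    Queue("skx-large", 868, 48, 3),
]


def pick_queue(nodes: int, hours: float) -> Optional[Queue]:
    """Return the smallest SKX queue whose node and time limits fit the job, else None."""
    for queue in SKX_QUEUES:
        if nodes <= queue.max_nodes and hours <= queue.max_hours:
            return queue
    return None


if __name__ == "__main__":
    for nodes in (2, 24, 128, 384, 868, 1000):
        queue = pick_queue(nodes, hours=0.5)
        print(nodes, queue.name if queue else "no SKX queue fits")
```

With this rule, the largest performance configuration in the test table (128 servers + 740 clients = 868 nodes) lands exactly at the skx-large limit.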

Server Configuration:
- Configuration:
- Environment variables (if any):

Client Configuration:
- Configuration:
- Environment variables (if any):

| Testing Area | Test | Test Priority (1 = High, 2 = Low) | Number of Servers | Number of Clients | Input Parameters | Expected Result | Observed Result | Defect | Notes | Expected SUs (1 node for 1 hour = 1 SU) |
|---|---|---|---|---|---|---|---|---|---|---|
| Server YAML config options | Verify the test cases in the sections below with specific server config options set in the YAML file | 1 | | | target = [16], nr_xs_helpers = [1], CRT_CTX_SHARE_ADDR = [0, 1] | No server crash; performance increases linearly | | | No separate test needed; the tests below can use this configuration | |
| Performance | No replica: run IOR and collect bandwidth; run IOR with small transfer sizes and collect IOPS | 1 | 1, 8, 32, 128, 128 | 1, 16, 96, 256, 740 | Transfer size: 256B, 4K, 128K, 512K, 1M; block size: 64M (depends on the number of processes, since the total file size grows with it); FPP and SSF | A single server achieved ~12 GB/s read/write, so bandwidth should scale linearly | | | 1406 nodes in total, ~30 min per configuration | 703 |
| | Replica 2-way: run IOR and collect bandwidth; run IOR with small transfer sizes and collect IOPS | 1 | 8, 32, 128 | 16, 96, 740 | | | | | | |
| | Replica 3-way: run IOR and collect bandwidth; run IOR with small transfer sizes and collect IOPS | 1 | 8, 32, 128 | 16, 96, 740 | | | | | | |
| | Replica 4-way: run IOR and collect bandwidth; run IOR with small transfer sizes and collect IOPS | 1 | 8, 32, 128 | 16, 96, 740 | | | | | | |
| | Metadata test (using mdtest) | | | | | | | | | |
| | CART self_test | | | | | | | | | |
| | POSIX (FUSE)? | | | | | | | | | |
| Functionality and Scale testing | Run all daos_test | | | | | | | | | |
| | Single server / max clients | | 1 | MAX | | | | | | |
| | Max servers / single client | | MAX | 1 | | | | | | |
| | Large number of pools (~1000) | | | | | | | | Which other operations are needed after pool creation? | |
| | dmg utility testing (e.g., dmg pool query) | | | | | | | | | |
| | Negative scenarios with scalability | | | | | | | | | |
| Reliability and Data Integrity (soak testing) | | | | | | | | | | |
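
The "Server YAML config options" row defines a small parameter matrix. The minimal sketch below expands that matrix into individual server configurations with `itertools.product`. The key names follow the test plan and have not been checked against the `daos_server.yml` schema of any DAOS release, so treat them as assumptions. `CRT_CTX_SHARE_ADDR` is a CaRT environment variable rather than a YAML key. `SERVER_PARAMS`, `config_matrix` and `env_lines` are hypothetical names.

```python
from itertools import product

# Values under test, copied from the "Server YAML config options" row.
# Key names follow the test plan (assumption: adjust to the deployed DAOS schema).
SERVER_PARAMS = {
    "target": [16],
    "nr_xs_helpers": [1],
    "CRT_CTX_SHARE_ADDR": [0, 1],  # CaRT environment variable, not a YAML key
}


def config_matrix(params: dict) -> list:
    """Expand {name: [values, ...]} into one dict per combination (Cartesian product)."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*(params[n] for n in names))]


def env_lines(config: dict, env_keys=("CRT_CTX_SHARE_ADDR",)) -> list:
    """Render the environment-variable part of a configuration as NAME=value strings."""
    return [f"{key}={config[key]}" for key in env_keys if key in config]


if __name__ == "__main__":
    for i, cfg in enumerate(config_matrix(SERVER_PARAMS)):
        print(f"config {i}: {cfg} env={env_lines(cfg)}")
```

Adding a value to any list (for example a second `target` count) multiplies the number of configurations that each downstream test must be run with.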
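
The IOR input parameters in the performance rows (five transfer sizes, a 64M block size, FPP and SSF) expand to ten runs per server/client configuration. The minimal sketch below builds those IOR argument lists and uses only the standard IOR options:

- `-w`/`-r`: write and read phases
- `-t`: transfer size
- `-b`: block size
- `-F`: file per process
- `-o`: test file

The test-file path is a hypothetical placeholder. The API selection (`-a`) is deliberately left out because it depends on how IOR was built for DAOS. The sketch also checks that the block size is a multiple of every transfer size, which IOR requires. Finally, it computes the aggregate data volume, which grows with the number of processes as the table notes.

```python
from itertools import product

# Input parameters copied from the "Performance" rows of the test table.
TRANSFER_SIZES = ["256", "4k", "128k", "512k", "1m"]  # 256B, 4K, 128K, 512K, 1M
BLOCK_SIZE = "64m"
FILE_MODES = {"FPP": ["-F"], "SSF": []}  # -F = file per process; IOR's default is one shared file

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(size: str) -> int:
    """Convert an IOR-style size string such as '4k' or '64m' to bytes."""
    suffix = size[-1].lower()
    return int(size[:-1]) * _UNITS[suffix] if suffix in _UNITS else int(size)


def ior_commands(test_file: str) -> list:
    """Build one IOR argument list per (file mode, transfer size) combination."""
    block = parse_size(BLOCK_SIZE)
    commands = []
    for flags, tsize in product(FILE_MODES.values(), TRANSFER_SIZES):
        # IOR rejects a block size that is not a multiple of the transfer size.
        if block % parse_size(tsize) != 0:
            raise ValueError(f"block size {BLOCK_SIZE} is not a multiple of {tsize}")
        commands.append(["ior", "-w", "-r", "-t", tsize, "-b", BLOCK_SIZE, *flags, "-o", test_file])
    return commands


def aggregate_bytes(nprocs: int) -> int:
    """Total data per run: each process writes one block, so it scales with nprocs."""
    return parse_size(BLOCK_SIZE) * nprocs


if __name__ == "__main__":
    for cmd in ior_commands("/path/to/ior_testfile"):  # hypothetical path
        print(" ".join(cmd))
```

With one process per client node (an assumption; the plan does not state processes per node), the 740-client run would move 740 x 64 MiB of data per phase. More processes per node scale that total accordingly.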
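
Pairing the server and client counts of the no-replica row position by position (1+1, 8+16, 32+96, 128+256, 128+740) reproduces the 1406 nodes and 703 SUs noted in the table, at ~30 min per configuration. The minimal sketch below computes that budget together with the linear-scaling bandwidth target derived from the single-server result. It makes three assumptions:

- The ~12 GB read/write single-server figure is a bandwidth in GB/s.
- Each node hosts exactly one server or one client.
- The same ~30 min per configuration applies to the replica rows, which have no SU estimate in the table.

All function names are hypothetical.

```python
# Server/client node counts from the performance rows, paired position by position.
NO_REPLICA = [(1, 1), (8, 16), (32, 96), (128, 256), (128, 740)]
REPLICA = [(8, 16), (32, 96), (128, 740)]  # used by each of the 2-, 3- and 4-way replica rows

RUN_HOURS = 0.5            # ~30 min per configuration (assumed to hold for replica rows too)
SINGLE_SERVER_GBPS = 12.0  # assumption: the ~12 GB single-server figure is GB/s


def nodes(configs) -> int:
    """Every server and every client occupies its own node."""
    return sum(servers + clients for servers, clients in configs)


def su_budget(configs, hours: float = RUN_HOURS) -> float:
    """1 SU = 1 node for 1 hour, so the charge is total nodes x hours."""
    return nodes(configs) * hours


def expected_gbps(servers: int) -> float:
    """Linear-scaling target: aggregate bandwidth proportional to the server count."""
    return servers * SINGLE_SERVER_GBPS


def scaling_efficiency(observed_gbps: float, servers: int) -> float:
    """Observed / expected bandwidth; 1.0 means perfectly linear scaling."""
    return observed_gbps / expected_gbps(servers)


if __name__ == "__main__":
    print("no replica:", nodes(NO_REPLICA), "nodes,", su_budget(NO_REPLICA), "SU")
    print("each replica row:", nodes(REPLICA), "nodes,", su_budget(REPLICA), "SU")
    for servers, _clients in NO_REPLICA:
        print(f"{servers:>3} servers -> target {expected_gbps(servers):.0f} GB/s")
```

The efficiency ratio gives a direct way to fill in the Observed Result column against the "should scale linearly" expectation.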