TACC Stampede2 Test Plan

How to Run DAOS on TACC Stampede2

TACC System information:

Server:

# single server instance per config file for now
servers:
-
  targets: 16                # Confirm the number of targets
  first_core: 0              # offset of the first core for service xstreams
  nr_xs_helpers: 1           # count of offload/helper xstreams per target
  fabric_iface: ib0          # map to OFI_INTERFACE=ib0
  fabric_iface_port: 31416   # map to OFI_PORT=31416
  log_mask: ERR              # map to D_LOG_MASK=ERR
  log_file: /tmp/daos_server.log # map to D_LOG_FILE=/tmp/server.log
  # Environment variable values should be supplied without encapsulating quotes.
  env_vars:                  # influence DAOS IO Server behaviour by setting env variables
  - CRT_TIMEOUT=120
  - CRT_CREDIT_EP_CTX=0
  - PSM2_MULTI_EP=1
  - CRT_CTX_SHARE_ADDR=1
  - PMEMOBJ_CONF=prefault.at_open=1;prefault.at_create=1; # Do we need this?
  - PMEM_IS_PMEM_FORCE=1 # Do we need this?
  # Storage definitions
  # When scm_class is set to ram, tmpfs will be used to emulate SCM.
  # The size of ram is specified by scm_size in GB units.
  scm_mount: /dev/shm   # map to -s /mnt/daos
  scm_class: ram
  scm_size: 90

Server environment variables (if any are set):

Client Configuration:

Configuration:

Environment variables (if any are set):

export CRT_PHY_ADDR_STR="ofi+psm2"
export OFI_INTERFACE=ib0
export FI_PSM2_NAME_SERVER=1
export PSM2_MULTI_EP=1
export FI_SOCKETS_MAX_CONN_RETRY=1
export CRT_CTX_SHARE_ADDR=1
export CRT_TIMEOUT=120

Other important information for running test:

Item or Notes: For Defect CART-777
Description: There is an issue with verbs + Open MPI that sometimes causes orterun to get stuck. To work around it, pass "--mca btl tcp,self --mca oob tcp" to orterun when running IOR or other commands.
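As a minimal sketch of how the CART-777 workaround could be applied consistently from a launcher script, the helper below prepends the MCA options quoted above to an orterun command line. The helper name `build_orterun_cmd` and the IOR arguments are hypothetical placeholders, not part of the original plan.

```python
import shlex

# MCA options quoted in the CART-777 note: force TCP for the BTL and OOB layers.
MCA_WORKAROUND = ["--mca", "btl", "tcp,self", "--mca", "oob", "tcp"]


def build_orterun_cmd(num_procs, app_args):
    """Return an orterun argv list with the TCP-only MCA workaround applied."""
    return ["orterun", *MCA_WORKAROUND, "-np", str(num_procs), *app_args]


# Hypothetical application arguments; replace with the real IOR command line.
cmd = build_orterun_cmd(16, ["ior", "-a", "DAOS"])
print(shlex.join(cmd))
```

Keeping the workaround in one place makes it easy to drop once the defect is resolved.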

 

 

 

Test Description:

Each test case below records the following fields: Testing Area; Test; Test Priority (1 = HIGH, 2 = LOW); Number of Servers; Number of Clients; Input Parameter; Expected Result; Observed Result; Defect; Notes; Expected SU's (1 node * 1 hour = 1 SU).

Testing Area: Server YAML config options
Test: Verify the test cases in the sections below with specific server config options in the YAML file
Test Priority: 1
Input Parameter: targets = [16]; nr_xs_helpers = [1]; CRT_CTX_SHARE_ADDR = [0, 1]
Expected Result: No server crash; performance increases linearly
Notes: No individual test is needed; the tests below can be run with this configuration

Testing Area: Performance
Test: No Replica. Run IOR and collect BW; run IOR with small sizes and collect IOPS
Test Priority: 1
Number of Servers: 1, 8, 32, 128, 128
Number of Clients: 1, 16, 96, 256, 740
Input Parameter:
  Protocol: daos
  Transfer Size: 256B, 4K, 128K, 512K, 1M (do non-standard sizes also need to be covered?)
  Block Size: 64M (depends on the number of processes, since the file size grows with it)
  FPP and SSF
Expected Result: A single server got ~12GB read/write, so it should scale linearly. With 128 servers it should be close to 1.5TB BW?
Expected SU's: 703 (1406 nodes for ~30 min)
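The linear-scaling expectation for the No Replica case can be checked with a short calculation. This is a sketch that assumes the "~12GB" single-server figure means roughly 12 GB/s and that scaling is ideal; the function name is illustrative only.

```python
SINGLE_SERVER_GBPS = 12  # assumed reading of "~12GB read/write" on one server
SERVER_COUNTS = [1, 8, 32, 128]


def projected_bandwidth_gbps(num_servers, per_server_gbps=SINGLE_SERVER_GBPS):
    """Ideal aggregate bandwidth if throughput scales linearly with server count."""
    return num_servers * per_server_gbps


for n in SERVER_COUNTS:
    print(f"{n:>4} servers -> {projected_bandwidth_gbps(n):>5} GB/s")
```

With 128 servers the projection is 1536 GB/s, which matches the "close to 1.5TB" estimate in the plan.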

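Testing Area: Performance
Test: Replica 2 Way. Run IOR and collect BW; run IOR with small sizes and collect IOPS
Test Priority: 1
Number of Servers: 8, 32, 128
Number of Clients: 16, 96, 740
Input Parameter: Same as above
Expected SU's: 510 (1020 nodes for ~30 min)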

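Testing Area: Performance
Test: Replica 3 Way. Run IOR and collect BW; run IOR with small sizes and collect IOPS
Test Priority: 1
Number of Servers: 8, 32, 128
Number of Clients: 16, 96, 740
Input Parameter: Same as above
Expected SU's: 510 (1020 nodes for ~30 min)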

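Testing Area: Performance
Test: Replica 4 Way. Run IOR and collect BW; run IOR with small sizes and collect IOPS
Test Priority: 1
Number of Servers: 8, 32, 128
Number of Clients: 16, 96, 740
Input Parameter: Same as above
Expected SU's: 510 (1020 nodes for ~30 min)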
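The node counts and SU estimates in the performance cases above appear to be the sum of servers plus clients over every configuration, multiplied by the run time in hours. The sketch below reproduces that arithmetic; the pairing of server and client counts by position is an inference from the lists, not something the plan states explicitly.

```python
def total_nodes(configs):
    """Sum servers + clients over every (servers, clients) configuration."""
    return sum(servers + clients for servers, clients in configs)


def service_units(nodes, minutes):
    """1 node * 1 hour = 1 SU, as defined in the plan."""
    return nodes * minutes / 60


# (servers, clients) pairs, matched by position (assumption).
NO_REPLICA = [(1, 1), (8, 16), (32, 96), (128, 256), (128, 740)]
REPLICA = [(8, 16), (32, 96), (128, 740)]

print(total_nodes(NO_REPLICA), service_units(total_nodes(NO_REPLICA), 30))
print(total_nodes(REPLICA), service_units(total_nodes(REPLICA), 30))
```

Under these assumptions the No Replica case comes to 1406 nodes and 703 SUs, and each replica case to 1020 nodes and 510 SUs.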

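Testing Area: Performance
Test: Does any erasure-coded object class need to be run? Maybe with a medium size? Object classes: EC_2P1G1, EC_2P2G1, EC_8P2G1
Test Priority: 1?
Number of Servers: 32
Number of Clients: 96
Input Parameter: Same as above?
Expected SU's: 120 (128 nodes for ~60 min)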
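To help compare the candidate erasure-coded classes, the sketch below parses the class names and reports the storage overhead of each. It assumes the names follow the pattern EC_<data cells>P<parity cells>G<groups>; that reading of the naming convention is an assumption and should be confirmed against the DAOS object class definitions.

```python
import re


def parse_ec_class(name):
    """Split a name like EC_2P1G1 into data cells, parity cells, and groups."""
    match = re.fullmatch(r"EC_(\d+)P(\d+)G(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized EC class name: {name}")
    data, parity, groups = (int(part) for part in match.groups())
    return {
        "data": data,
        "parity": parity,
        "groups": groups,
        # Raw bytes stored per byte of user data.
        "overhead": (data + parity) / data,
    }


for ec_class in ("EC_2P1G1", "EC_2P2G1", "EC_8P2G1"):
    print(ec_class, parse_ec_class(ec_class))
```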

Testing Area: Performance
Test: Metadata Test (using MDTest)
Test Priority: 1
Number of Servers: 1, 8, 32, 128, 128
Number of Clients: 1, 16, 96, 256, 740
Input Parameter:
  How many tasks per client: 1, 4, or only 8?
  What class type should be tested?
  -n = 1000 (every process will create/stat/read/remove)
  -z = 0 and 20 (depth of hierarchical directory structure)
Expected Result: A result with 1 server and 1 client is available from https://jira.hpdd.intel.com/secure/attachment/31383/sbatch_run.txt
Expected SU's: 350 (1406 nodes for ~15 min)

Testing Area: Performance
Test: CART self_test
Test Priority: 1
Number of Servers: 2, 32, 126
Number of Clients: 1, 1, 1
Input Parameter:
  orterun --timeout 3600 --mca mtl ^psm2,ofi -x FI_PSM2_DISCONNECT=1 -np 1 -ompi-server <urifile> self_test --group-name daos_server --endpoint 0-<NO_OF_SERVER>:0 --master-endpoint 0-<NO_OF_SERVER>:0 --message-sizes 'b1048576',' b1048576 0','0 b1048576',' b1048576 i2048',' i2048 b1048576',' i2048',' i2048 0','0 i2048','0' --max-inflight-rpcs 1 --repetitions 100
Observed Result: Did not get all the numbers for 126 servers
Defect: CART-791
Notes:
  https://wiki.hpdd.intel.com/download/attachments/114950812/2SN_1CN_TACC-Stampede2_20191022_144511.txt?api=v2
  https://wiki.hpdd.intel.com/download/attachments/114950812/32SN_1CN_TACC-Stampede2_20191022_172517.txt?api=v2
  https://wiki.hpdd.intel.com/download/attachments/114950812/126SN_1CN_TACC-Stampede2_20191023_091546.txt?api=v2
Expected SU's: 14 (166 nodes for ~5 min)

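Testing Area: Performance
Test: POSIX (Fuse)
Test Priority: 2?
Number of Servers: 32
Number of Clients: 96
Input Parameter: Run IOR in POSIX mode. Are we at the point where we can get full performance?
Expected SU's: 128 (128 nodes for ~60 min)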

Testing Area: Performance
Test: DFS
Test Priority: 2
Input Parameter: Not sure whether we want to cover DFS, since DAOS is already covered with IOR in the test cases above

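Testing Area: Performance
Test: HDF5?
Test Priority: 2?
Number of Servers: 32
Number of Clients: 96
Input Parameter: Is there any specific test we want to run?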

Testing Area: Performance
Test: FIO?
Input Parameter: Do we want to test this?

Testing Area: Functionality and Scale testing
Test: Run all daos_test
Test Priority: 2
Number of Servers: 128
Number of Clients: 740
Expected SU's: 868 nodes for ~60 min

Testing Area: Functionality and Scale testing
Test: Single server/Max clients (IOR)
Test Priority: 1
Number of Servers: 1
Number of Clients: 126 (client processes: 1, 64, 128, 512, 1024, 2016)
Input Parameter:
  Create pool, query pool
  Run IOR (specific size?)
  Transfer size: 256B, 1M
  Block size: 16M for 256B TS, otherwise 64M
  Flags: -w -W -r -R
  iter: 3
Expected Result: Pool create should work fine. IOR will run with ~2000 tasks, so it should succeed. Query the pool info after the IOR run and compare the pool size to the file size. This assumes 16 client processes per node (need to verify that this works; 8 client processes per node is known to work).
Notes: Total nodes available at present: 127. 16 client processes per node.
Expected SU's: 64 (128 nodes for ~30 min)

Testing Area: Functionality and Scale testing
Test: Single server/Max clients (IOR)
Test Priority: 1
Number of Servers: 1
Number of Clients: 866 (client processes: 4096, 8192, 13856)
Input Parameter:
  Create pool, query pool
  Run IOR (specific size?)
  Transfer size: 256B, 1M
  Block size: 16M for 256B TS, otherwise 64M
  Flags: -w -W -r -R
  iter: 3
Expected Result: Pool create should work fine. IOR will run with ~13000 tasks, so it should succeed. Query the pool info after the IOR run and compare the pool size to the file size. This assumes 16 client processes per node (need to verify that this works; 8 client processes per node is known to work).
Notes: Total nodes available at present: 867. 16 client processes per node.
Expected SU's: 434 (868 nodes for ~30 min)
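The maximum client process counts and the file size to compare against the pool query can be derived as follows. This is a sketch that assumes 16 processes per client node, IOR's default of one segment, and that IOR's "M" suffix means MiB; the function names are illustrative.

```python
PROCS_PER_NODE = 16  # assumption stated in the plan; still to be verified on TACC
MIB = 1024 * 1024


def total_tasks(client_nodes, procs_per_node=PROCS_PER_NODE):
    """Number of IOR tasks when every client node runs procs_per_node processes."""
    return client_nodes * procs_per_node


def aggregate_bytes(tasks, block_size_bytes, segments=1):
    """Aggregate IOR data size: block size * segment count * number of tasks."""
    return tasks * block_size_bytes * segments


for nodes in (126, 866):
    tasks = total_tasks(nodes)
    size_gib = aggregate_bytes(tasks, 64 * MIB) / 1024**3
    print(f"{nodes} client nodes -> {tasks} tasks, {size_gib:.0f} GiB at 64M block size")
```

The resulting aggregate size is the value to hold against the usage reported by the pool query after the IOR run.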

Testing Area: Functionality and Scale testing
Test: Max servers/single client (IOR)
Test Priority: 1
Number of Servers: 1, 8, 16, 32, 64, 126
Number of Clients: 1 (client processes: 16)
Input Parameter:
  Create pool, query pool
  Run IOR with the DAOS and POSIX APIs
  Transfer size: 256B, 1M
  Block size: 16M for 256B TS, otherwise 64M
  Flags: -w -W -r -R
  iter: 3
Expected Result: Pool create should work fine. IOR will be run with 16 client processes per node (need to verify that this works; 8 client processes per node is known to work). Query the pool info after the IOR run and compare the pool size to the file size.
Notes: Total nodes available at present: 127. 16 client processes per node.
Expected SU's: 64 (128 nodes for ~30 min)

Testing Area: Functionality and Scale testing
Test: Max servers/single client (IOR)
Test Priority: 1
Number of Servers: 512, 866
Number of Clients: 1 (client processes: 16)
Input Parameter:
  Create pool, query pool
  Run IOR with the DAOS and POSIX APIs
  Transfer size: 256B, 1M
  Block size: 16M for 256B TS, otherwise 64M
  Flags: -w -W -r -R
  iter: 3
Expected Result: Pool create should work fine. IOR will be run with 16 client processes per node (need to verify that this works; 8 client processes per node is known to work). Query the pool info after the IOR run and compare the pool size to the file size.
Notes: Total nodes available at present: 867. 16 client processes per node.
Expected SU's: 434 (868 nodes for ~30 min)

Testing Area: Functionality and Scale testing
Test: Large number of pools (~1000)
Number of Servers: 128 (does the server count seem OK?)
Number of Clients: 740
Input Parameter:
  Create a large number of pools (~90MB each)
  Write a small amount of data with IOR
  Restart all the servers
  Query all the pools
  Read the IOR data from each pool with verification
  What other operations are needed after pool creation?
Expected Result:
  Measure the server restart time with this many pools
  Pool query should report the correct sizes after the IOR write
  IOR read should work fine, with data validation, after all servers restart
Expected SU's: 868 (868 nodes for ~60 min)

Testing Area: Functionality and Scale testing
Test: dmg utility testing (for example: pool query)
Input Parameter:
  dmg pool create
  dmg pool query
  dmg pool destroy
  Anything more to cover? Some of these tools will be covered by other test cases

Testing Area: Negative Scenarios with Scalability
Test: Server failure and data rebuild
Test Priority: 1
Number of Servers: 128
Number of Clients: 740
Input Parameter:
  Create multiple pools
  Write the IOR data with 2-, 3-, and 4-way replication and with multiple groups
  Kill servers one by one, 64 at most (half the requested size)?
  After each server kill, read the IOR data and verify the content
  Multiple servers can be killed at once (2/4/8). Object data will be lost if all copies are lost; in that case we can perhaps verify that the remaining system is functional
Expected Result: Rebuild should happen for all objects, and data should not be corrupted after server failure
Expected SU's: 868 (868 nodes for ~2 hours)
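The data-loss condition in the server failure case can be expressed as a simple worst-case check. This sketch assumes that all killed servers hold replicas of the same object and that no rebuild completes between the kills; it is a simplification for planning which kill counts should still pass data verification, not a model of DAOS placement.

```python
def data_survives(replicas, failed_in_group):
    """An object stays readable while at least one replica is on a live server."""
    return failed_in_group < replicas


# Worst case: every simultaneously killed server held a replica of the same object.
for replicas in (2, 3, 4):
    for killed in (1, 2, 4, 8):
        status = "readable" if data_survives(replicas, killed) else "lost"
        print(f"{replicas}-way replica, {killed} servers killed at once: {status}")
```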

Testing Area: Negative Scenarios with Scalability
Test: daos_run_io_conf
Test Priority: 2
Number of Servers: 128
Number of Clients: 740
Input Parameter:
  This test excludes ranks and adds them back, in a loop for a given number of iterations
  We can have a maximum of 16 targets and include all ranks. The test excludes a rank at random and adds it back
  Pool query is also part of this test, to verify the usage
Notes: This has not been tried on TACC. It works locally, but a few issues caught during local testing still need to be resolved (DAOS-3510)
Expected SU's: 434 (868 nodes for ~30 min)