...
| Testing Area | Test | Test Priority (1 - HIGH, 2 - LOW) | Number of Servers | Number of Clients | Input Parameter | Expected Result | Observed Result | Defect | Notes | Expected SUs (1 node * 1 hour = 1 SU) |
|---|---|---|---|---|---|---|---|---|---|---|
| Server YAML config options | Verify the test cases from the sections below with specific server config options in the YAML file | 1 | | | target = [16], nr_xs_helpers = [1], CRT_CTX_SHARE_ADDR = [0, 1] | No server crash; performance increases linearly | | | No individual test is needed; the tests below can be run with this configuration | |
| Performance | No replica: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | 1, 8, 32, 128, 128 | 1, 16, 96, 256, 740 | Protocol: daos. Transfer sizes: 256B, 4K, 128K, 512K, 1M (do non-standard sizes also need to be covered?). Block size: 64M (depends on the number of processes, since the file size increases with it). FPP and SSF | A single server gets ~12GB/s read/write and it should scale linearly, so with 128 servers BW should be close to 1.5TB/s? | | | 1406 nodes for ~30 min | 703 |
| | Replica 2-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | 8, 32, 128 | 16, 96, 740 | Same as above | | | | 1020 nodes for ~30 min | 510 |
| | Replica 3-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | 8, 32, 128 | 16, 96, 740 | Same as above | | | | 1020 nodes for ~30 min | 510 |
| | Replica 4-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | 8, 32, 128 | 16, 96, 740 | Same as above | | | | 1020 nodes for ~30 min | 510 |
| | Does any erasure-coded object class need to be run, maybe at medium size? E.g. EC_2P1G1 | 1? | 32 | 96 | Same as above? | | | | 128 nodes for ~60 min | 120 |
| | Metadata test (using MDTest) | 1 | 1, 8, 32, 128, 128 | 1, 16, 96, 256, 740 | How many tasks per client: 1, 4, or only 8? Which class type should be tested? -n = 1000 (every process will create/stat/read/remove); -z = 0 and 20 (depth of the hierarchical directory structure) | Results with 1 server and 1 client are available from https://jira.hpdd.intel.com/secure/attachment/31383/sbatch_run.txt | | | 1406 nodes for ~15 min | 350 |
| | CART self_test | 1 | 2, 32, 126 | 1, 1, 1 | orterun --timeout 3600 --mca mtl ^psm2,ofi -x FI_PSM2_DISCONNECT=1 -np 1 -ompi-server <urifile> self_test --group-name daos_server --endpoint 0-<NO_OF_SERVER>:0 --master-endpoint 0-<NO_OF_SERVER>:0 --message-sizes "b1048576, b1048576 0, 0 b1048576, b1048576 i2048, i2048 b1048576, i2048, i2048 0, 0 i2048, 0" --max-inflight-rpcs 1 --repetitions 100 | | Did not get all the numbers for 126 servers | CART-791 | 166 nodes for ~5 min | 14 |
| | POSIX (FUSE) | 2? | 32 | 96 | Run IOR in POSIX mode. Will it deliver the full performance? | | | | 128 nodes for ~60 min | 128 |
| | DFS | 2 | | | Not sure whether we want to cover DFS, since the daos API is already covered with IOR in the test cases above | | | | | |
| | HDF5? | 2? | 32 | 96 | Is there any specific test we want to run? | | | | | |
| | FIO? | | | | Do we want to test this? | | | | | |
| Functionality and Scale testing | Run all daos_test | 2 | 128 | 740 | | | | | 868 nodes for ~60 min | 868 |
| | Single server/Max clients (IOR) | | 1 | 126 (client processes: 1, 64, 128, 512, 1024, 2016) | Create pool, query pool. Run IOR (specific size?). Transfer sizes: 256B, 1M. Block size: 16M for the 256B transfer size, otherwise 64M. Flags: -w -W -r -R. Iterations: 3 | Pool create should work fine. IOR will run with ~2000 tasks, so it should succeed. Query the pool info after the IOR run and compare the pool size to the file size. Assumes 16 client processes per node (need to verify this works; 8 client processes per node works) | | | Total nodes available at present: 128; 16 client processes per node. 128 nodes for ~30 min | 64 |
| | | | 1 | 866 (client processes: 1, 128, 1024, 4096, 8192, 13856) | Create pool, query pool. Run IOR (specific size?). Transfer sizes: 256B, 1M. Block size: 16M for the 256B transfer size, otherwise 64M. Flags: -w -W -r -R. Iterations: 3 | Pool create should work fine. IOR will run with ~13000 tasks, so it should succeed. Query the pool info after the IOR run and compare the pool size to the file size. Assumes 16 client processes per node (need to verify this works; 8 client processes per node works) | | | Total nodes available at present: 868; 16 client processes per node. 868 nodes for ~30 min | 434 |
| | Max servers/single client (IOR) | | 1, 8, 16, 32, 64, 126 | 1 (client processes: 16) | Create pool, query pool. Run IOR with the DAOS and POSIX APIs. Transfer sizes: 256B, 1M. Block size: 16M for the 256B transfer size, otherwise 64M. Flags: -w -W -r -R. Iterations: 3 | Pool create should work fine. IOR will be run with 16 client processes per node (need to verify this works; 8 client processes per node works). Query the pool info after the IOR run and compare the pool size to the file size | | | Total nodes available at present: 128; 16 client processes per node. 128 nodes for ~30 min | 64 |
| | | | 1, 16, 64, 256, 512, 866 | 1 (client processes: 16) | Create pool, query pool. Run IOR with the DAOS and POSIX APIs. Transfer sizes: 256B, 1M. Block size: 16M for the 256B transfer size, otherwise 64M. Flags: -w -W -r -R. Iterations: 3 | Pool create should work fine. IOR will be run with 16 client processes per node (need to verify this works; 8 client processes per node works). Query the pool info after the IOR run and compare the pool size to the file size | | | Total nodes available at present: 868; 16 client processes per node. 868 nodes for ~30 min | 434 |
| | Single server/Max clients (Mdtest) | | 1 | Client processes: 1, 64, 128, 512, 1024, 2032 | Create pool. Run mdtest with the DFS and POSIX APIs. Number of files/dirs (-n): 100 or 10K. Write and read (-w and -e): 4 for -n 100, otherwise keep files empty. Depth (-z): 0 and 20. Iterations: 3 | Pool create should work as expected. Run mdtest with 16 client processes per node (need to verify that 16 client processes per node work fine) | | | Total nodes available at present: 128; 16 client processes per node. 128 nodes for ~30 min | 64 |
| | | | 1 | Client processes: 1, 128, 1024, 4096, 8192, 13872 | Create pool. Run mdtest with the DFS and POSIX APIs. Number of files/dirs (-n): 100 or 10K. Write and read (-w and -e): 4 for -n 100, otherwise keep files empty. Depth (-z): 0 and 20. Iterations: 3 | Pool create should work as expected. Run mdtest with 16 client processes per node (need to verify that 16 client processes per node work fine) | | | Total nodes available at present: 868; 16 client processes per node. 868 nodes for ~30 min | 434 |
| | Max servers/single client (Mdtest) | | 1, 8, 16, 32, 64, 127 | Client processes: 16 | Create pool. Run mdtest with the DFS and POSIX APIs. Number of files/dirs (-n): 100 or 10K. Write and read (-w and -e): 4 for -n 100, otherwise keep files empty. Depth (-z): 0 and 20. Iterations: 3 | Pool create should work as expected. Run mdtest with 16 client processes per node (need to verify that 16 client processes per node work fine) | | | Total nodes available at present: 128; 16 client processes per node. 128 nodes for ~30 min | 64 |
| | | | 1, 16, 64, 256, 512, 867 | Client processes: 16 | Create pool. Run mdtest with the DFS and POSIX APIs. Number of files/dirs (-n): 100 or 10K. Write and read (-w and -e): 4 for -n 100, otherwise keep files empty. Depth (-z): 0 and 20. Iterations: 3 | Pool create should work as expected. Run mdtest with 16 client processes per node (need to verify that 16 client processes per node work fine) | | | Total nodes available at present: 868; 16 client processes per node. 868 nodes for ~30 min | 434 |
| | Large number of pools (~1000) | | 128 (does this server count seem OK?) | 740 | Create a large number of pools (~90MB each); write small data with IOR; restart all the servers; query all the pools; read the IOR data back from each pool with verification. What other operations are needed after pool creation? | Measure the server restart time with this many pools. Pool query should report correct sizes after the IOR write. IOR read should work fine, with data validation, after all servers restart | | | 868 nodes for ~60 min | 868 |
| | dmg utility testing, for example: pool query | | | | dmg pool create, dmg pool query, dmg pool destroy. Anything more to cover? Some of these tools are covered in other test cases | | | | | |
| Negative Scenarios with Scalability | Server failure and data rebuild | 1 | 128 | 740 | Create multiple pools. Store IOR data with 2-, 3-, and 4-way replication and with multiple groups. Kill servers one by one, up to a maximum of 64 (half the requested size)? After each server kill, read the IOR data and verify its content. Multiple servers can be killed (2/4/8); object data will be lost if all copies are lost. We could also verify that the remaining system is functional | Rebuild should happen for all objects, and data should not be corrupted after a server failure | | | 868 nodes for ~2 hours | 868 |
| | daos_run_io_conf | 2 | 128 | 740 | This test excludes ranks and adds them back in a loop, for a given number of iterations. We can have a maximum of 16 targets and include all ranks. The test excludes a rank randomly and adds it back. Pool query is also part of this test, to verify the reported usage | | We have not tried this on TACC; it works locally, but a few issues caught during local testing still need to be resolved | DAOS-3510 | 868 nodes for ~30 min | 434 |
| Reliability and Data Integrity (Soak testing) | Current soak testing | | | | | | | | 868 nodes for 2 hours | 1736 |
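For reference, the server config options exercised in the first row might look like the fragment below. This is a hypothetical sketch: the key names (`targets`, `nr_xs_helpers`, `env_vars`) and their placement under a per-server section are assumptions based on the options named in the table, and should be checked against the `daos_server.yml` shipped with the DAOS version under test.

```shell
# Hypothetical sketch: write out a per-engine section of a daos_server.yml
# exercising the options from the "Server YAML config options" row.
# Key names and placement are assumptions -- verify against your DAOS version.
cat > /tmp/daos_server_test.yml <<'EOF'
servers:
  - targets: 16            # target = [16]
    nr_xs_helpers: 1       # nr_xs_helpers = [1]
    env_vars:
      - CRT_CTX_SHARE_ADDR=0   # repeat the run with CRT_CTX_SHARE_ADDR=1
EOF
cat /tmp/daos_server_test.yml
```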
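The IOR rows above can be sketched as a command line like the following. The flag set (-a, -t, -b, -w -W -r -R, -i) comes from the table; the launcher, process count, and the choice of 1M/64M are one example point from the listed sweeps, and any pool/container selection flags required by the DAOS backend are omitted here.

```shell
# Hypothetical sketch of one IOR invocation from the performance rows:
# DAOS API, 1M transfer size, 64M block size, write+check and read+check,
# 3 iterations. Pool/container selection flags are omitted.
NP=16        # client processes (16 per client node assumed in the table)
TS=1m        # transfer size sweep in the table: 256b / 4k / 128k / 512k / 1m
BS=64m       # block size; the table notes it grows with the process count
IOR_CMD="ior -a DAOS -t ${TS} -b ${BS} -w -W -r -R -i 3"
echo "mpirun -np ${NP} ${IOR_CMD}"
```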
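Similarly, the mdtest rows can be sketched as below. The -n, -z, and iteration values come from the table; the DFS backend flag (-a DFS) assumes the DAOS-patched mdtest, and the launcher and process count are placeholders.

```shell
# Hypothetical sketch of one mdtest invocation from the metadata rows:
# DFS API, 1000 items per process, directory depth 20, 3 iterations.
# Assumes the DAOS-patched mdtest; launcher/NP are placeholders.
NP=16
MDTEST_CMD="mdtest -a DFS -n 1000 -z 20 -i 3"
echo "mpirun -np ${NP} ${MDTEST_CMD}"
```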
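The dmg pool lifecycle from the "dmg utility testing" and "large number of pools" rows might look like the following. This only prints the intended commands; the size value matches the ~90MB pools mentioned above, while the exact flag spelling and the pool identifier are assumptions to be checked against `dmg pool create --help` for the DAOS version under test.

```shell
# Hypothetical sketch of the dmg pool lifecycle (create -> query -> destroy).
# Flag spelling and identifiers are assumptions; this only echoes the commands.
POOL_SIZE="90M"   # matches the ~90MB pools in the "large number of pools" row
echo "dmg pool create --size=${POOL_SIZE}"
echo "dmg pool query <pool_uuid>"
echo "dmg pool destroy <pool_uuid>"
```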
...