NVMe Test Plan

Hardware Configuration

Item	Description
Fabric	PSM2 (preferred)
Number of Servers	2 to 6
Drives per server	Only a few systems have multiple drives, so run the first I/O tests with 2 NVMe drives per server.
Number of Clients	4 to 8, with 32 processes per CN.
daos_nvme.conf	Use the default options for now: [Nvme] TransportID "trtype:PCIe traddr:0000:81:00.0" Nvme0 TransportID "trtype:PCIe traddr:0000:82:00.0" Nvme1 TimeoutUsec 0 ActionOnTimeout None AdminPollRate 100000 HotplugEnable No HotplugPollRate 0
Minimum Pool Size	1G
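
The daos_nvme.conf settings in the table above are flattened into a single cell. As a minimal sketch, the Python helper below renders the same directives one per line for a list of PCIe addresses. The helper name render_nvme_conf and the one-directive-per-line layout with two-space indentation are assumptions made for illustration; the directive names and values are taken from the table.

```python
def render_nvme_conf(pci_addrs):
    """Return daos_nvme.conf text for the given PCIe addresses.

    Hypothetical helper: the layout (one directive per line, indented
    under [Nvme]) is an assumption; the keys and values come from the
    Hardware Configuration table.
    """
    lines = ["[Nvme]"]
    for idx, addr in enumerate(pci_addrs):
        # One TransportID line per drive, named Nvme0, Nvme1, ...
        lines.append(f'  TransportID "trtype:PCIe traddr:{addr}" Nvme{idx}')
    lines += [
        "  TimeoutUsec 0",
        "  ActionOnTimeout None",
        "  AdminPollRate 100000",
        "  HotplugEnable No",
        "  HotplugPollRate 0",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The two drives listed in the table.
    print(render_nvme_conf(["0000:81:00.0", "0000:82:00.0"]))
```

Generating the file from a list keeps the per-server drive count (2 NVMe drives for the first I/O tests) easy to change without hand-editing each configuration.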

Tests:

Test	Condition	Data Input	Comments
Unit Test		src/vos/vea/tests/vea_ut, src/eio/smd/tests/smd_ut. See DAOS-1246 and include all unit tests for NVMe.
Existing Functional Test		Identify and run existing functional test cases with NVMe	Manual testing is in progress; once it is done, the test case information will be updated here.
I/O	Create the pool with a small NVMe size (0/1G/48G)	Write/read data (<4K) of 1B/1K/4K [random/sequential] and make sure it does not use NVMe. Write/read data with non-standard sizes 4025, 259K, 1.1M, 30.22M [random/sequential] and make sure it does not use NVMe.	For the second case, data sizes can be generated randomly instead of using predefined fixed sizes.
		Write/read data (>4K) of 1M/16G [random/sequential] and make sure it uses NVMe. Write/read data with non-standard sizes 4025, 259K, 1.1M, 30.22M [random/sequential] and make sure it does not use NVMe.	For the second case, data sizes can be generated randomly instead of using predefined fixed sizes.
	Create the pool with a large NVMe size (1TB/2TB)	Write/read data (<4K) of 1B/1K/4K [random/sequential] and make sure it does not use NVMe. Write/read data with non-standard sizes 4025, 259K, 1.1M, 30.22M [random/sequential] and make sure it does not use NVMe.	For the second case, data sizes can be generated randomly instead of using predefined fixed sizes.
		Write/read data (>4K) of 1M/16G/1TB [random/sequential] and make sure it uses NVMe. Write/read data with non-standard sizes 4025, 259K, 1.1M, 30.22M [random/sequential] and make sure it does not use NVMe.	For the second case, data sizes can be generated randomly instead of using predefined fixed sizes.
	Unaligned I/O	Use an offset through the API, or use the core Python API to modify an existing array and read it back.	The test code daos_run_io_conf.c does something similar, so it is worth reusing.
I/O with Server/System restart	Create the pool with a small NVMe size (0/1G/48G)	Write/read data (<4K) of 1B/1K/4K	Write the data, stop the server, start the server, read the data back, and check data integrity.
		Write/read data (>4K) of 1M/16G	Write the data, reboot the node, start the server, read the data back, and check data integrity.
	Create the pool with a large NVMe size (1TB/2TB)	Write/read data (<4K) of 1B/1K/4K	Write the data, stop the server, start the server, read the data back, and check data integrity.
		Write/read data (>4K) of 1M/16G	Write the data, reboot the node, start the server, read the data back, and check data integrity.
	Create the pool with a large NVMe size (1TB/2TB)	Write a single IOR data set; read a single IOR data set	Kill the server while a write is in progress, then start the server. Should I/O continue after the server starts? Do the same while a read is in progress.
	Create the pool with a large NVMe size (1TB/2TB)	Write multiple IOR data sets; read multiple IOR data sets; read and write together	Kill the server while multiple writes are in progress, then start the server. Should I/O continue after the server starts? Do the same while multiple reads are in progress.
	Re-written data fetch validation	Write data larger than 4K to NVMe. Re-write part of the same array with a small size (~1-2 bytes), which goes through SCM. Repeat, changing ~100 bytes with different data. Fetch, which combines the records, and verify the result. Do the same in reverse: write a small data set to SCM, overwrite it with large data on NVMe, and validate the content.	Overwriting the data creates a new epoch entry, but it reuses the old data set and updates only the new byte values. During fetch, the epochs are aggregated and the result is returned with the modified bytes.
	Re-written data fetch validation	Write data smaller than 4K to SCM. Extend the data set using the same array with a larger size (>8K), which goes through NVMe. Repeat a few times with different data. Fetch the data set, which combines the records, and verify all old and new records.
Large number of pools with Server/System restart	Create a large number of pools (10000) with different NVMe sizes	Write mixed data across all the pools (1K/4K/1M/1T)	Write the data, stop the server, start the server, read the data back, and check data integrity.
Large number of pools with Server/System restart		Write mixed data across all the pools (1K/4K/1M/1T)	Write the data, reboot the node, start the server, read the data back, and check data integrity.
Pool Capacity	Create an NVMe pool of size 1GB	Write more than 1GB of I/O, which should fail with ENOM_SPACE
	Create a pool the same size as the NVMe drive		Write I/O until the pool fills up. Once the drive is full, further writes should fail with ENOM_SPACE.
	Create pools up to the maximum NVMe size and delete them.		Run this in a loop. For example, if the NVMe drive is 2TB, create pools of 1TB, 500GB, and 500GB, then delete all the pools. Repeat and make sure pool creation works and the space is reclaimed.
Pool Extend	Extend a single pool to multiple targets	Create a few data sets on a single pool (1K/4K/1M/1T). Extend the pool to all targets at once.	Verify data integrity after the pool extension is done.
		Create a few data sets on a single pool (1K/4K/1M/1T). Extend the pool one target at a time; for example, with 6 servers, create the pool on 2 and then extend it to 4 servers one by one.	Verify data integrity after the pool extension is done.
	Extend multiple pools to targets	Create a few data sets on different pools (1K/4K/1M/1T). Extend the pools to all targets at once.	Verify data integrity after the pool extension is done.
		Create a few data sets on single pools (1K/4K/1M/1T). Extend the pool one target at a time; for example, with 6 servers, create the pool on 2 and then extend it to 4 servers one by one.	Verify data integrity after the pool extension is done.
Pool Exclude	Exclude targets from the pool	Create a few data sets on different pools (1K/4K/1M/1T). Exclude all targets from the pools at once.	Add the targets back to the pool and verify data integrity after the exclusion.
Pool Exclude		Create a few data sets on different pools (1K/4K/1M/1T). Exclude the targets from the pools one by one.	Add the targets back to the pool and verify data integrity after the exclusion.
NVMe rebuild	Single drive rebuild	Use a minimum of 4 servers and fill the drives to 50%.	Shut down a single server or eject a drive, and make sure the data is rebuilt on another NVMe drive.
Object	Create a large number of objects (update/fetch) with different object IDs in a single pool created on NVMe		Verify that the objects are created and the data is not corrupted.
Object	Create a large number of objects in multiple pools created on NVMe (pool sizes 1M/1G/1T)		Verify that the objects are created and the data is not corrupted.
Performance	Compare DAOS performance	Run a performance utility (TBD) without DAOS, and IOR with DAOS.	Measure read/write performance and IOPS.
Performance	daos_perf test	Run the daos_perf test with VOS and with DAOS
Metadata	Create a small pool (NVMe size 1G)	Run mdtest to fill the pool with metadata	Once the pool is full, it should not allow any more data to be written, even if NVMe still has space?
Control Plane/Management for NVMe	NVMe SSD discovery with "discover" bindings; NVMe SSD burn-in with "burnin" bindings; NVMe SSD configuration with "configuration" bindings	TBD	TBD
Control Plane/Management for NVMe	Proactive action based on telemetry data (rebalancing); evicting SSDs based on high temperature or wear-leveling data	TBD	TBD
Control Plane/Management for NVMe	SSD firmware image update	TBD	TBD
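
Several rows above depend on the same building blocks: turning size labels such as 259K or 30.22M into byte counts, generating random non-standard sizes, deciding which media a write is expected to land on, and checking data integrity after a restart or a partial overwrite. The Python sketch below shows one way these helpers could look. It is not part of any DAOS API; every function name is hypothetical. Two points are assumptions: sizes up to and including 4K are treated as SCM (the table lists 4K in the small-data group), and holes created by an overwrite past the end of the data are assumed to read back as zero bytes.

```python
import hashlib
import random

UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
NVME_THRESHOLD = 4 * 1024  # assumption: sizes above 4K are expected on NVMe


def parse_size(text):
    """Convert labels such as '4025', '259K', '1.1M' or '1TB' to bytes."""
    s = text.strip().upper()
    if len(s) > 1 and s.endswith("B") and s[-2] in UNITS:
        s = s[:-1]  # '1TB' -> '1T'
    if s[-1] in UNITS:
        return int(float(s[:-1]) * UNITS[s[-1]])
    return int(s)


def expected_media(size, threshold=NVME_THRESHOLD):
    """Return the media a write of `size` bytes is expected to use."""
    return "NVMe" if size > threshold else "SCM"


def random_sizes(count, low=1, high=32 * 1024**2, seed=0):
    """Generate reproducible random sizes that are not powers of two."""
    rng = random.Random(seed)
    sizes = []
    while len(sizes) < count:
        n = rng.randint(low, high)
        if n & (n - 1):  # non-zero only when n is not a power of two
            sizes.append(n)
    return sizes


def pattern(size, seed):
    """Deterministic payload, so a later read can be re-derived and compared."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def digest(data):
    """Checksum used to compare data written before and read after a restart."""
    return hashlib.sha256(data).hexdigest()


def overlay(base, offset, patch):
    """Expected content after overwriting `patch` at `offset` in `base`.

    Reference model for the re-written data rows; gaps past the end of
    `base` are zero-filled (an assumption).
    """
    buf = bytearray(max(len(base), offset + len(patch)))
    buf[:len(base)] = base
    buf[offset:offset + len(patch)] = patch
    return bytes(buf)


if __name__ == "__main__":
    for label in ["1B", "1K", "4025", "259K", "1.1M", "30.22M"]:
        size = parse_size(label)
        print(label, size, expected_media(size))
    old = pattern(8192, seed=1)           # large write
    new = overlay(old, 100, b"\x00\x01")  # 2-byte overwrite
    print(digest(old) != digest(new))
```

A test would write pattern(size, seed), restart the server or node, read the data back, and compare digests. For the re-written data cases, the fetched buffer would be compared against overlay applied in the same order as the overwrites.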