Verify data access after engine restart w/ WAL replay + w/ check pointing (unsynchronized WAL & VOS)
DAOS-13009: Implement verify data access after engine restart w/ WAL replay + w/ check pointing test case Resolved
Start 2 DAOS servers with 1 engine per server
Create a single pool and container
Run ior w/ DFS to populate the container with data
After ior has completed, shut down every engine cleanly (dmg system stop)
Restart each engine (dmg system start)
Verify the previously written data matches with an ior read
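The stop/start portion of the steps above can be sketched as a small helper. This is only an illustration, not the implemented test: the function name restart_engines and the injectable runner are hypothetical, the ior write and read steps are left out, and the trailing dmg system query is added here only as a convenient way to confirm the engines have rejoined.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the exit code (hypothetical type).
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def restart_engines(run: Runner = _run) -> List[List[str]]:
    """Stop all engines, start them again, then query the system state.

    Raises RuntimeError on the first command that returns non-zero.
    Returns the list of commands that were executed.
    """
    steps = [
        ["dmg", "system", "stop"],
        ["dmg", "system", "start"],
        ["dmg", "system", "query"],
    ]
    executed = []
    for cmd in steps:
        executed.append(cmd)
        if run(cmd) != 0:
            raise RuntimeError("command failed: " + " ".join(cmd))
    return executed
```

Passing a fake runner makes it possible to dry-run the sequence on a machine without a DAOS installation.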
Verify POSIX data access after engine restart
(check modification timestamp?)
DAOS-13010: Implement verify POSIX data access after engine restart test case Resolved
Start 2 DAOS servers with 1 engine per server
Create a single pool and a POSIX container
Start dfuse
Write data to the dfuse mount point and then read it back
After the read has completed, unmount dfuse
Shut down every engine cleanly (dmg system stop)
Restart each engine (dmg system start)
Remount dfuse
Verify the previously written data exists
Verify more data can be written
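One way to check that the previously written data exists after the remount is to record a checksum at write time and compare it afterwards. The sketch below assumes nothing about dfuse itself; it works on any directory path, and the helper names write_pattern and file_digest are hypothetical.

```python
import hashlib
import os


def write_pattern(path: str, size: int, seed: bytes = b"daos") -> str:
    """Write `size` bytes of a repeating pattern to `path`.

    Returns the SHA-256 hex digest of the data that was written.
    """
    data = (seed * (size // len(seed) + 1))[:size]
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())
    return hashlib.sha256(data).hexdigest()


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

In a test, write_pattern would be called on a file under the dfuse mount point before the unmount, and file_digest on the same file after the remount; the two digests should be equal. A second write_pattern call on a new file covers the "more data can be written" step.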
Verify device roles in dmg storage query output
https://daosio.atlassian.net/browse/DAOS-13011
Start 1 DAOS server with 1 engine per server
Get a list of device information (dmg storage query list-devices)
Verify each device’s role entry matches the expected value based upon the server storage configuration
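The role comparison could look roughly like the following. The input shape is an assumption: each device is represented as a dict with hypothetical "uuid" and "roles" keys, which would first have to be extracted from the real dmg output. The role names used in the usage note (wal, meta, data) are illustrative values for an MD on SSD configuration.

```python
from typing import Iterable, List, Mapping, Set


def role_mismatches(devices: Iterable[Mapping],
                    expected: Mapping[str, Set[str]]) -> List[str]:
    """Compare reported device roles against the expected roles.

    devices:  dicts with hypothetical keys "uuid" and "roles" (list of str).
    expected: maps device UUID to the set of roles from the server config.
    Returns a list of human-readable error strings (empty if all match).
    """
    errors = []
    seen = set()
    for dev in devices:
        uuid = dev["uuid"]
        seen.add(uuid)
        actual = set(dev["roles"])
        want = set(expected.get(uuid, ()))
        if actual != want:
            errors.append(
                f"{uuid}: expected {sorted(want)}, found {sorted(actual)}")
    # Devices named in the configuration but absent from the listing.
    for uuid in sorted(set(expected) - seen):
        errors.append(f"{uuid}: missing from device list")
    return errors
```

For example, role_mismatches([{"uuid": "a", "roles": ["wal", "meta"]}], {"a": {"meta", "wal"}}) returns an empty list, since role order is ignored.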
Verify data access after engine restart w/o WAL replay + w/ check pointing
(synchronized WAL & VOS)
https://daosio.atlassian.net/browse/DAOS-13012
Start 2 DAOS servers with 1 engine per server
Create a single pool and container
Run ior w/ DFS to populate the container with data
Confirm that all data has been checkpointed
After ior has completed, shut down every engine cleanly (dmg system stop)
Restart each engine (dmg system start)
Verify the previously written data matches with an ior read
DAOS-13016: Add new mechanism to verify data has been checkpointed Resolved
Verify data access after engine restart w/ WAL replay + w/o check pointing (unsynchronized WAL & VOS)
https://daosio.atlassian.net/browse/DAOS-13013
Start 2 DAOS servers with 1 engine per server
Create a single pool and container
Disable checkpointing
Run ior w/ DFS to populate the container with small amount of data
After ior has completed, shut down every engine cleanly (dmg system stop)
Restart each engine (dmg system start)
Verify the previously written data matches with an ior read
https://daosio.atlassian.net/browse/DAOS-13017
Verify snapshots after engine restart
https://daosio.atlassian.net/browse/DAOS-13014
Start 2 DAOS servers with 1 engine per server
Create a single pool and container in the pool
Run ior w/ DFS to populate the container with persistent data, then create a snapshot (daos container create-snap). Perform this sequence three times.
Verify all three snapshots exist (daos container list-snaps)
Remove the second snapshot (daos container destroy-snap)
Verify that two snapshots exist (daos container list-snaps)
Shut down every engine cleanly (dmg system stop --force)
Restart each engine (dmg system start)
Verify all engines have joined (dmg system query)
Verify that two snapshots exist (daos container list-snaps)
Remove the two snapshots (daos container destroy-snap)
Verify that no snapshots exist (daos container list-snaps)
Verify pool & container attributes after engine restart
DAOS-13015: Implement verify pool & container attributes after engine restart test case Resolved
Start 3 DAOS servers with 1 engine on each server
Create multiple pools and containers
List the current pool and container attributes
Modify at least one different attribute on each pool and container
Shut down every engine cleanly (dmg system stop)
Restart each engine (dmg system start)
Verify each modified pool and container attribute is still set
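The final verification step amounts to comparing the attributes set before the restart with those read back afterwards. A minimal sketch, assuming the attributes have already been collected into plain dicts (the function name changed_attributes is hypothetical):

```python
from typing import Dict, Mapping, Optional, Tuple


def changed_attributes(
        expected: Mapping[str, str],
        actual: Mapping[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return attributes whose value after restart differs from what was set.

    Each entry maps the attribute name to (expected value, actual value);
    the actual value is None when the attribute is no longer present.
    Attributes that exist only in `actual` are ignored.
    """
    return {
        name: (value, actual.get(name))
        for name, value in expected.items()
        if actual.get(name) != value
    }
```

An empty result for every pool and container indicates that each modified attribute survived the restart.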
Verify the specific metrics to track activity of md_on_ssd.
DAOS-11626: Add tests for new md_on_ssd metrics Resolved
test_wal_commit_metrics
Start 2 DAOS servers with 1 engine on each server
Verify the engine_dmabuff_wal_* metrics are 0
Create a pool
Verify the engine_dmabuff_wal_sz metric is greater than 0
Verify the engine_dmabuff_wal_waiters metrics are 0
test_wal_reply_metrics
Start 2 DAOS servers with 1 engine on each server
Verify the engine_pool_vos_rehydration_replay_* metrics are 0
Create a pool
Verify the engine_pool_vos_rehydration_replay_count metric is 1
Verify the engine_pool_vos_rehydration_replay_entries metric is > 0
Verify the engine_pool_vos_rehydration_replay_size metric is > 0
Verify the engine_pool_vos_rehydration_replay_time metric is within 10,000 - 50,000
Verify the engine_pool_vos_rehydration_replay_transactions metric is > 0
test_wal_checkpoint_metrics
Start 2 DAOS servers with 1 engine on each server
Verify the engine_pool_checkpoint_* metrics are 0
Create a pool w/ check pointing disabled (pool1)
Verify the engine_pool_checkpoint_* metrics are 0 for pool1
Create a pool w/ check pointing enabled (pool2)
Verify the engine_pool_checkpoint_* metrics are 0 for pool1
Verify the engine_pool_checkpoint_dirty_chunks metrics are within 0-300 for pool2
Verify the engine_pool_checkpoint_dirty_pages metrics are within 0-3 for pool2
Verify the engine_pool_checkpoint_duration metrics are within 0-300 for pool2
Verify the engine_pool_checkpoint_iovs_copied metrics are > 0 for pool2
Verify the engine_pool_checkpoint_wal_purged metrics are >= 0 for pool2
Create a container for pool2
Use ior to write data to pool2
Wait twice the check point frequency to allow check pointing to complete
Verify the engine_pool_checkpoint_* metrics are 0 for pool1
Verify the engine_pool_checkpoint_dirty_chunks metrics are within 0-300 for pool2
Verify the engine_pool_checkpoint_dirty_pages metrics are within 0-3 for pool2
Verify the engine_pool_checkpoint_duration metrics are within 0-300 for pool2
Verify the engine_pool_checkpoint_iovs_copied metrics are > 0 for pool2
Verify the engine_pool_checkpoint_wal_purged metrics are > 0 for pool2
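The range checks in these metric tests can be expressed as a table of expected bounds per metric. The sketch below is an assumption about how such a check might be written, not the implemented test; it treats the metrics as integer counters, so "> 0" is encoded as a lower bound of 1, and it takes already-collected values as a dict rather than querying telemetry.

```python
from typing import List, Mapping, Optional, Tuple

# (low, high) inclusive bounds; None means unbounded on that side.
Range = Tuple[Optional[int], Optional[int]]

# Expected pool2 values after the ior write, taken from the steps above.
POOL2_AFTER_IOR = {
    "engine_pool_checkpoint_dirty_chunks": (0, 300),
    "engine_pool_checkpoint_dirty_pages": (0, 3),
    "engine_pool_checkpoint_duration": (0, 300),
    "engine_pool_checkpoint_iovs_copied": (1, None),
    "engine_pool_checkpoint_wal_purged": (1, None),
}


def out_of_range(values: Mapping[str, int],
                 expected: Mapping[str, Range]) -> List[str]:
    """Return one error string per metric that is missing or out of range."""
    errors = []
    for name, (low, high) in expected.items():
        if name not in values:
            errors.append(f"{name}: not reported")
            continue
        value = values[name]
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            errors.append(f"{name}: {value} not in [{low}, {high}]")
    return errors
```

For pool1, where every engine_pool_checkpoint_* metric is expected to stay at 0, the same helper can be used with bounds of (0, 0) for each metric.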
Add running a subset of pr tests with MD on SSD in master PRs
DAOS-13530: Add running a subset of pr tests with MD on SSD in master PRs Resolved
Update the existing nvme/fault.py test to expect a stopped server when setting a device fault that has "has_sys_xs" set to true.
Additional testing handled by https://daosio.atlassian.net/wiki/spaces/DC/pages/11161927681