...
This is currently a work in progress. See https://github.com/daos-stack/daos_scaled_testing/blob/master/database/README.md for general usage.
Running the Validation Suite
TODO
Description of test cases
...
Validation Suite Workflow
These instructions cover running the Frontera performance suite for validation of Test Builds, Release Candidates, and performance-sensitive features.
Info |
---|
Some details are my personal organizational style and can be tweaked as desired - Dalton Bohning |
Clone Test Scripts
Using a fresh clone of the test scripts keeps the configuration isolated from other ongoing work. For example, clone into a directory named after the ticket number.
Code Block |
---|
cd $WORK/TESTS
git clone git@github.com:daos-stack/daos_scaled_testing.git daos-xxxx
cd daos-xxxx/frontera |
Build DAOS
Using a separate build directory makes it easy to reference the build later - perhaps several versions later.
Code Block |
---|
vim run_build.sh
# BUILD_DIR="${WORK}/BUILDS/v2.2.0-rc1"
# DAOS_BRANCH="release/2.2"
# DAOS_COMMIT="v2.2.0-rc1" |
Running on a compute node is quicker, but not required.
Code Block |
---|
idev
# Wait for session
./run_build.sh
# Get a cup of coffee |
Common Reasons for Build Failures
TODO
Run Sanity Tests
Before executing hundreds of test cases, make sure the test scripts and DAOS are behaving as expected. For example, sometimes a simple daos or dmg interface change can cause every test to fail.
Code Block |
---|
vim run_testlist.py
# env['JOBNAME'] = "daos-xxxx-sanity"
# env['DAOS_DIR'] = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR'] = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_sanity"))
./run_testlist.py -r tests/sanity/ |
After the tests complete, sanity check the results. All tests should contain Pass.
Code Block |
---|
./get_results.py --tests all "${WORK}/RESULTS/v2.2.0-rc1_sanity"
cat "${WORK}/RESULTS/v2.2.0-rc1_sanity/"*.csv |
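For a large results directory, scanning the CSV files by eye is error-prone. The following is a minimal sketch of an automated check, not part of the test scripts. The function name rows_without_pass is hypothetical, and the sketch assumes that each CSV starts with a single header row and that a passing row contains the literal string Pass.

```shell
# Hypothetical helper: print every data row that does not contain "Pass".
# Assumes the first line of each CSV is a header and is therefore skipped.
rows_without_pass() {
    local res_dir="$1"
    local csv
    for csv in "${res_dir}"/*.csv; do
        # Skip the unexpanded pattern when the directory has no CSV files.
        [ -e "${csv}" ] || continue
        # FNR > 1 skips the header; !/Pass/ selects rows lacking "Pass".
        awk 'FNR > 1 && !/Pass/ { print FILENAME ": " $0 }' "${csv}"
    done
}
```

Calling rows_without_pass "${WORK}/RESULTS/v2.2.0-rc1_sanity" should print nothing when every row passed, under the assumptions above.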
Optionally, you could look through one or more logs to check for unexpected warnings or errors.
Run Validation Suite
Info |
---|
Frontera only allows queuing 100 jobs per user, so tests will need to be executed in batches. |
Info |
---|
It’s not strictly required to run the tests serially, but this seems to reduce the variance and interference between jobs. Because of how the priority queue works, running serially doesn’t necessarily take longer than running in parallel, especially since fewer jobs need to be rerun because of variance, noise, or interference. |
Info |
---|
The --dryrun option to run_testlist.py helps make sure the expected variants will be run. |
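Because of the 100-job limit, it can help to know how many more jobs fit in the queue before submitting the next batch. The sketch below is illustrative only. The function name remaining_queue_slots is hypothetical, and the commented squeue call assumes the standard Slurm client is available on the login node.

```shell
# Hypothetical helper: how many more jobs fit under the per-user queue limit.
# $1 = number of jobs currently queued, $2 = limit (defaults to 100).
remaining_queue_slots() {
    local queued="$1"
    local limit="${2:-100}"
    local free=$(( limit - queued ))
    # Never report a negative number of free slots.
    if [ "${free}" -lt 0 ]; then
        free=0
    fi
    echo "${free}"
}

# Example on a Slurm system (not run here): count this user's queued jobs.
# remaining_queue_slots "$(squeue -u "${USER}" -h | wc -l)"
```

Comparing the result against the variant counts listed for each group gives a rough idea of whether a whole group can be queued at once.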
Basic Tests
There are currently 48 variants in this group.
Code Block |
---|
vim run_testlist.py
# env['JOBNAME'] = "daos-xxxx"
# env['DAOS_DIR'] = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR'] = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1"))
./run_testlist.py -r --serial tests/basic |
EC Tests
Use showq -u to get the SLURM_JOB_ID of the last job in the queue, if any.
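Picking the last job ID can also be scripted. The following is a sketch under stated assumptions: the function name last_job_id is hypothetical, job IDs are assumed to be plain integers that increase with submission order, and the commented example uses the standard Slurm squeue command instead of showq. Job-array IDs such as 123_4 are not handled.

```shell
# Hypothetical helper: print the numerically largest job ID read from stdin,
# or nothing when the input is empty (no jobs queued).
last_job_id() {
    sort -n | tail -n 1
}

# Example on a Slurm system (not run here):
# LAST_SLURM_JOB_ID="$(squeue -u "${USER}" -h -o '%i' | last_job_id)"
```

An empty result means there is nothing to depend on, so the --slurm_dep_afterany option can presumably be left out for the first batch.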
There are currently 42 EC IOR variants.
Info |
---|
There are also rf0 variants, but those are not usually run. The --filter option runs only the EC variants. |
Code Block |
---|
./run_testlist.py --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> --filter "oclass=EC_16P2G1 oclass=EC_16P2GX oclass=EC_8P2G1 oclass=EC_8P2GX oclass=EC_4P2G1 oclass=EC_4P2GX oclass=EC_2P1G1 oclass=EC_2P1GX" tests/ec_vs_rf0_complex/ior* |
There are currently 42 EC MDTest variants.
Code Block |
---|
./run_testlist.py --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> --filter "oclass=EC_16P2G1 oclass=EC_16P2GX oclass=EC_8P2G1 oclass=EC_8P2GX oclass=EC_4P2G1 oclass=EC_4P2GX oclass=EC_2P1G1 oclass=EC_2P1GX" tests/ec_vs_rf0_complex/mdtest* |
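The same long filter string appears in both EC commands, so a typo in one copy is easy to miss. One way to avoid retyping it is to build the string once from a list of object classes. This is a minimal sketch: build_oclass_filter and EC_FILTER are hypothetical names, and the space-separated oclass=<name> format is copied from the commands above.

```shell
# Hypothetical helper: turn a list of object classes into a --filter string
# made of space-separated oclass=<name> terms.
build_oclass_filter() {
    local filter=""
    local oclass
    for oclass in "$@"; do
        # Add a separating space only when the string is already non-empty.
        filter="${filter:+${filter} }oclass=${oclass}"
    done
    echo "${filter}"
}

EC_FILTER="$(build_oclass_filter EC_16P2G1 EC_16P2GX EC_8P2G1 EC_8P2GX EC_4P2G1 EC_4P2GX EC_2P1G1 EC_2P1GX)"
```

The variable can then be passed as --filter "${EC_FILTER}" to both the ior* and mdtest* invocations.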
Rebuild Tests
There is currently 1 variant in this group.
It’s helpful, but not necessary, to use a different results directory.
Code Block |
---|
vim run_testlist.py
# env['JOBNAME'] = "daos-xxxx-rebuild"
# env['DAOS_DIR'] = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR'] = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_rebuild"))
./run_testlist.py -r --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> tests/rebuild/load_ec.py |
Max Scale Tests
There is currently 1 variant in this group.
It’s helpful, but not necessary, to use a different results directory.
Code Block |
---|
vim run_testlist.py
# env['JOBNAME'] = "daos-xxxx-max"
# env['DAOS_DIR'] = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR'] = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_max"))
./run_testlist.py -r --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> tests/max/max.py |
TCP Tests
There is currently 1 variant in this group.
It’s helpful, but not necessary, to use a different results directory.
Note |
---|
Don’t forget to change daos_server.yml and env_daos back to Verbs after running with TCP! |
Code Block |
---|
vim daos_server.yml
# provider: ofi+verbs;ofi_rxm # <-- Comment out/remove this
# provider: ofi+tcp;ofi_rxm # <-- Uncomment/add this
# - FI_OFI_RXM_DEF_TCP_WAIT_OBJ=pollfd # <-- Uncomment/add this
vim env_daos
# PROVIDER="${2:-tcp}" |
Code Block |
---|
vim run_testlist.py
# env['JOBNAME'] = "daos-xxxx-tcp"
# env['DAOS_DIR'] = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR'] = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_tcp"))
./run_testlist.py -r --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> --filter "daos_servers=2,daos_clients=8" tests/basic/mdtest_easy.py |
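Since the TCP run edits daos_server.yml and env_daos in place, one way to make the switch back to Verbs harder to forget is to snapshot both files before editing and restore them once the TCP job has run. The sketch below is only an illustration. The function names and the .verbs.bak suffix are hypothetical, and both files are assumed to be in the current directory.

```shell
# Hypothetical helpers: snapshot the Verbs configuration before the TCP edits
# and restore it afterwards. The ".verbs.bak" suffix is an arbitrary choice.
backup_verbs_config() {
    local f
    for f in daos_server.yml env_daos; do
        # -p preserves the mode and timestamps of the original file.
        cp -p "${f}" "${f}.verbs.bak"
    done
}

restore_verbs_config() {
    local f
    for f in daos_server.yml env_daos; do
        # mv replaces the TCP-edited file with the saved Verbs copy.
        mv "${f}.verbs.bak" "${f}"
    done
}
```

Run backup_verbs_config before the vim edits and restore_verbs_config after the TCP job has finished. If the files have no other local changes, git checkout -- daos_server.yml env_daos in the clone would likely achieve the same result.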
Gather Results
TODO
Insert to Database
TODO
Generate Reports
TODO
Analyze
TODO