This is currently a work in progress. See https://github.com/daos-stack/daos_scaled_testing/blob/master/database/README.md for general usage.

Running the Validation Suite

TODO

Description of test cases

Frontera Test Plan

Validation Suite Workflow

These instructions cover running the Frontera performance suite for validation of Test Builds, Release Candidates, and performance-sensitive features.

Info

Some details are my personal organizational style and can be tweaked as desired - Dalton Bohning

Clone Test Scripts

Using a fresh clone of the test scripts keeps the configuration isolated from other ongoing work. For example, clone into a directory named after the ticket number.

Code Block
cd $WORK/TESTS
git clone git@github.com:daos-stack/daos_scaled_testing.git daos-xxxx
cd daos-xxxx/frontera

Build DAOS

Using a separate build directory makes it easy to reference the build later - perhaps several versions later.

Code Block
vim run_build.sh
# BUILD_DIR="${WORK}/BUILDS/v2.2.0-rc1"
# DAOS_BRANCH="release/2.2"
# DAOS_COMMIT="v2.2.0-rc1"

Running the build on a compute node is quicker, but not required.

Code Block
idev
# Wait for session
./run_build.sh
# Get a cup of coffee

Common Reasons for Build Failures

TODO

Run Sanity Tests

Before executing hundreds of test cases, make sure the test scripts and DAOS are behaving as expected. For example, sometimes a simple daos or dmg interface change can cause every test to fail.
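
As a quicker first check, you can verify that the daos and dmg command-line tools from the build respond at all. This is only a sketch - the install/bin layout under the build directory is an assumption and may differ for your build.

Code Block
# Paths assume a standard install/bin layout under the build directory - adjust as needed
"${WORK}/BUILDS/v2.2.0-rc1/latest/daos/install/bin/daos" version
"${WORK}/BUILDS/v2.2.0-rc1/latest/daos/install/bin/dmg" version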

Code Block
vim run_testlist.py
# env['JOBNAME']     = "daos-xxxx-sanity"
# env['DAOS_DIR']    = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR']     = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_sanity"))
./run_testlist.py -r tests/sanity/

After the tests complete, sanity-check the results. Every test should report Pass.

Code Block
./get_results.py --tests all "${WORK}/RESULTS/v2.2.0-rc1_sanity"
cat "${WORK}/RESULTS/v2.2.0-rc1_sanity/*.csv"

Optionally, you could look through one or more logs to check for unexpected warnings or errors.
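
For instance, a quick recursive grep over the results directory can surface obvious problems. This is only a sketch - the exact log layout depends on the test scripts.

Code Block
# List files containing "error" or "fatal" (case-insensitive); adjust patterns as needed
grep -rilE "error|fatal" "${WORK}/RESULTS/v2.2.0-rc1_sanity" | head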

Run Validation Suite

Info

Frontera only allows queuing 100 jobs per user, so tests will need to be executed in batches.
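
Before submitting another batch, check how many jobs you already have queued or running:

Code Block
showq -u   # lists your jobs; keep the total under the 100-job limit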

Info

It’s not strictly required to run the tests serially, but this seems to reduce the variance and interference between jobs. Because of how the priority queue works, running serially doesn’t necessarily take longer than running in parallel - especially since fewer jobs need to be re-run in case of variance/noise/interference.

Info

The --dryrun option to run_testlist.py helps make sure the expected variants will be run.
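
For example, to preview the variants for a group without submitting any jobs (assuming --dryrun can be combined with the usual arguments):

Code Block
./run_testlist.py --dryrun tests/basic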

Basic Tests

There are currently 48 variants in this group.

Code Block
vim run_testlist.py
# env['JOBNAME']     = "daos-xxxx"
# env['DAOS_DIR']    = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR']     = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1"))
./run_testlist.py -r --serial tests/basic

EC Tests

Use showq -u to get the SLURM_JOB_ID of the last job in the queue, if any.
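
For example, list your queued jobs and note the ID of the last one:

Code Block
showq -u
# Note the SLURM job ID of the last queued job and pass it to --slurm_dep_afterany below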

There are currently 42 EC IOR variants.

Info

There are also rf0 variants, but those are not usually run. The --filter option runs only the EC variants.

Code Block
./run_testlist.py --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> --filter "oclass=EC_16P2G1 oclass=EC_16P2GX oclass=EC_8P2G1 oclass=EC_8P2GX oclass=EC_4P2G1 oclass=EC_4P2GX oclass=EC_2P1G1 oclass=EC_2P1GX" tests/ec_vs_rf0_complex/ior*

There are currently 42 EC MDTest variants.

Code Block
./run_testlist.py --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> --filter "oclass=EC_16P2G1 oclass=EC_16P2GX oclass=EC_8P2G1 oclass=EC_8P2GX oclass=EC_4P2G1 oclass=EC_4P2GX oclass=EC_2P1G1 oclass=EC_2P1GX" tests/ec_vs_rf0_complex/mdtest*

Rebuild Tests

There is currently 1 variant in this group.

It’s helpful, but not necessary, to use a different results directory.

Code Block
vim run_testlist.py
# env['JOBNAME']     = "daos-xxxx-rebuild"
# env['DAOS_DIR']    = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR']     = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_rebuild"))
./run_testlist.py -r --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> tests/rebuild/load_ec.py

Max Scale Tests

There is currently 1 variant in this group.

It’s helpful, but not necessary, to use a different results directory.

Code Block
vim run_testlist.py
# env['JOBNAME']     = "daos-xxxx-max"
# env['DAOS_DIR']    = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR']     = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_max"))
./run_testlist.py -r --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> tests/max/max.py

TCP Tests

There is currently 1 variant in this group.

It’s helpful, but not necessary, to use a different results directory.

Note

Don’t forget to change daos_server.yml and env_daos back to Verbs after running with TCP!

Code Block
vim daos_server.yml
# provider: ofi+verbs;ofi_rxm           # <-- Comment out or remove this
# provider: ofi+tcp;ofi_rxm             # <-- Uncomment/add this
#  - FI_OFI_RXM_DEF_TCP_WAIT_OBJ=pollfd # <-- Uncomment/add this

vim env_daos
# PROVIDER="${2:-tcp}"

Code Block
vim run_testlist.py
# env['JOBNAME']     = "daos-xxxx-tcp"
# env['DAOS_DIR']    = abspath(expandvars("${WORK}/BUILDS/v2.2.0-rc1/latest/daos"))
# env['RES_DIR']     = abspath(expandvars("${WORK}/RESULTS/v2.2.0-rc1_tcp"))
./run_testlist.py -r --serial --slurm_dep_afterany <LAST_SLURM_JOB_ID> --filter "daos_servers=2,daos_clients=8" tests/basic/mdtest_easy.py
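
Once the TCP runs are done, revert the provider settings per the note above. This is only a sketch - restore whatever the original values were in your checkout (verbs is assumed to be the original default in env_daos).

Code Block
vim daos_server.yml
# provider: ofi+verbs;ofi_rxm           # <-- Uncomment/restore this
# provider: ofi+tcp;ofi_rxm             # <-- Comment out or remove this
#  - FI_OFI_RXM_DEF_TCP_WAIT_OBJ=pollfd # <-- Comment out or remove this

vim env_daos
# PROVIDER="${2:-verbs}"                # <-- Assumed original value; restore yours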

Gather Results

TODO

Insert to Database

TODO

Generate Reports

TODO

Analyze

TODO