We have an automated test infrastructure for building and running DAOS tests on Frontera.
Please see below for instructions on building and running tests.
Citizenship
Before doing any work on Frontera, you should read and understand Citizenship on Frontera.
You should also be aware of the limited credits (SUs) for running jobs. After logging in, you can check your balance:
```shell
/usr/local/etc/taccinfo
--------------------- Project balances for user dbohninx ----------------------
| Name           Avail SUs     Expires    |
| STAR-Intel     #####         YYYY-MM-DD |
```
Initial Setup
All of these setup instructions should be run on a login node (e.g. login3.frontera).
Add your local binary directory to your PATH
Add the following line to ~/.bashrc. There should be an if block labeled "SECTION 2" where you should put it:

```shell
export PATH=$HOME/.local/bin:$PATH
```
Setup directories and clone the test scripts
```shell
mkdir -p ${WORK}/{BUILDS,TESTS,RESULTS,WEEKLY_RESULTS,TOOLS}
cd ${WORK}/TESTS
git clone https://github.com/daos-stack/daos_scaled_testing
```
Install python dependencies for post-job scripts
These packages are required by some of the .py scripts that post-process results.
```shell
cd ${WORK}/TESTS/daos_scaled_testing
python3 -m pip install --upgrade pip
python3 -m pip install --user -r python3_requirements.txt
```
Build MPI packages - Optional
By default, the system-installed MVAPICH2 is used and recommended. If you want to use MPICH or OpenMPI instead, they must be built from source.
Since we only build with a single core on login nodes (remember Citizenship on Frontera), this may take a while to complete.
```shell
cd ${WORK}/TESTS/daos_scaled_testing/frontera
./build_and_install_tools.sh
```
This script is not well maintained and may need adjustment.
Build DAOS
Edit run_build.sh:
```shell
vim run_build.sh
```
Configure these lines:
```shell
BUILD_DIR="${WORK}/BUILDS/"
DAOS_BRANCH="master"
```
Optionally, you can build a specific branch, commit, or cherry-pick.
When executed on a login node, run_build.sh will only use a single process, so it is recommended to build on a development node instead. The script builds DAOS in <BUILD_DIR>/<date>/daos and also builds the latest hpc/ior.
```shell
idev   # wait for the development session to start
./run_build.sh
```
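Since run_testlist.py later points DAOS_DIR at ${WORK}/BUILDS/latest/daos while builds land in dated directories, a "latest" symlink must track the newest build. A minimal sketch of such a helper is below; this is an illustration only, and the actual layout maintained by run_build.sh may differ:

```python
import os

def update_latest_symlink(build_dir):
    """Point <build_dir>/latest at the most recent dated build directory.

    Hypothetical helper: assumes subdirectories are named by ISO date
    (e.g. 2024-02-01), so lexicographic sort equals chronological sort.
    """
    dates = sorted(
        d for d in os.listdir(build_dir)
        if os.path.isdir(os.path.join(build_dir, d)) and d != 'latest'
    )
    if not dates:
        return None
    link = os.path.join(build_dir, 'latest')
    if os.path.islink(link):
        os.remove(link)
    os.symlink(dates[-1], link)  # relative symlink inside build_dir
    return dates[-1]
```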
Running Tests
The test script is executed on a login node; it uses Slurm to reserve nodes and run the jobs.
Configure run_testlist.py:
```shell
cd ${WORK}/TESTS/daos_scaled_testing/frontera
vim run_testlist.py
...
# Configure these lines
env['JOBNAME']  = "<sbatch_jobname>"
env['DAOS_DIR'] = abspath(expandvars("${WORK}/BUILDS/latest/daos"))  # Path to daos
env['RES_DIR']  = abspath(expandvars("${WORK}/RESULTS"))             # Path to test results
```
Configure DAOS Configs
The DAOS configs are daos_scaled_testing/frontera/daos_{server,agent,control}.yml.

The client / test runner environment is defined in daos_scaled_testing/frontera/env_daos.
Test Variants
Test configs are in the daos_scaled_testing/frontera/tests directory. Each config is a Python file that defines a list of test variants to run. For example, in tests/sanity/ior.py:
```python
'''
IOR sanity tests.
Defines a list 'tests' containing dictionary items of tests.
'''

# Default environment variables used by each test
env_vars = {
    'pool_size': '85G',
    'chunk_size': '1M',
    'segments': '1',
    'xfer_size': '1M',
    'block_size': '150G',
    'sw_time': '5',
    'iterations': '1',
    'ppc': 32
}

# List of tests
tests = [
    {
        'test_group': 'IOR',
        'test_name': 'ior_sanity',
        'oclass': 'SX',
        'scale': [
            # (num_servers, num_clients, timeout_minutes)
            (1, 1, 1),
        ],
        'env_vars': dict(env_vars),
        'enabled': True
    },
]
```
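Each 'scale' tuple yields one job at that server/client count. As a rough sketch of how a runner could consume such a config (this is illustrative only; the real run_testlist.py may expand tests differently):

```python
def expand_variants(tests):
    """Expand each enabled test into one variant per 'scale' tuple.

    Hypothetical expansion logic, based on the config structure shown above.
    """
    variants = []
    for test in tests:
        if not test.get('enabled'):
            continue  # only tests with 'enabled': True are run
        for num_servers, num_clients, timeout in test['scale']:
            variants.append({
                'test_name': test['test_name'],
                'oclass': test['oclass'],
                'num_servers': num_servers,
                'num_clients': num_clients,
                'timeout_minutes': timeout,
                'env_vars': dict(test['env_vars']),  # copy per variant
            })
    return variants
```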
These parameters can be configured as desired. Only tests with 'enabled': True will run. To execute some tests:
```shell
$ ./run_testlist.py tests/sanity/ior.py
Importing tests from tests/sanity/ior.py
001. Running ior_sanity SX, 1 servers, 1 clients, 1048576 ec_ell_size ... Submitted batch job 3728480
```
You can then monitor the status of your jobs in the queue:
```shell
showq -u
```
Get test results
When all the sbatch jobs complete, you should see the results at <RES_DIR>/<date>. You can extract the IOR and MdTest results into CSV format by running:
```shell
cd ${WORK}/TESTS/daos_scaled_testing/frontera

# Get all ior and mdtest results
./get_results.py <RES_DIR>

# Get all ior and mdtest results, and email the result CSVs
./get_results.py <RES_DIR> --email first.last@intel.com
```
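If you want to combine several extracted CSVs into one file for later analysis, a small helper like the one below could do it. The file-naming pattern is an assumption for illustration; check what get_results.py actually writes:

```python
import csv
import glob
import os

def merge_csvs(result_dir, pattern, out_path):
    """Concatenate CSV files matching pattern under result_dir into one CSV.

    Assumes all inputs share the same header row (written once).
    The pattern argument (e.g. 'ior_*.csv') is hypothetical.
    """
    paths = sorted(glob.glob(os.path.join(result_dir, pattern)))
    header = None
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline='') as f:
                rows = list(csv.reader(f))
            if not rows:
                continue
            if header is None:
                header = rows[0]
                writer.writerow(header)
            writer.writerows(rows[1:])  # skip each file's header
    return len(paths)
```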
Storing and retrieving tests in the database
This is currently a work in progress. See https://github.com/daos-stack/daos_scaled_testing/blob/master/database/README.md for general usage.
Running the Validation Suite
TODO