IO-500 SC21 (and ISC22)
Notes
These instructions describe a new way to run IO-500 with DAOS that does not require dfuse on all client nodes; they therefore differ slightly from previous versions.
More specifically, the changes are in the run section of the benchmark, and the following changes should be noted vs. previous versions:
The mfu branch should be updated (same branch name; just git pull, or do a fresh checkout).
The io500 app is updated to the sc21 tag.
Do NOT export the DAOS_FUSE environment variable.
Export a new environment variable in the client environment: MFU_POSIX_TS=1 (this can be done on the mpirun command line).
This is an intermediate step to keep supporting the older way of running and will not be required in future versions.
Support for pool and container labels has been added (UUIDs are no longer needed).
A new io-500 ini file supports this new mode without dfuse and also adds parameters for an extended io-500 run.
dfuse is still required only on the login node when running io-500 using the io500.sh script;
it is not required when running the binary directly.
The Run IO-500 section below has more details.
Pre-requisites
DAOS - See Hardware Requirements - DAOS v2.6 for installation and setup instructions
MPI - any version / implementation
cmake 3.1+
clush - See https://clustershell.readthedocs.io/en/latest/install.html for installation
Alternatives are possible, though examples are not provided in these instructions.
Build Paths
These instructions assume the following paths. For simplicity, you can set these variables to the actual locations where you have/want these installed.
After setting these variables, most of the scripts can be "copy-pasted".
MY_DAOS_INSTALL_PATH=${HOME}/install/daos
MY_MFU_INSTALL_PATH=${HOME}/install/mfu
MY_MFU_SOURCE_PATH=${HOME}/mpifileutils
MY_MFU_BUILD_PATH=${HOME}/mpifileutils/build
MY_IO500_PATH=${HOME}/io500
Build MFU Dependencies
Make sure the MPI you want to use is in your PATH and LD_LIBRARY_PATH (e.g., via module load).
libcircle, lwgrp, and dtcmp
Follow the instructions here to build libcircle, lwgrp, and dtcmp (but not mpifileutils itself):
https://mpifileutils.readthedocs.io/en/v0.10.1/build.html#build-everything-directly
You can use the same prefix for all three dependencies above (libcircle, lwgrp, dtcmp). Let’s assume we used: ${MY_MFU_INSTALL_PATH}
Before building libcircle, an optional optimization is to apply the change below to the source (the 512 can be tuned further depending on how many total MPI ranks you use to run the io-500):
# Navigate to libcircle source directory
cd libcircle-0.3.0
# Generate patch file
cat << 'EOF' > libcircle_opt.patch
--- a/libcircle/token.c
+++ b/libcircle/token.c
@@ -1307,6 +1307,12 @@
     LOG(CIRCLE_LOG_DBG, "Sending work request to %d...", source);
+    /* first always ask rank 0 for work */
+    int temp;
+    MPI_Comm_rank(comm, &temp);
+    if (st->local_work_requested < 10 && temp != 0 && temp < 512)
+        source = 0;
+
     /* increment number of work requests for profiling */
     st->local_work_requested++;
EOF
# Apply the patch
patch -p1 < libcircle_opt.patch
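The three dependency builds can be sketched as below, following the mpifileutils documentation linked above. The release versions and download URLs are illustrative; check the documentation for the current ones.

```shell
# Build libcircle, lwgrp, and dtcmp into one shared prefix.
mkdir -p ${MY_MFU_INSTALL_PATH}

wget https://github.com/hpc/libcircle/releases/download/v0.3/libcircle-0.3.0.tar.gz
tar xzf libcircle-0.3.0.tar.gz
cd libcircle-0.3.0
# apply libcircle_opt.patch from above here, if desired
./configure --prefix=${MY_MFU_INSTALL_PATH}
make install
cd ..

wget https://github.com/LLNL/lwgrp/releases/download/v1.0.3/lwgrp-1.0.3.tar.gz
tar xzf lwgrp-1.0.3.tar.gz
cd lwgrp-1.0.3
./configure --prefix=${MY_MFU_INSTALL_PATH}
make install
cd ..

wget https://github.com/LLNL/dtcmp/releases/download/v1.1.1/dtcmp-1.1.1.tar.gz
tar xzf dtcmp-1.1.1.tar.gz
cd dtcmp-1.1.1
# dtcmp needs to know where lwgrp was installed
./configure --prefix=${MY_MFU_INSTALL_PATH} --with-lwgrp=${MY_MFU_INSTALL_PATH}
make install
```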
libarchive-devel
You will need libarchive-devel. For example:
yum install libarchive-devel
Build MFU
Make sure the MPI loaded in your PATH is the same one used to build libcircle, lwgrp, and dtcmp.
Clone and build MFU
After setting MY_DAOS_INSTALL_PATH, MY_MFU_SOURCE_PATH, and MY_MFU_BUILD_PATH, you can run:
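A sketch of the clone and cmake build is below. The repository, branch name, and cmake option names are assumptions based on the mpifileutils build documentation and the DAOS IO-500 instructions; verify them against your environment.

```shell
# Clone the DAOS-enabled mfu branch (repo/branch are assumptions; see the
# DAOS docs for the current location).
git clone https://github.com/mchaarawi/mpifileutils -b pfind_integration ${MY_MFU_SOURCE_PATH}

mkdir -p ${MY_MFU_BUILD_PATH}
cd ${MY_MFU_BUILD_PATH}
cmake ${MY_MFU_SOURCE_PATH} \
  -DENABLE_DAOS=ON \
  -DWITH_DAOS_PREFIX=${MY_DAOS_INSTALL_PATH} \
  -DWITH_DTCMP_PREFIX=${MY_MFU_INSTALL_PATH} \
  -DWITH_LibCircle_PREFIX=${MY_MFU_INSTALL_PATH} \
  -DCMAKE_INSTALL_PREFIX=${MY_MFU_INSTALL_PATH}
make -j install
```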
Add MFU libraries and binaries to your path
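For example, assuming MY_MFU_INSTALL_PATH from the Build Paths section (lib vs. lib64 depends on your distribution, so both are added here):

```shell
# Fall back to the default location from the Build Paths section if unset.
export MY_MFU_INSTALL_PATH=${MY_MFU_INSTALL_PATH:-${HOME}/install/mfu}
# Put the mfu binaries and libraries in front of the existing paths.
export PATH=${MY_MFU_INSTALL_PATH}/bin:${PATH}
export LD_LIBRARY_PATH=${MY_MFU_INSTALL_PATH}/lib64:${MY_MFU_INSTALL_PATH}/lib:${LD_LIBRARY_PATH}
```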
Clone and Build IO-500
Clone the IO-500 repo
Edit prepare.sh to:
Point to the pfind that works with our mpifileutils
Build ior with DFS support
Assuming MY_DAOS_INSTALL_PATH is set, you can run:
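A hypothetical sketch of the prepare.sh edits is below. The pfind fork URL and the ior configure option shown are assumptions; inspect prepare.sh in your checkout for the actual lines before applying anything like this.

```shell
cd ${MY_IO500_PATH}
# Point pfind at a fork that integrates with mpifileutils
# (URL is an assumption for this sketch).
sed -i "s|https://github.com/VI4IO/pfind.git|https://github.com/mchaarawi/pfind|" prepare.sh
# Ensure ior is configured with the DFS (DAOS) backend
# (ior supports a --with-daos configure option).
sed -i "s|--with-daos=[^ ]*|--with-daos=${MY_DAOS_INSTALL_PATH}|" prepare.sh
```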
(Note: for the ISC22 version of IO500, line 9 of this patch needs to be changed to the new IOR version; as of 2022-04-20 it is IOR_HASH=d3574d536643475269d37211e283b49ebd6732d7.)
Update the Makefile with correct paths
The Makefile needs to be updated to use the actual install location of DAOS and MFU. If you set MY_DAOS_INSTALL_PATH and MY_MFU_INSTALL_PATH, you can run:
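A hedged sketch of the Makefile edits: prepend the DAOS and MFU include and library paths to the existing flags. The exact CFLAGS/LDFLAGS contents and library names (-ldaos, -ldfs, -lmfu) depend on your io500 checkout; inspect the Makefile before applying.

```shell
cd ${MY_IO500_PATH}
# Prepend include paths for DAOS and MFU headers (sketch).
sed -i "s|^CFLAGS = |CFLAGS = -I${MY_DAOS_INSTALL_PATH}/include -I${MY_MFU_INSTALL_PATH}/include |" Makefile
# Prepend library paths and the DAOS/MFU libraries (sketch).
sed -i "s|^LDFLAGS = |LDFLAGS = -L${MY_DAOS_INSTALL_PATH}/lib64 -ldaos -ldfs -L${MY_MFU_INSTALL_PATH}/lib64 -lmfu |" Makefile
```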
Run the prepare.sh script
Run IO-500
Setup the config file
A sample config-full.ini file for reference: https://github.com/mchaarawi/io500/blob/main/config-full-sc21.ini
If you want to download this:
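For example, the raw file can be fetched directly from GitHub:

```shell
# Download the sample ini file referenced above.
wget https://raw.githubusercontent.com/mchaarawi/io500/main/config-full-sc21.ini
```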
You need to change the result dir:
https://github.com/mchaarawi/io500/blob/main/config-full-sc21.ini#L4
to point to a directory where the results will be stored. This directory must be accessible from rank 0 of the io-500 application, so it can be either:
A shared filesystem (example: an NFS, dfuse, lustre fs) accessible from the first node in the hostfile where rank 0 is running.
A local file system (/tmp/results) on the first node in the hostfile where rank 0 is running.
After the run is complete, the result files are all stored under this directory.
For a first run, set a short stonewall time (e.g. 5 seconds) just to verify that everything runs fine.
Under [find], the nprocs setting should match the total number of processes you run the entire workflow with (in io500.sh).
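A hedged fragment, loosely following the layout of the sample config-full-sc21.ini linked above (section and key names may differ in your version of the file):

```ini
[global]
; directory where result files are written; must be reachable from rank 0
resultdir = /path/to/results

[debug]
; short stonewall for a first validation run; raise for real submissions
stonewall-time = 5

[find]
; match the total number of MPI processes used for the whole run
nproc = 64
```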
Create DAOS pool, container with type POSIX
For documentation on creating pools, see Pool Operations - DAOS v2.6.
For documentation on creating containers, see Container Management - DAOS v2.6.
For example:
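A sketch using pool and container labels (sizes and names are illustrative, and flag syntax varies across DAOS versions, so check the documents linked above):

```shell
# Create a pool using a human-readable label instead of a UUID.
dmg pool create --size=1TB io500_pool
# Create a POSIX-type container in that pool, also with a label.
daos container create io500_pool io500_cont --type POSIX
```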
Set the pool, cont, and mfu timestamp environment variables
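For example (the label values are illustrative; use the labels of the pool and container you created):

```shell
export DAOS_POOL=io500_pool
export DAOS_CONT=io500_cont
# Required for this dfuse-less run mode (see the Notes section).
export MFU_POSIX_TS=1
```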
Note that when using Intel MPI, some extra environment variables are required as detailed on:
https://docs.daos.io/v2.0/user/mpi-io/?h=intel+mpi#intel-mpi
Substitute variables in the config file
This will replace $DAOS_POOL, $DAOS_CONT with their actual values.
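One way to do the substitution is with sed. The stand-in file created below is only there to make the sketch self-contained; run the sed command against your actual ini file.

```shell
export DAOS_POOL=io500_pool DAOS_CONT=io500_cont   # example values
# Stand-in for the real config file (illustrative).
printf 'pool = $DAOS_POOL\ncont = $DAOS_CONT\n' > config-demo.ini
# Replace the $DAOS_POOL / $DAOS_CONT placeholders with the exported values.
sed -i "s/\$DAOS_POOL/${DAOS_POOL}/g; s/\$DAOS_CONT/${DAOS_CONT}/g" config-demo.ini
```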
Run the io500 in one of two ways:
Run the binary directly, with or without the extended mode:
The extended mode is not required for an official submission and will extend your runtime significantly. After the run completes, you will need to tar up the result directory for that run.
Note that some versions of OpenMPI require setting the environment variables on the mpirun command line. In that case, add the environment variables mentioned above (including the LD_LIBRARY_PATH updated to include the mfu library path) to the mpirun command line in the following format:
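With OpenMPI, the -x flag exports variables to the remote ranks. A sketch follows; the hostfile, rank count, and pool/container labels are illustrative:

```shell
mpirun -np 64 --hostfile ~/hostfile \
    -x LD_LIBRARY_PATH=${MY_MFU_INSTALL_PATH}/lib64:${LD_LIBRARY_PATH} \
    -x MFU_POSIX_TS=1 \
    -x DAOS_POOL=io500_pool \
    -x DAOS_CONT=io500_cont \
    ${MY_IO500_PATH}/io500 ${MY_IO500_PATH}/config-full-sc21.ini
```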
Run the io-500.sh script:
This requires mounting dfuse on the launch node only (not all the compute nodes):
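For example (mountpoint and labels are illustrative):

```shell
# Mount dfuse on the launch node only, pointing at the IO-500 pool/container.
mkdir -p /tmp/io500-dfuse
dfuse --mountpoint=/tmp/io500-dfuse --pool=io500_pool --container=io500_cont
```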
Then edit the io500.sh launch script: update the mpirun command, and change the local workdir to add the dfuse prefix.
Note that some versions of OpenMPI require setting the environment variables on the mpirun command line. In that case, add the environment variables mentioned above (including the LD_LIBRARY_PATH updated to include the mfu library path) to the mpirun command line in the script, in the following format:
Then run the io500.sh script which will tar the results for you at the end and place them in the result directory you specified in the ini file:
Lastly, unmount dfuse on the launch node:
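For example (using the illustrative mountpoint from the mount step):

```shell
# Unmount the dfuse mountpoint on the launch node.
fusermount3 -u /tmp/io500-dfuse
```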
Results
The tarball of results generated at the end (whether you ran the binary directly or used the script) can be submitted to the IO500 committee for consideration.