NOTE: These steps are not to be applied to 2.0 testing; use the quickstarts in the 2.0 online documentation.
Table of Contents |
---|
Introduction
...
All nodes have a base openSUSE or SLES 15.2 installation.
Install pdsh on the admin node
Code Block |
---|
sudo zypper install pdsh |
...
For example, if one wanted to use node-1 as the admin node, node-2 and node-3 as the client nodes, and node-4 and node-5 as the server nodes, then these variables would be defined as:
Code Block |
---|
ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$ADMIN_NODE,$CLIENT_NODES,$SERVER_NODES |
...
Note |
---|
If a client node is also serving as the admin node, exclude $ADMIN_NODE from the ALL_NODES assignment to prevent duplication, e.g. ALL_NODES=$CLIENT_NODES,$SERVER_NODES |
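With these variables defined, pdsh can fan a command out from the admin node to every node at once. As a quick sanity check of the node list (a minimal sketch, assuming passwordless ssh between the nodes):
Code Block |
---|
pdsh -w $ALL_NODES 'uname -r' |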
Set-Up
Please refer here for the initial setup, which consists of installing the RPMs, generating and setting up certificates, setting up the config files, and starting the servers and agents.
Note |
---|
For this quick start, the daos-tests package will need to be installed on the client nodes |
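For example, the test package can be pushed to all client nodes from the admin node with pdsh (a sketch, assuming the DAOS zypper repository is already configured on those nodes):
Code Block |
---|
pdsh -w $CLIENT_NODES 'sudo zypper install -y daos-tests' |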
The following applications will be run from a client node:
...
Code Block |
---|
SHARED_DIR=<directory shared by all nodes>
export FI_UNIVERSE_SIZE=2048
export OFI_INTERFACE=eth0
export CRT_PHY_ADDR_STR="ofi+sockets"
# See self_test --help for more details on the parameters

# Generate the attach info file (SHARED_DIR must have permissions that allow root/sudo to write to it)
sudo daos_agent -o /etc/daos/daos_agent.yml -l $SHARED_DIR/daos_agent.log dump-attachinfo -o $SHARED_DIR/daos_server.attach_info_tmp
# Run:
self_test --path $SHARED_DIR --group-name daos_server --endpoint 0-1:0   # for 4 servers use --endpoint 0-3:0 (format: ranks:tags)
# Sample output:
Adding endpoints:
ranks: 0-1 (# ranks = 2)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 2
Message sizes: [(200000-BULK_GET 200000-BULK_PUT), (200000-BULK_GET 0-EMPTY), (0-EMPTY 200000-BULK_PUT), (200000-BULK_GET 1000-IOV), (1000-IOV 200000-BULK_PUT), (1000-IOV 1000-IOV), (1000-IOV 0-EMPTY), (0-EMPTY 1000-IOV), (0-EMPTY 0-EMPTY)]
Buffer addresses end with: <Default>
Repetitions per size: 20000
Max inflight RPCs: 1000
CLI [rank=0 pid=3255] Attached daos_server
##################################################
Results for message size (200000-BULK_GET 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 222.67
RPC Throughput (RPCs/sec): 584
RPC Latencies (us):
Min : 27191
25th %: 940293
Median : 1678137
75th %: 2416765
Max : 3148987
Average: 1671626
Std Dev: 821872.40
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2416764
1:0 - 969063
##################################################
Results for message size (200000-BULK_GET 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.08
RPC Throughput (RPCs/sec): 588
RPC Latencies (us):
Min : 2880
25th %: 1156162
Median : 1617356
75th %: 2185604
Max : 2730569
Average: 1659133
Std Dev: 605053.68
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2185589
1:0 - 1181363
##################################################
Results for message size (0-EMPTY 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.11
RPC Throughput (RPCs/sec): 588
RPC Latencies (us):
Min : 4956
25th %: 747786
Median : 1558111
75th %: 2583834
Max : 3437395
Average: 1659959
Std Dev: 1078975.59
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2583826
1:0 - 776862
##################################################
Results for message size (200000-BULK_GET 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.57
RPC Throughput (RPCs/sec): 587
RPC Latencies (us):
Min : 2755
25th %: 12341
Median : 1385716
75th %: 3393178
Max : 3399349
Average: 1660125
Std Dev: 1446054.82
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 12343
1:0 - 3393174
##################################################
Results for message size (1000-IOV 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.68
RPC Throughput (RPCs/sec): 588
RPC Latencies (us):
Min : 4557
25th %: 522380
Median : 1640322
75th %: 2725419
Max : 3441963
Average: 1661254
Std Dev: 1147206.09
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 600190
1:0 - 2725402
##################################################
Results for message size (1000-IOV 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 88.87
RPC Throughput (RPCs/sec): 46595
RPC Latencies (us):
Min : 1165
25th %: 21374
Median : 21473
75th %: 21572
Max : 21961
Average: 20923
Std Dev: 2786.99
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 21430
1:0 - 21516
##################################################
Results for message size (1000-IOV 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 59.03
RPC Throughput (RPCs/sec): 61902
RPC Latencies (us):
Min : 1164
25th %: 15544
Median : 16104
75th %: 16575
Max : 17237
Average: 15696
Std Dev: 2126.37
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 15579
1:0 - 16571
##################################################
Results for message size (0-EMPTY 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 46.93
RPC Throughput (RPCs/sec): 49209
RPC Latencies (us):
Min : 945
25th %: 20327
Median : 20393
75th %: 20434
Max : 20576
Average: 19821
Std Dev: 2699.27
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 20393
1:0 - 20393
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 0.00
RPC Throughput (RPCs/sec): 65839
RPC Latencies (us):
Min : 879
25th %: 14529
Median : 15108
75th %: 15650
Max : 16528
Average: 14765
Std Dev: 2087.87
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 14569
1:0 - 15649
|
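The warning in the sample output notes that no --master-endpoint was specified, so the self_test command line itself acted as the master. To drive the test from an explicit master instead, pass --master-endpoint using the same rank:tag notation (a sketch; rank 0, tag 0 is used here purely as an example):
Code Block |
---|
self_test --path $SHARED_DIR --group-name daos_server --endpoint 0-1:0 --master-endpoint 0:0 |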
...
Code Block |
---|
module load gnu-openmpi/3.1.6
# or
export LD_LIBRARY_PATH=<openmpi lib path>:$LD_LIBRARY_PATH
export PATH=<openmpi bin path>:$PATH

export D_LOG_FILE=/tmp/daos_perf.log

# Single process
daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml

# MPI
orterun --enable-recovery -x D_LOG_FILE=/tmp/daos_perf_daos.log --host <host name>:4 --map-by node \
  --mca btl_openib_warn_default_gid_prefix "0" --mca btl "tcp,self" --mca oob "tcp" --mca pml "ob1" \
  --mca btl_tcp_if_include "eth0" --np 4 --tag-output \
  /usr/bin/daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml

# Sample Output:
Test        : DAOS R2S (full stack, 2 replica)
Pool        : 9c88849b-b0d6-4444-bb39-42769a7a1ef5
Parameters  :
    pool size     : SCM: 20480 MB, NVMe: 0 MB
    credits       : -1 (sync I/O for -ve)
    obj_per_cont  : 1 x 1 (procs)
    dkey_per_obj  : 256
    akey_per_dkey : 64
    recx_per_akey : 16
    value type    : single
    stride size   : 1024
    zero copy     : no
    VOS file      : <NULL>
Running test=UPDATE
Running UPDATE test (iteration=1)
UPDATE successfully completed:
    duration : 91.385233 sec
    bandwith : 2.801     MB/sec
    rate     : 2868.56   IO/sec
    latency  : 348.607   us (nonsense if credits > 1)
Duration across processes:
    MAX duration     : 91.385233 sec
    MIN duration     : 91.385233 sec
    Average duration : 91.385233 sec
Completed test=UPDATE |
...
Code Block |
---|
dmg pool create --label=daos_test_pool --size=500G

# Sample output
Creating DAOS pool with automatic storage allocation: 500 GB NVMe + 6.00% SCM
Pool created with 6.00% SCM/NVMe ratio
---------------------------------------
  UUID          : acf889b6-f290-4d7b-823a-5fae0014a64d
  Service Ranks : 0
  Storage Ranks : 0
  Total Size    : 530 GB
  SCM           : 30 GB (30 GB / rank)
  NVMe          : 500 GB (500 GB / rank)

dmg pool list

# Sample output
Pool UUID                            Svc Replicas
---------                            ------------
acf889b6-f290-4d7b-823a-5fae0014a64d 0

# Define on all client nodes
DAOS_POOL=<pool uuid> |
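Once created, the pool state and free space can be confirmed with dmg pool query (a sketch; flag spellings may differ slightly between DAOS releases):
Code Block |
---|
dmg pool query --pool=$DAOS_POOL |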
...
Code Block |
---|
daos cont create --type=POSIX --oclass=SX --pool=$DAOS_POOL
# Define on all client nodes
DAOS_CONT=<cont uuid> |
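The new container can be sanity-checked from any client with daos container query (a sketch; exact flags may vary by DAOS release):
Code Block |
---|
daos container query --pool=$DAOS_POOL --cont=$DAOS_CONT |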
...
Code Block |
---|
# Create the mount directory
mkdir -p /tmp/daos_dfuse/daos_test

# Use dfuse to mount the DAOS container on the above directory
dfuse --container $DAOS_CONT --disable-direct-io --disable-caching --mountpoint /tmp/daos_dfuse/daos_test --pool $DAOS_POOL

# Verify that the filesystem type is dfuse
df -h

# Sample output
dfuse 500G 17G 34G 34% /tmp/daos_dfuse/daos_test |
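Before moving on to the benchmarks, it is worth confirming that plain POSIX I/O works through the mount point; a minimal smoke test with standard tools (the file name is arbitrary):
Code Block |
---|
dd if=/dev/zero of=/tmp/daos_dfuse/daos_test/smoketest bs=1M count=100
ls -l /tmp/daos_dfuse/daos_test/smoketest
rm /tmp/daos_dfuse/daos_test/smoketest |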
...
Code Block |
---|
module load gnu-mpich/3.4~a2
or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH
# Download ior source
git clone https://github.com/hpc/ior.git
# Build IOR
cd ior
./bootstrap
mkdir build; cd build
MPICC=mpicc ../configure --with-daos=/usr --prefix=<your dir>
make
make install
# Add IOR to your paths: add <your dir>/lib to LD_LIBRARY_PATH and <your dir>/bin to PATH |
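With IOR on the paths, a simple run against the dfuse mount can use the standard POSIX backend (a sketch; the transfer size, block size, and file name here are arbitrary choices, not required values):
Code Block |
---|
mpirun -hosts <hosts> -np 16 --ppn 16 ior -a POSIX -w -r -t 1m -b 1g -F -o /tmp/daos_dfuse/daos_test/testfile |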
...
Build mpifileutils along with the dependencies that must be built from source
Code Block |
---|
cd /tmp
mkdir install
installdir=`pwd`/install
export CC=mpicc
# download dependencies and build
mkdir deps
cd deps
wget https://github.com/hpc/libcircle/releases/download/v0.3/libcircle-0.3.0.tar.gz
wget https://github.com/llnl/lwgrp/releases/download/v1.0.3/lwgrp-1.0.3.tar.gz
wget https://github.com/llnl/dtcmp/releases/download/v1.1.1/dtcmp-1.1.1.tar.gz
wget https://github.com/libarchive/libarchive/releases/download/3.5.1/libarchive-3.5.1.tar.gz
tar -zxf libcircle-0.3.0.tar.gz
cd libcircle-0.3.0
./configure --prefix=$installdir
make install
cd ..
tar -zxf lwgrp-1.0.3.tar.gz
cd lwgrp-1.0.3
./configure --prefix=$installdir
make install
cd ..
tar -zxf dtcmp-1.1.1.tar.gz
cd dtcmp-1.1.1
./configure --prefix=$installdir --with-lwgrp=$installdir
make install
cd ..
tar -zxf libarchive-3.5.1.tar.gz
cd libarchive-3.5.1
./configure --prefix=$installdir
make install
cd ..
cd ..
# Download mpifileutils and build
git clone --depth 1 https://github.com/hpc/mpifileutils
mkdir -p build install
cd build
cmake ../mpifileutils \
-DWITH_DTCMP_PREFIX=../install \
-DWITH_LibCircle_PREFIX=../install \
-DDTCMP_INCLUDE_DIRS=../install/include \
-DDTCMP_LIBRARIES=../install/lib64/libdtcmp.so \
-DLibCircle_INCLUDE_DIRS=../install/include \
-DLibCircle_LIBRARIES=../install/lib64/libcircle.so \
-DCMAKE_INSTALL_PREFIX=../install \
-DWITH_CART_PREFIX=/usr \
-DWITH_DAOS_PREFIX=/usr \
-DENABLE_DAOS=ON
make install |
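To make the freshly built tools (dcp, dcmp, etc.) visible, add the install prefix used above to the usual paths (lib may be used instead of lib64 on some platforms):
Code Block |
---|
export PATH=/tmp/install/bin:$PATH
export LD_LIBRARY_PATH=/tmp/install/lib64:$LD_LIBRARY_PATH |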
...
Code Block |
---|
mpirun -hosts <hosts> -np 16 --ppn 16 dcp --bufsize 64MB --chunksize 128MB /tmp/daos_dfuse/daos_test daos://$DAOS_POOL/$DAOS_CONT3

# Sample output
[2021-04-29T23:55:52] Walking /tmp/daos_dfuse/daos_test
[2021-04-29T23:55:52] Walked 11 items in 0.026 secs (417.452 items/sec) ...
[2021-04-29T23:55:52] Walked 11 items in 0.026 seconds (415.641 items/sec)
[2021-04-29T23:55:52] Copying to /
[2021-04-29T23:55:52] Items: 11
[2021-04-29T23:55:52]   Directories: 1
[2021-04-29T23:55:52]   Files: 10
[2021-04-29T23:55:52]   Links: 0
[2021-04-29T23:55:52] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-29T23:55:52] Creating 1 directories
[2021-04-29T23:55:52] Creating 10 files.
[2021-04-29T23:55:52] Copying data.
[2021-04-29T23:56:53] Copied 1.312 GiB (13%) in 61.194 secs (21.963 MiB/s) 405 secs left ...
[2021-04-29T23:58:11] Copied 6.000 GiB (60%) in 139.322 secs (44.099 MiB/s) 93 secs left ...
[2021-04-29T23:58:11] Copied 10.000 GiB (100%) in 139.322 secs (73.499 MiB/s) done
[2021-04-29T23:58:11] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Copy rate: 73.499 MiB/s (10737418240 bytes in 139.322 seconds)
[2021-04-29T23:58:11] Syncing data to disk.
[2021-04-29T23:58:11] Sync completed in 0.006 seconds.
[2021-04-29T23:58:11] Fixing permissions.
[2021-04-29T23:58:11] Updated 11 items in 0.002 seconds (4822.579 items/sec)
[2021-04-29T23:58:11] Syncing directory updates to disk.
[2021-04-29T23:58:11] Sync completed in 0.001 seconds.
[2021-04-29T23:58:11] Started: Apr-29-2021,23:55:52
[2021-04-29T23:58:11] Completed: Apr-29-2021,23:58:11
[2021-04-29T23:58:11] Seconds: 139.335
[2021-04-29T23:58:11] Items: 11
[2021-04-29T23:58:11]   Directories: 1
[2021-04-29T23:58:11]   Files: 10
[2021-04-29T23:58:11]   Links: 0
[2021-04-29T23:58:11] Data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Rate: 73.492 MiB/s (10737418240 bytes in 139.335 seconds)

# Create the destination directory
mkdir /tmp/datamover3

# Run
mpirun -hosts <hosts> --ppn 16 -np 16 dcp --bufsize 64MB --chunksize 128MB daos://$DAOS_POOL/$DAOS_CONT3 /tmp/datamover3/

# Sample output
[2021-04-30T00:02:14] Walking /
[2021-04-30T00:02:15] Walked 12 items in 0.112 secs (107.354 items/sec) ...
[2021-04-30T00:02:15] Walked 12 items in 0.112 seconds (107.236 items/sec)
[2021-04-30T00:02:15] Copying to /tmp/datamover3
[2021-04-30T00:02:15] Items: 12
[2021-04-30T00:02:15]   Directories: 2
[2021-04-30T00:02:15]   Files: 10
[2021-04-30T00:02:15]   Links: 0
[2021-04-30T00:02:15] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-30T00:02:15] Creating 2 directories
[2021-04-30T00:02:15] Original directory exists, skip the creation: `/tmp/datamover3/' (errno=17 File exists)
[2021-04-30T00:02:15] Creating 10 files.
[2021-04-30T00:02:15] Copying data.
[2021-04-30T00:03:15] Copied 1.938 GiB (19%) in 60.341 secs (32.880 MiB/s) 251 secs left ...
[2021-04-30T00:03:46] Copied 8.750 GiB (88%) in 91.953 secs (97.441 MiB/s) 13 secs left ...
[2021-04-30T00:03:46] Copied 10.000 GiB (100%) in 91.953 secs (111.361 MiB/s) done
[2021-04-30T00:03:46] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:46] Copy rate: 111.361 MiB/s (10737418240 bytes in 91.954 seconds)
[2021-04-30T00:03:46] Syncing data to disk.
[2021-04-30T00:03:47] Sync completed in 0.135 seconds.
[2021-04-30T00:03:47] Fixing permissions.
[2021-04-30T00:03:47] Updated 12 items in 0.000 seconds (71195.069 items/sec)
[2021-04-30T00:03:47] Syncing directory updates to disk.
[2021-04-30T00:03:47] Sync completed in 0.001 seconds.
[2021-04-30T00:03:47] Started: Apr-30-2021,00:02:15
[2021-04-30T00:03:47] Completed: Apr-30-2021,00:03:47
[2021-04-30T00:03:47] Seconds: 92.091
[2021-04-30T00:03:47] Items: 12
[2021-04-30T00:03:47]   Directories: 2
[2021-04-30T00:03:47]   Files: 10
[2021-04-30T00:03:47]   Links: 0
[2021-04-30T00:03:47] Data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:47] Rate: 111.194 MiB/s (10737418240 bytes in 92.091 seconds)

# Verify the two directories have the same content
mjean@wolf-184:~/build> ls -la /tmp/datamover3/daos_test/
total 10485808
drwxr-xr-x 2 mjean mjean       4096 Apr 30 00:02 .
drwxr-xr-x 3 mjean mjean       4096 Apr 30 00:02 ..
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000009
mjean@wolf-184:~/build> ls -la /tmp/daos_dfuse/daos_test/
total 10485760
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000009 |
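Beyond eyeballing the ls output, mpifileutils also ships dcmp, which walks and compares two directory trees in parallel; a sketch of comparing the original dfuse data with the copied-back tree:
Code Block |
---|
mpirun -hosts <hosts> -np 16 dcmp /tmp/daos_dfuse/daos_test /tmp/datamover3/daos_test |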
...