NOTE: THESE STEPS ARE NOT TO BE APPLIED TO 2.0 TESTING. USE THE QUICKSTARTS IN THE 2.0 ONLINE DOCUMENTATION.
Introduction
...
This document covers installation of the DAOS RPMs on openSUSE/SLES 15.2 and updating the DAOS configuration files needed by the DAOS servers.
The quick start describes how to use dfuse to take advantage of DAOS support for POSIX through a dfuse mount point. It steps users through some example runs of DAOS tests and of benchmarking tools like ior and mdtest, along with some examples of how to move data between a POSIX file system and DAOS containers (and vice versa), and finally cleaning up your DAOS setup. For this we are using a set of 2 servers, each ...
Requirements
The quick start requires a minimum of 1 server with PMEM and SSDs connected via an InfiniBand storage network, plus 2 nodes without PMEM/SSD that are also on the InfiniBand storage network: 1 client node and 1 admin node.
All nodes have a base openSUSE or SLES 15.2 installation.
Install pdsh on the admin node
Code Block |
---|
sudo zypper install pdsh |
For example, to use node-1 as the admin node, node-2 and node-3 as client nodes, and node-4 and node-5 as server nodes, define the node variables as follows (comma-separated, with no spaces):
Code Block |
---|
ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$ADMIN_NODE,$CLIENT_NODES,$SERVER_NODES |
Note |
---|
If a client node is also serving as an admin node then exclude $ADMIN_NODE from the ALL_NODES assignment to prevent duplication, e.g. ALL_NODES=$CLIENT_NODES,$SERVER_NODES |
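Instead of editing ALL_NODES by hand when one node plays two roles, the list can be built with duplicates removed. This is a minimal sketch; `join_unique_nodes` is a hypothetical helper (not part of DAOS or pdsh) that merges comma-separated node lists and keeps only the first occurrence of each node.

```shell
# Hypothetical helper: merge comma-separated node lists into one list,
# keeping the first occurrence of each node and dropping empty entries.
join_unique_nodes() {
  printf '%s\n' "$@" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -s -d, -
}

ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$(join_unique_nodes "$ADMIN_NODE" "$CLIENT_NODES" "$SERVER_NODES")
echo "$ALL_NODES"    # node-1,node-2,node-3,node-4,node-5
```

If the admin node is also a client (for example `ADMIN_NODE=node-2`), the duplicate is dropped automatically. Because pdsh takes a comma-separated host list, the result can be passed straight to it, e.g. `pdsh -w "$ALL_NODES" hostname`.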
Set-Up
Please refer here for the initial setup, which consists of RPM installation, generating and setting up certificates, setting up config files, and starting the servers and agents.
Note |
---|
For this quick start, the daos-tests package will need to be installed on the client nodes |
The following applications will be run from a client node:
...
Code Block |
---|
SHARED_DIR=<directory shared by all nodes>
export FI_UNIVERSE_SIZE=2048
export OFI_INTERFACE=eth0
export CRT_PHY_ADDR_STR="ofi+sockets"
# See self_test --help for more details on the parameters
# Generate the attach info file (SHARED_DIR must be writable by sudo)
sudo daos_agent -o /etc/daos/daos_agent.yml -l $SHARED_DIR/daos_agent.log dump-attachinfo -o $SHARED_DIR/daos_server.attach_info_tmp
# Run (--endpoint takes ranks:tags; for 4 servers use --endpoint 0-3:0)
self_test --path $SHARED_DIR --group-name daos_server --endpoint 0-1:0
# Sample output:
Adding endpoints:
ranks: 0-1 (# ranks = 2)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 2
Message sizes: [(200000-BULK_GET 200000-BULK_PUT), (200000-BULK_GET 0-EMPTY), (0-EMPTY 200000-BULK_PUT), (200000-BULK_GET 1000-IOV), (1000-IOV 200000-BULK_PUT), (1000-IOV 1000-IOV), (1000-IOV 0-EMPTY), (0-EMPTY 1000-IOV), (0-EMPTY 0-EMPTY)]
Buffer addresses end with: <Default>
Repetitions per size: 20000
Max inflight RPCs: 1000
CLI [rank=0 pid=3255] Attached daos_server
##################################################
Results for message size (200000-BULK_GET 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 222.67
RPC Throughput (RPCs/sec): 584
RPC Latencies (us):
Min : 27191
25th %: 940293
Median : 1678137
75th %: 2416765
Max : 3148987
Average: 1671626
Std Dev: 821872.40
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2416764
1:0 - 969063
##################################################
Results for message size (200000-BULK_GET 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.08
RPC Throughput (RPCs/sec): 588
RPC Latencies (us):
Min : 2880
25th %: 1156162
Median : 1617356
75th %: 2185604
Max : 2730569
Average: 1659133
Std Dev: 605053.68
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2185589
1:0 - 1181363
##################################################
Results for message size (0-EMPTY 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.11
RPC Throughput (RPCs/sec): 588
RPC Latencies (us):
Min : 4956
25th %: 747786
Median : 1558111
75th %: 2583834
Max : 3437395
Average: 1659959
Std Dev: 1078975.59
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2583826
1:0 - 776862
##################################################
Results for message size (200000-BULK_GET 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.57
RPC Throughput (RPCs/sec): 587
RPC Latencies (us):
Min : 2755
25th %: 12341
Median : 1385716
75th %: 3393178
Max : 3399349
Average: 1660125
Std Dev: 1446054.82
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 12343
1:0 - 3393174
##################################################
Results for message size (1000-IOV 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 112.68
RPC Throughput (RPCs/sec): 588
RPC Latencies (us):
Min : 4557
25th %: 522380
Median : 1640322
75th %: 2725419
Max : 3441963
Average: 1661254
Std Dev: 1147206.09
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 600190
1:0 - 2725402
##################################################
Results for message size (1000-IOV 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 88.87
RPC Throughput (RPCs/sec): 46595
RPC Latencies (us):
Min : 1165
25th %: 21374
Median : 21473
75th %: 21572
Max : 21961
Average: 20923
Std Dev: 2786.99
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 21430
1:0 - 21516
##################################################
Results for message size (1000-IOV 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 59.03
RPC Throughput (RPCs/sec): 61902
RPC Latencies (us):
Min : 1164
25th %: 15544
Median : 16104
75th %: 16575
Max : 17237
Average: 15696
Std Dev: 2126.37
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 15579
1:0 - 16571
##################################################
Results for message size (0-EMPTY 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 46.93
RPC Throughput (RPCs/sec): 49209
RPC Latencies (us):
Min : 945
25th %: 20327
Median : 20393
75th %: 20434
Max : 20576
Average: 19821
Std Dev: 2699.27
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 20393
1:0 - 20393
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
RPC Bandwidth (MB/sec): 0.00
RPC Throughput (RPCs/sec): 65839
RPC Latencies (us):
Min : 879
25th %: 14529
Median : 15108
75th %: 15650
Max : 16528
Average: 14765
Std Dev: 2087.87
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 14569
1:0 - 15649
|
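The self_test output is long; to compare runs it helps to pull out the bandwidth and throughput for each message size. This is a minimal sketch, assuming the output was saved to a file (for example with `self_test ... | tee self_test.log`) and keeps the `Results for message size`, `RPC Bandwidth` and `RPC Throughput` labels shown above. `summarize_self_test` is a hypothetical helper, not part of DAOS.

```shell
# Hypothetical helper: print "message size<TAB>bandwidth<TAB>throughput" for each
# "Results for message size" section of saved self_test output (file args or stdin).
summarize_self_test() {
  awk '
    /Results for message size/ {
      size = $0
      sub(/.*Results for message size /, "", size)   # drop the label
      sub(/ \(max_inflight_rpcs.*$/, "", size)         # drop the inflight suffix
    }
    /RPC Bandwidth/  { bw = $NF }
    /RPC Throughput/ { printf "%s\t%s MB/s\t%s RPCs/s\n", size, bw, $NF }
  ' "$@"
}
```

Run as `summarize_self_test self_test.log`. For the first section of the sample output above, the row contains `(200000-BULK_GET 200000-BULK_PUT)`, `222.67 MB/s` and `584 RPCs/s`, separated by tabs.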
...
Code Block |
---|
module load gnu-openmpi/3.1.6
# or
export LD_LIBRARY_PATH=<openmpi lib path>:$LD_LIBRARY_PATH
export PATH=<openmpi bin path>:$PATH

export D_LOG_FILE=/tmp/daos_perf.log

# Single process
daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml

# MPI
orterun --enable-recovery -x D_LOG_FILE=/tmp/daos_perf_daos.log --host <host name>:4 --map-by node \
    --mca btl_openib_warn_default_gid_prefix "0" --mca btl "tcp,self" --mca oob "tcp" --mca pml "ob1" \
    --mca btl_tcp_if_include "eth0" --np 4 --tag-output \
    /usr/bin/daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml

# Sample Output:
Test : DAOS R2S (full stack, 2 replica)
Pool : 9c88849b-b0d6-4444-bb39-42769a7a1ef5
Parameters :
pool size : SCM: 20480 MB, NVMe: 0 MB
credits : -1 (sync I/O for -ve)
obj_per_cont : 1 x 1 (procs)
dkey_per_obj : 256
akey_per_dkey : 64
recx_per_akey : 16
value type : single
stride size : 1024
zero copy : no
VOS file : <NULL>
Running test=UPDATE
Running UPDATE test (iteration=1)
UPDATE successfully completed:
duration : 91.385233 sec
bandwith : 2.801 MB/sec
rate : 2868.56 IO/sec
latency : 348.607 us (nonsense if credits > 1)
Duration across processes:
MAX duration : 91.385233 sec
MIN duration : 91.385233 sec
Average duration : 91.385233 sec
Completed test=UPDATE |
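The UPDATE figures in the sample output can be cross-checked by hand. This is a minimal sketch, assuming the single object receives dkey_per_obj * akey_per_dkey * recx_per_akey updates of `stride size` bytes each, and that daos_perf reports MB as 2^20 bytes. Under those assumptions the arithmetic reproduces the reported rate, bandwidth and latency.

```shell
# Parameters and result taken from the sample daos_perf output above
DKEYS=256           # dkey_per_obj (-d 256)
AKEYS=64            # akey_per_dkey (-a 64)
RECX=16             # recx_per_akey
STRIDE=1024         # stride size in bytes (-s 1k)
DURATION=91.385233  # UPDATE duration in seconds

TOTAL_IOS=$(( DKEYS * AKEYS * RECX ))   # number of updates (assumed one per record)
RATE=$(awk -v n="$TOTAL_IOS" -v t="$DURATION" 'BEGIN { printf "%.2f", n / t }')
BW=$(awk -v n="$TOTAL_IOS" -v s="$STRIDE" -v t="$DURATION" 'BEGIN { printf "%.3f", n * s / 1048576 / t }')
LAT=$(awk -v n="$TOTAL_IOS" -v t="$DURATION" 'BEGIN { printf "%.3f", t / n * 1000000 }')

echo "updates=$TOTAL_IOS rate=$RATE IO/sec bandwidth=$BW MB/sec latency=$LAT us"
# updates=262144 rate=2868.56 IO/sec bandwidth=2.801 MB/sec latency=348.607 us
```

Because `credits` is -1 (synchronous I/O), one update is in flight at a time, so latency is simply the duration divided by the number of updates. The tool itself flags the latency figure as meaningless when credits > 1.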
...
Code Block |
---|
dmg pool create --label=daos_test_pool --size=500G

# Sample output
Creating DAOS pool with automatic storage allocation: 500 GB NVMe + 6.00% SCM
Pool created with 6.00% SCM/NVMe ratio
---------------------------------------
UUID : acf889b6-f290-4d7b-823a-5fae0014a64d
Service Ranks : 0
Storage Ranks : 0
Total Size : 530 GB
SCM : 30 GB (30 GB / rank)
NVMe : 500 GB (500 GB / rank)

dmg pool list

# Sample output
Pool UUID                            Svc Replicas
--------------                       ----------------
acf889b6-f290-4d7b-823a-5fae0014a64d 0

export DAOS_POOL=<pool uuid>   # define on all clients |
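The sizes reported by `dmg pool create` follow from the requested NVMe size and the 6.00% SCM/NVMe ratio shown in the output. This is a small arithmetic sketch using those sample values. The even per-rank split is an assumption: it matches the single-rank sample, but multi-rank pools are not shown here.

```shell
# Inputs taken from the sample output above
NVME_GB=500   # --size=500G became 500 GB of NVMe
SCM_PCT=6     # "6.00% SCM/NVMe ratio"
NUM_RANKS=1   # the sample pool uses a single storage rank (rank 0)

SCM_GB=$(( NVME_GB * SCM_PCT / 100 ))       # 6% of the NVMe size
TOTAL_GB=$(( NVME_GB + SCM_GB ))            # SCM + NVMe
SCM_PER_RANK_GB=$(( SCM_GB / NUM_RANKS ))   # assumes an even split across ranks
NVME_PER_RANK_GB=$(( NVME_GB / NUM_RANKS ))

echo "Total Size : ${TOTAL_GB} GB"
echo "SCM : ${SCM_GB} GB (${SCM_PER_RANK_GB} GB / rank)"
echo "NVMe : ${NVME_GB} GB (${NVME_PER_RANK_GB} GB / rank)"
```

With these inputs the script prints the same 530 GB total, 30 GB SCM and 500 GB NVMe as the sample output.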
...
Code Block |
---|
daos cont create --type=POSIX --oclass=SX --pool=$DAOS_POOL
export DAOS_CONT=<cont uuid>   # define on all clients |
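The pool and container UUIDs must be defined on every client. One minimal way to do this is to write them once to a file in the shared directory and source that file on each client. The sketch assumes `SHARED_DIR` (from the self_test step) is visible on all nodes. The file name `daos_env.sh` is hypothetical, the pool UUID default is the sample value from the `dmg pool create` output, and `<cont uuid>` is a placeholder; replace both with your own values.

```shell
# Falls back to a temporary directory only so the sketch runs standalone
SHARED_DIR=${SHARED_DIR:-$(mktemp -d)}
ENV_FILE=$SHARED_DIR/daos_env.sh          # hypothetical file name

DAOS_POOL=${DAOS_POOL:-acf889b6-f290-4d7b-823a-5fae0014a64d}   # sample UUID from above
DAOS_CONT=${DAOS_CONT:-"<cont uuid>"}                          # placeholder

# On the node where the pool and container were created:
cat > "$ENV_FILE" <<EOF
export DAOS_POOL='$DAOS_POOL'
export DAOS_CONT='$DAOS_CONT'
EOF

# On each client node (for example before running the tests below):
. "$ENV_FILE"
echo "pool=$DAOS_POOL cont=$DAOS_CONT"
```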
...
Code Block |
---|
# Create directory
mkdir -p /tmp/daos_dfuse/daos_test
# Use dfuse to mount the daos container to the above directory
dfuse --container $DAOS_CONT --disable-caching --mountpoint /tmp/daos_dfuse/daos_test --pool $DAOS_POOL
# Verify that the file system type is dfuse
df -h
# Sample output
dfuse 500G 17G 34G 34% /tmp/daos_dfuse/daos_test |
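`df -h` confirms the mount by eye. For scripted checks, for example before pointing ior at the mount point, the mount table can be queried instead. This minimal sketch assumes dfuse registers itself with the source name `dfuse`, as the Filesystem column in the `df` output above suggests. `is_dfuse_mounted` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if MOUNTPOINT is listed with source "dfuse".
# Usage: is_dfuse_mounted MOUNTPOINT [MOUNTS_FILE]   (default: /proc/mounts)
is_dfuse_mounted() {
  awk -v mp="$1" '$1 == "dfuse" && $2 == mp { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}" 2>/dev/null
}

if is_dfuse_mounted /tmp/daos_dfuse/daos_test; then
  echo "dfuse is mounted on /tmp/daos_dfuse/daos_test"
else
  echo "dfuse is NOT mounted on /tmp/daos_dfuse/daos_test"
fi
```

Note that /proc/mounts escapes spaces in paths (as `\040`), so this simple comparison only suits mount points without spaces, such as the one used here.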
...
Code Block |
---|
module load gnu-mpich/3.4~a2
# or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH
# Download ior source
git clone https://github.com/hpc/ior.git
# Build IOR
cd ior
./bootstrap
mkdir build;cd build
MPICC=mpicc ../configure --with-daos=/usr --prefix=<your dir>
make
make install
# Add IOR to your paths: <your dir>/lib to LD_LIBRARY_PATH and <your dir>/bin to PATH
export LD_LIBRARY_PATH=<your dir>/lib:$LD_LIBRARY_PATH
export PATH=<your dir>/bin:$PATH |
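When the path setup is placed in a login script or re-run in the same shell, plain `export PATH=...:$PATH` keeps adding duplicate entries. This is a minimal sketch of an idempotent alternative. `prepend_path` and the `IOR_PREFIX` variable (with its `/opt/ior` default) are hypothetical; set `IOR_PREFIX` to the `--prefix` you passed to `../configure`.

```shell
# Hypothetical helper: prepend DIR to the colon-separated variable VAR unless already present.
# Usage: prepend_path VAR DIR
prepend_path() {
  _pp_var=$1
  _pp_dir=$2
  eval "_pp_cur=\${$_pp_var:-}"            # current value, empty if unset
  case ":$_pp_cur:" in
    *":$_pp_dir:"*) ;;                       # already present: leave unchanged
    *) eval "$_pp_var=\"\$_pp_dir\${_pp_cur:+:\$_pp_cur}\"" ;;
  esac
  export "$_pp_var"
}

IOR_PREFIX=${IOR_PREFIX:-/opt/ior}         # hypothetical install prefix
prepend_path PATH "$IOR_PREFIX/bin"
prepend_path LD_LIBRARY_PATH "$IOR_PREFIX/lib"
```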
...
Code Block |
---|
# Install build dependencies (on all client nodes)
sudo zypper install dtcmp-mpich-devel libcircle-mpich-devel libcap-devel
# Build mpifileutils from installed packages (on all client nodes)
git clone --depth 1 https://github.com/hpc/mpifileutils
mkdir build install
cd build
cmake ../mpifileutils -DENABLE_DAOS=ON \
-DENABLE_LIBARCHIVE=OFF \
-DDTCMP_INCLUDE_DIRS=/usr/lib64/mpi/gcc/mpich/include \
-DDTCMP_LIBRARIES=/usr/lib64/mpi/gcc/mpich/lib64/libdtcmp.so \
-DLibCircle_INCLUDE_DIRS=/usr/lib64/mpi/gcc/mpich/include \
-DLibCircle_LIBRARIES=/usr/lib64/mpi/gcc/mpich/lib64/libcircle.so \
-DWITH_CART_PREFIX=/usr \
-DWITH_DAOS_PREFIX=/usr \
-DCMAKE_INSTALL_INCLUDEDIR=/usr/lib64/mpi/gcc/mpich/include \
-DCMAKE_INSTALL_PREFIX=/usr/lib64/mpi/gcc/mpich/ \
-DCMAKE_INSTALL_LIBDIR=/usr/lib64/mpi/gcc/mpich/lib64
sudo make install |
Build mpifileutils with dependencies that are built from source
...
Code Block |
---|
mkdir install
installdir=`pwd`/install
export CC=mpicc

# download dependencies and build
mkdir deps
cd deps
wget https://github.com/hpc/libcircle/releases/download/v0.3/libcircle-0.3.0.tar.gz
wget https://github.com/llnl/lwgrp/releases/download/v1.0.3/lwgrp-1.0.3.tar.gz
wget https://github.com/llnl/dtcmp/releases/download/v1.1.1/dtcmp-1.1.1.tar.gz
wget https://github.com/libarchive/libarchive/releases/download/3.5.1/libarchive-3.5.1.tar.gz

tar -zxf libcircle-0.3.0.tar.gz
cd libcircle-0.3.0
./configure --prefix=$installdir
make install
cd ..

tar -zxf lwgrp-1.0.3.tar.gz
cd lwgrp-1.0.3
./configure --prefix=$installdir
make install
cd ..

tar -zxf dtcmp-1.1.1.tar.gz
cd dtcmp-1.1.1
./configure --prefix=$installdir --with-lwgrp=$installdir
make install
cd ..

tar -zxf libarchive-3.5.1.tar.gz
cd libarchive-3.5.1
./configure --prefix=$installdir
make install
cd ..
cd ..

# Download mpifileutils and build (install/ was already created above)
git clone --depth 1 https://github.com/hpc/mpifileutils
mkdir build
cd build
cmake ../mpifileutils \
  -DWITH_DTCMP_PREFIX=../install \
  -DWITH_LibCircle_PREFIX=../install \
  -DDTCMP_INCLUDE_DIRS=../install/include \
  -DDTCMP_LIBRARIES=../install/lib64/libdtcmp.so \
  -DLibCircle_INCLUDE_DIRS=../install/include \
  -DLibCircle_LIBRARIES=../install/lib64/libcircle.so \
  -DCMAKE_INSTALL_PREFIX=../install \
  -DWITH_CART_PREFIX=/usr \
  -DWITH_DAOS_PREFIX=/usr \
  -DENABLE_DAOS=ON
make install |
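The four dependency builds above repeat the same download, extract, configure and install pattern. This is a minimal sketch of that pattern as a function. The `build_dep` and `run` helpers and the `DRY_RUN` variable are hypothetical conveniences, not part of mpifileutils; with `DRY_RUN=1` the commands are only printed.

```shell
installdir=${installdir:-$(pwd)/install}

# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_dep URL [extra ./configure options...]
build_dep() {
  url=$1
  shift
  tarball=$(basename "$url")      # e.g. dtcmp-1.1.1.tar.gz
  srcdir=${tarball%.tar.gz}       # e.g. dtcmp-1.1.1
  run wget "$url" || return 1
  run tar -zxf "$tarball" || return 1
  # The subshell keeps the caller in the deps directory (replaces the "cd .." steps)
  (
    run cd "$srcdir" &&
      run ./configure --prefix="$installdir" "$@" &&
      run make install
  )
}

# Usage, from inside the deps directory:
#   build_dep https://github.com/hpc/libcircle/releases/download/v0.3/libcircle-0.3.0.tar.gz
#   build_dep https://github.com/llnl/lwgrp/releases/download/v1.0.3/lwgrp-1.0.3.tar.gz
#   build_dep https://github.com/llnl/dtcmp/releases/download/v1.1.1/dtcmp-1.1.1.tar.gz --with-lwgrp="$installdir"
#   build_dep https://github.com/libarchive/libarchive/releases/download/3.5.1/libarchive-3.5.1.tar.gz
```

The build order still matters: lwgrp must be installed before dtcmp, because dtcmp is configured with `--with-lwgrp`.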
...
Code Block |
---|
mpirun -hosts <hosts> -np 16 --ppn 16 dcp --bufsize 64MB --chunksize 128MB /tmp/daos_dfuse/daos_test daos://$DAOS_POOL/$DAOS_CONT3

# Sample output
[2021-04-29T23:55:52] Walking /tmp/daos_dfuse/daos_test
[2021-04-29T23:55:52] Walked 11 items in 0.026 secs (417.452 items/sec)
...
[2021-04-29T23:55:52] Walked 11 items in 0.026 seconds (415.641 items/sec)
[2021-04-29T23:55:52] Copying to /
[2021-04-29T23:55:52] Items: 11
[2021-04-29T23:55:52] Directories: 1
[2021-04-29T23:55:52] Files: 10
[2021-04-29T23:55:52] Links: 0
[2021-04-29T23:55:52] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-29T23:55:52] Creating 1 directories
[2021-04-29T23:55:52] Creating 10 files.
[2021-04-29T23:55:52] Copying data.
[2021-04-29T23:56:53] Copied 1.312 GiB (13%) in 61.194 secs (21.963 MiB/s) 405 secs left
...
[2021-04-29T23:58:11] Copied 6.000 GiB (60%) in 139.322 secs (44.099 MiB/s) 93 secs left
...
[2021-04-29T23:58:11] Copied 10.000 GiB (100%) in 139.322 secs (73.499 MiB/s) done
[2021-04-29T23:58:11] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Copy rate: 73.499 MiB/s (10737418240 bytes in 139.322 seconds)
[2021-04-29T23:58:11] Syncing data to disk.
[2021-04-29T23:58:11] Sync completed in 0.006 seconds.
[2021-04-29T23:58:11] Fixing permissions.
[2021-04-29T23:58:11] Updated 11 items in 0.002 seconds (4822.579 items/sec)
[2021-04-29T23:58:11] Syncing directory updates to disk.
[2021-04-29T23:58:11] Sync completed in 0.001 seconds.
[2021-04-29T23:58:11] Started: Apr-29-2021,23:55:52
[2021-04-29T23:58:11] Completed: Apr-29-2021,23:58:11
[2021-04-29T23:58:11] Seconds: 139.335
[2021-04-29T23:58:11] Items: 11
[2021-04-29T23:58:11] Directories: 1
[2021-04-29T23:58:11] Files: 10
[2021-04-29T23:58:11] Links: 0
[2021-04-29T23:58:11] Data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Rate: 73.492 MiB/s (10737418240 bytes in 139.335 seconds)

# Create directory
mkdir /tmp/datamover3

# Run
mpirun -hosts <host> --ppn 16 -np 16 dcp --bufsize 64MB --chunksize 128MB daos://$DAOS_POOL/$DAOS_CONT3 /tmp/datamover3/

# Sample output
[2021-04-30T00:02:14] Walking /
[2021-04-30T00:02:15] Walked 12 items in 0.112 secs (107.354 items/sec)
...
[2021-04-30T00:02:15] Walked 12 items in 0.112 seconds (107.236 items/sec)
[2021-04-30T00:02:15] Copying to /tmp/datamover3
[2021-04-30T00:02:15] Items: 12
[2021-04-30T00:02:15] Directories: 2
[2021-04-30T00:02:15] Files: 10
[2021-04-30T00:02:15] Links: 0
[2021-04-30T00:02:15] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-30T00:02:15] Creating 2 directories
[2021-04-30T00:02:15] Original directory exists, skip the creation: `/tmp/datamover3/' (errno=17 File exists)
[2021-04-30T00:02:15] Creating 10 files.
[2021-04-30T00:02:15] Copying data.
[2021-04-30T00:03:15] Copied 1.938 GiB (19%) in 60.341 secs (32.880 MiB/s) 251 secs left
...
[2021-04-30T00:03:46] Copied 8.750 GiB (88%) in 91.953 secs (97.441 MiB/s) 13 secs left
...
[2021-04-30T00:03:46] Copied 10.000 GiB (100%) in 91.953 secs (111.361 MiB/s) done
[2021-04-30T00:03:46] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:46] Copy rate: 111.361 MiB/s (10737418240 bytes in 91.954 seconds)
[2021-04-30T00:03:46] Syncing data to disk.
[2021-04-30T00:03:47] Sync completed in 0.135 seconds.
[2021-04-30T00:03:47] Fixing permissions.
[2021-04-30T00:03:47] Updated 12 items in 0.000 seconds (71195.069 items/sec)
[2021-04-30T00:03:47] Syncing directory updates to disk.
[2021-04-30T00:03:47] Sync completed in 0.001 seconds.
[2021-04-30T00:03:47] Started: Apr-30-2021,00:02:15
[2021-04-30T00:03:47] Completed: Apr-30-2021,00:03:47
[2021-04-30T00:03:47] Seconds: 92.091
[2021-04-30T00:03:47] Items: 12
[2021-04-30T00:03:47] Directories: 2
[2021-04-30T00:03:47] Files: 10
[2021-04-30T00:03:47] Links: 0
[2021-04-30T00:03:47] Data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:47] Rate: 111.194 MiB/s (10737418240 bytes in 92.091 seconds)

# Verify the two directories have the same content
ls -la /tmp/datamover3/daos_test/
total 10485808
drwxr-xr-x 2 mjean mjean 4096 Apr 30 00:02 .
drwxr-xr-x 3 mjean mjean 4096 Apr 30 00:02 ..
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000009

ls -la /tmp/daos_dfuse/daos_test/
total 10485760
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000009 |
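Comparing `ls -la` listings only checks names and sizes. A stricter check compares file contents. This minimal sketch uses `diff -r -q`, which reads every file in full on both sides (about 10 GiB each in this example), so expect it to take a while. `same_tree` is a hypothetical helper.

```shell
# Hypothetical helper: compare two directory trees byte for byte.
# Returns 0 if identical, 1 otherwise.
same_tree() {
  if diff -r -q "$1" "$2" >/dev/null; then
    echo "identical: $1 $2"
  else
    echo "DIFFERENT: $1 $2" >&2
    return 1
  fi
}

# Paths from the example above; the copy back from DAOS landed in a daos_test subdirectory:
#   same_tree /tmp/daos_dfuse/daos_test /tmp/datamover3/daos_test
```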
...