NOTE: THESE ARE NOT TO BE APPLIED TO 2.0 TESTING; USE THE QUICKSTARTS IN THE 2.0 ONLINE DOCUMENTATION.
Introduction
...
For example, if one wanted to use node-1 as their admin node, node-2 and node-3 as client nodes, and node-4 and node-5 as their server nodes then these variables would be defined as:
Code Block |
---|
ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$ADMIN_NODE,$CLIENT_NODES,$SERVER_NODES |
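Because the node variables are plain comma-separated lists, they can be expanded into one host per line for use in loops. The following is a minimal sketch; the `list_nodes` helper and the `NODE_COUNT` variable are illustrative names and not part of DAOS.

```shell
# Node variables from the example; the lists must not contain spaces.
ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$ADMIN_NODE,$CLIENT_NODES,$SERVER_NODES

# list_nodes (hypothetical helper): print one host per line.
list_nodes() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Count the hosts and show each one.
NODE_COUNT=$(list_nodes "$ALL_NODES" | wc -l | tr -d ' ')
echo "ALL_NODES contains $NODE_COUNT hosts:"
for node in $(list_nodes "$ALL_NODES"); do
    echo "  $node"
done
```

The same loop body could wrap any per-node command, for example a reachability check before starting the servers.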
...
Please refer here for the initial setup, which consists of installing the RPMs, generating and setting up certificates, setting up the config files, and starting the servers and agents.
Note |
---|
For this quick start, the daos-tests package will need to be installed on the client nodes |
The following applications will be run from a client node:
...
Code Block |
---|
SHARED_DIR=<shared dir by all nodes>
export FI_UNIVERSE_SIZE=2048
export OFI_INTERFACE=eth0
export CRT_PHY_ADDR_STR="ofi+sockets"
# self_test --help for more details on params
# Generate the attach info file (enable SHARED_DIR with perms for sudo to write)
sudo daos_agent -o /etc/daos/daos_agent.yml -l $SHARED_DIR/daos_agent.log dump-attachinfo -o $SHARED_DIR/daos_server.attach_info_tmp
# Run (for 4 servers use --endpoint 0-3:0; the format is ranks:tags):
self_test --path $SHARED_DIR --group-name daos_server --endpoint 0-1:0
# Sample output:
Adding endpoints:
  ranks: 0-1 (# ranks = 2)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints: 2
  Message sizes: [(200000-BULK_GET 200000-BULK_PUT), (200000-BULK_GET 0-EMPTY), (0-EMPTY 200000-BULK_PUT), (200000-BULK_GET 1000-IOV), (1000-IOV 200000-BULK_PUT), (1000-IOV 1000-IOV), (1000-IOV 0-EMPTY), (0-EMPTY 1000-IOV), (0-EMPTY 0-EMPTY)]
  Buffer addresses end with: <Default>
  Repetitions per size: 20000
  Max inflight RPCs: 1000
CLI [rank=0 pid=3255] Attached daos_server
##################################################
Results for message size (200000-BULK_GET 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 222.67
  RPC Throughput (RPCs/sec): 584
  RPC Latencies (us):
    Min    : 27191
    25th  %: 940293
    Median : 1678137
    75th  %: 2416765
    Max    : 3148987
    Average: 1671626
    Std Dev: 821872.40
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 2416764
    1:0 - 969063
##################################################
Results for message size (200000-BULK_GET 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.08
  RPC Throughput (RPCs/sec): 588
  RPC Latencies (us):
    Min    : 2880
    25th  %: 1156162
    Median : 1617356
    75th  %: 2185604
    Max    : 2730569
    Average: 1659133
    Std Dev: 605053.68
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 2185589
    1:0 - 1181363
##################################################
Results for message size (0-EMPTY 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.11
  RPC Throughput (RPCs/sec): 588
  RPC Latencies (us):
    Min    : 4956
    25th  %: 747786
    Median : 1558111
    75th  %: 2583834
    Max    : 3437395
    Average: 1659959
    Std Dev: 1078975.59
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 2583826
    1:0 - 776862
##################################################
Results for message size (200000-BULK_GET 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.57
  RPC Throughput (RPCs/sec): 587
  RPC Latencies (us):
    Min    : 2755
    25th  %: 12341
    Median : 1385716
    75th  %: 3393178
    Max    : 3399349
    Average: 1660125
    Std Dev: 1446054.82
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 12343
    1:0 - 3393174
##################################################
Results for message size (1000-IOV 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.68
  RPC Throughput (RPCs/sec): 588
  RPC Latencies (us):
    Min    : 4557
    25th  %: 522380
    Median : 1640322
    75th  %: 2725419
    Max    : 3441963
    Average: 1661254
    Std Dev: 1147206.09
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 600190
    1:0 - 2725402
##################################################
Results for message size (1000-IOV 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 88.87
  RPC Throughput (RPCs/sec): 46595
  RPC Latencies (us):
    Min    : 1165
    25th  %: 21374
    Median : 21473
    75th  %: 21572
    Max    : 21961
    Average: 20923
    Std Dev: 2786.99
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 21430
    1:0 - 21516
##################################################
Results for message size (1000-IOV 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 59.03
  RPC Throughput (RPCs/sec): 61902
  RPC Latencies (us):
    Min    : 1164
    25th  %: 15544
    Median : 16104
    75th  %: 16575
    Max    : 17237
    Average: 15696
    Std Dev: 2126.37
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 15579
    1:0 - 16571
##################################################
Results for message size (0-EMPTY 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 46.93
  RPC Throughput (RPCs/sec): 49209
  RPC Latencies (us):
    Min    : 945
    25th  %: 20327
    Median : 20393
    75th  %: 20434
    Max    : 20576
    Average: 19821
    Std Dev: 2699.27
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 20393
    1:0 - 20393
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 0.00
  RPC Throughput (RPCs/sec): 65839
  RPC Latencies (us):
    Min    : 879
    25th  %: 14529
    Median : 15108
    75th  %: 15650
    Max    : 16528
    Average: 14765
    Std Dev: 2087.87
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 14569
    1:0 - 15649 |
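Two details of the self_test run can be sketched in shell. The `--endpoint` value is a rank range plus a tag, so it follows from the number of server ranks. In addition, the bandwidth figures in the sample output appear to equal the RPC throughput multiplied by the combined request and reply size, expressed in units of 2^20 bytes. The helper names `endpoint_for_servers` and `rpc_bandwidth` are hypothetical, and the bandwidth relation is inferred from the sample numbers rather than taken from the self_test documentation.

```shell
# endpoint_for_servers (hypothetical helper): build the --endpoint value
# for N server ranks on tag 0, e.g. 2 -> 0-1:0 and 4 -> 0-3:0.
endpoint_for_servers() {
    if [ "$1" -le 1 ]; then
        echo "0:0"
    else
        echo "0-$(($1 - 1)):0"
    fi
}

# rpc_bandwidth (hypothetical helper): throughput (RPCs/sec) times the
# request size plus reply size in bytes, divided by 2^20.
rpc_bandwidth() {
    awk -v rate="$1" -v req="$2" -v reply="$3" \
        'BEGIN { printf "%.2f\n", rate * (req + reply) / 1048576 }'
}

endpoint_for_servers 4          # value to pass for four servers
rpc_bandwidth 46595 1000 1000   # (1000-IOV 1000-IOV) row of the sample
rpc_bandwidth 61902 1000 0      # (1000-IOV 0-EMPTY) row of the sample
```

The last two calls print 88.87 and 59.03, which match the bandwidth reported for those message sizes in the sample output.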
...
Code Block |
---|
module load gnu-openmpi/3.1.6
or
export LD_LIBRARY_PATH=<openmpi lib path>:$LD_LIBRARY_PATH
export PATH=<openmpi bin path>:$PATH
export D_LOG_FILE=/tmp/daos_perf.log
# Single process
daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml
# MPI
orterun --enable-recovery -x D_LOG_FILE=/tmp/daos_perf_daos.log --host <host name>:4 --map-by node --mca btl_openib_warn_default_gid_prefix "0" --mca btl "tcp,self" --mca oob "tcp" --mca pml "ob1" --mca btl_tcp_if_include "eth0" --np 4 --tag-output /usr/bin/daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml
# Sample Output:
Test :
  DAOS R2S (full stack, 2 replica)
Pool :
  9c88849b-b0d6-4444-bb39-42769a7a1ef5
Parameters :
  pool size : SCM: 20480 MB, NVMe: 0 MB
  credits : -1 (sync I/O for -ve)
  obj_per_cont : 1 x 1 (procs)
  dkey_per_obj : 256
  akey_per_dkey : 64
  recx_per_akey : 16
  value type : single
  stride size : 1024
  zero copy : no
  VOS file : <NULL>
Running test=UPDATE
Running UPDATE test (iteration=1)
UPDATE successfully completed:
  duration : 91.385233 sec
  bandwith : 2.801 MB/sec
  rate : 2868.56 IO/sec
  latency : 348.607 us (nonsense if credits > 1)
Duration across processes:
  MAX duration : 91.385233 sec
  MIN duration : 91.385233 sec
  Average duration : 91.385233 sec
Completed test=UPDATE |
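The figures in the daos_perf sample output can be cross-checked from the test parameters. The sketch below assumes that the number of I/Os is the product of dkeys, akeys, and records per akey, and that "MB" means 2^20 bytes; both are inferences from the sample numbers, not statements from the daos_perf documentation. The `perf_summary` helper is a hypothetical name.

```shell
# Values taken from the command line and the sample output.
DKEYS=256            # -d 256 (dkey_per_obj)
AKEYS=64             # -a 64  (akey_per_dkey)
RECX=16              # recx_per_akey reported in the sample output
VALUE_SIZE=1024      # bytes, -s 1k (stride size)
DURATION=91.385233   # seconds, from the sample output

TOTAL_IO=$((DKEYS * AKEYS * RECX))

# perf_summary (hypothetical helper): derive rate, bandwidth and latency.
perf_summary() {
    awk -v n="$TOTAL_IO" -v size="$VALUE_SIZE" -v t="$DURATION" 'BEGIN {
        printf "total I/Os : %d\n", n
        printf "rate       : %.2f IO/sec\n", n / t
        printf "bandwidth  : %.3f MB/sec\n", n * size / 1048576 / t
        printf "latency    : %.3f us\n", t * 1000000 / n
    }'
}

perf_summary
```

With these inputs the sketch reports 262144 I/Os, 2868.56 IO/sec, 2.801 MB/sec, and 348.607 us, which agree with the UPDATE results shown in the sample output.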
...
Code Block |
---|
dmg pool create --namelabel=daos_test_pool --size=500G
# Sample output
Creating DAOS pool with automatic storage allocation: 500 GB NVMe + 6.00% SCM
Pool created with 6.00% SCM/NVMe ratio
---------------------------------------
  UUID : acf889b6-f290-4d7b-823a-5fae0014a64d
  Service Ranks : 0
  Storage Ranks : 0
  Total Size : 530 GB
  SCM : 30 GB (30 GB / rank)
  NVMe : 500 GB (500 GB / rank)
dmg pool list
# Sample output
Pool UUID                            Svc Replicas
--------------                       ----------------
acf889b6-f290-4d7b-823a-5fae0014a64d 0
DAOS_POOL=<pool uuid> (define on all clients) |
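Rather than copying the pool UUID by hand, it can be picked out of the command output. This is a minimal sketch that works on a copy of the sample output; the `create_output` variable is an illustrative name, and on a real system it would hold the captured output of the `dmg pool create` command.

```shell
# Sample `dmg pool create` output, copied from the example in this section.
create_output='Pool created with 6.00% SCM/NVMe ratio
---------------------------------------
  UUID          : acf889b6-f290-4d7b-823a-5fae0014a64d
  Service Ranks : 0
  Storage Ranks : 0'

# Keep the first UUID-shaped token found in the output.
DAOS_POOL=$(printf '%s\n' "$create_output" |
    grep -Eo '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | head -n 1)
export DAOS_POOL
echo "DAOS_POOL=$DAOS_POOL"
```

Matching on the UUID shape instead of a column position keeps the sketch independent of the exact spacing in the output.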
...
Code Block |
---|
daos cont create --type=POSIX --oclass=SX --pool=$DAOS_POOL
DAOS_CONT=<cont uuid> (define on all clients) |
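Both `DAOS_POOL` and `DAOS_CONT` have to be defined on every client. One possible way, sketched here, is to write them to a small file that each client sources. The file name `daos_env.sh` and the temporary directory standing in for a directory shared by all nodes are assumptions for illustration, and the container UUID is an all-zero placeholder rather than a real value.

```shell
# Substitute the UUIDs printed by the create commands.
DAOS_POOL=acf889b6-f290-4d7b-823a-5fae0014a64d   # pool UUID from the sample output
DAOS_CONT=00000000-0000-0000-0000-000000000000   # placeholder, not a real container

# Stand-in for a directory shared by all nodes (assumption).
ENV_DIR=$(mktemp -d)
ENV_FILE=$ENV_DIR/daos_env.sh

# Write the definitions once.
cat > "$ENV_FILE" <<EOF
export DAOS_POOL=$DAOS_POOL
export DAOS_CONT=$DAOS_CONT
EOF

# Load them on each client before running the tests.
unset DAOS_POOL DAOS_CONT
. "$ENV_FILE"
echo "pool=$DAOS_POOL cont=$DAOS_CONT"
```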
...
Code Block |
---|
# Create directory
mkdir -p /tmp/daos_dfuse/daos_test
# Use dfuse to mount the daos container to the above directory
dfuse --container $DAOS_CONT --disable-direct-iocaching --mountpoint /tmp/daos_dfuse/daos_test --pool $DAOS_POOL
# Verify that the file type is dfuse
df -h
# Sample output
dfuse 500G 17G 34G 34% /tmp/daos_dfuse/daos_test |
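The `df` check can also be scripted so that later steps only run when the mount is in place. The sketch below tests the sample line from this section; `is_dfuse_mount` is a hypothetical helper name, and it assumes the filesystem name is the first column and the mountpoint the last, as in the sample output.

```shell
# is_dfuse_mount (hypothetical helper): read `df` output on stdin and
# succeed if the given mountpoint is listed with filesystem "dfuse".
is_dfuse_mount() {
    awk -v mnt="$1" '$1 == "dfuse" && $NF == mnt { found = 1 } END { exit !found }'
}

# Demonstration on the sample line; on a client this would be
#   df -h | is_dfuse_mount /tmp/daos_dfuse/daos_test
sample='dfuse 500G 17G 34G 34% /tmp/daos_dfuse/daos_test'
if printf '%s\n' "$sample" | is_dfuse_mount /tmp/daos_dfuse/daos_test; then
    echo "mounted via dfuse"
else
    echo "not a dfuse mount"
fi
```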
...
Code Block |
---|
module load gnu-mpich/3.4~a2
or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH
# Download ior source
git clone https://github.com/hpc/ior.git
# Build IOR
cd ior
./bootstrap
mkdir build;cd build
MPICC=mpicc ../configure --with-daos=/usr --prefix=<your dir>
make
make install
# Add IOR to paths: add <your dir>/lib to LD_LIBRARY_PATH and <your dir>/bin to PATH |
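The final path step could look like the following sketch. `IOR_PREFIX` is an illustrative variable standing in for the `<your dir>` value passed to `--prefix`, and its default of `$HOME/ior` is an assumption.

```shell
# IOR_PREFIX is a stand-in for the <your dir> passed to --prefix (assumption).
IOR_PREFIX=${IOR_PREFIX:-$HOME/ior}

# Prepend the install directories; the ${VAR:+...} form avoids a trailing
# colon when LD_LIBRARY_PATH was previously empty.
export LD_LIBRARY_PATH="$IOR_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH="$IOR_PREFIX/bin:$PATH"

echo "PATH now starts with: ${PATH%%:*}"
```

Placing these lines in the shell profile of each client keeps the `ior` binary available in later sessions.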
...
Code Block |
---|
mpirun -hosts <hosts> -np 16 --ppn 16 dcp --bufsize 64MB --chunksize 128MB /tmp/daos_dfuse/daos_test daos://$DAOS_POOL/$DAOS_CONT3
# Sample output
[2021-04-29T23:55:52] Walking /tmp/daos_dfuse/daos_test
[2021-04-29T23:55:52] Walked 11 items in 0.026 secs (417.452 items/sec) ...
[2021-04-29T23:55:52] Walked 11 items in 0.026 seconds (415.641 items/sec)
[2021-04-29T23:55:52] Copying to /
[2021-04-29T23:55:52] Items: 11
[2021-04-29T23:55:52] Directories: 1
[2021-04-29T23:55:52] Files: 10
[2021-04-29T23:55:52] Links: 0
[2021-04-29T23:55:52] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-29T23:55:52] Creating 1 directories
[2021-04-29T23:55:52] Creating 10 files.
[2021-04-29T23:55:52] Copying data.
[2021-04-29T23:56:53] Copied 1.312 GiB (13%) in 61.194 secs (21.963 MiB/s) 405 secs left ...
[2021-04-29T23:58:11] Copied 6.000 GiB (60%) in 139.322 secs (44.099 MiB/s) 93 secs left ...
[2021-04-29T23:58:11] Copied 10.000 GiB (100%) in 139.322 secs (73.499 MiB/s) done
[2021-04-29T23:58:11] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Copy rate: 73.499 MiB/s (10737418240 bytes in 139.322 seconds)
[2021-04-29T23:58:11] Syncing data to disk.
[2021-04-29T23:58:11] Sync completed in 0.006 seconds.
[2021-04-29T23:58:11] Fixing permissions.
[2021-04-29T23:58:11] Updated 11 items in 0.002 seconds (4822.579 items/sec)
[2021-04-29T23:58:11] Syncing directory updates to disk.
[2021-04-29T23:58:11] Sync completed in 0.001 seconds.
[2021-04-29T23:58:11] Started: Apr-29-2021,23:55:52
[2021-04-29T23:58:11] Completed: Apr-29-2021,23:58:11
[2021-04-29T23:58:11] Seconds: 139.335
[2021-04-29T23:58:11] Items: 11
[2021-04-29T23:58:11] Directories: 1
[2021-04-29T23:58:11] Files: 10
[2021-04-29T23:58:11] Links: 0
[2021-04-29T23:58:11] Data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Rate: 73.492 MiB/s (10737418240 bytes in 139.335 seconds)
# Create directory
mkdir /tmp/datamover3
# Run
mpirun -hosts wolf-184<host> --ppn 16 -np 16 dcp --bufsize 64MB --chunksize 128MB daos://$DAOS_POOL/$DAOS_CONT3 /tmp/datamover3/
# Sample output
[2021-04-30T00:02:14] Walking /
[2021-04-30T00:02:15] Walked 12 items in 0.112 secs (107.354 items/sec) ...
[2021-04-30T00:02:15] Walked 12 items in 0.112 seconds (107.236 items/sec)
[2021-04-30T00:02:15] Copying to /tmp/datamover3
[2021-04-30T00:02:15] Items: 12
[2021-04-30T00:02:15] Directories: 2
[2021-04-30T00:02:15] Files: 10
[2021-04-30T00:02:15] Links: 0
[2021-04-30T00:02:15] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-30T00:02:15] Creating 2 directories
[2021-04-30T00:02:15] Original directory exists, skip the creation: `/tmp/datamover3/' (errno=17 File exists)
[2021-04-30T00:02:15] Creating 10 files.
[2021-04-30T00:02:15] Copying data.
[2021-04-30T00:03:15] Copied 1.938 GiB (19%) in 60.341 secs (32.880 MiB/s) 251 secs left ...
[2021-04-30T00:03:46] Copied 8.750 GiB (88%) in 91.953 secs (97.441 MiB/s) 13 secs left ...
[2021-04-30T00:03:46] Copied 10.000 GiB (100%) in 91.953 secs (111.361 MiB/s) done
[2021-04-30T00:03:46] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:46] Copy rate: 111.361 MiB/s (10737418240 bytes in 91.954 seconds)
[2021-04-30T00:03:46] Syncing data to disk.
[2021-04-30T00:03:47] Sync completed in 0.135 seconds.
[2021-04-30T00:03:47] Fixing permissions.
[2021-04-30T00:03:47] Updated 12 items in 0.000 seconds (71195.069 items/sec)
[2021-04-30T00:03:47] Syncing directory updates to disk.
[2021-04-30T00:03:47] Sync completed in 0.001 seconds.
[2021-04-30T00:03:47] Started: Apr-30-2021,00:02:15
[2021-04-30T00:03:47] Completed: Apr-30-2021,00:03:47
[2021-04-30T00:03:47] Seconds: 92.091
[2021-04-30T00:03:47] Items: 12
[2021-04-30T00:03:47] Directories: 2
[2021-04-30T00:03:47] Files: 10
[2021-04-30T00:03:47] Links: 0
[2021-04-30T00:03:47] Data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:47] Rate: 111.194 MiB/s (10737418240 bytes in 92.091 seconds)
# Verify the two directories have the same content
mjean@wolf-184:~/build> ls -la /tmp/datamover3/daos_test/
total 10485808
drwxr-xr-x 2 mjean mjean 4096 Apr 30 00:02 .
drwxr-xr-x 3 mjean mjean 4096 Apr 30 00:02 ..
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000009
mjean@wolf-184:~/build> ls -la /tmp/daos_dfuse/daos_test/
total 10485760
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000009 |
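Comparing `ls` listings confirms file names and sizes but not contents. A stricter check, sketched below, compares the two trees byte for byte with `diff -r`. The `same_content` helper is a hypothetical name, and the demonstration uses two temporary directories so that it runs without a DAOS mount.

```shell
# same_content (hypothetical helper): succeed when both directory trees
# hold the same file names with identical contents.
same_content() {
    diff -r "$1" "$2" > /dev/null
}

# Self-contained demonstration on two temporary directories.
SRC=$(mktemp -d)
DST=$(mktemp -d)
printf 'data\n' > "$SRC/testfile.00000000"
cp "$SRC/testfile.00000000" "$DST/"

if same_content "$SRC" "$DST"; then
    echo "directories match"
else
    echo "directories differ"
fi
```

For the copy shown in this section, the equivalent call would be `same_content /tmp/daos_dfuse/daos_test /tmp/datamover3/daos_test`.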
...