NOTE: These instructions are NOT to be applied to 2.0 testing; use the quickstarts in the 2.0 online documentation.

Introduction

...

For example, to use node-1 as the admin node, node-2 and node-3 as the client nodes, and node-4 and node-5 as the server nodes, these variables would be defined as:

Code Block
ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$ADMIN_NODE,$CLIENT_NODES,$SERVER_NODES
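
If a parallel shell such as pdsh or clush is available, these variables make it easy to run the same command on every node. A minimal sketch, assuming pdsh and passwordless ssh are set up (adapt to whatever parallel shell your cluster provides):

Code Block
# Confirm every node is reachable and has the DAOS packages installed
pdsh -w $ALL_NODES hostname
pdsh -w $ALL_NODES "rpm -qa | grep daos"

# Or run a command on the server nodes only (assumes the daos_server systemd service is in use)
pdsh -w $SERVER_NODES "systemctl status daos_server"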

...

Code Block
SHARED_DIR=<directory shared by all nodes>
export FI_UNIVERSE_SIZE=2048
export OFI_INTERFACE=eth0
export CRT_PHY_ADDR_STR="ofi+sockets"

# See self_test --help for more details on the parameters

# Generate the attach info file (ensure SHARED_DIR exists with permissions that allow sudo to write to it)
sudo daos_agent -o /etc/daos/daos_agent.yml -l $SHARED_DIR/daos_agent.log dump-attachinfo -o $SHARED_DIR/daos_server.attach_info_tmp

# Run (endpoint format is ranks:tags; for 4 servers use --endpoint 0-3:0)
self_test --path $SHARED_DIR --group-name daos_server --endpoint 0-1:0


# Sample output:

Adding endpoints:                                                                                   
  ranks: 0-1 (# ranks = 2)                                                                          
  tags: 0 (# tags = 1)                                                                              
Warning: No --master-endpoint specified; using this command line application as the master endpoint 
Self Test Parameters:                                                                               
  Group name to test against: daos_server                                                           
  # endpoints:                2                                                                     
  Message sizes:              [(200000-BULK_GET 200000-BULK_PUT), (200000-BULK_GET 0-EMPTY), (0-EMPTY 200000-BULK_PUT), (200000-BULK_GET 1000-IOV), (1000-IOV 200000-BULK_PUT), (1000-IOV 1000-IOV), (1000-IOV 0-EMPTY), (0-EMPTY 1000-IOV), (0-EMPTY 0-EMPTY)]                                                                                                             
  Buffer addresses end with:  <Default>                                                                                                                                               
  Repetitions per size:       20000                                                                                                                                                   
  Max inflight RPCs:          1000                                                                                                                                                    

CLI [rank=0 pid=3255]   Attached daos_server
##################################################
Results for message size (200000-BULK_GET 200000-BULK_PUT) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 222.67
        RPC Throughput (RPCs/sec): 584
        RPC Latencies (us):           
                Min    : 27191        
                25th  %: 940293       
                Median : 1678137      
                75th  %: 2416765      
                Max    : 3148987      
                Average: 1671626      
                Std Dev: 821872.40    
        RPC Failures: 0               

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 2416764                             
                1:0 - 969063                              

##################################################
Results for message size (200000-BULK_GET 0-EMPTY) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 112.08
        RPC Throughput (RPCs/sec): 588
        RPC Latencies (us):           
                Min    : 2880         
                25th  %: 1156162      
                Median : 1617356      
                75th  %: 2185604      
                Max    : 2730569      
                Average: 1659133      
                Std Dev: 605053.68    
        RPC Failures: 0               

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 2185589                             
                1:0 - 1181363                             

##################################################
Results for message size (0-EMPTY 200000-BULK_PUT) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 112.11
        RPC Throughput (RPCs/sec): 588
        RPC Latencies (us):           
                Min    : 4956         
                25th  %: 747786       
                Median : 1558111      
                75th  %: 2583834      
                Max    : 3437395      
                Average: 1659959      
                Std Dev: 1078975.59   
        RPC Failures: 0               

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 2583826                             
                1:0 - 776862                              

##################################################
Results for message size (200000-BULK_GET 1000-IOV) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 112.57
        RPC Throughput (RPCs/sec): 587
        RPC Latencies (us):           
                Min    : 2755         
                25th  %: 12341        
                Median : 1385716      
                75th  %: 3393178      
                Max    : 3399349      
                Average: 1660125      
                Std Dev: 1446054.82   
        RPC Failures: 0               

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 12343                               
                1:0 - 3393174                             

##################################################
Results for message size (1000-IOV 200000-BULK_PUT) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 112.68
        RPC Throughput (RPCs/sec): 588
        RPC Latencies (us):           
                Min    : 4557         
                25th  %: 522380       
                Median : 1640322      
                75th  %: 2725419      
                Max    : 3441963      
                Average: 1661254      
                Std Dev: 1147206.09   
        RPC Failures: 0               

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 600190                              
                1:0 - 2725402                             

##################################################
Results for message size (1000-IOV 1000-IOV) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 88.87
        RPC Throughput (RPCs/sec): 46595
        RPC Latencies (us):             
                Min    : 1165           
                25th  %: 21374          
                Median : 21473          
                75th  %: 21572          
                Max    : 21961          
                Average: 20923          
                Std Dev: 2786.99        
        RPC Failures: 0                 

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 21430                               
                1:0 - 21516                               

##################################################
Results for message size (1000-IOV 0-EMPTY) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 59.03
        RPC Throughput (RPCs/sec): 61902
        RPC Latencies (us):             
                Min    : 1164           
                25th  %: 15544          
                Median : 16104          
                75th  %: 16575          
                Max    : 17237          
                Average: 15696          
                Std Dev: 2126.37        
        RPC Failures: 0                 

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 15579                               
                1:0 - 16571                               

##################################################
Results for message size (0-EMPTY 1000-IOV) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 46.93
        RPC Throughput (RPCs/sec): 49209
        RPC Latencies (us):             
                Min    : 945            
                25th  %: 20327          
                Median : 20393
                75th  %: 20434
                Max    : 20576
                Average: 19821
                Std Dev: 2699.27
        RPC Failures: 0

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 20393
                1:0 - 20393

##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 1000):

Master Endpoint 2:0
-------------------
        RPC Bandwidth (MB/sec): 0.00
        RPC Throughput (RPCs/sec): 65839
        RPC Latencies (us):
                Min    : 879
                25th  %: 14529
                Median : 15108
                75th  %: 15650
                Max    : 16528
                Average: 14765
                Std Dev: 2087.87
        RPC Failures: 0

        Endpoint results (rank:tag - Median Latency (us)):
                0:0 - 14569
                1:0 - 15649
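
The warning above can be avoided by naming a master endpoint explicitly, and the --endpoint range scales with the server count (as noted above, 4 servers use 0-3:0). A minimal sketch of both variations; the --master-endpoint option name comes from the warning in the output, so check self_test --help for its exact argument format:

Code Block
# Use rank 0, tag 0 as the master endpoint and test against 4 servers
self_test --path $SHARED_DIR --group-name daos_server --master-endpoint 0:0 --endpoint 0-3:0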


...

Code Block
module load gnu-openmpi/3.1.6

or

export LD_LIBRARY_PATH=<openmpi lib path>:$LD_LIBRARY_PATH
export PATH=<openmpi bin path>:$PATH


export D_LOG_FILE=/tmp/daos_perf.log


# Single process
daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml



# MPI
orterun --enable-recovery -x D_LOG_FILE=/tmp/daos_perf_daos.log --host <host name>:4 --map-by node --mca btl_openib_warn_default_gid_prefix "0" --mca btl "tcp,self" --mca oob "tcp" --mca pml "ob1" --mca btl_tcp_if_include "eth0" --np 4 --tag-output /usr/bin/daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml

# Sample Output:

Test :
        DAOS R2S (full stack, 2 replica)
Pool :
        9c88849b-b0d6-4444-bb39-42769a7a1ef5
Parameters :
        pool size     : SCM: 20480 MB, NVMe: 0 MB
        credits       : -1 (sync I/O for -ve)
        obj_per_cont  : 1 x 1 (procs)
        dkey_per_obj  : 256
        akey_per_dkey : 64
        recx_per_akey : 16
        value type    : single
        stride size   : 1024
        zero copy     : no
        VOS file      : <NULL>
Running test=UPDATE
Running UPDATE test (iteration=1)
UPDATE successfully completed:
        duration : 91.385233  sec
        bandwith : 2.801      MB/sec
        rate     : 2868.56    IO/sec
        latency  : 348.607    us (nonsense if credits > 1)
Duration across processes:
        MAX duration : 91.385233  sec
        MIN duration : 91.385233  sec
        Average duration : 91.385233  sec
Completed test=UPDATE 
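
To spread the MPI run across more than one client node, the same orterun command line accepts a comma-separated host list with a slot count per host (or a hostfile). A minimal sketch, assuming node-2 and node-3 are the client nodes defined earlier and running 4 processes on each:

Code Block
orterun --enable-recovery -x D_LOG_FILE=/tmp/daos_perf_daos.log \
  --host node-2:4,node-3:4 --map-by node \
  --mca btl_openib_warn_default_gid_prefix "0" --mca btl "tcp,self" --mca oob "tcp" \
  --mca pml "ob1" --mca btl_tcp_if_include "eth0" \
  --np 8 --tag-output /usr/bin/daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml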

...

Code Block
dmg pool create --label=daos_test_pool --size=500G
  
# Sample output 
Creating DAOS pool with automatic storage allocation: 500 GB NVMe + 6.00% SCM

Pool created with 6.00% SCM/NVMe ratio
---------------------------------------

  UUID          : acf889b6-f290-4d7b-823a-5fae0014a64d
  Service Ranks : 0
  Storage Ranks : 0
  Total Size    : 530 GB
  SCM           : 30 GB (30 GB / rank)
  NVMe          : 500 GB (500 GB / rank)


dmg pool list

# Sample output
Pool UUID                            Svc Replicas
--------------                       ----------------
acf889b6-f290-4d7b-823a-5fae0014a64d 0

DAOS_POOL=<pool uuid>   # define on all clients
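
Before running any workloads it can be useful to confirm the pool state and free space from the admin node. A minimal sketch; the flag spelling can differ between DAOS releases, so check dmg pool query --help:

Code Block
# Query the pool to confirm its service ranks, targets, and free space
dmg pool query --pool=$DAOS_POOL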

...

Code Block
daos cont create --type=POSIX --oclass=SX --pool=$DAOS_POOL
DAOS_CONT=<cont uuid>   # define on all clients
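
The container can be checked from any client before mounting it. A minimal sketch, assuming the daos container query subcommand of this release:

Code Block
# Confirm the container exists and inspect its properties
daos container query --pool=$DAOS_POOL --cont=$DAOS_CONT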

...

Code Block
# Create directory
mkdir -p /tmp/daos_dfuse/daos_test

# Use dfuse to mount the daos container to the above directory
dfuse --container $DAOS_CONT --disable-caching --mountpoint /tmp/daos_dfuse/daos_test --pool $DAOS_POOL

# Verify that the mount shows up with filesystem type dfuse
df -h

# Sample output
dfuse                                                       500G   17G   34G  34% /tmp/daos_dfuse/daos_test
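
When testing is finished, the dfuse mount is removed like any other FUSE mount. A minimal sketch (dfuse uses FUSE 3, so fusermount3 is assumed to be available):

Code Block
# Unmount the container when done
fusermount3 -u /tmp/daos_dfuse/daos_test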

...

Code Block
module load gnu-mpich/3.4~a2

or 

export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH


# Download ior source 
git clone https://github.com/hpc/ior.git 

# Build IOR 
cd ior 
./bootstrap 
mkdir build;cd build 
MPICC=mpicc ../configure --with-daos=/usr --prefix=<your dir> 
make 
make install 

# Add IOR to your paths: add <your dir>/lib to LD_LIBRARY_PATH and <your dir>/bin to PATH
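# For example (a sketch; <your dir> is the --prefix value passed to configure above):
export LD_LIBRARY_PATH=<your dir>/lib:$LD_LIBRARY_PATH
export PATH=<your dir>/bin:$PATH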

...

Code Block
mpirun -hosts <hosts> -np 16 --ppn 16 dcp --bufsize 64MB --chunksize 128MB /tmp/daos_dfuse/daos_test daos://$DAOS_POOL/$DAOS_CONT3


#Sample output

[2021-04-29T23:55:52] Walking /tmp/daos_dfuse/daos_test
[2021-04-29T23:55:52] Walked 11 items in 0.026 secs (417.452 items/sec) ...
[2021-04-29T23:55:52] Walked 11 items in 0.026 seconds (415.641 items/sec)
[2021-04-29T23:55:52] Copying to /
[2021-04-29T23:55:52] Items: 11
[2021-04-29T23:55:52]   Directories: 1
[2021-04-29T23:55:52]   Files: 10
[2021-04-29T23:55:52]   Links: 0
[2021-04-29T23:55:52] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-29T23:55:52] Creating 1 directories
[2021-04-29T23:55:52] Creating 10 files.
[2021-04-29T23:55:52] Copying data.
[2021-04-29T23:56:53] Copied 1.312 GiB (13%) in 61.194 secs (21.963 MiB/s) 405 secs left ...
[2021-04-29T23:58:11] Copied 6.000 GiB (60%) in 139.322 secs (44.099 MiB/s) 93 secs left ...
[2021-04-29T23:58:11] Copied 10.000 GiB (100%) in 139.322 secs (73.499 MiB/s) done
[2021-04-29T23:58:11] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Copy rate: 73.499 MiB/s (10737418240 bytes in 139.322 seconds)
[2021-04-29T23:58:11] Syncing data to disk.
[2021-04-29T23:58:11] Sync completed in 0.006 seconds.
[2021-04-29T23:58:11] Fixing permissions.
[2021-04-29T23:58:11] Updated 11 items in 0.002 seconds (4822.579 items/sec)
[2021-04-29T23:58:11] Syncing directory updates to disk.
[2021-04-29T23:58:11] Sync completed in 0.001 seconds.
[2021-04-29T23:58:11] Started: Apr-29-2021,23:55:52
[2021-04-29T23:58:11] Completed: Apr-29-2021,23:58:11
[2021-04-29T23:58:11] Seconds: 139.335
[2021-04-29T23:58:11] Items: 11
[2021-04-29T23:58:11]   Directories: 1
[2021-04-29T23:58:11]   Files: 10
[2021-04-29T23:58:11]   Links: 0
[2021-04-29T23:58:11] Data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Rate: 73.492 MiB/s (10737418240 bytes in 139.335 seconds)


# Create directory
mkdir /tmp/datamover3

# Run
mpirun -hosts <hosts> --ppn 16 -np 16 dcp --bufsize 64MB --chunksize 128MB daos://$DAOS_POOL/$DAOS_CONT3 /tmp/datamover3/

# Sample output
[2021-04-30T00:02:14] Walking /
[2021-04-30T00:02:15] Walked 12 items in 0.112 secs (107.354 items/sec) ...
[2021-04-30T00:02:15] Walked 12 items in 0.112 seconds (107.236 items/sec)
[2021-04-30T00:02:15] Copying to /tmp/datamover3
[2021-04-30T00:02:15] Items: 12
[2021-04-30T00:02:15]   Directories: 2
[2021-04-30T00:02:15]   Files: 10
[2021-04-30T00:02:15]   Links: 0
[2021-04-30T00:02:15] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-30T00:02:15] Creating 2 directories
[2021-04-30T00:02:15] Original directory exists, skip the creation: `/tmp/datamover3/' (errno=17 File exists)
[2021-04-30T00:02:15] Creating 10 files.
[2021-04-30T00:02:15] Copying data.
[2021-04-30T00:03:15] Copied 1.938 GiB (19%) in 60.341 secs (32.880 MiB/s) 251 secs left ...
[2021-04-30T00:03:46] Copied 8.750 GiB (88%) in 91.953 secs (97.441 MiB/s) 13 secs left ...
[2021-04-30T00:03:46] Copied 10.000 GiB (100%) in 91.953 secs (111.361 MiB/s) done
[2021-04-30T00:03:46] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:46] Copy rate: 111.361 MiB/s (10737418240 bytes in 91.954 seconds)
[2021-04-30T00:03:46] Syncing data to disk.
[2021-04-30T00:03:47] Sync completed in 0.135 seconds.
[2021-04-30T00:03:47] Fixing permissions.
[2021-04-30T00:03:47] Updated 12 items in 0.000 seconds (71195.069 items/sec)
[2021-04-30T00:03:47] Syncing directory updates to disk.
[2021-04-30T00:03:47] Sync completed in 0.001 seconds.
[2021-04-30T00:03:47] Started: Apr-30-2021,00:02:15
[2021-04-30T00:03:47] Completed: Apr-30-2021,00:03:47
[2021-04-30T00:03:47] Seconds: 92.091
[2021-04-30T00:03:47] Items: 12
[2021-04-30T00:03:47]   Directories: 2
[2021-04-30T00:03:47]   Files: 10
[2021-04-30T00:03:47]   Links: 0
[2021-04-30T00:03:47] Data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:47] Rate: 111.194 MiB/s (10737418240 bytes in 92.091 seconds)


# Verify the two directories have the same content
mjean@wolf-184:~/build> ls -la /tmp/datamover3/daos_test/
total 10485808
drwxr-xr-x 2 mjean mjean       4096 Apr 30 00:02 .
drwxr-xr-x 3 mjean mjean       4096 Apr 30 00:02 ..
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000009
mjean@wolf-184:~/build> ls -la /tmp/daos_dfuse/daos_test/
total 10485760
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000009
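
The listings above only show that the file sizes match; a checksum comparison gives a stronger check that the copied data is identical. A minimal sketch using standard tools on a client that can see both directories:

Code Block
# Compare checksums of the original files against the copies
(cd /tmp/daos_dfuse/daos_test && md5sum testfile.*) > /tmp/daos_test.md5
(cd /tmp/datamover3/daos_test && md5sum -c /tmp/daos_test.md5)

# Or compare the two trees directly
diff -r /tmp/daos_dfuse/daos_test /tmp/datamover3/daos_test && echo "directories match"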

...