Introduction
The purpose of this Quick Start is to provide users with a set of command lines to quickly set up and use DAOS with POSIX on openSUSE/SLES 15.2.
This document covers installing the DAOS RPMs on openSUSE/SLES 15.2 and updating the configuration files needed by the DAOS servers.
The quick start describes how to use dfuse to take advantage of DAOS support for POSIX. It steps users through running benchmarking tools such as ior and mdtest, gives some examples of how to move data between a POSIX file system and DAOS containers (and vice versa), and finally cleans up the DAOS setup.
Requirements
The quick start requires at least one server node with PMEM and SSDs, one client node, and one admin node. The client and admin nodes do not need PMEM or SSDs. All nodes must be connected to the InfiniBand storage network.
All nodes must have a base openSUSE or SLES 15.2 installation.
Install pdsh on the admin node
sudo zypper install pdsh
For example, to use node-1 as the admin node, node-2 and node-3 as client nodes, and node-4 and node-5 as server nodes, define the node variables as follows:
ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$ADMIN_NODE,$CLIENT_NODES,$SERVER_NODES
If a client node also serves as the admin node, exclude $ADMIN_NODE from the ALL_NODES assignment to avoid listing it twice, e.g.:
ALL_NODES=$CLIENT_NODES,$SERVER_NODES
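Because one host can fill more than one role, it may be easier to build ALL_NODES with a small helper that drops duplicates and stray spaces. The sketch below is a hypothetical POSIX shell function (join_hosts is not part of DAOS or pdsh). It assumes comma-separated host lists, which is the form pdsh -w accepts.

```shell
# Hypothetical helper (not part of DAOS): merge comma-separated host lists,
# dropping spaces, empty entries and duplicates while keeping first-seen order.
join_hosts() {
    printf '%s\n' "$@" | tr ',' '\n' | tr -d ' ' | awk 'NF && !seen[$0]++' | paste -sd, -
}

ADMIN_NODE=node-1
CLIENT_NODES=node-2,node-3
SERVER_NODES=node-4,node-5
ALL_NODES=$(join_hosts "$ADMIN_NODE" "$CLIENT_NODES" "$SERVER_NODES")
echo "$ALL_NODES"    # node-1,node-2,node-3,node-4,node-5

# Example use: run a command on every node
# pdsh -w "$ALL_NODES" hostname
```

If the admin node is also a client, passing it twice is harmless because the second occurrence is dropped.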
Set-Up
Please refer here for the initial setup, which consists of RPM installation, certificate generation and setup, configuration file setup, and starting the servers and agents.
For this quick start, the daos-tests package must be installed on the client nodes.
The following applications are run from a client node.
An asterisk (*) after a section title indicates that the command line is for internal use only; such command lines should be removed for external customers.
Run CaRT self_test
SHARED_DIR=<directory shared by all nodes>
export FI_UNIVERSE_SIZE=2048
export OFI_INTERFACE=eth0
export CRT_PHY_ADDR_STR="ofi+sockets"
# See self_test --help for more details on the parameters

# Generate the attach info file (SHARED_DIR needs permissions that allow sudo to write to it)
sudo daos_agent -o /etc/daos/daos_agent.yml -l $SHARED_DIR/daos_agent.log dump-attachinfo -o $SHARED_DIR/daos_server.attach_info_tmp

# Run (--endpoint takes ranks:tags; e.g. for 4 servers use --endpoint 0-3:0):
self_test --path $SHARED_DIR --group-name daos_server --endpoint 0-1:0

# Sample output:
Adding endpoints:
  ranks: 0-1 (# ranks = 2)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints: 2
  Message sizes: [(200000-BULK_GET 200000-BULK_PUT), (200000-BULK_GET 0-EMPTY), (0-EMPTY 200000-BULK_PUT), (200000-BULK_GET 1000-IOV), (1000-IOV 200000-BULK_PUT), (1000-IOV 1000-IOV), (1000-IOV 0-EMPTY), (0-EMPTY 1000-IOV), (0-EMPTY 0-EMPTY)]
  Buffer addresses end with: <Default>
  Repetitions per size: 20000
  Max inflight RPCs: 1000

CLI [rank=0 pid=3255] Attached daos_server
##################################################
Results for message size (200000-BULK_GET 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 222.67
  RPC Throughput (RPCs/sec): 584
  RPC Latencies (us):
    Min    : 27191
    25th  %: 940293
    Median : 1678137
    75th  %: 2416765
    Max    : 3148987
    Average: 1671626
    Std Dev: 821872.40
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 2416764
    1:0 - 969063
##################################################
Results for message size (200000-BULK_GET 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.08
  RPC Throughput (RPCs/sec): 588
  RPC Latencies (us):
    Min    : 2880
    25th  %: 1156162
    Median : 1617356
    75th  %: 2185604
    Max    : 2730569
    Average: 1659133
    Std Dev: 605053.68
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 2185589
    1:0 - 1181363
##################################################
Results for message size (0-EMPTY 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.11
  RPC Throughput (RPCs/sec): 588
  RPC Latencies (us):
    Min    : 4956
    25th  %: 747786
    Median : 1558111
    75th  %: 2583834
    Max    : 3437395
    Average: 1659959
    Std Dev: 1078975.59
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 2583826
    1:0 - 776862
##################################################
Results for message size (200000-BULK_GET 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.57
  RPC Throughput (RPCs/sec): 587
  RPC Latencies (us):
    Min    : 2755
    25th  %: 12341
    Median : 1385716
    75th  %: 3393178
    Max    : 3399349
    Average: 1660125
    Std Dev: 1446054.82
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 12343
    1:0 - 3393174
##################################################
Results for message size (1000-IOV 200000-BULK_PUT) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 112.68
  RPC Throughput (RPCs/sec): 588
  RPC Latencies (us):
    Min    : 4557
    25th  %: 522380
    Median : 1640322
    75th  %: 2725419
    Max    : 3441963
    Average: 1661254
    Std Dev: 1147206.09
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 600190
    1:0 - 2725402
##################################################
Results for message size (1000-IOV 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 88.87
  RPC Throughput (RPCs/sec): 46595
  RPC Latencies (us):
    Min    : 1165
    25th  %: 21374
    Median : 21473
    75th  %: 21572
    Max    : 21961
    Average: 20923
    Std Dev: 2786.99
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 21430
    1:0 - 21516
##################################################
Results for message size (1000-IOV 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 59.03
  RPC Throughput (RPCs/sec): 61902
  RPC Latencies (us):
    Min    : 1164
    25th  %: 15544
    Median : 16104
    75th  %: 16575
    Max    : 17237
    Average: 15696
    Std Dev: 2126.37
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 15579
    1:0 - 16571
##################################################
Results for message size (0-EMPTY 1000-IOV) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 46.93
  RPC Throughput (RPCs/sec): 49209
  RPC Latencies (us):
    Min    : 945
    25th  %: 20327
    Median : 20393
    75th  %: 20434
    Max    : 20576
    Average: 19821
    Std Dev: 2699.27
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 20393
    1:0 - 20393
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 1000):
Master Endpoint 2:0
-------------------
  RPC Bandwidth (MB/sec): 0.00
  RPC Throughput (RPCs/sec): 65839
  RPC Latencies (us):
    Min    : 879
    25th  %: 14529
    Median : 15108
    75th  %: 15650
    Max    : 16528
    Average: 14765
    Std Dev: 2087.87
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 14569
    1:0 - 15649
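The --endpoint value is a rank range followed by a tag (ranks:tags). Assuming one rank per server with ranks numbered contiguously from 0, as the 2- and 4-server examples above do, the argument can be derived from the server count. self_test_endpoint is a hypothetical helper, not part of CaRT.

```shell
# Hypothetical helper: build the self_test --endpoint argument for ranks
# 0..N-1 and one tag (default 0). Assumes contiguous rank numbering.
self_test_endpoint() {
    # $1: number of server ranks, $2: optional tag
    echo "0-$(( $1 - 1 )):${2:-0}"
}

# Example:
# self_test --path $SHARED_DIR --group-name daos_server --endpoint "$(self_test_endpoint 4)"
```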
Run DAOS PERF*
(requires openmpi3 - libmpi.so.40)
module load gnu-openmpi/3.1.6
# or
export LD_LIBRARY_PATH=<openmpi lib path>:$LD_LIBRARY_PATH
export PATH=<openmpi bin path>:$PATH

export D_LOG_FILE=/tmp/daos_perf.log

# Single process
daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml

# MPI
orterun --enable-recovery -x D_LOG_FILE=/tmp/daos_perf_daos.log \
    --host <host name>:4 --map-by node \
    --mca btl_openib_warn_default_gid_prefix "0" --mca btl "tcp,self" \
    --mca oob "tcp" --mca pml "ob1" --mca btl_tcp_if_include "eth0" \
    --np 4 --tag-output \
    /usr/bin/daos_perf -a 64 -d 256 -c R2S -P 20G -T daos -s 1k -R "U;pV" -g /etc/daos/daos_control.yml

# Sample output:
Test : DAOS R2S (full stack, 2 replica)
Pool : 9c88849b-b0d6-4444-bb39-42769a7a1ef5
Parameters :
    pool size     : SCM: 20480 MB, NVMe: 0 MB
    credits       : -1 (sync I/O for -ve)
    obj_per_cont  : 1 x 1 (procs)
    dkey_per_obj  : 256
    akey_per_dkey : 64
    recx_per_akey : 16
    value type    : single
    stride size   : 1024
    zero copy     : no
    VOS file      : <NULL>
Running test=UPDATE
Running UPDATE test (iteration=1)
UPDATE successfully completed:
    duration : 91.385233 sec
    bandwith : 2.801 MB/sec
    rate     : 2868.56 IO/sec
    latency  : 348.607 us (nonsense if credits > 1)
Duration across processes:
    MAX duration : 91.385233 sec
    MIN duration : 91.385233 sec
    Average duration : 91.385233 sec
Completed test=UPDATE
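The figures in the sample daos_perf output are consistent with simple arithmetic. Assuming each record counts as one update, -d 256 dkeys, -a 64 akeys and 16 records per akey (recx_per_akey) give 256 × 64 × 16 = 262144 updates. Dividing by the duration reproduces the rate, rate × 1 KiB in MiB/s matches the reported bandwidth, and duration per update matches the latency. perf_check is a hypothetical consistency check and is not part of daos_perf.

```shell
# Hypothetical consistency check for daos_perf numbers (not part of DAOS).
# Args: dkeys akeys recx_per_akey value_size_bytes duration_seconds
perf_check() {
    awk -v d="$1" -v a="$2" -v r="$3" -v sz="$4" -v t="$5" 'BEGIN {
        ops  = d * a * r      # assumed: one update per record
        rate = ops / t        # updates per second
        printf "ops=%d rate=%.2f IO/s bw=%.3f MiB/s latency=%.3f us\n", ops, rate, rate * sz / 1048576, t * 1e6 / ops
    }'
}

perf_check 256 64 16 1024 91.385233
# ops=262144 rate=2868.56 IO/s bw=2.801 MiB/s latency=348.607 us
```

The match with the reported bandwidth suggests that the "MB/sec" label in the sample output is computed with 2^20-byte units.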
Run DAOS_RACER*
(requires openmpi3 - libmpi.so.40)
module load gnu-openmpi/3.1.6
# or
export LD_LIBRARY_PATH=<openmpi lib path>:$LD_LIBRARY_PATH
export PATH=<openmpi bin path>:$PATH

# Run:
export D_LOG_FILE=/tmp/daos_racer.log
export D_LOG_MASK=ERR
/usr/bin/daos_racer -n /etc/daos/daos_control.yml

NOTE: daos_racer is currently disabled due to DAOS-7359.
Run DAOS_TEST*
(requires openmpi3 - libmpi.so.40)
module load gnu-openmpi/3.1.6
# or
export LD_LIBRARY_PATH=<openmpi lib path>:$LD_LIBRARY_PATH
export PATH=<openmpi bin path>:$PATH

# Run:
export OFI_INTERFACE=eth0
export POOL_SCM_SIZE=8G
export POOL_NVME_SIZE=16G
daos_test -pctVAKCoRb -n /etc/daos/daos_control.yml

# Sample output from -p (pool test)
=================
DAOS pool tests..
=====================
[==========] Running 14 test(s).
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool b91606bb-87dd-4a57-8eee-5d8747a37b31
[ RUN      ] POOL1: connect to non-existing pool
[       OK ] POOL1: connect to non-existing pool
[ RUN      ] POOL2: connect/disconnect to pool
rank 0 connecting to pool synchronously ... success
rank 0 querying pool info... success
rank 0 disconnecting from pool synchronously ... success
rank 0 success
[       OK ] POOL2: connect/disconnect to pool
[ RUN      ] POOL3: connect/disconnect to pool (async)
rank 0 connecting to pool asynchronously ... success
rank 0 querying pool info... success
rank 0 disconnecting from pool asynchronously ... success
rank 0 success
[       OK ] POOL3: connect/disconnect to pool (async)
[ RUN      ] POOL4: pool handle local2global and global2local
rank 0 connecting to pool synchronously ... success
rank 0 querying pool info... success
rank 0 call local2global on pool handlesuccess
rank 0 broadcast global pool handle ...success
rank 0 disconnecting from pool synchronously ... success
rank 0 success
[       OK ] POOL4: pool handle local2global and global2local
[ RUN      ] POOL5: exclusive connection
SUBTEST 1: other connections already exist; shall get -1012
establishing a non-exclusive connection
trying to establish an exclusive connection
disconnecting the non-exclusive connection
SUBTEST 2: no other connections; shall succeed
establishing an exclusive connection
SUBTEST 3: shall prevent other connections (-1012)
trying to establish a non-exclusive connection
disconnecting the exclusive connection
[       OK ] POOL5: exclusive connection
[ RUN      ] POOL6: exclude targets and query pool info
Skip it for now, because CaRT can't support subgroup membership, excluding a node w/o killing it will cause IV issue.
[       OK ] POOL6: exclude targets and query pool info
[ RUN      ] POOL7: set/get/list user-defined pool attributes (sync)
setup: connecting to pool
connected to pool, ntarget=16
setting pool attributes synchronously ...
listing pool attributes synchronously ...
Verifying Total Name Length..
Verifying Small Name..
Verifying All Names..
getting pool attributes synchronously ...
Verifying Name-Value (A)..
Verifying Name-Value (B)..
Verifying with NULL buffer..
Deleting all attributes
Verifying all attributes deletion
[       OK ] POOL7: set/get/list user-defined pool attributes (sync)
[ RUN      ] POOL8: set/get/list user-defined pool attributes (async)
setting pool attributes asynchronously ...
listing pool attributes asynchronously ...
Verifying Total Name Length..
Verifying Small Name..
Verifying All Names..
getting pool attributes asynchronously ...
Verifying Name-Value (A)..
Verifying Name-Value (B)..
Verifying with NULL buffer..
Deleting all attributes
Verifying all attributes deletion
[       OK ] POOL8: set/get/list user-defined pool attributes (async)
[ RUN      ] POOL9: pool reconnect after daos re-init
connected to pool, ntarget=16
[       OK ] POOL9: pool reconnect after daos re-init
[ RUN      ] POOL10: pool create with properties and query
create pool with properties, and query it to verify.
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 6b686dce-8cc3-40e6-b706-4c5aedfee4a9
setup: connecting to pool
connected to pool, ntarget=16
ACL prop matches expected defaults
teardown: destroyed pool 6b686dce-8cc3-40e6-b706-4c5aedfee4a9
[       OK ] POOL10: pool create with properties and query
[ RUN      ] POOL11: pool list containers (zero)
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool f3bc2fb8-17aa-45cc-9b1a-87ffebddb048
setup: connected to pool: f3bc2fb8-17aa-45cc-9b1a-87ffebddb048
daos_pool_list_cont returned rc=0
success t0: output nconts=0
verifying conts[0..10], nfilled=0
success t1: conts[] over-sized
success t2: nconts=0, non-NULL conts[] rc=0
success t3: in &nconts NULL, -DER_INVAL
success
teardown: destroyed pool f3bc2fb8-17aa-45cc-9b1a-87ffebddb048
[       OK ] POOL11: pool list containers (zero)
[ RUN      ] POOL12: pool list containers (many)
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool d0fbd1a8-2061-428c-9cfb-376f201689fc
setup: connected to pool: d0fbd1a8-2061-428c-9cfb-376f201689fc
setup: alloc lcarg->conts len 16
setup: creating container: b52e36e6-26db-4e51-9bbe-df776624ca59
setup: creating container: 645212d3-3fad-446d-be2a-c362b0884555
setup: creating container: 4a2ab45a-2adb-4c1e-8d03-2c7f06ad0721
setup: creating container: 07cbf32b-beaf-444a-af64-5602aa41485f
setup: creating container: 427824a9-a6c7-4417-83c8-1fc2f5e49c83
setup: creating container: 898ffab6-9472-47cc-a6ca-9a6fe54704c0
setup: creating container: d4d3181c-7376-41fa-8f81-fb363b7b8e2e
setup: creating container: c1253a14-991d-4e2a-a3cf-ff2e4b160240
setup: creating container: 66439972-fd1e-4330-a4b4-e5ee6f6c2b06
setup: creating container: 09a4cdca-b4d0-4d3c-9898-75a7a5adbd45
setup: creating container: 480f2232-58c9-444e-bff5-36e2efa3609d
setup: creating container: d33ae63c-eeb4-411f-8ca4-c70eb86371dc
setup: creating container: d9b3eb0e-7ef3-4e8b-8bd6-8e757a39d0cb
setup: creating container: f789b108-96f1-4052-ad22-c6fb5b02cd19
setup: creating container: 2c83b9d1-a723-49da-912b-63ffe3e43930
setup: creating container: 31e5df3b-809f-47d8-b170-68329b687361
daos_pool_list_cont returned rc=0
success t0: output nconts=16
verifying conts[0..26], nfilled=16
container 645212d3-3fad-446d-be2a-c362b0884555 found in list result
container 4a2ab45a-2adb-4c1e-8d03-2c7f06ad0721 found in list result
container d9b3eb0e-7ef3-4e8b-8bd6-8e757a39d0cb found in list result
container b52e36e6-26db-4e51-9bbe-df776624ca59 found in list result
container 09a4cdca-b4d0-4d3c-9898-75a7a5adbd45 found in list result
container 2c83b9d1-a723-49da-912b-63ffe3e43930 found in list result
container d33ae63c-eeb4-411f-8ca4-c70eb86371dc found in list result
container 427824a9-a6c7-4417-83c8-1fc2f5e49c83 found in list result
container 66439972-fd1e-4330-a4b4-e5ee6f6c2b06 found in list result
container f789b108-96f1-4052-ad22-c6fb5b02cd19 found in list result
container 480f2232-58c9-444e-bff5-36e2efa3609d found in list result
container 07cbf32b-beaf-444a-af64-5602aa41485f found in list result
container 31e5df3b-809f-47d8-b170-68329b687361 found in list result
container 898ffab6-9472-47cc-a6ca-9a6fe54704c0 found in list result
container c1253a14-991d-4e2a-a3cf-ff2e4b160240 found in list result
container d4d3181c-7376-41fa-8f81-fb363b7b8e2e found in list result
success t1: conts[] over-sized
success t2: nconts=0, non-NULL conts[] rc=0
success t3: in &nconts NULL, -DER_INVAL
verifying conts[0..16], nfilled=16
container 645212d3-3fad-446d-be2a-c362b0884555 found in list result
container 4a2ab45a-2adb-4c1e-8d03-2c7f06ad0721 found in list result
container d9b3eb0e-7ef3-4e8b-8bd6-8e757a39d0cb found in list result
container b52e36e6-26db-4e51-9bbe-df776624ca59 found in list result
container 09a4cdca-b4d0-4d3c-9898-75a7a5adbd45 found in list result
container 2c83b9d1-a723-49da-912b-63ffe3e43930 found in list result
container d33ae63c-eeb4-411f-8ca4-c70eb86371dc found in list result
container 427824a9-a6c7-4417-83c8-1fc2f5e49c83 found in list result
container 66439972-fd1e-4330-a4b4-e5ee6f6c2b06 found in list result
container f789b108-96f1-4052-ad22-c6fb5b02cd19 found in list result
container 480f2232-58c9-444e-bff5-36e2efa3609d found in list result
container 07cbf32b-beaf-444a-af64-5602aa41485f found in list result
container 31e5df3b-809f-47d8-b170-68329b687361 found in list result
container 898ffab6-9472-47cc-a6ca-9a6fe54704c0 found in list result
container c1253a14-991d-4e2a-a3cf-ff2e4b160240 found in list result
container d4d3181c-7376-41fa-8f81-fb363b7b8e2e found in list result
success t4: conts[] exact length
verifying conts[0..15], nfilled=0
success t5: conts[] under-sized
success
teardown: destroy container: b52e36e6-26db-4e51-9bbe-df776624ca59
teardown: destroy container: 645212d3-3fad-446d-be2a-c362b0884555
teardown: destroy container: 4a2ab45a-2adb-4c1e-8d03-2c7f06ad0721
teardown: destroy container: 07cbf32b-beaf-444a-af64-5602aa41485f
teardown: destroy container: 427824a9-a6c7-4417-83c8-1fc2f5e49c83
teardown: destroy container: 898ffab6-9472-47cc-a6ca-9a6fe54704c0
teardown: destroy container: d4d3181c-7376-41fa-8f81-fb363b7b8e2e
teardown: destroy container: c1253a14-991d-4e2a-a3cf-ff2e4b160240
teardown: destroy container: 66439972-fd1e-4330-a4b4-e5ee6f6c2b06
teardown: destroy container: 09a4cdca-b4d0-4d3c-9898-75a7a5adbd45
teardown: destroy container: 480f2232-58c9-444e-bff5-36e2efa3609d
teardown: destroy container: d33ae63c-eeb4-411f-8ca4-c70eb86371dc
teardown: destroy container: d9b3eb0e-7ef3-4e8b-8bd6-8e757a39d0cb
teardown: destroy container: f789b108-96f1-4052-ad22-c6fb5b02cd19
teardown: destroy container: 2c83b9d1-a723-49da-912b-63ffe3e43930
teardown: destroy container: 31e5df3b-809f-47d8-b170-68329b687361
teardown: destroyed pool d0fbd1a8-2061-428c-9cfb-376f201689fc
[       OK ] POOL12: pool list containers (many)
[ RUN      ] POOL13: retry POOL_{CONNECT,DISCONNECT,QUERY}
setting DAOS_POOL_CONNECT_FAIL_CORPC ... success
connecting to pool ... success
setting DAOS_POOL_QUERY_FAIL_CORPC ... success
querying pool info... success
setting DAOS_POOL_DISCONNECT_FAIL_CORPC ... success
disconnecting from pool ... success
[       OK ] POOL13: retry POOL_{CONNECT,DISCONNECT,QUERY}
[ RUN      ] POOL14: pool connect access based on ACL
pool ACL gives the owner no permissions
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 86a9951c-08f2-4ad0-a4ad-037f8b85a656
setup: connecting to pool
daos_pool_connect failed, rc: -1001
failed to connect pool: -1001
pool disconnect failed: -1002
teardown: destroyed pool 86a9951c-08f2-4ad0-a4ad-037f8b85a656
pool ACL gives the owner RO, they want RW
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 15d9c7f9-140e-436d-88d0-2d5924348725
setup: connecting to pool
daos_pool_connect failed, rc: -1001
failed to connect pool: -1001
pool disconnect failed: -1002
teardown: destroyed pool 15d9c7f9-140e-436d-88d0-2d5924348725
pool ACL gives the owner RO, they want RO
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool b6ce2f69-94a8-4fa7-8c43-b35766a0fb6f
setup: connecting to pool
connected to pool, ntarget=16
teardown: destroyed pool b6ce2f69-94a8-4fa7-8c43-b35766a0fb6f
pool ACL gives the owner RW, they want RO
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 992015b0-7c70-458e-afa5-eecae4ad5313
setup: connecting to pool
connected to pool, ntarget=16
teardown: destroyed pool 992015b0-7c70-458e-afa5-eecae4ad5313
pool ACL gives the owner RW, they want RW
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 464bfccf-2ec9-4105-b791-f1163960dc0d
setup: connecting to pool
connected to pool, ntarget=16
teardown: destroyed pool 464bfccf-2ec9-4105-b791-f1163960dc0d
[       OK ] POOL14: pool connect access based on ACL
teardown: destroyed pool b91606bb-87dd-4a57-8eee-5d8747a37b31
[==========] 14 test(s) run.
[  PASSED  ] 14 test(s).
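daos_test prints a cmocka-style summary for each suite: "[==========] N test(s) run." followed by "[  PASSED  ] N test(s).". When the output is saved to a file, a check like the one below can flag a run where fewer tests passed than ran. It is a hypothetical helper that assumes this summary format and sums the counts across all suites in the log.

```shell
# Hypothetical check (not part of DAOS): read daos_test output on stdin and
# succeed only if the total "test(s) run" equals the total "PASSED" count.
all_tests_passed() {
    awk '
        /test\(s\) run\./ { for (i = 2; i <= NF; i++) if ($i == "test(s)")  { ran += $(i - 1); seen = 1 } }
        /\[ *PASSED *\]/  { for (i = 2; i <= NF; i++) if ($i == "test(s).") passed += $(i - 1) }
        END { exit !(seen && ran == passed) }
    '
}

# Example:
# daos_test -p -n /etc/daos/daos_control.yml | tee daos_test.log
# all_tests_passed < daos_test.log && echo "all tests passed"
```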
Create pool and dfuse mountpoint for IOR, MDTEST and DATAMOVER
Create pool
dmg pool create --label=daos_test_pool --size=500G

# Sample output
Creating DAOS pool with automatic storage allocation: 500 GB NVMe + 6.00% SCM
Pool created with 6.00% SCM/NVMe ratio
---------------------------------------
  UUID          : acf889b6-f290-4d7b-823a-5fae0014a64d
  Service Ranks : 0
  Storage Ranks : 0
  Total Size    : 530 GB
  SCM           : 30 GB (30 GB / rank)
  NVMe          : 500 GB (500 GB / rank)

dmg pool list

# Sample output
Pool UUID                            Svc Replicas
--------------                       ----------------
acf889b6-f290-4d7b-823a-5fae0014a64d 0

# Define on all client nodes:
DAOS_POOL=<pool uuid>
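The sizes in the sample output follow from the requested NVMe size and the 6.00% SCM ratio: 6% of 500 GB is 30 GB of SCM, for 530 GB in total. The hypothetical helper below only reproduces this arithmetic. It does not query dmg, and actual allocations may be rounded per rank.

```shell
# Hypothetical helper: SCM and total sizes implied by an NVMe size (GB)
# and an SCM/NVMe ratio in percent, as reported by dmg pool create above.
pool_sizes() {
    awk -v n="$1" -v p="$2" 'BEGIN { s = n * p / 100; printf "SCM=%g GB Total=%g GB\n", s, n + s }'
}

pool_sizes 500 6    # SCM=30 GB Total=530 GB
```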
Create container
daos cont create --type=POSIX --oclass=SX $DAOS_POOL

# Define on all client nodes:
DAOS_CONT=<cont uuid>
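Instead of copying UUIDs by hand, they can be captured from command output. The hypothetical helper below prints the first UUID-shaped string it finds on stdin. It makes no assumption about the surrounding text, only that the UUID is in the usual 8-4-4-4-12 hexadecimal form, like the pool UUID shown above.

```shell
# Hypothetical helper: print the first UUID (8-4-4-4-12 hex digits) found on stdin.
first_uuid() {
    grep -oE '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}' | head -n 1
}

# Example (assumes the command prints the new container's UUID):
# DAOS_CONT=$(daos cont create --type=POSIX --oclass=SX $DAOS_POOL | first_uuid)
```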
Set up dfuse mount point:
(Run dfuse on all client nodes.)
# Create the mount point directory
mkdir -p /tmp/daos_dfuse/daos_test

# Use dfuse to mount the DAOS container on the directory above
dfuse --container $DAOS_CONT --disable-direct-io --mountpoint /tmp/daos_dfuse/daos_test --pool $DAOS_POOL

# Verify that the file system type is dfuse
df -h

# Sample output
dfuse 500G 17G 34G 34% /tmp/daos_dfuse/daos_test
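The df -h check can also be scripted. The hypothetical helper below looks for an entry in /proc/mounts whose source field and mount point match. It assumes the source name is dfuse, as in the df output above, and accepts an optional table file so it can be tried without a live mount.

```shell
# Hypothetical helper: succeed if MOUNTPOINT is listed with source SOURCE
# in the mount table (default /proc/mounts; fields: source mountpoint type ...).
is_mounted_from() {
    awk -v s="$1" -v m="$2" '$1 == s && $2 == m { found = 1 } END { exit !found }' "${3:-/proc/mounts}"
}

# Example:
# is_mounted_from dfuse /tmp/daos_dfuse/daos_test && echo "dfuse is mounted"
```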
Run IOR
(uses mpich in the examples)
Build IOR
(Requires MPICH added to PATH and LD_LIBRARY_PATH)
module load gnu-mpich/3.4~a2
# or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH

# Download the IOR source
git clone https://github.com/hpc/ior.git

# Build IOR
cd ior
./bootstrap
mkdir build; cd build
MPICC=mpicc ../configure --with-daos=/usr --prefix=<your dir>
make
make install

# Add IOR to the search paths
export LD_LIBRARY_PATH=<your dir>/lib:$LD_LIBRARY_PATH
export PATH=<your dir>/bin:$PATH
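The `export VAR=<dir>:$VAR` lines above leave a trailing colon when VAR starts out empty, and an empty entry in PATH (and, for the dynamic loader, in LD_LIBRARY_PATH) is treated as the current directory. The sketch below is a hypothetical POSIX shell helper that adds the separator only when needed. IOR_PREFIX is a placeholder standing in for <your dir>.

```shell
# Hypothetical helper: prepend DIR to the colon-separated variable NAME,
# adding ':' only if NAME is already non-empty, then export NAME.
prepend_path() {
    # $1: variable name, $2: directory to prepend
    eval "_pp_cur=\${$1-}"
    if [ -n "$_pp_cur" ]; then
        eval "$1=\$2:\$_pp_cur"
    else
        eval "$1=\$2"
    fi
    export "$1"
}

# Example (IOR_PREFIX is a placeholder for <your dir>):
# prepend_path LD_LIBRARY_PATH "$IOR_PREFIX/lib"
# prepend_path PATH "$IOR_PREFIX/bin"
```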
module load gnu-mpich/3.4~a2
# or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH

# Run:
mpirun -np 20 -ppn 10 -hosts host1,host2 ior -a POSIX -b 1G -v -w -W -r -R -k -F -T 10 -i 1 -s 1 -o /tmp/daos_dfuse/daos_test/testfile -t 1G

# Sample output
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began               : Thu Apr 29 16:31:55 2021
Command line        : ior -a POSIX -b 1G -v -w -W -r -R -k -F -T 10 -i 1 -s 1 -o /tmp/daos_dfuse/daos_test/testfile -t 1G
Machine             : Linux wolf-184
Start time skew across all tasks: 0.00 sec
TestID              : 0
StartTime           : Thu Apr 29 16:31:55 2021
Path                : /tmp/daos_dfuse/daos_test/testfile.00000000
FS                  : 493.6 GiB   Used FS: 6.6%   Inodes: -0.0 Mi   Used Inodes: 0.0%
Participating tasks : 20

Options:
api                 : POSIX
apiVersion          :
test filename       : /tmp/daos_dfuse/daos_test/testfile
access              : file-per-process
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
nodes               : 2
tasks               : 20
clients per node    : 10
repetitions         : 1
xfersize            : 1 GiB
blocksize           : 1 GiB
aggregate filesize  : 20 GiB
verbose             : 1

Results:

access  bw(MiB/s)  IOPS      Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------  ---------  ----      ----------  ----------  ---------  --------  --------  --------  --------  ----
Commencing write performance test: Thu Apr 29 16:31:56 2021
write   224.18     0.218924  91.34       1048576     1048576    0.024130  91.36     0.000243  91.36     0
Verifying contents of the file(s) just written.
Thu Apr 29 16:33:27 2021

Commencing read performance test: Thu Apr 29 16:34:59 2021
read    223.26     0.218024  91.60       1048576     1048576    0.137707  91.73     0.000087  91.73     0
Max Write: 224.18 MiB/sec (235.07 MB/sec)
Max Read:  223.26 MiB/sec (234.10 MB/sec)

Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 224.18 224.18 224.18 0.00 0.22 0.22 0.22 0.00 91.35618 NA NA 0 20 10 1 1 0 1 0 0 1 1073741824 1073741824 20480.0 POSIX 0
read 223.26 223.26 223.26 0.00 0.22 0.22 0.22 0.00 91.73292 NA NA 0 20 10 1 1 0 1 0 0 1 1073741824 1073741824 20480.0 POSIX 0
Finished            : Thu Apr 29 16:36:31 2021
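The IOR numbers can be sanity-checked by hand. With -F (file per process), 20 tasks, -s 1 segment and -b 1G, the aggregate size is 20 × 1 × 1024 MiB = 20480 MiB, which matches aggs(MiB) in the summary. Dividing that by the Mean(s) time reproduces the reported bandwidth. The helper names below are hypothetical.

```shell
# Hypothetical helpers for checking IOR results.
# Aggregate size in MiB: tasks * segments * block size in MiB.
ior_aggregate_mib() {
    echo $(( $1 * $2 * $3 ))
}
# Bandwidth in MiB/s: aggregate MiB / seconds.
ior_bw_mibs() {
    awk -v mib="$1" -v s="$2" 'BEGIN { printf "%.2f\n", mib / s }'
}

agg=$(ior_aggregate_mib 20 1 1024)   # 20480
ior_bw_mibs "$agg" 91.35618          # 224.18 (write)
ior_bw_mibs "$agg" 91.73292          # 223.26 (read)
```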
Run MDTEST
module load gnu-mpich/3.4~a2
# or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH

# Create 10000 files
# Run:
mpirun -np 20 -ppn 10 -hosts host1,host2 mdtest -a POSIX -z 0 -N 1 -P -i 1 -n 500 -e 4096 -d /tmp/daos_dfuse/daos_test -w 4096

# Sample output
-- started at 04/29/2021 17:09:02 --

mdtest-3.4.0+dev was launched with 20 total task(s) on 2 node(s)
Command line used: mdtest '-a' 'POSIX' '-z' '0' '-N' '1' '-P' '-i' '1' '-n' '500' '-e' '4096' '-d' '/tmp/daos_dfuse/daos_test' '-w' '4096'
Path: /tmp/daos_dfuse
FS: 36.5 GiB   Used FS: 86.5%   Inodes: 2.3 Mi   Used Inodes: 9.7%

Nodemap: 11111111110000000000
V-0: Rank 0 Line 2216 Shifting ranks by 10 for each phase.
20 tasks, 10000 files/directories

SUMMARY rate: (of 1 iterations)
   Operation                 Max        Min        Mean       Std Dev
   ---------                 ---        ---        ----       -------
   Directory creation   :    2249.104   2249.085   2249.094   0.009
   Directory stat       :    4089.121   4089.116   4089.118   0.001
   Directory removal    :    318.313    318.311    318.312    0.000
   File creation        :    1080.348   1080.334   1080.341   0.007
   File stat            :    1676.635   1676.619   1676.632   0.004
   File read            :    1486.296   1486.291   1486.295   0.002
   File removal         :    611.135    611.132    611.133    0.001
   Tree creation        :    667.967    667.967    667.967    0.000
   Tree removal         :    18.063     18.063     18.063     0.000

SUMMARY time: (of 1 iterations)
   Operation                 Max        Min        Mean       Std Dev
   ---------                 ---        ---        ----       -------
   Directory creation   :    4.446      4.446      4.446      0.000
   Directory stat       :    2.446      2.446      2.446      0.000
   Directory removal    :    31.416     31.416     31.416     0.000
   File creation        :    9.256      9.256      9.256      0.000
   File stat            :    5.964      5.964      5.964      0.000
   File read            :    6.728      6.728      6.728      0.000
   File removal         :    16.363     16.363     16.363     0.000
   Tree creation        :    0.001      0.001      0.001      0.000
   Tree removal         :    0.055      0.055      0.055      0.000
-- finished at 04/29/2021 17:10:19 --

# Create 1000000 files
# Run:
mpirun -np 20 -ppn 10 -hosts host1,host2 mdtest -a POSIX -z 0 -N 1 -P -i 1 -n 50000 -e 4096 -d /tmp/daos_dfuse/daos_test -w 4096
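The file counts in the mdtest comments are the number of tasks times the -n value (items per task): 20 × 500 = 10,000 and 20 × 50,000 = 1,000,000. The hypothetical helpers below compute the total for a run and, in reverse, the -n needed to reach a target total.

```shell
# Hypothetical helpers for sizing mdtest runs.
# Total items created: tasks * items per task (-n).
mdtest_items() {
    echo $(( $1 * $2 ))
}
# Items per task (-n) needed for a target total, rounded up.
mdtest_n_for() {
    # $1: tasks, $2: target total
    echo $(( ($2 + $1 - 1) / $1 ))
}

mdtest_items 20 500        # 10000
mdtest_n_for 20 1000000    # 50000
```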
Run DBENCH
module load gnu-mpich/3.4~a2
# or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH

# Run:
dbench -c /usr/share/dbench/client.txt -t 10 -D /tmp/daos_dfuse/daos_test 10

# Sample output
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004

Running for 10 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 2 secs
10 clients started
  10       131    58.09 MB/sec  warmup   1 sec
  10       401    42.01 MB/sec  execute   1 sec
  10       538    43.39 MB/sec  execute   2 sec
  10       682    37.23 MB/sec  execute   3 sec
  10       803    28.65 MB/sec  execute   4 sec
  10       898    23.36 MB/sec  execute   5 sec
  10       980    19.87 MB/sec  execute   6 sec
  10      1074    18.43 MB/sec  execute   7 sec
  10      1161    16.25 MB/sec  execute   8 sec
  10      1240    14.52 MB/sec  execute   9 sec
  10      1367    15.67 MB/sec  cleanup  10 sec
  10      1367    14.25 MB/sec  cleanup  11 sec
  10      1367    14.08 MB/sec  cleanup  11 sec

Throughput 15.6801 MB/sec 10 procs
Run DATAMOVER
For more details on the data mover, see:
https://github.com/hpc/mpifileutils/blob/master/DAOS-Support.md
# Create a POSIX container
daos container create --pool $DAOS_POOL --type POSIX
DAOS_CONT2=<cont uuid>

# Copy a POSIX directory to the POSIX container (only directory copies are supported in DAOS 1.2)
daos filesystem copy --src /tmp/daos_dfuse/daos_test --dst daos://$DAOS_POOL/$DAOS_CONT2

# Copy the same POSIX container to a different POSIX directory
daos filesystem copy --src daos://$DAOS_POOL/$DAOS_CONT2 --dst /tmp/datamover2
ls -la /tmp/datamover2/daos_test/

# Sample output
ls -la /tmp/daos_dfuse/daos_test/
total 10485760
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000009

ls -la /tmp/datamover2/daos_test/
total 10485808
drwx------ 2 mjean mjean       4096 Apr 29 17:45 .
drwx------ 3 mjean mjean       4096 Apr 29 17:43 ..
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:44 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:44 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:43 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:45 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:44 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:44 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:44 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:43 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:44 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 17:45 testfile.00000009
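Comparing ls -la listings only shows that names and sizes match. The timestamps differ, as the listings show, and file contents are not checked. diff -r compares file names and contents across two trees and ignores timestamps and modes. The wrapper name below is hypothetical.

```shell
# Hypothetical helper: succeed if two directory trees contain the same
# file names and identical file contents (timestamps and modes are ignored).
same_tree() {
    diff -r -q "$1" "$2" > /dev/null
}

# Example:
# same_tree /tmp/daos_dfuse/daos_test /tmp/datamover2/daos_test && echo "copies match"
```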
Run DATAMOVER using mpifileutils
Build mpifileutils
mpifileutils can be built either against dependencies installed from packages or against dependencies built from source.
For more details on mpifileutils, see:
https://github.com/hpc/mpifileutils/blob/master/DAOS-Support.md
# Install the following packages:
sudo zypper install mpich-devel libbz2-devel

# Set up the environment (on the launch node)
# Set up the MPICH environment
module load gnu-mpich/3.4~a2
# or
export LD_LIBRARY_PATH=<mpich lib path>:$LD_LIBRARY_PATH
export PATH=<mpich bin path>:$PATH
export MPI_HOME=<mpich path>
Build mpifileutils with dependencies installed from packages
# Install build dependencies (on all client nodes)
sudo zypper install dtcmp-mpich-devel libcircle-mpich-devel libcap-devel

# Build mpifileutils from installed packages (on all client nodes)
git clone --depth 1 https://github.com/hpc/mpifileutils
mkdir build install
cd build
cmake ../mpifileutils -DENABLE_DAOS=ON \
    -DENABLE_LIBARCHIVE=OFF \
    -DDTCMP_INCLUDE_DIRS=/usr/lib64/mpi/gcc/mpich/include \
    -DDTCMP_LIBRARIES=/usr/lib64/mpi/gcc/mpich/lib64/libdtcmp.so \
    -DLibCircle_INCLUDE_DIRS=/usr/lib64/mpi/gcc/mpich/include \
    -DLibCircle_LIBRARIES=/usr/lib64/mpi/gcc/mpich/lib64/libcircle.so \
    -DWITH_CART_PREFIX=/usr \
    -DWITH_DAOS_PREFIX=/usr \
    -DCMAKE_INSTALL_INCLUDEDIR=/usr/lib64/mpi/gcc/mpich/include \
    -DCMAKE_INSTALL_PREFIX=/usr/lib64/mpi/gcc/mpich/ \
    -DCMAKE_INSTALL_LIBDIR=/usr/lib64/mpi/gcc/mpich/lib64
sudo make install
Build mpifileutils with dependencies that are built from source
mkdir install
installdir=`pwd`/install
export CC=mpicc

# Download dependencies and build
mkdir deps
cd deps
wget https://github.com/hpc/libcircle/releases/download/v0.3/libcircle-0.3.0.tar.gz
wget https://github.com/llnl/lwgrp/releases/download/v1.0.3/lwgrp-1.0.3.tar.gz
wget https://github.com/llnl/dtcmp/releases/download/v1.1.1/dtcmp-1.1.1.tar.gz
wget https://github.com/libarchive/libarchive/releases/download/3.5.1/libarchive-3.5.1.tar.gz

tar -zxf libcircle-0.3.0.tar.gz
cd libcircle-0.3.0
./configure --prefix=$installdir
make install
cd ..

tar -zxf lwgrp-1.0.3.tar.gz
cd lwgrp-1.0.3
./configure --prefix=$installdir
make install
cd ..

tar -zxf dtcmp-1.1.1.tar.gz
cd dtcmp-1.1.1
./configure --prefix=$installdir --with-lwgrp=$installdir
make install
cd ..

tar -zxf libarchive-3.5.1.tar.gz
cd libarchive-3.5.1
./configure --prefix=$installdir
make install
cd ..
cd ..

# Download mpifileutils and build (install/ was already created above)
git clone --depth 1 https://github.com/hpc/mpifileutils
mkdir build
cd build
cmake ../mpifileutils \
    -DWITH_DTCMP_PREFIX=../install \
    -DWITH_LibCircle_PREFIX=../install \
    -DDTCMP_INCLUDE_DIRS=../install/include \
    -DDTCMP_LIBRARIES=../install/lib64/libdtcmp.so \
    -DLibCircle_INCLUDE_DIRS=../install/include \
    -DLibCircle_LIBRARIES=../install/lib64/libcircle.so \
    -DCMAKE_INSTALL_PREFIX=../install \
    -DWITH_CART_PREFIX=/usr \
    -DWITH_DAOS_PREFIX=/usr \
    -DENABLE_DAOS=ON
make install
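The cmake command above assumes the dependency libraries were installed under install/lib64. Depending on the platform's autoconf defaults, `./configure --prefix=...` may place them under lib instead, so it may be worth checking before running cmake. The hypothetical helper below prints whichever of the two paths exists.

```shell
# Hypothetical helper: print PREFIX/lib64/NAME or PREFIX/lib/NAME,
# whichever exists first; fail if neither does.
find_lib() {
    # $1: install prefix, $2: library file name
    for d in lib64 lib; do
        if [ -e "$1/$d/$2" ]; then
            echo "$1/$d/$2"
            return 0
        fi
    done
    return 1
}

# Example, run from the build directory:
# cmake ../mpifileutils -DDTCMP_LIBRARIES="$(find_lib ../install libdtcmp.so)" ...
```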
Create two POSIX containers for the mpifileutils test cases
daos container create --pool $DAOS_POOL --type POSIX
DAOS_CONT3=<cont uuid>
daos container create --pool $DAOS_POOL --type POSIX
DAOS_CONT4=<cont uuid>
Run DAOS copy (dcp)
mpirun -hosts <hosts> -np 16 --ppn 16 dcp --bufsize 64MB --chunksize 128MB /tmp/daos_dfuse/daos_test daos://$DAOS_POOL/$DAOS_CONT3
# Sample output
[2021-04-29T23:55:52] Walking /tmp/daos_dfuse/daos_test
[2021-04-29T23:55:52] Walked 11 items in 0.026 secs (417.452 items/sec) ...
[2021-04-29T23:55:52] Walked 11 items in 0.026 seconds (415.641 items/sec)
[2021-04-29T23:55:52] Copying to /
[2021-04-29T23:55:52] Items: 11
[2021-04-29T23:55:52] Directories: 1
[2021-04-29T23:55:52] Files: 10
[2021-04-29T23:55:52] Links: 0
[2021-04-29T23:55:52] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-29T23:55:52] Creating 1 directories
[2021-04-29T23:55:52] Creating 10 files.
[2021-04-29T23:55:52] Copying data.
[2021-04-29T23:56:53] Copied 1.312 GiB (13%) in 61.194 secs (21.963 MiB/s) 405 secs left ...
[2021-04-29T23:58:11] Copied 6.000 GiB (60%) in 139.322 secs (44.099 MiB/s) 93 secs left ...
[2021-04-29T23:58:11] Copied 10.000 GiB (100%) in 139.322 secs (73.499 MiB/s) done
[2021-04-29T23:58:11] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Copy rate: 73.499 MiB/s (10737418240 bytes in 139.322 seconds)
[2021-04-29T23:58:11] Syncing data to disk.
[2021-04-29T23:58:11] Sync completed in 0.006 seconds.
[2021-04-29T23:58:11] Fixing permissions.
[2021-04-29T23:58:11] Updated 11 items in 0.002 seconds (4822.579 items/sec)
[2021-04-29T23:58:11] Syncing directory updates to disk.
[2021-04-29T23:58:11] Sync completed in 0.001 seconds.
[2021-04-29T23:58:11] Started: Apr-29-2021,23:55:52
[2021-04-29T23:58:11] Completed: Apr-29-2021,23:58:11
[2021-04-29T23:58:11] Seconds: 139.335
[2021-04-29T23:58:11] Items: 11
[2021-04-29T23:58:11] Directories: 1
[2021-04-29T23:58:11] Files: 10
[2021-04-29T23:58:11] Links: 0
[2021-04-29T23:58:11] Data: 10.000 GiB (10737418240 bytes)
[2021-04-29T23:58:11] Rate: 73.492 MiB/s (10737418240 bytes in 139.335 seconds)
# Create directory
mkdir /tmp/datamover3
# Run
mpirun -hosts <hosts> --ppn 16 -np 16 dcp --bufsize 64MB --chunksize 128MB daos://$DAOS_POOL/$DAOS_CONT3 /tmp/datamover3/
# Sample output
[2021-04-30T00:02:14] Walking /
[2021-04-30T00:02:15] Walked 12 items in 0.112 secs (107.354 items/sec) ...
[2021-04-30T00:02:15] Walked 12 items in 0.112 seconds (107.236 items/sec)
[2021-04-30T00:02:15] Copying to /tmp/datamover3
[2021-04-30T00:02:15] Items: 12
[2021-04-30T00:02:15] Directories: 2
[2021-04-30T00:02:15] Files: 10
[2021-04-30T00:02:15] Links: 0
[2021-04-30T00:02:15] Data: 10.000 GiB (1.000 GiB per file)
[2021-04-30T00:02:15] Creating 2 directories
[2021-04-30T00:02:15] Original directory exists, skip the creation: `/tmp/datamover3/' (errno=17 File exists)
[2021-04-30T00:02:15] Creating 10 files.
[2021-04-30T00:02:15] Copying data.
[2021-04-30T00:03:15] Copied 1.938 GiB (19%) in 60.341 secs (32.880 MiB/s) 251 secs left ...
[2021-04-30T00:03:46] Copied 8.750 GiB (88%) in 91.953 secs (97.441 MiB/s) 13 secs left ...
[2021-04-30T00:03:46] Copied 10.000 GiB (100%) in 91.953 secs (111.361 MiB/s) done
[2021-04-30T00:03:46] Copy data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:46] Copy rate: 111.361 MiB/s (10737418240 bytes in 91.954 seconds)
[2021-04-30T00:03:46] Syncing data to disk.
[2021-04-30T00:03:47] Sync completed in 0.135 seconds.
[2021-04-30T00:03:47] Fixing permissions.
[2021-04-30T00:03:47] Updated 12 items in 0.000 seconds (71195.069 items/sec)
[2021-04-30T00:03:47] Syncing directory updates to disk.
[2021-04-30T00:03:47] Sync completed in 0.001 seconds.
[2021-04-30T00:03:47] Started: Apr-30-2021,00:02:15
[2021-04-30T00:03:47] Completed: Apr-30-2021,00:03:47
[2021-04-30T00:03:47] Seconds: 92.091
[2021-04-30T00:03:47] Items: 12
[2021-04-30T00:03:47] Directories: 2
[2021-04-30T00:03:47] Files: 10
[2021-04-30T00:03:47] Links: 0
[2021-04-30T00:03:47] Data: 10.000 GiB (10737418240 bytes)
[2021-04-30T00:03:47] Rate: 111.194 MiB/s (10737418240 bytes in 92.091 seconds)
# Verify the two directories have the same content
ls -la /tmp/datamover3/daos_test/
# Sample output
total 10485808
drwxr-xr-x 2 mjean mjean 4096 Apr 30 00:02 .
drwxr-xr-x 3 mjean mjean 4096 Apr 30 00:02 ..
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 30 00:03 testfile.00000009
ls -la /tmp/daos_dfuse/daos_test/
# Sample output
total 10485760
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000000
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000001
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000002
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000003
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000004
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000005
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000006
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000007
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000008
-rw-r--r-- 1 mjean mjean 1073741824 Apr 29 16:31 testfile.00000009
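Comparing `ls -la` listings only confirms that names and sizes match. A stricter check compares a checksum of every file in both trees. The sketch below assumes GNU coreutils `sha256sum` is available on the client node (it is part of the base openSUSE/SLES install) and that file names contain no newlines. `compare_trees` is a hypothetical helper. Reading 10 GiB back through dfuse takes a while.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if both directory trees contain the same
# relative file paths with identical contents (SHA-256 of every regular file).
# Prints the differing checksum lines (diff output) on mismatch.
compare_trees() {
    local a=$1 b=$2
    # Refuse to compare missing directories: two empty listings would "match"
    [ -d "$a" ] && [ -d "$b" ] || return 2
    diff <(cd "$a" && find . -type f -exec sha256sum {} + | sort -k 2) \
         <(cd "$b" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Example:
# compare_trees /tmp/daos_dfuse/daos_test /tmp/datamover3/daos_test && echo "contents match"
```

Sorting on the path field makes the comparison independent of the order in which `find` walks each tree.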
Clean Up
Remove datamover tmp directories
rm -rf /tmp/datamover2
rm -rf /tmp/datamover3
Remove dfuse mountpoint:
# unmount dfuse
pdsh -w $CLIENT_NODES 'fusermount3 -uz /tmp/daos_dfuse/daos_test'
# remove mount dir
pdsh -w $CLIENT_NODES 'rm -rf /tmp/daos_dfuse'
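If the clean up is re-run, `fusermount3` reports an error for a mount point that is already gone. The sketch below unmounts only when the path is listed in `/proc/mounts`. It is meant to run locally on each client node (for example from a script copied to the clients), assumes Linux `/proc/mounts`, and expects the mount point without a trailing slash. `unmount_if_mounted` is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: lazily unmount a FUSE mount point only if it is
# currently listed in /proc/mounts, so repeating the clean up is harmless.
unmount_if_mounted() {
    local mnt=$1
    # Field 2 of /proc/mounts is the mount point; awk exits 0 only on a match
    # (and non-zero if /proc/mounts cannot be read)
    if awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' /proc/mounts 2>/dev/null; then
        fusermount3 -uz "$mnt"
        echo "unmounted $mnt"
    else
        echo "$mnt is not mounted, skipping"
    fi
}

# Example, on each client node:
# unmount_if_mounted /tmp/daos_dfuse/daos_test
```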
Destroy Containers:
# destroy container1
daos container destroy --pool $DAOS_POOL --cont $DAOS_CONT
# destroy container2
daos container destroy --pool $DAOS_POOL --cont $DAOS_CONT2
# destroy container3
daos container destroy --pool $DAOS_POOL --cont $DAOS_CONT3
# destroy container4
daos container destroy --pool $DAOS_POOL --cont $DAOS_CONT4
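The four destroy commands differ only in the variable holding the container UUID. The sketch below loops over the variable names and skips any that were never set, so a partially completed run can still be cleaned up. `destroy_containers` and the `DRY_RUN` switch are hypothetical additions of this sketch; the `daos container destroy` invocation itself is the one shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: destroy every container whose UUID is stored in one of
# the named shell variables, skipping variables that are unset or empty.
# Set DRY_RUN=1 to print the commands instead of running them.
destroy_containers() {
    local var cont
    for var in "$@"; do
        # Indirect expansion: value of the variable whose name is in $var
        cont=${!var:-}
        if [ -z "$cont" ]; then
            echo "skipping $var (not set)" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "daos container destroy --pool $DAOS_POOL --cont $cont"
        else
            daos container destroy --pool "$DAOS_POOL" --cont "$cont"
        fi
    done
}

# Example:
# DRY_RUN=1 destroy_containers DAOS_CONT DAOS_CONT2 DAOS_CONT3 DAOS_CONT4
# destroy_containers DAOS_CONT DAOS_CONT2 DAOS_CONT3 DAOS_CONT4
```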
Destroy Pool:
# destroy pool
dmg pool destroy --pool $DAOS_POOL
Stop Agents:
# stop agents
pdsh -S -w $CLIENT_NODES "sudo systemctl stop daos_agent"
Stop Servers:
# stop servers
pdsh -S -w $SERVER_NODES "sudo systemctl stop daos_server"
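To confirm the services actually stopped, `systemctl is-active` can be run through pdsh, which prefixes each output line with the host name and a colon. The sketch below reads that output and lists every host whose unit is reported as anything other than `inactive`. `report_not_inactive` is a hypothetical helper. Hosts that pdsh cannot reach print nothing on stdout and are therefore not flagged, so check pdsh's own error messages as well.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read "host: state" lines (pdsh output of
# `systemctl is-active <unit>`) and print each host whose state is not
# "inactive". Exits non-zero if any such host is found.
report_not_inactive() {
    awk -F': ' '$2 != "inactive" { print $1 ": " $2; bad = 1 } END { exit bad }'
}

# Example:
# pdsh -w $CLIENT_NODES "systemctl is-active daos_agent" | report_not_inactive
# pdsh -w $SERVER_NODES "systemctl is-active daos_server" | report_not_inactive
```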