NOTE: THESE ARE NOT TO BE APPLIED TO 2.0 TESTING. USE THE QUICKSTARTS IN THE 2.0 ON-LINE DOCUMENTATION.
...
This documentation provides a general tour of the DAOS management commands (dmg) for daos_admin users and the DAOS tools (daos) for daos_client users. It covers pool and container create, list, query and destroy on an environment with a 2 hosts DAOS server and a 1 host DAOS client, with examples of dmg and daos command options. Frequent errors that new users might see when using the dmg and daos tools are listed together with workarounds. Example runs of data transfer on DAOS file systems are provided, including setting up a DAOS dfuse mount point and running traffic with dfuse fio and mpirun mdtest. Example runs of DAOS rebuild over dfuse fio and mpirun mdtest on a 4 hosts DAOS server are also provided, with outputs.
Requirements
Set environment variables for the list of servers, the client node and the admin node.
Code Block | ||
---|---|---|
| ||
# Example of 2 hosts server
# For 1 host server, export SERVER_NODES=node-1
export SERVER_NODES=node-1,node-2
# Example to use admin and client on the same node
export ADMIN_NODE=node-3
export CLIENT_NODE=node-3
export ALL_NODES=$SERVER_NODES,$CLIENT_NODE |
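Because the later steps rely on these variables, a quick sanity check can catch a missing or empty value early. The following is a minimal sketch, not part of the DAOS tooling: `count_nodes` is a hypothetical helper, and the node names are the placeholder values from the example above.

```shell
# Placeholder node names taken from the example above.
export SERVER_NODES=node-1,node-2
export ADMIN_NODE=node-3
export CLIENT_NODE=node-3
export ALL_NODES=$SERVER_NODES,$CLIENT_NODE

# count_nodes (hypothetical helper): print how many entries a
# comma-separated node list contains.
count_nodes() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Fail early if any variable used later in this tour is empty.
for name in SERVER_NODES ADMIN_NODE CLIENT_NODE ALL_NODES; do
    eval "value=\${$name}"
    if [ -z "$value" ]; then
        echo "ERROR: $name is not set" >&2
        exit 1
    fi
done

echo "servers: $(count_nodes "$SERVER_NODES"), all nodes: $(count_nodes "$ALL_NODES")"
```

With the values above, the server list has two entries and `ALL_NODES` has three.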
...
dmg system query
Code Block | ||
---|---|---|
| ||
# system query output for a 2 hosts DAOS server
$ dmg system query
Rank State
---- -----
[0-1] Joined |
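When scripting against this output, it can help to turn the table into a pass/fail check. The sketch below assumes the two-column, non-verbose layout shown above; `all_ranks_joined` is a hypothetical helper name, and the sample text stands in for a live `dmg system query` call.

```shell
# all_ranks_joined (hypothetical helper): read non-verbose
# `dmg system query` output on stdin and succeed only if every
# listed rank set is in the Joined state.
all_ranks_joined() {
    awk '
        NR > 2 && NF >= 2 {          # skip the header and the dashed rule
            rows++
            if ($2 != "Joined") bad++
        }
        END {
            if (rows > 0 && bad == 0) exit 0
            exit 1
        }'
}

# Sample text copied from the output above; on a live system one
# would pipe the command instead:  dmg system query | all_ranks_joined
sample='Rank  State
----  -----
[0-1] Joined'

if printf '%s\n' "$sample" | all_ranks_joined; then
    echo "all ranks joined"
else
    echo "some ranks are not joined"
fi
```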
...
dmg storage query usage
Code Block | ||
---|---|---|
| ||
# system storage query usage output for a 2 hosts DAOS server
$ dmg storage query usage
Hosts   SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----   --------- -------- -------- ---------- --------- ---------
boro-35 17 GB     17 GB    0 %      0 B        0 B       N/A
boro-8  17 GB     17 GB    0 %      0 B        0 B       N/A |
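The free SCM per host is the figure that matters when sizing a pool, so it can be convenient to pull just that column out. This is a rough sketch that assumes the column layout shown above, where each size is printed as a number followed by a unit; `scm_free_per_host` is a hypothetical helper, not a dmg feature.

```shell
# scm_free_per_host (hypothetical helper): read `dmg storage query usage`
# output on stdin and print "<host> <SCM-Free value> <unit>" per host.
# Assumes every size is printed as "<number> <unit>" (for example "17 GB"),
# so SCM-Free is the 4th and 5th whitespace-separated field.
scm_free_per_host() {
    awk 'NR > 2 && NF >= 5 { print $1, $4, $5 }'
}

# Sample text copied from the output above.
sample='Hosts   SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----   --------- -------- -------- ---------- --------- ---------
boro-35 17 GB     17 GB    0 %      0 B        0 B       N/A
boro-8  17 GB     17 GB    0 %      0 B        0 B       N/A'

printf '%s\n' "$sample" | scm_free_per_host
```

On a live system the same function could be fed with `dmg storage query usage | scm_free_per_host`.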
dmg pool create help
Code Block | ||
---|---|---|
| ||
$ dmg pool create --help
Usage:
  dmg [OPTIONS] pool create [create-OPTIONS]
Application Options:
      --allow-proxy   Allow proxy configuration via environment
  -l, --host-list=    comma separated list of addresses <ipv4addr/hostname>
  -i, --insecure      have dmg attempt to connect without certificates
  -d, --debug         enable debug output
  -j, --json          Enable JSON output
  -J, --json-logging  Enable JSON-formatted log output
  -o, --config-path=  Client config file path
Help Options:
  -h, --help          Show this help message
[create command options]
  -g, --group=        DAOS pool to be owned by given group, format name@domain
  -u, --user=         DAOS pool to be owned by given user, format name@domain
  -p, --name=         Unique name for pool (set as label)
  -a, --acl-file=     Access Control List file path for DAOS pool
  -z, --size=         Total size of DAOS pool (auto)
  -t, --scm-ratio=    Percentage of SCM:NVMe for pool storage (auto) (default: 6)
  -k, --nranks=       Number of ranks to use (auto)
  -v, --nsvc=         Number of pool service replicas
  -s, --scm-size=     Per-server SCM allocation for DAOS pool (manual)
  -n, --nvme-size=    Per-server NVMe allocation for DAOS pool (manual)
  -r, --ranks=        Storage server unique identifiers (ranks) for DAOS pool
  -S, --sys=          DAOS system that pool is to be a part of (default: daos_server) |
dmg pool create
Code Block | ||
---|---|---|
| ||
# Create a 10GB pool
$ dmg pool create --size=10G
Creating DAOS pool with automatic storage allocation: 10 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
UUID : 0a6003c6-23a7-4cb5-8895-c004ca2b75f5
Service Ranks : 0
Storage Ranks : [0-1]
Total Size : 10 GB
SCM : 10 GB (5.0 GB / rank)
NVMe : 0 B (0 B / rank)
$ dmg storage query usage
Hosts SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
----- --------- -------- -------- ---------- --------- ---------
boro-35 17 GB 12 GB 29 % 0 B 0 B N/A
boro-8 17 GB 11 GB 36 % 0 B 0 B N/A |
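The later `daos` commands in this tour reference the pool through `$DAOS_POOL`. One way to set it is to capture the UUID from the create output. The sketch below assumes the `UUID : <value>` line format shown above; `pool_uuid_from_create` is a hypothetical helper, and the sample text replaces a live `dmg pool create` call.

```shell
# pool_uuid_from_create (hypothetical helper): read `dmg pool create`
# output on stdin and print the value of its "UUID" line.
pool_uuid_from_create() {
    awk '$1 == "UUID" && $2 == ":" { print $3; exit }'
}

# Sample text copied from the output above.
sample='Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : 0a6003c6-23a7-4cb5-8895-c004ca2b75f5
  Service Ranks : 0'

DAOS_POOL=$(printf '%s\n' "$sample" | pool_uuid_from_create)
export DAOS_POOL
echo "DAOS_POOL=$DAOS_POOL"
```

On a live system the capture would look like `DAOS_POOL=$(dmg pool create --size=10G | pool_uuid_from_create)`.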
...
Code Block | ||
---|---|---|
| ||
$ daos cont query --pool=$DAOS_POOL --cont=$DAOS_CONT
Pool UUID: 528f4710-7eb8-4850-b6aa-09e4b3c8f532
Container UUID: bc4fe707-7470-4b7d-83bf-face75cc98fc
Number of snapshots: 0
Latest Persistent Snapshot: 0
Highest Aggregated Epoch: 172477977191481344
Container redundancy factor: 1 |
daos container snapshot help
Code Block | ||
---|---|---|
| ||
$ daos help cont
daos command (v1.2), libdaos 1.2.0
container (cont) commands:
  create         create a container
  clone          clone a container
  destroy        destroy a container
  list-objects   list all objects in container
  list-obj
  query          query a container
  get-prop       get all container's properties
  set-prop       set container's properties
  get-acl        get a container's ACL
  overwrite-acl  replace a container's ACL
  update-acl     add/modify entries in a container's ACL
  delete-acl     delete an entry from a container's ACL
  set-owner      change the user and/or group that own a container
  stat           get container statistics
  check          check objects consistency in container
  list-attrs     list container user-defined attributes
  del-attr       delete container user-defined attribute
  get-attr       get container user-defined attribute
  set-attr       set container user-defined attribute
  create-snap    create container snapshot (optional name)
                 at most recent committed epoch
  list-snaps     list container snapshots taken
  destroy-snap   destroy container snapshots
                 by name, epoch or range
  rollback       roll back container to specified snapshot
use 'daos help cont|container COMMAND' for command specific options
$ daos help cont create-snap
daos command (v1.2), libdaos 1.2.0
container options (snapshot and rollback-related):
  --snap=NAME      container snapshot (create/destroy-snap, rollback)
  --epc=EPOCHNUM   container epoch (destroy-snap, rollback)
  --epcrange=B-E   container epoch range (destroy-snap)
container options (query, and all commands except create):
  <pool options>   with --cont use: (--pool, --sys-name)
  <pool options>   with --path use: (--sys-name)
  --cont=UUID      (mandatory, or use --path)
  --path=PATHSTR |
daos container snapshot create/list/destroy
Code Block | ||
---|---|---|
| ||
$ daos cont create-snap --pool=$DAOS_POOL --cont=$DAOS_CONT
snapshot/epoch 172646116775952384 has been created
$ daos container list-snaps --pool=$DAOS_POOL --cont=$DAOS_CONT
Container's snapshots : 172478166024060928 172646116775952384
$ daos container destroy-snap --pool=$DAOS_POOL --cont=$DAOS_CONT --epc=172646116775952384
$ daos container list-snaps --pool=$DAOS_POOL --cont=$DAOS_CONT
Container's snapshots : 172478166024060928 |
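The `--epc` value passed to `destroy-snap` above is copied by hand from the `list-snaps` output. As a sketch, the most recent epoch can be picked out automatically. This assumes the epochs are printed as whitespace-separated decimal numbers, as in the output above; `latest_snapshot_epoch` is a hypothetical helper.

```shell
# latest_snapshot_epoch (hypothetical helper): read
# `daos container list-snaps` output on stdin and print the
# numerically largest epoch it contains.
latest_snapshot_epoch() {
    # Put every whitespace-separated token on its own line, keep the
    # purely numeric ones, then sort and take the last.
    tr -s ' \t' '\n\n' | grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# Sample text copied from the output above.
sample="Container's snapshots : 172478166024060928 172646116775952384"

printf '%s\n' "$sample" | latest_snapshot_epoch
```

The epochs are sorted with `sort -n` rather than compared in awk, since 18-digit values exceed the precision of awk's floating-point numbers.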
Frequent errors users might see and workarounds
Use dmg command without daos_admin privilege
Code Block | ||
---|---|---|
| ||
# Error message or timeout after dmg system query
$ dmg system query
ERROR: dmg: Unable to load Certificate Data: could not load cert: stat /etc/daos/certs/admin.crt: no such file or directory
# or the node hangs after the dmg system query command is issued
# Workaround
# 1. Make sure the admin-host /etc/daos/daos_control.yml is correctly configured,
#    including:
#      hostlist: <daos_server_lists>
#      port: <port_num>
#      transport_config:
#        allow_insecure: <true/false>
#        ca_cert: /etc/daos/certs/daosCA.crt
#        cert: /etc/daos/certs/admin.crt
#        key: /etc/daos/certs/admin.key
#
# 2. Make sure the admin-host allow_insecure mode matches the servers'. |
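The first error above comes from a certificate file that does not exist, so a quick file check can narrow the problem down before editing the configuration. This is a minimal sketch: `missing_cert_files` is a hypothetical helper, and the three file names and the `/etc/daos/certs` directory are taken from the `transport_config` excerpt above.

```shell
# missing_cert_files (hypothetical helper): print each expected
# certificate file that does not exist under the given directory.
missing_cert_files() {
    cert_dir=$1
    for f in daosCA.crt admin.crt admin.key; do
        [ -f "$cert_dir/$f" ] || echo "$cert_dir/$f"
    done
}

# Default directory taken from the daos_control.yml excerpt above.
missing=$(missing_cert_files /etc/daos/certs)
if [ -n "$missing" ]; then
    echo "missing certificate files:"
    echo "$missing"
else
    echo "all certificate files are present"
fi
```

If the files are present and the error persists, the `allow_insecure` setting on the admin host and the servers is the next thing to compare.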
...
Code Block | ||
---|---|---|
| ||
$ dmg pool create --size=50G
Creating DAOS pool with automatic storage allocation: 50 GB NVMe + 6.00% SCM
ERROR: dmg: pool create failed: DER_NOSPACE(-1007): No space on storage target
# Workaround: run dmg storage query usage to find the currently available storage
$ dmg storage query usage
Hosts  SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----  --------- -------- -------- ---------- --------- ---------
boro-8 17 GB     6.0 GB   65 %     0 B        0 B       N/A
$ dmg pool create --size=2G
Creating DAOS pool with automatic storage allocation: 2.0 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
UUID : b5ce2954-3f3e-4519-be04-ea298d776132
Service Ranks : 0
Storage Ranks : 0
Total Size : 2.0 GB
SCM : 2.0 GB (2.0 GB / rank)
NVMe : 0 B (0 B / rank)
$ dmg storage query usage
Hosts  SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----  --------- -------- -------- ---------- --------- ---------
boro-8 17 GB     2.9 GB   83 %     0 B        0 B       N/A |
dmg pool destroy timeout
Code Block | ||
---|---|---|
| ||
# dmg pool destroy times out or fails because the pool has active container(s)
# Workaround: use the pool destroy --force option
$ dmg pool destroy --pool=$DAOS_POOL --force
Pool-destroy command succeeded |
...
Run with dfuse fio
required rpm
Code Block | ||
---|---|---|
| ||
$ sudo yum install -y fio
# or
$ sudo yum install -y daos-tests |
...
unmount
Code Block | ||
---|---|---|
| ||
$ /usr/bin/fusermount -u /tmp/daos_test1/
$ /usr/bin/df -h -t fuse.daos
df: no file systems processed |
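The `df -h -t fuse.daos` check used above can be wrapped so that a script can test whether a given mount point is still mounted. The sketch below is an illustration only: `dfuse_mounted_at` is a hypothetical helper, and the sample row is the `df` output shown for the `/tmp/daos_test1` mount elsewhere in this tour.

```shell
# dfuse_mounted_at (hypothetical helper): read `df -h -t fuse.daos`
# output on stdin and succeed if the given path appears in the
# "Mounted on" column (the last field of each row).
dfuse_mounted_at() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { found = 1 } END { exit (found ? 0 : 1) }'
}

# Sample df output for a mounted dfuse file system.
sample='Filesystem      Size  Used Avail Use% Mounted on
dfuse            19G  1.1M   19G   1% /tmp/daos_test1'

if printf '%s\n' "$sample" | dfuse_mounted_at /tmp/daos_test1; then
    echo "dfuse is mounted at /tmp/daos_test1"
else
    echo "dfuse is not mounted at /tmp/daos_test1"
fi
```

After a successful unmount, `df` prints its "no file systems processed" message to stderr and nothing to stdout, so the helper reports the path as not mounted. The path must be given without a trailing slash, since that is how `df` prints it.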
...
Run with mpirun mdtest
required rpms
Code Block | ||
---|---|---|
| ||
$ sudo yum install -y mpich
$ sudo yum install -y mdtest
$ sudo yum install -y Lmod
# module is a shell function provided by Lmod, so it is run without sudo
$ module load mpi/mpich-x86_64
$ /usr/bin/touch /tmp/daos_test1/testfile |
Run mpirun ior and mdtest
Code Block | ||
---|---|---|
| ||
# Run mpirun ior
$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 30 ior -a POSIX -b 26214400 -v -w -k -i 1 -o /tmp/daos_test1/testfile -t 25M
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began : Fri Apr 16 18:07:56 2021
Command line : ior -a POSIX -b 26214400 -v -w -k -i 1 -o /tmp/daos_test1/testfile -t 25M
Machine : Linux boro-8.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec
TestID : 0
StartTime : Fri Apr 16 18:07:56 2021
Path : /tmp/daos_test1/testfile
FS : 3.8 GiB Used FS: 1.1% Inodes: 0.2 Mi Used Inodes: 0.1%
Participating tasks : 30
Options:
api : POSIX
apiVersion :
test filename : /tmp/daos_test1/testfile
access : single-shared-file
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
nodes : 1
tasks : 30
clients per node : 30
repetitions : 1
xfersize : 25 MiB
blocksize : 25 MiB
aggregate filesize : 750 MiB
verbose : 1
Results:
access bw(MiB/s) IOPS  Latency(s) block(KiB) xfer(KiB) open(s)  wr/rd(s) close(s) total(s) iter
------ --------- ----  ---------- ---------- --------- -------- -------- -------- -------- ----
Commencing write performance test: Fri Apr 16 18:07:56 2021
write  1499.68   59.99 0.480781   25600      25600     0.300237 0.500064 0.483573 0.500107 0
Max Write: 1499.68 MiB/sec (1572.53 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 1499.68 1499.68 1499.68 0.00 59.99 59.99 59.99 0.00 0.50011 NA NA 0 30 30 1 0 0 1 0 0 1 26214400 26214400 750.0 POSIX 0
Finished : Fri Apr 16 18:07:57 2021
# Run mpirun mdtest
$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 30 mdtest -a DFS -z 0 -F -C -i 1 -n 1667 -e 4096 -d / -w 4096 --dfs.chunk_size 1048576 --dfs.cont <container.uuid> --dfs.destroy --dfs.dir_oclass RP_3G1 --dfs.group daos_server --dfs.oclass RP_3G1 --dfs.pool <pool_uuid>
-- started at 04/16/2021 22:01:55 --
mdtest-3.4.0+dev was launched with 30 total task(s) on 1 node(s)
Command line used: mdtest '-a' 'DFS' '-z' '0' '-F' '-C' '-i' '1' '-n' '1667' '-e' '4096' '-d' '/' '-w' '4096' '--dfs.chunk_size' '1048576' '--dfs.cont' '3e661024-2f1f-4d7a-9cd4-1b05601e0789' '--dfs.destroy' '--dfs.dir_oclass' 'SX' '--dfs.group' 'daos_server' '--dfs.oclass' 'SX' '--dfs.pool' 'd546a7f5-586c-4d8f-aecd-372878df7b97'
WARNING: unable to use realpath() on file system.
Path:
FS: 0.0 GiB Used FS: -nan% Inodes: 0.0 Mi Used Inodes: -nan%
Nodemap: 111111111111111111111111111111
30 tasks, 50010 files
SUMMARY rate: (of 1 iterations)
Operation       Max        Min        Mean       Std Dev
---------       ---        ---        ----       -------
File creation : 14206.584  14206.334  14206.511  0.072
File stat     : 0.000      0.000      0.000      0.000
File read     : 0.000      0.000      0.000      0.000
File removal  : 0.000      0.000      0.000      0.000
Tree creation : 1869.791   1869.791   1869.791   0.000
Tree removal  : 0.000      0.000      0.000      0.000
-- finished at 04/16/2021 22:01:58 --
$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 50 mdtest -a DFS -z 0 -F -C -i 1 -n 1667 -e 4096 -d / -w 4096 --dfs.chunk_size 1048576 --dfs.cont 3e661024-2f1f-4d7a-9cd4-1b05601e0789 --dfs.destroy --dfs.dir_oclass SX --dfs.group daos_server --dfs.oclass SX --dfs.pool d546a7f5-586c-4d8f-aecd-372878df7b97
-- started at 04/16/2021 22:02:21 --
mdtest-3.4.0+dev was launched with 50 total task(s) on 1 node(s)
Command line used: mdtest '-a' 'DFS' '-z' '0' '-F' '-C' '-i' '1' '-n' '1667' '-e' '4096' '-d' '/' '-w' '4096' '--dfs.chunk_size' '1048576' '--dfs.cont' '3e661024-2f1f-4d7a-9cd4-1b05601e0789' '--dfs.destroy' '--dfs.dir_oclass' 'SX' '--dfs.group' 'daos_server' '--dfs.oclass' 'SX' '--dfs.pool' 'd546a7f5-586c-4d8f-aecd-372878df7b97'
WARNING: unable to use realpath() on file system.
Path:
FS: 0.0 GiB Used FS: -nan% Inodes: 0.0 Mi Used Inodes: -nan%
Nodemap: 11111111111111111111111111111111111111111111111111
50 tasks, 83350 files
SUMMARY rate: (of 1 iterations)
Operation       Max        Min        Mean       Std Dev
---------       ---        ---        ----       -------
File creation : 13342.303  13342.093  13342.228  0.059
File stat     : 0.000      0.000      0.000      0.000
File read     : 0.000      0.000      0.000      0.000
File removal  : 0.000      0.000      0.000      0.000
Tree creation : 1782.938   1782.938   1782.938   0.000
Tree removal  : 0.000      0.000      0.000      0.000
-- finished at 04/16/2021 22:02:27 -- |
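To compare runs such as the 30-task and 50-task mdtest examples above, the mean rate of one operation can be extracted from the SUMMARY table. The following is a sketch under the assumption that each row has the form `<operation> : <max> <min> <mean> <std dev>`, as in the output above; `mdtest_mean_rate` is a hypothetical helper.

```shell
# mdtest_mean_rate (hypothetical helper): read mdtest output on stdin
# and print the Mean column for the named operation.
mdtest_mean_rate() {
    awk -F':' -v op="$1" '
        NF == 2 {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the operation name
            if (name == op) {
                split($2, v, " ")               # columns: Max, Min, Mean, Std Dev
                print v[3]
                exit
            }
        }'
}

# Sample rows copied from the 30-task run above.
sample='SUMMARY rate: (of 1 iterations)
File creation : 14206.584 14206.334 14206.511 0.072
Tree creation : 1869.791 1869.791 1869.791 0.000'

printf '%s\n' "$sample" | mdtest_mean_rate "File creation"
```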
Run with 4 hosts DAOS server, rebuild with dfuse_io and mpirun
...
Environment variables setup
...
Run dfuse
Code Block | ||
---|---|---|
| ||
# Bring up 4 hosts server with appropriate daos_server.yml and
# access-point, reference to DAOS Set-Up
# After DAOS servers, DAOS admin and client started.
$ dmg storage format
Format Summary:
  Hosts             SCM Devices NVMe Devices
  -----             ----------- ------------
  boro-[8,35,52-53] 1           0
$ dmg pool list
Pool UUID                            Svc Replicas
---------                            ------------
733bee7b-c2af-499e-99dd-313b1ef092a9 [1-3]
$ daos cont create --pool=$DAOS_POOL --type=POSIX --oclass=RP_3G1 --properties=rf:2
Successfully created container 2649aa0f-3ad7-4943-abf5-4343205a637b
$ daos pool list-cont --pool=$DAOS_POOL
2649aa0f-3ad7-4943-abf5-4343205a637b
$ dmg pool query --pool=$DAOS_POOL
Pool 733bee7b-c2af-499e-99dd-313b1ef092a9, ntarget=32, disabled=0, leader=2, version=1
Pool space info:
- Target(VOS) count:32
- SCM:
  Total size: 5.0 GB
  Free: 5.0 GB, min:156 MB, max:156 MB, mean:156 MB
- NVMe:
  Total size: 0 B
  Free: 0 B, min:0 B, max:0 B, mean:0 B
Rebuild idle, 0 objs, 0 recs
$ df -h -t fuse.daos
df: no file systems processed
$ mkdir /tmp/daos_test1
$ dfuse --mountpoint=/tmp/daos_test1 --pool=$DAOS_POOL --cont=$DAOS_CONT
$ df -h -t fuse.daos
Filesystem Size Used Avail Use% Mounted on
dfuse      19G  1.1M 19G   1%   /tmp/daos_test1
$ fio --name=random-write --ioengine=pvsync --rw=randwrite --bs=4k --size=128M --nrfiles=4 --directory=/tmp/daos_test1 --numjobs=8 --iodepth=16 --runtime=60 --time_based --direct=1 --buffered=0 --randrepeat=0 --norandommap --refill_buffers --group_reporting
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=pvsync, iodepth=16
...
fio-3.7
Starting 8 processes
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
Jobs: 8 (f=32): [w(8)][100.0%][r=0KiB/s,w=96.1MiB/s][r=0,w=24.6k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=8): err= 0: pid=27879: Sat Apr 17 01:12:57 2021
  write: IOPS=24.4k, BW=95.3MiB/s (99.9MB/s)(5716MiB/60001msec)
    clat (usec): min=220, max=6687, avg=326.19, stdev=55.29
     lat (usec): min=220, max=6687, avg=326.28, stdev=55.29
    clat percentiles (usec):
     | 1.00th=[ 260], 5.00th=[ 273], 10.00th=[ 285], 20.00th=[ 293],
     | 30.00th=[ 306], 40.00th=[ 314], 50.00th=[ 322], 60.00th=[ 330],
     | 70.00th=[ 338], 80.00th=[ 355], 90.00th=[ 375], 95.00th=[ 396],
     | 99.00th=[ 445], 99.50th=[ 465], 99.90th=[ 523], 99.95th=[ 562],
     | 99.99th=[ 1827]
   bw ( KiB/s): min=10976, max=12496, per=12.50%, avg=12191.82, stdev=157.87, samples=952
   iops : min= 2744, max= 3124, avg=3047.92, stdev=39.47, samples=952
  lat (usec) : 250=0.23%, 500=99.61%, 750=0.15%, 1000=0.01%
  lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
  cpu : usr=0.81%, sys=1.69%, ctx=1463535, majf=0, minf=308
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1463226,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
  WRITE: bw=95.3MiB/s (99.9MB/s), 95.3MiB/s-95.3MiB/s (99.9MB/s-99.9MB/s), io=5716MiB (5993MB), run=60001-60001msec |
...
Code Block | ||
---|---|---|
| ||
# from daos_admin console, stop a server rank
$ dmg system stop --ranks=2
Rank Operation Result
---- --------- ------
2    stop      OK
# Verify that the stopped server has been evicted
$ dmg system query -v
Rank UUID                                 Control Address Fault Domain                State   Reason
---- ----                                 --------------- ------------                -----   ------
0    2bf0e083-33d6-4ce3-83c4-c898c2a7ddbd 10.7.1.8:10001  boro-8.boro.hpdd.intel.com  Joined
1    c9ac1dd9-0f9d-4684-90d3-038b720fd26b 10.7.1.35:10001 boro-35.boro.hpdd.intel.com Joined
2    80e44fe9-3a2b-4808-9a0f-88c3cbe7f565 10.7.1.53:10001 boro-53.boro.hpdd.intel.com Evicted system stop
3    a26fd44a-6089-4cc3-a06b-278a85607fd3 10.7.1.52:10001 boro-52.boro.hpdd.intel.com Joined |
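During a rebuild test it is useful to list exactly which ranks are out of the system before and after the restart. The sketch below assumes the verbose layout shown above, where State is the fifth whitespace-separated column and the fault domain contains no spaces; `ranks_not_joined` is a hypothetical helper.

```shell
# ranks_not_joined (hypothetical helper): read `dmg system query -v`
# output on stdin and print the rank number of every row whose
# State column is not "Joined".
ranks_not_joined() {
    awk '$1 ~ /^[0-9]+$/ && $5 != "Joined" { print $1 }'
}

# Sample rows copied from the output above.
sample='Rank UUID Control Address Fault Domain State Reason
0 2bf0e083-33d6-4ce3-83c4-c898c2a7ddbd 10.7.1.8:10001 boro-8.boro.hpdd.intel.com Joined
1 c9ac1dd9-0f9d-4684-90d3-038b720fd26b 10.7.1.35:10001 boro-35.boro.hpdd.intel.com Joined
2 80e44fe9-3a2b-4808-9a0f-88c3cbe7f565 10.7.1.53:10001 boro-53.boro.hpdd.intel.com Evicted system stop
3 a26fd44a-6089-4cc3-a06b-278a85607fd3 10.7.1.52:10001 boro-52.boro.hpdd.intel.com Joined'

printf '%s\n' "$sample" | ranks_not_joined
```

An empty result means every rank has joined, which is the state to wait for after restarting the stopped server.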
Code Block | ||
---|---|---|
| ||
# After the evicted server is restarted, verify that the server has joined
$ /usr/bin/dmg system query -v
Rank UUID                                 Control Address Fault Domain                 State  Reason
---- ----                                 --------------- ------------                 -----  ------
0    2bf0e083-33d6-4ce3-83c4-c898c2a7ddbd 10.7.1.8:10001  /boro-8.boro.hpdd.intel.com  Joined
1    c9ac1dd9-0f9d-4684-90d3-038b720fd26b 10.7.1.35:10001 /boro-35.boro.hpdd.intel.com Joined
2    80e44fe9-3a2b-4808-9a0f-88c3cbe7f565 10.7.1.53:10001 /boro-53.boro.hpdd.intel.com Joined
3    a26fd44a-6089-4cc3-a06b-278a85607fd3 10.7.1.52:10001 /boro-52.boro.hpdd.intel.com Joined |
Clean-Up
Code Block | ||
---|---|---|
| ||
# pool reintegrate
$ dmg pool reintegrate --pool=$DAOS_POOL --rank=2
Reintegration command succeeded
# destroy container
$ daos container destroy --pool=$DAOS_POOL --cont=$DAOS_CONT
# destroy pool
$ dmg pool destroy --pool=$DAOS_POOL
Pool-destroy command succeeded
# stop clients
$ pdsh -S -w $CLIENT_NODES "sudo systemctl stop daos_agent.service"
# disable clients
$ pdsh -S -w $CLIENT_NODES "sudo systemctl disable daos_agent.service"
# stop servers
$ pdsh -S -w $SERVER_NODES "sudo systemctl stop daos_server.service"
# disable servers
$ pdsh -S -w $SERVER_NODES "sudo systemctl disable daos_server.service" |
...