NOTE: THESE INSTRUCTIONS ARE NOT TO BE APPLIED TO 2.0 TESTING. USE THE QUICKSTARTS IN THE 2.0 ON-LINE DOCUMENTATION.


...

This documentation provides a general tour of the DAOS management commands (dmg) for daos_admin users and the DAOS tools (daos) for daos_client users. It covers pool and container create, list, query and destroy on a 2-host DAOS server and 1-host DAOS client environment, with examples of dmg and daos command options. It lists frequent errors that new users might see when using the dmg and daos tools, together with workarounds. It also includes example runs of data transfer on DAOS file systems: setting up a DAOS dfuse mount point and running traffic with dfuse fio and mpirun mdtest, followed by example runs of DAOS rebuild over dfuse fio and mpirun mdtest on a 4-host DAOS server, with outputs.

Requirements

Set environment variables for the list of server nodes, the client node and the admin node.

Code Block
languagebash
# Example of a 2-host server
# For a 1-host server, use: export SERVER_NODES=node-1
export SERVER_NODES=node-1,node-2
# Example using the same node for admin and client
export ADMIN_NODE=node-3
export CLIENT_NODE=node-3
export ALL_NODES=$SERVER_NODES,$CLIENT_NODE
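
A quick sanity check of these variables can catch a typo before any pdsh or dmg command is run. The following is a minimal sketch only; the helper name `count_nodes` is hypothetical and is not part of DAOS.

```bash
# count_nodes is a hypothetical helper (not a DAOS tool): it prints how many
# non-empty host names a comma-separated node list contains.
count_nodes() {
    printf '%s\n' "$1" |
        awk -F',' '{ n = 0; for (i = 1; i <= NF; i++) if ($i != "") n++; print n }'
}

# Example values taken from the block above.
SERVER_NODES=node-1,node-2
CLIENT_NODE=node-3
ALL_NODES=$SERVER_NODES,$CLIENT_NODE

echo "servers: $(count_nodes "$SERVER_NODES"), all: $(count_nodes "$ALL_NODES")"
```

With the example values, the server count should match the number of ranks later reported by dmg system query.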

...

dmg system query

Code Block
languagebash
# system query output for a 2 hosts DAOS server
$ dmg system query
Rank  State  
----  -----  
[0-1] Joined  

...

dmg storage query usage

Code Block
languagebash
# system storage query usage output for a 2 hosts DAOS server
$ dmg storage query usage
Hosts   SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----   --------- -------- -------- ---------- --------- --------- 
boro-35 17 GB     17 GB    0 %      0 B        0 B       N/A       
boro-8  17 GB     17 GB    0 %      0 B        0 B       N/A        

dmg pool create help

Code Block
languagebash
$ dmg pool create --help
Usage:
  dmg [OPTIONS] pool create [create-OPTIONS]

Application Options:
      --allow-proxy    Allow proxy configuration via environment
  -l, --host-list=     comma separated list of addresses <ipv4addr/hostname>
  -i, --insecure       have dmg attempt to connect without certificates
  -d, --debug          enable debug output
  -j, --json           Enable JSON output
  -J, --json-logging   Enable JSON-formatted log output
  -o, --config-path=   Client config file path

Help Options:
  -h, --help           Show this help message

[create command options]
      -g, --group=     DAOS pool to be owned by given group, format name@domain
      -u, --user=      DAOS pool to be owned by given user, format name@domain
      -p, --name=      Unique name for pool (set as label)
      -a, --acl-file=  Access Control List file path for DAOS pool
      -z, --size=      Total size of DAOS pool (auto)
      -t, --scm-ratio= Percentage of SCM:NVMe for pool storage (auto) (default: 6)
      -k, --nranks=    Number of ranks to use (auto)
      -v, --nsvc=      Number of pool service replicas
      -s, --scm-size=  Per-server SCM allocation for DAOS pool (manual)
      -n, --nvme-size= Per-server NVMe allocation for DAOS pool (manual)
      -r, --ranks=     Storage server unique identifiers (ranks) for DAOS pool
      -S, --sys=       DAOS system that pool is to be a part of (default: daos_server)

dmg pool create

Code Block
languagebash
# Create a 10GB pool
$ dmg pool create --size=10G
Creating DAOS pool with automatic storage allocation: 10 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : 0a6003c6-23a7-4cb5-8895-c004ca2b75f5
  Service Ranks : 0                                   
  Storage Ranks : [0-1]                               
  Total Size    : 10 GB                               
  SCM           : 10 GB (5.0 GB / rank)               
  NVMe          : 0 B (0 B / rank)                  

$ dmg storage query usage
Hosts   SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----   --------- -------- -------- ---------- --------- --------- 
boro-35 17 GB     12 GB    29 %     0 B        0 B       N/A       
boro-8  17 GB     11 GB    36 %     0 B        0 B       N/A
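
Later commands on this page refer to the pool through $DAOS_POOL. One way to set it is to pick the UUID out of the pool create output. The sketch below assumes the output layout shown above; the helper name `pool_uuid_from_create` is hypothetical, and the sample text is copied from the run above so that the sketch runs without a DAOS server.

```bash
# pool_uuid_from_create is a hypothetical helper: it reads "dmg pool create"
# output on stdin and prints the value of the "UUID" line.
pool_uuid_from_create() {
    awk -F':' '$1 ~ /^[ \t]*UUID[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the value
        print $2
        exit
    }'
}

# Sample output copied from the run above.
sample_output='Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : 0a6003c6-23a7-4cb5-8895-c004ca2b75f5
  Service Ranks : 0
  Storage Ranks : [0-1]'

DAOS_POOL=$(printf '%s\n' "$sample_output" | pool_uuid_from_create)
echo "DAOS_POOL=$DAOS_POOL"
```

On a live system the same helper could be fed directly, for example `export DAOS_POOL=$(dmg pool create --size=10G | pool_uuid_from_create)`, provided the output format matches the one shown here.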

...

Code Block
languagebash
$ daos cont query  --pool=$DAOS_POOL --cont=$DAOS_CONT
Pool UUID:      528f4710-7eb8-4850-b6aa-09e4b3c8f532
Container UUID: bc4fe707-7470-4b7d-83bf-face75cc98fc
Number of snapshots: 0
Latest Persistent Snapshot: 0
Highest Aggregated Epoch: 172477977191481344
Container redundancy factor: 1

daos container snapshot help/create/list/destroy

Code Block
languagebash
$ daos help cont create-snap
daos command (v1.2), libdaos 1.2.0

container (cont) commands:
          create           create a container
          clone            clone a container
          destroy          destroy a container
          list-objects     list all objects in container
          list-obj
          query            query a container
          get-prop         get all container's properties
          set-prop         set container's properties
          get-acl          get a container's ACL
          overwrite-acl    replace a container's ACL
          update-acl       add/modify entries in a container's ACL
          delete-acl       delete an entry from a container's ACL
          set-owner        change the user and/or group that own a container
          stat             get container statistics
          check            check objects consistency in container
          list-attrs       list container user-defined attributes
          del-attr         delete container user-defined attribute
          get-attr         get container user-defined attribute
          set-attr         set container user-defined attribute
          create-snap      create container snapshot (optional name)
container options (snapshot and rollback-related):
        --snap=NAME        container snapshot (create/destroy-snap, rollback)
        --epc=EPOCHNUM     container epoch (destroy-snap, rollback)
        --epcrange=B-E     container epoch range (destroy-snap)
container options (query, and all commands except create):
          <pool options>   with --cont use: (--pool, --sys-name)
          <pool options>   with --path use: (--sys-name)
        --cont=UUID        (mandatory, or use --path)
        --path=PATHSTR

daos container snapshot create/list/destroy

Code Block
languagebash
$ daos cont create-snap --pool=$DAOS_POOL --cont=$DAOS_CONT
snapshot/epoch 172646116775952384 has been created

$ daos container list-snaps --pool=$DAOS_POOL --cont=$DAOS_CONT
Container's snapshots :
172478166024060928 
172646116775952384 

$ daos container destroy-snap --pool=$DAOS_POOL --cont=$DAOS_CONT --epc=172646116775952384

$ daos container list-snaps --pool=$DAOS_POOL --cont=$DAOS_CONT
Container's snapshots :
172478166024060928 
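
The epoch printed by create-snap is the value that destroy-snap expects in --epc. A minimal sketch for capturing it follows; it assumes the message format shown above, and the helper name `snap_epoch` is hypothetical.

```bash
# snap_epoch is a hypothetical helper: it reads "daos cont create-snap" output
# on stdin and prints the epoch number from the
# "snapshot/epoch <N> has been created" line.
snap_epoch() {
    sed -n 's|^snapshot/epoch \([0-9][0-9]*\) has been created.*$|\1|p'
}

# Sample message copied from the run above.
sample='snapshot/epoch 172646116775952384 has been created'
EPOCH=$(printf '%s\n' "$sample" | snap_epoch)
echo "--epc=$EPOCH"
```

The captured value can then be passed on, as in `daos container destroy-snap --pool=$DAOS_POOL --cont=$DAOS_CONT --epc=$EPOCH`.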

Frequent errors users might see and workarounds

use dmg command without daos_admin privilege

Code Block
languagebash
# Error message or timeout after dmg system query
$ dmg system query 
ERROR: dmg: Unable to load Certificate Data: could not load cert: stat /etc/daos/certs/admin.crt: no such file or directory
# or the node hangs after the dmg system query command is issued


# Workaround
# 1. Make sure /etc/daos/daos_control.yml on the admin host is correctly configured,
#    including:
#      hostlist: <daos_server_lists>
#      port: <port_num>
#      transport_config:
#        allow_insecure: <true/false>
#        ca_cert: /etc/daos/certs/daosCA.crt
#        cert: /etc/daos/certs/admin.crt
#        key: /etc/daos/certs/admin.key
#
# 2. Make sure the allow_insecure mode on the admin host matches the servers'.
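
When the error is the "could not load cert" message, it can help to confirm which of the three certificate files named in the workaround are actually present. The sketch below is illustrative only: the helper name `missing_dmg_certs` is hypothetical, and the demonstration uses a scratch directory rather than /etc/daos/certs.

```bash
# missing_dmg_certs is a hypothetical helper: it prints the names of the
# certificate files listed in the workaround that are absent from a directory.
missing_dmg_certs() {
    cert_dir="$1"
    for f in daosCA.crt admin.crt admin.key; do
        [ -e "$cert_dir/$f" ] || echo "$f"
    done
}

# Demonstration against a scratch directory in which admin.crt is absent.
demo_dir=$(mktemp -d)
touch "$demo_dir/daosCA.crt" "$demo_dir/admin.key"
missing=$(missing_dmg_certs "$demo_dir")
echo "missing: $missing"
rm -rf "$demo_dir"
```

On an admin host the call would be `missing_dmg_certs /etc/daos/certs`; empty output means all three files exist, although it does not verify that they are valid or readable by dmg.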

...

Code Block
languagebash
$ dmg pool create --size=50G
Creating DAOS pool with automatic storage allocation: 50 GB NVMe + 6.00% SCM
ERROR: dmg: pool create failed: DER_NOSPACE(-1007): No space on storage target

# Workaround: run dmg storage query usage to find the currently available storage
$ dmg storage query usage
Hosts  SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----  --------- -------- -------- ---------- --------- --------- 
boro-8 17 GB     6.0 GB   65 %     0 B        0 B       N/A       

$ dmg pool create --size=2G
Creating DAOS pool with automatic storage allocation: 2.0 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : b5ce2954-3f3e-4519-be04-ea298d776132
  Service Ranks : 0                                   
  Storage Ranks : 0                                   
  Total Size    : 2.0 GB                              
  SCM           : 2.0 GB (2.0 GB / rank)              
  NVMe          : 0 B (0 B / rank)                    

$ dmg storage query usage
Hosts  SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----  --------- -------- -------- ---------- --------- --------- 
boro-8 17 GB     2.9 GB   83 %     0 B        0 B       N/A       
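
To avoid DER_NOSPACE, the SCM-Free column can be read before choosing a pool size. The following sketch assumes the column layout shown above, where a value such as "6.0 GB" is split into two whitespace-separated fields; the helper name `scm_free_per_host` is hypothetical.

```bash
# scm_free_per_host is a hypothetical helper: it reads "dmg storage query usage"
# output on stdin and prints "<host> <SCM-Free value> <unit>" for each host row.
# The first two lines (header and separator) are skipped.
scm_free_per_host() {
    awk 'NR > 2 && NF >= 5 { print $1, $4, $5 }'
}

# Sample output copied from the run above.
sample='Hosts  SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----  --------- -------- -------- ---------- --------- ---------
boro-8 17 GB     6.0 GB   65 %     0 B        0 B       N/A'

printf '%s\n' "$sample" | scm_free_per_host
```

On a live system the equivalent would be `dmg storage query usage | scm_free_per_host`. In the example above, 6.0 GB free explains why a 50G request failed while a 2G request succeeded.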

dmg pool destroy timeout

Code Block
languagebash
# dmg pool destroy times out or fails because the pool has active container(s)
# Workaround: use the pool destroy --force option

$ dmg pool destroy --pool=$DAOS_POOL --force
Pool-destroy command succeeded
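
An alternative to --force is to destroy the containers first, using the daos pool list-cont and daos container destroy commands that appear elsewhere on this page. The sketch below only prints the commands (a dry run) so that they can be reviewed before use; the helper name `print_cont_destroy_cmds` is hypothetical.

```bash
# print_cont_destroy_cmds is a hypothetical helper: it reads container UUIDs
# (one per line, as printed by "daos pool list-cont") on stdin and prints the
# matching "daos container destroy" commands without running them.
print_cont_destroy_cmds() {
    pool="$1"
    while IFS= read -r cont; do
        [ -n "$cont" ] || continue   # skip blank lines
        echo "daos container destroy --pool=$pool --cont=$cont"
    done
}

# The UUIDs below are the sample pool and container values used on this page.
printf '%s\n' bc4fe707-7470-4b7d-83bf-face75cc98fc |
    print_cont_destroy_cmds 528f4710-7eb8-4850-b6aa-09e4b3c8f532
```

On a live system the input would come from `daos pool list-cont --pool=$DAOS_POOL`, assuming it prints one container UUID per line as in the example on this page.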

...


Run with dfuse fio

required rpm

Code Block
languagebash
$ sudo yum install -y fio
# or
$ sudo yum install -y daos-tests

...

unmount

Code Block
languagebash
$ /usr/bin/fusermount -u /tmp/daos_test1/

$ /usr/bin/df -h -t fuse.daos
df: no file systems processed

...








Run with mpirun mdtest

required rpms

Code Block
languagebash
$ sudo yum install -y mpich
$ sudo yum install -y mdtest
$ sudo yum install -y Lmod
# module is a shell function, so it is loaded in the current shell without sudo
$ module load mpi/mpich-x86_64
$ /usr/bin/touch /tmp/daos_test1/testfile

run mpirun ior and mdtest

Code Block
languagebash
# Run mpirun ior
$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 30 ior -a POSIX -b 26214400 -v -w -k -i 1 -o /tmp/daos_test1/testfile -t 25M
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began : Fri Apr 16 18:07:56 2021
Command line : ior -a POSIX -b 26214400 -v -w -k -i 1 -o /tmp/daos_test1/testfile -t 25M
Machine : Linux boro-8.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec
TestID : 0
StartTime : Fri Apr 16 18:07:56 2021
Path : /tmp/daos_test1/testfile
FS : 3.8 GiB Used FS: 1.1% Inodes: 0.2 Mi Used Inodes: 0.1%
Participating tasks : 30
Options:
api : POSIX
apiVersion :
test filename : /tmp/daos_test1/testfile
access : single-shared-file
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
nodes : 1
tasks : 30
clients per node : 30
repetitions : 1
xfersize : 25 MiB
blocksize : 25 MiB
aggregate filesize : 750 MiB
verbose : 1
Results:
access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
Commencing write performance test: Fri Apr 16 18:07:56 2021
write 1499.68 59.99 0.480781 25600 25600 0.300237 0.500064 0.483573 0.500107 0
Max Write: 1499.68 MiB/sec (1572.53 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 1499.68 1499.68 1499.68 0.00 59.99 59.99 59.99 0.00 0.50011 NA NA 0 30 30 1 0 0 1 0 0 1 26214400 26214400 750.0 POSIX 0
Finished : Fri Apr 16 18:07:57 2021


# Run mpirun mdtest
$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 30 mdtest -a DFS -z 0 -F -C -i 1 -n 1667 -e 4096 -d / -w 4096 --dfs.chunk_size 1048576 --dfs.cont <container.uuid> --dfs.destroy --dfs.dir_oclass RP_3G1 --dfs.group daos_server --dfs.oclass RP_3G1 --dfs.pool <pool_uuid>
-- started at 04/16/2021 22:01:55 --
mdtest-3.4.0+dev was launched with 30 total task(s) on 1 node(s)
Command line used: mdtest '-a' 'DFS' '-z' '0' '-F' '-C' '-i' '1' '-n' '1667' '-e' '4096' '-d' '/' '-w' '4096' '--dfs.chunk_size' '1048576' '--dfs.cont' '3e661024-2f1f-4d7a-9cd4-1b05601e0789' '--dfs.destroy' '--dfs.dir_oclass' 'SX' '--dfs.group' 'daos_server' '--dfs.oclass' 'SX' '--dfs.pool' 'd546a7f5-586c-4d8f-aecd-372878df7b97'
WARNING: unable to use realpath() on file system.
Path:
FS: 0.0 GiB Used FS: -nan% Inodes: 0.0 Mi Used Inodes: -nan%
Nodemap: 111111111111111111111111111111
30 tasks, 50010 files
SUMMARY rate: (of 1 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 14206.584 14206.334 14206.511 0.072
File stat : 0.000 0.000 0.000 0.000
File read : 0.000 0.000 0.000 0.000
File removal : 0.000 0.000 0.000 0.000
Tree creation : 1869.791 1869.791 1869.791 0.000
Tree removal : 0.000 0.000 0.000 0.000
-- finished at 04/16/2021 22:01:58 --

$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 50 mdtest -a DFS -z 0 -F -C -i 1 -n 1667 -e 4096 -d / -w 4096 --dfs.chunk_size 1048576 --dfs.cont 3e661024-2f1f-4d7a-9cd4-1b05601e0789 --dfs.destroy --dfs.dir_oclass SX --dfs.group daos_server --dfs.oclass SX --dfs.pool d546a7f5-586c-4d8f-aecd-372878df7b97
-- started at 04/16/2021 22:02:21 --
mdtest-3.4.0+dev was launched with 50 total task(s) on 1 node(s)
Command line used: mdtest '-a' 'DFS' '-z' '0' '-F' '-C' '-i' '1' '-n' '1667' '-e' '4096' '-d' '/' '-w' '4096' '--dfs.chunk_size' '1048576' '--dfs.cont' '3e661024-2f1f-4d7a-9cd4-1b05601e0789' '--dfs.destroy' '--dfs.dir_oclass' 'SX' '--dfs.group' 'daos_server' '--dfs.oclass' 'SX' '--dfs.pool' 'd546a7f5-586c-4d8f-aecd-372878df7b97'
WARNING: unable to use realpath() on file system.
Path:
FS: 0.0 GiB Used FS: -nan% Inodes: 0.0 Mi Used Inodes: -nan%
Nodemap: 11111111111111111111111111111111111111111111111111
50 tasks, 83350 files
SUMMARY rate: (of 1 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 13342.303 13342.093 13342.228 0.059
File stat : 0.000 0.000 0.000 0.000
File read : 0.000 0.000 0.000 0.000
File removal : 0.000 0.000 0.000 0.000
Tree creation : 1782.938 1782.938 1782.938 0.000
Tree removal : 0.000 0.000 0.000 0.000
-- finished at 04/16/2021 22:02:27 --
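
The totals reported in the runs above follow directly from the command-line parameters, which gives a simple way to cross-check that a run used the intended settings. The helper names in this sketch are hypothetical; the arithmetic uses only values visible in the commands and outputs above.

```bash
# Hypothetical helpers for cross-checking the reported totals.

# Total files created by mdtest: number of tasks (-np) times items per task (-n).
mdtest_total_files() {
    echo $(( $1 * $2 ))
}

# Aggregate ior file size in MiB: number of tasks times block size (-b) in bytes.
ior_aggregate_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

mdtest_total_files 30 1667       # compare with "30 tasks, 50010 files"
mdtest_total_files 50 1667       # compare with "50 tasks, 83350 files"
ior_aggregate_mib 30 26214400    # compare with "aggregate filesize : 750 MiB"
```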

Run with 4 DAOS hosts server, rebuild with dfuse_io and mpirun

...

Environment variables setup

...

Run dfuse

Code Block
languagebash
# Bring up a 4-host server with the appropriate daos_server.yml and
# access-point; refer to DAOS Set-Up.
# Run the following after the DAOS servers, DAOS admin and client are started.

$ dmg storage format
Format Summary:
  Hosts             SCM Devices NVMe Devices 
  -----             ----------- ------------ 
  boro-[8,35,52-53] 1           0            

$ dmg pool list
Pool UUID                            Svc Replicas
---------                            ------------
733bee7b-c2af-499e-99dd-313b1ef092a9 [1-3]

$ daos cont create --pool=$DAOS_POOL --type=POSIX --oclass=RP_3G1 --properties=rf:2
Successfully created container 2649aa0f-3ad7-4943-abf5-4343205a637b 

$ daos pool list-cont --pool=$DAOS_POOL
2649aa0f-3ad7-4943-abf5-4343205a637b

$ dmg pool query --pool=$DAOS_POOL 
Pool 733bee7b-c2af-499e-99dd-313b1ef092a9, ntarget=32, disabled=0, leader=2, version=1 
Pool space info: 
- Target(VOS) count:32 
- SCM: 
  Total size: 5.0 GB 
  Free: 5.0 GB, min:156 MB, max:156 MB, mean:156 MB 
- NVMe: 
  Total size: 0 B 
  Free: 0 B, min:0 B, max:0 B, mean:0 B 
Rebuild idle, 0 objs, 0 recs

$ df -h -t fuse.daos
df: no file systems processed

$ mkdir /tmp/daos_test1

$ dfuse --mountpoint=/tmp/daos_test1 --pool=$DAOS_POOL --cont=$DAOS_CONT

$ df -h -t fuse.daos
Filesystem      Size  Used Avail Use% Mounted on
dfuse            19G  1.1M   19G   1% /tmp/daos_test1

$ fio --name=random-write --ioengine=pvsync --rw=randwrite --bs=4k --size=128M --nrfiles=4 --directory=/tmp/daos_test1 --numjobs=8 --iodepth=16 --runtime=60 --time_based --direct=1 --buffered=0 --randrepeat=0 --norandommap --refill_buffers --group_reporting
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=pvsync, iodepth=16
...
fio-3.7
Starting 8 processes
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
Jobs: 8 (f=32): [w(8)][100.0%][r=0KiB/s,w=96.1MiB/s][r=0,w=24.6k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=8): err= 0: pid=27879: Sat Apr 17 01:12:57 2021
  write: IOPS=24.4k, BW=95.3MiB/s (99.9MB/s)(5716MiB/60001msec)
    clat (usec): min=220, max=6687, avg=326.19, stdev=55.29
     lat (usec): min=220, max=6687, avg=326.28, stdev=55.29
    clat percentiles (usec):
     |  1.00th=[  260],  5.00th=[  273], 10.00th=[  285], 20.00th=[  293],
     | 30.00th=[  306], 40.00th=[  314], 50.00th=[  322], 60.00th=[  330],
     | 70.00th=[  338], 80.00th=[  355], 90.00th=[  375], 95.00th=[  396],
     | 99.00th=[  445], 99.50th=[  465], 99.90th=[  523], 99.95th=[  562],
     | 99.99th=[ 1827]
   bw (  KiB/s): min=10976, max=12496, per=12.50%, avg=12191.82, stdev=157.87, samples=952
   iops        : min= 2744, max= 3124, avg=3047.92, stdev=39.47, samples=952
  lat (usec)   : 250=0.23%, 500=99.61%, 750=0.15%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=0.81%, sys=1.69%, ctx=1463535, majf=0, minf=308
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1463226,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=95.3MiB/s (99.9MB/s), 95.3MiB/s-95.3MiB/s (99.9MB/s-99.9MB/s), io=5716MiB (5993MB), run=60001-60001msec
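
The fio summary can be cross-checked from the "issued rwts" line: the number of issued writes times the block size gives the data written, and dividing by the runtime gives the average IOPS. The helper names below are hypothetical, and integer arithmetic is used, so the results are truncated rather than rounded.

```bash
# Hypothetical helpers for cross-checking the fio summary.

# Data written in MB (10^6 bytes): issued writes times block size in bytes.
fio_written_mb() {
    echo $(( $1 * $2 / 1000000 ))
}

# Average IOPS: issued writes divided by runtime, with runtime given in msec.
fio_avg_iops() {
    echo $(( $1 * 1000 / $2 ))
}

fio_written_mb 1463226 4096     # compare with "io=5716MiB (5993MB)"
fio_avg_iops 1463226 60001      # compare with "IOPS=24.4k"
```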

...

Code Block
languagebash
# From the daos_admin console, stop a server rank
$ dmg system stop --ranks=2
Rank Operation Result
---- --------- ------
2    stop      OK

# Verify the stopped server has been evicted
$ dmg system query -v 
Rank UUID                                 Control Address Fault Domain                State   Reason
---- ----                                 --------------- ------------                -----   ------
0    2bf0e083-33d6-4ce3-83c4-c898c2a7ddbd 10.7.1.8:10001  boro-8.boro.hpdd.intel.com  Joined
1    c9ac1dd9-0f9d-4684-90d3-038b720fd26b 10.7.1.35:10001 boro-35.boro.hpdd.intel.com Joined
2    80e44fe9-3a2b-4808-9a0f-88c3cbe7f565 10.7.1.53:10001 boro-53.boro.hpdd.intel.com Evicted system stop
3    a26fd44a-6089-4cc3-a06b-278a85607fd3 10.7.1.52:10001 boro-52.boro.hpdd.intel.com Joined


Code Block
languagebash
# After the evicted server is restarted, verify that the server has joined
$ /usr/bin/dmg system query -v
Rank UUID                                 Control Address Fault Domain                 State   Reason
---- ----                                 --------------- ------------                 -----   ------
0    2bf0e083-33d6-4ce3-83c4-c898c2a7ddbd 10.7.1.8:10001  /boro-8.boro.hpdd.intel.com  Joined
1    c9ac1dd9-0f9d-4684-90d3-038b720fd26b 10.7.1.35:10001 /boro-35.boro.hpdd.intel.com Joined
2    80e44fe9-3a2b-4808-9a0f-88c3cbe7f565 10.7.1.53:10001 /boro-53.boro.hpdd.intel.com Joined
3    a26fd44a-6089-4cc3-a06b-278a85607fd3 10.7.1.52:10001 /boro-52.boro.hpdd.intel.com Joined
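
When scripting the stop and restart steps, the State of a single rank can be picked out of the dmg system query -v output. This is a sketch under the assumption that the Fault Domain value contains no spaces, so that State is the fifth whitespace-separated field; the helper name `rank_state` is hypothetical.

```bash
# rank_state is a hypothetical helper: it reads "dmg system query -v" output on
# stdin and prints the State column for the rank given as the first argument.
rank_state() {
    awk -v r="$1" '$1 == r { print $5; exit }'
}

# Sample rows copied from the outputs above (rank 2 before and after restart).
evicted=' 2 80e44fe9-3a2b-4808-9a0f-88c3cbe7f565 10.7.1.53:10001 boro-53.boro.hpdd.intel.com Evicted system stop'
joined=' 2 80e44fe9-3a2b-4808-9a0f-88c3cbe7f565 10.7.1.53:10001 /boro-53.boro.hpdd.intel.com Joined'

printf '%s\n' "$evicted" | rank_state 2
printf '%s\n' "$joined" | rank_state 2
```

On a live system this could drive a simple wait loop such as `until [ "$(dmg system query -v | rank_state 2)" = "Joined" ]; do sleep 5; done`, with the polling interval chosen to suit the environment.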


Clean-Up

Code Block
languagebash
# pool reintegrate
$ dmg pool reintegrate --pool=$DAOS_POOL --rank=2
Reintegration command succeeded

# destroy container
$ daos container destroy --pool=$DAOS_POOL --cont=$DAOS_CONT

# destroy pool
$ dmg pool destroy --pool=$DAOS_POOL
Pool-destroy command succeeded

# stop clients
$ pdsh -S -w $CLIENT_NODES "sudo systemctl stop daos_agent.service"

# disable clients
$ pdsh -S -w $CLIENT_NODES "sudo systemctl disable daos_agent.service"

# stop servers
$ pdsh -S -w $SERVER_NODES "sudo systemctl stop daos_server.service"

# disable servers
$ pdsh -S -w $SERVER_NODES "sudo systemctl disable daos_server.service"



...