NOTE: THESE ARE NOT TO BE APPLIED TO 2.0 TESTING; USE THE QUICKSTARTS IN THE 2.0 ON-LINE DOCUMENTATION


...

This documentation provides a general tour of the DAOS management commands (dmg) for daos_admin users and the DAOS tools (daos) for daos_client users. It covers pool and container create, list, query and destroy operations on a DAOS server, along with frequent errors that new users might see when using the dmg and daos tools, and their workarounds. It also gives example runs of data transfer between DAOS file systems: setting up a DAOS dfuse mount point and running traffic with dfuse fio and mpirun mdtest. The basic dmg and daos examples run on a 2-host DAOS server with 1 client host; the DAOS rebuild examples over dfuse fio and mpirun mdtest run on a 4-host DAOS server.

...

Set environment variables for the list of servers, the client node and the admin node.

Code Block
languagebash
# Example of 2 hosts server
# For 1 host server, export SERVER_NODES=node-1
export SERVER_NODES=node-1,node-2
# Example to use admin and client on the same node
export ADMIN_NODE=node-3
export CLIENT_NODE=node-3
export ALL_NODES=$SERVER_NODES,$CLIENT_NODE
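
When a script needs to loop over hosts, the comma-separated lists above can be split into a bash array. The following is a minimal sketch using the same example host names; the `count_nodes` helper is hypothetical and not part of DAOS.

```shell
#!/usr/bin/env bash
# Example values copied from the block above.
SERVER_NODES=node-1,node-2
CLIENT_NODE=node-3
ALL_NODES=$SERVER_NODES,$CLIENT_NODE

# count_nodes LIST: print the number of comma-separated entries in LIST.
# (hypothetical helper, for illustration only)
count_nodes() {
    local -a nodes
    IFS=',' read -r -a nodes <<< "$1"
    echo "${#nodes[@]}"
}

echo "servers:   $(count_nodes "$SERVER_NODES")"
echo "all nodes: $(count_nodes "$ALL_NODES")"
```

The same `IFS=',' read -r -a` idiom can feed a `for` loop when a command has to be run once per host.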

...

dmg system query

Code Block
languagebash
# system query output for a 2 hosts DAOS server
$ dmg system query
Rank  State  
----  -----  
[0-1] Joined  

...

dmg storage query usage

Code Block
languagebash
# storage query usage output for a 2 hosts DAOS server
$ dmg storage query usage
Hosts   SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----   --------- -------- -------- ---------- --------- --------- 
boro-35 17 GB     17 GB    0 %      0 B        0 B       N/A       
boro-8  17 GB     17 GB    0 %      0 B        0 B       N/A        
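
For scripting, the free SCM per host can be pulled out of this table. The sketch below parses a copy of the sample output above with awk; it assumes the whitespace-separated column layout shown here (value and unit as separate fields), and `scm_free_per_host` is a hypothetical helper name. The `--json` option listed in the dmg help is probably a more robust basis for real scripts, but its output format is not shown in this document.

```shell
#!/usr/bin/env bash
# Sample text copied from the output above; on a live system it could be
# captured with: usage=$(dmg storage query usage)
usage='Hosts   SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----   --------- -------- -------- ---------- --------- ---------
boro-35 17 GB     17 GB    0 %      0 B        0 B       N/A
boro-8  17 GB     17 GB    0 %      0 B        0 B       N/A'

# scm_free_per_host: read the table on stdin and print "host value unit".
# Skips the two header lines; SCM-Free is fields 4 (value) and 5 (unit).
scm_free_per_host() {
    awk 'NR > 2 && NF >= 5 { print $1, $4, $5 }'
}

printf '%s\n' "$usage" | scm_free_per_host
```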

dmg pool create help

Code Block
languagebash
$ dmg pool create --help
Usage:
  dmg [OPTIONS] pool create [create-OPTIONS]

Application Options:
      --allow-proxy    Allow proxy configuration via environment
  -l, --host-list=     comma separated list of addresses <ipv4addr/hostname>
  -i, --insecure       have dmg attempt to connect without certificates
  -d, --debug          enable debug output
  -j, --json           Enable JSON output
  -J, --json-logging   Enable JSON-formatted log output
  -o, --config-path=   Client config file path

Help Options:
  -h, --help           Show this help message

[create command options]
      -g, --group=     DAOS pool to be owned by given group, format name@domain
      -u, --user=      DAOS pool to be owned by given user, format name@domain
      -p, --name=      Unique name for pool (set as label)
      -a, --acl-file=  Access Control List file path for DAOS pool
      -z, --size=      Total size of DAOS pool (auto)
      -t, --scm-ratio= Percentage of SCM:NVMe for pool storage (auto) (default: 6)
      -k, --nranks=    Number of ranks to use (auto)
      -v, --nsvc=      Number of pool service replicas
      -s, --scm-size=  Per-server SCM allocation for DAOS pool (manual)
      -n, --nvme-size= Per-server NVMe allocation for DAOS pool (manual)
      -r, --ranks=     Storage server unique identifiers (ranks) for DAOS pool
      -S, --sys=       DAOS system that pool is to be a part of (default: daos_server)

dmg pool create

Code Block
languagebash
# Create a 10GB pool
$ dmg pool create --size=10G
Creating DAOS pool with automatic storage allocation: 10 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : 0a6003c6-23a7-4cb5-8895-c004ca2b75f5
  Service Ranks : 0                                   
  Storage Ranks : [0-1]                               
  Total Size    : 10 GB                               
  SCM           : 10 GB (5.0 GB / rank)               
  NVMe          : 0 B (0 B / rank)                  

$ dmg storage query usage
Hosts   SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----   --------- -------- -------- ---------- --------- --------- 
boro-35 17 GB     12 GB    29 %     0 B        0 B       N/A       
boro-8  17 GB     11 GB    36 %     0 B        0 B       N/A
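
Later commands refer to the pool through `$DAOS_POOL`. One way to set it is to extract the UUID line from the pool create output. The sketch below works on a copy of the sample output above and assumes the `UUID : <value>` line format shown there; `pool_uuid_from_output` is a hypothetical helper, not a DAOS command.

```shell
#!/usr/bin/env bash
# Sample text copied from the output above; on a live system it could be
# captured with: create_output=$(dmg pool create --size=10G)
create_output='Creating DAOS pool with automatic storage allocation: 10 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : 0a6003c6-23a7-4cb5-8895-c004ca2b75f5
  Service Ranks : 0
  Storage Ranks : [0-1]
  Total Size    : 10 GB'

# pool_uuid_from_output: print the value of the first "UUID : ..." line.
# With default awk splitting the fields are: UUID, ":", <uuid>.
pool_uuid_from_output() {
    awk '$1 == "UUID" { print $3; exit }'
}

DAOS_POOL=$(printf '%s\n' "$create_output" | pool_uuid_from_output)
export DAOS_POOL
echo "DAOS_POOL=$DAOS_POOL"
```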

...

Code Block
languagebash
$ daos cont query  --pool=$DAOS_POOL --cont=$DAOS_CONT
Pool UUID:      528f4710-7eb8-4850-b6aa-09e4b3c8f532
Container UUID: bc4fe707-7470-4b7d-83bf-face75cc98fc
Number of snapshots: 0
Latest Persistent Snapshot: 0
Highest Aggregated Epoch: 172477977191481344
Container redundancy factor: 1

daos container snapshot help/create/list/destroy

Code Block
languagebash
$ daos help cont snapshot
daos command (v1.2), libdaos 1.2.0
container (cont) commands:
          create           create a container
          clone            clone a container
          destroy          destroy a container
          list-objects     list all objects in container
          list-obj
          query            query a container
          get-prop         get all container's properties
          set-prop         set container's properties
          get-acl          get a container's ACL
          overwrite-acl    replace a container's ACL
          update-acl       add/modify entries in a container's ACL
          delete-acl       delete an entry from a container's ACL
          set-owner        change the user and/or group that own a container
          stat             get container statistics
          check            check objects consistency in container
          list-attrs       list container user-defined attributes
          del-attr         delete container user-defined attribute
          get-attr         get container user-defined attribute
          set-attr         set container user-defined attribute
          create-snap      create container snapshot (optional name)
                           at most recent committed epoch
          list-snaps       list container snapshots taken
          destroy-snap     destroy container snapshots
                           by name, epoch or range
          rollback         roll back container to specified snapshot
  use 'daos help cont|container COMMAND' for command specific options

$ daos help cont create-snap
daos command (v1.2), libdaos 1.2.0
container options (snapshot and rollback-related):
        --snap=NAME        container snapshot (create/destroy-snap, rollback)
        --epc=EPOCHNUM     container epoch (destroy-snap, rollback)
        --epcrange=B-E     container epoch range (destroy-snap)
container options (query, and all commands except create):
          <pool options>   with --cont use: (--pool, --sys-name)
          <pool options>   with --path use: (--sys-name)
        --cont=UUID        (mandatory, or use --path)
        --path=PATHSTR

daos container snapshot create/list/destroy

Code Block
languagebash
$ daos cont create-snap --pool=$DAOS_POOL --cont=$DAOS_CONT
snapshot/epoch 172646116775952384 has been created

$ daos container list-snaps --pool=$DAOS_POOL --cont=$DAOS_CONT
Container's snapshots :
172478166024060928 
172646116775952384 

$ daos container destroy-snap --pool=$DAOS_POOL --cont=$DAOS_CONT --epc=172646116775952384

$ daos container list-snaps --pool=$DAOS_POOL --cont=$DAOS_CONT
Container's snapshots :
172478166024060928 

Frequent errors users might see and their workarounds

use dmg command without daos_admin privilege

Code Block
languagebash
# Error message or timeout after dmg system query
$ dmg system query 
ERROR: dmg: Unable to load Certificate Data: could not load cert: stat /etc/daos/certs/admin.crt: no such file or directory
# or node hang after the dmg system query command is issued

# Workaround
# 1. Make sure the admin-host /etc/daos/daos_control.yml is correctly configured. 
#    including:
#      hostlist: <daos_server_lists>
#      port: <port_num>
#      transport_config:
#        allow_insecure: <true/false>
#        ca_cert: /etc/daos/certs/daosCA.crt
#        cert: /etc/daos/certs/admin.crt
#        key: /etc/daos/certs/admin.key
#
# 2. Make sure the admin-host allow_insecure mode matches the servers' setting.
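
The "no such file or directory" error above points at a missing certificate file, so a quick existence check can narrow the problem down before editing daos_control.yml. The sketch below checks the three file names from the transport_config example; `check_admin_certs` is a hypothetical helper, and the directory is an assumption taken from the paths in the workaround.

```shell
#!/usr/bin/env bash
# check_admin_certs DIR: report which of the certificate files named in the
# daos_control.yml example are missing or unreadable in DIR.
# Returns 1 if any file is missing, 0 otherwise. (hypothetical helper)
check_admin_certs() {
    local dir=$1 missing=0 f
    for f in daosCA.crt admin.crt admin.key; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call against the path used in the workaround above.
check_admin_certs /etc/daos/certs \
    || echo "install the certificates, or make allow_insecure match the servers"
```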

...

Code Block
languagebash
$ dmg pool create --size=50G
Creating DAOS pool with automatic storage allocation: 50 GB NVMe + 6.00% SCM
ERROR: dmg: pool create failed: DER_NOSPACE(-1007): No space on storage target

# Workaround: run dmg storage query usage to find the currently available storage
$ dmg storage query usage
Hosts  SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----  --------- -------- -------- ---------- --------- --------- 
boro-8 17 GB     6.0 GB   65 %     0 B        0 B       N/A       

$ dmg pool create --size=2G
Creating DAOS pool with automatic storage allocation: 2.0 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : b5ce2954-3f3e-4519-be04-ea298d776132
  Service Ranks : 0                                   
  Storage Ranks : 0                                   
  Total Size    : 2.0 GB                              
  SCM           : 2.0 GB (2.0 GB / rank)              
  NVMe          : 0 B (0 B / rank)                    

$ dmg storage query usage
Hosts  SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used 
-----  --------- -------- -------- ---------- --------- --------- 
boro-8 17 GB     2.9 GB   83 %     0 B        0 B       N/A       

...

 

dmg pool destroy timeout

Code Block
languagebash
# dmg pool destroy times out or fails because the pool has active container(s)
# Workaround: use the pool destroy --force option

$ dmg pool destroy --pool=$DAOS_POOL --force
Pool-destroy command succeeded


Run with dfuse fio

required rpm

Code Block
languagebash
$ sudo yum install -y fio
# or
$ sudo yum install -y daos-tests

...

unmount

Code Block
languagebash
$ /usr/bin/fusermount -u /tmp/daos_test1/

$ /usr/bin/df -h -t fuse.daos
df: no file systems processed

...








Run with mpirun mdtest

required rpms

Code Block
languagebash
$ sudo yum install -y mpich
$ sudo yum install -y mdtest
$ sudo yum install -y Lmod
$ module load mpi/mpich-x86_64
$ /usr/bin/touch /tmp/daos_test1/testfile

...

Code Block
languagebash
# Run mpirun ior
$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 30 ior -a POSIX -b 26214400 -v -w -k -i 1 -o /tmp/daos_test1/testfile -t 25M
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began : Fri Apr 16 18:07:56 2021
Command line : ior -a POSIX -b 26214400 -v -w -k -i 1 -o /tmp/daos_test1/testfile -t 25M
Machine : Linux boro-8.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec
TestID : 0
StartTime : Fri Apr 16 18:07:56 2021
Path : /tmp/daos_test1/testfile
FS : 3.8 GiB Used FS: 1.1% Inodes: 0.2 Mi Used Inodes: 0.1%
Participating tasks : 30
Options:
api : POSIX
apiVersion :
test filename : /tmp/daos_test1/testfile
access : single-shared-file
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
nodes : 1
tasks : 30
clients per node : 30
repetitions : 1
xfersize : 25 MiB
blocksize : 25 MiB
aggregate filesize : 750 MiB
verbose : 1
Results:
access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
Commencing write performance test: Fri Apr 16 18:07:56 2021
write 1499.68 59.99 0.480781 25600 25600 0.300237 0.500064 0.483573 0.500107 0
Max Write: 1499.68 MiB/sec (1572.53 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 1499.68 1499.68 1499.68 0.00 59.99 59.99 59.99 0.00 0.50011 NA NA 0 30 30 1 0 0 1 0 0 1 26214400 26214400 750.0 POSIX 0
Finished : Fri Apr 16 18:07:57 2021


# Run mpirun mdtest
$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 30 mdtest -a DFS -z 0 -F -C -i 1 -n 1667 -e 4096 -d / -w 4096 --dfs.chunk_size 1048576 --dfs.cont <container.uuid> --dfs.destroy --dfs.dir_oclass RP_3G1 --dfs.group daos_server --dfs.oclass RP_3G1 --dfs.pool <pool_uuid>
-- started at 04/16/2021 22:01:55 --
mdtest-3.4.0+dev was launched with 30 total task(s) on 1 node(s)
Command line used: mdtest '-a' 'DFS' '-z' '0' '-F' '-C' '-i' '1' '-n' '1667' '-e' '4096' '-d' '/' '-w' '4096' '--dfs.chunk_size' '1048576' '--dfs.cont' '3e661024-2f1f-4d7a-9cd4-1b05601e0789' '--dfs.destroy' '--dfs.dir_oclass' 'SX' '--dfs.group' 'daos_server' '--dfs.oclass' 'SX' '--dfs.pool' 'd546a7f5-586c-4d8f-aecd-372878df7b97'
WARNING: unable to use realpath() on file system.
Path:
FS: 0.0 GiB Used FS: -nan% Inodes: 0.0 Mi Used Inodes: -nan%
Nodemap: 111111111111111111111111111111
30 tasks, 50010 files
SUMMARY rate: (of 1 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 14206.584 14206.334 14206.511 0.072
File stat : 0.000 0.000 0.000 0.000
File read : 0.000 0.000 0.000 0.000
File removal : 0.000 0.000 0.000 0.000
Tree creation : 1869.791 1869.791 1869.791 0.000
Tree removal : 0.000 0.000 0.000 0.000
-- finished at 04/16/2021 22:01:58 --

$ /usr/lib64/mpich/bin/mpirun -host <host1> -np 50 mdtest -a DFS -z 0 -F -C -i 1 -n 1667 -e 4096 -d / -w 4096 --dfs.chunk_size 1048576 --dfs.cont 3e661024-2f1f-4d7a-9cd4-1b05601e0789 --dfs.destroy --dfs.dir_oclass SX --dfs.group daos_server --dfs.oclass SX --dfs.pool d546a7f5-586c-4d8f-aecd-372878df7b97
-- started at 04/16/2021 22:02:21 --
mdtest-3.4.0+dev was launched with 50 total task(s) on 1 node(s)
Command line used: mdtest '-a' 'DFS' '-z' '0' '-F' '-C' '-i' '1' '-n' '1667' '-e' '4096' '-d' '/' '-w' '4096' '--dfs.chunk_size' '1048576' '--dfs.cont' '3e661024-2f1f-4d7a-9cd4-1b05601e0789' '--dfs.destroy' '--dfs.dir_oclass' 'SX' '--dfs.group' 'daos_server' '--dfs.oclass' 'SX' '--dfs.pool' 'd546a7f5-586c-4d8f-aecd-372878df7b97'
WARNING: unable to use realpath() on file system.
Path:
FS: 0.0 GiB Used FS: -nan% Inodes: 0.0 Mi Used Inodes: -nan%
Nodemap: 11111111111111111111111111111111111111111111111111
50 tasks, 83350 files
SUMMARY rate: (of 1 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 13342.303 13342.093 13342.228 0.059
File stat : 0.000 0.000 0.000 0.000
File read : 0.000 0.000 0.000 0.000
File removal : 0.000 0.000 0.000 0.000
Tree creation : 1782.938 1782.938 1782.938 0.000
Tree removal : 0.000 0.000 0.000 0.000
-- finished at 04/16/2021 22:02:27 --
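
The sizes and file counts reported above follow directly from the command-line options. The sketch below redoes that arithmetic with the values from these runs (ior `-b 26214400` with 30 tasks, mdtest `-n 1667` with 30 and 50 tasks); the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# ior: -b 26214400 is the per-task block size in bytes, with 1 segment per task.
block_bytes=26214400
tasks=30
block_mib=$(( block_bytes / 1024 / 1024 ))
aggregate_mib=$(( block_mib * tasks ))
echo "ior: ${block_mib} MiB per task, ${aggregate_mib} MiB aggregate"

# mdtest: -n 1667 items per task, so total files = tasks * 1667.
items_per_task=1667
for np in 30 50; do
    echo "mdtest: ${np} tasks -> $(( np * items_per_task )) files"
done
```

The results match the "blocksize : 25 MiB" and "aggregate filesize : 750 MiB" lines of the ior run and the "50010 files" and "83350 files" lines of the two mdtest runs.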

Run with 4 DAOS hosts server, rebuild with dfuse_io and mpirun

Environment variables setup

...

Run dfuse

Code Block
languagebash
# Bring up a 4 hosts server with an appropriate daos_server.yml and
# access-point; refer to DAOS Set-Up.
# Run the following after the DAOS servers, DAOS admin and client are started.

$ dmg storage format
Format Summary:
  Hosts             SCM Devices NVMe Devices 
  -----             ----------- ------------ 
  boro-[8,35,52-53] 1           0            

$ dmg pool list
Pool UUID                            Svc Replicas 
---------                            ------------ 
733bee7b-c2af-499e-99dd-313b1ef092a9 [1-3]        

$ daos cont create --pool=$DAOS_POOL --type=POSIX --oclass=RP_3G1 --properties=rf:2
Successfully created container 2649aa0f-3ad7-4943-abf5-4343205a637b 

$ daos pool list-cont --pool=$DAOS_POOL
2649aa0f-3ad7-4943-abf5-4343205a637b

$ dmg pool query --pool=$DAOS_POOL 
Pool 733bee7b-c2af-499e-99dd-313b1ef092a9, ntarget=32, disabled=0, leader=2, version=1 
Pool space info: 
- Target(VOS) count:32 
- SCM: 
  Total size: 5.0 GB 
  Free: 5.0 GB, min:156 MB, max:156 MB, mean:156 MB 
- NVMe: 
  Total size: 0 B 
  Free: 0 B, min:0 B, max:0 B, mean:0 B 
Rebuild idle, 0 objs, 0 recs

$ df -h -t fuse.daos
df: no file systems processed

$ mkdir /tmp/daos_test1

$ dfuse --mountpoint=/tmp/daos_test1 --pool=$DAOS_POOL --cont=$DAOS_CONT

$ df -h -t fuse.daos
Filesystem      Size  Used Avail Use% Mounted on
dfuse            19G  1.1M   19G   1% /tmp/daos_test1

$ fio --name=random-write --ioengine=pvsync --rw=randwrite --bs=4k --size=128M --nrfiles=4 --directory=/tmp/daos_test1 --numjobs=8 --iodepth=16 --runtime=60 --time_based --direct=1 --buffered=0 --randrepeat=0 --norandommap --refill_buffers --group_reporting
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=pvsync, iodepth=16
...
fio-3.7
Starting 8 processes
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
Jobs: 8 (f=32): [w(8)][100.0%][r=0KiB/s,w=96.1MiB/s][r=0,w=24.6k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=8): err= 0: pid=27879: Sat Apr 17 01:12:57 2021
  write: IOPS=24.4k, BW=95.3MiB/s (99.9MB/s)(5716MiB/60001msec)
    clat (usec): min=220, max=6687, avg=326.19, stdev=55.29
     lat (usec): min=220, max=6687, avg=326.28, stdev=55.29
    clat percentiles (usec):
     |  1.00th=[  260],  5.00th=[  273], 10.00th=[  285], 20.00th=[  293],
     | 30.00th=[  306], 40.00th=[  314], 50.00th=[  322], 60.00th=[  330],
     | 70.00th=[  338], 80.00th=[  355], 90.00th=[  375], 95.00th=[  396],
     | 99.00th=[  445], 99.50th=[  465], 99.90th=[  523], 99.95th=[  562],
     | 99.99th=[ 1827]
   bw (  KiB/s): min=10976, max=12496, per=12.50%, avg=12191.82, stdev=157.87, samples=952
   iops        : min= 2744, max= 3124, avg=3047.92, stdev=39.47, samples=952
  lat (usec)   : 250=0.23%, 500=99.61%, 750=0.15%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=0.81%, sys=1.69%, ctx=1463535, majf=0, minf=308
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1463226,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=95.3MiB/s (99.9MB/s), 95.3MiB/s-95.3MiB/s (99.9MB/s-99.9MB/s), io=5716MiB (5993MB), run=60001-60001msec
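
As a sanity check on the fio summary, bandwidth should be roughly IOPS times block size, and the total written should be the number of issued writes times block size. The sketch below uses the rounded figures from this run (24.4k IOPS, 1463226 writes, 4 KiB blocks); `bw_mib_s` and `total_mib` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# bw_mib_s IOPS BS_KIB: bandwidth in MiB/s for a given IOPS and block size.
bw_mib_s() {
    awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%.1f\n", iops * bs_kib / 1024 }'
}

# total_mib COUNT BS_KIB: total data in MiB for COUNT I/Os of BS_KIB each.
total_mib() {
    awk -v n="$1" -v bs_kib="$2" 'BEGIN { printf "%.0f\n", n * bs_kib / 1024 }'
}

echo "bandwidth: $(bw_mib_s 24400 4) MiB/s"
echo "written:   $(total_mib 1463226 4) MiB"
```

Both values agree with the `BW=95.3MiB/s` and `5716MiB` figures in the `write:` line, within the rounding of the reported IOPS.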

...