Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagebash
export SERVER_NODES=node-1,node-2,node-3,node-4
export ADMIN_NODE=node-5
export CLIENT_NODE=node-5
export ALL_NODES=$SERVER_NODES,$CLIENT_NODE

Run dfuse

No Format
# After DAOS servers and DAOS admin and client RPMs loaded 

$ dmg storage format
Format Summary:
  Hosts             SCM Devices NVMe Devices 
  -----             ----------- ------------ 
  boro-[8,35,52-53] 1           0              

$ dmg system query --verbose
Rank UUID                                 Control Address Fault Domain                 State  Reason 
---- ----                                 --------------- ------------                 -----  ------ 
0    739347b0-7db0-49a3-998f-acad65ff4615 10.7.1.8:10001  /boro-8.boro.hpdd.intel.com  Joined        
1    7273aded-e590-4028-8e85-9ea1ab42d411 10.7.1.52:10001 /boro-52.boro.hpdd.intel.com Joined        
2    d7ea954a-34e4-4671-bcde-047b26626fb4 10.7.1.53:10001 /boro-53.boro.hpdd.intel.com Joined        
3    e2772010-c587-4d33-b5a7-8b5edbc36011 10.7.1.35:10001 /boro-35.boro.hpdd.intel.com Joined        

$ dmg pool create --size=5G
Creating DAOS pool with automatic storage allocation: 5.0 GB NVMe + 6.00% SCM
Pool created with 100.00% SCM/NVMe ratio
-----------------------------------------
  UUID          : 733bee7b-c2af-499e-99dd-313b1ef092a9
  Service Ranks : [1-3]                               
  Storage Ranks : [0-3]                               
  Total Size    : 5.0 GB                              
  SCM           : 5.0 GB (1.3 GB / rank)              
  NVMe          : 0 B (0 B / rank) 

$ dmg pool list
Pool UUID                            Svc Replicas 
---------                            ------------ 
733bee7b-c2af-499e-99dd-313b1ef092a9 [1-3]        

$ daos cont create --pool=$DAOS_POOL --type=POSIX --oclass=RP_3G1 --properties=rf:2
Successfully created container 2649aa0f-3ad7-4943-abf5-4343205a637b

$ daos pool list-cont --pool=$DAOS_POOL
2649aa0f-3ad7-4943-abf5-4343205a637b

$ dmg pool query --pool=733bee7b-c2af-499e-99dd-313b1ef092a9
Pool 733bee7b-c2af-499e-99dd-313b1ef092a9, ntarget=32, disabled=0, leader=2, version=1
Pool space info:
- Target(VOS) count:32
- SCM:
  Total size: 5.0 GB
  Free: 5.0 GB, min:156 MB, max:156 MB, mean:156 MB
- NVMe:
  Total size: 0 B
  Free: 0 B, min:0 B, max:0 B, mean:0 B
Rebuild idle, 0 objs, 0 recs

Wiki Markup
Code Block
languagebash
# After DAOS servers and DAOS admin and client RPMs loaded

$ dmg storage format
Format Summary:
  Hosts             SCM Devices NVMe Devices 
  -----             ----------- ------------ 
  boro-[8,35,52-53] 1           0            

$ dmg pool list
Pool UUID Svc Replicas 
--------- ------------ 
733bee7b-c2af-499e-99dd-313b1ef092a9 
[1-3] 

$ daos cont create --pool=$DAOS_POOL --type=POSIX --oclass=RP_3G1 --properties=rf:2
Successfully created container 2649aa0f-3ad7-4943-abf5-4343205a637b 

$ daos pool list-cont --pool=$DAOS_POOL
2649aa0f-3ad7-4943-abf5-4343205a637b

$ dmg pool query --pool=$DAOS_POOL 
Pool 733bee7b-c2af-499e-99dd-313b1ef092a9, ntarget=32, disabled=0, leader=2, version=1 
Pool space info: 
- Target(VOS) count:32 
- SCM: 
  Total size: 5.0 GB 
  Free: 5.0 GB, min:156 MB, max:156 MB, mean:156 MB 
- NVMe: 
  Total size: 0 B 
  Free: 0 B, min:0 B, max:0 B, mean:0 B 
Rebuild idle, 0 objs, 0 recs

$ df -h -t fuse.daos
df: no file systems processed
$ mkdir /tmp/daos_test1
$ dfuse --m=/tmp/daos_test1 --pool=70f73efc-848e-4f6e-b4fd-909bcf9bd427 --cont=cf2a95ce-9910-4d5e-814c-cafb0a7f0944
$ df -h -t fuse.daos
Filesystem      Size  Used Avail Use% Mounted on
dfuse            19G  1.1M   19G   1% /tmp/daos_test1
$ fio --name=random-write --ioengine=pvsync --rw=randwrite --bs=4k --size=128M --nrfiles=4 --directory=/tmp/daos_test1 --numjobs=8 --iodepth=16 --runtime=60 --time_based --direct=1 --buffered=0 --randrepeat=0 --norandommap --refill_buffers --group_reporting
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=pvsync, iodepth=16
...
fio-3.7
Starting 8 processes
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
random-write: Laying out IO files (4 files / total 128MiB)
Jobs: 8 (f=32): [w(8)][100.0%][r=0KiB/s,w=96.1MiB/s][r=0,w=24.6k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=8): err= 0: pid=27879: Sat Apr 17 01:12:57 2021
  write: IOPS=24.4k, BW=95.3MiB/s (99.9MB/s)(5716MiB/60001msec)
    clat (usec): min=220, max=6687, avg=326.19, stdev=55.29
     lat (usec): min=220, max=6687, avg=326.28, stdev=55.29
    clat percentiles (usec):
     |  1.00th=[  260],  5.00th=[  273], 10.00th=[  285], 20.00th=[  293],
     | 30.00th=[  306], 40.00th=[  314], 50.00th=[  322], 60.00th=[  330],
     | 70.00th=[  338], 80.00th=[  355], 90.00th=[  375], 95.00th=[  396],
     | 99.00th=[  445], 99.50th=[  465], 99.90th=[  523], 99.95th=[  562],
     | 99.99th=[ 1827]
   bw (  KiB/s): min=10976, max=12496, per=12.50%, avg=12191.82, stdev=157.87, samples=952
   iops        : min= 2744, max= 3124, avg=3047.92, stdev=39.47, samples=952
  lat (usec)   : 250=0.23%, 500=99.61%, 750=0.15%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=0.81%, sys=1.69%, ctx=1463535, majf=0, minf=308
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1463226,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=95.3MiB/s (99.9MB/s), 95.3MiB/s-95.3MiB/s (99.9MB/s-99.9MB/s), io=5716MiB (5993MB), run=60001-60001msec
   


Dfuse with leader-rank rebuild:

...

Successfully created container d71ff6a5-15a5-43fe-b829-bef9c65b9ccb



Run mpirun mdtest with rebuild

$ /usr/lib64/mpich/bin/mpirun -host boro-8 -np 30 mdtest -a DFS -z 0 -F -C -i 100 -n 1667 -e 4096 -d / -w 4096 --dfs.chunk_size 1048576 --dfs.cont $DAOS_CONT --dfs.destroy --dfs.dir_oclass RP_3G1 --dfs.group daos_server --dfs.oclass RP_3G1 --dfs.pool $DAOS_POOL

...