5-1-18

Test Configuration

Run with ofi+psm2, ib0. 

daos_test: Run with 8 servers (boro-[3-10]) and 2 clients (boro-[11-12]). Servers were killed and /mnt/daos was cleaned between each of the runs listed below.

Tests requiring a pool to be created via dmg used a 4GB pool. These used boro-11 as the client.

Test Results

daos_test

Separate runs with cleanup in between:

  • -mpcCiAeoRd - PASS
  • -r - FAIL with DAOS-896

Rebuild failures:

[sdwillso@boro-3 ~]$ orterun --mca mtl ^psm2,ofi -np 1 --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daos_test -r


=================
DAOS rebuild tests..
=================
[==========] Running 18 test(s).
setup: creating pool size=10 GB
6e732a58: rank 1 became pool service leader 0
setup: created pool 6e732a58-0b55-492a-87ed-d4ef9868e541
setup: connecting to pool
connected to pool, ntarget=8
setup: creating container b7d6c348-68c2-47b3-8f14-feba5c5435e9
setup: opening container
[ RUN      ] REBUILD1: rebuild small rec mulitple dkeys
Insert 1000 kv record in object 74027970414510138.1
daos_io_server: src/vos/vos_tree.c:255: kb_key_cmp_uint64: Assertion `krec->kr_size == sizeof(uint64_t)' failed.
daos_io_server: src/vos/vos_tree.c:255: kb_key_cmp_uint64: Assertion `krec->kr_size == sizeof(uint64_t)' failed.
daos_io_server: src/vos/vos_tree.c:255: kb_key_cmp_uint64: Assertion `krec->kr_size == sizeof(uint64_t)' failed.
2018/05/03 21:16:25 DAOS I/O server exited with error: signal: aborted (core dumped)
2018/05/03 21:16:26 DAOS I/O server exited with error: signal: aborted (core dumped)
2018/05/03 21:16:26 DAOS I/O server exited with error: signal: aborted (core dumped)

daosperf

1K Records

CREDITS=1

[sdwillso@boro-11 ~]$ CREDITS=1 ./daos_m/src/tests/daos_perf.sh daos 200 1000 1K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 8 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	VOS file      : <NULL>
Started...
update successfully completed:
	duration : 5.792262   sec
	bandwith : 269.756    MB/sec
	rate     : 276230.59  IO/sec
	latency  : 3.620      us (nonsense if credits > 1)
Duration across processes:
MAX duration : 5.791924   sec
MIN duration : 3.733267   sec
Average duration : 4.751263   sec


CREDITS=8

[sdwillso@boro-11 ~]$ CREDITS=8 ./daos_m/src/tests/daos_perf.sh daos 200 1000 1K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 8 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 8 (sync I/O for -ve)
	obj_per_cont  : 1 x 8 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	VOS file      : <NULL>
Started...
update successfully completed:
	duration : 5.260553   sec
	bandwith : 297.022    MB/sec
	rate     : 304150.54  IO/sec
	latency  : 3.288      us (nonsense if credits > 1)
Duration across processes:
MAX duration : 5.260145   sec
MIN duration : 2.257793   sec
Average duration : 3.737209   sec

4K Records

CREDITS=1

[sdwillso@boro-11 ~]$ CREDITS=1 ./daos_m/src/tests/daos_perf.sh daos 200 1000 4K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 4K -C 1 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 8 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 4096
	zero copy     : yes
	overwrite     : yes
	VOS file      : <NULL>
Started...
update successfully completed:
	duration : 9.773947   sec
	bandwith : 639.455    MB/sec
	rate     : 163700.50  IO/sec
	latency  : 6.109      us (nonsense if credits > 1)
Duration across processes:
MAX duration : 9.773532   sec
MIN duration : 6.241852   sec
Average duration : 8.066638   sec
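
The rate, bandwidth, and latency reported in the three daos_perf runs above are consistent with simple arithmetic on the printed parameters. The Python sketch below reproduces them under two assumptions that are inferred from the numbers, not taken from the daos_perf source: the total I/O count is procs x obj x dkey x akey x recx, and "MB" means 2^20 bytes. The function name daos_perf_metrics is hypothetical and exists only for this illustration.

```python
def daos_perf_metrics(procs, obj_per_proc, dkeys, akeys, recx, value_size, duration_s):
    """Recompute daos_perf summary figures from its printed parameters.

    Assumption: total I/Os = procs * obj * dkey * akey * recx, and
    1 MB = 2**20 bytes. Both are inferred from the logged results.
    """
    total_ios = procs * obj_per_proc * dkeys * akeys * recx
    rate = total_ios / duration_s                    # IO/sec
    bandwidth = rate * value_size / (1024 * 1024)    # MB/sec
    latency_us = duration_s / total_ios * 1e6        # us per I/O
    return total_ios, rate, bandwidth, latency_us


# (label, value size in bytes, "duration" from the "update successfully completed" block)
runs = [
    ("1K CREDITS=1", 1024, 5.792262),
    ("1K CREDITS=8", 1024, 5.260553),
    ("4K CREDITS=1", 4096, 9.773947),
]

for label, size, duration in runs:
    ios, rate, bw, lat = daos_perf_metrics(8, 1, 1, 200, 1000, size, duration)
    print(f"{label}: {ios} IOs, {rate:.2f} IO/sec, {bw:.3f} MB/sec, {lat:.3f} us")
```

If these assumptions hold, each run issues 1,600,000 updates, and the recomputed values agree with the logged ones to within rounding.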

IOR

[sdwillso@boro-11 ~]$ orterun -np 1 --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt ior -v  -i 5 -a DAOS -w -o `uuidgen` -b 10g -t 1m -O daospool=2ba87169-7506-4362-ad20-ff2c5d8259c1,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Thu May  3 21:41:48 2018
Command line used: ior -v -i 5 -a DAOS -w -o 472107e8-617c-4d8f-bba4-904afffe75ba -b 10g -t 1m -O daospool=2ba87169-7506-4362-ad20-ff2c5d8259c1,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-11.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec

Test 0 started: Thu May  3 21:41:48 2018
Path: /home/sdwillso
FS: 3.8 TiB   Used FS: 9.0%   Inodes: 250.0 Mi   Used Inodes: 1.8%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
	api                = DAOS
	test filename      = 472107e8-617c-4d8f-bba4-904afffe75ba
	access             = single-shared-file, independent
	pattern            = segmented (1 segment)
	ordering in a file = sequential offsets
	ordering inter file= no tasks offsets
	clients            = 1 (1 per node)
	repetitions        = 5
	xfersize           = 1 MiB
	blocksize          = 10 GiB
	aggregate filesize = 10 GiB

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
Commencing write performance test: Thu May  3 21:41:48 2018
write     10836      10485760   1024.00    0.000745   0.942209   0.002058   0.945029   0   
remove    -          -          -          -          -          -          0.004319   0   
Commencing write performance test: Thu May  3 21:41:49 2018
write     10892      10485760   1024.00    0.000420   0.936785   0.002931   0.940151   1   
remove    -          -          -          -          -          -          0.003810   1   
Commencing write performance test: Thu May  3 21:41:50 2018
write     10923      10485760   1024.00    0.000408   0.935018   0.002065   0.937501   2   
remove    -          -          -          -          -          -          0.003887   2   
Commencing write performance test: Thu May  3 21:41:51 2018
write     10909      10485760   1024.00    0.000406   0.936528   0.001716   0.938661   3   
remove    -          -          -          -          -          -          0.003573   3   
Commencing write performance test: Thu May  3 21:41:52 2018
write     10941      10485760   1024.00    0.000395   0.933429   0.002077   0.935913   4   
remove    -          -          -          -          -          -          0.003628   4   

Max Write: 10941.19 MiB/sec (11472.67 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write       10941.19   10835.65   10900.10      36.06    0.93945 0 1 1 5 0 0 1 0 0 1 10737418240 1048576 10737418240 DAOS 0

Finished: Thu May  3 21:41:55 2018
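
As a cross-check of the IOR summary above, the per-iteration bandwidth appears to be the aggregate file size divided by the total(s) column, and the MB/sec figure is the MiB/sec figure scaled by 2^20 / 10^6. The Python sketch below recomputes Max, Min, and Mean from the five logged total times. The helper names are hypothetical, and the formula is an inference from the logged numbers rather than a statement about IOR internals.

```python
MIB = 1024 * 1024

def ior_bandwidth_mib(aggregate_bytes, total_s):
    """Bandwidth in MiB/s, assuming IOR divides aggregate size by total time."""
    return aggregate_bytes / MIB / total_s

def mib_to_mb(mib_per_s):
    """Convert MiB/s (2**20 bytes) to MB/s (10**6 bytes)."""
    return mib_per_s * MIB / 1e6

aggregate = 10 * 1024 * MIB  # "aggregate filesize = 10 GiB"

# total(s) column for iterations 0 through 4 of the write test
totals = [0.945029, 0.940151, 0.937501, 0.938661, 0.935913]

bws = [ior_bandwidth_mib(aggregate, t) for t in totals]
bw_max, bw_min = max(bws), min(bws)
bw_mean = sum(bws) / len(bws)

print(f"Max  {bw_max:.2f} MiB/s ({mib_to_mb(bw_max):.2f} MB/s)")
print(f"Min  {bw_min:.2f} MiB/s")
print(f"Mean {bw_mean:.2f} MiB/s")
```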

daos_bench

kv-idx-update

Time: 17.380055 seconds (57537.217157 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt daosbench --test=kv-idx-update --testid=1 --svc=1 --dpool=09c4817f-385e-4f72-b2fb-f955ed71eca0 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu May  3 22:35:45 2018
=================================
===============================
Test Setup
---------------
Test: kv-idx-update
DAOS pool :09c4817f-385e-4f72-b2fb-f955ed71eca0
DAOS container :b71505ad-278b-4fb4-9ca8-24c7d006fa6a
Value buffer size: 64
Number of processes: 1
Number of indexes/process: 1000000
Number of asynchronous I/O: 32
===============================
kv-idx-update
Time: 17.380055 seconds (57537.217157 ops per second)

kv-dkey-update

Time: 0.003823 seconds (26159.067441 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=901bb1cd-2d4d-4435-8f1d-cc55c6f7dda5 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu May  3 21:59:29 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :901bb1cd-2d4d-4435-8f1d-cc55c6f7dda5
DAOS container :d0d6d572-3112-4d28-900b-437be7644526
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.003823 seconds (26159.067441 ops per second)

Ended at Thu May  3 21:59:29 2018

kv-akey-update

Time: 0.003510 seconds (28488.552993 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=d5b5c2ba-a44e-4812-8eda-ca165903e7d7 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu May  3 22:01:32 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :d5b5c2ba-a44e-4812-8eda-ca165903e7d7
DAOS container :e4ca0ff0-ce93-4024-9848-5528fc898fe4
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.003510 seconds (28488.552993 ops per second)

kv-dkey-fetch

Time: 0.001172 seconds (85348.438817 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=bdedb52e-8a48-4cc3-bd90-331ccd059143 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu May  3 22:03:17 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :bdedb52e-8a48-4cc3-bd90-331ccd059143
DAOS container :10de3164-fa3f-4de2-9375-90c931950d6e
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.001172 seconds (85348.438817 ops per second)

kv-akey-fetch

Time: 0.001174 seconds (85189.546618 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=95a1d949-f32c-488b-bcc8-c9c072bf287d --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu May  3 22:05:05 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :95a1d949-f32c-488b-bcc8-c9c072bf287d
DAOS container :68c4e3d3-a987-4cd6-879a-698f3c7d4e2c
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.001174 seconds (85189.546618 ops per second)
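
The ops-per-second figures in the daosbench runs above can be sanity-checked as operation count divided by elapsed time. The Python sketch below assumes the count is the "Number of indexes/process" or "Number of keys/process" value printed in each Test Setup block times the single process; that is an inference from the output, not from the daosbench source. Because the elapsed time is printed to only six decimals, the recomputed rate for the very short runs differs slightly from the logged one.

```python
def ops_per_second(count, seconds):
    """Operations per second for `count` operations completed in `seconds`."""
    return count / seconds

# test name -> (operation count from Test Setup, logged time in s, logged ops/s)
results = {
    "kv-idx-update":  (1000000, 17.380055, 57537.217157),
    "kv-dkey-update": (100, 0.003823, 26159.067441),
    "kv-akey-update": (100, 0.003510, 28488.552993),
    "kv-dkey-fetch":  (100, 0.001172, 85348.438817),
    "kv-akey-fetch":  (100, 0.001174, 85189.546618),
}

for name, (count, seconds, logged) in results.items():
    recomputed = ops_per_second(count, seconds)
    rel_err = abs(recomputed - logged) / logged
    print(f"{name}: recomputed {recomputed:.1f} ops/s, logged {logged:.1f}, rel. diff {rel_err:.1e}")
```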

mpich tests

These were run over the sockets provider due to an issue with the psm2 provider (ticket should go here once created).

Update: Mohamad was able to run the mpich tests over psm2 as my user, on my nodes. I am still working out the issue that I see. His run also reported No Errors on all tests.


Results: No Errors on all tests

[sdwillso@boro-11 test]$ ./run_daos_tests daos:test_file
**** Testing I/O functions ****
**** Testing simple.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing async.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing async-multiple.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing coll_test.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing excl.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
../../../../src/mpi/romio/adio/ad_daos/ad_daos_open.c:281 ADIOI_DAOS_Open() - Array exists (EXCL mode) (-1004)

 No Errors
**** Testing file_info.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing i_noncontig.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing noncontig.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing noncontig_coll.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing noncontig_coll2.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing aggregation1.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing aggregation2.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing hindexed ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
-------------------------------------------------------
   [ 0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9 ]

[ 0] 0 1 2     3 4 5      D E F     G H I    
[ 1]                                         
[ 2] 6 7 8     9 : ;      J K L     M N O    
[ 3]                                         
[ 4]                                         
[ 5] X Y Z     [ \ ]      l m n     o p q    
[ 6]                                         
[ 7] ^ _ `     a b c      r s t     u v w    
[ 8]                                         
[ 9]                                         

[10] 0 1 2     3 4 5      D E F     G H I    
[11]                                         
[12] 6 7 8     9 : ;      J K L     M N O    
[13]                                         
[14]                                         
[15] X Y Z     [ \ ]      l m n     o p q    
[16]                                         
[17] ^ _ `     a b c      r s t     u v w    
[18]                                         
[19]                                         

 No Errors
**** Testing split_coll.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing psimple.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing error.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing status.c ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing types_with_zeros ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing darray_read ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
 No Errors
**** Testing fcoll_test.f ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
  No Errors
**** Testing pfcoll_test.f ****
POOL UUID = 21272a57-426b-4afa-84c6-b97b998aff28
SVC LIST = 0
  No Errors