...

All tests were run with ofi+psm2 on ib0, with the exception of IOR and daosbench, which were run with sockets. See the NOTE below.

daos_test: Run with 8 servers (boro-[3-10]) and 4 clients (boro-[11-14]). Servers were killed and /mnt/daos was cleaned between the runs listed below.

...

daosbench and daos_perf were both run with DAOS_IMPLICIT_PURGE=1.


NOTE: IOR and daosbench were run with sockets because of a consistent error at the start of the tests when run under psm2:

Code Block
[sdwillso@boro-11 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=00c026da-1231-4ba8-b676-95a71fa1fea5,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Tue May 29 22:11:38 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 657f6932-7df3-40e6-8e65-76c5ca18bf07 -b 5g -t 1m -O daospool=00c026da-1231-4ba8-b676-95a71fa1fea5,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-11.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec

Test 0 started: Tue May 29 22:11:38 2018
Path: /home/sdwillso
FS: 3.8 TiB   Used FS: 9.3%   Inodes: 250.0 Mi   Used Inodes: 1.9%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
...
etc. repeating until ctrl+c

Suggested remedies involved ensuring that the nodes were clean. All processes were cleaned up, /mnt/daos/ was cleaned, and the nodes were rebooted, but the same issue still occurs.

Test Results

daos_test

Separate runs with cleanup in between:

...

daos_perf

1K Records

CREDITS=1

Code Block
[sdwillso@boro-11 ~]$ CREDITS=1 ./daos_m/src/tests/daos_perf.sh daos 200 1000 1K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 8 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	VOS file      : <NULL>
Started...
update successfully completed:
	duration : 10.710054  sec
	bandwith : 145.891    MB/sec
	rate     : 149392.34  IO/sec
	latency  : 6.694      us (nonsense if credits > 1)
Duration across processes:
MAX duration : 10.709952  sec
MIN duration : 5.946499   sec
Average duration : 8.273347   sec

CREDITS=8

Code Block
[sdwillso@boro-11 ~]$ CREDITS=8 ./daos_m/src/tests/daos_perf.sh daos 200 1000 1K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 8 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 8 (sync I/O for -ve)
	obj_per_cont  : 1 x 8 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	VOS file      : <NULL>
Started...
update successfully completed:
	duration : 9.245000   sec
	bandwith : 169.010    MB/sec
	rate     : 173066.52  IO/sec
	latency  : 5.778      us (nonsense if credits > 1)
Duration across processes:
MAX duration : 9.241115   sec
MIN duration : 3.688652   sec
Average duration : 6.545638   sec

4K Records

CREDITS=1

Code Block
[sdwillso@boro-11 ~]$ CREDITS=1 ./daos_m/src/tests/daos_perf.sh daos 200 1000 4K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 4K -C 1 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 8 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 4096
	zero copy     : yes
	overwrite     : yes
	VOS file      : <NULL>
Started...
update successfully completed:
	duration : 14.762039  sec
	bandwith : 423.383    MB/sec
	rate     : 108386.11  IO/sec
	latency  : 9.226      us (nonsense if credits > 1)
Duration across processes:
MAX duration : 14.757540  sec
MIN duration : 6.194879   sec
Average duration : 11.559975  sec
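
The rate, bandwidth and latency figures in the three daos_perf runs above can be cross-checked from the reported parameters. The following is a minimal sketch, assuming that the total operation count is procs x akey_per_dkey x recx_per_akey and that "MB" in the output means 2^20 bytes. Both assumptions are inferred from the numbers, not taken from the daos_perf source, and the names `RUNS` and `derive` are illustrative only.

```python
# Cross-check of the daos_perf figures reported above.
# ASSUMPTION: the formulas below are inferred from the output, not from the tool's source.
PROCS = 8      # "obj_per_cont : 1 x 8 (procs)"
AKEYS = 200    # akey_per_dkey
RECXS = 1000   # recx_per_akey

# (label, value size in bytes, duration s, reported MB/s, reported IO/s, reported latency us)
RUNS = [
    ("1K CREDITS=1", 1024, 10.710054, 145.891, 149392.34, 6.694),
    ("1K CREDITS=8", 1024, 9.245000, 169.010, 173066.52, 5.778),
    ("4K CREDITS=1", 4096, 14.762039, 423.383, 108386.11, 9.226),
]


def derive(value_size, duration):
    """Return (IO/sec, MB/sec, latency in us) for one run."""
    total_ops = PROCS * AKEYS * RECXS          # 1,600,000 updates across all processes
    rate = total_ops / duration                # IO/sec
    bandwidth = rate * value_size / 2**20      # MB/sec, with MB taken as 2^20 bytes
    latency_us = duration / total_ops * 1e6    # average time per update
    return rate, bandwidth, latency_us


for label, size, duration, rep_bw, rep_rate, rep_lat in RUNS:
    rate, bw, lat = derive(size, duration)
    print(f"{label}: rate {rate:.2f} (reported {rep_rate}), "
          f"bandwidth {bw:.3f} (reported {rep_bw}), "
          f"latency {lat:.3f} us (reported {rep_lat})")
```

Under these assumptions the derived values agree with the reported ones to within rounding. The latency is simply duration divided by total operations, which is consistent with the tool's own remark that it is nonsense if credits > 1.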

IOR w/sockets, 2 clients, 10GB pool, data verification enabled

Code Block
[sdwillso@boro-11 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=0c35d24c-df37-43e3-9283-0150272956df,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Tue May 29 22:14:58 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 882d642f-892c-4c13-9f6e-2cd625bb5981 -b 5g -t 1m -O daospool=0c35d24c-df37-43e3-9283-0150272956df,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-11.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec

Test 0 started: Tue May 29 22:14:58 2018
Path: /home/sdwillso
FS: 3.8 TiB   Used FS: 9.3%   Inodes: 250.0 Mi   Used Inodes: 1.9%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
	api                = DAOS
	test filename      = 882d642f-892c-4c13-9f6e-2cd625bb5981
	access             = single-shared-file, independent
	pattern            = segmented (1 segment)
	ordering in a file = sequential offsets
	ordering inter file= no tasks offsets
	clients            = 1 (1 per node)
	repetitions        = 5
	xfersize           = 1 MiB
	blocksize          = 5 GiB
	aggregate filesize = 5 GiB

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
Commencing write performance test: Tue May 29 22:14:58 2018
write     1888.81    5242880    1024.00    0.002534   2.70       0.007510   2.71       0   
Verifying contents of the file(s) just written.
Tue May 29 22:15:00 2018

remove    -          -          -          -          -          -          0.003222   0   
Commencing write performance test: Tue May 29 22:15:07 2018
write     1958.07    5242880    1024.00    0.001571   2.61       0.005053   2.61       1   
Verifying contents of the file(s) just written.
Tue May 29 22:15:10 2018

remove    -          -          -          -          -          -          0.003029   1   
Commencing write performance test: Tue May 29 22:15:16 2018
write     1971.30    5242880    1024.00    0.001469   2.59       0.010100   2.60       2   
Verifying contents of the file(s) just written.
Tue May 29 22:15:19 2018

remove    -          -          -          -          -          -          0.003084   2   
Commencing write performance test: Tue May 29 22:15:25 2018
write     1988.23    5242880    1024.00    0.001451   2.57       0.004563   2.58       3   
Verifying contents of the file(s) just written.
Tue May 29 22:15:28 2018

remove    -          -          -          -          -          -          0.003083   3   
Commencing write performance test: Tue May 29 22:15:34 2018
write     1986.93    5242880    1024.00    0.001568   2.57       0.007999   2.58       4   
Verifying contents of the file(s) just written.
Tue May 29 22:15:37 2018

remove    -          -          -          -          -          -          0.003055   4   

Max Write: 1988.23 MiB/sec (2084.81 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        1988.23    1888.81    1958.67      36.64    2.61496 0 1 1 5 0 0 1 0 0 1 5368709120 1048576 5368709120 DAOS 0

Finished: Tue May 29 22:15:43 2018
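
The IOR summary line can be reproduced from the five per-iteration write results above. This is a minimal sketch, assuming that StdDev is the population standard deviation and that MB means 10^6 bytes. Both are inferred from the reported values, not from the IOR source, and the variable names are illustrative only.

```python
import statistics

# Per-iteration write results copied from the IOR output above.
write_bw_mib = [1888.81, 1958.07, 1971.30, 1988.23, 1986.93]  # bw(MiB/s)
total_s = [2.71, 2.61, 2.60, 2.58, 2.58]                      # total(s)

BLOCK_MIB = 5 * 1024  # -b 5g with a single task, so 5 GiB is written per iteration

mean_bw = statistics.mean(write_bw_mib)
# ASSUMPTION: population (not sample) standard deviation, inferred from the reported 36.64.
stddev_bw = statistics.pstdev(write_bw_mib)
# ASSUMPTION: "MB/sec" uses 10^6 bytes, "MiB/sec" uses 2^20 bytes.
max_mb = max(write_bw_mib) * 2**20 / 1e6

print(f"Max {max(write_bw_mib):.2f}  Min {min(write_bw_mib):.2f}  "
      f"Mean {mean_bw:.2f}  StdDev {stddev_bw:.2f} MiB/s")
print(f"Max Write: {max(write_bw_mib):.2f} MiB/sec ({max_mb:.2f} MB/sec)")

# Bandwidth recomputed from total(s); this only matches approximately,
# because total(s) is printed with two decimals.
for i, seconds in enumerate(total_s):
    print(f"iter {i}: {BLOCK_MIB / seconds:.2f} MiB/s from total(s), "
          f"reported {write_bw_mib[i]:.2f}")
```

Under these assumptions the mean, standard deviation and MB/sec conversion match the "Summary of all tests" and "Max Write" lines to within rounding.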

daosbench w/sockets

kv-idx-update

Time: 74.322927 seconds (13454.798341 ops per second)
Code Block
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-idx-update --testid=1 --svc=1 --dpool=7caf0d73-6a43-4dc3-a4ae-999c6bc511dc --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Tue May 29 22:26:22 2018
=================================
===============================
Test Setup
---------------
Test: kv-idx-update
DAOS pool :7caf0d73-6a43-4dc3-a4ae-999c6bc511dc
DAOS container :c4b70709-359b-40f6-8d1c-c18709769fe4
Value buffer size: 64
Number of processes: 1
Number of indexes/process: 1000000
Number of asynchronous I/O: 32
===============================
kv-idx-update
Time: 74.322927 seconds (13454.798341 ops per second)

Ended at Tue May 29 22:27:38 2018

kv-dkey-update

Time: 0.010937 seconds (9143.062051 ops per second)
Code Block
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=fd393171-2f4d-4f35-8e28-f3ee12d12da8 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Tue May 29 22:33:34 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :fd393171-2f4d-4f35-8e28-f3ee12d12da8
DAOS container :5a1287bd-9901-47b5-94b6-87b3710c8561
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.010937 seconds (9143.062051 ops per second)

Ended at Tue May 29 22:33:34 2018

kv-akey-update

Time: 0.008755 seconds (11421.599459 ops per second)
Code Block
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=6503eed4-e8fc-41b6-a1b9-5c16ec924b63 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Tue May 29 22:35:01 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :6503eed4-e8fc-41b6-a1b9-5c16ec924b63
DAOS container :0679d0db-23d1-42c6-9bdf-e5caaff2dd5f
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.008755 seconds (11421.599459 ops per second)

Ended at Tue May 29 22:35:01 2018

kv-dkey-fetch

Time: 0.009627 seconds (10387.624339 ops per second)
Code Block
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=dd571fda-5d1f-40fd-b99b-7d3b1d1512f0 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Tue May 29 22:36:53 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :dd571fda-5d1f-40fd-b99b-7d3b1d1512f0
DAOS container :9e328b31-1114-422d-807d-d9dbd0c336d2
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.009627 seconds (10387.624339 ops per second)

Ended at Tue May 29 22:36:53 2018

kv-akey-fetch

Time: 0.008745 seconds (11434.480159 ops per second)
Code Block
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=ef698345-9d64-4ab3-9f8c-4833d56a0a45 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Tue May 29 22:38:22 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :ef698345-9d64-4ab3-9f8c-4833d56a0a45
DAOS container :d5fb38dc-64fe-4ce2-9ac5-fe3ed08b9f63
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.008745 seconds (11434.480159 ops per second)

Ended at Tue May 29 22:38:22 2018
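
The ops-per-second figures for the five daosbench runs can be cross-checked as operation count divided by elapsed time. This is a minimal sketch, assuming the operation count is the "Number of indexes/process" or "Number of keys/process" value printed in each Test Setup with a single process. The dictionary and function names are illustrative only. Note that the four dkey/akey tests report 100 keys per process even though --indexes=1000000 was passed, so their rates are computed over runs lasting roughly 10 ms.

```python
# test name -> (operations, elapsed seconds, reported ops per second), copied from the output above
RESULTS = {
    "kv-idx-update": (1_000_000, 74.322927, 13454.798341),
    "kv-dkey-update": (100, 0.010937, 9143.062051),
    "kv-akey-update": (100, 0.008755, 11421.599459),
    "kv-dkey-fetch": (100, 0.009627, 10387.624339),
    "kv-akey-fetch": (100, 0.008745, 11434.480159),
}


def ops_per_second(operations, seconds):
    """ASSUMPTION: the rate is operations / elapsed time for the single process used."""
    return operations / seconds


for name, (ops, seconds, reported) in RESULTS.items():
    derived = ops_per_second(ops, seconds)
    rel_diff = abs(derived - reported) / reported
    print(f"{name}: derived {derived:.2f}, reported {reported:.2f}, "
          f"relative difference {rel_diff:.1e}")
```

The small differences on the 100-key tests are consistent with the elapsed time being printed with only six decimals.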

mpich tests

Results: Failures were seen on several tests. Mohamad is currently debugging; this page will be updated when possible.