Tip of master, commit 9df82adfe091ee598d8f8c97223482faf8ed701c
All tests were run with ofi+psm2 over ib0, with the exception of IOR and daosbench, which were run with sockets. See NOTE below.
daos_test: Run with 8 servers (boro-[3-10]) and 4 clients (boro-[11-14]). Servers were killed and /mnt/daos was cleaned between the runs listed below.
Tests requiring a pool to be created via dmg used a 4GB pool. These used boro-11 as the client.
mpich tests used boro-3 as server, boro-11 as client, with a 1GB pool.
daosbench and daos_perf were both run with DAOS_IMPLICIT_PURGE=1.
NOTE: IOR and daosbench were run with sockets due to a consistent error at the start of the tests when run under psm2:
[sdwillso@boro-11 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=00c026da-1231-4ba8-b676-95a71fa1fea5,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O
Began: Tue May 29 22:11:38 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 657f6932-7df3-40e6-8e65-76c5ca18bf07 -b 5g -t 1m -O daospool=00c026da-1231-4ba8-b676-95a71fa1fea5,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-11.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec
Test 0 started: Tue May 29 22:11:38 2018
Path: /home/sdwillso
FS: 3.8 TiB Used FS: 9.3% Inodes: 250.0 Mi Used Inodes: 1.9%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
boro-11.boro.hpdd.intel.com.19512Received eager message(s) ptype=0x1 opcode=0xc9 from an unknown process (err=49)
... etc. repeating until ctrl+c
Suggested remedies for this issue involved ensuring that the nodes were clean. All processes were cleaned up, /mnt/daos/ was cleaned, and the nodes were rebooted, but the same issue still occurs.
Separate runs with cleanup in between:
CREDITS=1, value size 1K
[sdwillso@boro-11 ~]$ CREDITS=1 ./daos_m/src/tests/daos_perf.sh daos 200 1000 1K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test : DAOS (full stack)
Parameters :
    pool size : 2048 MB
    credits : 1 (sync I/O for -ve)
    obj_per_cont : 1 x 8 (procs)
    dkey_per_obj : 1
    akey_per_dkey : 200
    recx_per_akey : 1000
    value type : single
    value size : 1024
    zero copy : yes
    overwrite : yes
    VOS file : <NULL>
Started...
update successfully completed:
    duration : 10.710054 sec
    bandwith : 145.891 MB/sec
    rate : 149392.34 IO/sec
    latency : 6.694 us (nonsense if credits > 1)
Duration across processes:
    MAX duration : 10.709952 sec
    MIN duration : 5.946499 sec
    Average duration : 8.273347 sec
CREDITS=8, value size 1K
[sdwillso@boro-11 ~]$ CREDITS=8 ./daos_m/src/tests/daos_perf.sh daos 200 1000 1K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 8 -t -z
Test : DAOS (full stack)
Parameters :
    pool size : 2048 MB
    credits : 8 (sync I/O for -ve)
    obj_per_cont : 1 x 8 (procs)
    dkey_per_obj : 1
    akey_per_dkey : 200
    recx_per_akey : 1000
    value type : single
    value size : 1024
    zero copy : yes
    overwrite : yes
    VOS file : <NULL>
Started...
update successfully completed:
    duration : 9.245000 sec
    bandwith : 169.010 MB/sec
    rate : 173066.52 IO/sec
    latency : 5.778 us (nonsense if credits > 1)
Duration across processes:
    MAX duration : 9.241115 sec
    MIN duration : 3.688652 sec
    Average duration : 6.545638 sec
CREDITS=1, value size 4K
[sdwillso@boro-11 ~]$ CREDITS=1 ./daos_m/src/tests/daos_perf.sh daos 200 1000 4K
+ /home/sdwillso/daos_m/opt/ompi/bin/orterun -quiet --hostfile /home/sdwillso/scripts/host.cli.1 --ompi-server file:/home/sdwillso/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log /home/sdwillso/daos_m/install/bin/daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 4K -C 1 -t -z
Test : DAOS (full stack)
Parameters :
    pool size : 2048 MB
    credits : 1 (sync I/O for -ve)
    obj_per_cont : 1 x 8 (procs)
    dkey_per_obj : 1
    akey_per_dkey : 200
    recx_per_akey : 1000
    value type : single
    value size : 4096
    zero copy : yes
    overwrite : yes
    VOS file : <NULL>
Started...
update successfully completed:
    duration : 14.762039 sec
    bandwith : 423.383 MB/sec
    rate : 108386.11 IO/sec
    latency : 9.226 us (nonsense if credits > 1)
Duration across processes:
    MAX duration : 14.757540 sec
    MIN duration : 6.194879 sec
    Average duration : 11.559975 sec
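The rate, bandwidth and latency that daos_perf reports can be cross-checked against the test parameters. The sketch below is a consistency check inferred from the numbers on this page, not taken from the daos_perf source. It assumes the total I/O count is procs x obj_per_cont x dkey_per_obj x akey_per_dkey x recx_per_akey, and that the "MB/sec" figure divides by 1024*1024 bytes. The function and argument names are hypothetical.

```python
def daos_perf_figures(procs, akeys, recxs, value_size, duration_s, objs=1, dkeys=1):
    """Recompute rate, bandwidth and latency from daos_perf parameters.

    Assumption (not from the daos_perf source): every process issues
    objs * dkeys * akeys * recxs updates of value_size bytes each.
    """
    total_ios = procs * objs * dkeys * akeys * recxs
    rate = total_ios / duration_s                    # IO/sec
    bandwidth = rate * value_size / (1024 * 1024)    # "MB/sec" as printed
    latency_us = duration_s / total_ios * 1e6        # microseconds per IO
    return rate, bandwidth, latency_us


# (label, value size in bytes, reported duration in seconds) from the runs above
runs = [
    ("CREDITS=1, 1K", 1024, 10.710054),
    ("CREDITS=8, 1K", 1024, 9.245000),
    ("CREDITS=1, 4K", 4096, 14.762039),
]

for label, size, duration in runs:
    rate, bw, lat = daos_perf_figures(8, 200, 1000, size, duration)
    print(f"{label}: {rate:.2f} IO/sec, {bw:.3f} MB/sec, {lat:.3f} us")
```

Under these assumptions the recomputed values agree with the reported ones to within rounding. Note that the latency figure is simply duration divided by total I/O count across all 8 processes, so it is not a per-operation round-trip time.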
[sdwillso@boro-11 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=0c35d24c-df37-43e3-9283-0150272956df,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O
Began: Tue May 29 22:14:58 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 882d642f-892c-4c13-9f6e-2cd625bb5981 -b 5g -t 1m -O daospool=0c35d24c-df37-43e3-9283-0150272956df,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-11.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec
Test 0 started: Tue May 29 22:14:58 2018
Path: /home/sdwillso
FS: 3.8 TiB Used FS: 9.3% Inodes: 250.0 Mi Used Inodes: 1.9%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
    api = DAOS
    test filename = 882d642f-892c-4c13-9f6e-2cd625bb5981
    access = single-shared-file, independent
    pattern = segmented (1 segment)
    ordering in a file = sequential offsets
    ordering inter file= no tasks offsets
    clients = 1 (1 per node)
    repetitions = 5
    xfersize = 1 MiB
    blocksize = 5 GiB
    aggregate filesize = 5 GiB

access    bw(MiB/s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------    ---------  ----------  ---------  --------  --------  --------  --------  ----
Commencing write performance test: Tue May 29 22:14:58 2018
write     1888.81    5242880     1024.00    0.002534  2.70      0.007510  2.71      0
Verifying contents of the file(s) just written.
Tue May 29 22:15:00 2018
remove    -          -           -          -         -         -         0.003222  0
Commencing write performance test: Tue May 29 22:15:07 2018
write     1958.07    5242880     1024.00    0.001571  2.61      0.005053  2.61      1
Verifying contents of the file(s) just written.
Tue May 29 22:15:10 2018
remove    -          -           -          -         -         -         0.003029  1
Commencing write performance test: Tue May 29 22:15:16 2018
write     1971.30    5242880     1024.00    0.001469  2.59      0.010100  2.60      2
Verifying contents of the file(s) just written.
Tue May 29 22:15:19 2018
remove    -          -           -          -         -         -         0.003084  2
Commencing write performance test: Tue May 29 22:15:25 2018
write     1988.23    5242880     1024.00    0.001451  2.57      0.004563  2.58      3
Verifying contents of the file(s) just written.
Tue May 29 22:15:28 2018
remove    -          -           -          -         -         -         0.003083  3
Commencing write performance test: Tue May 29 22:15:34 2018
write     1986.93    5242880     1024.00    0.001568  2.57      0.007999  2.58      4
Verifying contents of the file(s) just written.
Tue May 29 22:15:37 2018
remove    -          -           -          -         -         -         0.003055  4

Max Write: 1988.23 MiB/sec (2084.81 MB/sec)

Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 1988.23 1888.81 1958.67 36.64 2.61496 0 1 1 5 0 0 1 0 0 1 5368709120 1048576 5368709120 DAOS 0

Finished: Tue May 29 22:15:43 2018
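The IOR summary row can be reproduced from the five per-iteration write bandwidths listed in the log. The following is a minimal sketch using only the Python standard library. It assumes (based on the numbers matching, not on the IOR source) that the StdDev column is a population standard deviation, and it converts the maximum from MiB/sec to MB/sec with 1 MiB = 1048576 bytes. The variable names are illustrative.

```python
import statistics

# Per-iteration write bandwidth in MiB/s, copied from the IOR log (iter 0..4)
write_bw_mib = [1888.81, 1958.07, 1971.30, 1988.23, 1986.93]

max_mib = max(write_bw_mib)
min_mib = min(write_bw_mib)
mean_mib = statistics.mean(write_bw_mib)
# Population standard deviation (divides by N, not N-1); assumed from the data
stddev_mib = statistics.pstdev(write_bw_mib)

# 1 MiB = 1024 * 1024 bytes, 1 MB = 10**6 bytes
max_mb = max_mib * 1024 * 1024 / 1e6

print(f"Max {max_mib:.2f}  Min {min_mib:.2f}  Mean {mean_mib:.2f}  StdDev {stddev_mib:.2f}")
print(f"Max Write: {max_mib:.2f} MiB/sec ({max_mb:.2f} MB/sec)")
```

The sample standard deviation (`statistics.stdev`) would give a noticeably larger value for these five points, which is why the population form is assumed here.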
kv-idx-update: Time: 74.322927 seconds (13454.798341 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-idx-update --testid=1 --svc=1 --dpool=7caf0d73-6a43-4dc3-a4ae-999c6bc511dc --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at Tue May 29 22:26:22 2018
=================================
===============================
Test Setup
---------------
Test: kv-idx-update
DAOS pool :7caf0d73-6a43-4dc3-a4ae-999c6bc511dc
DAOS container :c4b70709-359b-40f6-8d1c-c18709769fe4
Value buffer size: 64
Number of processes: 1
Number of indexes/process: 1000000
Number of asynchronous I/O: 32
===============================
kv-idx-update
Time: 74.322927 seconds (13454.798341 ops per second)
Ended at Tue May 29 22:27:38 2018
kv-dkey-update: Time: 0.010937 seconds (9143.062051 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=fd393171-2f4d-4f35-8e28-f3ee12d12da8 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at Tue May 29 22:33:34 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :fd393171-2f4d-4f35-8e28-f3ee12d12da8
DAOS container :5a1287bd-9901-47b5-94b6-87b3710c8561
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.010937 seconds (9143.062051 ops per second)
Ended at Tue May 29 22:33:34 2018
kv-akey-update: Time: 0.008755 seconds (11421.599459 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=6503eed4-e8fc-41b6-a1b9-5c16ec924b63 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at Tue May 29 22:35:01 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :6503eed4-e8fc-41b6-a1b9-5c16ec924b63
DAOS container :0679d0db-23d1-42c6-9bdf-e5caaff2dd5f
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.008755 seconds (11421.599459 ops per second)
Ended at Tue May 29 22:35:01 2018
kv-dkey-fetch: Time: 0.009627 seconds (10387.624339 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=dd571fda-5d1f-40fd-b99b-7d3b1d1512f0 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at Tue May 29 22:36:53 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :dd571fda-5d1f-40fd-b99b-7d3b1d1512f0
DAOS container :9e328b31-1114-422d-807d-d9dbd0c336d2
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.009627 seconds (10387.624339 ops per second)
Ended at Tue May 29 22:36:53 2018
kv-akey-fetch: Time: 0.008745 seconds (11434.480159 ops per second)
[sdwillso@boro-11 ~]$ orterun -np 1 --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=ef698345-9d64-4ab3-9f8c-4833d56a0a45 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at Tue May 29 22:38:22 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :ef698345-9d64-4ab3-9f8c-4833d56a0a45
DAOS container :d5fb38dc-64fe-4ce2-9ac5-fe3ed08b9f63
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.008745 seconds (11434.480159 ops per second)
Ended at Tue May 29 22:38:22 2018
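When comparing the daosbench results, it helps to see how the ops-per-second figure relates to the operation count in each Test Setup block. The sketch below assumes the rate is simply operations divided by elapsed time, where operations is the "indexes/process" or "keys/process" value printed by the tool (1000000 for kv-idx-update, 100 for the four key-based tests, even though all five commands passed --indexes=1000000). The helper name is hypothetical.

```python
def ops_per_second(operations, seconds):
    """Throughput as a plain ratio; an assumed model of daosbench's figure."""
    return operations / seconds


# (test, operations per process, reported seconds, reported ops/sec) from the logs
results = [
    ("kv-idx-update", 1_000_000, 74.322927, 13454.798341),
    ("kv-dkey-update", 100, 0.010937, 9143.062051),
    ("kv-akey-update", 100, 0.008755, 11421.599459),
    ("kv-dkey-fetch", 100, 0.009627, 10387.624339),
    ("kv-akey-fetch", 100, 0.008745, 11434.480159),
]

for name, ops, secs, reported in results:
    computed = ops_per_second(ops, secs)
    rel_diff = abs(computed - reported) / reported
    print(f"{name}: computed {computed:.1f} ops/s, reported {reported:.1f} ops/s, "
          f"relative difference {rel_diff:.1e}")
```

The small differences are expected because the printed time is rounded to microseconds. The key-based tests each cover only 100 operations in roughly 10 ms, so their rates are not directly comparable to the one-million-operation kv-idx-update run.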
Results: Failures were seen on several tests. Mohamad is currently debugging; this page will be updated as more information becomes available.