Tip of master, commit 98cd53e0a273885324261d5f38c23a80be84f09e
After running tip of master, reran a few tests with OFI updated to 99e333426b64d7d227fd604731235ffc14862662 to pull in some psm2 fixes.
All tests run with ofi+psm2, ib0.
daos_test: Run with 8 servers (boro-[4-11]) and 2 clients (boro-[12-13]). Servers were killed and /mnt/daos was cleaned between the runs listed below.
Tests requiring a pool created via dmg used a 4GB pool, with boro-12 as the client.
mpich tests used boro-4 as the server and boro-12 as the client, with a 1GB pool.
Test Results
daos_test
Separate runs with cleanup in between:
- -mpcCAeoRd - PASS
- -i - FAIL, still rebuilding on IO27 after 10 minutes
- -r - FAIL
- appears to be the same failure as -i above, still rebuilding after 10 minutes
- -O - PASS
daosperf
1K Records
CREDITS=1
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi -np 1 -quiet --hostfile ~/scripts/host.cli.1 --ompi-server file:~/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test :
DAOS (full stack)
Parameters :
pool size : 2048 MB
credits : 1 (sync I/O for -ve)
obj_per_cont : 1 x 1 (procs)
dkey_per_obj : 1
akey_per_dkey : 200
recx_per_akey : 1000
value type : single
value size : 1024
zero copy : yes
overwrite : yes
verify fetch : no
VOS file : <NULL>
fe9c1051: rank 1 became pool service leader 0
Started...
update successfully completed:
duration : 5.522145 sec
bandwith : 35.369 MB/sec
rate : 36217.81 IO/sec
latency : 27.611 us (nonsense if credits > 1)
Duration across processes:
MAX duration : 5.522145 sec
MIN duration : 5.522145 sec
Average duration : 5.522145 sec
fe9c1051: rank 1 no longer pool service leader 0
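The rate, bandwidth, and latency figures in the CREDITS=1 output above can be cross-checked from the printed parameters. The following is a minimal sketch, not daos_perf's own code; the variable names are illustrative. It assumes the total I/O count is objects x dkeys x akeys x records, and that the "MB/sec" figure is computed with 1024*1024 bytes per MB, which is the interpretation that matches the reported value.

```python
# Values copied from the daos_perf output above.
obj_per_cont = 1
dkey_per_obj = 1
akey_per_dkey = 200
recx_per_akey = 1000
value_size = 1024          # bytes
duration = 5.522145        # seconds

# Assumption: every record of every akey is written exactly once.
total_ios = obj_per_cont * dkey_per_obj * akey_per_dkey * recx_per_akey

rate = total_ios / duration                         # IO/sec
bandwidth = rate * value_size / (1024 * 1024)       # assumed MiB-based "MB/sec"
latency_us = duration / total_ios * 1e6             # meaningful only with credits = 1

print(f"total I/Os : {total_ios}")
print(f"rate       : {rate:.2f} IO/sec")
print(f"bandwidth  : {bandwidth:.3f} MB/sec")
print(f"latency    : {latency_us:.3f} us")
```

With one credit each I/O is issued synchronously, so the latency is simply the duration divided by the I/O count, which is why the tool flags that number as meaningless for credits > 1.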
CREDITS=8
- hitting segfault
4K Records
CREDITS=1
- hitting segfault
IOR, 10GB pool, data verification enabled
[sdwillso@boro-4 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O
Began: Mon Aug 27 18:00:54 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 0d89ddc3-ae02-4df5-a071-2423bb3a32b9 -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec
Test 0 started: Mon Aug 27 18:00:54 2018
Path: /home/sdwillso
FS: 3.8 TiB Used FS: 13.4% Inodes: 250.0 Mi Used Inodes: 2.7%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
api = DAOS
test filename = 0d89ddc3-ae02-4df5-a071-2423bb3a32b9
access = single-shared-file, independent
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 1 (1 per node)
repetitions = 5
xfersize = 1 MiB
blocksize = 5 GiB
aggregate filesize = 5 GiB
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
Commencing write performance test: Mon Aug 27 18:00:55 2018
write 2553.80 5242880 1024.00 0.044146 1.93 0.026871 2.00 0
Verifying contents of the file(s) just written.
Mon Aug 27 18:00:57 2018
remove - - - - - - 0.044487 0
Commencing write performance test: Mon Aug 27 18:01:03 2018
write 2592.04 5242880 1024.00 0.035451 1.92 0.024715 1.98 1
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:05 2018
remove - - - - - - 0.045178 1
Commencing write performance test: Mon Aug 27 18:01:12 2018
write 2593.84 5242880 1024.00 0.035656 1.91 0.025228 1.97 2
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:14 2018
remove - - - - - - 0.043000 2
Commencing write performance test: Mon Aug 27 18:01:20 2018
write 2595.09 5242880 1024.00 0.036727 1.91 0.024770 1.97 3
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:22 2018
remove - - - - - - 0.044975 3
Commencing write performance test: Mon Aug 27 18:01:28 2018
write 2590.57 5242880 1024.00 0.035836 1.92 0.025285 1.98 4
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:30 2018
remove - - - - - - 0.043984 4
Max Write: 2595.09 MiB/sec (2721.15 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 2595.09 2553.80 2585.07 15.71 1.98068 0 1 1 5 0 0 1 0 0 1 5368709120 1048576 5368709120 DAOS 0
Finished: Mon Aug 27 18:01:39 2018
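The IOR summary row above can be reproduced from the five per-iteration write bandwidths. This is a small sketch with illustrative variable names. The reported StdDev matches a population standard deviation rather than a sample one; that is an observation from these numbers, not a statement about IOR's source.

```python
import statistics

# Per-iteration write bandwidth (MiB/s) copied from the IOR output above.
write_bw = [2553.80, 2592.04, 2593.84, 2595.09, 2590.57]

max_bw = max(write_bw)
min_bw = min(write_bw)
mean_bw = statistics.mean(write_bw)
# Population standard deviation (divide by N, not N - 1).
std_bw = statistics.pstdev(write_bw)

# Convert the maximum from MiB/s (2**20 bytes) to MB/s (10**6 bytes).
max_bw_mb = max_bw * 2**20 / 10**6

print(f"max  : {max_bw:.2f} MiB/s ({max_bw_mb:.2f} MB/s)")
print(f"min  : {min_bw:.2f} MiB/s")
print(f"mean : {mean_bw:.2f} MiB/s")
print(f"std  : {std_bw:.2f}")
```

The spread is small (well under 1% of the mean), and the first iteration is the slowest of the five.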
daos_bench
kv-idx-update
- At end of this test with multiple servers, container destroy fails
Time: 105.668696 seconds (9463.540644 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-idx-update --testid=1 --svc=1 --dpool=099fde5e-e164-4e0f-b1be-bc7130a652b9 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:25:30 2018
=================================
===============================
Test Setup
---------------
Test: kv-idx-update
DAOS pool :099fde5e-e164-4e0f-b1be-bc7130a652b9
DAOS container :d17f64ae-2c6c-48a5-a3b5-0733424c0384
Value buffer size: 64
Number of processes: 1
Number of indexes/process: 1000000
Number of asynchronous I/O: 32
===============================
kv-idx-update
Time: 105.668696 seconds (9463.540644 ops per second)
daosbench:0:src/tests/daosbench.c:765: Unknown error 2001: Container destroy failed
kv-dkey-update
Time: 0.008301 seconds (12047.253463 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=15d0097a-c03b-4272-8d60-b4e7cd11544e --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:29:54 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :15d0097a-c03b-4272-8d60-b4e7cd11544e
DAOS container :44c3bcea-1056-418f-a900-8eb899af26af
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.008301 seconds (12047.253463 ops per second)
Ended at Mon Aug 27 18:29:55 2018
kv-akey-update
Time: 0.004175 seconds (23950.921016 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=7185fa9d-f5ad-4454-90e2-3ce5187f3bd3 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:31:54 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :7185fa9d-f5ad-4454-90e2-3ce5187f3bd3
DAOS container :9da1b295-f055-4b0b-ac63-9702daa30715
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.004175 seconds (23950.921016 ops per second)
Ended at Mon Aug 27 18:31:55 2018
kv-dkey-fetch
Time: 0.000553 seconds (180706.206748 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=64a6e942-4bb2-4b50-a95e-58086a17bfab --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:33:46 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :64a6e942-4bb2-4b50-a95e-58086a17bfab
DAOS container :b511c4d7-aab0-4700-970f-eadc840806d1
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.000553 seconds (180706.206748 ops per second)
Ended at Mon Aug 27 18:33:47 2018
kv-akey-fetch
Time: 0.001576 seconds (63464.794813 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=7e97c5fb-196e-4851-9c88-069349d8ce6e --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:35:29 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :7e97c5fb-196e-4851-9c88-069349d8ce6e
DAOS container :932946ad-9bf1-4309-b14b-96ca699ee72c
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.001576 seconds (63464.794813 ops per second)
Ended at Mon Aug 27 18:35:30 2018
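The daosbench ops-per-second figures above are the operation count divided by elapsed time. The sketch below recomputes them from the printed values. Note that although every command passes --indexes=1000000, the four dkey/akey tests report "Number of keys/process: 100", so this sketch assumes 100 operations for those runs. With elapsed times under 10 ms, those four rates are based on very short measurements and are probably best read as rough indications.

```python
# (test name, operation count, elapsed seconds, reported ops/sec),
# all copied from the daosbench output above.
runs = [
    ("kv-idx-update", 1_000_000, 105.668696, 9463.540644),
    ("kv-dkey-update", 100, 0.008301, 12047.253463),
    ("kv-akey-update", 100, 0.004175, 23950.921016),
    ("kv-dkey-fetch", 100, 0.000553, 180706.206748),
    ("kv-akey-fetch", 100, 0.001576, 63464.794813),
]

recomputed = {}
for name, ops, seconds, reported in runs:
    rate = ops / seconds
    recomputed[name] = rate
    # The printed time is rounded to microseconds, so small
    # differences from the reported rate are expected.
    rel_diff = abs(rate - reported) / reported
    print(f"{name:15s} {rate:14.2f} ops/s  (relative diff {rel_diff:.1e})")
```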
CaRT Self-Test
Small IO
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(0-EMPTY 0-EMPTY)]
Buffer addresses end with: <Default>
Repetitions per size: 100000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.500411 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 0.00
RPC Throughput (RPCs/sec): 199836
RPC Latencies (us):
Min : 34
25th %: 77
Median : 77
75th %: 80
Max : 1563
Average: 79
Std Dev: 15.72
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 77
Large IO Bulk PUT
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(0-EMPTY 1048576-BULK_PUT)]
Buffer addresses end with: <Default>
Repetitions per size: 1000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.145925 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 6852.84
RPC Throughput (RPCs/sec): 6853
RPC Latencies (us):
Min : 1372
25th %: 2292
Median : 2313
75th %: 2336
Max : 4448
Average: 2324
Std Dev: 178.51
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2313
Large IO Bulk GET
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(1048576-BULK_GET 0-EMPTY)]
Buffer addresses end with: <Default>
Repetitions per size: 1000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.125163 S.
##################################################
Results for message size (1048576-BULK_GET 0-EMPTY) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 7989.59
RPC Throughput (RPCs/sec): 7990
RPC Latencies (us):
Min : 518
25th %: 1961
Median : 1977
75th %: 2004
Max : 3477
Average: 1991
Std Dev: 232.06
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 1977
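As a sanity check on the three self_test runs above, throughput should equal repetitions divided by duration. By Little's law, throughput times average latency should also come out close to the configured 16 in-flight RPCs. The sketch below applies both checks to the printed values; the structure and names are illustrative and are not part of self_test.

```python
# (label, repetitions, duration in s, reported RPCs/sec, average latency in us),
# all copied from the self_test output above.
runs = [
    ("small IO", 100_000, 0.500411, 199836, 79),
    ("1 MiB bulk PUT", 1_000, 0.145925, 6853, 2324),
    ("1 MiB bulk GET", 1_000, 0.125163, 7990, 1991),
]

results = []
for label, reps, duration, reported_rate, avg_latency_us in runs:
    rate = reps / duration
    # Little's law: mean requests in flight = arrival rate * mean latency.
    in_flight = reported_rate * avg_latency_us * 1e-6
    results.append((label, rate, in_flight))
    print(f"{label:15s} rate={rate:10.1f} RPC/s  in-flight~{in_flight:.2f}")
```

All three runs land just under 16, which suggests the client kept the request pipeline full for essentially the whole test.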
mpich tests
Results: Hangs on the first test until it segfaults with current master. After updating to the OFI commit mentioned at the beginning of this page, hit a further failure (Jira ticket reference missing).