...
Tip of master, commit 98cd53e0a273885324261d5f38c23a80be84f09e
After running tip of master, reran a few tests with OFI updated to 99e333426b64d7d227fd604731235ffc14862662 to pull in some psm2 fixes.
All tests run with ofi+psm2, ib0.
...
- -mpcCAeoRd - PASS
- -i - FAIL, still rebuilding on IO27 after 10 minutes (DAOS-1289)
- -r - FAIL
- looks to be the same as -i above, still rebuilding after 10 min
- -O - PASS
...
CREDITS=8
- hitting segfault: CART-496 - segfault in psm2 while running daos_perf (OPEN)
4K Records
CREDITS=1
- hitting segfault: CART-496 - segfault in psm2 while running daos_perf (OPEN)
IOR, 10GB pool, data verification enabled
...
...
CART-496
4K Records
CREDITS=1
- hitting segfault: CART-496
IOR, 10GB pool, data verification enabled
[sdwillso@boro-4 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O
Began: Mon Aug 27 18:00:54 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 0d89ddc3-ae02-4df5-a071-2423bb3a32b9 -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec
Test 0 started: Mon Aug 27 18:00:54 2018
Path: /home/sdwillso
FS: 3.8 TiB Used FS: 13.4% Inodes: 250.0 Mi Used Inodes: 2.7%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
api = DAOS
test filename = 0d89ddc3-ae02-4df5-a071-2423bb3a32b9
access = single-shared-file, independent
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 1 (1 per node)
repetitions = 5
xfersize = 1 MiB
blocksize = 5 GiB
aggregate filesize = 5 GiB
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
Commencing write performance test: Mon Aug 27 18:00:55 2018
write 2553.80 5242880 1024.00 0.044146 1.93 0.026871 2.00 0
Verifying contents of the file(s) just written.
Mon Aug 27 18:00:57 2018
remove - - - - - - 0.044487 0
Commencing write performance test: Mon Aug 27 18:01:03 2018
write 2592.04 5242880 1024.00 0.035451 1.92 0.024715 1.98 1
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:05 2018
remove - - - - - - 0.045178 1
Commencing write performance test: Mon Aug 27 18:01:12 2018
write 2593.84 5242880 1024.00 0.035656 1.91 0.025228 1.97 2
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:14 2018
remove - - - - - - 0.043000 2
Commencing write performance test: Mon Aug 27 18:01:20 2018
write 2595.09 5242880 1024.00 0.036727 1.91 0.024770 1.97 3
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:22 2018
remove - - - - - - 0.044975 3
Commencing write performance test: Mon Aug 27 18:01:28 2018
write 2590.57 5242880 1024.00 0.035836 1.92 0.025285 1.98 4
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:30 2018
remove - - - - - - 0.043984 4
Max Write: 2595.09 MiB/sec (2721.15 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 2595.09 2553.80 2585.07 15.71 1.98068 0 1 1 5 0 0 1 0 0 1 5368709120 1048576 5368709120 DAOS 0
Finished: Mon Aug 27 18:01:39 2018
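The summary figures in the IOR log above can be cross-checked from the five per-iteration write bandwidths. The following is a minimal Python sketch and was not part of the original test run. The variable names are illustrative. It assumes that IOR reports a population standard deviation and that the MB/sec figure is the MiB/sec figure scaled by 1048576/1000000, which is what the logged numbers are consistent with.

```python
import statistics

# Per-iteration write bandwidths (MiB/s), copied from the IOR log above.
write_bw_mib = [2553.80, 2592.04, 2593.84, 2595.09, 2590.57]

max_bw = max(write_bw_mib)
min_bw = min(write_bw_mib)
mean_bw = statistics.fmean(write_bw_mib)

# Assumption: population (not sample) standard deviation.
stddev_bw = statistics.pstdev(write_bw_mib)

# Assumption: MB/s = MiB/s * 2**20 / 10**6.
max_bw_mb = max_bw * 1048576 / 1_000_000

print(f"max={max_bw:.2f} min={min_bw:.2f} mean={mean_bw:.2f} "
      f"stddev={stddev_bw:.2f} max_MB/s={max_bw_mb:.2f}")
```

Rounded to two decimals, the recomputed mean, standard deviation, and MB/sec values match the "Summary of all tests" row and the "Max Write" line in the log.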
...
- At end of this test with multiple servers, container destroy fails
- DAOS-1243 - container destroy fails with 2+ servers in kv-idx-update (OPEN)
Time: 105.668696 seconds (9463.540644 ops per second)
...
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=7e97c5fb-196e-4851-9c88-069349d8ce6e --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:35:29 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :7e97c5fb-196e-4851-9c88-069349d8ce6e
DAOS container :932946ad-9bf1-4309-b14b-96ca699ee72c
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.001576 seconds (63464.794813 ops per second)
Ended at Mon Aug 27 18:35:30 2018
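As a quick sanity check on the kv-akey-fetch figures above, the reported rate can be recomputed from the key count and the elapsed time. This is a minimal Python sketch under the assumption that the operation count equals keys per process times the number of processes. daosbench's own accounting is not shown in the log, so this is an inference from the printed numbers.

```python
# Values copied from the kv-akey-fetch run above.
keys_per_process = 100
processes = 1
elapsed_s = 0.001576              # printed with 6 decimals in the log
logged_ops_per_s = 63464.794813

# Assumption: total operations = keys per process * number of processes.
ops_per_s = keys_per_process * processes / elapsed_s

# The log rounds the elapsed time, so expect a small mismatch.
rel_diff = abs(ops_per_s - logged_ops_per_s) / logged_ops_per_s
print(f"recomputed: {ops_per_s:.1f} ops/s, relative difference: {rel_diff:.4%}")
```

The small remaining difference is consistent with the elapsed time having been rounded to microseconds before printing.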
CaRT Self-Test
Small IO
...
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(0-EMPTY 0-EMPTY)]
Buffer addresses end with: <Default>
Repetitions per size: 100000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.500411 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 0.00
RPC Throughput (RPCs/sec): 199836
RPC Latencies (us):
Min : 34
25th %: 77
Median : 77
75th %: 80
Max : 1563
Average: 79
Std Dev: 15.72
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 77
Large IO Bulk PUT
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(0-EMPTY 1048576-BULK_PUT)]
Buffer addresses end with: <Default>
Repetitions per size: 1000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.145925 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 6852.84
RPC Throughput (RPCs/sec): 6853
RPC Latencies (us):
Min : 1372
25th %: 2292
Median : 2313
75th %: 2336
Max : 4448
Average: 2324
Std Dev: 178.51
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 2313
Large IO Bulk GET
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(1048576-BULK_GET 0-EMPTY)]
Buffer addresses end with: <Default>
Repetitions per size: 1000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.125163 S.
##################################################
Results for message size (1048576-BULK_GET 0-EMPTY) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 7989.59
RPC Throughput (RPCs/sec): 7990
RPC Latencies (us):
Min : 518
25th %: 1961
Median : 1977
75th %: 2004
Max : 3477
Average: 1991
Std Dev: 232.06
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 1977
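The throughput and bandwidth figures in the three self_test runs above can be cross-checked against the logged durations and average latencies. The sketch below is illustrative Python, not part of self_test. It assumes throughput is repetitions divided by duration, that the "MB/sec" figure divides bytes by 2**20 (which matches the logged values), and it uses Little's law (in-flight RPCs divided by average latency) as a rough second estimate.

```python
MAX_INFLIGHT = 16
BULK_BYTES = 1048576  # size of the bulk transfer in the PUT/GET runs

# Values copied from the self_test logs above.
runs = {
    "small":    {"reps": 100000, "duration_s": 0.500411, "bytes": 0,
                 "avg_latency_us": 79,   "logged_rpcs": 199836, "logged_bw": 0.00},
    "bulk_put": {"reps": 1000,   "duration_s": 0.145925, "bytes": BULK_BYTES,
                 "avg_latency_us": 2324, "logged_rpcs": 6853,   "logged_bw": 6852.84},
    "bulk_get": {"reps": 1000,   "duration_s": 0.125163, "bytes": BULK_BYTES,
                 "avg_latency_us": 1991, "logged_rpcs": 7990,   "logged_bw": 7989.59},
}

def check_run(run):
    """Recompute throughput and bandwidth for one run."""
    rpcs_per_s = run["reps"] / run["duration_s"]
    # Assumption: the logged "MB/sec" divides bytes by 2**20.
    bw = rpcs_per_s * run["bytes"] / 2**20
    # Little's law: throughput ~= concurrency / average latency.
    littles = MAX_INFLIGHT / (run["avg_latency_us"] * 1e-6)
    return {"rpcs_per_s": rpcs_per_s, "bw": bw, "littles": littles}

results = {name: check_run(run) for name, run in runs.items()}
for name, res in results.items():
    print(f"{name}: {res['rpcs_per_s']:.0f} RPCs/s, "
          f"{res['bw']:.2f} MB/s, Little's law estimate {res['littles']:.0f} RPCs/s")
```

The recomputed throughput agrees with the logged values to within rounding of the printed duration. The Little's law estimate lands within about 2% of each logged throughput, which suggests the 16 in-flight slots were kept full for most of each run.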
mpich tests
Results: With current master, the first test hung until it segfaulted. After updating to the OFI commit mentioned at the beginning of this page, hit DAOS-1290.