Tip of master, commit 2062dde6a1814da14eab74363eb23fad0ea3e66b
All tests were run with ofi+psm2 over ib0.
daos_test: Run with 8 servers (boro-[4-11]) and 2 clients (boro-[12-13]). Servers were killed and /mnt/daos was cleaned between the runs listed below.
Tests requiring a pool created via dmg used a 4GB pool, with boro-12 as the client.
mpich tests used boro-4 as the server and boro-12 as the client, with a 1GB pool.
Test Results
daos_test
Separate runs with cleanup in between:
- -mpcCAeoRd - PASS
- -i - FAIL, still rebuilding on IO27 after 10 minutes
- -r - FAIL, same as -i: still rebuilding after 10 minutes
- -O - PASS
daos_perf
1K Records
CREDITS=1
[sdwillso@boro-4 ior]$ orterun --mca mtl ^psm2,ofi -N 1 -quiet --hostfile ~/scripts/host.cli.1 --ompi-server file:~/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test :
DAOS (full stack)
Parameters :
pool size : 2048 MB
credits : 1 (sync I/O for -ve)
obj_per_cont : 1 x 2 (procs)
dkey_per_obj : 1
akey_per_dkey : 200
recx_per_akey : 1000
value type : single
value size : 1024
zero copy : yes
overwrite : yes
verify fetch : no
VOS file : <NULL>
d3bda290: rank 1 became pool service leader 0
Started...
update successfully completed:
duration : 96.823832 sec
bandwith : 4.034 MB/sec
rate : 4131.21 IO/sec
latency : 242.060 us (nonsense if credits > 1)
Duration across processes:
MAX duration : 96.823226 sec
MIN duration : 90.260579 sec
Average duration : 93.541903 sec
d3bda290: rank 1 no longer pool service leader 0
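The summary figures in the run above can be cross-checked from the parameters it prints. The sketch below is only a sanity check: it assumes the total IO count is procs x objects x dkeys x akeys x recx, that "MB" means 2**20 bytes, and that the latency is the duration divided by the total IO count. These assumptions are inferred from the numbers matching, not taken from the daos_perf source, and the function name is hypothetical.

```python
def daos_perf_summary(procs, objs, dkeys, akeys, recxs, value_bytes, duration_s):
    """Recompute rate, bandwidth, and latency from the printed run parameters."""
    total_ios = procs * objs * dkeys * akeys * recxs
    rate = total_ios / duration_s                    # IO/sec
    bandwidth = rate * value_bytes / (1024 * 1024)   # MB/sec, assuming 1 MB = 2**20 bytes
    latency_us = duration_s / total_ios * 1e6        # microseconds per IO
    return total_ios, rate, bandwidth, latency_us


# Values copied from the 1K Records, CREDITS=1 run above.
total_ios, rate, bandwidth, latency_us = daos_perf_summary(
    procs=2, objs=1, dkeys=1, akeys=200, recxs=1000,
    value_bytes=1024, duration_s=96.823832,
)

print(f"total IOs : {total_ios}")
print(f"rate      : {rate:.2f} IO/sec")
print(f"bandwidth : {bandwidth:.3f} MB/sec")
print(f"latency   : {latency_us:.3f} us")
```

Under these assumptions the recomputed values agree with the reported 4131.21 IO/sec, 4.034 MB/sec, and 242.060 us.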
CREDITS=8
4K Records
CREDITS=1
IOR, 10GB pool, data verification enabled
Hit an error after updating to the latest IOR: "Can't modify committed epoch"
Still debugging. Stdout:
[sdwillso@boro-4 ior]$ orterun -x FI_PSM2_DISCONNECT=1 -N 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -- -p 20c1b6da-6bfd-4082-8791-718cf0028b3c -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE -e 1
ior WARNING: assuming POSIX-based backend for DAOS statfs call.
ior WARNING: assuming POSIX-based backend for DAOS mkdir call.
ior WARNING: assuming POSIX-based backend for DAOS rmdir call.
ior WARNING: assuming POSIX-based backend for DAOS access call.
ior WARNING: assuming POSIX-based backend for DAOS stat call.
ior WARNING: assuming POSIX-based backend for DAOS statfs call.
ior WARNING: assuming POSIX-based backend for DAOS mkdir call.
ior WARNING: assuming POSIX-based backend for DAOS rmdir call.
ior WARNING: assuming POSIX-based backend for DAOS access call.
ior WARNING: assuming POSIX-based backend for DAOS stat call.
IOR-3.1.0: MPI Coordinated Test of Parallel I/O
Began : Mon Sep 10 22:37:41 2018
Command line : ior -v -W -i 5 -a DAOS -w -o b2c28bd8-1877-43af-9f75-ea23d6b28ca1 -b 5g -t 1m -- -p 20c1b6da-6bfd-4082-8791-718cf0028b3c -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE -e 1
Machine : Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 10554382.37 sec
TestID : 0
StartTime : Mon Sep 10 22:37:41 2018
Path : /home/sdwillso/ior
FS : 3.8 TiB Used FS: 14.8% Inodes: 250.0 Mi Used Inodes: 2.8%
Participating tasks: 2
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Options:
api : DAOS
apiVersion : DAOS
test filename : b2c28bd8-1877-43af-9f75-ea23d6b28ca1
access : single-shared-file
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
tasks : 2
clients per node : 1
repetitions : 5
xfersize : 1 MiB
blocksize : 5 GiB
aggregate filesize : 10 GiB
Results:
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
Commencing write performance test: Mon Sep 10 22:37:42 2018
write 4947 5242880 1024.00 0.043661 2.00 0.026218 2.07 0
Verifying contents of the file(s) just written.
Mon Sep 10 22:37:44 2018
remove - - - - - - 0.000043 0
Can't modify committed epoch
Can't modify committed epoch
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[boro-4.boro.hpdd.intel.com:16408] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[boro-4.boro.hpdd.intel.com:16408] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
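The one write result that completed before the abort can be checked against the printed options. This is a minimal sketch, assuming the bw column is the aggregate file size divided by the total(s) column; the function name is hypothetical, and total(s) is only printed to two decimals, so the match is approximate.

```python
def ior_bandwidth_mib_s(aggregate_gib, total_s):
    """Aggregate file size in MiB divided by the total time of the iteration."""
    return aggregate_gib * 1024 / total_s


# "aggregate filesize : 10 GiB" and total(s) = 2.07 from the write line above.
bw = ior_bandwidth_mib_s(aggregate_gib=10, total_s=2.07)
print(f"write bandwidth: {bw:.0f} MiB/s")
```

The result lands close to the reported 4947 MiB/s, which suggests the write phase itself completed normally and the failure is confined to the verification pass.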
daosbench
kv-idx-update
kv-dkey-update
Time: 0.219707 seconds (910.301226 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=37b4b797-73d3-48a2-8a6d-e21d4e320191 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:45:41 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :37b4b797-73d3-48a2-8a6d-e21d4e320191
DAOS container :a1539b57-be18-4394-9747-b92d4bb217dd
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.219707 seconds (910.301226 ops per second)
Ended at Mon Sep 10 17:45:42 2018
kv-akey-update
Time: 0.126152 seconds (1585.386416 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=54728449-42a3-4d8c-b260-62b2da78934a --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:47:35 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :54728449-42a3-4d8c-b260-62b2da78934a
DAOS container :de4e0596-9df8-4931-929b-223e1c4a85eb
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.126152 seconds (1585.386416 ops per second)
Ended at Mon Sep 10 17:47:37 2018
kv-dkey-fetch
Time: 0.188764 seconds (1059.522366 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=04d536fd-e84a-4b5c-beae-5e63312a948c --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:49:16 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :04d536fd-e84a-4b5c-beae-5e63312a948c
DAOS container :b6cf04f9-5c25-4f03-8436-7366034b8194
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.188764 seconds (1059.522366 ops per second)
Ended at Mon Sep 10 17:49:17 2018
kv-akey-fetch
Time: 0.079754 seconds (2507.708677 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=daf9c1f4-0b2b-4554-b517-66fced0723f0 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:50:49 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :daf9c1f4-0b2b-4554-b517-66fced0723f0
DAOS container :4a01e667-0056-4b60-80b7-78d095ba3e2b
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.079754 seconds (2507.708677 ops per second)
Ended at Mon Sep 10 17:50:50 2018
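The ops-per-second figures for the four daosbench runs follow from the "Test Setup" values in each log. The sketch below assumes the rate is processes x keys/process divided by the reported time; the helper name is hypothetical. It also makes visible that each run counts 100 keys per process, even though the commands pass --indexes=1000000, so these timings cover only 200 operations each.

```python
def ops_per_second(processes, keys_per_process, seconds):
    """Total operations across all processes divided by elapsed time."""
    return processes * keys_per_process / seconds


# (elapsed seconds, reported ops/sec) copied from the logs above.
runs = {
    "kv-dkey-update": (0.219707, 910.301226),
    "kv-akey-update": (0.126152, 1585.386416),
    "kv-dkey-fetch": (0.188764, 1059.522366),
    "kv-akey-fetch": (0.079754, 2507.708677),
}

recomputed = {}
for test, (seconds, reported) in runs.items():
    # Every run reports 2 processes and 100 keys per process.
    recomputed[test] = ops_per_second(2, 100, seconds)
    print(f"{test:15s} reported {reported:10.2f}  recomputed {recomputed[test]:10.2f}")
```

Small differences in the last digits are expected because the logged time is rounded to six decimals.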
CaRT Self-Test
Small IO
[sdwillso@boro-4 ior]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(0-EMPTY 0-EMPTY)]
Buffer addresses end with: <Default>
Repetitions per size: 100000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 19.785263 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 0.00
RPC Throughput (RPCs/sec): 5054
RPC Latencies (us):
Min : 1086
25th %: 3039
Median : 3067
75th %: 3094
Max : 329750
Average: 3133
Std Dev: 4140.93
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 3067
Large IO Bulk PUT
[sdwillso@boro-4 ior]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(0-EMPTY 1048576-BULK_PUT)]
Buffer addresses end with: <Default>
Repetitions per size: 1000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.340525 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 2936.65
RPC Throughput (RPCs/sec): 2937
RPC Latencies (us):
Min : 2239
25th %: 5376
Median : 5400
75th %: 5421
Max : 6376
Average: 5382
Std Dev: 254.28
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 5400
Large IO Bulk GET
[sdwillso@boro-4 ior]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
ranks: 0 (# ranks = 1)
tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
Group name to test against: daos_server
# endpoints: 1
Message sizes: [(1048576-BULK_GET 1048576-BULK_PUT)]
Buffer addresses end with: <Default>
Repetitions per size: 1000
Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.579903 S.
##################################################
Results for message size (1048576-BULK_GET 1048576-BULK_PUT) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
RPC Bandwidth (MB/sec): 3448.85
RPC Throughput (RPCs/sec): 1724
RPC Latencies (us):
Min : 6855
25th %: 8634
Median : 8676
75th %: 8745
Max : 14535
Average: 9219
Std Dev: 1704.53
RPC Failures: 0
Endpoint results (rank:tag - Median Latency (us)):
0:0 - 8675
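The three self_test results above can be cross-checked from the repetition count, the reported duration, and the logged message sizes. This is a rough sketch under stated assumptions: throughput is repetitions divided by duration, bandwidth counts the bytes moved in both directions with 1 MB = 2**20 bytes, and the in-flight estimate (throughput x average latency, Little's law) assumes the client kept its RPC window full. The function and dictionary names are hypothetical. Payload sizes are taken from the logged "Message sizes" lines, not from the command lines.

```python
MIB = 1024 * 1024


def self_test_summary(repetitions, duration_s, bytes_per_rpc, avg_latency_us):
    """Return (RPCs/sec, MB/sec, estimated RPCs in flight) for one run."""
    throughput = repetitions / duration_s
    bandwidth = throughput * bytes_per_rpc / MIB
    in_flight = throughput * avg_latency_us / 1e6   # Little's law estimate
    return throughput, bandwidth, in_flight


# (repetitions, duration in s, payload bytes per RPC, average latency in us)
runs = {
    "small (0-EMPTY 0-EMPTY)": (100000, 19.785263, 0, 3133),
    "bulk (0-EMPTY 1048576-BULK_PUT)": (1000, 0.340525, 1048576, 5382),
    "bulk (1048576-BULK_GET 1048576-BULK_PUT)": (1000, 0.579903, 2 * 1048576, 9219),
}

results = {name: self_test_summary(*args) for name, args in runs.items()}
for name, (throughput, bandwidth, in_flight) in results.items():
    print(f"{name}: {throughput:.0f} RPCs/sec, {bandwidth:.2f} MB/sec, "
          f"~{in_flight:.1f} RPCs in flight")
```

In all three runs the in-flight estimate comes out just under the configured --max-inflight-rpcs 16, which is consistent with the reported latencies being dominated by queueing behind the other outstanding RPCs.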
mpich tests
Results: Hangs at the first test due to a known issue (the linked Jira ticket is unavailable).