10-31-18

Tip of master, commit b20ff0ca1f5ddbfbecf2e6ffe89bd1a42a1d7043

All tests were run with ofi+psm2 over ib0.

daos_test: Run with 8 servers (boro-[4-11]) and 2 clients (boro-12, boro-16). Servers were killed and /mnt/daos was cleaned between the runs listed below.

Tests that require a pool created via dmg used a 4GB pool, with boro-12 as the client.

mpich tests used boro-4 as server, boro-12 as client, with a 1GB pool.

Tests used 8 xstreams per server this time, because there is a bug with the 36 xstreams I normally run with.

Test Results

daos_test

Separate runs with cleanup in between:

  • -mpcCAeoRd - PASS
  • -r - FAIL
    • Appears to be DAOS-1556
  • -i - FAIL

daos_perf

1K Records

CREDITS=1

[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi -np 1 -quiet --hostfile ~/scripts/host.cli.1 --ompi-server file:~/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log daos_perf -T daos -P 4G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'openmpi-x86_64'
Test :
	DAOS (full stack)
Parameters :
	pool size     : 4096 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 1 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	verify fetch  : no
	VOS file      : <NULL>
3e9214af: rank 1 became pool service leader 0
Started...
update successfully completed:
	duration : 4.446848   sec
	bandwith : 43.922     MB/sec
	rate     : 44975.68   IO/sec
	latency  : 22.234     us (nonsense if credits > 1)
Duration across processes:
	MAX duration : 4.446848   sec
	MIN duration : 4.446848   sec
	Average duration : 4.446848   sec
3e9214af: rank 1 no longer pool service leader 0
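
The rate, bandwidth, and latency figures in this run can be cross-checked from the parameters and the duration alone. The sketch below is only a sanity check, not how daos_perf computes its numbers: it assumes one update per record extent (200 akeys x 1000 recx), that "MB" in the output means 2^20 bytes, and that with a single credit the latency is simply duration divided by the number of updates. The function and constant names are made up for illustration.

```python
# Figures copied from the daos_perf output above.
AKEYS_PER_DKEY = 200
RECX_PER_AKEY = 1000
VALUE_SIZE_BYTES = 1024
DURATION_SEC = 4.446848


def derive_metrics(akeys, recx, value_size, duration):
    """Return (IO/sec, MB/sec, latency in us) for a single-credit run.

    Assumptions (not confirmed by the source): one update per record
    extent, and MB = 2**20 bytes.
    """
    total_ios = akeys * recx
    rate = total_ios / duration
    bandwidth = rate * value_size / 2**20
    latency_us = duration / total_ios * 1e6
    return rate, bandwidth, latency_us


rate, bandwidth, latency_us = derive_metrics(
    AKEYS_PER_DKEY, RECX_PER_AKEY, VALUE_SIZE_BYTES, DURATION_SEC
)
print(f"rate     : {rate:.2f} IO/sec")
print(f"bandwidth: {bandwidth:.3f} MB/sec")
print(f"latency  : {latency_us:.3f} us")
```

Under those assumptions the derived values should land on the reported 44975.68 IO/sec, 43.922 MB/sec, and 22.234 us, which suggests the report is internally consistent.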

CREDITS=8

CART-496

4K Records

CREDITS=1

CART-496

IOR, 40GB pool, data verification enabled

[sdwillso@boro-4 daos_m]$ orterun --mca mtl ^psm2,ofi -np 1 --ompi-server file:~/scripts/uri.txt dmg create --size=40G
3675153f: rank 1 became pool service leader 0
3675153f-7d2c-48d9-9b19-0da8cebb2b18 1
[sdwillso@boro-4 daos_m]$ orterun -x FI_PSM2_DISCONNECT=1 -N 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -- -p 3675153f-7d2c-48d9-9b19-0da8cebb2b18 -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE
ior WARNING: assuming POSIX-based backend for DAOS statfs call.
ior WARNING: assuming POSIX-based backend for DAOS mkdir call.
ior WARNING: assuming POSIX-based backend for DAOS rmdir call.
ior WARNING: assuming POSIX-based backend for DAOS access call.
ior WARNING: assuming POSIX-based backend for DAOS stat call.
ior WARNING: assuming POSIX-based backend for DAOS statfs call.
ior WARNING: assuming POSIX-based backend for DAOS mkdir call.
ior WARNING: assuming POSIX-based backend for DAOS rmdir call.
ior WARNING: assuming POSIX-based backend for DAOS access call.
ior WARNING: assuming POSIX-based backend for DAOS stat call.
IOR-3.1.0: MPI Coordinated Test of Parallel I/O
Began               : Wed Oct 31 22:37:44 2018
Command line        : ior -v -W -i 5 -a DAOS -w -o f76eabf5-dddd-44d8-9e80-859e18b14f3e -b 5g -t 1m -- -p 3675153f-7d2c-48d9-9b19-0da8cebb2b18 -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE
Machine             : Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 2208081.84 sec
TestID              : 0
StartTime           : Wed Oct 31 22:37:44 2018
Path                : /home/sdwillso/daos_m
FS                  : 3.8 TiB   Used FS: 14.2%   Inodes: 250.0 Mi   Used Inodes: 3.1%
Participating tasks: 2
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA

Options: 
api                 : DAOS
apiVersion          : DAOS
test filename       : f76eabf5-dddd-44d8-9e80-859e18b14f3e
access              : single-shared-file
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
tasks               : 2
clients per node    : 1
repetitions         : 5
xfersize            : 1 MiB
blocksize           : 5 GiB
aggregate filesize  : 10 GiB

Results: 

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
Commencing write performance test: Wed Oct 31 22:37:45 2018
write     4542       5242880    1024.00    0.025885   2.21       0.020801   2.25       0   
Verifying contents of the file(s) just written.
Wed Oct 31 22:37:47 2018

remove    -          -          -          -          -          -          0.029135   0   
Commencing write performance test: Wed Oct 31 22:37:54 2018
write     4637       5242880    1024.00    0.023124   2.16       0.020863   2.21       1   
Verifying contents of the file(s) just written.
Wed Oct 31 22:37:56 2018

remove    -          -          -          -          -          -          0.028687   1   
Commencing write performance test: Wed Oct 31 22:38:02 2018
write     4614       5242880    1024.00    0.023486   2.18       0.020511   2.22       2   
Verifying contents of the file(s) just written.
Wed Oct 31 22:38:04 2018

remove    -          -          -          -          -          -          0.029018   2   
Commencing write performance test: Wed Oct 31 22:38:12 2018
write     4620       5242880    1024.00    0.023798   2.17       0.021097   2.22       3   
Verifying contents of the file(s) just written.
Wed Oct 31 22:38:14 2018

remove    -          -          -          -          -          -          0.029168   3   
Commencing write performance test: Wed Oct 31 22:38:21 2018
write     4608       5242880    1024.00    0.024092   2.18       0.020979   2.22       4   
Verifying contents of the file(s) just written.
Wed Oct 31 22:38:23 2018

remove    -          -          -          -          -          -          0.028924   4   
Max Write: 4636.61 MiB/sec (4861.84 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt   blksiz    xsize aggs(MiB)   API RefNum
write        4636.61    4541.56    4603.98      32.64    4636.61    4541.56    4603.98      32.64    2.22428     0      2   1    5   0     0        1         0    0      1 5368709120  1048576   10240.0 DAOS      0
Finished            : Wed Oct 31 22:38:33 2018
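
A few of the IOR summary figures can be reproduced by hand, which helps when comparing against runs reported in different units. The sketch below is a rough cross-check only: the per-iteration bandwidths are the rounded values from the results table, so the mean is expected to differ slightly from the 4603.98 MiB/s that IOR computed from unrounded data. All names are hypothetical.

```python
import statistics

MIB = 2**20   # bytes in one MiB
MB = 10**6    # bytes in one MB (decimal)


def mib_to_mb(value_mib):
    """Convert a MiB (or MiB/s) figure to MB (or MB/s)."""
    return value_mib * MIB / MB


# Values copied from the IOR output above.
tasks = 2
block_size_gib = 5
max_write_mib_s = 4636.61
write_bw_mib_s = [4542, 4637, 4614, 4620, 4608]  # rounded, one per iteration

# Each task writes one block, so the aggregate file size is tasks * blocksize.
aggregate_mib = tasks * block_size_gib * 1024
mean_bw = statistics.mean(write_bw_mib_s)

print(f"aggregate filesize: {aggregate_mib} MiB")
print(f"max write         : {mib_to_mb(max_write_mib_s):.2f} MB/sec")
print(f"mean of table rows: {mean_bw:.1f} MiB/sec")
```

The aggregate size matches the 10240.0 in the aggs(MiB) column, and the converted maximum matches the "Max Write" line.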

daosbench

kv-idx-update

Time: 399.307237 seconds (2504.337278 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-idx-update --testid=1 --svc=0 --dpool=30099e37-040e-42a9-8eb2-c8cbda0e6148 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov  1 19:25:42 2018
=================================
===============================
Test Setup
---------------
Test: kv-idx-update
DAOS pool :30099e37-040e-42a9-8eb2-c8cbda0e6148
DAOS container :ce9ffdb8-054e-4f42-a9c4-8188d38b0426
Value buffer size: 64
Number of processes: 1
Number of indexes/process: 1000000
Number of asynchronous I/O: 32
===============================

kv-idx-update
Time: 399.307237 seconds (2504.337278 ops per second)

Ended at Thu Nov  1 19:32:26 2018

kv-dkey-update

Time: 0.088867 seconds (1125.278100 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=fd8ab9b3-1f49-45ad-973c-616cb453e1a9 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov  1 19:35:30 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :fd8ab9b3-1f49-45ad-973c-616cb453e1a9
DAOS container :d3e1acdf-c922-45fe-8461-fd8e927d2604
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.088867 seconds (1125.278100 ops per second)

Ended at Thu Nov  1 19:35:31 2018

kv-akey-update

Time: 0.068169 seconds (1466.935135 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=4dad8cbc-236a-4364-a7b1-dd59bb212642 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov  1 19:37:12 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :4dad8cbc-236a-4364-a7b1-dd59bb212642
DAOS container :cac9cd5b-b74c-4826-8317-25e17085e50d
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.068169 seconds (1466.935135 ops per second)

Ended at Thu Nov  1 19:37:12 2018

kv-dkey-fetch

Time: 0.049400 seconds (2024.283674 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=5ca6999f-15a8-41f4-86ac-9494fd891876 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov  1 19:38:33 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :5ca6999f-15a8-41f4-86ac-9494fd891876
DAOS container :b4bfc4e0-87d0-4ec7-bcd9-03e46235abf8
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.049400 seconds (2024.283674 ops per second)

Ended at Thu Nov  1 19:38:33 2018

kv-akey-fetch

Time: 0.038302 seconds (2610.806612 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=a87c1051-bef3-4929-ba42-d3a742e532d1 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov  1 19:39:53 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :a87c1051-bef3-4929-ba42-d3a742e532d1
DAOS container :41bd8e41-5444-4323-bd72-d820c9ea7429
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.038302 seconds (2610.806612 ops per second)

Ended at Thu Nov  1 19:39:53 2018
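
The ops-per-second figures in the daosbench runs can be recomputed from the reported operation count and elapsed time. Note that the four key tests report "Number of keys/process: 100" even though the commands pass --indexes=1000000, so the sketch below takes the operation count from the printed test setup rather than from the command line. It assumes a single process (as reported) and is only a consistency check; small differences are expected because the printed times are rounded to six decimals.

```python
# (test name, operations per process, reported seconds, reported ops/sec),
# all copied from the daosbench output above.
RUNS = [
    ("kv-idx-update", 1_000_000, 399.307237, 2504.337278),
    ("kv-dkey-update", 100, 0.088867, 1125.278100),
    ("kv-akey-update", 100, 0.068169, 1466.935135),
    ("kv-dkey-fetch", 100, 0.049400, 2024.283674),
    ("kv-akey-fetch", 100, 0.038302, 2610.806612),
]


def ops_per_second(ops, seconds):
    """Throughput for a single process."""
    return ops / seconds


def relative_error(derived, reported):
    """Relative difference between a derived and a reported value."""
    return abs(derived - reported) / reported


for name, ops, seconds, reported in RUNS:
    derived = ops_per_second(ops, seconds)
    print(
        f"{name:15s} derived={derived:12.3f} reported={reported:12.3f} "
        f"rel_err={relative_error(derived, reported):.1e}"
    )
```

Because the key tests cover only 100 operations and finish in under 0.1 seconds, their throughput numbers are far noisier than the 1,000,000-index run and should be compared with caution.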

CaRT Self-Test

Small IO

[sdwillso@boro-4 mpich]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       100000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.339317 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 0.00
	RPC Throughput (RPCs/sec): 294710
	RPC Latencies (us):
		Min    : 31
		25th  %: 51
		Median : 52
		75th  %: 52
		Max    : 968
		Average: 53
		Std Dev: 8.99
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 52

Large IO Bulk PUT

[sdwillso@boro-4 mpich]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 1048576-BULK_PUT)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.133766 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 7475.75
	RPC Throughput (RPCs/sec): 7476
	RPC Latencies (us):
		Min    : 1013
		25th  %: 2077
		Median : 2096
		75th  %: 2124
		Max    : 4216
		Average: 2130
		Std Dev: 284.34
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 2096

Large IO Bulk GET

[sdwillso@boro-4 mpich]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(1048576-BULK_GET 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.116480 S.
##################################################
Results for message size (1048576-BULK_GET 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 8585.14
	RPC Throughput (RPCs/sec): 8585
	RPC Latencies (us):
		Min    : 361
		25th  %: 1827
		Median : 1833
		75th  %: 1901
		Max    : 3450
		Average: 1853
		Std Dev: 258.15
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 1833
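
The three self_test runs can be sanity-checked the same way. The sketch below recomputes throughput as repetitions divided by the reported duration, and then estimates the average latency from Little's law (in-flight RPCs = throughput x latency), assuming all 16 in-flight slots stay busy for the whole run. That is an approximation, not how self_test measures latency, and the names are illustrative.

```python
MAX_INFLIGHT = 16

# (label, repetitions, duration in s, reported RPCs/sec, reported avg latency in us),
# all copied from the self_test output above.
RUNS = [
    ("small IO", 100_000, 0.339317, 294710, 53),
    ("bulk PUT 1 MiB", 1_000, 0.133766, 7476, 2130),
    ("bulk GET 1 MiB", 1_000, 0.116480, 8585, 1853),
]


def throughput(repetitions, duration_sec):
    """RPCs per second over the whole run."""
    return repetitions / duration_sec


def littles_law_latency_us(inflight, rpcs_per_sec):
    """Average latency implied by Little's law, in microseconds."""
    return inflight / rpcs_per_sec * 1e6


for label, reps, duration, reported_rate, reported_lat in RUNS:
    rate = throughput(reps, duration)
    latency = littles_law_latency_us(MAX_INFLIGHT, rate)
    print(
        f"{label:15s} rate={rate:10.0f} (reported {reported_rate}) "
        f"latency~{latency:7.1f} us (reported avg {reported_lat})"
    )
```

The estimated latencies come out within a few percent of the reported averages, which is consistent with the pipeline of 16 RPCs being kept full. For the 1 MiB bulk runs, the bandwidth in MB/sec equals the RPC rate because each RPC moves exactly 1048576 bytes.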

mpich tests