8-3-18

Tip of master, commit 3c2ef48f7abfe11fff0629620dc84b8e57ef4ac1

All tests run with ofi+psm2, ib0.

daos_test: Run with 8 servers (boro-[4-11]) and 2 clients (boro-[12-13]). Servers were killed and /mnt/daos cleaned between the runs listed below.

Tests requiring a pool to be created via dmg used a 4GB pool, with boro-12 as the client.

The mpich tests used boro-4 as the server and boro-12 as the client, with a 1GB pool.

Test Results

daos_test

Separate runs with cleanup in between:

  • -mpcCiAeoRd - PASS
  • -r -
  • -O - PASS

daos_perf

1K Records

CREDITS=1

CART-492

CREDITS=8

CART-492

4K Records

CREDITS=1

CART-492

IOR, 10GB pool, data verification enabled

[sdwillso@boro-3 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=46826ed3-7fde-41f4-b687-a4365646dda7,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Fri Aug  3 23:00:20 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o a734f873-e7d3-4952-8bb7-a4a2fad2823f -b 5g -t 1m -O daospool=46826ed3-7fde-41f4-b687-a4365646dda7,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec

Test 0 started: Fri Aug  3 23:00:20 2018
Path: /home/sdwillso
FS: 3.8 TiB   Used FS: 12.7%   Inodes: 250.0 Mi   Used Inodes: 2.5%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
	api                = DAOS
	test filename      = a734f873-e7d3-4952-8bb7-a4a2fad2823f
	access             = single-shared-file, independent
	pattern            = segmented (1 segment)
	ordering in a file = sequential offsets
	ordering inter file= no tasks offsets
	clients            = 1 (1 per node)
	repetitions        = 5
	xfersize           = 1 MiB
	blocksize          = 5 GiB
	aggregate filesize = 5 GiB

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
Commencing write performance test: Fri Aug  3 23:00:21 2018
write     2575.01    5242880    1024.00    0.044560   1.92       0.025960   1.99       0   
Verifying contents of the file(s) just written.
Fri Aug  3 23:00:23 2018

remove    -          -          -          -          -          -          0.043986   0   
Commencing write performance test: Fri Aug  3 23:00:29 2018
write     2546.76    5242880    1024.00    0.037254   1.95       0.025300   2.01       1   
Verifying contents of the file(s) just written.
Fri Aug  3 23:00:31 2018

remove    -          -          -          -          -          -          0.043078   1   
Commencing write performance test: Fri Aug  3 23:00:37 2018
write     2644.66    5242880    1024.00    0.035688   1.88       0.024632   1.94       2   
Verifying contents of the file(s) just written.
Fri Aug  3 23:00:39 2018

remove    -          -          -          -          -          -          0.046548   2   
Commencing write performance test: Fri Aug  3 23:00:45 2018
write     2621.47    5242880    1024.00    0.048043   1.88       0.024752   1.95       3   
Verifying contents of the file(s) just written.
Fri Aug  3 23:00:47 2018

remove    -          -          -          -          -          -          0.048288   3   
Commencing write performance test: Fri Aug  3 23:00:53 2018
write     2601.91    5242880    1024.00    0.045953   1.90       0.024521   1.97       4   
Verifying contents of the file(s) just written.
Fri Aug  3 23:00:55 2018

remove    -          -          -          -          -          -          0.042314   4   

Max Write: 2644.66 MiB/sec (2773.12 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        2644.66    2546.76    2597.96      34.34    1.97112 0 1 1 5 0 0 1 0 0 1 5368709120 1048576 5368709120 DAOS 0

Finished: Fri Aug  3 23:01:04 2018
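As a quick sanity check on the IOR summary line above (not part of the original run), the reported Mean and StdDev can be reproduced from the five per-iteration write bandwidths. IOR uses the population standard deviation (dividing by N, not N-1), and Max Write is additionally converted from MiB/s to decimal MB/s:

```python
import statistics

# Per-iteration write bandwidths (MiB/s) from the five IOR repetitions above
bw = [2575.01, 2546.76, 2644.66, 2621.47, 2601.91]

mean = sum(bw) / len(bw)
# IOR reports the population standard deviation (divides by N, not N-1)
stddev = statistics.pstdev(bw)
# Max Write in decimal MB/s: MiB/s * 2**20 / 10**6
max_mb = max(bw) * 2**20 / 10**6

print(f"mean={mean:.2f} stddev={stddev:.2f} max_mb={max_mb:.2f}")
# mean and stddev match the summary row (2597.96, 34.34);
# max_mb lands within rounding of the reported 2773.12 MB/sec
```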

daos_bench

kv-idx-update

kv-dkey-update

kv-akey-update

kv-dkey-fetch

kv-akey-fetch

  • Segfault in psm2

CaRT Self-Test

Small IO

[sdwillso@boro-4 ~]$ orterun --mca mtl ^psm2,ofi -N 1 --hostfile ~/hostlists/daos_single_server --enable-recovery --report-uri ~/scripts/uri.txt daos_server &
[4] 54978
[sdwillso@boro-4 ~]$ loaded 3 items from /home/sdwillso/daos_m/install/share/control/mgmtinit_db.json
(0/36) cores requested; use default (36) cores
DAOS server (v0.0.2) process 18672 started on rank 0 (out of 1) with 36 xstream(s)

[sdwillso@boro-4 ~]$ orterun --mca mtl ^psm2,ofi -np 1 -ompi-server file:~/scripts/uri.txt daos_m/opt/cart/bin/self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       100000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 7.572213 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 0.00
	RPC Throughput (RPCs/sec): 13206
	RPC Latencies (us):
		Min    : 447
		25th  %: 1099
		Median : 1103
		75th  %: 1109
		Max    : 330084
		Average: 1197
		Std Dev: 4164.04
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 1103

Large IO

[sdwillso@boro-4 ~]$ orterun --mca mtl ^psm2,ofi -N 1 --hostfile ~/hostlists/daos_single_server --enable-recovery --report-uri ~/scripts/uri.txt daos_server &
[3] 54936
[sdwillso@boro-4 ~]$ loaded 3 items from /home/sdwillso/daos_m/install/share/control/mgmtinit_db.json
(0/36) cores requested; use default (36) cores
DAOS server (v0.0.2) process 18536 started on rank 0 (out of 1) with 36 xstream(s)

[sdwillso@boro-4 ~]$ orterun --mca mtl ^psm2,ofi -np 1 -ompi-server file:~/scripts/uri.txt daos_m/opt/cart/bin/self_test --group-name daos_server --endpoint 0:0 --message-sizes 0:b1048576 --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 1048576-BULK_PUT)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.391614 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 2553.54
	RPC Throughput (RPCs/sec): 2554
	RPC Latencies (us):
		Min    : 3006
		25th  %: 6210
		Median : 6220
		75th  %: 6239
		Max    : 7039
		Average: 6212
		Std Dev: 342.10
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 6220
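As a cross-check of the two self_test runs above (an addition, not part of the original report): with a fixed pipeline of in-flight RPCs, Little's law predicts throughput ≈ max_inflight_rpcs / average latency. Both runs land within about 2% of that estimate, suggesting the pipeline stayed saturated:

```python
# Numbers taken from the two self_test result blocks above
runs = {
    "small (0-EMPTY)":     {"avg_latency_us": 1197, "measured_rps": 13206},
    "large (1M BULK_PUT)": {"avg_latency_us": 6212, "measured_rps": 2554},
}
MAX_INFLIGHT = 16  # --max-inflight-rpcs used in both runs

for name, r in runs.items():
    # Little's law: concurrency = throughput * latency
    predicted = MAX_INFLIGHT / (r["avg_latency_us"] * 1e-6)
    err = abs(predicted - r["measured_rps"]) / r["measured_rps"]
    print(f"{name}: predicted {predicted:.0f} RPC/s, "
          f"measured {r['measured_rps']}, error {err:.1%}")
```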