Tip of master, commit cf89d9f3cc2ec5e5fbc8c33ea9f51e83ac23429a98cd53e0a273885324261d5f38c23a80be84f09e

After running tip of master, reran a few tests with OFI updated to commit 99e333426b64d7d227fd604731235ffc14862662 to pull in some PSM2 fixes.

All tests were run with ofi+psm2 over ib0.

...

  • -mpcCAeoRd - PASS
  • -i - FAIL, still rebuilding on IO27 after 10 minutes
      Appears to be DAOS-1207 (DaosTestMulti issues in CI, OPEN)
    • JIRA: DAOS-1289
  • -r - FAIL
    • Looks to be the same issue as -i above; still rebuilding after 10 minutes
  • -O - PASS

daos_perf

1K Records

CREDITS=1

[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi -np 1 -quiet --hostfile ~/scripts/host.cli.1 --ompi-server file:~/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 1 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	verify fetch  : no
	VOS file      : <NULL>
fe9c1051: rank 1 became pool service leader 0
Started...
update successfully completed:
	duration : 5.522145   sec
	bandwith : 35.369     MB/sec
	rate     : 36217.81   IO/sec
	latency  : 27.611     us (nonsense if credits > 1)
Duration across processes:
	MAX duration : 5.522145   sec
	MIN duration : 5.522145   sec
	Average duration : 5.522145   sec
fe9c1051: rank 1 no longer pool service leader 0
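As a quick sanity check, the CREDITS=1 summary can be recomputed from the command-line parameters. This is a sketch that assumes each akey/recx pair is one 1024-byte update and that daos_perf's "MB/sec" means 2^20 bytes per second; neither assumption is stated by the tool itself.

```python
# Recompute the daos_perf CREDITS=1 (1K records) summary from its inputs.
# Assumptions (not confirmed by the tool output): every akey/recx pair is one
# update of VALUE_SIZE bytes, and "MB/sec" means 2**20 bytes per second.

AKEYS_PER_DKEY = 200    # -a 200
RECX_PER_AKEY = 1000    # -r 1000
VALUE_SIZE = 1024       # -s 1K, in bytes
DURATION_S = 5.522145   # reported update duration


def daos_perf_summary(akeys, recx, value_size, duration_s):
    """Return (total IOs, IO/sec, MiB/sec, per-IO latency in us)."""
    ios = akeys * recx
    rate = ios / duration_s
    mib_per_s = rate * value_size / 2**20
    # Only meaningful with a single credit, where I/O is effectively synchronous.
    latency_us = 1e6 / rate
    return ios, rate, mib_per_s, latency_us


ios, rate, bw, lat = daos_perf_summary(AKEYS_PER_DKEY, RECX_PER_AKEY, VALUE_SIZE, DURATION_S)
print(f"{ios} IOs, {rate:.2f} IO/sec, {bw:.3f} MB/sec, {lat:.3f} us")
```

With these inputs the sketch lands on the reported 36217.81 IO/sec, 35.369 MB/sec and 27.611 us, which suggests the assumptions hold for this run.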

CREDITS=8

  • Hitting a segfault (CART-496)

4K Records

CREDITS=1

  • Hitting a segfault (CART-496)

IOR, 10GB pool, data verification enabled

[sdwillso@boro-4 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Mon Aug 27 18:00:54 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 0d89ddc3-ae02-4df5-a071-2423bb3a32b9 -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec

Test 0 started: Mon Aug 27 18:00:54 2018
Path: /home/sdwillso
FS: 3.8 TiB   Used FS: 13.4%   Inodes: 250.0 Mi   Used Inodes: 2.7%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
	api                = DAOS
	test filename      = 0d89ddc3-ae02-4df5-a071-2423bb3a32b9
	access             = single-shared-file, independent
	pattern            = segmented (1 segment)
	ordering in a file = sequential offsets
	ordering inter file= no tasks offsets
	clients            = 1 (1 per node)
	repetitions        = 5
	xfersize           = 1 MiB
	blocksize          = 5 GiB
	aggregate filesize = 5 GiB

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
Commencing write performance test: Mon Aug 27 18:00:55 2018
write     2553.80    5242880    1024.00    0.044146   1.93       0.026871   2.00       0   
Verifying contents of the file(s) just written.
Mon Aug 27 18:00:57 2018

remove    -          -          -          -          -          -          0.044487   0   
Commencing write performance test: Mon Aug 27 18:01:03 2018
write     2592.04    5242880    1024.00    0.035451   1.92       0.024715   1.98       1   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:05 2018

remove    -          -          -          -          -          -          0.045178   1   
Commencing write performance test: Mon Aug 27 18:01:12 2018
write     2593.84    5242880    1024.00    0.035656   1.91       0.025228   1.97       2   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:14 2018

remove    -          -          -          -          -          -          0.043000   2   
Commencing write performance test: Mon Aug 27 18:01:20 2018
write     2595.09    5242880    1024.00    0.036727   1.91       0.024770   1.97       3   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:22 2018

remove    -          -          -          -          -          -          0.044975   3   
Commencing write performance test: Mon Aug 27 18:01:28 2018
write     2590.57    5242880    1024.00    0.035836   1.92       0.025285   1.98       4   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:30 2018

remove    -          -          -          -          -          -          0.043984   4   

Max Write: 2595.09 MiB/sec (2721.15 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        2595.09    2553.80    2585.07      15.71    1.98068 0 1 1 5 0 0 1 0 0 1 5368709120 1048576 5368709120 DAOS 0

Finished: Mon Aug 27 18:01:39 2018
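The "Summary of all tests" row can be rebuilt from the five per-iteration write bandwidths in the table. A small sketch using Python's statistics module; judging by these numbers, IOR's StdDev column appears to be the population standard deviation (divide by N), which is an inference from this run, not something the output states.

```python
import statistics

# Per-iteration write bandwidth (MiB/s) copied from the IOR results table.
write_bw_mib = [2553.80, 2592.04, 2593.84, 2595.09, 2590.57]


def summarize(samples):
    """Return (max, min, mean, population standard deviation)."""
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)  # divide by N, not N - 1
    return max(samples), min(samples), mean, stdev


mx, mn, mean, sd = summarize(write_bw_mib)
max_mb_per_s = mx * 2**20 / 10**6  # MiB/s -> MB/s, as in "Max Write"
print(f"Max {mx:.2f}  Min {mn:.2f}  Mean {mean:.2f}  StdDev {sd:.2f}  ({max_mb_per_s:.2f} MB/sec)")
```

This reproduces Mean 2585.07, StdDev 15.71 and the 2721.15 MB/sec figure; the sample standard deviation (N - 1) would give roughly 17.56 instead.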

daosbench

kv-idx-update

  • At the end of this test with multiple servers, container destroy fails
    • JIRA: DAOS-1243
Time: 105.668696 seconds (9463.540644 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-idx-update --testid=1 --svc=1 --dpool=099fde5e-e164-4e0f-b1be-bc7130a652b9 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:25:30 2018
=================================
===============================
Test Setup
---------------
Test: kv-idx-update
DAOS pool :099fde5e-e164-4e0f-b1be-bc7130a652b9
DAOS container :d17f64ae-2c6c-48a5-a3b5-0733424c0384
Value buffer size: 64
Number of processes: 1
Number of indexes/process: 1000000
Number of asynchronous I/O: 32
===============================
kv-idx-update
Time: 105.668696 seconds (9463.540644 ops per second)
daosbench:0:src/tests/daosbench.c:765: Unknown error 2001: Container destroy failed
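The ops/sec figure that daosbench prints looks like the total operation count divided by elapsed time. A minimal sketch under that assumption, also applied to the 100-key kv-dkey-update run further down the page; because the elapsed time there is printed at microsecond resolution, recomputing from the printed value only approximates the logged rate.

```python
# Recompute daosbench's ops/sec figure from the operation count and elapsed time.
# Assumption: ops/sec = (processes * operations per process) / elapsed seconds.


def ops_per_second(processes, ops_per_process, elapsed_s):
    return processes * ops_per_process / elapsed_s


# kv-idx-update: 1 process, 1000000 indexes, 105.668696 s reported.
idx_rate = ops_per_second(1, 1_000_000, 105.668696)

# kv-dkey-update: 1 process, 100 keys, 0.008301 s reported (rounded to 1 us),
# so this is only close to the logged 12047.253463 ops/sec.
dkey_rate = ops_per_second(1, 100, 0.008301)

print(f"kv-idx-update:  {idx_rate:.2f} ops/sec")
print(f"kv-dkey-update: {dkey_rate:.2f} ops/sec")
```

For the long kv-idx-update run the rounding is negligible; for the sub-millisecond key tests it shifts the recomputed rate by a fraction of a percent.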

kv-dkey-update

Time: 0.008301 seconds (12047.253463 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=15d0097a-c03b-4272-8d60-b4e7cd11544e --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:29:54 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :15d0097a-c03b-4272-8d60-b4e7cd11544e
DAOS container :44c3bcea-1056-418f-a900-8eb899af26af
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.008301 seconds (12047.253463 ops per second)

Ended at Mon Aug 27 18:29:55 2018

kv-akey-update

Time: 0.004175 seconds (23950.921016 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=7185fa9d-f5ad-4454-90e2-3ce5187f3bd3 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:31:54 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :7185fa9d-f5ad-4454-90e2-3ce5187f3bd3
DAOS container :9da1b295-f055-4b0b-ac63-9702daa30715
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.004175 seconds (23950.921016 ops per second)

Ended at Mon Aug 27 18:31:55 2018

kv-dkey-fetch

Time: 0.000553 seconds (180706.206748 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=64a6e942-4bb2-4b50-a95e-58086a17bfab --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:33:46 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :64a6e942-4bb2-4b50-a95e-58086a17bfab
DAOS container :b511c4d7-aab0-4700-970f-eadc840806d1
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.000553 seconds (180706.206748 ops per second)

Ended at Mon Aug 27 18:33:47 2018

kv-akey-fetch

Time: 0.001576 seconds (63464.794813 ops per second)
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=7e97c5fb-196e-4851-9c88-069349d8ce6e --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:35:29 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :7e97c5fb-196e-4851-9c88-069349d8ce6e
DAOS container :932946ad-9bf1-4309-b14b-96ca699ee72c
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.001576 seconds (63464.794813 ops per second)

Ended at Mon Aug 27 18:35:30 2018

CaRT Self-Test

Small IO

[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       100000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.500411 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 0.00
	RPC Throughput (RPCs/sec): 199836
	RPC Latencies (us):
		Min    : 34
		25th  %: 77
		Median : 77
		75th  %: 80
		Max    : 1563
		Average: 79
		Std Dev: 15.72
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 77
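The small-IO throughput and latency numbers can be cross-checked against each other with Little's law (mean latency ≈ RPCs in flight / throughput). This sketch assumes self_test keeps all 16 RPCs outstanding for the whole run and that throughput is simply repetitions divided by the reported duration.

```python
# Cross-check the small-IO self_test numbers with Little's law:
# mean latency ~= RPCs in flight / throughput, assuming the client keeps
# max_inflight_rpcs outstanding for the whole run.

REPETITIONS = 100_000   # --repetitions 100000
DURATION_S = 0.500411   # "finished self_test duration"
MAX_INFLIGHT = 16       # --max-inflight-rpcs 16


def throughput_rpcs(repetitions, duration_s):
    return repetitions / duration_s


def littles_law_latency_us(inflight, rpcs_per_s):
    return inflight / rpcs_per_s * 1e6


tput = throughput_rpcs(REPETITIONS, DURATION_S)
est_latency = littles_law_latency_us(MAX_INFLIGHT, tput)
print(f"throughput ~{tput:.0f} RPCs/sec, estimated mean latency ~{est_latency:.0f} us")
```

This gives about 199836 RPCs/sec and roughly 80 us, close to the reported 199836 RPCs/sec and 79 us average, so the inflight window appears to stay nearly full.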

Large IO Bulk PUT

[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 1048576-BULK_PUT)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.145925 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 6852.84
	RPC Throughput (RPCs/sec): 6853
	RPC Latencies (us):
		Min    : 1372
		25th  %: 2292
		Median : 2313
		75th  %: 2336
		Max    : 4448
		Average: 2324
		Std Dev: 178.51
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 2313
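For the 1 MiB bulk tests, the reported bandwidth tracks RPC throughput almost one to one. A sketch assuming self_test's "MB/sec" means 2^20 bytes per second, so each 1048576-byte bulk transfer counts as exactly 1 "MB":

```python
# Relate the bulk PUT bandwidth to RPC throughput.
# Assumption: self_test's "MB/sec" is 2**20 bytes per second.

REPETITIONS = 1000      # --repetitions 1000
DURATION_S = 0.145925   # "finished self_test duration"
BULK_BYTES = 1_048_576  # "b1048576" in --message-sizes
MIB = 2**20


def bulk_bandwidth(repetitions, duration_s, bytes_per_rpc):
    """Return (RPCs/sec, bandwidth in MiB/sec)."""
    rpcs_per_s = repetitions / duration_s
    return rpcs_per_s, rpcs_per_s * bytes_per_rpc / MIB


rpcs, bw = bulk_bandwidth(REPETITIONS, DURATION_S, BULK_BYTES)
print(f"{rpcs:.0f} RPCs/sec, {bw:.2f} MB/sec")
```

This matches the reported 6853 RPCs/sec and 6852.84 MB/sec; the same relationship holds for the bulk GET run below (7990 RPCs/sec vs 7989.59 MB/sec).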

Large IO Bulk GET

[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(1048576-BULK_GET 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.125163 S.
##################################################
Results for message size (1048576-BULK_GET 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 7989.59
	RPC Throughput (RPCs/sec): 7990
	RPC Latencies (us):
		Min    : 518
		25th  %: 1961
		Median : 1977
		75th  %: 2004
		Max    : 3477
		Average: 1991
		Std Dev: 232.06
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 1977

mpich tests

Results: With current master, hung on the first test until it segfaulted. After updating to the OFI commit mentioned at the beginning of this page, hit DAOS-1290.