...

Tip of master, commit 98cd53e0a273885324261d5f38c23a80be84f09e

After running tip of master, reran a few tests with OFI updated to commit 99e333426b64d7d227fd604731235ffc14862662 to pull in some PSM2 fixes.

All tests were run with ofi+psm2 on ib0.

...

  • -mpcCAeoRd - PASS
  • -i - FAIL, still rebuilding on IO27 after 10 minutes
    • Jira: DAOS-1289
  • -r - FAIL
    • Looks to be the same as -i above; still rebuilding after 10 minutes
  • -O - PASS

...

Code Block
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi -np 1 -quiet --hostfile ~/scripts/host.cli.1 --ompi-server file:~/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 1 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	verify fetch  : no
	VOS file      : <NULL>
fe9c1051: rank 1 became pool service leader 0
Started...
update successfully completed:
	duration : 5.522145   sec
	bandwith : 35.369     MB/sec
	rate     : 36217.81   IO/sec
	latency  : 27.611     us (nonsense if credits > 1)
Duration across processes:
	MAX duration : 5.522145   sec
	MIN duration : 5.522145   sec
	Average duration : 5.522145   sec
fe9c1051: rank 1 no longer pool service leader 0
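As a sanity check, the daos_perf figures above can be recomputed from the run parameters. The helper below is hypothetical (not part of daos_perf) and assumes that the total I/O count is objects x dkeys x akeys x recx (1 x 1 x 200 x 1000 here), that "MB" in the output means MiB (2^20 bytes), and that latency is just the inverse of the rate, which only holds with credits = 1. Under those assumptions the results agree with the reported 35.369 MB/sec, 36217.81 IO/sec and 27.611 us.

```python
# Hypothetical sanity check for the daos_perf run above (not part of DAOS).
# Assumes: total I/Os = obj x dkey x akey x recx, "MB" = MiB, credits = 1.

def daos_perf_metrics(objs, dkeys, akeys, recx, value_size, duration_s):
    """Return (MiB/s, IO/s, latency in us) derived from the run parameters."""
    total_ios = objs * dkeys * akeys * recx
    total_bytes = total_ios * value_size
    bandwidth_mib = total_bytes / duration_s / 2**20
    rate = total_ios / duration_s
    # Inverse of the rate; only meaningful with a single credit (sync I/O).
    latency_us = 1e6 / rate
    return bandwidth_mib, rate, latency_us


# -a 200 -r 1000 -s 1K, one object, one dkey, duration from the output above.
bw, rate, lat = daos_perf_metrics(1, 1, 200, 1000, 1024, 5.522145)
print(f"bandwidth {bw:.3f} MiB/s, rate {rate:.2f} IO/s, latency {lat:.3f} us")
```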

CREDITS=8

4K Records

CREDITS=1

IOR, 10GB pool, data verification enabled

...


...

  • Hitting a segfault
    Jira: CART-496

4K Records

CREDITS=1

  • Hitting a segfault
    Jira: CART-496

IOR, 10GB pool, data verification enabled

Code Block
[sdwillso@boro-4 ~]$ orterun -np 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Mon Aug 27 18:00:54 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 0d89ddc3-ae02-4df5-a071-2423bb3a32b9 -b 5g -t 1m -O daospool=3c6381d2-b094-476e-b7ae-b1f4f7f908fe,daosrecordsize=1m,daosstripesize=1m,daosstripecount=1024,daosaios=16,daosobjectclass=LARGE,daosPoolSvc=1,daosepoch=1
Machine: Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 0.00 sec

Test 0 started: Mon Aug 27 18:00:54 2018
Path: /home/sdwillso
FS: 3.8 TiB   Used FS: 13.4%   Inodes: 250.0 Mi   Used Inodes: 2.7%
Participating tasks: 1
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA
Summary:
	api                = DAOS
	test filename      = 0d89ddc3-ae02-4df5-a071-2423bb3a32b9
	access             = single-shared-file, independent
	pattern            = segmented (1 segment)
	ordering in a file = sequential offsets
	ordering inter file= no tasks offsets
	clients            = 1 (1 per node)
	repetitions        = 5
	xfersize           = 1 MiB
	blocksize          = 5 GiB
	aggregate filesize = 5 GiB

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
Commencing write performance test: Mon Aug 27 18:00:55 2018
write     2553.80    5242880    1024.00    0.044146   1.93       0.026871   2.00       0   
Verifying contents of the file(s) just written.
Mon Aug 27 18:00:57 2018

remove    -          -          -          -          -          -          0.044487   0   
Commencing write performance test: Mon Aug 27 18:01:03 2018
write     2592.04    5242880    1024.00    0.035451   1.92       0.024715   1.98       1   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:05 2018

remove    -          -          -          -          -          -          0.045178   1   
Commencing write performance test: Mon Aug 27 18:01:12 2018
write     2593.84    5242880    1024.00    0.035656   1.91       0.025228   1.97       2   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:14 2018

remove    -          -          -          -          -          -          0.043000   2   
Commencing write performance test: Mon Aug 27 18:01:20 2018
write     2595.09    5242880    1024.00    0.036727   1.91       0.024770   1.97       3   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:22 2018

remove    -          -          -          -          -          -          0.044975   3   
Commencing write performance test: Mon Aug 27 18:01:28 2018
write     2590.57    5242880    1024.00    0.035836   1.92       0.025285   1.98       4   
Verifying contents of the file(s) just written.
Mon Aug 27 18:01:30 2018

remove    -          -          -          -          -          -          0.043984   4   

Max Write: 2595.09 MiB/sec (2721.15 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write        2595.09    2553.80    2585.07      15.71    1.98068 0 1 1 5 0 0 1 0 0 1 5368709120 1048576 5368709120 DAOS 0

Finished: Mon Aug 27 18:01:39 2018
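The "Summary of all tests" row can be reproduced from the five per-iteration write bandwidths. This is a minimal sketch, assuming IOR reports the population standard deviation (dividing by n, which is what reproduces 15.71 here rather than the sample form) and that the MB/sec value in "Max Write" is MiB/sec x 2^20 / 10^6.

```python
import statistics

# Per-iteration write bandwidths (MiB/s) copied from the IOR output above.
write_bw_mib = [2553.80, 2592.04, 2593.84, 2595.09, 2590.57]

max_bw = max(write_bw_mib)
min_bw = min(write_bw_mib)
mean_bw = statistics.mean(write_bw_mib)
# Population standard deviation (divide by n), not the sample (n - 1) form.
std_bw = statistics.pstdev(write_bw_mib)

# Convert MiB/s to MB/s: 1 MiB = 2**20 bytes, 1 MB = 10**6 bytes.
max_bw_mb = max_bw * 2**20 / 10**6

print(f"max {max_bw:.2f} min {min_bw:.2f} mean {mean_bw:.2f} std {std_bw:.2f}")
print(f"max write {max_bw_mb:.2f} MB/s")
```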

...

  • At the end of this test with multiple servers, container destroy fails

...

    • Jira: DAOS-1243
Time: 105.668696 seconds (9463.540644 ops per second)

...

Code Block
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 -np 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=7e97c5fb-196e-4851-9c88-069349d8ce6e --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Aug 27 18:35:29 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :7e97c5fb-196e-4851-9c88-069349d8ce6e
DAOS container :932946ad-9bf1-4309-b14b-96ca699ee72c
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.001576 seconds (63464.794813 ops per second)

Ended at Mon Aug 27 18:35:30 2018
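daosbench's ops/sec figure is operations divided by elapsed time. A minimal check, assuming one fetch per key: both the 1,000,000-key run recorded earlier on this page (105.668696 s) and the run above reproduce the printed rates. The run above printed "Number of keys/process: 100" even though --indexes=1000000 was passed, so the check uses the printed key count; its printed time is rounded, so the match is only approximate. The helper name is hypothetical.

```python
# Hypothetical check of daosbench throughput: ops/sec = operations / seconds.
# Assumes one fetch operation per key, as the printed numbers suggest.

def ops_per_second(num_ops, seconds):
    return num_ops / seconds


# 1,000,000-key run reported earlier on this page.
large_run = ops_per_second(1_000_000, 105.668696)
# Run above: the tool printed 100 keys/process and a rounded 0.001576 s.
small_run = ops_per_second(100, 0.001576)

print(f"{large_run:.2f} ops/s, {small_run:.2f} ops/s")
```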

CaRT Self-Test

Small IO

...

Code Block
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       100000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.500411 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 0.00
	RPC Throughput (RPCs/sec): 199836
	RPC Latencies (us):
		Min    : 34
		25th  %: 77
		Median : 77
		75th  %: 80
		Max    : 1563
		Average: 79
		Std Dev: 15.72
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 77
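The self_test throughput above equals the repetition count divided by the reported duration (100000 / 0.500411 s, about 199836 RPCs/s). With a fixed window of in-flight RPCs, Little's law (throughput = concurrency / average latency) gives a rough cross-check. This sketch treats --max-inflight-rpcs as the steady-state concurrency, which is an assumption; the helper names are hypothetical.

```python
# Hypothetical cross-check of CaRT self_test numbers (not part of self_test).

def throughput_from_duration(repetitions, duration_s):
    """RPCs per second as repetitions over wall-clock duration."""
    return repetitions / duration_s


def littles_law_throughput(inflight, avg_latency_us):
    """Throughput implied by Little's law, assuming the window stays full."""
    return inflight / (avg_latency_us * 1e-6)


# --repetitions 100000, duration and average latency from the output above.
measured = throughput_from_duration(100_000, 0.500411)
estimated = littles_law_throughput(16, 79)

print(f"measured {measured:.0f} RPCs/s, Little's law estimate {estimated:.0f} RPCs/s")
```

The two agree to within about 1.5% here, consistent with the window of 16 RPCs staying mostly full.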

Large IO Bulk PUT

Code Block
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 1048576-BULK_PUT)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.145925 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 6852.84
	RPC Throughput (RPCs/sec): 6853
	RPC Latencies (us):
		Min    : 1372
		25th  %: 2292
		Median : 2313
		75th  %: 2336
		Max    : 4448
		Average: 2324
		Std Dev: 178.51
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 2313
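For the bulk runs, the reported "RPC Bandwidth (MB/sec)" matches the RPC rate multiplied by the 1 MiB (1048576-byte) bulk size. That suggests self_test's "MB" here means MiB (2^20 bytes); this is an inference from these numbers, not something the output states. A minimal sketch with a hypothetical helper:

```python
# Hypothetical check: bulk bandwidth = (repetitions / duration) x payload size.
# Assumes the reported "MB/sec" is MiB/s, which is what the numbers imply.

MIB = 2**20


def bulk_bandwidth_mib(repetitions, duration_s, payload_bytes):
    rpcs_per_sec = repetitions / duration_s
    return rpcs_per_sec * payload_bytes / MIB


# --repetitions 1000, b1048576 payload, duration from the Bulk PUT output above.
put_bw = bulk_bandwidth_mib(1000, 0.145925, 1048576)
print(f"bulk PUT: {put_bw:.2f} MiB/s")
```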

Large IO Bulk GET

Code Block
[sdwillso@boro-4 ~]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(1048576-BULK_GET 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.125163 S.
##################################################
Results for message size (1048576-BULK_GET 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 7989.59
	RPC Throughput (RPCs/sec): 7990
	RPC Latencies (us):
		Min    : 518
		25th  %: 1961
		Median : 1977
		75th  %: 2004
		Max    : 3477
		Average: 1991
		Std Dev: 232.06
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 1977
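Comparing the self_test runs by hand is error-prone, so a small parser can pull the summary fields out of captured output. This is a hypothetical helper written only against the output format shown on this page; the dictionary keys are made up for illustration, and other self_test versions may print different fields.

```python
import re

# Hypothetical parser for CaRT self_test result text (format copied from the
# output above; other self_test versions may print different fields).
FIELDS = {
    "bandwidth_mb_s": r"RPC Bandwidth \(MB/sec\):\s*([\d.]+)",
    "throughput_rpc_s": r"RPC Throughput \(RPCs/sec\):\s*([\d.]+)",
    "min_us": r"Min\s*:\s*([\d.]+)",
    "median_us": r"Median\s*:\s*([\d.]+)",
    "max_us": r"Max\s*:\s*([\d.]+)",
    "avg_us": r"Average:\s*([\d.]+)",
}


def parse_self_test(text):
    """Return the first match for each field as a float (missing -> None)."""
    result = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        result[name] = float(match.group(1)) if match else None
    return result


# Excerpt of the Bulk GET results above.
SAMPLE = """
	RPC Bandwidth (MB/sec): 7989.59
	RPC Throughput (RPCs/sec): 7990
	RPC Latencies (us):
		Min    : 518
		25th  %: 1961
		Median : 1977
		75th  %: 2004
		Max    : 3477
		Average: 1991
"""

stats = parse_self_test(SAMPLE)
print(stats)
```

The `Max\s*:` and `Median\s*:` patterns require the colon right after the label, so they skip lines such as "Max inflight RPCs:" and "Median Latency (us)" in the full output.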

mpich tests

Results: with current master, hangs on the first test until it segfaults. After updating to the OFI commit mentioned at the beginning of this page, hit the following:

Jira: DAOS-1290