Tip of master, commit 2062dde6a1814da14eab74363eb23fad0ea3e66b

All tests were run with ofi+psm2 over ib0.

daos_test: Run with 8 servers (boro-[4-11]) and 2 clients (boro-[12-13]). Servers were killed and /mnt/daos was cleaned between the runs listed below.

Tests that require a pool created via dmg used a 4GB pool, with boro-12 as the client.

mpich tests used boro-4 as server, boro-12 as client, with a 1GB pool.

Test Results

daos_test

Separate runs with cleanup in between:

  • -mpcCAeoRd - PASS
  • -i - FAIL, still rebuilding on IO27 after 10 minutes
  • -r - same as -i, still rebuilding after 10 minutes
  • -O - PASS

daos_perf

1K Records

CREDITS=1

[sdwillso@boro-4 ior]$ orterun --mca mtl ^psm2,ofi -N 1 -quiet --hostfile ~/scripts/host.cli.1 --ompi-server file:~/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log daos_perf -T daos -P 2G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
Test :
	DAOS (full stack)
Parameters :
	pool size     : 2048 MB
	credits       : 1 (sync I/O for -ve)
	obj_per_cont  : 1 x 2 (procs)
	dkey_per_obj  : 1
	akey_per_dkey : 200
	recx_per_akey : 1000
	value type    : single
	value size    : 1024
	zero copy     : yes
	overwrite     : yes
	verify fetch  : no
	VOS file      : <NULL>
d3bda290: rank 1 became pool service leader 0
Started...
update successfully completed:
	duration : 96.823832  sec
	bandwith : 4.034      MB/sec
	rate     : 4131.21    IO/sec
	latency  : 242.060    us (nonsense if credits > 1)
Duration across processes:
	MAX duration : 96.823226  sec
	MIN duration : 90.260579  sec
	Average duration : 93.541903  sec
d3bda290: rank 1 no longer pool service leader 0
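
The summary figures in this run can be cross-checked from the printed parameters. The sketch below is only a sanity check, not daos_perf's own accounting: it assumes the total I/O count is the product of the listed parameters and that "MB" means 2**20 bytes. Under those assumptions it reproduces the reported rate, bandwidth, and latency to the printed precision.

```python
# Cross-check of the daos_perf summary using only values printed in the log.
# Assumptions (not confirmed by the tool's source): total I/Os is the product
# of the listed parameters, and "MB" means 2**20 bytes.
procs = 2             # "obj_per_cont : 1 x 2 (procs)"
obj_per_cont = 1
dkey_per_obj = 1
akey_per_dkey = 200
recx_per_akey = 1000
value_size = 1024     # bytes per record
duration = 96.823832  # seconds, from the "duration" line

total_ios = procs * obj_per_cont * dkey_per_obj * akey_per_dkey * recx_per_akey
rate = total_ios / duration                    # IO/sec
bandwidth = rate * value_size / (1024 * 1024)  # MB/sec (binary megabytes)
latency_us = duration / total_ios * 1e6        # mean time per IO with 1 credit

print(f"total IOs : {total_ios}")
print(f"rate      : {rate:.2f} IO/sec")
print(f"bandwidth : {bandwidth:.3f} MB/sec")
print(f"latency   : {latency_us:.3f} us")
```

With credits = 1 the latency is simply the inverse of the rate, which is why the log marks it as meaningless for credits > 1.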

CREDITS=8

4K Records

CREDITS=1

IOR, 10GB pool, data verification enabled

After updating to the latest IOR, the run fails with this error: 'daosStripeSize' must be a multiple of 'transferSize'

However, the run uses -t 1m and -s 1m, which should satisfy that rule as far as I understand. Working with Mohamad to determine whether there is a bug or whether the usage is incorrect.
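
The rule named in the error message can be checked by hand. The sketch below is a hypothetical illustration, not IOR's own parsing or validation code: `parse_size` and `stripe_ok` are made-up helpers, and binary units (1m = 2**20 bytes) are assumed. If both values are read as 1m, the rule holds, which suggests the sizes IOR actually compared may differ from the ones on the command line.

```python
# Hypothetical helpers; this is not IOR's own parsing or validation code.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Convert strings such as '1m' or '5g' to bytes (binary units assumed)."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def stripe_ok(stripe: str, transfer: str) -> bool:
    """True if the stripe size is a positive multiple of the transfer size."""
    s, t = parse_size(stripe), parse_size(transfer)
    return t > 0 and s >= t and s % t == 0


print(stripe_ok("1m", "1m"))    # the values used in this run
print(stripe_ok("512k", "1m"))  # a stripe smaller than the transfer size
```

The first call prints True and the second prints False.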

[sdwillso@boro-4 ior]$ orterun -x FI_PSM2_DISCONNECT=1 -N 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi  --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -- -p 36c72724-1d89-4006-b865-255b35e1b82e -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE -e 1
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

non-option argument: -p
non-option argument: 36c72724-1d89-4006-b865-255b35e1b82e
non-option argument: -v
non-option argument: 1
non-option argument: -r
non-option argument: 1m
non-option argument: -s
non-option argument: 1m
non-option argument: -c
non-option argument: 1024
non-option argument: -a
non-option argument: 16
non-option argument: -o
non-option argument: LARGE
non-option argument: -e
non-option argument: 1
non-option argument: -p
non-option argument: 36c72724-1d89-4006-b865-255b35e1b82e
non-option argument: -v
non-option argument: 1
non-option argument: -r
non-option argument: 1m
non-option argument: -s
non-option argument: 1m
non-option argument: -c
non-option argument: 1024
non-option argument: -a
non-option argument: 16
non-option argument: -o
non-option argument: LARGE
non-option argument: -e
non-option argument: 1
Began: Mon Sep 10 17:07:16 2018
Command line used: ior -v -W -i 5 -a DAOS -w -o 1a607f0c-149c-4f95-af0d-339f1365b78d -b 5g -t 1m -- -p 36c72724-1d89-4006-b865-255b35e1b82e -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE -e 1
Machine: Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 10554382.37 sec

Test 0 started: Mon Sep 10 17:07:16 2018
Path: /home/sdwillso/ior
FS: 3.8 TiB   Used FS: 14.8%   Inodes: 250.0 Mi   Used Inodes: 2.8%
Participating tasks: 2
'daosStripeSize' must be a multiple of 'transferSize'
'daosStripeSize' must be a multiple of 'transferSize'
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[boro-4.boro.hpdd.intel.com:142056] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[boro-4.boro.hpdd.intel.com:142056] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

daosbench

kv-idx-update

kv-dkey-update

Time: 0.219707 seconds (910.301226 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=37b4b797-73d3-48a2-8a6d-e21d4e320191 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:45:41 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :37b4b797-73d3-48a2-8a6d-e21d4e320191
DAOS container :a1539b57-be18-4394-9747-b92d4bb217dd
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.219707 seconds (910.301226 ops per second)

Ended at Mon Sep 10 17:45:42 2018

kv-akey-update

Time: 0.126152 seconds (1585.386416 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=54728449-42a3-4d8c-b260-62b2da78934a --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:47:35 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :54728449-42a3-4d8c-b260-62b2da78934a
DAOS container :de4e0596-9df8-4931-929b-223e1c4a85eb
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.126152 seconds (1585.386416 ops per second)

Ended at Mon Sep 10 17:47:37 2018

kv-dkey-fetch

Time: 0.188764 seconds (1059.522366 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=04d536fd-e84a-4b5c-beae-5e63312a948c --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:49:16 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :04d536fd-e84a-4b5c-beae-5e63312a948c
DAOS container :b6cf04f9-5c25-4f03-8436-7366034b8194
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.188764 seconds (1059.522366 ops per second)

Ended at Mon Sep 10 17:49:17 2018

kv-akey-fetch

Time: 0.079754 seconds (2507.708677 ops per second)
[sdwillso@boro-4 ior]$ orterun -N 1 --mca mtl ^psm2,ofi  --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=daf9c1f4-0b2b-4554-b517-66fced0723f0 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Mon Sep 10 17:50:49 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :daf9c1f4-0b2b-4554-b517-66fced0723f0
DAOS container :4a01e667-0056-4b60-80b7-78d095ba3e2b
Value buffer size: 64
Number of processes: 2
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.079754 seconds (2507.708677 ops per second)

Ended at Mon Sep 10 17:50:50 2018
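
The reported rates for the four daosbench runs are consistent with 200 operations per run (2 processes x 100 keys per process, as printed under Test Setup), even though the command lines pass --indexes=1000000. The sketch below recomputes the rates from the logged times; treating total operations as processes x keys per process is an assumption inferred from the numbers, not taken from the daosbench source.

```python
# (test name, elapsed seconds) copied from the daosbench output above.
runs = [
    ("kv-dkey-update", 0.219707),
    ("kv-akey-update", 0.126152),
    ("kv-dkey-fetch", 0.188764),
    ("kv-akey-fetch", 0.079754),
]
processes = 2           # "Number of processes"
keys_per_process = 100  # "Number of keys/process"
total_ops = processes * keys_per_process  # assumed operation count per run

# Operations per second for each run, keyed by test name.
rates = {name: total_ops / seconds for name, seconds in runs}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} ops per second")
```

Each computed rate agrees with the logged value to within the rounding of the printed time.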

CaRT Self-Test

Small IO

[sdwillso@boro-4 ior]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 0-EMPTY)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       100000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 19.785263 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 0.00
	RPC Throughput (RPCs/sec): 5054
	RPC Latencies (us):
		Min    : 1086
		25th  %: 3039
		Median : 3067
		75th  %: 3094
		Max    : 329750
		Average: 3133
		Std Dev: 4140.93
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 3067

Large IO Bulk PUT

[sdwillso@boro-4 ior]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(0-EMPTY 1048576-BULK_PUT)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.340525 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 2936.65
	RPC Throughput (RPCs/sec): 2937
	RPC Latencies (us):
		Min    : 2239
		25th  %: 5376
		Median : 5400
		75th  %: 5421
		Max    : 6376
		Average: 5382
		Std Dev: 254.28
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 5400

Large IO Bulk GET

[sdwillso@boro-4 ior]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints:                1
  Message sizes:              [(1048576-BULK_GET 1048576-BULK_PUT)]
  Buffer addresses end with:  <Default>
  Repetitions per size:       1000
  Max inflight RPCs:          16

host boro-4.boro.hpdd.intel.com finished self_test duration 0.579903 S.
##################################################
Results for message size (1048576-BULK_GET 1048576-BULK_PUT) (max_inflight_rpcs = 16):

Master Endpoint 0:0
-------------------
	RPC Bandwidth (MB/sec): 3448.85
	RPC Throughput (RPCs/sec): 1724
	RPC Latencies (us):
		Min    : 6855
		25th  %: 8634
		Median : 8676
		75th  %: 8745
		Max    : 14535
		Average: 9219
		Std Dev: 1704.53
	RPC Failures: 0

	Endpoint results (rank:tag - Median Latency (us)):
		0:0 - 8675
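
The throughput and bandwidth figures in the three self_test runs follow from the repetition count, the run duration, and the bytes moved per RPC. The sketch below is a cross-check under two assumptions: "MB" means 2**20 bytes, and the bulk GET run moves 2 MiB per RPC because its log lists the message size as (1048576-BULK_GET 1048576-BULK_PUT).

```python
MIB = 1024 * 1024

# (label, repetitions, duration in seconds, bytes moved per RPC) from the logs.
runs = [
    ("small IO", 100000, 19.785263, 0),
    ("bulk PUT", 1000, 0.340525, 1048576),
    ("bulk GET", 1000, 0.579903, 2 * 1048576),  # 1 MiB GET + 1 MiB PUT as logged
]

results = {}
for label, reps, duration, nbytes in runs:
    throughput = reps / duration           # RPCs/sec
    bandwidth = throughput * nbytes / MIB  # MB/sec, assuming MB = 2**20 bytes
    results[label] = (throughput, bandwidth)
    print(f"{label}: {throughput:.0f} RPCs/sec, {bandwidth:.2f} MB/sec")
```

The bulk GET run reports more bandwidth than the bulk PUT run despite its lower RPC rate, because each RPC carries twice the payload under this reading of the log.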

mpich tests

Results: Hangs at the first test; this is due to a known issue.
