10-31-18
- Stephen Willson
- Jelon Anderson
Tip of master, commit b20ff0ca1f5ddbfbecf2e6ffe89bd1a42a1d7043
All tests were run with ofi+psm2 on interface ib0.
daos_test: Run with 8 servers (boro-[4-11]) and 2 clients (boro-12, boro-16). Servers were killed and /mnt/daos was cleaned between each of the runs listed below.
Tests requiring a pool created via dmg used a 4GB pool, with boro-12 as the client.
MPICH tests used boro-4 as the server and boro-12 as the client, with a 1GB pool.
Tests used 8 xstreams per server this time, because there is a bug with the 36 xstreams I normally run with.
Test Results
daos_test
Separate runs with cleanup in between:
- -mpcCAeoRd - PASS
- -r - FAIL
- Appears to be DAOS-1556
- -i - FAIL
- DAOS-1685
daosperf
1K Records
CREDITS=1
[sdwillso@boro-4 ~]$ orterun -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi -np 1 -quiet --hostfile ~/scripts/host.cli.1 --ompi-server file:~/scripts/uri.txt -x DD_SUBSYS= -x DD_MASK= -x D_LOG_FILE=/tmp/daos_perf.log daos_perf -T daos -P 4G -d 1 -a 200 -r 1000 -s 1K -C 1 -t -z
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'openmpi-x86_64'
Test :
  DAOS (full stack)
Parameters :
  pool size : 4096 MB
  credits : 1 (sync I/O for -ve)
  obj_per_cont : 1 x 1 (procs)
  dkey_per_obj : 1
  akey_per_dkey : 200
  recx_per_akey : 1000
  value type : single
  value size : 1024
  zero copy : yes
  overwrite : yes
  verify fetch : no
  VOS file : <NULL>
3e9214af: rank 1 became pool service leader 0
Started...
update successfully completed:
  duration : 4.446848 sec
  bandwith : 43.922 MB/sec
  rate : 44975.68 IO/sec
  latency : 22.234 us (nonsense if credits > 1)
Duration across processes:
MAX duration : 4.446848 sec
MIN duration : 4.446848 sec
Average duration : 4.446848 sec
3e9214af: rank 1 no longer pool service leader 0
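As a sanity check, the derived daos_perf metrics above can be recomputed from the run parameters. This sketch assumes the total I/O count is objects x dkeys x akeys x records per akey, and that the tool's "MB/sec" is really MiB/s (1 MiB = 1,048,576 bytes); under those assumptions it reproduces the logged rate, bandwidth, and latency for this CREDITS=1 run. The function name daos_perf_metrics is hypothetical and not part of DAOS.

```python
# Sketch: recompute daos_perf's derived metrics from its parameters.
# Assumptions (not confirmed by the DAOS sources):
#   - total I/Os = objects * dkeys * akeys * records per akey
#   - the "MB/sec" figure in the log is actually MiB/s

MIB = 1024 * 1024  # bytes


def daos_perf_metrics(objs, dkeys, akeys, recx, value_size, duration_s):
    """Return (total I/Os, I/Os per second, MiB per second, latency in us)."""
    ios = objs * dkeys * akeys * recx
    rate = ios / duration_s
    bandwidth = ios * value_size / duration_s / MIB
    # With credits=1 only one I/O is outstanding, so latency is 1 / rate.
    latency_us = 1e6 / rate
    return ios, rate, bandwidth, latency_us


if __name__ == "__main__":
    # Parameters from the 1K records, CREDITS=1 run: -a 200 -r 1000 -s 1K.
    ios, rate, bw, lat = daos_perf_metrics(1, 1, 200, 1000, 1024, 4.446848)
    print(f"I/Os: {ios}")
    print(f"rate: {rate:.2f} IO/sec")
    print(f"bandwidth: {bw:.3f} MiB/sec")
    print(f"latency: {lat:.3f} us")
```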
CREDITS=8
CART-496
4K Records
CREDITS=1
CART-496
IOR, 40GB pool, data verification enabled
[sdwillso@boro-4 daos_m]$ orterun --mca mtl ^psm2,ofi -np 1 --ompi-server file:~/scripts/uri.txt dmg create --size=40G
3675153f: rank 1 became pool service leader 0
3675153f-7d2c-48d9-9b19-0da8cebb2b18 1
[sdwillso@boro-4 daos_m]$ orterun -x FI_PSM2_DISCONNECT=1 -N 1 --hostfile ~/hostlists/daos_client_hostlist --mca mtl ^psm2,ofi --ompi-server file:~/scripts/uri.txt ior -v -W -i 5 -a DAOS -w -o `uuidgen` -b 5g -t 1m -- -p 3675153f-7d2c-48d9-9b19-0da8cebb2b18 -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE
ior WARNING: assuming POSIX-based backend for DAOS statfs call.
ior WARNING: assuming POSIX-based backend for DAOS mkdir call.
ior WARNING: assuming POSIX-based backend for DAOS rmdir call.
ior WARNING: assuming POSIX-based backend for DAOS access call.
ior WARNING: assuming POSIX-based backend for DAOS stat call.
ior WARNING: assuming POSIX-based backend for DAOS statfs call.
ior WARNING: assuming POSIX-based backend for DAOS mkdir call.
ior WARNING: assuming POSIX-based backend for DAOS rmdir call.
ior WARNING: assuming POSIX-based backend for DAOS access call.
ior WARNING: assuming POSIX-based backend for DAOS stat call.
IOR-3.1.0: MPI Coordinated Test of Parallel I/O
Began : Wed Oct 31 22:37:44 2018
Command line : ior -v -W -i 5 -a DAOS -w -o f76eabf5-dddd-44d8-9e80-859e18b14f3e -b 5g -t 1m -- -p 3675153f-7d2c-48d9-9b19-0da8cebb2b18 -v 1 -r 1m -s 1m -c 1024 -a 16 -o LARGE
Machine : Linux boro-12.boro.hpdd.intel.com
Start time skew across all tasks: 2208081.84 sec
TestID : 0
StartTime : Wed Oct 31 22:37:44 2018
Path : /home/sdwillso/daos_m
FS : 3.8 TiB Used FS: 14.2% Inodes: 250.0 Mi Used Inodes: 3.1%
Participating tasks: 2
[0] WARNING: USING daosStripeMax CAUSES READS TO RETURN INVALID DATA

Options:
api : DAOS
apiVersion : DAOS
test filename : f76eabf5-dddd-44d8-9e80-859e18b14f3e
access : single-shared-file
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
tasks : 2
clients per node : 1
repetitions : 5
xfersize : 1 MiB
blocksize : 5 GiB
aggregate filesize : 10 GiB

Results:
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
Commencing write performance test: Wed Oct 31 22:37:45 2018
write 4542 5242880 1024.00 0.025885 2.21 0.020801 2.25 0
Verifying contents of the file(s) just written.
Wed Oct 31 22:37:47 2018
remove - - - - - - 0.029135 0
Commencing write performance test: Wed Oct 31 22:37:54 2018
write 4637 5242880 1024.00 0.023124 2.16 0.020863 2.21 1
Verifying contents of the file(s) just written.
Wed Oct 31 22:37:56 2018
remove - - - - - - 0.028687 1
Commencing write performance test: Wed Oct 31 22:38:02 2018
write 4614 5242880 1024.00 0.023486 2.18 0.020511 2.22 2
Verifying contents of the file(s) just written.
Wed Oct 31 22:38:04 2018
remove - - - - - - 0.029018 2
Commencing write performance test: Wed Oct 31 22:38:12 2018
write 4620 5242880 1024.00 0.023798 2.17 0.021097 2.22 3
Verifying contents of the file(s) just written.
Wed Oct 31 22:38:14 2018
remove - - - - - - 0.029168 3
Commencing write performance test: Wed Oct 31 22:38:21 2018
write 4608 5242880 1024.00 0.024092 2.18 0.020979 2.22 4
Verifying contents of the file(s) just written.
Wed Oct 31 22:38:23 2018
remove - - - - - - 0.028924 4
Max Write: 4636.61 MiB/sec (4861.84 MB/sec)

Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 4636.61 4541.56 4603.98 32.64 4636.61 4541.56 4603.98 32.64 2.22428 0 2 1 5 0 0 1 0 0 1 5368709120 1048576 10240.0 DAOS 0

Finished : Wed Oct 31 22:38:33 2018
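The IOR summary line can be cross-checked from the per-iteration write bandwidths. This minimal sketch assumes IOR reports the population standard deviation and converts MiB to MB with 1 MiB = 1.048576 MB. Fed the rounded bw(MiB/s) column, it gives a mean of about 4604 and a standard deviation of about 32.6, close to IOR's 4603.98 and 32.64 (IOR works from unrounded values). The helper names are hypothetical.

```python
import statistics

MIB = 1024 * 1024  # bytes
MB = 1000 * 1000   # bytes

# Rounded bw(MiB/s) column from the five write iterations above.
WRITE_BW_MIB = [4542, 4637, 4614, 4620, 4608]


def summarize(bw_mib):
    """Return (max, min, mean, population std dev) of per-iteration MiB/s."""
    return (max(bw_mib), min(bw_mib),
            statistics.mean(bw_mib), statistics.pstdev(bw_mib))


def mib_to_mb(value_mib):
    """Convert MiB/s to MB/s (1 MiB = 1.048576 MB)."""
    return value_mib * MIB / MB


if __name__ == "__main__":
    hi, lo, mean, sd = summarize(WRITE_BW_MIB)
    print(f"max {hi}, min {lo}, mean {mean:.2f}, stddev {sd:.2f} MiB/s")
    print(f"Max Write in MB/s: {mib_to_mb(4636.61):.2f}")
    # Aggregate file size: 2 tasks x 5 GiB blocksize, expressed in MiB.
    print(f"aggregate: {2 * 5 * 1024} MiB")
```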
daos_bench
kv-idx-update
Time: 399.307237 seconds (2504.337278 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-idx-update --testid=1 --svc=0 --dpool=30099e37-040e-42a9-8eb2-c8cbda0e6148 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov 1 19:25:42 2018
=================================
===============================
Test Setup
---------------
Test: kv-idx-update
DAOS pool :30099e37-040e-42a9-8eb2-c8cbda0e6148
DAOS container :ce9ffdb8-054e-4f42-a9c4-8188d38b0426
Value buffer size: 64
Number of processes: 1
Number of indexes/process: 1000000
Number of asynchronous I/O: 32
===============================
kv-idx-update
Time: 399.307237 seconds (2504.337278 ops per second)
Ended at
Thu Nov 1 19:32:26 2018
kv-dkey-update
Time: 0.088867 seconds (1125.278100 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-update --testid=1 --svc=1 --dpool=fd8ab9b3-1f49-45ad-973c-616cb453e1a9 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov 1 19:35:30 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-update
DAOS pool :fd8ab9b3-1f49-45ad-973c-616cb453e1a9
DAOS container :d3e1acdf-c922-45fe-8461-fd8e927d2604
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-update
Time: 0.088867 seconds (1125.278100 ops per second)
Ended at
Thu Nov 1 19:35:31 2018
kv-akey-update
Time: 0.068169 seconds (1466.935135 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-update --testid=1 --svc=1 --dpool=4dad8cbc-236a-4364-a7b1-dd59bb212642 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov 1 19:37:12 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-update
DAOS pool :4dad8cbc-236a-4364-a7b1-dd59bb212642
DAOS container :cac9cd5b-b74c-4826-8317-25e17085e50d
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-update
Time: 0.068169 seconds (1466.935135 ops per second)
Ended at
Thu Nov 1 19:37:12 2018
kv-dkey-fetch
Time: 0.049400 seconds (2024.283674 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-dkey-fetch --testid=1 --svc=1 --dpool=5ca6999f-15a8-41f4-86ac-9494fd891876 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov 1 19:38:33 2018
=================================
===============================
Test Setup
---------------
Test: kv-dkey-fetch
DAOS pool :5ca6999f-15a8-41f4-86ac-9494fd891876
DAOS container :b4bfc4e0-87d0-4ec7-bcd9-03e46235abf8
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-dkey-fetch
Time: 0.049400 seconds (2024.283674 ops per second)
Ended at
Thu Nov 1 19:38:33 2018
kv-akey-fetch
Time: 0.038302 seconds (2610.806612 ops per second)
[sdwillso@boro-4 ~]$ orterun -np 1 -x FI_PSM2_DISCONNECT=1 --mca mtl ^psm2,ofi --hostfile ~/hostlists/daos_client_hostlist --ompi-server file:~/scripts/uri.txt daosbench --test=kv-akey-fetch --testid=1 --svc=1 --dpool=a87c1051-bef3-4929-ba42-d3a742e532d1 --container=`uuidgen` --object-class=tiny --aios=32 --indexes=1000000
================================
DAOSBENCH (KV)
Started at
Thu Nov 1 19:39:53 2018
=================================
===============================
Test Setup
---------------
Test: kv-akey-fetch
DAOS pool :a87c1051-bef3-4929-ba42-d3a742e532d1
DAOS container :41bd8e41-5444-4323-bd72-d820c9ea7429
Value buffer size: 64
Number of processes: 1
Number of keys/process: 100
Number of asynchronous I/O: 32
===============================
kv-akey-fetch
Time: 0.038302 seconds (2610.806612 ops per second)
Ended at
Thu Nov 1 19:39:53 2018
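Each daos_bench rate above appears to be the per-process operation count from the Test Setup block divided by the reported Time. The key-level tests list "Number of keys/process: 100" even though --indexes=1000000 was passed, so their counts are 100, not 1000000. The sketch below recomputes each rate under that assumption; the remaining mismatches (well under 0.01%) are consistent with Time being printed to only six decimal places. The names RUNS and ops_per_second are hypothetical.

```python
# Sketch: check that daos_bench "ops per second" = operations / Time.
# Assumption: the operation count is the per-process count printed in
# the Test Setup block (1 process per run).

# test name: (ops per process, Time in seconds, reported ops per second)
RUNS = {
    "kv-idx-update": (1_000_000, 399.307237, 2504.337278),
    "kv-dkey-update": (100, 0.088867, 1125.278100),
    "kv-akey-update": (100, 0.068169, 1466.935135),
    "kv-dkey-fetch": (100, 0.049400, 2024.283674),
    "kv-akey-fetch": (100, 0.038302, 2610.806612),
}


def ops_per_second(ops, seconds):
    """Throughput for a single-process run."""
    return ops / seconds


def relative_error(test):
    """Relative difference between the recomputed and reported rates."""
    ops, seconds, reported = RUNS[test]
    return abs(ops_per_second(ops, seconds) - reported) / reported


if __name__ == "__main__":
    for test, (ops, seconds, reported) in RUNS.items():
        rate = ops_per_second(ops, seconds)
        print(f"{test}: {rate:.3f} ops/s (reported {reported}, "
              f"rel. error {relative_error(test):.2e})")
```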
CaRT Self-Test
Small IO
[sdwillso@boro-4 mpich]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes 0 --max-inflight-rpcs 16 --repetitions 100000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints: 1
  Message sizes: [(0-EMPTY 0-EMPTY)]
  Buffer addresses end with: <Default>
  Repetitions per size: 100000
  Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.339317 S.
##################################################
Results for message size (0-EMPTY 0-EMPTY) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
  RPC Bandwidth (MB/sec): 0.00
  RPC Throughput (RPCs/sec): 294710
  RPC Latencies (us):
    Min : 31
    25th %: 51
    Median : 52
    75th %: 52
    Max : 968
    Average: 53
    Std Dev: 8.99
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 52
Large IO Bulk PUT
[sdwillso@boro-4 mpich]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "0 b1048576" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints: 1
  Message sizes: [(0-EMPTY 1048576-BULK_PUT)]
  Buffer addresses end with: <Default>
  Repetitions per size: 1000
  Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.133766 S.
##################################################
Results for message size (0-EMPTY 1048576-BULK_PUT) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
  RPC Bandwidth (MB/sec): 7475.75
  RPC Throughput (RPCs/sec): 7476
  RPC Latencies (us):
    Min : 1013
    25th %: 2077
    Median : 2096
    75th %: 2124
    Max : 4216
    Average: 2130
    Std Dev: 284.34
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 2096
Large IO Bulk GET
[sdwillso@boro-4 mpich]$ orterun -np 1 -ompi-server file:~/scripts/uri.txt self_test --group-name daos_server --endpoint 0:0 --message-sizes "b1048576 0" --max-inflight-rpcs 16 --repetitions 1000
Adding endpoints:
  ranks: 0 (# ranks = 1)
  tags: 0 (# tags = 1)
Warning: No --master-endpoint specified; using this command line application as the master endpoint
Self Test Parameters:
  Group name to test against: daos_server
  # endpoints: 1
  Message sizes: [(1048576-BULK_GET 0-EMPTY)]
  Buffer addresses end with: <Default>
  Repetitions per size: 1000
  Max inflight RPCs: 16
host boro-4.boro.hpdd.intel.com finished self_test duration 0.116480 S.
##################################################
Results for message size (1048576-BULK_GET 0-EMPTY) (max_inflight_rpcs = 16):
Master Endpoint 0:0
-------------------
  RPC Bandwidth (MB/sec): 8585.14
  RPC Throughput (RPCs/sec): 8585
  RPC Latencies (us):
    Min : 361
    25th %: 1827
    Median : 1833
    75th %: 1901
    Max : 3450
    Average: 1853
    Std Dev: 258.15
  RPC Failures: 0
  Endpoint results (rank:tag - Median Latency (us)):
    0:0 - 1833
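The three self_test results fit together simply. Throughput is repetitions divided by duration, and bandwidth is throughput times the bulk payload. With 16 RPCs in flight, Little's law (in-flight RPCs = throughput x latency) gives a rough latency estimate of about 54, 2140, and 1864 us. These are within a few percent of the reported averages of 53, 2130, and 1853 us. The sketch assumes that "MB/sec" here means MiB/s and that the tool keeps all 16 RPCs in flight for the whole run. The function self_test_metrics is hypothetical, not part of CaRT.

```python
# Sketch: relate CaRT self_test throughput, bandwidth and latency.
# Assumptions: "MB/sec" in the output is MiB/s, and all 16 RPCs stay
# in flight for the whole run (needed for the Little's law estimate).

MIB = 1024 * 1024  # bytes


def self_test_metrics(repetitions, duration_s, bulk_bytes, inflight):
    """Return (RPCs/sec, MiB/sec, estimated latency in microseconds)."""
    throughput = repetitions / duration_s
    bandwidth = throughput * bulk_bytes / MIB
    # Little's law: concurrency = throughput * latency.
    est_latency_us = inflight / throughput * 1e6
    return throughput, bandwidth, est_latency_us


# (name, repetitions, duration in s, bulk bytes, reported average latency in us)
RUNS = [
    ("Small IO", 100000, 0.339317, 0, 53),
    ("Large IO Bulk PUT", 1000, 0.133766, 1048576, 2130),
    ("Large IO Bulk GET", 1000, 0.116480, 1048576, 1853),
]

if __name__ == "__main__":
    for name, reps, dur, size, avg_us in RUNS:
        tput, bw, est = self_test_metrics(reps, dur, size, 16)
        print(f"{name}: {tput:.0f} RPCs/sec, {bw:.2f} MiB/sec, "
              f"~{est:.0f} us estimated vs {avg_us} us reported")
```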