...
TACC system information:
Configuration:
- No NVMe
- No persistent memory
- The server will use SCM only (tmpfs, which is limited to ~90G)
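Since SCM is emulated with tmpfs, it is worth confirming the available tmpfs capacity on an allocated node before starting the server. A minimal check with standard Linux tools (the ~90G figure comes from the notes above; exact output will vary by node):

```bash
# Check the size of /dev/shm (tmpfs) that will back the emulated SCM.
df -h /dev/shm

# tmpfs defaults to half of physical RAM; with 192GB RAM that is ~96G,
# of which ~90G is used for the DAOS SCM pool per the config below.
grep MemTotal /proc/meminfo
```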
Nodes available, by queue name:

Queue Name | Node Type | Max Nodes per Job (assoc'd cores) | Max Duration | Max Jobs in Queue | Charge Rate (per node-hour)
---|---|---|---|---|---
skx-dev | SKX | 4 nodes (192 cores) | 2 hrs | 1 | 1 SU
skx-normal | SKX | 128 nodes (6,144 cores) | 48 hrs | 25 | 1 SU
skx-large | SKX | 868 nodes (41,664 cores) | 48 hrs | 3 | 1 SU

Stampede2 SKX Compute Node Specifications:
- Model: Intel Xeon Platinum 8160 ("Skylake")
- Total cores per SKX node: 48 cores on two sockets (24 cores/socket)
- Hardware threads per core: 2
- Hardware threads per node: 48 x 2 = 96
- Clock rate: 2.1GHz nominal (1.4-3.7GHz depending on instruction set and number of active cores)
- RAM: 192GB (2.67GHz) DDR4
- Cache: 32KB L1 data cache per core; 1MB L2 per core; 33MB L3 per socket. Each socket can cache up to 57MB (sum of L2 and L3 capacity).
- Local storage: 144GB /tmp partition on a 200GB SSD (size of /tmp partition as of 14 Nov 2017)
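A quick way to confirm the socket/core/thread layout on an allocated node, which is useful when choosing `targets` and `first_core` in the server config below. These are standard Linux tools, nothing DAOS-specific:

```bash
# On an SKX node the output should match the specs above:
# 2 sockets, 24 cores per socket, 2 threads per core.
lscpu | grep -E 'Socket|Core|Thread'

# NUMA layout; helps decide which cores to pin DAOS service xstreams to.
numactl --hardware
```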
Server:
```yaml
# single server instance per config file for now
servers:
  -
    targets: 16                     # confirm the number of targets
    first_core: 0                   # offset of the first core for service xstreams
    nr_xs_helpers: 1                # count of offload/helper xstreams per target
    fabric_iface: ib0               # maps to OFI_INTERFACE=ib0
    fabric_iface_port: 31416        # maps to OFI_PORT=31416
    log_mask: ERR                   # maps to D_LOG_MASK=ERR
    log_file: /tmp/daos_server.log  # maps to D_LOG_FILE=/tmp/daos_server.log
    # Environment variable values should be supplied without encapsulating quotes.
    env_vars:                       # influence DAOS I/O server behaviour by setting env variables
      - CRT_TIMEOUT=120
      - CRT_CREDIT_EP_CTX=0
      - PSM2_MULTI_EP=1
      - CRT_CTX_SHARE_ADDR=1
      - PMEMOBJ_CONF=prefault.at_open=1;prefault.at_create=1; # Do we need this?
      - PMEM_IS_PMEM_FORCE=1 # Do we need this?
    # Storage definitions.
    # When scm_class is set to ram, tmpfs will be used to emulate SCM.
    # The tmpfs size is specified by scm_size in GB units.
    scm_mount: /dev/shm             # maps to -s /dev/shm
    scm_class: ram
    scm_size: 90
    # When scm_class is set to dcpm, scm_list is the list of device paths for
    # AppDirect pmem namespaces (currently only one per server supported):
    # scm_class: dcpm
    # scm_list: [/dev/pmem0]
    # If using NVMe SSD (will write /mnt/daos/daos_nvme.conf and start I/O
    # service with -n <path>):
    # bdev_class: nvme
    # bdev_list: ["0000:81:00.0"]   # generate regular nvme.conf
    # If emulating NVMe SSD with malloc devices:
    # bdev_class: malloc            # maps to VOS_BDEV_CLASS=MALLOC
    # bdev_size: 4                  # malloc size of each device in GB
    # bdev_number: 1                # generate nvme.conf as follows:
    #   [Malloc]
    #   NumberOfLuns 1
    #   LunSizeInMB 4000
    # If emulating NVMe SSD over kernel block device:
    # bdev_class: kdev              # maps to VOS_BDEV_CLASS=AIO
    # bdev_list: [/dev/sdc]         # generate nvme.conf as follows:
    #   [AIO]
    #   AIO /dev/sdc AIO2
    # If emulating NVMe SSD with backend file:
    # bdev_class: file              # maps to VOS_BDEV_CLASS=AIO
    # bdev_size: 16                 # file size in GB; created if it does not exist
    # bdev_list: [/tmp/daos-bdev]   # generate nvme.conf as follows:
    #   [AIO]
    #   AIO /tmp/aiofile AIO1 4096
```
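A minimal launch sketch for a single node, assuming the config above is saved as /tmp/daos_server.yml and that this DAOS build's daos_server accepts -o for the config file path. The flag name, file path, and the exported variables are assumptions for illustration, not values taken from the plan:

```bash
# Hypothetical single-node launch; adjust paths/flags to the installed DAOS build.
export OFI_INTERFACE=ib0    # matches fabric_iface in the YAML above
export OFI_PORT=31416       # matches fabric_iface_port in the YAML above

# Assumes this build supports "daos_server start -o <config>".
daos_server start -o /tmp/daos_server.yml
```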
Server:
- Config file (see above)
- Environment variables (if any are set):

Client configuration:
- Configuration:
- Environment variables (if any are set):
Testing Area | Test | Test Priority (1 - HIGH, 2 - LOW) | Number of Servers | Number of Clients | Input Parameters | Expected Result | Observed Result | Defect | Notes | Expected SUs (1 node * 1 hour = 1 SU) |
---|---|---|---|---|---|---|---|---|---|---|
Server YAML config options | Verify the test cases in the section below with specific server config options in the YAML file | 1 | 1, 8, 64, 128 | 1, 16, 128, 740 | targets = [16]; nr_xs_helpers = [1]; CRT_CTX_SHARE_ADDR = [0, 1] | No server crash; performance should scale linearly | | | No individual test needed; the tests below can be run with this configuration | |
Performance | No replica: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | 1, 8, 64, 128 | 1, 16, 128, 740 | Transfer sizes: 256B, 4K, 128K, 512K, 1M; block size: 64M (file size grows with the number of processes); FPP and SSF | A single server achieved ~12GB/s read/write, so it should scale linearly | | | See the IOR sketch below this table | |
| Replica 2-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | | | | | | | | |
| Replica 3-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | | | | | | | | |
| Replica 4-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | | | | | | | | |
| Metadata test (using mdtest; see the mdtest sketch below this table) | | | | | | | | | |
| CART self_test | | | | | | | | | |
| POSIX (FUSE)? | | | | | | | | | |
Functionality and Scale testing | Run all daos_test | |||||||||
| Single server / max clients | | 1 | MAX | | | | | | |
| Max servers / single client | | MAX | 1 | | | | | | |
| Large number of pools (~1000); what other operations are needed after pool creation? | | | | | | | | | |
| dmg utility testing, for example pool query (see the dmg sketch below this table) | | | | | | | | | |
| Negative scenarios with scalability | | | | | | | | | |
Reliability and Data Integrity (Soak testing) | ||||||||||
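A sketch of the IOR sweep implied by the Performance rows above. The transfer sizes, block size, and FPP/SSF modes come from the table; the POSIX API selection, the dfuse mount point, the process count, and the mpirun launcher are assumptions (a DAOS-native IOR backend could be substituted if available in the build):

```bash
#!/bin/bash
# Hypothetical IOR sweep; /mnt/dfuse and NP are placeholders, not plan values.
NP=128                      # total client processes
TESTDIR=/mnt/dfuse/ior      # assumed dfuse mount point
mkdir -p "$TESTDIR"

for XFER in 256 4k 128k 512k 1m; do
  # SSF (single shared file): all ranks write to and read from one file.
  mpirun -np $NP ior -a POSIX -w -r -t $XFER -b 64m -o $TESTDIR/ssf_file
  # FPP (file per process): -F gives each rank its own file.
  mpirun -np $NP ior -a POSIX -w -r -F -t $XFER -b 64m -o $TESTDIR/fpp_file
done
```

The small transfer sizes (256B, 4K) cover the IOPS measurement; the large ones (512K, 1M) cover bandwidth.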
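For the metadata row, a minimal mdtest sketch. The item count, iteration count, and directory are placeholders, not values from the plan:

```bash
# Hypothetical mdtest run: each rank creates/stats/removes 1000 items,
# repeated over 3 iterations, under an assumed dfuse mount.
mpirun -np 128 mdtest -n 1000 -i 3 -d /mnt/dfuse/mdtest
```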
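For the ~1000-pool and dmg rows, a loop sketch. The dmg subcommand spellings, the --scm-size flag, and the output parsing are assumptions about the dmg version in this build; the plan itself only states that pool query should be exercised:

```bash
# Hypothetical loop: create many small pools, then query each one.
for i in $(seq 1 1000); do
  # Flag name and UUID parsing are assumptions about this dmg version.
  UUID=$(dmg pool create --scm-size=64M | awk '/UUID/ {print $NF}')
  dmg pool query --pool=$UUID
done
```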