...
TACC system information:
Configuration:
- No NVMe
- No persistent memory
- The server will use SCM only (tmpfs, which is limited to ~90G)
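Since SCM is emulated with tmpfs, it is worth confirming the available tmpfs capacity on an allocated node before starting the server. A minimal check with standard Linux tools (the ~90G figure comes from the notes above; exact output will vary by node):

```bash
# Check the size of /dev/shm (tmpfs) that will back the emulated SCM.
df -h /dev/shm

# tmpfs defaults to half of physical RAM; with 192GB RAM that is ~96G,
# of which ~90G is used for the DAOS SCM pool per the config below.
grep MemTotal /proc/meminfo
```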
Nodes available, by queue name:

Queue Name | Node Type | Max Nodes per Job (assoc'd cores) | Max Duration | Max Jobs in Queue | Charge Rate (per node-hour)
---|---|---|---|---|---
skx-dev | SKX | 4 nodes (192 cores) | 2 hrs | 1 | 1 SU
skx-normal | SKX | 128 nodes (6,144 cores) | 48 hrs | 25 | 1 SU
skx-large | SKX | 868 nodes (41,664 cores) | 48 hrs | 3 | 1 SU

Stampede2 SKX Compute Node Specifications:
- Model: Intel Xeon Platinum 8160 ("Skylake")
- Total cores per SKX node: 48 cores on two sockets (24 cores/socket)
- Hardware threads per core: 2
- Hardware threads per node: 48 x 2 = 96
- Clock rate: 2.1GHz nominal (1.4-3.7GHz depending on instruction set and number of active cores)
- RAM: 192GB (2.67GHz) DDR4
- Cache: 32KB L1 data cache per core; 1MB L2 per core; 33MB L3 per socket. Each socket can cache up to 57MB (sum of L2 and L3 capacity).
- Local storage: 144GB /tmp partition on a 200GB SSD (size of /tmp partition as of 14 Nov 2017)
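A quick way to confirm the socket/core/thread layout on an allocated node, which is useful when choosing `targets` and `first_core` in the server config below. These are standard Linux tools, nothing DAOS-specific:

```bash
# On an SKX node the output should match the specs above:
# 2 sockets, 24 cores per socket, 2 threads per core.
lscpu | grep -E 'Socket|Core|Thread'

# NUMA layout; helps decide which cores to pin DAOS service xstreams to.
numactl --hardware
```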
Server:
```yaml
# single server instance per config file for now
servers:
  -
    targets: 16                     # confirm the number of targets
    first_core: 0                   # offset of the first core for service xstreams
    nr_xs_helpers: 1                # count of offload/helper xstreams per target
    fabric_iface: ib0               # maps to OFI_INTERFACE=ib0
    fabric_iface_port: 31416        # maps to OFI_PORT=31416
    log_mask: ERR                   # maps to D_LOG_MASK=ERR
    log_file: /tmp/daos_server.log  # maps to D_LOG_FILE=/tmp/daos_server.log
    # Environment variable values should be supplied without encapsulating quotes.
    env_vars:                       # influence DAOS I/O server behaviour by setting env variables
      - CRT_TIMEOUT=120
      - CRT_CREDIT_EP_CTX=0
      - PSM2_MULTI_EP=1
      - CRT_CTX_SHARE_ADDR=1
      - PMEMOBJ_CONF=prefault.at_open=1;prefault.at_create=1; # Do we need this?
      - PMEM_IS_PMEM_FORCE=1 # Do we need this?
    # Storage definitions.
    # When scm_class is set to ram, tmpfs will be used to emulate SCM.
    # The tmpfs size is specified by scm_size in GB units.
    scm_mount: /dev/shm             # maps to -s /dev/shm
    scm_class: ram
    scm_size: 90
    # When scm_class is set to dcpm, scm_list is the list of device paths for
    # AppDirect pmem namespaces (currently only one per server supported):
    # scm_class: dcpm
    # scm_list: [/dev/pmem0]
    # If using NVMe SSD (will write /mnt/daos/daos_nvme.conf and start I/O
    # service with -n <path>):
    # bdev_class: nvme
    # bdev_list: ["0000:81:00.0"]   # generate regular nvme.conf
    # If emulating NVMe SSD with malloc devices:
    # bdev_class: malloc            # maps to VOS_BDEV_CLASS=MALLOC
    # bdev_size: 4                  # malloc size of each device in GB
    # bdev_number: 1                # generate nvme.conf as follows:
    #   [Malloc]
    #   NumberOfLuns 1
    #   LunSizeInMB 4000
    # If emulating NVMe SSD over kernel block device:
    # bdev_class: kdev              # maps to VOS_BDEV_CLASS=AIO
    # bdev_list: [/dev/sdc]         # generate nvme.conf as follows:
    #   [AIO]
    #   AIO /dev/sdc AIO2
    # If emulating NVMe SSD with backend file:
    # bdev_class: file              # maps to VOS_BDEV_CLASS=AIO
    # bdev_size: 16                 # file size in GB; created if it does not exist
    # bdev_list: [/tmp/daos-bdev]   # generate nvme.conf as follows:
    #   [AIO]
    #   AIO /tmp/aiofile AIO1 4096
```
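A minimal launch sketch for a single node, assuming the config above is saved as /tmp/daos_server.yml and that this DAOS build's daos_server accepts -o for the config file path. The flag name, file path, and the exported variables are assumptions for illustration, not values taken from the plan:

```bash
# Hypothetical single-node launch; adjust paths/flags to the installed DAOS build.
export OFI_INTERFACE=ib0    # matches fabric_iface in the YAML above
export OFI_PORT=31416       # matches fabric_iface_port in the YAML above

# Assumes this build supports "daos_server start -o <config>".
daos_server start -o /tmp/daos_server.yml
```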
Server:
- Config file (see above)
- Environment variables (if any are set):

Client configuration:
- Configuration:
- Environment variables (if any are set):
Testing Area | Test | Test Priority (1 - HIGH, 2 - LOW) | Number of Servers | Number of Clients | Input Parameters | Expected Result | Observed Result | Defect | Notes | Expected SUs (1 node * 1 hour = 1 SU) |
---|---|---|---|---|---|---|---|---|---|---|
Server YAML config options | Verify the test cases in the section below with specific server config options in the YAML file | 1 | 1, 8, 64, 128 | 1, 16, 128, 740 | targets = [16]; nr_xs_helpers = [1]; CRT_CTX_SHARE_ADDR = [0, 1] | No server crash; performance should scale linearly | | | No individual test needed; the tests below can be run with this configuration | |
Performance | No replica: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | 1, 8, 64, 128 | 1, 16, 128, 740 | Transfer sizes: 256B, 4K, 128K, 512K, 1M; block size: 64M (file size grows with the number of processes); FPP and SSF | A single server achieved ~12GB/s read/write, so it should scale linearly | | | See the IOR sketch below this table | |
| Replica 2-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | | | | | | | | |
| Replica 3-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | | | | | | | | |
| Replica 4-way: run IOR and collect BW; run IOR with small transfer sizes and collect IOPS | 1 | | | | | | | | |
| Metadata test (using mdtest; see the mdtest sketch below this table) | | | | | | | | | |
| CART self_test | | | | | | | | | |
| POSIX (FUSE)? | | | | | | | | | |
Functionality and Scale testing | Run all daos_test | |||||||||
| Single server / max clients | | 1 | MAX | | | | | | |
| Max servers / single client | | MAX | 1 | | | | | | |
| Large number of pools (~1000); what other operations are needed after pool creation? | | | | | | | | | |
| dmg utility testing, for example pool query (see the dmg sketch below this table) | | | | | | | | | |
| Negative scenarios with scalability | | | | | | | | | |
Reliability and Data Integrity (Soak testing) | ||||||||||
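A sketch of the IOR sweep implied by the Performance rows above. The transfer sizes, block size, and FPP/SSF modes come from the table; the POSIX API selection, the dfuse mount point, the process count, and the mpirun launcher are assumptions (a DAOS-native IOR backend could be substituted if available in the build):

```bash
#!/bin/bash
# Hypothetical IOR sweep; /mnt/dfuse and NP are placeholders, not plan values.
NP=128                      # total client processes
TESTDIR=/mnt/dfuse/ior      # assumed dfuse mount point
mkdir -p "$TESTDIR"

for XFER in 256 4k 128k 512k 1m; do
  # SSF (single shared file): all ranks write to and read from one file.
  mpirun -np $NP ior -a POSIX -w -r -t $XFER -b 64m -o $TESTDIR/ssf_file
  # FPP (file per process): -F gives each rank its own file.
  mpirun -np $NP ior -a POSIX -w -r -F -t $XFER -b 64m -o $TESTDIR/fpp_file
done
```

The small transfer sizes (256B, 4K) cover the IOPS measurement; the large ones (512K, 1M) cover bandwidth.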
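For the metadata row, a minimal mdtest sketch. The item count, iteration count, and directory are placeholders, not values from the plan:

```bash
# Hypothetical mdtest run: each rank creates/stats/removes 1000 items,
# repeated over 3 iterations, under an assumed dfuse mount.
mpirun -np 128 mdtest -n 1000 -i 3 -d /mnt/dfuse/mdtest
```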
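For the ~1000-pool and dmg rows, a loop sketch. The dmg subcommand spellings, the --scm-size flag, and the output parsing are assumptions about the dmg version in this build; the plan itself only states that pool query should be exercised:

```bash
# Hypothetical loop: create many small pools, then query each one.
for i in $(seq 1 1000); do
  # Flag name and UUID parsing are assumptions about this dmg version.
  UUID=$(dmg pool create --scm-size=64M | awk '/UUID/ {print $NF}')
  dmg pool query --pool=$UUID
done
```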