
Server:

Configuration:

  • No NVMe
  • No persistent memory
  • The server will use SCM only (tmpfs, which is only ~90 GB in size)
  • Nodes are available under the following queues:

    Queue Name  | Node Type | Max Nodes per Job (assoc'd cores)* | Max Duration | Max Jobs in Queue* | Charge Rate (per node-hour)
    skx-dev     | SKX       | 4 nodes (192 cores)*               | 2 hrs        | 1*                 | 1 SU
    skx-normal  | SKX       | 128 nodes (6,144 cores)*           | 48 hrs       | 25*                | 1 SU
    skx-large** | SKX       | 868 nodes (41,664 cores)*          | 48 hrs       | 3*                 | 1 SU
  • Stampede2 SKX Compute Node Specifications

    Model: Intel Xeon Platinum 8160 ("Skylake")
    Total cores per SKX node: 48 cores on two sockets (24 cores/socket)
    Hardware threads per core: 2
    Hardware threads per node: 48 x 2 = 96
    Clock rate: 2.1GHz nominal (1.4-3.7GHz depending on instruction set and number of active cores)
    RAM: 192GB (2.67GHz) DDR4
    Cache: 32KB L1 data cache per core; 1MB L2 per core; 33MB L3 per socket. Each socket can cache up to 57MB (sum of L2 and L3 capacity).
    Local storage: 144GB /tmp partition on a 200GB SSD (size of /tmp partition as of 14 Nov 2017).
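Since every queue above charges 1 SU per node-hour, the Expected SUs column used later in this plan is just nodes × hours; for example, a full skx-normal allocation:

```shell
# 128 nodes for the 48-hour maximum duration = 128 * 48 SUs
echo $((128 * 48))   # → 6144 SUs
```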
    DAOS server YAML configuration:
    # single server instance per config file for now
    servers:
    -
      targets: 7                # count of storage targets per server
      first_core: 0             # offset of the first core for service xstreams
      nr_xs_helpers: 1          # count of offload/helper xstreams per target
      fabric_iface: ib0         # map to OFI_INTERFACE=ib0
      fabric_iface_port: 31416  # map to OFI_PORT=31416
      log_mask: ERR             # map to D_LOG_MASK=ERR
      log_file: /tmp/daos_server.log # map to D_LOG_FILE=/tmp/daos_server.log
    
      # Environment variable values should be supplied without encapsulating quotes.
      env_vars:                 # influence DAOS IO Server behaviour by setting env variables
      - CRT_TIMEOUT=3600
      - PSM2_MULTI_EP=1
      - CRT_CTX_SHARE_ADDR=1
      - CRT_CTX_NUM=2
      - DAOS_IO_MODE=1
      - PMEMOBJ_CONF=prefault.at_open=1;prefault.at_create=1;
      - PMEM_IS_PMEM_FORCE=1
    
      # Storage definitions
    
      # When scm_class is set to ram, tmpfs will be used to emulate SCM.
      # The size of the tmpfs is specified by scm_size in GB units.
      scm_mount: /dev/shm   # map to -s /dev/shm
      scm_class: ram
      scm_size: 500
    
      # When scm_class is set to dcpm, scm_list is the list of device paths for
      # AppDirect pmem namespaces (currently only one per server supported).
      # scm_class: dcpm
      # scm_list: [/dev/pmem0]
    
      # If using NVMe SSD (will write /mnt/daos/daos_nvme.conf and start I/O
      # service with -n <path>)
      # bdev_class: nvme
      # bdev_list: ["0000:81:00.0"]  # generate regular nvme.conf
    
      # If emulating NVMe SSD with malloc devices
      # bdev_class: malloc  # map to VOS_BDEV_CLASS=MALLOC
      # bdev_size: 4                # malloc size of each device in GB.
      # bdev_number: 1              # generate nvme.conf as follows:
                  # [Malloc]
                  #   NumberOfLuns 1
                  #   LunSizeInMB 4000
    
      # If emulating NVMe SSD over kernel block device
      # bdev_class: kdev            # map to VOS_BDEV_CLASS=AIO
      # bdev_list: [/dev/sdc]       # generate nvme.conf as follows:
                  # [AIO]
                  #   AIO /dev/sdc AIO2
    
      # If emulating NVMe SSD with backend file
      # bdev_class: file            # map to VOS_BDEV_CLASS=AIO
      # bdev_size: 16           # file size in GB. Create file if does not exist.
      # bdev_list: [/tmp/daos-bdev] # generate nvme.conf as follows:
                  # [AIO]
                  #   AIO /tmp/aiofile AIO1 4096
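With this file in place, the I/O server is launched pointing at it; a minimal sketch (the config path is a placeholder, and the exact subcommand and options depend on the DAOS version in use):

```shell
# Start the DAOS I/O server using the YAML configuration above
# (path is hypothetical; check daos_server --help for your build).
daos_server start -o /path/to/daos_server.yml
```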

    Environment variables (if any are set):



Client:

Configuration:

Environment variables (if any are set):



For each test below, the plan records: Testing Area, Test, Test Priority (1 = HIGH, 2 = LOW), Number of Servers, Number of Clients, Input Parameter, Expected Result, Observed Result, Defect, Notes, and Expected SUs (1 node * 1 hour = 1 SU).

Performance

No Replica

Run IOR and collect bandwidth (BW).

Run IOR with small transfer sizes and collect IOPS.

Priority: 1; Servers: 1, 8, 64, 128; Clients: 1, 16, 128, 740

Transfer sizes: 256B, 4K, 128K, 512K, 1M

Block size: 64M (the aggregate file size grows with the number of processes)

Access patterns: FPP (file-per-process) and SSF (single shared file)

Expected result: a single server achieved ~12 GB/s read/write, so bandwidth should scale linearly with the number of servers.
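The two IOR runs above can be sketched as follows, assuming an IOR build with the DFS backend (flag spellings such as --dfs.pool vary by IOR version; pool/container UUIDs and the rank count are placeholders):

```shell
# Bandwidth run: 1M transfers, 64M blocks per rank, file-per-process (-F).
# Drop -F for the single-shared-file (SSF) pattern.
mpirun -np 16 ior -a DFS --dfs.pool "$POOL" --dfs.cont "$CONT" \
    -w -r -t 1m -b 64m -F -o /testfile

# IOPS run: same shape but with a small transfer size (e.g. 256 bytes).
mpirun -np 16 ior -a DFS --dfs.pool "$POOL" --dfs.cont "$CONT" \
    -w -r -t 256 -b 64m -o /testfile
```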



Replica 2 Way

Run IOR and collect bandwidth (BW).

Run IOR with small transfer sizes and collect IOPS.

Priority: 1

Replica 3 Way

Run IOR and collect bandwidth (BW).

Run IOR with small transfer sizes and collect IOPS.

Priority: 1

Replica 4 Way

Run IOR and collect bandwidth (BW).

Run IOR with small transfer sizes and collect IOPS.

Priority: 1

Metadata Test (using MDTest)
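The metadata row could be driven by mdtest against a mounted DAOS POSIX namespace; a sketch (rank count, per-rank item count, and the mount point are assumptions):

```shell
# 16 ranks, 1000 files/directories per rank, 3 iterations,
# unique working directory per task (-u).
mpirun -np 16 mdtest -n 1000 -i 3 -u -d /tmp/dfuse/mdtest
```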

CART self_test
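CART ships a self_test utility for measuring raw RPC and bulk performance between endpoints; a sketch of one invocation (group name, endpoint, and message-size syntax are assumptions — check self_test --help for your build):

```shell
# 100 repetitions of 1 MiB bulk transfers against server rank 0, tag 0,
# with up to 16 RPCs in flight.
self_test --group-name daos_server --endpoint 0:0 \
    --message-sizes "b1048576" --max-inflight-rpcs 16 --repetitions 100
```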

POSIX (FUSE)?
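The POSIX row presumably means exercising a container through dfuse; a minimal sketch (UUIDs are placeholders, and older dfuse builds may need additional options such as service-rank lists):

```shell
# Mount a DAOS container as a POSIX filesystem via FUSE.
mkdir -p /tmp/dfuse
dfuse --mountpoint=/tmp/dfuse --pool="$POOL" --container="$CONT"

# ...run POSIX workloads against /tmp/dfuse, then unmount:
fusermount -u /tmp/dfuse
```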
Functionality and Scale testing

Run all daos_test tests.
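At the time, daos_test was typically launched through orterun with the ompi-server URI file written at server startup; a sketch (rank count and URI path are placeholders):

```shell
# Run the daos_test suite across 4 client ranks; with no suite flags,
# daos_test runs all of its test suites.
orterun -np 4 --ompi-server file:"$HOME/uri.txt" daos_test
```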
Single server / max clients — Servers: 1; Clients: MAX
Max servers / single client — Servers: MAX; Clients: 1
Large number of pools (~1000)

Open question: what other operations are needed after pool creation?
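A sketch of driving the many-pools case from the management tool (dmg subcommand and flag names vary across DAOS versions; the size is an assumption):

```shell
# Create ~1000 small pools in a loop.
for i in $(seq 1 1000); do
    dmg pool create --scm-size=1G
done
```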

dmg utility testing (for example: pool query)
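For the dmg row, the basic query flow could look like this (subcommand and flag spellings vary across dmg versions; the pool UUID is a placeholder):

```shell
# Enumerate pools, then query one of them.
dmg pool list
dmg pool query --pool="$POOL_UUID"
```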
Negative Scenarios with Scalability
Reliability and Data Integrity (Soak testing)