Build DAOS locally and copy the content to the Stampede2 login node:

Log in to Stampede2 and note the full path of your home folder, for example /home1/12345/samirrav
Log in as root on the reserved Boro machine where you want to build the code.
Create the folder /home1/12345/samirrav (this must match the TACC home folder path).
Clone the DAOS repo: git clone https://github.com/daos-stack/daos.git
Run git merge origin/tanabarr/control-no-ipmctl (this patch is required to build the code without ipmctl on Stampede2 hardware).
Build the DAOS code.
Clone the IOR repo: git clone https://github.com/daos-stack/ior-hpc.git
Build the IOR code and install it under the DAOS install folder, /home1/12345/samirrav/daos/install/
Create a tar.bz2 archive of the daos folder: tar -cjvf daos.tar.bz2 daos/
Copy daos.tar.bz2 to your local machine.
From the local machine, scp the archive to the TACC login node. (You can scp directly from the Boro system to the TACC machine, but I do it locally via WinSCP because I don't want to do token verification every time.) The copy takes a few minutes.
On the Stampede2 node, untar the archive in the same location as on Boro: cd /home1/12345/samirrav ; tar xvfj daos.tar.bz2
Have your environment setup script ready with the bin and lib paths exported (the same as we do in the Boro or Wolf environment).
At this point you are ready to run the server. You can either start it manually or run an sbatch script.
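
The archive and extract steps above can also be scripted. The following is a minimal sketch using Python's standard tarfile module, assuming Python 3 is available on the machine where it runs. The helper names pack_tree and unpack_tree are hypothetical and are not part of DAOS. Member paths stay relative (daos/...), so extracting in the matching home folder recreates the same absolute paths as on the build machine.

```python
import os
import tarfile


def pack_tree(parent_dir, folder="daos", archive="daos.tar.bz2"):
    """Create <parent_dir>/<archive> containing <folder>/ (like tar -cjf)."""
    archive_path = os.path.join(parent_dir, archive)
    with tarfile.open(archive_path, "w:bz2") as tar:
        # arcname keeps the member paths relative, e.g. "daos/install/...".
        tar.add(os.path.join(parent_dir, folder), arcname=folder)
    return archive_path


def unpack_tree(archive_path, dest_dir):
    """Extract the archive into dest_dir (like tar -xjf run from dest_dir)."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        tar.extractall(dest_dir)
```

For example, pack_tree("/home1/12345/samirrav") on the build machine, followed by unpack_tree("/home1/12345/samirrav/daos.tar.bz2", "/home1/12345/samirrav") on Stampede2, mirrors the two tar commands above.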

Run manually (During Development):

Reserve the number of machines you need with idev, for example idev -m 60 -N 3 -p skx-dev. This reserves 3 nodes for 60 minutes (skx-dev has a limit of 2 hours). You might have to wait for some time, but skx-dev is faster because no more than 4 nodes can be requested. If you need more, use skx-normal, which has a limit of 128 nodes. Refer to https://portal.tacc.utexas.edu/user-guides/stampede2#running-queues
Once Slurm reserves the machines, you will be on the console of one of them. You can ssh to the other machines from the same session or from another Stampede2 login session. (You can access the machines for as long as you hold the reservation.) You lose ssh access once the reservation ends and the nodes are released, so make sure you collect all the logs for debugging in case of failure.
Now you can start the server manually: orterun --np 1 -x CPATH -x PATH -x LD_LIBRARY_PATH --report-uri /home1/12345/samirrav/hostsfile/uri.txt --enable-recovery daos_server start -i -a /home1/12345/samirrav/daos/install/tmp/ -o /home1/12345/samirrav/hostsfile/daos_server_psm2.yml --debug
From another machine, create the pool using the dmg command or run any other client-side operation.
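
The manual start command is long and repeats the home folder several times, so it is easy to mistype. As a rough sketch, the same command can be assembled from the home folder path. The function name server_command is hypothetical, and the hostsfile/ and daos/install/tmp/ layout is taken from the example command above rather than from any DAOS default.

```python
import shlex


def server_command(home):
    """Return the manual server start command from above as an argv list."""
    hosts = home + "/hostsfile"
    return [
        "orterun", "--np", "1",
        # Export the environment variables set by the setup script.
        "-x", "CPATH", "-x", "PATH", "-x", "LD_LIBRARY_PATH",
        "--report-uri", hosts + "/uri.txt",
        "--enable-recovery",
        "daos_server", "start", "-i",
        "-a", home + "/daos/install/tmp/",
        "-o", hosts + "/daos_server_psm2.yml",
        "--debug",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for the example home folder.
    argv = server_command("/home1/12345/samirrav")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as a list also makes it straightforward to pass to subprocess without shell quoting issues.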

Run via SBATCH:

Use the sample script available from https://jira.hpdd.intel.com/secure/attachment/31378/script_backup.sh

Change the parameters below based on your requirements.

Slurm Header	Description
#SBATCH -p skx-dev	Partition name where the job is queued. Each partition has its own limits on the number of nodes, the number of hours the nodes can be used, and how many jobs can be queued. Refer to https://portal.tacc.utexas.edu/user-guides/stampede2#running-queues
#SBATCH -N 3	Total number of nodes, 3 in this case. Reserve NO_OF_SERVERS + NO_OF_CLIENTS + 1 nodes; the extra node is used for initiating tests. For 1 server and 1 client, reserve 3 nodes. For 126 servers and 1 client, reserve 128 nodes.
#SBATCH -n 144	Total number of MPI tasks (48 x total number of nodes)
#SBATCH -t 02:00:00	Run time. Keep it close to the expected duration so that, in the worst case where something gets stuck, the job does not use up node hours.
#SBATCH --mail-user=samir.raval@intel.com	Your email address. You are notified when the job starts and when it finishes, along with its return code. For example: Slurm Job_id=4546499 Name=test_daos1 Began, Queued time 04:30:54 (04:30:54 is the time it took for the job to start). Slurm Job_id=4546499 Name=test_daos1 Ended, Run time 00:01:33, COMPLETED, ExitCode 0 (00:01:33 is the time it took for the job to complete).
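
The -N and -n values in the table follow simple arithmetic: nodes = servers + clients + 1, and tasks = 48 x nodes. A small sketch of that calculation follows. The function names slurm_sizes and sbatch_header are hypothetical, and the factor of 48 tasks per node is taken from the -n row above.

```python
TASKS_PER_NODE = 48  # factor used in the "#SBATCH -n" row above


def slurm_sizes(servers, clients):
    """Return (nodes, mpi_tasks) for the given server and client counts."""
    nodes = servers + clients + 1  # one extra node initiates the tests
    return nodes, TASKS_PER_NODE * nodes


def sbatch_header(servers, clients, partition="skx-dev", time="02:00:00"):
    """Return the matching #SBATCH lines as a list of strings."""
    nodes, tasks = slurm_sizes(servers, clients)
    return [
        "#SBATCH -p {}".format(partition),
        "#SBATCH -N {}".format(nodes),
        "#SBATCH -n {}".format(tasks),
        "#SBATCH -t {}".format(time),
    ]
```

With 1 server and 1 client this gives 3 nodes and 144 tasks, matching the header values shown in the table.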

Change the DAOS server and client counts and the local paths in the script:

Setting	Value
DAOS_SERVERS	1
DAOS_CLIENTS	1
URI_FILE	/<LOCAL_PATH>/uri.txt
DAOS_SERVER_YAML	/<LOCAL_PATH>/daos_server_psm2.yml
In start_agent()	/<LOCAL_PATH>/daos_agent
In start_server() --attach_info	/<LOCAL_PATH>/tmp

Create the log directory, for example /scratch/12345/samirrav/Log, and make sure it matches the path in the sbatch script.
Now run the sbatch script:
sbatch scripts/main.sh IOR
-----------------------------------------------------------------
Welcome to the Stampede2 Supercomputer12345
-----------------------------------------------------------------
No reservation for this job
--> Verifying valid submit host (login1)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...OK
--> Verifying availability of your home dir (/home1/12345/samirrav)...OK
--> Verifying availability of your work dir (/work/12345/samirrav/stampede2)...OK
--> Verifying availability of your scratch dir (/scratch/12345/samirrav)...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (skx-dev)...OK
--> Verifying job request is within current queue limits...OK
--> Checking available allocation (STAR-Intel)...OK
Submitted batch job 4551152
The job is queued once every check prints "OK". If something goes wrong at any stage, the job is not queued and the user needs to debug the sbatch script.
Check the status of the job using the command below. The status updates as the job gets its resources and runs.
login1(1038)$ squeue | grep samir
4551152 skx-dev test_dao samirrav PD 0:00 3 (Resources)
Once the job is finished, the logs are copied to the Log/4551152/ folder. This includes all the server, client, and agent logs from every system that was part of the job.
The user can cancel the job at any time using scancel JOB_ID (for example, scancel 4551152).
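
When submitting many jobs, it can help to pull the job ID and state out of the command output programmatically. The sketch below parses the "Submitted batch job" line and one squeue row. The function names are hypothetical, and the parser assumes the eight-column squeue layout shown above (job ID, partition, name, user, state, time, nodes, node list or reason).

```python
import re


def parse_job_id(sbatch_output):
    """Return the job ID from sbatch output, or None if nothing was queued."""
    match = re.search(r"^Submitted batch job (\d+)\s*$", sbatch_output,
                      re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_squeue_line(line):
    """Split one squeue row; raises ValueError if it has fewer than 8 fields."""
    job_id, partition, name, user, state, elapsed, nodes, reason = \
        line.split(None, 7)
    return {
        "job_id": int(job_id),
        "partition": partition,
        "name": name,
        "user": user,
        "state": state,      # e.g. PD (pending) or R (running)
        "time": elapsed,
        "nodes": int(nodes),
        "reason": reason,    # node list once running, "(Reason)" while pending
    }
```

For the row shown above, parse_squeue_line returns state "PD" with 3 nodes and reason "(Resources)".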

Avocado setup on TACC (With Python2):

Packages that need to be installed:

pip install --user avocado-framework==57.0
pip install --user avocado_framework_plugin_loader_yaml==57.0
pip install --user avocado_framework_plugin_result_html==57.0
pip install --user avocado_framework_plugin_varianter_yaml_to_mux==57.0
pip install --user clustershell

Avocado Sanity test:

login2(1221)$ avocado variants --tree -m daos/src/tests/ftest/pool/attribute.yaml
Multiplex tree representation:
 ┗━━ run
      ┣━━ hosts
      ┣━━ server_config
      ┗━━ attrtests
           ┣━━ createmode
           ┣━━ createset
           ┣━━ createsize
           ┣━━ name_handles
           ┃    ╚══ validlongname
           ┗━━ value_handles
                ╚══ validvalue

DAOS patch to run Avocado test on TACC:

DAOS_patch

git diff util/server_utils.py
diff --git a/src/tests/ftest/util/server_utils.py b/src/tests/ftest/util/server_utils.py
index 603723e..ddba91a 100644
--- a/src/tests/ftest/util/server_utils.py
+++ b/src/tests/ftest/util/server_utils.py
@@ -80,7 +80,7 @@ class DaosServerConfig(ObjectWithParameters):
             self.targets = BasicParameter(None, 8)
             self.first_core = BasicParameter(None, 0)
             self.nr_xs_helpers = BasicParameter(None, 2)
-            self.fabric_iface = BasicParameter(None, "eth0")
+            self.fabric_iface = BasicParameter(None, "ib0")
             self.fabric_iface_port = BasicParameter(None, 31416)
             self.log_mask = BasicParameter(None, "DEBUG,RPC=ERR,MEM=ERR")
             self.log_file = BasicParameter(None, "/tmp/server.log")
@@ -134,9 +134,9 @@ class DaosServerConfig(ObjectWithParameters):
             #   bdev_list: [/tmp/daos-bdev] - generate nvme.conf as follows:
             #       [AIO]
             #       AIO /tmp/aiofile AIO1 4096
-            self.scm_mount = BasicParameter(None, "/mnt/daos")
+            self.scm_mount = BasicParameter(None, "/dev/shm")
             self.scm_class = BasicParameter(None, "ram")
-            self.scm_size = BasicParameter(None, 6)
+            self.scm_size = BasicParameter(None, 90)
             self.scm_list = BasicParameter(None)
             self.bdev_class = BasicParameter(None)
             self.bdev_list = BasicParameter(None)
@@ -150,8 +150,8 @@ class DaosServerConfig(ObjectWithParameters):
         # Parameters
         self.name = BasicParameter(None, "daos_server")
         self.port = BasicParameter(None, 10001)
-        self.provider = BasicParameter(None, "ofi+sockets")
-        self.socket_dir = BasicParameter(None)          # /tmp/daos_sockets
+        self.provider = BasicParameter(None, "ofi+psm2")
+        self.socket_dir = BasicParameter(None, "/tmp/daos_sockets")          # /tmp/daos_sockets
         self.nr_hugepages = BasicParameter(None, 4096)
         self.control_log_mask = BasicParameter(None, "DEBUG")
         self.control_log_file = BasicParameter(None, "/tmp/daos_control.log")
c455-041[knl](1040)$ git diff util/agent_utils.py
diff --git a/src/tests/ftest/util/agent_utils.py b/src/tests/ftest/util/agent_utils.py
index d8520d0..e5bbe95 100755
--- a/src/tests/ftest/util/agent_utils.py
+++ b/src/tests/ftest/util/agent_utils.py
@@ -70,8 +70,8 @@ def run_agent(basepath, server_list, client_list=None):

     # Verify the domain socket directory is present and owned by this user
     file_checks = (
-        ("Server", server_list, "/var/run/daos_server"),
-        ("Client", client_list, "/var/run/daos_agent"),
+        ("Server", server_list, "/tmp/daos_sockets"),
+        ("Client", client_list, "/tmp/daos_agent"),
     )
     for host_type, host_list, directory in file_checks:
         status, nodeset = check_file_exists(host_list, directory, user)
@@ -88,7 +88,8 @@ def run_agent(basepath, server_list, client_list=None):
     for client in client_list:
         sessions[client] = subprocess.Popen(
             ["ssh", client, "-o ConnectTimeout=10",
-             "{} -i".format(daos_agent_bin)],
+             "{} -i".format(daos_agent_bin),
+             "-s /tmp/daos_agent"],
             stdout=subprocess.PIPE,
             stderr=subprocess.STDOUT
         )
c455-041[knl](1041)$

Avocado test run:

Avocado simplecreate functional test

c455-041[knl](1025)$ ./launch.py simplecreate
Arguments: Namespace(archive=False, clean=False, discard=False, include_localhost=False, list=False, rename=False, sparse=False, tags=['simplecreate'], test_clients=None, test_servers=None)
Running avocado list --paginator off --filter-by-tags=simplecreate ./ | sed -ne '/INSTRUMENTED/s/.* \([^:]*\):.*/\1/p' | uniq
Detected tests:
./pool/simple_create_delete_test.py
Running avocado run --ignore-missing-references on --show-job-log --html-job-result on  --filter-by-tags=simplecreate --mux-yaml ./pool/simple_create_delete_test.yaml -- ./pool/simple_create_delete_test.py
found extension EntryPoint.parse('journal = avocado.plugins.journal:JournalResult')
found extension EntryPoint.parse('tap = avocado.plugins.tap:TAPResult')
found extension EntryPoint.parse('human = avocado.plugins.human:Human')
found extension EntryPoint.parse('teststmpdir = avocado.plugins.teststmpdir:TestsTmpDir')
found extension EntryPoint.parse('jobscripts = avocado.plugins.jobscripts:JobScripts')
found extension EntryPoint.parse('human = avocado.plugins.human:HumanJob')
found extension EntryPoint.parse('yaml_to_mux = avocado_varianter_yaml_to_mux:YamlToMux')
File /etc/avocado/sysinfo/commands does not exist.
File /etc/avocado/sysinfo/files does not exist.
File /etc/avocado/sysinfo/profilers does not exist.
Journalctl collection failed: Command 'journalctl --quiet --lines 1 --output json' failed (rc=1)
Command line: /home1/06739/samirrav/.local/bin/avocado run --ignore-missing-references on --show-job-log --html-job-result on --filter-by-tags=simplecreate --mux-yaml ./pool/simple_create_delete_test.yaml -- ./pool/simple_create_delete_test.py

Avocado version: 57.0

Config files read (in order):
/home1/06739/samirrav/.local/lib/python2.7/site-packages/etc/avocado/avocado.conf
/home1/06739/samirrav/.local/lib/python2.7/site-packages/etc/avocado/conf.d/gdb.conf
/home1/06739/samirrav/.config/avocado/avocado.conf

Avocado config:
Section.Key                             Value
datadir.paths.base_dir                  /var/lib/avocado
datadir.paths.test_dir                  /usr/share/avocado/tests
datadir.paths.data_dir                  /var/lib/avocado/data
datadir.paths.logs_dir                  ~/avocado/job-results
sysinfo.collect.enabled                 True
sysinfo.collect.commands_timeout        -1
sysinfo.collect.installed_packages      False
sysinfo.collect.profiler                False
sysinfo.collect.locale                  C
sysinfo.collect.per_test                False
sysinfo.collectibles.commands           /etc/avocado/sysinfo/commands
sysinfo.collectibles.files              /etc/avocado/sysinfo/files
sysinfo.collectibles.profilers          /etc/avocado/sysinfo/profilers
runner.output.colored                   True
runner.output.utf8
remoter.behavior.reject_unknown_hosts   False
remoter.behavior.disable_known_hosts    False
job.output.loglevel                     debug
restclient.connection.hostname          localhost
restclient.connection.port              9405
restclient.connection.username
restclient.connection.password
plugins.disable                         []
plugins.skip_broken_plugin_notification []
plugins.loaders                         ['file', '@DEFAULT']
gdb.paths.gdb                           /usr/bin/gdb
gdb.paths.gdbserver                     /usr/bin/gdbserver

Avocado Data Directories:

base     /home1/06739/samirrav/avocado
tests    /home1/06739/samirrav/.local/lib/python2.7/site-packages/examples/tests
data     /home1/06739/samirrav/avocado/data
logs     /home1/06739/samirrav/avocado/job-results/job-2019-11-27T17.18-e54628a

Multiplex tree representation:
 \-- run
      |-- hosts
      |-- server_config
      \-- tests
           |-- modes
           |    \-- modeall
           |-- uids
           |    #== validuid
           |-- gids
           |    #== validgid
           \-- setnames
                #== validsetname

Multiplex variants (1):
Variant hosts-server_config-validgid-modeall-validsetname-validuid-5e55:    /run/hosts, /run/server_config, /run/tests/modes/modeall, /run/tests/uids/validuid, /run/tests/gids/validgid, /run/tests/setnames/validsetname
Temporary dir: /tmp/avocado_1lRF7N

Job ID: e54628a065c7bfca9df317a02bf054d11c0f0bdb

File /etc/avocado/sysinfo/commands does not exist.
File /etc/avocado/sysinfo/files does not exist.
File /etc/avocado/sysinfo/profilers does not exist.
Journalctl collection failed: Command 'journalctl --quiet --lines 1 --output json' failed (rc=1)
PARAMS (key=timeout, path=*, default=None) => 600
START 1-./pool/simple_create_delete_test.py:SimpleCreateDeleteTest.test_create;hosts-server_config-validgid-modeall-validsetname-validuid-5e55
Test metadata:
  filename: /home1/06739/samirrav/daos/src/tests/ftest/pool/simple_create_delete_test.py
Job-ID: job-2019-11-27T17.18-e54628a
Test PID: 112275
DATA (filename=output.expected) => NOT FOUND (data sources: variant, test, file)
PARAMS (key=fault_list, path=/run/faults/*/, default=None) => None
PARAMS (key=name, path=/server_config/, default=daos_server) => 'daos_server'
PARAMS (key=test_machines, path=/run/hosts/*, default=None) => ['c455-043']
PARAMS (key=test_servers, path=/run/hosts/*, default=None) => None
PARAMS (key=test_clients, path=/run/hosts/*, default=None) => None
PARAMS (key=server_count, path=/run/hosts/*, default=None) => None
PARAMS (key=client_count, path=/run/hosts/*, default=None) => None
PARAMS (key=bdev_class, path=/server_config/server/, default=None) => None
PARAMS (key=server_partition, path=/run/hosts/*, default=None) => None
PARAMS (key=client_partition, path=/run/hosts/*, default=None) => None
hostlist_servers:  ['c455-043']
hostlist_clients:  None
[stdout] <<1>>
[stdout]

[stdout] <AGENT> agent started on node c455-041 in 3.00518798828 seconds
[stdout]

[stdout] Creating the server yaml file
[stdout]

PARAMS (key=control_log_file, path=/run/server_config/*, default=/tmp/daos_control.log) => '/tmp/daos_control.log'
PARAMS (key=control_log_mask, path=/run/server_config/*, default=DEBUG) => 'DEBUG'
PARAMS (key=group_name, path=/run/server_config/*, default=None) => None
PARAMS (key=name, path=/run/server_config, default=daos_server) => 'daos_server'
PARAMS (key=nr_hugepages, path=/run/server_config/*, default=4096) => 4096
PARAMS (key=port, path=/run/server_config/*, default=10001) => 10001
PARAMS (key=provider, path=/run/server_config/*, default=ofi+psm2) => 'ofi+psm2'
PARAMS (key=socket_dir, path=/run/server_config/*, default=/tmp/daos_sockets) => '/tmp/daos_sockets'
PARAMS (key=user_name, path=/run/server_config/*, default=None) => None
PARAMS (key=bdev_class, path=/run/server_config/servers/*, default=None) => None
PARAMS (key=bdev_list, path=/run/server_config/servers/*, default=None) => None
PARAMS (key=bdev_number, path=/run/server_config/servers/*, default=None) => None
PARAMS (key=bdev_size, path=/run/server_config/servers/*, default=None) => None
PARAMS (key=env_vars, path=/run/server_config/servers/*, default=['ABT_ENV_MAX_NUM_XSTREAMS=100', 'ABT_MAX_NUM_XSTREAMS=100', 'DAOS_MD_CAP=1024', 'CRT_CTX_SHARE_ADDR=0', 'CRT_TIMEOUT=30', 'FI_SOCKETS_MAX_CONN_RETRY=1', 'FI_SOCKETS_CONN_TIMEOUT=2000']) => ['ABT_ENV_MAX_NUM_XSTREAMS=100', 'ABT_MAX_NUM_XSTREAMS=100', 'DAOS_MD_CAP=1024', 'CRT_CTX_SHARE_ADDR=0', 'CRT_TIMEOUT=30', 'FI_SOCKETS_MAX_CONN_RETRY=1', 'FI_SOCKETS_CONN_TIMEOUT=2000']
PARAMS (key=fabric_iface, path=/run/server_config/servers/*, default=ib0) => 'ib0'
PARAMS (key=fabric_iface_port, path=/run/server_config/servers/*, default=31416) => 31416
PARAMS (key=first_core, path=/run/server_config/servers/*, default=0) => 0
PARAMS (key=log_file, path=/run/server_config/servers/*, default=/tmp/server.log) => '/tmp/server.log'
PARAMS (key=log_mask, path=/run/server_config/servers/*, default=DEBUG,RPC=ERR,MEM=ERR) => 'DEBUG,RPC=ERR,MEM=ERR'
PARAMS (key=nr_xs_helpers, path=/run/server_config/servers/*, default=2) => 2
PARAMS (key=scm_class, path=/run/server_config/servers/*, default=ram) => 'ram'
PARAMS (key=scm_list, path=/run/server_config/servers/*, default=None) => None
PARAMS (key=scm_mount, path=/run/server_config/servers/*, default=/dev/shm) => '/dev/shm'
PARAMS (key=scm_size, path=/run/server_config/servers/*, default=90) => 90
PARAMS (key=targets, path=/run/server_config/servers/*, default=8) => 8
Updated param log_file => None
[stdout] Removing any existing server processes
[stdout]

[stdout] c455-043: failure running 'pkill '(daos_server|daos_io_server)' --signal INT; sleep 5; pkill '(daos_server|daos_io_server)' --signal KILL': rc=1
[stdout]

[stdout] Cleaning the server tmpfs directories
[stdout]

[stdout] Start CMD>>>>/home1/06739/samirrav/daos/opt/ompi/bin/orterun --np 1 --hostfile /tmp/avocado_1lRF7N/1-._pool_simple_create_delete_test.py_SimpleCreateDeleteTest.test_create_hosts-server_config-validgid-modeall-validsetname-validuid-5e55/hostfile91322 --enable-recovery -x PATH /home1/06739/samirrav/daos/install/bin/daos_server --debug --config /home1/06739/samirrav/daos/src/tests/ftest/data/daos_avocado_test.yaml start -i -a /home1/06739/samirrav/daos/install/tmp
[stdout]

[stdout] <SERVER>: Starting Servers

[stdout] DEBUG 17:18:51.697327 netdetect.go:749: Input provider string: ofi+psm2

[stdout] DEBUG 17:18:52.562352 netdetect.go:731: ValidateProviderConfig (device: ofi+psm2, provider ib0) returned error: Device ib0 does not support provider: ofi+psm2

[stdout] DEBUG 17:18:52.562518 main.go:94: DAOS config loaded from /home1/06739/samirrav/daos/src/tests/ftest/data/daos_avocado_test.yaml

[stdout] DEBUG 17:18:52.562684 netdetect.go:749: Input provider string: ofi+psm2

[stdout] DEBUG 17:18:53.406461 netdetect.go:731: ValidateProviderConfig (device: ofi+psm2, provider ib0) returned error: Device ib0 does not support provider: ofi+psm2

[stdout] /home1/06739/samirrav/daos/install/bin/daos_server logging to file /tmp/daos_control.log

[stdout] DEBUG 17:18:53.406969 start.go:147: Switching control log level to DEBUG

[stdout] DEBUG 17:18:53.407483 server.go:52: cfg: &server.Configuration{ControlPort:10001, TransportConfig:(*security.TransportConfig)(0xc00018ccb0), Servers:[]*ioserver.Config{(*ioserver.Config)(0xc000001b00)}, BdevInclude:[]string(nil), BdevExclude:[]string(nil), NrHugepages:4096, ControlLogMask:3, ControlLogFile:"/tmp/daos_control.log", ControlLogJSON:false, UserName:"", GroupName:"", SystemName:"daos_server", SocketDir:"/tmp/daos_sockets", Fabric:ioserver.FabricConfig{Provider:"ofi+psm2", Interface:"", InterfacePort:0, PinnedNumaNode:(*uint)(nil)}, Modules:"", Attach:"/home1/06739/samirrav/daos/install/tmp", AccessPoints:[]string{"localhost"}, FaultPath:"", FaultCb:"", Hyperthreads:false, Path:"/home1/06739/samirrav/daos/src/tests/ftest/data/daos_avocado_test.yaml", ext:(*server.ext)(0xc000170860), NvmeShmID:1363321369, validateProviderFn:(server.networkProviderValidation)(0x5c05b0), validateNUMAFn:(server.networkNUMAValidation)(0x5c1290)}

[stdout] DEBUG 17:18:53.416692 config.go:403: Active config saved to /home1/06739/samirrav/daos/src/tests/ftest/data/.daos_server.active.yml (read-only)

[stdout] DEBUG 17:18:53.417084 bdev.go:197: spdk : bdev_list empty in config, no nvme.conf generated for server

[stdout] Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[stdout] [ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1363321369 --base-virtaddr=0x200000000000 --proc-type=auto ]

[stdout] DEBUG 17:18:53.582905 ctl_storage.go:97: Warning, NVMe Setup: SPDK env init, has setup been run?: spdk_env_opts_init: 1

[stdout] ipmctl lib not present

[stdout] DEBUG 17:18:53.583238 ipmctl.go:102: discovered 0 DCPM modules

[stdout] DAOS control server listening on 0.0.0.0:10001

[stdout] DEBUG 17:18:53.585482 superblock.go:107: /dev/shm: checking superblock

[stdout] DEBUG 17:18:53.585813 instance.go:176: /dev/shm: checking formatting

[stdout] DEBUG 17:18:53.586674 instance.go:193: /dev/shm (ram) needs format: false

[stdout] DEBUG 17:18:53.590255 exec.go:104: daos_io_server:0 config: &ioserver.Config{Rank:(*ioserver.Rank)(nil), Modules:"", TargetCount:8, HelperStreamCount:2, ServiceThreadCore:0, SystemName:"daos_server", SocketDir:"/tmp/daos_sockets", AttachInfoPath:"/home1/06739/samirrav/daos/install/tmp", LogMask:"DEBUG,RPC=ERR,MEM=ERR", LogFile:"", Storage:ioserver.StorageConfig{SCM:storage.ScmConfig{MountPoint:"/dev/shm", Class:"ram", RamdiskSize:90, DeviceList:[]string(nil)}, Bdev:storage.BdevConfig{ConfigPath:"", Class:"", DeviceList:[]string(nil), DeviceCount:0, FileSize:0, ShmID:1363321369, VosEnv:"", Hostname:"c455-043.stampede2.tacc.utexas.edu"}}, Fabric:ioserver.FabricConfig{Provider:"ofi+psm2", Interface:"ib0", InterfacePort:31416, PinnedNumaNode:(*uint)(nil)}, EnvVars:[]string{"ABT_ENV_MAX_NUM_XSTREAMS=100", "ABT_MAX_NUM_XSTREAMS=100", "DAOS_MD_CAP=1024", "CRT_CTX_SHARE_ADDR=0", "CRT_TIMEOUT=30", "FI_SOCKETS_MAX_CONN_RETRY=1", "FI_SOCKETS_CONN_TIMEOUT=2000"}, Index:0x0}

[stdout] DEBUG 17:18:53.590648 exec.go:105: daos_io_server:0 args: [-t 8 -x 2 -g daos_server -d /tmp/daos_sockets -a /home1/06739/samirrav/daos/install/tmp -s /dev/shm -i 1363321369 -I 0]

[stdout] DEBUG 17:18:53.590830 exec.go:106: daos_io_server:0 env: [ABT_MAX_NUM_XSTREAMS=100 FI_SOCKETS_CONN_TIMEOUT=2000 OFI_INTERFACE=ib0 ABT_ENV_MAX_NUM_XSTREAMS=100 DAOS_MD_CAP=1024 CRT_CTX_SHARE_ADDR=0 CRT_TIMEOUT=30 FI_SOCKETS_MAX_CONN_RETRY=1 CRT_PHY_ADDR_STR=ofi+psm2 OFI_PORT=31416]

[stdout] Starting I/O server instance 0: /home1/06739/samirrav/daos/install/bin/daos_io_server

[stdout] daos_io_server:0 Using legacy core allocation algorithm

[stdout] DEBUG 17:18:55.781800 instance.go:219: I/O server instance 0 ready: uri:"ofi+psm2://7990b02:0" nctxs:18 drpcListenerSock:"/tmp/daos_sockets/daos_io_server_101035.sock"

[stdout] DEBUG 17:18:55.782517 instance.go:342: start MS

[stdout] Management Service access point started (bootstrapped)

[stdout] daos_io_server:0 DAOS I/O server (v0.6.0) process 101035 started on rank 0 (out of 1) with 8 target, 2 helper XS per target, firstcore 0, host c455-043.stampede2.tacc.utexas.edu.

[stdout]

[stdout] <SERVER> server started and took 10.8743472099 seconds to start
[stdout]

PARAMS (key=mode, path=/run/tests/modes/*, default=None) => [511, 'PASS']
PARAMS (key=uid, path=/run/tests/uids/*, default=860384) => ['valid', 'PASS']
PARAMS (key=gid, path=/run/tests/gids/*, default=814017) => ['valid', 'PASS']
PARAMS (key=setname, path=/run/tests/setnames/*, default=None) => ['daos_server', 'PASS']
Destroying pools
Stopping agents
Stopping servers
[stdout] <SERVER> server stopped
[stdout]

DATA (filename=output.expected) => NOT FOUND (data sources: variant, test, file)
DATA (filename=stdout.expected) => NOT FOUND (data sources: variant, test, file)
DATA (filename=stderr.expected) => NOT FOUND (data sources: variant, test, file)
Not logging /var/log/messages (lack of permissions)
PASS 1-./pool/simple_create_delete_test.py:SimpleCreateDeleteTest.test_create;hosts-server_config-validgid-modeall-validsetname-validuid-5e55

Test results available in /home1/06739/samirrav/avocado/job-results/job-2019-11-27T17.18-e54628a
Total test time: 67s
All avocado tests passed!
c455-041[knl](1026)$
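
Job logs like the one above are long, so a quick tally of the per-test result lines can be useful. The sketch below counts lines that begin with a result status word followed by a test ID, as in the "PASS 1-./pool/..." line above. The function name summarize_results is hypothetical, and the list of status words is an assumption about what Avocado may print, not something taken from this log.

```python
from collections import Counter

# Status words assumed to start a result line; only PASS appears in the log above.
STATUSES = ("PASS", "FAIL", "ERROR", "SKIP", "WARN", "INTERRUPTED", "CANCEL")


def summarize_results(log_text):
    """Count result lines per status in the text of an Avocado job log."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split(None, 1)
        # A result line is "<STATUS> <test id>"; PARAMS and START lines are skipped.
        if len(parts) == 2 and parts[0] in STATUSES:
            counts[parts[0]] += 1
    return counts
```

Lines such as "PARAMS (...) => ['daos_server', 'PASS']" are not counted, because only the first word of each line is checked.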
