Request from H3C

Submitter: Stephen Pu from H3C

Reviewers: Johann Lombardi, Liang Zhen, and other members of the DAOS team

Status: Under review; needs further specification

Expected result:

The scope of the requested items is to be defined for DAOS and H3C collaboration before Q4 2022.

The scope may be divided into 2 to 3 sub-iterations.

1. Request priorities are defined and aligned.

2. Requests are specified at the product-definition level (NOT the design stage).

3. General feasibility and effort estimates can be provided.

Category	Feature	Scenario	Description	H3C proposal	DAOS feedback	Priority	Contributed to community?	Feasibility and effort estimation	Owner (i.e. who owns design, development, and testing)	Delivery plan (Q1, Q2, Q3 in 2022)	Risks and other comments/concerns
volume management	create		Users can create a block storage volume by specifying its size, name, and other attributes.
	delete		Users can delete a block storage volume by specifying its name or UUID.
	expand	online, offline	The volume size can be expanded both online (while IO service continues) and offline.
	search		A volume can be searched for by its name or UUID.
	thin provisioning vol		The volume appears to application servers with its full provisioned capacity, but physical space is not allocated until write operations occur.
	thick provisioning vol		With thick provisioning, the full virtual disk capacity is pre-allocated on physical storage when the virtual disk is created. A thick-provisioned virtual disk consumes all of its allocated space in the datastore from the start, so that space is unavailable to other virtual machines or instances.
	modify		The volume's attributes, such as its name, can be modified.
	recycle bin		Recycle Bin is a snapshot recovery feature that lets you restore accidentally deleted snapshots. When Recycle Bin is in use, deleted snapshots are retained in it for a time period that you specify.
	snapshot		You can back up the data on your volumes by taking point-in-time snapshots. Snapshots are incremental backups: only the blocks on the device that have changed since your most recent snapshot are saved. This minimizes the time required to create a snapshot and saves storage costs by not duplicating data. Each snapshot contains all of the information needed to restore your data (as of the moment the snapshot was taken) to a new volume.
	clone		A clone of a Block Storage volume is a copy made of an existing volume at a specific moment in time.
	QoS		You can use quality of service (QoS) to guarantee that the performance of critical workloads is not degraded by competing workloads. You can set a throughput ceiling on a competing workload to limit its impact on system resources, or set a throughput floor for a critical workload to ensure that it meets minimum throughput targets regardless of demand from competing workloads. You can also set both a ceiling and a floor for the same workload. A throughput ceiling limits a workload's throughput to a maximum number of IOPS, MBps, or both. The workload's throughput CAN'T deviate by more than ±10% from the configured limits.
block interface access	NVMe-oF		Use the NVMe over Fabrics (NVMe-oF) protocol to set up NVMe block storage.
system management	deployment	Automatic cluster deployment		YES
	installation	Automatic cluster installation		YES
	installation	rollback		YES
	web configuration			YES
	upgrade	online upgrade		YES
		offline upgrade		YES
		rollback		YES
	system monitor	online monitoring		YES
	system monitor	offline monitoring		YES
	alarm			YES
	Log management			YES
	web page			YES
pool	Pool	block storage pool	Users can create a block storage pool with a specified size or name.
pool	Pool expand	online	A block storage pool can be expanded in size without interrupting its operation (online case).
cluster management	Node server	Add new node (online)	A new node server can be added to the existing cluster without interrupting the cluster's operation (online case). Online means that the storage system continues to provide IO service to the customer. IO performance may fluctuate, but the fluctuation CAN'T exceed 35%.
	Node server	delete a node (online)	A node server can be removed from the existing cluster without interrupting the cluster's operation (online case). Online means that the storage system continues to provide IO service to the customer. IO performance may fluctuate, but the fluctuation CAN'T exceed 35%.
	SSD Disk	Add a new SSD (online)	A new NVMe SSD can be added to a node server without interrupting the cluster's operation (online case). IO performance may fluctuate, but the fluctuation CAN'T exceed 20% in a 3-node cluster.
		Replace an SSD (online)	An existing NVMe SSD in a node server can be replaced with a new one (online case). IO performance may fluctuate, but the fluctuation CAN'T exceed 20% in a 3-node cluster.
		Remove an existing SSD (online)	A running or failed NVMe SSD can be removed from a node server (online case). IO performance may fluctuate, but the fluctuation CAN'T exceed 20% in a 3-node cluster.
	PMEM Disk	Add a new PMEM (online)	A new PMEM can be added to a node server without interrupting the cluster's operation (online case). IO performance may fluctuate, but the fluctuation CAN'T exceed 20% in a 3-node cluster.
		Replace a PMEM (online); low priority because PMEM has a long lifecycle	An existing PMEM in a node server can be replaced with a new one (online case). IO performance may fluctuate, but the fluctuation CAN'T exceed 20% in a 3-node cluster.
		Remove an existing PMEM (online); invalid requirement	A running or failed PMEM can be removed from a node server (online case). IO performance may fluctuate, but the fluctuation CAN'T exceed 20% in a 3-node cluster.
cluster enhancement	Network	Network exception robustness	The cluster should keep providing IO service under random network failures. Description: in a 3-node cluster, random network failures are injected on only 1 node during testing.
	Network	NIC exception robustness	The cluster should keep providing IO service under random NIC failures on a node. Description: in a 3-node cluster, random NIC failures are injected on only 1 node during testing.
	SSD Disk Exception		SSD disk exception handling is a test-driven case.
	Node Exception		Node exception handling is a test-driven case.
	Disk Usage Optimization	PMEM space optimization	Currently, PMEM capacity is divided according to the number of NVMe SSDs, which leads to poor PMEM space utilization: PMEM space runs out while the NVMe SSDs still have a lot of free space.
	Data Rebuild	4 TB/h	The data rebuild speed should be above 4 TB/h in a 3-node cluster where each node has 8 NVMe SSDs. In this simple case, concurrent IO service is not considered.
	Node reboot	IO recovery time	After a node reboots, IO service should recover to 100% of its pre-reboot level within 300 seconds in a 3-node cluster.
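The thin provisioning requirement (full capacity presented to the application server, physical space allocated only on write) can be illustrated with a minimal in-memory sketch. `ThinVolume`, its 4 KiB default block size, and the zero-fill behavior for unwritten blocks are hypothetical choices for illustration only, not a DAOS API or a proposed design.

```python
class ThinVolume:
    """Hypothetical thin-provisioned volume: reports its full size, allocates blocks on first write."""

    def __init__(self, size_bytes: int, block_size: int = 4096) -> None:
        if size_bytes <= 0 or size_bytes % block_size:
            raise ValueError("size must be a positive multiple of block_size")
        self.size_bytes = size_bytes              # capacity presented to the application server
        self.block_size = block_size
        self._blocks: dict[int, bytearray] = {}   # block index -> data, written blocks only

    @property
    def allocated_bytes(self) -> int:
        # Physical space actually consumed so far.
        return len(self._blocks) * self.block_size

    def _check(self, offset: int, length: int) -> None:
        if offset < 0 or length < 0 or offset + length > self.size_bytes:
            raise ValueError("access beyond volume size")

    def write(self, offset: int, data: bytes) -> None:
        self._check(offset, len(data))
        pos = 0
        while pos < len(data):
            idx, inner = divmod(offset + pos, self.block_size)
            block = self._blocks.get(idx)
            if block is None:
                # Allocate a zero-filled block only the first time it is written.
                block = self._blocks[idx] = bytearray(self.block_size)
            n = min(self.block_size - inner, len(data) - pos)
            block[inner:inner + n] = data[pos:pos + n]
            pos += n

    def read(self, offset: int, length: int) -> bytes:
        self._check(offset, length)
        out = bytearray()
        pos = 0
        while pos < length:
            idx, inner = divmod(offset + pos, self.block_size)
            n = min(self.block_size - inner, length - pos)
            block = self._blocks.get(idx)
            # Unallocated blocks read back as zeros without consuming space.
            out += block[inner:inner + n] if block is not None else bytes(n)
            pos += n
        return bytes(out)


vol = ThinVolume(1 << 30)        # presents 1 GiB
vol.write(8192, b"hello")        # first write allocates one 4 KiB block
print(vol.size_bytes, vol.allocated_bytes)  # 1073741824 4096
```

By contrast, a thick-provisioned volume in the same model would allocate every block in `__init__`, so `allocated_bytes` would equal `size_bytes` from the start.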
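The incremental snapshot behavior in the volume management rows (save only blocks changed since the most recent snapshot, yet be able to restore a full image) can be sketched as follows. The hypothetical `SnapshotVolume` tracks dirty block indexes between snapshots and rebuilds a full image by replaying deltas; deleted or trimmed blocks are deliberately not modeled.

```python
class SnapshotVolume:
    """Hypothetical volume that keeps incremental (changed-blocks-only) snapshots."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}            # current block contents
        self._dirty: set[int] = set()                 # blocks changed since the last snapshot
        self.snapshots: list[dict[int, bytes]] = []   # one delta per snapshot

    def write_block(self, idx: int, data: bytes) -> None:
        self.blocks[idx] = data
        self._dirty.add(idx)

    def take_snapshot(self) -> int:
        # Save only the blocks changed since the previous snapshot.
        delta = {i: self.blocks[i] for i in self._dirty}
        self.snapshots.append(delta)
        self._dirty.clear()
        return len(self.snapshots) - 1

    def restore(self, snap_id: int) -> dict[int, bytes]:
        # Replay deltas from the oldest snapshot up to snap_id to build the
        # full image as of that snapshot, e.g. to populate a new volume.
        if not 0 <= snap_id < len(self.snapshots):
            raise IndexError("unknown snapshot id")
        image: dict[int, bytes] = {}
        for delta in self.snapshots[: snap_id + 1]:
            image.update(delta)
        return image


vol = SnapshotVolume()
vol.write_block(0, b"a")
vol.write_block(1, b"b")
s0 = vol.take_snapshot()         # saves blocks 0 and 1
vol.write_block(1, b"B")
s1 = vol.take_snapshot()         # saves only block 1
print(len(vol.snapshots[s1]), vol.restore(s1))  # 1 {0: b'a', 1: b'B'}
```

The trade-off shown here is the usual one for incremental snapshots: each snapshot is cheap to take, while restoring requires walking the chain of earlier deltas.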
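A throughput ceiling like the one in the QoS row is commonly enforced with a token bucket. The sketch below is one such IOPS limiter under stated assumptions: the hypothetical `TokenBucket` class takes an explicit `now` timestamp in seconds so that it can be tested deterministically, and it covers only the ceiling, not the throughput floor.

```python
class TokenBucket:
    """Hypothetical IOPS ceiling: admits at most `rate` IOs per second, with bursts of up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate      # sustained ceiling, e.g. IOPS (or bytes/s for an MBps ceiling)
        self.burst = burst    # bucket capacity
        self.tokens = burst   # start full
        self.last = now       # timestamp of the previous refill, in seconds

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        """Return True if an IO costing `cost` tokens may proceed at time `now`."""
        # Refill for the elapsed time, never exceeding the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the ceiling: the caller should queue or delay the IO


limiter = TokenBucket(rate=4, burst=10)
admitted_now = sum(limiter.try_acquire(now=0.0) for _ in range(20))
admitted_later = sum(limiter.try_acquire(now=0.5) for _ in range(5))
print(admitted_now, admitted_later)  # 10 2
```

For an MBps ceiling, the same limiter could charge `cost` as the IO size in bytes; combining IOPS and MBps ceilings would mean checking two buckets per IO.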
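The acceptance limits in the QoS row (±10%) and in the online node and disk rows (35% and 20%) need a concrete definition of "fluctuation" before they can be tested. The sketch below assumes one possible definition: the largest relative deviation of any throughput sample taken during the operation from the mean of a baseline measured beforehand. The function names are hypothetical, and the real test methodology remains to be agreed.

```python
from statistics import mean


def max_fluctuation(baseline: list[float], during: list[float]) -> float:
    """Largest relative deviation of any sample taken during an operation from the baseline mean.

    Assumption: 'fluctuation' is measured per sample against the mean throughput
    (IOPS or MBps) recorded before the operation started.
    """
    if not baseline or not during:
        raise ValueError("need at least one baseline sample and one sample during the operation")
    ref = mean(baseline)
    if ref <= 0:
        raise ValueError("baseline throughput must be positive")
    return max(abs(sample - ref) / ref for sample in during)


def within_limit(baseline: list[float], during: list[float], limit: float) -> bool:
    """True if the fluctuation stays within `limit` (e.g. 0.35, 0.20 or 0.10)."""
    return max_fluctuation(baseline, during) <= limit


before = [1000, 1010, 990]   # IOPS before adding a node
during = [900, 700, 1050]    # IOPS while the node is being added
print(round(max_fluctuation(before, during), 2))  # 0.3
print(within_limit(before, during, 0.35), within_limit(before, during, 0.20))  # True False
```

For the QoS check, the configured limit can serve as the baseline, e.g. `within_limit([configured_iops], samples, 0.10)`.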
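The 4 TB/h rebuild target corresponds to roughly 1.11 GB/s of cluster-wide rebuild throughput (4 × 10^12 bytes / 3600 s), assuming decimal terabytes. The hypothetical helpers below turn a measured rebuild into a rate and compare it with the target, treating "above 4 TB/h" as "at least 4 TB/h".

```python
TB = 10**12               # assumption: decimal terabytes
SECONDS_PER_HOUR = 3600


def rebuild_rate_tb_per_hour(bytes_rebuilt: int, seconds: float) -> float:
    """Convert a measured rebuild (bytes moved, wall-clock seconds) into TB/h."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return (bytes_rebuilt / TB) / (seconds / SECONDS_PER_HOUR)


def meets_rebuild_target(bytes_rebuilt: int, seconds: float,
                         target_tb_per_hour: float = 4.0) -> bool:
    # 'Above 4 TB/h' is treated here as 'at least 4 TB/h'.
    return rebuild_rate_tb_per_hour(bytes_rebuilt, seconds) >= target_tb_per_hour


# The 4 TB/h target expressed as cluster-wide throughput.
required_bytes_per_s = 4 * TB / SECONDS_PER_HOUR
print(f"{required_bytes_per_s / 1e9:.2f} GB/s")  # 1.11 GB/s
```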
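The node reboot row asks for the time until IO service returns to 100% of its pre-reboot level. Assuming a hypothetical list of `(seconds_since_reboot, iops)` samples collected by the test harness, the recovery time can be read off as the first sample that reaches the baseline and compared with the 300-second limit.

```python
from typing import Optional


def recovery_time(samples: list[tuple[float, float]], baseline_iops: float,
                  fraction: float = 1.0) -> Optional[float]:
    """Seconds after reboot at which IOPS first reaches `fraction` of the pre-reboot baseline.

    `samples` holds hypothetical (seconds_since_reboot, iops) measurements.
    Returns None if the target level is never reached in the samples.
    """
    target = fraction * baseline_iops
    for t, iops in sorted(samples):
        if iops >= target:
            return t
    return None


def meets_recovery_target(samples: list[tuple[float, float]], baseline_iops: float,
                          limit_s: float = 300.0) -> bool:
    t = recovery_time(samples, baseline_iops)
    return t is not None and t <= limit_s


samples = [(0, 0), (60, 400), (120, 900), (180, 1000), (240, 1000)]
print(recovery_time(samples, baseline_iops=1000))  # 180
```

A stricter variant could require the recovered level to hold for several consecutive samples, so that a single spike is not counted as recovery.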