Request from H3C


Submitter: @Stephen Pu from H3C

Reviewer: @Johann Lombardi (Deactivated), @Liang Zhen, and other team members from DAOS

Status: Needs more specification; under review

Expected result:

The scope of the request items is defined for the DAOS and H3C collaboration before Q4 2022.

The scope may be divided into 2 to 3 sub-iterations.

1. Request priorities are defined and aligned.

2. Requests are specified for product definition (NOT the design stage).

3. A general feasibility assessment and effort estimate can be given.

NAME

Name

Scenario

Description

H3C Proposal

DAOS feedback

Priority

If contributed to community

Feasibility and effort estimation

Owner (i.e., who owns design, development, and testing)

Delivery plan (Q1, Q2, Q3 in 2022)

Risk/Comments

volume management

create

 

A user can create a block storage volume with a specified size, name, and other attributes.

 

Low.

 

 

 

 

 

 

delete

 

A user can delete a block storage volume by specifying its name or UUID.

 

 

 

 

 

 

 

 

expand

online, offline

The volume size can be expanded either while the volume is serving I/O (online) or while it is not (offline).

 

 

 

 

 

 

 

 

search

 

A specific volume can be looked up by its name or UUID.

 

 

 

 

 

 

 

 

thin-provisioned volume

 

The volume appears to the application servers to have its full provisioned capacity, but no space is allocated until write operations occur.

 

 

 

 

 

 

 

 

thick-provisioned volume

 

With thick provisioning, the complete amount of virtual disk storage capacity is pre-allocated on the physical storage when the virtual disk is created. A thick-provisioned virtual disk consumes all the space allocated to it in the datastore right from the start, so the space is unavailable for use by other virtual machines or instances.

 

 

 

 

 

 

 

 

modify

 

The volume's attributes, such as its name, can be modified.

 

 

 

 

 

 

 

 

recycle bin

 

Recycle Bin is a snapshot recovery feature that enables you to restore accidentally deleted snapshots. When using Recycle Bin, if your snapshots are deleted, they are retained in the Recycle Bin for a time period that you specify.

 

 

 

 

 

 

 

 

snapshot

 

You can back up the data on your volumes by taking point-in-time snapshots. Snapshots are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved. This minimizes the time required to create the snapshot and saves on storage costs by not duplicating data. Each snapshot contains all of the information that is needed to restore your data (from the moment when the snapshot was taken) to a new volume.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

clone

 

A clone of a block storage volume is a copy of an existing volume made at a specific moment in time.

 

 

 

 

 

 

 

 

QoS

 

You can use quality of service (QoS) to guarantee that the performance of critical workloads is not degraded by competing workloads. You can set a throughput ceiling on a competing workload to limit its impact on system resources, or set a throughput floor for a critical workload, ensuring that it meets minimum throughput targets regardless of demand by competing workloads. You can even set a ceiling and a floor for the same workload.
A throughput ceiling limits throughput for a workload to a maximum number of IOPS or MBps, or IOPS and MBps.
The workload's measured throughput must stay within +10% to -10% of the configured limits.

 

 

 

 

 

 

 

 

block interface access

NVMe-oF

 

Use the NVMe over Fabrics (NVMe-oF) protocol to set up NVMe block storage.

 

 

 

yes

 

 

 

 

system management

deployment

Automatic cluster deployment

 

YES

 

 

Work with the DAOS Ansible role: https://github.com/bensallen/ansible-role-daos

 

 

 

 

installation

Automatic cluster installation

 

YES

 

 

yes

 

 

 

 

rollback

 

YES

 

 

 

 

 

 

 

web portal configuration

 

 

YES

 

 

 

 

 

 

 

upgrade

online upgrade

 

YES

 

 

 

 

 

 

 

offline upgrade

 

YES

 

 

 

 

 

 

 

rollback

 

YES

 

 

 

 

 

 

 

system monitor

online monitoring

 

YES

 

 

yes

 

 

 

 

offline monitoring

 

YES

 

 

yes

 

 

 

 

alarm

 

 

YES

 

 

yes

 

 

 

 

Log management

 

 

YES

 

 

 

 

 

 

 

web page

 

 

YES

 

 

 

 

 

 

 

pool

Pool

block storage pool

A user can create a block storage pool with a specified size or name.

 

 

 

yes

 

 

 

 

Pool expand

online

A block storage pool can be expanded without interrupting its operation (online case).

 

 

 

yes

 

 

 

 

cluster management

Node server

Add new node (online)

A new node server can be added to an existing cluster without interrupting the cluster's operation (online case).
The online case means that the storage system can still provide I/O service to the customer.
I/O fluctuation is allowed, but it must not exceed 35%.

 

Yes.

 

 

yes

 

 

 

 

delete a node (online)

A node server can be removed from an existing cluster without interrupting the cluster's operation (online case).
The online case means that the storage system can still provide I/O service to the customer.
I/O fluctuation is allowed, but it must not exceed 35%.

 

Manual drain and exclude. No impact on the workload (online).

 

 

 

 

 

 

SSD Disk

Add a new SSD (online)

A new NVMe SSD can be added to a node server without interrupting the cluster's operation (online case).
I/O fluctuation is allowed, but it must not exceed 20% in a 3-node cluster.

 

Under discussion.

High complexity.

 

yes

Large effort.

 

 

 

Replace an SSD (online)

An NVMe SSD in a node server can be replaced with a new one (online case).
I/O fluctuation is allowed, but it must not exceed 20% in a 3-node cluster.

 

Yes. (Bug fix needed from H3C.)

 

 

 

 

 

 

Remove an existing SSD (online)

A running or failed NVMe SSD can be removed from a node server (online case).
I/O fluctuation is allowed, but it must not exceed 20% in a 3-node cluster.

 

yes.

 

 

 

 

 

 

 

PMEM Disk

Add a new PMEM (offline)

A new PMEM module can be added to a node server without interrupting the cluster's operation (Online case)

PMEM capacity expansion: for example, the initial configuration is 256 GB * 12, and it is later expanded to 512 GB * 12 because new NVMe disks are added or because caching more data in PMEM would improve performance.
I/O fluctuation is allowed, but it must not exceed 20% in a 3-node cluster.

 

2023. Can be moved earlier.

Manually dump the PMEM metadata, replace the PMEM, and reboot. Then reformat the PMEM and recover the data.

 

 

 

 

 

 

 

Replace a PMEM (online); low priority because of PMEM's long life cycle

A PMEM module in a node server can be replaced with a new one (online case).
I/O fluctuation is allowed, but it must not exceed 20% in a 3-node cluster.

 

Low.

Low priority due to the long-life assurance of PMEM.

 

 

 

 

 

 

Remove an existing PMEM (online); invalid requirement

A running or failed PMEM module can be removed from a node server (online case).
I/O fluctuation is allowed, but it must not exceed 20% in a 3-node cluster.

 

 

Invalid

 

 

 

 

 

cluster enhancement

Network

Network exception robustness

The cluster keeps providing I/O service when a random network failure occurs.
Description: in a 3-node cluster, the random network failure test affects the network of only 1 node.

 

By design. Needs testing and bug fixes.

 

yes

 

 

 

 

NIC exception robustness

The cluster keeps providing I/O service when the NIC of a random node fails.
Description: in a 3-node cluster, the random NIC failure test affects the NIC of only 1 node.

 

No bonding support so far. No plan.

 

yes

 

 

 

 

SSD Disk Exception

 

SSD disk exception is a test-driven case.

 

Framework ready. Needs further improvement.

 

 

 

 

 

 

Node Exception

 

Node exception is a test-driven case.

 

Partial support. Needs node-level telemetry and monitoring. Graphinna**

 

yes.

 

 

 

 

Disk Usage Optimization

PMEM space optimization

Currently, the PMEM capacity is divided by the number of NVMe SSDs, which wastes a large amount of PMEM space.
The PMEM can run out of space while the NVMe SSDs still have plenty of usable space.

 

Compatibility issue. No plan. Needs further discussion.

 

yes

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Data Rebuild

4 TB/h

The data rebuild speed should exceed 4 TB/h in a 3-node cluster where each node has 8 NVMe SSDs.
In the simple case, concurrent I/O service is not taken into account.

 

QoS not ready snf interface. Needs H3C testing and performance improvement.

 

 

 

 

 

 

Node reboot 

IO recovery time

The time, in seconds, that the I/O service takes to recover to 100% of its pre-reboot level after a node reboots.
The I/O recovery time should be within 300 seconds in a 3-node cluster.

 

TBD by H3C.

 

 

 

 

 

 
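The I/O fluctuation limits stated in the table (35% for adding or removing a node, 20% for SSD and PMEM operations in a 3-node cluster) could be checked against measured samples along the following lines. This is only a minimal sketch: the function names, the use of IOPS as the metric, and the definition of fluctuation as relative deviation from a pre-operation baseline are assumptions, not something the table defines.

```python
from typing import Iterable


def fluctuation(baseline_iops: float, observed_iops: float) -> float:
    """Relative deviation of an observed sample from the baseline (0.2 == 20%).

    Assumption: "fluctuation" means |observed - baseline| / baseline.
    """
    if baseline_iops <= 0:
        raise ValueError("baseline_iops must be positive")
    return abs(observed_iops - baseline_iops) / baseline_iops


def within_limit(baseline_iops: float,
                 samples: Iterable[float],
                 limit: float) -> bool:
    """Return True if every sample deviates from the baseline by at most `limit`."""
    return all(fluctuation(baseline_iops, s) <= limit for s in samples)


# Hypothetical samples taken while a node is being added (limit 35%)
# and while an SSD is being replaced (limit 20%).
node_add_ok = within_limit(1000.0, [900.0, 1100.0, 700.0], 0.35)
ssd_replace_ok = within_limit(1000.0, [900.0, 1100.0, 700.0], 0.20)
```

With these hypothetical samples, the deepest dip is 30% below the baseline, so the run passes the 35% node limit but fails the 20% SSD limit.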

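To give a feel for the data rebuild target in the table (more than 4 TB/h in a 3-node cluster with 8 NVMe SSDs per node), the arithmetic can be sketched as below. The sketch assumes decimal terabytes (1 TB = 10^12 bytes) and an even spread of rebuild traffic across all 24 SSDs. Neither assumption is stated in the table, and the helper names are illustrative.

```python
TB = 10**12  # assumption: decimal terabytes


def rebuild_rate_bytes_per_s(tb_per_hour: float) -> float:
    """Convert a rebuild speed in TB/h to bytes per second."""
    return tb_per_hour * TB / 3600


def per_ssd_rate_bytes_per_s(tb_per_hour: float,
                             nodes: int,
                             ssds_per_node: int) -> float:
    """Per-SSD share of the rebuild rate, assuming an even spread over all SSDs."""
    return rebuild_rate_bytes_per_s(tb_per_hour) / (nodes * ssds_per_node)


def rebuild_hours(data_tb: float, tb_per_hour: float) -> float:
    """Hours needed to rebuild `data_tb` terabytes at the given speed."""
    return data_tb / tb_per_hour


aggregate_gb_s = rebuild_rate_bytes_per_s(4) / 1e9     # about 1.11 GB/s
per_ssd_mb_s = per_ssd_rate_bytes_per_s(4, 3, 8) / 1e6  # about 46.3 MB/s
hours_for_8tb = rebuild_hours(8, 4)                    # 2.0 hours
```

Under these assumptions, the 4 TB/h target corresponds to roughly 1.11 GB/s across the cluster, or about 46 MB/s per SSD.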