Regarding the RAS events, the intent is to provide a table in the online documentation similar to what GPFS does (see https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1pdg_rasevents_gpfs.htm).

The format of the DAOS RAS event will be as follows:

| Field | Optional/Mandatory | Description |
|---|---|---|
| ID | Mandatory | Unique event identifier referenced in the manual. 64-char string. |
| Type | Mandatory | Event type, e.g. STATE_CHANGE or INFO_ONLY. |
| Timestamp | Mandatory | Fully qualified timestamp associated with the event, with microsecond resolution and the timezone offset included to avoid locality issues. |
| Severity | Mandatory | Fatal/Warning/Error/Info |
| Msg | Mandatory | Human-readable message. |
| HID | Optional | Identifies the hardware component involved in the event, e.g. PCI address for SSD, network interface, … |
| Rank | Optional | DAOS rank involved in the event. |
| PID | Optional | Identifier of the process involved in the RAS event. |
| TID | Optional | Identifier of the thread involved in the RAS event. |
| JOBID | Optional | Identifier of the job involved in the RAS event. |
| Hostname | Optional | Hostname of the node involved in the event. |
| PUUID | Optional | Pool UUID involved in the event, if any. |
| CUUID | Optional | Container UUID involved in the event, if relevant. |
| OID | Optional | Object identifier involved in the event, if relevant. |
| Control Operation | Optional | Recommended automatic action, if any. |
| Data | Optional | Specific instance data treated as a blob. |
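
As an illustration of the format above, the following is a minimal Python sketch of an event record that enforces the mandatory fields. It is not the DAOS implementation: the class name `RASEvent`, the helper `ras_timestamp`, the lower-case attribute names and the example event ID are all hypothetical. It also assumes that "64-char string" means at most 64 characters and that the four listed severities form a closed set; the event type is only checked for being non-empty because the types above are given as examples.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Assumption: the severities listed in the table form a closed set.
SEVERITIES = ("Fatal", "Warning", "Error", "Info")
# Assumption: "64-char string" is read as an upper bound on the ID length.
MAX_ID_LEN = 64


def ras_timestamp(when: Optional[datetime] = None) -> str:
    """Format a timestamp with microsecond resolution and a timezone offset."""
    if when is None:
        when = datetime.now(timezone.utc)
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return when.isoformat(timespec="microseconds")


@dataclass
class RASEvent:
    # Mandatory fields.
    id: str
    type: str
    severity: str
    msg: str
    timestamp: str = field(default_factory=ras_timestamp)
    # Optional fields, left as None when not relevant to the event.
    hid: Optional[str] = None
    rank: Optional[int] = None
    pid: Optional[int] = None
    tid: Optional[int] = None
    jobid: Optional[str] = None
    hostname: Optional[str] = None
    puuid: Optional[str] = None
    cuuid: Optional[str] = None
    oid: Optional[str] = None
    control_operation: Optional[str] = None
    data: Optional[bytes] = None

    def __post_init__(self) -> None:
        # Reject events that are missing or violate a mandatory field.
        if not self.id or len(self.id) > MAX_ID_LEN:
            raise ValueError("id must be 1 to 64 characters long")
        if not self.type:
            raise ValueError("type is mandatory")
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")
        if not self.msg:
            raise ValueError("msg is mandatory")
```

For example, `RASEvent(id="pool_degraded", type="STATE_CHANGE", severity="Warning", msg="pool is degraded", rank=3)` would produce an event stamped with the current UTC time, while an unknown severity or an ID longer than 64 characters raises `ValueError`.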

 

RAS events include:

A plugin interface will be available to allow emitting RAS events via different channels (e.g. syslog, ...). Events will also be logged into the io/control logs to ease troubleshooting after the fact by placing the errors in line with any other messages that are emitted.
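
One possible shape for such a plugin interface is sketched below in Python, purely as an illustration. The names `RASChannel`, `LoggerChannel`, `MemoryChannel`, `RASDispatcher` and `format_event` are hypothetical and not part of DAOS, and the mapping from severities to logging levels is an assumption. The sketch always writes events to a logger (standing in for the io/control logs) and fans them out to any additionally registered channels; a syslog channel could, for instance, be built on a logger configured with the standard library's `logging.handlers.SysLogHandler`.

```python
import logging
from abc import ABC, abstractmethod
from typing import Dict, List

Event = Dict[str, str]

# Assumed mapping from RAS severities to standard logging levels.
SEVERITY_TO_LEVEL = {
    "Fatal": logging.CRITICAL,
    "Error": logging.ERROR,
    "Warning": logging.WARNING,
    "Info": logging.INFO,
}


def format_event(event: Event) -> str:
    """Render an event as a single 'key=value' line."""
    return " ".join(f"{key}={value!r}" for key, value in event.items())


class RASChannel(ABC):
    """Plugin interface: each channel delivers an event to one destination."""

    @abstractmethod
    def emit(self, event: Event) -> None:
        ...


class LoggerChannel(RASChannel):
    """Writes events in line with the other messages of a logger."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger

    def emit(self, event: Event) -> None:
        level = SEVERITY_TO_LEVEL.get(event.get("severity", ""), logging.INFO)
        self.logger.log(level, "RAS %s", format_event(event))


class MemoryChannel(RASChannel):
    """Keeps events in memory; handy for tests."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def emit(self, event: Event) -> None:
        self.events.append(event)


class RASDispatcher:
    """Sends each event to the log channel and to every registered plugin."""

    def __init__(self, log_channel: RASChannel) -> None:
        self._channels: List[RASChannel] = [log_channel]

    def register(self, channel: RASChannel) -> None:
        self._channels.append(channel)

    def emit(self, event: Event) -> int:
        """Return the number of channels that accepted the event."""
        delivered = 0
        for channel in self._channels:
            try:
                channel.emit(event)
                delivered += 1
            except Exception:
                # A failing channel must not prevent delivery to the others.
                logging.getLogger(__name__).exception(
                    "RAS channel %s failed", type(channel).__name__
                )
        return delivered
```

Isolating each channel behind `try`/`except` is a design choice made for this sketch: an unreachable syslog daemon should not stop the event from reaching the local logs.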