Now DAOS is a much larger established codebase it’s become apparent that the original logging macros are insufficient, and inconsistently used across the code, leading to poor user experience and difficulty in reporting or diagnosing issues.

The most commonly used debug levels are:

The coding standards have always said that error logs should include the value of rc, and that it should be at the end of the line however this has been inconsistency applied. The addition of DF_RC to include errors strings as well as numbers was welcome but didn’t address the inconsistency issue, and means that the error messages take up more space in the source for largely boiler-plate text, in addition only some areas of the codebase have adopted them.

Some effort has been made to check for consistency in error output and looking for a standard format, however the depends on having tests that provoke error paths which is time consuming (although productive) and the fixes have involved adding boiler-plate text to the source - improving the quality of error reports for users, but making the code itself larger and harder to read.

The new error macros resolve this issue by taking the rc value as an option to the macro itself, this allows the boiler-plate text to go in the macro and the have it apply the formatting - other printf formatting can still be used. No whitespace, formatting or newlines should be included at the end of the new macros, this will be added automatically.

Preferred:

		DL_ERROR(rc, "failed to create handle hash table");
		DL_ERROR(rc, "Failed to replay %s", act_opc2str(act->ac_opc));

Old macros:

		D_ERROR("failed to create handle hash table: "DF_RC"\n",
			DP_RC(rc));
		D_ERROR("Failed to replay %s, "DF_RC"\n", act_opc2str(act->ac_opc), DP_RC(rc));

New macros, ERROR is shown here but they are all repeated for INFO and WARN.

The system S variants should be used when a system error number is used, typically in the posix code or when interacting with the base OS.

The handle H variants should be used when there is an associated descriptor with which the error is associated, for example a rpc descriptor or hash table etc. For now these just include the pointer value in the logs after the function name although in the future we plan to be able to do more with this. We have the potential to auto-format messages based on the descriptor itself, for example include uuids/ranks etc.

        DHS_INFO(dfuse_info, rc, "duns_resolve_path() on path");

Continuous integration checks.

As the new macros do not tolerate a newline at the end of the string but the old ones did (one will be added automatically if missing) and standard practice is to install one then there’s an obvious danger that developers will include one incorrectly, as such there is a new CI check to detect this and warn about them.

Logging without RC value

For cases where a RC value is not available there is no changes as a result of the new macros.

Further work

There are other macros used throughout the codebase for other reasons, the DFUSE macros are largely redundant as a result of these changes RPC_ERROR is similar to DL_ERROR however has some additional features which are not present yet, VOS_TX_LOG_FAIL also has some unique features which could be useful elsewhere.

Once a proportion of the code are using the new macros we can add metrics for error counts by error, and feed this into telemetry and test results etc. For areas of concern we could choose to proactively migrate code over to make use of this facility.

The use of the _Generic C macro tools mean that data should allow us to log data from and provided descriptor which could further reduce boilerplate code in the source.

Conclusion

Overall migrating to the new macros should improve both the user experience as well as making debugging and working with the codebase easier, the intention is that they should become the go-to macros that developers use when working on code. In time we should extend the tools to require their use for modified code in pull requests and look to making wholesale changes where appropiate.