Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Now daos DAOS is a much larger established codebase it’s become apparent that the origional original logging macros are insufficient, and inconsistently used across the code, leading to poor user experience and difficulty in reporting or diagnosing issues.

...

Code Block
languagec
		D_ERROR("failed to create handle hash table: "DF_RC"\n",
			DP_RC(rc));
		D_ERROR("Failed to replay %s, "DF_RC"\n", act_opc2str(act->ac_opc), DP_RC(rc));

New macros, ERROR is shown here but they are all repeated for INFO and WARN.

  • DL_ERROR: Logs with an RC as enum daos_errno value

  • DHL_ERROR: Logs with an RC as enum daos_error value, and associated pointer/descriptor.

  • DS_ERROR: Logs with an RC as errno value

  • DHS_ERROR: Logs with an RC errno value and associated pointer/descriptor.

The system S variants should be used when a system error number is used, typically in the posix code or when interacting with the base OS.

The handle H variants should be used when there is an associated descriptor with which the error is associated, for example a rpc descriptor or hash table etc. For now these just include the pointer value in the logs after the function name although in the future we plan to be able to do more with this. We have the potential to auto-format messages based on the descriptor itself, for example include uuids/ranks etc.

Code Block
languagec
        DHS_INFO(dfuse_info, rc, "duns_resolve_path() on path");

Continuous integration checks.

As the new macros do not tolerate a newline at the end of the string but the old ones did (one will be added automatically if missing) and standard practice is to install one then there’s an obvious danger that developers will include one incorrectly, as such there is a new CI check to detect this and warn about them.

Logging without RC value

For cases where a RC value is not available there is no changes as a result of the new macros.

Further work

There are other macros used throughout the codebase for other reasons, the DFUSE macros are largely redundant as a result of these changes RPC_ERROR is similar to DL_ERROR however has some additional features which are not present yet, VOS_TX_LOG_FAIL also has some unique features which could be useful elsewhere.

Once a proportion of the code are using the new macros we can add metrics for error counts by error, and feed this into telemetry and test results etc. For areas of concern we could choose to proactively migrate code over to make use of this facility.

The use of the _Generic C macro tools mean that data should allow us to log data from and provided descriptor which could further reduce boilerplate code in the source.

Conclusion

Overall migrating to the new macros should improve both the user experience as well as making debugging and working with the codebase easier, the intention is that they should become the go-to macros that developers use when working on code. In time we should extend the tools to require their use for modified code in pull requests and look to making wholesale changes where appropiate.