C Style Guide

Auto-formatting with clang-format and git-clang-format.

As of 2022 and clang versions 14.0.5 or newer then clang-format is able to format the code automatically, there is a .clang-format config file in the source that applies the correct configuration.  This will be automatically applied via a githook if available and configured or can be run manually.  Where established coding practices or what's in this document are inconsistent with clang-format then clang-format takes precedence to allow us to use the tools properly, although we maybe able to adjust configuration options to suit. VS Code has plugins to automatically format changes on save, to enable this select format on save and set format on save mode to modifications

Ways of accessing clang-format

  • A commit-hook with automatically apply correct formatting settings if the compiler version is new enough (EL 8.6 and newer).

  • GitHub actions will run a job indicating correct formatting and present a downloadable patch file if formatting is incorrect.

  • VS Code can be configured to apply formatting on save.

Basics

  1. Line length is 100, reflow code which goes beyond this.

  2. Tabs (8 characters) for indentation at start of line, spaces thereafter.

  3. Function declarations have a new-line after the return type.

  4. Use Doxygen annotations on larger or complex functions.

  5. Use allocation macros, and check the results.

  6. Blank lines after declaration blocks.

Functions & Variables

Functions that are more than a few lines long should be declared with a leading comment block that describes what the function does, any assumptions for the caller (locks already held, locks that should not be held, etc), and return values. DAOS uses the DOxygen markup style for formatting the code comments, as shown in the example here. \a argument can be used in the descriptive text to name a function argument. \param argument definition should be used to define each of the function parameters, and can be followed by [in][out], or [in,out] to indicate whether each parameter is used for input, output, or both. \retval should be used to define the function return values.

right:

/** * Connect to the DAOS pool identified by UUID \a uuid. Upon a successful * completion, \a poh returns the pool handle, and \a info returns the latest * pool information. * * \param[in] uuid UUID to identify a pool. * \param[in] grp Process set name of the DAOS servers managing the pool * \param[in] svc Pool service replica ranks, as reported by * daos_pool_create(). * \param[in] flags Connect mode represented by the DAOS_PC_ bits. * \param[out] poh Returned open handle. * \param[out] info Returned pool info. * \param[in] ev Completion event, it is optional and can be NULL. * The function will run in blocking mode if \a ev is NULL. * * \return These values will be returned by \a ev::ev_error in * non-blocking mode: * 0 Success * -DER_INVAL Invalid parameter * -DER_UNREACH Network is unreachable * -DER_NO_PERM Permission denied * -DER_NONEXIST Pool is nonexistent */ int daos_pool_connect(const uuid_t uuid, const char *grp, const d_rank_list_t *svc, unsigned int flags, daos_handle_t *poh, daos_pool_info_t *info, daos_event_t *ev);

wrong:

/* Connect */ int daos_pool_connect(const uuid_t uuid, const char *grp, const d_rank_list_t *svc, unsigned int flags, daos_handle_t *poh, daos_pool_info_t *info, daos_event_t *ev);

Don't use inline unless you're doing something so performance critical that the function call overhead will make a difference -- in other words - almost never. It makes debugging harder and overuse can actually hurt performance by causing instruction cache thrashing or crashes due to stack overflow.

Do not use typedef for new declarations, as this hides the type of a field for very little benefit. Never typedef pointers, as the * makes C pointer declarations obvious. Hiding it inside a typedef just obfuscates the code.

Do not embed assignments inside Boolean expressions. Although this can make the code one line shorter, it doesn't make it more understandable and you increase the risk of confusing =with == or getting operator precedence wrong if you skimp on brackets. It's even easier to make mistakes when reading the code, so it's much safer simply to avoid it altogether.

right:

ptr = malloc(size); if (ptr != NULL) {

wrong:

Conditional expressions read more clearly if only Boolean expressions are implicit (i.e., non-Boolean and pointer expressions compare explicitly with 0 and NULL respectively.)

right:

wrong:

Use parentheses to help readability and reduce the chance of operator precedence errors, but not so heavily that it is difficult to determine which parentheses are a matched pair or accidentally hide bugs (e.g. the aforementioned assignments inside Boolean expressions).

right:

wrong:

wrong:

wrong:

Code can be much more readable and efficient if the simple or common actions are taken first in a set of tests. Re-ordering conditions like this also eliminates excessive nesting and helps avoid overflowing the 100-column limit.

Do not place else blocks after terminal statements like returncontinuebreak, or goto in the if block.

right:

wrong:

Function bodies that have complex state to clean up in case of an error should do so only once at the end of the function with a single return, and use goto to jump there in case of an error:

right:

wrong:

Variables should be declared one per line, type and name, even if there are multiple variables of the same type. There should be one space between the variable type and the variable name and when multiple variables are declared, the names must be aligned with spaces. For maximum readability, longer or more important declarations should be at the top. There should always be an empty line after the variable declarations, before the start of the code. There shouldn't be complex variable initializers (e.g. function calls) in the declaration block, since this may be easily missed by the reader and make it confusing to debug.

right:

wrong:

Variable declarations should be kept to the most internal scope, if practical and reasonable, to simplify understanding of where these variables are accessed and modified, and to avoid errors of using/reusing variables that only have meaning if certain branches are used:

right:

wrong:

Even for short conditionals, the operation should be on a separate line:

right:

wrong:

When you wrap a line containing parenthesis, start the continued line after the parenthesis so that the expression or argument is visually bracketed.

right:

wrong:

If you're wrapping an expression, put the operator at the end of the line. If there are no parentheses to which to align the start of the next line, just indent one more tab stop.

Binary and ternary (but not unary) operators should be separated from their arguments by one space.

right:

Function calls should be nestled against the parentheses, the parentheses should crowd the arguments, and one space should appear after commas:

right:

wrong:

Put a space between if, for, while, etc. and the following parenthesis. Put a space after each semicolon in a for statement.

right:

wrong:

Opening braces should be on the same line as the line that introduces the block, except for function calls. Bare closing braces (i.e. not else or while in do/while) get their own line.

If one part of a compound if block has braces, all should.

right:

wrong:

External function declarations absolutely should *NEVER* go into .c files. The only exception is forward declarations for static functions in the same file that can't otherwise be moved before the caller. Instead, the declaration should go into the most "local" header available (e.g. subsystem_internal.h for a given subdirectory). Having external function declarations in .c files can cause very difficult to diagnose runtime bugs, because the compiler takes the local declaration as correct, can not compare it to the actual function declared in a different file, and does not have a declaration in a header to compare it against, but the linker does not check that the number and type of function arguments match.

Function declarations in header files should include the variable names for the parameters, so that they are self explanatory in the header without having to look at the code to see what the parameter is:

right:

wrong:

Structure fields should have a common prefix derived from the structure name, so that they are easily found in the code and tag-jump works properly. This avoids problems with generic field names like page or count that have dozens or hundreds of potential matches in the code.

Structure and constant declarations should not be declared in multiple places. Put the struct into the most "local" header possible.

Structure initialization should be done by field names instead of using positional arguments:

right:

wrong:

When allocating/freeing a single struct, use D_ALLOC_PTR() for clarity:

right:

wrong:

Macros

When you define a macro, protect callers by placing parentheses round every parameter reference in the body.

Line up the backslashes of multi-line macros at column 100 to help readability.

Use a do/while (0) block with no trailing semicolon to ensure multi-statement macros are syntactically equivalent to procedure calls.

right:

wrong:

If you write conditionally compiled code in a procedure body, make sure you do not create unbalanced braces, quotes, etc. This really confuses editors that navigate expressions or use fonts to highlight language features. It can often be much cleaner to put the conditionally compiled code in its own helper function which, by good choice of name, documents itself too.

right:

wrong:

If you nest preprocessor commands, use spaces to visually delineate:

For long or nested #ifdef blocks, include the conditional as a comment with each #else and #endif to make it clear which block is being terminated:

Comments

Single-line comments should have the leading /* and trailing */ on the same line as the comment. Multi-line comments should have the leading /* and trailing */ on their own lines, to match the upstream kernel coding style. Intermediate lines should start with a * aligned with the * on the first line:

Assertions

Do not embed operations inside assertions. If assertions are disabled for performance reasons this code will not be executed.

right:

wrong:

Assertions should be used carefully only for detecting internal consistent state and never for input data from the user, data from the network or for legitimate error condition (e.g. allocation failure). Don't combine multiple sanity checks in the same ASSERT() since it is difficult to find out which one failed.

right:

wrong:

Error Messages

Messages on the console (D_ERROR()D_WARN()) and D_INFO() should be written so they provide useful information to the *administrator* and/or support person in case of a *significant event* or *failure condition*. They should not print "debugging" information in cases that might be hit under normal usage or user-generated race conditions, since verbose console error messages lose the important messages in a flood of noise. Consider that there may be thousands of servers and tens of thousands of clients hitting some failure at the same time.

Print the pool or container UUID at the start of the message, since there are usually multiple targets running on a single server or multiple mountpoints on a client. The error messages should also include enough information to make some useful diagnosis of the problem (e.g. FID and/or filename, client NID, actual values that caused failures, etc.). Otherwise, there is little value in having printed an error, but then needing to reproduce the problem to diagnose it:

right:

wrong:

Error messages that are able to include a numeric error value should use the appropriate macros, and contain no punctuation at the end of the line

right:

wrong: