Autoformatting with clang-format and git-clang-format.

As of 2022 and clang versions 14.0.5 or newer then clang-format is able to format the code automatically, there is a .clang-format config file in the source that applies the correct configuration. This will be automatically applied via a githook if available and configured or can be run manually. Where established coding practices or what's in this document are inconsistent with clang-format then clang-format takes precedence to allow us to use the tools properly, although we maybe able to adjust configuration options to suit.

Currently only fedora ships with a new enough clang for this to work automatically although pre-compiled versions are available on the wolf cluster.

Indentation

Tabs should be used in all C files. Tab width should be set at 8 characters in your IDE.

Tabs are preferred to align variable names in structure and enum declarations. This may temporarily misalign them with other variable names in the structure that are using spaces for indentation, or consider fixing up the whole struct in the same patch if there aren't too many of them. In some cases (long variable names, long comments, nested unions) then it isn't practical to use full tabs to align the fields and spaces can be used for partial indentation. Moreover, the returned type and the function name should be on two different lines.

^right:

...

Code Block

language	c

void
func_helper(...)

...



        struct foobar_counter  *foo;
        unsigned long           last;
        int                     rc = 0;

^wrong:

...

Code Block

language	c

void func(...)
{
        int rc;
        struct foobar_counter *foo;
        unsigned long last;

All lines should wrap at 100 characters. This is to avoid the need to have extra-wide terminal windows for the few lines of code that exceed 100 columns, and avoids nesting functions and conditionals too deeply. If it's getting too hard to wrap at 100 characters, you probably need to rearrange conditional order or break it up into more functions.

^right:

...

Code Block

language	c

void
func_helper(...)

...



        struct foobar_counter  *foo;
        unsigned long           last;
        int                     rc = 0;
 
        do_sth2_1;
 
        if (cond3)
                do_sth_which_needs_a_very_long_line(and, lots, of, arguments);
 
        do_sth2_2;
}
 
void

...


func(...)

...



        long   first;
        int    i;
 
        if (!cond1)
                return;
 
        do_sth1_1;
 
        if (cond 2)
                func_helper(...)
 
        do_sth1_2;
}

...

^wrong:

...

Code Block

language	c

void
func(...)
{
        int                    rc;
        struct foobar_counter *foo;
        unsigned long          last;
 
        if (cond1) {
                do_sth1_1;
                if (cond2) {
                        do_sth2_1;
                        if (cond3) {
                                do_sth_which_needs_a_very_long(and, lots, of, arguments);
                        }
                        do_sth2_2;
                }
                do_sth1_2;
        }
}

Do not include spaces or tabs on blank lines or at the end of lines. Please ensure you remove all instances of these in any patches you submit to GitHub. You can find them with grep or in vim using the following regexps /\[ \t\]$/ or use git git diff --check. Alternatively, if you use vim, you can put this line in your .vimrc file, which will highlight whitespace at the end of lines and spaces followed by tabs in indentation (only works for C/C++ files) let c_space_errors=1. Or you can use this command, which will make tabs and whitespace at the end of lines visible for all files (but a bit more discretely) set list listchars=tab:>\ ,trail:$. In emacs, you can use (whitespace-mode) or (whitespace-visual-mode)depending on the version. You could also consider using (flyspell-prog-mode).

Functions & Variables

Functions that are more than a few lines long should be declared with a leading comment block that describes what the function does, any assumptions for the caller (locks already held, locks that should not be held, etc), and return values. DAOS uses the DOxygen markup style for formatting the code comments, as shown in the example here. \a argument can be used in the descriptive text to name a function argument. \param argument definition should be used to define each of the function parameters, and can be followed by [in], [out], or [in,out] to indicate whether each parameter is used for input, output, or both. \retval should be used to define the function return values.

^right:

Code Block

language	c

/**

...


 * Connect to the DAOS pool identified by UUID \a uuid. Upon a successful

...


 * completion, \a poh returns the pool handle, and \a info returns the latest

...


 * pool information.

...


 * \param[in]   uuid  UUID to identify a pool.

...


 * \param[in]   grp   Process set name of the DAOS servers managing the pool

...


 * \param[in]   svc   Pool service replica ranks, as reported by

...


 *                    daos_pool_create().

...


 * \param[in]   flags Connect mode represented by the DAOS_PC_ bits.

...


 * \param[out]  poh   Returned open handle.

...


 * \param[out]  info  Returned pool info.

...


 * \param[in]   ev    Completion event, it is optional and can be NULL.

...


 *                    The function will run in blocking mode if \a ev is NULL.

...


 * \return            These values will be returned by \a ev::ev_error in

...


 *                    non-blocking mode:

...


 *                    0             Success

...


 *                    -DER_INVAL    Invalid parameter

...


 *                    -DER_UNREACH  Network is unreachable

...


 *                    -DER_NO_PERM  Permission denied

...


 *                    -DER_NONEXIST Pool is nonexistent

...

*/

...

int

...


daos_pool_connect(const uuid_t uuid, const char *grp,

...


                  const d_rank_list_t *svc, unsigned int flags,

...


                  daos_handle_t *poh, daos_pool_info_t *info, daos_event_t *ev);

^wrong:

Code Block

language	c

/* Connect */

...

int

...


daos_pool_connect(const uuid_t uuid, const char *grp,

...


                  const d_rank_list_t *svc, unsigned int flags,

...


                  daos_handle_t *poh, daos_pool_info_t *info, daos_event_t *ev);

Don't use inline unless you're doing something so performance critical that the function call overhead will make a difference -- in other words - almost never. It makes debugging harder and overuse can actually hurt performance by causing instruction cache thrashing or crashes due to stack overflow.

...

Do not embed assignments inside Boolean expressions. Although this can make the code one line shorter, it doesn't make it more understandable and you increase the risk of confusing =with == or getting operator precedence wrong if you skimp on brackets. It's even easier to make mistakes when reading the code, so it's much safer simply to avoid it altogether.

^right:

...

Code Block

language	c

ptr = malloc(size);
        if (ptr != NULL) {

...

^wrong:

Code Block

language	c

if ((ptr = malloc(size)) != NULL) {

...

Conditional expressions read more clearly if only Boolean expressions are implicit (i.e., non-Boolean and pointer expressions compare explicitly with 0 and NULL respectively.)

^right:

...

Code Block

language	c

void *pointer = NULL;

...

bool  writing = false;

...

int   ref_count = 0;

...

if (!writing &&     /* not writing? */

...

pointer != NULL && /* valid pointer? */

...


        reval == 0)     /* no error is returned? */

...


        do_this();

...

^wrong:

Code Block

language	c

if (writing == 0 && /* not writing? */
    pointer &&

...

/* valid pointer or not? */

...

!retval)        /* return value is good or bad? */

...

not_clear_if_we_do_this_or_not();

...

Use parentheses to help readability and reduce the chance of operator precedence errors, but not so heavily that it is difficult to determine which parentheses are a matched pair or accidentally hide bugs (e.g. the aforementioned assignments inside Boolean expressions).

^right:

...

Code Block

language	c

if (a->a_field == 3 &&

...

((bar & BAR_MASK_1) || (bar & BAR_MASK_2)))

...

do this();

...

^wrong:

Code Block

language	c

if (((a == 1) && (b == 2) && ((c == 3) && (d = 4))))

...

maybe_do_this()

...

^wrong:

...

Code Block

language	c

if (a->a_field == 3 || b->b_field & BITMASK1 && c->c_field & BITMASK2)

...

maybe_do_this()

...

^wrong:

Code Block

language	c

if ((((a->a_field) == 3) || (((b->b_field) & (BITMASK1)) &&

...

((c->c_field) & (BITMASK2)))))

...

maybe_do_this()

...

Code can be much more readable and efficient if the simple or common actions are taken first in a set of tests. Re-ordering conditions like this also eliminates excessive nesting and helps avoid overflowing the 100-column limit.

Do not place else blocks after terminal statements like return, continue, break, or goto in the if block.

^right:

...

Code Block

language	c

list_for_each_entry(...) {

...


        if (!condition1) {

...

do_sth1;
                continue;

...

do_sth2_1;

...

     do_sth2_2;
        ...
        do_sth2_

...

N;
 
        if (!condition2)

...

            break;

...

do_sth3_1;
        do_

...

sth3_

...

2;
        if (!condition3)

...

       break;
        ...

...

do_sth3_N;
}

^wrong:

Code Block

language	c

list_for_each_entry(...) {
        if (condition1)

...

do_sth2_1;

...

do_sth2_2;
                ...

...

        do_sth2_N;

...

        if (condition2) {

...

do_

...

sth3_1;

...

     do_sth3_2;

...

     if (condition3) {

...

...

...

do_

...

sth3_N;

...

      continue;
                        }

...

break;
        } else {
                do_sth1;

...

}
}

Function bodies that have complex state to clean up in case of an error should do so only once at the end of the function with a single return, and use goto to jump there in case of an error:

^right:

Code Block

language	c

static

...

int
bar_init(...)
{
        foo = alloc_foo();
        if (foo == NULL)

...

       return -DER_NOMEM;
        rc = setup_foo(foo);
        if (rc)

...

     goto free_foo;
        rc = init_bar(bar);
        if (rc)

...

   goto cleanup_foo;

        return 0;
 
cleanup_foo:
        cleanup_foo(foo);
free_foo:

...

    free_foo(foo);
 
        return rc;
}

^wrong:

Code Block

language	c

static int
bad_func(...)
{

...

   foo = alloc_foo();
        if (foo == NULL)

...

       return -DER_NOMEM;

...

rc = setup_foo(foo);
        if (rc != 0) {

...

free_foo(foo);

...

return rc;
        }

...

Function bodies that have complex state to clean up in case of an error should do so only once at the end of the function with a single return, and use goto to jump there in case of an error:

...

       rc = init_bar(bar);
        if

...

(rc)

...

      cleanup_foo(foo);

...

free_foo(foo);

...

return rc;

...

return

...

Variable should be declared one per line, type and name, even if there are multiple variables of the same type. There should be one space between the variable type and the variable name and when multiple variables are declared, the names must be aligned with tabs. For maximum readability, longer or more important declarations should be at the top. There should always be an empty line after the variable declarations, before the start of the code. There shouldn't be complex variable initializers (e.g. function calls) in the declaration block, since this may be easily missed by the reader and make it confusing to debug.

right:

...

 
wrong:
        int rc, count;
        struct vos_container *cont;
        cont = container_of(ulink, struct vos_container, vc_uhlink); 
        struct d_uuid         *pkey;

Variable declarations should be kept to the most internal scope, if practical and reasonable, to simplify understanding of where these variables are accessed and modified, and to avoid errors of using/reusing variables that only have meaning if certain branches are used:

...

rc;
}

Variables should be declared one per line, type and name, even if there are multiple variables of the same type. There should be one space between the variable type and the variable name and when multiple variables are declared, the names must be aligned with tabs. For maximum readability, longer or more important declarations should be at the top. There should always be an empty line after the variable declarations, before the start of the code. There shouldn't be complex variable initializers (e.g. function calls) in the declaration block, since this may be easily missed by the reader and make it confusing to debug.

^right:

Code Block

language	c

struct vos_container  *cont;
struct d_uuid         *pkey;
int

...

            count;
int

...

rc;

...

cont

...

= container_of(ulink, struct vos_container, vc_uhlink);

^wrong:

Code Block

language	c

int rc, count;
struct vos_container *cont;
cont = container_of(ulink, struct vos_container, vc_uhlink); 
struct d_uuid

...

 *pkey;

Variable declarations should be kept to the most internal scope, if practical and reasonable, to simplify understanding of where these variables are accessed and modified, and to avoid errors of using/reusing variables that only have meaning if certain branches are used:

^right:

Code Block

language	c

int len;

if (len > 0) {

...

  struct obj  *my_obj;

...

int

...

count;

...

Even for short conditionals, the operation should be on a separate line:

...

        my_obj

...

= oget(foo);
 
        count

...

= my_obj->size;

...

if

...

When you wrap a line containing parenthesis, start the continued line after the parenthesis so that the expression or argument is visually bracketed.

...

(count > 32) {

...

        int left = count;
                ...
        }
}

^wrong:

Code Block

language	c

int len = path_length(bar), count, ret;
struct obj

...

*my_obj = oget(foo);
 
if (len > 0) {
        count = my_obj->size;
        if (count > 42)
                ret = 0;
        } else if (len == 0) {

...

          ret = -ENODATA;
        } else {
                D_ERROR("count is bad:

...

%d\n", count);
        }

...

return ret;

Even for short conditionals, the operation should be on a separate line:

^right:

Code Block

language	c

if (foo)
        bar();

^wrong:

Code Block

language	c

if (foo) bar();

When you wrap a line containing parenthesis, start the continued line after the parenthesis so that the expression or argument is visually bracketed.

^right:

Code Block

language	c

variable =

...

do_

...

something_

...

complicated(

...

long_argument, longer_

...

argument,

...

 longest_argument(sub_argument,

...

foo_argument),

...

                 last_argument);

...

if (some_long_condition(arg1, arg2, arg3) < some

...

_long_value &&

...

another_long_condition(very_long_argument_name

...

         another_long_argument_name) >

...

If you're wrapping an expression, put the operator at the end of the line. If there are no parentheses to which to align the start of the next line, just indent one more tab stop.

second_long_value) {

...

do_something();

^wrong:

Code Block

language	c

variable =

...

do_

...

something_

...

complicated(

...

long_argument, longer_argument,

...

longest_

...

Binary and ternary (but not unary) operators should be separated from their arguments by one space.

...

argument(sub_argument, foo_argument),

...

last_argument);

...

Function calls should be nestled against the parentheses, the parentheses should crowd the arguments, and one space should appear after commas:

...


if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
    another_long_condition(very_long_argument_name,
    another_long_argument_name) >

...

second_long_value) {

...

do_

...

something();

...

Put a space between if, for, while, etc. and the following parenthesis. Put a space after each semicolon in a for statement.

...

If you're wrapping an expression, put the operator at the end of the line. If there are no parentheses to which to align the start of the next line, just indent one more tab stop.

Code Block

language	c

off = le32_to_cpu(fsd->fsd_client_start) +

...

Opening braces should be on the same line as the line that introduces the block, except for function calls. Bare closing braces (i.e. not else or while in do/while) get their own line.

...

cl_idx * le16_to_cpu(fsd->fsd_client_size);

Binary and ternary (but not unary) operators should be separated from their arguments by one space.

^right:

Code Block

language	c

a++;
b |= c;
d = (f > g) ? 0 : 1;

Function calls should be nestled against the parentheses, the parentheses should crowd the arguments, and one space should appear after commas:

^right:

Code Block
do_foo(bar, baz);

^wrong:

Code Block
do_foo ( bar,baz );

Put a space between if, for, while, etc. and the following parenthesis. Put a space after each semicolon in a for statement.

^right:

Code Block

language	c

for (a = 0; a < b; a++)

if (a < b || a == c)

while (1)

^wrong:

Code Block

language	c

for( a=0; a<b; a++ )

if( a<b || a==c )

while( 1 )

Opening braces should be on the same line as the line that introduces the block, except for function calls. Bare closing braces (i.e. not else or while in do/while) get their own line.

Code Block

language	c

int foo(void)
{

...

if

...

(bar) {

...

      this();

...

              that();

...

       } else if (baz) {

...

 stuff(

...

);

...

If one part of a compound if block has braces, all should.

...

else {

...

other_stuff();
        }

...

do

...

cow();
        } while

...

(condition);
}

If one part of a compound if block has braces, all should.

^right:

Code Block

language	c

if (foo) {
        bar();

...

baz();
} else {
        salmon();
}

^wrong:

Code Block

language	c

if (foo) {

...

       bar();

...

      baz();
} else
        moose();

External function declarations absolutely should *NEVER* go into .c files. The only exception is forward declarations for static functions in the same file that can't otherwise be moved before the caller. Instead, the declaration should go into the most "local" header available (e.g. subsystem_internal.h for a given subdirectory). Having external function declarations in .c files can cause very difficult to diagnose runtime bugs, because the compiler takes the local declaration as correct, can not compare it to the actual function declared in a different file, and does not have a declaration in a header to compare it against, but the linker does not check that the number and type of function arguments match.

Function declarations in header files should include the variable names for the parameters, so that they are self explanatory in the header without having to look at the code to see what the parameter is:

^right:

...

Code Block

language	c

void ldlm_lock_addref_internal(struct ldlm_lock *lock, enum ldlm_lock_mode lock_mode, int refcount, int rc);

...

^wrong:

...

Code Block

language	c

void ldlm_lock_addref_internal(struct ldlm_lock *, int, int, int)

...

Structure fields should have a common prefix derived from the structure name, so that they are easily found in the code and tag-jump works properly. This avoids problems with generic field names like page or count that have dozens or hundreds of potential matches in the code.

Structure and constant declarations should not be declared in multiple places. Put the struct into the most "local" header possible.

Structure initialization should be done by field names instead of using positional arguments:

Code Block

language	c

struct foo {
        int     foo_val;
        int     foo_flag;
        int     foo_count;
        void   *foo_ptr;
        void   *foo_bar_ptr;
        void   *foo_baz_ptr;
};

...

^right:

Code Block

language	c

void func(void *data)
{
        struct foo fooz = { .foo_val = 1, .foo_flag = 0x20, .foo_ptr = data, .foo_baz_ptr = param };
}

...

^wrong:

...

Code Block
void func(void data) { / not sure which field is being initialized, breaks (maybe silently) if structure adds a new field */ struct foo fooz = { 1, 0x20, data, param }; }

...

When allocating/freeing a single struct, use D_ALLOC_PTR() for clarity:

^right:

...

Code Block

language	c

D_ALLOC_PTR(mds_body);

...

D_FREE_PTR(mds_body);

...

^wrong:

Code Block

language	c

D_ALLOC(mds_body, sizeof(*mds_body));

...

D_FREE(mds_body, sizeof(*mds_body));

Macros

When you define a macro, protect callers by placing parentheses round every parameter reference in the body.

...

Use a do/while (0) block with no trailing semicolon to ensure multi-statement macros are syntactically equivalent to procedure calls.

^right:

...

Code Block

language	c

#define DO_STUFF(a)                                     \
do {                                                    \
        int b = (a) + MAGIC;                            \
        do_other_stuff(b);                              \
} while (0)

...

^wrong:

...

Code Block

language	c

#define DO_STUFF(a) \
do { \
        int b = a + MAGIC; \
        do_other_stuff(b); \
} while (0);

...

If you write conditionally compiled code in a procedure body, make sure you do not create unbalanced braces, quotes, etc. This really confuses editors that navigate expressions or use fonts to highlight language features. It can often be much cleaner to put the conditionally compiled code in its own helper function which, by good choice of name, documents itself too.

^right:

...

Code Block

language	c

int do_stuff(struct object *obj)
{
        if (invalid_object(obj)) {
                ...

...

^wrong:

Code Block

language	c

int do_stuff(struct object *obj
#ifdef HAVE_FOO_FEATURE
             , struct foo_extra *foo)
#else
             )
#endif
{
#ifdef HAVE_FOO_FEATURE

...



        do_other_stuff_with_foo(obj, foo);

...



#else

...



        do_other_stuff(obj, foo);

...



#endif

...

...

...

If you nest preprocessor commands, use spaces to visually delineate:

Code Block

language	c

#ifdef __USE_STEAK__
# include <goose>
# define MOOSE steak
#else /* !__USE_STEAK__ \*/
# include <mutton>
# define MOOSE prancing
#endif

...

For long or nested #ifdef blocks, include the conditional as a comment with each #else and #endif to make it clear which block is being terminated:

Code Block

language	c

#ifdef __KERNEL__
# if HAVE_SOME_KERNEL_API
/* lots
   of
   stuff */
# endif /* HAVE_SOME_KERNEL_API */
#else /* !__KERNEL__ */
# if HAVE_ANOTHER_FEATURE
/* more
 * stuff */
# endif /* HAVE_ANOTHER_FEATURE */
#endif /* __KERNEL__ */

Comments

Single-line comments should have the leading /* and trailing */ on the same line as the comment. Multi-line comments should have the leading /* and trailing */ on their own lines, to match the upstream kernel coding style. Intermediate lines should start with a * aligned with the * on the first line:

Code Block

language	c

/** This is a short comment */
 
/**
 * This is a multi-line comment.  I wish the line would wrap already,
 * as I don't have much to write about.
 */

...

Assertions

Do not embed operations inside assertions. If assertions are disabled for performance reasons this code will not be executed.

^right:

...

Code Block

language	c

len = strcat(foo, bar);

...

D_ASSERT(len > 0);

...

^wrong:

Code Block

language	c

D_ASSERT(strcat(foo, bar) > 0);

...

Assertions should be used carefully only for detecting internal consistent state and never for input data from the user, data from the network or for legitimate error condition (e.g. allocation failure). Don't combine multiple sanity checks in the same ASSERT() since it is difficult to find out which one failed.

...

^right:

Code Block

language	c

D_ASSERT(len > 0);

...

D_ASSERT(obj)

...

^wrong:

Code Block

language	c

D_ASSERT(len > 0 && obj);

Error Messages

Messages on the console (D_ERROR(), D_WARN()) should be written so they provide useful information to the *administrator* and/or support person in case of a *significant event* or *failure condition*. They should not print "debugging" information in cases that might be hit under normal usage or user-generated race conditions, since verbose console error messages lose the important messages in a flood of noise. Consider that there may be thousands of servers and tens of thousands of clients hitting some failure at the same time.

Print the pool or container UUID at the start of the message, since there are usually multiple targets running on a single server or multiple mountpoints on a client. The error messages should also include enough information to make some useful diagnosis of the problem (e.g. FID and/or filename, client NID, actual values that caused failures, etc.). Otherwise, there is little value in having printed an error, but then needing to reproduce the problem to diagnose it:

^right:

...

Code Block

language	c

D_ERROR(DF_UUID": unsupported incompat filesystem feature(s) %x\n", DP_UUID(po_uuid), incompat);

...

D_ERROR(DF_UUID": cannot create root entry: rc = "DF_RC"\n", DP_UUID(co_uuid), DP_RC(rc));

...

D_ERROR(DF_UUID": error initializing object: rc = "DF_RC"\n",DP_UUID(co_uuid), DP_RC(rc));

...


D_ERROR(DF_UUID": cannot start coordinator thread: rc = "DF_RC"\n", mdt_obd_name(mdt), DP_RC(rc));

^wrong:

...

Code Block

language	c

D_ERROR("Cannot get handle\n"

...

);
D_ERROR("NULL bitmap!\n");

...


D_ERROR("invalid event\n");

...

D_ERROR("allocation failed\n");

...

Error messages that print a numeric error value should print it at the end of the line in a consistent format using ": rc = %d\n":

^right:

...

Code Block

language	c

D_ERROR("%s: error creating pool %s %s %s: rc = %d"

...

D_ERROR("%s: cannot destroy container: rc = %d\n", DP_UUID(co_uuid) ,rc);

^wrong:

...

Code Block

language	c

D_ERROR("err %d on param %d\n", rc, ptr);

...

D_ERROR("Can't get index (%d)\n", rc);

DAOS specific error messages (e.g. DER_*) should be printed using the DF_RC macro:

^right:

...

Code Block

language	c

D_ERROR(DF_UUID": cannot create root entry: rc = "DF_RC"\n", DP_UUID(co_uuid), DP_RC(rc));

^wrong:

...

Code Block

language	c

D_ERROR(DF_UUID": cannot create root entry: rc = %d\n", DP_UUID(co_uuid), rc);

Versions Compared

Old Version 22

New Version 23

Key

Autoformatting with clang-format and git-clang-format.

Indentation

Functions & Variables

Macros

Comments

Assertions

Error Messages

Page Comparison

Versions Compared

Old Version 22

New Version 23

Key

Autoformatting with clang-format and git-clang-format.

Indentation

Functions & Variables

Macros

Comments

Assertions

Error Messages