Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Auto-formatting with clang-format and git-clang-format.

As of 2022 and clang versions 14.0.5 or newer then clang-format is able to format the code automatically, there is a .clang-format config file in the source that applies the correct configuration.  This will be automatically applied via a githook if available and configured or can be run manually.  Where established coding practices or what's in this document are inconsistent with clang-format then clang-format takes precedence to allow us to use the tools properly, although we maybe able to adjust configuration options to suit.

Currently only fedora ships with a new enough clang for this to work automatically although pre-compiled versions are available on the wolf cluster.

Indentation

Tabs should be used in all C files. Tab width should be set at 8 characters in your IDE.

Tabs are preferred to align variable names in structure and enum declarations. This may temporarily misalign them with other variable names in the structure that are using spaces for indentation, or consider fixing up the whole struct in the same patch if there aren't too many of them. In some cases (long variable names, long comments, nested unions) then it isn't practical to use full tabs to align the fields and spaces can be used for partial indentation. Moreover, the returned type and the function name should be on two different lines.

right:

Code Block
languagec
void
func_helper(...)
{

        struct foobar_counter  *foo;
        unsigned long           last;
        int                     rc = 0;

wrong:

...

languagec

...

VS Code has plugins to automatically format changes on save, to enable this select format on save and set format on save mode to modifications

Ways of accessing clang-format

  • A commit-hook with automatically apply correct formatting settings if the compiler version is new enough (EL 8.6 and newer).

  • GitHub actions will run a job indicating correct formatting and present a downloadable patch file if formatting is incorrect.

  • VS Code can be configured to apply formatting on save.

Basics

  1. Line length is 100, reflow code which goes beyond this.

  2. Tabs (8 characters) for indentation at start of line, spaces thereafter.

  3. Function declarations have a new-line after the return type.

  4. Use Doxygen annotations on larger or complex functions.

  5. Use allocation macros, and check the results.

  6. Blank lines after declaration blocks.

Functions & Variables

Functions that are more than a few lines long should be declared with a leading comment block that describes what the function does, any assumptions for the caller (locks already held, locks that should not be held, etc), and return values. DAOS uses the DOxygen markup style for formatting the code comments, as shown in the example here. \a argument can be used in the descriptive text to name a function argument. \param argument definition should be used to define each of the function parameters, and can be followed by [in][out], or [in,out] to indicate whether each parameter is used for input, output, or both. \retval should be used to define the function return values.

right:

Code Block
languagec
/**
 * Connect to the DAOS pool identified by UUID \a uuid. Upon a successful
 * completion, \a poh returns the pool handle, and \a info returns the latest
 * pool information.
 *
 * \param[in]   uuid  UUID to identify a pool.
 * \param[in]   grp   Process set name of the DAOS servers managing the pool
 * \param[in]   svc   Pool service replica ranks, as reported by
 *                   struct foobar_counterdaos_pool_create().
 *foo; \param[in]   flags Connect mode represented by unsigned long last;

All lines should wrap at 100 characters. This is to avoid the need to have extra-wide terminal windows for the few lines of code that exceed 100 columns, and avoids nesting functions and conditionals too deeply. If it's getting too hard to wrap at 100 characters, you probably need to rearrange conditional order or break it up into more functions.

right:

Code Block
languagec
void
func_helper(...)
{

        struct foobar_counter  *foo;
        unsigned long     the DAOS_PC_ bits.
 * \param[out]  poh   Returned open handle.
 * \param[out]  info  Returned pool info.
 * \param[in]   ev    Completion event, it is optional and can be NULL.
 *      last;         int     The function will run in blocking mode if \a ev is NULL.
 *
 * rc\return = 0;           do_sth2_1;
 
    These values will be returned by \a ev::ev_error in
 *    if (cond3)               non-blocking  do_sth_which_needs_a_very_long_line(and, lots, of, arguments);mode:
 *             do_sth2_2;
}       void
func(...)
{0            long Success
 first;*         int    i;       -DER_INVAL    if (!cond1)Invalid parameter
 *               return;     -DER_UNREACH  Network is unreachable
 do_sth1_1;*               if (cond 2)   -DER_NO_PERM  Permission denied
 *         func_helper(...)           do_sth1_2;
}

wrong:

Code Block
languagec
void
func(...)
{
        int      -DER_NONEXIST Pool is nonexistent
 */
int
daos_pool_connect(const uuid_t uuid, const char *grp,
             rc;         struct foobar_counter *foo;
  const d_rank_list_t *svc, unsigned int flags,
     unsigned long          last;  daos_handle_t *poh, daos_pool_info_t *info,      if (cond1) {
        daos_event_t *ev);

wrong:

Code Block
languagec
/* Connect */

int
daos_pool_connect(const uuid_t uuid, const char *grp,
       do_sth1_1;           const d_rank_list_t *svc, unsigned int flags,
if (cond2) {                daos_handle_t *poh, daos_pool_info_t       do_sth2_1;
                        if (cond3) {
                                do_sth_which_needs_a_very_long(and, lots, of, arguments);
                        }
                        do_sth2_2;
                }
                do_sth1_2;
        }
}  

Do not include spaces or tabs on blank lines or at the end of lines. Please ensure you remove all instances of these in any patches you submit to GitHub. You can find them with grep or in vim using the following regexps /\[ \t\]$/ or use git git diff --check. Alternatively, if you use vim, you can put this line in your .vimrc file, which will highlight whitespace at the end of lines and spaces followed by tabs in indentation (only works for C/C++ files) let c_space_errors=1. Or you can use this command, which will make tabs and whitespace at the end of lines visible for all files (but a bit more discretely) set list listchars=tab:>\ ,trail:$. In emacs, you can use (whitespace-mode) or (whitespace-visual-mode)depending on the version. You could also consider using (flyspell-prog-mode).

Functions & Variables

Functions that are more than a few lines long should be declared with a leading comment block that describes what the function does, any assumptions for the caller (locks already held, locks that should not be held, etc), and return values. DAOS uses the DOxygen markup style for formatting the code comments, as shown in the example here. \a argument can be used in the descriptive text to name a function argument. \param argument definition should be used to define each of the function parameters, and can be followed by [in][out], or [in,out] to indicate whether each parameter is used for input, output, or both. \retval should be used to define the function return values.

right:

Code Block
languagec
/**
 * Connect to the DAOS pool identified by UUID \a uuid. Upon a successful
 * completion, \a poh returns the pool handle, and \a info returns the latest
 * pool information.
 *
 * \param[in]   uuid  UUID to identify a pool.
 * \param[in]   grp   Process set name of the DAOS servers managing the pool
 * \param[in]   svc   Pool service replica ranks, as reported by
 *                    daos_pool_create().
 * \param[in]   flags Connect mode represented by the DAOS_PC_ bits.
 * \param[out]  poh   Returned open handle.
 * \param[out]  info  Returned pool info.
 * \param[in]   ev    Completion event, it is optional and can be NULL.
 *                    The function will run in blocking mode if \a ev is NULL.
 *
 * \return            These values will be returned by \a ev::ev_error in
 *                    non-blocking mode:
 *                    0             Success
 *                    -DER_INVAL    Invalid parameter
 *                    -DER_UNREACH  Network is unreachable
 *                    -DER_NO_PERM  Permission denied
 *                    -DER_NONEXIST Pool is nonexistent
 */
int
daos_pool_connect(const uuid_t uuid, const char *grp,
                  const d_rank_list_t *svc, unsigned int flags,
                  daos_handle_t *poh, daos_pool_info_t *info, daos_event_t *ev);

wrong:

Code Block
languagec
/* Connect */

int
daos_pool_connect(const uuid_t uuid, const char *grp,
                  const d_rank_list_t *svc, unsigned int flags,
                  daos_handle_t *poh, daos_pool_info_t *info, daos_event_t *ev);

...

*info, daos_event_t *ev);

Don't use inline unless you're doing something so performance critical that the function call overhead will make a difference -- in other words - almost never. It makes debugging harder and overuse can actually hurt performance by causing instruction cache thrashing or crashes due to stack overflow.

...

Variables should be declared one per line, type and name, even if there are multiple variables of the same type. There should be one space between the variable type and the variable name and when multiple variables are declared, the names must be aligned with tabsspaces. For maximum readability, longer or more important declarations should be at the top. There should always be an empty line after the variable declarations, before the start of the code. There shouldn't be complex variable initializers (e.g. function calls) in the declaration block, since this may be easily missed by the reader and make it confusing to debug.

...

Code Block
languagec
struct vos_container  *cont;
struct d_uuid         *pkey;
int                    count;
int                    rc;

cont = container_of(ulink, struct vos_container, vc_uhlink);

...

Code Block
languagec
int len;

if (len > 0) {
        struct obj  *my_obj;
        int          count;

        my_obj = oget(foo);
 
        count = my_obj->size;
 
        if (count > 32) {
                int left = count;
                ...
        }
}

...

Code Block
languagec
D_ALLOC_PTR(mds_body);
D_FREE_PTR(mds_body);

wrong:

Code Block
languagec
D_ALLOC(mds_body, sizeof(*mds_body));
D_FREE(mds_body, sizeof(*mds_body));

Macros

When you define a macro, protect callers by placing parentheses round every parameter reference in the body.

Line up the backslashes of multi-line macros one tabstop from the end of the line at column 100 to help readability.

Use a do/while (0) block with no trailing semicolon to ensure multi-statement macros are syntactically equivalent to procedure calls.

...

Code Block
languagec
/** This is a short comment */
 
/**
 * This is a multi-line comment.  I wish the line would wrap already,
 * as I don't have much to write about.
 */

Assertions

Do not embed operations inside assertions. If assertions are disabled for performance reasons this code will not be executed.

right:

Code Block
languagec
len = strcat(foo, bar);
D_ASSERT(len > 0);

wrong:

Code Block
languagec
D_ASSERT(strcat(foo, bar) > 0);

...


 * as I don't have much to write about.
 */

Assertions

Do not embed operations inside assertions. If assertions are disabled for performance reasons this code will not be executed.

right:

Code Block
languagec
len = strcat(foo, bar);
D_ASSERT(len > 0);
D_ASSERT(obj)

wrong:

Code Block
languagec
D_ASSERT(len > 0 && obj);

Error Messages

Messages on the console (D_ERROR()D_WARN()) should be written so they provide useful information to the *administrator* and/or support person in case of a *significant event* or *failure condition*. They should not print "debugging" information in cases that might be hit under normal usage or user-generated race conditions, since verbose console error messages lose the important messages in a flood of noise. Consider that there may be thousands of servers and tens of thousands of clients hitting some failure at the same time.

Print the pool or container UUID at the start of the message, since there are usually multiple targets running on a single server or multiple mountpoints on a client. The error messages should also include enough information to make some useful diagnosis of the problem (e.g. FID and/or filename, client NID, actual values that caused failures, etc.). Otherwise, there is little value in having printed an error, but then needing to reproduce the problem to diagnose it:

right:

Code Block
languagec
D_ERROR(DF_UUID": unsupported incompat filesystem feature(s) %x\n", DP_UUID(po_uuid), incompat);
D_ERROR(DF_UUID": cannot create root entry: rc = "DF_RC"\n", DP_UUID(co_uuid), DP_RC(rc));
D_ERROR(DF_UUID": error initializing object: rc = "DF_RC"\n",DP_UUID(co_uuid), DP_RC(rc));
D_ERROR(DF_UUID": cannot start coordinator thread: rc = "DF_RC"\n", mdt_obd_name(mdt), DP_RC(rc));

...

strcat(foo, bar) > 0);

Assertions should be used carefully only for detecting internal consistent state and never for input data from the user, data from the network or for legitimate error condition (e.g. allocation failure). Don't combine multiple sanity checks in the same ASSERT() since it is difficult to find out which one failed.

right:

Code Block
languagec
D_ASSERT(len > 0);
D_ASSERT(obj)

wrong:

Code Block
languagec
D_ASSERT(len > 0 && obj);

Error Messages

Messages on the console (D_ERROR()D_WARN()) and D_INFO() should be written so they provide useful information to the *administrator* and/or support person in case of a *significant event* or *failure condition*. They should not print "debugging" information in cases that might be hit under normal usage or user-generated race conditions, since verbose console error messages lose the important messages in a flood of noise. Consider that there may be thousands of servers and tens of thousands of clients hitting some failure at the same time.

Print the pool or container UUID at the start of the message, since there are usually multiple targets running on a single server or multiple mountpoints on a client. The error messages should also include enough information to make some useful diagnosis of the problem (e.g. FID and/or filename, client NID, actual values that caused failures, etc.). Otherwise, there is little value in having printed an error, but then needing to reproduce the problem to diagnose it:

right:

Code Block
languagec
D_ERROR("Cannot get handle\n");
D_ERROR("NULL bitmap!\n");
D_ERROR("invalid event\n");
D_ERROR("allocation failed\n");

Error messages that print a numeric error value should print it at the end of the line in a consistent format using ": rc = %d\n":

right:

Code Block
languagec
D_ERROR("%s: error creating pool %s %s %s: rc = %d"
D_ERROR("%s: cannot destroy container: rc = %d\n", DP_UUID(co_uuid) ,rc);

wrong:

Code Block
languagec
D_ERROR("err %d on param %d\n", rc, ptrDF_UUID ": unsupported incompat filesystem feature(s) %x\n", DP_UUID(po_uuid), incompat);
DL_ERROR(DF_UUID ": cannot create root entry", DP_UUID(co_uuid));
DL_ERROR(DF_UUID ": error initializing object", DP_UUID(co_uuid);
DL_ERROR(DF_UUID ": cannot start coordinator thread", mdt_obd_name(mdt));

wrong:

Code Block
languagec
D_ERROR("Cannot get handle\n");
D_ERROR("NULL bitmap!\n");
D_ERROR("invalid event\n");
D_ERROR("Can't get index (%d)allocation failed\n", rc);

DAOS specific error messages (e.g. DER_*) should be printed using the DF_RC macro:Error messages that are able to include a numeric error value should use the appropriate macros, and contain no punctuation at the end of the line

right:

Code Block
languagec
DDL_ERROR(DF_UUID": cannot create root entry: rc = "DF_RC"\nrc, "%s: error creating pool %s %s %s"
DL_ERROR(rc, DF_UUID ": cannot destroy container", DP_UUID(co_uuid), DP_RC(rc));

wrong:

Code Block
languagec
D_ERROR(DF_UUID":err cannot%d create root entry: rc =on param %d\n", rc, DP_UUID(co_uuid)ptr);
D_ERROR("Can't get index (%d)\n", rc);