Thursday, May 1, 2014

A C error handling style that plays nice with C++ exceptions

I've written a lot of library code using what I call an "hourglass" pattern: I implement a library (in my case, typically, using C++), wrap it in a C API which becomes the only entry point to the library, then wrap that C API in C++ or some other language(s) to provide a rich abstraction and convenient syntax. When it comes to native cross-platform code, C APIs provide unparalleled ABI stability, and portability to other languages via FFIs. I even restrict the API to a subset of C that I know is portable to a wide variety of FFIs and insulates the library from leaking changes in internal data structures — expect more on that in future blog posts.

C Error Reporting Styles

For now, I want to talk about the style of error reporting I've come to adopt for such APIs. Options for error reporting from C functions include the following:

  • Return an error code from functions that can fail.
  • Provide a function like Windows's GetLastError() or OpenGL's glGetError() to retrieve the most recently occurring error code.
  • Provide a global (well, hopefully, thread-local) variable containing the most recent error, like POSIX's errno.
  • Provide a function to return more information about an error, possibly in conjunction with one of the above approaches, like POSIX's strerror function.
  • Allow the client to register a callback when an error occurs, like GLFW's glfwSetErrorCallback.
  • Use an OS-specific mechanism like structured exception handling.
  • Write errors out to a log file, stderr, or somewhere else.
  • Just assert() or otherwise terminate the program when an error occurs.

There are loads of tradeoffs among these options, and it's a matter of opinion as to which ones are better than others. Like all matters of style, unless a style is just plain unsuitable (like the last two options above might be for many uses), it's probably most important to apply the style you choose consistently.

The Hourglass C Error Handling Style

Let me illustrate the style of error handling I use for hourglass C APIs with the following example function for a fictional library called libfoo:

/** Create a set of widgets.
  * Return libfoo_success and store the widgets in *out_widgets if the widgets were created
  * successfully, otherwise return
  *  - libfoo_invalid_argument if count is less than 1
  *  - libfoo_invalid_argument if out_widgets is NULL
  *  - libfoo_runtime_error if memory for the widgets could not be allocated
  */
libfoo_result_t libfoo_create_widgets(int count, libfoo_widget_container_t* out_widgets, 
                                      libfoo_error_details_t* out_error_details);

This made-up function creates objects of some sort and stores the created objects via a pointer passed in by the client. Assuming that libfoo_widget_container_t is a pointer type (perhaps a typedef to void*), a valid call to libfoo_create_widgets might look as follows:

libfoo_widget_container_t container = NULL;
libfoo_create_widgets(12, &container, NULL);

While valid, this call doesn't check for errors in any way, so a better approach might be:

libfoo_widget_container_t container = NULL;
if (libfoo_create_widgets(12, &container, NULL) != libfoo_success) {
   /* handle the error appropriately */
}

By now you might wonder what's up with that last parameter, out_error_details. The documentation for libfoo_create_widgets() doesn't mention it, and in the above examples we simply pass NULL into it. Elsewhere in our C API we might define and document the libfoo_error_details_t type and some related functions as follows:

/** Many functions in this API take a libfoo_error_details_t* argument as their last
  * parameter. Such an argument is always permitted to be NULL. If the argument is not NULL,
  * and the function returns a value other than libfoo_success, a libfoo_error_details_t
  * object will be stored at its location, from which more error information can be obtained.
  */
typedef void* libfoo_error_details_t;

/** Obtain a NULL-terminated string with more details about an error that occurred.
  */
const char* libfoo_error_details_c_str(libfoo_error_details_t error_details);

/** Free all resources associated with error_details. If error_details is NULL, do nothing.
  */
void libfoo_error_details_free(libfoo_error_details_t error_details);

Let's adjust our example call to libfoo_create_widgets() above to make use of this last parameter:

libfoo_widget_container_t container = NULL;
libfoo_error_details_t error = NULL;
if (libfoo_create_widgets(12, &container, &error) != libfoo_success) {
   printf("Error creating widgets: %s\n", libfoo_error_details_c_str(error));
   libfoo_error_details_free(error);
   abort(); // goodbye, cruel world!
}

While each (possibly failing) function in the API returns an error code, it also accepts an optional parameter, the error details object, that can provide additional error information. In the above example, we query the error details to retrieve a string with a (hopefully) human-readable description of the exact error that occurred.
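
To make the convention concrete, here is a minimal sketch of what the library side could look like when the implementation is C++ hidden behind an extern "C" boundary. Everything in it is an assumption made for illustration: the error details object is simply a heap-allocated std::string, the widgets are a stand-in std::vector<int>, and libfoo_widgets_free is a hypothetical name. The point is only that exceptions are translated into a result code plus optional details before they can cross the C boundary.

```cpp
// Hypothetical sketch of the library side; not taken from a real libfoo.
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

extern "C" {
typedef enum {
  libfoo_success = 0,
  libfoo_invalid_argument,
  libfoo_runtime_error
} libfoo_result_t;
typedef void* libfoo_widget_container_t;
typedef void* libfoo_error_details_t;
}

namespace {
// Assumption: an error details object is just a heap-allocated std::string.
libfoo_result_t report(libfoo_result_t code, const char* message,
                       libfoo_error_details_t* out_error_details) noexcept {
  if (out_error_details != nullptr) {
    try {
      *out_error_details = new std::string(message);
    } catch (...) {
      // Out of memory while reporting: the caller still gets the result code.
      *out_error_details = nullptr;
    }
  }
  return code;
}
}  // namespace

extern "C" libfoo_result_t libfoo_create_widgets(
    int count, libfoo_widget_container_t* out_widgets,
    libfoo_error_details_t* out_error_details) {
  if (count < 1)
    return report(libfoo_invalid_argument, "count must be at least 1",
                  out_error_details);
  if (out_widgets == nullptr)
    return report(libfoo_invalid_argument, "out_widgets must not be NULL",
                  out_error_details);
  try {
    // Stand-in for real widget creation.
    *out_widgets = new std::vector<int>(static_cast<std::size_t>(count));
    return libfoo_success;  // out_error_details is deliberately left untouched
  } catch (const std::exception& e) {
    return report(libfoo_runtime_error, e.what(), out_error_details);
  } catch (...) {
    return report(libfoo_runtime_error, "unknown error", out_error_details);
  }
}

// Hypothetical counterpart that releases the widgets again.
extern "C" void libfoo_widgets_free(libfoo_widget_container_t widgets) {
  delete static_cast<std::vector<int>*>(widgets);
}

extern "C" const char* libfoo_error_details_c_str(
    libfoo_error_details_t error_details) {
  return static_cast<std::string*>(error_details)->c_str();
}

extern "C" void libfoo_error_details_free(
    libfoo_error_details_t error_details) {
  delete static_cast<std::string*>(error_details);  // deleting NULL is a no-op
}

// Small usage check mirroring the C snippet above.
void demo() {
  libfoo_widget_container_t container = nullptr;
  libfoo_error_details_t error = nullptr;
  assert(libfoo_create_widgets(0, &container, &error) == libfoo_invalid_argument);
  assert(error != nullptr);
  libfoo_error_details_free(error);
}
```

Note that the success path never writes to out_error_details, which is what lets a caller treat "still NULL" as "no error".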

If you think this form of error handling looks somewhat cumbersome from the point of view of someone writing plain C code, I agree with you! Remember that this API is the middle of an hourglass API, so it's really targeted primarily at being wrapped by another language that provides a more convenient interface. We could, though, make some tradeoffs if we wanted to make it less cumbersome — for example, we could reserve some space for the error details structure and only allow one valid error details object at a time, which would remove the need to have a libfoo_error_details_free() function.
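
As a rough sketch of that tradeoff, and only one possible reading of it, the library could hand out a pointer into reserved per-thread storage instead of allocating. The ErrorSlot type, the 256-byte limit, and the libfoo_set_widget_count function are all invented here purely so that something can fail.

```cpp
// Hypothetical variant: error details live in reserved per-thread storage,
// so there is nothing for the caller to free.
#include <cassert>
#include <cstdio>

extern "C" {
typedef enum { libfoo_success = 0, libfoo_invalid_argument } libfoo_result_t;
typedef void* libfoo_error_details_t;
}

namespace {
struct ErrorSlot {
  char message[256];
};
// One slot per thread; a later error overwrites an earlier one.
thread_local ErrorSlot error_slot = {""};

libfoo_result_t report(libfoo_result_t code, const char* message,
                       libfoo_error_details_t* out_error_details) noexcept {
  if (out_error_details != nullptr) {
    // snprintf truncates overly long messages and always NUL-terminates.
    std::snprintf(error_slot.message, sizeof error_slot.message, "%s", message);
    *out_error_details = &error_slot;
  }
  return code;
}
}  // namespace

extern "C" const char* libfoo_error_details_c_str(
    libfoo_error_details_t error_details) {
  return static_cast<ErrorSlot*>(error_details)->message;
}

// Hypothetical function, present only so that something can fail.
extern "C" libfoo_result_t libfoo_set_widget_count(
    int count, libfoo_error_details_t* out_error_details) {
  if (count < 1)
    return report(libfoo_invalid_argument, "count must be at least 1",
                  out_error_details);
  return libfoo_success;
}

void demo() {
  libfoo_error_details_t error = nullptr;
  assert(libfoo_set_widget_count(0, &error) == libfoo_invalid_argument);
  assert(error != nullptr);  // points into error_slot; nothing to free
}
```

The price is that an error details handle is only valid until the next failing call on the same thread.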

Moving up into C++

Let's focus, though, on the hourglass case, and let's say that we're wrapping this API in some C++ code. How might we handle errors from our C API there?

For starters, we might want to use an RAII type to wrap up our error details object, so we don't have to manually free it. In C++11, such a class might look as follows:

class ErrorDetails {
public:
  ErrorDetails()
  : _details(nullptr) // initialize _details to NULL
  {
  }
  
  ~ErrorDetails()
  {
    // Note: libfoo_error_details_free() is safe to call on NULL values
    libfoo_error_details_free(_details);
  }

  // Provide access to the underlying libfoo_error_details in a way that's
  // convenient for passing to the libfoo API.
  libfoo_error_details_t* get() { return &_details; }

  const char* c_str() const {
    return libfoo_error_details_c_str(_details);
  }

private:
  libfoo_error_details_t _details;
};

With this class, a C++ code snippet calling libfoo_create_widgets() might look as follows:

libfoo_widget_container_t container = nullptr;
ErrorDetails error;
if (libfoo_create_widgets(12, &container, error.get()) != libfoo_success) {
  std::cout << "Error creating widgets: " << error.c_str() << std::endl;
  std::terminate(); // goodbye, cruel world!
}

In addition to using RAII to automatically free the error details object when we're done with it, we've also wrapped libfoo_error_details_c_str() in a member function for convenience.
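
One thing the class above leaves open is copying: a copied ErrorDetails would free the same details twice. A hedged sketch of a variant that deletes the copy operations and adds move operations follows. The two libfoo functions are stand-ins that treat the details as a std::string, assumed only so the snippet compiles by itself.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Stand-ins for the libfoo declarations (assumed representation: std::string).
typedef void* libfoo_error_details_t;
inline const char* libfoo_error_details_c_str(libfoo_error_details_t d) {
  return static_cast<std::string*>(d)->c_str();
}
inline void libfoo_error_details_free(libfoo_error_details_t d) {
  delete static_cast<std::string*>(d);
}

class ErrorDetails {
public:
  ErrorDetails() noexcept : _details(nullptr) {}
  ~ErrorDetails() { libfoo_error_details_free(_details); }

  // Copying would lead to a double free, so forbid it.
  ErrorDetails(const ErrorDetails&) = delete;
  ErrorDetails& operator=(const ErrorDetails&) = delete;

  // Moving transfers ownership and leaves the source holding NULL.
  ErrorDetails(ErrorDetails&& other) noexcept : _details(other._details) {
    other._details = nullptr;
  }
  ErrorDetails& operator=(ErrorDetails&& other) noexcept {
    if (this != &other) {
      libfoo_error_details_free(_details);
      _details = other._details;
      other._details = nullptr;
    }
    return *this;
  }

  libfoo_error_details_t* get() { return &_details; }
  const char* c_str() const { return libfoo_error_details_c_str(_details); }

private:
  libfoo_error_details_t _details;
};

static_assert(!std::is_copy_constructible<ErrorDetails>::value,
              "ErrorDetails must not be copyable");

void demo() {
  ErrorDetails a;
  *a.get() = new std::string("boom");  // pretend a libfoo call stored an error
  ErrorDetails b(std::move(a));        // ownership moves; a now holds NULL
  assert(*a.get() == nullptr);
  assert(std::string(b.c_str()) == "boom");
}  // b frees the details exactly once
```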

Using Exceptions

On top of all of the error handling styles available in C, C++ has an entire language construct just to deal with errors: exceptions. What if we wanted to throw an exception when we notice the error, instead of dramatically terminating the entire program? We might write something like:

libfoo_widget_container_t container = nullptr;
ErrorDetails error;
if (libfoo_create_widgets(12, &container, error.get()) != libfoo_success) {
  throw std::runtime_error(error.c_str());
}

As before, our use of RAII in defining ErrorDetails ensures that the error details object is freed as the exception propagates up the stack. This is safe because std::runtime_error copies the message when it is constructed. Great! And we can now handle the exception however we like somewhere higher up in the program. But this is still pretty verbose: if we wanted to throw an exception for every error that occurs when making a call to a libfoo function, we'd need this kind of code repeated over and over again.

We can do better. To do so, we're going to legitimately do something that's usually considered a no-no in C++: we're going to throw from a destructor. Have a look at this class:

class ThrowOnError {
public:
  ~ThrowOnError() noexcept(false)
  {
    if (*_details.get() != nullptr) {
      throw std::runtime_error(_details.c_str());
    }
  }

  // Allow an instance of ThrowOnError to be passed directly as the last argument
  // of a call to a libfoo function.
  operator libfoo_error_details_t*() { return _details.get(); }
  
private:
  ErrorDetails _details;
};

Before we go into the details of ThrowOnError, let's look at how we would adjust our example call to libfoo_create_widgets() to use it:

libfoo_widget_container_t container = nullptr;
libfoo_create_widgets(12, &container, ThrowOnError());

Whoa! Our four-line function call is back down to a single line! If libfoo_create_widgets fails, for whatever reason, we throw an exception of type std::runtime_error, and we include whatever error message we got from the call to libfoo_create_widgets. And, as before, our error details will be cleaned up nicely because of our use of RAII in ErrorDetails.

I hope you agree that this is a lot less syntactic overhead than explicitly checking for an error and throwing every time.

How ThrowOnError Works

Let's tease apart what's happening in our final call to libfoo_create_widgets():

  1. First, a temporary object of type ThrowOnError is constructed. This, in turn, constructs its ErrorDetails member, which holds a single libfoo_error_details_t, ErrorDetails::_details, and initializes it to NULL.
  2. In order to pass the temporary ThrowOnError object to libfoo_create_widgets(), the ThrowOnError::operator libfoo_error_details_t* user-defined conversion function is invoked and returns a pointer to the ErrorDetails::_details member via ErrorDetails::get().
  3. Next, libfoo_create_widgets() is called. When it returns:
    • If the function succeeded, its last argument, the libfoo_error_details_t, has been left untouched.
    • If an error occurred, a non-NULL libfoo_error_details_t object has been assigned to the ErrorDetails::_details member.
  4. After libfoo_create_widgets() returns, at the end of the full-expression containing the call (here, the end of the statement), the destructor of the temporary ThrowOnError object is called. At this point, if ErrorDetails::_details is not NULL, the destructor retrieves the error message and throws an exception.
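
The steps above can be traced with a small self-contained program. The libfoo functions below are stand-ins invented for the demonstration (the details are a std::string and the "widgets" are a dummy int), and the two classes are condensed versions of the ones from this post.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// --- Stand-in for libfoo, invented for this demonstration ---
enum libfoo_result_t { libfoo_success, libfoo_invalid_argument };
typedef void* libfoo_widget_container_t;
typedef void* libfoo_error_details_t;

libfoo_result_t libfoo_create_widgets(int count,
                                      libfoo_widget_container_t* out_widgets,
                                      libfoo_error_details_t* out_error_details) {
  static int dummy_widgets = 0;  // stand-in for a real container
  if (count < 1) {
    if (out_error_details != nullptr)
      *out_error_details = new std::string("count must be at least 1");
    return libfoo_invalid_argument;
  }
  *out_widgets = &dummy_widgets;
  return libfoo_success;  // error details left untouched
}
const char* libfoo_error_details_c_str(libfoo_error_details_t d) {
  return static_cast<std::string*>(d)->c_str();
}
void libfoo_error_details_free(libfoo_error_details_t d) {
  delete static_cast<std::string*>(d);
}

// --- The two classes from the post, condensed ---
class ErrorDetails {
public:
  ErrorDetails() : _details(nullptr) {}
  ~ErrorDetails() { libfoo_error_details_free(_details); }
  ErrorDetails(const ErrorDetails&) = delete;
  ErrorDetails& operator=(const ErrorDetails&) = delete;
  libfoo_error_details_t* get() { return &_details; }
  const char* c_str() const { return libfoo_error_details_c_str(_details); }
private:
  libfoo_error_details_t _details;
};

class ThrowOnError {
public:
  ~ThrowOnError() noexcept(false) {
    if (*_details.get() != nullptr) {
      throw std::runtime_error(_details.c_str());  // step 4
    }
  }
  operator libfoo_error_details_t*() { return _details.get(); }  // step 2
private:
  ErrorDetails _details;  // step 1: starts out NULL
};

// Returns "ok", or the message carried by the exception.
std::string create_and_report(int count) {
  libfoo_widget_container_t container = nullptr;
  try {
    // Steps 1 to 4 all happen within this one statement.
    libfoo_create_widgets(count, &container, ThrowOnError());
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "ok";
}

void demo() {
  assert(create_and_report(12) == "ok");
  assert(create_and_report(0) == "count must be at least 1");
}
```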

Throwing From Destructors

Usually, throwing from a destructor is considered a Bad Idea, and for good reason. As an exception being unwound makes its way up the stack, destructors get called, and if such a destructor lets another exception escape, std::terminate is called and the program ends on the spot.

In this case, however, we have a legitimate reason for throwing from a destructor. If we stick to using ThrowOnError for its intended purpose — passing it as the last argument to C functions that use this style of error handling — we would need to be calling such a function from a destructor (directly, or indirectly via another function), for us to run the risk of a throw-during-unwind. That is no more or less risky than any other function call from a destructor.

This does suggest another rule for our hourglass C APIs, though: functions that are responsible for freeing resources, such as libfoo_error_details_free, should never fail, so as not to create the temptation to handle their failure using ThrowOnError.

You might wonder about the noexcept(false) decorating ThrowOnError's destructor, especially if you're unfamiliar with C++11. In C++11, the noexcept specifier was introduced to mark functions that don't throw exceptions. By default, functions are assumed to be able to throw exceptions, like in C++98, except for destructors. In C++11, by default, a destructor is assumed to be noexcept (unless a base class or member has a destructor that is itself noexcept(false)), and it takes an explicit noexcept(false) to mark it as throwing. If you leave this out, std::terminate is called at run time when the destructor throws, just as with any other function marked noexcept that lets an exception escape.
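
These defaults can be checked at compile time with std::is_nothrow_destructible. The three struct names below are made up for the illustration.

```cpp
#include <type_traits>

struct ImplicitDtor {
  ~ImplicitDtor() {}  // no specifier: implicitly noexcept since C++11
};

struct ThrowingDtor {
  ~ThrowingDtor() noexcept(false) {}  // explicitly allowed to throw
};

struct HasThrowingMember {
  ThrowingDtor member;  // the implicit destructor becomes noexcept(false) too
};

static_assert(std::is_nothrow_destructible<ImplicitDtor>::value,
              "destructors are noexcept by default");
static_assert(!std::is_nothrow_destructible<ThrowingDtor>::value,
              "noexcept(false) opts out");
static_assert(!std::is_nothrow_destructible<HasThrowingMember>::value,
              "a throwing member destructor propagates to the enclosing type");
```

The last case matters if ThrowOnError ever ends up as a member of another class: that class's implicit destructor is then potentially throwing as well.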

Wrapping Up Loose Ends

By choosing a particular convention for a C API's error handling, we were able to very conveniently translate errors from the C API to C++ exceptions. You might wonder why the functions still bother to return an error code. I've found this helpful when writing C code that uses the C API directly, as it allows me to continue to handle errors using an if statement, and ignore the "error details" mechanism if I so wish by passing NULL as the last argument.

You might also wonder why we used an error details abstraction at all, rather than just making the last argument something like a char** that is set to the error message if an error occurred. Having an abstraction allows us to extend libfoo_error_details_t with additional functionality. For example, if we want to throw a different exception type based on the type of error that occurred, we can easily add an error_details_type() function that returns a libfoo_result_t given a libfoo_error_details_t. Then, we can use that function in the ThrowOnError destructor to decide what kind of exception we want to throw.
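
Here is a minimal sketch of that idea. The accessor is named libfoo_error_details_type here to match the libfoo_ prefix, and it, the Details struct, and the stub implementation of libfoo_create_widgets (including its simulated allocation failure for counts above 1000) are assumptions made for illustration. The mapping to std::invalid_argument and std::runtime_error is just one plausible choice.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// --- Stand-in for libfoo (hypothetical representation and behavior) ---
enum libfoo_result_t { libfoo_success, libfoo_invalid_argument, libfoo_runtime_error };
typedef void* libfoo_widget_container_t;
typedef void* libfoo_error_details_t;

struct Details {
  libfoo_result_t type;
  std::string message;
};

libfoo_result_t fail(libfoo_result_t type, const char* message,
                     libfoo_error_details_t* out) {
  if (out != nullptr) *out = new Details{type, message};
  return type;
}

libfoo_result_t libfoo_create_widgets(int count,
                                      libfoo_widget_container_t* out_widgets,
                                      libfoo_error_details_t* out_error_details) {
  static int dummy_widgets = 0;  // stand-in for a real container
  if (count < 1)
    return fail(libfoo_invalid_argument, "count must be at least 1", out_error_details);
  if (count > 1000)  // simulated allocation failure
    return fail(libfoo_runtime_error, "could not allocate widgets", out_error_details);
  *out_widgets = &dummy_widgets;
  return libfoo_success;
}
const char* libfoo_error_details_c_str(libfoo_error_details_t d) {
  return static_cast<Details*>(d)->message.c_str();
}
libfoo_result_t libfoo_error_details_type(libfoo_error_details_t d) {
  return static_cast<Details*>(d)->type;
}
void libfoo_error_details_free(libfoo_error_details_t d) {
  delete static_cast<Details*>(d);
}

// --- ThrowOnError, choosing the exception type from the error type ---
class ThrowOnError {
public:
  ThrowOnError() : _details(nullptr) {}
  ThrowOnError(const ThrowOnError&) = delete;
  ThrowOnError& operator=(const ThrowOnError&) = delete;
  ~ThrowOnError() noexcept(false) {
    if (_details == nullptr) return;
    // Free the details on every path out of this destructor, including throws.
    struct Cleanup {
      libfoo_error_details_t d;
      ~Cleanup() { libfoo_error_details_free(d); }
    } cleanup = {_details};
    switch (libfoo_error_details_type(_details)) {
      case libfoo_invalid_argument:
        throw std::invalid_argument(libfoo_error_details_c_str(_details));
      default:
        throw std::runtime_error(libfoo_error_details_c_str(_details));
    }
  }
  operator libfoo_error_details_t*() { return &_details; }
private:
  libfoo_error_details_t _details;
};

std::string classify(int count) {
  libfoo_widget_container_t container = nullptr;
  try {
    libfoo_create_widgets(count, &container, ThrowOnError());
    return "ok";
  } catch (const std::invalid_argument& e) {
    return std::string("invalid_argument: ") + e.what();
  } catch (const std::runtime_error& e) {
    return std::string("runtime_error: ") + e.what();
  }
}

void demo() {
  assert(classify(12) == "ok");
  assert(classify(0) == "invalid_argument: count must be at least 1");
}
```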

Every experienced C++ programmer I know has opinions on how errors should be handled. I've found this style to be quite useful when layering C++ on top of C, and I think there's some beauty in the interplay between C and C++ here. Whether you love or hate exceptions, and whether you prefer returning error codes or setting errno, I hope you agree with me on that last part!


Edit: As pubby8 points out below, this form of ThrowOnError shouldn't be used more than once in an expression, because all the temporaries are destroyed together at the end of the full-expression. In practice, with this particular C API convention, that is unlikely, but it's nonetheless something to watch out for, and it goes to show that throwing from destructors is fraught with danger.
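
One way to sidestep the problem, along the lines of the function-style alternative suggested in the comments, is to do the check inside an ordinary function so that nothing is thrown from a destructor. The sketch below is an assumption about how that might look: throw_on_error is a made-up name, and the libfoo functions are stand-ins, with a call counter added to show that the second call is never reached once the first one fails.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// --- Stand-in for libfoo, invented for this demonstration ---
enum libfoo_result_t { libfoo_success, libfoo_invalid_argument };
typedef void* libfoo_widget_container_t;
typedef void* libfoo_error_details_t;

int g_calls = 0;  // counts calls into the stand-in

libfoo_result_t libfoo_create_widgets(int count,
                                      libfoo_widget_container_t* out_widgets,
                                      libfoo_error_details_t* out_error_details) {
  static int dummy_widgets = 0;  // stand-in for a real container
  ++g_calls;
  if (count < 1) {
    if (out_error_details != nullptr)
      *out_error_details = new std::string("count must be at least 1");
    return libfoo_invalid_argument;
  }
  *out_widgets = &dummy_widgets;
  return libfoo_success;
}
const char* libfoo_error_details_c_str(libfoo_error_details_t d) {
  return static_cast<std::string*>(d)->c_str();
}
void libfoo_error_details_free(libfoo_error_details_t d) {
  delete static_cast<std::string*>(d);
}

// Calls fn(args..., &details) and throws from a regular function on failure.
template <typename Fn, typename... Args>
void throw_on_error(Fn fn, Args&&... args) {
  struct Guard {
    libfoo_error_details_t details = nullptr;
    ~Guard() { libfoo_error_details_free(details); }  // never throws
  } guard;
  if (fn(std::forward<Args>(args)..., &guard.details) != libfoo_success) {
    throw std::runtime_error(guard.details != nullptr
                                 ? libfoo_error_details_c_str(guard.details)
                                 : "unknown libfoo error");
  }
}

// Two failing calls in one expression: the first throws immediately.
int calls_when_both_would_fail() {
  libfoo_widget_container_t c1 = nullptr, c2 = nullptr;
  g_calls = 0;
  try {
    throw_on_error(libfoo_create_widgets, 0, &c1),
        throw_on_error(libfoo_create_widgets, 0, &c2);
  } catch (const std::runtime_error&) {
    // Only the first call ran; nothing was thrown during unwinding.
  }
  return g_calls;
}

void demo() { assert(calls_when_both_would_fail() == 1); }
```

The cost is a slightly less natural call syntax, since the libfoo function becomes an argument instead of being called directly.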

18 comments:

  1. Just a little point about libfoo_widget_container_t et al.: generally I'd prefer these to be opaque struct typedefs, e.g. something like:

    typedef struct something *something_t;
    // struct something is not defined in public headers

    The problem with void * typedefs is that they are all the same type, so one can pass a variable of one type to an argument that expects another. With an opaque struct typedef like the one above, this problem doesn't arise, as each typedef references a unique type.

    Replies
    1. Yes - I used to do something along these lines but switched back to void* a while back when I realized I was violating TBAA rules (not that it would be likely to make a difference in this context).

      But, it just occurred to me there are ways to do the opaque struct approach without violating TBAA rules - I'll play with this some more and probably write a follow up post. I have a few more posts on this pattern coming in general...

  2. Good article, but I'm guessing that using more than one ThrowOnError per statement will lead to std::terminate. e.g. in seemingly reasonable code such as:

    std::cout << libfoo_create_widgets(1, &c1, ThrowOnError()) << libfoo_create_widgets(2, &c2, ThrowOnError());

    I suppose the way around this is to define ThrowOnError as a function, which could look like either this:

    ThrowOnError(libfoo_create_widgets, 12, &container);

    Or:

    ThrowOnError([&](libfoo_error_details_t details){ libfoo_create_widgets(12, &container, details); });

    Depending on how you implement it.

    Replies
    1. No, there's no problem with statements containing multiple function calls. The arguments to a function aren't constructed until the function call expression is evaluated and are destroyed before the function call returns.

      Your concern is valid if you were to pass multiple ThrowOnError instances to exactly the same function call. But, with this error handling convention, there's only ever one libfoo_error_details_t parameter.

    2. > The arguments to a function aren't constructed until the function call expression is evaluated and are destroyed before the function call returns.

      Sorry to be pedantic, but this isn't true. Destruction of temporaries happens at the end of the full-expression containing them, and destruction of arguments is apparently the same, as demonstrated by this example: http://coliru.stacked-crooked.com/a/484f129ffe109c8a

    3. Don't be sorry! You're absolutely right and I'm absolutely wrong!

      I thought the destruction would occur at the sequence point following the function call, but as you say it happens at the end of the full-expression. That is indeed unfortunate. In practice, however, I don't think I've ever used more than one such call in an expression, simply because all these functions return only an error code, as opposed to some value I might do something with.

      Even something like...

      if (libfoo_bar(..., ThrowOnError()) == libfoo_success && libfoo_baz(..., ThrowOnError()) == libfoo_success)

      ...doesn't actually occur in practice, since there's no point in combining an error check and ThrowOnError().

      Nonetheless this is something to look out for. Thanks for pointing out my mistake!

    4. But in this context, only one of the ThrowOnError instances can hold an error? Either the first one succeeds, or it throws already? And only the one which holds an error actually throws upon destruction? Or am I interpreting the C API wrongly here? It may not work with a (thread-local) errno-based API, of course.

    5. Ok, found my mistake. The second call is still invoked as the first ThrowOnError is not yet destructed and therefore didn't throw "early enough".

  3. Technically speaking, this implementation may be buggy. The documentation of your libfoo library says about the last argument that "If the argument is not NULL, and the function returns a value other than libfoo_success, a libfoo_error_details_t object will be stored at its location, from which more error information can be obtained". But it doesn't explicitly say what happens when the function returns libfoo_success, so the value of the last argument is undefined in that case: it might be NULL, but it doesn't have to be. In your implementation you throw an exception on any non-NULL result:

    if (*_details.get() != nullptr) {
      throw std::runtime_error(_details.c_str());
    }

    Replies
    1. You're right that I could've been more explicit. I usually follow the convention that an argument isn't touched unless it's documented that it will be touched. I suppose that the "if" should be "if and only if".

  4. So this, then? https://developer.gnome.org/glib/unstable/glib-Error-Reporting.html

    Replies
    1. Thanks for the link! I didn't know glib followed this convention. Awesome!

  5. Very nice article! Just a minor thing: ThrowOnError's conversion operator is marked as "explicit", which (unless I'm mistaken) does not allow you to simply write ThrowOnError() as the last argument; but since that's the point of the idiom, I guess the conversion operator should not be "explicit". Is this correct?

    Replies
    1. Good catch! I'm learning my lesson not to write code in a blog post! I've edited the definition of ThrowOnError above to remove the "explicit" from the conversion operator.

  6. Just a very small issue. In member function "ErrorDetails::get() const", there seems to be a conversion from more qualified to less qualified (which should be ill-formed).

    Replies
    1. Well spotted, thanks! ThrowOnError's conversion operator needed to be marked non-const as well. I've fixed the post and actually fed the code through a compiler (what an idea!).

  7. Another small issue. For the code to be complete, you should also deal with copy constructors and assignment operators of your classes. Currently copying ErrorDetails object may result in double free error.

    Just in case someone decides to copy and use this excellent code. :)

    Replies
    1. Agreed. ErrorDetails should have, at minimum, a deleted copy constructor and copy assignment operator. For bonus points it could have a move constructor and move assignment operator; alternatively it could instead contain a unique_ptr with a custom deleter, which would give us all of those for free.

      I expect to make a complete implementation of these available at some point, along with some other related code. Stay tuned!
