Wednesday, April 07, 2010

System error support in C++0x - part 1

Among the many new library features in C++0x is a little header called <system_error>. It provides a selection of utilities for managing, well, system errors. The principal components defined in the header are:

  • class error_category
  • class error_code
  • class error_condition
  • class system_error
  • enum class errc

I had a hand in the design of this facility, so in this series of posts I will try to capture the rationale, history, and intended uses of the various components.

Where to get it

A complete implementation, which also supports C++03, is included in Boost as the Boost.System library. I'd guess that, at this point in time, it is the best-tested implementation in terms of portability. Of course, you have to spell things starting with boost::system:: rather than std::.

An implementation is included with GCC 4.4 and later. However, you must compile your program with the -std=c++0x option in order to use it.

Finally, Microsoft Visual Studio 2010 will ship with an implementation of the classes. The main limitation is that its system_category() does not represent Win32 errors, contrary to the intended design. More on what that means later.

(Note that these are just the implementations that I am aware of. There may be others.)


Overview

Here are the types and classes defined by <system_error>, in a nutshell:

  • class error_category - intended as a base class, an error_category is used to define sources of errors or categories of error codes and conditions.
  • class error_code - represents a specific error value returned by an operation (such as a system call).
  • class error_condition - something that you want to test for and, potentially, react to in your code.
  • class system_error - an exception used to wrap error_codes when an error is to be reported via throw/catch.
  • enum class errc - a set of generic error condition values, derived from POSIX.
  • is_error_code_enum<>, is_error_condition_enum<>, make_error_code, make_error_condition - a mechanism for converting enum class error values into error_codes or error_conditions.
  • generic_category() - returns a category object used to classify the errc-based error codes and conditions.
  • system_category() - returns a category object used for error codes that originate from the operating system.
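
To make the list above a little more concrete, here is a minimal sketch of how a few of these pieces fit together, assuming a compiler with <system_error> support. The two helper function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <system_error>

// Build an error_code from a generic errc value. The resulting code
// belongs to generic_category().
std::error_code not_found_code()
{
    return std::make_error_code(std::errc::no_such_file_or_directory);
}

// Wrap an error_code in a system_error, throw it, and hand back the
// code recovered from the caught exception.
std::error_code round_trip_through_exception(const std::error_code& ec)
{
    std::error_code recovered;
    try {
        throw std::system_error(ec, "example operation");
    } catch (const std::system_error& e) {
        recovered = e.code();
    }
    return recovered;
}
```

The error_code survives the trip through throw/catch unchanged, and it can be compared directly against an errc value.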

Principles

This section lists some of the guiding principles I had in mind in designing the facility. (I cannot speak for the others involved.) As with most software projects, some were goals at the outset and some were picked up along the way.

Not all errors are exceptional

Simply put, exceptions are not always the right way to handle errors. (In some circles this is a controversial statement, although I really don't understand why.)

In network programming, for example, there are commonly encountered errors such as:

  • You were unable to connect to a remote IP address.
  • Your connection dropped out.
  • You tried to open an IPv6 socket but no IPv6 network interfaces are available.

Sure, these might be exceptional conditions, but equally they may be handled as part of normal control flow. If you reasonably expect an error to happen, it's not exceptional. For the three cases above, respectively:

  • The IP address is one of a list of addresses corresponding to a host name. You want to try connecting to the next address in the list.
  • The network is unreliable. You want to try to reestablish the connection and only give up after N failures.
  • Your program can drop back to using an IPv4 socket.
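
The first of these cases might look something like the sketch below. It is an illustration only: connect_function and connect_to_any are hypothetical names, and the connect operation is passed in as a callable so that the example does not depend on any particular sockets API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical connect operation: returns a zero error_code on success,
// or the error that prevented the connection.
typedef std::function<std::error_code(const std::string&)> connect_function;

// Try each candidate address in order. On success, store the address in
// connected_to and return a zero error_code; otherwise return the error
// from the last attempt.
std::error_code connect_to_any(const std::vector<std::string>& addresses,
                               const connect_function& try_connect,
                               std::string& connected_to)
{
    // Reported only if the list is empty and no attempt is made.
    std::error_code ec = std::make_error_code(std::errc::host_unreachable);
    for (std::size_t i = 0; i < addresses.size(); ++i) {
        ec = try_connect(addresses[i]);
        if (!ec) {
            connected_to = addresses[i];
            break;
        }
    }
    return ec;
}
```

A failed attempt is just a value to be inspected before moving on to the next address. No exception is thrown, because nothing exceptional has happened.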

Another requirement, in the case of Asio, was a way to pass the result of an asynchronous operation to its completion handler. In this case, I want the operation's error code to be an argument to the handler callback. (An alternative approach is to provide a means to rethrow an exception inside the handler, such as .NET's BeginXYZ/EndXYZ asynchronous pattern. In my opinion, that design adds complexity and makes the API more error-prone.)
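
As a rough sketch of the handler-argument approach (not Asio's actual interface), the following uses a hypothetical fake_async_read that completes immediately. A real asynchronous operation would invoke the handler later, but the shape of the callback is the point here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <system_error>

// Handler signature: the outcome of the operation arrives as the first
// argument, followed by the number of bytes transferred.
typedef std::function<void(const std::error_code&, std::size_t)> read_handler;

// Hypothetical stand-in for an asynchronous read. It calls the handler
// straight away so that the sketch stays self-contained.
void fake_async_read(bool simulate_reset, const read_handler& handler)
{
    if (simulate_reset)
        handler(std::make_error_code(std::errc::connection_reset), 0);
    else
        handler(std::error_code(), 128); // success: pretend 128 bytes arrived
}
```

The handler decides for itself whether a given error is fatal, worth a retry, or something to ignore.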

Last, but not least, some domains will be unable or unwilling to use exceptions due to code size and performance constraints.

In short: be pragmatic, not dogmatic. Use whatever error mechanism suits best in terms of clarity, correctness, constraints, and, yes, even personal taste. Often the right place to make the decision between exception and error code is at the point of use. That means a system error facility should support both.
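
One way to support both styles is a pair of overloads: one that reports through an error_code out-parameter, and one that throws system_error. The sketch below uses a made-up lookup_port operation purely for illustration; only the overload pattern matters.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Non-throwing form: the caller inspects ec after the call.
// lookup_port is a hypothetical operation used only for this example.
int lookup_port(const std::string& service, std::error_code& ec)
{
    ec.clear();
    if (service == "http") return 80;
    if (service == "https") return 443;
    ec = std::make_error_code(std::errc::invalid_argument);
    return -1;
}

// Throwing form: implemented on top of the non-throwing one, so the
// two can never disagree about what counts as an error.
int lookup_port(const std::string& service)
{
    std::error_code ec;
    int port = lookup_port(service, ec);
    if (ec)
        throw std::system_error(ec, "lookup_port");
    return port;
}
```

The caller picks whichever overload suits the call site, which is exactly the "decide at the point of use" idea.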

Errors come from multiple sources

The C++03 standard recognises errno as a source of error codes. It is used by the stdio functions, some math functions, and so forth.

On POSIX platforms, many system operations do use errno to propagate errors. POSIX defines additional errno error codes to cover these cases.

Windows, on the other hand, does not make use of errno beyond the standard C library. Windows API calls typically report their errors via GetLastError.

When one considers network programming, the getaddrinfo family of functions uses its own set of error codes (EAI_...) on POSIX, but shares the GetLastError() "namespace" on Windows. Programs that integrate other libraries (for SSL, regular expressions, or whatever) will encounter other categories of error code.

Programs should be able to manage these error codes in a consistent manner. I'm especially concerned with enabling composition of operations to create higher-level abstractions. Combining system calls, getaddrinfo, SSL and regular expressions into one API should not force the user of that API to deal with an explosion of error code types. The addition of a new error source to the implementation of that API should not change the interface.
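
A small sketch of what that consistency buys you: because the category travels with the value, a single function signature can accept errors from any source. The helper names describe and first_failure are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Works for any error source: no overload per error-code type is needed.
std::string describe(const std::error_code& ec)
{
    return std::string(ec.category().name()) + ":" + std::to_string(ec.value());
}

// Compose two steps that may fail with codes from different categories.
// The return type stays the same no matter where the error came from.
std::error_code first_failure(const std::error_code& step1,
                              const std::error_code& step2)
{
    return step1 ? step1 : step2;
}
```

For example, describe(std::error_code(5, std::system_category())) yields "system:5", while the same value in generic_category() yields "generic:5". Adding a third category later would not change either signature.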

Be user-extensible

Users of the standard library need to be able to add their own error sources. This ability may just be used to integrate a third-party library, but is also tied in with the desire to create higher-level abstractions. When developing a protocol implementation such as HTTP, I want to be able to add a set of error codes corresponding to the errors defined in the RFC.
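
As an illustration of that extensibility, here is a minimal sketch of a user-defined error source for HTTP. The http namespace, the enumerator names, and the choice to number the values after HTTP status codes are all assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <type_traits>

namespace http {

// Hypothetical error values, numbered after the HTTP status codes.
enum class error { bad_request = 400, not_found = 404, internal_server_error = 500 };

class category_impl : public std::error_category
{
public:
    const char* name() const noexcept override { return "http"; }
    std::string message(int ev) const override
    {
        switch (static_cast<error>(ev)) {
        case error::bad_request: return "Bad Request";
        case error::not_found: return "Not Found";
        case error::internal_server_error: return "Internal Server Error";
        }
        return "Unknown HTTP error";
    }
};

// Categories are compared by address, so there must be a single instance.
inline const std::error_category& category()
{
    static category_impl instance;
    return instance;
}

// Found by argument-dependent lookup when an http::error is converted
// to an error_code.
inline std::error_code make_error_code(error e)
{
    return std::error_code(static_cast<int>(e), category());
}

} // namespace http

namespace std {
// Opt the enum in to implicit conversion to error_code.
template <> struct is_error_code_enum<http::error> : true_type {};
}
```

With this in place, writing std::error_code ec = http::error::not_found; just works, and the result can be passed through any interface that already deals in error_code.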

Preserve the original error code

This was not one of my original goals: my thinking was that the standard would provide a set of well-known error codes. If a system operation returned an error, it was the responsibility of the library to translate the error into a well-known code (if such a mapping made sense).

Fortunately, someone showed me the error of my ways. Translating an error code discards information: the error returned by the underlying system call is lost. This may not be a big deal in terms of program control flow, but it matters a lot for program supportability. There is no doubt that programmers will use a standardised error code object for logging and tracing, and the original error may be vital in diagnosing problems.
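
To illustrate the difference, the sketch below imagines a third-party library (the "vendor" category and its codes 17 and 42 are invented for this example) that has two distinct raw codes for a timeout. Control flow can test for the generic condition, while the log still records exactly what the library returned.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical third-party error source with two raw timeout codes.
class vendor_category_impl : public std::error_category
{
public:
    const char* name() const noexcept override { return "vendor"; }
    std::string message(int ev) const override
    {
        return "vendor error " + std::to_string(ev);
    }
    // Map raw codes to generic conditions for the purpose of comparison.
    // The error_code itself is left untouched.
    std::error_condition default_error_condition(int ev) const noexcept override
    {
        if (ev == 17 || ev == 42)
            return std::make_error_condition(std::errc::timed_out);
        return std::error_condition(ev, *this);
    }
};

inline const std::error_category& vendor_category()
{
    static vendor_category_impl instance;
    return instance;
}

// What a log line might record: the category name and the raw value.
std::string log_entry(const std::error_code& ec)
{
    return std::string(ec.category().name()) + ":" + std::to_string(ec.value());
}
```

Both std::error_code(17, vendor_category()) and std::error_code(42, vendor_category()) compare equal to std::errc::timed_out, yet they remain distinct values, and log_entry reports "vendor:17" and "vendor:42" respectively.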

This final principle segues nicely into my topic for part 2: error_code vs error_condition. Stay tuned.
