Stepping away from my series on updating the Lucena Utility Library (LUL) for a bit, I’d like to discuss a passing reference I made to the difficulty in providing header-only reference implementations for some of the newer Standard Library pieces that haven’t passed into ubiquity on our various target platforms. Specifically, I implied that avoiding One Definition Rule (ODR) violations made this impossible. I’m going to qualify that statement and describe some of the issues in detail.
The One Definition Rule: a refresher
In my experience, the ODR is one of the most perniciously botched requirements in all of C++. Veteran coders are as susceptible to violating it as newbs. Compilers aren’t obligated to identify violations—and not only frequently don’t, in some cases they can’t. Finally, violations trigger C++’s very own Crawling Chaos: undefined behavior.
The C++ ODR, in a nutshell, says that if a definition for a variable, function, class type, enumeration type, or template is used in a program, it has to be the only definition for it, or at least all the definitions have to be identical and behave as the same definition. Inline functions and inline variables are treated leniently rather than exempted outright: they may be defined in more than one translation unit, provided every definition is the same. Concepts are added to the list of restricted constructs in C++2a. The actual rules are a bit more involved; you can see them summarized here or read them for yourself in the Standard (section 6.2, at the time of writing).
Recall that the ODR applies to the entire program, and not just each translation unit. In practical terms, a translation unit is a given source file plus any headers it includes, basically the stuff that gets seen as a single object file by the linker. The program, on the other hand, is all the statically and dynamically linked binaries that comprise whatever executable you’re running. By the way, it’s worth pointing out, since it’s a frequent source of confusion, that we’re talking specifically about definitions, not declarations. Declarations can be repeated ad nauseam throughout a program, although they, too, need to fundamentally match.
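To make the declaration/definition distinction concrete, here is a minimal single-file sketch. All the names in it (odr_sketch, answer, counter, next_id, demo) are made up for illustration; a real ODR violation needs at least two translation units, so this only shows which lines would be safe to repeat in a header and which would not.

```cpp
#include <cassert>

namespace odr_sketch {

// Declarations: repeating them is fine as long as they all agree.
int answer();
int answer();
extern int counter;
extern int counter;

// Definitions of a variable and a non-inline function: the whole program
// gets exactly one of each, so these belong in a single source file.
int counter = 0;
int answer() { return 42; }

// An inline function may be defined in several translation units (which is
// what happens when it lives in a header), provided every copy is identical.
inline int next_id() { return ++counter; }

// Hypothetical self-check exercising the names above.
inline void demo() {
    assert(answer() == 42);
    assert(next_id() == 1);
}

}  // namespace odr_sketch
```

If the two definitions in the middle were pasted into a header and included from two source files, the linker would typically complain about duplicate symbols; that is one of the friendlier ways an ODR violation can surface.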
Case study: <optional>
Among the handy new utilities in the C++17 Standard Library is std::optional. I won’t go into detail about what it does here, as all we care about for this discussion is that it doesn’t have a complete implementation in all of our target compilers. Specifically, Apple LLVM in Xcode 9 doesn’t support it. Read on for an explanation, or just skip to the next paragraph if Apple-y details don’t interest you. The macOS 10.13 SDK bundled with Xcode 9 includes an <experimental/optional> header, but since the requisite binary code isn’t part of the libc++abi.dylib file bundled with macOS 10.13, and because Apple provides no glue code in the SDK, attempting to actually use std::optional will eventually cause an error or crash. There are similar issues with other types, such as std::any, and there used to be with std::shared_mutex (and friends) until macOS 10.12 updated the dylib. I won’t launch into a rant here questioning why the header was included at all, or why Apple didn’t bother to give useful feature test results; we’ll just accept that this is the situation we’re in and move on <cough>.
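For reference, this is roughly what a compile-time feature test for <optional> looks like, using the standard __has_include and __cpp_lib_optional facilities. The feature_sketch namespace, has_std_optional, optional_source, and demo are names I made up for this sketch. Keep in mind that a check like this only tells you what the headers claim; as described above, a header can be present while the runtime pieces are missing, so passing it is necessary but may not be sufficient.

```cpp
#include <cassert>

// Only look for <optional> when compiling as C++17 or later and when the
// compiler understands __has_include at all.
#if __cplusplus >= 201703L && defined(__has_include)
#  if __has_include(<optional>)
#    include <optional>
#  endif
#endif

namespace feature_sketch {

// __cpp_lib_optional is the Standard's feature-test macro for std::optional;
// 201606L is the value that corresponds to the C++17 version.
#if defined(__cpp_lib_optional) && __cpp_lib_optional >= 201606L
constexpr bool has_std_optional = true;
#else
constexpr bool has_std_optional = false;
#endif

// Hypothetical helper: reports which implementation a build would pick.
constexpr const char* optional_source() {
    return has_std_optional ? "std" : "fallback";
}

// Hypothetical self-check: the flag and the label always agree.
inline void demo() {
    assert((optional_source()[0] == 's') == has_std_optional);
}

}  // namespace feature_sketch
```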
So, why does libc++, the basis for Apple’s C++ Standard Library implementation, not have a header-only implementation of <optional>, instead requiring an additional binary component? Note that both gcc and Microsoft Visual Studio (MSVS) have header-only implementations in their current versions. The reason is the ODR.
The <optional> specification says that there must be an exception type, std::bad_optional_access, that can be thrown if an attempt is made to access the contents of an empty or uninitialized optional. std::bad_optional_access derives from std::exception, and as such overrides the what() member function, whose full signature is virtual const char* what() const noexcept. For efficiency, library implementers like to have a single string literal for this and just pass back its address, and that’s where the trouble starts.
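The pattern in question looks something like the following minimal sketch. The literal_sketch namespace, bad_access, has_message, and the message text are all stand-ins I invented for illustration; this is not the actual code from libc++, libstdc++, or the MSVC library.

```cpp
#include <cassert>
#include <cstring>
#include <exception>

namespace literal_sketch {

// Hypothetical stand-in for std::bad_optional_access.
class bad_access : public std::exception {
public:
    // Defined inside the class, so it is implicitly inline and every
    // translation unit that includes the header sees this same body.
    const char* what() const noexcept override {
        return "bad optional access";  // hands back the literal's address
    }
};

// Compares the message contents rather than the pointer values.
inline bool has_message(const std::exception& e, const char* expected) {
    return std::strcmp(e.what(), expected) == 0;
}

// Hypothetical self-check: throw, catch by base class, inspect the message.
inline void demo() {
    bool caught = false;
    try {
        throw bad_access{};
    } catch (const std::exception& e) {
        caught = has_message(e, "bad optional access");
    }
    assert(caught);
}

}  // namespace literal_sketch
```

Note that has_message compares characters, not addresses; nothing in this sketch relies on two calls to what() returning the same pointer.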
Since we’re taking the address of a string literal, we’re de facto declaring it to have external linkage, which is one of the things that triggers the ODR. The slightly redundant, simplified description of external linkage is that if a thing takes up space in memory, it has an address by which we can refer to it, and if we know that address, we can access it. So far, so good…unless the definition for our string literal is in a header file. The Standard doesn’t care where our what()’s return value comes from, as long as the contents returned by every bad_optional_access what() compare as equal. However, the potential ODR violation means we’re technically in the realm of undefined behavior: “Hello, nasal demons!”
There are a few ways around this. One is to give the string internal linkage, which can be done by sticking it in an anonymous namespace, or storing it statically in a member function of a utility class. The downside here is that even for a string constant, the compiler would be within its rights to generate a new copy in every translation unit, though none of the tested compilers actually did this. Another solution, which libc++ uses, is to just roll with the external linkage, but make sure the string storage is in a single source file.
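As a rough illustration of the second variant mentioned above (the string stored statically in a member function of a utility class), here is a minimal sketch. The static_sketch namespace, bad_access_text, bad_access, what_matches, and demo are hypothetical names, and the message text is made up; treat this as one possible shape under those assumptions, not as any vendor's implementation.

```cpp
#include <cassert>
#include <cstring>
#include <exception>

namespace static_sketch {

// Hypothetical utility class whose only job is to own the message text.
struct bad_access_text {
    static const char* get() noexcept {
        // Function-local static array: no namespace-scope name is exposed.
        static const char text[] = "bad optional access";
        return text;
    }
};

// Hypothetical stand-in for std::bad_optional_access.
class bad_access : public std::exception {
public:
    const char* what() const noexcept override {
        return bad_access_text::get();
    }
};

// Compares the message contents rather than the pointer values.
inline bool what_matches(const std::exception& e, const char* expected) {
    return std::strcmp(e.what(), expected) == 0;
}

// Hypothetical self-check for the sketch above.
inline void demo() {
    bad_access e;
    assert(what_matches(e, "bad optional access"));
}

}  // namespace static_sketch
```

How many copies of that array end up in a final binary is exactly the kind of thing worth checking per compiler, for instance by inspecting the symbol table or the linker map of a build that includes the header from several source files.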
Getting back to my earlier mention of gcc’s and MSVS’s implementations, current versions of both of them risk ODR violations by just returning a pointer to a string literal in the header file. Note that this is only a “risk”, not a guaranteed violation, because given the tight coupling between their Standard Library implementations and their compilers, they could technically have some sort of special handling that recognizes the situation and addresses it in an implementation-defined way. You could only be sure by inspecting the source code. However, I submit that if your defense against undefined behavior is dependent on an arbitrary version of a particular compiler, you aren’t defended. Additionally, should you be using libstdc++ or the MSVC Standard Library with some other compiler for some reason, it’s safe to say you won’t have whatever marginal protection you might otherwise have had.
Not exactly “impossible”
This ultimately leads us to the promised qualification: we could have header-only reference implementations, but their usage would have undesirable—or at least unpredictable—effects on memory usage depending on how internal linkage of static const string literals is handled by a given compiler. Or we could just let the nasal demons fly.
Next Time
In the next post, I should be continuing with the next entry on the LUL update.