Defective C++

Part of C++ FQA Lite

This page summarizes the major defects of the C++ programming language (listing all minor quirks would take an eternity). To be fair, some of the items by themselves could be design choices, not bugs. For example, a programming language doesn't have to provide garbage collection. It's the combination of these things that makes them problematic. For example, the lack of garbage collection makes C++ exceptions and operator overloading inherently defective. Therefore, the problems are not listed in order of "importance" (which is subjective anyway - different people are hit the hardest by different problems). Instead, most defects are followed by one of their complementary defects, so that when a defect causes a problem, the next defect in the list makes it worse.

No compile time encapsulation

In naturally written C++ code, changing the private members of a class requires recompilation of the code using the class. When the class is used to instantiate member objects of other classes, the rule is of course applied recursively.

This makes C++ interfaces very unstable - a change invisible at the interface level still requires rebuilding the calling code, which can be very problematic when that code is not controlled by whoever makes the change. So shipping C++ interfaces to customers can be a bad idea.
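Here's a minimal sketch of the problem (the class and its members are illustrative):

// parser.h - the header shipped to users of the Parser class
class Parser {
public:
  void parse(const char* text);
private:
  int depth_; // adding another private member here (say, "int line_;") changes the size
              // and layout of Parser, so every file that #includes parser.h has to be
              // recompiled, even though the public interface didn't change at all
};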

Well, at least when all relevant code is controlled by the same team of people, the only problem is the frequent rebuilds of large parts of it. This wouldn't be too bad by itself with almost any language, but C++ has...

Outstandingly complicated grammar

"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.

In practice, this means three things. First, C++ compiles slowly (the complexity takes time to deal with). Second, when it doesn't compile, the error messages are frequently incomprehensible (the smallest error which a human reader wouldn't notice completely confuses the compiler). And third, parsing C++ right is very hard, so different compilers will interpret it differently, and tools like debuggers and IDEs periodically get awfully confused.

And slow compilation interacts badly with frequent recompilation. The latter is caused by the lack of encapsulation mentioned above, and the problem is amplified by the fact that C++ has...

No way to locate definitions

OK, so before we can parse AA BB(CC);, we need to find out whether CC is defined as an object or a type. So let's locate the definition of CC and move on, right?

This would work in most modern languages, in which CC is either defined in the same module (so we've already compiled it), or it is imported from another module (so either we've already compiled it, too, or this must be the first time we bump into that module - so let's compile it now, once, but of course not the next time we'll need it). So to compile a program, we need to compile each module, once, no matter how many times each module is used.

In C++, things are different - there are no modules. There are files, each of which can contain many different definitions or just small parts of definitions, and there's no way to tell in which files CC is defined, or which files must be parsed in order to "understand" its definition. So who is responsible for arranging all those files into a sensible string of C++ code? You, of course! In each compiled file, you #include a bunch of header files (which themselves include other files); the #include directive basically issues a copy-and-paste operation to the C preprocessor, inherited by C++ without changes. The compiler then parses the result of all those copy-and-paste operations. So to compile a program, we need to compile each file the number of times it is used in other files.
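Here's a minimal sketch of the copy-and-paste model (the file and class names are illustrative):

// common.h - parsed again from scratch by every file that includes it, directly or indirectly
#ifndef COMMON_H
#define COMMON_H
class CC { /* ... */ };
#endif
// a.cpp
#include "common.h"  // the preprocessor pastes the text of common.h right here...
// b.cpp
#include "common.h"  // ...and here again, so CC is parsed twice in this two-file program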

This causes two problems. First, the long time it takes to compile C++ code gets multiplied by the number of times each file is used in the program. Second, the only way to figure out what should be recompiled after a change to the code is to check which of the #include files have been changed since the last build. The set of files to rebuild generated by this inspection is usually a superset of the files that really must be recompiled according to the C++ rules of dependencies between definitions. That's because most files #include definitions they don't really need, since people can't spend all their time removing redundant inclusions.

Some compilers support "precompiled headers" - saving the result of the parsing of "popular" header files to some binary file and quickly loading it instead of recompiling from scratch. However, this only works well with definitions that almost never change, typically third-party libraries.

And now that you've waited all that time until your code base recompiles, it's time to run and test the program, which is when the next problem kicks in.

No run time encapsulation

Programming languages have rules defining "valid" programs - for example, a valid program shouldn't divide by zero or access the 7th element of an array of length 5. A valid program isn't necessarily correct (for example, it can delete a file when all you asked was to move it). However, an invalid program is necessarily incorrect (there is no 7th element in the 5-element array). The question is, what happens when an invalid program demonstrates its invalidity by performing a meaningless operation?

If the answer is something like "an exception is raised", your program runs in a managed environment. If the answer is "anything can happen", your program runs somewhere else. In particular, C and C++ are not designed to run in managed environments (think about pointer casts), and while in theory they could run there, in practice all of them run elsewhere.

So what happens in a C++ program with the 5-element array? Most frequently, you access something at the address that would contain the 7th element, but since there isn't any, it contains something else, which just happens to be located there. Sometimes you can tell from the source code what that is, and sometimes you can't. Anyway, you're really lucky if the program crashes, because if it keeps running, you'll have a hard time understanding why it ends up crashing or misbehaving later. If it doesn't scare you (you debugged a couple of buffer overflows and feel confident), wait until you get to many megabytes of machine code and many months of execution time. That's when the real fun starts.
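For illustration, a minimal sketch of the scenario:

void overflow() {
  int a[5] = {0,1,2,3,4};
  a[6] = 42;  // no exception, no guaranteed crash: this silently overwrites whatever
              // happens to live just past the array - maybe another local variable,
              // maybe the return address, maybe nothing that matters until much later
}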

Now, the ability of a piece of code to modify a random object when in fact it tries to access an unrelated array indicates that C++ has no run time encapsulation. Since it doesn't have compile time encapsulation, either, one can wonder why it calls itself object-oriented. Two possible answers are warped perspective and marketing (these aren't mutually exclusive).

But if we leave the claims about being object-oriented aside, the fact that a language runs in unmanaged environments can't really be called a "bug". That's because managed environments check things at run time to prevent illegal operations, which translates to a certain (though frequently overestimated) performance penalty. So when performance isn't that important, a managed environment is the way to go. But when it's critical, you just have to deal with the difficulties in debugging. However, C++ (compared to C, for example) makes that much harder than it already has to be, because there are...

No binary implementation rules

When an invalid program finally crashes (or enters an infinite loop, or goes to sleep forever), what you're left with is basically the binary snapshot of its state (a common name for it is a "core dump"). You have to make sense of it in order to find the bug. Sometimes a debugger will show you the call stack at the point of crash; frequently that information is overwritten by garbage. Other things which can help the debugger figure things out may be overwritten, too.

Now, figuring out the meaning of partially corrupted memory snapshots is definitely not the most pleasant way to spend one's time. But with unmanaged environments you have to do it, and it can be done, if you know how your source code maps to binary objects and code. Too bad that with C++, there's a ton of these rules and each compiler uses different ones. Think about exception handling or various kinds of inheritance or virtual functions or the layout of standard library containers. In C, there are no standard binary implementation rules, either, but the mapping is an order of magnitude simpler and in practice compilers use the same rules. Another thing making C++ code hard to debug is the above-mentioned complicated grammar, since debuggers frequently can't deal with many language features (place breakpoints in templates, parse pointer casting commands in data display windows, etc.).

The lack of a standard ABI (application binary interface) has another consequence - it makes shipping C++ interfaces to other teams / customers impractical since the user code won't work unless it's compiled with the same tools and build options. We've already seen another source of this problem - the instability of binary interfaces due to the lack of compile time encapsulation.

The two problems - with debugging C++ code and with using C++ interfaces - don't show up until your project grows complicated in terms of code and / or human interactions, that is, until it's too late. But wait, couldn't you deal with both problems programmatically? You could generate C or other wrappers for C++ interfaces and write programs automatically shoveling through core dumps and deciphering the non-corrupted parts, using something called reflection. Well, actually, you couldn't, not in a reasonable amount of time - there's...

No reflection

It is impossible to programmatically iterate over the methods or the attributes or the base classes of a class in a portable way defined by the C++ standard. Likewise, it is impossible to programmatically determine the type of an object (for dynamically allocated objects, this can be justified to an extent by the performance penalties of RTTI, but not for statically allocated globals - and if you could start at the globals, you could decipher lots of the memory pointed to by them). Features of this sort - where a program can access the structure of programs, in particular, its own structure - are collectively called reflection, and C++ doesn't have it.

As mentioned above, this makes generating wrappers for C++ classes and shoveling through memory snapshots a pain, but that's a small fraction of the things C++ programmers are missing due to this single issue. Wrappers can be useful not only to work around the problem of shipping C++ interfaces - you could automatically handle things like remote procedure calls, logging method invocations, etc. A very common application of reflection is serialization - converting objects to byte sequences and vice versa. With reflection, you can handle it for all types of objects with the same code - you just iterate over the attributes of compound objects, and only need special cases for the basic types. In C++, you must maintain serialization-related code and/or data structures for every class involved.
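A minimal sketch of what the lack of reflection costs in the serialization case (the classes are illustrative):

struct Point  { int x, y; };
struct Circle { Point center; double radius; };
// there's no portable way to ask the language "what members does this class have?",
// so every class gets its own hand-written serializer, kept in sync with the class by hand:
void serialize(std::ostream& out, const Point& p)  { out << p.x << ' ' << p.y; }
void serialize(std::ostream& out, const Circle& c) {
  serialize(out, c.center);
  out << ' ' << c.radius;
}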

But perhaps we could deal with this problem programmatically then? After all, debuggers do manage to display objects somehow - the debug information, emitted in the format supported by your tool chain, describes the members of classes and their offsets from the object base pointer and all that sort of meta-data. If we're stuck with C++, perhaps we could parse this information and thus have non-standard, but working reflection? Several things make this pretty hard - not all compilers can produce debug information while optimizing the program aggressively enough for a release build, not all debug information formats are documented, and then in C++, we have a...

Very complicated type system

In C++, we have standard and compiler-specific built-in types, structures, enumerations, unions, classes with single, multiple, virtual and non-virtual inheritance, const and volatile qualifiers, pointers, references and arrays, typedefs, global and member functions and function pointers, and templates, which can have specializations on (again) types (or integral constants), and you can "partially specialize" templates by pattern matching their type structure (for example, have a specialization for std::vector<MyRetardedTemplate<T> > for arbitrary values of T), and each template can have base classes (in particular, it can be derived from its own instantiations recursively, which is a well-known practice documented in books), and inner typedefs, and... We have lots of kinds of types.
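To make a couple of the items above concrete, here's a minimal sketch (the names are illustrative):

template<class T> struct MyRetardedTemplate {};
template<class T> struct Traits { enum { special = 0 }; };  // primary template
template<class T> struct Traits<MyRetardedTemplate<T> > {   // partial specialization,
  enum { special = 1 };                                      // matched by type structure
};
template<class Derived> struct Comparable {};                // a template used as a base class...
struct Version : Comparable<Version> {};                     // ...of its own instantiation (CRTP)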

Naturally, representing the types used in a C++ program, say, in debug information, is not an easy task. A trivial yet annoying manifestation of this problem is the expansion of typedefs done by debuggers when they show objects (and compilers when they produce error messages - another reason why these are so cryptic). You may think it's a StringToStringMap, but only until the tools enlighten you - it's actually more of a...

// don't read this, it's impossible. just count the lines
std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> >
  >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>,
std::allocator<char> > const, std::basic_string<char, std::char_traits<char>,
std::allocator<char> > > > >

But wait, there's more! C++ supports a wide variety of explicit and implicit type conversions, so now we have a nice set of rules describing the Cartesian product of all those types, specifically, how conversion should be handled for each pair of types. For example, if your function accepts const std::vector<const char*>& (which is supposed to mean "a reference to an immutable vector of pointers to immutable built-in strings"), and I have a std::vector<char*> object ("a mutable vector of mutable built-in strings"), then I can't pass it to your function because the types aren't convertible. You have to admit that it doesn't make any sense, because your function guarantees that it won't change anything, and I guarantee that I don't even mind having anything changed, and still the C++ type system gets in the way and the only sane workaround is to copy the vector. And this is an extremely simple example - no virtual inheritance, no user-defined conversion operators, etc.
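A minimal sketch of that example (the function names are illustrative):

void show(const std::vector<const char*>& strings); // promises not to modify anything
void caller() {
  std::vector<char*> strings;
  show(strings); // error: std::vector<char*> and std::vector<const char*> are simply
                 // unrelated types; the only sane workaround is to build a converted copy
}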

But conversion rules by themselves are still not the worst problem with the complicated type system. The worst problem is the...

Very complicated type-based binding rules

Types lie at the core of the C++ binding rules. "Binding" means "finding the program entity corresponding to a name mentioned in the code". When the C++ compiler compiles something like f(a,b) (or even a+b), it relies on the argument types to figure out which version of f (or operator+) to call. This includes overload resolution (is it f(int,int) or f(int,double)?), the handling of function template specializations (is it template<class T> void f(vector<T>&,int) or template<class T> void f(T,double)?), and the argument-dependent lookup (ADL) in order to figure out the namespace (is it A::f or B::f?).
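A minimal sketch of type-based binding at work (the overloads are illustrative):

void f(int, int);
void f(int, double);
template<class T> void f(std::vector<T>&, int);
void calls() {
  f(1, 2);     // overload resolution picks f(int,int)
  f(1, 2.5);   // ...f(int,double) here...
  std::vector<float> v;
  f(v, 3);     // ...and the template here; change an argument type slightly and the
               // answer can silently change to a different function
}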

When the compiler "succeeds" (translates source code to object code), it doesn't mean that you are equally successful (that is, you think a+b called what the compiler thought it called). When the compiler "fails" (translates source code to error messages), most humans also fail (to understand these error messages; multiple screens listing all available overloads of things like operator<< are less than helpful). By the way, the C++ FAQ has very few items related to the unbelievably complicated static binding, like overload resolution or ADL or template specialization. Presumably people get too depressed to ask any questions and silently give up.

In short, the complicated type system interacts very badly with overloading - having multiple functions with the same name and having the compiler figure out which of them to use based on the argument types (don't confuse it with overriding - virtual functions, though very far from perfect, do follow rules quite sane by C++ standards). And probably the worst kind of overloading is...

Defective operator overloading

C++ operator overloading has all the problems of C++ function overloading (incomprehensible overload resolution rules), and then some. For example, overloaded operators have to return their results by value - naively returning references to objects allocated with new would cause temporary objects to "leak" when code like a+b+c is evaluated. That's because C++ doesn't have garbage collection, since that, folks, is inefficient. Much better to have your code copy massive temporary objects and hope to have them optimized out by our friend the clever compiler. Which, of course, won't happen any time soon.
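A minimal sketch of the constraint (Matrix and its operators are illustrative):

struct Matrix {
  // ... lots of data ...
  Matrix& operator+=(const Matrix& other);
};
Matrix operator+(const Matrix& a, const Matrix& b) {
  Matrix result = a;  // copy a...
  result += b;
  return result;      // ...and copy the (possibly huge) result out by value, unless the
                      // compiler optimizes the copies away; returning a reference to a
                      // new-allocated Matrix instead would leak temporaries in a+b+c,
                      // since nothing ever deletes them
}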

Like several other features in C++, operator overloading is not necessarily a bad thing by itself - it just happens to interact really badly with other things in C++. The lack of automatic memory management is one thing making operator overloading less than useful. Another such thing is...

Defective exceptions

Consider error handling in an overloaded operator or a constructor. You can't use the return value, and setting/reading error flags may be quite cumbersome. How about throwing an exception?

This could be a good idea in some cases if C++ exceptions were any good. They aren't, and can't be - as usual, because of another C++ "feature", the oh-so-efficient manual memory management. If we use exceptions, we have to write exception-safe code - code which frees all resources when the control is transferred from the point of failure (throw) to the point where explicit error handling is done (catch). And the vast majority of "resources" happens to be memory, which is managed manually in C++. To solve this, you are supposed to use RAII, meaning that all pointers have to be "smart" (be wrapped in classes freeing the memory in the destructor, and then you have to design their copying semantics, and...). Exception-safe C++ code is almost infeasible to achieve in a non-trivial program.
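A minimal sketch of the problem and the RAII workaround (do_work is illustrative):

void do_work(int* buf, int size); // may throw
void leaky() {
  int* buf = new int[1000];
  do_work(buf, 1000);   // if this throws, the delete[] below is never reached - buf leaks
  delete[] buf;
}
void exception_safe() {
  std::vector<int> buf(1000);  // RAII: the vector's destructor frees the memory
  do_work(&buf[0], 1000);      // even when do_work throws
}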

Of course, C++ exceptions have other flaws, following from still other C++ misfeatures. For example, the above-mentioned lack of reflection in the special case of exceptions means that when you catch an exception, you can't get the call stack describing the context where it was thrown. This means that debugging illegal pointer dereferencing may be easier than figuring out why an exception was thrown, since a debugger will list the call stack in many cases of the former.

The bottom line is that throw/catch are about as useful as longjmp/setjmp (BTW, the former typically runs faster, but its mere existence makes the rest of the code run slower, which is almost never acknowledged by C++ aficionados). So we have two features, each with its own flaws, and no interoperability between them. This is true for the vast majority of C++ features - most are...

Duplicate facilities

If you need an array in C++, you can use a C-like T arr[] or a C++ std::vector<T> or any of the array classes written before std::vector appeared in the C++ standard. If you need a string, use char* or std::string or any of the pre-standard string classes. If you need to take the address of an object, you can use a C-like pointer, T*, or a C++ reference, T&. If you need to initialize an object, use C-like aggregate initialization or C++ constructors. If you need to print something, you can use a C-like printf call or a C++ iostream call. If you need to generate many similar definitions with some parameters specifying the differences between them, you can use C-like macros or C++ templates. And so on.

Of course you can do the same thing in many ways in almost any language. But the C++ feature duplication is quite special. First, the many ways to do the same thing are usually not purely syntactic options directly supported by the compiler - you can compute a+b with a-b*-1, but that's different from having T* and T& in the same language. Second, you probably noticed a pattern - C++ adds features duplicating functionality already in C. This is bad by itself, because the features don't interoperate well (you can't printf to an iostream and vice versa, code mixing std::string and char* is littered with casts and calls to std::string::c_str, etc.). This is made even worse by the pretty amazing fact that the new C++ features are actually inferior to the old C ones in many aspects.
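A minimal sketch of the seams between the two halves:

void greet() {
  std::string name = "world";
  printf("hello %s\n", name.c_str());    // the C half doesn't know about std::string...
  std::cout << "hello " << name << "\n"; // ...and the C++ half has its own, unrelated
                                         // formatting machinery; the two don't mix
}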

And the best part is that C++ devotees dare to refer to the C features as evil, and frequently will actually resort to finger pointing and name calling when someone uses them in C++ code (not to mention using plain C)! And at the same time they (falsely) claim that C++ is compatible with C and it's one of its strengths (why, if C is so evil?). The real reason to leave the C syntax in C++ was of course marketing - there's absolutely NO technical reason to parse C-like syntax in order to work with existing C code since that code can be compiled separately. For example, mixing C and the D programming language isn't harder than mixing C and C++. D is a good example since its stated goals are similar to those of C++, but almost all other popular languages have ways to work with C code.

So IMO all that old syntax was kept for strictly commercial purposes - to market the language to non-technical managers or programmers who should have known better, who didn't understand the difference between "syntax" and "compatibility with existing code" and simply asked whether the old code would compile with this new compiler. Or maybe they thought it would be easier to learn a pile of new syntax when you also have the (smaller) pile of old syntax than when you have just the new syntax. Either way, C++ became widespread by exploiting misconceptions.

Well, it doesn't matter anymore why they kept the old stuff. What matters is that the new stuff isn't really new, either - it's obsessively built in ways exposing the C infrastructure underneath it. And that is purely a wrong design decision, made without an axe to grind. For example, in C++ there's...

No high-level built-in types

C is a pretty low-level language. Its atomic types are supposed to fit into machine registers (usually one, sometimes two of them). The compound types are designed to occupy a flat chunk of memory of a size known at compile time.

This design has its virtues. It makes it relatively easy to estimate the performance & resource consumption of code. And when you have hard-to-catch low-level bugs, which sooner or later happens in unmanaged environments, having a relatively simple correspondence between source code definitions and machine memory helps to debug the problem. However, in a high-level language, which is supposed to be used when the development-time-cost / execution-time-cost ratio is high, you need things like resizable arrays, key-value mappings, integers that don't overflow and other such gadgets. Emulating these in a low-level language is possible, but is invariably painful since the tools don't understand the core types of your program.

C++ doesn't add any built-in types to C (correction). All higher-level types must be implemented as user-defined classes and templates, and this is when the defects of C++ classes and templates manifest themselves in their full glory. The lack of syntactic support for higher-level types (you can't initialize std::vector with {1,2,3} or initialize an std::map with something like {"a":1,"b":2} or have large integer constants like 3453485348545459347376) is the smaller part of the problem. Cryptic multi-line or multi-screen compiler error messages, debuggers that can't display the standard C++ types and slow build times unheard of anywhere outside of the C++ world are the larger part of the problem. For example, here's a simple piece of code using the C++ standard library followed by an error message produced from it by gcc 4.2.0. Quiz: what's the problem?

// the code
typedef std::map<std::string,std::string> StringToStringMap;
void print(const StringToStringMap& dict) {
  for(StringToStringMap::iterator p=dict.begin(); p!=dict.end(); ++p) {
    std::cout << p->first << " -> " << p->second << std::endl;
  }
}
// the error message
test.cpp: In function 'void print(const StringToStringMap&)':
test.cpp:8: error: conversion from
'std::_Rb_tree_const_iterator<std::pair<const std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >' to non-scalar type
'std::_Rb_tree_iterator<std::pair<const std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >' requested
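For the record, a hedged answer to the quiz: dict is passed by const reference, so dict.begin() returns a const_iterator, which doesn't convert to the plain iterator declared in the loop - which is what the error message is trying to say. The one-word fix:

// the fixed loop header
for(StringToStringMap::const_iterator p=dict.begin(); p!=dict.end(); ++p) {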

The decision to avoid new built-in types yields other problems, such as the ability to throw anything, but without the ability to catch it later. class Exception, a built-in base class for all exception classes treated specially by the compiler, could solve this problem with C++ exceptions (but not others). However, the most costly problem with having no new high-level built-in types is probably the lack of easy-to-use containers. But to have those, we need more than just new built-in types and syntax in the C++ compiler. Complicated data structures can't be manipulated easily when you only have...

Manual memory management

Similarly to low-level built-in types, C++ manual memory management is inherited from C without changes (but with the mandatory addition of duplicate syntax - new/delete, which normally call malloc/free but don't have to do that, and of course can be overloaded).

Similarly to the case with low-level built-in types, what makes sense for a low-level language doesn't work when you add higher-level features. Manual memory management is incompatible with features such as exceptions & operator overloading, and makes working with non-trivial data structures very hard, since you have to worry about the life cycles of objects so they won't leak or die while someone still needs them.

The most common solution is copying - since it's dangerous to point to an object which can die before we're done with it, make yourself a copy and become an "owner" of that copy to control its life cycle. An "owner" is a C++ concept not represented in its syntax: the object responsible for deallocating a dynamically allocated chunk of memory or some other resource. The standard practice in C++ is to assign each "resource" (a fancy name for memory, most of the time) to an owner object, which is supposed to prevent resource leaks. What it doesn't prevent is access to dead objects; we have copying for that. Which is slow and doesn't work when you need many pointers to the same object (for example, when you want other modules to see your modifications to the object).
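A minimal sketch of the trade-off (the classes are illustrative):

struct Config { /* ... */ };
class OwningLogger {
  Config cfg_;        // owns a private copy: it can't die under us, but the copy costs time
public:               // and memory, and updates to the caller's Config are never seen here
  OwningLogger(const Config& cfg) : cfg_(cfg) {}
};
class SharingLogger {
  const Config* cfg_; // points at the caller's object: cheap and always up to date, but
public:               // nothing in the language says who keeps that object alive, or for how long
  SharingLogger(const Config* cfg) : cfg_(cfg) {}
};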

An alternative solution to copying is using "smart" pointer classes, which could emulate automatic memory management by maintaining reference counts or what-not. To implement the pointer classes for the many different types in your program, you're encouraged to use...

Defective metaprogramming facilities

There are roughly two kinds of metaprogramming: code that generates other code and code that processes other code. The second kind is practically impossible to do with C++ code - you can't reliably process source code due to the extremely complicated grammar and you can't portably process compiled code because there's no reflection. So this section is about the first kind - code generation.

You can generate C++ code from within a C++ program using C macros and C++ templates. If you use macros, you risk getting clubbed to death by C++ fanatics. Their irrational behavior left aside, these people do have a point - C macros are pretty lame. Too bad templates are probably even worse. They are limited in ways macros aren't (however, the opposite is also true). They compile forever. Being the only way to do metaprogramming, they are routinely abused to do things they weren't designed for. And they are a rats' nest of bizarre syntactic problems.

That wouldn't necessarily be so bad if C++ didn't rely on metaprogramming for doing essential programming tasks. One reason C++ has to do so is that in C++, the common practice is to use static binding (overload resolution, etc.) to implement polymorphism, not dynamic binding. So you can't take an arbitrary object at run time and print it, but in many programs you can take an arbitrary type at compile time and print objects of this type. Here's one common (and broken) application of metaprogramming - the ultimate purpose is to be able to print an arbitrary object at run time:

// an abstract base class wrapping objects of arbitrary types.
// there can be several such classes in one large project
struct Obj {
  virtual void print(std::ostream&) const = 0;
};
template<class T> struct ObjImpl : Obj {
  T wrapped;
  virtual void print(std::ostream& out) const { out << wrapped; }
};
// now we can wrap int objects with ObjImpl<int> and string objects
// with ObjImpl<std::string>, store them in the same collection of Obj*
// and print the entire collection using dynamic polymorphism:
void print_them(const std::vector<Obj*>& objects) {
  for(int i=0; i<(int)objects.size(); ++i) {
    objects[i]->print(std::cout); // prints wrapped ints, strings, etc.
    std::cout << std::endl;
  }
}

Typically there are 10 more layers of syntax involved, but you get the idea. This sort of code doesn't really work because it requires all relevant overloads of operator<< to be visible before the point where ObjImpl is defined, and that doesn't happen unless you routinely sort your #include directives according to that rule. Some compilers will compile the code correctly with the rule violated, some will complain, some will silently generate wrong code.

But the most basic reason to rely on the poor C++ metaprogramming features for everyday tasks is the above-mentioned ideological decision to avoid adding high-level built-in types. For example, templates are at the core of the...

Unhelpful standard library

Most things defined by the C++ standard library are templates, and relatively sophisticated ones, causing the users to deal with quite sophisticated manifestations of the problems with templates, discussed above. In particular, a special program called STLFilt exists for decrypting the error messages related to the C++ standard library. Too bad it doesn't patch the debug information in a similar way.

Another problem with the standard library is all the functionality that's not there. A large part of the library duplicates the functionality from the C standard library (which is itself available to C++ programs, too). The main new thing is containers ("algorithms" like max and adjacent_difference don't count as "functionality" in my book). The standard library doesn't support listing directories, opening GUI windows or network sockets. You may think that's because these things are non-portable. Well, the standard library doesn't have matrices or regular expressions, either.

And when you use the standard library in your code, one reason it compiles slowly to a large binary image is that the library extensively uses the...

Defective inlining

First, let's define the terms.

"Inlining" in the context of compilers refers to a technique for implementing function calls (instead of generating a sequence calling the implementation of the function, the compiler integrates that implementation at the point where the call is made). "Inlining" in the context of C++ refers to a way to define functions in order to enable (as opposed to "force") such implementation of the calls to the function (the decision whether to actually use the opportunity is made by the compiler).

Now, the major problem with this C++ way to enable inlining is that you have to place the definition of the function in header files, and have it recompiled over and over again from source. This doesn't have to be that way - the recompilation from source can be avoided by having higher-level object file formats (the way it's done in LLVM and gcc starting from version 4). This approach - link-time inlining - is one aspect of "whole program optimization" supported by modern compilers. But the recompilation from source could also be avoided in simpler ways if C++ had a way to locate definitions instead of recompiling them, which, as we've seen, it hasn't.

The crude support for inlining, designed with a traditional implementation of a C tool chain in mind, wouldn't be as bad if it weren't used all the time. People define large functions inline for two reasons. Some of them "care" (emotionally) about performance, but never actually measure it, and someone told them that inlining speeds things up, and forgot to tell them how it can slow them down. Another reason is that it's simply annoying to define functions non-inline, since that way, you place the full function definition in a .cpp file and its prototype in a .h file. So you write the prototype twice, with small changes (for example, if a class method returns an object of a type itself defined in the class, you'll need an extra namespace qualification in the .cpp file since you're now outside of the namespace of the class). Much easier to just have the body written right in the .h file, making the code compile more slowly and recompile more frequently (changing the function body will trigger a recompilation).
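A minimal sketch of that duplication (the class is illustrative):

// widget.h
class Widget {
public:
  struct Result { int code; };
  Result compute() const;          // declared here, defined in widget.cpp below...
  int size() const { return n_; }  // ...or defined inline right here, and then recompiled
private:                           // by every file that includes widget.h
  int n_;
};
// widget.cpp
Widget::Result Widget::compute() const { // note the extra Widget:: qualification on the
  Result r = { 0 };                      // return type, needed outside the class's scope
  return r;
}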

And you don't even need to actually write any inline functions to get most of their benefits! A large subset of the inline functions of a program are...

Implicitly called & generated functions

Here's a common "design pattern" in C++ code. You have a huge class. Sometimes there's a single pseudo-global object of this class. In that case, you get all the drawbacks of global variables because everybody has a pointer to the thing and modifies it and expects others to see the changes. But you get no benefits of global variables since the thing is allocated on the stack and when your program crashes with a buffer overflow, you can't find the object in a debugger. And at other times there are many of these objects, typically kept in a pseudo-global collection.

Anyway, this huge class has no constructors, no destructor and no operator=. Of course people create and destroy the objects, and sometimes even assign to them. How is this handled by the compiler?

This is handled by the compiler by generating a gigantic pile of code at the point where it would call the user-defined functions with magic names (such as operator=) if there were any. When you crash somewhere at that point, you get to see kilobytes of assembly code in the debugger, all generated from the same source code line. You can then try and figure out which variable didn't like being assigned to, by guessing where the class member offsets are in the assembly listing and looking for symbolic names of the members corresponding to them. Or you can try and guess who forgot all about the fact that these objects were assigned to using the "default" operator= and added something like built-in pointer members to the class. Because that wouldn't work, and could have caused the problem.
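A minimal sketch of that failure mode (the class is illustrative):

class Huge {
  // ... dozens of members, all copied member by member by the implicit operator= ...
  char* buffer_; // added later; the implicit operator= copies just the pointer, so after an
                 // assignment two Huge objects share one buffer, and both destructors free it
public:
  Huge() : buffer_(new char[1024]) {}
  ~Huge() { delete[] buffer_; }
};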

Implicit generation of functions is problematic because it slows compilation down, inflates the program binaries and gets in the way when you debug. But the problem with implicitly calling functions (whether or not they were implicitly generated) is arguably even worse.

When you see code like a=f(b,c) (or even a=b+c, thanks to operator overloading), you don't know whether the objects are passed by reference or by value (see "information hiding"). In the latter case, the objects are copied with implicitly called functions; in the former case, that's possible, too, if implicit type conversions are involved. Which means that you don't really understand what the program does unless you know the relevant information about the relevant overloads and types. And by the way, the fact that you can't see whether the object is passed by reference or by value at the point of call is another example of implicit stuff happening in C++.

One more problem with automatically generated functions (such as constructors and destructors) is that they must be regenerated when you add private members to a class, so changing the private parts of a class triggers recompilation... Which brings us back to square 1.


Copyright © 2007-2009 Yossi Kreinin
revised 17 October 2009