I love globals, or Google Core Dump

July 20th, 2008

The entire discussion only applies to unsafe languages, the ones that dump core. By which I mean, C. Or C++, if you're really out of luck.

If it can dump core, it will dump core, unless it decides to silently corrupt its data instead. Trust my experience of working in a multi-processor, multi-threaded, multi-programmer, multi-nightmare environment. Some C++ FQA Lite readers claimed that the fact that I deal with lots of crashes in C++ indicates that I'm a crappy programmer surrounded by similar people. Luckily, you don't really need to trust my experience, because you can trust Google's. Do this:

  1. Find a Google office near you.
  2. Visit a Google toilet.
  3. You'll find a page about software testing, with the subtitle "Debugging sucks. Testing rocks." Read it.
  4. Recover from the trauma.
  5. Realize that the chances of you being better at eliminating bugs than Google are low.
  6. Read about the AdWords multi-threaded billing server nightmare.
  7. The server was written in C++. The bug couldn't happen in a safe language. Meditate on it.
  8. Consider yourself enlightened.

This isn't the reason why this post has "Google core dump" in its title, but hopefully it's a reason for us to agree that your C/C++ program will crash, too.

I love globals

What happens when we face a core dump? Well, we need the same things you'd expect to look for in any investigation: names and addresses. Names of the objects whose contents may explain what happened, their addresses so we can actually look at them, and type information to display them sensibly.

In C and C++, we have 3 kinds of addresses: stack, heap and global. Let's see who lives there.

Except the stack is overwritten, because it can be. Don't count on being able to see the function calls leading to the point of crash, nor the parameters and local variables of those functions. In fact, don't even count on being able to see the point of crash itself: the program counter, the link register, the frame pointer, all that stuff can contain garbage.

And the heap is overwritten, too, nearly as badly. The typical data structure used by C/C++ allocators (for example, dlmalloc) is a kind of linked list, where each memory block is prefixed with its size so you can jump to the next one. Overwrite one of these size values and you will have lost the boundaries of the chunks above that address. That's a loss of 50% of the heap objects on average, assuming uniform distribution of memory overwriting bugs across the address space of the heap.
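To make the size-prefix problem concrete, here is a toy sketch. It is not dlmalloc's actual layout (real allocators keep more bookkeeping); `ToyHeap`, `add_block`, `smash_header` and `walk` are names made up for this illustration. It only shows the one idea: blocks are found by hopping from size header to size header, so one bad header hides everything after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy heap, NOT dlmalloc's real layout: every block is a 4-byte size
// header followed by that many payload bytes, packed back to back.
struct ToyHeap {
    std::vector<unsigned char> mem;

    // Append a block; returns the offset of its size header.
    std::size_t add_block(std::uint32_t payload_size) {
        std::size_t off = mem.size();
        mem.resize(off + sizeof payload_size + payload_size, 0);
        std::memcpy(&mem[off], &payload_size, sizeof payload_size);
        return off;
    }

    // Simulate a memory-overwriting bug hitting one size header.
    void smash_header(std::size_t header_off, std::uint32_t garbage) {
        assert(header_off + sizeof garbage <= mem.size());
        std::memcpy(&mem[header_off], &garbage, sizeof garbage);
    }

    // Walk from the start, trusting each header to find the next block.
    // Returns header offsets of the blocks that could still be located.
    std::vector<std::size_t> walk() const {
        std::vector<std::size_t> found;
        std::size_t off = 0;
        while (off + sizeof(std::uint32_t) <= mem.size()) {
            std::uint32_t size;
            std::memcpy(&size, &mem[off], sizeof size);
            if (size > mem.size() - off - sizeof size) break;  // runs off the heap
            found.push_back(off);
            off += sizeof size + size;
        }
        return found;
    }
};
```

With four 16-byte blocks, smashing the second header with 0xFFFFFFFF leaves the walk able to find only the first block. A smaller garbage value would arguably be worse: the walk would not stop, it would land in the middle of payloads and report blocks that aren't there.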

So don't count on the stack or the heap. Your only hope is that someone has ignored the Best Practices and the finger-pointing by the more proficient colleagues, and allocated a global object. Possibly under the clever disguise of a "Singleton". Not a bad thing after all, that moronic "design pattern", because it ultimately let people counter cargo cult programmers' accusations of "globals are evil" with the equally powerful cargo cult argument of "it's a design pattern". So people could allocate globals again.

Which is good, because a global always has an accurate name-to-address mapping, no matter what atrocity was committed by the bulk of unsafe code running on the loose. Can't overwrite a symbol table. And it has accurate type information, too. As opposed to objects you find through void*, or a base class pointer where the base class lacks virtual functions or the object vptr was overwritten, etc.

Which is why I frequently start debugging by firing up an object view window on a global, or running debugger macros which read globals, etc. Of course you can fuck up a global variable to make debugging unpleasant. For example, if the variable is "static" in the C sense, you need to open the right file or function to display it, and you need the debugger front-end to understand the context, which will be especially challenging if it's a static variable in a template function (one of the best things in C++ is how neatly its new features interact with C's old ones).

Or you can stuff the global into a class or a namespace. I was never able to display globals by their qualified C++ name in, say, gdb 5. But no matter; nm <program> | grep <global> followed by p *(TypeOfGlobal*)addr always does the trick, and no attempts at obfuscating the symbol table will stop it. I still say make it a real, unashamed global to make debugging easier. If you're lucky, you'll get to piss off a couple of cargo cult followers as a nice side-effect.

Google Core Dump

A core dump is a web. Its sites are objects. Its hyperlinks are pointers. Its PageRank is a TypeRank: what's the type of this object according to the votes of the pointers stored in other objects? The spamdexing is done by pointer-like bit patterns stored in unused memory slots. The global variables are the major sites with high availability you can use as roots for the crawling.
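A very rough sketch of the TypeRank voting idea, under heavy assumptions: the dump is a flat array of 64-bit words, and somebody (debug info, starting from the globals) has already told us which slots are pointer fields and what they claim to point at. `Dump`, `TypedSlot`, `collect_votes` and `type_rank` are all hypothetical names, not an existing tool. A real implementation would also have to follow pointers transitively and weigh the "spam" from stale slots.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Raw memory image: words[i] lives at address base + 8*i.
struct Dump {
    std::uint64_t base;
    std::vector<std::uint64_t> words;

    bool contains(std::uint64_t addr) const {
        return addr >= base && (addr - base) % 8 == 0 &&
               (addr - base) / 8 < words.size();
    }
    std::uint64_t read(std::uint64_t addr) const {
        assert(contains(addr));
        return words[(addr - base) / 8];
    }
};

// "The word at address `slot` is a pointer to `pointee_type`."
struct TypedSlot {
    std::uint64_t slot;
    std::string pointee_type;
};

// target address -> (type name -> number of pointers claiming that type)
using Votes = std::map<std::uint64_t, std::map<std::string, int>>;

// Every pointer slot whose value lands inside the dump casts one vote.
Votes collect_votes(const Dump& d, const std::vector<TypedSlot>& slots) {
    Votes votes;
    for (const TypedSlot& s : slots) {
        if (!d.contains(s.slot)) continue;
        std::uint64_t target = d.read(s.slot);
        if (d.contains(target)) ++votes[target][s.pointee_type];
    }
    return votes;
}

// The type with the most votes for `addr`; empty string if nobody voted.
std::string type_rank(const Votes& votes, std::uint64_t addr) {
    std::string best;
    int best_count = 0;
    auto it = votes.find(addr);
    if (it == votes.end()) return best;
    for (const auto& kv : it->second) {
        if (kv.second > best_count) { best = kv.first; best_count = kv.second; }
    }
    return best;
}
```

If two slots say the object at some address is a `Foo` and one stale slot says it's a `Bar`, `type_rank` goes with `Foo`. Slots holding values that point outside the dump simply don't vote.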

What utilities would we like to have for this web? The usual stuff.

I really wish there was a reasonably portable and reliable Google Core Dump kind of thing. But it doesn't look like that many people care about debugging crashes at all. Most core dumps at customer sites seem to go to /dev/null, and those that can't be easily deciphered are apparently given up on until the bug manifests itself in some other way or its cause is guessed by someone.

Am I coming from a particularly weird niche where the code size is large enough and the development rapid enough to make crashes almost unavoidable, but crashes in the final product version are almost intolerable? Or do most good projects allocate everything on the stack and the heap, so with those smashed they're doomed no matter what? Or is the problem simply stinky enough to make it unattractive for a hobby project while lacking revenue potential to make a good commercial project?

Would you like this sort of thing? If you would, drop me a line. In the meantime, I satisfy my wish for a Google Core Dump with my perfect implementation for an embedded co-processor, the one I've poked at with Tcl commands. With 128K of memory, no dynamic allocation, and local variables effectively implemented as globals, perfect decoding is easy. I'm telling ya, globals rule.

As to my "reverse DNS" implementation:

By the way, the std.algorithm module, the one with the sort, filter, lowerBound and similar functions, is by Andrei Alexandrescu, of Modern C++ Design fame. How is it possible that his stuff in D is so yummy while his implementation of similar things in C++ is equally icky? Because C++ is to D what proper fixation is to anaesthesia. There, I bet you saw it coming.

What does "global" mean?

For the sake of completeness, I'd like to bore you with a discussion of the various aspects of globalhood, in the vanishingly small hope of this being useful in a battle against a cargo cult follower authoring a coding convention or such. In C++, "global" can mean at least 6 things:

So when I share my love of globals with you, the question is which aspect of globality I mean. What I mean is this:

  1. I like global storage – link-time addresses – for everything which can be handled that way. A global pointer is better than nothing, but it can be overwritten and you will have lost the object; better allocate the entire thing globally.
  2. I like global scope, no classes, namespaces and access control keywords attached, to make symbol table look-up easier, thus making use of the global allocation.
  3. I like global life cycle – no Meyers' singletons and lazy initialization. In fact, I like trivial constructors/destructors, leaving the actual work to init/close functions called by main(). This way, you can actually control the order in which things are done and know what the dependencies are. With Meyers' singletons, the order of destruction is uncontrollable (it's the reverse order of initialization, which doesn't necessarily work). Solutions were proposed to this problem, so dreadful that I'm not going to discuss them. Just grow up, design the damned init/close sequence and be in control again. Why do people think that all major operations should be explicit except for initialization which should happen automagically when you least expect it?
  4. "Globals" in the sense of "touched by every piece of code" is the trademark style of a filthy swine. There are plenty of good reasons to use "globals"; none of them has anything to do with "globals" as in "variables nobody/everybody is responsible for".
  5. I think that everything that's instantiated once per process is a "global", and when you wrap it with scope, access control, and design patterns, you shouldn't stop calling it a global (and instead insist on "singleton", "static class member", etc.). It's still a global, and its wrapping should be evaluated by its practical virtues. Currently, I see no point in wrapping globals in anything – plain old global variables are the thing best supported by all software tools I know.
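Points 1 to 3 of the list above could look roughly like this. It's a minimal sketch, not a recommendation of specific names: `Logger`, `Cache`, `g_logger`, `g_cache` and the init/close functions are invented for the example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical subsystems. Plain structs: no constructors, no destructors,
// so the globals below are just zero-filled storage at link-time addresses.
struct Logger { std::FILE* out; bool ready; };
struct Cache  { const Logger* log; int entries; bool ready; };

Logger g_logger;   // global storage, global scope, trivial life cycle
Cache  g_cache;

void logger_init()  { g_logger.out = stderr; g_logger.ready = true; }
void logger_close() { g_logger.out = nullptr; g_logger.ready = false; }

void cache_init() {
    assert(g_logger.ready);   // the dependency is stated, not discovered
    g_cache.log = &g_logger;
    g_cache.entries = 0;
    g_cache.ready = true;
}
void cache_close() { g_cache.log = nullptr; g_cache.ready = false; }

// The whole life cycle in one place, in an order somebody actually chose.
void init_all()  { logger_init(); cache_init(); }
void close_all() { cache_close(); logger_close(); }

// main() would then be: init_all(); /* run the program */ close_all();
```

Nothing runs before main() and nothing runs after it, so the order of initialization and shutdown is whatever `init_all` and `close_all` say it is. And both objects can be found by plain name in the symbol table.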

I think this can be used as "rationale" in a coding guideline, maybe in the part allowing the use of globals as an "exception". But I keep my hopes low.


2. bill, Jul 3, 2009

The Google bug mentioned here is plain stupidity. To say that it can happen only in C++... is even worse. The same can happen in C (as you put it... any unsafe language). However, any decent C/C++ (multi-threaded... not necessarily) app developer must know... you don't use a pointer (or a boost::ref) to a stack allocated object in one thread from another. Nor can you use the same and put in a list and later de-reference it.

It is our stupidity if we do that... not the fault of C++. You learn lessons as you do serious development work. You make mistakes and correct them.

Every decent Java programmer knows that Java programs can have memory leaks and they learn it through experience. It is newbies that swallow the whole "there are no memory leaks in Java" nonsense without looking deeper.

BTW... multi-threading is hard. Not just in C/C++. You have to go the "no mutable data structures" to get it right (easily that is).

3. Yossi Kreinin, Jul 3, 2009

You know which part I liked best? The "...or a boost::ref" part.

4. Nathan, Oct 9, 2009

Want! I want that debugger! I once worked in a place where globals were used with abandon in the last sense. It was terrible. The boss man once actually said something like "Does anyone actually use those?" about a core dump.

5. Yossi Kreinin, Oct 9, 2009

@Nathan: I'm not sure I understood, but – are you saying that you want a program for figuring out the types and locations of live objects in a core dump, what I called Google Core Dump above?

6. Chris, Jun 30, 2010

I hope my boss man reads this. My company <3 globals, "singletons" and static member variables. It seems like I am the only one at the company that gets that they are all the same thing. The only thing that changes is how it is accessed. But restricting the access still doesn't actually impose any real restrictions.

7. Nathan, Dec 2, 2010

Yes, I want a Google Core Dump.

8. Yossi Kreinin, Dec 2, 2010

Wow, that's a long pause in a conversation... well, you're the first to want it, I think; not that I have any – we had something basically working for one of our obscure platforms but it broke down due to size issues in the next revision of that platform.

9. rarecactus, Jan 28, 2011

Read the Linux Kernel Style Guide. Then try writing a bit of C in that style yourself... a kernel module, perhaps.

Try this, and you will achieve enlightenment. And you'll never want to write a line of C++ again.

P.S. The best kind of globals are the ones that are static to file scope.

10. Yossi Kreinin, Jan 29, 2011

Um... What if I already don't want to write a line of C++ again? Looks like you're preaching to the choir, probably unintentionally.

As to static globals – theoretically great, but some software tools have trouble with those. So I prefer global visibility and a naming convention to prevent clashes.

11. rarecactus, Jan 29, 2011

Personally, I read all the best C++ books: Scott Meyers, Josuttis, etc. I learned all their recommendations by heart, and I put them into practice too.

However, it wasn't until I wrote code in the Linux kernel style that I really knew what good low-level code looked like. Just to take one feature, the 8-space tabs acted as a check on excessive nesting and overlong functions.

I realized then that C++ was just a framework that had gotten too big for its own good. The OO framework that the Linux kernel guys wrote (take a look at KObject, for example), is actually safer and saner than C++'s horrible vtable clusterfuck.

Over time, people have realized that composition of multiple objects is often a better way to go than inheritance. But C++ provides lots and lots of syntax for creating deep inheritance hierarchies, and almost none for composing objects. Consider how tedious it is to write a wrapper class or even to implement accessor functions in C++.

So to summarize: to a first approximation, C++ consists of an obsolete 1980s style OO framework, some perl-style features that make code slightly faster to write and much, much harder to read, and templates. Templates are good, but not good enough to justify the rest of the crap.

Anyway. I'm probably preaching to the choir again.

With regard to variables static to a file... I think gdb has pretty good support for those. I really like using them. Unlike private variables in C++, symbols static to a file really *are* hidden.


