I love globals, or Google Core Dump

The entire discussion only applies to unsafe languages, the ones that dump core. By which I mean, C. Or C++, if you're really out of luck.

If it can dump core, it will dump core, unless it decides to silently corrupt its data instead. Trust my experience of working in a multi-processor, multi-threaded, multi-programmer, multi-nightmare environment. Some C++ FQA Lite readers claimed that the fact that I deal with lots of crashes in C++ indicates that I'm a crappy programmer surrounded by similar people. Luckily, you don't really need to trust my experience, because you can trust Google's. Do this:

  1. Find a Google office near you.
  2. Visit a Google toilet.
  3. You'll find a page about software testing, with the subtitle "Debugging sucks. Testing rocks." Read it.
  4. Recover from the trauma.
  5. Realize that the chances of you being better at eliminating bugs than Google are low.
  6. Read about the AdWords multi-threaded billing server nightmare.
  7. The server was written in C++. The bug couldn't happen in a safe language. Meditate on it.
  8. Consider yourself enlightened.

This isn't the reason why this post has "Google core dump" in its title, but hopefully it's a reason for us to agree that your C/C++ program will crash, too.

I love globals

What happens when we face a core dump? Well, we need the same things you'd expect to look for in any investigation: names and addresses. Names of objects which, when looked at, may explain what happened; their addresses, so we can actually look at them; and type information, so we can display them sensibly.

In C and C++, we have 3 kinds of addresses: stack, heap and global. Let's see who lives there.

Except the stack is overwritten, because it can be. Don't count on being able to see the function calls leading to the point of crash, nor the parameters and local variables of those functions. In fact, don't even count on being able to see the point of crash itself: the program counter, the link register, the frame pointer, all that stuff can contain garbage.

And the heap is overwritten, too, nearly as badly. The typical data structure used by C/C++ allocators (for example, dlmalloc) is a kind of linked list, where each memory block is prefixed with its size so you can jump to the next one. Overwrite one of these size values and you will have lost the boundaries of the chunks above that address. That's a loss of 50% of the heap objects on average, assuming uniform distribution of memory overwriting bugs across the address space of the heap.
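
Here's a minimal C sketch of the idea (not dlmalloc itself, just the general shape): a size-prefixed chunk chain, and the way a single corrupted size field loses everything above it. The chunk layout and all the names are invented for the example.

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical size-prefixed chunk header, in the spirit of dlmalloc;
     real allocators also pack flags into the low bits of the size,
     keep free lists, etc. */
  typedef struct {
    size_t size; /* size of this chunk, header included; user data follows */
  } chunk_hdr;

  /* Walk the heap by hopping over chunks. One overwritten size field
     and every chunk boundary above it is gone. */
  void walk_heap(char* heap_start, char* heap_end)
  {
    char* p = heap_start;
    while (p < heap_end) {
      chunk_hdr* c = (chunk_hdr*)p;
      if (c->size == 0 || p + c->size > heap_end) {
        printf("corrupt size at %p - chunks above this point are lost\n", (void*)p);
        return;
      }
      printf("chunk at %p, %zu bytes\n", (void*)p, c->size);
      p += c->size; /* jump to the next chunk */
    }
  }

  int main(void)
  {
    long long heap[8]; /* 64 aligned bytes playing the role of "the heap" */
    char* base = (char*)heap;
    memset(base, 0, sizeof heap);
    ((chunk_hdr*)base)->size = 16;
    ((chunk_hdr*)(base + 16))->size = 32;
    ((chunk_hdr*)(base + 48))->size = 16;
    walk_heap(base, base + sizeof heap);   /* three chunks, all found */
    ((chunk_hdr*)(base + 16))->size = 0;   /* a memory overwriting bug */
    walk_heap(base, base + sizeof heap);   /* the first chunk is all we get */
    return 0;
  }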

So don't count on the stack or the heap. Your only hope is that someone has ignored the Best Practices and the finger-pointing by the more proficient colleagues, and allocated a global object. Possibly under the clever disguise of a "Singleton". Not a bad thing after all, that moronic "design pattern", because it ultimately made it possible to counter the cargo cult programmers' accusation of "globals are evil" with the equally powerful cargo cult argument of "it's a design pattern". So people could allocate globals again.

Which is good, because a global always has an accurate name-to-address mapping, no matter what atrocity was committed by the bulk of unsafe code running on the loose. Can't overwrite a symbol table. And it has accurate type information, too. As opposed to objects you find through void*, or a base class pointer where the base class lacks virtual functions or the object vptr was overwritten, etc.

Which is why I frequently start debugging by firing up an object view window on a global, or running debugger macros which read globals, etc. Of course you can fuck up a global variable to make debugging unpleasant. For example, if the variable is "static" in the C sense, you need to open the right file or function to display it, and you need the debugger front-end to understand the context, which will be especially challenging if it's a static variable in a template function (one of the best things in C++ is how neatly its new features interact with C's old ones).

Or you can stuff the global into a class or a namespace. I was never able to display globals by their qualified C++ name in, say, gdb 5. But no matter; nm <program> | grep <global> followed by p *(TypeOfGlobal*)addr always does the trick, and no attempts at obfuscating the symbol table will stop it. I still say make it a real, unashamed global to make debugging easier. If you're lucky, you'll get to piss off a couple of cargo cult followers as a nice side-effect.
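
To make that concrete, here's a tiny hedged sketch of the kind of global I mean, with the symbol-table trick from above spelled out in a comment. All the names, the address and the nm output below are made up for illustration.

  /* A plain, unashamed global: link-time address, full type information,
     present in the symbol table no matter what the rest of the program
     did to the stack and the heap. Names are invented. */
  typedef struct {
    unsigned packets_in;
    unsigned packets_out;
    unsigned last_error;
  } state_tracker;

  state_tracker g_tracker; /* not static, not in a class, not a "Singleton" */

  int main(void)
  {
    g_tracker.packets_in = 42; /* something to look at in the core dump */
    return 0;
  }

  /* Post-mortem, even if the debugger front-end refuses to resolve the name:
   *
   *   nm program | grep g_tracker   ...gives something like: 0804a040 B g_tracker
   *   (gdb) p *(state_tracker*)0x0804a040
   *
   * With a C-style "static" or a namespace-qualified name you first get to
   * fight the front-end; with a plain global you don't. */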

Google Core Dump

A core dump is a web. Its sites are objects. Its hyperlinks are pointers. Its PageRank is a TypeRank: what's the type of this object according to the votes of the pointers stored in other objects? The spamdexing is done by pointer-like bit patterns stored in unused memory slots. The global variables are the major sites with high availability you can use as roots for the crawling.

What utilities would we like to have for this web? The usual stuff.

  • Browsers. The debugger's object view window is the Firefox, and the memory view window is the Lynx. The core dump Lynx usually sucks in that it doesn't make it easy to follow pointers – you can't click on a word and have the browser follow the pointer (by jumping to the memory it points to). No back button, either. Oh well.
  • DNS. The ability to translate variable names to raw addresses. Works very reliably for globals and passably otherwise. Works reliably for all objects in safe languages.
  • Reverse DNS. Given an address, tell me the object name. Problematic for dynamically allocated objects, although you could list the names of pointer variables leading to it (Google bombing). Works reliably for global functions and variables. For some reason, the standard addr2line program only supports functions, though. Which is why I have an addr2sym program; as it happens, I have several of them. You can download one here. "Reverse DNS" is particularly useful when you find pointers somewhere in registers or memory and wonder what they could point to (a minimal sketch of the lookup appears right after this list). In safe languages, you don't have that problem because everything is typed and so you can simply display the pointed-to object.
  • Google Core Dump, similar to Google Desktop or Google for the WWW. Crawl a core dump, figure out the object boundaries and types by parsing the heap linked list and the stack and looking at pointers' "votes", create an index, and allow me to query that index. Lots of work, that, some of it heuristical. And in order to get type information in C or C++, you'll have to either parse the source code (good luck doing it with C++), or parse the non-portable debug information format. But it's doable; in fact, we have it, for our particular target/debugger/allocator combo. Of course it has its glitches. Quirky and obscure enough to make open sourcing it not worth the trouble.
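
Here's roughly what the "reverse DNS" lookup amounts to, as a hedged C sketch: a table of (address, size, name) entries sorted by address, and a binary search for the greatest symbol address not exceeding the queried one. The symbol table below is made up; a real addr2sym would fill it from the ELF symbol table (a sketch of that part appears further down).

  #include <stdio.h>
  #include <stdint.h>

  /* Invented symbol-table entries; a real tool would read them from the
     ELF .symtab (st_value, st_size, st_name) and sort them by address. */
  typedef struct {
    uint32_t addr;
    uint32_t size;
    const char* name;
  } sym;

  static const sym symbols[] = { /* must be sorted by addr */
    { 0x8048100, 0x40, "init_board"   },
    { 0x8048140, 0x80, "parse_config" },
    { 0x804a040, 0x0c, "g_tracker"    },
  };
  enum { NSYMS = sizeof(symbols) / sizeof(symbols[0]) };

  /* Binary search for the greatest symbol address <= addr; returns NULL
     if addr precedes all symbols or falls past the end of the best match. */
  static const sym* addr2sym(uint32_t addr)
  {
    int lo = 0, hi = NSYMS; /* half-open range [lo, hi) */
    while (lo < hi) {
      int mid = (lo + hi) / 2;
      if (symbols[mid].addr <= addr)
        lo = mid + 1;
      else
        hi = mid;
    }
    if (lo == 0)
      return 0; /* below the first symbol */
    return addr < symbols[lo-1].addr + symbols[lo-1].size ? &symbols[lo-1] : 0;
  }

  int main(void)
  {
    uint32_t mystery = 0x804a044; /* a value found in a register, say */
    const sym* s = addr2sym(mystery);
    if (s)
      printf("0x%x = %s+0x%x\n", (unsigned)mystery, s->name, (unsigned)(mystery - s->addr));
    else
      printf("0x%x doesn't look like anything I know\n", (unsigned)mystery);
    return 0;
  }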

I really wish there was a reasonably portable and reliable Google Core Dump kind of thing. But it doesn't look like that many people care about debugging crashes at all. Most core dumps at customer sites seem to go to /dev/null, and those that can't be easily deciphered are apparently given up on until the bug manifests itself in some other way or its cause is guessed by someone.

Am I coming from a particularly weird niche where the code size is large enough and the development rapid enough to make crashes almost unavoidable, but crashes in the final product version are almost intolerable? Or do most good projects allocate everything on the stack and the heap, so with those smashed they're doomed no matter what? Or is the problem simply stinky enough to make it unattractive for a hobby project while lacking revenue potential to make a good commercial project?

Would you like this sort of thing? If you would, drop me a line. In the meanwhile, I satisfy my wish for a Google Core Dump with my perfect implementation for an embedded co-processor, the one I've poked at with Tcl commands. With 128K of memory, no dynamic allocation, and local variables effectively implemented as globals, perfect decoding is easy. I'm telling ya, globals rule.

As to my "reverse DNS" implementation:

  • I could make it more portable by parsing the output of nm --print-size. But just running nm on a 20M symbol table takes about 2 seconds. I want instantaneous output, 'cause I'm very impatient when I debug.
  • Alternatively, I could make it more portable by using a library such as bfd. But that would drag in a library such as bfd, and I had trouble with what looked like library/compiler version mismatches with bfd, whereas my ELF parsing code never had any trouble. Also, an implementation parsing ELF is more interesting as sample code because you get to see how easy these formats are to parse (see the sketch right after this list). So it's elfaddr2sym, not addr2sym. (It's really 32-bit-ELF-with-host-endianness-addr2sym, because I'm lazy and it covers all my targets.)
  • There's a ton of addr2sym code out there, and maybe a good addr2sym program. I just didn't find it. I have an acknowledged weakness in the wheel reinventing department.
  • Of course I don't demangle the ugly C++ names; piping to c++filt does.
  • The program is in D, because of the "instantaneous" bit, and because D is one of the best choices available today if you care about both speed and brevity. Look at this: lowerBound!("a.st_value <= b")(ssyms, addr) does a binary search for addr in the sorted ssyms array. As brief as it gets out of the box with any language and standard library, isn't it? The string is compiled statically into the instantiation of the lowerBound template; a & b are the arguments of the anonymous function represented by the string. Readable. Short. Fast. Easy to use – garbage-collected array outputs in functions like filter(), error messages to the point – that's why a decent grammar is a good thing even if you aren't the compiler writer. Looks a lot like C++, braces, static typing, everything. Thus easy to pimp in a 3GL environment, in particular, a C++ environment. You can download the Digital Mars D compiler for Linux, or wait for C++0x to solve 15% of the problems with <algorithm> by introducing worse problems.
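
To back up the "easy to parse" claim from a couple of items above, here's roughly how little it takes to pull the symbols out of a 32-bit, host-endianness ELF file. This is a hedged C sketch, not the actual elfaddr2sym (which, as just said, is in D): it assumes a system that ships <elf.h>, most error handling is omitted, and sorting the result and feeding it to the binary search from the earlier sketch is left out.

  #include <elf.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Read the whole file into memory; good enough for a sketch. */
  static char* slurp(const char* path)
  {
    FILE* f = fopen(path, "rb");
    if (!f) { perror(path); exit(1); }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    char* buf = malloc(size);
    if (fread(buf, 1, size, f) != (size_t)size) { perror(path); exit(1); }
    fclose(f);
    return buf;
  }

  int main(int argc, char** argv)
  {
    if (argc != 2) { fprintf(stderr, "usage: %s <32-bit ELF file>\n", argv[0]); return 1; }
    char* image = slurp(argv[1]);

    /* The section header table lives at e_shoff; .symtab is the section
       whose sh_type is SHT_SYMTAB, and its sh_link names the string table. */
    Elf32_Ehdr* ehdr = (Elf32_Ehdr*)image;
    Elf32_Shdr* shdrs = (Elf32_Shdr*)(image + ehdr->e_shoff);
    for (int i = 0; i < ehdr->e_shnum; ++i) {
      if (shdrs[i].sh_type != SHT_SYMTAB)
        continue;
      Elf32_Sym* syms = (Elf32_Sym*)(image + shdrs[i].sh_offset);
      int nsyms = shdrs[i].sh_size / sizeof(Elf32_Sym);
      char* strtab = image + shdrs[shdrs[i].sh_link].sh_offset;
      for (int j = 0; j < nsyms; ++j) {
        int type = ELF32_ST_TYPE(syms[j].st_info);
        if (type == STT_FUNC || type == STT_OBJECT) /* functions and variables, unlike addr2line */
          printf("%08x %08x %s\n", (unsigned)syms[j].st_value,
                 (unsigned)syms[j].st_size, strtab + syms[j].st_name);
      }
    }
    free(image);
    return 0;
  }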

By the way, the std.algorithm module, the one with the sort, filter, lowerBound and similar functions, is by Andrei Alexandrescu, of Modern C++ Design fame. How is it possible that his stuff in D is so yummy while his implementation of similar things in C++ is equally icky? Because C++ is to D what proper fixation is to anaesthesia. There, I bet you saw it coming.

What does "global" mean?

For the sake of completeness, I'd like to bore you with a discussion of the various aspects of globalhood, in the vanishingly small hope of this being useful in a battle against a cargo cult follower authoring a coding convention or such. In C++, "global" can mean at least 6 things:

  • Number of instances per process. A "global" is everything that's instantiated once.
  • Life cycle. A "global" is constructed before main and destroyed after main. A static variable inside a function is not "global" in this sense.
  • "Scope" in the "namespace" sense (as opposed to the life cycle sense). We have C-style file scope, class scope, function scope, and "the true global scope". And we have namespaces.
  • Storage. A "global" is assigned a link time address and stored there. In a singleton implementation calling new and assigning its output to a global pointer, the pointer is "global" in this sense but the object is not.
  • Access control. If it's in a class scope, it may be private or protected, which makes it less of a global in this fifth sense.
  • Responsibility. A global can be accessible from everywhere but only actually accessed from a couple of places. For example, you can allocate a large object instantiating lots of members near your main function and then call object methods which aren't aware that the stuff is allocated globally.

So when I share my love of globals with you, the question is which aspect of globality I mean. What I mean is this:

  1. I like global storage – link-time addresses – for everything which can be handled that way. A global pointer is better than nothing, but it can be overwritten and you will have lost the object; better allocate the entire thing globally.
  2. I like global scope, no classes, namespaces and access control keywords attached, to make symbol table look-up easier, thus making use of the global allocation.
  3. I like global life cycle – no Meyers' singletons and lazy initialization. In fact, I like trivial constructors/destructors, leaving the actual work to init/close functions called by main(). This way, you can actually control the order in which things are done and know what the dependencies are. With Meyers' singletons, the order of destruction is uncontrollable (it's the reverse order of initialization, which doesn't necessarily work). Solutions were proposed to this problem, so dreadful that I'm not going to discuss them. Just grow up, design the damned init/close sequence and be in control again. Why do people think that all major operations should be explicit except for initialization, which should happen automagically when you least expect it? (A minimal sketch of the init/close style follows this list.)
  4. "Globals" in the sense of "touched by every piece of code" is the trademark style of a filthy swine. There are plenty of good reasons to use "globals"; none of them has anything to do with "globals" as in "variables nobody/everybody is responsible for".
  5. I think that everything that's instantiated once per process is a "global", and when you wrap it with scope, access control, and design patterns, you shouldn't stop calling it a global (and instead insist on "singleton", "static class member", etc.). It's still a global, and its wrapping should be evaluated by its practical virtues. Currently, I see no point in wrapping globals in anything – plain old global variables are the thing best supported by all software tools I know.
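
Here's point 3 in C terms, a minimal sketch with invented module names: plain globals with trivial construction, initialized and torn down in an explicit, visible order from main.

  #include <stdio.h>

  /* Invented example modules; the point is that the order of initialization
     and shutdown is written down in one place, not discovered at run time
     via lazily constructed singletons. */
  typedef struct { int fd;    } log_state;
  typedef struct { int nbufs; } pool_state;

  log_state  g_log;  /* trivial to "construct": it's just memory */
  pool_state g_pool;

  void log_init(void)   { g_log.fd = 1; printf("log up\n"); }
  void log_close(void)  { printf("log down\n"); }
  void pool_init(void)  { g_pool.nbufs = 16; printf("pool up (logging already works)\n"); }
  void pool_close(void) { printf("pool down (logging still works)\n"); }

  int main(void)
  {
    /* the dependency order, spelled out where everyone can see it */
    log_init();
    pool_init();

    /* ... the actual work ... */

    /* teardown in reverse, because we said so, not because that happens
       to be the reverse order of whoever got lazily constructed first */
    pool_close();
    log_close();
    return 0;
  }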

I think this can be used as "rationale" in a coding guideline, maybe in the part allowing the use of globals as an "exception". But I keep my hopes low.

Ahem

To make an embarrassing story short:

  1. The merge scenario from the previous post doesn't work the way I said it works in any DVCS it was tried on. I talked about the case of "merging merges" – when two people resolve the same conflict independently in different clones, and then someone pulls from both clones. More specifically, I mentioned the case where the conflict was between two similar patches, and each of the two people took a different patch when resolving the conflict. I claimed BitKeeper would then remove both patches in the final merge. Well, I'm still sure it did work that way for me once (an older version of bk?.. some specifics I didn't notice?..) But bk 4 works differently; it seems to take the later conflict resolution (throwing away one of the patches in my scenario, but not both). Mercurial reportedly takes the earlier resolution, throwing away the other patch. Git and Bazaar reportedly require user intervention, possibly preventing damage by automerges at the cost of, well, requiring user intervention. bk and hg do manage to create a working version in my specific scenario, but of course throwing away one of the conflict resolutions isn't always safe that way, it's only safe in my scenario where both resolutions are basically equivalent. Anyway, while "merging merges" is specific to DVCS, no contemporary one seems to screw you nearly as badly as I described; "most vexing merge", oh, c'mon. Also, it would be easy to guess that they all deal with this scenario differently, because this whole business of merging is heuristical, what are the chances for different heuristics to do the same thing? And in general it's awfully lame to publish stuff first and check it later. I suck.
  2. And the overall excited mood of that article sucked, too, 'cause, like, c'mon, everybody knows that automerges can cut your fingers off, big deal, calm down. I mean, the worst merge-related bugs I dealt with came from automerges that any kind of version control system would allow. Two changes done in different files, that kind of thing. Coding in a "merge-friendly" way is something few people do, and it isn't that easy. For example, you basically must never change semantics of definitions. If your function didn't lock that semaphore, and now it does, then a call added in another branch, which was completely safe, can now cause a deadlock. So what are you going to do, modify the function name each time you change its "observable semantics"? Is everybody really that anal-retentive about it? I doubt that. But we all live with automerges because it's cheaper to deal with their occasional damage than with the constant damage of manual merges, which take lots of time and are intolerably boring, thus very error prone. Which is why I prefer distributed systems and their better ability to merge long-living branches due to detailed recording of change history, even though long-living branches are extremely harmful. Harmful as they are, they will occasionally flourish, and then you need strong automerge, not manual merge, to end their evil lives. But anyway, who cares about the preferences of a person who doesn't even bother to check his own trivially testable claims?

Blech.

DVCS and its most vexing merge

There's this very elegant way to shoot your leg off with a DVCS. Here's the recipe:

  1. I create a clone of the repository A and call it B. You create another clone of A and call it C.
  2. I add an if statement in B, fixing a bug. You fix the same bug in C in a very similar way.
  3. I pull your changes from C to B. There's a conflict around the bug since we both fixed it. I take my patch, throwing away your nearly identical patch.
  4. Meanwhile, your roommate pulled both B and C into his clone, D. And he had to resolve that conflict, too. And he took your patch, and threw mine away.
  5. Now, D and B are pushed back to A. DVCS quiz: what happens?

Answer: the system has accurately recorded the manual merges, so it has enough information to make this new merge automatically. It sees your patch, and it throws it away as it should according to my manual merge. It sees my patch, and it flushes it down the toilet since that's what your roommate said. Net result: both patches are gone, the bug is back in business. (Edit: it doesn't work that way – bk version 4 does a different thing, and other systems reportedly do still other things. Do you still want to read the rest of this?..)

Which of course doesn't matter, since it's immediately discovered by the massive Automated Test Suite. For example, if it's an OS kernel, each revision is automatically tested on all hardware configurations it should run on. And the whole process only takes 10 minutes, according to the Ten Minute Build XP Practice. No harm done, no reason to discuss it. I just thought it was a curious thing worth sharing.

Maybe it's a well-known thing, but I don't think it is, and if I'm right, it's definitely lovable. For example, here's what BitMover, maker of BitKeeper, the common ancestor of DVCSes, has to say about this:

"It's important to note that because BitKeeper has a star topology and its possible to share data with any repository, it's not necessarily recommended."

What this is trying to say is that the graph of pulls shouldn't be a generic graph and you're better off with a tree. That is, I shouldn't pull directly from you; we should both pull from and push to A. You and your roommate should also synchronize via A, or via A's "child" repository, but then you shouldn't push to A directly, only via that child, and so on. If we maintain this tree structure, the same conflict will never be resolved twice, and then we won't get screwed when the merges are merged.

I wonder if you could detect the situations when you "merge merges", that is, when the same conflict was resolved differently. You could then insist on human intervention and save those humans' bacon. I'm too lazy to think this out and too stupid to effortlessly see the answer, so I'll resort to a social heuristic like all of us uber-formal nerds do. The social heuristic is that Larry McVoy of BitMover has probably already thought about this, and found ways in which it could fail. So I'm not going to argue that BitKeeper merges are broken.

What I'm going to argue, at least for the next couple of paragraphs, is that it sucks when they tell you about their superstar topology and then explain that it's best to avoid actually using it. Not only that, but they fail to mention a fairly frightening and, trust me, not at all unlikely scenario which could actually persuade you to follow their advice.

Because when they tell me "we have this simple model of how things work – repositories with complete local history and changes flowing between them – but you should arbitrarily restrict yourself to a subset of this model, for reasons we aren't going to share with you, even though the general case works", when they tell me that, my reply is "I alias rm to rm -f". I understand how rm works, it's fairly simple, and I don't like to talk about it over and over again, "Are you sure?" – yes, I'm sure, thank you very much and good bye.

But the lovable part is, speaking of social heuristics, the lovable part is that BitMover said it right. Because if they mentioned that fairly frightening and not at all unlikely scenario, they'd scare people off rather than illustrate a point. On the other hand, when they say "It's good practice to think about how the data should flow", most people will nod and follow whatever advice they give them.

Just imagine a team of programmers engaging in the practice of thinking about how the data should flow, dammit, all on company time. Yeah, yeah, so BitKeeper earned a sarcastic comment on Proper Fixation. It's still a small price to pay for getting your message to the majority of your users.

You see, the majority of programmers are not just "irrational" as we all are, but their reliance on reasoning doesn't even exceed the mean of the population, which means they barely use reasoning at all, it's pure gut feeling.

For example, I was writing a bunch of macros in a proprietary debugger extension language. A guy who came to talk to me looked over my shoulder, and I explained, "Debugger macros. Very useful, a crappy language though." He said, "Yeah, looks like so."

HE COULDN'T POSSIBLY KNOW THAT. I knew he couldn't. How could he look at the code and realize that all variables were global? How could he know they were printed upon assignment, including loop counters ('cause it's a "macro", so it works just like assigning at the debugger console, which prints the variable)? He couldn't know any of that. So why did he agree with the "crappy" part? Oh, I knew why.

"You mean it has dollar signs?" Silence. "You mean it prefixes variable names with the dollar sign, don't you?" "Yeah, that." "Well, I like the dollar signs, helps you distinguish between your macro variables and the variables of the debugged C program. Anything else?" "Well, um, yeah, it looks kinda primitive." Low-end Ignorant Language Bigotry quiz: if "crappy" means "has dollar signs", what does "primitive" mean? Answer: no type declarations. I'm sure it was that, although I didn't go on and ask.

So that's "engineers" for you. If you want to write programs or tech docs for the average engineer, keep that picture in mind. Or don't. Aim for the minority, for people who don't work that way, under the assumption that they are the ones that actually matter the most. I don't know if this assumption is right, but at least it's lofty.

Why DVCS?

For the record, I had my share of both centralized and distributed version control systems, and today I like it distributed and I wouldn't ever want to go back, The Most Vexing Merge be damned. Why? I'll share my reasons with you through the story of My Last CVS To BitKeeper Exodus. I think I'll illustrate the engineers-and-reasoning-don't-mix point as a by-product, because that's how that story goes.

There was recently this argument about DVCS encouraging "code bombs", a.k.a. "crawling into a cave". I hadn't heard either of these terms, so I've been using a term of my own – "accumulating critical mass". The idea is to develop in your own corner, without integrating it with the main branch. You then show up with N KLOC of new stuff and kindly ask to merge it in.

Some people claimed this was particularly harmful for open source projects where there was no managerial authority to prevent this. Ha! Double ha. In an open source project, the key maintainers may say, "you see, we can't integrate it when it's done this way; we're sorry, but you should have talked to us." The changes will then be reimplemented or dropped on the floor.

Now, if you think that in a commercial environment a manager can easily decide to drop changes on the floor, together with the cost of implementing them and especially the cost of delaying the delivery of the features, if you think that, well, I wonder why you do. But perhaps a manager could insist on frequent integration? She could try, but she'd have to deal with the real or imaginary cost of merges, increasing over time and getting in the way of deliveries. Of course there are perfect managers and perfect teams where it's all dealt with appropriately, you just have to find them.

So yeah, "code bombing" is a problem, especially in commercial projects. But the idea that DVCS encourages it? Hilarity! It's like saying that guns encourage murder. I prefer to think of guns as something that encourages fear of armed policemen, getting in the way of the natural instinct to club your neighbors to death. I mean, yeah, it's easier to code bomb with a DVCS, but with a centralized system, people use code bombing – or clubbing? – even more, because merging is harder, the cost of merges increases more quickly and the ability to force integration is thus lower. The criminals are poorly equipped, but so is the police.

And this is exactly what happened to the last team stuck with CVS that I helped migrate to BitKeeper. Everybody had their own version made up of file snapshots taken at different times and merged with the repository version to different extents. A centralized system doesn't record these local versions, so unless you immediately commit your merges, you are left with a version of a file which the system doesn't know. This means that the next merges are going to be really hard, because you've lost your GCA, the greatest common ancestor. So instead of a 3-way merge, you'll be doing a 2-way merge, which really sucks.

So I decided to not talk about the caves they were crawling into and the code bombs they were throwing at each other. Rather, I decided to show them how a 2-way merge couldn't do what a 3-way merge could. I still think it's the ultimate argument for DVCS, because DVCS is basically about accurate recording of all versions and not just the single time line of the main trunk. So the argument for it has to do with the benefits of such detailed recording.

So I gave this example where you start with a file having two lines:
aaa
bbb

And then I add a line, ccc, and you delete a line, aaa. If we have the GCA (a 3-way merge), then clearly the right result is deleting aaa and adding ccc, getting this:
bbb
ccc

But with a 2-way merge, we only have your file:
bbb

…and my file:
aaa
bbb
ccc

This can only be merged manually, because there's no way to automatically figure out that you deleted aaa and I added ccc; for all the tool knows, you could have done nothing and I've added two lines, so the right merge is:
aaa
bbb
ccc

…canceling your change. So it has to be manual merge. Manual merge means dozens of boring deltas you have to inspect in each file. That's what I call "costly".

Of course it doesn't matter in a right world, where people integrate frequently and always commit their merged files to the centralized repository. Except it wasn't so in the wrong world of the CVS developers I was "helping" to upgrade to new tools (for the last time in my life, people, for the last time in my life). And I thought we could avoid the discussion of the somewhat-technical-but-largely-social reasons of the constantly increasing cost of merges, and instead we could focus on the technical benefits of the 3-way merge and accurate GCA recording.

And of course I was wrong. The discussion immediately shifted to "we don't need merges" because everything is "modular" and there's a single owner to each file. Of course it wasn't, and there wasn't. Some things were used by everybody, like the awful build scripts and the DMA code. Some modules had two owners, or were in a transition state and had 1.5 owners, and so on. There were merges all over the place.

And if there weren't merges and merge-related problems, how come everybody worked on their own "pirate" version which was never recorded in the main trunk and was made from a colorful variety of files partially merged at different times? How come changes propagated with cp and emacs-diff and not cvs update? And why was the tech lead so passionate about moving to BitKeeper which doesn't let you partially update a repository so you have to merge everything? And why did everybody anxiously object that necessity if there were "no problems with merges"?

The final result: the tech lead simply forced the migration to bk. Everybody on the team hated me for my connection with the idea (I wasn't on their team but I used to be a likable satellite and now became a hateful satellite). Developers who I thought were their best eventually (and I mean eventually) told me it was actually a good thing. So it wasn't a bad closure. And still, I decided that I'm not going to "help" anybody "deploy" any kind of "tool" in this lifetime again, roughly speaking. Too much emotion for this programmer.

And this was supposed to show why I like DVCS, at least in the imperfect world where long-living branches occasionally happen, and the kind of reasoning I think is interesting in this context, and the kind of reasoning other people I came across found interesting. So there we are.

P.S. Why "most vexing"?

I thought I saw that "C++'s most vexing parse" from Scott Meyers' Effective STL has its own Wikipedia entry, but apparently it doesn't. It's basically a variation on the theme of C++'s declaration/definition ambiguity, and I liked the term, especially the "most" part where parses are unambiguously sorted along the vexing dimension. So I figured "X's most vexing Y" is a good template.

I'd like to use this opportunity to say that I skimmed through Effective C++, 3rd Edition, and… Where do I start? There's a piece of advice to create date objects with "Date d(Day(31), Month::april(), Year(2000))" or something. That is, create special types for the constructor arguments. Well, it doesn't check that April comes without the 31st day, does it? The Date constructor could test for it though. Well, why not test for April 41st in the Date constructor, too, and, ya know, spare the users some keystrokes, if you see what I mean? The code is verbose. C++ compiler error messages are verbose. VERBOSITY EVERYWHERE! Help!

This raises a question for the author: has he ever worked with a system where every piece of data comes covered with the toxic waste of overzealous static typing? But this borders on an ad hominem attack. And seriously, that sort of thing is to be avoided, at least until somebody proposes to have named constants for days or years and not just months.

So instead of the personal attack, I'll ask Software Development Consultants, all of them, to kindly change the phrasing "it's best to do so and so" to "I like so and so" or something. Because we have this huge crappy-dollar-sign crowd, and they copy style from each other like crazy, and their ultimate source of style is books by Software Development Consultants, and whenever they see a "best practice", their common sense is turned off and they add the technique to the bag of tricks. So Consultants have a great power in this world. It doesn't make the common sense shut-off feature their fault, but power they do have.

And with great power comes great responsibility, profanity deleted. I mean, you're obviously giving advice neither you nor others have tested for very long, out of generic principles, profanity deleted. Like "prefer algorithms such as for_each to loops", advice issued before fully compliant implementations of STL were even widely available, profanity deleted. Quite a piece of advice, that, profanity deleted. Couldn't you at least phrase your advice in a little less self-assured way, fucking profanity deleted?

For example, Meyers has finally lowered the bridge and let the enemy of template metaprogramming occupy a notable share of pages in an Effective C++ edition. I still remember his promise to "never write about templates", in the preface to Modern C++ Design, I think. And now the promise is broken. Hordes of clueless weenies are rushing into the minefield of template metaprogramming as we speak, since it's now officially Mainstream C++. Can you imagine the consequences? I can't. It's too awful. I think I'll go to sleep now.

Extreme Programming Explained

When I saw Kent Beck's "Extreme Programming Explained" in our office, I was shocked. I've already accepted the inevitable occasional purchasing of obscure C++ wisdom, exemplified by titles such as "Effective C++", "Exceptional C++", "Imperfect C++", "Modern C++ Design" and so on. Yikes. Oh well. As long as nobody uses the boost libraries in production code, they can entertain themselves by ordering whatever they want as far as I'm concerned.

But XP? A software development methodology? A programmer ordered a book on methodology? I have to find out who that was. Well, I found out and the shock became worse, because it was one of the cooler kids. Largely out of respect for that guy, I grabbed the book, took it home and read it. What follows is my book review. Spoiler: I say quite a few positive things there.

XP and hire/fire dynamics

I didn't look for quotes on this subject, it's just that all of them caught my eye. Maybe it's because HR implications of things are inherently interesting, maybe it's because everything else in a "methodology" is life-threateningly boring, and maybe both. Anyway, the quotes are:

"Given the choice between an extremely skilled loner and a competent-but-social programmer, XP teams consistently choose the more social candidate."

Gotta love the "but". Well, what can I say about this approach? This approach would prevent the best pieces of software and hardware I've seen developed from happening. In all those cases, there was one or more "extremely skilled loners". More on that later, when we get to code ownership, open spaces, pair programming and the like.

For now, it's sufficient to say that a methodology preferring mediocre to "extremely skilled" and calling itself "Extreme Programming" is, um, interesting. That "competent-but-social" is a euphemism for "mediocre" is as clear to me as the fact that mediocrity, while related to lack of talent, is first and foremost a personal value.

"Here's a sad but repeated story: a development team begins applying XP, dramatically improves quality and productivity, but then is disbanded, its leaders fired and the rest of the team scattered. Why does this happen? … The team's improved performance shifted the constraint elsewhere in the organization. The new constraint (e.g. marketing, who can't decide what they want fast enough) doesn't like the spotlight."

I have a fairly developed imagination, especially when it comes to organizational dysfunctions. For example, I can imagine how a team delivering a good result without working overtime will be appreciated less than a team delivering a bad result through heroic efforts (this example is from the book, too; of course the unappreciated team is the one "practicing" XP).

Getting fired over productivity? Because marketing can't invent features fast enough? What??

Now, that certainly challenges one's imagination skills. But we aren't the kind of people to turn down a challenge, are we? OK, lez do it. Imagine. Imagine an organization so rotten, so disgusting, so peculiarly brain-damaged that it can fire people because they work too fast to invent new work for them. Hooray! We did it! I see it right in front of my closed eyes!

But wait, what's that? Are those unfortunate victims of corporate idiocy Extreme Programmers? The competent-but-social programmers? If they're so "social", can't they see what a stinky hole they've landed in? You don't get fired from such places; you quit. I mean, realizing that you're in a hopeless human quagmire is among the most basic social skills – even I have it.

I failed the challenge. I can't imagine this picture.

Brad Jensen, Senior VP, Sabre Airline Solutions: "If programmers won't pair or if they insist on owning code, have the courage to fire them. The rest of the team will bail you out."

Aha. XP, the two-edged sword. Get fired for using XP or get fired for refusing to use it. Just ducky.

No, really. I don't want to be mean. I won't even pick on the VP's "courage to fire" wording. You know what – I'll even praise this wording. The employer depends on a strong programmer more than the programmer depends on them, and firing an evil, antisocial strong programmer is likely the most courageous act a manager can commit.

All I'm saying is, XP is basically this religious cult, as more quotes will show. This cult is spreading in your organization. They fire people. They easily fire exceedingly productive people, "skilled loners" that won't "pair" and insist on code ownership – nothing evil or antisocial. They fire people just for refusing to accept their most controversial "practices".

Could it be the real reason for the occasional extermination of productive XP teams, assuming Kent Beck is right and the disbanded XP teams he saw actually were productive? With all those "loners" I work with and the kind of "process" we use, I can easily imagine my reaction to a hypothetical XP epidemic at my workplace. Probably something along the lines of "Come and get me! Let's see who gets fired first."

XP "practices"

Why do I place the word "practice" in quotes? I dunno, do you know a better way to typographically mark stupidity? The word kinda reminds me of "spiritual practices", the religion thing again. But primarily it reminds me that there's just too much money in this industry. I mean, I have a relative who makes and sells ceramics. Neither she nor her employees use any "best practices". There's not enough money in ceramics for "practices"; you have to work. No time to listen to people talking about work, hence no market for their talks and writings.

This term is annoying on so many levels that I could go on for ages. For example, the words "best practices" and "questionable practices" convey an infinite amount of ignorant, arrogant idiocy. You could never pass an exam citing "best practices" – you are supposed to analyze the problem and prove your solution. But in the industry it's perfectly possible to get away with, and even get promoted by, substituting rules of thumb for analysis, irrelevant to the case in point as they may be.

OK, enough said about the term "practices"; maybe it's just me. Let's talk about some practices.

Sit Together

XP likes open space. I don't like open space. I'm exceptionally good at ignoring external events when I concentrate, to the point where it takes me 10 seconds to understand what you're saying when you interrupt me in the middle of something. But there's a limit to this autism, and when lots of conversations I can relate to, intellectually and emotionally, happen around me, I can't work.

Many people have talked about the state of "flow" and how you don't want to interrupt someone when they're in it because they're extremely productive, and how interrupting someone all the time means they work at a fraction of their full speed. Myself, I "value" communication, as this would be called in XP. I like to talk to people, and people like to talk to me. Lots of them. And much as I like it, I like flow, too, and closing the door in my room is quite a Best Practice.

To be fair, with XP, you won't be able to concentrate behind a closed door, either, because of…

Pair Programming

We'll start with some quotes:

"Personal hygiene and health are important issues when pairing. Cover your mouth when you cough. Avoid strong colognes that might affect your partner."

"When programmers aren't emotionally mature enough to separate approval from arousal, working with a person of the opposite gender can bring up sexual feelings that are not in the best interest of the team."

"In Figure 6 the man has moved closer to the woman than is comfortable for her. Neither is making his or her best technical decision at this point."

"Programmers that won't pair." Sounds more like "programmers that won't couple."

"I like to program with someone new every couple of hours…"

OK, OK, this is neither emotionally mature nor in the best interest of the team. Enough with this sort of quotes.

XP measures time in "pair-hours", since all production code must be written in pairs. Since we programmers are good at math, it's easy to see that this halves your task force. Kent Beck claims that pairs are more than 2x more productive than 2 people working alone, so it's a net gain. Is that really so, even if each of the 2 people can close the door?

I worked in teams of size 2 or 3, with everybody going pretty much at full speed, and I can't imagine gaining back the 2x spent on sitting together. Maybe if you're doing something so extraordinarily boring and trivial that you can barely move without talking to someone about it.

No, really, I think XP is for writing huge piles of straightforward code. And here's my proof.

Shared Code

As we've seen, in XP, there's no code ownership unless you want to get fired. The code is owned by the team, and everybody can change ("refactor") any piece of code, any time, and is encouraged to do so.

This concludes the triad of XP practices that I'll call "The Coupling Practices", both because of my emotionally immature amusement with the quotes above and because of the coupling between everything (it's one system owned by everyone, not separate modules with separate owners). The Coupling Practices are:

  • Open space
  • Work in pairs
  • No code ownership

Kent Beck raises just one objection to the shared code practice – people might act irresponsibly and make expedient changes. Well, a paranoid attitude towards coworkers is not one of my many sins; if you trust someone to work for your company, then you trust them to act responsibly most of the time, don't you think? My objections are different.

First, the coupling. I'm a great believer in Conway's Law: Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations. I'm the great believer that I am simply because I've never seen this law fail in practice. Strong module boundaries mostly appear at social boundaries – code ownership and responsibility – or at technical boundaries (different programming languages, different processes, different machines, kernel space/user space).

If everybody owns all modules, and unless each module is a separate kernel module, they'll quite likely become one big hairy program. And I don't like coupling, at least not in this sense. Sure, XP encourages refactoring, which should take care of the problem; I just never saw anything beat Conway's Law. To me, it's like defeating the law of gravity with spiritual practices.

My second objection is that shared ownership can't work except for trivial code, and this is the proof of the "XP is for trivial stuff" claim I've made above. Let me start with an example.

I used to think that ASIC engineers, the kind that write hardware description code in Verilog or VHDL, are sociopaths. I mean, the ones I've met were very helpful and patient when it came to explaining things to my feeble programmer mind, but apparently they couldn't stand each other. If one of them even touched another's code, the other guy would go to pieces. As a programmer, I was used to the Shared Code practice (you don't need XP to have a happy tar pit of coders amiably patching each other's stuff). So these hardware types scared me off.

And then I began to understand what it is you say in Verilog. In hardware, you have a bunch of variables, and you have code that computes the next values of variables given their current values. Hardware is like a huge recursive function of its registers' state, with the reset logic being the base of the recursion. Now, it's trivial to understand how a variable is updated – it's spelled using the usual arithmetic and logic and if-then-else and stuff. But understanding how all these updates propagate, and what will happen in, say, 4 cycles, and how the pipelines work together is, I dunno, impossible.

So you see, when you have this mental model of how this monstrous state machine works, and someone makes a change to it, you tend to be upset, because it's hard to read someone else's change and update your mental model. And why are they so sure their change makes sense in the first place? They can't possibly understand everything there is to understand! You could say that it's all irrelevant since XP is about software, and this is hardware, but to me, the line is quite blurred. It's code in a programming language. You can run this code on a simulator or an FPGA without manufacturing anything, just by copying bits. It's software.
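
In software terms, the mental model I'm describing looks roughly like the following hedged C sketch (the "registers" are invented): a state struct and a function computing the next state from the current one, applied once per clock cycle, with reset supplying the initial state. Each individual update is trivial to read; what several of them do together over a few cycles is the part that lives in someone's head.

  #include <stdio.h>

  /* Invented "registers" of a toy pipeline. */
  typedef struct {
    int fetch_pc;
    int decode_op;
    int execute_result;
    int stall;
  } machine_state;

  /* One clock cycle: every next value is computed from the current values.
     Each assignment is easy to read; their interaction across cycles isn't. */
  machine_state next_state(machine_state s)
  {
    machine_state n;
    n.stall          = (s.decode_op == 7);               /* some hazard */
    n.fetch_pc       = s.stall ? s.fetch_pc : s.fetch_pc + 1;
    n.decode_op      = s.stall ? s.decode_op : s.fetch_pc % 11;
    n.execute_result = s.execute_result + s.decode_op;
    return n;
  }

  int main(void)
  {
    machine_state s = { 0, 0, 0, 0 }; /* reset: the base of the recursion */
    for (int cycle = 0; cycle < 8; ++cycle) {
      printf("cycle %d: pc=%d op=%d result=%d stall=%d\n",
             cycle, s.fetch_pc, s.decode_op, s.execute_result, s.stall);
      s = next_state(s);
    }
    return 0;
  }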

Now, obviously we have the other extreme – code so straightforward that you can read it from top to bottom and understand it completely. Say, a one-screen script shoveling through the file system, doing simple regexpy parsing and printing statistics. You know, Practical Extraction and Report. They even have a Language for it. The question is, where is most software – is it like a Verilog module or like a screenful of Perl? (BTW, my Verilog and Perl guru is the same person.)

Well, let's see. Low-level interrupt handling and scheduling code is definitely like Verilog – touch my code and I'll go to pieces. Compilers are also like Verilog, because you have complicated algorithms and you can't just read them and say "aha, I see why this is doing what it's supposed to be doing". There are lots of heuristics and lots of knowledge about interaction of passes and complicated data structures. Optimized image processing code is also like Verilog, because there are all the precision considerations and knowledge about the target optimizer and platform. Computer vision code is also like Verilog… Everything I currently deal with is like that.

I mean, code is not like descriptive prose you'd put in a local wiki. You can't read it, say "I get it!", update it, and make the author happy with the "refactoring". Code is more like poetry: change this line, and now the next line doesn't rhyme, or you've broken the rhythm, or you've put angry words into a happy poem, that sort of trouble. Which is one reason to like code ownership.

It's probably useful to have a second programmer with "read access" to important pieces of code, so that at least 2 people can help debug each piece. You can get there with code reviews, and without the Coupling Practices, which I can't imagine working except for code so straightforward that I doubt there's much of its kind.

XP and CMM

Brad Jensen, Senior VP, Sabre Airline Solutions: "The pure XP projects have very few defects… (Even) the (impure) XP projects have very competitive defect rates, one to two defects per thousand lines of code. The Bangalore SPIN, consisting of ten CMM level-five organizations, reports an average of 8 defects per thousand lines of code."

"Defects per LOC". Interesting metric. How do we improve it? First, we don't record "defects". What's a "defect"? Are 10 reported problems caused by one defect or several? Depends on the "root cause analysis", which is of course up to us coders. The second important thing to do is to write more code. That guy working on the data-driven layout rendering should be taken out and shot. We have editing macros for that. You could squeeze 50K LOC out of rendering if it was spelled as code like it should be.

Who is stupid enough to use a metric encouraging people to misattribute problem reports, increase code size, or simply quit the silly job? Why, it's the Capability Maturity Model, of course. Meet XP's competitor: the dreadful CMM.

I could have received a certificate saying that I was trained in CMM, but I couldn't take it and asked my manager to bail me out (and so he did, THANKS!!). It was stupid to the point of physical pain. The CMM instructor, making $160/hour, talked like this: "Blah-blah-blah-PRACTICE!! Blah-blah-blah-ESTABLISH-AND-MAINTAIN!! Blah-blah-blah-GOAL!!" – yelling at some random word, so you couldn't even fall asleep in the absence of any kind of rhythm your brain could learn to ignore. The sleep deprivation and the countless repetitions caused some of the garbage to be ingrained in my memory. Here's an excerpt for ya.

CMM has 5 levels. CMM level 1 is where you operate right now: it denotes the ability to ship something. CMM level 5, the one mentioned in the interview from Kent Beck's book, denotes complete paralysis. To do anything, you have to write or update so many documents, have so many meetings with "relevant stakeholders", and to perform so many pointless measurements of the defects/LOC kind, that it's much more productive to just go postal and at least remove a CMM auditor or two from the face of the Earth in the process.

Levels 2 to 4 indicate various intermediate stages where paralysis is spreading, but you can still ship. For example, level 2 is called "Managed Process". "Managed Process is distinguished by the degree to which the process is Managed". I'm not making this up. There's a book called Capability Maturity Model Integration, and this book, heavy enough to kill a human, is full of this sort of stuff. Reading it is impossible.

And this is why I think XP is a great thing. (See? I promised I'll say positive things; and I'm not done yet!) I've actually read Extreme Programming Explained. About 75% of it seemed meaningless, but I made it through. The CMMI book, on the other hand, is pretty much infeasible.

CMM has hundreds of practices. XP has a couple of dozen. CMM has an extremely costly certification process. XP doesn't. CMM forces you to write a zillion documents. XP forces you to write a zillion tests, documentation of features, and story cards. CMM is about lengthening the development cycle. XP is about shortening the development cycle.

CMM is inflicted on you by customers who believe the lies of the worthless bastards from the Software Engineering Institute that their "process" will make you ship quality software. XP attempts to establish itself as an alternative legitimate "process" in the realm of suspicious customers and costly vendors. XP tries to sell commonsense stuff like automated testing to the desperate programmers working on a single-customer product with scarce resources. XP also tries to sell itself to the customer, relieving the desperate vendor's programmers from the insane, intolerable overhead of level 5 CMM paralysis.

XP brings hope to dark, wet, stinky corners around the world. I sincerely think it's great. The permanent brain damage of that CMM training course makes me admire XP.

But then there are people who work for product companies and inflict XP upon themselves. It's like paying taxes the government doesn't ask you to pay. XP is an alternative to CMM. You need it if your customer or manager requires you to use a ready-made methodology, out of lack of trust. If you don't have that problem, make yourself your own "process". You are blessed with an opportunity to just work, and organize your work as you go and as you see fit. Why "practice" stuff when you can just work?

How many product companies use CMM, XP or any other ready-made process, and how many make their own process? How many successful companies borrow existing "methodologies" without being forced to do so? I don't know the numbers, but my bet is that most of them don't.

XP and religion

"EMBRACE CHANGE"

This quote is right from the book cover. "Extreme Programming Explained. EMBRACE CHANGE." Does it freak you out the way it freaks me out? Maybe it's because of the cultural gap created by my Russian origins? Nay, I know plenty of English slogans I can relate to. Say, "Trust a condom". Beats "Embrace change" hands-down. Changes come in two flavors, good and bad. Should I "embrace" both kinds?

"Even programmers can be whole people in the real world. XP is an opportunity to test yourself, to be yourself, to realize that maybe you've been fine all along and just hanging with the wrong crowd."

Is this a religious cult or what?

"The key to XP is integrity, acting in harmony with my true values… The past five years have been a journey of changing my actual values into those I wanted to hold."

"Journey". Talking about being good. Do you like hippies? I like hippies more than nazis. I like XP more than CMM. But IMO the hippie world view and general style is suboptimal.

"With XP, I work to become worthy of respect and offer respect to others. I'm content to do my best and strive always to improve. I hold values I'm proud of and act in harmony with those values."

"I have seen people applying XP bring renewed hope to their software development and their lives. You know enough to get started. I encourage you to start right now. Think about your values. Make conscious choices to live in harmony with them."

It's a religion, people.

I'm not strictly against religion. I even have vague respect towards religious people, those stating that their purpose in life is to be good and that they know how to do it. (I similarly respect the intentions and discipline of various development methodology followers.) Of course when a religious person behaves as an asshole, it's way more annoying than a secular asshole is, because the latter at least doesn't state that he's all about being good and living right. (I noticed that many people who love to talk about "proper process" commit the worst atrocities.)

Some religions or sects are awful, to the point of human sacrifices. (CMM.) Most modern mainstream religions are kinda nice though, and they promote good Values and Practices, like not killing, not stealing and helping each other. (XP promotes automated testing, short build, integration and release cycles.) Many religions have peculiar regulations when it comes to sex. (XP has The Coupling Practices.) This is when religious people start to get annoying, to the point of throwing stones at people who are improperly dressed, not to mention engaging in forbidden sexual relationships. (Refuse to "pair" and XP gets you fired.)

Interestingly enough, the majority of religious people feel much stronger about the peculiar regulations of their religion than the universally recognized atrocities, which are also forbidden by their religion. They never gather around prisons to throw stones at convicts going on vacation, they are more likely to do so near a gay parade. (The Senior VP from Kent Beck's interview didn't recommend firing over lack of tests, which all of us would recognize as a crime; he recommended firing over the violation of The Coupling Practices.)

And this is when my patience towards religious people evaporates. So there's a gay parade. Got a problem with it? Shove it! I'm not gay, but the idea of harassing people based on this sort of criteria is disgusting. (I think I can blend in very well in a team engaging in pair programming and collective ownership, but I know awesome programmers who can't work that way; just try to get them fired.)

But still, religion is not all bad. I think that there are religious communities which are less inflicted with drug abuse compared to your average secular community. (It's not unlikely that many XP teams have more tests and less bugs than "secular" teams using no "standard" process.) Few people have the courage and the patience needed to openly attack religion; in particular, holy scriptures tend to be long and obscure. (Skim through an XP or especially a CMM book and try to stay awake or find refutable claims.)

Young people who aren't cynical enough to realize that bullshit is free, and hence everybody is going to bullshit you if only you let them, tend to be an easy target for religions seeking expansion. (When I started working as a programmer, I was shocked by how messy everything was, and cheerfully welcomed every "methodologist" about to create order out of the chaos. Today, even saying "Best Practice" near me is not recommended.)

Summary

I treat XP as a religious lifestyle (true of any methodology). XP is not a bad religion. Its primary virtue is the threat it poses to the cannibalistic cults such as the CMM. While I don't want to be anywhere near XP churches, I think religion is an inevitable attribute of human existence, and as far as religions go, XP is not at all bad.

OO C is passable

My problem with C++ bashing is that I'm overqualified. Ever since I've published C++ FQA Lite, I've tried to stay out of "C++ sucks" discussions. I've said everything before and I don't want to say it again. And then much of the C++ arcana is finally fading from my memory; good, no need to refresh it, thank you.

I don't always quit those discussions though. How should I do that? If I were someone else, I could send a link to the C++ FQA to end the discussion ("if you really care, check out yada yada"). But I can't use this link myself, because "OMG, did you actually write a whole site about this?!"

So the last time I didn't quit this discussion, I said: "You know, at some point you actually start to prefer plain C". The seasoned C++ lover replied: "You mean, the C with classes style, with no tricky stuff? Sure, don't we all end up writing most of the code like that?" No, said I, I meant plain C. No plus signs attached. File names ending with .c. C without classes. C.

"Now that is fanaticism," said the guy. "What could be the point of that? You know what? You may like C++ or you may dislike it, but face it: C just isn't good enough".

Fanaticism?! Now that is an insult. Yes, I recently decided to write a bunch of code in plain C inside a C++ code base. I did this after maintaining the previous, C++ implementation of that stuff for 4 years. For at least 3 of those years, I didn't do any significant rewriting in that code, because it could interfere with schedules and it would complicate merges. Although some pieces were hopelessly awful. Sloppy text parsing interleaved with interrupt handling (I exaggerate, but only very slightly). But it worked, so it was hardly urgent to fix it.

And then it had to be ported to a new platform. And I obsessed over the rewrite-or-extend question for a long time. There was a critical mass of incompatible new stuff, so I chose "rewrite". But. I decided to write the new version in C++. Why? Because it's a C++ code base, and people are used to C++ and its style and idioms, however brain-damaged. "So you, YOU will do it in C++ after all?! You're a wuss," said my manager, an old-time C++ hater. Time for the rhetorical question. Does this sound like the story of a fanatic?

OK then, why did I decide to do it in C after all, you may ask. I'll tell you why. I did it because everybody was sick and tired of the build time of my auto-generated C++ code.

You see, the whole thing is data-driven. The data objects describing its workload are generated at build time. Do you know any way of generating C++ code that doesn't compile slowly as hell? I don't. I've generated both "real" code (functions doing stuff) and "data definition" code (which does practically nothing except for calling object constructors). And it always compiles slowly. Sometimes turning optimization off speeds up compilation significantly, sometimes less significantly. But the best part is this: you never know exactly what your problem is. Try to profile a C++ compiler.

It's not that manually written C++ code is such a blast. For example, we have a ~2K LOC file defining a class template and explicitly instantiating it 4 times. It worked fine for years, until it met a particular version of the Green Hills C++ compiler. That version decided that it needs more than 1.5G of memory to compile that file. I don't know how much more exactly, because at that point my workstation suffocated, and could only breathe again when the process ran out of swap space and died. Here's another rhetorical question: how the fuck are you supposed to figure out what the fuck causes this problem?

What's that? "It's a tool problem, not a language problem?" Bzzzt, wrong answer! It's neither a language problem nor a tool problem; it's my problem, because I must fix it. And this is why I don't want to deal with a language that consistently breeds tools which create such problems for me. But since I'm hardly a fanatic, and I know exactly why I do want to work on this particular C++ code base, I hold my nose and I delve right into the pile of excrements and find out that if you instantiate each template in its own file, then the process memory consumption barely crosses the 350M mark. Nice and compact, that. So, let's use 4 files.

Nope, manually written C++ code isn't a picnic. But auto-generated code is worse, because it relies on some set of features and uses them a lot. The number of uses per feature per file matters. 1 explicit template instantiation per file = 350M of compiler process memory. 4 instantiations = out of memory. What about "simpler" features, but hundreds of uses? The compiler will grind to a halt for sure. Um, you might say, don't other languages have "features" which will be used hundreds of times by generated code? Yes, they do. Those features just DON'T SUCK quite as impressively. Go ahead, show me a problem with compilation speed anywhere near what C++ exhibits in another language.

Face it: C++ just isn't good enough. "If you really care about C++ parsing complexity, check out the FQA yada yada". I wrote "a whole site" about it, you know. Bottom line: if you generate native code, you want to define your object model such that the generated code can be C code. Assembly is pointlessly low-level and non-portable, and C++ sucks. Believe me, or die waiting for your code to compile.

So, C. My object model will be in C. Um. Bummer. It's an OO thing, with plugins and multiple inheritance and virtual inheritance. It has to be. You have orthogonal plugins which want to derive classes from a common base – a classic diamond hierarchy. Well, I can have a couple of macros for doing MI-style pointer arithmetic, by fetching the derived-type-specific offset of each base class object at run time. No big deal. I even like it better than the C++ MI downcasting syntax – at least you know exactly what you're doing, and you don't need to think whether it should be dynamic_cast or static_cast or eat_flaming_death_cast to really work.
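
Here's roughly what those macros look like – a toy sketch with made-up names, not the actual code, and ignoring the sharing of the common base that real virtual inheritance would buy you:

/* Each embedded base records, at init time, its offset from the start of
   the enclosing object, so "downcasting" is explicit pointer arithmetic. */
typedef struct base {
  int offset_to_derived; /* filled in by derived_init() */
  /* common state goes here */
} base;

typedef struct plugin_a { base b; int a_stuff; } plugin_a;
typedef struct plugin_b { base b; int b_stuff; } plugin_b;

typedef struct derived {
  plugin_a pa;
  plugin_b pb;
  int own_stuff;
} derived;

/* "upcast": just take the address of the embedded base subobject */
#define BASE_OF(objp, member) (&(objp)->member.b)

/* "downcast": walk back by the offset recorded in the base */
#define DERIVED_OF(basep) \
  ((derived*)((char*)(basep) - (basep)->offset_to_derived))

void derived_init(derived* d) {
  d->pa.b.offset_to_derived = (int)((char*)&d->pa.b - (char*)d);
  d->pb.b.offset_to_derived = (int)((char*)&d->pb.b - (char*)d);
}

DERIVED_OF is the moral equivalent of the C++ downcast, except you can see exactly what it costs.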

But I miss virtual functions. I really do. I sincerely think that each and every notable feature C++ adds to C makes the language worse, with the single exception of virtual functions. Here's why not having virtual functions in C sucks:

  • You can't quite fake them with C macros.
  • Virtual function call is a shortcut for obj->vtable->func(obj, args). The OO spelling – obj->func(args) – is of course better.
  • You'll usually try to make the C version shorter: obj->func(args), obj->vtable->func(args), or obj->func(obj, args). Quite likely you'll find out that you really needed to pass obj to func and/or the vtable level of indirection. Updating the code may be tedious/too late/really annoying (because of having to admit a stupid mistake). The eventual lack of call syntax uniformity will also be annoying.
  • Decent C++ debuggers automatically downcast base class object pointers to the real run time type when you inspect the object, even when C++ RTTI support is turned off at compile time. They do it by looking at the vtable pointer. Achieving this with OO C is only possible on a per-OO-faking-style, per-debugger basis, using the ugly debugger scripting facilities. Most likely, you won't do it and choose interactive suffering each time you debug the code, having to figure out the actual type yourself and cast pointers manually.
  • With virtual functions, base class implementations are inherited automatically. With explicit vtable structures, you need to either have links to base class implementations (the slow MFC message table way), or you need to explicitly and fully fill the vtables in each derived class. Possibly using the C default of trailing zeros in aggregate initializers as in vtable_type vtable={foo_func,bar_func} /* and the third member, baz_func, is 0 - we check for zero vtable entries before calling our pseudo-virtual functions */. Run time checks for null function pointers can make initialization code smaller, but they also make function calls slower. (A toy sketch of this vtable-faking style appears right after the list.)
  • With explicit vtable initializers, you only see the position of the function in the vtable initializer and its "derived class" name (my_class_baz_func), not its "base class" name (baz_func). You are likely to have a slightly inconsistent "derived class method" naming convention, making it annoying to figure out exactly which base class function we're overriding here.
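
Since I promised a sketch: here's the vtable faking boiled down to a toy example, with made-up names – not the actual code, just the shape of the thing:

#include <stdio.h>

typedef struct shape shape;

/* the explicit vtable struct; a zero entry means "not overridden" */
typedef struct shape_vtable {
  double (*area)(shape* self);
  void (*print)(shape* self);
} shape_vtable;

struct shape {
  const shape_vtable* vtable;
  double w, h;
};

static double rect_area(shape* self) { return self->w * self->h; }
static void rect_print(shape* self) { printf("rect %gx%g\n", self->w, self->h); }

/* aggregate initialization - no constructors, nothing runs at load time */
static const shape_vtable rect_vtable = { rect_area, rect_print };
static shape g_rect = { &rect_vtable, 3, 4 };

/* the pseudo-virtual call, obj->vtable->func(obj, args), with the
   run time check for zero entries mentioned above */
#define SHAPE_PRINT(obj) \
  do { if((obj)->vtable->print) (obj)->vtable->print(obj); } while(0)

int main(void) {
  printf("area=%g\n", g_rect.vtable->area(&g_rect));
  SHAPE_PRINT(&g_rect);
  return 0;
}

Note the aggregate initializers – that's exactly the part the following paragraphs are about.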

An impressive list, isn't it? You can see from it that I've really put my employer's money where my mouth is and actually worked with OO C for a while. Aren't C++ classes with virtual functions simply better? No, because C++ classes don't support aggregate initialization. If you have C structures with vtable pointers, you can use the frob_type g_obj={&vtable,5,"name"} initialization style. This translates to assembly that looks like so:

g_obj:
.word vtable
.word 5
.word .str34
.str34:
.asciz "name"

This compiles and loads as fast as it gets. Now, if you choose real C++ vtables, you rule out aggregate initialization once and for all. Your initialization will instead have to be spelled as frob_type g_obj(5, "name"), and even if you have no constructor arguments, C++ will generate a constructor to set the vtable pointer.

The good news: at least the explicit reference to the vtable in our source code is gone. The bad news: with many objects, the C++ version compiles very slowly (I've checked with GNU and Green Hills C++). It also generates a much larger image, since it emits both global data objects and assembly code copying them into object members at run time. The latter code also costs you load time. And if you crash there, good luck figuring out the context. But as far as I'm concerned, the worst part is the build time explosion.

Yes, yes. It's not important. And it's FUD. And it's a tool problem. A "good" compiler could optimize the C++ version and generate the same assembly as the C version. And it could do it quickly. Those compiler writers are just lame. I completely agree with you, sir. Just go away already.

By the way, the same trade-off happens with C++ containers, like std::vector. They're better than {T*base;int size;} structures because you have the shortcut of operator[] (as opposed to array->base[i]). And because debuggers can gracefully display all elements of std::vector as a list of the right size. Some of the debuggers can. Sometimes. Sometimes it breaks. But when it doesn't, it's fun. But, again, you can't use aggregate initialization once your structure has a std::vector in it. And C++0x won't solve it, because its pseudo-aggregate initializers are syntactic sugar, and my problem here isn't the syntax, it's the build time.

And std::vector forces allocation of data on the heap (let's not discuss custom allocator templates, 'cause I'm gonna vomit). Can't have the base address of a std::vector point to a global variable generated specifically to hold its data.

I like things to point to globals generated to hold their data. Helps a lot when you debug, because your pointer is now firmly associated with a symbol table name. And no matter what memory-corrupting atrocity was committed by buggy code, that association will be there to help you figure things out. And heap corruption is very common in C++, because it's a completely unsafe language. So I care a lot about debugging core dumps with corrupted memory. Which is why base, size structures get my vote.
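
For concreteness, here's the kind of thing a generator can emit with the base, size approach – hypothetical names, obviously:

/* a plain {base,size} pair instead of std::vector */
typedef struct int_array {
  const int* base;
  int size;
} int_array;

/* the generated data lives in a named global... */
static const int g_frob_table_data[] = { 3, 1, 4, 1, 5, 9 };

/* ...and the array object is aggregate-initialized to point at it, so
   everything is laid out at compile/link time, nothing runs at load time,
   and both the data and the pointer keep their symbol table names */
static const int_array g_frob_table = {
  g_frob_table_data,
  sizeof(g_frob_table_data)/sizeof(g_frob_table_data[0])
};

The price is spelling element access as g_frob_table.base[i]; the payoff is everything said above about core dumps.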

And that's an important point: you can live with just safety, or just control, and of course with both, but if you have neither safety nor control, then, sir, welcome to hell. Which is why OO C is worse than OO in Java or Lisp or PowerShell, but better than OO in C++.

And OO C is not all bad after all. Manually faking virtual functions has its benefits:

  • You can dynamically "change the object type" by changing the vtable pointer. I first saw this in POV-Ray, which has a vtable for circles and a vtable for ellipses. When a geometric transformation applied to a circle object transforms it to an ellipse, the vtable pointer is changed to point to the more generic and slower ellipse rendering functions. Neat. You could do this using C++-style OO by having another level of indirection, but with C, it's marginally faster, which can matter sometimes. And the C way is much neater, which is useful to balance the frustration caused by the drawbacks of fake OO. I use this trick a lot for debug plugins in my fake OO stuff. (A toy sketch of the trick appears right after the list.)
  • Likewise, you can overwrite individual vtable entries. This changes the type of all objects of that "class" – primarily useful for logging and other sorts of debugging.
  • Sometimes you really don't need obj->vtable->func(obj, args) – say, obj->func(args) is good enough. And then you win.
  • You don't have to use a structure with named members to represent a vtable. If a bunch of functions have the same prototype, you can keep them in an array. You can then iterate over them or use an index into that array as a "member function pointer". This way, you can have a function calling a method in a bunch of objects, and the method to call will be a parameter of that function. The C++ member function pointers don't support the iterate-over-methods trick as efficiently, and their syntax is remarkably ugly.
  • That each function has a unique non-mangled name (as opposed to A::func, B::func, etc.) has the benefit of making the symbol table clean and independent of the non-portable C++ mangling. And you no longer depend on the varying definition look-up abilities of debuggers and IDEs (I don't like the lengthy disambiguation menus when I ask to show me the definition of "func", and these menus show up too often).
  • If you serialize the objects, or you have programs shoveling through process snapshots and analyzing them, the memory image of OO C objects is easier to deal with because of being more portable. Basically you only need to know the target endianness, the alignment requirements of built-in types, sizeof(enum) and the implementation of bitfields, if you use that. With C++, you need to know the layouts of objects when multiple/virtual inheritance/empty base class optimization/etc. is at play; this isn't portable across compilers. And even that knowledge is only useful if you know the members of each class; otherwise, you need to parse the layouts out of non-portable debug information databases – parsing C++ source code is not an option. With C, you can parse its files and find the struct definitions and figure out the layouts and it will be really easy and portable. Been there, done that.
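
Here's the vtable-swapping trick from the first bullet as a self-contained toy – made-up names again, not POV-Ray's actual code:

#include <stdio.h>

typedef struct obj obj;
typedef struct obj_vtable { void (*render)(obj* self); } obj_vtable;

struct obj {
  const obj_vtable* vtable;
  double rx, ry;
};

static void render_circle(obj* o) { printf("fast circle, r=%g\n", o->rx); }
static void render_ellipse(obj* o) { printf("slow ellipse, %gx%g\n", o->rx, o->ry); }

static const obj_vtable circle_vtable = { render_circle };
static const obj_vtable ellipse_vtable = { render_ellipse };

static void scale_y(obj* o, double factor) {
  o->ry *= factor;
  if(o->rx != o->ry)
    o->vtable = &ellipse_vtable; /* the object just "changed its type" */
}

int main(void) {
  obj o = { &circle_vtable, 2, 2 };
  o.vtable->render(&o); /* fast circle */
  scale_y(&o, 1.5);
  o.vtable->render(&o); /* slow ellipse */
  return 0;
}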

Of course, you can use all those tricks in a C++ object model when needed, and use virtual functions elsewhere. And if C++ didn't suck so much (for example, if code compiled reasonably fast and if it had a standard ABI and and and…), that would be the way to go. But since C++ just isn't good enough, and I have to use a pure C object model, I point out the benefits of the sucky thing that is OO C to make it a little less bitter.

Overall, OO C is passable – certainly more passable than the endless rebuild cycles caused by our previous C++ object model. Sure, you put up with a bunch of lame stuff. Having to use a typedef because struct A can only be referred to as struct A, not A. No default function arguments. Having to declare variables at the beginning of the scope. (BTW, I use C89. I don't believe in C99. I'm a Luddite, you already know that, right? I don't think C99 is as portable as C89 here in 2008).

So, yeah. I do miss some C++ features, but it's a small itch here, a tiny scratch there, nothing serious. I can live with that as the price for not having my legs shot off by mysterious compile time explosions and other C++ goodies. OO C is good enough for me. Fanaticism? You be the judge.

Redundancy vs dependencies: which is worse?

I believe that there are just two intrinsic forces in programming:

  1. You want to minimize redundancy and, ideally, define every piece of knowledge once.
  2. You want to minimize dependencies – A should depend on B only if it absolutely must.

I think that all other considerations are of the extrinsic real-world kind – domain modeling, usability, schedules, platforms, etc. I also think that I can show how any "good" programming practice is mainly aimed at minimizing redundancy, dependencies, or both. I even think that you can tell a "good" programmer from a "bad" one by their attitude towards redundancy and dependencies. The good ones hate them, the bad ones don't care.

If this idea looks idiotically oversimplified, note that I mean "programming aptitude" in a narrow sense of code quality. I've seen brilliant, cooperative people with uncanny algorithmic capabilities who still wrote awful code. I tried to figure it out and the common denominator seemed to be that they just didn't care about redundancy or dependencies, or even kinda liked them. Maybe it still looks idiotically oversimplified. Let's leave it at that, because it's not what I'm here to talk about.

I'm here to talk about the case when minimizing redundancy conflicts with minimizing dependencies. This case is basically code reuse beyond module boundaries. You can choose between having modules A and B using a module C doing something, or have them do it themselves. What's your call?

One strikingly dumb thing about this question is that it's centered around the term "module", which is vague and informal. However, "module" is what makes this a trade-off. Inside a module, of course you want to reuse the code, end of discussion. Why would anyone want to parse two command line options with two duplicated code snippets when you could use a function?

On the other hand, if two modules parse command lines, we can still factor out the parsing code, but we'd have to make it a third module. Alternatively, we can stuff it into a "utilities" module. The one affectionately called "the trash can". The one which won't link without a bunch of external libraries used to implement some of its handy functions. The one with the configuration which always gets misconfigured, and the initialization which never happens at the right time. You know, the utilities module.

I believe that years of experience barely matter in terms of knowledge. You don't learn at work at a pace anywhere near that of a full-time student. Experience mainly does two things: it builds character, and it destroys character. Case in point: young, passionate programmers are usually very happy to make the third module, nor do they cringe when they delve into the utility trash can. They're then understandably offended when their more seasoned colleagues, having noticed their latest "infrastructural" activity, reach out for the barf bags. This certainly illustrates either the character-building or the character-destroying power of experience, I just don't know which one.

No, seriously. Take command line parsing. You want common syntax for options, right? And you want some of them to accept values, right? And those values can be strings, and booleans, and integers, right? And integers can be decimal or hexadecimal, right? And they can be values of user-defined types, right? And they can have help strings, right? And you'd like to generate help screens from them, right? And GUIs with property pages? And read them from configuration files? And check the legality of flags or sets of flags, right?

Sure. It's not a big deal. Trivial, even. (If you're smart, everything is trivial until you fail completely due to excessive complexity. And admit that you failed due to excessive complexity. The former takes time to happen, the latter can never happen.) Quite some people have devoted several of the beautiful months of their youth to the problem of argument passing. Example: XParam, which calls itself "The Solution to Parameter Handling". Took >10K LOC the last time I checked. Comes with its own serialization framework. Rumors tell that its original host project uses <5% of its features.

Clarification: I'm not mocking the authors of XParam. Reason 1: Rumors tell they are pretty sharp. Reason 2: I'm really, really ashamed to admit this, but I once worked on a logging library called XLog. Took >10K LOC the last time I counted. Came with its own serialization framework. First-hand evidence tells that its host project uses 0% of its features. Ouch.

You know how I parse command line arguments in my modules these days? Like so:

for(i=1; i<argc; ++i) { /* start at 1: argv[0] is the program name */
  if(strcmp(argv[i],"-trace")==0) {
    trace=1;
  }
}

I used C for the example because it's the ugliest language for string processing, and it's still a piece of cake. No, I don't get help screens. No, I don't get proper command line validation. So sue me. They're debugging options. It's good enough. Beats having everything depend on a 10K LOC command line parsing package. Not to mention a 50K LOC utility trash can full of toxic waste.

Modules are important. Module boundaries are important. A module is a piece of software that has:

  1. A reasonably compact, stable interface. An unfortunate side effect of OO training is that compact interfaces aren't valued. It's considered OK to expose an object model with tens of classes and hairy data structures and poorly thought-out extensibility hooks. Furthermore, an unfortunate side effect of C++ classes is that "stable interface" is an oxymoron. But no matter; nothing prevents you from implementing a compact, stable interface.
  2. Documentation. The semantics of the compact and stable interface are described somewhere. A really good module comes with example code. "Internal" modules lacking reasonably complete documentation suck, although aren't always avoidable.
  3. Tests. I don't think unit-testing each class or function makes sense. What has to be tested is the "official" module interfaces, because if they work, the whole module can be considered working, and otherwise, it can't.
  4. Reasonable size. A module should generally be between 1K and 30K LOC (the numbers are given in C++ KLOC units; for 4GLs, divide them by 4). Larger modules are a pile of mud. A system composed of a zillion tiny modules is itself a pile of mud.
  5. Owner. The way to change a module is to convince the owner to make a change. The way to fix a bug is to report it to the owner. That way, you can count on the interface to be backed up by a consistent mental model making it work. The number of people capable of simultaneously maintaining this kind of mental model was experimentally determined to be 1.
  6. Life cycle. Except for emergency bug fixes, changes to a module are batched into versions, which aren't released too frequently. Otherwise you can't have a tested, stable interface.

Pretty heavy.

Do I want to introduce a module to handle command line parsing? Do I really wish to become the honored owner of this bleeding-edge technology? Not now, thank you. Of course, it's the burnout speaking; it would be fun and it would be trivial, really. Luckily, not everyone around is lazy and grumpy like me. See? That guy over there already created a command line parsing module. And that other person here made one, too. Now, do I want my code to depend on their stuff?

Tough question. I've already compromised my reputation by refusing to properly deal with command line parsing. If my next move is refusing to use an Existing Solution, I will have proved the antisocial nature of my personality. Where's my team spirit? And still, I have my doubts. Is this really a module?

First and foremost, and this is the most annoying of all questions – who owns this piece of work? Sure, you thought it was trivial and you hacked it up. But are you willing to support it, or do you have someone in mind you'd like to transfer the ownership to? You probably don't even realize that it has to be supported, since it's so trivial. Oh, but you will. You will realize that it has to be supported.

I'm enough of a Cassandra to tell exactly when you'll realize it. It will be when the first completely idiotic change is made to your code by someone else. Quite likely, that change will tie it to another piece of "infrastructure", leading to a Gordian knot of dependencies. Ever noticed how infrastructure lovers always believe their module comes first in the dependency food chain and how it ultimately causes cyclic dependencies? So anyway, then, you'll understand. Of course, it will be too late as far as I'm concerned. My code got tied to yours, theirs, and everybody else's sticky infrastructure. Oopsie.

(The naive reader may ask, what can a command line parser possibly depend on? Oh, plenty of stuff. A serialization package. A parsing package. A colored terminal I/O package, for help screens. C++: a platform-specific package for accessing argc, argv before main() is called. C++: a singleton initialization management package. WTF is that, you ask? Get a barf bag and check out what Modern C++ Design has to say on the subject).

So no, I don't want to depend on anything without an owner. I see how this reasoning can be infuriating. Shifting the focus from software to wetware is a dirty trick loved by technically impotent pseudo-business-oriented middle-management loser types. Here's my attempt at distinguishing myself from their ilk: not only do I want to depend on stuff with an owner, but I require a happy owner at that. Contrary to a common managerial assumption (one of those which rarely hold but do keep managers sane), I don't believe in forcibly assigning ownership. If the owner doesn't like the module, expect some pretty lousy gardening job.

What about life cycle? My modules and your modules are released at different times. I might run into a need to check for compatibility with different versions of your stuff. My tests won't compile without your code, so now I need to always have it in my environment. What does your code depend on – is it a small, stable, defined set of things? I don't want to be stuck just because your new version drags in more dependencies. What if I need a feature? You don't seem to support floating point numbers. And I don't see abbreviations, either; I need them for a bunch of flags 'cause they're passed interactively all the time.

What about stable interface and semantics? Your previous version would accept command lines passing the same option multiple times, and take the last value. I have test scripts that count on that. Your new version reports an error, because now this syntax is reserved for lists (-frob=a -frob=b -frob=c passes the list a,b,c as the value of the frob option). Sigh. I guess it could be worse – you could make the string "a,b,c" from this command line, and then the problem would propagate deeper.

I could go on and on, about how your interface isn't a decent public interface (you don't seriously expect me to subclass EnumParser to handle flags with a fixed set of legal string values, do you?). Or about the size of your code, which at the moment dwarfs the size of my own module, so I'd have more command line parsing code than anything else in my tests. And how it hurts when you download the tests using a slow connection to the target machine. And how I don't like it when my tests crash inside your code, even when it's my fault, because you have hairy data structures that I don't like to inspect.

But you already got it – I don't want your code, because I'm an antisocial asshole that has no team spirit whatsoever. I'm going to parse arguments using 5 lines of C code. Worse, I'll make a function out of those five lines, prefix its name with the module name, and replicate it in all my modules. Yep, I'm the copy-paste programmer and you're the enlightened developer of the next generation command line parsing platform. Have it your way, and I'll have it my way.
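
In case the copy-paste version needs illustrating, it's this sort of thing, with the prefix changed per module (mymod and the flags are made up, of course):

#include <string.h>

static int mymod_trace; /* debugging options, off by default */
static int mymod_verbose;

/* replicated in every module, prefix and all; redundant, but it depends on nothing */
static void mymod_parse_args(int argc, char** argv) {
  int i;
  for(i=1; i<argc; ++i) {
    if(strcmp(argv[i],"-trace")==0) mymod_trace=1;
    if(strcmp(argv[i],"-verbose")==0) mymod_verbose=1;
  }
}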

To complete the bad impression I've been making here, I'll use a flawed real-world analogy. Think living organisms. They have a very serious Tower of Babel problem; redundancy is everywhere. I've heard that humans and octopuses have very similar eye structures, despite being descended from a blind common ancestor. Command line parsers?! Entire eyes, not the simplest piece of hardware with quite some neural complexity at the back-end, get developed independently. Redundancy can hardly get any worse. But it works, apparently better than coordinating everybody's efforts to evolve would work.

Redundancy sucks. Redundancy always means duplicated efforts, and sometimes interoperability problems. But dependencies are worse. The only reasonable thing to depend on is a full-fledged, real module, not an amorphous bunch of code. You can usually look at a problem and guess quite well if its solution has good chances to become a real module, with a real owner, with a stable interface making all its users happy enough. If these chances are low, pick redundancy. And estimate those chances conservatively, too. Redundancy is bad, but dependencies can actually paralyze you. I say – kill dependencies first.

I can't believe I'm praising Tcl

So the other day I both defined Tcl procedures all by myself and used them interactively, and I liked it, to the point where it felt like some kind of local optimum. This entry is an attempt to cope with the trauma of liking Tcl, by means of rationalizing it. I'll first tell the story of my fascinating adventures, and then I'll do the rationalizing (to skip the oh-so-personal first part, go straight for the bullet points).

Here's what I was doing. I have this board. The board has a chip on it. The chip has several processors in it. There's a processor in there that is a memory-mapped device, and I talk to it using CPU load/store commands. The CPU is itself a JTAG target, and I talk to it via a probe, issuing those load/store commands. To the probe I talk via USB using an ugly console that can't even handle the clipboard properly. The console speaks Tcl.

Net result: in order to talk to the memory-mapped processor, I have to speak Tcl. Or I can use a memory view window in a graphical debugger. Except some addresses will change the processor state when read (pop an item from a LIFO, that sort of thing). So no, memory view windows aren't a good idea – you have to aim for the specific address, not shoot at whole address ranges. Damn, just who thought of defining the hardware in such a stupid way? Why, it was me. Little did I know that I was thereby inflicting Tcl on myself.

Anyway, there's this bug I have to debug now, and as far as I know it could be in at least three different pieces of software or in two different pieces of hardware, and I don't like any of the 5 options very much, and I want to find out what it is already. So at the ugly probe console I type things like word 0x1f8cc000 (reads the processor status), word 0x1f8cc008 2 (halts the execution), word 0x1f8cc020 0x875 (places a breakpoint). I get sick and tired of this in about 2 minutes, when the command history is large enough to make the probability of hitting Enter at the wrong auto-completed command annoying. It's annoying to run the wrong command because if I ruin the processor state, it will take me minutes to reproduce that state, because the program reads input via JTAG, which is as slow as it gets.

So I figure this isn't the last bug I'm gonna deal with this way, and it's therefore time for Extending my Environment, equipping myself with the Right Tools for the Job, Tailored to my needs, utilizing the Scripting capabilities of the system. I hate that, I really do. Is there something more distressing than the development of development tools for the mass market of a single developer? Can a programmer have a weakness more pathetic than the tendency to solve easy generic meta-problems when the real, specific problems are too hard? Is there software more disgusting in nature than plug-ins and extensions for a butt-ugly base system? But you know what, I really fail to remember 0x1f8cwhat the breakpoint address is. This story has one hexadecimal value too much for my brain. OK, then. Tcl.

I decided to have one entry point procedure, pmem, that would get a memory-mapped processor id, 'cause there are many of them, and then call one of several functions with the right base address, so that pmem 0 pc would do the same as pmem_pc 0x1f8c0000. Well, in Tcl that's as simple as it gets. Tcl likes to generate and evaluate command strings. More generally, Tcl likes strings. In fact, it likes them more than anything else. In normal programming languages, things are variables and expressions by default. In Tcl, they are strings. abc isn't a variable reference – it's a string. $abc is the variable reference. a+b/c isn't an expression – it's a string. [expr $a+$b/$c] is the expression. Could you believe that? [expr $a+$b/$c]. Isn't that ridiculous?

In fact, that was one of my main applications for Tcl: ridiculing it. I remember reading the huge Tcl/Tk book by Brent Welch with my friend once. There was a power outage and it was past the time when the last UPS squeaked its last squeak. And the book was there 'cause the hardware guys use it for scripting their lovecraftian toolchain. We really did have fun. Tears went down my cheeks from laughter. Even people with the usual frightened/mean comments about those geeks who laugh their brains out over a Tcl book didn't spoil it. So, ridiculing Tcl, my #1 use for it. The other use is the occasional scripting of the hardware hackers' lovecraftian toolchain. Overall, I don't use Tcl very much.

The nice thing about Tcl is that it's still a dynamic language, and reasonably laconic at that, modulo quoting and escaping. So I enter the usual addictive edit/test cycle using tclsh < script. N minutes down the road (I really don't know what N was), I've finished my 2 screenfuls of Tcl and the fun starts. I actually start debugging the goddamn thing.

$ pmem 0 stat
IDLE
$ pmem 0 bkpt 0 0xbff
$ pmem 0 bkpt 1 0xa57
$ pmem 0 cmd run
$ pmem 0 stat
DEBUG
$ pmem 0 pc
0xbff
$ pmem 0 rstack
3 return addresses
addr 0: 0x0005
addr 1: 0x05a8
addr 2: 0x0766
$ pmem 0 cmd stp
$ pmem 0 pc
0xc00

Weeee! HAPPY, HAPPY, JOY, JOY!

You have no idea just how happy this made me. Yeah, I know, I'm overreacting. I'll tell you what: debug various kinds of hardware malfunction for several months, and you'll be able to identify with the warped notion of value one gains through such a process. On second thought, I don't know if I'd really recommend it. Remember how I told you low-level programming was easy? It is, fundamentally, but there's this other angle from which it's quite a filthy endeavor. I promise to blog about it. I owe it to the people who keep telling me "so low-level is easy?" each time they listen to me swear heartily at a degenerate hardware setup where nothing works no matter what you try. I owe it to myself – me wants to reach a closure here. Why should I tolerate being regularly misquoted at the moments of my deepest professional catharsis?

Aaaanyway, in just N minutes, I bootstrapped myself something not unlike a retarded version of gdb, the way it would work if the symbol table of my program was stripped. But no matter – I have addr2line for that. And the nice thing about my retarded debugger front-end is that it looks like shell commands: blah blah blah. As opposed to blah("blah","blah"). And this, folks, is what I think Tcl, being a tool command language, gets right.

I come from the world of pop infix languages (C/Java/Python/Ruby/you name it). Tcl basically freaks me out with its two fundamental choices:

  • Tcl likes literals, not variables. Tcl: string, $var. Pop infix: "string", var.
  • Tcl likes commands, not expressions. Tcl: doit this that, [expr $this+$that]. Pop infix: doit("this","that"), this+that.

So basically, pop infix languages (and I use the term in the most non-judgmental, factual way), pop infix languages are optimized for programming (duh, they are programming languages). Programming is definitions. Define a variable and it will be easy to use it, and computing hairy expressions from variables is also easy. Tcl is optimized for usage. Most of the time, users give simple commands. Command names and literal parameters are easy. If you are a sophisticated user, and you want to do pmem 0 bkpt [expr [pmem 0 pc] + 1], go ahead and do it. A bit ugly, but on the other hand, simple commands are really, really simple.

And eventually, simple commands become all that matters for the user, because the sophisticated user grows personal shortcuts, which abstract away variables and expressions, so you end up with pmem 0 bkpt nextpc or something. Apparently, flat function calls with literal arguments are what interactive program usage is all about.

I'm not saying that I'm going to use Tcl as the extension language of my next self-made lovecraftian toolchain (I was thinking more along the lines of doing that one in D and using D as my scripting language, 'cause it compiles fast enough and it's apparently high-level enough). I haven't thought enough about this, but the grotesque escaping/quoting in Tcl still freaks me out; I don't want to program like that. All I'm saying is that I like the interactive part. Specifically:

  • Short code matters a lot; short interactive commands matter much more.
  • An interactive command language must be a real language (loops, functions and all).
  • Tcl allows for the shortest commands and it's a real language. I'm fascinated.

Allow me to elaborate.

Short code vs short commands

Lots of people have noticed that keeping your code short is extremely important. More surprisingly, many people fail to notice this, probably because "1 line is better than 5" doesn't sound that convincing. OK, think about 100K lines vs 500K and you'll get the idea. Oh, there are also those dirty Perl/shell one-liners that make one doubt this. I've known a Bastard Programmer who used 2K bash one-liners as his weapon of choice. OK then, so the actual rule must be "short code is good unless it's written by a bastard". But it's the same core idea.

So we have the Architect type, who loves lots of classes which delegate work to each other, and we have the Enlightened type, who wants to write and read less. And the Enlightened type can rant and rave all day how Python, or Ruby, or Lisp make it oh-so-easy to define data structure literals, or to factor out stuff using meta-programming, or some other thing an Architect just never gets. And I'm all with it.

And then we have interactive shells. And in Python it's doit("xx","yy"). And in Lisp it's (doit "xx" "yy"), or (doit :xx :yy), or (doit xx yy) if you make it a macro. And in Ruby it's doit :xx :yy, if you use symbols and omit parens. And that's about as good as you can get without using your own parser as in doit "xx yy", which can suck in the (more rare) case when you do need to evaluate expressions before passing parameters, and doesn't completely remove overhead. Also note how all these languages use (), which makes you press Shift, instead of [] which doesn't. Ruby and Perl let you omit (), but it costs in readability. And [] is unanimously reserved for less important stuff than function calls.

The whole point of short code is saving human bandwidth, which is the single thing in a computing environment that doesn't obey Moore's law and doesn't double once in 18 months. Now, which kind of bandwidth is the most valuable? I'll tell you which. It's the interactive command bandwidth. That's because (1) you interact a lot with your tools and (2) this interaction isn't what you're trying to do, it's how you're trying to do it, so when it isn't extremely easy it's distracting and extremely frustrating.

This is why an editor that doesn't have short keyboard shortcuts for frequently used commands is a stupid fucking piece of junk and should go down the toilet right now. This is why a Matlab vector – [1 2 3] – is much better than a Python list – [1,2,3] (ever noticed how the space bar is much easier to hit than a comma, you enlightened dynamic language devotee? Size does matter). And don't get me started about further wrapping the vector literal for Numeric Python.

The small overhead is tolerable, though sucky, when you program, because you write the piece of code once and while you're doing it, you're concentrating on the task and its specifics, like the language syntax. When you're interacting with a command shell though, it's a big deal. You're not writing a program – you're looking at files, or solving equations, or single-stepping a processor. I have a bug, I'm frigging anxious, I gotta GO GO GO as fast as I can to find out what it is already, and you think now is the time to type parens, commas and quotation marks?! Fuck you! By which I mean to say, short code is important, short commands are a must.

Which is why I never got to like IPython or IDLE. Perhaps Ruby could be better, because of omitting parens and all. Ruby seems to be less afflicted with the language lawyer pseudo-right-thing mindset. But the basic plain vanilla function-call-with-literal-args syntax still doesn't reach the purity of *sh or Tcl. Well, the shell is an insanely defective programming language, so it's not even an option for anything non-interactive. But Tcl gets way closer to a programming language. Which brings us to the next issue:

Ad-hoc scripting languages – the sub-Turing tar pit

Many debuggers have scripting languages. gdb has one, and Green Hills MULTI has one. Ad hoc command languages usually get the command-syntax-should-be-easy part right – it's command arg arg arg… They then get everything else wrong. That is, you usually lack some or all of: data structures, loops, conditionals and user-defined functions, the option to evaluate expressions in all contexts, an interface to the host OS, and all the stuff which would basically make the thing a programming language. Or you get all those things in a peculiar, defective form which you haven't seen anywhere else.

I wish people stopped doing that. I understand very well why many people do that – they don't know any language which isn't a 3rd generation one (presumably C++ or Java). They don't know how scripting works except on a theoretical level. They know how to build a big software system, with objects and relationships between objects and factories of objects and stuff. At the system/outside world boundary they're helpless though. Outside of the system our objects are gone. There's this cold, windy, cruel world with users and files and stuff. Gotta have an AbstractInputParser to guard the gates into our nice, warm, little system, um, actually it's "big", no, make it "huge" system.

These are the Architects who get mocked by the Enlightened dynamic language lovers. They normally dismiss scripting languages as "not serious", therefore, when faced with the need to create a command language for their system, they start out with a plan to create a non-serious (a.k.a crippled) language. Even if they wanted to make it a good one, they never thought about the considerations that go into making a good scripting language, nor do they realize how easy/beneficial it is to embed an existing one.

So basically we have 3GL people, who realize that commands should be short ("it's a simple thing we're doing here"), but they don't see that you need a real Turing-complete programming language for the complicated cases. And we have 4GL people, who optimize for the complicated case of programming ("what's a scripting language – it's a programming language, dammit!"), and they don't care about an extra paren or quotation mark.

And then we have Tcl, which makes easy things really easy and scales to handle complicated cases (well, almost, or so I think). And not only does it make plain funcalls easy – it reserves [] for nested funcalls, in the Lispy prefix form of outercall arg [innercall arg arg] arg... [] is better than (). Pressing Shift sucks. And custom keyboard mapping which makes it possible to type parens without pressing Shift is complete idiocy, because you won't be able to work with anyone's machine. This shit matters, if you program all day long it does.

Now what?

I don't know if I'd use Tcl. It's less of a programming language than your typical pop infix 4GL. For starters, [expr] is a bitch. And then there are "advanced" features, like closures, that I think Tcl lacks. It has its interesting side from a "linguistic" perspective though. It has very little core syntax, making it closer to Lisp and Forth than to the above-mentioned pop infix ilk. So you can use Tcl and lay a claim to aristocracy. Of course you'll only manage to annoy the best programmers this way; the mediocre won't know what you're talking about, seeing only that Tcl doesn't look enough like C to be worth the name of a language.

I'd think a lot before embedding Tcl as a scripting language for my tools, because of linguistic issues and marketing issues (you ought to give them something close enough to C, whether they're a customer or a roommate). So the practical takeaways for me are modest:

  1. I ain't gonna mock Tcl-scriptable tools no more. I understand what made the authors choose Tcl, and no, it's not just a bad habit. On some level, they probably chose the best thing available today in terms of ease-of-programming/ease-of-use trade-off (assuming they bundle an interactive command shell). If I need to automate something related to those tools, I'll delve into it more happily than previously. So much for emotional self-tuning.
  2. I'll let it sink in, and try to figure out whether there's a better trade-off. For example, if Ruby had macros (functions which don't evaluate their inputs), you could say doit x y without making x and y symbol objects, which forces you to prefix them with a colon. How macros of that sort should work in an infix language escapes me (not that I think that much about it, but still). Anyway, I'll definitely add Tcl to the list of things I should understand better in order to fight my linguistic ignorance. Being an amateur compiler writer, that seems like one of my duties.

Python: teaching kids and biting bits don't mix

Today an important thing happened in my professional life. I was told to take a break from The Top Priority Project, so that I could deal with a more important project. The evaluation of the expression max_priority+1 caused my wetware registers to overflow. Therefore, you should consider the following piece of whining as an attempt of a brain to recover from deadline damage. That is, you won't get the kind of deep discussions and intellectual insights you've come to expect from yosefk.com. Instead, you'll get shallow, plain, simple and happy Python hate. That's the kind of thing we have in stock right now.

It has been known for quite some time that a species called the Idiot is out there, and its revolting specimens will mercilessly prey on your belief in the future of the human race. The time has come for me to share with you some of the shocking experiences from my encounters with idiots. Believe it or not, there exists a kind of Idiot who sincerely thinks that a Person should not complain about Technology or Tools he or she gets to use, and that such complaints are deeply wrong on this or another level, depending on the exact idiot in question. Well, I hereby inform the idiots, at least the ones who can use a browser and read text, that I (1) have a bad attitude, (2) am not a good workman, so (3) I'll complain about any technology I use as I damn please and (4) I don't give a flying fuck about your moronic opinion, so don't bother to share it with me.

So, Python hate. I don't really hate Python; it's a figure of speech. You know what I really hate? Of course you do. I hate C++. C++ is the hate of my life. I don't think I'll ever be able to hate another programming language. For me, to hate means to recognize that something is inherently evil and should be exterminated. Reaching that status is an outstanding achievement for any technology. Your typical piece of software never gets that far. All it does is do something you need most of the time, and then do something extremely brain-damaged some of the time. So you kinda like it, but sometimes it just pisses you off. It's gotta piss you off. It's a machine. It doesn't have a pigeon's crapload worth of common sense. Of course it pisses you off. Don't lie to me! A person who was exposed to machines and doesn't hate them is either an idiot or is completely devoid of soul! Step back, child of Satan!

You know what really pisses me off about Python? It's the combination of being BDFL-managed and having roots in the noble idea of being The Language for Teaching. Sure, having lexical scoping wouldn't hurt, and having name errors 5 minutes down the road in a program that happily parses already hurts, and Python shells aren't a picnic, blah blah blah. But nothing is perfect, you know, and people adapt to anything. I learned to live in tcsh and vim as my IDE. So I know how adaptability can bring you quite far, sometimes much farther than you wish to get. But this BDFL+teaching combo really bugs the hell out of me. Allow me to elaborate.

Ever heard about programming languages with an ANSI standard? Sure you did. Well, the other kind of languages have a BDFL standard. It's when a Benevolent Dictator For Life, typically the Punk Compiler Writer (PCW) who invented the language and threw together the initial (and frequently still the only practical) implementation, decides what happens with the language. I plan to blog about PCWs at some point, but this isn't about PCWs in general, this is about PCWs who've been elevated to the BDFL rank. So they bend the language as they damn please. Sometimes it splits the user community (as in Perl 5 vs Perl 6) and sometimes it doesn't (as in Python 2 and Python 3, or so it seems). I'd say that it's totally stupid to use a BDFL-governed language, but I can't, because that would offend C++, The Hate Of My Life, who does have an ANSI standard. Relax, darling. It's you, and only you, that I truly hate. The others mean nothing to me.

So that's what the "BDFL" part means. The "teaching" part is about Python being supposed to be a good (the best? of course, the best) language for teaching kids how to program. While I'm not sure whether the BDFL part is inherently wrong, the teaching part is sure as hell wrong, in two ways:

  1. Why the fuck should kids program?
  2. Why the fuck should I program in a language for kids?

I didn't program till the age of 17, and I have absolutely no regrets about it. I know quite some other programmers-who're-really-into-software-someone-call-for-help who didn't hack in their diapers, either. I also know a bunch of people who have programmed since they were 10 to 12 years old. They are all burnt out. They're trying to look for some other occupation. Theater, psychology, physics, philosophy, you name it. I haven't gathered that many data points, just a couple dozen. Some people won't fit the pattern I see. But it's enough for me to assume that kids could have better things to do than programming. Watching cartoons is one thing that sounds like fun. That I always liked.

I'm not sure kids shouldn't program. We need scientific data on that one. I'm no specialist, but locking a large enough set of kids in the basement and having them implement progressive radiosity sounds like a good start. Anyway, as you probably noticed, while I'm curious about the scientific truth, I don't care much about kids. I care about me.

Me. I'm a professional programmer. By which I mean to say, I shovel through piles of virtual shit for a living. And to shovel quickly and happily, I need a big shovel. Python is one of my shovels. Core dumps are one of my piles of shit. Python, meet the core dumps. Core dumps, meet the Python.

Core dumps are spelled in hexadecimal. I mean "core dumps", "spelled" and "hexadecimal" in the broadest sense, as in "things having to do with machine encoding and that sort of low-level crud are viewed and edited in tools that tend to emit and consume integers in hexadecimal notation". Or "core dumps are spelled in hexadecimal" for short.

To deal with hexadecimal, Python has a hex() function and a '%x' format string like it should. Well, almost. Python thinks that hexadecimal is like decimal, except in base 16 instead of 10. Integers are naturally spelled in sign magnitude format: 1, 10, -1, -10. In hexadecimal, that's 0x1, 0xa, -0x1, -0xa. "-0x1"? "-0x1"?!! FUCK YOU, PYTHON!!

You see, all computers that have been out there since the year 19-something store integers in 2's complement format. There used to be 1's complement and sign magnitude machines, but they are all dead, which is a good thing, because 2's complement is a good thing. In 2's complement, -1 in hexadecimal is 0xffffffff. If you use 32-bit integers. And twice the number of f's on 64-bit machines. Do I give a flying fuck about 64-bit machines? I think you know the answer. If I cared, I'd run Python on a 64-bit host. An old version of Python.
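
If you want to see what I mean outside of Python, here's the C view of the same thing – a tiny sketch assuming the usual 32-bit int:

#include <stdio.h>

int main(void) {
  /* %x prints the raw 2's complement bit pattern (as an unsigned value),
     which is what a bit biter wants to see for negative numbers */
  printf("%x\n", -1);  /* ffffffff on a machine with 32-bit ints */
  printf("%x\n", -10); /* fffffff6 */
  return 0;
}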

In old versions of Python, hex() worked about right. Python has ints and longs (that's fixnum and bignum in Lisp, and it would be int and BigInteger in Java if ints didn't overflow but would instead turn into BigIntegers when int wouldn't have enough bits). hex() assumed that if you're using ints, you're a bit biter, and you want to get 0xffffffff for -1, and you should get your non-portable, but handy result. If, on the other hand, you used longs, you probably were a mathematician, so you'd get -0x1. And this was juuuust riiiight.

However, starting from version 2.something, hex() was "fixed" to always do "The Right Thing" – return the portable sign magnitude string. That, my friends, is what a kid would expect. You should know – you were a kid yourself, didn't you expect just that?! Naturally, breaking the frigging backwards compatibility is perfectly OK with the PCW BDFL, who sure as hell doesn't want to add another function, sex() for "sexadecimal". That function could (preferably) do the new thing, and hex would do the old thing to not break the existing programs. Or it could (passably) do the old thing so that people could at least get the old functionality easily, even though hex() would no longer work. But noooo, we need just one hex function doing the one right thing. Too bad we only found out what that right thing was after all those years went by, but in retrospect, it's obvious that nobody needs what we've been doing.

Now, could anybody explain to me the value of hexadecimal notation outside of the exciting world of low-level bit fucking? Why on Earth would you want to convert your numbers to a different base? One particular case of this base brain rot is teaching kids to convert between bases. As Richard Feynman pointed out, and he had the authority to say so and I don't and hence the reference, converting between bases is completely pointless, and teaching kids how to do this is a waste of time. So what you do is you give me, the adult programmer with my adult core dump problems, a toy that a kid wouldn't need or want to play with. Thank you! You've found the perfect time, because I have a DEADLINE and this fucking shit CRASHES and I couldn't possibly have BETTER THINGS TO DO than WASTING TIME ON CONVERTING BASES!!

I know that I can implement the right (yes, THE RIGHT, damn it) hex and put it into my .pythonrc and that would solve the interactive shell annoyances and I could carry it around and import it and that would solve the non-interactive annoyances, thanks again. Until now I've done it in a cheaper way though – I had a Python 2.3.5 binary handy, and that did the trick. 2 Pythons, one with new libraries and shiny metaprogramming crap and stuff, and one with a working hex(). I like snakes.

Why do I attribute this business to kid-teaching, of all things? I don't lurk on Python forums, so I'm basing this on rumors I heard from die-hard Python weenies. Another, TOTALLY INFURIATING thing: a/b and a//b. int/int yields a frigging FLOAT in Python 3!! This has all the features of hex(-1) == '-0x1':

  • "Kids" get the wrong answer: 3/2 should give 1 (honest-to-God integer division) or it should give the ratio 3/2 (honest-to-God rational/real number division). Floating point is reeeaaally too complicated. I've seen a teacher saying online something along the lines of "if you ask how much is 3/2 times 2, how do you grade the answer 2.99999"? Bonus points if your reply managed to take the identity "2.999… = 3" into account (hordes of grown-ups fail to get that one).
  • "Programmers" get the wrong answer: chunks=len(arr)/n; rem=len(arr)%n. 'nuff said.
  • "Mathematicians" get the wrong answer: I mean, if you're doing numeric work in Python (what's the matter, can't pay for a Matlab license?), you probably know about integer division versus floating point division, but you've never heard about an operation called // (no, it's not a BCPL comment). 3/2=1.5? How am I supposed to do integer division? int(3/2)? Argh!

You know, I'm actually more willing to upgrade to C++0x, whenever that materializes, than I am to upgrade to Python 3. At least it would be "for(auto p=m.begin()…" instead of "for(std::map<std::string,std::string>::const_iterator p=FUCK OFF AND DIE YOU EVIL PROGRAMMING LANGUAGE FROM HELL!!".

What's that? Who said I liked C++0x more than I liked Python 3?! Upgrade, I said, "willing to upgrade"! Don't you see it's relative to the previous version? In absolute terms, who can beat C++?! Come back, C++! I still hate you!!

Side effects or not, aliasing kills you

My programming personality is very depressed right now. For a while, I've been away from my usual long-term stuff and back in the trenches. I'm doing simple things, but there are lots of them, so it feels like I'm not moving anywhere even though I go as fast as I can. It's like driving down a highway with a monotonous view.

Well, this has completely drained my brain. So I won't tell you anything in this blog – instead, I'll ask you a question. This way, it's OK for me to be wrong, which is even more likely than usual in my current state. Clever trick, isn't it?

My question basically is, how is not having side effects in your programming language going to help you write correct, and optimal, and parallel programs? I've already mentioned it, but this time, I'll elaborate on the point that I fail to get.

The problem with parallelism is scheduling. We want scheduling to be:

  1. Correct: when A depends on B, B should happen before A.
  2. Optimal: A, B and everything else should complete as soon as possible.

So, in order to correctly schedule basic blocks of code called A and B, I need to know if they depend on each other. If I have side effects, this is going to be hard. A modifies an object called X, and B modifies Y. Can X and Y be the same thing? To figure it out, you need whole-program analysis (WPA), and if your language is Turing-complete, WPA won't necessarily succeed, either.
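If you want this spelled out in code, here's a toy Python sketch (A, B, X and Y are of course made up, and lists stand in for whatever your real objects are):

  def A(x):
      x[0] += 1       # "A modifies an object called X"

  def B(y):
      y[0] *= 10      # "B modifies Y"

  # If X and Y are distinct objects, A and B are independent -- any order works:
  x, y = [1], [1]
  A(x); B(y)
  assert x == [2] and y == [10]

  # But if X and Y happen to alias the same object, the order matters:
  x = y = [1]
  A(x); B(y)
  assert y == [20]    # A first, then B

  x = y = [1]
  B(y); A(x)
  assert y == [11]    # B first, then A -- different result, so they depend on each other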

This is the aliasing problem. The aliasing problem rules. It's the problem that forces you to use restrict to optimize numeric code. It's the problem that forces you to use static typing if you want to optimize anything at all. And it's the problem that makes all safe languages completely and wildly unsafe when you have multiple threads. The rule of thumb is: if you have a good idea for compile-time or run-time anything, it won't work because of the aliasing problem.

Now, suppose we got rid of those pesky side effects. A doesn't modify X; instead, it creates a new object, Z. Similarly, B takes Y and makes W. Now we can prove that A doesn't depend on B. As soon as their inputs are ready, they can run. This takes care of the correctness problem – requirement #1 in the scheduling wish list above.

However, this didn't solve the aliasing problem – X and Y can still be the same. I suspect that the aliasing problem is going to raise its ugly head and kick our pretty butt. Specifically, while semantically we made Z from X, we'd really like to wipe out X and reuse its memory for hosting Z. Because that would be more efficient than copying X and modifying some of it and keeping two objects around. Now, we can't do this optimization if X and Y are the same thing, and we know nothing about the execution order of A and B, can we? If A replaced X with Z, B would get the wrong Y.
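To make the memory reuse problem concrete, here's a toy sketch in the same spirit (the names A_pure, B_pure and A_in_place are mine, and lists again stand in for the big objects):

  # The pure version: A makes Z from X, B makes W from Y, nothing is modified.
  def A_pure(x):
      return [x[0] + 1]

  def B_pure(y):
      return [y[0] * 10]

  x = y = [1]              # X and Y may alias -- the pure version doesn't care
  z = A_pure(x)
  w = B_pure(y)
  assert z == [2] and w == [10]

  # The tempting optimization: "wipe out X and reuse its memory for Z".
  def A_in_place(x):
      x[0] += 1
      return x             # Z is really X's storage, recycled

  x = y = [1]
  z = A_in_place(x)        # if X and Y alias and A happens to run first...
  w = B_pure(y)            # ...B reads Z where it expected Y
  assert w == [20]         # not the [10] the pure version produced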

And if we copy things, we aren't going to get an optimal scheduling, because A and B now run slower – problem #2, right?

Seriously, what do I miss? I realize that there are parallel execution environments where you have to copy things anyway because you have a zillion processors and a zillion processors can't have shared memory. I also realize that you can have few objects shared between tasks that run in parallel, so the overhead of copying them isn't very important. I realize all kinds of things making the aliasing problem a non-problem for side-effect-free parallelization frameworks or languages or whatever.

What I fail to see is, if you run in a single-box, multi-core system, and you do have shared memory, and you do have lots of shared data, how does removing side effects help you? With side effects, you can get an efficient program, if you can handle the debugging. Without side effects, you get a correct program, and then you must somehow optimize object allocation. And I just don't see how you'd go about that without compromising correctness.

Maybe I'm ignorant. Maybe I should have read about this somewhere. But where do I find a levelheaded discussion of, say, common implementation techniques of pure functional language compilers? I just keep bumping into "sufficiently smart compiler" arguments, and then there are people who don't care much about performance (and shouldn't because of their application domain).

Compare this to imperative languages. Anybody can write a pretty decent C compiler. If you don't optimize at all, you get what, a 4x slowdown compared to a state-of-the-art optimizer? And on a scalar RISC machine, I bet you can bridge 50%-70% of that gap with your cheapest shot at register allocation. A no-brainer – you need about 30% of a comp sci degree and you're all set. Now, how am I supposed to optimize allocation if there are no side effects? I mean, your spec or your family of specs should come with implementation techniques. "Useful" is good by itself, but it must also be implementable.

By the way, I'd love to find out that I'm wrong on this one. I've done lots of parallelism-related debugging, and it sucks. And maybe I am wrong. Maybe there just isn't much talk about the implementation details of these things. I do think my reasoning makes sense though.

Optimal processor size

I'm going to argue that high-performance chip designs ought to use a (relatively) modest number of (relatively) strong cores. This might seem obvious. However, enough money is spent on developing the other kinds of chips to make the topic interesting, at least to me.

I must say that I understand everyone throwing millions of dollars at hardware which isn't your classic multi-core design. I have an intimate relationship with multi-core chips, and we definitely hate each other. I think that multi-core chips are inherently the least productive programming environment available. Here's why.

Our contestants are:

  • single-box, single-core
  • single-box, multi-core
  • multi-box

With just one core, you aren't going to parallelize the execution of anything in your app just to gain performance, because you won't gain any. You'll only run things in parallel if there's a functional need to do so. So you won't have that much parallelism. Which is good, because you won't have synchronization problems.

If you're performance-hungry enough to need many boxes, you're of course going to parallelize everything, but you'll have to solve your synchronization problems explicitly and gracefully, because you don't have shared memory. There's no way to have a bug where an object happens to be accessed from two places in parallel without synchronization. You can only play with data that you've computed yourself, or that someone else decided to send you.

If you need performance, but for some reason can't afford multiple boxes (you run on someone's desktop or an embedded device), you'll have to settle for multiple cores. Quite likely, you're going to try to squeeze every cycle out of the dreaded device you have to live with just because you couldn't afford more processing power. This means that you can't afford message passing or a side-effect-free environment, and you'll have to use shared memory.

I'm not sure there's an inherent performance cost to message passing or to having no side effects. But if I try to imagine a large system with massive data structures implemented without side effects, it looks like you have to create copies of objects at the logical level. Of course, these copies can then be optimized out by the implementation; I just think that some of the copies will, in practice, be implemented straightforwardly.

I could be wrong, and would be truly happy if someone explained to me why. I mean, having no side effects helps analyze the data flow, but the language is still Turing-complete, so you don't always know when an object is no longer needed, right? So sometimes you have to make a new object and keep the old copy around, just in case someone needs it, right? What's wrong with this reasoning? Anyway, today I'll assume that you're forced to use mutable shared memory in multi-core systems for performance reasons, and leave this no-side-effects business for now.

Summary: multi-core is for performance-hungry people without a budget for buying computational power. So they end up with awful synchronization problems due to shared memory mismanagement, which is even uglier than normal memory mismanagement, like leaks or dangling references.

Memory mismanagement kills productivity. Maybe you disagree; I won't try to convince you now, because, as you might have noticed, I'm desperately trying to stay on topic here. And the topic was that multi-core is an awful environment, so it's natural for people to try to develop a better alternative.

Since multi-core chips are built for anal-retentive performance weenies without a budget, the alternative should also be a high-performance, cheap system. Since the clock frequency doesn't double as happily as it used to these days, the performance must come from parallelism of some sort. However, we want to remove the part where independent threads access shared memory. We can do one of two things:

  • Use one huge processor.
  • Use many tiny processors.

What does processor "size" have to do with anything? There are basically two ways of avoiding synchronization problems. The problems come from many processors accessing shared memory. The huge processor option doesn't have many processors; the tiny processors option doesn't have shared memory.

The huge processor would run one thread of instructions. To compensate for having just one processor, each instruction would process a huge amount of data, providing the parallelism. Basically we'd have a SIMD VLIW array, except it would be much much wider/deeper than stuff like AltiVec, SSE or C6000.

The tiny processors would talk to their neighbor tiny processors using tiny FIFOs or some other kind of message passing. We use FIFOs to eliminate shared memory. We make the processors tiny because large processors are worthless if they can't process large amounts of data, and large amounts of data mean lots of bandwidth, and lots of bandwidth means memory, and we don't want memory. The advantage over the SIMD VLIW monster is that you run many different threads, which gives more flexibility.

So it's either huge or tiny processors. I'm not going to name any particular architecture, but there were and still are companies working on such things, both start-ups and seasoned vendors. What I'm claiming is that these options provide less performance per square millimeter compared to a multi-core chip. So they can't beat multi-core in the anal-retentive performance-hungry market. Multiple cores and the related memory mismanagement problems are here to stay.

What I'm basically saying is, for every kind of workload, there exists an optimal processor size. (Which finally gets me to the point of this whole thing.) If you put too much stuff into one processor, you won't really be able to use that stuff. If you don't put enough into it, you don't justify the overhead of creating a processor in the first place.

When I think about it, there seems to be no way to define a "processor" in a "universal" way; a processor could be anything, really. Being the die-hard von Neumann machine devotee that I am, I define a processor as follows:

  • It reads, decodes and executes an instruction stream (a "thread")
  • It reads and writes memory (internal and possibly external)

This definition ignores at least two interesting things: that the human brain doesn't work that way, and that you can have hyper-threaded processors. I'll ignore both for now, although I might come back to the second thing some time.

Now, you can play with the "size" of the processor – its instructions can process tiny or huge amounts of data; the local memory/cache size can also vary. However, having an instruction processing kilobytes of data only pays off if you can normally give the processor that much data to multiply. Otherwise, it's just wasted hardware.

In a typical, actually interesting app, there aren't that many places where you need to multiply a zillion adjacent numbers at the same cycle. Sure, your app does need to multiply a zillion numbers per second. But you can rarely arrange the computations in a way that meets the time and location constraints imposed by having just one thread.

I'd guess that people who care about, say, running a website back-end efficiently know exactly what I mean; their data is all intertwined and messy, so SIMD never works for them. However, people working on number crunching generally tend to underestimate the problem. The abstract mental model of their program is usually much more regular and simple in terms of control flow and data access patterns than the actual code.

For example, when you're doing whiteboard run time estimations, you might think of lots of small pieces of data as one big piece. It's not at all the same thing; if you try to convince a huge SIMD array that N small pieces of data are in fact one big one, you'll get the idea.

For many apps, and I won't say "most" because I've never counted, but for many apps, data parallelism can only get you that much performance; you'll need task parallelism to get the rest. "Task parallelism" is when you have many processors running many threads, doing unrelated things.

One instruction stream is not enough, unless your app is really (and not theoretically) very simple and regular. If you have one huge processor, most of it will remain idle most of the time, so you will have wasted space in your chip.

Having ruled out one huge processor, we're left with the other extreme – lots of tiny ones. I think that this can be shown to be inefficient in a relatively intuitive way.

Each time you add a "processor" to your system, you basically add overhead. Reading and decoding instructions and storing intermediate results to local memory is basically overhead. What you really want is to compute, and a processor necessarily includes quite some logic for dispatching computations.

What this means is that if you add a processor, you want it to have enough "meat" for it to be worth the trouble. Lots of tiny processors is like lots of managers, each managing too few employees. I think this argument is fairly intuitive, at least I find it easy to come up with a dumb real-world metaphor for it. The huge processor suffering from "lack of regularity" problems is a bit harder to treat this way.

Since a processor has an "optimal size", it also has an optimal level of performance. If you want more performance, the optimal way to get it is to add more processors of the same size. And there you have it – your standard, boring multi-core design.

Now, I bet this would be more interesting if I could give figures for this optimal size. I could of course shamelessly weasel out of this one – the optimal size depends on lots of things, for example:

  • Application domain. x86 runs Perl; C6000 runs FFT. So x86 has speculative execution, and C6000 has VLIW. (It turns out that I use the name "Perl" to refer to all code dealing with messy data structures, although Python, Firefox and Excel probably aren't any different. The reason must be that I think of "irregular" code in general and Perl in particular as a complicated phenomenon, and a necessary evil).
  • The cost of extra performance. Will your customer pay an extra 80% for an extra 20% of performance? For an x86-based system, the answer is more likely to be "yes" than for a C6000-based system. If the answer is "yes", adding hardware for optimizing rare use cases is a good idea.

I could go on and on. However, just for the fun of it, I decided to share my current magic constants with you. In fact there aren't many of them – I think that you can use at most 8-16 of everything. That is:

  • At most 8-16 vector elements for SIMD operations
  • At most 8-16 units working in parallel for VLIW machines
  • At most 8-16 processors per external memory module

Also, 8-16 of any of these is a lot. Many good systems will use, say, 2-4 because their application domain is more like Perl than FFT in some respect, so there's no chance of actually utilizing that many resources in parallel.

I have no evidence that the physical constants of the universe make my magic constants universally true. If you know about a great chip that breaks these "rules", it would be interesting to hear about it.