
Ahem

To make an embarrassing story short:

  1. The merge scenario from the previous post doesn't work the way I said it works in any DVCS it was tried on. I talked about the case of "merging merges" – when two people resolve the same conflict independently in different clones, and then someone pulls from both clones. More specifically, I mentioned the case where the conflict was between two similar patches, and each of the two people took a different patch when resolving the conflict. I claimed BitKeeper would then remove both patches in the final merge. Well, I'm still sure it did work that way for me once (an older version of bk?.. some specifics I didn't notice?..) But bk 4 works differently; it seems to take the later conflict resolution (throwing away one of the patches in my scenario, but not both). Mercurial reportedly takes the earlier resolution, throwing away the other patch. Git and Bazaar reportedly require user intervention, possibly preventing damage by automerges at the cost of, well, requiring user intervention. bk and hg do manage to create a working version in my specific scenario, but of course throwing away one of the conflict resolutions isn't always safe that way, it's only safe in my scenario where both resolutions are basically equivalent. Anyway, while "merging merges" is specific to DVCS, no contemporary one seems to screw you nearly as badly as I described; "most vexing merge", oh, c'mon. Also, it would be easy to guess that they all deal with this scenario differently, because this whole business of merging is heuristical, what are the chances for different heuristics to do the same thing? And in general it's awfully lame to publish stuff first and check it later. I suck.
  2. And the overall excited mood of that article sucked, too, 'cause, like, c'mon, everybody knows that automerges can cut your fingers off, big deal, calm down. I mean, the worst merge-related bugs I dealt with came from automerges that any kind of version control system would allow. Two changes done in different files, that kind of thing. Coding in a "merge-friendly" way is something few people do, and it isn't that easy. For example, you basically must never change semantics of definitions. If your function didn't lock that semaphore, and now it does, then a call added in another branch, which was completely safe, can now cause a deadlock. So what are you going to do, modify the function name each time you change its "observable semantics"? Is everybody really that anal-retentive about it? I doubt that. But we all live with automerges because it's cheaper to deal with their occasional damage than with the constant damage of manual merges, which take lots of time and are intolerably boring, thus very error prone. Which is why I prefer distributed systems and their better ability to merge long-living branches due to detailed recording of change history, even though long-living branches are extremely harmful. Harmful as they are, they will occasionally flourish, and then you need strong automerge, not manual merge, to end their evil lives. But anyway, who cares about the preferences of a person who doesn't even bother to check his own trivially testable claims?

Blech.

DVCS and its most vexing merge

There's this very elegant way to shoot your leg off with a DVCS. Here's the recipe:

  1. I create a clone of the repository A and call it B. You create another clone of A and call it C.
  2. I add an if statement in B, fixing a bug. You fix the same bug in C in a very similar way.
  3. I pull your changes from C to B. There's a conflict around the bug since we both fixed it. I take my patch, throwing away your nearly identical patch.
  4. Meanwhile, your roommate pulled both B and C into his clone, D. And he had to resolve that conflict, too. And he took your patch, and threw mine away.
  5. Now, D and B are pushed back to A. DVCS quiz: what happens?

Answer: the system has accurately recorded the manual merges, so it has enough information to make this new merge automatically. It sees your patch, and it throws it away as it should according to my manual merge. It sees my patch, and it flushes it down the toilet since that's what your roommate said. Net result: both patches are gone, the bug is back in business. (Edit: it doesn't work that way – bk version 4 does a different thing, and other systems reportedly do still other things. Do you still want to read the rest of this?..)

Which of course doesn't matter, since it's immediately discovered by the massive Automated Test Suite. For example, if it's an OS kernel, each revision is automatically tested on all hardware configurations it should run on. And the whole process only takes 10 minutes, according to the Ten Minute Build XP Practice. No harm done, no reason to discuss it. I just thought it was a curious thing worth sharing.

Maybe it's a well-known thing, but I don't think it is, and if I'm right, it's definitely lovable. For example, here's what BitMover, maker of BitKeeper, the common ancestor of DVCSes, has to say about this:

"It's important to note that because BitKeeper has a star topology and its possible to share data with any repository, it's not necessarily recommended."

What this is trying to say is that the graph of pulls shouldn't be a generic graph and you're better off with a tree. That is, I shouldn't pull directly from you; we should both pull from and push to A. You and your roommate should also synchronize via A, or via A's "child" repository, but then you shouldn't push to A directly, only via that child, and so on. If we maintain this tree structure, the same conflict will never be resolved twice, and then we won't get screwed when the merges are merged.

I wonder if you could detect the situations when you "merge merges", that is, when the same conflict was resolved differently. You could then insist on human intervention and save those humans' bacon. I'm too lazy to think this out and too stupid to effortlessly see the answer, so I'll resort to a social heuristic like all of us uber-formal nerds do. The social heuristic is that Larry McVoy of BitMover has probably already thought about this, and found ways in which it could fail. So I'm not going to argue that BitKeeper merges are broken.

What I'm going to argue, at least for the next couple of paragraphs, is that it sucks when they tell you about their superstar topology and then explain that it's best to avoid actually using it. Not only that, but they fail to mention a fairly frightening and, trust me, not at all unlikely scenario which could actually persuade you to follow their advice.

Because when they tell me "we have this simple model of how things work – repositories with complete local history and changes flowing between them – but you should arbitrarily restrict yourself to a subset of this model, for reasons we aren't going to share with you, even though the general case works", when they tell me that, my reply is "I alias rm to rm -f". I understand how rm works, it's fairly simple, and I don't like to talk about it over and over again, "Are you sure?" – yes, I'm sure, thank you very much and good bye.

But the lovable part is, speaking of social heuristics, the lovable part is that BitMover said it right. Because if they mentioned that fairly frightening and not at all unlikely scenario, they'd scare people off rather than illustrate a point. On the other hand, when they say "It's good practice to think about how the data should flow", most people will nod and follow whatever advice they give them.

Just imagine a team of programmers engaging in the practice of thinking about how the data should flow, dammit, all on company time. Yeah, yeah, so BitKeeper earned a sarcastic comment on Proper Fixation. It's still a small price to pay for getting your message to the majority of your users.

You see, the majority of programmers are not just "irrational" as we all are, but their reliance on reasoning doesn't even exceed the mean of the population, which means they barely use reasoning at all, it's pure gut feeling.

For example, I was writing a bunch of macros in a proprietary debugger extension language. A guy who came to talk to me looked over my shoulder, and I explained, "Debugger macros. Very useful, a crappy language though." He said, "Yeah, looks like so."

HE COULDN'T POSSIBLY KNOW THAT. I knew he couldn't. How could he look at the code and realize that all variables were global? How could he know they were printed upon assignment, including loop counters ('cause it's a "macro", so it works just like assigning at the debugger console, which prints the variable)? He couldn't know any of that. So why did he agree with the "crappy" part? Oh, I knew why.

"You mean it has dollar signs?" Silence. "You mean it prefixes variable names with the dollar sign, don't you?" "Yeah, that." "Well, I like the dollar signs, helps you distinguish between your macro variables and the variables of the debugged C program. Anything else?" "Well, um, yeah, it looks kinda primitive." Low-end Ignorant Language Bigotry quiz: if "crappy" means "has dollar signs", what does "primitive" mean? Answer: no type declarations. I'm sure it was that, although I didn't go on and ask.

So that's "engineers" for you. If you want to write programs or tech docs for the average engineer, keep that picture in mind. Or don't. Aim for the minority, for people who don't work that way, under the assumption that they are the ones that actually matter the most. I don't know if this assumption is right, but at least it's lofty.

Why DVCS?

For the record, I had my share of both centralized and distributed version control systems, and today I like it distributed and I wouldn't ever want to go back, The Most Vexing Merge be damned. Why? I'll share my reasons with you through the story of My Last CVS To BitKeeper Exodus. I think I'll illustrate the engineers-and-reasoning-don't-mix point as a by-product, because that's how that story goes.

There was recently this argument about DVCS encouraging "code bombs", a.k.a. "crawling into a cave". I hadn't heard either of these terms before, so I'd been using a term of my own – "accumulating critical mass". The idea is to develop in your own corner, without integrating with the main branch. You then show up with N KLOC of new stuff and kindly ask to merge it in.

Some people claimed this was particularly harmful for open source projects where there was no managerial authority to prevent this. Ha! Double ha. In an open source project, the key maintainers may say, "you see, we can't integrate it when it's done this way; we're sorry, but you should have talked to us." The changes will then be reimplemented or dropped on the floor.

Now, if you think that in a commercial environment a manager can easily decide to drop changes on the floor, together with the cost of implementing them and especially the cost of delaying the delivery of the features, if you think that, well, I wonder why you do. But perhaps a manager could insist on frequent integration? She could try, but she'd have to deal with the real or imaginary cost of merges, increasing over time and getting in the way of deliveries. Of course there are perfect managers and perfect teams where it's all dealt with appropriately; you just have to find them.

So yeah, "code bombing" is a problem, especially in commercial projects. But the idea that DVCS encourages it? Hilarity! It's like saying that guns encourage murder. I prefer to think of guns as something that encourages fear of armed policemen, getting in the way of the natural instinct to club your neighbors to death. I mean, yeah, it's easier to code bomb with a DVCS, but with a centralized system, people use code bombing – or clubbing? – even more, because merging is harder, the cost of merges increases more quickly and the ability to force integration is thus lower. The criminals are poorly equipped, but so is the police.

And this is exactly what happened to the last team stuck with CVS that I helped migrate to BitKeeper. Everybody had their own version made up of file snapshots taken at different times and merged with the repository version to different extents. A centralized system doesn't record these local versions, so unless you immediately commit your merges, you are left with a version of a file which the system doesn't know. This means that the next merges are going to be really hard, because you've lost your GCA, the greatest common ancestor. So instead of a 3-way merge, you'll be doing a 2-way merge, which really sucks.

So I decided to not talk about the caves they were crawling into and the code bombs they were throwing at each other. Rather, I decided to show them how a 2-way merge couldn't do what a 3-way merge could. I still think it's the ultimate argument for DVCS, because DVCS is basically about accurate recording of all versions and not just the single time line of the main trunk. So the argument for it has to do with the benefits of such detailed recording.

So I gave this example where you start with a file having two lines:
aaa
bbb

And then I add a line, ccc, and you delete a line, aaa. If we have the GCA (a 3-way merge), then clearly the right result is deleting aaa and adding ccc, getting this:
bbb
ccc

But with a 2-way merge, we only have your file:
bbb

…and my file:
aaa
bbb
ccc

This can only be merged manually, because there's no way to automatically figure out that you deleted aaa and I added ccc; for all the tool knows, you could have done nothing and I've added two lines, so the right merge is:
aaa
bbb
ccc

…canceling your change. So it has to be a manual merge. A manual merge means dozens of boring deltas you have to inspect in each file. That's what I call "costly".

Of course it doesn't matter in a right world, where people integrate frequently and always commit their merged files to the centralized repository. Except it wasn't so in the wrong world of the CVS developers I was "helping" to upgrade to new tools (for the last time in my life, people, for the last time in my life). And I thought we could avoid the discussion of the somewhat-technical-but-largely-social reasons of the constantly increasing cost of merges, and instead we could focus on the technical benefits of the 3-way merge and accurate GCA recording.

And of course I was wrong. The discussion immediately shifted to "we don't need merges" because everything is "modular" and there's a single owner to each file. Of course it wasn't, and there wasn't. Some things were used by everybody, like the awful build scripts and the DMA code. Some modules had two owners, or were in a transition state and had 1.5 owners, and so on. There were merges all over the place.

And if there weren't merges and merge-related problems, how come everybody worked on their own "pirate" version which was never recorded in the main trunk and was made from a colorful variety of files partially merged at different times? How come changes propagated with cp and emacs-diff and not cvs update? And why was the tech lead so passionate about moving to BitKeeper which doesn't let you partially update a repository so you have to merge everything? And why did everybody anxiously object that necessity if there were "no problems with merges"?

The final result: the tech lead simply forced the migration to bk. Everybody on the team hated me for my connection with the idea (I wasn't on their team but I used to be a likable satellite and now became a hateful satellite). Developers who I thought were their best eventually (and I mean eventually) told me it was actually a good thing. So it wasn't a bad closure. And still, I decided that I'm not going to "help" anybody "deploy" any kind of "tool" in this lifetime again, roughly speaking. Too much emotion for this programmer.

And this was supposed to show why I like DVCS, at least in the imperfect world where long-living branches occasionally happen, and the kind of reasoning I think is interesting in this context, and the kind of reasoning other people I came across found interesting. So there we are.

P.S. Why "most vexing"?

I thought I saw that "C++'s most vexing parse" from Scott Meyers' Effective STL has its own Wikipedia entry, but apparently it doesn't. It's basically a variation on the theme of C++'s declaration/definition ambiguity, and I liked the term, especially the "most" part where parses are unambiguously sorted along the vexing dimension. So I figured "X's most vexing Y" is a good template.

I'd like to use this opportunity to say that I skimmed through Effective C++, 3rd Edition, and… Where do I start? There's this advice to create date objects with "Date d(Day(31), Month::april(), Year(2000))" or something. That is, create special types for the constructor arguments. Well, it doesn't check that April comes without the 31st day, does it? The Date constructor could test for it though. Well, why not test for April 41st in the Date constructor, too, and, ya know, spare the users some keystrokes, if you see what I mean? The code is verbose. C++ compiler error messages are verbose. VERBOSITY EVERYWHERE! Help!

This raises a question for the author: has he ever worked with a system where every piece of data comes covered with the toxic waste of overzealous static typing? But that borders on an ad hominem attack. And seriously, that sort of thing is to be avoided, at least until somebody proposes to have named constants for days or years and not just months.

So instead of the personal attack, I'll ask Software Development Consultants, all of them, to kindly change the phrasing "it's best to do so and so" to "I like so and so" or something. Because we have this huge crappy-dollar-sign crowd, and they copy style from each other like crazy, and their ultimate source of style is books by Software Development Consultants, and whenever they see a "best practice", their common sense is turned off and they add the technique to the bag of tricks. So Consultants have a great power in this world. It doesn't make the common sense shut-off feature their fault, but power they do have.

And with great power comes great responsibility, profanity deleted. I mean, you're obviously giving advice neither you nor others have tested for very long, out of generic principles, profanity deleted. Like "prefer algorithms such as for_each to loops", advice issued before fully compliant implementations of STL were even widely available, profanity deleted. Quite a piece of advice, that, profanity deleted. Couldn't you at least phrase your advice in a little less self-assured way, fucking profanity deleted?

For example, Meyers has finally lowered the bridge and let the enemy of template metaprogramming occupy a notable share of pages in an Effective C++ edition. I still remember his promise to "never write about templates", in the preface to Modern C++ Design, I think. And now the promise is broken. Hordes of clueless weenies are rushing into the minefield of template metaprogramming as we speak, since it's now officially Mainstream C++. Can you imagine the consequences? I can't. It's too awful. I think I'll go to sleep now.

OO C is passable

My problem with C++ bashing is that I'm overqualified. Ever since I've published C++ FQA Lite, I've tried to stay out of "C++ sucks" discussions. I've said everything before and I don't want to say it again. And then much of the C++ arcana is finally fading from my memory; good, no need to refresh it, thank you.

I don't always quit those discussions though. How should I do that? If I were someone else, I could send a link to the C++ FQA to end the discussion ("if you really care, check out yada yada"). But I can't use this link myself, because "OMG, did you actually write a whole site about this?!"

So the last time I didn't quit this discussion, I said: "You know, at some point you actually start to prefer plain C". The seasoned C++ lover replied: "You mean, the C with classes style, with no tricky stuff? Sure, don't we all end up writing most of the code like that?" No, said I, I meant plain C. No plus signs attached. File names ending with .c. C without classes. C.

"Now that is fanaticism," said the guy. "What could be the point of that? You know what? You may like C++ or you may dislike it, but face it: C just isn't good enough".

Fanaticism?! Now that is an insult. Yes, I recently decided to write a bunch of code in plain C inside a C++ code base. I did this after maintaining the previous, C++ implementation of that stuff for 4 years. For at least 3 of those years, I didn't do any significant rewriting in that code, because it could interfere with schedules and it would complicate merges. Although some pieces were hopelessly awful. Sloppy text parsing interleaved with interrupt handling (I exaggerate, but only very slightly). But it worked, so it was hardly urgent to fix it.

And then it had to be ported to a new platform. And I obsessed over the rewrite-or-extend question for a long time. There was a critical mass of incompatible new stuff, so I chose "rewrite". But. I decided to write the new version in C++. Why? Because it's a C++ code base, and people are used to C++ and its style and idioms, however brain-damaged. "So you, YOU will do it in C++ after all?! You're a wuss," said my manager, an old-time C++ hater. Time for the rhetorical question. Does this sound like the story of a fanatic?

OK then, why did I decide to do it in C after all, you may ask. I'll tell you why. I did it because everybody was sick and tired of the build time of my auto-generated C++ code.

You see, the whole thing is data-driven. The data objects describing its workload are generated at build time. Do you know any way of generating C++ code that doesn't compile slowly as hell? I don't. I've generated both "real" code (functions doing stuff) and "data definition" code (which does practically nothing except for calling object constructors). And it always compiles slowly. Sometimes turning optimization off speeds up compilation significantly, sometimes less significantly. But the best part is this: you never know exactly what your problem is. Try to profile a C++ compiler.

It's not that manually written C++ code is such a blast. For example, we have a ~2K LOC file defining a class template and explicitly instantiating it 4 times. It worked fine for years, until it met a particular version of the Green Hills C++ compiler. That version decided that it needs more than 1.5G of memory to compile that file. I don't know how much more exactly, because at that point my workstation suffocated, and could only breathe again when the process ran out of swap space and died. Here's another rhetorical question: how the fuck are you supposed to figure out what the fuck causes this problem?

What's that? "It's a tool problem, not a language problem?" Bzzzt, wrong answer! It's neither a language problem nor a tool problem; it's my problem, because I must fix it. And this is why I don't want to deal with a language that consistently breeds tools which create such problems for me. But since I'm hardly a fanatic, and I know exactly why I do want to work on this particular C++ code base, I hold my nose and I delve right into the pile of excrements and find out that if you instantiate each template in its own file, then the process memory consumption barely crosses the 350M mark. Nice and compact, that. So, let's use 4 files.

Nope, manually written C++ code isn't a picnic. But auto-generated code is worse, because it relies on some set of features and uses them a lot. The number of uses per feature per file matters. 1 explicit template instantiation per file = 350M of compiler process memory. 4 instantiations = out of memory. What about "simpler" features, but hundreds of uses? The compiler will grind to a halt for sure. Um, you might say, don't other languages have "features" which will be used hundreds of times by generated code? Yes, they do. Those features just DON'T SUCK quite as impressively. Go ahead, show me a problem with compilation speed anywhere near what C++ exhibits in another language.

Face it: C++ just isn't good enough. "If you really care about C++ parsing complexity, check out the FQA yada yada". I wrote "a whole site" about it, you know. Bottom line: if you generate native code, you want to define your object model such that the generated code can be C code. Assembly is pointlessly low-level and non-portable, and C++ sucks. Believe me, or die waiting for your code to compile.

So, C. My object model will be in C. Um. Bummer. It's an OO thing, with plugins and multiple inheritance and virtual inheritance. It has to be. You have orthogonal plugins which want to derive classes from a common base – a classic diamond hierarchy. Well, I can have a couple of macros for doing MI-style pointer arithmetic, by fetching the derived-type-specific offset of each base class object at run time. No big deal. I even like it better than the C++ MI downcasting syntax – at least you know exactly what you're doing, and you don't need to think whether it should be dynamic_cast or static_cast or eat_flaming_death_cast to really work.
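
For concreteness, here's roughly what such macros look like; this is a sketch with invented names, not my actual code. Each base subobject stores its own offset within the full object, and the "downcast" is plain pointer arithmetic on that offset, fetched at run time:

#include <stddef.h>

/* each base subobject records where it sits inside the full object */
typedef struct { size_t offset_in_full_obj; int a_stuff; } base_a;
typedef struct { size_t offset_in_full_obj; int b_stuff; } base_b;

typedef struct {
  base_a a;
  base_b b;
  int derived_stuff;
} derived;

/* "constructor": record the derived-type-specific offsets */
void derived_init(derived* d) {
  d->a.offset_in_full_obj = offsetof(derived, a);
  d->b.offset_in_full_obj = offsetof(derived, b);
  d->derived_stuff = 0;
}

/* downcast: works for any base that has an offset_in_full_obj member
   (base_ptr is evaluated twice - it's a sketch, not a library) */
#define FULL_OBJECT(base_ptr) \
  ((void*)((char*)(base_ptr) - (base_ptr)->offset_in_full_obj))

So given a base_b* that actually points into a derived object, FULL_OBJECT gets you back to the enclosing derived* with no help from the type system, which is roughly what the C++ MI casts do behind your back.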

But I miss virtual functions. I really do. I sincerely think that each and every notable feature C++ adds to C makes the language worse, with the single exception of virtual functions. Here's why not having virtual functions in C sucks (a sketch of the fake-vtable pattern follows the list):

  • You can't quite fake them with C macros.
  • Virtual function call is a shortcut for obj->vtable->func(obj, args). The OO spelling – obj->func(args) – is of course better.
  • You'll usually try to make the C version shorter: obj->func(args), obj->vtable->func(args), or obj->func(obj, args). Quite likely you'll find out that you really needed to pass obj to func and/or the vtable level of indirection. Updating the code may be tedious/too late/really annoying (because of having to admit a stupid mistake). The eventual lack of call syntax uniformity will also be annoying.
  • Decent C++ debuggers automatically downcast base class object pointers to the real run time type when you inspect the object, even when C++ RTTI support is turned off at compile time. They do it by looking at the vtable pointer. Achieving this with OO C is only possible on a per-OO-faking-style, per-debugger basis, using the ugly debugger scripting facilities. Most likely, you won't do it and choose interactive suffering each time you debug the code, having to figure out the actual type yourself and cast pointers manually.
  • With virtual functions, base class implementations are inherited automatically. With explicit vtable structures, you need to either have links to base class implementations (the slow MFC message table way), or you need to explicitly and fully fill the vtables in each derived class. Possibly using the C default of trailing zeros in aggregate initializers as in vtable_type vtable={foo_func,bar_func} /* and the third member, baz_func, is 0 - we check for zero vtable entries before calling our pseudo-virtual functions */. Run time checks for null function pointers can make initialization code smaller, but they also make function calls slower.
  • With explicit vtable initializers, you only see the position of the function in the vtable initializer and its "derived class" name (my_class_baz_func), not its "base class" name (baz_func). You are likely to have a slightly inconsistent "derived class method" naming convention, making it annoying to figure out exactly which base class function we're overriding here.
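
Since the list keeps referring to the same pattern, here's a minimal sketch of the fake-vtable style; the frob_type/vtable_type names echo the ones used in this post, the rest is invented for illustration:

typedef struct frob_type frob_type;

typedef struct {
  void (*foo_func)(frob_type* self, int arg);
  void (*bar_func)(frob_type* self);
  void (*baz_func)(frob_type* self); /* may be left 0 and checked before the call */
} vtable_type;

struct frob_type {
  const vtable_type* vtable;
  int id;
  const char* name;
};

/* a "derived class" overrides foo, reuses the base bar and has no baz */
static void my_class_foo_func(frob_type* self, int arg) { self->id += arg; }
static void bar_func(frob_type* self) { (void)self; /* base implementation */ }

static const vtable_type vtable = { my_class_foo_func, bar_func /* baz_func stays 0 */ };

/* aggregate initialization of a global object - no constructor code runs */
frob_type g_obj = { &vtable, 5, "name" };

/* the call spelling you end up with:
     g_obj.vtable->foo_func(&g_obj, 1);
     if(g_obj.vtable->baz_func) g_obj.vtable->baz_func(&g_obj);
*/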

An impressive list, isn't it? You can see from it that I've really put my employer's money where my mouth is and actually worked with OO C for a while. Aren't C++ classes with virtual functions simply better? No, because C++ classes don't support aggregate initialization. If you have C structures with vtable pointers, you can use the frob_type g_obj={&vtable,5,"name"} initialization style. This translates to assembly that looks like so:

g_obj:
.word vtable
.word 5
.word .str34
.str34:
.asciz "name"

This compiles and loads as fast as it gets. Now, if you choose real C++ vtables, you rule out aggregate initialization once and for all. Your initialization will instead have to be spelled as frob_type g_obj(5, "name"), and even if you have no constructor arguments, C++ will generate a constructor to set the vtable pointer.

The good news: at least the explicit reference to the vtable in our source code is gone. The bad news: with many objects, the C++ version compiles very slowly (I've checked with GNU and Green Hills C++). It also generates a much larger image, since it emits both global data objects and assembly code copying them into object members at run time. The latter code also costs you load time. And if you crash there, good luck figuring out the context. But as far as I'm concerned, the worst part is the build time explosion.

Yes, yes. It's not important. And it's FUD. And it's a tool problem. A "good" compiler could optimize the C++ version and generate the same assembly as the C version. And it could do it quickly. Those compiler writers are just lame. I completely agree with you, sir. Just go away already.

By the way, the same trade-off happens with C++ containers, like std::vector. They're better than {T*base;int size;} structures because you have the shortcut of operator[] (as opposed to array->base[i]). And because debuggers can gracefully display all elements of std::vector as a list of the right size. Some of the debuggers can. Sometimes. Sometimes it breaks. But when it doesn't, it's fun. But, again, you can't use aggregate initialization once your structure has a std::vector in it. And C++0x won't solve it, because its pseudo-aggregate initializers are syntactic sugar, and my problem here isn't the syntax, it's the build time.

And std::vector forces allocation of data on the heap (let's not discuss custom allocator templates, 'cause I'm gonna vomit). Can't have the base address of a std::vector point to a global variable generated specifically to hold its data.

I like things to point to globals generated to hold their data. Helps a lot when you debug, because your pointer is now firmly associated with a symbol table name. And no matter what memory-corrupting atrocity was committed by buggy code, that association will be there to help you figure things out. And heap corruption is very common in C++, because it's a completely unsafe language. So I care a lot about debugging core dumps with corrupted memory. Which is why base, size structures get my vote.
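
Here's the kind of thing I mean, as a sketch with hypothetical names; in real life the table below would be spat out by the build-time generator:

typedef struct {
  const int* base;
  int size;
} int_array;

/* generated code: the data lives in a named global... */
static const int g_frob_table_data[] = { 10, 20, 30 };

/* ...and the structure just points at it */
const int_array g_frob_table = {
  g_frob_table_data,
  sizeof(g_frob_table_data) / sizeof(g_frob_table_data[0])
};

Access is g_frob_table.base[i] for i in [0, g_frob_table.size), and when a core dump is a mess, the debugger still tells you that base points into g_frob_table_data, which is half the battle.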

And that's an important point: you can live with just safety, or just control, and of course with both, but if you have neither safety nor control, then, sir, welcome to hell. Which is why OO C is worse than OO in Java or Lisp or PowerShell, but better than OO in C++.

And OO C is not all bad after all. Manually faking virtual functions has its benefits (a sketch of the vtable-swapping trick follows the list):

  • You can dynamically "change the object type" by changing the vtable pointer. I first saw this in POV-Ray, which has a vtable for circles and a vtable for ellipses. When a geometric transformation applied to a circle object transforms it to an ellipse, the vtable pointer is changed to point to the more generic and slower ellipse rendering functions. Neat. You could do this using C++-style OO by having another level of indirection, but with C, it's marginally faster, which can matter sometimes. And the C way is much neater, which is useful to balance the frustration caused by the drawbacks of fake OO. I use this trick a lot for debug plugins in my fake OO stuff.
  • Likewise, you can overwrite individual vtable entries. This changes the type of all objects of that "class" – primarily useful for logging and other sorts of debugging.
  • Sometimes you really don't need obj->vtable->func(obj, args) – say, obj->func(args) is good enough. And then you win.
  • You don't have to use a structure with named members to represent a vtable. If a bunch of functions have the same prototype, you can keep them in an array. You can then iterate over them or use an index into that array as a "member function pointer". This way, you can have a function calling a method in a bunch of objects, and the method to call will be a parameter of that function. The C++ member function pointers don't support the iterate-over-methods trick as efficiently, and their syntax is remarkably ugly.
  • That each function has a unique non-mangled name (as opposed to A::func, B::func, etc.) has the benefit of making the symbol table clean and independent of the non-portable C++ mangling. And you no longer depend on the varying definition look-up abilities of debuggers and IDEs (I don't like the lengthy disambiguation menus when I ask to show me the definition of "func", and these menus show up too often).
  • If you serialize the objects, or you have programs shoveling through process snapshots and analyzing them, the memory image of OO C objects is easier to deal with because of being more portable. Basically you only need to know the target endianness, the alignment requirements of built-in types, sizeof(enum) and the implementation of bitfields, if you use that. With C++, you need to know the layouts of objects when multiple/virtual inheritance/empty base class optimization/etc. is at play; this isn't portable across compilers. And even that knowledge is only useful if you know the members of each class; otherwise, you need to parse the layouts out of non-portable debug information databases – parsing C++ source code is not an option. With C, you can parse its files and find the struct definitions and figure out the layouts and it will be really easy and portable. Been there, done that.
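
Here's a sketch of the vtable-swapping trick from the first bullet; the names are invented, it's not the actual POV-Ray code, just the shape of the idea:

typedef struct shape shape;

typedef struct {
  double (*area)(const shape* self);
} shape_vtable;

struct shape {
  const shape_vtable* vtable;
  double rx, ry; /* radii; equal for a circle */
};

static double circle_area(const shape* s)  { return 3.14159 * s->rx * s->rx; }
static double ellipse_area(const shape* s) { return 3.14159 * s->rx * s->ry; }

static const shape_vtable circle_vtable  = { circle_area };
static const shape_vtable ellipse_vtable = { ellipse_area };

void circle_init(shape* s, double r) {
  s->vtable = &circle_vtable;
  s->rx = r;
  s->ry = r;
}

/* a transformation that makes the circle non-circular just repoints
   the vtable to the slower, more general ellipse functions */
void scale_y(shape* s, double factor) {
  s->ry *= factor;
  if(s->rx != s->ry)
    s->vtable = &ellipse_vtable;
}

Overwriting a single entry in a (non-const) vtable for logging, as in the second bullet, is the same move one level down.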

Of course, you can use all those tricks in a C++ object model when needed, and use virtual functions elsewhere. And if C++ didn't suck so much (for example, if code compiled reasonably fast and if it had a standard ABI and and and…), that would be the way to go. But since C++ just isn't good enough, and I have to use a pure C object model, I point out the benefits of the sucky thing that is OO C to make it a little less bitter.

Overall, OO C is passable, more so than the endless rebuild cycles caused by our previous C++ object model. So you have a bunch of lame stuff. Having to use a typedef because struct A can only be referred to as struct A, not A. No default function arguments. Having to declare variables at the beginning of the scope. (BTW, I use C89. I don't believe in C99. I'm a Luddite, you already know that, right? I don't think C99 is as portable as C89 here in 2008).
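
For the record, the lameness looks like this (a made-up C89 fragment):

typedef struct point point; /* without the typedef it's "struct point" everywhere */
struct point { int x, y; };

int manhattan(const point* p) {
  int dx; /* all declarations up front, C89 style */
  int dy;
  dx = p->x < 0 ? -p->x : p->x;
  dy = p->y < 0 ? -p->y : p->y;
  return dx + dy; /* and no default arguments, obviously */
}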

So, yeah. I do miss some C++ features, but it's a small itch here, a tiny scratch there, nothing serious. I can live with that as the price for not having my legs shot off by mysterious compile time explosions and other C++ goodies. OO C is good enough for me. Fanaticism? You be the judge.

Redundancy vs dependencies: which is worse?

I believe that there are just two intrinsic forces in programming:

  1. You want to minimize redundancy and, ideally, define every piece of knowledge once.
  2. You want to minimize dependencies – A should depend on B only if it absolutely must.

I think that all other considerations are of the extrinsic real-world kind – domain modeling, usability, schedules, platforms, etc. I also think that I can show how any "good" programming practice is mainly aimed at minimizing redundancy, dependencies, or both. I even think that you can tell a "good" programmer from a "bad" one by their attitude towards redundancy and dependencies. The good ones hate them, the bad ones don't care.

If this idea looks idiotically oversimplified, note that I mean "programming aptitude" in a narrow sense of code quality. I've seen brilliant, cooperative people with uncanny algorithmic capabilities who still wrote awful code. I tried to figure it out and the common denominator seemed to be that they just didn't care about redundancy or dependencies, or even kinda liked them. Maybe it still looks idiotically oversimplified. Let's leave it at that, because it's not what I'm here to talk about.

I'm here to talk about the case when minimizing redundancy conflicts with minimizing dependencies. This case is basically code reuse beyond module boundaries. You can choose between having modules A and B using a module C doing something, or have them do it themselves. What's your call?

One strikingly dumb thing about this question is that it's centered around the term "module", which is vague and informal. However, "module" is what makes this a trade-off. Inside a module, of course you want to reuse the code, end of discussion. Why would anyone want to parse two command line options with two duplicated code snippets when you could use a function?

On the other hand, if two modules parse command lines, we can still factor out the parsing code, but we'd have to make it a third module. Alternatively, we can stuff it into a "utilities" module. The one affectionately called "the trash can". The one which won't link without a bunch of external libraries used to implement some of its handy functions. The one with the configuration which always gets misconfigured, and the initialization which never happens at the right time. You know, the utilities module.

I believe that years of experience barely matter in terms of knowledge. You don't learn at work at a pace anywhere near that of a full-time student. Experience mainly does two things: it builds character, and it destroys character. Case in point: young, passionate programmers are usually very happy to make the third module, nor do they cringe when they delve into the utility trash can. They're then understandably offended when their more seasoned colleagues, having noticed their latest "infrastructural" activity, reach out for the barf bags. This certainly illustrates either the character-building or the character-destroying power of experience, I just don't know which one.

No, seriously. Take command line parsing. You want common syntax for options, right? And you want some of them to accept values, right? And those values can be strings, and booleans, and integers, right? And integers can be decimal or hexadecimal, right? And they can be values of user-defined types, right? And they can have help strings, right? And you'd like to generate help screens from them, right? And GUIs with property pages? And read them from configuration files? And check the legality of flags or sets of flags, right?

Sure. It's not a big deal. Trivial, even. (If you're smart, everything is trivial until you fail completely due to excessive complexity. And admit that you failed due to excessive complexity. The former takes time to happen, the latter can never happen.) Quite some people have devoted several of the beautiful months of their youth to the problem of argument passing. Example: XParam, which calls itself "The Solution to Parameter Handling". Took >10K LOC the last time I checked. Comes with its own serialization framework. Rumors tell that its original host project uses <5% of its features.

Clarification: I'm not mocking the authors of XParam. Reason 1: Rumors tell they are pretty sharp. Reason 2: I'm really, really ashamed to admit this, but I once worked on a logging library called XLog. Took >10K LOC the last time I counted. Came with its own serialization framework. First-hand evidence tells that its host project uses 0% of its features. Ouch.

You know how I parse command line arguments in my modules these days? Like so:

for(i=1; i<argc; ++i) { /* argv[0] is the program name */
  if(strcmp(argv[i],"-trace")==0) {
    trace=1;
  }
}

I used C for the example because it's the ugliest language for string processing, and it's still a piece of cake. No, I don't get help screens. No, I don't get proper command line validation. So sue me. They're debugging options. It's good enough. Beats having everything depend on a 10K LOC command line parsing package. Not to mention a 50K LOC utility trash can full of toxic waste.

Modules are important. Module boundaries are important. A module is a piece of software that has:

  1. A reasonably compact, stable interface. An unfortunate side effect of OO training is that compact interfaces aren't valued. It's considered OK to expose an object model with tens of classes and hairy data structures and poorly thought-out extensibility hooks. Furthermore, an unfortunate side effect of C++ classes is that "stable interface" is an oxymoron. But no matter; nothing prevents you from implementing a compact, stable interface.
  2. Documentation. The semantics of the compact and stable interface are described somewhere. A really good module comes with example code. "Internal" modules lacking reasonably complete documentation suck, although aren't always avoidable.
  3. Tests. I don't think unit-testing each class or function makes sense. What has to be tested is the "official" module interfaces, because if they work, the whole module can be considered working, and otherwise, it can't.
  4. Reasonable size. A module should generally be between 1K and 30K LOC (the numbers are given in C++ KLOC units; for 4GLs, divide them by 4). Larger modules are a pile of mud. A system composed of a zillion tiny modules is itself a pile of mud.
  5. Owner. The way to change a module is to convince the owner to make a change. The way to fix a bug is to report it to the owner. That way, you can count on the interface to be backed up by a consistent mental model making it work. The number of people capable of simultaneously maintaining this kind of mental model was experimentally determined to be 1.
  6. Life cycle. Except for emergency bug fixes, changes to a module are batched into versions, which aren't released too frequently. Otherwise you can't have a tested, stable interface.

Pretty heavy.

Do I want to introduce a module to handle command line parsing? Do I really wish to become the honored owner of this bleeding-edge technology? Not now, thank you. Of course, it's the burnout speaking; it would be fun and it would be trivial, really. Luckily, not everyone around is lazy and grumpy like me. See? That guy over there already created a command line parsing module. And that other person here made one, too. Now, do I want my code to depend on their stuff?

Tough question. I've already compromised my reputation by refusing to properly deal with command line parsing. If my next move is refusing to use an Existing Solution, I will have proved the antisocial nature of my personality. Where's my team spirit? And still, I have my doubts. Is this really a module?

First and foremost, and this is the most annoying of all questions – who owns this piece of work? Sure, you thought it was trivial and you hacked it up. But are you willing to support it, or do you have someone in mind you'd like to transfer the ownership to? You probably don't even realize that it has to be supported, since it's so trivial. Oh, but you will. You will realize that it has to be supported.

I'm enough of a Cassandra to tell exactly when you'll realize it. It will be when the first completely idiotic change is made to your code by someone else. Not unlikely, a change tying it to another piece of "infrastructure", leading to a Gordian knot of dependencies. Ever noticed how infrastructure lovers always believe their module comes first in the dependency food chain and how it ultimately causes cyclic dependencies? So anyway, then, you'll understand. Of course, it will be too late as far as I'm concerned. My code got tied to yours, theirs, and everybody else's sticky infrastructure. Oopsie.

(The naive reader may ask, what can a command line parser possibly depend on? Oh, plenty of stuff. A serialization package. A parsing package. A colored terminal I/O package, for help screens. C++: a platform-specific package for accessing argc, argv before main() is called. C++: a singleton initialization management package. WTF is that, you ask? Get a barf bag and check out what Modern C++ Design has to say on the subject).

So no, I don't want to depend on anything without an owner. I see how this reasoning can be infuriating. Shifting the focus from software to wetware is a dirty trick loved by technically impotent pseudo-business-oriented middle-management loser types. Here's my attempt at distinguishing myself from their ilk: not only do I want to depend on stuff with an owner, but I require a happy owner at that. Contrary to a common managerial assumption (one of those which rarely hold but do keep managers sane), I don't believe in forcibly assigning ownership. If the owner doesn't like the module, expect some pretty lousy gardening job.

What about life cycle? My modules and your modules are released at different times. I might run into a need to check for compatibility with different versions of your stuff. My tests won't compile without your code, so now I need to always have it in my environment. What does your code depend on – is it a small, stable, defined set of things? I don't want to be stuck just because your new version drags in more dependencies. What if I need a feature? You don't seem to support floating point numbers. And I don't see abbreviations, either; I need them for a bunch of flags 'cause they're passed interactively all the time.

What about stable interface and semantics? Your previous version would accept command lines passing the same option multiple times, and take the last value. I have test scripts that count on that. Your new version reports an error, because now this syntax is reserved for lists (-frob=a -frob=b -frob=c passes the list a,b,c as the value of the frob option). Sigh. I guess it could be worse – you could make the string "a,b,c" from this command line, and then the problem would propagate deeper.

I could go on and on, about how your interface isn't a decent public interface (you don't seriously expect me to subclass EnumParser to handle flags with a fixed set of legal string values, do you?). Or about the size of your code, which at the moment dwarfs the size of my own module, so I'd have more command line parsing code than anything else in my tests. And how it hurts when you download the tests using a slow connection to the target machine. And how I don't like it when my tests crash inside your code, even when it's my fault, because you have hairy data structures that I don't like to inspect.

But you already got it – I don't want your code, because I'm an antisocial asshole that has no team spirit whatsoever. I'm going to parse arguments using 5 lines of C code. Worse, I'll make a function out of those five lines, prefix its name with the module name, and replicate it in all my modules. Yep, I'm the copy-paste programmer and you're the enlightened developer of the next generation command line parsing platform. Have it your way, and I'll have it my way.

To complete the bad impression I've been making here, I'll use a flawed real-world analogy. Think living organisms. They have a very serious Tower of Babel problem; redundancy is everywhere. I've heard that humans and octopuses have very similar eye structure, despite being descended from a blind ancestor. Command line parsers?! Entire eyes, hardly the simplest piece of hardware, with quite some neural complexity at the back-end, got developed independently. Redundancy can hardly get any worse. But it works, apparently better than coordinating everybody's efforts to evolve would work.

Redundancy sucks. Redundancy always means duplicated efforts, and sometimes interoperability problems. But dependencies are worse. The only reasonable thing to depend on is a full-fledged, real module, not an amorphous bunch of code. You can usually look at a problem and guess quite well if its solution has good chances to become a real module, with a real owner, with a stable interface making all its users happy enough. If these chances are low, pick redundancy. And estimate those chances conservatively, too. Redundancy is bad, but dependencies can actually paralyze you. I say – kill dependencies first.

I can't believe I'm praising Tcl

So the other day I both defined Tcl procedures all by myself and used them interactively, and I liked it, to the point where it felt like some kind of local optimum. This entry is an attempt to cope with the trauma of liking Tcl, by means of rationalizing it. I'll first tell the story of my fascinating adventures, and then I'll do the rationalizing (to skip the oh-so-personal first part, go straight for the bullet points).

Here's what I was doing. I have this board. The board has a chip on it. The chip has several processors in it. There's a processor in there that is a memory-mapped device, and I talk to it using CPU load/store commands. The CPU is itself a JTAG target, and I talk to it via a probe, issuing those load/store commands. To the probe I talk via USB using an ugly console that can't even handle the clipboard properly. The console speaks Tcl.

Net result: in order to talk to the memory-mapped processor, I have to speak Tcl. Or I can use a memory view window in a graphical debugger. Except some addresses will change the processor state when read (pop an item from a LIFO, that sort of thing). So no, memory view windows aren't a good idea – you have to aim for the specific address, not shoot at whole address ranges. Damn, just who thought of defining the hardware in such a stupid way? Why, it was me. Little did I know that I was thereby inflicting Tcl on myself.

Anyway, there's this bug I have to debug now, and as far as I know it could be in at least three different pieces of software or in two different pieces of hardware, and I don't like any of the 5 options very much, and I want to find out what it is already. So at the ugly probe console I type things like word 0x1f8cc000 (reads the processor status), word 0x1f8cc008 2 (halts the execution), word 0x1f8cc020 0x875 (places a breakpoint). I get sick and tired of this in about 2 minutes, when the command history is large enough to make the probability of hitting Enter at the wrong auto-completed command annoyingly high. It's annoying to run the wrong command because if I ruin the processor state, it will take me minutes to reproduce that state, because the program reads input via JTAG, which is as slow as it gets.

So I figure this isn't the last bug I'm gonna deal with this way, and it's therefore time for Extending my Environment, equipping myself with the Right Tools for the Job, Tailored to my needs, utilizing the Scripting capabilities of the system. I hate that, I really do. Is there something more distressing than the development of development tools for the mass market of a single developer? Can a programmer have a weakness more pathetic than the tendency to solve easy generic meta-problems when the real, specific problems are too hard? Is there software more disgusting in nature than plug-ins and extensions for a butt-ugly base system? But you know what, I really fail to remember 0x1f8cwhat the breakpoint address is. This story has one hexadecimal value too much for my brain. OK, then. Tcl.

I decided to have one entry point procedure, pmem, that would get a memory-mapped processor id, 'cause there are many of them, and then call one of several functions with the right base address, so that pmem 0 pc would do the same as pmem_pc 0x1f8c0000. Well, in Tcl that's as simple as it gets. Tcl likes to generate and evaluate command strings. More generally, Tcl likes strings. In fact, it likes them more than anything else. In normal programming languages, things are variables and expressions by default. In Tcl, they are strings. abc isn't a variable reference – it's a string. $abc is the variable reference. a+b/c isn't an expression – it's a string. [expr $a+$b/$c] is the expression. Could you believe that? [expr $a+$b/$c]. Isn't that ridiculous?

In fact, that was one of my main applications for Tcl: ridiculing it. I remember reading the huge Tcl/Tk book by Brent Welch with my friend once. There was a power outage and it was past the time when the last UPS squeaked its last squeak. And the book was there 'cause the hardware guys use it for scripting their lovecraftian toolchain. We really did have fun. Tears went down my cheeks from laughter. Even people with the usual frightened/mean comments about those geeks who laugh their brains out over a Tcl book didn't spoil it. So, ridiculing Tcl, my #1 use for it. The other use is the occasional scripting of the hardware hackers' lovecraftian toolchain. Overall, I don't use Tcl very much.

The nice thing about Tcl is that it's still a dynamic language, and reasonably laconic at that, modulo quoting and escaping. So I enter the usual addictive edit/test cycle using tclsh < script. N minutes down the road (I really don't know what N was), I've finished my 2 screenfuls of Tcl and the fun starts. I actually start debugging the goddamn thing.

$ pmem 0 stat
IDLE
$ pmem 0 bkpt 0 0xbff
$ pmem 0 bkpt 1 0xa57
$ pmem 0 cmd run
$ pmem 0 stat
DEBUG
$ pmem 0 pc
0xbff
$ pmem 0 rstack
3 return addresses
addr 0: 0x0005
addr 1: 0x05a8
addr 2: 0x0766
$ pmem 0 cmd stp
$ pmem 0 pc
0xc00

Weeee! HAPPY, HAPPY, JOY, JOY!

You have no idea just how happy this made me. Yeah, I know, I'm overreacting. I'll tell you what: debug various kinds of hardware malfunction for several months, and you'll be able to identify with the warped notion of value one gains through such a process. On second thought, I don't know if I'd really recommend it. Remember how I said low-level programming was easy? It is, fundamentally, but there's this other angle from which it's quite a filthy endeavor. I promise to blog about it. I owe it to the people who keep telling me "so low-level is easy?" each time they listen to me swear heartily at a degenerate hardware setup where nothing works no matter what you try. I owe it to myself – me wants to reach a closure here. Why should I tolerate being regularly misquoted at the moments of my deepest professional catharsis?

Aaaanyway, in just N minutes, I bootstrapped myself something not unlike a retarded version of gdb, the way it would work if the symbol table of my program was stripped. But no matter – I have addr2line for that. And the nice thing about my retarded debugger front-end is that it looks like shell commands: blah blah blah. As opposed to blah("blah","blah"). And this, folks, is what I think Tcl, being a tool command language, gets right.

I come from the world of pop infix languages (C/Java/Python/Ruby/you name it). Tcl basically freaks me out with its two fundamental choices:

  • Tcl likes literals, not variables. Tcl: string, $var. Pop infix: "string", var.
  • Tcl likes commands, not expressions. Tcl: doit this that, [expr $this+$that]. Pop infix: doit("this","that"), this+that.

So basically, pop infix languages (and I use the term in the most non-judgmental, factual way), pop infix languages are optimized for programming (duh, they are programming languages). Programming is definitions. Define a variable and it will be easy to use it, and computing hairy expressions from variables is also easy. Tcl is optimized for usage. Most of the time, users give simple commands. Command names and literal parameters are easy. If you are a sophisticated user, and you want to do pmem 0 bkpt [expr [pmem 0 pc] + 1], go ahead and do it. A bit ugly, but on the other hand, simple commands are really, really simple.

And eventually, simple commands become all that matters for the user, because the sophisticated user grows personal shortcuts, which abstract away variables and expressions, so you end up with pmem 0 bkpt nextpc or something. Apparently, flat function calls with literal arguments is what interactive program usage is all about.

I'm not saying that I'm going to use Tcl as the extension language of my next self-made lovecraftian toolchain (I was thinking more along the lines of doing that one in D and using D as my scripting language, 'cause it compiles fast enough and it's apparently high-level enough). I haven't thought enough about this, but the grotesque escaping/quoting in Tcl still freaks me out; I don't want to program like that. All I'm saying is that I like the interactive part. Specifically:

  • Short code matters a lot; short interactive commands matter much more.
  • An interactive command language must be a real language (loops, functions and all).
  • Tcl allows for the shortest commands and it's a real language. I'm fascinated.

Allow me to elaborate.

Short code vs short commands

Lots of people have noticed that keeping your code short is extremely important. More surprisingly, many people fail to notice this, probably because "1 line is better than 5" doesn't sound that convincing. OK, think about 100K lines vs 500K and you'll get the idea. Oh, there are also those dirty Perl/shell one-liners that make one doubt this. I've known a Bastard Programmer who used 2K bash one-liners as his weapon of choice. OK then, so the actual rule must be "short code is good unless it's written by a bastard". But it's the same core idea.

So we have the Architect type, who loves lots of classes which delegate work to each other, and we have the Enlightened type, who wants to write and read less. And the Enlightened type can rant and rave all day how Python, or Ruby, or Lisp make it oh-so-easy to define data structure literals, or to factor out stuff using meta-programming, or some other thing an Architect just never gets. And I'm all with it.

And then we have interactive shells. And in Python it's doit("xx","yy"). And in Lisp it's (doit "xx" "yy"), or (doit :xx :yy), or (doit xx yy) if you make it a macro. And in Ruby it's doit :xx :yy, if you use symbols and omit parens. And that's about as good as you can get without using your own parser as in doit "xx yy", which can suck in the (rarer) case when you do need to evaluate expressions before passing parameters, and doesn't completely remove the overhead. Also note how all these languages use (), which makes you press Shift, instead of [], which doesn't. Ruby and Perl let you omit (), but it costs in readability. And [] is unanimously reserved for less important stuff than function calls.

The whole point of short code is saving human bandwidth, which is the single thing in a computing environment that doesn't obey Moore's law and doesn't double every 18 months. Now, which kind of bandwidth is the most valuable? I'll tell you which. It's the interactive command bandwidth. That's because (1) you interact a lot with your tools and (2) this interaction isn't what you're trying to do, it's how you're trying to do it, so when it isn't extremely easy, it's distracting and extremely frustrating.

This is why an editor that doesn't have short keyboard shortcuts for frequently used commands is a stupid fucking piece of junk and should go down the toilet right now. This is why a Matlab vector – [1 2 3] – is much better than a Python list – [1,2,3] (ever noticed how the space bar is much easier to hit than a comma, you enlightened dynamic language devotee? Size does matter). And don't get me started about further wrapping the vector literal for Numeric Python.

The small overhead is tolerable, though sucky, when you program, because you write the piece of code once and while you're doing it, you're concentrating on the task and its specifics, like the language syntax. When you're interacting with a command shell though, it's a big deal. You're not writing a program – you're looking at files, or solving equations, or single-stepping a processor. I have a bug, I'm frigging anxious, I gotta GO GO GO as fast as I can to find out what it is already, and you think now is the time to type parens, commas and quotation marks?! Fuck you! By which I mean to say, short code is important, short commands are a must.

Which is why I never got to like IPython or IDLE. Perhaps Ruby could be better, because of omitting parens and all. Ruby seems to be less afflicted with the language-lawyer pseudo-right-thing mindset. But the basic plain vanilla function-call-with-literal-args syntax still doesn't reach the purity of *sh or Tcl. Well, the shell is an insanely defective programming language, so it's not even an option for anything non-interactive. But Tcl gets way closer to a programming language. Which brings us to the next issue:

Ad-hoc scripting languages – the sub-Turing tar pit

Many debuggers have scripting languages. gdb has one, and Green Hills MULTI has one. Ad hoc command languages usually get the command-syntax-should-be-easy part right – it's command arg arg arg… They then get everything else wrong. That is, you're usually missing some or all of: data structures, loops, conditionals, user-defined functions, expression evaluation in all contexts, an interface to the host OS, and all the stuff which would basically make the thing a programming language. Or you get all those things in a peculiar, defective form which you haven't seen anywhere else.

I wish people stopped doing that. I understand very well why many people do that – they don't know any language which isn't a 3rd generation one (presumably C++ or Java). They don't know how scripting works except on a theoretical level. They know how to build a big software system, with objects and relationships between objects and factories of objects and stuff. At the system/outside world boundary they're helpless though. Outside of the system, our objects are gone. There's this cold, windy, cruel world with users and files and stuff. Gotta have an AbstractInputParser to guard the gates into our nice, warm, little – um, actually "big", no, make it "huge" – system.

These are the Architects who get mocked by the Enlightened dynamic language lovers. They normally dismiss scripting languages as "not serious", therefore, when faced with the need to create a command language for their system, they start out with a plan to create a non-serious (a.k.a crippled) language. Even if they wanted to make it a good one, they never thought about the considerations that go into making a good scripting language, nor do they realize how easy/beneficial it is to embed an existing one.

So basically we have 3GL people, who realize that commands should be short ("it's a simple thing we're doing here"), but they don't see that you need a real Turing-complete programming language for the complicated cases. And we have 4GL people, who optimize for the complicated case of programming ("what's a scripting language – it's a programming language, dammit!"), and they don't care about an extra paren or quotation mark.

And then we have Tcl, which makes easy things really easy and scales to handle complicated cases (well, almost, or so I think). And not only does it make plain funcalls easy – it reserves [] for nested funcalls, in the Lispy prefix form of outercall arg [innercall arg arg] arg... [] is better than (). Pressing Shift sucks. And a custom keyboard mapping which makes it possible to type parens without pressing Shift is complete idiocy, because you won't be able to work on anyone else's machine. This shit matters; if you program all day long, it does.

Now what?

I don't know if I'd use Tcl. It's less of a programming language than your typical pop infix 4GL. For starters, [expr] is a bitch. And then there are "advanced" features, like closures, that I think Tcl lacks. It has its interesting side from a "linguistic" perspective though. It has very little core syntax, making it closer to Lisp and Forth than the above-mentioned pop infix ilk. So you can use Tcl and lay claim to aristocracy. Of course you'll only manage to annoy the best programmers this way; the mediocre won't know what you're talking about, seeing only that Tcl doesn't look enough like C to be worth the name of a language.

I'd think a lot before embedding Tcl as a scripting language for my tools, because of linguistic issues and marketing issues (you ought to give them something close enough to C, whether they're a customer or a roommate). So the practical takeaways for me are modest:

  1. I ain't gonna mock Tcl-scriptable tools no more. I understand what made the authors choose Tcl, and no, it's not just a bad habit. On some level, they probably chose the best thing available today in terms of the ease-of-programming/ease-of-use trade-off (assuming they bundle an interactive command shell). If I need to automate something related to those tools, I'll delve into it more happily than previously. So much for emotional self-tuning.
  2. I'll let it sink in, and try to figure out whether there's a better trade-off. For example, if Ruby had macros (functions which don't evaluate their inputs), you could say doit x y without making x and y symbol objects, which forces you to prefix them with a colon. How macros of that sort should work in an infix language escapes me (not that I think that much about it, but still). Anyway, I'll definitely add Tcl to the list of things I should understand better in order to fight my linguistic ignorance. Being an amateur compiler writer, that seems like one of my duties.

Python: teaching kids and biting bits don't mix

Today an important thing happened in my professional life. I was told to take a break from The Top Priority Project, so that I could deal with a more important project. The evaluation of the expression max_priority+1 caused my wetware registers to overflow. Therefore, you should consider the following piece of whining as an attempt of a brain to recover from deadline damage. That is, you won't get the kind of deep discussions and intellectual insights you've come to expect from yosefk.com. Instead, you'll get shallow, plain, simple and happy Python hate. That's the kind of thing we have in stock right now.

It has been known for quite some time that a species called the Idiot is out there, and its revolting specimens will mercilessly prey on your belief in the future of the human race. The time has come for me to share with you some of the shocking experience from my encounters with idiots. Believe it or not, there exists a kind of Idiot who sincerely thinks that a Person should not complain about Technology or Tools he or she gets to use, and that such complaints are deeply wrong on some level or other, depending on the exact idiot in question. Well, I hereby inform the idiots, at least the ones who can use a browser and read text, that I (1) have a bad attitude, (2) am not a good workman, so (3) I'll complain about any technology I use as I damn please and (4) I don't give a flying fuck about your moronic opinion, so don't bother to share it with me.

So, Python hate. I don't really hate Python; it's a figure of speech. You know what I really hate? Of course you do. I hate C++. C++ is the hate of my life. I don't think I'll ever be able to hate another programming language. For me, to hate means to recognize that something is inherently evil and should be exterminated. Reaching that status is an outstanding achievement for any technology. Your typical piece of software never gets that far. All it does is something you need most of the time, and then something extremely brain-damaged some of the time. So you kinda like it, but sometimes it just pisses you off. It's gotta piss you off. It's a machine. It doesn't have a pigeon's crapload worth of common sense. Of course it pisses you off. Don't lie to me! A person who was exposed to machines and doesn't hate them is either an idiot or is completely devoid of soul! Step back, child of Satan!

You know what really pisses me off about Python? It's the combination of being BDFL-managed and having roots in the noble idea of being The Language for Teaching. Sure, having lexical scoping wouldn't hurt, and having name errors 5 minutes down the road in a program that happily parses already hurts, and Python shells aren't a picnic, blah blah blah. But nothing is perfect, you know, and people adapt to anything. I learned to live in tcsh and vim as my IDE. So I know how adaptability can bring you quite far, sometimes much farther than you wish to get. But this BDFL+teaching combo really bugs the hell out of me. Allow me to elaborate.

Ever heard about programming languages with an ANSI standard? Sure you did. Well, the other kind of languages have a BDFL standard. It's when a Benevolent Dictator For Life, typically the Punk Compiler Writer (PCW) who invented the language and threw together the initial (and frequently still the only practical) implementation, decides what happens with the language. I plan to blog about PCWs at some point, but this isn't about PCWs in general, this is about PCWs who've been elevated to the BDFL rank. So they bend the language as they damn please. Sometimes it splits the user community (as in Perl 5 vs Perl 6) and sometimes it doesn't (as in Python 2 and Python 3, or so it seems). I'd say that it's totally stupid to use a BDFL-governed language, but I can't, because that would offend C++, The Hate Of My Life, who does have an ANSI standard. Relax, darling. It's you, and only you, that I truly hate. The others mean nothing to me.

So that's what the "BDFL" part means. The "teaching" part is about Python being supposed to be a good (the best? of course, the best) language for teaching kids how to program. While I'm not sure whether the BDFL part is inherently wrong, the teaching part is sure as hell wrong, in two ways:

  1. Why the fuck should kids program?
  2. Why the fuck should I program in a language for kids?

I didn't program till the age of 17, and I have absolutely no regrets about it. I know quite a few other programmers-who're-really-into-software-someone-call-for-help who didn't hack in their diapers, either. I also know a bunch of people who did program since they were 10 to 12 years old. They are all burnt out. They're trying to look for some other occupation. Theater, psychology, physics, philosophy, you name it. I haven't gathered that many data points, just a couple dozen. Some people won't fit the pattern I see. But it's enough for me to assume that kids could have better things to do than programming. Watching cartoons is one thing that sounds like fun. That I always liked.

I'm not sure kids shouldn't program. We need scientific data on that one. I'm no specialist, but locking a large enough set of kids in the basement and having them implement progressive radiosity sounds like a good start. Anyway, as you probably noticed, while I'm curious about the scientific truth, I don't care much about kids. I care about me.

Me. I'm a professional programmer. By which I mean to say, I shovel through piles of virtual shit for a living. And to shovel quickly and happily, I need a big shovel. Python is one of my shovels. Core dumps are one of my piles of shit. Python, meet the core dumps. Core dumps, meet the Python.

Core dumps are spelled in hexadecimal. I mean "core dumps", "spelled" and "hexadecimal" in the broadest sense, as in "things having to do with machine encoding and that sort of low-level crud are viewed and edited in tools that tend to emit and consume integers in hexadecimal notation". Or "core dumps are spelled in hexadecimal" for short.

To deal with hexadecimal, Python has a hex() function and a '%x' format string like it should. Well, almost. Python thinks that hexadecimal is like decimal, except in base 16 instead of 10. Integers are naturally spelled in sign magnitude format: 1, 10, -1, -10. In hexadecimal, that's 0x1, 0xa, -0x1, -0xa. "-0x1"? "-0x1"?!! FUCK YOU, PYTHON!!

You see, all computers that have been out there since the year 19-something store integers in 2's complement format. There used to be 1's complement and sign magnitude machines, but they are all dead, which is a good thing, because 2's complement is a good thing. In 2's complement, -1 in hexadecimal is 0xffffffff. If you use 32-bit integers. And twice the number of f's on 64-bit machines. Do I give a flying fuck about 64-bit machines? I think you know the answer. If I cared, I'd run Python on a 64-bit host. An old version of Python.

In old versions of Python, hex() worked about right. Python has ints and longs (that's fixnum and bignum in Lisp; it would be int and BigInteger in Java if ints didn't overflow but instead turned into BigIntegers when int didn't have enough bits). hex() assumed that if you're using ints, you're a bit biter, and you want to get 0xffffffff for -1, and you should get your non-portable, but handy result. If, on the other hand, you used longs, you probably were a mathematician, so you'd get -0x1. And this was juuuust riiiight.

However, starting from version 2.something, hex() was "fixed" to always do "The Right Thing" – return the portable sign magnitude string. That, my friends, is what a kid would expect. You should know – you were a kid yourself, didn't you expect just that?! Naturally, breaking frigging backwards compatibility is perfectly OK with the PCW BDFL, who sure as hell doesn't want to add another function, sex() for "sexadecimal". That function could (preferably) do the new thing, with hex() keeping the old behavior so that existing programs wouldn't break. Or it could (passably) do the old thing, so that people could at least get the old functionality easily, even though hex() itself would no longer provide it. But noooo, we need just one hex function doing the one right thing. Too bad we only found out what that right thing was after all those years went by, but in retrospect, it's obvious that nobody needs what we've been doing.

Now, could anybody explain to me the value of hexadecimal notation outside of the exciting world of low-level bit fucking? Why on Earth would you want to convert your numbers to a different base? One particular case of this base brain rot is teaching kids to convert between bases. As Richard Feynman has pointed out, and he had the authority to say so and I don't and hence the reference, converting numbers between bases is completely pointless, and teaching kids how to do this is a waste of time. So what you do is you give me, the adult programmer with my adult core dump problems, a toy that a kid wouldn't need or want to play with. Thank you! You've found the perfect time, because I have a DEADLINE and this fucking shit CRASHES and I couldn't possibly have BETTER THINGS TO DO than WASTING TIME ON CONVERTING BASES!!

I know that I can implement the right (yes, THE RIGHT, damn it) hex and put it into my .pythonrc and that would solve the interactive shell annoyances and I could carry it around and import it and that would solve the non-interactive annoyances, thanks again. Until now I've done it in a cheaper way though – I had a Python 2.3.5 binary handy, and that did the trick. 2 Pythons, one with new libraries and shiny metaprogramming crap and stuff, and one with a working hex(). I like snakes.
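
For the record, here's the kind of thing I mean by the right hex – a tiny sketch, not necessarily what's in my .pythonrc; the name hex32 is made up, and the 32-bit width is hardwired on purpose:

def hex32(n):
    # 2's complement hexadecimal, 32 bits - the bit biter's hex
    return hex(n & 0xffffffff)

print(hex(-1))    # '-0x1'       - the "portable" sign magnitude answer
print(hex32(-1))  # '0xffffffff' - the answer a core dump speaks
print(hex32(10))  # '0xa'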

Why do I attribute this business to kid-teaching, of all things? I don't lurk on Python forums, so I'm basing this on rumors I heard from die-hard Python weenies. Another, TOTALLY INFURIATING thing: a/b and a//b. int/int yields a frigging FLOAT in Python 3!! This has all the features of hex(-1) == '-0x1':

  • "Kids" get the wrong answer: 3/2 should give 1 (honest-to-God integer division) or it should give the ratio 3/2 (honest-to-God rational/real number division). Floating point is reeeaaally too complicated. I've seen a teacher saying online something along the lines of "if you ask how much is 3/2 times 2, how do you grade the answer 2.99999"? Bonus points if your reply managed to take the identity "2.999… = 3" into account (hordes of grown-ups fail to get that one).
  • "Programmers" get the wrong answer: chunks=len(arr)/n; rem=len(arr)%n. 'nuff said.
  • "Mathematicians" get the wrong answer: I mean, if you're doing numeric work in Python (what's the matter, can't pay for a Matlab license?), you probably know about integer division versus floating point division, but you've never heard about an operation called // (no, it's not a BCPL comment). 3/2=1.5? How am I supposed to do integer division? int(3/2)? Argh!

You know, I'm actually more willing to upgrade to C++0x, whenever that materializes, than I am willing to upgrade to Python 3. At least it would be "for(auto p=m.begin()…" instead of "for(std::map<std::string,std::string>::const_iterator p=FUCK OFF AND DIE YOU EVIL PROGRAMMING LANGUAGE FROM HELL!!".

What's that? Who said I liked C++0x more than I liked Python 3?! Upgrade, I said, "willing to upgrade"! Don't you see it's relative to the previous version? In absolute terms, who can beat C++?! Come back, C++! I still hate you!!

Side effects or not, aliasing kills you

My programming personality is very depressed right now. For a while, I've been away from my usual long-term stuff and back in the trenches. I'm doing simple things, but there's lots of them, so it feels like I'm not moving anywhere even though I go as fast as I can. It's like a highway with a monotonous view.

Well, this has completely drained my brain. So I won't tell you anything in this blog – instead, I'll ask you a question. This way, it's OK for me to be wrong, which is even more likely than usual in my current state. Clever trick, isn't it?

My question basically is, how is not having side effects in your programming language going to help you write correct, and optimal, and parallel programs? I've already mentioned it, but this time, I'll elaborate on the point that I fail to get.

The problem with parallelism is scheduling. We want scheduling to be:

  1. Correct: when A depends on B, B should happen before A.
  2. Optimal: A, B and everything else should complete as soon as possible.

So, in order to correctly schedule basic blocks of code called A and B, I need to know if they depend on each other. If I have side effects, this is going to be hard. A modifies an object called X, and B modifies Y. Can X and Y be the same thing? To figure it out, you need whole-program analysis, and if your language is Turing-complete, WPA won't necessarily succeed, either.
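
A tiny sketch of the problem, in Python for concreteness – A, B, X and Y are just the names from the previous paragraph:

def A(X): X[0] += 1   # "A modifies an object called X"
def B(Y): Y[0] *= 2   # "B modifies Y"

X = [3]
Y = X                 # aliasing: Y is the very same object as X
A(X); B(Y)
print(X[0])           # 8 here; a scheduler that ran B first would get 7 -
                      # so A and B do depend on each other, and nothing
                      # in their code says so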

This is the aliasing problem. The aliasing problem rules. It's the problem that forces you to use restrict to optimize numeric code. It's the problem that forces you to use static typing if you want to optimize anything at all. And it's the problem that makes all safe languages completely and wildly unsafe when you have multiple threads. The rule of thumb is: if you have a good idea for compile-time or run-time anything, it won't work because of the aliasing problem.

Now, suppose we got rid of those pesky side effects. A doesn't modify X; instead, it creates a new object, Z. Similarly, B takes Y and makes W. Now we can prove that A doesn't depend on B. As soon as their inputs are ready, they can run. This takes care of the correctness problem, which was listed as the scheduling problem #1 at the beginning.

However, this didn't solve the aliasing problem – X and Y can still be the same. I suspect that the aliasing problem is going to raise its ugly head and kick our pretty butt. Specifically, while semantically we made Z from X, we'd really like to wipe out X and reuse its memory for hosting Z. Because that would be more efficient than copying X and modifying some of it and keeping two objects around. Now, we can't do this optimization if X and Y are the same thing, and we know nothing about the execution order of A and B, can we? If A replaced X with Z, B would get the wrong Y.
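
Continuing the same sketch without side effects – correctness is indeed solved, and the memory reuse question is exactly where it hurts:

def A(X): return [X[0] + 1]   # makes Z, leaves X alone
def B(Y): return [Y[0] * 2]   # makes W, leaves Y alone

X = [3]
Y = X                         # still the same object
Z, W = A(X), B(Y)             # any order gives Z == [4], W == [6]
# But Z and W are fresh allocations. Reusing X's storage for Z (the
# "destructive update" optimization) would have been wrong here,
# precisely because Y is X and B still needs the old value.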

And if we copy things, we aren't going to get an optimal scheduling, because A and B now run slower – problem #2, right?

Seriously, what do I miss? I realize that there are parallel execution environments where you have to copy things anyway because you have a zillion processors and a zillion processors can't have shared memory. I also realize that you can have few objects shared between tasks that run in parallel, so the overhead of copying them isn't very important. I realize all kinds of things making the aliasing problem a non-problem for side-effect-free parallelization frameworks or languages or whatever.

What I fail to see is, if you run in a single-box, multi-core system, and you do have shared memory, and you do have lots of shared data, how does removing side effects help you? With side effects, you can get an efficient program, if you can handle the debugging. Without side effects, you get a correct program, and then you must somehow optimize object allocation. And I just don't see how you'd go about that without compromising correctness.

Maybe I'm ignorant. Maybe I should have read about this somewhere. But where do I find a levelheaded discussion of, say, common implementation techniques of pure functional language compilers? I just keep bumping into "sufficiently smart compiler" arguments, and then there are people who don't care much about performance (and shouldn't because of their application domain).

Compare this to imperative languages. Anybody can write a pretty decent C compiler. If you don't optimize at all, you get what, 4x slowdown compared to the state of the art optimizer? And on a scalar RISC machine, I bet you can bridge 50%-70% of that gap by your cheapest shot at register allocation. A no-brainer, you need about 30% of a comp sci degree and you're all set. Now, how am I supposed to optimize allocation if there are no side effects? I mean, your spec or your family of specs should come with implementation techniques. "Useful" is good by itself, but it must also be implementable.

By the way, I'd love to find out that I'm wrong on this one. I've done lots of parallelism-related debugging, and it sucks. And maybe I am wrong. Maybe there just isn't much talk about the implementation details of these things. I do think my reasoning makes sense though.

IHateCamelCase

Yeah, naming conventions. Looks like my brain won't do any better today; those 5 drafts will have to wait. If you aren't in a mood for a trivial subject, skip this.

I think that the best naming convention out there is the Lisp one: case-insensitive-dash-separated. It just doesn't get any better:

  • You never have to hit Shift or Caps Lock in the middle of a name, which makes typing easier. This is especially important for names because you use auto-completion with names. Auto-completion requires you to press things like Ctrl-Space or Alt-Slash or Ctrl-P. Together with another Shift needed for the name to be completed correctly, auto-completion is much more likely to cause repetitive strain injury.
  • You never have to think about the case. Figuring out the case of a letter in a case-sensitive naming convention can be non-trivial; more on that later.
  • The dash-as-a-separator convention is used in English, so your names look natural.

Unfortunately, most languages use C-style identifiers for names, the dreaded [A-Za-z_][A-Za-z_0-9]*, because their infix parsers can't tell a dash from a minus. So you can't use this convention.

This leads to two problems:

  1. How do we separate between subsequent words in an identifier?
  2. When do we capitalize letters?

Of course, we could use a lowercase_underscore_separated convention. It would solve both problems in a simple way, having all the benefits of the Lisp convention except for the no-Shifts-and-Caps-Locks part. But (1) Caps Lock is available for capital letters, but not for underscore, and Shift is less healthy for your hands, and (2) if we have case sensitivity in our language, we'll of course use it, won't we? OK, let's kill those underscores.

There are two anti-underscore schools: alllowercase and CamelCase. alllowercase looks lame – it makes it easy to know when to capitalize letters (never), but chooses to ignore the word separation problem completely. I used to sneer at it. However, it has two huge benefits: it's very typing-friendly, and it discourages the use of long names. Long names, people, are a frigging nightmare.

HaveYouEverSeenANameTakingHalfAScreen? This is awful. Awful!! I can't lock my eyes on the damned thing. I can only focus on a tiny part of it. My eyes nervously jump around the line, which mentions the moronic mega-identifier twice (at both parts of an assignment). I'm looking for differences, small differences in the names… You know, it could be BlahBlahABlah on the left and BlahBlahBBlah on the right… AAAARGH!

Reading this kind of code is pure mental pain. I prefer mental pain to physical pain on any day, and that's why I'm in the software industry, but still, this sucks. The good news is that alllowercasenametakinghalfascreen is so ridiculous that even the most clueless pseudo-orderly person won't emit it.

Now, CamelCase, which is basically the winner, because it's used in all major languages and libraries, is probably the worst possible naming convention. It fails to solve both problems created by the lack of a good word separator in the [A-Za-z_0-9] languages:

  1. You don't really know when one word ends and the next word starts.
  2. You don't really know when a letter should be capitalized.

The problems of camel case come from using capital letters for word separation. This interferes with the other uses of case in natural language. The problems are amplified by the brilliant idea of assigning even more semantic payload to case: functionsLookLikeThis, but ClassesLookLikeThis, etc. Let's look at some examples.

English has words like TCP, DNA and WTF. Should a TCP socket class be called TCPSocket or TCPsocket? What about a TCPIPSocket? What if we need a tcpOpen method – should we call it TCPOpen, like a class, to preserve the natural case of an acronym, or should it be TCPopen, so that the lowercase "o" conveys the fact that it's a function?

Oh, I know, it should be "openTCP"! No, no, why are you using "openTcp" – this is ugliness for its own sake! The only important thing is to get the first letter of a name right, and then you can use natural capitalization! Unless, of course, it's "openTCPIPSocket", and then we have a problem again. "openTcpIpSocket"?.. Some people just can't handle it and resort to underscores: openTCP_IPsocket, open_TCP_IP_socket… It's no use. It's ugly no matter what you do.

Capital letters coming from the natural language, like those in acronyms and names of people, are the smaller part of the problem – Tcp looks ugly, but you know what it means. The other part of the problem is the capital letters coming from formal languages, such as mathematical notation.

For example, in computer vision it's common to denote 3D coordinates with uppercase X,Y,Z, and 2D coordinates with lowercase x,y. In a case-sensitive language, it's damn natural to follow this convention, and it works very well for a local variable X or x (including the case when you use both in the same function). It doesn't work so well when you try to name functions or classes after their arguments/coordinate systems.

Does xySomething start with a lowercase x because it's a function, or because it really accepts x values of 2D coordinates? What about xYSomething – is the Y capitalized because "y" is a word and we always capitalize the first letter of a word, or maybe the function expects Y values of 3D coordinates?

You can have a function working with 3D X coordinates and 2D y coordinates, you know. I think it's better to call it XySomething than xYSomething, because meaning is more important than convention. But did the author of the function think so, too? Of course, we can use an underscore to "clarify" the intent: something_Xy. The underscore clearly shows that the part after the underscore doesn't follow standard naming conventions, so it must be according to the computer-vision-specific convention.

So what happens is that CamelCase code deteriorates to the following state:

  1. You have ugly names like tcpIpOpen.
  2. Since you also have names like TCP_IP_Open, your real naming convention is "camel case with underscores". Which is equivalent to "any identifier that compiles".

Maybe there's a good way to augment CamelCase with rules that make it work well. I probably wouldn't know. I ought to say that I'm not that good at naming conventions in particular and in Best Practices in general. But I doubt there's a good case-sensitive naming convention out there.

Just look at the Python naming convention. You basically have everything. thingslikethis, ThingsLikeThis, things_like_this, thingsLikeThis, and they're all attached to different types of object (module, class, function, method). And every time your language entity convention disagrees with the common sense (class TCPIPSocket), you've got yourself an ugly name. And in a way, this is a good convention, because it at least tries to be consistent with the common conventions used in C, C++ and Java.

The annoying part of this is the slowdown. "Um, how should I spell this name?.." There are actual capitalization trade-offs here. Programming is almost exclusively about making decisions and choosing trade-offs. It's quite tiring, really. Nobody wants to be making some more pointless decisions on the way just for the fun of it. Maybe it's just me and the kind of people I've worked with, but I've always, always bumped into lots and lots of names which looked like a compromise. Somebody was thinking hard here. And it looks ugly anyway.

Barf.

Code, data and interactive programming

"Are code and data the same thing?" I haven't conducted a poll, but I think the following answers are the most popular ones:

  • "What?" (I don't know about universal Turing machines)
  • "Sure!" (I know about universal Turing machines)

My answer is "No". I'll now try to explain briefly why my opinion makes sense in general. After that, I plan to get to the point, which is how the code/data distinction matters in interactive programming environments.

I think that, um, everything is data, at least everything I can think about. I mean, the only things that can technically enter my brain are either data or cigarette smoke, because I don't do drugs. And I hope that the effect of passive smoking is negligible, so it's just data.

In particular, code is data. But not all data is code. Code is a special kind of data, that looks like this:

  • There are lots of blocks.
  • Each block defines the value of something.
  • The blocks depend on each other, and the dependencies can be cyclic.

What this means, and of course everybody knows it, is that you can't make any sense of code in the general case. That is, the only way to compute the values defined by the blocks of code is to "run" the code – keep chasing the links between the blocks, computing the values they define as you go. You can't even prove that this process will terminate given an arbitrary bulk of code, not to mention proving its correctness.

Now, an image, for example, isn't code. Well, philosophically, it is, because if they show you an image and it's really ugly, you'll say "ewww". So the image was in fact a program giving instructions to your brain. The output of your brain's image compiler is the following code in the human body assembly language:

MOV R0, dev_mouth
MOV R1, disgust_string
CALL write
RET
disgust_string:
.asciz "ewww"

More interestingly, you can write a program that processes images, and this particular image may be the one that makes your program so confused that it never terminates. However, this doesn't mean that the image itself is "code". The image doesn't have interconnected blocks defining values. Even if the image is a screenshot of code.

An image is a two-dimensional array of pixels, a nice, regular data structure. You don't have to "run" it in order to do useful things with it, like computing its derivatives or e-mailing it to your friends so they'll go "ewww". And programs doing that can be proven to terminate, unless you have an infinitely slow connection to the outgoing mail server.

So what I'm saying is, code is a special kind of data, having blocks which define values and depend on each other. Does it really matter whether a particular piece of data is "code" according to this definition? I think it does. One reason is the above-mentioned fact that you can't really make sense of code. Many people realize the practical drawbacks of this, and so in many contexts, they use data-driven programming instead of the arguably more natural "code-driven" programming.

Everything you represent as "data" can be processed by many different programs, which is good. Everything you represent as "code" can only be processed by a kind of interpreter, which is bad. I'm not talking about the difficulty of parsing the syntax, which doesn't exist with Lisp or Forth, isn't a very big deal with C or Java and is a full-featured nightmare with C++ or Perl. I'm talking about the semantics – for many purposes, you can't really "understand" what you've parsed without running it, and this is common to all Turing-complete languages.

But this isn't going to be about the inability to analyze code. This is going to be about the somewhat more basic problem with code – that of blocks which point to each other. In order to explain what I mean, I'll use the example of 3 interactive programming environments – Matlab, Unix shells and Python, listed in decreasing order of quality (as interactive environments, not programming languages).

Interactive programming is the kind of programming where the stuff you define is kept around without much effort on your part. The other kind of programming is when you compile and run your code and it computes things and exits and they are gone. Clearly interactive programming is nicer, because it makes looking at data and trying out code on it easy.

Or so it should be; in practice, it looks like more people prefer "batch programming", so there might be some drawbacks in the actual interactive environments out there. What makes for a good interactive environment, and what spoils the fun? Let's look at some well-known gotchas with existing environments.

Some of the most upset people I've seen near computers were the ones that had a Matlab session running for months when their machine crashed. It turned out that they had a load of data there – measurements, results of heavy computations, symbolic equations and their solutions – and now it's all gone. GAAA!! This doesn't happen with batch programming that much, because you send the output of programs to persistent storage.

This problem, nasty as it may be, looks easy to fix – just have the system periodically save the workspace in the background. Perhaps Matlab already has this. I wouldn't know, because I tend to manually save things once in a few minutes, since my childhood trauma of losing a file I loved. Anyway, this doesn't look like an inherent problem of interactive computing, just an awfully common implementation problem. For example, do Unix shells, by default, save the command history of each separate concurrent session you run? I think you know the answer.

Speaking of Unix shells. Ever had the pleasure of typing "rm -rf *" in the wrong directory because of command completion from history? GAAA!! OK. Ought to calm down. Let's do Fault Analysis. Why did this happen? The command string with "rm" in it is, basically, code; shell code invokes processes. This code depends on another piece of code, the one that determines the current directory. The command string doesn't have a fixed meaning – you must run getcwd in order to figure it out.

The shell couldn't really warn us about the problem, either. That's because the meaning of "rm" is defined by the code at /bin/rm (or by some other program in the $PATH which happens to be called "rm"). Since the shell can't understand that code without running it, it doesn't have an estimation of the potential danger. And if the shell warned us about all commands completed from history that originally ran in a different directory than the current one, the completion would likely be more annoying than useful.

At some point I've got fed up with Unix shells, and attempted to switch to a Python shell. I tried IPython and pysh, and I still use IDLE at home on my XP box. I ought to say that Python shells suck, and I don't just mean "suck as a replacement for a Unix shell", but also "suck as a way to develop Python code". The single biggest problem is that when you change your code, you must reload modules. It's unclear which modules should be reloaded, there's no way to just reload everything, and ultimately you end up with a mix of old code and new code, which does something, but you aren't quite sure what exactly.
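
Here's the reload mess in its distilled form – a self-contained sketch; scratchmod is a made-up module, written to disk here only so the thing runs:

import importlib, pathlib, sys

sys.path.insert(0, ".")
pathlib.Path("scratchmod.py").write_text("class Thing:\n    version = 1\n")
import scratchmod
obj = scratchmod.Thing()

pathlib.Path("scratchmod.py").write_text("class Thing:\n    version = 2  # edited\n")
importlib.reload(scratchmod)

print(obj.version)                        # 1 - obj still runs the old code
print(isinstance(obj, scratchmod.Thing))  # False - old Thing and new Thing are different classes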

Die-hard Pythonistas refuse to acknowledge there's a problem, though they do bend their programming style to work around it. What they do is they write all of their code in one file, and use execfile instead of import to make sure everything is indeed redefined, the Zen of Python with its love of namespaces be damned. Sure, an interesting project in Python can be just 5000 lines worth of code, but I don't like to navigate in a file that big. And sometimes you do need more lines, you know.

Another thing they do is implement __repr__ in their classes so that print displays their objects, and they'll invest a lot of effort into making eval(repr(obj)) work. The fact that eval'able strings aren't necessarily the most readable way to produce debug prints doesn't seem to bother them. Nor do the contortions they have to go through to solve the prosaic problem of making references to other objects display reasonably. One way to do it is to use dictionary keys instead of pointers, so that member object references aren't expanded into a full object description when they are printed. If you don't know why they're doing this, you'll find their code fairly puzzling.
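
The pattern looks something like this (a made-up class, not anybody's actual code):

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        # the eval(repr(p)) == p school of debug printing
        return "Point(%r, %r)" % (self.x, self.y)

p = Point(1, 2)
q = eval(repr(p))    # round-trips into an equivalent object
print(p, q.x, q.y)   # Point(1, 2) 1 2
# Add a member that references another hairy object and the scheme starts to
# hurt: either the repr balloons into a page of nested expressions, or you
# store a dictionary key instead of the pointer, as described above.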

I find the struggle to make interactive Python programming work very depressing. It reminds me of me, before the invincible idiocy of C++ crushed my spirit. People have a tendency to assume that things are well thought-out and hence should work.

We have this extremely widespread language called C++, and it's centered around type-based static binding. And it's easy to see how this could help a compiler spot errors and optimize the code. Therefore, this programming style can be a good way of writing software, if applied consistently. Ha!

We have this Python language, and several shells for it. Quite obviously, interactive programming is a good way to speed up the development cycle. Therefore, adapting our Python code for interactive programming will pay off, if we do it consistently. Ha!

But I digress. This isn't about the trusting nature of software developers, nor is it a comparison between C++ and Python, mind you. They're hard to compare, since they are very different beasts: Python is a programming language, and C++ is a karmic punishment. So I should get back to the topic of interactive programming.

Here's my opinion on the example programming environments I used in this entry.

Matlab is a great one, unless you lose your workspace. I used it for a while several times and it just never itched, and nothing went wrong.

Unix shells are good in terms of their ability to preserve your data (everything is a flat, self-contained string of bytes). I'd love them if they didn't suck so badly as programming languages. Since they do, I only use shell scripting for one-shot throwaway things, like debugging (fiddling with log files and core dumps).

Python is awful. So when I'm on Unix, I run Python processes from the shell, and sometimes use Python's reflection to make my batch programming just a bit more interactive. For example, if you have a Python function f(a,b,c), you can have your command line parser introspecting its arguments and adding the command line options -a, -b and -c.
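
Something along these lines – a sketch using the standard inspect and argparse modules; f and its options are of course made up:

import argparse, inspect

def f(a, b, c):
    print("doing stuff with", a, b, c)

def run_from_command_line(func):
    # one -option per function parameter, introspected rather than spelled out
    parser = argparse.ArgumentParser()
    for name in inspect.signature(func).parameters:
        parser.add_argument("-" + name)
    func(**vars(parser.parse_args()))

run_from_command_line(f)   # e.g.: python script.py -a 1 -b 2 -c 3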

So much for specific examples. What's the generic rule? I think it's this: pointer-happy systems can't be interactive. That's because interactive programming is about saving your data objects. And this is only useful when the current value of a preserved object is clear to you. Otherwise, you can't use the object, so what's the point?

When you have pointers in your objects, the objects aren't self-contained, and when the pointed objects are redefined, it isn't clear what should happen with the pointing objects. Should they point to the new thing or the old thing? Either answer can be counter-intuitive to you, and the whole point of interactive programming is to let you enter a state of flow, and if you scratch your head and can't easily guess what the old object means right now, you aren't in a state of flow.

In particular, pointers to code are the worst kind of pointers, because code is the most intertwined data of your program, and a pointer to a single block of code basically points to the entire code base. When an object points to an old function, and the function was redefined, and the system keeps the old definition around, you may easily get a call sequence with both the new function and the old function, which is probably ridiculous. And if you make the old object point to the new function, the function might simply fail to work with that object, and you just can't tell whether it will work or not without running it, remember?

For example, Python is a good interactive calculator, because arithmetic expressions are self-contained. Even if they do contain references to variables, it's fairly clear what happens when you change a variable – all expressions mentioning it will now recompute differently. Note that arithmetic expressions aren't Turing-complete and can't have cyclic references. Now, if you use Python's object-oriented features, then you have objects which point to their class definition which is a bunch of code pointers, and now when you reload the module defining the class, what good are your old objects?

This is why I don't believe in Numeric Python. The whole point of using Python is to use its pointer-happy features, like classes and hairy data structures and dynamically defined functions and stuff. Numeric programming of the kind you do in Matlab tends to use flat, simple objects, which has the nice side-effect of making interactive programming work very well. If you use a numeric library inside a pointer-happy language like Python, quite soon the other libraries you use will make interactive programming annoying. So you'll either move to batch programming or suffer in denial like the die-hard Python weenie you are. Someone using Matlab will be better off, since interactive programming is more productive than batch programming, when it works well.

So at the bottom line, I think that interactive programming has limited applicability, since "general-purpose" programming environments pretty much have to be pointer-happy. That is, if a language doesn't make it very easy to create a huge intertwined mess of code and data pointers, I don't see how it can be usable outside of a fairly restricted domain. And even in the "flat" environments like Matlab or Unix, while old data objects can be useful, old commands are, and ought to be, a two-edged sword. Because they are code, and code is never self-contained and thus has a great potential to do the wrong thing when applied in a new context.

This whole claim is one of those things I'm not quite sure about. From my experience, it took me quite some time to realize which interactive features help me and which get in the way with each environment I tried. So I can't know what happens in Lisp or Smalltalk or Tcl or Excel or Emacs, in terms of (1) applicability to "general-purpose" tasks, (2) the amount of self-contained data compared to the kind with pointers, especially pointers to code and (3) the extent to which the thing is itchy and annoying at times. So comments are most welcome. In particular, if you know of an environment that, put simply, isn't more itchy than Matlab but isn't less general-purpose than Python, that would be very interesting.

The Algorithmic Virtual Machine

There's a very influential platform called the AVM, which stands for Algorithmic Virtual Machine. That's the imaginary device people use as their mental model of a computer. In particular, it's used by many people working on algorithms where performance matters. Performance matters in many different contexts, ranging from huge clusters processing astronomic amounts of data to modest applications running on pathetically weak hardware. However, I believe that the core architecture of the AVM is basically the same everywhere.

AVM application development is done using the ubiquitous AVM SDK – a whiteboard and a couple of hands for handwaving. An AVM application consists of a set of operations your algorithm needs executed. Each operation has a cost (typically one cycle, sometimes more). You can then estimate the run time of your algorithm by the clever technique of summing the cost of all operations.

These estimations are never close enough to the real run time. The definition of "close enough" varies; the quality of estimations, by and large, doesn't. That is, I claim that your handwavy AVM-derived estimation will fail to meet your precision requirements no matter what those requirements are. Apparently our tolerance for errors grows with our lack of understanding of the problem, but it never grows enough. But I'm not really sure about this theory; I'm only sure about the AVM-estimations-suck part. Here's why.

The AVM is basically this imaginary machine that runs "operations". Here are some things that real machines must do, but the AVM doesn't:

  • Fetching instructions
  • Fetching operands
  • Testing for conditions
  • Storing results

Basically, the Algorithmic Virtual Machine developers concentrate on "operations" and ignore addressing, branches, caches, buses, registers, pipelines, and all those other gadgets which are needed in order to dispatch the operation. In fact, that's how I currently distinguish between people who write software to get a job done and people who think of software as their job. "People who program" are into operations (algebra, networking, AI); "programmers" are into dispatching (programming languages, operating systems, OO). This is about mental focus rather than aptitude. I haven't noticed that people of either group are inherently less productive than the other kind.

When they're after performance, the "operations" people will naturally look for a way to reduce the number of operations. Sometimes, they'll find an algorithm with a better asymptotic complexity – O(N+M) instead of O(N*M). At other times, they'll come up with a way to perform 4*N*M operations instead of 16*N*M. Both results are very significant – if M and N are the only variables. The trouble is that you can't see all the variables if you just look at the math (as in "we want to multiply and sum all these and then compare to that"). That way, you assume that you run on the AVM and leave out all the dispatching-related variables and get the wrong answer.

Is there a way to take the cost of dispatching into account? Not really, not without implementing your algorithm and measuring its performance. However, families of machines do have related sets of heuristics that can be used to guess the cost of running on them. For example, here are a couple of heuristics that I use for SIMD machines (they are relevant elsewhere, but their relative importance may drop):

  1. Bandwidth is costly.
  2. Addressing is costly.

These heuristics are vague, and I don't see a very good way to make them formal. Perhaps there isn't any. To show that my points have any formal significance, I'd have to formally prove that there's unavoidable intrinsic cost to some things no matter how you build your hardware. And I don't know how to go about that. So what I'll do is I'll give some examples to show what I mean, and leave it there.

Bandwidth

Consider two "algorithms" (probably too fancy a name in this context): computing a dot product, and computing its partial sums (Matlab: sum(a .* b) and cumsum(a .* b)). Exactly the same number of "operations" – N multiplications and N additions. Many people with BA, MSc and PhD degrees in CS assume that the run time is going to be the same, too. It won't, because sum only produces one output, and cumsum produces N outputs. Worse, if the input vector elements are 8-bit integers, we probably need at least 32 bits for each output element. So we generate N*4 bytes of output from N*2 input bytes.
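
In NumPy terms, just to make the I/O asymmetry concrete – a sketch assuming 8-bit inputs and 32-bit accumulators as above:

import numpy as np

N = 1 << 20
a = np.random.randint(0, 256, N, dtype=np.uint8)  # N input bytes
b = np.random.randint(0, 256, N, dtype=np.uint8)  # N input bytes

prod = a.astype(np.uint32) * b                    # N 32-bit products
dot  = prod.sum()                                 # one number out
psum = np.cumsum(prod, dtype=np.uint32)           # N numbers out

print(a.nbytes + b.nbytes, psum.nbytes)           # 2097152 4194304 - 2*N in, 4*N out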

At this point, some people will say "Yeah, memory. Processors are fast, memories are slow, sure, memory is a problem". But it isn't just about the memory; memory bandwidth is just one kind of bandwidth. Let's look at the non-memory problems of the partial-sums-of-dot-product algorithm. On the way, I'll try to show how the "bandwidth costs" heuristic can be used to guess what your hardware can do and what the performance will be.

Consider a machine with a SIMD instruction set. Most likely, the machine has registers of fixed width (say, 16 bytes), and each instruction gets 2 inputs and produces 1 output. Why? Well, the hardware ought to support 2 inputs and 1 output to do basic math. Now, if it also wants to have an instruction that produces, say, 4 outputs, then it needs to have 3 additional output buses from the data processing units to the register file. It also needs a multiplexer so that each of the 4 outputs can be routed to each of its N registers (N can be 16 or 32 or even 128). The cost of multiplexers is, roughly, O(M*N), where M is the number of inputs and N is the number of outputs. That's awfully costly. Bandwidth costs. So they probably use 2 inputs and 1 output everywhere.

Now, suppose the machine has 16 multipliers, which is quite likely – 1 multiplier for each register byte, so we can multiply 16 pairs of bytes simultaneously. Does this mean that we can then take those 16 products and compute 16 new partial sums, all in the same cycle? Nope, because, among other things, we'd need a command producing 16×4 bytes to do that, and that's too much bandwidth. Are we likely to have a command that updates less than 16 accumulators? Yes, because that would speed up dot products, and dot products are very important; let's look at the manual.

You're likely to find a command updating – guess how many? – 4 accumulators (32 bits times 4 equals 16 bytes, that's exactly one machine register). If the register size is 8 bytes, you'll probably get a command updating 2 accumulators, and so on. Sometimes the machine uses "register pairs" for output; that doubles the register size for output bandwidth calculation purposes. The bottom line is that instruction set extensions can speed up dot product to an extent impossible for its partial sums. You might have noticed another problem here, that of the dependency of a partial sum on the previous partial sum. Removing this dependency doesn't solve the bandwidth problem. For example, consider the vertical projection of point-wise multiplication of 2 8-bit images, which has the same not-enough-accumulators problem.

There is little you can do about the bandwidth problem in the partial sums case – the algorithm is I/O bound. Some algorithms aren't, so you can optimize them to minimize the cost of bandwidth. For example, matrix multiplication is essentially lots of dot products. If you do those dot products straightforwardly, you'll have a loop spending 2 commands for loading the matrix elements into registers, and one command for multiplying and accumulating (MAC). 2 loads per MAC means an overhead of 200%.

However, you can work on blocks – 4 rows of matrix A and 4 columns of matrix B, and compute the 4×4=16 dot products in your loop. That's 4+4=8 loads per 16 MACs; the overhead dropped to 50%. If you have enough registers to do this. And it's still quite impressive overhead, isn't it? Your typical AVM user would be very disappointed. (Yes, some machines can parallelize the loads and the MACs, but some can't, and it's a toy example, and stop nitpicking). BTW, blocking can be used to save loads from main memory to cache just like we've used it to save loads from cache to registers.
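
Here's the bookkeeping spelled out as code – a toy scalar version in Python, only there to make the 2-loads-per-MAC versus 8-loads-per-16-MACs arithmetic visible (it assumes n is a multiple of 4; no real machine runs matrix multiplication like this):

def matmul_naive(A, B, n):
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]   # 2 loads per 1 MAC: 200% overhead
    return C

def matmul_blocked_4x4(A, B, n):
    C = [[0] * n for _ in range(n)]
    for i in range(0, n, 4):
        for j in range(0, n, 4):
            for k in range(n):
                a = [A[i + d][k] for d in range(4)]   # 4 loads
                b = [B[k][j + d] for d in range(4)]   # 4 loads
                for di in range(4):
                    for dj in range(4):
                        C[i + di][j + dj] += a[di] * b[dj]   # 16 MACs: 50% overhead
    return C

The point isn't the Python, of course – it's that the blocked inner loop buys its 16 MACs with 8 loads instead of the 32 the naive loop would spend on the same work.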

OK. With partial sums of dot product, the bandwidth problem kills performance, and with matrix multiplication, it doesn't. What about convolution, which is about as basic as our previous examples? Gee, I really don't know. It's tricky, because with convolution, you need to store intermediate results somewhere, and it's unclear how many of them you're going to need. The optimal implementation depends on the quirks of the data processing units, the I/O, and the filter size. If you come across a benchmark showing the performance of convolution on some machine, you'll probably find interesting variations caused by the filter size.

So we have a bread-and-butter algorithm, and non-trivial & non-portable performance characteristics. I think it's one indication that your own less straightforward algorithm will also perform somewhat unpredictably. Unless you know an exact reason for the opposite.

Addressing

Bandwidth is one problem with fetching operands and storing results. Another problem is figuring out where they go. In the case of registers, we have costly multiplexers for selecting the source and destination registers of instructions. In the case of memory, we have addresses. Computing addresses has a cost. Reading data from those addresses also has a cost. Some address sequences are costlier than others from one of these perspectives, or both.

The dumbest example is the misalignment problem. People who learned C on x86 are sometimes annoyed when they meet a PowerPC or an ARM or almost any other processor since it won't read a 32-bit integer from a misaligned address. So when you read a binary buffer from a file or a socket, you can't just cast the char* to an int* and expect it to work. Isn't it nice of x86 to properly handle these cases?

Maybe it's nice, maybe it isn't (at least if it failed, the code would be fixed to become legal C), but it sure is costly. The fact that it's "in the hardware" doesn't make it a single-cycle operation. If your address is misaligned, the 32 bits may reside in two different memory words (no matter what the word size is). The hardware will have to read the low word, and then read the high word, and then take the high bits of the low word and the low bits of the high word and make a single 32-bit value out of them. Because in one cycle, memories can only fetch one word from an aligned address.

Does it matter outside of I/O-related code using illegal pointer-casting? Consider the prosaic algorithm of computing the first derivative of a vector, spelled v(2:end)-v(1:end-1) in Matlab. If we run on a SIMD machine, we could execute several subtractions simultaneously. In order to do that, we need to fetch a word containing v[0]…v[15] and a word containing v[1]…v[16] (both zero-based). But the second word is misaligned. The handling of misalignment will have a cost, whether it's done in hardware or in software.
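
Here's what that looks like as an SSE2 sketch for 8-bit elements: one load at v+i and one misaligned load at v+i+1 per 16 outputs. On x86 the unaligned load "just works"; on other machines, this is where the extra cost shows up. The intrinsics are standard <emmintrin.h>, the rest is my assumption:

#include <emmintrin.h>
#include <stdint.h>

void deriv_u8(const uint8_t *v, uint8_t *d, int n)
{
    for (int i = 0; i + 16 < n; i += 16) {
        __m128i lo = _mm_loadu_si128((const __m128i *)(v + i));       /* v[i..i+15]              */
        __m128i hi = _mm_loadu_si128((const __m128i *)(v + i + 1));   /* v[i+1..i+16], misaligned */
        _mm_storeu_si128((__m128i *)(d + i), _mm_sub_epi8(hi, lo));   /* d[i] = v[i+1] - v[i]     */
    }
}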

Well, at least the operands of the subtraction live at consecutive addresses – 0,1,2…15 and 1,2,3…16. That's the way data processing units like it: you read a pack of numbers from memory and feed them straight into the array of adders, ready to be crunched. It's not always like that. Consider scaling: a(x) = b(s*x+t). This can be used to resize images (handy), or to play records at a different speed the way you'd do with a tape recorder (less handy, unless you like squeaky or growly voices).

Now, if s isn't integral (say, s=0.6), you'd have to fetch data from places such as s*x+t = 1.3, 1.9, 2.5, 3.1, 3.7... Suppose you want to use linear interpolation to approximate b(1.3) as b(1)*0.7+b(2)*0.3. So now we need to multiply the vector of "low" elements – b([1,1,2...]) – by the vector of weights – [0.7,0.1,0.5...] – and add the result to the similar product b([2,2,3...])*[0.3,0.9,0.5...]. The multiplications and the additions map nicely to SIMD instruction sets; the indexing doesn't, because you have those weird jumpy indexes. So this time, the addressing can become a real bottleneck, because it can prevent you from using SIMD instructions altogether and serialize your entire computation.
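
A scalar sketch of that loop, with names following the text; the float arithmetic is my simplification (a real implementation would use fixed point and worry about the edges of b):

void scale(const float *b, float *a, int n, float s, float t)
{
    for (int x = 0; x < n; ++x) {
        float pos = s * x + t;
        int   i0  = (int)pos;                       /* "low" element index      */
        float w   = pos - i0;                       /* weight of the "high" one */
        a[x] = b[i0] * (1.0f - w) + b[i0 + 1] * w;  /* b[i0] is the jumpy part  */
    }
}

The multiply-and-add part vectorizes fine; it's the b[i0], b[i0+1] fetches that don't map to a plain "load 16 consecutive elements" instruction.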

Well, at least we access adjacent elements. This means that most memory accesses will hit the cache. When you bump into an element that isn't cached yet, the machine will bring a whole cache line (say, 32 bytes), and then you'll read the other elements in that cache line, so it will pay off. You can even issue cache prefetching instructions so that while you're working on the current cache line, the machine will read the next one in the background. That way, you'll hit the cache all the time, instead of having your processor repeatedly surprised (hey, I don't have a(32) in the cache!.. hey, I don't have a(64) in the cache!.. hey, I don't have…). Avoiding the regularly scheduled surprise can be really beneficial, although cache prefetching is truly disgusting (it's basically a very finicky kind of cooperative multi-tasking – you ought to stuff the prefetching commands into exactly the right spots in your code).
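
For what it's worth, here's how the prefetching could be spelled with GCC/clang's __builtin_prefetch; the distance of 8 floats (one 32-byte line) ahead is a made-up number that would have to be tuned per machine, which is exactly the finicky part:

float sum(const float *v, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i) {
        __builtin_prefetch(v + i + 8);   /* ask for the next cache line while summing this one */
        s += v[i];
    }
    return s;
}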

Now, consider a(x) = b(f(x)) – a generic transformation of an input vector given a function for computing the input coordinate from the output coordinate. We have no idea what the next address is going to be, do we? If the transformation is complicated enough, we're going to miss the cache a lot. By the way, if the transformation is in fact simple, and the compiler knows the transformation at compile time, the compiler is still very unlikely to generate optimal cache prefetching commands. Which is one of the gazillion differences between C++ templates and "machine-optimal" code.

DVMs and TVMs

My bandwidth and addressing heuristics don't model a real machine; they only model an upgrade to the AVM for SIMD machines. Multi-box computing is one example of an entire universe of considerations they fail to model. So what we got is a DVM – Domain-specific Virtual Machine.

Now, in order to estimate performance without measuring (which is necessary when you choose your optimizations – you just can't try all the different options), I recommend a TVM (Target-specific Virtual Machine). You get one as follows. You start with the AVM. This gives overly optimistic performance estimations. You then add the features needed to get a DVM. This gives overly pessimistic estimations.

Then, you ask some low-level-loving person: "What are the coolest features of this machine that other machines don't have?" This will give you the capabilities that the real processor has but its DVM doesn't have. For example, PowerPC with AltiVec extensions is basically a standard SIMD DVM plus vec_perm. I won't talk about vec_perm very much, but if you ever need to optimize for AltiVec, this is the one instruction you want to remember. It solves the indexing problem in the scaling example above, among other things. Using a SIMD DVM and forgetting about vec_perm would make AltiVec look worse than it really is, and some algorithms much more costly than they really are.
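
For the flavor of it: vec_perm picks 16 arbitrary bytes out of the 32 bytes sitting in two registers, according to a third register full of indexes – which is exactly the "jumpy index" ingredient the plain SIMD DVM lacks. A minimal sketch, assuming GCC/clang with -maltivec and <altivec.h>; building the index vector for a given s and t is the part left out:

#include <altivec.h>

/* Gather 16 arbitrary bytes out of the 32 bytes in lo|hi; each byte of idx
   is an index in 0..31 into the concatenation of lo and hi. */
vector unsigned char gather16(vector unsigned char lo,
                              vector unsigned char hi,
                              vector unsigned char idx)
{
    return vec_perm(lo, hi, idx);
}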

And this is how you get a TVM for your platform. The resulting mental model gives you a fairly realistic picture, second only to reading the entire manual and understanding the interactions of all the features (not that easy). And it definitely beats the AVM by… how do you estimate the quality of handwaving? OK, it beats the AVM by a factor of 5, on average. What, you want a proof? Just watch the hands go.