The nomadic programmer

There's this broad metaphor I have, with no conclusions attached – just an attempt to describe dynamics. I've recently shared it with the commie ex-VP school teacher and he liked it, so I thought it could fit well with the other borderline stuff I host at Proper Fixation under the "wetware" category.

So. One recurring theme in the history of civilization is the conflict between nomadic and settled people. Nomads think that land is for feeding cattle and you move elsewhere once there's nothing left to graze. Villagers figure that land is for growing food, so you settle on it and fertilize it and irrigate it and stuff. Initially, nomads typically dominate the landscape, periodically attacking the settled villagers and taking their crops. However, the settled people eventually accumulate enough surplus to support cities, nation states and standing armies, extending their control to more and more lands and eventually exterminating the nomadic lifestyle altogether.

The way I painted this picture, I tend to side with the hard-working settled folk, the nomads being the parasitic losers I've depicted, and I think most of us civilized humans share similar sentiments. However, in my metaphor I side with the nomadic programmer, at least to a large extent, and I do so because of the meaning my metaphor assigns to "land".

The thing I find analogous to land in programming is problems, because that's where programmers live. Programmers live on (in?) problems in the sense of dealing with broken things most of the time – once something starts working, you move on to something that doesn't. In another sense, large problems or problem areas a programmer deals with define that programmer's territory. The programmer is in immediate demand to the extent that solutions to "his" problems are in demand; problems feed programmers. Strong programmers seek, in one way or another, to expand their responsibility to encompass more problems, and to preserve their existing responsibilities. And so on.

Now if we restate the respective worldviews of nomads and settlers in the terms of this metaphor, we'll get this. Nomads think that problems are for solving, and you move elsewhere once there's nothing left to graze. Settlers think that problems are for growing, so they settle on them and fertilize them and irrigate them and stuff.

And now you can see why I'm inclined to sympathize with the nomadic programmer. Two other things fueling this sympathy are issues of personality to be discussed soon, and the fate of the nomad to be discussed immediately. And while the nomad is no longer the parasite, rest assured that he's still, in the long run, the loser.

Initially – in a young and small organization – nomadic programmers tend to dominate the landscape. There are more problems than people around. The nomadic programmer travels from one urgent problem to another, grazing through them as fast as he can. Occasionally he stumbles upon a settler who has settled on a problem near the nomad's territory and grown crops of code there. Well, if the problem occupied by the settler becomes urgent, or if the crops stand in the way of solving the nomad's adjacent urgent problem, the nomad will go ahead and brutally solve the settler's problem, wiping out his crops. The politics of the invasion will be trivial – a promise to deliver by the nomad carries lots of weight at this stage and the settler will not issue a counter-promise (to deliver in his own way) because he's a peaceful code-growing villager who isn't into stress which necessarily comes with delivering quickly.

However, time goes by and, sure enough, the settled people accumulate quite a surplus. What you grow on land is surplus wheat; what you grow on problems is surplus code. Code that wouldn't naturally grow on a problem – but now that the problem has been fertilized by the original settlers, they've grown enough code on it to support whole cities, a nation state, and a standing army of programmers, all making a living by fiddling with this code.

The nomad starts running out of pasture. Sure enough, there are lots of problems just like there used to be. But you can no longer solve them because (1) now it's the majority and not a minority of problems that are already owned by someone (growing them rather than solving them) and (2) in most cases invasion is no longer an option. Now that the problem is owned by a nation state, responsible for lots of code and with lots of people working on that code, the nomad's promise to deliver quickly carries very little weight compared to the danger of irritating the sovereign. While it is probably still true that a nomad would deliver more quickly than the whole nation state team, the nomad will not be able to take over the entire responsibility of the team. (It is possible that the single reason for the latter is the problems grown by the team itself and that a few nomads could in fact handle the original problem. But it is irrelevant since problems that could have been avoided are no less real than problems that couldn't.)

So if the organization, by some decision making mechanism, lets the nomad invade the territory of the settled team and solve the stupid problem, and then the offended team, by some decision making mechanism, fights back by effectively going on strike, there is nothing the nomad will be able to offer the organization at this point. Of course it doesn't have to come to this, just like political conflicts don't have to come to full-scale wars, or personal conflicts to fist fights or court hearings. It's enough for the worst case scenario to be likely to work out in favor of A rather than B to shift the balance decisively in favor of A. Even if neither A nor B nor anyone making decisions affecting A and B actually think in terms of this scenario, things tend to evolve and adapt such that decisions are made in favor of A. And in our case, the nomadic programmer is B.

Solving problems just isn't the big thing in this organization anymore, just like the quality of life experienced by the inhabitants of some territory isn't the main theme in international politics. Perhaps there are ways to improve the quality of life in Siberia, however this is not nearly as important politically as the fact that there's already a guy exclusively responsible for the quality of life in Siberia. Perhaps Socialism with Chinese Characteristics could yield improvements in the lives of Siberians that Managed Democracy could not, however, if the Chinese try to act on this assumption, there will be a nuclear war. If what remains of a nomadic tribe somewhere in the region makes a similar attempt, then it will remain no more.

The disgruntled nomadic programmer reduces his ambition to merely being left alone to wander the remaining wilderness. However, this option is no more real for him now than the option of being left alone was available to the settler in the old days. Back then, the settlers were never safe since a nomad could always bump into them in an attempt to solve a related problem, and if their stuff got in the way, he'd rewrite or delete/disable their stuff. Now it is the nomad who is never safe since the nation states keep expanding their responsibilities into neighboring problems – having enough people to have some of them free for that some of the time.

(Actually having even partially idle workers on a team leaves few satisfying alternatives to an attempt at expanding the team's responsibilities since other teams are always happy to seize an idle worker. Likewise, back in the old days the nomadic programmer had few satisfying alternatives to invading and solving others' problems since otherwise he couldn't keep his promises to deliver. It's not (just) the intentions that fuel wars, it's (also) the situation.)

The nation states seeking to expand won't fight each other since the nomad is a much easier target, not having the resources (time and reports) to look over his entire territory. Once a nation state team has managed to take over some of that ever-shrinking territory, the nomad will never gain it back. Increasingly, the nomad has to reach compromises with neighboring nation states whenever his work is related to their work. Then it turns out that in order to be able to work on what he wants at all, he has to do it the way a chief commander or an officer of a nation state team wants him to do it – and then that in order to work on anything at all, he has to report to such a manager.

At this point the nomadic programmer can use his reputation and seniority to get pseudo-promoted to a non-productive position. Alternatively, he can actually become a report of a nation state team manager with whom the relationship is likely already strained – and his seniority, reputation and ambitions won't make the transition into this particular position of a report any smoother. Alternatively he can quit. His failure is now complete.

(It may sound like a natural thing for a nomad to change jobs fairly frequently – part of a lifestyle rather than the failure of that lifestyle. However, nomadic programmers are those who like to travel from problem to problem – not from job to job; some like the latter but some don't. A new job at a new place means a temporary, but possibly significant loss of confidence and efficiency. An African nomad won't necessarily welcome a relocation to Alaska.)

As I've said above, I have reasons having to do with my personality to side with the nomadic programmer, especially at the stage of mounting pressure from nation state teams. The people I tend to relate to most easily seem to be those who prefer freedom to power. A talented freedom-seeker with a strong sense of responsibility will accumulate, well, responsibilities much more quickly than reports – a lot of territory to wander, and no standing army to protect it. (The problem with reports is that you take their freedom by telling them what to do and they take your freedom through your responsibility for their actions; who wants reports?) Since many freedom-lovers disdain politics, they won't respect international borders – a problem should be solved, dammit; hence they're likely to initiate invasions.

However, while this means that I personally will tend to find myself sympathizing with particular nomadic programmers, this does not mean that theirs is the right way or something. For example, it is unclear which share of programming problems out there can really be "solved" – grazed through and left alone – and which problems actually require continuous care and gardening that a true nomad is not likely to supply. Also, whether there's a "solution" you only need to "maintain" or an "infrastructure" you want to "extend", the code needs a permanent owner. I don't believe in collective code ownership any more than in collective ownership of anything else – what it usually means is that everybody collectively fights over something. Therefore I think that ownership should generally be respected, and so a compromise which is, from a technical viewpoint, quite moronic, can otherwise be a great thing – a belief outside the nomad's way.

So while I know where my sympathies lie, I don't know which camp I'm in and this is why this metaphor doesn't come with any conclusions, just the dynamics. In fact I'd rather leave it without conclusions but I wouldn't mind expanding more on the dynamics. For example, some – but not all – settled civilizations were actually started by nomads enslaving agricultural villagers and settling among them. Apparently a similar distinction can be made between nation state teams of programmers; it is then interesting whether differences in their behavior can be traced to their different origins. Perhaps a person more entertained than appalled by the sort of perspective on the adventurous lives of programmers here presented is also the kind of person more entertained than appalled by the history of mankind in general and so could help develop this line of thought based on his knowledge of history. Could be fun.

Update (2009-08-18) – Chuck Moore: "I’ve met too many people who want to make a career out of a project instead of completing it" – the nomad's view of the settlers. Nomadism is apparent in other writing by Chuck Moore – his disdain for "complexity" (which implies dependency on large teams of people you ought to manage, annoying constraints imposed by systems made by someone else and other things nomads don't like), his firm opinion that distinct projects should have distinct code bases (customizability and "reuse" imply complexity and otherwise reduce the chances to "hermetically close" and truly complete a project), etc.

Humans and compilers need each other: the VLIW SIMD case

The state of the art in optimizing compilers today is such that for optimizing code, you need (1) a strong optimizing compiler and (2) a strong optimizing human. My rule of thumb is that (1) alone will yield 2x to 10x slower code. This is also what a person selling a (great) compiler "giving 80% of the optimal performance with no manual intervention" once told off-record to a roomful of programmers who pressed him into a corner, elevating my rule of thumb to a nobler plane of anecdotal evidence.

Now, I claim that this situation will persist, and in this post I'll try to close the fairly large gap between this claim and the mere acknowledgment of what the state of the art is today. The gap is particularly large for the believer in the possibility of strong AI – and while my position is a bit different, I do believe in fairly strong AI (can I say that? people keep telling me that I can't say "nearly context-free". oh well.)

I realize that many people experienced in optimization feel that, on the contrary, there's in fact no gap large enough to justify an attempt as boringly rigorous (for a pop tech blog) at proving what they think is obvious as will shortly follow. But I think that many language geek discussions could benefit from a stronger bound on the power of a Sufficiently Smart Compiler than can be derived from (necessarily vague) doubts on the power of AI, and in this post I'll try to supply such a bound. I actually think a lot of (mainly domain-specific) things could be achieved by AI-ish work on compilation – closer to "identify bubble-sort and convert to quick-sort" than to traditional "analyze when variables are alive and assign them to registers" – and this is why it's useful to have a feeling when not to go there.

So, consider chess, where the state of the art is apparently quite similar to that in optimization: a strong human player using a strong computer program will take out both a human and a computer playing alone. However, it is conceivable that a program can be developed that doesn't need the help of a human, being capable of completely simulating human thought processes, or of alternative processes which are consistently superior. Why can't it be the same with optimizing compilers?

(Chess and optimization are similar in another respect – few care about them; I readily acknowledge the insignificance of a 10x speed-up in a continuously expanding set of circumstances, I just happen to work in an area where it does count as a checkmate.)

I'll try to show that optimization is a fundamentally different game from chess, quite aside from the formal differences such as decidability. I'll use optimizing for VLIW SIMD processors to show where compilers outperform humans and vice versa. I'll be quoting a book co-authored by the inventor of VLIW, "Embedded Computing: A VLIW Approach", to support my position on the relative strength of humans and compilers in these cases. I'll then try to show that my examples are significant outside the peculiarities of current hardware, and attempt to state the general reason why humans are indispensable in optimization.

VLIW SIMD

First, we'll do the acronym expansion; skip it if you've been through it.

VLIW stands for "Very Long Instruction Word". What it really means is that your target processor can be told to execute several instructions in parallel. For example: R0=Add R1,R2 and R3=Mul R0,R1 and R1=Shift R5,R6. For this to work, the processor ought to be able to add, multiply and shift in parallel, that is, its execution hardware must be packed into several units, each getting distinct inputs. The units can be completely symmetric (all supporting the same operations); more often, different units support different instruction sets (so, for example, only one unit in a processor can multiply, but two of them can add, etc.) A stinky thing to note about VLIW instructions is the register semantics. In the example instruction above, R0 is mentioned both as an input and as an output. When it's mentioned as an input of Mul its old value is meant, and not the value computed by Add. This is somewhat natural since the whole point is to run Add and Mul in parallel so you don't want Mul to wait for Add; but it's confusing nonetheless. We'll come back to this shortly.

SIMD stands for "Single Instruction, Multiple Data" and is known much more widely than VLIW, being available on desktop and server processor architectures like x86 and PowerPC (VLIW reigns in the quieter embedded DSP domain, the most commercially significant design probably being TI's C6000 family.) SIMD means that you have commands like R0=Add8 R1,R2, which does 8 additions given 2 inputs. The registers are thus treated as vectors of numbers – for example, uint8[16], or uint16[8], or uint32[4], assuming 16-byte registers. This establishes a preference for lower-precision numbers since you can pack more of them into a register and thus process more of them at a time: with uint16, you use Add8, but with uint8, you get to use the 2x faster Add16. We'll come back to this, too.
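To make the packing trade-off concrete, here's a minimal sketch using x86 SSE2 intrinsics as a real-world stand-in for the imaginary Add8/Add16 (the function names and data are made up):

#include <emmintrin.h> //SSE2 intrinsics
#include <stdint.h>

//8 16-bit additions per instruction – the "Add8" case
void add_8_uint16(uint16_t* out, const uint16_t* a, const uint16_t* b) {
  __m128i va = _mm_loadu_si128((const __m128i*)a);
  __m128i vb = _mm_loadu_si128((const __m128i*)b);
  _mm_storeu_si128((__m128i*)out, _mm_add_epi16(va, vb));
}

//16 8-bit additions per instruction – the 2x faster "Add16" case
void add_16_uint8(uint8_t* out, const uint8_t* a, const uint8_t* b) {
  __m128i va = _mm_loadu_si128((const __m128i*)a);
  __m128i vb = _mm_loadu_si128((const __m128i*)b);
  _mm_storeu_si128((__m128i*)out, _mm_add_epi8(va, vb));
}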

Optimizing for VLIW targets

The basic thing at which VLIW shines is the efficient implementation of "flat" loops (where most programs spend most time); by "flat", I mean that there are no nested if/elses or loops. The technique for implementing loops on VLIW machines is called modulo scheduling. The same technique is used on superscalar machines like modern x86 implementations (the difference from VLIWs being the instruction encoding semantics).

Since I couldn't find a good introductory page to link to, we'll run through a basic example of modulo scheduling right here. The idea is pretty simple, although when I first saw hardware designers doing it manually in a casual manner, I was deeply shocked (they do it for designing new hardware rather than programming existing hardware but it's the same principle).

Suppose you want to compute a[i]=b[i]*c+d on a VLIW processor with 4 units, 2 of them capable of load/store operations, 1 with an adder and 1 with a multiplier. All units have single-cycle latency (that is, their output is available to the next instruction; real VLIW units can have larger latencies, so that several instructions will execute before the result reaches the output register.) Let's assume that Load and Store increment the pointer, and ignore the need to test for the exit condition through the loop. Then a trivial assembly implementation of a[i]=b[i]*c+d looks like this:

LOOP:
R0=Load b++
R1=Mul R0,c
R2=Add R1,d
Store a++,R2

This takes 4 cycles per iteration, and utilizes none of the processor's parallelism as each instruction only uses 1 of the 4 execution units. Presumably we could do better; in fact the upper bound on our performance is 1 cycle per iteration, since no unit has to be used more than once to implement a[i]=b[i]*c+d (if we had two multiplications, for example, then with only 1 multiplying unit the upper bound would be 2 cycles/iteration.)

What we'll do now is blithely schedule all of the work to a single instruction, reaching the throughput suggested by our upper bound:

LOOP:
R0=Load b++ and R1=Mul R0,c and R2=Add R1,d and Store a++,R2

Let's look at what this code is doing at iteration N:

  • b[N] is loaded
  • b[N-1] (loaded at the previous iteration into R0) is multiplied by c
  • b[N-2]*c (computed at the previous iteration from the old value of R0 and saved to R1) is added to d
  • b[N-3]*c+d is saved to a[N]

This shows why our naive implementation doesn't work (it would be quite surprising if it did) – during the first iterations, b[N-1] to b[N-3] are undefined, so it makes no sense to do things depending on these values. However, starting at N=3, our (single-instruction) loop body seems to be doing its job just fine (except for storing the result to the wrong place – the store pointer a ran 3 elements ahead during the first 3 iterations, while no valid results were available to store). We'll take care of the first iterations by adding a loop header – instructions which implement the first 3 iterations, only doing the stuff that makes sense in those iterations:

R0=Load b++
R0=Load b++ and R1=Mul R0,c
R0=Load b++ and R1=Mul R0,c and R2=Add R1,d
LOOP:
R0=Load b++ and R1=Mul R0,c and R2=Add R1,d and Store a++,R2

For similar reasons, we need a loop trailer – unless we don't mind loading 3 elements past the end of a[], but I reckon you get the idea. So we'll skip the trailer part, and move to the more interesting case – what happens when the loop body won't fit into a single instruction. To show that, I can add more work to be done in the loop so it won't fit into the units, or I can use a weaker imaginary target machine to do the same work which will no longer fit into the (fewer) units. The former requires more imaginary assembly code, so I chose the latter. Let's imagine a target machine with just 2 units, 1 with Load/Store and one with Add/Mul. Then our upper bound on performance is 2 cycles per iteration. The loop body will look like this:

LOOP:
R0=Load b++ and R2=Add R1,d
R1=Mul R0,c and Store a++,R2

Compared to the single-instruction case, which was still readable ("Load and Mul and Add and Store"), this piece looks garbled. However, we can still trace its execution and find that it works correctly at iteration N (assuming we added proper header code):

  • At instruction 1 of iteration N, b[N] is loaded
  • At instruction 2 of iteration N, b[N] (loaded to R0 by instr 1 of iter N) is multiplied by c
  • At instruction 1 of iteration N, b[N-1]*c (computed in R1 by instr 2 of iter N-1) is added to d
  • At instruction 2 of iteration N, b[N-1]*c+d (computed in R2 by instr 1 of iter N) is stored to a[N-1]

In common VLIW terminology, the number of instructions in the loop body, known to the rest of humanity as "throughput", is called "initiation interval". "Modulo scheduling" is presumably so named because the instructions implementing a loop body are scheduled "modulo initiation interval". In our second example, the operations in the sequence Load, Mul, Add, Store go to instructions 0,1,0,1 = 0%2,1%2,2%2,3%2. In our first example, everything goes to i%1=0 – which is why I needed an example with at least 2 instructions in a loop, "modulo 1" being a poor way to illustrate "modulo".
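To make the "modulo" part concrete, here's a toy sketch (not a real scheduler) that wraps the linear schedule of the second example around the initiation interval:

#include <stdio.h>

//toy illustration: the loop-body slot of each operation is its cycle in a
//linear, single-iteration schedule, taken modulo the initiation interval
int main() {
  const char* ops[] = { "Load", "Mul", "Add", "Store" };
  int linear_cycle[] = { 0, 1, 2, 3 };
  int initiation_interval = 2; //2 cycles/iteration, as in the second example
  for (int i = 0; i < 4; ++i)
    printf("%-5s -> instruction %d of the loop body\n",
           ops[i], linear_cycle[i] % initiation_interval);
  return 0;
}

This prints Load and Add into instruction 0 and Mul and Store into instruction 1, just like the loop body above.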

In practice, "modulo scheduling" grows more hairy than simply computing the initiation interval, creating a linear schedule for your program and then "wrapping it around" the initiation interval using %. For example, if for whatever reason we couldn't issue Mul and Store at the same cycle, we could still implement the loop at the 2 cycles/iteration throughput, but we'd have to move the Mul forward in our schedule, and adjust the rest accordingly.

I've done this kind of thing manually for some time, and let me assure you that fun it was not. An initiation interval of 3 with 10-15 temporary variables was on the border of my mental capacity. Compilers, on the other hand, are good at this, because you can treat your input program as a uniform graph of operations and their dependencies, and a legal schedule preserving its semantics is relatively easy to define. You have a few annoyances like pointer aliasing which precludes reordering, but it's a reasonably small and closed set of annoyances. Quoting "Embedded Computing: A VLIW Approach" (3.2.1, p. 92): "All of these problems have been solved, although some have more satisfyingly closed-form solution than others." Which is why some people with years of experience on VLIW targets know almost nothing about modulo scheduling – a compiler does a fine job without their help.

The book goes on to say that "Using a VLIW approach without a good compiler is not recommended" – in other words, a human without a compiler will not perform very well. Based on my experience of hand-coding assembly for a VLIW, I second that. I did reach about 95% of the performance of a compiler that was developed later, but the time it took meant that many optimizations just wouldn't fit into a practical release schedule.

Optimizing for SIMD targets

I will try to show that humans optimize well for SIMD targets and compilers don't. I'll quote "Embedded Computing: A VLIW Approach" more extensively in this section. A book on VLIW may not sound like the best source for insight on SIMD; however, I somewhat naturally haven't heard of a book on SIMD stressing how compilers aren't good at optimizing for it. But then I haven't heard of a book stressing the opposite, either, and the success at automatic vectorization claimed in the papers I've seen was modest. Furthermore, the particular VLIW book I quote is in fact focused on embedded DSP, where SIMD is ubiquitous, and its central theme is the importance of designing processors in ways making them good targets for optimizing compilers. It sounds like a good place to look for tips on designing compilers to work well with SIMD and vice versa; and if they say they have no such tips, it's telling.

And in fact the bottom line of the discussion on SIMD (which they call "micro-SIMD") is fairly grim: "The ability of compilers to automatically extract micro-SIMD without hints (and in particular, without pointer alignment information) is still unproven, and manual code restructuring is still necessary to exploit micro-SIMD parallelism" (4.1.4, p. 143). This statement from 2005 is consistent with what (AFAIK) compilers can do today. No SIMD-targeted programming environment I know relieves you of the need to use intrinsics in your C code as in "a = Add8(b,c)", where Add8 is a built-in function-looking operator translated to a SIMD instruction.

What I find fascinating though is the way they singled out pointer alignment as a particularly interesting factor necessitating "hints". Sure, most newbies to SIMD are appalled when they find out about the need to align pointers to 16 bytes if you want to use instructions accessing 16 bytes at a time. But how much of a show-stopper can that be if we look at the costs and benefits more closely? Aligning pointers is easy, producing run time errors when they aren't aligned is easier, telling a compiler that they are can't be hard (say, gcc has a __vector type modifier telling it that), and alternatively generating two pieces of code – optimized for the aligned case and non-optimized for the misaligned case – isn't hard, either (the book itself mentions still another option – generating a non-optimized loop header and trailer for the misaligned sections of an array).
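For example, here's roughly what the "two pieces of code" option could look like when done by hand with SSE2 intrinsics (a sketch with made-up function names, not compiler output):

#include <emmintrin.h>
#include <stdint.h>

//one loop for the aligned case using the fast 16-byte-aligned accesses,
//plus a scalar fallback that also handles misaligned pointers and leftovers
void add_bytes(uint8_t* out, const uint8_t* a, const uint8_t* b, int n) {
  int i = 0;
  int aligned = (((uintptr_t)out | (uintptr_t)a | (uintptr_t)b) % 16) == 0;
  if (aligned) {
    for (; i + 16 <= n; i += 16) {
      __m128i va = _mm_load_si128((const __m128i*)(a + i));
      __m128i vb = _mm_load_si128((const __m128i*)(b + i));
      _mm_store_si128((__m128i*)(out + i), _mm_add_epi8(va, vb));
    }
  }
  for (; i < n; ++i) //non-optimized path
    out[i] = a[i] + b[i];
}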

There ought to be more significant reasons for people to be uglifying their code with non-portable intrinsics, and in fact there are. The book even discusses them in the pages preceding the conclusion – but why doesn't it mention the more serious reasons in the conclusion? To me this is revealing of the difference between a programmer's perspective and a compiler writer's perspective, which is related to the difference between optimization and chess: in chess, there are rules.

For an optimizing programmer, SIMD instructions are a resource from which most benefit must be squeezed at any reasonable cost, including tweaking the behavior of the program. For an optimizing compiler, SIMD instructions are something that can be used to implement a piece of source code, in fact the preferable way to implement it – as long as its semantics are preserved. This means that a compiler obeys rules a programmer doesn't, making winning impossible. A typical reaction of a compiler writer is to think of this as not his problem – his problem ending where program transformations preserving the semantics are exhausted. I think this is what explains the focus on things like pointer alignment (which a compiler can in fact solve with a few hints and without affecting the results of the program) at the expense of the substantive issues (which it can't).

In the context of SIMD optimizations, the most significant example of rules obeyed by just one of the contestants has to do with precision, which the book mentions right after alignment in its detailed discussion of the problems with SIMD. "Even when we manipulate byte-sized quantities (as in the case of most pixel-based images, for example), the precision requirements of the majority of manipulation algorithms require keeping a few extra bits around (9, 12, and 16 are common choices) for the intermediate stages of an algorithm. …this forces us up to the next practical size of sub-word … reducing the potential parallelism by a factor of two up front." They go on to say that a 32b register will end up keeping just 2 16b numbers, giving a 2x speed-up – modest considering all the cases when you won't get even that due to other obstacles.

This argument shows the problems precision creates for the hardware implementation of SIMD. However, the precision of intermediate results isn't as hard a problem as this presentation makes it sound, because intermediate results are typically kept in registers, not in memory. So to keep the extra bits in intermediate results, you can either use large registers for SIMD operations and not "general-purpose" 32b ones, or you can keep intermediate results in pairs of registers – as long as you have enough processing units to generate and further process these intermediate results. Both things are done by actual SIMD hardware.

However, the significant problems created by precision lie at the software side: the compiler doesn't know how many bits it will need for intermediate results, nor when precision can be traded for performance. In C, the type of the intermediate results in the expression (a[i]*3+b[i]*c[i])>>d is int (roughly, 32b), even if a, b and c are arrays of 8b numbers, and the parenthesized expression can in fact exceed 16b. The programmer may know that b[i]*c[i] never exceeds, say, 20000 so the whole thing will fit in 16b. That C has no way of specifying precise ranges of values a variable can hold (as opposed to Lisp, of all rivals to the title of the most aggressively optimizing environment) doesn't by itself make an argument since a way could be added, just like gcc added __vector, not to mention the option of using a different language. Specifying the ranges of b[i] and c[i] wouldn't always suffice and we would have to further uglify the code to specify the range of the product (in case both b[i] and c[i] can be large by themselves but never together), but it could be done.

The real problem with having to specify such information to the compiler isn't the lack of a standard way of spelling it, but that a programmer doesn't know when to do it. If it's me who is responsible for the low-level aspects of optimization, I'll notice the trouble with an intermediate result requiring too many bits to represent. I will then choose whether to handle it by investigating the ranges of b[i] and c[i] and restricting them if needed, by moving the shift by d into the expression as in (a[i]*3>>d)+(b[i]*c[i]>>d) so intermediate results never exceed 16b, or in some other way. But if it's the compiler who's responsible, chances are that I won't know that this problem exists at all.
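Here's a sketch of the two variants (hypothetical 8-bit arrays; the point is that only a person can decide whether the second one is acceptable):

#include <stdint.h>

//the intermediate a[i]*3 + b[i]*c[i] can exceed 16 bits, forcing wider SIMD arithmetic
void filter_wide(uint8_t* a, const uint8_t* b, const uint8_t* c, int d, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = (uint8_t)((a[i]*3 + b[i]*c[i]) >> d);
}

//shift early so intermediates fit in 16 bits (for d >= 1) – cheaper to vectorize,
//but the results are no longer bit-identical to filter_wide's
void filter_narrow(uint8_t* a, const uint8_t* b, const uint8_t* c, int d, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = (uint8_t)(((a[i]*3) >> d) + ((b[i]*c[i]) >> d));
}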

There's a trade-off between performance gains, precision losses and the effort needed to obtain more knowledge about the problem. A person can make these trade-offs because the person knows "what the program really does", and the semantics of the source code are just a rendering of that informal spec from one possible perspective. It's even worse than that – a person actually doesn't know what the program really does until an attempt to optimize it, so even strong AI capable of understanding an informal spec in English wouldn't be a substitute for a person.

A person can say, "Oh, we run out of bits here. OK, so let's drop the precision of the coefficients." Theoretically, and importantly for my claim, strong AI can also say that – but only if it operates as a person and not as a machine. I don't claim that we'll never reach a point where we have a machine powerful enough to join our team as a programmer, just that (1) we probably wouldn't want to and (2) if we would, it wouldn't be called a compiler, it would be called a software developer. That is, you wouldn't press a button and expect to get object code from your source code, you'd expect a conversation: "Hey, why do you need so many bits here – it's just a smoothing filter, do you really think anyone will notice the difference? Do you realize that this generates 4x slower code?" And then someone, perhaps another machine, would answer that yes, perhaps we should drop some of the bits, but let's not overdo it because there are artifacts, and I know you couldn't care less because your job ends here but those artifacts are amplified when we compute the gradient, etc.

This is how persons optimize, and while a machine could in theory act as a person, it would thereby no longer be a compiler. BTW, we have a compiler at work that actually does converse with you – it says that it will only optimize a piece of code if you specify that the minimal number of iterations executed is such and such; I think it was me who proposed to handle that case using conversation. So this discussion isn't pure rhetoric. I really wish compilers had a -warn-about-missed-optimization-opportunities switch that would give advice of this kind; it would help in a bunch of interesting cases. I just think that in some cases, precision being one of them, the amount and complexity of interactions needed to make headway like that exceeds the threshold separating aggressive optimization from aggressive lunacy.

To be sure, there are optimization problems that could be addressed by strong AI. In the case of SIMD, the book mentions one such area – they call it "Pack, Unpack, and Mix". "Some programs require rearranging the sub-words within a container to deal with the different sub-word layouts. From time to time, the ordering of the sub-words within a word (for example, coming from loading a word from memory) does not line up with the parallelism in the code… The only solution is to rearrange the sub-words within the containers through a set of permutation or copying operations (for example, the MIX operation in the HP PA-RISC MAX-2 extension)."

An example of this reordering problem is warping: computing a[i]=b[i*step+shift]. This is impossible to do in SIMD without a permutation instruction of the kind they mention (PowerPC's AltiVec has vec_perm, and AFAIK x86's SSE has nothing so you can't warp very efficiently). However, even if an instruction is available, compilers are AFAIK unable to exploit it. I see no reason why sufficiently strong AI couldn't manage to do such things with few hints in some interesting cases. I wouldn't bet my money on it – I side with Mitch Kapor on the Turing Test bet, but it is conceivable like the invincible chess playing program, and unlike transformations requiring "small" changes of the semantics.
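To make the warping example concrete: in scalar C it is trivial, and the trouble is entirely on the SIMD side – each output element comes from a different, non-adjacent input location, so a plain vector load can't fetch several b[i*step+shift] values at once; you need a permutation instruction of the vec_perm kind mentioned above to rearrange the sub-words:

#include <stdint.h>

//the warp a[i]=b[i*step+shift] in scalar form (hypothetical arrays);
//vectorizing it requires gathering non-contiguous elements into one register
void warp(uint8_t* a, const uint8_t* b, int step, int shift, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = b[i*step + shift];
}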

Significance

There are areas of optimization that are very significant commercially but hardly interesting in a theoretical discussion (and this here's a distinctively theoretical discussion as is any discussion where the possibility of strong AI is supposed to be taken into account).

For example, register allocation for the x86 is exceedingly gnarly and perhaps an interesting argument could be made to defend the need for human intervention in this process in extreme cases (I wouldn't know since I never seriously optimized for the x86). However, a general claim that register allocation makes compiler optimization hard wouldn't follow from such an argument: on a machine with plentiful and reasonably uniform registers, it's hard to imagine what a human can do that a compiler can't do better, and almost everybody would agree that the single reason for not making hardware that way is a commercial one – to make an x86-compatible processor.

Now, I believe that both SIMD and VLIW instruction encodings don't have this accidental nature, and more likely are part of the Right Way of designing high-performance processors (assuming that it makes no sense to move cost from software to hardware and call that a "performance gain", that is, assuming that performance is measured per square millimeter of silicon). One argument of rigor worthy of a pop tech blog is that most high-end processors have converged to SIMD VLIW: they have instructions processing short vectors and they can issue multiple instructions in parallel; some do the latter in the "superscalar" way of having the hardware analyze dependencies between instructions at run time and others do it in the "actual VLIW" way of having the lack of dependencies proven and marked by the compiler, but you end up doing modulo scheduling anyway.

However, this can of course indicate uninformed consumer preference rather than actual utility (I type this on a noisy Core 2 Duo box running Firefox on top of XP, a job better handled by a cheaper, silent single-core – and I'm definitely a consumer who should have known better). So my main reasons for believing VLIW and SIMD are "right" are abstract considerations on building von Neumann machines:

  • You typically have lots of distinct execution hardware: a multiplier has little in common with a load/store unit. Up to a point, it will therefore make sense to support parallel execution of instructions on the different execution hardware. The cost of supporting it will be more I/O ports connecting the execution units with the register file – quite serious because of the multiplexers selecting the registers to read/write. However, the cost of not supporting it will be more execution hardware left unused for more time. So the optimum is unlikely to be "no parallel execution", it's likely "judicious parallel execution".
  • It is cheaper to have few wide registers and wide buses between the register file and the execution units than it is to have many narrow registers and buses. That's because the cost of the register file is proportional to the product of #registers and #buses to the execution units. It is thus significantly cheaper to have 1 unit with 4 8bx8b multipliers and 2 32b buses for the inputs than it is to have 4 units with 1 8bx8b multiplier in each and 8 8b buses for the inputs. It's also cheaper to keep 4 bytes in 1 32b register than in 4 8b registers. Likewise, it is cheaper to have 4 multipliers in 1 processor than to have 4 full-blown processor cores, because each core would have, say, its own fetch and decode logic and instruction cache – which are in fact pure overhead. So if you have a von Neumann machine with registers and buses and instruction cache, it makes sense (up to a point) to add SIMD to make the best of that investment, and this is why commercial VLIWs have SIMD, although the VLIW theory recommends more units instead.

Since I believe that both VLIW and SIMD are essential for maximizing hardware performance, I also tend to think that optimizations needed to utilize these features are "mainstream" enough to support a broad claim about optimization in general. And the main point of my claim is that compilers can't win in the optimization game, because part of that game is the ability to change the rules once you start losing.

Humans faced with a program they fail to optimize change the program, sometimes a little, sometimes a lot – I heard of 5×5 filters made 4×4 to run on a DSP. But even if we exclude the truly shameless cheating of the latter kind, the gentler cheating going into every serious optimization effort still requires negotiating and taking responsibility in a way that a person – human or artificial – can, but a tool like a compiler cannot.

Modulo scheduling is an example of the kinds of optimizations which in fact are best left to a compiler – the ones where the rules are fixed: once the code is spelled out, little can be added by further annotations from the author, and hence the game can be won without much negotiation with the author; although sometimes a little interrogation can't hurt.

Halved pepper

Leaf (yellow)

Aloe

Pearls of wisdom

Proper Fixation always had more unfinished drafts than posts, but recently it's getting ridiculous. I do have a couple of drafts I seriously intend to finish (usually the drafts which don't make it to posthood during the first 4 hours or so go to the eternal drafthood land.) Until I'm able to think this stuff out to the point where I can share the results of my thinking, I figured I could share the far less scarce resource of Wisdom with ya.

***

Since I've violated the Golden Rule of Helping Friends with their PC Problems and attempted to help a friend with his PC problem, expectedly wiping out his hard drive in vain, I had many opportunities to explain the Programmer Paradox: how can a programmer fail to make a computer do as he wishes? While the difficulty of debugging a program without the source proved hard to explain to laymen, I think I've found a metaphor that does a good job. A programmer is to the blue screen of death what Mikhail Kalashnikov is to a loaded AK-47: just as helpless a victim as any other mortal, except for having a profound understanding of the mechanisms of his execution.

***

I would like to get some statistics on file encryption. For example, of all the files on the planet, X% are encrypted. Of all those files, Y% will never be read by someone due to encryption. Of all those files, Z% will never be read by malicious intruders. If I could lay my hands on the value of just one of these unknowns, I'd pick Z, because at least 100-Z% of the files will never be read by their owners. I would bet on Z lying somewhere between 0 and 1.

***

One of the key traits of good code is the ease with which it can be modified. One of the key traits of bad code is the high cost of modifying it. So good code is likely to deteriorate until it's bad enough to become hard to change, and bad code is likely to stay bad. In short, code has a strong tendency to end up bad.

This may sound worthlessly pessimistic – similar, for example, to "It is easier to break a leg than it is to cure it, therefore, most legs end up broken." However, I think it's more analogous to aging – the accumulation of changes in an organism, observably causing most animals to end up dead. Similarly, code that is used will be changed, code that is changed will degrade, and code that degrades beyond a certain point will die.

***

Health tends to be simpler than disease. For example, everybody can brush their teeth but few people can treat cavities. Similarly, it's not very hard to maintain a sane development environment, but pretty hard to deal with the tide of bugs and of long-living branches resulting from a failure to do so. However, I'm generally optimistic about the chances of such cavities to be treated, and as usual, the optimism is based on the pain they cause – a strong incentive to seek and reward treatment.

***

There's this evolution vs Intelligent Design debate. Well, I don't know about life on Earth, but I sure have a hard time believing in Intelligent Design in software. Code has to repeatedly survive exposure to users upon whom its fate depends. Yes, "users" can be a set containing just the author, but only if it's honest-to-God USAGE, that is, the author has to pay a price when the program is hard to use – like not getting important things done properly. Show me a program that someone finds useful and that wasn't subject to such evolutionary pressure, but rather was Intelligently Designed as useful.

I think that my intense hatred of the word "design" has to do with its prominent place in the speech of software creationists. These people are likely to constantly complain about not having enough resources to do The Right Thing in the ugly real world. They are also likely to give you software that you hate enough to wish to kill them, and be articulate enough to convince you that the problem is at your end, and fail to notice how this latter ability quadruples your desire to slash their body into square millimeter pieces.

***

I'll conclude with an off-topic request: if you know a good text advocating a collectivistic or other kind of heterodox approach to economics, I'd be very grateful for a reference. By "advocacy", I mean a text for laymen expressing support for a certain set of policies (as opposed to merely criticizing the effects of existing policies) – like Milton Friedman's "Capitalism and Freedom", for example.

The C++ Sucks Series: the quest for the entry point

Suppose you run on the x86 and you don't like its default FPU settings. For example, you want your programs to dump core when they divide by zero or compute a NaN, having noticed that on average, these events aren't artifacts of clever numerical algorithm design, but rather indications that somebody has been using uninitialized memory. It's not necessarily a good idea for production code, but for debugging, you can tweak the x86 FPU thusly:

//this is a Linux header using GNU inline asm
#include <fpu_control.h>
void fpu_setup() {
  unsigned short cw;
  _FPU_GETCW(cw);
  cw &= ~_FPU_MASK_ZM; //unmask the divide-by-zero exception
  cw &= ~_FPU_MASK_IM; //unmask the invalid-operation exception (NaNs)
  _FPU_SETCW(cw);
}

So you call this function somewhere during your program's initialization sequence, and sure enough, computations producing NaN after the call to fpu_setup result in core dumps. Then one day someone computes a NaN before the call to fpu_setup, and you get a core dump the first time you try to use the FPU after that point. Because that's how x86 maintains its "illegal operation" flags and that's how it uses them to signal exceptions.

The call stack you got is pretty worthless as you're after the context that computed the NaN, not the context that got the exception because it happened to be the first one to use the FPU after the call to fpu_setup. So you move the call to fpu_setup to the beginning of main(), but help it does not. That's because the offending computation happens before main, somewhere in the global object construction sequence. The order of execution of the global object constructors is undefined by the C++ standard. So if you kindly excuse my phrasing – where should we shove the call to fpu_setup?

If you have enough confidence in your understanding of the things going on (as opposed to entering hair-pulling mode), what you start looking for is the REAL entry point. C++ is free to suck and execute parts of your program in "undefined" (random) order, but a computer still executes instructions in a defined order, and whatever that order is, some instructions ought to come first. Since main() isn't the real entry point in the sense that stuff happens before main, there ought to be another function which does come first.

One thing that could work is to add a global object to each C++ translation unit, and have its constructor call fpu_setup(); one of those calls ought to come before the offending global constructor – assuming that global objects defined in the same translation unit will be constructed one after another (AFAIK in practice they will, although in theory the implementation could, for example, order the constructor calls by the object name, so they wouldn't). However, this can get gnarly for systems with non-trivial build process and/or decomposition into shared libraries. Another problem is that compilers will "optimize away" (throw away together with the side effects, actually) calls to constructors of global objects which aren't "used" (mentioned by name). You can work around that by generating code "using" all the dummy objects from all the translation units and calling that "using" code from, say, main. Good luck with that.
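For concreteness, here's roughly what such a dummy global looks like (a sketch with made-up names; as said above, it runs before globals defined after it in the same translation unit, and may well be "optimized away" if nothing "uses" it):

void fpu_setup(); //the function shown above

namespace {
  struct FpuInitializer {
    FpuInitializer() { fpu_setup(); }
  };
  FpuInitializer g_fpu_initializer; //paste/#include into every translation unit
}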

The way I find much easier is to not try to solve this "portably" by working against the semantics prescribed by the C++ standard, but instead rely on the actual implementation, which usually has a defined entry point, and a bunch of functions known to be called by the entry point before main. For example, the GNU libc uses a function called __libc_start_main, which is eventually called by the code at _start (the "true" entry point containing the first executed instruction, AFAIK; I suck at GNU/Linux and only know what was enough to get by until now.) In general, running `objdump -T <program> | grep start` (which looks for symbols from shared libraries – "nm <program>" will miss those) is likely to turn up some interesting function. In these situations, some people prefer to find out from the documentation, others prefer to crawl under a table and die of depression; the grepping individuals of my sort are somewhere in between.

Now, instead of building (correctly configure-ing and make-ing) our own version of libc with __libc_start_main calling the dreaded fpu_setup, we can use $LD_PRELOAD – an env var telling the loader to load our library first. If we trick the loader into loading a shared library containing the symbol __libc_start_main, it will override libc's function with the same name. (I'm not very good at dynamic loading, but the sad fact is that it's totally broken, under both Windows and Unix, in the simple sense that where a static linker would give you a function redefinition error, the dynamic loader will pick a random function of the two sharing a name, or it will call one of them from some contexts and the other one from other contexts, etc. But if you ever played with dynamic loading, you already know that, so enough with that.)

Here's a __libc_start_main function calling fpu_setup and then the actual libc's __libc_start_main:

#include <dlfcn.h>

void fpu_setup(); //the function defined above

typedef int (*fcn)(int (*main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end));
int __libc_start_main(int (*main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end))
{
  fpu_setup();
  void* handle = dlopen("/lib/libc.so.6", RTLD_LAZY | RTLD_GLOBAL);
  fcn start = (fcn)dlsym(handle, "__libc_start_main");
  return (*start)(main, argc, ubp_av, init, fini, rtld_fini, stack_end);
}

Pretty, isn't it? Most of the characters are spent on spelling the arguments of this monstrosity – not really interesting since we simply propagate whatever args turned up by grepping/googling for "__libc_start_main" to the "real" libc's __libc_start_main. dlopen and dlsym give us access to that real __libc_start_main, and /lib/libc.so.6 is where my Linux box keeps its libc (I found out using `ldd <program> | grep libc`).

If you save this to a fplib.c file, you can use it thusly:

gcc -shared -fPIC -o fplib.so fplib.c -ldl
env LD_PRELOAD=./fplib.so <program>

And now your program should finally dump core at the point in the global construction sequence where NaN is computed.

This approach has the nice side-effect of enabling you to "instrument" unsuspecting programs without recompiling them s.t. they run with a reconfigured FPU (to have them crash if they compute NaNs, unless of course they explicitly configure the FPU themselves instead of relying on what they get from the system.) But there are niftier applications of dynamic preloading, such as valgrind on Linux and .NET on Windows (BTW, I don't know how to trick Windows into preloading, just that you can.) What I wanted to illustrate wasn't how great preloading is, but the extent to which C++, the language forcing you to sink that low just to execute something at the beginning of your program, SUCKS.

Barf.

Corrections - thanks to the respective commenters for these:

1. Section 3.6.2/1 of the ISO C++ standard states that “dynamically initialized [objects] shall be initialized in the order in which their definition appears in the translation unit”. So at least you have that out of your way if you want to deal with the problem at the source code level.

2. Instead of hard-coding the path to libc.so, you can pass RTLD_NEXT to dlsym.
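That is, an alternative fplib.c could look like this (a sketch; RTLD_NEXT requires _GNU_SOURCE to be defined before including <dlfcn.h>):

#define _GNU_SOURCE
#include <dlfcn.h>

void fpu_setup(); //as above

typedef int (*fcn)(int (*main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end));
int __libc_start_main(int (*main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end))
{
  fpu_setup();
  //ask the dynamic loader for the next definition of this symbol after our own,
  //i.e. libc's, instead of hard-coding the libc path
  fcn start = (fcn)dlsym(RTLD_NEXT, "__libc_start_main");
  return (*start)(main, argc, ubp_av, init, fini, rtld_fini, stack_end);
}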

Leaf

The internal free market

This is going to be a bit atypical, because I'm going to talk, like, about organizing large teams of programmers. Which I rarely do, for the simple reason that it's not my problem. I'm not a manager, I don't think I'm likely to do a particularly good job as a manager in the near future, and I don't want to be a manager. As far as I'm concerned – if your problem is organizing lots of people, you brought it upon yourself. So this "internal free market" thing, which tends to work well according to my observations, is an exception to my general rule of not making or thinking much about "organizational" observations.

So, free markets. Basically a way to create incentives (because you have to compete) at the cost of redundancy (because of duplicated efforts by many competitors). A redundancy vs dependencies issue, if you like – several competitors means less dependence on each – and since I generally think redundancy is a fair price to pay for removing dependencies, you can guess that I'm leaning towards free market fundamentalism.

At this stage I'm skipping the detours where I'm dragging exciting pseudo-related stuff into this, like the subprime mortgage crisis and the enigmatically overwhelming support for Barack Obama by top programmers and tech bloggers. I'm skipping that to get to the simple point of there being no good way for an employer to create incentives for programmers with money.

How exactly this trick fails to work, and what kind of LOCs you get when you pay by the LOC, are outside my scope. If you're sincerely surprised, there's lots of material for you to browse – Econ 101 Management being as good a place to start as any. The single thing I have to say about the "financial incentive" method here is that its failure isn't at all surprising to the free market fundamentalist.

In a free market, people solve their own problems, and (pay to) use the output of other people only when it helps them solve their problems, in a completely distributed way. Setting prices for things and subsidizing them is the trademark of a centralized, controlled economy. Now, does an employer paying by the LOC or by 1/#defects or whatever create an "internal free market", or an "internal government bureaucracy"? Without looking at soft stuff, like a hypothetical offense to sacred engineering values carried by the act of creating incentives, we can see that subsidizing LOCs will create surplus LOCs, just the way it works with agricultural surplus and everywhere else.

What would we do if we wanted a real internal free market? "Free market" means that we want to have people solving their problems, and let them "pay" each other in some way to solve them, without us controlling the latter process. In our specific context:

  1. "People" means "strong programmers" (or at least decent – or else why did "we", the employer, hire them, dammit?) Folks who, at the very least, like to be productive and have their stuff used. Maybe they also "like to solve puzzles", but not all do. For example, I hate puzzles, and have a strong preference for Alexandrian solutions. But you should still hire me.
  2. For those people, "solving their problems" means delivering user-visible features. This is the basic responsibility of developers towards the organization, this is what the organization is capable of judging (it better be), and this is what links the whole operation to reality through the external market forces.

The only question is, what does it mean for a developer to "pay" another developer? Paying with money makes no sense, not with our definition of "people". People who're into those transactions tend to be self-employed. However, developers do have their own currency, karma points or whatever term you prefer to use for it (they are all irksome; economics is to life what proctology is to anatomy – it's ugly because it's true). I know two kinds:

  1. Time. You can "pay" to developers, teams and the whole organization by volunteering to do particularly unsatisfying but important/urgent work.
  2. Code. When someone uses your code, they are paying you (I repeat, it's not you who are giving something to them; do not make this error or they'll stop using your code.)

Time and code are not unlike gold and printed money, because you can't make more time but you can make more code. However, proceeding with this analogy and trying to scale it to include inflation and such would expose my economic illiteracy and general lunacy to an extent making me want to stop now.

What we'll do now is examine how "trading" time and code works in programming, and how it creates incentives to invest effort into the most needed things, similarly to the way prices do in free market economies. We'll start with "trading code" – the less intuitive but more fundamental kind of transaction.

You have to deliver something user-visible. The user couldn't care less about the guts making up your program – how you parse things, what your communication protocols are, which optimized libraries you use for processing bottlenecks, etc. However, you do care about these things, because they are really needed for things to work. For all these things, you could reuse "infrastructure" that others (for example, me) work on, or you could roll your own (let's ignore "international trade" – using a third-party library – for the moment).

To you, depending on me is a risk – who knows how many bugs I have, how well my stuff maps to your needs, etc. To me, on the other hand, having you reuse my code is the best thing that can happen during work hours. After work, better things can happen, in particular, those having to do with spending the money earned during work hours. But at work, the best thing I can do is be productive and have others use my code. Having lots of users is the workplace equivalent of being rich in the real world. Do you see who pays whom here?

What can an organization do to manage these infrastructure transactions?

  1. The "economical", "capitalist" solution: leave them alone except for securing them. "Leaving them alone" means not controlling them – not mandating the development and reuse of infrastructure and not assigning workforce to it. This means that by making my modules reusable, I'm only trying to please my internal users, so I'm likely to (try to) invest most effort into what they find important and helpful for doing their ultimate job. "Securing transactions" means something similar to the way public companies are forced to expose their accounting. If something becomes reusable code, it ought to have proper documentation, versioning, etc., and the organization must make sure it does.
  2. The "political", "socialist" solution: assign the task of developing a parser, an optimized library, etc. to a person/team – subsidize the parser (the price to a user is now lower – even if the parser is all buggy, a person is officially assigned to fix the bugs, and the responsibility for failures moves to that person and not to the one who decided to reuse the code). This means that the parser will be created even if in a "free company" nobody would want to develop and maintain it, knowing that most people wouldn't take the risk of using it for the benefits it provides. Leading to surplus crops of parsers.
  3. A further improvement on 2, the "communist" solution: force everyone to use The Parser. This means there are no "economic" means to punish the author whatsoever – where "punishing" means "not paying" and "not paying" means "stop using the code". However, there's still hope: you have political means to punish the author. For example, poke fun at the goddamn nightmare infrastructure, yell at the author, yell at his manager, ask your manager to yell at his manager's manager – a whole slew of counter-infrastructure measures. Victims of infrastructural communism use them all the time.
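
To make the "securing transactions" idea from item 1 a bit more concrete, here's a minimal sketch – my own illustration, nothing the argument above prescribes – of a reusable module that exposes its "accounting": a version number, documentation, and a way for internal users to declare what they depend on. All the names in it (parse_config, require, CompatibilityError) are hypothetical.

    # A hypothetical reusable module, "configparse.py" (name invented for this sketch).
    # "Securing the transaction" here means the module says what version it is,
    # documents what it does, and lets internal users state what they rely on,
    # so reuse doesn't degenerate into silent breakage.

    __version__ = "1.2.0"  # bumped on every change visible to internal users

    class CompatibilityError(Exception):
        """Raised when a caller needs a newer version of this module than it has."""

    def require(min_version: str) -> None:
        """Crude version check: callers declare the minimum version they depend on."""
        have = tuple(int(x) for x in __version__.split("."))
        need = tuple(int(x) for x in min_version.split("."))
        if have < need:
            raise CompatibilityError(
                "module is at %s, caller needs at least %s" % (__version__, min_version)
            )

    def parse_config(text: str) -> dict:
        """The 'product' being reused – a deliberately tiny stand-in for a real parser."""
        result = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip()
        return result

A user would call require("1.0") once and then use parse_config; the particular mechanism doesn't matter, only that the version, the docs and the declared dependencies are visible to whoever takes the risk of reusing the code.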

So this is how "trading code" is (more accurately, "can be") a better way of evolving reusable "infrastructure" than centralized planning. In general, the only thing I'm discussing is the reusable stuff – that's what organizations can optimize (or pessimize, creating useless "reusable" modules and not creating the actually needed ones). Nothing can be done about things which aren't reusable by definition, belonging to a single "feature"/"project" – those will have to be written once and only once no matter what.

What's wrong with this picture? Could be many things, but one thing I'll talk about (because I have a good answer for it) is the problem of "instant gratification"/"disruptive changes"/"local optima"/etc. There are grand things that just can't be done through small incremental changes made as by-products of work on "specific features". You really need a person/team assigned to these things. This is somewhat similar to economies of scale which can be achieved by purchasing expensive machinery. How are many small farmers/shoemakers/etc. going to raise money for that machinery without central planning, if they're all busy with their small short-term profits?

This is where entrepreneurs come into play. Entrepreneurs are people with fire up their asses. Normal people want enough money to get by, enough money to not worry about money, or enough money to not have to work for money. Entrepreneurs want more money than they can sensibly spend during the decades of their lifetime. And they want it because they desperately need that money to feed the fire raging up their asses. When they see a potential for making truckloads of money, many of them are willing to put their own savings on the line to chase that chance.

This psychological profile is a speculation of mine – my best attempt to comprehend the inexplicable behavior of making efforts and burning nerve endings to make more money than you could possibly need. However, I do have a quite similar motivation in the context of our "economics of programming" analogy. I'm a "programming entrepreneur", or at least I have, um, the same trademark proctological fireworks. I'm thrilled by opportunities to make stuff that, like, everybody will use, everything will depend on, …and everyone will want a piece of me when it breaks – so? It's still worth it.

I can't make such stuff as a by-product of working on something reasonably user-visible. I need to be assigned to it. What are the savings that I can put on the line? Time invested into doing unsatisfying, but important work. I call my own way of making these deals "buying development with debugging". I'm usually willing to debug the weirder of the urgent problems, although it's not much fun by itself, because it translates to a lot of karma points. I can then spend those karma points by working on what I want 80% of the time, 20% being the continuous urgent debugging tax.

Again, there's more than one way for that kind of "entrepreneur" to start a programming venture:

  1. The "economical" way – spend my own time implementing my ideas. Like a "real" entrepreneur putting his savings on the line and forced to look at his company bleeding that money if it doesn't take off, I will want to stop as soon as possible when I realize that I'm failing. Those so-called "organizational karma points" you gain in the trenches have better uses than wasting on the development of worthless stuff nobody will use.
  2. The "political" way – convincing "the government" (a manager) that my idea is worth implementing, and have someone assigned to it. Now nobody wants to admit the failure early on. I'm not losing anything when someone else struggles with the implementation – "I could do it better". The person working on the thing isn't really held responsible for the failure, either – not his idea, so why not keep trying to make it work? Everybody wants to make his stuff work and be used, after all. And the manager won't want to admit the failure because of all people, he'll get most of the blame. Therefore, the worthless effort will not be stopped for a lot of time.

Free market supporters are sometimes blamed for disrespecting people and reducing human nature to primitive egoism. Well, the only thing I can say is that I sure am a Good Person (how could it be otherwise?), I respect myself lots, and I've successfully "launched" more than one "programming venture" both ways – "economic" (DIY) and "political" (persuasion) – and of each of these two kinds, some succeeded and some failed.

And believe you me, deep down I refuse to take responsibility for the failed "politically launched" projects even now, as we talk about it. On the other hand, the "economically launched" failures are – seriously – the best thing that happened to me in my professional life. I attribute most of my occasional successes – or, more accurately, non-failures – to lessons learned from the DIY failures, which I had no choice but to admit responsibility for. (Damn, that was painful – to an extent that wasn't in my job description.)

Now, I'm not an "internal free market fundamentalist", simply because I know much more about programming than I do about economics, and obnoxious/oversimplified opinions usually correlate with ignorance. However, my experience seems to show that "internal free markets" are healthy enough to sustain continuous improvements on many scales, and eventually punish both "greedy" "instant gratification" techniques of pleasing managers/customers and architectural masturbation, promoting solid work.

And if you're not a manager (I mostly care about non-managers, guys and gals like me, you know), I think this quasi-economic angle can contribute to your ability to look at some young initiative around you and say "Hm, this might work out", or, conversely, "Epic fail on the way, I'm not going to touch this with a laser pointer, man". So, FYI.
