Teeth marks at the rear end

January 12th, 2008

Today we're going to discuss Good Things. Good Things, says I, are the ones that don't bite your butt. A Goodness benchmark can thus be devised as follows: if the user of a Thing has few teeth marks at his rear end, the thing is Good. Now relax and bear with me while I approach the topic sloooowly, but suuuurely.

Like most programmers, I suffer, by nature, from Environmental Laziness. I don't like to switch environments and then spend time learning new ways to do old things. This means that my natural tendency is to stick to one OS, one programming language, one text editor, etc. I realized at some point that this approach sucks, and consciously try to fight it. But I know that if I just let myself be, I'm as Environmentally Lazy as it gets, and no matter what I do, it shows through.

There's a rather small, but colorful group of Environmentalist programmers who love new environments. This group is colorful because each of them runs a different, unimaginably obscure combination of software. For example, I sit next to a world-class Environmentalist at work, and I asked him once, noticing the mosaic of windows on his screen, "Why are you simultaneously launching 12 Eclipses?" (He used Eclipse as a front-end to some weird version of gdb talking to some weird board, but that's another matter). He slowly turned to me, his expression conveying his trademark fake hostility towards the disturbingly stupid mortals, and explained: "Because I'm simultaneously running 1 Ion". Ion is a window manager that will carefully replicate the splash screen of Eclipse, and most other programs. Its author will then tell you that it's Eclipse's, and everybody else's, fault, because they are braindead applications that won't follow the ICCCM. Incidentally, the author is also an "environmentalist" in the common sense of the word: he despises cars and uses a bicycle. At -30C, when bicycles break from the cold. Explains a whole lot to me. I come from Russia, and at -30C, they don't send kids to school so that they don't freeze their faces off or something. Urine doesn't freeze in the air at -30C, but it gets way too close to that point. Anyway, if a guy runs Ion, you can count on him being a die-hard Environmentalist, and on his display containing colorful content at all times.

I always look for Environmentalists around me because they can give priceless advice when you need to choose what software to use between several alternatives. And then when you need to install and configure that program. And then when you upgrade it and your configuration no longer works. And then when you decide to switch to some other program. Environmentalists are really good at this, and I really suck at this, so together, we make a great team as far as I'm concerned. However, I don't think that Environmentalism correlates with other sorts of programming aptitude. I know an Environmentalist, with Emacs customizations and Linux at home and all, that actually said the following sentence and meant every word: "The program entered an infinite loop and then exited it". On the other hand, Probably The Strongest Hacker I've Seen lives in a gvim with appalling, white-on-yellow string literal highlighting. Probably copied the wrong combination of files from other people. And he lives with that, because he just can't be bothered to fix it, although he's awfully sharp and obviously would be done with it in no time. But then the guy with what looks like 12 Eclipses but is in fact 1 Ion is also awfully sharp. Environmentalism doesn't correlate with anything. To me, an Environmentalist is someone who can help if you need to change your environment, but quite likely will have trouble helping you if you don't need to change it. Because some of them just can't work anywhere outside of their customized-to-death workstations. And that's all I know about an Environmentalist a priori; could be any sort of programmer, really.

I think that Environmental Laziness is a manifestation of a more generic personality trait, which I call Neophobia, The Fear of The New. Today, we know which mushrooms are edible and which aren't, thanks to N+1 heroes of the ancient times: N neophiliacs and 1 neophobiac. The saga of their adventures goes like this. The Gang of N+1 traversed the virgin woods of the planet. Each time they stumbled upon a new sort of mushroom, a neophiliac would happily grab a sample, taste it, swallow and say "ummm, interesting!", genuinely amused. In a couple of hours, he would drop dead, which was methodically recorded by the neophobiac on the team: "poisonous". As you can easily prove by induction, at the end of the journey our gang had the size of N+1-K, where K is the number of poisonous kinds of mushroom on Planet Earth. K turned out to be rather large, which explains the relatively rare occurrence of the Neophilia gene in the modern population. But we should obviously cherish and encourage those scarce leftovers, because who else will we use to tell a poisonous window manager from an edible one?

Anyway, the purpose of this whole introduction was to establish the simple fact: I'm a hopeless, Environmentally Lazy Neophobiac. When I sense unknown taste, my reflexes actually want me to throw up ("hey, it's not my job to taste new kinds of mushrooms, I'm the guy with the notebook!", says the reflex). In the civilized age, this body design has its drawbacks since the new tastes normally introduce themselves at social gatherings, where vomit is most often unwelcome. I'm generally in control of my reflexes, so if you're having a party and you wanted to invite me until you read this, that issue shouldn't bother you (a couple of other issues probably should, but I digress). But at the core, below the level where I look like a toilet-trained civilized primate, I hate new things. Probably genetic. And now that we're through with the introduction, we finally reach the second introduction, the one about programming languages.

Back at the university, I learned C++, Java, Matlab, Scheme and ML. Not entirely unexpectedly, Scheme and ML were taught by arrogant weenies who never bothered to check what happens outside of the Pure Functional world. Their not-entirely-correct statements of the form "you can't do that in C++ or Java" earned them the solid credibility value of 0. As to programming assignments, they of course concentrated on the most practically useful ones. Like implementing a Scheme interpreter in Scheme, or doing type checking according to the rules of the ML type system in ML. So the primary thing that I learned about Scheme and ML was that some dead languages and programming styles are apparently particularly interesting to a particularly uninteresting kind of people. Can't really count it as learning Scheme or ML, even though I successfully finished implementing my useless interpreter and type checker. Took me lots of time to understand what's the deal with all those apparently reasonable people loving Lisp and OCaml.

And Matlab is really a DSL. A great DSL, but you won't use it for fiddling with text and files, for example. So my only readily available options for fiddling with text and files were C++ and Java. Tough choice, because both awfully suck at this sort of thing, but somewhat differently. C++ is verbose, has disgusting string manipulation facilities, compiles slowly and dumps core, while Java is even more verbose; verbosity is a big problem, perhaps I'll (verbosely) talk about that some time. I settled for C++ because that's what everybody else was using around me.

It didn't feel very productive; things which seemed like they should be trivial weren't. So I've been laaaazily looking around for some other way to fiddle with text and files. Shell scripts. Crippled variables, ugly control flow, non-existent or exceptionally painful error handling, and program-specific command line syntax used for most of the "argument passing". Screaming fucking bloody mess, to quote John Lydon. sed. Just a little bit too specialized for my taste. awk. Good when a program fits into a command line. Programs that don't fit into one screen reportedly shocked the authors of the language themselves, and that was at the time when screens were much smaller. Perl. A superset of C, shell, sed and awk. The man pages actually address "cerebral C programmers", "sharp shell programmers", "accustomed awk users" and "seasoned sed programmers", to help them migrate to this new C++, sh++, awk++ or sed++, depending on their background. And my background makes it Cshawksed+=4.

This, folks, is insanity. But wait, we have a local ASIC wizard who happens to be an Environmentalist and a Perl user. He'll help me if I get stuck. And Perl does seem to make normal bread-and-butter stuff easy and cheap enough. Variables, arrays, conditions, loops, functions, arithmetic, that kind of thing. There are actual languages out there that have the guts to make this stuff less than straightforward. Like Tcl. Or Prolog. Try to implement a goddamn word count program in Prolog. Fuck Prolog! Seriously, I read that Erlang was first implemented in Prolog and what this means is that I don't understand anything about anything, because in Prolog, I feel that my hands are tied behind my back, and someone keeps spitting at my face. So anyway, Perl. We have an Environmentalist and we have the bread-and-butter things and we have modules to stuff those things into. All righty then.

So I occasionally used Perl for fiddling with text and files, but all this code was single-file programs. I never got to a point where I actually evolved and reused libraries in Perl. I think I'm not a Perl guy. There are two camps of programmers, the programming-as-definitions camp ("this function returns the number of frames in a movie clip") and the programming-as-actions camp ("this function opens an MPEG file and returns the number of frames listed in this header"). The Action guys tend to like Perl, because it lets them do things easily. The Definition guys tend to sneer at Perl, because they don't understand what all those things mean, and why; too many arbitrary things, if you ask them. Me, I'm a Definition guy, struggling to get in touch with his Action side. Deep, isn't it? Anyway, Perl clearly wasn't for me. I used it, and kept thinking about this strange situation where you have Real Languages, like C++, which make it hard to do things, and then there are Toy Languages, like Perl, which make things easy but of course you can't do Serious Programming in them. You are right. It does sound moronic. I felt the paradox. I think of myself as a person who can't say "the program exited an infinite loop". So I couldn't help seeing the paradox. But I was lazy, so I didn't act on it. Programming languages. C'mon. They're all the same. Loops, arrays, functions. The last thing I want is more new ways of doing the same old things. I know more than enough kinds of edible mushrooms already. Leave me alone.

Then, several things happened. First, I found a critical mass of Python speakers near me. With a couple of Environmentalists among them, which made Python an option for me, too, because now I had them to get me out of jams, which was comforting. Second, the "Real Languages are harder to use" paradox became increasingly itchy, as I began realizing that the more you try to plug the holes in C++, the more problems you end up with. Third, a guy who joined me on some project disliked the textual interface to a bunch of C++ code I wrote. Everybody disliked that textual interface; it sucked. Why not wrap your code in Python bindings instead, he asked. Could be a good idea, I said, never tried it though, would you do it? Having the Environmental gene, he agreed. And then it turned out that I needed to work a lot against this interface, because I needed it for testing hardware, and when you make hardware tests, you make lots of them. The stuff going through this interface grew hairy, so I added a wrapper layer, 2-3 thousand lines of Python (for those who live in the C++ land – that's 10-15K LOC for you). Then I wanted to hook it to some other old C++ code of mine. More bindings... Meanwhile, people around were busy churning out Python code. I found myself looking for a way to produce debug information for a new debugger front-end, written in Python. And do it fast. Think, think, think. OK, I'll simply emit Python code for it to eval... More Python. Python is all over the place. I guess that makes me a Python programmer.

I never learned Python. All these introductions were introducing this single little factoid, which I find totally shocking. I NEVER LEARNED IT. How can it be?! It's a programming language, damn it. Let's take an example of a programming language. Um, I dunno, C++? OK, our example is C++. Can you imagine someone trying to make sense of C++ without learning it, just by reading code, writing code and looking things up on demand? Ha! No, make that "Ha" squared and raised to the power of "get out of here". OK, C++ sucks, we've already got enough of that at yosefk.com. Perl. You can't mentally parse Perl without knowing quite a bit of it. And even then, only perl can parse Perl. See? You ought to learn programming languages. Read a book or a tutorial or something. Not a very big deal, by the way, except most people are so damn Environmentally Lazy that if I weren't like that myself, it would really bug the hell out of me. You ought to actually learn, and people don't want to learn, and that's why they prefer to use fewer languages and ignore the itching. OK, so the strings are white-on-yellow. So the error messages occupy the better part of the screen. To quote John Lydon again, "I'm a Lazy Sod! I'm so Lazy! Can't even be BOTHERED!!"

But you see, we aren't necessarily talking about Bad Laziness here, and no, programming languages don't have to require that much learning. At this point, I am supposed to write lots of explanations, which I will optimize out by linking to the book User Interface Design For Programmers by Joel Spolsky. I mention it because of two of its virtues: (1) It perfectly makes and explains my next point and (2) Quite likely you've already read it, because everybody reads Joel On Software, so you know what I'm talking about. What I'm talking about is what this book calls the cardinal axiom of user interface design: "A user interface is well-designed when the program behaves exactly how the user thought it would." I won't argue why this makes sense; this is discussed in the beginning of the book, and I couldn't say it better. What I want to argue is a complementary claim: "Every interface is a user interface." The kind of user we're dealing with varies, but whichever kind it is, the last thing we want to do to that user is to surprise him or her or it (some interfaces are for machines; these are normally easier to get right).

In particular, a programmer is a kind of user, and a programming language is a kind of user interface. So is a function or a class or a library. If the programming language or a library doesn't do what actual programmers out there think it would do, then it sucks. If it goes as far as silently doing the wrong thing instead of telling the user about the mistake in a clear way pointing to the exact source of the problem (I'm talking to you, C++ template weenies), then, my friends, it totally and uncompromisingly sucks. Period.

Some people never think about it this way. Others fail to get it. Most of them are selfish, no-good egomaniacs who always prefer to inflict arbitrary amounts of pain on their users in order to save themselves a tiny bit of work and/or implement their religious beliefs about the essence of engineering aesthetics, the money of their employer be damned. In the truly advanced cases, their aesthetics actually boils down to "it's beautiful if I don't have to do anything". Once their instincts and their religion converge to this single point, these people enter a state of perfect internal harmony which can only be disturbed by a loaded shotgun. Well, if you are one of those people, stop reading now and go away. See? We don't welcome selfish, no-good egomaniacs here. Now that we've only got the caring-about-their-users kind of people with us, the situation is intimate enough for me to show you the teeth marks which Python has left on my butt.

You see, nothing is perfect. And Python certainly is no exception. I'm no big fan of Python. In particular, like all other "modern" "dynamic" languages, Python runs awfully slowly, and for no good reason, most of the time, and it won't be fixed unless they change the language spec. I can say a lot more. But it bites your butt quite rarely. Isn't that nice? I came to appreciate it. I think that when people say that X is designed in a "tasteful" way, the major thing they are telling you is simply "X isn't into butt biting". But you can't always win; either you're going to invest time into learning and memorizing the rules, or you will guess them wrong, earning yourself a teeth mark.

For example, it took me a while before I realized that Python was, in fact, evaluating my definitions, one after another, like there's no tomorrow. Without any kind of lookahead. Evaluation is neither compilation nor execution. You see, in C, you can't use a function before declaring it, because it compiles everything in one pass. In Java, you can use a function anywhere in the code, because first of all, it figures out which functions are defined, and only then issues errors about undefined this and unknown that (C++ does it, too, in some places, but not others). In Python, you can call a function you haven't declared yet, so I figured it was pretty much like Java. I didn't really think about it; let's just go with the flow, I said to myself. Lazy of me, isn't it? I actually wanted that teeth mark; it was an experiment: how lazy can you get, and how much will it hurt?

Well, what Python really does is it doesn't care if something is defined in a namespace until it absolutely has to know; but it doesn't bother to look ahead. When you define a function f that calls a function g, Python only needs to compile code, not run it. OK, let's compile a call to some "g" thing, maybe it's defined somewhere already, maybe not yet, what do I care? But if you call g before it's defined instead of defining a function calling g at that same point, Python will fail, because now you're really asking it to find g, right there on the spot, and it didn't see anything like that yet, and it can't proceed, because you ordered it to do something now. Makes perfect sense for a dynamic language, makes no sense for a static language (people use a bogus distinction "interpreted"/"compiled" in this context, but they shouldn't, because it's bogus; it refers to the implementation when you're really talking about binding rules). I came mostly from static languages; I didn't expect this kind of thing. Mac looks clunky to a novice user coming from a Windows background, as the example from Joel Spolsky's UI book shows. Bite! Ouch!
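Here's the evaluate-as-you-go behavior in a minimal, runnable sketch (the function names are mine, purely for illustration):

```python
# Compiling the body of f only records the name "g"; the lookup happens
# when f is actually called, so defining g afterwards is perfectly fine.
def f():
    return g() + 1

def g():
    return 41

print(f())  # by the time f runs, g is bound, so this works

# But asking for an undefined name "right there on the spot" fails,
# because now Python must resolve it immediately:
try:
    h()  # no h has been defined anywhere above this line
except NameError as e:
    print("as expected:", e)
```

Same rule both times: names are resolved when the code runs, never ahead of time.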

Wait, where's the actual bite? He promised to show us some ass, with teeth marks and all! Where's our metaphorical software weenie porn? OK, here it goes. Consider the definition def f(arg={}): ... This defines a function f with an argument arg, and its default value is an empty dictionary. Now, suppose you call f with the default argument, and then f puts some values into the dictionary. Next time you call it without explicitly passing an argument, the dictionary will no longer be empty. What?! I wanted arg's default value to be an empty dictionary! But it's not what Python did; arg's default value is in fact the empty dictionary, the particular object created when Python evaluated the expression "{}". No, it doesn't reevaluate it each time you call f, and this is consistent with the way Python normally behaves: reads and evaluates your code right when it bumps into it. REPL. Read-eval-print loop. Totally obvious, unless you come from a static language. Well, I got used to this The Empty Dictionary business, and even used it as a "feature" in throwaway contexts (you can fake C-like static variables by adding an undocumented argument with a default value to your function; I was then shocked to see Real Pythonistas use this Industrial Strength Technique, too).
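The bite in runnable form, plus the usual None-sentinel workaround (the hit-counting logic is mine, just to make the sharing visible):

```python
def f(arg={}):                       # {} is evaluated ONCE, at def time
    arg["hits"] = arg.get("hits", 0) + 1
    return arg

print(f())   # {'hits': 1}
print(f())   # {'hits': 2} -- the very same dictionary object, mutated

def g(arg=None):                     # the common workaround
    if arg is None:
        arg = {}                     # re-evaluated on every call
    arg["hits"] = arg.get("hits", 0) + 1
    return arg

print(g())   # {'hits': 1}
print(g())   # {'hits': 1} -- a fresh dictionary each time
```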

Right near the scar from the {}-shaped bite, there's the []-shaped teeth mark. This one's very famous: a=[[]]*5. Looks like a list of 5 empty lists, but it isn't. [] is evaluated once to become an empty list, and then you get a list with 5 pointers to that empty list. Now, when you insert an element into a[0], you'll find out that it was also added to a[1]...a[4]; that will be surprising. REPL! Bite! Ouch! I really wanted [[] for x in range(5)]; in list comprehensions, the expression is reevaluated many times, because otherwise you wouldn't be able to create different elements, which is the whole point of list comprehensions. And in fact I wanted different elements, and not the same empty list pointed to from 5 places, so list comprehensions are the way to go here. My fault. I've been lazy. See, I can admit it when it's my fault. Sometimes. Not if your fucking thing keeps biting me over and over and over again though. Then it's your fault. Stop arguing, you clueless weenie. I'm The User! "User". Ever heard of users? These are the guys who use your stuff. It's like "customers". Probably heard about those ones, didn't ya? Shut up and give me some respect!
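The same []-shaped bite, in code you can poke at:

```python
a = [[]] * 5                  # [] is evaluated once: 5 pointers to ONE list
a[0].append("x")
print(a)                      # [['x'], ['x'], ['x'], ['x'], ['x']]

b = [[] for _ in range(5)]    # [] is re-evaluated for every element
b[0].append("x")
print(b)                      # [['x'], [], [], [], []]
```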

OK, let's look at some more ass. Let's see, where else did the deadly snake bite me? Imports. I so don't understand what import actually does that I'm seriously considering looking it up. I don't remember where the actual bite marks are, but it got tricky a couple of times. Name errors 5 minutes after a program started running. I misspell locals and the frigging thing compiles a reference to a global, hoping that it will be defined somewhere, later. Pure stupidity, if you ask me. REPL or no REPL, but anybody who tells me that this behavior is The Right Thing could just as well have said, "Don't bother talking to me about Python – I'm a Python weenie! Weeeeenie! Isn't that cute?!" "But why didn't you test that function before running it in your program that takes whole 5 minutes to run?" Because writing that test would itself take 5 minutes if I were lucky, Mr. Weenie. And in fact, reasonable Pythonistas acknowledge that yes, it bites them, too, and it would be great if it didn't happen all the time. Oh, and lexical scoping is a great gadget, you know. When I have a variable x, I sure as hell don't want it to get overwritten by the x in [x**2 for x in range(5)]. Is this obvious, or are you Miss Weenie, the sister of our friend the test-driven Python apologist?
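A minimal sketch of the misspelling trap (function names are mine; and for the record, the comprehension-variable leak above is Python 2 behavior, which this 2008 post predates being fixed: Python 3 gives list comprehensions their own scope):

```python
def count():
    total = 0
    for i in range(5):
        totl = total + i      # typo silently binds a NEW local; no error ever
    return total

print(count())                # 0, and nothing anywhere complained

def count2():
    total = 0
    for i in range(5):
        total += i
    return totl_misspelled    # never assigned, so compiled as a GLOBAL read...

try:
    count2()                  # ...which only blows up at run time
except NameError as e:
    print("NameError at run time:", e)
```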

More ass. We want ass! We want ass! Sorry, we're out of ass. I don't remember any other class of ass bite by Python. And I have a memory for that thing; soft skin on my buttocks or something. And I didn't even learn Python; I'm still running the experiment and insist on Not Reading The Fucking Manual, although it's well past time when I should. Compare my results with Python to C++, which I did learn (read two and a half books, took a mandatory course in the university, read stuff on the net and so on). Net result? Bite! Bite! Bite! Bite!! Bite!!! Yeah, yeah, I know that you "don't have any problems with C++". Believe me, I know what you're talking about; I didn't have any problems with C++, either, and you can ask just about everyone who's ever worked with me whether I had problems with C++, and I can guarantee you that each and every person will laugh at that question. I know why you don't have problems. You can no longer see the problem when you have one. Allocate a vector of vectors of bytes. Start pushing elements into it. Core dump. Oops. I kept a reference to the bytes in a previously pushed byte vector and used that reference after pushing another byte vector. Um, I didn't think I'd need that. OK, make it a list of vectors of bytes. Wait, do I need the random access I've just given up? No, I guess I don't. Maybe I'll need it later. Well, then I'll change the element type to some smart pointer other than vector... No problem, I'm in Full and Total Control. Yes, it bites me, but I keep moving. Sloooowly. But Suuuurely. Problems? What problems? Sure, C++ is harder to use than those other languages, but that's because they aren't Real Languages! Makes perfect sense, doesn't it?

I'm telling you, C++ is such an easy target that once you start bashing it, it's hard to drop the habit. But I have to. I have to go back to our subject. Why is Python so easy to not learn, and what should I copy from it if I want my software to be usable?

I don't really have an answer, not an orderly one I can serialize into text anyhow. I'm working on it; I find it fascinating, in particular, because I work on programming languages. They are DSLs, essentially, but relatively fat ones, Turing-complete, with arrays, functions and all. I defined and wrote the interpreted implementation of one small one, and "managed" (talked to and tried to not get in the way of) people who made one big one. I have practical interest in programming language usability, but I'm still thinking about it. I can only say vague stuff right now. For example, it looks like you can take many of the things in the UI-for-programmers book and translate them to the vocabulary of programming languages or libraries and they'll still work. Good book.

One thing I can surely tell is this. Making the program act the way someone expected it is Hard, because a Someone is harder to model than, say, a Something. But making the program complain when it's unhappy is way easier. It's like bathing: if you do it regularly, you'll stay clean without much effort. If you stop bathing for a while, you'll probably run into problems that a single bathing session won't quite cure. But most people don't have first-hand experience on that one, and I doubt that they miss anything worthwhile. Error handling is similar: every time you write a line, you can think about stuff that could go wrong, and insert detailed whining into the program. Not some stupid fucking numeric error code, detailed whining. A stupid fucking numeric code will lead to "rm: Is a directory". I despise software that tells me that something, somewhere, is a directory, and finds that insightful.

With Perl, every time you make an error, it tries to do something sensible (convert the string to a number or vice versa, etc.). C++ dumps core or corrupts data. Python throws an exception with a call stack. I wish it wouldn't destroy the original context with all the variable values and stuff, but it's pretty good already. I've tried to take a fairly anal-retentive approach to error handling in my "programming environment" sort of work, with quite elaborate code to make user-defined tests easy and all that, and I think it pays off. It helps people lose fear, and "fearless" is "happy". For example, with Python, I know that if I foul up, most likely it will throw an exception rather than go ahead and commit a horrible atrocity. Helps you loosen up and try things out, and then sleep quietly once you're done. But failing early and informatively is only the next best thing after not failing, and to not fail, you need to make your system behave as expected. Which is somewhat harder than regular bathing. Rest assured that when I know enough about it to convert it to text, you'll find that text here, and one thing it won't be is particularly short.

P.S. I've mentioned the camp of Sadists or plain selfish people who inflict suffering on their users. There is, of course, a corresponding camp of Masochists who like to suffer. You know, people who don't have problems with C++ and the like. The psychology of technical masochism is beyond the scope of this entry; it's sufficient to note that if someone claims to be using tools which are good for pros but bad for newbies or even generally "harder to use" than lesser tools, it's quite likely a symptom of this dangerous disorder. Technical masochists looking for a cure are encouraged to read Chapter 6 of the UI book, "Designing for People Who Have Better Things To Do With Their Lives". Then you could think about things to do with your lives that you find better than licking the scars your favorite tools leave at your rear end. And at least stop calling people who have better things to do with their lives "lazy", without mentioning that this is really a good kind of lazy. Thank you.

1. kragenApr 24, 2008

You write:
Python throws an exception with a call stack. I wish it wouldn’t destroy the original context with all the variable values and stuff, but it’s pretty good already.

The exception actually contains all the variable values and stuff. Take a look:

import cgitb
cgitb.enable()  # installs an excepthook that dumps each frame's local variables

Note that this means that by default your errors will go to stdout. There's an incantation to fix that but I don't remember it.

2. Yossi KreininApr 24, 2008

You can't resume execution from the point where the exception was thrown though, the way you can do in CL.

3. kragenApr 26, 2008

True enough.

4. Marcel PopescuNov 25, 2008

"You see, in C, you can’t use a function before declaring it, because it compiles everything in one pass."

You probably mean Pascal. C is at least a two-pass compiler... three with preprocessing directives.

5. Yossi KreininNov 25, 2008

Well, the fact is that in C, you really can't use a function before declaring it, unless you rely on auto-prototyping (just passing the arguments according to the calling convention and not caring about type conversions which the prototype could have forced you to do).

Also, why you would need two passes for a C translation unit without preprocessing directives escapes me. Unless you count linkage as a pass. Why can't you compile a C function to assembly with label references right there when you see it, without looking ahead?

I don't care whether preprocessing or linkage count as "passes" and whether their number is thus 1, 2 or 3, just about what actually happens.

6. AllanLane5Sep 29, 2011

The reason 'C' requires multiple passes is that you CAN call something before you define it, as long as you declared it first (typically in some .h file). The Declaration keeps the compiler happy that eventually it's going to run into some Defining instance in the source code somewhere.

If the Declaration and Definition don't agree, the compiler can complain about it. If the definition never actually occurs, the Linker will complain about it.

All of this was so that 'C' could do separate compilation and linking — something Pascal did without for the longest time. This gives you a lot of freedom in the order of Definition for your routines in 'C', where in Pascal you HAD to define something before you used it. Unless you declared something as "FORWARD", which was a work-around, but still enabled one-pass compilation.

Post a comment