I can't believe I'm praising Tcl

April 21st, 2008

So the other day I both defined Tcl procedures all by myself and used them interactively, and I liked it, to the point where it felt like some kind of local optimum. This entry is an attempt to cope with the trauma of liking Tcl, by means of rationalizing it. I'll first tell the story of my fascinating adventures, and then I'll do the rationalizing (to skip the oh-so-personal first part, go straight for the bullet points).

Here's what I was doing. I have this board. The board has a chip on it. The chip has several processors in it. There's a processor in there that is a memory-mapped device, and I talk to it using CPU load/store commands. The CPU is itself a JTAG target, and I talk to it via a probe, issuing those load/store commands. To the probe I talk via USB using an ugly console that can't even handle the clipboard properly. The console speaks Tcl.

Net result: in order to talk to the memory-mapped processor, I have to speak Tcl. Or I can use a memory view window in a graphical debugger. Except some addresses will change the processor state when read (pop an item from a LIFO, that sort of thing). So no, memory view windows aren't a good idea – you have to aim for the specific address, not shoot at whole address ranges. Damn, just who thought of defining the hardware in such a stupid way? Why, it was me. Little did I know that I was thereby inflicting Tcl on myself.

Anyway, there's this bug I have to debug now, and as far as I know it could be in at least three different pieces of software or in two different pieces of hardware, and I don't like any of the 5 options very much, and I want to find out what it is already. So at the ugly probe console I type things like word 0x1f8cc000 (reads the processor status), word 0x1f8cc008 2 (halts the execution), word 0x1f8cc020 0x875 (places a breakpoint). I get sick and tired of this in about 2 minutes, when the command history is large enough to make the probability of hitting Enter at the wrong auto-completed command annoying. It's annoying to run the wrong command because if I ruin the processor state, it will take me minutes to reproduce that state, because the program reads input via JTAG, which is as slow as it gets.

So I figure this isn't the last bug I'm gonna deal with this way, and it's therefore time for Extending my Environment, equipping myself with the Right Tools for the Job, Tailored to my needs, utilizing the Scripting capabilities of the system. I hate that, I really do. Is there something more distressing than the development of development tools for the mass market of a single developer? Can a programmer have a weakness more pathetic than the tendency to solve easy generic meta-problems when the real, specific problems are too hard? Is there software more disgusting in nature than plug-ins and extensions for a butt-ugly base system? But you know what, I really fail to remember 0x1f8cwhat the breakpoint address is. This story has one hexadecimal value too many for my brain. OK, then. Tcl.

I decided to have one entry point procedure, pmem, that would get a memory-mapped processor id, 'cause there are many of them, and then call one of several functions with the right base address, so that pmem 0 pc would do the same as pmem_pc 0x1f8cc000. Well, in Tcl that's as simple as it gets. Tcl likes to generate and evaluate command strings. More generally, Tcl likes strings. In fact, it likes them more than anything else. In normal programming languages, things are variables and expressions by default. In Tcl, they are strings. abc isn't a variable reference – it's a string. $abc is the variable reference. a+b/c isn't an expression – it's a string. [expr $a+$b/$c] is the expression. Could you believe that? [expr $a+$b/$c]. Isn't that ridiculous?
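For the curious, here is a minimal sketch of what such a dispatcher could look like. It is not my actual script: the word stub, the register offset and the per-processor stride are all made up so that the thing runs in a plain tclsh (8.5 or later, because of {*}). Only the 0x1f8cc000 base comes from the console commands above.

```tcl
# Stand-in for the probe console's `word` command (an assumption, so that the
# sketch runs without hardware): one argument reads, two arguments write.
array set fake_mem {}
proc word {addr args} {
    global fake_mem
    set key [format 0x%08x $addr]          ;# normalize the address spelling
    if {[llength $args] == 0} {
        if {[info exists fake_mem($key)]} { return $fake_mem($key) }
        return 0
    }
    set fake_mem($key) [lindex $args 0]
    return
}

# Base address of processor $id. The 0x1000 stride is hypothetical.
proc pmem_base {id} {
    return [format 0x%08x [expr {0x1f8cc000 + $id * 0x1000}]]
}

# Per-register helpers take a base address. The 0x10 offset is hypothetical.
proc pmem_pc {base} {
    return [format 0x%x [word [expr {$base + 0x10}]]]
}

# The single entry point: glue the command name together from strings,
# then call it with the base address and whatever arguments are left.
proc pmem {id cmd args} {
    return [pmem_$cmd [pmem_base $id] {*}$args]
}
```

The whole dispatch is the one line inside pmem: pmem_$cmd is just a string that happens to name a procedure, which is about as direct as "generate and evaluate a command" can get.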

In fact, that was one of my main applications for Tcl: ridiculing it. I remember reading the huge Tcl/Tk book by Brent Welch with my friend once. There was a power outage and it was past the time when the last UPS squeaked its last squeak. And the book was there 'cause the hardware guys use it for scripting their lovecraftian toolchain. We really did have fun. Tears went down my cheeks from laughter. Even people with the usual frightened/mean comments about those geeks who laugh their brains out over a Tcl book didn't spoil it. So, ridiculing Tcl, my #1 use for it. The other use is the occasional scripting of the hardware hackers' lovecraftian toolchain. Overall, I don't use Tcl very much.

The nice thing about Tcl is that it's still a dynamic language, and reasonably laconic at that, modulo quoting and escaping. So I enter the usual addictive edit/test cycle using tclsh < script. N minutes down the road (I really don't know what N was), I've finished my 2 screenfuls of Tcl and the fun starts. I actually start debugging the goddamn thing.

$ pmem 0 stat
IDLE
$ pmem 0 bkpt 0 0xbff
$ pmem 0 bkpt 1 0xa57
$ pmem 0 cmd run
$ pmem 0 stat
DEBUG
$ pmem 0 pc
0xbff
$ pmem 0 rstack
3 return addresses
addr 0: 0x0005
addr 1: 0x05a8
addr 2: 0x0766
$ pmem 0 cmd stp
$ pmem 0 pc
0xc00

Weeee! HAPPY, HAPPY, JOY, JOY!

You have no idea just how happy this made me. Yeah, I know, I'm overreacting. I'll tell you what: debug various kinds of hardware malfunction for several months, and you'll be able to identify with the warped notion of value one gains through such a process. On second thought, I don't know if I'd really recommend it. Remember how I said low-level programming was easy? It is, fundamentally, but there's this other angle from which it's quite a filthy endeavor. I promise to blog about it. I owe it to the people who keep telling me "so low-level is easy?" each time they listen to me swear heartily at a degenerate hardware setup where nothing works no matter what you try. I owe it to myself – me wants to reach closure here. Why should I tolerate being regularly misquoted at the moments of my deepest professional catharsis?

Aaaanyway, in just N minutes, I bootstrapped myself something not unlike a retarded version of gdb, the way it would work if the symbol table of my program was stripped. But no matter – I have addr2line for that. And the nice thing about my retarded debugger front-end is that it looks like shell commands: blah blah blah. As opposed to blah("blah","blah"). And this, folks, is what I think Tcl, being a tool command language, gets right.

I come from the world of pop infix languages (C/Java/Python/Ruby/you name it). Tcl basically freaks me out with its two fundamental choices:

So basically, pop infix languages (and I use the term in the most non-judgmental, factual way), pop infix languages are optimized for programming (duh, they are programming languages). Programming is definitions. Define a variable and it will be easy to use it, and computing hairy expressions from variables is also easy. Tcl is optimized for usage. Most of the time, users give simple commands. Command names and literal parameters are easy. If you are a sophisticated user, and you want to do pmem 0 bkpt [expr [pmem 0 pc] + 1], go ahead and do it. A bit ugly, but on the other hand, simple commands are really, really simple.
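A tiny sketch of the "strings by default" rule, with made-up variable names and values. The only liberty taken is the braces around the expr argument, which is the usual Tcl advice and doesn't change the result here.

```tcl
set abc 5
set a 1; set b 6; set c 3

set literal abc                     ;# the three-character string "abc"
set value   $abc                    ;# the contents of the variable: 5
set text    $a+$b/$c                ;# still a string: "1+6/3"
set result  [expr {$a + $b / $c}]   ;# finally an expression: 3
```

Nothing is computed until you explicitly ask for it with expr, which is exactly the part that is cheap for a user typing commands and annoying for a programmer writing formulas.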

And eventually, simple commands become all that matters for the user, because the sophisticated user grows personal shortcuts, which abstract away variables and expressions, so you end up with pmem 0 bkpt nextpc or something. Apparently, flat function calls with literal arguments are what interactive program usage is all about.
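Here's roughly how such a shortcut could be grown, as a sketch under assumptions: the processor is simulated by an array, nextpc is taken to mean "pc + 1", and the breakpoint takes a slot number as in the session above. None of this is the real script.

```tcl
# Simulated processor state (an assumption, standing in for real hardware).
array set state {pc 0xbff}

proc pmem_pc {id} {
    global state
    return $state(pc)
}

# Turn a symbolic shortcut into a literal address; literals pass through.
# `nextpc` is the only shortcut here, and its meaning (pc + 1) is a guess.
proc resolve_addr {id addr} {
    switch -- $addr {
        nextpc  { return [format 0x%x [expr {[pmem_pc $id] + 1}]] }
        default { return $addr }
    }
}

# Record breakpoint number $n instead of writing it to a hardware register.
proc pmem_bkpt {id n addr} {
    global state
    set state(bkpt$n) [resolve_addr $id $addr]
}

# Entry point: dispatch on the command name (needs Tcl 8.5+ for {*}).
proc pmem {id cmd args} {
    return [pmem_$cmd $id {*}$args]
}
```

Now pmem 0 bkpt 0 nextpc does what pmem 0 bkpt 0 [expr [pmem 0 pc] + 1] did, and the expression lives in the script, where typing it once is no big deal.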

I'm not saying that I'm going to use Tcl as the extension language of my next self-made lovecraftian toolchain (I was thinking more along the lines of doing that one in D and using D as my scripting language, 'cause it compiles fast enough and it's apparently high-level enough). I haven't thought enough about this, but the grotesque escaping/quoting in Tcl still freaks me out; I don't want to program like that. All I'm saying is that I like the interactive part. Specifically:

Allow me to elaborate.

Short code vs short commands

Lots of people have noticed that keeping your code short is extremely important. More surprisingly, many people fail to notice this, probably because "1 line is better than 5" doesn't sound that convincing. OK, think about 100K lines vs 500K and you'll get the idea. Oh, there are also those dirty Perl/shell one-liners that make one doubt this. I've known a Bastard Programmer who used 2K bash one-liners as his weapon of choice. OK then, so the actual rule must be "short code is good unless it's written by a bastard". But it's the same core idea.

So we have the Architect type, who loves lots of classes which delegate work to each other, and we have the Enlightened type, who wants to write and read less. And the Enlightened type can rant and rave all day how Python, or Ruby, or Lisp make it oh-so-easy to define data structure literals, or to factor out stuff using meta-programming, or some other thing an Architect just never gets. And I'm all with it.

And then we have interactive shells. And in Python it's doit("xx","yy"). And in Lisp it's (doit "xx" "yy"), or (doit :xx :yy), or (doit xx yy) if you make it a macro. And in Ruby it's doit :xx :yy, if you use symbols and omit parens. And that's about as good as you can get without using your own parser as in doit "xx yy", which can suck in the (more rare) case when you do need to evaluate expressions before passing parameters, and doesn't completely remove overhead. Also note how all these languages use (), which makes you press Shift, instead of [] which doesn't. Ruby and Perl let you omit (), but it costs in readability. And [] is unanimously reserved for less important stuff than function calls.

The whole point of short code is saving human bandwidth, which is the single thing in a computing environment that doesn't obey Moore's law and doesn't double once in 18 months. Now, which kind of bandwidth is the most valuable? I'll tell you which. It's the interactive command bandwidth. That's because (1) you interact a lot with your tools and (2) this interaction isn't what you're trying to do, it's how you're trying to do it, so when it isn't extremely easy it's distracting and extremely frustrating.

This is why an editor that doesn't have short keyboard shortcuts for frequently used commands is a stupid fucking piece of junk and should go down the toilet right now. This is why a Matlab vector – [1 2 3] – is much better than a Python list – [1,2,3] (ever noticed how the space bar is much easier to hit than a comma, you enlightened dynamic language devotee? Size does matter). And don't get me started about further wrapping the vector literal for Numeric Python.

The small overhead is tolerable, though sucky, when you program, because you write the piece of code once and while you're doing it, you're concentrating on the task and its specifics, like the language syntax. When you're interacting with a command shell though, it's a big deal. You're not writing a program – you're looking at files, or solving equations, or single-stepping a processor. I have a bug, I'm frigging anxious, I gotta GO GO GO as fast as I can to find out what it is already, and you think now is the time to type parens, commas and quotation marks?! Fuck you! By which I mean to say, short code is important, short commands are a must.

Which is why I never got to like IPython or IDLE. Perhaps Ruby could be better, because of omitting parens and all. Ruby seems to be less afflicted with the language lawyer pseudo-right-thing mindset. But the basic plain vanilla function-call-with-literal-args syntax still doesn't reach the purity of *sh or Tcl. Well, the shell is an insanely defective programming language, so it's not even an option for anything non-interactive. But Tcl gets way closer to a programming language. Which brings us to the next issue:

Ad-hoc scripting languages – the sub-Turing tar pit

Many debuggers have scripting languages. gdb has one, and Green Hills MULTI has one. Ad hoc command languages usually get the command-syntax-should-be-easy part right – it's command arg arg arg... They then get everything else wrong. That is, you usually don't have any or some of: data structures, loops, conditionals and user-defined functions, option for expression evaluation in all contexts, interface to the host OS, and all the stuff which basically would make the thing a programming language. Or you get all those things in a peculiar, defective form which you haven't seen anywhere else.

I wish people stopped doing that. I understand why many people do that very well – they don't know any language which isn't a 3rd generation one (presumably C++ or Java). They don't know how scripting works except on a theoretical level. They know how to build a big software system, with objects and relationships between objects and factories of objects and stuff. At the system/outside world boundary they're helpless though. Outside of the system our objects are gone. There's this cold, windy, cruel world with users and files and stuff. Gotta have an AbstractInputParser to guard the gates into our nice, warm, little system, um, actually it's "big", no, make it "huge" system.

These are the Architects who get mocked by the Enlightened dynamic language lovers. They normally dismiss scripting languages as "not serious", therefore, when faced with the need to create a command language for their system, they start out with a plan to create a non-serious (a.k.a crippled) language. Even if they wanted to make it a good one, they never thought about the considerations that go into making a good scripting language, nor do they realize how easy/beneficial it is to embed an existing one.

So basically we have 3GL people, who realize that commands should be short ("it's a simple thing we're doing here"), but they don't see that you need a real Turing-complete programming language for the complicated cases. And we have 4GL people, who optimize for the complicated case of programming ("what's a scripting language – it's a programming language, dammit!"), and they don't care about an extra paren or quotation mark.

And then we have Tcl, which makes easy things really easy and scales to handle complicated cases (well, almost, or so I think). And not only does it make plain funcalls easy – it reserves [] for nested funcalls, in the Lispy prefix form of outercall arg [innercall arg arg] arg... [] is better than (). Pressing Shift sucks. And custom keyboard mapping which makes it possible to type parens without pressing Shift is complete idiocy, because you won't be able to work with anyone's machine. This shit matters, if you program all day long it does.
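To make "scales to handle complicated cases" a bit more concrete, here's a sketch of a user-defined command built from a loop, a conditional and nested [] calls. The pc and step commands are hypothetical stand-ins that fake a program counter; on real hardware they would talk to the processor.

```tcl
# Simulated processor (an assumption): `step` just bumps a fake pc.
set fake_pc 0xbfd
proc pc {} {
    global fake_pc
    return $fake_pc
}
proc step {} {
    global fake_pc
    set fake_pc [format 0x%x [expr {$fake_pc + 1}]]
}

# Single-step until the pc reaches $target; give up after $limit steps.
# Returns the number of steps taken.
proc step_until {target {limit 1000}} {
    set steps 0
    while {[pc] != $target} {
        if {$steps >= $limit} {
            error "gave up after $limit steps"
        }
        step
        incr steps
    }
    return $steps
}
```

At the console this is still just step_until 0xc00: a flat command with a literal argument, with the loop hidden behind it.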

Now what?

I don't know if I'd use Tcl. It's less of a programming language than your typical pop infix 4GL. For starters, [expr] is a bitch. And then there are "advanced" features, like closures, that I think Tcl lacks. It has its interesting side from a "linguistic" perspective though. It has very little core syntax, making it closer to Lisp and Forth than the above-mentioned pop infix ilk. So you can use Tcl and lay claim to aristocracy. Of course you'll only manage to annoy the best programmers this way; the mediocre won't know what you're talking about, seeing only that Tcl doesn't look enough like C to be worth the name of a language.

I'd think a lot before embedding Tcl as a scripting language for my tools, because of linguistic issues and marketing issues (you ought to give them something close enough to C, whether they're a customer or a roommate). So the practical takeaways for me are modest:

  1. I ain't gonna mock Tcl-scriptable tools no more. I understand what made the authors choose Tcl, and no, it's not just a bad habit. On some level, they chose probably the best thing available today in terms of ease-of-programming/ease-of-use trade-off (assuming they bundle an interactive command shell). If I need to automate something related to those tools, I'll delve into it more happily than previously. So much for emotional self-tuning.
  2. I'll let it sink in, and try to figure out whether you can have a better trade-off. For example, if Ruby had macros (functions which don't evaluate their inputs), you could say doit x y without making x and y symbol objects, which forces you to prefix them with a colon. How macros of that sort should work in an infix language escapes me (not that I think that much about it, but still). Anyway, I'll definitely add Tcl to the list of things I should understand better in order to fight my linguistic ignorance. Being an amateur compiler writer, that seems like one of my duties.