Why don't we have a word for it?

January 19th, 2008

I'm going to talk about low-level vs. high-level. Again. It's complicated. Lots of angles. I need to warm my brain up. For a warm-up, let's start with an important theorem I've recently discovered.

yosefk's Semantical Decay Theorem: all useful terms which are not completely neutral become meaningless. Think about the definition of anything political, like "nationalism" or "democracy". For most people, it's little more than a synonym for "good" or "bad". Just an emotion; the meaning is long gone. Clarification: by "most people", I mean literally "the majority of the population". You know, the guys who decide who runs your government. I'm afraid "the majority of the people you hang out with" is a somewhat skewed sample.

And it's not just politics. Let's take a minute to browse through The Modern Software Industry Dictionary.

Object-Oriented
adj.

Good. Examples: "But is it an Object-Oriented language?", "Ewww, what an ugly, non-Object-Oriented interface!" Translation: "I don't know why I want something to be OO, but I want it, and I want it bad, 'cause everybody does. All the pros. I'm a clueless programmer thinking of myself as a pro. Maybe I'll get a clue some day, but don't put your money on it".
Bad. Example: "Just wonderful! What a huge pile of crud. 'Object-oriented' and all. These people have to make everything complicated." Translation: "Unlike my 'we-are-pros-we-like-OO' colleagues, I can't even understand what happens when you override a method. Where's my spaghetti dish?"

Real Time Embedded Software
n.

Goodness. Examples: "In embedded applications, we don't need all those helpers and layers and interfaces." "In Real Time Software, you never have such features." Translation: "Yes, I like to hard-code M copies of N hexadecimal values into a K-screen do-everything function, for very large values of M, N and K. And there's nothing you can do about it, 'cause without me, you can't even boot the board. I'll do as I damn please."
Badness. Example: "Oh, right, that would be too sloooow for a so-called embedded device. Damn, why can't I use a normal CPU?" Translation: "My code is slow, because I'm too lazy to find out why and speed it up. My code also happens to be quite useless, because I'm too lazy to figure out how to make it useful. But you won't see how useless it is, because your hardware is only as fast as 2 or 3 Cray-1s, so it can't run my code."

Design
n.

Goodness. Example: "We'll rewrite this code, and this time, it will be Designed properly." Translation: "I'm a Software Architect. I know Design Patterns. Just look at this code. No Structure to it. It's time for someone like me to take over this project. Look, a loop! Calls for the Iterator Pattern. Ummm... Wanky wanky! Could you please close the door? It kinda feels awkward in public. Please don't interrupt my work."
Badness. Example: "Yeah, they want to 'design' it. Talking and writing and writing and talking. Instead of getting work done." Translation: "I write crap. We ship crap here, in case you didn't notice. Crap! Ever saw crap? Now get out of my way or I'll crap all over you!"

I don't know what's attached to "OO" or "RT/Embedded" or "Design" in your brain. Maybe it's something sensible. Maybe you know that there are different object systems and why they are useful or useless in your context and how to get by without an object system at all. Maybe you think that "embedded" means that a digital processor is designed to be a part of a specific system and run a specific application as opposed to being a general-purpose delivery platform. Or maybe you think that "embedded" means "low-end" or "small" the way "embedded" cross-compilers targeted at Windows CE seem to interpret the term. Maybe these words mean something to you.

I'm not going to discuss whether you attach the right or the wrong meaning to the words. I'm saying that attaching any meaning to them at all makes you a part of a minority. Whenever I hear "OO" or "RT/Embedded" or "Design", I know there can be only 3 options, listed in the decreasing order of probability:

1. The speaker is clueless or dumb or evil or all three combined.
2. The speaker managed to not notice that most frequently, the words are used by the people from option 1.
3. The speaker noticed it, but decided to heroically ignore it, because the word has a meaning, damn it, and I'll still use it, risking being classified as an option 1 guy, because I'm a Linguistic Hero.

All 3 options are suboptimal. Which is why I try to avoid these words altogether. It isn't very hard, because the concepts are fairly "thin" – you can simply fall back to first principles below them. Instead of "that's not Object-Oriented", you can say "What if we need another type of query? Why not have a class Query with a handle() method instead of the switch?" Instead of "that's not suitable for Real Time Software", you can say "a dictionary lookup would be slow; why not keep a pointer to the value inside the key object?". Instead of "that code has poor Design", you can say "I'm going to rewrite that code, because it's one big mess. What's that? 'It works?' Well, if you're under that impression, I'll be happy to transfer my responsibility for its maintenance to you. Oh, suddenly it doesn't look like 'working code' anymore, does it?"

And so on. I don't find myself needing a special word most of the time. Except for one case: "high-level" and "low-level". Maybe it's because I don't quite understand the underlying first principles, but for whatever reason, I really need a shortcut for that. Sadly but expectedly, "high-level" has made it to The Modern Software Industry Dictionary long ago. I planned to look it up and share the definition with you, but it turned out to be really long, with several different ways of equating "high-level" with "good" or "bad". So I decided to spare you and go straight to the definition found in yosefk's Small Dictionary of Special Words He Still Can't Avoid:

High-level
special adj.

An attribute of formal interfaces, such as programming languages, operating systems, computer programs and libraries, and mathematical notation. An interface A has a higher level of abstraction than an interface B with respect to a concept C if the following holds: C can be expressed in terms of both A and B, but A comes with a special word for C.

Why do I need a special word for the property of having special words so badly? I'm not sure yet. Lots of angles. Complicated. Perhaps examples will help.

Example 1: C and C++ don't have reflection. This means that you can't take a pointer to an object and serialize it, because you don't know the size and the layout of the object pointed to by the pointer.
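To make Example 1 concrete, here is a minimal C sketch of what serialization looks like without reflection. It assumes a made-up struct point and a hand-written field table; field_desc, point_fields and serialize_point are hypothetical names, not any library's API. The table repeats information the compiler already has but won't share.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>    /* snprintf */
#include <string.h>   /* memcpy */

/* A hypothetical type we want to serialize. */
struct point {
    int x;
    int y;
};

/* C can't enumerate the members of a struct at run time, so the layout is
   described by hand. This descriptor format is made up for illustration. */
struct field_desc {
    const char *name;
    size_t offset;
};

static const struct field_desc point_fields[] = {
    { "x", offsetof(struct point, x) },
    { "y", offsetof(struct point, y) },
};

/* Writes "name=value;" for every field listed in the table (all assumed to be
   int). Returns the length written, or -1 if buf is too small. */
static int serialize_point(const struct point *p, char *buf, size_t cap)
{
    assert(p != NULL && buf != NULL && cap > 0);
    size_t nfields = sizeof point_fields / sizeof point_fields[0];
    size_t written = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < nfields; ++i) {
        int v;
        memcpy(&v, (const char *)p + point_fields[i].offset, sizeof v);
        int r = snprintf(buf + written, cap - written, "%s=%d;",
                         point_fields[i].name, v);
        if (r < 0 || (size_t)r >= cap - written)
            return -1; /* output would have been truncated */
        written += (size_t)r;
    }
    return (int)written;
}
```

If someone adds a member z to struct point and forgets to update point_fields, this still compiles and silently drops z; a Java serializer walking the fields via reflection would pick it up.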

Example 2: C and C++ don't have arrays. When you see a pointer, you don't know how many adjacent objects are pointed to by it. This means that if you parse your compiler-specific debug information format, creating non-portable reflection, you still can't serialize arbitrary objects. Because if an object has a member pointer, it can really be a member array of unknown size. Oh, and it can point to a dead object. It can also point to a Segmentation fault or an Access violation, depending on your target OS. All these are non-problems in Java, which has reflection, arrays and garbage collection.
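A small C sketch of Example 2, assuming hypothetical types struct bag and struct counted_bag. The pointer member has the same type whether it points to one int or to three, and an array passed to a function arrives as a plain pointer. The length has to travel separately, by a convention no tool can check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: does items point to one int, an array, or freed memory?
   The type is int * in all three cases. */
struct bag {
    int *items;
};

/* An array argument decays to a plain pointer, so sizeof gives the pointer size. */
static size_t pointer_size(const int *p)
{
    (void)p;
    return sizeof p;
}

/* The usual workaround: carry the length by convention, outside the type system. */
struct counted_bag {
    int *items;
    size_t count;
};

static long sum_bag(const struct counted_bag *b)
{
    assert(b != NULL && (b->items != NULL || b->count == 0));
    long s = 0;
    for (size_t i = 0; i < b->count; ++i)
        s += b->items[i];
    return s;
}
```

Nothing stops a caller from setting count to 4 on a 3-element array; the "array" exists only in the programmer's head, which is exactly what a serializer or an optimizer can't see.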

Example 3: MS Windows has buttons and menus and scroll bars and stuff. The X Window System doesn't have any of that. Instead, it comes equipped with design principles, such as "If a problem is not completely understood, it is probably best to provide no solution at all" and "If you can get 90 percent of the desired effect for 10 percent of the work, use the simpler solution." This means that on Windows, you can change the way widgets look throughout the system, and with X11, you can't.

Example 4: MS Windows defines a format for keeping the text & layout of dialogs, menus and the like. This crud is called "resources" and comes with a "resource compiler", translating an ad-hoc human-editable format into the binary machine-friendly format, and a set of clunky APIs for manipulating the binary format. This means that you can create an editor for translating the text on the buttons into a different language (and replace icons having text on them and flip dialog layouts for right-to-left locales). You can't do this to a program based on the knowledge that it uses X11 for the GUI. Of course you can only do this to a Win32 program if it was designed by drones who stuck to the ad hoc format by Micro$oft. If the program was written by Real People who threw together their own ad hoc format, you're helpless.

Example 5: C doesn't have garbage collection, but at least it comes with malloc and free (unlike COBOL, for example). You can and will forget to call free on a malloced chunk, but it is possible to build a platform-specific tool for tracing calls to malloc and free, and listing the call stacks at which leaked blocks were allocated. Of course nothing prevents people from reserving a large chunk of bytes and rolling their own memory management in this chunk. Using a plethora of custom "memory pools" is a C++ weenie's favorite way to defeat the tools for memory leak detection.
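A toy illustration of Example 5, assuming a program routes its allocations through hypothetical traced_malloc/traced_free wrappers. Real leak detectors intercept malloc and free themselves and record call stacks; this sketch only counts live blocks. The hypothetical struct pool then shows how hand-rolled memory management hides individual objects from such a tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy leak tracker: number of blocks allocated through the wrappers and not yet freed. */
static long live_blocks = 0;

static void *traced_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        ++live_blocks;
    return p;
}

static void traced_free(void *p)
{
    if (p != NULL)
        --live_blocks;
    free(p);
}

/* Hypothetical custom pool: one big block carved up by hand. The tracker sees
   only the single pool allocation, never the objects inside it. */
struct pool {
    char *base;
    size_t used;
    size_t cap;
};

static int pool_init(struct pool *pl, size_t cap)
{
    pl->base = traced_malloc(cap);
    pl->used = 0;
    pl->cap = cap;
    return pl->base != NULL;
}

static void *pool_alloc(struct pool *pl, size_t n)
{
    assert(pl->base != NULL);
    /* Round the size up so every returned pointer keeps malloc's alignment. */
    size_t a = _Alignof(max_align_t);
    n = (n + a - 1) / a * a;
    if (pl->cap - pl->used < n)
        return NULL; /* pool exhausted */
    void *p = pl->base + pl->used;
    pl->used += n;
    return p;
}
```

After pool_alloc hands out several objects, the tracker still reports exactly one live block, the pool itself; whether any of the objects inside it leaked is invisible to it.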

Example 6: Did I mention that C doesn't have arrays? Well, it doesn't. Suppose you have a function working with int* a and int* b, supposedly pointing to two arrays of length n. Of course they can really point to the same array, and the compiler has to make sure the code works in this case. And that would be the only problem if a and b were arrays. But they aren't. They are pointers. So a can point to b+1 or b-2 or wherever it damn pleases. This gets in the way when compilers optimize code as simple as for(i=0;i<n;++i) a[i]=b[i]*2; because, say, loading 4 values from b before storing values into a changes the semantics. This is known as the pointer aliasing problem and is a major pain in the ass for optimizing compilers; years and years passed before restrict was standardized, few people use it, and almost nobody knows what it really means.
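The loop from Example 6, written out in C as a sketch; doubled and doubled_restrict are names made up for this illustration. When the destination overlaps the source, element-by-element execution and a "load four values, then store four values" schedule give different results, so the compiler can't simply vectorize the first version without proving, or checking at run time, that the ranges don't overlap. The restrict qualifier from C99 is the programmer's promise that no such overlap happens.

```c
#include <assert.h>
#include <stddef.h>

/* Plain pointers: a and b may overlap, so each store may change a later load. */
static void doubled(int *a, const int *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] * 2;
}

/* restrict: the caller promises a and b don't overlap, which lets the compiler
   load several b values before storing into a. Overlapping calls are undefined behavior. */
static void doubled_restrict(int *restrict a, const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] * 2;
}
```

For example, starting from {1, 1, 1, 1, 1} and calling doubled(buf + 1, buf, 4) yields {1, 2, 4, 8, 16}, because each store feeds the next load; loading all four source values first would yield {1, 2, 2, 2, 2}.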

Example 7: HTML has a concept of "document content". That's why if you have a WordPress blog, you can change your broken "theme" or "skin" or whatever they're called to some other broken theme or skin, and all your precious old content will get a spiffy new look. I think that MS Word doesn't have a user-visible high-level concept of "document content". The closest thing to it is probably the support for clipboard formats such as plain text or HTML. The latter isn't very appetizing. Much as I hate the WordPress entry composition window, I dropped the idea of copying from Word and pasting to WordPress, because there's too much crud to clean up.

My way to summarize this is as follows. In natural language, I find myself trying to avoid "special words" because they tend to be used for so many different things, so nobody can tell which one of those things you meant. And using "lower-level" sentences instead of "higher-level" words isn't much of a problem. People are awfully good at taking my low-level text and "raising" it back to the original high-level concept. But with formal machine-readable languages, it's the exact opposite, because machines are awesome at lowering and they suck at raising. Machines don't have a problem figuring out what a Java array or a Win32 resource or a C call to malloc means. You never get to a point when a machine tells its fellow machine, "I've heard programs say 'malloc' so many times in so many different contexts that I just don't know what it means anymore". On the other hand, try building a program reading your code and spotting all places where custom memory management occurs and instrumenting these spots to monitor leaks at run time. Machines can't do that, but people can ("oh, look, char g_pool[maxturd]!").

People are stunningly good at raising things to a higher level. I think this is one reason people believe in X11-style design principles ("provide low-level mechanisms, leave high-level policies to users"). There's another reason to favor this approach – it results in less work. This reason is perfectly legitimate – you only have so much time. After all, if you have infinite resources, everything becomes a no-brainer (you can compute the optimal chess move in every position, for example, so the game becomes rather boring). Basically, the only thing "smart" means is "capable of sensibly utilizing finite resources". So if I lack resources for defining and implementing a higher-level interface, and I cop out, don't call me names until you can squeeze that work into a budget like mine.

But going low-level kills metaprogramming, so treating it as a value instead of a necessary compromise is, I think, wildly wrong. Metaprogramming can be considered worthless from the user perspective. There are actual pseudo-business-oriented people out there who label things as economically worthless because "users don't see them". These people deserve a separate discussion (and a spanking, but we don't do physical violence here; stay tuned for some R-rated virtual violence on the subject). The whole point of the examples above was to show how metaprogramming, made possible by high-level interfaces, enables implementation of user-visible features or detection of user-visible bugs. The funniest thing on the list of features is performance (definitely user-visible in a variety of interesting cases). Everybody knows that "higher-level is slower", for many values of "everybody". However, program optimization is just one kind of metaprogramming; what makes metaprogramming hard, makes optimization hard. The lack of arrays in C and C++ gets in the way when you serialize just like it gets in the way when compilers optimize. If you're going to deal with arrays, you'd better have a special word for it in your language.

Does this mean that "high-level" is, at the bottom line, a synonym for "good", and should rightfully be listed in The Modern Software Industry Dictionary as such? I wish I could say yes. I think that high-level/low-level is an extremely complicated question. Angles and all. I think I'll spill out some more half-baked thoughts on the subject. I also think that choosing the right level of abstraction is The Grand Challenge when you design something. Oops. Um, forget that I used that word. I meant "when you define something". This word seems to still have a meaning. I wonder how long it will last.

1. anon, Feb 26, 2008

In your criticism of X11 vs Win32, you forget that X11 is a protocol (just like FTP or HTTP). Win32 is more accurately compared to GTK+ or Qt, both of which use standardized configuration formats and define easily theme-able widgets.

As an aside, X11's simplicity is the reason why UNIX has had true networked GUIs since the '70s, and Windows is still limping along with blitted bitmaps.

2. Yossi Kreinin, Feb 26, 2008

I wasn't judging the X11 protocol. I said that it didn't provide a standard for the things that GTK+ or Qt provide. If it did provide those things, it could be another layer, not necessarily a change to the existing layer. This would solve many interoperability problems. The availability of many different widget libraries doesn't refute my claim, but proves it, because /now/ there is absolutely no way to reach a state where you have just one widget library, is there? Which is why there should have been a standard one from the beginning.

There's a big, big difference between theory and practice. Having one interface instead of N is an example of a problem which lacks theoretical significance but is extremely important in practice. std::string, CString, QString. When you have to convert between those, the fact that C++ is more "generic" than Java is very comforting indeed.

3. kragen, Apr 23, 2008

Most of the stuff in Win32's GUI APIs is in GTK+ and Qt. It's just that by virtue of X11, you don't have to choose whether you're going to run a GTK+ desktop or a Qt desktop, the way you have to choose between running an X11 desktop or a Win32 desktop; you can run GTK+ apps and Qt apps at the same time, which is pretty awesome. Similarly, getting two C++ libraries to work together, while no picnic, is a lot easier than getting a Perl library to work with a Python one.

As you point out in your "low-level is easy" post, being low-level means that more things get built on top of you. Sometimes that enables connections between things built on top that would otherwise have been kept separate without the shared substrate.

4. kragen, Apr 23, 2008

(I think Qt is enough better than, say, Open Look, that people would have switched desktop environments to be able to program in Qt instead of Open Look. But I'm glad I can still use all my pre-Qt apps.)

5. Yossi Kreinin, May 18, 2008

Regarding integration of C++ libraries: try to integrate MFC GUI code, Qt GUI code, bare Xlib GUI code, and GTK+ GUI code.

Regarding integration of Perl and Python libraries: you can spawn a process and serialize function arguments and results. Some things will work smoothly and some won't, but nothing of the kind will work decently with C++.

Regarding choices: I don't want to fucking choose between Ctrl-V, Shift-Insert and middle click for "paste". I want standard commands to fucking work everywhere. Have the fucking system support a fucking sane clipboard with standard copy and cut and paste commands and standard formats for images and rich text, without every fucking app reinventing the fucking wheel.

The problem with having lots of things built on top of you is that sometimes those things should follow a common protocol. Leave those things out of your design, and you've got yourself a tower of Babel. "Lots of things on top" is good when they don't need to reinvent protocols, and being too low-level can thus suck big time.

6. rf, Aug 23, 2009

your definition: "An interface A has a higher level of abstraction than an interface B with respect to a concept C if the following holds: C can be expressed in terms of both A and B, but A comes with a special word for C."

how about: "[..] if the following holds: A includes a construction which encompasses C as a special case whereas B does not".

7. Yossi Kreinin, Aug 25, 2009

@rf: sounds like about the same thing.

8. Samuel Bronson, Oct 29, 2010

Actually, Word does (or, more accurately, can do) a lot better with separating content/style than most people are aware: certainly it was much better than what OO.o had last time I compared them.

Instead of attaching formatting directly to your text, you place it in "styles" and apply those to your text. There are also tools to clean up any places where you slipped up and applied formatting directly.

I suspect that this stuff exists because Microsoft actually uses Word for a lot of their technical documentation, and dealing with large documents without these features would be an exercise in tedium.

The whole WYSIWYG thing, the lack of a macro language (no, not like VBA! I mean something more like TeX or DSSSL), really lousy figure placement, annoying cross-reference handling, and the all-but-nonexistence of the documentation combined leave me largely unsatisfied, though.

9. Yossi Kreinin, Oct 30, 2010

Styles in Word? Interesting. I must admit that my knowledge of Word is very scarce, as is my knowledge of alternatives or more generally most kinds of end-user programs...
