February 24th, 2008

Yeah, naming conventions. Looks like my brain won't do any better today; those 5 drafts will have to wait. If you aren't in a mood for a trivial subject, skip this.

I think that the best naming convention out there is the Lisp one: case-insensitive-dash-separated. It just doesn't get any better:

Unfortunately, most languages use C-style identifiers for names, the dreaded [A-Za-z_][A-Za-z_0-9]*, because their infix parsers can't tell a dash from a minus. So you can't use this convention.

This leads to two problems:

  1. How do we separate between subsequent words in an identifier?
  2. When do we capitalize letters?

Of course, we could use a lowercase_underscore_separated convention. It would solve both problems in a simple way, having all the benefits of the Lisp convention except for the no-Shifts-and-Caps-Locks part. But (1) Caps Lock is available for capital letters, but not for underscore, and Shift is less healthy for your hands, and (2) if we have case sensitivity in our language, we'll of course use it, won't we? OK, let's kill those underscores.

There are two anti-underscore schools: alllowercase and CamelCase. alllowercase looks lame – it makes it easy to know when to capitalize letters (never), but chooses to ignore the word separation problem completely. I used to sneer at it. However, it has two huge benefits: it's very typing-friendly, and it discourages the use of long names. Long names, people, are a frigging nightmare.

HaveYouEverSeenANameTakingHalfAScreen? This is awful. Awful!! I can't lock my eyes on the damned thing. I can only focus on a tiny part of it. My eyes nervously jump around the line, which mentions the moronic mega-identifier twice (at both parts of an assignment). I'm looking for differences, small differences in the names... You know, it could be BlahBlahABlah on the left and BlahBlahBBlah on the right... AAAARGH!

Reading this kind of code is pure mental pain. I prefer mental pain to physical pain on any day, and that's why I'm in the software industry, but still, this sucks. The good news are that alllowercasenametakinghalfascreen is so ridiculous that even the most clueless pseudo-orderly person won't emit it.

Now, CamelCase, which is basically the winner, because it's used in all major languages and libraries, is probably the worst possible naming convention. It fails to solve both problems created by the lack of a good word separator in the A-ZA-Z0-9 languages:

  1. You don't really know when one word ends and the next word starts.
  2. You don't really know when a letter should be capitalized.

The problems of camel case come from using capital letters for word separation. This interferes with the other uses of case in natural language. The problems are amplified by the brilliant idea to assign even more semantical payload to case: functionsLookLikeThis, but ClassesLookLikeThis, etc. Let's look at some examples.

English has words like TCP, DNA and WTF. Should a TCP socket class be called TCPSocket or TCPsocket? What about a TCPIPSocket? What if we need a tcpOpen method – should we call it TCPOpen, like a class, to preserve the natural case of an acronym, or should it be TCPopen, so that the lowercase "o" conveys the fact that it's a function?

Oh, I know, it should be "openTCP"! No, no, why are you using "openTcp" – this is ugliness for its own sake! The only important thing is to get the first letter of a name right, and then you can use natural capitalization! Unless, of course, it's "openTCPIPSocket", and then we have a problem again. "openTcpIpSocket"?.. Some people just can't handle it and resort to underscores: openTCP_IPsocket, open_TCP_IP_socket... It's no use. It's ugly no matter what you do.

Capital letters coming from the natural language, like those in acronyms and names of people, are the smaller part of the problem – Tcp looks ugly, but you know what it means. The other part of the problem is the capital letters coming from formal languages, such as mathematical notation.

For example, in computer vision it's common to denote 3D coordinates with uppercase X,Y,Z, and 2D coordinates with lowercase x,y. In a case-sensitive language, it's damn natural to follow this convention, and it works very well for a local variable X or x (including the case when you use both in the same function). It doesn't work so well when you try to name functions or classes after their arguments/coordinate systems.

Does xySomething start with a lowercase x because it's a function, or because it really accepts x values of 2D coordinates? What about xYSomething – is the Y capitalized because "y" is a word and we always capitalize the first letter of a word, or maybe the function expects Y values of 3D coordinates?

You can have a function working with 3D X coordinates and 2D y coordinates, you know. I think it's better to call it XySomething than xYSomething, because meaning is more important than convention. But did the author of the function think so, too? Of course, we can use an underscore to "clarify" the intent: something_Xy. The underscore clearly shows that the part after the underscore doesn't follow standard naming conventions, so it must be according to the computer-vision-specific convention.

So what happens is that CamelCase code deteriorates to the following state:

  1. You have ugly names like tcpIpOpen.
  2. Since you also have names like TCP_IP_Open, your real naming convention is "camel case with underscores". Which is equivalent to "any identifier that compiles".

Maybe there's a good way to augment CamelCase with rules that make it work well. I probably wouldn't know. I ought to say that I'm not that good at naming conventions in particular and in Best Practices in general. But I doubt there's a good case-sensitive naming convention out there.

Just look at the Python naming convention. You basically have everything. thingslikethis, ThingsLikeThis, things_like_this, thingsLikeThis, and they're all attached to different types of object (module, class, function, method). And every time your language entity convention disagrees with the common sense (class TCPIPSocket), you've got yourself an ugly name. And in a way, this is a good convention, because it at least tries to be consistent with the common conventions used in C, C++ and Java.

The annoying part of this is the slowdown. "Um, how should I spell this name?.." There are actual capitalization trade-offs here. Programming is almost exclusively about making decisions and choosing trade-offs. It's quite tiring, really. Nobody wants to be making some more pointless decisions on the way just for the fun of it. Maybe it's just me and the kind of people I've worked with, but I've always, always bumped into lots and lots of names which looked like a compromise. Somebody was thinking hard here. And it looks ugly anyway.