February 24th, 2008

Yeah, naming conventions. Looks like my brain won't do any better today; those 5 drafts will have to wait. If you aren't in a mood for a trivial subject, skip this.

I think that the best naming convention out there is the Lisp one: case-insensitive-dash-separated. It just doesn't get any better:

Unfortunately, most languages use C-style identifiers for names, the dreaded [A-Za-z_][A-Za-z_0-9]*, because their infix parsers can't tell a dash from a minus. So you can't use this convention.
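
To see the dash/minus clash concretely, here is a minimal Python sketch. The helper names `is_c_identifier` and `tokenize` are made up for illustration, the toy lexer only knows identifiers and the minus sign, and reserved keywords are ignored.

```python
import re

# The C-style identifier pattern quoted above.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

def is_c_identifier(name):
    """Return True if the whole string is a C-style identifier (hypothetical helper)."""
    return C_IDENTIFIER.fullmatch(name) is not None

def tokenize(expr):
    """Toy infix lexer: split an expression into identifiers and minus signs."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|-", expr)

print(is_c_identifier("open_tcp_socket"))  # True
print(is_c_identifier("open-tcp-socket"))  # False
print(tokenize("open-tcp-socket"))         # ['open', '-', 'tcp', '-', 'socket']
```

The Lisp-style name comes out as three identifiers and two subtractions, which is exactly what an infix parser would do with it.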

This leads to two problems:

  1. How do we separate between subsequent words in an identifier?
  2. When do we capitalize letters?

Of course, we could use a lowercase_underscore_separated convention. It would solve both problems in a simple way, having all the benefits of the Lisp convention except for the no-Shifts-and-Caps-Locks part. But (1) Caps Lock is available for capital letters, but not for underscore, and Shift is less healthy for your hands, and (2) if we have case sensitivity in our language, we'll of course use it, won't we? OK, let's kill those underscores.

There are two anti-underscore schools: alllowercase and CamelCase. alllowercase looks lame – it makes it easy to know when to capitalize letters (never), but chooses to ignore the word separation problem completely. I used to sneer at it. However, it has two huge benefits: it's very typing-friendly, and it discourages the use of long names. Long names, people, are a frigging nightmare.

HaveYouEverSeenANameTakingHalfAScreen? This is awful. Awful!! I can't lock my eyes on the damned thing. I can only focus on a tiny part of it. My eyes nervously jump around the line, which mentions the moronic mega-identifier twice (at both parts of an assignment). I'm looking for differences, small differences in the names... You know, it could be BlahBlahABlah on the left and BlahBlahBBlah on the right... AAAARGH!

Reading this kind of code is pure mental pain. I prefer mental pain to physical pain on any day, and that's why I'm in the software industry, but still, this sucks. The good news is that alllowercasenametakinghalfascreen is so ridiculous that even the most clueless pseudo-orderly person won't emit it.

Now, CamelCase, which is basically the winner, because it's used in all major languages and libraries, is probably the worst possible naming convention. It fails to solve both problems created by the lack of a good word separator in the A-Za-z0-9 languages:

  1. You don't really know when one word ends and the next word starts.
  2. You don't really know when a letter should be capitalized.

The problems of camel case come from using capital letters for word separation. This interferes with the other uses of case in natural language. The problems are amplified by the brilliant idea to assign even more semantic payload to case: functionsLookLikeThis, but ClassesLookLikeThis, etc. Let's look at some examples.

English has words like TCP, DNA and WTF. Should a TCP socket class be called TCPSocket or TCPsocket? What about a TCPIPSocket? What if we need a tcpOpen method – should we call it TCPOpen, like a class, to preserve the natural case of an acronym, or should it be TCPopen, so that the lowercase "o" conveys the fact that it's a function?

Oh, I know, it should be "openTCP"! No, no, why are you using "openTcp" – this is ugliness for its own sake! The only important thing is to get the first letter of a name right, and then you can use natural capitalization! Unless, of course, it's "openTCPIPSocket", and then we have a problem again. "openTcpIpSocket"?.. Some people just can't handle it and resort to underscores: openTCP_IPsocket, open_TCP_IP_socket... It's no use. It's ugly no matter what you do.
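
The acronym trouble shows up as soon as a program has to find the words in such a name. Below is a minimal Python sketch of one common splitting heuristic (a run of capitals is treated as a single acronym, which gives up its last capital when a lowercase letter follows). The functions `split_camel` and `to_snake` are hypothetical helpers, not part of any library, and other heuristics would split differently.

```python
import re

def split_camel(name):
    """Split a CamelCase name into words.

    A run of capitals counts as one acronym; if a lowercase letter follows
    the run, the last capital is taken to start the next word.
    """
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", name)

def to_snake(name):
    """Convert a CamelCase name to lowercase_underscore_separated."""
    return "_".join(word.lower() for word in split_camel(name))

print(split_camel("openTCPIPSocket"))  # ['open', 'TCPIP', 'Socket']
print(split_camel("openTcpIpSocket"))  # ['open', 'Tcp', 'Ip', 'Socket']
print(split_camel("TCPsocket"))        # ['TC', 'Psocket']
print(to_snake("openTCPIPSocket"))     # open_tcpip_socket
```

With natural capitalization, TCP and IP fuse into one word, and TCPsocket is split in the wrong place altogether. Only the ugly openTcpIpSocket spelling carries its own word boundaries.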

Capital letters coming from the natural language, like those in acronyms and names of people, are the smaller part of the problem – Tcp looks ugly, but you know what it means. The other part of the problem is the capital letters coming from formal languages, such as mathematical notation.

For example, in computer vision it's common to denote 3D coordinates with uppercase X,Y,Z, and 2D coordinates with lowercase x,y. In a case-sensitive language, it's damn natural to follow this convention, and it works very well for a local variable X or x (including the case when you use both in the same function). It doesn't work so well when you try to name functions or classes after their arguments/coordinate systems.

Does xySomething start with a lowercase x because it's a function, or because it really accepts x values of 2D coordinates? What about xYSomething – is the Y capitalized because "y" is a word and we always capitalize the first letter of a word, or maybe the function expects Y values of 3D coordinates?

You can have a function working with 3D X coordinates and 2D y coordinates, you know. I think it's better to call it XySomething than xYSomething, because meaning is more important than convention. But did the author of the function think so, too? Of course, we can use an underscore to "clarify" the intent: something_Xy. The underscore clearly shows that the part after the underscore doesn't follow standard naming conventions, so it must be according to the computer-vision-specific convention.

So what happens is that CamelCase code deteriorates to the following state:

  1. You have ugly names like tcpIpOpen.
  2. Since you also have names like TCP_IP_Open, your real naming convention is "camel case with underscores". Which is equivalent to "any identifier that compiles".

Maybe there's a good way to augment CamelCase with rules that make it work well. I probably wouldn't know. I ought to say that I'm not that good at naming conventions in particular and in Best Practices in general. But I doubt there's a good case-sensitive naming convention out there.

Just look at the Python naming convention. You basically have everything. thingslikethis, ThingsLikeThis, things_like_this, thingsLikeThis, and they're all attached to different types of object (module, class, function, method). And every time your language entity convention disagrees with the common sense (class TCPIPSocket), you've got yourself an ugly name. And in a way, this is a good convention, because it at least tries to be consistent with the common conventions used in C, C++ and Java.
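
As a rough illustration, here is a Python sketch that sorts a name into one of the four spellings listed above. The labels and the `name_style` helper are my own assumptions for the example, and the patterns are deliberately strict: every capital must start a word.

```python
import re

# Hypothetical labels for the four spellings; patterns are tried in order.
STYLES = [
    ("alllowercase", re.compile(r"[a-z][a-z0-9]*")),
    ("lower_with_underscores", re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)+")),
    ("CapWords", re.compile(r"([A-Z][a-z0-9]+)+")),
    ("mixedCase", re.compile(r"[a-z][a-z0-9]*([A-Z][a-z0-9]+)+")),
]

def name_style(name):
    """Return the label of the first style the whole name matches."""
    for label, pattern in STYLES:
        if pattern.fullmatch(name):
            return label
    return "none of the above"

for name in ["thingslikethis", "ThingsLikeThis", "things_like_this",
             "thingsLikeThis", "TCPIPSocket", "TcpIpSocket"]:
    print(name, "->", name_style(name))
```

Under these strict patterns TCPIPSocket lands in "none of the above", while TcpIpSocket passes as CapWords: the common-sense spelling is the one that breaks the convention.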

The annoying part of this is the slowdown. "Um, how should I spell this name?.." There are actual capitalization trade-offs here. Programming is almost exclusively about making decisions and choosing trade-offs. It's quite tiring, really. Nobody wants to be making some more pointless decisions on the way just for the fun of it. Maybe it's just me and the kind of people I've worked with, but I've always, always bumped into lots and lots of names which looked like a compromise. Somebody was thinking hard here. And it looks ugly anyway.


1. ZungBang, Feb 25, 2008

Aesthetics and typing ease are just part of the problem – Hungarian notation.

2. buzaan, Feb 26, 2008

A while back I considered modifying emacs to convert names-like-this to names_like_this automatically when in C or Python mode, but didn't know enough elisp to do so. I still don't, but maybe someday...

If there is anything good to be said about C++ it's that the long and frequent build times give your fingers a chance to rest. Writing python using CamelCase leaves my pinky fingers aching by the end of the day. I hate to think of what they'll be like in a few years.

3. kragen, May 14, 2008

I always figured that CamelCase was a result of the dumb decision to make _ be ← in Smalltalk, and that all the other languages where you see it are that way as a result of aping the styles of former Smalltalk programmers.

4. shadytrees, Jun 4, 2008

PEP 8, the most recent version of the style guide, has the bastardCamelCase eliminated. Method names use the same convention as function_names.


5. Yossi Kreinin, Jun 4, 2008

Great! Now all the people who took the old convention seriously will feel bad for a variety of reasons. I know they will.

I hate coding conventions. And I hate camel case.

Interestingly, they've converged to the same thing I pretty much did – I write everything_like_this except for TypeNames, which would look ugly This_Way and it kinda sucks to spell them this_way because they're capitalized, like, everywhere, and the type_name_t convention from C really sucks.

Considerations. It's all very deep.

6. Andreas Rumpf, Jan 10, 2009

This is the reason why Nimrod is case insensitive and ignores underscores in identifiers. It also solves the macro problem that you refer to in one of your other articles by basing its AST on a single node type. Check it out, you might like it!
Your blog contains great articles, by the way.

7. Yossi Kreinin, Jan 10, 2009

Thanks! Regarding the language – interesting stuff; a bunch of things reminiscent of other languages suggest that you know plenty :) [almost certainly more than myself...]

I'll refrain from detailed comments as I've been neglecting the part of the brain which makes sense of PL information for quite some time, so these days I'm a very low-end commenter :) In particular, I lost hope in combining efficiency and elegance to the point where I'm disgustingly pessimistic compared to most people with practical interest in PL design.

8. Steve Folta, Aug 4, 2009

I certainly agree that hyphen-separated is the way to go, which is why I made my language accept it. But a name may not start with a hyphen, which solves the unary-minus problem. (This means you need whitespace around a binary minus, which I think looks much better anyway. Also, identifiers are case-sensitive. And the Smalltalk camel-case legacy survives as the convention for (capitalized) class names, my one concession to history/Hungarianism.)

9. Yossi Kreinin, Aug 4, 2009

An interesting compromise, although it could bite whoever copies expressions verbatim from another infix language – not that an error [as opposed to silently doing the wrong thing] is that bad a bite.

10. 4:31 ← pseudopost.org, Oct 22, 2009

[...] Shared IHateCamelCase. [...]

11. Dmitry, Feb 3, 2010

choose different starting symbols. i.e.
types: $int $real
vars: a b c (none)
enum-items: #sunday #monday

keywords: `if `else

12. Atlant, Aug 17, 2010

kragen: It's quite likely that Smalltalk chose to treat "_" as "←" because early Teletypes were conflicted about what that code point should print; on early Teletypes, it actually did print a back-arrow (such as you might use to indicate you've corrected a previously-typed character) whereas on later Teletypes (as ASCII became more standardized) it printed an underscore.

13. Carol, Aug 17, 2010

I like camel case because it is compact. I am a slow typist and hence my hands never hurt (after 20 years of writing on all kinds of keyboards).
I find natural language inefficient and badly designed, therefore I do not let it interfere more than it needs to with the well crafted programming language grammar.
By publishing and abiding by a few simple rules, including limiting the maximum size of an identifier, camel case can be quite effective.
I always hated the "m_" convention for members. To me a Hungarian style m would suffice. Underscore is hard to type for me; and so is dash. I tend to be much better about numbers and digits at the center of the keyboard. My early limited keyboards with only 40 keys would require lots of shifts to type even dashes.

14. Carol, Aug 17, 2010

For a long while I favored single letter identifiers. They were so much faster to type.

15. trinithis, Apr 10, 2012

Know what's also unreadable? opentcpipsocket. Your arguments against camel-case also apply to your ideal case. Retarded argument. Since we are talking about naming conventions, forbid openTCPIPSocket and require only the first letter of each acronym to be capitalized: openTcpIpSocket

16. Aristotle Pagaltzis, Apr 10, 2012

His ideal case is lowercase with dashes, i.e. "open-tcp-ip-socket". How do the arguments apply to that?

17. a, Apr 12, 2012

I find djb's code the easiest to read. He has a great appreciation for brevity and succinctness.

The more text you can get to be left-aligned and the shorter you can keep the line width, the easier reading becomes. Also, the more the text follows a pattern, the easier it is to digest, e.g.



bbbb bbbb
bbbb bbbb
bbbb bbbb

Code as ASCII art.

Maybe that's why print book and magazine publishers often used two columns. It's easier to digest.

Why not use abbreviations, one-letter identifiers if you wish, and provide an index showing what each represents, as a comment? vi-style markers let you jump to the index quickly.

I too hate CamelCase. However, I have great respect for the smallness, speed and simplicity of TurboPascal, which is heavy on CamelCase. Some people just like it, and I guess Wirth is among them.

Why they like it is a mystery.

18. Mike, May 8, 2012

Yes, I can't stand CamelCase, especially when the first letter is lower case for methods or variables. So I tend to piss off the other devs by using_underscore_variables. CamelCase makes sense for class/struct names, but that's about it. Laziness doesn't count as a criterion for me, and monitors these days are all big enough for even ridiculously long names.

19. Yossi Kreinin, May 8, 2012

Actually, I prefer short names, not because of screen limitations, but because of cognitive limitations – a_long_variable_name is hard to absorb in one glance, and to distinguish from a_longer_variable_name.

20. Sam, May 19, 2012

Good post. Upper case is an abomination in both code and file names, especially nowadays where every system is case sensitive (except of course the retarded Windows) and one typo can lead to a disaster.
A simple proof that CamelCase doesn't work is to look at code and file names around you. It's all over the place.

21. Jrud, May 23, 2012

I prefer camel case over anything else. Maybe if I had to deal more with 2D vs 3D coordinates it would be a different story, but I hate having to hit out-of-the-way keys like the dash. I never took an official typing class, so when I hit special characters (like the dash) my entire wrist moves instead of just the pinky finger like I believe you're supposed to.

22. Killah, May 4, 2014

Pretty much all of the points you gave against camelCase are wrong when you are in the OOP world:


And FFS who uses lower-case x, y for 2D coordinates and upper-case X, Y, Z for 3D ones? As far as I can remember in OpenGL (and pretty much everything graphics related) there is just Vector2f, Vector3f etc... Also, it's pretty simple to spot that you have 3 arguments in 3D and 2 in 2D.

23. Yossi Kreinin, May 4, 2014

Erm... why not TCP_IP_Socket? What if it's CustomTCPIPSocket – how is that different from openTCPIPSocket? Why can't a class conceivably have an openTCPIPSocket method?..

"Who uses x,y/X,Y to distinguish between coordinates" – a common computer vision convention; in vision you mix 2D (image) and 3D (world) coordinates in expressions so it's not about spotting the number of arguments that's different in different contexts.

24. Steven G., Jun 6, 2014

Well put Yossi, well put.

Additionally, I'd like to point out an interesting article about how humans interpret written words. http://en.wikipedia.org/wiki/Typoglycemia

Removing the spacing between words effectively messes with the brain's ability to easily interpret them, and also reinforces your points about lengthy names being bad for your mental health.

So many interesting points. Thank you.

25. kragen, Jun 13, 2014

atlant: yes, Smalltalk, even though its development started around 1970, used the older ASCII-1963 interpretations of ← and ↑ for _ and ^, which it interpreted, respectively, as assignment and return. ASCII-1963 didn't mandate lower-case, but it did leave a big open space for lower-case that ASCII-1967 filled in.

But why would a language whose first identified version was Smalltalk 72 in 1972 be using the ASCII-1963 character set instead of the ASCII-1967 character set? Probably because they wanted ← and ↑, and in 1972 using your own weird character set was a reasonable thing to do, even though looking back 42 years later (or 36 years later, when I wrote my comment above that you replied to) it seems like an obviously dumb idea. I mean, APL was still popular in 1972, IBM mainframes with EBCDIC were sort of dominant, PDP-10s could handle bytes of any size but usually used their own weird six-bit character set, and there were even UNIVAC mainframes using FIELDATA, although they were less popular than IBM's mainframes and DEC's mega-minis.

Smalltalk, also, was noted for its tendency to prioritize doing cool things over compatibility. And that's why it's still marginally relevant 42 years later, while EBCDIC, FIELDATA, UNIVAC, and PDP-10s and SIXBIT have been consigned to the dustbin of history.

26. Yossi Kreinin, Jun 14, 2014

...or "that's why it's only marginally relevant while worse languages from those days are still extremely relevant"... :)

27. Luiz, Jul 22, 2014

I "solve" the problem with CamelCase by using a custom text renderer that detects such names during highlighting and changes the color intensity of the capitals, making it very easy to see the separate words. I could put a space between the words, but I use a monospaced font and I didn't bother to play with the kerning.

28. Yossi Kreinin, Jul 24, 2014

Wow! Cool stuff. Does it work in vim? :)

29. massau, Aug 18, 2014

Your not-having-to-use-Shift argument is actually a bad one: you have to press the Shift button when you want to type an underscore, which is one of the most distant characters on the keyboard.
A better button would have been the \ on the left side of the board, without caps or shift.

But if you as a programmer use long words, then please use the _ or just mix them; it is much better.

30. Jon, Oct 15, 2014

Left to my own devices, I've decided this style works for me:

* functions_and_variable_names
* ClassNames

I don't even bother with lowerCamelCase, as it's not visually distinct enough from UpperCamelCase.

I also prefer to treat initialisms as 'words' for the purposes of capitalization, so something like this is representative:

XmlParser xml_parser = SpecificXmlParser(SOME_XML_CONSTANT);

31. Svend, Jun 24, 2015

I do a lot of camel cased programming, but it rarely bothers me too much. Part of the reason is probably auto-correction that pretty much eliminates the need to actually type most of the characters in any already defined identifier. This is triggered automatically, so I also don't have to type Ctrl-Space or anything like that. I still think that Camel Case is the best solution to all of the problems specified in the article.

32. Gabriel Sharp, Sep 1, 2015

Back in the late '80s we were taught that what you all call 'CamelCase' now was actually (for us veterans) 'camelCase'. It's funny to me seeing the word 'CamelCase' over and over, since it ignores the very reason it got that name. Back then we had ProperCase, lowercase, UPPERCASE, MiXeDcAsE (sometimes called interleaved case, stupid, right?) and then, of course, camelCase. The reasoning is that a camel has a bump __IN THE MIDDLE__, so suggesting that ProperCase is CamelCase is basically like putting a big lump on the poor old camel's head. "Okay, who punched the camel!?" **Not that it matters, but it was just bothering me to read about it so much, so I'm dropping this on you guys right here because it was the first easy place to do it :)**


Long Live PunchCard Programming!!!

33. Anonymous, Mar 25, 2017

CamelCase and PascalCase are cancer. Just remap the underscore key to another key and you're good to go.

34. TaylorM, Feb 13, 2018

Python often uses C/C++ libraries, and there is a useful mixed convention. Snake case is used for python objects and camel case is used for calls in the external library. It's important to keep track of the difference in many cases. Object Destructors are handled entirely differently, for example. (Ex: In PyQt, you would call QtObject.delete() to remove it from the Qt memory, and 'del QtObject' to delete the python.)

35. Mark Sutton, Mar 12, 2018

You missed an ambiguity. At work, we are forced to use an annoying camel case style where "only the first letter" is capitalised, so TCPOpen would be TcpOpen. But hold on! TCP is an abbreviation of Transmission Control Protocol, so that should be TransmissionControlProtocolOpen, and if we remove the extra letters that becomes TCPOpen. At least, this is how I interpret it, because TcpOpen is just so darned UGLY. I hate camel case with a vengeance.
