Entries Tagged 'software'

The C++ FQA is on GitHub

I have become burned out in my role as guardian of the vitriol, chief hate-monger… and most exalted screaming maniac.

– Alan Bowden, then the moderator of the UNIX-HATERS mailing list

The C++ FQA's "source code" (that is, the ad-hoc markup together with the ugly code converting it to HTML) is now on GitHub.

I decided to write the FQA around 2006. I felt that I should hurry, because I'd soon stop caring enough to spend time on it, or even to keep enough of it all in my head to keep the writing technically correct.

And indeed, over the years I did mostly stop caring. Especially since I managed to greatly reduce my exposure to C++. The downside, if you can call it that, is that, in all likelihood, I'll forever know much less about C++11 than I knew about C++98. I mean, I can use auto and those crazy lambdas and stuff, but I don't feel that I understand all this horror well enough to write about it.

So I invite better-informed people to do it. For a while I was looking for a single person to take it over. The trouble is, most people disliking C++ choose to spend less time on it, just like me. And since writing about C++'s flaws is one way of spending time on it, people who want to do it, almost by definition, don't really want to do it. Bummer!

However, doing a little bit of editing here and there might be fun, it wouldn't take too much of your time, and you'd be creating a valuable and scarce public good in an environment where naive people are flooded with bullshit claims of C++11 being "as fast as C and as readable as Python" or some such.

I don't think my original tone has to be preserved. My model at the time was the UNIX-HATERS style, and in hindsight, I think maybe it'd actually turn out more engaging and colorful if I didn't write everything in the same "I hate it with the fire of a thousand suns" tone.

Also – and I think this too was needlessly aped from UNIX-HATERS – I keep insisting on this good-for-nothing language being literally good for nothing, and while I don't outright lie to make that point, I'm "editorializing", a.k.a. being a weasel, and why do that? I mean, it's a perfectly sensible choice for many programmers to learn C++, and it's equally sensible to use it for new projects for a variety of reasons. Do I want to see the day when, say, Rust scrubs C++ out of the niches it currently occupies most comfortably? Sure. But being a weasel won't help us get there, and for whomever C++ is the best choice right now, persuading them that it isn't on general grounds can be a disservice. So the "good for nothing" attitude might be best scrapped as well, perhaps.

On the other hand, there was one good thing that I failed to copy from UNIX-HATERS – how it was a collaboration of like-minded haters of this ultimately obscure technical thing, where human warmth got intermixed with the heat of shared hatred. Maybe it works out more like that this time around.

Anyway, my silly markup syntax is documented in the readme file – I hope it's not too ugly – and I'm open to suggestions regarding this thing; we could make it more civilized if it helps. In terms of priorities, I think the first thing to do right now is to update the existing answers to C++11/14, and to update the generated HTML to the new FAQ structure at isocpp.org; then we could write new entries.

You can talk to me on GitHub, or email me at Yossi.Kreinin@gmail.com.

 

 

Fun with UB in C: returning uninitialized floats

The average C/C++ programmer's intuition says that uninitialized variables are fine as long as you don't depend on their values.

A more experienced programmer probably suspects that uninitialized variables are fine as long as you don't access them. That is, computing c=a+b where b is uninitialized is not harmless even if you never use c. That's because the compiler could, say, optimize away the entire block of code surrounding c=a+b under the assumption that c=a+b, where b is proven to be always uninitialized, is always undefined behavior (UB). And if it's UB, the only way for the program to be correct is for this code to be unreachable anyway. And if it's unreachable, why waste instructions translating any of it?
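(A hedged sketch of the kind of thing I mean – whether a given compiler actually performs this particular optimization is another matter:)

#include <stdio.h>

int surprise(int a) {
  int b;         //never initialized, and the compiler can prove it
  int c = a + b; //UB even though c is never used afterwards...
  printf("...so a compiler may treat this whole block as unreachable and drop it\n");
  return a;      //(c deliberately unused – that's the point)
}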

However, the following code looks like it could potentially be OK, doesn't it?

#include <stdbool.h>

typedef struct { bool valid; float a, b; } obj; //stand-in definition so the snippet compiles

float get(obj* v, bool* ok) {
  float c; //only assigned on the v->valid path
  if(v->valid) {
    *ok = true;
    c = v->a + v->b;
  }
  else {
    *ok = false; //not ok, so don't expect anything from c
  }
  return c;
}

Here you return an uninitialized c, which the caller shouldn't touch because *ok is false. As long as the caller doesn't, all is well, right?

Well, it turns out that even if the caller does nothing at all with the return value – ever, regardless of *ok – the program might bomb. That's because the garbage bits in c could happen to form a signaling NaN, and then, say on x86, when the fstp instruction is used to basically just get rid of the return value, you get an exception. In release mode but not in debug mode, some of the time but not all the time. This gives you this warm, fuzzy WTF feeling when you stare at the disassembled code. "Why is there even a float here in the first place?!"
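(A minimal caller sketch, assuming the get() and obj definitions above; whether it actually traps depends on the compiler, on whether floating-point exceptions are unmasked, and on the garbage that happens to end up in c:)

void just_check(obj* v) {
  bool ok;
  get(v, &ok); //the float return value is ignored completely...
  //...but merely moving it through the FPU – e.g. an fstp emitted to pop
  //the x87 stack – can already trap if the garbage bits form a signaling NaN
  if(!ok) {
    //handle the error; note that we never looked at the returned float
  }
}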

How much uninitialized data is shuffled around by real-world C programs? A lot, I wager – likely closer to 95% than to 5% of programs do this. Otherwise Valgrind would not go to all the trouble to not barf on uninitialized data until the last possible moment (that moment being when a branch is taken based on uninitialized data, or when it's passed to a system call; to not barf then would require some sort of a multiverse simulation approach for which there are not enough computing resources.)

Needless to say, most programs enjoying ("enjoying"?) Valgrind's (or rather memcheck's) conservative approach to error reporting were written neither in assembly, which few use, nor in, I dunno, Java, which won't let you do this. They were written in C and C++, and most likely they invoke UB.

(Can you touch uninitialized data in C without triggering UB? I seriously don't know – I'm not a language lawyer. Being able to do this is actually occasionally useful for optimization. Integral types, for instance, don't have anything like signaling NaNs, so at the assembly language level you should be fine. But at the C level the compiler might get needlessly clever if it manages to prove that the data is uninitialized. My own intuition is that it can never prove squat about data passed by pointer, because of aliasing, so I kinda assume that if I get a buffer pointing to some data, and some of it is uninitialized, I can do everything to it that I could in assembly. But I'm not sure.)

What a way to make a living.

 

 

Accidentally quadratic: rm -rf??

This is a REALLY bad time. Please please PLEASE tell me I'm not fucked…

I've accidentally created about a million files in a directory. Now ls takes forever, Python's os.listdir is faster but still dog-slow – but! but! there's hope – a C loop program using opendir/readdir is reasonably fast.

Now I want to modify said program so that it removes those files which have a certain substring in their name. (I want to keep the other files in that directory and its subdirectories – just not the junk I created due to a typo in my code.)

HOW THE FUCK DO I DO THAT IN O(N)??

O(N^2) is easy enough. The unlink function takes a filename, which means that under the hood it reads ALL THE BLOODY FILE NAMES IN THAT DIRECTORY until it finds the right one and THEN it deletes that file. Repeated a million times, that's a trillion operations – a bit less because shit gets divided by 2 in there, but you get the idea.
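(For concreteness, here's roughly the naive loop I'm talking about – a hedged sketch using the names from the update below; each unlink() takes a name, so on a filesystem with linear directory lookups every iteration may rescan the directory:)

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main() {
  DIR* d = opendir("fucked-dir");
  if(!d) { perror("opendir"); return 1; }
  struct dirent* e;
  char path[4096];
  while((e = readdir(d))) {
    if(strstr(e->d_name, "fuckers-name")) {
      snprintf(path, sizeof path, "fucked-dir/%s", e->d_name);
      unlink(path); //name-based; unlinkat(dirfd(d), e->d_name, 0) skips the path walk
                    //but still has to look the name up inside the directory
    }
  }
  closedir(d);
  return 0;
}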

Now, readdir gives me the fucking inode number. How the FUCK do I pass it back to this piece of shit operating system from hell, WITHOUT having it search through the whole damned directory AGAIN to find what it just gave me?

I would have thought that rm -rf for instance would be able to deal with this kind of job efficiently. I'm not sure it can. The excise function in the guts of coreutils, for instance, seems to be using unlinkat, which gets a pathname. All attempts to google for this shit came up with advice to use find -inum -exec rm or some shit like that, which means find converts inode to name, rm gets the name, Unix converts the name back to inode…

So am I correct in that:

  • Neither Unix nor the commercial network filers nor anybody else BOTHERS to use a hash table somewhere in their guts to get the inode in O(1) from the name (NOT A VERY HARD TASK DAMMIT), and
  • Unix does not provide a way to remove files given inode numbers, but
  • Unix does unfortunately make it easy enough (O(1)) to CREATE a new file in a directory which is already full of files, so that a loop creating those bloody files in the first place is NOT quadratic?? So that you can REALLY shoot yourself in the foot, big time?

Please tell me I'm wrong about the first 2 points, especially the second… Please please please… I kinda need those files in there…

(And I mean NOTHING, nothing at all works in such a directory at a reasonable speed because every time you need to touch a file the entire directory is traversed underneath… FUUUUUCK… I guess I could traverse it in linear time and copy aside somehow though… maybe that's what I'd do…)

Anyway, a great entry for the Accidentally Quadratic blog I guess…

Update: gitk fires up really damn quickly in that repository, showing all the changes. Hooray! Not the new files though. git citool is kinda… sluggish. I hope there were no new files there…

Updates:

  • `find fucked-dir -maxdepth 1 -name "fuckers-name*" -delete` nuked the bastards; I didn't measure the time it took, but I ran it in the evening, didn't see it finish in the half hour I had to watch it, and then the files were gone in the morning. Better than I feared it would be.
  • As several commenters pointed out, many modern filesystems do provide O(1) or O(log(N)) access to files given a name, so they asked what my file system was. Answer: fucked if I know, it's an NFS server by a big-name vendor.
  • Commenters also pointed out how deleting a file given an inode is hard because you'd need a back-link to all the directories with a hard link to the file. I guess it makes sense in some sort of a warped way. (What happens to a file when you delete it by name and there are multiple hard links to it from other directories?.. I never bothered to learn the semantics under the assumption that hard links are, um, not a very sensible feature.)

Things from Python I'd miss in Go

Let's assume Go has one killer feature – cheap concurrency – not present in any other language with similar serial performance (ruling out say Erlang). And let's agree that it gives Go its perfect niche – and we're only discussing uses of Go outside that niche from now on. So we won't come back to concurrency till the very end of the discussion. (Update – the "killer concurrency" assumption seems false… but anyway, we'll get to it later.)

***

Brian Kernighan once called Java "strongly-hyped". Certainly his ex-colleagues' Go language is one of the more strongly-hyped languages of today – and it happens to have a lot in common with Java in terms of what it does for you.

Rob Pike said he thought C++ programmers would switch to Go but it's Python programmers who end up switching. Perhaps some do; I, for one, am a Python programmer not planning to switch to Go – nor will I switch to Go from (the despicable) C++ or (the passable) C.

Why so?

What does Go have that C++ lacks? Mandatory garbage collection, memory safety, reflection, faster build times, modest runtime overhead. C++ programmers who wanted that switched to Java years ago. Those who still stick to C++, after all these years, either really can't live with the "overheads" (real time apps – those are waiting for Rust to mature), or think they can't (and nothing will convince them except a competitor eventually forcing their employer out of business).

What does Go have that Python lacks? Performance, static typing. But – again – if I needed that, I would have switched to Java from Python years before Go came along. Right?

Maybe not; maybe Java is more verbose than Go – especially before Java 8. Now, however, I'd certainly look at Java, though perhaps Go would still come out on top…

Except there are features I need that Go lacks (most of which Java lacks as well). Here's a list:

Dynamic code loading/eval

Many, many Python use cases at work involve loading code from locations only known at run time – with imp.load_source, eval or similar. A couple of build systems we use do it, for instance – build.py program-definition.py, that kind of thing. Also compilers where the front-end evals (user's) Python code, building up the definitions that the back-end (compiler's code) then compiles down to something – especially nice for prototyping.

I don't think Go has a complete, working eval. So all those use cases seem basically impossible – though in theory you could try to build the eval'ing and eval'd code on the fly… Maybe in some cases it'd work. Certainly it's not something Go is designed for.

REPL

No REPL in Go – a consequence of not having eval. There are a few programs compiling Go on the fly, like go-repl, but you can't quite build up arbitrary state that way – you're not creating objects that "live" inside the session. Not to mention the following little behavior of Go – a production-friendly but prototyping-hostile language if there ever was one:

> + fmt
! fmt> fmt.Println("Hello, world!")
Hello, world!
! fmt> println("This won't work since fmt doesn't get used.")
Compile error: /tmp/gorepl.go:2: imported and not used: fmt

At home, I use DreamPie because I'm on Windows, where a scanner and a Wacom tablet work effortlessly, but there's no tcsh running under xterm. Perhaps go-repl could replace DreamPie – after all, you can't exactly build up state in live objects in tcsh, which I'm quite used to. But I kinda got used to having "live objects" in DreamPie as a side effect of using it instead of tcsh.

And certainly Python REPLs already did and always will get more love than Go's because Python is designed for this kind of thing and Go is not.

numpy

So Python is slow and Go is fast, and just as readable, right? Try doing linear algebra with large matrices.

Python has numpy which can be configured as a thin wrapper around libraries like Intel's MKL – utilizing all of your cores and SIMD units, basically unbeatable even if you drop down to assembly. And the syntax is nice enough due to operator overloading.

Go doesn't have operator overloading, making scientific computing people or game programmers cringe. Go is ugly when it comes to linear algebra. Will it at least be as fast as Python with MKL? I doubt it – who'll continuously invest efforts to make something hopelessly ugly somewhat faster at these things? Even if someone does – even if someone did – who cares?

Why not have operator overloading? For the same reasons Java doesn't: "needless complexity". At least in Go there's a "real" reason – no exceptions, without which how would you handle errors in overloaded operators? Which brings us to…

No exceptions

Meaning that every time I open a file I need to clutter my code with an error handling path. If I want higher-level error context I need to propagate the error upwards, with an if at every function call along the way.

Maybe it makes sense for production servers. For code deployed internally it just makes it that much longer – and often error context will be simply omitted.

More code, less done. Yuck.

Update: a commenter pointed out, rather emphatically I might add, that you can panic() instead of throwing an exception, and recover() instead of catching it, and defer() does what finally does elsewhere. Another commenter replied that the flow will be different in Go in some cases, because defer() works at a function level, so you effectively need a function where you could use a try/finally block elsewhere.

So the tendency to return error descriptions instead of "panicking" is a library design style more than strictly a necessity. This doesn't change my conclusions, because I'd be using libraries a lot and most would use that style. (And libraries panicking a lot might get gnarly to use because of defer/recover not working quite like try/finally; "strict necessities" are not that different from this sort of "style choice".)

GUI bindings

We use Python's Qt bindings in several places at work. Go's Qt bindings are "not recommended for any real use" yet/"are at an alpha stage".

This might change perhaps – I don't know how hard making such bindings is. Certainly concurrent servers have no GUI and Go's designers and much of its community don't care about this aspect of programming.

Not unlike the numpy situation, no? Partly it's simply a question of who's been around for more time, partly of what everyone's priorities are.

All those things, combined

We have at work this largish program full of plugins that visualizes various things atop video clips, and we're in the process of making it support some kinds of increasingly interactive programming/its own REPL. Without eval, operator overloading for numeric stuff, the right GUI bindings or exceptions it doesn't seem easy to do this kind of thing. To take a generally recognizable example – Python can work as a Matlab replacement for some, Go can't.

Or, we do distributed machine learning, with numpy. What would we do in Go? Also – would its concurrency support help distribute the work? Of course not – we use multiple processes which Go doesn't directly support, and we have no use for multiple threads/goroutines/whatever.

Conclusion

Just looking at "requirements" – ignoring issues of "taste/convenience" like list comprehensions, keyword arguments/function forwarding etc. – shows that Go is no replacement for Python (at least to the extent that I got its feature set right).

What is Go really good at? Concurrent servers.

Which mature language does Go compete with most directly? Java: rather fast (modulo gc/bounds checking), rather static (modulo reflection/polymorphic calls), rather safe (modulo – funnily enough – shared state concurrency bugs). Similar philosophies as well, resulting in feature set similarities such as lack of operator overloading – and a similar kind of hype.

(Updated thanks to an HN commenter) Does Go have a clear edge over Java when it comes to concurrency? With Quasar/Pulsar, maybe not anymore, though I haven't looked deep enough into either. Certainly if you're thinking about using Go, Java seems to be a worthy option to consider as well.

Would I rather be programming in Go than C or C++? Yes, but I can't. Would I rather be programming in Go than Python? Typically no, and I won't.

P.S.

Is there a large class of Python programmers who might switch to Go? Perhaps – people working on any kind of web server back-end code (blogs, wikis, shops, whatever). Here the question is what it takes to optimize a typical web site for performance.

If most of the load, in most sites, can be reduced drastically by say caching pages, maybe learning a relatively new language with a small community, less libraries and narrower applicability isn't worth it for most people. If on the other hand most code in most websites matters a lot for performance, then maybe you want Go (assuming other languages with similar runtime performance lack the cheap concurrency support).

I'm not that knowledgeable about server code so I might be off. But judging by the amount of stuff done in slow languages on the web over the years, and working just fine (Wikipedia looks like a good example of a very big but "static enough" site), maybe Go is a language for a small share of the server code out there. There are however Facebook/Twitter-like services which apparently proved "too dynamic" for slow languages.

Which kind of service is more typical might determine how much momentum Go ultimately gains. Outside servers – IMO if it gains any popularity, it will be a side-effect of its uses in servers. (And the result – such as doing linear algebra in Go – might be as ugly as using Node.js for server-side concurrency, "leveraging" JavaScript's popularity…)

And – why Python programmers would then be switching to Go but not C++ programmers? Because nobody is crazy enough to write web sites in C++ in the first place…

How to make a heap profiler

I have a new blog post at embeddedrelated.com describing heapprof, a 250-line heap profiler (C+Python) working out of the box on Linux, and easy to port/tweak.

Why bad scientific code beats code following "best practices"

I've just read "The Low Quality of Scientific Code", which claims that code written by scientists comes out worse than it would if "software engineers" were involved.

I've been working, for more than a decade, in an environment dominated by people with a background in math or physics who often have sparse knowledge of "software engineering".

Invariably, the biggest messes are made by the minority of people who do define themselves as programmers. I will confess to having made at least a couple of large messes myself that are still not cleaned up. There were also a couple of other big messes where the code luckily went down the drain, meaning that the damage to my employer was limited to the money wasted on my own salary, without negative impact on the productivity of others.

I claim to have repented, mostly. I try rather hard to keep things boringly simple, and I don't think I've done, in the last 5-6 years, something that causes a lot of people to look at me funny after having spent the better part of the day dealing with the products of my misguided cleverness.

And I know a few programmers who have explicitly not repented. And people look at them funny and they think they're right and it's everyone else who is crazy.

In the meanwhile, people who "aren't" programmers but are more of the mathematician, physicist, algorithm developer, scientist, you-name-it variety commit sins mostly of the following kinds:

  • Long functions
  • Bad names (m, k, longWindedNameThatYouCantReallyReadBTWProgrammersDoThatALotToo)
  • Access all over the place – globals/singletons, "god objects" etc.
  • Crashes (null pointers, bounds errors), largely mitigated by valgrind/massive testing
  • Complete lack of interest in parallelism bugs (almost fully mitigated by tools)
  • Insufficient reluctance to use libraries written by clever programmers, with overloaded operators and templates and stuff

This I can deal with, you see. I somehow rarely have a problem, if anyone wants me to help debug something, figuring out what these guys were trying to do. I mean in the software sense. Algorithmically maybe I don't get them fully. But what variable they want to pass to what function I usually know.

Not so with software engineers, whose sins fall into entirely different categories:

  • Multiple/virtual/high-on-crack inheritance
  • 7 to 14 stack frames composed principally of thin wrappers, some of them function pointers/virtual functions, possibly inside interrupt handlers or what-not
  • Files spread in umpteen directories
  • Lookup using dynamic structures from hell – dictionaries of names where the names are concatenated from various pieces at runtime, etc.
  • Dynamic loading and other grep-defeating techniques
  • A forest of near-identical names along the lines of DriverController, ControllerManager, DriverManager, ManagerController, controlDriver ad infinitum – all calling each other
  • Templates calling overloaded functions with declarations hopefully visible where the template is defined, maybe not
  • Decorators, metaclasses, code generation, etc. etc.

The result is that you don't know who calls what or why, debuggers are of moderate use at best, IDEs & grep die a slow, horrible death, etc. You literally have to give up on ever figuring this thing out before tears start flowing freely from your eyes.

Of course this is a gross caricature, not everybody is a sinner at all times, and, like, I'm principally a "programmer" rather than "scientist" and I sincerely believe to have a net positive productivity after all – but you get the idea.

Can scientific code benefit from better "software engineering"? Perhaps, but I wouldn't trust software engineers to deliver those benefits!

Simple-minded, care-free near-incompetence can be better than industrial-strength good intentions paving a superhighway to hell. The "real world" outside the computer is full of such examples.

Oh, and one really mean observation that I'm afraid is too true to be omitted: idleness is the source of much trouble. A scientist has his science to worry about so he doesn't have time to complexify the code needlessly. Many programmers have no real substance in their work – the job is trivial – so they have too much time on their hands, which they use to dwell on "API design" and thus monstrosities are born.

(In fact, when the job is far from trivial technically and/or socially, programmers' horrible training shifts their focus away from their immediate duty – is the goddamn thing actually working, nice to use, efficient/cheap, etc.? – and instead they declare themselves as responsible for nothing but the sacred APIs which they proceed to complexify beyond belief. Meanwhile, functionally the thing barely works.)

Working simultaneously vs waiting simultaneously

"Multiprocessing", "multi-threading", "parallelism", "concurrency" etc. etc. can give you two kinds of benefits:

  • Doing many things at once – 1000 multiplications every cycle.
  • Waiting for many things at once – wait for 1000 HTTP requests just issued.

Some systems help with one of these but not the other, so you want to know which one – and if it's the one you need.

For instance, CPython has the infamous GIL – global interpreter lock. To what extent does the GIL render CPython "useless on multiple cores"?

  • Indeed you can hardly do many things at once – not in a single pure Python process. One thread doing something takes the GIL and the other thread waits.
  • You can however wait for many things at once just fine – for example, using the multiprocessing module (pool.map), or you could spawn your own thread pool to do the same. Many Python threads can concurrently issue system calls that wait for data – reading from TCP sockets, etc. Then instead of 1000 request-wait, request-wait steps, you issue 1000 requests and wait for them all simultaneously. Could be close to a 1000x speed-up for long waits (with a 1000-thread worker pool; more on that below). Works like a charm.

So GIL is not a problem for "simultaneous waiting" for I/O. Is GIL a problem for simultaneous processing? If you ask me – no, because:

  • If you want performance, it's kinda funny to use pure Python and then mourn the fact that you can't run, on 8 cores, Python code that's 30-50x slower than C to begin with.
  • On the other hand, if you use C bindings, then the C code could use multiple threads actually running on multiple cores just fine; numpy does it if properly configured, for instance. Numpy also uses SIMD/vector instructions (SSE etc.) – another kind of "doing many things at once" that pure Python can't do regardless of the GIL.

So IMO Python doesn't have as bad a story in this department as it's reputed to have – and if it does look bad to you, you probably can't tolerate Python's slowness doing one thing at a time in the first place.

So Python – or C, for that matter – is OK for simultaneous waiting, but is it great? Probably not as great as Go or Erlang – which let you wait in parallel for millions of things. How do they do it? Cheap context management.

Context management is a big challenge of waiting for many things at once. If you wait for a million things, you need a million sets of variables keeping track of what exactly you're waiting for (has the header arrived? then I'm waiting for the query. has it arrived? then I ask the database and wait for it etc. etc.)

If those variables are thread-local variables in a million threads, then you run into one of the problems with C – and hence OS-supported threads designed to run C. The problem is that C has no idea how much stack it's gonna need (because of the halting problem, so you can't blame C); and C has no mechanism to detect that it ran out of stack space at runtime and allocate some more (because that's how its ABIs have evolved; in theory C could do this, but it doesn't.)

So the best thing a Unixy OS could do is give C one page for the stack (say 4K), and make, say, the next 1-2M of the virtual address space inaccessible (with 64b pointers, address space is cheap). When C page-faults upon stack overflow, give it more physical memory – say another 4K. This method means at least 4K of allocated physical memory per thread, or 4G for a million threads – rather wasteful. (I think in practice it's usually way worse.) All regardless of us often needing a fraction of that memory for the actual state.

And that's before we got to the cost of context switching – which can be made smaller if we use setjmp/longjmp-based coroutines or something similar, but that wouldn't help much with stack space. C's lax approach to stack management – which is the way it is to shave a few cycles off the function call cost – can thus make C terribly inefficient in terms of memory footprint (speed vs space is generally a common trade-off – it's just a bad one in the specific use case of "massive waiting" in C).

So Go/Erlang don't rely on the C-ish OS threads but roll their own – based on their stack management, which doesn't require a contiguous block of addresses. And AFAIK you really can't get readable and efficient "massive waiting" code in any other way – your alternatives, apart from the readable but inefficient threads, are:

  • Manual state machine management – yuck
  • Layered state machines as in Twisted – better, but you still have callbacks looking at state variables
  • Continuation passing as in Node.js – perhaps nicer still, but still far from the smoothness of threads/processes/coroutines

The old Node.js slides say that "green threads/coroutines can improve the situation dramatically, but there is still machinery involved". I'm not sure how that machinery – the machinery in Go or Erlang – is any worse than the machinery involved in continuation passing and event loops (unless the argument is about compatibility more than efficiency – in which case machinery seems to me a surprising choice of words.)

Millions of cheap threads or whatever you call them are exciting if you wait for many events. Are they exciting if you do many things at once? No; C threads are just fine – and C is faster to begin with. You likely don't want to use threads directly – it's ugly – but you can multiplex tasks onto threads easily enough.

A "task" doesn't need to have its own context – it's just a function bound to its arguments. When a worker thread is out of work, it grabs the task out of a queue and runs it to completion. Because the machine works - rather than waits - you don't have the problems with stack management created by waiting. You only wait when there's no more work, but never in the middle of things.

So a thread pool running millions of tasks doesn't need a million threads. It can be a thread per core, maybe more if you have some waiting – say, if you wait for stuff offloaded to a GPU/DSP.
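(A minimal C++11 sketch of what I mean – just a queue of std::function tasks and a few worker threads; the names are made up and there's no load-balancing cleverness whatsoever:)

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TaskPool {
  std::queue<std::function<void()>> tasks; //a task: a function bound to its arguments
  std::mutex m;
  std::condition_variable cv;
  std::vector<std::thread> workers;
  bool done = false;
public:
  explicit TaskPool(int nthreads) {
    for(int i = 0; i < nthreads; ++i)
      workers.emplace_back([this] {
        for(;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this] { return done || !tasks.empty(); });
            if(done && tasks.empty()) return;
            task = std::move(tasks.front());
            tasks.pop();
          }
          task(); //run to completion – a worker only waits when out of work
        }
      });
  }
  void submit(std::function<void()> f) {
    { std::lock_guard<std::mutex> lock(m); tasks.push(std::move(f)); }
    cv.notify_one();
  }
  ~TaskPool() {
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_all();
    for(std::thread& w : workers) w.join();
  }
};

Usage would be something like TaskPool pool(8); pool.submit([]{ crunch(); }); – with crunch() a made-up compute function that never blocks in the middle of its work.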

I really don't understand how Joe Armstrong could say Erlang is faster than C on multiple cores, or things to that effect, with examples involving image processing – instead of event handling which is where Erlang can be said to be more efficient.

Finally, a hardware-level example – which kind of hardware is good at simultaneous working, and which is good at simultaneous waiting?

If your goal is parallelizing work, eventually you'll deteriorate to SIMD. SIMD is great because there's just one "manager" – instruction sequencer – for many "workers" – ALUs. CPUs, DSPs and GPUs all have SIMD. NVIDIA calls its ALUs "cores" and 16-32 ALUs running the same instruction "threads", but that's just shameless marketing. A "thread" implies, to everyone but a marketeer, independent control flow, while GPU "threads" march in lockstep.

In practice, SIMD is hard despite thousands of man-years having been invested into better languages and libraries – because telling a bunch of dumb soldiers marching in lockstep what to do is just harder than running a bunch of self-motivating threads each doing its own thing.

(Harder in one way, easier in another: marching in lockstep precludes races – non-deterministic, once-in-a-blue-moon, scary races. But races of the kind arising between worker threads can be almost completely remedied with tools. Managing the dumb ALUs can not be made easier with tools and libraries to the same extent – not even close. Where I work, roughly there's an entire team responsible for SIMD programming, while threading is mostly automatic and bugs are weeded out by automated testing.)

If, however, you expect to be waiting much of the time – for memory or for high-latency floating point operations, for instance – then hordes of hardware threads lacking their own ALUs, as in barrel threading or hyper-threading, can be a great idea, while SIMD might do nothing for you. Similarly, a bunch of weaker cores can be better than a smaller number of stronger cores. The point being, what you really need here is a cheap way to keep context and switch between contexts, while actually doing a lot at once is unlikely to be possible in the first place.

Conclusions

  • Doing little or nothing while waiting for many things is both surprisingly useful and surprisingly hard (which took me way too long to internalize, both in my hardware-related and in my software/server-related work). It motivates things looking rather strange, such as "green threads" and hardware threads without their own ALUs.
  • Actually doing many things in parallel – to me the more "obviously useful" thing – is difficult in an entirely different way. It tends to drag in ugly languages, intrinsics, libraries etc. about as much as having to do one single thing quickly. The "parallelism" part is actually the simplest (few threads, so easy context management; races either non-existent [SIMD] or very easy to weed out [worker pool running tasks]).
  • People doing servers (which wait a lot) and people doing number-crunching (work) think very differently about these things. Transplanting experience/advice from one area to the other can lead to nonsensical conclusions.

See also

Parallelism and concurrency need different tools – expands on the reasons for races being easy to find in computational code – but impossible to even uniformly define for most event handling code.

Can your static type system handle linear algebra?

It seems a good idea to use a static type system to enforce unit correctness – to make sure that we don't add meters to inches or some such. After all, spacecraft costing hundreds of millions have been lost due to exactly such bugs.

Almost any static type system lets us check units in sufficiently simple cases. I'm going to talk about one specific case which seems to me very hard and I'm not sure there's a type system that handles it.

Suppose we have a bunch of points – real-world measurements in meters – that should all lie on a straight line, modulo measurement noise. Our goal is to find that line – A and B in the equation:

A*x + B = y

If we had exactly 2 points, A and B would be unique and we could obtain them by solving the equation system:

A*x1 + B = y1
A*x2 + B = y2

2 equations, 2 unknowns – we're all set. With more than 2 measurements however, we have an overdetermined equation system – that is, the 3rd equation given by the measurement x3, y3 may "disagree" with the first two. So our A and B must be a compromise between all the equations.

The "disagreement" of x,y with A and B is |A*x + B – y|, since if A and B fit a given x,y pair perfectly, A*x+B is exactly y. Minimizing the sum of these distances from each y sounds like what we want.

Unfortunately, we can't get what we want because minimizing functions with absolute values in them cannot be done analytically. The analytical minimum is where the derivative is 0 but you can't take the derivative of abs.

So instead we'll minimize the sum of (A*x + B – y)^2 – not exactly what we want, but at least errors are non-negative (good), they grow when the distance grows (good), and this is what everybody else is doing in these cases (excellent). If the cheating makes us uneasy, we can try RANSAC later, or maybe sticking our head in the sand or something.

What's the minimum of our error function? Math says, let's first write our equations in matrix form:

(x1 1)*(A B) = y1
(x2 1)*(A B) = y2
...
(xN 1)*(A B) = yN

Let's call the matrix of all (xi 1) "X" and let's call the vector of all yi "Y". Then solving the equation system X'*X*(A B) = X'*Y gives us A, B minimizing our error function. Matlab's "\" operator – as in X\Y – even does this automatically for overdetermined systems; that's how you "solve" them.

And now to static type systems. We start out with the following types:

  • x,y are measured in meters (m).
  • A is a number (n) – a scale factor – while B is in meters, so that A*x + B is in meters as y should be. (Checking this much is indeed easy – see the sketch right after this list.)
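(A hedged C++ sketch, with made-up names, of what I mean by a type system handling the simple part:)

template<int M> struct Quantity { double val; }; //value with units of meters^M

template<int M1, int M2>
Quantity<M1 + M2> operator*(Quantity<M1> a, Quantity<M2> b) { return {a.val * b.val}; }

template<int M>
Quantity<M> operator+(Quantity<M> a, Quantity<M> b) { return {a.val + b.val}; }

typedef Quantity<0> Num;    //n – a plain number
typedef Quantity<1> Meters; //m

Meters predict(Num A, Meters x, Meters B) {
  return A*x + B; //typechecks: n*m + m = m
  //return x*x + B; //would not compile: m^2 + m has no operator+
}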

What about X'*X and X'*Y? X'*X looks like this:

sum(xi*xi) sum(xi) //m^2 m
sum(xi)    sum(1)  //m   n

…while X'*Y looks like this:

sum(xi*yi) sum(yi) //m^2 m

The good news is that when we solve for A and B using Cramer's rule, which involves computing determinants, things work out wonderfully – A indeed comes out as a plain number and B comes out in meters. So if we code this manually – compute X'*X and X'*Y by summing the terms for all our x,y points and then hand-code the determinants computation – that could rather easily be made to work in many static type systems.
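(Spelling the Cramer's rule step out with the units attached – assuming I got the algebra right:)

A = (sum(xi*yi)*sum(1) – sum(yi)*sum(xi)) / (sum(xi*xi)*sum(1) – sum(xi)*sum(xi))
    //units: (m^2*n – m*m) / (m^2*n – m*m) = m^2 / m^2 = n
B = (sum(xi*xi)*sum(yi) – sum(xi*yi)*sum(xi)) / (sum(xi*xi)*sum(1) – sum(xi)*sum(xi))
    //units: (m^2*m – m^2*m) / (m^2*n – m*m) = m^3 / m^2 = m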

The bad news is that thinking about the type of Matlab's \ operator or some other generic least-squares fitting function makes my head bleed.

Imagine that large equation systems will not be solved by Cramer's rule but by iterative methods and all the intermediate results will need to have a type.

Imagine that we may be looking, not for a line, but a parabola. Then our X'*X matrix would have sum(x^4), sum(x^3) etc. in it – and while spelling out Cramer's rule works nicely again, a generic function's input and output types would have to account for this possibility.

Perhaps the sanest approach is to type every column of X and Y separately, and every element of X'*X and X'*Y separately – not meters^something but T1, T2, T3… And then we see what comes out, and whether intermediate results have sensible types. But I just can't imagine the monstrous types of products of the various elements ever eventually cancelling out as nicely as they apparently should – not in a real flesh-and-blood programming language.

Maybe I'm missing something here? I will admit that I sort of gave up on elegance in programming in general and static typing in particular rather long ago. But we have many static type systems claiming to be very powerful and concise these days. Maybe if I did some catching-up I would regain the optimism of years long gone.

Perhaps in some of these systems, there's an elegant solution to the problem of \'s type – more elegant than just casting everything to "number" when it goes into \ and casting the result back into whatever types we declared for the result, forgoing the automatic type checking.

So – can your static type system do linear algebra while checking that measurement units are used consistently?

C++11 FQA anyone?

isocpp.org announced "the C++ FAQ" to which reportedly Bjarne Stroustrup and Marshall Cline will link to help boost its search rank. Good stuff; also, given that the language is >30 years old – it's about time.

Which reminds me. I'm failing to keep my promise to update the C++ FQA, an overly combative reply to Marshall Cline's fairly combative FAQ, to the new standard from hell – C++11.

Could you perhaps do it instead of me? Unfortunately, explaining how terrible C++ is cannot be made into a career the way explaining how wonderful C++ is can be. So I sort of moved on to other things; I managed to summon the mental energy just that one time to write the C++ FQA, and even as I wrote it I knew that I better hurry because that energy was waning. After all, at work I managed to reduce the amount of C++ I deal with to a minimum, so there was neither a profit motive nor continuous frustration to recharge me.

Not that I haven't made money off the C++ FQA – indirectly I did make money, specifically by getting a headhunter's attention and significantly improving my employment conditions. But I don't think a new FQA's maintainer/co-author could count on something along those lines.

Still, you'd become about as widely famous in narrow circles as I am, getting maybe 200K unique visitors per year and hundreds of "thank you" emails. Most importantly, you'd do the world a great service.

There's a curious double standard in the world of programming languages. Say, PHP is widely ridiculed because, for instance, its string comparison operator converts strings to numbers if they start with "0e" and this results in unexpected behavior. And it doesn't help PHP that === works just fine.

Imagine someone mocking C++ for two char pointers comparing "wrong" with ==. Immediately they'd be told that casting to std::string (more verbose than ===) would work just fine, and that "you just don't get it".

Why is PHP – a language full of quirks which at least gives you memory and type safety – universally ridiculed, and why do the developers, while defending the language, never ridicule back, while C++, an absolutely insane language, is ridiculed often enough but C++ developers always counter-attack viciously? For that matter, why isn't Lisp ridiculed for EQ, EQL, EQUAL and EQUALP, if comparison operators are so funny?

The reason, IMO, is simple: PHP is not taught in academic CS courses. C++ developers are much more likely to have a CS degree, therefore both they and others treat their knowledge of crazy C++ arcana as something of intellectual value. And Lisp is the poster child of academic programming language development. Educated proponents deter attempts at ridiculing a language.

In a "rational" universe, PHP would be held in high esteem given the amount of output produced by PHP developers with little training. In our universe we have this double- or triple-standard. Pointing out the darker aspects of C++ – which are most of its aspects – thus increases the supply of a rather scarce commodity.

But why C++ and not, say, Lisp, Haskell or C#?

One reason is that C++ is arguably the craziest language in widespread use, combining the safety of C (and I can live with C just fine) with the clarity of Perl (and I can live with Perl peacefully enough). But it is of course subjective.

The thing that really sets C++ apart is its development culture and value system – a perverse amalgamation of down-to-earth shrewdness and idealistic perfectionism. The whole idea of making the best, richest, most efficient and most generic/versatile programming language of the planet – you can sense that Bjarne Stroustrup aims at nothing less – ON TOP OF C is the perfect illustration of this culture, and perhaps its origin.

Stroustrup always knew that the language would be better if it didn't have to be compatible with C and he publicly acknowledged it – the "smaller, much simpler language struggling to come out" remark. But he also knew that using an Embrace, Extend and Exterminate strategy is a much more likely way to succeed. So he did that. In fact he managed to more or less kill C or at least put it in a coma – as even C99, not to mention C11, will never be supported by the Microsoft compiler, and C89 is a really old and really restrictive standard.

A shrewd move – almost an evilly shrewd one – netting the people behind C++ fame and fortune at the expense of programmers dealing with a uniquely dangerous language full of sugar-coated death traps.

(Yes, sugar-coated death traps, you clueless cheerleaders. X& obj=a.b().c() – oops, b() is a temporary object and c() returns a reference into it! Shouldn't have assigned that to a reference. Not many chances for a compiler warning, either.)
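(A minimal sketch of that particular trap, with made-up class names:)

#include <vector>

struct B {
  std::vector<int> v{1, 2, 3};
  const std::vector<int>& c() const { return v; } //reference into *this
};
struct A {
  B b() const { return B(); } //returns a temporary B
};

int main() {
  A a;
  const std::vector<int>& obj = a.b().c(); //binds to a member of the temporary;
  //the temporary dies at the end of this statement and obj is left dangling
  return (int)obj.size(); //undefined behavior – may "work", may crash, no warning
}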

Who did things differently? Sun and Microsoft, for example, marketing Java and C#, respectively. They made much cleaner languages from scratch, and to solve the chicken-and-egg problem – we have no legacy projects hence no programmers hence no new projects hence no programmers – they used money, large marketing budgets and large budgets for creating large standard libraries. A much more honest approach yielding much better results, I find.

And Stroustrup says about himself that he "lacks marketing clout" and says Java and C# are bad for you because they're "platforms, not languages", whatever that means.

And I'm not claiming to be able to read minds, but if I had to bet – I'd say he really believes that. He probably thinks he's your altruistic benefactor and Java and C# are evil attempts to drag you into proprietary platforms.

And you can see the same shrewdness, the same "altruism", the same attention to detail, the same tunnel vision – "this shit I'm working on is so important, it deserves all of my mental energy AND the mental energy of my users at the expense of caring about anything else" – throughout the C++ culture. From the boost libraries to "Modern C++ Design" (the author has since repented and moved to D – or did he repent?..) to the justifications for duplicate and triplicate and still incomplete language features to your local C++ expert carrying his crazy libraries and syntactic wrappers and tangling your entire code base in his net.

And this approach to life – "altruism" plus perfectionism plus cleverness plus shrewdness – extends way beyond C++ and way beyond programming. And the alternative is taming your ambitions.

This was my larger purpose in writing about all this shit. I'm pretty sure I failed in the sense that it got drowned in C++ error messages and other shits and giggles.

So maybe you'll do better than me. If you do, you might contribute to the sanity of many a young idealistic programmer – these tend to get sucked into C++'s sphere of influence, either emerging old and embittered years later or lost to sanity forever, stuck in an internally consistent but absolutely crazy way of thinking.

BTW I never wanted it to be a personal attack, in the sense that (1) I don't know what's inside the heads of people who promote C++ and (2) I can tell you with certainty that if I made something 10 times as bad as C++ and it was 0.0001x as popular as C++, I'd be immensely proud of myself and I wouldn't give a damn about what anyone thought. Just like I don't give a damn about people with so much time on their hands to actually have thousands of karma points at StackOverflow explaining just how lame C++ FQA is.

In this sense, C++ is fine and a worthy achievement of a lifetime. Especially in a world where Putin is a candidate for a Nobel Peace prize and Obama already got one.

I basically just think that (1) C++'s horrible quirks are worth pointing out and (2) there's something to be learned from this story about the way we pave the road to hell with our good intentions. That's all.

***

A lot of people have spoken about "a C++ renaissance" when C++11 was ratified. I tend to agree – indeed the new standard is a fresh dose of the same thing that C++ always was: "zero-overhead" pretty-looking syntax with semantics quite horrendous once you think about what it actually means.

Fittingly for a new revision of the C++ standard, more things we could once safely count on have gone with the wind. For instance, anything you could pass to a function used to be an expression, and expressions had one and only one type. How ironic that this revision, the revision that finally capitalized on this fact by introducing auto, also made this fact no longer true: {0} could be an int or an std::initializer_list, depending on the context. Context-dependent types were one thing that Perl had (scalar/vector context) but C++ didn't have.

(I have observed someone do this: _myarr[5]={0}; – they had in the .h file the definition int _myarr[5] and they remembered that this thing could be initialized with {0} in other contexts. What they did wouldn't compile in C++98; in C++11 it promptly assigned the int 0 to the non-existent element _myarr[5], and the usual hilarity ensued. Imagine how PHP would be ridiculed for this kind of little behavior – and PHP at least would never overwrite an unrelated variable with garbage. Imagine how with C++, the poor programmer will be ridiculed instead.)
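(A hedged sketch of the context dependence and of that anecdote, as I understand the C++11 rules:)

#include <initializer_list>

int _myarr[5]; //in the .h file; meant to be initialized elsewhere with int _myarr[5]={0};

void contexts() {
  int i = {0};     //here {0} is an int
  auto j = {0};    //here it's a std::initializer_list<int>
  _myarr[5] = {0}; //compiles in C++11 (not in C++98): assigns the int 0 to the
                   //out-of-bounds element _myarr[5], scribbling over whatever lives there
  (void)i; (void)j;
}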

This is nothing, BTW – it's not the kind of thing the FQA would normally poke fun at. We have bigger fish to fry. For instance, C++11's "closures" or "lambdas" – the preposterous thing where you say [](){…} and an anonymous struct gets generated. BTW it's one of the reasons I urged everyone where I work to upgrade to C++11, because it lets you implement a nice-looking parallel_for. In C I would have added a compiler extension for it AGES AGO, but extending the C++ grammar? No sir. I waited patiently until C++11 lambdas.

So, C++11 lambdas. Someone needs to write everything about those hideous capture lists – are references captured by value or by reference?! – and how you can end up passing dangling references if this closure thingie outlives variables it references (we don't need no stinking garbage collection! but why doesn't C++11 let one capture variables by std::shared_ptr?.. It's so much better than gc!), and about the type of this shit and how it looks in debuggers, and about std::function etc. etc.
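(One hedged example of the kind of dangling capture I mean:)

#include <functional>

std::function<int()> make_counter() {
  int calls = 0;
  return [&calls] { return ++calls; }; //captures the local by reference...
} //...and calls is gone the moment make_counter returns

int main() {
  std::function<int()> f = make_counter();
  return f(); //undefined behavior: the closure reads and writes a dangling reference
}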

And this won't be me because frankly, I don't have time to even fully master this arcana anymore.

If you dislike C++ and have the time to write about it, I'll gladly pass the torch on to you; specifically I'll redirect C++ FQA to point to your site, following the example of Bjarne Stroustrup and Marshall Cline. Or we could run a wiki, or something. Email or comment if you're interested.

Very funny, gdb. Ve-ery funny.

Have you ever opened a core dump with gdb, tried to print a C++ std::vector element, and got the following?

(gdb) p v[0]
You can't do that without a process to debug.

So after seeing this for years, my thoughts traveled along the path of, we could make a process out of the core dump.

No really, there used to be Unices with a program called undump that did just that. All you need to do is take the (say) ELF core dump file and generate an ELF executable file which loads the memory image saved in the core (that's actually the easier, portable part) and initializes registers to the right values (the harder, less portable part). I even wrote a limited version of undump for PowerPC once.

So we expended some effort on it at work.

And then I thought I'd just check a live C++ process (which I normally don't do, for various reasons). Let's print a vector element:

(gdb) p v[0]
Could not find operator[].
(gdb) p v.at(0)
Cannot evaluate function -- may be inlined.

Very funny, gdb. "You can't do that without a process to debug". Well, I guess you never did say that I could do that with a process to debug, now did you. Because, sure enough, I can't. Rolling on the floor, laughing. Ahem.

I suggest that we all ditch our evil C arrays and switch to slow-compiling, still-not-boundary-checked, still-not-working-in-debuggers-after-all-these-YEARS std::vector, std::array and any of the other zillion "improvements".

And gdb has these pretty printers which, if installed correctly (not easy with several gcc/STL versions around), can display std::vector – as in all of its 10000 elements, if that's how many elements it has. But they still don't let you print vec[0].member.vec2[5].member2. Sheesh!

P.S. undump could be useful for other things, say a nice sort of obfuscating scripting language compiler – Perl used to use undump for that AFAIK. And undump would in fact let you call functions in core dumps – if said functions could be, um, found by gdb. Still, ouch.

P.P.S. What gdb prints and when depends on things I do not comprehend. I failed to reproduce the reported behavior in full at home. I've seen it for years at work though.