My programming personality is very depressed right now. For a while, I've been away from my usual long-term stuff and back in the trenches. I'm doing simple things, but there's lots of them, so it feels like I'm not moving anywhere even though I go as fast as I can. It's like a highway with monotonic view.
Well, this has completely drained my brain. So I won't tell you anything in this blog – instead, I'll ask you a question. This way, it's OK for me to be wrong, which is even more likely than usual in my current state. Clever trick, isn't it?
My question basically is, how is not having side effects in your programming language going to help you write correct, and optimal, and parallel programs? I've already mentioned it, but this time, I'll elaborate on the point that I fail to get.
The problem with parallelism is scheduling. We want scheduling to be:
- Correct: when A depends on B, B should happen before A.
- Optimal: A, B and everything else should complete as soon as possible.
So, in order to correctly schedule basic blocks of code called A and B, I need to know if they depend on each other. If I have side effects, this is going to be hard. A modifies an object called X, and B modifies Y. Can X and Y be the same thing? To figure it out, you need whole-program analysis, and if your language is Turing-complete, WPA won't necessarily succeed, either.
This is the aliasing problem. The aliasing problem rules. It's the problem that forces you to use restrict
to
optimize numeric code. It's the problem that forces you to use static typing if you want to optimize anything at all. And it's
the problem that makes all safe languages completely and wildly unsafe when you have multiple threads. The rule of thumb is: if
you have a good idea for compile-time or run-time anything, it won't work because of the aliasing problem.
Now, suppose we got rid of those pesky side effects. A doesn't modify X; instead, it creates a new object, Z. Similarly, B takes Y and makes W. Now we can prove that A doesn't depend on B. As soon as their inputs are ready, they can run. This takes care of the correctness problem, which was listed as the scheduling problem #1 at the beginning.
However, this didn't solve the aliasing problem – X and Y can still be the same. I suspect that the aliasing problem is going to raise its ugly head and kick our pretty butt. Specifically, while semantically we made Z from X, we'd really like to wipe out X and reuse its memory for hosting Z. Because that would be more efficient than copying X and modifying some of it and keeping two objects around. Now, we can't do this optimization if X and Y are the same thing, and we know nothing about the execution order of A and B, can we? If A replaced X with Z, B would get the wrong Y.
And if we copy things, we aren't going to get an optimal scheduling, because A and B now run slower – problem #2, right?
Seriously, what do I miss? I realize that there are parallel execution environments where you have to copy things anyway because you have a zillion processors and a zillion processors can't have shared memory. I also realize that you can have few objects shared between tasks that run in parallel, so the overhead of copying them isn't very important. I realize all kinds of things making the aliasing problem a non-problem for side-effect-free parallelization frameworks or languages or whatever.
What I fail to see is, if you run in a single-box, multi-core system, and you do have shared memory, and you do have lots of shared data, how does removing side effects help you? With side effects, you can get an efficient program, if you can handle the debugging. Without side effects, you get a correct program, and then you must somehow optimize object allocation. And I just don't see how you'd go about that without compromising correctness.
Maybe I'm ignorant. Maybe I should have read about this somewhere. But where do I find a levelheaded discussion of, say, common implementation techniques of pure functional language compilers? I just keep bumping into "sufficiently smart compiler" arguments, and then there are people who don't care much about performance (and shouldn't because of their application domain).
Compare this to imperative languages. Anybody can write a pretty decent C compiler. If you don't optimize at all, you get what, 4x slowdown compared to the state of the art optimizer? And on a scalar RISC machine, I bet you can bridge 50%-70% of that gap by your cheapest shot at register allocation. A no-brainer, you need about 30% of a comp sci degree and you're all set. Now, how am I supposed to optimize allocation if there are no side effects? I mean, your spec or your family of specs should come with implementation techniques. "Useful" is good by itself, but it must also be implementable.
By the way, I'd love to find out that I'm wrong on this one. I've done lots of parallelism-related debugging, and it sucks. And maybe I am wrong. Maybe there just isn't much talk about the implementation details of these things. I do think my reasoning makes sense though.