Why custom allocators/pools are hard

December 3rd, 2012

In languages with manually managed memory, such as C and C++, as well as in garbage-collected languages, you sometimes want to roll your own memory allocator. Some common reasons are:

Speed: return &pool[last++] is faster than malloc. (A real pool would usually be slower than that, but still faster than malloc, especially since the "free", ready-to-be-allocated objects in your pool may still have a lot of state initialized from the last time they were used, unlike a malloc'd buffer; in OO terms, you don't need to call the constructor after allocating.)
Predictability: people usually describe "the pool advantage" as "lower fragmentation", and hence a smaller chance of running out of memory due to "sudden" fragmentation in unexpected circumstances. Actually, fragmentation is higher with pools: a pool of 100 objects of type A cannot be used to allocate objects of type B, even if you're using just one (or zero) A objects right now, so your memory is very much fragmented. However, it's fragmented predictably, leading to predictable allocation times.
Stability: another thing which higher fragmentation buys. Pools let you keep allocating B objects from the predictably available "B fragment" (the B pool) after you've run out of A objects. This means you can actually handle out-of-memory conditions if you can live without another A object. A malloc-based program "runs out of everything" when it runs out of memory, so it's very unlikely to survive.
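The speed and stability points can be made concrete with a minimal sketch, assuming allocate-only pools (no free) of default-constructed objects, one pool per type. FixedPool, A and B are hypothetical names. Allocation is the return &pool[last++] from above plus a bounds check, and running out of A objects shows up as a null pointer the caller can handle, while the memory reserved for B objects is untouched.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical allocate-only pool: N objects of type T, handed out in order.
template <typename T, std::size_t N>
class FixedPool {
public:
    // The whole "algorithm": bump an index. Exhaustion is reported, not fatal.
    T* allocate() {
        if (last_ == N) return nullptr;
        return &pool_[last_++];
    }
    std::size_t used() const { return last_; }

private:
    std::array<T, N> pool_{};   // objects are constructed once, up front
    std::size_t last_ = 0;
};

// Hypothetical object types, each with its own predictably sized "fragment".
struct A { int id = 0; };
struct B { double weight = 0.0; };

// Drains the A pool, then checks that a B can still be allocated.
inline bool b_survives_a_exhaustion() {
    FixedPool<A, 2> as;
    FixedPool<B, 2> bs;
    while (as.allocate() != nullptr) {}   // use up every A
    assert(as.used() == 2);
    return bs.allocate() != nullptr;      // B's reserved memory is still there
}
```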

How hard are pools? Algorithmically, they're misleadingly easy, unlike malloc, which is rather clearly hard. Malloc must be able to allocate chunks of many different sizes in an unpredictable order. Coming up with a fast algorithm that tends to have low fragmentation, or implementing garbage collection and heap defragmentation, is somewhere between hard and impossible (where "impossible" means that there will always be pathological workloads leading to horrible behavior, and you're trying to select an algorithm such that real-life workloads are well-supported and the pathological ones remain theoretical).

Pools, however, usually allocate same-sized chunks. How hard can it be? You keep an array of free pointers and a last index; allocate() pops from this free pointer stack, and free() pushes a pointer back into it, and you're done.
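Here is that scheme as a minimal C++ sketch, assuming a pool of default-constructed objects; the class name StackPool and its capacity parameter are illustrative, not taken from any library. allocate() pops a pointer off the free stack and free() pushes it back. Nothing stops a caller from using an object after freeing it, which is the problem the rest of this post is about.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool: an array of objects plus a stack of free pointers.
template <typename T>
class StackPool {
public:
    explicit StackPool(std::size_t capacity) : objects_(capacity) {
        free_.reserve(capacity);
        for (T& obj : objects_) free_.push_back(&obj);   // initially everything is free
    }
    StackPool(const StackPool&) = delete;                // free_ points into objects_
    StackPool& operator=(const StackPool&) = delete;

    // Pop a free object; nullptr means the pool is exhausted.
    T* allocate() {
        if (free_.empty()) return nullptr;
        T* obj = free_.back();
        free_.pop_back();
        return obj;
    }

    // Push the object back. It stays constructed and keeps its old state.
    void free(T* obj) {
        assert(obj >= objects_.data() && obj < objects_.data() + objects_.size());
        free_.push_back(obj);
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> objects_;   // storage never moves after construction
    std::vector<T*> free_;     // the "array of free pointers"; size() is the last index
};
```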

The problem with pools isn't the allocation algorithm but the fact that a new memory management interface has been created. Memory management is fundamental to programming. Built-in memory management interfaces come with a lot of tool support which is rather hard for a custom interface to match.

Consider a garbage-collected language. Most often, such languages provide you a rather strong correctness guarantee: as long as an object is referenced, it will be kept alive and won't be reclaimed and reused for something else. In other words, you're promised to never have problems with dangling references (of course, a subset of these will turn into memory leak problems – too many objects that are no longer "really" needed but referenced – but these are generally way easier to debug).

However, if you implement a pool in a language with GC, that guarantee is gone. The language runtime doesn't know that pool.free(obj) "frees" something – as far as it can tell, the object is very much alive. If someone frees an object and then accesses it, it may very well be that the object has since been reused for something else, and now you have a nasty dangling reference problem.

Your only guarantee now is that you'll only get the "type-safe" variant of dangling references – you'd be fiddling with someone else's object of the same type as yours – but this doesn't necessarily make debugging easier (because changes to the wrong object of the right type may look "too sensible" to provoke the suspicion that they deserve).
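The same effect is easy to reproduce in C++ or any other language. The sketch below uses a hypothetical pool of Account objects (all names are made up for illustration). Because the pool never destroys anything, the stale pointer still refers to a perfectly valid Account, just not the one its holder thinks, so the bad write looks sensible and a memory checker has no reason to complain: from the language's point of view nothing is wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pooled type.
struct Account {
    std::string owner;
    long balance = 0;
};

// Hypothetical minimal pool: objects live as long as the pool; free() just recycles.
class AccountPool {
public:
    explicit AccountPool(std::size_t n) : objs_(n) {
        for (Account& a : objs_) free_.push_back(&a);
    }
    Account* allocate() {
        assert(!free_.empty());
        Account* a = free_.back();
        free_.pop_back();
        return a;
    }
    void free(Account* a) { free_.push_back(a); }

private:
    std::vector<Account> objs_;
    std::vector<Account*> free_;
};

// Returns Bob's balance after Alice's code writes through a stale pointer.
inline long stale_pointer_demo() {
    AccountPool pool(4);
    Account* alice = pool.allocate();
    alice->owner = "alice";
    pool.free(alice);                 // Alice's code is done with the object...
    Account* bob = pool.allocate();   // ...which is immediately recycled for Bob
    bob->owner = "bob";
    bob->balance = 100;
    alice->balance += 50;             // bug: dangling, yet a valid Account, so no crash
    return bob->balance;              // Bob silently received Alice's deposit
}
```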

Can you tell the runtime, "pool.free actually frees, and I want you to call it instead of your normal reclaiming procedure when the object is no longer referenced?" Perhaps some GC languages have this; it's certainly not a trivial thing to support, because part of the point of pools is to keep hairy, already-constructed objects in them, which point to other objects, some of which might be themselves allocated from pools and some not.

What about languages with manually managed memory? At first glance, the problem seems irrelevant to these because of their "advantage" of not providing any guarantees anyway. You very much can have dangling references with malloc, and pools don't change this.

However, there are tools such as Valgrind which flag a large share of these conditions, by marking chunks passed to free as "inaccessible", and chunks returned by malloc as "undefined" (inaccessible for reading until the first write which initializes the data). The trouble with pools is that, again, Valgrind doesn't know that pool.free frees, and hence it can't flag accesses through dangling references any more.

Is there a workaround? The answer depends on your situation and disposition:

Valgrind has a client request mechanism which lets you mark memory regions as "inaccessible" or "undefined" (for instance, with the VALGRIND_MAKE_MEM_NOACCESS and VALGRIND_MAKE_MEM_UNDEFINED macros from memcheck.h), and your pools can issue these requests.
However, this isn't something that can be done in the pool implementation if the pool keeps constructed objects rather than plain memory chunks. You'll need a per-object-type function marking some of the memory as inaccessible/undefined – but not all of it. For instance, if the object keeps a pointer to a pre-allocated buffer, then maybe the buffer data become undefined when the object is freed and then reallocated, but the pointer to the buffer is defined, because it's already valid. For hairy objects, this can mean a lot of code for making Valgrind work as well as with malloc, and this code can have bugs, marking the wrong things as "defined".
If you're using tools other than Valgrind, you'll need to find an equivalent mechanism for these. If you use several tools, then you need to support several mechanisms. There's no standard interface for custom allocators (there could be – there is, in many languages, a standard interface for specifying custom operators, so it's not like there can't be standard ways for doing custom things; there just isn't for pools, at least there isn't in many real languages).
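Below is a sketch of the Valgrind route for the pre-allocated-buffer example above, assuming memcheck's client request macros VALGRIND_MAKE_MEM_NOACCESS, VALGRIND_MAKE_MEM_UNDEFINED and VALGRIND_MAKE_MEM_DEFINED from valgrind/memcheck.h; if that header isn't installed, they're replaced with no-ops so the code still builds. Packet, PacketPool, mark_freed and mark_allocated are hypothetical names. The pool itself stays trivial; the per-type hooks are the hand-written, bug-prone part. The free-pointer stack lives outside the pooled objects, so the pool's own bookkeeping never touches memory it has just marked inaccessible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use Valgrind's client requests when available; otherwise compile them away.
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define POOL_HAVE_MEMCHECK 1
#  endif
#endif
#ifndef POOL_HAVE_MEMCHECK
#  define VALGRIND_MAKE_MEM_NOACCESS(p, n)  ((void)(p), (void)(n))
#  define VALGRIND_MAKE_MEM_UNDEFINED(p, n) ((void)(p), (void)(n))
#  define VALGRIND_MAKE_MEM_DEFINED(p, n)   ((void)(p), (void)(n))
#endif

// Hypothetical pooled type that keeps a pre-allocated buffer across reuses.
struct Packet {
    char* buf;         // stays valid while the Packet sits in the pool
    std::size_t cap;   // likewise
    std::size_t len;   // meaningless once the Packet is freed
};

// Per-type hook for free(): any access through a dangling pointer gets flagged.
inline void mark_freed(Packet* p) {
    char* buf = p->buf;          // read these before the Packet becomes inaccessible
    std::size_t cap = p->cap;
    VALGRIND_MAKE_MEM_NOACCESS(buf, cap);
    VALGRIND_MAKE_MEM_NOACCESS(p, sizeof *p);
}

// Per-type hook for allocate(): buf and cap are defined, but len and the
// buffer contents are undefined until the new owner writes them.
inline void mark_allocated(Packet* p) {
    VALGRIND_MAKE_MEM_DEFINED(p, sizeof *p);
    VALGRIND_MAKE_MEM_UNDEFINED(&p->len, sizeof p->len);
    VALGRIND_MAKE_MEM_UNDEFINED(p->buf, p->cap);
}

class PacketPool {
public:
    PacketPool(std::size_t n, std::size_t bufsize) : objs_(n) {
        free_.reserve(n);
        for (Packet& p : objs_) {
            p.buf = new char[bufsize];   // the expensive setup, done once
            p.cap = bufsize;
            p.len = 0;
            mark_freed(&p);
            free_.push_back(&p);
        }
    }
    ~PacketPool() {
        for (Packet& p : objs_) {
            VALGRIND_MAKE_MEM_DEFINED(&p, sizeof p);   // so we may read p.buf
            delete[] p.buf;
        }
    }
    PacketPool(const PacketPool&) = delete;
    PacketPool& operator=(const PacketPool&) = delete;

    Packet* allocate() {
        if (free_.empty()) return nullptr;
        Packet* p = free_.back();
        free_.pop_back();
        mark_allocated(p);
        return p;
    }
    void deallocate(Packet* p) {
        assert(p != nullptr);
        mark_freed(p);
        free_.push_back(p);
    }

private:
    std::vector<Packet> objs_;
    std::vector<Packet*> free_;   // bookkeeping kept outside the pooled objects
};
```

Under Valgrind, touching a freed Packet or its buffer through a stale pointer should then be reported as an invalid read or write, and using len or the buffer contents of a freshly allocated Packet before writing them should be flagged once the value actually affects the program (e.g. in a conditional).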

The main point I'm trying to make is, don't have every developer roll their own pool, unless it's for a specific type of objects used briefly and locally in some "private" bit of code. If you need pools for many different kinds of objects and these objects have long, non-trivial lifecycles and are accessed in many different contexts, standardize on the pools.

In a whole lot of cases, code reuse actually isn't worth the trouble, and it's fine for people to write their own slightly different versions of something which could in principle become a common library: making it one would take too much effort and coordination, and invite too many misunderstandings during that coordination.

Pools aren't one of these places. Their algorithmic simplicity actually makes it easy to standardize on a few common variants (what variant can one desire that others don't also need?) – and their non-algorithmic complications make standardization very worthwhile.

There are a bunch of other non-algorithmic problems you can have with pools besides having to describe your live objects to tools – for example:

Thread safety is another potentially non-portable aspect of memory allocation which the language's built-in allocator already handles and which will become a headache for a custom one. You could use OS locks, or spinlocks, or a combination, or, if locking is too slow, you could have a per-thread arena, in which case you'll need to handle deallocation by a thread other than the allocating one. Or perhaps you could do lock-free allocation if, say, there's an atomic increment and it's sufficient.
Placement new is a C++ feature you may want here that rather many C++ programmers aren't aware of. If you want your pool to initialize objects in a memory chunk that's passed to it from outside, and you intend to use the pool with classes with non-empty constructors and destructors, then you'll want to do something like for(i=0;i<n;++i) new (buf+i*sizeof(T)) T(args) or what-not (with buf being a char pointer suitably aligned for T), and call ~T directly on each object when the pool shuts down. If everyone rolls their own pools, a lot of them will get this bit wrong.
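For the lock-free case, a minimal sketch, assuming an allocate-only pool (no individual frees) where an atomic increment really is sufficient; AtomicBumpPool and each_object_claimed_once are hypothetical names. Each thread claims a slot with std::atomic's fetch_add, so no two threads ever get the same object.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical allocate-only pool shared by several threads.
template <typename T, std::size_t N>
class AtomicBumpPool {
public:
    // The thread-safe version of "return &pool[last++]": fetch_add hands every
    // caller a distinct index without a lock. Returns nullptr once all N are taken.
    // Relaxed ordering suffices for distinct indices; passing an allocated object
    // to another thread still needs its own synchronization.
    T* allocate() {
        std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return i < N ? &objects_[i] : nullptr;
    }

private:
    T objects_[N]{};                      // constructed before any thread uses the pool
    std::atomic<std::size_t> next_{0};
};

// Example: several threads drain one pool; check every object went out exactly once.
inline bool each_object_claimed_once(unsigned nthreads) {
    assert(nthreads > 0);
    constexpr std::size_t kSize = 1000;
    AtomicBumpPool<int, kSize> pool;
    std::vector<std::vector<int*>> got(nthreads);   // one result vector per thread
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nthreads; ++t)
        threads.emplace_back([&pool, &got, t] {
            while (int* p = pool.allocate()) got[t].push_back(p);
        });
    for (std::thread& th : threads) th.join();

    std::vector<int*> all;
    for (const auto& v : got) all.insert(all.end(), v.begin(), v.end());
    std::sort(all.begin(), all.end());
    return all.size() == kSize &&
           std::adjacent_find(all.begin(), all.end()) == all.end();
}
```

Adding free() is where this stops being simple: a lock-free free list has its own pitfalls (the ABA problem, for one), and per-thread arenas need a way to take back objects freed by another thread.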
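The placement-new recipe can be fleshed out as a sketch: a hypothetical PlacementPool that constructs n objects in a caller-supplied buffer and destroys them explicitly at shutdown (Tracked is likewise a made-up example type). It handles two details that hand-rolled versions tend to miss: the buffer must be aligned for T, and if the k-th constructor throws, the k objects already built must be destroyed before the exception propagates. The memory itself belongs to the caller and is never deleted by the pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical pool that builds objects in memory it does not own.
template <typename T>
class PlacementPool {
public:
    template <typename... Args>
    PlacementPool(void* buf, std::size_t n, const Args&... args)
        : objs_(static_cast<T*>(buf)), n_(0) {
        // Placement new requires storage suitably aligned for T.
        assert(reinterpret_cast<std::uintptr_t>(buf) % alignof(T) == 0);
        try {
            for (; n_ < n; ++n_)
                new (objs_ + n_) T(args...);   // construct in place, same args each time
        } catch (...) {
            destroy();   // undo the objects that were fully constructed
            throw;
        }
    }
    ~PlacementPool() { destroy(); }
    PlacementPool(const PlacementPool&) = delete;
    PlacementPool& operator=(const PlacementPool&) = delete;

    T& operator[](std::size_t i) { return objs_[i]; }
    std::size_t size() const { return n_; }

private:
    void destroy() {
        while (n_ > 0) objs_[--n_].~T();   // explicit destructor calls, reverse order
    }
    T* objs_;
    std::size_t n_;   // number of live, fully constructed objects
};

// Example type that counts its live instances (C++17 inline static member).
struct Tracked {
    static inline int live = 0;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
};
```

In C++17 and later, std::uninitialized_fill_n and std::destroy_n from the memory header cover much of the same ground and are worth reaching for before writing the loops by hand.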

The upshot is that pools are surprisingly gnarly and are really best avoided; there's a very good reason to build memory allocation into a programming language. To the extent that circumstances dictate the use of pools, it's a very good idea to standardize on a few common implementations, debug them, and leave them alone (though unfortunately, a closed implementation likely cannot handle live object tracking by itself, so bugs will appear in the user-implemented parts that do it).

The algorithmic simplicity of a pool, prompting people to just declare an object array and a pointer array and use these as a quick-and-dirty pool, is really quite deceiving.