Why custom allocators/pools are hard

December 3rd, 2012

In languages with manually managed memory such as C and C++, as well as in garbage-collected languages, you sometimes want to roll your own memory allocator. Some common reasons are:

- speed: grabbing a chunk from a pool can be much cheaper than a general-purpose malloc, and much cheaper than constructing a hairy object from scratch;
- predictability: a pool's worst case costs about as much as its average case, which matters in real-time code;
- fragmentation: same-sized chunks can't fragment the pool's memory;
- locality: objects allocated from one pool sit near each other in memory;
- in garbage-collected languages, recycling objects reduces GC pressure.

How hard are pools? Algorithmically, they're misleadingly easy, unlike malloc which is rather clearly hard. Malloc must be able to allocate chunks of many different sizes in an unpredictable order. A fast algorithm tending to have low fragmentation – or implementing garbage collection and heap defragmentation – is between hard and impossible (where "impossible" means that you'll always have pathological workloads leading to horrible behavior, and you're trying to select an algorithm such that real-life workloads are well-supported and the pathological ones remain theoretical).

Pools, however, usually allocate same-sized chunks. How hard can it be? You keep an array of free pointers and a last index; allocate() pops from this free-pointer stack, free() pushes a pointer back onto it, and you're done.
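A minimal sketch of such a fixed-size pool in C – the names, chunk size, and capacity here are illustrative, not anything the post prescribes:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 64
#define OBJ_SIZE      32   /* fixed chunk size, in bytes */

typedef struct {
    unsigned char storage[POOL_CAPACITY][OBJ_SIZE];
    void *free_stack[POOL_CAPACITY];  /* stack of free chunk pointers */
    int   top;                        /* number of free chunks */
} pool_t;

void pool_init(pool_t *p) {
    for (int i = 0; i < POOL_CAPACITY; i++)
        p->free_stack[i] = p->storage[i];
    p->top = POOL_CAPACITY;
}

void *pool_alloc(pool_t *p) {
    /* pop a free chunk, or NULL if the pool is exhausted */
    return p->top > 0 ? p->free_stack[--p->top] : NULL;
}

void pool_free(pool_t *p, void *obj) {
    p->free_stack[p->top++] = obj;  /* push back onto the free stack */
}
```

That really is the whole allocation algorithm – both operations are a couple of instructions.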

The problem with pools isn't the allocation algorithm but the fact that a new memory management interface has been created. Memory management is fundamental to programming. Built-in memory management interfaces come with a lot of tool support which is rather hard for a custom interface to match.

Consider a garbage-collected language. Most often, such languages provide you a rather strong correctness guarantee: as long as an object is referenced, it will be kept alive and won't be reclaimed and reused for something else. In other words, you're promised to never have problems with dangling references (of course, a subset of these will turn into memory leak problems – too many objects that are no longer "really" needed but referenced – but these are generally way easier to debug).

However, if you implement a pool in a language with GC, that guarantee is gone. The language runtime doesn't know that pool.free(obj) "frees" something – as far as it can tell, the object is very much alive. If someone frees an object and then accesses it, it may very well be that the object has since been reused for something else, and now you have a nasty dangling reference problem.

Your only guarantee now is that you'll only get the "type-safe" variant of dangling references – you'd be fiddling with someone else's object of the same type as yours – but this doesn't necessarily make debugging easier (because changes to the wrong object of the right type may look "too sensible" to provoke the suspicion that they deserve).

Can you tell the runtime, "pool.free actually frees, and I want you to call it instead of your normal reclaiming procedure when the object is no longer referenced?" Perhaps some GC languages have this; it's certainly not a trivial thing to support, because part of the point of pools is to keep hairy, already-constructed objects in them, which point to other objects, some of which might be themselves allocated from pools and some not.

What about languages with manually managed memory? At first glance, the problem seems irrelevant to these because of their "advantage" of not providing any guarantees anyway. You very much can have dangling references with malloc, and pools don't change this.

However, there are tools such as Valgrind which flag a large share of these conditions, by marking chunks passed to free as "inaccessible", and chunks returned by malloc as "undefined" (inaccessible for reading until the first write which initializes the data). The trouble with pools is that, again, Valgrind doesn't know that pool.free frees, and hence it can't flag accesses through dangling references any more.

Is there a workaround? The answer depends on your situation and disposition. Under Valgrind, for instance, there's the client request mechanism: your pool can use macros like VALGRIND_MAKE_MEM_NOACCESS and VALGRIND_MAKE_MEM_UNDEFINED from valgrind/memcheck.h to tell Valgrind what pool.free and pool.alloc actually do – at the cost of writing those annotations yourself and keeping them correct. In a garbage-collected language, there's usually no analogous hook, and you're left being very careful.
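One workaround under Valgrind is Memcheck's client-request mechanism. The sketch below assumes valgrind/memcheck.h is available and falls back to no-op macros otherwise (the macros already compile to no-ops when the program isn't running under Valgrind); the pool names and geometry are illustrative:

```c
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_NOACCESS
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)0)
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)0)
#endif

enum { N = 16, SZ = 64 };      /* illustrative pool geometry */
static unsigned char slab[N][SZ];
static void *freelist[N];
static int top;

void pool_init(void) {
    for (int i = 0; i < N; i++)
        freelist[i] = slab[i];
    top = N;
    /* until a chunk is handed out, the whole slab is off-limits */
    VALGRIND_MAKE_MEM_NOACCESS(slab, sizeof slab);
}

void *pool_alloc(void) {
    if (top == 0) return NULL;
    void *p = freelist[--top];
    /* like malloc: writable, but unreadable until initialized */
    VALGRIND_MAKE_MEM_UNDEFINED(p, SZ);
    return p;
}

void pool_free(void *p) {
    freelist[top++] = p;
    /* any later access through a stale pointer gets flagged */
    VALGRIND_MAKE_MEM_NOACCESS(p, SZ);
}
```

With these annotations, a read through a dangling reference into the pool is reported just like a read after free().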

The main point I'm trying to make is, don't have every developer roll their own pool, unless it's for a specific type of objects used briefly and locally in some "private" bit of code. If you need pools for many different kinds of objects and these objects have long, non-trivial lifecycles and are accessed in many different contexts, standardize on the pools.

In a whole lot of cases, code reuse actually isn't worth the trouble, and it's fine for people to write their own slightly different versions of something which could have become a common library – making it one would take too much effort and coordination, and breed too many misunderstandings along the way.

Pools aren't one of these places. Their algorithmic simplicity actually makes it easy to standardize on a few common variants (what variant can one desire that others don't also need?) – and their non-algorithmic complications make standardization very worthwhile.

There are a bunch of other non-algorithmic problems you can have with pools besides having to describe your live objects to tools – for example:

- thread safety: a pool shared between threads needs locks or atomic operations, which can cost more than the pool's own trivial allocation logic;
- out-of-pool-memory behavior: malloc fails in one standard, well-understood way, while every custom pool grows, fails, or silently overflows in its own way;
- sizing: someone has to decide, per pool, how much memory to reserve up front.

The upshot is that pools are surprisingly gnarly, and are really best avoided; there's a very good reason to build memory allocation into a programming language. To the extent that circumstances dictate the use of pools, it's a very good idea to standardize on a few common implementations, debug them, and leave them alone (though unfortunately a closed implementation likely cannot deal with live-object tracking, and bugs will appear in the user-implemented parts doing that).

The algorithmic simplicity of a pool, prompting people to just declare an object array and a pointer array and use these as a quick-and-dirty pool, is really quite deceiving.

1. Barry Kelly, Dec 4, 2012

Four occasions where I've seen custom allocators / pools / arenas:

1) I wrote one for .NET, for recycling largish byte arrays used as network buffers amongst other things. These were large enough that they lived in the Large Object Heap, meaning they weren't collected until expensive gen2 collections, but they were "allocated" every single request.

2) In a compiler, every translation unit had its own arena for allocations. Allocations were never released independently. Arenas were disposed of whole when a translation unit needed recompilation owing to a change in the IDE editor.

3) A C++ compiler that used arenas not just for allocations, but also for precompiled headers: the arena was streamed out to disk, and on load streamed back in with pointers fixed up. Faster than rebuilding an object graph.

4) I wrote a simple custom allocator for debugging memory corruption errors on Windows with ASLR, which would cause addresses from malloc to vary from one run to the next. A trick for finding out how a pointer got a certain value is to set a hardware breakpoint on the pointer's location, then run the program, and potentially perform a binary search on the breakpoint hits to find the bad write. But it usually requires predictable allocation addresses, something you don't get with ASLR. So a simple custom allocator can fix this.

2. Yossi Kreinin, Dec 4, 2012

Is there no way to simply turn off ASLR on Windows?

3. Prakhar Goel, Dec 4, 2012

I think the PostgreSQL RDBMS has an interesting use of pools.

They use it to prevent memory leaks in their (considerable) C codebase. A function gets an arena. All allocation goes through that arena. When the function is done, the entire arena is de-allocated. Arenas are nested so even if a function forgets about an arena, the next level up function can throw the entire set out.

I'm oversimplifying a bit but I think that's the basic idea.

(Not sure if the above is analogous to region-based allocation).
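That nesting idea can be sketched in C as follows. PostgreSQL's real "memory contexts" are far more elaborate; every name here is made up for illustration:

```c
#include <stdlib.h>

/* A PostgreSQL-flavored "memory context": each arena remembers its
 * allocations and its child arenas, so deleting a parent reclaims
 * everything below it even if a callee forgot its own cleanup. */
typedef struct arena {
    struct arena *parent, *first_child, *next_sibling;
    void **blocks;        /* every chunk handed out from this arena */
    size_t nblocks, cap;
} arena_t;

arena_t *arena_create(arena_t *parent) {
    arena_t *a = calloc(1, sizeof *a);
    if (parent) {
        a->parent = parent;
        a->next_sibling = parent->first_child;
        parent->first_child = a;
    }
    return a;
}

void *arena_alloc(arena_t *a, size_t n) {
    if (a->nblocks == a->cap) {           /* grow the bookkeeping array */
        a->cap = a->cap ? a->cap * 2 : 8;
        a->blocks = realloc(a->blocks, a->cap * sizeof *a->blocks);
    }
    return a->blocks[a->nblocks++] = malloc(n);
}

void arena_delete(arena_t *a) {
    if (a->parent) {                      /* unlink from the parent */
        arena_t **pp = &a->parent->first_child;
        while (*pp != a)
            pp = &(*pp)->next_sibling;
        *pp = a->next_sibling;
        a->parent = NULL;
    }
    while (a->first_child)                /* children go down with us */
        arena_delete(a->first_child);
    for (size_t i = 0; i < a->nblocks; i++)
        free(a->blocks[i]);
    free(a->blocks);
    free(a);
}
```

A function gets a child arena, allocates from it freely, and one arena_delete at any ancestor cleans up everything below.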

4. Yossi Kreinin, Dec 4, 2012

This is fine as long as things allocated by a function don't continue to live past the exit point.

5. Jerry, Dec 4, 2012

The times I use pools are for performance, because of the cost of allocating and deallocating memory through pointers. The primary example for me is objects with vector members: each new() actually implies a cascade of allocations, and each delete throws that collection of memory chunks away.

I also end up needing each object to have a clear() function to return it to a reusable state.

6. Jean-Marc, Dec 4, 2012

Another interesting use of custom allocators is to roll your own pointer type (allocator::pointer): if the maximum number of objects is known, one can use pointers smaller than the default 32 or 64 bits. This will reduce memory usage, particularly in data structures using a lot of pointers (lists, graphs, ...).
Note that not all containers honor the allocator's pointer type; some use void* internally, with disastrous consequences.
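The small-pointer idea can be sketched in C with indexes into a pool array; node_t, NIL and the capacity below are illustrative. The point is that the "next pointer" stays 4 bytes regardless of the platform's pointer width:

```c
#include <stdint.h>

enum { MAXN = 1024 };          /* known maximum number of objects */
#define NIL UINT32_MAX         /* our "null pointer" */

typedef struct {
    int value;
    uint32_t next;             /* 4-byte "pointer": an index into nodes[] */
} node_t;

static node_t nodes[MAXN];
static uint32_t nnodes;

/* allocate a node from the pool, returning its index */
uint32_t node_new(int value, uint32_t next) {
    nodes[nnodes].value = value;
    nodes[nnodes].next = next;
    return nnodes++;
}

int list_sum(uint32_t head) {
    int s = 0;
    for (uint32_t i = head; i != NIL; i = nodes[i].next)
        s += nodes[i].value;
    return s;
}
```

On a 64-bit machine each node is 8 bytes instead of the 16 a real pointer plus padding would cost, and the nodes are contiguous, which also helps the cache.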

7. Yossi Kreinin, Dec 4, 2012

@Jerry: yeah, that; trouble is, all the dangling references you could have into those "zombie" objects – though of course it's no big deal as long as the program and the team are small enough.

@Jean-Marc: if I actually wanted to roll my own pointer type (or use indexes instead of pointers – I guess that's what's going on there, sort of), I wouldn't dare to hope that any off-the-shelf library or interface would cooperate with me... I'd write it myself from scratch, or "not tell" the library that my indexes were "logically pointers"...

8. Leo Sutic, Dec 5, 2012

I would add one more use case: garbage disposal in a GC language. Since GC isn't predictable, you may be forced to call a dispose() method on each object, particularly if it maps to a scarce system resource.

Using a pool (or a stack) as a bookkeeping system works well if the objects have bounded lifetimes.

You start by recording the current pool pointer:

int mark = pool_pointer;

Then you just take objects from the pool, never returning them. Finally, you dispose() all objects between mark and the current pool_pointer, and set the pool_pointer to mark.

I did this for buffers in a render loop.

int mark = pool.mark();
renderShape (...);
pool.reset (mark);

Of course, since renderShape doesn't put anything back into the pool, the consumption of buffers must be bounded, but this really simplified the code of renderShape, as I didn't have to check that dispose() was called once on every path, for every object.

Ok, this may seem more like a stack allocator, but the buffer object need not be completely disposed, making it a pool of sorts.

9. neleai, Dec 14, 2012

As you mentioned threads: they can totally defeat the purpose of using a memory pool in the first place.

With locks, atomic operations etc. you hit the constraint that a single atomic compare-and-swap takes ~100 cycles.

Since current C malloc/free costs an amortized 200-400 cycles per call, you won't get a big speedup.

Avoiding this cost is possible but complex: you need TLS or an equivalent, plus several tricks to amortize free.

I have several ideas how to improve malloc instead.

10. Yossi Kreinin, Dec 14, 2012

You don't necessarily use pools for the speed-up – you may care more about predictability, since speed varies depending on what sorts of threads, cores, etc. are involved. But yeah, threads hurt the performance of all shared allocators.

11. Sanjoy Das, Dec 29, 2012

Apart from the benefit that pools help freeing complex nested structures (which has already been mentioned), I find them interesting because they reduce cache misses. A linked list of integers (say) is likely to incur more cache misses on each deref of the "next" pointer if the individual elements have been malloced than if they were allocated from a memory pool (and hence are more likely to lie close to each other). Of course, this only applies to languages like C or C++; languages like Java can have a cleverly compacting GC take care of this.

12. Yossi Kreinin, Dec 29, 2012

Languages like Java keep what, 8 bytes of overhead per object, minimum? (The method table pointer and the lock, if I recall correctly.) Which inflates the number of cache lines you need before you consider overhead of gc metadata.

What languages like Java do nicely is keep arrays of integers (incidentally, so do most languages – even Python has numpy to store arrays without its usual ridiculous per-object overheads).

Not that I have anything against languages with plenty of overhead, I just don't think that cleverness and a high level of abstraction are the path to efficiency; I think brute force and less abstraction are that path. But that's an eternal "philosophical" dispute, there being no way to truly end it with data.

13. Sanjoy Das, Dec 29, 2012

I can't but agree that the per-object overhead must adversely affect the cache performance. What I was really driving at is that memory pools make less sense in managed languages since you don't have to (or cannot adequately, as you point out) deal with two important problems they take care of — deallocation of nested structures and better cache performance.

14. Yossi Kreinin, Jan 1, 2013

I do think that if you have hairy objects in a managed language, then it can make sense to keep them in a pool because taking one from a pool is faster than initializing them from scratch.

15. Stefan, Jul 24, 2013

Pools *are* hard to implement, but not so much for the reasons you stated. As you said, speed and maybe fragmentation are almost always the reason for using a pool. But either way, the standard memory allocator is insufficient only if you are allocating and deallocating _very_ many objects.

That means a pool only makes sense if it is fast for huge numbers of objects. Consequently, every single operation must have a complexity of O(1), or amortized O(1) (i.e. averaged over all calls). At the same time, you can't afford a lot of memory overhead, or else you cannot sensibly use it for comparatively small objects: you do need some additional memory to maintain the information about free and occupied memory spots.

Finding fast, O(1) algorithms, that can keep your organizational data up to date at all times, and with a minimum of memory overhead, is indeed a complex task.

16. Mate Soos, Feb 22, 2014

You just saved my ass — I didn't know about the Valgrind client request mechanism and I was using a pool. So I read up on it, marked my free memory and found a bug. Nice! Thanks a lot!

By the way, another good reason to use a custom allocator is to place data that is frequently accessed in unison next to one another (& use a 32b index, to make the 'pointer' 32b instead of 64b). This is the case for SAT solvers' boolean clauses: all modern SAT solvers use a custom memory manager for these two reasons. We only free clauses in a very predictable manner, in large batches, relatively rarely. The speedup is more than 10% of overall runtime thanks to the better memory&cache behaviour.

17. Yossi Kreinin, Mar 3, 2014

Glad to hear it was helpful.

SAT solvers rule. I've been working with a guy who draws a SAT solver to shoot at every second problem he meets. Much of the time the solver gets the shit done.

18. Matt, Apr 18, 2014

Object finalization sounds like what you want here. I'm not sure you can call Common Lisp a "real language" anymore due to its lack of use, but I know that the trivial-garbage package there abstracts the implementation-specific finalizers which get called when an object is deallocated. I used this in a foreign function interface I was writing one time. Basically, I wrapped a foreign (C) struct pointer in a Lisp class and registered a finalizer for the object in an :after method on initialize-instance. You had to be careful not to reference the object itself in the finalizer lambda, or else it would never be collected. I think it looked something like this (but I also know this is wrong; it's just for an idea):

(defmethod initialize-instance :after ((obj c-wrapper) &key) ; c-wrapper: the wrapping class
  ;; capture the raw pointer, not obj, so the closure doesn't keep obj alive
  (let ((cptr (slot-value obj 'cptr)))
    (tg:finalize obj (lambda () (foreign-free cptr)))))

This made the GC "just work" with the C code. I can't imagine that only lisp has this... But I could be wrong I guess...

19. Yossi Kreinin, Apr 18, 2014

I think you can do this kind of thing in most languages with an FFI; I'm not sure if it's trivial to use with a custom allocator written in that same language though, where some objects are allocated from pools and some not, and those allocated from pools are "partially constructed" and keep references to objects allocated from other pools, and there may be circular references within those partially constructed objects, etc. Maybe it works out just fine, I'm not concentrating on this sufficiently right now... What's obvious to me is it works fine for an opaque blob allocated in C that you need to deallocate when you run out of references to it, and it may get harder if it's not an opaque blob but a native data structure and you get to worry about its guts referencing stuff and it's no longer mostly C's problem but fully your problem.

20. G_glop, Aug 17, 2016

Yes, pool allocators are not so powerful in Java since they can't allocate objects of different types.

Pool allocators shine in C/C++, where allocation is just syntactic sugar around a function which takes the number of bytes you want and gives back a pointer.
So pool allocation is just: return the current position in the pool, then increment it by the number of bytes requested (plus a check whether the pool is full).
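That bump-pointer scheme is a few lines of C; bump_t and the alignment policy below are illustrative assumptions:

```c
#include <stddef.h>

typedef struct {
    unsigned char *buf;   /* the pool's backing memory */
    size_t size;
    size_t pos;           /* current position */
} bump_t;

/* return the current (aligned) position, advance by n, NULL when full */
void *bump_alloc(bump_t *b, size_t n) {
    size_t align = sizeof(max_align_t);
    size_t start = (b->pos + align - 1) & ~(align - 1);
    if (start + n > b->size)
        return NULL;      /* the pool is full */
    b->pos = start + n;
    return b->buf + start;
}
```

As the next comment notes, there is no free() here at all – this only works when everything in the pool is thrown away together.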

21. Yossi Kreinin, Aug 18, 2016

What you describe only works if all the objects in the pool are freed together. If you need to free them separately, then allocating objects of different sizes from the same memory area means you need to roll your own malloc in that memory area. The alternative is a pool of memory chunks sized for an object of one type, which is exactly what Java's/similar type system would make you do.
