Memory allocation strategies: a stack-like (LIFO) allocator

Last time, we were looking at a linear allocator, probably the simplest of all memory allocators. This time, we will detail how to implement a non-growing stack-like allocator, along with conventional use-cases.

By stack-like allocators, I mean allocators that behave like a stack data structure, following the LIFO (last-in, first-out) principle. This has nothing to do with stack or heap allocations, the call stack, or anything like that.

Use cases

Such a stack allocator is extremely handy for things like level loading, where a bunch of allocations need to be made that can all be rolled back in reverse order afterwards. It can also be used for allocations that need to stay resident throughout a certain lifetime, with additional allocations made on top. For more use-cases and a more detailed description, check out the excellent book Game Engine Architecture by Jason Gregory.

One of the most useful extensions to a stack allocator is what is commonly called a double-ended stack allocator. It also follows the LIFO-principle, but one can allocate from both ends of the stack, allowing users to put different types of allocations at different ends of the stack. The most well-known use-case of such an allocator is, again, level loading.

If you are not careful, loading a level can easily result in quite a lot of fragmentation because of temporary allocations that need to be made while e.g. loading resources. As an example, consider the case where loading a resource needs a temporary buffer A, from which level-resident data B is created. You cannot free A before allocating B because you still need A's contents to generate B, and you cannot free A after allocating B because of the LIFO ordering. A typical example of such fragmentation is reading an XML/JSON file into a temporary buffer, allocating other data from its contents, and then trying to free the read data afterwards.

This is where the double-ended stack allocator comes into play. All it takes is to allocate level-resident data from one end, and temporary allocations from the other end. As long as the two ends don’t meet or overlap during the loading process, you have exactly the same amount of memory available as you would have using a simple stack-allocator, but with the benefit of having no fragmentation at all.
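
To make this more concrete, the interface of such a double-ended stack allocator might look roughly like the following. This is only an illustrative sketch; the names and details are made up here and do not come from the Molecule codebase:

class DoubleEndedStackAllocator
{
public:
  DoubleEndedStackAllocator(void* start, void* end);

  // level-resident allocations grow upwards from the front end
  void* AllocateFront(size_t size, size_t alignment, size_t offset);

  // temporary allocations grow downwards from the back end
  void* AllocateBack(size_t size, size_t alignment, size_t offset);

  // each end is freed in LIFO order, independently of the other
  void FreeFront(void* ptr);
  void FreeBack(void* ptr);

private:
  char* m_start;
  char* m_end;
  char* m_currentFront;   // grows towards m_currentBack
  char* m_currentBack;    // grows towards m_currentFront
};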

Interface

Knowing why and when we might need a stack allocator, let us try to build one. The interface is similar to the linear allocator from last time:

class StackAllocator
{
public:
  StackAllocator(void* start, void* end);

  void* Allocate(size_t size, size_t alignment, size_t offset);

  void Free(void* ptr);

private:
  char* m_start;
  char* m_end;
  char* m_current;
};

Like the linear allocator we saw last time, this non-growing allocator can be backed by memory living either on the stack or on the heap.
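
For illustration, backing the allocator with either kind of memory might look like this (a hypothetical usage sketch; the buffer sizes are arbitrary):

// buffer living on the stack
char stackBuffer[4096];
StackAllocator stackBacked(stackBuffer, stackBuffer + sizeof(stackBuffer));

// buffer living on the heap
const size_t heapSize = 1024 * 1024;
char* heapBuffer = new char[heapSize];
StackAllocator heapBacked(heapBuffer, heapBuffer + heapSize);

// ... use the allocators ...

delete[] heapBuffer;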

Of course, the implementation is different, because we need to be able to free allocations in a LIFO-like fashion. Let us assume for a moment that the user calls Allocate() and Free() in the correct order – we will concern ourselves with adding debug checks later.

Implementation

How do we free allocations? Quite simple: we just reset the internal buffer pointer to what it was before the allocation was made. The first thing to realize here is that we cannot simply take the supplied pointer and set our internal pointer (m_current) to it, because this could result in unused memory which we would never be able to reclaim!

As a simple example, consider the following: we haven’t done any allocations yet, and we are right at the beginning of our buffer, which sits at address 0x4004. If we want to allocate anything aligned to a 16-byte boundary, we will be handed address 0x4010. If this address is later freed and we simply put our current buffer pointer to this address, we’ve just lost 12 bytes.

So, what we need to do is store the original pointer alongside each allocation. This can easily be done by storing it in front of the user allocation, making sure that everything is still properly aligned. We have seen how to do exactly that (using the offset argument) when discussing the linear allocator.

The implementation could then look like the following:

namespace
{
  static const size_t SIZE_OF_ALLOCATION_OFFSET = sizeof(uint32_t);
  static_assert(SIZE_OF_ALLOCATION_OFFSET == 4, "Allocation offset has wrong size.");
}

void* StackAllocator::Allocate(size_t size, size_t alignment, size_t offset)
{
  // store the allocation offset right in front of the allocation
  size += SIZE_OF_ALLOCATION_OFFSET;
  offset += SIZE_OF_ALLOCATION_OFFSET;

  const uint32_t allocationOffset = safe_static_cast<uint32_t>(m_current - m_start);

  // offset the pointer first, align it, and then offset it back
  m_current = core::pointerUtil::AlignTop(m_current + offset, alignment) - offset;

  // is there enough memory left?
  if (m_current + size > m_end)
  {
    // out of memory
    return nullptr;
  }

  union
  {
    void* as_void;
    char* as_char;
    uint32_t* as_uint32_t;
  };

  as_char = m_current;

  // store allocation offset in the first 4 bytes
  *as_uint32_t = allocationOffset;
  as_char += SIZE_OF_ALLOCATION_OFFSET;

  void* userPtr = as_void;
  m_current += size;
  return userPtr;
}

There are three things to notice here:

  • Each allocation generates an overhead of 4 bytes (sizeof(uint32_t)) because of storing the offset right in front of each allocation.
  • I store offsets rather than pointers because this minimizes the overhead on 64-bit systems, based on the assumption that no single stack-allocator will ever allocate more than 4 GB of memory.
  • Both size and offset need to be increased by 4 bytes. This ensures that it is the pointer handed to the user that is properly aligned, not the internal address at which we store the offset to the buffer's start.
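
Note that the implementation above uses core::pointerUtil::AlignTop() from the Molecule codebase, which isn't listed above. A minimal sketch of such a helper, assuming the alignment is a power of two, might look like this:

#include <cstddef>
#include <cstdint>

// sketch only: aligns ptr upwards to the next multiple of alignment,
// assuming that alignment is a power of two
char* AlignTop(char* ptr, size_t alignment)
{
  const std::uintptr_t mask = alignment - 1u;
  const std::uintptr_t aligned = (reinterpret_cast<std::uintptr_t>(ptr) + mask) & ~mask;
  return reinterpret_cast<char*>(aligned);
}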

Freeing an allocation is now as simple as grabbing the offset from the 4 bytes in front of the allocation, and setting our internal buffer pointer to that:

void StackAllocator::Free(void* ptr)
{
  ME_ASSERT_NOT_NULL(ptr);

  union
  {
    void* as_void;
    char* as_char;
    uint32_t* as_uint32_t;
  };

  as_void = ptr;

  // grab the allocation offset from the 4 bytes right before the given pointer
  as_char -= SIZE_OF_ALLOCATION_OFFSET;
  const uint32_t allocationOffset = *as_uint32_t;

  m_current = m_start + allocationOffset;
}
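
Putting the two functions together, a hypothetical usage of the allocator could look like this; note that allocations are freed strictly in reverse order:

char buffer[1024];
StackAllocator allocator(buffer, buffer + sizeof(buffer));

void* a = allocator.Allocate(100, 8, 0);
void* b = allocator.Allocate(200, 16, 0);
void* c = allocator.Allocate(50, 4, 0);

allocator.Free(c);   // most recent allocation is freed first
allocator.Free(b);
allocator.Free(a);   // oldest allocation is freed last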

Checking for erroneous calls

In addition to the above, you can enable error checking for the stack allocators in Molecule, which will assert that calls to Allocate() and Free() are always made in LIFO-order.

This is done by storing the ID of the allocation in an additional 4 bytes at the front of the allocation, similar to how we store our base offset. The ID is nothing more than a counter which is incremented upon each call to Allocate(), and decremented upon each call to Free(). Whenever Free() is called, I simply fetch the allocation ID and compare it to the current one, asserting that they are the same.
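
The actual Molecule implementation is not shown in this post. As a rough, simplified sketch of the idea (alignment and the offset bookkeeping from above are omitted, so this is illustration only), the ID check could look something like this:

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// simplified sketch of the LIFO debug check: the current value of an
// allocation counter is stored in front of each allocation
class CheckedStackAllocator
{
public:
  CheckedStackAllocator(char* start, char* end)
    : m_current(start), m_end(end), m_allocationID(0)
  {
  }

  void* Allocate(size_t size)
  {
    if (m_current + sizeof(uint32_t) + size > m_end)
      return nullptr;

    // store the ID of this allocation in the 4 bytes in front of it
    std::memcpy(m_current, &m_allocationID, sizeof(uint32_t));
    ++m_allocationID;

    void* userPtr = m_current + sizeof(uint32_t);
    m_current += sizeof(uint32_t) + size;
    return userPtr;
  }

  void Free(void* ptr)
  {
    char* const allocationStart = static_cast<char*>(ptr) - sizeof(uint32_t);

    uint32_t allocationID = 0;
    std::memcpy(&allocationID, allocationStart, sizeof(uint32_t));

    // the allocation being freed must be the one that was handed out last
    assert(allocationID == m_allocationID - 1 && "Free() called out of LIFO order.");

    --m_allocationID;
    m_current = allocationStart;
  }

private:
  char* m_current;
  char* m_end;
  uint32_t m_allocationID;
};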

Outlook

Next time, we will see how pool allocators can help with allocating objects of a certain size in any order, without fragmentation, in O(1) time.

12 thoughts on “Memory allocation strategies: a stack-like (LIFO) allocator”

  1. Why do you store a 32-bit allocationOffset, when all you need is just the 8-bit distance from the end of the previous block to the beginning of the next one? Something like this:
    void* StackAllocator::Allocate(size_t size, size_t alignment, size_t offset)
    {
    const uint32_t previous = m_current;
    [… same …]
    *as_uint32_t = m_current - previous;
    }

    void StackAllocator::Free(void* ptr)
    {
    [… same …]
    m_current = as_char - *as_uint32_t;
    }

    Am I missing something here?

    Also, why waste 4 bytes on _each_ allocation, when all you care about is the last chunk’s address? Just m_last = m_current on Allocate and ME_ASSERT(ptr == m_last, “Wrong allocation/deallocation order”); on Free.

    • Firstly, regarding the 32-bit offset: An 8-bit distance to the previous block will only suffice as long as allocations are no larger than 255 bytes, which clearly isn’t enough. Similarly, you absolutely need to store offsets/distances with 32 bits, because otherwise no allocation could ever be larger than 65535 bytes.
      Additionally, even if you could store an 8-bit distance (living with the <= 255 bytes assumption), most types need a minimum alignment of 4 bytes anyway, forcing you to offset/align/pad internally, making you "waste" the same 32 bits essentially.

      Secondly, asserting on m_last == m_current will only work for one single allocation. Imagine just doing 2 allocations in a row: m_last will point to the last chunk's address, so you can correctly assert on that. But what will m_last be after this allocation has been freed? You would need e.g. a stack of m_last, which is what I'm doing anyway, just that it's stored in-place.

      Example:
      void* alloc0 = allocator.Allocate(4, 4, 0); // alloc0 = 0x0000
      void* alloc1 = allocator.Allocate(4, 4, 0); // alloc1 = 0x0004
      // at this point, m_last would probably be 0x0004
      allocator.Free(alloc1);

      What will m_last now be? It should be 0x0000, but where do you get that value from? You need to store it somewhere.

      • It’s the distance from the _end_ of the previous block, not the start. You take your original m_current pointer, offset and align it, and then subtract its original value. Here’s an example:

        current = 0
        alloc(10, 4, 0) // ptr = 0x04, current = 0x0E, dist = 0x04 - 0x00 = 4
        alloc(10, 4, 0) // ptr = 0x10, current = 0x1A, dist = 0x10 - 0x0E = 2
        alloc(32, 16, 0) // ptr = 0x20, current = 0x40, dist = 0x20 - 0x1A = 6

        free(0x20) // dist = *(ptr - 1) = 6, current = ptr - dist = 0x20 - 0x6 = 0x1A
        free(0x10) // dist = *(ptr - 1) = 2, current = ptr - dist = 0x10 - 0x2 = 0x0E
        free(0x04) // dist = *(ptr - 1) = 4, current = ptr - dist = 0x04 - 0x4 = 0x00

        I.e. it’s a gap between the end of the last block and start of a new one. It should never exceed 255 bytes.

        Regarding m_last: yep, my bad. I was under impression that we have each chunk’s size stored somewhere, but it turns out it isn’t the case here.

      • True, 8-bit would suffice in that case. But as I said, with a default alignment of 4 bytes in almost every case, you need to align internally, so there’s not much to save here, is there?

        It’s a nice idea to keep in mind for other, more specialized allocators, though.

      • (WP doesn’t allow me to answer the latest message 😦 )

        It depends on your allocation granularity. If you often allocate odd sizes like 11 or 13 bytes, this will save you some memory.

        Personally, I fill that extra space with a special pattern to work as a guard against buffer overruns. Unfortunately, they are quite common in our domain (antivirus that runs rogue apps in the emulator).

      • In game development, I think I never had to allocate odd sizes like 11 or 13 bytes :). Structs will usually be padded/aligned to 4-byte boundaries anyway, unless you explicitly disable it or change the compiler’s settings.

        Additionally, alignments of more than 256 bytes are not that uncommon for allocations on certain consoles, which need at least a 16-bit offset to make it work. Personally, I’m not too worried about saving 2 bytes for an allocation here or there – if somebody makes more than a million allocations, there are other problems that need to be taken care of.

  4. How do you handle bounds checking with your lifo allocator? If you follow the same strategy from the linear allocator (Use offset param, then write right before user returned address) you end up overwriting the offset written by the lifo allocator to handle deallocation.

    I have an allocator adapter to add canaries to any allocation of a given allocator:

    template <class Alloc>
    class CannaryAllocator : public Alloc
    {
    public:
      template <class... Args>
      CannaryAllocator(Args&&... args) :
        Alloc(std::forward<Args>(args)...)
      {}

      void* allocate(std::size_t size, std::size_t alignment, std::size_t offset = 0)
      {
        char* user_ptr = reinterpret_cast<char*>(
          Alloc::allocate(size + 4, alignment, 4)
        );

        std::fill(user_ptr - 4, user_ptr, 0xf);
        std::fill(user_ptr, user_ptr + size, 0xc);
        std::fill(user_ptr + size, user_ptr + size + 4, 0xf);

        return user_ptr;
      }

      void deallocate(void* pointer, std::size_t size)
      {
        Alloc::deallocate(pointer, size + 4);
      }
    };

    But there’s no generic way to handle this, right?

    Thanks in advance.

    • The way I do it is that allocators don’t care about canaries, tracking, or other mostly development-only features. Those are implemented in the arenas, where policies are used to “tack” those features onto an allocator.

      The arena uses an allocator internally, and always asks for allocation_size + bounds_checking_overhead bytes of memory. Then the allocator can allocate more internally if it needs to (such as the LIFO allocator), and hands that allocation back to the arena. The arena then hands the allocation over to the bounds checker (if any), offsets the pointer, and finally hands the memory back to the user.

      With a LIFO allocator and canaries, the memory layout of the allocation would then be:

      Offset | Canary Front | User Memory | Canary Back

      User Memory is requested by the user, Canary Front + User Memory + Canary Back are requested from the allocator. The allocator internally adds 4/8 for the Offset, and returns a pointer to Canary Front. This is being taken care of by the bounds checker, which bumps the pointer to User Memory.

  5. Small question, but should the out-of-memory check not include the offset?

    if (m_current + size + offset > m_end)

    instead of

    if (m_current + size > m_end)

    Thanks for the great blog, very informative.
