Continuing from where we left of last time, I would like to discuss how we can build growing allocators using a virtual memory system. This post describes how to build a stack-like allocator that can automatically grow up to a given maximum size.

As explained in one of the last posts, virtual memory allows us to reserve address space without allocating any physical memory for it. We can use this to our benefit, and build an allocator that reserves address space up-front, and allocates physical memory whenever we need it. In that regard, the allocator behaves similar to the non-growing stack allocator, but is not restricted to working within a fixed region of memory.

Such an allocator is extremely useful for situations like e.g. loading level data, where the amount of memory needed varies between different levels. Using a growing allocator saves development time because there is no need to constantly change the upper bound of a fixed-size allocator whenever a certain level exceeds the previous worst-case memory usage.

An example

Consider a game where the maximum amount of memory that is spent on level-resident resources is defined to be 128 MB. What we want to have is an allocator that reserves address space for e.g. 256 MB (to be on the safe side), and allocates physical memory in 1 MB blocks whenever needed.

The basic steps of building such an allocator are simple:

Reserve address space for 256 MB.
Store the start and end of the reserved address space.
Store the start and end of the physical address space (=0 bytes in the beginning).
For each allocation:
- Work out the required amount of physical memory.
- If there is still enough physical memory left, bump the start of the physical address space (see above).
- If there is not enough memory left, allocate 1 MB pages at the end of the current physical address space until the allocation fits.
- Update the start and end of the physical address space.

In code this could look like the following:

GrowingStackAllocator::GrowingStackAllocator(unsigned int maxSizeInBytes, unsigned int growSize)
  : m_virtualStart(virtualMemory::ReserveAddressSpace(maxSizeInBytes))
  , m_virtualEnd(m_virtualStart + maxSizeInBytes)
  , m_physicalCurrent(m_virtualStart)
  , m_physicalEnd(m_virtualStart)
  , m_growSize(growSize)
{
}

In the constructor, we just reserve address space and store the start and end of it.

Allocating memory

void* GrowingStackAllocator::Allocate(size_t size, size_t alignment, size_t offset)
{
  const size_t allocationSize = size;

  // work out proper allocation sizes and possible offsets
  ...

  m_physicalCurrent = core::pointerUtil::AlignTop(m_physicalCurrent + offset, alignment) - offset;

  // is there enough physical memory left?
  if (m_physicalCurrent + size > m_physicalEnd)
  {
    // out of physical memory. check if there is still address space left from which new physical pages can be allocated.
    // remember that virtual memory must always be allocated in page-size chunks, that is why we round the needed size to
    // the next grow-size multiple.
    const size_t neededPhysicalSize = bitUtil::RoundUpToMultiple(size, m_growSize);
    if (m_physicalEnd + neededPhysicalSize > m_virtualEnd)
    {
      // the allocation doesn't fit into the address space, we're out of memory
      return nullptr;
    }

    // allocate new memory pages at the end of our currently allocated pages
    virtualMemory::AllocatePhysicalMemory(m_physicalEnd, neededPhysicalSize);
    m_physicalEnd += neededPhysicalSize;
  }

  // store book-keeping information if necessary
  ...

  // return allocation to user
  return m_physicalCurrent;
}

As described above, we allocate new pages whenever there is not enough physical memory left for the allocation to fit into. In that case, new pages are allocated at the end of the address range that already has physical memory committed to it. If there is not enough space left in the address space dedicated to this allocator, then we are out of memory.

The granularity with which the allocator grows is specified by the user in the constructor. It is a typical performance/memory-tradeoff regarding performance of allocations vs. wasted memory. If the allocator grows in 64 KB pages, it has to do lots of small allocations via the virtual memory system, but only wastes up to 64 KB, which is not much. In comparison, growing in 1 MB pages leads to far fewer allocations, but can waste up to 1 MB of memory that is backed by physical memory, but never used.

Freeing memory

One thing that is left to discuss is how the allocator behaves when freeing allocations. The primary thing to consider is whether the allocator should return physical memory pages whenever enough allocations have been freed. Depending on the allocation behaviour, that can be either good or bad.

When freeing level-resident resources, we mostly have several large allocations, and want to release physical memory as soon as possible. In situations where the allocator is used to make many small allocations, it could be more beneficial to keep physical memory that is committed to the address space. Otherwise, the allocator could suffer performance penalties whenever allocations straddle a page boundary, causing pages to be constantly allocated, freed, allocated again, and so on.

The way this is handled in Molecule is that whenever an allocations is freed, the allocator does not return physical memory to the OS. Instead, physical memory which is no longer
needed must be explicitly released by calling Purge(), which is an extra method offered by growing allocators. In case of the growing stack allocator, it simply checks which
pages are no longer needed, and returns them to the OS:

void GrowingStackAllocator::Purge(void)
{
  // we need to free all physical memory pages which are no longer needed by any allocation while making sure that we don't free the page
  // we're currently pointing to (remember that virtual memory only works in page-size granularity).
  // we do this by rounding the current physical memory pointer to the next grow-size boundary, and freeing all physical memory from there.
  char* addressToFree = pointerUtil::AlignTop(m_physicalCurrent, m_growSize);
  const unsigned int sizeToFree = safe_static_cast(m_physicalEnd - addressToFree);
  virtualMemory::FreePhysicalMemory(addressToFree, sizeToFree);

  m_physicalEnd = addressToFree;
}

Conclusion

Having growing allocators that make use of the virtual memory system allows us to specify a safe upper-bound for the amount of memory needed during development without wasting physical memory. Furthermore, knowing in which address range certain allocations lie can be a tremendous help when debugging.

12 thoughts on “Memory allocation strategies: a growing stack-like (LIFO) allocator”

Very interesting, thanks for sharing. I’m just wondering how custom allocators can be helpful for resources which are stored in the memory managed by external systems, e.g. textures which are uploaded into GPU’s memory. Once loaded into the temporary memory allocated with our custom allocator the texture is then transferred to the GPU memory and we must deallocate this temporary memory in order to avoid memory duplication… I guess, in this case custom allocator only allows us to quickly allocate this temporary memory but there still can be performance hit when the external system allocates its own memory and copies data.

Reply ↓

Stefan Reinalter on February 18, 2013 at 12:46 pm said:

I think that depends on the type of resource, and the platform and API you’re working with.

There are certain resources where it’s better to duplicate memory so you don’t pay huge performance penalties. An example would be a dynamic vertex buffer which is generated on the CPU and uploaded to a GPU vertex buffer each frame (using a ring-buffer scheme for maximum performance).

On the PC, there’s generally the GPU driver which will take care of managing memory for GPU resources for us, so there’s not much we can do. On consoles however, you have to allocate memory for every vertex buffer, index buffer, texture, etc. yourself. There you have to pay attention to the correct alignment so the GPU doesn’t crash (varies between GPUs/APIs), map memory in order for the GPU to see it, allocate memory with the “correct” properties depending on usage (e.g. write-combined memory), and much more. It’s more work, but ultimately you are in complete control. This allows you to do things not possible on the PC, e.g. aliasing the same eDRAM addresses for different things because you know you don’t need them at the same time during rendering. An example would be allocating memory for a shadow map, using that during the main render pass, then at a later point using the same memory for a completely different rendertarget.

Custom allocators surely come in handy in such situations!

Reply ↓

Really interesting post, thanks for sharing. I am slightly curious what the actual implementation details of virtualMemory::ReserveAddressSpace(maxSizeInBytes) are. Does this just involve a call to system new, or is something else involved? Thanks!

Reply ↓

Stefan Reinalter on May 21, 2014 at 12:06 pm said:

I am slightly curious what the actual implementation details of virtualMemory::ReserveAddressSpace(maxSizeInBytes) are. Does this just involve a call to system new, or is something else involved?

No, a call to system new would be wrong, because that commits physical storage. We only want to reserve address space, without commiting memory to it (yet).
I use VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS); for that.

Reply ↓
- Tom Hulton-Harrop on May 21, 2014 at 9:08 pm said:
  
  Ah awesome, thank you for clarifying! I don’t suppose you could recommend any references/resources that go into more detail on this subject, I’d love to learn more about it. Your blog is a fantastic and fascinating resource btw, some really amazing and inspiring stuff, thank you for sharing! Thanks for the reply!
- Stefan Reinalter on May 22, 2014 at 3:29 pm said:
  
  Thanks for the kind words!
  Drop me a line at “firstname dot lastname at molecular dash matters dot com”, and I’m sure I can provide you with references & resources, depending on what you need.

Hi Stefan,
your posts are always inspiring and I’m learning a lot from you! That said I have a doubt about how you free the physical memory. I’m new to virtual memory mechanism, and as you say, the allocation/deallocation of the physical memory works on a page-size granularity (or at least a multiple of that size e.g. windows has a specific allocationGranularity that is bigger than the page-size), but what I don’t grasp is the code:

char* addressToFree = pointerUtil::AlignTop(m_physicalCurrent, m_growSize);

Here I was expecting a “pageSize” parameter instead of m_growSize, in my mind I would give to the freePhysicalMemory function an address simply align to the page size, that should belong to a next page allowing, for example, VirtualFree on Windows to deallocate that page and all other pages containing an address in the range “addr + sizeToFree”. Surely I’m wrong :P, could you explain me your choice and the why?

-DC

Reply ↓

Stefan Reinalter on May 7, 2015 at 5:52 pm said:

Hi Daniele,

Yes, you could pass pageSize instead of m_growSize, and that would still work, doing exactly what you described. The reason I use m_growSize is simply that I only want the allocator to grow and shrink in m_growSize increments, so the user can control how much physical memory is allocated/purged without having to know the platform’s page size.

Reply ↓
- Daniele on May 7, 2015 at 6:00 pm said:
  
  thanks for the clarification!

Hi Stefan,

Very interesting post, it made virtual memory much easier to understand for me!
Am I correct to assume that “growSize” must strictly be a multiple of the OS page size?

I’m asking because, while reading about “VirtualFree”, I found the following: “a region of memory that straddles a page boundary causes both pages to be decommitted”.

Let’s assume the following situation: “addressToFree”, rounded up to a multiple of (an arbitrarily chosen, but big enough) “growSize”, ends up close to the end of a page, and so the address range spans two pages – therefore “VirtualFree” would decommit both, even though we’re still using memory from the first one. Am I missing something? Or does your “FreePhysicalMemory” function do some additional corrections perhaps?

Thanks!

Reply ↓

Stefan Reinalter on June 9, 2015 at 12:00 pm said:

Hi Adam,

Am I correct to assume that “growSize” must strictly be a multiple of the OS page size?

In my implementation, yes.

Let’s assume the following situation: “addressToFree”, rounded up to a multiple of (an arbitrarily chosen, but big enough) “growSize”, ends up close to the end of a page, and so the address range spans two pages – therefore “VirtualFree” would decommit both, even though we’re still using memory from the first one. Am I missing something? Or does your “FreePhysicalMemory” function do some additional corrections perhaps?

No, FreePhysicalMemory doesn’t do any additional things, because that situation can never happen as long as growSize is a multiple of the OS page size. The allocator purges memory in growSize-sized chunks, so regions of memory that are freed can never straddle a page boundary.

Reply ↓
- Adam Bodorkos on June 9, 2015 at 12:22 pm said:
  
  Thank you for the quick answer!

Pavel Shevaev on February 17, 2013 at 12:59 pm said:

Very interesting, thanks for sharing. I’m just wondering how custom allocators can be helpful for resources which are stored in the memory managed by external systems, e.g. textures which are uploaded into GPU’s memory. Once loaded into the temporary memory allocated with our custom allocator the texture is then transferred to the GPU memory and we must deallocate this temporary memory in order to avoid memory duplication… I guess, in this case custom allocator only allows us to quickly allocate this temporary memory but there still can be performance hit when the external system allocates its own memory and copies data.

Reply ↓
- Stefan Reinalter on February 18, 2013 at 12:46 pm said:
  
  I think that depends on the type of resource, and the platform and API you’re working with.
  
  There are certain resources where it’s better to duplicate memory so you don’t pay huge performance penalties. An example would be a dynamic vertex buffer which is generated on the CPU and uploaded to a GPU vertex buffer each frame (using a ring-buffer scheme for maximum performance).
  
  On the PC, there’s generally the GPU driver which will take care of managing memory for GPU resources for us, so there’s not much we can do. On consoles however, you have to allocate memory for every vertex buffer, index buffer, texture, etc. yourself. There you have to pay attention to the correct alignment so the GPU doesn’t crash (varies between GPUs/APIs), map memory in order for the GPU to see it, allocate memory with the “correct” properties depending on usage (e.g. write-combined memory), and much more. It’s more work, but ultimately you are in complete control. This allows you to do things not possible on the PC, e.g. aliasing the same eDRAM addresses for different things because you know you don’t need them at the same time during rendering. An example would be allocating memory for a shadow map, using that during the main render pass, then at a later point using the same memory for a completely different rendertarget.
  
  Custom allocators surely come in handy in such situations!
  
  Reply ↓
Tom on May 20, 2014 at 10:25 am said:

Really interesting post, thanks for sharing. I am slightly curious what the actual implementation details of virtualMemory::ReserveAddressSpace(maxSizeInBytes) are. Does this just involve a call to system new, or is something else involved? Thanks!

Reply ↓
- Stefan Reinalter on May 21, 2014 at 12:06 pm said:
  
  I am slightly curious what the actual implementation details of virtualMemory::ReserveAddressSpace(maxSizeInBytes) are. Does this just involve a call to system new, or is something else involved?
  
  No, a call to system new would be wrong, because that commits physical storage. We only want to reserve address space, without commiting memory to it (yet).
  I use VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS); for that.
  
  Reply ↓
  - Tom Hulton-Harrop on May 21, 2014 at 9:08 pm said:
    
    Ah awesome, thank you for clarifying! I don’t suppose you could recommend any references/resources that go into more detail on this subject, I’d love to learn more about it. Your blog is a fantastic and fascinating resource btw, some really amazing and inspiring stuff, thank you for sharing! Thanks for the reply!
  - Stefan Reinalter on May 22, 2014 at 3:29 pm said:
    
    Thanks for the kind words!
    Drop me a line at “firstname dot lastname at molecular dash matters dot com”, and I’m sure I can provide you with references & resources, depending on what you need.
Daniele on May 5, 2015 at 4:40 pm said:

Hi Stefan,
your posts are always inspiring and I’m learning a lot from you! That said I have a doubt about how you free the physical memory. I’m new to virtual memory mechanism, and as you say, the allocation/deallocation of the physical memory works on a page-size granularity (or at least a multiple of that size e.g. windows has a specific allocationGranularity that is bigger than the page-size), but what I don’t grasp is the code:

char* addressToFree = pointerUtil::AlignTop(m_physicalCurrent, m_growSize);

Here I was expecting a “pageSize” parameter instead of m_growSize, in my mind I would give to the freePhysicalMemory function an address simply align to the page size, that should belong to a next page allowing, for example, VirtualFree on Windows to deallocate that page and all other pages containing an address in the range “addr + sizeToFree”. Surely I’m wrong :P, could you explain me your choice and the why?

-DC

Reply ↓
- Stefan Reinalter on May 7, 2015 at 5:52 pm said:
  
  Hi Daniele,
  
  Yes, you could pass pageSize instead of m_growSize, and that would still work, doing exactly what you described. The reason I use m_growSize is simply that I only want the allocator to grow and shrink in m_growSize increments, so the user can control how much physical memory is allocated/purged without having to know the platform’s page size.
  
  Reply ↓
  - Daniele on May 7, 2015 at 6:00 pm said:
    
    thanks for the clarification!
Adam Bodorkos on June 9, 2015 at 11:43 am said:

Hi Stefan,

Very interesting post, it made virtual memory much easier to understand for me!
Am I correct to assume that “growSize” must strictly be a multiple of the OS page size?

I’m asking because, while reading about “VirtualFree”, I found the following: “a region of memory that straddles a page boundary causes both pages to be decommitted”.

Let’s assume the following situation: “addressToFree”, rounded up to a multiple of (an arbitrarily chosen, but big enough) “growSize”, ends up close to the end of a page, and so the address range spans two pages – therefore “VirtualFree” would decommit both, even though we’re still using memory from the first one. Am I missing something? Or does your “FreePhysicalMemory” function do some additional corrections perhaps?

Thanks!

Reply ↓
- Stefan Reinalter on June 9, 2015 at 12:00 pm said:
  
  Hi Adam,
  
  Am I correct to assume that “growSize” must strictly be a multiple of the OS page size?
  
  In my implementation, yes.
  
  Let’s assume the following situation: “addressToFree”, rounded up to a multiple of (an arbitrarily chosen, but big enough) “growSize”, ends up close to the end of a page, and so the address range spans two pages – therefore “VirtualFree” would decommit both, even though we’re still using memory from the first one. Am I missing something? Or does your “FreePhysicalMemory” function do some additional corrections perhaps?
  
  No, FreePhysicalMemory doesn’t do any additional things, because that situation can never happen as long as growSize is a multiple of the OS page size. The allocator purges memory in growSize-sized chunks, so regions of memory that are freed can never straddle a page boundary.
  
  Reply ↓
  - Adam Bodorkos on June 9, 2015 at 12:22 pm said:
    
    Thank you for the quick answer!

Molecular Musings

Development blog of the Molecule Engine

Memory allocation strategies: a growing stack-like (LIFO) allocator

An example

Allocating memory

Freeing memory

Conclusion

12 thoughts on “Memory allocation strategies: a growing stack-like (LIFO) allocator”

Leave a reply to Stefan Reinalter Cancel reply

An example

Allocating memory

Freeing memory

Conclusion

Share this:

Related

12 thoughts on “Memory allocation strategies: a growing stack-like (LIFO) allocator”

Leave a reply to Stefan Reinalter Cancel reply