Before we can delve into the inner workings of growing allocators, I would like to explain the concept of virtual memory and discuss what it is, why it is needed, and what we can use it for.
What is virtual memory?
Speaking in simple terms, virtual memory provides an extra indirection when accessing memory. It provides an abstraction of the virtual address space of a process, which lets each process think that it’s alone in the system. It lets us write programs without having to worry which other programs allocated memory, and without having to worry which physical memory we need to access. Even though virtual memory is often also mentioned in conjunction with paging to hard disk (=swapping), this is not what we are interested in!
Consider a system having 4 GB of RAM that consists of four physical 1 GB RAM units. We are able to allocate more than 1 GB of contiguous memory, even though there’s no actual physical memory unit that is larger than 1 GB. This works thanks to virtual memory.
Similarly, the allocations we make inside an application return virtual addresses which are valid inside our process. Such an address could be 0x80000000, and another process can also have allocations residing at 0x80000000, and yet everything works thanks to virtual memory.
Address translation
Before touching actual physical memory, the virtual address needs to be translated. This virtual address to physical address translation is being taken care of by the MMU of the CPU. Modern CPUs also have a TLB which is used to speed up this translation.
Traditionally, this translation is done on a page-by-page basis. This means that on the OS-level, memory can only be allocated in so-called pages of a certain size. As an example, Windows 7 has a default page-size of 4 KB. Consequently, this also means that whenever you allocate memory directly from the OS, you can only allocate it with page-size granularity.
The details of address translation are actually quite involved, and here is a very good post describing this process for x86 and Cell architectures: Memory Address Translation.
Allocating pages
As an example, allocating just 10 bytes of memory using VirtualAlloc (the low-level allocation function on Windows) will allocate a whole page, that is 4096 bytes. You can access all of the 4096 bytes without triggering an access violation, eventhough you only requested 10 bytes.
Of course, page sizes differ across platforms (consoles). Some platforms even offer more than just one page-size. The reason for this is that because the TLB normally is of limited size, increasing the page-size can lead to fewer TLB misses (similar to cache misses), resulting in an increase in performance. However, larger pages can also lead to more wasted memory if you’re not careful. Thus, it’s a typical space/time-tradeoff.
Normally, the OS memory allocator (e.g. malloc/free) takes care of allocating pages, coalescing nearby allocations into contiguous regions, putting several small allocations on the same page, etc. However, as soon as we want to implement our own general-purpose allocator or any other custom allocation scheme, we need to be aware of such details, and cannot use malloc/free for our purposes.
Furthermore, knowing about such low-level details enables us to use a wealth of new debugging techniques like using protected pages, guard pages, etc. As an example, some pages could be marked read-only in order to find memory stomps, race conditions (writes on shared data), and more. Guard pages serve as a one-shot alarm for memory page access, and are e.g. used for growing an application’s stack. Applications like PageHeap use those features for finding memory accesses beyond an allocation’s boundary.
Use cases
MMU, TLB, pages, address translation, memory protection… that all sounds wonderful, but what can we do with it?
Because the OS clearly distinguishes between reserving address space (see MEM_RESERVE for the VirtualAlloc function) and allocating physical memory for address space (see MEM_COMMIT), we can build allocators that can grow to a specified upper limit, but only allocate the memory they actually need.
This is very, very useful when implementing growing allocators, because we can reserve a contiguous region of memory (=virtual address space), but only commit physical memory to it whenever we need it.
Virtual memory addressing is supported by all common desktop OSs (Windows, Linux, Mac), almost all consoles (cannot go into details because of NDAs) and even on mobiles like the iPhone. How different growing allocators can be built using virtual memory on those platforms will be the topic of the next posts!
Do you have any plans of a post there you describe how you imlement virtual memory in your engine?
It’s quite simple, really.
The engine provides a namespace containing free functions for reserving/releasing address space, allocating/freeing physical memory, getting the page size, etc.
On Windows, the implementations of these functions use VirtualAlloc and VirtualFree with different parameters, and that’s basically all there is to it.
All allocators which make use of virtual memory (namely all growing allocators) simply call those functions.
If you want to have a bit more detailed information, please feel free to check out the documentation contained in the evaluation SDK.
Thanks for your quick answer. I will check out the SDK.
I’m a novice on this so I have one question.
How do a growing allocator handle the scenario then it need to get more virtual memory (new pages not contiguous with the old pages), and how does it know from whitch page to deallocate from?
To be a litle bit clearer on what I mean. The scenario is that the allocator don’t have enougth free memory in the current virtual pages.
If the OS offers a virtual memory system (like Windows does), a growing allocator can reserve address space up-front, and request physical memory whenever it needs to grow.
As an example, a growing stack-based allocator could reserve e.g. 256 MB of address space, and request physical memory that lies in that address space whenever it needs to grow. This ensures that the allocations are always contiguous without having to allocate a large amount of memory up-front. If the target platform doesn’t offer virtual memory, a growing allocator can either allocate memory for the worst-case up-front and be done with it, or make new allocations on-the-fly. In the latter case, allocated pages are mostly stored in a doubly linked-list, ensuring that adding pages to and removing pages from the list is a cheap O(1) operation. When the pages need to be freed, just walk the linked list and free the respective pages.
Check out VirtualAlloc, MEM_COMMIT and MEM_RESERVE to try for yourself how these things work and behave.
Pingback: Memory allocation strategies: a growing stack-like (LIFO) allocator | Molecular Musings