Memory system – Part 2

In the last installment of the memory system series, we covered how new, delete, and their variants work. This time, we’re going to build our own little toolset which allows us to create new instances (or arrays of instances) of any type using custom allocator classes in a standards-compliant way. Be prepared for some function templates, type-based dispatching, template magic, and nifty macros.

What I essentially wanted for Molecule’s memory system was to create new instances using the following syntax (not legal C++):

Arena arena; // one of many memory arenas
// ...
Test* test = new (arena, additionalInfo) Test(0, 1, 2);
delete (test, arena); // no placement-syntax of delete
// ...
Test* test = new (arena, additionalInfo) Test[10];
delete[] (test, arena); // no placement-syntax of delete[]

We could get the new operator to work, but the above is not legal in C++ because the delete operator has no placement syntax, that is it only accepts one single argument. We could do something like the above by directly calling operator delete, but then we run into problems with operator delete[] like discussed last time, because we can’t call the destructors in a compiler-independent, portable way.

Additionally, we want to pass some extra information like file name, line number, class name, memory tag, etc. to the memory arena (which in turn takes care of allocating memory), so a macro-based solution it is. Still, we want to keep as much as possible of the original C++ syntax, so the solution we’re aiming for looks like the following piece of code:

Test* test = ME_NEW(Test, arena)(0, 1, 2);
ME_DELETE(test, arena);
// ...
Test* test = ME_NEW_ARRAY(Test[3], arena);
ME_DELETE_ARRAY(test, arena);

Let’s deal with implementing the macros and the underlying functions one by one, starting with the least complex ones.

ME_NEW

Like the ordinary new operator, ME_NEW first needs to allocate memory for the given type, and then call its constructor at the correct location. Easily enough, this can be done with just one line, using placement new:

#define ME_NEW(type, arena)    new (arena.Allocate(sizeof(type), __FILE__, __LINE__)) type

What we do is using placement new with the memory address returned to us by a call to arena.Allocate(), while using the preprocessor symbols __FILE__ and __LINE__ to feed the arena the file name and line number where the allocation was made. Additionally, “type” is added at the end so you can still provide constructor arguments after the macro call, like this:

// Test is a class taking 3 ints in the constructor
Test* test = ME_NEW(Test, arena)(0, 1, 2);

// the preprocessor expands the above into:
Test* test = new (arena.Allocate(sizeof(Test), "test.cpp", 123)) Test(0, 1, 2);

Using ME_NEW, we can provide a memory arena which is responsible for allocating memory, pass additional information to it, and still have syntax which resembles the original new operator syntax. So let’s move on to the next macro.

ME_DELETE

Every instance created using ME_NEW needs to be deleted with a call to ME_DELETE. Keep in mind that there is no placement-form of the delete operator, so we must either use operator delete, or something completely different – otherwise we cannot pass the additional arena argument around. Either way, we must make sure to call the destructor for the given instance. We can do that by deferring the actual deletion of the object to a helper function:

#define ME_DELETE(object, arena)    Delete(object, arena)

Our helper function is implemented using a simple function template:

template <typename T, class ARENA>
void Delete(T* object, ARENA& arena)
{
    // call the destructor first...
    object->~T();

    // ...and free the associated memory
    arena.Free(object);
}

The compiler can deduce both template arguments for us, so we don’t need to explicitly specify any of the template arguments. So far, so good. Off to the next macro.

ME_NEW_ARRAY

Things start getting more complex here, so keep reading. What we first need is a helper function which allocates memory for N instances of a certain type, and properly constructs them using placement new. Because it has to work for generic types and arenas, we again use a function template:

template <typename T, class ARENA>
T* NewArray(ARENA& arena, size_t N, const char* file, int line)
{
  union
  {
    void* as_void;
    size_t* as_size_t;
    T* as_T;
  };

  as_void = arena.Allocate(sizeof(T)*N + sizeof(size_t), file, line);

  // store number of instances in first size_t bytes
  *as_size_t++ = N;

  // construct instances using placement new
  const T* const onePastLast = as_T + N;
  while (as_T < onePastLast)
    new (as_T++) T;

  // hand user the pointer to the first instance
  return (as_T - N);
}

What the code does should be clear from the comments. One thing to note is that we allocate memory for N instances of type T, and an additional sizeof(size_t) bytes for storing the number of instances. That is, our memory layout for allocating 3 instances of type T would look like this (assuming sizeof(size_t) == 4, and sizeof(T) == 4):

Bytes 0-3: N
Bytes 4-7: T[0]
Bytes 8-11: T[1]
Bytes 12-15: T[2]

The pointer returned to the user is offset by sizeof(size_t) bytes, so it correctly points to the first instance. If we were to call this function directly in C++ code, it would look like the following:

Test* t = NewArray<Test>(arena, 3, __FILE__, __LINE__);

There’s a small problem here – because the type T is not used as a function argument anywhere (but only as return type), the compiler cannot deduce its type. Therefore, we have to specify it directly in the function call, but we don’t know the type in the ME_NEW_ARRAY macro. And neither do we know the number of instances from the macro argument we get. Recall what the macro-call should look like:

Test* test = ME_NEW_ARRAY(Test[3], arena);

We really want to have our cake, and keep this syntax – so what do we do? Let’s try to write the macro, and fill in the missing parts:

#define ME_NEW_ARRAY(type, arena)    NewArray<?>(arena, ?, __FILE__, __LINE__)

What we now need is some kind of helper function which is able to deduce both the type and count from a single argument. Let’s conjure up some template magic which does exactly that! Observe:

template <class T>
struct TypeAndCount
{
};

template <class T, size_t N>
struct TypeAndCount<T[N]>
{
  typedef T Type;
  static const size_t Count = N;
};

The base template TypeAndCount simply takes a single template argument, and is completely empty. But by providing a partial template specialization for types taking the form T[N], we can define both the type and extract N at compile-time. If you’re not used to compile-time tricks like this, let me show you how it can be used by filling in the missing parts in our macro:

#define ME_NEW_ARRAY(type, arena)    NewArray<TypeAndCount<type>::Type>(arena, TypeAndCount<type>::Count, __FILE__, __LINE__)

See what we did here? Let’s look at the different steps involved by looking at the corresponding source for a call to ME_NEW_ARRAY(Test[3], arena).

First, it’s the preprocessor’s turn:

  • The macro part TypeAndCount<type>::Type will expand to TypeAndCount<Test[3]>::Type.
  • The macro part TypeAndCount<type>::Count will expand to
    TypeAndCount<Test[3]>::Count.

After that, it’s the compiler’s turn:

  • The partial template specialization for TypeAndCount<Test[3]>::Type will yield Test.
  • The partial template specialization for TypeAndCount<Test[3]>::Count will yield 3.

Hence, we simply plug in those two values into our ME_NEW_ARRAY macro, and don’t have to supply any other (redundant) arguments in the macro.

ME_DELETE_ARRAY

Ah, the last of the lot. Again, we need to come up with a helper function – one which first calls the destructors in reverse order (as per the C++ standard), and then frees the associated memory. Without further ado, here’s the implementation:

template <typename T, class ARENA>
void DeleteArray(T* ptr, ARENA& arena, NonPODType)
{
  union
  {
    size_t* as_size_t;
    T* as_T;
  };

  // user pointer points to first instance...
  as_T = ptr;

  // ...so go back size_t bytes and grab number of instances
  const size_t N = as_size_t[-1];

  // call instances' destructor in reverse order
  for (size_t i=N; i>0; --i)
    as_T[i-1].~T();

  arena.Free(as_size_t-1);
}

The implementation should be clear by reading the comments. And the macro is not too hard either, because we don’t need to explicitly specify the function template arguments – those can be deduced by the compiler. Hence, the macro is really simple:

#define ME_DELETE_ARRAY(object, arena)    DeleteArray(object, arena)

And that’s about it!

Basically, we’re finished now – we achieved our initial goals, and the implementation works for both POD and non-POD types. Still, there is something we can optimize: Remember that we don’t need to call constructors/destructors for POD-types, so we would like to optimize our NewArray and DeleteArray function templates in this regard.

This can be achieved via type-based dispatching in conjunction with template type-traits, but the explanation and implementation have to wait until the next installment in the series, because this post already got longer than expected.

8 thoughts on “Memory system – Part 2

    • Good question!

      The reasons I chose not to overload operator new/delete by default are twofold: Firstly, I don’t want to force my overloads upon my clients. They might be using other 3rd party software/middleware which overloads operator new/delete already (other middleware certainly does), which in turn causes them trouble.
      Secondly, not everyone of my clients will have access to a source-code license, so they cannot change operator new/delete’s behaviour at all.

      That being said, clients can of course still overload operator new/delete on their own if they wish, and can do whatever they want. They can use a general-purpose arena, they can use their own memory system they’re more comfortable with, etc.

  1. Pingback: Memory system – Part 3 | Molecular Musings

  2. Pingback: Memory allocation strategies: a linear allocator | Molecular Musings

  3. Nice article 🙂

    The type and count deduce works obviously only at compile time so I’m assuming you have another macro with the explicit N included as a third argument for run-time determined sizes?

    • Yes, you could have another macro for that.
      In the current version, I ditched the auto-deduce-version in favor of one macro with an explicit size argument. I noticed that I rarely need arrays anyway, because most of the time I either use a container or a special allocator (like a pool).

  4. Nice article. But it seems that NewArray doesn’t handle alignment correctly. If the alignment of T is stricter than size_t, the returned pointer would not be aligned for T. On x86, unaligned access is not a problem anyway.

Leave a reply to ashen Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.