Adventures in data-oriented design – Part 3b: Internal References

Posted on May 17, 2013 by Stefan Reinalter

As promised in the last blog post, today we are going to take a look at how Molecule handles internal references to data owned by some other system in the engine.

First, a quick recap of reasons why we don’t want to use pointers for referencing data is in order:

With raw pointers, ownership is sometimes unclear. If I’m handed a pointer, do I need to delete the instance? Who owns it? How long can I hold on to it?
This very quickly leads to double-deletes and/or dangling pointers. Both are kinds of bugs which can be hard to find if you’re unlucky.
The above can be somewhat alleviated by using a shared_ptr<> or some reference-counting mechanism, but now we have added additional overhead which isn’t really necessary. Ownership is still unclear – when is the data behind the reference-counted pointer actually freed? Who else holds on to it?
How are pointers replicated or copied, e.g. across the network? You always have to have some kind of serialization mechanism in place because you cannot just send pointers across the network – the address they contain won’t make sense in a different address space.
Pointers don’t support relocation. Ultimately, the system who owns the data should also be responsible for managing the data’s memory. Therefore, a system might want to move things around in memory, e.g. for run-time defragmentation. Notifying each and every instance that might hold a pointer to system-internal data is tedious and error-prone.

So, let us now take a closer look at how we can store internal references without running into the above-mentioned problems.

Handles

In the Molecule Engine, handles are used to refer to internal data. That is, they refer to data owned by some system directly, and not via some indirection. This is also the reason why they are called internal references.

What are handles? Basically, they are indices into the data, but with a twist. One can think of handles as „smart indices“. But before going into detail about handles, which problems do plain indices already solve?

You cannot accidentally call delete or free() on an index. Furthermore, if a system only deals with indices as input and output parameters, it should be clear that the system also owns the data.
Indices can be easily replicated and copied. They also support data relocation out-of-the-box: if we want to access the data at e.g. index 3, it doesn’t matter where the data itself resides, as long as it remains in the same order. It can reside at address 0xA000 or 0xB000 or someplace else – data[3] will give us the data we want.

Of course, there are things which are not supported by plain indices:

We cannot detect access to stale/deleted data. We might try to access the data at index 3, but that might have been freed already since our last access.
Whole data blocks can be moved around in memory, but the order of individual data items cannot be changed, because that would mess up our indices.

Handles help us with the first problem, but also don’t support arbitrary relocation of individual data items. This is what IDs or external references are for, but those will be the topic of the next post.

The question remains: how do we turn indices into handles that can detect access to already deleted data?

The idea is quite simple: instead of only using an index, a handle also stores the generation in which the index was created. The generation is simply a monotonically increasing counter that gets incremented each time a data item is deleted. The generation is stored both inside the handle, and for each data item. Whenever we want to access data using a handle, the index’ generation and the data item’s generation need to match.

An example

Going back to our example from the last post, let us assume our render backend provides space for 4k vertex buffers. New vertex buffers are allocated using a pool-allocator/free-list internally, and users only deal with a VertexBufferHandle.

Initially, our pool of vertex buffers is empty, and all generations are set to zero.

4096 Vertex buffers:
+----+----+----+----+----+----+
| VB | VB | VB | .. | VB | VB |
+----+----+----+----+----+----+

4096 Generations:
+----+----+----+----+----+----+
| 0  | 0  | 0  | .. | 0  | 0  |
+----+----+----+----+----+----+

The first time we allocate a vertex buffer, the handle will contain an index of 0, and a generation of 0. Future vertex buffer handles will have a different index, and also a generation of 0.
Assume we now destroy the first vertex buffer we allocated. The generation of the slot that contained it will increment, yielding the following layout:

+----+----+----+----+----+----+
| VB | VB | VB | .. | VB | VB |
+----+----+----+----+----+----+
+----+----+----+----+----+----+
| 1  | 0  | 0  | .. | 0  | 0  |
+----+----+----+----+----+----+

If we now want to access that vertex buffer using the handle, we check its generation against the one stored with our vertex buffer, and find that they don’t match – meaning that we tried to access already deleted data.

In code, this situation looks somewhat like the following:

VertexBufferHandle handle = backend::CreateVertexBuffer(...);

// some more vertex buffers created in the meantime

// at a later point in time, we destroy the vertex buffer...
backend::DestroyVertexBuffer(handle);
// ...but somebody, somewhere, still holds the same handle

backend::AccessVertexBuffer(handle);

Handle implementations

One thing we haven’t talked about yet is how handles can be implemented. As almost always, the simplest solutions are the best, so a trivial struct will suffice in this case:

struct Handle
{
  uint32_t index;
  uint32_t generation;
};

In practice, you normally would not use two 32-bit integers for both the index and the generation, but rather use bitfields instead. In the case of our vertex buffer handles, we need 12 bits for storing indices in the range [0, 4095], which leaves 20 bits for the generation if we want our handles to be 32-bit integers. Hence, our handles would look more like the following:

struct Handle
{
  uint32_t index : 12;
  uint32_t generation : 20;
};

This means that the generation overflows after 1048576 vertex buffers have been deleted in the same slot in our pool. Theoretically, this means that we could wrongly access a vertex buffer via an old handle that was generated more than 1048576 vertex buffer create/delete cycles ago, in that very slot. In practice this should never happen, unless we store an old handle for ages, create/delete buffers like crazy, and never access the buffer using that handle in the meantime.
Yet, depending on the number of bits you are willing to spend this can happen, so it is something to keep in mind.

Last but not least, another nice thing about handles that I mentioned in the previous blog post is that they use less memory than a pointer. Most handles can store their index and generation in a single 32-bit integer, which means they need half the amount of memory compared to 64-bit pointers. Additionally, we really only need to store the generation inside a handle for detecting access to stale data. We should not need that in a retail build, hence handles can be as small as 16-bit integers in those builds, if your indices only need to be in the range [0, 65535].

A generic implementation

In Molecule, I use a generic handle implementation which defines the underlying data types according to certain build rules, and also static_asserts whether the bits fit into that type. The basic struct is as follows:

template <size_t N1, size_t N2>
struct GenericHandle
{
  // uint16_t or uint32_t, depending on build type, realized using preprocessor-#ifs
  uint32_t index : N1;
  uint32_t generation : N2;
};

All handle types then become simple typedefs, e.g.:

typedef GenericHandle<12, 20> VertexBufferHandle;

And that concludes today’s post! In the next part in the series, we will discuss how external references also allow for moving individual data items around in memory, without user code having to care about that.

20 thoughts on “Adventures in data-oriented design – Part 3b: Internal References”

Vas on May 28, 2013 at 1:48 pm said:

Nice series of articles!
I was wondering if it’s worth mentioning that the handle system will incur an additional indirection compared to a raw pointer. Overall the locality of the referenced data, reduced storage compared to a full pointer, etc might be a win, but it’s still worth mentioning!

Reply ↓
- Stefan Reinalter on May 28, 2013 at 2:08 pm said:
  
  Yes, it is something worth mentioning, good point. Keep in mind that it is not really another indirection, though.
  
  Handles as discussed in this post are basically nothing more than an index into an array, so instead of doing this:
  
  instance->DoSomething();
  
  you do:
  
  instances[handle].DoSomething();
  
  So instead of fetching from memory directly, you read from memory + offset, which is the same as long as the offset is already in a register. Of course, in debug builds (or basically any build that also checks whether the generation is valid) there is some more overhead.
  
  The benefit of being able to relocate whole data blocks far outweighs this minor overhead, I would say. But as always, this also depends on what you use those handles for.
  
  IDs add a real indirection via an additional array, and hence might cause additional cache misses. But like handles, their advantages far outweigh the disadvantages. IDs will be the topic of the next post.
  
  Reply ↓
  - Vas on May 28, 2013 at 2:30 pm said:
    
    Presumably you still have to pay the cost of fetching instances[] in the first place…but maybe because all objects will reference the same base pointer then it’s already in a register?
  - Stefan Reinalter on May 28, 2013 at 2:53 pm said:
    
    Internal data accesses use a pointer for walking through the array anyway, so there’s no additional overhead.
    Accesses from outside will probably have to fetch instances[] again every now and then, but accesses from outside should be more seldom than internal accesses. That’s the rationale behind it, and the same is also true for IDs: minor overhead upon outside access, major benefits for internal access.
    
    In a component-based engine which uses different systems for bulk data-updates of components, internal accesses are touching all skinned meshes, animations, particles, etc.
    Outside accesses happen when e.g. a game object wants to access a certain component, and those should be far less than data being touched by individual systems.
Pingback: Adventures in data-oriented design – Part 3c: External References | Molecular Musings
Pingback: Is a forced integer buffer overflow legitime? | c# - TrauBo.Net
Pingback: Using runtime-compiled C++ code as a scripting language: under the hood | Molecular Musings
Marc Costa on August 7, 2014 at 1:33 pm said:

Hi Stefan,

I’ve been using your system in my own code for a while and I really like it! Thanks for your posts on game engine topics!

I have a Handle type like the one described in this post, but I was wondering if it is a problem than handles for different types only change in name, and not in type:

typedef GenericHandle VertexBufferHandle;
typedef GenericHandle IndexBufferHandle;

These two handles can be passed to the functions that take the other one with no problems.

I was thinking of using an extra struct for type identification, like this:

template
struct GenericHandle {…};

struct VertexBufferHandleTag {};
typedef GenericHandle VertexBufferHandle;

It could also be done with an enum, but then all possible types should be declared in a single enum across subsystems, which may not be ideal.

What do you think? Do you use something like this or do you stick with the “simpler” handle?

Thanks!
Marc

Reply ↓
- Stefan Reinalter on August 8, 2014 at 6:58 pm said:
  
  Hi Marc,
  
  Good point, I also use a struct for handles that the user needs to deal with.
  For internal-only handles, I still use an ordinary typedef.
  
  Reply ↓
heretique on October 9, 2015 at 10:02 am said:

Hey quick question:
Do you store the generation as an additional array, presumably with a smaller storage type (word , byte)?

Thanks,
Catalin

Reply ↓
- Stefan Reinalter on October 11, 2015 at 10:24 pm said:
  
  Yes, generations are stored in a separate array because they are only needed for debug checks anyway. It’s best to allocate them from development-only memory, and not include them in e.g. a master build.
  
  Reply ↓
brunohgouveia on November 18, 2015 at 12:51 am said:

Hi Stefan, great post!

I have a question: how do you handle non-POD objects such as scripts? I mean, in the ScriptSystem you will need an array of Script objects, which will probably have virtual methods or at least different sizes (due to different internal attributes). I can’t think in a solution but add an extra indirection, which is keep an array of Script pointers (and the objects won’t no longer be continuous in the memory).

Reply ↓
- Stefan Reinalter on November 18, 2015 at 11:05 am said:
  
  True, for things like scripts you need to have an extra indirection.
  But in this case, the extra indirection is not what matters, performance-wise. The script code, the fact that most gameplay-code can’t be run in parallel, etc. is what eats at your performance.
  
  Furthermore, you can still make sure that the scripts themselves are contiguous in memory by allocating them with a suitable allocator. As an example, I use a linear/stack-based allocator for all resources being loaded, and resources are sorted inside their resource bundles. This means that in memory, all script code (also meshes, textures, etc.) will be right next to each other, with pointers to scripts stored elsewhere.
  
  Reply ↓
  - brunohgouveia on November 18, 2015 at 3:09 pm said:
    
    So, do you have a single linear/stack allocator for all kinds of resources? And “resources are sorted inside their resource bundles” do you mean that your memory layout would be something like {texture1, texture2,texture3,….,mesh1,mesh1,mesh1,…,etc.}? If so… you can’t load resources on the fly.
  - Stefan Reinalter on November 18, 2015 at 3:59 pm said:
    
    Yes, I use the same allocator for all resources in a bundle. Why would it not be possible to load resources on the fly then?
    
    In Molecule, you are free to load as many different resource bundles as you like, and use different allocators for them (if you want). Hot-swapping of resources uses a different allocator for resources that have been swapped, because it is a development-only feature anyway. The engine internally takes care of that for you.
    
    Besides, there are other tricks you could use to pull off a contiguous memory layout for data of the same kind, e.g. using 64-bit address spaces or on-the-fly defragmentation. Those are on my list of features for Molecule 2.0 :).
brunohgouveia on November 18, 2015 at 5:42 pm said:

“In Molecule, you are free to load as many different resource bundles as you like, and use different allocators for them (if you want). Hot-swapping of resources uses a different allocator for resources that have been swapped, because it is a development-only feature anyway. The engine internally takes care of that for you.”

So in the handle objects you must save an extra information (the bundle which the handle belongs to)?

What I meant by not be able to load resources on the fly is that you must know the number of objects of each resource to properly compute the offset of the first object of each kind of resource (e.g. you must know the number of textures in that bundle in order to know where is the address of the first mesh object). Of course, as you said, you can work around this with on-the-fly defragmentation, which is an expensive operation, but hopefully this should not happen very often (I would love to see a post about that).

Reply ↓
- Stefan Reinalter on November 23, 2015 at 10:22 am said:
  
  So in the handle objects you must save an extra information (the bundle which the handle belongs to)?
  
  No. I iterate over all resources of the same type to find the resource that is about to be replaced.
  
  that you must know the number of objects of each resource to properly compute the offset of the first object of each kind of resource (e.g. you must know the number of textures in that bundle in order to know where is the address of the first mesh object)
  
  I don’t think you need that bit of information.
  Accessing blocks of resources (e.g. all textures, all meshes) using an extra indirection (e.g. a pointer) is no big deal. From there, you just need to know the index (or ID) of a resource to access it.
  
  on-the-fly defragmentation, which is an expensive operation, but hopefully this should not happen very often
  
  In this case, it would only happen when hot-reloading. And it is not as expensive as it might seem, if you do it every frame. If I’m not mistaken, Naughty Dog does on-the-fly defragmentation for streaming assets in the Uncharted series. We did something similar for streaming assets in Cursed Mountain on the Wii.
  
  Reply ↓
Mathias on August 20, 2019 at 2:55 am said:

Hey,

How do you keep track of which handles/indices have been destroyed internally. for example if we have 3 handles with index 0, 1 and 2, index 0 gets destroyed, now the next allocation should use index 0 and not 3 how is that handled ?.

Originally i tried using a freelist inside the handles but this leads me to my object array being full of holes, which in turn makes iterating over it really akward / impossible.

Reply ↓
- Stefan Reinalter on August 20, 2019 at 12:01 pm said:
  
  You can use a freelist combined with the trick of filling holes by moving the last object into them. Niklas wrote about this data structure that he calls a “packed array” a while ago: http://bitsquid.blogspot.com/2011/09/managing-decoupling-part-4-id-lookup.html
  
  Reply ↓
- heretique on August 20, 2019 at 12:22 pm said:
  
  Also worth reading the new articles from Niklas with an updated take on this:
  https://ourmachinery.com/post/data-structures-part-1-bulk-data/ and
  https://ourmachinery.com/post/data-structures-part-2-indices/
  
  Reply ↓