Adventures in data-oriented design – Part 3a: Ownership

One thing I have noticed during the development of the Molecule Engine is that defining clear ownership of data tremendously helps with following a data-oriented design approach, and vice versa.

Defining ownership initially requires people to think harder about who owns data and who creates and destroys instances, but it pays off in terms of maintenance, performance, and debuggability. I would like to go back to one of my favourite examples, because it is easily understood by everyone: rendering a bunch of static meshes.

Mesh rendering

The example we are looking at is the following:

  • A level contains any number of static meshes. A static mesh consists of a vertex buffer, an index buffer, and a bunch of triangle groups describing the indices used by each group/submesh in the mesh. We call a struct/class holding that information a Mesh.
  • Each of these meshes can be rendered with a different shader and material, hence those are not part of the mesh, but are used by something called a MeshInstance, a MeshComponent, or similar. We need this distinction because the same mesh can be instantiated with different shaders and materials several times in a level.

If you do not care much about memory, performance, and keeping your co-workers sane, the easiest solution is to allocate each Mesh individually using new, and store a pointer to a Mesh inside the MeshInstance. Because several such instances can now refer to the same Mesh, you either hold a shared_ptr<Mesh> or similar, or build some other means of reference-counting into the Mesh class and store a plain old raw pointer. And while you’re at it, vertex buffers and index buffers are also reference-counted, because you can and it’s nice and OOP-esque. Problem solved.

Well, yes, and no.

There are several major disadvantages with using such an approach:

  • Ownership: Who owns instances of Mesh? Whenever a MeshInstance is deleted or goes out of scope, it simply decrements the reference-count of the referenced object. Can anybody tell when the actual object gets destroyed?
    The same is true for vertex buffers and index buffers. When are those deleted, exactly?
  • Performance: You’re handing out raw pointers, shared_ptr<>, or anything along those lines to users of your class. This makes it very hard (if not impossible) to move things around in memory and still have all pointers point to the correct object. Hence, most of your objects end up scattered all over the heap, causing tons of cache misses upon access, because subsystems cannot rearrange objects in memory as they see fit.
  • Debuggability: Raw pointers just scream “dangling pointer”. Oh, you correctly released a Mesh reference, the original object got destroyed, but you are still holding on to your MeshInstance? Well, the memory manager allocated a new instance at the exact same spot in memory in the meantime, and you’re now working with stale data without noticing. Guess it’s not your lucky day.

Of course we can do better.

A better solution

The first thing to think about is: who uses the data, who owns, creates and destroys it? Let’s look at vertex and index buffers first.

The only thing that should ever be doing API calls in a rendering engine is the render backend. In Molecule parlance, the rendering backend is just a namespace with tons of free functions that directly talk to D3D11, OpenGL, DX9, or other APIs. Porting the graphics module means porting the backend and additionally exposing platform-specific functionality, but that’s about it.

The backend is also responsible for queueing and sorting draw calls according to 64-bit keys, and hence is the only thing in the engine binding vertex buffers, setting render states, performing actual draw calls, etc.

Because the backend is the only thing touching that data, it would benefit the most if all low-level rendering-related data such as vertex and index buffers were as close together in memory as possible. Therefore, the backend itself should also be responsible for creating/destroying those buffers, taking ownership over them.

In addition, we do not want to return raw pointers to our internals, because we want to be able to track accesses to stale data – no more dangling pointers! Furthermore, by giving the user a simple identifier such as an integer instead of a pointer, the question of “Do I own this? Do I need to free this?” never actually arises.

Simplicity trumps everything

So what is the simplest solution for getting something that is as close together as possible in memory? An array.

This is what Molecule uses. The rendering backend simply holds an array of 4096 vertex and 4096 index buffers. Of course those numbers are configurable, but do you ever need more than 4k vertex buffers in flight at the same time? If so, you have a worst-case scenario of at least 4k distinct draw calls in a single frame, which is unreasonable anyway (in terms of performance).

Instead of a pointer, you can now simply return a 16-bit integer that can be used to uniquely identify vertex and index buffers in your array – it is nothing more than an index into the array. Not only is the question of ownership no longer a question (you cannot delete or free() a 16-bit integer, nor decrement its reference-count), you can also build in a mechanism for tracking whether a given integer refers to an existing object or not – this is what is often referred to as a handle. Depending on the maximum number of instances, a 16-bit int might suffice, or you can always go to 32-bit.
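
As a concrete sketch (the field widths here are my assumption; handle layouts are discussed further in Part 3b), such a handle could look like this:

// Sketch: a 16-bit handle split into an index and a generation counter.
// The index addresses a slot in the backend's array, the generation is
// bumped whenever a slot is reused, so stale handles can be detected.
struct VertexBufferHandle
{
  uint16_t index : 12;        // up to 4096 vertex buffers
  uint16_t generation : 4;    // detects accesses to stale/reused slots
};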

That being said, the interface for creating and destroying vertex and index buffers in Molecule looks like this:

namespace backend
{
  VertexBufferHandle CreateVertexBuffer(unsigned int vertexCount, unsigned int stride, const void* initialData);
  VertexBufferHandle CreateDynamicVertexBuffer(unsigned int vertexCount, unsigned int stride);
  void DestroyVertexBuffer(VertexBufferHandle handle);

  IndexBufferHandle CreateIndexBuffer(unsigned int indexCount, IndexBuffer::Format::Enum format, const void* initialData);
  void DestroyIndexBuffer(IndexBufferHandle handle);
}
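
Usage is then free of any ownership questions. A hypothetical caller (the Vertex type and the count are made up for illustration) might look like this:

// create the buffer; the backend owns the underlying memory and API object
VertexBufferHandle vb = backend::CreateVertexBuffer(vertexCount, sizeof(Vertex), vertices);

// ... draw calls reference the buffer by its handle only ...

// destruction is explicit; no reference counts involved
backend::DestroyVertexBuffer(vb);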

Referencing mesh data

Thinking about mesh data ownership, we can come up with a similarly simple solution for referencing/storing that data as well.

In the Molecule Engine, a thing called the render world holds all data which is tied to the graphics module, the main things being stuff that is pulled in from resource packages, such as meshes, skeletons, animations, particle systems, graphics-related components, etc.

Similar to the fixed-size vertex and index buffers that are being held by the render backend, the render world stores e.g. an array of all meshes contained in a resource package. Because all other rendering-related data is also owned by the render world, we can reference that by using handles as well.

This means that a Mesh now looks like this:

struct Mesh
{
  VertexBufferHandle m_vertexBuffer;
  IndexBufferHandle m_indexBuffer;
  TriangleGroupHandle m_triangleGroups;
  uint16_t m_triangleGroupCount;
};

No reference-counting, no shared_ptr<>, no raw pointers. A Mesh is trivially copyable, and can be moved around in memory by using memcpy(). How do you hold on to a Mesh? What does a MeshInstance look like?

It’s simple: you hold on to a Mesh by copying it. A MeshComponent just stores a copy of the Mesh, along with handles for shaders, materials, and so on.
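
As a sketch (the member names are my assumptions, not Molecule's actual layout), a MeshComponent could look like this:

// Sketch: a component holding on to a Mesh by value. Everything in here
// is plain data, so whole arrays of components can be relocated with
// memcpy() as well.
struct MeshComponent
{
  Mesh m_mesh;                 // a copy of the mesh, not a pointer to it
  ShaderHandle m_shader;
  MaterialHandle m_material;
};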

In practice, MeshComponents themselves are owned by the render system which is responsible for rendering them, but that is something for another blog post.

Conclusion

Let us quickly recap:

  • Vertex and index buffers are owned by the render backend. No raw pointers are handed out, only handles. Handles are an opaque data type, and the user should not (and does not) know how to interpret the given integer.
  • Mesh instances are owned by the render world. Meshes are referenced simply by copying them, because that gives you all the data you need in order to do something with it.
  • There are no reference-counting mechanisms, no raw pointers, and most importantly no dangling pointers. The system automatically identifies accesses to stale data. In addition, most handles occupy less memory than pointers, especially on 64-bit systems.
  • Mesh instances, MeshComponents, and many other components are merely data containers, and as such can be freely moved around in memory, without having to worry about ownership, construction/deletion, etc.

In the next posts, we will take a closer look at what Molecule uses for referencing data that is moved around in memory by subsystems responsible for updating/rendering it. One such system is the one responsible for rendering meshes, where it’s crucial that data is accessed in a cache-friendly fashion. Specifically, we will go into detail about internal references (= handles), and external references (= IDs).

Comments

  1. Great post again! Thanks for sharing! I have one question though 🙂
    Why do the skeleton and the animations live in the render world? I always wondered if there is a good way to keep them separate. What would be the minimal interface between the two?

    I am thinking of it from the physics perspective. During simulation there is a PreStep() and PostStep() phase. In the PreStep() phase you e.g. apply forces for explosions, but you also read transforms from the animation system to compute velocities for keyframed bodies. In the PostStep() phase I write back the transforms of the dynamic bodies to the skeleton. If the skeletons live in the render world this *seems* to create a coupling with the render system, but ideally the physics should only communicate with the animation system. I am just wondering if I am missing something.

    • Thanks Dirk!

      As of now, the data flow goes like this:
      1) Animations are sampled at their current clock, and according to options like looping, time-scale, etc. This is of course done for all joints, and the output of that stage is usually called the local pose, meaning that all joints have been assigned a transformation in their local coordinate space. During this stage, the animation doesn’t know anything about the skeleton.
      2) Using the joint hierarchy (which is owned by the skeleton), the local pose gets turned into the global pose. In this stage, the only data that is needed is the skeleton hierarchy, hence skeletons and animations don’t really know anything about each other. Of course the number of joints in both has to match, but that’s about it.
      3) After the global pose has been built, optional physics-related things like ragdolls, IK, etc. can be applied, but I haven’t touched any of that yet.
      4) Once the global pose is final, the matrix palette for skinning is built.

      So the interface between the animations and the skeletons is basically the local pose. It is written by the low-level animation system, and read by the skeleton when generating the global pose according to the joint hierarchy. Because the local pose is nothing more than an array of floats/SIMD vectors, the animation system and skeleton data know nothing about each other.

      Because I haven’t touched physics in Molecule yet, the skeleton and animations both live in the render world. But to be honest I don’t think that is going to change once I’ve added physics stuff :). Somebody has to own the data, and I don’t think there’s a clear winner in a situation like this. The same is true for e.g. storing transformations: think of a game object having both a MeshComponent and a RigidBodyComponent. Who owns the transformation? Where is it stored? How is it accessed? That one also had me thinking for a while :).

      What’s your take on that? Who owns the skeleton and animation data, and how do physics interact with that data?

  2. I am not sure what the best solution is. It seems that animations, IK, and physics drive bones in a skeleton. Usually these three things come into the game with the model, which also includes the meshes. The skeleton then outputs the matrix palette for the render system. I think all systems *reference* a skeleton, but I don’t see an obvious preferred owner. The skeleton should be owned by whoever owns the model, I guess.

  3. I recently started work on redesigning my personal library to be more DoD-friendly and this post was an excellent read. I’m curious about a couple of things:

    – As the DoD mantra seems to be “focus more on data, less on hierarchies”, do you still have a Renderer class with member functions, or do you just take the full ‘C-like’ road (i.e., a Renderer POD struct, passing a Renderer instance around free functions, etc.)? I understand DoD is not really about forbidding the use of classes per se, but the more I embrace this way of thinking, the more I see them (classes, in the traditional object-oriented way) as an unnecessary burden. I mean, once I’ve removed virtual calls and heterogeneous, polymorphic containers, and transitioned from AoS to SoA, what’s left? Do I still need them? You actually end up with cleaner code this way. One could argue you may be losing the benefits of encapsulation, but I haven’t found that to be a problem in practice. It certainly seems wrong _not_ to do it this way for performance-sensitive development now.

    – It would be interesting to know more about your Renderer-related interfaces. For example, how do you define your VertexBuffer and IndexBuffer types? Do they contain the actual vertex and index data themselves or do they point to a general array of vertex and index data via a handle? How do you ‘glue’ that data together with the Renderer code that prepares the render command buffer? You’ve hinted at some of the details in this post, but it would be nice to know more about it.

    Finally, thanks for the work on this whole series!

    • I still use classes for various things, but most of the bulk work is done by free functions in a namespace. Those are purely functional, have clear input & output parameters, never read or write global state, and basically do nothing more than a data transformation. An example would be the following function which builds the global poses from local poses:

      void BuildGlobalPoses(uint16_t jointCount, const uint16_t* ME_RESTRICT jointHierarchy,
        // local pose (input)
        const math::vector4_t* ME_RESTRICT lpTranslationData, const math::quaternion_t* ME_RESTRICT lpRotationData, const math::vector4_t* ME_RESTRICT lpScaleData,
        // global pose (output)
        math::vector4_t* ME_RESTRICT gpTranslationData, math::quaternion_t* ME_RESTRICT gpRotationData, math::vector4_t* ME_RESTRICT gpScaleData);

      This feels very C-like, and is the DOD equivalent of e.g. AnimatedMeshComponent::Update(), which would update the global pose for one component only. I still use classes that make use of these functions, though – mostly because all the functions expect their input & output parameters to be contiguous, which forces you to think about memory management and to make sure individual data chunks are allocated contiguously. This is what the classes take care of: they own the data and manage all the allocations for it, but the work is handed to a pure function which transforms the data. Having the data transformations done in separate functions also has the benefit that I can easily let BuildGlobalPoses() run in a task, which is automatically multi-threaded by Molecule’s task scheduler.
      Going from single-threaded to multi-threaded code is dead simple then, because you know that nothing needs to be synchronized, and you can easily stuff a free function into a task kernel. So as I said: I have classes, but those are mostly systems which own the data, and make use of data transformation functions.
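
      As a sketch (the member names are assumptions, not Molecule's actual code), such a system could look like this:

      // Sketch: a class owning contiguously allocated pose data, handing
      // the actual work to the pure BuildGlobalPoses() function declared
      // above, which can just as easily run inside a task.
      class SkeletonSystem
      {
      public:
        void Update(void)
        {
          BuildGlobalPoses(m_jointCount, m_jointHierarchy,
            m_lpTranslations, m_lpRotations, m_lpScales,
            m_gpTranslations, m_gpRotations, m_gpScales);
        }

      private:
        uint16_t m_jointCount;
        const uint16_t* m_jointHierarchy;
        // local pose (input), allocated contiguously by this class
        const math::vector4_t* m_lpTranslations;
        const math::quaternion_t* m_lpRotations;
        const math::vector4_t* m_lpScales;
        // global pose (output)
        math::vector4_t* m_gpTranslations;
        math::quaternion_t* m_gpRotations;
        math::vector4_t* m_gpScales;
      };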

      Regarding the renderer, the only thing that ever talks to the API is the render backend. The backend knows about the underlying types like VertexBuffer, IndexBuffer, etc., but all high-level code only deals with handles – and the backend is the only one that knows how to interpret those handles. The backend also prepares the command buffer, and can do so in a multi-threaded fashion by having one command buffer for each thread, which are then merged into one big command buffer before submitting sorted draw-calls to the graphics API.
      VertexBuffer and IndexBuffer make use of the graphics API, and e.g. hold ID3D11Buffer pointers.
      I’ll eventually discuss the renderer architecture in a future blog post – take a look at bgfx in the meantime, that gives you a good idea of how things could work.

  4. Pingback: Adventures in data-oriented design – Part 3b: Internal References | Molecular Musings

  5. You should definitely discuss how the renderer works in more detail, and how the high level talks to the backend.

  6. Nice article, looking forward to Part 3c with external references and how everything maps together (system to system). One example is bones in the animation system, which are transforms in the render system and in the physics system; but not all transforms are bones, so you have a “compacted” representation in the animation system that still maps to the correct indices in the render system and physics system 🙂

    Btw, you mention that those fixed vectors are configurable. Do you have some sort of init/deinit method for this backend namespace that receives a memory range (begin, end), which you use to allocate the fixed vectors depending on the config value? Or do you allow the backend to interact with your virtual memory system directly? I’m guessing you stick with the first method, since it allows the backend to work with whatever memory is given to it from a higher-level system? Any preferences here, and if so, why? 🙂

    Final question: do you also allow configuring the handle bitfield layout that you talked about in 3b using config values, instead of hardcoding it in the source? Like, what if you later want to switch to 8192 vertex buffers without having to recompile the engine? Do you see this as a possible problem? I’m thinking not, since people using Molecule will have access to the source 🙂

    Thanks again!

    • Lots of valid and interesting questions!

      Systems talking to other systems (like in the example you gave) is definitely interesting. The approach that I’ve been using so far is that each system “produces” data simply by allocating temporary memory for its output, writing all the data there, and returning a pointer to it from its Tick() method. All systems share a big chunk of contiguous memory which is re-used each frame, and allocating from that chunk is done by simply offsetting a pointer. Every system output that is an input to another system is produced this way.
      This approach also greatly reduces memory fragmentation, because you no longer have to deal with component parts having different sizes, e.g. one component playing animation A with 50 bones needs less memory than another component playing animation B with 80 bones. In such cases you cannot use a pool allocator, and have to resort to other approaches, which eventually always have some fragmentation. If you have to rebuild your output each frame anyway, why not grab a chunk of contiguous memory from the temporary allocator :)?
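
      A minimal sketch of such an allocator (the names are made up, and alignment handling is omitted):

      // Sketch: allocating is nothing more than offsetting a pointer into
      // one big chunk of memory; Reset() "frees" everything at once and is
      // called once per frame.
      class LinearAllocator
      {
      public:
        LinearAllocator(char* start, char* end)
          : m_start(start), m_current(start), m_end(end) {}

        void* Allocate(size_t size)
        {
          // assumption: alignment is handled by the caller
          if (m_current + size > m_end)
            return nullptr;

          char* userPtr = m_current;
          m_current += size;
          return userPtr;
        }

        void Reset(void)
        {
          m_current = m_start;
        }

      private:
        char* m_start;
        char* m_current;
        char* m_end;
      };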

      As an example, the animation system extracts root motion for all animation components (game objects are component-based), along with a matrix palette for skinning the model. The physics system is only interested in the root motion, the render system is interested in the matrix palette. So those systems can grab a pointer to the data produced by the animation system, and use that. Each system internally knows how the data needs to be interpreted. And to keep everything cache-friendly, data is output in a structure-of-arrays fashion (= one contiguous chunk for all root motion data, one contiguous chunk containing all matrix palettes).

      Regarding the configuration of backend memory, it’s actually even simpler: the array sizes are specified by constants in a configuration (header) file, and those constants are used to define static arrays. The arenas are then configured to use those static arrays as their memory. I try to come up with reasonable configurations for memory sizes and other things like handle bitfields, but one can always change that directly in the code, if desired.
      Whether something takes an allocator from the user, has a configurable size, or uses a buffer with a hardcoded size is decided on a case-by-case basis. For some things, leaving the decision to the user makes sense, in some cases it would only put additional burden on the programmer using it.
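
      In code, that boils down to something like this (the constant and array names are made up):

      // configuration header: array sizes are compile-time constants
      static const unsigned int MAX_VERTEX_BUFFERS = 4096;
      static const unsigned int MAX_INDEX_BUFFERS = 4096;

      // backend implementation: static arrays sized by the configuration,
      // handed to the arenas as their backing memory
      static VertexBuffer g_vertexBuffers[MAX_VERTEX_BUFFERS];
      static IndexBuffer g_indexBuffers[MAX_INDEX_BUFFERS];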

  7. How do you handle the position of unskinned (static) meshes? Do they have one root joint that defines their global position, or do you have a ‘TransformComponent’, or store a separate global location in your MeshComponent?

    Do your skinned meshes have a root joint that gets changed when you move a character, for example, or do you do it differently?

    • Each instance of a static mesh has a separate transformation, which is stored directly in a matrix rather than through a TransformComponent. Static meshes don’t have any joints.

      Skinned meshes do have a root joint, but that is not their position in world space. Each instance also has a separate transformation stored in a matrix. The position of the root joint is dictated by the animation, and movement is extracted each frame by using what is commonly known as “root motion extraction” or “root motion delta”. You need that for e.g. walk- and run-cycles, where the transformation of the root joint basically dictates how much the character should move between each animation frame. Because the root joint snaps back to the origin at the end of a looping animation, you need to extract the delta motion each frame in order to drive your character forward.
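
      A sketch of the per-frame extraction (this shows the general technique, not Molecule's code; the multiplication order depends on your math conventions):

      // Sketch: root motion extraction. The delta of the root joint's
      // transform between two animation frames drives the character's
      // world transform, while the joint itself stays near the origin.
      matrix4x4_t delta = currentRootTransform * Inverse(previousRootTransform);
      characterWorldTransform = characterWorldTransform * delta;
      previousRootTransform = currentRootTransform;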

      See the UE3 docs for more explanations.

  8. How do you handle a situation where you have e.g. material A and material B, both holding a handle to the same texture X (owned by the render backend), and you destroy material A? If destroying material A also destroys texture X, and you still need to use material B, it is left with an invalid handle, since texture X cannot be found any more.

    • No, that’s not how it works. The material does not have ownership over the texture, hence it will not destroy or attempt to delete it. It cannot even do that, because all the material holds is a texture handle.

      Instead, all textures are owned by the backend, and the resource manager knows about the lifetime of those textures. Upon loading a resource package, all the textures contained therein are created. When a package is unloaded, all textures will be destroyed.

      If anybody tries to access a texture via a handle after the corresponding package has been unloaded, an assert will fire. The same is true for all other resources (mesh data, skeletons, animations, etc.).

      Therefore, you don’t need any reference counting mechanism, and the point of resource creation/destruction is defined exactly (which is not true in reference counted scenarios).
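
      A sketch of what such an access check could look like (the slot layout, names, and assert macro are my assumptions):

      // Sketch: each slot stores a generation counter, bumped whenever the
      // resource in that slot is destroyed. A handle carrying an outdated
      // generation is caught the moment it is used.
      Texture* LookupTexture(TextureHandle handle)
      {
        Slot& slot = g_textureSlots[handle.index];
        ME_ASSERT(slot.generation == handle.generation, "Access to a stale texture handle!");
        return &slot.texture;
      }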

      • Ok, that’s clear now 🙂 How about the case when you have texture A in two different resource packages? It seems quite wasteful to have huge textures duplicated in memory.

      • Such a situation is handled by the resource manager already.
        If the same texture is contained in two (or more) resource packages, the resource manager will complain that a texture with the same name is already loaded, and refuse to load the second texture.
        If you happen to add a texture with the same contents to two resource packages with different names, there’s not much that can be done, except for one thing: in addition to the hashed name, the content pipeline could also generate a hash of the content of an asset and therefore detect such cases, but right now I would not lose much sleep over it. It clearly is the developer’s/artist’s responsibility to make sure such mistakes don’t happen, and I don’t know whether other engines like UE4/CryEngine go that far and do checks like that.

      • Hey hey,

        I was wondering how you handle the loading and unloading of resources. Is the user responsible for this? What determines when certain packages are needed or no longer needed?

        I am working on my own hobby project, but I have difficulties trying to decide who should own the data and how to make sure it’s there when needed and discarded when it’s no longer in use.

        Thanks for the many interesting blog posts! =)

      • The user is responsible for loading/unloading packages. This is usually done from script code.
        If something still refers to a resource that is no longer there, the engine will notice and tell the user.

      • Thanks for the answer.

        I was wondering, though: how do you handle data dependencies? For example, the graphics will need data from the animation system. Does the graphics side then have direct access to the animation side? Or how does this work?
        I would think that you would want to minimize the cross-referencing and keep systems as independent as possible. In the ‘shared pointer approach’ I would (maybe incorrectly) store the dependent components with a shared pointer. So in the example, the view would have a shared pointer to the animation component.

      • There is no cross-referencing of systems. Generally speaking, classes are only coupled by introducing higher-level classes which use some lower-level classes as members.

        As an example, here’s the rough outline of classes in the graphics module:
        – The module consists of several libraries, each responsible for loading the corresponding resources and managing their lifetime. The MeshLibrary is responsible for meshes, the AnimationLibrary is responsible for animations, and so on. The libraries know nothing about updating, rendering, etc.

        – The module also has several systems. Each system is responsible for updating or working on several components, and also manages the lifetime of those components. Again, the MeshSystem would create & destroy mesh components, and prepare them for rendering. The SkinnedMeshSystem is responsible for skinned mesh components, and also skins them during the update-phase. All systems work on contiguous data, and do that in a completely multi-threaded fashion.

        – The renderer is responsible for rendering the scene, but of course needs to know about meshes, skinned meshes, etc. However, the renderer never directly accesses any of the libraries or systems.

        – Instead, each module introduces a world, which ties all of the above together. In the graphics module, this is called a RenderWorld. The render world owns all the libraries, systems, and the renderer. During the render phase, the RenderWorld asks all the systems for their components, and hands them off to the renderer.

        – Other modules also introduce their own worlds, such as an EntityWorld (game module), a PhysicsWorld (in the physics module), etc. The overarching thing is simply called World, which stores all other worlds as members.

        With this approach, systems never know of each other. Libraries don’t know about other libraries. The renderer knows about renderable components only. The RenderWorld knows nothing about other worlds. And World::Update() & World::Render() can clearly dictate in which order the systems/worlds get updated & rendered, so I don’t have to rely on virtual Update() & Render() calls.

        I hope that makes sense!
        It’s all very straightforward from an implementation point-of-view: prefer composition to inheritance, use composition in higher-level classes to “couple” lower-level classes (never let them know of each other), be explicit about the order of function calls (don’t use fancy delegate/event dispatches), and store data contiguously in homogeneous rather than heterogeneous containers.

      • Thanks for the elaborate answer. I’m starting to see the picture I think. I will try to summarize to see if I got it right.

        There is an ‘Almighty World’ who owns the other ‘lesser’ Worlds and it dictates the order of updating of the Systems.

        Then the ‘lesser’ Worlds all own Libraries for data and Systems for algorithms (that do updating/rendering).

        Now none of the Libraries know about other Libraries. Systems know about Libraries? But definitely not about other Libraries. And Worlds don’t know about other Worlds. So that seems all nice and separated.

        The Systems get their data via input parameters given by the World, who gets the data from Libraries.

        There is just one thing I have trouble with: the animation, for example. The SkinnedMeshSystem, which lives in the RenderWorld, would need the data from the animation module about the pose of the skeleton, so it can correctly skin the mesh. In the same way, this info would be needed by the PhysicsWorld to correctly perform collision checks, for example. Likewise, the RenderWorld will need to know the positions of the objects, which are computed in the PhysicsWorld. Since the ‘lesser’ Worlds are not supposed to know about each other, are these kinds of data dependencies then handled by the ‘Almighty World’ class? On the other hand, I could also imagine that this could bring too many ‘low-level details’ up to the higher levels.

        I’m sorry for all the questions! We use an object-heavy code base at work (medical software), with virtual functions all over the place (and every object is created with a shared pointer…). Not my choice. ^^ In my hobby project I’d definitely like to use a different structure. This data-oriented approach sounds really neat and ‘no-nonsense’. I’m a big fan of no-nonsense. =)
        Thanks again for taking the time and effort to explain!
        Thanks again for taking the time and effort to explain!

      • Yes, that summary is correct – that’s how it works.
        Regarding your question about the data dependencies, I think I already answered that in one of the other comments to this post :).

  9. In your declaration of the backend, you pass the vertexCount and stride separately. Why don’t you have one ‘dataSize’ parameter instead? Do you store the vertexCount and stride for some reason?

    • Yes, they are stored inside the vertex buffer, mainly for one reason: CPU access to the data.

      You need to know how “large” one vertex is, and not all APIs store this data somewhere or make it available to you (certain console APIs don’t), so you have to keep track of that yourself.

      • Why do you need to access your VBOs from the CPU side? When they are created, you only need to access them from the GPU, right?

        So your vertex buffer functions can (speaking in OpenGL terms) bind the VBO and set the strides and everything? Or is that a separate rendering part?

      • For things that are done on the CPU, e.g. skinning, filling vertex buffers with debug draw data, etc. Not all architectures support compute shaders, and it’s often better to skin meshes on the CPU instead of doing it in a vertex shader. As an example, if you need the skinned vertex position for collision detection or need to render a lot of shadow maps, doing the skinning in the vertex shader might not be the best idea.
        Additionally, in every game I worked on we had at least one “special effect” (not graphics wise, but more gameplay wise) which forced us to dynamically change vertex buffer data. It doesn’t matter whether you alter the data directly, or prepare a buffer on the CPU which is then copied over the GPU’s data – you need to know the data’s stride.
        For me, the vertex buffer is a natural place to store that information.

        The vertex buffer itself is accessed and bound by the backend.

  10. First of all, I would like to thank you for answering all my questions; it truly has helped me a lot.

    Is stride alone enough to skin a mesh? Don’t you need to know what the data means and if you know that, do you still need stride (if you know the data, you know the stride implicitly right?)

    Does this mean that you cannot interleave vertex data (like positions, normals, UV coords), because every buffer has only one stride, or is that handled separately?

    • Stride is the size of one vertex in the vertex buffer, and the vertex buffer can of course interleave several different data streams. Take for example a VB with a position (float3), a normal (float3), and two UV sets (2x float2). The stride of this vertex buffer would be (3+3+2+2)*4 = 40 bytes. But only the position and normal need to be skinned; the rest can either be copied or left untouched.
      So you kind of need to know both the data layout of the skinned data and the vertex buffer’s stride.
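
      Walking such a buffer on the CPU then uses the stride directly (a sketch; the offsets and skinning helpers are made up):

      // Sketch: iterate an interleaved vertex buffer using its stride.
      // The position sits at offset 0, the normal at offset 12; the two
      // UV sets that follow are left untouched.
      char* vertex = static_cast<char*>(vertexData);
      for (unsigned int i = 0; i < vertexCount; ++i, vertex += stride)
      {
        float* position = reinterpret_cast<float*>(vertex);
        float* normal = reinterpret_cast<float*>(vertex + 12);
        SkinPosition(position, i);
        SkinNormal(normal, i);
      }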

  11. Thanks for the great article! There seems to be a lot of discussion in the comments related to this topic. While reading all the comments, one thing came to my mind: how do you handle resource streaming? The scenario came up since you said that resource packages cannot contain overlapping/same resources.

    Let’s say that a level is split into zones and each zone has a resource package. The level starts and the first zone/resource package is loaded; after a while, the next zone is triggered for loading. How do you handle this? I have solved this in the past by checking the overlapping resources of the two packages and increasing the internal ref count (not part of the actual handle) of those resources. After that, the old resource package is unloaded (decreasing internal ref counts) and the new one is loaded (resources that are already loaded won’t be loaded again). This way you don’t need to unload/load overlapping resources, you have good enough control over resource loading/unloading, and handles can be copied freely as much as you need.

    Maybe there’s some fundamental difference in what a resource package is to you, or did I miss something? For me, a resource package is just a list of resources to be loaded (including all the dependencies of those resources).

    Btw. looking forward to read about your external references (= IDs) system 🙂

    • Thanks Jani!

      Your approach to resource streaming sounds very reasonable. I’ve used something similar for a game in the past.
      Regarding Molecule though, I haven’t really started to think about streaming resources yet. If the whole “resources cannot be in several packages”-decision is too limiting or too cumbersome to work with, I might change that into the less restrictive reference-counted mechanism you described. Even more so if it makes implementing streaming easier for both me and our clients.

      So, no details to give you yet, but I surely will be blogging about streaming once I start toying around with it :).

  12. Hey 🙂

    I’m assuming you support multiple worlds and that each world you spawn also spawns the corresponding RenderWorld, PhysicsWorld, AudioWorld etc…

    You said that the Libraries are responsible for loading resources and keeping track of them / managing their lifetime. If each world owns its own library, how do you handle sharing?

    A lot of people in situations like this jump straight to singletons, which you should obviously avoid, as stated in your blog. How do you handle situations like this? Do you declare it global? Do you use a monostate pattern?

    Another question that I have is. Let’s say that we do something like this:

    world->spawnEntity("soldier.molecule");

    this string probably gets hashed and then the entity blueprint / descriptor is probably looked up somewhere, and then I’m assuming that the world in turn calls a bunch of spawn functions on the subworlds? Could you explain this part a bit more, and how you handle it? Who owns the blueprint / database and who creates it, the world? What does each subworld return from the corresponding spawn calls, a bunch of IDs that the overarching world stores? Just curious as I’m in this stage of my personal hobby engine and I have trouble deciding how I should design it 🙂

    Best regards!

    • Molecule doesn’t support sharing of resources between worlds, only sharing of resources between resource packages. That means that you can load several resource packages into a world, unload some of them, load other packages, and all those packages will be able to share resources between them in order to save memory.
      Different worlds are mostly used for keeping resource packages around rather than loading/unloading them all the time. A good example would be the loading screen world, or the (main) menu world.

      Regarding your spawning question: the resource package contains a binary, empty representation of the “soldier.molecule” entity – a prefab, if you would like to call it that. The name is hashed, and the corresponding prefab is looked up in the EntityWorld. The prefab is then simply cloned. Cloning an entity is performed by creating an empty entity and cloning all the components from the original entity. Because components are managed by different systems living in different subworlds, cloning is done by the corresponding subworld responsible for a component. I do that generically using templates which forward the call to the correct subworld, which sounds more complicated than it actually is. The entity itself just stores two integer arrays: one containing the component IDs, and one containing the component types.
      When spawning a component, each subworld simply returns the ID of that component, which gets stored inside the entity. All entities live in the EntityWorld, which is part of the game module – again, this ensures that higher-level implementations use lower-level implementations for getting stuff done, making sure that things aren’t coupled too tightly.
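
      As a sketch (the array sizes and names are assumptions), an entity then boils down to:

      // Sketch: an entity is nothing more than two parallel arrays, mapping
      // each component's type to the ID handed out by the owning subworld.
      struct Entity
      {
        uint32_t m_componentIds[MAX_COMPONENTS_PER_ENTITY];
        uint16_t m_componentTypes[MAX_COMPONENTS_PER_ENTITY];
        uint16_t m_componentCount;
      };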

      I hope that answers the questions you had!

      • Cool 🙂

        Thanks for such a detailed answer.

        The only thing that comes to my mind now is: how do you handle user-defined components and systems with this approach, since it’s not really a generic way of doing it? Maybe I’m wrong since I don’t know all the details 😛 With user-defined I mean gameplay components and gameplay systems. Those probably live at a higher level, much higher than the world class.
        Wouldn’t I have to inline my systems and copy functions etc into the appropriate places in the engine when I make a new gameplay component?

        Best regards and thanks 🙂

        -Endu

      • It is generic in that you don’t need to subclass a certain component base-class, but only need to add a simple typedef for your component. This allows the custom component to be stored within an entity.
        Regarding custom user systems, I think there are a few options available:

        1) Let the user add his System::Update() calls to the main loop. Every licensee gets full source-code access, so that wouldn’t be the problem.
        2) Scripting is completely C++-based, so the user could also just add the System::Update() calls to one of his “main” scripts (level scripts, or application-wide scripts).
        3) Provide a GenericSystem base-class that offers a virtual Update() call, which is responsible for updating all components within a system. The engine could then store a list of generic systems and update them automatically, and the user just has to inherit from the base class, implement the interface, and that’s it.
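
        A sketch of what option 3's base class could look like (the exact interface is an assumption):

        // Sketch: a base class users derive from, so the engine can update
        // custom systems without knowing their concrete types.
        class GenericSystem
        {
        public:
          virtual ~GenericSystem(void) {}

          // updates all components owned by this system
          virtual void Update(void) = 0;
        };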

        I’m not sure yet which options I will offer. What do you think?

  13. Btw I don’t have a reply button on your latest post so I wrote here instead :S It disappears after a few levels..

    Option 1).

    It doesn’t feel very intuitive to make engine changes here and there when you want to add your own gameplay systems to the engine… But hooking them up with the rest of the systems seems very easy to do this way. Especially for data flow, like if I need something from the physics system…
    But imagine when you release updates: someone has to go in and make sure the changes you’ve made don’t break anything when merging conflicts. It has its positives and negatives… Not sure though… I think this one GIVES the most control to the user. But I would avoid writing gameplay code here, I guess, to have a clear separation between engine and game?

    I’m only concerned with how you would data-drive this system? Loading entities and forwarding all the user-defined components to the correct systems would require a bunch more inlines here and there in the engine? Maybe you need a generic system to handle all that?

    Option 2).

    I think this one feels very good, there is a very clear separation between gameplay and engine… But how do you, for example, snoop / peek into data from the engine-side systems? Like, for example, imagine that my system relies on all the transforms output from the physics system? I guess in that case one should make engine changes? Or you might expose some of that data to the scripting side as well? Not sure, but for gameplay-side things, definitely scripting with zero to no engine changes.

    Still concerned with data-driving the entities and their components, like in option 1. Need some form of a generic system for it, I guess?

    Option 3).
    This one feels like option 2 apart from the fact that it gets automatically hooked up into the engine and the data-driving of entities could easily be handled automatically with a bunch of macros that declare and register the components into the appropriate factories etc?

    I think something between options 2 and 3?

    Discuss! ;D

    • Thanks for the detailed input, Endu!

      I’m aiming for options 2) and 3) (or a mix of those).
      Data-driving the entities is not really a problem, there is already a generic system in place for compiling an entity and its components (done in the content pipeline) into a binary consumed by the run-time. This uses so-called schemas, where a schema understands fundamental things such as ints, floats, arrays of values, references to assets, etc.

      A new component type can therefore be added by:
      1) creating a schema for it
      2) providing an implementation/definition for the component (e.g. a data-only struct)

      Regarding data from engine side systems: many of those will already be exposed because they are also used internally (e.g. transforms of skinned meshes, their matrix palettes, …), but if something is not readily available, I see no other solution than to either a) add a function to the engine that makes the data available to the scripts or b) wait until this functionality is available in an engine update.

      • Very cool 🙂

        I really like the schema idea. It’s like a data definition format almost but you validate against it and you use this schema to query the data inside a flat memory address range right? So when I ask for a property it translates this using the schema into a correct address?

        This could also work with arrays as well since most of the data is known at “editor time” so if an artist sets for example “number of inventory items” to 5 for an RPG game inside the editor, the runtime could use that number to allocate a chunk of memory required for the entire entity and all of its data + this array.. The schema would then handle the addressing etc inside this raw blob right?

        Very cool nevertheless 🙂

      • Yep, exactly. The entity’s/component’s data is stored in a contiguous chunk of memory, and data can be queried by a single function call. This function simply returns a pointer to the data, or nullptr if the data cannot be found according to the schema. Regarding arrays, the schema just states that certain data of a component is an array of *any* size, and the component data itself would hold the corresponding array entries.
        So yes, the content pipeline would then turn all of this data into a big binary blob.
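
        As a sketch (the function name and hashing helper are made up), such a query could look like:

        // Sketch: look up a property inside a component's flat data blob.
        // The schema knows each property's type and offset; nullptr is
        // returned if the property does not exist.
        const void* FindPropertyData(const Schema& schema, const void* blob, uint32_t nameHash);

        // usage
        const float* health = static_cast<const float*>(FindPropertyData(schema, blob, Hash("health")));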

  14. The reply button seems to have disappeared again 😦

    Do you use your own parser for this? Like a recursive descent parser to read the schema files and then build the runtime interpretable data? 🙂

    Does the content pipeline decide on AoS or SoA or do we also handle that in the schema files :)?

    Regards and thanks!

    • WordPress seems to remove the reply button as soon as comments nest too deeply. Unfortunately, there’s nothing I can do about that right now, apart from upgrading to a “proper” blog (which I might do).

      Yes, I use my own parser for this. I use a deliberately simple file format and parser; it’s very JSON-like. Very fast to parse, easily read and parsed by humans (those two are not the same! see XML), and it can be edited with a simple text editor if desired. The schema file only dictates which data a component holds; AoS vs. SoA is not decided in the schema. The schema also holds additional annotations which are used by editing tools or component compilers – things such as the name to be displayed in the editor, min/max values, etc.

      Fields in the schema are parsed AoS-style, just because I think it’s easier for us humans to deal with. Internally, the binary data gets put into an SoA-style flat memory chunk, which is then loaded by the engine runtime. And lastly, the engine’s component implementation dictates whether data is stored in AoS or SoA fashion. Most of the time it’s SoA for obvious reasons.

  15. I have revisited DOD again; my only issue is the loss of data access. For example, you create a texture and get a ResourceHandle back.

    But a function might want information, for instance:
    m_TileSheet.Width() / m_TileSize.x;

    Now you just have a handle; the width, height, etc. are locked behind the GraphicsDevice interface. You could implement a fetch function, but that slows things down further.

    The benefit of DOD is the control over the resource, memory layout, etc., which is great, but is there an elegant way to solve the above example without tons of fetch functions?

    • How much that slows things down depends on how often you need this.
      If there are only a few hundred calls each frame, I wouldn’t worry about it. Additionally, you can implement something like TextureData GetTextureData(ResourceHandle) that returns width, height, etc. in one call.

      In case you have e.g. 10,000 sprites and the sprite drawing function needs tile and texture information, you should push the data into the function rather than pull it from the resource manager. I.e., it would be your responsibility to hand down an array of const TextureData to the drawing function. You’re in complete control of the layout, after all ;).
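
      A sketch of that push-style interface (all names are made up):

      // Sketch: the caller hands the drawing function a contiguous array of
      // exactly the data it needs, instead of letting the function pull the
      // data from the resource manager via handles.
      struct TextureData
      {
        uint16_t width;
        uint16_t height;
      };

      void DrawSprites(const Sprite* sprites, const TextureData* textureData, unsigned int spriteCount);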

      • Do you create objects at the high level, though, like a Texture that contains a TextureHandle and keeps the data? In the case of materials and meshes you obviously have no choice but to have objects, but a texture doesn’t seem so clear-cut.

        That’s currently what I’m doing, and I’m also storing a SamplerState handle within the texture object. Any tips on how you manage state caches for DX11 would be great; I’m currently storing them in hash maps in the graphics device, so when I create a PSO (mimicking DX12) I can reuse them if they exist. With the sampler, though, I’m having to store a handle in the hash map that links to the slot map, which feels awkward.
