About a year and a half ago I posted a few C++ tips on Twitter. Because not all of my blog’s readers are on Twitter and Twitter is not the best medium for archiving things, I decided to write a blog post instead, accumulating all tips in one place.
Additionally, this also allows me to go into more detail where necessary and comment on a few things noted on Twitter.
I will keep updating this post as I add more tips.
The last post of this series basically concluded with the following questions: how do we efficiently allocate memory for individual command packets in the case of multiple threads adding commands to the same bucket? How can we ensure good cache utilization throughout the whole process of storing and submitting command packets?
This is what we are going to tackle today. I want to show how bad allocation behavior for command packets can affect the performance of the whole multi-threaded rendering process, and what our alternatives are.
In the previous part of this series, I’ve talked a bit about how to design the stateless rendering API, but left out a few details. This time, I’m going to cover those details as well as some questions that came up in the comments in the meantime, and even show parts of the current implementation.
In this post, I would like to describe what features and performance characteristics I want from a modern rendering system: it should support stateless rendering, rendering in different layers/buckets, and rendering that can run in parallel on as many cores as are available.
Some time ago, I announced that the Molecule Engine uses C++ as a scripting language. Today, I can share implementation details and a few additional tricks that were used to keep compilation times and executable sizes down.
Serialization, reflection, and other mechanisms are often used for saving data in an editor or a tool like the asset pipeline, and then loading that data into the engine at run-time. This process is well-known, flexible, and allows us to store the data in any format conceivable. Still, all those techniques show certain weaknesses when it comes to keeping iteration times to an absolute minimum.
Even though Molecule’s run-time engine exclusively uses binary files without doing any parsing, the asset pipeline uses a human-readable non-binary format for storing pretty much everything except raw asset files like textures or models. This post explains the process behind translating data from such a human-readable format into actual instances of C++ structs with very little setup code required.
In performance-sensitive applications like games it is crucial to access data in a cache-friendly manner. Especially when dealing with a large number of objects of the same type, e.g. individual components in an entity-component-architecture, we should make sure to read as little data as possible. However, simple arrays-of-structures are often not suited for this, with structures-of-arrays yielding better performance. But the latter are not natively supported by the C++ language.
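To make the difference between the two layouts concrete, here is a minimal sketch. The names `ParticleAoS`, `ParticlesSoA`, and `countAlive` are hypothetical and chosen only for illustration; they are not taken from Molecule. The loop over the structure-of-arrays variant reads only the `lifetime` array, while the array-of-structures variant drags the unused position data through the cache as well.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures: the fields of each particle are interleaved in memory.
// (Hypothetical example type, not part of any engine.)
struct ParticleAoS {
    float x, y, z;
    float lifetime;
};

// Structure-of-arrays: every field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> lifetime;

    void add(float px, float py, float pz, float life) {
        x.push_back(px);
        y.push_back(py);
        z.push_back(pz);
        lifetime.push_back(life);
    }

    std::size_t size() const { return lifetime.size(); }
};

// AoS: every cache line that is loaded also contains x, y, z,
// even though only the lifetime is needed.
std::size_t countAlive(const std::vector<ParticleAoS>& particles) {
    std::size_t alive = 0;
    for (const ParticleAoS& p : particles) {
        if (p.lifetime > 0.0f) ++alive;
    }
    return alive;
}

// SoA: only the tightly packed lifetime array is touched.
std::size_t countAlive(const ParticlesSoA& particles) {
    std::size_t alive = 0;
    for (float life : particles.lifetime) {
        if (life > 0.0f) ++alive;
    }
    return alive;
}
```

The price of the hand-written SoA version is the boilerplate in `add()`, which has to be kept in sync with the list of fields. That is the part the language does not help with.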
Today’s post is less of an insight into how Molecule works, and more of an announcement about an upcoming feature we are very proud of!
Molecule Engine’s scripting system uses runtime-compiled C++ code as a scripting language, and you can see the system in action here (please make sure to watch the video in original quality).
This allows the engine to leverage the full performance potential of native C++ code, while providing designers and scripters with extremely short iteration times, commonly only experienced when using traditional scripting languages such as Lua or Python.
Scripters won’t have to deal with internal engine details, and don’t need to worry about pointers or other low-level language details. They only work with a pure C interface and opaque structs, as can be seen in the video. Programmers, on the other hand, can easily dive in and feel right at home with the whole engine available to them in native C++ code.
Furthermore, programmers can aid scripters easily by using their favourite debuggers and IDEs for debugging and development. Scripters will love certain IDE features such as IntelliSense, completion listboxes, and other things a modern IDE provides!
Having finished the third part of this series about data ownership, we will turn our attention to performance optimizations and data layout again in this post. More specifically, we will detail how character skinning can be optimized with a few simple code and data changes.
One task that is pretty common in game development is to transform data according to some sort of hierarchical layout. Today, we want to take a look at probably the most well-known example of such a task: transforming joints according to a skeleton hierarchy.
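As a rough illustration of the task, the following sketch converts local joint transforms into global ones in a single linear pass. It assumes that joints are stored so that every parent comes before its children, and it uses a deliberately simplified, hypothetical `Transform` (uniform scale plus 2D translation) in place of a real joint matrix. None of these names come from the engine.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for a joint transform: uniform scale plus 2D translation.
// A real skeleton would use rotation/translation/scale or a full matrix.
struct Transform {
    float scale;
    float tx, ty;
};

// Applies 'local' in the space of 'parent' (parent * local).
Transform concatenate(const Transform& parent, const Transform& local) {
    return Transform{
        parent.scale * local.scale,
        parent.tx + parent.scale * local.tx,
        parent.ty + parent.scale * local.ty
    };
}

// parents[i] is the index of joint i's parent, or -1 for a root joint.
// Assumption: parents[i] < i for all non-root joints, so the parent's global
// transform is always computed before it is needed.
std::vector<Transform> localToGlobal(const std::vector<int>& parents,
                                     const std::vector<Transform>& local) {
    std::vector<Transform> global(local.size());
    for (std::size_t i = 0; i < local.size(); ++i) {
        const int parent = parents[i];
        global[i] = (parent < 0)
            ? local[i]
            : concatenate(global[static_cast<std::size_t>(parent)], local[i]);
    }
    return global;
}
```

Because of the ordering assumption, no recursion or tree traversal is required. The whole hierarchy is processed by walking two flat arrays front to back.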
Continuing from where we left off last time, I would like to discuss how we can build growing allocators using a virtual memory system. This post describes how to build a stack-like allocator that can automatically grow up to a given maximum size.
Last time, we were looking at a linear allocator, probably the simplest of all memory allocators. This time, we will detail how to implement a non-growing stack-like allocator, along with conventional use-cases.
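For readers who want a rough idea of what such an allocator looks like, here is a minimal sketch of a non-growing stack allocator working on a caller-provided buffer. The class name, the method names, and the header scheme (storing the previous top-of-stack offset directly in front of each allocation) are assumptions made for this example and are not necessarily how Molecule implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical non-growing stack allocator on top of a fixed memory block.
class StackAllocator {
public:
    StackAllocator(void* memory, std::size_t size)
        : m_start(static_cast<char*>(memory)), m_size(size), m_offset(0) {}

    // Returns nullptr if the request does not fit into the remaining space.
    void* Allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        // Leave room for the header, then align the user pointer upwards.
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(m_start);
        const std::uintptr_t raw = base + m_offset + sizeof(std::size_t);
        const std::uintptr_t aligned =
            (raw + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t userOffset = static_cast<std::size_t>(aligned - base);

        if (userOffset > m_size || size > m_size - userOffset) {
            return nullptr;
        }

        // Remember the old top of the stack right in front of the allocation.
        std::memcpy(m_start + userOffset - sizeof(std::size_t), &m_offset,
                    sizeof(std::size_t));
        m_offset = userOffset + size;
        return m_start + userOffset;
    }

    // Allocations must be freed in reverse order (LIFO).
    void Free(void* ptr) {
        const char* user = static_cast<const char*>(ptr);
        std::memcpy(&m_offset, user - sizeof(std::size_t), sizeof(std::size_t));
    }

    std::size_t Used() const { return m_offset; }

private:
    char* m_start;
    std::size_t m_size;
    std::size_t m_offset;
};
```

A typical use-case is level loading or per-frame scratch memory: allocate a number of temporary buffers, then free them in reverse order once the work is done. Note that this sketch does not check that `Free()` is called in LIFO order. A production version would usually add such a check in debug builds.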
Even though a task scheduler can alleviate the burden of having to distribute small pieces of work to different threads, it cannot help prevent a few issues common in multi-threaded programming, especially in multi-processor environments.
Continuing from where we left off last time, this post explains how parent-child relationships are handled inside the task scheduler, and how streaming tasks can be split automatically by the scheduler.
In this part of the series, we will discuss Molecule’s task model in detail, and have a look at the underlying C++ code and some subtleties we need to watch out for, as well as some unique optimization opportunities.