In today’s post, we will finally take a look at the last remaining piece of the new job system: adding dependencies between jobs.
I recently had the need to retrieve the type of a template argument as a human-readable string for debugging purposes, but without using RTTI – so typeid and type_info were out of the question.
Continuing from where we left off last time, today we are going to discuss how to build high-level algorithms such as parallel_for using our job system.
This week, we will finally tackle the heart of the job system: the implementation of the lock-free work-stealing queue. Read on for a foray into low-level programming.
As promised in the last post, today we will be looking at how to get rid of new and delete when allocating jobs in our job system. Allocations can be dealt with in a much more efficient way, as long as we are willing to sacrifice some memory for that. The resulting performance improvement is huge, and certainly worth it..
Back in 2012, I wrote about the task scheduler implementation in Molecule. Three years have passed since then, and now it’s time to give the old system a long deserved lifting.
The last post of this series basically concluded with the following questions: how do we efficiently allocate memory for individual command packets in the case of multiple threads adding commands to the same bucket? How can we ensure good cache utilization throughout the whole process of storing and submitting command packets?
This is what we are going to tackle today. I want to show how bad allocation behavior for command packets can affect the performance of the whole multi-threaded rendering process, and what our alternatives are.
In the previous part of this series, I’ve talked a bit about how to design the stateless rendering API, but left out a few details. This time, I’m going to cover those details as well as some questions that came up in the comments in the meantime, and even show parts of the current implementation.
Continuing where we left off last time, today I want to present a few ideas about how to design the API that enables us to do stateless rendering.
In this post, I would like to describe what features and performance characteristics I want from a modern rendering system: it should support stateless rendering, rendering in different layers/buckets, and rendering that can run in parallel on as many cores as are available.
Serialization, reflection, and other mechanisms are often used for saving data in an editor or a tool like the asset pipeline, and then loading that data into the engine at run-time. This process is well-known, flexible, and allows us to store the data in any format conceivable. Still, all those techniques show certain weaknesses when it comes to keeping iteration times to an absolute minimum.
Even though Molecule’s run-time engine exclusively uses binary files without doing any parsing, the asset pipeline uses a human-readable non-binary format for storing pretty much everything except raw asset files like textures or models. This post explains the process behind translating data from such a human-readable format into actual instances of C++ structs with very little setup code required.
In performance-sensitive applications like games it is crucial to access data in a cache-friendly manner. Especially when dealing with a large number of objects of the same type, e.g. individual components in an entity-component-architecture, we should make sure to read as little data as possible. However, simple arrays-of-structures are often not suited for this, with structures-of-arrays yielding better performance. But the latter are not natively supported by the C++ language.
Today’s post is less of an insight into how Molecule works, and more of an announcement about an upcoming feature we are very proud of!
Molecule Engine’s scripting system uses runtime-compiled C++ code as a scripting language, and you can see the system in action here (please make sure to watch the video in original quality).
This allows the engine to leverage the full performance potential of native C++ code, while providing designers and scripters with extremely short iteration times, commonly only experienced when using traditional scripting languages such as lua, python, or others.
Scripters won’t have to deal with internal engine details, and don’t need to worry about pointers or other low-level language stuff. They only work with a pure C-interface and opaque structs, as can be seen in the video. But programmers can easily dive in and feel right at home with the whole engine available to them in native C++-code.
Furthermore, programmers can aid scripters easily by using their favourite debuggers and IDEs for debugging and development. Scripters will love certain IDE features such as IntelliSense, completion listboxes, and other things a modern IDE provides!
Let us know what you think in the comments!
Having finished the third part of this series about data ownership, we will turn our attention to performance optimizations and data layout again in this post. More specifically, we will detail how character skinning can be optimized with a few simple code and data changes.
In the last installment of this series, we talked about handles/internal references in the Molecule Engine, and discussed their advantages over raw pointers and plain indices.
In a nutshell, handles are able to detect double-deletes, accesses to freed data, and cannot be accidentally freed – please read the previous blog post for all the details.
Today’s post is only a small gem I accidentally came across while I was looking for something entirely different: a faster method of multiplying a quaternion by a vector. Continue reading
One thing I have noticed during the development of the Molecule Engine, is that defining clear ownership over data can tremendously help with following a data-oriented design approach, and vice versa.