As promised in the last post, today we will be looking at how to get rid of new and delete when allocating jobs in our job system. Allocations can be dealt with in a much more efficient way, as long as we are willing to sacrifice some memory for that. The resulting performance improvement is huge, and certainly worth it..
Back in 2012, I wrote about the task scheduler implementation in Molecule. Three years have passed since then, and now it’s time to give the old system a long deserved lifting.
The last post of this series basically concluded with the following questions: how do we efficiently allocate memory for individual command packets in the case of multiple threads adding commands to the same bucket? How can we ensure good cache utilization throughout the whole process of storing and submitting command packets?
This is what we are going to tackle today. I want to show how bad allocation behavior for command packets can affect the performance of the whole multi-threaded rendering process, and what our alternatives are.
In the previous part of this series, I’ve talked a bit about how to design the stateless rendering API, but left out a few details. This time, I’m going to cover those details as well as some questions that came up in the comments in the meantime, and even show parts of the current implementation.
Continuing where we left off last time, today I want to present a few ideas about how to design the API that enables us to do stateless rendering.
In this post, I would like to describe what features and performance characteristics I want from a modern rendering system: it should support stateless rendering, rendering in different layers/buckets, and rendering that can run in parallel on as many cores as are available.