September 5, 2017 5:04 pm
About a year and a half ago I posted a few C++ tips on Twitter. Because not all of my blog’s readers are on Twitter and Twitter is not the best medium for archiving things, I decided to write a blog post instead, accumulating all tips in one place.
Additionally, this also allows me to go into more detail where necessary and comment on a few things noted on Twitter.
I will keep updating this post as I add more tips.
Update on 21th Dec 2017: Added tips #5-#7.
Table of contents:
Tip 1: Do manual CSE (Common Subexpression Elimination). Don’t assume the compiler will do it.
Tip 2: You don’t need to check for nullptr before using delete/delete[] on an object/array. No you really don’t.
Tip 3: Don’t use std::vector<bool>.
Tip 4: Build your own virtual memory-aware std::vector replacement and use the 64bit address space. Less copying&better grow behaviour.
Tip 5: If you care about performance, check the generated assembly. No matter if C++ or shader code. You’ll be surprised.
Tip 6: Learn how lang features are impl. before blindly (over)using new, delete, std::function, lambdas, STL, virt. fcts. Don’t assume.
Tip 7: Wrap simple enums in a struct to counter namespace pollution, e.g. struct Mode { enum Enum { READ, WRITE }; };
Pointer aliasing in general and in C++ member functions in particular is something that still seems to not be widely understood. If you have never heard of the terms pointer aliasing or strict aliasing rule, head over to this article from 2006 that serves as wonderful introduction to the topic.
Let’s take a closer look at the example code given above:
void Copy(char* to, const char* from)
{
for (size_t i=0u; i < strlen(from); ++i)
{
to[i] = from[i];
}
}
Note that even though from points to const char, it can still alias to, because both their types are compatible – const or not. This means that the compiler has to be conservative, assuming that a write to to could potentially write to from, implying that the result of a call to strlen(from) might have a different outcome each loop iteration.
In other words: the above loop has complexity O(N²), not O(N)!
Let us look at the assembler code generated by Clang, GCC, and MSVC, with full optimizations:
Clang 4.0.0:
Copy(char*, char const*):
push r15
push r14
push rbx
mov r15, rsi
mov r14, rdi
mov al, byte ptr [r15]
test al, al
je .LBB0_4
mov byte ptr [r14], al
mov rdi, r15
call strlen
cmp rax, 2
jb .LBB0_4
mov ebx, 1
.LBB0_3:
movzx eax, byte ptr [r15 + rbx]
mov byte ptr [r14 + rbx], al
inc rbx
mov rdi, r15
call strlen
cmp rbx, rax
jb .LBB0_3
.LBB0_4:
pop rbx
pop r14
pop r15
ret
GCC 7.2:
Copy(char*, char const*):
push r12
mov r12, rdi
push rbp
mov rbp, rsi
push rbx
xor ebx, ebx
jmp .L3
.L6:
movzx eax, BYTE PTR [rbp+0+rbx]
mov BYTE PTR [r12+rbx], al
add rbx, 1
.L3:
mov rdi, rbp
call strlen
cmp rax, rbx
ja .L6
pop rbx
pop rbp
pop r12
ret
VC2017 does the same, but inlines the call to strlen() which makes the generated assembly a bit longer and harder to read. All three compilers essentially evaluate strlen() in the inner loop.
But this is not the compilers’ fault. They can’t do better if they want to play by the rules. When compiling Copy() they can’t know whether code in a different translation unit might call this function with overlapping memory regions.
Yes, in theory (and things like link-time code generation or unity/blob/amalgamated builds) they could know by looking at all call sites of Copy(), but in practice this is not something compilers excel at.
The good news is that we can help the compiler in two ways: either by restrict-qualifying the pointers, or by using local variables.
The restrict keyword has been available in C for quite some time now, but hasn’t made its way into the C++ standard (yet?). Nevertheless, it is supported by all major compilers (__restrict or __restrict__).
In our case there is an even easier fix that doesn’t require any non-standard keywords: local variables.
void Copy(char* to, const char* from)
{
const size_t length = strlen(from);
for (size_t i=0u; i < length; ++i)
{
to[i] = from[i];
}
}
With this simple change, we no longer care whether to and from alias – the code will only evaluate strlen() once. This of course implies that you have to be sure (or make sure!) that nobody ever calls Copy() with overlapping memory regions.
Looking at the generated assembler code, Clang and GCC will fully unroll and optimize the inner loop. Things are almost the same in MSVC, minus the unrolling.
When additionally restrict-qualifying the pointers, both Clang and GCC will realize that we’re pretty much just doing a memcpy() and come up with the following:
Copy(char* __restrict, char const* __restrict):
push r14
push rbx
push rax
mov rbx, rsi
mov r14, rdi
mov rdi, rbx
call strlen
test rax, rax
je .LBB0_2
mov rdi, r14
mov rsi, rbx
mov rdx, rax
call memcpy
.LBB0_2:
add rsp, 8
pop rbx
pop r14
ret
Because all non-static C++ methods carry an implicit this-pointer, every member can pretty much alias every argument (of a compatible type per the strict aliasing rule).
Want an example?
class XorCrypt
{
public:
void Encrypt(char* buffer);
private:
char m_key;
};
Even though char m_key and char* buffer might look like they cannot alias each other, don’t forget that each access to m_key is essentially this->m_key.
What does operator-> tell you?
Indeed, that’s a char* hidden in plain sight.
Everytime you hear somebody say that members in a loop will be kept in a register by the compiler anyway, as long as they aren’t written to: it’s one of the most common misconceptions I’ve heard in the last years over and over again.
But fret not, we can do something about that in this case too: most compilers also accept a restrict qualifier on methods. Pierre Terdiman has a short example for MSVC on his blog.
Beware of pointer aliasing, use local variables, judiciously use restricted pointers, and check the generated assembler code from time to time.
The C++ standard mandates that calling delete/delete[] on nullptr is okay, which is implemented correctly by all major compilers nowadays. There used to be compilers that got it wrong, but that was ~10 years ago.
Nowadays you don’t need SAFE_DELETE macros and I also firmly believe that setting a pointer to null after deleting it just hides the symptoms, but doesn’t fix the underlying ownership problem(s). And the name SAFE_DELETE is a misnomer. And you shouldn’t use a macro for such things.
Have clear ownership rules, don’t hide symptoms but fix root causes instead.
Do not use it.
Maybe it will be deprecated in the future?
Maybe the next C++ standard gets a dynamic_bitset or similar?
As you probably know, an std::vector is a dynamic array that can grow & shrink at runtime. Whenever the vector needs to grow (e.g. in push_back) because its allocated block of memory is not large enough to hold another element, it needs to do the following:
This has several disadvantages:
Making use of virtual memory and the abundance of address space available in 64-bit applications, we can craft our own vector-like replacement with none of these disadvantages.
First, we’re going to reserve address space for the expected worst case of memory needed by the vector. If you think about it, each and every single one of your vectors/arrays/whatever has some intrinsic upper bound of memory needed, however large that may be. On consoles, the upper bound could be the maximum amount of memory that can be allocated by your process – you can never allocate more than that anyway. More realistically speaking, your code probably already knows about expected worst case scenarios for e.g. game objects, components, etc., so we can use that instead.
Note that in this first step, we only reserve address space, but don’t commit any physical pages just yet.
Now, whenever our vector needs to grow, we just grab more pages from the OS’ virtual memory system in the address space we previously reserved. We are guaranteed to get memory at that address as long as there is enough physical memory left to allocate from.
In shrink_to_fit, we can simply return/decommit as many pages as possible to the OS.
With such a setup, we never have to copy or move elements when growing the vector. All the existing elements stay where they are, which means we could even store pointers to them if we wanted. Also note that we never need space for 3*N elements, but just for the amount of elements we’re going to store.
Of course, no technique is without disadvantages:
If you liked this trick, Niklas Gray has more virtual memory tricks up his sleeve.
We can do better than a one-size-fits-all implementation by making use of the underlying hardware.
Don’t be afraid of looking at assembly code. Yes, some of the mnemonics – especially things like CVTTPS2PI or EIEIO (my favourite!) – might not make sense at first, but it is not that hard to learn how to read assembly.
Even though writing assembly code by hand is not something you do very often nowadays as a programmer in the games industry, learning how to read and understand assembly code is a very useful skill to have, especially because sooner or later you’ll have to debug that once-in-a-blue-moon crash in a release candidate right before shipping.
In order to understand assembly code, you should have a good grasp on general computer architecture, understand things like stack frames, calling conventions, argument passing, and how that translates from a high-level language like C++ into assembly code.
You can also learn all of that by just looking at assembly code generated for certain C++ constructs like initializing variables, calling functions, branches, loops, etc. Getting a feeling for how C++ code gets compiled into assembly code is really helpful.
For resources, this tutorial for x86 seems to be a nice starting point for assembly, this book is really good for learning computer architecture, and Mike Acton’s GDC 2014 Code Clinic has a few slides about what compilers can and cannot do for you, with examples of how compilers can fail to understand even the simplest of things and fail to optimize them – which is the reason you should check the generated assembly code every now and then, if you care about performance.
This is similar in spirit to tip #5, but focuses more on non-zero overhead language features like (capturing) lambdas and virtual functions. In my opinion, a programmer should have knowledge about both what a language feature does and what it allows you to do, but also about how it works under the hood.
If you’re not sure, don’t assume things, especially if you’re working on e.g. a memory- or performance-critical piece of code for a 60Hz action game.
Of course, this does not imply that you need to know the implementation of your compiler’s STL by heart, but it doesn’t hurt to be curious every now and then. Similar to that, you don’t need to reinvent the wheel over and over again, but you better know how a basic wheel works before you set out to build a better one.
This is probably best described with a piece of code:
enum Mode
{
READ,
WRITE
};
Handle OpenFile(Mode mode)
{
// ...
}
OpenFile(READ);
In the above example, both READ and WRITE pollute the global namespace. This means that you cannot have any enumerator, global const, or anything else in the global namespace named READ or WRITE, even though the enumerators are part of the enum Mode. Furthermore, note how OpenFile(READ) uses READ as an argument, and not the more readable Mode::READ (that is only well-formed starting with C++11).
The fix for these shortcomings is quite simple – wrap the enum in a struct (or namespace):
struct Mode
{
enum Enum
{
READ,
WRITE
};
};
Handle OpenFile(Mode::Enum mode)
{
// ...
}
OpenFile(Mode::READ);
This solves the above flaws, and allows us to extent the struct with e.g. a ToString function that also does not pollute the global namespace.
The simplest solutions are often the best.
Posted by Stefan Reinalter
Categories: C++
Tags: C++, programming, tips
Mobile Site | Full Site
Get a free blog at WordPress.com Theme: WordPress Mobile Edition by Alex King.
Thanks for sharing! I don’t get to work in C++ very often, but when I do it’s usually performance related… little nuggets like this are awesome! The “nowaways calling delete on nullptr is fine” thing too, when you use a language only semi-sporadically you tend to stay on the “better safe than sorry” side for way too long.
By Paul-Jan Pauptit on September 6, 2017 at 8:36 am
re tip #1 about , I have made it custom to always use the form
for ( size_t i=0u, n=strlen(from); i<n; ++i ) { .. }
(I believe I got this from Christer Ericson originally)
By Mikkel Gjøl on November 7, 2017 at 8:39 am
This is a nice write-up! With respect to avoiding aliasing, there was a C++ standards proposal to help provide tools for this: http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4150.pdf it does not look like it progressed but there are some nice discussions in the paper.
By syaghmour on November 13, 2017 at 10:56 pm
Regarding Tip #7: Are there benefits when using a struct-enum as opposed to a namespace-enum? I am leaning towards using namespaces as they cannot be accidentally used as a type.
By gkoefner on December 22, 2017 at 12:42 pm
#7 – Why don’t you use modern language syntax like enum class Mode: unsigned char { Read, Write }; ? It’s cleaner and doesn’t pollute the namespace either, and less typing.
By Michael Lafreniere on December 27, 2017 at 5:41 am