Molecular Musings

C++ programming tips

September 5, 2017 5:04 pm

About a year and a half ago I posted a few C++ tips on Twitter. Because not all of my blog’s readers are on Twitter and Twitter is not the best medium for archiving things, I decided to write a blog post instead, accumulating all tips in one place.
Additionally, this also allows me to go into more detail where necessary and comment on a few things noted on Twitter.

I will keep updating this post as I add more tips.

Update on 21th Dec 2017: Added tips #5-#7.

Table of contents:
Tip 1: Do manual CSE (Common Subexpression Elimination). Don’t assume the compiler will do it.
Tip 2: You don’t need to check for nullptr before using delete/delete[] on an object/array. No you really don’t.
Tip 3: Don’t use std::vector<bool>.
Tip 4: Build your own virtual memory-aware std::vector replacement and use the 64bit address space. Less copying&better grow behaviour.
Tip 5: If you care about performance, check the generated assembly. No matter if C++ or shader code. You’ll be surprised.
Tip 6: Learn how lang features are impl. before blindly (over)using new, delete, std::function, lambdas, STL, virt. fcts. Don’t assume.
Tip 7: Wrap simple enums in a struct to counter namespace pollution, e.g. struct Mode { enum Enum { READ, WRITE }; };

#cpp Tip 1: Do manual CSE (Common Subexpression Elimination). Don’t assume the compiler will do it. Most of the time (esp. in class methods), it can’t due to pointer aliasing. (Twitter)

#cpp Tip 1 example: for (i=0u; i<strlen(from);++i) { to[i] = from[i]; }. to and from are both (const)char* and can alias. check the disasm!

Pointer aliasing in general and in C++ member functions in particular is something that still seems to not be widely understood. If you have never heard of the terms pointer aliasing or strict aliasing rule, head over to this article from 2006 that serves as wonderful introduction to the topic.

Let’s take a closer look at the example code given above:

void Copy(char* to, const char* from)
{
	for (size_t i=0u; i < strlen(from); ++i)
	{
		to[i] = from[i];
	}
}

Note that even though from points to const char, it can still alias to, because both their types are compatible – const or not. This means that the compiler has to be conservative, assuming that a write to to could potentially write to from, implying that the result of a call to strlen(from) might have a different outcome each loop iteration.

In other words: the above loop has complexity O(N²), not O(N)!

Let us look at the assembler code generated by Clang, GCC, and MSVC, with full optimizations:

Clang 4.0.0:
Copy(char*, char const*):
        push    r15
        push    r14
        push    rbx
        mov     r15, rsi
        mov     r14, rdi
        mov     al, byte ptr [r15]
        test    al, al
        je      .LBB0_4
        mov     byte ptr [r14], al
        mov     rdi, r15
        call    strlen
        cmp     rax, 2
        jb      .LBB0_4
        mov     ebx, 1
.LBB0_3:
        movzx   eax, byte ptr [r15 + rbx]
        mov     byte ptr [r14 + rbx], al
        inc     rbx
        mov     rdi, r15
        call    strlen
        cmp     rbx, rax
        jb      .LBB0_3
.LBB0_4:
        pop     rbx
        pop     r14
        pop     r15
        ret

GCC 7.2:
Copy(char*, char const*):
        push    r12
        mov     r12, rdi
        push    rbp
        mov     rbp, rsi
        push    rbx
        xor     ebx, ebx
        jmp     .L3
.L6:
        movzx   eax, BYTE PTR [rbp+0+rbx]
        mov     BYTE PTR [r12+rbx], al
        add     rbx, 1
.L3:
        mov     rdi, rbp
        call    strlen
        cmp     rax, rbx
        ja      .L6
        pop     rbx
        pop     rbp
        pop     r12
        ret

VC2017 does the same, but inlines the call to strlen() which makes the generated assembly a bit longer and harder to read. All three compilers essentially evaluate strlen() in the inner loop.

But this is not the compilers’ fault. They can’t do better if they want to play by the rules. When compiling Copy() they can’t know whether code in a different translation unit might call this function with overlapping memory regions.
Yes, in theory (and things like link-time code generation or unity/blob/amalgamated builds) they could know by looking at all call sites of Copy(), but in practice this is not something compilers excel at.

The good news is that we can help the compiler in two ways: either by restrict-qualifying the pointers, or by using local variables.

The restrict keyword has been available in C for quite some time now, but hasn’t made its way into the C++ standard (yet?). Nevertheless, it is supported by all major compilers (__restrict or __restrict__).

In our case there is an even easier fix that doesn’t require any non-standard keywords: local variables.

void Copy(char* to, const char* from)
{
	const size_t length = strlen(from);
	for (size_t i=0u; i < length; ++i)
	{
		to[i] = from[i];
	}
}

With this simple change, we no longer care whether to and from alias – the code will only evaluate strlen() once. This of course implies that you have to be sure (or make sure!) that nobody ever calls Copy() with overlapping memory regions.

Looking at the generated assembler code, Clang and GCC will fully unroll and optimize the inner loop. Things are almost the same in MSVC, minus the unrolling.

When additionally restrict-qualifying the pointers, both Clang and GCC will realize that we’re pretty much just doing a memcpy() and come up with the following:

Copy(char* __restrict, char const* __restrict):
        push    r14
        push    rbx
        push    rax
        mov     rbx, rsi
        mov     r14, rdi
        mov     rdi, rbx
        call    strlen
        test    rax, rax
        je      .LBB0_2
        mov     rdi, r14
        mov     rsi, rbx
        mov     rdx, rax
        call    memcpy
.LBB0_2:
        add     rsp, 8
        pop     rbx
        pop     r14
        ret

C++ member functions

Because all non-static C++ methods carry an implicit this-pointer, every member can pretty much alias every argument (of a compatible type per the strict aliasing rule).

Want an example?

class XorCrypt
{
public:
	void Encrypt(char* buffer);

private:
	char m_key;
};

Even though char m_key and char* buffer might look like they cannot alias each other, don’t forget that each access to m_key is essentially this->m_key.
What does operator-> tell you?

Indeed, that’s a char* hidden in plain sight.
Everytime you hear somebody say that members in a loop will be kept in a register by the compiler anyway, as long as they aren’t written to: it’s one of the most common misconceptions I’ve heard in the last years over and over again.

But fret not, we can do something about that in this case too: most compilers also accept a restrict qualifier on methods. Pierre Terdiman has a short example for MSVC on his blog.

Takeaway

Beware of pointer aliasing, use local variables, judiciously use restricted pointers, and check the generated assembler code from time to time.

#cpp Tip 2: You don’t need to check for nullptr before using delete/delete[] on an object/array. No you really don’t. (Twitter)

The C++ standard mandates that calling delete/delete[] on nullptr is okay, which is implemented correctly by all major compilers nowadays. There used to be compilers that got it wrong, but that was ~10 years ago.

Nowadays you don’t need SAFE_DELETE macros and I also firmly believe that setting a pointer to null after deleting it just hides the symptoms, but doesn’t fix the underlying ownership problem(s). And the name SAFE_DELETE is a misnomer. And you shouldn’t use a macro for such things.

Takeaway

Have clear ownership rules, don’t hide symptoms but fix root causes instead.

#cpp Tip 3: Don’t use std::vector<bool>. (Twitter)

Do not use it.
Maybe it will be deprecated in the future?
Maybe the next C++ standard gets a dynamic_bitset or similar?

#cpp Tip 4: Build your own virtual memory-aware std::vector replacement and use the 64bit address space. Less copying&better grow behaviour. (Twitter)

As you probably know, an std::vector is a dynamic array that can grow & shrink at runtime. Whenever the vector needs to grow (e.g. in push_back) because its allocated block of memory is not large enough to hold another element, it needs to do the following:

Allocate a new block of memory that has space for e.g. 2*N elements (the exact growth behaviour is implementation-defined, but growing by a factor of 2 is reasonable).
Copy/move existing elements into the new storage.
Copy/move the new element (e.g. in push_back) to the new storage.
Destruct the old elements.
Free the old block of memory.

This has several disadvantages:

Depending on the kind of elements held by the vector, copying/moving/destructing a lot of them is not a cheap operation.
Pointers to existing elements are invalidated, so you are never allowed to store a pointer to e.g. &myVector[5] in case the vector might need to grow.
The vector temporarily needs space for 3*N (N + 2*N) elements, which can cause out-of-memory issues even though you’re only going to use 2*N elements in the end.
Not calling reserve/resize leads to memory fragmentation due to several small allocations.

Making use of virtual memory and the abundance of address space available in 64-bit applications, we can craft our own vector-like replacement with none of these disadvantages.

First, we’re going to reserve address space for the expected worst case of memory needed by the vector. If you think about it, each and every single one of your vectors/arrays/whatever has some intrinsic upper bound of memory needed, however large that may be. On consoles, the upper bound could be the maximum amount of memory that can be allocated by your process – you can never allocate more than that anyway. More realistically speaking, your code probably already knows about expected worst case scenarios for e.g. game objects, components, etc., so we can use that instead.

Note that in this first step, we only reserve address space, but don’t commit any physical pages just yet.

Now, whenever our vector needs to grow, we just grab more pages from the OS’ virtual memory system in the address space we previously reserved. We are guaranteed to get memory at that address as long as there is enough physical memory left to allocate from.
In shrink_to_fit, we can simply return/decommit as many pages as possible to the OS.

With such a setup, we never have to copy or move elements when growing the vector. All the existing elements stay where they are, which means we could even store pointers to them if we wanted. Also note that we never need space for 3*N elements, but just for the amount of elements we’re going to store.

Of course, no technique is without disadvantages:

Depending on the page size of the OS (mostly 4k, but also 64k on certain systems) you may end up “losing” some memory due to the fact that our vector can only grow in page-size chunks. For example, consider a case where the vector needs to grow to make space for one more element – in that case, you end up allocating a new page from the OS, but only fill it with one element, rendering the rest of the page unusable for other allocations.
Committing physical pages for reserved address space is not cheap either, but should be faster than copying a lot of elements.
Only works for 64-bit applications. You’re going to run out of address space in 32-bit pretty fast.

If you liked this trick, Niklas Gray has more virtual memory tricks up his sleeve.

Takeaway

We can do better than a one-size-fits-all implementation by making use of the underlying hardware.

#cpp Tip 5: If you care about performance, check the generated assembly. No matter if C++ or shader code. You’ll be surprised. (Twitter)

Don’t be afraid of looking at assembly code. Yes, some of the mnemonics – especially things like CVTTPS2PI or EIEIO (my favourite!) – might not make sense at first, but it is not that hard to learn how to read assembly.
Even though writing assembly code by hand is not something you do very often nowadays as a programmer in the games industry, learning how to read and understand assembly code is a very useful skill to have, especially because sooner or later you’ll have to debug that once-in-a-blue-moon crash in a release candidate right before shipping.

In order to understand assembly code, you should have a good grasp on general computer architecture, understand things like stack frames, calling conventions, argument passing, and how that translates from a high-level language like C++ into assembly code.
You can also learn all of that by just looking at assembly code generated for certain C++ constructs like initializing variables, calling functions, branches, loops, etc. Getting a feeling for how C++ code gets compiled into assembly code is really helpful.

For resources, this tutorial for x86 seems to be a nice starting point for assembly, this book is really good for learning computer architecture, and Mike Acton’s GDC 2014 Code Clinic has a few slides about what compilers can and cannot do for you, with examples of how compilers can fail to understand even the simplest of things and fail to optimize them – which is the reason you should check the generated assembly code every now and then, if you care about performance.

#cpp Tip 6: Learn how lang features are impl. before blindly (over)using new, delete, std::function, lambdas, STL, virt. fcts. Don’t assume. (Twitter)

This is similar in spirit to tip #5, but focuses more on non-zero overhead language features like (capturing) lambdas and virtual functions. In my opinion, a programmer should have knowledge about both what a language feature does and what it allows you to do, but also about how it works under the hood.

If you’re not sure, don’t assume things, especially if you’re working on e.g. a memory- or performance-critical piece of code for a 60Hz action game.
Of course, this does not imply that you need to know the implementation of your compiler’s STL by heart, but it doesn’t hurt to be curious every now and then. Similar to that, you don’t need to reinvent the wheel over and over again, but you better know how a basic wheel works before you set out to build a better one.

#cpp Tip 7: Wrap simple enums in a struct to counter namespace pollution, e.g. struct Mode { enum Enum { READ, WRITE }; }; (Twitter)

This is probably best described with a piece of code:

enum Mode
{
  READ,
  WRITE
};

Handle OpenFile(Mode mode)
{
  // ...
}

OpenFile(READ);

In the above example, both READ and WRITE pollute the global namespace. This means that you cannot have any enumerator, global const, or anything else in the global namespace named READ or WRITE, even though the enumerators are part of the enum Mode. Furthermore, note how OpenFile(READ) uses READ as an argument, and not the more readable Mode::READ (that is only well-formed starting with C++11).

The fix for these shortcomings is quite simple – wrap the enum in a struct (or namespace):

struct Mode
{
  enum Enum
  {
    READ,
    WRITE
  };
};

Handle OpenFile(Mode::Enum mode)
{
  // ...
}

OpenFile(Mode::READ);

This solves the above flaws, and allows us to extent the struct with e.g. a ToString function that also does not pollute the global namespace.
The simplest solutions are often the best.

Posted by Stefan Reinalter

Categories: C++

Tags: C++, programming, tips

« Older Newer »

5 Responses to “C++ programming tips”

Thanks for sharing! I don’t get to work in C++ very often, but when I do it’s usually performance related… little nuggets like this are awesome! The “nowaways calling delete on nullptr is fine” thing too, when you use a language only semi-sporadically you tend to stay on the “better safe than sorry” side for way too long.

By Paul-Jan Pauptit on September 6, 2017 at 8:36 am
re tip #1 about , I have made it custom to always use the form

for ( size_t i=0u, n=strlen(from); i<n; ++i ) { .. }

(I believe I got this from Christer Ericson originally)

By Mikkel Gjøl on November 7, 2017 at 8:39 am
This is a nice write-up! With respect to avoiding aliasing, there was a C++ standards proposal to help provide tools for this: http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4150.pdf it does not look like it progressed but there are some nice discussions in the paper.

By syaghmour on November 13, 2017 at 10:56 pm
Regarding Tip #7: Are there benefits when using a struct-enum as opposed to a namespace-enum? I am leaning towards using namespaces as they cannot be accidentally used as a type.

By gkoefner on December 22, 2017 at 12:42 pm
#7 – Why don’t you use modern language syntax like enum class Mode: unsigned char { Read, Write }; ? It’s cleaner and doesn’t pollute the namespace either, and less typing.

By Michael Lafreniere on December 27, 2017 at 5:41 am

Mobile Site | Full Site

Get a free blog at WordPress.com Theme: WordPress Mobile Edition by Alex King.

Molecular Musings

C++ programming tips

#cpp Tip 1: Do manual CSE (Common Subexpression Elimination). Don’t assume the compiler will do it. Most of the time (esp. in class methods), it can’t due to pointer aliasing. (Twitter)

#cpp Tip 1 example: for (i=0u; i<strlen(from);++i) { to[i] = from[i]; }. to and from are both (const)char* and can alias. check the disasm!

C++ member functions

Takeaway

#cpp Tip 2: You don’t need to check for nullptr before using delete/delete[] on an object/array. No you really don’t. (Twitter)

Takeaway

#cpp Tip 3: Don’t use std::vector<bool>. (Twitter)

#cpp Tip 4: Build your own virtual memory-aware std::vector replacement and use the 64bit address space. Less copying&better grow behaviour. (Twitter)

Takeaway

#cpp Tip 5: If you care about performance, check the generated assembly. No matter if C++ or shader code. You’ll be surprised. (Twitter)

#cpp Tip 6: Learn how lang features are impl. before blindly (over)using new, delete, std::function, lambdas, STL, virt. fcts. Don’t assume. (Twitter)

#cpp Tip 7: Wrap simple enums in a struct to counter namespace pollution, e.g. struct Mode { enum Enum { READ, WRITE }; }; (Twitter)

Share this:

Related

5 Responses to “C++ programming tips”

Leave a Reply