C++ programming tips

About a year and a half ago I posted a few C++ tips on Twitter. Because not all of my blog’s readers are on Twitter and Twitter is not the best medium for archiving things, I decided to write a blog post instead, accumulating all tips in one place.
Additionally, this also allows me to go into more detail where necessary and comment on a few things noted on Twitter.

I will keep updating this post as I add more tips.

Table of contents:
Tip 1: Do manual CSE (Common Subexpression Elimination). Don’t assume the compiler will do it.
Tip 2: You don’t need to check for nullptr before using delete/delete[] on an object/array. No you really don’t.
Tip 3: Don’t use std::vector<bool>.
Tip 4: Build your own virtual memory-aware std::vector replacement and use the 64bit address space. Less copying&better grow behaviour.


#cpp Tip 1: Do manual CSE (Common Subexpression Elimination). Don’t assume the compiler will do it. Most of the time (esp. in class methods), it can’t due to pointer aliasing. (Twitter)

#cpp Tip 1 example: for (i=0u; i<strlen(from);++i) { to[i] = from[i]; }. to and from are both (const)char* and can alias. check the disasm!

Pointer aliasing in general and in C++ member functions in particular is something that still seems to not be widely understood. If you have never heard of the terms pointer aliasing or strict aliasing rule, head over to this article from 2006 that serves as wonderful introduction to the topic.

Let’s take a closer look at the example code given above:

void Copy(char* to, const char* from)
{
	for (size_t i=0u; i < strlen(from); ++i)
	{
		to[i] = from[i];
	}
}

Note that even though from points to const char, it can still alias to, because both their types are compatible – const or not. This means that the compiler has to be conservative, assuming that a write to to could potentially write to from, implying that the result of a call to strlen(from) might have a different outcome each loop iteration.

In other words: the above loop has complexity O(N²), not O(N)!

Let us look at the assembler code generated by Clang, GCC, and MSVC, with full optimizations:

Clang 4.0.0:
Copy(char*, char const*):
        push    r15
        push    r14
        push    rbx
        mov     r15, rsi
        mov     r14, rdi
        mov     al, byte ptr [r15]
        test    al, al
        je      .LBB0_4
        mov     byte ptr [r14], al
        mov     rdi, r15
        call    strlen
        cmp     rax, 2
        jb      .LBB0_4
        mov     ebx, 1
.LBB0_3:
        movzx   eax, byte ptr [r15 + rbx]
        mov     byte ptr [r14 + rbx], al
        inc     rbx
        mov     rdi, r15
        call    strlen
        cmp     rbx, rax
        jb      .LBB0_3
.LBB0_4:
        pop     rbx
        pop     r14
        pop     r15
        ret
GCC 7.2:
Copy(char*, char const*):
        push    r12
        mov     r12, rdi
        push    rbp
        mov     rbp, rsi
        push    rbx
        xor     ebx, ebx
        jmp     .L3
.L6:
        movzx   eax, BYTE PTR [rbp+0+rbx]
        mov     BYTE PTR [r12+rbx], al
        add     rbx, 1
.L3:
        mov     rdi, rbp
        call    strlen
        cmp     rax, rbx
        ja      .L6
        pop     rbx
        pop     rbp
        pop     r12
        ret

VC2017 does the same, but inlines the call to strlen() which makes the generated assembly a bit longer and harder to read. All three compilers essentially evaluate strlen() in the inner loop.

But this is not the compilers’ fault. They can’t do better if they want to play by the rules. When compiling Copy() they can’t know whether code in a different translation unit might call this function with overlapping memory regions.
Yes, in theory (and things like link-time code generation or unity/blob/amalgamated builds) they could know by looking at all call sites of Copy(), but in practice this is not something compilers excel at.

The good news is that we can help the compiler in two ways: either by restrict-qualifying the pointers, or by using local variables.

The restrict keyword has been available in C for quite some time now, but hasn’t made its way into the C++ standard (yet?). Nevertheless, it is supported by all major compilers (__restrict or __restrict__).

In our case there is an even easier fix that doesn’t require any non-standard keywords: local variables.

void Copy(char* to, const char* from)
{
	const size_t length = strlen(from);
	for (size_t i=0u; i < length; ++i)
	{
		to[i] = from[i];
	}
}

With this simple change, we no longer care whether to and from alias – the code will only evaluate strlen() once. This of course implies that you have to be sure (or make sure!) that nobody ever calls Copy() with overlapping memory regions.

Looking at the generated assembler code, Clang and GCC will fully unroll and optimize the inner loop. Things are almost the same in MSVC, minus the unrolling.

When additionally restrict-qualifying the pointers, both Clang and GCC will realize that we’re pretty much just doing a memcpy() and come up with the following:

Copy(char* __restrict, char const* __restrict):
        push    r14
        push    rbx
        push    rax
        mov     rbx, rsi
        mov     r14, rdi
        mov     rdi, rbx
        call    strlen
        test    rax, rax
        je      .LBB0_2
        mov     rdi, r14
        mov     rsi, rbx
        mov     rdx, rax
        call    memcpy
.LBB0_2:
        add     rsp, 8
        pop     rbx
        pop     r14
        ret

C++ member functions

Because all non-static C++ methods carry an implicit this-pointer, every member can pretty much alias every argument (of a compatible type per the strict aliasing rule).

Want an example?

class XorCrypt
{
public:
	void Encrypt(char* buffer);

private:
	char m_key;
};

Even though char m_key and char* buffer might look like they cannot alias each other, don’t forget that each access to m_key is essentially this->m_key.
What does operator-> tell you?

Indeed, that’s a char* hidden in plain sight.
Everytime you hear somebody say that members in a loop will be kept in a register by the compiler anyway, as long as they aren’t written to: it’s one of the most common misconceptions I’ve heard in the last years over and over again.

But fret not, we can do something about that in this case too: most compilers also accept a restrict qualifier on methods. Pierre Terdiman has a short example for MSVC on his blog.

Takeaway

Beware of pointer aliasing, use local variables, judiciously use restricted pointers, and check the generated assembler code from time to time.


#cpp Tip 2: You don’t need to check for nullptr before using delete/delete[] on an object/array. No you really don’t. (Twitter)

The C++ standard mandates that calling delete/delete[] on nullptr is okay, which is implemented correctly by all major compilers nowadays. There used to be compilers that got it wrong, but that was ~10 years ago.

Nowadays you don’t need SAFE_DELETE macros and I also firmly believe that setting a pointer to null after deleting it just hides the symptoms, but doesn’t fix the underlying ownership problem(s). And the name SAFE_DELETE is a misnomer. And you shouldn’t use a macro for such things.

Takeaway

Have clear ownership rules, don’t hide symptoms but fix root causes instead.


#cpp Tip 3: Don’t use std::vector<bool>. (Twitter)

Do not use it.
Maybe it will be deprecated in the future?
Maybe the next C++ standard gets a dynamic_bitset or similar?


#cpp Tip 4: Build your own virtual memory-aware std::vector replacement and use the 64bit address space. Less copying&better grow behaviour. (Twitter)

As you probably know, an std::vector is a dynamic array that can grow & shrink at runtime. Whenever the vector needs to grow (e.g. in push_back) because its allocated block of memory is not large enough to hold another element, it needs to do the following:

  1. Allocate a new block of memory that has space for e.g. 2*N elements (the exact growth behaviour is implementation-defined, but growing by a factor of 2 is reasonable).
  2. Copy/move existing elements into the new storage.
  3. Copy/move the new element (e.g. in push_back) to the new storage.
  4. Destruct the old elements.
  5. Free the old block of memory.

This has several disadvantages:

  • Depending on the kind of elements held by the vector, copying/moving/destructing a lot of them is not a cheap operation.
  • Pointers to existing elements are invalidated, so you are never allowed to store a pointer to e.g. &myVector[5] in case the vector might need to grow.
  • The vector temporarily needs space for 3*N (N + 2*N) elements, which can cause out-of-memory issues even though you’re only going to use 2*N elements in the end.
  • Not calling reserve/resize leads to memory fragmentation due to several small allocations.

Making use of virtual memory and the abundance of address space available in 64-bit applications, we can craft our own vector-like replacement with none of these disadvantages.

First, we’re going to reserve address space for the expected worst case of memory needed by the vector. If you think about it, each and every single one of your vectors/arrays/whatever has some intrinsic upper bound of memory needed, however large that may be. On consoles, the upper bound could be the maximum amount of memory that can be allocated by your process – you can never allocate more than that anyway. More realistically speaking, your code probably already knows about expected worst case scenarios for e.g. game objects, components, etc., so we can use that instead.

Note that in this first step, we only reserve address space, but don’t commit any physical pages just yet.

Now, whenever our vector needs to grow, we just grab more pages from the OS’ virtual memory system in the address space we previously reserved. We are guaranteed to get memory at that address as long as there is enough physical memory left to allocate from.
In shrink_to_fit, we can simply return/decommit as many pages as possible to the OS.

With such a setup, we never have to copy or move elements when growing the vector. All the existing elements stay where they are, which means we could even store pointers to them if we wanted. Also note that we never need space for 3*N elements, but just for the amount of elements we’re going to store.

Of course, no technique is without disadvantages:

  • Depending on the page size of the OS (mostly 4k, but also 64k on certain systems) you may end up “losing” some memory due to the fact that our vector can only grow in page-size chunks. For example, consider a case where the vector needs to grow to make space for one more element – in that case, you end up allocating a new page from the OS, but only fill it with one element, rendering the rest of the page unusable for other allocations.
  • Committing physical pages for reserved address space is not cheap either, but should be faster than copying a lot of elements.
  • Only works for 64-bit applications. You’re going to run out of address space in 32-bit pretty fast.

If you liked this trick, Niklas Gray has more virtual memory tricks up his sleeve.

Takeaway

We can do better than a one-size-fits-all implementation by making use of the underlying hardware.

 

Advertisements

3 thoughts on “C++ programming tips

  1. Thanks for sharing! I don’t get to work in C++ very often, but when I do it’s usually performance related… little nuggets like this are awesome! The “nowaways calling delete on nullptr is fine” thing too, when you use a language only semi-sporadically you tend to stay on the “better safe than sorry” side for way too long.

  2. re tip #1 about , I have made it custom to always use the form

    for ( size_t i=0u, n=strlen(from); i<n; ++i ) { .. }

    (I believe I got this from Christer Ericson originally)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s