C++ programming tips

About a year and a half ago I posted a few C++ tips on Twitter. Because not all of my blog’s readers are on Twitter and Twitter is not the best medium for archiving things, I decided to write a blog post instead, accumulating all tips in one place.
Additionally, this also allows me to go into more detail where necessary and comment on a few things noted on Twitter.

I will keep updating this post as I add more tips.

#cpp Tip 1: Do manual CSE (Common Subexpression Elimination). Don’t assume the compiler will do it. Most of the time (esp. in class methods), it can’t due to pointer aliasing. (Twitter)

#cpp Tip 1 example: for (i=0u; i<strlen(from);++i) { to[i] = from[i]; }. to and from are both (const)char* and can alias. check the disasm!

Pointer aliasing in general and in C++ member functions in particular is something that still seems to not be widely understood. If you have never heard of the terms pointer aliasing or strict aliasing rule, head over to this article from 2006 that serves as wonderful introduction to the topic.

Let’s take a closer look at the example code given above:

void Copy(char* to, const char* from)
{
	for (size_t i=0u; i < strlen(from); ++i)
	{
		to[i] = from[i];
	}
}

Note that even though from points to const char, it can still alias to, because both their types are compatible – const or not. This means that the compiler has to be conservative, assuming that a write to to could potentially write to from, implying that the result of a call to strlen(from) might have a different outcome each loop iteration.

In other words: the above loop has complexity O(N²), not O(N)!

Let us look at the assembler code generated by Clang, GCC, and MSVC, with full optimizations:

Clang 4.0.0:
Copy(char*, char const*):
        push    r15
        push    r14
        push    rbx
        mov     r15, rsi
        mov     r14, rdi
        mov     al, byte ptr [r15]
        test    al, al
        je      .LBB0_4
        mov     byte ptr [r14], al
        mov     rdi, r15
        call    strlen
        cmp     rax, 2
        jb      .LBB0_4
        mov     ebx, 1
.LBB0_3:
        movzx   eax, byte ptr [r15 + rbx]
        mov     byte ptr [r14 + rbx], al
        inc     rbx
        mov     rdi, r15
        call    strlen
        cmp     rbx, rax
        jb      .LBB0_3
.LBB0_4:
        pop     rbx
        pop     r14
        pop     r15
        ret
GCC 7.2:
Copy(char*, char const*):
        push    r12
        mov     r12, rdi
        push    rbp
        mov     rbp, rsi
        push    rbx
        xor     ebx, ebx
        jmp     .L3
.L6:
        movzx   eax, BYTE PTR [rbp+0+rbx]
        mov     BYTE PTR [r12+rbx], al
        add     rbx, 1
.L3:
        mov     rdi, rbp
        call    strlen
        cmp     rax, rbx
        ja      .L6
        pop     rbx
        pop     rbp
        pop     r12
        ret

VC2017 does the same, but inlines the call to strlen() which makes the generated assembly a bit longer and harder to read. All three compilers essentially evaluate strlen() in the inner loop.

But this is not the compilers’ fault. They can’t do better if they want to play by the rules. When compiling Copy() they can’t know whether code in a different translation unit might call this function with overlapping memory regions.
Yes, in theory (and things like link-time code generation or unity/blob/amalgamated builds) they could know by looking at all call sites of Copy(), but in practice this is not something compilers excel at.

The good news is that we can help the compiler in two ways: either by restrict-qualifying the pointers, or by using local variables.

The restrict keyword has been available in C for quite some time now, but hasn’t made its way into the C++ standard (yet?). Nevertheless, it is supported by all major compilers (__restrict or __restrict__).

In our case there is an even easier fix that doesn’t require any non-standard keywords: local variables.

void Copy(char* to, const char* from)
{
	const size_t length = strlen(from);
	for (size_t i=0u; i < length; ++i)
	{
		to[i] = from[i];
	}
}

With this simple change, we no longer care whether to and from alias – the code will only evaluate strlen() once. This of course implies that you have to be sure (or make sure!) that nobody ever calls Copy() with overlapping memory regions.

Looking at the generated assembler code, Clang and GCC will fully unroll and optimize the inner loop. Things are almost the same in MSVC, minus the unrolling.

When additionally restrict-qualifying the pointers, both Clang and GCC will realize that we’re pretty much just doing a memcpy() and come up with the following:

Copy(char* __restrict, char const* __restrict):
        push    r14
        push    rbx
        push    rax
        mov     rbx, rsi
        mov     r14, rdi
        mov     rdi, rbx
        call    strlen
        test    rax, rax
        je      .LBB0_2
        mov     rdi, r14
        mov     rsi, rbx
        mov     rdx, rax
        call    memcpy
.LBB0_2:
        add     rsp, 8
        pop     rbx
        pop     r14
        ret

C++ member functions

Because all non-static C++ methods carry an implicit this-pointer, every member can pretty much alias every argument (of a compatible type per the strict aliasing rule).

Want an example?

class XorCrypt
{
public:
	void Encrypt(char* buffer);

private:
	char m_key;
};

Even though char m_key and char* buffer might look like they cannot alias each other, don’t forget that each access to m_key is essentially this->m_key.
What does operator-> tell you?

Indeed, that’s a char* hidden in plain sight.
Everytime you hear somebody say that members in a loop will be kept in a register by the compiler anyway, as long as they aren’t written to: it’s one of the most common misconceptions I’ve heard in the last years over and over again.

But fret not, we can do something about that in this case too: most compilers also accept a restrict qualifier on methods. Pierre Terdiman has a short example for MSVC on his blog.

Takeaway

Beware of pointer aliasing, use local variables, judiciously use restricted pointers, and check the generated assembler code from time to time.

#cpp Tip 2: You don’t need to check for nullptr before using delete/delete[] on an object/array. No you really don’t. (Twitter)

The C++ standard mandates that calling delete/delete[] on nullptr is okay, which is implemented correctly by all major compilers nowadays. There used to be compilers that got it wrong, but that was ~10 years ago.

Nowadays you don’t need SAFE_DELETE macros and I also firmly believe that setting a pointer to null after deleting it just hides the symptoms, but doesn’t fix the underlying ownership problem(s). And the name SAFE_DELETE is a misnomer. And you shouldn’t use a macro for such things.

Takeaway

Have clear ownership rules, don’t hide symptoms but fix root causes instead.

#cpp Tip 3: Don’t use std::vector<bool>. (Twitter)

Do not use it.
Maybe it will be deprecated in the future?
Maybe the next C++ standard gets a dynamic_bitset or similar?

Advertisements

One thought on “C++ programming tips

  1. Thanks for sharing! I don’t get to work in C++ very often, but when I do it’s usually performance related… little nuggets like this are awesome! The “nowaways calling delete on nullptr is fine” thing too, when you use a language only semi-sporadically you tend to stay on the “better safe than sorry” side for way too long.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s