Thursday, November 5, 2015

Sanitize All The Things!

[Edited 2015-12-04 to add LSAN]

If your C/C++ project includes run-time tests (you do have tests, don't you?), then it's a really good idea to run the tests under all of the sanitizers that are available from the compiler:

  • Address Sanitizer (ASAN, -fsanitize=address): detect out-of-bounds memory accesses and use-after-free.
  • Memory Sanitizer (MSAN, -fsanitize=memory): detect reads of uninitialized memory.
  • Undefined Behaviour Sanitizer (UBSAN, -fsanitize=undefined): detect undefined behaviour such as over-wide shifts, signed integer overflow, out-of-bounds array indexing and misaligned pointer accesses.
  • Thread Sanitizer (TSAN, -fsanitize=thread): detect data races in multi-threaded programs.
  • Leak Sanitizer (LSAN, -fsanitize=leak): detect memory leaks (recent Clang versions enable this by default when ASAN is used).

These tools have been available in Clang for a while, but more recent versions of GCC also include some of them:

Compiler version  ASAN  MSAN  UBSAN  TSAN  PPA / package
gcc-4.4           N     N     N      N
gcc-4.5           N     N     N      N
gcc-4.6           N     N     N      N
gcc-4.7           N     N     N      N     ubuntu-toolchain-r / gcc-4.7
gcc-4.8           Y     N     N      Y     ubuntu-toolchain-r / gcc-4.8
gcc-5             Y     N     Y      Y     ubuntu-toolchain-r / gcc-5
clang-3.4         Y     Y     Y      Y
clang-3.5         Y     Y     Y      Y     ubuntu-toolchain-r-test, llvm-toolchain-precise-3.5 / clang-3.5*
clang-3.6         Y     Y     Y      Y     ubuntu-toolchain-r-test, llvm-toolchain-precise-3.6 / clang-3.6*
clang-3.7         Y     Y     Y      Y     ubuntu-toolchain-r-test, llvm-toolchain-precise-3.7 / clang-3.7*
clang-3.8         Y     Y     Y      Y     ubuntu-toolchain-r-test, llvm-toolchain-precise / clang-3.8*

*: Only one of these clang versions can be installed at a time, so you need to pick a single version for your build. (The gcc packages allow multiple gcc-<ver> binaries to be installed in parallel.)
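GCC and Clang also differ in how code can tell, at compile time, which sanitizer it is being built with. This is handy for a test suite that wants to confirm it really is running in the intended sanitizer build. What follows is a minimal sketch. It assumes Clang's `__has_feature()` and GCC's predefined `__SANITIZE_ADDRESS__` / `__SANITIZE_THREAD__` macros. `__SANITIZE_THREAD__` only appears in GCC releases newer than those in the table above. The `SAN_*_ON` macros and `built_with_*()` helpers are hypothetical names, and UBSAN is not covered here.

```c
#include <stdbool.h>

/* Clang provides __has_feature(); older GCC does not, so fall back to 0. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* SAN_*_ON are illustrative (hypothetical) names, not a standard API. */

/* ASAN: Clang's feature test, or the macro GCC defines under -fsanitize=address. */
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#define SAN_ASAN_ON 1
#else
#define SAN_ASAN_ON 0
#endif

/* MSAN is Clang-only. */
#if __has_feature(memory_sanitizer)
#define SAN_MSAN_ON 1
#else
#define SAN_MSAN_ON 0
#endif

/* TSAN: assumption, GCC only defines __SANITIZE_THREAD__ in newer releases. */
#if __has_feature(thread_sanitizer) || defined(__SANITIZE_THREAD__)
#define SAN_TSAN_ON 1
#else
#define SAN_TSAN_ON 0
#endif

bool built_with_asan(void) { return SAN_ASAN_ON; }
bool built_with_msan(void) { return SAN_MSAN_ON; }
bool built_with_tsan(void) { return SAN_TSAN_ON; }
```

A CI job that is meant to be the ASAN build could, for example, fail immediately if `built_with_asan()` returns false. That catches a build script that silently dropped the flag.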

There are a couple of things to be aware of when pointing the sanitizers at your code:

  • MSAN expects that all of the code that makes up your test executable has been compiled with -fsanitize=memory. If it hasn't (for example, if you link in some system-provided library from /usr/lib), there will be lots of false-positive uninitialized-memory-read errors, because MSAN misses the memory initialization done by the uninstrumented library. If your project has a lot of external dependencies, this may make MSAN much harder to use.
  • UBSAN can be a little overwhelming when applied to older C codebases, so you might need to replace the overall -fsanitize=undefined with a subset of the individual checks.
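For the MSAN caveat, suppose rebuilding every dependency with -fsanitize=memory isn't practical. In that case, one workaround is to tell MSAN explicitly that the memory written by an uninstrumented call is now initialized. What follows is a minimal sketch. It assumes Clang's `__msan_unpoison()` from `<sanitizer/msan_interface.h>`. `external_fill()`, `fill_buffer()` and `MARK_INITIALIZED` are hypothetical names, and `external_fill()` merely stands in for the library call. Instrumenting the dependencies remains the more thorough fix, since each such wrapper is a spot where a genuine uninitialized read could be hidden.

```c
#include <stddef.h>

#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(memory_sanitizer)
#include <sanitizer/msan_interface.h>
/* Tell MSAN that bytes [p, p + n) are now initialized. */
#define MARK_INITIALIZED(p, n) __msan_unpoison((p), (n))
#else
/* Not an MSAN build: nothing to do. */
#define MARK_INITIALIZED(p, n) ((void)(p), (void)(n))
#endif

/* Hypothetical stand-in for a routine in an uninstrumented library
 * (e.g. one linked from /usr/lib). Compiled here, MSAN would actually
 * see these writes; in the real case it would not. */
static void external_fill(unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)i;
}

/* Hypothetical wrapper: call the library, then unpoison the bytes it writes. */
void fill_buffer(unsigned char *buf, size_t n) {
    external_fill(buf, n);
    MARK_INITIALIZED(buf, n);
}
```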
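For the UBSAN caveat, individual checks can be enabled by name instead of the whole -fsanitize=undefined group. For example, -fsanitize=shift,bounds,signed-integer-overflow leaves out alignment for a first pass. Many of the reports also have small mechanical fixes. What follows is a minimal sketch of one such fix. It assumes the report is a misaligned read like the `*pi` dereference in the test program below: copy the bytes with memcpy() instead of dereferencing a cast pointer. The `load_u32()` name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* load_u32() is an illustrative (hypothetical) helper.
 * It reads a 32-bit value, in native byte order, from a possibly misaligned
 * address. Dereferencing a cast such as *(uint32_t *)p is undefined when p is
 * not suitably aligned, and UBSAN's alignment check reports it. memcpy() has
 * no alignment requirement, and compilers typically turn it into a plain load. */
uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```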

Finally, to help confirm that your sanitizer builds are correctly catching errors, here's a test program that has every error under the sun:

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void *inc_x(void *p) {
  int *px = (int*)p;
  (*px)++;
  return NULL;
}

int main(void) {
  char *p = malloc(10);
  p[5] = '\0';
  if (p[2] == '\0')  /* MSAN: uninitialized memory read */
    printf("found null\n");

  p[11] = '\0'; /* ASAN: heap-buffer-overflow */

  int i = 23;
  i <<= 32;  /* UBSAN: shift overflow */
  char array[3] = "ab";
  printf("one step beyond: %c\n", array[4]); /* UBSAN: index out of bounds */
  char data[5] = {0x00, 0x01, 0x02, 0x03, 0x04};
  int *pi = (int*)&(data[1]);
  printf("int %08x\n", (unsigned)*pi); /* UBSAN: misaligned address */

  pthread_t t;
  int x = 0;
  printf("x=%d\n", x);
  pthread_create(&t, NULL, inc_x, &x);
  x = 3; /* TSAN: data race */
  pthread_join(t, NULL);
  printf("x=%d\n", x);

  p = NULL; /* LSAN: leak p */
  return 0;
}
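ASAN and MSAN stop at the first error they report by default, and ASAN, MSAN and TSAN cannot be combined in one binary. As a result, each sanitizer build of the program above only exercises part of it. In an ASAN build, for example, the heap overflow aborts the program before the leak check that runs at exit. The sketch below shows one way around this. Each error lives in its own function, selected by name, so every build can be pointed at just the error it should catch. The function names and the `run_check()` dispatcher are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Thread body for the TSAN check: increments the shared int. */
static void *inc_x(void *p) {
    int *px = (int *)p;
    (*px)++;
    return NULL;
}

static void check_msan(void) {
    char *p = malloc(10);
    if (p == NULL)
        return;
    p[5] = '\0';
    if (p[2] == '\0')      /* MSAN: uninitialized memory read */
        printf("found null\n");
    free(p);
}

static void check_asan(void) {
    char *p = malloc(10);
    if (p == NULL)
        return;
    p[11] = '\0';          /* ASAN: heap-buffer-overflow */
    free(p);
}

static void check_ubsan(int shift) {
    int i = 23;
    i <<= shift;           /* UBSAN: shift exponent too large when shift == 32 */
    printf("i=%d\n", i);
}

static void check_tsan(void) {
    pthread_t t;
    int x = 0;
    if (pthread_create(&t, NULL, inc_x, &x) != 0)
        return;
    x = 3;                 /* TSAN: data race with inc_x() */
    pthread_join(t, NULL);
    printf("x=%d\n", x);
}

static void check_lsan(void) {
    char *p = malloc(10);
    /* Printing p lets the pointer escape, so the allocation is not optimized away. */
    printf("leaking %p\n", (void *)p);
}                          /* LSAN: p goes out of scope without free() */

/* Hypothetical dispatcher: runs the check called `name`.
 * Returns 0 if the name was recognised, -1 otherwise. */
int run_check(const char *name) {
    if (strcmp(name, "asan") == 0)
        check_asan();
    else if (strcmp(name, "msan") == 0)
        check_msan();
    else if (strcmp(name, "ubsan") == 0)
        check_ubsan(32);
    else if (strcmp(name, "tsan") == 0)
        check_tsan();
    else if (strcmp(name, "lsan") == 0)
        check_lsan();
    else
        return -1;
    return 0;
}
```

A `main()` that calls `run_check(argv[1])` could then be built once per sanitizer. Each build would be run with the matching argument (e.g. `lsan` for the LSAN or ASAN build), checking that the expected report appears. LSAN scans memory conservatively, so a stale copy of a pointer left in a register or stack slot can occasionally hide a leak in either version of the program.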
