
Testing with LibFuzzer¶

In this lab, we take the test drivers for TinyXML2 from Lab Activity: Harnessing with tinyxml and convert them to libFuzzer-style test drivers. We then discuss how to package and run libFuzzer test drivers with and without ASAN, how to build standalone binaries alongside them, and the advantages of different fuzz targets in Mayhem.

Estimated Time: 10 minutes

Objectives

By the end of this lesson, you will be able to:

Explain why libFuzzer test drivers are useful.
Describe how to set up a libFuzzer test driver.
Convert an existing test driver to a libFuzzer test driver.
Compile and run a libFuzzer test driver.
Build libFuzzer targets with ASAN.
Combine libFuzzer with symbolic execution.

Prerequisites

You will need the following:

clang package or clang++ binary, version 6.0 or higher.

Why LibFuzzer?¶

LibFuzzer test drivers are test drivers built from source code that exercise the target functionality and at minimum export a primary fuzz function with the prototype that libFuzzer expects. LibFuzzer uses in-process coverage-guided fuzzing, which leads to a high number of executions per second as compared to standard fuzzing. Mayhem supports libFuzzer targets, giving you both fine-grained control over what gets fuzzed, as well as improved performance.
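To make "in-process" concrete, here is a rough sketch of the idea: the fuzzing loop and the target live in the same process, so each execution is a function call rather than the launch of a new program. This is a toy illustration and not libFuzzer's actual implementation. The names toy_target, mutate, and run_in_process are hypothetical, and real libFuzzer uses coverage feedback to guide its mutations rather than flipping random bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the code under test: counts '<' bytes.
size_t toy_target(const uint8_t *data, size_t size) {
    size_t count = 0;
    for (size_t i = 0; i < size; ++i) {
        if (data[i] == '<') ++count;
    }
    return count;
}

// Overwrites one random byte. Real fuzzers use much richer,
// coverage-guided mutation strategies.
void mutate(std::vector<uint8_t> &input, std::mt19937 &rng) {
    if (input.empty()) return;
    size_t pos = rng() % input.size();
    input[pos] = static_cast<uint8_t>(rng() & 0xff);
}

// Calls the target `iterations` times inside this process and
// returns how many executions were performed.
size_t run_in_process(size_t iterations, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<uint8_t> input = {'<', 'a', '/', '>'};
    size_t executions = 0;
    for (size_t i = 0; i < iterations; ++i) {
        mutate(input, rng);
        toy_target(input.data(), input.size());  // plain function call
        ++executions;
    }
    return executions;
}
```

Because no process is created per input, the per-execution overhead is roughly that of one function call, which is where the high executions-per-second figure comes from.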

LibFuzzer Interface¶

The minimal interface required to create a libFuzzer test driver is one function named LLVMFuzzerTestOneInput that will invoke the specific target component or function you want to fuzz. The required prototype has two arguments:

A pointer to a buffer of bytes
A size_t describing its size.

When the fuzzer executes, this function will be called over and over again with different inputs passed via the data argument. A real-world example test driver from Google's OSS-Fuzz project is shown below (comments added for clarity).

Note

There is no main function: libFuzzer links in its own main, and the build will fail if your test driver already defines one.

#include <stdint.h>
#include <string.h>  // memcpy
#include "libknot/libknot.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
  uint8_t copy[size];
  memcpy(copy, data, size); // make a non-const copy of the fuzz data
  // create the necessary data structure
  knot_pkt_t *pkt = knot_pkt_new(copy, size, NULL);
  if (pkt != NULL) {
    knot_pkt_parse(pkt, 0);  // the targeted function
    knot_pkt_free(pkt);  // clean up the structure
  }

  return 0;
}

This test driver will yield a binary that fuzzes the function knot_pkt_parse() while doing the minimum amount of setup and cleanup per fuzz iteration (and is much faster than having to bring up an entire DNS server to fuzz just this one function).

Converting an Existing Test Driver¶

If we look back at the fuzz-gcr-harness solution from our previous lab (shown below), we can see that the main function reads the file named on the command line, and that this is already split out from the function that passes the fuzz data to the target.

This is a good way to structure a test driver even if you’re not using libFuzzer, but it also means that we already have a function resembling the LLVMFuzzerTestOneInput function.

#include <unistd.h>
#include <fcntl.h>
#include "tinyxml2.h"

using namespace tinyxml2;

int fuzz_gcr(char *p, ssize_t length)
{
    *p = '&';
    *(p+1) = '#';
    char buf[10] = { 0 };
    int len = 0;
    char* adjusted = const_cast<char*>( XMLUtil::GetCharacterRef( p, buf, &len ) );

    return 0;
}

int main(int argc, char **argv)
{
    const size_t pbufsize = 20;
    char pbuf[pbufsize+1] = {0};

    if (argc < 2) return 1;  // expect the input file as argv[1]
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) return 1;
    ssize_t bytes_read = read(fd, pbuf+2, pbufsize-2); // p[0], p[1] will be "&#"

    if (bytes_read > 0) {
        fuzz_gcr(pbuf, bytes_read);
    }

    return 0;
}

gcr-harness2-solution.cpp

Since libFuzzer test drivers pass fuzz input via the data parameter and not via the command line, we can just drop the main function for now. Then we rename the fuzz_gcr function and adjust the parameter names and types a little bit, and we’re done:

// gcr-libfuzzer.cpp
#include "tinyxml2.h"
using namespace tinyxml2;

extern "C" int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size)
{
    char buf[10] = { 0 };
    int len = 0;
    XMLUtil::GetCharacterRef( (char *)data, buf, &len );

    return 0;
}

gcr-libfuzzer.cpp

Also notice the added extern "C" declaration. Because this is a C++ file, the compiler would otherwise mangle the function's name, and the linker could not match it to the LLVMFuzzerTestOneInput symbol that libFuzzer expects.
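One way to keep this tidy in larger drivers is to write the fuzz logic as an ordinary C++ function and expose only a thin extern "C" entry point. The sketch below is self-contained for illustration: fuzz_one_input is a hypothetical name, and its body is a stand-in for a call into the real target.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Ordinary C++ linkage: the compiler emits a mangled symbol name for
// this function. Hypothetical helper, for illustration only.
int fuzz_one_input(const std::string &text) {
    // Stand-in for calling the real target with the fuzz data.
    return text.find("&#") != std::string::npos ? 1 : 0;
}

// C linkage: the symbol is emitted as plain "LLVMFuzzerTestOneInput",
// which is the name libFuzzer's built-in main links against.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    fuzz_one_input(std::string(reinterpret_cast<const char *>(data), size));
    return 0;  // report success for every input
}
```

Only the entry point needs C linkage; everything it calls can use normal C++ features such as std::string or templates.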

Compiling and Running Our First LibFuzzer Target¶

Now that we have the test driver, we just build it with a special flag for clang++ version 6 or greater (it was supported by earlier versions if you compile from source, but we'll focus on newer versions for now).

Download tinyxml.tgz if you don't have it from the previous lab and save the new test driver as gcr-libfuzzer.cpp in the extracted directory. Then build with the following command:

clang++ -fsanitize=fuzzer gcr-libfuzzer.cpp tinyxml2.cpp -o gcr-fuzzer

This outputs a standalone fuzzer binary, which you can test by invoking it on the command line (./gcr-fuzzer). It should immediately print a stream of status output and will run until you press CTRL+C. We won't focus on the contents of that output, because we're going to let Mayhem handle running the fuzzer, saving crashes, triaging them, and presenting the results, instead of doing that manually.

Packaging and running libFuzzer targets is super straightforward because when you do mayhem package, the CLI will autodetect if the target is compiled with libFuzzer and configure the Mayhemfile appropriately. Run the following commands to get your first libFuzzer job running (optionally, check out the Mayhemfile to see the differences):

mayhem package gcr-fuzzer -o gcr-libfuzzer-package
mayhem run gcr-libfuzzer-package

To see the speed advantage, compare the number of executions per second of the original solution version against the libFuzzer version. Keep in mind that we're fuzzing a very small target function, so the speed difference is magnified, but you should expect to see at least a 10x increase in executions/sec.

Building LibFuzzer Targets with ASAN¶

AddressSanitizer (ASAN) is a fast memory error detector which usually is straightforward to add into libFuzzer targets. ASAN and libFuzzer do not depend on each other, and targets can be compiled with one and not the other, but they can be combined to allow us to fuzz at high speed and detect subtle memory errors that otherwise might be difficult to detect.

To build with ASAN, just add -fsanitize=address when compiling with clang or gcc, or you can use the shortcut and combine the -fsanitize flags like below. I also always recommend compiling with -g or -gline-tables-only when compiling with ASAN, because having that debug information makes pinpointing crashes faster.

clang++ -fsanitize=fuzzer,address gcr-libfuzzer.cpp tinyxml2.cpp -g -o gcr-fuzzer-asan

If you run ./gcr-fuzzer-asan locally, you should see an ASAN crash pretty much immediately. There is a lot of output to wade through, but if we focus on and around the lines that are highlighted by default (when printed to the terminal) we can get the basic idea:

==584==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000b1 at pc 0x0000005a95c3 bp 0x7ffcf6523960 sp 0x7ffcf6523958
READ of size 1 at 0x6020000000b1 thread T0
    #0 0x5a95c2 in tinyxml2::XMLUtil::GetCharacterRef(char const*, char*, int*) /host/tinyxml2-2.0.1/tinyxml2.cpp:335:10
[...snip...]
0x6020000000b1 is located 0 bytes to the right of 1-byte region [0x6020000000b0,0x6020000000b1)
allocated by thread T0 here:
    #0 0x5626d2 in __interceptor_malloc /src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145
    #1 0x5a6919 in LLVMFuzzerTestOneInput /host/tinyxml2-2.0.1/gcr-libfuzzer2.cpp:10:38
    #2 0x46d590 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:529
    #3 0x477995 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora( [...snip...]

It seems that our target function reads just past the end of a buffer allocated during the libFuzzer fuzz loop... but before we get excited about finding a bug, we need to understand the target and make sure we didn't accidentally violate any expectations that the target library has on callers of that function. This is an important distinction to make, because in this case this is a subtle problem in how we wrote the test driver, and not the library.

Source-based test driving gives you complete control over fuzzing, but it also puts the responsibility on the test driver writer to understand the target and not violate valid/documented assumptions that may not be checked, especially if you're fuzzing internal functions. Discerning the difference between an unchecked assumption that is valid (like it is in this case) vs an invalid assumption that can be violated in the normal operation of the code is difficult to automate and typically requires applying knowledge of the target code.

A good question to ask when trying to make this determination is if the assumption can be violated from functions that are on the attack surface of the target, such as any code that accepts potentially untrusted input (LoadFile or Parse for TinyXML2).

In this case, we violated a common code construct: an implicit agreement between this internal function and how it is called within the target library. If we look at the line information in the ASAN output above (since we compiled with -g), we can find the exact source line where the out-of-bounds read occurs: in tinyxml2.cpp on line 335, column 10:

if ( *(p+1) == '#' && *(p+2) ) {

If we go back to the places where GetCharacterRef is called (there's only one, line 219 of the same file), we see that the caller checks the values of *p and *(p+1) before calling GetCharacterRef. That guarantees the first two bytes are not zero, and therefore that the buffer is at least two characters long. There are a couple of ways to fix our test driver so it respects these preconditions. To save ourselves another iteration of debugging, note that (as you can learn by experimentation or by reading the source) this internal TinyXML2 function also expects the input parameter p to be null-terminated, so we also need to make sure the input ends with a zero byte. Putting all of this together, we end up with something that looks like this:

// gcr-libfuzzer2.cpp
#include <stdlib.h>  // malloc, free
#include <string.h>  // memcpy
#include "tinyxml2.h"
using namespace tinyxml2;

extern "C" int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size)
{
    char buf[10] = { 0 };
    int len = 0;

    if (size < 3)
        return 0;

    char *terminated_data = (char *) malloc(size+1);
    memcpy(terminated_data, data, size);
    terminated_data[size] = 0;

    XMLUtil::GetCharacterRef(terminated_data, buf, &len );

    free(terminated_data);
    return 0;
}

Notice that we are allocating a new buffer to contain the contents (and freeing it at the end), which incurs the cost of a memcpy, but respects the const nature of the data argument.
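If managing malloc and free by hand feels error-prone (an early return placed between them would leak the buffer), one alternative is to hold the copy in an RAII container. The sketch below assumes a stand-in function, consume_c_string, in place of XMLUtil::GetCharacterRef so that it compiles on its own; both consume_c_string and terminated_copy are hypothetical names. It relies on the fact that std::string always keeps a terminating zero byte after its contents.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for a target that expects a null-terminated
// string, like the internal function discussed above.
static size_t consume_c_string(const char *p) {
    return std::strlen(p);
}

// Returns an owned copy of the fuzz data. std::string stores a '\0'
// after its contents and frees the buffer automatically.
static std::string terminated_copy(const uint8_t *data, size_t size) {
    return std::string(reinterpret_cast<const char *>(data), size);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 3)
        return 0;  // same minimum-length guard as the driver above

    std::string copy = terminated_copy(data, size);
    consume_c_string(copy.c_str());  // copy.c_str()[size] is '\0'
    return 0;  // copy is released here, on every return path
}
```

The cost is the same single copy per iteration as the malloc version; the difference is only in who is responsible for releasing it.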

Let's also quickly discuss the out-of-bounds read, because it may be concerning to some users that we weren't catching that earlier. In this case, that read was not causing any crashes, nor was it likely to, because the conditions required to make that out-of-bounds read turn into a crash would likely never occur. That being said, fuzzing with ASAN is highly recommended if possible, precisely because it does catch very subtle errors like this, which in some cases could be problematic (Heartbleed is a good example of a bug where an out-of-bounds read had serious consequences).

With those fixes in place, we can go ahead and package and run the ASAN build of our libFuzzer target with Mayhem just as we did the previous targets. If you open up the Mayhemfile, you'll notice the cmd line and the lines below it are different than a standard binary: no @@ in the command line and the addition of libfuzzer: true and sanitizer: true, which were autodetected based on this target. Lastly, if you notice a drop in execs/sec on the ASAN target, it's because while ASAN is fast, it does come with a performance penalty, so it's up to you whether or not to include it in your libFuzzer targets.

If you need to do additional debugging on ASAN targets, we recommend compiling with -g to get debug information, and then using gdb to set a breakpoint right before ASAN reporting happens with b __asan::ReportGenericError.

Converting the Parse Test Driver¶

At this point we are left with the task of converting the original TinyXML2 test driver. It's good to practice converting test drivers, so we encourage users to look at what needs to change.

Question

LoadFile won't be appropriate, but what does that function do under the hood?

The end result follows a similar pattern: use the data variable to populate an XMLDocument, while making sure you don't leak any memory or violate any assumptions about string processing.

If you want the practice, stop reading now and see if you can write a libFuzzer test driver that finds the same bug that we did in the previous lab. ASAN is not required to find this bug.

If you need a little help, take a peek at the reference solution.

You should also try the flag -close_fd_mask=1 with your new libFuzzer binary to suppress all of the output that would normally be printed to stdout by the target.

Combining LibFuzzer with Symbolic Execution¶

While running a libFuzzer test driver allows faster fuzzing, symbolic execution will not be performed on libFuzzer targets due to the way they generate and execute inputs (you can see this in the UI under Types of Analysis Run). So in order to get the best of both worlds, we can build a normal binary to enable Mayhem to use both symbolic execution and fuzzing.

The "standalone" or "normal" binary is simply the same kind of binary we built in the Intro to Test Drivers Lab: one that takes a filename on the command line as input. This format allows Mayhem to use both fuzzing and symbolic execution, but it can't be fuzzed as fast as a libFuzzer test driver.

There are multiple approaches for packaging multiple builds; as long as you're building from the same TinyXML2 source code, you could re-use the collapse test driver from the Introduction to Test Drivers Lab and just add a command to the Mayhemfile, like below:

Note

orig-collapse is the name of the original test driver binary, and it was manually put in the same directory as the libFuzzer target within the package directory.

project: parse-fuzzer
target: parse-fuzzer

duration: 90

cmds:
  - cmd: /host/tinyxml2-2.0.1/parse-fuzzer
    libfuzzer: true
  - cmd: /host/tinyxml2-2.0.1/orig-collapse @@

Alternatively, you can use compiler macros to include a main function into the same test driver file when not building a libFuzzer target and build both simultaneously with a Makefile as demonstrated below:

.PHONY: all clean

all: parse-fuzzer parse-standalone

parse-fuzzer: parse-combined.cpp tinyxml2.cpp
	clang++ -fsanitize=fuzzer,address parse-combined.cpp tinyxml2.cpp -g -o parse-fuzzer

parse-standalone: parse-combined.cpp tinyxml2.cpp
	clang++ -DSTANDALONE parse-combined.cpp tinyxml2.cpp -g -o parse-standalone

clean:
	rm -f parse-fuzzer parse-standalone

Makefile

// parse-combined.cpp
// libFuzzer build: clang++ -fsanitize=fuzzer parse-combined.cpp tinyxml2.cpp -g -o parse-fuzzer
#include "tinyxml2.h"
using namespace tinyxml2;

extern "C" int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size)
{
    XMLDocument doc(true, COLLAPSE_WHITESPACE);

    doc.Parse((char *)data, size);

    doc.Print();
    if (doc.Error()) {
        doc.PrintError();
    }

    return 0;
}

// LibFuzzer includes its own main
#ifdef STANDALONE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char** argv)
{
    if (argc != 2) {
        printf("USAGE: %s <INPUT_FILE>\n", argv[0]);
        return -1;
    }

    int fd = open(argv[1], 0);
    if (fd < 0) {
        printf("ERROR: couldn't open %s\n", argv[1]);
        return -2;
    }

    #define bufsize 0x4000  // buffer size choice can be adjusted
    unsigned char buf[bufsize];  // byte buffer (not an array of pointers)
    ssize_t bytes_read = read(fd, buf, bufsize-1);
    if (bytes_read == -1) {
        printf("ERROR: read() failed on %s\n", argv[1]);
        return -3;
    }

    return LLVMFuzzerTestOneInput((const unsigned char*)buf, bytes_read);
}
#endif // STANDALONE

parse-combined.cpp
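The fixed-size buffer in the standalone main silently truncates inputs larger than the buffer. If that matters for your target, one option is to read the whole file into a std::vector before calling the fuzz entry point. In the sketch below, read_file and run_one_file are hypothetical names, and LLVMFuzzerTestOneInput is a stub so the block compiles on its own; in a combined driver, the main under #ifdef STANDALONE could call run_one_file(argv[1]) instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Stub standing in for the real fuzz entry point so this sketch
// compiles by itself; in practice this is the TinyXML2 driver.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    return 0;
}

// Reads the entire file into memory. Returns false if it cannot be opened.
static bool read_file(const char *path, std::vector<uint8_t> &out) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// Feeds one input file to the fuzz entry point; returns -1 on I/O failure.
static int run_one_file(const char *path) {
    std::vector<uint8_t> bytes;
    if (!read_file(path, bytes))
        return -1;
    return LLVMFuzzerTestOneInput(bytes.data(), bytes.size());
}
```

Whether the extra allocation is worthwhile depends on how large your test cases get; for small XML inputs the fixed buffer is usually sufficient.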

Note that we added ASAN to the libFuzzer target, which is optional, but not to the standalone binary, because ASAN is incompatible with Mayhem's symbolic execution analysis. With the combined build setup, you can package either target and then manually add the files into the root folder and update the Mayhemfile as above before invoking mayhem run on the package.

# make both binaries
make

# package one of them; in this case we use the libFuzzer target
mayhem package parse-fuzzer -o combo-package

# copy in the other; your path inside the root dir will vary
cp parse-standalone combo-package/root/host/tinyxml2-2.0.1/

# modify the Mayhemfile to add in the second command
# I added the following line to the end (with two leading spaces, no quotes):
# "  - cmd: /host/tinyxml2-2.0.1/parse-standalone @@"
vim combo-package/Mayhemfile

# We're ready to upload
mayhem run combo-package

✏️ Summary and Recap¶

In this lesson, we covered how to translate source test drivers into libFuzzer test drivers, how to run them in Mayhem, as well as how to build with ASAN and run Mayhem on multiple binaries in the same package. The method of combining targets into the same package to take advantage of libFuzzer, ASAN, and symbolic execution is considered best practice, along with including a good starting test suite.

I learned how to...

1. Explain why libFuzzer test drivers are useful.

LibFuzzer test drivers are test drivers built from source code that exercise the target functionality and at minimum export a primary fuzz function with the prototype that libFuzzer expects. LibFuzzer uses in-process coverage-guided fuzzing, which leads to a high number of executions per second as compared to standard fuzzing.

2. Describe how to set up a libFuzzer test driver.

The minimal interface required to create a libFuzzer test driver is one function named LLVMFuzzerTestOneInput that will invoke the specific target component or function you want to fuzz. The required prototype has two arguments:
1. A pointer to a buffer of bytes
2. A size_t describing its size.

3. Convert an existing test driver to a libFuzzer test driver.

The following is a libFuzzer test driver:

// gcr-libfuzzer.cpp
#include "tinyxml2.h"
using namespace tinyxml2;

extern "C" int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size)
{
    char buf[10] = { 0 };
    int len = 0;
    XMLUtil::GetCharacterRef( (char *)data, buf, &len );

    return 0;
}

4. Compile and run a libFuzzer test driver.

Build the libFuzzer test driver with a special flag for clang++ version 6 or greater.
```
clang++ -fsanitize=fuzzer gcr-libfuzzer.cpp tinyxml2.cpp -o gcr-fuzzer
```

5. Build libFuzzer targets with ASAN.

To build with ASAN, just add -fsanitize=address when compiling with clang or gcc, or you can use the shortcut and combine the -fsanitize flags like below.
```
clang++ -fsanitize=fuzzer,address gcr-libfuzzer.cpp tinyxml2.cpp -g -o gcr-fuzzer-asan
```

6. Combine libFuzzer with symbolic execution.

While running a libFuzzer test driver allows faster fuzzing, symbolic execution will not be performed on libFuzzer targets due to the way they generate and execute inputs (you can see this in the UI under Types of Analysis Run). So in order to get the best of both worlds, we can build a normal binary to enable Mayhem to use both symbolic execution and fuzzing.

project: parse-fuzzer
target: parse-fuzzer

duration: 90

cmds:
  - cmd: /host/tinyxml2-2.0.1/parse-fuzzer
    libfuzzer: true
  - cmd: /host/tinyxml2-2.0.1/orig-collapse @@