expert

Test Driving with tinyxml

In this lesson, we'll walk you through how to create a test driver for the tinyxml library.


Estimated Time: 10 minutes

By the end of this lesson, you will be able to:

  1. Build and package an example test driver with tinyxml.
  2. Turn on non-default options for the XMLDocument constructor.
  3. Fuzz a specific function in tinyxml.

You will need the following:

  • Basic familiarity reading and writing C code (only very basic C++ is used).
  • Tools: text editor, g++, Mayhem CLI, and a configured Mayhem instance.
  • Code: tinyxml2-2.0.1 available at: tinyxml.tgz.

Background

The simplest source-based test drivers are written for targets that have an obvious entry point, as is often the case for parsers. Having source is a big advantage, and being able to write a custom test driver gives you a lot of control over what you can test effectively, especially when the target is a library.

This tutorial will walk you through a very simple test driver for an XML library whose source is freely available: tinyxml2. We will demonstrate using a custom test driver to both fuzz the main entry point for the library as well as a specific function in the library. These concepts will make it clear how to write a test driver that allows Mayhem to test specific parts of code.

Writing a Test Driver for tinyxml

Extract the tinyxml tarball with:

tar xvf tinyxml.tgz

The tarball includes only files necessary to build the library. If you look around in the extracted directory, you'll see that there aren't a whole lot of files, which is really helpful when you're trying to figure out where to start!

Part #1: Writing your First Test Driver

Recall that we are writing a test driver so that we can send input to the target and test its functionality. We therefore need to think about what the target does and what its main entry point for input would be. In this case, we have an XML library, so we can presume that it takes input and parses that input into an internal representation of an XML document.

By reading the docs or the code itself, we can find that this library has a pretty straightforward interface for either loading from a file or from a char array, which makes writing a test driver to the parser very simple. Look at the skeleton code below to see what the test driver should look like.

// harness.cpp
#include "tinyxml2.h"
using namespace tinyxml2;

int main(int argc, char **argv)
{
    int retval = 0;
    // Your code goes here
    // TODO: Instantiate basic XML object
    // TODO: Parse content of argv[1] into XML object
    // TODO: Print the XML object or confirm parsing

    return retval;
}

Try to fill in the skeleton code above to implement the test driver we just discussed by writing code to do what each of the comments is asking for.

Open tinyxml2.cpp (or xmltest.cpp) and look for the relevant functions for loading a file for parsing (or parsing a char array into an XML object). We'll want to instantiate the most high-level object and pass fuzz input to it.

Tip

Look for functions with Load in the name.

In this case we will be supplying input from a file whose name we specify on the command line. We do this for convenience, even though when you're writing the test driver you have full control over how input gets passed to the target. For this tutorial we'll get input from a file whose name will be passed into the test driver as argv[1].
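If you choose the char-array route instead of loading by file name, the driver first has to get the file's bytes into memory. The following is a minimal standalone sketch of that step only. The helper name `read_file` is hypothetical and is not part of tinyxml2; it deliberately stops short of calling into the library so you can still fill in the skeleton yourself.

```cpp
// Standalone sketch: read the file named on the command line into memory.
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of tinyxml2): reads the whole file at `path`
// into `out`. Returns false if the file cannot be opened.
bool read_file(const char *path, std::string &out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    // Copy every byte from the stream into the string
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}
```

In a driver you would call `read_file(argv[1], contents)` after checking that `argc` is at least 2, then hand `contents.c_str()` to whichever parsing function you found in the library.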

Last, you should print the object out or otherwise confirm that the library successfully parsed the content you passed to it. This provides two benefits:

  1. It confirms that your code works.
  2. It exercises the target's printing functionality, so that code gets tested for bugs as well.

This library provides functionality to print out XML objects, which you should use to ensure a successful parse and see if something went wrong.

To build and test your test driver code, continue on to the next section.

Building the Test Driver

Now we just have to compile the test driver with the target library code linked in. For this library, that takes a single command:

g++ tinyxml2.cpp harness.cpp -o harness

You should test your compiled code on files that are valid XML and ones that are not valid XML.

Packaging for Mayhem

Now we create a package for testing with Mayhem. Type mayhem package to see what options the command takes; we recommend using -o as in the invocation below.

mayhem package -o /tmp/tinyxml2-harness/ harness

Optionally, give an actual XML file as an initial seed by copying the file into the testsuite directory in the root of your package directory. Mayhem would work even without this, but because this parser will discard things that don't look like XML, it speeds up the process to start with a valid XML file (even if the contents are as simple as <a></a>). Supplying a test case is done as shown below:

cp resources/utf8test.xml /tmp/tinyxml2-harness/testsuite/

Then use mayhem run on the package to start testing!

Note

The command below limits the duration of the test for this particular binary (just for time's sake in this tutorial).

mayhem run /tmp/tinyxml2-harness/ --duration 240

If your test driver works, you will probably see a crash (though it probably isn't a very exciting one) and can look at the output for that. Otherwise, it might be interesting to download a couple of the generated testcases and see what kind of inputs Mayhem generated. If you want to check your work, you can download the reference solution: harness-solution.cpp.

At this point Mayhem is covering the parsing functionality of the library, which touches a good bit of code. However, we can go deeper with a new test driver, and better test less battle-worn parts of the codebase. Example ideas for further testing would be to use the library to manipulate the XML or to turn on some non-default options. Let's work through the latter option.

Part #2: Turning on Non-default Options

Fire your text editor back up and change the XMLDocument constructor to use non-default options (there are only two options, and the default for processEntities is already favorably set).

Tip

The XMLDocument class is defined in tinyxml2.h and inherits from the class XMLNode. A helpful comment identifies the constructor, and you can find the values for the enum of interest in that same header file.

After making the change so that your XMLDocument constructor uses the non-default option, save your code with a new filename, then compile it and use the sample XML file below to demonstrate that the new option makes a difference. (The snippet below is also included in the tarball you downloaded, in the file named whitespace-test.xml.)

<a> This
    is &apos;  text  &apos; </a>
<b>  This is &apos; text &apos;
</b>
<c>This  is  &apos;

    text &apos;</c>
</element>

Once you've confirmed a difference between the new test driver and the previous version (there should be less whitespace), you should package again and re-run. You should also provide the above XML sample as a seed to help speed up the process of hitting the code the new test driver is designed to test. Refer to the previous examples and try it!
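To get a feel for the kind of difference to look for in the output, here is a standalone sketch of whitespace collapsing applied to a plain string. It is an illustration only: `collapse_ws` is a hypothetical function, it is not tinyxml2's implementation, and the library's exact rules may differ.

```cpp
// Standalone illustration of whitespace collapsing on a plain string.
#include <cctype>
#include <string>

// Hypothetical helper, NOT tinyxml2 code: drops leading and trailing
// whitespace and squeezes each internal run of whitespace to one space.
std::string collapse_ws(const std::string &in)
{
    std::string out;
    bool pending_space = false;
    for (unsigned char c : in) {
        if (std::isspace(c)) {
            // Remember the gap, but only if some text has been emitted
            pending_space = !out.empty();
        } else {
            if (pending_space) {
                out += ' ';
            }
            out += static_cast<char>(c);
            pending_space = false;
        }
    }
    return out;
}
```

Under these rules, text such as " This\n    is '  text  ' " comes out as "This is ' text '", which is the sort of reduction you should be able to spot when comparing the two drivers.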

While you aren't likely to find a crash, an encouraging indication that you are hitting new code would be to see a higher number of testcases at the end of the run. The completed test driver should look something like this: harness-collapse-solution.cpp.

Part #3: Fuzzing a Specific Function

One of the biggest benefits of having source code available is that you can more easily understand the codebase and zero in on specific functions to fuzz. Picking specific functions to fuzz is made easier by the fact that you can see how a target function is called in the codebase, and potentially modify the code to make it easier to fuzz.

If you followed the tutorial up to this point, you probably came across a crash in StrPair::GetStr(). If you then started looking at this function to see if there were any interesting pieces of it, you might notice that it calls a utility function XMLUtil::GetCharacterRef().

XMLUtil::GetCharacterRef() may not be the cause of the crash you saw earlier, but it can still be an interesting target to fuzz due to its proximity to problem code, and because it is a somewhat complex function that is reachable from the main entry point to the parsing code.

Test Driving the Target Function

The first thing we need to do is understand how the target function is called and if it relies on any global state that is outside the scope of the function. In this case understanding how it is called is very easy because it is only called in one place in the entire codebase:

tinyxml2.cpp (version 2.0.1), lines 217-219:

char buf[10] = { 0 };
int len;
p = const_cast<char*>( XMLUtil::GetCharacterRef( p, buf, &len ) );

What's nice about this usage is that the state of the arguments can be understood just by looking around the call site: p is the name of the pointer to the XML data throughout the codebase, and the other two arguments are initialized right above the call. Understanding how a function is called matters because you could easily cause a crash by supplying invalid arguments for buf and len, but such bugs would have little practical value: they would not be reachable through the library's normal functionality and may not reflect correct usage of the library.

If we look at the function itself, we can see that it only references local variables and the arguments, so we don't need to worry about initializing any global state before calling it (a good bonus!). And since two out of the three arguments are defined before the only call to the function, we really just need to pass in fuzz data. This is only a little bit different than the previous examples because we're still going to get data from argv[1], but this time we're going to read it into a buffer and pass it as the XML data into GetCharacterRef().

// gcr-harness.cpp
#include <unistd.h>
#include <fcntl.h>
#include "tinyxml2.h"

using namespace tinyxml2;

int fuzz_gcr(char *p, size_t length)
{
    // Your code goes here
    // set up local variables for the arguments buf and len
    // Call GetCharacterRef() with the fuzz data

    return 0;
}

int main(int argc, char **argv)
{
    // Your code goes here
    // Declare a local buffer
    // open argv[1]
    // read a reasonable number of bytes (e.g. 20-50) from the opened file
    // call fuzz_gcr() with the data read from the file

    return 0;
}
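The file-reading step described in the main() comments can be sketched on its own, without touching the library. The helper name `read_prefix` is hypothetical. NUL-terminating the buffer is an assumption made here because the parser's own data is handled as a C string, so the target function may scan until it finds a terminator.

```cpp
// Standalone sketch: read a bounded prefix of a file into a local buffer.
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical helper: reads up to cap-1 bytes from `path` into `buf` and
// NUL-terminates the result. Returns the byte count, or -1 on any error.
ssize_t read_prefix(const char *path, char *buf, size_t cap)
{
    if (cap == 0) {
        return -1;  // no room even for the terminator
    }
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    ssize_t n = read(fd, buf, cap - 1);
    close(fd);
    if (n < 0) {
        return -1;
    }
    buf[n] = '\0';  // assumption: the consumer expects a C string
    return n;
}
```

With a buffer such as `char data[50];`, a call like `read_prefix(argv[1], data, sizeof data)` keeps the input within the 20 to 50 byte range suggested in the skeleton and never writes past the end of the buffer.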

Improving the Test Driver

At this point you should have a test driver that looks something like this: gcr-harness-solution.cpp. You could compile this test driver and run it as you did the previous ones, but we can also look around the target function's call site to gain some insight that will improve our fuzzing.

Notice that there are two if-conditions that must be true in order for GetCharacterRef() to be called. Try taking these into account and modify the test driver to mirror the conditions needed for the target function to be called under normal circumstances. This keeps our fuzz inputs from being discarded unnecessarily (since GetCharacterRef() also checks one of the two conditions before doing anything interesting) and ensures that our inputs would reach the target function if we passed them through the normal parsing process.
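One way to mirror such conditions is a small guard that the driver evaluates before calling the target. The sketch below assumes the two conditions are that the current byte is '&' and the next byte is '#'; that is an assumption you should verify against the call site in tinyxml2.cpp, and `looks_like_char_ref` is a hypothetical name.

```cpp
// Standalone sketch of a guard that mirrors the caller's preconditions.
#include <cstddef>

// Hypothetical guard. ASSUMPTION: the caller only reaches the target when
// the data starts with '&' immediately followed by '#'.
bool looks_like_char_ref(const char *p, size_t length)
{
    // Check the length first so we never read past the supplied data
    return length >= 2 && p[0] == '&' && p[1] == '#';
}
```

Inside fuzz_gcr() you would return early when the guard is false, so that only inputs the parser itself would forward ever reach GetCharacterRef().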

Once you've made your modifications, package and re-run the new compiled test driver. You will probably see a faster taper in terms of new testcases because there isn't much code that is reachable from the target function, but you should see more than just one testcase in a big cliff (which would be an indication that something isn't right). The reference solution looks like: gcr-harness2-solution.cpp.

✏️ Summary and Recap

In this lesson, you wrote a custom test driver for both the main functionality of the tinyxml2 library and a custom test driver to target a specific helper function. This demonstrates some of the advantages of having source, like how it is easier to understand how certain functions are used and whether or not there are global dependencies for targeted functions.


I learned how to...

1. Build and package an example test driver with tinyxml.
  • Recall that we are trying to write a test driver so we can send input to the target and test functionality, therefore we need to think about what the target does and what the main entry point for input would be. In this case, we have an XML library, so we can presume that it will likely take input and parse that input into an internal representation of an XML document.
  • Now we just have to compile the test driver with the target library code linked in. For this library, that takes a single command:

    g++ tinyxml2.cpp harness.cpp -o harness
    
2. Turn on non-default options for the XMLDocument constructor.
  • You can change the XMLDocument constructor to use non-default options (there are only two options, and the default for processEntities is already favorably set).
3. Fuzz a specific function in tinyxml.
  • The first thing we need to do is understand how the target function is called and if it relies on any global state that is outside the scope of the function. In this case understanding how it is called is very easy because it is only called in one place in the entire codebase (tinyxml2.cpp version 2.0.1, lines 217-219):

    char buf[10] = { 0 };
    int len;
    p = const_cast<char*>( XMLUtil::GetCharacterRef( p, buf, &len ) );