Testing with Shared Libraries¶
In this lesson, we'll walk you through how to fuzz C++ applications that may not seem immediately amenable to fuzzing, but whose shared libraries can be fuzzed by linking a C++ test driver, written in source, against the binary libraries.
Estimated Time: 10 minutes
By the end of this lesson, you will be able to:
- Define what a shared library is.
- Articulate the difference between test driving libraries vs. applications.
- Explain how binary-only test driving works.
- Walk through an example shared library test driver.
Shared Libraries¶
A shared library...
- Is a .so file on Linux, and a .dll on Windows.
- Contains compiled code and associated data.
- Can be shared or used among different programs.
The original purpose of shared libraries is to save disk space by sharing compiled code between multiple binary programs. When a software library is compiled as a shared library object, programs can load this library instead of containing their own copy of the library's code.
Note
Each program that loads a library maps it into its own virtual address space, with a private copy of the library's writable data. Therefore, multiple programs using the same shared library do not interfere with each other.
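To see this mapping from inside a running program, the following minimal sketch lists the ELF objects loaded into the current process. It assumes Linux with glibc, where dl_iterate_phdr is available; the helper names collect_object and loaded_objects are made up for this illustration.

```cpp
#include <link.h>   // dl_iterate_phdr, dl_phdr_info (glibc on Linux)
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per ELF object mapped into this process
// (the executable itself plus every shared library it has loaded).
static int collect_object(struct dl_phdr_info* info, size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    // An empty name typically denotes the main executable.
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        names->push_back(info->dlpi_name);
    } else {
        names->push_back("(unnamed object, typically the main executable)");
    }
    return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Hypothetical helper: returns the names of all objects loaded into this process.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object, &names);
    return names;
}
```

Printing the result of loaded_objects() from two different programs that use the same library should show the same library path in both, each mapped into its own process.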
Shared libraries can also be used to provide software libraries in a pre-compiled form to avoid including their full source. In this use case, header files are provided as well, which tell the compiler how to call the functions in the shared library. Shared libraries themselves also contain metadata that carries linking information (most notably function name symbols).
Modern applications are often delivered in a package consisting of one or more program executables and multiple shared libraries that the programs depend upon. In this use case, the shared libraries may not actually be intended to be shared among multiple programs, or to have new programs linked against them. Linking a new program against them remains technically possible, though, and that is exactly what a test driver for a shared library does. Let's see how this is done!
Harnessing Libraries vs. Applications¶
In general, when testing any library-like component of an application (not just shared libraries), consider the following:
- A test driver for a library is essentially an alternate application written with that library. Even though this alternate application can be simple, creating it is often more work than fuzzing an existing application.
- The way an application uses (or misuses) a library may differ from the way a test driver uses that library. A test driver that does not adequately imitate the application will miss or encounter different bugs.
- A library test driver has no ability to find bugs in application code outside of that library.
There are two main reasons you may wish to test a library, as opposed to an application:
- The library may be easier to test drive than the application. For example, an HTML parser is relatively easy to test drive, and a web browser is not.
- Potentially improving the speed or quality of fuzzing: At the library level, your test driver can have a more fine-grained ability to skip slow or uninteresting parts of the software logic, resulting in a faster test driver that finds more bugs.
Typically speaking, it's best to try to test a whole application first, along with any library components you believe are particularly easy and particularly buggy (for example, parsers or protocol-processing code). Additional library components should be test driven only if the whole-application test driver doesn't seem to be producing adequate coverage for that component.
Binary-only Test Driving¶
Security experts often need to test drive a shared library that is provided without source. This tutorial is aimed at exploring custom shared libraries shipped with an application for which source is not available.
If, for example, you know that an application uses an open source library, it's better to acquire the source (preferably for the same version as the application uses) and use source-test-driving techniques on that. As another example, if a closed-source library is released stand-alone to developers, it will likely come with header files that can make the best test driving approach more like source-based test driving.
Requirements¶
- Skills: Basic C/C++ programming, some reverse engineering
- Tools: g++ (and GNU binutils)
Note
This tutorial is for C and C++ libraries. Many details and techniques will be specific to C and C++. Although this tutorial uses Linux, and the exact commands shown will only work on Linux, all the principles and tricks here translate to any platform that supports shared libraries (Windows, macOS, iOS, Android, and more), albeit with different tools and commands.
Example 1: Basics¶
- File: example1.tgz
example1 is a toy program that makes use of custom shared libraries to echo its command line args back to stdout.
Note
example1 is a C++ program, using one C++ library, MyCustomCxxLib.so, and one C library, my_custom_c_lib.so. Feel free to look at the source before going through the exercise.
Determining Shared Library Functions¶
We'll need to first determine what functions the shared libraries offer, and which of these the application is actually using. You may use a reverse engineering program like IDA if you wish, but binutils is sufficient for this one:
Execute the following:
nm -D example1 MyCustomCxxLib.so my_custom_c_lib.so | c++filt
nm -D shows us the imported and exported symbols of ELF objects. A U sits next to imported symbols, and a T sits next to exported function symbols.
Note
If you usually use binutils nm without -D, that sometimes works too, but technically it lists the regular symbol table of an ELF object instead of the dynamic (imported and exported) symbols. ELFs may have had that table stripped along with their debug information, as is the case here.
c++filt is a program that un-mangles C++ names it sees. The net effect of C++ name mangling is that shared libraries reveal the argument types for C++ functions (e.g. MyCustomCxxLib::process_data(char const*, char*) here), but you don't get argument types for C functions (e.g. my_custom_c_lib_process_data here).
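If you'd like to see what c++filt does from inside a program, here is a minimal sketch built on abi::__cxa_demangle, which GCC and Clang provide in cxxabi.h. The helper name demangle is made up for this illustration, and the mangled string in the usage note is what the Itanium C++ ABI (the scheme g++ uses on Linux) would produce for the function shown above, assuming MyCustomCxxLib is a namespace or class.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang)
#include <cstdlib>    // std::free
#include <cstring>    // std::strncmp
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged when it is not an Itanium-ABI mangled name.
std::string demangle(const char* mangled) {
    // Plain C symbols do not start with "_Z" and carry no type information.
    if (std::strncmp(mangled, "_Z", 2) != 0) {
        return mangled;
    }
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;  // not a valid mangled name
    }
    std::string result(text);
    std::free(text);     // the buffer is allocated with malloc
    return result;
}
```

Calling demangle("_ZN14MyCustomCxxLib12process_dataEPKcPc") yields the C++ signature with its argument types, while demangle("my_custom_c_lib_process_data") comes back unchanged. For ordinary (non-template) functions the mangled name carries no return type, which is why return types still have to be recovered some other way.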
Determining Harnessing Function Calls and Sequence¶
Decide which functions we'd like to call in our test driver, and in which order. Typically, this is achieved by mild reverse-engineering of the application or libraries, to find example sequences of how the target functions are being called.
Note
This isn't a reverse-engineering tutorial, so if you aren't already comfortable with reverse engineering, just open main.cpp. For this program, our test driver should pass the output of MyCustomCxxLib::process_data() to my_custom_c_lib_process_data(), just as seen inside the loop in example1's main().
A test driver doesn't need to exactly imitate the application's usage of libraries, but there are a variety of issues you can run into when straying too far. In this case, blindly trying to fuzz my_custom_c_lib_process_data() alone will cause the library to issue a "bad format!" error, whereas the sequence of MyCustomCxxLib::process_data() and then my_custom_c_lib_process_data() will work fine. This particular case is somewhat artificial, but stereotypical of real-world test driving efforts.
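The effect of call order can be sketched with two stand-in functions. Everything here is hypothetical: stage_one_encode and stage_two_consume are not the example1 functions, and the one-byte framing is invented purely to show why feeding the second stage directly gets rejected.

```cpp
#include <string>

// Hypothetical first stage: wraps raw input in a trivial frame
// (a single 'F' tag byte followed by the payload).
std::string stage_one_encode(const std::string& raw) {
    return "F" + raw;
}

// Hypothetical second stage: only accepts framed data, mimicking a library
// that reports a format error on anything it does not expect.
// Returns the payload length on success, or -1 on a format error.
int stage_two_consume(const std::string& framed) {
    if (framed.empty() || framed[0] != 'F') {
        return -1;  // the "bad format!" case
    }
    return static_cast<int>(framed.size() - 1);
}

// Test-driver body that imitates the application's call order:
// fuzz input goes through stage one before reaching stage two.
int drive_in_order(const std::string& fuzz_input) {
    return stage_two_consume(stage_one_encode(fuzz_input));
}
```

With this sketch, stage_two_consume("hello") is rejected, while drive_in_order("hello") reaches the code behind the format check. A fuzzer pointed only at the second stage would spend most of its time in the rejection path.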
Creating a Shared Library Test Driver¶
Create function declarations that allow you to link against the shared libraries. This application didn't ship with header files, but nm gives you most of the information you need to recreate them!
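Directly from the nm output you saw before, you are able to infer partial declarations for both functions: the C++ function MyCustomCxxLib::process_data(char const*, char*), with only its return type missing, and the C function my_custom_c_lib_process_data, with both its return type and its argument types missing.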
The missing types here are int, void, and char *, int. You could determine this through trial and error, or via reverse engineering.
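Put together, the reconstructed declarations and a driver body might look like the sketch below. Several things in it are assumptions rather than facts from the example: that MyCustomCxxLib is a namespace, which missing type belongs to which function, the parameter names, the output buffer size, and the helper name drive_once. The placeholder definitions exist only so that the sketch links without the real .so files.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// --- Reconstructed declarations (the part a real test driver needs) ---
// Argument types come from the demangled name; the return type is a guess.
namespace MyCustomCxxLib {
int process_data(char const* input, char* output);
}
// C symbols carry no type information: return and argument types are guesses.
extern "C" void my_custom_c_lib_process_data(char* data, int length);

// --- Placeholder definitions so this sketch links on its own ---
// Delete these when linking against the real libraries.
namespace MyCustomCxxLib {
int process_data(char const* input, char* output) {
    std::strcpy(output, input);
    return static_cast<int>(std::strlen(output));
}
}
extern "C" void my_custom_c_lib_process_data(char* /*data*/, int /*length*/) {}

// Hypothetical driver body: call it from whatever entry point your fuzzer expects.
int drive_once(const std::uint8_t* bytes, std::size_t size) {
    // Copy the input so that c_str() is guaranteed to be NUL-terminated.
    std::string input(reinterpret_cast<const char*>(bytes), size);
    // Assumed to be large enough; the real library's needs must be reverse engineered.
    std::vector<char> output(input.size() + 1);
    int length = MyCustomCxxLib::process_data(input.c_str(), output.data());
    my_custom_c_lib_process_data(output.data(), length);
    return 0;
}
```

Because the return type is not part of the mangled name, a wrong guess there still links cleanly and only misbehaves at run time, so it is worth confirming each guess in a disassembler.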
Lastly, see harness.cxx, or, if you think you know what to do, try writing one on your own first. Ensure that you can compile, run, and fuzz this test driver before moving on. To check your understanding, try to re-create harness.cxx without looking at it.
Tip
This strategy works best when using the same C++ compiler and platform as the target libraries were built with. For example, things may not work if you attempt to compile your test driver with g++ when the library was compiled with clang++ (for example, against libc++ instead of libstdc++). In particular, libstdc++ changed its implementation of std::string in GCC 5, so older (still in use!) g++ toolchains are not binary compatible with recent versions wherever std::string crosses the library boundary. (Particularly for g++, solving this is sometimes as easy as switching between -D_GLIBCXX_USE_CXX11_ABI=0 and -D_GLIBCXX_USE_CXX11_ABI=1.)
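To check which std::string ABI your own test driver is being compiled with, a small probe like the following can help. It is a minimal sketch: the function name string_abi is made up, and it only inspects the _GLIBCXX_USE_CXX11_ABI macro that libstdc++ headers define.

```cpp
#include <string>  // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

// Hypothetical helper: reports which std::string ABI this file was compiled against.
const char* string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
#  if _GLIBCXX_USE_CXX11_ABI
    return "libstdc++, new (C++11) std::string ABI";
#  else
    return "libstdc++, old (pre-C++11) std::string ABI";
#  endif
#else
    return "not libstdc++, or a libstdc++ that predates the dual ABI";
#endif
}
```

Compiling the same file once with -D_GLIBCXX_USE_CXX11_ABI=0 and once with -D_GLIBCXX_USE_CXX11_ABI=1 and printing string_abi() from main shows the switch taking effect. On the library side, demangled symbols containing std::__cxx11:: are a sign that the library was built with the new ABI.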
Exercise: llua_simple¶
File: llua_simple.tgz
To practice what you've just learned, try to test drive libllua.so using the example set by the provided llua_simple binary. Specifically, test drive the llual_newstate(), llual_loadfilex(), and lua_pcallk() sequence. libllua.so is a C library, so unfortunately you'll have to guess more about function argument types than if it were in C++.
You may look at the llua_simple.c source code, but if you have reverse engineering experience, try this exercise without it at first (and look only at the llua_simple and libllua.so binaries). Either way, avoid going online (or to /usr/include) to look for the Lua header files! For the sake of practice, we're pretending that libllua.so is a closed-source library with no headers or source available (spoiler: it's not).
There's no specific intended vulnerability for your test driver to hit in this exercise, but it should be able to reach lots of coverage.
Example 2: C++ Objects¶
File: example2.tgz
example2 is a more complex target, but we'll follow the same general process as we did for example1 to test drive it: reverse engineer the binaries to make working header files, write a test driver in C++ that exercises the library (in a way similar to how we see the library being used), and then compile the test driver and link it to the library.
When C++ objects are involved, this process requires more work and a greater attention to detail. Functions you'll want to test drive may take C++ objects as parameters, which requires your test driver to create these C++ objects beforehand. Furthermore, creating and properly initializing C++ objects requires having a correct-enough class definition for the object.
Quick refresher on C++ classes:
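As a hedged stand-in for such a refresher, here is a minimal sketch of a small class with the features that matter for the rest of this example: a constructor, member functions, and private data members. Counter and all of its members are hypothetical names chosen for illustration; they are not taken from example2.

```cpp
// Hypothetical class, used only to illustrate the syntax.
class Counter {
public:
    explicit Counter(int start);   // constructor declaration
    void add(int amount);          // member function
    int value() const;             // const member function (cannot modify the object)

private:
    int count_;                    // data members determine sizeof(Counter)
    int additions_;
};

// Member functions can be defined outside the class body.
// In a shared library, these definitions are what ends up compiled into the .so.
Counter::Counter(int start) : count_(start), additions_(0) {}

void Counter::add(int amount) {
    count_ += amount;
    ++additions_;
}

int Counter::value() const {
    return count_;
}
```

A program that includes only the class definition (the part between class and the closing brace) can create Counter objects and call these functions, as long as the linker can find the compiled definitions somewhere.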
Tip
You can read more about the syntax of C++ class definitions, but most of the other things you can do in a class definition are irrelevant to test driving and reverse engineering.
Three things matter when creating a "correct-enough" class definition:
- Function declarations for member functions that you intend to call (including constructors).
- The size of the class.
- If the class has any virtual methods (including destructors), or any parent classes with virtual methods, you need to include all of those (and possibly re-create the inheritance hierarchy). We'll stay away from virtual methods in this tutorial.
If a program used the example class above, a reverse-engineered definition of the class for use in a test driver might look like:
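As a sketch of what such a definition can look like, suppose the library exports a class named Counter with one constructor, two member functions, and two private int members. These names and sizes are hypothetical and are not taken from example2. The test driver only needs the declarations it will call and an object of the right size:

```cpp
#include <cstddef>

// Reverse-engineered stand-in for a hypothetical library class named Counter.
// The class name and member signatures must match the library's mangled symbols.
class Counter {
public:
    explicit Counter(int start);  // declared only: the linker resolves it from the library
    void add(int amount);
    int value() const;            // const is part of the mangled name, so it must match

private:
    // The real private members are unknown, so they are replaced by
    // opaque storage with the same size and alignment.
    alignas(int) unsigned char opaque_[2 * sizeof(int)];
};

// Catch size mistakes at compile time instead of as memory corruption at run time.
static_assert(sizeof(Counter) == 2 * sizeof(int),
              "size must match the object the library expects");
```

If the guessed size is too small, the library's constructor writes past the end of the object your test driver allocated, which tends to show up as confusing crashes that are not real bugs in the target.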
To study this example, go through the same process enumerated for the first example. Study harness.cxx, and the source files for example2 and ex2lib.so. To check your understanding, close harness.cxx and attempt to recreate it using only the binaries (of course, feel free to cheat with some of the example2 and ex2lib.so source to ease the reverse-engineering process).
✏️ Summary and Recap¶
In this lesson, you dealt with a specific method of binary test driving that is applicable to many real-world applications. It's not the only way to test drive a binary, and should join your playbook of techniques rather than be interpreted as a "correct" approach to test driving.
I learned how to...
1. Define what a shared library is.
- A shared library...
- Is a .so file on Linux, and a .dll on Windows.
- Contains compiled code and associated data.
- Can be shared or used among different programs.
2. Articulate the difference between test driving libraries vs. applications.
- In general, when test driving any library-like component of an application (not just shared libraries), consider the following:
- A test driver for a library is essentially an alternate application written with that library. Even though this alternate application can be simple, creating it is often more work than fuzzing an existing application.
- The way an application uses (or misuses) a library may differ from the way a test driver uses that library. A test driver that does not adequately imitate the application will miss or encounter different bugs.
- A library test driver has no ability to find bugs in application code outside of that library.
- There are two main reasons you may wish to test drive a library, as opposed to an application:
- The library may be easier to test drive than the application. For example, an HTML parser is relatively easy to test drive, and a web browser is not.
- Potentially improving the speed or quality of fuzzing: At the library level, your test driver can have a more fine-grained ability to skip slow or uninteresting parts of the software logic, resulting in a faster test driver that finds more bugs.
3. Explain how binary-only test driving works.
- Security experts often need to test drive a shared library that is provided without source. This tutorial is aimed at exploring custom shared libraries shipped with an application for which source is not available.
- If, for example, you know that an application uses an open source library, it's better to acquire the source (preferably for the same version as the application uses) and use source-test-driving techniques on that. As another example, if a closed-source library is released stand-alone to developers, it will likely come with header files that can make the best test driving approach more like source-based test driving.
4. Walk through an example shared library test driver.
- You'll need to perform the following steps:
- Determine shared library functions.
- Determine test driving function calls and sequences.
- Create a shared library test driver.