Analyzing Block, Function, and Line Coverage¶
As with other kinds of testing, it's important to not only understand what code is being covered, but also what code is not being covered so that you can make smarter decisions about your next steps with fuzzing.
A best practice would be to make sure that your fuzzing is covering parts of code that deal with input parsing or input processing from external or untrusted sources. In addition, functions with high cyclomatic complexity should be thoroughly tested, as high complexity functions have more distinct code paths, making them harder to reason about and more likely to contain unexpected corner cases.
Having an understanding of how the target application processes input, and therefore which functions are covered, not covered, or should be covered, allows developers to make more informed (and ultimately better) decisions when it comes to fuzzing.
Info
Check out the test coverage term in the glossary for more information on how coverage works.
Pre-requisites¶
Remember that in order to analyze coverage files, you must first download the coverage files for a completed Mayhem run. Therefore, we'll be utilizing the testme coverage results obtained in Executing and Download Coverage via Mayhem CLI:
Files: coverage.tgz
Tip
When downloading via the UI, a coverage.tgz archive will be downloaded which contains a directory named something like <target_name>_coverage. On the other hand, using the mayhem sync or mayhem download commands via the Mayhem CLI will put this output directory in the same parent directory as the Mayhemfile and tests directories.
Once downloaded, the coverage directory will contain up to three files: block_coverage.drcov, func_coverage.json, and line_coverage.lcov.
Each file describes the aggregate of all code coverage from each of the inputs or test cases in the test suite that were executed by the target application; however, each file describes the aggregate coverage from a different perspective: at the block, function, and source code level, respectively.
Note
The edge coverage metric that Mayhem uses internally to distinguish behaviors is related to, but not the same as, block coverage.
You'll also need access to the compiled testme binary. Therefore, use the 2.10 Tutorial Docker image.
docker pull forallsecure/tutorial:2.10
docker run -ti --privileged --rm forallsecure/tutorial:2.10
Let's see how to analyze coverage using the three types of files!
Block Coverage¶
A basic block is a straight-line sequence of instructions with a single entry point and a single exit point. The block_coverage.drcov file describes basic block coverage in a packed binary format, and as such is meant for use with binary analysis tools such as Binary Ninja (bncov), IDA Pro (lighthouse), or Ghidra (Dragon Dance), where the plugins help visualize or manipulate the data.
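To get a feel for what that packed format contains, the sketch below reads the basic block table using only Python's standard library. It assumes the file follows the DynamoRIO drcov layout (a text header and module table, then a "BB Table: N bbs" line followed by N eight-byte little-endian records holding a start offset, a size, and a module id). The function name read_drcov_blocks is hypothetical, and the layout should be checked against the file your own Mayhem run produced before relying on it.

```python
import struct
from typing import List, Tuple

# Assumed record layout (DynamoRIO drcov): uint32 start, uint16 size, uint16 module id.
BB_ENTRY = struct.Struct("<IHH")
BB_HEADER = b"BB Table: "


def read_drcov_blocks(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (module_id, start_offset, size) for each basic block record."""
    header_pos = data.find(BB_HEADER)
    if header_pos < 0:
        raise ValueError("no 'BB Table' header found; is this a drcov file?")
    line_end = data.index(b"\n", header_pos)
    # The header line is expected to look like: b"BB Table: 3 bbs"
    count = int(data[header_pos + len(BB_HEADER):line_end].split()[0])
    body = data[line_end + 1:]
    if len(body) < count * BB_ENTRY.size:
        raise ValueError("basic block table is truncated")
    blocks = []
    for index in range(count):
        start, size, module_id = BB_ENTRY.unpack_from(body, index * BB_ENTRY.size)
        blocks.append((module_id, start, size))
    return blocks
```

With a real run, you would pass in the bytes of block_coverage.drcov (opened in "rb" mode). Under the assumed layout, each start value is an offset from the base of the module it belongs to rather than an absolute address.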
Tip
If you have the source code for the target and can compile the target with debug symbols, the line coverage file will likely be more convenient to use.
Using Binary Ninja and bncov for Block Coverage¶
The bncov plugin for Binary Ninja will allow you to visualize block coverage when the block_coverage.drcov output file is overlaid on top of the original binary that produced it.
Warning
In order to visualize block coverage using Binary Ninja, you'll need to import the compiled target application as well as the resulting block_coverage.drcov output file from your Mayhem run into the Binary Ninja application. Therefore, if you are using Docker, copy (docker cp) the files over from the Docker container to a host OS that can run Binary Ninja.
Now, let's open up Binary Ninja and import the compiled testme application.
Once imported, you can visualize the individual blocks of code that comprise the compiled testme application.
Note
You will have to purchase a Binary Ninja license to be able to view the disassembly graph.
Finally, install the bncov plugin and import the block_coverage.drcov file for the testme application to highlight the individual blocks that were covered as well as the edges or pathways that were taken.
Note
The easiest way to install bncov is through the Binary Ninja plugin manager! Then, simply go to your Tools menu to begin using bncov.
As a result, Binary Ninja can be used to analyze individual block coverage using the bncov plugin.
Using Ghidra and Dragon Dance for Block Coverage¶
Using Ghidra and the Dragon Dance plugin, you can visualize and manipulate the binary code coverage for an underlying application to understand more about its implementation and code coverage behaviors.
Warning
In order to visualize block coverage using Ghidra, you'll need to import the compiled target application as well as the resulting block_coverage.drcov output file from your Mayhem run into the Ghidra application. Therefore, if you are using Docker, copy (docker cp) the files over from the Docker container to a host OS that can run Ghidra.
Now, we'll need to download and install Ghidra and the Dragon Dance plugin.
# Download Ghidra
wget https://ghidra-sre.org/ghidra_9.1.2_PUBLIC_20200212.zip
unzip ghidra_9.1.2_PUBLIC_20200212.zip
# Download Dragon Dance 0.2.2
git clone https://github.com/0ffffffffh/dragondance.git
cd dragondance
git checkout v0.2.2
We'll also need to download and install the Java Development Kit (JDK) for your specific operating system (OS) as well as a compatible version of gradle to build the Dragon Dance plugin.
Note
For this exercise, we are utilizing Java 15.0.2, Gradle 6.8.2, Ghidra 9.1.2, and Dragon Dance 0.2.2.
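Next, navigate to the Dragon Dance source folder (dragondance if you cloned the repository as shown above, or dragondance-master if you downloaded the ZIP archive instead) and run the following command to build the Dragon Dance plugin against your Ghidra installation directory; the resulting package will be located under the dist directory of that folder: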
gradle -PGHIDRA_INSTALL_DIR=<path_to_ghidra_install>/ghidra_9.1.2_PUBLIC
Now navigate to the Ghidra installation directory and run the following to open the application:
./GhidraRun
Since we've already built the Dragon Dance plugin, now we just have to import it as an available plugin to Ghidra. Go to File > Install Extensions and add the Dragon Dance plugin by navigating to the dist folder inside the Dragon Dance source folder and selecting the .zip file.
Next, create a new project testme_coverage and import the compiled testme application. The file should be located at testme-pkg/root/root/tutorial/testme/v1/testme.
Note
You may see some warnings when importing the testme binary into Ghidra; simply ignore them and proceed as usual.
Double-click into the imported testme file and Ghidra should now begin to display the disassembled view of the testme binary. Go to the Symbol Tree pane on the left-hand side and select testme under the Functions folder. This shows us the underlying assembly code of the testme function in the testme binary.
Now click on Window > Dragon Dance and import the block_coverage.drcov file from the testme coverage results.
Warning
You may need to initialize Dragon Dance for the first time before you can use it in Ghidra. Check out the official documentation for launching Dragon Dance.
In addition, for more complex binaries, Ghidra may not properly identify all the code regions and will prompt the user if it encounters block addresses outside of what it has identified as code. The proper action here is to view the coverage file as ground truth and tell Ghidra to fix up / mark the blocks as code.
Right-click on the block coverage item in Dragon Dance and select Switch To. This will overlay the testme coverage results over the testme disassembly view.
And that's it! We can now see parts of the testme code that were covered and not covered.
Function Coverage¶
The func_coverage.json coverage file is a plaintext JSON file describing the basic information about functions contained in the target binary as well as the block coverage information for each corresponding function.
Note
We've supplied this file in JSON format with the intent of allowing developers to parse the data as needed. For example, developers can use the function coverage data for automated report generation via their own custom scripts.
The key-value pair definitions for the func_coverage.json file are as follows:
address: The address of the function.
name: The name of the function.
complexity: The cyclomatic complexity.
callers: The other functions that call the current function.
callees: The other functions that are called from the current function.
all_blocks: All blocks corresponding to the function.
covered_blocks: All blocks corresponding to the function that are covered.
called: true or false. Indicates whether the function was covered or not.
Note
Notice all addresses are in decimal, and represent the starting address of the function (specifically the address of the first byte of the first instruction of the function, which is also the start of the first block).
Sample output for a given function is shown below.
[
  {
    "address": 1078889,
    "name": "http_parser_init",
    "complexity": 3,
    "callers": [
      1053267
    ],
    "callees": [
      1052768
    ],
    "all_blocks": [
      1078889,
      1078981,
      1078987,
      1078994,
      1079001,
      1079006
    ],
    "covered_blocks": [
      1078889,
      1079001
    ],
    "called": true
  },
  ...
]
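As a minimal sketch of working with this structure, the snippet below computes the share of covered blocks for each function and converts the decimal address to hexadecimal so that it is easier to match against a disassembler. The helper names summarize_functions and load_func_coverage are hypothetical, and the inline sample simply repeats the record shown above in place of a real file.

```python
import json
from typing import Dict, List


def summarize_functions(functions: List[Dict]) -> List[Dict]:
    """Compute block coverage per function from func_coverage.json records."""
    summary = []
    for func in functions:
        total = len(func["all_blocks"])
        covered = len(func["covered_blocks"])
        summary.append({
            "name": func["name"],
            # Addresses are stored in decimal; hex is easier to match in a disassembler.
            "address": hex(func["address"]),
            "complexity": func["complexity"],
            "called": func["called"],
            "coverage_pct": round(100.0 * covered / total, 1) if total else 0.0,
        })
    # List complex, poorly covered functions first.
    summary.sort(key=lambda row: (-row["complexity"], row["coverage_pct"]))
    return summary


def load_func_coverage(path: str) -> List[Dict]:
    """Read the list of function records from a func_coverage.json file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# The sample record shown above, used here in place of a real file.
sample = [{
    "address": 1078889, "name": "http_parser_init", "complexity": 3,
    "callers": [1053267], "callees": [1052768],
    "all_blocks": [1078889, 1078981, 1078987, 1078994, 1079001, 1079006],
    "covered_blocks": [1078889, 1079001], "called": True,
}]
for row in summarize_functions(sample):
    print(row["name"], row["address"], f'{row["coverage_pct"]}%')
# prints: http_parser_init 0x107669 33.3%
```

With a real run, summarize_functions(load_func_coverage("func_coverage.json")) would produce the same summary for every function in the target. Sorting by complexity first is only one possible prioritization.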
Creating Custom Apps for Function Coverage Analysis¶
As a quick example of how developers could use func_coverage.json to suit their needs, the data can be imported as a Pandas DataFrame using Python.
Pandas is a popular and robust data analysis library that uses table-like structures known as DataFrames to represent records as rows and columns. By importing func_coverage.json into a Pandas DataFrame, developers can use the full suite of data analysis functions to further analyze their function coverage data.
Note
You may have to flatten your data before analyzing it, as some fields contain nested values.
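The following is a rough sketch of that idea, assuming Pandas is installed. The first record mirrors the http_parser_init sample from earlier in this section; the second record (hypothetical_helper) and all derived column names are made up for illustration.

```python
import pandas as pd

# Two records in the func_coverage.json shape. The second one is invented.
records = [
    {"address": 1078889, "name": "http_parser_init", "complexity": 3,
     "callers": [1053267], "callees": [1052768],
     "all_blocks": [1078889, 1078981, 1078987, 1078994, 1079001, 1079006],
     "covered_blocks": [1078889, 1079001], "called": True},
    {"address": 1052768, "name": "hypothetical_helper", "complexity": 1,
     "callers": [1078889], "callees": [],
     "all_blocks": [1052768], "covered_blocks": [], "called": False},
]

# With a real run, load the list with json.load and pass it in the same way.
df = pd.DataFrame(records)

# Derive per-function block counts and a coverage percentage.
df["total_blocks"] = df["all_blocks"].apply(len)
df["covered_count"] = df["covered_blocks"].apply(len)
df["coverage_pct"] = 100.0 * df["covered_count"] / df["total_blocks"]

# Functions never reached by any test case, most complex first.
uncalled = df[~df["called"]].sort_values("complexity", ascending=False)

# Flatten the nested caller lists into one row per (function, caller) pair.
call_edges = df[["name", "callers"]].explode("callers").dropna()

print(df[["name", "complexity", "coverage_pct"]])
```

The explode call is one way to handle the nested lists mentioned in the note above. Whether you flatten callers, callees, or the block lists depends on the question you want the data to answer.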
Line Coverage¶
Line coverage can be derived by mapping basic blocks to their origin as a line in the source code.
The line_coverage.lcov coverage file contains line coverage information in the LCOV format, which specifies which lines in a given file are covered. As a result, .lcov files must be processed alongside the original source directories and files. Otherwise, any changes in file paths or source code versions will result in discrepancies or missing information in the LCOV report.
Note
This file will only be created if the target contains debug symbols; otherwise, line coverage information cannot be automatically generated.
Files in the .lcov format (also named .info by some tools) can be ingested by a number of tools, such as LCOV, to either generate browsable coverage reports or integrate with IDEs, plugins, and third-party tools to display additional coverage information.
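Because the LCOV tracefile format is plain text, it can also be read directly. The sketch below is a simplified reader, not a replacement for the LCOV tools: it only looks at the SF (source file), DA (line number and execution count), and end_of_record entries, and reports lines hit versus lines instrumented per file. The function name parse_lcov and the source path in the sample are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_lcov(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each source file to (lines_hit, lines_found) using DA records."""
    results: Dict[str, Tuple[int, int]] = {}
    current: Optional[str] = None
    hit = found = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            # A new per-file record begins.
            current, hit, found = line[3:], 0, 0
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            fields = line[3:].split(",")
            found += 1
            if int(fields[1]) > 0:
                hit += 1
        elif line == "end_of_record" and current is not None:
            results[current] = (hit, found)
            current = None
    return results


# Hypothetical tracefile contents for illustration only.
sample_lcov = """TN:
SF:/hypothetical/path/testme.c
DA:5,1
DA:6,1
DA:7,0
end_of_record
"""
print(parse_lcov(sample_lcov))
```

If the same source file appears in more than one record, this sketch keeps only the last one, so a real script would need to merge them.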
Generating Line Coverage Reports using genhtml¶
It is a best practice to process .lcov files in the original environment from which they were produced. Therefore, we will install lcov within our Docker container and generate the line coverage report using the genhtml utility.
First, we'll need to install lcov.
apt-get update -y
apt-get install -y lcov
Next, executing the genhtml utility on the line_coverage.lcov file produces the resulting line coverage report for the testme application.
Note
The genhtml command follows the pattern genhtml <file> --output-directory <directory_name>.
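If you would rather drive this step from a script, the pattern above can be wrapped in a few lines of Python. This is only a sketch: the helper names genhtml_command and generate_report are hypothetical, and it assumes genhtml is already installed and on the PATH.

```python
import subprocess
from typing import List


def genhtml_command(tracefile: str, output_dir: str) -> List[str]:
    """Build the genhtml invocation following the pattern in the note above."""
    return ["genhtml", tracefile, "--output-directory", output_dir]


def generate_report(tracefile: str, output_dir: str) -> None:
    """Run genhtml and fail loudly if it does not succeed."""
    # check=True raises CalledProcessError if genhtml exits with a non-zero status.
    subprocess.run(genhtml_command(tracefile, output_dir), check=True)
```

For example, generate_report("line_coverage.lcov", "coverage_html") would write the report into a coverage_html directory; that directory name is an arbitrary choice.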
We can then move our generated HTML files from the Docker container to the host OS and run a local HTTP server to view the line coverage report.
Note
Alternatively, you can configure port forwarding for the Docker container, spin up an HTTP server from within the container, and connect to it from the host OS.
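Any static file server will do for viewing the report. As one possibility, the sketch below uses Python's built-in http.server module to serve a report directory on localhost; the helper names and the default port are assumptions chosen for illustration.

```python
import functools
import http.server
import threading


def make_report_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server that serves the given report directory on localhost."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_in_background(server: http.server.ThreadingHTTPServer) -> threading.Thread:
    """Start serving in a daemon thread so the caller keeps control."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling make_report_server on the genhtml output directory and then serve_forever() on the result makes the report available at http://127.0.0.1:8000/index.html. Running python3 -m http.server --directory <directory_name> from a shell is an equivalent shortcut.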
The code coverage report will display aggregate line coverage results as well as allow you to drill down and visualize the individual lines that were covered: 0 for not covered and 1 for covered.
As can be seen, visualizing line code coverage for a target application can be both extremely valuable and highly readable.
Summary¶
The three coverage files available for download and analysis from Mayhem are the block_coverage.drcov, func_coverage.json, and line_coverage.lcov files, each describing aggregate code coverage from a different perspective: at the block, function, and source code levels, respectively.
Lastly, the percentage of code coverage isn't the end of the story. If a function is only partly covered, you should consider whether the untested parts indicate that an insufficient variety of input is reaching the function, or whether there are simply parts of the code that will not execute under normal conditions (such as code for handling out-of-memory errors or network failures). Even if a function is 100% covered, bugs may remain, such as a divide-by-zero or NULL-pointer dereference that only triggers for specific input values. Deciding when fuzzing has covered a sufficient amount of code is a matter of individual judgment, and typically involves weighing the cost of additional fuzzing improvement against the potential security or reliability impacts.
Knowing how to analyze these coverage files and taking into consideration the relevance of their results will allow you to make better fuzzing decisions when it comes to your target applications in Mayhem!