LibFuzzer -- a library for coverage-guided fuzz testing.
========================================================

This library is intended primarily for in-process coverage-guided fuzz testing
(fuzzing) of other libraries. The typical workflow looks like this:

* Build the Fuzzer library as a static archive (or just a set of .o files).
  Note that the Fuzzer contains the main() function.
  Preferably do *not* use sanitizers while building the Fuzzer.
* Build the library you are going to test with -fsanitize-coverage=[234]
  and one of the sanitizers. We recommend building the library in several
  different modes (e.g. asan, msan, lsan, ubsan, etc) and even using different
  optimization options (e.g. -O0, -O1, -O2) to diversify testing.
* Build a test driver using the same options as the library.
  The test driver is a C/C++ file containing interesting calls to the library
  inside a single function ``extern "C" void TestOneInput(const uint8_t *Data, size_t Size);``
* Link the Fuzzer, the library and the driver together into an executable
  using the same sanitizer options as for the library.
* Collect the initial corpus of inputs for the
  fuzzer (a directory with test inputs, one file per input).
  The better your inputs are, the faster you will find something interesting.
  Also try to keep your inputs small, otherwise the Fuzzer will run too slowly.
* Run the fuzzer with the test corpus. As new interesting test cases are
  discovered they will be added to the corpus. If a bug is discovered by
  the sanitizer (asan, etc) it will be reported as usual and the reproducer
  will be written to disk.
  Each Fuzzer process is single-threaded (unless the library starts its own
  threads). You can run the Fuzzer on the same corpus in multiple processes
  in parallel. For run-time options run the Fuzzer binary with '-help=1'.
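
A test driver for the workflow above might look like the following minimal sketch. Only the ``TestOneInput`` signature is taken from this document; ``CountWords`` is a hypothetical stand-in for a call into the library under test and is not part of the Fuzzer or of LLVM.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the library under test: counts
// whitespace-separated words. A real driver would call your library instead.
static size_t CountWords(const std::string &Text) {
  size_t Words = 0;
  bool InWord = false;
  for (char C : Text) {
    const bool IsSpace = (C == ' ' || C == '\n' || C == '\t');
    if (!IsSpace && !InWord)
      ++Words;  // A new word starts at the first non-space character.
    InWord = !IsSpace;
  }
  return Words;
}

// The single entry point that the Fuzzer calls once per input.
extern "C" void TestOneInput(const uint8_t *Data, size_t Size) {
  // Copy the raw bytes into the type the library expects. The input is not
  // NUL-terminated and may contain embedded NUL bytes.
  std::string Text(reinterpret_cast<const char *>(Data), Size);
  CountWords(Text);  // The result is ignored; the sanitizers report bugs.
}
```

The driver has no main() of its own because the Fuzzer provides it. It also keeps no state between calls, so each input is tested independently.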

The Fuzzer is similar in concept to AFL (http://lcamtuf.coredump.cx/afl/),
but uses in-process fuzzing, which is more fragile and more restrictive, but
potentially much faster as it has no overhead for process start-up.
It uses LLVM's "Sanitizer Coverage" instrumentation to get in-process
coverage feedback (https://code.google.com/p/address-sanitizer/wiki/AsanCoverage).

The code resides in the LLVM repository and is (or will be) used by various
parts of LLVM, but the Fuzzer itself does not (and should not) depend on any
part of LLVM and can be used for other projects. Ideally, the Fuzzer's code
should not have any external dependencies. Right now it uses STL, which may need
to be fixed later. See also FAQ below.

Examples of usage in LLVM
=========================

clang-format-fuzzer
-------------------
The inputs are random pieces of C++-like text.

Build (make sure to use a fresh clang as the host compiler)::

    cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release /path/to/llvm
    ninja clang-format-fuzzer
    mkdir CORPUS_DIR
    ./bin/clang-format-fuzzer CORPUS_DIR

Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).

TODO: commit the pre-fuzzed corpus to svn (?).

Toy example
-----------

See lib/Fuzzer/test/SimpleTest.cpp.
A simple function that does something interesting if it receives the bytes "Hi!"::

    # Build the Fuzzer with asan:
    clang++ -std=c++11 -fsanitize=address -fsanitize-coverage=3 -O1 -g Fuzzer*.cpp test/SimpleTest.cpp
    # Run the fuzzer with no corpus (starting from an empty input)
    ./a.out
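
To illustrate what such a toy target can look like, here is a sketch written in the spirit of the description above. It is not a copy of SimpleTest.cpp. ``MatchDepth`` is a hypothetical helper, and the call to ``std::abort()`` merely simulates a bug. Each nested comparison typically lands in its own basic block, so an input that matches one more byte produces new coverage and is kept in the corpus.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns how many leading bytes of "Hi!" the input
// matches (0 to 3). The nested branches give the coverage instrumentation
// one new block to discover per matched byte.
static int MatchDepth(const uint8_t *Data, size_t Size) {
  int Depth = 0;
  if (Size > 0 && Data[0] == 'H') {
    Depth = 1;
    if (Size > 1 && Data[1] == 'i') {
      Depth = 2;
      if (Size > 2 && Data[2] == '!')
        Depth = 3;
    }
  }
  return Depth;
}

extern "C" void TestOneInput(const uint8_t *Data, size_t Size) {
  // Simulated bug: abort once all three bytes match.
  if (MatchDepth(Data, Size) == 3)
    std::abort();
}
```

Without coverage feedback, a random search would have to guess all three bytes at once; with it, the bytes can be discovered one at a time.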

FAQ
===

Q. Why does the Fuzzer not use any of the LLVM support?
-------------------------------------------------------

There are two reasons.

First, we want this library to be used outside of LLVM without users having to
build the rest of LLVM. This may sound unconvincing to many LLVM folks,
but in practice the need to build the whole of LLVM frightens many potential
users -- and we want more users to use this code.

Second, there is a subtle technical reason not to rely on the rest of LLVM, or
any other large body of code (maybe not even STL). When coverage instrumentation
is enabled, it will also instrument the LLVM support code, which will blow up the
coverage set of the process (since the fuzzer is in-process). In other words, by
using more external dependencies we will slow down the fuzzer, while the main
reason for it to exist is extreme speed.

Q. What about Windows then? The Fuzzer contains code that does not build on Windows.
------------------------------------------------------------------------------------

The sanitizer coverage support does not work on Windows either as of 01/2015.
Once it's there, we'll need to re-implement OS-specific parts (I/O, signals).

Q. When is this Fuzzer not a good solution for a problem?
---------------------------------------------------------

* If the test inputs are validated by the target library and the validator
  asserts/crashes on invalid inputs, the in-process fuzzer is not applicable
  (we could use fork() without exec, but it comes with extra overhead).
* Bugs in the target library may accumulate without being detected, e.g. a
  memory corruption that goes undetected at first and then leads to a crash
  while testing another input. This is why it is highly recommended to run this
  in-process fuzzer with all sanitizers to detect most bugs on the spot.
* It is harder to protect the in-process fuzzer from excessive memory
  consumption and infinite loops in the target library (still possible).
* The target library should not have significant global state that is not
  reset between the runs.
* Many interesting target libs are not designed in a way that supports
  the in-process fuzzer interface (e.g. require a file path instead of a
  byte array).
* If a single test run takes a considerable fraction of a second (or
  more) the speed benefit from the in-process fuzzer is negligible.
* If the target library runs persistent threads (that outlive
  execution of one test) the fuzzing results will be unreliable.
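
For a target that only accepts a file path, one possible workaround is to have the driver write each input to a temporary file first. The sketch below assumes a POSIX system; ``CountBytesInFile`` is a hypothetical stand-in for a path-based library entry point, and ``WriteTempFile`` and the ``/tmp/fuzz-input-XXXXXX`` template are illustrative choices, not part of the Fuzzer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// Hypothetical path-based library call: returns the number of bytes in the
// file, or -1 if the file cannot be opened.
static long CountBytesInFile(const char *Path) {
  std::FILE *F = std::fopen(Path, "rb");
  if (!F)
    return -1;
  long N = 0;
  while (std::fgetc(F) != EOF)
    ++N;
  std::fclose(F);
  return N;
}

// Writes the input to a fresh temporary file. PathTemplate must end in
// "XXXXXX"; mkstemp() replaces that suffix with a unique name.
static bool WriteTempFile(const uint8_t *Data, size_t Size,
                          char *PathTemplate) {
  const int Fd = mkstemp(PathTemplate);
  if (Fd < 0)
    return false;
  size_t Done = 0;
  while (Done < Size) {
    const ssize_t Written = write(Fd, Data + Done, Size - Done);
    if (Written <= 0) {
      close(Fd);
      unlink(PathTemplate);
      return false;
    }
    Done += static_cast<size_t>(Written);
  }
  close(Fd);
  return true;
}

extern "C" void TestOneInput(const uint8_t *Data, size_t Size) {
  char Path[] = "/tmp/fuzz-input-XXXXXX";
  if (!WriteTempFile(Data, Size, Path))
    return;
  CountBytesInFile(Path);  // Call into the path-based API.
  unlink(Path);            // Remove the file so runs do not accumulate state.
}
```

The file I/O on every run costs time, so this approach gives up part of the speed advantage of in-process fuzzing.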

Q. So, what exactly is this Fuzzer good for?
--------------------------------------------

This Fuzzer might be a good choice for testing libraries that have relatively
small inputs, where each input takes < 1ms to run and the library code is not
expected to crash on invalid inputs.
Examples: regular expression matchers, text or binary format parsers.
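
A rough way to check a target against the < 1ms guideline is to time the driver on a representative input before fuzzing. The helper below is only a sketch: ``AverageMicrosPerRun`` and ``FuzzTarget`` are hypothetical names, and the measurement ignores warm-up and input variety.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Function pointer type matching the TestOneInput signature.
using FuzzTarget = void (*)(const uint8_t *Data, size_t Size);

// Calls Target on the same input Runs times and returns the average
// wall-clock time per call in microseconds (0.0 if Runs is not positive).
static double AverageMicrosPerRun(FuzzTarget Target, const uint8_t *Data,
                                  size_t Size, int Runs) {
  if (Runs <= 0)
    return 0.0;
  const auto Start = std::chrono::steady_clock::now();
  for (int I = 0; I < Runs; ++I)
    Target(Data, Size);
  const std::chrono::duration<double, std::micro> Elapsed =
      std::chrono::steady_clock::now() - Start;
  return Elapsed.count() / Runs;
}
```

Passing ``TestOneInput`` as ``Target`` and comparing the result with 1000 (1ms expressed in microseconds) gives a first indication of whether the target is fast enough to benefit from in-process fuzzing.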