========================================================
LibFuzzer -- a library for coverage-guided fuzz testing.
========================================================
.. contents::
   :local:
   :depth: 4

Introduction
============

This library is intended primarily for in-process coverage-guided fuzz testing
(fuzzing) of other libraries. The typical workflow looks like this:

* Build the Fuzzer library as a static archive (or just a set of .o files).
  Note that the Fuzzer contains the main() function.
  Preferably do *not* use sanitizers while building the Fuzzer.
* Build the library you are going to test with -fsanitize-coverage=[234]
  and one of the sanitizers. We recommend building the library in several
  different modes (e.g. asan, msan, lsan, ubsan, etc) and even using different
  optimization options (e.g. -O0, -O1, -O2) to diversify testing.
* Build a test driver using the same options as the library.
  The test driver is a C/C++ file containing interesting calls to the library
  inside a single function ``extern "C" void TestOneInput(const uint8_t *Data, size_t Size);``
* Link the Fuzzer, the library and the driver together into an executable
  using the same sanitizer options as for the library.
* Collect the initial corpus of inputs for the
  fuzzer (a directory with test inputs, one file per input).
  The better your inputs are the faster you will find something interesting.
  Also try to keep your inputs small, otherwise the Fuzzer will run too slowly.
* Run the fuzzer with the test corpus. As new interesting test cases are
  discovered they will be added to the corpus. If a bug is discovered by
  the sanitizer (asan, etc) it will be reported as usual and the reproducer
  will be written to disk.
  Each Fuzzer process is single-threaded (unless the library starts its own
  threads). You can run the Fuzzer on the same corpus in multiple processes
  in parallel. For run-time options run the Fuzzer binary with '-help=1'.
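
As an illustration of the test driver step, a driver might look like the sketch
below. ``ParseKeyValue`` is a hypothetical stand-in for a call into the library
under test and is not part of any real API; only the ``TestOneInput`` signature
comes from the workflow above.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical library function under test (an assumption for this sketch):
// splits "key=value" and returns true if both parts are non-empty.
bool ParseKeyValue(const std::string &Text, std::string *Key,
                   std::string *Value) {
  std::string::size_type Eq = Text.find('=');
  if (Eq == std::string::npos || Eq == 0 || Eq + 1 == Text.size())
    return false;
  *Key = Text.substr(0, Eq);
  *Value = Text.substr(Eq + 1);
  return true;
}

// The entry point that the Fuzzer calls once per generated input.
extern "C" void TestOneInput(const uint8_t *Data, size_t Size) {
  // Copy the raw bytes so the library sees an ordinary string.
  std::string Text(reinterpret_cast<const char *>(Data), Size);
  std::string Key, Value;
  ParseKeyValue(Text, &Key, &Value);
}
```

The driver has no ``main()`` of its own because the Fuzzer provides it.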


The Fuzzer is similar in concept to AFL_,
but uses in-process fuzzing, which is more fragile and more restrictive, but
potentially much faster as it has no overhead for process start-up.
It uses LLVM's SanitizerCoverage_ instrumentation to get in-process
coverage feedback.

The code resides in the LLVM repository, requires a fresh Clang compiler to build,
and is used to fuzz various parts of LLVM,
but the Fuzzer itself does not (and should not) depend on any
part of LLVM and can be used for other projects without requiring the rest of LLVM.

Usage examples
==============

Toy example
-----------

A simple function that does something interesting if it receives the input "HI!"::

  cat << EOF > test_fuzzer.cc
  extern "C" void TestOneInput(const unsigned char *data, unsigned long size) {
    if (size > 0 && data[0] == 'H')
      if (size > 1 && data[1] == 'I')
        if (size > 2 && data[2] == '!')
          __builtin_trap();
  }
  EOF
  # Get lib/Fuzzer. Assuming that you already have fresh clang in PATH.
  svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
  # Build lib/Fuzzer files.
  clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
  # Build test_fuzzer.cc with asan and link against lib/Fuzzer.
  clang++ -fsanitize=address -fsanitize-coverage=3 test_fuzzer.cc Fuzzer*.o
  # Run the fuzzer with no corpus.
  ./a.out

You should get ``Illegal instruction (core dumped)`` pretty quickly.
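
One way to see why coverage guidance can reach the trap quickly: each nested
``if`` is a separate coverage point, so an input that gets one byte further can
be kept as a new corpus element. The sketch below is a trap-free variant of the
toy target; ``MatchedPrefixLength`` is a hypothetical helper introduced only
for illustration and reports how many of the nested branches an input reaches.

```cpp
#include <stddef.h>

// Hypothetical, trap-free variant of the toy target: returns how many of
// the three nested checks the input passes (0 to 3).
int MatchedPrefixLength(const unsigned char *data, size_t size) {
  int depth = 0;
  if (size > 0 && data[0] == 'H') {
    depth = 1;
    if (size > 1 && data[1] == 'I') {
      depth = 2;
      if (size > 2 && data[2] == '!')
        depth = 3;  // The toy target calls __builtin_trap() at this point.
    }
  }
  return depth;
}
```

Under this view the fuzzer only needs to guess one more byte at each step
instead of guessing all three bytes at once.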

PCRE2
-----

Here we show how to use lib/Fuzzer on something real, yet simple: pcre2_::

  COV_FLAGS=" -fsanitize-coverage=4 -mllvm -sanitizer-coverage-8bit-counters=1"
  # Get PCRE2
  svn co svn://vcs.exim.org/pcre2/code/trunk pcre
  # Get lib/Fuzzer. Assuming that you already have fresh clang in PATH.
  svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
  # Build PCRE2 with AddressSanitizer and coverage.
  (cd pcre; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install)
  # Build lib/Fuzzer files.
  clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
  # Build the actual function that does something interesting with PCRE2.
  cat << EOF > pcre_fuzzer.cc
  #include <string.h>
  #include "pcre2posix.h"
  extern "C" void TestOneInput(const unsigned char *data, size_t size) {
    if (size < 1) return;
    char *str = new char[size+1];
    memcpy(str, data, size);
    str[size] = 0;
    regex_t preg;
    if (0 == regcomp(&preg, str, 0)) {
      regexec(&preg, str, 0, 0, 0);
      regfree(&preg);
    }
    delete [] str;
  }
  EOF
  clang++ -g -fsanitize=address $COV_FLAGS -c -std=c++11 -I inst/include/ pcre_fuzzer.cc
  # Link.
  clang++ -g -fsanitize=address -Wl,--whole-archive inst/lib/*.a -Wl,-no-whole-archive Fuzzer*.o pcre_fuzzer.o -o pcre_fuzzer

This will give you a binary of the fuzzer, called ``pcre_fuzzer``.
Now, create a directory that will hold the test corpus::

  mkdir -p CORPUS

For simple input languages like regular expressions this is all you need.
For more complicated inputs populate the directory with some input samples.
Now run the fuzzer with the corpus dir as the only parameter::

  ./pcre_fuzzer ./CORPUS

You will see output like this::

  Seed: 1876794929
  #0 READ cov 0 bits 0 units 1 exec/s 0
  #1 pulse cov 3 bits 0 units 1 exec/s 0
  #1 INITED cov 3 bits 0 units 1 exec/s 0
  #2 pulse cov 208 bits 0 units 1 exec/s 0
  #2 NEW cov 208 bits 0 units 2 exec/s 0 L: 64
  #3 NEW cov 217 bits 0 units 3 exec/s 0 L: 63
  #4 pulse cov 217 bits 0 units 3 exec/s 0

* The ``Seed:`` line shows you the current random seed (you can change it with the ``-seed=N`` flag).
* The ``READ`` line shows you how many input files were read (since you passed an empty dir there were no inputs, but one dummy input was synthesised).
* The ``INITED`` line shows you how many inputs will be fuzzed.
* The ``NEW`` lines appear when the fuzzer finds a new interesting input, which is saved to the CORPUS dir. If multiple corpus dirs are given, the first one is used.
* The ``pulse`` lines appear periodically to show the current status.

Now, interrupt the fuzzer and run it again the same way. You will see::

  Seed: 1879995378
  #0 READ cov 0 bits 0 units 564 exec/s 0
  #1 pulse cov 502 bits 0 units 564 exec/s 0
  ...
  #512 pulse cov 2933 bits 0 units 564 exec/s 512
  #564 INITED cov 2991 bits 0 units 344 exec/s 564
  #1024 pulse cov 2991 bits 0 units 344 exec/s 1024
  #1455 NEW cov 2995 bits 0 units 345 exec/s 1455 L: 49

This time you were running the fuzzer with a non-empty input corpus (564 items).
As the first step, the fuzzer minimized the set to produce 344 interesting items (the ``INITED`` line).

You may run ``N`` independent fuzzer jobs in parallel on ``M`` CPUs::

  N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M

This is useful when you already have an exhaustive test corpus.
If you've just started fuzzing with no good corpus, running independent
jobs will create a corpus with too many duplicates.
One way to avoid this and still use all of your CPUs is to use the flag ``-exit_on_first=1``,
which will cause the fuzzer to exit on the first new synthesised input::

  N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M -exit_on_first=1

Heartbleed
----------
Remember Heartbleed_?
As it was recently `shown <https://blog.hboeck.de/archives/868-How-Heartbleed-couldve-been-found.html>`_,
fuzzing with AddressSanitizer can find Heartbleed. Indeed, here are the step-by-step instructions
to find Heartbleed with LibFuzzer::

  wget https://www.openssl.org/source/openssl-1.0.1f.tar.gz
  tar xf openssl-1.0.1f.tar.gz
  COV_FLAGS="-fsanitize-coverage=4" # -mllvm -sanitizer-coverage-8bit-counters=1"
  (cd openssl-1.0.1f/ && ./config &&
    make -j 32 CC="clang -g -fsanitize=address $COV_FLAGS")
  # Get and build LibFuzzer
  svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
  clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
  # Get examples of key/pem files.
  git clone https://github.com/hannob/selftls
  cp selftls/server* . -v
  cat << EOF > handshake-fuzz.cc
  #include <openssl/ssl.h>
  #include <openssl/err.h>
  #include <assert.h>
  SSL_CTX *sctx;
  // Note: the setup calls live inside assert(), so do not build with -DNDEBUG.
  int Init() {
    SSL_library_init();
    SSL_load_error_strings();
    ERR_load_BIO_strings();
    OpenSSL_add_all_algorithms();
    assert (sctx = SSL_CTX_new(TLSv1_method()));
    assert (SSL_CTX_use_certificate_file(sctx, "server.pem", SSL_FILETYPE_PEM));
    assert (SSL_CTX_use_PrivateKey_file(sctx, "server.key", SSL_FILETYPE_PEM));
    return 0;
  }
  extern "C" void TestOneInput(const unsigned char *Data, size_t Size) {
    static int unused = Init();
    SSL *server = SSL_new(sctx);
    BIO *sinbio = BIO_new(BIO_s_mem());
    BIO *soutbio = BIO_new(BIO_s_mem());
    SSL_set_bio(server, sinbio, soutbio);
    SSL_set_accept_state(server);
    BIO_write(sinbio, Data, Size);
    SSL_do_handshake(server);
    SSL_free(server);
  }
  EOF
  # Build the fuzzer.
  clang++ -g handshake-fuzz.cc -fsanitize=address \
    openssl-1.0.1f/libssl.a openssl-1.0.1f/libcrypto.a Fuzzer*.o
  # Run 20 independent fuzzer jobs.
  ./a.out -jobs=20 -workers=20

Voila::

  #1048576 pulse cov 3424 bits 0 units 9 exec/s 24385
  =================================================================
  ==17488==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000004748 at pc 0x00000048c979 bp 0x7fffe3e864f0 sp 0x7fffe3e85ca8
  READ of size 60731 at 0x629000004748 thread T0
      #0 0x48c978 in __asan_memcpy
      #1 0x4db504 in tls1_process_heartbeat openssl-1.0.1f/ssl/t1_lib.c:2586:3
      #2 0x580be3 in ssl3_read_bytes openssl-1.0.1f/ssl/s3_pkt.c:1092:4

Advanced features
=================

Tokens
------

By default, the fuzzer is not aware of the complexities of the input language,
and when fuzzing e.g. a C++ parser it will mostly stress the lexer.
It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>``
from a test corpus that doesn't have it.
See a detailed discussion of this topic at
http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.

lib/Fuzzer implements a simple technique that makes it possible to fuzz input languages with
long tokens. All you need is to prepare a text file containing up to 253 tokens, one token per line,
and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
The fuzzer itself will still be mutating a string of bytes,
but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token.
If there are fewer than ``b`` tokens, a space will be added instead.
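
The substitution step can be sketched as follows. This is not the code used by
lib/Fuzzer; ``ExpandTokens`` is a hypothetical helper, and the sketch assumes
zero-based token indices, that the three implicit tokens are appended after the
user-supplied ones, and that tokens are concatenated without separators.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string>
#include <vector>

// Sketch of the byte-to-token substitution. Assumptions: indices are
// zero-based, the three implicit tokens follow the user-supplied ones, and
// tokens are concatenated without separators.
std::string ExpandTokens(std::vector<std::string> Tokens,
                         const uint8_t *Data, size_t Size) {
  Tokens.push_back(" ");
  Tokens.push_back("\t");
  Tokens.push_back("\n");
  std::string Result;
  for (size_t i = 0; i < Size; i++) {
    size_t b = Data[i];
    // Bytes with no corresponding token become a single space.
    Result += b < Tokens.size() ? Tokens[b] : std::string(" ");
  }
  return Result;
}
```

With the two tokens ``int`` and ``reinterpret_cast``, the bytes ``1, 2, 0, 200``
would expand to ``reinterpret_cast int`` followed by a space under these assumptions.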


Fuzzing components of LLVM
==========================

clang-format-fuzzer
-------------------
The inputs are random pieces of C++-like text.

Build (make sure to use fresh clang as the host compiler)::

  cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release /path/to/llvm
  ninja clang-format-fuzzer
  mkdir CORPUS_DIR
  ./bin/clang-format-fuzzer CORPUS_DIR

Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).

TODO: commit the pre-fuzzed corpus to svn (?).

Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052

clang-fuzzer
------------

The default behavior is very similar to ``clang-format-fuzzer``.
Clang can also be fuzzed with Tokens_ using the ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.

Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057

FAQ
=========================

Q. Why does the Fuzzer not use any of the LLVM support?
-------------------------------------------------------

There are two reasons.

First, we want this library to be used outside of LLVM without users having to
build the rest of LLVM. This may sound unconvincing to many LLVM folks,
but in practice the need for building the whole LLVM frightens many potential
users -- and we want more users to use this code.

Second, there is a subtle technical reason not to rely on the rest of LLVM, or
any other large body of code (maybe not even STL). When coverage instrumentation
is enabled, it will also instrument the LLVM support code, which will blow up the
coverage set of the process (since the fuzzer is in-process). In other words, by
using more external dependencies we will slow down the fuzzer while the main
reason for it to exist is extreme speed.

Q. What about Windows then? The Fuzzer contains code that does not build on Windows.
------------------------------------------------------------------------------------

The sanitizer coverage support does not work on Windows either as of 01/2015.
Once it's there, we'll need to re-implement OS-specific parts (I/O, signals).

Q. When is this Fuzzer not a good solution for a problem?
---------------------------------------------------------

* If the test inputs are validated by the target library and the validator
  asserts/crashes on invalid inputs, the in-process fuzzer is not applicable
  (we could use fork() without exec, but it comes with extra overhead).
* Bugs in the target library may accumulate without being detected. E.g. a memory
  corruption that goes undetected at first and then leads to a crash while
  testing another input. This is why it is highly recommended to run this
  in-process fuzzer with all sanitizers to detect most bugs on the spot.
* It is harder to protect the in-process fuzzer from excessive memory
  consumption and infinite loops in the target library (still possible).
* The target library should not have significant global state that is not
  reset between the runs.
* Many interesting target libs are not designed in a way that supports
  the in-process fuzzer interface (e.g. require a file path instead of a
  byte array).
* If a single test run takes a considerable fraction of a second (or
  more) the speed benefit from the in-process fuzzer is negligible.
* If the target library runs persistent threads (that outlive
  execution of one test) the fuzzing results will be unreliable.
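
Some of these limitations can be worked around in the test driver. As one
hedged example, if the target library keeps global state, the driver can reset
it before every input so that runs do not influence each other.
``g_symbol_table``, ``LibraryReset`` and ``LibraryDefine`` below are
hypothetical stand-ins for such a library, not real APIs.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <set>
#include <string>

// Hypothetical library with global state (an assumption for this sketch).
static std::set<std::string> g_symbol_table;

void LibraryReset() { g_symbol_table.clear(); }

// Returns false if the name was already defined.
bool LibraryDefine(const std::string &Name) {
  return g_symbol_table.insert(Name).second;
}

extern "C" void TestOneInput(const uint8_t *Data, size_t Size) {
  // Start every run from the same state so results stay reproducible.
  LibraryReset();
  LibraryDefine(std::string(reinterpret_cast<const char *>(Data), Size));
}
```

Whether such a reset is cheap enough depends on the library; an expensive reset
eats into the speed advantage of in-process fuzzing.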

Q. So, what exactly is this Fuzzer good for?
--------------------------------------------

This Fuzzer might be a good choice for testing libraries that have relatively
small inputs, where each input takes < 1ms to run, and where the library code is not expected
to crash on invalid inputs.
Examples: regular expression matchers, text or binary format parsers.

.. _pcre2: http://www.pcre.org/

.. _AFL: http://lcamtuf.coredump.cx/afl/

.. _SanitizerCoverage: https://code.google.com/p/address-sanitizer/wiki/AsanCoverage

.. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed