=======================================================
libFuzzer – a library for coverage-guided fuzz testing.
=======================================================
.. contents::
   :local:
   :depth: 1

Introduction
============

LibFuzzer is a library for in-process, coverage-guided, evolutionary fuzzing
of other libraries.

LibFuzzer is similar in concept to American Fuzzy Lop (AFL_), but it performs
all of its fuzzing inside a single process. This in-process fuzzing can be more
restrictive and fragile, but is potentially much faster as there is no overhead
for process start-up.

The fuzzer is linked with the library under test, and feeds fuzzed inputs to the
library via a specific fuzzing entrypoint (aka "target function"); the fuzzer
then tracks which areas of the code are reached, and generates mutations on the
corpus of input data in order to maximize the code coverage. The code coverage
information for libFuzzer is provided by LLVM's SanitizerCoverage_
instrumentation.


Versions
========

LibFuzzer is under active development so a current (or at least very recent)
version of Clang is the only supported variant.

(If `building Clang from trunk`_ is too time-consuming or difficult, then
the Clang binaries that the Chromium developers build are likely to be
fairly recent:

.. code-block:: console

  mkdir TMP_CLANG
  cd TMP_CLANG
  git clone https://chromium.googlesource.com/chromium/src/tools/clang
  cd ..
  TMP_CLANG/clang/scripts/update.py

This installs the Clang binary as
``./third_party/llvm-build/Release+Asserts/bin/clang``)

The libFuzzer code resides in the LLVM repository, and requires a recent Clang
compiler to build (and is used to `fuzz various parts of LLVM itself`_).
However the fuzzer itself does not (and should not) depend on any part of LLVM
infrastructure and can be used for other projects without requiring the rest
of LLVM.


Corpus
======

Coverage-guided fuzzers like libFuzzer rely on a corpus of sample inputs for the
code under test. This corpus should ideally be seeded with a varied collection
of valid and invalid inputs for the code under test; for example, for a graphics
library the initial corpus might hold a variety of different small PNG/JPG/GIF
files. The fuzzer generates random mutations based around the sample inputs in
the current corpus. If a mutation triggers execution of a previously-uncovered
path in the code under test, then that mutation is saved to the corpus for
future variations.

LibFuzzer will work fine without any initial seeds, but will be less
efficient. In particular, if the library under test accepts complex,
structured inputs then starting from a varied corpus is very important.

The corpus can also act as a sanity/regression check, to confirm that the
fuzzing entrypoint still works and that all of the sample inputs run through
the code under test without problems.
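
The feedback loop described in this section can be sketched as a toy program.
This is only an illustration under stated assumptions, not libFuzzer's
implementation: ``RunTarget``, ``Mutate`` and ``FuzzLoop`` are hypothetical
names, the "coverage" is a hand-written set of branch IDs rather than
SanitizerCoverage data, and the two mutations are far simpler than the real ones.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for instrumented code under test: returns the IDs of
// the "branches" that the given input reaches.
std::set<int> RunTarget(const std::string &in) {
  std::set<int> covered{0};
  if (in.size() > 0 && in[0] == 'H') {
    covered.insert(1);
    if (in.size() > 1 && in[1] == 'I') covered.insert(2);
  }
  return covered;
}

// Applies one random mutation: append a byte or overwrite an existing byte.
std::string Mutate(std::string in, std::mt19937 &rng) {
  if (in.empty() || rng() % 4 == 0) {
    in.push_back(static_cast<char>(rng() % 256));
  } else {
    in[rng() % in.size()] = static_cast<char>(rng() % 256);
  }
  return in;
}

// Runs the mutate/execute/keep loop and returns the resulting corpus.
std::vector<std::string> FuzzLoop(std::vector<std::string> corpus,
                                  unsigned seed, int iterations) {
  std::mt19937 rng(seed);
  std::set<int> seen;
  if (corpus.empty()) corpus.push_back("");  // Start from an empty input.
  for (const std::string &in : corpus) {
    const std::set<int> covered = RunTarget(in);
    seen.insert(covered.begin(), covered.end());
  }
  for (int i = 0; i < iterations; ++i) {
    const std::string candidate = Mutate(corpus[rng() % corpus.size()], rng);
    bool is_new = false;
    for (int branch : RunTarget(candidate)) {
      if (seen.insert(branch).second) is_new = true;
    }
    // Only inputs that reach previously unseen branches are kept.
    if (is_new) corpus.push_back(candidate);
  }
  return corpus;
}
```

In this sketch a seed corpus that already reaches every branch never grows,
while an empty starting corpus has to find each branch by chance, which is the
reason a varied seed corpus helps.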


Getting Started
===============

.. contents::
   :local:
   :depth: 1

Building
--------

The first step for using libFuzzer on a library is to implement a fuzzing
target function that accepts a sequence of bytes, like this:

.. code-block:: c++

  // fuzz_target.cc
  #include <stddef.h>
  #include <stdint.h>

  // DoSomethingInterestingWithMyAPI stands for the API under test.
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    DoSomethingInterestingWithMyAPI(Data, Size);
    return 0;  // Non-zero return values are reserved for future use.
  }

Next, build the libFuzzer library as a static archive, without any sanitizer
options. Note that the libFuzzer library contains the ``main()`` function:

.. code-block:: console

  svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
  # Alternative: get libFuzzer from a dedicated git mirror:
  # git clone https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer
  clang++ -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
  ar ruv libFuzzer.a Fuzzer*.o

Then build the fuzzing target function and the library under test using
the SanitizerCoverage_ option, which instruments the code so that the fuzzer
can retrieve code coverage information (to guide the fuzzing). Linking with
the libFuzzer code then gives a fuzzer executable.

You should also enable one or more of the sanitizers, which help to expose
latent bugs by making incorrect behavior generate errors at runtime:

- AddressSanitizer_ detects memory access errors.
- MemorySanitizer_ detects uninitialized reads: code whose behavior relies on memory
  contents that have not been initialized to a specific value.
- UndefinedBehaviorSanitizer_ detects the use of various features of C/C++ that are explicitly
  listed as resulting in undefined behavior.

Finally, link with ``libFuzzer.a``::

  clang++ -fsanitize-coverage=edge -fsanitize=address your_lib.cc fuzz_target.cc libFuzzer.a -o my_fuzzer

Running
-------

To run the fuzzer, first create a Corpus_ directory that holds the
initial "seed" sample inputs:

.. code-block:: console

  mkdir CORPUS_DIR
  cp /some/input/samples/* CORPUS_DIR

Then run the fuzzer on the corpus directory:

.. code-block:: console

  ./my_fuzzer CORPUS_DIR  # -max_len=1000 -jobs=20 ...

As the fuzzer discovers new interesting test cases (i.e. test cases that
trigger coverage of new paths through the code under test), those test cases
will be added to the corpus directory.

By default, the fuzzing process will continue indefinitely – at least until
a bug is found. Any crashes or sanitizer failures will be reported as usual,
stopping the fuzzing process, and the particular input that triggered the bug
will be written to disk (typically as ``crash-<sha1>``, ``leak-<sha1>``,
or ``timeout-<sha1>``).


Parallel Fuzzing
----------------

Each libFuzzer process is single-threaded, unless the library under test starts
its own threads. However, it is possible to run multiple libFuzzer processes in
parallel with a shared corpus directory; this has the advantage that any new
inputs found by one fuzzer process will be available to the other fuzzer
processes (unless you disable this with the ``-reload=0`` option).

This is primarily controlled by the ``-jobs=N`` option, which indicates that
`N` fuzzing jobs should be run to completion (i.e. until a bug is found or
time/iteration limits are reached). These jobs will be run across a set of
worker processes, by default using half of the available CPU cores; the count of
worker processes can be overridden by the ``-workers=N`` option. For example,
running with ``-jobs=30`` on a 12-core machine would run 6 workers by default,
with each worker running an average of 5 jobs by completion of the entire
process.
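
The arithmetic behind the default worker count can be written out as a short
sketch. ``DefaultWorkers`` and ``JobsPerWorker`` are hypothetical helper names
used only for illustration, and clamping the result to at least one worker is
an assumption of this sketch rather than documented behavior.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the documented default, min(jobs, cores / 2).
// The lower bound of one worker is an assumption made for this sketch.
int DefaultWorkers(int jobs, int cpu_cores) {
  assert(jobs >= 1 && cpu_cores >= 1);
  return std::max(1, std::min(jobs, cpu_cores / 2));
}

// Average number of jobs that each worker process ends up running.
double JobsPerWorker(int jobs, int workers) {
  assert(workers >= 1);
  return static_cast<double>(jobs) / workers;
}
```

With 30 jobs on a 12-core machine, ``DefaultWorkers(30, 12)`` gives 6 and
``JobsPerWorker(30, 6)`` gives 5.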


Options
=======

To run the fuzzer, pass zero or more corpus directories as command line
arguments. The fuzzer will read test inputs from each of these corpus
directories, and any new test inputs that are generated will be written
back to the first corpus directory:

.. code-block:: console

  ./fuzzer [-flag1=val1 [-flag2=val2 ...] ] [dir1 [dir2 ...] ]

If a list of files (rather than directories) is passed to the fuzzer program,
then it will re-run those files as test inputs but will not perform any fuzzing.
In this mode the fuzzer binary can be used as a regression test (e.g. on a
continuous integration system) to check that the target function and saved
inputs still work.
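
Conceptually, re-running saved inputs amounts to reading each file and passing
its bytes to the target function once. The following sketch shows that idea as
a stand-alone helper; it is not libFuzzer's own code. ``RunInputs`` is a
hypothetical name, and the empty target is included only so that the example
is self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Toy target used only to make the sketch self-contained; a real build would
// link the project's own LLVMFuzzerTestOneInput instead.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  (void)Data;
  (void)Size;
  return 0;
}

// Feeds each readable file to the target once and returns how many were run.
int RunInputs(const std::vector<std::string> &paths) {
  int executed = 0;
  for (const std::string &path : paths) {
    std::ifstream in(path, std::ios::binary);
    if (!in) continue;  // Skip files that cannot be opened.
    const std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                                  std::istreambuf_iterator<char>());
    LLVMFuzzerTestOneInput(reinterpret_cast<const uint8_t *>(bytes.data()),
                           bytes.size());
    ++executed;
  }
  return executed;
}
```

A crash or sanitizer report while replaying any of the files points at a
regression in the code under test.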

The most important command line options are:

``-help``
  Print help message.
``-seed``
  Random seed. If 0 (the default), the seed is generated.
``-runs``
  Number of individual test runs, -1 (the default) to run indefinitely.
``-max_len``
  Maximum length of a test input. If 0 (the default), libFuzzer tries to guess
  a good value based on the corpus (and reports it).
``-timeout``
  Timeout in seconds, default 1200. If an input takes longer than this timeout,
  the process is treated as a failure case.
``-timeout_exitcode``
  Exit code (default 77) to emit when terminating due to timeout, when
  ``-abort_on_timeout`` is not set.
``-max_total_time``
  If positive, indicates the maximum total time in seconds to run the fuzzer.
  If 0 (the default), run indefinitely.
``-merge``
  If set to 1, any corpus inputs from the 2nd, 3rd etc. corpus directories
  that trigger new code coverage will be merged into the first corpus
  directory. Defaults to 0.
``-reload``
  If set to 1 (the default), the corpus directory is re-read periodically to
  check for new inputs; this allows detection of new inputs that were discovered
  by other fuzzing processes.
``-jobs``
  Number of fuzzing jobs to run to completion. Default value is 0, which runs a
  single fuzzing process until completion. If the value is >= 1, then this
  number of jobs performing fuzzing are run, in a collection of parallel
  separate worker processes; each such worker process has its
  ``stdout``/``stderr`` redirected to ``fuzz-<JOB>.log``.
``-workers``
  Number of simultaneous worker processes to run the fuzzing jobs to completion
  in. If 0 (the default), ``min(jobs, NumberOfCpuCores()/2)`` is used.
``-dict``
  Provide a dictionary of input keywords; see Dictionaries_.
``-use_counters``
  Use `coverage counters`_ to generate approximate counts of how often code
  blocks are hit; defaults to 1.
``-use_traces``
  Use instruction traces (experimental, defaults to 0); see `Data-flow-guided fuzzing`_.
``-only_ascii``
  If 1, generate only ASCII (``isprint``+``isspace``) inputs. Defaults to 0.
``-artifact_prefix``
  Provide a prefix to use when saving fuzzing artifacts (crash, timeout, or
  slow inputs) as ``$(artifact_prefix)file``. Defaults to empty.
``-exact_artifact_path``
  Ignored if empty (the default). If non-empty, write the single artifact on
  failure (crash, timeout) as ``$(exact_artifact_path)``. This overrides
  ``-artifact_prefix`` and will not use checksum in the file name. Do not use
  the same path for several parallel processes.
``-print_final_stats``
  If 1, print statistics at exit. Defaults to 0.
``-detect_leaks``
  If 1 (default) and if LeakSanitizer is enabled
  try to detect memory leaks during fuzzing (i.e. not only at shut down).
``-close_fd_mask``
  Indicate output streams to close at startup. Be careful, this will also
  remove diagnostic output from the tools in use; for example the messages
  AddressSanitizer_ sends to ``stderr``/``stdout`` will also be lost.

  - 0 (default): close neither ``stdout`` nor ``stderr``
  - 1 : close ``stdout``
  - 2 : close ``stderr``
  - 3 : close both ``stdout`` and ``stderr``.

For the full list of flags run the fuzzer binary with ``-help=1``.

Output
======

During operation the fuzzer prints information to ``stderr``, for example::

  INFO: Seed: 3338750330
  Loaded 1024/1211 files from corpus/
  INFO: -max_len is not provided, using 64
  #0 READ units: 1211 exec/s: 0
  #1211 INITED cov: 2575 bits: 8855 indir: 5 units: 830 exec/s: 1211
  #1422 NEW cov: 2580 bits: 8860 indir: 5 units: 831 exec/s: 1422 L: 21 MS: 1 ShuffleBytes-
  #1688 NEW cov: 2581 bits: 8865 indir: 5 units: 832 exec/s: 1688 L: 19 MS: 2 EraseByte-CrossOver-
  #1734 NEW cov: 2583 bits: 8879 indir: 5 units: 833 exec/s: 1734 L: 27 MS: 3 ChangeBit-EraseByte-ShuffleBytes-
  ...

The early parts of the output include information about the fuzzer options and
configuration, including the current random seed (in the ``Seed:`` line; this
can be overridden with the ``-seed=N`` flag).

Further output lines have the form of an event code and statistics. The
possible event codes are:

``READ``
  The fuzzer has read in all of the provided input samples from the corpus
  directories.
``INITED``
  The fuzzer has completed initialization, which includes running each of
  the initial input samples through the code under test.
``NEW``
  The fuzzer has created a test input that covers new areas of the code
  under test. This input will be saved to the primary corpus directory.
``pulse``
  The fuzzer has generated 2\ :sup:`n` inputs (generated periodically to reassure
  the user that the fuzzer is still working).
``DONE``
  The fuzzer has completed operation because it has reached the specified
  iteration limit (``-runs``) or time limit (``-max_total_time``).
``MIN<n>``
  The fuzzer is minimizing the combination of input corpus directories into
  a single unified corpus (due to the ``-merge`` command line option).
``RELOAD``
  The fuzzer is performing a periodic reload of inputs from the corpus
  directory; this allows it to discover any inputs discovered by other
  fuzzer processes (see `Parallel Fuzzing`_).

Each output line also reports the following statistics (when non-zero):

``cov:``
  Total number of code blocks or edges covered by executing the current
  corpus.
``bits:``
  Rough measure of the number of code blocks or edges covered, and how often;
  only valid if the fuzzer is run with ``-use_counters=1``.
``indir:``
  Number of distinct function `caller-callee pairs`_ executed with the
  current corpus; only valid if the code under test was built with
  ``-fsanitize-coverage=indirect-calls``.
``units:``
  Number of entries in the current input corpus.
``exec/s:``
  Number of fuzzer iterations per second.

For ``NEW`` events, the output line also includes information about the mutation
operation that produced the new input:

``L:``
  Size of the new input in bytes.
``MS: <n> <operations>``
  Count and list of the mutation operations used to generate the input.
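
When fuzzer logs are post-processed, for example to plot ``cov:`` over time, a
status line can be split into its event code and its ``name: value`` pairs. The
sketch below assumes the line layout shown in the sample output of this
section; ``StatusLine`` and ``ParseStatusLine`` are hypothetical names and are
not part of libFuzzer.

```cpp
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// One parsed status line, e.g. "#1422 NEW cov: 2580 ... L: 21 MS: 1 ...".
struct StatusLine {
  long iteration = -1;
  std::string event;
  std::map<std::string, long> stats;  // Keyed by name without the colon.
};

// Returns false for lines that do not start with "#<iteration>".
bool ParseStatusLine(const std::string &line, StatusLine *out) {
  std::istringstream in(line);
  std::string tok;
  if (!(in >> tok) || tok.size() < 2 || tok[0] != '#') return false;
  out->iteration = std::strtol(tok.c_str() + 1, nullptr, 10);
  if (!(in >> out->event)) return false;

  std::string key;
  while (in >> tok) {
    if (tok.back() == ':') {
      key = tok.substr(0, tok.size() - 1);  // Remember "cov", "bits", ...
    } else if (!key.empty()) {
      out->stats[key] = std::strtol(tok.c_str(), nullptr, 10);
      key.clear();
    }
    // Tokens without a pending key (the mutation list) are ignored.
  }
  return true;
}
```

Lines such as ``INFO: Seed: ...`` are rejected because they do not begin with
an iteration number.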


Examples
========
.. contents::
   :local:
   :depth: 1

Toy example
-----------

A simple function that does something interesting if it receives the input
"HI!"::

  cat << EOF > test_fuzzer.cc
  #include <stdint.h>
  #include <stddef.h>
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size > 0 && data[0] == 'H')
      if (size > 1 && data[1] == 'I')
        if (size > 2 && data[2] == '!')
          __builtin_trap();
    return 0;
  }
  EOF
  # Build test_fuzzer.cc with asan and link against libFuzzer.a
  clang++ -fsanitize=address -fsanitize-coverage=edge test_fuzzer.cc libFuzzer.a
  # Run the fuzzer with no corpus.
  ./a.out

You should get an error pretty quickly::

  #0 READ units: 1 exec/s: 0
  #1 INITED cov: 3 units: 1 exec/s: 0
  #2 NEW cov: 5 units: 2 exec/s: 0 L: 64 MS: 0
  #19237 NEW cov: 9 units: 3 exec/s: 0 L: 64 MS: 0
  #20595 NEW cov: 10 units: 4 exec/s: 0 L: 1 MS: 4 ChangeASCIIInt-ShuffleBytes-ChangeByte-CrossOver-
  #34574 NEW cov: 13 units: 5 exec/s: 0 L: 2 MS: 3 ShuffleBytes-CrossOver-ChangeBit-
  #34807 NEW cov: 15 units: 6 exec/s: 0 L: 3 MS: 1 CrossOver-
  ==31511== ERROR: libFuzzer: deadly signal
  ...
  artifact_prefix='./'; Test unit written to ./crash-b13e8756b13a00cf168300179061fb4b91fefbed


PCRE2
-----

Here we show how to use libFuzzer on something real, yet simple: pcre2_::

  COV_FLAGS=" -fsanitize-coverage=edge,indirect-calls,8bit-counters"
  # Get PCRE2
  wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.20.tar.gz
  tar xf pcre2-10.20.tar.gz
  # Build PCRE2 with AddressSanitizer and coverage; requires autotools.
  (cd pcre2-10.20; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install)
  # Build the fuzzing target function that does something interesting with PCRE2.
  cat << EOF > pcre_fuzzer.cc
  #include <string.h>
  #include <stdint.h>
  #include "pcre2posix.h"
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1) return 0;
    char *str = new char[size+1];
    memcpy(str, data, size);
    str[size] = 0;
    regex_t preg;
    if (0 == regcomp(&preg, str, 0)) {
      regexec(&preg, str, 0, 0, 0);
      regfree(&preg);
    }
    delete [] str;
    return 0;
  }
  EOF
  clang++ -g -fsanitize=address $COV_FLAGS -c -std=c++11 -I inst/include/ pcre_fuzzer.cc
  # Link.
  clang++ -g -fsanitize=address -Wl,--whole-archive inst/lib/*.a -Wl,-no-whole-archive libFuzzer.a pcre_fuzzer.o -o pcre_fuzzer

This will give you a binary of the fuzzer, called ``pcre_fuzzer``.
Now, create a directory that will hold the test corpus:

.. code-block:: console

  mkdir -p CORPUS

For simple input languages like regular expressions this is all you need.
For more complicated/structured inputs, the fuzzer works much more efficiently
if you can populate the corpus directory with a variety of valid and invalid
inputs for the code under test.
Now run the fuzzer with the corpus directory as the only parameter:

.. code-block:: console

  ./pcre_fuzzer ./CORPUS

Initially, you will see Output_ like this::

  INFO: Seed: 2938818941
  INFO: -max_len is not provided, using 64
  INFO: A corpus is not provided, starting from an empty corpus
  #0 READ units: 1 exec/s: 0
  #1 INITED cov: 3 bits: 3 units: 1 exec/s: 0
  #2 NEW cov: 176 bits: 176 indir: 3 units: 2 exec/s: 0 L: 64 MS: 0
  #8 NEW cov: 176 bits: 179 indir: 3 units: 3 exec/s: 0 L: 63 MS: 2 ChangeByte-EraseByte-
  ...
  #14004 NEW cov: 1500 bits: 4536 indir: 5 units: 406 exec/s: 0 L: 54 MS: 3 ChangeBit-ChangeBit-CrossOver-

Now, interrupt the fuzzer and run it again the same way. You will see::

  INFO: Seed: 3398349082
  INFO: -max_len is not provided, using 64
  #0 READ units: 405 exec/s: 0
  #405 INITED cov: 1499 bits: 4535 indir: 5 units: 286 exec/s: 0
  #587 NEW cov: 1499 bits: 4540 indir: 5 units: 287 exec/s: 0 L: 52 MS: 2 InsertByte-EraseByte-
  #667 NEW cov: 1501 bits: 4542 indir: 5 units: 288 exec/s: 0 L: 39 MS: 2 ChangeBit-InsertByte-
  #672 NEW cov: 1501 bits: 4543 indir: 5 units: 289 exec/s: 0 L: 15 MS: 2 ChangeASCIIInt-ChangeBit-
  #739 NEW cov: 1501 bits: 4544 indir: 5 units: 290 exec/s: 0 L: 64 MS: 4 ShuffleBytes-ChangeASCIIInt-InsertByte-ChangeBit-
  ...

On the second execution the fuzzer has a non-empty input corpus (405 items). As
the first step, the fuzzer minimized this corpus (the ``INITED`` line) to
produce 286 interesting items, omitting inputs that do not hit any additional
code.

(Aside: although the fuzzer only saves new inputs that hit additional code, this
does not mean that the corpus as a whole is kept minimized. For example, if
an input that hits A-B-C and then an input that hits A-B-C-D are generated,
they will both be saved, even though the latter subsumes the former.)


You may run ``N`` independent fuzzer jobs in parallel on ``M`` CPUs:

.. code-block:: console

  N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M

By default (``-reload=1``) the fuzzer processes will periodically scan the corpus directory
and reload any new tests. This way the test inputs found by one process will be picked up
by all others.

If ``-workers=$M`` is not supplied, ``min($N,NumberOfCpuCores()/2)`` will be used.

Heartbleed
----------
Remember Heartbleed_?
As it was recently `shown <https://blog.hboeck.de/archives/868-How-Heartbleed-couldve-been-found.html>`_,
fuzzing with AddressSanitizer_ can find Heartbleed. Indeed, here are the step-by-step instructions
to find Heartbleed with libFuzzer::

  wget https://www.openssl.org/source/openssl-1.0.1f.tar.gz
  tar xf openssl-1.0.1f.tar.gz
  COV_FLAGS="-fsanitize-coverage=edge,indirect-calls" # -fsanitize-coverage=8bit-counters
  (cd openssl-1.0.1f/ && ./config &&
    make -j 32 CC="clang -g -fsanitize=address $COV_FLAGS")
  # Get and build libFuzzer
  svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
  clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
  # Get examples of key/pem files.
  git clone https://github.com/hannob/selftls
  cp selftls/server* . -v
  cat << EOF > handshake-fuzz.cc
  #include <openssl/ssl.h>
  #include <openssl/err.h>
  #include <assert.h>
  #include <stdint.h>
  #include <stddef.h>

  SSL_CTX *sctx;
  int Init() {
    SSL_library_init();
    SSL_load_error_strings();
    ERR_load_BIO_strings();
    OpenSSL_add_all_algorithms();
    assert (sctx = SSL_CTX_new(TLSv1_method()));
    assert (SSL_CTX_use_certificate_file(sctx, "server.pem", SSL_FILETYPE_PEM));
    assert (SSL_CTX_use_PrivateKey_file(sctx, "server.key", SSL_FILETYPE_PEM));
    return 0;
  }
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    static int unused = Init();
    SSL *server = SSL_new(sctx);
    BIO *sinbio = BIO_new(BIO_s_mem());
    BIO *soutbio = BIO_new(BIO_s_mem());
    SSL_set_bio(server, sinbio, soutbio);
    SSL_set_accept_state(server);
    BIO_write(sinbio, Data, Size);
    SSL_do_handshake(server);
    SSL_free(server);
    return 0;
  }
  EOF
  # Build the fuzzer.
  clang++ -g handshake-fuzz.cc -fsanitize=address \
    openssl-1.0.1f/libssl.a openssl-1.0.1f/libcrypto.a Fuzzer*.o
  # Run 20 independent fuzzer jobs.
  ./a.out -jobs=20 -workers=20

Voila::

  #1048576 pulse cov 3424 bits 0 units 9 exec/s 24385
  =================================================================
  ==17488==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000004748 at pc 0x00000048c979 bp 0x7fffe3e864f0 sp 0x7fffe3e85ca8
  READ of size 60731 at 0x629000004748 thread T0
      #0 0x48c978 in __asan_memcpy
      #1 0x4db504 in tls1_process_heartbeat openssl-1.0.1f/ssl/t1_lib.c:2586:3
      #2 0x580be3 in ssl3_read_bytes openssl-1.0.1f/ssl/s3_pkt.c:1092:4

Note: a `similar fuzzer <https://boringssl.googlesource.com/boringssl/+/HEAD/FUZZING.md>`_
is now a part of the BoringSSL_ source tree.
Kostya Serebryany	1c80b9d	2015-11-26 00:12:57 +0000	[diff] [blame]	539
Kostya Serebryany	043ab1c	2015-04-01 21:33:20 +0000	[diff] [blame]	540	Advanced features
				541	=================
Kostya Serebryany	d11dc17	2016-03-12 02:56:25 +0000	[diff] [blame]	542	.. contents::
				543	:local:
				544	:depth: 1
Kostya Serebryany	043ab1c	2015-04-01 21:33:20 +0000	[diff] [blame]	545
Kostya Serebryany	7d21166	2015-09-04 00:12:11 +0000	[diff] [blame]	546	Dictionaries
				547	------------
Kostya Serebryany	7d21166	2015-09-04 00:12:11 +0000	[diff] [blame]	548	LibFuzzer supports user-supplied dictionaries with input language keywords
				549	or other interesting byte sequences (e.g. multi-byte magic values).
Use ``-dict=DICTIONARY_FILE``. For some input languages, using a dictionary
may significantly improve the search speed.
				552	The dictionary syntax is similar to that used by AFL_ for its ``-x`` option::
				553
				554	# Lines starting with '#' and empty lines are ignored.
				555
				556	# Adds "blah" (w/o quotes) to the dictionary.
				557	kw1="blah"
				558	# Use \\ for backslash and \" for quotes.
				559	kw2="\"ac\\dc\""
				560	# Use \xAB for hex values
				561	kw3="\xF7\xF8"
				562	# the name of the keyword followed by '=' may be omitted:
				563	"foo\x0Abar"
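
The escaping rules above can be made concrete with a small decoder. This is a minimal sketch under stated assumptions, not libFuzzer's own dictionary parser: ``ParseDictionaryLine`` is a hypothetical helper that handles only the three escapes listed in the comments above (``\\``, ``\"`` and ``\xAB``).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of libFuzzer): decodes one dictionary line of
// the form  name="value"  or  "value"  into raw bytes.
// Returns false for comments, blank lines and malformed entries; on failure
// the contents of *Value are unspecified.
bool ParseDictionaryLine(const std::string &Line, std::string *Value) {
  Value->clear();
  size_t Pos = Line.find_first_not_of(" \t");
  if (Pos == std::string::npos || Line[Pos] == '#') return false;
  size_t Open = Line.find('"', Pos);  // opening quote of the value
  size_t Close = Line.rfind('"');     // closing quote of the value
  if (Open == std::string::npos || Close <= Open) return false;
  for (size_t I = Open + 1; I < Close; ++I) {
    char C = Line[I];
    if (C != '\\') {
      Value->push_back(C);
      continue;
    }
    if (I + 1 >= Close) return false;  // dangling backslash
    char Next = Line[++I];
    if (Next == '\\' || Next == '"') {
      Value->push_back(Next);
    } else if (Next == 'x' && I + 2 < Close &&
               std::isxdigit(static_cast<unsigned char>(Line[I + 1])) &&
               std::isxdigit(static_cast<unsigned char>(Line[I + 2]))) {
      // Two hex digits encode one byte.
      Value->push_back(
          static_cast<char>(std::stoi(Line.substr(I + 1, 2), nullptr, 16)));
      I += 2;
    } else {
      return false;  // escape not covered by this sketch
    }
  }
  return !Value->empty();
}
```

With this helper, the ``kw2`` line from the example decodes to the seven bytes ``"ac\dc"`` (quotes included), and ``"foo\x0Abar"`` decodes to ``foo``, a newline byte, and ``bar``.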
				564
Kostya Serebryany	b17e298	2015-07-31 21:48:10 +0000	[diff] [blame]	565	Data-flow-guided fuzzing
				566	------------------------
				567
				568	EXPERIMENTAL.
With the additional compiler flag ``-fsanitize-coverage=trace-cmp`` (see SanitizerCoverageTraceDataFlow_)
and the extra run-time flag ``-use_traces=1``, the fuzzer will try to apply data-flow-guided fuzzing.
That is, the fuzzer will record the inputs to comparison instructions, switch statements,
and several libc functions (``memcmp``, ``strcmp``, ``strncmp``, etc.).
It will later use those recorded inputs during mutations.
				574
				575	This mode can be combined with DataFlowSanitizer_ to achieve better sensitivity.
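
As an illustration of the kind of code where recorded comparison operands could help, consider the hypothetical target below. ``ParseHeader`` and its input format are made up for this sketch and are not part of libFuzzer. Passing both checks needs a 4-byte magic value and an exact 32-bit field, which random byte mutations are unlikely to produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical code under test: accepts inputs that start with the magic
// bytes "FUZZ" followed by a 32-bit field (host byte order) equal to 0x1234.
static bool ParseHeader(const uint8_t *Data, size_t Size) {
  if (Size < 8) return false;
  // A libc comparison: one of the calls whose inputs can be recorded.
  if (std::memcmp(Data, "FUZZ", 4) != 0) return false;
  uint32_t Length;
  std::memcpy(&Length, Data + 4, sizeof(Length));
  // An integer comparison against a constant.
  return Length == 0x1234;
}

// The fuzzing entry point simply forwards the input to the code under test.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  ParseHeader(Data, Size);
  return 0;
}
```

Such a target would be built with ``-fsanitize-coverage=trace-cmp`` and run with ``-use_traces=1``, as described above.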
				576
Kostya Serebryany	6bd016b	2015-04-10 05:44:43 +0000	[diff] [blame]	577	AFL compatibility
				578	-----------------
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	579	LibFuzzer can be used together with AFL_ on the same test corpus.
Kostya Serebryany	6bd016b	2015-04-10 05:44:43 +0000	[diff] [blame]	580	Both fuzzers expect the test corpus to reside in a directory, one file per input.
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	581	You can run both fuzzers on the same corpus, one after another:
				582
				583	.. code-block:: console
Kostya Serebryany	6bd016b	2015-04-10 05:44:43 +0000	[diff] [blame]	584
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	585	./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@
Kostya Serebryany	6bd016b	2015-04-10 05:44:43 +0000	[diff] [blame]	586	./llvm-fuzz testcase_dir findings_dir # Will write new tests to testcase_dir
				587
				588	Periodically restart both fuzzers so that they can use each other's findings.
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	589	Currently, there is no simple way to run both fuzzing engines in parallel while sharing the same corpus dir.
Kostya Serebryany	7967738	2015-03-31 21:39:38 +0000	[diff] [blame]	590
Kostya Serebryany	cd073d5	2015-04-10 06:32:29 +0000	[diff] [blame]	591	How good is my fuzzer?
				592	----------------------
				593
Kostya Serebryany	566bc5a	2015-05-06 22:19:00 +0000	[diff] [blame]	594	Once you implement your target function ``LLVMFuzzerTestOneInput`` and fuzz it to death,
Kostya Serebryany	cd073d5	2015-04-10 06:32:29 +0000	[diff] [blame]	595	you will want to know whether the function or the corpus can be improved further.
One easy-to-use metric is, of course, code coverage.
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	597	You can get the coverage for your corpus like this:
				598
				599	.. code-block:: console
Kostya Serebryany	cd073d5	2015-04-10 06:32:29 +0000	[diff] [blame]	600
Kostya Serebryany	ec77af3	2016-05-05 18:07:09 +0000	[diff] [blame]	601	ASAN_OPTIONS=coverage=1:html_cov_report=1 ./fuzzer CORPUS_DIR -runs=0
Kostya Serebryany	cd073d5	2015-04-10 06:32:29 +0000	[diff] [blame]	602
Kostya Serebryany	ec77af3	2016-05-05 18:07:09 +0000	[diff] [blame]	603	This will run all tests in the CORPUS_DIR but will not perform any fuzzing.
				604	At the end of the process it will dump a single html file with coverage information.
				605	See SanitizerCoverage_ for details.
				606
				607	You may also use other ways to visualize coverage,
				608	e.g. `llvm-cov <http://llvm.org/docs/CommandGuide/llvm-cov.html>`_, but those will require
				609	you to rebuild the code with different compiler flags.
Kostya Serebryany	cd073d5	2015-04-10 06:32:29 +0000	[diff] [blame]	610
Kostya Serebryany	926b9bd	2015-05-22 22:43:05 +0000	[diff] [blame]	611	User-supplied mutators
				612	----------------------
				613
LibFuzzer allows the use of custom (user-supplied) mutators;
see FuzzerInterface.h_ for details.
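
A minimal sketch of what such a mutator might look like. It assumes the ``LLVMFuzzerCustomMutator`` hook with the signature used below. That name and signature are an assumption here, so check the copy of FuzzerInterface.h_ in your checkout before relying on them. The mutation rule itself (keep the first byte, flip one bit elsewhere) is a made-up example.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed hook name and signature; verify against your FuzzerInterface.h.
// Mutates Data in place and returns the new size, which must not exceed
// MaxSize.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  if (MaxSize == 0) return 0;
  if (Size > MaxSize) Size = MaxSize;
  if (Size == 0) {
    // Nothing to mutate yet: produce a one-byte input derived from the seed.
    Data[0] = static_cast<uint8_t>('A' + Seed % 26);
    return 1;
  }
  // Hypothetical format rule: byte 0 is a tag that should stay intact, so
  // pick a position after it whenever the input is long enough.
  size_t Pos = Size > 1 ? 1 + Seed % (Size - 1) : 0;
  // Flip a single bit chosen by the seed.
  Data[Pos] ^= static_cast<uint8_t>(1u << (Seed % 8));
  return Size;
}
```

Deriving every choice from ``Seed`` keeps the mutation deterministic for a given seed.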
				616
Kostya Serebryany	aca7696	2016-01-16 01:23:12 +0000	[diff] [blame]	617	Startup initialization
				618	----------------------
				619	If the library being tested needs to be initialized, there are several options.
				620
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	621	The simplest way is to have a statically initialized global object:
				622
				623	.. code-block:: c++
Kostya Serebryany	aca7696	2016-01-16 01:23:12 +0000	[diff] [blame]	624
				625	static bool Initialized = DoInitialization();
				626
				627	Alternatively, you may define an optional init function and it will receive
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	628	the program arguments that you can read and modify:
				629
				630	.. code-block:: c++
Kostya Serebryany	aca7696	2016-01-16 01:23:12 +0000	[diff] [blame]	631
  extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
    ReadAndMaybeModify(argc, argv);
    return 0;
  }
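
One possible use of the init function is to consume a target-specific flag before the fuzzer sees the command line. The sketch below assumes the pointer-based signature ``int LLVMFuzzerInitialize(int *argc, char ***argv)`` declared in FuzzerInterface.h_. The flag ``--my_lib_verbose`` and the ``Verbose`` setting are hypothetical.

```cpp
#include <cstring>

// Hypothetical setting of the library under test.
static bool Verbose = false;

// Scans the arguments for a made-up flag, records it, and removes it from
// the argument list by compacting the remaining entries in place.
extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
  if (*argc < 1) return 0;
  int Out = 1;  // argv[0] (the program name) is always kept
  for (int In = 1; In < *argc; ++In) {
    if (std::strcmp((*argv)[In], "--my_lib_verbose") == 0)
      Verbose = true;  // consume the flag
    else
      (*argv)[Out++] = (*argv)[In];
  }
  *argc = Out;
  return 0;
}
```

Because the arguments are passed by pointer, the changes to ``*argc`` and ``*argv`` are visible to the caller.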
				636
Kostya Serebryany	aca7696	2016-01-16 01:23:12 +0000	[diff] [blame]	637
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	638	Leaks
				639	-----
				640
Kostya Serebryany	2fe9304	2016-04-29 18:49:55 +0000	[diff] [blame]	641	Binaries built with AddressSanitizer_ or LeakSanitizer_ will try to detect
memory leaks at process shutdown.
				643	For in-process fuzzing this is inconvenient
				644	since the fuzzer needs to report a leak with a reproducer as soon as the leaky
				645	mutation is found. However, running full leak detection after every mutation
				646	is expensive.
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	647
Kostya Serebryany	2fe9304	2016-04-29 18:49:55 +0000	[diff] [blame]	648	By default (``-detect_leaks=1``) libFuzzer will count the number of
				649	``malloc`` and ``free`` calls when executing every mutation.
				650	If the numbers don't match (which by itself doesn't mean there is a leak)
				651	libFuzzer will invoke the more expensive LeakSanitizer_
				652	pass and if the actual leak is found, it will be reported with the reproducer
				653	and the process will exit.
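
The cheap first stage can be pictured as follows. This is only a sketch of the counting idea, not libFuzzer's implementation: ``CountedMalloc``, ``CountedFree`` and ``NeedsFullLeakCheck`` are hypothetical names, and how libFuzzer actually observes allocations is not shown here.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical allocation counters.
static size_t NumMallocs = 0;
static size_t NumFrees = 0;

void *CountedMalloc(size_t Bytes) {
  ++NumMallocs;
  return std::malloc(Bytes);
}

void CountedFree(void *Ptr) {
  if (Ptr) ++NumFrees;
  std::free(Ptr);
}

// Runs one input and reports whether the allocation and deallocation counts
// differ. A mismatch is only a hint: the memory may still be reachable, so a
// real leak check has to follow before anything is reported.
template <typename Target>
bool NeedsFullLeakCheck(Target RunOneInput) {
  const size_t MallocsBefore = NumMallocs;
  const size_t FreesBefore = NumFrees;
  RunOneInput();
  return (NumMallocs - MallocsBefore) != (NumFrees - FreesBefore);
}
```

A run that allocates and frees the same number of blocks skips the expensive pass. A run that stores a freshly allocated block in a global produces a mismatch even though nothing has leaked, which is why the mismatch alone is not reported as a leak.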
				654
If your target has massive leaks and leak detection is disabled,
you will eventually run out of RAM.
To protect your machine from OOM death, you may use
e.g. ``ASAN_OPTIONS=hard_rss_limit_mb=2000`` (with AddressSanitizer_).
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	659
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	660
Kostya Serebryany	7967738	2015-03-31 21:39:38 +0000	[diff] [blame]	661	Fuzzing components of LLVM
				662	==========================
Kostya Serebryany	d11dc17	2016-03-12 02:56:25 +0000	[diff] [blame]	663	.. contents::
				664	:local:
				665	:depth: 1
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	666
				667	clang-format-fuzzer
				668	-------------------
				669	The inputs are random pieces of C++-like text.
				670
Build (make sure to use a fresh clang as the host compiler):
				672
				673	.. code-block:: console
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	674
				675	cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release /path/to/llvm
				676	ninja clang-format-fuzzer
				677	mkdir CORPUS_DIR
				678	./bin/clang-format-fuzzer CORPUS_DIR
				679
Optionally build other kinds of binaries (ASan+Debug, MSan, UBSan, etc.).
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	681
Kostya Serebryany	7967738	2015-03-31 21:39:38 +0000	[diff] [blame]	682	Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	683
Kostya Serebryany	7967738	2015-03-31 21:39:38 +0000	[diff] [blame]	684	clang-fuzzer
				685	------------
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	686
Kostya Serebryany	866e0d1	2015-09-02 22:44:46 +0000	[diff] [blame]	687	The behavior is very similar to ``clang-format-fuzzer``.
Kostya Serebryany	7967738	2015-03-31 21:39:38 +0000	[diff] [blame]	688
				689	Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	690
Kostya Serebryany	b98e327	2015-08-31 18:57:24 +0000	[diff] [blame]	691	llvm-as-fuzzer
				692	--------------
				693
				694	Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=24639
				695
Daniel Sanders	5151b20	2015-09-18 10:47:45 +0000	[diff] [blame]	696	llvm-mc-fuzzer
				697	--------------
				698
This tool fuzzes the MC layer. Currently it is only able to fuzz the
disassembler, but it is hoped that assembly and round-trip verification will be
added in the future.

When run in disassembly mode, the inputs are opcodes to be disassembled. The
fuzzer will consume as many instructions as possible and will stop when it
finds an invalid instruction or runs out of data.
				706
Daniel Sanders	4fe1c8b	2015-09-26 17:09:01 +0000	[diff] [blame]	707	Please note that the command line interface differs slightly from that of other
				708	fuzzers. The fuzzer arguments should follow ``--fuzzer-args`` and should have
				709	a single dash, while other arguments control the operation mode and target in a
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	710	similar manner to ``llvm-mc`` and should have two dashes. For example:
				711
				712	.. code-block:: console
Daniel Sanders	5151b20	2015-09-18 10:47:45 +0000	[diff] [blame]	713
Daniel Sanders	4fe1c8b	2015-09-26 17:09:01 +0000	[diff] [blame]	714	llvm-mc-fuzzer --triple=aarch64-linux-gnu --disassemble --fuzzer-args -max_len=4 -jobs=10
Daniel Sanders	5151b20	2015-09-18 10:47:45 +0000	[diff] [blame]	715
Kostya Serebryany	fb2f331	2015-05-13 22:42:28 +0000	[diff] [blame]	716	Buildbot
				717	--------
				718
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	719	A buildbot continuously runs the above fuzzers for LLVM components, with results
				720	shown at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .
Kostya Serebryany	fb2f331	2015-05-13 22:42:28 +0000	[diff] [blame]	721
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	722	FAQ
===
				724
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	725	Q. Why doesn't libFuzzer use any of the LLVM support?
				726	-----------------------------------------------------
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	727
				728	There are two reasons.
				729
First, we want this library to be used outside of LLVM without users having to
build the rest of LLVM. This may sound unconvincing to many LLVM folks,
but in practice the need for building the whole LLVM frightens many potential
users -- and we want more users to use this code.
				734
				735	Second, there is a subtle technical reason not to rely on the rest of LLVM, or
				736	any other large body of code (maybe not even STL). When coverage instrumentation
				737	is enabled, it will also instrument the LLVM support code which will blow up the
				738	coverage set of the process (since the fuzzer is in-process). In other words, by
				739	using more external dependencies we will slow down the fuzzer while the main
				740	reason for it to exist is extreme speed.
				741
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	742	Q. What about Windows then? The fuzzer contains code that does not build on Windows.
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	743	------------------------------------------------------------------------------------
				744
Kostya Serebryany	241fb61	2016-03-12 03:23:02 +0000	[diff] [blame]	745	Volunteers are welcome.
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	746
Q. When is this Fuzzer not a good solution for a problem?
				748	---------------------------------------------------------
				749
				750	* If the test inputs are validated by the target library and the validator
Kostya Serebryany	241fb61	2016-03-12 03:23:02 +0000	[diff] [blame]	751	asserts/crashes on invalid inputs, in-process fuzzing is not applicable.
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	752	* Bugs in the target library may accumulate without being detected. E.g. a memory
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	753	corruption that goes undetected at first and then leads to a crash while
				754	testing another input. This is why it is highly recommended to run this
				755	in-process fuzzer with all sanitizers to detect most bugs on the spot.
				756	* It is harder to protect the in-process fuzzer from excessive memory
				757	consumption and infinite loops in the target library (still possible).
				758	* The target library should not have significant global state that is not
				759	reset between the runs.
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	760	* Many interesting target libraries are not designed in a way that supports
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	761	the in-process fuzzer interface (e.g. require a file path instead of a
				762	byte array).
* If a single test run takes a considerable fraction of a second (or
  more), the speed benefit from the in-process fuzzer is negligible.
* If the target library runs persistent threads (that outlive
  execution of one test), the fuzzing results will be unreliable.
				767
Q. So, what exactly is this Fuzzer good for?
				769	--------------------------------------------
				770
This Fuzzer might be a good choice for testing libraries whose inputs are
relatively small, where each input takes < 10ms to run, and whose code is not
expected to crash on invalid inputs.
Examples: regular expression matchers, text or binary format parsers, compression,
network, crypto.
Kostya Serebryany	35ce863	2015-03-30 23:05:30 +0000	[diff] [blame]	776
Kostya Serebryany	fab4fba	2015-08-11 01:53:45 +0000	[diff] [blame]	777	Trophies
				778	========
				779	* GLIBC: https://sourceware.org/glibc/wiki/FuzzingLibc
Kostya Serebryany	fdf4418	2015-08-11 04:16:37 +0000	[diff] [blame]	780
Kostya Serebryany	fab4fba	2015-08-11 01:53:45 +0000	[diff] [blame]	781	* MUSL LIBC:
Kostya Serebryany	fdf4418	2015-08-11 04:16:37 +0000	[diff] [blame]	782
				783	* http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c
				784	* http://www.openwall.com/lists/oss-security/2015/03/30/3
				785
Kostya Serebryany	928eb33	2015-10-12 18:15:42 +0000	[diff] [blame]	786	* `pugixml <https://github.com/zeux/pugixml/issues/39>`_
Kostya Serebryany	fdf4418	2015-08-11 04:16:37 +0000	[diff] [blame]	787
Kostya Serebryany	45dac2a	2015-10-10 02:14:18 +0000	[diff] [blame]	788	* PCRE: Search for "LLVM fuzzer" in http://vcs.pcre.org/pcre2/code/trunk/ChangeLog?view=markup;
Kostya Serebryany	928eb33	2015-10-12 18:15:42 +0000	[diff] [blame]	789	also in `bugzilla <https://bugs.exim.org/buglist.cgi?bug_status=__all__&content=libfuzzer&no_redirect=1&order=Importance&product=PCRE&query_format=specific>`_
Kostya Serebryany	fdf4418	2015-08-11 04:16:37 +0000	[diff] [blame]	790
Kostya Serebryany	928eb33	2015-10-12 18:15:42 +0000	[diff] [blame]	791	* `ICU <http://bugs.icu-project.org/trac/ticket/11838>`_
Kostya Serebryany	ed48377	2015-08-11 20:34:48 +0000	[diff] [blame]	792
Kostya Serebryany	928eb33	2015-10-12 18:15:42 +0000	[diff] [blame]	793	* `Freetype <https://savannah.nongnu.org/search/?words=LibFuzzer&type_of_search=bugs&Search=Search&exact=1#options>`_
Kostya Serebryany	6292128	2015-09-11 16:34:14 +0000	[diff] [blame]	794
Kostya Serebryany	928eb33	2015-10-12 18:15:42 +0000	[diff] [blame]	795	* `Harfbuzz <https://github.com/behdad/harfbuzz/issues/139>`_
				796
Kostya Serebryany	240a159	2015-11-11 05:25:24 +0000	[diff] [blame]	797	* `SQLite <http://www3.sqlite.org/cgi/src/info/088009efdd56160b>`_
Kostya Serebryany	65e7126	2015-11-11 05:20:55 +0000	[diff] [blame]	798
Kostya Serebryany	12fa3b5	2015-11-13 02:44:16 +0000	[diff] [blame]	799	* `Python <http://bugs.python.org/issue25388>`_
				800
Kostya Serebryany	fece674	2016-04-18 18:41:25 +0000	[diff] [blame]	801	* OpenSSL/BoringSSL: `[1] <https://boringssl.googlesource.com/boringssl/+/cb852981cd61733a7a1ae4fd8755b7ff950e857d>`_ `[2] <https://openssl.org/news/secadv/20160301.txt>`_ `[3] <https://boringssl.googlesource.com/boringssl/+/2b07fa4b22198ac02e0cee8f37f3337c3dba91bc>`_ `[4] <https://boringssl.googlesource.com/boringssl/+/6b6e0b20893e2be0e68af605a60ffa2cbb0ffa64>`_ `[5] <https://github.com/openssl/openssl/pull/931/commits/dd5ac557f052cc2b7f718ac44a8cb7ac6f77dca8>`_ `[6] <https://github.com/openssl/openssl/pull/931/commits/19b5b9194071d1d84e38ac9a952e715afbc85a81>`_
Kostya Serebryany	064a672	2015-12-05 02:23:49 +0000	[diff] [blame]	802
Kostya Serebryany	928eb33	2015-10-12 18:15:42 +0000	[diff] [blame]	803	* `Libxml2
Kostya Serebryany	0d234c3	2016-03-29 23:13:25 +0000	[diff] [blame]	804	<https://bugzilla.gnome.org/buglist.cgi?bug_status=__all__&content=libFuzzer&list_id=68957&order=Importance&product=libxml2&query_format=specific>`_ and `[HT206167] <https://support.apple.com/en-gb/HT206167>`_ (CVE-2015-5312, CVE-2015-7500, CVE-2015-7942)
Kostya Serebryany	45dac2a	2015-10-10 02:14:18 +0000	[diff] [blame]	805
Kostya Serebryany	240a159	2015-11-11 05:25:24 +0000	[diff] [blame]	806	* `Linux Kernel's BPF verifier <https://github.com/iovisor/bpf-fuzzer>`_
Kostya Serebryany	6292128	2015-09-11 16:34:14 +0000	[diff] [blame]	807
Kostya Serebryany	c138b64	2016-04-19 22:37:44 +0000	[diff] [blame]	808	* Capstone: `[1] <https://github.com/aquynh/capstone/issues/600>`__ `[2] <https://github.com/aquynh/capstone/commit/6b88d1d51eadf7175a8f8a11b690684443b11359>`__
				809
				810	* Radare2: `[1] <https://github.com/revskills?tab=contributions&from=2016-04-09>`__
				811
				812	* gRPC: `[1] <https://github.com/grpc/grpc/pull/6071/commits/df04c1f7f6aec6e95722ec0b023a6b29b6ea871c>`__ `[2] <https://github.com/grpc/grpc/pull/6071/commits/22a3dfd95468daa0db7245a4e8e6679a52847579>`__ `[3] <https://github.com/grpc/grpc/pull/6071/commits/9cac2a12d9e181d130841092e9d40fa3309d7aa7>`__ `[4] <https://github.com/grpc/grpc/pull/6012/commits/82a91c91d01ce9b999c8821ed13515883468e203>`__ `[5] <https://github.com/grpc/grpc/pull/6202/commits/2e3e0039b30edaf89fb93bfb2c1d0909098519fa>`__ `[6] <https://github.com/grpc/grpc/pull/6106/files>`__
				813
Kostya Serebryany	62023f2	2016-05-06 20:14:48 +0000	[diff] [blame^]	814	* WOFF2: `[1] <https://github.com/google/woff2/commit/a15a8ab>`__
				815
Kostya Serebryany	240a159	2015-11-11 05:25:24 +0000	[diff] [blame]	816	* LLVM: `Clang <https://llvm.org/bugs/show_bug.cgi?id=23057>`_, `Clang-format <https://llvm.org/bugs/show_bug.cgi?id=23052>`_, `libc++ <https://llvm.org/bugs/show_bug.cgi?id=24411>`_, `llvm-as <https://llvm.org/bugs/show_bug.cgi?id=24639>`_, Disassembler: http://reviews.llvm.org/rL247405, http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416, http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420, http://reviews.llvm.org/rL247422.
Kostya Serebryany	fab4fba	2015-08-11 01:53:45 +0000	[diff] [blame]	817
Kostya Serebryany	7967738	2015-03-31 21:39:38 +0000	[diff] [blame]	818	.. _pcre2: http://www.pcre.org/
Kostya Serebryany	7967738	2015-03-31 21:39:38 +0000	[diff] [blame]	819	.. _AFL: http://lcamtuf.coredump.cx/afl/
Alexey Samsonov	675e539	2015-04-27 22:50:06 +0000	[diff] [blame]	820	.. _SanitizerCoverage: http://clang.llvm.org/docs/SanitizerCoverage.html
Kostya Serebryany	b17e298	2015-07-31 21:48:10 +0000	[diff] [blame]	821	.. _SanitizerCoverageTraceDataFlow: http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow
				822	.. _DataFlowSanitizer: http://clang.llvm.org/docs/DataFlowSanitizer.html
Kostya Serebryany	9e1a238	2016-03-29 23:07:36 +0000	[diff] [blame]	823	.. _AddressSanitizer: http://clang.llvm.org/docs/AddressSanitizer.html
Kostya Serebryany	2fe9304	2016-04-29 18:49:55 +0000	[diff] [blame]	824	.. _LeakSanitizer: http://clang.llvm.org/docs/LeakSanitizer.html
Kostya Serebryany	5e593a4	2015-04-08 06:16:11 +0000	[diff] [blame]	825	.. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed
Kostya Serebryany	926b9bd	2015-05-22 22:43:05 +0000	[diff] [blame]	826	.. _FuzzerInterface.h: https://github.com/llvm-mirror/llvm/blob/master/lib/Fuzzer/FuzzerInterface.h
Kostya Serebryany	7456af5	2016-04-28 15:19:05 +0000	[diff] [blame]	827	.. _3.7.0: http://llvm.org/releases/3.7.0/docs/LibFuzzer.html
				828	.. _building Clang from trunk: http://clang.llvm.org/get_started.html
				829	.. _MemorySanitizer: http://clang.llvm.org/docs/MemorySanitizer.html
				830	.. _UndefinedBehaviorSanitizer: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
				831	.. _`coverage counters`: http://clang.llvm.org/docs/SanitizerCoverage.html#coverage-counters
				832	.. _`caller-callee pairs`: http://clang.llvm.org/docs/SanitizerCoverage.html#caller-callee-coverage
				833	.. _BoringSSL: https://boringssl.googlesource.com/boringssl/
				834	.. _`fuzz various parts of LLVM itself`: `Fuzzing components of LLVM`_