
Tools for fuzz testing PostgreSQL's COPY FROM parser

copytester: a utility that runs a corpus of inputs against a running server
honggfuzz-pg.patch: a patch that allows the honggfuzz fuzzer to be used to test COPY FROM
gencopyfuzz: a program that generates additional input files
corpus: a prepared set of inputs, generated manually and by honggfuzz

Using copytester

Usage:

copytester <inputdir> <connstring>

copytester connects to a running PostgreSQL server and issues a COPY FROM command to load each file into a temporary table. It prints any errors and the final contents of the table. This is useful for comparing the behavior of two PostgreSQL versions, or of a patched server against an unpatched one.

There are a bunch of input files included in the 'corpus' directory, and you can use the included 'gencopyfuzz' program or honggfuzz to generate more.

For example, you can run copytester against two servers and check whether they produce the same result:

copytester corpus "dbname=postgres port=5432" > results-A.txt
copytester corpus "dbname=postgres port=5433" > results-B.txt
diff -u results-A.txt results-B.txt

Using gencopyfuzz

make
./gencopyfuzz corpus

This generates files named 'gencopyfuzz-[00000-77777]' in the corpus directory.

Using honggfuzz

Apply honggfuzz-pg.patch to the PostgreSQL sources:

cd <postgresql-source-dir>
patch -p1 < honggfuzz-pg.patch

Create a test cluster following the instructions in startfuzz.sh.

Run honggfuzz:

./startfuzz.sh

The 'corpus' directory in the git repository contains test inputs that were generated with this method. If you just want to run the existing tests against a running server, you don't need to run honggfuzz yourself.

Corpus

The 'corpus' directory contains test input files for COPY FROM. A few of them were created by hand; the rest were generated by honggfuzz. Run 'gencopyfuzz corpus' to generate another set of inputs.

The existing corpus was generated with UTF-8 as the client and server encoding. To test other encodings and encoding conversions, you may want to edit the dictionary in gencopyfuzz.c, and also run honggfuzz yourself with different settings.

Tips

By default, copytester sends the input file to the server one byte at a time. That's highly inefficient, but useful for finding bugs in the server's handling of look-ahead and buffer boundaries. You can adjust the RAW_BUF_SIZE constant if you don't want that.

Similarly, it can be very useful to reduce the server's input buffer size, by changing the RAW_BUF_SIZE constant in src/include/commands/copyfromparse_internal.h in the PostgreSQL source tree.

Most of the corpus was generated by fuzzing with UTF-8, but it can still be useful to run it with other encodings. Many inputs will fail with invalid encoding errors, but some happen to be valid in other encodings, too.

hlinnaka/pgcopyfuzz

Folders and files

Latest commit

History

Repository files navigation

Tools for fuzz testing PostgreSQL's COPY FROM parser

Using copytester

Using gencopyfuzz

Using honggfuzz

Corpus

Tips

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages