- copytester: A utility to run a corpus of inputs against a running server
- honggfuzz-pg.patch: A patch that allows using the honggfuzz fuzzer to test COPY FROM
- gencopyfuzz: A program that generates additional input files for the corpus
- corpus: A prepared corpus of different inputs, generated manually and by honggfuzz
Usage:
copytester <inputdir> <connstring>
copytester connects to a running PostgreSQL server and issues a COPY FROM command to load each file into a temporary table. It prints out any errors and the final contents of the table. This is useful for comparing the behavior of two PostgreSQL versions, or of a patched server against an unpatched one.
There are a bunch of input files included in the 'corpus' directory, and you can use the included 'gencopyfuzz' program or honggfuzz to generate more.
For example, you can run copytester against two servers and check whether they produce the same result:
copytester corpus "dbname=postgres port=5432" > results-A.txt
copytester corpus "dbname=postgres port=5433" > results-B.txt
diff -u results-A.txt results-B.txt
- make
- ./gencopyfuzz corpus
This generates files named 'gencopyfuzz-[00000-77777]' in the corpus directory.
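The range 'gencopyfuzz-[00000-77777]' reads naturally as five octal digits, which would give 8^5 = 32768 files. That reading is an assumption and is not confirmed by gencopyfuzz itself. Under that assumption, a minimal sketch of how such a name could be built from an index follows; format_corpus_name and GENCOPYFUZZ_MAX_INDEX are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Assumption: the five digits are octal, so indexes run 0..32767 (8^5 names). */
#define GENCOPYFUZZ_MAX_INDEX 077777

/*
 * Hypothetical helper: write "gencopyfuzz-NNNNN" into out, where NNNNN is
 * the index as five zero-padded octal digits.
 * Returns 0 on success, -1 if the index is out of range or out is too small.
 */
int
format_corpus_name(char *out, size_t outlen, unsigned int index)
{
    int n;

    if (index > GENCOPYFUZZ_MAX_INDEX)
        return -1;

    n = snprintf(out, outlen, "gencopyfuzz-%05o", index);
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    return 0;
}
```

If the digits turn out not to be octal, only the format string and the upper bound would need to change.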
- Apply honggfuzz-pg.patch to the PostgreSQL sources:
  cd <postgresql-source-dir>
  patch -p1 < honggfuzz-pg.patch
- Create a test cluster following the instructions in startfuzz.sh
- Run honggfuzz:
  ./startfuzz.sh
The 'corpus' directory in the git repository contains test inputs that were generated with this method. If you just want to run the existing tests against a running server, you don't need to run honggfuzz yourself.
The 'corpus' directory contains test input files for COPY FROM. A few of them were created by hand; the rest were generated by honggfuzz. Run 'gencopyfuzz corpus' to generate another set of inputs.
The existing corpus was generated with UTF-8 as the client and server encoding. To test other encodings and encoding conversions, you may want to edit the dictionary in gencopyfuzz.c, and also run honggfuzz yourself with different settings.
By default, copytester sends the input file to the server one byte at a time. That's highly inefficient, but useful for finding bugs in the server's handling of look-ahead and buffer boundaries. You can adjust the RAW_BUF_SIZE constant if you don't want that.
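To illustrate the idea of sending input in small pieces, here is a minimal sketch of a chunked send loop. It is not copytester's actual code: send_in_chunks, chunk_sink and count_bytes are hypothetical names, and the chunk_size parameter stands in for the role that the RAW_BUF_SIZE constant plays. In a real client the sink callback could wrap libpq's PQputCopyData().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Callback that delivers one chunk; returns 0 on success, nonzero on error.
 * Hypothetical type: a real client might wrap PQputCopyData() here.
 */
typedef int (*chunk_sink) (const char *data, size_t len, void *ctx);

/*
 * Feed buf to sink in pieces of at most chunk_size bytes.
 * chunk_size = 1 gives the byte-at-a-time behavior described above.
 * Returns the number of chunks delivered, or -1 if the sink reported an error.
 */
long
send_in_chunks(const char *buf, size_t len, size_t chunk_size,
               chunk_sink sink, void *ctx)
{
    long        nchunks = 0;
    size_t      off = 0;

    assert(chunk_size > 0);

    while (off < len)
    {
        size_t      n = len - off;

        if (n > chunk_size)
            n = chunk_size;     /* the last chunk may be shorter */
        if (sink(buf + off, n, ctx) != 0)
            return -1;
        off += n;
        nchunks++;
    }
    return nchunks;
}

/* Example sink that only adds up the bytes it receives (illustration only). */
int
count_bytes(const char *data, size_t len, void *ctx)
{
    (void) data;
    *(size_t *) ctx += len;
    return 0;
}
```

With a chunk size of 1, every byte boundary in the input also becomes a message boundary, which is what exercises the server's look-ahead handling. A larger chunk size trades that coverage for speed.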
Similarly, it can be very useful to reduce the server's input buffer size, by changing the RAW_BUF_SIZE constant in src/include/commands/copyfromparse_internal.h in the PostgreSQL source tree.
Most of the corpus has been generated by fuzzing with UTF-8. But it might still be useful to run it with other encodings. Many cases will fail with invalid encoding errors, but some inputs happen to be valid in other encodings, too.