Generate initial seed files for fuzzing using Large Language Models (LLMs). The generated seeds are compatible with many state-of-the-art fuzzers, such as AFL++ and libFuzzer. The core functionality is processing the source code of a fuzz test, including the underlying functions it calls, to generate initial seed files for it.
Supported models:
- OpenAI models (GPT-3.5 Turbo, GPT-4, etc.)
- CodeGen models
- StarCoder (Base/Plus)
- CodeT5+ models
- CodeGen2.5 models
- Other causal/seq2seq models supported by the Transformers library might work
Below are example commands for generating initial seed files for fuzzing Go code. For a full list of options, use the `--help` flag.
```shell
make goparser
cd <source_folder>
../seedai.py -p ../bin/goparser -c ../configs/temp_0.6.json -pt ../pt_configs/go/code_only/code.json -m Salesforce/codegen-16B-multi
```
Add `--device-map=cpu` to run on CPU.
Note that the number of simultaneous model executions is equal to `max(n, num_beams)`.
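A minimal sketch of the relationship above. The names `n` (number of requested completions) and `num_beams` (beam-search width) follow the note and the sample config; the helper function itself is hypothetical and only illustrates the arithmetic:

```python
# Hypothetical illustration: the model decodes max(n, num_beams) sequences
# in parallel, so a large beam width raises memory use even for a single
# requested completion.
def parallel_executions(n: int, num_beams: int) -> int:
    """Return how many sequences are decoded simultaneously."""
    return max(n, num_beams)

# With num_beams = 10 (as in the sample config below) and n = 3 requested
# seeds, 10 sequences are decoded at once.
print(parallel_executions(3, 10))  # -> 10
```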
```shell
OPENAI_API_KEY=<key> ../seedai.py -p ../bin/goparser -c ../configs/top_p_0.75.json -pt ../pt_configs/go/code_multi.json -m gpt-4 -l 8192
```
An example generation configuration (passed via `-c`):

```
{
    "do_sample": false,         # Ignored by OpenAI
    "temperature": 1.0,         # Default = 1.0, ignored if do_sample is false
    "top_p": 1.0,               # Default = 1.0, ignored if do_sample is false
    "diversity_penalty": 2.0,   # Ignored by OpenAI, requires group beam search
    "repetition_penalty": 2.0,  # frequency_penalty for OpenAI
    "presence_penalty": 2.0,    # Ignored by HuggingFace
    "num_beams": 10,            # Ignored by OpenAI, default = 1 (no beam search)
    "num_beam_groups": 10       # Ignored by OpenAI, default = 1 (no group beam search)
}
```
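The parameter names above correspond closely to Hugging Face `generate()` keyword arguments and OpenAI API parameters. A hedged sketch of how such an annotated file could be read and split between backends — the comment-stripping step and the exact key split are assumptions for illustration, not the loading logic of seedai.py:

```python
import json
import re

# Sample config with trailing '#' annotations, as shown in the README.
CONFIG = """
{
    "do_sample": false,
    "temperature": 1.0,  # ignored when do_sample is false
    "top_p": 1.0,
    "repetition_penalty": 2.0,
    "num_beams": 10
}
"""

# Assumed subset of keys that the OpenAI API accepts directly.
OPENAI_KEYS = {"temperature", "top_p", "presence_penalty", "frequency_penalty"}

def parse_config(text: str) -> dict:
    # Strip '#' comments so the annotated example parses as plain JSON.
    # (Naive: would also strip a '#' inside a string value.)
    cleaned = re.sub(r"#[^\n]*", "", text)
    return json.loads(cleaned)

cfg = parse_config(CONFIG)
hf_kwargs = cfg  # could be passed as model.generate(**hf_kwargs)
openai_kwargs = {k: v for k, v in cfg.items() if k in OPENAI_KEYS}
print(sorted(openai_kwargs))  # -> ['temperature', 'top_p']
```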
An example prompt template (passed via `-pt`):

```
{
    "prefix": "You are a code completer.\n",
    "suffix": "\n```\nfunc Test<count>Bugs() {\n\tinputs := []string{",
    "stop": "}",          # Optional stop token (in addition to the EOS token)
    "multi_vals": true,   # Extract multiple values per line
    "code_only": false    # Set to true for code-only models
}
```
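A hedged illustration of how a template like this might wrap a fuzz test's source into a single prompt. Only the field names and values come from the sample above; the `<count>` substitution and the concatenation order are assumptions for illustration:

```python
# Hypothetical prompt assembly: prefix + fuzz-test source + suffix, with
# the <count> placeholder replaced by a number. The real assembly logic in
# seedai.py may differ.
template = {
    "prefix": "You are a code completer.\n",
    "suffix": "\nfunc Test<count>Bugs() {\n\tinputs := []string{",
}

def build_prompt(template: dict, source: str, count: int) -> str:
    suffix = template["suffix"].replace("<count>", str(count))
    return template["prefix"] + source + suffix

prompt = build_prompt(template, "func Parse(s string) error { /* ... */ }", 3)
print("Test3Bugs" in prompt)  # -> True
```

The suffix deliberately ends mid-expression (`inputs := []string{`), so the model's completion is a list of candidate seed values that the tool can then extract.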