Signature File Format
Binwalk's signature file format is based on the libmagic file format and is mostly compatible with signatures created for the UNIX file utility. This makes creating, customizing and sharing signatures very easy.
To understand the basic format of a signature, let's create a new signature for a fictitious firmware header. The header structure is:
```c
#include <stdint.h>

struct header
{
    char magic[4];          // Magic bytes are: 'SIG0'
    char description[12];
    int32_t header_size;
    int32_t image_size;
    int32_t creation_date;
};
```
The resulting magic signature for this header format looks like:
```
0 string SIG0 SIG0 firmware header,
>4 string x description: "%s",
>16 lelong x header size: %d,
>20 lelong x size: %d,
>24 ledate x date: %s
```
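To make the line-by-line behaviour concrete, here is a minimal C sketch of what the SIG0 signature above does when it is applied to a buffer. The helpers `read_lelong` and `describe_sig0` are hypothetical and are not part of binwalk. The sketch makes two simplifications: it caps the description at the 12 bytes given in the struct, and it omits the date line for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a 4-byte little-endian value ("lelong"). */
static int32_t read_lelong(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v; /* two's-complement wrap on common compilers */
}

/* Hypothetical helper mirroring the SIG0 signature lines.
 * Returns 1 and writes a description if buf holds a SIG0 header, else 0. */
static int describe_sig0(const unsigned char *buf, size_t len,
                         char *out, size_t outlen)
{
    char desc[13];

    /* Line 1: "0 string SIG0" - the magic bytes must match at offset 0. */
    if (len < 28 || memcmp(buf, "SIG0", 4) != 0)
        return 0;

    /* Line 2: ">4 string x" - wildcard, always printed; %s stops at a NUL. */
    memcpy(desc, buf + 4, 12);
    desc[12] = '\0';

    /* Lines 3-4: ">16 lelong x" and ">20 lelong x" (date line omitted). */
    snprintf(out, outlen,
             "SIG0 firmware header, description: \"%s\", header size: %d, size: %d",
             desc, (int)read_lelong(buf + 16), (int)read_lelong(buf + 20));
    return 1;
}
```

Only the first line can reject the data. Every `x` line is a wildcard, so it always contributes its text to the result.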
Most practical signatures are not much more complex than this.
There are four columns for each line:
- The first column is the data offset.
- The second column is the data type.
- The third column is the condition field (`x` is a wildcard matching anything).
- The fourth column is the optional text and data formatting to display.
The first line of any signature contains the actual "magic bytes" which uniquely identify that signature (the string `SIG0` in the above example).
Note the use of the indent level character (`>`) on all except the first line.
All comments begin with the pound sign `#`.
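The column layout can be illustrated with a small C parser. This is a sketch under simplifying assumptions and is not binwalk's own parser:
- It assumes the offset, type and condition columns contain no whitespace.
- It treats only lines whose first non-blank character is `#` as comments.
- The names `struct sig_line` and `parse_sig_line` are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one signature line. */
struct sig_line {
    int level;            /* number of leading '>' characters */
    char offset[32];      /* column 1 */
    char type[32];        /* column 2 */
    char condition[64];   /* column 3 */
    const char *format;   /* column 4: rest of the line (may be empty) */
};

/* Returns 1 on success, 0 for blank, comment or malformed lines.
 * out->format points into `line`, so `line` must outlive `out`. */
static int parse_sig_line(const char *line, struct sig_line *out)
{
    const char *p = line;
    int consumed = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0' || *p == '#')
        return 0;                          /* blank line or comment */

    out->level = 0;
    while (*p == '>') {                    /* count the indent level */
        out->level++;
        p++;
    }

    if (sscanf(p, "%31s %31s %63s%n", out->offset, out->type,
               out->condition, &consumed) != 3)
        return 0;

    p += consumed;
    while (isspace((unsigned char)*p))
        p++;
    out->format = p;
    return 1;
}
```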
Binwalk signature files support the following data types:
Data Type | Description |
---|---|
byte | 1-byte (8-bit) integer |
short | 2-byte integer |
long | 4-byte integer |
quad | 8-byte integer |
date | 4-byte UNIX date field |
sdate | 14-byte string date field |
string | Arbitrary sequence of bytes |
regex | A regular expression to be matched |
All integer data types (`byte`, `short`, `long`, `quad`, `date`) support the following endianness prefixes:
Prefix | Example | Description |
---|---|---|
be | belong | Big endian |
le | lelong | Little endian |
If no endianness prefix is provided, big endian is assumed. Best practice dictates that the endianness should be explicitly specified.
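The effect of the `be`/`le` prefixes is easiest to see by decoding the same four bytes both ways. `read_long` is a hypothetical name, and the function is a sketch of the byte-order rule rather than binwalk code.

```c
#include <stdint.h>

/* Hypothetical helper: read a 4-byte "long" in the requested byte order.
 * big_endian != 0 corresponds to the "be" prefix (and to no prefix). */
static uint32_t read_long(const unsigned char *buf, int big_endian)
{
    if (big_endian)
        return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
               ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}
```

For the bytes `12 34 56 78`, `belong` yields `0x12345678` while `lelong` yields `0x78563412`. This is why an explicit prefix avoids surprises.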
All integer data types (`byte`, `short`, `long`, `quad`, `date`) also support the following signedness prefixes:
Prefix | Example | Description |
---|---|---|
u | ubelong | Unsigned |
If no signedness prefix is specified, the value is assumed to be signed.
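As a sketch of why the `u` prefix matters, the same four bytes can be read as `belong` and as `ubelong`. The helper names `read_belong` and `read_ubelong` are hypothetical. The signed result is returned as `int64_t` only to avoid an implementation-defined narrowing conversion.

```c
#include <stdint.h>

/* Hypothetical helper: "ubelong", an unsigned big-endian 4-byte value. */
static uint32_t read_ubelong(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Hypothetical helper: "belong", the same bytes interpreted as a signed
 * (two's complement) 32-bit value, widened to int64_t. */
static int64_t read_belong(const unsigned char *b)
{
    uint32_t u = read_ubelong(b);
    return (u & 0x80000000u) ? (int64_t)u - INT64_C(0x100000000) : (int64_t)u;
}
```

For the bytes `FF FF FF FE`, `ubelong` gives 4294967294 but `belong` gives -2. A condition such as `<1` therefore behaves differently depending on the prefix.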
The following comparison operators are supported when evaluating the condition field:
Condition | Example | Description |
---|---|---|
= | =0x1234 | True if the value from the file equals the specified value |
! | !0x1234 | True if the value from the file does not equal the specified value |
< | <0x1234 | True if the value from the file is less than the specified value |
> | >0x1234 | True if the value from the file is greater than the specified value |
& | &0x1234 | True if the value from the file is not zero when ANDed with the specified value |
^ | ^0x1234 | True if the value from the file is not zero when XORed with the specified value |
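If no condition is specified, `=` is assumed. In the following example, the `>` operator is used so that the size is only displayed when it is greater than 0: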
```
# SIG0 firmware signature
0 string SIG0 SIG0 firmware header,
>4 string x description: "%s",
>16 lelong x header size: %d,
>20 lelong >0 size: %d, # This line is only processed if the value at offset 20 is greater than 0
>24 ledate x date: %s
```
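A minimal sketch of how the comparison operators in the table above might be evaluated for an integer value follows. `eval_condition` is a hypothetical name. The sketch assumes numbers are written in C syntax (decimal, `0x` hex, or leading-`0` octal, as accepted by `strtoll` with base 0) and treats a malformed number as a non-match. Both of those choices are assumptions rather than documented binwalk behaviour.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical evaluator for a numeric condition such as "=0x1234",
 * "!0", ">0" or the wildcard "x". With no operator, "=" is assumed. */
static int eval_condition(int64_t value, const char *cond)
{
    char op = '=';
    char *end;
    long long operand;

    if (strcmp(cond, "x") == 0)
        return 1;                          /* wildcard matches anything */
    if (cond[0] != '\0' && strchr("=!<>&^", cond[0]) != NULL)
        op = *cond++;                      /* explicit operator */
    operand = strtoll(cond, &end, 0);      /* base 0: decimal, 0x hex, 0 octal */
    if (end == cond || *end != '\0')
        return 0;                          /* malformed number: no match */

    switch (op) {
    case '=': return value == operand;
    case '!': return value != operand;
    case '<': return value < operand;
    case '>': return value > operand;
    case '&': return (value & operand) != 0;
    case '^': return (value ^ operand) != 0;
    default:  return 0;
    }
}
```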
The greater than sign `>` at the beginning of a signature line indicates that line's indentation level. Lines with higher indentation levels (more `>` characters) are only processed if the comparison on the preceding, less-indented line evaluated to True. This allows the creation of basic conditional `if` statements inside the signature:
```
# SIG0 firmware signature
0 string SIG0 SIG0 firmware header,
>4 byte !0
>>4 string x description: "%s", # This line is only processed if the byte at offset 4 is not 0
>16 lelong x header size: %d,
>20 lelong x size: %d,
>24 ledate x date: %s
```
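The gating rule can be sketched in C. The model assumes that each line's parent is the closest preceding line with one fewer `>`, and that a line is evaluated only if that parent was evaluated and matched. `gate_lines` and `MAX_LEVEL` are hypothetical names used for illustration.

```c
#include <stddef.h>

#define MAX_LEVEL 16

/* Hypothetical sketch of indentation gating.
 * levels[i]  : number of '>' characters on line i
 * matches[i] : whether line i's condition is true for the data
 * processed[i] is set to 1 if line i would be evaluated at all. */
static void gate_lines(const int *levels, const int *matches,
                       int *processed, size_t n)
{
    /* ok[k]: the latest line at level k was processed and matched */
    int ok[MAX_LEVEL + 1] = {0};

    for (size_t i = 0; i < n; i++) {
        int lvl = levels[i];

        /* Level 0 is always evaluated; deeper lines need a matching parent. */
        processed[i] = (lvl == 0) || (lvl <= MAX_LEVEL && ok[lvl - 1]);

        if (lvl <= MAX_LEVEL) {
            ok[lvl] = processed[i] && matches[i];
            /* A new line at this level starts a new branch: forget deeper results. */
            for (int j = lvl + 1; j <= MAX_LEVEL; j++)
                ok[j] = 0;
        }
    }
}
```

For the signature above, suppose the byte at offset 4 is 0. Then the `>4 byte !0` line fails, the `>>4 string` line is skipped, and the following `>` lines are still evaluated because their parent is the first line.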
Various arithmetic expressions can be applied to either the offset or the data type field:
Expression | Example | Description |
---|---|---|
& | belong&0xFF | Bitwise AND |
\| | belong\|0xFF | Bitwise OR |
^ | belong^0xFF | Bitwise XOR |
<< | belong<<4 | Logical left shift |
>> | belong>>4 | Logical right shift |
** | belong**4 | Exponent |
+ | belong+4 | Addition |
- | belong-4 | Subtraction |
* | belong*4 | Multiplication |
/ | belong/4 | Division |
A simple practical example of this is the BSD 2.x filesystem, which specifies its size in kilobytes; thus, to display the size in bytes, the size field must be multiplied by `1024` before being displayed:
```
# BSD 2.x file system image; used in RetroBSD for PIC32
0 string FS\x3C\x3C BSD 2.x filesystem,
>8 lelong x size: %d kilobytes,
>8 lelong*1024 x size: %d bytes,
```
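The arithmetic modifiers can be sketched as a single C function. The BSD example above corresponds to `apply_op(size_kb, "*", 1024)`. `apply_op` is a hypothetical name, and the sketch assumes values are unsigned 64-bit so that `>>` is a logical shift. Returning 0 for division by zero or for shifts of 64 or more bits is a further assumption, not documented binwalk behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper applying one arithmetic modifier, e.g. the "*1024"
 * in "lelong*1024". Unsigned arithmetic wraps on overflow. */
static uint64_t apply_op(uint64_t value, const char *op, uint64_t operand)
{
    if (strcmp(op, "&") == 0)  return value & operand;
    if (strcmp(op, "|") == 0)  return value | operand;
    if (strcmp(op, "^") == 0)  return value ^ operand;
    if (strcmp(op, "<<") == 0) return operand < 64 ? value << operand : 0;
    if (strcmp(op, ">>") == 0) return operand < 64 ? value >> operand : 0;
    if (strcmp(op, "+") == 0)  return value + operand;
    if (strcmp(op, "-") == 0)  return value - operand;
    if (strcmp(op, "*") == 0)  return value * operand;
    if (strcmp(op, "/") == 0)  return operand != 0 ? value / operand : 0;
    if (strcmp(op, "**") == 0) {
        uint64_t result = 1;
        while (operand-- > 0)
            result *= value;               /* repeated multiplication */
        return result;
    }
    return value;                          /* unknown operator: unchanged */
}
```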
For file types with variable size fields, values inside the file itself may be used to specify the offset:
Syntax | Description |
---|---|
(4.l) | The offset is a little-endian long value, located 4 bytes into the file |
(4.L) | The offset is a big-endian long value, located 4 bytes into the file |
(4.s) | The offset is a little-endian short value, located 4 bytes into the file |
(4.S) | The offset is a big-endian short value, located 4 bytes into the file |
(4.b) | The offset is a single byte value, located 4 bytes into the file |
(4.B) | The offset is a single byte value, located 4 bytes into the file |
A simple practical example is the Microsoft PE file format, which contains a 4-byte little-endian pointer to the PE header at offset `60`:
```
0 string MZ Microsoft
>(60.l) string PE portable executable # The PE header starts with "PE"
>(60.l) string !PE MS-DOS executable # If no PE header, it must be MS-DOS
```
Arithmetic operators can also be applied to indirect offsets:
```
0 string MZ Microsoft
>(60.l) string PE portable executable
>(60.l) string !PE MS-DOS executable
>(60.l+4) lelong x 0x%X # Print out the four byte value at PE header + 4
```
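Indirect offsets are a two-step lookup: first read the pointer, then apply any arithmetic to it. The following C sketch mirrors the two MZ signatures above. `resolve_lelong_indirect` and `classify_mz` are hypothetical names, and the bounds checks are the sketch's own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for "(ptr_off.l+adjust)": read a little-endian
 * 4-byte pointer at ptr_off and add adjust. Returns -1 if the pointer
 * itself lies outside the buffer. */
static int64_t resolve_lelong_indirect(const unsigned char *buf, size_t len,
                                       size_t ptr_off, int64_t adjust)
{
    uint32_t ptr;

    if (len < 4 || ptr_off > len - 4)
        return -1;
    ptr = (uint32_t)buf[ptr_off] | ((uint32_t)buf[ptr_off + 1] << 8) |
          ((uint32_t)buf[ptr_off + 2] << 16) | ((uint32_t)buf[ptr_off + 3] << 24);
    return (int64_t)ptr + adjust;
}

/* Mirrors the MZ signature: 1 = PE executable, 0 = MS-DOS executable,
 * -1 = no "MZ" magic at offset 0. */
static int classify_mz(const unsigned char *buf, size_t len)
{
    int64_t pe;

    if (len < 2 || memcmp(buf, "MZ", 2) != 0)
        return -1;
    pe = resolve_lelong_indirect(buf, len, 60, 0);       /* (60.l) */
    if (pe >= 0 && (uint64_t)pe + 2 <= (uint64_t)len &&
        memcmp(buf + pe, "PE", 2) == 0)
        return 1;                                        /* string PE */
    return 0;                                            /* string !PE */
}
```

`(60.l+4)` corresponds to `resolve_lelong_indirect(buf, len, 60, 4)`. The four bytes at that resolved offset would then be decoded as an ordinary `lelong`.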
Binwalk supports the use of special "tags" which give signatures additional control over the scan process.
All tags are enclosed in braces `{}`. Tag keywords which require arguments should be followed by a colon `:` and the required argument. Tag arguments can be hardcoded values (e.g., `{size:14}`) or format strings (e.g., `{size:%d}`).
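The tag syntax itself is simple enough to split apart with a few string calls. This C sketch assumes any format strings have already been expanded, so `{size:%d}` has become something like `{size:14}`. `next_tag` is a hypothetical name and does not reflect how binwalk actually tokenizes descriptions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the first "{keyword}" or "{keyword:argument}"
 * tag in s. Copies the keyword and argument (empty when absent) and returns
 * a pointer just past the closing brace, or NULL if no complete tag remains. */
static const char *next_tag(const char *s, char *kw, size_t kwlen,
                            char *arg, size_t arglen)
{
    const char *open = strchr(s, '{');
    const char *close, *colon, *kw_end;

    if (open == NULL || (close = strchr(open, '}')) == NULL)
        return NULL;
    colon = memchr(open, ':', (size_t)(close - open));
    kw_end = (colon != NULL) ? colon : close;

    snprintf(kw, kwlen, "%.*s", (int)(kw_end - open - 1), open + 1);
    if (colon != NULL)
        snprintf(arg, arglen, "%.*s", (int)(close - colon - 1), colon + 1);
    else if (arglen > 0)
        arg[0] = '\0';
    return close + 1;
}
```

Calling it repeatedly on the returned pointer walks every tag in a description, such as `{size:14}` followed by `{invalid}`.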
Currently supported tags are:
Keyword | Argument Type | Description |
---|---|---|
adjust | int | Adjusts the reported signature offset by n bytes |
invalid | N/A | Marks a signature as invalid |
jump | int | Tells binwalk to jump to the specified offset and resume scanning |
many | N/A | Tells binwalk to only display the first hit to a signature, even if the hits do not directly follow each other |
name | str | Specifies the name of the file (used during extraction) |
location | int | Specifies an expected offset where the signature should be found in a file |
size | int | Specifies the size of the file (used during extraction) |
string | N/A | Truncates any strings on the current line to `strlen` bytes |
strlen | int | Specifies the size of a string (used with `string`) |
The most common tag is `invalid`, which can be used to build false-positive detection directly into any magic signature:
```
0 string SIG0 SIG0 firmware header,
>4 string x description: "%s",
>16 lelong x header size: %d,
>20 lelong <1 {invalid} # Firmware size shouldn't be 0 bytes or less
>20 lelong x size: %d,
>24 ledate x date: %s
```
All other tags are ignored if an invalid tag is encountered while applying a signature to a block of data.
Tags that take no arguments, such as `invalid`, are fairly straightforward to use, and tags that accept arguments are nearly as easy. The most commonly used tags that require arguments are `size` and `jump`. They respectively specify the size of the data to extract from a file and a relative offset from which binwalk should continue scanning.
For example, let's say we have a file system named `FooBar`, and the `FooBar` file system header contains a size field that says how big the file system is:
```
0 string FooBar\x00\x00 FooBar filesystem,
>8 lelong x size: %d
```
During extraction, we would only want to extract `size` bytes of data from the input file. We can tell binwalk about the size field using the `size` tag:
```
0 string FooBar\x00\x00 FooBar filesystem,
>8 lelong x size: %d
>8 lelong x {size:%d}
```
Likewise, we probably don't want binwalk to waste its time scanning all the data inside the `FooBar` file system, so we can tell binwalk to jump ahead by `size` bytes:
```
0 string FooBar\x00\x00 FooBar filesystem,
>8 lelong x size: %d
>8 lelong x {size:%d}
>8 lelong x {jump:%d}
```
The `size` and `jump` tags are not displayed to the end user; they are only used internally by binwalk.
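The benefit of `{jump}` can be sketched as a scan loop that resumes after the jump distance instead of at the next byte. The sketch assumes the jump is measured from the start of the hit. `match_foobar` and `scan_with_jump` are hypothetical names, and `match_foobar` hard-codes the FooBar signature above.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical matcher for the FooBar example: returns the {jump} value
 * (the lelong size at offset 8) for a hit at off, or -1 for no match.
 * Requires off < len. */
static long match_foobar(const unsigned char *buf, size_t len, size_t off)
{
    uint32_t size;

    if (len - off < 12 || memcmp(buf + off, "FooBar\0\0", 8) != 0)
        return -1;
    size = (uint32_t)buf[off + 8] | ((uint32_t)buf[off + 9] << 8) |
           ((uint32_t)buf[off + 10] << 16) | ((uint32_t)buf[off + 11] << 24);
    return size > LONG_MAX ? LONG_MAX : (long)size;
}

/* Count hits, resuming after {jump} bytes instead of at the next byte. */
static size_t scan_with_jump(const unsigned char *buf, size_t len,
                             long (*match)(const unsigned char *, size_t, size_t))
{
    size_t hits = 0, off = 0;

    while (off < len) {
        long jump = match(buf, len, off);
        if (jump < 0) {                 /* no signature here: try the next byte */
            off++;
            continue;
        }
        hits++;
        if (jump > 0 && (size_t)jump >= len - off)
            break;                      /* jump lands past the end of the data */
        off += (jump > 0) ? (size_t)jump : 1;
    }
    return hits;
}
```

A `FooBar` header nested inside another file system's `size` bytes is never examined. This is the wasted work the `jump` tag avoids.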
The `strlen` and `string` tags are used when the length of a non-NULL-terminated string is stored separately from the string itself.
A simple practical example is ZIP archive headers, which include the names of the archived files:
```
0 string PK ZIP archive,
>26 leshort x length of file name: %d bytes,
>26 leshort x {strlen:%d} # The strlen tag must come before the string tag
>30 string x file name: {string}%s # Only strlen bytes of this string will be printed
```
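In C, the `{strlen}`/`{string}` pair corresponds closely to the `%.*s` conversion, which prints at most a given number of bytes. The sketch below reads the name length from offset 26 and the name from offset 30, matching the ZIP signature above. `zip_file_name` is a hypothetical helper, not part of binwalk.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the file name from a ZIP local file header.
 * The length is a little-endian short at offset 26; the name (not
 * NULL-terminated) starts at offset 30. Returns snprintf's result, or -1. */
static int zip_file_name(const unsigned char *hdr, size_t len,
                         char *out, size_t outlen)
{
    unsigned name_len;

    if (len < 30 || memcmp(hdr, "PK", 2) != 0)
        return -1;
    name_len = (unsigned)hdr[26] | ((unsigned)hdr[27] << 8);   /* {strlen:%d} */
    if (name_len > len - 30)
        return -1;                         /* name runs past the buffer */
    /* "%.*s" prints at most name_len bytes, like {string}%s. */
    return snprintf(out, outlen, "file name: %.*s", (int)name_len,
                    (const char *)hdr + 30);
}
```

Without the length limit, `%s` would keep reading past the name into whatever bytes follow it in the archive.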