forked from digital-preservation/droid
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Using sigtool.txt
236 lines (203 loc) · 14.6 KB
/
Using sigtool.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
# sigtool
(Note: this text file is in Markdown format - if you have a markdown reader this can be rendered as a document).
sigtool can:
* convert signatures into signature XML
* convert signatures between binary and container syntax
* convert signature XML files into new signature XML files using the new syntax.
* summarise signature XML files into a tab-delimited signature summary using the new syntax.
* test binary or container signatures to see if they work on files or a folder.
## Usage
To use sigtool:
```
sigtool [Options] {expressions}
```
### Options
The options control how signatures are processed, or print help.
| Short | Long | Description |
|-------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| -h | --help | Prints sigtool help. |
| -d | --droid | Compile signatures for DROID. This can mean longer sequences to search for, which is usually faster. (Default) |
| -p | --pronom | Compile signatures for PRONOM. PRONOM only allows bytes in the main search sequences. |
| -b | --binary | Render expressions as closely as possible to binary signature format. This attempts to make signatures compatible with older versions of DROID. |
| -c | --container | Render expressions using the full container signature syntax. This is more powerful and readable, but is not compatible with versions of DROID that don't support container signatures. (Default) |
| -s | --space | Render spaces between elements for greater readability. Older versions of DROID don't support whitespace. |
| -x | --xml | Output an xml representation of an expression, or a transformed signature file using the new syntax. (Default) |
| -e | --expression | Output a tab delimited format containing PRONOM syntax compiled from another expression or signature file. |
| -f | --file | Specify a signature file to process. The next argument is the filename of the signature file. Both binary and container signatures files can be specified. |
| -a | --anchor | Specify whether an expression is anchored to BOFoffset, EOFoffset or Variable. For example: "--anchor bofoffset" |
| -n | --notabs | Don't output tab delimited metadata along with a compiled expression - just output the result of compiling on its own. |
| -m | --match | Match the file or files against the expressions. The next argument is the file or folder to scan. |
| -i | --internal | Specify an internal file path to run a signature for container signature matching. The next argument is the file path inside the container. |
### Expressions
Expressions are PRONOM syntax regular expressions we want to convert. For example, two expressions are given in the command below:
```
sigtool "01 02 03 (B1 B2 | C1 C2)" "'start:'(22|27)[01:2F]"
```
These aren't required if we're processing a signature file (the -f option).
## Convert signatures into signature file XML
To convert a PRONOM regular expression signature into its signature file XML, just run sigtool with one or more expressions:
```
sigtool "01 02 03 (B1 B2 | C1 C2)" "'start:'(22|27)[01:2F]"
```
This gives a tab delimited output consisting of each expression, a tab, and the signature XML for the expression:
```
01 02 03 (B1 B2 | C1 C2) <ByteSequence Reference="BOFoffset"><SubSequence Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0"><Sequence>010203</Sequence><RightFragment MaxOffset="0" MinOffset="0" Position="1">B1B2</RightFragment><RightFragment MaxOffset="0" MinOffset="0" Position="1">C1C2</RightFragment></SubSequence></ByteSequence>
'start:'(22|27)[01:2F] <ByteSequence Reference="BOFoffset"><SubSequence Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0"><Sequence>'start:'[2227][01:'/']</Sequence></SubSequence></ByteSequence>
```
Note that the signatures are using container syntax (e.g. they use strings and multi-byte sets). If we wanted a more backwards compatible binary signature, we can specify the --binary option:
```
sigtool --binary "01 02 03 (B1 B2 | C1 C2)" "'start:'(22|27)[01:2F]"
```
This gives a more backwards compatible output, where strings are represented as hex byte sequences:
```
01 02 03 (B1 B2 | C1 C2) <ByteSequence Reference="BOFoffset"><SubSequence Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0"><Sequence>010203</Sequence><RightFragment MaxOffset="0" MinOffset="0" Position="1">B1B2</RightFragment><RightFragment MaxOffset="0" MinOffset="0" Position="1">C1C2</RightFragment></SubSequence></ByteSequence>
'start:'(22|27)[01:2F] <ByteSequence Reference="BOFoffset"><SubSequence Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0"><Sequence>73746172743A(22|27)[01:2F]</Sequence></SubSequence></ByteSequence>
```
## Convert between signature syntax
Signatures can use different syntax depending on whether they're a traditional binary signature, or a container signature. You can convert between these formats using the --expression option. For example, running:
```
sigtool --expression --binary "'Microsoft Word'"
```
Gives the tab delimited output of the original expression and its conversion to binary signature syntax:
```
'Microsoft Word' 4D6963726F736F667420576F7264
```
Conversely, if we ran:
```
sigtool --expression --container "65617369657220746F207265616420696E20636F6E7461696E65722073796E746178" --notabs
```
We would get this:
```
'easier to read in container syntax'
```
## Transform a signature file to use PRONOM syntax
You can read an existing binary or container signature file, and output a new signature file that strips out all the old XML objects under the ByteSequence element, and replaces it with a compiled PRONOM syntax expression in the "Sequence" attribute of each ByteSequence.
For example, if you ran:
```
sigtool --file "DROID_SignatureFile_V114.xml"
```
It would write a new XML file to the console. You can pipe this into a new file using your shell. The signature file will be transformed to use the new syntax, e.g.:
```
...
<InternalSignature ID="18" Specificity="Specific">
<ByteSequence Reference="BOFoffset" Sequence="'GIF87a'"/>
<ByteSequence Reference="EOFoffset" Sequence="3B{0-4}"/>
</InternalSignature>
<InternalSignature ID="20" Specificity="Specific">
<ByteSequence Reference="BOFoffset" Sequence="'%PDF-1.4'"/>
<ByteSequence Reference="EOFoffset" Sequence="'%%EOF'{0-1024}"/>
</InternalSignature>
...
```
You can see that it's transformed a binary signature file using container syntax (the default). If we wanted to use the older binary syntax for this transformation, we could write:
```
sigtool --binary --file "DROID_SignatureFile_V114.xml"
```
It would give this output (snippet):
```
...
<InternalSignature ID="18" Specificity="Specific">
<ByteSequence Reference="BOFoffset" Sequence="474946383761"/>
<ByteSequence Reference="EOFoffset" Sequence="3B{0-4}"/>
</InternalSignature>
<InternalSignature ID="20" Specificity="Specific">
<ByteSequence Reference="BOFoffset" Sequence="255044462D312E34"/>
<ByteSequence Reference="EOFoffset" Sequence="2525454F46{0-1024}"/>
</InternalSignature>
...
```
## Summarise all signatures in a syntax file.
You can obtain a summary of all signatures in a signature file into a tab-delimited format, by specifying the --expression option. For example, running:
```
sigtool --expression --file "DROID_SignatureFile_V114.xml"
```
Would give the following output (snippet):
|Version |Sig ID |Reference |Sequence
|-----------|--------|----------|---------------
|91 |18 |EOFoffset |3B{0-4}
|91 |20 |BOFoffset |'%PDF-1.4'
|91 |20 |EOFoffset |'%%EOF'{0-1024}
|91 |21 |BOFoffset |'%PDF-1.6'
|91 |21 |EOFoffset |'%%EOF'{0-1024}
|91 |22 |BOFoffset |'%PDF-1.5'
|91 |22 |EOFoffset |'%%EOF'{0-1024}
|91 |23 |BOFoffset |'%PDF-1.3'
|91 |23 |EOFoffset |'%%EOF'{0-1024}
You can also specify --binary signature syntax or use --container syntax (the default). This also works for container files, where you get a little more metadata:
```
sigtool --expression --file "container-signature-20230822.xml"
```
Gives the following output (snippet):
| Description | Sig ID | Container File | Internal Sig ID | Reference | Sequence
|------------------------------------|--------|-----------------------|-----------------|-------------|----------------------------------------------------------------------------------------------------------
|Microsoft Word 6.0/95 OLE2 | 1000 | CompObj | 306 | BOFoffset | {40-1024}10000000'Word.Document.'\['6':'7']00
|Microsoft Word 97 OLE2 | 1020 | CompObj | 300 | BOFoffset | {40-1024}10000000'Word.Document.8'00
|Microsoft Word OOXML | 1030 | \[Content_Types].xml | 302 | BOFoffset | {0-32768}'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"'
|Microsoft Word Macro enabled OOXML | 1060 | \[Content_Types].xml | 1060 | Variable | {0-4096}'ContentType="application/vnd.ms-word.document.macroEnabled.main+xml"'
|Microsoft Word OOXML Template | 1070 | \[Content_Types].xml | 1070 | Variable | {0-4096}'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"'
|Microsoft Word 97 OLE2 Template | 1100 | CompObj | 1100 | BOFoffset | {40-1024}10000000'Word.Document.8'00
|Microsoft Word 97 OLE2 Template | 1100 | WordDocument | 1100 | BOFoffset | {10}\[&01]
|Microsoft Excel OOXML | 2030 | \[Content_Types].xml | 317 | BOFoffset | {0-40000}'ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"'
## Test binary signatures
You can use sigtool to try out a signature on a file or a folder directly, using the --match option.
For example, if you write:
```
sigtool --match "/home/user/Documents/somefile.xyz" "'SOMEFILEHEADER'"
```
You will get a tab delimited output like the following:
| Expressions: | 'SOMEFILEHEADER'
|------------------------------------|------------------
| File | Hits
| /home/user/Documents/somefile.xyz | 0
In this example we didn't get a match. Note that, by default, sigtool assumes that signatures are anchored to the beginning of the file. So this expression would only match if it is literally the first bytes in the file.
If you want to search the entire file for the sequence, you have to specify a "variable" anchor, as follows:
```
sigtool --anchor Variable --match "/home/user/Documents/somefile.xyz" "'SOMEFILEHEADER'"
```
which gives:
|Expressions: | 'SOMEFILEHEADER' |
|------------------------------------|------------------
|File | Hits
|/home/user/Documents/somefile.xyz | 1
If you want to scan more than one file, you can specify a folder instead of a file.
sigtool will then scan all the immediate child files of that folder, although it
won't process sub-folders currently. For example:
```
sigtool --match "/home/user/Documents/" "'SOMEFILEHEADER'"
```
which might give:
|Expressions: | 'SOMEFILEHEADER'
|------------------------------------|------------------
|File | Hits
|/home/user/Documents/somefile.xyz | 1
|/home/user/Documents/another.txt | 0
|/home/user/Documents/more.doc | 0
|/home/user/Documents/example.png | 1
Finally, if you want to test more than one expression at a time against a file or folder, you can just add more expressions as arguments. For example:
```
sigtool --match "/home/user/Documents/" "'SOMEFILEHEADER'" "'Another thing'" "01 02 03 (04|05|06) 'complex'"
```
would add all the different expressions as columns against each file.
|Expressions: | 'SOMEFILEHEADER' | 'Another thing' | 01 02 03 \(04\|05\|06) 'complex'
|------------------------------------|------------------|-----------------|-------------------------------
|File | Hits | Hits | Hits
|/home/user/Documents/somefile.xyz | 1 | 0 | 0
|/home/user/Documents/another.txt | 0 | 1 | 0
|/home/user/Documents/more.doc | 0 | 0 | 1
|/home/user/Documents/example.png | 1 | 0 | 0
## Test container signatures
You can also test container signatures directly from sigTool. It is almost the same as matching binary signatures, except:
* You can only match one signature at a time (so you can't get multi-column output as above).
* You must specify the path to the internal file to which the binary signature will be applied.
For example:
```
sigTool --match "/home/user/Documents/" --internal "META-INF/manifest.xml" "{0-1024}'manifest:media-type=\"application/vnd.oasis.opendocument.text'"
```
Could give an output like this:
|Expressions: | {0-1024}'manifest:media-type=\"application/vnd.oasis.opendocument.text'
|------------------------------------|------------------
|File | Hits
|/home/user/Documents/somefile.zip | 0
|/home/user/Documents/another.odt | 1
|/home/user/Documents/more.doc | 0
|/home/user/Documents/example.png | 0