- Adds the `ripemd160` and `ripemd128` units.
- Adds the `xtw` unit for extracting cryptocurrency wallet addresses.
- Adds the `iemap` unit to display a colored entropy heatmap.
- Introduces new syntax to the `struct` unit for handling byte alignment.
- The `rsakey` unit supports a new option to output the public key portion of a private key.
- The `pemeta` unit now computes the size of the PE file based on header information.
- Several switches for comparison operators were added to the `iff` unit.

- Thanks to @baderj, the unit `xlmdeobf` was added, which wraps the extremely useful XLMMacroDeobfuscator tool for extracting and deobfuscating Excel V4 macros.
- Adds the `carve-7z` unit for carving 7zip archives from blobs.

- Renames the `blockop` unit to `alu`.
- Removes the shortcut unit `carveb64z`.
- Renames a number of command-line switches for `carve`, `xtp`, and other pattern extraction units.
- Adds a default argument to `resub` that makes it strip whitespace from the input by default.

Improves performance by replacing an import of `pkg_resources` with equivalent functionality from `importlib`. On a test machine, this removes between 250 and 500 milliseconds from the execution time of any single unit.

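As a rough illustration of this kind of replacement, the sketch below looks up package metadata through `importlib.metadata` without importing `pkg_resources`. Which `pkg_resources` feature the refinery actually replaced is not stated here, so the version lookup and the helper name `installed_version` are assumptions made for the example.

```python
from importlib import metadata


def installed_version(distribution: str) -> str:
    """Return the installed version of a distribution, or 'unknown' if it is missing."""
    try:
        # importlib.metadata reads the installed metadata directly and avoids
        # the comparatively expensive import of pkg_resources.
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return 'unknown'


print(installed_version('binary-refinery'))
```
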
Changes the format for the binary formatter used in `struct`, `rex`, `resub`, and `cfmt`. It now uses a reverse multibin handler instead of parsing the modifier like a command-line pipeline.

- Adds the `lzo` unit.

- The `winreg` unit is now able to extract data from Windows registry editor exports (i.e. `.reg` files).
- The key derivation units `pbkdf2` and `pbkdf1` use a more forgiving decoder to better cover the `Rfc2898DeriveBytes` class, which offers a call signature that receives an arbitrary byte string as password.
- The `string` regular expression pattern now excludes literal line breaks within the string.

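The point about arbitrary byte strings as passwords can be illustrated with the standard library alone. This is a minimal sketch, not the code of the `pbkdf2` unit: it assumes HMAC-SHA1 (the historical default of `Rfc2898DeriveBytes`), and the password, salt, and iteration count are made-up values.

```python
import hashlib

# A password that is not valid UTF-8; a strict text decoder would reject it.
password = bytes.fromhex('ff00fe80')
salt = b'saltsalt'

# PBKDF2 operates on raw bytes, so no text decoding of the password is needed.
key = hashlib.pbkdf2_hmac('sha1', password, salt, 1000, dklen=16)
print(key.hex())
```
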
- Base64 regular expression patterns were improved to account for correct character counts.

- The `dexstr` unit was added.
- The `index` meta variable is now automatically populated within frames.
- The `n40` string decryption unit was added.
- The `xtpyi` unit now extracts Python disassembly when decompilation fails.
- The `lzma` unit now correctly decompresses output produced by PyLZMA.

- The `doctxt` unit was added, courtesy of @baderj.

- Adds the `serpent` unit.

- Adds the `xtpdf` unit for extracting embedded objects from PDF documents.
- The `accu:` handler now supports pre-configured finite state machines for well-known `rand()` implementations.

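To show what such a finite state machine looks like, here is a sketch of the linear congruential generator behind the Microsoft C runtime `rand()`. It is written independently of the `accu:` handler and does not reflect its syntax; the function name `msvc_rand` is chosen for this example only.

```python
from itertools import islice


def msvc_rand(seed: int):
    """Yield the values that the Microsoft C runtime rand() produces after srand(seed)."""
    state = seed & 0xFFFFFFFF
    while True:
        # State update of the generator, reduced to 32 bits.
        state = (state * 214013 + 2531011) & 0xFFFFFFFF
        # Each output is a 15-bit value taken from the upper half of the state.
        yield (state >> 16) & 0x7FFF


print(list(islice(msvc_rand(1), 3)))
```

A keystream derived this way is a common ingredient of simple malware string encryption, which is presumably why pre-configured variants are convenient.
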
- The `officecrypt` unit now supports the Excel default password `VelvetSweatshop`.
- The `ci` property has been removed from the output of `peek --meta`.
- The following units were added: `xj0`, `evtx`.

- The `hexdmp` unit was renamed once more to `hexload`, and its pattern matching was improved.
- The `asm` unit was completely redesigned using an Angr-based fallback to produce better disassembly.
- The `pcap-http` unit now extracts the URL from which the data was downloaded.
- The `rep` unit received some performance improvements.
- The refinery dependencies were cleaned up considerably.

- Blockwise operations no longer require numpy to be reasonably fast; this is achieved by implementing a dynamic inlining step.

- Adds the `cswap` unit.
- The index counter of `blockop` now starts at zero.
- An option was added to the `swap` unit to swap the contents of two meta variables. This can also be used to rename a meta variable.
- An option was added to `xtpyi` to unpack, but not decompile, the contents of a PYZ.
- Adds the `--bare` option to `esc` and uses it in `peek`.
- Adds the `--meta` option to `ef`. The `ef` unit now also descends into dot-directories and lists dot-files.
- The `__init__.pkl` file containing the unit lookup cache was moved into the distribution.

- Adds the `xtvba` unit to extract Office document macros.
- Adds the `pcap` unit to extract TCP streams from packet capture files.
- Adds the `xthtml` unit to extract components of HTML documents.
- The `htm` unit has been renamed to `htmlesc`.
- The default sort order of `sorted` has been changed to descending.
- The `pemeta` and `pkcs7` units now also extract certificate thumbprints.

- Fixes an issue with applying `ppjscript` to obfuscated JavaScript files.
- Adds Murmur Hash units.
- Adds the `xtpyi` unit to extract PyInstaller-packed archives.
- Logging now uses the Python `logging` module.

- Significantly improves unit loading time, which had regressed due to the changes in 0.4.0.

This release removes the `setup-venv` helper scripts and instead uses a slightly less ugly hack to resolve dependencies before running the refinery setup by declaring every dependency a build dependency in `pyproject.toml`. Any kind of installation should work seamlessly through `pip`.

Updates the build system.

- Fixes a critical bug in deployment.

- Adds the `msgpack` unit.
- Adds the `cull` unit and changes the behaviour of conditional units to make filtered chunks invisible instead of removing them. Conditional units have been renamed to `iff`, `iffs`, `iffx`, and `iffp`.

- Adds the `xfcc` unit, which replaces the `intersection` unit.
- The `cm` unit can now be used to remove meta variables.
- JSON dumps no longer use hex encoding for big integers as JSON has no size limit on integer expressions.
- The `struct` unit was significantly redesigned and the `lprefix` unit removed because it can now be trivially implemented with `struct`.
- The `ifexpr` unit has been renamed to `iff` and the `iffp` unit was added.
- The field names in `dnfields` have been altered to more closely resemble file names.
- Adds a list of default passwords to archive units.

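The remark about big integers in JSON dumps can be checked with Python's own `json` module, which writes integers of any size as plain decimal literals. The value below is arbitrary and only serves as an example.

```python
import json

big = 2 ** 128 + 1

# The integer is written as a decimal literal; no hex string is required.
text = json.dumps({'value': big})
print(text)

# Python's parser restores the exact value.
restored = json.loads(text)['value']
print(restored == big)
```

Parsers that map JSON numbers to double precision floats may still lose precision on such values, so this matters mostly when the consumer is also Python.
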
- Renames the `fread` unit to `ef`.
- Metadata / Format string expression parsing is now more flexible.

- Adds the `intersection` unit.

- Adds the `xtjson` and `xtxml` units for extracting data from JSON and XML files.
- Slight redesigns of `lprefix`, `peek`, `xtmail`, and `cfmt`.
- Refinery now has (very weak) support for PowerShell.

- Adds the `--tabular` option to `ppjson` to produce a flattened JSON output.
- Changes to the in-code pipe syntax:
  - `data | unit | unit` is an iterable over output chunks
  - `data | unit | unit | callable` invokes `callable` with a bytearray containing all concatenated chunks
  - connected pipelines (`data | unit | ... | unit`) can be passed to `str` and `bytes`
- Path extraction units (like `fread`, `xtzip`) offer better control over the path variable.
- Variable merging was added to the `pop` unit.
- The `cm` unit only populates `size` and `index` by default, never performing a full scan unless explicitly requested.

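The in-code pipe semantics listed above can be mimicked with a few lines of operator overloading. The following is a toy model and not refinery code: the classes `Unit` and `Pipeline` and the two example units are hypothetical stand-ins that only reproduce the three described behaviors.

```python
class Unit:
    """Hypothetical stand-in for a unit: maps one chunk to a list of output chunks."""

    def __init__(self, func):
        self.func = func

    def process(self, chunk):
        return self.func(chunk)

    def __ror__(self, data):
        # Handles `data | unit`; bytes objects do not implement `|` themselves.
        return Pipeline(data, [self])


class Pipeline:
    def __init__(self, data, units):
        self.data = bytes(data)
        self.units = list(units)

    def __or__(self, other):
        if isinstance(other, Unit):
            # Piping into another unit extends the pipeline.
            return Pipeline(self.data, self.units + [other])
        if callable(other):
            # Piping into a callable passes all concatenated chunks as a bytearray.
            return other(bytearray(b''.join(self)))
        return NotImplemented

    def __iter__(self):
        # Iterating a pipeline yields its output chunks.
        chunks = [self.data]
        for unit in self.units:
            chunks = [out for chunk in chunks for out in unit.process(chunk)]
        return iter(chunks)

    def __bytes__(self):
        return b''.join(self)

    def __str__(self):
        return bytes(self).decode('utf8')


split = Unit(lambda chunk: chunk.split(b','))
upper = Unit(lambda chunk: [chunk.upper()])

print(list(b'a,b' | split | upper))   # iterate over output chunks
print(b'a,b' | split | upper | len)   # call len on the concatenated bytearray
print(bytes(b'a,b' | split | upper))  # convert the pipeline to bytes
```
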
- Meta variables are now allowed in `struct` formats, and `struct` assumes no alignment by default.
- The `pemeta` unit now has support for RICH header data.
- The `rsakey` unit was added.
- The `pop` unit was extended by an option to discard chunks.
- Several new archive extractors are now available: `xt7z`, `xtace`, `xtiso`, and `xtcpio`.
- The `xlxtr` unit was refactored and generates more metadata.
- The `sorted` unit can sort by metadata variables now.
- The `swap` unit can now swap with an empty variable, which will empty the chunk body.

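What "no alignment" means for a structure format can be seen in Python's `struct` module, whose format strings the refinery unit of the same name presumably builds on (an assumption made for this sketch). A byte order prefix such as `<` disables padding, while the native `@` mode may insert it.

```python
import struct

# One byte followed by a 32-bit integer, little-endian, without padding.
packed = struct.pack('<BI', 1, 2)
print(packed.hex(), len(packed))

# Without padding the size is 1 + 4 bytes.
unaligned_size = struct.calcsize('<BI')

# Native mode may pad the integer to its alignment; the result is platform dependent.
native_size = struct.calcsize('@BI')
print(unaligned_size, native_size)
```
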
- The `trivia` unit was renamed to `cm` for "common meta".
- The `pemeta` unit can now display PE header information, .NET header flags, and supports a table view instead of the JSON output.
- Python expressions all across multibin arguments no longer restrict the operators that can be used.
- The domain regular expression was updated with new TLDs and the artificial TLDs `.coin` and `.bazar`.
- The `terminate` unit was added.
- The `struct` unit was added.

- Adds the `ifexpr` and `ifstr` units for filtering framed data.
- The `pemeta` unit now also extracts the `EntryPointToken` field from the .NET header.

- The `hexview` unit was removed; the `hexdmp` unit was created instead. By default, this unit converts hexdumps back to binary; the previous functionality of `hexview` is now available as the reverse operation of `hexdmp`.
- Adds the `dnblob` unit.
- The `drp` unit underwent major refactoring with the goal to improve both speed and quality of results. Two options were added to help control these new settings.

- Adds the `xtrtf` unit to extract embedded objects from RTF documents.
- Adds the `officecrypt` unit to decode password-protected Office documents.
- Improves PKCS7 parsing and fixes some cases where `pemeta` did not display the details of the digital signature.
- Adds brieflz support to the universal `decompress` unit.

- Unification of (nearly) all multibin handlers. Only the `yara:` and `escape:` handlers remain specific to regular expression type arguments.
- Adds the multibin handlers `accu`, `reduce`, `cycle`, and `take`.
- Alters the `le` and `be` handlers to support both conversion from integer to byte string and vice versa.
- Renames the `unpack` handler to `btoi` and adds the `itob` handler which performs the inverse operation.
- Command line switches for the `lprefix` unit changed.
- Adds the global `--lenient` option which is now required to admit partial results as output.

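The two-way conversion described for the `le` and `be` handlers corresponds to the following plain Python logic. This is a sketch of the idea only; the function name `convert`, the minimal-length encoding, and the restriction to non-negative integers are assumptions of this example, not documented handler behavior.

```python
def convert(value, byteorder: str):
    """Convert a non-negative integer to bytes, or bytes to an integer."""
    if isinstance(value, int):
        # Use the smallest number of bytes that can hold the value.
        size = max(1, (value.bit_length() + 7) // 8)
        return value.to_bytes(size, byteorder)
    return int.from_bytes(value, byteorder)


print(convert(0x1234, 'little'))       # integer to little-endian bytes
print(convert(b'\x12\x34', 'big'))     # big-endian bytes to integer
```
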
- Adds the `blz` unit for BriefLZ compression and decompression.

- Adds the `xtdoc` unit which can extract more files from Office documents than `xtzip`.
- Adds the `trivia` unit which can be used to attach certain meta variables. Moving forward, this will be the preferred way to access simple invariants of a binary chunk. For now, it can attach the integer variables `size` and `index`, containing the size of the data in bytes and the chunk index within the current frame, respectively. The `eval:` handler for numeric multibin values no longer accepts the special variable `N` to represent the chunk size, as this functionality can be recovered by preprocessing each chunk with `trivia` and using the variable `size` instead of `N`.
- The `carve-pe` unit is now a path extractor unit (TL;DR: more command line options).

- Changes the interface for the frame squeeze mechanic.

- Adds option to `pefile` to compute carve size based on virtual section sizes & offsets.

- Using hex escape sequences in the replacement string for `resub` now works as expected.
- The `yara:` modifier for regular expression based units now accepts lowercase hex characters.
- The `imphash` unit's performance was improved slightly.
- Additional options for the `pecarve` unit.
- Adds the `ppjscript` unit (wrapper around jsbeautifier).
- The `vsnip` unit can now extract more than one memory region.
- Adds a count restriction to the `resplit` and `resub` units.

- The interface for cipher units has been changed; the encryption mode is no longer a mandatory argument. Better handling of various cipher block chaining modes has been implemented.

- Conservative option added to `peoverlay` and `pestrip`.

- The `salsa` and `chacha` cipher units now have pure Python implementations that allow you to specify the number of rounds. The PyCryptodome interfaces still exist, now as units `salsa20` and `chacha20`.
- The `HMAC` unit was added to support simple HMAC based key derivation.
- The `dump` unit stream mode has been adjusted so that it is possible to write consecutive data to a file inside a nested frame.

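A simple HMAC based key derivation of the kind mentioned above might look as follows in plain Python. How the `HMAC` unit assigns key and message is not stated here, so treating the password as the HMAC key and the salt as the message is an assumption, as are the sample values.

```python
import hashlib
import hmac

password = b'hunter2'   # made-up secret, used as the HMAC key
salt = b'\x00\x01\x02\x03'  # made-up salt, used as the message

# The HMAC digest serves directly as the derived key material.
derived = hmac.new(password, salt, hashlib.sha256).digest()
print(derived.hex())
```
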
- The `cfmt` unit has been reworked to support more common modern Python format string syntax.
- The output of `crc32` and `adler32` checksum hashes has been altered to use the correct byte order.
- The `rabbit` unit was added which implements the RABBIT stream cipher.

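Byte order matters for checksums because `zlib` returns them as integers, and the rendered bytes depend on how the integer is serialized. The sketch below assumes that big-endian, which matches the usual hexadecimal notation of the checksum, is the order meant by "correct"; the changelog itself does not say so.

```python
import zlib

crc = zlib.crc32(b'123456789')
adler = zlib.adler32(b'Wikipedia')

# Big-endian serialization matches the conventional hex notation of the value.
print(crc.to_bytes(4, 'big').hex())    # cbf43926
print(adler.to_bytes(4, 'big').hex())  # 11e60398

# Little-endian serialization yields the same bytes in reverse order.
print(crc.to_bytes(4, 'little').hex())
```
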
- The `mpush`, `mpop`, and `mput` units have been renamed to simply `push`, `pop`, and `put`.
- The `autoxor` unit has been transformed into the `drp` unit; the behavior of `autoxor` can be achieved using `xor drp:copy:all`.
- Data types of .NET fields are better detected by `dnfields` now, but a proper parser for type signatures is still missing.

- The `gz` unit was deprecated because the `zl` unit covers its use case (and does a better job at it).
- The `lprefix` unit for parsing length-prefixed data was added.
- Parsing of managed .NET string resources via the `dnmr` unit was fixed; these would previously be returned unparsed.
- The `binpng` unit has been improved and renamed to `stego`, a more flexible unit to extract data from images.

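Length-prefixed data, as handled by the `lprefix` unit mentioned above, is straightforward to parse by hand. The sketch below assumes a 32-bit little-endian length prefix by default; the function name `read_length_prefixed` and its error handling are choices of this example and not taken from the unit.

```python
import struct


def read_length_prefixed(data: bytes, prefix: str = '<I'):
    """Yield each chunk of a buffer in which every chunk is preceded by its length."""
    size = struct.calcsize(prefix)
    offset = 0
    while offset + size <= len(data):
        # Read the length field, then slice out exactly that many bytes.
        (length,) = struct.unpack_from(prefix, data, offset)
        offset += size
        chunk = data[offset:offset + length]
        if len(chunk) < length:
            raise ValueError('buffer ends before the announced chunk length')
        yield chunk
        offset += length


buffer = b'\x03\x00\x00\x00abc\x02\x00\x00\x00hi'
print(list(read_length_prefixed(buffer)))
```
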
- The `peslice`, `elfslice`, and `pesect` units have been removed.
- In their place, the cross-format units `vsnip` and `vsect` can now be used to extract data from virtual addresses and sections of PE, ELF, and MachO files.

- Adds `md2` and `md4` hashing algorithms.
- The `CryptDeriveKey` unit now also mirrors the API call for SHA2 based hashing algorithms.
- Message type attachments in Outlook email formats are now supported by `xtmail`.

- The interface of the memory slicing units `peslice` and `elfslice` has changed.
- Python expression parser and numeric arguments have been refactored.

- Removes the `--install-option` capability introduced in 0.3.5, see pip/#8748 for more information.
- The `xttar` unit was added.
- The `lzma` unit can now return partial results for buffers with junk bytes at the end.

- The `ifrex` unit was added.
- The `jvstr` unit was added.
- A source distribution manifest was added to fix errors that occurred during source installs.

- Using `pip install --install-option=library binary-refinery` or a `REFINERY_PREFIX` environment variable with value `!` will now install the binary refinery without any command line scripts, only as a library.

- It is now possible to use local refinery units (i.e. a Python script in the current directory which contains a refinery unit that is not abstract) for multibin prefixes and in any other situation where units are dynamically loaded.
- The `pesect` unit was added.
- The `resub` and `resplit` units no longer offer options that have no bearing on their behavior.
- The `lz4` unit was added with a pure Python implementation of LZ4 decompression.
- The `jvdasm` unit for disassembling Java class files was added.

- The `autoxor` unit was added.
- The `cfmt` unit was added.
- The License of Binary Refinery was changed to 3-Clause BSD.

- The `netbios` unit was added.
- The `stretch` unit was added.
- The `hc128` cipher unit was added.
- The unit `dnrc` was split into `dnrc` for extracting .NET resources and `dnmr` for unpacking managed .NET resources.
- Several units that extract items from container formats have received a unified interface. So far, this interface applies to `xtmail`, `xtzip`, `winreg`, `dnfields`, `dnrc`, and `dnmr`.
- When using named match groups for the `rex` unit, these matches are now forwarded as metadata within frames.
- The `xtzip` unit was given an optional archive password parameter.
- The `xtmail` unit can now extract headers in text and JSON format.

- Test coverage was increased.
- The `recode` unit can now autodetect input encoding.
- Several bugfixes were performed on the `vbe` unit.
- More bandaids were added to PowerShell deobfuscation.
- The `pestrip` and `peoverlay` units were added.
- Interface retrofitting was completed.

- Fixes a tiny bug in the PyPI display of the readme file, and completes changelog from previous version.

- The `rsa` unit was improved and can handle the Microsoft blob format now.
- PowerShell deobfuscation was improved, but that doesn't change the fact that this would be much better with a proper parser.

- The `b32` unit for base32 encoding and decoding was added.
- Preliminary support for meta variables has been added with the `mpush`, `mpop`, and `mput` units. This feature is experimental and not well documented yet.
- The `--squeeze` / `-Z` option was added to all units that produce multiple outputs: It disables the default separation of these outputs by line breaks.
- Pattern extraction units such as `rex` will now preserve the order of extracted strings, even when the `--longest` option is used.
- The suggested `PATH` environment variable modification from the Linux installer script was corrected; the previous variant would make the refinery virtual environment take precedence over the global python executables.

- The `dump` unit has been refactored to make it easier to use; formatting of file names is done automatically now unless the flag `-p` or `--plain` is specified to prevent string formatting.
- The `snip` unit can now remove bytes from the input.
- The `dnfields` unit was added.
- The `ppjson` unit can now minify JSON by specifying `0` as the desired indentation width.
- The `dsjava` unit was improved, although it remains a work in progress.
- The `fread` unit received a linewise mode.

- After some incomplete attempts to improve backwards compatibility, the package now simply requires Python 3.7.
- Units can now be written with a Python `__init__` constructor and deduce the command line interface from this constructor. A decorator class was added to help enrich the parameter list of the constructor with information on how to translate these into command line parameters. The goal is to eventually retrofit all units to follow this standard.
- The `pemeta` unit has more features now.
- The `couple` unit was added; it is an adapter to turn any stdin/stdout based command line tool into a refinery unit.
- The `carve-xml` unit was added.
- The `dnstr` unit was added.

- All hashing prefixes for multibin expressions have been implemented as separate units, i.e. `sha256` and `md5` are now units that output the corresponding hash of the input data.
- The `xtmail` unit was added which can extract the body and attachments of email documents, both Outlook and MIME formats.
- The framed format was extended with rudimentary support for metadata in framed chunks. This is currently used by the `xtzip` and `xtmail` units to attach a `name` property to emitted chunks which contains the file name information from the parsed data. The `dump` unit now has a `--meta` option to read this `name` property and use it as the file name for dumping. The `--meta` option defaults to using the SHA256 hash of the data as the file name if no corresponding metadata is present.
- The `pemeta` unit was added.
- The `carve-json` unit was added.
- The `peslice` and `elfslice` units were given a unified interface.
- The `b85` unit for base 85 encoding and decoding was added.

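The file name fallback described for the `--meta` option of `dump` amounts to the following logic. This sketch is written from the description only; the function name `output_name` and the dictionary used for chunk metadata are hypothetical and do not mirror the unit's internals.

```python
import hashlib


def output_name(data: bytes, meta: dict) -> str:
    """Pick a file name: the 'name' property if present, else the SHA256 of the data."""
    name = meta.get('name')
    if name:
        return name
    return hashlib.sha256(data).hexdigest()


print(output_name(b'payload', {'name': 'invoice.doc'}))
print(output_name(b'payload', {}))
```

Falling back to a content hash keeps names unique per payload and makes repeated dumps of the same data land in the same file.
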
- Fixes a bug in the .NET header parser where the tables were sometimes parsed in the wrong order.
- The `xtzip` unit has been added, which can extract data from zip archives.
- The `carve-zip` unit has been added. It can carve ZIP files from buffers, similar to `carve-pe` for PE files.
- The `rsa` unit has finally been added.
- The `rncrypt` unit has been added.
- The `dncfx` unit has been added; it extracts the strings from ConfuserEx obfuscated .NET binaries.
- Adds support for TrendMicro Clicktime URL guards in the `urlguards` unit.

- Several tests were added; testing now uses malshare to test units against real world samples. To properly execute tests, the environment variable `MALSHARE_API` needs to contain a valid malshare API key.
- A `numpy` import that always occurred during any unit load was moved into the `peek` unit code to reduce import time of other units.
- Issues with wheel installation on Windows were fixed.

- It is now possible to instantiate units in code with arguments of type `bytes` and have it work as expected, i.e. `xor(B's3cr3t')` will construct a `xor` unit that decrypts using the byte string key `s3cr3t`.
- The `rex` unit can now apply an arbitrary number of transformations to each match and return the results as separate outputs.
- The `urlguards` unit now supports ProofPoint V3 guarded URLs.
- Thanks to the recent fix of #29 in javaobj, the `dsjava` (deserialize Java serialized data) unit should now work. However, since there are currently no tests, bugs should be expected.

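For readers unfamiliar with the operation, decrypting with a byte string key such as `s3cr3t` means a repeating-key XOR. The function below is a standalone sketch of that operation and not the refinery `xor` unit; the name `xor_bytes` is made up for this example.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as often as needed."""
    return bytes(a ^ b for a, b in zip(data, cycle(key)))


secret = xor_bytes(b'binary refinery', b's3cr3t')
print(secret.hex())

# Applying the same key a second time restores the input.
print(xor_bytes(secret, b's3cr3t'))
```
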
- Processing of data in frames is no longer interrupted by errors in one unit.
- The global `--lenient` (or `-L`) flag has been added: It allows refinery units to return partial results. This behavior is disabled by default because it usually means that an error occurred during processing.
- The virtual environment setup script has received bug fixes for problems with absolute paths.
- This changelog was added.

- The unit `jsonfmt` has been renamed to `ppjson` (for pretty-print JSON).
- The unit `ppxml` (pretty-print XML) was added.
- The unit `carve-pe` (carve PE files) was added.
- The unit `winreg` (read Windows registry hives) was added, also adding a dependency on the python-registry package (also on GitHub).
- .NET managed resource extraction was improved, although it is still not perfect.
- The unit `sorted` now only sorts the chunks of the input stream that are in scope.
- The unit `dedup` can no longer sort the input stream because `sorted` can do this.
- PowerShell deobfuscation and its test coverage were improved.
- Cryptographic units have been refactored; the `salsa` and `chacha` units now take a `--nonce` parameter rather than an `--iv` parameter, as they should.