Peter Weidenbach edited this page Apr 15, 2020 · 9 revisions

FACT_extractor - Wiki

Intro

fact_extractor is an offspring of FACT. Originally part of FACT_core, the extraction was moved into its own project. The extraction is plugin based, with a 1-n relationship between a plugin and the file types it handles. The idea is to have a custom utility for each container, archive or firmware update format.

The extractor then detects the file type automatically and chooses the correct unpack plugin.

The plugin concept is designed to make the extractor extensible. Depending on related work, plugin code can be very short and simple, for example where a Python library or binary tool already handles the extraction. The more complex part in these cases is adding a file type signature where one does not exist in the standard Linux file magic library.
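The 1-n relationship between a plugin and its file types can be pictured as a lookup table from MIME type to unpack function. The following is only an illustrative sketch, not the actual fact_extractor plugin interface: `PluginRegistry`, `unpack_archive`, `unpack_nothing` and the returned dictionaries are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

# Hypothetical signature: an unpacker takes a file path and a target directory
# and returns some metadata about the extraction.
UnpackFunction = Callable[[str, str], dict]


def unpack_nothing(file_path: str, target_dir: str) -> dict:
    """Fallback used when no plugin is registered for a MIME type."""
    return {"plugin": "none", "extracted": []}


def unpack_archive(file_path: str, target_dir: str) -> dict:
    """Placeholder plugin; a real one would call a library or binary tool here."""
    return {"plugin": "archive", "extracted": []}


class PluginRegistry:
    """Toy registry: one plugin may register for many MIME types (1-n)."""

    def __init__(self) -> None:
        self._unpackers: Dict[str, UnpackFunction] = {}

    def register(self, mime_types: List[str], unpacker: UnpackFunction) -> None:
        # Map every listed MIME type to the same unpack function.
        for mime_type in mime_types:
            self._unpackers[mime_type] = unpacker

    def get_unpacker(self, mime_type: str) -> UnpackFunction:
        # Unknown types fall back to the no-op unpacker.
        return self._unpackers.get(mime_type, unpack_nothing)


registry = PluginRegistry()
registry.register(["application/zip", "application/x-tar"], unpack_archive)
```

In this sketch the extractor would first detect the MIME type of a file and then call `registry.get_unpacker(mime_type)` to pick the matching plugin.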

Setup

See the readme for setup instructions.

Usage

See the readme for usage instructions.

Plugin Development

❗ Before you start developing your own plug-ins, have a look at the FACT coding guidelines.

❓ If you have any questions or problems regarding plug-in development, do not hesitate to ask here or here.

All important information regarding the coding of new plugins is collected in the plugin development wiki. If you would like to contribute a plugin, you can simply fork fact_extractor and develop your plugin there. Alternatively, you can develop in private and later add the plugin as a git submodule on other installations, or develop under your own favourite license on GitHub. Adding a submodule is a one-liner:

git submodule add https://github.com/YOUR_REPO_PATH.git fact_extractor/plugins/unpacker/NAME_OF_YOUR_PLUG-IN

You can use your custom extraction plug-in within FACT by building your own Docker image:

docker build -t fkiecad/fact_extractor:latest .

File magic

If your case needs a new file type signature, the FACT file type library (fact_helper_file) has to be extended. This is typically the case when an unpacker for a firmware update format is developed. Common container and compression formats, on the other hand, are generally covered by the existing file library. Given that the library is installed (via pip git+https), you can check whether your file is detected with:

$ python3 -c "from fact_helper_file import get_file_type_from_path;print(get_file_type_from_path('<path_to_your_file>'))"

Development of file signatures has to be done according to the magic man page. Signatures are stored in the fact_helper_file/mime directory of the library.

A typical workflow then includes the steps:

  1. Detect a file format that can't be extracted with fact_extractor yet
  2. Reverse the format to find actionable magic for signature
  3. Develop a signature according to these guidelines
  4. Push to library (via fork)
    1. Increment library version
  5. pip install -U git+https://github.com/<your_fork>/fact_helper_file.git
    1. (Optional) Create pull request to fkie-cad/fact_helper_file
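For step 2 of the workflow, it can help to verify a candidate magic value against a few sample files before writing the signature in magic syntax. The snippet below is a minimal sketch under made-up assumptions: the marker `FWUP`, its offset of 4 bytes and the helper `matches_signature` are purely hypothetical and do not describe any real firmware format.

```python
def matches_signature(data: bytes, offset: int, magic: bytes) -> bool:
    """Return True if `magic` occurs in `data` exactly at `offset`."""
    return data[offset:offset + len(magic)] == magic


# Hypothetical firmware header: 4 padding bytes followed by the marker b"FWUP".
HYPOTHETICAL_MAGIC = b"FWUP"
HYPOTHETICAL_OFFSET = 4

# Hand-crafted sample standing in for the first bytes of a firmware file.
sample = b"\x00\x00\x00\x00FWUP\x01\x02payload"

is_candidate = matches_signature(sample, HYPOTHETICAL_OFFSET, HYPOTHETICAL_MAGIC)
```

If the check holds for all known samples and fails for unrelated files, the offset and byte string are reasonable candidates for a signature entry.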

💡 Some further help is provided in the library's wiki.

Community quotes regarding fact_extractor

No one yet:

fact_extractor is so neat, I use it everyday

Some person:

Finally a usable extractor that does not pollute my system

Maybe a relative (?):

FACT was already cool, and now I can also use the extraction standalone? Whoa!