Want to run all parsers instead of preset #4932

Open
nflexfo opened this issue Dec 2, 2024 · 8 comments
Labels
parsers Issues related to parsers and parser plug-ins preprocessing Issues related to preprocessing

Comments

@nflexfo
Contributor

nflexfo commented Dec 2, 2024

Describe the problem:

It is not possible to run all parsers when no parser filter expression is provided and the pre-processor detects a specific OS.

More specifically, when running on a Windows image, it falls back to the win7 preset, which does not include the mft parser. This can be worked around with something like --parsers win7,mft, --parsers win7_slow, or even by specifying the complete list. But then, what if the disk also contains data that could be parsed by spotlight_storedb (chosen at random), or by any "future" parser for that matter?
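The workarounds above can be scripted. Below is a minimal Python sketch that assembles the reproducer command used in this issue with an explicit --parsers expression. The helper name build_log2timeline_argv is hypothetical, and the sketch only uses options that appear in this thread.

```python
import shlex
from typing import List, Optional, Sequence


def build_log2timeline_argv(
    source: str,
    storage_file: str,
    filter_file: str,
    parsers: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: build the log2timeline.py command line from this issue."""
    argv = [
        "./scripts/log2timeline.py",
        "--partitions", "all",
        "--vss_stores=none",
        "--filter-file", filter_file,
        "--storage_file", storage_file,
    ]
    if parsers:
        # Presets and parser names go into a single comma-separated expression,
        # e.g. ["win7", "mft"] -> "win7,mft".
        argv.extend(["--parsers", ",".join(parsers)])
    argv.append(source)
    return argv


if __name__ == "__main__":
    argv = build_log2timeline_argv(
        "./my_data/in_a_dir/cfreds.dd",
        "./output/debug.plaso",
        "./l2t_filter_mft.yaml",
        parsers=["win7", "mft"],
    )
    # Print a shell-quoted command; pass argv to subprocess.run() to execute it.
    print(shlex.join(argv))
```

The result still covers only the parsers you name, which is exactly the limitation raised above.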

Furthermore, all parsers seem to be enabled when the pre-processor cannot detect a suitable preset. That is, depending on the source format (disk or directory), different sets of parsers are enabled even though the user never sets a parser filter. This is confusing.
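To make the behavior described above concrete, here is a simplified Python model of the selection order: an explicit expression first, then an OS-matched preset, then all parsers. It is based on this report and on the "Parsers and preset selection" section of the Plaso developer documentation linked later in this thread. It is not Plaso's actual code. The function name, the preset table, and matching on the OS family alone are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical preset table: preset name -> (OS family it matches, parser names).
Presets = Dict[str, Tuple[str, List[str]]]


def select_parsers(
    user_expression: Optional[str],
    detected_os_family: Optional[str],
    presets: Presets,
    all_parsers: Set[str],
) -> Set[str]:
    """Simplified model of parser selection; not Plaso's implementation."""
    if user_expression:
        # 1. An explicit --parsers expression wins; preset names are expanded.
        selected: Set[str] = set()
        for name in (part.strip() for part in user_expression.split(",")):
            if name in presets:
                selected.update(presets[name][1])
            elif name:
                selected.add(name)
        return selected
    # 2. Otherwise use a preset matching the OS found by preprocessing.
    if detected_os_family:
        for os_family, parsers in presets.values():
            if os_family == detected_os_family:
                return set(parsers)
    # 3. No matching preset (e.g. a bare $MFT file): enable every parser.
    return set(all_parsers)


if __name__ == "__main__":
    all_parsers = {"filestat", "lnk", "mft", "winevtx"}
    presets: Presets = {"win7": ("Windows NT", ["filestat", "lnk", "winevtx"])}
    # Disk image, Windows NT detected: "mft" is not selected.
    print(sorted(select_parsers(None, "Windows NT", presets, all_parsers)))
    # Single file, no OS detected: every parser, including "mft", is selected.
    print(sorted(select_parsers(None, None, presets, all_parsers)))
```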

I understand that the default behavior of using win7 instead of win7_slow suits most users. In our case, we prefer to extract everything possible and filter data afterwards by other means and tools.

What is expected is either:

a) Run all parsers unless specified otherwise
b) Add an additional magic option --parsers all (similar to --parsers list or --partitions all)
c) Add a switch to disable automatic preset detection (like --skip-preset-detection)
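Options (b) and (c) could slot into such a selection step roughly as follows. This is a hypothetical Python sketch. Neither an "all" keyword nor a skip-preset-detection switch exists in Plaso 20240826, and resolve_parsers and its parameters are illustrative names only.

```python
from typing import Dict, List, Optional, Set


def resolve_parsers(
    expression: Optional[str],
    detected_preset: Optional[str],
    presets: Dict[str, List[str]],
    all_parsers: Set[str],
    skip_preset_detection: bool = False,
) -> Set[str]:
    """Hypothetical selection step supporting options (b) and (c)."""
    if expression:
        names = [part.strip() for part in expression.split(",") if part.strip()]
        # Option (b): a reserved "all" keyword, like "--partitions all".
        if names == ["all"]:
            return set(all_parsers)
        selected: Set[str] = set()
        for name in names:
            # Expand preset names; keep plain parser names as they are.
            selected.update(presets.get(name, [name]))
        return selected
    # Option (c): ignore the preset found by preprocessing when asked to.
    if detected_preset and not skip_preset_detection:
        return set(presets[detected_preset])
    return set(all_parsers)


if __name__ == "__main__":
    all_parsers = {"filestat", "lnk", "mft", "winevtx"}
    presets = {"win7": ["filestat", "lnk", "winevtx"]}
    print(sorted(resolve_parsers("all", "win7", presets, all_parsers)))
    print(sorted(resolve_parsers(None, "win7", presets, all_parsers,
                                 skip_preset_detection=True)))
```

Under this sketch, option (a) would amount to making skip_preset_detection the default. A reserved keyword also means no parser or preset could ever be named "all".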

To Reproduce:

Plaso: 20240826
OS: Linux
Install: Sources
Data Source: The Windows 7 disk from Data Leakage Case (CFReDS)

Filter file (l2t_filter_mft.yaml):

description: File system metadata files (MFT).
type: include
path_separator: '\'
paths:
- '\\[$]MFT'
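The filter path writes the segment as [$]MFT for a reason. Assume, as the syntax suggests, that each filter-file path segment is treated as a regular expression that must match the whole segment. In that case, a bare $ would be an end-of-string anchor rather than a literal dollar sign. The following Python snippet illustrates this with the standard re module. segment_matches is a hypothetical helper, and Plaso's exact matching rules, such as case sensitivity, may differ.

```python
import re


def segment_matches(pattern: str, name: str) -> bool:
    """True if the regex pattern matches the entire path segment name."""
    # re.fullmatch anchors the pattern at both ends of the segment.
    return re.fullmatch(pattern, name) is not None


print(segment_matches("[$]MFT", "$MFT"))      # True: [$] is a literal "$"
print(segment_matches("$MFT", "$MFT"))        # False: bare "$" is an end anchor
print(segment_matches("[$]MFT", "$MFTMirr"))  # False: the whole segment must match
```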

Reproducer cmd:

./scripts/log2timeline.py --partitions all --vss_stores=none --filter-file ./l2t_filter_mft.yaml --storage_file ./output/debug.plaso ./my_data/in_a_dir/cfreds.dd
./scripts/pinfo.py ./output/debug.plaso --verbose

Pinfo gives:

************************** Plaso Storage Information ***************************
            Filename : debug.plaso
      Format version : 20230327
Serialization format : json
--------------------------------------------------------------------------------

*********************************** Sessions ***********************************
5472a64d-56b3-4281-871a-e94a2a7c80a6 : 2024-12-02T14:36:27.818708+00:00
--------------------------------------------------------------------------------

**************** Session: 5472a64d-56b3-4281-871a-e94a2a7c80a6 *****************
                Start time : 2024-12-02T14:36:27.818708+00:00
           Completion time : 2024-12-02T14:36:38.659266+00:00
              Product name : plaso
           Product version : 20240826
    Command line arguments : ./scripts/log2timeline.py --partitions all
                             --vss_stores=none --filter-file
                             ./l2t_filter_mft.yaml --storage_file
                             ./output/debug.plaso ./my_data/in_a_dir/cfreds.dd
  Parser filter expression : win7
Enabled parser and plugins : bencode/bencode_transmission,
                             bencode/bencode_utorrent, binary_cookies,
                             chrome_cache, chrome_preferences,
                             custom_destinations, czip/oxml,
                             esedb/file_history, esedb/msie_webcache,
                             esedb/user_access_logging, filestat,
                             firefox_cache, java_idx, lnk, mcafee_protection,
                             msiecf, olecf/olecf_automatic_destinations,
                             olecf/olecf_default, olecf/olecf_document_summary,
                             olecf/olecf_summary, opera_global,
                             opera_typed_history, pe, plist/safari_history,
                             prefetch, recycle_bin, sqlite/chrome_17_cookies,
                             sqlite/chrome_27_history,
                             sqlite/chrome_66_cookies, sqlite/chrome_8_history,
                             sqlite/chrome_autofill,
                             sqlite/chrome_extension_activity,
                             sqlite/firefox_10_cookies,
                             sqlite/firefox_2_cookies,
                             sqlite/firefox_downloads, sqlite/firefox_history,
                             sqlite/google_drive, sqlite/safari_historydb,
                             sqlite/skype, symantec_scanlog,
                             text/gdrive_synclog, text/powershell_transcript,
                             text/sccm, text/setupapi, text/skydrive_log_v1,
                             text/skydrive_log_v2,
                             text/teamviewer_application_log,
                             text/teamviewer_connections_incoming,
                             text/teamviewer_connections_outgoing,
                             text/winfirewall, usnjrnl, winevtx, winjob,
                             winpca_db0, winpca_dic, winreg/amcache,
                             winreg/appcompatcache, winreg/bagmru, winreg/bam,
                             winreg/ccleaner, winreg/explorer_mountpoints2,
                             winreg/explorer_programscache,
                             winreg/microsoft_office_mru,
                             winreg/microsoft_outlook_mru,
                             winreg/mrulist_shell_item_list,
                             winreg/mrulist_string,
                             winreg/mrulistex_shell_item_list,
                             winreg/mrulistex_string,
                             winreg/mrulistex_string_and_shell_item,
                             winreg/mrulistex_string_and_shell_item_list,
                             winreg/msie_zone, winreg/mstsc_rdp,
                             winreg/mstsc_rdp_mru, winreg/network_drives,
                             winreg/networks, winreg/userassist,
                             winreg/windows_boot_execute,
                             winreg/windows_boot_verify, winreg/windows_run,
                             winreg/windows_sam_users, winreg/windows_services,
                             winreg/windows_shutdown,
                             winreg/windows_task_cache,
                             winreg/windows_timezone,
                             winreg/windows_typed_urls,
                             winreg/windows_usb_devices,
                             winreg/windows_usbstor_devices,
                             winreg/windows_version, winreg/winlogon,
                             winreg/winrar_mru, winreg/winreg_default
        Preferred encoding : UTF-8
       Preferred time zone : UTC
                Debug mode : False
          Artifact filters : N/A
               Filter file : N/A
--------------------------------------------------------------------------------

********** System configuration: 5472a64d-56b3-4281-871a-e94a2a7c80a6 **********
                Hostname : INFORMANT-PC
        Operating system : Windows NT
Operating system product : Windows 7 Ultimate
Operating system version : 6.1
                Language : en-US
               Code page : cp1252
         Keyboard layout : N/A
               Time zone : America/New_York
--------------------------------------------------------------------------------

...

************************* Events generated per parser **************************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------------
            filestat : 8
               Total : 8
--------------------------------------------------------------------------------

So: no mft events (mft is absent from the parser list), a "Parser filter expression" (win7) that was never provided, and, surprisingly (although the filter file is correctly applied), pinfo reports no "Filter file".
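To check a run for mft events without reading the whole report, you can extract the "Events generated per parser" section from pinfo.py's text output. Below is a minimal Python sketch. It assumes the layout shown above: one "name : count" row per parser, closed by a dashed line. parse_event_counts is a hypothetical helper, and the layout may change between Plaso versions.

```python
import re
from typing import Dict

SAMPLE = """\
************************* Events generated per parser **************************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------------
            filestat : 8
               Total : 8
--------------------------------------------------------------------------------
"""

ROW = re.compile(r"^\s*(\S+)\s*:\s*(\d+)\s*$")


def parse_event_counts(pinfo_text: str) -> Dict[str, int]:
    """Return {parser name: event count} from pinfo.py's per-parser section."""
    counts: Dict[str, int] = {}
    in_section = False
    for line in pinfo_text.splitlines():
        if "Events generated per parser" in line:
            in_section = True
            continue
        if not in_section:
            continue
        match = ROW.match(line)
        if match and match.group(1) != "Total":
            counts[match.group(1)] = int(match.group(2))
        elif line.startswith("---") and counts:
            break  # dashed line after the rows closes the section
    return counts


counts = parse_event_counts(SAMPLE)
print(counts)                # {'filestat': 8}
print(counts.get("mft", 0))  # 0
```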

Now, the same run with the mft parser provided:

./scripts/log2timeline.py --partitions all --vss_stores=none --filter-file ./l2t_filter_mft.yaml --storage_file ./output/debug-with-mft.plaso ./my_data/in_a_dir/cfreds.dd --parsers mft

./scripts/pinfo.py ./output/debug-with-mft.plaso --verbose
*********************************** Sessions ***********************************
...
    Command line arguments : ./scripts/log2timeline.py --partitions all
                             --vss_stores=none --filter-file
                             ./l2t_filter_mft.yaml --storage_file
                             ./output/debug-with-mft.plaso
                             ./my_data/in_a_dir/cfreds.dd --parsers mft
  Parser filter expression : mft
Enabled parser and plugins : mft
...

************************* Events generated per parser **************************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------------
                 mft : 933089
               Total : 933089
--------------------------------------------------------------------------------

It correctly finds mft events.

Now, here is what confused me: using the test_data/MFT file from Plaso's dataset (and without any parser filter):

./scripts/log2timeline.py --partitions all --vss_stores=none --filter-file ./l2t_filter_mft.yaml --storage_file ./output/file-no-mft.plaso ./plaso-20240826/test_data/MFT

And pinfo output:

**************** Session: 2197f23b-a6c2-43e3-bd04-f066b27842a1 *****************
                Start time : 2024-12-02T14:50:49.603078+00:00
           Completion time : 2024-12-02T14:50:56.436049+00:00
              Product name : plaso
           Product version : 20240826
    Command line arguments : ./scripts/log2timeline.py --partitions all
                             --vss_stores=none --filter-file
                             ./l2t_filter_mft.yaml --storage_file
                             ./output/file-no-mft.plaso
                             ./plaso-20240826/test_data/MFT
  Parser filter expression : N/A
Enabled parser and plugins : android_app_usage, asl_log, bencode,
                             bencode/bencode_transmission,
                             bencode/bencode_utorrent, binary_cookies,
                             bodyfile, bsm_log, chrome_cache,
                             chrome_preferences, cups_ipp, custom_destinations,
                             czip, czip/oxml, esedb, esedb/file_history,
                             esedb/msie_webcache, esedb/srum,
                             ...

************************* Events generated per parser **************************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------------
            filestat : 3
                 mft : 126349
               Total : 126352
--------------------------------------------------------------------------------

This time, the mft parser is turned on by default. So we kind of have inconsistent behavior depending on the type of data. Or perhaps this should be made more explicit in the command-line help?
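You can run the same kind of check on the "Enabled parser and plugins" field, whose value pinfo.py wraps over several lines. Below is a hypothetical Python helper. It assumes the "Key : value" layout shown in the outputs above. The two samples are shortened excerpts of those outputs.

```python
from typing import Set

WIN7_EXCERPT = """\
  Parser filter expression : win7
Enabled parser and plugins : bencode/bencode_transmission,
                             bencode/bencode_utorrent, binary_cookies,
        Preferred encoding : UTF-8
"""

MFT_EXCERPT = """\
  Parser filter expression : mft
  Enabled parser and plugins : mft
"""


def parse_enabled_parsers(pinfo_text: str) -> Set[str]:
    """Collect the names listed under 'Enabled parser and plugins'."""
    names: Set[str] = set()
    collecting = False
    for line in pinfo_text.splitlines():
        key, separator, value = line.partition(" : ")
        if key.strip() == "Enabled parser and plugins":
            collecting = True
        elif collecting and not separator:
            value = line  # continuation line of the wrapped value
        elif collecting:
            break  # the next "Key : value" line ends the field
        else:
            continue
        names.update(name.strip() for name in value.split(",") if name.strip())
    return names


print("mft" in parse_enabled_parsers(WIN7_EXCERPT))  # False
print("mft" in parse_enabled_parsers(MFT_EXCERPT))   # True
```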

I tried adding a quick-and-dirty "and False" at the following line to disable preset detection, and got the expected behavior.

Let me know what you think. I'm even fine with coding the feature if you would like. Hopefully I didn't miss an existing feature or option.

EDIT: I should add that this issue is not only about Windows or MFT. If a Linux server hosts archived EVTX files, I want the winevtx parser to be activated even if those events don't belong to the server.

Looking forward to hearing from you, thanks.

@joachimmetz
Member

joachimmetz commented Feb 23, 2025

It is not possible to run all parsers when no parser filter expression is provided and the pre-processor detects a specific OS.

This is by design.

In our case, we prefer to extract everything possible and filter data afterwards by other means and tools.

You can create your own preset; we are unable to accommodate all the different preferences out there. If you want to help devise a principled approach, I recommend you help out with #4951.
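For the custom-preset route suggested here, a preset is essentially a name plus a list of parser names. The names could, for example, be taken from the output of --parsers list. Below is a minimal Python sketch that renders such an entry as YAML. The layout (name, description and parsers keys) is assumed to mirror the presets.yaml file bundled with Plaso and should be checked against your version. Also check how your installation picks up a custom preset, for example by editing presets.yaml or through an option listed in log2timeline.py --help. make_preset_yaml is a hypothetical helper.

```python
from typing import Iterable


def make_preset_yaml(name: str, description: str, parsers: Iterable[str]) -> str:
    """Render one preset entry as YAML text (assumed presets.yaml layout)."""
    # Single-quote the description so characters like ":" stay valid YAML;
    # an embedded single quote is escaped by doubling it.
    quoted_description = "'" + description.replace("'", "''") + "'"
    lines = [f"name: {name}", f"description: {quoted_description}", "parsers:"]
    # Sort and de-duplicate so the generated file is stable between runs.
    lines.extend(f"- {parser}" for parser in sorted(set(parsers)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_preset_yaml("everything", "All parsers: completeness over speed",
                           ["winevtx", "mft", "filestat", "mft"]), end="")
```

The sketch leaves out any operating-system matching. Presumably, such a preset would then only be used when named explicitly in --parsers, rather than being picked by preset detection.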

Now this time, the mft parser is turned on by default.

This is by design.

So, we kinda have an inconsistent behavior depending on the type of data.

This is highly subjective. IMHO it makes no sense to parse the $MFT (by default) if you have a richer and more reliable source of information, namely the full NTFS file system.

Now this time, the mft parser is turned on by default.

This does not make sense: for an image with a full system, you have much better ways to extract the metadata than just the $MFT. See https://osdfir.blogspot.com/2020/04/parsing-mft-ntfs-metadata-file.html

If a Linux server hosts EVTX files for archives, I want the winevtx parser to be activated even if those events don't belong to the server.

You can customize the presets to your needs. This will be different for different use cases.

@joachimmetz joachimmetz changed the title Unable to run all parsers when pre-processor detects a suitable preset Wnat to run all parsers instead of preset Feb 23, 2025
@joachimmetz joachimmetz changed the title Wnat to run all parsers instead of preset Want to run all parsers instead of preset Feb 23, 2025
@joachimmetz joachimmetz self-assigned this Feb 23, 2025
@joachimmetz joachimmetz added core Issues relating to Plaso's core - processing, file access etc. parsers Issues related to parsers and parser plug-ins preprocessing Issues related to preprocessing and removed core Issues relating to Plaso's core - processing, file access etc. labels Feb 23, 2025
@mpilking

Hello @joachimmetz ,

What are your thoughts on the suggestions here from the original issue?

What is expected is either:

a) Run all parsers unless specified otherwise
b) Add an additional magic option --parsers all (similar to --parsers list or --partitions all)
c) Add a switch to disable automatic preset detection (like --skip-preset-detection)

It would be ideal to have a simple option to run log2timeline with all parsers/plugins enabled, regardless of the OS detected by preprocessing. I realize this is very inefficient, but in a scenario where completeness is more important than time, it would be very helpful to have a simple way to ensure that all parsers/plugins are used to evaluate all files. Then postprocessing can later include/exclude the events of interest.

Thanks,
Mike

@joachimmetz
Member

joachimmetz commented Feb 25, 2025

I realize this is very inefficient, but in a scenario where completeness is more important than time,

There are many more tradeoffs here to consider than efficiency; analysis/investigation time, storage limits, processing time, dealing with incorrectly extracted events, duplication, etc.

it would be very helpful to have a simple way to ensure that all parsers/plugins are used to evaluate all files

What do you mean by "completeness"? Isn't the "completeness" of the analysis/investigation more relevant? Parsing more files is not necessarily going to lead to better analysis/investigative outcomes.

@mpilking

There are many more tradeoffs here to consider than efficiency; analysis/investigation time, storage limits, processing time, dealing with incorrectly extracted events, duplication, etc.

I'm aware of the tradeoffs. I mentioned that it is very inefficient. This would not be a recommended approach on a regular basis. But there are times (not often, but sometimes) when a brute-force process can be useful.

What do you mean by "completeness"? Isn't the "completeness" of the analysis/investigation more relevant? Parsing more files is not necessarily going to lead to better analysis/investigative outcomes.

What I mean by "completeness" is that some analysts, like the original creator of this issue, would like to have the ability to use the complete set of parsers rather than the tool picking the parsers for us. And yes, I can create a custom preset myself, but it would be preferable to have it be a built-in option. Any of these 3 options @nflexfo suggested would work well:

a) Run all parsers unless specified otherwise
b) Add an additional magic option --parsers all (similar to --parsers list or --partitions all)
c) Add a switch to disable automatic preset detection (like --skip-preset-detection)

@joachimmetz
Member

I'm aware of the tradeoffs. I mentioned that it is very inefficient.

I get the impression you are not. This can actually produce worse results/findings; there are many intricacies here. If you can factually prove otherwise, I might reconsider.

What I mean by "completeness" is that some analysts, like the original creator of this issue, would like to have the ability to use the complete set of parsers rather than the tool picking the parsers for us.

You can already do it: just define the parsers you want to run.

@mpilking

If it's such a terrible idea to run all parsers, then why is it the fallback option according to the documentation here: https://plaso.readthedocs.io/en/latest/sources/developer/Internals.html#parsers-and-preset-selection?

In other words, if log2timeline can't guess the OS, then it's fine to run all parsers. Otherwise, it's a terrible idea?

@jleaniz
Contributor

jleaniz commented Feb 25, 2025

@mpilking the point is that the better approach is to optimize for the general case, not some special case where you wish to run all parsers (which is inefficient as you said). If you want to run all parsers, just create a custom preset and use that. If you are not happy with that approach, this is an open source project and you can always submit a PR and ask for it to be merged, at which point your implementation can be evaluated on its practicality.

@joachimmetz
Member

@mpilking given you are resorting to hyperbole, you don't appear to want to have a constructive conversation. I'm locking this conversation.

@log2timeline log2timeline locked as resolved and limited conversation to collaborators Feb 26, 2025