Want to run all parsers instead of preset #4932

nflexfo · 2024-12-02T15:05:13Z

Describe the problem:

It is not possible to run all parsers when no parser filter expression is provided and pre-processor detects a specific OS.

More specifically, when running on a Windows image, it falls back to the win7 preset, which does not include the mft parser. It can be worked around with something like --parsers win7,mft, --parsers win7_slow, or even by specifying the complete list. But then, what if the disk also contains data that could be parsed by spotlight_storedb (randomly chosen), or by any "future" parser for that matter?

Furthermore, all parsers seem to be enabled when the pre-processor cannot detect a suitable preset. That is, depending on the source format (disk or directory), different sets of parsers are enabled even though no parser filter is ever set by the user. This is confusing.
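The selection behavior described above can be summarized as a small decision procedure. The sketch below is a hypothetical model of the observed behavior, not Plaso's implementation: `select_parsers`, `PRESETS` and `ALL_PARSERS` are illustrative names, and the parser lists are deliberately tiny subsets.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Illustrative subsets only; the real parser and preset lists are far longer.
ALL_PARSERS: FrozenSet[str] = frozenset(
    {"filestat", "lnk", "mft", "prefetch", "spotlight_storedb", "winevtx"})

# Hypothetical preset table: preset name -> (OS family it applies to, parsers).
PRESETS: Dict[str, Tuple[str, FrozenSet[str]]] = {
    "win7": ("Windows NT", frozenset({"filestat", "lnk", "prefetch", "winevtx"})),
}


def select_parsers(
        parser_filter: Optional[str], detected_os: Optional[str]) -> FrozenSet[str]:
    """Returns the parsers to enable, modeling the behavior reported above."""
    if parser_filter:
        # Explicit expression: expand preset names and keep known parser names.
        selected = set()
        for name in (part.strip() for part in parser_filter.split(",")):
            if name in PRESETS:
                selected |= PRESETS[name][1]
            elif name in ALL_PARSERS:
                selected.add(name)
        return frozenset(selected)

    for os_family, parsers in PRESETS.values():
        if os_family == detected_os:
            # Preprocessing recognized the OS: only the matching preset runs.
            return parsers

    # No OS detected (for example a single $MFT file): every parser runs.
    return ALL_PARSERS
```

Under these assumptions, select_parsers(None, "Windows NT") yields the preset without mft (like the disk-image run), while select_parsers(None, None) yields every parser, including mft (like the single-file run).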

I understand that the default behavior of using win7 instead of win7_slow suits most users. In our case, we prefer to extract everything possible and filter data afterwards by other means and tools.

What is expected is either:

a) Run all parsers unless specified otherwise
b) Add an additional magic option --parsers all (similar to --parsers list or --partitions all)
c) Add a switch to disable automatic preset detection (like --skip-preset-detection)
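The three proposals differ mainly in where the decision is made. The following minimal sketch uses hypothetical names (resolve_parsers, a skip_preset_detection flag, a magic "all" value) that are not existing Plaso options, and shows how (b) and (c) could slot into the selection step. The quick-and-dirty "and False" experiment mentioned in this issue is effectively option (c) hardcoded.

```python
from typing import Dict, FrozenSet, Optional

# Illustrative subsets only; not Plaso's real parser or preset lists.
ALL_PARSERS: FrozenSet[str] = frozenset(
    {"filestat", "lnk", "mft", "spotlight_storedb", "winevtx"})

# Hypothetical table: detected OS family -> preset parsers.
PRESET_BY_OS: Dict[str, FrozenSet[str]] = {
    "Windows NT": frozenset({"filestat", "lnk", "winevtx"}),
}


def resolve_parsers(
        parser_filter: Optional[str],
        detected_os: Optional[str],
        skip_preset_detection: bool = False) -> FrozenSet[str]:
    """Sketch of options (b) and (c); not an existing Plaso API."""
    # Option (b): a magic value, analogous to "--partitions all".
    if parser_filter == "all":
        return ALL_PARSERS
    if parser_filter:
        # Plain comma-separated parser names only; presets are not expanded here.
        names = frozenset(part.strip() for part in parser_filter.split(","))
        return names & ALL_PARSERS
    # Option (c): consult the detected OS only when detection is not skipped.
    if not skip_preset_detection and detected_os in PRESET_BY_OS:
        return PRESET_BY_OS[detected_os]
    # Option (a) amounts to always reaching this line when no filter is given.
    return ALL_PARSERS
```

Keeping the flag separate from the parser expression means (c) leaves explicit --parsers values untouched and only changes the default.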

To Reproduce:

Plaso: 20240826
OS: Linux
Install: Sources
Data Source: The Windows 7 disk from Data Leakage Case (CFReDS)

Filter file (l2t_filter_mft.yaml):

description: File system metadata files (MFT).
type: include
path_separator: '\'
paths:
- '\\[$]MFT'
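In the filter path, "[$]" is a regular-expression character class matching a literal dollar sign; a bare "$" would instead be read as an end-of-string anchor. Assuming the last path segment is matched as a regular expression against a single file name (an assumption about how the filter is applied, not verified here), Python's re module shows the difference:

```python
import re

# Assumption: the final filter path segment is matched as a regular expression
# against one file name. "[$]" is a character class containing only "$", so it
# matches a literal dollar sign instead of acting as an end-of-string anchor.
MFT_SEGMENT = re.compile(r"[$]MFT")


def segment_matches(name: str) -> bool:
    """Returns True if a single path segment fully matches the filter segment."""
    return MFT_SEGMENT.fullmatch(name) is not None


for candidate in ("$MFT", "MFT", "$MFTMirr"):
    print(candidate, segment_matches(candidate))

# A bare "$" is an anchor, so this pattern can never match the name "$MFT".
print(re.fullmatch(r"$MFT", "$MFT"))
```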

Reproducer cmd:

./scripts/log2timeline.py --partitions all --vss_stores=none --filter-file ./l2t_filter_mft.yaml --storage_file ./output/debug.plaso ./my_data/in_a_dir/cfreds.dd
./scripts/pinfo.py ./output/debug.plaso --verbose

Pinfo gives:

************************** Plaso Storage Information ***************************
            Filename : debug.plaso
      Format version : 20230327
Serialization format : json
--------------------------------------------------------------------------------

*********************************** Sessions ***********************************
5472a64d-56b3-4281-871a-e94a2a7c80a6 : 2024-12-02T14:36:27.818708+00:00
--------------------------------------------------------------------------------

**************** Session: 5472a64d-56b3-4281-871a-e94a2a7c80a6 *****************
                Start time : 2024-12-02T14:36:27.818708+00:00
           Completion time : 2024-12-02T14:36:38.659266+00:00
              Product name : plaso
           Product version : 20240826
    Command line arguments : ./scripts/log2timeline.py --partitions all
                             --vss_stores=none --filter-file
                             ./l2t_filter_mft.yaml --storage_file
                             ./output/debug.plaso ./my_data/in_a_dir/cfreds.dd
  Parser filter expression : win7
Enabled parser and plugins : bencode/bencode_transmission,
                             bencode/bencode_utorrent, binary_cookies,
                             chrome_cache, chrome_preferences,
                             custom_destinations, czip/oxml,
                             esedb/file_history, esedb/msie_webcache,
                             esedb/user_access_logging, filestat,
                             firefox_cache, java_idx, lnk, mcafee_protection,
                             msiecf, olecf/olecf_automatic_destinations,
                             olecf/olecf_default, olecf/olecf_document_summary,
                             olecf/olecf_summary, opera_global,
                             opera_typed_history, pe, plist/safari_history,
                             prefetch, recycle_bin, sqlite/chrome_17_cookies,
                             sqlite/chrome_27_history,
                             sqlite/chrome_66_cookies, sqlite/chrome_8_history,
                             sqlite/chrome_autofill,
                             sqlite/chrome_extension_activity,
                             sqlite/firefox_10_cookies,
                             sqlite/firefox_2_cookies,
                             sqlite/firefox_downloads, sqlite/firefox_history,
                             sqlite/google_drive, sqlite/safari_historydb,
                             sqlite/skype, symantec_scanlog,
                             text/gdrive_synclog, text/powershell_transcript,
                             text/sccm, text/setupapi, text/skydrive_log_v1,
                             text/skydrive_log_v2,
                             text/teamviewer_application_log,
                             text/teamviewer_connections_incoming,
                             text/teamviewer_connections_outgoing,
                             text/winfirewall, usnjrnl, winevtx, winjob,
                             winpca_db0, winpca_dic, winreg/amcache,
                             winreg/appcompatcache, winreg/bagmru, winreg/bam,
                             winreg/ccleaner, winreg/explorer_mountpoints2,
                             winreg/explorer_programscache,
                             winreg/microsoft_office_mru,
                             winreg/microsoft_outlook_mru,
                             winreg/mrulist_shell_item_list,
                             winreg/mrulist_string,
                             winreg/mrulistex_shell_item_list,
                             winreg/mrulistex_string,
                             winreg/mrulistex_string_and_shell_item,
                             winreg/mrulistex_string_and_shell_item_list,
                             winreg/msie_zone, winreg/mstsc_rdp,
                             winreg/mstsc_rdp_mru, winreg/network_drives,
                             winreg/networks, winreg/userassist,
                             winreg/windows_boot_execute,
                             winreg/windows_boot_verify, winreg/windows_run,
                             winreg/windows_sam_users, winreg/windows_services,
                             winreg/windows_shutdown,
                             winreg/windows_task_cache,
                             winreg/windows_timezone,
                             winreg/windows_typed_urls,
                             winreg/windows_usb_devices,
                             winreg/windows_usbstor_devices,
                             winreg/windows_version, winreg/winlogon,
                             winreg/winrar_mru, winreg/winreg_default
        Preferred encoding : UTF-8
       Preferred time zone : UTC
                Debug mode : False
          Artifact filters : N/A
               Filter file : N/A
--------------------------------------------------------------------------------

********** System configuration: 5472a64d-56b3-4281-871a-e94a2a7c80a6 **********
                Hostname : INFORMANT-PC
        Operating system : Windows NT
Operating system product : Windows 7 Ultimate
Operating system version : 6.1
                Language : en-US
               Code page : cp1252
         Keyboard layout : N/A
               Time zone : America/New_York
--------------------------------------------------------------------------------

...

************************* Events generated per parser **************************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------------
            filestat : 8
               Total : 8
--------------------------------------------------------------------------------

So: no mft events (the mft parser is absent from the enabled list), a "Parser filter expression" of win7 that I never provided, and, surprisingly (although the filter file is correctly applied), pinfo reports no "Filter file".

Now, the same run with the mft parser provided:

./scripts/log2timeline.py --partitions all --vss_stores=none --filter-file ./l2t_filter_mft.yaml --storage_file ./output/debug-with-mft.plaso ./my_data/in_a_dir/cfreds.dd --parsers mft

./scripts/pinfo.py ./output/debug-with-mft.plaso --verbose

*********************************** Sessions ***********************************
...
  Command line arguments : ./scripts/log2timeline.py --partitions all
                             --vss_stores=none --filter-file
                             ./l2t_filter_mft.yaml --storage_file
                             ./output/debug-with-mft.plaso
                             ./my_data/in_a_dir/cfreds.dd --parsers mft
  Parser filter expression : mft
  Enabled parser and plugins : mft
...

************************* Events generated per parser **************************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------------
                 mft : 933089
               Total : 933089
--------------------------------------------------------------------------------

It correctly finds mft events.

Now here is what confused me. Using the test_data/MFT file from Plaso's test data (and without any parser filter):

./scripts/log2timeline.py --partitions all --vss_stores=none --filter-file ./l2t_filter_mft.yaml --storage_file ./output/file-no-mft.plaso ./plaso-20240826/test_data/MFT

And pinfo output:

**************** Session: 2197f23b-a6c2-43e3-bd04-f066b27842a1 *****************
                Start time : 2024-12-02T14:50:49.603078+00:00
           Completion time : 2024-12-02T14:50:56.436049+00:00
              Product name : plaso
           Product version : 20240826
    Command line arguments : ./scripts/log2timeline.py --partitions all
                             --vss_stores=none --filter-file
                             ./l2t_filter_mft.yaml --storage_file
                             ./output/file-no-mft.plaso
                             ./plaso-20240826/test_data/MFT
  Parser filter expression : N/A
Enabled parser and plugins : android_app_usage, asl_log, bencode,
                             bencode/bencode_transmission,
                             bencode/bencode_utorrent, binary_cookies,
                             bodyfile, bsm_log, chrome_cache,
                             chrome_preferences, cups_ipp, custom_destinations,
                             czip, czip/oxml, esedb, esedb/file_history,
                             esedb/msie_webcache, esedb/srum,
                             ...

************************* Events generated per parser **************************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------------
            filestat : 3
                 mft : 126349
               Total : 126352
--------------------------------------------------------------------------------

Now this time, the mft parser is turned on by default. So, we kinda have an inconsistent behavior depending on the type of data. Or maybe this should be made more explicit in the command-line help?

I tried adding a quick-and-dirty "and False" to the following line to disable the preset detection, and I got the expected behavior.

Let me know what you think about it. I'm also happy to code the feature if you would like. Hopefully, I didn't miss any existing feature/option.

EDIT: I should add that this issue is not only about Windows or MFT. If a Linux server hosts EVTX files for archives, I want the winevtx parser to be activated even if those events don't belong to the server.

Looking forward to hearing from you, thanks.


joachimmetz · 2025-02-23T10:26:47Z

It is not possible to run all parsers when no parser filter expression is provided and pre-processor detects a specific OS.

This is by design.

In our case, we prefer to extract everything possible and filter data afterwards by other means and tools.

You can create your own preset; we are unable to cater to all the different preferences out there. If you want to help devise a principled approach, I recommend you help out with #4951.

Now this time, the mft parser is turned on by default.

This is by design.

So, we kinda have an inconsistent behavior depending on the type of data.

This is highly subjective. IMHO it makes no sense to parse the $MFT (by default) if you have a richer and more reliable source of information, namely the full NTFS file system.

Now this time, the mft parser is turned on by default.

This does not make sense: for an image of a full system you have much better ways to extract the metadata than just the $MFT; see https://osdfir.blogspot.com/2020/04/parsing-mft-ntfs-metadata-file.html

If a Linux server hosts EVTX files for archives, I want the winevtx parser to be activated even if those events don't belong to the server.

You can customize the presets to your needs. This will be different for different use cases.

mpilking · 2025-02-24T22:18:39Z

Hello @joachimmetz,

What are your thoughts on the suggestions here from the original issue?

What is expected is either:

a) Run all parsers unless specified otherwise
b) Add an additional magic option --parsers all (similar to --parsers list or --partitions all)
c) Add a switch to disable automatic preset detection (like --skip-preset-detection)

It would be ideal to have a simple option to run log2timeline with all parsers/plugins enabled, regardless of the OS detected by preprocessing. I realize this is very inefficient, but in a scenario where completeness is more important than time, it would be very helpful to have a simple way to ensure that all parsers/plugins are used to evaluate all files. Then postprocessing can later include/exclude the events of interest.

Thanks,
Mike

joachimmetz · 2025-02-25T05:10:33Z

I realize this is very inefficient, but in a scenario where completeness is more important than time,

There are many more tradeoffs here to consider than efficiency; analysis/investigation time, storage limits, processing time, dealing with incorrectly extracted events, duplication, etc.

it would be very helpful to have a simple way to ensure that all parsers/plugins are used to evaluate all files

What do you mean by "completeness"? Isn't the "completeness" of the analysis/investigation more relevant? More files parsed are not necessarily going to lead to better analysis/investigative outcomes.

mpilking · 2025-02-25T14:32:25Z

There are many more tradeoffs here to consider than efficiency; analysis/investigation time, storage limits, processing time, dealing with incorrectly extracted events, duplication, etc.

I'm aware of the tradeoffs. I mentioned that it is very inefficient. This would not be a recommended approach on a regular basis. But there are times (not often, but sometimes) when a brute-force process can be useful.

What do you mean by "completeness"? Isn't the "completeness" of the analysis/investigation more relevant? More files parsed are not necessarily going to lead to better analysis/investigative outcomes.

What I mean by "completeness" is that some analysts, like the original creator of this issue, would like to have the ability to use the complete set of parsers rather than the tool picking the parsers for us. And yes, I can create a custom preset myself, but it would be preferable to have it be a built-in option. Any of these 3 options @nflexfo suggested would work well:

a) Run all parsers unless specified otherwise
b) Add an additional magic option --parsers all (similar to --parsers list or --partitions all)
c) Add a switch to disable automatic preset detection (like --skip-preset-detection)

joachimmetz · 2025-02-25T20:56:14Z

I'm aware of the tradeoffs. I mentioned that it is very inefficient.

I get the impression you are not. This can actually produce worse results/findings; there are many intricacies here. If you can factually prove otherwise, I might reconsider.

What I mean by "completeness" is that some analysts, like the original creator of this issue, would like to have the ability to use the complete set of parsers rather than the tool picking the parsers for us.

You can already do it: just define the parsers you want to run.
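Defining the parsers explicitly amounts to passing a comma-separated expression, as in the --parsers win7,mft workaround mentioned in the issue. This is a minimal sketch with hypothetical helper names (build_parser_expression, build_command) that merges a preset name with extra parsers, drops duplicates while keeping order, and assembles the log2timeline command line without running it.

```python
import shlex
from typing import Iterable, List


def build_parser_expression(*groups: Iterable[str]) -> str:
    """Joins parser/preset names into one expression, keeping first occurrences."""
    names: List[str] = []
    for group in groups:
        for name in group:
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return ",".join(names)


def build_command(source: str, storage_file: str, parser_expression: str) -> List[str]:
    """Builds (but does not run) a log2timeline invocation as an argument list."""
    return [
        "./scripts/log2timeline.py",
        "--parsers", parser_expression,
        "--storage_file", storage_file,
        source,
    ]


expression = build_parser_expression(["win7"], ["mft", "winevtx"], ["mft"])
print(expression)  # win7,mft,winevtx
print(shlex.join(build_command(
    "./my_data/in_a_dir/cfreds.dd", "./output/debug.plaso", expression)))
```

Passing an argument list (rather than one shell string) avoids quoting problems if the list is later handed to subprocess.run.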

mpilking · 2025-02-25T21:59:35Z

If it's such a terrible idea to run all parsers, then why is it the fallback option based on the documentation here: https://plaso.readthedocs.io/en/latest/sources/developer/Internals.html#parsers-and-preset-selection?

In other words, if log2timeline can't guess the OS, then it's fine to run all parsers. Otherwise, it's a terrible idea?

jleaniz · 2025-02-25T22:10:34Z

@mpilking the point is that the better approach is to optimize for the general case, not some special case where you wish to run all parsers (which is inefficient as you said). If you want to run all parsers, just create a custom preset and use that. If you are not happy with that approach, this is an open source project and you can always submit a PR and ask for it to be merged, at which point your implementation can be evaluated on its practicality.

joachimmetz · 2025-02-26T04:49:11Z

@mpilking given you are resorting to hyperbole, you don't appear to want to have a constructive conversation. I'm locking this conversation.

joachimmetz changed the title ~~Unable to run all parsers when pre-processor detects a suitable preset~~ Wnat to run all parsers instead of preset Feb 23, 2025

joachimmetz changed the title ~~Wnat to run all parsers instead of preset~~ Want to run all parsers instead of preset Feb 23, 2025

joachimmetz self-assigned this Feb 23, 2025

joachimmetz added the core, parsers and preprocessing labels and removed the core label Feb 23, 2025

log2timeline locked as resolved and limited conversation to collaborators Feb 26, 2025