[Feature Request/Improvement] Alternate JSON Output w/o b64contents #3399

Open
danieldjewell opened this issue Apr 7, 2023 · 0 comments

Any thoughts on adding a way (see the options below) to skip the base64 output of the scanned file in the JSON format? I recognize that including it is part of SBuD, and I can definitely see the benefit and convenience: a more-or-less "self-contained" format that carries the file data is great for, say, later security/virus/malware analysis. But it also makes the JSON output absolutely gigantic, and the size scales with the size of the scanned input file.

Options could be:

  • Add a new output format (like "json-nob64") that doesn't include it
  • Add a command line switch to skip it (--no-contents or something like that?)
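To make the two options concrete, here is a minimal sketch using plain `argparse`. It is not polyfile's actual command-line code: the program name, the `--format` choices, and the `include_contents` helper are all assumptions for illustration, and only the `json-nob64` and `--no-contents` names come from the suggestions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; polyfile's real CLI is organized differently.
    parser = argparse.ArgumentParser(prog="example-scanner")
    parser.add_argument(
        "--format",
        choices=["json", "json-nob64"],
        default="json",
        help="output format; json-nob64 leaves out the base64 file contents",
    )
    parser.add_argument(
        "--no-contents",
        action="store_true",
        help="do not embed the base64-encoded file contents in the output",
    )
    return parser


def include_contents(args: argparse.Namespace) -> bool:
    # Either option turns the contents off; the default keeps them.
    return not (args.no_contents or args.format == "json-nob64")
```

With no arguments, `include_contents` returns `True`, so existing invocations would behave as they do today.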

That raises a second question, which is how to represent the skipped data:

  • Change the schema of the JSON output and remove the b64contents key entirely (this is probably a bad idea...)
  • Just set the b64contents key to an empty string (or even None, i.e. JSON null)
  • Set the b64contents key to some string actually encoded in base64 ... say base64("null")...

Ultimately, the idea is to avoid introducing a breaking change to the default behavior; either a new output format or a --no-contents flag preserves existing functionality. As for the key itself, it is arguable which is better or worse: removing the b64contents key, replacing its value with None/null, or setting it to a short base64-encoded string of "null".
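For comparison, here is a rough sketch of what each representation would look like in the serialized output. The `render` function, its `mode` names, and the `length` field are hypothetical and only stand in for the real SBuD structure; the one key taken from the actual output is `b64contents`.

```python
import base64
import json
from typing import Any, Dict


def render(data: bytes, mode: str = "full") -> Dict[str, Any]:
    # "length" is a made-up stand-in for the rest of the document.
    doc: Dict[str, Any] = {"length": len(data)}
    if mode == "full":
        # Current behavior: embed the whole file as base64 text.
        doc["b64contents"] = base64.b64encode(data).decode("ascii")
    elif mode == "omit":
        # Schema change: the key is simply absent.
        pass
    elif mode == "null":
        # Key is kept; json.dumps turns None into null.
        doc["b64contents"] = None
    elif mode == "empty":
        doc["b64contents"] = ""
    elif mode == "placeholder":
        # Still valid base64, but decodes to the literal bytes b"null".
        doc["b64contents"] = base64.b64encode(b"null").decode("ascii")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return doc


if __name__ == "__main__":
    for mode in ("full", "omit", "null", "empty", "placeholder"):
        print(mode, json.dumps(render(b"hello", mode)))
```

One thing the placeholder variant makes visible: a consumer that blindly decodes the value gets four bytes of "file contents" instead of an error, which may or may not be preferable to a null.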

In my experience, at least in the Python world, developers often don't check whether a key exists in a dict, and often don't use the dict.get() method, which gracefully handles a missing key (unlike mydict['noKey'], which raises KeyError). I suppose the concern is somewhat moot, since the default behavior won't change.
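A short illustration of the difference, from the point of view of a script that consumes the JSON output. `decode_contents` is a hypothetical consumer-side helper, not part of polyfile.

```python
import base64
from typing import Any, Dict, Optional


def decode_contents(record: Dict[str, Any]) -> Optional[bytes]:
    # dict.get returns None for a missing key instead of raising KeyError,
    # so an absent key, a null value, and an empty string all end up here.
    value = record.get("b64contents")
    if not value:
        return None
    return base64.b64decode(value)


def decode_contents_fragile(record: Dict[str, Any]) -> bytes:
    # Raises KeyError if the key was removed from the schema,
    # and TypeError if the value is None.
    return base64.b64decode(record["b64contents"])
```

Code written like the second function is what would break if the key were dropped or nulled, which is the argument for making the change opt-in.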

With either option, it seems prudent to add an optional parameter to the polyfile.Analyzer.sbud method (see below) that skips encoding the data to base64; there doesn't appear to be a reason to waste CPU cycles (and memory) on the conversion if the result will be stripped from the output.

def sbud(self, matches: Optional[Iterable[Match]] = None) -> Dict[str, Any]:

b64contents = base64.b64encode(data)
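As a rough sketch of the idea (not the actual polyfile implementation), the parameter could default to the current behavior and skip the encoding work entirely when turned off. The standalone function `sbud_like`, the `include_contents` parameter name, the `length` field, and the choice of `None` as the placeholder value are all assumptions here.

```python
import base64
from typing import Any, Dict


def sbud_like(data: bytes, include_contents: bool = True) -> Dict[str, Any]:
    # "length" stands in for the rest of the real document structure.
    doc: Dict[str, Any] = {"length": len(data)}
    if include_contents:
        # Only pay for the encoding (and the extra copy of the data in
        # memory) when the result will actually be emitted.
        doc["b64contents"] = base64.b64encode(data).decode("ascii")
    else:
        # Placeholder choice is open; None serializes to JSON null.
        doc["b64contents"] = None
    return doc
```

Because `include_contents` defaults to `True`, existing callers that pass no argument would see no change.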
