Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review MFTINDX.py with mypy --strict #67

Conversation

ajnelson-nist
Copy link
Contributor

This patch series has three primary objectives:

  1. Begin mypy --strict review of ${top_srcdir}/indxparse, identifying an exemplar script.
  2. Have MFTINDX.py pass mypy --strict.
  3. Remove the options (argparse.Namespace) argument-passing style from MFTINDX.py.

Some bugs were fixed as type review was occurring. For instance, some variables that were reused in longer functions, with different types at different points, instigated variable renames, and highlighted an old copy-paste error obscured by a variable name persisting after a loop. Also, some code branches were updated to handle returned Nones from functions not previously highlighted as returning some Optional value.

Each patch's log message describes its purpose.

Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

The line `from indxparse.MFT import *` obscured symbol definitions from
`mypy`, and is also flagged as a code style issue by `flake8`.  This
patch makes the symbol imports explicit.

In addition to the unit tests of `make check`, this patch is written to
pass `flake8` review when tested with the following:

```bash
flake8 --ignore=E501,W503 indxparse/MFTINDX.py
```

(`flake8` is intentionally not proposed for addition to CI yet due to
needing to address other issues.)

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <[email protected]>
This was found while reviewing variable name re-use.

Signed-off-by: Alex Nelson <[email protected]>
`mypy` assigns a type to a symbol on its first encounter within a
function body.  When the symbol is reused later, even if for a whole new
purpose, the original name assignment is preserved.  If the type is
incongruous, a type error is raised.

Reviewing `MFTINDX.py` with `mypy --strict` showed several instances of
variable re-use (including in `for` loops), with reassignment to
different types.  This patch renames symbols to avoid ambiguity.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <[email protected]>
Since `MFTRecord.standard_information()` could fail to find the SI
Attribute, a block of logic that assumed the SI would be found is
revised in this patch to handle the not-found case.

Signed-off-by: Alex Nelson <[email protected]>
Since `MFTRecord.filename_information()` could fail to find a FN
Attribute, a line of output that assumed the FN would be found is
revised in this patch to handle the not-found case.

Signed-off-by: Alex Nelson <[email protected]>
Type-review flagged that no method `Runlist.entries` exists.

Signed-off-by: Alex Nelson <[email protected]>
One of the methods flagged as needing type annotations by
`mypy --strict indxparse/MFTINDX.py` is `NTFSFile.mft_get_record_buf`.
This method is written in a way that, for one unfamiliar with calling
context, could return a potentially null result.  Review of the code
paths to this function show that, at present, an `array.array` will be
returned.  But, `MFT.py` would require significant rearchitecting
(likely, subclassing `NTFSFile`) to guarantee that.

Rather than return `Optional[array]`, this patch adds a
`raise ValueError` in accordance with one usage in `MFTINDX.py`.
Handling `None` looked possible, but more complex, than a blunt fail,
due to its usage in the recursive path-constructing function
`mft_record_build_path`.

Some of the other type additions in this patch are side effects from
`mypy` flagging needs, and some are side effects from reviewing the call
paths to `mft_get_record_buf`.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <[email protected]>
These signatures were added to reduce type-issues reported by
`mypy --strict indxparse/MFTINDX.py`.  `MFT.py` typically only needed
return types on methods to satisfy this objective.  As a progress note:
some hundreds of issues remain if `mypy --strict` is run against
`MFT.py`.

One less-obvious signature revision happened: `mft_get_record_by_path`
now returns `Optional[MFTRecord]` instead of `Union[MFTRecord, bool]`,
to remain consistent with the control flow around its sole caller, in
`MFTINDX.py`.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

A follow-on patch will add further signatures in `MFTINDX.py`.

Signed-off-by: Alex Nelson <[email protected]>
With this patch, `MFTINDX.py` passes `mypy --strict`.  A follow-on patch
will add this status to the testing.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <[email protected]>
This is in partial satisfaction of adding mypy type checking, noted on
Issue 38.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

References:
* williballenthin#38

Signed-off-by: Alex Nelson <[email protected]>
This is a continuation of the coding style trialed in Pull Request 62.

References:
* williballenthin#62

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist ajnelson-nist marked this pull request as ready for review August 15, 2023 00:57
Copy link
Owner

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tons of good changes here. thanks for your patience and attention to detail as you modernize this old code!


self.declare_field(
"qword", "child_vcn", align(self.current_field_offset(), 0x8)
)

def filename_information(self):
def filename_information(self) -> FilenameAttribute:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type hints are nice additions. thanks!

@@ -1159,14 +1172,14 @@ class MFT_RECORD_FLAGS:
MFT_RECORD_IS_DIRECTORY = 0x2


def MREF(mft_reference):
def MREF(mft_reference) -> int:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def MREF(mft_reference) -> int:
def MREF(mft_reference: int) -> int:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this suggestion, by the way. I recall that mypy --strict used to require that arguments in the information flow path would also be required to be annotated, but it looks like the review strictness now only reaches return types when checking imported modules. I purposefully left this here to see if it was a quirk of my environment, but, nope.

"""
Given a MREF/mft_reference, return the record number part.
"""
return mft_reference & 0xFFFFFFFFFFFF


def MSEQNO(mft_reference):
def MSEQNO(mft_reference) -> int:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def MSEQNO(mft_reference) -> int:
def MSEQNO(mft_reference: int) -> int:

@@ -141,14 +171,21 @@ def record_bodyfile(ntfsfile, record, inode=None, attributes=None):
return ret


def node_header_bodyfile(options, node_header, basepath):
def node_header_bodyfile(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better, thank you

offset: int,
prefix: str,
progress: bool,
slack: bool,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having this many parameters is rather smelly, but it was equivalent and simply obfuscated by options before.

it's interesting to reread the code i wrote a decade ago and see what i agree and disagree with today.

@williballenthin williballenthin merged commit 6752221 into williballenthin:master Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants