Exercise MFTINDX.py with test image and extracted MFT #83

Merged

Commits on Oct 18, 2023

  1. Replace array.array type annotations with MutableSequence[int]

    When running `MFTINDX.py` against the sample disk image in this
    repository, Python raised an error with this (trimmed) call stack:
    
    ```
      File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
      File "...[snip].../indxparse/MFTINDX.py", line 284, in <module>
        buf: array.array[Any],
    TypeError: 'type' object is not subscriptable
    ```
    
    (Using Python 3.9.)
    
    This type signature was an attempt to satisfy a `mypy --strict` finding
    that `array.array` is generic and needs a type parameter, as shown by
    `mypy --strict indxparse/BinaryParser.py` (which currently also reports
    other, unrelated issues):
    
    ```
    error: Missing type parameters for generic type "array"  [type-arg]
    ```
    
    This specialization was not tested at runtime before it was committed,
    and it turns out to be incorrect: in Python 3.9, `array.array` cannot be
    subscripted at runtime.
    
    A StackOverflow post suggested replacing `array.array` in type
    signatures with `collections.abc.MutableSequence`, which is also generic
    but, from Python 3.9 on, can be subscripted at runtime.
    
    This patch replaces `array.array` (with and without further type
    specialization) in type signatures with `MutableSequence[int]`, because
    each element of a `bytes` object, and of most other bytes-like objects,
    is an integer.  This does not hold for every `array.array` (typecode
    `"d"`, for example, stores floats), but all usage of `array.array` in
    this code base uses typecode `"B"`, unsigned byte, so
    `MutableSequence[int]` is appropriate.  Note that `array.array` is still
    used to construct values; it is just no longer used in type signatures.
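
    A minimal sketch of this annotation style, assuming a hypothetical helper
    `zero_fill` that is not part of this code base.  It imports
    `MutableSequence` from `typing` rather than `collections.abc` so that the
    annotation can also be evaluated at runtime on Python 3.8:

    ```python
    from array import array
    from typing import MutableSequence


    def zero_fill(buf: MutableSequence[int], start: int, length: int) -> None:
        """Hypothetical helper: overwrite buf[start:start + length] with zeros in place."""
        for i in range(start, start + length):
            buf[i] = 0


    # Typecode "B" stores unsigned bytes, so every element is an int in 0..255,
    # which matches what MutableSequence[int] promises to the type checker.
    raw = array("B", b"\x01\x02\x03\x04")
    zero_fill(raw, 1, 2)
    print(list(raw))  # [1, 0, 0, 4]
    ```

    The same helper also accepts a `bytearray` or a plain `list` of ints,
    since both are mutable sequences of integers.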
    
    A side effect of switching away from `array.array` in type signatures is
    that `MutableSequence` does not satisfy the buffer protocol expected by
    `struct.unpack`, and `.tobytes()` is not guaranteed to be available
    either.  `mypy` reports these issues even without `--strict`.  The
    `bytes()` built-in is now used to convert every value that is only
    guaranteed to be a `MutableSequence[int]`.
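
    A sketch of the `bytes()` conversion described above, using a
    hypothetical stand-alone reader `read_u32_le` (not the project's
    `BinaryParser` API).  A `list` of ints is a valid
    `MutableSequence[int]` but does not support the buffer protocol, which is
    why the conversion is needed before calling `struct.unpack`:

    ```python
    import struct
    from array import array
    from typing import MutableSequence


    def read_u32_le(buf: MutableSequence[int], offset: int) -> int:
        """Hypothetical helper: read a little-endian unsigned 32-bit integer.

        MutableSequence[int] does not promise the buffer protocol, so the
        4-byte slice is copied into a bytes object before unpacking.
        """
        chunk = bytes(buf[offset:offset + 4])
        (value,) = struct.unpack("<I", chunk)
        return value


    record = array("B", b"\x00\x78\x56\x34\x12")
    print(hex(read_u32_le(record, 1)))  # 0x12345678
    ```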
    
    The `Block.unpack_guid` method was modified to remove its use of `ord`
    with `map`: `ord` expects a length-1 string or bytes object, so it is not
    type-compatible with the integer elements of a `MutableSequence[int]`.
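
    The output format of `Block.unpack_guid` is not shown here, so the
    following hypothetical `format_guid` assumes the common mixed-endian GUID
    text layout (the one `uuid.UUID(bytes_le=...)` produces).  It only
    illustrates mapping a formatter directly over integer elements, with no
    `ord` call:

    ```python
    from typing import Sequence


    def format_guid(data: Sequence[int]) -> str:
        """Hypothetical sketch: render 16 raw GUID bytes as text.

        Elements are already ints (iterating bytes yields ints in Python 3), so
        "{:02x}".format is mapped directly; map(ord, data) would raise TypeError.
        The first three fields are little-endian, the last two big-endian.
        """
        if len(data) != 16:
            raise ValueError("a GUID is 16 bytes long")
        hexes = list(map("{:02x}".format, data))
        return "-".join(
            [
                "".join(reversed(hexes[0:4])),
                "".join(reversed(hexes[4:6])),
                "".join(reversed(hexes[6:8])),
                "".join(hexes[8:10]),
                "".join(hexes[10:16]),
            ]
        )


    print(format_guid(bytes(range(16))))  # 03020100-0504-0706-0809-0a0b0c0d0e0f
    ```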
    
    Disclaimer:
    Participation by NIST in the creation of the documentation of mentioned
    software is not intended to imply a recommendation or endorsement by the
    National Institute of Standards and Technology, nor is it intended to
    imply that any specific software is necessarily the best available for
    the purpose.
    
    References:
    * https://stackoverflow.com/a/67775675
    
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 18, 2023
    46e9421

Commits on Oct 19, 2023

  1. Fix usage of types.MethodType

    In Python 3, `types.MethodType` takes 2 arguments, per line 447 of the
    current `types.pyi` in `typeshed`.  I'm not sure when this changed from
    the 3 arguments the code passed to 2; stepping back through that line's
    history with `git blame` only reaches `337abed05a` (2015), and shows no
    point where the "name" parameter removed by this patch was expected.
    
    This patch removes the third argument, because passing it raises a
    runtime `TypeError`.
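
    A minimal sketch of the two-argument Python 3 form, binding a plain
    function to an instance.  `Record` and `describe` are hypothetical names
    used only for illustration:

    ```python
    import types


    class Record:
        """Hypothetical class used only to illustrate binding a function."""

        def __init__(self, size: int) -> None:
            self.size = size


    def describe(self: Record) -> str:
        return "record of %d bytes" % self.size


    rec = Record(1024)
    # Python 3 form: types.MethodType(function, instance), exactly two arguments.
    bound = types.MethodType(describe, rec)
    print(bound())  # record of 1024 bytes

    try:
        types.MethodType(describe, rec, "describe")  # a third argument is rejected
    except TypeError as exc:
        print(exc)
    ```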
    
    References:
    * https://github.com/python/typeshed/blob/21fcd8960f1dae5ec4563dd99860d0918efe5cff/stdlib/types.pyi#L447
    
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 19, 2023
    779f46c
  2. Initialize empty "B" arrays from empty bytestring, not empty character string
    
    Initializing a "B" `array` from an empty character string raises a
    runtime error; the stack trace ends with:
    
    ```
    TypeError: cannot use a str to initialize an array with typecode 'B'
    ```
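
    A short sketch of the working and failing initializers for a typecode
    `"B"` array; omitting the initializer is shown as a further alternative
    that also produces an empty array:

    ```python
    from array import array

    # An empty bytestring is a valid initializer for typecode "B" (unsigned byte).
    empty = array("B", b"")

    # Omitting the initializer also yields an empty "B" array.
    also_empty = array("B")

    # A str initializer is rejected for "B" arrays at runtime.
    try:
        array("B", "")
    except TypeError as exc:
        print(exc)
    ```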
    
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 19, 2023
    ebc3052
  3. Exercise MFTINDX.py

    This is added in partial satisfaction of Issue 41.
    
    A follow-on patch will regenerate Make-managed files.
    
    References:
    * williballenthin#41
    
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 19, 2023
    08794ea
  4. Regenerate Make-managed files

    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 19, 2023
    b263c97
  5. Fix MutableSequence specialization in Python 3.8

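    `collections.abc.MutableSequence` can only be subscripted at runtime from
    Python 3.9 on, so `MutableSequence[int]` fails in Python 3.8.  The
    `typing` module offers `typing.MutableSequence`, which supports
    subscripting in a type signature without confusing the runtime; however,
    it has been deprecated since Python 3.9.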
    
    This patch was tested in Python 3.8 and 3.12.
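
    A sketch of the runtime difference, assuming a hypothetical alias
    `ByteSeq` and helper `first_byte`.  The `collections.abc` subscript is
    guarded by a version check because it would raise `TypeError` on 3.8:

    ```python
    import collections.abc
    import sys
    import typing

    # typing.MutableSequence can be subscripted at runtime on Python 3.8 too.
    ByteSeq = typing.MutableSequence[int]

    # collections.abc.MutableSequence supports subscripting only from Python 3.9
    # (PEP 585); on 3.8 this line would raise TypeError, hence the guard.
    if sys.version_info >= (3, 9):
        ByteSeqAbc = collections.abc.MutableSequence[int]


    def first_byte(buf: ByteSeq) -> int:
        """Hypothetical helper using the alias as a parameter annotation."""
        return buf[0]
    ```

    Both spellings resolve to the same origin class, so type checkers treat
    them alike; the choice only matters for which Python versions can
    evaluate the annotation at runtime.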
    
    References:
    * https://docs.python.org/3.12/library/typing.html#typing.MutableSequence
    
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 19, 2023
    44e1a9e
  6. Use type signatures to designate some functions as non-mutative

    This patch reviews usage of `MutableSequence[int]`, and restricts
    functions that should not modify their inputs (e.g.
    `BinaryParser.read_dword`) to use a read-only argument signature instead
    of the read-write `MutableSequence[int]`.
    
    An initial suggestion of `typing.ByteString` and/or
    `collections.abc.Buffer` prompted a review of their definitions, to
    determine whether they are read-only or read-write types.  The result:
    read-write function parameters still use `MutableSequence[int]`, and
    read-only parameters use `bytes`.
    
    This decision rests on the facts that the `bytes` class is immutable,
    `bytearray` is mutable, and both suggested types, `ByteString` and
    `Buffer`, admit `bytearray`.  A prior review of `typeshed`'s
    documentation on "readable buffers" and "writable buffers" indicates
    there is currently no "read-only Buffer" generic type in Python.
    
    `collections.abc.Buffer` (new in Python 3.12) effectively covers `bytes`,
    `bytearray`, and `memoryview`: the `typing_extensions` backport registers
    those three types as its subclasses (source block cited in references).
    Because that set includes `bytearray`, a `Buffer` can be mutable.
    
    A `memoryview` can reference any object that supports the buffer
    protocol, including `bytes` (read-only) and `bytearray` (writable), so a
    `memoryview` may or may not be mutable.  (This mutability detail is
    documented in both 3.8 and 3.12; 3.8 is cited below.)
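
    A short sketch of that mutability difference, using the standard
    `memoryview.readonly` attribute:

    ```python
    # A memoryview over bytes is read-only; over a bytearray it is writable.
    backing = bytearray(b"\x01\x02")
    ro = memoryview(b"\x01\x02")
    rw = memoryview(backing)

    print(ro.readonly, rw.readonly)  # True False

    rw[0] = 0xFF  # writes through to the underlying bytearray
    print(backing)  # bytearray(b'\xff\x02')

    try:
        ro[0] = 0xFF
    except TypeError as exc:
        print("read-only:", exc)
    ```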
    
    `typing.ByteString` is defined as the same union as
    `collections.abc.Buffer`, so it can be mutable.
    
    Given all the above, this patch implements "read-only" buffer
    restrictions on function parameters with the `bytes` type.
    
    A consequence of using types to distinguish read-only from read-write
    parameters is that several call paths into the read-only functions start
    from read-write buffers.  This patch handles these by converting slices
    of the mutable buffers with `bytes()`, which satisfies the type
    restrictions while avoiding copying more of each buffer than necessary.
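
    A sketch of that call pattern, assuming a hypothetical read-only reader
    `read_u16_le` and a hypothetical read-write caller `patch_and_read`
    (neither is the project's actual API):

    ```python
    import struct
    from typing import MutableSequence


    def read_u16_le(buf: bytes, offset: int) -> int:
        """Hypothetical read-only helper: the bytes annotation signals no mutation."""
        (value,) = struct.unpack_from("<H", buf, offset)
        return value


    def patch_and_read(buf: MutableSequence[int], offset: int) -> int:
        """Hypothetical read-write caller that holds a mutable buffer."""
        buf[offset] = 0x34
        # Convert only the two bytes the reader needs, not the whole buffer.
        return read_u16_le(bytes(buf[offset:offset + 2]), 0)


    data = bytearray(b"\x00\x00\x00\x12")
    print(hex(patch_and_read(data, 2)))  # 0x1234
    ```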
    
    No effects were observed on Make-managed files.
    
    References:
    * https://docs.python.org/3/library/stdtypes.html#bytearray-objects
    * https://docs.python.org/3/library/stdtypes.html#bytes-objects
    * https://docs.python.org/3.8/library/stdtypes.html#memoryview
    * https://github.com/python/typeshed/blob/21fcd8960f1dae5ec4563dd99860d0918efe5cff/stdlib/_typeshed/__init__.pyi#L239-L247
    * https://github.com/python/typing_extensions/blob/04f98954ba63a5e8a09c12171be24785298276b6/src/typing_extensions.py#L2522-L2548
    
    Requested-by: Willi Ballenthin <[email protected]>
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 19, 2023
    Configuration menu
    fbe6592

Commits on Oct 25, 2023

  1. 2205d8e
  2. Remove bytes() calls that are redundant with type signatures

    No effects were observed on Make-managed files.
    
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 25, 2023
    df0d122
  3. Replace bytes() calls with typing.cast()

    This patch avoids potentially copying large buffers through `bytes()`,
    using `cast(bytes, _)` instead to satisfy type review; `typing.cast`
    returns its argument unchanged at runtime.

    The slicing added in `fbe65927` has been reverted, because it existed
    only to limit the data passed to the `bytes()` calls.
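
    A sketch contrasting `cast` with a copy, assuming the same kind of
    hypothetical read-only reader `read_u16_le` (not the project's API) and a
    1 MiB `bytearray` standing in for a large buffer:

    ```python
    import struct
    from typing import MutableSequence, cast


    def read_u16_le(buf: bytes, offset: int) -> int:
        """Hypothetical read-only helper, annotated with bytes."""
        (value,) = struct.unpack_from("<H", buf, offset)
        return value


    big = bytearray(1 << 20)  # 1 MiB mutable buffer
    big[10:12] = b"\x34\x12"
    buf: MutableSequence[int] = big

    # cast() only informs the type checker: at runtime it returns buf unchanged,
    # so unlike bytes(buf) it copies nothing.
    value = read_u16_le(cast(bytes, buf), 10)
    print(hex(value))  # 0x1234
    ```

    The trade-off is that `cast` performs no runtime check: it is only safe
    when the object really supports the buffer protocol, as `bytearray` and
    `array("B")` do.  A plain `list` passed this way would make `struct`
    raise `TypeError`.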
    
    No effects were observed on Make-managed files.
    
    References:
    * https://docs.python.org/3/library/typing.html#typing.cast
    
    Requested-by: Willi Ballenthin <[email protected]>
    Signed-off-by: Alex Nelson <[email protected]>
    ajnelson-nist committed Oct 25, 2023
    08a5e02