Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: GitHub action enforcing style requirements #1649

Open
dgmcdona opened this issue Feb 28, 2025 · 7 comments
Open

Feature: GitHub action enforcing style requirements #1649

dgmcdona opened this issue Feb 28, 2025 · 7 comments

Comments

@dgmcdona
Copy link
Contributor

@atcuno and I have been discussing ways that we can enforce certain style requirements for new plugins, especially in light of some cleanup that has been done recently regarding confusing variable names and the use of keyword arguments for certain methods. I believe that this can be done accurately using tree-sitter:

This issue is to keep track of the things that we want to enforce. So far we have the following:

  • Require calling the following methods with all-keyword arguments:
    • pslist.PsList.list_processes
    • hivelist.HiveList.list_hives
    • symbols.symbol_table_is_64bit
    • ssdt.SSDT.build_module_collection
  • Disallow importing classes by name into other modules (e.g. from volatility3.plugins.windows.pslist import PsList)
  • Disallow method arguments that don't make sense with their type, like symbol_table: str

I think this can save a lot of time and round-trips on PR review since these things frequently come up, and there are existing places within the codebase that go against this proposed style that we can fix up at the same time.

@ikelos
Copy link
Member

ikelos commented Feb 28, 2025

I'd say anything with more than three arguments should be called with keywords args (regardless of the function), which might be easier to locate?

I've heard of tree-sitter before but never used it myself. I'll be happy to add it as long as it's pretty easy to pick up (and we're confident we've got more than one person that can look after it).

I think the variable name/type matching will be tricky, but I'm happy to see what we can do in that regard. All of these should also be documented in the coding guide which still needs review (see #1579 ).

@atcuno
Copy link
Contributor

atcuno commented Feb 28, 2025

RE: importing classes by name, currently we are at:

volatility3/framework/symbols/linux/__init__.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/keyboard_notifiers.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/tty_check.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/tracing/tracepoints.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/tracing/ftrace.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/netfilter.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/kthreads.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/check_idt.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/windows/debugregisters.py:import volatility3.plugins.windows.pslist as pslist
volatility3/framework/plugins/windows/debugregisters.py:import volatility3.plugins.windows.threads as threads
volatility3/framework/plugins/windows/debugregisters.py:import volatility3.plugins.windows.pe_symbols as pe_symbols
volatility3/framework/plugins/windows/suspended_threads.py:import volatility3.plugins.windows.pslist as pslist
volatility3/framework/plugins/windows/suspended_threads.py:import volatility3.plugins.windows.threads as threads
volatility3/framework/plugins/windows/suspended_threads.py:import volatility3.plugins.windows.pe_symbols as pe_symbols
volatility3/framework/plugins/windows/amcache.py:from volatility3.framework.symbols.windows.extensions import registry as reg_extensions
volatility3/framework/plugins/windows/svcscan.py:from volatility3.framework.symbols.windows.extensions import services as services_types
volatility3/framework/plugins/windows/scheduled_tasks.py:from volatility3.framework.symbols.windows.extensions import registry as reg_extensions
volatility3/framework/layers/scanners/__init__.py:from volatility3.framework.layers.scanners import multiregexp as multiregexp
volatility3/framework/layers/resources.py:    from smb import SMBHandler as SMBHandler  # lgtm [py/unused-import]
volatility3/framework/configuration/__init__.py:from volatility3.framework.configuration import requirements as requirements

@atcuno
Copy link
Contributor

atcuno commented Feb 28, 2025

@ikelos are these okay? They are very new so I assume it is, but we will need to document it as an exception and have the github action skip them:

import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules

@ikelos
Copy link
Member

ikelos commented Feb 28, 2025

Well, just modules may conflict with other imports or variables, in which case the renaming is fine as long as it makes sense.

@ikelos
Copy link
Member

ikelos commented Feb 28, 2025

It is still a module being imported, rather than a class, so that should be fine?

@dgmcdona
Copy link
Contributor Author

dgmcdona commented Mar 3, 2025

There will likely be a little learning curve around tree-sitter for whoever else helps maintain that, especially writing queries, but I can add really thorough comments explaining those, and I'm also happy to write up a doc with a good workflow around creating/troubleshooting queries.

@dgmcdona
Copy link
Contributor Author

dgmcdona commented Mar 3, 2025

Here's a basic example of how this might work: let's say that we want to find all instances of calls to pslist.PsList.list_processes that have non-keyword arguments, and tag them in the PR.

If the call adheres to our style-guidelines, tree-sitter will produce a syntax tree for the method call node that looks like this:

(call ; [931, 16] - [936, 17]
  function: (attribute ; [931, 16] - [931, 44]
    object: (attribute ; [931, 16] - [931, 29]
      object: (identifier) ; [931, 16] - [931, 22]
      attribute: (identifier)) ; [931, 23] - [931, 29]
    attribute: (identifier)) ; [931, 30] - [931, 44]
  arguments: (argument_list ; [931, 44] - [936, 17]
    (keyword_argument ; [932, 20] - [932, 40]
      name: (identifier) ; [932, 20] - [932, 27]
      value: (attribute ; [932, 28] - [932, 40]
        object: (identifier) ; [932, 28] - [932, 32]
        attribute: (identifier))) ; [932, 33] - [932, 40]
    (keyword_argument ; [933, 20] - [933, 48]
      name: (identifier) ; [933, 20] - [933, 30]
      value: (attribute ; [933, 31] - [933, 48]
        object: (identifier) ; [933, 31] - [933, 37]
        attribute: (identifier))) ; [933, 38] - [933, 48]
    (keyword_argument ; [934, 20] - [934, 57]
      name: (identifier) ; [934, 20] - [934, 32]
      value: (attribute ; [934, 33] - [934, 57]
        object: (identifier) ; [934, 33] - [934, 39]
        attribute: (identifier))) ; [934, 40] - [934, 57]
    (keyword_argument ; [935, 20] - [935, 57]
      name: (identifier) ; [935, 20] - [935, 31]
      value: (attribute ; [935, 32] - [935, 57]
        object: (identifier) ; [935, 32] - [935, 36]
        attribute: (identifier)))))

Each node and child node is kept within a set of parentheses, and the semicolons followed by the [row, col] information in brackets are just comments indicating the text range of the node within the document. Tree-sitter gives us access to those values when we interact with nodes through the API, and this is what allows us to auto-tag those in the PR.

If we want to capture all nodes that are list_processes method calls, we can write this basic query:

(call
  function: (attribute) @method ( #eq? @method "pslist.PsList.list_processes" )
  arguments: (argument_list) @args
) @call_to_check

Then, in Python via the tree-sitter API, we would check each of the direct child nodes of the arguments field (an argument_list node) to ensure that they are all keyword_argument nodes:

import textwrap
import unittest

import tree_sitter
from tree_sitter_python import language as python_language

PYTHON_LANGUAGE = tree_sitter.Language(python_language())

PSLIST_QUERY = PYTHON_LANGUAGE.query(
    """
    (call
      function: (attribute) @method ( #eq? @method "pslist.PsList.list_processes" ) ; We only want to check these calls
      arguments: (argument_list) @args ; Capture the arguments node for easy inspection later
    ) @call_to_check
    """
)


def calls_are_valid(document_text: str) -> bool:
    parser = tree_sitter.Parser(PYTHON_LANGUAGE)

    parsed_tree = parser.parse(document_text.encode("utf-8"))

    captures = PSLIST_QUERY.captures(parsed_tree.root_node)

    for capture_name, nodes in captures.items():
        if capture_name == "args":
            for child in nodes[0].children:
                if not child.is_named:
                    continue

                if not child.type == "keyword_argument":
                    print(f"Argument at row {child.start_point.row}, column {child.start_point.column} is non-keyword")
                    return False

    return True


class TestValidation(unittest.TestCase):
    def test_valid_call(self):
        CALL = textwrap.dedent(
            """
            pslist.PsList.list_processes(
                context=self.context,
                layer_name=kernel.layer_name,
                symbol_table=kernel.symbol_table_name,
                filter_func=self._conhost_proc_filter,
            )
            """
        )

        assert calls_are_valid(CALL)

    def test_invalid_call(self):
        CALL = textwrap.dedent(
            """
            pslist.PsList.list_processes(
                self.context,
                layer_name=kernel.layer_name,
                symbol_table=kernel.symbol_table_name,
                filter_func=self._conhost_proc_filter,
            )
            """
        )

        assert not calls_are_valid(CALL)


if __name__ == "__main__":
    unittest.main()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants