-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: GitHub action enforcing style requirements #1649
Comments
I'd say anything with more than three arguments should be called with keywords args (regardless of the function), which might be easier to locate? I've heard of tree-sitter before but never used it myself. I'll be happy to add it as long as it's pretty easy to pick up (and we're confident we've got more than one person that can look after it). I think the variable name/type matching will be tricky, but I'm happy to see what we can do in that regard. All of these should also be documented in the coding guide which still needs review (see #1579 ). |
RE: importing classes by name, currently we are at:
|
@ikelos are these okay? They are very new so I assume it is, but we will need to document it as an exception and have the github action skip them:
|
Well, just modules may conflict with other imports or variables, in which case the renaming is fine as long as it makes sense. |
It is still a module being imported, rather than a class, so that should be fine? |
There will likely be a little learning curve around tree-sitter for whoever else helps maintain that, especially writing queries, but I can add really thorough comments explaining those, and I'm also happy to write up a doc with a good workflow around creating/troubleshooting queries. |
Here's a basic example of how this might work: let's say that we want to find all instances of calls to If the call adheres to our style-guidelines, tree-sitter will produce a syntax tree for the method call node that looks like this:
Each node and child node is kept within a set of parentheses, and the semicolons followed by the [row, col] information in brackets are just comments indicating the text range of the node within the document. Tree-sitter gives us access to those values when we interact with nodes through the API, and this is what allows us to auto-tag those in the PR. If we want to capture all nodes that are
Then, in Python via the tree-sitter API, we would check each of the direct child nodes of the import textwrap
import unittest
import tree_sitter
from tree_sitter_python import language as python_language
PYTHON_LANGUAGE = tree_sitter.Language(python_language())
PSLIST_QUERY = PYTHON_LANGUAGE.query(
"""
(call
function: (attribute) @method ( #eq? @method "pslist.PsList.list_processes" ) ; We only want to check these calls
arguments: (argument_list) @args ; Capture the arguments node for easy inspection later
) @call_to_check
"""
)
def calls_are_valid(document_text: str) -> bool:
parser = tree_sitter.Parser(PYTHON_LANGUAGE)
parsed_tree = parser.parse(document_text.encode("utf-8"))
captures = PSLIST_QUERY.captures(parsed_tree.root_node)
for capture_name, nodes in captures.items():
if capture_name == "args":
for child in nodes[0].children:
if not child.is_named:
continue
if not child.type == "keyword_argument":
print(f"Argument at row {child.start_point.row}, column {child.start_point.column} is non-keyword")
return False
return True
class TestValidation(unittest.TestCase):
def test_valid_call(self):
CALL = textwrap.dedent(
"""
pslist.PsList.list_processes(
context=self.context,
layer_name=kernel.layer_name,
symbol_table=kernel.symbol_table_name,
filter_func=self._conhost_proc_filter,
)
"""
)
assert calls_are_valid(CALL)
def test_invalid_call(self):
CALL = textwrap.dedent(
"""
pslist.PsList.list_processes(
self.context,
layer_name=kernel.layer_name,
symbol_table=kernel.symbol_table_name,
filter_func=self._conhost_proc_filter,
)
"""
)
assert not calls_are_valid(CALL)
if __name__ == "__main__":
unittest.main() |
@atcuno and I have been discussing ways that we can enforce certain style requirements for new plugins, especially in light of some cleanup that has been done recently regarding confusing variable names and the use of keyword arguments for certain methods. I believe that this can be done accurately using tree-sitter:
This issue is to keep track of the things that we want to enforce. So far we have the following:
pslist.PsList.list_processes
hivelist.HiveList.list_hives
symbols.symbol_table_is_64bit
ssdt.SSDT.build_module_collection
from volatility3.plugins.windows.pslist import PsList
)symbol_table: str
I think this can save a lot of time and round-trips on PR review since these things frequently come up, and there are existing places within the codebase that go against this proposed style that we can fix up at the same time.
The text was updated successfully, but these errors were encountered: