Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Node using dataclass #144

Merged
merged 4 commits into from
Feb 4, 2025
Merged

Conversation

Pennycook
Copy link
Contributor

@Pennycook Pennycook commented Dec 18, 2024

Related issues

Part of #36, similar to #143.

Proposed changes

  • Ensure that classes derived from DirectiveNode set tokens during initialization. The goal here was to simplify the process of converting things to data classes, because things can get complicated very quickly when the class hierarchy isn't obeyed.
  • Make some minor adjustments to some DirectiveNode tests, because there was some inconsistency there: some derived classes used tokens to store a subset of tokens that were deemed relevant (e.g., the expression following a #define) when they should have been storing all tokens that make up the directive (i.e, including # and the directive name).
  • Replace node-specific spelling() functions with a common function that prints all the tokens.
  • Remove the kind field from DirectiveNode and its derived classes, because it wasn't used for anything.
  • Remove the file_hash field from FileNode, because it was only used for the __repr__ string.
  • Convert Node, CodeNode, DirectiveNode, and all derived classes into data classes, removing __init__ and __repr__ appropriately.

I think the @dataclass design here is cleaner than what we had before, but it does highlight that CodeNode is a bit broken. Ideally we would derive the value for fields like num_lines from the tokens passed to the initialization of derived classes, but this doesn't work. Because we handle line continuations before lexing, directives that span multiple lines set the wrong line field for constituent tokens. We should aim to fix this as part of #102.

Note also that all the Node classes use @dataclass(eq=False). This is currently necessary because the associator code currently stores a mapping from nodes to platforms, which requires that Node classes are hashable.

@Pennycook Pennycook added the refactor Improvements to code structure label Dec 18, 2024
@Pennycook Pennycook added this to the 2.0.0 milestone Dec 18, 2024
If all derived subclasses store tokens, we can simplify the design and
eventually move things over to a @DataClass annotation.

Signed-off-by: John Pennycook <[email protected]>
Tests previously assumed that the tokens stored by a CodeNode may have been
modified (e.g., to strip "#define" and retain only the definition).

Following the changes in the previous commit, self.tokens stores the original
tokens (as produced by the lexer), and any modified tokens are stored in
dedicated node-specific variables.

Signed-off-by: John Pennycook <[email protected]>
Now that each DirectiveNode stores the original tokens, it is always possible
to recover the original spelling from these tokens.

This change should also lead to more accurate output, since previously the
spelling of "#ifdef X" may be computed as "#if defined(X)" due to incomplete
information.

Signed-off-by: John Pennycook <[email protected]>
Following the inheritance adjustments, converting preprocessor Nodes into data
classes becomes reasonably straightforward and allows for simpler __init__ and
__repr__ functionality.

The exception is CodeNode, because properties like "num_lines" are currently
set after initialization rather than during __init__.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook
Copy link
Contributor Author

Rebased because #41 added another argument to the CodeNode constructor, resulting in a conflict.

The failing test is unrelated to changes in this PR, and is fixed by #148.

@Pennycook Pennycook closed this Jan 16, 2025
@Pennycook Pennycook reopened this Jan 16, 2025
Copy link
Contributor

@laserkelvin laserkelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style comment

self.children = []
self.parent = None
children: list[Self] = field(default_factory=list, init=False)
parent: Self | None = field(default=None, init=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need field(...) if the default is None?

i.e. parent: Self | None = None should be suffice I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to use field(...) here (and #144 (comment)) is the init=False.

Without init=False, these end up exposed as parameters to the __init__() function. The way that @dataclass inheritance works, you end up with a parent parameter in the __init__() function of every derived class, and that prevents derived classes from defining any non-default parameters (because it's unclear how you would order them).

It's definitely a bit weird, but I convinced myself that expressing things this way highlights that all Node objects have these special members, but it's impossible to initialize them. The only way to modify these fields is to access them directly, after a Node has been created.

return _representation_string(self, attrs=["filename"])
filename: str
num_lines: int = field(default=0, init=False)
total_sloc: int = field(default=0, init=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in Node: unless I'm missing a reason for it, the only instances where you have to use field is for empty lists and dicts. If you're just going to have zero as the default, int = 0 should suffice.

I guess it just makes keeping a consistent style easier?

@Pennycook Pennycook merged commit f58df1f into intel:main Feb 4, 2025
5 of 6 checks passed
@Pennycook Pennycook deleted the node-dataclass branch February 4, 2025 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Improvements to code structure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants