-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Node using dataclass #144
Conversation
If all derived subclasses store tokens, we can simplify the design and eventually move things over to a @DataClass annotation. Signed-off-by: John Pennycook <[email protected]>
Tests previously assumed that the tokens stored by a CodeNode may have been modified (e.g., to strip "#define" and retain only the definition). Following the changes in the previous commit, self.tokens stores the original tokens (as produced by the lexer), and any modified tokens are stored in dedicated node-specific variables. Signed-off-by: John Pennycook <[email protected]>
Now that each DirectiveNode stores the original tokens, it is always possible to recover the original spelling from these tokens. This change should also lead to more accurate output, since previously the spelling of "#ifdef X" may be computed as "#if defined(X)" due to incomplete information. Signed-off-by: John Pennycook <[email protected]>
Following the inheritance adjustments, converting preprocessor Nodes into data classes becomes reasonably straightforward and allows for simpler __init__ and __repr__ functionality. The exception is CodeNode, because properties like "num_lines" are currently set after initialization rather than during __init__. Signed-off-by: John Pennycook <[email protected]>
0d73d4b
to
3ee5ab1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style comment
self.children = [] | ||
self.parent = None | ||
children: list[Self] = field(default_factory=list, init=False) | ||
parent: Self | None = field(default=None, init=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need field(...)
if the default is None?
i.e. parent: Self | None = None
should be suffice I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason to use field(...)
here (and #144 (comment)) is the init=False
.
Without init=False
, these end up exposed as parameters to the __init__()
function. The way that @dataclass
inheritance works, you end up with a parent
parameter in the __init__()
function of every derived class, and that prevents derived classes from defining any non-default parameters (because it's unclear how you would order them).
It's definitely a bit weird, but I convinced myself that expressing things this way highlights that all Node
objects have these special members, but it's impossible to initialize them. The only way to modify these fields is to access them directly, after a Node
has been created.
return _representation_string(self, attrs=["filename"]) | ||
filename: str | ||
num_lines: int = field(default=0, init=False) | ||
total_sloc: int = field(default=0, init=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in Node
: unless I'm missing a reason for it, the only instances where you have to use field
is for empty lists and dict
s. If you're just going to have zero as the default, int = 0
should suffice.
I guess it just makes keeping a consistent style easier?
Related issues
Part of #36, similar to #143.
Proposed changes
DirectiveNode
settokens
during initialization. The goal here was to simplify the process of converting things to data classes, because things can get complicated very quickly when the class hierarchy isn't obeyed.DirectiveNode
tests, because there was some inconsistency there: some derived classes usedtokens
to store a subset of tokens that were deemed relevant (e.g., the expression following a#define
) when they should have been storing all tokens that make up the directive (i.e, including#
and the directive name).spelling()
functions with a common function that prints all thetokens
.kind
field fromDirectiveNode
and its derived classes, because it wasn't used for anything.file_hash
field fromFileNode
, because it was only used for the__repr__
string.Node
,CodeNode
,DirectiveNode
, and all derived classes into data classes, removing__init__
and__repr__
appropriately.I think the
@dataclass
design here is cleaner than what we had before, but it does highlight thatCodeNode
is a bit broken. Ideally we would derive the value for fields likenum_lines
from the tokens passed to the initialization of derived classes, but this doesn't work. Because we handle line continuations before lexing, directives that span multiple lines set the wrongline
field for constituent tokens. We should aim to fix this as part of #102.Note also that all the
Node
classes use@dataclass(eq=False)
. This is currently necessary because the associator code currently stores a mapping from nodes to platforms, which requires thatNode
classes are hashable.