-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Renaming Internal Structs #1059
base: main
Are you sure you want to change the base?
Renaming Internal Structs #1059
Conversation
src/expr.rs
Outdated
// Define the new RawExpr struct and implement Debug trait | ||
#[derive(Debug, Clone)] | ||
pub struct RawExpr { | ||
pub expr: Expr, | ||
} | ||
|
||
// Implement conversion from RawExpr to Expr | ||
impl From<RawExpr> for Expr { | ||
fn from(raw_expr: RawExpr) -> Expr { | ||
raw_expr.expr | ||
} | ||
} | ||
|
||
/// A PyExpr that can be used on a DataFrame | ||
#[pyclass(name = "Expr", module = "datafusion.expr", subclass)] | ||
#[derive(Debug, Clone)] | ||
pub struct PyExpr { | ||
pub expr: Expr, | ||
pub raw_expr: RawExpr, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need another wrapper struct. Rather, I think you need to change the pyo3 class name to RawExpr
.
#[pyclass(name = "RawExpr", module = "datafusion.expr", subclass)]
This will probably require updates to the CI script that checks for all classes that require exporting.
You should also be able to test the debug error by going into any of the methods in functions.py
and remove the .expr
to get the inner object and see the error generated is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, got it. I'll revert this and proceed as you suggested. I'll start with changing RawExpr
and the corresponding CI tests and get that reviewed again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I am getting this error:
Traceback (most recent call last):
File "/home/user/datafusion-python/rename.py", line 3, in <module>
result = f.isnan(col("a"))
File "/home/user/datafusion-python/python/datafusion/functions.py", line 284, in isnan
return Expr(f.isnan(expr))
TypeError: argument 'num': 'Expr' object cannot be converted to 'RawExpr'
When I try to run:
from datafusion import col, functions as f
result = f.isnan(col("a"))
print(result)
27ef6d3
to
80611e6
Compare
I'll be renaming all the classes in Line 77 in 9d634de
I hope that's correct. The comment above it states this for the classes in |
You'll want to get the submodules as well. Also, this can be done piecemeal if doing the entire thing in one go is too large a PR. The CI failure is the one I expected. We will need to update the check for export coverage in the wrappers to have this specific naming convention. Ideally, we would even require the |
Sure. I'll fix that CI test right away. About breaking this into several PRs, I agree. There are small changes required in different places for each renaming. One single PR would make things a lot confusing in case some future dev tries to understand what all changes were made. |
python/datafusion/expr.py
Outdated
@@ -86,6 +86,7 @@ | |||
Partitioning = expr_internal.Partitioning | |||
Placeholder = expr_internal.Placeholder | |||
Projection = expr_internal.Projection | |||
RawExpr = expr_internal.RawExpr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want this in the wrapper. Instead we want to update the CI test to identify that RawExpr
is properly covered by the Expr
class in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something along these lines?
# Skip internal context and RawExpr (which is handled by Expr class)
if attr in ["_global_ctx", "RawExpr"]:
continue
# Check if RawExpr functionality is covered by Expr class
if attr == "RawExpr" and hasattr(wrapped_obj, "Expr"):
expr_class = getattr(wrapped_obj, "Expr")
assert hasattr(expr_class, "raw_expr"), "Expr class must provide raw_expr property"
continue
# Skip internal context and RawExpr (which is handled by Expr class) | ||
if attr in ["_global_ctx", "RawExpr"]: | ||
continue | ||
|
||
# Check if RawExpr functionality is covered by Expr class | ||
if attr == "RawExpr" and hasattr(wrapped_obj, "Expr"): | ||
expr_class = getattr(wrapped_obj, "Expr") | ||
assert hasattr(expr_class, "raw_expr"), "Expr class must provide raw_expr property" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of special casing it like this, we should be able to write a more general solution that checks all class names. For the internal representation they can begin with Raw
and be converted to their non-raw counterparts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am sorry. I thought that I could include that after renaming other classes. I am now simply extracting the base_class
name from the Raw
attr name via string slicing and then use it to check if the base_class
has such an attribute:
if attr in ["_global_ctx"]:
continue
# Check if Raw* classes have corresponding wrapper classes
if attr.startswith("Raw"):
base_class = attr[3:] # Remove "Raw" prefix
assert hasattr(wrapped_obj, base_class), f"Missing implementation for {attr}"
continue
Sorry for being this clumsy. Thanks a lot for your patience.
python/datafusion/expr.py
Outdated
@@ -102,6 +102,7 @@ | |||
|
|||
__all__ = [ | |||
"Expr", | |||
"RawExpr", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think removing this caused the test to fail. I'll add this again and run the tests locally again.
python/datafusion/expr.py
Outdated
"""This constructor should not be called by the end user.""" | ||
self.expr = expr | ||
|
||
@property | ||
def raw_expr(self) -> expr_internal.RawExpr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this.
@@ -36,7 +36,12 @@ def missing_exports(internal_obj, wrapped_obj) -> None: | |||
for attr in dir(internal_obj): | |||
if attr in ["_global_ctx"]: | |||
continue | |||
assert attr in dir(wrapped_obj) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you apply ruff it should correct these whitespace errors. I recommend turning on pre-commit in your local workspace to catch them before they get to CI.
@@ -36,7 +36,12 @@ def missing_exports(internal_obj, wrapped_obj) -> None: | |||
for attr in dir(internal_obj): | |||
if attr in ["_global_ctx"]: | |||
continue | |||
assert attr in dir(wrapped_obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Residual of a previous thing I was trying. Fixed.
It's looking good. As long as CI passes, I plan to merge. |
Which issue does this PR close?
Closes #853
Rationale for this change
Developers were getting confusing debugging errors such as "Cannot convert Expr to Expr". Hence, the internal struct
Expr
was renamed toRawExpr
for better understanding.What changes are included in this PR?
The internal struct
Expr
was renamed toRawExpr
.Are there any user-facing changes?
None. It only changes the internal structs.