Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renaming Internal Structs #1059

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Spaarsh
Copy link

@Spaarsh Spaarsh commented Mar 10, 2025

Which issue does this PR close?

Closes #853

Rationale for this change

Developers were getting confusing debugging errors such as "Cannot convert Expr to Expr". Hence, the internal struct Expr was renamed to RawExpr for better understanding.

What changes are included in this PR?

The internal struct Expr was renamed to RawExpr.

Are there any user-facing changes?

None. It only changes the internal structs.

@Spaarsh Spaarsh changed the title Renamed Expr to RawExpr Renaming Internal Structs Mar 10, 2025
@Spaarsh Spaarsh marked this pull request as draft March 10, 2025 19:54
src/expr.rs Outdated
Comment on lines 103 to 108
// Define the new RawExpr struct and implement Debug trait
#[derive(Debug, Clone)]
pub struct RawExpr {
pub expr: Expr,
}

// Implement conversion from RawExpr to Expr
impl From<RawExpr> for Expr {
fn from(raw_expr: RawExpr) -> Expr {
raw_expr.expr
}
}

/// A PyExpr that can be used on a DataFrame
#[pyclass(name = "Expr", module = "datafusion.expr", subclass)]
#[derive(Debug, Clone)]
pub struct PyExpr {
pub expr: Expr,
pub raw_expr: RawExpr,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need another wrapper struct. Rather, I think you need to change the pyo3 class name to RawExpr.

#[pyclass(name = "RawExpr", module = "datafusion.expr", subclass)]

This will probably require updates to the CI script that checks for all classes that require exporting.

You should also be able to test the debug error by going into any of the methods in functions.py and remove the .expr to get the inner object and see the error generated is updated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, got it. I'll revert this and proceed as you suggested. I'll start with changing RawExpr and the corresponding CI tests and get that reviewed again.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I am getting this error:

Traceback (most recent call last):
  File "/home/user/datafusion-python/rename.py", line 3, in <module>
    result = f.isnan(col("a"))
  File "/home/user/datafusion-python/python/datafusion/functions.py", line 284, in isnan
    return Expr(f.isnan(expr))
TypeError: argument 'num': 'Expr' object cannot be converted to 'RawExpr'

When I try to run:

from datafusion import col, functions as f

result = f.isnan(col("a"))
print(result)

@Spaarsh Spaarsh force-pushed the 853/enhancement/renaming-rust-exposed-structs branch from 27ef6d3 to 80611e6 Compare March 11, 2025 14:29
@Spaarsh
Copy link
Author

Spaarsh commented Mar 11, 2025

I'll be renaming all the classes in _internal here:

fn _internal(py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {

I hope that's correct. The comment above it states this for the classes in _internal: "Low-level DataFusion internal package.
The higher-level public API is defined in pure python files under the datafusion directory."

@timsaucer
Copy link
Contributor

I'll be renaming all the classes in _internal here:

You'll want to get the submodules as well.

Also, this can be done piecemeal if doing the entire thing in one go is too large a PR.

The CI failure is the one I expected. We will need to update the check for export coverage in the wrappers to have this specific naming convention. Ideally, we would even require the Raw prefix once all of the classes are complete. But to manage the work, it likely makes sense to first get in one PR with the RawExpr handled and CI working, then iterate through the remaining classes, then update the CI check to require the Raw prefix.

@Spaarsh
Copy link
Author

Spaarsh commented Mar 11, 2025

Sure. I'll fix that CI test right away.

About breaking this into several PRs, I agree. There are small changes required in different places for each renaming. One single PR would make things a lot confusing in case some future dev tries to understand what all changes were made.

@Spaarsh Spaarsh marked this pull request as ready for review March 11, 2025 19:10
@@ -86,6 +86,7 @@
Partitioning = expr_internal.Partitioning
Placeholder = expr_internal.Placeholder
Projection = expr_internal.Projection
RawExpr = expr_internal.RawExpr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this in the wrapper. Instead we want to update the CI test to identify that RawExpr is properly covered by the Expr class in this file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something along these lines?

# Skip internal context and RawExpr (which is handled by Expr class)
        if attr in ["_global_ctx", "RawExpr"]:
            continue
        
        # Check if RawExpr functionality is covered by Expr class
        if attr == "RawExpr" and hasattr(wrapped_obj, "Expr"):
            expr_class = getattr(wrapped_obj, "Expr")
            assert hasattr(expr_class, "raw_expr"), "Expr class must provide raw_expr property"
            continue

Comment on lines 37 to 44
# Skip internal context and RawExpr (which is handled by Expr class)
if attr in ["_global_ctx", "RawExpr"]:
continue

# Check if RawExpr functionality is covered by Expr class
if attr == "RawExpr" and hasattr(wrapped_obj, "Expr"):
expr_class = getattr(wrapped_obj, "Expr")
assert hasattr(expr_class, "raw_expr"), "Expr class must provide raw_expr property"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of special casing it like this, we should be able to write a more general solution that checks all class names. For the internal representation they can begin with Raw and be converted to their non-raw counterparts.

Copy link
Author

@Spaarsh Spaarsh Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am sorry. I thought that I could include that after renaming other classes. I am now simply extracting the base_class name from the Raw attr name via string slicing and then use it to check if the base_class has such an attribute:

if attr in ["_global_ctx"]:
   continue
        
# Check if Raw* classes have corresponding wrapper classes
if attr.startswith("Raw"):
   base_class = attr[3:]  # Remove "Raw" prefix
   assert hasattr(wrapped_obj, base_class), f"Missing implementation for {attr}"
   continue

Sorry for being this clumsy. Thanks a lot for your patience.

@@ -102,6 +102,7 @@

__all__ = [
"Expr",
"RawExpr",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To remove.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing this caused the test to fail. I'll add this again and run the tests locally again.

"""This constructor should not be called by the end user."""
self.expr = expr

@property
def raw_expr(self) -> expr_internal.RawExpr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this.

@@ -36,7 +36,12 @@ def missing_exports(internal_obj, wrapped_obj) -> None:
for attr in dir(internal_obj):
if attr in ["_global_ctx"]:
continue
assert attr in dir(wrapped_obj)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you apply ruff it should correct these whitespace errors. I recommend turning on pre-commit in your local workspace to catch them before they get to CI.

@@ -36,7 +36,12 @@ def missing_exports(internal_obj, wrapped_obj) -> None:
for attr in dir(internal_obj):
if attr in ["_global_ctx"]:
continue
assert attr in dir(wrapped_obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this assert?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Residual of a previous thing I was trying. Fixed.

@timsaucer
Copy link
Contributor

It's looking good. As long as CI passes, I plan to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change naming of rust exposed structs to ease debugging
2 participants