Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-50 using external decorators #192

Closed
wants to merge 4 commits into from
Closed

ISSUE-50 using external decorators #192

wants to merge 4 commits into from

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Jun 7, 2023

External decorators demo

This example is in response to this GitHub issue. However, it showcases more general mechanisms of how to parse "Hamilton modules" and dynamically alter source code.

Objective

Use an existing Python module that has no Python dependencies and decorated it with Hamilton functionalities without altering the source file.

Content

The script run.py will:

  1. load the module my_functions.py, remove decorators and hamilton imports and generate my_functions_no_hamilton.py
  2. load the module external_decorators.py, which defines decorated empty functions and map them onto my_functions_no_hamilton.py
  3. store the dynamically created newly_decorated.py or load it unto a temporary Python module (never needs file writing access)

Methods

  • By using the abstract syntax tree (how Python represents the text as grammar), the files can be edited programmatically and attributes transferred between them
  • The example also shows how to load text as a Python module without needing writing access

Notes and limitations

  • You will probably want to run your linter, debugger, etc. on a generated file
  • This approach makes debugging and version control harder because of multiple files being involved. On the other hand, if you really cannot change your original file, it can provide a more robust sync than using duplicating code to add Hamilton decorators
  • The Python importlib, ast, types, typing are areas of active development and may change in the future. Python version control is important if your system depends on these features.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@zilto
Copy link
Collaborator Author

zilto commented Jun 8, 2023

I found an interesting behavior. When the extract_columns decorator is used, I cannot import the external decorators file if there are missing type annotations.

For example, this causes an error:

@extract_columns("seasons_1", "seasons_2", "seasons_3", "seasons_4")
def seasons_encoded(): ...

While this doesn't (it works as intended):

@extract_columns("seasons_1", "seasons_2", "seasons_3", "seasons_4")
def seasons_encoded() -> pd.DataFrame: ...

In a way, it prevents accidentally loading external_decorators.py and trying to use its empty functions, but the error is not informative for that case.

On the negative, it prevents this usage pattern (which I believe would be ideal):

import my_functions_no_hamilton
import external_decorators   # <- will throw an error 

# pass ModuleType objects and read their source code
decorated_module = use_external_decorators(
        source_module=my_functions_no_hamilton,
        decorators_module=external_decorators,
        save_module=False,
    )

But this works:

# pass file path and read source code from file
decorated_module = use_external_decorators(
        source_module="./my_functions_no_hamilton.py",
        decorators_module="./external_decorators.py",
        save_module=True,
    )

The error trace:

Traceback (most recent call last):
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/expanders.py", line 588, in validate_return_type
    registry.get_column_type_from_df_type(output_type)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/registry.py", line 68, in get_column_type_from_df_type
    raise NotImplementedError(
NotImplementedError: Cannot get column type for [None]. Registered types are {'pandas': {'dataframe_type': <class 'pandas.core.frame.DataFrame'>, 'column_type': <class 'pandas.core.series.Series'>}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tjean/projects/hamilton/examples/external_decorators/run.py", line 6, in <module>
    import external_decorators
  File "/home/tjean/projects/hamilton/examples/external_decorators/external_decorators.py", line 27, in <module>
    def seasons_encoded(): ...
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/base.py", line 67, in replace__call__
    return call_fn(self, fn)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/base.py", line 109, in __call__
    self.validate(fn)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/expanders.py", line 602, in validate
    extract_columns.validate_return_type(fn)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/expanders.py", line 590, in validate_return_type
    raise base.InvalidDecoratorException(
hamilton.function_modifiers.base.InvalidDecoratorException: Error <function seasons_encoded at 0x7fc5b05283a0> does not output a type we know about. Is it a dataframe type we support?   

@elijahbenizzy
Copy link
Collaborator

I found an interesting behavior. When the extract_columns decorator is used, I cannot import the external decorators file if there are missing type annotations.

For example, this causes an error:

@extract_columns("seasons_1", "seasons_2", "seasons_3", "seasons_4")
def seasons_encoded(): ...

While this doesn't (it works as intended):

@extract_columns("seasons_1", "seasons_2", "seasons_3", "seasons_4")
def seasons_encoded() -> pd.DataFrame: ...

In a way, it prevents accidentally loading external_decorators.py and trying to use its empty functions, but the error is not informative for that case.

On the negative, it prevents this usage pattern (which I believe would be ideal):

import my_functions_no_hamilton
import external_decorators   # <- will throw an error 

# pass ModuleType objects and read their source code
decorated_module = use_external_decorators(
        source_module=my_functions_no_hamilton,
        decorators_module=external_decorators,
        save_module=False,
    )

But this works:

# pass file path and read source code from file
decorated_module = use_external_decorators(
        source_module="./my_functions_no_hamilton.py",
        decorators_module="./external_decorators.py",
        save_module=True,
    )

The error trace:

Traceback (most recent call last):
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/expanders.py", line 588, in validate_return_type
    registry.get_column_type_from_df_type(output_type)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/registry.py", line 68, in get_column_type_from_df_type
    raise NotImplementedError(
NotImplementedError: Cannot get column type for [None]. Registered types are {'pandas': {'dataframe_type': <class 'pandas.core.frame.DataFrame'>, 'column_type': <class 'pandas.core.series.Series'>}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tjean/projects/hamilton/examples/external_decorators/run.py", line 6, in <module>
    import external_decorators
  File "/home/tjean/projects/hamilton/examples/external_decorators/external_decorators.py", line 27, in <module>
    def seasons_encoded(): ...
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/base.py", line 67, in replace__call__
    return call_fn(self, fn)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/base.py", line 109, in __call__
    self.validate(fn)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/expanders.py", line 602, in validate
    extract_columns.validate_return_type(fn)
  File "/home/tjean/.pyenv/versions/3.10.9/lib/python3.10/site-packages/hamilton/function_modifiers/expanders.py", line 590, in validate_return_type
    raise base.InvalidDecoratorException(
hamilton.function_modifiers.base.InvalidDecoratorException: Error <function seasons_encoded at 0x7fc5b05283a0> does not output a type we know about. Is it a dataframe type we support?   

Oh interesting -- I think this is pretty natural/to be expected, given that the decorators, when called, perform some validation. Not sure we want to change that, so we have a few options:

  1. Change the API for this (this is part of the reason I was suggesting the := operator
  2. Require types to match

Happy to do (2) for now, especially to get this out. As this is a recipe, people can tweak it their own way.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some minor notes about justification/context.

@@ -0,0 +1,21 @@
# External decorators demo

This example is in response to this [GitHub issue](https://github.com/DAGWorks-Inc/hamilton/issues/50). However, it showcases more general mechanisms of how to parse "Hamilton modules" and dynamically alter source code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the overall goal/motivation first, then that it is in response to the issue. Good to have a "why".

Also worth calling it something other than "External decorators" -- E.G. "Reusing Hamilton Functions without depending on Hamilton" or something like that.

Use an existing Python module that has no Python dependencies and decorated it with Hamilton functionalities without altering the source file.

# Content
The script run.py will:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so we have two different approaches here, and they're a little confusing. Let's highlight the two use-cases separately, and make run.py do both of them distinctly.


module_uuid = str(uuid.uuid4())
module_object = ModuleType(module_uuid, reader.module_docstring)
sys.modules[module_uuid] = module_object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is funky -- mind leaving some comments?

return module_object


if __name__ == "__main__":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth using click to break this into multiple (2) comands that each to a use-case. Then they can take it in as sys.argv.

zilto and others added 3 commits June 12, 2023 14:58
fixed typo
Clarification on version control
@elijahbenizzy elijahbenizzy mentioned this pull request Jun 20, 2023
7 tasks
@zilto zilto closed this by deleting the head repository Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants