
fix: Better error for @pipe without step() #1218

Closed
zilto opened this issue Nov 5, 2024 · 3 comments · Fixed by #1221
Comments

@zilto
Collaborator

zilto commented Nov 5, 2024

This problem was shared by a user.

Problem

The issue is caused by this node: @pipe_output does contain step() calls, but the use of .when() (which interacts with the config) can mean that no step is actually applied (relates to #1213)

import pandas as pd
from hamilton.function_modifiers import pipe_output, step

def _foo(df: pd.DataFrame) -> pd.DataFrame: ...
def _bar(df: pd.DataFrame) -> pd.DataFrame: ...

@pipe_output(
    step(_foo).when(key="foo"),
    step(_bar).when(key="bar"),
)
def filtered_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    return ...

This leads to the following unhelpful error (trimmed). It raises IndexError: list index out of range because the list of steps is empty.

IndexError                                Traceback (most recent call last)
    354 builder = base_builder.copy()
--> 355 dr = builder.with_config(config).with_modules(cell_module).build()
    357 # determine final vars
 
File ..., in Builder.build(self)
-> 2153 return Driver(
   ...
   2161 )

File ...\hamilton\driver.py:466, in Driver.__init__(self, config, adapter, allow_module_overrides, _materializers, _graph_executor, _use_legacy_adapter, *modules)
    464     error = telemetry.sanitize_error(*sys.exc_info())
    465     logger.error(SLACK_ERROR_MESSAGE)
--> 466     raise e
    467 finally:
    468     # TODO -- update this to use the lifecycle methods
    469     self.capture_constructor_telemetry(error, modules, config, adapter)

File ..., in Driver.__init__(self, config, adapter, allow_module_overrides, _materializers, _graph_executor, _use_legacy_adapter, *modules)
    438 self.graph_modules = modules
    439 try:
--> 440     self.graph = graph.FunctionGraph.from_modules(
   ...
    445     )

File ...\hamilton\graph.py:726, in FunctionGraph.from_modules(config, adapter, allow_module_overrides, *modules)
    707 @staticmethod
    708 def from_modules(
    ...
    713 ):
   ...
--> 726     nodes = create_function_graph(
    727         *modules, config=config, adapter=adapter, allow_module_overrides=allow_module_overrides
    728     )

File ...\hamilton\graph.py:170, in create_function_graph(config, adapter, fg, allow_module_overrides, *modules)
    168 # create non-input nodes -- easier to just create this in one loop
    169 for _func_name, f in functions:
--> 170     for n in fm_base.resolve_nodes(f, config):
    171         if n.name in config:
    172             continue  # This makes sure we overwrite things if they're in the config...

File ...\hamilton\function_modifiers\base.py:833, in resolve_nodes(fn, config)
    831 except Exception as e:
    832     logger.exception(_resolve_nodes_error(fn))
--> 833     raise e

File ...\hamilton\function_modifiers\base.py:826, in resolve_nodes(fn, config)
    824 node_transformers = function_decorators[NodeTransformer.get_lifecycle_name()]
    825 for dag_modifier in node_transformers:
--> 826     nodes = dag_modifier.transform_dag(nodes, filter_config(config, dag_modifier), fn)
    827 function_decorators = function_decorators[NodeDecorator.get_lifecycle_name()]
    828 for node_decorator in function_decorators:

File ...\hamilton\function_modifiers\base.py:515, in NodeTransformer.transform_dag(self, nodes, config, fn)
    513 nodes_to_keep = self.compliment(nodes, nodes_to_transform)
    514 out = list(nodes_to_keep)
--> 515 out += self.transform_targets(nodes_to_transform, config, fn)
    516 return out

File ...\hamilton\function_modifiers\base.py:498, in NodeTransformer.transform_targets(self, targets, config, fn)
    496 out = []
    497 for node_to_transform in targets:
--> 498     out += list(self.transform_node(node_to_transform, config, fn))
    499 return out

File ...\hamilton\function_modifiers\macros.py:1317, in pipe_output.transform_node(self, node_, config, fn)
   1308 transforms = transforms + (step(__identity).named(fn.__name__),)
   1309 nodes, _ = chain_transforms(
   1310     target_arg=original_node.name,
   1311     transforms=transforms,
   (...)
   1314     fn=fn,
   1315 )
-> 1317 last_node = nodes[-1].copy_with(name=f"{node_.name}", typ=nodes[-2].type)
   1319 out = [original_node]
   1320 out.extend(nodes[:-1])

IndexError: list index out of range
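To see why the traceback bottoms out in an IndexError, here is a simplified, stdlib-only model of the step resolution (Step and chain are hypothetical names for illustration, not Hamilton's actual classes): when no .when() condition matches the config, the resolved list is empty and indexing nodes[-1] fails.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    condition_key: str  # the step is kept only when config["key"] equals this value

def chain(steps: list[Step], config: dict) -> list[str]:
    # .when() filters each step against the config; with no match, nothing survives
    return [s.name for s in steps if config.get("key") == s.condition_key]

steps = [Step("_foo", "foo"), Step("_bar", "bar")]
nodes = chain(steps, config={"key": "baz"})  # no .when() condition matches
print(nodes)  # []
try:
    nodes[-1]  # the failing access, analogous to macros.py line 1317
except IndexError as e:
    print(f"IndexError: {e}")
```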

Proposed solution

Have a helpful error message that says: "@pipe_output is improperly defined and received no step. If you're using step(...).when(), make sure that your config enables at least one step() in the pipe"
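One possible shape for the guard, sketched with the same simplified Step model rather than Hamilton's internals (the actual fix landed via the linked PR): raise the suggested message as soon as config resolution yields zero steps, instead of letting nodes[-1] fail later.

```python
from dataclasses import dataclass

@dataclass
class Step:
    fn_name: str
    when: dict  # conditions captured by step(...).when(**kwargs)

    def matches(self, config: dict) -> bool:
        return all(config.get(k) == v for k, v in self.when.items())

def resolve_steps(steps: list[Step], config: dict) -> list[Step]:
    resolved = [s for s in steps if s.matches(config)]
    if not resolved:
        # Fail fast with a helpful message instead of a later IndexError
        raise ValueError(
            "@pipe_output is improperly defined and received no step. "
            "If you're using step(...).when(), make sure that your config "
            "enables at least one step() in the pipe"
        )
    return resolved
```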

@jernejfrank
Copy link
Contributor

Do we want to raise an error here or would it be better to return only the original node? So, in case no config_when conditions are met, pipe_output gets skipped?

@zilto
Collaborator Author

zilto commented Nov 8, 2024

Do we want to raise an error here or would it be better to return only the original node? So, in case no config_when conditions are met, pipe_output gets skipped?

I like the idea of "passthrough" when using @pipe and no config is specified. It would work with both @pipe_input and @pipe_output. It would be valuable to manage train/test implementation skew, for example, by adding processing steps when doing inference.

The main questions we have to answer:

  • Is this behavior intuitive and clear to the user?
  • Are there risks for it to fail silently?
  • Is it hard to debug?

I think this feature passes these assessments. The decorated function is intuitively the "default" implementation with step() being additive, and the visualization represents that clearly.
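The passthrough alternative can be sketched the same way (again a hypothetical, simplified model, not Hamilton's implementation): steps whose .when() conditions fail are simply skipped, and when none match, the decorated function's output flows through unchanged as the "default" implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    fn: Callable[[Any], Any]
    when: dict  # conditions captured by step(...).when(**kwargs)

    def matches(self, config: dict) -> bool:
        return all(config.get(k) == v for k, v in self.when.items())

def apply_pipe_output(value: Any, steps: list[Step], config: dict) -> Any:
    # Steps whose conditions fail are skipped; with zero matches,
    # the original value passes through untouched.
    for s in steps:
        if s.matches(config):
            value = s.fn(value)
    return value

steps = [Step(lambda x: x + 1, {"key": "foo"}), Step(lambda x: x * 2, {"key": "bar"})]
print(apply_pipe_output(10, steps, {"key": "foo"}))  # 11
print(apply_pipe_output(10, steps, {}))              # 10 -- passthrough
```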

Side note: Currently, visualizing a @pipe will produce several nodes using the .with_step suffix, and changing configurations will modify how many .with_step nodes are present. Would it make sense to have an option to always display all of the step nodes and simply grey out the ones that were skipped for a given run?

@jernejfrank
Contributor

I like the idea of "passthrough" when using @pipe and no config is specified. It would work with both @pipe_input and @pipe_output. It would be valuable to manage train/test implementation skew, for example, by adding processing steps when doing inference.

The main questions we have to answer:

  • Is this behavior intuitive and clear to the user?

That should be the case since it is the same as config. But I'll put it on my TODO to add it to the docs / a new example.

  • Are there risks for it to fail silently?
  • Is it hard to debug?

Yeah, but that is on the user -- since you get to see the DAG in the UI, it should not be hard to debug / it comes with the usual dev pitfalls as config.

I think this feature passes these assessments. The decorated function is intuitively the "default" implementation with step() being additive, and the visualization represents that clearly.

Side note: Currently, visualizing a @pipe will produce several nodes using the .with_step suffix, and changing configurations will modify how many .with_step nodes are present. Would it make sense to have an option to always display all of the step nodes and simply grey out the ones that were skipped for a given run?

uuuu, I like that idea, but I think that's an upstream feature on the level of config, and potentially making this a toggle option for the UI / visualize API.

@skrawcz skrawcz linked a pull request Nov 12, 2024 that will close this issue