
fix: Better error for @pipe without step() #1218

Closed
zilto opened this issue Nov 5, 2024 · 3 comments · Fixed by #1221
Comments

@zilto
Collaborator

zilto commented Nov 5, 2024

This problem was shared by a user.

Problem

The issue is caused by this node: @pipe_output does contain step() calls, but the use of .when() (which interacts with the config) can mean that no step is actually applied (relates to #1213)

import pandas as pd
from hamilton.function_modifiers import pipe_output, step

def _foo(df: pd.DataFrame) -> pd.DataFrame: ...
def _bar(df: pd.DataFrame) -> pd.DataFrame: ...

@pipe_output(
    step(_foo).when(key="foo"),
    step(_bar).when(key="bar"),
)
def filtered_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    return ...

This leads to the following unhelpful error (trimmed). It raises IndexError: list index out of range because the list of steps is empty.

IndexError                                Traceback (most recent call last)
    354 builder = base_builder.copy()
--> 355 dr = builder.with_config(config).with_modules(cell_module).build()
    357 # determine final vars
 
File ..., in Builder.build(self)
-> 2153 return Driver(
   ...
   2161 )

File ...\hamilton\driver.py:466, in Driver.__init__(self, config, adapter, allow_module_overrides, _materializers, _graph_executor, _use_legacy_adapter, *modules)
    464     error = telemetry.sanitize_error(*sys.exc_info())
    465     logger.error(SLACK_ERROR_MESSAGE)
--> 466     raise e
    467 finally:
    468     # TODO -- update this to use the lifecycle methods
    469     self.capture_constructor_telemetry(error, modules, config, adapter)

File ..., in Driver.__init__(self, config, adapter, allow_module_overrides, _materializers, _graph_executor, _use_legacy_adapter, *modules)
    438 self.graph_modules = modules
    439 try:
--> 440     self.graph = graph.FunctionGraph.from_modules(
   ...
    445     )

File ...\hamilton\graph.py:726, in FunctionGraph.from_modules(config, adapter, allow_module_overrides, *modules)
    707 @staticmethod
    708 def from_modules(
    ...
    713 ):
   ...
--> 726     nodes = create_function_graph(
    727         *modules, config=config, adapter=adapter, allow_module_overrides=allow_module_overrides
    728     )

File ...\hamilton\graph.py:170, in create_function_graph(config, adapter, fg, allow_module_overrides, *modules)
    168 # create non-input nodes -- easier to just create this in one loop
    169 for _func_name, f in functions:
--> 170     for n in fm_base.resolve_nodes(f, config):
    171         if n.name in config:
    172             continue  # This makes sure we overwrite things if they're in the config...

File ...\hamilton\function_modifiers\base.py:833, in resolve_nodes(fn, config)
    831 except Exception as e:
    832     logger.exception(_resolve_nodes_error(fn))
--> 833     raise e

File ...\hamilton\function_modifiers\base.py:826, in resolve_nodes(fn, config)
    824 node_transformers = function_decorators[NodeTransformer.get_lifecycle_name()]
    825 for dag_modifier in node_transformers:
--> 826     nodes = dag_modifier.transform_dag(nodes, filter_config(config, dag_modifier), fn)
    827 function_decorators = function_decorators[NodeDecorator.get_lifecycle_name()]
    828 for node_decorator in function_decorators:

File ...\hamilton\function_modifiers\base.py:515, in NodeTransformer.transform_dag(self, nodes, config, fn)
    513 nodes_to_keep = self.compliment(nodes, nodes_to_transform)
    514 out = list(nodes_to_keep)
--> 515 out += self.transform_targets(nodes_to_transform, config, fn)
    516 return out

File ...\hamilton\function_modifiers\base.py:498, in NodeTransformer.transform_targets(self, targets, config, fn)
    496 out = []
    497 for node_to_transform in targets:
--> 498     out += list(self.transform_node(node_to_transform, config, fn))
    499 return out

File ...\hamilton\function_modifiers\macros.py:1317, in pipe_output.transform_node(self, node_, config, fn)
   1308 transforms = transforms + (step(__identity).named(fn.__name__),)
   1309 nodes, _ = chain_transforms(
   1310     target_arg=original_node.name,
   1311     transforms=transforms,
   (...)
   1314     fn=fn,
   1315 )
-> 1317 last_node = nodes[-1].copy_with(name=f"{node_.name}", typ=nodes[-2].type)
   1319 out = [original_node]
   1320 out.extend(nodes[:-1])

IndexError: list index out of range
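To see why the traceback bottoms out in an IndexError, here is a simplified, stdlib-only model of the step resolution (Step and chain are hypothetical names for illustration, not Hamilton's actual classes): when no .when() condition matches the config, the resolved list is empty and indexing nodes[-1] fails.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    condition_key: str  # the step is kept only when config["key"] equals this value

def chain(steps: list[Step], config: dict) -> list[str]:
    # .when() filters each step against the config; with no match, nothing survives
    return [s.name for s in steps if config.get("key") == s.condition_key]

steps = [Step("_foo", "foo"), Step("_bar", "bar")]
nodes = chain(steps, config={"key": "baz"})  # no .when() condition matches
print(nodes)  # []
try:
    nodes[-1]  # the failing access, analogous to macros.py line 1317
except IndexError as e:
    print(f"IndexError: {e}")
```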

Proposed solution

Have a helpful error message that says: "@pipe_output is improperly defined and received no step. If you're using step(...).when(), make sure that your config enables at least one step() in the pipe"
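One possible shape for the guard, sketched with the same simplified Step model rather than Hamilton's internals (the actual fix landed via the linked PR): raise the suggested message as soon as config resolution yields zero steps, instead of letting nodes[-1] fail later.

```python
from dataclasses import dataclass

@dataclass
class Step:
    fn_name: str
    when: dict  # conditions captured by step(...).when(**kwargs)

    def matches(self, config: dict) -> bool:
        return all(config.get(k) == v for k, v in self.when.items())

def resolve_steps(steps: list[Step], config: dict) -> list[Step]:
    resolved = [s for s in steps if s.matches(config)]
    if not resolved:
        # Fail fast with a helpful message instead of a later IndexError
        raise ValueError(
            "@pipe_output is improperly defined and received no step. "
            "If you're using step(...).when(), make sure that your config "
            "enables at least one step() in the pipe"
        )
    return resolved
```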

@jernejfrank
Copy link
Contributor

Do we want to raise an error here or would it be better to return only the original node? So, in case no config_when conditions are met, pipe_output gets skipped?

@zilto
Collaborator Author

zilto commented Nov 8, 2024

Do we want to raise an error here or would it be better to return only the original node? So, in case no config_when conditions are met, pipe_output gets skipped?

I like the idea of "passthrough" when using @pipe and no config is specified. It would work with both @pipe_input and @pipe_output. It would be valuable to manage train/test implementation skew, for example, by adding processing steps when doing inference.

The main questions we have to answer:

  • Is this behavior intuitive and clear to the user?
  • Are there risks for it to fail silently?
  • Is it hard to debug?

I think this feature passes these assessments. The decorated function is intuitively the "default" implementation with step() being additive, and the visualization represents that clearly.
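The passthrough alternative can be sketched the same way (again a hypothetical, simplified model, not Hamilton's implementation): steps whose .when() conditions fail are simply skipped, and when none match, the decorated function's output flows through unchanged as the "default" implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    fn: Callable[[Any], Any]
    when: dict  # conditions captured by step(...).when(**kwargs)

    def matches(self, config: dict) -> bool:
        return all(config.get(k) == v for k, v in self.when.items())

def apply_pipe_output(value: Any, steps: list[Step], config: dict) -> Any:
    # Steps whose conditions fail are skipped; with zero matches,
    # the original value passes through untouched.
    for s in steps:
        if s.matches(config):
            value = s.fn(value)
    return value

steps = [Step(lambda x: x + 1, {"key": "foo"}), Step(lambda x: x * 2, {"key": "bar"})]
print(apply_pipe_output(10, steps, {"key": "foo"}))  # 11
print(apply_pipe_output(10, steps, {}))              # 10 -- passthrough
```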

Side note: Currently, visualizing a @pipe will produce several nodes using the .with_step suffix, and changing configurations will modify how many .with_step nodes are present. Would it make sense to have an option to always display all of the step nodes and simply grey out the ones that were skipped for a given run?

@jernejfrank
Contributor

I like the idea of "passthrough" when using @pipe and no config is specified. It would work with both @pipe_input and @pipe_output. It would be valuable to manage train/test implementation skew, for example, by adding processing steps when doing inference.

The main questions we have to answer:

  • Is this behavior intuitive and clear to the user?

That should be the case since it is the same as config. But I'll put it on my TODO to add it to the docs / a new example.

  • Are there risks for it to fail silently?
  • Is it hard to debug?

Yeah, but that is on the user -- since you get to see the DAG in the UI, it should not be hard to debug / it comes with the usual dev pitfalls as config.

I think this feature passes these assessments. The decorated function is intuitively the "default" implementation with step() being additive, and the visualization represents that clearly.

Side note: Currently, visualizing a @pipe will produce several nodes using the .with_step suffix, and changing configurations will modify how many .with_step nodes are present. Would it make sense to have an option to always display all of the step nodes and simply grey out the ones that were skipped for a given run?

uuuu, I like that idea, but I think that's an upstream feature on the level of config, and potentially making this a toggle option for the UI / visualize API.

@skrawcz skrawcz linked a pull request Nov 12, 2024 that will close this issue