Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit lazy-field operation tasks #724

Open
tclose opened this issue Dec 5, 2023 · 8 comments
Open

Implicit lazy-field operation tasks #724

tclose opened this issue Dec 5, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@tclose
Copy link
Contributor

tclose commented Dec 5, 2023

What would you like changed/added and why?

When connecting a lazy-out field to a downstream node, there are often minor operations to be applied to the field value, e.g. output_prefix = wf.lzin.subject_name + "_" or json_file=wf.lzin.input_niftix.json_file (see #723). When learning how to write Pydra workflows, users often intuitively expect that such operations should work and instead of needing to create custom function task.

I would like Pydra in such cases to automatically create a function task to perform the operation and insert it into the workflow transparently to the user.

What would be the benefit? Does the change make something easier to use?

The changes would be more

  • intuitive for new users
  • convenient to write
  • readable
@tclose tclose added the enhancement New feature or request label Dec 5, 2023
@effigies
Copy link
Contributor

effigies commented Dec 5, 2023

I don't think we should create full-on tasks that can be cached. If these operations are non-trivial, people can write tasks.

If you want to do this, I would start with an explicit model before making it implicit. I think we could provide a similar lazy interface as a LazyField that is a partial application of a function on LazyFields and/or values.

class LazyValue(Protocol):
    ...

class LazyField(LazyValue):
    ...

class Partial(LazyValue):
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def apply(self):
        # Turn any LazyValues into values
        args, kwargs = resolve(self.args, self.kwargs)
        return self.func(*args, **kwargs)

    ...

The above examples would become:

import operator as op

output_prefix = Partial(op.add, wf.lzin.subject_name, "_")
json_file = Partial(op.getattr, wf.lzin.input_niftix, "json_file")

You could start adding to LazyField with something like:

class LazyField:
    def __add__(self, value):
        return Partial(op.add, self, value)

    def __getattr__(self, key):
        return Partial(op.getattr, self, key)

But where do you stop and why? Also, that __getattr__ makes me pretty nervous. And I can just see a linter-driven "cleanup" converting example one to output_prefix = f"{wf.lzin.subject_name}_" and nobody catching it because __str__ is always valid to call, and get output_prefix = "<LazyField at 0xABad1dea>_".

IMO explicit is better than implicit, here, and I think a clear model of these lightweight linking ops could be good. We could even have helpers like:

from functools import partial

Add = partial(Partial, op.add)
GetAttr = partial(Partial, op.getattr)

@tclose
Copy link
Contributor Author

tclose commented Dec 5, 2023

Yes, I was thinking that it wouldn't need to be a full blown task after I posted this. I like your Partial suggestion.

I was thinking to restrict __getattr__ calls to attributes/properties of the lazy-field type so they can be checked at construction time. Likewise we could only allow operators for field types that implement __add__, __mul__, ....

@tclose
Copy link
Contributor Author

tclose commented Dec 5, 2023

IMO explicit is better than implicit, here, and I think a clear model of these lightweight linking ops could be good. We could even have helpers like:

from functools import partial

Add = partial(Partial, op.add)
GetAttr = partial(Partial, op.getattr)

Not sure I get this, what is the explicit case in this example, using partial instead of an operator? One of the benefits as I see it is to align with user's expectations, and I have seen multiple new users expect to be able to do some basic operations on lazy fields

@tclose
Copy link
Contributor Author

tclose commented Dec 5, 2023

I'm not sure whether allowing __getattr__ on methods is a good idea or not. I can see a use case for some string manipulation methods, e.g.

wf.add(
    ATask(
        id=wf.lzin.a_field.replace('-', '_')
        ...

but agree that it will probably open a can of worms

@effigies
Copy link
Contributor

effigies commented Dec 5, 2023

GetAttr = partial(Partial, op.getattr)

Not sure I get this, what is the explicit case in this example, using partial instead of an operator? One of the benefits as I see it is to align with user's expectations, and I have seen multiple new users expect to be able to do some basic operations on lazy fields

This just shortens the boilerplate. If you're going to use this feature a lot, you might not want to use Partial(op.XYZ, ...) all the time.

output_prefix = Add(wf.lzin.subject_name, "_")
json_file = GetAttr(wf.lzin.input_niftix, "json_file")

Incidentally, I now think I'd prefer "Delayed" or something else besides "Partial", since we are waiting for arguments to be ready, not waiting for additional arguments.

I'm not sure whether allowing __getattr__ on methods is a good idea or not. I can see a use case for some string manipulation methods, e.g.

wf.add(
    ATask(
        id=wf.lzin.a_field.replace('-', '_')
        ...

I wonder if you could actually do this 100% by hacking __getattr__ and __call__, and let __add__ and replace come for free. You might end up with Delayed(Delayed(op.getattr, wf.lzin.a_field, replace), '-', '_').

but agree that it will probably open a can of worms

What if instead of doing it implicitly, we at least required people to acknowledge that they're not doing a normal expression: Magic(wf.lzin.a_field).replace('-', '_'). (Could do something like Connect or Delay.)

@tclose
Copy link
Contributor Author

tclose commented Dec 5, 2023

what about wf.lzin.a_field.delayed_value.replace('-', '_') ?

@effigies
Copy link
Contributor

effigies commented Dec 5, 2023

IMO that's harder to parse and understand where the magic is happening. If you'll accept extra syntax, I don't see why a weird attribute would be better than a weird function call.

@tclose
Copy link
Contributor Author

tclose commented Dec 6, 2023

Fair enough, would be happy with Delay I think.

This discussion has made me think that calling them "lazy" fields might be slight misnomer, as I would typically expect lazy fields to be populated when I access them, not at some specified later time. Not sure what I would have called them though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants