Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unserializable parameters value #2122

Merged
merged 21 commits into from
Oct 7, 2024
Merged

Conversation

ravi-kumar-pilla
Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla commented Oct 2, 2024

Description

Resolves #2010

Development notes

  • Added logic to serialize parameter_value in case of complex types
  • Updated tests

QA notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review October 2, 2024 22:06
@ravi-kumar-pilla ravi-kumar-pilla requested review from SajidAlamQB and noklam and removed request for astrojuanlu October 2, 2024 22:06
@noklam
Copy link
Contributor

noklam commented Oct 3, 2024

Add a note here: We discussed in private that this is not a new bug, but never the bug never surface before. We fixed an issue previously for non-primitive type, but this one is related to non-serialisation due to customer resolver, lambda etc

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good to me. Can you post a screenshot how does it looks like for the final result with this change? We should include this in the docs to inform user in case this happens.

@rashidakanchwala
Copy link
Contributor

rashidakanchwala commented Oct 3, 2024

Awesome, i also tried this with #1712 and kedro viz runs !

Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FE is a bit odd since we return a string but at least it doesn't break anymore. Maybe we can figure a better way in the future (Juanlu mentioned using getattr)

Screenshot 2024-10-03 at 14 01 12

Copy link
Contributor

@SajidAlamQB SajidAlamQB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you @ravi-kumar-pilla!

@ravi-kumar-pilla
Copy link
Contributor Author

The FE is a bit odd since we return a string but at least it doesn't break anymore. Maybe we can figure a better way in the future (Juanlu mentioned using getattr)

Yes, I was trying to avoid this and experimented different ways (like trying to iterate through the dict and only converting the dict value which is an object). So what I realized is, we can -

  1. Try to come up with all different cases (like dict, set, list, custom object etc) and have helper functions which try to serialize them before returning to FE, which might get tricky when the values are combinations of complex types. We might also ask the users to have a repr or str implemented in-case of custom objects.
  2. Try to use the jsonable_encoder provided by FASTAPI which tries its best to serialize the value and if not throws an exception.

Though this is not ideal, I could not think of any better way to handle all possible cases.

@astrojuanlu could you please share any info on how to use getattr in resolving this issue (since the name of the attribute can be anything, I am not sure how to use getattr for this scenario)

Thank you

@ravi-kumar-pilla
Copy link
Contributor Author

The fix looks good to me. Can you post a screenshot how does it looks like for the final result with this change? We should include this in the docs to inform user in case this happens.

For the example where parameters file is -

split_options:
  test_size: 0.2
  random_state: 3
  target: 'price'
  data_info: "${polars:Float64}"

Custom Resolver as below -

CONFIG_LOADER_ARGS = {
      "base_env": "base",
      "default_run_env": "base",
      "custom_resolvers": {
        "add": lambda *my_list: sum(my_list),
        "polars": lambda x: getattr(pl, x)
    }
}

UI looks like -

image

Regarding the docs, we don't have much info about the metadata panel apart from stats. I can add a section about the parameter value here - https://docs.kedro.org/projects/kedro-viz/en/stable/kedro-viz_visualisation.html . What do you think ? @noklam

Thank you

@noklam
Copy link
Contributor

noklam commented Oct 3, 2024

image

I get this instead - where are those __module__ coming from?

@ravi-kumar-pilla
Copy link
Contributor Author

I get this instead - where are those __module__ coming from?

image

This is how jsonable_encoder works when it encounters custom objects (like pl.Float64), it falls back on encoding the object's attributes (such as its class and module metadata), which is what you're seeing.

@noklam
Copy link
Contributor

noklam commented Oct 3, 2024

In this case I think fastapi make this harder to understand as it creates additional key that doesn't exist. What if we don't use fastapi at all? Can you also give an example that using fastapi actually make this better?

@ravi-kumar-pilla
Copy link
Contributor Author

ravi-kumar-pilla commented Oct 3, 2024

In this case I think fastapi make this harder to understand as it creates additional key that doesn't exist. What if we don't use fastapi at all? Can you also give an example that using fastapi actually make this better?

I would not say it makes better but it will handle all edge cases that we might miss. But we can also have a custom serializer function something like below which will not return any unexpected value. I was not sure if this handles all the cases.

def serialize_value(value: Any) -> Any:
    """Custom serialization function to handle different types of values,
    including primitives, lists, dictionaries, sets, tuples, and objects."""
    # Recursively serialize dictionaries
    if isinstance(value, dict):
        return {k: serialize_value(v) for k, v in value.items()}
    # Recursively serialize lists and sets
    elif isinstance(value, (list, set)):
        return [serialize_value(v) for v in value]
    # Recursively serialize tuples
    elif isinstance(value, tuple):
        return tuple(serialize_value(v) for v in value)
    # Handle primitives
    elif isinstance(value, (int, float, str, bool, type(None))):
        return value
    # Handle bytes
    elif isinstance(value, (bytes, bytearray)):
        # Convert to string
        return value.decode()
    elif hasattr(
        value, "__dict__"
    ):  # Handle custom objects by serializing their __dict__
        return serialize_value(value.__dict__)
    # Fallback: convert unknown types to a string
    return str(value)

FYI jsonable_encoder source

@noklam
Copy link
Contributor

noklam commented Oct 3, 2024

I see, what if we just naively use str instead when it's non-serialisable, otherwise just return it as is?

self.kedro_obj.load()

Do I understand this correctly, originally we only have this part and this will pass to a pydantic model and eventually go through the fastapi json encoder?

@ravi-kumar-pilla
Copy link
Contributor Author

I see, what if we just naively use str instead when it's non-serialisable, otherwise just return it as is?

We can do that as well, but I thought since we have jsonable_encoder doing a better job than naively returning a string, opted for it. But may be this is creating a bit of confusion. The basic serialization function also seems to handle most of the cases. Do you think we can go with the function I commented ?

Do I understand this correctly, originally we only have this part and this will pass to a pydantic model and eventually go through the fastapi json encoder?

yes that is correct. So for example we have the below Response model and we allow parameters value to be Any. In case of objects, this throws error.

class TaskNodeAPIResponse(BaseGraphNodeAPIResponse):
    parameters: Dict
    ...

@noklam noklam self-requested a review October 7, 2024 14:09
@ravi-kumar-pilla ravi-kumar-pilla merged commit ed0e6aa into main Oct 7, 2024
40 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the fix/resolver-viz-compat branch October 7, 2024 21:42
@jitu5 jitu5 mentioned this pull request Nov 20, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to serialize objects returned by custom resolvers in parameters
4 participants