-
-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Use typing.Annotated for customization #571
Comments
I like this. I've found myself reaching for attrs metadata a lot to do particular customizations but this doesn't work for basic types like |
While interesting, I'm afraid this goes against the principles of |
There is a big difference in the proposed approach and the basics of The test for me is if you wanted to stop using In practice there is no way to avoid having specific annotations to your classes about how to customize serialization in special cases and contexts. How and where you write this behavior is only a question of style IMO. For instance do you have a central place where you register hooks for a specific type? Or do you have that at the location in which the type is defined? The way I do this currently is through attrs field metadata. By choosing this its really up to the converter to be able (and willing) to understand and honor that annotation. It in no way would effect the business logic code that works with the constructed object, like it potentially could in Pydantic. The alternative would be to have something like a large centralized lookup table for every class in your code base with the policy for it, in each converter. Maybe this works for your team, but for many its much better to have it all in one file alongside the definition. |
I understand that philosophy, and generally agree with it. I too try to avoid putting logic into my dataclasses where possible. However, having the ability to do so as an opt-in feature would be incredibly valuable in speeding up production, reducing code, and documenting field behavior in a both human- and machine-readable way. I'm not saying by any means we should make this the default, just that I think it could be immensely useful. For example, here's something I've used at work to deserialize structured logs. (Fields irrelevant to this example have been removed.) Currently147 lines from __future__ import annotations
import uuid
from collections.abc import Mapping
from typing import Any
from typing import Callable
from typing import Final
from typing import TypeVar
import attrs
import cattrs.gen
import cattrs.preconf.json as cattrs_json
@attrs.frozen(kw_only=True, slots=True, eq=False, order=False)
class LogRecord:
json: JSONSubRecord
kubernetes: KubernetesSubRecord
kubernetes_cluster: str
@attrs.frozen(kw_only=True, slots=True)
class JSONSubRecord:
function_name: str | None = None
log_level: str | None = None
line_number: int | None = None
extras: Mapping[str, Any] = attrs.field(factory=dict)
@attrs.frozen(kw_only=True, slots=True)
class KubernetesLabels:
extras: Mapping[str, str] = attrs.field(factory=dict)
@attrs.frozen(kw_only=True, slots=True)
class KubernetesSubRecord:
labels: KubernetesLabels = attrs.field(factory=KubernetesLabels)
T = TypeVar("T")
type Unstructurer = Callable[[object], dict[str, Any]]
type Structurer = Callable[[Mapping[str, Any], ...], Any]
type UnstructurerFactory = Callable[[type], Unstructurer]
type StructurerFactory = Callable[[type], Structurer]
def _make_structurer_factory(**unstructure_fn_kwargs: object) -> StructurerFactory:
"""Create a structurer factory that moves unrecognized keys into an "extras" dict
on the model."""
def _factory(cls: type[T]) -> Structurer:
# First we generate what cattrs would have used by default.
default_structure = cattrs.gen.make_dict_structure_fn(
cls,
_LOG_CONVERTER,
**unstructure_fn_kwargs,
)
# Get the set of attribute names on the dataclass. Anything that's not in this
# set will get put into the "extras" dict.
attribute_names = {a.name for a in attrs.fields(cls)}
def structure(
source_mapping: Mapping[str, Any], *_args: object, **_kwargs: object
) -> T:
copy = dict(source_mapping)
copy["extras"] = {}
# Move keys that aren't model attributes into the "extras" dict, which *is*
# an attribute.
for extra_key in source_mapping.keys() - attribute_names:
copy["extras"][extra_key] = copy.pop(extra_key)
# Now that all the unrecognized keys have been moved, we can go ahead with
# normal structuring.
return default_structure(copy, cls)
return structure
return _factory
def _make_unstructurer_factory(**unstructure_fn_kwargs: object) -> UnstructurerFactory:
"""Create an unstructurer factory that hoists all keys in "extras" into the output
dict, and removes "extras".
"""
def _factory(cls: type[T]) -> Unstructurer:
# First we generate what cattrs would have used by default.
default_unstructure = cattrs.gen.make_dict_unstructure_fn(
cls,
_LOG_CONVERTER,
**unstructure_fn_kwargs,
)
def unstructure(value: T) -> dict[str, Any]:
result_dict = default_unstructure(value)
extras = result_dict.pop("extras", {})
result_dict.update(extras)
return result_dict
return unstructure
return _factory
_LOG_CONVERTER: Final = cattrs_json.make_converter()
_LOG_CONVERTER.register_structure_hook(uuid.UUID, lambda value, _: uuid.UUID(value))
_LOG_CONVERTER.register_unstructure_hook(uuid.UUID, str)
_JSON_STRUCTURER_FACTORY_KWARGS = {
"function_name": cattrs.override(rename="funcName"),
"line_number": cattrs.override(rename="lineno"),
"log_level": cattrs.override(rename="levelname"),
}
_LOG_CONVERTER.register_structure_hook_factory(
lambda cls: issubclass(cls, JSONSubRecord),
_make_structurer_factory(**_JSON_STRUCTURER_FACTORY_KWARGS),
)
_LOG_CONVERTER.register_unstructure_hook_factory(
lambda cls: issubclass(cls, JSONSubRecord),
_make_unstructurer_factory(**_JSON_STRUCTURER_FACTORY_KWARGS),
)
_LOG_CONVERTER.register_structure_hook_factory(
lambda cls: issubclass(cls, KubernetesLabels), _make_structurer_factory()
)
_LOG_CONVERTER.register_unstructure_hook_factory(
lambda cls: issubclass(cls, KubernetesLabels), _make_unstructurer_factory()
)
_LOG_CONVERTER.register_structure_hook(
LogRecord,
cattrs.gen.make_dict_structure_fn(
LogRecord,
_LOG_CONVERTER,
kubernetes_cluster=cattrs.gen.override(rename="kubernetes.cluster"),
),
)
_LOG_CONVERTER.register_unstructure_hook(
LogRecord,
cattrs.gen.make_dict_unstructure_fn(
LogRecord,
_LOG_CONVERTER,
kubernetes_cluster=cattrs.gen.override(rename="kubernetes.cluster"),
),
) Using
|
@diego-oncoramedical The problem I see with your example is that there is no specialization for different converters. Should you always rename a field? Or only for JSON output? For instance I have multiple JSON converters for the user facing API and one for putting into a database. The database one probably doesn't need to have things renamed since there are no ergonomics for converting it. So I like the idea of using annotations, but they need to be specified as policy rather than mechanism, and it should be the converters that decide if they are in scope of the policy and need to perform it. In you example: Which converters should do renaming? That is not currently handled in your example. |
For what I need to do, JSON is the only permissible format, so I didn't bother separating out the logic that way.
Agh yeah I didn't think about that. That would certainly complicate things.
I've thought about this for maybe two seconds so take it with a pound of salt, but you just gave me an idea for how to do this in a similar declarative way: from __future__ import annotations
import cattrs.preconf.json as cattrs_json
from cattrs.policies import *
class LogRecordPolicy:
json: JSONSubRecordPolicy
kubernetes: KubernetesSubRecordPolicy
kubernetes_cluster = Rename("kubernetes.cluster")
class KubernetesSubRecordPolicy:
labels: KubernetesLabelsPolicy
class JSONSubRecordPolicy:
function_name = Rename("funcName")
log_level = Rename("levelname")
line_number = Rename("lineno")
extras = FieldPolicy(CatchAll, NoDump) # Needed for fields w/ multiple things
class KubernetesLabelsPolicy:
extras = FieldPolicy(CatchAll, NoDump)
_LOG_CONVERTER = cattrs_json.make_converter()
# Only the top level *needs* to be registered; policies are recursively extracted and
# matched by field name.
_LOG_CONVERTER.register_policy(LogRecord, LogRecordPolicy) I'm not a huge fan of the duplication, but to reduce the burden on the programmer fields that don't need special handling could be omitted and unrecognized fields ignored. |
This could probably be done as a separate library that generates factory functions for converters, instead of needing to be integrated into the main |
Not a fan of that approach :) I think annotations or attrs field metadata would be clearer and less boilerplate.
Absolutely. Even if it was in cattrs it would be an optional "strategy" (which is just a function that registers a set of hooks/factories). |
Yeah this is one of those things where you wake up the next day and are like "wait what".
@jonathanberthias what do you think? It does go against that note in the docs but the flexibility and separation of concerns advocated for does have a high price for data that is solely intended to be round-tripped. |
Hi. I'm on paternity leave so availability is limited. However, I just wanted to chime in that I would be a 100% on board with us shipping a strategy that enables complex I just don't know about the road map. I encourage you folks to play around with this on your own in the mean time, I'm happy to give guidance. |
I've implemented something like this based on the field overrides. def get_annotated_by(cls, annotation_type) -> Mapping[str, gen.AttributeOverride]: ...
class Person:
last_name: Annotated[str, gen.override(rename="surname")]
converter.register_structure_hook_factory(
attrs.has,
lambda cls: gen.make_dict_structure_fn(cls, converter, **get_annotated_by(cls, gen.AttributeOverride)),
) This works well if you use a single converter. In case of multiple converters for the same type, one could imagine something similar to this: foo_converter = Converter()
bar_converter = Converter()
class Person:
last_name: Annotated[str, foo_converter.override(rename="surname"), bar_converter.override("family_name")]
class Person:
last_name: Annotated[str, override(foo_converter, rename="surname"), override(bar_converter, "family_name")] However, I feel like the override definitions should be part of the converter if you use multiple converters. |
Huh that looks pretty cool actually. We could support this by default. I wouldn't do the per-converter version, but focus on the global one. |
Description
There's a decent amount of overhead in creating separate converters that needs to be done, which can get cumbersome quickly for a large number of classes. I was thinking it'd be possible to use
typing.Annotated
to reduce this work in a backwards-compatible way, without messing up type checking for those who need it.An example that could solve (#352 and #552 if I understand them correctly)
There are a few benefits to this:
attrs.field()
or similar.cattrs
that don't support it.Presumably this will be a non-trivial change, but I think it would be really valuable.
The text was updated successfully, but these errors were encountered: