-
-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Avro] Add logicalType
support for some java.time
types; add AvroJavaTimeModule
for native ser/deser
#283
Conversation
b29c981
to
1bd02ba
Compare
.../main/java/com/fasterxml/jackson/dataformat/avro/jsr310/deser/AvroLocalDateDeserializer.java
Outdated
Show resolved
Hide resolved
.../main/java/com/fasterxml/jackson/dataformat/avro/jsr310/deser/AvroLocalDateDeserializer.java
Outdated
Show resolved
Hide resolved
avro/src/main/java/com/fasterxml/jackson/dataformat/avro/jsr310/ser/AvroInstantSerializer.java
Show resolved
Hide resolved
Looks good; I added some suggestions. Couple of things I would need:
and with that I would be able to merge this PR to add neat new functionality in for 2.13! Also I have one question: do you think there is any likelihood that some usage might be broken due to addition of date/time type detection? If there was, adding a configurability option for |
… anything, expected pattern is to let exception pass through if any.
@cowtowncoder Thank you for your time. I am sorry for later response, I had a short time off.
|
|
First of all: absolutely no problem if things take time -- I take my time and this is all voluntary! Great to get contributions how much trouble it can be to get submission "just right" so I appreciate your work here. Additional PRs: I think one possibility is to just close existing one(s), push new? Could also be that you can push to your original forked branch (I actually do not know for sure). I do try to squash merges where applicable, but I am not the tidiest git committer myself. I did receive CLA, will check that in. |
59f82a3
to
ba76375
Compare
Looks good; hoping to review again today or tomorrow, get merged. |
avro/src/main/java/com/fasterxml/jackson/dataformat/avro/jsr310/AvroJavaTimeModule.java
Outdated
Show resolved
Hide resolved
avro/src/main/java/com/fasterxml/jackson/dataformat/avro/jsr310/AvroJavaTimeModule.java
Outdated
Show resolved
Hide resolved
avro/src/main/java/com/fasterxml/jackson/dataformat/avro/jsr310/AvroJavaTimeModule.java
Outdated
Show resolved
Hide resolved
avro/src/main/java/com/fasterxml/jackson/dataformat/avro/jsr310/AvroJavaTimeModule.java
Outdated
Show resolved
Hide resolved
avro/src/main/java/com/fasterxml/jackson/dataformat/avro/jsr310/AvroJavaTimeModule.java
Outdated
Show resolved
Hide resolved
.../main/java/com/fasterxml/jackson/dataformat/avro/jsr310/ser/AvroLocalDateTimeSerializer.java
Show resolved
Hide resolved
} | ||
|
||
@Test | ||
public void testSchemaCreation() throws JsonMappingException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One quick question here: since changes to schema visitor are applied regardless of module (I think?), would there be problems/changes to logic for existing users? That is, is there any possible backwards-compatibility concerns wrt creating more granular Avro schema types?
If there are, it might make sense to add configurability to allow disabling use of date/time types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will think about it.
Hmmm, is there a way how to control such switch from a module ?
I guess not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, looks quite good; some minor issues wrt module metadata (added notes on how to change) but nothing major.
One question I have however is wrt definition/use of Local date/time types: this is a perennial challenge as users are VERY confused about what "local" means, assuming it means "their timezone" (for example). My understanding is that these are rather "abstract" date/time/datetime values which do not have concrete time point without addition time zone or offset.
But to serialize such values numerically, one either needs to use more complex notation (separate fields) OR bind them to a concrete timezone/offset. In latter case it's common to just use UTC, and I think this is what is done here?
That is fine in itself, esp. if that's what typical Avro usage (with apache lib, frameworks) is.
I am only concerned about documentation aspects, explaining how it works.
So... thought it's good to be explicit on exact behavior, and trying to make things as straight-forward as possible (which is hard; this is Date/Time handling which as a domain is complicated).
Anyway: just wanted to ask about this before merging. It's always possible add improvements, changes (doc and otherwise) but thought I'll mention these before merging just in case.
This is correct and now I understand how mention of "local" might be confusing. Next I will think of a switch to enable/disable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caveat that I've worked a fair bit with Java time types, but very little with Avro.
avro/README.md
Outdated
Please note that time zone information is at serialization. Serialized values represent point in time, | ||
independent of a particular time zone or calendar. Upon reading a value back time instant is reconstructed but not the original time zone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of OffsetDateTime
and ZonedDateTime
, aren't they deserialized using whatever defaultZoneId
is passed to AvroInstantDeserializer.fromLong()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is correct.
They are deserialized using whatever `defaultZoneId1 is passed to AvroInstantDeserializer.fromLong()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer to say something like, "Note that time zone & offset information is not serialized—the serialized representation is only a point in time. For local time types (OffsetDateTime
and ZonedDateTime
) the time zone defaultZoneId
in the Avro context is used for converting to & from the serialized representation. The same defaultZoneId
must be used at serialization & deserialization time to obtain meaningful results."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having written that, the behavior feels weird to me. Would it be possible to store the offset/zoneId for the OffsetDateTime
and ZonedDateTime
types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, zoned things absolutely should retain time zone and/or offset, and changing that to something else feels very much Wrong.
For local variants it may be necessary to do interim binding if (but only if) representation uses a fixed timepoint (like long
for "milliseconds since 1. 1. 1970") -- but if so, it should be something really fixed like UTC and not whatever user happens to configure (because "default" zone id would otherwise have to match on writer and reader).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, zoned things absolutely should retain time zone and/or offset, and changing that to something else feels very much Wrong.
Avro specification does not aim to preserve time zone for non local-XYZ
logical types.
For correct deserialization into OffsetDateTime
or ZonedDateTime
reader and writer have to agree on time zone.
Support of OffsetDateTime
and ZonedDateTime
is here for convenience - I can drop it.
For local variants it may be necessary to do interim binding if (but only if) representation uses a fixed timepoint (like long for "milliseconds since 1. 1. 1970") -- but if so, it should be something really fixed like UTC and not whatever user happens to configure (because "default" zone id would otherwise have to match on writer and reader).
For local variants, contextual time zone is not used at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichalFoksa Ok, I think I better go over the changes once again & try to find what Avro specification says. Although handling of local/zoned types has expected semantics in Java 8, I vaguely recall Avro proscribing behvavior that seemed to differ... and I think for interoperability the letter of Avro spec should usually have precedence (even if I disagreed with how it was defined).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even after reading and re-reading what Avro spec says about timestamp, regular and local, I have no idea what are those supposed to mean -- it seems nonsense to be blunt.
Not the part about physical storage itself (although why on earth are there separate milli- vs micro-second types?) but ... well, if NEITHER stores timezone information NOR is there ANY WAY to sync send/receiver zones, then... there seems to be no actual reason for 2 types. At all. I mean, timestamp in this sense can NEITHER be local (it is concrete physical time offset) NOR non-local (no time offset or time zone!). It is much like java.util.Date
yet named in a confusing way, using 4 different but rather non-distinct types.
What a mess!
But to try to untangle the mess I guess there is only the one question of how would read and write operations handle these differently.
On writing side there probably cannot be any difference: physical timestamp is what it is. Whatever "local" timezone could be thought to be does not matter; change of zone/offset would not change that value.
On reading side timezone/offset is sort of arbitrary as well: value itself is concrete, although we can use whatever zone we might want.
I guess use of "context timezone" could make sense for "local" variant? For non-local I don't have strong opinion: either UTC or context timezone would be fine as far as I see it.
@MichalFoksa WDYT? Apologies for this taking long -- but I have some time now and will get this merged during this week :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess use of "context timezone" could make sense for "local" variant?
"local" variants do not contain time zone.
For non-local I don't have strong opinion: either UTC or context timezone would be fine as far as I see it
I would leave it on user.
Yeah, Avro is Avro ... :) But it is not bad.
...in/java/com/fasterxml/jackson/dataformat/avro/jsr310/deser/AvroJavaTimeDeserializerBase.java
Outdated
Show resolved
Hide resolved
.../main/java/com/fasterxml/jackson/dataformat/avro/jsr310/ser/AvroLocalDateTimeSerializer.java
Show resolved
Hide resolved
Co-authored-by: Drew Stephens <[email protected]>
Co-authored-by: Drew Stephens <[email protected]>
02ed97f
to
707f3a4
Compare
avro/README.md
Outdated
``` | ||
|
||
#### Note | ||
Please note that time zone information is at serialization. Serialized values represent point in time, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "is at serialization" instead be "is not included at serialization" (or something like that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right,
I meant "is lost at serialization"
@MichalFoksa I think I am mostly happy here; added minor notes on things to mention in README but I think I agree wrt ser/deser aspects given how Avro spec defines things (which is bit confusing wrt "local"/"non-local" timestamp but whatever). The one last thing I noticed was your mention of allowing disabling of generation of logical type, as an option.
|
db256e8
to
a05b4ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cowtowncoder
Here you are requested improvements.
Rather worry that sorry - I am really happy you review this and give feedback.
allowing disabling of generation of logical type, as an option
I want to do it. Doing it in another PR would be best.
avro/README.md
Outdated
``` | ||
|
||
#### Note | ||
Please note that time zone information is at serialization. Serialized values represent point in time, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right,
I meant "is lost at serialization"
avro/README.md
Outdated
Please note that time zone information is at serialization. Serialized values represent point in time, | ||
independent of a particular time zone or calendar. Upon reading a value back time instant is reconstructed but not the original time zone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess use of "context timezone" could make sense for "local" variant?
"local" variants do not contain time zone.
For non-local I don't have strong opinion: either UTC or context timezone would be fine as far as I see it
I would leave it on user.
Yeah, Avro is Avro ... :) But it is not bad.
java.time
types.
java.time
types.logicalType
support for some java.time
types; add AvroJavaTimeModule
for native ser/deser
Finally merged! Thank you @MichalFoksa and @dinomite -- will go in 2.13, although there can still be improvements. |
@cowtowncoder |
Functionality to support Avro schema with
logicaltype
for few java.time types. It is achieved by newDateTimeVisitor
, new moduleAvroJavaTimeModule
and couple of serializers and deserializers.AvroJavaTimeModule
works in milliseconds precision.Supported java types:
java.time.OffsetDateTime
java.time.ZonedDateTime
java.time.Instant
java.time.LocalDate
java.time.LocalTime
java.time.LocalDateTime
Risk
I think risk comes for backwards compatibility.
In
2.11
these Java types were somehow supported, they have got serialized either into array. If JSR310JavaTimeModule
with some combination ofWRITE_DATE_TIMESTAMPS_AS_NANOSECONDS
it coudd have been serialize also into string or long.Since
2.12
an exception is throw on serialization of any of these types on out of the box configuration,AvroMapper.builder().build()
. (I did not investigate further what might be idea behind this new behavior.)Exception after OffsetDateTime serialization in 2.12:
DateTimeVisitor
cannot be turned off. Anyone already using JSR310JavaTimeModule
and disablingWRITE_DATE_TIMESTAMPS_AS_NANOSECONDS
will suddenly receivelogicalType
instead ofjava-class
in Avro schema.java.time.OffsetDateTime schema before PR:
java.time.OffsetDateTime schema after PR:
It could be a problem for someone who already dealt with java.time types and found this workaround, but that somebody can adapt to in my opinion more Avro compatible solution.
Open points
Precision
Avro supports milliseconds and microseconds previsions for date and time related logicalType(s). I have implemented milliseconds precision only (based on my needs).
It is easy to change PR to microseconds prevision only. To support both previsions I think a new
Feature
switch have to be created - also not a problem.java.util.Date
I have decided to remove
java.util.Date
from supported types due to potential backwards compatibility issues. At the moment schema forjava.util.Date
looks like this:After it would look like:
I feel like it could cause issue to more people than issue with java.time types.