Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up large schemas for AvroGenerateSpecific #168

Closed
wants to merge 5 commits into from

Conversation

natefitzgerald
Copy link
Contributor

This fixes #167, which is about the schema$ val that is added to the companion object of the generated case classes for Specific record types.

The bummer is that the testing paradigm in place here uses example files and in order to test this you obviously need a super long schema and thus a super long example file corresponding to it. I considered reworking some of these tests to generate the schema file and then check that the files compile but that seemed overkill. The downside is that I've now added 28k lines of testing for very niche functionality.

@julianpeeters
Copy link
Owner

The downside is that I've now added 28k lines...

lmao, good stuff.

@julianpeeters
Copy link
Owner

julianpeeters commented May 2, 2023

Does your PR fix the issue for you? If so, I'm happy to publish to cover your use case.

I slapped biguser.avsc into sbt-avrohugger's scripted tests (here) and ran sbt scripted avrohugger/SpecificSerializationTests, but compilation failed: while the field limit is just a gotcha I think, the others might be relevant, but my timeframe for digging deeper is a little ways off.

...
BigUser.scala:6:25: Platform restriction: a parameter list's length cannot exceed 254.
...
[info] [error] Error while emitting example/BigUser
[info] [error] UTF8 string too large
[info] [error] Error while emitting example/BigUser$
[info] [error] UTF8 string too large
...

@natefitzgerald
Copy link
Contributor Author

That's my bad, I only checked that the companion object compiled fine. I should have remembered the parameter length limit, it's not actually a problem with our large schemas because they're nested. I can't actually bring one in here so I attempted to generate one. I'll figure it out.

@julianpeeters
Copy link
Owner

Good news, @LeonPoon brings fish: #182

biguser.avsc is now passing in sbt-avrohugger v2.6.0 scripted tests (here), we can reopen this issue as needed.

Huge thanks for getting this rolling.

@steve-e
Copy link

steve-e commented Dec 13, 2023

Hi @julianpeeters
My team has a client with 919 fields, ie more than the JVM limit for parameter lists of 254. I have put a sample avsc locally into
avrohugger/SpecificSerializationTests into sbt-avrohugger and get this error

[info] [error] /private/var/folders/gc/xjm7lrlj5cn9kwbz7d0mwpbh0000gn/T/sbt_8a9778d4/target/scala-2.12/src_managed/main/compiled_avro/example/MoreThan254Fields.scala:6:35: Platform restriction: a parameter list's length cannot exceed 254.
[info] [error] final case class MoreThan254Fields(var field1: Int, var field2: Int, var field3: Int, var field4: Int, var fiel

(By the way, biguser.avsc has 251 fields, so that is within the JVM limit)

The way my colleague worked around this was to hand edit the generated file to move most of the fields from the parameter list to be fields declared in the class. This works for our use case.

I can try and create a PR to handle this issue, by generating a class with only a default no arg constructor. This could be the only behaviour for parameter lists longer than 254.

But perhaps you have other ideas on how to handle this?

@julianpeeters
Copy link
Owner

Glad you found a workaround, @steve-e, and thanks for sharing it.

However, I must decline your kind offer, due to maintenance reasons.

@steve-e
Copy link

steve-e commented Dec 15, 2023

Hi @julianpeeters
Understood. When producing a fix for this issue I ran into another, and can foresee other problems too. We may fork this repo to handle our use case, as the oversized schema is regularly updated, so code needs to be regenerated on a regular basis.

You may be interested in the changes I have made so far, which I have put in a draft and closed PR to make it easier for you and others to view. This PR solves the issues we have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AvroGenerateSpecific can generate a schema$ val with length beyond JVM constant limit
3 participants