-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RecordReader
trait and proc macro to implement it for a struct
#4773
Conversation
@tustvold would it be possible to get a review on this? Apologies if I am pinging the wrong person. Please do let me know if there is anything further I need to do to get this ready for review |
Apologies, it is on my list to review, but I'm not very familiar with the non-arrow APIs so need longer to sit down and get the necessary context. Development has largely focused over the last couple of years on the columnar, arrow-based APIs as they have better performance characteristics, so I lack any immediate context 😅 |
No problem. Just wanted to make sure this PR hadn't been missed |
@tustvold Apologies for pinging you again. I was wondering if you could estimate a possible timeline for review of this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done an initial review, overall this looks very nice, I would still encourage people who care about performance to use the columnar APIs, but I can see the appeal/utility of a row-API.
Overall this looks well tested and seems to make sense. My only comment is that the logic that converts from columnar to the row representation I think could be made easier to follow and a bit better documented
@tustvold I've resolved most of your comments. Left two open:
|
Any idea why clippy is suddenly failing on code that (I think) has nothing to do with this PR? Can I ignore these issues or is it due to something I have changed? |
You probably need to merge the latest master to pick up fixes for the lints added in the latest Rust version |
@tustvold could you approve the workflows please? |
@tustvold workflows should pass now (apologies for fmt failing the previous time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me, I have to confess to not being sure that it will be possible to evolve this interface to support nested data, but perhaps that isn't a problem. We don't any row/record-oriented APIs that support nesting, so perhaps this isn't an issue
} | ||
Some(ThirdPartyType::ChronoNaiveDate) => { | ||
quote! { | ||
::chrono::naive::NaiveDate::from_num_days_from_ce_opt(vals[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this logic different to above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exactly what your question related to, but this logic for getting the NaiveDate
is just the inverse of above. However, there is no function from_signed_duration_since
that mirrors the signed_duration_since
function we are using above, so I had to use from_num_days_from_ce_opt
, which uses a different date to measure days from. That's why we need to add on the number of days between that other date and 01/01/1970.
Which issue does this PR close?
Closes #4772.
What changes are included in this PR?
Added a
parquet::record::RecordReader
trait similar to the existingparquet::record::RecordWriter
Added a
parquet_derive::ParquetRecordReader
derive macro to implement the newRecordReader
trait for slices of structs, similar to the existingparquet_derive::ParquetRecordWriter
macroAdded tests to
parquet_field.rs
forreader_snipped
similar to the existing tests forwriter_snipped
Added a test to
parqued_derive_test
to test writing a struct slice to a parquet file, reading it again, and then checking the output is the same as the original struct.Not all types supported by
ParquetRecordWriter
are supported byParquetRecordReader
yet. Specifically there is no support for options or references. References is difficult because usingRc
would mean we have to manually annotate types for the compiler which requires quite a lot of matching on theField
'sty
field. See: https://github.com/Joseph-Rance/arrow-rs/blob/f49472eb93c4bf27aa20d8040c57539fc73d1059/parquet_derive/src/parquet_field.rs#L452-L454Updated the readme to reflect the above point
Are there any user-facing changes?
parquet::record::RecordReader
trait and theparquet_derive::ParquetRecordReader
proc macro