There exists a ParquetRecordWriter proc macro in parquet_derive, but ParquetRecordReader is missing #4772

Closed
Joseph-Rance opened this issue Sep 4, 2023 · 2 comments · Fixed by #4773
Labels
enhancement: Any new improvement worthy of a entry in the changelog
parquet: Changes to the parquet crate

Comments

@Joseph-Rance
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Reading parquet files into slices of structs can take quite a lot of code. There exists a parquet_derive::ParquetRecordWriter derive macro to write from a slice of structs to a parquet file, but there is no equivalent parquet_derive::ParquetRecordReader macro.

Describe the solution you'd like

A parquet_derive::ParquetRecordReader macro that does the same as parquet_derive::ParquetRecordWriter but for reading.

There already exists a parquet::record::RecordWriter trait:

pub trait RecordWriter<T> {
    fn write_to_row_group<W: std::io::Write + Send>(
        &self,
        row_group_writer: &mut SerializedRowGroupWriter<W>,
    ) -> Result<(), ParquetError>;

    /// Generated schema
    fn schema(&self) -> Result<TypePtr, ParquetError>;
}

I would like there to be a similar parquet::record::RecordReader trait such as:

pub trait RecordReader<T> {
    fn read_from_row_group(
        &mut self,
        row_group_reader: &mut dyn RowGroupReader,
        max_records: usize,
    ) -> Result<(), ParquetError>;
}

There also exists a parquet_derive::ParquetRecordWriter proc macro to implement this trait for slices of structs:

#[proc_macro_derive(ParquetRecordWriter)]
pub fn parquet_record_writer(input: proc_macro::TokenStream) -> proc_macro::TokenStream {

It generates the trait implementation here:
impl #generics ::parquet::record::RecordWriter<#derived_for #generics> for &[#derived_for #generics] {

So I would again like there to be a similar parquet_derive::ParquetRecordReader that implements something like:

impl #generics ::parquet::record::RecordReader<#derived_for #generics> for &mut [#derived_for #generics] {

Describe alternatives you've considered

The alternative is for the user to write this code by hand. However, for large structs this quickly becomes a lot of code on top of the existing parquet library, so in practice a macro is necessary.
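To give a rough idea of what the hand-written alternative looks like, here is a minimal sketch for a two-field struct. It does not use the parquet crate: `MockRowGroup`, `Column`, `ReadError`, `Sample`, and `read_demo` are hypothetical stand-ins invented for illustration, and the trait is a simplified copy of the one proposed above, not an existing API. Every additional field needs its own lookup, type check, and assignment, which is the part a derive macro could generate.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for `ParquetError`.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    MissingColumn(&'static str),
    TypeMismatch(&'static str),
}

/// Hypothetical in-memory column (assumption: not how parquet stores data).
pub enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

/// Hypothetical stand-in for a row group: column name -> values.
pub struct MockRowGroup {
    pub columns: HashMap<&'static str, Column>,
}

/// Simplified version of the proposed trait, using the mock row group.
pub trait RecordReader<T> {
    fn read_from_row_group(
        &mut self,
        row_group: &MockRowGroup,
        max_records: usize,
    ) -> Result<(), ReadError>;
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct Sample {
    pub id: i64,
    pub name: String,
}

// Hand-written equivalent of what a derive macro could generate.
impl RecordReader<Sample> for &mut [Sample] {
    fn read_from_row_group(
        &mut self,
        row_group: &MockRowGroup,
        max_records: usize,
    ) -> Result<(), ReadError> {
        // One lookup and type check per struct field.
        let ids = match row_group.columns.get("id") {
            Some(Column::Int64(values)) => values,
            Some(_) => return Err(ReadError::TypeMismatch("id")),
            None => return Err(ReadError::MissingColumn("id")),
        };
        let names = match row_group.columns.get("name") {
            Some(Column::Utf8(values)) => values,
            Some(_) => return Err(ReadError::TypeMismatch("name")),
            None => return Err(ReadError::MissingColumn("name")),
        };
        // Fill at most `max_records` slots; stop at the shortest input.
        for ((slot, id), name) in self.iter_mut().zip(ids).zip(names).take(max_records) {
            slot.id = *id;
            slot.name = name.clone();
        }
        Ok(())
    }
}

/// Builds a three-row mock row group and reads up to `max_records` rows.
pub fn read_demo(max_records: usize) -> Result<Vec<Sample>, ReadError> {
    let mut columns = HashMap::new();
    columns.insert("id", Column::Int64(vec![1, 2, 3]));
    columns.insert(
        "name",
        Column::Utf8(vec!["a".to_string(), "b".to_string(), "c".to_string()]),
    );
    let row_group = MockRowGroup { columns };

    let mut rows = vec![Sample::default(); 3];
    let mut slice: &mut [Sample] = &mut rows;
    slice.read_from_row_group(&row_group, max_records)?;
    Ok(rows)
}
```

In this sketch the slice is pre-allocated by the caller and filled in place, mirroring the `&mut [T]` target suggested above. Slots beyond `max_records` keep their default values.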

Additional context

I have already implemented a possible solution which I will make a PR for.

@Joseph-Rance Joseph-Rance added the enhancement Any new improvement worthy of a entry in the changelog label Sep 4, 2023
@tustvold tustvold added the parquet Changes to the parquet crate label Nov 2, 2023
@tustvold
Contributor

tustvold commented Nov 2, 2023

label_issue.py automatically added labels {'parquet'} from #4773

@tustvold
Contributor

tustvold commented Nov 2, 2023

label_issue.py automatically added labels {'parquet-derive'} from #4773
