Parquet Modular Encryption support #3511

Open
tibaes opened this issue Jan 11, 2023 · 18 comments · May be fixed by #6637
Labels
help wanted question Further information is requested

Comments

@tibaes

tibaes commented Jan 11, 2023

Which part is this question about
Documentation

Describe your question
Is Parquet Modular Encryption supported by this library?

Additional context
I have found some mentions of AES and encryption here and there in the documentation and code base, but there is no example of it. I am struggling to make it work, so I'm starting to think this is not fully supported yet.

@tibaes added the question label on Jan 11, 2023
@tustvold
Contributor

We do not currently support this, but would welcome contributions to add support for it.

@bhoberman

@tustvold I'm interested in doing implementation work for this. I'd love to have a dedicated chat about it with a maintainer or community member who has context for this issue and could get me involved with contributor discussion spaces!

@tustvold
Contributor

tustvold commented Jan 11, 2024

I'm afraid I don't really have any context on this, as it isn't a part of the standard I am familiar with. Implementing this will likely involve interpreting the spec at https://github.com/apache/parquet-format and applying it to the Rust reader. If this is anything like other aspects of parquet, this will also involve a fair amount of spelunking in existing implementations to clarify ambiguity.

The actual encryption part can probably use something like https://docs.rs/ring/latest/ring/ as an optional dependency, but I'm just guessing here that the encryption is something standard.

I'm sorry I can't be of more help, I'd love to see this implemented and am happy to help review code contributions, but I don't really have the bandwidth at the moment to actively help with the actual implementation effort.
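To give a flavour of what interpreting the spec involves, here is a minimal std-only sketch of how the per-module AAD (additional authenticated data) appears to be assembled, based on one reading of `Encryption.md` in apache/parquet-format. The names `ModuleType` and `module_aad` are hypothetical and not part of any existing crate, and the byte codes and field order should be checked against the spec and an existing implementation before being relied on.

```rust
/// One-byte module type codes, as listed in parquet-format's Encryption.md.
/// Assumption: values copied from one reading of the spec; verify before use.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum ModuleType {
    Footer = 0,
    ColumnMetaData = 1,
    DataPage = 2,
    DictionaryPage = 3,
    DataPageHeader = 4,
    DictionaryPageHeader = 5,
    ColumnIndex = 6,
    OffsetIndex = 7,
    BloomFilterHeader = 8,
    BloomFilterBitset = 9,
}

/// Build the AAD for one module (hypothetical helper).
/// Layout assumed here: prefix | file id | module type (1 byte)
///   | row group ordinal (2 bytes LE) | column ordinal (2 bytes LE)
///   | page ordinal (2 bytes LE, data pages and their headers only).
fn module_aad(
    aad_prefix: &[u8],
    file_unique: &[u8],
    module: ModuleType,
    row_group: u16,
    column: u16,
    page: u16,
) -> Vec<u8> {
    let mut aad = Vec::new();
    aad.extend_from_slice(aad_prefix);
    aad.extend_from_slice(file_unique);
    aad.push(module as u8);
    // The footer carries no ordinals at all.
    if matches!(module, ModuleType::Footer) {
        return aad;
    }
    aad.extend_from_slice(&row_group.to_le_bytes());
    aad.extend_from_slice(&column.to_le_bytes());
    // Only data pages and data page headers append a page ordinal.
    if matches!(module, ModuleType::DataPage | ModuleType::DataPageHeader) {
        aad.extend_from_slice(&page.to_le_bytes());
    }
    aad
}
```

For example, `module_aad(b"", b"id", ModuleType::DataPage, 1, 2, 3)` yields the two file-id bytes followed by `2, 1, 0, 2, 0, 3, 0`. The resulting bytes would then be passed as the AAD argument of whichever AES-GCM implementation is chosen.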

@bhoberman

Thanks for the quick response! Parquet encryption uses two extremely standard primitives (which ring has perfectly fine implementations of). In principle, the encryption step is a very simple post-processing step, but I definitely anticipate the existing implementations having some weird quirks.

Given your resources, I'll just try to roll an implementation and submit it for review.
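As a rough illustration of why the encryption step can be seen as post-processing: per one reading of the parquet-format spec, each AES-GCM encrypted module is serialized as `length (4 bytes, little endian) | nonce (12 bytes) | ciphertext | tag (16 bytes)`. The sketch below only handles that framing and performs no cryptography; `GcmModule`, `frame_module` and `split_module` are hypothetical names, and the sizes are assumptions to verify against the spec.

```rust
// Assumed sizes for the AES-GCM module layout (check against Encryption.md).
const LEN_PREFIX: usize = 4;
const NONCE_LEN: usize = 12;
const TAG_LEN: usize = 16;

/// Borrowed view of the parts of one encrypted module (hypothetical type).
#[derive(Debug, PartialEq)]
struct GcmModule<'a> {
    nonce: &'a [u8],
    ciphertext: &'a [u8],
    tag: &'a [u8],
}

/// Wrap already-encrypted parts as: length | nonce | ciphertext | tag.
/// The length counts everything after the 4-byte prefix.
fn frame_module(nonce: &[u8; NONCE_LEN], ciphertext: &[u8], tag: &[u8; TAG_LEN]) -> Vec<u8> {
    let body_len = (NONCE_LEN + ciphertext.len() + TAG_LEN) as u32;
    let mut out = Vec::with_capacity(LEN_PREFIX + body_len as usize);
    out.extend_from_slice(&body_len.to_le_bytes());
    out.extend_from_slice(nonce);
    out.extend_from_slice(ciphertext);
    out.extend_from_slice(tag);
    out
}

/// Split a framed module back into its parts, validating the lengths.
fn split_module(buf: &[u8]) -> Result<GcmModule<'_>, String> {
    if buf.len() < LEN_PREFIX + NONCE_LEN + TAG_LEN {
        return Err("buffer too short for an encrypted module".to_string());
    }
    let body_len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if body_len < NONCE_LEN + TAG_LEN {
        return Err("declared length is smaller than nonce + tag".to_string());
    }
    let body = buf
        .get(LEN_PREFIX..LEN_PREFIX + body_len)
        .ok_or_else(|| "declared length exceeds buffer".to_string())?;
    let (nonce, rest) = body.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Ok(GcmModule { nonce, ciphertext, tag })
}
```

A reader would call `split_module` on the bytes of a page or footer module and hand the nonce, ciphertext and tag to an AEAD implementation; a writer would do the reverse with `frame_module` after encrypting.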

@tustvold
Contributor

Thank you, I'm happy to review code, especially if it is well tested

@ggershinsky

ggershinsky commented Mar 14, 2024

Hi @bhoberman, I've started looking into building a Rust implementation of PME, but fortunately found this thread quickly.
How is this work going?
I'd be glad to provide any help (review, advice on PME design, code contributions, etc) as needed, feel free to ping me on [sent before edit]

@bhoberman

Hey @ggershinsky thanks for reaching out! I'll contact you privately with more details.

TL;DR for those using this thread as a status indicator: this was going to be a work project for me, and we decided after the research phase that it made the most sense to bind to Arrow C++ for our use-case and staffing. That said, I'd love to contribute some personal time to this project should @ggershinsky or someone else be willing to drive it.

@bhoberman

Hey @tustvold, @ggershinsky and I have met and are starting on an implementation of this together. Would it be possible for us to get invites to the Apache slack (as mentioned in the README) for easier coordination than email/GitHub?

@tustvold
Contributor

Sure, if you join the discord you can then DM me your email addresses

@matthewgapp

matthewgapp commented Aug 6, 2024

hey @tustvold and @bhoberman did you end up connecting or make any progress on the rust implementation? I checked the discord but didn't see any messages around encryption there. We're working on something that would depend on this and would love to help contribute if there's something already partially implemented.

@adamreeve

We (@G-Research) would also like to see Parquet encryption support added and can contribute to this effort too, maybe we can work together on this @matthewgapp?

@ggershinsky

Hi @matthewgapp @adamreeve. Ben (@bhoberman) and I worked on this for a while, but had to switch to other projects. Feel free to use the early draft (branch and an internal patch) any way you like.
As always, I'll be glad to help with PME design questions, etc. You can reach me on the ASF Slack and on GitHub.

@adamreeve

@matthewgapp, are you on the Arrow Rust discord? I'm adamrnz there if you want to discuss this further. It looks like I could also invite you to the ASF Slack workspace if that's easier

@matthewgapp

hey @adamreeve and @ggershinsky apologies for the delayed response here. Adam, I'll message you on discord. Happy to do slack if that's better

@matthewgapp

thanks @ggershinsky for the draft and patch! will let you know if questions

@rok
Member

rok commented Oct 3, 2024

Hi all. I'm @adamreeve's colleague and I happen to have time available to do some work on this. Could I help with any potential open tasks @matthewgapp or is it better I pick up the @ggershinsky's draft branch?

@alamb
Contributor

alamb commented Oct 5, 2024

> Hi all. I'm @adamreeve's colleague and I happen to have time available to do some work on this. Could I help with any potential open tasks @matthewgapp or is it better I pick up the @ggershinsky's draft branch?

❤️

I believe the Apache DataFusion Comet project may be interested in this feature too -- I believe its lack is one reason the project has its own parquet decoder

https://github.com/apache/datafusion-comet/tree/3413397ce0de890b7d71b25b5a6790cc38cff21f/native/core/src/parquet

Perhaps @andygrove or @viirya @sunchao or @kazuyukitanimura have more details they can share

cc @etseidl who may also be interested

@andygrove
Member

Related issue in Comet: apache/datafusion-comet#1040

9 participants