Add a FileExporter to allow export to arrow or feather (or parquet)??? #250

Open
rickFanta opened this issue Sep 20, 2024 · 6 comments

Comments

@rickFanta

Would it be possible to add a FileExporter that exports to Arrow or Feather (or Parquet)?

Or is there an obvious way to do this that I'm missing?

@lquerel
Contributor

lquerel commented Sep 20, 2024

Yes, implementing such an exporter was part of our plan. Storing the Arrow records as they are would be easy, but in my opinion it wouldn’t be optimal. The current Arrow schema used in the protocol has been optimized for transport, taking into account the need to regularly close streams (and therefore reset state on the receiver side) to make the protocol more load-balancer-friendly. Some transformations need to be applied first to optimize the records for long-term storage. Unfortunately, I don’t have an ETA to provide at this time.
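
To make the transport-versus-storage point concrete, here is a minimal Python sketch of the kind of transformation that might be needed. It rests on a simplified layout that is an assumption for illustration, not the actual protocol schema: log bodies arrive in one batch, attributes arrive in a second batch, and the two are linked by a delta-encoded parent index. The function name `flatten_for_storage` and all field names are hypothetical.

```python
from itertools import accumulate


def flatten_for_storage(log_bodies, attr_parent_deltas, attr_keys, attr_values):
    """Join a main batch with its attribute batch into self-contained rows.

    Assumed (hypothetical) layout: each attribute row points at its log
    record through a parent index stored as a delta from the previous row.
    """
    # Undo the delta encoding so every attribute carries an absolute index.
    parent_ids = list(accumulate(attr_parent_deltas))

    # Start with one row per log record and an empty attribute map.
    rows = [{"body": body, "attributes": {}} for body in log_bodies]

    # Attach each attribute to the record it belongs to.
    for parent_id, key, value in zip(parent_ids, attr_keys, attr_values):
        rows[parent_id]["attributes"][key] = value
    return rows


rows = flatten_for_storage(
    log_bodies=["started", "failed"],
    attr_parent_deltas=[0, 0, 1],
    attr_keys=["service", "host", "service"],
    attr_values=["api", "a1", "db"],
)
```

Rows flattened this way no longer depend on stream state held by the receiver, which is the property long-term storage would likely need.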

@rickFanta
Author

Thanks for the reply. There's strong perceived value in storing OTLP-flavored data as Arrow/Feather so that it can be queried quickly and tactically with DuckDB and similar tools.

Any thoughts on how to implement your preferred solution, or a reasonably durable tactical one, would be very much appreciated if you have the cycles. Happy to help with either or both.

@abhiaagarwal

I've started work on something similar here, with the idea of loading the OTel data directly into a DuckDB database via its Arrow-native functions for zero-copy ingestion. I don't think I'll have much time to work on it in the immediate future, but contributions are welcome!

https://github.com/abhiaagarwal/otelarrow-treasury

@rickFanta
Author

rickFanta commented Dec 2, 2024 via email

@abhiaagarwal

@rickFanta Go is probably the better choice here; I'm just much better at Rust. To be clear, the data is stored in a DuckDB database, which doesn't use Parquet at all. Delta Lake is certainly an option, but it would require maintaining 10+ different tables, or normalizing it down to 3 tables for each OTel type (which I think would lose some of the benefit).

@rickFanta
Author

rickFanta commented Dec 2, 2024 via email


3 participants