Does Hydra support the computation of data lakes in Paimon catalog format? #191

caesar168 · 2023-11-02T01:51:03Z

Apache Paimon: a streaming data lake platform with high-speed data ingestion, changelog tracking, and efficient real-time analytics.

caesar168 (Author) · 2023-11-02T01:51:43Z

Apache Paimon is an effort undergoing incubation at The Apache Software Foundation (ASF).

0 replies

JerrySievert · 2023-11-02T01:55:59Z

I'm just curious what you're expecting here. I saw your issue #189, but I'm still a little confused. Are you asking if Hydra can consume data from Paimon, store data in its format, query it, or ... ?

I'm happy to advocate for something cool, I just need to understand what it is :)

0 replies

caesar168 (Author) · 2023-11-02T01:58:34Z

In China, a large number of users rely on Paimon for real-time computing over massive data lakes. If Hydra provided an interface or driver to access Paimon, that would open up another broad use case for Hydra. Well-known real-time data warehouse software such as ClickHouse, Doris, and StarRocks already supports reading from and computing over Paimon.

0 replies

caesar168 (Author) · 2023-11-02T02:03:38Z

Due to the popularity of Flink, many users in China use the Paimon Catalog to build data lakes. Columnar vectorized computing software (such as ClickHouse and Hydra) then consumes the data lake data for real-time computing and stores the results in its own database (ClickHouse, for example), which integrates the data lake with the warehouse. In this process there is only one copy of the data, in the data lake, which reduces storage costs and data movement.

0 replies

JerrySievert · 2023-11-02T02:08:17Z

So this would be an external table source? Please forgive me, I'm very ignorant of Paimon. Is it similar to Parquet, where it could be considered a data source, or is it another database to query from?

If it is similar to Parquet (a data source), then it should be possible to support, but it would be helpful if there were already an FDW for it.

0 replies

caesar168 (Author) · 2023-11-02T02:20:30Z

It is indeed a data source. It is also a data lake: in China, the best practice is to store all of an enterprise's original business data together in what is called a data lake, so that any analysis software can run real-time or near-real-time computations on that data according to business needs.

0 replies

caesar168 (Author) · 2023-11-02T02:21:00Z

https://paimon.apache.org/

0 replies

caesar168 (Author) · 2023-11-02T02:21:33Z

0 replies

wuputah (Maintainer) · 2023-11-02T02:22:00Z

Does it work with Postgres?

1 reply

caesar168 (Author) · Nov 2, 2023

We currently use software similar to ClickHouse for analysis, but we would rather not, because our internal technology stack is built on PG. If Hydra is a large-scale real-time analytics engine that can access Paimon and compute over it, we would be happy to use Hydra. Everything would then revolve around PG, which reduces the number of software components, the workload, and the system risk.

JerrySievert · 2023-11-02T02:25:29Z

I'm just trying to clarify your request:

Is the request that we read a Paimon data source directly from a Paimon server (or from a file format, like Parquet), import the data over JDBC into a Hydra table, or export from Hydra into something that Paimon can deal with?

And as @wuputah asked, does it already work directly with Postgres? If so, after a code review it might be something we could easily support.

2 replies

caesar168 (Author) · Nov 2, 2023

We expect Hydra to be able to read the Paimon data and perform real-time vectorized computation on it. We use the Flink CDC data extraction tool to extract business data into Paimon in real time, so Hydra only needs to read and analyze it in real time.

caesar168 (Author) · Nov 2, 2023

Paimon probably does not work directly with PG.

JerrySievert · 2023-11-02T02:26:49Z

Or are you asking if we can export, like the diagram you posted, from Hydra to Paimon?

0 replies

wuputah (Maintainer) · 2023-11-02T02:40:40Z

Hydra is a columnar store for Postgres, so if you can connect to Postgres (e.g. with JDBC), then it will work.

0 replies
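
One way to read the last comment is that a Flink job could write into a Hydra table through Flink's JDBC connector, since Hydra accepts ordinary Postgres connections. The thread does not confirm this setup, so the following is only a minimal sketch under that assumption. It builds the two statements such a pipeline would need: a columnar table on the Hydra side and a JDBC sink table on the Flink side. The table name `orders`, its columns, the URL, and the credentials are all hypothetical, and identifiers and values are not escaped.

```python
from typing import Mapping


def hydra_table_ddl(table: str, columns: Mapping[str, str]) -> str:
    """Build a CREATE TABLE statement for a Hydra columnar table.

    `columns` maps column names to Postgres types (hypothetical schema).
    """
    cols = ", ".join(f"{name} {pg_type}" for name, pg_type in columns.items())
    return f"CREATE TABLE {table} ({cols}) USING columnar;"


def flink_jdbc_sink_ddl(
    table: str,
    columns: Mapping[str, str],
    jdbc_url: str,
    username: str,
    password: str,
) -> str:
    """Build a Flink SQL CREATE TABLE statement for a JDBC sink.

    `columns` maps column names to Flink SQL types. The sink is named
    `<table>_sink` purely as an illustrative convention.
    """
    cols = ", ".join(f"{name} {flink_type}" for name, flink_type in columns.items())
    options = {
        "connector": "jdbc",
        "url": jdbc_url,
        "table-name": table,
        "username": username,
        "password": password,
    }
    opts = ", ".join(f"'{key}' = '{value}'" for key, value in options.items())
    return f"CREATE TABLE {table}_sink ({cols}) WITH ({opts});"


if __name__ == "__main__":
    # Run against Hydra (Postgres) first, then register the sink in Flink SQL.
    print(hydra_table_ddl("orders", {"id": "bigint", "amount": "numeric"}))
    print(
        flink_jdbc_sink_ddl(
            "orders",
            {"id": "BIGINT", "amount": "DECIMAL(18, 2)"},
            "jdbc:postgresql://localhost:5432/analytics",  # hypothetical URL
            "flink_user",  # hypothetical credentials
            "secret",
        )
    )
```

With both tables in place, a Flink SQL `INSERT INTO orders_sink SELECT ... FROM <paimon table>` would copy rows from Paimon into Hydra. Note that this duplicates the data rather than querying the lake in place, which is what the original request asked for.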
