You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Below a very sketchy, untested script how to implement such a thing. I actually could only guess what parquet.schema returns, since there's no docs yet about this (see #48).
The resulting table product_example2 from reading product_example.parquet (as defined in https://github.com/CrunchyData/pg_parquet) should be created without having created table product_example2 before hand:
CREATE OR REPLACE FUNCTION import_foreign_schema_from_parquet(uri TEXT, table_name TEXT DEFAULT 'parquet_table') RETURNS VOID AS $$
DECLARE
parquet_info RECORD;
columns TEXT := '';
BEGIN
-- Loop through the metadata returned by parquet.metadata function
FOR parquet_info IN EXECUTE 'SELECT column_name, type_name FROM parquet.metadata(' || quote_literal(uri) || ')' LOOP
-- Append column name and type to the columns string
columns := columns || quote_ident(parquet_info.column_name) || ' ' ||
CASE parquet_info.type_name
WHEN 'INT32' THEN 'INTEGER'
WHEN 'INT64' THEN 'BIGINT'
WHEN 'FLOAT' THEN 'REAL'
WHEN 'DOUBLE' THEN 'DOUBLE PRECISION'
WHEN 'BOOLEAN' THEN 'BOOLEAN'
WHEN 'UTF8' THEN 'TEXT'
WHEN 'LIST' THEN 'ANYELEMENT[]'
WHEN 'TIMESTAMP_MICROS' THEN 'TIMESTAMP'
WHEN 'TIMESTAMP_MILLIS' THEN 'TIMESTAMPTZ'
ELSE 'TEXT' -- Default to TEXT for unknown types
END || ', ';
END LOOP;
-- Remove trailing comma and space
columns := RTRIM(columns, ', ');
-- Execute the CREATE TABLE statement with the generated columns
EXECUTE 'CREATE TABLE ' || quote_ident(table_name) || ' (' || columns || ');';
END;
$$ LANGUAGE plpgsql;
-- Import foreign "schema" (i.e. table definition) from Parquet - creates table "product_example2":
SELECT import_foreign_schema_from_parquet('/tmp/product_example.parquet', 'product_example2');
-- Copy the Parquet file to the newly created table product_example2:
COPY product_example2 FROM '/tmp/product_example.parquet' (format 'parquet', compression 'gzip');
-- Show table product_example2 - should have the same structure as the original table product_example:
SELECT * FROM product_example2
UNION ALL
SELECT * FROM product_example;
The text was updated successfully, but these errors were encountered:
sfkeller
changed the title
Reading data from a Parquet file without the fuss (feature request)
Reading data from a Parquet file without the fuss
Oct 22, 2024
How about reading data from a Parquet file without the fuss (inspired by https://duckdb.org/docs/guides/file_formats/parquet_import )?
Below a very sketchy, untested script how to implement such a thing. I actually could only guess what parquet.schema returns, since there's no docs yet about this (see #48).
The resulting table product_example2 from reading product_example.parquet (as defined in https://github.com/CrunchyData/pg_parquet) should be created without having created table product_example2 before hand:
The text was updated successfully, but these errors were encountered: