[DOP-13259] Update Clickhouse types documentation
dolfinus committed Mar 6, 2024
1 parent f92c602 commit c8492b1
Showing 2 changed files with 35 additions and 30 deletions.
62 changes: 33 additions & 29 deletions docs/connection/db_connection/clickhouse/types.rst
@@ -23,8 +23,8 @@ Writing to some existing Clickhouse table
This is how the Clickhouse connector performs this:

* Get names of columns in DataFrame. [1]_
- * Perform ``SELECT column1, column2, ... FROM table LIMIT 0`` query.
- * For each column in query result get column name and Clickhouse type.
+ * Perform ``SELECT * FROM table LIMIT 0`` query.
+ * Take only columns present in DataFrame (by name, case-insensitive). For each found column, get its Clickhouse type.
* Find corresponding ``Clickhouse type (read)`` -> ``Spark type`` combination (see below) for each DataFrame column. If no combination is found, raise exception. [2]_
* Find corresponding ``Spark type`` -> ``Clickhouse type (write)`` combination (see below) for each DataFrame column. If no combination is found, raise exception.
* If ``Clickhouse type (write)`` matches ``Clickhouse type (read)``, no additional casts will be performed, and the DataFrame column will be written to Clickhouse as is.
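
The case-insensitive column matching described in the list above can be sketched in a few lines of plain Python. This is only an illustration with made-up column names and types, not onETL's actual implementation:

.. code:: python

    # Hypothetical inputs: DataFrame column names, plus the columns and
    # Clickhouse types returned by ``SELECT * FROM table LIMIT 0``.
    df_columns = ["id", "business_dt"]
    table_columns = {"Id": "UInt8", "BUSINESS_DT": "DateTime64(6)", "extra": "String"}

    # Keep only table columns also present in the DataFrame,
    # comparing names case-insensitively.
    df_names = {name.lower() for name in df_columns}
    matched = {
        name: ch_type
        for name, ch_type in table_columns.items()
        if name.lower() in df_names
    }

    print(matched)  # {'Id': 'UInt8', 'BUSINESS_DT': 'DateTime64(6)'}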
@@ -61,40 +61,44 @@ This may lead to unintended precision loss, or sometimes data cannot be written

So instead of relying on Spark to create tables like this:

- .. code:: python
+ .. dropdown:: See example

-     writer = DBWriter(
-         connection=clickhouse,
-         table="default.target_tbl",
-         options=Clickhouse.WriteOptions(
-             if_exists="append",
-             # ENGINE is required by Clickhouse
-             createTableOptions="ENGINE = MergeTree() ORDER BY id",
-         ),
-     )
-     writer.run(df)

+     .. code:: python

+         writer = DBWriter(
+             connection=clickhouse,
+             table="default.target_tbl",
+             options=Clickhouse.WriteOptions(
+                 if_exists="append",
+                 # ENGINE is required by Clickhouse
+                 createTableOptions="ENGINE = MergeTree() ORDER BY id",
+             ),
+         )
+         writer.run(df)
Always prefer creating tables with specific types **BEFORE WRITING DATA**:

- .. code:: python
+ .. dropdown:: See example

-     clickhouse.execute(
-         """
-         CREATE TABLE default.target_tbl (
-             id UInt8,
-             value DateTime64(6) -- specific type and precision
-         )
-         ENGINE = MergeTree()
-         ORDER BY id
-         """,
-     )

-     writer = DBWriter(
-         connection=clickhouse,
-         table="default.target_tbl",
-         options=Clickhouse.WriteOptions(if_exists="append"),
-     )
-     writer.run(df)

+     .. code:: python

+         clickhouse.execute(
+             """
+             CREATE TABLE default.target_tbl (
+                 id UInt8,
+                 value DateTime64(6) -- specific type and precision
+             )
+             ENGINE = MergeTree()
+             ORDER BY id
+             """,
+         )

+         writer = DBWriter(
+             connection=clickhouse,
+             table="default.target_tbl",
+             options=Clickhouse.WriteOptions(if_exists="append"),
+         )
+         writer.run(df)
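
To double-check what Spark will see after creating the table manually, the same ``LIMIT 0`` trick can be used to fetch just the schema. A minimal sketch, assuming a local Clickhouse instance, a Clickhouse JDBC driver on the Spark classpath, and an existing ``spark`` session; this is not part of the commit:

.. code:: python

    # Fetch zero rows: enough for Spark to map Clickhouse column types
    # to Spark types without transferring any data.
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:clickhouse://localhost:8123/default")  # assumed URL
        .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")  # assumed driver class
        .option("query", "SELECT * FROM default.target_tbl LIMIT 0")
        .load()
    )
    df.printSchema()  # prints the Spark types mapped from Clickhouse types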
References
~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/connection/db_connection/postgres/types.rst
@@ -26,7 +26,8 @@ Writing to some existing Postgres table
This is how the Postgres connector performs this:

* Get names of columns in DataFrame. [1]_
- * Perform ``SELECT column1, column2, ... FROM table LIMIT 0`` query. For each column in query result get Postgres type.
+ * Perform ``SELECT * FROM table LIMIT 0`` query.
+ * Take only columns present in DataFrame (by name, case-insensitive). For each found column, get its Postgres type.
* Find corresponding ``Spark type`` -> ``Postgres type (write)`` combination (see below) for each DataFrame column. If no combination is found, raise exception.
* If ``Postgres type (write)`` matches ``Postgres type (read)``, no additional casts will be performed, and the DataFrame column will be written to Postgres as is.
* If ``Postgres type (write)`` does not match ``Postgres type (read)``, the DataFrame column will be cast to the target column type **on the Postgres side**.
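
The ``SELECT * FROM table LIMIT 0`` step above can also be reproduced outside of Spark to inspect what the connector sees. A minimal sketch using psycopg2, with made-up connection parameters and table name; onETL itself performs this through JDBC:

.. code:: python

    import psycopg2

    # Made-up DSN and table name, for illustration only.
    conn = psycopg2.connect("dbname=test user=postgres host=localhost")
    with conn.cursor() as cur:
        # LIMIT 0 returns no rows, but cursor.description is still
        # populated with column names and Postgres type OIDs.
        cur.execute("SELECT * FROM public.target_tbl LIMIT 0")
        for column in cur.description:
            print(column.name, column.type_code)
    conn.close()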
