
Refactor existing & add new details in Iceberg document #20552

Merged 1 commit into prestodb:master on Aug 19, 2023

Conversation

agrawalreetika (Member):

Description

Refactor existing & add new details in the Iceberg document.

Motivation and Context

Refactor the existing Iceberg document and add new details to it.
Resolves #20543

Impact

Provide more concrete details to users about available features in the Iceberg connector.

Existing Document - (screenshot)

After PR changes - (screenshot)

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@agrawalreetika agrawalreetika requested a review from a team as a code owner August 11, 2023 08:13
github-actions bot commented Aug 11, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 815bf40...d9015e6.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/connector/iceberg.rst

Metastores
-----------
Iceberg tables stores most of the metadata in the metadata files along with the data on the filesystem.
But to store small amount of metadata it still requires Metastore to find the current location of the current
Contributor:

Suggested change
But to store small amount of metadata it still requires Metastore to find the current location of the current
But to store small amount of metadata it still requires a central place to find the current location of the current

metadata pointer for a table. This central place is called ``Iceberg Catalog``.
Contributor:

Suggested change
metadata pointer for a table. This central place is called ``Iceberg Catalog``.
metadata pointer for a table. This central place is called the ``Iceberg Catalog``.

Comment on lines 117 to 120
To use Hadoop as Catalog with Iceberg table, catalog requires ``hive.metastore.uri`` property
added even though it doesn't rely on metastore for metadata when using Hadoop Catalog. You
can provide a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will not
be needed once check for ``hive.metastore.uri`` property is removed in case of Hadoop Catalog.
Contributor:

Suggested change
To use Hadoop as Catalog with Iceberg table, catalog requires ``hive.metastore.uri`` property
added even though it doesn't rely on metastore for metadata when using Hadoop Catalog. You
can provide a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will not
be needed once check for ``hive.metastore.uri`` property is removed in case of Hadoop Catalog.
To use Hadoop as Catalog with Iceberg table, catalog requires the ``hive.metastore.uri`` property
added even though it doesn't rely on metastore for metadata when using Hadoop Catalog. You
can provide a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a future release of Presto.

Is there an open issue for this BTW?

Member Author:

Yes, there are 2 issues reporting the same problem - #17342

yingsu00 (Contributor) left a comment:

@agrawalreetika Thanks for putting this up so quickly! See my comments inline.

Iceberg tables stores most of the metadata in the metadata files along with the data on the filesystem.
But to store small amount of metadata it still requires Metastore to find the current location of the current
Contributor:

I think we should remove the "to store small amount of metadata".

Iceberg tables stores most of the metadata in the metadata files along 
with the data on the filesystem, but it still requires ...

metadata pointer for a table. This central place is called ``Iceberg Catalog``.
The Iceberg connector supports different types of Iceberg Catalogs : ``Hive Metastore``,
Contributor:

"The Iceberg connector" -> "The Presto Iceberg connector"

The Iceberg connector supports different types of Iceberg Catalogs : ``Hive Metastore``,
``GLUE``, ``NESSIE``, ``HADOOP``

To configure the Iceberg connector, create a catalog properties file
Contributor:

I think it is better to highlight the iceberg.catalog.type property here. Would you please add a sentence for it?
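As context for this suggestion, a minimal catalog properties file might look like the following sketch (the metastore host is a hypothetical example; the property names come up later in this conversation):

```properties
connector.name=iceberg
# Selects the Iceberg catalog implementation: hive, hadoop, or nessie.
iceberg.catalog.type=hive
# Required for the hive catalog type; points at the Hive Metastore.
hive.metastore.uri=thrift://example.net:9083
```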

@@ -346,6 +232,8 @@ Extra Hidden Metadata Tables
The Iceberg connector exposes extra hidden metadata tables. You can query these
as a part of SQL query by appending name with the table.

``$properties`` Table
Contributor:

This comment is for Line 196. Will you be able to add some explanation there to say those properties are available for all Presto tables, not just Iceberg tables?

Member Author:

@yingsu00 Under Table Properties in the Iceberg document, some of the properties mentioned are, I think, specific to Iceberg tables, such as partitioning, location & format_version.
For the Hive connector, the equivalents are partitioned_by and external_location respectively, and format_version does not exist for Hive.
Please correct me if you were talking about something else here.

Contributor:

@agrawalreetika Yes, I mean it would help users if you mention whether they are Iceberg-specific or the same as in Hive, just as you explained above.

@@ -415,7 +311,7 @@ as a part of SQL query by appending name with the table.
0 | s3://my-bucket/ctas_nation/data/9f889274-6f74-4d28-8164-275eef99f660.parquet | PARQUET | 25 | 1648 | {1=52, 2=222, 3=105, 4=757} | {1=25, 2=25, 3=25, 4=25} | {1=0, 2=0, 3=0, 4=0} | NULL | {1=0, 2=ALGERIA, 3=0, 4= haggle. careful} | {1=24, 2=VIETNAM, 3=4, 4=y final packaget} | NULL | NULL | NULL

Time Travel
------------------------
-----------
Contributor:

This comment is for ln 323.
At this point in the document, Schema Evolution has not been introduced yet.

Contributor:

And for line 355, you mentioned we just created the ctas table, but this was not mentioned anywhere above. It might be good to have basic DDL and DML sections before the SELECT queries.

Member Author:

This comment is for ln 323 At this moment, the Schema Evolution has not been talked about.

Ok, I will move the Schema Evolution section before the Time Travel section in the document.

Member Author:

And for line 355, you mentioned we just created the ctas table, but this was not mentioned anywhere above. It might be good to have basic DDL and DML sections before the SELECT queries.

@yingsu00 In the Example Queries section, the first query is a CTAS, and all the further queries are run on the same table. Please let me know if anything is missing here.

The Iceberg connector supports querying and manipulating Iceberg tables and schemas
(databases). Here are the examples of SQL Operations supported from Presto :

Create a schema
Contributor:

CREATE SCHEMA

CREATE SCHEMA iceberg.web
WITH (location = 's3://my-bucket/')

Create table
Contributor:

Upper case, ditto

comment | varchar | |
(4 rows)

Examples
Contributor:

This title is not appropriate. Other top level titles were all Iceberg operations or properties, and they all contain examples. I think we can rename it to SQL Support.

Also these DDL and DML better to be moved before Time Travel.

Drop a schema::

DROP SCHEMA iceberg.web

Contributor:

I don't see a section for INSERT, DELETE, MERGE, or upsert. Let's have a section for them and state clearly what is supported and what is not.

agrawalreetika (Member Author) commented Aug 12, 2023:

  • I missed the INSERT example; I will add that.
  • For DELETE, I have mentioned in the Iceberg Connector Limitations section that it's only supported if the WHERE clause matches entire partitions. Do we need a section for DELETE as well?
  • MERGE & UPDATE SQL syntax is not yet supported in Presto. I will open an issue for this.
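The support matrix described in this reply can be sketched as follows (the table and column names are hypothetical; the partition-matching DELETE behavior is as stated in the Iceberg Connector Limitations section):

```sql
-- INSERT is supported.
INSERT INTO iceberg.web.page_views (view_time, user_id, ds, country)
VALUES (TIMESTAMP '2023-08-12 10:00:00', 42, DATE '2023-08-12', 'US');

-- DELETE is supported only when the WHERE clause matches entire
-- partitions (ds and country are the partition columns here).
DELETE FROM iceberg.web.page_views
WHERE ds = DATE '2023-08-12' AND country = 'US';

-- MERGE and UPDATE are not yet supported by the Presto Iceberg connector.
```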

Time Travel
------------------------
-----------
Contributor:

Shall we add a section for SELECT support before Time travel? We can explain what type of Iceberg queries are supported by Presto there, e.g. read optimized, delta, real-time. We can also explain the support of COW and MOR tables there, or before talking about table properties

Member Author:

@yingsu00 I have added the SELECT section in the SQL Support section. Currently, read support exists for both Iceberg format version 1 and version 2 tables, but with limitations on version 2 tables where delete files are present. I have mentioned that limitation as well. Please let me know if we should add any further details there.

@yingsu00 yingsu00 added the iceberg Apache Iceberg related label Aug 12, 2023
agrawalreetika (Member Author):

Thanks, @tdcmeehan & @yingsu00 for your review. I have amended the changes as mentioned & replied to a few. Please take a pass on the changes.

steveburnett (Contributor) left a comment:

Looks great! I found a couple of small suggestions that I made comments for.

yingsu00 (Contributor) left a comment:

Mostly looks good. But will you be able to check whether double backquotes should be used in many places? For example, rollback_to_snapshot was only single-quoted, and it's not highlighted in my browser.

@@ -11,12 +11,17 @@ The Iceberg connector allows querying data stored in Iceberg tables.

It is recommended to use Iceberg 0.9.0 or later.
Contributor:

I'm not sure how a user can choose the Iceberg version with Presto?

Member Author:

Yes, that's right. Users will not have any control over this from the Presto engine side, and the current Iceberg version in Presto is 1.3.1. As I checked, the Presto Iceberg connector started from 0.9.0; this note might have been carried over from there for other engines.

Member:

0.9.0 was the version we tested when we developed the connector. During that time, we ran some tests, such as using Presto to query iceberg tables (0.9.0) written by Spark. The version information is a bit outdated now. We probably could safely remove this note.

``etc/catalog/iceberg.properties`` with the following contents,
replacing the properties as appropriate:
``etc/catalog/iceberg.properties``. To define catalog type ``iceberg.catalog.type`` property is
required along with the following contents, replacing the properties as appropriate:
Contributor:

To define the catalog type, iceberg.catalog.type property is ...

Contributor:

, replacing the properties as appropriate:
->
, with the properties' values properly replaced:

@tdcmeehan do you have better recommendations?

Contributor:

with the properties' values replaced as follows:


.. note::

To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property, even
Contributor:

hive.metastore.uri was not quoted


Contributor:

To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property,
-->
To use Hadoop catalog with Iceberg table, it requires the hive.metastore.uri property,


To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a
Contributor:

I don't quite understand "You can provide a placeholder with hive.metastore.uri=thrift://localhost:9083". What is the placeholder? Does thrift://localhost:9083 have to be exactly the same? And how will it be fixed? It'll be nice if you can explain it better.

Member Author:

I have put up details about the issue here: #19920
Currently, since the hive.metastore.uri property cannot be null when using Hadoop as a catalog in the Iceberg connector, we have to specify some placeholder for the hive.metastore.uri property (it doesn't necessarily have to be thrift://localhost:9083). This has to be fixed, since the Hadoop catalog doesn't require any HMS for metadata.

I'm open to suggestions on how we want to word a note about this; we could also add this issue link in the document until it is fixed.
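The workaround being discussed can be sketched as a catalog properties file (the warehouse location is a hypothetical example, and the metastore URI is only a placeholder that the Hadoop catalog never contacts):

```properties
connector.name=iceberg
iceberg.catalog.type=hadoop
# Warehouse directory for the Hadoop catalog (example value; adjust to
# your deployment).
iceberg.catalog.warehouse=s3://my-bucket/warehouse
# Placeholder only: the Hadoop catalog does not use a Hive Metastore,
# but the connector currently fails to start if this property is unset.
hive.metastore.uri=thrift://localhost:9083
```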

Member:

The Presto metadata store and Iceberg metadata store are two independent designs but they are coupled in the Presto-Iceberg connector, which, as a consequence, causes the dependency on the Hive metastore although the Iceberg Hadoop catalog doesn't use that.
Adding an issue link here looks good to me. Or add it to the Section Iceberg Connector Limitations. Your call.

)
WITH (
format = 'ORC',
partitioning = ARRAY['ds', 'country']
Contributor:

Do we support partition column transformations? If yes, let's add a subsection here. If not, then add a line saying these are not supported yet.

Member Author:

Yes, we do. I will add the section for it.

agrawalreetika (Member Author) commented Aug 15, 2023:

The Presto Iceberg Connector currently supports the Bucket & Truncate transforms on partition columns in Iceberg tables. For Day, Month, Year, Hour, I created issue #20570.
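A minimal sketch of the two supported transforms, assuming hypothetical tables and columns (the transform strings inside the ``partitioning`` array follow the Iceberg connector's table properties):

```sql
-- Bucket transform: hash-distributes rows into 8 buckets by customer_id.
CREATE TABLE iceberg.web.customers (
    customer_id BIGINT,
    country     VARCHAR
)
WITH (partitioning = ARRAY['bucket(customer_id, 8)']);

-- Truncate transform: partitions by the first 2 characters of country.
CREATE TABLE iceberg.web.customers_by_prefix (
    customer_id BIGINT,
    country     VARCHAR
)
WITH (partitioning = ARRAY['truncate(country, 2)']);
```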


.. note::

Iceberg Tables with format version 2 where delete files are present are not filtered in ``SELECT``
Contributor:

Does it mean it gives wrong results? "are not filtered in" is kind of vague here. Can you use more precise words?

Contributor:

Iceberg Tables with format version 2 where delete files are present are not filtered in SELECT operations in Presto, and which will produce wrong results without filtering.
-->
The SELECT operations on Iceberg Tables with format version 2 do not read the delete files and remove the deleted rows as of now.

Please update the last section too.

zipcode | varchar | |
(5 rows)

We can also rename the new column to `location`:
Contributor:

to location. -> to another name, e.g. location

I feel just "location" would likely be confused with the table property "location".

Time Travel
------------------------
-----------

Iceberg and Presto Iceberg connector supports time travel via table snapshots
identified by unique snapshot IDs. The snapshot IDs are stored in the `$snapshots`
Contributor:

$snapshots was not highlighted. Should it be double quoted?

-----------------------------

* :doc:`/sql/delete` is only supported if the ``WHERE`` clause matches entire partitions.
* Iceberg Tables using v2 of the Iceberg specification where delete files are present are
Contributor:

Explain better here too

agrawalreetika (Member Author):

Mostly look good. But will you be able to check if double back quotation marks should be used for many places, e.g. rollback_to_snapshot was only singly quoted, and its not highlighted on my browser.

Sure. As I checked, all metadata table access requires double quotes, but in procedure calls, for example rollback_to_snapshot, database & table names require single quotes.
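The quoting rules described here can be sketched as follows (the schema name and snapshot ID are hypothetical values; ctas_nation is the example table used earlier in this thread):

```sql
-- Metadata tables: the table$suffix name goes in double quotes.
SELECT snapshot_id FROM iceberg.testdb."ctas_nation$snapshots";

-- Procedure calls: schema and table names are single-quoted string literals.
CALL iceberg.system.rollback_to_snapshot('testdb', 'ctas_nation', 2873301822724);
```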

ChunxuTang (Member) left a comment:

@agrawalreetika Thanks for your nice work!
I just left some minor comments/explanations. Hope these could be helpful for your work.




``LZ4``, and ``ZSTD``.

``iceberg.catalog.type`` The catalog type for Iceberg tables. The available values ``hive``
are ``hive``, ``hadoop``, and ``nessie`` corresponding to
Member:

And glue?

Member Author:

For glue, we need to set iceberg.catalog.type=hive. Details of each catalog property are also mentioned in the respective catalog section.

Member:

thanks for the clarification.

agrawalreetika (Member Author):

Thank you @yingsu00 @tdcmeehan @ChunxuTang & @steveburnett for your review. I have made changes as per the comments. Please take a pass to check that the details look good.

@@ -48,8 +49,16 @@ Nessie catalog
^^^^^^^^^^^^^^

In order to use a Nessie catalog, ensure to configure the catalog type with
Contributor:

Maybe this line could be rephrased as shorter without changing the meaning? Consider if "To use a Nessie catalog, configure the catalog type as" still means what you want it to mean here.

@agrawalreetika agrawalreetika force-pushed the iceberg-docs branch 2 times, most recently from e104d1f to 4b4d325 Compare August 16, 2023 05:58
@agrawalreetika agrawalreetika self-assigned this Aug 16, 2023
yingsu00 (Contributor) left a comment:

@agrawalreetika Some minor modifications inline...


To configure the Iceberg connector, create a catalog properties file
``etc/catalog/iceberg.properties`` with the following contents,
replacing the properties as appropriate:
``etc/catalog/iceberg.properties``. To define catalog type ``iceberg.catalog.type`` property is
Contributor:

To define catalog type ...
->
To define the catalog type, ...

Note the comma needs to be there

filesystem, but it still requires a central place to find the current location of the
current metadata pointer for a table. This central place is called the ``Iceberg Catalog``.
The Presto Iceberg connector supports different types of Iceberg Catalogs : ``Hive Metastore``,
``GLUE``, ``NESSIE``, ``HADOOP``
Contributor:

NESSIE, HADOOP
-->
NESSIE, and HADOOP.

.. note::

To use Hadoop catalog with Iceberg table, it requires the ``hive.metastore.uri`` property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
Contributor:

it doesn't rely on metastore for metadata when using Hadoop Catalog.
-->
it doesn't rely on the metastore to provide the actual URI when using the Hadoop Catalog.


To use Hadoop catalog with Iceberg table, it requires the ``hive.metastore.uri`` property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a
Contributor:

You can provide a placeholder with ...
-->
You can provide any value to this property, for example, hive.metastore.uri=thrift://localhost:9083.

-----------

The Iceberg connector supports querying and manipulating Iceberg tables and schemas
(databases). Here are the examples of SQL Operations supported from Presto :
Contributor:

Here are the examples of SQL Operations supported from Presto :
-->
Here are some examples of the SQL operations supported by Presto :

* ``Bucket`` (partitions data into a specified number of buckets using a hash function)
* ``Truncate`` (partitions the table based on the truncated value of the field, and can specify the width of truncated value)

Create a Iceberg table partitioned into 8 buckets of equal sized ranges::
Contributor:

a Iceberg
-->
an Iceberg

Replace the other 4 places too.

^^^^^^^^^^^^^

Create a new Iceberg table named ``page_views`` in the ``web`` schema
that is stored using the ORC file format, partitioned by date and
Contributor:

partitioned by date and country
--->
partitioned by ds and country


.. note::

``Day``, ``Month``, ``Year``, ``Hour`` based partition column transform is not supported in Presto
Contributor:

Day, Month, Year, Hour based partition column transform is not supported in Presto Iceberg connector
-->
Day, Month, Year, Hour partition column transform functions are not supported in Presto Iceberg connector yet

To use Hadoop catalog with Iceberg table, it requires the ``hive.metastore.uri`` property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a
future release of Presto (:issue:`20579`).
Contributor:

@agrawalreetika Can you please double-check that this issue display is correct? It shows up like this in my browser: (screenshot)

Member Author:

Verified; after compiling, the web UI shows the correct issue link: (screenshot)



yingsu00 (Contributor) left a comment:

@agrawalreetika Reetika, thanks for updating the PR so quickly. There are still some minor English grammar issues. Since I'm not a native speaker, I may not be able to identify them all. Will you be able to use some online grammar checker to double check the doc? @tdcmeehan will you be able to give it a final review as well?

-------------
Metastores
-----------
Iceberg tables stores most of the metadata in the metadata files along with the data on the
Contributor:

Iceberg tables stores
-->
Iceberg tables store


To configure the Iceberg connector, create a catalog properties file
``etc/catalog/iceberg.properties`` with the following contents,
replacing the properties as appropriate:
``etc/catalog/iceberg.properties``. To define catalog type, ``iceberg.catalog.type`` property is
Contributor:

To define catalog type,
-->
To define the catalog type,

Hadoop catalog
^^^^^^^^^^^^^^

Iceberg connector supports Hadoop as Catalog
Contributor:

Iceberg connector supports Hadoop as Catalog
-->
Iceberg connector supports Hadoop catalog

agrawalreetika (Member Author):

@agrawalreetika Reetika, thanks for updating the PR so quickly. There are still some minor English grammar issues. Since I'm not a native speaker, I may not be able to identify them all. Will you be able to use some online grammar checker to double check the doc? @tdcmeehan will you be able to give it a final review as well?

Sure, let me go through it again and see what I can find. These could be there because some portions were copy-pasted from the existing document, and I might have missed checking those.

agrawalreetika (Member Author):

@yingsu00 @tdcmeehan, I went through the full document to check for English grammar issues as Ying mentioned. I found a few, which I have updated. Please review the changes.

tdcmeehan (Contributor) left a comment:

Not gonna let the perfect be the enemy of the good. This is a fantastic next step to our documentation and I think we can merge ASAP. Thanks @agrawalreetika !

@yingsu00 yingsu00 merged commit fea80c9 into prestodb:master Aug 19, 2023
50 checks passed
@agrawalreetika agrawalreetika deleted the iceberg-docs branch August 23, 2023 18:48
Labels: iceberg (Apache Iceberg related)
Project status: Done
Merging this pull request may close: [Feature Request] Improve Iceberg documentation
5 participants