Refactor existing & add new details in Iceberg document #20552
Conversation
Codenotify: Notifying subscribers in CODENOTIFY files for diff 815bf40...d9015e6.
Metastores
-----------
Iceberg tables stores most of the metadata in the metadata files along with the data on the filesystem.
But to store small amount of metadata it still requires Metastore to find the current location of the current
But to store small amount of metadata it still requires Metastore to find the current location of the current
But to store small amount of metadata it still requires a central place to find the current location of the current
-----------
Iceberg tables stores most of the metadata in the metadata files along with the data on the filesystem.
But to store small amount of metadata it still requires Metastore to find the current location of the current
metadata pointer for a table. This central place is called ``Iceberg Catalog``.
metadata pointer for a table. This central place is called ``Iceberg Catalog``.
metadata pointer for a table. This central place is called the ``Iceberg Catalog``.
To use Hadoop as Catalog with Iceberg table, catalog requires ``hive.metastore.uri`` property
added even though it doesn't rely on metastore for metadata when using Hadoop Catalog. You
can provide a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will not
be needed once check for ``hive.metastore.uri`` property is removed in case of Hadoop Catalog.
To use Hadoop as Catalog with Iceberg table, catalog requires ``hive.metastore.uri`` property
added even though it doesn't rely on metastore for metadata when using Hadoop Catalog. You
can provide a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will not
be needed once check for ``hive.metastore.uri`` property is removed in case of Hadoop Catalog.
To use Hadoop as Catalog with Iceberg table, catalog requires the ``hive.metastore.uri`` property
added even though it doesn't rely on metastore for metadata when using Hadoop Catalog. You
can provide a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a future release of Presto.
Is there an open issue for this BTW?
Yes, there are 2 issues reporting the same - #17342
@agrawalreetika Thanks for putting this up so quickly! See my comment in line
Metastores
-----------
Iceberg tables stores most of the metadata in the metadata files along with the data on the filesystem.
But to store small amount of metadata it still requires Metastore to find the current location of the current
I think we should remove the "to store small amount of metadata".
Iceberg tables stores most of the metadata in the metadata files along
with the data on the filesystem, but it still requires ...
Iceberg tables stores most of the metadata in the metadata files along with the data on the filesystem.
But to store small amount of metadata it still requires Metastore to find the current location of the current
metadata pointer for a table. This central place is called ``Iceberg Catalog``.
The Iceberg connector supports different types of Iceberg Catalogs : ``Hive Metastore``,
"The Iceberg connector" -> "The Presto Iceberg connector"
But to store small amount of metadata it still requires Metastore to find the current location of the current
metadata pointer for a table. This central place is called ``Iceberg Catalog``.
The Iceberg connector supports different types of Iceberg Catalogs : ``Hive Metastore``,
``GLUE``, ``NESSIE``, ``HADOOP``

To configure the Iceberg connector, create a catalog properties file |
I think it is better to highlight the iceberg.catalog.type
property here. Would you please add a sentence for it?
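To make the suggestion concrete, here is a minimal sketch of such a catalog properties file, assuming the Hive metastore catalog type; the metastore URI is a placeholder value:

```properties
# etc/catalog/iceberg.properties -- minimal sketch, values are placeholders
connector.name=iceberg
# Select which Iceberg catalog implementation to use: hive, hadoop, or nessie
iceberg.catalog.type=hive
# Thrift URI of the Hive metastore; replace with the real host and port
hive.metastore.uri=thrift://localhost:9083
```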
@@ -346,6 +232,8 @@ Extra Hidden Metadata Tables
The Iceberg connector exposes extra hidden metadata tables. You can query these
as a part of SQL query by appending name with the table.

``$properties`` Table |
This comment is for Line 196. Will you be able to add some explanation there to say those properties are available for all Presto tables, not just Iceberg tables?
@yingsu00 Under Table Properties in the Iceberg document, some of the properties mentioned are, I think, specific to Iceberg tables, like `partitioning`, `location` & `format_version`.
For the Hive connector, these are `partitioned_by` and `external_location` respectively, & `format_version` is not there for Hive.
Please correct me if you were talking about something else here.
@agrawalreetika yes, I mean it would help the users if we mention whether they are Iceberg specific or the same as Hive, just as you explained above.
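As a concrete illustration, the Iceberg-specific properties can be inspected through the hidden ``$properties`` table; this is a sketch that reuses the `ctas_nation` example table from later in this document:

```sql
-- Sketch: list the table properties that the Iceberg connector exposes
-- for the example table ctas_nation.
SELECT * FROM "ctas_nation$properties";
```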
@@ -415,7 +311,7 @@ as a part of SQL query by appending name with the table.
0 | s3://my-bucket/ctas_nation/data/9f889274-6f74-4d28-8164-275eef99f660.parquet | PARQUET | 25 | 1648 | {1=52, 2=222, 3=105, 4=757} | {1=25, 2=25, 3=25, 4=25} | {1=0, 2=0, 3=0, 4=0} | NULL | {1=0, 2=ALGERIA, 3=0, 4= haggle. careful} | {1=24, 2=VIETNAM, 3=4, 4=y final packaget} | NULL | NULL | NULL

Time Travel
------------------------
-----------
This comment is for ln 323
At this moment, Schema Evolution has not been discussed yet.
And for line 355, you mentioned we just created the ctas table, but this was not mentioned anywhere above. It might be good to have basic DDL and DML sections before the SELECT queries.
This comment is for ln 323 At this moment, the Schema Evolution has not been talked about.
Ok, I will shift the Schema Evolution section before the Time Travel section in the document.
And for line 355, you mentioned we just created the ctas table, but this was not mentioned anywhere above. It might be good to have basic DDL and DML sections before the SELECT queries.
@yingsu00 In the Example Queries section of the same document, the first query is a CTAS, and all the further queries are run on the same table. Please let me know if anything is missing here.
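For reference, a minimal sketch of the kind of CTAS the example section starts from; the catalog and schema names (`iceberg.testdb`) and the `tpch` source catalog are assumptions for illustration:

```sql
-- Sketch: create an Iceberg table from the TPC-H nation table (CTAS).
CREATE TABLE iceberg.testdb.ctas_nation AS
SELECT * FROM tpch.tiny.nation;
```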
The Iceberg connector supports querying and manipulating Iceberg tables and schemas
(databases). Here are the examples of SQL Operations supported from Presto :

Create a schema
CREATE SCHEMA
CREATE SCHEMA iceberg.web
WITH (location = 's3://my-bucket/')

Create table
Upper case, ditto
comment | varchar | |
(4 rows)

Examples
This title is not appropriate. Other top level titles were all Iceberg operations or properties, and they all contain examples. I think we can rename it to SQL Support.
Also, these DDL and DML sections would be better moved before Time Travel.
Drop a schema::

DROP SCHEMA iceberg.web

I don't see the section for INSERT, DELETE, Merge or upsert? Let's have a section for it and state clearly what is supported and what is not.
- I missed the INSERT example, will add that.
- For DELETE, I have mentioned in the Iceberg Connector Limitations section that it's only supported if the `WHERE` clause matches entire partitions. Do we need a section for `DELETE` as well? (See the sketch after this list.)
- MERGE & UPDATE SQL syntax is not yet supported in Presto. I will open an issue for the same.
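To make the supported write paths concrete, here is a sketch of an INSERT plus a partition-aligned DELETE; the table, its columns, and the `ds` partition column are illustrative assumptions:

```sql
-- Sketch: INSERT is supported on Iceberg tables.
INSERT INTO iceberg.testdb.page_views_example (view_time, user_id, page_url, ds, country)
VALUES (TIMESTAMP '2023-08-15 10:00:00', 123, 'https://example.com', DATE '2023-08-15', 'US');

-- Sketch: DELETE is supported only when the WHERE clause matches entire
-- partitions, for example by filtering on the partition column ds.
DELETE FROM iceberg.testdb.page_views_example
WHERE ds = DATE '2023-08-15';
```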
@@ -415,7 +311,7 @@ as a part of SQL query by appending name with the table.
0 | s3://my-bucket/ctas_nation/data/9f889274-6f74-4d28-8164-275eef99f660.parquet | PARQUET | 25 | 1648 | {1=52, 2=222, 3=105, 4=757} | {1=25, 2=25, 3=25, 4=25} | {1=0, 2=0, 3=0, 4=0} | NULL | {1=0, 2=ALGERIA, 3=0, 4= haggle. careful} | {1=24, 2=VIETNAM, 3=4, 4=y final packaget} | NULL | NULL | NULL

Time Travel
------------------------
-----------
Shall we add a section for SELECT support before Time travel? We can explain what type of Iceberg queries are supported by Presto there, e.g. read optimized, delta, real-time. We can also explain the support of COW and MOR tables there, or before talking about table properties
@yingsu00 I have added the `SELECT` section in the `SQL Support` section. Currently, read support exists for both Iceberg format version 1 and version 2, but with limitations on version 2 tables where delete files are present. I have mentioned that limitation as well. Please let me know if we should add any further details there.
b75bc4c to b94c752
Thanks, @tdcmeehan & @yingsu00 for your review. I have amended the changes as mentioned & replied to a few. Please take a pass on the changes.
Looks great! I found a couple of small suggestions that I made comments for.
b94c752 to 2fc8f8b
Mostly looks good. But will you be able to check whether double back quotation marks should be used in many places? For example, `rollback_to_snapshot` was only single-quoted, and it's not highlighted in my browser.
@@ -11,12 +11,17 @@ The Iceberg connector allows querying data stored in Iceberg tables.

It is recommended to use Iceberg 0.9.0 or later. |
I'm not sure how a user can choose the Iceberg version with Presto?
Yes, that's right. Users will not have any control over this from the Presto engine side. The current Iceberg version in Presto is `1.3.1`. As I checked, the Presto Iceberg connector started from `0.9.0`, so this might have been highlighted there for other engines.
0.9.0 was the version we tested when we developed the connector. During that time, we ran some tests, such as using Presto to query iceberg tables (0.9.0) written by Spark. The version information is a bit outdated now. We probably could safely remove this note.
``etc/catalog/iceberg.properties`` with the following contents,
replacing the properties as appropriate:
``etc/catalog/iceberg.properties``. To define catalog type ``iceberg.catalog.type`` property is
required along with the following contents, replacing the properties as appropriate: |
To define the catalog type, iceberg.catalog.type property is ...
, replacing the properties as appropriate:
->
, with the properties' values properly replaced:
@tdcmeehan do you have better recommendations?
with the properties' values replaced as follows:
.. note::
To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property, even |
hive.metastore.uri was not quoted
.. note::
To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property, even |
To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property,
-->
To use Hadoop catalog with Iceberg table, it requires the hive.metastore.uri property,
To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a |
I don't quite understand "You can provide a placeholder with `hive.metastore.uri=thrift://localhost:9083`". What is the placeholder? Does thrift://localhost:9083 have to be exactly the same? And how will it be fixed? It'll be nice if you can explain it better.
I have put up details about the issue here: #19920
Currently, since the `hive.metastore.uri` property cannot be null when using `Hadoop` as a catalog in the Iceberg connector, we have to specify some placeholder for the `hive.metastore.uri` property (it doesn't necessarily have to be `thrift://localhost:9083`). This has to be fixed, since the `Hadoop` catalog doesn't require any HMS for metadata.
I'm open to suggestions on how we want to add a note here about this; we could also add this issue link in the document until this is fixed?
The Presto metadata store and Iceberg metadata store are two independent designs but they are coupled in the Presto-Iceberg connector, which, as a consequence, causes the dependency on the Hive metastore although the Iceberg Hadoop catalog doesn't use that.
Adding an issue link here looks good to me. Or add it to the section Iceberg Connector Limitations. Your call.
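For context, here is a sketch of a Hadoop-catalog configuration with the placeholder metastore URI being discussed; the warehouse path is illustrative and the `iceberg.catalog.warehouse` property name is assumed from the connector's Hadoop catalog settings:

```properties
# etc/catalog/iceberg.properties -- sketch of a Hadoop catalog configuration
connector.name=iceberg
iceberg.catalog.type=hadoop
# Warehouse root path for the Hadoop catalog (illustrative location)
iceberg.catalog.warehouse=hdfs://namenode:8020/warehouse
# Placeholder only: the Hadoop catalog does not use the metastore,
# but the property currently cannot be left unset (see #19920)
hive.metastore.uri=thrift://localhost:9083
```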
)
WITH (
format = 'ORC',
partitioning = ARRAY['ds', 'country'] |
Do we support partition column transformations? If yes, let's add a sub section here. If not, then add a line saying these are not supported yet.
Yes, we do. I will add the section for it.
The Presto Iceberg connector currently already supports the Bucket & Truncate transforms with partition columns in Iceberg tables. For Day, Month, Year, Hour, I created issue #20570.
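As a sketch of what the Bucket and Truncate transforms look like in a table definition, assuming the `bucket(column, N)` / `truncate(column, W)` spelling inside the `partitioning` array; table and column names are illustrative:

```sql
-- Sketch: partition an Iceberg table with the bucket and truncate transforms.
CREATE TABLE iceberg.testdb.customers_example (
    customer_id bigint,
    name varchar,
    country varchar
)
WITH (
    format = 'PARQUET',
    -- bucket(customer_id, 8): hash customer_id into 8 buckets
    -- truncate(name, 2): partition by the first 2 characters of name
    partitioning = ARRAY['bucket(customer_id, 8)', 'truncate(name, 2)']
);
```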
.. note::
Iceberg Tables with format version 2 where delete files are present are not filtered in ``SELECT`` |
Does it mean it gives wrong results? "are not filtered in" is kind of vague here. Can you use more precise words?
Iceberg Tables with format version 2 where delete files are present are not filtered in SELECT operations in Presto, and which will produce wrong results without filtering.
-->
The SELECT operations on Iceberg Tables with format version 2 do not read the delete files and remove the deleted rows as of now.
Please update the last section too.
zipcode | varchar | |
(5 rows)

We can also rename the new column to `location`: |
to `location`. -> to another name, e.g. `location`.
I feel just "location" would likely be confused with the table property "location".
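A sketch of the rename the example walks through, assuming the `page_views` table in the `web` schema and whatever target name the doc settles on:

```sql
-- Sketch: rename the previously added zipcode column on an Iceberg table.
ALTER TABLE iceberg.web.page_views RENAME COLUMN zipcode TO location;
```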
Time Travel
------------------------
-----------

Iceberg and Presto Iceberg connector supports time travel via table snapshots
identified by unique snapshot IDs. The snapshot IDs are stored in the `$snapshots` |
`$snapshots` was not highlighted. Should it be double quoted?
-----------------------------

* :doc:`/sql/delete` is only supported if the ``WHERE`` clause matches entire partitions.
* Iceberg Tables using v2 of the Iceberg specification where delete files are present are |
Explain better here too
Sure. As I checked, all the metadata table accesses require double quotes, but in procedure calls, for example `rollback_to_snapshot`, the database & table names require single quotes.
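To illustrate the quoting difference, here is a sketch where the schema/table names and the snapshot ID are placeholders:

```sql
-- Metadata tables are addressed with double quotes around the table$suffix name.
SELECT snapshot_id, committed_at
FROM "ctas_nation$snapshots"
ORDER BY committed_at;

-- Procedure calls take the schema and table names as single-quoted strings.
CALL iceberg.system.rollback_to_snapshot('testdb', 'ctas_nation', 2873264610782060476);
```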
@agrawalreetika Thanks for your nice work!
I just left some minor comments/explanations. Hope these could be helpful for your work.
@@ -11,12 +11,17 @@ The Iceberg connector allows querying data stored in Iceberg tables.

It is recommended to use Iceberg 0.9.0 or later. |
0.9.0 was the version we tested when we developed the connector. During that time, we ran some tests, such as using Presto to query iceberg tables (0.9.0) written by Spark. The version information is a bit outdated now. We probably could safely remove this note.
To use Hadoop as catalog with Iceberg table requires the hive.metastore.uri property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a |
The Presto metadata store and Iceberg metadata store are two independent designs but they are coupled in the Presto-Iceberg connector, which, as a consequence, causes the dependency on the Hive metastore although the Iceberg Hadoop catalog doesn't use that.
Adding an issue link here looks good to me. Or add it to the section Iceberg Connector Limitations. Your call.
``LZ4``, and ``ZSTD``.

``iceberg.catalog.type`` The catalog type for Iceberg tables. The available values ``hive``
are ``hive``, ``hadoop``, and ``nessie`` corresponding to |
And `glue`?
For glue, we need to set `iceberg.catalog.type=hive`. Details of each catalog property are also mentioned in the respective catalog section.
thanks for the clarification.
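For readers following along, a sketch of a Glue-backed configuration under that convention; the `hive.metastore=glue` line is an assumption based on the Hive connector's Glue support, not something stated in this thread:

```properties
# etc/catalog/iceberg.properties -- sketch for an AWS Glue-backed catalog
connector.name=iceberg
# Glue reuses the hive catalog type, as noted above
iceberg.catalog.type=hive
# Assumption: route metastore calls to AWS Glue through the Hive metastore setting
hive.metastore=glue
```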
2fc8f8b to 3dd38a0
Thank you @yingsu00 @tdcmeehan @ChunxuTang & @steveburnett for your review. I have made changes as per the comments. Please take a pass to see if the details look good.
@@ -48,8 +49,16 @@ Nessie catalog
^^^^^^^^^^^^^^

In order to use a Nessie catalog, ensure to configure the catalog type with |
Maybe this line could be rephrased to be shorter without changing the meaning? Consider whether "To use a Nessie catalog, configure the catalog type as" still means what you want it to mean here.
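A sketch of what such a Nessie configuration could look like; the `iceberg.nessie.uri` and `iceberg.nessie.ref` property names and the endpoint are assumptions for illustration:

```properties
# etc/catalog/iceberg.properties -- sketch for a Nessie catalog (property names assumed)
connector.name=iceberg
iceberg.catalog.type=nessie
# Assumed Nessie REST endpoint and branch reference
iceberg.nessie.uri=http://localhost:19120/api/v1
iceberg.nessie.ref=main
```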
e104d1f to 4b4d325
@agrawalreetika Some minor modifications in line...
To configure the Iceberg connector, create a catalog properties file
``etc/catalog/iceberg.properties`` with the following contents,
replacing the properties as appropriate:
``etc/catalog/iceberg.properties``. To define catalog type ``iceberg.catalog.type`` property is |
To define catalog type ...
->
To define the catalog type, ...
Note the comma needs to be there
filesystem, but it still requires a central place to find the current location of the
current metadata pointer for a table. This central place is called the ``Iceberg Catalog``.
The Presto Iceberg connector supports different types of Iceberg Catalogs : ``Hive Metastore``,
``GLUE``, ``NESSIE``, ``HADOOP`` |
`NESSIE`, `HADOOP`
-->
`NESSIE`, and `HADOOP`.
.. note::

To use Hadoop catalog with Iceberg table, it requires the ``hive.metastore.uri`` property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide |
it doesn't rely on metastore for metadata when using Hadoop Catalog.
-->
it doesn't rely on the metastore to provide the actual URI when using the Hadoop Catalog.
To use Hadoop catalog with Iceberg table, it requires the ``hive.metastore.uri`` property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a |
You can provide a placeholder with ...
-->
You can provide any value to this property, for example, `hive.metastore.uri=thrift://localhost:9083`.
-----------

The Iceberg connector supports querying and manipulating Iceberg tables and schemas
(databases). Here are the examples of SQL Operations supported from Presto : |
Here are the examples of SQL Operations supported from Presto :
-->
Here are some examples of the SQL operations supported by Presto :
* ``Bucket`` (partitions data into a specified number of buckets using a hash function)
* ``Truncate`` (partitions the table based on the truncated value of the field, and can specify the width of truncated value)

Create a Iceberg table partitioned into 8 buckets of equal sized ranges:: |
a Iceberg
-->
an Iceberg
Replace the other 4 places too.
^^^^^^^^^^^^^

Create a new Iceberg table named ``page_views`` in the ``web`` schema
that is stored using the ORC file format, partitioned by date and |
partitioned by date and country
--->
partitioned by `ds` and `country`
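For readers without the diff in front of them, a sketch of the `page_views` table the comment refers to; the column list is an assumption reconstructed from the surrounding examples, while the `format` and `partitioning` values come from the quoted diff:

```sql
-- Sketch: the page_views example table, stored as ORC and partitioned by ds and country.
CREATE TABLE iceberg.web.page_views (
    view_time timestamp,
    user_id bigint,
    page_url varchar,
    ds date,
    country varchar
)
WITH (
    format = 'ORC',
    partitioning = ARRAY['ds', 'country']
);
```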
.. note::
``Day``, ``Month``, ``Year``, ``Hour`` based partition column transform is not supported in Presto |
`Day`, `Month`, `Year`, `Hour` based partition column transform is not supported in Presto Iceberg connector
-->
`Day`, `Month`, `Year`, `Hour` partition column transform functions are not supported in Presto Iceberg connector yet
To use Hadoop catalog with Iceberg table, it requires the ``hive.metastore.uri`` property, even
though it doesn't rely on metastore for metadata when using Hadoop Catalog. You can provide
a placeholder with ``hive.metastore.uri=thrift://localhost:9083``. This will be fixed in a
future release of Presto (:issue:`20579`). |
@agrawalreetika Can you please double check if this issue display is correct? It shows up as this picture in my browser
.. note::
Iceberg Tables with format version 2 where delete files are present are not filtered in ``SELECT`` |
Iceberg Tables with format version 2 where delete files are present are not filtered in SELECT operations in Presto, and which will produce wrong results without filtering.
-->
The SELECT operations on Iceberg Tables with format version 2 do not read the delete files and remove the deleted rows as of now.
Please update the last section too.
4b4d325 to 9e9a9f6
@agrawalreetika Reetika, thanks for updating the PR so quickly. There are still some minor English grammar issues. Since I'm not a native speaker, I may not be able to identify them all. Will you be able to use some online grammar checker to double check the doc? @tdcmeehan will you be able to give it a final review as well?
-------------
Metastores
-----------
Iceberg tables stores most of the metadata in the metadata files along with the data on the |
Iceberg tables stores
-->
Iceberg tables store
To configure the Iceberg connector, create a catalog properties file
``etc/catalog/iceberg.properties`` with the following contents,
replacing the properties as appropriate:
``etc/catalog/iceberg.properties``. To define catalog type, ``iceberg.catalog.type`` property is |
To define catalog type,
-->
To define the catalog type,
Hadoop catalog
^^^^^^^^^^^^^^
Iceberg connector supports Hadoop as Catalog |
Iceberg connector supports Hadoop as Catalog
-->
Iceberg connector supports Hadoop catalog
Sure, let me go through it again and see what all I can find. These could probably be there since some of the portions were copy-pasted from the existing document and I might have missed checking those.
9e9a9f6 to d9015e6
@yingsu00 @tdcmeehan, I went through the full document to verify English grammar-related issues as Ying mentioned. I found a few which I have updated. Please review the changes.
Not gonna let the perfect be the enemy of the good. This is a fantastic next step to our documentation and I think we can merge ASAP. Thanks @agrawalreetika !
Description
Refactor existing & add new details in the Iceberg document.
Motivation and Context
Refactor existing Iceberg document & also add new details to it
Resolves #20543
Impact
Provide more concrete details to users about available features in the Iceberg connector.
Existing Document -
After PR changes -
Test Plan
Contributor checklist
Release Notes