Upgrade Iceberg 1.7.0 #442

Open
wants to merge 7 commits into
base: main
Changes from 3 commits
1 change: 1 addition & 0 deletions LICENSE
@@ -277,6 +277,7 @@ commons-collections:commons-collections
commons-io:commons-io
commons-logging:commons-logging
commons-net:commons-net
+dev.failsafe:failsafe
Contributor

Question: is this a new transitive dependency brought in by Iceberg 1.7?

Author

Yup, it was added as part of the AWS S3InputStream retry handling in https://github.com/apache/iceberg/pull/10433/files

io.airlift:aircompressor
io.dropwizard.logback:logback-throttling-appender
io.dropwizard.metrics:metrics-annotation
2 changes: 1 addition & 1 deletion gradle/libs.versions.toml
@@ -19,7 +19,7 @@

[versions]
hadoop = "3.4.0"
-iceberg = "1.6.1"
+iceberg = "1.7.0"
dropwizard = "4.0.8"
slf4j = "2.0.13"
swagger = "1.6.14"

This file was deleted.

@@ -47,7 +47,6 @@
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableMetadataParser;
import org.apache.iceberg.TableOperations;
-import org.apache.iceberg.aws.s3.S3FileIOProperties;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.SupportsNamespaces;
import org.apache.iceberg.catalog.TableIdentifier;
@@ -99,7 +98,6 @@
import org.apache.polaris.core.storage.PolarisStorageConfigurationInfo;
import org.apache.polaris.core.storage.PolarisStorageIntegration;
import org.apache.polaris.core.storage.StorageLocation;
-import org.apache.polaris.core.storage.aws.PolarisS3FileIOClientFactory;
import org.apache.polaris.service.catalog.io.FileIOFactory;
import org.apache.polaris.service.exception.IcebergExceptionMapper;
import org.apache.polaris.service.task.TaskExecutor;
@@ -2055,8 +2053,6 @@ private List<TableIdentifier> listTableLike(PolarisEntitySubType subType, Namesp
*/
private FileIO loadFileIO(String ioImpl, Map<String, String> properties) {
Map<String, String> propertiesWithS3CustomizedClientFactory = new HashMap<>(properties);
-propertiesWithS3CustomizedClientFactory.put(
-    S3FileIOProperties.CLIENT_FACTORY, PolarisS3FileIOClientFactory.class.getName());
Comment on lines -2058 to -2059
Author

No longer required because of https://github.com/apache/iceberg/pull/11259/files

Author

We might have to mark this deprecated first, since it's a public class.

Contributor

I think it was only ever used for this purpose, so we're probably fine
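
If a deprecation cycle were preferred over outright removal, the usual Java pattern looks roughly like the sketch below. The class name `LegacyClientFactory`, its `initialize` method, and the `since` value are hypothetical stand-ins chosen for illustration; none of this is Polaris code.

```java
import java.util.Map;

public class DeprecationSketch {

  /**
   * Hypothetical stand-in for a public class that is no longer needed.
   *
   * @deprecated kept for one release so external callers see a compiler warning before removal.
   */
  @Deprecated(since = "hypothetical-version", forRemoval = true)
  public static class LegacyClientFactory {
    public void initialize(Map<String, String> properties) {
      // Intentionally empty in this sketch.
    }
  }

  /** Returns true when the given class is annotated @Deprecated with forRemoval = true. */
  public static boolean isScheduledForRemoval(Class<?> type) {
    Deprecated marker = type.getAnnotation(Deprecated.class);
    return marker != null && marker.forRemoval();
  }

  public static void main(String[] args) {
    System.out.println(isScheduledForRemoval(LegacyClientFactory.class));
    System.out.println(isScheduledForRemoval(String.class));
  }
}
```

Whether the extra release is worth it depends on whether anything outside the project references the class, which is the point made in the reply above.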

return fileIOFactory.loadFileIO(ioImpl, propertiesWithS3CustomizedClientFactory);
}

@@ -21,21 +21,26 @@
import static org.apache.polaris.service.catalog.AccessDelegationMode.VENDED_CREDENTIALS;

import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.core.SecurityContext;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
+import java.util.Set;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.BadRequestException;
import org.apache.iceberg.exceptions.NotAuthorizedException;
import org.apache.iceberg.exceptions.NotFoundException;
+import org.apache.iceberg.rest.Endpoint;
import org.apache.iceberg.rest.RESTUtil;
+import org.apache.iceberg.rest.ResourcePaths;
import org.apache.iceberg.rest.requests.CommitTransactionRequest;
import org.apache.iceberg.rest.requests.CreateNamespaceRequest;
import org.apache.iceberg.rest.requests.CreateTableRequest;
@@ -71,6 +76,38 @@
public class IcebergCatalogAdapter
implements IcebergRestCatalogApiService, IcebergRestConfigurationApiService {

+private static final Set<Endpoint> DEFAULT_ENDPOINTS =
+    ImmutableSet.<Endpoint>builder()
+        .add(Endpoint.V1_LIST_NAMESPACES)
+        .add(Endpoint.V1_LOAD_NAMESPACE)
+        .add(Endpoint.V1_CREATE_NAMESPACE)
+        .add(Endpoint.V1_UPDATE_NAMESPACE)
+        .add(Endpoint.V1_DELETE_NAMESPACE)
+        .add(Endpoint.V1_LIST_TABLES)
+        .add(Endpoint.V1_LOAD_TABLE)
+        .add(Endpoint.V1_CREATE_TABLE)
+        .add(Endpoint.V1_UPDATE_TABLE)
+        .add(Endpoint.V1_DELETE_TABLE)
+        .add(Endpoint.V1_RENAME_TABLE)
+        .add(Endpoint.V1_REGISTER_TABLE)
+        .add(Endpoint.V1_REPORT_METRICS)
+        .build();
+
+private static final Set<Endpoint> VIEW_ENDPOINTS =
+    ImmutableSet.<Endpoint>builder()
+        .add(Endpoint.V1_LIST_VIEWS)
+        .add(Endpoint.V1_LOAD_VIEW)
+        .add(Endpoint.V1_CREATE_VIEW)
+        .add(Endpoint.V1_UPDATE_VIEW)
+        .add(Endpoint.V1_DELETE_VIEW)
+        .add(Endpoint.V1_RENAME_VIEW)
+        .build();
+
+private static final Set<Endpoint> COMMIT_ENDPOINT =
+    ImmutableSet.<Endpoint>builder()
+        .add(Endpoint.create("POST", ResourcePaths.V1_TRANSACTIONS_COMMIT))
+        .build();

private final CallContextCatalogFactory catalogFactory;
private final MetaStoreManagerFactory metaStoreManagerFactory;
private final RealmEntityManagerFactory entityManagerFactory;
@@ -466,6 +503,12 @@ public Response getConfig(String warehouse, SecurityContext securityContext) {
ConfigResponse.builder()
.withDefaults(properties) // catalog properties are defaults
.withOverrides(ImmutableMap.of("prefix", warehouse))
+.withEndpoints(
+    ImmutableList.<Endpoint>builder()
+        .addAll(DEFAULT_ENDPOINTS)
+        .addAll(VIEW_ENDPOINTS)
+        .addAll(COMMIT_ENDPOINT)
+        .build())
.build())
.build();
}
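
The hunk above has the config response advertise three groups of endpoints. As a rough illustration of the idea, the sketch below merges such groups into one collection and lets a caller check whether a route was advertised before using it. It is self-contained and does not use Iceberg: the `Endpoint` record is a hypothetical stand-in for `org.apache.iceberg.rest.Endpoint`, the path templates are assumptions made for the example, and `advertisedEndpoints` and `supports` are invented names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class EndpointAdvertisementSketch {

  /** Hypothetical stand-in for an endpoint: an HTTP method plus a path template. */
  record Endpoint(String httpMethod, String path) {
    static Endpoint create(String httpMethod, String path) {
      return new Endpoint(httpMethod, path);
    }
  }

  // Illustrative subsets only; the path templates are assumptions made for this sketch.
  static final Set<Endpoint> DEFAULT_ENDPOINTS =
      Set.of(
          Endpoint.create("GET", "/v1/{prefix}/namespaces"),
          Endpoint.create("POST", "/v1/{prefix}/namespaces"));

  static final Set<Endpoint> VIEW_ENDPOINTS =
      Set.of(Endpoint.create("GET", "/v1/{prefix}/namespaces/{namespace}/views"));

  static final Set<Endpoint> COMMIT_ENDPOINT =
      Set.of(Endpoint.create("POST", "/v1/{prefix}/transactions/commit"));

  /** Server side: merge the groups into the single collection returned with the config. */
  static Set<Endpoint> advertisedEndpoints() {
    Set<Endpoint> all = new LinkedHashSet<>();
    all.addAll(DEFAULT_ENDPOINTS);
    all.addAll(VIEW_ENDPOINTS);
    all.addAll(COMMIT_ENDPOINT);
    return all;
  }

  /** Client side: attempt an operation only when the server advertised it. */
  static boolean supports(Set<Endpoint> advertised, Endpoint endpoint) {
    return advertised.contains(endpoint);
  }

  public static void main(String[] args) {
    Set<Endpoint> advertised = advertisedEndpoints();
    System.out.println(
        supports(advertised, Endpoint.create("POST", "/v1/{prefix}/transactions/commit")));
    System.out.println(
        supports(advertised, Endpoint.create("DELETE", "/v1/{prefix}/namespaces/{namespace}")));
  }
}
```

Keeping the groups as separate constants, as the diff does, makes it straightforward to drop one group (for example the view routes) from the advertised list later.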
@@ -341,7 +341,7 @@ public void testIcebergListNamespacesNestedNotFound() throws IOException {
sessionCatalog.listNamespaces(
sessionContext, Namespace.of("top_level", "whoops")))
.isInstanceOf(NoSuchNamespaceException.class)
-.hasMessage("Namespace does not exist: top_level.whoops");
+.hasMessage("Namespace does not exist: top_level%1Fwhoops");
Author

Checking this.

Author

Found this change: apache/iceberg@5fc1413. It looks like the separator was changed;
making it configurable is pending in apache/iceberg#10877.

Member

There is no longer any intent to make the separator configurable, at least as far as I know. There was a big discussion on this topic in August this year.
IMHO, making the separator configurable would make things overly complicated.

Contributor

Is this what the error message returns now? I think the dot in the error is much more user-friendly.

Contributor

This is interesting, as Iceberg still uses a dot to join the levels of a namespace: https://github.com/apache/iceberg/blob/09634857e4a1333f5dc742d1dca3921e9a9f62dd/api/src/main/java/org/apache/iceberg/catalog/Namespace.java#L97-L97.
In this case, Namespace.of("top_level", "whoops") should be rendered as top_level.whoops, unless the REST util somehow squashes the two levels together with "%1F".
@singhpk234, could you take a look?
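
To make the two renderings discussed in this thread concrete, here is a minimal sketch using only the JDK. It assumes that the REST path form URL-encodes each namespace level and joins the levels with `%1F` (the encoded ASCII unit separator), which is what the changed test expectation suggests. The class and method names are hypothetical, and this is not Iceberg's `RESTUtil` implementation.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NamespaceSeparatorSketch {

  // URL-encoded form of the ASCII unit separator (0x1F).
  static final String ESCAPED_SEPARATOR = "%1F";

  /** Human-readable form: levels joined with a dot. */
  static String display(String... levels) {
    return String.join(".", levels);
  }

  /** Path form (assumed for this sketch): each level URL-encoded, joined with %1F. */
  static String encodeForPath(String... levels) {
    List<String> encoded = new ArrayList<>();
    for (String level : levels) {
      encoded.add(URLEncoder.encode(level, StandardCharsets.UTF_8));
    }
    return String.join(ESCAPED_SEPARATOR, encoded);
  }

  /** Reverses encodeForPath: split on %1F, then URL-decode each level. */
  static String[] decodeFromPath(String encoded) {
    String[] parts = encoded.split(ESCAPED_SEPARATOR);
    String[] levels = new String[parts.length];
    for (int i = 0; i < parts.length; i++) {
      levels[i] = URLDecoder.decode(parts[i], StandardCharsets.UTF_8);
    }
    return levels;
  }

  public static void main(String[] args) {
    System.out.println(display("top_level", "whoops")); // top_level.whoops
    String path = encodeForPath("top_level", "whoops");
    System.out.println(path); // top_level%1Fwhoops
    // Decoding the path form before building a message restores the dotted rendering.
    System.out.println(display(decodeFromPath(path))); // top_level.whoops
  }
}
```

Under that assumption, an error message containing `top_level%1Fwhoops` would mean the path form reached the message without being decoded back into separate levels first.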

Author

@singhpk234, Nov 17, 2024

Oops, looks like it was a style miss: https://github.com/apache/polaris/actions/runs/11859822454/job/33055322598
Here are the local results:

[Screenshot 2024-11-16 at 5 47 43 PM]

error: org.apache.iceberg.exceptions.NoSuchNamespaceException: Namespace does not exist: ns1?ns1a

Contributor

It seems the revert PR (apache/iceberg#11574) is targeting Iceberg 1.7.1. Should we wait for the 1.7.1 release to avoid unnecessary back-and-forth?

Author

+1, I think it's fair to wait, as we are not chasing a release date in Polaris and 1.7.1 should be quick; I saw an RC has already been cut: apache/iceberg#11593

Author

Let me see: once the PR is merged and a nightly artifact is published, I can try reverting my test change and running our Polaris suite against it. WDYT?

Author

@singhpk234, Nov 21, 2024

Can confirm it's passing now. I have pointed to the 1.8.0 nightly artifact (I can't find 1.7.1 in the nightlies) and will change this to 1.7.1 when it gets released.

}
}

2 changes: 1 addition & 1 deletion regtests/setup.sh
@@ -31,7 +31,7 @@ if [ -z "${SPARK_HOME}" ]; then
fi
SPARK_CONF="${SPARK_HOME}/conf/spark-defaults.conf"
DERBY_HOME="/tmp/derby"
-ICEBERG_VERSION="1.6.1"
+ICEBERG_VERSION="1.7.0"
export PYTHONPATH="${SPARK_HOME}/python/:${SPARK_HOME}/python/lib/py4j-0.10.9.7-src.zip:$PYTHONPATH"

# Ensure binaries are downloaded locally
2 changes: 1 addition & 1 deletion regtests/t_pyspark/src/iceberg_spark.py
@@ -72,7 +72,7 @@ def __enter__(self):
"""Initial method for Iceberg Spark session. Creates a Spark session with specified configs.
"""
packages = [
-"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1",
+"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0",
"org.apache.hadoop:hadoop-aws:3.4.0",
"software.amazon.awssdk:bundle:2.23.19",
"software.amazon.awssdk:url-connection-client:2.23.19",