Describe the bug
I am encountering an OutOfMemoryError when attempting to read a large Cloud Optimized GeoTIFF (COG) file (2.4GB in size) using the mosaic.read() method in an Azure Databricks environment. The error occurs during the execution of df.show() after reading the file.
from databricks import mosaic
mosaic.enable_mosaic(spark, dbutils)
file_path = "[path to file]"
df = mosaic.read().format("raster_to_grid") \
    .option("driverName", "GTiff") \
    .option("fileExtension", "*.tif") \
    .load(f"file://{file_path}")
df.show()
Expected behavior
The expectation is to successfully read the COG file into a DataFrame and display it using df.show() without encountering memory issues.
Additional Context
The COG file being read is 2.4GB in size.
This issue occurs consistently with this file size.
@Thimm Thank you for reporting this issue.
It will be resolved in the next release.
There was a bug that caused retiling of a large file to happen at a deferred stage rather than immediately on read.
Spark buffers do not support binaries larger than 2 GB, so on read we have to retile the file into tiles smaller than 2 GB and then perform transformations on those.
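For context, that 2 GB ceiling comes from Spark storing a binary column as a single Java byte array, whose length tops out at 2^31 - 1. A minimal sketch of the tiling arithmetic (a hypothetical helper for illustration, not Mosaic's actual retiling code):

```python
# Sketch: choose a tile grid so each uncompressed tile stays under
# Spark's ~2 GiB binary-column limit (the Java byte[] maximum).
# Illustrative arithmetic only, not Mosaic's implementation.
import math

MAX_BYTES = 2**31 - 1  # largest byte[] a Spark binary column can hold

def tiles_needed(width, height, bands, bytes_per_sample, max_bytes=MAX_BYTES):
    """Return an approximate (tiles_x, tiles_y) grid keeping tiles below max_bytes."""
    total = width * height * bands * bytes_per_sample
    n = math.ceil(total / max_bytes)      # minimum number of tiles
    side = math.ceil(math.sqrt(n))        # split roughly square
    return side, math.ceil(n / side)

# Example: a 40000 x 40000, 4-band, uint16 raster (~12.8 GB raw)
print(tiles_needed(40000, 40000, 4, 2))  # → (3, 2)
```

Real retiling also has to round window edges to whole pixels and account for compression, so implementations pad the count rather than cutting this close to the limit.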
I will be opening a PR today, and this will be part of the next release.
I ran the provided file on my local machine with the new fix without any issues, using Docker with Rosetta translation since I am on an M1 Mac; even with those constraints it runs now.
The next release should be out within a couple of weeks.
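Until the release lands, one possible stopgap is to pre-tile the file yourself with GDAL's `gdal_translate -srcwin` and point the reader at the smaller tiles. A sketch that only builds the commands (it assumes the GDAL CLI is available; the file names and grid size are placeholders, and the raster dimensions must be supplied by the caller):

```python
# Sketch: build gdal_translate commands that cut a large GeoTIFF into
# an n x n grid of sub-windows, each written as a smaller file.
# Assumes the GDAL CLI is installed; execute the commands via subprocess.

def retile_commands(src, dst_prefix, width, height, n=2):
    """Yield gdal_translate command lists for an n x n grid of windows."""
    tw, th = -(-width // n), -(-height // n)   # ceiling division
    for j in range(n):
        for i in range(n):
            xoff, yoff = i * tw, j * th
            xsize = min(tw, width - xoff)      # clip the last column/row
            ysize = min(th, height - yoff)
            yield [
                "gdal_translate",
                "-srcwin", str(xoff), str(yoff), str(xsize), str(ysize),
                src, f"{dst_prefix}_{j}_{i}.tif",
            ]

for cmd in retile_commands("big.tif", "tile", 50000, 50000, n=2):
    print(" ".join(cmd))
```

Each resulting tile is an independent GeoTIFF well under the 2 GB binary limit, so the existing reader can ingest them without hitting the deferred-retiling bug.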
To Reproduce
dbfs
Environment
Databricks Runtime Version: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
Cluster Configuration: Standard_D32ads_v5 (128 GB memory, 32 cores)
Language: Python
Traceback.txt