-
-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for "files"-archive #761
Conversation
https://github.com/onthegomap/planetiler/actions/runs/7391179713 ℹ️ Base Logs 389ccab
ℹ️ This Branch Logs adabc78
|
f1cf90e
to
0acaca5
Compare
static Path relativePathFromTileCoord(TileCoord tc) { | ||
return Paths.get(Integer.toString(tc.z()), Integer.toString(tc.x()), | ||
tc.y() + PBF_FILE_ENDING); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the path format should be configurable? Or at least comparing to tilelive-copy, it has a "safe" mode that breaks x/y up into 2 levels so each directory has <1000 children.
https://github.com/mapbox/tilelive-file/blob/master/lib/tilelive-file.js#L40-L42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for custom "tile_scheme". Supported templates are {x}
, {y}
, {z}
as well as {xs}
and {ys}
(s=safe). {xs}
and {ys}
are split into 2 levels.
{z}/{x}/{y}.pbf => 3/1/2.pbf
{x}/{y}/{z}.pbf => 1/2/3.pbf
{x}-{y}-{z}.pbf => 1-2-3.pbf
{x}/a/{y}/b{z}.pbf => 1/a/2/b3.pbf
{z}/{x}/{y}.pbf.gz => 3/1/2.pbf.gz
{z}/{xs}/{ys}.pbf => 3/000/001/000/002.pbf
{z}/{x}/{ys}.pbf => 3/1/000/002.pbf
{z}/{xs}/{y}.pbf => 3/000/001/2.pbf
@Override | ||
public TileArchiveMetadata metadata() { | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tilelive-file stores the metadata as metadata.json
in the root of the archive - could we do the same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
planetiler-core/src/main/java/com/onthegomap/planetiler/files/WriteableFilesArchive.java
Outdated
Show resolved
Hide resolved
planetiler-core/src/test/java/com/onthegomap/planetiler/archive/TileArchiveConfigTest.java
Show resolved
Hide resolved
} | ||
lastCheckedFolder = folder; | ||
try { | ||
Files.write(file, encodingResult.tileData()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encodingResult.tileData()
is gzipped here. We should probably make it configurable whether or not the output files should be gzipped (I think default to false?). If they aren't gzipped we can skip the gzip step while encoding, and if they are gzipped the output extension should probably be .pbf.gz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output is not necessarily gzipped. This is already configurable at a higher level: tile_compression. So I don't think anything needs to be handled here specifically at the archive level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right I forgot that was there... I wonder if when tile compression is gzip if it should add .gz
suffix or if it's better for tiles to be at known locations on disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer the tiles to be at a known location by default. With the introduction of the tile_scheme param users can adjust as needed.
2e4652e
Regarding performance, this is definitely going to be slower than a single-file archive but I wonder if there's any performance difference to These could also be separate followup issues, but I noticed:
|
Addressed those points here: 5371520
|
Good question/point. Need to investigate a bit on this. |
7fc6f64
to
b8cfa56
Compare
regarding performance: I wrote some benchmark (BenchmarkTileFilesWrites) that loads a mbtiles file into memory and then writes the tiles to files in different modes. See the benchmark results below from my machine for US (0<=z<=13). If I interpret things correctly, then there's no real benefit for virtual threads. At the same time async seems to be good compromise between absolute and CPU time. So I impletemented:
If you agree, then I'd remove the benchmark as well as the option to use virtual threads for writing, again.
changes are here: b8cfa56 |
Thanks for doing those tests! I saw similar results running this on a linux box with fast SSD - fastest seemed to be with 4 dedicated OS threads. Given these results, can we just remove the extra code paths and keep the configurable number of fixed threads you had initially since it's the fastest and simplest? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes! I'm trying to test it with the custom schemes but getting some errors:
My first attempt was to use --output=tiles/{z}/{x}-{y}.pbf.gz
- it would be cool if that worked but looks like it's not currently handled.
Then I tried --output=tiles?format=files&tile_scheme={z}/{x}/{y}.pbf.gz
but I get Illegal character in query at index 31: tiles?format=files&tile_scheme={z}/{x}/{y}.pbf.gz
.
To make it work I had to do --output=tiles&format=files&tile_scheme=%7Bz%7D%2F%7Bx%7D%2F%7By%7D.pbf.gz
which isn't super user-friendly.
Is there another way that I'm missing?
final boolean firstTileWriter = firstTileWriterTracker.compareAndExchange(true, false); | ||
if (firstTileWriter) { | ||
archive.initialize(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pull this out to run before the tile writers start like you did with finish archive? Just to make sure there's no race condition with other threads getting to the tile writing before initialize has finished?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, was thinking about it as well but thought not very likely.
Didn't find any good spot to move it to. I moved it here now:
planetiler/planetiler-core/src/main/java/com/onthegomap/planetiler/archive/TileArchiveWriter.java
Line 154 in 323f4c1
output.initialize(); |
Let me know if you have a better idea.
@@ -371,8 +387,6 @@ private void tileWriter(Iterable<TileBatch> tileBatches) throws ExecutionExcepti | |||
if (time != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove printStats above now that it happens in finishArchive
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure what you mean here, sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry I meant the tileWriter.printStats();
line - but I see now that's separate from archive.printStats();
It looks like nothing overrides archive.printStats() - might just be best to remove it?
planetiler-core/src/main/java/com/onthegomap/planetiler/archive/TileArchiveMetadataDeSer.java
Show resolved
Hide resolved
@@ -44,6 +44,10 @@ default void initialize() {} | |||
*/ | |||
default void finish(TileArchiveMetadata tileArchiveMetadata) {} | |||
|
|||
default void printStats() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we get rid of TileWriter
printStats as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's still a point in having printStats for a writer. And there's one usage
planetiler/planetiler-core/src/main/java/com/onthegomap/planetiler/mbtiles/Mbtiles.java
Line 804 in cbb092a
public void printStats() { |
Moving that to the archive's printStats feels strange. But we for sure could if you want, or log from #close in that specific case.
planetiler-core/src/main/java/com/onthegomap/planetiler/config/PlanetilerConfig.java
Outdated
Show resolved
Hide resolved
planetiler-core/src/main/java/com/onthegomap/planetiler/stats/ProcessInfo.java
Outdated
Show resolved
Hide resolved
planetiler-core/src/main/java/com/onthegomap/planetiler/stats/Timers.java
Outdated
Show resolved
Hide resolved
planetiler-core/src/main/java/com/onthegomap/planetiler/stream/WriteableStreamArchive.java
Outdated
Show resolved
Hide resolved
planetiler-core/src/main/java/com/onthegomap/planetiler/worker/Worker.java
Outdated
Show resolved
Hide resolved
@@ -44,6 +44,10 @@ default void initialize() {} | |||
*/ | |||
default void finish(TileArchiveMetadata tileArchiveMetadata) {} | |||
|
|||
default void printStats() {} | |||
|
|||
long bytesWritten(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this, I could do this in a separate PR as well but there's one remaining place where it gets the size of the archive which could take a very long time if you just wrote tiles for the planet: Stats.printSummary
- ideally it either uses the same output from this method, or we add a failsafe to FileUtils.size
that gives up after a certain number of files/time since that output duplicates what got printed earlier we won't miss much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done: 323f4c1
The other option - which I always use - is to use format pre-fixed args => I'll check if we can support your other approaches. |
Ah got it, that makes sense. Could we add a short javadoc to the major new classes (don't worry about all of them, the helper classes should be self-explanatory without any javadoc) and maybe include that cli arg snippet in the javadoc for readable/writeable files archive classes? The shorthand one would be a nice to have, but might be hard to disambiguate between that and writing to a file archive on disk? |
i.e. write individual pbf-files to disk in the format <base>/z/x/y.pbf in order to use that format it must be passed as "--ouput=/path/to/tiles?format=files" Fixes onthegomap#536
...if no explict format query param given, path ends with a slash, or no extension given
and refactor TileArchiveMetadata 1. put zoom into center (CoordinateXY->Coordinate) - in sync with mbtiles-format 2. add (De-)Serializer for Coordinate+Envelop => avoid duplication and cleaner 3. change the json and proto output for TileArchiveMetadata to be (more) in sync with mbtiles-format
{z}/{x}/{y}.pbf is the default and can be configured as needed - e.g.: - different order: {x}/{y}/{z}.pbf - with intermediate dirs: {x}/a/{y}/b/{z}.pbf - with different extension: {z}/{y}/{y}.pbf.gz instead of {x} and {y}, {xs} and {xy} can be used which breaks up x and y into 2 directories each and ensures that each directory has <1000 children
1. call finish archive only once after all writers are finished ...and not every time a writer finishes 2. log "zoom-progress" for the first tile write only (Finished z11 ... now starting z12) 3. remove file/dir-size progress logger bottleneck for files archive => each archive now reports the bytes written, which also fixes the issues of stream-archives reporting the size incorrectly 4. introduce printStats-hook on archive-level
...allow to use virtual threads ExecturService (bound only!) for tile writing also add some benchmark for writing tiles to disk: fixed, bound virtual, async, unbound virtual
This reverts commit b8cfa56.
- extract TileSchemeEncoding - use Counter.MultithreadCounter rather than LongAdder to count bytes written - add some JavaDoc
1. allow to pass tile scheme directly via output: --output=tiles/{x}/{y}/{z}.pbf 2. auto-encode { (%7B) and } (%7D) => no need to encode it the URI on CLI
c41b953
to
4fa3ed2
Compare
I added support for those shortcuts here: 7d0bc26 I also added javadoc where I thought it makes sense but let me know if you want more. |
1. use WriteableTileArchive#bytesWritten in summmary as well 2. call WriteableTileArchive#init in a safer manner ..and a few more adjustments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making those changes, it's looking good, a couple of minor comments...
planetiler-core/src/main/java/com/onthegomap/planetiler/archive/TileArchiveConfig.java
Show resolved
Hide resolved
@@ -371,8 +387,6 @@ private void tileWriter(Iterable<TileBatch> tileBatches) throws ExecutionExcepti | |||
if (time != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry I meant the tileWriter.printStats();
line - but I see now that's separate from archive.printStats();
It looks like nothing overrides archive.printStats() - might just be best to remove it?
planetiler-core/src/main/java/com/onthegomap/planetiler/files/FilesArchiveUtils.java
Outdated
Show resolved
Hide resolved
planetiler-core/src/main/java/com/onthegomap/planetiler/stats/Stats.java
Outdated
Show resolved
Hide resolved
planetiler-core/src/main/java/com/onthegomap/planetiler/stream/WriteableStreamArchive.java
Outdated
Show resolved
Hide resolved
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
Should be all set now. Let me know if there's anything else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you!!
i.e. write individual pbf-files to disk in the format BASE/z/x/y.pbf
in order to use that format it must be passed as "--ouput=/path/to/tiles?format=files"
Fixes #536
note: it also supports concurrent writing - just have to pass --tile_write_threads=5