Tile stats #656

Merged: 30 commits (Sep 22, 2023)

Changes from 27 commits

1 change: 1 addition & 0 deletions NOTICE.md
@@ -28,6 +28,7 @@ The `planetiler-core` module includes the following software:
- mil.nga.geopackage:geopackage (MIT license)
- org.snakeyaml:snakeyaml-engine (Apache license)
- org.commonmark:commonmark (BSD 2-clause license)
- org.tukaani:xz (public domain)
- Adapted code:
- `DouglasPeuckerSimplifier` from [JTS](https://github.com/locationtech/jts) (EDL)
- `OsmMultipolygon` from [imposm3](https://github.com/omniscale/imposm3) (Apache license)
175 changes: 175 additions & 0 deletions layerstats/README.md
@@ -0,0 +1,175 @@
Layer Stats
===========

This page describes how to generate and analyze layer stats data to find ways to optimize tile size.

### Generating Layer Stats

Run planetiler with `--output-layerstats` to generate an extra `<output>.layerstats.tsv.gz` file with a row per
tile layer that can be used to analyze tile sizes. You can also
get stats for an existing archive by
running:

```bash
java -jar planetiler.jar stats --input=<path to mbtiles or pmtiles file> --output=layerstats.tsv.gz
```

The output is a gzipped TSV with one row per layer per tile and the following columns:

| column | description |
|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| z | tile zoom |
| x | tile x |
| y | tile y |
| hilbert | tile hilbert ID (defines [pmtiles](https://protomaps.com/docs/pmtiles) order) |
| archived_tile_bytes | stored tile size (usually gzipped) |
| layer | layer name |
| layer_bytes | encoded size of this layer on this tile |
| layer_features | number of features in this layer |
| layer_attr_bytes | encoded size of the [attribute key/value pairs](https://github.com/mapbox/vector-tile-spec/tree/master/2.1#44-feature-attributes) in this layer |
| layer_attr_keys | number of distinct attribute keys in this layer on this tile |
| layer_attr_values | number of distinct attribute values in this layer on this tile |

### Analyzing Layer Stats

Load a layer stats file in [duckdb](https://duckdb.org/):

```sql
create table layerstats as select * from 'output.pmtiles.layerstats.tsv.gz';
```
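
If you want a quick sanity check before going further, DuckDB can describe the table it just created (a small sketch; no assumptions beyond the `layerstats` table defined above):

```sql
-- Confirm the columns were detected correctly and see how much data was loaded.
describe layerstats;
select count(*) num_rows, count(distinct layer) num_layers from layerstats;
```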

Then get the biggest layers:

```sql
select * from layerstats order by layer_bytes desc limit 2;
```

| z | x | y | hilbert | archived_tile_bytes | layer | layer_bytes | layer_features | layer_attr_bytes | layer_attr_keys | layer_attr_values |
|----|------|------|-----------|---------------------|----------|-------------|----------------|------------------|-----------------|-------------------|
| 14 | 6435 | 8361 | 219723809 | 679498 | building | 799971 | 18 | 68 | 2 | 19 |
| 14 | 6435 | 8364 | 219723850 | 603677 | building | 693563 | 18 | 75 | 3 | 19 |
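
The query above surfaces individual outlier tiles; a variation on it (just a sketch against the same table) totals each layer across the whole archive to show which layers contribute the most bytes overall:

```sql
-- Total encoded bytes per layer across all tiles, largest first.
select layer, round(sum(layer_bytes) / 1e6, 1) total_mb, count(*) tiles
from layerstats
group by layer
order by total_mb desc;
```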

To get a table of the biggest layers by zoom:

```sql
pivot (
select z, layer, (max(layer_bytes)/1000)::int size from layerstats group by z, layer order by z asc
) on z using sum(size);
```

| layer | 0 | 1 | 10 | 11 | 12 | 13 | 14 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|-----|-----|-----|
| boundary | 10 | 75 | 24 | 18 | 32 | 18 | 10 | 85 | 53 | 44 | 25 | 18 | 15 | 15 | 29 |
| landcover | 2 | 1 | 153 | 175 | 166 | 111 | 334 | 8 | 5 | 3 | 31 | 18 | 273 | 333 | 235 |
| place | 116 | 191 | 16 | 14 | 10 | 25 | 57 | 236 | 154 | 123 | 58 | 30 | 21 | 15 | 14 |
| water | 8 | 4 | 133 | 94 | 167 | 116 | 90 | 11 | 9 | 15 | 13 | 89 | 114 | 126 | 109 |
| water_name | 7 | 7 | 4 | 4 | 4 | 4 | 9 | 7 | 6 | 4 | 3 | 3 | 3 | 3 | 4 |
| waterway | | | 20 | 16 | 60 | 66 | 73 | | 1 | 4 | 2 | 18 | 13 | 10 | 28 |
| park | | | 90 | 56 | 48 | 19 | 50 | | | 53 | 135 | 89 | 75 | 68 | 82 |
| landuse | | | 176 | 132 | 66 | 140 | 52 | | | 3 | 2 | 33 | 67 | 95 | 107 |
| transportation | | | 165 | 95 | 312 | 187 | 133 | | | 60 | 103 | 61 | 126 | 287 | 284 |
| transportation_name | | | 30 | 18 | 65 | 59 | 169 | | | | | 32 | 20 | 18 | 13 |
| mountain_peak | | | 7 | 8 | 6 | 295 | 232 | | | | | | 8 | 7 | 9 |
| aerodrome_label | | | 4 | 4 | 4 | 4 | 4 | | | | | | | 4 | 4 |
| aeroway | | | 16 | 25 | 34 | 30 | 18 | | | | | | | | |
| poi | | | | | 22 | 10 | 542 | | | | | | | | |
| building | | | | | | 69 | 800 | | | | | | | | |
| housenumber | | | | | | | 413 | | | | | | | | |

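The pivot above uses the maximum layer size per zoom, which highlights worst-case tiles. Swapping `max` for `avg` (a sketch of the same query) shows whether a layer is consistently large or just has a few outlier tiles:

```sql
pivot (
  select z, layer, (avg(layer_bytes)/1000)::int size from layerstats group by z, layer order by z asc
) on z using sum(size);
```
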
To get the biggest tiles:

```sql
create table tilestats as select
z, x, y,
any_value(archived_tile_bytes) gzipped,
sum(layer_bytes) raw
from layerstats group by z, x, y;
select * from tilestats order by gzipped desc limit 2;
```

NOTE: this `group by` uses a lot of memory, so you need to run DuckDB in file-backed
mode (`duckdb analysis.duckdb`) rather than the default in-memory mode.
> **Review comment:** is there a one-liner to change the .tsv.gz into a .duckdb?
>
> **Author reply:** I think it would be
> `duckdb analysis.duckdb -cmd "CREATE TABLE layerstats AS SELECT * FROM 'output.pmtiles.layerstats.tsv.gz';"`
> to drop you into a shell after importing the file, or `-c "create ..."` to just create the file. Given the shortcut's not much shorter than the individual steps, I'm inclined to leave them as separate steps for clarity.


| z | x | y | gzipped | raw |
|----|------|------|---------|--------|
| 14 | 6435 | 8361 | 679498 | 974602 |
| 14 | 6437 | 8362 | 613512 | 883559 |
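
Since `tilestats` keeps both sizes, you can also check how well tiles compress overall (a quick sketch):

```sql
-- Stored (gzipped) bytes as a percentage of raw encoded bytes.
select round(sum(gzipped) * 100.0 / sum(raw), 1) gzipped_pct_of_raw from tilestats;
```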

To make it easier to look at these tiles on a map, you can define the following macros that convert z/x/y tile
coordinates to latitudes and longitudes:

```sql
create macro lon(z, x) as (x/2**z) * 360 - 180;
create macro lat_n(z, y) as pi() - 2 * pi() * y/2**z;
create macro lat(z, y) as degrees(atan(0.5*(exp(lat_n(z, y)) - exp(-lat_n(z, y)))));
create or replace macro debug_url(z, x, y) as concat(
'https://protomaps.github.io/PMTiles/#map=',
z + 0.5, '/',
round(lat(z, y + 0.5), 5), '/',
round(lon(z, x + 0.5), 5)
);

select z, x, y, debug_url(z, x, y), layer, layer_bytes
from layerstats order by layer_bytes desc limit 2;
```

| z | x | y | debug_url(z, x, y) | layer | layer_bytes |
|----|------|------|----------------------------------------------------------------|----------|-------------|
| 14 | 6435 | 8361 | https://protomaps.github.io/PMTiles/#map=14.5/-3.72175/-38.59497 | building | 799971 |
| 14 | 6435 | 8364 | https://protomaps.github.io/PMTiles/#map=14.5/-3.78752/-38.59497 | building | 693563 |

Drag and drop your pmtiles archive onto the [PMTiles Viewer](https://protomaps.github.io/PMTiles/) to see the large
tiles on a map. You can also switch to the "inspect" tab to inspect an individual tile.

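To build a quick worklist of tiles worth inspecting, a sketch like the following (assuming the `tilestats` table and `debug_url` macro defined above) picks the largest gzipped tile at each zoom:

```sql
-- Largest gzipped tile per zoom level, with a viewer link for each.
select z, x, y, gzipped, debug_url(z, x, y) url
from (
  select *, row_number() over (partition by z order by gzipped desc) rn
  from tilestats
) t
where rn = 1
order by z;
```
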
#### Computing Weighted Averages

If you compute a straight average tile size, it will be dominated by ocean tiles that no one looks at. You can compute a
weighted average based on actual usage by joining with a `z, x, y, loads` tile source. For
convenience, `top_osm_tiles.tsv.gz` has the top 1 million tiles from 90 days
of [OSM tile logs](https://planet.openstreetmap.org/tile_logs/) from summer 2023.

You can load these sample weights using duckdb's [httpfs module](https://duckdb.org/docs/extensions/httpfs.html):

```sql
install httpfs;
load httpfs;
create table weights as select z, x, y, loads from 'https://raw.githubusercontent.com/onthegomap/planetiler/main/layerstats/top_osm_tiles.tsv.gz';
```
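
Before joining, you can peek at the weights table (a sketch) to confirm the download worked and see how skewed tile traffic is:

```sql
-- Number of weighted tiles, total loads, and the single most-loaded tile.
select count(*) tiles, sum(loads) total_loads, max(loads) max_loads from weights;
```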

Then compute the weighted average tile size:

```sql
select
sum(gzipped * loads) / sum(loads) / 1000 gzipped_avg_kb,
sum(raw * loads) / sum(loads) / 1000 raw_avg_kb,
from tilestats join weights using (z, x, y);
```

| gzipped_avg_kb | raw_avg_kb |
|--------------------|-------------------|
| 47.430680122547145 | 68.06047582043456 |

If you are working with an extract, the global weights that fall inside it will be dominated by low-zoom tiles, so you
can instead average within each zoom level and weight the zoom levels by their global share of traffic:

```sql
with zoom_weights as (
select z, sum(loads) loads from weights group by z
),
zoom_avgs as (
select
z,
sum(gzipped * loads) / sum(loads) gzipped,
sum(raw * loads) / sum(loads) raw,
from tilestats join weights using (z, x, y)
group by z
)
select
sum(gzipped * loads) / sum(loads) / 1000 gzipped_avg_kb,
sum(raw * loads) / sum(loads) / 1000 raw_avg_kb,
from zoom_avgs join zoom_weights using (z);
```

| gzipped_avg_kb | raw_avg_kb |
|-------------------|-------------------|
| 47.42996479265248 | 68.05934476347593 |
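
You can also break the weighted average down by zoom (a sketch using the same tables) to see which zoom levels contribute most to what users actually download:

```sql
select z,
  round(sum(gzipped * loads) / sum(loads) / 1000, 1) gzipped_avg_kb,
  round(sum(raw * loads) / sum(loads) / 1000, 1) raw_avg_kb
from tilestats join weights using (z, x, y)
group by z
order by z;
```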

Binary file added layerstats/top_osm_tiles.tsv.gz
10 changes: 10 additions & 0 deletions planetiler-core/pom.xml
@@ -49,6 +49,11 @@
<artifactId>jts-core</artifactId>
<version>1.19.0</version>
</dependency>
<dependency>
<groupId>org.tukaani</groupId>
<artifactId>xz</artifactId>
<version>1.9</version>
</dependency>
<dependency>
<groupId>org.geotools</groupId>
<artifactId>gt-shapefile</artifactId>
@@ -109,6 +114,11 @@
<artifactId>jackson-dataformat-xml</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient</artifactId>
@@ -29,6 +29,8 @@
import com.onthegomap.planetiler.util.Geofabrik;
import com.onthegomap.planetiler.util.LogUtil;
import com.onthegomap.planetiler.util.ResourceUsage;
import com.onthegomap.planetiler.util.TileSizeStats;
import com.onthegomap.planetiler.util.TopOsmTiles;
import com.onthegomap.planetiler.util.Translations;
import com.onthegomap.planetiler.util.Wikidata;
import com.onthegomap.planetiler.worker.RunnableThatThrows;
@@ -38,6 +40,7 @@
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.IntStream;
import org.slf4j.Logger;
@@ -101,6 +104,7 @@ public class Planetiler {
private boolean useWikidata = false;
private boolean onlyFetchWikidata = false;
private boolean fetchWikidata = false;
private final boolean fetchOsmTileStats;
private TileArchiveMetadata tileArchiveMetadata;

private Planetiler(Arguments arguments) {
@@ -111,10 +115,11 @@ private Planetiler(Arguments arguments) {
if (config.color() != null) {
AnsiColors.setUseColors(config.color());
}
tmpDir = arguments.file("tmpdir", "temp directory", Path.of("data", "tmp"));
tmpDir = config.tmpDir();
onlyDownloadSources = arguments.getBoolean("only_download", "download source data then exit", false);
downloadSources = onlyDownloadSources || arguments.getBoolean("download", "download sources", false);

fetchOsmTileStats =
arguments.getBoolean("download_osm_tile_weights", "download OSM tile weights file", downloadSources);
nodeDbPath = arguments.file("temp_nodes", "temp node db location", tmpDir.resolve("node.db"));
multipolygonPath =
arguments.file("temp_multipolygons", "temp multipolygon db location", tmpDir.resolve("multipolygon.db"));
@@ -666,6 +671,10 @@ public void run() throws Exception {
output.uri() + " already exists, use the --force argument to overwrite or --append.");
}

Path layerStatsPath = arguments.file("layer_stats", "layer stats output path",
// default to <output file>.layerstats.tsv.gz
TileSizeStats.getDefaultLayerstatsPath(Optional.ofNullable(output.getLocalPath()).orElse(Path.of("output"))));

if (config.tileWriteThreads() < 1) {
throw new IllegalArgumentException("require tile_write_threads >= 1");
}
@@ -715,6 +724,9 @@ public void run() throws Exception {
if (!toDownload.isEmpty()) {
download();
}
if (fetchOsmTileStats) {
TopOsmTiles.downloadPrecomputed(config, stats);
}
ensureInputFilesExist();

if (fetchWikidata) {
@@ -762,8 +774,8 @@ public void run() throws Exception {

featureGroup.prepare();

TileArchiveWriter.writeOutput(featureGroup, archive, output::size, tileArchiveMetadata,
config, stats);
TileArchiveWriter.writeOutput(featureGroup, archive, output::size, tileArchiveMetadata, layerStatsPath, config,
stats);
} catch (IOException e) {
throw new IllegalStateException("Unable to write to " + output, e);
}
@@ -441,11 +441,9 @@ public VectorTile addLayerFeatures(String layerName, List<? extends Feature> fea
}

/**
* Creates a vector tile protobuf with all features in this tile and serializes it as a byte array.
* <p>
* Does not compress the result.
* Returns a vector tile protobuf object with all features in this tile.
*/
public byte[] encode() {
public VectorTileProto.Tile toProto() {
VectorTileProto.Tile.Builder tile = VectorTileProto.Tile.newBuilder();
for (Map.Entry<String, Layer> e : layers.entrySet()) {
String layerName = e.getKey();
@@ -492,7 +490,16 @@ public byte[] encode() {

tile.addLayers(tileLayer.build());
}
return tile.build().toByteArray();
return tile.build();
}

/**
* Creates a vector tile protobuf with all features in this tile and serializes it as a byte array.
* <p>
* Does not compress the result.
*/
public byte[] encode() {
return toProto().toByteArray();
}

/**
@@ -40,6 +40,10 @@ default byte[] getTile(TileCoord coord) {
*/
CloseableIterator<TileCoord> getAllTileCoords();

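/** Returns an iterator over every tile in this archive, pairing each coordinate with its stored bytes. */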
default CloseableIterator<Tile> getAllTiles() {
return getAllTileCoords().map(coord -> new Tile(coord, getTile(coord)));
}

/**
* Returns the metadata stored in this archive.
*/
@@ -0,0 +1,32 @@
package com.onthegomap.planetiler.archive;

import com.onthegomap.planetiler.geo.TileCoord;
import java.util.Arrays;
import java.util.Objects;

/** A tile stored in an archive with coordinate {@code coord} and archived {@code bytes}. */
public record Tile(TileCoord coord, byte[] bytes) implements Comparable<Tile> {

@Override
public boolean equals(Object o) {
return (this == o) ||
(o instanceof Tile other && Objects.equals(coord, other.coord) && Arrays.equals(bytes, other.bytes));
}

@Override
public int hashCode() {
int result = coord.hashCode();
result = 31 * result + Arrays.hashCode(bytes);
return result;
}

@Override
public String toString() {
return "Tile{coord=" + coord + ", data=byte[" + bytes.length + "]}";
}

@Override
public int compareTo(Tile o) {
return coord.compareTo(o.coord);
}
}
@@ -19,7 +19,7 @@
import com.onthegomap.planetiler.config.PlanetilerConfig;
import com.onthegomap.planetiler.geo.GeoUtils;
import com.onthegomap.planetiler.util.BuildInfo;
import com.onthegomap.planetiler.util.LayerStats;
import com.onthegomap.planetiler.util.LayerAttrStats;
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
@@ -43,7 +43,7 @@ public record TileArchiveMetadata(
@JsonProperty(ZOOM_KEY) Double zoom,
@JsonProperty(MINZOOM_KEY) Integer minzoom,
@JsonProperty(MAXZOOM_KEY) Integer maxzoom,
@JsonIgnore List<LayerStats.VectorLayer> vectorLayers,
@JsonIgnore List<LayerAttrStats.VectorLayer> vectorLayers,
@JsonAnyGetter @JsonDeserialize(using = EmptyMapIfNullDeserializer.class) Map<String, String> others,
@JsonProperty(COMPRESSION_KEY) TileCompression tileCompression
) {
@@ -73,7 +73,7 @@ public TileArchiveMetadata(Profile profile, PlanetilerConfig config) {
this(profile, config, null);
}

public TileArchiveMetadata(Profile profile, PlanetilerConfig config, List<LayerStats.VectorLayer> vectorLayers) {
public TileArchiveMetadata(Profile profile, PlanetilerConfig config, List<LayerAttrStats.VectorLayer> vectorLayers) {
this(
getString(config, NAME_KEY, profile.name()),
getString(config, DESCRIPTION_KEY, profile.description()),
@@ -145,7 +145,7 @@ public Map<String, String> toMap() {
}

/** Returns a copy of this instance with {@link #vectorLayers} set to {@code layerStats}. */
public TileArchiveMetadata withLayerStats(List<LayerStats.VectorLayer> layerStats) {
public TileArchiveMetadata withLayerStats(List<LayerAttrStats.VectorLayer> layerStats) {
return new TileArchiveMetadata(name, description, attribution, version, type, format, bounds, center, zoom, minzoom,
maxzoom, layerStats, others, tileCompression);
}