---
layout: post
title: Is GeoJSON a spatial data format?
date: 2023-11-11
categories: gis
author: Ian Turton
---
# Is GeoJSON a good spatial data format?

A few days ago on Mastodon [Eli Pousson](https://fosstodon.org/@[email protected]) asked:

> Can anyone suggest examples of files that can contain location info but aren't often considered spatial data
> file formats?

He suggested EXIF, [Iván Sánchez Ortega](@[email protected]) followed up with spreadsheets and, being devilish, I
said GeoJSON.

This led to more discussion, with people asking why I thought that, so instead of being flippant I thought about
it. This blog post is the result of those thoughts, which I assumed were fairly obvious but, judging by what people
have said since, maybe aren't.

I've been a developer for most of my career, so my main requirements for a spatial data format are that:

1. It stores my spatial data the way I want it to.
2. It's fast to read and, to a lesser extent, to write.
3. It's easy to manage.

The first point seems obvious: if I store a point and then ask for it back, I want to get that same point back (to
the limit of the precision of the processor's floating point). If a format can't manage that, then please don't use
it. This is not common, but Excel comes to mind as a program that takes good data and trashes it. If it isn't
changing [gene names into
dates](https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates) then
it's [reordering the dbf file to destroy your
shapefile](https://gis.stackexchange.com/questions/132359/how-is-attribute-data-in-dbf-file-tied-to-shapefile-location-data-in-shp-file).
GeoJSON can also fail at this, as the standard says that I must store the data in WGS 84 (lon/lat). That is fine if
that is the coordinate system I already store my data in, but suppose I have some high-quality OSGB data that has
been carefully surveyed to fractions of a millimetre. If the underlying code converts it to WGS 84 in the background,
and the developer also wanted to save space and limited the number of decimal places to, say, 6 (OK, [that was
me](https://osgeo-org.atlassian.net/browse/GEOT-6650)), then when it gets converted back to OSGB I'm looking at errors
of centimetres (or worse), and given the vagaries of floating point representation I may not even be able to tell.

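To put rough numbers on the decimal-places problem, here is a minimal Python sketch. It does not do the OSGB reprojection itself (that needs a real library such as PROJ); it only estimates how far a point moves on the ground when its longitude and latitude are rounded to 6 decimal places, assuming a spherical Earth. The function names and the sample point are made up for illustration.

```python
import math

# Mean Earth radius in metres. Treating the Earth as a sphere is an
# approximation, but it is good enough to show the order of magnitude.
EARTH_RADIUS_M = 6_371_008.8


def rounding_shift_m(lon: float, lat: float, places: int = 6) -> float:
    """Approximate ground distance (metres) a point moves when its
    longitude/latitude are rounded to `places` decimal places."""
    d_lat = math.radians(round(lat, places) - lat)
    # A degree of longitude shrinks with the cosine of the latitude.
    d_lon = math.radians(round(lon, places) - lon) * math.cos(math.radians(lat))
    return EARTH_RADIUS_M * math.hypot(d_lat, d_lon)


def worst_case_shift_m(lat: float, places: int = 6) -> float:
    """Largest possible shift: both coordinates off by half a rounding step."""
    half_step = math.radians(0.5 * 10 ** -places)
    return EARTH_RADIUS_M * math.hypot(half_step, half_step * math.cos(math.radians(lat)))


if __name__ == "__main__":
    # A hypothetical, very precisely known point in central London.
    lon, lat = -0.1275123456, 51.5072187654
    print(f"shift after rounding:        {rounding_shift_m(lon, lat) * 100:.1f} cm")
    print(f"worst case at this latitude: {worst_case_shift_m(lat) * 100:.1f} cm")
```

At 6 decimal places one step of latitude is roughly 11 cm, so rounding alone can move a point by several centimetres before any reprojection error is added on top.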
The second point comes from being a GeoServer developer: a large chunk of the time taken to draw a web map (or to
stream out a WFS response) is spent reading the data from disk. Much of the rest is spent converting the data into a
form that we can draw. Ideally, we only want to read in the features needed for the map the user has requested
(actually, ideally we want to **not** read in most of the data at all because it is already in the cache, but that
is hard to do). So we like indexed datasets: both spatial indexes and attribute indexes can substantially speed up
map drawing. As spatial datasets grow, the time taken to fetch the next feature from the store becomes more and more
important. An index allows the program to skip to the correct place in the file, either for a specific feature or
for the features that are in a specific place or have an attribute with the requested value. This is a great time
saver: imagine looking something up in a big book using the index, compared with paging through it and reading each
page in turn.

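The idea behind a spatial index can be shown with a toy example. This is a hypothetical uniform grid, much simpler than the R-trees and similar structures that real formats and databases use: each feature's bounding box is filed under every grid cell it touches, so a query only has to look at the features in the cells it overlaps instead of scanning the whole file.

```python
import math
from collections import defaultdict

# A bounding box as (min_x, min_y, max_x, max_y).
BBox = tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned bounding boxes overlap (touching counts)."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


class GridIndex:
    """A toy spatial index: the plane is cut into square cells and each
    feature id is filed under every cell its bounding box touches."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: defaultdict[tuple[int, int], set[int]] = defaultdict(set)
        self.bboxes: dict[int, BBox] = {}

    def _cells_for(self, bbox: BBox):
        """Yield the (column, row) of every cell the box touches."""
        min_x, min_y, max_x, max_y = bbox
        s = self.cell_size
        for i in range(math.floor(min_x / s), math.floor(max_x / s) + 1):
            for j in range(math.floor(min_y / s), math.floor(max_y / s) + 1):
                yield (i, j)

    def insert(self, fid: int, bbox: BBox) -> None:
        self.bboxes[fid] = bbox
        for cell in self._cells_for(bbox):
            self.cells[cell].add(fid)

    def query(self, bbox: BBox) -> list[int]:
        """Ids whose bounding box overlaps `bbox`. Only features filed under
        the cells the query touches are ever examined."""
        candidates: set[int] = set()
        for cell in self._cells_for(bbox):
            candidates |= self.cells.get(cell, set())
        return sorted(fid for fid in candidates if intersects(self.bboxes[fid], bbox))


if __name__ == "__main__":
    idx = GridIndex(cell_size=10.0)
    idx.insert(1, (0.0, 0.0, 1.0, 1.0))
    idx.insert(2, (50.0, 50.0, 51.0, 51.0))
    idx.insert(3, (5.0, 5.0, 15.0, 15.0))
    # Feature 2 lives in a different cell, so it is never even looked at.
    print(idx.query((0.0, 0.0, 6.0, 6.0)))  # [1, 3]
```

The cell size is the main design choice: too large and each cell holds most of the data, too small and big features get filed under a huge number of cells.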
After indexes, the main thing I look for is a binary format that is easy to read (and write). GeoJSON (and GML) are
both problematic here because they are text formats (which is great in a transfer format), so for every coordinate of
every spatial object the computer has to read in a series of digits (and punctuation) and convert them into an actual
binary number that it can work with. This is a slow operation (by computer standards, anyway), and if I have a couple
of million points in my coastline file then I don't want to do 4 million slow operations (one for each x and each y)
before I even think of drawing something.

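The cost of text coordinates can be sketched by encoding the same hypothetical coordinates once as JSON text and once as packed binary doubles, then reading both back. Reading the binary version is just copying 8 bytes per number, while the JSON reader has to scan every digit. The printed timings depend heavily on your machine, and Python's `json` module is written in C, so treat them as indicative only.

```python
import json
import struct
import timeit

# A hypothetical "coastline" of 100,000 (lon, lat) pairs.
coords = [(-5.0 + i * 1e-5, 50.0 + i * 1e-5) for i in range(100_000)]

# Text encoding, as in GeoJSON: every number is written out as digits.
as_text = json.dumps(coords)

# Binary encoding: every number is 8 raw bytes (a little-endian IEEE 754 double).
flat = [v for pair in coords for v in pair]
as_binary = struct.pack(f"<{len(flat)}d", *flat)


def read_text(data: str) -> list[tuple[float, float]]:
    """Parse the JSON text: each number's digits must be scanned and converted."""
    return [(x, y) for x, y in json.loads(data)]


def read_binary(data: bytes) -> list[tuple[float, float]]:
    """Unpack the doubles directly: no digits to scan, just bytes to copy."""
    values = struct.unpack(f"<{len(data) // 8}d", data)
    return list(zip(values[0::2], values[1::2]))


if __name__ == "__main__":
    # Both encodings round-trip the coordinates exactly.
    assert read_text(as_text) == read_binary(as_binary) == coords
    print("text size (bytes):  ", len(as_text))
    print("binary size (bytes):", len(as_binary))
    print("text read (s):  ", timeit.timeit(lambda: read_text(as_text), number=5))
    print("binary read (s):", timeit.timeit(lambda: read_binary(as_binary), number=5))
```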
The third point comes from having to deal with users on a fairly regular basis, and in a lot of cases they are not
spatial data experts. If a format comes as up to a dozen similarly named files (that are all important), and a GIS
will refuse to process them unless you guess which one is the important one, then it is more of a pain than a help.
And yes, shapefile, I'm looking at you. If your process still makes use of shapefiles, please, please stop doing that
to your users (and your support team) and switch over to GeoPackage, which can store hundreds of datasets inside a
single file. All good GIS products can process GeoPackages by now; they have been an OGC standard for nearly 10
years. If you don't think that shapefiles are confusing, go and ask your support team how often they have been sent
just the `.shp` file (or 11 files but not the `.sbn`), or how often they have seen people delete all the non-`.shp`
files to save disk space.

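For the shapefile case, a small pre-flight check can at least tell a user which sidecar files are missing before a GIS refuses to open the data. This is a hypothetical helper, not part of any GIS library; it treats `.shp`, `.shx` and `.dbf` as required (as the shapefile specification does) and `.prj` as strongly recommended, and it assumes the extensions are lower case.

```python
from pathlib import Path

# The shapefile specification makes these three files mandatory.
REQUIRED = (".shp", ".shx", ".dbf")
# Not mandatory, but without a .prj the coordinate reference system is unknown.
RECOMMENDED = (".prj",)


def check_shapefile(shp_path: str) -> tuple[list[str], list[str]]:
    """Return (missing required, missing recommended) extensions for the
    shapefile whose main file is `shp_path` (e.g. "roads.shp").

    Assumes lower-case extensions; on a case-sensitive filesystem a file
    called ROADS.SHX would be reported as missing."""
    shp = Path(shp_path)
    # with_suffix() swaps only the last extension, so "roads.v2.shp" -> "roads.v2.shx".
    missing_required = [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
    missing_recommended = [ext for ext in RECOMMENDED if not shp.with_suffix(ext).exists()]
    return missing_required, missing_recommended
```

Calling `check_shapefile("roads.shp")` when only `roads.shp` was emailed would report `.shx` and `.dbf` as missing, which is a far clearer message than most users get from their GIS.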
My other objection to GeoJSON is that I don't know the structure (or schema) of the dataset until I have read the
entire file. The last record could add several bonus attributes; in fact any (or all) of the records could do that,
and from a parser's point of view that is a nightmare. At least GML provides me with a fixed schema and enforces it
throughout the file.

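The schema problem is easy to see in code: the only way to know every property a GeoJSON file uses, and what types those properties have, is to visit every feature. A minimal sketch, with a made-up function name and sample data:

```python
import json


def infer_schema(collection: dict) -> dict[str, set[str]]:
    """Collect every property name in a GeoJSON FeatureCollection together
    with the JSON value types seen for it. Every feature must be visited,
    because any one of them can introduce a new property or a new type."""
    schema: dict[str, set[str]] = {}
    for feature in collection.get("features", []):
        # "properties" may legally be null in GeoJSON.
        for key, value in (feature.get("properties") or {}).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]},
   "properties": {"name": "a", "count": 1}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 1]},
   "properties": {"name": "b", "count": "unknown"}},
  {"type": "Feature", "geometry": null, "properties": null},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2, 2]},
   "properties": {"name": "c", "count": 3, "bonus": true}}
]}
""")

print(infer_schema(sample))
```

Here `count` turns out to be an integer in one feature and a string in another, and `bonus` only appears in the final feature, so a reader that guessed the schema from the first feature would be wrong twice.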
When I'm storing data (as opposed to transferring it) I use PostGIS. It's fast and accurate, can store my data in
whatever projection I choose, and can interface with any GIS program I am likely to use. If I'm writing new code, it
provides good, well-tested libraries in all the languages I care about, so I don't have to get into the weeds of
parsing binary formats. If I fetch a feature from PostGIS, it will have exactly the attributes I was expecting, no
more and no less. It has good indexes and a nifty DSL (SQL) that I can use to express my queries, which are then
handled by a clever query optimiser that knows far more than I do about how to access the data in the database.

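To see an optimiser choosing an attribute index, here is a sketch that uses SQLite only because it ships with Python; it is a stand-in, not PostGIS, though PostgreSQL's `EXPLAIN` answers the same question. The table, column and index names are hypothetical.

```python
import sqlite3

# SQLite is used only because it ships with Python; it stands in for PostGIS.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT, length_m REAL)")
con.executemany(
    "INSERT INTO roads (name, length_m) VALUES (?, ?)",
    [(f"road {i}", float(i)) for i in range(10_000)],
)
# An attribute index on the column we expect to filter by.
con.execute("CREATE INDEX roads_name_idx ON roads (name)")

query = "SELECT id, name, length_m FROM roads WHERE name = ?"

# Ask the optimiser how it plans to run the query. The last column of each
# row describes one step, e.g. whether it scans the table or uses an index.
plan = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + query, ("road 42",))]
print(plan)

# The query itself: the first inserted row gets id 1, so "road 42" has id 43.
print(con.execute(query, ("road 42",)).fetchall())
```

The point is that the query only says *what* is wanted; the planner decides to jump straight to the matching row through `roads_name_idx` rather than reading all 10,000.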
If for some reason I need to access my data while I'm travelling, or share it with a colleague, then I use a
GeoPackage, which is a neat little database all packaged up in a single file. It's not as quick as PostGIS, so I
wouldn't use it for millions of records, but for most day-to-day GIS datasets it's great. You can even store your
QGIS styles and projects in it, which makes it a single-file project transfer format.

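Because a GeoPackage is an SQLite database, you can even list the datasets inside one with nothing but Python's standard library. This hypothetical helper relies only on the `gpkg_contents` table that the GeoPackage standard requires every file to have, and it opens the file read-only.

```python
import sqlite3
from pathlib import Path


def list_layers(gpkg_path: str) -> list[tuple[str, str]]:
    """Return (table_name, data_type) for every dataset registered in the
    GeoPackage's gpkg_contents table, sorted by table name."""
    # A read-only URI means inspecting a colleague's file can't modify it;
    # as_uri() percent-encodes spaces and other awkward characters.
    uri = Path(gpkg_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
```

For a real file, `list_layers("project.gpkg")` (a hypothetical file name) would return one pair per dataset, with `data_type` values such as `features` or `tiles`.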
One final point: I sometimes see people preaching that we should go cloud native (and often serverless) by
embracing "modern" standards like GeoJSON and COGs. GeoJSON should never be used as a cloud-native storage option
(unless it's so small that you can read it once and cache it in memory, in which case why are you using the cloud?),
because it is large (yes, I know it compresses well), slow to parse (and slower still if you compressed it first)
and can't be indexed. That means you have to copy the whole file from a disk on the far side of a slow internet
connection. I don't care if you have fibre to the door; that is still slow compared to the disk in your machine!

![The Jack Sparrow worst pirate meme but for GeoJSON](/images/geojson.jpg)