- Use PostGIS functions to explore properties of geometries in dataset
- Combine PostGIS with other PostgreSQL
WHERE
conditions,GROUP BY
s,ORDER BY
s, etc. - Deal with the quirks of messy data
- Get familiar with some OpenStreetMap data
Dataset: OpenStreetMap Buildings for University City Neighborhood
- Preview data
- Download link (copy and paste this into the Carto upload tool):
https://raw.githubusercontent.com/MUSA-509/week-2-digging-into-databases/master/data/university_city_osm_buildings.geojson
I fetched this data from GeoFabrik's wonderful OSM download service. I downloaded the Pennsylvania file and filtered to only included building footprints that land in University City in Philadelphia.
About the dataset, notice the schema:
cartodb_id
(number, actually an integer): Carto adds this as a utility for optimizing performance of this table. It wasn't in the original dataset.the_geom
(geometry): Carto usesthe_geom
for the geometry column. In the GeoJSON it wasgeometry
as per the GeoJSON specification.osm_id
(string): A unique identifier for this geometry in the OSM databasecode
(number): OSM code for type building. This are all the same in this dataset.fclass
(string): Description of OSM feature. All arebuilding
here.name
(string): Name of building. Note: many are not namedbuilding_type
(string): Type of building (if assigned). Notice that many are null valued.
- Upload data to your Carto account
- Open dataset in map or data view and toggle on the SQL panel
- Clone this repository or download the
Lab.md
Markdown file - Edit the file in your favorite text editor. Here are two recommendations:
Use the PostGIS function ST_Area
. Since our geometries are in WGS84, we need to use geography
type casting (the_geom::geography
).
SELECT avg(ST_Area(the_geom::geography)) as avg_area_sq_m
FROM andyepenn.university_city_osm_buildings
Write a query to give:
- type of building
- average area of building by type
- count of buildings of this type
SELECT building_type, avg(ST_Area(the_geom::geography)) as avg_area_sq_m, count(*) as building_count
FROM andyepenn.university_city_osm_buildings
GROUP BY building_type
Which building is largest?
SELECT building_type, name, ST_Area(the_geom::geography) as building_area_sq_m
FROM andyepenn.university_city_osm_buildings
ORDER BY ST_Area(the_geom::geography) DESC
LIMIT 1
Which building is smallest?
SELECT building_type, name, ST_Area(the_geom::geography) as building_area_sq_m
FROM andyepenn.university_city_osm_buildings
ORDER BY ST_Area(the_geom::geography) ASC
LIMIT 1
Smallest with a building name that is not null:
SELECT building_type, name, ST_Area(the_geom::geography) as building_area_sq_m
FROM andyepenn.university_city_osm_buildings
WHERE name is not null
ORDER BY ST_Area(the_geom::geography) ASC
LIMIT 1
SELECT building_type, name, ST_Area(the_geom::geography) as building_area_sq_m
FROM andyepenn.university_city_osm_buildings
WHERE name = 'Meyerson Hall'
Using the ST_Distance
function, find the 10 closest buildings to Meyerson Hall.
Hint: We can use ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)
for the geometry to represent Meyerson Hall.
SELECT
building_type,
name,
ST_Distance(
the_geom::geography,
ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography
) as building_distance_m
FROM
andyepenn.university_city_osm_buildings
ORDER BY
ST_Distance(the_geom::geography, ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography) ASC
LIMIT 10
PostGIS has a distance operator (<->
) that allows one to do distance ordering in an ORDER BY
.
SELECT
building_type,
name,
ST_Distance(
the_geom::geography,
ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography
) as building_distance_m
FROM andyepenn.university_city_osm_buildings
ORDER BY the_geom::geography <-> ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography
LIMIT 10
Use the ST_Intersects
function to exclude Meyerson Hall. We know that Meyerson Hall is at the point ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)
. ST_Intersects
returns true/false/null. To only included geometries that do not intersect, precede ST_Intersects
with NOT
.
SELECT
building_type,
name,
ST_Distance(
the_geom::geography,
ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography
) as building_distance
FROM andyepenn.university_city_osm_buildings
WHERE name != 'Meyerson Hall'
ORDER BY the_geom::geography <-> ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography
LIMIT 10
Or use the following if we did't have a name attribute for Meyerson Hall and instead only had the geometry to uniquely define it.
SELECT
building_type,
name,
ST_Distance(
the_geom::geography,
ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography
) as building_distance
FROM andyepenn.university_city_osm_buildings
WHERE Not ST_Intersects(the_geom, ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326))
ORDER BY the_geom::geography <-> ST_SetSRID(ST_MakePoint(-75.19265679, 39.9522405), 4326)::geography
LIMIT 10
Use the ST_ConvexHull
function to find the minimum convex shape that includes all of the buildings in University City. See step 5 in the Mapping Follow Along from this week's lecture.
SELECT
ST_ConvexHull(
ST_Collect(the_geom)
) as the_geom,
ST_Transform(
ST_ConvexHull(
ST_Collect(the_geom)
),
3857
) as the_geom_webmercator,
1 as cartodb_id
FROM andyepenn.university_city_osm_buildings
Note: I added the utility columns the_geom_webmercator
and cartodb_id
so that we could visualize this on a map.
This query gives us the largest building (by building footprint area) for each building class:
SELECT
ST_Area(b.the_geom::geography) as area,
b.name,
b.building_type
FROM andyepenn.university_city_osm_buildings as b
JOIN (
SELECT max(ST_Area(the_geom::geography)) as maxarea, building_type
FROM andyepenn.university_city_osm_buildings
GROUP BY building_type
) as m
ON m.building_type = b.building_type AND ST_Area(b.the_geom::geography) = m.maxarea
Modify this query to give the smallest building for each class instead:
SELECT
ST_Area(b.the_geom::geography) as area,
b.name,
b.building_type
FROM andyepenn.university_city_osm_buildings as b
JOIN (
SELECT min(ST_Area(the_geom::geography)) as maxarea, building_type
FROM andyepenn.university_city_osm_buildings
GROUP BY building_type
) as m
ON m.building_type = b.building_type AND ST_Area(b.the_geom::geography) = m.maxarea
Note: This is using a new keyword called "JOIN" that we will learn more about next week.