Replace materialized views with incremental roll ups #45

Open
wants to merge 4 commits into
base: master
5 changes: 2 additions & 3 deletions sql/changesets.sql
@@ -1,5 +1,4 @@
-- view with a schema that matches the legacy changesets table
- CREATE VIEW changesets AS
+ CREATE TABLE changesets AS

Comment (Member):
Why this change? As-is, this will duplicate data in raw_changesets.

The historical reason for raw_changesets (table) vs changesets (view) was that I needed new table names for replacement data when I was updating v1. The API (etc.) assume the existence of relations (tables or views) without the raw_ prefix, so creating views was the easiest way to preserve compatibility.

raw_* should probably go away in favor of tables without the prefix.

Comment (Member):
Are you planning on updating the parts of this project that write raw changeset stats to write to these tables (retiring raw_changesets)? If so, the API project may also need to be updated to refer to the correct tables.

SELECT

Comment (Member):
Rather than SELECTing (and subsequently truncating, if all you wanted was the structure), CREATE TABLE changesets (LIKE raw_changesets INCLUDING ALL) will replicate the structure and indices without any of the data.

Comment (Member):
(Though in this case, some of the column names changed, so yeah... ;-)
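
For reference, a minimal sketch of the structure-only approach combined with the renames, written in the style of the SQL strings in src/housekeeping.js. The helper name `cloneTableDdl` is hypothetical, and the only rename shown is the `roads_added` to `road_count_add` one visible in this diff; the remaining renames would need to be listed the same way.

```javascript
// Hypothetical helper (not part of this project): builds DDL that clones a
// table's structure without its rows, then renames columns on the copy.
// Identifiers are interpolated directly, so only pass trusted, hard-coded names.
const cloneTableDdl = (source, target, renames = {}) => [
  // LIKE ... INCLUDING ALL copies columns, defaults, constraints and indexes.
  `CREATE TABLE ${target} (LIKE ${source} INCLUDING ALL)`,
  ...Object.entries(renames).map(
    ([from, to]) => `ALTER TABLE ${target} RENAME COLUMN ${from} TO ${to}`
  )
];

const statements = cloneTableDdl("raw_changesets", "changesets", {
  roads_added: "road_count_add"
});

statements.forEach(statement => console.log(`${statement};`));
```

Each statement could then be passed to the existing query helper one at a time. Note that this still leaves two tables with the same purpose, so it does not address the duplication concern raised above.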

id,
roads_added road_count_add,
@@ -16,4 +15,4 @@ CREATE VIEW changesets AS
editor,
user_id,
created_at
- FROM raw_changesets;
+ FROM raw_changesets;
4 changes: 2 additions & 2 deletions sql/changesets_countries.sql
@@ -1,5 +1,5 @@
- CREATE VIEW changesets_countries AS
+ CREATE TABLE changesets_countries AS
SELECT
changeset_id,
country_id
- FROM raw_changesets_countries;
+ FROM raw_changesets_countries;
4 changes: 2 additions & 2 deletions sql/changesets_hashtags.sql
@@ -1,5 +1,5 @@
- CREATE VIEW changesets_hashtags AS
+ CREATE TABLE changesets_hashtags AS
SELECT
changeset_id,
hashtag_id
- FROM raw_changesets_hashtags;
+ FROM raw_changesets_hashtags;
2 changes: 1 addition & 1 deletion sql/countries.sql
@@ -1,4 +1,4 @@
- CREATE VIEW countries AS
+ CREATE TABLE countries AS
SELECT
id,
name,
...
2 changes: 1 addition & 1 deletion sql/hashtag_stats.sql
@@ -1,4 +1,4 @@
- CREATE MATERIALIZED VIEW hashtag_stats AS
+ CREATE TABLE hashtag_stats AS
SELECT
hashtag,
count(c.id) changesets,
...
2 changes: 1 addition & 1 deletion sql/hashtags.sql
@@ -1,5 +1,5 @@
-- view with a schema that matches the legacy hashtags table
- CREATE VIEW hashtags AS
+ CREATE TABLE hashtags AS
SELECT
id,
hashtag
...
2 changes: 1 addition & 1 deletion sql/raw_countries_users.sql
@@ -1,4 +1,4 @@
- CREATE MATERIALIZED VIEW raw_countries_users AS
+ CREATE TABLE raw_countries_users AS
SELECT
country_id,
user_id,
...
2 changes: 1 addition & 1 deletion sql/raw_hashtags_users.sql
@@ -1,4 +1,4 @@
- CREATE MATERIALIZED VIEW raw_hashtags_users AS
+ CREATE TABLE raw_hashtags_users AS
SELECT *,
rank() OVER (ORDER BY edits DESC) edits_rank,
rank() OVER (ORDER BY buildings DESC) buildings_rank,
...
2 changes: 1 addition & 1 deletion sql/user_stats.sql
@@ -1,4 +1,4 @@
- CREATE MATERIALIZED VIEW user_stats AS
+ CREATE TABLE user_stats AS
SELECT
user_id,
name,
...
2 changes: 1 addition & 1 deletion sql/users.sql
@@ -1,7 +1,7 @@
CREATE EXTENSION IF NOT EXISTS postgis;

-- view with a schema that matches the legacy users table
- CREATE VIEW users AS
+ CREATE TABLE users AS
SELECT
id,
u.name,
...
29 changes: 25 additions & 4 deletions src/housekeeping.js
@@ -2,10 +2,31 @@ const env = require("require-env");
const { Pool } = require("pg");

const QUERIES = [
- "REFRESH MATERIALIZED VIEW CONCURRENTLY hashtag_stats",
- "REFRESH MATERIALIZED VIEW CONCURRENTLY raw_countries_users",
- "REFRESH MATERIALIZED VIEW CONCURRENTLY raw_hashtags_users",
- "REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats"
+ `INSERT INTO hashtag_stats
+ SELECT
+ hashtag,
+ count(c.id) changesets,
+ count(distinct c.user_id) users,
+ sum(road_km_added) road_km_added,
+ sum(road_km_modified) road_km_modified,
+ sum(waterway_km_added) waterway_km_added,
+ sum(waterway_km_modified) waterway_km_modified,
+ sum(roads_added) roads_added,
+ sum(roads_modified) roads_modified,
+ sum(waterways_added) waterways_added,
+ sum(waterways_modified) waterways_modified,
+ sum(buildings_added) buildings_added,
+ sum(buildings_modified) buildings_modified,
+ sum(pois_added) pois_added,
+ sum(pois_modified) pois_modified,
+ sum(pois_modified) josm_edits,
+ max(coalesce(closed_at, created_at))
+ FROM raw_changesets_hashtags ch
+ JOIN raw_changesets c ON c.id = ch.changeset_id
+ JOIN raw_hashtags h ON h.id = ch.hashtag_id
+ GROUP BY hashtag
+ ON CONFLICT do update;`

Comment (Member):
This looks incomplete.

The other aggregated tables also need to be updated incrementally.

More importantly, this doesn't actually update incrementally; it's effectively the same as what REFRESH MATERIALIZED VIEW does, rewriting each of the rows in the aggregated table (it'll take just as long on a fully-populated table). Instead, it should detect which changesets rows changed since the last run and INSERT new aggregated values / UPDATE existing aggregated values by adding to them / updating counts / updating max values.
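
A minimal sketch of what the incremental form might look like for hashtag_stats, covering only a few of the columns. Several things here are assumptions rather than facts about this schema: a unique index on hashtag_stats (hashtag), a hypothetical `updated_at` column holding the max timestamp, and a caller that tracks which changeset ids have already been rolled up so that each one is counted exactly once. The names `INCREMENTAL_HASHTAG_STATS` and `rollUpHashtagStats` are illustrative.

```javascript
// Sketch only. Assumed, not confirmed by this PR: a unique index on
// hashtag_stats (hashtag), an updated_at column, and an id window supplied by
// the caller so that every changeset is aggregated exactly once.
const INCREMENTAL_HASHTAG_STATS = `
INSERT INTO hashtag_stats AS s (hashtag, changesets, roads_added, buildings_added, updated_at)
SELECT
  h.hashtag,
  count(c.id),
  coalesce(sum(c.roads_added), 0),
  coalesce(sum(c.buildings_added), 0),
  max(coalesce(c.closed_at, c.created_at))
FROM raw_changesets_hashtags ch
JOIN raw_changesets c ON c.id = ch.changeset_id
JOIN raw_hashtags h ON h.id = ch.hashtag_id
WHERE c.id > $1 AND c.id <= $2 -- only the window not yet rolled up
GROUP BY h.hashtag
ON CONFLICT (hashtag) DO UPDATE SET
  changesets = s.changesets + EXCLUDED.changesets,
  roads_added = s.roads_added + EXCLUDED.roads_added,
  buildings_added = s.buildings_added + EXCLUDED.buildings_added,
  updated_at = greatest(s.updated_at, EXCLUDED.updated_at)`;

// pool is anything exposing a pg-style query(text, values) method.
// afterId is exclusive and throughId is inclusive.
const rollUpHashtagStats = (pool, afterId, throughId) =>
  pool.query(INCREMENTAL_HASHTAG_STATS, [afterId, throughId]);
```

New hashtags are inserted and existing rows are updated by adding the delta, so the cost depends on the size of the window rather than the size of the table. The distinct user count cannot be maintained by simple addition and would need separate handling, and changesets that are modified after being rolled up would need a different change-detection scheme than an id window.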


];

const query = async (pool, query) => {
...