Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert springs to use ST_PointOnSurface and reformat SQL #3233

Merged
merged 1 commit into from
Jan 11, 2019

Conversation

pnorman
Copy link
Collaborator

@pnorman pnorman commented May 13, 2018

This partly replaces #3228 and is an example of converting a layer to use ST_PointOnSurface

@pnorman pnorman requested a review from matthijsmelissen May 13, 2018 02:39
"natural"
FROM planet_osm_point
WHERE way && !bbox!
) _
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like unwanted space + underscore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you suggest as an alternative for what to call the inner subselect? _ is a reasonably standard name when you just need a name. s, p, q are also common throw-away names.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, however I don't like such shortcuts in the big SQL section of a public project. Typical i for loop index might be OK in typical code, but this one is already hard to maintain, underscore is worse than any letter and really looks like a typo, and you can't expect that other people will be real programmers to catch it. Something self-commenting like inner_subselect would be probably good.

BTW: there are more undocumented conventions and sections that I don't understand, it might make sense to add more comments one day.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inner_subselect as a name would do for me.

@sommerluk
Copy link
Collaborator

Something self-commenting like inner_subselect would be probably good.

I support this. Longer identifiers do no harm during processing, but increase legibility of the code.

"natural"
FROM
(SELECT
ST_PointOnSurface(way) AS way,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is psql smart enough to only do the ST_PointOnSurface calculation for objects satisfying "natural" IN ('spring'), in this query?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see now it avoids the duplication of WHERE "natural" IN ('spring').

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the conditions on the tag columns are simple enough a subselect isn't necessary, but on longer queries, these can be over a dozen lines.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, clear. And what about the question above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres can do the query either way, the results are the same. It will pick whichever is cheapest, which will normally be using the index, filtering by geometry, then calculating st_pointonsurface.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand you correctly, in this query it will also calculate ST_PointOnSurface for polygons that are not natural=spring, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee one way or the other. It's unlikely to, but with a small database and the right indexes it might.

@@ -285,14 +285,22 @@ Layer:
<<: *osm2pgsql
table: |-
(SELECT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does wrapping this in a superquery accomplish anything at all? It seems to me that leaving lines 287-290 and 302 out would yield exactly the same result?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since then the query would return all polygons regardless of their natural column.

@matthijsmelissen
Copy link
Collaborator

This is essentially the same method as #3228 (after my last changes), which gives performance problems. We therefore should check if this PR causes performance problems too.

@pnorman
Copy link
Collaborator Author

pnorman commented May 15, 2018

This is essentially the same method as #3228 (after my last changes)

This isn't the same method, since #3228 doesn't filter the points table by bounding box.

@matthijsmelissen
Copy link
Collaborator

This isn't the same method, since #3228 doesn't filter the points table by bounding box.

I fixed the filter on the points table now. Unfortunately, this still gives me a performance hit of about 50% of overall rendering time.

FROM planet_osm_point
WHERE way && !bbox!
) _
WHERE "natural" IN ('spring')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not add another WHERE way && !bbox! in the outer subquery? Because now it will also forward points to Mapnik that are outside of the bounding box that is currently being rendered (as long as they are the label position of a polygon that falls within that bounding box).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but the PR is no different than current behavior. I suppose the difference is that filtering on contains is now cheap to do in-db

@matthijsmelissen matthijsmelissen changed the title Convert springs to use ST_PointOnSurface and reformat SQL [WIP] Convert springs to use ST_PointOnSurface and reformat SQL Jan 3, 2019
@matthijsmelissen
Copy link
Collaborator

Marking this as WIP as I suspect this will cause performance issues.

@pnorman
Copy link
Collaborator Author

pnorman commented Jan 4, 2019

Marking this as WIP as I suspect this will cause performance issues.

What are those? I don't see any.

@matthijsmelissen matthijsmelissen changed the title [WIP] Convert springs to use ST_PointOnSurface and reformat SQL Convert springs to use ST_PointOnSurface and reformat SQL Jan 4, 2019
@matthijsmelissen
Copy link
Collaborator

Is anybody able to check the performance of this PR?

I think it would be good if we could give this attention, as it is a necessary step for the move to vector tikes.

@pnorman
Copy link
Collaborator Author

pnorman commented Jan 4, 2019

Is anybody able to check the performance of this PR?

I think it would be good if we could give this attention, as it is a necessary step for the move to vector tikes.

I found no performance issues when testing this and see nothing that could cause performance issues.

@matthijsmelissen
Copy link
Collaborator

@pnorman I still don't fully understand why we need the subselect. Why can't we simply do this?

(SELECT
    ST_PointOnSurface(way) AS way,
    "natural"
  FROM planet_osm_polygon
  WHERE way && !bbox!
    AND "natural" IN ('spring')
UNION ALL
SELECT
    way,
    "natural"
  FROM planet_osm_point
  WHERE way && !bbox!
    AND "natural" IN ('spring')

Is the only reason to avoid duplication of "natural" IN ('spring'), or is there a performance reason?

@matthijsmelissen
Copy link
Collaborator

Regarding performance: I made a quick and dirty test to extend the method proposed here to all layers. This shows no performance degradation. So I'm happy to merge this - I'm sorry that I didn't review it sooner.

Copy link
Collaborator

@matthijsmelissen matthijsmelissen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Awaiting your question about the reason for the subselect, and then this can be merged.

@pnorman
Copy link
Collaborator Author

pnorman commented Jan 11, 2019

Is the only reason to avoid duplication of "natural" IN ('spring'), or is there a performance reason?

Performance is the same. It avoids duplication and is a different way to think about it. The duplication isn't a big issue here, but when we get to a query like the POI layers, there's a huge WHERE clause, and deduplicating that is a big gain.

@matthijsmelissen
Copy link
Collaborator

Clear. You can merge it if you like (or I can merge it, as you prefer).

@pnorman pnorman merged commit 16db0b8 into gravitystorm:master Jan 11, 2019
matthijsmelissen pushed a commit to matthijsmelissen/openstreetmap-carto that referenced this pull request Jan 12, 2019
Resolves gravitystorm#1644.

This is a necessary step towards vector rendering (see gravitystorm#1644).

This is a second trial of gravitystorm#3228, based on the word by @pnorman in gravitystorm#3233.

This is a first step, once we have this than we can probably merge the point and poly layers (for example amenity-points and amenity-points-poly).
@pnorman pnorman deleted the springs_pointonsurface branch April 1, 2019 03:20
@jeisenbe jeisenbe added the code label Oct 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants