-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add selectivity estimate? #30
Comments
Yes, I've been watching this topic for a long while and this is the reason I've implemented the selectivity function for parts of q3c_join (check here ). The problem is that the q3c_ functions are SQL inlined and the underlying clauses that are executed are q3c_ang2ipix()<XX and q3c_ang2ipix()>YY. So the only way to have selectivity for this is to make custom type and operator which as far as I am aware would not allow the bitmap index scan.So basically I don't know of a way of making it work. I would really want though. If you can demonstrate the approach that would work, I'll gladly help/accept it. But to my knowledge without going into new types/operators/operator classes it impossible and even with that I'm not sure. |
Looking at the https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=74dfe58a5927b22c744b29534e67bfdd203ac028 patch, it seems it only supports conditions like That would do the job, because writing those selectivities would be easy. |
On Wed, Feb 10, 2021 at 06:58:20AM -0800, Sergey Koposov wrote:
Yes, I've been watching this topic for a long while and this is the
reason I've implemented the selectivity function for parts of
q3c_join (check here
https://github.com/segasai/q3c/blob/0254998c3a6084e53a4f6d4bd07552eabd453aaf/q3c.c#L136
). The problem is that the q3c_ functions are SQL inlined and the
underlying clauses that are executed are q3c_ang2ipix()<XX and
q3c_ang2ipix()>YY. So the only way to have selectivity for this is
to make custom type and operator which as far as I am aware would
not allow the bitmap index scan.
So, the trouble is that either
(a) we can provide selectivity estimates to q3c_join(ra1, dec1, ra2,
dec2, radius) but then lose the ability to have index scans,
defeating the whole purpose, or
(b) we keep letting postgres inline things and then lose the ability
to provide selectivity estimates?
Ouch. Since @laurenz has written the nice blog post: Perhaps he can
point at a route to escape from this conundrum?
|
That's my understanding, but I've been also looking at the git commits of those support functions, and maybe there is something there that provides support for func()<value queries or provides a way to specify index strategy given a function call, but I am not sure. There isn't a lot of documentation there.. I was thinking previously writing an email pgsql-hackers asking about it, but haven't had time. (if @laurenz can help, that'd be great). to be able to execute
queries with user-provided selectivity for between operator. |
Hm. Can't you inline a function, and the inlined expression contains a function with a support function? |
Yes, that's idea. But I don't know if the support functions for function F can provide selectivity for queries like this Line 204 in 0254998
|
After looking at what You would do away with the inlined SQL function and use a support function in its place, see PostGIS' The support function would make the optimizer run an index scan instead of or as a filter before executing the function. |
Thank you for the pointer @laurenz , I'll take a look there. |
I've never understood why in q3c--2.0.0.sql the routine q3c_join tests for a range of values against q3c_nearby_it with four cases but q3c_radial_query tests for 200 cases. Is that number related to the number of polygons that can be tested for, or is that a coincidence? Is this documented anywhere? |
The reason is that the radial query is mostly designed for large circle queries, so making a more precise approximation to a circle by going deeper in the quad-tree can save significant I/O. For the q3c_join, the main use case was thought to have a pretty small x-match radius, therefore it is less crucial to have a good quad-tree approximation to a circle, as likely stuff will be sitting on the same disk block anyway. That's the original reasoning (I think). Whether in practice q3c_radial_query going deep is that beneficial is not 100% obvious (but for sure for a query near the galactic plane in say gaia q3c_radial_query with just 4 squares would run significantly longer) |
Might we be over thinking this? I'm also struggling with the issue plans not using the q3c index when I (as a human) think they should. Rather than dealing with anything complicated, perhaps simply dealing with a SupportRequestSelectivity call to allow the planner to make better row estimates would help. I don't know if people can see the plan at https://explain.dalibo.com/plan/RMS Thoughts? |
That's exactly what seljoin does But I attach that selectivity to a special operator see https://github.com/segasai/q3c/blob/master/scripts/q3c--2.0.0.sql I also looked recently at a way of using the int8range types to replace the q3c_ang2ipix()> <= conditions |
I started a thread on https://dba.stackexchange.com/questions/312526/in-postgresql-13-how-do-i-ensure-a-supportrequestselectivity-is-made where I show a first attempt at a selectivity operator, but my function isn't |
Great! |
Actually, I'm not sure the issue is the row estimation, but the cost estimation. From an email to pgsql-hackers today: Greg Hennessy [email protected] On Thu, May 26, 2022 at 3:10 PM Tom Lane [email protected] wrote:
For a small value of a match radius (0.005 degrees): q3c_test=# explain (analyze, buffers) select * from test as a, test1 as (note: I deleted some of the output, since I think I'm keeping the So, the cost of the query is calculated as 2e10, where it expect five rows, When I do the same sort of query on a much larger match radius (5 deg) I The "total cost" is the same identical 2e10, this time the number of |
The problem here is that Line 209 in 1290e00
so PG still expects the same selectivity and therefore the cost for the part of the query involving q3c_ang2ipix()> q3c_ang2ipix()< conditions which leads to similar costs for different radii. I think at some point I bumped up the CPU cost of the q3c_sindist() make the sequetial queries less likely, but I don't think that works correctly. |
This may be a crazy thing for me to try, but I'm going to see if adding selectivity to q3c_nearby_it helps or not. |
I doubt it will work. In my understanding the selectivity can be applied only to functions or operators returning booleans. |
its a rainy day, and i'm bored. |
I still believe now that the best approach is to use multirange type and special operator between (q3c_ang2ipix() <@ multirange) that would do the 'Simplify' operation into a bunch of ranges. or maybe even just using a set of single ranges. |
You know far more about what is going on than I do, I'm just treating this as a learning experience, and don't mind trying |
Sure, no problem. It'd be great if you could find a solution. |
I think it might be possible if we were to use a GIST index rather than a b-tree one. A GIST index allows @>, so I think we can use the same ang2ipix(ra,dec) index, and then come up with a multirange function that does the same as q3c_nearby_it. I've not learned the buttons to push to deal with merge requests. I can spend some time on this. if i tried to do a solution Thoughts? |
If you can make somehow the @> work, that'd be great. Again, in here https://www.mail-archive.com/[email protected]/msg30620.html I tried and I learned that I ang2ipix(ra,dec) <@ multirange won't use the index unfortunately. So the only way I know how to make it work with multirange if the gist index is created on multirange(ang2ipix(ra,dec),ang2ipix(ra,dec)+1). Also I'm perfectly happy if improvements will only be acceptable in latest PG versions. My biggest production DB with Q3C currently uses 13. And I usually try to migrate to new version 1 year after the release. So 14 would be fine from my point of view. |
I'm able to get ang2ipix(ra,dec) <@ multirange if I create the make the ang2ipix part of a multirange (which means I'm still using 12 in my DB, but I'm about to move to RH 8 and PG14, at least that's what I'm telling my folks. |
And How do you plan make the multirange @>bigint work, since at least this doesn't use the index (PG14)
and Tom Lane says such functionality doesn't exist. |
I had been planning on using "anymultirange @> anyelement → boolean", which GIST is listed as being able to index on, but it doesn't work. Grumble. It works if I do multirange @> multirange. it works if i do multirnage @> range. it SHOULD work if I do multirange @> int8, but it doesn't. grr. |
So, I did some small amount of coding trying to deal with multiranges. I had intended to create a pull request against my clone of q3c, I hadn't intended it to be against the main repository, but I'm not sure that causes any problems. Obviously it isn't ready for prime time, so you can reject it. So far I'm mostly underwhelmed with the multirange functionality. While it does make the new sql code (aka q3c_radial_query_gist is way shorter), I've found
There still isn't a solution to the not being able to calculate selectivity, but i think the selectivities are better I attempt to attach my timing results. |
Thanks for giving it a try @gshennessy and the description! |
I took a new look at this issue, and I beleive I have a selectivity function working. It do seem to make a difference in the row estimation, but not in the cost. My understanding is the postgresql query optimizer chooses the lowest cost.
I do
where I calculate the trig inside the q3c_sindist_bool function, and the function returns boolean rather than I can do a merge request whenever you want to show my code, I'm currently trying to figure out what the tall poles Small progress is better than none, I guess. |
Thanks. If you do a PR, or post a patch in some form, I can take a look when I have time. I remember i tried in the past increase artificially the sindist cost in order to minimize the amount of times it's called in order to avoid seq scans, but I forgot why I stopped doing that. I can see how making q3c_sindist_bool function can help. |
I made a pull request. To first order changing the selectivity changes the number of estimated rows, but doesn't |
I should give a shout out to Laurenz Albe since he pointed me in the right direction. |
How do you expect me to react to this shout? |
No expectations. I simply wanted to acknowledge you were helpful in giving advice to me on how to solve the problem. Thank you. |
I'm still regularly struggling with abysimal query plans when q3c functions are in queries; disabling seqscans helps a bit of course, but it certainly is not an suboptimal and has unwelcome side effects in general.
Now the support functions Laurenz Albe reports on on https://www.cybertec-postgresql.com/en/optimizer-support-functions/ would seem to be helpful here; I would guess that giving q3c_join a selectivity of (roughly) match_radius**2/4 pi (so: circle area over sphere area) would already greatly improve the query plans (even acknowledging that star density varies by orders of magnitude over the sky).
Do you have plans to add support functions? Do you see major obstacles to at least a very rough implementation that would assume evenly distributed objects?
The text was updated successfully, but these errors were encountered: