Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FAIMMS harvester causing excessive database load #551

Open
ghost opened this issue Sep 21, 2017 · 0 comments
Open

FAIMMS harvester causing excessive database load #551

ghost opened this issue Sep 21, 2017 · 0 comments
Assignees

Comments

@ghost
Copy link

ghost commented Sep 21, 2017

The FAIMMS harvester is running database queries which generate an unreasonable amount of load on the database server.

aodn/data-services#742 will mitigate the impact somewhat, however the harvester must be updated in order to perform more optimally.

The queries seem to run relatively quickly, but while running, the DELETE query very quickly consumes nearly all of the available Postgres buffer pool memory, causing every other connection to slow to a crawl.

The problematic queries are the ones that look like:

harvest=# select query from database_activity();
                                                    query                                                    
-------------------------------------------------------------------------------------------------------------
 DELETE FROM measurements_qaqc where measurement IN (with duplicates as (select channel_id, "TIME", count(*)+
 from measurements_qaqc                                                                                     +
 group by channel_id, "TIME"                                                                                +
 HAVING count(*) > 1)                                                                                       +
 select min(measurements_qaqc.measurement)                                                                  +
 from duplicates,measurements_qaqc                                                                          +
 where measurements_qaqc."TIME" = duplicates."TIME" AND measurements_qaqc.channel_id = duplicates.channel_id+
 GROUP BY  duplicates.channel_id, duplicates."TIME")

I'd suggest we need to rework this query on a copy of the whole schema such that it consumes substantially less memory!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant