Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up forecast value read table #116

Open
peterdudfield opened this issue Oct 31, 2022 · 5 comments
Open

Speed up forecast value read table #116

peterdudfield opened this issue Oct 31, 2022 · 5 comments

Comments

@peterdudfield
Copy link
Contributor

peterdudfield commented Oct 31, 2022

Detailed Description

Currently takes ~ 6 seconds for each GSP to read forecast value table when looking at a X hour forecast.

Query at the momment is

Select DISTINCT ON (forecast_value.target_time) forecast_value.created_utc AS forecast_value_created_utc, forecast_value.uuid AS forecast_value_uuid, forecast_value.target_time AS forecast_value_target_time, forecast_value.expected_power_generation_megawatts AS forecast_value_expected_power_generation_megawatts, forecast_value.forecast_id AS forecast_value_forecast_id
FROM forecast_value 
JOIN forecast ON forecast.id = forecast_value.forecast_id 
JOIN location ON location.id = forecast.location_id
AND location.gsp_id = 105
and forecast_value.target_time >= '2022-10-30' 
AND forecast_value.created_utc >= '2022-10-29' 
AND forecast_value.target_time - forecast_value.created_utc >= interval '240 minute' 
AND forecast_value.created_utc < '2022-10-31 13:01'
AND forecast.created_utc >= '2022-10-30' 
ORDER BY forecast_value.target_time, forecast_value.created_utc DESC;

We have already partitation this table, but there are about 36 million rows per month. This means by the end of the month, its a bit slow. ~ 6 seconds per query. Note if you query again its quicker (this is how SQL works)

Context

  • FE app is slow due to this slow read

Possible Implementation

  • add datetime_interval join to forecast_value table. This means we would filter on these valyes and there would be no duplicate datetimes. This is fairly common pratice in databases
  • To test if this works, we could manually
    1 . add a column to a partition table, ALTER TABLE forecast_value ADD datetime_interval_id int;
    2 . update and add datetime_intervals
    3 . then run updated query, to see if it speeds up. If it does, can then add it in
@peterdudfield
Copy link
Contributor Author

peterdudfield commented Oct 31, 2022

Test results

This was done on different gsp locations
With no datetim interval [s]
15(gsp=35)
10 (gsp 99)
15 (gsp 189)
12 (gsp 250)
6 (gsp 259)
18 (gsp 121)
18 (gsp 200)
12 (gsp 150)
2 (gsp 2)
2 (gsp 4)
1 (gsp 6)
2 (gsp 8)

mean = 7

With no datetime interval [s]
10 (gsp=67)
2 (gsp 72)
20 (gsp 267)
15 (gsp 135)
6 (gsp 259)
10 (gsp 301)
4 (gsp 316)
8 (gsp 100)
19 (gsp 10)
3 (gsp 1)
2 (gsp 3)
2 (gsp 5)
2 (gsp 7)

mean = 8

I tried to make an index, but it took > 10 mins
CREATE INDEX forecast_value_2022_10_datetime_interval_id on forecast_value_2022_10(datetime_interval_id)

EXPLAIN ANALYZE Select DISTINCT ON (forecast_value.target_time) forecast_value.created_utc AS forecast_value_created_utc, forecast_value.uuid AS forecast_value_uuid, forecast_value.target_time AS forecast_value_target_time, forecast_value.expected_power_generation_megawatts AS forecast_value_expected_power_generation_megawatts, forecast_value.forecast_id AS forecast_value_forecast_id
FROM forecast_value 
JOIN forecast ON forecast.id = forecast_value.forecast_id 
JOIN location ON location.id = forecast.location_id
JOIN datetime_interval ON forecast_value.datetime_interval_id = datetime_interval.id
AND location.gsp_id = 211
-- and datetime_interval.start_datetime_utc >= '2022-10-25' 
-- and datetime_interval.end_datetime_utc <= '2022-10-26' 
and forecast_value.target_time >= '2022-10-25' 
and forecast_value.target_time <= '2022-10-26' 
AND forecast_value.created_utc >= '2022-10-25' 
AND forecast_value.target_time - forecast_value.created_utc >= interval '240 minute' 
AND forecast_value.created_utc < '2022-10-26 13:01'
AND forecast.created_utc >= '2022-10-25' 
ORDER BY forecast_value.target_time, forecast_value.created_utc DESC;

@peterdudfield
Copy link
Contributor Author

Could also add index on creation_utc - CREATE INDEX forecast_value_2022_10_created_utc on forecast_value_2022_10(created_utc DESC)

It took 20 mins, but maybe it could have been quicker

@peterdudfield
Copy link
Contributor Author

peterdudfield commented Oct 31, 2022

Other option

  1. Try changing double precision to real - https://www.postgresql.org/docs/current/datatype-numeric.html - done
  2. Look into cluster on the an index
  3. Create table that only stores the last 'week' of data, then read this - Create last week table #124
  4. create INDEX idx_forecast_created_utc on forecast(created_utc) (THIS SEEMS TO HELP) Also did index on forecast_value - created_utc. They both probably help
  5. reorder query, to filter on sub tables, rather than joining and filtering (NOT MUCH DIFFERENCE)

@peterdudfield
Copy link
Contributor Author

Would be good to bench makr against production to check the changes are doing the desired effect

@peterdudfield peterdudfield changed the title Speed up forecast valeue read table Speed up forecast value read table Nov 11, 2022
@peterdudfield
Copy link
Contributor Author

Above 3. should mean that there is atmost 7 days of data to read. This should make the fast to load. Lets review in 1 weeks time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Todo
Development

No branches or pull requests

1 participant