Improve query efficiency: minimise data processed per query request. #14

Open
time-trader opened this issue Nov 29, 2022 · 2 comments

time-trader commented Nov 29, 2022

Currently, every sensor data query processes the whole aerosense-twined.greta.sensor_data table.
The query uses SELECT * instead of naming only the columns it needs, e.g. SELECT datetime, "f0_", "f1_", ... etc.

Problem

The table currently contains around 50 GB of data, so 20 queries process about 1 TB (20 × 50 GB), costing around 5 USD at current pricing.
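
The arithmetic behind that estimate, as a rough sketch. The 5 USD per TB rate is only what the figures above imply, the decimal conversion (1 TB = 1000 GB) is an assumption, and `estimate_cost_usd` is an illustrative helper rather than anything in the codebase:

```python
def estimate_cost_usd(gb_per_query, n_queries, usd_per_tb=5.0):
    """Rough cost of repeatedly scanning the same amount of data.

    usd_per_tb is an assumed rate taken from the figures in this issue.
    """
    tb_processed = gb_per_query * n_queries / 1000  # assumes 1 TB = 1000 GB
    return tb_processed * usd_per_tb


# 20 full scans of a ~50 GB table
print(estimate_cost_usd(50, 20))  # 5.0
```

The same helper shows the effect of scanning less per query: if selecting fewer columns cut the data processed to, say, 5 GB, the same 20 queries would come to 0.5 USD under these assumptions.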

Solutions

A query is charged by the total data processed in the columns it selects, and the data per column depends on that column's data type. So:

  1. Immediate solution: select only the datetime and sensor_value columns.
  2. Longer term solutions to discuss:
    2.1 Create permanent tables for each sensor_type with columns for each sensor
    2.2 Introduce table partitioning (for example on a daily basis)
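
A minimal sketch of the immediate solution, with an optional day filter that would only pay off once daily partitioning (2.2) exists. `build_sensor_query` is a hypothetical helper, not the existing query code; it assumes `datetime` is a TIMESTAMP column and that column names come from trusted constants, not user input:

```python
from datetime import date, timedelta

TABLE = "aerosense-twined.greta.sensor_data"  # table named in this issue


def build_sensor_query(columns=("datetime", "sensor_value"), day=None):
    """Build a query that names its columns instead of using SELECT *.

    If `day` is given, restrict the query to that day. On a table
    partitioned by day on `datetime` (proposal 2.2, not yet in place),
    such a filter would let the engine skip the other partitions.
    """
    if not columns:
        raise ValueError("name at least one column")
    query = f"SELECT {', '.join(columns)} FROM `{TABLE}`"
    if day is not None:
        next_day = day + timedelta(days=1)
        # Half-open range [day, next_day) covers exactly one day.
        query += (
            f" WHERE datetime >= TIMESTAMP('{day.isoformat()}')"
            f" AND datetime < TIMESTAMP('{next_day.isoformat()}')"
        )
    return query


print(build_sensor_query())
print(build_sensor_query(day=date(2022, 11, 29)))
```

On an unpartitioned table the WHERE clause does not reduce the data charged, since the selected columns are still scanned in full; only the column list helps until partitioning is introduced.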
@time-trader time-trader added the enhancement New feature or request label Nov 29, 2022
@cortadocodes cortadocodes linked a pull request Nov 29, 2022 that will close this issue

time-trader commented Nov 29, 2022
