Create a separate BigQuery table for "Connection statistics" and "Battery state" sensor_type. #111

Closed
time-trader opened this issue Dec 12, 2022 · 2 comments
Assignees
Labels
wontfix This will not be worked on

Comments

@time-trader
Contributor

Feature request

Use Case

Auxiliary data such as connection statistics (and, in future, battery state, see #35) are needed for debugging and monitoring purposes and would rarely be queried together with the rest of the data.

Current state

All data is stored in the sensor_data column and is processed for every query. See the related issue in aerosense-tools.

@thclark
Contributor

thclark commented Dec 12, 2022

This is almost certainly better dealt with by partitioning and/or clustering, since everything is already wired up as receipt of raw data. There's nothing inherently special about this data compared to any other type: if we were to accept this pattern, we would have to do the same for every different kind of data, which we avoid in order to allow the data to evolve.
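For illustration only, here is a minimal sketch of what the partitioning/clustering route could look like, written as a small Python helper that builds a BigQuery DDL statement. The table names and the timestamp column name are hypothetical placeholders (the actual schema isn't shown in this thread), and the sketch assumes the timestamp column is of TIMESTAMP type; only sensor_type is taken from the issue title.

```python
def clustered_table_ddl(
    source: str,
    target: str,
    timestamp_column: str,
    cluster_column: str = "sensor_type",
) -> str:
    """Build a BigQuery DDL statement that copies `source` into a new table
    partitioned by day and clustered by sensor type.

    All table and column names passed in are assumptions for illustration;
    they are not taken from the real project schema.
    """
    return (
        f"CREATE TABLE `{target}`\n"
        f"PARTITION BY DATE({timestamp_column})\n"
        f"CLUSTER BY {cluster_column}\n"
        f"AS SELECT * FROM `{source}`"
    )


# Hypothetical dataset/table/column names, purely as an example.
ddl = clustered_table_ddl(
    source="my_dataset.sensor_data",
    target="my_dataset.sensor_data_clustered",
    timestamp_column="measured_at",
)
print(ddl)
```

With a layout like this, a query that filters on sensor_type and a date range should only need to scan the matching partitions and blocks rather than the whole table, which is the effect the separate-table proposal is after.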

@time-trader
Contributor Author

@thclark Allow me to partially disagree. This data (connection stats and battery info) has an entirely different purpose from the rest (measurements of direct interest), and thus a different use and hence a different lifecycle. Just a few quick examples:

  1. Almost no one outside of aerosense would be interested in this data.
  2. This data is never queried together with, or compared to, other data.
  3. The primary possible use is system supervisory monitoring, i.e. sending it to the dashboard (so the number of requests is higher than for other "sensors"), and maybe debugging/system development (?)

On the other hand, I see your point, as the total amount of this type of data per day is very small:
10 bytes per sample * 22.(2) samples per second * 86400 seconds per day ~ 20 MB (60 MB if we have 3 nodes)
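
As a quick check of that estimate, the arithmetic can be sketched in a few lines of Python. The sample size, sample rate, and node count are the figures quoted above; reading 22.(2) as the repeating decimal 22.222... (200/9) is an assumption, as are the constant and function names.

```python
from fractions import Fraction

BYTES_PER_SAMPLE = 10
# 22.(2) read as the repeating decimal 22.222..., i.e. 200/9 (assumption).
SAMPLES_PER_SECOND = Fraction(200, 9)
SECONDS_PER_DAY = 86_400


def daily_bytes(nodes: int = 1) -> int:
    """Bytes of auxiliary data produced per day for the given number of nodes."""
    return int(BYTES_PER_SAMPLE * SAMPLES_PER_SECOND * SECONDS_PER_DAY * nodes)


print(daily_bytes() / 1e6)   # MB per day for one node
print(daily_bytes(3) / 1e6)  # MB per day for three nodes
```

This gives 19.2 MB per day for one node and 57.6 MB for three, consistent with the rounded figures of roughly 20 MB and 60 MB.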

And in terms of dashboard development, I imagine it will rely more and more on data from post-processed tables rather than the raw data table, so it might be worth just splitting it from the rest at the post-processing stage.

Let's mark it as "won't fix" for now and close it later?

@time-trader time-trader added the wontfix This will not be worked on label Mar 27, 2023
3 participants