clients_daily_scalar_aggregates repeats the metric name and type for every key, process, agg_type combination in each row:
This can be improved by using a nested struct for each metric so that the name, type, and key are only stored once per row:
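As a rough illustration of the idea, here is a sketch in Python rather than the actual table definition. The field names (`metric`, `metric_type`, `key`, `process`, `agg_type`, `value`, `aggregates`) and the sample values are assumptions made for the example, not taken from the real schema:

```python
from collections import defaultdict

# Hypothetical flat layout: one entry per (key, process, agg_type) combination,
# each repeating the metric name and type.
flat = [
    {"metric": "tabs_opened", "metric_type": "scalar", "key": "",
     "process": "parent", "agg_type": "max", "value": 4.0},
    {"metric": "tabs_opened", "metric_type": "scalar", "key": "",
     "process": "parent", "agg_type": "sum", "value": 9.0},
    {"metric": "tabs_opened", "metric_type": "scalar", "key": "",
     "process": "content", "agg_type": "max", "value": None},
    {"metric": "search_count", "metric_type": "keyed-scalar", "key": "urlbar",
     "process": "parent", "agg_type": "sum", "value": 2.0},
]


def nest(flat_aggregates):
    """Group flat entries so name, type, and key appear once per metric."""
    grouped = defaultdict(list)
    for agg in flat_aggregates:
        ident = (agg["metric"], agg["metric_type"], agg["key"])
        grouped[ident].append(
            {"process": agg["process"], "agg_type": agg["agg_type"], "value": agg["value"]}
        )
    return [
        {"metric": m, "metric_type": t, "key": k, "aggregates": aggs}
        for (m, t, k), aggs in grouped.items()
    ]


nested = nest(flat)
```

In this toy input the four flat entries collapse into two metrics, and the repeated strings are where the saving would come from.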
This would break the existing schema but that should be easy to deal with by modifying the existing table in place and updating the downstream query.
A further improvement is to not store metrics with only null values but I'm not sure if there's some downstream need for retaining those values.
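A minimal sketch of what dropping null-only metrics could look like, again in Python and assuming a hypothetical nested layout where each metric carries an `aggregates` list of `value` entries (these names are illustrative only):

```python
# Hypothetical nested metrics; field names are assumptions for illustration.
metrics = [
    {"metric": "tabs_opened", "aggregates": [{"agg_type": "max", "value": 4.0},
                                              {"agg_type": "sum", "value": None}]},
    {"metric": "unused_probe", "aggregates": [{"agg_type": "max", "value": None},
                                               {"agg_type": "sum", "value": None}]},
]


def drop_null_only(nested_metrics):
    """Keep a metric only if at least one of its aggregate values is non-null."""
    return [
        m for m in nested_metrics
        if any(a["value"] is not None for a in m["aggregates"])
    ]


kept = drop_null_only(metrics)
```

Metrics with a mix of null and non-null values are kept whole here; whether downstream consumers rely on the all-null rows is the open question raised above.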
From testing on a 1% sample, the above schema change would reduce the table to ~29% of its current size, not storing nulls would reduce it to ~31%, and doing both would reduce it to ~12.3%. This would reduce storage costs significantly. If we change to a 7-day retention period for clients_daily_scalar, this would save around $150-$180 a week on storage and some more on downstream ETL data scanned (maybe another couple hundred). This isn't huge but worth considering.
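As a quick sanity check on those figures (a back-of-the-envelope sketch, not part of the original measurement), one can compare the measured combined reduction with what the two changes would give if their savings were fully independent:

```python
# Fractions of the original table size, as reported from the 1% sample.
schema_only = 0.29      # nested struct only
nulls_only = 0.31       # dropping null-only metrics only
both_measured = 0.123   # both changes together

# If the two savings were independent, the remaining fractions would multiply.
both_if_independent = schema_only * nulls_only

print(f"independent estimate: {both_if_independent:.1%}")
print(f"measured:             {both_measured:.1%}")
```

The measured figure comes out somewhat above the independent estimate, which may suggest the two changes partly remove the same bytes; that reading is an inference from the numbers, not something the test itself established.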
From mozilla/bigquery-etl#919
mozilla/bigquery-etl#919 (comment)