Fix: Make has_time_series faster #416
base: main
Conversation
Force-pushed 6c3dbab to e2b7959
Codecov Report. Attention: Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #416      +/-   ##
==========================================
+ Coverage   77.75%   77.82%   +0.06%
==========================================
  Files          69       69
  Lines        5481     5636     +155
==========================================
+ Hits         4262     4386     +124
- Misses       1219     1250      +31
```
Flags with carried forward coverage won't be shown.
Force-pushed e2b7959 to 7329536
Force-pushed 85e42ee to bb3a3b3
Force-pushed bb3a3b3 to bfe1415
```diff
@@ -567,7 +567,7 @@ function _transform_single_time_series!(
         horizon = params.horizon,
         time_series_type = DeterministicSingleTimeSeries,
         scaling_factor_multiplier = get_scaling_factor_multiplier(metadata),
-        internal = get_internal(metadata),
+        internal = InfrastructureSystemsInternal(),
```
The previous behavior was incorrect, but was harmless. It caused the metadata of the transformed object to have the same UUID, but no code ever looked at that.
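For illustration, a minimal sketch of the distinction, using a hypothetical `FakeInternal` stand-in rather than the package's `InfrastructureSystemsInternal`: constructing a fresh internal object assigns a new UUID, whereas reusing `get_internal(metadata)` aliased the original's.

```julia
using UUIDs

# Illustrative stand-in for InfrastructureSystemsInternal: a fresh
# construction gets its own UUID instead of sharing the original's.
struct FakeInternal
    uuid::Base.UUID
end
FakeInternal() = FakeInternal(uuid4())

original = FakeInternal()
transformed = FakeInternal()   # what the fixed code does
aliased = original             # what reusing get_internal(metadata) did
@assert transformed.uuid != original.uuid
@assert aliased.uuid == original.uuid
```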
```diff
@@ -110,7 +110,11 @@ function add_time_series!(mgr::TimeSeriesManager, batch::Vector{TimeSeriesAssoci
     for uuid in new_ts_uuids
         serialize_time_series!(mgr.data_store, time_series_uuids[uuid])
    end
-    add_metadata!(mgr.metadata_store, owners, all_metadata)
+    if length(all_metadata) == 1
+        add_metadata!(mgr.metadata_store, owners[1], all_metadata[1])
```
This is faster than calling the array version.
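A self-contained sketch of the pattern, with a toy `Store` type standing in for the real metadata store API: when a batch has exactly one element, dispatching to a scalar method skips the generic batch machinery (here just a loop; in the real store, batch SQL handling).

```julia
# Toy stand-in for the metadata store; not the package's actual type.
struct Store
    rows::Vector{Tuple{String, String}}
end

# Scalar fast path: a single insert, no iteration or batch setup.
add_metadata!(store::Store, owner::String, metadata::String) =
    push!(store.rows, (owner, metadata))

# General batch path: pays per-call overhead even for one element.
function add_metadata!(store::Store, owners::Vector{String}, all_metadata::Vector{String})
    for (owner, metadata) in zip(owners, all_metadata)
        add_metadata!(store, owner, metadata)
    end
    return nothing
end

store = Store(Tuple{String, String}[])
owners = ["gen1"]
all_metadata = ["ts-metadata-json"]
if length(all_metadata) == 1
    add_metadata!(store, owners[1], all_metadata[1])  # scalar fast path
else
    add_metadata!(store, owners, all_metadata)        # batch path
end
```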
```diff
@@ -23,14 +23,17 @@ const SUPPLEMENTAL_ATTRIBUTE_TABLE_NAME = "supplemental_attributes"

 mutable struct SupplementalAttributeAssociations
     db::SQLite.DB
+    # If we don't cache SQL statements, there is a cost of 3-4 us on every query.
+    cached_statements::Dict{String, SQLite.Stmt}
```
I made some minor changes to the supplemental attribute associations, but did not profile them. If the time series changes work out, I can pursue these in greater detail.
```diff
+    # Caching compiled SQL statements saves 3-4 us per query.
+    # DBInterface.jl does something similar with @prepare.
+    # We need this to be tied to our connection.
+    cached_statements::Dict{String, SQLite.Stmt}
```
This applies to specific queries that we run frequently.
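A minimal, runnable sketch of the statement-caching pattern with SQLite.jl; the `cached_statement` helper and the `kv` table are made up for illustration:

```julia
using SQLite

# A per-connection statement cache: compile each SQL string once with
# SQLite.Stmt, then reuse the compiled statement on later calls.
function cached_statement(
    cache::Dict{String, SQLite.Stmt},
    db::SQLite.DB,
    sql::String,
)
    return get!(cache, sql) do
        SQLite.Stmt(db, sql)  # compilation cost (~3-4 us) paid only once
    end
end

db = SQLite.DB()  # in-memory database, for illustration only
SQLite.DBInterface.execute(db, "CREATE TABLE kv (k TEXT, v TEXT)")
cache = Dict{String, SQLite.Stmt}()

stmt = cached_statement(cache, db, "INSERT INTO kv VALUES (?, ?)")
SQLite.DBInterface.execute(stmt, ("a", "1"))
SQLite.DBInterface.execute(stmt, ("b", "2"))  # reuses the compiled statement
```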
```diff
+    # This caching allows the code to skip some string interpolations.
+    # It is experimental for PowerSimulations, which calls has_metadata frequently.
+    # It may not be necessary. Savings are less than 1 us.
+    has_metadata_statements::Dict{HasMetadataQueryKey, SQLite.Stmt}
```
This cache applies only to the has_time_series path, which is the most important one to optimize for PSI. It is possibly over-engineered.
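A hedged sketch of that idea; the `HasMetadataQueryKey` fields, the `ts_metadata` table, and the query text are illustrative assumptions, not the package's actual schema. Keying the cache by a small immutable struct avoids interpolating (and hashing) a fresh SQL string on every call:

```julia
using SQLite

# Hypothetical key type: hashing an isbits struct is cheaper than
# building an interpolated SQL string on every has_time_series call.
struct HasMetadataQueryKey
    filter_by_owner::Bool
    filter_by_type::Bool
end

function has_metadata_statement(
    cache::Dict{HasMetadataQueryKey, SQLite.Stmt},
    db::SQLite.DB,
    key::HasMetadataQueryKey,
)
    return get!(cache, key) do
        clauses = String[]
        key.filter_by_owner && push!(clauses, "owner_uuid = ?")
        key.filter_by_type && push!(clauses, "time_series_type = ?")
        suffix = isempty(clauses) ? "" : " WHERE " * join(clauses, " AND ")
        # The string is built and the statement compiled only on a cache miss.
        SQLite.Stmt(db, "SELECT COUNT(*) FROM ts_metadata" * suffix)
    end
end

db = SQLite.DB()
SQLite.DBInterface.execute(
    db,
    "CREATE TABLE ts_metadata (owner_uuid TEXT, time_series_type TEXT)",
)
cache = Dict{HasMetadataQueryKey, SQLite.Stmt}()
stmt = has_metadata_statement(cache, db, HasMetadataQueryKey(true, true))
SQLite.DBInterface.execute(stmt, ("uuid-1", "Deterministic"))
```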
Force-pushed 47db714 to 297a698
@daniel-thom There is now a workaround in PowerSimulations to handle this case ("The fact that we expect `has_time_series(component, Deterministic)` to return true when only DeterministicSingleTimeSeries is present causes extra latency."), so any impact of that can be addressed.
This PR should remain in draft until the performance in PSI is judged acceptable.
The existing code had a few issues that caused slowness when calling `has_time_series` or `get_time_series`:

- … `to_json(sys)` followed by `System(file.json)`. The new code ensures that they are always set. This could have caused latency of up to 30-40 us on each query.
- `SQLite.DBInterface.execute(db, "a sql query as string")` has an overhead of ~3-4 us. The delay seems to come from compiling the SQL statement. I'll note that Python does not suffer from this latency. Interpolating strings causes additional latency. The new code caches the statements. It also avoids string interpolations, but only for the `has_time_series` code path. This is perhaps over-engineered. If this still isn't fast enough for PSI, we need to re-think the entire idea and allow the application to define which time series attributes should be cached.
- `get_time_series` results in a SQL query that retrieves the time series metadata, which is stored as a JSON string, and then deserializes it into an object. That takes ~30 us. In most cases this is harmless; reading the data from disk takes much longer. Users of the in-memory time series storage, however, might notice the delay. To solve that, I split the metadata-as-JSON data into a separate table and added caching of the real metadata objects, so deserialization only occurs on the first read request (see the sketch below).
- A bug caused `has_time_series(component, Deterministic)` to return false if the component has `DeterministicSingleTimeSeries`. This PR fixes the bug.

The fact that we expect `has_time_series(component, Deterministic)` to return true when only `DeterministicSingleTimeSeries` is present causes extra latency. We could remove that latency by changing the requirement (make the user ask for the exact type stored) or by adding another column to the database table.
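As referenced in the third bullet above, here is a hedged sketch of the metadata-caching idea; `CachedMetadata`, its fields, and the `fetch_json` helper are hypothetical stand-ins for the real metadata type and the SQL lookup:

```julia
import JSON3

# Hypothetical metadata type and cache; the real code stores serialized
# metadata in a separate SQL table and caches the deserialized objects.
struct CachedMetadata
    name::String
    resolution_ms::Int
end

const METADATA_CACHE = Dict{String, CachedMetadata}()

function get_metadata(uuid::String, fetch_json::Function)
    # JSON parsing (~30 us) happens only on the first request per UUID.
    return get!(METADATA_CACHE, uuid) do
        obj = JSON3.read(fetch_json(uuid))
        CachedMetadata(String(obj.name), Int(obj.resolution_ms))
    end
end

# fetch_json stands in for the SQL lookup against the metadata table.
fetch_json(uuid) = """{"name": "max_active_power", "resolution_ms": 3600000}"""
m1 = get_metadata("uuid-1", fetch_json)  # deserializes and caches
m2 = get_metadata("uuid-1", fetch_json)  # cache hit; no JSON parsing
@assert m1 === m2
```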