Add timeseries #223

Merged
merged 36 commits into from
Aug 7, 2024
280adfe
minimally working
guilhermebodin Jun 25, 2024
8e60216
updates
guilhermebodin Jun 26, 2024
00d7823
Add tests
pedroripper Jun 26, 2024
6044361
Uncomment test
pedroripper Jun 26, 2024
5f4c05b
Uncomment more tests
pedroripper Jun 26, 2024
ac4ce80
Add more tests
pedroripper Jun 27, 2024
1f8be1b
update _set_default_pragmas
guilhermebodin Jun 28, 2024
3d19856
Update timeseries methods
guilhermebodin Jul 1, 2024
9002524
Fix
pedroripper Jul 1, 2024
14c5cdf
Merge branch 'gb/timeseries' of https://github.com/psrenergy/PSRClass…
pedroripper Jul 1, 2024
69d466d
Add script
pedroripper Jul 1, 2024
fe57b52
first try of time controller
guilhermebodin Jul 1, 2024
6dca9f3
minimally working
guilhermebodin Jul 2, 2024
cfe9663
Add tests for timecontroller
pedroripper Jul 2, 2024
f679465
update
guilhermebodin Jul 2, 2024
105059c
Merge branch 'gb/timeseries' of https://github.com/psrenergy/PSRClass…
guilhermebodin Jul 2, 2024
6e1dbb3
update
guilhermebodin Jul 2, 2024
e867c8e
Update time controller tests
pedroripper Jul 2, 2024
c178f91
add function to count elements in a table
guilhermebodin Jul 3, 2024
b481369
Tests for empty cache
pedroripper Jul 3, 2024
bd6ffe5
Add docs
pedroripper Jul 3, 2024
e072cd7
Change table names
pedroripper Jul 4, 2024
53f94be
Add update and delete time series
pedroripper Jul 4, 2024
1a564ad
update
guilhermebodin Jul 5, 2024
039f932
Update Docs
pedroripper Jul 8, 2024
5422d63
Update regex
pedroripper Jul 8, 2024
fd17673
Error handling and tests
pedroripper Jul 9, 2024
719000f
Fix
pedroripper Jul 9, 2024
e87a6b6
Fix time controller query
pedroripper Jul 12, 2024
e8c741e
Fix query
pedroripper Jul 16, 2024
724f02d
Update according to .dart
pedroripper Jul 17, 2024
5ac65a8
Update docs
pedroripper Jul 17, 2024
fac94af
Revert "Update according to .dart"
pedroripper Jul 18, 2024
11a7d79
update name to update_time_series_row
guilhermebodin Jul 26, 2024
e204ec0
add new add_time_series_row! method
guilhermebodin Jul 27, 2024
b27311f
Format
pedroripper Aug 7, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -26,4 +26,5 @@ Manifest.toml
*.out
*.ok
debug_psrclasses
*.gz
*.sqlite
1 change: 1 addition & 0 deletions docs/make.jl
@@ -23,6 +23,7 @@ makedocs(;
"PSRDatabaseSQLite Overview" => String[
"psrdatabasesqlite/introduction.md",
"psrdatabasesqlite/rules.md",
"psrdatabasesqlite/time_series.md",
],
"OpenStudy and OpenBinary Examples" => String[
"examples/reading_parameters.md",
32 changes: 28 additions & 4 deletions docs/src/psrdatabasesqlite/rules.md
@@ -132,10 +132,10 @@ CREATE TABLE HydroPlant_vector_GaugingStation(

```

### Time Series
### Time Series Files

- All Time Series for the elements from a Collection should be stored in a Table
- The Table name should be the same as the name of the Collection followed by `_timeseriesfiles`, as presented below
- All Time Series files for the elements from a Collection should be stored in a Table
- The Table name should be the same as the name of the Collection followed by `_time_series_files`, as presented below

<p style="text-align: center"> COLLECTION_vector_ATTRIBUTE</p>

@@ -145,12 +145,36 @@ CREATE TABLE HydroPlant_vector_GaugingStation(
Example:

```sql
CREATE TABLE Plant_timeseriesfiles (
CREATE TABLE Plant_time_series_files (
generation TEXT,
cost TEXT
) STRICT;
```

### Time Series
- Time Series stored in the database should be stored in a table with the name of the Collection followed by `_time_series_` and the name of the attribute group, as presented below.

<p style="text-align: center"> COLLECTION_time_series_GROUP_OF_ATTRIBUTES</p>

Notice that it is quite similar to the vector attributes, but without the `vector_index` column.
Instead, a mandatory column named `date_time` should be created to store the date of the time series data.

Example:

```sql
CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
```
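
The naming rule above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper `parse_time_series_table` (it is not part of `PSRDatabaseSQLite`, and the regex actually used by the package may differ). It splits a table name into its Collection and group, and treats `COLLECTION_time_series_files` as a separate case because that name is used for time series files.

```julia
# Hypothetical helper for illustration only; not part of PSRDatabaseSQLite.
# Splits "COLLECTION_time_series_GROUP" into its collection and group names.
function parse_time_series_table(table_name::AbstractString)
    # Lazy first capture so the collection ends at the first "_time_series_"
    m = match(r"^(\w+?)_time_series_(\w+)$", table_name)
    m === nothing && return nothing
    collection, group = m.captures
    # "COLLECTION_time_series_files" stores file references, not a group
    group == "files" && return nothing
    return (collection = String(collection), group = String(group))
end

parse_time_series_table("Resource_time_series_group1")
parse_time_series_table("Plant_time_series_files")
```

Under these assumptions, the first call yields the collection `Resource` with group `group1`, and the second returns `nothing`.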

!!! tip
    For more information on how to handle time series data, please refer to the [Time Series](./time_series.md) section.

## Migrations

Migrations are an important part of the `DatabaseSQLite` framework. They are used to update the database schema to a new version without the need to delete the database and create a new one from scratch. Migrations are defined by two separate `.sql` files that are stored in the `migrations` directory of the model. The first file is the `up` migration and it is used to update the database schema to a new version. The second file is the `down` migration and it is used to revert the changes made by the `up` migration. Migrations are stored in directories in the model and they have a specific naming convention. The name of the migration folder should be the number of the version (e.g. `/migrations/1/`).
254 changes: 254 additions & 0 deletions docs/src/psrdatabasesqlite/time_series.md
@@ -0,0 +1,254 @@
# Time Series

It is possible to store time series data in your database. Time series in `PSRDatabaseSQLite` are very flexible. You can have missing values, and you can have sparse data.

There is a specific table format that must be followed. Consider the following example:

```sql
CREATE TABLE Resource (
id INTEGER PRIMARY KEY AUTOINCREMENT,
label TEXT UNIQUE NOT NULL
) STRICT;

CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
```

Every time series must be indexed by a `date_time` column in the format `YYYY-MM-DD HH:MM:SS`. You can use Julia's `Dates` standard library to handle this format.

```julia
using Dates
date = DateTime(2024, 3, 1) # 2024-03-01T00:00:00 (March 1st, 2024)
```
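
If you need the textual form stored in the `date_time` column, for example when inspecting the database with plain SQL, the conversion can be sketched with `Dates.format`. The constant name `DATE_TIME_FORMAT` is an assumption made for this example, and the package may perform this conversion for you.

```julia
using Dates

# Format string matching `YYYY-MM-DD HH:MM:SS` (the name is hypothetical)
const DATE_TIME_FORMAT = dateformat"yyyy-mm-dd HH:MM:SS"

# DateTime -> text, as it would appear in the `date_time` column
as_text = Dates.format(DateTime(2024, 3, 1), DATE_TIME_FORMAT)

# text -> DateTime, the reverse conversion
parsed = DateTime(as_text, DATE_TIME_FORMAT)
```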

Notice that in this example, there are two value columns `some_vector1` and `some_vector2`. You can have as many value columns as you want. You can also separate the time series data into different tables, by creating a table `Resource_time_series_group2` for example.

It is also possible to add more dimensions to your time series, such as `block` and `scenario`.

```sql
CREATE TABLE Resource_time_series_group2 (
id INTEGER,
date_time TEXT NOT NULL,
block INTEGER NOT NULL,
some_vector3 REAL,
some_vector4 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time, block)
) STRICT;
```

## Rules

Because time series may contain missing values and sparse data, queries follow a fixed rule for filling gaps.

When you query a time series entry whose value is missing, the query first looks for an entry with an earlier `date_time` that has a non-missing value. If one exists, that earlier value is returned. If there is no earlier value, a placeholder is returned according to the type of the data you are querying.

- For `Float64`, it returns `NaN`.
- For `Int64`, it returns `typemin(Int)`.
- For `String`, it returns `""` (empty String).
- For `DateTime`, it returns `typemin(DateTime)`.

For example, if you have the following data:

| **Date** | **some_vector1(Float64)** | **some_vector2(Float64)** |
|:--------:|:-----------:|:-----------:|
| 2020 | 1.0 | missing |
| 2021 | missing | 1.0 |
| 2022 | 3.0 | missing |

1. If you query for `some_vector1` at `2020`, it returns `1.0`.
2. If you query for `some_vector2` at `2020`, it returns `NaN`.
3. If you query for `some_vector1` at `2021`, it returns `1.0`.
4. If you query for `some_vector2` at `2021`, it returns `1.0`.
5. If you query for `some_vector1` at `2022`, it returns `3.0`.
6. If you query for `some_vector2` at `2022`, it returns `1.0`.
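
The rule above can be sketched in plain Julia. This is an illustration of the lookup logic only, not the package's implementation: the helpers `missing_sentinel` and `value_at` are hypothetical names, and the sketch assumes the dates are sorted in ascending order.

```julia
using Dates

# Placeholder returned when no earlier non-missing value exists
missing_sentinel(::Type{Float64}) = NaN
missing_sentinel(::Type{Int64}) = typemin(Int64)
missing_sentinel(::Type{String}) = ""
missing_sentinel(::Type{DateTime}) = typemin(DateTime)

# Hypothetical helper: returns the last non-missing value at or before `query`.
# Assumes `dates` is sorted in ascending order.
function value_at(
    dates::Vector{DateTime},
    values::Vector{Union{Missing, T}},
    query::DateTime,
) where {T}
    result = missing_sentinel(T)
    for (d, v) in zip(dates, values)
        d > query && break
        ismissing(v) || (result = v)
    end
    return result
end

# Data from the table above
dates = [DateTime(2020), DateTime(2021), DateTime(2022)]
some_vector1 = Union{Missing, Float64}[1.0, missing, 3.0]
some_vector2 = Union{Missing, Float64}[missing, 1.0, missing]

value_at(dates, some_vector1, DateTime(2021)) # 1.0, carried forward from 2020
value_at(dates, some_vector2, DateTime(2020)) # NaN, no earlier value exists
```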


## Inserting data

When creating a new element that has a time series, you can pass this information via a `DataFrame`. Consider the collection `Resource` with the two time series tables `Resource_time_series_group1` and `Resource_time_series_group2`.

```julia
using DataFrames
using Dates
using PSRClassesInterface
PSRDatabaseSQLite = PSRClassesInterface.PSRDatabaseSQLite

db = PSRDatabaseSQLite.create_empty_db_from_schema(db_path, path_schema; force = true)

PSRDatabaseSQLite.create_element!(db, "Configuration"; label = "Toy Case", value1 = 1.0)

df_group1 = DataFrame(;
date_time = [DateTime(2000), DateTime(2001), DateTime(2002)],
some_vector1 = [missing, 1.0, 2.0],
some_vector2 = [1.0, missing, 5.0],
)

df_group2 = DataFrame(;
date_time = [
DateTime(2000),
DateTime(2000),
DateTime(2000),
DateTime(2000),
DateTime(2001),
DateTime(2001),
DateTime(2001),
DateTime(2009),
],
block = [1, 2, 3, 4, 1, 2, 3, 1], # each (date_time, block) pair must be unique (primary key)
some_vector3 = [1.0, 2.0, 3.0, 4.0, 1, 2, 3, 4],
some_vector4 = [1.0, 2.0, 3.0, 4.0, 1, 2, 3, 4],
)


PSRDatabaseSQLite.create_element!(
db,
"Resource";
label = "Resource 1",
group1 = df_group1,
group2 = df_group2,
)
```

It is also possible to insert a single row of a time series. This is useful when you want to insert a specific dimension entry. This way of inserting time series is less efficient than inserting a whole `DataFrame`.

```julia
using DataFrames
using Dates
using PSRClassesInterface
PSRDatabaseSQLite = PSRClassesInterface.PSRDatabaseSQLite

db = PSRDatabaseSQLite.create_empty_db_from_schema(db_path, path_schema; force = true)

PSRDatabaseSQLite.create_element!(db, "Configuration"; label = "Toy Case", value1 = 1.0)

PSRDatabaseSQLite.create_element!(
db,
"Resource";
label = "Resource 1"
)

PSRDatabaseSQLite.add_time_series_row!(
db,
"Resource",
"some_vector1",
"Resource 1",
10.0; # new value
date_time = DateTime(2000)
)

PSRDatabaseSQLite.add_time_series_row!(
db,
"Resource",
"some_vector1",
"Resource 1",
11.0; # new value
date_time = DateTime(2001)
)
```

## Reading data

You can read the information from the time series in two different ways.

### Reading as a table
First, you can read the whole time series table for a given value, as a `DataFrame`.

```julia
df = PSRDatabaseSQLite.read_time_series_table(
db,
"Resource",
"some_vector1",
"Resource 1",
)
```

### Reading a single row

It is also possible to read a single row of the time series as an array, with one value per element of the collection. This is useful when you want to query a specific dimension entry.
This function caches the previous and next non-missing values, which speeds up successive reads.

```julia
values = PSRDatabaseSQLite.read_time_series_row(
db,
"Resource",
"some_vector1",
Float64;
date_time = DateTime(2020)
)
```

When querying a row, all values should be non-missing. If a value is missing, the function returns the previous non-missing value instead. If there is no previous non-missing value either, it returns a placeholder according to the type of the data you are querying.


- For `Float64`, it returns `NaN`.
- For `Int64`, it returns `typemin(Int)`.
- For `String`, it returns `""` (empty String).
- For `DateTime`, it returns `typemin(DateTime)`.

For example, if you have the following data for the time series `some_vector1`:

| **Date** | **Resource 1** | **Resource 2** |
|:--------:|:-----------:|:-----------:|
| 2020 | 1.0 | missing |
| 2021 | missing | 1.0 |
| 2022 | 3.0 | missing |

1. If you query at `2020`, it returns `[1.0, NaN]`.
2. If you query at `2021`, it returns `[1.0, 1.0]`.
3. If you query at `2022`, it returns `[3.0, 1.0]`.


## Updating data

When updating one of the entries of a time series for a given element and attribute, you need to specify the exact dimension values of the row you want to update.


For example, consider a time series that has `block` and `date_time` dimensions.

```julia
PSRDatabaseSQLite.update_time_series_row!(
db,
"Resource",
"some_vector3",
"Resource 1",
10.0; # new value
date_time = DateTime(2000),
block = 1
)
```

## Deleting data

You can delete the whole time series of an element for a given time series group.
Consider the following table:

```sql
CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
```

This table represents a "group" that stores two time series `some_vector1` and `some_vector2`. You can delete all the data from this group by calling the following function:

```julia
PSRDatabaseSQLite.delete_time_series!(
db,
"Resource",
"group1",
"Resource 1",
)
```

When trying to read a time series that has been deleted, the function will return an empty `DataFrame`.
3 changes: 3 additions & 0 deletions profiling/Project.toml
@@ -0,0 +1,3 @@
[deps]
PProf = "e4faabce-9ead-11e9-39d9-4379958e3056"
Profile = "9abbd945-dff8-562f-b5e8-e1ebf5ef1b79"
12 changes: 12 additions & 0 deletions profiling/create_profile.jl
@@ -0,0 +1,12 @@
# You should run the script from the profiling directory

using Profile
using PProf
import Pkg
root_path = dirname(@__DIR__)
Pkg.activate(root_path)
using PSRClassesInterface

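# Run the script once first so that compilation is not part of the profile
include("../script_time_controller.jl")
# Profile the second run and export the result with PProf
@profile include("../script_time_controller.jl")
pprof()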
8 changes: 8 additions & 0 deletions profiling/open_profile.jl
@@ -0,0 +1,8 @@
# You should run the script from the profiling directory

using Profile
using PProf

file_name = "profile.pb.gz"

PProf.refresh(; file = file_name, webport = 57998)
1 change: 1 addition & 0 deletions src/PSRDatabaseSQLite/PSRDatabaseSQLite.jl
@@ -20,6 +20,7 @@ include("exceptions.jl")
include("utils.jl")
include("attribute.jl")
include("collection.jl")
include("time_controller.jl")
include("database_sqlite.jl")
include("create.jl")
include("read.jl")
12 changes: 12 additions & 0 deletions src/PSRDatabaseSQLite/attribute.jl
@@ -99,6 +99,18 @@ mutable struct VectorRelation{T} <: VectorAttribute
end
end

mutable struct TimeSeries{T} <: VectorAttribute
id::String
type::Type{T}
default_value::Union{Missing, T}
not_null::Bool
group_id::String
parent_collection::String
table_where_is_located::String
dimension_names::Vector{String}
num_dimensions::Int
end

mutable struct TimeSeriesFile{T} <: ReferenceToFileAttribute
id::String
type::Type{T}