Merge pull request #223 from psrenergy/gb/timeseries
Add timeseries
pedroripper authored Aug 7, 2024
2 parents b40ef3c + b27311f commit 5536029
Showing 30 changed files with 2,337 additions and 98 deletions.
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -26,4 +26,5 @@ Manifest.toml
*.out
*.ok
debug_psrclasses
*.gz
*.sqlite
1 change: 1 addition & 0 deletions docs/make.jl
Expand Up @@ -23,6 +23,7 @@ makedocs(;
"PSRDatabaseSQLite Overview" => String[
"psrdatabasesqlite/introduction.md",
"psrdatabasesqlite/rules.md",
"psrdatabasesqlite/time_series.md",
],
"OpenStudy and OpenBinary Examples" => String[
"examples/reading_parameters.md",
32 changes: 28 additions & 4 deletions docs/src/psrdatabasesqlite/rules.md
Expand Up @@ -132,10 +132,10 @@ CREATE TABLE HydroPlant_vector_GaugingStation(

```

### Time Series
### Time Series Files

- All Time Series for the elements from a Collection should be stored in a Table
- The Table name should be the same as the name of the Collection followed by `_timeseriesfiles`, as presented below
- All Time Series files for the elements from a Collection should be stored in a Table
- The Table name should be the same as the name of the Collection followed by `_time_series_files`, as presented below

<p style="text-align: center"> COLLECTION_vector_ATTRIBUTE</p>

Expand All @@ -145,12 +145,36 @@ CREATE TABLE HydroPlant_vector_GaugingStation(
Example:

```sql
CREATE TABLE Plant_timeseriesfiles (
CREATE TABLE Plant_time_series_files (
generation TEXT,
cost TEXT
) STRICT;
```

### Time Series
- Time Series stored in the database should be kept in a table named after the Collection, followed by `_time_series_` and the name of the attribute group, as presented below.

<p style="text-align: center"> COLLECTION_time_series_GROUP_OF_ATTRIBUTES</p>

Notice that it is quite similar to the vector attributes, but without the `vector_index` column.
Instead, a mandatory column named `date_time` should be created to store the date of the time series data.

Example:

```sql
CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
```
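
The two naming rules above can be sketched as plain string helpers. Both function names are hypothetical and exist only for illustration; they are not part of `PSRDatabaseSQLite`.

```julia
# Hypothetical helpers that only illustrate the table naming rules.
time_series_files_table(collection) = string(collection, "_time_series_files")
time_series_table(collection, group) = string(collection, "_time_series_", group)

files_table = time_series_files_table("Plant")
group_table = time_series_table("Resource", "group1")
```

With the collections used in the examples, these produce `Plant_time_series_files` and `Resource_time_series_group1`.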

!!! tip
For more information on how to handle time series data, please refer to the [Time Series](./time_series.md) section.

## Migrations

Migrations are an important part of the `DatabaseSQLite` framework. They are used to update the database schema to a new version without the need to delete the database and create a new one from scratch. Migrations are defined by two separate `.sql` files that are stored in the `migrations` directory of the model. The first file is the `up` migration and it is used to update the database schema to a new version. The second file is the `down` migration and it is used to revert the changes made by the `up` migration. Migrations are stored in directories in the model and they have a specific naming convention. The name of the migration folder should be the number of the version (e.g. `/migrations/1/`).
254 changes: 254 additions & 0 deletions docs/src/psrdatabasesqlite/time_series.md
@@ -0,0 +1,254 @@
# Time Series

It is possible to store time series data in your database. Time series in `PSRDatabaseSQLite` are very flexible. You can have missing values, and you can have sparse data.

There is a specific table format that must be followed. Consider the following example:

```sql
CREATE TABLE Resource (
id INTEGER PRIMARY KEY AUTOINCREMENT,
label TEXT UNIQUE NOT NULL
) STRICT;

CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
```

Every time series must be indexed by a `date_time` column in the format `YYYY-MM-DD HH:MM:SS`. You can use the `Dates` standard library to handle this format.

```julia
using Dates
date = DateTime(2024, 3, 1) # 2024-03-01T00:00:00 (March 1st, 2024)
```
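
As a minimal sketch of how a `DateTime` maps to the `YYYY-MM-DD HH:MM:SS` text layout, the snippet below uses only the `Dates` standard library. The constant name `DB_DATE_FORMAT` is an assumption made for this example; how `PSRDatabaseSQLite` converts dates internally is not shown in this document.

```julia
using Dates

# Assumed name; the format string matches the `YYYY-MM-DD HH:MM:SS` layout.
const DB_DATE_FORMAT = dateformat"yyyy-mm-dd HH:MM:SS"

date = DateTime(2024, 3, 1)
as_text = Dates.format(date, DB_DATE_FORMAT)    # "2024-03-01 00:00:00"
round_trip = DateTime(as_text, DB_DATE_FORMAT)  # parses back to the same DateTime
```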

Notice that in this example, there are two value columns `some_vector1` and `some_vector2`. You can have as many value columns as you want. You can also separate the time series data into different tables, by creating a table `Resource_time_series_group2` for example.

It is also possible to add more dimensions to your time series, such as `block` and `scenario`.

```sql
CREATE TABLE Resource_time_series_group2 (
id INTEGER,
date_time TEXT NOT NULL,
block INTEGER NOT NULL,
some_vector3 REAL,
some_vector4 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time, block)
) STRICT;
```
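
Because the primary key of this table is `(id, date_time, block)`, each element can have at most one row per `(date_time, block)` pair. A small check like the one below could be run on the data before inserting it; `has_unique_keys` is a hypothetical helper, not a function of the package.

```julia
using Dates

# Hypothetical pre-insert check: true when no (date_time, block) pair repeats.
has_unique_keys(date_times, blocks) = allunique(collect(zip(date_times, blocks)))

ok = has_unique_keys([DateTime(2000), DateTime(2000)], [1, 2])          # distinct blocks
duplicated = has_unique_keys([DateTime(2000), DateTime(2000)], [1, 1])  # repeated pair
```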

## Rules

When you query a time series entry whose value is missing, the query first looks for data with a `date_time` earlier than the queried one. If such data exists, it returns the most recent earlier non-missing value. Otherwise, it returns a placeholder that depends on the type of data you are querying:

- For `Float64`, it returns `NaN`.
- For `Int64`, it returns `typemin(Int)`.
- For `String`, it returns `""` (empty String).
- For `DateTime`, it returns `typemin(DateTime)`.

For example, if you have the following data:

| **Date** | **some_vector1(Float64)** | **some_vector2(Float64)** |
|:--------:|:-----------:|:-----------:|
| 2020 | 1.0 | missing |
| 2021 | missing | 1.0 |
| 2022 | 3.0 | missing |

1. If you query for `some_vector1` at `2020`, it returns `1.0`.
2. If you query for `some_vector2` at `2020`, it returns `NaN`.
3. If you query for `some_vector1` at `2021`, it returns `1.0`.
4. If you query for `some_vector2` at `2021`, it returns `1.0`.
5. If you query for `some_vector1` at `2022`, it returns `3.0`.
6. If you query for `some_vector2` at `2022`, it returns `1.0`.
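
The lookup rule illustrated by the six cases above can be sketched in plain Julia. This is only an illustration under stated assumptions: `last_known_value` is a hypothetical helper, the dates are simplified to integer years, and the package's actual implementation may differ.

```julia
# Hypothetical helper, not part of PSRDatabaseSQLite: returns the latest
# non-missing value dated at or before `query`, or NaN if there is none.
function last_known_value(dates, values, query)
    result = NaN
    for (d, v) in zip(dates, values)  # assumes `dates` is sorted ascending
        d > query && break
        ismissing(v) || (result = v)
    end
    return result
end

dates = [2020, 2021, 2022]
some_vector1 = [1.0, missing, 3.0]
some_vector2 = [missing, 1.0, missing]

a = last_known_value(dates, some_vector1, 2021)  # falls back to the 2020 value
b = last_known_value(dates, some_vector2, 2020)  # nothing earlier, so NaN
```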


## Inserting data

When creating a new element that has a time series, you can pass this information via a `DataFrame`. Consider the collection `Resource` with the two time series tables `Resource_time_series_group1` and `Resource_time_series_group2`.

```julia
using DataFrames
using Dates
using PSRClassesInterface
PSRDatabaseSQLite = PSRClassesInterface.PSRDatabaseSQLite

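# `db_path` and `path_schema` are assumed to be defined beforehand
# (the path of the database file and the path of the schema file).
db = PSRDatabaseSQLite.create_empty_db_from_schema(db_path, path_schema; force = true)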

PSRDatabaseSQLite.create_element!(db, "Configuration"; label = "Toy Case", value1 = 1.0)

df_group1 = DataFrame(;
date_time = [DateTime(2000), DateTime(2001), DateTime(2002)],
some_vector1 = [missing, 1.0, 2.0],
some_vector2 = [1.0, missing, 5.0],
)

df_group2 = DataFrame(;
date_time = [
DateTime(2000),
DateTime(2000),
DateTime(2000),
DateTime(2000),
DateTime(2001),
DateTime(2001),
DateTime(2001),
DateTime(2009),
],
# Each (date_time, block) pair must be unique to satisfy the table's primary key.
block = [1, 2, 3, 4, 1, 2, 3, 4],
some_vector3 = [1.0, 2.0, 3.0, 4.0, 1, 2, 3, 4],
some_vector4 = [1.0, 2.0, 3.0, 4.0, 1, 2, 3, 4],
)


PSRDatabaseSQLite.create_element!(
db,
"Resource";
label = "Resource 1",
group1 = df_group1,
group2 = df_group2,
)
```

It is also possible to insert a single row of a time series. This is useful when you want to insert a specific dimension entry. This way of inserting time series is less efficient than inserting a whole `DataFrame`.

```julia
using DataFrames
using Dates
using PSRClassesInterface
PSRDatabaseSQLite = PSRClassesInterface.PSRDatabaseSQLite

db = PSRDatabaseSQLite.create_empty_db_from_schema(db_path, path_schema; force = true)

PSRDatabaseSQLite.create_element!(db, "Configuration"; label = "Toy Case", value1 = 1.0)

PSRDatabaseSQLite.create_element!(
db,
"Resource";
label = "Resource 1"
)

PSRDatabaseSQLite.add_time_series_row!(
db,
"Resource",
"some_vector1",
"Resource 1",
10.0; # new value
date_time = DateTime(2000)
)

PSRDatabaseSQLite.add_time_series_row!(
db,
"Resource",
"some_vector1",
"Resource 1",
11.0; # new value
date_time = DateTime(2001)
)
```

## Reading data

You can read the information from the time series in two different ways.

### Reading as a table
First, you can read the whole time series table for a given value, as a `DataFrame`.

```julia
df = PSRDatabaseSQLite.read_time_series_table(
db,
"Resource",
"some_vector1",
"Resource 1",
)
```

### Reading a single row

It is also possible to read a single row of the time series in the form of an array. This is useful when you want to query a specific dimension entry.
This function speeds up repeated reads by caching the previous and next non-missing values.

```julia
values = PSRDatabaseSQLite.read_time_series_row(
db,
"Resource",
"some_vector1",
Float64;
date_time = DateTime(2020)
)
```

When querying a row, all values should be non-missing. However, if a value is missing, the function returns the previous non-missing value. If there is no previous non-missing value either, it returns a placeholder that depends on the type of data you are querying.


- For `Float64`, it returns `NaN`.
- For `Int64`, it returns `typemin(Int)`.
- For `String`, it returns `""` (empty String).
- For `DateTime`, it returns `typemin(DateTime)`.

For example, if you have the following data for the time series `some_vector1`:

| **Date** | **Resource 1** | **Resource 2** |
|:--------:|:-----------:|:-----------:|
| 2020 | 1.0 | missing |
| 2021 | missing | 1.0 |
| 2022 | 3.0 | missing |

1. If you query at `2020`, it returns `[1.0, NaN]`.
2. If you query at `2021`, it returns `[1.0, 1.0]`.
3. If you query at `2022`, it returns `[3.0, 1.0]`.
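
The row query above, together with the type-dependent placeholders, can be sketched with multiple dispatch. `fallback_value` and `row_at` are hypothetical names used only for this illustration, and the dates are simplified to integer years; this is not the package's implementation.

```julia
using Dates

# Hypothetical helpers mirroring the placeholder values listed above.
fallback_value(::Type{Float64}) = NaN
fallback_value(::Type{Int64}) = typemin(Int64)
fallback_value(::Type{String}) = ""
fallback_value(::Type{DateTime}) = typemin(DateTime)

# `columns[i]` holds the values of element i, aligned with `dates` (sorted ascending).
function row_at(dates, columns, query, ::Type{T}) where {T}
    row = T[fallback_value(T) for _ in columns]
    for (i, column) in enumerate(columns)
        for (d, v) in zip(dates, column)
            d > query && break
            ismissing(v) || (row[i] = v)  # keep the latest non-missing value
        end
    end
    return row
end

dates = [2020, 2021, 2022]
resource1 = [1.0, missing, 3.0]
resource2 = [missing, 1.0, missing]

row_2020 = row_at(dates, [resource1, resource2], 2020, Float64)
row_2022 = row_at(dates, [resource1, resource2], 2022, Float64)
```

Dispatching on the element type keeps the placeholder choice in one place, so the same row logic works for `Float64`, `Int64`, `String` and `DateTime` columns.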


## Updating data

When updating one of the entries of a time series for a given element and attribute, you need to specify the exact dimension values of the row you want to update.


For example, consider a time series that has `block` and `date_time` dimensions.

```julia
PSRDatabaseSQLite.update_time_series_row!(
db,
"Resource",
"some_vector3",
"Resource 1",
10.0; # new value
date_time = DateTime(2000),
block = 1
)
```

## Deleting data

You can delete the whole time series of an element for a given time series group.
Consider the following table:

```sql
CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
```

This table represents a "group" that stores two time series `some_vector1` and `some_vector2`. You can delete all the data from this group by calling the following function:

```julia
PSRDatabaseSQLite.delete_time_series!(
db,
"Resource",
"group1",
"Resource 1",
)
```

When trying to read a time series that has been deleted, the function will return an empty `DataFrame`.
3 changes: 3 additions & 0 deletions profiling/Project.toml
@@ -0,0 +1,3 @@
[deps]
PProf = "e4faabce-9ead-11e9-39d9-4379958e3056"
Profile = "9abbd945-dff8-562f-b5e8-e1ebf5ef1b79"
12 changes: 12 additions & 0 deletions profiling/create_profile.jl
@@ -0,0 +1,12 @@
# You should run the script from the profiling directory

using Profile
using PProf
import Pkg
root_path = dirname(@__DIR__)
Pkg.activate(root_path)
using PSRClassesInterface

include("../script_time_controller.jl")
@profile include("../script_time_controller.jl")
pprof()
8 changes: 8 additions & 0 deletions profiling/open_profile.jl
@@ -0,0 +1,8 @@
# You should run the script from the profiling directory

using Profile
using PProf

file_name = "profile.pb.gz"

PProf.refresh(; file = file_name, webport = 57998)
1 change: 1 addition & 0 deletions src/PSRDatabaseSQLite/PSRDatabaseSQLite.jl
Expand Up @@ -20,6 +20,7 @@ include("exceptions.jl")
include("utils.jl")
include("attribute.jl")
include("collection.jl")
include("time_controller.jl")
include("database_sqlite.jl")
include("create.jl")
include("read.jl")
12 changes: 12 additions & 0 deletions src/PSRDatabaseSQLite/attribute.jl
Expand Up @@ -99,6 +99,18 @@ mutable struct VectorRelation{T} <: VectorAttribute
end
end

mutable struct TimeSeries{T} <: VectorAttribute
id::String
type::Type{T}
default_value::Union{Missing, T}
not_null::Bool
group_id::String
parent_collection::String
table_where_is_located::String
dimension_names::Vector{String}
num_dimensions::Int
end

mutable struct TimeSeriesFile{T} <: ReferenceToFileAttribute
id::String
type::Type{T}