diff --git a/docs/src/how-to-use.md b/docs/src/how-to-use.md index fdb0bc5f..d1011684 100644 --- a/docs/src/how-to-use.md +++ b/docs/src/how-to-use.md @@ -171,10 +171,29 @@ It hides the complexity behind the energy problem, making the usage more friendl The `EnergyProblem` can also be constructed using the minimal constructor below. -- `EnergyProblem(graph, representative_periods, timeframe)`: Constructs a new `EnergyProblem` object with the given graph, representative periods, and timeframe. The `constraints_partitions` field is computed from the `representative_periods`, and the other fields are initialized with default values. +- `EnergyProblem(table_tree)`: Constructs a new `EnergyProblem` object with the given [`table_tree`](@ref TableTree) object. The `graph`, `representative_periods`, and `timeframe` are computed using `create_internal_structures`. The `constraints_partitions` field is computed from the `representative_periods`, and the other fields are initialized with default values. See the [basic example tutorial](@ref basic-example) to see how these can be used. +### TableTree + +To store and move data, we use [DataFrames](https://dataframes.juliadata.org) and a tree-like structure that links these tables. +Each field in this structure is a NamedTuple. Below, you will find its fields: + +- `static`: Stores the data that does not vary inside a year. Its fields are + - `assets`: Assets data. + - `flows`: Flows data. +- `profiles`: Stores the profile data indexed by: + - `assets`: Dictionary with the reference to assets' profiles indexed by periods (`"rep-periods"` or `"timeframe"`). + - `flows`: Reference to flows' profiles for representative periods. + - `data`: Actual profile data. Dictionary of dictionaries indexed by periods and then by the profile name. +- `partitions`: Stores the partitions data indexed by: + - `assets`: Dictionary with the specification of the assets' partitions indexed by periods.
+ - `flows`: Specification of the flows' partitions for representative periods. +- `periods`: Stores the periods data, indexed by: + - `rep_periods`: Representative periods. + - `mapping`: Mapping of periods to representative periods. + ### Graph The energy problem is defined using a graph. @@ -185,7 +204,7 @@ Using MetaGraphsNext we can define a graph with metadata, i.e., associate data w Furthermore, we can define the labels of each asset as keys to access the elements of the graph. The assets in the graph are of type [GraphAssetData](@ref), and the flows are of type [GraphFlowData](@ref). -The graph can be created using the [`create_graph_and_representative_periods_from_csv_folder`](@ref) function, or it can be accessed from an [EnergyProblem](@ref). +The graph can be created using the [`create_internal_structures`](@ref) function, or it can be accessed from an [EnergyProblem](@ref). See how to use the graph in the [graph tutorial](@ref graph-tutorial). diff --git a/docs/src/tutorials.md b/docs/src/tutorials.md index ae0f2b16..fa89dece 100644 --- a/docs/src/tutorials.md +++ b/docs/src/tutorials.md @@ -83,7 +83,7 @@ energy_problem.objective_value, energy_problem.termination_status ### Manually creating all structures without EnergyProblem For additional control, it might be desirable to use the internal structures of `EnergyProblem` directly. -This can be error-prone, but it is slightly more efficient. +This can be error-prone, so use it with care. The full description for these structures can be found in [Structures](@ref).
```@example manual @@ -91,7 +91,13 @@ using TulipaEnergyModel input_dir = "../../test/inputs/Tiny" # hide # input_dir should be the path to Tiny -graph, representative_periods, timeframe = create_graph_and_representative_periods_from_csv_folder(input_dir) +table_tree = create_input_dataframes_from_csv_folder(input_dir) +``` + +The `table_tree` contains all tables in the folder, which are then processed into the internal structures below: + +```@example manual +graph, representative_periods, timeframe = create_internal_structures(table_tree) ``` We also need a time partition for the constraints to create the model. diff --git a/src/io.jl b/src/io.jl index 9a7385cb..248f6595 100644 --- a/src/io.jl +++ b/src/io.jl @@ -1,5 +1,6 @@ export create_energy_problem_from_csv_folder, - create_graph_and_representative_periods_from_csv_folder, + create_input_dataframes_from_csv_folder, + create_internal_structures, save_solution_to_file, compute_assets_partitions!, compute_flows_partitions! @@ -14,15 +15,14 @@ the `EnergyProblem` structure. Set `strict = true` to error if assets are missing from partition data. """ function create_energy_problem_from_csv_folder(input_folder::AbstractString; strict = false) - graph, representative_periods, timeframe = - create_graph_and_representative_periods_from_csv_folder(input_folder; strict = strict) - return EnergyProblem(graph, representative_periods, timeframe) + table_tree = create_input_dataframes_from_csv_folder(input_folder; strict = strict) + return EnergyProblem(table_tree) end """ - graph, representative_periods, timeframe = create_graph_and_representative_periods_from_csv_folder(input_folder; strict = false) + table_tree = create_input_dataframes_from_csv_folder(input_folder; strict = false) -Returns the `graph` structure that holds all data, and the `representative_periods` array. +Returns the `table_tree::TableTree` structure that holds all data. Set `strict = true` to error if assets are missing from partition data. 
The following files are expected to exist in the input folder: @@ -39,48 +39,31 @@ The following files are expected to exist in the input folder: - `profiles-rep-periods-.csv`: Following the schema `schemas.rep_periods.profiles_data`. - `rep-periods-data.csv`: Following the schema `schemas.rep_periods.data`. - `rep-periods-mapping.csv`: Following the schema `schemas.rep_periods.mapping`. - -The returned structures are: - - - `graph`: a MetaGraph with the following information: - - + `labels(graph)`: All assets. - + `edge_labels(graph)`: All flows, in pair format `(u, v)`, where `u` and `v` are assets. - + `graph[a]`: A [`TulipaEnergyModel.GraphAssetData`](@ref) structure for asset `a`. - + `graph[u, v]`: A [`TulipaEnergyModel.GraphFlowData`](@ref) structure for flow `(u, v)`. - - - `representative_periods`: An array of - [`TulipaEnergyModel.RepresentativePeriod`](@ref) ordered by their IDs. - - - `timeframe`: Information of - [`TulipaEnergyModel.Timeframe`](@ref). """ -function create_graph_and_representative_periods_from_csv_folder( - input_folder::AbstractString; - strict = false, -) +function create_input_dataframes_from_csv_folder(input_folder::AbstractString; strict = false) df_assets_data = read_csv_with_implicit_schema(input_folder, "assets-data.csv") df_flows_data = read_csv_with_implicit_schema(input_folder, "flows-data.csv") - df_rep_period = read_csv_with_implicit_schema(input_folder, "rep-periods-data.csv") + df_rep_periods = read_csv_with_implicit_schema(input_folder, "rep-periods-data.csv") df_rp_mapping = read_csv_with_implicit_schema(input_folder, "rep-periods-mapping.csv") - df_assets_profiles = Dict( - profile_type => - read_csv_with_implicit_schema(input_folder, "assets-$profile_type-profiles.csv") for - profile_type in ["timeframe", "rep-periods"] + period_types = ["rep-periods", "timeframe"] + + dfs_assets_profiles = Dict( + period_type => + read_csv_with_implicit_schema(input_folder, "assets-$period_type-profiles.csv") for + period_type in 
period_types ) df_flows_profiles = read_csv_with_implicit_schema(input_folder, "flows-rep-periods-profiles.csv") - df_assets_partitions = Dict( - "timeframe" => - read_csv_with_implicit_schema(input_folder, "assets-timeframe-partitions.csv"), - "rep-periods" => - read_csv_with_implicit_schema(input_folder, "assets-rep-periods-partitions.csv"), + dfs_assets_partitions = Dict( + period_type => + read_csv_with_implicit_schema(input_folder, "assets-$period_type-partitions.csv") + for period_type in period_types ) df_flows_partitions = read_csv_with_implicit_schema(input_folder, "flows-rep-periods-partitions.csv") - df_profiles = Dict( + dfs_profiles = Dict( period_type => Dict( begin regex = "profiles-$(period_type)-(.*).csv" @@ -90,13 +73,13 @@ function create_graph_and_representative_periods_from_csv_folder( key => value end for filename in readdir(input_folder) if startswith("profiles-$period_type-")(filename) - ) for period_type in ["rep-periods", "timeframe"] + ) for period_type in period_types ) # Error if partition data is missing assets (if strict) if strict missing_assets = - setdiff(df_assets_data[!, :name], df_assets_partitions["rep-periods"][!, :asset]) + setdiff(df_assets_data[!, :name], dfs_assets_partitions["rep-periods"][!, :asset]) if length(missing_assets) > 0 msg = "Error: Partition data missing for these assets: \n" for a in missing_assets @@ -108,24 +91,53 @@ function create_graph_and_representative_periods_from_csv_folder( end end - # Sets and subsets that depend on input data + table_tree = TableTree( + (assets = df_assets_data, flows = df_flows_data), + (assets = dfs_assets_profiles, flows = df_flows_profiles, data = dfs_profiles), + (assets = dfs_assets_partitions, flows = df_flows_partitions), + (rep_periods = df_rep_periods, mapping = df_rp_mapping), + ) + + return table_tree +end +""" + graph, representative_periods, timeframe = create_internal_structures(table_tree) + +Return the `graph`, `representative_periods`, and `timeframe` structures 
given the input dataframes structure. + +The details of these structures are: + + - `graph`: a MetaGraph with the following information: + + + `labels(graph)`: All assets. + + `edge_labels(graph)`: All flows, in pair format `(u, v)`, where `u` and `v` are assets. + + `graph[a]`: A [`TulipaEnergyModel.GraphAssetData`](@ref) structure for asset `a`. + + `graph[u, v]`: A [`TulipaEnergyModel.GraphFlowData`](@ref) structure for flow `(u, v)`. + + - `representative_periods`: An array of + [`TulipaEnergyModel.RepresentativePeriod`](@ref) ordered by their IDs. + + - `timeframe`: Information of + [`TulipaEnergyModel.Timeframe`](@ref). +""" +function create_internal_structures(table_tree::TableTree) # TODO: Depending on the outcome of issue #294, this can be done more efficiently with DataFrames, e.g., - # combine(groupby(df_rp_mapping, :rep_period), :weight => sum => :weight) + # combine(groupby(table_tree.periods.mapping, :rep_period), :weight => sum => :weight) # Create a dictionary of weights and populate it.
weights = Dict{Int,Dict{Int,Float64}}() - for sub_df in DataFrames.groupby(df_rp_mapping, :rep_period) + for sub_df in DataFrames.groupby(table_tree.periods.mapping, :rep_period) rp = first(sub_df.rep_period) weights[rp] = Dict(Pair.(sub_df.period, sub_df.weight)) end representative_periods = [ RepresentativePeriod(weights[row.id], row.num_timesteps, row.resolution) for - row in eachrow(df_rep_period) + row in eachrow(table_tree.periods.rep_periods) ] - timeframe = Timeframe(maximum(df_rp_mapping.period), df_rp_mapping) + timeframe = Timeframe(maximum(table_tree.periods.mapping.period), table_tree.periods.mapping) asset_data = [ row.name => GraphAssetData( @@ -147,7 +159,7 @@ function create_graph_and_representative_periods_from_csv_folder( row.initial_storage_capacity, row.initial_storage_level, row.energy_to_power_ratio, - ) for row in eachrow(df_assets_data) + ) for row in eachrow(table_tree.static.assets) ] flow_data = [ @@ -164,11 +176,11 @@ function create_graph_and_representative_periods_from_csv_folder( row.initial_export_capacity, row.initial_import_capacity, row.efficiency, - ) for row in eachrow(df_flows_data) + ) for row in eachrow(table_tree.static.flows) ] num_assets = length(asset_data) - name_to_id = Dict(name => i for (i, name) in enumerate(df_assets_data.name)) + name_to_id = Dict(name => i for (i, name) in enumerate(table_tree.static.assets.name)) _graph = Graphs.DiGraph(num_assets) for flow in flow_data @@ -181,7 +193,7 @@ function create_graph_and_representative_periods_from_csv_folder( for a in MetaGraphsNext.labels(graph) compute_assets_partitions!( graph[a].rep_periods_partitions, - df_assets_partitions["rep-periods"], + table_tree.partitions.assets["rep-periods"], a, representative_periods, ) @@ -190,7 +202,7 @@ function create_graph_and_representative_periods_from_csv_folder( for (u, v) in MetaGraphsNext.edge_labels(graph) compute_flows_partitions!( graph[u, v].rep_periods_partitions, - df_flows_partitions, + table_tree.partitions.flows, 
u, v, representative_periods, @@ -198,11 +210,11 @@ function create_graph_and_representative_periods_from_csv_folder( end # For timeframe, only the assets where is_seasonal is true are selected - for row in eachrow(df_assets_data) + for row in eachrow(table_tree.static.assets) if row.is_seasonal - # Search for this row in the df_assets_partitions and error if it is not found + # Search for this row in the table_tree.partitions.assets and error if it is not found found = false - for partition_row in eachrow(df_assets_partitions["timeframe"]) + for partition_row in eachrow(table_tree.partitions.assets["timeframe"]) if row.name == partition_row.asset graph[row.name].timeframe_partitions = _parse_rp_partition( Val(partition_row.specification), @@ -220,11 +232,11 @@ function create_graph_and_representative_periods_from_csv_folder( end end - for asset_profile_row in eachrow(df_assets_profiles["rep-periods"]) # row = asset, profile_type, profile_name + for asset_profile_row in eachrow(table_tree.profiles.assets["rep-periods"]) # row = asset, profile_type, profile_name gp = DataFrames.groupby( # 3. group by RP filter( row -> row.profile_name == asset_profile_row.profile_name, # 2. Filter profile_name - df_profiles["rep-periods"][asset_profile_row.profile_type], # 1. Get the profile of given type + table_tree.profiles.data["rep-periods"][asset_profile_row.profile_type], # 1. 
Get the profile of given type ), :rep_period, ) @@ -236,11 +248,11 @@ function create_graph_and_representative_periods_from_csv_folder( end end - for flow_profile_row in eachrow(df_flows_profiles) + for flow_profile_row in eachrow(table_tree.profiles.flows) gp = DataFrames.groupby( filter( row -> row.profile_name == flow_profile_row.profile_name, - df_profiles["rep-periods"][flow_profile_row.profile_type], + table_tree.profiles.data["rep-periods"][flow_profile_row.profile_type], ), :rep_period, ) @@ -252,10 +264,10 @@ function create_graph_and_representative_periods_from_csv_folder( end end - for asset_profile_row in eachrow(df_assets_profiles["timeframe"]) # row = asset, profile_type, profile_name + for asset_profile_row in eachrow(table_tree.profiles.assets["timeframe"]) # row = asset, profile_type, profile_name df = filter( row -> row.profile_name == asset_profile_row.profile_name, # 2. Filter profile_name - df_profiles["timeframe"][asset_profile_row.profile_type], # 1. Get the profile of given type + table_tree.profiles.data["timeframe"][asset_profile_row.profile_type], # 1. Get the profile of given type ) graph[asset_profile_row.asset].timeframe_profiles[asset_profile_row.profile_type] = df.value end diff --git a/src/structures.jl b/src/structures.jl index e92e4474..eb924d68 100644 --- a/src/structures.jl +++ b/src/structures.jl @@ -4,6 +4,42 @@ export GraphAssetData, const TimestepsBlock = UnitRange{Int} const PeriodsBlock = UnitRange{Int} +const PeriodType = String +const TableNodeStatic = @NamedTuple{assets::DataFrame, flows::DataFrame} +const TableNodeProfiles = @NamedTuple{ + assets::Dict{PeriodType,DataFrame}, + flows::DataFrame, + data::Dict{PeriodType,Dict{Symbol,DataFrame}}, +} +const TableNodePartitions = @NamedTuple{assets::Dict{PeriodType,DataFrame}, flows::DataFrame} +const TableNodePeriods = @NamedTuple{rep_periods::DataFrame, mapping::DataFrame} + +""" +Structure to hold the tabular data. 
+ +## Fields + +- `static`: Stores the data that does not vary inside a year. Its fields are + - `assets`: Assets data. + - `flows`: Flows data. +- `profiles`: Stores the profile data indexed by: + - `assets`: Dictionary with the reference to assets' profiles indexed by periods (`"rep-periods"` or `"timeframe"`). + - `flows`: Reference to flows' profiles for representative periods. + - `data`: Actual profile data. Dictionary of dictionaries indexed by periods and then by the profile name. +- `partitions`: Stores the partitions data indexed by: + - `assets`: Dictionary with the specification of the assets' partitions indexed by periods. + - `flows`: Specification of the flows' partitions for representative periods. +- `periods`: Stores the periods data, indexed by: + - `rep_periods`: Representative periods. + - `mapping`: Mapping of periods to representative periods. +""" +struct TableTree + static::TableNodeStatic + profiles::TableNodeProfiles + partitions::TableNodePartitions + periods::TableNodePeriods +end + """ Structure to hold the data of the timeframe. """ @@ -197,6 +233,7 @@ It hides the complexity behind the energy problem, making the usage more friendl See the [basic example tutorial](@ref basic-example) to see how these can be used. """ mutable struct EnergyProblem + table_tree::TableTree graph::MetaGraph{ Int, SimpleDiGraph{Int}, @@ -221,15 +258,17 @@ mutable struct EnergyProblem time_solve_model::Float64 """ - EnergyProblem(graph, representative_periods, timeframe) + EnergyProblem(dfs_input) - Constructs a new EnergyProblem object with the given graph, representative periods, and timeframe. The `constraints_partitions` field is computed from the `representative_periods`, - and the other fields and nothing or set to default values. + Constructs a new EnergyProblem object from the input dataframes. + This will call [`create_internal_structures`](@ref).
""" - function EnergyProblem(graph, representative_periods, timeframe) + function EnergyProblem(dfs_input) + graph, representative_periods, timeframe = create_internal_structures(dfs_input) constraints_partitions = compute_constraints_partitions(graph, representative_periods) return new( + dfs_input, graph, representative_periods, constraints_partitions, diff --git a/test/runtests.jl b/test/runtests.jl index 4dba48be..cf82e36c 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -29,5 +29,5 @@ end end @testset "Ensuring EU data can be read" begin - create_graph_and_representative_periods_from_csv_folder(joinpath(@__DIR__, "../benchmark/EU/")) + create_input_dataframes_from_csv_folder(joinpath(@__DIR__, "../benchmark/EU/")) end diff --git a/test/test-io.jl b/test/test-io.jl index 72af112d..10aac015 100644 --- a/test/test-io.jl +++ b/test/test-io.jl @@ -42,7 +42,8 @@ end @testset "Graph structure" begin @testset "Graph structure is correct" begin dir = joinpath(INPUT_FOLDER, "Tiny") - graph, _, _ = create_graph_and_representative_periods_from_csv_folder(dir) + table_tree = create_input_dataframes_from_csv_folder(dir) + graph, _, _ = create_internal_structures(table_tree) @test Graphs.nv(graph) == 6 @test Graphs.ne(graph) == 5 @@ -141,6 +142,7 @@ end end missing_asset = Symbol(split(lines[end], ",")[1]) # The asset the was not included - graph, rps, tf = create_graph_and_representative_periods_from_csv_folder(dir) + table_tree = create_input_dataframes_from_csv_folder(dir) + graph, rps, tf = create_internal_structures(table_tree) @test graph[missing_asset].timeframe_partitions == [i:i for i in 1:tf.num_periods] end
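
The diff above reads tables through nested field access such as `table_tree.periods.mapping` and groups the mapping weights by representative period. A minimal sketch of that access pattern, assuming plain NamedTuple rows as a stand-in for DataFrames so that it runs without the package; `toy_tree` and its contents are hypothetical and not part of TulipaEnergyModel:

```julia
# Stand-in for the `mapping` table: plain NamedTuple rows instead of a DataFrame.
# The column names (period, rep_period, weight) follow the ones used in the diff.
mapping = [
    (period = 1, rep_period = 1, weight = 1.0),
    (period = 2, rep_period = 2, weight = 1.0),
    (period = 3, rep_period = 1, weight = 0.5),
    (period = 3, rep_period = 2, weight = 0.5),
]

# Hypothetical miniature of the tree: only the `periods` node is filled in.
toy_tree = (periods = (rep_periods = [(id = 1,), (id = 2,)], mapping = mapping),)

# Group weights by representative period, then by period,
# mirroring the `weights` dictionary built in `create_internal_structures`.
weights = Dict{Int,Dict{Int,Float64}}()
for row in toy_tree.periods.mapping
    inner = get!(weights, row.rep_period, Dict{Int,Float64}())
    inner[row.period] = row.weight
end

# The number of timeframe periods is the largest period index in the mapping.
num_periods = maximum(row.period for row in toy_tree.periods.mapping)
```

With the real package, the same fields would be reached on the object returned by `create_input_dataframes_from_csv_folder`, where each leaf is a `DataFrame` rather than a vector of rows.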