This repository contains the code to compute a global map of the costs to generate methanol from renewable energy sources and CO2 from direct air capture. For each pixel on the map, we compute the optimal capacities of renewable energy sources, direct air capture, storage, the electrolyzer and methanol synthesis needed to generate a certain pre-defined amount of methanol.
The pipeline tool Snakemake is used to run the computation. Snakemake profiles are defined to run the computation on the VSC (Vienna Scientific Cluster) and on our institute's server nora. The computation will first download ERA5 weather data using Atlite and then run the optimization for each pixel on the map using syfop.
The idea for the network and the parameters are taken from the paper "Pathway to a land-neutral expansion of Brazilian renewable fuel production" (Ramirez Camargo et al., 2022).
Note that this is not ready to be used at the current point in time. Parameters need to be adapted and some issues need to be fixed. See OPEN_ISSUES.md for a list of open issues.
First install all dependencies, see INSTALL.md. Then run:

```bash
./run.sh
```
Note that this is built for nora (our institute's server) and the VSC. When running the computation on the VSC, `run.sh` is started on the login node. The computation is then distributed to the compute nodes as SLURM jobs. `run.sh` (snakemake) will continue to run on the login node until all SLURM jobs are finished. Therefore it is advisable to use screen (or tmux).
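For example, a typical session on the login node might look like this (the session name `methanol` is arbitrary):

```bash
# start a named screen session and run the pipeline inside it
screen -S methanol
./run.sh
# detach with Ctrl-a d, log out, and reattach later with:
screen -r methanol
```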
If you want to skip downloading the ERA5 data, there is a ready-made data set on nora: `/data/datasets/era5-global-wind-pv`. You may need the `--ignore-incomplete` flag (see below and OPEN_ISSUES.md).
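One way to use it is to link the data set into the repository before starting the run; note that the target path inside the repo used below is an assumption, check the Snakefile for the expected location:

```bash
# link the ready-made ERA5 data set into the repo
# (the target path data/input/era5 is an assumption, check the Snakefile)
ln -s /data/datasets/era5-global-wind-pv data/input/era5
./run.sh --ignore-incomplete
```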
Note: All parameters for `run.sh` are simply passed to Snakemake. We will list a couple of very useful calls here:
To disable parallelization of jobs for debugging purposes, run:

```bash
./run.sh --cores 1
```
A Snakemake dry run is helpful to test the Snakefile and list all jobs without running them:

```bash
./run.sh --dryrun
```
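Combining the dry run with Snakemake's `--printshellcmds` flag also prints the shell commands that would be executed:

```bash
./run.sh --dryrun --printshellcmds
```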
`syfop-global-costs` implements a test mode which downloads only very little data and does a complete run on this reduced data set. It can be enabled in the configuration or by passing a config option via the command line:

```bash
./run.sh --config testmode=True
```
If you want to run only the computation necessary for a specific file, pass it as a target:

```bash
./run.sh data/input/SOME_FILE
```
If you want to skip downloading ERA5 data and use already downloaded files, Snakemake may complain that the files are incomplete. Unfortunately `--cleanup-metadata` does not seem to have an effect (see OPEN_ISSUES.md). But you can ignore this with:

```bash
./run.sh --ignore-incomplete
```
Sometimes Snakemake wants to rerun a rule, but you would like to skip it because it takes too long and you know that your changes should not affect the output. You can force Snakemake to skip the rule:

```bash
./run.sh --touch data/output/SOME_FILE
```
It is advisable to test the above using `--dryrun`.
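For example, first check which jobs Snakemake would run, then mark the outputs as up to date:

```bash
# list the jobs snakemake would run, without executing anything
./run.sh --dryrun data/output/SOME_FILE
# mark the outputs as up to date instead of rerunning the rule
./run.sh --touch data/output/SOME_FILE
```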
Sometimes Snakemake does not rerun a rule although you want it to (for example, to measure performance multiple times). You can force Snakemake to rerun the rule:

```bash
./run.sh --force data/output/SOME_FILE
```
Note: Some rules have to be killed with SIGKILL (e.g. the downloads of ERA5 data).
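If such a rule does not react to Ctrl-C, you can send SIGKILL manually; the grep pattern below is only an illustration and depends on what is actually running:

```bash
# find the PID of the stuck process (pattern is an illustration)...
ps aux | grep snakemake
# ...and kill it with SIGKILL (replace 12345 with the actual PID)
kill -9 12345
```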
Configuration for the computation is done in these config files:

- `src/model_parameters.py`: parameters for the model as a Python module - this might be moved to params.yaml in the future to allow easier sensitivity analysis (see the sketch after this list)
- `config/config.yaml`: general configuration for the computation
- `config/params.yaml`: parameters of the computation and simulation of renewable time series
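Until the parameters live in a YAML file, sensitivity runs require editing `src/model_parameters.py` directly; once they are moved, Snakemake's `--configfile` flag could be used to swap parameter sets, as in this hypothetical sketch (the file name is made up):

```bash
# hypothetical: run with an alternative parameter file for a sensitivity run
./run.sh --configfile my_sensitivity_params.yaml
```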
Furthermore, there are two Snakemake profiles pre-defined:

- `config/nora/config.yaml`: parallel computation on our institute's server
- `config/vsc/config.yaml`: parallel computation on multiple nodes on the VSC
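`run.sh` presumably selects the matching profile automatically; if it does not on your machine, Snakemake's `--profile` flag takes the profile directory (a sketch, assuming `run.sh` does not already set the flag itself):

```bash
# explicitly run with the VSC profile
./run.sh --profile config/vsc
```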
The optimization problem can be solved using different solvers. We recommend CPLEX:

- CPLEX: the file API of linopy is quite slow, especially without an SSD.
- Gurobi: fast, but probably can't be used on the VSC with an academic license.
- HiGHS: open source, but slower than Gurobi and CPLEX.
Installation instructions can be found in the documentation of syfop.
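How the solver is selected is not spelled out here; if it is exposed as a config key, an override might look like the following hypothetical sketch (the key name `solver` is an assumption):

```bash
# hypothetical: select the solver via a config key (key name is an assumption)
./run.sh --config solver=highs
```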
A couple of helpful hints for working with the VSC. Some of these hints may apply to Linux only (and probably macOS); not sure whether similar hints apply to Windows too.
List all jobs currently queued or running that were submitted by yourself:

```bash
watch squeue -u $USER -l
```
Cancel all jobs submitted by yourself (`scancel -u $USER` does the same in one step):

```bash
squeue -u $USER --format=%i | tail -n +2 | xargs scancel
```
Go to the `syfop-global-costs` repo:

```bash
cd $DATA/syfop-global-costs/
```
You might want to try reportseff, which displays jobs even more nicely.
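A sketch of typical usage, assuming reportseff is installed (e.g. via pip):

```bash
# show CPU and memory efficiency of your recent jobs
reportseff --user $USER
```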
Placing the following config snippet in `$HOME/.ssh/config` should make SSH to the VSC way more convenient. It should also work [on Windows 10](https://superuser.com/a/1544127/110122) (not tested). You can then simply type `ssh vsc` to connect - it will set your VSC user name and use port 27, so you can use public/private keys to log in (instead of a password), and it sets up jump proxies for the nodes.

Replace `YOUR_VSC_USERNAME` with your VSC user name:
```
Host vsc
    HostName vsc4.vsc.ac.at
    # Port 27 necessary to allow pubkey... :face_screaming_in_fear:
    # https://wiki.vsc.ac.at/doku.php?id=doku:vpn_ssh_access#connecting_to_vsc-2_or_vsc-3_via_ssh-key
    Port 27
    User YOUR_VSC_USERNAME

Host n*-*
    # This allows connecting directly to a compute node of the VSC. The host
    # name needs to match the pattern above, e.g. "n4901-034". If it does, a jump
    # node will be used automatically (one cannot connect to it from outside).
    User YOUR_VSC_USERNAME
    ProxyJump vsc
```
It can be helpful to directly log into a compute node via SSH, e.g. to use `htop` to monitor RAM or CPU usage (for example to see if parallelization is working as expected). SSH to a compute node does not work directly; one needs a jump server, i.e. one needs to first log in to a login node on the VSC (= `ssh vsc` using the config from the previous section) and then log in from there to the compute node. This can be done in one step using the config from above, e.g. for the compute node n4908-006:
```bash
ssh n4908-006
```
Without the config above one can use the following command:

```bash
ssh -J YOUR_VSC_USERNAME@vsc4.vsc.ac.at YOUR_VSC_USERNAME@n4908-006
```
The host name n4908-006 is the name of the node, which can be found in the output of `squeue`.
You can use `htop` to monitor the resources, or other pre-installed tools.
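For example, to show only your own processes:

```bash
# on the compute node: show only your own processes
htop -u $USER
```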
Instead of submitting a job which runs a script on a node, you can request an interactive session on a node:

```bash
salloc -N 1
```
You can then log in as described above and do whatever you want:
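For example (the node name is whatever `squeue` reports for your allocation):

```bash
# find the name of the node allocated by salloc
squeue -u $USER
# connect to it using the SSH config from above and monitor the job
ssh n4908-006
htop
```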
More details: https://wiki.vsc.ac.at/doku.php?id=doku:slurm_interactive