This repository helps provide a way to group similar tsdat
ingest scripts in the same
repository so that they can be more easily maintained and deployed on input files.
runner.py
: Top-level CLI to run the appropriate ingest on one or more files. Runpython runner.py --help
to see the full list of commands it offers.ingest/*
: collection of python modules, each of which is a self-describing and self-contained ingest. Every ingest module exports the necessary information for therunner
or higher-level processes to instantiate and run the ingest.tests/*
: tests performed on all ingests. Note that individual ingests also define their own tests, so this folder is primarily used for high-level sanity checks.utils/*
: utility methods and classes used throughout the project. This folder will be updated as needed to allow ingests to leverage common project-specific tools..devcontainer/*
,.vscode/*
,*docker*
: Configurations to simplify and development environment setup..github/*
: Workflows and templates to ensure code is well-tested and issues are tracked appropriately.
After you have added at least one ingest pipeline in the ingest/
folder, you can run
the runner.py
script as a CLI to ingest data matching one of your registered ingests.
You can run python runner.py --help
to see a full list of runtime options, shown
below:
$ python runner.py --help
Usage: runner.py [OPTIONS] FILES...
--------------------------------------------------------------------------
Main entry point to run a registered ingestion pipeline on provided data
file(s). This script takes path(s) to input file(s), automatically
determines which ingest to use to process the data, and runs that ingest
on the provided data.
Args:
files (List[Path], optional): The path(s) to the input file(s) to
process.
--------------------------------------------------------------------------
Arguments:
FILES... Path(s) to the file(s) to process [required]
Options:
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to copy it or
customize the installation.
--help Show this message and exit.
Developers should follow the following steps to create a new ingest pipeline:
- Clone or fork this repository to your development area
- Set up your development environment according to the instructions below.
- Run
> pytest
to ensure that tests pass (validate your setup) - Run
> cookiecutter templates/ingest -o ingest/
to generate your own ingest. - Follow the steps outlined in the generated ingest README to modify the generated ingest code.
- Test your changes, then push back up to your remote repository.
This repository supports adding as many ingests as you want. Just follow steps 3-6 for each new ingest you want to add.
This section outlines how to set up the recommended development environment for this project.
-
Download and install VS Code. Make sure to add
code
to your path if prompted.We chose VS Code because of its clean user interface, quick startup time, extremely powerful capabilities out-of-box, and its rich library of open source extensions.
-
Clone your fork of this repository to your laptop and open it up in VS Code
-
The first time you open this project in VS Code you will be prompted to install the recommended extensions. Please do so now.
-
Windows users: We recommend using Docker to manage dependencies for this project. If you choose to use Docker follow the steps below:
- Press
F1
(orctrl-shift-p
) to bring up the command pane in VS Code - In the command pane, type:
Remote-Containers: Open Folder in Container...
and hitreturn
- You will be prompted to specify which folder should be opened. Select the folder
containing this
README
file - Several dialog boxes may appear while the VS Code window is refreshing. Please install the recommended extensions via the dialog box. An additional dialog box should appear asking you to reload the window so Pylance can take effect. Please do this as well.
- After the window refreshes your development environment will be set up correctly. You may skip steps 5. and 6.
You can find more information about VS Code and docker containers here.
- Press
-
We highly recommend using conda to manage dependencies in your development environment. Please install this using the link above if you haven't already done so. Then run the following commands to create your environment:
$ conda create --name ingest python=3.8 $ conda activate ingest (ingest) $ pip install -r requirements-dev.txt
-
Tell VS Code to use your new
conda
environment:- Press
F1
(orctrl-shift-p
) to bring up the command pane in VS Code - In the command pane, type:
Python: Select Interpreter
and hitreturn
- Select the newly-created
ingest
conda environment from the list. Note that you may need to refresh the list (cycle icon in the top right) for it to show up. - Reload the VS Code window to ensure that this setting propagates correctly.
This is probably not needed, but doesn't hurt. To do this, press
F1
to open the control pane again and typeDeveloper: Reload Window
.
- Press
- Learn more about
tsdat
:- GitHub: https://github.com/tsdat/tsdat
- Documentation: https://tsdat.readthedocs.io
- Data standards: https://github.com/tsdat/data_standards
- Learn more about
xarray
:- GitHub: https://github.com/pydata/xarray
- Documentation: https://xarray.pydata.org
- Other useful tools:
- VS Code: https://code.visualstudio.com/docs
- Docker: https://docs.docker.com/get-started/
pytest
: https://github.com/pytest-dev/pytestblack
: https://github.com/psf/blackmatplotlib
guide: https://realpython.com/python-matplotlib-guide/