OCS Archive Library

A base library for the OCS Science Archive and the OCS Ingester library. It supports generalized input file types, generalized data stores, and shared configuration items. The library is configurable via environment variables; for further customization, subclass the DataFile class for a specific file type, or subclass the FileStore class for a specific file storage scheme.

Prerequisites

Optional prerequisites may be skipped for reduced functionality.

Python >= 3.8

Usage

This library is used with the OCS Science Archive and the OCS Ingester library. It has built-in support for .fits files, compressed .fits files, and custom files if the metadata is supplied separately. It is critical that the Header Mapping environment variables below are set to map to the proper .fits header keys in your data product definition, so that the information can be automatically extracted and stored in the Science Archive. The FileStore environment variables also need to be set for whichever file storage system you are targeting (S3 preferred). The DataFile environment variables control how header values are extracted from the data and ingested. For more information on how to customize your OCS Science Archive, please review the data flow documentation.
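
As a rough illustration of the header mapping idea, the sketch below looks up an observation concept through an environment variable and applies the `HEADER_BLACKLIST` and `NULL_HEADER_VALUES` rules to a plain dictionary of headers. It is not the library's implementation: the helper names (`split_csv`, `clean_headers`, `lookup`) are hypothetical, and only the variable names and defaults are taken from the Environment Variables table.

```python
import os

# Defaults copied from the Header Mapping rows of the Environment Variables table.
HEADER_KEY_DEFAULTS = {
    "OBSERVATION_DATE_KEY": "DATE-OBS",
    "PROPOSAL_ID_KEY": "PROPID",
    "INSTRUMENT_ID_KEY": "INSTRUME",
    "EXPOSURE_TIME_KEY": "EXPTIME",
}


def split_csv(value):
    """Split a comma delimited string into a list of stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def clean_headers(headers, environ=os.environ):
    """Hypothetical helper: drop blacklisted keys and turn null-like values into None."""
    blacklist = set(split_csv(environ.get("HEADER_BLACKLIST", "HISTORY,COMMENT")))
    null_values = set(split_csv(environ.get("NULL_HEADER_VALUES", "N/A,UNSPECIFIED,UNKNOWN")))
    cleaned = {}
    for key, value in headers.items():
        if key in blacklist:
            continue
        cleaned[key] = None if value in null_values else value
    return cleaned


def lookup(headers, concept, environ=os.environ):
    """Hypothetical helper: find the header value for a concept such as PROPOSAL_ID_KEY."""
    key = environ.get(concept, HEADER_KEY_DEFAULTS[concept])
    return headers.get(key)
```

For example, setting `PROPOSAL_ID_KEY=MYPROP` would make `lookup(headers, "PROPOSAL_ID_KEY")` read the (hypothetical) `MYPROP` header instead of `PROPID`.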

Installation

It is highly recommended that you install and run your Python code inside a dedicated Python virtual environment.

Add the ocs_archive package to your Python environment:

(venv) $ pip install ocs_archive

Configuration

Environment Variables

Group	Variable	Description	Default
FileStore	`FILESTORE_TYPE`	Type of file storage to use. Options are `dummy`, `local`, or `s3`.	`dummy`
	`FILESYSTEM_STORAGE_ROOT_DIR`	If using `local` file storage, this is the directory on the local filesystem to use as the root of the storage directories	empty string
	`FILESYSTEM_STORAGE_BASE_URL`	If using `local` file storage, this is the base URL from which those files are served	`http://0.0.0.0/`
Observation Portal	`OBSERVATION_PORTAL_BASE_URL`	Base URL for the Observation Portal	empty string
	`OBSERVATION_PORTAL_API_TOKEN`	API Token used to authenticate with the Observation Portal	empty string
AWS	`BUCKET`	If using `s3` file storage; AWS S3 Bucket Name	`testbucket`
	`AWS_ACCESS_KEY_ID`	If using `s3` file storage; AWS Access Key with write access to the S3 bucket	empty string
	`AWS_SECRET_ACCESS_KEY`	If using `s3` file storage; AWS Secret Access Key	empty string
	`AWS_DEFAULT_REGION`	If using `s3` file storage; AWS S3 Default Region	empty string
	`S3_ADDRESSING_STYLE`	If using `s3` file storage; Sets the addressing_style for generated s3 paths. Can be one of `virtual`, `path`, or `auto`.	`virtual`
	`S3_ENDPOINT_URL`	If using `s3` file storage; Endpoint url for connecting to s3. This can be modified to connect to a local instance of s3.	`"http://s3.us-west-2.amazonaws.com"`
	`S3_DAYS_TO_IA_STORAGE`	If using `s3` file storage, this is the age in days after which data will be ingested directly to Infrequent Access (IA) storage vs normal storage.	60
DataFile	`FILETYPE_MAPPING_OVERRIDES`	A string literal representation of a python dictionary containing a mapping of file extensions to dotpaths to python Classes which subclass the DataFile class. This appends and overrides the default list in the FileFactory class.	`"{}"`
	`HEADER_BLACKLIST`	Comma delimited string list of header values that should be removed from the data before storage in the archive. This can be overridden when instantiating a DataFile as well as via environment variable.	`HISTORY,COMMENT`
	`REQUIRED_HEADERS`	Comma delimited string list of header values that must be present in the DataFile. This can be overridden when instantiating a DataFile as well as via environment variable.
	`NULL_HEADER_VALUES`	Comma delimited string list of header values that should be turned into `None` or empty keys. This only applies to the FitsFile class.	`N/A,UNSPECIFIED,UNKNOWN`
	`CALIBRATION_TYPES`	Comma delimited string list of configuration types which represent calibration images. This is used to automatically set the public date of calibration images to the observation date if it is not present	`BIAS,DARK,SKYFLAT,EXPERIMENTAL`
	`PUBLIC_PROPOSAL_TAGS`	A comma delimited string list of Observation Portal proposal tags to denote data from this proposal as public. If public, the public date will be set to the observation date. The ocs_archive will fall back to the list of `PUBLIC_PROPOSALS` if any of a proposal's tags are not found in this list.	`public`
	`PRIVATE_PROPOSAL_TAGS`	A comma delimited string list of Observation Portal proposal tags to denote data from this proposal as private. If private, the public date will be set to 999 years in the future. The ocs_archive will fall back to the list of `PRIVATE_PROPOSALS` if any of a proposal's tags are not found in this list.	`private,internal`
	`PUBLIC_PROPOSALS`	Comma delimited string list of proposal IDs which represent public proposals. This is used to set the public date of observations under those proposals to the observation date if it is not present. A proposal matches if any of these strings appears anywhere within its proposal ID	`EPO,calib,standard,pointing`
	`PRIVATE_PROPOSALS`	A comma delimited string list of proposal IDs which represent private proposals. This is used to set the public date of the observations under those proposals to be 999 years in the future. A proposal matches if any of these strings appears anywhere within its proposal ID	`LCOEngineering`
	`DAYS_UNTIL_PUBLIC`	The number of days until user data becomes public by default. This is added to the observation date to get the public date if one is not specified with the data	`365`
	`PRIVATE_FILE_TYPES`	A comma delimited string list of fragments of the file name which denote a private data file. If any of the fragments are found within the filename, the public date will be set 999 years in the future for this file	`-t00,-x00`
Header Mapping	`OBSERVATION_DATE_KEY`	The key in which to find an iso formatted observation date within the header data	`DATE-OBS`
	`OBSERVATION_DAY_KEY`	The key in which to find an iso formatted observation day within the header data	`DAY-OBS`
	`OBSERVATION_END_TIME_KEY`	The key in which to find an iso formatted observation end date within the header data	`UTSTOP`
	`REDUCTION_LEVEL_KEY`	The key in which to find a numeric reduction level within the header data. Raw is 0, while anything non-zero is some form of processing	`RLEVEL`
	`EXPOSURE_TIME_KEY`	The key in which to find the exposure time in fractional seconds in the header data	`EXPTIME`
	`INSTRUMENT_ID_KEY`	The key in which to find the instrument ID in the header data	`INSTRUME`
	`SITE_ID_KEY`	The key in which to find the site ID in the header data	`SITEID`
	`TELESCOPE_ID_KEY`	The key in which to find the telescope ID in the header data	`TELID`
	`OBSERVATION_ID_KEY`	The key in which to find the observation ID in the header data	`BLKUID`
	`CONFIGURATION_ID_KEY`	The key in which to find the configuration ID in the header data	`MOLUID`
	`PRIMARY_OPTICAL_ELEMENT_KEY`	The key in which to find the primary filter value in the header data	`FILTER`
	`TARGET_NAME_KEY`	The key in which to find the target object's name in the header data	`OBJECT`
	`REQUEST_ID_KEY`	The key in which to find the request ID in the header data	`REQNUM`
	`REQUESTGROUP_ID_KEY`	The key in which to find the request group ID in the header data	`TRACKNUM`
	`CONFIGURATION_TYPE_KEY`	The key in which to find the configuration type in the header data	`OBSTYPE`
	`PROPOSAL_ID_KEY`	The key in which to find the proposal ID in the header data	`PROPID`
	`CATALOG_TARGET_FRAME_KEY`	The key in which to find the base filename of the catalog file for the target of this observation in the header data	`L1IDCAT`
	`PUBLIC_DATE_KEY`	The key in which to find the iso formatted date in which this data should become available to the public in the header data	`L1PUBDAT`
	`RELATED_FRAME_KEYS`	A comma delimited list of keys in the header data to look for related frame base filenames for this observation	`L1IDBIAS,L1IDDARK,L1IDFLAT,L1IDSHUT,L1IDMASK,L1IDFRNG,L1IDCAT,L1IDARC,L1ID1D,L1ID2D,L1IDSUM,TARFILE,ORIGNAME,ARCFILE,FLATFILE,GUIDETAR`
	`THUMBNAIL_FRAME_BASENAME_KEY`	The key used to store the original frame (e.g. FITS image) basename for the associated thumbnail	`frame_basename`
	`THUMBNAIL_SIZE_KEY`	The key used to store the string description of the thumbnail size (e.g. small, medium, large)	`size`
	`RADIUS_KEY`	The key in which to find FOV radius for a circular FOV, used to calculate WCS polygon if specified. Unit of arcseconds	`RADIUS`
	`RA_KEY`	The key in which to find FOV center RA for a circular FOV, used to calculate WCS polygon if specified. Unit of hour angle	`RA`
	`DEC_KEY`	The key in which to find FOV center DEC for a circular FOV, used to calculate WCS polygon if specified. Unit of decimal degrees	`DEC`
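
To make the public date rules above more concrete, here is a minimal sketch of how `PRIVATE_FILE_TYPES`, `PRIVATE_PROPOSALS`, `PUBLIC_PROPOSALS`, and `DAYS_UNTIL_PUBLIC` could combine. It makes several assumptions that the table does not state: the order of precedence between the rules, measuring "999 years in the future" from the observation date, and approximating a year as 365 days. Proposal tags and calibration types are left out for brevity. The function name `compute_public_date` is hypothetical and is not part of the ocs_archive API.

```python
import os
from datetime import datetime, timedelta

# Assumption: "999 years in the future" approximated as 999 * 365 days.
FAR_FUTURE = timedelta(days=365 * 999)


def split_csv(value):
    """Split a comma delimited string into a list of stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def compute_public_date(observation_date, proposal_id, filename, environ=os.environ):
    """Hypothetical sketch of the public date rules; precedence order is an assumption."""
    private_file_types = split_csv(environ.get("PRIVATE_FILE_TYPES", "-t00,-x00"))
    private_proposals = split_csv(environ.get("PRIVATE_PROPOSALS", "LCOEngineering"))
    public_proposals = split_csv(environ.get("PUBLIC_PROPOSALS", "EPO,calib,standard,pointing"))
    days_until_public = int(environ.get("DAYS_UNTIL_PUBLIC", "365"))

    # Filename fragments that mark a private data file.
    if any(fragment in filename for fragment in private_file_types):
        return observation_date + FAR_FUTURE
    # Substring match anywhere within the proposal ID.
    if any(fragment in proposal_id for fragment in private_proposals):
        return observation_date + FAR_FUTURE
    if any(fragment in proposal_id for fragment in public_proposals):
        return observation_date
    # Default: user data becomes public after DAYS_UNTIL_PUBLIC days.
    return observation_date + timedelta(days=days_until_public)
```

With the default settings, a file under a hypothetical proposal ID such as `EPO2020A-001` would be public as of its observation date, while a file whose name contains `-x00` would stay private regardless of its proposal.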

Input File Format Configuration

The library is designed to be configured mostly through environment variables, but custom DataFile subclasses can be included and specified via the FILETYPE_MAPPING_OVERRIDES environment variable in order to support new and more complicated data formats. All data files must contain the minimum set of metadata in order to be ingested into the archive. This metadata is used to provide filtering and querying support within the archive. The pieces of file metadata that should be specified have their mappings defined in the Header Mapping group of the environment variables above. The FitsFile class provided will work for normal or funpacked fits files, provided you set up the Header Mapping environment variables with the correct mapping of observation concepts to header keys in your data format.
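
The `FILETYPE_MAPPING_OVERRIDES` variable is described as a string literal of a Python dictionary mapping file extensions to class dotpaths. One way such a value could be parsed and resolved is sketched below using `ast.literal_eval` and `importlib`. This is an assumption about the general technique, not a copy of what the FileFactory class does, and the helper names `load_filetype_overrides` and `resolve_dotpath` are hypothetical.

```python
import ast
import importlib
import os


def load_filetype_overrides(environ=os.environ):
    """Parse FILETYPE_MAPPING_OVERRIDES into a dict of extension -> class dotpath."""
    raw = environ.get("FILETYPE_MAPPING_OVERRIDES", "{}")
    # literal_eval only accepts Python literals, so arbitrary code is not executed.
    mapping = ast.literal_eval(raw)
    if not isinstance(mapping, dict):
        raise ValueError("FILETYPE_MAPPING_OVERRIDES must be a dictionary literal")
    return mapping


def resolve_dotpath(dotpath):
    """Import the module part of 'package.module.ClassName' and return the class."""
    module_path, _, class_name = dotpath.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

A value for a custom format might then look like `"{'.txt': 'my_package.my_module.MyTextFile'}"`, where the extension, package, and class name are placeholders for your own DataFile subclass.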

File Storage Format Configuration

The library supports three types of file storage by default, selectable via the FILESTORE_TYPE environment variable. The dummy type is only for testing and development and doesn't actually store any files. The local type saves files into a locally mounted directory. It requires you to run a separate file server on that directory so that the generated download links resolve. This can be as simple as running python -m http.server --directory=/my/root/dir, or you can use any other file server, such as Node's http-server. The s3 type expects to connect to Amazon S3 or something with the same interface, such as MinIO. S3 file storage requires the BUCKET, AWS_*, and S3_* environment variables to be set. More storage types can be added by forking the library and subclassing the FileStore class. S3 storage is the preferred method for a production, public-facing archive, since it is the only option that supports versioning and secure data access via presigned data URLs.
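
Because each storage type depends on a different group of variables, a small preflight check before starting an ingest can catch missing settings early. The sketch below is a hypothetical helper, not something provided by ocs_archive, and the choice of which variables to treat as required for each type is an assumption based on the descriptions above.

```python
import os

# Assumption: variables treated as required per storage type, based on this README.
REQUIRED_SETTINGS = {
    "dummy": [],
    "local": ["FILESYSTEM_STORAGE_ROOT_DIR", "FILESYSTEM_STORAGE_BASE_URL"],
    "s3": [
        "BUCKET",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_DEFAULT_REGION",
        "S3_ENDPOINT_URL",
    ],
}


def missing_filestore_settings(environ=os.environ):
    """Return the names of unset or empty variables for the selected FILESTORE_TYPE."""
    filestore_type = environ.get("FILESTORE_TYPE", "dummy")
    if filestore_type not in REQUIRED_SETTINGS:
        raise ValueError(f"Unknown FILESTORE_TYPE: {filestore_type!r}")
    return [name for name in REQUIRED_SETTINGS[filestore_type] if not environ.get(name)]
```

Calling `missing_filestore_settings()` with no arguments checks the current process environment and returns an empty list when nothing is missing.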

Development

Poetry

We use Poetry for package management. If you already have Poetry installed, you can skip this section.

You can install Poetry using one of the many options listed at https://python-poetry.org/docs/#installation. One simple option is using Pipx:

python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install poetry

Install

Install the project and its Python dependencies:

poetry install

This will install the project in a Poetry-managed virtual environment. To run commands in that environment, either use poetry run ... or start a shell in that environment with poetry shell.

Test

poetry run pytest
