Add ingest.sh script. #164

Merged: 1 commit, Nov 7, 2024
8 changes: 7 additions & 1 deletion Makefile
@@ -5,7 +5,7 @@ HELM_REPO_URL=https://devseed.com/eoapi-k8s/
HELM_CHART_NAME=eoapi/eoapi
PGO_CHART_VERSION=5.7.0

.PHONY: all deploy minikube help
.PHONY: all deploy minikube ingest help

# Default target
all: deploy
@@ -31,8 +31,14 @@ minikube:
	@echo "eoAPI is now available at:"
	@minikube service ingress-nginx-controller -n ingress-nginx --url | head -n 1

ingest:
	@echo "Ingesting STAC collections and items into the database."
	@command -v bash >/dev/null 2>&1 || { echo "bash is required but not installed"; exit 1; }
	@./ingest.sh || { echo "Ingestion failed."; exit 1; }

help:
	@echo "Makefile commands:"
	@echo " make deploy - Install eoAPI on a cluster kubectl is connected to."
	@echo " make minikube - Install eoAPI on minikube."
	@echo " make ingest - Ingest STAC collections and items into the database."
	@echo " make help - Show this help message."
1 change: 1 addition & 0 deletions README.md
@@ -56,4 +56,5 @@ Instead of using the `make` commands above you can also [manually `helm install`

* Read about [Default Configuration](./docs/configuration.md#default-configuration) and
other [Configuration Options](./docs/configuration.md#additional-options)
* [Manage your data](./docs/manage-data.md) in eoAPI
* Learn about [Autoscaling / Monitoring / Observability](./docs/autoscaling.md)
34 changes: 34 additions & 0 deletions docs/manage-data.md
@@ -0,0 +1,34 @@
# Data management

eoAPI-k8s provides a basic data ingestion process that consists of manual operations on the components of the stack.

# Load data

You need STAC records for the collections and items you wish to load (e.g., `collections.json` and `items.json`).
[This repo](https://github.com/vincentsarago/MAXAR_opendata_to_pgstac) contains a few scripts that may help you generate sample input data.

## Preshipped bash script

Run `make ingest` to load data into the eoAPI service. It expects `collections.json` and `items.json` in the current directory. To load other files, call the script directly: `./ingest.sh <collections-file> <items-file>`.

## Manual steps

In order to add raster data to eoAPI you can load STAC collections and items into the PostgreSQL database using pgSTAC and the tool `pypgstac`.

First, ensure your Kubernetes cluster is running and `kubectl` is configured to access and modify it.
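
The copy commands in the next step reference `$NAMESPACE` and `$EOAPI_POD_RASTER`, which this page does not define. A minimal sketch of one way to set them, reusing the `app=raster-eoapi` label selector from `ingest.sh`. The helper name `find_raster_pod` and the namespace `eoapi` are assumptions, so substitute the namespace you deployed to:

```bash
#!/bin/bash

# Hypothetical helper: print the name of the first pod in the given
# namespace that carries the label app=raster-eoapi (the selector
# ingest.sh uses). Prints nothing if no such pod exists.
find_raster_pod() {
  local namespace="$1"
  kubectl get pods -n "$namespace" -l app=raster-eoapi \
    -o jsonpath='{.items[0].metadata.name}' 2>/dev/null
}

# Example usage (the namespace "eoapi" is an assumption):
#   NAMESPACE=eoapi
#   EOAPI_POD_RASTER=$(find_raster_pod "$NAMESPACE")
#   [ -n "$EOAPI_POD_RASTER" ] || echo "No raster pod found in $NAMESPACE"
```

An empty result usually means the pod lives in another namespace or carries a different label.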

Next, copy the data into the pod running the raster eoAPI service:

```bash
kubectl cp collections.json "$NAMESPACE/$EOAPI_POD_RASTER":/tmp/collections.json
kubectl cp items.json "$NAMESPACE/$EOAPI_POD_RASTER":/tmp/items.json
```
Then open a shell in the pod (or on the server) running the raster eoAPI service and run the following commands to load the data:

```bash
#!/bin/bash
# Install pypgstac, wait for the database, then load collections before items
apt update -y && apt install python3 python3-pip -y && pip install 'pypgstac[psycopg]'
pypgstac pgready --dsn "$PGADMIN_URI"
pypgstac load collections /tmp/collections.json --dsn "$PGADMIN_URI" --method insert_ignore
pypgstac load items /tmp/items.json --dsn "$PGADMIN_URI" --method insert_ignore
```
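
The same load can also be driven from your workstation without an interactive shell, which is the approach `ingest.sh` takes. A minimal sketch, assuming `NAMESPACE` and `EOAPI_POD_RASTER` are set locally and `PGADMIN_URI` is defined inside the pod; `run_in_pod` is a hypothetical helper name:

```bash
#!/bin/bash

# Hypothetical helper: run one command string inside the given pod.
run_in_pod() {
  local namespace="$1" pod="$2" cmd="$3"
  kubectl exec -n "$namespace" "$pod" -- bash -c "$cmd"
}

# Example usage. The single quotes matter: they stop your local shell from
# expanding $PGADMIN_URI, so the variable is resolved inside the pod.
#   run_in_pod "$NAMESPACE" "$EOAPI_POD_RASTER" \
#     'pypgstac load collections /tmp/collections.json --dsn "$PGADMIN_URI" --method insert_ignore'
#   run_in_pod "$NAMESPACE" "$EOAPI_POD_RASTER" \
#     'pypgstac load items /tmp/items.json --dsn "$PGADMIN_URI" --method insert_ignore'
```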
79 changes: 79 additions & 0 deletions ingest.sh
@@ -0,0 +1,79 @@
#!/bin/bash

# Default files
DEFAULT_COLLECTIONS_FILE="./collections.json"
DEFAULT_ITEMS_FILE="./items.json"

# Check for provided parameters or use defaults
if [ "$#" -eq 2 ]; then
  EOAPI_COLLECTIONS_FILE="$1"
  EOAPI_ITEMS_FILE="$2"
else
  EOAPI_COLLECTIONS_FILE="$DEFAULT_COLLECTIONS_FILE"
  EOAPI_ITEMS_FILE="$DEFAULT_ITEMS_FILE"
  echo "No specific files provided. Using defaults:"
  echo " Collections file: $EOAPI_COLLECTIONS_FILE"
  echo " Items file: $EOAPI_ITEMS_FILE"
fi

# Define namespaces
NAMESPACES=("default" "eoapi" "data-access")
EOAPI_POD_RASTER=""
FOUND_NAMESPACE=""

# Discover the pod name by checking each candidate namespace in turn
for NS in "${NAMESPACES[@]}"; do
  EOAPI_POD_RASTER=$(kubectl get pods -n "$NS" -l app=raster-eoapi -o jsonpath="{.items[0].metadata.name}" 2>/dev/null)
  if [ -n "$EOAPI_POD_RASTER" ]; then
    FOUND_NAMESPACE="$NS"
    echo "Found raster-eoapi pod: $EOAPI_POD_RASTER in namespace: $FOUND_NAMESPACE"
    break
  fi
done

# Check if the pod was found
if [ -z "$EOAPI_POD_RASTER" ]; then
  echo "Could not determine raster-eoapi pod."
  exit 1
fi

# Check if input files exist
for FILE in "$EOAPI_COLLECTIONS_FILE" "$EOAPI_ITEMS_FILE"; do
  if [ ! -f "$FILE" ]; then
    echo "File not found: $FILE. Pass the files as arguments: $0 <collections-file> <items-file>"
    exit 1
  fi
done

# Install required packages
echo "Installing required packages in pod $EOAPI_POD_RASTER in namespace $FOUND_NAMESPACE..."
if ! kubectl exec -n "$FOUND_NAMESPACE" "$EOAPI_POD_RASTER" -- bash -c 'apt update -y && apt install python3 python3-pip -y && pip install pypgstac[psycopg]'; then
  echo "Failed to install packages."
  exit 1
fi

# Copy files to pod
echo "Copying files to pod..."
echo "Using collections file: $EOAPI_COLLECTIONS_FILE"
echo "Using items file: $EOAPI_ITEMS_FILE"
kubectl cp "$EOAPI_COLLECTIONS_FILE" "$FOUND_NAMESPACE/$EOAPI_POD_RASTER":/tmp/collections.json
kubectl cp "$EOAPI_ITEMS_FILE" "$FOUND_NAMESPACE/$EOAPI_POD_RASTER":/tmp/items.json

# Load collections and items
echo "Loading collections..."
if ! kubectl exec -n "$FOUND_NAMESPACE" "$EOAPI_POD_RASTER" -- bash -c 'pypgstac load collections /tmp/collections.json --dsn "$PGADMIN_URI" --method insert_ignore'; then
  echo "Failed to load collections."
  exit 1
fi

echo "Loading items..."
if ! kubectl exec -n "$FOUND_NAMESPACE" "$EOAPI_POD_RASTER" -- bash -c 'pypgstac load items /tmp/items.json --dsn "$PGADMIN_URI" --method insert_ignore'; then
  echo "Failed to load items."
  exit 1
fi

# Clean temporary files
echo "Cleaning temporary files..."
kubectl exec -n "$FOUND_NAMESPACE" "$EOAPI_POD_RASTER" -- bash -c 'rm -f /tmp/collections.json /tmp/items.json'

echo "Ingestion complete."
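
The cleanup step above only runs when every earlier step succeeds, because each failure path calls `exit 1` first. One possible variation is to register the cleanup with `trap ... EXIT` once the pod is known, so the uploaded files are removed on failure as well. This is a sketch and not part of this PR; `cleanup_pod_tmp` is a hypothetical name:

```bash
#!/bin/bash

# Hypothetical helper: delete the uploaded input files from the pod.
cleanup_pod_tmp() {
  local namespace="$1" pod="$2"
  kubectl exec -n "$namespace" "$pod" -- \
    rm -f /tmp/collections.json /tmp/items.json
}

# In ingest.sh this could be registered right after pod discovery:
#   trap 'cleanup_pod_tmp "$FOUND_NAMESPACE" "$EOAPI_POD_RASTER"' EXIT
```

With the trap in place, the explicit cleanup call at the end of the script would no longer be needed.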