Add geonames pod #3479

Draft · wants to merge 20 commits into main
87 changes: 87 additions & 0 deletions .github/workflows/geonames-image.yaml
@@ -0,0 +1,87 @@
name: geonames-image
on:
pull_request:
push:
branches:
- main
workflow_dispatch:
inputs:
build_arm:
type: boolean
description: "Build for ARM as well"
default: false
required: false
workflow_call:
inputs:
build_arm:
type: boolean
description: "Build for ARM as well"
default: false
required: false
env:
DOCKER_IMAGE_NAME: ghcr.io/loculus-project/geonames
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
BUILD_ARM: ${{ github.event.inputs.build_arm || inputs.build_arm || github.ref == 'refs/heads/main' }}
sha: ${{ github.event.pull_request.head.sha || github.sha }}
concurrency:
group: ci-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}-geonames-${{github.event.inputs.build_arm}}
cancel-in-progress: true
jobs:
geonames-image:
name: Build geonames Docker Image # Don't change: Referenced by .github/workflows/update-argocd-metadata.yml
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
contents: read
packages: write
checks: read
steps:
- name: Shorten sha
run: echo "sha=${sha::7}" >> $GITHUB_ENV
- uses: actions/checkout@v4
- name: Generate files hash
id: files-hash
run: |
DIR_HASH=$(echo -n ${{ hashFiles('geonames/**', '.github/workflows/geonames-image.yaml') }})
echo "DIR_HASH=$DIR_HASH${{ env.BUILD_ARM == 'true' && '-arm' || '' }}" >> $GITHUB_ENV
- name: Setup Docker metadata
id: dockerMetadata
uses: docker/metadata-action@v5
with:
images: ${{ env.DOCKER_IMAGE_NAME }}
tags: |
type=raw,value=${{ env.DIR_HASH }}
type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
type=raw,value=${{ env.BRANCH_NAME }}
type=raw,value=commit-${{ env.sha }}
type=raw,value=${{ env.BRANCH_NAME }}-arm,enable=${{ env.BUILD_ARM }}
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Check if image exists
id: check-image
run: |
EXISTS=$(docker manifest inspect ${{ env.DOCKER_IMAGE_NAME }}:${{ env.DIR_HASH }} > /dev/null 2>&1 && echo "true" || echo "false")
echo "CACHE_HIT=$EXISTS" >> $GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build and push image if input files changed
if: env.CACHE_HIT == 'false'
uses: docker/build-push-action@v6
with:
context: ./geonames
push: true
tags: ${{ steps.dockerMetadata.outputs.tags }}
cache-from: type=gha,scope=geonames-${{ github.ref }}
cache-to: type=gha,mode=max,scope=geonames-${{ github.ref }}
platforms: ${{ env.BUILD_ARM == 'true' && 'linux/amd64,linux/arm64' || 'linux/amd64' }}
- name: Retag and push existing image if cache hit
if: env.CACHE_HIT == 'true'
run: |
TAGS=(${{ steps.dockerMetadata.outputs.tags }})
for TAG in "${TAGS[@]}"; do
docker buildx imagetools create --tag $TAG ${{ env.DOCKER_IMAGE_NAME }}:${{ env.DIR_HASH }}
done
3 changes: 3 additions & 0 deletions geonames/.gitignore
@@ -0,0 +1,3 @@
results/
uploads/
*.db
23 changes: 23 additions & 0 deletions geonames/Dockerfile
@@ -0,0 +1,23 @@
FROM mambaorg/micromamba:1.5.8

COPY --chown=$MAMBA_USER:$MAMBA_USER environment.yaml /tmp/env.yaml

RUN micromamba config set extract_threads 1 \
&& micromamba install -y -n base -f /tmp/env.yaml \
&& micromamba clean --all --yes

# Activate the environment for subsequent build steps (micromamba-docker convention)
ARG MAMBA_DOCKERFILE_ACTIVATE=1

COPY --chown=$MAMBA_USER:$MAMBA_USER . /package

RUN mkdir -p /package/uploads /package/results

WORKDIR /package
ENV PATH="/opt/conda/bin:$PATH"

EXPOSE 5000

ENTRYPOINT ["/bin/bash", "-c"]
CMD ["sh"]
31 changes: 31 additions & 0 deletions geonames/README.md
@@ -0,0 +1,31 @@
## Geonames API

This is a simple Flask API with a local SQLite database and a Swagger UI (via flask-restx). The API can be run locally using

```
python api.py
```

Initially the database will be empty. Geonames offers a free download of all administrative regions at https://download.geonames.org/export/dump/; see Local Development below for details.

### Local Development

[DBeaver](https://dbeaver.io/) is a great interface for SQLite: enter the path to `geonames_database.db` to view the local database.
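
If you prefer the command line, the `sqlite3` CLI can inspect the same file. A minimal sanity check (assuming `sqlite3` is installed and data has already been uploaded, see below):

```sh
# Count the rows in the table populated by api.py
sqlite3 geonames_database.db "SELECT count(*) FROM administrative_regions;"
```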

With the API running (`python api.py`), run the following commands to download all administrative regions from Geonames, filter them, and upload them to the SQLite database (the `tsv-filter` and `tsv-select` commands come from tsv-utils).

```sh
mkdir -p results
wget https://download.geonames.org/export/dump/allCountries.zip -O results/allCountries.zip
unzip results/allCountries.zip -d results
tsv-filter --str-eq 7:A results/allCountries.txt > results/adm.tsv
tsv-select -f 1-3,5-6,8-13 results/adm.tsv > results/adm_dropped.tsv
curl -X POST -F "file=@results/adm_dropped.tsv" http://127.0.0.1:5000/upload/upload-tsv
```


To test the Docker image locally, build and run it with:

```sh
docker build -t geonames .
docker run -p 5000:5000 geonames
```
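
Once the server is running and data has been uploaded, the search endpoints can be queried. A minimal sketch, assuming the queried country name exists in the `insdc_country_code_mapping` of `config/default.yaml` ("Switzerland" below is only an illustrative value):

```sh
# List admin1 region names for a country known to the INSDC mapping
curl "http://127.0.0.1:5000/search/get-admin1?query=Switzerland"
# List admin2 region names for the same country
curl "http://127.0.0.1:5000/search/get-admin2?query=Switzerland"
```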
160 changes: 160 additions & 0 deletions geonames/api.py
@@ -0,0 +1,160 @@
import csv
import os
import sqlite3

import yaml
from flask import Flask, request
from flask_restx import Api, Resource

app = Flask(__name__)
api = Api(app, title="Geoname API", description="A simple API to manage geoname data")
DB_PATH = "geonames_database.db"
SCHEMA_PATH = "schema.sql"
UPLOAD_FOLDER = "uploads"

# Ensure the upload folder exists
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
app.config["UPLOAD_FOLDER"] = UPLOAD_FOLDER

search = api.namespace("search", description="Geoname search operations")


# Initialize SQLite database from schema file
def init_db():
with sqlite3.connect(DB_PATH) as conn:
with open(SCHEMA_PATH, "r") as schema_file:
conn.executescript(schema_file.read())
file_path = os.path.normpath(os.path.join(app.config["UPLOAD_FOLDER"], "input.tsv"))
insert_tsv_to_db(file_path)
print("Database initialized successfully!")

@search.route("/get-admin1")
class SearchAdmin1(Resource):
@api.doc(params={"query": "INSDC country name"})
def get(self):
"""Get list of all admin1 region names for a given INSDC country"""
query = request.args.get("query", "")
if not query:
return {"error": "Query parameter is required"}, 400

country_code = app.config["insdc_country_code_mapping"].get(query, None)

if not country_code:
return {"error": "Invalid country code"}, 400

try:
with sqlite3.connect(DB_PATH) as conn:
cursor = conn.cursor()
cursor.execute(
"""
SELECT asciiname
FROM administrative_regions
WHERE feature_code = 'ADM1' AND country_code = ?""",
(country_code,),
)
results = cursor.fetchall()
return [row[0] for row in results]
except Exception as e:
return {"error": str(e)}, 500


@search.route("/get-admin2")
class SearchAdmin2(Resource):
@api.doc(params={"query": "INSDC country name"})
def get(self):
"""Get list of all admin2 region names for a given INSDC country"""
query = request.args.get("query", "")
if not query:
return {"error": "Query parameter is required"}, 400

country_code = app.config["insdc_country_code_mapping"].get(query, None)

if not country_code:
return {"error": "Invalid country code"}, 400

try:
with sqlite3.connect(DB_PATH) as conn:
cursor = conn.cursor()
cursor.execute(
"""
SELECT asciiname
FROM administrative_regions
WHERE feature_code = 'ADM2' AND country_code = ?""",
(country_code,),
)
results = cursor.fetchall()
return [row[0] for row in results]
except Exception as e:
return {"error": str(e)}, 500


def insert_tsv_to_db(tsv_file_path):
try:
with sqlite3.connect(DB_PATH) as conn:
cursor = conn.cursor()

# Open the TSV file for reading
with open(tsv_file_path, "r") as file:
tsv_reader = csv.reader(file, delimiter="\t")

# Begin a transaction for bulk inserts
cursor.execute("BEGIN TRANSACTION;")

# Loop through each row in the TSV and insert into the database
for row in tsv_reader:
# Adjust the SQL INSERT statement according to your table structure
cursor.execute(
"""
INSERT INTO administrative_regions
(geonameid, name, asciiname, latitude, longitude, feature_code, country_code, cc2, admin1_code, admin2_code, admin3_code)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
row,
) # Pass the row as a tuple of values

# Commit the transaction
cursor.execute("COMMIT;")
print("Data inserted successfully!")

except Exception as e:
print(f"An error occurred: {e}")
return False
return True


upload = api.namespace("upload", description="Geoname upload operations")


# Define the endpoint to handle file uploads
@upload.route("/upload-tsv", methods=["POST"])
class UploadTSV(Resource):
@api.doc(params={"file": "tsv file to upload"})
def post(self):
if "file" not in request.files:
return {"error": "No file part"}, 400

file = request.files["file"]

if not file.filename:
return {"error": "No selected file"}, 400

if file and file.filename.endswith(".tsv"):
# Save the file to the uploads directory
file_path = os.path.normpath(os.path.join(app.config["UPLOAD_FOLDER"], file.filename))
if not file_path.startswith(app.config["UPLOAD_FOLDER"]):
return {"error": "Invalid file path."}, 400
file.save(file_path)

# Insert data from the TSV file into the database
if insert_tsv_to_db(file_path):
return {"message": "File successfully uploaded and data inserted."}, 200
return {"error": "Failed to insert data into the database."}, 500
else:
return {"error": "Invalid file format. Please upload a .tsv file."}, 400


if __name__ == "__main__":
init_db()
with open("config/default.yaml", encoding="utf-8") as config_file:
    config = yaml.safe_load(config_file)
app.config["insdc_country_code_mapping"] = config.get("insdc_country_code_mapping", {})
debug_mode = os.getenv("FLASK_DEBUG", "False").lower() in ("true", "1", "t")
app.run(debug=debug_mode, host="0.0.0.0", port=5000)