CAMPI

Computer-Aided Metadata for Photoarchives Initiative

Prototype application for document similarity browsing in photoarchive collections, written in summer 2020 by Matthew Lincoln (@mdlincoln) as part of a pilot project on CMU Archives' General Photograph Collection (GPC) with Julia Corrin and Emily Davis, with project management by Scott Weingart.

Project whitepaper: https://doi.org/10.1184/R1/12791807

This is a prototype implementation of a few computer-vision-aided metadata generation workflows for this specific collection. It is not yet meant to be general-purpose, reusable software for deployment in other contexts. While the general concepts and workflows could be adapted to other digital photo collections, the data models for photographs and hierarchical organization are tailored to the GPC. Our goal in this short development cycle was to prototype, test, and report on these workflows, with recommendations for later system work that could integrate concepts from this application with in-production collection management services. We are publishing the code of this prototype system only to illustrate how we went about implementing these workflows and technologies. As we discuss in the project whitepaper, we would need to make significant changes and re-implementations to create a system that would 1) work at scale and 2) interact with production systems such as ArchivesSpace or Islandora.

Overview

Docker Compose Services

Django REST Framework site
PostgreSQL database (the Django application uses the PostgreSQL-specific ArrayField to store image embeddings efficiently)
A Vue-based SPA frontend that makes calls to the API provided by Django
Nginx reverse proxy over everything

During this project, the images themselves were served from a temporary IIIF server outside this stack, running IIPImage. Image data in the test.json file shows a sample of the paths used, but the URLs are no longer live.
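
To point the application at a different IIIF server, image request URLs would need to be rebuilt. The following is a minimal sketch of how a IIIF Image API URL is assembled, following the URL pattern defined by the IIIF Image API specification. The base URL and identifier shown here are hypothetical placeholders, not the paths stored in test.json, and the function name `iiif_image_url` is our own.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier is percent-encoded, including any slashes it contains,
    # so that it occupies a single path segment.
    encoded_id = quote(identifier, safe="")
    return (
        f"{base_url.rstrip('/')}/{encoded_id}"
        f"/{region}/{size}/{rotation}/{quality}.{fmt}"
    )


# Hypothetical server and identifier, for illustration only.
url = iiif_image_url(
    "https://iiif.example.org/iiif",
    "gpc/box1/photo_0001.tif",
    size="400,",  # scale to 400 pixels wide, preserving aspect ratio
)
print(url)
```

Servers that implement only an older version of the Image API may expect `full` rather than `max` as the size for an unscaled image.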

Django Modules

As this is the most alpha of alpha software, some of these modules could well be refactored into more logically separate components.

photograph - Models describing individual photographs as well as annotations on those photographs.
collection - Models describing different organizational hierarchies for photographs, such as "jobs" defined in the GPC's original organization, and directories in which original TIFFs were stored during digitization.
cv - Models describing computer vision models and methods for calculating image features, approximate-nearest-neighbor search indices and methods for retrieving nearest neighbors, and close match detection algorithms and the match sets of photographs that they create.
tagging - Models for a domain-specific vocabulary and a tagging decision workflow.
gcv - Models and management commands for making image annotation requests to Google Cloud Vision API, storing the raw responses, and parsing raw responses into structured annotations on photographs.
campi - Helpful abstract models and DRF ViewSet mixin classes used by all the other modules.

Configuration

The docker-compose file expects a .env file specifying certain paths and credentials; the .env-template file describes these.
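
Before starting the stack, it can help to confirm that .env defines every variable listed in .env-template. The sketch below is one way to do that, assuming both files use plain `KEY=value` lines with `#` comments. The variable names in the example are hypothetical stand-ins; consult .env-template for the real ones.

```python
def parse_env_keys(text):
    """Collect the variable names from KEY=value lines, skipping comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, _ = line.partition("=")
        keys.add(key.strip())
    return keys


def missing_keys(template_text, env_text):
    """Return the names present in the template but absent from the env file."""
    return sorted(parse_env_keys(template_text) - parse_env_keys(env_text))


# Hypothetical file contents, for illustration only.
template = "# example template\nEXAMPLE_DB_PASSWORD=\nEXAMPLE_DATA_DIR=\n"
env = "EXAMPLE_DB_PASSWORD=change-me\n"

missing = missing_keys(template, env)
print(missing)
```

In practice the two strings would come from reading the files, for example with `pathlib.Path(".env-template").read_text()`.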

