Biomedical text mining and question answering.
- api
  Interface definitions. This package will likely be removed.
- core
  Common classes and utilities.
- docs
  Documentation.
- elasticsearch (not used)
  A placeholder project for eventual indexing and searching with ElasticSearch.
- error
  Error logging service. Other services can send their log messages to it so that errors are consolidated in a common location.
- indexer
  Standalone program for creating the Solr index of PubMed and PubMed Central.
- nlp
  Standalone service that uses Stanford CoreNLP to perform sentence splitting, tokenization, lemmatization, and part-of-speech tagging.
- preprocess
  Processes PMC documents to:
  - Extract just the text content to a separate file.
  - Create LIF versions with `sentence`, `token`, `lemma`, and `pos` annotations.
  - Create text versions with stop words, punctuation, numbers, and symbols removed, ready to be processed with `word2vec` or `doc2vec`.
- query
  Query processors. Accepts natural language from the user and converts it into a search engine query.
- rabbitmq
  RabbitMQ messaging services.
- ranking
  Document ranking algorithms.
- retrieval
  Standalone service for retrieving PubMed or PubMed Central documents.
- scraper-pubmedmedline
  Python script used to download and extract PubMed documents from the NIH FTP server.
- solr
  Solr configuration files.
- test (to be removed)
  Experimental programming. This module has nothing to do with actual testing.
- upload
  Upload service for loading JSON into Galaxy.
- web
  Spring Boot application that provides a web user interface and REST API.
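
The query module is described above as turning natural language into a search engine query. The project's own query processors are not shown in this README, so the following is only a minimal sketch of the general idea, written in Python for brevity. The stop-word list, the `body` field name, and the function name `to_solr_query` are all hypothetical and chosen for illustration.

```python
import string

# Hypothetical list of words to drop from a question. The real query
# processors may use a different list or a different strategy entirely.
QUESTION_STOP_WORDS = {
    "what", "which", "how", "does", "do", "is", "are",
    "the", "a", "an", "of", "to", "in",
}


def to_solr_query(question: str, field: str = "body") -> str:
    """Build a simple conjunctive Solr query from a natural language question.

    `field` is an assumed Solr field name, not one taken from the project.
    """
    terms = []
    for raw in question.split():
        # Remove punctuation attached to either end of the word.
        term = raw.strip(string.punctuation)
        if term and term.lower() not in QUESTION_STOP_WORDS:
            terms.append(term)
    if not terms:
        # Nothing usable was left, so fall back to matching all documents.
        return "*:*"
    # Require every remaining term to appear in the chosen field.
    return f"{field}:({' AND '.join(terms)})"


print(to_solr_query("What proteins bind to BRCA1?"))
```

This sketch does not escape Solr special characters that occur inside a term, which a real query processor would need to handle.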
Running `mvn install` in the top-level project directory builds all of the Java/Groovy modules. Note that not all modules are Maven projects.
The web project includes a Makefile that can be used to generate the Docker image and push it to `docker.lappsgrid.org`.
$> make clean
$> make
$> make docker
$> make push
Because the web project is a Spring Boot application, it can be started by simply running the jar file:
$> java -Xmx8G -jar eager.jar
Note: JMX capabilities will be added in the near future, which means the startup procedure will change considerably. Check for the presence of a `startup.sh` script in the root directory of the project.
See the README.md file in each project for instructions on running that module.
The following modules are intended to be run as standalone services:
- error Error logging service used to collect error messages in a single location.
- nlp Stanford CoreNLP processing service.
- retrieval Document retrieval service.
- upload Galaxy upload service.
All of the above services use RabbitMQ as a message broker. The nlp project has an example Groovy script for submitting documents to the Stanford NLP service for processing.
The following modules contain standalone programs that are intended to be run from the command line.
- indexer Creates the Solr index(es).
- [preprocess](preprocess/README.md)
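
As an illustration of the last preprocess step listed earlier (text with stop words, punctuation, numbers, and symbols removed for `word2vec` or `doc2vec`), here is a minimal Python sketch. It is not the module's actual implementation: the stop-word list and the function name `clean_text` are hypothetical, and it assumes that tokens mixing letters and digits (such as gene names) should be kept while purely numeric tokens are dropped.

```python
import string

# Hypothetical stop-word list for illustration only; the list used by the
# preprocess module is not documented in this README.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}


def clean_text(text: str) -> str:
    """Return lowercased text with stop words, punctuation, numbers, and symbols removed."""
    kept = []
    for raw in text.lower().split():
        # Strip punctuation and symbols attached to either end of the token.
        token = raw.strip(string.punctuation)
        # Drop empty tokens and tokens with no letters (numbers, bare symbols).
        if not any(ch.isalpha() for ch in token):
            continue
        if token in STOP_WORDS:
            continue
        kept.append(token)
    # One space-separated line of tokens, a common input shape for word2vec tools.
    return " ".join(kept)


print(clean_text("The BRCA1 gene is associated with 80% of cases (p < 0.05)."))
```

Whether alphanumeric tokens such as `brca1` should survive cleaning is a design choice. Keeping them preserves gene and protein names, which matter for biomedical text.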