This document contains instructions on how to install and deploy the arXivDigest service. Information on the sample recommender system can be found in a separate document.
- Prerequisites:
  - Clone this repository to a location of your choice (referred to as `REPO_PATH` below).
  - Execute all SQL scripts under `db/` in sequential order, starting with `db/database_v1.0.sql`, then v1.1, v2.0, etc.
- Run `pip install .` while inside `REPO_PATH` to install the `arxivdigest` Python package and its dependencies.
  - If installing for development purposes, use `pip install -e .` instead, to allow editing of the installed package.
  - If running the service under an Apache web server, you may need to grant the respective user (e.g., `www-data` on Ubuntu) access to the installed package.
- Make sure to put `config.json` in one of the directories below and update the settings specific to your system:
  - `~/arxivdigest/config.json`
  - `/etc/arxivdigest/config.json`
  - `%cwd%/config.json`
- Run the `init_topic_list.py` script in the `scripts/` folder to populate the database with an initial list of general topics that users can select from. The full setup sequence is sketched in the example below.
  - Under `REPO_PATH`, execute: `python scripts/init_topic_list.py`
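For concreteness, here is a sketch of the whole setup sequence as shell commands. The repository URL, MySQL credentials, and SQL file names are placeholders; adjust them to your environment.

```bash
# A sketch, not a verbatim recipe.
git clone <repository-url> REPO_PATH
cd REPO_PATH

# Apply the SQL scripts in sequential order (the glob sorts v1.0 before v1.1;
# check db/ for the actual file names, and pass a database name if the
# scripts do not select one themselves).
for script in db/database_v*.sql; do
    mysql -u root -p < "$script"
done

# Install the arxivdigest package and its dependencies
# (use `pip install -e .` instead for development).
pip install .

# Place the configuration in one of the locations listed above.
mkdir -p ~/arxivdigest
cp config.json ~/arxivdigest/config.json

# Populate the database with the initial topic list.
python scripts/init_topic_list.py
```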
To update an existing installation (sketched as shell commands below):
- Pull changes from this repository.
- Execute any new SQL scripts in `db/`.
- Run `pip install .` while inside `REPO_PATH` to update the package and its dependencies.
  - If needed, check permissions for the installed package.
- Update your local `config.json` file with any new configuration options introduced in the repository's `config.json`.
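As shell commands (the SQL file name is a placeholder for whatever new scripts have appeared in `db/`):

```bash
cd REPO_PATH
git pull

# Apply any SQL scripts added since the last update (placeholder name).
mysql -u root -p < db/database_vX.Y.sql

# Reinstall the package to pick up code and dependency changes.
pip install .
```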
If you have Docker installed and do not want to set up MySQL locally, a Dockerfile is provided for the database. You can build and run this image with Docker Compose by running `docker-compose up`.
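For example, to run the database in the background and check that it came up (generic Docker Compose usage, nothing repository-specific assumed):

```bash
# Build the image if necessary and start the container detached.
docker-compose up -d

# Follow the logs to confirm that MySQL is accepting connections.
docker-compose logs -f
```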
While developing, the frontend and API should be started by running `app.py` in their respective folders. Make sure that port 80 is free for the frontend and port 5000 is free for the API (or change the frontend and API `dev_ports` in `config.json`).
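For instance, from `REPO_PATH` in two separate terminals (the module paths assume the package layout `arxivdigest/frontend` and `arxivdigest/api`; adjust if the folders differ, and note that binding to port 80 may require elevated privileges):

```bash
# Terminal 1: the frontend, on port 80 by default.
python arxivdigest/frontend/app.py

# Terminal 2: the API, on port 5000 by default.
python arxivdigest/api/app.py
```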
Instructions on how to deploy a Flask application can be found here.
Below is an example WSGI file for the frontend (for the API, just replace "frontend" with "api" everywhere):

```python
#!/opt/anaconda3/bin/python
import sys
import logging

# Log to stderr so that messages end up in the web server's error log.
logging.basicConfig(stream=sys.stderr)

# Make the site-packages directory where arxivdigest is installed
# importable by the WSGI process; adjust the path to your environment.
sys.path.insert(0, "/opt/anaconda3/lib/python3.6/site-packages/")

# WSGI servers such as mod_wsgi look for a callable named `application`.
from arxivdigest.frontend.app import app as application
```
Remember to configure the settings in `config.json`, especially the `secret_keys`. For more details, see Frontend configuration and API configuration.
For best performance, static files should be served by the web server directly. To achieve this, `data_path` must be set in the config file. Then, the web server needs to be configured to reroute calls to `/static` to the folder named `static` that gets generated inside this location after the first launch. If not rerouted, these files will be served through Flask.
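With Apache, this rerouting can be done with an `Alias` directive; a sketch, assuming `data_path` is set to `/var/arxivdigest` (the path is a placeholder):

```apache
# Serve the generated static folder directly, bypassing Flask.
Alias /static /var/arxivdigest/static
<Directory /var/arxivdigest/static>
    Require all granted
</Directory>
```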
There are a number of recurrent processes that should be automated to run at specific times. This can be achieved by running these scripts as cron jobs (an example crontab follows the list). The scripts should be run in the following order:
- Article scraper: should be run when arXiv releases new articles. The arXiv release schedule can be found here. Note that articles are not released every day, so this script will not always insert new articles.
- Interleaver: should be run after the Article scraper. Make sure that there is enough time between the two runs for the recommender systems to generate recommendations.
- Send digest mail: should be run after the Interleaver; the gap in between can be varied based on when the digest mails should go out.
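A sketch of a possible crontab, where the times and script file names are placeholders (check `scripts/` for the actual names, and align the times with the arXiv release schedule):

```
# Times and script names below are placeholders.
# 1. Scrape newly released articles shortly after the arXiv release.
30 2 * * 1-5  python /REPO_PATH/scripts/scrape_articles.py
# 2. Interleave recommendations, leaving time for the recommenders to run.
30 8 * * 1-5  python /REPO_PATH/scripts/interleave_articles.py
# 3. Send out the digest mails.
0 9 * * 1-5   python /REPO_PATH/scripts/send_digest_mail.py
```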