mlab-ns is integrated with Travis to build and deploy to testing and production:
- commits to master are built and pushed to mlab-ns-testing by Travis
- tagged releases are built and pushed to mlab-ns production by Travis
Though we now use Travis to build and deploy, the previous manual instructions are retained below for reference.
To deploy to the standard mlab-ns testing environment (mlab-sandbox.appspot.com), follow the instructions below with no modifications. To deploy to a different testing environment, you may need to edit `server/app.yaml.mlab-sandbox` and update the "service" field so that it works alongside your test environment's other App Engine service names.
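For example, a one-line edit along these lines would do it (the service name `my-mlabns-test` is a hypothetical placeholder, and `-i` assumes GNU sed):

```bash
# Point the "service" field at a name that does not collide with the test
# project's existing App Engine services ("my-mlabns-test" is hypothetical).
sed -i 's/^service:.*/service: my-mlabns-test/' server/app.yaml.mlab-sandbox
```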
```bash
git clone --recursive https://github.com/m-lab/mlab-ns.git mlabns-testing
cd mlabns-testing
# Or, for an existing repo:
# git checkout master
# git submodule update --init
# git pull origin master
python environment_bootstrap.py testing
~/google_appengine/appcfg.py --oauth2 update server/
```
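Once the upload finishes, a quick smoke test can confirm the testing instance is answering (this assumes the standard mlab-ns lookup interface; substitute your own hostname for other test projects):

```bash
# Ask the testing deployment for an NDT server; expect a JSON response.
curl "http://mlab-sandbox.appspot.com/ndt?format=json"
```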
```bash
git clone --recursive https://github.com/m-lab/mlab-ns.git mlabns-live
cd mlabns-live
# Or, for an existing repo:
# git checkout master
# git submodule update --init
# git pull origin master
python environment_bootstrap.py live
# Verify all tests are passing
./build
# Deploy to production
~/google_appengine/appcfg.py --oauth2 update server/
```
When deploying to production, make sure to deploy from the master branch.
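A small guard like the following can catch accidental deploys from another branch (a sketch, not part of the official workflow):

```bash
# Refuse to deploy unless the working tree is on master.
branch=$(git rev-parse --abbrev-ref HEAD)
if [ "$branch" != "master" ]; then
  echo "Refusing to deploy: on '${branch}', not master." >&2
  exit 1
fi
```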
To deploy mlab-ns in a fresh GCP project, first deploy the code as described above. Once the code is deployed, mlab-ns needs seed data so that it can properly query Prometheus and build up its Datastore. To create this seed data, follow the instructions below.
Note: These instructions require billing to be enabled on your GCP project, as the data population process will exhaust a free-tier project's daily App Engine quota.
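To confirm billing from the command line, something like the following should work (`my-project` is a placeholder project ID; requires the gcloud beta billing commands):

```bash
# Prints "True" when billing is enabled on the project.
gcloud beta billing projects describe my-project --format="value(billingEnabled)"
```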
Note: These instructions require you to have files named `nagios.csv` and `prometheus.csv`, which are not under source control because they contain secret credentials, but are available to authorized users here: https://goo.gl/tfEg1v. Manually create those two files and paste in the file content found at that URL before running the `appcfg.py` commands below.
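For example (`prometheus.csv`'s path matches the upload command below; placing `nagios.csv` alongside it is an assumption):

```bash
# Create the credential files, then paste in the content from the URL above.
touch server/mlabns/conf/nagios.csv server/mlabns/conf/prometheus.csv
```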
```bash
# Replace URL with other project's URL if not populating mlab-sandbox.
GAE_URL=http://mlab-sandbox.appspot.com
TOKEN=$( gcloud auth print-access-token )
appcfg.py --url ${GAE_URL}/_ah/remote_api upload_data \
    --oauth2_access_token=${TOKEN} \
    --config_file=server/bulkloader.yaml \
    --filename=server/mlabns/conf/tools.csv \
    --kind=Tool
appcfg.py --url ${GAE_URL}/_ah/remote_api upload_data \
    --oauth2_access_token=${TOKEN} \
    --config_file=server/bulkloader.yaml \
    --filename=server/mlabns/conf/prometheus.csv \
    --kind=Prometheus
appcfg.py --url ${GAE_URL}/_ah/remote_api upload_data \
    --oauth2_access_token=${TOKEN} \
    --config_file=server/bulkloader.yaml \
    --filename=server/mlabns/conf/redirect_probability.csv \
    --kind=RedirectProbability
```
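The three uploads differ only in filename and kind, so an equivalent loop form (a convenience sketch built from the commands above) is:

```bash
# kind:filename pairs taken from the three upload commands above.
for pair in Tool:tools.csv Prometheus:prometheus.csv \
            RedirectProbability:redirect_probability.csv; do
  kind=${pair%%:*}
  file=${pair##*:}
  appcfg.py --url ${GAE_URL}/_ah/remote_api upload_data \
      --oauth2_access_token=${TOKEN} \
      --config_file=server/bulkloader.yaml \
      --filename=server/mlabns/conf/${file} \
      --kind=${kind}
done
```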
Note: If you see repeated errors including `Refreshing due to a 401 (attempt 1/2)`, this is an `appcfg` bug. To work around the issue, delete any cached appcfg tokens in your home directory (they will likely start with `~/.appcfg*`).
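In practice the workaround amounts to (the glob comes straight from the note above):

```bash
# Delete cached appcfg OAuth tokens so the next run re-authenticates.
rm -f ~/.appcfg*
```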
After the Datastore is populated with seed information, manually kick off the cron jobs to finish populating the Datastore with the latest live information from Prometheus.
Run the following jobs from GCP under Compute > App Engine > Task queues > Cron Jobs:
- `/cron/check_site`
- `/cron/check_status`
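If you prefer the command line, the handlers can sometimes be hit directly, though this is an assumption: App Engine apps often restrict `/cron/` paths to cron or admin requests, in which case use the console route above.

```bash
# Manually trigger the cron handlers (GAE_URL as set earlier; may be
# rejected if the app restricts these paths to cron/admin requests).
curl "${GAE_URL}/cron/check_site"
curl "${GAE_URL}/cron/check_status"
```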
If bootstrapping was successful, you should see a populated map at the root mlab-ns URL (e.g. mlab-sandbox.appspot.com) with M-Lab's sites properly located.
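A quick way to confirm the root page is being served (the hostname is the sandbox default; substitute your project's URL):

```bash
# Expect a 200 status code from the map page.
curl -s -o /dev/null -w "%{http_code}\n" "http://mlab-sandbox.appspot.com/"
```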
Help! After following the bootstrapping instructions, the map is not populated / my requests are being routed badly.
The likely cause is that the cron jobs did not complete successfully. Check the App Engine logs to view the result of the cron jobs.
If the jobs failed due to "quota exceeded", you need to enable billing on the project.
If the jobs failed due to timeout exceeded or memory exhaustion, it's likely that you're hitting issue #5. The workaround for now is to keep running the cron jobs manually until they complete successfully. Each run is more likely to succeed because repeats of the job have to do successively fewer datastore/memcache inserts.
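Assuming the handlers are reachable with curl (see the caveat above), a retry loop might look like:

```bash
# Re-run check_status until it succeeds; curl -f treats HTTP errors as
# failures, so the loop exits on the first 2xx response.
until curl -sf "${GAE_URL}/cron/check_status" > /dev/null; do
  echo "check_status failed; retrying..." >&2
  sleep 30
done
```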