From bc68802d8a93f9cbe3520b73b621e5dd53bac849 Mon Sep 17 00:00:00 2001
From: Syphax bouazzouni
Date: Tue, 16 Jan 2024 19:08:38 +0100
Subject: [PATCH] Sync: bring OntoPortal up-to-date with BioPortal releases 5.26.0 and onward (#2)

* add a script to eradicate (delete data+ files) submissions of an ontology
* Auto stash before merge of "development" and "master"
* omit logs link file
* update the eradicator to support the eradication of non-archived submissions if wanted
* fix the delete submission files to not leave behind empty directories
* do not remove the submission directory because it's already done by submission.delete
* Update Gemfile.lock
* Reset branch specifier to develop
* extract do_ontology_pull function
* some simple code refactor in the ontology_pull
* simple code refactor of test_ontology_pull
* add a script to do an ontology pull on an ontology on demand
* set the name of the new script in $0
* extract new_file_exists? method from do_ontology_pull
* save the submission in the RemoteFileException
* some automatic code refactor/lint
* use the new do_ontology_pull in the old do_remote_ontology_pull
* fixed an API call mentioned by @syphax-bouazzouni in ncbo/bioportal-project#254
* fixed an API call mentioned by @syphax-bouazzouni in ncbo/bioportal-project#254
* Gemfile.lock update
* bump up version of actions/checkout from v2->v3
* Gemfile.lock update
* Merge branch 'develop'
* remove forgotten variables
* GH Actions unit test workflow refactor - add ruby versioning via docker-compose.yml file - bump up ruby v2.6 -> v2.7 - add AllegroGraph backend - add code coverage
* Remove extra space
* fix for #61 - create contact instance if it doesn't exist - changed --from-api to --from-apikey - minor linting
* Restore branch specifier to develop
* Optimization - remove repeated query
* Gemfile.lock update
* Gemfile.lock update
* Gemfile.lock update
* Gemfile had references to develop branch
* implemented #64 - ability to generate labels independently of RDF processing (and vice versa)
* Gemfile.lock update
* fixed a bug in #64
* Relocate docker-compose file and update default configs
* Add GH workflow for publishing docker images
* use ruby native method for listing files instead of a git function. Resolves warning messages when we exclude .git directory from docker image
* remove comment
* capitalize argument in order to be consistent with other scripts
* add arm/64 platform
* additional error handling for SPAM deletion script, #60
* additional error handling for SPAM deletion script, #60
* implemented #67 - improved corrupt data and error handling
* Gemfile.lock update
* exclude test/data/dictionary.txt from git commits
* update version of solr-ut
* Gemfile.lock update
* Restore branch specifier to master
* fixed configuration for the analytics module
* Gemfile.lock update
* implemented #69 - scheduled annotator dictionary file generation should be a configurable option instead of the default
* Gemfile.lock update
* gem update
* create new rake tasks for updating purls for all ontologies, moved from ontologies_api/fix_purls.rb
* initial implementation of #70 - Google Analytics v4 Update Compatibility Issue
* added the /data folder to ignore
* update gems
* Gemfile.lock update
* Gemfile.lock update
* Gemfile.lock update
* use patched version of agraph v7.3.1
* unpin faraday gem
* A change to reference Analytics Redis from LinkedData block
* Gemfile.lock update
* Gemfile.lock update
* Gemfile.lock update
* Gemfile.lock update
* use assert_operator instead of assert minitest style guide
adherence. encountered an intermittent unit test failure so assert_operator will provide better failure feedback than assert * fixed ncbo_ontology_archive_old_submissions error output * Gemfile.lock update * Gemfile.lock update * Gemfile update * Gemfile update * fixes to the analytics script and a new script to generate UA analytics for documentation * Gemfile.lock update * Gemfile.lock update * implemented the first pass at bmir-radx/radx-project#37 * implemented the first pass at bmir-radx/radx-project#37 * set bundler version to be comptatible with ruby 2.7 + AG v8 * Gemfile.lock update * Gemfile.lock update --------- Co-authored-by: Jennifer Vendetti Co-authored-by: mdorf Co-authored-by: Alex Skrenchuk --- .dockerignore | 9 +- .github/workflows/docker-image.yml | 42 +++ .github/workflows/ruby-unit-tests.yml | 18 +- .gitignore | 5 + Dockerfile | 28 +- Gemfile | 14 +- Gemfile.lock | 177 +++++++----- bin/generate_ua_analytics_file.rb | 126 ++++++++ bin/ncbo_cron | 71 +---- bin/ncbo_ontology_annotate_generate_cache | 2 +- bin/ncbo_ontology_archive_old_submissions | 112 +++++++- bin/ncbo_ontology_import | 54 ++-- bin/ncbo_ontology_process | 11 +- bin/ncbo_ontology_pull | 42 +++ bin/ncbo_ontology_submissions_eradicate | 107 +++++++ config/config.rb.sample | 91 ++++-- config/config.test.rb | 76 +++-- dip.yml | 54 ++++ docker-compose.yml | 139 +++++++++ lib/ncbo_cron.rb | 1 + lib/ncbo_cron/config.rb | 20 +- lib/ncbo_cron/ontologies_report.rb | 2 +- lib/ncbo_cron/ontology_analytics.rb | 269 ++++++++++++------ lib/ncbo_cron/ontology_helper.rb | 185 ++++++++++++ lib/ncbo_cron/ontology_pull.rb | 139 +-------- lib/ncbo_cron/ontology_rank.rb | 7 +- .../ontology_submission_eradicator.rb | 39 +++ lib/ncbo_cron/ontology_submission_parser.rb | 61 ++-- lib/ncbo_cron/spam_deletion.rb | 12 +- ncbo_cron.gemspec | 4 +- rakelib/purl_management.rake | 28 ++ test/docker-compose.yml | 38 --- test/run-unit-tests.sh | 10 +- test/test_case.rb | 26 +- test/test_ontology_pull.rb | 39 ++- test/test_scheduler.rb | 2 +- 36 files changed, 1500 insertions(+), 560 deletions(-) create mode 100644 .github/workflows/docker-image.yml create mode 100755 bin/generate_ua_analytics_file.rb create mode 100755 bin/ncbo_ontology_pull create mode 100755 bin/ncbo_ontology_submissions_eradicate create mode 100644 dip.yml create mode 100644 docker-compose.yml create mode 100644 lib/ncbo_cron/ontology_helper.rb create mode 100644 lib/ncbo_cron/ontology_submission_eradicator.rb create mode 100644 rakelib/purl_management.rake delete mode 100644 test/docker-compose.yml diff --git a/.dockerignore b/.dockerignore index c712142f..96c8053c 100644 --- a/.dockerignore +++ b/.dockerignore @@ -1,5 +1,6 @@ # Git -#.git +.git +.github .gitignore # Logs log/* @@ -8,3 +9,9 @@ tmp/* # Editor temp files *.swp *.swo +coverage +create_permissions.log +# Ignore generated test data +test/data/dictionary.txt +test/data/ontology_files/repo/**/* +test/data/tmp/* diff --git a/.github/workflows/docker-image.yml b/.github/workflows/docker-image.yml new file mode 100644 index 00000000..6105c1d8 --- /dev/null +++ b/.github/workflows/docker-image.yml @@ -0,0 +1,42 @@ +name: Docker Image CI + +on: + release: + types: [published] + +jobs: + push_to_registry: + name: Push Docker image to Docker Hub + runs-on: ubuntu-latest + steps: + - name: Check out the repo + uses: actions/checkout@v3 + + - name: Set up QEMU + uses: docker/setup-qemu-action@v2 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v2 + + - name: Log in to Docker Hub + uses: 
docker/login-action@v2 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_TOKEN }} + + - name: Extract metadata (tags, labels) for Docker + id: meta + uses: docker/metadata-action@v4 + with: + images: bioportal/ncbo_cron + + - name: Build and push Docker image + uses: docker/build-push-action@v4 + with: + context: . + platforms: linux/amd64,linux/arm64 + build-args: | + RUBY_VERSION=2.7 + push: true + tags: ${{ steps.meta.outputs.tags }} + labels: ${{ steps.meta.outputs.labels }} diff --git a/.github/workflows/ruby-unit-tests.yml b/.github/workflows/ruby-unit-tests.yml index 192774d1..b61ce745 100644 --- a/.github/workflows/ruby-unit-tests.yml +++ b/.github/workflows/ruby-unit-tests.yml @@ -6,15 +6,25 @@ on: jobs: test: + strategy: + fail-fast: false + matrix: + backend: ['ncbo_cron', 'ncbo_cron-agraph'] # ruby runs tests with 4store backend and ruby-agraph runs with AllegroGraph backend runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2 + - uses: actions/checkout@v3 - name: copy config.rb file from template run: cp config/config.test.rb config/config.rb - name: Build docker-compose - working-directory: ./test run: docker-compose build - name: Run unit tests - working-directory: ./test - run: docker-compose run unit-test wait-for-it solr-ut:8983 -- rake test TESTOPTS='-v' + run: | + ci_env=`bash <(curl -s https://codecov.io/env)` + docker-compose run $ci_env -e CI --rm ${{ matrix.backend }} bundle exec rake test TESTOPTS='-v' + - name: Upload coverage reports to Codecov + uses: codecov/codecov-action@v3 + with: + flags: unittests + verbose: true + fail_ci_if_error: false # optional (default = false) diff --git a/.gitignore b/.gitignore index a7b2058f..ccf97ea0 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,8 @@ config/config.rb config/config_*.rb config/*.p12 +config/*.json +data/ projectFilesBackup/ .ruby-version repo* @@ -11,6 +13,9 @@ repo* .DS_Store tmp +# Code coverage reports +coverage* + # Ignore eclipse .project .project .pmd diff --git a/Dockerfile b/Dockerfile index 1c463704..73e1379c 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,13 +1,29 @@ -FROM ruby:2.6 +ARG RUBY_VERSION +ARG DISTRO_NAME=bullseye -RUN apt-get update -yqq && apt-get install -yqq --no-install-recommends openjdk-11-jre-headless raptor2-utils wait-for-it +FROM ruby:$RUBY_VERSION-$DISTRO_NAME + +RUN apt-get update -yqq && apt-get install -yqq --no-install-recommends \ + openjdk-11-jre-headless \ + raptor2-utils \ + && rm -rf /var/lib/apt/lists/* -# The Gemfile Caching Trick -# we install gems before copying the code in its own layer so that gems would not have to get -# installed every single time code is updated RUN mkdir -p /srv/ontoportal/ncbo_cron +RUN mkdir -p /srv/ontoportal/bundle COPY Gemfile* *.gemspec /srv/ontoportal/ncbo_cron/ + WORKDIR /srv/ontoportal/ncbo_cron -RUN gem install bundler -v "$(grep -A 1 "BUNDLED WITH" Gemfile.lock | tail -n 1)" + +# set rubygem and bundler to the last version supported by ruby 2.7 +# remove version after ruby v3 upgrade +RUN gem update --system '3.4.22' +RUN gem install bundler -v '2.4.22' +RUN gem update --system +RUN gem install bundler +ENV BUNDLE_PATH=/srv/ontoportal/bundle RUN bundle install + COPY . 
/srv/ontoportal/ncbo_cron +RUN cp /srv/ontoportal/ncbo_cron/config/config.rb.sample /srv/ontoportal/ncbo_cron/config/config.rb + +CMD ["/bin/bash"] diff --git a/Gemfile b/Gemfile index 8d9bd46c..bcf5f137 100644 --- a/Gemfile +++ b/Gemfile @@ -2,13 +2,17 @@ source 'https://rubygems.org' gemspec -gem 'faraday', '~> 1.9' gem 'ffi' + +# This is needed temporarily to pull the Google Universal Analytics (UA) +# data and store it in a file. See (bin/generate_ua_analytics_file.rb) +# The ability to pull this data from Google will cease on July 1, 2024 gem "google-apis-analytics_v3" + +gem 'google-analytics-data' gem 'mail', '2.6.6' -gem 'minitest', '< 5.0' gem 'multi_json' -gem 'oj', '~> 2.0' +gem 'oj', '~> 3.0' gem 'parseconfig' gem 'pony' gem 'pry' @@ -28,6 +32,8 @@ gem 'sparql-client', github: 'ncbo/sparql-client', branch: 'master' group :test do gem 'email_spec' + gem 'minitest', '< 5.0' + gem 'simplecov' + gem 'simplecov-cobertura' # for codecov.io gem 'test-unit-minitest' end - diff --git a/Gemfile.lock b/Gemfile.lock index eb1dffec..99d242db 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -1,6 +1,6 @@ GIT remote: https://github.com/ncbo/goo.git - revision: fd7d45cb862c5c2c1833b64a5c8c14154384edc2 + revision: 75436fe8e387febc53e34ee31ff0e6dd837a9d3f branch: master specs: goo (0.0.2) @@ -15,7 +15,7 @@ GIT GIT remote: https://github.com/ncbo/ncbo_annotator.git - revision: ed325ae9f79e3b0a0061b1af0b02f624de1d0eef + revision: 1170a94d266d3e469bfb034a3aa3c4852bd0de82 branch: master specs: ncbo_annotator (0.0.1) @@ -26,7 +26,7 @@ GIT GIT remote: https://github.com/ncbo/ontologies_linked_data.git - revision: 8196bf34b45c75f8104bb76dfcba1db0f2c048e4 + revision: ee0013f0ee23876076bff9d9258b46371ec3b248 branch: master specs: ontologies_linked_data (0.0.1) @@ -46,7 +46,7 @@ GIT GIT remote: https://github.com/ncbo/sparql-client.git - revision: fb4a89b420f8eb6dda5190a126b6c62e32c4c0c9 + revision: d418d56a6c9ff5692f925b45739a2a1c66bca851 branch: master specs: sparql-client (1.0.1) @@ -60,7 +60,7 @@ PATH ncbo_cron (0.0.1) dante goo - google-apis-analytics_v3 + google-analytics-data mlanett-redis-lock multi_json ncbo_annotator @@ -74,48 +74,49 @@ GEM activesupport (3.2.22.5) i18n (~> 0.6, >= 0.6.4) multi_json (~> 1.0) - addressable (2.8.0) - public_suffix (>= 2.0.2, < 5.0) - bcrypt (3.1.18) + addressable (2.8.6) + public_suffix (>= 2.0.2, < 6.0) + base64 (0.2.0) + bcrypt (3.1.20) + bigdecimal (3.1.5) builder (3.2.4) coderay (1.1.3) - concurrent-ruby (1.1.10) + concurrent-ruby (1.2.2) + connection_pool (2.4.1) cube-ruby (0.0.3) dante (0.2.0) declarative (0.0.20) - domain_name (0.5.20190701) - unf (>= 0.0.5, < 1.0.0) + docile (1.4.0) + domain_name (0.6.20240107) email_spec (2.1.1) htmlentities (~> 4.3.3) launchy (~> 2.1) mail (~> 2.6) - faraday (1.10.0) - faraday-em_http (~> 1.0) - faraday-em_synchrony (~> 1.0) - faraday-excon (~> 1.1) - faraday-httpclient (~> 1.0) - faraday-multipart (~> 1.0) - faraday-net_http (~> 1.0) - faraday-net_http_persistent (~> 1.0) - faraday-patron (~> 1.0) - faraday-rack (~> 1.0) - faraday-retry (~> 1.0) + faraday (2.8.1) + base64 + faraday-net_http (>= 2.0, < 3.1) ruby2_keywords (>= 0.0.4) - faraday-em_http (1.0.0) - faraday-em_synchrony (1.0.0) - faraday-excon (1.1.0) - faraday-httpclient (1.0.1) - faraday-multipart (1.0.4) - multipart-post (~> 2) - faraday-net_http (1.0.1) - faraday-net_http_persistent (1.2.0) - faraday-patron (1.0.0) - faraday-rack (1.0.0) - faraday-retry (1.0.3) - ffi (1.15.5) - google-apis-analytics_v3 (0.10.0) - google-apis-core (>= 0.7, < 2.a) - 
google-apis-core (0.7.0) + faraday-net_http (3.0.2) + faraday-retry (2.2.0) + faraday (~> 2.0) + ffi (1.16.3) + gapic-common (0.21.1) + faraday (>= 1.9, < 3.a) + faraday-retry (>= 1.0, < 3.a) + google-protobuf (~> 3.18) + googleapis-common-protos (>= 1.4.0, < 2.a) + googleapis-common-protos-types (>= 1.11.0, < 2.a) + googleauth (~> 1.9) + grpc (~> 1.59) + google-analytics-data (0.4.0) + google-analytics-data-v1beta (>= 0.7, < 2.a) + google-cloud-core (~> 1.6) + google-analytics-data-v1beta (0.11.1) + gapic-common (>= 0.21.1, < 2.a) + google-cloud-errors (~> 1.0) + google-apis-analytics_v3 (0.13.0) + google-apis-core (>= 0.11.0, < 2.a) + google-apis-core (0.11.2) addressable (~> 2.5, >= 2.5.1) googleauth (>= 0.16.2, < 2.a) httpclient (>= 2.8.1, < 3.a) @@ -124,13 +125,37 @@ GEM retriable (>= 2.0, < 4.a) rexml webrick - googleauth (1.2.0) - faraday (>= 0.17.3, < 3.a) + google-cloud-core (1.6.1) + google-cloud-env (>= 1.0, < 3.a) + google-cloud-errors (~> 1.0) + google-cloud-env (2.1.0) + faraday (>= 1.0, < 3.a) + google-cloud-errors (1.3.1) + google-protobuf (3.25.2) + google-protobuf (3.25.2-x86_64-darwin) + google-protobuf (3.25.2-x86_64-linux) + googleapis-common-protos (1.4.0) + google-protobuf (~> 3.14) + googleapis-common-protos-types (~> 1.2) + grpc (~> 1.27) + googleapis-common-protos-types (1.11.0) + google-protobuf (~> 3.18) + googleauth (1.9.1) + faraday (>= 1.0, < 3.a) + google-cloud-env (~> 2.1) jwt (>= 1.4, < 3.0) - memoist (~> 0.16) multi_json (~> 1.11) os (>= 0.9, < 2.0) signet (>= 0.16, < 2.a) + grpc (1.60.0) + google-protobuf (~> 3.25) + googleapis-common-protos-types (~> 1.0) + grpc (1.60.0-x86_64-darwin) + google-protobuf (~> 3.25) + googleapis-common-protos-types (~> 1.0) + grpc (1.60.0-x86_64-linux) + google-protobuf (~> 3.25) + googleapis-common-protos-types (~> 1.0) htmlentities (4.3.4) http-accept (1.7.0) http-cookie (1.0.5) @@ -138,48 +163,50 @@ GEM httpclient (2.8.3) i18n (0.9.5) concurrent-ruby (~> 1.0) - json (2.6.2) - json_pure (2.6.2) - jwt (2.4.1) - launchy (2.5.0) - addressable (~> 2.7) - libxml-ruby (3.2.3) - logger (1.5.1) + json (2.7.1) + json_pure (2.7.1) + jwt (2.7.1) + launchy (2.5.2) + addressable (~> 2.8) + libxml-ruby (5.0.2) + logger (1.6.0) macaddr (1.7.2) systemu (~> 2.6.5) mail (2.6.6) mime-types (>= 1.16, < 4) - memoist (0.16.2) method_source (1.0.0) - mime-types (3.4.1) + mime-types (3.5.2) mime-types-data (~> 3.2015) - mime-types-data (3.2022.0105) - mini_mime (1.1.2) + mime-types-data (3.2023.1205) + mini_mime (1.1.5) minitest (4.7.5) mlanett-redis-lock (0.2.7) redis multi_json (1.15.0) - multipart-post (2.2.3) net-http-persistent (2.9.4) netrc (0.11.0) - oj (2.18.5) + oj (3.16.3) + bigdecimal (>= 3.0) omni_logger (0.1.4) logger os (1.1.4) parseconfig (1.1.2) pony (1.13.1) mail (>= 2.0) - pry (0.14.1) + pry (0.14.2) coderay (~> 1.1) method_source (~> 1.0) - public_suffix (4.0.7) - rack (2.2.4) - rack-test (2.0.2) + public_suffix (5.0.4) + rack (3.0.8) + rack-test (2.1.0) rack (>= 1.3) - rake (13.0.6) + rake (13.1.0) rdf (1.0.8) addressable (>= 2.2) - redis (4.7.1) + redis (5.0.8) + redis-client (>= 0.17.0) + redis-client (0.19.1) + connection_pool representable (3.2.0) declarative (< 0.1.0) trailblazer-option (>= 0.1.1, < 0.2.0) @@ -190,7 +217,7 @@ GEM mime-types (>= 1.16, < 4.0) netrc (~> 0.8) retriable (3.1.2) - rexml (3.2.5) + rexml (3.2.6) rsolr (2.5.0) builder (>= 2.1.2) faraday (>= 0.9, < 3, != 2.0.0) @@ -199,26 +226,32 @@ GEM rubyzip (2.3.2) rufus-scheduler (2.0.24) tzinfo (>= 0.3.22) - signet (0.17.0) + signet (0.18.0) addressable (~> 
2.8) faraday (>= 0.17.5, < 3.a) jwt (>= 1.5, < 3.0) multi_json (~> 1.10) - sys-proctable (1.2.6) - ffi + simplecov (0.22.0) + docile (~> 1.1) + simplecov-html (~> 0.11) + simplecov_json_formatter (~> 0.1) + simplecov-cobertura (2.1.0) + rexml + simplecov (~> 0.19) + simplecov-html (0.12.3) + simplecov_json_formatter (0.1.4) + sys-proctable (1.3.0) + ffi (~> 1.1) systemu (2.6.5) test-unit-minitest (0.9.1) minitest (~> 4.7) trailblazer-option (0.1.2) - tzinfo (2.0.4) + tzinfo (2.0.6) concurrent-ruby (~> 1.0) uber (0.1.0) - unf (0.1.4) - unf_ext - unf_ext (0.0.8.2) uuid (2.3.9) macaddr (~> 1.0) - webrick (1.7.0) + webrick (1.8.1) PLATFORMS ruby @@ -228,16 +261,16 @@ PLATFORMS DEPENDENCIES cube-ruby email_spec - faraday (~> 1.9) ffi goo! + google-analytics-data google-apis-analytics_v3 mail (= 2.6.6) minitest (< 5.0) multi_json ncbo_annotator! ncbo_cron! - oj (~> 2.0) + oj (~> 3.0) ontologies_linked_data! parseconfig pony @@ -245,9 +278,11 @@ DEPENDENCIES rake redis rest-client + simplecov + simplecov-cobertura sparql-client! sys-proctable test-unit-minitest BUNDLED WITH - 2.3.14 + 2.4.22 diff --git a/bin/generate_ua_analytics_file.rb b/bin/generate_ua_analytics_file.rb new file mode 100755 index 00000000..0a432a92 --- /dev/null +++ b/bin/generate_ua_analytics_file.rb @@ -0,0 +1,126 @@ +require 'logger' +require 'google/apis/analytics_v3' +require 'google/api_client/auth/key_utils' + +module NcboCron + module Models + + class OntologyAnalyticsUA + + def initialize(logger) + @logger = logger + end + + def run + redis = Redis.new(:host => NcboCron.settings.redis_host, :port => NcboCron.settings.redis_port) + ontology_analytics = fetch_ontology_analytics + File.open(NcboCron.settings.analytics_path_to_ua_data_file, 'w') do |f| + f.write(ontology_analytics.to_json) + end + end + + def fetch_ontology_analytics + google_client = authenticate_google + aggregated_results = Hash.new + start_year = Date.parse(NcboCron.settings.analytics_start_date).year || 2013 + ont_acronyms = LinkedData::Models::Ontology.where.include(:acronym).all.map {|o| o.acronym} + # ont_acronyms = ["NCIT", "ONTOMA", "CMPO", "AEO", "SNOMEDCT"] + filter_str = (NcboCron.settings.analytics_filter_str.nil? || NcboCron.settings.analytics_filter_str.empty?) ? 
"" : ";#{NcboCron.settings.analytics_filter_str}" + + ont_acronyms.each do |acronym| + max_results = 10000 + num_results = 10000 + start_index = 1 + results = nil + + loop do + results = google_client.get_ga_data( + ids = NcboCron.settings.analytics_profile_id, + start_date = NcboCron.settings.analytics_start_date, + end_date = Date.today.to_s, + metrics = 'ga:pageviews', + { + dimensions: 'ga:pagePath,ga:year,ga:month', + filters: "ga:pagePath=~^(\\/ontologies\\/#{acronym})(\\/?\\?{0}|\\/?\\?{1}.*)$#{filter_str}", + start_index: start_index, + max_results: max_results + } + ) + results.rows ||= [] + start_index += max_results + num_results = results.rows.length + @logger.info "Acronym: #{acronym}, Results: #{num_results}, Start Index: #{start_index}" + @logger.flush + + results.rows.each do |row| + if aggregated_results.has_key?(acronym) + # year + if aggregated_results[acronym].has_key?(row[1].to_i) + # month + if aggregated_results[acronym][row[1].to_i].has_key?(row[2].to_i) + aggregated_results[acronym][row[1].to_i][row[2].to_i] += row[3].to_i + else + aggregated_results[acronym][row[1].to_i][row[2].to_i] = row[3].to_i + end + else + aggregated_results[acronym][row[1].to_i] = Hash.new + aggregated_results[acronym][row[1].to_i][row[2].to_i] = row[3].to_i + end + else + aggregated_results[acronym] = Hash.new + aggregated_results[acronym][row[1].to_i] = Hash.new + aggregated_results[acronym][row[1].to_i][row[2].to_i] = row[3].to_i + end + end + + if num_results < max_results + # fill up non existent years + (start_year..Date.today.year).each do |y| + aggregated_results[acronym] = Hash.new if aggregated_results[acronym].nil? + aggregated_results[acronym][y] = Hash.new unless aggregated_results[acronym].has_key?(y) + end + # fill up non existent months with zeros + (1..12).each { |n| aggregated_results[acronym].values.each { |v| v[n] = 0 unless v.has_key?(n) } } + break + end + end + end + + @logger.info "Completed Universal Analytics pull..." + @logger.flush + + aggregated_results + end + + def authenticate_google + Google::Apis::ClientOptions.default.application_name = NcboCron.settings.analytics_app_name + Google::Apis::ClientOptions.default.application_version = NcboCron.settings.analytics_app_version + # enable google api call retries in order to + # minigate analytics processing failure due to occasional google api timeouts and other outages + Google::Apis::RequestOptions.default.retries = 5 + # uncoment to enable logging for debugging purposes + # Google::Apis.logger.level = Logger::DEBUG + # Google::Apis.logger = @logger + client = Google::Apis::AnalyticsV3::AnalyticsService.new + key = Google::APIClient::KeyUtils::load_from_pkcs12(NcboCron.settings.analytics_path_to_ua_key_file, 'notasecret') + client.authorization = Signet::OAuth2::Client.new( + :token_credential_uri => 'https://accounts.google.com/o/oauth2/token', + :audience => 'https://accounts.google.com/o/oauth2/token', + :scope => 'https://www.googleapis.com/auth/analytics.readonly', + :issuer => NcboCron.settings.analytics_service_account_email_address, + :signing_key => key + ).tap { |auth| auth.fetch_access_token! 
} + client + end + end + end +end + +require 'ontologies_linked_data' +require 'goo' +require 'ncbo_annotator' +require 'ncbo_cron/config' +require_relative '../config/config' +ontology_analytics_log_path = File.join("logs", "ontology-analytics-ua.log") +ontology_analytics_logger = Logger.new(ontology_analytics_log_path) +NcboCron::Models::OntologyAnalyticsUA.new(ontology_analytics_logger).run diff --git a/bin/ncbo_cron b/bin/ncbo_cron index 8d212382..3b7aa063 100755 --- a/bin/ncbo_cron +++ b/bin/ncbo_cron @@ -111,19 +111,9 @@ opt_parser = OptionParser.new do |opts| opts.on("--disable-update-check", "disable check for updated version of Ontoportal (for VMs)", "(default: #{options[:enable_update_check]})") do |v| options[:enable_update_check] = false end - - - - - opts.on("--disable-dictionary-generation", "disable mgrep dictionary generation job", "(default: #{options[:enable_dictionary_generation]})") do |v| - options[:enable_dictionary_generation] = false + opts.on("--enable-dictionary-generation-cron-job", "ENABLE mgrep dictionary generation JOB and DISABLE dictionary generation during ontology processing. If this is not passed in, dictionary is generated every time an ontology is processed.", "(default: Dictionary is generated on every ontology processing, CRON job is DISABLED)") do |v| + options[:enable_dictionary_generation_cron_job] = true end - - - - - - opts.on("--disable-obofoundry_sync", "disable OBO Foundry synchronization report", "(default: #{options[:enable_obofoundry_sync]})") do |v| options[:enable_obofoundry_sync] = false end @@ -160,18 +150,10 @@ opt_parser = OptionParser.new do |opts| opts.on("--obofoundry_sync SCHED", String, "cron schedule to run OBO Foundry synchronization report", "(default: #{options[:cron_obofoundry_sync]})") do |c| options[:cron_obofoundry_sync] = c end - - - - - opts.on("--dictionary-generation SCHED", String, "cron schedule to run mgrep dictionary generation job", "(default: #{options[:cron_dictionary_generation]})") do |c| - options[:cron_dictionary_generation] = c + opts.on("--dictionary-generation-cron-job SCHED", String, "cron schedule to run mgrep dictionary generation job (if enabled)", "(default: #{options[:cron_dictionary_generation_cron_job]})") do |c| + options[:cron_dictionary_generation_cron_job] = c end - - - - # Display the help screen, all programs are assumed to have this option. 
opts.on_tail('--help', 'Display this screen') do puts opts @@ -484,49 +466,27 @@ runner.execute do |opts| end end - - - - - - - - # temporary job to generate mgrep dictionary file + # optional job to generate mgrep dictionary file # separate from ontology processing due to # https://github.com/ncbo/ncbo_cron/issues/45 - - if options[:enable_dictionary_generation] + if options[:enable_dictionary_generation_cron_job] dictionary_generation_thread = Thread.new do dictionary_generation_options = options.dup - dictionary_generation_options[:job_name] = "ncbo_cron_dictionary_generation" + dictionary_generation_options[:job_name] = "ncbo_cron_dictionary_generation_cron_job" dictionary_generation_options[:scheduler_type] = :cron - dictionary_generation_options[:cron_schedule] = dictionary_generation_options[:cron_dictionary_generation] - logger.info "Setting up mgrep dictionary generation job with #{dictionary_generation_options[:cron_dictionary_generation]}"; logger.flush + dictionary_generation_options[:cron_schedule] = dictionary_generation_options[:cron_dictionary_generation_cron_job] + logger.info "Setting up mgrep dictionary generation job with #{dictionary_generation_options[:cron_dictionary_generation_cron_job]}"; logger.flush NcboCron::Scheduler.scheduled_locking_job(dictionary_generation_options) do - logger.info "Starting mgrep dictionary generation..."; logger.flush + logger.info "Starting mgrep dictionary generation CRON job..."; logger.flush t0 = Time.now annotator = Annotator::Models::NcboAnnotator.new annotator.generate_dictionary_file() - logger.info "mgrep dictionary generation job completed in #{Time.now - t0} sec."; logger.flush - logger.info "Finished mgrep dictionary generation"; logger.flush + logger.info "mgrep dictionary generation CRON job completed in #{Time.now - t0} sec."; logger.flush + logger.info "Finished mgrep dictionary generation CRON job"; logger.flush end end end - - - - - - - - - - - - - - # Print running child processes require 'sys/proctable' at_exit do @@ -549,12 +509,5 @@ runner.execute do |opts| mapping_counts_thread.join if mapping_counts_thread update_check_thread.join if update_check_thread obofoundry_sync_thread.join if obofoundry_sync_thread - - - - dictionary_generation_thread.join if dictionary_generation_thread - - - end diff --git a/bin/ncbo_ontology_annotate_generate_cache b/bin/ncbo_ontology_annotate_generate_cache index 07286e7c..18399bea 100755 --- a/bin/ncbo_ontology_annotate_generate_cache +++ b/bin/ncbo_ontology_annotate_generate_cache @@ -49,7 +49,7 @@ opt_parser = OptionParser.new do |opts| options[:generate_dictionary] = true end - options[:logfile] = "logs/annotator_cache.log" + options[:logfile] = STDOUT opts.on('-l', '--logfile FILE', "Write log to FILE (default is 'logs/annotator_cache.log').") do |filename| options[:logfile] = filename end diff --git a/bin/ncbo_ontology_archive_old_submissions b/bin/ncbo_ontology_archive_old_submissions index 3dc5c87c..1b2268a5 100755 --- a/bin/ncbo_ontology_archive_old_submissions +++ b/bin/ncbo_ontology_archive_old_submissions @@ -11,31 +11,125 @@ require_relative '../lib/ncbo_cron' config_exists = File.exist?(File.expand_path('../../config/config.rb', __FILE__)) abort("Please create a config/config.rb file using the config/config.rb.sample as a template") unless config_exists require_relative '../config/config' +require 'optparse' -logfile = 'archive_old_submissions.log' +options = { delete: false } +opt_parser = OptionParser.new do |opts| + # Set a banner, displayed at the top of the help 
screen. + opts.banner = "Usage: #{File.basename(__FILE__)} [options]" + + options[:logfile] = STDOUT + opts.on( '-l', '--logfile FILE', "Write log to FILE (default is STDOUT)" ) do |filename| + options[:logfile] = filename + end + + # Delete submission if it contains bad data + opts.on( '-d', '--delete', "Delete submissions that contain bad data" ) do + options[:delete] = true + end + + # Display the help screen, all programs are assumed to have this option. + opts.on( '-h', '--help', 'Display this screen' ) do + puts opts + exit + end +end + +opt_parser.parse! +logfile = options[:logfile] if File.file?(logfile); File.delete(logfile); end logger = Logger.new(logfile) -options = { process_rdf: false, index_search: false, index_commit: false, - run_metrics: false, reasoning: false, archive: true } +process_actions = { process_rdf: false, generate_labels: false, index_search: false, index_commit: false, + process_annotator: false, diff: false, run_metrics: false, archive: true } onts = LinkedData::Models::Ontology.all onts.each { |ont| ont.bring(:acronym, :submissions) } -onts.sort! { |a,b| a.acronym <=> b.acronym } +onts.sort! { |a, b| a.acronym <=> b.acronym } +bad_submissions = {} onts.each do |ont| latest_sub = ont.latest_submission - if not latest_sub.nil? + + unless latest_sub.nil? id = latest_sub.submissionId subs = ont.submissions - old_subs = subs.reject { |sub| sub.submissionId >= id } - old_subs.sort! { |a,b| a.submissionId <=> b.submissionId } + + old_subs = subs.reject { |sub| + begin + sub.submissionId >= id + rescue => e + msg = "Invalid submission ID detected (String instead of Integer): #{ont.acronym}/#{sub.submissionId} - #{e.class}:\n#{e.backtrace.join("\n")}" + puts msg + logger.error(msg) + + if options[:delete] + sub.delete if options[:delete] + msg = "Deleted submission #{ont.acronym}/#{sub.submissionId} due to invalid Submission ID" + puts msg + logger.error(msg) + end + bad_submissions["#{ont.acronym}/#{sub.submissionId}"] = "Invalid Submission ID" + true + end + } + old_subs.sort! { |a, b| a.submissionId <=> b.submissionId } old_subs.each do |sub| - if not sub.archived? + unless sub.archived? msg = "#{ont.acronym}: found un-archived old submission with ID #{sub.submissionId}." puts msg logger.info msg - NcboCron::Models::OntologySubmissionParser.new.process_submission(logger, sub.id.to_s, options) + + begin + NcboCron::Models::OntologySubmissionParser.new.process_submission(logger, sub.id.to_s, process_actions) + rescue => e + if e.class == Goo::Base::NotValidException + if sub.valid? 
+ msg = "Error archiving submission #{ont.acronym}/#{sub.submissionId} - #{e.class}:\n#{e.backtrace.join("\n")}" + puts msg + logger.error(msg) + bad_submissions["#{ont.acronym}/#{sub.submissionId}"] = "Submission passes valid check but cannot be saved" + else + msg = "Error archiving submission #{ont.acronym}/#{sub.submissionId}:\n#{JSON.pretty_generate(sub.errors)}" + puts msg + logger.error(msg) + + if options[:delete] + sub.delete if options[:delete] + msg = "Deleted submission #{ont.acronym}/#{sub.submissionId} due to invalid data" + puts msg + logger.error(msg) + end + bad_submissions["#{ont.acronym}/#{sub.submissionId}"] = "Submission is not valid to be saved" + end + else + msg = "Error archiving submission #{ont.acronym}/#{sub.submissionId} - #{e.class}:\n#{e.backtrace.join("\n")}" + puts msg + logger.error(msg) + + if options[:delete] && (e.class == Net::HTTPBadResponse || e.class == Errno::ECONNREFUSED) + sub.delete + msg = "Deleted submission #{ont.acronym}/#{sub.submissionId} due to a non-working pull URL" + puts msg + logger.error(msg) + end + bad_submissions["#{ont.acronym}/#{sub.submissionId}"] = "#{e.class} - Runtime error" + end + end end end end end +puts + +if bad_submissions.empty? + msg = "No errored submissions found" + puts msg + logger.info(msg) +else + msg = JSON.pretty_generate(bad_submissions) + puts msg + logger.error(msg) + msg = "Number of errored submissions: #{bad_submissions.length}" + puts msg + logger.error(msg) +end \ No newline at end of file diff --git a/bin/ncbo_ontology_import b/bin/ncbo_ontology_import index db2e90c5..57d63aa1 100755 --- a/bin/ncbo_ontology_import +++ b/bin/ncbo_ontology_import @@ -20,27 +20,27 @@ require 'net/http' require 'optparse' ontologies_acronyms = '' ontology_source = '' -source_api = '' +source_apikey = '' username = '' opt_parser = OptionParser.new do |opts| opts.banner = 'Usage: ncbo_ontology_import [options]' - opts.on('-o', '--ontology ACRONYM', 'Ontologies acronyms which we want to import (separated by comma)') do |acronym| + opts.on('-o', '--ontologies ACRONYM1,ACRONYM2', 'Comma-separated list of ontologies to import') do |acronym| ontologies_acronyms = acronym end - opts.on('--from url', 'The ontoportal api url source of the ontology') do |url| + opts.on('--from URL', 'The ontoportal api url source of the ontology') do |url| ontology_source = url.to_s end - opts.on('--from-api api', 'An apikey to acces the ontoportal api') do |api| - source_api = api.to_s + opts.on('--from-apikey APIKEY', 'An apikey to acces the ontoportal api') do |apikey| + source_apikey = apikey.to_s end - opts.on('--admin-user username', 'The target admin user that will submit the ontology') do |user| + opts.on('--admin-user USERNAME', 'The target admin user that will submit the ontology') do |user| username = user.to_s end # Display the help screen, all programs are assumed to have this option. - opts.on( '-h', '--help', 'Display this screen') do + opts.on('-h', '--help', 'Display this screen') do puts opts exit end @@ -48,9 +48,8 @@ end opt_parser.parse! 
# URL of the API and APIKEY of the Ontoportal we want to import data FROM -SOURCE_API = ontology_source -SOURCE_APIKEY = source_api - +SOURCE_API = ontology_source +SOURCE_APIKEY = source_apikey # The username of the user that will have the administration rights on the ontology on the target portal TARGETED_PORTAL_USER = username @@ -58,17 +57,15 @@ TARGETED_PORTAL_USER = username # The list of acronyms of ontologies to import ONTOLOGIES_TO_IMPORT = ontologies_acronyms.split(',') || [] - def get_user(username) user = LinkedData::Models::User.find(username).first raise "The user #{username} does not exist" if user.nil? + user.bring_remaining end - # A function to create a new ontology (if already Acronym already existing on the portal it will return HTTPConflict) def create_ontology(ont_info) - new_ontology = LinkedData::Models::Ontology.new new_ontology.acronym = ont_info['acronym'] @@ -97,23 +94,30 @@ def upload_submission(sub_info, ontology) # Build the json body # hasOntologyLanguage options: OWL, UMLS, SKOS, OBO # status: alpha, beta, production, retired - attr_to_reject = %w[id submissionStatus hasOntologyLanguage metrics ontology @id @type contact] - to_copy = sub_info.select do |k,v| + attr_to_reject = %w[id submissionStatus hasOntologyLanguage metrics ontology @id @type contact uploadFilePath diffFilePath] + to_copy = sub_info.select do |k, v| !v.nil? && !v.is_a?(Hash) && !v.to_s.empty? && !attr_to_reject.include?(k) end to_copy["ontology"] = ontology - to_copy["contact"] = [LinkedData::Models::Contact.where(email: USER.email).first] - to_copy["hasOntologyLanguage"] = LinkedData::Models::OntologyFormat.where(acronym: sub_info["hasOntologyLanguage"]).first + + contact = LinkedData::Models::Contact.where(email: USER.email).first + unless contact + contact = LinkedData::Models::Contact.new(name: USER.username, email: USER.email).save + puts "created a new contact; name: #{USER.username}, email: #{USER.email}" + end + + to_copy["contact"] = [contact] + to_copy["hasOntologyLanguage"] = LinkedData::Models::OntologyFormat.where(acronym: sub_info["hasOntologyLanguage"]).first to_copy.each do |key, value| attribute_settings = new_submission.class.attribute_settings(key.to_sym) if attribute_settings - if attribute_settings[:enforce]&.include?(:date_time) + if attribute_settings[:enforce]&.include?(:date_time) value = DateTime.parse(value) elsif attribute_settings[:enforce]&.include?(:uri) && attribute_settings[:enforce]&.include?(:list) value = value.map { |v| RDF::IRI.new(v) } - elsif attribute_settings[:enforce]&.include?(:uri) + elsif attribute_settings[:enforce]&.include?(:uri) value = RDF::IRI.new(value) end end @@ -124,12 +128,11 @@ def upload_submission(sub_info, ontology) new_submission end - USER = get_user username -#get apikey for admin user +# get apikey for admin user TARGET_APIKEY = USER.apikey -SOURCE_APIKEY == '' && abort('--from-api has to be set') +SOURCE_APIKEY == '' && abort('--from-apikey has to be set') SOURCE_API == '' && abort('--from has to be set') def result_log(ressource, errors) @@ -143,10 +146,11 @@ end # Go through all ontologies acronym and get their latest_submission informations ONTOLOGIES_TO_IMPORT.each do |ont| sub_info = JSON.parse(Net::HTTP.get(URI.parse("#{SOURCE_API}/ontologies/#{ont}/latest_submission?apikey=#{SOURCE_APIKEY}&display=all"))) - puts "Import #{ont} " , + puts "Import #{ont} ", "From #{SOURCE_API}" # if the ontology is already created then it will return HTTPConflict, no consequences raise "The ontology #{ont} does not exist" if 
sub_info['ontology'].nil? + new_ontology = create_ontology(sub_info['ontology']) errors = nil if new_ontology.valid? @@ -159,6 +163,7 @@ ONTOLOGIES_TO_IMPORT.each do |ont| new_ontology ||= LinkedData::Models::Ontology.where(acronym: ont).first new_submission = upload_submission(sub_info, new_ontology) + if new_submission.valid? new_submission.save errors = nil @@ -167,6 +172,3 @@ ONTOLOGIES_TO_IMPORT.each do |ont| end result_log(sub_info["id"], errors) end - - - diff --git a/bin/ncbo_ontology_process b/bin/ncbo_ontology_process index d96f0d87..879e749d 100755 --- a/bin/ncbo_ontology_process +++ b/bin/ncbo_ontology_process @@ -31,9 +31,14 @@ opt_parser = OptionParser.new do |opts| end options[:tasks] = NcboCron::Models::OntologySubmissionParser::ACTIONS - opts.on('-t', '--tasks process_rdf,index_search,run_metrics', "Optional comma-separated list of processing tasks to perform. Default: #{NcboCron::Models::OntologySubmissionParser::ACTIONS.keys.join(',')}") do |tasks| - t = tasks.split(",").map {|t| t.strip.sub(/^:/, '').to_sym} - options[:tasks].each {|k, _| options[:tasks][k] = false unless t.include?(k)} + opts.on('-t', '--tasks process_rdf,generate_labels=false,index_search,run_metrics', "Optional comma-separated list of processing tasks to perform (or exclude). Default: #{NcboCron::Models::OntologySubmissionParser::ACTIONS.keys.join(',')}") do |tasks| + tasks_obj = {} + tasks.split(',').each { |t| + t_arr = t.gsub(/\s+/, '').gsub(/^:/, '').split('=') + tasks_obj[t_arr[0].to_sym] = (t_arr.length <= 1 || t_arr[1].downcase === 'true') + } + tasks_obj[:generate_labels] = true if tasks_obj[:process_rdf] && !tasks_obj.has_key?(:generate_labels) + options[:tasks].each {|k, _| options[:tasks][k] = false unless tasks_obj[k]} end options[:logfile] = STDOUT diff --git a/bin/ncbo_ontology_pull b/bin/ncbo_ontology_pull new file mode 100755 index 00000000..be3e08de --- /dev/null +++ b/bin/ncbo_ontology_pull @@ -0,0 +1,42 @@ +#!/usr/bin/env ruby + +$0 = "ncbo_ontology_pull" + +# Exit cleanly from an early interrupt +Signal.trap("INT") { exit 1 } + +# Setup the bundled gems in our environment +require 'bundler/setup' +# redis store for looking up queued jobs +require 'redis' + +require_relative '../lib/ncbo_cron' +require_relative '../config/config' +require 'optparse' + +ontology_acronym = '' +opt_parser = OptionParser.new do |opts| + opts.on('-o', '--ontology ACRONYM', 'Ontology acronym to pull if new version exist') do |acronym| + ontology_acronym = acronym + end + + # Display the help screen, all programs are assumed to have this option. + opts.on( '-h', '--help', 'Display this screen') do + puts opts + exit + end +end +opt_parser.parse! 
+ +logger = Logger.new($stdout) +logger.info "Starting ncbo pull"; logger.flush +puller = NcboCron::Models::OntologyPull.new +begin + puller.do_ontology_pull(ontology_acronym, logger: logger, enable_pull_umls: true) +rescue StandardError => e + logger.error e.message + logger.flush +end +logger.info "Finished ncbo pull"; logger.flush + + diff --git a/bin/ncbo_ontology_submissions_eradicate b/bin/ncbo_ontology_submissions_eradicate new file mode 100755 index 00000000..ef2c7a19 --- /dev/null +++ b/bin/ncbo_ontology_submissions_eradicate @@ -0,0 +1,107 @@ +#!/usr/bin/env ruby + +$0 = 'ncbo_cron' + +# Exit cleanly from an early interrupt +Signal.trap('INT') { exit 1 } + +# Setup the bundled gems in our environment +require 'bundler/setup' +# redis store for looking up queued jobs +require 'redis' + +require_relative '../lib/ncbo_cron' +require_relative '../config/config' +require 'optparse' +ontology_acronym = '' +submission_id_from = 0 +submission_id_to = 0 + +opt_parser = OptionParser.new do |opts| + opts.banner = 'Usage: ncbo_ontology_sumissions_eradicate [options]' + opts.on('-o', '--ontology ACRONYM', 'Ontology acronym which we want to eradicate (remove triples+files) specific submissions') do |acronym| + ontology_acronym = acronym + end + + opts.on('--from id', 'Submission id to start from deleting (included)') do |id| + submission_id_from = id.to_i + end + + opts.on('--to id', 'Submission id to end deleting (included)') do |id| + submission_id_to = id.to_i + end + # Display the help screen, all programs are assumed to have this option. + opts.on( '-h', '--help', 'Display this screen') do + puts opts + exit + end +end +opt_parser.parse! + + + + + +def ontology_exists?(ontology_acronym) + ont = LinkedData::Models::Ontology.find(ontology_acronym) + .include(submissions: [:submissionId]) + .first + if ont.nil? + logger.error "ontology not found: #{options[:ontology]}" + exit(1) + end + ont.bring(:submissions) if ont.bring?(:submissions) + ont +end + + +def get_submission_to_delete(submissions, from, to) + min, max = [from, to].minmax + submissions.select { |s| s.submissionId.between?(min, max) }.sort { |s1, s2| s1.submissionId <=> s2.submissionId} +end + +def eradicate(ontology_acronym, submissions , logger) + logger ||= Logger.new($stderr) + submissions.each do |submission| + begin + logger.info "Start removing submission #{submission.submissionId.to_s}" + NcboCron::Models::OntologySubmissionEradicator.new.eradicate submission + logger.info"Submission #{submission.submissionId.to_s} deleted successfully" + rescue NcboCron::Models::OntologySubmissionEradicator::RemoveNotArchivedSubmissionException + logger.info "Submission #{submission.submissionId.to_s} is not archived" + ask? logger, 'Do you want to force remove ? 
(Y/n)' + NcboCron::Models::OntologySubmissionEradicator.new.eradicate submission, true + logger.info"Submission #{submission.submissionId.to_s} deleted successfully" + rescue NcboCron::Models::OntologySubmissionEradicator::RemoveSubmissionFileException => e + logger.error "RemoveSubmissionFileException in submission #{submission.submissionId.to_s} : #{e.message}" + rescue NcboCron::Models::OntologySubmissionEradicator::RemoveSubmissionDataException => e + logger.error "RemoveSubmissionDataException in submission #{submission.submissionId.to_s} : #{e.message}" + rescue Exception => e + logger.error "Error in submission #{submission.submissionId.to_s} remove: #{e.message}" + end + end +end + +def ask?(logger, prompt) + logger.info prompt + choice = gets.chomp.downcase + exit(1) if choice.eql? 'n' +end + +begin + logger = Logger.new($stderr) + + logger.info 'Start of NCBO ontology submissions eradicate' + + ont = ontology_exists? ontology_acronym + + submissions = ont.submissions + submissions_to_delete = get_submission_to_delete submissions, submission_id_from, submission_id_to + + logger.info "You are attempting to remove the following submissions of #{ontology_acronym} : #{submissions_to_delete.map{ |s| s.submissionId }.join(', ')}" + logger.info 'They will be deleted from the triple store and local files' + ask? logger, 'Do you confirm ? (Y/n)' + + eradicate ontology_acronym , submissions_to_delete, logger + exit(0) +end \ No newline at end of file diff --git a/config/config.rb.sample b/config/config.rb.sample index 15125224..668c7a0c 100644 --- a/config/config.rb.sample +++ b/config/config.rb.sample @@ -1,16 +1,42 @@ -LinkedData.config do |config| - config.enable_monitoring = false - config.cube_host = "localhost" - config.goo_host = "localhost" - config.goo_port = 8080 - config.search_server_url = "http://localhost:8983/solr/term_search_core1" - config.property_search_server_url = "http://localhost:8983/solr/prop_search_core1" - config.repository_folder = "./test/data/ontology_files/repo" - config.http_redis_host = "localhost" - config.http_redis_port = 6379 - config.goo_redis_host = "localhost" - config.goo_redis_port = 6379 +# This file is designed to be used for unit testing with docker-compose + +GOO_BACKEND_NAME = ENV.include?("GOO_BACKEND_NAME") ? ENV["GOO_BACKEND_NAME"] : "4store" +GOO_HOST = ENV.include?("GOO_HOST") ? ENV["GOO_HOST"] : "localhost" +GOO_PATH_DATA = ENV.include?("GOO_PATH_DATA") ? ENV["GOO_PATH_DATA"] : "/data/" +GOO_PATH_QUERY = ENV.include?("GOO_PATH_QUERY") ? ENV["GOO_PATH_QUERY"] : "/sparql/" +GOO_PATH_UPDATE = ENV.include?("GOO_PATH_UPDATE") ? ENV["GOO_PATH_UPDATE"] : "/update/" +GOO_PORT = ENV.include?("GOO_PORT") ? ENV["GOO_PORT"] : 9000 +MGREP_HOST = ENV.include?("MGREP_HOST") ? ENV["MGREP_HOST"] : "localhost" +MGREP_PORT = ENV.include?("MGREP_PORT") ? ENV["MGREP_PORT"] : 55555 +MGREP_DICT_PATH = ENV.include?("MGREP_DICT_PATH") ? ENV["MGREP_DICT_PATH"] : "./test/data/dictionary.txt" +REDIS_GOO_CACHE_HOST = ENV.include?("REDIS_GOO_CACHE_HOST") ? ENV["REDIS_GOO_CACHE_HOST"] : "localhost" +REDIS_HTTP_CACHE_HOST = ENV.include?("REDIS_HTTP_CACHE_HOST") ? ENV["REDIS_HTTP_CACHE_HOST"] : "localhost" +REDIS_PERSISTENT_HOST = ENV.include?("REDIS_PERSISTENT_HOST") ? ENV["REDIS_PERSISTENT_HOST"] : "localhost" +REDIS_PORT = ENV.include?("REDIS_PORT") ? ENV["REDIS_PORT"] : 6379 +REPORT_PATH = ENV.include?("REPORT_PATH") ? ENV["REPORT_PATH"] : "./test/tmp/ontologies_report.json" +REPOSITORY_FOLDER = ENV.include?("REPOSITORY_FOLDER") ? 
ENV["REPOSITORY_FOLDER"] : "./test/data/ontology_files/repo" +REST_URL_PREFIX = ENV.include?("REST_URL_PREFIX") ? ENV["REST_URL_PREFIX"] : "http://localhost:9393" +SOLR_PROP_SEARCH_URL = ENV.include?("SOLR_PROP_SEARCH_URL") ? ENV["SOLR_PROP_SEARCH_URL"] : "http://localhost:8983/solr/prop_search_core1" +SOLR_TERM_SEARCH_URL = ENV.include?("SOLR_TERM_SEARCH_URL") ? ENV["SOLR_TERM_SEARCH_URL"] : "http://localhost:8983/solr/term_search_core1" +LinkedData.config do |config| + config.goo_backend_name = GOO_BACKEND_NAME.to_s + config.goo_host = GOO_HOST.to_s + config.goo_port = GOO_PORT.to_i + config.goo_path_query = GOO_PATH_QUERY.to_s + config.goo_path_data = GOO_PATH_DATA.to_s + config.goo_path_update = GOO_PATH_UPDATE.to_s + config.goo_redis_host = REDIS_GOO_CACHE_HOST.to_s + config.goo_redis_port = REDIS_PORT.to_i + config.http_redis_host = REDIS_HTTP_CACHE_HOST.to_s + config.http_redis_port = REDIS_PORT.to_i + config.ontology_analytics_redis_host = REDIS_PERSISTENT_HOST.to_s + config.ontology_analytics_redis_port = REDIS_PORT.to_i + config.repository_folder = REPOSITORY_FOLDER.to_s + config.search_server_url = SOLR_TERM_SEARCH_URL.to_s + config.property_search_server_url = SOLR_PROP_SEARCH_URL.to_s +# config.replace_url_prefix = false +# config.rest_url_prefix = REST_URL_PREFIX.to_s # Email notifications. config.enable_notifications = true config.email_sender = "sender@domain.com" # Default sender for emails @@ -19,35 +45,38 @@ LinkedData.config do |config| config.smtp_user = nil config.smtp_password = nil config.smtp_auth_type = :none - config.smtp_domain = "localhost.localhost" + config.smtp_domain = "localhost.localhost" end Annotator.config do |config| - config.mgrep_dictionary_file ||= "./test/tmp/dict" - config.stop_words_default_file ||= "./config/default_stop_words.txt" config.mgrep_host ||= "localhost" - config.mgrep_port ||= 55555 - config.annotator_redis_host ||= "localhost" - config.annotator_redis_port ||= 6379 + config.annotator_redis_host = REDIS_PERSISTENT_HOST.to_s + config.annotator_redis_port = REDIS_PORT.to_i + config.mgrep_host = MGREP_HOST.to_s + config.mgrep_port = MGREP_PORT.to_i + config.mgrep_dictionary_file = MGREP_DICT_PATH.to_s end NcboCron.config do |config| - config.redis_host ||= "localhost" - config.redis_port ||= 6379 + config.redis_host = REDIS_PERSISTENT_HOST.to_s + config.redis_port = REDIS_PORT.to_i + # Ontologies Report config + config.ontology_report_path = REPORT_PATH + + # do not deaemonize in docker + config.daemonize = false + config.search_index_all_url = "http://localhost:8983/solr/term_search_core2" config.property_search_index_all_url = "http://localhost:8983/solr/prop_search_core2" - # Ontologies Report config - config.ontology_report_path = "./test/reports/ontologies_report.json" - - # Google Analytics config - config.analytics_service_account_email_address = "123456789999-sikipho0wk8q0atflrmw62dj4kpwoj3c@developer.gserviceaccount.com" - config.analytics_path_to_key_file = "config/bioportal-analytics.p12" - config.analytics_profile_id = "ga:1234567" - config.analytics_app_name = "BioPortal" - config.analytics_app_version = "1.0.0" - config.analytics_start_date = "2013-10-01" - config.analytics_filter_str = "ga:networkLocation!@stanford;ga:networkLocation!@amazon" + # Google Analytics GA4 config + config.analytics_path_to_key_file = "config/your_analytics_key.json" + config.analytics_property_id = "123456789" + # path to the Universal Analytics data, which stopped collecting on June 1st, 2023 + config.analytics_path_to_ua_data_file = 
"data/your_ua_data.json" + # path to the file that will hold your Google Analytics data + # this is in addition to storing it in Redis + config.analytics_path_to_ga_data_file = "data/your_ga_data.json" # this is a Base64.encode64 encoded personal access token # you need to run Base64.decode64 on it before using it in your code diff --git a/config/config.test.rb b/config/config.test.rb index 97eaf1f7..84a621ac 100644 --- a/config/config.test.rb +++ b/config/config.test.rb @@ -1,49 +1,69 @@ # This file is designed to be used for unit testing with docker-compose -# -GOO_PATH_QUERY = ENV.include?('GOO_PATH_QUERY') ? ENV['GOO_PATH_QUERY'] : '/sparql/' -GOO_PATH_DATA = ENV.include?('GOO_PATH_DATA') ? ENV['GOO_PATH_DATA'] : '/data/' -GOO_PATH_UPDATE = ENV.include?('GOO_PATH_UPDATE') ? ENV['GOO_PATH_UPDATE'] : '/update/' -GOO_BACKEND_NAME = ENV.include?('GOO_BACKEND_NAME') ? ENV['GOO_BACKEND_NAME'] : 'localhost' -GOO_PORT = ENV.include?('GOO_PORT') ? ENV['GOO_PORT'] : 9000 -GOO_HOST = ENV.include?('GOO_HOST') ? ENV['GOO_HOST'] : 'localhost' -SOLR_HOST = ENV.include?('SOLR_HOST') ? ENV['SOLR_HOST'] : 'localhost' -REDIS_HOST = ENV.include?('REDIS_HOST') ? ENV['REDIS_HOST'] : 'localhost' -REDIS_PORT = ENV.include?('REDIS_PORT') ? ENV['REDIS_PORT'] : 6379 -MGREP_HOST = ENV.include?('MGREP_HOST') ? ENV['MGREP_HOST'] : 'localhost' -MGREP_PORT = ENV.include?('MGREP_PORT') ? ENV['MGREP_PORT'] : 55555 + +GOO_BACKEND_NAME = ENV.include?("GOO_BACKEND_NAME") ? ENV["GOO_BACKEND_NAME"] : "4store" +GOO_HOST = ENV.include?("GOO_HOST") ? ENV["GOO_HOST"] : "localhost" +GOO_PATH_DATA = ENV.include?("GOO_PATH_DATA") ? ENV["GOO_PATH_DATA"] : "/data/" +GOO_PATH_QUERY = ENV.include?("GOO_PATH_QUERY") ? ENV["GOO_PATH_QUERY"] : "/sparql/" +GOO_PATH_UPDATE = ENV.include?("GOO_PATH_UPDATE") ? ENV["GOO_PATH_UPDATE"] : "/update/" +GOO_PORT = ENV.include?("GOO_PORT") ? ENV["GOO_PORT"] : 9000 +MGREP_HOST = ENV.include?("MGREP_HOST") ? ENV["MGREP_HOST"] : "localhost" +MGREP_PORT = ENV.include?("MGREP_PORT") ? ENV["MGREP_PORT"] : 55555 +MGREP_DICT_PATH = ENV.include?("MGREP_DICT_PATH") ? ENV["MGREP_DICT_PATH"] : "./test/data/dictionary.txt" +REDIS_GOO_CACHE_HOST = ENV.include?("REDIS_GOO_CACHE_HOST") ? ENV["REDIS_GOO_CACHE_HOST"] : "localhost" +REDIS_HTTP_CACHE_HOST = ENV.include?("REDIS_HTTP_CACHE_HOST") ? ENV["REDIS_HTTP_CACHE_HOST"] : "localhost" +REDIS_PERSISTENT_HOST = ENV.include?("REDIS_PERSISTENT_HOST") ? ENV["REDIS_PERSISTENT_HOST"] : "localhost" +REDIS_PORT = ENV.include?("REDIS_PORT") ? ENV["REDIS_PORT"] : 6379 +REPORT_PATH = ENV.include?("REPORT_PATH") ? ENV["REPORT_PATH"] : "./test/tmp/ontologies_report.json" +REPOSITORY_FOLDER = ENV.include?("REPOSITORY_FOLDER") ? ENV["REPOSITORY_FOLDER"] : "./test/data/ontology_files/repo" +REST_URL_PREFIX = ENV.include?("REST_URL_PREFIX") ? ENV["REST_URL_PREFIX"] : "http://localhost:9393" +SOLR_PROP_SEARCH_URL = ENV.include?("SOLR_PROP_SEARCH_URL") ? ENV["SOLR_PROP_SEARCH_URL"] : "http://localhost:8983/solr/prop_search_core1" +SOLR_TERM_SEARCH_URL = ENV.include?("SOLR_TERM_SEARCH_URL") ? 
ENV["SOLR_TERM_SEARCH_URL"] : "http://localhost:8983/solr/term_search_core1" LinkedData.config do |config| + config.goo_backend_name = GOO_BACKEND_NAME.to_s config.goo_host = GOO_HOST.to_s config.goo_port = GOO_PORT.to_i - config.goo_redis_host = REDIS_HOST.to_s + config.goo_path_query = GOO_PATH_QUERY.to_s + config.goo_path_data = GOO_PATH_DATA.to_s + config.goo_path_update = GOO_PATH_UPDATE.to_s + config.goo_redis_host = REDIS_GOO_CACHE_HOST.to_s config.goo_redis_port = REDIS_PORT.to_i - config.http_redis_host = REDIS_HOST.to_s + config.http_redis_host = REDIS_HTTP_CACHE_HOST.to_s config.http_redis_port = REDIS_PORT.to_i - config.ontology_analytics_redis_host = REDIS_HOST.to_s + config.ontology_analytics_redis_host = REDIS_PERSISTENT_HOST.to_s config.ontology_analytics_redis_port = REDIS_PORT.to_i - config.search_server_url = "http://#{SOLR_HOST}:8983/solr/term_search_core1".to_s - config.property_search_server_url = "http://#{SOLR_HOST}:8983/solr/prop_search_core1".to_s + config.repository_folder = REPOSITORY_FOLDER.to_s + config.search_server_url = SOLR_TERM_SEARCH_URL.to_s + config.property_search_server_url = SOLR_PROP_SEARCH_URL.to_s +# config.replace_url_prefix = false +# config.rest_url_prefix = REST_URL_PREFIX.to_s # Email notifications. config.enable_notifications = true - config.email_sender = 'sender@domain.com' # Default sender for emails - config.email_override = 'test@domain.com' # By default, all email gets sent here. Disable with email_override_disable. - config.smtp_host = 'smtp-unencrypted.stanford.edu' + config.email_sender = "sender@domain.com" # Default sender for emails + config.email_override = "test@domain.com" # By default, all email gets sent here. Disable with email_override_disable. + config.smtp_host = "smtp-unencrypted.stanford.edu" config.smtp_user = nil config.smtp_password = nil config.smtp_auth_type = :none - config.smtp_domain = 'localhost.localhost' + config.smtp_domain = "localhost.localhost" end Annotator.config do |config| - config.annotator_redis_host = REDIS_HOST.to_s - config.annotator_redis_port = REDIS_PORT.to_i - config.mgrep_host = MGREP_HOST.to_s - config.mgrep_port = MGREP_PORT.to_i - config.mgrep_dictionary_file = './test/data/dictionary.txt' + config.annotator_redis_host = REDIS_PERSISTENT_HOST.to_s + config.annotator_redis_port = REDIS_PORT.to_i + config.mgrep_host = MGREP_HOST.to_s + config.mgrep_port = MGREP_PORT.to_i + config.mgrep_dictionary_file = MGREP_DICT_PATH.to_s end +# LinkedData::OntologiesAPI.config do |config| +# config.http_redis_host = REDIS_HTTP_CACHE_HOST.to_s +# config.http_redis_port = REDIS_PORT.to_i +# end +# NcboCron.config do |config| - config.redis_host = REDIS_HOST.to_s + config.daemonize = false + config.redis_host = REDIS_PERSISTENT_HOST.to_s config.redis_port = REDIS_PORT.to_i - config.ontology_report_path = './test/ontologies_report.json' + config.ontology_report_path = REPORT_PATH end diff --git a/dip.yml b/dip.yml new file mode 100644 index 00000000..3bbe4444 --- /dev/null +++ b/dip.yml @@ -0,0 +1,54 @@ +version: '7.1' + +# Define default environment variables to pass +# to Docker Compose +#environment: +# RAILS_ENV: development + +compose: + files: + - docker-compose.yml + # project_name: ncbo_cron + +interaction: + # This command spins up a ncbo_cron container with the required dependencies (solr, 4store, etc), + # and opens a terminal within it. 
+ runner: + description: Open a Bash shell within a ncbo_cron container (with dependencies up) + service: ncbo_cron + command: /bin/bash + + # Run a container without any dependent services + bash: + description: Run an arbitrary script within a container (or open a shell without deps) + service: ncbo_cron + command: /bin/bash + compose_run_options: [ no-deps ] + + # A shortcut to run Bundler commands + bundle: + description: Run Bundler commands within ncbo_cron container (with depencendies up) + service: ncbo_cron + command: bundle + + # A shortcut to run unit tests + test: + description: Run unit tests with 4store triplestore + service: ncbo_cron + command: bundle exec rake test TESTOPTS='-v' + + test-ag: + description: Run unit tests with AllegroGraph triplestore + service: ncbo_cron-agraph + command: bundle exec rake test TESTOPTS='-v' + + 'redis-cli': + description: Run Redis console + service: redis-ut + command: redis-cli -h redis-ut + +#provision: + #- dip compose down --volumes + #- dip compose up -d solr 4store + #- dip bundle install + #- dip bash -c bin/setup diff --git a/docker-compose.yml b/docker-compose.yml new file mode 100644 index 00000000..5f4e9307 --- /dev/null +++ b/docker-compose.yml @@ -0,0 +1,139 @@ +x-app: &app + build: + context: . + args: + RUBY_VERSION: '2.7' + # Increase the version number in the image tag every time Dockerfile or its arguments is changed + image: ncbo_cron:0.0.2 + environment: &env + BUNDLE_PATH: /srv/ontoportal/bundle + # default bundle config resolves to /usr/local/bundle/config inside of the container + # we are setting it to local app directory if we need to use 'bundle config local' + BUNDLE_APP_CONFIG: /srv/ontoportal/ncbo_cron/.bundle + COVERAGE: 'true' + GOO_REDIS_HOST: redis-ut + REDIS_GOO_CACHE_HOST: redis-ut + REDIS_HTTP_CACHE_HOST: redis-ut + REDIS_PERSISTENT_HOST: redis-ut + REDIS_PORT: 6379 + SOLR_TERM_SEARCH_URL: http://solr-ut:8983/solr/term_search_core1 + SOLR_PROP_SEARCH_URL: http://solr-ut:8983/solr/prop_search_core1 + MGREP_HOST: mgrep-ut + MGREP_PORT: 55556 + stdin_open: true + tty: true + command: "bundle exec rackup -o 0.0.0.0 --port 9393" + volumes: + # bundle volume for hosting gems installed by bundle; it helps in local development with gem udpates + - bundle:/srv/ontoportal/bundle + # ncbo_cron code + - .:/srv/ontoportal/ncbo_cron + # mount directory containing development version of the gems if you need to use 'bundle config local' + #- /Users/alexskr/ontoportal:/Users/alexskr/ontoportal + depends_on: &depends_on + solr-ut: + condition: service_healthy + redis-ut: + condition: service_healthy + mgrep-ut: + condition: service_healthy + +services: + ncbo_cron: + <<: *app + environment: + <<: *env + GOO_BACKEND_NAME: 4store + GOO_PORT: 9000 + GOO_HOST: 4store-ut + GOO_PATH_QUERY: /sparql/ + GOO_PATH_DATA: /data/ + GOO_PATH_UPDATE: /update/ + profiles: + - 4store + depends_on: + <<: *depends_on + 4store-ut: + condition: service_started + + ncbo_cron-agraph: + <<: *app + environment: + <<: *env + GOO_BACKEND_NAME: ag + GOO_PORT: 10035 + GOO_HOST: agraph-ut + GOO_PATH_QUERY: /repositories/bioportal_test + GOO_PATH_DATA: /repositories/bioportal_test/statements + GOO_PATH_UPDATE: /repositories/bioportal_test/statements + profiles: + - agraph + depends_on: + <<: *depends_on + agraph-ut: + condition: service_healthy + + redis-ut: + image: redis + healthcheck: + test: redis-cli ping + interval: 10s + timeout: 3s + retries: 10 + + 4store-ut: + image: bde2020/4store + platform: linux/amd64 + #volume: fourstore:/var/lib/4store 
+ command: > + bash -c "4s-backend-setup --segments 4 ontoportal_kb + && 4s-backend ontoportal_kb + && 4s-httpd -D -s-1 -p 9000 ontoportal_kb" + profiles: + - 4store + + solr-ut: + image: ontoportal/solr-ut:0.0.2 + healthcheck: + test: ["CMD-SHELL", "curl -sf http://localhost:8983/solr/term_search_core1/admin/ping?wt=json | grep -iq '\"status\":\"OK\"}' || exit 1"] + start_period: 3s + interval: 10s + timeout: 5s + retries: 5 + + mgrep-ut: + image: ontoportal/mgrep:0.0.2 + platform: linux/amd64 + healthcheck: + test: ["CMD", "nc", "-z", "-v", "localhost", "55556"] + start_period: 3s + interval: 10s + timeout: 5s + retries: 5 + + agraph-ut: + image: franzinc/agraph:v8.0.0 + platform: linux/amd64 + environment: + - AGRAPH_SUPER_USER=test + - AGRAPH_SUPER_PASSWORD=xyzzy + shm_size: 1g + # ports: + # - 10035:10035 + command: > + bash -c "/agraph/bin/agraph-control --config /agraph/etc/agraph.cfg start + ; agtool repos create bioportal_test + ; agtool users add anonymous + ; agtool users grant anonymous root:bioportal_test:rw + ; tail -f /agraph/data/agraph.log" + healthcheck: + test: ["CMD-SHELL", "agtool storage-report bioportal_test || exit 1"] + start_period: 20s + interval: 60s + timeout: 5s + retries: 3 + profiles: + - agraph + +volumes: + bundle: diff --git a/lib/ncbo_cron.rb b/lib/ncbo_cron.rb index 309b15db..884e6b33 100644 --- a/lib/ncbo_cron.rb +++ b/lib/ncbo_cron.rb @@ -6,6 +6,7 @@ require 'ncbo_annotator' require_relative 'ncbo_cron/config' require_relative 'ncbo_cron/ontology_submission_parser' +require_relative 'ncbo_cron/ontology_submission_eradicator' require_relative 'ncbo_cron/ontology_pull' require_relative 'ncbo_cron/scheduler' require_relative 'ncbo_cron/query_caching' diff --git a/lib/ncbo_cron/config.rb b/lib/ncbo_cron/config.rb index 49db0fb4..6d3db51e 100644 --- a/lib/ncbo_cron/config.rb +++ b/lib/ncbo_cron/config.rb @@ -40,16 +40,8 @@ def config(&block) @settings.enable_spam_deletion ||= true # enable update check (vor VMs) @settings.enable_update_check ||= true - - - - # enable mgrep dictionary generation job - @settings.enable_dictionary_generation ||= true - - - - + @settings.enable_dictionary_generation_cron_job ||= false # UMLS auto-pull @settings.pull_umls_url ||= "" @@ -85,17 +77,9 @@ def config(&block) @settings.cron_obofoundry_sync ||= "0 8 * * 1,2,3,4,5" # 00 3 * * * - run daily at 3:00AM @settings.cron_update_check ||= "00 3 * * *" - - - - # mgrep dictionary generation schedule # 30 3 * * * - run daily at 3:30AM - @settings.cron_dictionary_generation ||= "30 3 * * *" - - - - + @settings.cron_dictionary_generation_cron_job ||= "30 3 * * *" @settings.log_level ||= :info unless (@settings.log_path && File.exists?(@settings.log_path)) diff --git a/lib/ncbo_cron/ontologies_report.rb b/lib/ncbo_cron/ontologies_report.rb index 43f0505f..99463a0a 100644 --- a/lib/ncbo_cron/ontologies_report.rb +++ b/lib/ncbo_cron/ontologies_report.rb @@ -345,7 +345,7 @@ def good_classes(submission, report) page_size = 1000 classes_size = 10 good_classes = Array.new - paging = LinkedData::Models::Class.in(submission).include(:prefLabel, :synonym, metrics: :classes).page(page_num, page_size) + paging = LinkedData::Models::Class.in(submission).include(:prefLabel, :synonym, submission: [metrics: :classes]).page(page_num, page_size) cls_count = submission.class_count(@logger).to_i # prevent a COUNT SPARQL query if possible paging.page_count_set(cls_count) if cls_count > -1 diff --git a/lib/ncbo_cron/ontology_analytics.rb b/lib/ncbo_cron/ontology_analytics.rb index e06fcd77..c5a4de00 
100644 --- a/lib/ncbo_cron/ontology_analytics.rb +++ b/lib/ncbo_cron/ontology_analytics.rb @@ -1,117 +1,223 @@ require 'logger' -require 'google/apis/analytics_v3' -require 'google/api_client/auth/key_utils' +require 'json' +require 'benchmark' +require 'google/analytics/data' + module NcboCron module Models class OntologyAnalytics - ONTOLOGY_ANALYTICS_REDIS_FIELD = "ontology_analytics" + ONTOLOGY_ANALYTICS_REDIS_FIELD = 'ontology_analytics' + UA_START_DATE = '2013-10-01' + GA4_START_DATE = '2023-06-01' def initialize(logger) @logger = logger end def run - redis = Redis.new(:host => NcboCron.settings.redis_host, :port => NcboCron.settings.redis_port) + redis = Redis.new(:host => LinkedData.settings.ontology_analytics_redis_host, :port => LinkedData.settings.ontology_analytics_redis_port) ontology_analytics = fetch_ontology_analytics + File.open(NcboCron.settings.analytics_path_to_ga_data_file, 'w') do |f| + f.write(ontology_analytics.to_json) + end redis.set(ONTOLOGY_ANALYTICS_REDIS_FIELD, Marshal.dump(ontology_analytics)) end def fetch_ontology_analytics - google_client = authenticate_google - aggregated_results = Hash.new - start_year = Date.parse(NcboCron.settings.analytics_start_date).year || 2013 - ont_acronyms = LinkedData::Models::Ontology.where.include(:acronym).all.map {|o| o.acronym} - # ont_acronyms = ["NCIT", "ONTOMA", "CMPO", "AEO", "SNOMEDCT"] - filter_str = (NcboCron.settings.analytics_filter_str.nil? || NcboCron.settings.analytics_filter_str.empty?) ? "" : ";#{NcboCron.settings.analytics_filter_str}" - - ont_acronyms.each do |acronym| + @logger.info "Starting Google Analytics refresh..." + @logger.flush + full_data = nil + + time = Benchmark.realtime do max_results = 10000 - num_results = 10000 - start_index = 1 - results = nil - - loop do - results = google_client.get_ga_data( - ids = NcboCron.settings.analytics_profile_id, - start_date = NcboCron.settings.analytics_start_date, - end_date = Date.today.to_s, - metrics = 'ga:pageviews', - { - dimensions: 'ga:pagePath,ga:year,ga:month', - filters: "ga:pagePath=~^(\\/ontologies\\/#{acronym})(\\/?\\?{0}|\\/?\\?{1}.*)$#{filter_str}", - start_index: start_index, - max_results: max_results - } - ) - results.rows ||= [] - start_index += max_results - num_results = results.rows.length - @logger.info "Acronym: #{acronym}, Results: #{num_results}, Start Index: #{start_index}" - @logger.flush - - results.rows.each do |row| - if aggregated_results.has_key?(acronym) - # year - if aggregated_results[acronym].has_key?(row[1].to_i) - # month - if aggregated_results[acronym][row[1].to_i].has_key?(row[2].to_i) - aggregated_results[acronym][row[1].to_i][row[2].to_i] += row[3].to_i + aggregated_results = Hash.new + + @logger.info "Fetching all ontology acronyms from backend..." + @logger.flush + ont_acronyms = LinkedData::Models::Ontology.where.include(:acronym).all.map {|o| o.acronym} + # ont_acronyms = ["NCIT", "SNOMEDCT", "MEDDRA"] + @logger.info "Authenticating with the Google Analytics Endpoint..." 
+ @logger.flush + google_client = authenticate_google + + date_range = Google::Analytics::Data::V1beta::DateRange.new( + start_date: GA4_START_DATE, + end_date: Date.today.to_s + ) + metrics_page_views = Google::Analytics::Data::V1beta::Metric.new( + name: "screenPageViews" + ) + dimension_path = Google::Analytics::Data::V1beta::Dimension.new( + name: "pagePath" + ) + dimension_year = Google::Analytics::Data::V1beta::Dimension.new( + name: "year" + ) + dimension_month = Google::Analytics::Data::V1beta::Dimension.new( + name: "month" + ) + string_filter = Google::Analytics::Data::V1beta::Filter::StringFilter.new( + match_type: Google::Analytics::Data::V1beta::Filter::StringFilter::MatchType::FULL_REGEXP + ) + filter = Google::Analytics::Data::V1beta::Filter.new( + field_name: "pagePath", + string_filter: string_filter + ) + filter_expression = Google::Analytics::Data::V1beta::FilterExpression.new( + filter: filter + ) + order_year = Google::Analytics::Data::V1beta::OrderBy::DimensionOrderBy.new( + dimension_name: "year" + ) + orderby_year = Google::Analytics::Data::V1beta::OrderBy.new( + desc: false, + dimension: order_year + ) + order_month = Google::Analytics::Data::V1beta::OrderBy::DimensionOrderBy.new( + dimension_name: "month" + ) + orderby_month = Google::Analytics::Data::V1beta::OrderBy.new( + desc: false, + dimension: order_month + ) + @logger.info "Fetching GA4 analytics for all ontologies..." + @logger.flush + + ont_acronyms.each do |acronym| + start_index = 0 + string_filter.value = "^(\\/ontologies\\/#{acronym})(\\/?\\?{0}|\\/?\\?{1}.*)$" + + loop do + request = Google::Analytics::Data::V1beta::RunReportRequest.new( + property: "properties/#{NcboCron.settings.analytics_property_id}", + metrics: [metrics_page_views], + dimension_filter: filter_expression, + dimensions: [dimension_path, dimension_year, dimension_month], + date_ranges: [date_range], + order_bys: [orderby_year, orderby_month], + offset: start_index, + limit: max_results + ) + response = google_client.run_report request + + response.rows ||= [] + start_index += max_results + num_results = response.rows.length + @logger.info "Acronym: #{acronym}, Results: #{num_results}, Start Index: #{start_index}" + @logger.flush + + response.rows.each do |row| + row_h = row.to_h + year_month_hits = row_h[:dimension_values].map.with_index { + |v, i| i > 0 ? 
v[:value].to_i.to_s : row_h[:metric_values][0][:value].to_i + }.rotate(1) + + if aggregated_results.has_key?(acronym) + # year + if aggregated_results[acronym].has_key?(year_month_hits[0]) + # month + if aggregated_results[acronym][year_month_hits[0]].has_key?(year_month_hits[1]) + aggregated_results[acronym][year_month_hits[0]][year_month_hits[1]] += year_month_hits[2] + else + aggregated_results[acronym][year_month_hits[0]][year_month_hits[1]] = year_month_hits[2] + end else - aggregated_results[acronym][row[1].to_i][row[2].to_i] = row[3].to_i + aggregated_results[acronym][year_month_hits[0]] = Hash.new + aggregated_results[acronym][year_month_hits[0]][year_month_hits[1]] = year_month_hits[2] end else - aggregated_results[acronym][row[1].to_i] = Hash.new - aggregated_results[acronym][row[1].to_i][row[2].to_i] = row[3].to_i + aggregated_results[acronym] = Hash.new + aggregated_results[acronym][year_month_hits[0]] = Hash.new + aggregated_results[acronym][year_month_hits[0]][year_month_hits[1]] = year_month_hits[2] end - else - aggregated_results[acronym] = Hash.new - aggregated_results[acronym][row[1].to_i] = Hash.new - aggregated_results[acronym][row[1].to_i][row[2].to_i] = row[3].to_i end - end + break if num_results < max_results + end # loop + end # ont_acronyms + @logger.info "Refresh complete" + @logger.flush + full_data = merge_and_fill_missing_data(aggregated_results) + end # Benchmark.realtime + @logger.info "Completed Google Analytics refresh in #{(time/60).round(1)} minutes." + @logger.flush + full_data + end - if num_results < max_results - # fill up non existent years - (start_year..Date.today.year).each do |y| - aggregated_results[acronym] = Hash.new if aggregated_results[acronym].nil? - aggregated_results[acronym][y] = Hash.new unless aggregated_results[acronym].has_key?(y) + def merge_and_fill_missing_data(ga4_data) + ua_data = {} + + if File.exists?(NcboCron.settings.analytics_path_to_ua_data_file) && + !File.zero?(NcboCron.settings.analytics_path_to_ua_data_file) + @logger.info "Merging GA4 and UA data..." + @logger.flush + ua_data_file = File.read(NcboCron.settings.analytics_path_to_ua_data_file) + ua_data = JSON.parse(ua_data_file) + ua_ga4_intersecting_year = Date.parse(GA4_START_DATE).year.to_s + ua_ga4_intersecting_month = Date.parse(GA4_START_DATE).month.to_s + + # add up hits for June of 2023 (the only intersecting month between UA and GA4) + ua_data.each do |acronym, _| + if ga4_data.has_key?(acronym) + if ga4_data[acronym][ua_ga4_intersecting_year].has_key?(ua_ga4_intersecting_month) + ua_data[acronym][ua_ga4_intersecting_year][ua_ga4_intersecting_month] += + ga4_data[acronym][ua_ga4_intersecting_year][ua_ga4_intersecting_month] + # delete data for June of 2023 from ga4_data to avoid overwriting when merging + ga4_data[acronym][ua_ga4_intersecting_year].delete(ua_ga4_intersecting_month) end - # fill up non existent months with zeros - (1..12).each { |n| aggregated_results[acronym].values.each { |v| v[n] = 0 unless v.has_key?(n) } } - break end end end - @logger.info "Completed ontology analytics refresh..." + # merge ua and ga4 data + merged_data = ua_data.deep_merge(ga4_data) + # fill missing years and months + @logger.info "Filling in missing years data..." @logger.flush + fill_missing_data(merged_data) + # sort acronyms, years and months + @logger.info "Sorting final data..." 
+ @logger.flush + sort_ga_data(merged_data) + end + + def fill_missing_data(ga_data) + # fill up non existent years + start_year = Date.parse(UA_START_DATE).year + + ga_data.each do |acronym, _| + (start_year..Date.today.year).each do |y| + ga_data[acronym] = Hash.new if ga_data[acronym].nil? + ga_data[acronym][y.to_s] = Hash.new unless ga_data[acronym].has_key?(y.to_s) + end + # fill up non existent months with zeros + (1..12).each { |n| ga_data[acronym].values.each { |v| v[n.to_s] = 0 unless v.has_key?(n.to_s) } } + end + end - aggregated_results + def sort_ga_data(ga_data) + ga_data.transform_values { |value| + value.transform_values { |val| + val.sort_by { |key, _| key.to_i }.to_h + }.sort_by { |k, _| k.to_i }.to_h + }.sort.to_h end def authenticate_google - Google::Apis::ClientOptions.default.application_name = NcboCron.settings.analytics_app_name - Google::Apis::ClientOptions.default.application_version = NcboCron.settings.analytics_app_version - # enable google api call retries in order to - # minigate analytics processing failure due to ocasional google api timeouts and other outages - Google::Apis::RequestOptions.default.retries = 5 - # uncoment to enable logging for debugging purposes - # Google::Apis.logger.level = Logger::DEBUG - # Google::Apis.logger = @logger - client = Google::Apis::AnalyticsV3::AnalyticsService.new - key = Google::APIClient::KeyUtils::load_from_pkcs12(NcboCron.settings.analytics_path_to_key_file, 'notasecret') - client.authorization = Signet::OAuth2::Client.new( - :token_credential_uri => 'https://accounts.google.com/o/oauth2/token', - :audience => 'https://accounts.google.com/o/oauth2/token', - :scope => 'https://www.googleapis.com/auth/analytics.readonly', - :issuer => NcboCron.settings.analytics_service_account_email_address, - :signing_key => key - ).tap { |auth| auth.fetch_access_token! } - client + Google::Analytics::Data.analytics_data do |config| + config.credentials = NcboCron.settings.analytics_path_to_key_file + end end - end + end # class + + end +end + +class ::Hash + def deep_merge(second) + merger = proc { |key, v1, v2| Hash === v1 && Hash === v2 ? 
v1.merge(v2, &merger) : v2 } + self.merge(second, &merger) end end @@ -120,7 +226,8 @@ def authenticate_google # require 'ncbo_annotator' # require 'ncbo_cron/config' # require_relative '../../config/config' -# ontology_analytics_log_path = File.join("logs", "ontology-analytics.log") -# ontology_analytics_logger = Logger.new(ontology_analytics_log_path) +# # ontology_analytics_log_path = File.join("logs", "ontology-analytics.log") +# # ontology_analytics_logger = Logger.new(ontology_analytics_log_path) +# ontology_analytics_logger = Logger.new(STDOUT) # NcboCron::Models::OntologyAnalytics.new(ontology_analytics_logger).run # ./bin/ncbo_cron --disable-processing true --disable-pull true --disable-flush true --disable-warmq true --disable-ontologies-report true --disable-mapping-counts true --disable-spam-deletion true --ontology-analytics '14 * * * *' diff --git a/lib/ncbo_cron/ontology_helper.rb b/lib/ncbo_cron/ontology_helper.rb new file mode 100644 index 00000000..42534768 --- /dev/null +++ b/lib/ncbo_cron/ontology_helper.rb @@ -0,0 +1,185 @@ +require 'logger' + +module NcboCron + module Helpers + module OntologyHelper + + REDIS_SUBMISSION_ID_PREFIX = "sub:" + PROCESS_QUEUE_HOLDER = "parseQueue" + PROCESS_ACTIONS = { + :process_rdf => true, + :generate_labels => true, + :index_search => true, + :index_properties => true, + :run_metrics => true, + :process_annotator => true, + :diff => true, + :remote_pull => false + } + + class RemoteFileException < StandardError + attr_reader :submission + + def initialize(submission) + super + @submission = submission + end + end + + def self.do_ontology_pull(ontology_acronym, enable_pull_umls = false, umls_download_url = '', logger = nil, + add_to_queue = true) + logger ||= Logger.new($stdout) + ont = LinkedData::Models::Ontology.find(ontology_acronym).include(:acronym).first + new_submission = nil + raise StandardError, "Ontology #{ontology_acronym} not found" if ont.nil? + + last = ont.latest_submission(status: :any) + raise StandardError, "No submission found for #{ontology_acronym}" if last.nil? + + last.bring(:hasOntologyLanguage) if last.bring?(:hasOntologyLanguage) + if !enable_pull_umls && last.hasOntologyLanguage.umls? + raise StandardError, "Pull umls not enabled" + end + + last.bring(:pullLocation) if last.bring?(:pullLocation) + raise StandardError, "#{ontology_acronym} has no pullLocation" if last.pullLocation.nil? + + last.bring(:uploadFilePath) if last.bring?(:uploadFilePath) + + if last.hasOntologyLanguage.umls? && umls_download_url && !umls_download_url.empty? 
+ last.pullLocation = RDF::URI.new(umls_download_url + last.pullLocation.split("/")[-1]) + logger.info("Using alternative download for umls #{last.pullLocation.to_s}") + logger.flush + end + + if last.remote_file_exists?(last.pullLocation.to_s) + logger.info "Checking download for #{ont.acronym}" + logger.info "Location: #{last.pullLocation.to_s}"; logger.flush + file, filename = last.download_ontology_file + file, md5local, md5remote, new_file_exists = self.new_file_exists?(file, last) + + if new_file_exists + logger.info "New file found for #{ont.acronym}\nold: #{md5local}\nnew: #{md5remote}" + logger.flush() + new_submission = self.create_submission(ont, last, file, filename, logger, add_to_queue) + else + logger.info "There is no new file found for #{ont.acronym}" + logger.flush() + end + + file.close + new_submission + else + raise self::RemoteFileException.new(last) + end + end + + def self.create_submission(ont, sub, file, filename, logger = nil, add_to_queue = true, new_version = nil, + new_released = nil) + logger ||= Kernel.const_defined?("LOGGER") ? Kernel.const_get("LOGGER") : Logger.new(STDOUT) + new_sub = LinkedData::Models::OntologySubmission.new + + sub.bring_remaining + sub.loaded_attributes.each do |attr| + new_sub.send("#{attr}=", sub.send(attr)) + end + + submission_id = ont.next_submission_id() + new_sub.submissionId = submission_id + file_location = LinkedData::Models::OntologySubmission.copy_file_repository(ont.acronym, submission_id, file, filename) + new_sub.uploadFilePath = file_location + + unless new_version.nil? + new_sub.version = new_version + end + + if new_released.nil? + new_sub.released = DateTime.now + else + new_sub.released = DateTime.parse(new_released) + end + new_sub.submissionStatus = nil + new_sub.creationDate = nil + new_sub.missingImports = nil + new_sub.metrics = nil + full_file_path = File.expand_path(file_location) + + # check if OWLAPI is able to parse the file before creating a new submission + owlapi = LinkedData::Parser::OWLAPICommand.new( + full_file_path, + File.expand_path(new_sub.data_folder.to_s), + logger: logger) + owlapi.disable_reasoner + parsable = true + + begin + owlapi.parse + rescue Exception => e + logger.error("The new file for ontology #{ont.acronym}, submission id: #{submission_id} did not clear OWLAPI: #{e.class}: #{e.message}\n#{e.backtrace.join("\n\t")}") + logger.error("A new submission has NOT been created.") + logger.flush + parsable = false + end + + if parsable + if new_sub.valid? + new_sub.save() + + if add_to_queue + self.queue_submission(new_sub, { all: true }) + logger.info("OntologyPull created a new submission (#{submission_id}) for ontology #{ont.acronym}") + end + else + logger.error("Unable to create a new submission for ontology #{ont.acronym} with id #{submission_id}: #{new_sub.errors}") + logger.flush + end + else + # delete the bad file + File.delete full_file_path if File.exist? full_file_path + end + new_sub + end + + def self.queue_submission(submission, actions={:all => true}) + redis = Redis.new(:host => NcboCron.settings.redis_host, :port => NcboCron.settings.redis_port) + + if actions[:all] + actions = PROCESS_ACTIONS.dup + else + actions.delete_if {|k, v| !PROCESS_ACTIONS.has_key?(k)} + end + actionStr = MultiJson.dump(actions) + redis.hset(PROCESS_QUEUE_HOLDER, get_prefixed_id(submission.id), actionStr) unless actions.empty? 
+ end + + def self.get_prefixed_id(id) + "#{REDIS_SUBMISSION_ID_PREFIX}#{id}" + end + + def self.last_fragment_of_uri(uri) + uri.to_s.split("/")[-1] + end + + def self.acronym_from_submission_id(submissionID) + submissionID.to_s.split("/")[-3] + end + + def self.new_file_exists?(file, last) + file = File.open(file.path, "rb") + remote_contents = file.read + md5remote = Digest::MD5.hexdigest(remote_contents) + + if last.uploadFilePath && File.exist?(last.uploadFilePath) + file_contents = open(last.uploadFilePath) { |f| f.read } + md5local = Digest::MD5.hexdigest(file_contents) + new_file_exists = (not md5remote.eql?(md5local)) + else + # There is no existing file, so let's create a submission with the downloaded one + new_file_exists = true + end + return file, md5local, md5remote, new_file_exists + end + + end + end +end \ No newline at end of file diff --git a/lib/ncbo_cron/ontology_pull.rb b/lib/ncbo_cron/ontology_pull.rb index ac6da70e..c554c95e 100644 --- a/lib/ncbo_cron/ontology_pull.rb +++ b/lib/ncbo_cron/ontology_pull.rb @@ -1,18 +1,11 @@ -require 'open-uri' require 'logger' -require_relative 'ontology_submission_parser' +require_relative 'ontology_helper' module NcboCron module Models class OntologyPull - class RemoteFileException < StandardError - end - - def initialize() - end - def do_remote_ontology_pull(options = {}) logger = options[:logger] || Logger.new($stdout) logger.info "UMLS auto-pull #{options[:enable_pull_umls] == true}" @@ -23,65 +16,26 @@ def do_remote_ontology_pull(options = {}) ontologies.select! { |ont| ont_to_include.include?(ont.acronym) } unless ont_to_include.empty? enable_pull_umls = options[:enable_pull_umls] umls_download_url = options[:pull_umls_url] - ontologies.sort! {|a, b| a.acronym.downcase <=> b.acronym.downcase} + ontologies.sort! { |a, b| a.acronym.downcase <=> b.acronym.downcase } new_submissions = [] ontologies.each do |ont| begin - last = ont.latest_submission(status: :any) - next if last.nil? - last.bring(:hasOntologyLanguage) if last.bring?(:hasOntologyLanguage) - if !enable_pull_umls && last.hasOntologyLanguage.umls? - next - end - last.bring(:pullLocation) if last.bring?(:pullLocation) - next if last.pullLocation.nil? - last.bring(:uploadFilePath) if last.bring?(:uploadFilePath) - - if last.hasOntologyLanguage.umls? && umls_download_url - last.pullLocation= RDF::URI.new(umls_download_url + last.pullLocation.split("/")[-1]) - logger.info("Using alternative download for umls #{last.pullLocation.to_s}") + begin + new_submissions << NcboCron::Helpers::OntologyHelper.do_ontology_pull(ont.acronym, + enable_pull_umls: enable_pull_umls, + umls_download_url: umls_download_url, + logger: logger, add_to_queue: true) + rescue NcboCron::Helpers::OntologyHelper::RemoteFileException => error + logger.info "RemoteFileException: No submission file at pull location #{error.submission.pullLocation.to_s} for ontology #{ont.acronym}." 
logger.flush + LinkedData::Utils::Notifications.remote_ontology_pull(error.submission) end - - if last.remote_file_exists?(last.pullLocation.to_s) - logger.info "Checking download for #{ont.acronym}" - logger.info "Location: #{last.pullLocation.to_s}"; logger.flush - file, filename = last.download_ontology_file() - file = File.open(file.path, "rb") - remote_contents = file.read - md5remote = Digest::MD5.hexdigest(remote_contents) - - if last.uploadFilePath && File.exist?(last.uploadFilePath) - file_contents = open(last.uploadFilePath) { |f| f.read } - md5local = Digest::MD5.hexdigest(file_contents) - new_file_exists = (not md5remote.eql?(md5local)) - else - # There is no existing file, so let's create a submission with the downloaded one - new_file_exists = true - end - - if new_file_exists - logger.info "New file found for #{ont.acronym}\nold: #{md5local}\nnew: #{md5remote}" - logger.flush() - new_submissions << create_submission(ont, last, file, filename, logger) - end - - file.close - else - begin - raise RemoteFileException - rescue RemoteFileException - logger.info "RemoteFileException: No submission file at pull location #{last.pullLocation.to_s} for ontology #{ont.acronym}." - logger.flush - LinkedData::Utils::Notifications.remote_ontology_pull(last) - end - end - rescue Exception => e - logger.error "Problem retrieving #{ont.acronym} in OntologyPull:\n" + e.message + "\n" + e.backtrace.join("\n\t") - logger.flush() - next end + rescue Exception => e + logger.error "Problem retrieving #{ont.acronym} in OntologyPull:\n" + e.message + "\n" + e.backtrace.join("\n\t") + logger.flush() + next end if options[:cache_clear] == true @@ -93,70 +47,7 @@ def do_remote_ontology_pull(options = {}) new_submissions end - def create_submission(ont, sub, file, filename, logger=nil, - add_to_pull=true,new_version=nil,new_released=nil) - logger ||= Kernel.const_defined?("LOGGER") ? Kernel.const_get("LOGGER") : Logger.new(STDOUT) - new_sub = LinkedData::Models::OntologySubmission.new - - sub.bring_remaining - sub.loaded_attributes.each do |attr| - new_sub.send("#{attr}=", sub.send(attr)) - end - - submission_id = ont.next_submission_id() - new_sub.submissionId = submission_id - file_location = LinkedData::Models::OntologySubmission.copy_file_repository(ont.acronym, submission_id, file, filename) - new_sub.uploadFilePath = file_location - unless new_version.nil? - new_sub.version = new_version - end - if new_released.nil? - new_sub.released = DateTime.now - else - new_sub.released = DateTime.parse(new_released) - end - new_sub.submissionStatus = nil - new_sub.creationDate = nil - new_sub.missingImports = nil - new_sub.metrics = nil - full_file_path = File.expand_path(file_location) - - # check if OWLAPI is able to parse the file before creating a new submission - owlapi = LinkedData::Parser::OWLAPICommand.new( - full_file_path, - File.expand_path(new_sub.data_folder.to_s), - logger: logger) - owlapi.disable_reasoner - parsable = true - - begin - owlapi.parse - rescue Exception => e - logger.error("The new file for ontology #{ont.acronym}, submission id: #{submission_id} did not clear OWLAPI: #{e.class}: #{e.message}\n#{e.backtrace.join("\n\t")}") - logger.error("A new submission has NOT been created.") - logger.flush - parsable = false - end - - if parsable - if new_sub.valid? 
- new_sub.save() - - if add_to_pull - submission_queue = NcboCron::Models::OntologySubmissionParser.new - submission_queue.queue_submission(new_sub, {all: true}) - logger.info("OntologyPull created a new submission (#{submission_id}) for ontology #{ont.acronym}") - end - else - logger.error("Unable to create a new submission in OntologyPull: #{new_sub.errors}") - logger.flush - end - else - # delete the bad file - File.delete full_file_path if File.exist? full_file_path - end - new_sub - end + private def redis_goo Redis.new(host: LinkedData.settings.goo_redis_host, port: LinkedData.settings.goo_redis_port, timeout: 30) diff --git a/lib/ncbo_cron/ontology_rank.rb b/lib/ncbo_cron/ontology_rank.rb index b60c2740..64de8844 100644 --- a/lib/ncbo_cron/ontology_rank.rb +++ b/lib/ncbo_cron/ontology_rank.rb @@ -1,5 +1,6 @@ require 'logger' require 'benchmark' +require_relative 'ontology_helper' module NcboCron module Models @@ -66,7 +67,7 @@ def umls_scores(ontologies) ontologies.each do |ont| if ont.group && !ont.group.empty? - umls_gr = ont.group.select {|gr| acronym_from_id(gr.id.to_s).include?('UMLS')} + umls_gr = ont.group.select {|gr| NcboCron::Helpers::OntologyHelper.last_fragment_of_uri(gr.id.to_s).include?('UMLS')} scores[ont.acronym] = umls_gr.empty? ? 0 : 1 else scores[ont.acronym] = 0 @@ -75,10 +76,6 @@ def umls_scores(ontologies) scores end - def acronym_from_id(id) - id.to_s.split("/")[-1] - end - def normalize(x, xmin, xmax, ymin, ymax) xrange = xmax - xmin yrange = ymax - ymin diff --git a/lib/ncbo_cron/ontology_submission_eradicator.rb b/lib/ncbo_cron/ontology_submission_eradicator.rb new file mode 100644 index 00000000..40f8ef4d --- /dev/null +++ b/lib/ncbo_cron/ontology_submission_eradicator.rb @@ -0,0 +1,39 @@ +module NcboCron + module Models + + class OntologySubmissionEradicator + class RemoveSubmissionFileException < StandardError + end + + class RemoveSubmissionDataException < StandardError + end + + class RemoveNotArchivedSubmissionException < StandardError + end + + def initialize() + end + + def eradicate(submission, force=false) + submission.bring(:submissionStatus) if submission.bring?(:submissionStatus) + if submission.archived? || force + delete_submission_data submission + else
+ raise RemoveNotArchivedSubmissionException, "Submission #{submission.submissionId} is not an archived submission" + end + + end + + private + def delete_submission_data(submission) + begin + submission.delete + rescue Exception => e + raise RemoveSubmissionDataException, e.message + end + end + + + end + end +end diff --git a/lib/ncbo_cron/ontology_submission_parser.rb b/lib/ncbo_cron/ontology_submission_parser.rb index fe7a3e06..8d33f89d 100644 --- a/lib/ncbo_cron/ontology_submission_parser.rb +++ b/lib/ncbo_cron/ontology_submission_parser.rb @@ -1,38 +1,22 @@ require 'multi_json' +require_relative 'ontology_helper' module NcboCron module Models class OntologySubmissionParser - QUEUE_HOLDER = "parseQueue" - IDPREFIX = "sub:" - - ACTIONS = { - :process_rdf => true, - :index_search => true, - :index_properties => true, - :run_metrics => true, - :process_annotator => true, - :diff => true - } + QUEUE_HOLDER = NcboCron::Helpers::OntologyHelper::PROCESS_QUEUE_HOLDER + ACTIONS = NcboCron::Helpers::OntologyHelper::PROCESS_ACTIONS def initialize() end - def queue_submission(submission, actions={:all => true}) - redis = Redis.new(:host => NcboCron.settings.redis_host, :port => NcboCron.settings.redis_port) - - if actions[:all] - actions = ACTIONS.dup - else - actions.delete_if {|k, v| !ACTIONS.has_key?(k)} - end - actionStr = MultiJson.dump(actions) - redis.hset(QUEUE_HOLDER, get_prefixed_id(submission.id), actionStr) unless actions.empty? + def queue_submission(submission, actions={ :all => true }) + NcboCron::Helpers::OntologyHelper.queue_submission(submission, actions) end - def process_queue_submissions(options = {}) + def process_queue_submissions(options={}) logger = options[:logger] logger ||= Kernel.const_defined?("LOGGER") ? Kernel.const_get("LOGGER") : Logger.new(STDOUT) redis = Redis.new(:host => NcboCron.settings.redis_host, :port => NcboCron.settings.redis_port) @@ -43,6 +27,20 @@ def process_queue_submissions(options = {}) realKey = process_data[:key] key = process_data[:redis_key] redis.hdel(QUEUE_HOLDER, key) + + # if :remote_pull is one of the actions, pull the ontology and halt if no new submission is found + # if a new submission is found, replace the submission ID with the new one and proceed with + # processing the remaining actions on the new submission + if actions.key?(:remote_pull) && actions[:remote_pull] + acronym = NcboCron::Helpers::OntologyHelper.acronym_from_submission_id(realKey) + new_submission = NcboCron::Helpers::OntologyHelper.do_ontology_pull(acronym, enable_pull_umls: false, + umls_download_url: '', logger: logger, + add_to_queue: false) + return unless new_submission + realKey = new_submission.id.to_s + actions.delete(:remote_pull) + end + begin process_submission(logger, realKey, actions) rescue Exception => e @@ -55,7 +53,7 @@ def process_queue_submissions(options = {}) def queued_items(redis, logger=nil) logger ||= Kernel.const_defined?("LOGGER") ? 
Kernel.const_get("LOGGER") : Logger.new(STDOUT) all = redis.hgetall(QUEUE_HOLDER) - prefix_remove = Regexp.new(/^#{IDPREFIX}/) + prefix_remove = Regexp.new(/^#{NcboCron::Helpers::OntologyHelper::REDIS_SUBMISSION_ID_PREFIX}/) items = [] all.each do |key, val| begin @@ -75,10 +73,6 @@ def queued_items(redis, logger=nil) items end - def get_prefixed_id(id) - "#{IDPREFIX}#{id}" - end - def zombie_classes_graphs query = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o }}" class_graphs = [] @@ -165,7 +159,7 @@ def process_submission(logger, submission_id, actions=ACTIONS) # Check to make sure the file has been downloaded if sub.pullLocation && (!sub.uploadFilePath || !File.exist?(sub.uploadFilePath)) - multi_logger.debug "Pull location found, but no file in the upload file path. Retrying download." + multi_logger.debug "Pull location found (#{sub.pullLocation}), but no file in the upload file path (#{sub.uploadFilePath}). Retrying download." file, filename = sub.download_ontology_file file_location = sub.class.copy_file_repository(sub.ontology.acronym, sub.submissionId, file, filename) file_location = "../" + file_location if file_location.start_with?(".") # relative path fix @@ -190,6 +184,10 @@ def process_submission(logger, submission_id, actions=ACTIONS) end end + def get_prefixed_id(id) + NcboCron::Helpers::OntologyHelper.get_prefixed_id(id) + end + private def archive_old_submissions(logger, sub) @@ -219,10 +217,11 @@ def process_annotator(logger, sub) begin annotator = Annotator::Models::NcboAnnotator.new annotator.create_term_cache_for_submission(logger, sub) - # commenting this action out for now due to a problem with hgetall in redis + # this action only occurs if the CRON dictionary generation job is disabled + # if the CRON dictionary generation job is running, + # the dictionary will NOT be generated on each ontology parsing # see https://github.com/ncbo/ncbo_cron/issues/45 for details - # mgrep dictionary generation will occur as a separate CRON task - # annotator.generate_dictionary_file() + annotator.generate_dictionary_file() unless NcboCron.settings.enable_dictionary_generation_cron_job rescue Exception => e logger.error(e.message + "\n" + e.backtrace.join("\n\t")) logger.flush() diff --git a/lib/ncbo_cron/spam_deletion.rb b/lib/ncbo_cron/spam_deletion.rb index 8db5568b..e2ec64f8 100644 --- a/lib/ncbo_cron/spam_deletion.rb +++ b/lib/ncbo_cron/spam_deletion.rb @@ -25,8 +25,18 @@ def initialize(logger=nil) end def run - auth_token = Base64.decode64(NcboCron.settings.git_repo_access_token) + auth_token = NcboCron.settings.git_repo_access_token res = `curl --header 'Authorization: token #{auth_token}' --header 'Accept: application/vnd.github.v3.raw' --location #{FULL_FILE_PATH}` + + begin + error_json = JSON.parse(res) + msg = "\nError while fetching the SPAM user list from #{FULL_FILE_PATH}: #{error_json}" + @logger.error(msg) + puts msg + exit + rescue JSON::ParserError + @logger.info("Successfully downloaded the SPAM user list from #{FULL_FILE_PATH}") + end usernames = res.split(",").map(&:strip) delete_spam(usernames) end diff --git a/ncbo_cron.gemspec b/ncbo_cron.gemspec index 821881d1..c8faa03d 100644 --- a/ncbo_cron.gemspec +++ b/ncbo_cron.gemspec @@ -8,7 +8,7 @@ Gem::Specification.new do |gem| gem.summary = %q{} gem.homepage = "https://github.com/ncbo/ncbo_cron" - gem.files = `git ls-files`.split($\) + gem.files = Dir['**/*'] gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) } gem.test_files = gem.files.grep(%r{^(test|spec|features)/}) gem.name =
"ncbo_cron" @@ -16,7 +16,7 @@ Gem::Specification.new do |gem| gem.add_dependency("dante") gem.add_dependency("goo") - gem.add_dependency("google-apis-analytics_v3") + gem.add_dependency("google-analytics-data") gem.add_dependency("mlanett-redis-lock") gem.add_dependency("multi_json") gem.add_dependency("ncbo_annotator") diff --git a/rakelib/purl_management.rake b/rakelib/purl_management.rake new file mode 100644 index 00000000..58cfadd7 --- /dev/null +++ b/rakelib/purl_management.rake @@ -0,0 +1,28 @@ +# Task for updating and adding missing purl for all ontologies +# +desc 'Purl Utilities' +namespace :purl do + require 'bundler/setup' + # Configure the process for the current cron configuration. + require_relative '../lib/ncbo_cron' + config_exists = File.exist?(File.expand_path('../../config/config.rb', __FILE__)) + abort('Please create a config/config.rb file using the config/config.rb.sample as a template') unless config_exists + require_relative '../config/config' + + desc 'update purl for all ontologies' + task :update_all do + purl_client = LinkedData::Purl::Client.new + LinkedData::Models::Ontology.all.each do |ont| + ont.bring(:acronym) + acronym = ont.acronym + + if purl_client.purl_exists(acronym) + puts "#{acronym} exists" + purl_client.fix_purl(acronym) + else + puts "#{acronym} DOES NOT exist" + purl_client.create_purl(acronym) + end + end + end +end diff --git a/test/docker-compose.yml b/test/docker-compose.yml deleted file mode 100644 index 5bdb51f5..00000000 --- a/test/docker-compose.yml +++ /dev/null @@ -1,38 +0,0 @@ -version: '3.8' - -services: - unit-test: - build: ../. - environment: - - GOO_BACKEND_NAME=4store - - GOO_PORT=9000 - - GOO_HOST=4store-ut - - REDIS_HOST=redis-ut - - REDIS_PORT=6379 - - SOLR_HOST=solr-ut - - MGREP_HOST=mgrep-ut - - MGREP_PORT=55555 - depends_on: - - solr-ut - - redis-ut - - 4store-ut - - mgrep-ut - #command: "bundle exec rake test TESTOPTS='-v' TEST='./test/parser/test_owl_api_command.rb'" - command: "wait-for-it solr-ut:8983 -- bundle exec rake test TESTOPTS='-v'" - - solr-ut: - image: ontoportal/solr-ut:0.1 - - redis-ut: - image: redis - - mgrep-ut: - image: ontoportal/mgrep-ncbo:0.1 - - 4store-ut: - image: bde2020/4store - command: > - bash -c "4s-backend-setup --segments 4 ontoportal_kb - && 4s-backend ontoportal_kb - && 4s-httpd -D -s-1 -p 9000 ontoportal_kb" - diff --git a/test/run-unit-tests.sh b/test/run-unit-tests.sh index 385898e6..b2c119da 100755 --- a/test/run-unit-tests.sh +++ b/test/run-unit-tests.sh @@ -3,10 +3,10 @@ # # add config for unit testing [ -f ../config/config.rb ] || cp ../config/config.test.rb ../config/config.rb -docker-compose build +docker compose build # wait-for-it is useful since solr container might not get ready quick enough for the unit tests -docker-compose run --rm unit-test wait-for-it solr-ut:8983 -- rake test TESTOPTS='-v' -#docker-compose run --rm unit-test wait-for-it solr-ut:8983 -- bundle exec rake test TESTOPTS='-v' TEST='./test/controllers/test_annotator_controller.rb' -#docker-compose up --exit-code-from unit-test -docker-compose kill +docker compose run --rm ruby bundle exec rake test TESTOPTS='-v' +#docker compose run --rm ruby-agraph bundle exec rake test TESTOPTS='-v' +#docker-compose run --rm ruby bundle exec rake test TESTOPTS='-v' TEST='./test/controllers/test_annotator_controller.rb' +docker compose kill diff --git a/test/test_case.rb b/test/test_case.rb index 81a10aa6..75bb0454 100644 --- a/test/test_case.rb +++ b/test/test_case.rb @@ -1,3 +1,21 @@ +# Start simplecov if this is a 
coverage task or if it is run in the CI pipeline +if ENV['COVERAGE'] == 'true' || ENV['CI'] == 'true' + require 'simplecov' + require 'simplecov-cobertura' + # https://github.com/codecov/ruby-standard-2 + # Generate HTML and Cobertura reports which can be consumed by codecov uploader + SimpleCov.formatters = SimpleCov::Formatter::MultiFormatter.new([ + SimpleCov::Formatter::HTMLFormatter, + SimpleCov::Formatter::CoberturaFormatter + ]) + SimpleCov.start do + add_filter '/test/' + add_filter 'app.rb' + add_filter 'init.rb' + add_filter '/config/' + end +end + require 'ontologies_linked_data' require_relative '../lib/ncbo_cron' require_relative '../config/config' @@ -7,7 +25,7 @@ require 'test/unit' # Check to make sure you want to run if not pointed at localhost -safe_host = Regexp.new(/localhost|-ut|ncbo-dev*|ncbo-unittest*/) +safe_host = Regexp.new(/localhost|-ut/) unless LinkedData.settings.goo_host.match(safe_host) && LinkedData.settings.search_server_url.match(safe_host) && NcboCron.settings.redis_host.match(safe_host) @@ -38,7 +56,7 @@ def count_pattern(pattern) return 0 end - def backend_4s_delete + def backend_triplestore_delete raise StandardError, 'Too many triples in KB, does not seem right to run tests' unless count_pattern('?s ?p ?o') < 400000 @@ -71,7 +89,7 @@ def _run_suites(suites, type) end def _run_suite(suite, type) - backend_4s_delete + backend_triplestore_delete suite.before_suite if suite.respond_to?(:before_suite) super(suite, type) rescue Exception => e @@ -80,7 +98,7 @@ def _run_suite(suite, type) puts 'Traced from:' raise e ensure - backend_4s_delete + backend_triplestore_delete suite.after_suite if suite.respond_to?(:after_suite) end end diff --git a/test/test_ontology_pull.rb b/test/test_ontology_pull.rb index 57fa9f47..ca3c6130 100644 --- a/test/test_ontology_pull.rb +++ b/test/test_ontology_pull.rb @@ -41,14 +41,14 @@ def self.after_suite @@redis.del NcboCron::Models::OntologySubmissionParser::QUEUE_HOLDER end - def test_remote_ontology_pull() + def test_remote_ontology_pull ontologies = init_ontologies(1) ont = LinkedData::Models::Ontology.find(ontologies[0].id).first ont.bring(:submissions) if ont.bring?(:submissions) assert_equal 1, ont.submissions.length pull = NcboCron::Models::OntologyPull.new - pull.do_remote_ontology_pull() + pull.do_remote_ontology_pull # check that the pull creates a new submission when the file has changed ont = LinkedData::Models::Ontology.find(ontologies[0].id).first @@ -72,7 +72,33 @@ def test_remote_ontology_pull() ont = LinkedData::Models::Ontology.find(ontologies[0].id).first ont.bring(:submissions) if ont.bring?(:submissions) assert_equal 2, ont.submissions.length - pull.do_remote_ontology_pull() + pull.do_remote_ontology_pull + assert_equal 2, ont.submissions.length + end + + def test_remote_pull_parsing_action + ontologies = init_ontologies(1, process_submissions: true) + ont = LinkedData::Models::Ontology.find(ontologies[0].id).first + ont.bring(:submissions) if ont.bring?(:submissions) + assert_equal 1, ont.submissions.length + + # add this ontology to submission queue with :remote_pull action enabled + parser = NcboCron::Models::OntologySubmissionParser.new + actions = NcboCron::Models::OntologySubmissionParser::ACTIONS.dup + actions[:remote_pull] = true + parser.queue_submission(ont.submissions[0], actions) + parser.process_queue_submissions + + # make sure there are now 2 submissions present + ont = LinkedData::Models::Ontology.find(ontologies[0].id).first + ont.bring(:submissions) if ont.bring?(:submissions) + 
assert_equal 2, ont.submissions.length + + # verify that no new submission is created when the file has not changed + parser.queue_submission(ont.submissions[0], actions) + parser.process_queue_submissions + ont = LinkedData::Models::Ontology.find(ontologies[0].id).first + ont.bring(:submissions) if ont.bring?(:submissions) assert_equal 2, ont.submissions.length end @@ -164,15 +190,16 @@ def test_no_pull_location private - def init_ontologies(submission_count) - ont_count, acronyms, ontologies = LinkedData::SampleData::Ontology.create_ontologies_and_submissions(ont_count: 1, submission_count: submission_count, process_submission: false) + def init_ontologies(submission_count, process_submissions = false) + ont_count, acronyms, ontologies = LinkedData::SampleData::Ontology.create_ontologies_and_submissions( + ont_count: 1, submission_count: submission_count, process_submission: process_submissions) ontologies[0].bring(:submissions) if ontologies[0].bring?(:submissions) ontologies[0].submissions.each do |sub| sub.bring_remaining() sub.pullLocation = RDF::IRI.new(@@url) sub.save() rescue binding.pry end - return ontologies + ontologies end end diff --git a/test/test_scheduler.rb b/test/test_scheduler.rb index bac2f842..58808ea5 100644 --- a/test/test_scheduler.rb +++ b/test/test_scheduler.rb @@ -39,7 +39,7 @@ def test_scheduler sleep(5) finished_array = listen_string.split("\n") - assert finished_array.length >= 4 + assert_operator 4, :<=, finished_array.length assert job1_thread.alive? job1_thread.kill