diff --git a/docs/conf.py b/docs/conf.py
index 5147a87f..e6fb75b3 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -57,7 +57,7 @@
 
 # General information about the project.
 project = u'Invenio-Stats'
-copyright = u'2017, CERN'
+copyright = u'2020, CERN'
 author = u'CERN'
 
 # The version info for the project you're documenting, acts as replacement for
diff --git a/docs/index.rst b/docs/index.rst
index 0e865347..8ff24a2f 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -22,6 +22,7 @@ Invenio-Stats.
    overview
    configuration
    usage
+   operations
    examplesapp
 
 
diff --git a/docs/operations.rst b/docs/operations.rst
new file mode 100644
index 00000000..5779262e
--- /dev/null
+++ b/docs/operations.rst
@@ -0,0 +1,47 @@
+..
+    This file is part of Invenio.
+    Copyright (C) 2016-2020 CERN.
+
+    Invenio is free software; you can redistribute it and/or modify it
+    under the terms of the MIT License; see LICENSE file for more details.
+
+Operations
+==========
+
+(#NOTE: The only copy of the raw events is stored in the index, so in case of an
+Elasticsearch cluster failure/loss, the events will be lost.)
+
+Since our statistics are stored in Elasticsearch, in the unfortunate event that
+our cluster goes down, we will find ourselves in the unpleasant position of
+having lost all of the statistics for our service. A backup/restore mechanism
+is therefore advised for projects in production. We will go through the de
+facto solution for that and provide some possible alternatives for those who
+want a more fine-grained approach.
+
+Backup ES
+~~~~~~~~~
+
+Possible options for backing up ES:
+
+- elasticdump (de facto)
+- ES Snapshots
+- Raw filesystem backups for each node... 🤢
+- In terms of managing indices, it might also be worth taking a look at the
+  Python library elasticsearch-curator.
+
+Downloads and views for Zenodo for January 2020:
+
+- 3M users (not crappy harvesters/ users)
+- ~ 10Gb
+
+Restore ES
+~~~~~~~~~~
+
+There is a saying that goes "A backup worked only when it got restored." This
+section will take us through the restore process of the previous step. We will
+have to bring our application close to the state it was in before the ES
+cluster failure.
+
+.. note::
+    Some data loss is possible: events recorded between the last valid backup
+    and the moment the cluster and its data are restored cannot be recovered.
diff --git a/invenio_stats/__init__.py b/invenio_stats/__init__.py
index 04f98985..a94264e3 100644
--- a/invenio_stats/__init__.py
+++ b/invenio_stats/__init__.py
@@ -223,7 +223,7 @@ def register_events():
 delete or archive old indices.
 
 2. Aggregating
-^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~
 
 The :py:class:`~invenio_stats.processors.EventsIndexer` processor indexes
 raw events. Querying those events can put a big strain on the Elasticsearch