docs: backup and restore ES
topless committed Mar 12, 2020
1 parent 7752c39 commit 8d877e9
Showing 4 changed files with 50 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -57,7 +57,7 @@

# General information about the project.
project = u'Invenio-Stats'
-copyright = u'2017, CERN'
+copyright = u'2020, CERN'
author = u'CERN'

# The version info for the project you're documenting, acts as replacement for
1 change: 1 addition & 0 deletions docs/index.rst
@@ -22,6 +22,7 @@ Invenio-Stats.
overview
configuration
usage
operations
examplesapp


47 changes: 47 additions & 0 deletions docs/operations.rst
@@ -0,0 +1,47 @@
..
    This file is part of Invenio.
    Copyright (C) 2016-2020 CERN.
    Invenio is free software; you can redistribute it and/or modify it
    under the terms of the MIT License; see LICENSE file for more details.

Operations
==========

.. note::
    The only copy of the raw events is stored in the index, so if the
    Elasticsearch cluster fails or is lost, the events are lost as well.

Since our statistics are stored in Elasticsearch, an outage or loss of the
cluster means losing all of the statistics for our service. A backup/restore
mechanism is therefore advised for projects in production. We will go through
the de facto solution and point to some alternatives for those who want a more
fine-grained approach.

Backup ES
~~~~~~~~~

Possible options for backing up ES:

- elasticdump (de facto)
- ES Snapshots
- Raw filesystem backups for each node... 🤢
- For managing indices, the Python library elasticsearch-curator may also be
  worth a look.
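
As a rough illustration of the elasticdump option, the sketch below builds the
``elasticdump`` command lines for dumping one index: first its mapping, then
its data. The function names, the cluster URL, the index name and the output
file naming scheme are all assumptions made for this example; only the
``--input``, ``--output`` and ``--type`` options come from elasticdump itself.

```python
import subprocess
from typing import List


def build_backup_commands(es_url: str, index: str, out_dir: str) -> List[List[str]]:
    """Return elasticdump invocations that dump the mapping, then the data.

    The file naming scheme (<index>.<type>.json) is an assumption of this
    sketch, not something elasticdump or Invenio-Stats prescribes.
    """
    commands = []
    for dump_type in ("mapping", "data"):
        commands.append([
            "elasticdump",
            f"--input={es_url}/{index}",
            f"--output={out_dir}/{index}.{dump_type}.json",
            f"--type={dump_type}",
        ])
    return commands


def run_backup(es_url: str, index: str, out_dir: str) -> None:
    """Run the dumps one after the other; requires elasticdump on the PATH."""
    for command in build_backup_commands(es_url, index, out_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical cluster URL, index name and target directory.
    for command in build_backup_commands(
        "http://localhost:9200", "events-stats-file-download-2020-01", "/tmp/es-backup"
    ):
        print(" ".join(command))
```

Keeping the command construction separate from the execution makes it possible
to review or log the exact invocations before anything touches the cluster.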

For reference, downloads and views for Zenodo for January 2020:

- 3M users (real users, not harvesters)
- ~10 GB

Restore ES
~~~~~~~~~~

There is a saying that goes "A backup has only worked once it has been
restored." This section takes us through restoring the backup made in the
previous step, bringing our application as close as possible to the state it
was in before the ES cluster failure.

.. note::
    Some data loss is possible: events recorded between the last valid backup
    and the moment the cluster and its data are restored cannot be recovered.
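
A restore with elasticdump is essentially the dump in reverse: the JSON files
become the input and the cluster becomes the output. The sketch below is a
minimal illustration under assumptions: the function name, the cluster URL,
the index name and the ``<index>.<type>.json`` file layout are hypothetical,
and only the ``--input``, ``--output`` and ``--type`` options are taken from
elasticdump. The mapping is loaded before the data so that documents are
indexed with the original field types.

```python
import subprocess
from typing import List


def build_restore_commands(es_url: str, index: str, dump_dir: str) -> List[List[str]]:
    """Return elasticdump invocations that load the mapping, then the data.

    Assumes the dump files are named <index>.mapping.json and
    <index>.data.json inside dump_dir (a convention of this sketch).
    """
    commands = []
    for dump_type in ("mapping", "data"):
        commands.append([
            "elasticdump",
            f"--input={dump_dir}/{index}.{dump_type}.json",
            f"--output={es_url}/{index}",
            f"--type={dump_type}",
        ])
    return commands


def run_restore(es_url: str, index: str, dump_dir: str) -> None:
    """Run the restore steps in order; requires elasticdump on the PATH."""
    for command in build_restore_commands(es_url, index, dump_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical cluster URL, index name and dump directory.
    for command in build_restore_commands(
        "http://localhost:9200", "events-stats-file-download-2020-01", "/tmp/es-backup"
    ):
        print(" ".join(command))
```

Running such a restore against a scratch cluster on a regular basis is one way
to check that the backups are actually usable.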
2 changes: 1 addition & 1 deletion invenio_stats/__init__.py
@@ -223,7 +223,7 @@ def register_events():
delete or archive old indices.
2. Aggregating
-^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~
The :py:class:`~invenio_stats.processors.EventsIndexer` processor indexes raw
events. Querying those events can put a big strain on the Elasticsearch
