
Deploying Scipion: Data Management

David Antos edited this page Oct 5, 2018 · 12 revisions

Data handling in Scipion

In this section, we describe the basic principles users need to understand in order to use Scipion with OneData, and give general recommendations on data handling.

General principles

Data handling for Scipion is based on the following principles:

  • primary storage for the data is a OneData folder,
  • when deploying a Scipion instance, the data is copied onto computation nodes,
  • to protect the data, locking is in place: a primary storage folder can be deployed in only one place at a time. To deploy it somewhere else, first un-deploy it from where it is currently deployed,
  • working data is periodically copied into OneData to keep the primary storage in sync with the working environment,
  • when un-deploying a Scipion instance, the data is finally copied back into OneData and the computation nodes are cleaned and released.
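The locking principle can be pictured as an exclusive lock file. The sketch below is a hypothetical illustration, not Scipion's actual deployment code. It assumes the lock lives at `.deployment_status_do_not_delete/lock` inside the primary storage folder (the default location described on this page), that the lock record is JSON holding the host name, a timestamp and the deployment parameters (an invented format), and that the file system honours `O_EXCL` for atomic creation, which a FUSE-mounted OneData space may not guarantee. The names `acquire_lock`, `release_lock` and `AlreadyDeployedError` are made up for this example.

```python
from __future__ import annotations

import json
import os
import socket
import tempfile
import time
from pathlib import Path

# Default lock location described on this page.
STATUS_DIR = ".deployment_status_do_not_delete"
LOCK_NAME = "lock"


class AlreadyDeployedError(RuntimeError):
    """Raised when the primary storage folder is already deployed elsewhere."""


def acquire_lock(primary: Path, params: dict) -> Path:
    """Create the lock file atomically, refusing if it already exists."""
    status_dir = primary / STATUS_DIR
    status_dir.mkdir(exist_ok=True)
    lock_path = status_dir / LOCK_NAME
    try:
        # O_CREAT | O_EXCL fails with FileExistsError if the lock is present.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyDeployedError(lock_path.read_text()) from None
    # Hypothetical record format: where, when, and with what parameters.
    record = {"host": socket.gethostname(), "time": time.time(), "params": params}
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
    return lock_path


def release_lock(primary: Path) -> None:
    """Delete the lock; un-deployment would do this after the final copy back."""
    (primary / STATUS_DIR / LOCK_NAME).unlink()


def demo() -> tuple[bool, bool, bool]:
    """Deploy, attempt a second deployment, un-deploy, then deploy again."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        lock = acquire_lock(folder, {"nodes": 2})
        try:
            acquire_lock(folder, {"nodes": 4})
            refused = False
        except AlreadyDeployedError:
            refused = True
        release_lock(folder)
        released = not lock.exists()
        acquire_lock(folder, {"nodes": 4})
        return refused, released, lock.exists()


if __name__ == "__main__":
    print(demo())
```

Creating the file with `O_EXCL` is what makes a second deployment fail instead of silently overwriting the first one's record. `release_lock` stands for the step un-deployment would perform once the data has been copied back into OneData.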

When working with Scipion, large input files commonly need to be accessible from several folders. To avoid copying them unnecessarily, users create symbolic links to them. OneData, however, does not support symbolic links at all, so it cannot store such links directly. Because symbolic links are so useful with Scipion, the symbolic links in the working folder are collected and stored in a special file in OneData, so that they can be re-created when the folder is deployed.
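The symbolic-link handling described above can be sketched as a two-step process: walk the working folder and record each link's relative path and target, then re-create the links from that record on the next deployment. This is a minimal Python illustration assuming a JSON manifest. The actual name and format of the special file kept in OneData are not given here, so `symlinks.json`, `collect_symlinks`, `save_manifest` and `restore_symlinks` are hypothetical.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def collect_symlinks(workdir: Path) -> list[dict]:
    """Record every symbolic link under workdir as its relative path and target."""
    links = []
    for dirpath, dirnames, filenames in os.walk(workdir):
        # Links to directories show up in dirnames (and are not descended into
        # by default); links to files and dangling links show up in filenames.
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                links.append({
                    "path": path.relative_to(workdir).as_posix(),
                    "target": os.readlink(path),
                })
    return links


def save_manifest(workdir: Path, manifest: Path) -> None:
    """Write the collected links to a JSON file (hypothetical format)."""
    manifest.write_text(json.dumps(collect_symlinks(workdir), indent=2))


def restore_symlinks(workdir: Path, manifest: Path) -> None:
    """Re-create the recorded links inside a freshly deployed working copy."""
    for entry in json.loads(manifest.read_text()):
        link = workdir / entry["path"]
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace a stale link left from an earlier run
        os.symlink(entry["target"], link)


def demo() -> tuple[list[str], str]:
    """Collect links from one folder and restore them into another."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        work, fresh = Path(src), Path(dst)
        (work / "data").mkdir()
        (work / "data" / "movie.mrc").write_text("raw")
        (work / "run1").mkdir()
        os.symlink("../data/movie.mrc", work / "run1" / "movie.mrc")
        manifest = fresh / "symlinks.json"
        save_manifest(work, manifest)
        restore_symlinks(fresh, manifest)
        paths = [entry["path"] for entry in collect_symlinks(work)]
        return paths, os.readlink(fresh / "run1" / "movie.mrc")


if __name__ == "__main__":
    print(demo())
```

Link targets are stored exactly as `os.readlink` returns them, so relative links stay relative and remain valid as long as the folder layout is reproduced. Because `os.walk` does not follow symlinked directories by default, a link to a directory is recorded as a single entry rather than walked into.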

Important recommendations

Note:

  • do not use double quotes (") or newline characters in file names
  • the folder named ".deployment_status_do_not_delete" (the default name) contains lock files. Do not tamper with it unless you fully understand what you are doing; if those files get damaged, your work may be overwritten.
  • the file .deployment_status_do_not_delete/lock records where, when, and with what parameters the deployment was made. If the deployment is destroyed in a non-standard way, the lock file may need to be deleted manually. Make doubly sure that the deployment no longer exists before forcibly removing the lock.
  • the OneData folder is mounted on computation nodes using FUSE. After deployment, the data in the FUSE mount are regularly overwritten with the working copy, so do not access them directly. The working copy must reside on a volume that supports symbolic links; performance is another significant reason for this design.
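Before deploying a folder, it may help to check it for names that break the first recommendation in this list. The following is a minimal sketch, not part of Scipion or OneData; `is_bad_name` and `find_bad_names` are hypothetical helpers that flag any file or folder name containing a double quote or a newline.

```python
from __future__ import annotations

import os
from pathlib import Path

# Characters this page says must not appear in file names.
FORBIDDEN_CHARS = ('"', "\n")


def is_bad_name(name: str) -> bool:
    """True if a single file or folder name contains a forbidden character."""
    return any(ch in name for ch in FORBIDDEN_CHARS)


def find_bad_names(root: Path) -> list[str]:
    """Walk root and return the relative paths of offending entries, sorted."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_bad_name(name):
                found.append((Path(dirpath) / name).relative_to(root).as_posix())
    return sorted(found)


if __name__ == "__main__":
    # repr() makes embedded newlines visible in the report.
    for path in find_bad_names(Path.cwd()):
        print(repr(path))
```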