🚧 Under construction !!! 🚧
These are our dry-lab guidelines for data and project organization.
Disclaimer: These guidelines should define the minimal project requirements. The use of some of the tools is mandatory, e.g. git
and this GitHub/GitLab
platform, while other tools are only listed as recommendations. You are free to adapt the suggested setup to your needs and preferences as long as the chosen tools ensure the minimal requirements described here.
Also, feel free to propose changes and corrections, and to ask questions if something is not clear.
For information on the group and (neo)huxley servers, please see here.
- each project should have a repository, and a designated place for raw data and results which is backed up (see below)
- for large projects use the (neo)huxley servers or our group servers (see here)
- especially on the servers, pay attention to your storage consumption (see here)
- do not download databases/datasets and do not install tools in non-user folders (see here)
- make sure you know what you are doing then changing permissions on files/folders (see below)
❗ mandatory
- where
- on
(neo)huxley
- finished projects:
/mnt//projects/<project>/
- ongoing projects:
/work/projects/<project>/
- finished projects:
- on group servers
- TODO
- on
- how: main folders
raw/
: raw data files- (after we have established the raw data organization, these will be only
symlinks
) - remove write-permissions from raw data files/folders
- (after we have established the raw data organization, these will be only
data/
: additional data, e.g. metadata, external data- remove write-permissions from relevant files/folders
repo/
: cloned repositorywiki/
: cloned repository wikianalysis/
: final resultsusers/
: preliminary results, in sub-folders<ulogin>/
After a project has been finished make sure to
- remove redundant data
- compress files and folders if possible
- remove write permissions from important files and folders if possible
Note: See also Dry-lab cleanup.
❗ mandatory
There should be (at least) one private GitHub/GitLab repository containing a wiki and all the code used to do the analysis. Additional repositories should be avoided, except for public repositories for publications (see below) and sub-projects which, for some reason, cannot or should not be part of the main repository.
When creating a new project:
- GitHub/GitLab
- Project name/slug:
- as short and descriptive as possible, ideally the acronym from the corresponding grant application
- allowed characters: letters, digits,
-
,_
- Project URL:
https://github.com
, i.e. the namespace should bemoleco
- Visibility Level: Private, exceptions are
- implementation of a tool for public use
- public (sub-)repositories for publications (to store a copy of the final code version)
Note: if the repository is public, make sure that it does not contain any sensitive data in its code, wiki, milestones, issues etc.
Project content:
- Code
README.md
(you can have multiple, i.e. per analysis sub-folder).gitignore
: (content depends on your project and setup)- one sub-folder per analysis/workflow/user
- template for project structure when using
snakemake
(also useful for other set-ups)
- template for project structure when using
- Wiki:
- projects description
md5sum
files (for relevant external data)- any relevant project information and notes
Note: for more information on git
see also Dry-lab git.
ℹ️ info/recommendations
Suggestions for tools for workflow and tools management:
- Pipelines
- Software
- Container
If no tool management system is used you should provide a README
describing
- versions of all relevant tools (incl. dependencies)
- installation instructions, incl. download links
❗ mandatory
Using the repositories' wiki to save relevant notes and information about the projects and the analysis is compulsory.
Large relevant documents (e.g. metadata and presentations) can be saved together with the data on the server(s) in the folder data/
.
Create README
files in the code repository to describe how to re-run your workflow or analysis.
ℹ️ info/recommendations
Using GitHub/GitLab's issue system, is a great way to document project's progress. Issues can be used to keep track of things to be done.
ℹ️ info/recommendations
Depending on how much data you have and whether you are collaborating with someone, write your results to
- your
HOME
directory:~/<project>/
(/mnt/(neo)huxleygpfs/users/<ulogin>
) (for small projects) - your folder on
scratch
:/scratch/users/<ulogin>/<project>
(recommended)
ℹ️ info/recommendations
If you do not know about Linux file permissions, please familiarize yourself with this topic, e.g. here.
- do NOT use
chmod 777 <files/folders>
orchmod o+rwx <files/folder>
- most of the time you will not need to give users outside of your group all permissions to your files/folders
- use the flag
-R
to change permissions of a folder recursively - a folder is only readable if the bits
r
andx
are set - do NOT make files executable (set bit
x
) if they are not intended for execution- e.g. there is no reason to make a FASTA file executable
- remove write permissions from important files to avoid overwriting them
- IMPORTANT: deleting them with
rm -f
will still work
- IMPORTANT: deleting them with
- remove write permissions from important folders to avoid deleting their files by accident (even when using
rm -f
)- IMPORTANT: modifying existing files will still work
Useful references: