Clarify and simplify README #38

Merged
merged 5 commits into from
Sep 28, 2023
4 changes: 2 additions & 2 deletions Makefile
@@ -19,8 +19,8 @@ ifeq ($(SCRATCH_PATH),)
SCRATCH_PATH = $(PWD)/scratch
endif

OCI_BINARY=docker
SING_BINARY=singularity
OCI_BINARY?=docker
Collaborator

What does this do differently? `=` here won't overwrite the variable if you set it.

Member Author

Interesting, for me, with

```make
THE_VAR=foo
.PHONY: test-env-var
test-env-var:
	echo ${THE_VAR}
```

`THE_VAR` is `foo`, even if I `export THE_VAR=somethingelse`.

When I added the `?=`, it started working.

Collaborator

strange, ok, let's leave it like that then.
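Both observations in this thread fit GNU make's precedence rules. A value assigned with `=` in the Makefile overrides one inherited from the environment (unless `make -e` is used). A value given on the `make` command line overrides the Makefile assignment. `?=` assigns only if the variable is not already defined, so it respects both. The script below is a sketch that reproduces this with two throwaway Makefiles. The file names and `THE_VAR` values are illustrative only, and it assumes a GNU-compatible `make` is on `PATH`.

```shell
#!/bin/sh
# Sketch: compare `=` and `?=` for a variable coming from the environment
# versus the make command line. File names and values are illustrative.
set -eu
unset THE_VAR MAKEFLAGS MFLAGS

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Recipe lines must start with a tab; printf turns \t into one.
# The single quotes keep the shell from expanding $(THE_VAR).
printf 'THE_VAR=foo\n.PHONY: show\nshow:\n\t@echo $(THE_VAR)\n' > "$tmpdir/plain.mk"
printf 'THE_VAR?=foo\n.PHONY: show\nshow:\n\t@echo $(THE_VAR)\n' > "$tmpdir/cond.mk"

# Value exported in the environment.
plain_env=$(env THE_VAR=env make -s -f "$tmpdir/plain.mk" show)
cond_env=$(env THE_VAR=env make -s -f "$tmpdir/cond.mk" show)

# Value passed on the make command line.
plain_cli=$(make -s -f "$tmpdir/plain.mk" THE_VAR=cli show)
cond_cli=$(make -s -f "$tmpdir/cond.mk" THE_VAR=cli show)

echo "=   with environment:  $plain_env"   # Makefile value wins
echo "?=  with environment:  $cond_env"    # environment value is kept
echo "=   with command line: $plain_cli"   # command line wins
echo "?=  with command line: $cond_cli"    # command line wins
```

Applied to this PR, `OCI_BINARY=podman make analysis-oci` only takes effect with `OCI_BINARY?=docker`, whereas `make analysis-oci OCI_BINARY=podman` works with either form.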

SING_BINARY?=singularity

DISTFILE_CACHE_CMD :=

75 changes: 56 additions & 19 deletions README.md
@@ -17,51 +17,88 @@ cd opfvta-replication-2023

## How to re-run

**Note:** *If the `SCRATCH_PATH` variable is not defined for the `make` invocation, all intermediary results (approx. 400GB) will be stored in the `scratch/` directory inside the repository.
This might exceed the available space on that partition, crashing the workflow and possibly other programs.
Before a full reexecution, check how much space is available on your partition, and if it is insufficient, set `SCRATCH_PATH` to a directory on a partition with more free space.*
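One way to act on this note before starting is to query the free space of the partition that will hold the scratch directory. The following is a minimal sketch, assuming a POSIX `df`. The helper name `scratch_free_gb` and the 400 GiB threshold (taken from the estimate above) are illustrative and not part of this repository's Makefile.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the repository's Makefile):
# report free space on the partition that will hold SCRATCH_PATH.
set -eu

# Print the free space, in whole GiB, on the filesystem containing $1.
# If $1 does not exist yet, walk up to its nearest existing ancestor.
scratch_free_gb() {
  dir=$1
  while [ ! -d "$dir" ]; do
    dir=$(dirname "$dir")
  done
  # -P: POSIX output, one line per filesystem; -k: 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo $((avail_kb / 1024 / 1024))
}

# Same default as the Makefile: $(PWD)/scratch.
target=${SCRATCH_PATH:-$PWD/scratch}
required_gb=400   # approximate size of the intermediary results

free_gb=$(scratch_free_gb "$target")
if [ "$free_gb" -lt "$required_gb" ]; then
  echo "Only ${free_gb} GiB free for $target; consider setting SCRATCH_PATH." >&2
else
  echo "${free_gb} GiB free for $target."
fi
```

If the check reports too little space, the variable can be passed directly to `make`, e.g. `make analysis-oci SCRATCH_PATH=/path/to/large/partition/scratch` (the path is a placeholder).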

There are 2 distinct phases of executing this study, which differ strongly in both time and space requirements.
The second phase depends on the results of the first, but those results are version-tracked, so you can choose to run only the second phase.

### I. Reexecuting the OPFVTA Article

This is by far the most time-consuming and resource-intensive step, as it recomputes all the work that was required to generate the original OPFVTA article, starting from the bare raw data.
Its requirements are therefore the raw data (study data and mouse brain templates) and the article code, which are included in this repository as submodules and whose content can be fetched via a dedicated `make` target.
::: warning
Warnings:
1. We estimate that the analysis requires more than 500GB of disk space, 400GB of which will be stored in a scratch directory, which is `./scratch/` by default and can be configured with the `SCRATCH_PATH` variable.
1. The analysis self-limits its RAM usage so that it can run on less powerful systems.
1. Reexecuting the computation as well as the article is time-consuming and resource-intensive; we recommend using a tool such as `tmux` or `screen` to preserve long-running processes.
:::

First, retrieve the data and other large files:

```console
make submodule-data
```

Once the required content is fetched, you can reexecute the OPFVTA article via either of the following commands, depending on the desired platform:
Once the required content has been fetched, you can reexecute the OPFVTA article via `singularity` or `oci` containers.
This step generates intermediate results in the scratch directory; these are not preserved by default, as configured in `scratch/.gitignore`.
The final result is a PDF article and its associated elements (mainly volumetric binary data, `.nii.gz` files), which will be stored in a datestamped and annotated directory under `outputs/`.
Most large files, including the results, are stored and versioned via `git-annex` and are therefore present in this repository; your output can also be saved and recorded.

For apptainer/singularity:

```console
make analysis-singularity
```
_or_

With docker or podman, you can execute the analysis inside an OCI container.

```console
make analysis-oci
```
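The Makefile in this PR sets its defaults with `OCI_BINARY?=docker` and `SING_BINARY?=singularity`, so the podman and apptainer alternatives mentioned here can be selected through those variables. Below is a minimal sketch, not part of the repository, that reports which runtimes are installed and prints the corresponding `make` invocation. The helper `first_available` is hypothetical, and the script does not start the analysis itself.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of the repository): report an
# installed container runtime for the OCI_BINARY / SING_BINARY variables.
set -eu

# Print the first of the given command names found on PATH.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

if oci=$(first_available docker podman); then
  echo "OCI:         make analysis-oci OCI_BINARY=$oci"
fi
if sing=$(first_available apptainer singularity); then
  echo "Singularity: make analysis-singularity SING_BINARY=$sing"
fi
```

The command-line form is printed because `make` gives command-line variables precedence over any assignment in the Makefile.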

This produces a PDF article and its associated elements (mainly volumetric binary data, `.nii.gz` files) which will be stored in a datestamped and annotated directory under `outputs/`.
A number of outputs are recorded via `git-annex` and therefore present in this repository, and your output can also be saved and recorded.

### II. Reexecuting the Meta-Article

To avoid confusion, we use the term 'article' to refer to a version of the OPFVTA article, and 'meta-article' to refer to the paper regarding the reexecution process and findings.

Generation of the meta-article uses files generated by the OPFVTA analysis which are expected to be in the `outputs/` directory.
Prior to generating the meta-article, `outputs/` must contain the data from previous analyses, which is not locally available by default.

This uses the aforementioned PDF files in `outputs/` to generate dynamic graphical elements, and subsequently compiles them alongside the document text via LaTeX into a novel and fully distinct article PDF.
To avoid confusion, please make sure you understand that this is *not* another version of the OPFVTA article but a fully different text.
Note: Regenerating the OPFVTA article will create an additional PDF, but the previously generated PDFs are still required for the comparison.

This phase requires fetching the actual binary content for the myriad PDF outputs of OPFVTA reexecution, and then running the `make article` target:
To fetch the OPFVTA analysis outputs:

```console
datalad get outputs/*/article.pdf
make article
```
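Before running `make article`, it can help to confirm that `datalad get` actually retrieved every PDF. In git-annex's default (indirect) mode, a file whose content is not present locally is a symbolic link to a missing object, so a plain existence test detects it. The sketch below assumes that mode (it would not detect missing content in an unlocked or adjusted branch), and the helper name `missing_content` is hypothetical.

```shell
#!/bin/sh
# Sketch: list annexed PDFs whose content is not present locally.
# Assumes git-annex indirect mode, where such files are dangling symlinks.
set -eu

# Print each given path that does not resolve to an existing file.
missing_content() {
  for f in "$@"; do
    # `[ -e ]` follows symlinks, so it is false for a dangling link
    # (and for a literal glob pattern that matched nothing).
    if [ ! -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

missing=$(missing_content outputs/*/article.pdf)
if [ -n "$missing" ]; then
  echo "Not yet fetched (try: datalad get <path>):" >&2
  printf '%s\n' "$missing" >&2
else
  echo "All outputs/*/article.pdf files are present."
fi
```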

A finer point here is that the dynamic elements of this article can be cached.
If you are working on the figure-generating code, rather than merely trying to get a PDF to read or editing the human-readable text, it is advisable to always clean the dynamic elements between rebuilds of the article via the dedicated target.
Finally, we generate new graphical elements and compile the text via LaTeX into a novel meta-article PDF.

The meta-article can then be generated by a container with all of the dependencies preinstalled using:

```console
make container-article
```

_or_

If you prefer to run the generation outside of a container, you will need to install the dependencies yourself (we suggest using your distribution's package manager; the names below are Debian names):
- LaTeX
- biber
- datalad
- diff-pdf
- graphviz
- matplotlib
- pandas
- seaborn
- sklearn
- statsmodels
- yaml

You will also need to install the `sourceserifpro` font using `tlmgr`.
Collaborator

@TheChymera Sep 27, 2023

@asmacdo

I have tried `make article-clean && make article-container` without the `tlmgr` code (dfced35) and it worked. Can you confirm that it also works for you without that? If so, we can drop this.

Member Author

see other


```console
make container-article
```

#### Cleaning up between runs

The steps are designed to be idempotent, and some dynamically generated components are not regenerated on subsequent runs.
If you are working on the figure-generating code, rather than merely trying to get a PDF to read or editing the human-readable text, it is advisable to always deep-clean the dynamic elements between rebuilds of the article:

```console
make article-clean && make article
```