Merge pull request #244 from CovertLab/bug-fixes
More bug fixes
thalassemia authored Nov 20, 2024
2 parents 3b590a5 + 086b6e4 commit 0347403
Showing 17 changed files with 167 additions and 54 deletions.
68 changes: 68 additions & 0 deletions doc/workflows.rst
@@ -94,6 +94,13 @@ Configuration options for the ParCa are all located in a dictionary under the
regardless of whether running with :py:mod:`runscripts.workflow` or
:py:mod:`runscripts.parca`.

.. warning::
    If running :py:mod:`runscripts.parca` and :py:mod:`ecoli.experiments.ecoli_master_sim`
    manually instead of using :py:mod:`runscripts.workflow`, you must create two config JSON
    files: one for the ParCa with a null ``sim_data_path`` and an ``outdir``
    as described above and one for the simulation with
    ``sim_data_path`` set to ``{outdir}/kb/simData.cPickle``.
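
The two-file setup in the warning above could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper name ``write_manual_configs``, the output file names, and the layout of the ParCa config (a top-level ``sim_data_path`` next to a section holding ``outdir``) are assumptions for illustration. The name of the ParCa options key is passed in by the caller because this sketch does not assume it.

```python
import json
import os


def write_manual_configs(outdir, parca_section, config_dir="."):
    """Write one config JSON for the ParCa and one for the simulation.

    `parca_section` is the config key that holds the ParCa options; it is a
    required argument because this sketch does not assume its name.
    The file names and exact layout below are illustrative assumptions.
    """
    # ParCa config: null sim_data_path plus the ParCa output directory.
    parca_config = {"sim_data_path": None, parca_section: {"outdir": outdir}}
    # Simulation config: point at the pickle that the ParCa run produces.
    sim_config = {"sim_data_path": os.path.join(outdir, "kb", "simData.cPickle")}

    paths = {}
    for name, config in (("parca", parca_config), ("sim", sim_config)):
        path = os.path.join(config_dir, f"{name}_config.json")
        with open(path, "w") as f:
            json.dump(config, f, indent=4)
        paths[name] = path
    return paths
```

The returned dictionary maps ``"parca"`` and ``"sim"`` to the paths of the two files, which can then be passed to the respective scripts.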

.. _variants:

--------
@@ -661,3 +668,64 @@ the output directory specified via ``out_dir`` or ``out_uri`` under the
or ``{experiment ID}_report.html``) and run ``bash .command.sh`` with
breakpoints set in the relevant code (``import ipdb; ipdb.set_trace()``)
for debugging.

.. tip::
    To save space, you can safely delete ``nextflow_workdirs`` after you are finished
    troubleshooting a particular workflow.


.. _troubleshooting:

---------------
Troubleshooting
---------------

To troubleshoot a workflow that was run with :py:mod:`runscripts.workflow`, you can
either inspect the HTML execution report ``{experiment ID}_report.html`` described
in :ref:`output` (nice summary and UI) or use the ``nextflow log`` command
(more flexible and efficient).

HTML Report
===========

Click "Tasks" in the top bar or scroll to the bottom of the page. Filter for failed
jobs by putting "failed" into the search bar. Find the work directory (``workdir`` column)
for each job. Navigate to the work directory for each failed job and
inspect ``.command.out`` (``STDOUT``), ``.command.err`` (``STDERR``), and
``.command.log`` (both) files.
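
Once a work directory is known, the inspection step can be scripted. The sketch below is an illustration only: the helper name ``tail_task_logs`` and the ``max_lines`` default are hypothetical, and it relies solely on the three file names listed above.

```python
from pathlib import Path

# Log files found in each work directory, as listed in the text above.
LOG_FILES = (".command.out", ".command.err", ".command.log")


def tail_task_logs(workdir, max_lines=20):
    """Return the last `max_lines` lines of each log file present in `workdir`."""
    tails = {}
    for name in LOG_FILES:
        path = Path(workdir) / name
        if path.is_file():
            # Replace undecodable bytes instead of failing on odd output.
            lines = path.read_text(errors="replace").splitlines()
            tails[name] = lines[-max_lines:]
    return tails
```

Files that are missing from the work directory are simply left out of the returned dictionary.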

CLI
===

Run ``nextflow log`` in the same directory in which you launched
the workflow to get the workflow name (should be of the form
``{adjective}_{famous last name}``). Use the ``-f`` and ``-F``
flags of ``nextflow log`` to show and filter the information that
you would like to see (``nextflow log -help`` for more info).

Among the fields that can be shown with ``-f`` are ``stderr``,
``stdout``, and ``log``. This allows you to retrieve the
relevant output for all failed jobs in one go instead of manually
navigating to each work directory and opening the relevant text files.

For more information about ``nextflow log``, see the
`official documentation <https://www.nextflow.io/docs/latest/reports.html#reports>`_.
For a description of some fields (non-exhaustive) that can be specified with
``-f``, refer to `this section <https://www.nextflow.io/docs/latest/reports.html#trace-fields>`_
of the official documentation.

As an example, to see the name, stderr, and workdir for all failed jobs
in a workflow called ``agitated_mendel``::

    nextflow log agitated_mendel -f name,stderr,workdir -F "status == 'FAILED'"
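
The same query can be issued from a script. The following is a sketch under the assumption that ``nextflow`` is on the ``PATH`` and that the script runs from the launch directory; the function names are hypothetical and only the flags shown in the example above are used.

```python
import subprocess


def failed_task_log_command(run_name, fields=("name", "stderr", "workdir")):
    """Build the argument list that lists the failed tasks of one run."""
    return [
        "nextflow", "log", run_name,
        "-f", ",".join(fields),
        "-F", "status == 'FAILED'",
    ]


def show_failed_tasks(run_name):
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        failed_task_log_command(run_name),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting issues around the filter expression.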


Test Fixes
==========

After identifying the issue and applying fixes, you can test a failed job
in isolation by invoking ``bash .command.run`` inside the work
directory for that job. Once you have addressed all issues,
you can relaunch the workflow by navigating back to the directory in which you
originally started the workflow and issuing the same command with the
added ``--resume`` option (see :ref:`fault_tolerance`).
7 changes: 6 additions & 1 deletion ecoli/library/initial_conditions.py
@@ -20,7 +20,12 @@
masses_and_counts_for_homeostatic_target,
normalize,
)
from wholecell.utils.mc_complexation import mccFormComplexesWithPrebuiltMatrices
try:
    from wholecell.utils.mc_complexation import mccFormComplexesWithPrebuiltMatrices
except ImportError as exc:
    raise RuntimeError(
        "Failed to import Cython module. Try running 'make clean compile'."
    ) from exc
from wholecell.utils.polymerize import computeMassIncrease
from wholecell.utils.random import stochasticRound
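
The import guard added in this hunk follows a general pattern: catch the ``ImportError`` and re-raise with a hint about how to build the missing module. A standalone sketch of that pattern is shown here; the helper ``import_compiled`` is hypothetical and does not exist in the repository.

```python
import importlib


def import_compiled(module_name, hint="Try running 'make clean compile'."):
    """Import a compiled extension module, or fail with a build hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # `from exc` keeps the original ImportError available as __cause__,
        # so the underlying reason still appears in the traceback.
        raise RuntimeError(f"Failed to import {module_name}. {hint}") from exc
```

Chaining with ``from exc`` means the user sees both the actionable message and the original import failure.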

6 changes: 3 additions & 3 deletions ecoli/library/parquet_emitter.py
@@ -762,7 +762,7 @@ def _finalize(self):
unified_schema, unified_schema_path, filesystem=self.filesystem
)
experiment_schema_path = os.path.join(
self.outdir, "history", self.experiment_id, EXPERIMENT_SCHEMA_SUFFIX
self.outdir, self.experiment_id, "history", EXPERIMENT_SCHEMA_SUFFIX
)
self.filesystem.create_dir(os.path.dirname(experiment_schema_path))
pq.write_metadata(
@@ -849,7 +849,7 @@ def emit(self, data: dict[str, Any]):
# create new folder for config / simulation output
try:
self.filesystem.delete_dir(os.path.dirname(outfile))
except FileNotFoundError:
except (FileNotFoundError, OSError):
pass
self.filesystem.create_dir(os.path.dirname(outfile))
self.executor.submit(
@@ -867,7 +867,7 @@ def emit(self, data: dict[str, Any]):
)
try:
self.filesystem.delete_dir(history_outdir)
except FileNotFoundError:
except (FileNotFoundError, OSError):
pass
self.filesystem.create_dir(history_outdir)
return
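
For readers unfamiliar with the delete-then-create idiom in these hunks, here is a local-filesystem analogue using only the standard library. It swaps ``shutil`` and ``os`` in for the emitter's filesystem object, and the helper name ``reset_dir`` is hypothetical. In Python, ``FileNotFoundError`` is a subclass of ``OSError``, so catching ``OSError`` alone covers both cases.

```python
import os
import shutil


def reset_dir(path):
    """Delete `path` if it exists, then make sure it exists as a directory."""
    try:
        shutil.rmtree(path)
    except OSError:
        # FileNotFoundError is a subclass of OSError, so this single clause
        # also handles the case where the directory does not exist yet.
        pass
    # exist_ok=True keeps this step safe if the deletion above was skipped.
    os.makedirs(path, exist_ok=True)
```

Calling the function twice in a row is safe, which is the property the emitter relies on when a simulation is restarted.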
7 changes: 6 additions & 1 deletion reconstruction/ecoli/dataclasses/process/complexation.py
@@ -4,7 +4,12 @@

import numpy as np
from wholecell.utils import units
from wholecell.utils.mc_complexation import mccBuildMatrices
try:
    from wholecell.utils.mc_complexation import mccBuildMatrices
except ImportError as exc:
    raise RuntimeError(
        "Failed to import Cython module. Try running 'make clean compile'."
    ) from exc


class ComplexationError(Exception):
11 changes: 8 additions & 3 deletions runscripts/container/build-runtime.sh
@@ -3,7 +3,7 @@
# image with requirements.txt installed. If using Cloud Build, store the
# built image in the "vecoli" folder in the Google Artifact Registry.
#
# ASSUMES: The current working dir is the vivarium-ecoli/ project root.
# ASSUMES: The current working dir is the vEcoli/ project root.

set -eu

@@ -36,8 +36,13 @@ if [ "$RUN_LOCAL" = true ]; then
echo "=== Locally building WCM runtime Docker Image: ${RUNTIME_IMAGE} ==="
docker build -f runscripts/container/runtime/Dockerfile -t "${WCM_RUNTIME}" .
else
PROJECT="$(gcloud config get-value core/project)"
REGION=$(gcloud config get compute/region)
PROJECT=$(curl -H "Metadata-Flavor: Google" \
"http://metadata.google.internal/computeMetadata/v1/project/project-id" |
cut '-d/' -f4)
REGION=$(curl -H "Metadata-Flavor: Google" \
"http://metadata.google.internal/computeMetadata/v1/instance/zone" |
awk -F'/' '{print $NF}' |
sed 's/-[a-z]$//')
TAG="${REGION}-docker.pkg.dev/${PROJECT}/vecoli/${RUNTIME_IMAGE}"
echo "=== Cloud-building WCM runtime Docker Image: ${TAG} ==="
echo $TAG
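
The region derivation in this hunk takes the zone string returned by the metadata server, keeps the last path component, and strips the trailing zone letter. The same transformation is sketched below in Python for clarity. The function name and the sample input are assumptions for illustration; the sample assumes the zone endpoint returns a path ending in the zone name.

```python
import re


def region_from_zone(zone_path):
    """Derive a region name from a metadata-server zone string."""
    # Keep the last path component, like: awk -F'/' '{print $NF}'
    zone = zone_path.rsplit("/", 1)[-1]
    # Drop a trailing "-<letter>" zone suffix, like: sed 's/-[a-z]$//'
    return re.sub(r"-[a-z]$", "", zone)
```

For example, ``region_from_zone("projects/123456789/zones/us-west1-b")`` returns ``"us-west1"``.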
16 changes: 9 additions & 7 deletions runscripts/container/build-wcm.sh
@@ -1,9 +1,9 @@
#!/bin/sh
# Use Google Cloud Build or local Docker install to build a personalized image
# with current state of the vivarium-ecoli repo. If using Cloud Build, store
# with current state of the vEcoli repo. If using Cloud Build, store
# the built image in the "vecoli" folder in the Google Artifact Registry.
#
# ASSUMES: The current working dir is the vivarium-ecoli/ project root.
# ASSUMES: The current working dir is the vEcoli/ project root.

set -eu

@@ -26,7 +26,7 @@ print_usage() {
while getopts 'r:w:l' flag; do
case "${flag}" in
r) RUNTIME_IMAGE="${OPTARG}" ;;
w) WCN_IMAGE="${OPTARG}" ;;
w) WCM_IMAGE="${OPTARG}" ;;
l) RUN_LOCAL="${OPTARG}" ;;
*) print_usage
exit 1 ;;
@@ -50,14 +50,16 @@ if [ "$RUN_LOCAL" = true ]; then
else
echo "=== Cloud-building WCM code Docker Image ${WCM_IMAGE} on ${RUNTIME_IMAGE} ==="
echo "=== git hash ${GIT_HASH}, git branch ${GIT_BRANCH} ==="
PROJECT_ID=$(gcloud config get project)
REGION=$(gcloud config get compute/region)
REGION=$(curl -H "Metadata-Flavor: Google" \
"http://metadata.google.internal/computeMetadata/v1/instance/zone" |
awk -F'/' '{print $NF}' |
sed 's/-[a-z]$//')
# This needs a config file to identify the project files to upload and the
# Dockerfile to run.
gcloud builds submit --timeout=15m --config runscripts/container/cloud_build.json \
--substitutions="_REGION=${REGION},_WCM_RUNTIME=${RUNTIME_IMAGE},\
_WCM_CODE=${WCM_IMAGE},_GIT_HASH=${GIT_HASH},_GIT_BRANCH=${GIT_BRANCH},\
_TIMESTAMP=${TIMESTAMP}"
_WCM_CODE=${WCM_IMAGE},_GIT_HASH=${GIT_HASH},_GIT_BRANCH=${GIT_BRANCH},\
_TIMESTAMP=${TIMESTAMP}"
fi

rm source-info/git_diff.txt
2 changes: 1 addition & 1 deletion runscripts/container/cloud_build.json
@@ -9,7 +9,7 @@
"--build-arg", "git_branch=${_GIT_BRANCH}",
"--build-arg", "timestamp=${_TIMESTAMP}",
"-t", "${_REGION}-docker.pkg.dev/${PROJECT_ID}/vecoli/${_WCM_CODE}",
"-f", "cloud/docker/wholecell/Dockerfile",
"-f", "runscripts/container/wholecell/Dockerfile",
"."
]
}
6 changes: 3 additions & 3 deletions runscripts/container/runtime/Dockerfile
@@ -1,7 +1,7 @@
# Container image #1: wcm-runtime.
# This Dockerfile builds the runtime environment for the whole cell model.
#
# To build this image locally from the vivarium-ecoli/ project root directory:
# To build this image locally from the vEcoli/ project root directory:
#
# > docker build -f cloud/docker/runtime/Dockerfile -t ${USER}-wcm-runtime .
#
@@ -33,8 +33,8 @@ ENV OPENBLAS_NUM_THREADS=1
COPY requirements.txt /
RUN (b1="" \
&& echo "Installing pips with '$b1'" \
&& pip install --no-cache-dir --upgrade pip setuptools wheel \
&& pip install --no-cache-dir numpy==1.26.3 $b1 \
&& pip install --no-cache-dir --upgrade pip setuptools==73.0.1 wheel \
&& pip install --no-cache-dir numpy==1.26.4 $b1 \
&& pip install --no-cache-dir -r requirements.txt $b1 \
&& umask 000 && mkdir -p /.aesara)

16 changes: 8 additions & 8 deletions runscripts/container/wholecell/Dockerfile
@@ -1,8 +1,8 @@
# Container image #2: wcm-code.
# This Dockerfile builds a container image with the vivarium-ecoli whole cell model
# This Dockerfile builds a container image with the vEcoli whole cell model
# code, layered on the wcm-runtime container image.
#
# To build this image locally from the vivarium-ecoli/ project root directory:
# To build this image locally from the vEcoli/ project root directory:
#
# > docker build -f cloud/docker/wholecell/Dockerfile -t ${USER}-wcm-code --build-arg from=${USER}-wcm-runtime .
#
@@ -44,19 +44,19 @@ ENV IMAGE_GIT_HASH="$git_hash" \

LABEL application="Whole Cell Model of Escherichia coli" \
email="[email protected]" \
license="https://github.com/CovertLab/vivarium-ecoli/blob/master/LICENSE" \
license="https://github.com/CovertLab/vEcoli/blob/master/LICENSE" \
organization="Covert Lab at Stanford" \
website="https://www.covert.stanford.edu/"

COPY . /vivarium-ecoli
WORKDIR /vivarium-ecoli
COPY . /vEcoli
WORKDIR /vEcoli

RUN make clean compile
ENV PYTHONPATH=/vivarium-ecoli
ENV PYTHONPATH=/vEcoli

# Since this build runs as root, set permissions so running the container as
# another user will work: Aesara needs to write into the data dir it uses when
# running as a user with no home dir, and Parca writes into /vivarium-ecoli/cache/.
RUN (umask 000 && mkdir -p /.aesara /vivarium-ecoli/cache)
# running as a user with no home dir, and Parca writes into /vEcoli/cache/.
RUN (umask 000 && mkdir -p /.aesara /vEcoli/cache)

CMD ["/bin/bash"]
10 changes: 5 additions & 5 deletions runscripts/nextflow/analysis.nf
@@ -1,5 +1,5 @@
process analysisSingle {
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}/lineage_seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}"
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}/lineage_seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}", mode: "move"

tag "variant=${variant}/lineage_seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}"

@@ -47,7 +47,7 @@ process analysisSingle {
}

process analysisMultiDaughter {
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}/lineage_seed=${lineage_seed}/generation=${generation}"
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}/lineage_seed=${lineage_seed}/generation=${generation}", mode: "move"

tag "variant=${variant}/lineage_seed=${lineage_seed}/generation=${generation}"

@@ -93,7 +93,7 @@ process analysisMultiDaughter {
}

process analysisMultiGeneration {
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}/lineage_seed=${lineage_seed}"
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}/lineage_seed=${lineage_seed}", mode: "move"

tag "variant=${variant}/lineage_seed=${lineage_seed}"

@@ -137,7 +137,7 @@ process analysisMultiGeneration {
}

process analysisMultiSeed {
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}"
publishDir "${params.publishDir}/${params.experimentId}/analyses/variant=${variant}", mode: "move"

tag "variant=${variant}"

@@ -179,7 +179,7 @@ process analysisMultiSeed {
}

process analysisMultiVariant {
publishDir "${params.publishDir}/${params.experimentId}/analyses"
publishDir "${params.publishDir}/${params.experimentId}/analyses", mode: "move"

input:
path config
2 changes: 1 addition & 1 deletion runscripts/nextflow/colony.nf
@@ -9,7 +9,7 @@ process colony {

script:
"""
python /vivarium-ecoli/ecoli/experiments/ecoli_engine_process.py --config $config --sim_data_path $sim_data --initial_state_file $initial_state
python /vEcoli/ecoli/experiments/ecoli_engine_process.py --config $config --sim_data_path $sim_data --initial_state_file $initial_state
STATUS=$?
"""

13 changes: 11 additions & 2 deletions runscripts/nextflow/config.template
@@ -21,15 +21,24 @@ profiles {
// Codes: 137 (out-of-memory)
((task.exitStatus == 137)
&& (task.attempt <= process.maxRetries)) ? 'retry' : 'ignore' }
// Symlinks break when files go through process chains (nextflow#4845)
process.stageInMode = 'copy'
google.project = 'allen-discovery-center-mcovert'
google.location = 'us-west1'
google.batch.spot = true
docker.enabled = true
params.projectRoot = '/vivarium-ecoli'
params.projectRoot = '/vEcoli'
params.publishDir = "PUBLISH_DIR"
process.maxRetries = 1
// Check Google Cloud latest spot pricing / performance
process.machineType = 't2d-standard-1'
process.machineType = {
def cpus = task.cpus
def powerOf2 = 1
while (powerOf2 < cpus && powerOf2 < 64) {
powerOf2 *= 2
}
return "t2d-standard-${powerOf2}"
}
workflow.failOnIgnore = true
}
sherlock {
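
The new ``process.machineType`` closure rounds the requested CPU count up to the next power of two, capped at 64, and appends it to the ``t2d-standard`` family name. The logic is mirrored below in Python as a sketch; the function name and its keyword parameters are hypothetical, and the cap of 64 is taken from the closure itself.

```python
def machine_type_for(cpus, family="t2d-standard", max_cpus=64):
    """Return the machine type name for a requested CPU count."""
    size = 1
    # Double until the size covers the request or reaches the cap.
    while size < cpus and size < max_cpus:
        size *= 2
    return f"{family}-{size}"
```

A request for 3 CPUs maps to ``t2d-standard-4``, and any request above 64 CPUs maps to ``t2d-standard-64``.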
4 changes: 2 additions & 2 deletions runscripts/nextflow/sim.nf
@@ -1,5 +1,5 @@
process simGen0 {
publishDir path: "${params.publishDir}/${params.experimentId}/daughter_states/variant=${sim_data.getBaseName()}/seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}", pattern: "*.json"
publishDir path: "${params.publishDir}/${params.experimentId}/daughter_states/variant=${sim_data.getBaseName()}/seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}", pattern: "*.json", mode: "copy"

tag "variant=${sim_data.getBaseName()}/seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}"

@@ -52,7 +52,7 @@ process simGen0 {
}

process sim {
publishDir path: "${params.publishDir}/${params.experimentId}/daughter_states/variant=${sim_data.getBaseName()}/seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}", pattern: "*.json"
publishDir path: "${params.publishDir}/${params.experimentId}/daughter_states/variant=${sim_data.getBaseName()}/seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}", pattern: "*.json", mode: "copy"

tag "variant=${sim_data.getBaseName()}/seed=${lineage_seed}/generation=${generation}/agent_id=${agent_id}"
