Run bigscape v2 #251

Merged
merged 57 commits into from
Jul 17, 2024
Commits
d2dc38b
add BiG-SCAPE 2 to dependencies
adraismawur May 29, 2024
93848dd
add example config for bigscape 2
adraismawur May 29, 2024
9d0a2d0
implement running BiG-SCAPE
adraismawur May 29, 2024
6d40414
fix bigscape2 dependency
adraismawur May 31, 2024
02bc843
copy db file properly
adraismawur May 31, 2024
2351055
remove cluster arg
adraismawur May 31, 2024
1b43a31
run ruff formatter
adraismawur May 31, 2024
eb28d38
fix ruff check issues
adraismawur May 31, 2024
9293c64
ensure str for mypy static type checking
adraismawur Jun 3, 2024
f18ecb5
Merge branch 'dev' of github.com:NPLinker/nplinker into run-bigscape-v2
adraismawur Jun 6, 2024
4a0e86b
Move configuration to correct file
adraismawur Jun 7, 2024
7361a98
use os.path.join instead of string concat
adraismawur Jun 7, 2024
a329e43
fix merge mistake
adraismawur Jun 7, 2024
caf2711
remove extra bigscape 2 files
adraismawur Jun 14, 2024
525c707
add missing library
adraismawur Jun 14, 2024
2433d76
add validator for bigscape version
adraismawur Jun 14, 2024
8447272
add test for bigscape version
adraismawur Jun 14, 2024
f6330e9
fix typo
adraismawur Jun 14, 2024
bc096bf
Merge branch 'dev' of github.com:NPLinker/nplinker into run-bigscape-v2
adraismawur Jun 19, 2024
84095b7
add simple run testing
adraismawur Jun 19, 2024
21b4600
add test to check for nonextent input path
adraismawur Jun 19, 2024
a2b6eb8
add info to docstring
adraismawur Jul 15, 2024
c03f64a
add exception on invalid version number
adraismawur Jul 15, 2024
9e9758e
move log to after validation
adraismawur Jul 15, 2024
9e8c767
add version info to log
adraismawur Jul 15, 2024
e9f7345
use specific exception
adraismawur Jul 15, 2024
775cbf5
rework return codes and exceptions
adraismawur Jul 15, 2024
874ea3a
add wrong version test
adraismawur Jul 16, 2024
bd699de
add invalid path test for v2
adraismawur Jul 16, 2024
3189999
specify exception
adraismawur Jul 16, 2024
19e72f2
fix tests not correctly running
adraismawur Jul 16, 2024
a9c9cec
change imports to reflect style in other tests
adraismawur Jul 16, 2024
2164f6c
specify exception type
adraismawur Jul 16, 2024
92578fd
add minimal test data
adraismawur Jul 16, 2024
3ac3e91
add real data tests
adraismawur Jul 16, 2024
0db25d8
remove class
adraismawur Jul 16, 2024
65fa549
force string for mypy
adraismawur Jul 16, 2024
3096bcc
Apply suggestions from code review
adraismawur Jul 17, 2024
d4cf769
add exceptions to docstring
adraismawur Jul 17, 2024
c00d59c
add docstring to tests
adraismawur Jul 17, 2024
18b2317
use tmp path instead of data path
adraismawur Jul 17, 2024
8a356a5
add missing typing
adraismawur Jul 17, 2024
5726f22
add explanation of cluster mode
adraismawur Jul 17, 2024
aab5e69
parameterize tests
adraismawur Jul 17, 2024
1d6da60
remove two gbks
adraismawur Jul 17, 2024
7914f3a
better documentation
adraismawur Jul 17, 2024
88c1f19
skip tests with dataset
adraismawur Jul 17, 2024
195c791
do not check output code within run
adraismawur Jul 17, 2024
6cc45d1
move log
adraismawur Jul 17, 2024
4a288a9
add test with incorrect parameters for runtime exception
adraismawur Jul 17, 2024
a24454c
remove temporary nplinker.toml
adraismawur Jul 17, 2024
a4b3a46
add stderr to error log
adraismawur Jul 17, 2024
84eb933
add import needed for skipping test on CI
adraismawur Jul 17, 2024
69f7674
Apply suggestions from code review
adraismawur Jul 17, 2024
5cadd45
expand docstring
adraismawur Jul 17, 2024
1efc8fd
Apply suggestions from code review
adraismawur Jul 17, 2024
bdd1f8e
fix ruff complaints
adraismawur Jul 17, 2024
16 changes: 16 additions & 0 deletions bin/install-nplinker-deps
@@ -125,6 +125,7 @@ pip install -q -U pip setuptools
 echo "🔥 Start installing BigScape ..."
 [[ -d BiG-SCAPE ]] || git clone https://github.com/medema-group/BiG-SCAPE.git
 cd BiG-SCAPE
+git reset --hard
 git config --add advice.detachedHead false # disable advice
 git config pull.ff only
 git checkout master
@@ -136,6 +137,21 @@ echo "🔥 Start installing BigScape ..."
 chmod 775 Annotated_MIBiG_reference
 ln -sf $LIB_PATH/BiG-SCAPE/bigscape.py $PY_PATH/bin
 cd ..
+# blob size limit to remove large files left in history
+[[ -d BiG-SCAPE-v2 ]] || git clone -b dev --filter=blob:limit=10m https://github.com/medema-group/BiG-SCAPE.git BiG-SCAPE-v2
+cd BiG-SCAPE-v2
+git config --add advice.detachedHead false
+git checkout dfb0d78427e020aab2c72cc741327ccd102470a1 # specific commit that includes important fixes to v2
+pip install click
+pip install sqlalchemy
+pip install pyhmmer
+chmod 754 bigscape.py
+ln -sf $LIB_PATH/BiG-SCAPE-v2/bigscape.py $PY_PATH/bin/bigscape-v2.py
+ln -sf $LIB_PATH/BiG-SCAPE-v2/config.ini $PY_PATH/bin # new system of configuration needs default config file
+ln -sf $LIB_PATH/BiG-SCAPE-v2/big_scape $PY_PATH/bin # folder is needed for some files
+cd ..
+
+
 echo -e "✅ BigScape installed successfully\n"

 #--- Install FastTree (not support Windows, required by BigScape)
35 changes: 25 additions & 10 deletions src/nplinker/arranger.py
@@ -263,21 +263,34 @@ def _run_bigscape(self) -> None:
         default BiG-SCAPE directory.
         """
         defaults.BIGSCAPE_RUNNING_OUTPUT_PATH.mkdir(exist_ok=True, parents=True)
+
+        version = config.bigscape.version
+
         run_bigscape(
             defaults.ANTISMASH_DEFAULT_PATH,
             defaults.BIGSCAPE_RUNNING_OUTPUT_PATH,
             config.bigscape.parameters,
+            version,
         )
-        for f in glob(
-            str(
-                defaults.BIGSCAPE_RUNNING_OUTPUT_PATH
-                / "network_files"
-                / "*"
-                / "mix"
-                / "mix_clustering_c*.tsv"
+
+        if version == 1:
+            for f in glob(
+                str(
+                    defaults.BIGSCAPE_RUNNING_OUTPUT_PATH
+                    / "network_files"
+                    / "*"
+                    / "mix"
+                    / "mix_clustering_c*.tsv"
+                )
+            ):
+                shutil.copy(f, defaults.BIGSCAPE_DEFAULT_PATH)
+        elif version == 2:
+            shutil.copy(
+                defaults.BIGSCAPE_RUNNING_OUTPUT_PATH / "data_sqlite.db",
+                defaults.BIGSCAPE_DEFAULT_PATH,
             )
-        ):
-            shutil.copy(f, defaults.BIGSCAPE_DEFAULT_PATH)
+        else:
+            raise ValueError(f"Invalid BiG-SCAPE version: {version}")

     def arrange_strain_mappings(self) -> None:
         """Arrange the strain mappings file.
@@ -319,7 +332,9 @@ def _validate_strain_mappings(self) -> None:

     def _generate_strain_mappings(self) -> None:
         """Generate the strain mappings file for the PODP mode."""
-        podp_json_file = defaults.DOWNLOADS_DEFAULT_PATH / f"paired_datarecord_{config.podp_id}.json"
+        podp_json_file = (
+            defaults.DOWNLOADS_DEFAULT_PATH / f"paired_datarecord_{config.podp_id}.json"
+        )
         genome_status_json_file = defaults.DOWNLOADS_DEFAULT_PATH / GENOME_STATUS_FILENAME
         genome_bgc_mappings_file = defaults.ANTISMASH_DEFAULT_PATH / GENOME_BGC_MAPPINGS_FILENAME
         gnps_file_mapping_file = self.gnps_file_mappings_file
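The version-dependent copy step in `_run_bigscape` can be tried in isolation with a small helper. This is only a sketch: `collect_bigscape_output` and its arguments are hypothetical names, not part of NPLinker, and the file layout (per-cutoff `mix_clustering_c*.tsv` files for version 1, a single `data_sqlite.db` for version 2) is taken from the hunk above.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def collect_bigscape_output(running_dir: Path, target_dir: Path, version: int) -> list[Path]:
    """Copy the result files for the given BiG-SCAPE version into target_dir.

    Hypothetical helper for illustration only; not part of NPLinker.
    """
    if version == 1:
        # v1: clustering TSVs under network_files/<run>/mix/
        sources = sorted(running_dir.glob("network_files/*/mix/mix_clustering_c*.tsv"))
    elif version == 2:
        # v2: a single SQLite database in the output directory
        sources = [running_dir / "data_sqlite.db"]
    else:
        raise ValueError(f"Invalid BiG-SCAPE version: {version}")

    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file
    return [Path(shutil.copy(src, target_dir)) for src in sources]
```

Validating the version before touching the target directory means an unsupported value fails without side effects.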
32 changes: 28 additions & 4 deletions src/nplinker/genomics/bigscape/runbigscape.py
@@ -15,22 +15,46 @@ def run_bigscape(
     antismash_path: str | PathLike,
     output_path: str | PathLike,
     extra_params: str,
+    version: int = 1,
 ):
-    bigscape_py_path = "bigscape.py"
+    """Runs BiG-SCAPE to cluster BGCs.
+
+    The behavior of this function is slightly different depending on the version of
+    BiG-SCAPE that is set to run using the configuration file.
+    Mostly this means a different set of parameters is used between the two versions.
+    """
+    # switch to correct version of BiG-SCAPE
+    if version == 1:
+        bigscape_py_path = "bigscape.py"
+    elif version == 2:
+        bigscape_py_path = "bigscape-v2.py"
+
     logger.info(
         f'run_bigscape: input="{antismash_path}", output="{output_path}", extra_params="{extra_params}"'
     )

     try:
         subprocess.run([bigscape_py_path, "-h"], capture_output=True, check=True)
     except Exception as e:
-        raise Exception(f"Failed to find/run bigscape.py (path={bigscape_py_path}, err={e})") from e
+        raise Exception(
+            f"Failed to find/run bigscape.py (path={bigscape_py_path}, err={e})"
+        ) from e

     if not os.path.exists(antismash_path):
         raise Exception(f'antismash_path "{antismash_path}" does not exist!')

-    # configure the IO-related parameters, including pfam_dir
-    args = [bigscape_py_path, "-i", antismash_path, "-o", output_path, "--pfam_dir", PFAM_PATH]
+    # assemble arguments. first argument is the python file
+    args = [bigscape_py_path]
+
+    # version 2 points to specific Pfam file, version 1 points to directory
+    # version 2 also requires the cluster subcommand
+    if version == 1:
+        args.extend(["--pfam_dir", PFAM_PATH])
+    elif version == 2:
+        args.extend(["cluster", "--pfam_path", PFAM_PATH + "/Pfam-A.hmm"])
+
+    # add input and output paths. these are unchanged
+    args.extend(["-i", str(antismash_path), "-o", str(output_path)])

     # append the user supplied params, if any
     if len(extra_params) > 0:
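The argument assembly in `run_bigscape` can be separated from the subprocess call so that it is testable without BiG-SCAPE installed. The sketch below assumes a hypothetical function `build_bigscape_args`; the script names, the `cluster` subcommand and the `--pfam_dir`, `--pfam_path`, `-i` and `-o` flags come from the hunk above. Splitting `extra_params` with `shlex.split` and raising on an unknown version are choices made for this example, not necessarily what the PR does at this commit.

```python
from __future__ import annotations

import os
import shlex


def build_bigscape_args(
    antismash_path,
    output_path,
    pfam_dir,
    extra_params: str = "",
    version: int = 1,
) -> list[str]:
    """Return the command line for the requested BiG-SCAPE version.

    Hypothetical helper for illustration only; not part of NPLinker.
    """
    if version == 1:
        # v1 takes the directory that contains the Pfam database
        args = ["bigscape.py", "--pfam_dir", str(pfam_dir)]
    elif version == 2:
        # v2 needs the cluster subcommand and the path to the HMM file itself
        hmm_file = os.path.join(str(pfam_dir), "Pfam-A.hmm")
        args = ["bigscape-v2.py", "cluster", "--pfam_path", hmm_file]
    else:
        raise ValueError(f"Invalid BiG-SCAPE version: {version}")

    # input and output flags are the same for both versions
    args.extend(["-i", str(antismash_path), "-o", str(output_path)])

    # assumption: user-supplied parameters are split like a shell would split them
    if extra_params:
        args.extend(shlex.split(extra_params))
    return args
```

Raising on an unsupported version here avoids reaching the subprocess call with an undefined script path.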
5 changes: 5 additions & 0 deletions src/nplinker/nplinker_default.toml
@@ -9,7 +9,12 @@ to_use = true
 version = "3.1"

 [bigscape]
+version = 1
 parameters = "--mibig --clans-off --mix --include_singletons --cutoffs 0.30"
+
+# for version 2, use the following parameters string:
+# parameters = "--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30"
+
 cutoff = "0.30"

 [scoring]
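Because the parameter string differs between the two versions, it can help to check the `[bigscape]` table before running anything. The sketch below works on a plain dict and is not NPLinker's actual validation code; `resolve_bigscape_settings`, `ALLOWED_KEYS` and `DEFAULT_PARAMETERS` are hypothetical names. The two parameter strings are the ones from the default configuration above. Rejecting unknown keys is an assumption added for this example so that a misspelled `version` key does not silently fall back to the default.

```python
from __future__ import annotations

# keys that appear in the [bigscape] table of the default configuration
ALLOWED_KEYS = {"version", "parameters", "cutoff"}

# parameter strings taken from nplinker_default.toml in this PR
DEFAULT_PARAMETERS = {
    1: "--mibig --clans-off --mix --include_singletons --cutoffs 0.30",
    2: "--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30",
}


def resolve_bigscape_settings(settings: dict) -> tuple[int, str]:
    """Return (version, parameters) for a [bigscape] table given as a dict.

    Hypothetical helper for illustration only; not part of NPLinker.
    """
    unknown = set(settings) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown [bigscape] keys: {sorted(unknown)}")

    version = settings.get("version", 1)
    # bool is a subclass of int, so it is excluded explicitly
    if (
        isinstance(version, bool)
        or not isinstance(version, int)
        or version not in DEFAULT_PARAMETERS
    ):
        raise ValueError(f"Invalid BiG-SCAPE version: {version!r}")

    # fall back to the version-specific default when no parameters are given
    parameters = settings.get("parameters") or DEFAULT_PARAMETERS[version]
    return version, parameters
```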