Feature: extend barcode collection search #323

Merged
42 commits
0cd94f1
Update barcode_collection with recursive search of barcode subfolders
marcomoscasgr Jun 3, 2024
447264e
Fix to barcode_collection function when checking tag ids
marcomoscasgr Jun 3, 2024
cea4344
check tag_id and barcode collection
marcomoscasgr Jun 4, 2024
fff9982
fix to dictionary items returned
marcomoscasgr Jun 4, 2024
23f3c02
Fix to barcode_collections to output multiple barcode folders
marcomoscasgr Jun 5, 2024
ef41cbd
Add tests for metadata update on rebasecalled runs
marcomoscasgr Jun 6, 2024
af0b721
Update html report tests
marcomoscasgr Jun 6, 2024
408626b
update mlwh queries tests
marcomoscasgr Jun 6, 2024
667204a
Update metadata update tests
marcomoscasgr Jun 6, 2024
5d9747c
Test rebasecalled data with 1 exp name and 1 slot
marcomoscasgr Jun 6, 2024
c999dde
comments in test html report
marcomoscasgr Jun 6, 2024
b360a0f
fix to barcode path variable in metadata update test
marcomoscasgr Jun 7, 2024
b3fd7b3
fix for metadata update of each barcode folder
marcomoscasgr Jun 7, 2024
f3b227f
Update barcode_collection with recursive search of barcode subfolders
marcomoscasgr Jun 3, 2024
74561d6
Fix to barcode_collection function when checking tag ids
marcomoscasgr Jun 3, 2024
4bf3123
check tag_id and barcode collection
marcomoscasgr Jun 4, 2024
b70af01
fix to dictionary items returned
marcomoscasgr Jun 4, 2024
fd465c8
Fix to barcode_collections to output multiple barcode folders
marcomoscasgr Jun 5, 2024
006ef30
Add tests for metadata update on rebasecalled runs
marcomoscasgr Jun 6, 2024
3472fe4
Update html report tests
marcomoscasgr Jun 6, 2024
90adf9e
update mlwh queries tests
marcomoscasgr Jun 6, 2024
ca1677e
Update metadata update tests
marcomoscasgr Jun 6, 2024
bd6665b
Test rebasecalled data with 1 exp name and 1 slot
marcomoscasgr Jun 6, 2024
3d3c862
comments in test html report
marcomoscasgr Jun 6, 2024
9a59199
fix to barcode path variable in metadata update test
marcomoscasgr Jun 7, 2024
0aa147c
fix for metadata update of each barcode folder
marcomoscasgr Jun 7, 2024
de86163
Merge branch 'feature/extend-barcode-collection-search' of github.com…
marcomoscasgr Jun 7, 2024
6756334
Add 2024 to copyright statement
marcomoscasgr Jun 7, 2024
d3e8ed8
Update test run folder names
marcomoscasgr Jun 10, 2024
49ab479
Update paths for rebasecalled testdata
marcomoscasgr Jun 11, 2024
b35f534
Handle duplicated barcode folder names under the same collection
marcomoscasgr Jun 12, 2024
83911d7
Update barcode_collection function to use parent folders to check bar…
marcomoscasgr Jun 13, 2024
b290a48
Update barcode_collection-related tests
marcomoscasgr Jun 13, 2024
de5a504
Update docstring of barcode_collection
marcomoscasgr Jun 13, 2024
98780f5
Add test about missing barcode folders with no process interruption
marcomoscasgr Jun 14, 2024
7685b32
Remove test folder after testing barcode_collections
marcomoscasgr Jun 14, 2024
9be3e7c
add subfolder in missing folders test data
marcomoscasgr Jun 14, 2024
dc6857d
add path argument in log.warn of barcode_collections
marcomoscasgr Jun 17, 2024
5647ec5
typo in actual_tag_identifiers in missing folder test of barcode_coll…
marcomoscasgr Jun 17, 2024
489ad52
remove space between format string and colon in conftest.py
marcomoscasgr Jun 17, 2024
365945f
fix to log.warn and path argument
marcomoscasgr Jun 17, 2024
55a22e1
remove str from purepath in strings
marcomoscasgr Jun 17, 2024
62 changes: 49 additions & 13 deletions src/npg_irods/ont.py
@@ -464,10 +464,10 @@ def barcode_name_from_id(tag_identifier: str) -> str:
 
 def barcode_collections(coll: Collection, *tag_identifier) -> list[Collection]:
     """Return the barcode-specific sub-collections that exist under the specified
-    collection.
+    collection basecalled and deplexed on instrument or offline.
 
     The arrangement of these collections mirrors the directory structure created by the
-    guppy basecaller. E.g. for tag identifier NB01:
+    guppy/dorado basecaller. E.g. for tag identifier NB01:
 
         <coll>/fast5_pass/barcode01
         ...
@@ -477,7 +477,23 @@ def barcode_collections(coll: Collection, *tag_identifier) -> list[Collection]:
         ...
         <coll>/fastq_fail/barcode01
 
-    etc.
+    ...or for rebasecalled runs:
+
+        <coll>/pass/barcode01
+        ...
+        <coll>/pass/barcode02
+        ...
+
+    ...or for old rebasecalled runs (up until June 2024):
+
+        <coll>/barcode01
+        ...
+        <coll>/barcode02
+        ...
+
+    If collection paths contain duplicated barcode folder names,
+    it will raise a ValueError.
+    E.g. <coll>/pass/barcode01/.../barcode01
 
     Args:
         coll: A collection to search.
@@ -486,26 +502,46 @@ def barcode_collections(coll: Collection, *tag_identifier) -> list[Collection]:
 
     Returns:
         A sorted list of existing collections.
+
+    Raises:
+        ValueError: Duplicated barcode folder names are found in a path
     """
     bcolls = []
 
-    sub_colls = [item for item in coll.contents() if item.rods_type == Collection]
-    for sc in sub_colls:  # fast5_fail, fast5_pass etc
-        # These are some known special cases that don't have barcode directories
-        if sc.path.name in IGNORED_DIRECTORIES:
-            log.debug("Ignoring", path=sc)
-            continue
-
+    barcode_folders = [barcode_name_from_id(tag_id) for tag_id in tag_identifier]
+    sub_colls = [
+        item
+        for item in coll.contents(recurse=True)
+        if item.rods_type == Collection and item.path.name in barcode_folders
+    ]
+
+    parents = set()
+    for sc in sub_colls:
+        duplicated = re.findall(r"(barcode\d+)", str(sc.path))
+        if len(duplicated) > 1:
+            msg = (
+                f"Incorrect barcode folder path {str(sc.path)}. "
+                f"Contains multiple barcode folders {duplicated}"
+            )
+            log.error(msg)
+            raise ValueError(msg)
+        parents.add(str(sc.path.parent))
+
+    for parent in parents:
         for tag_id in tag_identifier:
-            bpath = sc.path / barcode_name_from_id(tag_id)
+            bpath = PurePath(parent, barcode_name_from_id(tag_id))
             bcoll = Collection(bpath)
             if bcoll.exists():
                 bcolls.append(bcoll)
             else:
                 # LIMS says there is a tag identifier, but there is no sub-collection,
                 # so possibly this was not deplexed on-instrument for some reason e.g.
                 # a non-standard tag set was used
-                log.warn("No barcode sub-collection", path=bcoll, tag_identifier=tag_id)
+                log.warn(
+                    "No barcode sub-collection",
+                    parent=parent,
+                    subfolder=barcode_name_from_id(tag_id),
+                    tag_identifier=tag_id,
+                )
     bcolls.sort()
 
     return bcolls
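The new search in barcode_collections can be traced on plain paths. The sketch below is a minimal, hypothetical re-creation, not the npg_irods code: a list of path strings stands in for the sub-collections returned by coll.contents(recurse=True), set membership replaces Collection.exists(), and barcode_name is an assumed stand-in for barcode_name_from_id that maps a trailing number such as NB01 to barcode01.

```python
import re
from pathlib import PurePosixPath


def barcode_name(tag_id: str) -> str:
    # Hypothetical stand-in for barcode_name_from_id: assumes the tag
    # identifier ends in a number, e.g. "NB01" -> "barcode01"
    match = re.search(r"(\d+)$", tag_id)
    if match is None:
        raise ValueError(f"Unexpected tag identifier {tag_id}")
    return f"barcode{int(match.group(1)):02d}"


def find_barcode_dirs(all_dirs: list[str], *tag_ids: str) -> list[PurePosixPath]:
    # all_dirs stands in for every sub-collection found by a recursive
    # listing; membership in it replaces Collection.exists()
    existing = {PurePosixPath(d) for d in all_dirs}
    wanted = {barcode_name(t) for t in tag_ids}

    # 1. Find the wanted barcode folders at any depth
    found = [p for p in existing if p.name in wanted]

    # 2. Reject paths naming more than one barcode folder; keep each parent
    parents = set()
    for p in found:
        names = re.findall(r"(barcode\d+)", str(p))
        if len(names) > 1:
            raise ValueError(f"Path {p} contains multiple barcode folders {names}")
        parents.add(p.parent)

    # 3. Look for every tag under every parent; the real function logs
    #    "No barcode sub-collection" for a missing one and carries on
    result = []
    for parent in parents:
        for t in tag_ids:
            candidate = parent / barcode_name(t)
            if candidate in existing:
                result.append(candidate)
    return sorted(result)


# The three layouts described in the docstring
on_instrument = [
    "/run/fastq_pass",
    "/run/fastq_pass/barcode01",
    "/run/fastq_pass/barcode02",
    "/run/fastq_fail",
    "/run/fastq_fail/barcode01",
]
rebasecalled = ["/run/pass", "/run/pass/barcode01", "/run/pass/barcode02"]
old_rebasecalled = ["/run/barcode01", "/run/barcode02"]

for layout in (on_instrument, rebasecalled, old_rebasecalled):
    print([str(p) for p in find_barcode_dirs(layout, "NB01", "NB02")])
```

Collecting parents first means that every tag is checked under every directory that holds at least one barcode folder, so the missing fastq_fail/barcode02 above is the case where the real function would warn. A path such as /run/pass/barcode01/x/barcode01 raises ValueError, as the docstring describes.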
@@ -0,0 +1 @@
+7b8a278d0e2fc81a08c48f66cdde07fe
@@ -0,0 +1 @@
+ed7f8cb02e745bd9d30b44e365da7877
@@ -0,0 +1 @@
+940b04002f3353cba60bc4e12cfd8c19
@@ -0,0 +1 @@
+dba90887e879d79026f9ce9969e0c683
@@ -0,0 +1 @@
+cf4f5714965c1c4259f4848006f718c4
@@ -0,0 +1 @@
+0a598ac81ad24e1972e676dd44b3eb9a
@@ -0,0 +1 @@
+717442c288c19611cbf24425266d5d3c
@@ -0,0 +1 @@
+e0bd998a05714aceb11603743f498a20
@@ -0,0 +1 @@
+e1b208b290d4dd316bf1550913e3d1e5
@@ -0,0 +1 @@
+3b21628cf224eb71388bfe17bc7b70ea
@@ -0,0 +1 @@
+417fd2fa72fd2fa5a9c78c4d57079219
@@ -0,0 +1 @@
+df0deb917e1de059c6f2bc73f714f5f2
@@ -0,0 +1 @@
+68b43c2f74ca8e99edb3991cf82d7528
@@ -0,0 +1 @@
+ce555bf560b4dff924fb744842406fef
@@ -0,0 +1 @@
+d421761de9f680f41dfc2819ee066b1c
@@ -0,0 +1 @@
+e9532a59552ee99b47356225cba6b38e
@@ -0,0 +1 @@
+5a509ab6eba17afb5d31437d1bbaa047
@@ -0,0 +1 @@
+55bec70e6b21fd8bd9c0dd438423d0dd
@@ -0,0 +1 @@
+d699af01b8c46ca50c5b4da184133e9d
@@ -0,0 +1 @@
+9daadee300bd8adb8f57679973162220
@@ -0,0 +1 @@
+52ef184cf1845037490b3d424543596f
@@ -0,0 +1 @@
+4f3af0cc3d20d7e08ce867b6a4a175de
@@ -0,0 +1 @@
+bbe3abcc99820a6b4944cbfee3bb72fb
@@ -0,0 +1 @@
+a3c859f159f98f660d46b408ac3870e9
@@ -0,0 +1 @@
+78586e462f4b26766d9a8ad3ce255767
@@ -0,0 +1 @@
+14c0f5b8145e98f8085b4f6bec83dad3
@@ -0,0 +1 @@
+c5cb38066a395a49d8c2129e198b0099
@@ -0,0 +1 @@
+c3b3d18b3527a3d322d46e75db6bdd81
@@ -0,0 +1 @@
+3233f1563cabc466c622295ef68a7bf5
@@ -0,0 +1 @@
+0698c00577fd02fcd707b2c5fb095aff
@@ -0,0 +1 @@
+eeb7ebcd7191568c46e3d667691a65ff
@@ -0,0 +1 @@
+d6deec72a77e2931e67dbaea48695fe3
@@ -0,0 +1 @@
+1548be5901f49c15cd4a62c429bbfffd
@@ -0,0 +1 @@
+b7e6163e8156ca7b1eb50020b9736c83
@@ -0,0 +1 @@
+478eca62ec90d58587338ef3f85edde3
@@ -0,0 +1 @@
+f87b3805af1c633ba99ab8299db408eb
@@ -0,0 +1 @@
+10e24e3655cf59ba00cb23ebbb055594
@@ -0,0 +1 @@
+7363b16865430a7c002315fdc81e9bee
@@ -0,0 +1 @@
+186a642ed809d851cdc2af39db84c446
@@ -0,0 +1 @@
+4d50318ef77bd07b4b5e4eff0ef0b1e6
@@ -0,0 +1 @@
+2e4cb1336dcfcc88b18c3a3eda2abe3d
@@ -0,0 +1 @@
+a0a361557da7b021e59262c8274cb667
@@ -0,0 +1 @@
+bb10f2eea05af9dfb174d932c958d6c4
@@ -0,0 +1 @@
+5c7ad7da18da6bf60c42c3d0bf72943d
107 changes: 97 additions & 10 deletions tests/ont/conftest.py
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 #
-# Copyright © 2021, 2022, 2023 Genome Research Ltd. All rights reserved.
+# Copyright © 2021, 2022, 2023, 2024 Genome Research Ltd. All rights reserved.
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -122,12 +122,13 @@ def make_simple_flowcell(ex, sl, n):
             flowcells.append(make_simple_flowcell(expt, slot, sample_idx))
             sample_idx += 1
 
-    def make_mplex_flowcell(ex, sl, tid, bc, n):
-        """Make a multiplexed flowcell give an experiment number, instrument slot,
-        tag identifier, barcode and sample index."""
+    def make_mplex_flowcell(ex_name, ex_n, fc_start, sl, tid, bc, n):
+        """Make a multiplexed flowcell given an experiment name, experiment number,
+        flowcell start idx, instrument slot, tag identifier, barcode and sample index.
+        """
 
         when = EARLY  # All the even experiments have the early datetime
-        if ex % 2 == 1:
+        if ex_n % 2 == 1:
             when = LATE  # All the odd experiments have the late datetime
         if sl % 2 == 1:
             when = LATEST  # Or latest if they have an odd instrument position
@@ -137,9 +138,9 @@ def make_mplex_flowcell(ex, sl, tid, bc, n):
             study=study_z,
             instrument_name=instrument_name,
             instrument_slot=sl,
-            experiment_name=f"multiplexed_experiment_{ex :0>3}",
+            experiment_name=f"{ex_name}_{ex_n :0>3}",
             id_lims=f"Example LIMS ID {n}",
-            id_flowcell_lims=f"flowcell{sl + 100 :0>3}",
+            id_flowcell_lims=f"flowcell{sl + fc_start :0>3}",
             tag_set_id_lims="Example LIMS tag set ID",
             tag_set_name="EXP-NBD104",
             tag_sequence=bc,
@@ -172,7 +173,44 @@ def make_mplex_flowcell(ex, sl, tid, bc, n):
                 # MinKNOW.
                 tag_id = ont_tag_identifier(barcode_idx + 1)
                 flowcells.append(
-                    make_mplex_flowcell(expt, slot, tag_id, barcode, msample_idx)
+                    make_mplex_flowcell(
+                        "multiplexed_experiment",
+                        expt,
+                        100,
+                        slot,
+                        tag_id,
+                        barcode,
+                        msample_idx,
+                    )
                 )
                 msample_idx += 1
+
+    msample_idx = 0
+    for expt in range(1, 2):
+        for slot in range(1, 2):
+            for barcode_idx, barcode in enumerate(barcodes[:4]):
+                tag_id = ont_tag_identifier(barcode_idx + 1)
+                flowcells.extend(
+                    [
+                        make_mplex_flowcell(
+                            "old_rebasecalled_multiplexed_experiment",
+                            expt,
+                            200,
+                            slot,
+                            tag_id,
+                            barcode,
+                            msample_idx,
+                        ),
+                        make_mplex_flowcell(
+                            "rebasecalled_multiplexed_experiment",
+                            expt,
+                            300,
+                            slot,
+                            tag_id,
+                            barcode,
+                            msample_idx,
+                        ),
+                    ]
+                )
+                msample_idx += 1
 
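The new ex_name, ex_n and fc_start arguments decide the identifiers each synthetic flowcell receives. The helpers below are hypothetical, written only to reproduce the fixture's f-strings and its EARLY/LATE/LATEST choice; plain strings stand in for the fixture's datetime constants, and none of these functions exist in the test suite.

```python
def flowcell_id(slot: int, fc_start: int) -> str:
    # Same f-string as id_flowcell_lims: offset the slot, zero-pad to width 3
    return f"flowcell{slot + fc_start:0>3}"


def experiment_name(ex_name: str, ex_n: int) -> str:
    # Same pattern as the experiment_name argument, e.g. "..._001"
    return f"{ex_name}_{ex_n:0>3}"


def when_label(ex_n: int, slot: int) -> str:
    # Mirrors the EARLY/LATE/LATEST choice in make_mplex_flowcell;
    # strings stand in for the fixture's datetime constants
    when = "EARLY"  # even experiment numbers
    if ex_n % 2 == 1:
        when = "LATE"  # odd experiment numbers
    if slot % 2 == 1:
        when = "LATEST"  # an odd instrument slot overrides both
    return when


# Experiment 1, slot 1 for each of the three families and their offsets
for name, start in [
    ("multiplexed_experiment", 100),
    ("old_rebasecalled_multiplexed_experiment", 200),
    ("rebasecalled_multiplexed_experiment", 300),
]:
    print(experiment_name(name, 1), flowcell_id(1, start), when_label(1, 1))
```

With slot numbers below 100 the offsets 100, 200 and 300 keep the three families' flowcell IDs apart, and flowcell201 and flowcell301 match the id_flowcell strings that ont_synthetic_irods builds for the old and new rebasecalled run folders.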
@@ -245,8 +283,57 @@ def ont_synthetic_irods(tmp_path):
             ]
             coll.add_metadata(*meta)
 
-    # We have synthetic data only for simple_experiment_001 and
-    # multiplexed_experiment_001
+    for expt in range(1, 2):
+        for slot in range(1, 2):
+            expt_name = f"old_rebasecalled_multiplexed_experiment_{expt :0>3}"
+            id_flowcell = f"flowcell{slot + 200 :0>3}"
+            run_folder = f"20190904_1514_GA{slot}0000_{id_flowcell}_b4a1fd79"
+            path = PurePath(
+                expt_root,
+                expt_name,
+                run_folder,
+                "dorado",
+                "7.2.13",
+                "sup",
+                "simplex",
+                "normal",
+                "default",
+            )
+            coll = Collection(path)
+            coll.create(parents=True)
+            meta = [
+                AVU(ont.Instrument.EXPERIMENT_NAME, expt_name),
+                AVU(ont.Instrument.INSTRUMENT_SLOT, f"{slot}"),
+            ]
+            coll.add_metadata(*meta)
+
+    for expt in range(1, 2):
+        for slot in range(1, 2):
+            expt_name = f"rebasecalled_multiplexed_experiment_{expt :0>3}"
+            id_flowcell = f"flowcell{slot + 300 :0>3}"
+            run_folder = f"20190904_1514_GA{slot}0000_{id_flowcell}_08c179cd"
+            path = PurePath(
+                expt_root,
+                expt_name,
+                run_folder,
+                "dorado",
+                "7.2.13",
+                "sup",
+                "simplex",
+                "normal",
+                "default",
+            )
+            coll = Collection(path)
+            coll.create(parents=True)
+            meta = [
+                AVU(ont.Instrument.EXPERIMENT_NAME, expt_name),
+                AVU(ont.Instrument.INSTRUMENT_SLOT, f"{slot}"),
+            ]
+            coll.add_metadata(*meta)
+
+    # We have synthetic data only for simple_experiment_001,
+    # multiplexed_experiment_001, rebasecalled_multiplexed_experiment_001
+    # and old_rebasecalled_multiplexed_experiment_001
     iput("./tests/data/ont/synthetic", rods_path, recurse=True)
 
     try:
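Both new loops in ont_synthetic_irods build a run collection path of the same shape, differing only in experiment name, flowcell offset and run-folder suffix. The helper below is a hypothetical sketch that reproduces that path with pathlib; the expt_root value /testZone/ont is an assumption for illustration only.

```python
from pathlib import PurePosixPath

# Sub-path below the run folder used by both rebasecalled fixture loops
DORADO_SUBPATH = ("dorado", "7.2.13", "sup", "simplex", "normal", "default")


def rebasecalled_run_path(
    expt_root: str, expt_name: str, slot: int, fc_start: int, run_suffix: str
) -> PurePosixPath:
    # Hypothetical helper: same flowcell ID and run-folder f-strings as the fixture
    id_flowcell = f"flowcell{slot + fc_start:0>3}"
    run_folder = f"20190904_1514_GA{slot}0000_{id_flowcell}_{run_suffix}"
    return PurePosixPath(expt_root, expt_name, run_folder, *DORADO_SUBPATH)


print(
    rebasecalled_run_path(
        "/testZone/ont", "rebasecalled_multiplexed_experiment_001", 1, 300, "08c179cd"
    )
)
```

The fixture creates this deepest collection (ending in default) and attaches the experiment name and instrument slot AVUs to it.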
12 changes: 10 additions & 2 deletions tests/ont/test_html_reports.py
@@ -38,6 +38,14 @@ def test_ont_runs_html_report(self, ont_synthetic_irods):
         links = [x for x in doc.result if x.startswith('<a href="/testZone/')]
 
         expected_colls = 40
+        expected_rebasecalled_colls = 2
         expected_objs = 3
-
-        assert len(links) == expected_colls + expected_objs
+        expected_rebasecalled_objs = 2
+
+        assert (
+            len(links)
+            == expected_colls
+            + expected_rebasecalled_colls
+            + expected_objs
+            + expected_rebasecalled_objs
+        )