Updated documentation
mgcam committed Oct 25, 2024
1 parent 250d960 commit 825d5a0
Showing 2 changed files with 141 additions and 92 deletions.
25 changes: 25 additions & 0 deletions Changes
@@ -1,5 +1,30 @@
LIST OF CHANGES FOR NPG-QC PACKAGE

- npg_qc::autoqc::qc_store - changed loading of ORM classes into memory
to on demand at run time.
- Added lane-level assessment to RoboQC:
- Extended npg_qc::Schema::Result::Review class to enable saving of
sequencing (lane-level) QC outcomes to database
- Extended npg_qc::autoqc::check::review to enable lane-level assessment
and documented the new feature.
- Made redundant 'qc_type' key of the 'robo' configuration.
- Fixed a bug in creating a list of class names in
npg_qc::autoqc::check::review.
A list of names for result classes to be loaded was inferred
from all criteria in the 'robo' section under consideration.
Until now this did not present a problem since, even when a
number of different criteria sections were present in the
study RoboQC section of the configuration file, it was possible
to load all of the AutoQC results for any given entity.
The bug is revealed when lane-level assessment is added to
the 'robo' section under the 'default' configuration. This
type of assessment might require the presence of lane-level autoqc
results for checks that are never performed on sample level,
the 'tag_metrics' check being the most obvious example.
The fix is to infer the list of class names from the criteria,
which are going to be evaluated for the entity, rather than
from a collection of criteria from the whole 'robo' section.
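
The difference between the two ways of inferring class names can be sketched in plain Perl. This is a simplified illustration and not the code from npg_qc::autoqc::checks::review: the data layout, the 'level' key, the attribute names in the expressions and the helper class_names_from are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical, simplified 'robo' section. Each criteria object lists
# expressions that start with the name of an autoqc check.
my @robo_criteria = (
  { level => 'lane',   acceptance => ['tag_metrics.matches_pf_percent > 90'] },
  { level => 'sample', acceptance => ['bam_flagstats.mate_mapped > 1000',
                                      'verify_bam_id.freemix < 0.01'] },
);

# Collect the unique check (class) names used by the given criteria objects.
sub class_names_from {
  my @criteria = @_;
  my %seen;
  foreach my $c (@criteria) {
    foreach my $expression (@{$c->{acceptance}}) {
      my ($name) = $expression =~ /\A(\w+)[.]/xms;
      $seen{$name} = 1 if defined $name;
    }
  }
  my @names = sort keys %seen;
  return @names;
}

# Old behaviour: names are taken from the whole section, so a sample
# would also require 'tag_metrics' results, which exist on lane level only.
my @all_names = class_names_from(@robo_criteria);

# Fixed behaviour: names are taken only from the criteria that are going
# to be evaluated for the entity (here, a sample).
my @sample_names =
  class_names_from(grep { $_->{level} eq 'sample' } @robo_criteria);

print join(q[,], @all_names), "\n";
print join(q[,], @sample_names), "\n";
```

In this sketch the second list no longer contains 'tag_metrics', so a sample-level evaluation does not fail for lack of a result that is never produced for samples.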

release 73.0.0 (2024-10-24)
- Removed from the repository unused test data.
- npg_qc::autoqc::check::review:
208 changes: 116 additions & 92 deletions lib/npg_qc/autoqc/checks/review.pm
@@ -59,84 +59,80 @@ npg_qc::autoqc::checks::review
=head2 Overview
This check evaluates the results of other autoqc checks
against a predefined set of criteria.
This check evaluates the results of other autoqc checks against a predefined
set of criteria.
If data product acceptance criteria are defined, it is possible to
introduce a degree of automation into the manual QC process. To
provide interoperability with the API supporting the manual QC process,
the outcome of the evaluation, which is performed by this check, is
recorded not only as a simple undefined, pass or fail as in other autoqc
checks, but also as one of valid manual or user QC outcomes.
If data product acceptance criteria are defined, it is possible to introduce
a degree of automation into the QC process. To provide interoperability with
the API supporting the manual QC process, the outcome of the evaluation is
recorded not only as a simple C<undefined>, C<pass> or C<fail> as in other
autoqc checks, but also as one of the valid manual lane-level or library-level
QC outcomes.
=head2 Types of criteria
The robo section of the product configuration file sits either
within the configuration for a particular study or in the default
section, or in both locations. A study-specific RoboQC definition
The C<robo> section of the product configuration file sits either within
the configuration for a particular study or in the C<default> section, or
in both locations. For a given entity a study-specific RoboQC definition
takes precedence over the default one.
Evaluation criteria for samples vary depending on the sequencing
instrument type, library type, sample type, etc. There might be a
need to exclude some samples from RoboQC. The criteria key of the
robo configuration points to an array of criteria objects. Each of
the criteria contains two further keys, one for acceptance and one
for applicability criteria. The acceptance criteria are evaluated
if either the applicability criteria have been satisfied or no
applicability criteria are defined.
The applicability criteria for each criteria object should be
set in such a way that the order of evaluation of the criteria
array does not matter. If applicability criteria in all of the
criteria objects are not satisfied, no QC outcome is assigned
and the pass attribute of the review result object remains unset.
The product can satisfy applicability criteria in at most one
criteria object. If none of the study-specific applicability
criteria are satisfied, the review check does not proceed even if
the product might satisfy one of the default applicability criteria.
=head2 QC outcomes
A valid Manual QC outcome is one of the values from the library
qc outcomes dictionary (mqc_library_outcome_dict table of the
npg_qc database), i.e. one of 'Accepted', 'Rejected' or 'Undecided'
outcomes. If the final_qc_outcome flag of this class' instance is
set to true, the outcome is also marked as 'Final', otherwise it is
marked as 'Preliminary' (examples: 'Accepted Final',
'Rejected Preliminary'). By default the final_qc_outcome flag is
false and the produced outcomes are preliminary.
The type of QC outcome can be configured within the Robo QC
section of product configuration. The default type is library
Manual QC.
Evaluation criteria for samples vary depending on the sequencing instrument
and flowcell type, library type, sample reference, etc. Each of the
C<criteria> keys of the C<robo> configuration contains two further keys, one
for acceptance and one for applicability criteria.
The applicability criteria for each criteria object should be set in such
a way that the order of evaluation of the criteria array does not matter.
If applicability criteria in all of the criteria objects are not satisfied,
the evaluation is not performed, no QC outcome is assigned and the C<pass>
attribute of the review result object remains unset.
The product can satisfy applicability criteria in at most one criteria object.
If none of the study-specific applicability criteria are satisfied, the review
check does not proceed even if the product satisfies one of the applicability
criteria in the C<default> section.
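
The precedence and applicability rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the helper applicable_criteria, the 'applies' code references and the error raised when more than one criteria object applies are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the single criteria object applicable to
# the product, or undef if no criteria object applies.
sub applicable_criteria {
  my ($product, $study_robo, $default_robo) = @_;
  # A study-specific definition, if present, takes precedence. The default
  # section is then not consulted, even if one of its criteria would apply.
  my $robo = $study_robo || $default_robo;
  return if !$robo;
  my @applicable = grep { $_->{applies}->($product) } @{$robo->{criteria}};
  # Assumption: more than one applicable criteria object is an error.
  die "Multiple criteria objects apply\n" if @applicable > 1;
  return $applicable[0];
}

my $study = { criteria => [
  { name => 'novaseq', applies => sub { $_[0]->{instrument} eq 'NovaSeq' } },
] };
my $default = { criteria => [
  { name => 'any', applies => sub { 1 } },
] };

my $c = applicable_criteria({ instrument => 'HiSeq' }, $study, $default);
print defined $c ? $c->{name} : 'no applicable criteria', "\n";
```

Here the study-specific section exists but does not apply to the product, so nothing is selected even though the default criteria object would have applied.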
=head2 QC outcomes values
A valid manual QC outcome is one of the values from the library
qc outcomes dictionary (C<mqc_library_outcome_dict> table of the
QC database), i.e. one of 'Accepted', 'Rejected' or 'Undecided'
outcomes. If the C<final_qc_outcome> flag of the review check object
is set to a C<true> value, the outcome is also marked as 'Final',
otherwise it is marked as 'Preliminary' (examples: C<Accepted Final>,
C<Rejected Preliminary>). By default the C<final_qc_outcome> flag is
C<false> and the produced outcomes are preliminary.
=head2 QC outcomes types
The behaviour mimics the 'manual' QC process via the SeqQC UI. For lane
entities 'sequencing' QC outcomes are generated. For samples 'library' QC
outcomes are generated.
=head2 Rules for assignment of the QC outcome
The rules below apply to a single criteria object.
The 'Accepted' outcome is assigned if the outcome of evaluation is
true, the 'Rejected' outcome is assigned otherwise.
a C<true> value, the 'Rejected' outcome is assigned otherwise.
=head2 Retrieval of autoqc results to be evaluated
It is possible to invoke this check on any entity. At run time an
attempt is made to retrieve autoqc results for this entity (product),
which are relevant to the RoboQC for this product. If this attempt
fails, the execute method of the check will exit with an error. A
which are relevant to RoboQC for this product. If this attempt
fails, the C<execute> method of the check will exit with an error. A
failure to retrieve the autoqc results might be for one of several
reasons: (1) either the entity is not an end product (example: a pool)
and no such results exist or (2) it is a product, but the autoqc results
have not been computed yet, or (3) they have, but their file system
location (if that's where we are looking) is different from expected
(ie given by the qc_in attribute).
The autoqc results are retrieved either from the file system (use_db
attribute should be set to false, which is default) or from a database
(use_db attribute should be set to true). npg_qc::autoqc::qc_store class
is used to retrieve results. In contrast to the default behaviour of the
npg_qc::autoqc::qc_store class, if the database retrieval is enabled, no
fall back to a search on a file system is performed.
reasons: (1) the autoqc results have not been computed yet, or (2) the
file system location is different from the one given by the C<qc_in>
attribute.
The autoqc results are retrieved either from the file system (C<use_db>
attribute should be set to C<false>, which is default) or, if C<use_db> is
set to a C<true> value, from a database. The C<npg_qc::autoqc::qc_store>
class is used to retrieve results. In contrast to the default behaviour
of the C<npg_qc::autoqc::qc_store> class, if no results are retrieved from
the database, a search on a file system is not attempted.
=head2 Record of the evaluation criteria
@@ -190,9 +186,9 @@ all criteria in the array are equally essential. Therefore, a conjunction
=head2 use_db
A boolean read-only attribute, false by default.
If set to false, autoqc results are loaded from the qc_in path.
If set to true, they are loaded from the database.
A boolean read-only attribute, C<false> by default.
If set to C<false>, autoqc results are loaded from the C<qc_in> path.
If set to C<true>, the results are loaded from the database.
=cut

@@ -201,6 +197,7 @@ has 'use_db' => (
is => 'ro',
);

# The DBIx::Class schema object for the database with autoqc results.
has '_qc_schema' => (
isa => 'Maybe[npg_qc::Schema]',
is => 'ro',
@@ -220,9 +217,10 @@ sub _build__qc_schema {

=head2 final_qc_outcome
A boolean read-only attribute, false by default.
If set to false, the result of the evaluation is saved as a
preliminary manual QC outcome. If set to true, the result of the
A boolean read-only attribute, C<false> by default.
If set to C<false>, the result of the evaluation is saved as a
preliminary manual QC outcome. If set to C<true>, the result of the
evaluation is saved as a final manual QC outcome.
=cut
@@ -234,22 +232,30 @@ has 'final_qc_outcome' => (

=head2 conf_path
An attribute, an absolute path of the directory with
the pipeline's configuration files. Inherited from
npg_tracking::util::pipeline_config
An attribute, an absolute path of the directory with the pipeline's
configuration files.
Inherited from C<npg_tracking::util::pipeline_config>.
=head2 conf_file_path
A method. Returns the path of the product configuration file.
Inherited from npg_tracking::util::pipeline_config
Method returning the path of the product configuration file.
Inherited from C<npg_tracking::util::pipeline_config>.
=head2 runfolder_path
The runfolder path, an optional attribute. In case of complex products
(multi-component compositions) is only relevant if all components belong
to the same sequencing run. This attribute is used to retrieve information
from RunInfo.xml and {r,R}unParameters.xml files. Some 'robo' configuration
might not require information of this nature, thus the attribute is optional.
The runfolder path, an optional attribute.
In case of complex products (multi-component compositions) it is only relevant
if all components belong to the same sequencing run. This attribute is used to
retrieve information from C<RunInfo.xml> and C<{r,R}unParameters.xml> files.
Some C<robo> configurations do not require information of this nature, thus the
attribute is optional. In practice the attribute has to be set to allow for
correct evaluation of applicability of all criteria of a production RoboQC
configuration.
If the information from the above-mentioned files is required but access
to the staging run folder is not available, the check cannot be run.
@@ -265,7 +271,7 @@ has 'runfolder_path' => (
=head2 BUILD
A method that is run before returning the new object instance to the caller.
Errors if any attributes of the object are are in conflict.
Errors if any attributes of the object are in conflict.
=cut

@@ -281,8 +287,9 @@ sub BUILD {

=head2 can_run
Returns true if the check can be run, ie a valid RoboQC configuration exists
and one of the applicability criteria is satisfied for this product.
Returns a C<true> value if the check can be run, i.e. a valid RoboQC
configuration exists and one of the applicability criteria is satisfied for
this product.
=cut

@@ -320,14 +327,13 @@ sub can_run {

=head2 execute
Returns early if the can_run method returns a false value.
Returns early without an error if the C<can_run> method returns a C<false> value.
An assessment of applicability of running RoboQC on this entity is performed
next, an early return is possible after that. If RoboQC is applicable, a full
evaluation of autoqc results for this product is performed. If autoqc results
that are necessary to perform the evaluation are not available or there is
some other problem with evaluation, an error is raised if the final_qc_outcome
flag is set to true, otherwise it is captured and logged as a comment.
If C<can_run> returns a C<true> value, a full evaluation of autoqc results
for this product is performed. If autoqc results that are necessary to perform
the evaluation are not available or there is some other problem with evaluation,
an error is raised if the C<final_qc_outcome> flag is set to C<true>, otherwise
the error is captured and logged as a comment.
No QC outcome is assigned if the evaluation has not had a chance to run to a
successful completion.
@@ -365,8 +371,8 @@ sub execute {
=head2 evaluate
Method implementing the top level evaluation algorithm. Returns the outcome
of the evaluation as 0 for a fail or 1 for a pass. Saves the outcomes of
evaluation of individual expressions in the evaluation_results attribute of
of the evaluation as C<0> for a fail or C<1> for a pass. Saves the outcomes of
evaluation of individual expressions in the C<evaluation_results> attribute of
the result object.
=cut
@@ -384,8 +390,25 @@ sub evaluate {
Returns a hash reference representing the QC outcome.
my $m_outcome = $r->generate_qc_outcome('mqc');
my $qc_outcome = $r->generate_qc_outcome();
The returned hash has three keys:
=over
=item C<timestamp> records the current time and is interpreted downstream
as the time the QC decision was made.
=item C<username> with the value of C<'robo_qc'>, is interpreted downstream as
the user who performed 'manual' QC.
=item C<mqc_outcome> or C<mqc_seq_outcome> for the outcome of the evaluation.
Examples of values: C<Accepted preliminary>, C<Rejected final>. If lane-level
evaluation is performed, the key is C<mqc_seq_outcome>, otherwise it is
C<mqc_outcome>.
=back
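
The shape of the returned hash can be illustrated with a stand-alone sketch. The function sketch_qc_outcome below is a hypothetical stand-in, not the module's C<generate_qc_outcome> method; its arguments and the timestamp format are assumptions, and only the key names, the C<robo_qc> user name and the example outcome values are taken from the description above.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-in for generate_qc_outcome, for illustration only.
sub sketch_qc_outcome {
  my ($pass, $is_final, $is_lane) = @_;
  my $outcome = ($pass ? 'Accepted' : 'Rejected') . q[ ] .
                ($is_final ? 'final' : 'preliminary');
  # Lane-level evaluation uses a different key for the outcome.
  my $key = $is_lane ? 'mqc_seq_outcome' : 'mqc_outcome';
  my %qc_outcome = (
    $key      => $outcome,
    username  => 'robo_qc',
    # Assumption: the real timestamp format might differ.
    timestamp => strftime('%Y-%m-%dT%H:%M:%S', localtime),
  );
  return \%qc_outcome;
}

my $lane_outcome = sketch_qc_outcome(0, 1, 1);
print join(q[,], sort keys %{$lane_outcome}), "\n";
print $lane_outcome->{mqc_seq_outcome}, "\n";
```

For a failed, final, lane-level evaluation the sketch produces a hash with the C<mqc_seq_outcome>, C<timestamp> and C<username> keys and the outcome value C<Rejected final>.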
=cut

sub generate_qc_outcome {
@@ -407,7 +430,7 @@ sub generate_qc_outcome {

=head2 lims
st::api::lims object corresponding to this object's rpt_list
C<st::api::lims> object corresponding to this object's C<rpt_list>
attribute.
=cut
@@ -424,7 +447,8 @@ sub _build_lims {

=head2 runfolder
npg_tracking::illumina::runfolder object
C<npg_tracking::illumina::runfolder> object for C<runfolder_path> attribute
value.
=cut
