LIST OF CHANGES
---------------
- npg_pipeline::function::autoqc
- Simplified the flow of the code.
- Made clearer logic for choosing QC checks to run, provided comments.
- Allowed for running the review QC check on lanes.
release 68.6.0 (2024-10-24)
- The runfolder_path attribute is passed to the constructor of the review
autoqc check object when deciding whether this check should run.
release 68.5.1 (2024-10-04)
- Added .github/dependabot.yml file to auto-update GitHub actions
- Following a release on 07/09/2024, see https://metacpan.org/dist/App-perlbrew/changes,
the checksum of the script served by https://install.perlbrew.pl had changed.
https://install.perlbrew.pl is a redirect to raw
https://github.com/gugod/App-perlbrew/blob/master/perlbrew-install, so
the change originates from GitHub and can be trusted. Our CI flow compares
the checksum of the downloaded script to the expected value. We now store
an updated expected checksum value, which corresponds to the latest release.
- GitHub CI - updated deprecated v2 runner to v3
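
The checksum comparison described in the 68.5.1 entry can be illustrated with a minimal sketch. The hash algorithm (SHA-256), the function name and the sample script content are assumptions made for illustration; the entry does not say which algorithm or tooling the CI flow uses.

```python
import hashlib


def script_matches_checksum(script_bytes: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of the script equals the pinned value."""
    actual = hashlib.sha256(script_bytes).hexdigest()
    return actual == expected_hex.strip().lower()


# Hypothetical stand-in for the downloaded installer script.
downloaded = b"#!/bin/sh\necho install\n"
# In CI the pinned value would be stored in the repository; here it is
# computed from the sample so that the example is self-contained.
pinned = hashlib.sha256(downloaded).hexdigest()

print(script_matches_checksum(downloaded, pinned))
```

When the upstream script changes legitimately, as in this release, the stored value has to be updated, otherwise the comparison fails.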
release 68.5.0 (2024-09-04)
- The runfolder_path argument is added to the command for the autoqc review
check. See https://github.com/wtsi-npg/npg_qc/pull/869
release 68.4.0 (2024-08-06)
- Ensured mark duplicate method can be inferred for a product with multiple
studies (tag zero).
- Upgrading tests
- Use contemporary run folders for tests (NovaSeqX)
- Clean fixtures
- Prevent tests from accessing live databases (reset HOME)
release 68.3.0 (2024-05-24)
- Removing Tidyp dependency from CI
- Added 'SampleSheet.csv' file from the top level of the run folder to
a list of archived run-level Illumina data. This file is only present
in MiSeq run folders.
release 68.2.0
- Added '--process_separately_lanes' to the pipeline to explicitly exclude
multiple lanes from a merge.
- Generalised dir_path method in npg_pipeline::product. This fixed a bug in
npg_run_is_deletable, which manifested in wrong expectations about the
directory tree for partially merged run data.
- Dropped a check for DRAGEN analysis data from npg_run_is_deletable.
- Removed unnecessary tests from t/10-pluggable-central.t
release 68.1.0
- Apply changes to the code and tests, which follow from removing some
functionality from npg_tracking::illumina::runfolder, see
https://github.com/wtsi-npg/npg_tracking/pull/807. The pipeline retains
all its previous functionality.
release 68.0.0
- Use st::api::lims->aggregate_libraries() method for both 'merge_lanes' and
'merge_by_library' pipeline options. This is a breaking change as far as
archival and deletion of NovaSeq Standard workflow data is concerned.
Key change for lane merging for this data will be that tag 0 and tag 888 will
not be merged across the lanes.
The NovaSeq Standard workflow, where there is only one input port, is
different to the more general merging across lanes where the (claimed) same
library has been sequenced. But this is not a valid reason to maintain separate
code.
- Deletable shadow folders are detected, but not considered as deletable
for now. They are flagged in the log of the 'npg_run_is_deletable' script.
- Removed all code that was used for the UKB project and the upload of Heron
project data to CLIMB. Updated the archival pipeline function graph and its
graphical representation.
- Removed 'cache_merge_component' and 'archive_to_s3' pipeline functions.
- Removed all functions for retrieving the QC state of the product
from 'npg_pipeline::product'.
- Dropped npg_pipeline::base dependency on QC database ('qc_schema'
attribute for 'npg_qc::Schema'). Removed test fixtures for the QC database.
- Deleted 'npg_receipt4run_is_deletable' script.
- Dropped checks for files upload to the third-party cloud locations when
deciding whether the run folder is deletable.
- Updated examples in POD in 'npg_pipeline::product::release'.
- Excluded redundant settings from 'product_release.yml' files used in
unit tests.
release 67.1.1
- Fixed the pp collection root for MiSeq
release 67.1.0
- Fix typo in analysis specific overrides for bwa_als_se mapping to bwa0_6
- Add in use of autosome target regions for BGE libraries in seq_alignment
- Add 'merge_by_library' pipeline boolean option. This option is automatically
activated for NovaSeqX platform. It triggers a discovery of sets of data
that belong to the same libraries. If cases like this are found, the pipeline
is instructed, at the secondary analysis stage, to process this data as a
single entity. In practice, if the same pool is sequenced in more than one
lane of the run, sample data for the pool are merged across these lanes.
The 'discovery' part of the algorithm is implemented in
https://github.com/wtsi-npg/npg_tracking/pull/772
- Removed provisions for inline indexes
- Removed a check for rapid runs when deciding whether to merge
- Stop warnings about an undefined value when writing to the log from
npg_pipeline::function::seq_alignment
- Some tests were creating test data in the package's source tree. These
activities are redirected to temporary files and directories in /tmp
- Removed listing of non-existing files from MANIFEST
- Removed superfluous dependency on now removed st::api::request
- Added a test to expose a problem with ref cache, which is resolved by
https://github.com/wtsi-npg/npg_tracking/pull/761
release 67.0.0
- Turn off spatial filter QC check for NovaSeqX
- Switch to Perlbrew to obtain multiple Perl versions
- Remove npg_ml_warehouse dependency
- Enhance README with more context
- Improve Markdown format consistency
- Add images of DAGs, add links, fix a typo
- Add info on data intensive P4 based steps
release 66.0.0
- small tweak to seq_alignment so GbS samples with no study ref do not fail
- switch off spatial filter for NovaSeqX
- for NovaSeqX, default RNA analysis should be STAR
release 65.1.0
- ensure per-product archival for NovaSeqX data
- runs with data analysed on-board are not deletable
- ensure per product archival impacts hierarchy
- per product publish doc and variable name fixup
- when platform_NovaSeqX is detected, set p4 parameter i2b_nocall_qual_switch
to "on"
- avoid archiving "Analysis" hierarchy with XML and InterOp
release 65.0.0
- remove wr limit of p4s1 to specific flavor
release 64.0.1
- set p4 parameter to fix bug in bwa-mem2 + non-consented human split
release 64.0.0
- add bwa_mem2 flag to options.pm to allow override of default bwa analyses at pipeline invocation
- update seq_alignment to recognise the bwa_mem2 flag and also default to bwa_mem2 for NovaSeqX platform
release 63.3.0
- force no_target_alignment for haplotag libraries
release 63.2.0
- Adapt the pipeline to load a syslog dispatcher file for npg_publish_tree
release 63.1.0
- add configuration and functionality to support filtering and appending
irods error log messages to syslog.
release 63.0.0
- removed pp_archiver from the function graph.
- removed scripts for CLIMB data maintenance
- add haplotag QC check
release 62.2.0
- CI
- update version of github actions
- change CI runner from Ubuntu 18.04 to ubuntu-latest
- run deletion:
accounted for a potential human split for GBS runs and lanes where
a primer panel is set,
when pp files are missing on staging, error message extended with info
about a product
release 62.1.0
- allow non consented human split for GBS runs
- allow analysis of single-end runs with non-consented human split
release 62.0.4
- fix tag 888 bug in force markdup_method:samtools for single-end runs
release 62.0.3
- always use markdup_method:samtools in stage2 analysis for single-end runs
release 62.0.2
- Adjust bam_flagstats QC check invocation in seq_alignment for nonconsented
human split and XA/Y human splits
nchs: always use --skip_markdup_metrics
XA/Y splits: use --skip_markdup_metrics in the same way as the target subset
release 62.0.1
- Prevent the file glob expansion by the shell when calling a loader
for the run parameters XML file. A non-existing file might be passed
to the loader.
release 62.0.0
- Add substitution metrics to the seq alignment command.
- Removed provisions for loading the old warehouse, the function
for loading the old warehouse was removed some time ago.
- Extended the ml warehouse loader job. For all warehouse loader jobs
that are run prior to setting the 'qc complete' run status an extra
script is invoked; it loads the content of RunParameters.xml Illumina
file to the warehouse to facilitate automatic billing.
release 61.2.0
- Added a new pipeline function - archive_irods_locations_to_ml_warehouse
release 61.1.0
- Heron artic full primer version will be uploaded to Majora.
release 61.0.0
- Reverse logic for SamHaplotag in the seq. alignment function:
--revcomp flag - should be used when is_i5_opposite attribute is false.
- When loading main pipeline product to iRODS, added an option to create
files recording the location of these products. In future these files
will be used to load the iRODS locations to the ml warehouse.
- Purge Net::AMQP::RabbitMQ and message notification system used for UKB
Vanguard
release 60.6.0
- Fixed bugs in autoqc checks validation for the npg_run_is_deletable
script. Previously the runs where libraries had the primer panel set
were not autoqc-deletable since the insert_size check results, which
are not run for such products, were absent in the database, but were
expected by the code. A similar problem is fixed for ref_match check
results for GBS runs.
release 60.5.0
- Add Targeted NanoSeq Pulldown to Duplex-Seq analysis library_type list
- Refactored existing code to create new public functions, which will be
used in the extension of npg_run_deletable script dealing with decisions
about correctness of archival of portable pipelines output.
- Extended npg_run_deletable script to deal with the output of portable
pipelines for a case when this output is archived to iRODS.
release 60.4.0
- Removed co-location check (host and staging area) from the pipeline
daemon code. This check is no longer relevant, the staging servers
listed in the code are no longer in use.
- do haplotag processing for appropriate library types
- switch off spatial filtering application and QC check for NovaSeq platform
- Remove pipeline extensions for the npg_tracking daemon monitor,
they are no longer in use.
release 60.3.0
- Extended i5opposite pad to 5 bases.
- Check for pp archive existence prior to job creation.
pp archival could not cope with data produced by non-current
pp versions. When the version of an artic pp is updated, the
pp archival jobs that are scheduled after the update, but have to
archive data, which was produced prior to the update, fail.
The solution is to check that the pp staging archive for the 'current'
pp version exists. If it does not exist, but there is an archive for
some other version, data from that staging archive should be archived.
Staging archives for multiple pp versions cause the code to error.
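
The selection of a pp staging archive described in the 60.3.0 entry might be sketched as follows. This is one reading of the entry, not the pipeline's code: the function name, the version strings and the choice to return None when no archive exists are assumptions.

```python
from typing import Iterable, Optional


def choose_pp_archive(current_version: str,
                      existing_versions: Iterable[str]) -> Optional[str]:
    """Pick the pp version whose staging archive should be archived."""
    versions = set(existing_versions)
    if len(versions) > 1:
        # Staging archives for multiple pp versions are treated as an error.
        raise ValueError(
            "Multiple pp staging archives found: " + ", ".join(sorted(versions)))
    if current_version in versions:
        return current_version
    # No archive for the current version: fall back to the only other
    # version, if there is one (None here is an assumption).
    return next(iter(versions), None)


print(choose_pp_archive("1.3.0", ["1.2.0"]))
```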
release 60.2.0
- When generating lane taglist files truncate and pad i7 and i5 tags independently
- Support no_auto, no_auto_archive and no_auto_analysis tags for daemons
- npg_run_is_deletable - delete runs with status 'run cancelled' and
'data discarded' after 14 days of the status date irrespectively of
what study the samples belong to. The samplesheet is not available for
these runs. Prior to this change the code was accessing an external
source of LIMS data (XML LIMS API) to get information about the study.
This change also helps to go around very long deletion times which
are set for some studies.
- Create class/package level methods in npg_pipeline::product::release::irods
to allow for reusing the logic about run and product iRODS collections
paths in the code outside of this package.
- Stop loading data to the old warehouse. Remove
the update_ml_warehouse_post_qc_complete function from the archival
pipeline graph.
- Use CRAM files as input for the pulldown metrics autoqc job.
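
The rule for 'run cancelled' and 'data discarded' runs in the 60.2.0 entry can be sketched as below. The names are hypothetical, and treating exactly 14 days as sufficient is an assumption; the point is that the decision needs only the run status and its date, with no study lookup.

```python
from datetime import datetime, timedelta

# Statuses for which study information is not consulted.
EARLY_DELETION_STATUSES = {"run cancelled", "data discarded"}
MIN_AGE = timedelta(days=14)


def deletable_without_study_lookup(status: str,
                                   status_date: datetime,
                                   now: datetime) -> bool:
    """True if the run can be deleted on the basis of its status alone."""
    return status in EARLY_DELETION_STATUSES and now - status_date >= MIN_AGE


now = datetime(2021, 3, 20)
print(deletable_without_study_lookup("run cancelled", datetime(2021, 3, 1), now))
```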
release 60.1.0
- GBS - block tag 0 from GBS pipeline and extra tests.
- Update docs for samplesheet generation
- Function graph for a reduced Heron pipeline
release 60.0.0
- When HiC library type detected in seq_alignment, set appropriate flag values
for bwa mem alignment.
- Change seq_alignment to observe gbs pipeline allowed or not in product
release study config.
- Configure status change jobs to optionally save statuses to a database.
Use an updated script name in the job - npg_status_save.
Saving to the database is disabled when either the 'local' or
'no_db_status_update' attributes are set to true. The description of the new
option appears in help. Using --local option is recommended in SOP when
testing the pipeline, which will automatically prevent the new version of
the job from saving statuses to the database. --local flag ensures that the
directory with the new analysis is not visible to the production daemons,
including the status daemon. Thus, before this change one of the
consequences of using the --local flag was the new statuses not appearing
in the database; this will continue to be the case.
- Pass the portable pipeline repository URL to the autoqc generic job.
pp_repo_url parameter is read from the study configuration for a
portable pipeline and, if defined, is passed to the autoqc generic job
so that the URL can be captured in the information about the autoqc
generic check, which is saved by the generic result object.
- Reimplement management of resources in the pipeline.
-- Declaration of resources is moved to the input JSON function graphs, see
https://github.com/wtsi-npg/npg_seq_pipeline/blob/devel/data/config_files/README.md
for details.
-- All pre-existing definition of resources (number of CPUs, memory, etc),
either in other configuration files or hardcoded in the code, are removed.
-- A new parent class npg_pipeline::base_resource for classes in the
npg_pipeline::function namespace, which are responsible for job
definition generation. The new class has a factory function
create_definition() for generating npg_pipeline::function::definition
type objects. This class is also responsible for correct interpretation
of resources specified in the input JSON function graph.
-- Extension of the npg_pipeline::pluggable class to parse resource
definitions from the JSON input graph and to pass this to the function
implementor.
-- Functions, which can be executed by using the --function_order pipeline
argument, are restricted to the ones that are defined in the JSON
function graph, which is used by the pipeline.
- Remove provisions for generate_compositions function as it is no longer
used.
- Remove provisions for the upstream_tags autoqc check as it is no longer run
in production as part of the analysis pipeline.
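
The restriction on --function_order in the 60.0.0 entry amounts to checking the requested names against the functions defined in the JSON function graph. A minimal sketch, assuming a graph layout with a list of nodes that each carry an "id"; this layout, the function name and the sample data are hypothetical and not taken from the pipeline code.

```python
from typing import Dict, List


def validate_function_order(function_order: List[str], graph: Dict) -> List[str]:
    """Return the requested order if every function is defined in the graph."""
    known = {node["id"] for node in graph["graph"]["nodes"]}
    unknown = [name for name in function_order if name not in known]
    if unknown:
        raise ValueError("Functions not defined in the graph: " + ", ".join(unknown))
    return list(function_order)


# Hypothetical miniature function graph.
sample_graph = {"graph": {"nodes": [{"id": "p4_stage1_analysis"},
                                    {"id": "seq_alignment"},
                                    {"id": "qc_review"}]}}
print(validate_function_order(["seq_alignment", "qc_review"], sample_graph))
```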
release 59.2.0
- iRODS connections to be opened on-demand for validation
- Added an option of having a new boolean flag 'accept_undef_qc_outcome'
in the study configuration for a product for a particular archiver.
If this flag is set to a true value, the return value of the
has_qc_for_release method might return true in cases where previously
it would have returned false. This is done in order to allow for
archival of products which either pass QC or have never been through
manual or robo QC. The data retention policy implementation is changed
to take into account the new flag.
- npg_majora_for_mlwh - dry_run option goes further in the processes,
but avoids writes/updates
- Use GitHub actions for CI in place of Travis-CI
- Unused JSON files for function graphs are removed.
- Per-study product configuration file product_release.yml, which is used
when creating jobs, is copied to the analysis directory to preserve
run conditions
- CI: replace Travis-CI with GitHub Actions
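
The effect of the 'accept_undef_qc_outcome' flag in the 59.2.0 entry can be sketched with a three-valued QC outcome: pass, fail or undefined (never through manual or robo QC). The function below is a hypothetical illustration and not the has_qc_for_release method itself; in particular, returning false for an undefined outcome when the flag is unset is an assumption.

```python
from typing import Optional


def qc_allows_release(qc_outcome: Optional[bool],
                      accept_undef_qc_outcome: bool = False) -> bool:
    """Decide whether a product may be archived given its QC outcome.

    qc_outcome is True (pass), False (fail) or None (no QC outcome).
    """
    if qc_outcome is None:
        # An undefined outcome is acceptable only if the study opts in.
        return accept_undef_qc_outcome
    return qc_outcome


print(qc_allows_release(None, accept_undef_qc_outcome=True))
```

A failed outcome blocks archival whether or not the flag is set.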
release 59.1.0
- More options for defining wr limit groups: allow an exact match
to the last component of pipeline function class name.
- An additional wr limit group - s3 - is configured.
- The wait4path pipeline job does not have a log. To enable wr to
recognise that these jobs are unique, echoing a random string
is added to the shell command.
- Remove the limit on the number of NovaSeq runs being archived at
the same time. Introduce a limit on the number of runs,
irrespectively of the instrument type, which are moved to archival
within the last hour.
- Include pipeline version and name when sending sequencing run metadata
to the majora service.
release 59.0.0
- Following a decision to send data to CLIMB regardless of artic
QC status, the file glob of the data to upload is changed to
locations where both passed and failed data are available.
- Code for the archival and analysis daemons refactored:
1. Removed provisions for the access to configuration files
which have never been used.
2. The analysis daemon is no longer responsible for marking
runs as QC runs.
3. Removed access to ml warehouse for retrieval of LIMs data
since this information is no longer required.
- The qc_run pipeline option is removed, it was supporting a way
of setting up LIMs data for MiSeq runs which is no longer used.
- The lims_driver_type pipeline option is removed, it has never
been used, the pipeline will use the ml_warehouse driver by default
when creating a samplesheet. Internally this option is available
in some classes of the pipeline code base to indicate what driver
should be used by the jobs, this functionality remains intact.
- A simpler implementation of wr's limit groups to allow for setting
a persistent limit globally and for using limit groups that map
directly to accessors of the function definition object.
- Code in npg_pipeline::product::heron::majora is reimplemented as
a Moose class, most of the code of the npg_majora_for_mlwh script
is moved to this class, a logger is introduced. Improved a way
of matching a library type to majora metadata.
release 58.3.0
- bugfix in the code for interaction with the Majora/COG-UK API:
cope with no iseq_flowcell entry for resultset
- function graph for the analysis pipeline - add early archival of
the artic pp output to iRODS
release 58.2.0
- enhancement of code for interaction with the Majora/COG-UK API
- added analysis for Duplex-Seq libraries
release 58.1.0
- added npg_climb2mlwh to update warehouse from uploaded
climb data
- added ability to use custom locations and/or names for the
main log of the pipeline script
- the main log of the pipeline script is copied to the analysis directory
- add script for updating MLWH with state of Majora/COG-UK metadata
release 58.0.0
- a class for generating job definitions for autoqc generic
checks
- implementation of job generation for autoqc generic checks
for artic and ampliconstats
- generation of the autoqc generic result for artic and the
review result is removed from the stage2pp job for artic
- generation of the autoqc generic result for ampliconstats is
removed from the stage2App job for ampliconstats
release 57.17.0
- a generic way to specify constructor options in the function
listing in a registry and its implementation to iRODS archival
jobs and a stage2pp job
- implementation for the ampliconstats portable pipeline
- new pipeline function - stage2App - and its mapping to the
npg_pipeline::function::stage2pp class
- a new portable pipeline to produce ampliconstats data and its
mapping to the stage2App pipeline function
release 57.16.0
- a new function for archival of pp data to iRODS
- stage2pp function implementation is refactored to create common functions
and attributes, which in future could be used by additional portable
pipelines
release 57.15.0
- append autoqc generic result generation at the end of ncov2019_artic_nf
portable pipeline
- tests update following a change in the default behaviour of the
add_object method in WTSI::NPG::iRODS
release 57.14.0
- switch to sample control flag when determining eligibility for
pp data archival
release 57.13.1
- made run deletion policy consistent with a change to eligibility
for iRODS archival (see commit 457da605c9f7fe97f82954ffe7155ca96e034753),
which makes non-products (tag zero and spiked PhiX tag) not being
archived to iRODS if none of the lane products are archived to iRODS
release 57.13.0
- a new script - npg_upload2climb - to perform the upload, which is
specified in the definition generated by the pp_archiver function
- extended the spiked phix i5 tag (SPIKED_PHIX_TAG2) to 10-bases
- required arguments are passed to the npg_upload2climb script when
the pp_archiver function job description is generated
- the pp_archiver function is added to the archival pipeline graph
- archival to CLIMB is skipped for samples with withdrawn consent
- library type and primer panel are added to the CLIMB archival
manifest
- simplification of dependencies representation for LSF jobs in private
functions of the LSF executor class, which fixes the little-understood
problem of disappearing dependencies for seq_alignment jobs when they
are split between multiple LSF job arrays
release 57.12.0
- a new function definition class npg_pipeline::function::pp_archiver,
implementing two new pipeline functions - 'pp_archiver' and
'pp_archiver_manifest'
release 57.11.0
- a generic API for sequencing data metadata upload to a third party
and a script for uploading metadata for Illumina
sequencing platform
- product-specific primer panel bed file in seq_alignment
- simple robo QC step added straight after running the ncov2019-artic-nf
portable pipeline; the step creates a utility (user) QC outcome
release 57.10.0
- new function, stage2pp, for running portable pipelines straight
after stage1 in parallel to seq_alignment
release 57.9.0
- small change to seq_alignment.pm so it does not error if
gbs_plex_name (primer_panel) is set but lib type incompatible
with gbs analysis
- when markdup_method is "none", add skip_markdup_metrics flag
to bam_flagstats qc command
release 57.8.0
- ability to apply limits to wr groups of jobs and a limit for
all iRODS jobs
- function creating definitions for autoqc jobs - when evaluating
whether the autoqc check should be run:
reduce run time by passing to the autoqc class instance,
where appropriate, a lims object and fastq reference path;
explicitly pass product_conf_file_path to this instance
- iRODS archival of non-products is driven by settings of products,
i.e if all products in the lane should not be archived to iRODS,
non-products (tag zero and spiked PhiX tag) will not be archived
either
- remove the old warehouse loader from the analysis function graph
- remove function for illumina qc analysis archival (old way of
saving InterOp data to QC database)
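
The rule for iRODS archival of non-products in the 57.8.0 entry reduces to a single check over the products of the lane. A minimal sketch with hypothetical names, in which each flag says whether the corresponding product is to be archived to iRODS:

```python
from typing import Iterable


def archive_non_products(product_flags: Iterable[bool]) -> bool:
    """Non-products (tag zero, spiked PhiX tag) are archived only if at
    least one product of the lane is archived to iRODS."""
    return any(product_flags)


print(archive_non_products([False, True, False]))
```

With this formulation a lane without any products yields False; the entry does not cover that case, so this is an assumption.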
release 57.7.0
- cluster count check and p4stage1 functions use new class
(npg_qc::illumina::interop::parser) to parse Illumina InterOp files
- change npg_pipeline::product::release to use tertiary config
- new qc_interop function to run interop autoqc check
- simplification of the analysis function graph: number of mlwh
updates is reduced to two, one after stage 1 and interop autoqc
check and another towards the end of the flow
release 57.6.0
- only set p4 parameter values for markdup_method and
markdup_optical_distance_value when do_target_alignment is true;
this also stops an error being thrown if the entity (for example,
tag zero product) has multiple studies and references
- fix haplotype caller check for a PCR free library type to be case
insensitive
- increase memory for bqsr and haplotype caller jobs
- make test CRAM files compliant with samtools v.1.10.0,
which gives an error if no header is present in a file
release 57.5.1
- bug fix - correct node id in splice (for GbS)
release 57.5.0
- add BWA MEM2 support to seq_alignment function
- bug fix: add -f to rm command removing intermediate files (to
avoid error when no intermediate files are present)
- allow selection of duplicate marking method (biobambam,samtools
or picard) in seq_alignment via product_release.yml
- detect flowcell type and set uses_patterned_flowcell attribute
to allow setting of optical duplicate region size
- add ability to select bwakit postalt processing (if reference has
alternate haplotypes) in seq_alignment via product_release.yml
release 57.4.0
- change genotype qc check to cram input
- LSF array indexes fix for jobs dealing with chunked data
(multiple jobs per product)
- designate no_archive directory for files for chunked entities,
which are not end products
- haplotype caller function: early detection of products that are
not for release (tag zero and control)
release 57.3.0
- add chromium libs (forced no target alignment) to bam prune skip
list in seq_alignment
- archival pipeline function for deletion of intermediate files
- script to generate receipts files to be used by npg_run_is_deletable
script for one of the studies
release 57.2.0
- prune bam generation for most products with no alignment and
change bam_flagstats command in seq_alignment to crams
- skip markdup step in seq_alignment for spike tag
- add haplotypecaller to function list
- use only public run folder methods
- path logic improvements
release 57.1.1
- all components of npg_run_is_deletable script to use samplesheet
as a source of LIMS data
release 57.1.0
- configurable study-level qc criteria for archival and for minimum
delay for run folder deletion
release 57.0.3
- add missing indexing step to merge_recompress
release 57.0.2
- fix logic in WR dependencies where pipeline converges
release 57.0.1
- fix where new code was not taking NPG_REPOSITORY_ROOT and add
duplicated code to ref cache.
release 57.0.0
- supply MD5 in bucket file upload if available in sibling md5 file
- add function to support GATK HaplotypeCaller and apply BQSR
- add function to concat and recompress gVCFs
- add function to calculate BQSR table
- cram files as input to the adapter autoqc check
- make list of files due to be archived dependent on alignment
configuration of the study
- run folders for test data restructured to reflect new-style
product hierarchy and not to use outdated path component
names (bustard, etc)
release 56.1.0
- move reference cache from seq_alignment to own singleton class
- remove provisions for old-style run folders
- qc_review function added
- provisions for splitting a product into chunks
- to be forward compatible with changes in tracking, remove direct
dependency of the pipeline daemon on the short_info and location
tracking roles
- ability to run the pipeline for individual products; some archival
pipeline functions updated to enable this ability on their level
- autoqc adapter check - give cram files as input
release 56.0.1
- ensure that the paths derived from the archive directory in
different parts of the run_is_deletable utility are consistent.
- add autosome stats file to product release
- add missing bait prune to seq_alignment
release 56.0.0
- add autosome target to seq_alignment
- pipeline configuration module and product release configuration
accessors are moved to npg_tracking package in order for the product
configuration be accessible from other packages, code in this
package refactored to accommodate the change
- conform to bambi's v 0.12.0 file and directory naming schema for
tileviz data
- add facility to do LSF 1:1 job index dependencies on array jobs
- when validating run folder for deletion, ensure linked directories
and files are recognised
release 55.2
- switched from S3 to Google Compute Storage
- change bcfstats qc job to use CRAM instead of BAM file as input
release 55.1
- added configuration option to change the S3 endpoint URL
release 55.0.1
- bug fix for invocation of the generate() function in the
seq_alignment function module following an addition of the
generate_composition function
release 55.0
- additional 'GnT MDA' library type added to allowed types for gbs analysis
- a new archival pipeline function, cache_merge_component, for caching merge
candidates as a part of the archival pipeline
- no overwriting existing tileviz files when scaffolding the runfolder
- a new function, generate_compositions, for generating composition JSON files
- npg_run_is_deletable:
cross-checks for all file archival destinations to ensure that each
product is archived in at least one destination;
full logic for validating correctness of s3 archival
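Illustration for the cross-check described above. This is a sketch only: the function name, the product identifiers and the destination labels are hypothetical, and the real npg_run_is_deletable checks are more involved. It shows the idea of requiring every product to appear in at least one archival destination.

```python
def unarchived_products(products, archived_by_destination):
    """Return the products that were not found in any destination.

    products: iterable of product identifiers.
    archived_by_destination: dict mapping a destination label
    (for example "irods" or "s3") to the set of identifiers found there.
    """
    archived = set().union(*archived_by_destination.values())
    return [p for p in products if p not in archived]


missing = unarchived_products(
    ["p1", "p2", "p3"],
    {"irods": {"p1"}, "s3": {"p2"}},
)
print(missing)
```

Under this sketch a run would be treated as deletable only when the returned list is empty.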
release 54.1.2
- set explicit umask for wr jobs to guarantee that output is group-writable
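Illustration for the umask entry above. This is a minimal sketch, assuming a POSIX system and a umask value of 0002 (the value actually used for wr jobs is not stated in this file). With that umask, a file requested with mode 0666 is created group-writable but not world-writable.

```python
import os
import stat
import tempfile


def create_group_writable(path: str) -> int:
    """Create a file under umask 0002 and return its permission bits."""
    previous = os.umask(0o002)  # clear only the "other write" bit
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
    finally:
        os.umask(previous)  # restore the caller's umask
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    mode = create_group_writable(os.path.join(tmp, "job_output.txt"))
    print(oct(mode))
```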
release 54.1.1
- bug fix in command generation for iRODS data archival from old-style
run folders
release 54.1
- minor speed-up in seq_alignment function due to caching of
unsuccessfully retrieved references
- npg_run_is_deletable understands per-product iRODS collections and
makes runs that have products archivable to s3 not deletable
- function for saving fastqcheck files is removed from the archival
pipeline function graph, implementation of this function is deleted
- changes of p4_stage1 and seq_alignment functions to accommodate
removal of fastqcheck files generation in respective p4 templates
release 54.0
- archival function graph includes publishing both to s3 and iRODS
- a function graph for post 'run archived' small pipeline
- no_s3_archival flag to switch off archival to s3 and notification by
a message, false by default, is automatically set to true if the
local flag is set to true
- per-product restart file for iRODS publisher
- function definition for a job to wait to move from the analysis
to the outgoing directory
- wr job log file to be appended to if the job is retried
- propagation of the iRODS settings to wr jobs
- persistent mode for RabbitMQ message delivery
release 53.1
- publishing of seq data to iRODS:
make product destination aware;
iRODS directories hierarchy for NovaSeq runs to mirror product
archive directories hierarchy
- run data validation (npg_run_is_deletable script) reimplemented to provide
support for new style of run folder and merged entities.
release 53.0
- a wrapper object npg_pipeline::product to represent a product
- use products attribute to drive p4_stage1, seq_alignment and autoqc
- create composition.json files to guide archiving
- p4 params files for seq_alignment moved from no_cal/laneN to no_cal
(changes run folder structure when merging lanes)
- cluster_count and seqchksum_comparator checks now done at run level instead
of lane level
- upfront definition of all products
- generic runfolder scaffolding for any products
- since the top-level qc directory is no longer required, the tileviz
directory is moved to the analysis directory
- reshuffle of roles in npg_pipeline::roles:
npg_pipeline::roles::business::base merged into npg_pipeline::base;
npg_pipeline::roles::business::flag_options moved to
npg_pipeline::base::options, a number of pipeline options from other
modules moved to this role;
npg_pipeline::roles::accessors moved to npg_pipeline::base::config;
helper functions moved to a new role - npg_pipeline::function::util
- ref_adapter_pre_exec_string method renamed to repos_pre_exec_string
- metadata_cache_dir method, formerly in npg_pipeline::roles::business::base,
removed; npg_pipeline::function::p4_stage1_analysis module, the only user
of this function, switched to use the relevant accessor from the
npg_pipeline::runfolder_scaffold role
- minor changes for bcfstats qc check
- executor type (lsf or wr) can be specified in the configuration file
- wr executor:
set per-job priority;
increase priority for p4 stage 1 job and its predecessors;
set priority of status and start-stop jobs to zero so that
they are executed immediately, but still within dependencies
and memory constraints;
map queues to arbitrary wr options, in particular, a special queue
for p4_stage1 maps to a specific cloud host flavour
- correction of build method for rpt_list attribute in product
- make bam_cluster_count_check pipeline job dependent on
qc_spatial_filter (in function_list_central.json)
- archival daemon - limit number of simultaneously archived NovaSeq runs
- wr executor - explicitly propagate pipeline's environment to jobs
- illumina archiver job:
exclude discontinued verbose attribute and paths that are not needed
for the minimal work this loader is doing now;
remove LSF preexec requesting that the job is a unique runner since
db queries are much simpler now
- change signature of the autoqc archival job in line with extended
functionality of the autoqc db loader (ability to find JSON files
in the run folder)
- change components_as_products method of npg_pipeline::product to
return a list with one item when there is only one component in
the composition (instead of an empty list)
- tileviz index file with links to lane-level tileviz reports is created
- seq_alignment supports HISAT2 aligner for RNA libraries
- explicit iRODS destination collection is set for iRODS loaders,
/seq/illumina/runs/RUN_ID for NovaSeq runs and /seq/RUN_ID
for the rest
- explicitly use iRODS loader from an 'old' dated directory for
old style runfolders
- a new function, archive_run_data_to_irods, to publish run-level non-product data to iRODS
- modify run_data_to_irods_archiver module to ensure the interop files go to a dedicated directory
- additional tags for NovaSeq in dbic_fixtures
release 52.1
- bug fix in job names where the job name should include the pipeline
name: pipeline name is now propagated from the pluggable module
to the function module; bug manifestation - job names contained
function module name instead of the pipeline name, for
example prod_pipeline_end_26263_start_stop instead of
prod_lsf_start_26263_central
- pipeline name attribute is derived from the script name that
invoked the pipeline, making it unnecessary to explicitly pass
the function list name in the archival pipeline script
- fix for seq_alignment so specified rna aligners do rna analysis
- added (samtools) target stats to stage2 analysis
- correct p4 prunes for samtools stats (target/baits)
release 52.0.5
- bug fix in npg_run_is_deletable: stop using unsupported options
for npg_pipeline::cache
- npg_run_is_deletable should not expect adapter qc results for a
pool, the source files do not exist since release 52.0
- add log archiver to the end of the archival pipeline
- use outgoing paths for jobs which are run after the run_qc_complete
function; this patch also fixes the log file path for lsf_end job of
the archival pipeline, which previously was always in outgoing
release 52.0.4
- bug fix: change path for a file with LSF commands to a path in
outgoing for jobs that run after the run was moved to the outgoing
directory
release 52.0.3
- bug fix: use analysis_path instead of bam_basecall_path in a method
that is used by both analysis and archival pipelines; the value of
bam_basecall_path is available only when explicitly set, ie only
in the analysis pipeline
release 52.0.2
- allocate more memory to sequence_error and insert_size autoqc
checks since they now use newer bwa, which creates twice larger
reference index
release 52.0.1
- alignment of tag#0 not done by default (align_tag0 flag added)
release 52.0
- remove dependency of tests on LIMs XML, use samplesheet instead
- remove dependency on tracking XML feeds
- update p4 stage1 default values in general_values.ini
restored p4_stage1_split_threads_count=4
- removed illumina_basecall_stats function and associated code
- remove generation of empty fastq and fastqcheck files
- removed bam2fastqcheck_and_cached_fastq function
- removed create_archive_directory function, scaffolding the runfolder
is called in the beginning of the pipeline within the 'prepare'
method of the analysis pipeline
- increased number of threads for p4 stage1 (newer bambi version required)
- added LSF-independent evaluation for number of threads
- removed redundant dependency on illumina2bam jars
- stopped forcing ownership and permissions when creating
new directories
- single log directory for all jobs with per-function subdirectories
- added LSF-independent evaluation for number of threads
- added wr executor
- new modules to execute submission of definitions to LSF
- captured dependencies between pipeline steps in a directed acyclic graph
- moved flags, attributes and method related to the overall
pipeline logic to npg_pipeline::pluggable
- flattened directory structure for modules implementing functions,
they all now belong to npg_pipeline::function namespace
- removed methods representing functions, created mapping of
functions to modules, methods and options in
npg_pipeline::pluggable::registry
- removed ::harold:: component from pipelines' namespace
- removed post_qc_review pipeline module
- added npg_pipeline_ prefix to this package's script names if
it was not part of their name
- removed unused module for fixing Illumina config files
- removed unused module for LSF job creation for tag deplexing -
this is now done within p4 stage 1
- removed unused implementation for function copy_interop_files_to_irods
- removed unused spatial_filter, fix_broken_files and force_phix_split flags
- removed a number of unused methods in npg_pipeline::base
- no lane-level bam files are produced by p4 stage1 for pools - do not run
the adapter check in these cases
- adapterfind flag added to switch adapterfind on/off (default: on)
- scaffolding of runfolder includes .npg_cache_10000 directory creation (lane and plex)
- stage1 analysis: parse interop data for cluster count calculation (used for 10K subsampling)
- seq_alignment reads tag_metrics files to calculate fraction for 10K subsampling
- seqchksum_comparator function now uses seqchksum files from analyses (no regeneration)
- QC spatial_filter now run as standard QC check
- add p4s2_aligner_intfile flag to force temporary file production in stage2 alignment
- p4 stage1 splice/prune directives moved from vtfp command line to params file
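Illustration for the entry above on capturing dependencies between pipeline steps in a directed acyclic graph. This is a sketch only: the step names and edges are hypothetical and do not reproduce the pipeline's function list files. It uses the Python standard library (graphlib, Python 3.9 or later) to derive an execution order in which every step follows its predecessors.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
# These names and edges are illustrative assumptions.
graph = {
    "start": set(),
    "stage1_analysis": {"start"},
    "alignment": {"stage1_analysis"},
    "qc_checks": {"alignment"},
    "end": {"qc_checks"},
}

# static_order() raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

An executor (LSF or wr) can then submit jobs in this order, translating each edge into a job dependency.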
release 51.12.2
- fixed lane taglist files for TraDIS libraries
no longer pad spiked phix tag, simply add missing i5 tag for dual index runs
- update p4 stage default values in general_values.ini
p4_stage1_memory=20000, +p4_stage1_slots=8, +p4_stage1_i2b_thread_count=8
release 51.12.1
- tweak to GbS library type check in seq_alignment.pm, since the type arrives as GBS (check is now case-insensitive).
release 51.12.0
- Travis CI build - add iRODS test server
- run_is_deletable script moved to this package from data_handling,
custom conversion between run id and run folder path refactored to use
npg_tracking::illumina::runfolder,
lims-driver-type argument is added to reset the default samplesheet driver type,
iRODS build is added to Travis CI configuration to enable all new tests to run,
Log::Log4perl is used for logging
- added support for GbS processing
- travis build tweak for npg_qc
release 51.11.3
- seq_alignment: fixes for no target alignment and no target alignment+non-consented human split
release 51.11.2
- use align_intfile_opt=1 when aligning with star to produce intermediate bam file
- by default, force bambi i2b to single-threading (general_values parameter available for override)
release 51.11.1
- Handle dual indexes (create new format lane tag files)
- remove remaining broken provisions for xml LIMs driver
- use the new log publisher
- now allows XA/Y-split with no target alignment
release 51.11.0
- added support for RNA analysis/quantification using STAR and salmon
- STAR alignment jobs get more memory using bmod after seq_alignment jobs have been submitted.
- removed unneeded coordinate sort and duplicate marking when there is no alignment to a target reference
release 51.10.3
- no alignments for chromium libraries
- seq_alignment to do_rna analysis regardless of the organism specified (other conditions stay in place)
release 51.10.2
- use bwa aln for human split with tophat target alignment
release 51.10.1
- Modified qc run function list, removed copy_interop and switched archive_to_irods to samplesheet
release 51.10
- Chained execution of RNA-SeQC to the vtfp/viv alignment cmd for RNA-Seq libraries only:
entries for qc check rna_seqc removed from central function and parallelisation.
code that created rna_seqc-specific directories has been removed as this is
now handled by the check itself using qc_out arg.
- remove GCLP-specific code and configuration files
- remove unused force_p4 attribute
- OLB analysis removed
- recalibration removed
- pb_cal_path and dif_files_path accessors disabled
- allow p4 stage 1 to analyse runs with different length reads
- illumina2bam function removed
- update p4 stage 2 (seq_alignment) warn rather than croak if multiple references for tag 0
- update p4 stage 2 (seq_alignment) to use bambi chrsplit instead of SplitBamByChromosomes.jar for Y-split runs
- pipeline scripts - redirect stderr output to the log to capture output from all
NPG and CPAN modules in one place
release 51.9
- p4stage2 speed-up by caching references
- p4stage2 errors in getting a reference made fatal
- iRODS publish script new options: (1) --restart_file to pin the script's
process file name to a particular LSF job, (2) --max_errors to force the script to
fail after a certain number of errors (10 specified in the configuration file)
- seqchksum_comparator test fixed for gseq by generating a cram file with a header
that lists a reference available on gseq and suppressing an outside search by
setting REF_PATH to an invalid value; the test will continue to work on hosts
where REF_PATH is set and available
- consistent computation of absolute path, which takes account of substitution
release 51.8
- when comparing checksums, generate seqchksums for each cram file and merge
the results rather than merging the cram files and generating seqchksum
release 51.7
- replaces the original log role with the one from DNAP utilities,
which provides a Log4perl logger and some convenience methods.
- new signature for the sequencescape warehouse loader so that it uses
samplesheet LIMs driver at the analysis stage and ml_warehouse_fc_cache
LIMs driver at the archival stage
release 51.6
- test and code fixes to ensure problem-free tests under Perl 5.22.2
- tweak to qc_report_dir in bsub command for one library per lane case
- fix convert-low-quality flag in bambi decode command; set bid_implementation always to bambi
release 51.5
- update p4 stage 2 (seq_alignment) to handle all cases (e.g. no target alignment, spike tag)
- support generation of targeted stats files in seq_alignment.pm with p4
- qc jobs creation, can_run, check object instantiation:
do not supply path/qc_in, which is now optional
do not set attributes that the object does not have
- allow specification of implementation (java or bambi) of illumina2bam and bamindexdecoder
in p4 stage 1 via general_values.ini
- add Broad Institute's RNA-SeQC to list of autoqc checks
- run bam_flagstats autoqc check via the qc script
- tweak for targeted stats files and also human split
release 51.2
- patch to script_must_be_unique_runner - only ignore exact matches to the job id
- change function order to run p4 stage 1 analysis by default
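Illustration for the script_must_be_unique_runner entry above. This is a sketch under assumptions: the changelog does not show the script's logic, so the function below only demonstrates why an exact comparison matters when a job excludes itself from the list of competing jobs. The function name and the job ids are hypothetical.

```python
def other_runners(job_ids, own_job_id):
    """Return job ids other than our own, using exact comparison.

    A substring or prefix test would also discard ids such as "1234"
    when our own id is "123", hiding a genuine competing job.
    """
    return [job_id for job_id in job_ids if job_id != own_job_id]


competitors = other_runners(["123", "1234", "9123"], "123")
print(competitors)
```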
release 51.1.1
- extended is_hiseqx_run to detect HiSeq 4000 runs
- samtools1 cat .. doesn't work with different references, replaced by samtools1 merge ..
release 51.1
- replaced bamcat .. by samtools1 cat .. in seqchksum comparison
previous command line was too long for large pools
- changes for pools with >999 samples, LSF job array index now 5 digits
modified tests
- added lims_driver_type cli option
release 51.0
- use npg_irods npg_publish_illumina_run.pl in place of data_handling irods_bam_loader.pl
- provide appropriately changed second index read tags for ordered flowcell
instruments (typically rev. complement) e.g. HiSeqX
- in both the analysis and archival function order have an extra
ml warehouse loader job to set the stage for loading to iRODS
- warehouse loaders that are run after setting qc complete date
wait for the runfolder to be moved to outgoing, their log location
is updated accordingly
release 50.3
- use 'purpose' field to decide if qc_run
- study-specific software stack for the analysis pipeline
- added new module for p4 stage1 analysis
release 50.2
- names of the pipeline daemon modules and scripts harmonised
- the daemon module does not inherit from the pipeline base class thus
reducing the number of command line script options
- common code moved from the daemon scripts to the daemon module
- a new role for common accessors
release 50.1
- bug fix to allow archival daemon to work (restore availability of run folder
finding method)
release 50.0
- purge carriage returns (as well as line feeds) from study descriptions for RG header
records (xml lims driver has previously done this as part of XML parsing)
- require minimum version 5.10 for perl
- add study analysis configuration accessor
- simpler name for the archival daemon module
- parent class for pipeline daemons
- dry_run option for daemons
- consistent behaviour of the archival and analysis daemons
when LIMs data are not available in the ml warehouse and
the run is not a QC run, the run is skipped
- the pipeline daemons define the type of the pipeline to run
(default, gclp, qc) and set appropriate backward-compatible
options for the pipeline script
- Log::Log4perl logger is used in pipeline daemons
- cached samplesheet generation - use ml warehouse for all
runs except QC runs, for which the old warehouse is still used
- add npg_pipeline_job_env_to_threads script (to avoid excessive repeated perl one-
liners in command arguments).
- archival of logs should run after an asynchronous move to outgoing (performed by the
staging daemon) - paths adjusted and job preexec checking for the existence of the
runfolder in outgoing is added
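Illustration for the entry above on purging carriage returns and line feeds from study descriptions. This is a minimal sketch, assuming each run of such characters is replaced by a single space (the changelog does not say whether they are replaced or simply dropped). The function name is hypothetical. Header records occupy a single line, so an embedded line break in a description would otherwise split the record.

```python
import re


def clean_description(text: str) -> str:
    """Collapse carriage returns and line feeds into single spaces."""
    return re.sub(r"[\r\n]+", " ", text).strip()


cleaned = clean_description("Study on\r\nsample diversity\n")
print(cleaned)
```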
release 49.8
- seq-alignment now uses bwa_aln_se for single read runs