forked from wtsi-npg/npg_irods
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Changes
615 lines (437 loc) · 20.8 KB
/
Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
Unreleased
Release 2.51.0 (2024-10-25)
- Banish more Sequels
Release 2.50.1 (2024-10-04)
- Remove sequel specific references to Sequel from the code
- Following a release on 07/09/2024, see https://metacpan.org/dist/App-perlbrew/changes
the checksum of the script served by https://install.perlbrew.pl had changed.
https://install.perlbrew.pl is a redirect to raw
https://github.com/gugod/App-perlbrew/blob/master/perlbrew-install, so
the change originates from GitHub and can be trusted. Our CI flow compares
the checksum of the downloaded script to the expected value. We now store
an updated expected checksum value, which corresponds to the latest release.
Release 2.50.0 (2024-08-07)
- SMRT Link IsoSeq analysis has changed substantially in SMRT Link v13
so old code and tests related to this was removed - further work is
planned in this area to remove the Sequel subdirectory entirely. As
it is an evolving area in sporadic use the loading code for IsoSeq
analysis was moved to a separate new IsoSeqPublisher module.
- added new WTSI::NPG::HTS::PacBio::Sequel::IsoSeqPublisher module
which handles loading of analysis output from SMRT Link
pb_segment_reads_and_isoseq jobs
- old code for IsoSeq loading completely removed from
WTSI::NPG::HTS::PacBio::Sequel::AnalysisPublisher
Release 2.49.2
- Prevent access to a live mlwh database from tests.
Release 2.49.1
- Removing Tidyp dependency from CI
- Add iRODS 4.3.2, remove 4.3-nightly
Release 2.49.0
- PacBio iRODS data - set QC state metadata when it is safe to do so.
- Allow non deplexed Revio cells to publish now
- Analysis loading changes post SMRT Link v13
Remove very old test data and update some tests. Add new test data and minor
code change to support loading of data from deplexing jobs in SMRT Link v13+
(very rare as only done when deplexing on instrument is incorrect).
Release 2.48.0
- Add sequencing_control.subreads.bam and index file to PacBio run
archiving. These files can be used to regenerate QC metrics.
- Add new (SMRT Link v13+) reports.zip file to PacBio Revio data
archiving.
Release 2.47.1
- Amended policy on archival of index files. No longer skipping
index files if there are no reads.
Release 2.47.0
- Fixed some outdated POD
- 'alt_target' iRODS metadata attribute value is now computed in the
same way as the 'target' metadata attribute value. To be consistent
with prior practice, the 'alt_target' value is assigned only when it
is true. This resolves the issue of tag zero files, which are
produced by the alternative process, previously having 'alt_target'
metadata attribute value of 1.
- Following introduction of ss_<STUDY_ID>_human iRODS groups to
provide limited access to split-out human data and consolidation
of logic about assignment of study_related iRODS groups in
https://github.com/wtsi-npg/perl-irods-wrap/pull/297, the following
changes were made:
- contains_nonconsented_human and update_group_permissions methods
were dropped from WTSI::NPG::HTS::Illumina::AlnDataObject
- ss_<STUDY_ID>_human iRODS groups were added to test fixtures
Release 2.46.0
- Add iRODS 4.3.1 Ubuntu 22.04 as a optional test target
- Minor change to PacBio AnalysisPublisher
- Update test matrix iRODS versions
Remove the combination of 4.2.11 clients with a 4.2.7 server
Add the combination of 4.2.7 clients with a 4.2.7 server
Add the combination of 4.3-nightly clients with a 4.3-nightly server
Release 2.45.0
- Add link to metadata repository to README
- Refactor to use common iquest wrapper from perl-irods-wrap
- Add Perl 5.34 to CI matrix
- Fixed a bug in the PacBio metadata updater.
Added a missing plate number when calling the find_pacbio_runs()
method that retrieves up-to-date metadata from mlwh.
- Refactor by hoisting environment variables
- Install iRODS clients outside of working directory
Release 2.44.0
- Changes to PacBio iRODS loading to support multi plate runs
(requires npg_id_generation v4.0.0)
- Unify location file writing for Illumina and PacBio products
Release 2.43.0
- Change PacBio min deplex default as all data now HiFi
- Remove dependency on npg_ml_warehouse for generating PacBio product IDs
- Add Singularity container support for iRODS clients, remove Conda-based
clients
Release 2.42.0
- Add location file writer and implement in pacbio run/analysis publisher.
- Change docker image repository used by CI
Release 2.41.0
- Add --logconf option to npg_publish-tree.pl
- Add id_product metadata to PacBio data objects.
- Remove WTSI::NPG::HTS::PacBio::Sequel::APIClient (moved to
wtsi-npg/perl-dnap-utilities).
- Add sequence diagrams of metadata flow when publishing.
Release 2.40.0
- PacBio - added support niche run folder processing special case
to support an ongoing PhD process (not likely to be required once
the project is finished)
- Update action versions in CI
Release 2.39.0
- PacBio - support iRODS loading for changed deplexing off instrument
files & directory structure in SMRT Link v11.
- Update to ubuntu-latest in actions
Release 2.38.1
- Update baton version in github actions workflow to 4.0.0
Release 2.38.0
- PacBio - add new files from new run configurations in SMRT Link v11.0
to RunPublisher
- Update baton version in github actions workflow to 3.3.0
- Remove iRODS 4.2.10 from github actions workflow
- archive metrics files produced by SamHaplotag
- PacBio - extend runauditor to cope with fixing permissions on
runfolder subdirectories created by deplexing on board
- Demote error to warning when EBI and iRODS checksums are different
Release 2.37.1
- Bugfix: Object limit for the iRODS single replica object-finding query
Release 2.37.0
- Bugfix: iquest parsing for the iRODS single replica object-finding query
- Change the field used to determine IsCCS for PacBio
- Archive the substitution metrics qc JSON file for Illumina
Release 2.36.0
- Add ML warehouse JSON file generation to npg_publish_tree.pl
- Add standard input option for providing metadata to npg_publish_tree.pl
- Add --limit CLI option to npg_update_single_replica_metadata.pl
- Bug fix: Avoid creating duplicate entries in the ML warehouse JSON file
generated by the Illumina run publisher
Release 2.35.0
- Tweak Sequel AnalysisPublisher for SMRT Link 10.2 to allow xml
in entry-points subdir.
- Correct propagation of the ml warehouse db connection to st::api::lims
object. The publisher was previously written to cache the connection
created by LIMSFactory, but failed to do so, leading to creation of
multiple connections.
Release 2.34.0
- PacBio - prep and load fasta file output from IsoSeq analysis.
- Change baton version in actions to 3.2.0 for compatibility
- Add Haplotagging logs to log publisher
Release 2.33.0
- Tweaks to ebi meta updater to account for files with no md5 in
the subtrack db (now common due to special projects).
- Also avoid nulls in ebi_sub_acc field for ebi meta updates.
- Tests simplified.
Release 2.32.0
- Archive raw pulldown metrics (gatk_collecthsmetrics) txt file from
the qc directory.
- Disable the samtools CRAM reference cache during tests.
Release 2.31.1
- substitution_analyis filename fix
Release 2.31.0
- PacBio - ensure runfolders are writable for md5 files prior to iRODS
loading.
- Allow another BAM file prefix in PacBio analysis loading (IsoSeq).
- PacBio code to check and change unix permissions on runfolders.
- GitHub Actions: verbose testing and update baton clients
Release 2.30.0
- Add substitution_analyis and substition_metrics ancillary files.
- Add PacBio API query (query_datasets) to support mlwarehouse loading.
- Add some extra checks before loading PacBio secondary analysis data.
- Update alignment file header parser for minimap2.
- Install NPG and CPAN Perl modules separately in CI, to avoid caching
NPG modules.
- Move from Travis CI to GitHub Actions.
Release 2.29.0
- Additional PacBio API queries to support mlwarehouse loading.
Release 2.28.0
- Support loading of Pacbio Sequel OnInstrument data.
Release 2.27.0
- set_object_permissions in WTSI::NPG::Data::ConsentWithdrawn updated to
use <user>#<zone> rather than <user> only format
- Pacbio RSII specific code clear out
- Restore the gbs plex meta data
Release 2.26.0
- Add iRODS 4.2.8 to tests
iRODS 4.2.8 is marked as an expected failure because we have not
yet built Conda packages of irods-icommands or libhts plugins for
iRODS 4.2.8.
- Pacbio Sequel merged analysis report file changes due to version update
- Pacbio Sequel generate and publish archive of QC images and reports.
Release 2.25.0
- Pacbio Sequel merged analysis report file changes due to version update
Release 2.24.1
- Order Conda channels for tests so that NPG channels have highest priority
and conda-forge the lowest.
Release 2.24.0
- Create merged analysis report for Sequel analysis jobs
- Remove iRODS 4.1.12 from tests
Release 2.23.0
- Bugfix: handling of --exclude CLI arguments by npg_publish_tree.pl.
- Make Treepublisher create an empty target directory, even when
there are no files to publish.
Release 2.22.0
- New script, npg_publish_tree.pl, to publish arbitrary directory trees of
files to iRODS, set permissions and add metadata to the destination root
collection.
- Add filter parameter to TreePublisher::publish_tree.
- Add named parameters to BatchPublisher::publish_file_batch.
- Fix teardown of iRODS collection in TreePublishTest.
Release 2.21.0
- New script, npg_process_consent_withdrawn.pl, to withdraw permissions on
iRODS data for samples with concent withdrawn. Ported from the internal
SVN repository and refactored to use WTSI::NPG::iRODS class for iRODS
operations.
- Add testing with iRODS v4.2.7 clients to Travis CI matrix.
Release 2.20.0
- When archiving logs skip large .err files in archive/tmp_.. directories
- Pacbio tweak to support traction barcode identifiers and loading older analysis.
Release 2.19.0
- PacBio run loading add options as in other scripts to support loading
of older runs.
- PacBio tweak to analysis loading to support traction barcode identifiers.
Release 2.18.0
- Add support for PacBio run deletion.
- Change ML warehouse query used to find PacBio runs in order to support
Traction LIMS (use pac_bio_run_name).
- Change GD package name on Travis from libgd2-xpm-dev to libgd2-dev
Release 2.17.0
- Uniquify pac_bio_run records from mlwarehouse - to handle unexpected
duplicate entries and stop adding target = 1 to scraps BAM files.
Change secondary archiving to support SMRT Link v8.
- Use samtools executable instead of samtools_irods.
- Travis CI tests:
use prod WSI Conda channel,
use specific version of samtools, pin it to v.1.10.0,
to enable samtools' access to iRODS, install HTS lib irods pligins
since they do not come with the samtools v.1.10.0 we have in the
channel
- Test change to cope with samtools v.1.10.0 inserting PG lines into
the header.
- WTSI::NPG::HTS::PacBio::Sequel::AnalysisPublisher restrict filename
checks to just the filename
- Publish new files generated by GATK.
- Removed ONT publishing support (including inotify and HDF5 dependencies).
Release 2.16.0
- WTSI::NPG::HTS::Illumina::ResultSet - added pulldown_metrics to
qc_regex.
Release 2.15.0
- remove usage of private method and prep for simplified path framework
- avoid defining own builders for runfolder accessors by inheriting from
runfolder object
Release 2.14.0
- Add support for loading PacBio ccs BAM files and setting target = 1
on relevant PacBio sequence files.
Release 2.13.0
- WTSI::NPG::HTS::Illumina::ResultSet - added geno to genotype_regex.
- WTSI::NPG::HTS::Illumina::ResultSet - added quant.zip to ancillary_regex.
- WTSI::NPG::HTS::Illumina::ResultSet - added _target_autosome.stats to
ancillary_regex.
- WTSI::NPG::HTS::PacBio::Sequel::AnalysisPublisher - archive tag zero file.
- Control caching of st::api::lims objects to reduce memory use in highly
plexed runs. WTSI::NPG::HTS::LIMSFactory now uses Cache::LRU to limit
the number of cached st::api::lims objects to 100.
- Added BioNano Saphyr run publisher and run publishing script
Release 2.12.1
- WTSI::NPG::HTS::Illumina::AlnDataObject - corrected is_paired_read
calculation and added missing test.
- Illumina RunPublisher modified to use a TreePublisher backed - no
change in behaviour is expected.
Release 2.12.0
- WTSI::NPG::HTS::LIMSFactory - for performance, cache st::api::lims
objects.
- Add id_product metadata attribute to primary metadata, compute the
attribute's value as a digest of composition JSON string.
- Metadata updater for Illumina sequencing data - search for a run
collection in multiple locations.
- Support for minor Pacbio API changes.
Release 2.11.0
- Only add run-, lane- or plex-level metadata (id_run=x, lane=y,
tag_index=z) on merged data when all the constituents are from the
same run, lane or plex. E.g.
run 1000, lane 1; run 1000, lane 2 gives id_run=1000
run 1000, lane 1; run 1001, lane 1 gives lane=1
run 1000, lane 1, plex 2; run 1000, lane 2, plex 2;
run 1000, lane 3, plex 2 gives id_run=1000, tag_index=2
Release 2.10.0
- Make the RunPublisher include paths relative to the source
directory. This generalises the behaviour for QC files to other
file types.
- Bug fix: restored correct creation and consumption of restart
files.
Release 2.9.2
- Register RunParameters.xml file for Illumina run publishing.
- When publishing cram files, only publish crai (not bai).
Release 2.9.1
- Illumina::AlnDataObject
Call super() in update_group_permissions to trigger inherited
before and after methods which manage setting the 'public' data
access group.
Release 2.9
- Archive output from PacBio auto deplexing jobs.
- Use tears -d to ensure the default iRODS server is used (avoids
a HEIRARCHY_ERROR when using getHostForGet cross-zone).
- Improvements and bug fixes for publishing from the GridION.
Release 2.8.1
- Use tears 1.2.4 (-w flag now required for write to iRODS)
Release 2.8
- Switched to disposable-irods 1.3 (iRODS packages from Sanger S3,
replaced RENCI FTP site).
- Added single-server option.
- Added support for data files to change during the tar process
and to detect and archive those changes.
- Individually check the checksum of files local to the gridion
where they are loaded and compare with the iRODS checksum.
- Updated the samtools and htslib versions to 1.7.
- Added tar file auditor.
- Use manifest checksums to confirm file contents.
- Support loading PacBio RSII files created using out of date reagents.
- Allow a user-supplied checksum to be used in TarPublisher and TarStream
(so GridIONRunPublisher can allow the checksum of uncompressed data to be used).
- Publish fastq files during the catchup phase.
- Allow auditing of older GridION tar files which used a relative path for
tarred files.
- Archiving changes for illumina sequencing genotype files.
- Fix for the regex used when choosing to ignore files.
- Minor fix to DataObjectFactory.pm.
Release 2.7
- Moved quant file tests to illumina run publisher tests
- Added compression extentions to illumina ancillary file regular expression
- Added tag hops files to illumina run publisher tests
Release 2.6
- Extended log publisher find command to include STAR aligner log files.
- Add 'STAR' as a valid aligner to the header parser.
- Make file formats 'tab' and 'zip' have restricted access.
- Add compress suffixes to ancillary lane/plex file patterns.
Release 2.5
- Refactored BioNano publication, with metadata from MLWH.
- Add file type option in PacBio meta updater.
- Added a configurable local output directory.
- GridION: ancillary files and metadata; configurable TMPDIR.
- Added Star's tab and Salmon's quant.zip to the list of file
formats with restricted access.
Release 2.4
- Added GridION publishing and primary metadata support.
- Requires perl-irods-wrap >= 3.*
Release 2.3
- Made tests requiring h5repack TODO until we work out why it fails
intermittently.
- MinIONRunPublisher
use /tmp instead of /dev/shm.
added boolean attributes to control compression.
- Support streaming MinION data to iRODS;
this adds dependencies on tears, GNU tar and h5 tools (h5repack).
- Update as well as add PacBio legacy meta data.
- Test against htslib 1.5, samtools 1.5.
Release 2.2
- PacBio
Sequel monitor to use new completedAt date.
Add warning for Sequel R&D run entry.
Archive Sequel adapters.fasta.
Restrict access for relevant PacBio files.
Release 2.1
- Added --restart-file CLI option to the Illumina run publisher to enable
job-specific file naming.
- Added a local cache of loaded file names. A restarted job will use
this to determine which files remain to be loaded.
- Added a CLI option to abort a loading run after a user-specified
maximum number of errors.
- Use WTSI::NPG::iRODS::Publisher, removed deprecated
WTSI::NPG::HTS::Publisher.
- Added monitor for PacBio Sequel and changed metadata files after v4
upgrade.
- Added library_name meta data for PacBio.
- Added support for PacBio Sequel.
Release 2.0
- API change: removed WTSI::NPG::HTS::Annotator in favour of
WTSI::NPG::iRODS::Annotator.
- PacBio: no longer require run ids to be specified whe updating
metadata.
- PacBio: remove dependancy on multi value user defined fields.
Change run_uuid field to optional to support R&D runs/wells.
Release 1.6
- BioNano: add command line script for publication.
- Added a script to update PacBio metadata in iRODS.
- Added support for PacBio legacy metadata.
- Added PacBio run monitor CLI script.
- Archival of bam_stats files.
Release 1.5
- Bug fix: Propagate a failure to read a CRAM header
- Bug fix: Failure to parse a JSON read count cache file is captured
and added to the error count. It is no longer immediately fatal,
but will cause a no-zero exit of the loading script.
- Bug fix: Failure to make an MD5 cache file is reduced from a fatal
error to a warning.
- Warnings now come through the logger, rather than raw carping,
so they gain a WARN tag.
- The default log level has been lowered from ERROR to WARN.
- Added WTSI_NPG_BUILD_BRANCH environment variable to permit overriding
of the default build branch.
- Added negation to CLI file category selection.
- Added support for the samplesheet lims driver.
- WTSI::NPG::HTS::Publisher now supports multi-value AVUs. Previously
it retained only the last value processed for a particular attribute.
Release 1.4
- Added library_type metadata.
- Added tgz as recognised file suffix for metadata.
- Improved logging; configurable per-class, reduced verbosity in metadata
updater, increased default verbosity in scripts.
- The metadata updater will handle an id_run of 0.
- Added a CLI option to specify an id_run for cases where it can't be
detected automatically.
- Avoid loading index files for empty alignment files.
- Avoid loading the JSON-wrapped samtools stats created in more recent
runs.
- Added a PacBio run publisher and monitor.
- Added log file publisher.
- Ensure test dependencies are installed.
Release 1.3
- Defer checksums on (re)loading files until after upload. Assume
that iRODS checksums are in a good state prior to upload.
- Bug fix: avoid calling $obj->str on a string, triggered when remote
path is a collection.
- Count errors during group permission removal and re-throw.
- Added strict_groups parameter to overridden update_group_permissions
method.
- Use the return values of metadata-setting methods to inform the caller
of any failures while each operation remains in a try-catch.
- Add the ability to load InterOp files.
- Added Illumina namespace.
Release 1.2
- Added --alt-process and --archive-path command line options to
publish_illumina_run.
- Added options aliases, e.g. position/lanes, to publish_illumina_run.
- Initial seqchksum digest metadata support for publish_illumina_run.
- Restrict the types of secondary metadata on ancillary files i.e
Restrict JSON file secondary metadata to study_id.
- Change run option to id_run (or id-run).
- Filter data objects by lane and tag index (without recourse to
metadata).
Release 1.1
- Added support for alternative ML warehouse drivers.
- The default samtools is now samtools_irods.
- The Publisher now avoids creating MD5 cache files for small files
and tests for stale cache files.
- Bug fix: Corrected handling of nonconsented human.
- Bug fix: Corrected caching of file lists in RunPublisher.
- Bug fix: Publish run-level XML files.
- Bug fix: Obtain num reads value from the correct flagstats JSON
file for alignment subsets.
Release 1.0