From 43419d30040e74f655323558e8b9acf712ec425c Mon Sep 17 00:00:00 2001
From: James Bonfield
Date: Wed, 9 Nov 2022 11:55:01 +0000
Subject: [PATCH 1/2] SAM: add a sentence on case-insensitivity of RG PL (PR #684)

This is not changing what is valid / permitted, and indeed this
hopefully clarifies it further. However the practicality of dealing
with wide-spread non-compliant data with lowercase PL values is that
tools may wish to be lenient and use case-insensitive matching.

Also removes test/sam/failed/hdr.RG6.sam due to explicitly testing
against the use of lower-case PL fields. While strictly not
conforming, it's overly harsh if we are advocating a more
spec-tolerant testing regime for PL.

Fixes #679
---
 SAMv1.tex                   | 3 ++-
 test/sam/failed/hdr.RG6.sam | 1 -
 2 files changed, 2 insertions(+), 2 deletions(-)
 delete mode 100644 test/sam/failed/hdr.RG6.sam

diff --git a/SAMv1.tex b/SAMv1.tex
index c1d76d1ae..b320db2be 100644
--- a/SAMv1.tex
+++ b/SAMv1.tex
@@ -330,7 +330,8 @@ \subsection{The header section}
  & {\tt PI} & Predicted median insert size, rounded to the nearest integer.\\\cline{2-3}
  & {\tt PL} & Platform/technology used to produce the reads. \emph{Valid values}:
  {\tt CAPILLARY}, {\tt DNBSEQ} (MGI/BGI), {\tt ELEMENT}, {\tt HELICOS}, {\tt ILLUMINA}, {\tt IONTORRENT}, {\tt LS454}, {\tt ONT} (Oxford Nanopore), {\tt PACBIO} (Pacific Biosciences), {\tt SINGULAR}, {\tt SOLID}, and {\tt ULTIMA}.
- This field should be omitted when the technology is not in this list (though the {\tt PM} field may still be present in this case) or is unknown.\\\cline{2-3}
+ This field should be omitted when the technology is not in this list (though the {\tt PM} field may still be present in this case) or is unknown.
+ The values should be written as described in uppercase, however due to the existance of public data with lowercase values tools should also accept lowercase when decoding.\\\cline{2-3}
  & {\tt PM} & Platform model. Free-form text providing further details of the platform/technology used.\\\cline{2-3}
  & {\tt PU} & Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.\\\cline{2-3}
  & {\tt SM} & Sample. Use pool name where a pool is being sequenced.\\\cline{1-3}
diff --git a/test/sam/failed/hdr.RG6.sam b/test/sam/failed/hdr.RG6.sam
deleted file mode 100644
index 229863580..000000000
--- a/test/sam/failed/hdr.RG6.sam
+++ /dev/null
@@ -1 +0,0 @@
-@RG ID:1 PL:illumina

From 3cfc7b4a7a2cebb2edab803452f1c9ad037f08a5 Mon Sep 17 00:00:00 2001
From: John Marshall
Date: Wed, 6 Nov 2024 02:05:23 +1100
Subject: [PATCH 2/2] Describe case-insensitivity in a footnote, as this is not really normative.

---
 SAMv1.tex | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/SAMv1.tex b/SAMv1.tex
index b320db2be..392065d7d 100644
--- a/SAMv1.tex
+++ b/SAMv1.tex
@@ -329,9 +329,10 @@ \subsection{The header section}
  & {\tt PG} & Programs used for processing the read group.\\\cline{2-3}
  & {\tt PI} & Predicted median insert size, rounded to the nearest integer.\\\cline{2-3}
  & {\tt PL} & Platform/technology used to produce the reads. \emph{Valid values}:
- {\tt CAPILLARY}, {\tt DNBSEQ} (MGI/BGI), {\tt ELEMENT}, {\tt HELICOS}, {\tt ILLUMINA}, {\tt IONTORRENT}, {\tt LS454}, {\tt ONT} (Oxford Nanopore), {\tt PACBIO} (Pacific Biosciences), {\tt SINGULAR}, {\tt SOLID}, and {\tt ULTIMA}.
- This field should be omitted when the technology is not in this list (though the {\tt PM} field may still be present in this case) or is unknown.
- The values should be written as described in uppercase, however due to the existance of public data with lowercase values tools should also accept lowercase when decoding.\\\cline{2-3}
+ {\tt CAPILLARY}, {\tt DNBSEQ} (MGI/BGI), {\tt ELEMENT}, {\tt HELICOS}, {\tt ILLUMINA}, {\tt IONTORRENT}, {\tt LS454}, {\tt ONT} (Oxford Nanopore), {\tt PACBIO} (Pacific Biosciences), {\tt SINGULAR}, {\tt SOLID}, and {\tt ULTIMA}.%
+\footnote{The {\tt PL} value should be written in uppercase exactly as shown in this list of valid values.
+Tools should also accept lowercase when reading the {\tt @RG PL} field, due to the existence of public data files with lowercase {\tt PL} values.}
+ This field should be omitted when the technology is not in this list (though the {\tt PM} field may still be present in this case) or is unknown.\\\cline{2-3}
  & {\tt PM} & Platform model. Free-form text providing further details of the platform/technology used.\\\cline{2-3}
  & {\tt PU} & Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.\\\cline{2-3}
  & {\tt SM} & Sample. Use pool name where a pool is being sequenced.\\\cline{1-3}
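
Illustrative note (not part of the patch): the reading behaviour described in the footnote could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the behaviour of any particular tool. The names `VALID_PL`, `normalize_pl`, and `parse_rg_line` are hypothetical, and upper-casing the whole value is slightly more lenient than the footnote asks for, since it also accepts mixed case.

```python
# Valid @RG PL values, copied from the list in the SAMv1.tex hunk.
VALID_PL = frozenset({
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
})


def normalize_pl(value):
    """Return the canonical uppercase PL value, or None if unrecognised.

    Hypothetical helper: matching is case-insensitive on read, so
    "illumina" and "Illumina" both map to "ILLUMINA".
    """
    canonical = value.upper()
    return canonical if canonical in VALID_PL else None


def parse_rg_line(line):
    """Split a tab-separated @RG header line into a {tag: value} dict.

    Hypothetical helper: assumes well-formed TAG:VALUE fields.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    return dict(field.split(":", 1) for field in fields[1:])


if __name__ == "__main__":
    # Same content as the deleted test file hdr.RG6.sam, with tab separators.
    rg = parse_rg_line("@RG\tID:1\tPL:illumina")
    print(normalize_pl(rg["PL"]))  # prints ILLUMINA
```

A writer built on the same sketch would emit only the canonical value returned by `normalize_pl`, so lowercase input is tolerated on read but never propagated on write.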