Somewhat unclear references to types in SAMtags #798

Open
cmdcolin opened this issue Oct 14, 2024 · 1 comment · May be fixed by #804

@cmdcolin
Contributor

"Optional fields are usually displayed as TAG:TYPE:VALUE; the type may be one of A (character), B (general
array), f (real number), H (hexadecimal array), i (integer), or Z (string)."

However, there are also C, I and S, and it is not obvious from this sentence that B is combined with those types. It is also somewhat unclear what H is.

Digging deeper into the cross-references from the BAM/CRAM documents might make this clearer, but explicitly linking to those documents from SAMtags could help.
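To illustrate how B combines with an element type, here is a minimal Python sketch of splitting an optional field into its parts. The function name `parse_optional_field` and the tags `XB` and `XH` are hypothetical, and the sketch assumes well-formed input rather than validating it.

```python
def parse_optional_field(field):
    """Split a SAM optional field of the form TAG:TYPE:VALUE (no validation)."""
    # Split at most twice: a Z value may itself contain ':' characters.
    tag, type_code, value = field.split(":", 2)
    if type_code == "B":
        # The first comma-separated item is the element subtype (one of cCsSiIf);
        # the remaining items are the array elements.
        subtype, *items = value.split(",")
        convert = float if subtype == "f" else int
        return tag, type_code, (subtype, [convert(x) for x in items])
    if type_code == "i":
        return tag, type_code, int(value)
    if type_code == "f":
        return tag, type_code, float(value)
    if type_code == "H":
        # H is a byte array written as pairs of hex digits.
        return tag, type_code, bytes.fromhex(value)
    # A (single character) and Z (string) stay as text.
    return tag, type_code, value
```

For example, `parse_optional_field("XB:B:C,1,2,255")` yields the subtype `C` together with the list `[1, 2, 255]`, which shows that C only appears in SAM text as an element type inside a B array.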

@jkbonfield
Contributor

c, C, s, S, i and I are (or were) specific to the BAM encoding. So in SAM we could have AB:i:7, while in BAM it would be e.g. AB C \007.

Once "B" was added as a byte array, the internal C,I,S representation was exposed to the text format. Although arguably it's not needed and i could have sufficed, I am guessing this was to aid rapid conversion to BAM and to avoid the need for multiple passes through the data to work out the minimum and maximum values.
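A rough Python sketch of the point about minimum and maximum values: a writer that has to pick the narrowest integer code needs to see the whole range first. The helper names and the "smallest type, unsigned first" policy are assumptions for illustration, not necessarily what any particular implementation does. The layout assumed here is the two-character tag, the type code, then little-endian data, with B arrays adding a subtype and a 32-bit element count.

```python
import struct

# (BAM type code, struct format, minimum, maximum), narrowest first.
# Unsigned codes are tried before signed codes of the same width.
_INT_TYPES = [
    ("C", "B", 0, 0xFF),
    ("c", "b", -0x80, 0x7F),
    ("S", "H", 0, 0xFFFF),
    ("s", "h", -0x8000, 0x7FFF),
    ("I", "I", 0, 0xFFFFFFFF),
    ("i", "i", -0x80000000, 0x7FFFFFFF),
]

def smallest_int_type(lo, hi):
    """Return (type code, struct format) of the narrowest type holding [lo, hi]."""
    for code, fmt, tmin, tmax in _INT_TYPES:
        if tmin <= lo and hi <= tmax:
            return code, fmt
    raise ValueError("range does not fit in a 32-bit type")

def encode_int_tag(tag, value):
    """Encode a SAM 'i' value as tag + type code + little-endian data."""
    code, fmt = smallest_int_type(value, value)
    return tag.encode("ascii") + code.encode("ascii") + struct.pack("<" + fmt, value)

def encode_int_array_tag(tag, values):
    """Encode a non-empty integer array as tag + 'B' + subtype + count + data."""
    # This min/max scan is the extra pass that an explicit subtype
    # in the SAM text (e.g. B:C,...) lets a converter skip.
    code, fmt = smallest_int_type(min(values), max(values))
    header = tag.encode("ascii") + b"B" + code.encode("ascii")
    count = struct.pack("<I", len(values))
    return header + count + struct.pack(f"<{len(values)}{fmt}", *values)
```

With this policy, `encode_int_tag("AB", 7)` gives `b"ABC\x07"`, matching the AB C \007 example above.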

PS. I'm not sure I like "are usually displayed". For SAM it's mandatory, and there's really no "display" for CRAM or BAM. It's a bit of a woolly definition. We should probably copy the table from SAMv1.tex where it defines them more precisely using a regular expression.
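As a sketch of what the regular-expression definition buys, the table can be turned directly into a validator. The patterns below are transcribed from memory of SAMv1 section 1.5 and should be checked against SAMv1.tex before being relied on; the function name `is_valid_optional_field` is hypothetical.

```python
import re

# Numeric pattern shared by the f type and by B array elements.
NUM = r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?"

# TYPE -> regular expression that VALUE must match in full.
# Assumption: transcribed from SAMv1 section 1.5, not copied from the source.
VALUE_PATTERNS = {
    "A": r"[!-~]",                     # printable character
    "i": r"[-+]?[0-9]+",               # signed integer
    "f": NUM,                          # floating-point number
    "Z": r"[ !-~]*",                   # printable string, including space
    "H": r"([0-9A-F][0-9A-F])*",       # byte array in hex format
    "B": r"[cCsSiIf](," + NUM + r")*", # integer or numeric array
}
TAG_PATTERN = r"[A-Za-z][A-Za-z0-9]"

def is_valid_optional_field(field):
    """Check one TAG:TYPE:VALUE field against the patterns above."""
    parts = field.split(":", 2)
    if len(parts) != 3:
        return False
    tag, type_code, value = parts
    pattern = VALUE_PATTERNS.get(type_code)
    if pattern is None:
        # c, C, s, S and I are not top-level SAM types, only B subtypes.
        return False
    return bool(re.fullmatch(TAG_PATTERN, tag)) and bool(re.fullmatch(pattern, value))
```

Under these patterns `AB:i:7` and `XB:B:C,1,2,3` are accepted, while `AB:C:7` is rejected because C is only valid as a B subtype in SAM text.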

@jkbonfield jkbonfield self-assigned this Nov 5, 2024
jkbonfield added a commit to jkbonfield/hts-specs that referenced this issue Jan 7, 2025
SAM section 1.5 clearly defines the standard SAM tag types along with the
expanded codes used in the B byte-array type.  This text has been
copied into the SAMtags document to remove a rather woolly definition
there.  The text describing lower-case tags has been removed, because
this was already discussed in further detail at the end of the SAMtags
document (and is itself somewhat woolly when it comes to half-upper
half-lower combinations due to the addition of draft tags).

Fixes samtools#798