HML Spec: What should raw-reads look like? #7

Open
dvaliga opened this issue Sep 11, 2014 · 2 comments

Comments

@dvaliga
dvaliga commented Sep 11, 2014

What data formats and options do DaSH attendees expect to use when sending 'raw reads' to NMDP? FASTA? FASTQ? Would the reads be sent as an HML message payload or as an external link? If a URI, where will the data live, and for how long?

@mmaiers-nmdp
Contributor

According to MIRING, the raw reads are not part of the message, so they will not appear in HML.

@gturenchalk

According to the MIRING documentation, the primary reads are covered as Category 9 data of an HGS/HTS-specific "Accessory Data" addendum to the main MIRING message in the genotyping report.
My interpretation is that the report will provide a URI pointing to raw data in SFF or FASTQ format that has been uploaded to the SRA. Others, please correct me if I am misinterpreting. The excerpt on Category 9 data is provided below:

Category 9:

Primary Data: unmapped reads with quality scores must be made available as the primary NGS data, permitting re-analysis of the genotype result by different NGS analytic software. This primary data must be limited to full-length reads that include syntactically valid adapter and indexing/barcoding sequences. However, adapter sequences need not be included in the primary data.

Due to their potential large size, it may not be possible to make the primary data available as part of a genotyping report; however, these data must be made available through other electronic means (e.g., deposition in the NCBI’s Sequence Read Archive), and instructions for obtaining them must be included in the genotyping report.

We recommend using either the Sanger FASTQ or Standard Flowgram Format (SFF, used by the Roche 454 platform) or a comparable format to report unmapped sequence reads for NGS HLA or KIR primary data. SFF files can be converted to FASTQ format using available software.
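
For illustration only (this is not part of MIRING or the HML spec): a minimal Python sketch of what "unmapped reads with quality scores" in Sanger FASTQ look like to a consumer. It assumes simple four-line records with no line wrapping and the Sanger Phred+33 quality encoding; the names `FastqRecord` and `read_fastq` are hypothetical helpers made up for this example.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One unmapped read with its per-base quality scores (hypothetical type)."""
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a Sanger FASTQ stream.

    Assumption: each record is exactly four lines (header, sequence,
    '+' separator, quality string); wrapped multi-line records are not handled.
    """
    while True:
        raw = handle.readline()
        if raw == "":          # end of stream
            return
        header = raw.strip()
        if not header:         # tolerate blank lines between records
            continue
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Sanger FASTQ stores each Phred score as chr(score + 33).
        scores = [ord(ch) - 33 for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Made-up example read, not real HLA or KIR data.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.qualities)
```

A reader along these lines would be enough to check that a file reachable through the reported link really carries per-base quality scores, which is what allows re-analysis by different software.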
