What can we learn from the FastQC report? #1

Open
kipkurui opened this issue Mar 11, 2021 · 4 comments

@kipkurui
Contributor

View the HTML summary of the QC report and tell us what it means. What do we need to do based on the results?

@simeonhebrew
Collaborator

One thing that stands out is the low quality of the reads over approximately the first 40 base pairs, the same region that is reported to have a high percentage of N calls.
We could consider trimming this region off.
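
To make the trimming idea concrete, here is a minimal sketch in plain Python. It is not the workflow used in this project: the function names `head_crop` and `n_fraction_per_position` are hypothetical, and the default of 40 bases simply mirrors the estimate above. In practice a dedicated trimming tool would do this on the FASTQ files directly.

```python
from typing import List, Tuple


def head_crop(seq: str, qual: str, n: int = 40) -> Tuple[str, str]:
    """Drop the first n bases of a read together with their quality characters."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    return seq[n:], qual[n:]


def n_fraction_per_position(reads: List[str]) -> List[float]:
    """Fraction of reads with an 'N' call at each position.

    Reads shorter than a given position are ignored for that position.
    """
    if not reads:
        return []
    longest = max(len(r) for r in reads)
    fractions = []
    for i in range(longest):
        covering = [r for r in reads if len(r) > i]
        fractions.append(sum(r[i] == "N" for r in covering) / len(covering))
    return fractions
```

Running `n_fraction_per_position` before and after `head_crop` is one way to check whether the N-rich region really ends near position 40 before committing to that cut-off.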

@nanjalaruth
Collaborator

From Ruth & Gatua

During trimming, the minimum read length could be set to 50 and the Phred quality score threshold to 30.
Warning: Our data is characterized by low per base sequence content, abnormal GC content across the genome, and a majority of sequences that are overrepresented and duplicated.
Mitigation (things that could have been done):
Overrepresentation: normalization during library preparation.
Study the genome to learn whether it is characterized by repeats and whether it is AT-rich or GC-rich.
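
A rough sketch of what the suggested thresholds mean for a single read, written in plain Python. Two things here are assumptions rather than anything stated in this thread: the quality string is taken to be Phred+33 encoded, and the Phred 30 threshold is applied by clipping low-quality bases from the 3' end (trimming tools differ in how they apply it). The function names are hypothetical.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(qual):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def trim_3prime(seq, qual, min_q=30):
    """Clip bases from the 3' end until one with score >= min_q is reached."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(seq, qual, min_len=50, min_q=30):
    """Keep a read only if at least min_len bases survive quality trimming."""
    trimmed_seq, _ = trim_3prime(seq, qual, min_q)
    return len(trimmed_seq) >= min_len
```

With these defaults, a 60-base read whose last 15 bases fall below Q30 is clipped to 45 bases and then discarded, which shows how the two thresholds interact.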

@Kauthar-Omar
Collaborator

From Kauthar and Rose

  • Some sequences are contaminated with adapter sequences, and removing the adapters will improve the quality scores of the sequences.
  • The thing to note is that our data has abnormal GC content, low per base sequence content, high sequence duplication and high levels of overrepresented sequences.
    Considering that this is an RNA-Seq library, the overrepresentation may be due to very abundant transcripts, as opposed to the usual conclusion of PCR enrichment bias.
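
To illustrate the two metrics mentioned above, here is a small Python sketch that computes the GC fraction of a read and lists sequences that make up a large share of a sample. The function names and the `min_fraction` default are hypothetical choices for illustration, and this is a simplification of what a QC tool reports, not a reimplementation of it.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C; N calls are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def overrepresented(reads, min_fraction=0.01):
    """Map each sequence making up at least min_fraction of all reads to its share."""
    counts = Counter(reads)
    total = len(reads)
    return {
        seq: count / total
        for seq, count in counts.items()
        if count / total >= min_fraction
    }
```

For an RNA-Seq library, looking up the top entries returned by `overrepresented` (for example against known transcripts or adapter sequences) is one way to tell abundant transcripts apart from technical artefacts.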

@Asatsa
Collaborator

Asatsa commented Mar 11, 2021

From Asatsa and Wilson

We need to normalize the reads because of overrepresentation.

There is high duplication, so dereplication needs to be done, perhaps using a tool such as MarkDuplicates?
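
As a sketch of what sequence-level dereplication means, the Python below collapses identical reads and reports a simple duplication rate. Note that this is not how MarkDuplicates works: as far as I know, Picard MarkDuplicates operates on aligned reads and flags duplicates by mapping position, so the exact-match approach here is a deliberately simpler stand-in. The function names are hypothetical.

```python
def dereplicate(reads):
    """Collapse identical sequences, keeping first-seen order and a copy count."""
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return counts


def duplication_rate(reads):
    """Share of reads that are extra copies of an already-seen sequence."""
    if not reads:
        return 0.0
    return 1 - len(set(reads)) / len(reads)
```

Given the earlier point that this is an RNA-Seq library, it may be worth inspecting the copy counts before removing anything, since highly expressed transcripts naturally produce many identical reads.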


6 participants