
Reference genome mismatch due to lowercase sequence #12

Open
gabrielle-y opened this issue Jun 7, 2023 · 1 comment

Comments

@gabrielle-y

https://github.com/tkzeng/Pangolin/blob/5cf94b8db938c658391b4305cd7ce33297d44ff7/pangolin/pangolin.py#LL110C1-L111C1

I'm trying to run Pangolin with the UCSC hg38 genome, which contains some lowercase sequence. The `if` statement at line 110 (linked above) then skips variants with this warning: "[Line 64] WARNING, skipping variant: Mismatch between FASTA (ref base: g) and variant file (ref base: G)." I tried converting seq to uppercase with a built-in Python function, but this did not resolve the issue.

I would appreciate it if the script could be changed to support lowercase sequences. If I resolve this in the meantime, I will update this issue with the solution.
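
To illustrate the kind of mismatch described above, here is a minimal sketch, not Pangolin's actual code. It assumes the check is a plain string comparison between the FASTA base and the variant file's REF allele. The helper `ref_matches` and the example sequence are hypothetical.

```python
def ref_matches(fasta_base: str, vcf_ref: str) -> bool:
    """Compare reference alleles while ignoring case.

    Hypothetical helper: UCSC FASTA files mark repeats in lowercase,
    while variant files normally use uppercase alleles.
    """
    return fasta_base.upper() == vcf_ref.upper()


# Made-up stretch of sequence: lowercase followed by uppercase bases.
seq = "acgtACGT"

strict = seq[2] == "G"               # case-sensitive check on 'g' vs 'G'
relaxed = ref_matches(seq[2], "G")   # case-insensitive check

print(strict, relaxed)
```

A case-insensitive comparison still rejects genuinely different bases, so real reference mismatches would continue to be flagged.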

@gabrielle-y
Author

Found the issue: we had to re-run pip install so that the installed package picked up the updated pangolin.py file. After that, appending .upper() to line 103 resolved the error. We have not tested downstream implications.
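
For anyone hitting the same "my edit has no effect" symptom, a quick way to see which file Python actually loads is sketched below. The helper `module_origin` is hypothetical, and the module name `pangolin.pangolin` is an assumption based on the file path linked above.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would load for a module, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return None
    return spec.origin if spec else None


# Assumed module name; prints None if the package is not installed.
print(module_origin("pangolin.pangolin"))
```

If the printed path points into site-packages rather than the edited checkout, the edit will not be used until the package is reinstalled.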
