Include LD reference panel in bulk download #1

Merged: 1 commit, Apr 10, 2021

jdblischak
Contributor

When I bulk downloaded the data*, I noticed that the BST1 locus had 2 rows per SNP, and the only column that differed between them was MAF. This PR adds the column LD_ref to distinguish between the two rows.
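The role of the extra column can be sketched in a few lines. What follows is a minimal Python illustration, not the R code from app.R. It assumes each LD panel's results arrive as a list of row dicts keyed by column name. The function name `combine_results`, the panel label `panel_B`, and the SNP ID are hypothetical. Without the `LD_ref` tag, the two stacked rows for `rs_example` would look like duplicates that differ only in MAF.

```python
from typing import Dict, List

Row = Dict[str, object]


def combine_results(results_by_ld_ref: Dict[str, List[Row]]) -> List[Row]:
    """Stack per-LD-panel result tables into one table, adding an LD_ref
    column that records which LD reference panel each row came from."""
    combined: List[Row] = []
    for ld_ref, rows in results_by_ld_ref.items():
        for row in rows:
            tagged = dict(row)  # copy so the input rows are not mutated
            tagged["LD_ref"] = ld_ref
            combined.append(tagged)
    return combined


# Hypothetical example: the same SNP fine-mapped with two LD panels.
combined = combine_results({
    "UKB": [{"SNP": "rs_example", "MAF": 0.10}],
    "panel_B": [{"SNP": "rs_example", "MAF": 0.12}],
})
for row in combined:
    print(row)
```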

Related questions:

  1. How is the Freq column calculated? I was surprised that it is unchanged when using different LD panels. Is MAF calculated in the reference panel and Freq calculated in the study samples?

  2. Is there a documentation page that describes each of the output columns in detail? I haven't been able to find it.

* Caveat: I didn't actually perform the bulk download from within the app, because it always crashed at 87%. Instead, I extracted the code from app.R that combines all the results and ran it myself.

@jdblischak
Contributor Author

> Is MAF calculated in the reference panel and Freq calculated in the study samples?

Update: that was just a guess. After looking at the data, it appears that Freq is always calculated from the UKB samples: it matches MAF exactly when LD_ref == "UKB" but differs when LD_ref != "UKB". Is my interpretation correct?
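This interpretation can be checked programmatically rather than by eye. The following is a minimal Python sketch. It assumes the combined table is a list of row dicts with `Freq`, `MAF`, and `LD_ref` columns. The function name, the tolerance, and the example values are hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping


def freq_matches_maf_by_panel(rows: Iterable[Mapping[str, Any]],
                              tol: float = 1e-9) -> Dict[str, bool]:
    """For each LD_ref, return True if Freq equals MAF (within tol) on every row."""
    result: Dict[str, bool] = {}
    for row in rows:
        matches = abs(float(row["Freq"]) - float(row["MAF"])) <= tol
        ld_ref = str(row["LD_ref"])
        # A panel counts as matching only if every one of its rows matches.
        result[ld_ref] = result.get(ld_ref, True) and matches
    return result


rows = [
    {"SNP": "rs_example", "LD_ref": "UKB", "Freq": 0.10, "MAF": 0.10},
    {"SNP": "rs_example", "LD_ref": "panel_B", "Freq": 0.10, "MAF": 0.12},
]
print(freq_matches_maf_by_panel(rows))
```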

@jdblischak
Contributor Author

@bschilder I'd appreciate any advice on my questions above.

bschilder mentioned this pull request on Apr 10, 2021
@bschilder
Member

bschilder commented Apr 10, 2021

Hi @jdblischak, apologies for the delay.

  1. Freq/MAF: Freq and MAF come from the GWAS summary statistics when they're provided. If only one of Freq or MAF is provided, the other is inferred (e.g. Freq = 1 - MAF). Of course, it's preferable for the user to provide both, since this inference may be inaccurate for multi-allelic SNPs. If neither column is available, echolocatoR tries to fill them in from the UKB reference panel.
  2. Multiple rows/SNP: This occurs because some loci have been fine-mapped using multiple LD reference panels (one row per LD ref). Did you check programmatically that the rows are identical apart from MAF, or was this from visual inspection? All the summary statistics should be the same, but the fine-mapping results should differ slightly across LD panels.
  3. Column descriptions: I've made this a separate issue: Multi-finemap column descriptions #4.
  4. Include LD ref in bulk download: I'll check this. It's now tracked in Include LD ref in bulk download #5.
  5. Bulk download: I've made this a separate issue: Bulk download failing #3.
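The Freq/MAF rules in point 1 can be sketched as follows. This is a hypothetical Python helper, not echolocatoR's implementation (the app is written in R). It applies only the complement rule quoted in point 1, so Freq is treated as the frequency of the complementary allele. It deliberately leaves out the UKB fallback, since the thread doesn't say how that lookup is done.

```python
from typing import Optional, Tuple


def fill_freq_maf(freq: Optional[float],
                  maf: Optional[float]) -> Tuple[Optional[float], Optional[float]]:
    """Return (Freq, MAF), inferring a missing value from the other one.

    Uses the complement rule Freq = 1 - MAF, which assumes a biallelic SNP;
    for multi-allelic SNPs the inferred value can be wrong.
    """
    if freq is not None and maf is not None:
        return freq, maf            # both provided: keep the summary statistics
    if maf is not None:
        return 1.0 - maf, maf       # infer Freq from MAF
    if freq is not None:
        return freq, 1.0 - freq     # infer MAF from Freq
    return None, None               # neither provided: a reference-panel fallback would be needed


print(fill_freq_maf(None, 0.25))
```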
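Point 2 asks whether the duplicated rows were compared programmatically. One way to do that is to group rows by SNP and list every column whose values vary within a group. The following is a minimal Python sketch. It assumes the combined table is a list of row dicts with a `SNP` column. The function name and the `Effect` column are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Set


def differing_columns(rows: Iterable[Mapping[str, Any]],
                      key: str = "SNP") -> Dict[str, Set[str]]:
    """For every SNP that appears in more than one row, return the set of
    columns whose values are not identical across those rows."""
    groups: Dict[Any, List[Mapping[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result: Dict[str, Set[str]] = {}
    for snp, group in groups.items():
        if len(group) < 2:
            continue  # a single row cannot be a duplicate
        columns = set().union(*(r.keys() for r in group))
        # Compare repr() strings so NaN values compare equal to each other.
        result[snp] = {c for c in columns
                       if len({repr(r.get(c)) for r in group}) > 1}
    return result


rows = [
    {"SNP": "rs_example", "LD_ref": "UKB", "MAF": 0.10, "Effect": 0.5},
    {"SNP": "rs_example", "LD_ref": "panel_B", "MAF": 0.12, "Effect": 0.5},
    {"SNP": "rs_other", "LD_ref": "UKB", "MAF": 0.30, "Effect": -0.2},
]
print(differing_columns(rows))
```

Point 2 says fine-mapping results should differ slightly across LD panels. By that account, a real table whose report lists only `MAF` and `LD_ref` would be worth flagging.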

bschilder merged commit c5f5edb into RajLabMSSM:master on Apr 10, 2021