2707 caQTL in RASQUAL paper #19

Open
YaCui opened this issue Apr 19, 2019 · 12 comments

Comments

@YaCui

YaCui commented Apr 19, 2019

Dear Natsuhiko,
Thanks so much for developing RASQUAL! Could you provide the 2707 caQTLs identified in the RASQUAL paper?

best,
Ya

@natsuhiko
Owner

Hi,

Here is the link to the google drive: https://drive.google.com/open?id=0B-aFDIHv9Wy3M3kwS1hPM09TRlU

You can find the peak annotation (peaks.bed.gz) as well as the peak IDs at FDR 10% (pid.fdr10.txt).

I would, however, recommend to use the latest caQTL result with 100 British samples presented in our latest paper (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics).

Best regards,

Natsuhiko

@YaCui
Author

YaCui commented Apr 19, 2019

Great! Thanks for sharing!

best,
Ya

@YaCui
Author

YaCui commented May 26, 2019

Dear Natsuhiko,
I have a small question. How should I determine the values of -l and -m? Can I just use "-l 378 -m 62" in my analysis for all features?

Thanks,
Ya

@natsuhiko
Owner

You need to count the appropriate numbers of SNPs for each feature yourself. The number of tested SNPs (-l) is relatively easy to obtain: count the rows of the VCF that is fed to RASQUAL (on Linux you can just use the wc command). If you have enough memory and are not sure how to count the SNPs overlapping with multiple features, you could set the number of feature SNPs (-m) to the number of tested SNPs.

Best regards,
Natsuhiko
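
The counting step described above can be sketched as follows. This is only an illustration under stated assumptions: `count_tested_snps` is a hypothetical helper (not part of RASQUAL), and it assumes the VCF rows for one feature's cis-window arrive on standard input, for example from the `tabix` call used in the example command. It skips `#` header lines, so it gives the same number as `wc -l` when no header is present.

```shell
#!/bin/sh
# count_tested_snps: hypothetical helper, not part of RASQUAL.
# Reads VCF text on stdin and prints the number of variant rows,
# ignoring header lines (starting with '#') and empty lines.
count_tested_snps() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Possible usage (commented out because it needs tabix and the RASQUAL data):
#   n_snps=$(tabix data/chr11.gz 11:2315000-2340000 | count_tested_snps)
#   echo "use -l $n_snps (and, as a conservative choice, -m $n_snps)"
```

Setting -m to the same value as -l follows the suggestion above for cases where memory is not a concern; the region and file path in the usage comment are taken from the example command, and the count has to be repeated for each feature's own window.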

@YaCui
Author

YaCui commented Jun 19, 2019

Dear Natsuhiko,
I am a little confused about the RASQUAL results. I can get results like "rasqual_atac_1M.gz", but how can I get the q-values in "Q.val.txt.gz"? The q-values in "Q.val.txt.gz" seem to differ from the "Log_10 Benjamini-Hochberg Q-value" in "rasqual_atac_1M.gz".

All files are from https://drive.google.com/drive/folders/0B-aFDIHv9Wy3M3kwS1hPM09TRlU.

Thanks,
Ya

@natsuhiko
Owner

Sorry for the confusion. The file "rasqual_atac_1M.gz" is old and the 10th column is not the Q value. This is because we provide the Q values as a separate file.

Best regards,
Natsuhiko

@YaCui
Author

YaCui commented Jun 20, 2019

Hi Natsuhiko,
So how can I get the Q values file? I cannot get this file if I just run the commands like below:

cd $RASQUALDIR
tabix data/chr11.gz 11:2315000-2340000 | bin/rasqual -y data/Y.bin -k data/K.bin -n 24 -j 1 -l 378 -m 62 -s 2316875,2320655,2321750,2321914,2324112 -e 2319151,2320937,2321843,2323290,2324279 -t -f C11orf21 -z

Thanks,
Ya

@natsuhiko
Owner

natsuhiko commented Jun 21, 2019

Sorry, but I don't understand your problem. I believe Q.val.txt.gz gives you the Q value for each peak in the rasqual_atac_1M.gz file.

The example command on the GitHub page is for RNA-seq data, not for the ATAC-seq data we provided on Google Drive.

Best regards,
Natsuhiko

@YaCui
Author

YaCui commented Jun 21, 2019

Hi Natsuhiko,
Got it. Thank you so much for your help.

Thanks,
Ya

@plbngl

plbngl commented Nov 4, 2019

Hi Natsuhiko,

Regarding the caQTL result with 100 British samples (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics): I have your summary statistics with the probabilities, but I don't know what cutoff you used to define a caQTL or how many there are in total. I cannot find this in the paper. Thank you very much!
Paola

@natsuhiko
Owner

natsuhiko commented Nov 5, 2019

Hi Paola,

The RASQUAL mapping result based on 24 LCLs (not 100 LCLs) is found here: https://drive.google.com/drive/folders/0B-aFDIHv9Wy3M3kwS1hPM09TRlU

The paper you cited is different. In the paper, we used 100 LCLs and performed caQTL mapping with a different approach to detect causal interactions in the genome. Because we used a Bayesian approach, we don't have "significant caQTLs" but just posterior probabilities.

Best regards,
Natsuhiko

@plbngl

plbngl commented Nov 5, 2019

Thank you Natsuhiko!
Yes, I have been using the results from the 24 LCLs of the first study. But since in your comment above you said:
"I would, however, recommend to use the latest caQTL result with 100 British samples presented in our latest paper (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics)", I thought that you had also identified caQTLs there, maybe more than with 24 samples, so I thought I would use this new study. Anyway, I can just use the results from the 24 samples!
Thank you very much!!
Paola
