Outlier detection analysis - empty output #11
Comments
Hi @gspirito! I'm assuming that you've got a directory structure that looks like:
//postprocessing/motifs/
I think the combination of the manifest and input arguments might be having some unintended consequences. Could you please try
Hi, thanks for the response. The directory structure does look like that. I tried re-running with:
python outliers.py -i //postprocessing/ --bootstrapCI -pc 95 -is
Hi @gspirito! I'm sorry about the delay; I missed the notification. What's happening here is that, for some reason, the output data contains an extra field that isn't expected. Is there any chance you could email me the failing CSV ([email protected])? If not, I'm happy to try to debug this with you here, but it might be a little tedious: we'd need to start by checking that none of your samples or identifiers contain commas, etc.
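One way to run that first check yourself is a field-count scan of the CSV: an unquoted comma inside a sample identifier shows up as a row with one more field than the header. Below is a minimal sketch using Python's standard `csv` module. It assumes a comma-delimited file whose first line is a header. The function name `find_malformed_rows` is hypothetical and is not part of superSTR.

```python
import csv
import sys


def find_malformed_rows(path, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header.

    Hypothetical helper: assumes the first line of the file is a header row.
    """
    bad = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return bad  # empty file: nothing to compare against
        expected = len(header)
        for row in reader:
            # An unquoted comma in an identifier splits it into two fields.
            if len(row) != expected:
                bad.append((reader.line_num, len(row)))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    for line_num, n_fields in find_malformed_rows(sys.argv[1]):
        print(f"line {line_num}: {n_fields} fields")
```

Saved as, for example, `check_csv.py` (a hypothetical name), running `python check_csv.py failing.csv` prints one line per offending row. `reader.line_num` matches the physical line number only when no quoted field spans multiple lines.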
Hi, thanks for the answer. I sent you the folders via Google Drive.
Right! Sorry about the delay. The samples in the files you sent have varying read lengths: the samples with prefix ASD have 151nt reads, while the samples with prefixes HG and NA have 150nt reads. We didn't implement code to handle this automatically because how you handle it can affect your outlier calls. Strictly speaking, the 151nt samples may be called as outliers relative to the 150nt samples because their read lengths differ, not because of their repeat content. The way we handled this in the superSTR manuscript was to trim reads (via Trimmomatic, http://www.usadellab.org/cms/?page=trimmomatic) before processing the sequencing data. You could potentially also assign all 151nt-long expansions to the 150nt bin (effectively setting a maximum repeat length); however, you'll want to proceed with some caution here. I can send you a script to do this if that's of use.
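The sketch below shows two things. It checks whether a cohort mixes read lengths, and it illustrates the capping idea. It assumes reads have been exported to plain or gzipped FASTQ with four-line records (for CRAM input, for example via `samtools fastq`). It also assumes per-length counts are held in a plain `{length: count}` histogram. That layout and the names `read_length_counts` and `cap_length_bins` are hypothetical. They do not reflect superSTR's internal format or the script offered above.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path):
    """Count read lengths in a FASTQ file (plain or .gz).

    Assumes unwrapped FASTQ: exactly four lines per record.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            # The sequence is the second line of each four-line record.
            if i % 4 == 1:
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def cap_length_bins(hist, max_len=150):
    """Merge every bin longer than max_len into the max_len bin.

    Hypothetical histogram layout {length: count}; this only illustrates
    "assign all 151nt expansions to the 150nt bin".
    """
    capped = Counter()
    for length, n in hist.items():
        capped[min(length, max_len)] += n
    return dict(capped)
```

Running `read_length_counts` on one sample per prefix should show whether some samples peak at 151 and others at 150. Trimming every sample to a common length before processing avoids the mismatch upstream, whereas capping only hides it downstream, which is why the caution above applies.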
Hi, I performed the preprocessing and postprocessing analyses on a cohort of CRAM files and obtained two folders, 'motifs' and 'samples', whose contents seem to be correct. The postprocessing step also did not give any errors or warnings.
However, running this command:
python outliers.py -i /<path>/postprocessing/motifs/ --bootstrapCI -pc 95 -is -m /<path>/manifest.txt
I get an output containing only the header line:
Motif Threshold Outlier samples Group counts Status
There are no errors or warnings.
I also tried giving the 'samples' or 'postprocessing' directory as input, but I still get no results.
Which directory should I use as input?
Thanks.