
How different methods on RepeatMasker running can lead to different results #3

Open
yshcai opened this issue May 24, 2022 · 0 comments


yshcai commented May 24, 2022

Hi! Thank you for providing this pipeline for annotating a newly assembled genome!

I ran RepeatMasker with two different libraries (one built by RepeatModeler, the other the 'species' library contained in RepBase) following your commands, but I added the -xsmall parameter [returns repetitive regions in lowercase (rest capitals) rather than masked]. I did this because BRAKER is recommended to run on genomic sequences that have been softmasked for repeats, and its --softmasking option is intended for softmasked genomes. The commands look like this:

RepeatMasker -pa 20 -lib 01_repeatModeler-denovo-repeat.lib/RM_*/consensi.fa.classified -html -gff -xsmall -dir 02_delete-denovo-lib-result genome.fa &>RepeatMasker_run.log1
RepeatMasker -pa 20 -species "Lepidoptera" -html -gff -xsmall -dir 03_delete-repeatmasker-lib-result 02_delete-denovo-lib-result/genome.fa.masked &>RepeatMasker_run.log2
RepeatMasker -pa 20 -species "Lepidoptera" -html -gff -xsmall -noint -dir 04_delete-repeamasker-noint-result 03_delete-repeatmasker-lib-result/genome.fa.masked.masked &>RepeatMasker_run.log3

I compared the two masked genomes generated by runs 1 and 2 and found that some sequences masked in the first run (with -lib) are unmasked in the second run (with -species). I think this is caused by the -xsmall option, so I ran the above commands again without -xsmall, and the problem went away.
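For anyone who wants to reproduce the comparison, here is a minimal Python sketch that counts, per sequence, the bases that are lowercase (softmasked) in the first file but uppercase in the second. The function names `read_fasta` and `count_lost_masking` are hypothetical helpers written for this post, not part of RepeatMasker or this pipeline, and the sketch assumes both files contain the same sequences with identical lengths.

```python
from typing import Dict, List


def read_fasta(path: str) -> Dict[str, str]:
    """Parse a FASTA file into {sequence_id: sequence}, preserving letter case."""
    records: Dict[str, List[str]] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def count_lost_masking(first_path: str, second_path: str) -> Dict[str, int]:
    """Count bases that are lowercase in the first file but uppercase in the second."""
    first = read_fasta(first_path)
    second = read_fasta(second_path)
    lost: Dict[str, int] = {}
    for name, seq1 in first.items():
        seq2 = second.get(name)
        if seq2 is None or len(seq2) != len(seq1):
            raise ValueError(f"sequence {name!r} is missing or differs in length")
        # A position counts as "lost" when run 1 softmasked it and run 2 did not.
        lost[name] = sum(
            1 for a, b in zip(seq1, seq2) if a.islower() and b.isupper()
        )
    return lost
```

With the directory layout from the commands above, this would be called as `count_lost_masking("02_delete-denovo-lib-result/genome.fa.masked", "03_delete-repeatmasker-lib-result/genome.fa.masked.masked")`. Any non-zero count indicates masking from run 1 that did not survive run 2.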

It is confusing that this pipeline uses 03_delete-repeatmasker-lib-result/genome.fa.masked.masked rather than 04_delete-repeamasker-noint-result/genome.fa.masked.masked.masked to run BRAKER.

So what is the purpose of run 3 (with -noint)? If we run RepeatMasker in hardmasking mode, I think braker.pl shouldn't be given the --softmasking option. If we run RepeatMasker in softmasking mode, would it be better to combine the libraries into one library, as mentioned by jebrosen in Dfam-consortium/TETools#20 (comment), and then feed the resulting genome.fa.masked (softmasked) to BRAKER with --softmasking?
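To make the "combine the libraries" idea concrete, here is a rough Python sketch that concatenates several FASTA repeat libraries into a single file. The helper name `combine_libraries` is hypothetical. The sketch also assumes the species library has already been exported to a FASTA file by some other means, which is not shown here. It does not deduplicate or rename consensus sequences.

```python
from typing import Iterable


def combine_libraries(library_paths: Iterable[str], combined_path: str) -> int:
    """Concatenate FASTA libraries into one file; return the number of records."""
    record_count = 0
    with open(combined_path, "w") as out:
        for path in library_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        record_count += 1
                    # Guarantee a trailing newline so that the last sequence line
                    # of one file never fuses with the next file's first header.
                    out.write(line if line.endswith("\n") else line + "\n")
    return record_count
```

The combined file could then be passed to a single RepeatMasker run with -lib and -xsmall, so that only one softmasked genome.fa.masked is produced and there is no chained second run.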
