-
Notifications
You must be signed in to change notification settings - Fork 4
Day 3
phyton (v. 3.11.3) - https://www.python.org/ biopython - https://biopython.org/wiki/AlignIO bowtie2(v. 2.3.2) - https://github.com/BenLangmead/bowtie2 samtools - https://github.com/samtools/samtools mafft (v7.427) - https://mafft.cbrc.jp/alignment/software raxml-ng (v. 0.8.1 BETA) - https://github.com/amkozlov/raxml-ng bamcov (v. 0.1) - https://github.com/fbreitwieser/bamcov angsd (v. 0.928) - https://github.com/ANGSD/angsd BEAST (v.1.10.4) - https://beast.community/ or BEAST2 - https://www.beast2.org/ bwa (v. 0.7.17-r1188) - https://github.com/lh3/bwa pathPhynder - https://github.com/ruidlpm/pathPhynder snp-sites - https://github.com/sanger-pathogens/snp-sites phynder - https://github.com/ruidlpm/pathPhynder - https://github.com/richarddurbin/phynder FIGTREE (v1.4.4) - http://tree.bio.ed.ac.uk/software/figtree/
Print sample list based on name files
ls *lca.txt > sample.list
nano name_genus
> save it as > 'taxa.list'
while read -r line
do
arr=($line)
lib=${arr[0]}
echo $lib
cat taxa.list | parallel -j20 "grep {} $lib | cut -f1,2,3,4,5,6,7 -d: > $lib.{}.readID.txt"
done < sample.list
wc -l *.readID.txt| awk '$1 == 0' | awk '{print $2}' > rm.list
cat rm.list | parallel -j20 "rm {}"
wc -l *.readID.txt| paste > tot_genus_sequences.txt
for infile in *readID.txt
do
bname=$(basename $infile)
bname1=$(echo $bname | sed 's/.readID.txt*/.fq/')
bname2=$(echo $bname | cut -f1 -d. )
echo $bname1
echo $bname2
seqtk subseq path_to_files_fq/adap2_kmer2_*$bname2*.fq.pp.rmdup.fq $bname > $bname1 &
done
Download reference mitochondrial genomes from NCBI or directly on NCBI Nucleotide database
Send to > File > Fasta > Create File
The samples used here were based on paper Taylor et al. 2021:Taylor, William TT, et al. "Evidence for early dispersal of domestic sheep into Central Asia." Nature Human Behaviour 5.9 (2021): 1169-1179.)
See ReferenceTable1.xlsx
sed "s/:/_/g" in.fa > out.fa
sed "s/ /_/g" in.fa > out.fa
sed "s/;/_/g" in.fa > out.fa
sed "s/__/_/g" in.fa > out.fa
sed "s/(/./g" in.fa > out.fa
sed "s/)/./g" in.fa > out.fa
cat *.fasta > in_to_align.fa
mafft --thread 40 --leavegappyregion Ovis_aries_musimon2.fasta > Ovis_aries_musimon2_aligned.fa
in Geneious (if available): Tools > Generate Consensus Sequence > majority base calling
Or use Python BioAlign's consensus function (python) - biophyton needs to be installed: conda install -c conda-forge biopython
python /path_to_Script/get_consensus.txt Reference_aligned.fa Reference_output_Consensus.fa
for file in *.fa
do
bn=`basename $file .fa`
awk '/^>/ {gsub(/.fa(sta)?$/,"",FILENAME);printf(">%s\n",FILENAME);next;} {print}' $file > ${bn}_renamed.fa ; rm $file
done
cat *Consensus.fa > MSA_to_align.fa
mafft --thread 40 --leavegappyregion MSA_to_align.fa > MSA_out_aligned.fa
raxml-ng --msa MSA_out_aligned.fa --model GTR+G --prefix output_name --threads 2
If any warnings, run the '*reduced.phy' files to get a clean Tree.
for file in *.fa
do
bname=$(basename $file)
bowtie2-build -f $file DB_$bname
done
for file in $(pwd)/*.fq
do
bname=$(basename $file)
echo $bname
bname2=$(echo $bname | cut -f1-3 -d_)
echo $bname2
bname3=$(echo $bname | sed 's/.fq.pp.rmdup.fq*/_holi/')
basepath=$(pwd)/
basefolder=$basepath
echo $basepath
echo $bname3
mkdir $basepath$bname3
cd $basepath$bname3
done
for DB in /path_to_DB/name_DB
do
echo Mapping adap2_kmer2_$bname.pp.rmdup.fq against $DB
bowtie2 --threads 20 -x $DB -U $file --no-unal | samtools view -bS - > $bname2.$(basename $DB).bam
done
for DB in /path_to_DB/name_DB
do
echo Mapping adap2_kmer2_$bname.pp.rmdup.fq against $DB
bowtie2 --threads 20 -x $DB -U $file --no-unal | samtools view -bS - > $bname2.$(basename $DB).bam
done
for bam in *.bam
do
echo Sorting bam files
samtools sort -O BAM -o sort.$bam $bam
done
for sort in sort.*.bam
do
echo Bamcov $sort
/path_to_bamcov/bamcov -m -w0 $sort | paste > bamcov_hist_$sort.txt
/path_to_bamcov/bamcov $sort | paste > bamcov_table_$sort.txt
done
cd $basepath
done
Important to assess the coverage of the mapped reads to different mitogenomes before proceeding with the tree.
mkdir new_folder
for file in *holi/sort.*.bam
do
mv $file new_folder
done
for file in sort*.bam
do
angsd -doFasta 2 -doCounts 1 -i $file -out $file.consensus.fa
done
for file in *.consensus.fa.fa.gz
do
bname=$(basename $file)
echo $bname
gunzip $file
done
for file in *.consensus.fa.fa
do
bname=$(basename $file)
echo $bname
find $file -size 0 -delete
done
for file in *.fa.fa
do
bn=`basename $file .fa.fa`
awk '/^>/ {gsub(/.fa(sta)?$/,"",FILENAME);printf(">%s\n",FILENAME);next;} {print}' $file > ${bn}_renamed.fa ; rm $file
done
cat *renamed.fa MSA_not_aligned.fa > Allref_mycons.fa
mafft --thread 40 --leavegappyregion Allref_mycons.fa > Allref_mycons_aligned.fa
raxml-ng --msa output_name.raxml.reduced.phy --model GTR+G --prefix output_nameRED --threads 2
rerun using reduced.phy file if optimised alignment by raxml is suggested
We ran a tree in Beast based on the msa file for references and for references with sample files. Parameters used in BEAST (v.1.10.4) with 20,000,000 replicates, applying the HKY model and a Coalescent Constant Population as prior. BEAUTI > import alignment, set parameters, save file as Aln_mafft_taxa_references.xml
beast -threads n Aln_mafft_taxa_references.xml
Check log file TRACER > open .log file and check burnin, if not okay, increase replicates
TreeAnnotator → open .trees file and save it as a Maximum Credibility Tree with 10% Burnin when saving/naming the file you have to add the file extension .nexus
If using BEAST2 the tree will be saved with .tree extension that can be read by FIGTREE.
FIGTREE > Open tree.nexus > export tree as Newick > save with .nwk #Visualise node labels/branch labels as posterior values
Check .nwk file
Check with text editor if tree.nwk has any ‘ symbols, if so, remove them
to call those SNPs in a given dataset of ancient samples and find the best path and branch where these can be mapped in the tree.
snp-sites -v -c -o Aln_mafft_taxa_references.vcf Aln_mafft_taxa_references.fa
Rscript /path_to_Rscript/fix_vcf.R Aln_mafft_taxa_references.vcf Aln_mafft_taxa_references_output.vcf
awk '{ if ($1 == "1") $1="consensus";}1' Aln_mafft_taxa_references_output.vcf | sed 's/ /\t/g' > Aln_mafft_taxa_references_fixed_output.vcf
python /path_To_Script/get_consensus.txt Aln_mafft_taxa_references.fa Cons_Aln_mafft_taxa_references.fa
bwa index Cons_Aln_mafft_taxa_references.fa
mkdir map_to_cons
bwa aln -l 1024 -n 0.001 -t 10 /path_to_folder/Consensus_Mafft_All_references.fa /path_to_folder/library.fq
| bwa samse /path_to_folder/Consensus_Mafft_All_references.fa - /path_to_folder/library.fq
| samtools view -F 4 -q 25 -@ 10 -uS - | samtools sort -@ 10 -o library.sort.bam
samtools view -c file.bam
Create directory in pathPhynder_analysis folder
mkdir pathphynder_results
phynder -B -o /path_to_folder/branches.snp /path_to_folder/Mafft_All_reference.nwk /path_to_folder/Mafft_All_references_fixed_consensus.vcf
Prepare data - this will output a bed file for calling variants and tables for phylogenetic placement
pathPhynder -s prepare -i /path_to_folder/Mafft_All_reference.nwk -p taxa_pathphynder_tree -f /path_to_folder/branches.snp -r /path_to_folder/Consensus_Mafft_All_references.fa
pathPhynder -s all -t 100 -m transversions -i /path_to_folder/Mafft_All_references.nwk -p /path_to_folder/taxa_pathphynder_tree -l /path_to_folder/bamlist.txt -r /path_to_folder/Consensus_Mafft_All_references.fa
pathPhynder -s all -t 100 -i /path_to_folder/Mafft_All_references.nwk -p /path_to_folder/taxa_pathphynder_tree -l /path_to_folder/bamlist.txt -r /path_to_folder/Consensus_Mafft_All_references.fa
Visualise pdf files with the results.