DNA-shape

Based on the long-read SRR11606870 assembly of the Mus musculus genome (Ton et al., Scientific Data 2020; PMID: 33203859), and using the major and minor satellite reads from Packiaraj and Thakur 2024 Genome Biol (PMID: 38378611) as templates, the scripts compute:

Number of contiguous A/T stretches (default minimum length: 4)
Number of tetranucleotides associated with a narrow minor groove (Rohs et al., 2009 Nature; PMID: 19865164)
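The two metrics above can be sketched in plain Python as follows. This is a minimal illustration, not the code in majsat.py or minsat.py: it assumes that an A/T stretch is a maximal run of mixed A and T bases (e.g. ATTA) and that tetranucleotides are counted with overlaps on the given strand only (reverse complements are not added). The function names are hypothetical.

```python
import re

# The ten narrow-minor-groove tetranucleotides, in the column order used by the output CSVs
NARROW_MINOR_GROOVE_TETRAS = ["AAAT", "AATA", "AATC", "AATT", "AAAA",
                              "AAGT", "GAAT", "GAAA", "TAAT", "AAAC"]


def count_at_stretches(seq: str, min_len: int = 4) -> int:
    """Count maximal runs of A/T bases (mixed, e.g. 'ATTA') that are at least min_len long."""
    # A greedy character class finds each maximal run once, without overlaps
    pattern = re.compile(r"[AT]{%d,}" % min_len)
    return sum(1 for _ in pattern.finditer(seq.upper()))


def count_tetranucleotides(seq: str, motifs=NARROW_MINOR_GROOVE_TETRAS) -> dict:
    """Count overlapping occurrences of each motif on the given strand only."""
    seq = seq.upper()
    counts = {m: 0 for m in motifs}
    # Slide a 4-bp window one base at a time
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    return counts
```

For example, in GGAAAATTCCATATGCAAATGG the runs AAAATT, ATAT and AAAT give three A/T stretches with the default minimum length of 4, but only one (AAAATT) with min_len=5.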

Requirements:

Operating system tested: macOS Ventura (Apple M1)
Language: Python 3.11
Installer: Miniconda
Installation time: about 10 min
Code run time: 10-30 min
Modules: blast 2.6.0; biopython 1.83; mkl-service 2.4.0; regex 2024.5.15
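To confirm that an environment matches the versions listed above, a small check such as the following can help. It is a sketch using only the standard library (importlib.metadata and shutil.which); the package names are the distribution names as written above, and it assumes that BLAST is reachable as the blastn executable on PATH. check_environment is a hypothetical helper, not part of the repository.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version

# Tested versions listed in this README; blast is a command-line tool, checked separately
REQUIRED_PYTHON_PACKAGES = {"biopython": "1.83", "mkl-service": "2.4.0", "regex": "2024.5.15"}


def check_environment(required=REQUIRED_PYTHON_PACKAGES, blast_exe="blastn"):
    """Return a list of problems; an empty list means everything matched."""
    problems = []
    for pkg, wanted in required.items():
        try:
            found = version(pkg)  # reads installed package metadata
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{pkg}: found {found}, tested with {wanted}")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which(blast_exe) is None:
        problems.append(f"{blast_exe}: not found on PATH (BLAST+ 2.6.0 expected)")
    return problems
```

Running print(check_environment()) in the PyCharm console lists any mismatches. A newer version is not necessarily a problem; it only differs from the tested setup.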

Tools to download (we recommend using a terminal window/bash):

Download Miniconda (https://docs.anaconda.com/miniconda/)
Install BLAST via Miniconda (https://anaconda.org/bioconda/blast)
Download sra-tools (https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit). Useful wiki: https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump; useful blog: https://edwards.flinders.edu.au/fastq-dump/
We recommend using the free PyCharm 2023.2.5 Community Edition (https://www.jetbrains.com/pycharm/) to run the majsat.py and minsat.py scripts

Prepare the dataset (we recommend using a terminal window/bash):

Use sra-tools to download the SRR11606870 dataset (run browser: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR11606870&display=metadata)
Prefetch the dataset: prefetch SRR11606870
Convert all reads to unsorted FASTA format: fasterq-dump pathtoSRAfile --outdir pathtoSRAfile/fasta --fasta-unsorted
Rename the output file to SRR11606870.fasta
Copy the two FASTA files containing the major (SRR11606870_2342980.fasta) and minor (SRR11606870_111923.fasta) satellite reads from https://github.com/DDudka9/DNA-shape.git into the folder containing SRR11606870.fasta
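The dataset preparation steps above can also be driven from Python. The sketch below only wraps the prefetch and fasterq-dump calls listed in this README (it prints them unless dry_run=False) and checks that the three input FASTA files are in place. The renaming step is left manual, and sra_dir is assumed to be the folder that prefetch created. build_commands, missing_inputs and prepare are hypothetical helper names.

```python
import shlex
import subprocess
from pathlib import Path

ACCESSION = "SRR11606870"
# Satellite templates shipped in the DNA-shape repository
TEMPLATES = ["SRR11606870_2342980.fasta", "SRR11606870_111923.fasta"]


def build_commands(sra_dir: str, accession: str = ACCESSION):
    """Return the prefetch and fasterq-dump calls from this README as argv lists."""
    return [
        ["prefetch", accession],
        ["fasterq-dump", sra_dir, "--outdir", f"{sra_dir}/fasta", "--fasta-unsorted"],
    ]


def missing_inputs(folder: str):
    """List the input FASTA files that are not yet present in the working folder."""
    needed = [f"{ACCESSION}.fasta", *TEMPLATES]
    return [name for name in needed if not (Path(folder) / name).is_file()]


def prepare(sra_dir: str, dry_run: bool = True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_commands(sra_dir):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

prepare("SRR11606870") only prints the two commands; once SRR11606870.fasta has been renamed and the templates copied, missing_inputs(".") should return an empty list.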

Run the scripts (in PyCharm 2023.2.5 Community Edition):

Clone the GitHub repository (Git / Clone / URL: https://github.com/DDudka9/DNA-shape.git)
Create a new interpreter: Python interpreter (bottom right corner) / Add New Interpreter / Add Local Interpreter / Conda Environment / Create New Environment (provide the path to Miniconda; select Python 3.11)
Select the "requirements.txt" file and click "Install requirements"
Follow the instructions inside the majsat.py and minsat.py scripts (use PyCharm's built-in Python Console to run successive parts of each script by copy-pasting the code into the console and pressing Return)
The output files should appear in the folder containing SRR11606870_2342980.fasta and SRR11606870_111923.fasta

Expected output files:

SRR11606870_Maj_2342980_tetranucleotides.csv - Spreadsheet in which each column gives the number of tetranucleotides associated with a narrow minor groove (order: AAAT; AATA; AATC; AATT; AAAA; AAGT; GAAT; GAAA; TAAT; AAAC) per 1 kb along a representative major satellite array (SRR11606870_2342980)
SRR11606870_Min_111923_tetranucleotides.csv - Spreadsheet in which each column gives the number of tetranucleotides associated with a narrow minor groove (order: AAAT; AATA; AATC; AATT; AAAA; AAGT; GAAT; GAAA; TAAT; AAAC) per 1 kb along a representative minor satellite array (SRR11606870_111923)
SRR11606870_Maj_tetranucleotides_average.fasta - Number of narrow-minor-groove tetranucleotides per 234 bp in each of 500 major satellite arrays (averages are at the end of the file)
SRR11606870_Min_tetranucleotides_average.fasta - Number of narrow-minor-groove tetranucleotides per 234 bp in each of 500 minor satellite arrays (averages are at the end of the file)
SRR11606870_Maj_ATstretches_average.fasta - Number of A/T stretches (default minimum length: 4) per 234 bp in each of 500 major satellite arrays (averages are at the end of the file)
SRR11606870_Min_ATstretches_average.fasta - Number of A/T stretches (default minimum length: 4) per 234 bp in each of 500 minor satellite arrays (averages are at the end of the file)
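The two kinds of output listed above (counts per 1 kb window along one array, and counts normalized to 234 bp and averaged over many arrays) can be approximated as follows. This is a sketch under stated assumptions, not the logic of majsat.py or minsat.py: windows are non-overlapping, a trailing window shorter than 1 kb is dropped, motifs spanning a window boundary are not counted, and normalization is assumed to be count / array length * 234. All function names are hypothetical.

```python
import csv
import io

# Column order stated in this README for the tetranucleotide CSVs
TETRAS = ["AAAT", "AATA", "AATC", "AATT", "AAAA",
          "AAGT", "GAAT", "GAAA", "TAAT", "AAAC"]


def count_motifs(seq):
    """Overlapping counts of each tetranucleotide on the given strand, in TETRAS order."""
    counts = dict.fromkeys(TETRAS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[m] for m in TETRAS]


def per_window_rows(seq, window=1000):
    """One row of motif counts per full, non-overlapping window (partial tail dropped)."""
    seq = seq.upper()
    return [count_motifs(seq[start:start + window])
            for start in range(0, len(seq) - window + 1, window)]


def rows_to_csv(rows):
    """Render rows as CSV text with the motifs as column headers."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(TETRAS)
    writer.writerows(rows)
    return buf.getvalue()


def mean_per_unit(seqs, unit=234):
    """Mean count per `unit` bp across arrays, normalizing each array by its own length."""
    if not seqs:
        raise ValueError("need at least one array")
    totals = [0.0] * len(TETRAS)
    for s in seqs:
        s = s.upper()
        for j, c in enumerate(count_motifs(s)):
            totals[j] += c / len(s) * unit
    return [t / len(seqs) for t in totals]
```

For instance, a 2 kb sequence made of repeated AAAT yields two rows, each with 250 AAAT and 249 AATA occurrences. Normalizing by each array's length lets arrays of different sizes contribute equally to the average.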

You can modify the scripts (array ID) to analyze any other array or to use a different dataset.
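To analyze a different array, its read first needs to be available as a separate FASTA file. A minimal, standard-library sketch for pulling one read out of SRR11606870.fasta follows. It assumes that each header starts with an ID of the form SRR11606870.<read number> (check the headers in your own file) and that the output should follow the repository's SRR11606870_<read number>.fasta naming. read_fasta, find_read and save_array are hypothetical helpers, not functions from majsat.py or minsat.py.

```python
from pathlib import Path


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # multi-line sequences are joined
    if header is not None:
        yield header, "".join(chunks)


def find_read(lines, read_id):
    """Return (header, seq) for the first record whose ID token equals read_id, else None."""
    for header, seq in read_fasta(lines):
        tokens = header.split()
        if tokens and tokens[0] == read_id:
            return header, seq
    return None


def save_array(fasta_path, accession, read_number, outdir="."):
    """Hypothetical helper: copy one read into <accession>_<read_number>.fasta."""
    read_id = f"{accession}.{read_number}"  # assumed header format; verify in your file
    with open(fasta_path) as fh:
        hit = find_read(fh, read_id)
    if hit is None:
        raise KeyError(f"{read_id} not found in {fasta_path}")
    out = Path(outdir) / f"{accession}_{read_number}.fasta"
    out.write_text(f">{hit[0]}\n{hit[1]}\n")
    return out
```

This scans the file linearly, which is fine for a one-off extraction. For repeated lookups in a large file, Biopython's SeqIO.index (Biopython is already a requirement) avoids rescanning each time.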
