Systematic characterization of indel variants using a yeast-based protein folding sensor

Introduction

This repository contains all data and code needed to repeat the processing and analysis of the CPOP data in Larsen-Ledet et al.: "Systematic characterization of indel variants using a yeast-based protein folding sensor". The only exception is the raw FASTQ files, which are available from the NCBI Gene Expression Omnibus (GEO) repository under accession number GSE270811.

Overview of files

Output files

cpop_data.csv - CPOP scores and standard deviations for DHFR indel, synonymous and nonsense variants.
cpop_data_pre_rescale.csv - Raw CPOP scores and standard deviations for DHFR indel, synonymous and nonsense variants prior to rescaling.
cpop_data_ROC_[ins|del].csv - CPOP scores for ROC curves, with indel variants that are duplicates at the protein level removed.
tile[1-5].csv - Counts per tile for DHFR indel, synonymous and nonsense variants for each replicate and condition.

Input files

[ins|del]_dplddt_ddg.csv - dpLDDT and ddG predictions for DHFR indel variants.
rSASA.csv - Relative solvent accessible surface area (rSASA) for each residue in DHFR.
mtx_dist.csv - Distance (Å) of each residue in DHFR to the MTX binding site.
[ins|del]_esm1b.csv - ESM1b predictions for DHFR indel variants.
del_sequence_alignment.csv - Deletions per position in an MSA of DHFR homologs generated with HHblits.

Excel files

CPOP_primers_annealing.temp..xlsx - Primers and annealing temperatures for the first PCR in amplicon preparation.
SupplementalFile1.xlsx - All data files combined in a single Excel file.
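The bracket shorthand in the lists above ([ins|del], tile[1-5]) stands for several concrete files. The sketch below expands it and reports which files are missing from a local copy of the repository. It is a convenience check and not part of the published pipeline: the helper name `missing_files` is hypothetical, and every data file is assumed to carry a .csv extension.

```python
from pathlib import Path

# File names taken from the overview above, with the bracket shorthand
# expanded by hand. Assumption: all data files end in .csv.
EXPECTED_FILES = (
    ["cpop_data.csv", "cpop_data_pre_rescale.csv"]
    + [f"cpop_data_ROC_{kind}.csv" for kind in ("ins", "del")]
    + [f"tile{i}.csv" for i in range(1, 6)]
    + [f"{kind}_dplddt_ddg.csv" for kind in ("ins", "del")]
    + [f"{kind}_esm1b.csv" for kind in ("ins", "del")]
    + ["rSASA.csv", "mtx_dist.csv", "del_sequence_alignment.csv"]
    + ["CPOP_primers_annealing.temp..xlsx", "SupplementalFile1.xlsx"]
)


def missing_files(repo_dir="."):
    """Return the expected file names that are not present in repo_dir.

    Hypothetical helper for illustration; not provided by the repository.
    """
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files(".")` from the root of a clone returns an empty list when every file listed in the overview is present.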

Processing of raw sequencing data

The functions.py file is used to call DHFR variants and calculate CPOP scores. It takes the raw FASTQ files as input and outputs a dataset with CPOP scores and standard deviations for DHFR indel, synonymous and nonsense variants.

Data analysis and plotting

The CPOP_data_analysis.R file is used to produce all plots in the main figures, and the CPOP_data_analysis_supplementary.R file is used to produce all plots in the supplementary figures. Both files take the dataset with CPOP scores and standard deviations as input.
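The column layout of the CPOP dataset is not described in this README, so it can help to inspect the header before running the analysis scripts. The following is a minimal sketch for doing that with the Python standard library. The helper name `peek_csv` is hypothetical, and the file is assumed to be comma-delimited with a header row.

```python
import csv


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) of a CSV file.

    Hypothetical helper for illustration. Assumes a comma-delimited
    file whose first row holds the column names.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file is empty
        # zip stops after n_rows without reading the rest of the file
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `peek_csv("cpop_data.csv")` returns the column names and the first three rows as lists of strings, without making any assumption about what the columns are called.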

Preprint

https://doi.org/10.1101/2024.07.11.603017

