This README describes the analysis in:
An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies
Set up the conda environment
conda env create -f Ab_epitope/environment.yml
Input data
- Influenza antibody dataset used in this paper: ./doc/HA_Abs_v18.xlsx
- The SARS-CoV-2 and HIV antibody datasets are from our previous paper: A large-scale systematic survey reveals recurring molecular features of public antibody responses to SARS-CoV-2
- All antibodies from NCBI: ./doc/all_paired_antibodies_from_GB_v6.xlsx (list of HA antibodies collected from GenBank)
- OAS human paired memory B cell sequences: Observed Antibody Space (OAS)
Extract CDR H3 sequences and references
python3 script/parse_Ab_table.py
- Input file:
- Output files:
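The input table may already annotate CDR H3, in which case parse_Ab_table.py only has to read it. When only a full heavy-chain variable (VH) amino-acid sequence is available, a common heuristic takes the residues between the conserved Cys that closes FR3 and the W-G-x-G motif that opens FR4. The minimal sketch below assumes that heuristic; the regular expression and the `extract_cdrh3` function are hypothetical and not taken from the repository.

```python
import re
from typing import Optional

# Hypothetical anchor-based heuristic: CDR H3 (IMGT definition, anchors excluded)
# lies between the conserved Cys closing FR3 (usually after Y-Y or Y-F)
# and the W-G-x-G motif that opens FR4.
CDRH3_PATTERN = re.compile(r"Y[YF]C([A-Z]+?)WG.G")


def extract_cdrh3(vh: str) -> Optional[str]:
    """Return the CDR H3 of a VH amino-acid sequence, or None if no anchors are found."""
    match = CDRH3_PATTERN.search(vh.upper())
    return match.group(1) if match else None


# Illustrative VH sequence, not from the dataset
vh = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKG"
    "RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDRGYSSGWYFDYWGQGTLVTVSS"
)
print(extract_cdrh3(vh))  # AKDRGYSSGWYFDY
```

Anchor-based extraction fails on sequences with non-canonical anchors, so an annotation tool that numbers the whole domain is more robust when available.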
Cluster CDR H3 sequences
python3 script/CDRH3_clustering_optimal.py
- Input file:
- Output file:
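The clustering criteria used by CDRH3_clustering_optimal.py are not described here. As an illustration only, the sketch below groups CDR H3 sequences of identical length by single-linkage clustering with a union-find structure, joining two sequences when their positional identity reaches a threshold. The 80% cutoff and the function names are assumptions.

```python
from itertools import combinations

# Hypothetical sketch: the 0.8 identity cutoff is an assumption,
# not the setting used by CDRH3_clustering_optimal.py.


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a) if a else 0.0


def cluster_cdrh3(seqs: list[str], min_identity: float = 0.8) -> list[list[str]]:
    """Single-linkage clustering of CDR H3 sequences; only equal-length pairs are compared."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if len(seqs[i]) == len(seqs[j]) and identity(seqs[i], seqs[j]) >= min_identity:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, list[str]] = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return sorted(sorted(members) for members in clusters.values())


print(cluster_cdrh3(["ARDYW", "ARDFW", "ARGGGGW", "ARGGGAW", "QQQ"]))
# [['ARDFW', 'ARDYW'], ['ARGGGAW', 'ARGGGGW'], ['QQQ']]
```

Single linkage can chain dissimilar sequences through intermediates; the all-pairs loop is also quadratic, so large datasets usually need bucketing by length first.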
Analyze CDR H3 clustering results
python3 script/analyze_CDRH3_cluster.py
- Input files:
- Output files:
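One way to analyze clusters is to tally the epitope labels (for example HA head versus stem) of the antibodies in each cluster and flag clusters dominated by a single epitope. The minimal sketch below assumes antibody-to-cluster and antibody-to-epitope mappings given as plain dictionaries; the size and purity cutoffs and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: min_size and min_fraction are assumptions.


def cluster_composition(cluster_of: dict[str, int], epitope_of: dict[str, str]) -> dict[int, Counter]:
    """Count the epitope labels of the antibodies assigned to each cluster."""
    composition: dict[int, Counter] = defaultdict(Counter)
    for antibody, cluster_id in cluster_of.items():
        composition[cluster_id][epitope_of[antibody]] += 1
    return dict(composition)


def dominant_clusters(composition: dict[int, Counter], min_size: int = 2,
                      min_fraction: float = 0.8) -> dict[int, str]:
    """Map cluster IDs to their majority epitope when that epitope is dominant enough."""
    dominant = {}
    for cluster_id, counts in composition.items():
        total = sum(counts.values())
        epitope, n = counts.most_common(1)[0]
        if total >= min_size and n / total >= min_fraction:
            dominant[cluster_id] = epitope
    return dominant


cluster_of = {"Ab1": 1, "Ab2": 1, "Ab3": 1, "Ab4": 2, "Ab5": 2, "Ab6": 3}
epitope_of = {"Ab1": "head", "Ab2": "head", "Ab3": "head",
              "Ab4": "stem", "Ab5": "head", "Ab6": "stem"}
print(dominant_clusters(cluster_composition(cluster_of, epitope_of)))  # {1: 'head'}
```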
Analyze CDR H3 properties
python3 script/analyze_CDRH3_property.py
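Which CDR H3 properties analyze_CDRH3_property.py computes is not listed here. Length, hydrophobicity and net charge are typical examples; the sketch below computes them using the Kyte-Doolittle hydropathy scale averaged per residue (often called GRAVY) and a crude charge count that ignores His and the termini. The property set and the `cdrh3_properties` function are assumptions.

```python
# Hypothetical sketch: the chosen properties are assumptions, not the script's output.
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cdrh3_properties(seq: str) -> dict[str, float]:
    """Length, mean hydropathy (GRAVY) and a crude net charge for one CDR H3."""
    seq = seq.upper()
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    # Crude charge: Lys/Arg +1, Asp/Glu -1; His and the termini are ignored
    charge = sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)
    return {"length": len(seq), "gravy": gravy, "net_charge": charge}


print(cdrh3_properties("ARDYW"))
```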
Create sequence logos for different CDR H3 clusters
python3 script/CDRH3_seqlogo.py
- Input file:
- Output files: ./CDRH3_seqlogo/*.png
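A sequence logo stacks, at each position, letters whose heights are their frequency times the information content of that column: log2(20) minus the column's Shannon entropy for amino acids. The sketch below computes these heights for equal-length CDR H3 sequences (for example the members of one cluster) without plotting them and without a small-sample correction; it is not necessarily how CDRH3_seqlogo.py builds its logos, and `logo_heights` is a hypothetical name.

```python
import math
from collections import Counter

# Sketch of standard information-content logo heights; plotting is omitted.


def logo_heights(seqs: list[str], alphabet_size: int = 20) -> list[dict[str, float]]:
    """Letter heights in bits for each column of equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    max_bits = math.log2(alphabet_size)
    heights = []
    for column in zip(*seqs):
        freqs = {aa: n / len(column) for aa, n in Counter(column).items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy  # information content of this column
        heights.append({aa: f * info for aa, f in freqs.items()})
    return heights


for position, column in enumerate(logo_heights(["ARDYW", "ARDFW", "AKDYW"]), start=1):
    print(position, {aa: round(h, 2) for aa, h in column.items()})
```

With few sequences per cluster, the uncorrected entropy underestimates variability, so small clusters look more conserved than they are.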
Plot CDR H3 properties for HA head and stem antibodies
Rscript script/plot_CDRH3_property.R
- Input file:
- Output files:
Assign clonotypes
python3 script/assign_clonotype.py
- Input files:
- Output file:
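The clonotype definition used by assign_clonotype.py is not stated here. A frequently used convention groups antibodies that share the same IGHV gene and CDR H3 length and have similar CDR H3 sequences. The greedy sketch below assumes that convention with a hypothetical 80% amino-acid identity cutoff against each clonotype's first member (its seed); the input is a list of (name, IGHV gene, CDR H3) tuples.

```python
# Hypothetical clonotype definition; the 0.8 cutoff is an assumption.


def assign_clonotypes(antibodies: list[tuple[str, str, str]],
                      min_identity: float = 0.8) -> dict[str, int]:
    """Greedy clonotype assignment from (name, IGHV gene, CDR H3) tuples.

    An antibody joins the first existing clonotype whose seed uses the same IGHV
    gene and has a CDR H3 of the same length with >= min_identity identity;
    otherwise it seeds a new clonotype.
    """
    seeds: list[tuple[str, str, int]] = []  # (IGHV gene, seed CDR H3, clonotype ID)
    assignment: dict[str, int] = {}
    for name, ighv, cdrh3 in antibodies:
        for seed_v, seed_cdrh3, clonotype_id in seeds:
            if seed_v == ighv and len(seed_cdrh3) == len(cdrh3):
                same = sum(a == b for a, b in zip(seed_cdrh3, cdrh3))
                if same / len(cdrh3) >= min_identity:
                    assignment[name] = clonotype_id
                    break
        else:  # no matching seed: start a new clonotype
            clonotype_id = len(seeds) + 1
            seeds.append((ighv, cdrh3, clonotype_id))
            assignment[name] = clonotype_id
    return assignment


antibodies = [
    ("Ab1", "IGHV1-69", "ARDYW"),
    ("Ab2", "IGHV1-69", "ARDFW"),
    ("Ab3", "IGHV3-23", "ARDYW"),
    ("Ab4", "IGHV1-69", "ARGGGGW"),
]
print(assign_clonotypes(antibodies))  # {'Ab1': 1, 'Ab2': 1, 'Ab3': 2, 'Ab4': 3}
```

Because each antibody is compared only against seeds, the result can depend on input order; clustering all pairs within each IGHV/length group avoids that at a higher cost.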
Compute germline usage and extract public clonotypes
python3 script/extract_public_clonotype_VDJ.py
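A public clonotype is commonly defined as one observed in more than one donor or study. Assuming each antibody already carries a clonotype ID and a source identifier (what counts as a source here is an assumption), the sketch below keeps clonotypes seen in at least two distinct sources; `public_clonotypes` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical: a "source" may be a donor, a study, or a reference.


def public_clonotypes(records: list[tuple[int, str]], min_sources: int = 2) -> list[int]:
    """Clonotype IDs observed in at least `min_sources` distinct sources."""
    sources: dict[int, set[str]] = defaultdict(set)
    for clonotype_id, source in records:
        sources[clonotype_id].add(source)
    return sorted(c for c, seen in sources.items() if len(seen) >= min_sources)


records = [(1, "donorA"), (1, "donorB"), (2, "donorA"), (2, "donorA"), (3, "donorC")]
print(public_clonotypes(records))  # [1]
```

Counting distinct sources rather than sequences keeps a clonotype that was sampled repeatedly from one donor (clonotype 2 above) from being called public.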
Extract IGHD4-17-encoded head antibodies
python3 script/analyze_IGHD4-17.py
- Input file:
- Output file:
Analyze the occurrence of the YGD motif in CDR H3
python3 script/analyze_YGD_motif.py
- Input files:
- Output file:
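Counting motif occurrence reduces to a substring test on each CDR H3. The sketch below reports, per group (for example HA head and stem antibodies), the fraction of CDR H3 sequences that contain YGD; the grouping input and the `motif_frequency` function are assumptions, not the interface of analyze_YGD_motif.py.

```python
# Hypothetical grouping input: {group name: list of CDR H3 sequences}


def motif_frequency(cdrh3_by_group: dict[str, list[str]], motif: str = "YGD") -> dict[str, float]:
    """Fraction of CDR H3 sequences in each non-empty group that contain `motif`."""
    return {
        group: sum(motif in seq for seq in seqs) / len(seqs)
        for group, seqs in cdrh3_by_group.items()
        if seqs
    }


groups = {"head": ["ARYGDYW", "ARDDW"], "stem": ["ARGGW"]}
print(motif_frequency(groups))  # {'head': 0.5, 'stem': 0.0}
```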
Plot VDJ gene usage
Rscript script/plot_VDJgene_freq.R
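plot_VDJgene_freq.R draws the gene usage plot; the frequency table behind such a plot can be sketched as follows, assuming gene calls with allele suffixes such as IGHV1-69*01 that are collapsed to the gene level before counting. The grouping and the `gene_usage` function are hypothetical.

```python
from collections import Counter

# Hypothetical sketch of a per-group gene usage table (percentages).


def gene_usage(calls_by_group: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Percentage of antibodies using each gene, computed within each group."""
    usage = {}
    for group, calls in calls_by_group.items():
        if not calls:
            continue
        # Collapse allele calls such as IGHV1-69*01 to the gene name IGHV1-69
        counts = Counter(call.split("*")[0] for call in calls)
        usage[group] = {gene: 100 * n / len(calls) for gene, n in counts.items()}
    return usage


calls = {"head": ["IGHV1-69*01", "IGHV1-69*02", "IGHV3-23*01", "IGHV4-39*01"]}
print(gene_usage(calls))
# {'head': {'IGHV1-69': 50.0, 'IGHV3-23': 25.0, 'IGHV4-39': 25.0}}
```

Normalizing within each group makes groups of different sizes (for example head and stem antibodies) directly comparable.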
Plot IGHV/IGK(L)V pairing frequency
Rscript script/plot_Vpair_heatmap.R
- Input files:
- Output file:
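A pairing heatmap is a matrix of counts of each heavy-chain V gene with each light-chain V gene. The sketch below builds that matrix from (IGHV, IGK(L)V) pairs in plain Python; rendering it is left to plot_Vpair_heatmap.R, and the `pairing_matrix` function is hypothetical.

```python
from collections import Counter

# Hypothetical sketch; plotting is omitted.


def pairing_matrix(pairs: list[tuple[str, str]]) -> tuple[list[str], list[str], list[list[int]]]:
    """Count matrix of heavy-chain V genes (rows) by light-chain V genes (columns)."""
    counts = Counter(pairs)
    heavy = sorted({hv for hv, _ in counts})
    light = sorted({lv for _, lv in counts})
    # Counter returns 0 for pairs that never occur
    matrix = [[counts[(hv, lv)] for lv in light] for hv in heavy]
    return heavy, light, matrix


pairs = [("IGHV1-69", "IGKV3-20"), ("IGHV1-69", "IGKV3-20"),
         ("IGHV1-69", "IGLV1-40"), ("IGHV3-23", "IGKV3-20")]
heavy, light, matrix = pairing_matrix(pairs)
print(light)   # ['IGKV3-20', 'IGLV1-40']
print(matrix)  # [[2, 1], [1, 0]]
```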
Plot frequency of YGD motif
Rscript script/plot_YGD_freq.R
- Input file:
- Output file:
For the language model for antibody specificity prediction, see Ab_epitope.