PCASA

PCASA (Prioritization of Combinatorial Cancer-Associated Surface Antigens) is a tool for predicting the best gene combination targets of surface antigens by classification of malignant and non-malignant cells using single-cell RNA-seq data.

Requirements

Required packages for RF (Random forest) in R.

R (>= 3.6.0): https://cloud.r-project.org/bin/linux/ubuntu/
ROCR (>= 1.0-11), caret (>= 6.0-88), e1071 (>= 1.7-8), ggplot2 (>= 3.3.5), gplots (>=3.1.1), randomForest (>= 4.6-14), rpart (>= 4.1-15)

Required modules for CNN (Convolutional neural network) in Python.

Python (>= 3.6.9): https://www.python.org/downloads/source/
pip3 (>= 21.2.4), numpy (>= 1.19.5), matplotlib (>= 3.3.4), pandas (>= 1.1.5), scikit-learn (>= 0.24.1), keras (>= 2.6.0), tensorflow (>= 2.6.0), tf-keras-vis (>= 0.8.0)

Running codes

Overview

PCASA is a three-step program.

Step 1. Prioritization of the single genes well-classifying tumor and normal cells.
Step 2. Calculation of the expressing cell fraction (ECF) for each gene combination.
Step 3. Prioritization of the gene combinations well-classifying tumor and normal cells.

Input files

The first command for each step requires input data.

For step 1, a tab-delimited file containing cell-by-gene sparse matrix composed of log-transformed counts with a column showing the binary class (Tumor, 1; Normal, 0).
For steps 2 & 3, a two-column tab-delimited file containing cell-code and cell-type.
For steps 2 & 3, a tab-delimited file containing gene-by-cell sparse matrix composed of log-transformed counts.

Step1. RF - Cell classifier for single genes

cd code

Rscript step-1a__random_forest.R ../data/input-1__scrna_class.txt
# 'input-1__scrna_class.txt': Cell-by-Gene-Matrix-with-Class

python step-1b__random_forest.py

Step2. Expression logic evaluator for ECF

python step-2__gate_coverage_calc.py \
../data/input-2a__scrna_annotation.txt \
../data/input-2b__scrna_gc-matrix.txt
# 'input-2a__scrna_annotation.txt' : Cell-Type-Annotation
# 'input-2b__scrna_gc-matrix.txt' : Gene-by-Cell-sparse-Matrix

Step3. CNN - Cell classifier for gene combinations

python step-3a__cnn_gradcam.py \
../data/input-2a__scrna_annotation.txt \
../data/input-2b__scrna_gc-matrix.txt
# 'input-2a__scrna_annotation.txt' : Cell-Type-Annotation
# 'input-2b__scrna_gc-matrix.txt' : Gene-by-Cell-sparse-Matrix

python step-3b__cnn_gradcam.py
python step-3c__cnn_gradcam.py

Pre-process

The directory named 'preprocess' contains the codes for pre-processing.

Initial filtering, normalization, and dimensional reduction by Seurat
Reference-based cell-type annotation by SingleR
Aneuploid cell prediction by CopyKat
Volume-dependent sub-sampling by Geosketch
Batch correction by BBKNN
Geosketch and BBKNN codes are based on the module named 'scjp' that Dr.Park made manually.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
code		code
data		data
preprocess		preprocess
results		results
LICENSE		LICENSE
README.md		README.md
run_all_codes.sh		run_all_codes.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PCASA

PCASA (Prioritization of Combinatorial Cancer-Associated Surface Antigens) is a tool for predicting the best gene combination targets of surface antigens by classification of malignant and non-malignant cells using single-cell RNA-seq data.

Requirements

Running codes

Overview

Input files

Step1. RF - Cell classifier for single genes

Step2. Expression logic evaluator for ECF

Step3. CNN - Cell classifier for gene combinations

Pre-process

About

Releases 1

Packages

Languages

License

kaistomics/PCASA

Folders and files

Latest commit

History

Repository files navigation

PCASA

PCASA (Prioritization of Combinatorial Cancer-Associated Surface Antigens) is a tool for predicting the best gene combination targets of surface antigens by classification of malignant and non-malignant cells using single-cell RNA-seq data.

Requirements

Running codes

Overview

Input files

Step1. RF - Cell classifier for single genes

Step2. Expression logic evaluator for ECF

Step3. CNN - Cell classifier for gene combinations

Pre-process

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages