scGenAI is a Python package for single-cell RNA sequencing (scRNA-seq) data prediction and analysis using large language models (LLMs).

scGenAI

Author: Ruijia Wang
Date: 2024-10-14
Version: 1.0.0

About scGenAI


scGenAI is a Python package for single-cell RNA sequencing (scRNA-seq) data prediction and analysis using large language models (LLMs). The package allows users to train, fine-tune, and make predictions on single-cell data using transformer-based models, including custom versions of LLaMA, GPT, BigBird, and scGenT. It provides multi-GPU support with PyTorch DistributedDataParallel (DDP).

I. Installation


To install the package, use the following steps:

  1. Optional: create a conda environment for scGenAI:

    conda create -n scGenAI python==3.10
    conda activate scGenAI
  2. Clone the repository:

    git clone https://github.com/VOR-Quantitative-Biology/scGenAI.git
  3. Navigate to the project directory:

    cd scGenAI
  4. Install dependencies, then install scGenAI:

    pip install -r requirements.txt
    pip install .
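
After the steps above, it can be useful to confirm that the package and its command line entry point are visible in the active environment. The following is a minimal sketch using only the Python standard library. It assumes the installed distribution is named `scGenAI` and the CLI executable is named `scgenai`; the function `check_install` is a hypothetical helper for illustration, not part of scGenAI.

```python
import shutil
from importlib import metadata


def check_install(dist_name="scGenAI", cli_name="scgenai"):
    """Report the installed package version and CLI location, if any.

    Both default names are assumptions; adjust them if the distribution
    or executable is registered under a different name.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # no distribution installed under this name
    # shutil.which returns None when the executable is not on PATH
    return {"package_version": version, "cli_path": shutil.which(cli_name)}


if __name__ == "__main__":
    print(check_install())
```

If either value is `None`, the most likely cause is that the conda environment is not activated or that `pip install .` was run in a different environment.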

II. Quick Start/Tutorials


Once installed, scGenAI can be used either (1) from a Python IDE or notebook or (2) through the command line interface (CLI). In both cases, you train, predict, or fine-tune a model by calling scGenAI with a YAML configuration file that contains your settings.

Option 1. Use scGenAI through a notebook or Python script

As a quick start, we highly recommend beginning with the following tutorials, which use a small test dataset (40 cells) and the config template files that match your training or prediction purpose:

| Testing Case | Tutorial | Training/Finetune Config Template | Prediction Config Template |
|---|---|---|---|
| Modeling healthy cell type | TrainData_Tutorial | config_Train | config_Prediction |
| Modeling disease/cancer cell type | TrainData_Tutorial | config_Train or config_Train | config_Prediction |
| Modeling cell genotype | TrainData_Tutorial | config_Train | config_Prediction |
| Modeling using multiomics data | TrainData_Tutorial | config_Train | config_Prediction |
| Fine-tune using pretrained model | FinetuneData_Tutorial | config_Train | config_Prediction |

In addition to the test data, we also provide full-size datasets and config files for each training or prediction purpose.

| Study | Tutorial | Training/Finetune Config | Prediction Config |
|---|---|---|---|
| Mouse eye cell types (gene list context) | TrainData_Tutorial | config_Train | config_Prediction |
| Myeloma myeloid cell types (genomic context) | TrainData_Tutorial | config_Train | config_Prediction |
| AML cell types and cell status (biofunction context) | TrainData_Tutorial | config_Train | config_Prediction |
| PBMC CITE-Seq | TrainData_Tutorial | config_Train | config_prediction |
| Fine-tune of bone marrow cell types | FinetuneData_Tutorial | config_Finetune | config_Prediction |

Option 2. Use scGenAI through the Command Line Interface

The CLI supports the following commands:

  • Train a model:

    scgenai train --config_file <path_to_config.yaml>
  • Make predictions:

    scgenai predict --config_file <path_to_config.yaml>
  • Fine-tune a pre-trained model:

    scgenai finetune --config_file <path_to_config.yaml>
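
The three commands above share one shape: a subcommand followed by `--config_file` and a path. For scripted runs (for example, training and then predicting in one pipeline), that shape can be wrapped in a small helper. This is a sketch under stated assumptions, not part of scGenAI: `build_command` and `run_scgenai` are hypothetical names, and the code only assumes that the `scgenai` executable is on `PATH` and accepts the subcommands and flag shown above.

```python
import subprocess
from pathlib import Path

# Subcommands listed in this README
MODES = ("train", "predict", "finetune")


def build_command(mode, config_file):
    """Return the argument list for `scgenai <mode> --config_file <path>`."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return ["scgenai", mode, "--config_file", str(config_file)]


def run_scgenai(mode, config_file):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    if not Path(config_file).is_file():
        # Fail early with a clear message instead of launching the CLI
        raise FileNotFoundError(f"config file not found: {config_file}")
    return subprocess.run(build_command(mode, config_file), check=True)
```

For example, `run_scgenai("train", "my_config.yaml")` would launch training, where `my_config.yaml` is a placeholder for your own copy of one of the config templates.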

III. Documentation


Please see the full documentation for detailed usage of scGenAI.

IV. License Agreement


  • The use of scGenAI is governed by a custom license that permits non-commercial use only (see the LICENSE file). This package is freely available to individuals, universities, non-profit organizations, educational institutions, and government entities for non-commercial research or journalistic purposes.

  • By cloning or downloading this repository, you acknowledge that you have read and understood the terms outlined in the LICENSE file and agree to abide by them.
