Output Files

This document explains the structure of the output files and folders generated by scGenAI during training, fine-tuning, and prediction.

Train and Fine-tune Output

The Train and Finetune modes produce the same output folder and files, listed below:

├── best_model
│   ├── config.json
│   ├── expression_vocab.npy
│   ├── gene_vocab.npy
│   ├── label_encoder_classes.npy
│   ├── pad_token_id.npy
│   ├── scGenAI_model.pt
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── train_setting.yaml
│   └── trained_genes.npy
├── last_model
│   ├── config.json
│   ├── expression_vocab.npy
│   ├── gene_vocab.npy
│   ├── label_encoder_classes.npy
│   ├── pad_token_id.npy
│   ├── scGenAI_model.pt
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── train_setting.yaml
│   └── trained_genes.npy
├── combined_epoch_results.csv
└── train_summary.pdf
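
As a quick sanity check after a run, the layout above can be verified programmatically. The following is a minimal sketch using only the Python standard library; the helper names `expected_outputs` and `missing_outputs` are hypothetical and not part of scGenAI, and the file names are copied from the tree above.

```python
from pathlib import Path

# File names copied from the directory tree above.
MODEL_FILES = [
    "config.json",
    "expression_vocab.npy",
    "gene_vocab.npy",
    "label_encoder_classes.npy",
    "pad_token_id.npy",
    "scGenAI_model.pt",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "train_setting.yaml",
    "trained_genes.npy",
]
TOP_LEVEL_FILES = ["combined_epoch_results.csv", "train_summary.pdf"]


def expected_outputs():
    """Return every relative path shown in the directory tree."""
    paths = [
        f"{folder}/{name}"
        for folder in ("best_model", "last_model")
        for name in MODEL_FILES
    ]
    return paths + TOP_LEVEL_FILES


def missing_outputs(output_dir):
    """Return the expected relative paths that are absent under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected_outputs() if not (root / rel).is_file()]
```

Calling `missing_outputs("path/to/output_dir")` (a placeholder path) returns an empty list when every file in the tree is present.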

`combined_epoch_results.csv`

This file contains the combined results of all training/finetune epochs, including metrics such as loss and accuracy over time.
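
The exact column names in this file are not listed here, so a safe first step is to read the header rather than assume it. The sketch below uses the standard `csv` module; `load_epoch_results` and `column_names` are hypothetical helper names, not scGenAI functions.

```python
import csv


def load_epoch_results(path):
    """Read the per-epoch results CSV into a list of dicts, one per row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def column_names(rows):
    """Return the column names found in the file, or [] if it has no rows."""
    return list(rows[0].keys()) if rows else []


# Example usage (the path is a placeholder for your own output folder):
# rows = load_epoch_results("path/to/output_dir/combined_epoch_results.csv")
# print(column_names(rows))
```

All values are returned as strings, so numeric columns need an explicit `float(...)` conversion before plotting or comparison.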

`train_summary.pdf`

This PDF file provides a summary of the training/finetune process, including visualizations such as loss curves and accuracy metrics over time.

`/best_model/`

This folder contains the best-performing model checkpoint, together with its configuration and tokenization data, after training/finetune. The files include:

config.json: Contains the model's configuration details.
expression_vocab.npy: The expression vocabulary file used during model training/finetune.
gene_vocab.npy: The gene vocabulary file used during model training/finetune.
label_encoder_classes.npy: The label classes (e.g., cell types) used by the label encoder during training/finetune.
pad_token_id.npy: The padding token ID used during tokenization.
scGenAI_model.pt: The PyTorch model file containing the trained weights.
special_tokens_map.json: A mapping of special tokens used by the tokenizer.
tokenizer.json: The tokenizer configuration used during training/finetune.
tokenizer_config.json: Detailed configuration of the tokenizer.
train_setting.yaml: The YAML configuration file used during training/finetune.
trained_genes.npy: The list of genes the model was trained on.

`/last_model/`

This folder contains the model's checkpoint from the final epoch. It holds the same set of files as the best_model folder, including the configuration, vocabularies, and tokenizer settings.

Prediction Output

The prediction output is a CSV file, as defined in the configuration file. It contains the original metadata extracted from the input prediction file (obs slot) along with three additional prediction columns: context_id, PredictedFeature, and prediction_score.

context_id represents the context used to determine the prediction for the corresponding cell.
PredictedFeature is the final predicted feature for the cell using the trained model.
prediction_score indicates the confidence level of the prediction, with a maximum value of 1.
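
The three prediction columns described above can be inspected with a few lines of Python. This is a minimal sketch using the standard library; the helper names and the 0.5 threshold are illustrative assumptions, not scGenAI defaults, while the column names `PredictedFeature` and `prediction_score` are taken from the description above.

```python
import csv
from collections import Counter


def load_predictions(path):
    """Read the prediction CSV into a list of dicts, one per cell."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def low_confidence(rows, threshold=0.5):
    """Return rows whose prediction_score is below threshold.

    The default threshold of 0.5 is an arbitrary illustrative value.
    """
    return [row for row in rows if float(row["prediction_score"]) < threshold]


def feature_counts(rows):
    """Count how many cells were assigned to each PredictedFeature."""
    return Counter(row["PredictedFeature"] for row in rows)
```

Rows returned by `low_confidence` keep all original metadata columns, so flagged cells can be traced back to the input file.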
