Output Files

This document explains the structure of the output files and folders generated by scGenAI during training, fine-tuning, and prediction.

Train and Fine-tune Output

The Train and Finetune modes produce the same output folder and files, listed below:

├── best_model
│   ├── config.json
│   ├── expression_vocab.npy
│   ├── gene_vocab.npy
│   ├── label_encoder_classes.npy
│   ├── pad_token_id.npy
│   ├── scGenAI_model.pt
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── train_setting.yaml
│   └── trained_genes.npy
├── last_model
│   ├── config.json
│   ├── expression_vocab.npy
│   ├── gene_vocab.npy
│   ├── label_encoder_classes.npy
│   ├── pad_token_id.npy
│   ├── scGenAI_model.pt
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── train_setting.yaml
│   └── trained_genes.npy
├── combined_epoch_results.csv
└── train_summary.pdf

combined_epoch_results.csv

This file contains the combined results of all training/finetune epochs, including metrics such as loss and accuracy over time.
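As a rough illustration of how the per-epoch results might be inspected, the sketch below reads the CSV with Python's standard csv module and returns the row with the lowest loss. The column name `loss` and the helper name `best_epoch_row` are assumptions made for this example, not part of scGenAI; check the header of your own file and adjust accordingly.

```python
import csv
from pathlib import Path


def best_epoch_row(csv_path, loss_column="loss"):
    """Return the row (as a dict) with the lowest value in `loss_column`.

    `loss_column` is a hypothetical column name; change it to match the
    actual header of combined_epoch_results.csv.
    """
    with Path(csv_path).open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"{csv_path} contains no data rows")
    if loss_column not in rows[0]:
        raise KeyError(
            f"column {loss_column!r} not found; available: {list(rows[0])}"
        )
    # csv.DictReader yields strings, so convert before comparing.
    return min(rows, key=lambda row: float(row[loss_column]))
```

Calling `best_epoch_row("combined_epoch_results.csv")` from inside the output folder would return a dictionary of that epoch's recorded values, all as strings.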

train_summary.pdf

This PDF file provides a summary of the training/finetune process, including visualizations such as loss curves and accuracy metrics over time.

/best_model/

This folder contains the checkpoint of the best-performing model, together with the configuration and tokenization data saved after training/finetune. The files include:

  • config.json: Contains the model's configuration details.
  • expression_vocab.npy: The expression vocabulary file used during model training/finetune.
  • gene_vocab.npy: The gene vocabulary file used during model training/finetune.
  • label_encoder_classes.npy: Stores the label classes (e.g., cell types) used for training/finetune.
  • pad_token_id.npy: The padding token ID used during tokenization.
  • scGenAI_model.pt: The PyTorch model file containing the trained weights.
  • special_tokens_map.json: A mapping of special tokens used by the tokenizer.
  • tokenizer.json: The tokenizer configuration used during training/finetune.
  • tokenizer_config.json: Detailed configuration of the tokenizer.
  • train_setting.yaml: The YAML configuration file used during training/finetune.
  • trained_genes.npy: The list of genes the model was trained on.
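Before reusing a model folder, it can be useful to confirm that every file listed above is present. The following is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical and not part of scGenAI, and the file names are copied from the listing above.

```python
from pathlib import Path

# File names copied from the best_model listing in this document.
EXPECTED_FILES = (
    "config.json",
    "expression_vocab.npy",
    "gene_vocab.npy",
    "label_encoder_classes.npy",
    "pad_token_id.npy",
    "scGenAI_model.pt",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "train_setting.yaml",
    "trained_genes.npy",
)


def missing_model_files(model_dir):
    """Return the sorted names of expected files absent from `model_dir`."""
    model_dir = Path(model_dir)
    return sorted(
        name for name in EXPECTED_FILES if not (model_dir / name).is_file()
    )
```

An empty list means the folder matches the layout described here; any returned names point to files that were not written or have been moved.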

/last_model/

This folder contains the model's last checkpoint, saved after the final epoch. Its file layout is the same as that of the best_model folder, including the configuration, vocabularies, and tokenizer settings.


Prediction Output

The prediction output is a CSV file, with its name and location as defined in the configuration file. It contains the original metadata extracted from the obs slot of the input prediction file, along with three additional prediction columns: context_id, PredictedFeature, and prediction_score.

  • context_id represents the context used to determine the prediction for the corresponding cell.
  • PredictedFeature is the final predicted feature for the cell using the trained model.
  • prediction_score indicates the confidence level of the prediction, with a maximum value of 1.
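One common way to use these columns is to flag cells whose prediction_score falls below a chosen cutoff for manual review. The sketch below does this with the standard csv module; the helper name `low_confidence_cells` and the default threshold of 0.5 are illustrative assumptions, not values recommended by scGenAI.

```python
import csv
from pathlib import Path


def low_confidence_cells(prediction_csv, threshold=0.5):
    """Return the rows whose prediction_score is below `threshold`.

    The threshold default is an arbitrary example value. Each row is a
    dict holding the obs metadata plus context_id, PredictedFeature and
    prediction_score, all as strings.
    """
    with Path(prediction_csv).open(newline="") as handle:
        reader = csv.DictReader(handle)
        return [
            row for row in reader
            if float(row["prediction_score"]) < threshold
        ]
```

The returned rows keep every original column, so the flagged cells can be matched back to the input metadata.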