uHAF is a unified hierarchical annotation framework for cell type standardization and harmonization

SuperBianC/uhaf

uHAF: a unified hierarchical annotation framework for cell type standardization and harmonization

uHAF is a Python library that addresses inconsistent cell type annotations in single-cell transcriptomics, such as varied naming conventions and differing hierarchical granularity. It combines organ-specific hierarchical cell type trees (uHAF-T) with a mapping tool powered by large language models (uHAF-GPT) to provide a standardized annotation framework. uHAF supports consistent label unification, hierarchical analysis, and integration of diverse datasets, which benefits machine learning applications and biologically meaningful evaluation. It is intended as a shared resource for the single-cell research community, open to collaborative refinement and standardization of cell type annotations.

Explore Online

Installation

Install uHAF via pip:

pip install uhaf

Getting Started

Building uHAF

Start by building a uHAF object:

import uhaf as uhaflib

uhaf = uhaflib.build_uhaf(latest=True)
print(len(uhaf.df_uhafs))

This creates a uHAF instance containing the annotation trees for all organs. The example above initializes the uHAF 2.2.0 dataset.

Tracing Cell Types

Trace the hierarchical ancestry of a target cell type:

ancestors = uhaf.track_cell_from_uHAF(sheet_name='Lung', cell_type_target='CD8 T cell')
print(ancestors)

Output:

['Cell', 'Lymphocyte', 'T cell', 'CD8 T cell']
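
To illustrate what ancestry tracing does conceptually, here is a minimal sketch on a small, hypothetical tree. The `TOY_TREE` structure and the `trace_ancestors` helper are assumptions made for illustration only; they are not part of the uHAF API and do not reflect how the library stores or searches its trees.

```python
from typing import Optional

# Hypothetical toy tree; the real uHAF-T trees are larger and may be stored differently.
TOY_TREE = {
    "name": "Cell",
    "children": [
        {
            "name": "Lymphocyte",
            "children": [
                {
                    "name": "T cell",
                    "children": [
                        {"name": "CD8 T cell", "children": []},
                        {"name": "CD4 T cell", "children": []},
                    ],
                },
                {"name": "B cell", "children": []},
            ],
        },
    ],
}


def trace_ancestors(node: dict, target: str) -> Optional[list]:
    """Return the names on the path from the root to `target`, or None if absent."""
    if node["name"] == target:
        return [node["name"]]
    for child in node.get("children", []):
        path = trace_ancestors(child, target)
        if path is not None:
            # Prepend the current node once the target is found in a subtree.
            return [node["name"]] + path
    return None


print(trace_ancestors(TOY_TREE, "CD8 T cell"))
# ['Cell', 'Lymphocyte', 'T cell', 'CD8 T cell']
```

A depth-first search like this returns the first matching path, which is enough when each cell type name appears only once in the tree.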

Annotation Levels

Map cell types to a chosen level of the hierarchy (main, middle, or fine). The level is passed as an integer; the example below uses 2 for the middle level.

example_cell_types = ['Pericyte', 'Macrophage', 'Monocyte-derived macrophage', 'Monocyte', 'Dendritic cell']
annotation_level = 2  # Middle cell type level
annotations = uhaf.set_annotation_level(example_cell_types, sheet_name='Heart', annotation_level=annotation_level)
print(annotations)

Example Output:

{'Pericyte': 'Pericyte', 'Macrophage': 'Macrophage', 'Monocyte-derived macrophage': 'Macrophage', 'Monocyte': 'Monocyte', 'Dendritic cell': 'Dendritic cell'}
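
Once you have such a mapping, one possible use is to collapse per-cell labels to the chosen level and count them. The sketch below copies the dictionary from the example output above; the `cell_labels` list is hypothetical and stands in for a per-cell annotation column from your own dataset.

```python
from collections import Counter

# Mapping copied from the example output above (Heart, annotation_level=2).
annotations = {
    'Pericyte': 'Pericyte',
    'Macrophage': 'Macrophage',
    'Monocyte-derived macrophage': 'Macrophage',
    'Monocyte': 'Monocyte',
    'Dendritic cell': 'Dendritic cell',
}

# Hypothetical per-cell labels, one entry per cell.
cell_labels = [
    'Macrophage',
    'Monocyte-derived macrophage',
    'Pericyte',
    'Monocyte-derived macrophage',
    'Monocyte',
]

# Replace each fine-grained label with its label at the chosen level.
harmonized = [annotations[label] for label in cell_labels]
counts = Counter(harmonized)
print(counts)
```

Here the two fine-grained macrophage labels are merged into a single 'Macrophage' group, so datasets annotated at different granularities can be compared at the same level.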

Mapping Custom Labels

To map custom cell type labels to uHAF:

  1. Prepare unique cell type labels from your dataset:

    original_labels = ['V-CM', 'LA-CM', 'RA-CM', 'Capillary-EC', 'Lymphatic-EC']
  2. Generate uHAF-GPT prompts:

    print(uhaf.generate_uhaf_GPTs_prompts('Heart', original_labels))

    Copy the output and use it on the uHAF-GPT Mapping Website to get the mapped labels.

  3. Use the mapping dictionary to transform your labels:

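    # Mapping returned by uHAF-GPT (truncated here; it should cover every label in original_labels)
    mapping_results = {"V-CM": "Ventricle cardiomyocyte cell", "LA-CM": "Atrial cardiomyocyte cell"}
    # Keep the original label when no mapping is available, instead of raising KeyError
    transformed_labels = [mapping_results.get(label, label) for label in original_labels]
    print(transformed_labels)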
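
Before transforming labels, it can help to check which labels the mapping does not cover. The sketch below reuses the label list and the two-entry mapping from the steps above; `find_unmapped` is a hypothetical helper, not a uHAF function.

```python
original_labels = ['V-CM', 'LA-CM', 'RA-CM', 'Capillary-EC', 'Lymphatic-EC']
mapping_results = {"V-CM": "Ventricle cardiomyocyte cell", "LA-CM": "Atrial cardiomyocyte cell"}


def find_unmapped(labels, mapping):
    """Return the labels that have no entry in the mapping, in input order."""
    return [label for label in labels if label not in mapping]


missing = find_unmapped(original_labels, mapping_results)
print(missing)
# ['RA-CM', 'Capillary-EC', 'Lymphatic-EC']
```

If the list is non-empty, the remaining labels can be sent through uHAF-GPT again or mapped by hand before the transformation step.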

Generating Nested JSON

Export the uHAF tree for a specific organ in nested JSON format:

print(uhaf.dict_uhafs['Heart'])
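
Note that `print` on a Python dictionary shows its Python representation rather than strict JSON. If you need a JSON string, the standard `json` module can serialize a nested dictionary. The sketch below uses a small hypothetical stand-in for `uhaf.dict_uhafs['Heart']` and assumes the real object contains only JSON-serializable types (dicts, lists, strings, numbers); its actual structure may differ.

```python
import json

# Hypothetical stand-in for uhaf.dict_uhafs['Heart']; the real structure may differ.
heart_tree = {
    "name": "Cell",
    "children": [
        {"name": "Cardiomyocyte cell", "children": []},
    ],
}

# Serialize to an indented JSON string; ensure_ascii=False keeps any non-ASCII names readable.
json_text = json.dumps(heart_tree, indent=2, ensure_ascii=False)
print(json_text)
```

To write the result to disk, `json.dump(heart_tree, file_handle, indent=2)` works the same way with an open file.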

Contribution

We welcome contributions to improve and expand the uHAF library. For more details, please refer to our contribution guidelines.

License

This project is licensed under the MIT License. See the LICENSE file for details.
