uHAF is a Python library that addresses inconsistent cell type annotations in single-cell transcriptomics, such as varied naming conventions and differing levels of hierarchical granularity. It combines organ-specific hierarchical cell type trees (uHAF-Ts) with a mapping tool powered by large language models (uHAF-GPT) to provide a standardized annotation framework. By supporting consistent label unification, hierarchical analysis, and integration of diverse datasets, uHAF makes annotated data easier to use in machine learning applications and enables biologically meaningful evaluations. The library is intended as a shared resource for the single-cell research community, supporting collaborative refinement and standardization of cell type annotations.
- uHAF-T Explorer: Browse and explore uHAF-Ts.
- uHAF-GPT Mapping: Map custom cell type labels to uHAF-Ts.
Install uHAF via pip:
pip install uhaf
Start by building a uHAF object for your dataset:
import uhaf as uhaflib
uhaf = uhaflib.build_uhaf(latest=True)
print(len(uhaf.df_uhafs))
This generates a uHAF instance containing annotations for all organs. The example above initializes the uHAF2.2.0 dataset.
Trace the hierarchical ancestry of a target cell type:
ancestors = uhaf.track_cell_from_uHAF(sheet_name='Lung', cell_type_target='CD8 T cell')
print(ancestors)
Output:
['Cell', 'Lymphocyte', 'T cell', 'CD8 T cell']
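To illustrate what tracing ancestry means, here is a minimal standalone sketch that walks a toy tree and returns the root-to-target path. It does not use the uhaf package: the nested-dict layout, `TOY_TREE`, and `trace_ancestors` are assumptions made for illustration and may not match how uHAF stores or searches its trees.

```python
from typing import Optional

# Toy tree: each key is a cell type, each value holds its children.
# This nested-dict layout is an assumption for illustration only.
TOY_TREE = {
    "Cell": {
        "Lymphocyte": {
            "T cell": {"CD8 T cell": {}, "CD4 T cell": {}},
            "B cell": {},
        },
        "Myeloid cell": {"Macrophage": {}},
    }
}


def trace_ancestors(tree: dict, target: str) -> Optional[list]:
    """Return the path from the root to `target`, or None if it is absent."""
    for name, children in tree.items():
        if name == target:
            return [name]
        # Depth-first search in this node's subtree.
        sub_path = trace_ancestors(children, target)
        if sub_path is not None:
            return [name] + sub_path
    return None


print(trace_ancestors(TOY_TREE, "CD8 T cell"))
# ['Cell', 'Lymphocyte', 'T cell', 'CD8 T cell']
```

A label that is not in the tree yields `None`, which is a convenient signal that the label should be mapped to a uHAF term first.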
Map cell types to a chosen annotation level (main, middle, or fine). The level is passed as an integer through annotation_level; the example below uses 2 for the middle level.
example_cell_types = ['Pericyte', 'Macrophage', 'Monocyte-derived macrophage', 'Monocyte', 'Dendritic cell']
annotation_level = 2 # Middle cell type level
annotations = uhaf.set_annotation_level(example_cell_types, sheet_name='Heart', annotation_level=annotation_level)
print(annotations)
Example Output:
{'Pericyte': 'Pericyte', 'Macrophage': 'Macrophage', 'Monocyte-derived macrophage': 'Macrophage', 'Monocyte': 'Monocyte', 'Dendritic cell': 'Dendritic cell'}
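One way to think about annotation levels is as a depth along each cell type's ancestry path. The sketch below is a standalone illustration of that idea, not the library's implementation: the paths in `TOY_PATHS` are hypothetical, and treating the level as an index into the path (with index 0 as the root) is an assumption.

```python
def collapse_to_level(path: list, level: int) -> str:
    """Pick the ancestor at depth `level`; path[0] is the root.

    If the cell type is shallower than the requested level,
    the cell type itself (the last path element) is returned.
    """
    index = min(level, len(path) - 1)
    return path[index]


# Hypothetical root-to-cell-type paths, for illustration only.
TOY_PATHS = {
    "Pericyte": ["Cell", "Mural cell", "Pericyte"],
    "Macrophage": ["Cell", "Myeloid cell", "Macrophage"],
    "Monocyte-derived macrophage": [
        "Cell", "Myeloid cell", "Macrophage", "Monocyte-derived macrophage",
    ],
}

annotations = {name: collapse_to_level(path, 2) for name, path in TOY_PATHS.items()}
print(annotations)
```

Under these assumptions, a fine-grained label such as 'Monocyte-derived macrophage' collapses to its ancestor 'Macrophage', while labels already at the requested depth are left unchanged.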
To map custom cell type labels to uHAF:
- Prepare the unique cell type labels from your dataset:
original_labels = ['V-CM', 'LA-CM', 'RA-CM', 'Capillary-EC', 'Lymphatic-EC']
- Generate uHAF-GPT prompts:
print(uhaf.generate_uhaf_GPTs_prompts('Heart', original_labels))
Copy the output and paste it into the uHAF-GPT Mapping Website to get the mapped labels.
- Use the mapping dictionary to transform your labels:
# Paste the full dictionary returned by uHAF-GPT here; only two entries are shown.
mapping_results = {"V-CM": "Ventricle cardiomyocyte cell", "LA-CM": "Atrial cardiomyocyte cell"}
# Labels missing from the dictionary are kept unchanged instead of raising a KeyError.
transformed_labels = [mapping_results.get(label, label) for label in original_labels]
print(transformed_labels)
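Before relying on a mapping, it can help to check which labels it does not cover. The following standalone sketch applies a mapping dictionary and reports the unmapped labels; `apply_mapping` is a hypothetical helper written for this example and is not part of the uhaf package.

```python
def apply_mapping(labels, mapping):
    """Return (mapped_labels, unmapped_labels).

    Labels without an entry in `mapping` are kept as they are and
    also listed in `unmapped_labels` so they can be reviewed.
    """
    mapped = [mapping.get(label, label) for label in labels]
    unmapped = sorted({label for label in labels if label not in mapping})
    return mapped, unmapped


original_labels = ['V-CM', 'LA-CM', 'RA-CM', 'Capillary-EC', 'Lymphatic-EC']
# Partial mapping, as it might look before all labels have been mapped.
mapping_results = {
    "V-CM": "Ventricle cardiomyocyte cell",
    "LA-CM": "Atrial cardiomyocyte cell",
}

transformed_labels, unmapped_labels = apply_mapping(original_labels, mapping_results)
print(transformed_labels)
print(unmapped_labels)
# ['Capillary-EC', 'Lymphatic-EC', 'RA-CM']
```

An empty list of unmapped labels indicates that every original label has a uHAF counterpart.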
Print the uHAF tree for a specific organ as a nested, JSON-style structure:
print(uhaf.dict_uhafs['Heart'])
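If you need the tree as JSON text, for example to save it to a file, Python's standard json module can serialize a nested structure. The sketch below uses a small hypothetical tree in place of uhaf.dict_uhafs['Heart']; it assumes the tree consists only of plain dictionaries, lists, and strings, which has not been verified against the library.

```python
import json

# Hypothetical stand-in for an organ tree; the real layout may differ.
toy_tree = {
    "Cell": {
        "Cardiomyocyte cell": {
            "Ventricle cardiomyocyte cell": {},
            "Atrial cardiomyocyte cell": {},
        }
    }
}

# Serialize to indented JSON text.
json_text = json.dumps(toy_tree, indent=2)
print(json_text)

# Parse the text again to confirm nothing was lost in the round trip.
restored = json.loads(json_text)
print(restored == toy_tree)
# True
```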
We welcome contributions to improve and expand the uHAF library. For more details, please refer to our contribution guidelines.
This project is licensed under the MIT License. See the LICENSE file for details.