Dear scGPT Authors (@subercui et al.),

When evaluating the zero-shot cell embeddings from scGPT's "embed_data" function (inside scgpt.tasks.cell_emb.py), I noticed the following given a test adata containing 100 samples (obs) x 20,000 genes (vars) and either:
(a) querying embed_data with the entire adata file, yielding one output embedding of 100 samples (obs) x 512 embedding dimensions (vars), or
(b) chunking the adata file into n chunks (e.g. 4 x 25 samples) and concatenating the 4 output embeddings (a minimal repro sketch follows).
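For concreteness, here is a minimal sketch of the two query paths. The file name, checkpoint path, and the use of return_new_adata are my assumptions for illustration; the exact embed_data signature may differ across scGPT versions:

```python
import numpy as np
import scanpy as sc
from scgpt.tasks import embed_data

MODEL_DIR = "path/to/scGPT_human"            # hypothetical checkpoint directory
adata = sc.read_h5ad("test_100x20000.h5ad")  # hypothetical test file: 100 obs x 20,000 vars

# (a) one call over the full AnnData -> 100 x 512 embedding matrix
emb_a = embed_data(adata.copy(), MODEL_DIR, max_length=1200,
                   return_new_adata=True).X

# (b) four chunks of 25 cells, embedded separately, then concatenated
chunks = [adata[i : i + 25].copy() for i in range(0, adata.n_obs, 25)]
emb_b = np.concatenate(
    [embed_data(c, MODEL_DIR, max_length=1200, return_new_adata=True).X
     for c in chunks],
    axis=0,
)

# naive expectation: identical, since each cell should embed independently
print(np.allclose(emb_a, emb_b))  # observed: False at the default max_length
```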
Observations:

- Small max_length (e.g. the default 1200): the output embeddings from (a) and (b) differed, although, strangely, the first ~25 samples were identical.
- Larger max_length (e.g. 2000 or 30000): the output embeddings from (a) and (b) were identical.

It appears that the samples are not entirely independent: chunking the dataset yields different final embeddings if a small max_length "context window" is used, but the discrepancy disappears once the context window is large enough. The larger the dataset/chunks (number of samples), the larger the context window needs to be for (a) and (b) to produce identical results. Note: this holds whether the model is initialised once before (a) and (b), or once per chunk within (b).
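A quick way to see where the discrepancy vanishes is to sweep max_length, reusing the names from the sketch above (again, the call pattern is assumed, not taken from the scGPT docs):

```python
# hypothetical sweep: find where chunked and full-dataset embeddings agree
def embed(a, max_len):
    return embed_data(a.copy(), MODEL_DIR, max_length=max_len,
                      return_new_adata=True).X

for max_len in (1200, 2000, 4000, 8000, 30000):
    full = embed(adata, max_len)
    chunked = np.concatenate([embed(c, max_len) for c in chunks], axis=0)
    print(max_len, np.allclose(full, chunked))
```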
Could the authors please explain this behaviour of scGPT? Perhaps the best way to proceed is to ensure a sufficiently large context window for the dataset/chunks being used. Please advise.
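If the cause is per-cell truncation of the non-zero genes to max_length, one conservative workaround might be to derive max_length from the data before calling embed_data. This is speculation on my part, and the +1 for a <cls> token is an assumption:

```python
import numpy as np
from scipy.sparse import issparse

# choose a max_length that covers every cell's non-zero genes (speculative workaround)
X = adata.X
nnz_per_cell = X.getnnz(axis=1) if issparse(X) else np.count_nonzero(X, axis=1)
safe_max_length = int(nnz_per_cell.max()) + 1  # +1 for a <cls> token (assumption)
emb = embed_data(adata, MODEL_DIR, max_length=safe_max_length,
                 return_new_adata=True).X
```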
Best regards,
Dr. Peter Wright