Describe the bug
I am trying to run the Geneformer example on the provided test data, as explained in the tutorials. After installing the latest version of Geneformer from the Hugging Face repository and some plumbing to get everything to work, I run into an error when I try to execute:
from transformers import BertForSequenceClassification, Trainer
from geneformer import DataCollatorForCellClassification

# reload pretrained model
model = BertForSequenceClassification.from_pretrained(model_dir)
# create the trainer
trainer = Trainer(model=model,
                  data_collator=DataCollatorForCellClassification())
All data used comes from the example data; none is from external sources.
I tried overriding the token_dictionary as follows:
# reload pretrained model
model = BertForSequenceClassification.from_pretrained(model_dir)
# create the trainer, passing the tokenizer's gene token dictionary to the collator
# (tokenizer is the Geneformer tokenizer created earlier in the notebook)
kwargs = {"token_dictionary": tokenizer.gene_token_dict}
trainer = Trainer(model=model,
                  data_collator=DataCollatorForCellClassification(**kwargs))
But this results in the following error during training:
File ~/.pyenv/versions/3.10.0/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:1073, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1071 if hasattr(self.embeddings, "token_type_ids"):
   1072     buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
-> 1073     buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
   1074     token_type_ids = buffered_token_type_ids_expanded
   1075 else:

RuntimeError: The expanded size of the tensor (2377) must match the existing size (2048) at non-singleton dimension 1. Target sizes: [8, 2377]. Tensor sizes: [1, 2048]
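For context on the numbers in the error: the tokenized cells appear to be longer (up to 2377 tokens in this batch) than the 2048-token input size of the pretrained checkpoint used in the example. A minimal diagnostic sketch, assuming the tokenized fine-tuning data is a Hugging Face Dataset named train_dataset with an input_ids column (that name is a placeholder, not taken from the tutorial):

from transformers import BertForSequenceClassification

# model_dir is the pretrained Geneformer checkpoint directory used in the tutorial.
model = BertForSequenceClassification.from_pretrained(model_dir)

# Maximum sequence length the checkpoint's position/token_type buffers support (2048 here).
model_max_len = model.config.max_position_embeddings

# Longest tokenized cell in the (placeholder) fine-tuning dataset.
data_max_len = max(len(ids) for ids in train_dataset["input_ids"])

print(f"model max length: {model_max_len}, longest tokenized cell: {data_max_len}")
# If data_max_len > model_max_len, BertModel.forward fails exactly as above when it
# tries to expand its 2048-wide token_type_ids buffer to the width of the batch.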
To Reproduce
Run the Geneformer notebook with the latest version of Geneformer installed.
Environment
Mac M1 Pro, running Python 3.10; the most relevant library versions:
cellxgene-census 1.16.2
geneformer 0.1.0
tiledb 0.32.5
tiledbsoma 1.14.5
torch 2.5.1
transformers 4.46.2
Sorry for the roadbump. That revision is what we coded against when we created the example; at that time (possibly still?), the Geneformer repository didn't have tagged/released versions, making it a little challenging to track with subsequent changes. We'll be doing some work shortly to update the cellxgene_census Geneformer integration to a newer version, but it will take some time to get out the door. Thanks!
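Until that update lands, one possible interim workaround (a sketch only, not the official fix; it assumes the tokenized data is a Hugging Face Dataset called train_dataset, as in the placeholder above) is to truncate each tokenized cell to the model's maximum input length before building the Trainer. Since Geneformer's rank-value encoding lists the highest-ranked genes first, truncating from the end should drop only the lowest-ranked genes:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(model_dir)
max_len = model.config.max_position_embeddings  # 2048 for this checkpoint

def truncate(example):
    # Keep only the first max_len gene tokens of each rank-value-encoded cell.
    example["input_ids"] = example["input_ids"][:max_len]
    # Geneformer-tokenized datasets typically carry a "length" column; keep it in sync if present.
    if "length" in example:
        example["length"] = len(example["input_ids"])
    return example

train_dataset = train_dataset.map(truncate)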