
Doesn't support the latest version of transformers. IndexError: tuple index out of range #14

Open
alexey-krasnov opened this issue Oct 17, 2023 · 3 comments

Comments

@alexey-krasnov

Hi! I tried to install the program on a Mac M2 and hit an error while using a fresh Python 3.11 with transformers 4.34.0 and tokenizers 0.14.1. When I ran pipeline.py I got the following error; I also printed the outputs variable and its type from model.py:

```
python pipeline.py --model models --input tests/data/raw.txt --output out.json
```

```
Loading product extractor from models/prod...Some weights of the model checkpoint at models/prod were not used when initializing BertCRFForTagging: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertCRFForTagging from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertCRFForTagging from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
done
Loading role extractor from models/role...Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of the model checkpoint at models/role were not used when initializing BertCRFForRoleLabeling: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertCRFForRoleLabeling from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertCRFForRoleLabeling from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
done
BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[-7.2273e-01, -1.3122e-01, -1.1214e+00,  ...,  1.3058e+00,
          -7.9485e-01,  1.7867e+00],
         ...,
         [ 3.9034e-01, -4.5327e-01,  7.8649e-02,  ...,  9.6617e-02,
           1.2183e-01, -6.4031e-01]],

        ...,

        [[-6.4499e-01, -1.1385e-01, -1.1036e+00,  ...,  1.3126e+00,
          -8.9420e-01,  1.7905e+00],
         ...,
         [-1.9072e-01, -5.9194e-01,  9.7846e-01,  ...,  7.0911e-02,
          -2.7881e-01,  5.8151e-01]]]), pooler_output=None, hidden_states=None, past_key_values=None, attentions=None, cross_attentions=None)

<class 'transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions'>
```
```
Traceback (most recent call last):
  File "/Users/alekseikrasov/Desktop/OntoChem/work/ChemRxnExtractor/ChemRxnExtractor/pipeline.py", line 22, in <module>
    rxns = rxn_extractor.get_reactions(sents[:10])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alekseikrasov/Desktop/OntoChem/work/ChemRxnExtractor/ChemRxnExtractor/chemrxnextractor/cre.py", line 224, in get_reactions
    outputs = self.role_extractor(
              ^^^^^^^^^^^^^^^^^^^^
  File "/Users/alekseikrasov/miniforge3/envs/ontochem/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alekseikrasov/miniforge3/envs/ontochem/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alekseikrasov/Desktop/OntoChem/work/ChemRxnExtractor/ChemRxnExtractor/chemrxnextractor/models/model.py", line 321, in forward
    extended_cls_h = outputs[1].unsqueeze(1).expand(batch_size, seq_length, hidden_dim)  # FIXME: this line causes IndexError: tuple index out of range
                     ~~~~~~~^^^
  File "/Users/alekseikrasov/miniforge3/envs/ontochem/lib/python3.11/site-packages/transformers/utils/generic.py", line 405, in __getitem__
    return self.to_tuple()[k]
           ~~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range
```

I tried to use the old version of transformers (v3.0.2); however, I could not compile tokenizers==0.8.1.rc1 (required by transformers==3.0.2) with my existing Rust compiler. The same problem occurred on a Linux openSUSE machine.

Could you please help solve this problem, and perhaps update the code and requirements so that one can use the latest versions of Python, transformers, and tokenizers?
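For what it's worth, the failure mode can be reproduced without transformers at all. In recent transformers releases, model outputs are `ModelOutput` objects whose `to_tuple()` drops fields that are `None`; since `pooler_output=None` in the log above, the tuple has only one element and `outputs[1]` goes out of range. The class below is a hypothetical stand-in sketching that behaviour (it is not the real transformers class):

```python
# FakeModelOutput mimics (under assumption) how transformers'
# ModelOutput.__getitem__ indexes into to_tuple(), which skips None fields.
class FakeModelOutput:
    def __init__(self, last_hidden_state, pooler_output=None):
        self.last_hidden_state = last_hidden_state
        self.pooler_output = pooler_output

    def to_tuple(self):
        # None-valued fields are dropped from the tuple
        return tuple(
            v for v in (self.last_hidden_state, self.pooler_output)
            if v is not None
        )

    def __getitem__(self, k):
        return self.to_tuple()[k]

out = FakeModelOutput(last_hidden_state=[[0.1, 0.2]])  # pooler_output is None
print(out[0])  # fine: the last hidden state
try:
    out[1]     # the tuple has one element, so this raises IndexError
except IndexError as exc:
    print("IndexError:", exc)
```

This suggests why `outputs[1]` in model.py worked with transformers 3.x (plain tuples with a fixed layout) but breaks with 4.x when the pooling layer output is absent.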

@LingjieBao1998

I have met the same problem. Is there a solution?

@alexey-krasnov
Author

Unfortunately, installation on a Mac with ARM architecture failed; however, I managed to install and run it on a Linux machine with the required dependencies. The only thing I can recommend is using Python < 3.9 on Linux, following the rest of the installation instructions.

@alexey-krasnov
Author

alexey-krasnov commented Nov 7, 2023

Hi @LingjieBao1998, I have found a solution for Mac users with Apple silicon. You can follow the instructions below to install ChemRxnExtractor:

1. Install a compatible conda via either Miniforge or Anaconda (Miniconda). We recommend Miniforge:

```
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```

and follow the installer prompts.

If you already have Anaconda (Miniconda) installed, update it to the latest version:

```
# update the conda package manager to the latest version
conda update conda
# use conda to update Anaconda to the latest version
conda update anaconda
```

Check your Anaconda version: the 2022.05 release of Anaconda Distribution features native compiling for Apple M1's ARM64 architecture.

Set conda-forge as the priority channel:

```
conda config --add channels conda-forge
conda config --set channel_priority strict
```

2. Create an environment and install the essential libraries:

```
conda create --name ENV_NAME "python<3.12"
conda activate ENV_NAME
pip install -U pyproject-toml torch tqdm numpy seqeval
```

It is important to use the conda-forge channel when installing the next versions of tokenizers and transformers. Find the tokenizers versions available for your Python version:

```
conda search tokenizers
```

For example, these combinations work fine:

- Python 3.9: tokenizers=0.10.1 with transformers=3.0.2
- Python 3.11: tokenizers=[0.13.1, 0.13.2] with transformers=[3.0.2, 3.1.0]

Install tokenizers first, then transformers:

```
conda install -c conda-forge tokenizers=0.13.2
conda install -c conda-forge transformers=3.1.0
```

3. Install ChemRxnExtractor:

```
git clone https://github.com/jiangfeng1124/ChemRxnExtractor
cd ChemRxnExtractor
pip install -e .
```

4. If this error occurs:

```
line XXX, in __init__
    BertWordPieceTokenizer(
TypeError: __init__() got an unexpected keyword argument 'vocab_file'
```

open the file

```
/Users/USER_NAME/miniforge3/envs/ENV_NAME/lib/python3.YY/site-packages/transformers/tokenization_bert.py
```

and in line XXX change 'vocab_file' to 'vocab'.
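Instead of editing site-packages by hand, one could also translate the keyword in user code before it reaches the tokenizer. This is a hypothetical shim (the `adapt_tokenizer_kwargs` name is mine, not part of any library) sketching the rename: older tokenizers releases accepted `vocab_file`, newer ones expect `vocab`:

```python
# Hypothetical compatibility shim: rename the deprecated `vocab_file`
# keyword to `vocab` before forwarding kwargs to BertWordPieceTokenizer.
def adapt_tokenizer_kwargs(**kwargs):
    if "vocab_file" in kwargs:
        kwargs["vocab"] = kwargs.pop("vocab_file")
    return kwargs

# The adapted kwargs could then be passed on, e.g.:
#   BertWordPieceTokenizer(**adapt_tokenizer_kwargs(vocab_file="vocab.txt"))
print(adapt_tokenizer_kwargs(vocab_file="vocab.txt", lowercase=True))
```

The same idea applies wherever the `TypeError` above points, as long as only the keyword name changed between versions.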
