
Layer extracted by grafzahl #27

Open

LuigiC72 opened this issue Jan 25, 2024 · 1 comment

Comments

@LuigiC72
Given the discussion about which layer to keep as a token's representation in downstream analysis (Jawahar et al., 2019; Ethayarajh, 2019), for example when using a pre-trained BERT model, I was wondering whether you are considering allowing the user to select one specific layer when fine-tuning a Transformer via grafzahl. At the moment, which layer is used when, for example, I specify `model_name = "bert-base-uncased"`? Thanks for your great package!

@chainsawriot
Collaborator

chainsawriot commented Jan 25, 2024

By default, grafzahl uses almost the same defaults as the underlying simpletransformers, i.e. no freezing: all layers may get fine-tuned.

If you really want to freeze some layers, it is possible to do that with simpletransformers, but unfortunately not with grafzahl. If you want to customize the fine-tuning at the layer level, I think you would be better off using simpletransformers or even transformers directly.
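For illustration, a minimal sketch of layer freezing with Hugging Face transformers (not grafzahl; the number of frozen layers and the two-label setup are just placeholders):

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical setup: a BERT classifier with two labels.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze the embeddings and the first 8 of the 12 encoder layers;
# only the top encoder layers and the classification head remain trainable.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False
```

Parameters with `requires_grad = False` receive no gradients, so the frozen layers keep their pre-trained weights during fine-tuning.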
