From the original paper: "We project token-level representations obtained from the BERT embedders onto a 2-dimensional space using t-SNE."
The paper claims that Figure 3 shows the usefulness of pretraining on OntoNotes via more compact clusters. However, since the word embeddings returned by a transformer model are contextualized, I am wondering how you obtain the embeddings of individual tokens in the test set before applying t-SNE. Do you collect all of the contextual embeddings for a token and then average them?
Additionally, I could not find the associated code for visualizing the embeddings. Would it be possible to provide the code used to produce Figure 3?
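For concreteness, here is a minimal sketch of the averaging approach I am asking about. This is my guess at the procedure, not the paper's method: the random vectors stand in for real per-occurrence BERT embeddings, and the token names are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for contextual embeddings: each token type occurs several times
# in the test set, each occurrence with a different 768-d vector from BERT.
rng = np.random.default_rng(0)
occurrences = {
    "bank": [rng.normal(size=768) for _ in range(5)],
    "river": [rng.normal(size=768) for _ in range(3)],
    "money": [rng.normal(size=768) for _ in range(4)],
}

# Guessed step: average all contextual vectors of a token type into one
# static vector per token, then project the averaged vectors with t-SNE.
tokens = sorted(occurrences)
averaged = np.stack([np.mean(occurrences[t], axis=0) for t in tokens])

# perplexity must be smaller than the number of points being projected
points = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(averaged)
for tok, (x, y) in zip(tokens, points):
    print(f"{tok}: ({x:.2f}, {y:.2f})")
```

An alternative I could imagine is projecting every occurrence separately (no averaging) and coloring points by token type; it would help to know which of the two Figure 3 actually uses.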