Word representation

Representations:

  • One-hot encoding, e.g. [0 0 0 1 0 0].
  • This is a problem because the dot product (AND) of any two different one-hot vectors is 0, so it encodes no similarity between words.
  • Distributional similarity: "You shall know a word by the company it keeps".
  • Small context windows distinguish syntactic variants such as singular vs. plural; big windows give broader syntactic or semantic clustering.

Traditional word representations:

  • Class-based: Brown clustering and exchange clustering.
  • Soft clustering models: each cluster defines a distribution over words, i.e. how likely each word is under each cluster.

Distributed representations:

  • Word meaning is represented as a dense vector.

Dimensionality reduction:

  • Can be used to plot N-dimensional word vectors in 2D (see the sketch below).
  • Can be used to analyze how word usage changes over time, e.g. "gay" from 1900 to 2000.
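
A minimal plotting sketch, assuming scikit-learn and matplotlib are available; the words and vectors below are random stand-ins for real embeddings.

```python
# Project word vectors to 2D with PCA so they can be plotted.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = ["king", "queen", "man", "woman", "cat", "dog"]
rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(words), 50))  # stand-in for real N-dim embeddings

coords = PCA(n_components=2).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.show()
```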

Neural networks:

  • Representations can be made more meaningful by adding supervision.
  • E.g. word → encoding → sentiment (a sketch follows this list).
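
A minimal sketch of the word → encoding → sentiment idea, assuming PyTorch; the vocabulary, toy examples, and labels are hypothetical.

```python
# Supervised sentiment signal on top of word embeddings: the embedding table
# is trained jointly with the classifier, which makes the vectors task-aware.
import torch
import torch.nn as nn

vocab = {"good": 0, "great": 1, "bad": 2, "awful": 3, "movie": 4}  # toy vocab

class SentimentNet(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)  # averages the word vectors
        self.out = nn.Linear(dim, 2)                 # 2 classes: negative / positive

    def forward(self, token_ids):
        return self.out(self.emb(token_ids))

model = SentimentNet(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = torch.tensor([[0, 4], [2, 4]])  # "good movie", "bad movie"
y = torch.tensor([1, 0])            # 1 = positive, 0 = negative

for _ in range(100):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```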

Unsupervised word vector learning:

Mainstream methods:

  • word2vec: skip-gram and continuous bag of words (CBOW). It is a shallow neural network (a minimal sketch follows this list).
  • GloVe.
  • They are still valid, but have since been surpassed by deeper neural network models.
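
A minimal word2vec training sketch, assuming gensim 4.x; the toy corpus and hyperparameter values are placeholders, not recommendations.

```python
# Train skip-gram word2vec on a toy corpus with gensim.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "chills", "on", "a", "mat"],
    ["the", "dog", "sleeps", "on", "a", "rug"],
]

model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the dense word vectors
    window=5,        # context window size
    sg=1,            # 1 = skip-gram, 0 = continuous bag of words
    min_count=1,
    epochs=50,
)

print(model.wv.most_similar("cat"))  # nearest neighbours in the learned space
```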

Contrastive Estimation of Word Vectors:

  • A word together with its context is a positive training sample; replacing the word with a random word in the same context gives a negative training sample.
  • To train this, formalize a score: score(cat chills on a mat) > score(cat chills Ohio a mat).
  • The model uses an embedding matrix whose columns are the word vectors.
  • The sentence window is multiplied by this matrix (an embedding lookup) and the resulting vectors are scored (see the sketch below).
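
A minimal sketch of the max-margin contrastive objective, assuming PyTorch; the window size, layer sizes, and corruption scheme are illustrative assumptions rather than the exact model from the lecture.

```python
# Push score(positive window) above score(corrupted window) by a margin of 1.
import torch
import torch.nn as nn

vocab = {"cat": 0, "chills": 1, "on": 2, "a": 3, "mat": 4, "Ohio": 5}
dim, window = 8, 5

emb = nn.Embedding(len(vocab), dim)    # the embedding matrix of word vectors
scorer = nn.Sequential(                # the window of vectors is multiplied by
    nn.Linear(window * dim, 16),       # a weight matrix, then scored
    nn.Tanh(),
    nn.Linear(16, 1),
)

def score(words):
    ids = torch.tensor([vocab[w] for w in words])
    return scorer(emb(ids).flatten()).squeeze()  # concatenate window, then score

positive = ["cat", "chills", "on", "a", "mat"]
corrupted = ["cat", "chills", "Ohio", "a", "mat"]  # random word swapped in

params = list(emb.parameters()) + list(scorer.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
for _ in range(100):
    optimizer.zero_grad()
    # Hinge loss: zero once the positive window out-scores the corrupted one by 1.
    loss = torch.clamp(1 - score(positive) + score(corrupted), min=0)
    loss.backward()
    optimizer.step()
```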

Learning word-level classifiers: POS and NER

The model:

  • The same model as above, with a single neural-network layer on top (a sketch follows below).
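
A minimal sketch of a window-based word-level classifier (e.g. tagging the centre word of a window as LOC or O for NER), again assuming PyTorch; the toy vocabulary, tag set, and window size are made up.

```python
# Classify the centre word of a word window with a single hidden layer.
import torch
import torch.nn as nn

vocab = {"museums": 0, "in": 1, "Paris": 2, "are": 3, "amazing": 4}
tags = {"O": 0, "LOC": 1}
dim, window = 8, 3

emb = nn.Embedding(len(vocab), dim)
classifier = nn.Sequential(
    nn.Flatten(),                    # concatenate the window's word vectors
    nn.Linear(window * dim, 16),
    nn.Tanh(),
    nn.Linear(16, len(tags)),        # tag scores; softmax is inside the loss
)

x = torch.tensor([[1, 2, 3]])        # window "in Paris are"
y = torch.tensor([tags["LOC"]])      # centre word "Paris" is a location

params = list(emb.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):
    optimizer.zero_grad()
    loss_fn(classifier(emb(x)), y).backward()
    optimizer.step()
```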

Sharing statistical strength:

  • Why are these representations so powerful?
  • The advantage is statistical: information is shared across similar words.
  • The same representations can be reused across tasks (multi-task learning).