L2: Text clustering and similarity

This lecture covers text clustering and text similarity. It also introduces two fundamental Python libraries for machine learning that are used in the exercises: NumPy and scikit-learn.

Lecture videos and slides

| Module | Topic | Lecture material |
|--------|-------|------------------|
| L2-1 | Text clustering | lecture video, slides |
| L2-2 | Text similarity | lecture video, slides |
| L2-3 | Text clustering evaluation | Read: Zhai & Massung, Section 14.4 |
| L2-4 | NumPy tutorial | external video |
| L2-5 | scikit-learn | Complete the following two official scikit-learn tutorials: "An introduction to machine learning with scikit-learn" and "Working with text data" |
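As a rough preview of how the scikit-learn material connects to text clustering, the sketch below turns a handful of documents into TF-IDF vectors and clusters them with K-means and with agglomerative hierarchical clustering. The toy documents, the choice of two clusters, and the English stop-word list are assumptions made for illustration; they are not taken from the lecture or the exercises.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (made up for this sketch): three documents about pets,
# three about finance.
docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "pets like cats and dogs",
    "stocks and bonds are investments",
    "investors buy stocks and bonds",
    "bonds and stocks fall",
]

# Represent each document as an L2-normalized TF-IDF vector.
# Removing English stop words is an assumption of this sketch.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# K-means accepts the sparse matrix directly; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# AgglomerativeClustering needs a dense array; the default linkage is Ward.
agglo = AgglomerativeClustering(n_clusters=2)
agglo_labels = agglo.fit_predict(X.toarray())

print("K-means labels:      ", kmeans_labels)
print("Agglomerative labels:", agglo_labels)
```

The cluster ids themselves (0 or 1) are arbitrary; what matters is which documents share an id.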

Reading

  • Text Data Management and Analysis (Zhai & Massung)
    • Chapter 14 (except Section 14.3)

Summary

Key concepts in this lecture:

  • Problem of (text) clustering
  • Similarity-based clustering algorithms (Agglomerative Hierarchical Clustering and K-means)
  • Measuring text similarity (Jaccard and cosine)
  • Working with NumPy arrays
  • Machine learning basics
  • Working with text data in scikit-learn
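
The two similarity measures listed above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the course's reference implementation: the function names, the whitespace tokenization, the raw term-count vectors, and the convention of returning 0.0 for empty inputs are all assumptions of this sketch.

```python
import numpy as np


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the sets of distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty documents
    return len(set_a & set_b) / len(union)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term vectors."""
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return float(np.dot(a, b) / norm)


def term_count_vectors(docs):
    """Build a (n_docs, n_terms) matrix of raw term counts over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok in doc:
            matrix[row, index[tok]] += 1
    return vocab, matrix


# Two toy documents, tokenized by whitespace (an assumption of this sketch).
doc1 = "the cat sat on the mat".split()
doc2 = "the dog sat on the log".split()

jaccard = jaccard_similarity(doc1, doc2)
vocab, counts = term_count_vectors([doc1, doc2])
cosine = cosine_similarity(counts[0], counts[1])

print(f"Jaccard: {jaccard:.3f}")  # 3 shared terms out of 7 distinct -> 0.429
print(f"Cosine:  {cosine:.3f}")   # dot product 6, both norms sqrt(8) -> 0.750
```

Jaccard ignores how often a term occurs, while cosine on count vectors gives the repeated word "the" more weight, which is why the two scores differ for the same pair of documents.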