In this tutorial, we walk through using Snorkel to identify mentions of spouses in a corpus of news articles. The tutorial is divided into three notebooks, each covering one step in the pipeline:
- Preprocessing [Intro_Tutorial_1]: First, we parse the raw input documents into contexts (documents, sentences), and extract candidate spouse mentions.
- Generating and modeling noisy training labels [Intro_Tutorial_2]: Next, we go through the process of writing labeling functions and learning a generative model to denoise them.
- Training an End Extraction Model [Intro_Tutorial_3]: Finally, we train a neural network to identify spouses in the news using our probabilistic training labels.
For example, in the following sentence (specifically, a photograph caption):

> Prime Minister Lee Hsien Loong and his wife Ho Ching leave a polling station after casting their votes in Singapore (Photo: AFP)

our goal is to extract the spouse relation pair ("Lee Hsien Loong", "Ho Ching").
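
To make the idea of a labeling function concrete, here is a minimal sketch in plain Python applied to the caption above. It does not use Snorkel's API: the function names (`text_between`, `lf_spouse_word_between`), the label values, and the keyword list are all hypothetical and chosen only for illustration. The heuristic votes "spouse" when a word such as "wife" or "husband" appears between the two person mentions, and abstains otherwise.

```python
import re

# Example caption from the text above.
SENTENCE = (
    "Prime Minister Lee Hsien Loong and his wife Ho Ching leave a polling "
    "station after casting their votes in Singapore (Photo: AFP)"
)

# Hypothetical label values for this sketch (not Snorkel constants).
POSITIVE, ABSTAIN = 1, 0

# Assumed keyword list; a real labeling function would likely use a richer one.
SPOUSE_WORDS = {"wife", "husband", "spouse"}


def text_between(sentence, person1, person2):
    """Return the text between person1 and a later person2, or None if not found."""
    start = sentence.find(person1)
    if start == -1:
        return None
    start += len(person1)
    end = sentence.find(person2, start)
    if end == -1:
        return None
    return sentence[start:end]


def lf_spouse_word_between(sentence, person1, person2):
    """Vote POSITIVE if a spouse keyword lies between the two mentions, else ABSTAIN."""
    between = text_between(sentence, person1, person2)
    if between is None:
        return ABSTAIN
    words = set(re.findall(r"[a-z]+", between.lower()))
    return POSITIVE if words & SPOUSE_WORDS else ABSTAIN


label = lf_spouse_word_between(SENTENCE, "Lee Hsien Loong", "Ho Ching")
print(label)
```

A single heuristic like this is noisy and covers few candidates, which is why the second notebook combines many labeling functions and learns a generative model to denoise their votes.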