\documentclass[11pt]{article}
%\usepackage[top=20mm,left=20mm,right=20mm,bottom=15mm,a4paper]{geometry} % see geometry.pdf on how to lay out the page. There's lots.
\usepackage[top=20mm,left=20mm,right=20mm,bottom=15mm,headsep=15pt,footskip=15pt,a4paper]{geometry} % see geometry.pdf on how to lay out the page. There's lots.
%\geometry{a4paper} % or letter or a5paper or ... etc
% \geometry{landscape} % rotated page geometry
\usepackage[round]{natbib}
\setlength{\bibsep}{0.0pt}
\usepackage{color}
\usepackage{times}
%\usepackage[T1]{fontenc}
%\usepackage{mathptmx}
\usepackage{tikz-dependency}
\usepackage{enumitem}
%\usepackage{times}
\usepackage[procnames]{listings}
\usepackage{color}
% See the ``Article customise'' template for some common customisations
\newcommand{\refeq}[1]{Equation~\ref{eq:#1}}
\newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
\newcommand{\reftab}[1]{Table~\ref{tab:#1}}
\newcommand{\refsec}[1]{\textsection\ref{sec:#1}}
\newcommand{\newsec}[2]{\section{#1}\label{sec:#2}\noindent}
\newcommand{\newsubsec}[2]{\subsection{#1}\label{sec:#2}\noindent}
\newcommand{\argmax}{\operatornamewithlimits{argmax}}
\newcommand{\argmin}{\operatornamewithlimits{argmin}}
\makeatletter
\def\@maketitle{ % custom maketitle
\begin{center}%
{\bfseries \@title}%
{\bfseries \@author}%
\end{center}%
\smallskip \hrule \bigskip }
% custom section
\renewcommand{\section}{\@startsection
{section}% % the name
{1}% % the level
{0mm}% % the indent
{-0.8\baselineskip}% % the before skip
{0.3\baselineskip}% % the after skip
{\bfseries\large}}% the style
% custom subsection
\renewcommand{\subsection}{\@startsection
{subsection}% % the name
{2}% % the level
{0mm}% % the indent
{-0.8\baselineskip}% % the before skip
{0.3\baselineskip}% % the after skip
{\bfseries\large}}% the style
\renewcommand{\paragraph}{%
\@startsection{paragraph}{4}%
{\z@}{1.5ex \@plus 1ex \@minus .2ex}{-1em}%
{\normalfont\normalsize\bfseries}%
}\makeatother
%\title{{\LARGE Universal Parser (UP)}\\[-8mm]
%\includegraphics[height=8mm]{RUPA}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\includegraphics[height=8mm]{RUPA}}
\title{{\LARGE Natural Language Processing}\\[1.5mm]{\large Lab 10: Context-Free Grammar}}
\author{}
\date{} % delete this line to display the current date
%%% BEGIN DOCUMENT
\begin{document}
\definecolor{keywords}{RGB}{255,0,90}
\definecolor{comments}{RGB}{0,0,113}
\definecolor{red}{RGB}{160,0,0}
\definecolor{green}{RGB}{0,150,0}
\lstset{language=Python,
basicstyle=\ttfamily\small,
keywordstyle=\color{keywords},
commentstyle=\color{comments},
stringstyle=\color{red},
showstringspaces=false,
identifierstyle=\color{green},
procnamekeys={def,class}}
\maketitle
%\tableofcontents
%\vspace{3mm}
\vspace{-2mm}
\newsec{Introduction}{intro}%
In this lab, we will write a context-free grammar for a small fragment of English and test the grammar using a CFG parser from the NLTK toolkit. All files that are needed are available from {\tt /local/kurs/nlp/syntax/}.
\newsec{Data}{data}%
As our fragment of English, we will use the first five sentences from the file {\tt en10-anno.conll}, which we annotated in Lab 8 and parsed in Lab 9:
\begin{enumerate}[noitemsep]
\item He worked for the BBC for a decade.
\item She spoke to CNN Style about the experience.
\item Global warming has caused a change in the pattern of the rainy seasons.
\item I also wonder whether the Davis Cup played a part.
\item The scheme makes money through sponsorship and advertising.
\end{enumerate}
\newsec{Write a Grammar}{grammar}%
Our task is to write a context-free grammar that generates the five sentences with reasonable syntactic analyses. The file {\tt grammar.txt} contains a grammar that covers the first sentence. Extending it will involve adding new terminals, new rules and (probably) new nonterminals. Consult Jurafsky and Martin if you are uncertain about what nonterminals and rules to add.
\begin{center}
\fbox{
\lstinputlisting{code/grammar.txt}
}
\end{center}
%The grammar format is similar to that used in Figure 9.3 on page 326 of Jurafsky and Martin, but note the following:
%The grammar format is similar to that used in Figure 12.10 on page 441 of Jurafsky and Martin, but note the following: % Miryams clue
The grammar format is similar to that used in Figure 13.1 on page 462 of Jurafsky and Martin, but note the following: % Joakims screenshot
\begin{enumerate}[noitemsep]
\item The left-hand side of a rule is separated from the right-hand side by an ``arrow'' consisting of two characters ({\tt ->}), which must be surrounded by spaces.
\item Alternative right-hand sides are separated by a vertical bar ({\tt |}), which must also be surrounded by spaces.
\item Terminal symbols are enclosed in single quotes ({\tt ' '}).
\item Comment lines start with the hash symbol ({\tt \#}).
\end{enumerate}
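To make these conventions concrete, the following minimal sketch reads a grammar written in this format into a Python dictionary. It is not how NLTK or {\tt cfg\_parser} processes the file; the function name {\tt read\_grammar} and the toy grammar are hypothetical, and the sketch assumes one rule per line and terminals without internal spaces.

```python
from collections import defaultdict


def read_grammar(text):
    """Map each left-hand side to a list of right-hand sides (tuples of symbols)."""
    rules = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines starting with a hash symbol.
        if not line or line.startswith("#"):
            continue
        # The arrow must be surrounded by spaces.
        lhs, rhs = line.split(" -> ")
        # Alternatives are separated by a vertical bar surrounded by spaces.
        for alternative in rhs.split(" | "):
            rules[lhs.strip()].append(tuple(alternative.split()))
    return dict(rules)


# Hypothetical toy grammar, not the contents of grammar.txt.
TOY_GRAMMAR = """
# A comment line
S -> NP VP
NP -> 'He' | DET N
"""

if __name__ == "__main__":
    for lhs, alternatives in read_grammar(TOY_GRAMMAR).items():
        print(lhs, alternatives)
```

Terminals keep their single quotes in the resulting tuples, which is one simple way to tell them apart from nonterminals.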
\newsec{Parse Sentences}{parse}%
To test your grammar, you can use a simple parser built on top of the NLTK chart parser. In the Python interpreter, first import the module {\tt cfg\_parser}, which lets you read the grammar from the text file.
You can then start the interactive parser, which lets you type in one sentence at a time and prints all parse trees that the grammar assigns to it, as shown below.
\begin{center}
\fbox{
\lstinputlisting{code/cfg_parser.py}
}
\end{center}
To make your life easier, you can also run the parser directly in a terminal using a slightly different script:
\begin{verbatim}
python cfg_parser_b.py
\end{verbatim}
As distributed, the script parses only the first sentence, so you will need to modify it to parse the others. Note that it produces a tree that is easier to visualize.
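One way to test several sentences in one go is sketched here. It calls NLTK directly ({\tt CFG.fromstring} and {\tt ChartParser}) rather than the lab scripts, whose contents are not reproduced in this handout. The toy grammar, the helper {\tt parse\_all}, whitespace tokenization and dropping the final period are all assumptions made for illustration.

```python
from nltk import CFG, ChartParser

# Hypothetical toy grammar covering sentence 1 only; the real grammar.txt
# may use different nonterminals and rules.
GRAMMAR_TEXT = """
S -> NP VP
NP -> PRON | DET N | DET PROPN
VP -> V | VP PP
PP -> P NP
PRON -> 'He'
V -> 'worked'
P -> 'for'
DET -> 'the' | 'a'
PROPN -> 'BBC'
N -> 'decade'
"""


def parse_all(sentences, grammar_text=GRAMMAR_TEXT):
    """Return a dict mapping each sentence to its list of parse trees."""
    parser = ChartParser(CFG.fromstring(grammar_text))
    results = {}
    for sentence in sentences:
        # Assumption: split on whitespace and drop the final period.
        tokens = sentence.rstrip(".").split()
        try:
            results[sentence] = list(parser.parse(tokens))
        except ValueError:
            # NLTK raises ValueError when a word is not covered by the grammar.
            results[sentence] = []
    return results


if __name__ == "__main__":
    sentences = [
        "He worked for the BBC for a decade.",
        "She spoke to CNN Style about the experience.",
    ]
    for sentence, trees in parse_all(sentences).items():
        print(f"{len(trees)} parse(s): {sentence}")
        for tree in trees:
            print(tree)
```

A sentence with zero parses either contains words that are missing from the grammar or has a structure that the rules do not yet license.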
\newsec{Discuss Ambiguities}{ambiguities}%
It is likely that, once you have extended the grammar to cover all sentences, you will get more than one parse tree for some sentences. Check these analyses and discuss with your classmates whether they all correspond to reasonable interpretations of the sentence. Also discuss whether there are sentences that receive only one analysis but should have several because they are ambiguous.
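A typical source of multiple trees is prepositional phrase attachment. The sketch below counts the analyses a grammar assigns to a sentence, using a small memoized recursion over spans instead of the NLTK parser. The toy grammar (which crudely treats {\tt BBC} as a noun), the function {\tt count\_parses} and the quoting of terminals inside Python tuples are hypothetical choices for illustration, and the grammar is assumed to contain no empty rules and no unary cycles.

```python
from functools import lru_cache

# Hypothetical toy grammar; quoted strings are terminals.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("'He'",), ("DET", "N"), ("NP", "PP")],
    "VP": [("'worked'",), ("VP", "PP")],
    "PP": [("P", "NP")],
    "P": [("'for'",)],
    "DET": [("'the'",), ("'a'",)],
    "N": [("'BBC'",), ("'decade'",)],
}


def count_parses(tokens, rules=RULES, start="S"):
    """Count the distinct parse trees rooted in `start` that span all tokens."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Number of ways `symbol` can derive tokens[i:j].
        if symbol.startswith("'"):
            return int(j == i + 1 and tokens[i] == symbol.strip("'"))
        return sum(count_sequence(rhs, i, j) for rhs in rules.get(symbol, []))

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can derive tokens[i:j].
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        # The first symbol covers tokens[i:k]; the rest covers tokens[k:j],
        # leaving at least one token for each remaining symbol.
        return sum(
            count_symbol(rhs[0], i, k) * count_sequence(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return count_symbol(start, 0, len(tokens))


if __name__ == "__main__":
    print(count_parses("He worked for the BBC for a decade".split()))
```

With this toy grammar the first sentence receives two analyses: one attaches {\tt for a decade} to the verb phrase, the other attaches it to {\tt the BBC}. Removing the rule {\tt NP -> NP PP} leaves only the first, which is the kind of trade-off between coverage and spurious ambiguity worth discussing.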
\end{document}