-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathNLP-Semantics-Ass.tex
238 lines (207 loc) · 10.4 KB
/
NLP-Semantics-Ass.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
\documentclass[11pt]{article}
%\usepackage[top=20mm,left=20mm,right=20mm,bottom=15mm,a4paper]{geometry} % see geometry.pdf on how to lay out the page. There's lots.
\usepackage[top=20mm,left=20mm,right=20mm,bottom=15mm,headsep=15pt,footskip=15pt,a4paper]{geometry} % see geometry.pdf on how to lay out the page. There's lots.
%\geometry{a4paper} % or letter or a5paper or ... etc
% \geometry{landscape} % rotated page geometry
\usepackage[round]{natbib}
\setlength{\bibsep}{0.0pt}
\usepackage{color}
\usepackage{times}
%\usepackage[T1]{fontenc}
%\usepackage{mathptmx}
\usepackage{tikz-dependency}
\usepackage{enumitem}
%\usepackage{times}
\usepackage[procnames]{listings}
\usepackage{color}
\usepackage{todonotes}
\newenvironment{titlemize}[1]{%
\paragraph{#1}
\begin{itemize}
\setlength\itemsep{0pt}}
{\end{itemize}}
% See the ``Article customise'' template for come common customisations
\newcommand{\refeq}[1]{Equation~\ref{eq:#1}}
\newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
\newcommand{\reftab}[1]{Table~\ref{tab:#1}}
\newcommand{\refsec}[1]{\textsection\ref{sec:#1}}
\newcommand{\newsec}[1]{\section{#1}\noindent}
%\newcommand{\newsec}[2]{\section{#1}\label{sec:#2}\noindent}
\newcommand{\newsubsec}[2]{\subsection{#1}\label{sec:#2}\noindent}
\newcommand{\argmax}{\operatornamewithlimits{argmax}}
\newcommand{\argmin}{\operatornamewithlimits{argmin}}
\makeatletter
\def\@maketitle{ % custom maketitle
\begin{center}%
{\bfseries \@title}%
{\bfseries \@author}%
\end{center}%
\smallskip \hrule \bigskip }
% custom section
\renewcommand{\section}{\@startsection
{section}% % the name
{1}% % the level
{0mm}% % the indent
{-0.8\baselineskip}% % the before skip
{0.3\baselineskip}% % the after skip
{\bfseries\large}}% the style
% custom subsection
\renewcommand{\subsection}{\@startsection
{subsection}% % the name
{2}% % the level
{0mm}% % the indent
{-0.8\baselineskip}% % the before skip
{0.3\baselineskip}% % the after skip
{\bfseries\large}}% the style
\renewcommand{\paragraph}{%
\@startsection{paragraph}{4}%
{\z@}{1.5ex \@plus 1ex \@minus .2ex}{-1em}%
{\normalfont\normalsize\bfseries}%
}\makeatother
%\title{{\LARGE Universal Parser (UP)}\\[-8mm]
%\includegraphics[height=8mm]{RUPA}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\includegraphics[height=8mm]{RUPA}}
\title{{\LARGE Natural Language Processing}\\[1.5mm]{\large Assignment 4: Semantics}}
\author{}
\date{} % delete this line to display the current date
%%% BEGIN DOCUMENT
\begin{document}
\definecolor{keywords}{RGB}{255,0,90}
\definecolor{comments}{RGB}{0,0,113}
\definecolor{red}{RGB}{160,0,0}
\definecolor{green}{RGB}{0,150,0}
\definecolor{UUlight}{RGB}{230,230,230}
\definecolor{UUmedium}{RGB}{190,190,190}
\definecolor{UUdark}{RGB}{130,130,130}
\definecolor{UUred}{RGB}{153,0,0}
\lstset{language=Python,
basicstyle=\ttfamily\small,
keywordstyle=\color{keywords},
commentstyle=\color{comments},
stringstyle=\color{red},
showstringspaces=false,
identifierstyle=\color{green},
procnamekeys={def,class}}
\maketitle
%\tableofcontents
%\vspace{3mm}
\section{Introduction}
\noindent This assignment involves material from lectures 11 and 12. You should
have watched the relevant videos, read the relevant chapters in the textbook and
made a serious attempt at completing the relevant labs before you attempt this
assignment. If you feel you have done that and still find the instructions
unclear, you are welcome to email the course teachers and/or go to office hours
to ask for help. The assignment is split into 2 sections, one about lexical
semantics, one about semantic role labelling. Each section is worth equally
much. We expect between half a page and a page for each section. If you are
interested in receiving a VG grade, you may complete the \textit{optional}
exercise available to you, which is related to WordNet - a popular lexical
semantic resource. Note that the exercise must be completed in full in order to
receive VG credit. Please do not submit more than 5 pages overall. Your answers
for each section should be self-contained.
\newsec{Lexical Semantics: Error analysis}%
In this exercise you will perform an error analysis which is based on
the results you got from training the classifiers in Lab 11. In Lab
11, you have already analyzed the performance of a classifer. However,
in addition to that it is useful to perform an error analysis. By
providing additional arguments to {\tt wsd\_classifier()}, you can get
a confusion matrix as well as a printout of all the errors.
\begin{center}
\fbox{
\scalebox{0.55}{
\lstinputlisting{code/error.py}
}}
\end{center}
The relevant arguments are {\tt confusion\_matrix=True} and {\tt
log=True}, but note that you must also insert the argument {\tt
distance=3} to get the right interpretation. The errors are printed
to an external file called {\tt errors.txt}. Do an error analysis of
the \textbf{best performing classifier for each target word}, answering the
following questions:
\begin{enumerate}
\item Using the confusion matrix, identify which sense is the hardest
one for the model to estimate. Look in {\tt errors.txt} for examples
where that hardest word sense is the correct label. Do you see any
patterns or systematic errors? If so, can you think of a way to
adapt the feature representation to improve the model?
\textcolor{UUred}{[ca. 1/2 page of discussion]}
\item Get familiar with the concept of Precision, Recall and F-Measure
(coursebook, Chapter 13.5.3) and calculate them based on the
contingency table of the best performing classifier for one of the
words. Then, discuss the (possible) advantages of Precision, Recall
and F-measure as opposed to Accuracy when you want to improve the
performance of a tool (in our case: a classifier)
\textcolor{UUred}{[ca. 1/2 page]}
\end{enumerate}
\newsec{Semantic Role Labeling}%
In Lab 12, you annotated some data and discussed with your classmates.
Describe this experience and discuss inter-annotator agreement. Give
examples of cases where agreement was good and where it was bad. Look
up measures of inter-annotator agreement on the internet and calculate
one of them. \\\textcolor{UUred}{[at least 1 page for this section]}
%\todo[inline]{Can we do that? :D} \todo[inline,
%color=yellow!40]{FC: giving examples is maybe not sufficient? We may
% wanna ask them to reflect on the examples, give arguments for their
% reasoning behind cases which led to disagreements?}
\clearpage
\section{VG: WordNet}
Formally define the following relations: \textit{polysemy, hyperonomy, hyponymy,
holonymy, meronymy,} and \textit{troponomy}. Familiarize yourself with the NLTK WordNet interface\footnote{https://www.nltk.org/howto/wordnet.html} and demonstrate an example of each relation as attested in WordNet (using different Synsets). Consider the follow abstract concepts: “ambiguity”, “abstraction”, “understanding”. Do the WordNet entries of these concepts (e.g. their hypernyms and hyponyms) align with your own categorization? Are these better organised than some more “concrete” concepts, e.g. “tree”, “paper”, or “pillow”?
\section{Grading Criteria}
To pass the assignment, you must meet all the basic criteria on all
subparts of the assignment. To get VG, you must in addition meet some
of the additional criteria for most of subparts.
\begin{titlemize}{Basic Criteria}
\item Answers are given in understandable English.
\item Answers are stated clearly and coherently.
\item Answers are essentially correct.
\end{titlemize}
\begin{titlemize}{Additional Criteria}
\item Answers are well motivated.
\item Answers are well illustrated.
\item Answers reveal extensive knowledge of the textbook
chapter(s).
\end{titlemize}
% \begin{titlemize}{Points}
% \item 1-4: poor on each of the basic criteria
% \item 5-6: fair on each of the basic criteria
% \item 7: at least fair on each of the basic criteria and fair on some of
% the additional criteria
% \item 8: at least good on each of the basic criteria and at least fair on
% each of the additional criteria
% \item 9: at least good on each of the basic criteria and at least fair and
% most times good or excellent on each of tem the additional criteria
% \item 10: good or excellent on each of the basic and additional criteria.
% \end{titlemize}
\section{Submit the Assignment}
%\noindent
Please submit your assignment though studium as a PDF file without identifying
information (i.e., do not include your name in the report or the file name). It
should follow the style and margins given in the example submission, even if
not created with LaTeX. The deadlines can be found in studium.
%Submit your assignment as a pdf file named
%firstname\_lastname\_assignment\_4.pdf. It should follow the style and
%margins given in the example submission even if not created with
%LaTeX. See deadline on studentportalen.
% \todo[inline, color=blue!40]{FC: we should decide whether or not we
% want them to submit in LaTeX}
% \noindent
% Submit your assignment as a pdf file named
% firstname\_lastname\_assignment\_2.pdf. It should follow the style and
% margins given in the example submission even if not created with
% LaTeX. The submission is due on \textit{studentportalen} before Friday
% 15th December 20h00. Later submissions will be considered failed
% submissions and assessed after the final re-submission deadline on
% January 15th. To pass the assignment, you must have answered both
% sections, reached at least 5 points in each section and at least 12
% overall. To get VG, you should obtain at least 8 in each question and
% at least 18 points overall.\todo[inline, color=yellow!40]{FC: This
% sentence is a bit redundant. They cannot get 18 points without
% getting at least 8 points in each question anyway, right? Moreover,
% I changed the wording here from 'questions' to 'sections' in order
% to make it fit with the text in the introduction where we say that
% each \textbf{section} is worth 10 points.} \todo[inline,
% color=blue!40]{FC: maybe say ``must have answered \underbar{all}
% questions?''. Sometimes, the large questions contain sub-questions.}
% \todo[inline, color=blue!40]{FC: I added the re-submission policy here
% and will add it to the webpage as well.}
\end{document}