-
id: 2
title: "Think Global, Write Local -- Patterns of Writing Dialect on SNS"
shorttitle: "Patterns of Writing Dialect on SNS"
title_utf8: "Think Global, Write Local – Patterns of Writing Dialect on SNS"
authors: "Aivars Glaznieks"
shortauthors: "A.Glaznieks"
authors_utf8:
- name: "Glaznieks, Aivars"
affiliation: "Eurac Research, Italy"
keywords: "orthography, dialect writing, Facebook"
abstract: |
Social Network Sites (SNS) claim that they are "on a mission to connect the world". They facilitate communication among people wherever they are located. Consequently, many users of SNS communicate with a broad and heterogeneous group of friends on different occasions and thereby express various aspects of their identities (such as gender, age, ethnic background etc.). One aspect may also be a local identity.
Users of SNS can show their local identity linguistically by using a regional variety. Sometimes, the use of single regionally marked words or sporadic regiolectal spellings is sufficient to identify the regional background of the writer; in other cases entire text messages and conversations appear in dialectal spellings, meaning that the dialect appears as the main variety of the conversation. The extent of dialect use in computer-mediated communication (CMC) may depend on various factors such as the individual dialect skills, the vividness and prestige of the respective dialect in the community, emotional involvement in the given topic, age, gender, the intended recipient, and other factors probably interacting with each other.
The use of regional dialects in written CMC is one reason (amongst others) why language in CMC often differs from the respective standard languages. Since no orthographic rules are usually available for writing in dialect, it is up to the users to represent their dialect in a proper but readable and comprehensible way. Users have to construct their regiolectal language variety on the basis of the orthography of the respective standard language, which usually also allows for variation. One reason for this may be various adequate possibilities to represent a dialect word within a given writing system (e.g. German). Another reason may be the (sometimes very slight) phonetic differences between regionally close dialects that writers want (or do not want) to show up in the dialect respelling. Therefore, dialect respellings are not always coherent (neither with respect to a group of dialect speakers nor with respect to individual writers) but usually appear in various forms. However, unifications of respellings in CMC are described for pidgin languages and also occur in dialectal CMC.
Over the last decade, researchers have started to compile corpora containing different genres of CMC. Such CMC corpora enable a systematic analysis of the way dialect features are reflected in written communication. In my talk, I will focus on patterns of the regional dialect(s) in the <a href="http://www.eurac.edu/didi">DiDi Corpus</a>, a collection of Facebook messages from around 100 South Tyrolean writers. I will provide examples of regional features, analyse the distribution of such features, and discuss challenges of identifying local writings on SNS.
types:
- invitedtalk
-
id: 3
title: "Small vs.~Big Data in Language Research: Challenges and Opportunities"
shorttitle: "Small vs.~Big Data in Language Research"
title_utf8: "Small vs. Big Data in Language Research: Challenges and Opportunities"
authors: 'A.~Seza Do\u{g}ru\"oz'
shortauthors: 'A.S.Do\u{g}ru\"oz'
authors_utf8:
- name: "Doğruöz, A. Seza"
affiliation: "Independent Researcher"
keywords: "social media data, machine learning, small vs. large data sets, multilingualism"
abstract: |
Mobile communication tools and platforms provide various opportunities for users to interact over social media. With the recent developments in computational research and machine learning, it has become possible to analyze large chunks of language-related data automatically and fast. However, these tools are not readily available to handle data in all languages and there are also challenges handling social media data. Even when these issues are resolved, asking the right research question of the right set and amount of data becomes crucially important.
Both qualitative and quantitative methods have attracted respectable researchers in language-related areas of research. When tackling similar research problems, there is a need for both top-down and bottom-up data-based approaches to reach a solution. Sometimes, this solution is hidden under an in-depth analysis of a small data set and sometimes it is revealed only through analyzing and experimenting with large amounts of data. However, in most cases, there is a need for linking the findings of small data sets to understand the bigger picture revealed through patterns in large sets.
Having worked with both small and large language-related data in various forms, I will compare pros and cons of working with both types of data across media and contexts and share my own experiences with highlights and lowlights.
types:
- invitedtalk
-
id: 1
title: "CLARIN Survey of CMC Resources and Tools"
authors: 'Darja Fi\v{s}er'
shortauthors: 'D.Fi\v{s}er'
authors_utf8:
- name: "Fišer, Darja"
affiliation: "Faculty of Arts - University of Ljubljana, Slovenia"
keywords: "CLARIN ERIC, research infrastructure, language resources, NLP tools, computer-mediated communication"
abstract: |
With the growing volume and importance of computer-mediated communication, the need to understand its linguistic and social dimensions, along with the need for CMC-robust language technologies, is on the rise as well. This is reflected in the increasing number of conferences, projects and positions involving analysis of CMC in a wide range of disciplines in Digital Humanities, Social Sciences and Computer Science. As a result, a number of valuable CMC corpora, datasets and tools are being developed but unfortunately, due to non-negligible technical, legal and ethical obstacles, not many are being shared and reused.
Since it is the mission of CLARIN to create and maintain an infrastructure to support the sharing, use and sustainability of language data and tools for researchers in Digital Humanities and Social Sciences, it is our goal to have a good overview of the available resources and tools, to offer support to their developers to overcome the technical, legal and ethical obstacles and deposit them to the CLARIN infrastructure, as well as to the researchers with diverse backgrounds, such as linguistics, media studies, psychology etc., but also to interested parties from the educational, commercial, political, medical and legal sectors of the society who are interested in using them.
The first step in this direction was an <a href="https://www.clarin.eu/event/2017/clarin-plus-workshop-creation-and-use-social-media-resources">interdisciplinary workshop</a> on the creation and use of social media resources, which was organized within the Horizon 2020 CLARIN-PLUS project on 18 and 19 May 2017 in Kaunas, Lithuania. The aims of the workshop were to demonstrate the possibilities of social media resources and natural language processing tools for researchers with a diverse research background and an interest in empirical research of language and social practices in computer-mediated communication, to promote interdisciplinary cooperation possibilities, and to initiate a discussion on the various approaches to social media data collection and processing.
The workshop also served as a platform to conduct a <a href="https://office.clarin.eu/v/CE-2017-1064-Resources-for-computer-mediated-communication.docx">survey</a> of corpora, datasets and tools of computer-mediated communication in the languages spoken in countries that are members and observers of CLARIN ERIC. Apart from identifying the existing resources and tools, our motivation was to establish to what extent they are accessible through the CLARIN infrastructure and how information about them and their accessibility could be further optimized from a user perspective.
In this talk, we will give an overview of the identified corpora, the smaller, more focused datasets and tools that are tailored to processing computer-mediated communication. The focus of the talk will be on the comprehensiveness of the provided metadata, level of availability and accessibility of the identified resources and tools and the degree of their actual or potential inclusion in the CLARIN infrastructure. We will also discuss the simple and long-term possibilities of enriching the current state of the infrastructure and provide guidelines for creating and depositing CMC resources with a CLARIN center.
types:
- invitedtalk
-
id: 2
title: "European Language Ecology and Bilingualism with English on Twitter"
shorttitle: "Language Ecology and Bilingualism with English"
authors: "Steven Coats"
shortauthors: "S.Coats"
authors_utf8:
- name: "Coats, Steven"
affiliation: "University of Oulu, Finland"
keywords: "bilingualism, social media, Twitter, corpus linguistics, quantitative methods"
abstract: |
The present paper deals with Flemish adolescents' informal computer-mediated communication (CMC) in a large corpus (2.9 million tokens) of chat conversations. We analyze deviations from written standard Dutch and possible correlations with the teenagers' gender, age and educational track. The concept of non-standardness is operationalized by means of a wide range of features that serve different purposes, related to the chatspeak maxims of orality, brevity and expressiveness. It will be demonstrated how the different social variables impact on non-standard writing, and, more importantly, how they interact with each other. While the findings for age and education correspond to our expectations (more non-standard markers are used by younger adolescents and students in practice-oriented educational tracks), the results for gender (no significant difference between girls and boys) do not: they call for a more fine-grained analysis of non-standard writing, in which features relating to different chat principles are examined separately.
types:
- paper
- talk
-
id: 4
title: "Reliable Part-of-Speech Tagging of Low-frequency Phenomena in the Social Media Domain"
shorttitle: "Reliable PoS Tagging of Low-frequency Phenomena"
authors: 'Tobias Horsmann, Michael Bei{\ss}wenger and Torsten Zesch'
shortauthors: 'T.Horsmann, M.Bei{\ss}wenger and T.Zesch'
authors_utf8:
- name: "Horsmann, Tobias"
affiliation: "University of Duisburg-Essen, Germany"
- name: "Beißwenger, Michael"
affiliation: "University of Duisburg-Essen, Germany"
- name: "Zesch, Torsten"
affiliation: "University of Duisburg-Essen, Germany"
keywords: "part-of-speech, social media, rare phenomena"
abstract: |
We present a series of experiments to fit a part-of-speech (PoS) tagger to tagging extremely infrequent PoS tags for which we have only a limited amount of training data.
The objective is to implement a tagger that tags this phenomenon with a high degree of correctness in order to be able to use it as a corpus query tool on plain text corpora, so that new instances of this phenomenon can be easily found in plain text.
We focused on avoiding manual annotation as much as possible and experimented with altering the frequency weight of the PoS tag of interest in the small training data set we have.
This approach was compared to adding machine tagged training data in which only the phenomenon of interest is manually corrected.
We find that adding more training data is unavoidable but machine tagging data and hand correcting the tag of interest is sufficient.
Furthermore, the choice of the tagger plays an important role as some taggers are equipped to deal with rare phenomena more adequately than others.
The best trade-off between precision and recall of the phenomenon of interest was achieved by a separation of the tagging into two steps.
An evaluation of this phenomenon-fitted tagger on social media plain-text confirmed that the tagger serves as a useful corpus query tool that retrieves instances of the phenomenon including many unseen ones.
types:
- paper
- talk
-
id: 5
title: "Emoticons as multifunctional and pragmatic Resources: a corpus-based Study on Twitter"
shorttitle: "Emoticons as multifunctional and pragmatic Resources"
authors: "Stefania Spina"
shortauthors: "S.Spina"
authors_utf8:
- name: "Spina, Stefania"
affiliation: "University for Foreigners Perugia, Italy"
keywords: "emoticons, Twitter, mixed-effects models"
abstract: |
Emoticons play an important role in digital written communication: they can serve as markers either of emotions or familiarity, and they can intensify or downgrade the pragmatic force of a text.
The aim of this study is to investigate the use of emoticons in Twitter by Italian users, and to verify, by relying on corpus data and on statistical methodologies, some of the prevailing opinions on the use of emoticons: that they are technically-driven resources, that they are mostly used by young people, and more often by females, and that they are superficial and easy ways of expressing emotions using images instead of words.
A mixed-effects model analysis has shown that the use of emoticons on Twitter is affected by a complex interaction of cultural, technological, situational and sociolinguistic variables.
types:
- paper
- talk
-
id: 8
title: >
"You're trolling because\ldots" -- A Corpus-based Study of Perceived Trolling and Motive Attribution in the Comment Threads of Three British Political Blogs
shorttitle: >
"You're trolling because\ldots"
title_utf8: "“You’re trolling because…” – A Corpus-based Study of Perceived Trolling and Motive Attribution in the Comment Threads of Three British Political Blogs"
authors: "M\\'arton Petyk\\'o"
shortauthors: "M.Petyk\\'o"
authors_utf8:
- name: "Petykó, Márton"
affiliation: "Lancaster University, United Kingdom"
keywords: "troll(ing), motive attribution, blog"
abstract: |
This paper investigates the linguistically marked motives that participants attribute to those they call trolls in 991 comment threads of three British political blogs. The study is concerned with how these motives affect the discursive construction of trolling and trolls. Another goal of the paper is to examine whether the mainly emotional motives ascribed to trolls in the academic literature correspond with those that the participants attribute to the alleged trolls in the analysed threads. The paper identifies five broad motives ascribed to trolls: emotional/mental health-related/social reasons, financial gain, political beliefs, being employed by a political body, and unspecified political affiliation. It also points out that depending on these motives, trolling and trolls are constructed in various ways. Finally, the study argues that participants attribute motives to trolls not only to explain their behaviour but also to insult them.
types:
- paper
- talk
-
id: 9
title: "The \\#Id\\'eo2017 Platform"
shorttitle: "\\#Id\\'eo2017"
title_utf8: "The #Idéo2017 Platform"
authors: "Julien Longhi, Claudia Marinica, Nader Hassine, Abdulhafiz Alkhouli and Boris Borzic"
shortauthors: "J.Longhi, C.Marinica, N.Hassine, A.Alkhouli and B.Borzic"
authors_utf8:
- name: "Longhi, Julien"
affiliation: "Université de Cergy-Pontoise, France"
- name: "Marinica, Claudia"
affiliation: "ETIS - ENSEA UCP CNRS - UMR 8051, France"
- name: "Hassine, Nader"
affiliation: "ETIS, France"
- name: "Alkhouli, Abdulhafiz"
affiliation: "ETIS, France"
- name: "Borzic, Boris"
affiliation: "ETIS, France"
keywords: "NLP for social media, NLP applications, textometry, tweets mining"
abstract: |
The #Idéo2017 platform allows citizens to analyze the tweets of the 11 candidates in the 2017 French presidential election. #Idéo2017 processes the messages of the candidates by creating a corpus in almost real time. By using techniques from linguistics supported by software tools, #Idéo2017 is able to provide the main characteristics of the corpus and of the use of the political lexicon, and allows comparisons between the different candidates.
types:
- paper
- talk
-
id: 12
title: "Modeling Non-Standard Language Use in Adolescents' CMC: The Impact and Interaction of Age, Gender and Education"
shorttitle: "Non-Standard Language Use in Adolescents' CMC"
authors: "Lisa Hilte, Reinhild Vandekerckhove and Walter Daelemans"
shortauthors: "L.Hilte, R.Vandekerckhove and W.Daelemans"
authors_utf8:
- name: "Hilte, Lisa"
affiliation: "University of Antwerp, Belgium"
- name: "Vandekerckhove, Reinhild"
affiliation: "University of Antwerp, Belgium"
- name: "Daelemans, Walter"
affiliation: "University of Antwerp, Belgium"
keywords: "non-standardness, teenage talk, language modeling"
abstract: |
The present paper deals with Flemish adolescents' informal computer-mediated communication (CMC) in a large corpus (2.9 million tokens) of chat conversations. We analyze deviations from written standard Dutch and possible correlations with the teenagers' gender, age and educational track. The concept of non-standardness is operationalized by means of a wide range of features that serve different purposes, related to the chatspeak maxims of orality, brevity and expressiveness. It will be demonstrated how the different social variables impact on non-standard writing, and, more importantly, how they interact with each other. While the findings for age and education correspond to our expectations (more non-standard markers are used by younger adolescents and students in practice-oriented educational tracks), the results for gender (no significant difference between girls and boys) do not: they call for a more fine-grained analysis of non-standard writing, in which features relating to different chat principles are examined separately.
types:
- paper
- talk
-
id: 13
title: "Corpus-Based Analysis of Demonyms in Slovene Twitter"
authors: 'Taja Kuzman and Darja Fi\v{s}er'
shortauthors: 'T.Kuzman and D.Fi\v{s}er'
authors_utf8:
- name: "Kuzman, Taja"
affiliation: "Faculty of Arts - University of Ljubljana, Slovenia"
- name: "Fišer, Darja"
affiliation: "Faculty of Arts - University of Ljubljana, Slovenia"
keywords: "demonyms, nationalities, Twitter, discourse analysis, Slovene"
abstract: |
This paper reports on a corpus-based analysis of demonym mentions in the corpus of Slovene tweets. First, we analyze the frequency of mentions for the demonyms for the inhabitants of the European and G8 countries. Then, we focus on the representation of demonyms for residents of Slovenia’s neighboring countries: Austria, Italy, Hungary and Croatia. The main topic of the tweets mentioning Croatians, Austrians and Italians is sport, whereas Hungarians occur most often in relation to Eurovision. Some economic and political issues are also represented, such as the selling of Slovene companies to foreign firms, the refugee crisis and the arbitration procedure between Slovenia and Croatia. A collocation analysis revealed a highly stereotypical treatment of the neighboring nations and hostility of some Slovene Twitter users to inhabitants of Slovenia’s neighboring countries.
types:
- paper
- talk
-
id: 15
title: "A Comparative Study of Computer-mediated and Spoken Conversations from Pakistani and U.S. English using Multidimensional Analysis"
shorttitle: "Comparative Study of CM and Spoken Conversations"
authors: "Muhammad Shakir and Dagmar Deuber"
shortauthors: "M.Shakir and D.Deuber"
authors_utf8:
- name: "Shakir, Muhammad"
affiliation: "University of Münster, Germany"
- name: "Deuber, Dagmar"
affiliation: "University of Münster, Germany"
keywords: "register variation, multidimensional analysis, World Englishes, conversations"
abstract: |
The present study compares four computer-mediated conversational registers – comments, FB groups, FB status and tweets – and spoken conversations from Pakistani and US English using Biber's Multidimensional Analysis framework on three dimensions of variation, i.e. (i) Interactive versus Descriptive Explanatory Discourse, (ii) Expression of Stance, and (iii) Informational Focus versus 1st Person Narrative. Spoken conversations have a high score on dimension 2, while CM conversations show register and regional variation on dimensions 1 and 3. FB groups are significantly different in both regional varieties, followed by FB status, comments and tweets. Pakistani FB groups discuss self-help related topics, and appear to be slightly interactive and highly informational, while the US ones are interactive and narrative, discussing community and political issues. Pakistani FB status and tweets use English mainly for informational purposes, while the US counterparts have an interactive and personal orientation, indicating a wider functional role of English.
types:
- paper
- talk
-
id: 18
title: "Developing a protocol for collecting data in Higher Education: assessing natural language metadata for a Databank of Oral Teletandem Interactions"
shorttitle: "Developing a protocol for collecting data in Higher Education"
authors: "Paola Leone"
shortauthors: "P.Leone"
authors_utf8:
- name: "Leone, Paola"
affiliation: "Università del Salento, Italy"
keywords: "corpora, data collection, metadata, telecollaboration, language learning, protocol for data collection"
abstract: |
The current study addresses the definition of a protocol for collecting and storing data and for describing a databank (in a simple and generic way). In particular, the transparency of a form aimed at gathering information about the pedagogical context of oral telecollaboration for language learning named Teletandem (TT; Telles, 2006) will be tested before it is spread more widely. To uncover problems in submitting information, the quality and reliability of the data-input triggers have been tested by interviewing professors and language instructors who will be involved in a preliminary phase of Teletandem corpus implementation. The general goals of the study are to enlarge the research group, to increase data and to improve efficiency in data collection.
types:
- paper
- talk
-
id: 19
title: "Investigating Interaction Signs across Genres, Modes and Languages: The Example of OKAY"
shorttitle: "OKAY"
authors: "Laura Herzberg and Angelika Storrer"
shortauthors: "L.Herzberg and A.Storrer"
authors_utf8:
- name: "Herzberg, Laura"
affiliation: "University of Mannheim, Germany"
- name: "Storrer, Angelika"
affiliation: "University of Mannheim, Germany"
keywords: "interaction signs, cross-lingual CMC study, Wikipedia talk pages"
abstract: |
The paper presents results of a case study that compared the usage of OKAY across genre types (Wikipedia articles vs. talk pages), across media (spoken vs. written interaction), and across languages (German vs. French CMC data from Wikipedia talk pages). The cross-genre study builds on the results of Herzberg 2016, who compared the usage of OKAY in German Wikipedia articles with its usage in Wikipedia talk pages. These results also form the basis for comparing the CMC genre of Wikipedia talk pages with occurrences of OKAY in the German spoken language corpus FOLK. Finally, we compared the results on the usage of OKAY in German Wikipedia talk pages with the usage of OKAY in French Wikipedia talk pages. With our case study, we want to demonstrate that it is worthwhile to investigate interaction signs across genres and languages, and to compare the usage in written CMC with the usage in spoken interaction.
types:
- paper
- talk
-
id: 23
title: "The Impact of WhatsApp on Dutch Youths' School Writing"
authors: "Lieke Verheijen and Wilbert Spooren"
shortauthors: "L.Verheijen and W.Spooren"
authors_utf8:
- name: "Verheijen, Lieke"
affiliation: "Radboud University, Netherlands"
- name: "Spooren, Wilbert"
affiliation: "Radboud University, Netherlands"
keywords: "social media, WhatsApp, school writing, literacy"
abstract: |
Today’s youths are continuously engaged with social media. The informal language they use in computer-mediated communication (CMC) often deviates from spelling and grammar rules of the standard language. Therefore, parents and teachers fear that social media have a negative impact on youths’ literacy skills. This paper examines whether such worries are justifiable. An experimental study was conducted with 500 Dutch youths of different educational levels and age groups, to find out if social media affect their productive or perceptive writing skills. We measured whether chatting via WhatsApp directly impacts the writing quality of Dutch youths’ narratives or their ability to detect ‘spelling errors’ (deviations from Standard Dutch) in grammaticality judgement tasks. The use of WhatsApp turned out to have no short-term effects on participants’ performances on either of the writing tasks. Thus, the present study gives no cause for great concern about any impact of WhatsApp on youths’ school writing.
types:
- paper
- talk
-
id: 26
title: "Anonymisation of the Dortmund Chat Corpus 2.1"
authors: 'Harald L\"ungen, Michael Bei{\ss}wenger, Laura Herzberg and Cathrin Pichler'
shortauthors: 'H.L\"ungen, M.Bei{\ss}wenger, L.Herzberg and C.Pichler'
authors_utf8:
- name: "Lüngen, Harald"
affiliation: "Institute for the German Language, Germany"
- name: "Beißwenger, Michael"
affiliation: "University of Duisburg-Essen, Germany"
- name: "Herzberg, Laura"
affiliation: "University of Mannheim, Germany"
- name: "Pichler, Cathrin"
affiliation: "University of Duisburg-Essen, Germany"
keywords: "corpora, anonymisation"
abstract: |
As a consequence of a recent curation project, the Dortmund Chat Corpus is available in CLARIN-D research infrastructures for download and querying. A legal opinion had recommended that standard measures of anonymisation be applied to the corpus before it could be republished. This paper reports on the anonymisation campaign that was conducted for the corpus. Anonymisation has been realised as categorisation; the taxonomy of anonymisation categories applied is introduced, and the method of applying it to the TEI files is demonstrated. The results of the anonymisation campaign as well as issues of quality management are discussed. Finally, pseudonymisation as an alternative to categorisation is discussed in general as a method of the anonymisation of CMC data, as well as possibilities of a (partial) automatisation of the process.
types:
- paper
- talk
-
id: 27
title: "Fear and Loathing on Twitter: Attitudes towards Language"
authors: 'Damjan Popi\v{c} and Darja Fi\v{s}er'
shortauthors: 'D.Popi\v{c} and D.Fi\v{s}er'
authors_utf8:
- name: "Popič, Damjan"
affiliation: "Faculty of Arts - University of Ljubljana, Slovenia"
- name: "Fišer, Darja"
affiliation: "Faculty of Arts - University of Ljubljana, Slovenia"
keywords: "orthography, (linguistic) prestige"
abstract: |
The paper deals with the sociolinguistic concept of prestige imbued in the notion of standard language, and the social status connected to the inherent language skill (or lack thereof). To this end, we analyse Slovenian tweets pertaining to language use and the (in-)correctness of other users’ use of language, and propose a typology, especially for cases where language use is used as an argument against someone’s qualifications or beliefs.
types:
- paper
- talk
-
id: 28
title: "Connecting Resources: Which Issues Have to be Solved to Integrate CMC Corpora from Heterogeneous Sources and for Different Languages?"
shorttitle: "Connecting Resources"
authors: >
Michael Bei{\ss}wenger, Ciara Wigham, Carole Etienne, Holger Grumt Su\'arez, Laura Herzberg, Darja Fi\v{s}er, Erhard Hinrichs, Tobias Horsmann, Natali Karlova-Bourbonus, Lothar Lemnitzer, Julien Longhi, Harald L\"ungen, Lydia-Mai Ho-Dac, Christophe Parisse, C\'eline Poudat, Thomas Schmidt, Egon Stemle, Angelika Storrer and Torsten Zesch
shortauthors: 'M.Bei{\ss}wenger, C.Wigham and colleagues'
authors_utf8:
- name: "Beißwenger, Michael"
affiliation: "University of Duisburg-Essen, Germany"
- name: "Wigham, Ciara"
affiliation: "Université Clermont Auvergne, France"
- name: "Etienne, Carole"
affiliation: "ICAR Laboratory Lyon, France"
- name: "Grumt Suárez, Holger"
affiliation: "Justus-Liebig-Universität Gießen, Germany"
- name: "Herzberg, Laura"
affiliation: "University of Mannheim, Germany"
- name: "Fišer, Darja"
affiliation: "Faculty of Arts - University of Ljubljana, Slovenia"
- name: "Hinrichs, Erhard"
affiliation: "Eberhard-Karls-Universität Tübingen, Germany"
- name: "Horsmann, Tobias"
affiliation: "University of Duisburg-Essen, Germany"
- name: "Karlova-Bourbonus, Natali"
affiliation: "Justus-Liebig-Universität Gießen, Germany"
- name: "Lemnitzer, Lothar"
affiliation: "Berlin-Brandenburg Academy of Sciences, Germany"
- name: "Longhi, Julien"
affiliation: "Université de Cergy-Pontoise, France"
- name: "Lüngen, Harald"
affiliation: "Institute for the German Language, Germany"
- name: "Ho-Dac, Lydia-Mai"
affiliation: "Université Toulouse 2, France"
- name: "Parisse, Christophe"
affiliation: "Université Paris Nanterre, France"
- name: "Poudat, Céline"
affiliation: "Université Nice Côte d’Azur, France"
- name: "Schmidt, Thomas"
affiliation: "Institute for the German Language, Germany"
- name: "Stemle, Egon W."
affiliation: "Eurac Research, Italy"
- name: "Storrer, Angelika"
affiliation: "University of Mannheim, Germany"
- name: "Zesch, Torsten"
affiliation: "University of Duisburg-Essen, Germany"
keywords: "corpora, research infrastructures, annotation, anonymisation"
abstract: |
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability – with respect to the structural representation of CMC genres, linguistic annotations, metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to partake in a discussion about standards for CMC corpora and for the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects which are currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
types:
- paper
- talk
-
id: 14
title: "MoCoDa 2: Creating a Database and Web Frontend for the Repeated Collection of Mobile Communication (WhatsApp, SMS \\& Co)"
shorttitle: "MoCoDa 2"
title_utf8: "MoCoDa 2: Creating a Database and Web Frontend for the Repeated Collection of Mobile Communication (WhatsApp, SMS & Co)"
authors: 'Michael Bei{\ss}wenger, Marcel Fladrich, Wolfgang Imo and Evelyn Ziegler'
shortauthors: 'M.Bei{\ss}wenger, M.Fladrich, W.Imo and E.Ziegler'
authors_utf8:
- name: "Beißwenger, Michael"
affiliation: "University of Duisburg-Essen, Germany"
- name: "Fladrich, Marcel"
affiliation: "University of Münster, Germany"
- name: "Imo, Wolfgang"
affiliation: "University of Halle-Wittenberg, Germany"
- name: "Ziegler, Evelyn"
affiliation: "University of Duisburg-Essen, Germany"
keywords: "corpora, collection strategies, WhatsApp, annotation"
abstract: |
The poster reports on intermediate results of MoCoDa 2, an ongoing project funded by the Ministry for Innovation, Science, Research and Technology of the German federal state of North Rhine-Westphalia, in which we are developing a database and web frontend for the repeated, donation-based collection of CMC interactions from smartphone messaging apps like WhatsApp. The database is intended to serve as a resource not only for quantitative but also for qualitative approaches in the analysis of CMC. MoCoDa 2 builds on experiences from the preceding project MoCoDa, which has collected a (relatively small) set of 2,198 interactions with 19,161 user posts or ca. 193,000 tokens since 2012. For MoCoDa 2 the database and web frontend will be re-implemented from scratch and expanded with additional functions and features:
- A form for donating and editing the data, which involves the donors in the editing and anonymization process and assists them with capturing metadata on the context and topic of the donated sequences as well as on the interlocutors and their social relations. Anonymization will follow an anonymization guideline developed in the CLARIN-D curation project ChatCorpus2CLARIN.
- Part-of-speech annotations which comply with the extended ‘STTS 2.0’ tagset for German CMC and which will be created using a toolchain provided by the Language Technology Lab (LTL) at the University of Duisburg-Essen.
- A TEI export for the collected data on the basis of the ‘CLARIN-D TEI schema for CMC’.
By adopting the STTS 2.0 tagset and a TEI-based export format, the corpus data will be interoperable with corpora that are already part of the CLARIN-D corpus infrastructure at the Institute for the German Language (IDS) in Mannheim. To allow for comparative analyses of the MoCoDa 2 data with the discourse found in text corpora and in other CMC corpora, MoCoDa 2 will not only be made available as a standalone resource but also be integrated into the German Reference Corpus (DeReKo) at the IDS Mannheim.
types:
- paper
- poster
-
id: 21
title: "The graphic realization of /l/-vocalization in Swiss German WhatsApp messages"
shorttitle: "/l/-vocalization in Swiss German"
authors: "Simone Ueberwasser"
shortauthors: "S.Ueberwasser"
authors_utf8:
- name: "Ueberwasser, Simone"
affiliation: "University of Zurich, Switzerland"
keywords: "Swiss German dialects, /l/-vocalization, WhatsApp: dialect use in"
abstract: |
/l/-vocalization is a feature normally found in a rather clear-cut area in the western part of Switzerland. Its geographical boundaries are well documented, as are social influences on the realisation of the variants. This study, based on a corpus of authentic WhatsApp messages, takes another approach by documenting isolated forms of /l/-vocalization outside the area traditionally attributed to the feature.
types:
- paper
- poster
-
id: 25
title: "Public Service News on Facebook: Exploring Journalistic Usage Patterns and Reaction Data"
shorttitle: "Public Service News on Facebook"
authors: "Daniel Pfurtscheller"
shortauthors: "D.Pfurtscheller"
authors_utf8:
- name: "Pfurtscheller, Daniel"
affiliation: "Universität Innsbruck, Austria"
keywords: "Facebook, social media interaction, public service news, metadata, media linguistics"
abstract: |
As social networking sites have become staples in everyday life, an increasing number of people worldwide use social media as a source of news. To reach these audiences, news organisations and public service broadcasters have ventured onto services such as Facebook, which in terms of news is by far the most important social networking site in many parts of Europe. This poster presentation explores the ways in which public service media from different European countries are delivering news on public Facebook Pages. The analysis is based on public data gathered from different Facebook Pages operated by national broadcasting agencies. The data are extracted using the public Facebook Graph API. The corpus contains all the posts and comments of the Facebook Pages as well as related metadata. No personally identifiable information is collected.
The social media data are explored using statistical research methods to identify and compare different usage patterns and to visualise the reactions of Facebook users. This provides an overview of the different forms of content (i.e. types of posts) and the basic communicative practices that can be observed in the context of the Facebook Pages (i.e. number of comments, shares, likes and Reaction types). To allow deeper insights, an exploratory case-study approach is used. Drawing upon media linguistic research, the focus is on the micro level of the media texts and their multimodal design. The in-depth analysis aims to characterise different forms of news reporting via Facebook and looks at the different uses of multimodal resources in the context of the Facebook posts and comments.
This combination of qualitative and quantitative methods should allow a better understanding of how Facebook is used as a means of news distribution by public service media providers on a large scale and how technical affordances shape the design of news content and follow-up interactions. This knowledge is critical for the discussion of the emerging role of social media in the context of public opinion and political decision-making. The poster presents the project as work in progress and shows preliminary findings.
types:
- paper
- poster