Questions for Dr. Lauren Collister
See this page for instructions: https://naraehan.github.io/Data-Science-for-Linguists-2019/todo#todo7
Add yours below.
- Why does linguist llama want to be a schwa? (Na-Rae the fake student)
- How do we credit the original data source when it is publicly available for non-commercial research purposes, but we alter it to fit our needs for the specific project? Are we allowed to redistribute any of it, or do we just redirect to the original data source posting? (Katie)
- Literary corpora are typically derived works, that is, constructed from literary sources that are not original with the compiler or editor of the corpus. To what extent can the compiler and editor of a corpus legally or ethically control or restrict reuse of the literary materials in that corpus? (David)
- Similarly, movie corpora contain data from previously published work. Is the compiler of the original corpus restricted in what they can include or how much they can redistribute by copyright/licensing laws protecting the movies' creators? To what extent? Does it vary based on what exactly the corpus contains (i.e., samples vs. full scripts)? (Cassie)
- If I'm using someone else's corpus which they've licensed for public use but also contains another person's previously published data, how do I know that the corpus creator is following copyright/licensing laws? (Cassie)
- What are some ways to ensure research participants' anonymity and safety when publishing results or data that include inherently personal information, e.g., demographics or sociolinguistic background? (Eva)
- Two of "The Four Factors" of fair use are "The nature of the copyrighted work" and "The amount and substantiality of the portion used in relation to the copyrighted work as a whole." However, the concepts of "nature", "amount", and "substantiality" can vary widely depending on the subject at hand. Who gets to determine these definitions? And do you have any general guidelines about what constitutes a "substantial" amount? (John)
- If I'm using a corpus of student essays and it contains possibly identifying information of some users (e.g. name, school, etc.), is it my duty to anonymize or remove that information? Or should I assume that the corpus authors obtained the users' permission to publish that information? (Elena)
- We can easily access a wide variety of linguistic data and corpora on the Internet. How can we know which data is legal and legitimate to use? Which data should be retained, shared, and/or preserved? Are there any required restrictions on data sharing? (Ting-Wei)
- If a copyright claim expires, but then the copyright law is changed to extend the duration of copyright claims, is the expired claim automatically renewed? Similarly, if the duration of all copyright claims is shortened, thus causing some to expire, do the owners of the claim get a grace period to renew? Or do the protections expire immediately when the law changes? (Matt)
- If I'm using a corpus or data that was originally created outside of the US, would fair use guidelines not apply and/or be different from those in the United States? If so, where can I find information on copyright restrictions and guidelines for other countries? (Patrick)
- If a corpus was published in a different country, are there licensing laws that transfer differently? (Goldie)