Arabic-Learner-Corpus-Considerations

Anthony Verardi | [email protected] | University of Pittsburgh

Project completed 4/24/2020


About the Project

This project explores the contents of the Arabic Learner Corpus (ALC) to assess how they might be applied to Second Language Acquisition/Teaching. The ALC is a collection of written and spoken texts collected from learners of Modern Standard Arabic (MSA) in Saudi Arabia, including both native-speaker learners (learning MSA as a prestige variant) and non-native-speaker learners. The XML files are accompanied by metadata about each participant and each observation of their data.

Directory

Folders

  • Notebooks: Jupyter Notebooks that contain all of the coding and preliminary analysis done for this project
  • Presentation: a short presentation outlining the preliminary findings of this project, available both as a full PowerPoint presentation with voiceover and as .pdf slides
  • Data: samples of the dataset used for this project, namely the first 1000 original XML files (GitHub won't allow me to upload > 1000 files). Note: none of the original XML files have been altered! The cleaning process was done entirely on imported data in my Organization Notebook, leaving the originals untouched.
  • Visualizations: image file copies of all visualizations created over the course of this project
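
For readers who want a first look at the sample XML files before opening the notebooks, here is a minimal sketch that counts which element tags occur across a folder of XML files. It makes no assumptions about the ALC schema. The function name `summarize_xml_tags` is hypothetical and is not taken from the project's notebooks. The folder path is also an assumption, so adjust it to wherever the sample files sit in your clone.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def summarize_xml_tags(data_dir):
    """Count how often each element tag occurs across the .xml files in data_dir.

    Hypothetical helper for a first look at the files; it does not assume
    any particular ALC schema and does not modify the files.
    """
    tag_counts = Counter()
    n_files = 0
    for xml_path in sorted(Path(data_dir).glob("*.xml")):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            # Skip files that are not well-formed instead of aborting the survey.
            continue
        n_files += 1
        # root.iter() yields the root element and all of its descendants.
        tag_counts.update(element.tag for element in root.iter())
    return n_files, tag_counts
```

Calling `summarize_xml_tags("Data")` from the repository root (assuming the folder is named as listed above) returns the number of files parsed and a `Counter` of tag names, which is a quick way to see which metadata fields and text elements the files contain.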

Files

  • .gitignore: a list of filetypes my repository is set to ignore on my local rig
  • final_report.md: the final report for this project containing full analysis and conclusions
  • LICENSE.md: the license under which this project has been made publicly available; you can find a quick overview of the license on this page
  • README.md: the document you are currently reading!
  • progress_report.md: markdown file documenting the development of this project
  • project_plan.md: markdown file containing the original and revised project plans for this work

Licensing

This project is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). This license permits others to share (mirror) and adapt (borrow and alter) this work, provided that they credit the author and do not use it for commercial purposes.

Original corpus credit to:

Alfaifi, A., Atwell, E. and Hedaya, I. (2014). Arabic Learner Corpus (ALC) v2: A New Written and Spoken Corpus of Arabic Learners. In the proceedings of the Learner Corpus Studies in Asia and the World (LCSAW) 2014, 31 May - 01 Jun 2014. Kobe, Japan. http://www.arabiclearnercorpus.com.

Have a comment? Visit my guest book here!
