Elena Cimino; [email protected]; 30 January 2019

ESL Article Acquisition

SUMMARY

English has a complex article system, with definite, indefinite, and zero articles. This system can pose significant acquisition problems for L2 English learners. I have a particular interest in Arabic, having studied it as an undergraduate. Arabic has definite and zero articles, but no indefinite articles. I would like to compare the acquisition of the English article system by L1 Arabic speakers with its acquisition by L1 Spanish speakers (Spanish has definite, indefinite, and zero articles) and possibly L1 Korean speakers (Korean lacks an overt article system).

Because there are many article rules and usages, I will focus on the usage of articles with 10 count and 10 non-count nouns in particular.

DATA

I will be using two corpora: the BuiD Arabic Learner Corpus (BALC, available for download at http://www.buid.ac.ae/balc) and the Pitt English Language Institute Corpus (PELIC, https://github.com/ELI-Data-Mining-Group/Pitt-ELI-Corpus).

Because PELIC has already been in development for some time, I will use it mostly for linguistic analysis. The data component of this project (reading in, cleaning, and processing the texts) will focus on BALC.

BALC comprises 1,865 written texts from Year 12 high school students and first-year university students. Some texts are coded for student corrections (e.g. insertions, deletions). All texts are available as .txt files, and some essays (those written for the Common Educational Proficiency Assessment, or CEPA) are also accompanied by a photo of the handwritten essay in .png format.

Clean-up

BALC:

  • Text files need to be read in, cleaned, and tokenized.
  • The coding of student corrections mentioned above is inconsistent across texts, so it will need to be handled during clean-up.

ANALYSIS

As mentioned, I would like to investigate the acquisition of articles in L2 English, with reference to specific mass and count nouns.
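As a rough sketch of what "with reference to specific nouns" might look like in practice, the snippet below records which article, if any, appears shortly before each occurrence of a target noun. The noun sets are invented examples (the actual 10 count and 10 non-count nouns have not been chosen), the two-token window is an arbitrary assumption, and matching is on exact word forms only, so plurals are missed. A real analysis would likely need part-of-speech tagging or parsing to find noun phrase boundaries.

```python
ARTICLES = {"a", "an", "the"}

# ASSUMPTION: illustrative target nouns only; the real lists are still to be chosen.
COUNT_NOUNS = {"car", "book"}
NONCOUNT_NOUNS = {"water", "information"}


def article_before(tokens, index, window=2):
    """Return the nearest article within `window` tokens before tokens[index].

    Returns "zero" when no article is found. This is a crude heuristic:
    it allows for one intervening modifier but knows nothing about syntax.
    """
    start = max(0, index - window)
    for tok in reversed(tokens[start:index]):
        if tok in ARTICLES:
            return tok
    return "zero"


def article_contexts(tokens, targets):
    """Return (noun, article) pairs for every target noun in the token list."""
    return [
        (tok, article_before(tokens, i))
        for i, tok in enumerate(tokens)
        if tok in targets
    ]


tokens = "i bought a new car and drank water".split()
pairs = article_contexts(tokens, COUNT_NOUNS | NONCOUNT_NOUNS)
print(pairs)  # [('car', 'a'), ('water', 'zero')]
```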

Questions:

  • Is there an influence from the L1?
  • Which articles do L2 speakers have the most difficulty acquiring?
  • Are there specific situations in which L2 speakers are more likely to make mistakes? (form-function)
  • Concerning L1 Arabic speakers, is there a difference in acquisition of articles between the two corpora (BALC and PELIC)?

Hypotheses:

  • Speakers from all L1 groups will overuse the definite article
  • L1 Arabic speakers will have the most difficulty with indefinite articles
  • L1 Korean speakers will be more accurate than L1 Arabic and L1 Spanish speakers in their use of the indefinite article
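One possible first step toward checking these hypotheses is to tabulate how often each article choice occurs per L1 group. The sketch below assumes observations have already been extracted as (L1, article) pairs; the function name `article_rates` and the toy data are invented for illustration and are not results from either corpus. Raw proportions alone cannot show overuse or accuracy, since that requires knowing which article each context actually calls for.

```python
from collections import Counter, defaultdict


def article_rates(observations):
    """Map each L1 group to the proportion of each article choice it produced."""
    counts = defaultdict(Counter)
    for l1, article in observations:
        counts[l1][article] += 1

    rates = {}
    for l1, counter in counts.items():
        total = sum(counter.values())
        rates[l1] = {article: n / total for article, n in counter.items()}
    return rates


# Toy, made-up observations purely to show the shape of the input.
toy = [
    ("Arabic", "the"), ("Arabic", "the"), ("Arabic", "zero"), ("Arabic", "a"),
    ("Korean", "zero"), ("Korean", "the"),
]
rates = article_rates(toy)
print(rates["Arabic"]["the"])  # 0.5
```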