This repository contains the code, data and results for our paper:
GPT Detectors Are Biased Against Non-Native English Writers
Weixin Liang*, Mert Yuksekgonul*, Yining Mao*, Eric Wu*, James Zou
arXiv: 2304.02819
@article{Liang2023GPTDA,
  title={GPT detectors are biased against non-native English writers},
  author={Weixin Liang and Mert Yuksekgonul and Yining Mao and Eric Wu and James Y. Zou},
  journal={Patterns},
  year={2023},
  volume={4},
  url={https://api.semanticscholar.org/CorpusID:257985499}
}

@article{liang2023gpt,
  title={GPT detectors are biased against non-native English writers},
  author={Weixin Liang and Mert Yuksekgonul and Yining Mao and Eric Wu and James Zou},
  year={2023},
  eprint={2304.02819},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
The rapid adoption of generative language models has brought about substantial advancements in digital communication, while simultaneously raising concerns regarding the potential misuse of AI-generated content. Although numerous detection methods have been proposed to differentiate between AI and human-generated content, the fairness and robustness of these detectors remain underexplored. In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers. Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions. Our results call for a broader conversation about the ethical implications of deploying ChatGPT content detectors and caution against their use in evaluative or educational settings, particularly when they may inadvertently penalize or exclude non-native English speakers from the global discourse.
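The "simple prompting strategies" referenced above amount to asking a language model to rewrite a draft (for example, to enrich its word choice) before it is run through a detector. The sketch below illustrates that general shape with the OpenAI Python client; the model name and prompt wording are placeholders, not the exact prompts or settings used in the study.

import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rewrite_draft(text: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask a language model to rewrite a draft with richer word choice.

    Illustrative only: the prompt and model below are placeholders,
    not the exact configuration used in the paper's experiments.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": (
                    "Enhance the word choices in the following text so that "
                    "it reads like fluent, natural English:\n\n" + text
                ),
            }
        ],
    )
    return response.choices[0].message.content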
Figure 1: Bias in GPT detectors against non-native English writing samples.
Figure 2: Simple prompts effectively bypass GPT detectors.
.
├── README.md
├── Data_and_Results/
│   ├── Human_Data/
│   │   ├── TOEFL_real_91/
│   │   │   ├── name.json
│   │   │   ├── data.json
│   │   │   └── [GPT Detector Name].json
│   │   ├── TOEFL_gpt4polished_91/
│   │   │   └── ...
│   │   ├── CollegeEssay_real_70/
│   │   ├── CS224N_real_145/
│   │   ├── HewlettStudentEssay_real_88/
│   │   └── HewlettStudentEssay_GPTsimplify_88/
│   └── GPT_Data/
│       ├── CollegeEssay_gpt3_31/
│       ├── CollegeEssay_gpt3PromptEng_31/
│       ├── CS224N_gpt3_145/
│       └── CS224N_gpt3PromptEng_145/
└── Code/
The Data_and_Results folder contains the human-written and AI-generated datasets used in our study, together with the detection results of the GPT detectors we evaluated. Each dataset subfolder contains a name.json file, which provides the metadata, a data.json file, which contains the text samples, and one [GPT Detector Name].json file per detector, which contains that detector's outputs.
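As a rough illustration, the snippet below loads one dataset folder and gathers its per-detector result files. The internal layout of the JSON files (e.g., whether data.json is a list or a dict) is an assumption here and should be checked against the actual files in the repository.

import json
from pathlib import Path

# Minimal sketch for loading a single dataset folder, e.g. TOEFL_real_91.
folder = Path("Data_and_Results/Human_Data/TOEFL_real_91")

metadata = json.loads((folder / "name.json").read_text())   # dataset metadata
samples = json.loads((folder / "data.json").read_text())     # text samples

# Every remaining JSON file is assumed to hold one detector's results.
detector_results = {
    p.stem: json.loads(p.read_text())
    for p in folder.glob("*.json")
    if p.name not in {"name.json", "data.json"}
}

print(metadata)
print(f"Loaded {len(samples)} samples; detectors: {sorted(detector_results)}")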