Students who pass off ChatGPT-generated work as their own fail to benefit from learning opportunities and do not contribute their own work. As a result, education systems could suffer, and society could lose sight of the original purpose of education. We aim to compare the academic performance of students and ChatGPT by automatically grading student-written and ChatGPT-generated essays with a machine-learning model, and to answer the following question: is it worthwhile for a student to use ChatGPT, knowing that they could face academic penalties?
Kaggle (6230 essays)
- Total of 8 essay prompts
- Excluded 4 prompts because they required source texts we did not have access to
- 4 prompts remained (1, 2, 7 and 8), for a total of 6230 essays; a loading sketch follows this list
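A minimal loading sketch in Python, assuming the standard Kaggle ASAP release (the `training_set_rel3.tsv` file and its `essay_set` column); adjust the names to your local copy:

```python
import pandas as pd

# Assumed file/column names from the standard Kaggle ASAP release.
essays = pd.read_csv("training_set_rel3.tsv", sep="\t", encoding="latin-1")

# Prompts 3-6 depend on source texts we don't have, so keep only
# the independent prompts 1, 2, 7 and 8.
essays = essays[essays["essay_set"].isin([1, 2, 7, 8])]
print(len(essays))  # expected: 6230
```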
ChatGPT (609 essays)
- Used the OpenAI API to generate essays
- Chose the “text-curie-001” model for its balance of capability, speed, and low cost; a generation sketch follows this list
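A sketch of the generation step, assuming the legacy completions endpoint of the `openai` Python library (pre-1.0); the prompt wording and sampling parameters here are illustrative assumptions, not the project's exact settings:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

def generate_essay(prompt_text: str) -> str:
    """Generate one essay for a given prompt with text-curie-001."""
    response = openai.Completion.create(
        model="text-curie-001",
        prompt=f"Write a student essay in response to: {prompt_text}",
        max_tokens=700,   # roughly a few hundred words
        temperature=0.9,  # some variety across the 609 generated essays
    )
    return response["choices"][0]["text"].strip()
```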
Long Short-Term Memory (LSTM) Network
The model used mean squared error (MSE) loss; an architecture sketch follows the figures below
- Loss: 0.0261
- MAE: 0.127
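One plausible shape for such a grader, sketched in Keras; the vocabulary size, embedding dimension, and layer widths are assumptions, and the sigmoid output assumes essay scores normalized to [0, 1], which is consistent with the small loss values above:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMBED_DIM = 20_000, 128  # assumed hyperparameters

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # token ids -> vectors
    layers.LSTM(64, dropout=0.3),             # sequence -> essay encoding
    layers.Dense(1, activation="sigmoid"),    # assumes scores scaled to [0, 1]
])

# MSE loss with MAE tracked as a metric, matching the figures above.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```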
- Comparing the score distributions of the two essay sources, the ChatGPT scores are right-skewed while the human-written scores are left-skewed, which may indicate underlying issues with the data. By the central limit theorem, however, a larger sample size would reduce the impact of this skewness and bring the sample means of the two distributions closer together (a quick skewness check is sketched after this list).
- Because the ChatGPT scores form a narrower distribution with fewer extreme values than the human-written essays, ChatGPT appears more capable of producing average essays and less likely to produce very poor or very strong ones.
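The skew claims above can be quantified with `scipy.stats.skew`; the arrays below are synthetic placeholders standing in for the model's predicted scores for each essay source:

```python
import numpy as np
from scipy.stats import skew

# Synthetic stand-ins for the predicted scores of each essay source.
rng = np.random.default_rng(0)
human_scores = rng.beta(5, 2, size=6230)   # left-skewed: skew() < 0
chatgpt_scores = rng.beta(2, 5, size=609)  # right-skewed: skew() > 0

print(f"human skew:   {skew(human_scores):+.3f}")
print(f"chatgpt skew: {skew(chatgpt_scores):+.3f}")
```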