GPT Gets Quizzed: A Mixed-Method Study on ChatGPT’s Performance in Answering the SATs

E. Tan, K. Ramos, M. Nazario, S. Lim, and S. Chu (12-36)


Abstract

Artificial intelligence has become integral to everyday societal systems, including modern education, through tools such as OpenAI’s ChatGPT. While past studies have assessed ChatGPT’s performance in various domains, such as law and medicine, research analyzing its efficacy in secondary-school-level subjects remains scarce. This study therefore assessed ChatGPT’s performance on high-school-level linguistics and mathematics questions and compared it with the perceptions of students and professors, providing a more detailed analysis of ChatGPT’s potential as a learning tool. To achieve this, SAT questions were administered to ChatGPT. The investigation found that ChatGPT generally demonstrates greater consistency in linguistics than in mathematics, with varying levels of reliability across distinct SAT subareas. ChatGPT also performed better than at least 50% of high school SAT test takers, with accuracy rates of 59.59% in linguistics and 56.41% in mathematics. Through a survey and interviews, the study further reveals a gap between students’ perceptions of ChatGPT’s performance and its simulated accuracy rate: in linguistics, there was a significant difference between the mean survey-and-interview results and the simulation accuracy, while in mathematics the gap was smaller.