Enhancing Fairness and Efficiency in Subjective Assessment through LLM-Based Automated Grading

Authors

  • A. Sanusi Funmilayo Department of Software Engineering, Babcock University, Ilisan-Remo, Ogun State, Nigeria
  • Lucas-Adebayo Daniel Department of Computer Science and Mathematics, Mountain Top University, Ogun State, Nigeria
  • Fatade Oluwayemisi Boye Department of Computer Science and Mathematics, Mountain Top University, Ogun State, Nigeria
  • Okorie Grace Chinenye Department of Software Engineering, Babcock University, Ilisan-Remo, Ogun State, Nigeria

DOI:

https://doi.org/10.70112/ajcst-2026.15.1.4373

Keywords:

Automated Grading Systems, Large Language Models, Vision-Based Document Analysis, Educational Assessment, Multi-Agent Architectures

Abstract

In recent years, the demand for fairness, speed, and transparency in grading has catalyzed interest in automated systems, particularly for subjective, theory-based assessments. Unlike objective tests, these examinations require nuanced understanding and contextual reasoning, which has traditionally made them dependent on human graders. Human grading, however, is often affected by inconsistency, bias, and fatigue-induced error. This work presents a system that leverages Large Language Models (LLMs) as grading agents to automate the evaluation of handwritten, theory-based exam scripts.

Methods: The methodology employs a modular system architecture in which uploaded scripts are digitized, interpreted using vision-based models, and subsequently graded by domain-specific LLM agents. The system is implemented using FastAPI for the backend, Celery and RabbitMQ for asynchronous task handling, Redis for log streaming and task-status management, and Next.js for the frontend interface. For mathematics scripts, a dedicated Math Agent evaluates responses through context-aware reasoning.

Results: Preliminary evaluation indicates that the system can grade an eight-question script within three minutes, significantly faster than the approximately fifteen minutes required by a human grader. This demonstrates that LLM-based grading systems can scale efficiently while reducing human bias and fatigue.

Discussion and Conclusion: The project provides a foundation for broader integration of LLMs into educational assessment, while acknowledging limitations in current open-source vision models and inference latency. Future improvements may include fine-tuning and offline model support to enhance speed and reliability.
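The pipeline described in the abstract (digitize → interpret → grade by a subject-specific agent) can be sketched in minimal form. Everything below is an illustrative assumption, not the authors' implementation: the function names, the `Answer` dataclass, and the keyword-overlap scorer are hypothetical stand-ins for the paper's vision models and LLM agents.

```python
# Hypothetical sketch of the modular grading pipeline from the abstract.
# The transcriber and the keyword-overlap "Math Agent" are illustrative
# stubs, not the paper's vision-model or LLM-based components.
from dataclasses import dataclass


@dataclass
class Answer:
    question_id: int
    text: str        # transcript a vision model would extract from the scan
    max_marks: int


def transcribe(page_image: bytes) -> str:
    """Stand-in for the vision-based handwriting interpreter."""
    return page_image.decode("utf-8")  # pretend the 'scan' is already text


def math_agent_score(answer: Answer, rubric_keywords: list[str]) -> int:
    """Stand-in for the Math Agent: award marks in proportion to how many
    rubric points the transcribed answer covers."""
    hits = sum(1 for kw in rubric_keywords if kw.lower() in answer.text.lower())
    return round(answer.max_marks * hits / max(len(rubric_keywords), 1))


def grade_script(answers: list[Answer],
                 rubric: dict[int, list[str]]) -> dict[int, int]:
    """Grade every answer on a script; return {question_id: marks awarded}."""
    return {a.question_id: math_agent_score(a, rubric[a.question_id])
            for a in answers}


if __name__ == "__main__":
    script = [Answer(1, transcribe(b"The derivative of x^2 is 2x by the power rule"), 5)]
    rubric = {1: ["derivative", "2x", "power rule"]}
    print(grade_script(script, rubric))  # {1: 5}
```

In the real system each `grade_script` call would run as an asynchronous Celery task behind the FastAPI endpoint, with progress streamed through Redis, so many scripts can be graded concurrently.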



Published

05-01-2026

How to Cite

A. Sanusi Funmilayo, Lucas-Adebayo Daniel, Fatade Oluwayemisi Boye, & Okorie Grace Chinenye. (2026). Enhancing Fairness and Efficiency in Subjective Assessment through LLM-Based Automated Grading. Asian Journal of Computer Science and Technology, 15(1), 1–10. https://doi.org/10.70112/ajcst-2026.15.1.4373


Section

Research Article
