Computer systems for automatically grading essays (i.e. free-text responses to examination questions) have been under development since the 1960s. While the earliest systems, such as Project Essay Grade (PEG) (Page, 1966), were based on rather simple measures, including the total number of words and the average length of words in an essay, state-of-the-art assessment systems use more sophisticated techniques.
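
As a rough illustration of such surface measures, the following Python sketch computes the two features named above. It is not PEG's actual implementation; the function name `surface_features` and the tokenization rule (runs of letters count as words) are assumptions made for this example.

```python
import re

def surface_features(essay: str) -> dict:
    """Compute two simple surface measures of an essay."""
    # Assumption: a "word" is any run of letters; digits and punctuation are ignored.
    words = re.findall(r"[^\W\d_]+", essay)
    total = len(words)
    avg_len = sum(len(w) for w in words) / total if total else 0.0
    return {"word_count": total, "avg_word_length": avg_len}

features = surface_features("The cat sat on the mat.")
print(features)
```

Measures of this kind say nothing about what the essay actually argues, which is one reason later systems moved to content-based techniques.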

Automatic essay assessment offers an interesting application area for Natural Language Processing (NLP) and Information Retrieval (IR) techniques. Automatic essay grading is closely related to automatic text categorization. Many methods suitable for other IR tasks have been found applicable to text categorization. For example, support vector machines and dimensionality reduction methods such as Latent Semantic Analysis (LSA) have been successfully applied to the problem. Moreover, NLP techniques, such as automatic summarization and the detection of rhetorical structure and writing style, are applied in state-of-the-art assessment systems.

Automatic essay grading

[Figure: The architecture of AEA.]

The Automatic Essay Assessor (AEA) system, developed at the University of Joensuu, Finland, originates from Tuomo Kakkonen's master's thesis (Kakkonen, 2003). AEA grades essays automatically by comparing them to learning materials and human-graded essays. The learning materials form the basis for determining the amount of relevant content in an essay. To obtain grades that correspond to those given by human assessors, it is crucial to train the system with human-graded essays. The grading model of AEA can be calibrated using as few as thirty essays.
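
The text does not spell out how content similarity is mapped to grades, so the following Python sketch is only one possible calibration under stated assumptions: each essay is reduced to a single similarity score against the learning materials, and a least-squares line fitted to human-graded essays converts that score into a grade. The function names, the 0 to 5 grade scale and the training data are all hypothetical.

```python
def fit_line(similarities, grades):
    """Least-squares fit of grade = slope * similarity + intercept."""
    n = len(similarities)
    mean_x = sum(similarities) / n
    mean_y = sum(grades) / n
    sxx = sum((x - mean_x) ** 2 for x in similarities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(similarities, grades))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_grade(similarity, slope, intercept, min_grade=0, max_grade=5):
    """Map a similarity score to the nearest grade, clamped to the scale."""
    raw = slope * similarity + intercept
    return max(min_grade, min(max_grade, round(raw)))

# Hypothetical calibration set: similarity to the learning materials
# paired with the grade a human assessor gave the same essay.
train_similarities = [0.1, 0.3, 0.5, 0.7, 0.9]
train_grades = [1, 2, 3, 4, 5]

slope, intercept = fit_line(train_similarities, train_grades)
grade = predict_grade(0.62, slope, intercept)
print(grade)
```

A calibration step of this kind needs only a small set of graded essays, which is consistent with the figure of about thirty essays mentioned above, although the real system may use a different model.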

Document comparison methods

The core of AEA is the document comparison component, which uses three dimensionality reduction methods, LSA, Probabilistic LSA (PLSA) and Latent Dirichlet Allocation (LDA), for comparing essays and learning materials (Kakkonen, Myller, Sutinen & Timonen, in press).

Dimensionality reduction refers to the process in which individual words in the language model are given weights according to their significance for the topic. The aim of the dimensionality reduction step is to trim down noise and unimportant details in the data so that the underlying semantic structure becomes evident and can be used to compare essays with learning materials.
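
To make the comparison step concrete, here is a minimal LSA sketch in Python using NumPy. It is not the AEA implementation: it covers LSA only (not PLSA or LDA), uses raw word counts without any term weighting, and the helper names (`fit_lsa`, `project`, `cosine`) and the toy texts are assumptions made for illustration. The learning materials are decomposed with a truncated singular value decomposition, and an essay is then folded into the reduced space and compared with a learning-material passage by cosine similarity.

```python
import numpy as np

def count_vector(text, vocab):
    """Bag-of-words count vector for `text` over a fixed vocabulary."""
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

def fit_lsa(documents, k):
    """Truncated SVD of the term-document matrix, keeping k dimensions."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    # Rows are terms, columns are documents.
    matrix = np.column_stack([count_vector(doc, vocab) for doc in documents])
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return vocab, u[:, :k], s[:k]

def project(text, vocab, u_k, s_k):
    """Fold a text into the reduced space: Sigma_k^-1 * U_k^T * d."""
    return (u_k.T @ count_vector(text, vocab)) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy learning materials on two unrelated topics (made up for illustration).
materials = [
    "plants use sunlight to make sugar",
    "plants need sunlight and water",
    "the revolution began in france",
    "the revolution ended the monarchy in france",
]
vocab, u_k, s_k = fit_lsa(materials, k=2)

# Compare two essays with the first learning-material passage.
reference = project(materials[0], vocab, u_k, s_k)
on_topic = cosine(project("plants make sugar from sunlight", vocab, u_k, s_k), reference)
off_topic = cosine(project("the monarchy fell in the revolution", vocab, u_k, s_k), reference)
print(on_topic, off_topic)
```

With two retained dimensions the toy corpus collapses to roughly one dimension per topic, so the essay about plants lands close to the reference passage while the essay about the revolution does not. The number of dimensions `k` must not exceed the rank of the term-document matrix, otherwise the fold-in divides by a zero singular value.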

In addition to applying these models in AEA, we have enhanced them in several ways, for example by automating the dimensionality selection in LSA (Timonen, 2005; Kakkonen, Sutinen & Timonen, 2005) and by applying part-of-speech tagging to enrich the information in LSA models (Kakkonen, Myller & Sutinen, 2006).
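
The general idea of validation-based dimensionality selection can be sketched as follows. This is an assumption-laden illustration, not the procedure from the cited publications: each candidate dimension is scored on held-out human-graded essays, and the dimension with the best score is kept. The function names, the exact-agreement metric and the numbers are hypothetical; in practice `evaluate` would retrain the LSA model with the given dimension and grade the validation essays.

```python
def exact_agreement(predicted, human):
    """Fraction of essays on which the predicted grade equals the human grade."""
    matches = sum(1 for p, h in zip(predicted, human) if p == h)
    return matches / len(human)

def select_dimension(candidates, evaluate):
    """Return (k, score) for the candidate dimension with the best validation score."""
    best_k, best_score = None, float("-inf")
    for k in candidates:
        score = evaluate(k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Made-up validation data: human grades and, for each candidate dimension,
# the grades a hypothetical LSA model with that dimension would assign.
human_grades = [1, 3, 4, 2, 5]
predictions_by_k = {
    10: [1, 2, 4, 2, 3],
    50: [1, 3, 4, 2, 4],
    100: [2, 3, 3, 2, 4],
}

best_k, best_score = select_dimension(
    sorted(predictions_by_k),
    lambda k: exact_agreement(predictions_by_k[k], human_grades),
)
print(best_k, best_score)
```

Passing the scoring step in as a callable keeps the selection loop independent of how the model is trained and of which agreement measure is used.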

Semi-automatic essay assessment

While an automatic grading module forms the backbone of any essay assessment system, recent research on automatic assessment has been directed toward more transparent and detailed measures of essay quality. The idea in AEA is to move from fully automatic grading towards semi-automatic assessment (Kakkonen, Myller & Sutinen, 2004). Such a system is capable of more analytic assessment: it provides scores for different components of the essay, such as the content, the writing style, and the structure of argumentation. Rather than only helping teachers obtain their students' final grades from an entirely automated assessment system, the idea is also to support teachers during the evaluation process, to help students reflect on their learning process as early as possible, and to point out the strong and weak aspects of an essay.

This paradigmatic shift from teacher-centered assessment towards learner-centered process evaluation offers interesting challenges to educational technologists and NLP researchers. We are currently developing the following methods that enable semi-automatic assessment:


Publications by the AEA group

Kakkonen, T.: Esseetehtävien tietokoneavusteinen arviointi (Computer Assisted Assessment of Essays). Master's thesis, Department of Computer Science, University of Joensuu, 2003.

Timonen, J.: Validointimenetelmien soveltaminen LSA-dimension etsintään automaattisessa esseiden arvioinnissa. (Using Validation Methods in Order to Determine the LSA-dimension in Automatic Essay Grading). Master's thesis, Department of Computer Science, University of Joensuu, 2005.

Journal articles
Kakkonen, T., Sutinen, E., and Timonen, J.: Applying Validation Methods for Noise Reduction in LSA-based Essay Grading. WSEAS Transactions on Information Science and Applications, 9(2):1334-1342, 2005.

Kakkonen, T., Myller, N., Sutinen, E., and Timonen, J.: Comparison of Dimensionality Reduction Methods - A Case Study on Automatic Essay Grading. To appear in Journal of Educational Technology and Society.

Conference publications
Kakkonen, T., Myller, N., and Sutinen, E.: Semi-Automatic Evaluation Features in Computer-Assisted Essay Assessment. Proceedings of the 7th IASTED International Conference on Computers and Advanced Technology in Education (CATE 2004), pp. 456-461. Kauai, Hawaii, USA, 2004.

Kakkonen, T., Myller, N., and Sutinen, E.: Applying Part-of-Speech Enhanced LSA to Automatic Essay Grading. Proceedings of the 4th IEEE International Conference on Information Technology: Research and Education (ITRE 2006). Tel Aviv, Israel, 2006.

Other references

Page, E.B.: The imminence of grading essays by computer. Phi Delta Kappan, 47(1):238-243, 1966.