Evaluating the Quality of Multiple-Choice Tests with Automatically Generated Visualizations

An Application of Scalable Vector Graphics in Medical Education

In 2004, the Medical Faculty at the University of Muenster reorganized several clinical courses into modules, shifting the focus of teaching from subject (e.g. radiology) to topic (e.g. heart and vessels). Testing was also centralized using multiple-choice questions (MCQs). In this context, a database was developed to support student registration, content editing, preparation of individual test sheets, and score calculation. The demand for uniformly computed quality evaluations of large numbers of test items led to the design of a software module that automates this reporting procedure.

All inferences about the quality of a single item rest on the assumption that a candidate's overall score (i.e. the fraction of correct answers) reflects that candidate's individual 'competence'. An item is considered to be of high quality when the scores for that single question agree with the aggregate student abilities: students who are more 'competent' overall are then also more likely to answer the item correctly. This presumed correlation is evaluated both statistically and graphically. Deviations from this pattern can point to problems such as ambiguous phrasing.
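The relationship between a single item and the overall score can be illustrated with a short sketch. The source does not name the statistic that the module computes, so the point-biserial correlation used here is an assumption, as are the class name `ItemStatistics` and its method names. Note also that in this simplified form the overall score still includes the item under study.

```java
/** Sketch only: the class, the method names, and the choice of statistic are assumptions. */
final class ItemStatistics {

    /** Difficulty level: fraction of candidates who answered the item correctly. */
    static double difficulty(boolean[] correct) {
        int hits = 0;
        for (boolean c : correct) {
            if (c) hits++;
        }
        return (double) hits / correct.length;
    }

    /**
     * Point-biserial correlation between item correctness and overall score:
     * r = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the mean overall
     * scores of the correct and incorrect groups, s is the population standard
     * deviation of all scores, and p is the fraction of correct answers.
     * Returns NaN when the correlation is undefined.
     */
    static double pointBiserial(boolean[] correct, double[] overallScore) {
        if (correct.length != overallScore.length || correct.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int n = correct.length;
        int hits = 0;
        double sum = 0.0;
        double sumCorrect = 0.0;
        for (int i = 0; i < n; i++) {
            sum += overallScore[i];
            if (correct[i]) {
                hits++;
                sumCorrect += overallScore[i];
            }
        }
        if (hits == 0 || hits == n) {
            return Double.NaN; // everyone right or everyone wrong: no discrimination measurable
        }
        double mean = sum / n;
        double variance = 0.0;
        for (double score : overallScore) {
            variance += (score - mean) * (score - mean);
        }
        variance /= n;
        if (variance == 0.0) {
            return Double.NaN; // all candidates have the same overall score
        }
        double p = (double) hits / n;
        double meanCorrect = sumCorrect / hits;
        double meanIncorrect = (sum - sumCorrect) / (n - hits);
        return (meanCorrect - meanIncorrect) / Math.sqrt(variance) * Math.sqrt(p * (1.0 - p));
    }

    public static void main(String[] args) {
        // Hypothetical data: four candidates, two of whom answered the item correctly.
        boolean[] correct = {true, true, false, false};
        double[] overallScore = {0.9, 0.8, 0.4, 0.3};
        System.out.println("difficulty = " + difficulty(correct));
        System.out.println("discrimination = " + pointBiserial(correct, overallScore));
    }
}
```

A value near 1 indicates that candidates with high overall scores tend to answer the item correctly, while values near zero or below would flag the item for review.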

Key steps of the Java-based implementation include retrieving the relevant data with SQL queries, computing statistical parameters, and arranging these values together with the MCQ contents in an intermediary XML stream with a straightforward format. Vector graphics are created during the subsequent XSL transformation and are integrated into print-oriented XSL-FO using XML namespaces. The resulting standard-compliant stream comprises a synopsis of the MCQ contents with the statistical parameters and graphics. In the final step, this monolithic XML source is converted into a PDF file for printing or for electronic transmission via email or HTTP.
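The transformation step can be sketched with the XSLT processor that ships with the JDK. The intermediary XML format, the element and attribute names, and the chart dimensions below are all hypothetical; the actual stylesheets of the module are not shown in the source. The sketch stops at the XSL-FO stream, since rendering to PDF requires an external formatter such as FOP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch only: XML vocabulary, names, and layout values are hypothetical. */
final class ReportPipelineSketch {

    // Hypothetical intermediary XML: one item with the number of candidates per option.
    static final String ITEM_XML = String.join("\n",
        "<item id='q1'>",
        "  <option key='A' correct='true' count='60'/>",
        "  <option key='B' correct='false' count='25'/>",
        "  <option key='C' correct='false' count='15'/>",
        "</item>");

    // Stylesheet that emits XSL-FO and embeds an SVG bar chart through XML namespaces.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'",
        "    xmlns:svg='http://www.w3.org/2000/svg'>",
        "  <xsl:template match='/item'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='page' page-height='297mm' page-width='210mm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='page'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <fo:block>Item <xsl:value-of select='@id'/></fo:block>",
        "          <fo:block>",
        "            <fo:instream-foreign-object>",
        "              <svg:svg width='120' height='100' viewBox='0 0 120 100'>",
        "                <xsl:for-each select='option'>",
        "                  <svg:rect x='{(position() - 1) * 40 + 5}' y='{100 - @count}'",
        "                            width='30' height='{@count}'>",
        "                    <xsl:attribute name='fill'>",
        "                      <xsl:choose>",
        "                        <xsl:when test=\"@correct = 'true'\">green</xsl:when>",
        "                        <xsl:otherwise>gray</xsl:otherwise>",
        "                      </xsl:choose>",
        "                    </xsl:attribute>",
        "                  </svg:rect>",
        "                </xsl:for-each>",
        "              </svg:svg>",
        "            </fo:instream-foreign-object>",
        "          </fo:block>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Applies the stylesheet to the XML and returns the resulting XSL-FO document as text. */
    static String toXslFo(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // A print formatter would take this stream as input to produce the PDF (not shown).
        System.out.println(toXslFo(ITEM_XML, STYLESHEET));
    }
}
```

Because the bar geometry is computed inside the stylesheet, the Java side only has to deliver numbers, which is one way to keep program logic and output layout apart.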

Different stylesheets were built to produce a variety of output formats from the same data (single-source publishing). Other targets of the XSL transformation include CSV (comma-separated values) files with tabular data for further manual processing, and scatterplots that aggregate the parameters of several items (discriminatory power versus difficulty level).
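As an illustration of single-source publishing, the following sketch applies a second, text-oriented stylesheet to an XML stream and obtains CSV instead of a print layout. The XML vocabulary, the column names, and the sample values are hypothetical and serve only to show the mechanism.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch only: XML vocabulary, column names, and values are hypothetical. */
final class CsvExportSketch {

    // Hypothetical summary stream with one element per test item.
    static final String ITEMS_XML = String.join("\n",
        "<items>",
        "  <item id='q1' difficulty='0.62' discrimination='0.41'/>",
        "  <item id='q2' difficulty='0.35' discrimination='0.12'/>",
        "</items>");

    // Stylesheet with text output: a header row, then one CSV row per item.
    static final String CSV_STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text' encoding='UTF-8'/>",
        "  <xsl:template match='/items'>",
        "    <xsl:text>item,difficulty,discrimination&#10;</xsl:text>",
        "    <xsl:for-each select='item'>",
        "      <xsl:value-of select=\"concat(@id, ',', @difficulty, ',', @discrimination)\"/>",
        "      <xsl:text>&#10;</xsl:text>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Transforms the XML with the given stylesheet and returns the text result. */
    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(ITEMS_XML, CSV_STYLESHEET));
    }
}
```

Only the stylesheet differs between output targets; the XML source and the Java code that invokes the transformation stay the same. Real item data would additionally need quoting for fields that contain commas.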

For each possible answer, the output files contain bar charts and Tukey's box plots illustrating the sizes and 'competence' distributions of the relevant subpopulations (i.e. the candidate groups that either made the correct choice or were misled by one of the distractors). Differences in 'competence' among these groups are reinforced visually with color gradients. This color encoding, together with a consistent graphical layout, is intended to make conclusions about item quality quickly accessible. The following tables contain sample diagrams and corresponding source code fragments.
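The values behind a Tukey box plot for one candidate group can be computed as in the following sketch. The class name `BoxPlotStats` is hypothetical, and the quartile convention (linear interpolation between closest ranks) is an assumption, since the source does not state which definition the module uses. Whiskers extend to the most extreme scores within 1.5 times the interquartile range of the box.

```java
import java.util.Arrays;

/** Sketch only: the class name and the quartile convention are assumptions. */
final class BoxPlotStats {
    final double q1;
    final double median;
    final double q3;
    final double lowerWhisker;
    final double upperWhisker;

    private BoxPlotStats(double q1, double median, double q3, double lower, double upper) {
        this.q1 = q1;
        this.median = median;
        this.q3 = q3;
        this.lowerWhisker = lower;
        this.upperWhisker = upper;
    }

    /** Quantile by linear interpolation between the closest ranks of a sorted array. */
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
    }

    /** Computes quartiles and Tukey whiskers for the overall scores of one answer group. */
    static BoxPlotStats of(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one score is required");
        }
        double[] s = scores.clone();
        Arrays.sort(s);
        double q1 = quantile(s, 0.25);
        double median = quantile(s, 0.5);
        double q3 = quantile(s, 0.75);
        double iqr = q3 - q1;
        double lowFence = q1 - 1.5 * iqr;
        double highFence = q3 + 1.5 * iqr;

        // Lower whisker: smallest score that is not below the lower fence.
        double lowerWhisker = s[0];
        for (double v : s) {
            if (v >= lowFence) {
                lowerWhisker = v;
                break;
            }
        }
        // Upper whisker: largest score that is not above the upper fence.
        double upperWhisker = s[s.length - 1];
        for (int i = s.length - 1; i >= 0; i--) {
            if (s[i] <= highFence) {
                upperWhisker = s[i];
                break;
            }
        }
        return new BoxPlotStats(q1, median, q3, lowerWhisker, upperWhisker);
    }

    public static void main(String[] args) {
        // Hypothetical overall scores of the candidates who chose one distractor.
        BoxPlotStats stats = BoxPlotStats.of(new double[] {0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.9});
        System.out.println(stats.q1 + " " + stats.median + " " + stats.q3
                + " whiskers " + stats.lowerWhisker + " to " + stats.upperWhisker);
    }
}
```

Computing one such summary per answer option gives the numbers that a stylesheet would need to draw the boxes side by side for comparison.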

XHTML output integrating SVG with JavaScript was also created (see Table 6). In this format, JavaScript makes parts of the scatterplot interactive, so that the user can navigate between the test items in the report. Initially this output saw little use, because few of our users had SVG-enabled browsers.

The software module is based on the FOP library from the Apache XML Graphics Project, which is invoked to render the PDF files. The combined use of this print formatter, XSL transformations, and SVG has sped up the expansion of the available functionality. At the same time, this approach supported a maintainable separation of program logic and output layout. The inclusion of vector graphics allowed the automatic production of integrated files with small file sizes, facilitating network-based distribution of the reports without compromising image quality.