Abstract
In this paper, I propose an integrated Optical Mark Reading (OMR) application, which is named Shared Questionnaire System (SQS), and is based on the W3C standards for printing: XSL-FO and SVG Print. SQS is a proof-of-concept application to demonstrate the separation of presentation and content, while handling form documents using OMR technology.
Optical Mark Reading (OMR) is a traditional method for capturing human-marked data from document forms such as surveys and tests. Today, OMR is still a good choice for situations where there is no network connectivity or electricity supply, or where no desktop PCs, PDAs or smart phones are available. OMR systems only require a printer to make a hard copy of the form document, and an image scanner to optically scan answers.
Questionnaire surveys using OMR continue to be useful. They will coexist with web form questionnaires. SQS was developed as a proof-of-concept application to demonstrate the separation of presentation and content for form documents using OMR technology.
OMR software users are reluctant to risk vendor lock-in, because most OMR software has its own proprietary data formats for describing the logical structure and display elements of scannable form documents. To make matters worse, there is no common data format between different OMR software. Therefore, in this section, I propose a new OMR form document markup language as a common data format using the best of XML standards.
There are various instances of OMR software, divided into two broad types: the older type and the modern type.
The older type is designed to be used with a dedicated OMR device to scan dedicated form cards with a fixed answer area layout. In this type, questionnaire sheets must often be printed separately from the form card, causing usability problems for answering questions.
The modern type is more flexible. It can be used with general purpose image scanners to scan form documents with customizable answer area layouts. This type of OMR software is a kind of DTP software that allows users to create their own forms. Questionnaire text and answer areas can be printed side-by-side on the same page. The modern type of form has better usability for answering questions than the older type.
The modern type of OMR software is backwards compatible with the older type as long as it can create and read the older type of form card. Therefore, the requirements for an OMR form document markup language as a common data format should be based on the modern type.
As for interoperability, an OMR form document should be some kind of XML instance using a subset of standard XML vocabularies and schemas.
OMR software should be designed for web-aware and transmedia questionnaire surveys, so that the questionnaires in OMR form documents can be easily translated into an HTML form, and answered through a web interface.
An OMR form document is a collection of OMR form elements, such as single, multiple or free answer questions. To mark up OMR form elements, the form control elements of XHTML or XForms may be usable.
On the other hand, as a host language, there are two possible categories of XML standards: source documents and result documents.
Structured document as source document:
An OMR form document can be an extension of a structured document which has a hierarchical composition of logical elements: body, section, header, paragraph, list, table, inline image, and so on. Formats such as XHTML, DocBook and OpenOffice Writer are suitable as the host language of the OMR form document. In this case, the OMR form markup language is based on the design concept of separation of presentation and content.
Visual representation in print media as result document:
An OMR form document must have a visual representation in print media as a set of two-dimensional vector graphic primitives to be printed on paper. OMR form elements in logical structures are also rendered as groups of 2D primitives. Geometric information of the answer area is used to process the questionnaire. A visual representation of OMR form documents can be created with typesetting systems or WYSIWYG editors. The possible host languages for markup of the visual elements include SVG Print, PDFXML, XPS, and OpenOffice Draw.
The elements in the logical structure must be transformed and typeset into two-dimensional vector graphics before printing. W3C standards such as XSLT and XSL-FO provide the XML transformation and typesetting technologies for this. A structured document can be translated into an XSL-FO document with XSLT, and typeset into a page model representation with an FO processor.
It is important to produce an initial layout of the OMR form document contents in the page model. There are two ways to do so: automatic layout by typesetting, and manual layout with a WYSIWYG editor. Typesetting has an advantage over manual layout, because the OMR form elements in the structured document context are automatically bound to the visual objects in the page model context.
To ease the implementation of a proof-of-concept prototype language, an OMR form document should be designed as a kind of XML instance using very limited subsets of standard XML vocabularies and schemas. The following subsets are used:
A subset of XForms 1.0 Form Controls for the OMR form document language. As the next generation of Web forms, the form control elements of XForms are simpler than traditional HTML forms for describing abstract user interface controls that collect user input.
A subset of XHTML 2.0: the Structure, Text, List, Presentation and Tables modules. It is reasonable to choose XHTML as the host language of the OMR document for describing the form document structure, because the HTML family is the most widely used and interoperable set of vocabularies and schemas.
A subset of SVG Print 1.0 to mark up the visual representation of OMR forms in print media, as the result of the typesetting procedure with XSL-FO. In addition, I also use the PDF format for the visual representation of OMR forms in the page model, enabling users to easily print and distribute them.
I developed two SQS applications as OMR form document processors: "SQS SourceEditor" as an OMR document editor, and "SQS MarkReader" as an OMR form engine. They are signed Java Web Start applications distributed under the Apache License, Version 2.0.
SQS SourceEditor is an OMR document editor based on the OMR form document language. Figure 1, “SQS SourceEditor” shows the GUI, which looks like an outline editor with customized node icons and node editors; it can handle multiple DOM trees of XHTML+XForms documents in multiple tabs.
The quick export buttons, located in the toolbar, produce visual representations of OMR forms using the XSL-FO engine. The export result is a combination file of an SVG Print document and a PDF document: an SVG Print file with OMR form metadata is attached to a host PDF file. Figure 2, “Translation of an OMR form structured document to its visual representation” shows the system that translates an OMR form structured document to its visual representation. Figure 3, “A visual representation of a question” shows an example of typeset questionnaire text and answer areas. An answer area is inside the dotted ellipse. Figure 4, “The export result, a combination file of SVG Print and PDF document” shows the SVG Print file with OMR form metadata, which has an ".sqm" extension and is attached to a host PDF file.
To produce the SVG Print file, I made a quick hack on Apache FOP to build an SVG Print document when rendering PDF objects.
This SVG Print document is not used as the print source for the document, but as the container of processing information for the OMR engine to read answer areas at a later phase.
The structure of an SVG Print document exported by SQS SourceEditor is described below.
In the root element of an SVG Print document, the SVG and XForms namespaces are defined and the default namespace is set to the SVG namespace. The SQS and SQS Master namespaces are also defined.
The width and height attributes specify the paper size (595 x 842 points: the size of an A4 sheet of paper).
<svg:svg xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:sqs="http://sqs.cmr.sfc.keio.ac.jp/2004/sqs" xmlns:master="http://sqs-xml.sourceforge.jp/2007/master" svg:width="595" svg:height="842"> ...
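The page dimensions follow from the definition of the point as 1/72 inch. A minimal sketch of the arithmetic is given below; `mm_to_pt` is a hypothetical helper introduced only for illustration and is not part of SQS.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72  # one point is 1/72 inch


def mm_to_pt(mm: float) -> float:
    """Convert a length in millimetres to points."""
    return mm / MM_PER_INCH * PT_PER_INCH


# An A4 sheet measures 210 mm x 297 mm.
a4_width_pt = round(mm_to_pt(210))
a4_height_pt = round(mm_to_pt(297))
print(a4_width_pt, a4_height_pt)
```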
In an <svg:metadata/> child element of the outermost <svg:svg/> element within an SVG document, an instance of an XForms model can be defined. It contains placeholder elements to store the read state of answer areas: <mark/> and <image/> are defined within the <answer/> element. This metadata is usually created from the structured document.
<svg:metadata>
<xforms:model>
<xforms:instance>
<data xmlns="">
<answer type="select1">
<mark density=""/>
<mark density=""/>
<mark density=""/>
</answer>
<answer type="select">
<mark density=""/>
<mark density=""/>
<mark density=""/>
</answer>
<answer type="textarea">
<image/>
</answer>
</data>
</xforms:instance>
</xforms:model>
</svg:metadata>
...
In an <svg:metadata/> child element of the <svg:masterPage/> element, OMR form processing information is defined: the coordinates of the corner positions used by the projective transformation that deskews the scanned images of the OMR form, the area used to detect upside-down pages, and the area used to detect even/odd page errors. They are <master:corner/>, <master:upsideDownChecker/> and <master:evenOddChecker/> respectively. This metadata may be extended to support other deskewing algorithms and error-checking subsystems.
...
<svg:pageSet>
<svg:masterPage>
<svg:metadata>
<master:master master:version="1.4" master:numPages="1">
<master:corner master:x1="99" master:y1="29" master:x2="497" master:y2="29"
master:x3="94" master:y3="810" master:x4="492" master:y4="810" />
<master:upsideDownChecker>
<master:checkerArea master:side="header">
<svg:rect x="89" y="19" width="20" height="20" />
</master:checkerArea>
<master:checkerArea master:side="footer">
<svg:rect x="84" y="800" width="20" height="20" />
</master:checkerArea>
</master:upsideDownChecker>
<master:evenOddChecker>
<master:checkerArea master:side="left">
<svg:rect x="30" y="796" width="24" height="12" />
</master:checkerArea>
<master:checkerArea master:side="right">
<svg:rect x="531" y="796" width="24" height="12" />
</master:checkerArea>
</master:evenOddChecker>
</master:master>
...
</svg:metadata>
</svg:masterPage>
...
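To illustrate how the four <master:corner/> coordinates can drive a projective transformation, the following sketch computes a homography that maps master page coordinates (in points) onto the corresponding positions in a scanned image (in pixels). It is an assumption-laden illustration, not the algorithm used by SQS MarkReader: the scanned corner positions are invented, and the function names `solve_linear`, `homography` and `project` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 8 coefficients mapping the 4 src points onto the 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def project(h, x, y):
    """Apply the homography h to the point (x, y)."""
    w = h[6] * x + h[7] * y + 1
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Corner coordinates taken from the <master:corner/> element above.
MASTER_CORNERS = [(99, 29), (497, 29), (94, 810), (492, 810)]
# Hypothetical corner positions detected in a slightly skewed scan (pixels).
SCANNED_CORNERS = [(280, 95), (1385, 70), (300, 2262), (1400, 2240)]

H = homography(MASTER_CORNERS, SCANNED_CORNERS)
# Locate the top-left corner of an answer area in the scanned image.
print(project(H, 61.0, 176.93902587890625))
```

Mapping answer areas into the scan, rather than resampling the whole image, is only one possible design; the inverse mapping could equally be used to deskew the image itself.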
An <svg:page/> element represents a single page of an OMR form document. When the visual representation of a form control is rendered as a PDF object, a set of <svg:g/> elements is created to specify the rectangle of each OMR answer area to be read. Each <svg:g/> element contains an <svg:rect/> element and an <xforms:select/> element, and there are as many <svg:g/> elements as selectable answer items; together they describe the OMR answer areas. Attributes with the "sqs:" prefix preserve the original context of the form control.
...
<svg:page>
<svg:g id="mark1-1">
<svg:rect x="61.0" y="176.93902587890625" width="5.0" height="16.0">
<svg:metadata>
<xforms:select xforms:ref="answer[1]/mark[1]"
sqs:qid="1" sqs:itemIndex="1" sqs:prev-xform-type="select1">
<xforms:label>(1)</xforms:label>
<xforms:hint>Do you agree or disagree?: XHTML 2 Working Group expected to stop work for increasing resources on HTML 5.</xforms:hint>
<xforms:item>
<xforms:label>agree</xforms:label>
<xforms:value>1</xforms:value>
</xforms:item>
</xforms:select>
</svg:metadata>
</svg:rect>
</svg:g>
<svg:g id="mark1-2">
<svg:rect x="230.09100341796875" y="176.93902587890625" width="5.0" height="16.0">
<svg:metadata>
<xforms:select xforms:ref="answer[1]/mark[2]"
sqs:qid="1" sqs:itemIndex="2" sqs:prev-xform-type="select1">
<xforms:label>(1)</xforms:label>
<xforms:hint>Do you agree or disagree?: XHTML 2 Working Group expected to stop work for increasing resources on HTML 5.</xforms:hint>
<xforms:item>
<xforms:label>disagree</xforms:label>
<xforms:value>2</xforms:value>
</xforms:item>
</xforms:select>
</svg:metadata>
</svg:rect>
</svg:g>
...
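As a rough illustration of how a consumer might read the answer areas back from such a page, the sketch below parses an abridged copy of the listing above with Python's standard `xml.etree.ElementTree` module. The function name `answer_areas` and the shape of the returned dictionaries are assumptions made for this example, not part of SQS.

```python
import xml.etree.ElementTree as ET

NS = {
    "svg": "http://www.w3.org/2000/svg",
    "xforms": "http://www.w3.org/2002/xforms",
    "sqs": "http://sqs.cmr.sfc.keio.ac.jp/2004/sqs",
}

# Abridged from the listing above (labels and hints omitted).
SAMPLE = """<svg:svg xmlns:svg="http://www.w3.org/2000/svg"
    xmlns:xforms="http://www.w3.org/2002/xforms"
    xmlns:sqs="http://sqs.cmr.sfc.keio.ac.jp/2004/sqs">
  <svg:page>
    <svg:g id="mark1-1">
      <svg:rect x="61.0" y="176.93902587890625" width="5.0" height="16.0">
        <svg:metadata>
          <xforms:select xforms:ref="answer[1]/mark[1]"
              sqs:qid="1" sqs:itemIndex="1" sqs:prev-xform-type="select1">
            <xforms:item><xforms:value>1</xforms:value></xforms:item>
          </xforms:select>
        </svg:metadata>
      </svg:rect>
    </svg:g>
    <svg:g id="mark1-2">
      <svg:rect x="230.09100341796875" y="176.93902587890625" width="5.0" height="16.0">
        <svg:metadata>
          <xforms:select xforms:ref="answer[1]/mark[2]"
              sqs:qid="1" sqs:itemIndex="2" sqs:prev-xform-type="select1">
            <xforms:item><xforms:value>2</xforms:value></xforms:item>
          </xforms:select>
        </svg:metadata>
      </svg:rect>
    </svg:g>
  </svg:page>
</svg:svg>"""


def answer_areas(svg_text):
    """Return one dict per answer area: XForms binding, SQS context, rectangle."""
    root = ET.fromstring(svg_text)
    xf = "{%s}" % NS["xforms"]
    sqs = "{%s}" % NS["sqs"]
    areas = []
    for rect in root.iterfind(".//svg:page/svg:g/svg:rect", NS):
        select = rect.find("svg:metadata/xforms:select", NS)
        if select is None:
            continue  # a rectangle without form metadata is not an answer area
        areas.append({
            "ref": select.get(xf + "ref"),
            "qid": select.get(sqs + "qid"),
            "type": select.get(sqs + "prev-xform-type"),
            "value": select.findtext("xforms:item/xforms:value", namespaces=NS),
            "box": tuple(float(rect.get(k)) for k in ("x", "y", "width", "height")),
        })
    return areas


for area in answer_areas(SAMPLE):
    print(area["ref"], area["value"], area["box"])
```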
SQS MarkReader is an OMR engine. Before starting SQS MarkReader, the user must create a number of scanned image files by scanning each page of the OMR form answers. The combination file of the SVG Print and PDF documents is also required. The user puts them all together into a folder, drags and drops the folder onto the SQS MarkReader window, and then starts the OMR engine.
Figure 5, “SQS MarkReader” shows the initial window and the task progress window of SQS MarkReader.
The SVG Print document is extracted and used by the OMR engine to assist in reading answer areas. In this phase, the form control ID, the form control type and the geometry of each answer area are retrieved from the SVG Print document and used to gather the questionnaire form answers.
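The sketch below shows one simple way the geometry of an answer area could be turned into a mark density and then into a single-choice answer. It is an assumption for illustration only: the paper does not describe how SQS MarkReader measures density, and the names `mark_density` and `pick_single`, the grayscale threshold of 128 and the minimum density of 0.3 are all hypothetical.

```python
def mark_density(image, box, threshold=128):
    """Fraction of pixels darker than threshold inside box = (x, y, w, h).

    image is a list of rows of grayscale values (0 = black, 255 = white).
    """
    x, y, w, h = box
    x0, y0 = int(round(x)), int(round(y))
    x1, y1 = int(round(x + w)), int(round(y + h))
    pixels = [p for row in image[y0:y1] for p in row[x0:x1]]
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p < threshold) / len(pixels)


def pick_single(densities, min_density=0.3):
    """Index of the darkest mark for a select1 question, or None if all are blank."""
    best = max(range(len(densities)), key=lambda i: densities[i])
    return best if densities[best] >= min_density else None


# A 10 x 10 white page with a dark stroke covering half of the answer area.
page = [[255] * 10 for _ in range(10)]
for r in (2, 3):
    for c in range(2, 6):
        page[r][c] = 0

density = mark_density(page, (2, 2, 4, 4))
print(density, pick_single([0.05, density, 0.1]))
```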
SQS MarkReader introduces brand new OMR technologies.
To reduce the turnaround time of OMR engine tasks, a set of SQS MarkReaders in a local area network can be used as a high performance cluster computing system for parallel execution. The application processes communicate with each other to make loosely coupled clusters with Administratively Scoped IP Multicast, and the relationships between master and workers are created dynamically. Figure 6, “Parallel Execution of SQS MarkReader” shows the schematic drawing of parallel execution of SQS MarkReader.
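For readers unfamiliar with administratively scoped multicast, the sketch below shows how a worker process might join a group in the 239.0.0.0/8 range (RFC 2365) and how a peer might announce itself. It does not reproduce the SQS MarkReader protocol: the group address, port number, message format and all function names are hypothetical.

```python
import ipaddress
import socket

ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")  # RFC 2365 range
GROUP = "239.0.0.100"  # hypothetical group address
PORT = 50000           # hypothetical port


def is_admin_scoped(group):
    """True if the group address lies in the administratively scoped range."""
    return ipaddress.ip_address(group) in ADMIN_SCOPED


def membership_request(group):
    """Build an ip_mreq value: group address followed by INADDR_ANY."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")


def open_listener(group=GROUP, port=PORT):
    """Open a UDP socket that receives datagrams sent to the group."""
    if not is_admin_scoped(group):
        raise ValueError("group is not administratively scoped")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def announce(message, group=GROUP, port=PORT, ttl=1):
    """Send one datagram to the group; ttl=1 keeps it on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(message, (group, port))
    finally:
        sock.close()
```

A worker would call `open_listener()` and wait for announcements, while a master would call `announce(b"...")` with whatever discovery message the application defines.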
The result data segments of the distributed OMR engine tasks are collected, and a result data structure is generated in the form of spreadsheets (CSV and Excel files), charts, statistics, image files of free-answer areas, and so on. These result data can be browsed and analyzed in a Web browser through an AJAX user interface, as shown in Figure 7, “AJAX user interface to browse and edit the result data”.
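As a small illustration of the spreadsheet output, the following sketch writes collected answers as CSV, one row per scanned sheet and one column per question. The column layout, the file names and the function name `answers_to_csv` are assumptions; the actual SQS output format is not specified here.

```python
import csv
import io


def answers_to_csv(rows, question_ids):
    """Serialize (sheet name, {question id: value}) pairs as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sheet"] + ["Q%s" % q for q in question_ids])
    for sheet, answers in rows:
        # Unanswered questions are written as empty cells.
        writer.writerow([sheet] + [answers.get(q, "") for q in question_ids])
    return buf.getvalue()


collected = [
    ("sheet-001.png", {"1": "2", "2": "1"}),
    ("sheet-002.png", {"1": "1"}),
]
print(answers_to_csv(collected, ["1", "2"]))
```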
A certain number of read errors are inevitable in OMR form processing, so the operator has to correct the errors manually with some kind of form-like GUI. In this context, a declarative AJAX GUI for read error correction can be generated from the source OMR form document.
In this paper, I proposed an integrated OMR application, named SQS, which is based on the W3C standards for printing: XSL-FO and SVG Print.
SQS is carefully designed to provide a straightforward and easy-to-use OMR form application. It is a proof-of-concept demonstrating the separation of presentation and content for the OMR form document.
Through typesetting, an OMR form document is presented, and a set of OMR form metadata is produced. The presentation is produced as a ready-to-print PDF document. The OMR form metadata is produced as an SVG Print document, which describes the geometric information of the answer area, allowing the answers to be read by the OMR engine.
This research was supported by the Exploratory Software Project (MITOH Program) in 2007, which was conducted by the Information-Technology Promotion Agency (IPA), Japan.