Visualizing Natural Language Structure using SVG

Phil Middleton
Department of Computer Science
University of Victoria
E-Mail: middleto@uvic.ca
Tel: (604) 943-4754

 

Introduction

We are currently building a prototype system for natural language generation. For the reasons discussed below, SVG [1] is an ideal platform on which to build the interface to our system, which allows natural language sentence trees to be represented and explored interactively and graphically.

Our system uses the Attribute-Value Grammar formalism from Johnson [2]. In this formalism a sentence is represented by a tree similar to a context-free parse tree. The nodes in this tree are annotated with attribute-value matrices containing information about the sentence. An Attribute-Value Grammar requires that the attributes associated with each node satisfy certain lexical and syntactic restrictions.

Attribute-Value Logic and Natural Language

 

[Figure: sentence tree with a single-level attribute-value matrix]
Figure 1: Representation of the sentence "Mary chased John".

 

Figure 1 depicts the structure of the sentence "Mary chased John." Nodes are color coded according to type. Leaf nodes have their lexical forms shown. The matrix associated with the root node is displayed.

A grammar consists of a collection of lexical and syntactic rules. These rules check that the attributes associated with a node meet certain criteria. Two simple examples of such rules are shown below.

a) (chased, V, x(pred) = chase AND x(tense) = past AND x(subj)(agr)(num) = sg AND x(subj)(agr)(pers) = 3rd)

b) (S -> V NP, x=x1 AND x(obj) = x2)

The lexical rule in a) ensures that any node with the lexical form chased is of type Verb. The node’s associated matrix must have the predicate chase and the past tense, and the verb’s subject must be singular and in the third person.
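As an illustration only, the check performed by lexical rule a) can be sketched in Python. The nested-dictionary representation of a matrix and the names `lookup`, `satisfies_chased`, and `verb_avm` are assumptions made for this sketch; they are not part of the system described here, which is implemented in Perl. The sketch reads every path in rule a) as starting from the node's own matrix x.

```python
# Sketch only: an attribute-value matrix modelled as a nested dict (an assumption).

def lookup(avm, path):
    """Follow a path of attribute names; return None if any step is missing."""
    for attr in path:
        if not isinstance(avm, dict) or attr not in avm:
            return None
        avm = avm[attr]
    return avm


def satisfies_chased(form, category, x):
    """Check the constraints of lexical rule a) against a node's form, type and matrix x."""
    return (
        form == "chased"
        and category == "V"
        and lookup(x, ["pred"]) == "chase"
        and lookup(x, ["tense"]) == "past"
        and lookup(x, ["subj", "agr", "num"]) == "sg"
        and lookup(x, ["subj", "agr", "pers"]) == "3rd"
    )


# Hypothetical matrix for the verb node of "Mary chased John".
verb_avm = {
    "pred": "chase",
    "tense": "past",
    "subj": {"agr": {"num": "sg", "pers": "3rd"}},
}

print(satisfies_chased("chased", "V", verb_avm))
```

A matrix whose subject is not third person, or that lacks one of the paths entirely, fails the check because `lookup` returns a different value or None.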

The syntactic rule in b) ensures that any Sentence node must have a Verb and a Noun Phrase as children. The matrices associated with these nodes must also satisfy the conditions stated in the rule.

A grammar is said to generate a sentence tree if each node of the tree satisfies one or more of these rules. Grammars are usually defined as approximations to certain natural languages such as English.
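The notion of generation can be sketched in Python as a recursive check that every node satisfies at least one rule. This is a minimal sketch under stated assumptions, not the system's Perl implementation: the `Node` class, the helper names, and the simplified lexical constraints are hypothetical, and the sketch assumes that in rule b) x denotes the matrix of the parent node while x1 and x2 denote the matrices of its first and second child.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    category: str                      # node type, e.g. "S", "V", "NP"
    avm: dict                          # attribute-value matrix as a nested dict
    form: str = ""                     # lexical form, used by leaf nodes
    children: list = field(default_factory=list)


def lexical_rule(form, category, constraint):
    """Build a check for a leaf node with a given lexical form and type."""
    def check(node):
        return (not node.children and node.form == form
                and node.category == category and constraint(node.avm))
    return check


def syntactic_rule(parent, child_categories, constraint):
    """Build a check for a parent node with a fixed sequence of child types."""
    def check(node):
        cats = [c.category for c in node.children]
        return (node.category == parent and cats == child_categories
                and constraint(node.avm, *[c.avm for c in node.children]))
    return check


def generates(rules, node):
    """True if every node of the tree satisfies at least one rule."""
    return (any(rule(node) for rule in rules)
            and all(generates(rules, child) for child in node.children))


# Hypothetical matrices and simplified rules for the fragment "chased John".
john_avm = {"pred": "john"}
verb_avm = {"pred": "chase", "tense": "past", "obj": john_avm}
rules = [
    lexical_rule("chased", "V",
                 lambda x: x.get("pred") == "chase" and x.get("tense") == "past"),
    lexical_rule("John", "NP", lambda x: x.get("pred") == "john"),
    # Rule b): S -> V NP, x = x1 AND x(obj) = x2
    syntactic_rule("S", ["V", "NP"],
                   lambda x, x1, x2: x == x1 and x.get("obj") == x2),
]
tree = Node("S", verb_avm,
            children=[Node("V", verb_avm, "chased"), Node("NP", john_avm, "John")])
print(generates(rules, tree))
```

Representing each rule as a function from a node to a truth value keeps lexical and syntactic rules interchangeable, so the generation check does not need to know which kind of rule it is applying.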

Problem

Our goal was to create a system for creating and exploring sentence trees and grammars. The system should stay as close as possible to the derivations given in Johnson [2]. It should also have a Web-enabled user interface for ease of use. To this end, we identified six conditions:

  1. Generated diagrams similar to those in current publications.
  2. Sentence trees defined in a format similar to Johnson.
  3. Grammars defined in a format similar to Johnson.
  4. Grammars must be executable.
  5. Multiple sentence trees and grammars able to be defined and used.
  6. Web-enabled system.

Condition one implies the ability to generate tree diagrams, tables, and other sentence representations. Conditions two and three ensure the readability of sentence tree and grammar definitions. Condition four is necessary to allow dynamic validation of sentence trees. Condition six implies the use of Web technologies that allow for the dynamic generation of documents without the need for a custom client.

Implementation

We have implemented our system as a Web application with three parts: an SVG interface that serves as the client, the system itself, implemented in server-side Perl, and XML files that facilitate data exchange. When a session begins, an initial view of the sentence tree is generated. The user then sends requests to the server, which generates an appropriate SVG representation of the requested information and returns it to the user. Representations of each type of data can be displayed.

The following list discusses how our system addresses each of the conditions identified above:

XML-Based(2,3): Our use of XML for storage of sentence trees and grammars allows us to define our rules in a format very similar to that used by Johnson. Using XML allows us to leverage existing tools for storage and transformation.
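To make the idea of XML storage concrete, the following Python sketch reads a sentence tree from an XML fragment. The element and attribute names (`node`, `avm`, `attr`, `cat`, `form`, `name`) are a hypothetical schema invented for this example; the actual file format used by the system is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a sentence tree with a matrix on the root node.
TREE_XML = """
<node cat="S">
  <avm>
    <attr name="pred">chase</attr>
    <attr name="tense">past</attr>
    <attr name="subj"><avm><attr name="pred">mary</attr></avm></attr>
  </avm>
  <node cat="NP" form="Mary"/>
  <node cat="V" form="chased"/>
  <node cat="NP" form="John"/>
</node>
"""


def read_avm(elem):
    """Convert an <avm> element into a nested dict; a missing element gives {}."""
    avm = {}
    if elem is None:
        return avm
    for attr in elem.findall("attr"):
        nested = attr.find("avm")
        if nested is not None:
            avm[attr.get("name")] = read_avm(nested)
        else:
            avm[attr.get("name")] = (attr.text or "").strip()
    return avm


def read_node(elem):
    """Convert a <node> element and its descendants into plain dicts and lists."""
    return {
        "cat": elem.get("cat"),
        "form": elem.get("form"),
        "avm": read_avm(elem.find("avm")),
        "children": [read_node(child) for child in elem.findall("node")],
    }


tree = read_node(ET.fromstring(TREE_XML.strip()))
print(tree["cat"], [child["form"] for child in tree["children"]])
```

Because the tree is ordinary XML, any standard XML parser or transformation tool can read it, which is the benefit the paragraph above refers to.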

Perl Backend(3,4,5,6): Using Perl allows us to access the sentence tree and grammar information in XML files and transform it into SVG representations. This information can then be sent to a client through a PHP interface. Perl also allows us to transform grammar rules into executable code and then use those rules to validate sentence trees.
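The step of turning a grammar rule into executable code can be illustrated with a small sketch. It is written in Python rather than Perl purely for illustration, and it handles only conjunctions of terms of the form x(attr)...(attr) = value; the function name `compile_constraint` and the nested-dictionary matrix representation are assumptions, not a description of the system's actual translation.

```python
import re

# One term: x followed by one or more (attr) steps, then "= value".
TERM = re.compile(r"^x((?:\(\w+\))+)\s*=\s*(\w+)$")


def compile_constraint(text):
    """Turn 'x(a)(b) = v AND ...' into a predicate over a nested-dict matrix."""
    checks = []
    for term in text.split(" AND "):
        match = TERM.match(term.strip())
        if match is None:
            raise ValueError(f"unsupported term: {term!r}")
        path = re.findall(r"\((\w+)\)", match.group(1))
        checks.append((path, match.group(2)))

    def predicate(x):
        for path, expected in checks:
            value = x
            for attr in path:
                if not isinstance(value, dict) or attr not in value:
                    return False
                value = value[attr]
            if value != expected:
                return False
        return True

    return predicate


rule_text = ("x(pred) = chase AND x(tense) = past AND "
             "x(subj)(agr)(num) = sg AND x(subj)(agr)(pers) = 3rd")
check = compile_constraint(rule_text)
print(check({"pred": "chase", "tense": "past",
             "subj": {"agr": {"num": "sg", "pers": "3rd"}}}))
```

Terms that relate several matrices, such as x = x1 in a syntactic rule, are outside this sketch and raise a ValueError.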

SVG User Interface(1,6): SVG provides both high-quality graphics and an interactive medium. SVG diagrams generated by the system are similar to those in current publications [2,3]. Because we take advantage of the interactive nature of SVG documents, there is no need to create a separate user interface for the system.
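As a rough indication of how little code a basic tree diagram requires, the following Python sketch renders a small tree as SVG. The layout rule (leaves spaced evenly, each parent centred over its children), the sizes and colours, and the tuple representation of the tree are all assumptions chosen for this example; the diagrams produced by the actual system may differ.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def layout(tree, depth, counter, placed):
    """Assign (x, y) to each node: leaves left to right, parents centred."""
    label, children = tree
    child_pos = [layout(child, depth + 1, counter, placed) for child in children]
    if child_pos:
        x = sum(p[0] for p in child_pos) / len(child_pos)
    else:
        x = 60 + 100 * counter[0]
        counter[0] += 1
    pos = (x, 40 + 70 * depth)
    placed.append((label, pos, child_pos))
    return pos


def tree_to_svg(tree):
    """Return an SVG document (as a string) drawing the tree."""
    placed, counter = [], [0]
    layout(tree, 0, counter, placed)
    width = 100 * counter[0] + 20
    height = max(pos[1] for _, pos, _ in placed) + 40
    svg = ET.Element("svg", xmlns=SVG_NS, width=str(width), height=str(height))
    # Draw edges first so that the node circles are painted on top of them.
    for _, (x, y), child_pos in placed:
        for cx, cy in child_pos:
            ET.SubElement(svg, "line", x1=str(x), y1=str(y),
                          x2=str(cx), y2=str(cy), stroke="black")
    for label, (x, y), _ in placed:
        ET.SubElement(svg, "circle", cx=str(x), cy=str(y), r="16", fill="lightblue")
        text = ET.SubElement(svg, "text", x=str(x), y=str(y + 5))
        text.set("text-anchor", "middle")
        text.text = label
    return ET.tostring(svg, encoding="unicode")


# Hypothetical tree: (label, children).
svg_text = tree_to_svg(("S", [("NP", []), ("V", []), ("NP", [])]))
print(svg_text)
```

The resulting string can be saved with an .svg extension and opened in any SVG-capable viewer.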

Results and Evaluation

SVG allows our system to have a graphically rich user interface without the maintenance of a custom client or applet. In our experience the amount of code needed to generate SVG is significantly less than would be expected if we had created a system based on another technology.

In summary, our experience with SVG has been positive. In the future, we plan to use the animation capabilities of SVG to illustrate the actions of grammar rules on sentence trees.

References

[1] W3C. Scalable Vector Graphics (SVG) 1.0 Specification, September 2001. http://www.w3.org/TR/SVG/

[2] Mark Johnson. Attribute-Value Logic and the Theory of Grammar. Center for the Study of Language and Information, 1988.

[3] Akshar Bharati, Vineet Chaitanya, and Rajeev Sangal. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi, 1995.