Accessing SVG Content Linguistically and Conceptually

Ascription of Visuals into Language

Keywords: language, ascription, paper

David Dodds, Mr.
Founder
DDWyndham
Vancouver
British Columbia
Canada
david_dodds_2001@yahoo.com

Biography

David Dodds has worked with computers since 1968, when he wrote Continuous Systems Models (CSMP) on IBM mainframe computers at university. Later he worked at Nortel (Northern Telecom / Bell Northern Research), where he designed GUIs and wrote text understanding software; these were in-house projects to allow extraction of content from telephony specification documents for use by graphical-interface programs. He also wrote expert systems in C and Prolog. He has worked for the last several years on the various emerging XML technologies, was on the W3C SVG workgroup that developed the specification for SVG 1.0, and on the IDEAlliance committee that developed the XML Topic Map (XTM) specification. David has published numerous papers on robotics and fuzzy systems. Two of these papers were published in the SPIE proceedings Space Station Automation III. Most recently he was co-author of the book WROX Professional XML Meta Data. He also worked as technical reviewer for Kurt Cagle's SVG book.


Abstract


SVG content consists of both the actual vector graphic elements that are in the SVG picture (file) and what the picture "is about", its meaning or semantics. In this paper we examine both types of content and see how computer programs recognize each. The former are explicit ('in the file'); the latter may be contained in Title or Description elements in an SVG file, or may be described in the metadata one way or another by RDF, XTM, or XGMML. It is also possible that 'what the picture is about' (i.e. its semantic meaning) is NOT AT ALL explicitly described in the SVG file itself.

The paper covers the programming which maps SVG 'code' to XGMML (the eXtensible Graph Markup and Modeling Language). The latter is a representation of the explicit SVG code elements in semantic network (graph) form. The paper also shows how code can display the XGMML graph network as an SVG picture.

Linguistic accessibility of SVG picture information is shown in this paper by means of programs which input English sentences and output XGMML representations of the sentences. It is possible, by means of standard graph matching, to determine the presence or absence of the visual items described in the sentences in a given SVG file picture. In this way it is possible to search a collection of SVG pictures by content, using an English sentence to describe that content. (This is search of all SVG elements, not just Title and Description elements.)

Linguistic accessibility is further shown whereby, using English sentences as input, one can describe higher-level graphical visual constructs. LR grammars are discussed as picture grammars. An example is shown in the paper where a 'bargraph' is described in English and parsed into XGMML, which is used to locate an SVG file containing only the constituent SVG elements of the 'bargraph'. The bargraph does not explicitly exist anywhere; it is a "perception" composed of a collection of certain visual elements in a certain configuration. Such a 'perception' is an INFERENCE. In this paper we see how this is done, with a 'business graph'.

A means of automating the 'discovery' of such predicates in arbitrary SVG files is shown. RDF from 'linearization' is shown translated into English sentences by programs which are discussed.

XGMML graphs are shown to be able to produce English which describes the 'perceptual' content of an SVG file. A 'business graph' is shown as an example. An SVG animation is shown as an example where conceptualizations of 'motion' and other visual changes are captured in XGMML and output as English sentences. Implicit or tacit visual content of SVG pictures is shown captured programmatically and stored as an XGMML representation (which can be processed to produce English sentence output).

Text-to-speech voice synthesis is briefly covered as a means of outputting text, but in a spoken form, to increase the dimensions of accessibility. Dragon Systems speech input systems are discussed as a complementary means of accessible text input to the above systems.

A knowledge base, using FRAMES technology and represented via XGMML graph structures, is used for several semantic processing functions. These frames contain a description of all the parts of each SVG element in the specification (as they appear in this paper). For example, a line is known to have certain attributes, like two end points, a thickness, a stroke colour, etc. (and other information, like 'must specify', 'optional presence', etc.). Amongst the semantic tasks performed with these frames are 'good completion' (a term from Perceptual Psychology) and 'correctness'. The paper discusses these frames and how they are used by programs in the generation of English sentences about SVG (files).


Table of Contents


1. The Concepts Behind SVG Visual and Linguistic Processing

1. The Concepts Behind SVG Visual and Linguistic Processing

The DARPA EPCA programme says, in part, "...the real power of human information processing seems to come from higher-level capabilities that use abstraction, ... powerful language-understanding and generation capabilities ...".

Language-understanding and generation capabilities in humans come from a deeper set of processes called ascription. Ascription is the collection of cognitive and memory processes which allow humans to externalize concepts. Concepts do not occur in the form of natural language in the human brain. (This can be seen when we hear someone say "...but I can't put it into words", or when someone SEES a colour but does not know the colour name. What is chartreuse?) Consciousness and awareness are constructed from pre-conscious material in the brain which is given a time-varying and situation-varying "moment in the spotlight" of attention.

Through the means of ascription we are able to employ (usually syntactically directed) processes which externalize the concepts we have in mind. These processes occur in all cultures and are largely the same in each one. These processes are language (in the form of speech, writing and singing), painting, drawing, (hand) signing, and coded artifacts such as Morse code, semaphore and naval signalling flags. They are syntactic in that they all use the human musculature in the process of externalizing, and this musculature can perform only one movement at a time. Your tongue, for example, cannot be instantaneously in two distinctly different places at the same time, and neither can your hands. This physical constraint means that externalization must include a serialization of what may appear to be simultaneous mental events.

A means (metaphorically) like ascription is needed in computers to externalize the bits and bytes whizzing about inside them into a form which is meaningfully consumable by human users. A means of ascription for SVG-based "pictures" is the focus of the current paper. It is not the mechanism of ascription used by humans to "perform linguistic output"; it is rather a simpler but functional means whereby SVG pictures can be communicated to humans via language. In this light the ascription mechanism described here may be seen as an accessibility option provided to people (and other programs) who wish to use it.

SVG is a means of producing visuals, and that means the typical consumer of SVG might be considered to be those (human users) possessing eyesight. This assumption is considered an affront by some, for it may appear to exclude people who are typical in all but visual ability. Blind persons and those with visual impairment often have difficulty with visually oriented computer systems because those systems often do not have a means of access to the visual material other than displaying it on a screen or paper. SVG has a number of built-in features which can augment the purely visual aspect of an SVG "picture". These features, such as title and description, have been thoroughly covered by other people and will not be covered here. One of the built-in features of SVG is its ability to contain metadata, through means of the SVG metadata element. (Metadata may in fact be placed in certain other elements also, but this author considers that a dubious practice.) SVG metadata elements may contain Dublin Core RDF metadata, for example, where information such as author, creation date, title, etc. may be entered. Metadata may be contained inside the actual SVG picture file or it may be stored externally (external to a given SVG file) and referenced from an SVG "picture".

This author has, for example, produced an SVG picture which was used for the SVG Metadata BE test on file at the W3C, which contained substantial metadata.

Metadata is mentioned here because it is a means of representing the knowledge required for any non-trivial linguistic system to operate successfully. It is not possible for a non-trivial linguistic system to operate in a knowledge vacuum.

Clearly there are two ways of providing a linguistic capability in an SVG system. The first way is to simply place all of the text comprising the linguistic output into SVG Title and Description elements, etc. All that is then required to have apparently linguistic output ability is to have an XSLT, SAX, Java, JPython or other "scanner" "walk" the SVG file (the "picture"), locate these text items and output them somewhere, perhaps performing some formatting as well. This is not the kind of language capability that this paper addresses, although this first method is workable in many cases. It suffers from a dependence on the creator of the SVG picture placing USEFUL text in the correct places in the SVG file. Not all SVG picture makers have the time, patience or inclination to produce and place such text.
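A minimal sketch of this first method follows, assuming a local SVG file and Java's standard SAX parser. The class name, output format and file handling here are illustrative only, not code from the system described in this paper.

    // Sketch: walk an SVG file and print whatever text the author
    // placed in title and desc elements.
    import java.io.File;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class TitleDescScanner extends DefaultHandler {
        private boolean inText = false;   // true while inside a title or desc

        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            if (qName.equals("title") || qName.equals("desc")) {
                inText = true;
                System.out.print(qName + ": ");
            }
        }

        public void endElement(String uri, String local, String qName) {
            if (qName.equals("title") || qName.equals("desc")) {
                inText = false;
                System.out.println();
            }
        }

        public void characters(char[] ch, int start, int length) {
            if (inText) {
                System.out.print(new String(ch, start, length).trim());
            }
        }

        public static void main(String[] args) throws Exception {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new File(args[0]), new TitleDescScanner());
        }
    }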

The second way to provide a linguistic capability in an SVG system is to provide a program which is capable of performing the necessary parts of the ascription system described earlier. This paper is about such an ascription system. The SVG file (which I will colloquially refer to, variously, as "(SVG) code", "the (SVG) picture", and "the SVG file") is actively scanned and a language response is generated as a result. The ascription mechanism described in this paper is far simpler than the one used by humans to create their language output. What is shown is that the ascription mechanism has "some understanding" of the nature of the material being "transcoded" into language. Language cannot be created in a vacuum. The knowledge used by the programs discussed in this paper, which replaces that vacuum, is found by examination of 1) the SVG picture itself, 2) metadata about the picture (which may be embedded in the picture, or may be elsewhere, perhaps in a knowledge base) and 3) other knowledge which might not be considered metadata per se, such as knowledge of spatiality, change (time and motion), and language competencies (syntax, vocabularies, semantics, pragmatics).

SVG "picture" content consists of both the actual vector graphic elements that are in the SVG picture (file) and what the picture "is about", its meaning / semantics. In this paper we examine both types of content and see how computer programs recognize each. The former are explicit ('in the file'); the latter may be contained in Title or Description elements in an SVG file, or may be described in the metadata one way or another by RDF, DAML+OIL, XTM, or XGMML. It is also possible that 'what the picture is about' (i.e. its semantics / meaning) is NOT AT ALL explicitly described in the SVG file itself. (We look at 'external' metadata in this paper, too.)

This paper covers two uses of linguistic text: 1) generating English language text as output from the system, and 2) using English language sentences as input to the system.

While no details are presented here about use of spoken natural English, one should be aware that there are available text-to-speech computer programs and speech input programs. An example of such programs is that of Dragon Systems, who have a Continuous Speech Recognition program commercially available for ordinary personal computers. This program also has text-to-speech capabilities.

Text-to-speech programs input ordinary text from the keyboard or as data input from other programs and generate usable quality spoken output of those words. (The speech quality no longer sounds droning and mechanical.)

A text-to-speech program is able to input the text generated by our SVG linguistic program and produce an audible spoken rendition of those words immediately. The value to visually disadvantaged persons is obvious.

A continuous speech input program is able to receive spoken sentences in lieu of typing (them). In this way the SVG linguistic system is able to receive text input yet the input itself was speech. A visually disadvantaged person is able to speak English sentences as input to the system rather than be constrained to typing them in with a keyboard. Such sentences may be questions about the picture being displayed or some other picture, a spoken search request for a picture, or may be the updating of the vocabulary knowledge of the system.

Now we will turn to looking at generating linguistic output from SVG pictures. Two SVG pictures are used as examples, downarrow and barchart.

After glimpsing the SVG picture called downarrow, we immediately see that it consists of several black lines and that they are oriented in particular ways. Some people will recognize the funnel shape quite quickly, and almost everyone will see the three downward-pointing arrows. We see the arrows because of the arrowheads, a (culturally) well-known diagram signature.

A non-complex XSLT (linguistic output) program, or a SAX-based Java or JPython program, scanning the SVG code (below) which produces this picture

<svg width="9in" height="7in" viewBox="0 0 255 201">
         <g id="leftfunnelside">
             <path d="M 17 11 L 99 97 99 197 "  
              style="fill:none; stroke:black; stroke-width:3"/>
           </g>
           
         <g id="rightfunnelside">
             <path d="M 225 11 L 153 97 153 197 " 
              style="fill:none; stroke:black; stroke-width:3"/>
           </g>
           
         <g id="downarrow1">
             <path d="M 108 1 L 108 147 98 129 M 108 147 
                      L 116 129 "
              style="fill:none; stroke:black; stroke-width:1"/>
           </g>
           
         <g id="downarrow2">
             <path d="M 127 1 L 127 147 117 129 M 127 147 
                      L 135 129 "
              style="fill:none; stroke:black; stroke-width:1"/>
           </g>
         <g id="downarrow3">
             <path d="M 142 1 L 142 147 132 129 M 142 147 
                      L 150 129 "
              style="fill:none; stroke:black; stroke-width:1"/>
           </g>
</svg>

would, in simple default context mode, report the following things as being explicitly present in the file itself:

"The picture is 9 inches wide and 7 inches in height. The viewbox is 0 0 by 255 201. There are five groups, they are called leftfunnelside, rightfunnelside, downarrow1, downarrow2, downarrow3. Each named group contains a path."

Well, this isn't very interesting English, even though it is accurate and reflects what is actually there in the picture. It's not that the English is wrong or misleading, but rather that it is too slavishly detailed; it is too "low level" to be of interest to most folks. There may be instances where this information is actually wanted, so we would program our system so that the SVG picture viewer (the person) could tell it that such an output context is desired. The user might speak or type a query to the system, "Tell me the size of the picture and its parts."

The default English output for a picture diagram of this sort would preferably be of a higher cognitive level, that is, more abstract, speaking to the ganzfeld of the picture rather than simply anglifying the SVG elements actually present in the picture. A viewer of the Mona Lisa wants to hear that the picture is of the Mona Lisa, not details about all the brush strokes that constitute it.

By analyzing the actual SVG elements (using computer programs), including their attributes and respective values, one can garner a higher level English description. For example, the SVG file named downarrow can have (a slightly higher level) English output such as:

"The groups downarrow1, 2, 3 are inside the thicker-lined leftfunnelside and rightfunnelside groups. "

This isn't really exciting English either, but it is a little more sophisticated. The more sophisticated parts are "inside..." and "thicker-lined...". While these two pieces of conceptual information are trivially obvious to you, the human viewer of the downarrow file, remember that we are talking about the "perception" of computer programs, not the 10-billion-neuron supercomputer behind your eyeball!

Briefly, the linguistic system knows to use the term "inside" because examination of the actual SVG code in downarrow by a program-based analyser determines that the paths (co-ordinates) of the SVG path elements do not touch or cross the lines of the two groups leftfunnelside and rightfunnelside. The analyser program also detects that those two groups' strokes are 3 units wide while the others are only 1 unit wide, and this "detection" can be translated into the "perception" "thicker".
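A highly simplified sketch of these two detections follows. It assumes each group's path has already been reduced to an extent (bounding box) plus the stroke-width from its style attribute; the real analyser tests the path co-ordinates themselves for touching and crossing, and all names here are illustrative.

    // Sketch of the "inside" and "thicker" perceptions.
    public class SpatialSketch {
        static class Extent {
            double minX, maxX, minY, maxY, strokeWidth;
            Extent(double minX, double maxX, double minY, double maxY, double w) {
                this.minX = minX; this.maxX = maxX;
                this.minY = minY; this.maxY = maxY; this.strokeWidth = w;
            }
        }

        // Crude stand-in for "inside the funnel": a's horizontal extent
        // lies between b's left-most and right-most co-ordinates.
        static boolean betweenHorizontally(Extent a, Extent b) {
            return a.minX >= b.minX && a.maxX <= b.maxX;
        }

        // "thicker": a's stroke is strictly wider than b's.
        static boolean thicker(Extent a, Extent b) {
            return a.strokeWidth > b.strokeWidth;
        }

        public static void main(String[] args) {
            // Numbers taken from the downarrow file.
            Extent downarrow1 = new Extent(98, 116, 1, 147, 1);
            Extent funnel     = new Extent(17, 225, 11, 197, 3);
            System.out.println(betweenHorizontally(downarrow1, funnel)); // true
            System.out.println(thicker(funnel, downarrow1));             // true
        }
    }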

Yes, these two little perceptions are screamingly obvious to our eyes, but remember that the computer program cannot "look" at the actual image on the screen and certainly does not have the benefit of the millions of neurons in the visual processing system of our eyes, which precede the actual brain; the brain receives a highly preprocessed signal from the eyes.

A slightly more sophisticated version of the preceding output might be:

"There are three downward pointing arrows, inside a funnel."

Notice that that text is not contained anywhere inside the SVG picture file. It is not merely read out of the picture and displayed; the English text of that sentence has to be generated. Notice too that there is no funnel in the actual picture, nor are there three downward-pointing arrows. Look at the code.

"Three" is simply a count of like things, "arrows" is a perception which is assembled from what's actually in the picture. What's actually in the picture is five paths. There is no funnel and there are no arrows in the picture. What is an "arrow" is something that is perceived, it is inferred. The inference is made by recognizing that part of the path of each of the three downarrow elements constitutes a detectable "signature". In this case it is the signature of an arrowhead. This particular signature is subtle because it is part of a single SVG element and not a separate SVG element or group of elements.

"Funnel" is recognized from the text of the "id" in two of the groups and the knowledgebase contains knowledge of (visual) objects such as funnel. "Funnel" can also be dealt with strictly as a symetrical (visual) object found in the picture, and which has visual objects situated inside its boundaries.

"Downward" is a linguification of the axis orientation of the picture. The origin is at 0,0 or upper left hand corner. "Downward" is an ascription of the general vector or change in axis magnitude from y=0 toward y=129. In this case "downward" is tied to the visual metaphor of "up and down" discussed by Lakoff in his work "Metaphors We Live By". For a more complete understanding of how the computer code works that does this please look at the program code which accompanies this paper and also at my chapters in the WROX book "Professional XML Meta Data", where the code and process it performs is explained. The process is inference, not simply reading some canned text from the SVG picture.

One can see that, for certain kinds of pictures, a top-down left-to-right grammar (LR grammar) can be used to guide an SVG code analysis program as to what to detect ("what to look for").

A simple LR grammar:

HFACE := FACE + SMILE
FACE  := CIRCLE + CIRCLE + CIRCLE
SMILE := ARC

AFACE := FACE + FROWN
FROWN := ARC

Those simple little grammars can be used to infer the presence of a "happy face" or an "angry face" SVG drawing. (In order to detect the smile in the Mona Lisa, the digitized painting would first have to be processed with an edge-following routine, to vectorize the eyes and mouth.)
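A minimal sketch of such grammar-driven inference follows. It assumes lower-level detectors have already labelled the primitives (CIRCLE, ARC, and so on) and it ignores the spatial configuration tests a real detector would add (the arc must lie inside the face circle, below the eyes); all names are illustrative.

    // Sketch: rewrite a bag of detected labels bottom-up until no rule
    // fires. AFACE and FROWN rules would be added in the same way.
    import java.util.ArrayList;
    import java.util.List;

    public class PictureGrammar {
        record Rule(String head, List<String> body) {}

        static final List<Rule> RULES = List.of(
            new Rule("FACE",  List.of("CIRCLE", "CIRCLE", "CIRCLE")),
            new Rule("SMILE", List.of("ARC")),
            new Rule("HFACE", List.of("FACE", "SMILE")));

        // Remove one occurrence of each body symbol; on success, add the head.
        static boolean tryApply(List<String> work, Rule r) {
            List<String> copy = new ArrayList<>(work);
            for (String s : r.body()) {
                if (!copy.remove(s)) return false;   // a constituent is missing
            }
            copy.add(r.head());
            work.clear();
            work.addAll(copy);
            return true;
        }

        static List<String> parse(List<String> symbols) {
            List<String> work = new ArrayList<>(symbols);
            boolean fired = true;
            while (fired) {
                fired = false;
                for (Rule r : RULES) {
                    while (tryApply(work, r)) fired = true;
                }
            }
            return work;
        }

        public static void main(String[] args) {
            // Three circles and an arc reduce to a happy face.
            System.out.println(
                parse(List.of("CIRCLE", "CIRCLE", "CIRCLE", "ARC")));  // [HFACE]
        }
    }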

By having an English sentence input system available to the SVG user, a description can be typed in. This description can be parsed into an XGMML graph. Such a graph can be examined by an SVG picture builder and the appropriate SVG elements put together into an SVG program. Here are some example input sentences.

"Define an angry face has small open eyes and small open mouth or closed mouth." "Define a happy face has medium open eyes and medium open mouth." "Define a happy face as (being) a circle as a head, two smaller circles inside (head) as eyes, and a small rectangle inside the head centered below the eyes as a mouth."

In order for the system to build a usable SVG program from those sentences, the system would have to have been told what "eyes" and "mouth" are in terms of visual aspect, such as arc or circle, etc.
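As a sketch of the minimum such vocabulary (the names and primitive choices are illustrative):

    // Sketch: map face-part words to the SVG primitive used to draw them.
    import java.util.Map;

    public class PartVocabulary {
        static final Map<String, String> PART_TO_PRIMITIVE = Map.of(
            "head",  "circle",
            "eye",   "circle",
            "mouth", "rect",    // per the third example sentence above
            "smile", "path");   // an arc drawn with a path element

        public static void main(String[] args) {
            System.out.println("eye -> " + PART_TO_PRIMITIVE.get("eye"));
        }
    }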

As we see in this paper, linguistic processing is done in concert with one or more domain understanding systems. A major part of the ability to assign relevant linguistic items to pictures is that the processor which examines the picture data does so in a non-random, hopefully even optimized, fashion. This focussing is augmented strongly by use of a context mechanism which here will be called Point Of View. (POV herein is not to be confused with mere personal opinion.)

POV

1-> horizontal LR parsing works for some instances:

English has a syntax and consequently a writing direction which is left-to-right. It is a culturally embedded syntactic directionality. This LR disposition is also often reflected in the diagrams and drawings we make, and hence left-to-right direction of the focus of picture scanning and evaluation is a reasonable first attempt at detecting the actual "grammar" or "flow" used by, or intended in, a diagram or drawing.

2-> top-down LR

In the case where there is, or appears to be, more than one row of content in a picture, top-down left-to-right evaluation of the visual is quite reasonable (a sketch of this ordering appears after this list).

3-> outside-in or inside-out

Some diagrams are organized in a radial fashion rather than in the linear fashion of lines of writing. In the case of a radially oriented diagram or picture the syntax (direction of flow) is either outside-in or inside-out. This kind of point of view is obviously successful with material such as bulls-eyes, radar screens, polar graphs and others.
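The sketch below illustrates point of view 2, top-down left-to-right, as a sort order imposed on element positions before scanning. The row tolerance and all names are illustrative; a robust version would bucket elements into rows first rather than rely on a tolerance inside the comparator.

    // Sketch: order elements into reading order, earlier rows first and
    // left to right within a row.
    import java.util.Comparator;
    import java.util.List;

    public class ScanOrder {
        record Placed(String id, double x, double y) {}

        static Comparator<Placed> topDownLeftRight(double rowTolerance) {
            return (a, b) -> {
                if (Math.abs(a.y() - b.y()) > rowTolerance) {
                    return Double.compare(a.y(), b.y());   // earlier row first
                }
                return Double.compare(a.x(), b.x());       // then left to right
            };
        }

        public static void main(String[] args) {
            List<Placed> items = new java.util.ArrayList<>(List.of(
                new Placed("line2", 60, 140),
                new Placed("line1", 40, 160),
                new Placed("text1", 37, 210)));
            items.sort(topDownLeftRight(25));
            items.forEach(p -> System.out.println(p.id()));  // line1 line2 text1
        }
    }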

Our next example is the SVG file barchart; look at its picture using an SVG viewer, such as Adobe's plug-in for Internet Explorer. The SVG code for barchart is listed below:

<?xml version="1.0" standalone="yes" ?>
    <svg xmlns = 'http://www.w3.org/2000/svg'>
    <metadata xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:rdfs="http://www.w3.org/TR/. ..-schema#" 
              xmlns:daxsvg="http://www.openmeta.org/daxschema/" >
      <rdf:Description about="#text1">
         <daxsvg:Below resource="#xbaseline"/>
      </rdf:Description>
      <rdf:Description about="#text1">
         <daxsvg:IsNear resource="#xbaseline" />
      </rdf:Description>
      <rdf:Description about="#text2">
         <daxsvg:Below resource="#text1"/>
      </rdf:Description>
      <rdf:Description about="#text2">
         <daxsvg:IsNear resource="#text1" />
      </rdf:Description>
      <rdf:Description about="#endlineleft">
         <daxsvg:AtRight resource="#line1"/>
      </rdf:Description>
      <rdf:Description about="#endlineleft">
         <daxsvg:IsNear resource="#line1" />
      </rdf:Description>
      <rdf:Description about="#endlineright">
         <daxsvg:AtLeft resource="#bar13"/>
      </rdf:Description>
      <rdf:Description about="#endlineright">
         <daxsvg:IsNear resource="#bar13" />
      </rdf:Description>
      <rdf:Description about="#line1">
         <daxsvg:AtRight resource="#line2" />
      </rdf:Description>
      <rdf:Description about="#line2">
         <daxsvg:AtRight resource="#line3" />
      </rdf:Description>
      <rdf:Description about="#line3">
         <daxsvg:AtRight resource="#line4" />
      </rdf:Description>
      <rdf:Description about="#line4">
         <daxsvg:AtRight resource="#line5" />
      </rdf:Description>
      <rdf:Description about="#line5">
         <daxsvg:AtRight resource="#line6" />
      </rdf:Description>
      <rdf:Description about="#line6">
         <daxsvg:AtRight resource="#line7" />
      </rdf:Description>
      <rdf:Description about="#line7">
         <daxsvg:AtRight resource="#line8" />
      </rdf:Description>
      <rdf:Description about="#line8">
         <daxsvg:AtRight resource="#line9" />
      </rdf:Description>
      <rdf:Description about="#line9">
         <daxsvg:AtRight resource="#line10" />
      </rdf:Description>
      <rdf:Description about="#line10">
         <daxsvg:AtRight resource="#line11" />
      </rdf:Description>
      <rdf:Description about="#line11">
         <daxsvg:AtRight resource="#line12" />
      </rdf:Description>
    </metadata>
    <rect id="lineval18" x="37" y="190" width="280" height="1" style="stroke:black; stroke-width:1" />
    <text id="text3" x="317" y="194"
       style="font-family:Verdana; font-size:12.333; fill:indigo">
    18
    </text>
    <rect id="xbaseline" x="37" y="200" width="329" height="1" style="stroke:blue; stroke-width:1" />
    <rect id="endlineright" x="333" y="96" width="1" height="104" style="stroke:black; stroke-width:1" />
    <rect id="endlineleft" x="37" y="96" width="1" height="104" style="stroke:black; stroke-width:1" />
    <rect id="line1" x="40" y="160" width="20" height="40" style="stroke:green; fill:green; stroke-width:0" />
    <rect id="line2" x="60" y="140" width="20" height="60" style="stroke:yellow; fill:yellow; stroke-width:0" />
    <rect id="line3" x="80" y="111" width="20" height="89" style="stroke:red; fill:red; stroke-width:0" />
    <rect id="line4" x="100" y="130" width="20" height="70" style="stroke:yellow; fill:yellow; stroke-width:0" />
    <rect id="line5" x="120" y="173" width="20" height="27" style="stroke:green; fill:green; stroke-width:0" />
    <rect id="line6" x="140" y="191" width="20" height="09" style="stroke:green; fill:green; stroke-width:0" />
    <rect id="line7" x="160" y="140" width="20" height="60" style="stroke:yellow; fill:yellow; stroke-width:0" />
    <rect id="line8" x="180" y="167" width="20" height="33" style="stroke:green; fill:green; stroke-width:0" />
    <rect id="line9" x="200" y="175" width="20" height="25" style="stroke:green; fill:green; stroke-width:0" />
    <rect id="line10" x="220" y="129" width="20" height="71" style="stroke:yellow; fill:yellow; stroke-width:0" />
    <rect id="line11" x="240" y="150" width="20" height="50" style="stroke:green; fill:green; stroke-width:0" />
    <rect id="line12" x="260" y="139" width="20" height="61" style="stroke:yellow; fill:yellow; stroke-width:0" />
    <rect id="line13" x="280" y="125" width="20" height="75" style="stroke:yellow; fill:yellow; stroke-width:0" />
    <text id="text1" x="37" y="210"
       style="font-family:Verdana; font-size:12.333; fill:black">
    87  88  89  90  91  92  93  94  95  96  97  98  99
    </text>
    <text id="text2" x="37" y="230"
       style="font-family:Verdana; font-size:12.333; fill:brown">
    Mean High Ratings August 1999
    </text>
    </svg>

This SVG program consists of a collection of SVG rectangles and text lines, preceded by a metadata element containing RDF statements. The RDF statements use the daxsvg RDFSchema to describe relationships of named objects. The names are in the form of XML "id" attributes.

Some of the relationships represented in the system are:

Above, Below, AtRight, AtLeft, Beside, Behind, Higher, Lower, Near, Far, Inside, Outside, Convex, Concave, Straight, Curved, Circle, Rectangle, Path, Animate.

In the daxsvg RDFSchema these terms and others are used to represent their English word equivalents (in so far as RDF scope allows).

Next are some of the daxsvg RDFSchema items used for inferencing the spatial perceptions in the above SVG picture.

<rdf:Property ID="AtRight">
    <rdfs:comment>has a degree of to the right (by value). (uses context) g15(x)</rdfs:comment>
    <rdfs:range rdf:resource="#www.openmeta.org/2004/AtRight(x)"/>
    <rdfs:domain rdf:resource="#SvgEntity" />
</rdf:Property>

<rdf:Property ID="AtLeft">
    <rdfs:comment>has a degree of to the left (by value). (uses context) g16(x)</rdfs:comment>
    <rdfs:range rdf:resource="#www.openmeta.org/2004/AtLeft(x)"/>
    <rdfs:domain rdf:resource="#SvgEntity" />
</rdf:Property>

<rdf:Property ID="IsNear">
    <rdfs:comment>has a degree of nearness (by value). g1(x)</rdfs:comment>
    <rdfs:range rdf:resource="#www.openmeta.org/2004/IsNear(x)"/>
    <rdfs:domain rdf:resource="#SvgEntity" />
</rdf:Property>

This is also how the W3C Linearization system works; it uses its own RDFSchema vocabulary.

This author's system goes beyond that approach, however, by also incorporating a means of having the effect of a parameterization on the relationship, instead of it being solely true or false as in RDF. The parameterization is accomplished using Lotfi Zadeh's Fuzzy Set Theory linguistic variables. The linguistic variables occur in a context, which is explained in this paper and also in my WROX book Professional XML Meta Data. Code and data associated with this paper, explaining situated linguistic variables and the parameterization of RDF predicates, are available.
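To illustrate the idea (a sketch only: the triangular membership function, its horizon parameter and the hedge breakpoints are assumptions for illustration, not this system's actual calibration), a predicate such as IsNear can return a degree in the interval [0,1] which is then ascribed an English hedge:

    // Sketch: IsNear as a fuzzy linguistic variable instead of a boolean.
    public class FuzzyNear {
        // Membership in "near": 1.0 at distance 0, falling linearly to
        // 0.0 at the context-supplied horizon (e.g. the picture's width).
        static double isNear(double distance, double horizon) {
            if (distance <= 0) return 1.0;
            if (distance >= horizon) return 0.0;
            return 1.0 - distance / horizon;
        }

        // Ascribe an English hedge to the degree.
        static String hedge(double degree) {
            if (degree > 0.8) return "very near";
            if (degree > 0.5) return "near";
            if (degree > 0.2) return "not far from";
            return "far from";
        }

        public static void main(String[] args) {
            double d = isNear(20, 329);   // a 20-unit gap on a 329-unit baseline
            System.out.println(hedge(d)); // very near
        }
    }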

Next we see some Java code snippets used to produce the "English" output lines shown following the code listing. The code shown is purposely incomplete, but enough is shown to convey the desired concept. The purpose of showing the source code here is to show (to people able to understand programming code) HOW the language system is able to detect ("perceive") instances of visual spatial concepts such as contains, left of, right of, above, below, near, far, and many others. Notice that the code shown next does not show how the concept recognizers build and use context, such as the canvas defaulted or declared in the SVG file (for example, determining which direction is down in terms of SVG x,y axes), query bracketing, and inter-SVGobject context. Inspecting the code immediately below makes it clear that generating non-tedious yet accurate English requires non-trivial programming.

for (int i= 0; i < relationCount; i++ )
                {
                    if (relation.compareTo(Relations[i] )== 0 )
                    {
                      switch (i )
                      {
                        case 0:
                            if ( Containz(x1, x2, y1, y2, r1, r2) )
                            {
                                System.out.println("Object " + obj1 + " contains object " + obj2);
                            }
                            else
                            {
                                System.out.println("Object " + obj1 + " does not contain object " + obj2);
                                
                            }
                            break;

                        case 1:
                            if ( Center(x1, x2, y1, y2, r1, r2) )
                            {
                                System.out.println("Object " + obj1 + " is center of object " + obj2);
                            }
                            else
                            {
                                System.out.println("Object " + obj1 + " is not center of object " + obj2);
                                
                            }
                            break;

                        case 2:
                            if ( AtRight(x1, x2  ) )
                            {
                                System.out.println("Object " + obj1 + " is at right of object " + obj2);
                            }
                            else
                            {
                                System.out.println("Object " + obj1 + " is not at right of  object " + obj2);
                                
                            }
                            break;

                        case 3:
                            if ( AtLeft(x1, x2 ) )
                            {
                                System.out.println("Object " + obj1 + " is at left of object " + obj2);
                            }
                            else
                            {
                                System.out.println("Object " + obj1 + " is not at left of object " + obj2);
                                
                            }
                            break;

                        case 4:
                            if ( IsAbove( y1, y2  ) )
                            {
                                System.out.println("Object " + obj1 + " is above object " + obj2);
                            }
                            else
                           {
                                System.out.println("Object " + obj1 + " is not above object " + obj2);
                                
                            }
                            break;

                        case 5:
                            if ( Below( y1, y2  ) )
                            {
                                System.out.println("Object " + obj1 + " is below object " + obj2);
                            }
                            else
                            {
                                System.out.println("Object " + obj1 + " is not below object " + obj2);
                                
                            }
                            break;

                        case 6:
                            if (IsNear( x1, x2 ) )
                            {
                                System.out.println("Object " + obj1 + " is near object " + obj2);
                            }
                            else
                            {
                                System.out.println("Object " + obj1 + " is not near object " + obj2);
                            }
                            break;

                      }//switch
                   }//if
                   
                }//for
             }//while
             
        }//doResults
        
        
            // Containz: true when circle 1 and circle 2 share a centre
            // and circle 1 has the larger radius.
            public boolean Containz( int x1, int x2, int y1, int y2,  int r1, int r2)
            {
                if ( (r1 == 0) || (r2 == 0) )
                { return false; }
                
                if ( (x1 == x2 ) && (y1 == y2) && (r1 > r2 )  )
                { return true;}
                else
                {return false;}
            }
            
            // Stub: the full centring test is purposely omitted here.
            public boolean Center( int x1, int x2, int y1, int y2,int r1, int r2 )
            {
                return false;
            }
            
            public boolean AtRight( int x1, int x2 )
            {
                if (x1 > x2 )
                {
                    return true;
                }
                else
                {
                    return false;
                }
            }
            
            public boolean AtLeft( int x1, int x2 )
            {
                 if (x1 < x2 )
                {
                    return true;
                }
                else
                {
                    return false;
                }
            }
                
            
            public boolean Below( int y1, int y2 )
            {
                 if (y1 > y2 )
                {
                    return true;
                }
                else
                {
                    return false;
                }
            }
            
            public boolean IsAbove( int y1, int y2 )
            {
                 if (y1 < y2 )
                {
                    return true;
                }
                else
                {
                    return false;
                }
            }
            
            // Stub: nearness needs context (for example the canvas size);
            // a fuzzy, parameterized version is discussed earlier.
            public boolean IsNear( int x1, int x2 )
            {
                return false;
            }

Each of the SVG elements which appear visually has an id. These ids are used by the metadata processing as handles, but the text which constitutes those ids, generally speaking, has no meaning to the processor.

Following is the output from an example run of a Java coded program which reads RDF statements and outputs the findings in "English". (Notice the quote marks.)

main (19:23:22): Object circle4 contains object circle3
   main (19:23:22): Object circle4 is not center of object circle3
   main (19:23:22): Object circle4 is not at right of  object circle3
   main (19:23:22): Object circle4 is not at left of object circle3
   main (19:23:22): Object circle4 is not above object circle3
   main (19:23:22): Object circle4 is not below object circle3
   main (19:23:22): Object circle3 contains object circle2
   main (19:23:22): Object circle3 is not center of object circle2
   main (19:23:22): Object circle3 is not at right of  object circle2
   main (19:23:22): Object circle3 is not at left of object circle2
   main (19:23:22): Object circle3 is not above object circle2
   main (19:23:22): Object circle3 is not below object circle2
   main (19:23:22): Object circle2 contains object circle1
   main (19:23:22): Object circle2 is not center of object circle1
   main (19:23:22): Object circle2 is not at right of  object circle1
   main (19:23:22): Object circle2 is not at left of object circle1
   main (19:23:22): Object circle2 is not above object circle1
   main (19:23:22): Object circle2 is not below object circle1
   main (19:23:22): Object circle1 is not center of object circle2
   main (19:23:22): Object circle1 is not at right of  object circle2
   main (19:23:22): Object circle1 is not at left of object circle2
   main (19:23:22): Object circle1 is not above object circle2
   main (19:23:22): Object circle1 is not below object circle2
   main (19:23:22): Object line1 is not at right of  object line2
   main (19:23:22): Object line1 is at left of object line2
   main (19:23:22): Object line1 is not above object line2
   main (19:23:22): Object line1 is not below object line2
   main (19:23:22): Object line2 is not at right of  object line3
   main (19:23:22): Object line2 is at left of object line3
   main (19:23:22): Object line2 is not above object line3
   main (19:23:22): Object line2 is not below object line3
   main (19:23:22): Object line3 is not at right of  object line4
   main (19:23:22): Object line3 is at left of object line4
   main (19:23:22): Object line3 is not above object line4
   main (19:23:22): Object line3 is not below object line4
   main (19:23:22): Object line4 is not at right of  object line5
   main (19:23:22): Object line4 is at left of object line5
   main (19:23:22): Object line4 is not above object line5
   main (19:23:22): Object line4 is not below object line5
   main (19:23:22): Object line5 is not at right of  object line6
   main (19:23:22): Object line5 is at left of object line6
   main (19:23:22): Object line5 is not above object line6
   main (19:23:22): Object line5 is not below object line6
   main (19:23:22): Object line6 is not at right of  object line7
   main (19:23:22): Object line6 is at left of object line7
   main (19:23:22): Object line6 is not above object line7
   main (19:23:22): Object line6 is not below object line7
   main (19:23:22): Object line7 is not at right of  object line8
   main (19:23:22): Object line7 is at left of object line8
   main (19:23:22): Object line7 is not above object line8
   main (19:23:22): Object line7 is not below object line8
   main (19:23:22): Object line8 is not at right of  object line9
   main (19:23:22): Object line8 is at left of object line9
   main (19:23:22): Object line8 is not above object line9
   main (19:23:22): Object line8 is not below object line9
   main (19:23:22): Object line9 is not at right of  object line10
   main (19:23:22): Object line9 is at left of object line10
   main (19:23:22): Object line9 is not above object line10
   main (19:23:22): Object line9 is not below object line10
   main (19:23:22): Object line10 is not at right of  object line11
   main (19:23:22): Object line10 is at left of object line11
   main (19:23:22): Object line10 is not above object line11
   main (19:23:22): Object line10 is not below object line11
   main (19:23:22): Object line11 is not at right of  object line12
   main (19:23:22): Object line11 is at left of object line12
   main (19:23:22): Object line11 is not above object line12
   main (19:23:22): Object line11 is not below object line12
   main (19:23:22): Object line12 is not at right of  object bar13
   main (19:23:22): Object line12 is at left of object bar13
   main (19:23:22): Object line12 is not above object bar13
   main (19:23:22): Object line12 is not below object bar13
   main (19:23:22): Object text1 is below object xbaseline
   main (19:23:22): Object text2 is below object text1
   main (19:23:22): Object endlineleft is not at right of  object line1
   main (19:23:22): Object endlineleft is at left of object line1
   main (19:23:22): Object endlineleft is not above object line1
   main (19:23:22): Object endlineleft is not below object line1
   main (19:23:22): Object endlineright is not at left of object bar13
   main (19:23:22): Object endlineright is not above object bar13

While technically accurate and understandable, this would be considered miserable output by most SVG picture users. The example is given because it shows the limitations of simple code in terms of English sentence "quality".

English language sentences can be input to the system and "understood" in the sense that if the topic is about SVG code language (sentences containing terms like "translate") or about visual relationships ("spatiality", such as "to the left of"), then the processor can generally get the gist of them.

The sentences that this type of system can understand can be typed or spoken, as explained above. Sentences with topics and words from other domains, such as "The Rolling Stones" music, would not be understood. There is no universal domain language processor available yet; all of them have a limited number of domains within which their performance is of acceptable quality. This processor is set up to have knowledge of SVG code and graphics, and nothing else.

The current system uses a particular grammar developed by Russell Suereth:

Copyright (c) 1996 Russell Suereth
   DETR-ADJV-NOUN                   NP
   DETR-NOUN                        NP
   DETR-NAME                        NP
   ADJV-NOUN                        NP
   NOUN                             NP
   NAME                             NP
   PRON                             NP
   DETR                             NP
   AUXL-AUXL-AUXL-ADVB-ADVB-VERB    VP
   AUXL-AUXL-ADVB-ADVB-VERB         VP
   AUXL-AUXL-AUXL-ADVB-VERB         VP
   AUXL-ADVB-ADVB-VERB              VP
   AUXL-AUXL-ADVB-VERB              VP
   AUXL-AUXL-AUXL-VERB              VP
   AUXL-AUXL-VERB                   VP
   AUXL-ADVB-VERB                   VP
   ADVB-ADVB-VERB                   VP
   AUXL-AUXL-AUXL                   VP
   AUXL-VERB                        VP
   ADVB-VERB                        VP
   AUXL-AUXL                        VP
   VERB                             VP
   AUXL                             VP
   PREP                             PP
   ADJV-ADVB-ADJV                   AP
   ADJV-ADVB                        AP
   ADVB-ADJV                        AP
   ADVB-ADVB                        AP
   ADJV                             AP
   ADVB                             AP
   WHQU                             WH

This snippet of grammar shows the sequences of parts of speech which constitute each phrase type; for example, the line "PREP    PP" says that a PREPosition constitutes a Prepositional Phrase.
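As a sketch of how such a table might be applied (the excerpt and all names below are illustrative, not Suereth's actual code), a matcher can try each part-of-speech pattern against the front of a tagged word sequence, longest pattern first, exactly as the table is ordered:

    // Sketch: match the front of a tag sequence against phrase patterns.
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class PhraseMatcher {
        // A small excerpt of the table; insertion order encodes priority.
        static final Map<String, String> PHRASES = new LinkedHashMap<>();
        static {
            PHRASES.put("DETR-ADJV-NOUN", "NP");
            PHRASES.put("DETR-NOUN", "NP");
            PHRASES.put("ADJV-NOUN", "NP");
            PHRASES.put("NOUN", "NP");
            PHRASES.put("AUXL-VERB", "VP");
            PHRASES.put("VERB", "VP");
            PHRASES.put("PREP", "PP");
        }

        // Return the phrase type of the first pattern matching tags[from...].
        static String match(String[] tags, int from) {
            for (Map.Entry<String, String> e : PHRASES.entrySet()) {
                String[] pat = e.getKey().split("-");
                if (from + pat.length > tags.length) continue;
                boolean ok = true;
                for (int i = 0; i < pat.length; i++) {
                    if (!pat[i].equals(tags[from + i])) { ok = false; break; }
                }
                if (ok) return e.getValue();
            }
            return "?";
        }

        public static void main(String[] args) {
            // "the red circle" tagged DETR ADJV NOUN
            System.out.println(match(new String[]{"DETR", "ADJV", "NOUN"}, 0));  // NP
        }
    }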

This grammar cannot handle all possible English grammar situations in the input sentences and is not intended to. The users of this input system are expected to be "cooperative". The system is able to indicate usefully when an input contains a grammatical construction which it does not understand, and the user may then adjust the input so that the processor will understand it.

This current language processor also uses a semantic analysis section which follows Suereth's SEMANTIC analysis approach but which has been extended beyond what his original system did. (Such extension is what he published he intended to be done with his code.)

Below is part of a header from the semantic subsystem; it gives an idea of the various stages that semantic analysis of the input sentences consists of.

/********************************************************************/
    /* SEMANTIC.H                                                       */
    /* This header file contains knowledge for semantic analysis.       */
    /*                               Copyright (c) 1996 Russell Suereth */
    /********************************************************************/
    
    /* Prototypes                                                       */
    void semantic_find_items(void);
    void semantic_find_subject(void);
    void semantic_find_action(void);
    void semantic_find_manner(void);
    void semantic_find_time(void);
    void semantic_find_object(void);
    void semantic_find_place(void);
    void semantic_different_sentences(void);
    void semantic_multiple_actions(void);
    void semantic_multiple_manners(void);
    /* end of file                                                      */

Notice that there is explicit provision for locating and handling words in the concept domains of time and space, and for finding not only the doer of an action but also the items used and the manner in which the action is done. This allows the user to type in a sentence which states something like "put the starburst in the lower left corner and make it burst quickly." The starburst is a named reference to some animation item the inputter described in some other sentence in the input paragraph.

The source of a C program (converse.c) associated with this paper is available. The converse program inputs text in the form of multiple sentences, analyses them and outputs a text response. Where an action is required, the user can type in a request such as a query about an SVG picture and the system provides an answer in text form. This sentence input system also has a modest ability to add new vocabulary or semantic information via sentence input. As with any computer program, the language processor is subject to garbage in, garbage out.

Here are some examples of what might be typed in:
