SVG as graphical metadata for distributed spatial data processing

SVG metadata

Martin Brändli, Jörg Sparenborg
Swiss Federal Institute WSL

Zürcherstrasse 111
CH-8903 Birmensdorf Switzerland
e-mail: martin.braendli@wsl.ch
e-mail: joerg.sparenborg@wsl.ch
fax: ++41 1 739 22 15
webpage: http://www.wsl.ch/

Keywords: Distributed database, web feature server, SVG metadata, data integration

Abstract

This paper proposes the use of Scalable Vector Graphics (SVG) as metadata describing graphical representation when spatial data is shared and exchanged via the Internet. We report on a project called "Virtual Database", which aims to integrate different databases of the Swiss Agency for the Environment, Forests and Landscape that store fauna and flora data. The architecture of the "Virtual Database" follows the principle of loose coupling of individual data components (databases). Every component provides data according to uniformly specified interfaces. These interfaces are implemented according to the Web Feature Service Implementation Specification proposed by the Open GIS Consortium. As a minimal requirement, such a service must be able to process requests concerning existing capabilities, feature type descriptions and feature data. Requests and responses are transferred as XML, in particular as GML for feature data.

The proposed Web Feature Service (WFS) is data-oriented and neglects graphical representation of spatial data. Since visualization is an important requirement for most spatial data handling applications, we propose to extend the basic capabilities of the WFS with the ability to share rendering information. For the implementation of the "Virtual Database" we specify an additional capability which allows graphical metadata to be queried and served. Graphics are defined using SVG elements, in particular symbols, markers, patterns, and the more general presentation attributes, embedded within a schema to express cartographic legends.

Introduction

The main task of the Department of Landscape Inventories of the Swiss Federal Research Institute for Forest, Snow and Landscape Research (WSL) is to study the current state of the landscape and to detect and simulate future developments. In collaboration with other research institutions and departments of the WSL, methods for the collection and long-term storage of data as well as models and analysis methods are being developed. A broad range of tools is used: database systems, geographical information systems (GIS), remote sensing systems, statistical analysis packages and numerous self-developed programs. A medium-term goal of the department is the development and establishment of an integrated environmental and landscape information system which offers a unifying platform for applying these diverse technologies to existing data. An information system in general consists of appropriately compiled data, methods and models to process the data, and usually the possibility to access external data sources. The goal of the integrated information system presented in this paper is a comprehensive platform which allows uniform access to distributed data and methods. It thus permits potentially arbitrary combinations of any available data and methods. The most important components of such a system are database systems (DBMS), geographical information systems (GIS), servers for access to methods and models, and a middleware which enables communication between and combination of the components (see fig. 1). Basing the middleware on Internet/Intranet technology allows general access to the different components.

Components

Figure 1: Components of an integrated environmental and landscape information system. The system can access distributed data repositories, different services for geoprocessing, and other data processing resources as well as external information sources.

The work presented in this paper reports on the design and implementation of an architecture for a Virtual Database which concentrates on the data component of the comprehensive environmental and landscape information system. The main goal of the Virtual Database is the integration of data from distributed data repositories in order to allow transparent access and analysis of available data. In particular, the Virtual Database has the aim of integrating different databases of the Swiss Agency for the Environment, Forests and Landscape storing fauna and flora data. The architecture of the Virtual Database follows the principle of loose coupling of individual data components (databases). Every component provides its data according to uniformly specified interfaces. An integration layer collects the data from the internet, integrates and processes them. Since the development is in an early stage, the integration is only implemented for the combined visualization of distributed data and allows for web mapping functionality. However, the proposed architecture of the Virtual Database shows a clear potential to be extended towards integration of methods which reside on distributed servers.

Rendering spatial data requires appropriate symbolization of spatial features. Within a clearly determined application with a defined set of data, the symbolization task may be part of the design process. In contrast, in a distributed environment with a high degree of freedom in the choice of data sets, symbolizing spatial data must be accomplished dynamically. The solution we propose in this paper is to use metadata for the graphical representation which may be accessed together with the spatial data. Graphical metadata is provided as a subset of Scalable Vector Graphics (SVG) features, embedded in a comprehensive schema for specifying legends for spatial features.

The paper proceeds as follows: Section 2 discusses some of the basics of the presented work. Section 3 sketches the architecture of the Virtual Database giving some insights into implementation issues. Section 4 shows the implementation details of using SVG as graphical metadata for the rendering of involved data. The paper concludes with the presentation of first results of the implementation of the Virtual Database.

Integration of web-based data repositories

This section addresses how geodata are currently made available via the Internet on the one hand, and the methods and technologies available for integrating data from different sources on the other. The use of the Intranet and the Internet for the transfer of geodata has increased substantially during the last few years, particularly due to the progress commercial software vendors have made in developing map server software. At the same time, several open source projects, such as MapServer (http://mapserver.gis.umn.edu/), have contributed to this development. However, most commercially available software packages as well as open source programs lack the functionality of a mature GIS. These programs focus on web mapping or web cartography methods which offer functionality for the use, distribution and production of maps by means of the Internet [1]. They mainly allow viewing operations on spatial data and the submission of simple queries. Methods for manipulation, analysis and complex queries of the data are limited because no data but only digital maps in the form of images are made available [2]. Current standardization efforts such as the initiatives by the Open GIS Consortium (OGC) support this type of geospatial data handling. OGC released the Web Map Service Implementation Specification (WMS) which standardizes the way in which maps are requested by clients and the way servers describe their data holdings [3]. Like all the other web mapping applications, this specification limits data access to the exchange of digital maps as images. It does not provide spatial analysis functions such as overlays.

The need for exchanging and sharing spatial data on the Internet in a way that goes beyond images and maps is well recognized, however. For instance, OGC recently released a request for comment on the Web Feature Service Implementation Specification (WFS, [4]). A web feature service is a program or module that implements support for query and optionally for transaction operations on web accessible spatial features. The specification enables exchange of spatial data in an XML-compliant way. In contrast to web mapping applications, however, such feature services are data-oriented and neglect the graphical representation of spatial data. Rendering hints may be useful, though, since many geospatial application fields such as geology have agreed-on cartographic symbols for mapping specific data. One way to exchange the symbolization of spatial data is the use of cartographic metadata [5], [6], [7]. Accessing both the spatial data and the cartographic metadata allows translation of the sources into an appropriate graphical representation. Both [6] and the OGC, with its OpenGIS draft candidate specification for a Styled Layer Descriptor [8], propose to use Scalable Vector Graphics (SVG) elements for the definition of map symbols.

The discussion of integrating distributed databases or data repositories is based on the requirements of the Virtual Database. The first demand concerns the homogenization of existing heterogeneities such as different data types, data models and software systems. The second requirement results from the need that the autonomy of databases which contribute to the Virtual Database must not be limited. Maintaining the autonomy of a database is important because particular applications that access the data may already exist.

Federated database systems (FDBS) are a technology which may be used to address the two requirements: "A federated database system is a collection of cooperating but autonomous component database systems." ([9], p. 183). Using FDBS, the integration task concentrates on the homogenization of the schemas (data models) of the different database systems. [9] developed an integration methodology which integrates the schemas of the involved database components based on a five-level hierarchy. The integration process maps the different local schemas into a unified global schema which applies to the federation. In contrast to FDBS, recently developed technologies such as Component Database Management Systems (CDBMS) not only support the schema integration approach but offer additional alternatives for database integration [10]. CDBMS built on database middleware are of particular interest for the implementation of the Virtual Database. The combination of database components is approached in a first step by defining a common format into which local formats must be translated. The second step consists of the specification of uniform interfaces which control data exchange and communication between the database components and the middleware. The set of required interfaces is implemented by means of wrappers which, for instance, are able to submit queries to the database components and handle the corresponding results [11].
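The two steps described above (a common format plus uniform interfaces implemented by wrappers) can be sketched as follows. This is a minimal illustration in Python rather than the Java used by the project, and it is not the authors' implementation: all class names, field names and sample records are hypothetical. It only shows how two differently structured sources might be exposed through a single interface.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Uniform interface that every data component implements (hypothetical)."""

    @abstractmethod
    def query(self):
        """Return all features as dicts in the common format."""


class PolygonSourceWrapper(Wrapper):
    """Hypothetical source storing each area as a ring of (x, y) tuples."""

    def __init__(self, rows):
        self.rows = rows

    def query(self):
        # Translate the local record layout into the common format.
        return [
            {"type": "Polygon", "coordinates": list(row["ring"]), "name": row["label"]}
            for row in self.rows
        ]


class PointSourceWrapper(Wrapper):
    """Hypothetical source storing coordinates in plain table columns."""

    def __init__(self, rows):
        self.rows = rows

    def query(self):
        return [
            {"type": "Point", "coordinates": [(row["x"], row["y"])], "name": row["species"]}
            for row in self.rows
        ]


def collect(wrappers):
    """Middleware step: gather features from all components."""
    features = []
    for wrapper in wrappers:
        features.extend(wrapper.query())
    return features


areas = PolygonSourceWrapper([{"label": "biotope A", "ring": [(0, 0), (4, 0), (4, 3), (0, 0)]}])
sightings = PointSourceWrapper([{"species": "species X", "x": 1.0, "y": 2.0}])
features = collect([areas, sightings])
```

Because the middleware only calls `query()`, each database keeps its own storage model and remains usable by its existing applications.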

The architecture and implementation of the Virtual Database follow the principle of a database middleware using standardized interfaces. Design and development issues are presented in the next section.

Architecture of the Virtual Database

Some of the design goals of the architecture of the Virtual Database have already been mentioned in the previous section. They are in particular:

Fig. 2 shows the design of the architecture of the Virtual Database arranging the software components and interfaces into clearly separated layers. The description of these components - from bottom to top - is as follows:

Architecture

Figure 2: Architecture of the Virtual Database

Implementation of the Virtual Database

Involved data components

For the implementation of a prototype of the Virtual Database, two databases are available for integration. The first database is installed at WSL and is called "Data Center for Nature and Landscape" (DNL). It consists mainly of inventory data of protected biotopes in Switzerland [12]. The second database is located at the "Centre Suisse de Cartographie de la Faune" (CSCF) and stores data on endangered animal species (http://www.unine.ch/cscf/). The integration of the two databases mainly allows overlaying the boundaries of the protected areas with the observed locations of the animals. Both databases run under the same database management system software (Oracle) but store the geometry of the spatial features differently. Polygon data of the DNL is stored using ESRI’s Spatial Database Engine (SDE). The coordinates of the locations in the fauna database are stored as regular Oracle database table columns without support for any spatial data handling functionality such as the Oracle Spatial Option. The current configuration of both databases would allow an integration based on Oracle and SDE software. However, in order to make the implementation as generic as possible, it follows the architecture proposed in the last section. Figure 3 shows the components of the current implementation of the Virtual Database, which are presented in the next few sections.

Software components

Figure 3: Software components for the implementation of the Virtual Database

OGIS-compliant implementation of the access layers

The implementation of the access layers uses the interfaces specified by OGC’s Web Feature Server WFS [4]. The specification - currently in the state of "Request for Comment" - proposes interfaces for the manipulation of spatial features and bases the communication between the distributed computing platforms on HTTP. The full functionality of a WFS consists of methods for querying, inserting, updating and deleting data. A minimal WFS must offer the following subset of interfaces: GetCapabilities, DescribeFeatureType, and GetFeature, which serve requests for capabilities, feature type descriptions, and feature data, respectively.

The Virtual Database only implements this minimal set of interfaces, since this is sufficient for the goals listed in the previous section. Transactions such as insertions and updates are not required by the Virtual Database at the moment.

As figure 3 shows, the three WFS interfaces of the access layers are implemented using the programming language Java. Java plus the Java servlet technology offer the basic elements needed for network programming, embedding within a web server environment, and communication using HTTP [13]. In addition, many public domain Java classes already exist for generating and parsing XML documents (see for instance [14]) - a fact which supports development substantially, since the communication between access layers and integration layer uses the Extensible Markup Language (XML) as its common format.

The particular implementation details for the interfaces are discussed below (we do not list the XML schemas of the different requests and responses but refer directly to the corresponding OGC repository at http://www.opengis.net/schema.htm):

Extending the access layers for the retrieval of graphical metadata

The Virtual Database presented in this paper implements the interfaces specified for the WFS for the description and access of spatial data from distributed data repositories. Since spatial data must be presented to users in an appropriate way, data components should also have the ability to provide symbolization information for data rendering. In order to enable sharing of rendering information, the approach of extending the WFS specification has been chosen. Symbolization for spatial data is provided by the definition of a cartographic legend for spatial features. Two extensions for a WFS are necessary:

The first extension supplements the WFS with a service which serves rendering information. The current WFS implementation specification provides an element "VendorSpecificCapabilities" which allows the inclusion of additional capabilities. The following code snippet shows the XML schema of this extension, which is part of the GetCapabilities Response section of the WFS specification (for the complete source file see GetCapabilitiesResponse.xsd):


<complexType name="GetLegendType">
	<sequence>
		<element name="DCPType" type="wfs:DCPTypeType" maxOccurs="unbounded"/>
	</sequence>
</complexType>

<complexType name="VendorSpecificCapabilitiesType">
	<sequence>
		<element name="GetLegend" type="wfs:GetLegendType"/>
	</sequence>
</complexType>

The code defines a new capability called "GetLegend" which includes an element that supplies information about the distributed computing platform from which a legend definition is being served. The declaration of the "DCPType" element is included in the source file.
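A client could check whether a data component advertises the "GetLegend" capability before requesting legend information. The following Python sketch is illustrative only (the project itself uses Java). The sample document, the namespace URI, the nesting under a "Capability" element, and the content of "DCPType" (an "HTTP" element whose children carry an "onlineResource" attribute) are assumptions, since the excerpt above does not show them.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"  # assumed namespace URI for the wfs prefix

# Hypothetical fragment of a GetCapabilities response (structure assumed).
SAMPLE = """
<WFS_Capabilities xmlns="http://www.opengis.net/wfs">
  <Capability>
    <VendorSpecificCapabilities>
      <GetLegend>
        <DCPType>
          <HTTP>
            <Post onlineResource="http://example.org/wfs/legend"/>
          </HTTP>
        </DCPType>
      </GetLegend>
    </VendorSpecificCapabilities>
  </Capability>
</WFS_Capabilities>
"""


def legend_endpoints(capabilities_xml):
    """Return the URLs from which a legend definition is served, if any."""
    root = ET.fromstring(capabilities_xml)
    ns = {"wfs": WFS_NS}
    path = ".//wfs:VendorSpecificCapabilities/wfs:GetLegend/wfs:DCPType"
    endpoints = []
    for dcp in root.findall(path, ns):
        # Any child of HTTP (e.g. Get or Post) may carry the endpoint URL.
        for method in dcp.findall("wfs:HTTP/*", ns):
            url = method.get("onlineResource")
            if url:
                endpoints.append(url)
    return endpoints


endpoints = legend_endpoints(SAMPLE)
```

An empty result would indicate that the component offers feature data but no graphical metadata.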

The second extension of the WFS concerns how legend information must be requested and how results are expressed. Like the other requests and responses presented in the previous section, the corresponding information is provided by means of XML. In order to request legend information for a spatial feature, a query must be composed which consists of the feature type and, optionally, the property of the feature type for which symbolization is required. The XML code closely follows the rules for querying feature data, except that no filters have to be applied for constraining the resulting feature set. The following XML schema snippet shows the core elements necessary for requesting legend information (for the complete source file see GetLegendRequest.xsd):


<element name="GetLegend" type="wfs:GetLegendType"/>
<element name="Query" type="wfs:QueryType"/>

<complexType name="GetLegendType">
	<sequence>
		<element ref="wfs:Query" maxOccurs="unbounded"/>
	</sequence>
</complexType>

<complexType name="QueryType">
	<sequence>
		<element ref="wfs:PropertyName" minOccurs="0" maxOccurs="unbounded"/>
	</sequence>
	<attribute name="typeName" type="string" use="required" />
</complexType>

An XML document requesting a cartographic legend must include the name of the feature type, specified by the "typeName" attribute of the "Query" element. The returned symbol information may be further constrained by specifying the element "PropertyName" indicating which attribute of the feature type should be mapped.
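A request document conforming to the schema snippet above could be assembled as in the following Python sketch. It is an illustration under assumptions, not the project's Java code: the namespace URI bound to the "wfs" prefix is assumed, and the feature type name "biotopes" and the property name "protection_status" are hypothetical.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"  # assumed namespace URI for the wfs prefix


def build_get_legend(type_name, property_names=()):
    """Build a GetLegend request with one Query element."""
    ET.register_namespace("wfs", WFS_NS)
    root = ET.Element(f"{{{WFS_NS}}}GetLegend")
    # typeName is the required attribute naming the feature type.
    query = ET.SubElement(root, f"{{{WFS_NS}}}Query", {"typeName": type_name})
    # Optional PropertyName elements restrict the legend to mapped attributes.
    for name in property_names:
        prop = ET.SubElement(query, f"{{{WFS_NS}}}PropertyName")
        prop.text = name
    return ET.tostring(root, encoding="unicode")


request_xml = build_get_legend("biotopes", ["protection_status"])
```

Omitting the property names yields a query that asks for the legend of the feature type as a whole.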

The core part of this second WFS extension, however, is the specification of the legend information, i.e. the structure of the XML document containing the rendering data. This information is composed of two complementary parts: On the one hand, graphical elements such as colors, strokes, fill patterns, etc., must be defined in order to know how to draw symbols. On the other hand, lookup tables in the form of legends must be provided. They specify which symbols must be drawn for which feature property values. The symbol part is based on Scalable Vector Graphics (SVG), because SVG offers a rich set of graphical elements and is an emerging standard for rendering vector graphics on the Internet. However, only a small subset of SVG elements is used for the definition of cartographic symbols, in particular SVG symbols, markers, patterns, and the more general presentation attributes.
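The interplay of the two parts can be illustrated with a small Python sketch: a set of symbol definitions holding SVG presentation attributes, and a lookup table assigning symbols to property values. The symbol ids, property values and colors are hypothetical, and the fallback to a default symbol is an assumption of this sketch rather than a documented feature of the legend schema.

```python
# Part 1: symbol definitions, expressed as SVG presentation attributes.
SYMBOLS = {
    "default": {"fill": "none", "stroke": "gray"},
    "bog": {"fill": "saddlebrown", "stroke": "black"},
    "meadow": {"fill": "yellowgreen", "stroke": "black"},
}

# Part 2: lookup table stating which symbol is drawn for which property value.
VALUE_MAP = {"raised bog": "bog", "dry meadow": "meadow"}


def symbol_for(value):
    """Return the presentation attributes for a feature property value."""
    return SYMBOLS[VALUE_MAP.get(value, "default")]


bog_style = symbol_for("raised bog")
unknown_style = symbol_for("unclassified")
```

Keeping the two parts separate means that a symbol can be restyled without touching the assignment of values to symbols.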

The construction of the legends which refer to the symbol elements is achieved by adapting the techniques used by internet map servers, in particular ESRI’s ArcIMS. Two reasons are responsible for this choice. First, the Virtual Database bases the display of spatial data on map server functionality (see fig. 2). Currently, an instance of ArcIMS is running at the WSL which may be included in the Virtual Database cluster. Basing legend construction on methods implemented by ArcIMS enables rapid integration. The second reason is the language which ArcIMS uses for legend descriptions: Legends are based on ArcXML (http://arcimsonline.esri.com/arconline/documentation/ims_/Support_files/arcxmlguide.htm) - a language, as the name implies, completely based on XML. Since ArcXML has been developed for the definition of complete map services, only the subset which serves the definition of legends, in particular the subset of renderer elements, is used for the construction of legends. In order to make the renderer definitions usable more generically, the graphical elements of ArcXML have been replaced by the SVG elements mentioned above.

The source code of the XML schema for the definition of simple legends for points, lines and polygons implemented for the Virtual Database is provided by the file legend.xsd (the SVG DTD was translated into an XML schema representation by the dtd2xs translation tool available from http://puvogel.informatik.med.uni-giessen.de/lumrix/). The root element of a legend is a "FeatureRendererCollection" which holds a set of "FeatureRenderers". A "FeatureRenderer" is composed of either a "SimpleRenderer", a "ValueMapRenderer" or both. "SimpleRenderers" are used to apply unique symbols to all features of a feature set, "ValueMapRenderers" to apply a range of symbols to a corresponding range of property values of available features. Further details for the description of the different renderers are available online from [17] (http://arcimsonline.esri.com/arconline/documentation/ims_/Support_files/arcxmlguide.htm). An example for the combined use of elements from ESRI’s ArcXML and SVG elements for the case of a simple renderer is shown below:

<!-- Elements for the SimpleRenderer -->

<element name="SimpleRenderer" type="axl:SimpleRendererType" />

<complexType name="SimpleRendererType">
	<choice maxOccurs="unbounded">
		<element ref="axl:SimpleMarkerSymbol" />
		<element ref="axl:SimpleLineSymbol" />
		<element ref="axl:SimplePolygonSymbol" />
	</choice>
</complexType>

<!-- Elements for the SimpleLineSymbol -->

<element name="SimpleLineSymbol" type="axl:SimpleLineSymbolType" />

<complexType name="SimpleLineSymbolType">
	<!--
		SimpleLineSymbolType is based on the svg:g Element.
		However, instead of creating the type based on restricting svg:g
		we include the elements and attributes necessary for our application:
		- A subset of elements (see below)
		- A subset of PresentationAttributes-Color
		- PresentationAttributes-FillStroke
	-->

	<!-- included elements from svg:g -->

	<choice minOccurs="0" maxOccurs="unbounded">
		<element ref="svg:desc" />
		<element ref="svg:title" />
		<element ref="svg:defs" />
		<element ref="svg:g" />
	</choice>

	<!-- general attributes -->

	<attribute name="id" type="ID" />

	<!-- PresentationAttributes-Color -->
	<attribute name="color" type="string" />

	<!-- PresentationAttributes-FillStroke -->

	<attribute name="fill" type="string" />
	<attribute name="fill-opacity" type="string" />
	<attribute name="fill-rule" type="svg:fill-ruleType" />
	<attribute name="stroke" type="string" />
	<attribute name="stroke-dasharray" type="string" />
	<attribute name="stroke-dashoffset" type="string" />
	<attribute name="stroke-linecap" type="svg:stroke-linecapType" />
	<attribute name="stroke-linejoin" type="svg:stroke-linejoinType" />
	<attribute name="stroke-miterlimit" type="string" />
	<attribute name="stroke-opacity" type="string" />
	<attribute name="stroke-width" type="string" />

	<!-- Attributes for transformations -->

	<attribute name="transform" type="string" />
</complexType>
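To illustrate how the presentation attributes of a "SimpleLineSymbol" might be applied when a line feature is drawn, the following Python sketch writes them onto an SVG polyline element. It is a simplified assumption of how a client could use the legend, not the authors' Java implementation: the function name, the sample coordinates and the attribute values are hypothetical, and a real renderer would also have to transform map coordinates into the SVG viewport.

```python
from xml.sax.saxutils import quoteattr


def render_line(coords, symbol):
    """Return an SVG polyline element styled with the symbol's attributes."""
    points = " ".join(f"{x},{y}" for x, y in coords)
    attrs = {"points": points, **symbol}
    # quoteattr escapes special characters and adds the surrounding quotes.
    attr_text = " ".join(
        f"{name}={quoteattr(str(value))}" for name, value in attrs.items()
    )
    return f"<polyline {attr_text}/>"


# Attribute names taken from the SimpleLineSymbolType schema; values are made up.
line_symbol = {
    "fill": "none",
    "stroke": "blue",
    "stroke-width": "2",
    "stroke-dasharray": "4 2",
}
svg = render_line([(0, 0), (10, 5), (20, 0)], line_symbol)
```

Since the symbol attributes are already SVG presentation attributes, they can be copied to the output element without translation.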

Implementation of the integration layer

Fig. 2, presenting the architecture of the Virtual Database, suggests that the integration layer is implemented as a Java servlet. In this way, the browser on top of the architecture could be realized as a thin client as proposed in section 3. However, as a first step the integration layer is implemented as a stand-alone Java application because development of the different integration tasks is much easier this way. The transformation towards servlets will be approached in the near future. This pragmatic solution implies that the data browser is implemented as a rather thick client.

The two main functions of the integration layer - collection and integration of data from different repositories, and graphical compilation based on graphical metadata - are implemented as follows:

Conclusions

The implementation of the Virtual Database is currently in the state of a simple but working prototype. Therefore, we cannot report on extensive experience from a user point of view. However, the development process is showing the benefits of choosing the presented architecture and methodology for the implementation of the selected concepts. Uniform access to distributed data repositories is successfully achieved by the use of standardized interfaces proposed by the WFS implementation specification of the Open GIS Consortium. This technique substantially simplifies the integration of additional data components in the future. XML is chosen for data communication, data representation (GML), graphical metadata (ArcXML in combination with SVG) and graphical representation (SVG). The uniform use of this emerging technology simplifies development substantially, and is supported by the considerable availability of public domain software for parsing, creating and rendering XML data. Java, as the selected programming language, has the advantages of being the preferred language of these public domain projects and of offering comprehensive components for network programming.

The concept and first implementation of the Virtual Database show a clear potential for the successful realization of a comprehensive environmental and landscape information system in the future. The Virtual Database enables sharing of data from distributed data repositories, in contrast to web mapping applications which allow sharing of digital maps. However, since visualization is an important requirement of most spatial data handling applications, this capability is included in the presented implementation. Sharing data instead of images makes it possible to extend the Virtual Database towards the proposed information system by collaborating with existing web services. Data may then be sent to locations where they are used in subsequent data processing.

References

[1] "Settings and needs for web cartography", M.-J. Kraak, 2001. In: "Web Cartography. Developments and prospects", M.-J. Kraak, A. Brown (eds.), Taylor and Francis, London and New York.

[2] "Tools for Cartographic Visualization of Statistical Data on the Internet", A. Cecconi, C. Shenton, R. Weibel, 1999, Proceedings 19th International Cartographic Conference, Ottawa, Canada. Available at http://www.geo.unizh.ch/publications/acecconi/pdf/ottawa99.pdf.

[3] "Web Map Service Implementation Specification". Version 1.1.1, Open GIS Consortium Inc., 2002. Available at http://www.opengis.org/techno/specs/01-068r3.pdf.

[4] "Web Feature Server Implementation Specification". Version 0.0.14, Open GIS Consortium Inc., 2001. Available at http://www.opengis.org/techno/RFC13.pdf.

[5] "Modeling and Sharing Graphic Presentations of Geospatial Data", S. F. Keller, H. Thalmann, 1999. In: "Interoperating Geographic Information Systems", A. Vckovski, K. E. Brassel, H.-J. Schek (eds.). Proceedings INTEROP’99, Zurich, Switzerland.

[6] "XML in Web-based Geospatial Applications", L. Lehto, 2000. Proceedings 3rd Agile Conference Geographic Information Science, Helsinki/Espoo, Finland. Available at http://agile.uni-muenster.de/Conference/Helsinki2000/Abstract/Agile2000_62.pdf.

[7] "Multi-Source Cartography in Internet GIS", P. van Oosterom, T. Tijssen, I. Alkemade, M. de Vries, 2001. Proceedings 4th Agile Conference Geographic Information Science, Brno, Czech Republic. Available at http://agile.uni-muenster.de/Conference/Brno2001/REFERATY_4_1.html.

[8] "OpenGIS Styled Layer Descriptor Draft Candidate Implementation Specification". Discussion Paper, Version 0.7.0. Open GIS Consortium Inc., 2001. Available at http://www.opengis.org/techno/discussions/01-028.pdf.

[9] "Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases", A. Sheth, J. A. Larson, 1990, ACM Computing Surveys, Vol 22, Nr. 3.

[10] "Component Database Systems: Introduction, Foundations, and Overview", A. Geppert, K. R. Dittrich, 2001. In: "Component Database Systems", K. R. Dittrich, A. Geppert (eds.). Morgan Kaufmann Publishers, San Francisco.

[11] "An Architecture for Transparent Access to Diverse Data Sources", M. T. Roth, P. Schwarz, L. Haas, 2001. In: "Component Database Systems", K. R. Dittrich, A. Geppert (eds.). Morgan Kaufmann Publishers, San Francisco.

[12] "Prozessorientierte Strukturierung von Metadaten in einem WebGIS", M. Brändli, C. Ginzler, 2001, Angewandte Geographische Informationsverarbeitung XIII, Beiträge zum AGIT-Symposium, Salzburg.

[13] "Java Servlet Programming", J. Hunter, W. Crawford, 2001, 2nd Edition, O'Reilly.

[14] "Java & XML", B. McLaughlin, 2001, 2nd Edition, O'Reilly.

[15] "A Process-Oriented Approach for Representing Lineage Information of Spatial Data", M. Brändli, 2000. Proceedings 3rd Agile Conference Geographic Information Science, Helsinki/Espoo, Finland. Available at http://agile.uni-muenster.de/Conference/Helsinki2000/Abstract/Agile2000_48.pdf.

[16] "OpenGIS Geography Markup Language (GML) Implementation Specification, Version 2.1.1", Open GIS Consortium Inc., 2002. Available at http://www.opengis.net/gml/02-009/GML2-11.html.

[17] "ArcXML Programmer’s Reference Guide", ESRI, 2002. Available at http://arcimsonline.esri.com/arconline/documentation/ims_/Support_files/arcxmlguide.htm.

[18] "XSLT Map Style Sheet Specification. OGC Working Draft, Version 0.12", R. Lake, D. Monie, 2000.

