SVG-attributes for storing metadata in product design

Metadata and workflow processes

Uwe Leonhardt
Intelliact AG

Siewerdtstrasse 105
CH-8050 Zurich
Switzerland
email: uwe.leonhardt@intelliact.ch
Tel.: +41-1-315 67 47
Fax: +41-1-315 67 59
webpage: http://www.intelliact.ch

Jan Hoffmann
IMES-ZPE
ETH Zurich

CLA E23
Tannenstr. 3
CH-8092 Zurich
Switzerland
email: jan.hoffmann@imes.mavt.ethz.ch
Tel.: +41-1-632 04 16
Fax: +41-1-632 11 81
webpage: http://www.zpeportal.ethz.ch

Keywords: metadata and workflow processes

Abstract

In product development, it is very important to store data from the development process reliably so that it can be used for later reference. Most of the data generated in the early phases of product development is unstructured and is currently not stored digitally. This paper proposes a method for storing metadata in sketches of principle by using SVG. The sketches are managed in a database that is easily accessible over the World Wide Web. The prototype also offers several search options.

Introduction

Storing and managing data, knowledge and information is a growing task for many companies. Many electronic tools have been developed to support the development process, but they mostly address the technologies needed to use and coordinate processes in a global environment, without offering solutions to the steadily increasing flow of data. What is chiefly needed in this area are technologies and tools for retrieving data, even if it is not presented in a well-structured form. The implementation concentrates on the early phases of product design, where inspiration from earlier work or reliance on existing solutions is very important, but the recollection of such "templates" is fuzzy. It is this rarely well-structured data that is to be tapped as a source of knowledge. The aim of the presented method is to provide a tool that allows searching for, and above all finding, loosely structured or unstructured data in a specific data collection. One possible way to achieve this is to include searchable metadata directly in the documents that are the object of the search. Because all industries increasingly use the Internet for internal communication, a web-compatible format for the graphical representation of data was needed. The W3C has designated SVG as the new basis for graphical content on the World Wide Web. In view of this standardization, SVG was chosen as the basis for the documents the proposed system manages.

This paper is structured as follows: the first chapter motivates the project by giving an overview of the data and information management requirements in product development. The second chapter outlines the use and information content of sketches of principle in the early phases of product development and analyzes the relevant attributes that will serve as search criteria. Chapter three explains the reasons for choosing SVG as the format for the sketches. Chapter four enumerates the possibilities for searching for information in databases or in larger, unstructured collections. The prototype application that has been developed is described in chapter five. Chapter six summarizes the work and gives an outlook on further improvements.

This work is based on a semester thesis written by Reto Flueckiger at the Center for Product Development in the winter semester 2001/2002. His original work is titled "Erfassen und Wiederfinden von unstrukturierter Information - Realisierung anhand eines Skizzenmanagement-Tools" (Capturing and Retrieving Unstructured Information: Implementation Using a Sketch Management Tool). The authors would like to thank all contributors from the institute for their help. Special thanks go to Andreas Neumann from the Institute of Cartography at ETHZ for his help with SVG.

Motivation

Today, business processes are extensively supported by digital tools and databases. The main problem in accessing information is not a lack of availability but its description, search, navigation and representation. The following figure shows the areas of product development in which digital tools are available to support the corresponding methods. This proposal aims to provide additional capture capability for the left part of the Digital Product arrow.

Figure 1: The Digital Product

Modern product development relies extensively on data and information management capabilities, as illustrated by the many PDM (Product Data Management) and ERP (Enterprise Resource Planning) solutions currently on the market. Under the generic term collaborative product commerce (CPC), numerous tools are being developed for working in global, widely distributed environments. These focus on support for collaboration, meaning that several people can work on the same objects, for example the CAD model of a part. The current standard is to store the data in one logical entity (not counting backups), typically a database. The system guarantees data integrity by controlling access and changes. The content of the database is very strictly organized, and objects are identified by their numbers. Every object in the database is described by its metadata and by its relations with other objects. Users can search for objects by querying metadata and the constraints between objects. Because metadata is mostly added automatically from attributes defined in other applications, e.g. CAD, and is therefore confined to clearly limited categories, the available metadata is highly structured. As a consequence, the search possibilities are also limited.

Especially in the early phases of product development, there are only a few methods for describing design information for use in later design processes. Most of these methods, such as sketches of principle, are poorly suited or not suited at all to digital representation and structured access. In these early phases, engineers do not yet use advanced applications like CAD, but mostly draw freehand sketches (see next chapter). Consequently, owing to a lack of specific tools, most of these sketches are not stored digitally and are only rarely added to the database of a PDM system. These sketches are used to present ideas and sometimes serve as inspiration, thus generating additional solutions for the problem at hand. To make this possible, it is important to collect and store such freehand drawings, either by having the developer use a computerized sketching board or by scanning the sketches and converting them to a unified format. The advantage of having such sketches in a database, or at least in a readily retrievable form, is that they can be reused in other projects without necessarily involving the people who drew them for the earlier one ("Could you please find that sketch of a gear box in your collection of bits and scraps? I think you drew it on a napkin..."). To find specific items in such a collection, either a way to index and catalogue them (structuring the data) or a search engine that can work with simple human associations ("there was some circle with three spokes, I need a lever...") is needed to retrieve relevant examples from the database.

Vision

The vision of this paper is to integrate graphical, textual and classification information by using a special tool in the early design phase to describe the design intent. This tool should support a highly interactive way of relating design information, for use by multiple designers who will add information in any phase of product development. In the future, design will take place within geographically separated, independent teams, where every designer can sketch ideas easily. These sketches will then be transferred via a network to the other team members in an understandable (graphical) format.

Relating the highly interlinked information created during the design phase to a graphical description will be a big step from storing only one-dimensional data and their relations towards a two-dimensional knowledge base. Automatically classifying the information in a graphics file by means of its geometric primitives should provide a very efficient way to find and access the right information in a complex structure.

Figure 2: Dimensions of Information - from single- to multi-dimensional

The information representation will link the easy-to-handle graphical information (represented in SVG) with the textual information added during the design phase. In this way, graphical information can be augmented with background data that the designer adds in text form. This data can be superimposed on the graphic or be available only on demand (via mouse events).

This work provides a first application for storing and sharing data in the proposed distributed environment. Information from the early phases of design is captured and stored, and then made available for later steps. Access to the data is simplified by the different search strategies the application provides: database indexes can be combined with string searches on the data and metadata. Ultimately, the system will interpret the relations between data objects in order to provide the user with information based on the current context of his work. This exploration of links will be based on the geometric information.

Sketches of principle in product development

In the early phases of product development, sketches are one of the most important means of communication: ideas can be fixed on a physical medium, and the graphical expression helps to communicate their essence. Sketches can also serve as a reminder when the next step towards the finished product is taken. An important step in the development of a product is defining the action principles that the product will use. Once the functions the product has to fulfill have been defined, means of achieving these functions need to be found.

In product development, a special set of symbols has been developed for expressing functionality in freehand sketches. These symbols show mechanical functions without detailing the complete mechanism. Predefined building units are combined into modules, each of which can satisfy one of the requirements defined for the new product. A few examples are given in the following figure.

Figure 3: Exemplary elements of sketches of principle (provided by ETH Zurich, Prof. Dr. A. Breiing)

At the moment, no library of such elements is implemented in the prototype, and only a few examples have been added. A further step towards giving the user more value will be to have the system propose solutions from earlier projects as inspiration, by correlating the new design context or the new sketches with known data. This could improve the reuse of solutions that have already been found and implemented, by giving a direct link to finished products, either from the same area of product development or from entirely different fields.

An example for a sketch of principle with the resulting product is given in the following picture.

Figure 4: Sketch of principle and CAD-drawing of a clutch (provided by ETH Zurich, Prof. Dr. A. Breiing)

Why SVG?

The most important reason for choosing SVG as the graphical format for this prototype is its XML basis, XML being the new standard on the Internet. This virtually guarantees compatibility with many applications and also supports the idea of an Internet portal as the access point to the database. The files are web-enabled and relatively small, which keeps download times short.

Because XML is set to conquer the web, most CAD and PDM providers already offer, or have announced, import and export tools for the new formats. This is very interesting for this work, because it promises compatibility with many different systems without the need to write translators oneself. SVG is poised to become a standard exchange format for graphics, and its vector form suits most CAD formats, since these are also vector-based. One problem when importing SVG-based sketches into CAD systems will be the conversion and scaling of units, but since sketches are usually not drawn to scale, or only roughly so, they will serve only as coarse models for the three-dimensional CAD objects.
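The unit problem can be made concrete with a small JavaScript sketch, which is not part of the prototype. It assumes that the root svg element carries a width with an absolute unit and a viewBox, and it uses the CSS ratio of 96 px per inch for px and unitless values (some early viewers assumed 90 dpi). The function names lengthToMm and userUnitToMm are hypothetical.

```javascript
// Millimetres per unit for the absolute length units of CSS/SVG.
// Assumption: "px" and unitless values use the CSS ratio of 96 px per inch;
// some early SVG viewers assumed 90 dpi instead.
const MM_PER_UNIT = {
  mm: 1,
  cm: 10,
  in: 25.4,
  pt: 25.4 / 72,
  pc: 25.4 / 6,
  px: 25.4 / 96,
  "": 25.4 / 96,
};

// Converts a length such as "100mm", "4in" or "300" to millimetres.
// Percentages and other relative units are rejected.
function lengthToMm(length) {
  const m = /^\s*([+-]?\d*\.?\d+)\s*(mm|cm|in|pt|pc|px)?\s*$/.exec(length);
  if (!m) throw new Error("Unsupported length: " + length);
  return parseFloat(m[1]) * MM_PER_UNIT[m[2] || ""];
}

// Factor that maps one user unit of the viewBox to millimetres, given the
// width and viewBox attributes of the root <svg> element.
function userUnitToMm(widthAttr, viewBoxAttr) {
  const [, , vbWidth] = viewBoxAttr.trim().split(/[\s,]+/).map(Number);
  if (!(vbWidth > 0)) throw new Error("Invalid viewBox: " + viewBoxAttr);
  return lengthToMm(widthAttr) / vbWidth;
}
```

For a sketch with width="100mm" and viewBox="0 0 200 100", the factor is 0.5, so a circle with r="14.5" would correspond to a radius of 7.25 mm. Since sketches are rarely to scale, such a factor only provides a coarse starting point for the CAD model.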

Another advantage of conversion is that physical documents can be scanned and saved as SVG, which enables the user to upload data from almost any format into the proposed database. A drawback is that converted scans make poor vector graphics, so searches for geometric information will not be effective on them. Aside from scans, many SVG editors already provide an interface for drawing sketches, so such an interface does not have to be created specifically. Providing a library of sketch elements will be one goal of further work on the prototype.

From the viewpoint of data management and searching, the most interesting feature of SVG as an XML application is the possibility of using user-defined tags as containers for metadata. When the metadata is not only shown and stored in the database but is also accessible to a string search, comparisons of data content become much easier to implement, because the same tools can process standard metadata and geometric information. In SVG, the parseable text format gives a direct link between the graphical representation and the textual information, which facilitates both manual and automatic changes to the data by the user.
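To illustrate that a single string search covers both metadata and geometry, here is a small JavaScript sketch, not the prototype's code. It assumes, as in the prototype, that metadata sits in desc elements; the function name classifyHits is hypothetical, and the regular expression is a simplification rather than a full XML parser.

```javascript
// Finds every occurrence of a term in the raw SVG text and reports whether it
// lies inside a <desc> element (metadata) or elsewhere (graphical markup).
// The same plain string search serves both kinds of information.
function classifyHits(svgText, term) {
  if (!term) return []; // an empty term would match at every position

  // Character ranges [start, end) covered by <desc>...</desc> elements.
  const descRanges = [];
  const descRe = /<desc\b[^>]*>[\s\S]*?<\/desc>/g;
  let m;
  while ((m = descRe.exec(svgText)) !== null) {
    descRanges.push([m.index, m.index + m[0].length]);
  }

  const hits = [];
  for (let i = svgText.indexOf(term); i !== -1; i = svgText.indexOf(term, i + 1)) {
    const inDesc = descRanges.some(([start, end]) => i >= start && i < end);
    hits.push({ offset: i, kind: inDesc ? "metadata" : "graphics" });
  }
  return hits;
}
```

For a file containing a desc element with the text "lever" and a circle with id="lever", a search for "lever" returns one metadata hit and one graphics hit.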

Another advantage of having the metadata in the graphical file and thus also outside of the database is the possibility to work offline without losing information content. The user does not have to export the file and the metadata to a local machine, but needs only the file. Of course, all the changes that are effected offline need to be transcribed into the database as quickly as possible, in order to avoid collisions between different versions of the same file.
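Because the metadata travels with the file, it can be read without the database. The following JavaScript sketch assumes the desc id="..." layout written by the indexing tool; it reads the metadata from the file text and lists the fields that differ from the database record, i.e. the offline changes that would still have to be transcribed. The names readMetadata, changedFields and decodeXml are hypothetical, and the regular expression only approximates XML parsing.

```javascript
// Decodes the five predefined XML entities (&amp; last, so that "&amp;lt;"
// becomes the literal text "&lt;" rather than "<").
function decodeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;|&#0*39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Reads the metadata stored as <desc id="...">value</desc> elements directly
// from the file text, so it is available offline without the database.
// Assumes id is the first attribute, as written by the indexing tool.
function readMetadata(svgText) {
  const meta = {};
  const re = /<desc\s+id=(["'])([^"']+)\1[^>]*>([\s\S]*?)<\/desc>/g;
  let m;
  while ((m = re.exec(svgText)) !== null) {
    meta[m[2]] = decodeXml(m[3].trim());
  }
  return meta;
}

// Lists the fields whose value in the file differs from the database record,
// i.e. changes made offline that still have to be transcribed.
function changedFields(fileMeta, dbRecord) {
  const keys = new Set([...Object.keys(fileMeta), ...Object.keys(dbRecord)]);
  return [...keys]
    .filter((k) => (fileMeta[k] ?? "") !== (dbRecord[k] ?? ""))
    .sort();
}
```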

Search strategies

Search strategies are heavily influenced by their surroundings, because different systems entail different data formats and may also limit the options available for a search. One possibility is to keep the data on a single server (so-called centralized data); another is to distribute a large data collection over many computers, which is known as decentralized data. In this project, the prototype starts out with centralized data on a server, accessible from anywhere. As long as the amount of data is moderate and the number of user accesses stays low, the single-machine solution will be retained. A larger application environment with higher demands on speed and distribution might lead to a multi-tier solution. An implementation with a decentralized data collection would be most interesting in an environment with many widely distributed users, such as a technology transfer portal.

Although the system and the database are still small, the search solutions adopted reflect technologies from both areas, centralized as well as decentralized data. Solutions for centralized data come from databases, while those for decentralized data use typical web techniques.

In a database, searching is mostly done on specific attributes of objects, which can be subjected to rigid formalization. The formulation follows the query language of the DBMS. A good example is SQL with its characteristic phrasing: SELECT attribute FROM table WHERE condition, i.e. select the attribute from the defining group of objects for which the selection criterion is satisfied. An SQL query must follow a strictly defined form to be valid. In a database, storage is centralized, with all the data contained in one vault. Typical search activities in a database are performed on the indexes rather than on the data itself, in order to minimize response time. The data in the indexes is necessarily truncated and limited to parseable text.

Once a suitable piece of information has been found in the index, the corresponding object can be retrieved from the database and displayed to the user. Typically, a selection of objects is presented to the user, who either chooses one or refines the search criteria. Indexes are built and maintained automatically by the database, so that they always reflect the latest state of the data. Databases also allow other types of searches, e.g. finding a certain string in stored text files.
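The benefit of searching an index rather than the data itself can be illustrated with a toy inverted index in JavaScript. This is a sketch under simplifying assumptions: the class MetadataIndex is hypothetical, all terms are lower-cased, and a query returns only the sketches that contain every query term. It does not reproduce MySQL's internal index structures, which are considerably more elaborate.

```javascript
// Toy inverted index: maps each metadata term to the ids of the sketches that
// carry it, so that a query is answered from the index instead of by reading
// every stored file.
class MetadataIndex {
  constructor() {
    this.postings = new Map(); // term -> Set of sketch ids
  }

  // Splits text into lower-case terms; \p{L} and \p{N} keep umlauts intact.
  static terms(text) {
    return String(text).toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  }

  // Adds every term of every metadata value of one sketch to the index.
  add(sketchId, metadata) {
    for (const value of Object.values(metadata)) {
      for (const term of MetadataIndex.terms(value)) {
        if (!this.postings.has(term)) this.postings.set(term, new Set());
        this.postings.get(term).add(sketchId);
      }
    }
  }

  // Returns the ids of the sketches containing all query terms, sorted.
  search(query) {
    const terms = MetadataIndex.terms(query);
    if (terms.length === 0) return [];
    let result = null;
    for (const term of terms) {
      const ids = this.postings.get(term) || new Set();
      result = result === null ? new Set(ids) : new Set([...result].filter((id) => ids.has(id)));
    }
    return [...result].sort();
  }
}
```

The ids returned by search would then be used to fetch the corresponding sketches, mirroring the step from index hit to retrieved object described above.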

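In an environment with distributed data, searching is done by maintaining a catalogue of the data, which mostly means a collection of addresses where certain data is kept. There are three ways of assembling such a catalogue: live search, which searches the actual data at query time; directories, which are compiled and structured manually; and spiders, programs that gather the information automatically by following links.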

All three methods have their advantages and shortcomings. Live search is not time-efficient on large data collections, but offers the best resolution for the final results. The results are also always up to date, because the search is performed in real time over the actual data. Performance can be increased by pre-caching the results of the most frequent queries. Directories are suited to larger data collections and perform well when the query is very open. They also directly offer a suitable collection of results that is highly structured and hierarchically organized. Their problems are the low resolution (specific queries only yield general results) and the manual work that has to be performed. Spiders are the most commonly used search engines on the web today, because they gather their information automatically. These programs visit web pages, gather some keywords and then follow all the links on the page. In this way, every page that is linked from an already visited page can be found.
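The spider principle (visit a page, gather its keywords, follow every link) is essentially a breadth-first traversal of the link graph. A minimal JavaScript sketch, in which a hypothetical crawl function and an in-memory page structure stand in for real HTTP requests and HTML parsing:

```javascript
// Minimal spider over an in-memory "web" of the form
// { url: { keywords: [...], links: [...] } }. Starting from one page, it
// records each page's keywords and follows its links breadth-first, visiting
// every reachable page exactly once. Result: catalogue keyword -> addresses.
function crawl(web, startUrl) {
  const catalogue = new Map();
  const visited = new Set([startUrl]);
  const queue = [startUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    const page = web[url];
    if (!page) continue; // dead link: nothing to index
    for (const kw of new Set(page.keywords)) {
      if (!catalogue.has(kw)) catalogue.set(kw, []);
      catalogue.get(kw).push(url);
    }
    for (const link of page.links) {
      if (!visited.has(link)) {
        visited.add(link);
        queue.push(link);
      }
    }
  }
  return catalogue;
}
```

A page that no visited page links to is never reached, which is exactly the limitation of spiders: they can only catalogue what is mentioned somewhere.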

All of these techniques can be used in the prototype, but given the chosen architecture with a database, a search index combined with the option of a live search is the method of choice. At the moment, no method for assessing the accuracy of a search has been implemented, but such an assessment is planned. It will then be possible to sort the contents of the database according to their relevance for certain user classes or query purposes. With such a grouping, follow-up documents could be presented to the user in addition to those most relevant to the query: not in direct response to the query, but in accordance with the general theme outlined by all the queries the user has made.

Description of the prototype

The prototype is divided into four parts: an indexing tool that allows users to enter new data into the database and to interact with older data, a viewing tool for displaying the data, a query tool with integrated search functions, and a MySQL database for storing the data more reliably. The SVG files are also stored separately in the server's file system, to provide quick access without the delay caused by a database query.

Figure 5: Main menu of the prototype

Several goals were realized in the prototype, although it has not yet reached the full potential envisioned at the start. The central concept is that information management and handling require three distinct areas of interaction with the data, which are realized through concept, hardware and software (Figure 6).

Figure 6: Schema of user interaction with information

Since the present work deals mostly with the software side of the implementation, hardware is not discussed in depth. Apart from the server needed for the prototype and the clients, the only additional hardware would be a scanner for converting physical sketches into digital form. In the prototype, the data is analyzed and modeled directly in the indexing tool and stored on the server in the MySQL database. The data is represented via the Adobe SVG Viewer. Reviewing the different interactions with the data gives a better understanding of the prototype. In some areas, the current tool still has shortcomings, which will be addressed later.

The first area of interest is the capture of information. In the proposed environment with sketches of principle, the basic information is inherent to the sketch (geometric), but is also described in metadata, e.g. author, project and purpose. All this additional information is currently entered when the data (i.e. the sketch) is added to the data collection via the indexing tool. Mouse events are supported in the viewing tool: the geometric information displayed in the sketch can be augmented by pop-ups of meta-information that appear on mouse-over or mouse-click events. These dynamic features cannot yet be added with the indexing tool, but have to be inserted with a text editor. Simplified input of such events is to be added to the indexer.
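Simplified input of such events could amount to a string transformation on the sketch file. The JavaScript below is a hypothetical sketch, not a feature of the prototype: since the paper does not give the signature of the viewer's showInfo function, it assumes that showInfo takes the text to display and that showInfo('') clears it. addPopup is an illustrative name, and the regular expression only approximates XML parsing.

```javascript
// Hypothetical indexer helper: attaches pop-up events to the element with the
// given id by inserting onmouseover/onmouseout attributes into its start tag.
// Assumes the viewer defines showInfo(text) and that showInfo('') clears it.
function addPopup(svgText, elementId, info) {
  // Escape the text for a single-quoted JavaScript string ...
  const js = info.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  // ... and then for a double-quoted XML attribute value.
  const attr = js.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
  const events = ` onmouseover="showInfo('${attr}')" onmouseout="showInfo('')"`;

  // Start tag carrying id="elementId" or id='elementId'; group 1 is everything
  // before the closing ">" or "/>", group 3 the closing itself.
  const id = elementId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const tagRe = new RegExp(`(<[\\w:-]+[^>]*\\sid=(["'])${id}\\2[^>]*?)(\\s*/?>)`);

  // A replacer function avoids "$" in the text being read as a pattern.
  // If no element has that id, the text is returned unchanged.
  return svgText.replace(tagRe, (match, head, quote, close) => head + events + close);
}
```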

The goal is to have this encapsulation take place directly when the sketch is created, either automatically from the context or through input from its creator. An attempt will also be made to deduce metadata from characteristic features of the geometric data: particular combinations of graphic objects would be identified as new composite objects, and the corresponding names inserted into the graphics file.
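A rule of this kind might look as follows. This JavaScript sketch is hypothetical: it takes already parsed primitives as plain objects, ignores transforms, and turns the paper's informal description of a gear box (a box with several circles inside) into a made-up recognition rule. detectGearBoxes and circleInsideRect are illustrative names.

```javascript
// True if circle c lies completely inside rectangle r (same user space,
// transforms ignored).
function circleInsideRect(c, r) {
  return (
    c.cx - c.r >= r.x &&
    c.cx + c.r <= r.x + r.width &&
    c.cy - c.r >= r.y &&
    c.cy + c.r <= r.y + r.height
  );
}

// Hypothetical recognition rule: a rectangle containing at least minCircles
// circles is reported as a "gear box", whose name could then be written into
// the SVG file as additional metadata.
function detectGearBoxes(shapes, minCircles = 2) {
  const rects = shapes.filter((s) => s.type === "rect");
  const circles = shapes.filter((s) => s.type === "circle");
  return rects
    .filter((r) => circles.filter((c) => circleInsideRect(c, r)).length >= minCircles)
    .map((r) => ({ id: r.id, label: "gear box" }));
}
```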

In the prototype, the necessary meta-information is entered by the user who is registering a dataset. He is asked by the system to provide all the specifics he knows about the file. The corresponding input window is shown in the next picture:

Figure 7: Input form of the indexing tool

The user is offered a choice of predefined entries for each category, taken from a thesaurus. However, these entries are not exclusive: if none of them is satisfactory, any other input may be given. The thesaurus can also be edited by an administrator, thereby making new entries available to all users. The information the user enters is then stored in the database tables, but is also converted to XML and inserted into the SVG file itself. This is done by the following write_in_svg function:


function write_in_svg ($datei, array $form)
    {
    // $form holds the submitted values of the indexing form (e.g. $_POST),
    // keyed like the former global variables: 'datum_form', 'firstname_form', ...
    // The keys of $fields are the <desc> ids read back by the viewer (extract_meta).
    $fields = array(
        'datum'     => 'datum_form',
        'vorname'   => 'firstname_form',
        'nachname'  => 'secondname_form',
        'email'     => 'email_form',
        'abteilung' => 'abteilung_form',
        'projekt'   => 'projekt_form',
        'funktion'  => 'function_form',
        'typ'       => 'sketchtype_form',
        'qualitaet' => 'quality_form',
        'lod'       => 'lod_form',
        'farbe'     => 'color_form',
        'kommentar' => 'comment_form'
    );

    $input = "";
    foreach ($fields as $id => $key)
        {
        $value = isset($form[$key]) ? $form[$key] : "";
        // escape user input so that characters such as < or & cannot break the XML
        $input .= "<desc id='$id'>" . htmlspecialchars($value, ENT_QUOTES) . "</desc>\n";
        }

    $svg = file_get_contents($datei);
    if ($svg === false)
        return false;                 // file could not be read

    // insert the metadata in front of the last closing </svg> tag
    $pos = strrpos($svg, "</svg>");
    if ($pos === false)
        return false;                 // not a complete SVG document
    $svg = substr_replace($svg, $input . "</svg>", $pos, strlen("</svg>"));

    return file_put_contents($datei, $svg) !== false;
    }

The advantage of this seemingly redundant process is that the metadata can be accessed without connecting to the database, and that the information stays with the file even if the relation in the database is broken. In certain applications, this might even lead to a system that no longer uses a database; in this prototype, however, the database is kept, because data security and integrity are important for information containers used in product design. Since one goal of this project is to keep information and knowledge from the early phases of product design available in later phases, a database is indispensable.

Entering the sketch into the database means storing and modeling the information, because the defined attributes are separated from the file so that they are available to the special search mechanisms of the database. After they have been stored, the attributes are displayed to the user as a final check.

Figure 8: Display of indexed information

One special aspect of using SVG is the possibility of a full-text search over the file. For this purpose, the complete text of the file is read into a string and stored in a database attribute. The file itself is stored in an upload folder and is accessed through the reference held in the database entity. Unicode encoding makes the data interoperable across different operating systems. Because the geometric data is saved as a string, searches can be run on it. Data modeling also takes place in the indexing tool, since the attributes that contain metadata are added there; their syntax definitions can additionally be changed via the administration tool.

To allow the data to be represented in the viewer, the prototype accesses the stored file from the local upload folder, taking the link from a reference in the database. It would also be possible to use the string from the database directly, by assigning it to an object that would then be fed through an interpreter, but this solution is not implemented at the moment.

The next interface is the search tool, which is available to all users, though not necessarily with the same goal. A designer who has just uploaded a sketch into the database will want to test whether it can be found with the attributes that were added. Another user might be looking for a sketch that could help with a new problem, or for a collection of sketches dealing with a specific solution. These different needs are addressed by complementary search options. The user can search for index terms using an interface very similar to that of the indexing tool, again with the help of the thesaurus. It is also possible to search for terms not included in the thesaurus, but then spelling and exact wording become critical. Accordingly, the thesaurus should be kept up to date and offer a wide enough range of choices.
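One way to reduce spelling problems is to suggest thesaurus entries while the user types. The following JavaScript is a hypothetical sketch (suggestTerms is not part of the prototype), assuming the thesaurus for one category is a plain array of strings:

```javascript
// Offers thesaurus entries that start with what the user has typed so far
// (case-insensitive), so that index terms are spelled the same way at
// indexing time and at search time.
function suggestTerms(thesaurus, typed, limit = 10) {
  const prefix = typed.trim().toLocaleLowerCase();
  if (!prefix) return [];
  return thesaurus
    .filter((term) => term.toLocaleLowerCase().startsWith(prefix))
    .sort((a, b) => a.localeCompare(b))
    .slice(0, limit);
}
```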

Another option is a keyword search inside the stored SVG files, which goes through their defining text. Here the user can look either for geometric information, such as circles with a given radius (units should be used consistently), or for certain predefined building blocks (a gear box might consist of a box with several circles inside). These elements are identified in the file by the corresponding string of characters, e.g. <circle class="L1" cx="24" cy="-1" r="14.5"/>, which the user can enter either through a mask (not yet implemented) or directly, provided he is fluent in XML syntax. To be reliable, such a search requires highly consistent data, meaning that a proposed building block must be properly identified and named in all files. A further step might be direct comparisons between SVG files (string matching) and the use of similarity search algorithms on the files in the database.
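Matching a whole element string is fragile: the same circle written with a different attribute order or spacing would not be found. A sketch of a more tolerant parameter search in JavaScript, assuming unitless radii in user units (in line with the advice to use units consistently); findCircles is a hypothetical name, and the regular expressions only approximate XML parsing:

```javascript
// Parameter search sketch: returns the radii of all <circle> elements in the
// SVG text that lie within `tolerance` of the wanted radius. Reading the r
// attribute on its own avoids depending on attribute order or spacing.
// Radii with units (e.g. r="2mm") are skipped.
function findCircles(svgText, radius, tolerance = 0) {
  const found = [];
  for (const [tag] of svgText.matchAll(/<circle\b[^>]*>/g)) {
    const m = /\sr\s*=\s*(["'])\s*([-+]?\d*\.?\d+)\s*\1/.exec(tag);
    if (m && Math.abs(parseFloat(m[2]) - radius) <= tolerance) {
      found.push(parseFloat(m[2]));
    }
  }
  return found;
}
```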

This search uses the SQL operator LIKE, which searches through the text of all SVG files stored in the database. Truncation (wildcards) is allowed, making it possible to search for fragments of a word. The results are displayed in a table and can be sent to the viewer. A specialization of the string search is to look for all sketches that contain elements of a certain size, which amounts to a parameter search that could be useful for preparing parametric CAD models. As yet, no additional concepts or tools have been introduced for such a search, such as design guidelines fed back from CAD.
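The semantics of LIKE can be emulated outside the database, for instance to test patterns. A JavaScript sketch with hypothetical function names and a hypothetical record layout ({ id, svgText }); it follows the usual LIKE rules (% for any sequence, _ for exactly one character, backslash as escape) and is case-insensitive, as LIKE is in MySQL with the default case-insensitive collations:

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRe(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Converts an SQL LIKE pattern into an anchored, case-insensitive RegExp.
function likeToRegExp(pattern) {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      re += escapeRe(pattern[++i]); // escaped character is taken literally
    } else if (ch === "%") {
      re += "[\\s\\S]*"; // any sequence, including line breaks
    } else if (ch === "_") {
      re += "[\\s\\S]"; // exactly one character
    } else {
      re += escapeRe(ch);
    }
  }
  return new RegExp("^" + re + "$", "iu");
}

// Emulates SELECT id FROM sketches WHERE svg_text LIKE pattern
// (table and column names are hypothetical).
function likeSearch(records, pattern) {
  const re = likeToRegExp(pattern);
  return records.filter((r) => re.test(r.svgText)).map((r) => r.id);
}
```

A pattern beginning with %, which fragment searches inside the file text require, prevents the database from using an ordinary B-tree index, so every row has to be scanned. This is the price of the string search compared with the index search.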

Figure 9: Search tool with different input masks

Figure 10: Search results

The last, but most important, interface of the prototype is the viewer, where users are presented with the sketches they have chosen from the search results. The viewing frame allows some interaction with the picture, such as pop-up information on mouse-over, or opening the source code to view it or even make changes. The picture is built from the local file, and the embedded metadata is extracted on demand by a script. Moving the mouse over an element of a sketch typically calls showInfo(), and moving it out again clears the displayed string.

Figure 11: Viewing tool

The metadata that is shown alongside the image in the viewing tool is extracted from the SVG file via the following scripted function called extract_meta:


  var svgplugin;

  // Returns the text of the metadata element with the given id, or "" if the
  // element is missing or empty (its firstChild is then null).
  function get_meta(id)
  {
	var node = svgplugin.getElementById(id);
	if (!node || !node.firstChild)
		return "";
	return node.firstChild.nodeValue;
  }

  // Escapes text before it is inserted into the page with innerHTML.
  function esc(text)
  {
	return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }

  function extract_meta()
  {
	//get a reference to the SVG document of the first plugin within the page
	svgplugin = document.embeds[0].getSVGDocument();

	//read the metadata written by write_in_svg (the ids must match)
	var datum_obj     = esc(get_meta("datum"));
	var vorname_obj   = esc(get_meta("vorname"));
	var nachname_obj  = esc(get_meta("nachname"));
	var abteilung_obj = esc(get_meta("abteilung"));
	var projekt_obj   = esc(get_meta("projekt"));
	var funktion_obj  = esc(get_meta("funktion"));
	var art_obj       = esc(get_meta("typ"));
	var qual_obj      = esc(get_meta("qualitaet"));
	var lod_obj       = esc(get_meta("lod"));
	var farbe_obj     = esc(get_meta("farbe"));

	//write in "Erstellungsinformation" (creation information)
	document.getElementById("erstell_info").innerHTML =
		"<p>Vorname:   " + vorname_obj +            // first name
		"</p><p>Nachname:   " + nachname_obj +      // last name
		"</p><p>Datum:   " + datum_obj +            // date
		"</p><p>Abteilung:   " + abteilung_obj +    // department
		"</p><p>Projekt:   " + projekt_obj + "</p>";

	//write in "Funktion" (function)
	document.getElementById("funktion_info").innerHTML = "<p>" + funktion_obj + "</p>";

	//write in "Formatinformation" (format information)
	document.getElementById("format_info").innerHTML =
		"<p>Skizzenart:   " + art_obj +                   // sketch type
		"</p><p>Qualitaet der Skizze:   " + qual_obj +    // sketch quality
		"</p><p>Detaillierungsgrad:   " + lod_obj +       // level of detail
		"</p><p>Farbe:   " + farbe_obj + "</p>";          // colour

	//write in "Kommentar" (comment); a form field's value needs no HTML escaping
	document.getElementById("comment_info").value = get_meta("kommentar");
  }

Conclusion, outlook

The prototype presented in this paper fulfils the basic requirements defined in the introduction. Sketches in SVG format can be added to a database, together with the necessary metadata, via a web-based portal. The metadata is also inserted into the files, making it possible to search the raw data for the user-defined attributes. The portal allows indexing of sketches, offers different search capabilities and has a standard viewer with added functionality, such as information extraction on mouse-over and display of metadata.

Up to now, no direct editing or generation of sketches has been implemented, but any SVG editor can be used to create source material. The portal itself is backed by a database that manages the data collection, and it uses JavaScript for its interactive features. The SVG files are stored both in the database and in the file system, with the file-system copy used for quick access.

Possible expansion stages of the current prototype are outlined in Figure 12.

Figure 12: Outlook on further projects

The prototype can be tested and explored at http://www.zpe-svg.ethz.ch and will be online from the 20th of June 2002. The authors would like to sincerely thank all those who contributed.

