SVG and the Preservation of Vector Images

David Duce

Professor of Computer Science
Oxford Brookes University School of Technology


Wheatley Campus, Oxford OX33 1HX, UK
+44-1865-484528; +44-1865-484545

David Duce is a professor of computer science at Oxford Brookes University.

Bob Hopgood

Professor of Computer Science
Oxford Brookes University School of Technology


Wheatley Campus, Oxford OX33 1HX, UK
+44-1865-484582; +44-1865-484545

Bob Hopgood is a professor of computer science at Oxford Brookes University.

Mike Coyne

System Simulation


Bedford Chambers, The Piazza, Covent Garden, London WC2E 8HA, UK
+44-20-78367406

Mike Stapleton

System Simulation


Bedford Chambers, The Piazza, Covent Garden, London WC2E 8HA, UK
+44-20-78367406

George Mallen

System Simulation


Bedford Chambers, The Piazza, Covent Garden, London WC2E 8HA, UK
+44-20-78367406


Abstract


Software that uses vector graphics to produce images has existed since the 1950s, and the applications of vector graphics are extensive, ranging from computer art and computer-aided graphic design to science, computer-aided engineering and cartography. Ensuring future access to vector graphics material poses a series of challenges: as technology changes, information systems lose their ability to access digital material created in earlier forms. This paper reports the results of a study into the preservation of vector images carried out by the authors under the Digital Preservation and Records Management Programme of JISC, the Joint Information Systems Committee of the UK Higher Education funding bodies. One of the key recommendations of the study was that WebCGM, SVG 1.1 and PDF/A be used as archival formats for 2D vector graphics. We explore the positive and negative aspects of SVG in this context.

The study was grounded in a framework provided by the JISC-funded InSPECT project. InSPECT is building on earlier work and, in particular, an approach to digital preservation based on the notion of significant properties. The notion owes much to work by the National Archives of Australia (NAA) and The National Archives in the UK, as well as a series of earlier JISC-funded projects.

A novel aspect of our approach was to marry the notion of Significant Properties (“the characteristics of digital objects that must be preserved over time in order to ensure the continued accessibility, usability, and meaning of objects”) with the levels of abstraction for computer graphics identified in the ISO/IEC Computer Graphics Reference Model. This led to a framework for identifying and recording Significant Properties of vector images. Case studies of the approach are discussed, along with a pilot tool to aid in the extraction of significant properties (with human input) from SVG documents. The study raised questions about why digital objects are being preserved, by whom, and for what purpose. We also open up these issues to wider debate.


Table of Contents

Introduction
The Study
Background and Context
Methodology
Approach
The Computer Graphics Reference Model
Significant Properties and Metrics
Tool Support
Case Studies
Archaeology
Computer Art
SVG as an Archival Format
Conclusions and Recommendations
Acknowledgements
References

This paper is based on a study carried out by the authors for JISC (Joint Information Systems Committee), a UK organization whose mission is "to provide world-class leadership in the innovative use of ICT to support education and research". Outside the UK, JISC is probably best known as the organization that funds the JANET network. JISC activities cover a broad spectrum of strategic themes including networking, access management, e-Learning, e-Research and the digital environment. The digital environment theme includes digital preservation as well as digitization and digital repositories programmes [JISC08a]. JISC has funded an extensive programme in Digital Preservation and Assets Management, which started in June 2000 and is due to end in July 2009. The authors were commissioned to carry out a study on preservation of vector images, in the context of an over-arching approach called Significant Properties. The full report is freely available at [CoyneDHCSM07]. At the same time, JISC also commissioned studies on moving images, software and e-Learning objects; these reports are also freely available [JISC08b].

The next section describes the background to the study and the context in which it was carried out. The main aspects of the approach will then be described together with some case studies showing its application. The paper concludes with a discussion of the key recommendations made in the full report.

Digital preservation has been driven by the professional concerns of archivists and is sometimes naively seen as an extension into the digital age of the work of archivists, librarians and other professional guardians of material from the past. For many centuries correspondence consisted of an exchange of letters, usually on paper; modern correspondence is more likely to be based on an exchange of e-mails and this would suggest that the task of archivists now is to preserve important correspondence in this form. Similarly, books and articles are increasingly accessed as electronic documents, so that the role of librarians in safeguarding significant texts is moving in a similar direction. And so on with still and moving images that are soon likely to be predominantly available in digital form.

In many ways this is a natural extension of thinking about traditional archiving activity, prompted by the obvious consideration that, for many activities, the intentions of those engaged are much the same as those of a person using traditional media. There are, however, important differences:

  • the range of objects worthy of consideration for preservation has been extended. In the digital age, there are new phenomena: significant data collections, a huge variety of computer programmes and specialised applications, such as those that give rise to e-learning objects;

  • access to digital objects is mediated by information technology; you need a computer and the right software to see (or print out) a digital letter;

  • this technology is subject to continuous obsolescence;

  • with many digital objects a greater degree of interaction is possible than was the case with traditional media.

Digital objects at the most basic level consist of binary data, which requires conversion before the object becomes accessible. Hardware, operating systems and file formats are all subject to continuous change and obsolescence, which poses challenges for ensuring that access continues to be possible. A number of different approaches to the problem have been proposed; the approach that formed the context for the present study is the migration approach - moving digital objects to new media and/or new file formats as obsolescence sets in. A migration-based approach immediately raises the question of how faithful the migrated object should be to the original, given that migration may incur loss.

The JISC study was carried out in the context provided by another project, InSPECT (Investigating the Significant Properties of Electronic Content Over Time) [InSPECT08]. InSPECT has embraced the migration approach: “the current consensus about digital preservation holds that approaches that are data-centric, i.e. concerned about keeping the data object usable over time, offer better prospects for success than those which are process-centric, i.e. concerned to keep original software and/or hardware environments operational over time”. InSPECT is clear that for the authenticity and integrity of a digital object to be preserved, it is not necessary to preserve it exactly in its original form. It is asserted that “a record is considered to be essentially complete and uncorrupted if the message [it] meant to communicate in order to achieve its purpose is unaltered”. There are parallels here to the “Do you see what I mean?” question in visualization ([DukeBDH05]). The conclusion is that “any successful preservation strategy must reconcile the perceived requirement to maintain the authenticity and integrity of the logical information object, with the inevitable transformation of the technical environment in which the object resides”. The approach is underpinned by the notion of a performance model developed by a digital preservation project of the National Archives of Australia (NAA) in 2002 [HislopDW02]. In the NAA project the focus was on the preservation of digital records; InSPECT regards the approach as being generalisable to all forms of digital object.

The performance model sees each interaction between a data source and the technology that presents it as a performance and recognises that “a source may be mediated by many different software platforms, and each combination of source and specific process platform may produce a slightly different performance”. The view is that this conceptualisation of digital object shows that “neither the source nor the process need be retained in their original state for a future performance to be considered authentic. As long as the essential parts of the performance can be replicated over time, the source and process can be replaced” ([HislopDW02], page 11). The performance model is illustrated in figure 1.


A source combined with a process creates a performance of the object. The strategy for preservation is to transform the original together with any related information available, to result in a second rendering that has the essence of the original. Essence is defined by a set of Significant Properties, such that if this set of properties is retained, the essential characteristics or essence of the original will also be maintained. The InSPECT project has identified five categories of Significant Properties:

  • content, e.g. text, image, slides

  • context, e.g. who, when, why

  • appearance, e.g. font and size, colour, layout

  • structure, e.g. embedded files, pagination, headings

  • behaviour, e.g. hypertext links, updating calculations, active links

The present study focused on the appearance category, though we later make some remarks about the context category.

It is worth remarking that the notion of Significant Properties might also find application in the development of test suites, e.g. for SVG. A test suite serves not only to illustrate correct behaviour, but also to document the allowable limits on correct behaviour, in other words to express the Significant Properties that must be preserved in a rendition of a test object in order for the rendition to be considered valid.

The methodology adopted was to progress systematically from an initial definition of how the concept of Significant Properties applies to vector images, through proving and refining exercises, to the formulation of templates that express the Significant Properties of vector images in standard ways. This process of elaboration and refinement was applied consistently throughout a series of activities designed to provide evidence and enhance understanding of distinct aspects of the core issues.

As the study progressed, the importance of the context of preservation emerged, and the need for further work in this area is reflected in the recommendations made by the study. Computer graphics might well be a very good driver for this, but it was our view that this debate needs to build on the foundation established in this project, while taking place in a wider context than the current study could properly provide.

At an early stage it was realized that the ISO/IEC Computer Graphics Reference Model ([CGRM92], [Carson93]) might provide a framework for characterising significant properties and for providing a metric for expressing Significant Properties. A number of members of the project team had contributed to the development of the reference model in the late 1980s. The important point about the CGRM is that it identifies a number of levels of abstraction (five to be precise) in the generation of a computer graphics image, and it develops the idea that properties of the final image may be bound at different levels in this model, thus providing, for example, a way to distinguish between a usage of colour that is of primary importance to the application, and a usage that is purely to enable one group of primitives to be distinguished from another. This idea was put to the test in the first of a series of project workshops.

The initial ideas were then developed into a first working paper that was presented to a wider audience at a second workshop, which included representatives from the scientific visualization and file format (in particular PDF/A) communities. This workshop enabled the team to explore the notion of performance in more depth. The workshop confirmed that the project should focus on three candidate formats for preservation of vector images: CGM ([CGM], [DuceHH02]), SVG [SVG] and PDF/A [PDF/A]. CGM, the Computer Graphics Metafile, has a Web profile, WebCGM [WebCGM], which was included along with the full standard. PDF/A is a file format based on PDF “which provides a mechanism for representing electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files” (from the ISO standard); the key to achieving this is that the documents are self-contained. The next step then was to identify a set of properties common to these three formats and an organisational framework over them.

It became clear that there were strong similarities with questions that concern visualization and accessibility researchers, in particular when a subject views a visualization do they see what the author intended (“Do you see what I mean?”), and the dual question “Do I mean what you see?” that arises in bottom-up approaches to accessibility in which a third party interprets the visualization to a subject. These somewhat philosophical discussions enabled the project team to select a few representative application areas where the notion of meaning is very different. The choice was also driven by the realisation that there are primarily three ways of generating vector images:

  • by generation from data;

  • directly;

  • by extraction from a raster image.

The approach taken in this study focuses on the first two of these. The areas included science, engineering, cartography, statistics, graphic design and computer art. Computer art usually falls into the second category above; the other areas usually fall into the first. The work was then taken forward through case studies involving examples from these communities. The list of application areas was not based on any particular taxonomy, but it was thought that consideration of these particular areas would be representative of the problems likely to be encountered across a broader range of areas.

As an aside, one interesting example discussed was the preservation of pianola rolls; they can be scanned, and recreated to play on a physical instrument, or on a virtual instrument. Faults and imperfections can be removed during processing. This latter kind of processing can be highly contentious and the discussion highlighted the need to be able to record the processing that has been applied during preservation, what was thought to be of significance before processing and how this was preserved or modified as a result of processing. Similar issues surely arise in the preservation (and restoration) of images.

The approach to defining and documenting Significant Properties that emerged from this study is described in the next section.

The process of creating graphical output can be thought of as a binding process. Typically graphics systems are based on a pipeline model where each stage of the pipeline adds more specific information (or refines existing information) about, say, the position and appearance of primitive elements. The Computer Graphics Reference Model [CGRM92] supports this view. The notion of significance can also be linked to the notion of trade-off between properties: for example, if it is necessary to distinguish one set of lines from another, this might be done using colour or linestyle; the choice might be made late in the binding process, dependent upon the characteristics of the device on which the lines are to be rendered.

This idea is illustrated in figure 2. The same dataset is used in both visualizations. For the left-hand image colour is used to differentiate the data series; in the right-hand image linestyle is used. In this case neither linestyle nor colour are core properties of the visualization. What is important is that the observer should be able to differentiate between the four data series. In other applications (an example from cartography will be given later) linestyle might have a precise meaning for the application and such a tradeoff between linestyle and colour would not be permissible. In civil engineering there are symbolic conventions for representing different types of material, for example those registered as hatch styles in the International Register of Items for CGM [CGMRegister]. In the telecommunications industry, linestyles may be used to denote co-axial cables of specific diameters [SCTE03].
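The trade-off in figure 2 can be made concrete with a small sketch. The snippet below is an illustration of the idea, not part of the study's tooling; the data points and the colour and dash lists are invented for the example. It generates the same data series twice, differentiated first by colour and then by dash pattern.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

def polyline(points, **style):
    # Build an SVG polyline; keyword styles map to presentation attributes.
    attrs = {"points": points, "fill": "none"}
    attrs.update({k.replace("_", "-"): v for k, v in style.items()})
    return ET.Element(f"{{{SVG_NS}}}polyline", attrs)

def chart(series, differentiate_by="colour"):
    # Render the data series, telling them apart by colour or by linestyle.
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "200", "height": "100"})
    colours = ["red", "green", "blue", "orange"]
    dashes = ["none", "4 2", "1 3", "8 2 1 2"]
    for i, pts in enumerate(series):
        if differentiate_by == "colour":
            svg.append(polyline(pts, stroke=colours[i]))
        else:
            # Substituting dash pattern for colour loses nothing here:
            # only the differentiation is significant, not the exact values.
            svg.append(polyline(pts, stroke="black", stroke_dasharray=dashes[i]))
    return ET.tostring(svg, encoding="unicode")

data = ["0,90 50,60 100,40", "0,80 50,70 100,20"]
left = chart(data, "colour")      # figure 2, left-hand style
right = chart(data, "linestyle")  # figure 2, right-hand style
```

Either rendering preserves the Significant Property (distinguishable data series); neither the specific colours nor the specific dash patterns are themselves significant.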


The Computer Graphics Reference Model captures these notions, by recognizing different levels of abstraction in the creation of graphical output. (CGRM also deals with graphical input, but that is beyond the scope of this paper.) CGRM discerns five abstract levels, called environments.

  • construction

  • virtual

  • viewing

  • logical

  • realization

The model is illustrated in figure 3.


A model of the entity to be displayed is established in the construction environment. Properties that are intrinsically important are bound at the virtual environment to generate a scene. For both 3D and 2D, the viewing environment defines what part of the scene is to be viewed (called the picture) and how. The logical environment binds those properties regarded as styling rather than content to create a graphical image to be rendered on a display. Properties bound at the virtual environment are more significant than those bound at the logical environment.

There is a helpful analogy with film-based photography (still images). Processing a film to produce a photographic negative is in some sense akin to the viewing environment. Many properties of the image are defined in the negative, but the relationship between negative and print is not unique. Further information is bound in when the negative is printed, or, put another way, further choices are made. These include the choice of paper (glossy, satin, ...), the region of the negative to be printed and the size of the print, together with factors determined by the print process such as the exposure time and processing conditions. Given the choice, the properties considered significant in a final print may determine whether the print itself or the negative should be archived. In CGRM terms, we could think of the negative as the Scene and the print as the Display. The process of editing/cutting a film can also be conceptualized in terms of Model, Scene and Picture.

This model also points to one of the reasons, given the choice, for preserving a digital image at the vector rather than the raster level: the vector level provides much more opportunity for re-purposing the image than the raster level does.

Using the Computer Graphics Reference Model, it is possible to define some broad categories for the value of Significant Properties as follows:

  • Property is significant in the scene of the virtual environment.

  • Property is bound to the graphical image of the logical environment.

  • Property is used, but in a minor way and, if missing, little information would be lost.

  • Property is not used at all or has no significance.

The challenge is to identify the Significant Properties of the original vector graphics and ensure that any migrated vector graphics retains the Significant Properties when performed (rendered).

There is a reasonably standard set of primitives in all the vector graphics systems in use and a reasonably standard set of styling that is applied to them. However, which are significant varies from application to application. An application may require the metrics of the original data to be kept or it may only require the structure to be retained. Line and area attributes may be used just for differentiation or their exact values may be significant.

If attributes are being used just for differentiation, it should be possible to substitute one presentation for another without any loss of information. Two lines drawn in red and green could be drawn as solid and dashed if the colours were just being used for differentiation. In figure 2 colour is not significant: some method of differentiating between lines is important, but the particular method is not. It should be possible to retrieve temperature values from the graphs. Changing text orientation is of minor significance. From the legend it can be deduced that the Y-value is temperature in degrees Fahrenheit.

To allow some gradation in importance, it would be possible to give a significance value to a property in the range 0 to 9 as follows:

  • 0: property has no significance, it is not used.

  • 1 to 3: property is used but does not have any major effect on the diagram.

  • 4 to 6: property is used and different values of this property must be differentiable in the diagram. However, substitution by another property would not be significant.

  • 7 to 9: property is used and is significant. Not rendering it or substituting another property for it will cause a serious loss of information.

The aim would be to use the values 0, 2, 5, and 8 as the main differentiators and then use the values above and below to shade the significance of properties when there are several with the same value.
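The 0 to 9 metric can be captured in a few lines of code. The sketch below is our own shorthand for the bands described above; the band names are invented labels, not terminology from the study.

```python
def significance_band(value):
    # Map a 0-9 significance value to the bands described in the text.
    if not 0 <= value <= 9:
        raise ValueError("significance must be in the range 0 to 9")
    if value == 0:
        return "not used"        # property has no significance
    if value <= 3:
        return "minor"           # used, but no major effect on the diagram
    if value <= 6:
        return "differentiable"  # values must be distinguishable; substitution acceptable
    return "essential"           # omission or substitution loses information

# 0, 2, 5 and 8 serve as the main differentiators:
bands = [significance_band(v) for v in (0, 2, 5, 8)]
```

The values either side of each main differentiator (e.g. 4 and 6 around 5) then shade the relative significance of properties falling in the same band.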

Based on the properties in SVG, WebCGM and PDF/A, a tentative property hierarchy was developed. It is important to stress that this hierarchy is intended to illustrate the idea; it is not offered as a definitive contribution. The hierarchy was based on the observation that there are relationships between properties, for example:

  • line cap and line join can only be significant if line width is significant.

  • text properties are only significant if there is text in the diagram.

  • If there are no filled areas, all the properties associated with area filling can be ignored.

  • Colour is not significant in black and white drawings.

The property hierarchy is illustrated in figure 4.


The idea is that such a hierarchy could be used as a basis for a questionnaire for an author or archivist, tailored to the content of a diagram. A prototype tool to support this process is described in the next section.
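The dependency rules above lend themselves to a simple pruning step when generating such a questionnaire: a property is only worth asking about if its parent in the hierarchy is relevant to the diagram. The sketch below is illustrative only; the rule table is a small invented subset, not the study's definitive hierarchy.

```python
# Each property maps to the feature it depends on (invented subset).
DEPENDS_ON = {
    "line-cap": "line-width",
    "line-join": "line-width",
    "font": "text",
    "writing-direction": "text",
    "fill-rule": "filled-areas",
    "colour": "uses-colour",
}

def candidate_properties(features_present):
    """Return the properties an archivist should be asked about,
    given the set of features actually used in the diagram."""
    return sorted(p for p, parent in DEPENDS_ON.items()
                  if parent in features_present)

# A black-and-white line drawing containing text but no filled areas:
print(candidate_properties({"line-width", "text"}))
# → ['font', 'line-cap', 'line-join', 'writing-direction']
```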

The root of the hierarchy is the diagram node. The first level of questions concerns the structure, style and primitives used in the diagram. These nodes are shown in blue.

Structure within the graphic is frequently used to improve the quality of the definition of a vector graphic drawing. In SVG, the g element is used to break a large diagram into parts, those parts into sub-parts and so on. Each group is then related to its parent by some geometric transformation giving its position relative to the parent. Particularly relevant in the CAD and mapping areas is the ability to group parts of the vector graphic drawing into layers, where individual layers can be made visible or not. This gives the user the ability to declutter a drawing so that the parts relevant for a specific task are visible and non-relevant parts are not displayed. It is important that any transformation of the diagram retains hierarchical or layering structure.
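Layering of this kind can be sketched with g elements and a visibility toggle. The snippet below is a minimal illustration; the layer ids ("roads", "contours") and geometry are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A map drawing split into layers with g; the transform positions
# the "roads" group relative to its parent.
doc = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g id="roads" transform="translate(10,10)">'
    '<path d="M0,0 L100,0"/>'
    '</g>'
    '<g id="contours"><path d="M0,50 C20,40 40,60 60,50"/></g>'
    '</svg>')

def set_layer_visibility(svg, layer_id, visible):
    # Declutter the drawing by toggling a layer's display property.
    for g in svg.iter(f"{{{SVG_NS}}}g"):
        if g.get("id") == layer_id:
            g.set("display", "inline" if visible else "none")

set_layer_visibility(doc, "contours", False)  # hide the contour layer
```

Any migration of such a document should carry the g structure across, so that this decluttering capability survives.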

The style question establishes how opacity and tone are used in the diagram and whether there are constraints on the quality of rendering.

If a diagram contains text primitives, the text question will establish the Significant Properties of text in this context. The text attributes are associated with questions about the glyphs, strings, bounding boxes and writing direction of text in the diagram.

The Significant Properties of paths, areas and points are elicited by the paths, path construction, areas and points questions. (To avoid confusion, we note that the reentrant question in the figure is associated with the property interior definition.)

As an example of this approach, the Significant Properties of figure 2(b) are shown in figure 5. The figure illustrates which properties are significant; properties not in the figure have no significance. This representation does not, however, display actual values of properties. Some other graphical representation could be used to denote the level of significance of each of these properties.


The approach outlined above was developed further in the context of SVG by mapping the hierarchy of properties into XML and exploring how that process could be automated, using the SVG document itself to constrain the set of properties that needs to be examined. Properties were represented as elements in an XML document; an alternative approach would have been to use RDF to make statements about properties, and the web ontology language, OWL, to express the underlying data model and its constraints. For piloting purposes, XML offered the easier route.

Figure 6 illustrates the approach.


The aim of tool support is to generate a list of candidate properties, based on the content of the SVG document. The first stage of processing takes the candidate SVG document and a list of all potential properties (AllProp.xml). An XSLT transformation then generates a list of potentially significant properties for that document, based on the property hierarchy. A second XSLT transformation is then used to generate an XHTML or XForms document, which the archivist uses, applying human judgement, to supply values for these properties. This is the stage labelled “archivist input”. In the final stage the property document is transformed to generate a human-readable property report.
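As a stand-in for the first pipeline stage (which the study implemented in XSLT), the following Python sketch scans an SVG document and reports the element and attribute names actually used, so that only the corresponding properties need be put to the archivist. The sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

def used_features(svg_text):
    # Report which elements and presentation attributes an SVG
    # document actually uses, to constrain the candidate property set.
    root = ET.fromstring(svg_text)
    elements, attributes = set(), set()
    for el in root.iter():
        elements.add(el.tag.split("}")[-1])  # strip the namespace
        attributes.update(el.keys())
    return elements, attributes

svg = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<text font-family="serif">T</text>'
       '<path d="M0,0 L10,10" stroke-dasharray="4 2"/></svg>')
els, attrs = used_features(svg)
# "text" present => ask about text properties; "stroke-dasharray" used
# => ask whether the dash pattern is significant or mere differentiation.
```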

A pilot implementation of the first XSLT stylesheet has been developed; its use is illustrated in the next section.

The Computer Art area poses interestingly different challenges, not least because of the nature of artistic content. As we have argued above, the two main methods of generating vector graphics are those that are data driven and those, often art works, which are generated directly. By "directly" we mean that the graphic is the end product of the artist's subjective intention realised through the computer graphics production process. The judgement about whether the result is "good" or "effective" is then often subjective and cannot be made by reference to a pre-existing visual vocabulary, such as engineering drawings, bar charts, histograms and so on.

Figure 8 shows some early examples of art works from the Computer Arts Society's collection of early works now held at the V&A museum in London.


A brief examination of some of this collection and applying the proposed approach revealed that:

  • The significant properties of early computer art works do vary.

  • In only one case was text significant at a metric value of 9.

  • Several were significant at a metric value around 5 and a few had values around 2.

  • Most of the line drawings needed a reasonably fine line, but otherwise line thickness was not significant. Only one line drawing made significant use of thick lines.

  • Many consisted of a regular pattern of characters not used as text but because of their overall grey scale intensity.

  • Only a few of the area filled works had subpaths that required the fill-rule to be specified.

  • Colour was primarily used for differentiation. Changing one value of red for another would be of little consequence. One or two had sufficiently precise differences between colours that they could be called significant.

In summary, for this set of early computer art works, the significant properties do bring out the different types of works. The metric is also useful in giving a view of the significance of the property. At least four of the five layers of the CGRM are used and, in the case of the presentations of multiple instances of an object in some relationship, it can be argued that all the five layers of the CGRM have been used.

Discussion of the CGRM model applied to art images has revealed some aspects which seemed to question the immediate technical concepts underpinning the model. These revolve round the perennial question of “what is art?” and the associated philosophical or aesthetic issues of interpretation and how such subjective and culturally dependent elements can or should be taken into account in preserving digital objects in this category. Paul Brown (an internationally recognized computer artist and Vice Chair of the Computer Arts Society) observed:

I've reconstructed many of my early plotter works using contemporary technology. To me these are essentially 'identical' or maybe even better than the originals but the art world is unlikely to agree. This is the distinction between an artist working in the “conceptual domain” and an art world addicted to the unique artefact. Somewhere in here is the concept of the “original”. Many artists (e.g. Verostko and Hebert) keep old plotters alive in order to maintain their unique characteristics.

These comments point to the need to document the reasons why a digital object is being preserved and precisely why it is being preserved in a particular way. Significant Properties of the kinds identified in this study have a role to play in this, but a much broader context of metadata is required in order to embrace all the aspects of this. Since different players (artists, critics, conservationists, etc.) may well have very different perspectives on what exactly constitutes “the original”, there will potentially be many ways to preserve this in digital form which are equally valid modulo the perspective. Hence it is vitally important to be able to record metadata about the perspective along with the more fine-grained metadata about the specific properties of the vector image and their significance levels in a particular perspective.

All three of the 2D vector graphics formats considered (WebCGM, PDF/A and SVG 1.1) are well defined and well supported. Of the three, PDF/A is the only one that is specifically defined for archival purposes. Both SVG 1.1 and WebCGM files can rely on external information that is not automatically contained in the archival file. CGM symbols can appear in a separate symbol library. There is no possibility of including font definitions within the WebCGM file as there is with PDF/A. For all three formats there are tools available to convert between them and proprietary formats, to some degree of accuracy. Proprietary applications frequently provide import/export for some or all of these formats.

SVG 1.1 does have the ability to define the fonts used within the SVG document, and it is possible to serialise an SVG 1.1 document so that it can be rendered in a single pass. It would be quite simple to define a checker to ensure that an archived file contains no external links.
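Such a checker is straightforward to sketch. The fragment below is an illustrative sketch only, not a complete validator: it inspects only href/xlink:href attributes (ignoring, for example, url() references inside style properties) and flags any reference that is not a same-document fragment identifier:

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def external_links(svg_text):
    """Return href values that point outside the document.

    Internal references (fragment identifiers such as "#myGradient")
    are allowed; anything else is flagged as an external dependency
    that would make the file unsuitable for self-contained archiving.
    """
    root = ET.fromstring(svg_text)
    flagged = []
    for el in root.iter():
        for attr in (XLINK_HREF, "href"):
            ref = el.get(attr)
            if ref is not None and not ref.startswith("#"):
                flagged.append(ref)
    return flagged

# A self-contained file passes; one referencing a remote image does not.
clean = ('<svg xmlns="http://www.w3.org/2000/svg" '
         'xmlns:xlink="http://www.w3.org/1999/xlink">'
         '<use xlink:href="#sym"/></svg>')
dirty = ('<svg xmlns="http://www.w3.org/2000/svg" '
         'xmlns:xlink="http://www.w3.org/1999/xlink">'
         '<image xlink:href="http://example.org/logo.png"/></svg>')
```

A production checker would also need to examine CSS style content and, for a stricter archival profile, verify that all referenced fonts are defined within the document.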

Whilst animated images were outside the scope of this particular study, the declarative animation functionality of SVG provides a powerful mechanism for preserving animated images, not least because it avoids issues such as interlacing and frame rate that pose significant challenges to the film and video industries.
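A minimal illustration (hypothetical markup, using only standard SVG 1.1 animation elements): the animation below is specified entirely by start and end values and a duration, so no frame rate or frame count is baked into the archived file; the renderer decides how many frames to produce.

```python
import xml.etree.ElementTree as ET

# A rectangle that slides 100 units to the right over two seconds.
# The timing model is declarative: the file records intent (from, to,
# dur), not a sequence of frames.
ANIMATED = """\
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="50">
  <rect x="0" y="10" width="30" height="30" fill="blue">
    <animate attributeName="x" from="0" to="100"
             dur="2s" repeatCount="indefinite"/>
  </rect>
</svg>
"""

root = ET.fromstring(ANIMATED)
anim = root.find(".//{http://www.w3.org/2000/svg}animate")
```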

The XML representation of SVG makes this format particularly attractive, given the availability of transformation tools such as XSLT to manipulate SVG documents.
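XSLT is the obvious vehicle, but any XML-aware toolkit will do. As a stdlib-only sketch (the normalisation policy shown here is an assumption, not a prescribed archival profile), the Python fragment below performs the kind of transformation an archival ingest pipeline might apply: stripping script elements and event-handler attributes to produce a static, self-contained copy.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def strip_scripting(svg_text):
    """Remove script elements and on* event attributes from an SVG
    document -- an example of the normalising transformations that
    SVG's XML representation makes easy to express."""
    root = ET.fromstring(svg_text)
    # Materialise the element list first so removals are safe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{SVG_NS}}}script":
                parent.remove(child)
        for attr in [a for a in parent.attrib if a.startswith("on")]:
            del parent.attrib[attr]
    return ET.tostring(root, encoding="unicode")

BEFORE = ('<svg xmlns="http://www.w3.org/2000/svg">'
          '<script>alert(1)</script>'
          '<rect onclick="activate()" width="10" height="10"/></svg>')
AFTER = strip_scripting(BEFORE)
```

The same transformation is a few lines of XSLT; the point is that generic XML tooling, rather than format-specific software, suffices to manipulate SVG documents.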

There are however some areas in which SVG could be improved from the standpoint of archiving. Layering is important in many applications. Whilst the g element can be used to mark up layers, this element is in practice used for many other purposes, and explicit support for layers would be appropriate. The output primitive set in SVG is reasonably rich, though it is noticeably leaner than CGM's. Support for Bezier splines is useful, but richer spline support (for example, inclusion of NURBS) would widen SVG's applicability. It can be argued that the properties/attributes model in SVG is deficient with respect to the CGRM. The issue here is one of control. Because of the general CSS styling framework in which SVG is situated, SVG's presentation attributes do not mirror the individual attribute concept of earlier graphics standards. In those standards, individual attributes were used to indicate that the value of a property was in some sense essential to the presentation of a primitive (for example, the colour must be red), whereas so-called bundled attributes were used to indicate that the property was being used for differentiation and could be given a value in a display-environment-dependent way. The standards also had the concept of Aspect Source Flags to control whether a bundled or individual value should apply when both were defined. Because presentation attributes in SVG have lower priority than CSS style rules, this kind of control is not available in SVG.
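The loss of control is easy to demonstrate with a hypothetical document. In the markup below, a conforming renderer paints the rectangle green: the CSS rule, even a low-specificity one, outranks the fill="red" presentation attribute, and SVG offers no per-primitive mechanism (analogous to an Aspect Source Flag) by which the author can insist that the attribute value must win.

```python
import xml.etree.ElementTree as ET

# The rect carries the presentation attribute fill="red", but the CSS
# rule in the style element takes priority in the cascade, so the
# rectangle renders green.  The "essential" red is silently overridden.
CASCADE = """\
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <style type="text/css">rect { fill: green; }</style>
  <rect x="10" y="10" width="80" height="80" fill="red"/>
</svg>
"""

root = ET.fromstring(CASCADE)
rect = root.find("{http://www.w3.org/2000/svg}rect")
```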

The study broadly endorsed the InSPECT approach. From the organizations consulted, it seems that the preservation of vector graphics is currently not widely practised. Where vector images are generated from underlying data, it is more common to preserve at the application data level, though we note that if reconstructing vector images from such data is a concern, then preserving the data alone is not sufficient. It is also more common to preserve images in fully rendered (e.g. raster image) form, though this too raises issues, not least concerning colour spaces.

Preserving vector images offers flexibility, akin to preservation of photographic negatives, i.e. the ability to re-render with equivalent or changed parameters. SVG Print (work in progress within W3C) appears to offer a format in which all rendering parameters are bound.

We note that although all three standards considered capture structure at the level of object insertion, grouping hierarchy and layering, there is little ability to capture constraints, such as "Box A is connected to Box B". Use of CSS styling by SVG does provide a way to manage tradeoffs between properties (e.g. the use of either colour or linestyle for differentiation), though the tradeoff itself is not explicitly recorded. The individual and bundled attribute model of the early graphics standards [DuceHH02] provided a more explicit mechanism for this purpose.

The recommendations for further work include:

  • Currently, we recommend that WebCGM, SVG 1.1 and PDF/A be used as the archival formats for 2D vector graphics;

  • PDF/A is a profile of PDF 1.4, designed for archiving purposes, which requires that all the information necessary for displaying the document be embedded in the file. If SVG is to be used for archival purposes, it would be appropriate to develop a corresponding archival profile;

  • A review of the conversion tools available between CGM, PDF/A and SVG would be appropriate. We have noted the existence of robust tools on some of the paths between them but not all;

  • In the context of conversion tools and conversion services, Significant Properties and a Significant Properties report of the kind explored here could be used to drive the conversion process (to indicate what should be done) and could also be an output of the conversion process (to indicate what compromises had to be made, what might have been lost);

  • We would recommend further investigation of the possibility of developing automated tools for the extraction of information regarding the Significant Properties of vector graphics;

  • Some of the lessons learnt during the development of CGM within ISO/IEC have been picked up by W3C and OASIS in the development of WebCGM, and also by W3C in the development of SVG. We are thinking particularly of the importance of test suites as complementary to the formal standards documents themselves in defining the correct behaviour of interpreters. There is a strong case for ensuring that test suites (and their preservation) feature in a digital object preservation strategy alongside preservation of the most formal (and normative) definition of a standard available. We note also in this regard that the PDF/A Competence Center is developing a test suite of documents for validating compliance in PDF/A products;

  • Investigate RDF/A and related work on adding metadata to XHTML and other XML applications, in particular SVG, via a set of additional attributes on existing elements; this would be a way of adding constraint-related metadata to vector graphics. Such an investigation should include the definition of an appropriate ontology for vector graphics;

  • Address the limitations of SVG as a preservation format, pointed to in section SVG as an Archival Format.

  • We have only had a cursory look at 3D vector graphics, animation and interaction. A similar exercise would be appropriate for 3D vector graphics looking at the merits of VRML, X3D and PDF/E at the least.

The contributions of the following individuals to the original study are gratefully acknowledged: Kevin Ashley, Jon Blower, Paul Brown, David Cruickshank, Douglas Dodds, David Duke, Gareth Knight, Nick Lambert, Brian Matthews, Kieron Niven, Alan Shipman, Alan Smith and Will Wilcox. Funding from JISC under the Digital Preservation Programme is also gratefully acknowledged.