Keywords: Autonomous Image, docbook, Electronic Publishing, Integrated Image, Publishing Architecture, Publishing Concept, SVG, XHTML
Benjamin Jung is a lecturer in Computer Science and Medical Informatics at TCD (Trinity College Dublin). He is a regular speaker at XML conferences in Europe and the US. His main research interests are data visualisation, document architectures and XML technologies in the domains of medical informatics and electronic publishing. Since 1997, Benjamin has presented papers and chaired sessions at various Computer Science and Medical conferences. He developed full-day XML tutorials and workshops that were given at conferences in Europe and the US. In 2000, Benjamin co-founded deepX Ltd (http://www.deepx.com/), where he holds the positions of director and consultant.
Electronic Publishing with tools from the XML (Extensible Markup Language) family of technologies has been increasingly used since the first XML and XSLT (Extensible Style Sheet Language Transformation) specifications were published in 1998/1999 and supporting processing applications emerged.
This paper describes ideas and solutions of how to migrate the existing electronic publishing procedures widely used in textual publishing into the graphical domain. The best known technique might be the separation of content from presentation information, but additional characteristics such as transformation, linking and semantic aspects are equally important and have to be considered. The methodology of image composition and decomposition will be explained by means of examples and compared to textual publishing scenarios. Answers to questions such as the following are given:
A PDF version of this paper is available (PubConcept.pdf).
2. Textual Publishing with XML
3. Image Publishing with XML
3.1 Publishing Architecture
3.2 Autonomous Image Publishing
3.3 Integrated Image Publishing
4. Outlook: Multimedia Publishing
Publishing with XML technologies follows the concept of component separation. A total of (at least) five different components or layers have been defined that play important roles in the operation. These are (in order of their appearance in the overall process): Content, Transformation, Semantic/Links, Logic/Scripting and Presentation. Ideally, information related to each individual layer is kept independently in separate files. It is only during the final electronic publishing process that data from these files is merged to successively stitch together a final publication. The concept of modularisation through data separation offers a number of advantages, such as easy maintenance, variable reusability, flexible extensibility and high consistency. To achieve the best possible result, each component is managed by the relevant experts, e.g. content by the author and presentation by the designer. Each component is independent and kept separate from the others, i.e. changes in the author's content won't affect the designer's work, and changes in the designer's layout do not force changes to the author's content. This guarantees the reuse of (once validated) information throughout the publishing process and achieves a consistent look and feel. Changes in one of the components will have an immediate effect right through the entire publishing process.
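The layered separation described above can be illustrated with a deliberately tiny sketch: each layer lives in its own data structure (standing in for separate files), and only the transformation step merges them into a publication. All names and the toy vocabulary here are invented for illustration; a real pipeline would of course use XSLT and style sheets rather than Python.

```python
import xml.etree.ElementTree as ET

# Each layer kept separately (hypothetical, minimal stand-ins for files):
content = "<article><title>Hub and Spoke</title></article>"  # content layer
schema = {"article": ["title"]}                              # schema layer (toy)
links = {"Hub and Spoke": "http://example.org/hub"}          # semantic/link layer
style = "h1 { color: navy; }"                                # presentation layer

def transform(src: str) -> str:
    """Transformation layer: merge content, links and style into XHTML."""
    root = ET.fromstring(src)
    assert root.tag in schema          # crude stand-in for schema validation
    title = root.findtext("title")
    href = links.get(title, "#")       # enrich content with link information
    return (f"<html><head><style>{style}</style></head>"
            f"<body><h1><a href='{href}'>{title}</a></h1></body></html>")

print(transform(content))
```

Changing the style string (the designer's file) alters every publication produced, without touching the author's content, which is exactly the consistency argument made above.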
Naturally, textual publishing was the first domain that adopted this concept. Following steady research and prototype phases, first versions of markup and document processing languages appeared, such as GML (Generalized Markup Language) and DCF (Document Composition Facility). Eventually, standardised versions of these early languages were released in the mid 1980s, including SGML (Standard Generalized Markup Language), a standard for platform-, device- and system-independent electronic representation of text, and DSSSL (Document Style Semantics and Specification Language), a language to define presentation styles. However, these standards were primarily intended for use in textual publishing, and only the rise of a simplified subset of SGML, namely XML, paved the way for a substantial use of markup languages in other domains such as graphics.
If not declared otherwise, the expression 'publishing' specifies a method of publishing in its most generic sense. This includes publishing into all kinds of multimedia formats, such as text (e.g. book), graphics (e.g. image), audio (e.g. song) and video (e.g. movie). Secondly, the words 'image' and 'graphic' in the following chapters always refer to synthetic images (see Chapter 3 ).
The classical publishing process can be illustrated as layered progression architecture as shown in Figure 1 . In order to transform a manuscript into a final publication format, the author's original content has to pass through each of (at least) five layers. During this process the document gets assembled and customised according to the rules, defined by the publisher (contractor) and/or requested by the reader (customer). After each processing step (and possible source modification), the result has to be validated and approved before being forwarded to the next layer.
The first stream shows the conceptual view of the process: it starts with the publication's schema definition, followed by content creation and the inclusion of metadata, before it is rendered and ready for display at the user's side. The information stream indicates which groups of people and professions are involved at the various stages of the information handling. All of them directly relate to phases in the "traditional" method of publishing, where transformations were often executed manually. Examples of XML- and/or web-technologies to define the schemas and rules of the different steps are given in the Technology stream.
Not only does each stage of the technology stream relate to a specific task and method in the overall process of electronic publishing, it also keeps its individual processing information (e.g. instructions) in a separate file. Using XML technologies, XML DTD (Document Type Definition), XML Schema and Relax NG are widely used for schema definitions at present. XML vocabularies such as docBookX (an XML derivative of docBook) are suitable for content markup, and TopicMaps and/or RDF (Resource Description Framework) are good candidates for metadata definition, including links and semantic information. Last but not least, transformation rules are defined with XSL (Extensible Style Sheet Language), in particular its XSLT part, whereas presentation information is kept in FO (Formatting Objects) style sheets for publishing into PDF (Portable Document Format) and CSS (Cascading Style Sheets) for XHTML (Extensible Hypertext Markup Language) styling.
The information and processing separation ensures that content is stored in exactly one source file instead of multiple files of exactly the same content but different formats (e.g. PDF , HTML , eBook); changes take place in the source file and are reflected forward into the different output formats. Other formats can be created on user's request and 'on-the-fly' to support multi-format publishing. A whole set of source files could easily be converted into a new emerging publishing format by just creating another set of transformations from the original source format and a collection of presentation properties to cater for the new layout.
The transformation process as shown in Figure 2 is based on three stages: input, content storage and output. The core of this architecture (also called hub) is the XML document pool, which stores documents in a raw format, e.g. docBookX, independent from the file formats of the input (source) and output (target) documents. Proprietary conversions exist to and from this format using different technologies. Whenever a new source or target format is developed and specified, a single transformation has to be created (again, to or from the XML document pool) to maintain the whole functionality and "upgrade" the system to the new standard.
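The economics of the hub become clear in a small sketch: with converters registered to and from the document pool, adding an Nth format costs one transformation, not one per existing format. The registry, the toy plaintext source and the pool vocabulary are all hypothetical illustrations of the architecture, not a real implementation.

```python
import xml.etree.ElementTree as ET

to_pool = {}    # source format -> function producing pool XML (the hub)
from_pool = {}  # target format -> function consuming pool XML

def register_source(name):
    def deco(fn):
        to_pool[name] = fn
        return fn
    return deco

def register_target(name):
    def deco(fn):
        from_pool[name] = fn
        return fn
    return deco

@register_source("plaintext")
def plaintext_in(text):
    # One transformation into the (invented) raw pool vocabulary.
    return f"<doc><para>{text}</para></doc>"

@register_target("xhtml")
def xhtml_out(pool_xml):
    # One transformation out of the pool; other targets plug in alongside.
    doc = ET.fromstring(pool_xml)
    body = "".join(f"<p>{p.text}</p>" for p in doc.findall("para"))
    return f"<html><body>{body}</body></html>"

def convert(src_fmt, tgt_fmt, data):
    # Any source reaches any target via the hub: N + M converters, not N x M.
    return from_pool[tgt_fmt](to_pool[src_fmt](data))

print(convert("plaintext", "xhtml", "Hello pool"))
```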
In general, graphics or images can be categorised into two types, namely real world images (mainly raster graphics) that we know from photography and video on the one hand, and synthetic images such as cliparts and diagrams on the other hand. The latter, also called 'artificial' images, are manually put together and/or programmatically created using a combination of elements (e.g. lines, shapes) and properties (e.g. colours, filters). Examples of synthetic images include visualisations of abstract data but also virtual reality images, ideally saved in vector graphic formats such as SVG for 2D (two-dimensional) images and VRML (Virtual Reality Modeling Language) for 3D (three-dimensional) images.
Real world images or raster-graphic images are defined in terms of pixels, the smallest logical unit of visual information that can be used to build an image. They are shot with a distinct size and resolution, and each pixel is associated with a specific colour. Applying temporary changes to raster-graphic images, such as zooming and resizing, most often results in 'information smudge', e.g. a person's face in a group portrait becomes unrecognizable (zoom-out) or facial contours are blurred (zoom-in). Permanent changes, on the other hand, are often not retractable once the picture is saved. Altering colour properties such as hue, saturation or brightness leads to a change in the colour scheme that can be difficult to undo. Applying filters such as texturizers takes the 'destruction' of information another step further, as even original contours are irreversibly lost and colour schemes are changed irreparably.
The separation approach as described in the previous chapter has been (in more or less detail) successfully adopted by web publishers. However, almost all of the available web publications are solely based on textual content. Images are often treated as static and independent entities which are simply linked and/or imported into the final publication. At present, a seamless integration of graphics into a semantic context is hardly ever achieved, mainly because a high percentage of images used in today's electronic publications are still raster images, which do not easily support linking from and to parts within the image. Automated and meaningful 'understanding' of relations between images and their 'surrounding text', as predicted by Semantic Web enthusiasts, is far from reality. Furthermore, even relations between parts or elements within an image are rarely accessible, and associations between images are not often realised.
Fortunately, SVG already offers all necessary functionality to lay foundations for intelligent and future-proof integration of images into a rich and meaningful web. An ancient proverb says that 'a picture says more than a thousand words', which indicates the vast amount and complexity of information contained in an image. Nevertheless, a physical separation of an image (decomposition) into its basic components (as described in the previous chapter) can easily be accomplished. The real problem (and challenge) lies in defining the level of granularity to which the image data should be broken up. This level is closely related to the domain and context of the image, and elements are often difficult to classify in a graphics environment.
Naturally, images are looked at in a holistic way. We perceive the entire image as a single impression rather than different components that are compiled together in order to form the whole. Practical solutions to logically create an image from its structural components and their implications have only started to emerge. However, the process is very similar (if not even equal in most cases) to the process of textual publishing and the above mentioned concepts can be adopted more or less unchanged.
Two fundamental techniques can be followed in order to create a basis for a flexible graphics creation, customisation and publishing process: bottom-up and top-down. The first approach starts with a hypothetical image, which is more a description than a visualisation of the scene. It is followed by an investigation and analysis of the available data that will be used for the image compilation. A detailed classification and logical definition of the various sources within their associated topologies is essential. The second approach (top-down) begins with the final image, which is dissected into abstract components and logical elements. This decustomisation process includes subtraction of presentation information (e.g. colours and sizes), definition of dynamic and static parts and degeneration of information into more generic data and data formats.
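The 'subtraction of presentation information' in the top-down approach can be sketched concretely: colour attributes are stripped from an SVG fragment and collected into a separate style sheet, leaving the decolourised logical components behind. The SVG fragment, the element ids and the resulting CSS are invented examples, not taken from the figures in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical final image: presentation (fill) is still mixed into content.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect id="hat"   fill="#333" width="40" height="20"/>
  <rect id="shirt" fill="#c00" width="60" height="50"/>
</svg>"""

root = ET.fromstring(SVG)
css_rules = []
for elem in root.iter():
    fill = elem.attrib.pop("fill", None)   # subtract presentation information
    if fill is not None:
        css_rules.append(f"#{elem.get('id')} {{ fill: {fill}; }}")

print("\n".join(css_rules))                       # separate presentation file
print(ET.tostring(root, encoding="unicode"))      # decolourised content
```

Because the rules are keyed on logical ids ('hat', 'shirt'), re-colouring works only if the image was composed of logical components in the first place, which is exactly the anomaly discussed below for Figure 3.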
This process can under certain circumstances (caused by e.g. the initial graphic's author or designer) create havoc with images, as displayed in Figure 3 . Starting from the right, the image was decolourised and the colour information was stored in a separate file. In a second step, all components related to the musical theme were deleted, such as the notes and the violin. In this process, a first problem was encountered, i.e. a single line was used to separate the areas of shirt and fiddlestick. Taking away the last remaining line of the fiddlestick would open the shirt area and make re-colouring the shirt impossible… It gets worse when taking away hat and shoes, as neither head nor feet are present and the image becomes worthless! These anomalies occur because the images were not composed using logical components (i.e. violin, hat). Instead, presentation features such as autonomous lines and shapes defined areas of the image.
In summary, it can be said that both methodologies result in an abstract description of logical image components and supplementary information, which is needed to define the transformation rules and rendering settings into one or more output formats. This shows a close analogy to the earlier described textual publishing process, as the image customisation process will similarly start with the raw content, which is passed and processed through the various layers in order to enrich and tailor (customise) the content until it (again) resembles the output image.
| Layer                | Text                         | Graphics  | Audio           | Video         |
|----------------------|------------------------------|-----------|-----------------|---------------|
| Application Layer    | XHTML, PDF                   | SVG, JPEG | MIDI, PDF       | MPEG-4 (BIFS) |
| Presentation Layer   | CSS, FO                      | CSS       | Finale Template | MPEG-4 (OD)   |
| Transformation Layer | XSLT, DSSSL, fxp (all media) |           |                 |               |
| Schema Layer         | XML-DTD, XML Schema, Relax NG (all media) | |             |               |
| Content Layer        | Filesystem (XML-) file, (XML-) database (all media) | |   |               |
Table 1 shows XML-related technologies for this process compared to technologies used in textual publishing as well as audio and video publishing (see also Chapter 4 ). It is important to distinguish between two diverse types of differences between textual and image publishing technologies: a syntactical and a semantic variation. A semantic difference can be noticed in the schema layer, where e.g. two XML DTDs use the same syntax but describe two different vocabularies, i.e. a different semantic. Interestingly, content should be defined and saved in a domain-specific but layout- and format-independent structure, and therefore both techniques (textual and graphical publishing) could possibly use a similar (or even the same) content storage vocabulary (in XML syntax). Different XSL transformations would then provide conversions into e.g. a text file describing the image and the image itself (for an example see the following chapter). However, this is not always adequate. The second, syntactical difference would naturally be in the output format, as text and images are expected to be created. Here it is interesting that SVG , as a very complete and logically defined XML vocabulary for graphics, is quite often used as content format already in the schema layer. This method should be carefully reviewed to avoid inconsistencies and redundant information in e.g. a number of graphics. Similar to textual publishing, transformation and presentation information should be stripped off and kept separately, only to be compiled into the final image during the pipeline processing.
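The point that one layout-independent content vocabulary can feed both a textual and a graphical output can be sketched as follows. The `<sales>` vocabulary and both renderings are made-up illustrations; in practice the two branches would be separate XSL transformations over the same source file.

```python
import xml.etree.ElementTree as ET

# One domain-specific, layout- and format-independent content file (invented).
CONTENT = """<sales>
  <quarter name="Q1" value="30"/>
  <quarter name="Q2" value="50"/>
</sales>"""

def to_text(xml):
    # First transformation: a textual description of the data.
    root = ET.fromstring(xml)
    parts = [f"{q.get('name')}: {q.get('value')}" for q in root]
    return "Sales figures - " + ", ".join(parts)

def to_svg(xml):
    # Second transformation: the image itself, as a simple SVG bar chart.
    root = ET.fromstring(xml)
    bars = "".join(
        f'<rect x="{i * 30}" y="0" width="20" height="{q.get("value")}"/>'
        for i, q in enumerate(root))
    return f'<svg xmlns="http://www.w3.org/2000/svg">{bars}</svg>'

print(to_text(CONTENT))
print(to_svg(CONTENT))
```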
Autonomous Image Publishing is a technique where all information from the source ( XML vocabulary) is transformed and compiled into one image, i.e. one single image file. The image is self-contained, independent from other information sources, and could be described as a stand-alone application, as all information is contained within the image file itself. Figure 4 gives an example of an autonomous image. It shows the first frame of a Dilbert comic strip with both static elements (background, i.e. the comic image) and dynamic elements (foreground, i.e. the text in the L10N (localization) callouts) clearly identifiable. For different reasons, variations of this image could include:
Figure 4: Autonomous Image: Dilbert comic with L10N frames
Additionally, the image might contain navigational components to e.g. change frame and/or language. However, further information such as descriptions, comments and annotations to the image are not essential.
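A minimal sketch of such an autonomous image: static artwork and all localised callout variants are compiled into one self-contained SVG file, using SVG's `<switch>` element with the `systemLanguage` test attribute so a conforming viewer renders the caption matching its locale. The captions and geometry are invented placeholders, not the actual Dilbert content.

```python
# Hypothetical localised captions; in the paper's scenario these would
# come from the dynamic part of the content file.
CAPTIONS = {"en": "I love deadlines.", "de": "Ich liebe Termine."}

def autonomous_svg(captions):
    """Compile static background and all L10N variants into one file."""
    texts = "".join(
        f'<text systemLanguage="{lang}" x="10" y="20">{txt}</text>'
        for lang, txt in captions.items())
    return ('<svg xmlns="http://www.w3.org/2000/svg">'
            '<rect width="200" height="100" fill="white"/>'  # static background
            f'<switch>{texts}</switch>'                      # dynamic callout
            '</svg>')

print(autonomous_svg(CAPTIONS))
```

Saving this single file preserves every language variant, which is precisely what makes the image autonomous.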
Another interesting use case of this technique is a set of presentation slides, where the manuscript text is a hidden part of each slide. People who have received the presentation would be able to quickly navigate through the set of images until they have found the part they are most interested in. There they toggle to text mode and get a detailed description and accompanying information on the slides' content, as well as the author's annotations.
Besides autonomous image publishing, another type is called integrated or embedded image publishing. In this case, data from the content file is transformed into a set of multimedia files (e.g. graphics, text), each saved in its own file and format and relating to each other. Possible output includes a number of multimedia formats, including images as well as text. Imagine an entertainment map of your holiday destination (see Figure 5 ).
Similar to the model of autonomous graphic publishing, all information is kept in a data store (e.g. the document pool) and assembled at the user's request. As in the previous example, static and dynamic data is merged into one single image file and customised according to e.g. a specific date/time and language setting. The image is then embedded into another file, e.g. an XHTML page with related and accompanying information, and objects in the image are interlinked with the related parts in the text. Hence, the image itself represents valid information, but is not fully self-contained anymore, as the accompanying information is stored outside the image. Saving the image alone would not incorporate the whole set of available data.
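The interlinking between image objects and their textual counterparts can be sketched with two generators fed from the same data: an SVG map whose objects carry ids, and an XHTML page whose links address those ids as fragments. The venue data, file name `map.svg` and all ids are hypothetical.

```python
# Shared data store (invented): id + description per map object.
VENUES = [("cinema", "Multiplex at the harbour"),
          ("club", "Jazz club on Main Street")]

def make_svg(venues):
    """The map: each venue becomes an addressable object with an id."""
    spots = "".join(f'<circle id="{vid}" r="5" cx="10" cy="10"/>'
                    for vid, _ in venues)
    return f'<svg xmlns="http://www.w3.org/2000/svg">{spots}</svg>'

def make_xhtml(venues, image_file="map.svg"):
    """The accompanying page: text entries link into the image fragments."""
    items = "".join(
        f'<li><a href="{image_file}#{vid}">{desc}</a></li>'
        for vid, desc in venues)
    return (f'<html><body><object data="{image_file}"></object>'
            f'<ul>{items}</ul></body></html>')

print(make_svg(VENUES))
print(make_xhtml(VENUES))
```

Saving only the SVG loses the descriptions held in the XHTML page, illustrating why the integrated image is no longer fully self-contained.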
In the previous chapters, textual and graphical publications architectures have been explored and compared. But how does this design relate to the other two major multimedia formats, namely audio and video? Questions arise such as:
The answer is easy: theoretically everything is achievable, but the implementation is very difficult because of challenging and complex transformations. Most of the required technologies are already available and in broad use for textual publishing, and increasingly for graphical publishing (see Table 1 ), too. In the majority of cases they are based on XML and therefore use and aid other applications of the ever growing XML family of technologies. In audio publishing, musical information could be represented using a generic music markup language, e.g. MusicXML, transformed using XSLT into a specific application format (Finale Templates) and enriched with presentation information for properties such as speed, pitch and instrument. Video publishing adds the aspect of synchronised information visualisation to the audio design. As an example, adaptive video systems tailor CBT (Computer Based Training) content according to the user's level using XSL transformations and add presentation information, such as the presenter's visual appearance (e.g. age, nationality) and audio characteristics (e.g. language, dialect).
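The audio case can be sketched in the same pattern: a generic, MusicXML-like note list is enriched with presentation properties (tempo, instrument) during transformation, standing in for an XSLT step targeting a player format. The `<score>` vocabulary here is a drastic simplification invented for illustration; it is not real MusicXML.

```python
import xml.etree.ElementTree as ET

# Invented, MusicXML-like content: notes only, no presentation information.
SCORE = "<score><note pitch='C4'/><note pitch='E4'/></score>"

def render(score_xml, tempo=120, instrument="piano"):
    """Enrich generic content with presentation properties at render time."""
    notes = [n.get("pitch") for n in ET.fromstring(score_xml)]
    return {"tempo": tempo, "instrument": instrument, "notes": notes}

# The same content, published with two different presentation settings:
print(render(SCORE))
print(render(SCORE, tempo=90, instrument="violin"))
```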
Summarising, it can be said that electronically publishing graphics is not far from electronically publishing text if XML technologies are involved. Following the 5-tier textual publishing architecture, graphics are partitioned into five different components which will later be used to compose the final image. Components include content, transformation, semantic, logic and presentation. Technically, textual and graphical publishing do not differ at all, as both techniques use XML vocabularies for source (most likely proprietary to the user) as well as target (e.g. XHTML , SVG ) documents. Only the software viewer necessary to display and render the result has to be available: a simple browser in the case of XHTML , or a specialised SVG viewer or plug-in for existing browsers. Although the technical process is equivalent, the practical phase might look fundamentally different at first sight. But on closer look, and after some initial struggle with being spoilt for choice in content definition at varying granularity, both methods become increasingly similar.
A single user can create and maintain a simple website, but many professions are involved when it comes to creating an ambitious web presence for an entire organisation. This hypothesis is directly applicable to graphics creation and publishing, where more than just designers are required to create future-proof graphic systems. Creating the actual image is just the final step in the process of creating visualisations. The professions involved additionally include data authors, data interpreters, people to enrich the data and finally data presenters. Each of these professions is directly related to one of the architectural components.
When the textual publication architecture is implemented for graphics, the advantages are multifaceted and already well known: easier maintenance, reuse of composition methods and code, and consistent layout are only a few.