XHTML and SVG: Publishing with concept

Benjamin Jung
Department of Computer Science
Trinity College Dublin, Dublin/Ireland

[Figure: Publishing pipeline]

Publishing with XML technologies follows the concept of component separation. Five components, or layers, matter in the overall process: Content, Transformation, Semantics/Links, Logic/Scripting and Presentation. In an ideal publishing world, the information for each component is kept in a separate file, e.g. a DocBook file for the content, an XSL file for the transformation, a Topic Maps file for links and semantics, an ECMAScript file for logic and scripting, and a CSS or XSL-FO file for layout properties and rendering. The final publication is stitched together step by step, pipelined through a processing engine for each individual layer, and "released" as, e.g., XHTML or PDF. Advantages of this procedure include reusability of components, ease of maintenance, and avoidance of redundant information.
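The layered pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used by the author: the content fragment is merely DocBook-like, a simple tag mapping stands in for an XSL transformation, a dictionary stands in for a Topic Maps file, and the Logic/Scripting layer is left out for brevity. All names (CONTENT, LINKS, STYLE, TAG_MAP, publish) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Content layer (assumption: a tiny DocBook-like fragment, not real DocBook).
CONTENT = "<article><title>SVG</title><para>Images as components.</para></article>"

# Semantic/link layer (assumption: a dict stands in for a Topic Maps file).
LINKS = {"SVG": "https://www.w3.org/Graphics/SVG/"}

# Presentation layer: plain CSS text kept apart from the content.
STYLE = "h1 { font-family: sans-serif; }"

# Transformation layer (assumption: a tag mapping stands in for XSLT).
TAG_MAP = {"title": "h1", "para": "p"}


def transform(content_xml):
    """Transformation stage: rewrite content elements into an XHTML-like body."""
    body = ET.Element("body")
    for child in ET.fromstring(content_xml):
        out = ET.SubElement(body, TAG_MAP[child.tag])
        out.text = child.text
    return body


def add_links(body, links):
    """Link stage: wrap an element's text in <a> if it is a known term."""
    for el in list(body):
        target = links.get(el.text)
        if target is not None:
            anchor = ET.SubElement(el, "a", {"href": target})
            anchor.text = el.text
            el.text = None


def apply_style(body, css):
    """Presentation stage: attach the stylesheet and wrap the body in a page."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Publication"
    ET.SubElement(head, "style", {"type": "text/css"}).text = css
    html.append(body)
    return html


def publish(content_xml, links, css):
    """Run the stages in order and 'release' the result as a string."""
    body = transform(content_xml)
    add_links(body, links)
    return ET.tostring(apply_style(body, css), encoding="unicode")


if __name__ == "__main__":
    print(publish(CONTENT, LINKS, STYLE))
```

Because each layer is a separate input, the same content can be re-released with a different stylesheet or link set without touching the other components.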

This approach has been successfully applied, in more or less detail, to web publications. However, almost all available web publications are based solely on textual content. Images are often treated as static, independent entities that are simply linked from the final publication. Seamless integration of graphics into the semantic context is hardly ever achieved, and associations between image parts and text fragments or remote components are rarely realised. At present, an automated and meaningful "understanding" of the relations between text and images, as predicted by Semantic Web enthusiasts, is far from reality.

Fortunately, SVG already offers all the functionality needed to lay the foundations for an intelligent and future-proof integration of images into a rich and meaningful web. A physical separation of images into their fundamental components, as known from textual publishing, can easily be accomplished. However, the level of granularity to which image data should be broken up depends closely on its context and is difficult to set in a graphics environment. Possible methods of resolution and their implications have only started to emerge. An old proverb says that "a picture says more than a thousand words", which indicates the vast amount and complexity of information contained in an image. The real challenge arises before the picture is compiled: an idea of the resulting image (the model) has to be untangled into various, often abstract, elements. These are defined and specified independently in their associated architecture components. Starting from the raw content, the image is then processed to enrich and customise the content until it resembles the original model.
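To make the idea of image decomposition concrete, the following Python sketch keeps the raw shapes, the styling and the links of a small picture in separate structures and composes them into one SVG document. It is a minimal illustration, not the method proposed by the author: the house drawing, the link target article.xhtml#roof-section and all names (SHAPES, STYLE, LINKS, compose) are hypothetical, and the granularity chosen here (one entry per shape) is just one possible choice.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

# Content layer (hypothetical): raw shapes, identified but unstyled.
SHAPES = [
    {"id": "roof", "tag": "polygon", "attrs": {"points": "10,50 50,10 90,50"}},
    {"id": "wall", "tag": "rect",
     "attrs": {"x": "20", "y": "50", "width": "60", "height": "40"}},
]

# Presentation layer (hypothetical): CSS addressing the shapes by id.
STYLE = "#roof { fill: firebrick; } #wall { fill: wheat; }"

# Link layer (hypothetical): shape id -> related text fragment.
LINKS = {"roof": "article.xhtml#roof-section"}


def svg_tag(name):
    """Return the namespace-qualified tag name for an SVG element."""
    return "{%s}%s" % (SVG_NS, name)


def compose(shapes, css, links):
    """Stitch content, presentation and links into one SVG document."""
    svg = ET.Element(svg_tag("svg"), {"viewBox": "0 0 100 100"})
    ET.SubElement(svg, svg_tag("style"), {"type": "text/css"}).text = css
    for shape in shapes:
        parent = svg
        href = links.get(shape["id"])
        if href is not None:
            # SVG 1.1 links use the xlink:href attribute on an <a> element.
            parent = ET.SubElement(
                svg, svg_tag("a"), {"{%s}href" % XLINK_NS: href}
            )
        attrs = {"id": shape["id"], **shape["attrs"]}
        ET.SubElement(parent, svg_tag(shape["tag"]), attrs)
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(compose(SHAPES, STYLE, LINKS))
```

Swapping STYLE or LINKS changes the rendering or the associations of the picture while the shape definitions stay untouched, which mirrors the reuse argument made for textual publishing.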

This work pinpoints best-practice guidelines for SVG image element definitions with the help of real-life examples. The methodology of image composition and decomposition is explained by means of examples and compared to textual publishing scenarios. Answers to questions such as the following are given:
