Keywords: content-driven, dynamic creation, publishing, printing, SVG, transformation, XML, XSL-FO, XSLT
About the Author:
Richard Jinks has been a developer at Advent Publishing Systems for 18 months, working on the industry-leading 3B2 publishing software. During this time, he has been responsible for improving support for XML and XSLT processing and for adding native support for SVG graphics, CSS stylesheets and XSL-FO.
About the Product:
3B2 is integrated professional publishing software, providing structured solutions for document production. 3B2 is used throughout the world in the production of structured documents from STM (Scientific/Technical/Medical) journals to technical aircraft documentation, books to online publishing.
3B2 supports many languages, output formats and international standards, including SGML, XML, XSLT, XPath and Perl. Adoption of the growing X standards makes 3B2 particularly strong in the automatic composition and pagination of long and complex documents for print or electronic dissemination.
In the current economic downturn, publishers need to find new ways of cutting the costs involved in producing their many manuals, catalogues, journals or other documentation.
This area in the publishing market is special because whilst the layout of a manual or a journal might never change, the content inside it will differ from edition to edition. Any technology that can assist or automate the production of the documentation and cut down on the time spent proofing and adapting the layout will help to reduce costs.
One very labour-intensive task is the production of diagrams, which is usually a two-stage process. First, the XML data is imported into a graphics package, where the diagram is produced and saved as an image. Next, the same XML data is imported into the publishing package along with the image file, and the document is formatted. Should there be an error in the data, or in the image itself (its size, for example), the image has to be reproduced manually in the graphics package and reimported into the publishing software.
Advent 3B2 is a recognised leader in the content-driven publishing industry and, as of the newly released Version 8, has the ability to handle SVG natively. Combined with its existing XML and XSLT processing, or with its new XSL-FO rendering abilities, this allows the manual step in production to be removed, saving time and money whilst improving the standard of the document being produced.
The raw XML data can be imported into 3B2, where at the appropriate place in the document it can be automatically fed through the inbuilt XSLT processor to produce a diagram or chart in SVG, which can then be placed in the correct place at a size suitable for its position on the page. Any further modifications required to the raw data before publication can then be made once inside 3B2, instantly updating the SVG diagrams.
This presentation will discuss the issues involved in producing content-driven documents. 3B2 will be used to demonstrate how dynamically generating SVG from XML data can simplify the processes involved in publishing documents. Emphasis will be placed on how SVG relates to existing specifications such as XSLT and XSL-FO for output into PDF, PostScript or paginated SVG. Finally, the presentation will cover areas in the SVG specification where improvements can be made to solidify its place in the publishing market, for example extending support for units and colours.
2. Existing methods
3. SVG in Advent 3B2
4. Relationship between SVG, XSL-FO and 3B2
5. Problems with SVG
While the world economy is currently trying to hold off recession, many publishers are finding themselves in a difficult situation. They still need to print their existing portfolio of catalogues, journals and manuals on paper, meet their customers' demand to publish the same information on the internet, and at the same time cut costs.
Although this problem applies to most areas of publishing, it is particularly important for publishers that specialise in areas such as financial reports, catalogues, technical documentation or journals. These types of publication are special in that they are created at regular intervals, need to maintain a consistent quality of output and often have to follow strict standards regarding the way the final printed output looks, yet the content contained in them can differ wildly. Thus any method that improves the production of such documents can bring large benefits in both speed and cost.
This paper demonstrates that SVG [SVG] can improve the workflow of such publications when used in combination with XML [XML] and XSLT [XSLT] to generate data-dependent graphics such as charts or graphs. Whilst the idea of dynamically creating SVG using XSLT is not new for internet-based content, it can make a significant improvement to the publishing industry, which has traditionally been slow to react to new methods of working. The take-up of SVG in publishing has been slower than on the internet, both because publishing tools have been slow to exploit the benefits of XML functionality, and because of limitations in the SVG specification itself - something which is also discussed in this paper.
Traditionally, the production of graphics for documents has been a slow, expensive process, limited both by resources and by the tools that will ultimately be used to publish the finished document. For example, authors of technical documentation or journals will often generate the graphics themselves by importing their data into a spreadsheet or charting package and embedding the generated raster image in the document. The publisher will then often require the graphic to be regenerated to suit the output format used, for example to improve the clarity of the picture on a printed page, to match specific colours, or to adjust the size. Large companies, particularly in financial markets, will often have external art departments responsible for creating all graphics. In that case the authors send their data to the art department with any specific requirements, and the resulting raster image is again embedded in the document.
Obviously both methods of creating graphics are time-consuming and expensive, particularly where the graphics are created by an external department. If the graphic needs to be corrected in any way, or redrawn for different sizes or outputs, this adds to the time and costs incurred. If the document is time-sensitive, or doesn't result in a revenue stream by itself (e.g. financial reports), then these costs need to be minimised.
There are obvious benefits to using SVG for normal diagrams wherever possible. Not only does it allow for the diagram to be reproduced easily, it can also allow a small amount of flexibility when fitting the diagram on the page. Just as most publishing packages can adjust text size and spacing to fit paragraphs, vector diagrams can also be resized in a similar manner.
However, the benefits of using SVG are best seen with charts and graphs, which are dependent on data. As most journals or documentation need to ensure a consistent look between editions, charts and graphs need to have the same appearance each time they are used. Without an automated way of generating them, the author is responsible for ensuring that the style of the charts has not changed. If the raw data is marked up in XML, then it is possible to write an XSLT stylesheet to generate many different styles of graph, such as bar charts, pie charts and scatter charts. Once the XSLT stylesheet has been written, it speeds up graphic production: mistakes can be corrected and the graphic updated immediately, and a consistent look can be ensured between multiple editions. If the publisher requires access to the raw data to regenerate the graphic for different output formats, the XSLT stylesheet can be reused, resulting in further savings in publishing costs.
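As a rough illustration of this data-to-chart step, the sketch below does in Python the kind of mapping an XSLT stylesheet would perform: one `rect` per data point, scaled to the chart height. The `<sales>` and `<month>` markup, the attribute names and the function `bar_chart_svg` are all hypothetical and are not taken from 3B2 or from any real stylesheet.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical raw data; element and attribute names are assumptions.
RAW_XML = """
<sales year="2002">
  <month name="Jan" value="40"/>
  <month name="Feb" value="65"/>
  <month name="Mar" value="25"/>
</sales>
"""

def bar_chart_svg(raw_xml, bar_width=20, gap=10, height=100):
    """Return an SVG document (as a string) with one bar per <month>."""
    data = ET.fromstring(raw_xml)
    values = [float(m.get("value")) for m in data.findall("month")]
    largest = max(values)
    width = len(values) * (bar_width + gap) + gap

    ET.register_namespace("", SVG_NS)  # serialise SVG as the default namespace
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": f"0 0 {width} {height}"})
    for i, value in enumerate(values):
        bar_height = height * value / largest  # tallest bar fills the chart
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(gap + i * (bar_width + gap)),
            "y": str(height - bar_height),  # the SVG y axis points down
            "width": str(bar_width),
            "height": str(bar_height),
            "fill": "steelblue",
        })
    return ET.tostring(svg, encoding="unicode")

print(bar_chart_svg(RAW_XML))
```

Because the styling lives in one function (or, in the workflow described here, one stylesheet), every edition of the chart is drawn with the same bar width, spacing and colour regardless of the data supplied.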
Advent 3B2 is a professional publishing software package, specialising in providing industry-leading solutions for the structured and content-driven publishing markets. Such markets include scientific, technical or medical journals and documentation, financial reports and publications, books, catalogues and online media. When 3B2 was first created in 1986, it was the first publishing software to be based around SGML, with later versions adopting many international standards and languages such as XML, XSLT, Unicode and Perl. More information on 3B2 can be found on the web site (http://www.3b2.com/).
As of Version 8, 3B2 has the ability to handle SVG natively, either as an input format, or by creating an image from SVG content embedded in the XML stream. The user is given full control over the placement and size of the resulting image, which can be placed at an absolute position on a page or inline in the text. 3B2 is fully namespace aware, and by nature is designed to ignore anything it doesn't understand, whether elements in a different namespace or elements it is unable to handle, such as those associated with animation. As the SVG image is generated inside 3B2 from the raw data, it is possible to get higher quality output than would be possible by embedding a raster image. Thus, the SVG is rendered equally well in PostScript for the printed page as it is in PDF or on a web page.
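The "ignore what you don't understand" behaviour can be sketched as a namespace-aware filter. The following is only an illustration of the idea, not 3B2's implementation: the function names and the `http://example.com/editor` namespace are made up for the example, and the list of animation elements is the one defined by SVG 1.1.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Animation elements defined by SVG 1.1; they have no meaning on a printed page.
ANIMATION_TAGS = {"animate", "set", "animateMotion", "animateColor", "animateTransform"}

def is_renderable(element):
    """True for SVG-namespace elements that are not animation elements."""
    tag = element.tag
    if not isinstance(tag, str) or not tag.startswith(f"{{{SVG_NS}}}"):
        return False  # foreign namespace: skip it
    return tag.split("}", 1)[1] not in ANIMATION_TAGS

def strip_unsupported(element):
    """Remove, in place, every descendant that is not renderable."""
    for child in list(element):  # iterate over a copy because we mutate
        if is_renderable(child):
            strip_unsupported(child)
        else:
            element.remove(child)
    return element

# Hypothetical input mixing SVG with an editor-specific extension element.
SAMPLE = f"""
<svg xmlns="{SVG_NS}" xmlns:ed="http://example.com/editor">
  <ed:layer-info locked="true"/>
  <circle cx="10" cy="10" r="5">
    <animate attributeName="r" from="5" to="8" dur="2s"/>
  </circle>
</svg>
"""

root = strip_unsupported(ET.fromstring(SAMPLE))
print([el.tag.split("}", 1)[1] for el in root.iter()])
```

After filtering, only the `svg` and `circle` elements remain; the extension element and the animation are dropped without affecting the static drawing.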
The figure below shows an example page output from 3B2 along with the raw XML used to generate it.
Figure 1: Sample page generated from raw XML
Although everything shown on the example page can be created using SVG, in practice only the line graph and labels will be drawn in SVG. The remaining content on the page (i.e. the header, footer and table at the top of the page) can easily be generated by 3B2 directly from the raw XML. A full set of files is available on request, showing the complete XML file and XSLT stylesheet required to generate the page in SVG, as used in the associated demonstration.
Once the 3B2 processing and XSLT stylesheet have been produced, the final output can be created simply by importing the raw XML and associated XSLT stylesheet into 3B2 and choosing which output format to print to. In addition, as 3B2 has a fully WYSIWYG interface with native XSLT and SVG support, any edits made to the raw XML will instantly be reflected on the preview page. For example, if it is discovered that the rate of inflation for March was really 1.7% instead of 1.3%, the only change required would be to modify the rate attribute for March, and the table and graph will instantly change reflecting the new value. Then, by saving the 3B2 document with the XSLT transformation, you have a template available which can be immediately reused to generate new pages once the full set of data for 2003 is available, for example.
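The edit-and-regenerate cycle can be sketched in a few lines. The markup below is hypothetical (only the `rate` attribute is named in the text, and the January and February figures are invented for the example), and `polyline_points` merely stands in for the part of a stylesheet that turns the rates into the `points` attribute of an SVG `polyline`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; element names and the Jan/Feb values are assumptions.
RAW_XML = """
<inflation year="2002">
  <month name="Jan" rate="1.1"/>
  <month name="Feb" rate="1.2"/>
  <month name="Mar" rate="1.3"/>
</inflation>
"""

def polyline_points(data, x_step=50, y_scale=100, height=300):
    """Build the 'points' attribute of an SVG <polyline> from the rates."""
    points = []
    for i, month in enumerate(data.findall("month")):
        x = i * x_step
        y = height - round(float(month.get("rate")) * y_scale)  # y grows downwards
        points.append(f"{x},{y}")
    return " ".join(points)

data = ET.fromstring(RAW_XML)
before = polyline_points(data)

# Correct the March figure in the raw data, then regenerate the graph.
data.find("month[@name='Mar']").set("rate", "1.7")
after = polyline_points(data)

print(before)  # 0,190 50,180 100,170
print(after)   # 0,190 50,180 100,130
```

Only the last point moves; nothing in the drawing logic has to be touched, which is what makes the saved transformation reusable as a template.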
In addition to processing SVG, it is also possible to print documents as SVG. 3B2 has multiple output drivers, allowing users to output their documents in TIFF, PDF, PostScript, SVG and many other formats. As far as SVG is concerned, 3B2 currently outputs each page as an individual SVG document, with an HTML page to allow the user to view multiple pages together. When a page is printed through the SVG output driver, all the data is rendered using paths and basic shapes. This allows for a higher degree of accuracy in the final output, particularly where large blocks of text are concerned. As all the page manipulation data is stored in the HTML pages, the SVG is free of extension elements and properties, making it compatible with different SVG viewers or editors. Similarly, the use of paths also ensures that the resulting SVG files can be used on SVG Tiny and SVG Mobile [SVGMobile] devices.
Although all the benefits that SVG offers the publishing world can also apply to XSL-FO [XSL-FO], professional publishing packages such as 3B2 still have a great deal to offer that an XSL-FO renderer doesn't. For example: XSL-FO is limited in the number of different page layouts that can be created; it lacks a number of professional features such as support for Pantone; there is no knowledge of where an object will appear on a page until the document is rendered; and automatic functionality such as tables of contents or indexes is hard to achieve. A package like 3B2 allows scripting or design decisions to be made during the formatting process, making it easier to make complex decisions about how to fine-tune the scaling of objects, the spacing between paragraphs or letters, or where to place objects. There are also limits in that the XSL-FO specification currently only allows SVG to be placed inside a rectangular box on the page. There are no options for allowing text to run around or inside complex SVG shapes. And whilst you could use SVG to place text on your XSL-FO document, there is potential for problems in matching fonts, styles and sizes with the surrounding text generated by the XSL-FO. Problems such as these are easier to solve in a publishing package.
One area that might prove interesting in the future is that SVG 1.2 [SVG12] is shaping up to be a good competitor to the XSL-FO specification in its own right. Two of the largest additions to the SVG 1.2 specification are text wrapping and multiple page support. Combine this with the functionality currently provided by SVG to place characters and graphics at fixed positions, and it should be possible to write a set of XSLT transformations capable of generating more complex pages than those currently offered by XSL-FO. Although a certain amount of the simplicity provided by XSL-FO will be lost, once XSL-FO documents reach a certain level of complexity, the developer of the XSL-FO output is required to perform a high level of proofing and modification in order to ensure that the layout of the finished document is as required. This same proofing time could equally be spent creating a document purely using the advanced features in SVG 1.2.
The largest problem with getting SVG accepted by the publishing community is that it is primarily designed and promoted for the web, with most examples relying on heavy use of dynamic content, scripting or animation. As these features cannot be represented on a printed page, any data stored in the script or the animation will be lost. Thus without careful design, it is difficult to write an SVG document that can exploit the benefits of both print and screen media.
As the drafts for the SVG Print [SVGPrint] and SVG 1.2 specifications are starting to recognise, publishing users have a different set of requirements to web designers. Any flexibility in the specification that might affect the final output increases costs in extra proofing and through the possibility that the final document might require reprinting.
The primary requirement that needs to be met is the use of both CMYK colours and spot colours. This is essential as publishers need to guarantee that the colours used on the printed page are correct - something that is hard to achieve solely through the RGB colourspace. It is also important to ensure that there is a proper method for supporting Pantone [Pantone] colours, where the specific colour used is dependent on the output method. For example, the actual CMYK values required for one single Pantone colour may differ for screen, coated paper, normal print paper, etc.
Publishing users also have a requirement for fixed units. As the SVG specification is, and appears set to remain, flexible regarding the conversion between mm or inches and pixels, this can create problems when users either export their graphics from external tools or manually edit an SVG file of any complexity. Although it is possible to achieve a set resolution using the viewBox attribute, this isn't a very natural way of thinking for a user base primarily involved with absolute measures such as points, picas or fractions of an inch. Pixels are a useful unit only for artists or web designers. For example, if the user wants to place some text in a box, a web user would be comfortable specifying both the text and the box in pixels. A user from the publishing industry will automatically think of specifying the text as 24 point, at 10mm in from the right-hand side. This publishing user will then have problems trying to draw the box, as the path element is unable to accept any real-world units. The problem gets worse if the user needs to manipulate an SVG file exported from a graphics editing package. In order to save work designing the final look of any charts or graphs, a publishing user is likely to design a template in an editing package, export it as SVG, then use XSLT to create the drawing instructions that fill in the details on the graph. If the user has no control over what coordinate space or resolution the exported SVG is in, or is unable to easily calculate at which points to draw the data for the graph, then this will hinder acceptance of this way of working.
It is absolutely essential that the SVG diagram always looks exactly as the author or publisher intended, regardless of what tool is viewing the diagram, or what medium the diagram is published on. Just as this is an issue for colours and units, it is also significant for the way professional editing tools create their SVG. Extensibility through namespaces is both one of SVG's biggest advantages and one of its biggest disadvantages. Namespaces allow programs to extend the specification to provide missing functionality, such as improved support for colours or gradients. However, if the extensions are not well documented, this limits the places in which the SVG can then be used. For example, if a popular graphics editing program allows the user to save as an SVG file, but that program hides a large percentage of the detail in external namespaces, then the only tool that can ever render that file correctly is the original program itself. If a user then tries to import that SVG file into a different package, any problems caused by the extension functionality are seen to be the fault of the importing package, even though it might be able to import clean SVG correctly. This leaves the developers of the importing package with the choice of either explaining to users that the problem was caused by the tool that generated the SVG, or having to reverse engineer the extension functions and play catch-up with the developers of the originating graphics program.
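The arithmetic a publishing user is forced to do by hand can be sketched as follows. This is a minimal illustration that assumes the document's viewBox has been chosen so that one user unit equals one point (72 user units per inch); the helper names `to_user_units` and `box_path` are hypothetical, as are the box dimensions.

```python
import re

# Absolute units expressed in inches; these ratios are fixed by definition.
INCHES_PER_UNIT = {"in": 1.0, "mm": 1 / 25.4, "cm": 1 / 2.54, "pt": 1 / 72, "pc": 1 / 6}

def to_user_units(length, units_per_inch=72.0):
    """Convert a length such as '10mm' to user units at the given resolution."""
    match = re.fullmatch(r"\s*([-+]?\d*\.?\d+)\s*(in|mm|cm|pt|pc)\s*", length)
    if match is None:
        raise ValueError(f"unsupported length: {length!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value * INCHES_PER_UNIT[unit] * units_per_inch

def box_path(x, y, width, height):
    """Path data for a rectangle whose position and size are real-world lengths."""
    x0, y0 = to_user_units(x), to_user_units(y)
    x1, y1 = x0 + to_user_units(width), y0 + to_user_units(height)
    return f"M {x0:.2f} {y0:.2f} H {x1:.2f} V {y1:.2f} H {x0:.2f} Z"

# A box inset by 10mm, 50mm wide and tall enough for one line of 24pt text.
print(box_path("10mm", "10mm", "50mm", "24pt"))
```

The result is only correct for as long as the assumed resolution holds. If an exporting tool writes the file with a different viewBox, every number in the path data has to be recalculated, which is the difficulty described above.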
The ability to generate SVG dynamically from raw XML data will lead to great improvements in the workflows involved in publishing documents. It will reduce the time taken and costs involved in correcting mistakes, and make it easier to reuse the same graphic on multiple output media. It will also set up a good framework to ensure the costs are reduced for future editions or versions of the document.
However, although the SVG Print and SVG 1.2 specifications are beginning to address the publishing market, there are still some areas of concern, especially when reusing or maintaining SVG created by different tools. Additions such as page definitions and text flowing help to bridge the gap, but there is confusion in the current draft about how multiple pages, units and colours are to be implemented, and the SVG Print requirements place no restrictions on elements from different namespaces. There is still an underlying assumption in the future SVG specifications that an SVG instance will be wholly generated automatically by one single tool - something that might not always be true in the future.