Keywords: SVG, XSLT, Graphical Stylesheet, Sankey
After receiving his Ph.D. in Mathematical Physics from Yale University in 1989, Philip spent a year as Assistant Professor of Physics at Knox College, followed by four years as Assistant Professor of Mathematics at the University of Toronto. His background in Differential Geometry and in computer modelling of physical phenomena served as unorthodox preparation for his subsequent move into industry as a Software Engineer with an emphasis on Computer Graphics. By 1997 Philip was in charge of a software research team creating early Web technologies based on HTML, XML, CSS and Java. Philip now lives and works in Vancouver, Canada, where he is President of SchemaSoft (http://www.schemasoft.com/), a software development consulting company he co-founded in 1999. He is an Advisory Committee Representative of the World Wide Web Consortium (http://www.w3.org/), and has been a member of the W3C Scalable Vector Graphics Working Group (http://www.w3.org/Graphics/SVG/) since its inception in 1998. Philip is Chair of the BC Advanced Systems Institute International Scientific Advisory Board (http://www.asi.bc.ca/). He is also a Director of the Vancouver XML Developers Association (http://www.vanx.org/), an organization that he co-founded in 2000. He regularly writes and lectures on topics related to software engineering, XML and SVG.
Mark graduated from the University of Waterloo after serving co-op terms at Sybase, Microsoft and Paradigm Development Corp. After graduating, he spent a brief stint at Algorithmics in Toronto before rejoining his friends from Paradigm at SchemaSoft.
This paper will describe a method for producing SVG drawings of directed, weighted graphs. In particular, the focus is on weight functions that satisfy flow conservation: the sum of the incoming weights equals the sum of the outgoing weights at all nodes except sources and sinks. Such a graph is particularly appropriate for representing flow through a system. Examples include traffic flow in a city, energy flow in a production plant, water flow through a network of pipes, Internet packet flow, the flow of money through a national economy, and so on. Drawings of these graphs are sometimes called Sankey diagrams.
Simple graph-rendering algorithms will often produce a confusing jumble of overlapping nodes or tangled edges on any but the smallest data sets. The challenge lies in producing an algorithm that renders the graphs in a legible and complete manner for a wide range of typical input. All information including weights must be visually represented, accessible and clearly conveyed. We were able to meet this challenge head-on after imposing the following two practical constraints:
In fact, we will demonstrate a graphical stylesheet that can generate interactive and attractive SVG representations automatically.
1.1 Modeling Systems with Flow
1.2 Sankey Diagrams
2. Overcoming the Obstacles
2.1 Programming in XSLT Without External Language Bindings
2.2 Optimizing Layout for Readability
2.3 Geometrical Manipulations
3. Sankey Diagram Stylesheet
3.1 The Stylesheet
Suppose we have some information representing the transmission of ingredients in through a candy factory. One simple way to represent this information is using a directed, weighted graph, as in Figure 1 .
In this diagram, the weight function on each edge represents the masses of the ingredients as they are combined to make new substances and, finally, finished candies. The nodes at the top represent ingredients or sources; the middle nodes represent machines that turn ingredients into something new; and the bottom nodes represent the finished products, which we will call sinks.
This candy factory makes chocolate bars and caramel core candies. Caramel core candies are coated with green and red candy stripes, while the chocolate bars made by this factory contain both caramel and peanuts. While ingredients normally flow in one direction, the chocolate bar maker wastes some chocolate. Fortunately, this chocolate is recovered and recycled back into the chocolate mixer.
This type of directed, weighted graph has some additional restrictions not shared by all such graphs. First, except for the sources and sinks, the sum of the weights of all the edges flowing into a node is equal to the sum of the weights of the edges flowing out. This is because candy ingredients follow the Law of Conservation of Mass. Of course, many other real-life systems exhibit this property: national economies, power grids, computer networks.
The second restriction is defined less rigorously. Note that this graph almost has four ordered partitions. Although chocolate can flow back "upwards" to the chocolate mixer, most of the ingredients flow in one direction through the factory. Furthermore, most of the ingredients eventually participate in all four stages of the diagram, except for peanuts, which skip the preprocessing stage. This loose partitioning of the system makes it easier to arrange this graph in an understandable way.
Although the diagram in Figure 1 does contain all the information that must be presented, it does have some practical drawbacks:
It is for these reasons that Sankey diagrams ( Figure 2 ) are often used.
That Sankey diagrams can be difficult, time-consuming, and uninteresting to render by hand is left as an excercise to the reader.
However, Sankey diagrams do add an indisputable expressive power to a standard mathematical rendering of a graph. When professionally constructed, Sankey diagrams represent flow in a manner that can be understood by anyone, instantly.
The benefits of being able to generate these diagrams automatically, anytime, are obvious. If, for instance, the recipe for red candy were to be adjusted slightly, it would be nice not to have to redraw this diagram by hand.
Ideally, this diagram could be generated using industry- standard technologies that are supported on many platforms. XSLT and SVG seem like the perfect combination of technologies to solve this problem! The data could start as XML, be transformed automatically by a stylesheet, and come out as an attractive SVG diagram. The XSLT would then be an instance of a Graphical Stylesheet , the general theory of which has been discussed in a previous paper [GS] .
Unfortunately, there are several obstacles to this solution. First, it is clear that many nontrivial mathematical calculations would have to be performed to produce the appropriate SVG. However, according to the third paragraph of the XSLT 1.0 specification,
"...XSLT is not intended as a completely general-purpose XML transformation language."
XSLT is is not intended to be a programming language, but if the stylesheet is to work, XSLT must be used for just that purpose.
Second, Figure 2 shows that optimally laying out Sankey diagrams is difficult even for humans. Overlapping edges are inevitable, and in the worst case, could completely obscure the meaning of the diagram. The stylesheet must use a variety of techniques to ensure that all edges are visible to the viewer.
Third, although SVG contains powerful primitives for rendering arcs, the arcs in properly laid out Sankey diagrams are so specialized that SVG has no intrinsic primitives for them. So some interesting geometrical manipulations must be performed to determine the coordinates and parameters of these arcs.
XSLT is intended to style text. Despite having a primitive facility for function templates, XSLT was never meant to perform general logic. Nevertheless, function templates can be used recursively in the style of a functional programming language.
As an example, the function in Figure 3 trims spaces from the end of a string.
<xsl:template name="trim_right"> <xsl:param name="string"/> <xsl:choose> <xsl:when test="substring($string, string-length($string), 1) = ' '"> <xsl:call-template name="trim_right"> <xsl:with-param name="string" select="substring($string, 1, string-length($string) - 1)"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$string"/> </xsl:otherwise> </xsl:choose> </xsl:template>
Figure 3: XSLT Function to Trim Spaces
String manipulations such as this are used alot in our Graphical Stylesheet. To deal with XSLT 1.0's limited ability to represent matrices and other nontrivial data structures, these data structures are passed around encoded as delimited strings.
If improperly laid out, many of the tubes of the Sankey diagram can be obscured, as in Figure 4 .
To improve readability, the following techniques are employed:
The result of applying these techniques is exemplified by Figure 5
Each tube of a Sankey diagram must have constant width, and it must retain this property as it winds through gentle corners. Simple mechanisms to connect tube segments do not have the constant width property, such as those in Figure 6 .
To solve this problem, we can take advantage of the fact that the perpendicular distance between concentric circles is constant, as is the perpendicular distance between two parallel lines. Therefore a solution is to compose tubes from both of these geometrical primitives, as shown in Figure 7 .
The straight segments must be tangent to the circular segments in order to maintain the constant width property. Figure 8 shows the geometry involved in routing one edge.
To route a tube of given width from one given location to another requires computation of
(a2,b2) from knowledge of
r2. With the help of similar triangles, this computation can be done in the following three steps:
The last of these involves a square root, and in order to compute functions such as this, it was necessary for us to implement a library of math functions in XSLT.
The Sankey Diagram Stylesheet is an XSLT 1.0 Stylesheet that converts XML data into SVG. Figure 10 shows what the source XML for the candy factory looks like.
<?xml version="1.0"?> <!-- <!DOCTYPE Spaghetti SYSTEM "Metaflow.dtd"> --> <Spaghetti title="Candy Factory"> <Partition name="Ingredients"> <Node name="Sugar" weight="50"/> <Node name="Red Dye #2" weight="5"/> <Node name="Green Dye #3" weight="5"/> <Node name="Corn Syrup" weight="10"/> <Node name="Milk ingredients" weight="10"/> <Node name="Cocoa butter" weight="10"/> <Node name="Peanuts" weight="10"/> </Partition> <Partition name="Preprocessing"> <Node name="Red candy mixer" weight="15"/> <Node name="Green candy mixer" weight="15"/> <Node name="Caramel maker" weight="20"/> <Node name="Chocolate maker" weight="50"/> </Partition> <Partition name="Candy makers"> <Node name="Caramel core candy maker" weight="40"/> <Node name="Chocolate bar maker" weight="70"/> </Partition> <Partition name="Packaging"> <Node name="Caramel core candies" weight="40" type="arrow"/> <Node name="Chocolate Bars" weight="60" type="arrow"/> </Partition> <Edge n1="Ingredients;Sugar" n2="Preprocessing;Red candy mixer" weight="5"/> <Edge n1="Ingredients;Red Dye #2" n2="Preprocessing;Red candy mixer" weight="5" colour="red"/> <Edge n1="Ingredients;Corn Syrup" n2="Preprocessing;Red candy mixer" weight="5" colour="#FFFFCC"/> <Edge n1="Ingredients;Sugar" n2="Preprocessing;Green candy mixer" weight="5"/> <Edge n1="Ingredients;Green Dye #3" n2="Preprocessing;Green candy mixer" weight="5" colour="green"/> <Edge n1="Ingredients;Corn Syrup" n2="Preprocessing;Green candy mixer" weight="5" colour="#FFFFCC"/> <Edge n1="Ingredients;Sugar" n2="Preprocessing;Caramel maker" weight="15"/> <Edge n1="Ingredients;Milk ingredients" n2="Preprocessing;Caramel maker" weight="5" colour="#EEEEFF"/> <Edge n1="Ingredients;Sugar" n2="Preprocessing;Chocolate maker" weight="25"/> <Edge n1="Ingredients;Cocoa butter" n2="Preprocessing;Chocolate maker" weight="10" colour="#884400"/> <Edge n1="Ingredients;Milk ingredients" n2="Preprocessing;Chocolate maker" weight="5" colour="#EEEEFF"/> <Edge n1="Ingredients;Peanuts" n2="Candy makers;Chocolate bar maker" weight="10" colour="#FFBB88"/> <Edge n1="Preprocessing;Red candy mixer" n2="Candy makers;Caramel core candy maker" weight="15" colour="pink"/> <Edge n1="Preprocessing;Green candy mixer" n2="Candy makers;Caramel core candy maker" weight="15" colour="lightgreen"/> <Edge n1="Preprocessing;Caramel maker" n2="Candy makers;Caramel core candy maker" weight="10" colour="#DDBB99"/> <Edge n1="Preprocessing;Chocolate maker" n2="Candy makers;Chocolate bar maker" weight="50" colour="brown"/> <Edge n1="Preprocessing;Caramel maker" n2="Candy makers;Chocolate bar maker" weight="10" colour="#DDBB99"/> <Edge n1="Candy makers;Caramel core candy maker" n2="Packaging;Caramel core candies" weight="40" colour="lightgreen"/> <Edge n1="Candy makers;Chocolate bar maker" n2="Packaging;Chocolate Bars" weight="60" colour="brown"/> <Edge n1="Candy makers;Chocolate bar maker" n2="Preprocessing;Chocolate maker" weight="10" colour="brown"/> </Spaghetti>
Figure 10: Raw XML Data for the Candy Factory
The nodes in the graph have already been arranged into pseudo-partitions. Nodes are given text names, and are globally identified by the (partion name, node name) pair. A "weight" has been assigned to each node; this is the sum of the weights of the input edges or the sum of the wieght of the output edges, whichever is greater. Node weights are simply there to assist the stylesheet; they could of course be generated automatically. In addition, some colour instructions have been manually added to further enhance the readability of the diagram.
Figure 11 illustrates the result of applying the Sankey diagram stylesheet to the raw data.
XHTML rendition created by gcapaper Web Publisher v2.0, © 2001-3 Schema Software Inc.