Programmatic Rendering of Directed, Weighted Graphs

Keywords: SVG, XSLT, Graphical Stylesheet, Sankey

Philip A. Mansfield, Ph.D.
President
SchemaSoft
Vancouver
British Columbia
Canada
philipm@schemasoft.com
http://www.schemasoft.com

Biography

After receiving his Ph.D. in Mathematical Physics from Yale University in 1989, Philip spent a year as Assistant Professor of Physics at Knox College, followed by four years as Assistant Professor of Mathematics at the University of Toronto. His background in Differential Geometry and in computer modelling of physical phenomena served as unorthodox preparation for his subsequent move into industry as a Software Engineer with an emphasis on Computer Graphics. By 1997 Philip was in charge of a software research team creating early Web technologies based on HTML, XML, CSS and Java. Philip now lives and works in Vancouver, Canada, where he is President of SchemaSoft (http://www.schemasoft.com/), a software development consulting company he co-founded in 1999. He is an Advisory Committee Representative of the World Wide Web Consortium (http://www.w3.org/), and has been a member of the W3C Scalable Vector Graphics Working Group (http://www.w3.org/Graphics/SVG/) since its inception in 1998. Philip is Chair of the BC Advanced Systems Institute International Scientific Advisory Board (http://www.asi.bc.ca/). He is also a Director of the Vancouver XML Developers Association (http://www.vanx.org/), an organization that he co-founded in 2000. He regularly writes and lectures on topics related to software engineering, XML and SVG.

Mark Ambachtsheer
Software Developer
SchemaSoft
Vancouver
British Columbia
Canada
http://www.schemasoft.com

Biography

Mark graduated from the University of Waterloo after serving co-op terms at Sybase, Microsoft and Paradigm Development Corp. After graduating, he spent a brief stint at Algorithmics in Toronto before rejoining his friends from Paradigm at SchemaSoft.


Abstract


This paper will describe a method for producing SVG drawings of directed, weighted graphs. In particular, the focus is on weight functions that satisfy flow conservation: the sum of the incoming weights equals the sum of the outgoing weights at all nodes except sources and sinks. Such a graph is particularly appropriate for representing flow through a system. Examples include traffic flow in a city, energy flow in a production plant, water flow through a network of pipes, Internet packet flow, the flow of money through a national economy, and so on. Drawings of these graphs are sometimes called Sankey diagrams.

Simple graph-rendering algorithms will often produce a confusing jumble of overlapping nodes or tangled edges on any but the smallest data sets. The challenge lies in producing an algorithm that renders the graphs in a legible and complete manner for a wide range of typical input. All information including weights must be visually represented, accessible and clearly conveyed. We were able to meet this challenge head-on after imposing the following two practical constraints:

  1. We restrict our attention to the practical problem of directed, weighted graphs satisfying flow conservation.
  2. The layout is constrained by an additional input: an ordered partition on the set of nodes. This additional information permits classification of nodes into conceptual stages in the flow.

In fact, we will demonstrate a graphical stylesheet that can generate interactive and attractive SVG representations automatically.


Table of Contents


1. Justification
     1.1 Modeling Systems with Flow
     1.2 Sankey Diagrams
     1.3 Challenges
2. Overcoming the Obstacles
     2.1 Programming in XSLT Without External Language Bindings
     2.2 Optimizing Layout for Readability
     2.3 Geometrical Manipulations
3. Sankey Diagram Stylesheet
     3.1 The Stylesheet
Bibliography

1. Justification

1.1 Modeling Systems with Flow

Suppose we have some information representing the transmission of ingredients in through a candy factory. One simple way to represent this information is using a directed, weighted graph, as in Figure 1 .

CandyFactoryDWG.svg

Figure 1: Directed, Wieghted Graph Representing a Candy Factory

In this diagram, the weight function on each edge represents the masses of the ingredients as they are combined to make new substances and, finally, finished candies. The nodes at the top represent ingredients or sources; the middle nodes represent machines that turn ingredients into something new; and the bottom nodes represent the finished products, which we will call sinks.

This candy factory makes chocolate bars and caramel core candies. Caramel core candies are coated with green and red candy stripes, while the chocolate bars made by this factory contain both caramel and peanuts. While ingredients normally flow in one direction, the chocolate bar maker wastes some chocolate. Fortunately, this chocolate is recovered and recycled back into the chocolate mixer.

This type of directed, weighted graph has some additional restrictions not shared by all such graphs. First, except for the sources and sinks, the sum of the weights of all the edges flowing into a node is equal to the sum of the weights of the edges flowing out. This is because candy ingredients follow the Law of Conservation of Mass. Of course, many other real-life systems exhibit this property: national economies, power grids, computer networks.

The second restriction is defined less rigorously. Note that this graph almost has four ordered partitions. Although chocolate can flow back "upwards" to the chocolate mixer, most of the ingredients flow in one direction through the factory. Furthermore, most of the ingredients eventually participate in all four stages of the diagram, except for peanuts, which skip the preprocessing stage. This loose partitioning of the system makes it easier to arrange this graph in an understandable way.

1.2 Sankey Diagrams

Although the diagram in Figure 1 does contain all the information that must be presented, it does have some practical drawbacks:

  1. It is difficult to appreciate the relative weights of the edges visually.
  2. The arrows could tangle badly, leaving a bird's nest of overlapping edges.
  3. It is difficult to fit captions and other contextual information into the diagram. A key is necessary to tie the mathematical model to the system it represents.

It is for these reasons that Sankey diagrams ( Figure 2 ) are often used.

CandyFactorySankeyManual.png

Figure 2: Sankey Diagram

That Sankey diagrams can be difficult, time-consuming, and uninteresting to render by hand is left as an excercise to the reader.

However, Sankey diagrams do add an indisputable expressive power to a standard mathematical rendering of a graph. When professionally constructed, Sankey diagrams represent flow in a manner that can be understood by anyone, instantly.

The benefits of being able to generate these diagrams automatically, anytime, are obvious. If, for instance, the recipe for red candy were to be adjusted slightly, it would be nice not to have to redraw this diagram by hand.

Ideally, this diagram could be generated using industry- standard technologies that are supported on many platforms. XSLT and SVG seem like the perfect combination of technologies to solve this problem! The data could start as XML, be transformed automatically by a stylesheet, and come out as an attractive SVG diagram. The XSLT would then be an instance of a Graphical Stylesheet , the general theory of which has been discussed in a previous paper [GS] .

1.3 Challenges

Unfortunately, there are several obstacles to this solution. First, it is clear that many nontrivial mathematical calculations would have to be performed to produce the appropriate SVG. However, according to the third paragraph of the XSLT 1.0 specification,

"...XSLT is not intended as a completely general-purpose XML transformation language."

XSLT is is not intended to be a programming language, but if the stylesheet is to work, XSLT must be used for just that purpose.

Second, Figure 2 shows that optimally laying out Sankey diagrams is difficult even for humans. Overlapping edges are inevitable, and in the worst case, could completely obscure the meaning of the diagram. The stylesheet must use a variety of techniques to ensure that all edges are visible to the viewer.

Third, although SVG contains powerful primitives for rendering arcs, the arcs in properly laid out Sankey diagrams are so specialized that SVG has no intrinsic primitives for them. So some interesting geometrical manipulations must be performed to determine the coordinates and parameters of these arcs.

2. Overcoming the Obstacles

2.1 Programming in XSLT Without External Language Bindings

XSLT is intended to style text. Despite having a primitive facility for function templates, XSLT was never meant to perform general logic. Nevertheless, function templates can be used recursively in the style of a functional programming language.

As an example, the function in Figure 3 trims spaces from the end of a string.

<xsl:template name="trim_right">
  <xsl:param name="string"/>
  <xsl:choose>
    <xsl:when test="substring($string, string-length($string), 1) = ' '">
      <xsl:call-template name="trim_right">
        <xsl:with-param name="string" select="substring($string, 1, string-length($string) - 1)"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$string"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Figure 3: XSLT Function to Trim Spaces

String manipulations such as this are used alot in our Graphical Stylesheet. To deal with XSLT 1.0's limited ability to represent matrices and other nontrivial data structures, these data structures are passed around encoded as delimited strings.

2.2 Optimizing Layout for Readability

If improperly laid out, many of the tubes of the Sankey diagram can be obscured, as in Figure 4 .

BadEdgeLayout1.svg

Figure 4: Bad Edge Layout

To improve readability, the following techniques are employed:

The result of applying these techniques is exemplified by Figure 5

goodEdgeLayout.svg

Figure 5: Good Edge Layout

2.3 Geometrical Manipulations

Each tube of a Sankey diagram must have constant width, and it must retain this property as it winds through gentle corners. Simple mechanisms to connect tube segments do not have the constant width property, such as those in Figure 6 .

BadCurves.svg

Figure 6: Width of Tubes is Variable

To solve this problem, we can take advantage of the fact that the perpendicular distance between concentric circles is constant, as is the perpendicular distance between two parallel lines. Therefore a solution is to compose tubes from both of these geometrical primitives, as shown in Figure 7 .

CurveGeometry.svg

Figure 7: Constructing Tubes of Constant Width

The straight segments must be tangent to the circular segments in order to maintain the constant width property. Figure 8 shows the geometry involved in routing one edge.

CurveGeometry2.svg

Figure 8: Connecting an Edge

To route a tube of given width from one given location to another requires computation of (a1,b1) and (a2,b2) from knowledge of (x1,y1), (x2,y2), r1 and r2. With the help of similar triangles, this computation can be done in the following three steps:

EqnStep1.svg
EqnStep2.svg
EqnStep3.svg

The last of these involves a square root, and in order to compute functions such as this, it was necessary for us to implement a library of math functions in XSLT.

3. Sankey Diagram Stylesheet

3.1 The Stylesheet

The Sankey Diagram Stylesheet is an XSLT 1.0 Stylesheet that converts XML data into SVG. Figure 10 shows what the source XML for the candy factory looks like.

<?xml version="1.0"?>
<!-- <!DOCTYPE Spaghetti SYSTEM "Metaflow.dtd"> -->
<Spaghetti title="Candy Factory">
    
    <Partition name="Ingredients">
        <Node name="Sugar" weight="50"/>
        <Node name="Red Dye #2" weight="5"/>
        <Node name="Green Dye #3" weight="5"/>
        <Node name="Corn Syrup" weight="10"/>
        <Node name="Milk ingredients" weight="10"/>
        <Node name="Cocoa butter" weight="10"/>
        <Node name="Peanuts" weight="10"/>
    </Partition>
    
    <Partition name="Preprocessing">
        <Node name="Red candy mixer" weight="15"/>
        <Node name="Green candy mixer" weight="15"/>
        <Node name="Caramel maker" weight="20"/>
        <Node name="Chocolate maker" weight="50"/>
    </Partition>
    
    <Partition name="Candy makers">
        <Node name="Caramel core candy maker" weight="40"/>
        <Node name="Chocolate bar maker" weight="70"/>
    </Partition>
    
    <Partition name="Packaging">
        <Node name="Caramel core candies" weight="40" type="arrow"/>
        <Node name="Chocolate Bars" weight="60" type="arrow"/>
    </Partition>
    
    <Edge n1="Ingredients;Sugar" n2="Preprocessing;Red candy mixer" weight="5"/>
    <Edge n1="Ingredients;Red Dye #2" n2="Preprocessing;Red candy mixer" weight="5" colour="red"/>
    <Edge n1="Ingredients;Corn Syrup" n2="Preprocessing;Red candy mixer" weight="5" colour="#FFFFCC"/>
    <Edge n1="Ingredients;Sugar" n2="Preprocessing;Green candy mixer" weight="5"/>
    <Edge n1="Ingredients;Green Dye #3" n2="Preprocessing;Green candy mixer" weight="5" colour="green"/>
    <Edge n1="Ingredients;Corn Syrup" n2="Preprocessing;Green candy mixer" weight="5" colour="#FFFFCC"/>
    <Edge n1="Ingredients;Sugar" n2="Preprocessing;Caramel maker" weight="15"/>
    <Edge n1="Ingredients;Milk ingredients" n2="Preprocessing;Caramel maker" weight="5" colour="#EEEEFF"/>
    <Edge n1="Ingredients;Sugar" n2="Preprocessing;Chocolate maker" weight="25"/>
    <Edge n1="Ingredients;Cocoa butter" n2="Preprocessing;Chocolate maker" weight="10" colour="#884400"/>
    <Edge n1="Ingredients;Milk ingredients" n2="Preprocessing;Chocolate maker" weight="5" colour="#EEEEFF"/>
    <Edge n1="Ingredients;Peanuts" n2="Candy makers;Chocolate bar maker" weight="10" colour="#FFBB88"/>
    
    <Edge n1="Preprocessing;Red candy mixer" n2="Candy makers;Caramel core candy maker" weight="15" colour="pink"/>
    <Edge n1="Preprocessing;Green candy mixer" n2="Candy makers;Caramel core candy maker" weight="15" colour="lightgreen"/>
    <Edge n1="Preprocessing;Caramel maker" n2="Candy makers;Caramel core candy maker" weight="10" colour="#DDBB99"/>
    <Edge n1="Preprocessing;Chocolate maker" n2="Candy makers;Chocolate bar maker" weight="50" colour="brown"/>
    <Edge n1="Preprocessing;Caramel maker" n2="Candy makers;Chocolate bar maker" weight="10" colour="#DDBB99"/>

    <Edge n1="Candy makers;Caramel core candy maker" n2="Packaging;Caramel core candies" weight="40" colour="lightgreen"/>
    <Edge n1="Candy makers;Chocolate bar maker" n2="Packaging;Chocolate Bars" weight="60" colour="brown"/>
    <Edge n1="Candy makers;Chocolate bar maker" n2="Preprocessing;Chocolate maker" weight="10" colour="brown"/>

</Spaghetti>

Figure 10: Raw XML Data for the Candy Factory

The nodes in the graph have already been arranged into pseudo-partitions. Nodes are given text names, and are globally identified by the (partion name, node name) pair. A "weight" has been assigned to each node; this is the sum of the weights of the input edges or the sum of the wieght of the output edges, whichever is greater. Node weights are simply there to assist the stylesheet; they could of course be generated automatically. In addition, some colour instructions have been manually added to further enhance the readability of the diagram.

Figure 11 illustrates the result of applying the Sankey diagram stylesheet to the raw data.

CandyFactory.svg

Figure 11: The Output of the Sankey Diagram Stylesheet

Bibliography

[GS]
Graphical Stylesheets: Using XSLT to Generate SVG, P. A. Mansfield, D. W. Fuller, XML 2001 Conference Paper, 13 December 2001. Available at http://www.idealliance.org/papers/xml2001/papers/html/05-05-02.html.

XHTML rendition created by gcapaper Web Publisher v2.0, © 2001-3 Schema Software Inc.