Even faster web mapping

Michael Neutze

vis.uell.net


appended October 8th, 2011 with information about Inkscape’s “Optimized SVG Output” and minor corrections

appended October 13th, 2011 with an additional figure showing the effect of path simplification in Adobe Illustrator more clearly


Inspiration

We all hate to wait. But there is actually some science behind this when it comes to websites and how they load. So much so that speed concerns are tackled at their own convention, namely the Velocity Conference, and the W3C has set up a Web Performance Working Group. Sites like amazon.com have shown that slowing down their pages by a matter of milliseconds has a measurable impact on sales, and some search engines use page-loading times as a signal for ranking results.

The mantra of performance optimisation can be studied, for example, in the works of Steve Souders, and his book Even Faster Web Sites has been the inspiration for the title of this paper. To get up to speed with his teachings you can watch some of Steve’s presentations online, like his Webstock ’11 talk Web Performance Optimisation: The Gift that Keeps on Giving.

As is the case with a lot of web developments, their implications are not yet widely discussed with Scalable Vector Graphics (SVG) in mind. So although SVG is now entering its teenage years, the age-appropriate swiftness is often still missing. Ironically a contest at the Web Directions Conference of 2010 asked: “How could we help encourage more developers to finally take a look at SVG?” And the answer they came up with was building a progress bar in SVG. This seems a bit contradictory, as in theory vectors should in themselves be so much more efficient than bitmaps. However most users experience mapping applications in the browser as very efficiently loaded tiles of bitmap data, with no progress bar.

It is no surprise that latency-obsessed Google Inc. sent their own Robert Russell to keynote at SVGopen 2010 about Efficient SVG (Video). Rob gave a very thorough overview of many aspects of efficiency in SVG, ranging from cleaning up markup and simplifying path data to the cost of manipulating the DOM (“Don’t touch the DOM!”).

This paper focusses on thematic (choropleth) mapping, explores the different use cases of SVG therein and provides hands-on advice on how to optimize the static SVG parts of such maps. The performance aspects of different scripting techniques to interact with SVG lie outside the scope of this paper.

Speed can be conceived as part of the experience design of web apps. It is a part of the design spectrum that can be grasped best with data. Thus it’s an area closer to the engineering heart of SVG developers.

Basic Principles

According to the Best Practices for Speeding Up Your Web Site the Yahoo! Developer Network suggests the following:

The performance in these areas can easily be measured with the YSlow browser extension. Furthermore waterfall charts, which are part of browsers’ developer tools, will be used to examine the loading behaviour of webpages.

An example to improve upon

In his SVGopen 2009 paper on web mapping the author explored SVG-based thematic mapping where all SVG elements were embedded directly within the HTML markup, using either XHTML with two explicitly declared namespaces (XHTML Example) or the syntax proposed by the HTML5 draft (HTML5 Example). For compatibility with the browsers existing at the time, the SVG Web JavaScript Library was used to make SVG within plain HTML possible.

Idealized example code of embedded SVG within HTML5

Admittedly those two versions of an Atlas for the European Election in Germany 2009 mixed all of the SVG elements with (X)HTML just because it had become feasible. Now looking through the optimisation lens, how did those maps perform? The following waterfall chart shows the instance of the HTML5 version of the 2009 election atlas:

Assets by size and loading pattern

Of the aforementioned performance principles, two can be ticked off: there are only 4 HTTP requests in all and the files are gzipped by the server. However, the main HTML file that makes up the page is quite heavy as it contains the full geometry of the map. Of the 590 KB that the HTML file weighs unzipped, 578 KB (98%) is pure SVG path data that defines the boundaries of the electoral districts.

This is obviously not very efficient given that users will hit reload very often during election night, just to see if new data has arrived. It is human nature: even modern AJAX-driven sites cannot keep impatient users from that long-learned habit (and that particular atlas did not yet use AJAX methods for loading the data).

Instead of discussing in the abstract how performance optimisation techniques can be applied to sites containing SVG, the following chapter will detail the parts that make up a thematic map. In terms of functionality the design follows Andreas Neumann’s Choroplethe map with interchangeable statistic variables as a blueprint. Unlike that early map, which currently does not work in recent browsers’ native SVG implementations, the following map explores the state of mixing SVG with HTML that has been promised under the HTML5 umbrella.

Building blocks of a thematic map

For the sake of this paper the map of the state level election in the German state of Rhineland-Palatinate in March 2011 shall serve as an example. This map cannot be directly compared to the one for the European Election in Germany (see above) as the path data of the boundaries is much simpler and election districts are far fewer. The map is part of the author’s Election Atlas. As the results for that election are finalized, the aforementioned atlas now works slightly differently than during the vote counting.

To regain the election night situation where people come back to the site frequently and often hit reload as they await new data, the technical version of that state has been reassembled for the purpose of this paper at vis.uell.net/svgopen/11/atlas.html. Below is a schematic description of the different parts that comprise the map’s design.

Main design elements of a thematic map

Naturally most of the screen real estate is devoted to the map. While election districts may change between elections, once they are set and published for a coming election, they can be considered a static asset. Therefore the path data should not be part of the framing HTML file. The map geometry in this example is loaded from an external .svgz file via an object element.

Depending on the shape of the map the remaining parts have to be arranged. In the examples discussed here they are situated on the right hand side. The results table gets updated on hovering over the map with the mouse, thereby displaying the detailed results for the district the pointer is over.

No map would be complete without a key. As with the results table, the key is literally constructed using the HTML table element. This limits the visual design options a bit but helps keep the code simple and allows for robust input capabilities for choosing discrete classification values. The main disadvantage is that the color key is painted as the background-color of a table cell, which by default doesn’t get printed.

Finally the histogram shows the distribution of the values of all districts. It is built using SVG line elements that are constructed dynamically.

The UI for zooming and panning is less discoverable but follows the proven concept of established web mapping applications, i.e. using the scrollwheel of a mouse to zoom in and out and dragging the map for panning. As a hint for this and a preliminary UI for touch devices, zoom and reset buttons are available in the lower right corner.

Let’s now have a look at how these functional areas are constructed in terms of the files being transmitted. A waterfall diagram gives a first impression, this time again using the chart that the Firebug extension offers:

Firebug’s Net Panel, Waterfall View

Although all files are gzipped server-side in this example, static SVG content should be saved in the compressed .svgz format, which eases some server load and provides compression in instances where the server side cannot be controlled.
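
To make the .svgz idea concrete, the following is a minimal sketch in Python, assuming the SVG markup is already available as a string. A .svgz file is simply gzip-compressed SVG; the function name `to_svgz` and the sample markup are hypothetical and not part of the atlas discussed here.

```python
import gzip

def to_svgz(svg_text: str) -> bytes:
    """Return the gzip-compressed bytes that would make up a .svgz file."""
    # mtime=0 keeps the output byte-identical between runs, so the file only
    # changes when the SVG content changes.
    return gzip.compress(svg_text.encode("utf-8"), compresslevel=9, mtime=0)

# Hypothetical stand-in for real boundary data
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    + '<path d="M10 10l5 0 0 5 -5 0z"/>' * 200
    + "</svg>"
)

compressed = to_svgz(svg)
print(len(svg.encode("utf-8")), "bytes uncompressed,", len(compressed), "bytes as .svgz")
```

The bytes can be written to a file such as map.svgz. When such a file is served over HTTP, the server typically still has to announce it with a `Content-Encoding: gzip` header so that browsers decompress it.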

Most assets can be mapped to the aforementioned functional areas by guessing from the filenames but the following table gives a more detailed explanation.

Table 1: Description of assets, by filesize

File | Usage | Size (gzipped)
svg.js | JavaScript library SVG Web for compatibility with legacy versions of Internet Explorer | 26.1 KB
map.svgz | Geometry (boundary paths) of the electoral districts, derived from a shapefile | 17.1 KB
static_data.js | Election results of previous elections as well as demographic data | 7.0 KB
programme.js | Program logic, e.g. reading of the data, statistics, classification methods, colouring of the map | 6.3 KB
atlas.html | Main HTML file, contains results table, key and dropdown menus | 3.1 KB
realtime_data.txt | Results from the current election, updated throughout election night and loaded via XMLHttpRequest | 2.3 KB

What sticks out is that a JavaScript library uses even more bandwidth than the path data for the map. SVG Web converts the SVG markup on the fly to Flash for legacy versions of Internet Explorer (namely 6, 7 and 8). But even for recent browsers with native SVG support a 26 KB JavaScript file (gzipped) is being transferred. It is just a matter of time until those techniques become fully obsolete, but in general the use of JavaScript libraries has to be justified in terms of bandwidth and latency. With regard to caching it is necessary to use the same version and location of the library in question all over the site.

Using SVG at the appropriate places

The mapping example in this paper showcases the use of both static and dynamically generated SVG markup, and it restricts the use of SVG to the appropriate areas. Even in maps there is plenty of content and UI that can be perfectly implemented using standard HTML.

As far as diagrams such as bar-, line- or pie charts go, those can usually be generated most efficiently by script and therefore their optimization lies in the field of improved JavaScript programming.

The map itself, i.e. the boundary data that shapes the electoral districts or denotes other mapping features such as roads or rivers is mostly path data. It is this path data that needs most of the attention. This is the heart of SVG use and this is what the SVG standard has been optimized for. Path data in SVG can be described very efficiently but not all programs that come in touch with SVG do so.

Aggressive Caching of Assets

As we have seen, in a mapping application that loads the most recent data via XMLHttpRequest (aka ‘AJAX’), all other assets can be cached aggressively. But even without AJAX-style data handling, a thematic map as discussed here consists of assets that change at very different frequencies. While caching strategies vary between browser vendors, site administrators can set expire headers with far future dates and thus allow the browser to cache accordingly. This requires changing filenames whenever an asset is updated, as otherwise users wouldn’t get the latest version of the site. Usually the revision number of a file is included in the filename to keep things organized.
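
As an illustration of the renaming step, here is a small Python sketch of two common naming schemes: one that embeds a revision number, as described above, and one that embeds a short hash of the file content. The function names and the `.r<revision>` pattern are assumptions made for this example, not the scheme used by the atlas.

```python
import hashlib
from pathlib import PurePosixPath

def versioned_name(filename: str, revision: int) -> str:
    """Insert a revision number before the file extension."""
    p = PurePosixPath(filename)
    return str(p.with_name(f"{p.stem}.r{revision}{p.suffix}"))

def hashed_name(filename: str, content: bytes) -> str:
    """Insert a short content hash instead, so the name changes exactly
    when the content changes."""
    p = PurePosixPath(filename)
    digest = hashlib.sha1(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))

print(versioned_name("map.svgz", 3))            # map.r3.svgz
print(versioned_name("js/static_data.js", 12))  # js/static_data.r12.js
print(hashed_name("map.svgz", b"<svg/>"))
```

Either way, the HTML file that references the asset has to be updated with the new name, which is why the HTML itself should keep a short expiry time.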

When using the Apache web server a .htaccess file can be placed in directories to adjust the expire headers according to how often the assets in that given directory are updated. The following example shows the content of such a file for the above mapping application.

ExpiresActive On
ExpiresDefault "access plus 300 seconds"
ExpiresByType text/javascript "access plus 1 week 1 hour"
ExpiresByType application/x-javascript "access plus 1 week 1 hour"
ExpiresByType image/svg+xml "access plus 1 year 1 hour"

The YSlow extension for Firefox gives an overview of a site’s assets including suggested actions for optimization. This comes as a clear table (with drill downs) that includes the expire headers that get sent from the server.

YSlow table of expire headers

The YSlow inspection clearly shows that the intended expiration dates are set. The SVG Web JavaScript library as well as the mapping geometry are set to expire a year after first access. This obviously doesn’t necessarily mean that the file in question will be held that long in the browser’s cache.
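
To see what such a rule amounts to in terms of response headers, the following Python sketch translates an expiry rule of the form used in the .htaccess example into a lifetime in seconds and the resulting header values. It is a simplified stand-in and not Apache's parser; the unit lengths (a month as 30 days, a year as 365 days) are assumptions based on how mod_expires is commonly documented, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed unit lengths in seconds (month = 30 days, year = 365 days)
UNITS = {
    "second": 1, "minute": 60, "hour": 3600, "day": 86400,
    "week": 604800, "month": 2592000, "year": 31536000,
}

def parse_expires(spec: str) -> int:
    """Return the lifetime in seconds for a rule like 'access plus 1 year 1 hour'."""
    words = spec.split()
    if not words or words[0] not in ("access", "now"):
        raise ValueError("only rules relative to the access time are handled here")
    rest = words[1:]
    if rest and rest[0] == "plus":
        rest = rest[1:]
    if len(rest) % 2:
        raise ValueError("expected pairs of <number> <unit>")
    # Sum up all <number> <unit> pairs; a trailing plural 's' is ignored
    return sum(int(num) * UNITS[unit.rstrip("s")]
               for num, unit in zip(rest[0::2], rest[1::2]))

def expiry_headers(spec: str, accessed: datetime) -> dict:
    """Build the caching headers a server would send for a request at 'accessed'."""
    seconds = parse_expires(spec)
    return {
        "Cache-Control": f"max-age={seconds}",
        "Expires": format_datetime(accessed + timedelta(seconds=seconds), usegmt=True),
    }

accessed = datetime(2011, 10, 8, 12, 0, tzinfo=timezone.utc)
for rule in ("access plus 300 seconds", "access plus 1 week 1 hour", "access plus 1 year 1 hour"):
    print(rule, "->", expiry_headers(rule, accessed))
```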

Expiration headers can also be examined with most other developer tools. See here, for example, how the waterfall view in Firebug can be expanded to reveal the full request and response headers for each file:

Firebug extension showing a far future expire header

Caching of SVG files in Browsers

The technique of aggressive caching is universal and has almost nothing to do with SVG, but experience with different browsers and access statistics seems to suggest that not all browsers treat SVG content equally when it comes to respecting expiration headers. Just as an example, the following charts compare Safari and Chrome, as they have similar developer tools.

Safari 5.1 Webinspector showing loading timeline

Here the Safari 5.1 Webinspector shows that, unlike for the JavaScript files, the browser didn’t use the cached version of map.svgz, despite the file being unchanged and having a far future expire header. In contrast, Chrome 14 behaves as expected in terms of caching, which results in shorter load times.

Chrome 14 Developer Tools showing caching behaviour

Other browsers’ behaviours weren’t inspected in detail for this paper. This should just serve as a hint to implementors.

Optimizing path data for map geometries

By now it should have become clear that optimizing the path data that makes up the boundaries of a map is crucial. And there is no tool like “Save for web”, where the quality of the path data and the resulting SVG filesize could be controlled in one go, as is the case e.g. for saving JPEG files in common image editors.

Where do mapping geometries come from

Ideally, boundary data can be obtained in the Shapefile format which is best suited for the purpose as will become clear later on. However there are many sources where boundary data comes in the form of SVG already or is even provided in WMF or EMF formats.

Typically path data, whether it represents boundaries or other shapes, is edited with programs such as Adobe Illustrator or the open source Inkscape. Those tools can be used to optimize SVG path data in two ways:

  1. Simplifying paths = reduce the number of anchor points
  2. Adjusting the precision of SVG coordinates

Simplifying path data with Adobe Illustrator

Illustrator comes with a very streamlined yet powerful interface for simplifying paths which can be accessed through the Object menu:

Adobe Illustrator CS 5.1 Path Simplification command

The complexity of the paths can be reduced via sliders while the simplified version is previewed on top of the original paths. The reduction in anchor points is displayed in the same interface. The path simplification can work with both Bézier curves and straight lines:

Simplification using Bezier Curves (original paths in red)
Simplification using Straight Lines (original paths in red)

While the reduction in anchor points is impressive with these settings, negative side effects already become visible: adjacent boundaries no longer align. This is, however, a shortcoming of the path element in SVG, which wasn’t specifically constructed for representing boundary data.

Adjacent paths don’t align after simplification

The example above certainly showcases an extreme outcome. Experimentation with different levels of simplification will lead to acceptable results in many cases as long as screen resolution accuracy is sufficient.
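
What reducing the number of anchor points means can be sketched with the well-known Ramer-Douglas-Peucker algorithm, shown here in Python for straight-line paths. Note that this is a stand-in chosen for illustration: which algorithm Illustrator actually uses is not stated here, and the sample coordinates and tolerance are made up.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Drop anchor points that deviate less than 'tolerance' from the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

boundary = [(0, 0), (1, 0.05), (2, 0), (3, 0.04), (4, 0), (4, 3), (4.03, 6), (4, 9)]
print(simplify(boundary, 0.1))  # [(0, 0), (4, 0), (4, 9)]
```

Each path is simplified on its own here. If two adjacent districts store their common boundary as parts of two separate paths, the two copies may end up being simplified differently, which is the misalignment described above.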

Exporting SVG from Adobe Illustrator

Adobe Illustrator gained SVG capabilities in Version 9 in the year 2000. From the beginning path data was exported to SVG using relative coordinates which is a first step in reducing filesize.

Apart from simplifying paths, the precision of SVG coordinates, i.e. the number of decimal places, is another factor that determines the resulting filesize. As Illustrator’s native fileformat isn’t SVG, the precision can be adjusted in the extended options of the export dialog:

Adobe Illustrator CS 5.1 Save Dialog for SVG (1–7 Decimal Places are valid)

The following figure exemplifies what impact different precision parameters can have on mapping geometries. Note that the number of decimal places has to relate to the size/precision of the viewbox, so usually a little experimentation will be necessary.

Effect of different precision settings for files saved from Illustrator (map only partly shown)

Again, at some point reducing the precision will lead to rounding errors that result in paths no longer aligning.
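
The effect of the precision setting on the path data can be imitated with a few lines of Python. This is only a sketch of the idea, not what Illustrator does internally: it rounds every number in a path string to a given number of decimal places and writes the tokens back separated by spaces. Numbers in exponent notation are not handled, and the sample path is made up.

```python
import re

# A path token is either a command letter or a plain decimal number
TOKEN = re.compile(r"[A-Za-z]|-?(?:\d+\.?\d*|\.\d+)")

def round_path(d: str, decimals: int) -> str:
    """Round all coordinates in SVG path data to 'decimals' decimal places."""
    out = []
    for token in TOKEN.findall(d):
        if token.isalpha():
            out.append(token)
            continue
        text = f"{float(token):.{decimals}f}"
        if "." in text:
            # Trailing zeros carry no information and only cost bytes
            text = text.rstrip("0").rstrip(".")
        out.append("0" if text in ("-0", "") else text)
    return " ".join(out)

d = "M 102.4871,55.1239 L 110.0004,60.98765 L 120.6,61.27 Z"
for decimals in (3, 1, 0):
    rounded = round_path(d, decimals)
    print(decimals, len(rounded), rounded)
```

Comparing the output lengths for different settings against a visual check of the map is a quick way to find the lowest precision that still keeps adjacent boundaries aligned.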

Editing SVG path data with Inkscape

The open source vector editing program Inkscape uses SVG as its native file format. Starting with version 0.47 (November 24, 2009), Inkscape has been using optimized path data:

In this version, the size of the path data written in the d= attribute of path elements is reduced by about 10%. Inkscape generates the shortest possible path strings by avoiding repeated operators and using relative coordinates (when it helps).

Path simplification in Inkscape can be accessed from the Path menu and has no further options. It is intended to be applied multiple times until the desired effect is gained. The Simplification threshold can be adjusted in preferences:

Settings for path simplification threshold (Inkscape 0.48)

The following figure shows how different simplification thresholds affect the path simplification. Again the values in this example depend on the viewbox and are given here just as a guiding principle.

Effect of different simplification thresholds when using the Simplify command from the Path menu

Please note that Inkscape’s path simplification always results in bezier curves so that the simplified path looks smoother but doesn’t necessarily result in a smaller file.

Just like Illustrator, Inkscape allows for adjusting the numeric precision of coordinates. Since SVG is Inkscape’s native file format these adjustments are set in preferences:

SVG output settings: Numeric precision

Inkscape’s “Optimized SVG Output”

Under the headline Inkscape for the Web Tavmjong Bah writes in “Inkscape, Guide to a Vector Drawing Program” (4th ed.):

To be quite honest, making Inkscape convenient for creating SVGs for the web has been more of an afterthought. Having said that, many of the great SVG examples on the web have started life as Inkscape drawings, as evidenced by tell-tail fingerprints left in the source. This section focuses on ways to prepare Inkscape SVGs for the web.

As you can see from the following screenshot showing the Inkscape 0.48.2 Save as Dialog, “Inkscape SVG” and “Plain SVG” are the prominent default options. The former also keeps editing information that is not necessary for the display, whereas the latter will omit information in the Sodipodi and Inkscape namespaces and thus results in a much smaller filesize.

Then, somewhat hidden among more obscure file formats, there is the Optimized SVG output, which leads to the dialog box shown below. Here the user is given more fine-grained control.

Inkscape 0.48 Save As Dialog

Note that writing the “Optimized SVG” format in Inkscape 0.48.2 requires the lxml Python library to be present, which by default is not the case on Mac OS X. A standard installation of Inkscape under Ubuntu 11.04, however, is fully functional in this regard.

Inkscape 0.48 Optimized SVG Output Dialog Box

For static SVG files like the SVGopen 2010 website header, stripping id attributes may be a useful approach to reducing the filesize. On the other hand, SVG graphics that will be manipulated by script, like a thematic map, would be rendered useless by this approach.

The Set precision option is the same as discussed before but can be adjusted here on a per file basis without changing the overall preferences. Again setting the precision too low can result in adjacent paths being “torn apart”.

For a complete description of the above Save as Optimized SVG dialog see the aforementioned Inkscape for the Web section of Tavmjong Bah’s book, and also consider the same author’s blog entry Optimizing Inkscape SVG size for the Web for a real-world example and possible filesize reductions.

Editing shapefiles with MapShaper

Wherever possible one should start a mapping project from a shapefile, the de facto standard in GIS. A lot of the aforementioned issues in optimizing path data result from the fact that in SVG adjacent polygons “don’t know of each other”, i.e. there is no information that an editing programme could use to keep a shared boundary intact after generalization.

Topological data structures in GIS, on the other hand, were created with exactly that in mind:

Over the past two or three decades, the general consensus in the GIS community had been that topological data structures are advantageous because they provide an automated way to handle digitizing and editing errors and artifacts; reduce data storage for polygons because boundaries between adjacent polygons are stored only once; and enable advanced spatial analyses such as adjacency, connectivity, and containment.

from Understanding Topology and Shapefiles (ArcUser April-June 2001, emphasis added)
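
The idea of a boundary that is “stored only once” can be illustrated with a small Python sketch that finds the edges two polygon rings have in common. It is a deliberately simplified illustration with made-up coordinates and hypothetical function names, and it does not describe how any particular GIS tool is implemented.

```python
def ring_edges(ring):
    """Return the undirected edges of a closed ring as a set."""
    # Pair every point with its successor, wrapping around to close the ring
    return {frozenset((a, b)) for a, b in zip(ring, ring[1:] + ring[:1]) if a != b}

def shared_edges(ring_a, ring_b):
    """Edges that belong to both rings, i.e. their common boundary."""
    return ring_edges(ring_a) & ring_edges(ring_b)

# Two adjacent districts that share the edge from (2, 0) to (2, 2)
district_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
district_b = [(2, 0), (4, 0), (4, 2), (2, 2)]

common = shared_edges(district_a, district_b)
print(len(ring_edges(district_a)) + len(ring_edges(district_b)), "edges stored per polygon")
print(len(ring_edges(district_a) | ring_edges(district_b)), "edges when shared ones are stored once")
print(common)
```

Once a shared boundary is known, it can be generalized a single time and reused for both neighbouring polygons, so the two sides cannot drift apart.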

For those in the GIS community who work with shapefiles in their respective environments, generalization methods are well known. Not everybody has access to and knowledge of these tools, however, even though professional maps are typically provided in that format. Enter MapShaper, a free, interactive online shapefile editor (available at mapshaper.org), which may be the last remaining reason to keep a Flash plugin around.

Matthew Bloch researched and programmed mapshaper while he was getting an M.S. degree in Cartography/GIS at the University of Wisconsin - Madison where his academic advisor Mark Harrower funded the project and offered advice.

MapShaper shapefile editor - Flash-based

The above screenshot speaks for itself: it combines the simplification methods already discussed in previous sections with regard to Inkscape and Illustrator, but now applied in a way that keeps shared boundaries between adjacent polygons intact. MapShaper is a perfect tool to gain the desired level of detail balanced against the resulting filesize.

Obviously results vary greatly depending on the necessary level of detail. For the election map discussed in this paper the following results could be obtained.

Table 2: Filesize reduction for selected simplification levels

Shapefile | Size
Original | 112 KB
3% Simplification | 49 KB
5% Simplification | 44 KB

Converting shapefiles to SVG using shp2svg

Once the shapefile has been simplified as desired it has to be converted into SVG. This second step is similar to the already discussed Save as step in vector editing programs, where again precision has to be balanced against filesize.

The utility section of carto.net offers a Shapefile to SVG converter (LGPL licensed) to convert shapefiles to SVG. Here the rounding parameter determines the precision and thus filesize of the output. A typical use case would be

perl ogis2svg.pl --input yourshapename --output yourshapename.svg --roundval 0.1

A value of 1 means that we round to full integers, a value of 0.1 means that we round to one decimal place, a value of 0.01 means that we round to two decimal places and so on … For meters as units 1 or zero decimal place will be accurate enough (roundval 1 or 0.1), for degrees as base units I recommend to use at least 5 decimal places (roundval 0.00001).

For screen-based solutions a roundval parameter of 1 has so far proven sufficient. The resulting SVG file uses path data with absolute coordinates. Opening and saving it again in Inkscape 0.47 or above may further optimize the path data by using relative coordinates, without any other intervention or quality loss.
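
Why relative coordinates tend to be shorter can be shown with a short Python sketch that serialises the same ring of points once with absolute and once with relative coordinates. It is a hand-written illustration with made-up coordinates, not the code that Inkscape or the converter use.

```python
def fmt(value, decimals=0):
    """Format a number with at most 'decimals' decimal places."""
    text = f"{value:.{decimals}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return "0" if text in ("-0", "") else text

def to_absolute_path(points, decimals=0):
    """Closed path using absolute coordinates (M and L commands)."""
    pairs = [f"{fmt(x, decimals)} {fmt(y, decimals)}" for x, y in points]
    tail = "L" + " ".join(pairs[1:]) if len(pairs) > 1 else ""
    return "M" + pairs[0] + tail + "z"

def to_relative_path(points, decimals=0):
    """Closed path with an absolute start point followed by relative offsets (l command)."""
    # Round the absolute positions first so rounding errors do not accumulate
    rounded = [(round(x, decimals), round(y, decimals)) for x, y in points]
    x0, y0 = rounded[0]
    deltas = [f"{fmt(xb - xa, decimals)} {fmt(yb - ya, decimals)}"
              for (xa, ya), (xb, yb) in zip(rounded, rounded[1:])]
    tail = "l" + " ".join(deltas) if deltas else ""
    return f"M{fmt(x0, decimals)} {fmt(y0, decimals)}" + tail + "z"

ring = [(4520, 3310), (4535, 3312), (4541, 3325), (4528, 3331)]
print(to_absolute_path(ring))  # M4520 3310L4535 3312 4541 3325 4528 3331z
print(to_relative_path(ring))  # M4520 3310l15 2 6 13 -13 6z
```

Because neighbouring anchor points of a boundary are usually close together, the offsets have fewer digits than the absolute positions, and the saving grows with the size of the coordinate values.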

SVG Tidy and Hand editing

Sam Ruby developed a script called SVG Tidy that helps to clean up SVG files from vector editing programmes. His work has been incorporated in Inkscape’s Save cleaned SVG option (now called “Optimized SVG”, see above).

However Sam shows with an example drawing of the “Corrosive” danger sign that editing markup by hand lets him simplify SVG files even further (before: 15 KB, after: 2 KB).

For assets that don’t change but will get loaded very often, there is good value in manually improving and optimizing SVG markup.

Conclusion: Finding the right level of detail

The paper aimed at making the SVG community aware of website performance optimisation techniques and also checking with browser vendors how they treat SVG content in terms of caching. However the main focus turned out to be the level of detail of path data.

Cartographers have always known that maps at different scales aren’t just magnified or scaled-down versions of one another, but that maps at different scales really differ in content. On the other hand the promise of scalability has often led to “SVG bloat”, a level of detail that most of the time goes unused.

Operating systems still use bitmap images for icons and even though they now have to cater for an even wider variety of screen resolutions, they usually require that icons are provided at several magnification levels (example).

At SVGopen 2009 the CTO of Wikimedia, Brion Vibber, analyzed the use of SVG in Wikipedia (pdf, 5.5 MB) and explained why PNG renderings of maps are more efficient for them in terms of file size and rendering speed.

The tools and possible workflows for optimizing vector graphics to a distinct level of detail that fits the purpose of the output medium or a range of magnification levels have been discussed. What has yet to evolve in vector-based mapping are clever solutions for loading finer levels of detail when they are needed.

Furthermore the focus was mainly on the “loading” side of the equation, considering bandwidth use and latency. Nevertheless, with regard to memory usage, the question of how to get rid of the path data once it is no longer visible will also have to be addressed.