Results of Profiling Script Animation in SVG


Table of Contents

What I'm Trying to Measure
Rationale
Results
The Minimal Test
The Instruments Test
The Wuse Test
Measurement Methodology
Statistical Approach
Conclusions
Future Directions
A. Resolving Bad Data
B. Profiling Assumptions
C. Sources of Errors
D. Update: Profiling on Linux
E. Update: Profiling on Mac OS X
F. Update: Profiling on Windows XP, SP3
G. References
H. Acknowledgements

I have developed several SVG applications over the years. Up until recently, my focus was not on performance but on getting them to work in multiple viewers. In the last few years, most of the major browsers have developed reasonable support for SVG. When Firefox 3 came out, I found that it was actually fast enough to render one of my applications at a reasonable update rate.

This article describes the process I used to determine how quickly some of the viewers can update SVG using ECMAscript and to see how the viewers compare. I describe both my results and my testing methodology so that other people can duplicate the results. The intent of this experiment is to measure the capabilities of some current SVG viewers. I do not intend to find which viewer is the fastest, but to study what is feasible with scripting.

The main conclusion I can draw from this experiment is that script-based animation of SVG is fast enough for many applications. Moreover, the viewers I tested were all fast enough to show consistent behavior over a wide range of update rates. Finally, as expected, the speed at which a particular viewer can update the SVG depends strongly on the kind of manipulations being done.

For this experiment, I am measuring the maximum update frequency of an SVG document. I am determining this frequency by counting the number of times a function is executed using the setInterval() ECMAscript function. I am not measuring the actual rendering frequency, because I have no way to directly determine that from script.

I am also explicitly not measuring either CPU or memory usage in these experiments. Although these performance measures are important and worthy of study, I felt that focusing on one measure at a time would yield more concrete results.

Another approach to script-based animation is to use setTimeout() to fire an update method. To make the animation continuous, the update method must reschedule itself with setTimeout() at the end of each update. In theory, this makes the wait time less dependent on the amount of work being done during the update. Unfortunately, this approach is subject to a bug reported in Firefox. [1] In my tests, there was no sign of this bug. However, this is a valid animation technique that should be explored in a later test.
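
As a rough sketch of this pattern (not the code used in these tests; update() and INTERVAL are hypothetical names):

    // Self-rescheduling animation driven by setTimeout().
    var INTERVAL = 20;   // requested delay in milliseconds

    function update() {
        // ... manipulate the SVG DOM here ...
        setTimeout(update, INTERVAL);   // reschedule the next frame
    }

    setTimeout(update, INTERVAL);       // start the animation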

Absolute performance measures are incredibly hard to reproduce (or understand). So don't take the absolute numbers too seriously. The more important measures are relative. How do different interval sizes affect the update rate? How does one viewer perform relative to another (at least on my platform)?

I have run my performance tests on three different viewers (Batik 1.8 pre-release, Firefox 3.03, and Opera 9.6) with three different SVG applications. Each viewer is run with all three applications and the results for each application are compared. The first application is a reference application that does minimal work in order to establish a benchmark of the maximum update rates for each viewer. The other two applications are demos that I have had on my website for years. They have both been modified to contain code that does the profiling.

The reference application establishes a baseline update rate that is effectively the fastest the viewer can achieve. This gives some measure of the overhead of the profiling code on the viewer under test. The difference between this baseline and the measurements for another application shows the cost for the script actually executed in that application.

Figure 1. Instruments Snapshot



The actual Wuse application is available on-line.

Figure 2. Wuse Snapshot


The control application (labeled Minimal) has a black rectangle as a background and one text area that is updated by the profiling code (just like the other two). On each update, the application calls the profiler to update the current count. No modifications of the DOM are made and no real work is performed by the script.
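
As a sketch of that per-update work (the profiler object and countUpdate() method here are hypothetical names, not the actual code in the test files), each tick amounts to little more than a method call and a counter increment:

    // Minimal per-update work: call the profiler, which bumps a counter.
    var profiler = {
        count: 0,
        countUpdate: function () { this.count++; }
    };

    setInterval(function () { profiler.countUpdate(); }, 20);   // 20ms is one tested interval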

The next application is a profiled version of my Instruments demo. This application manipulates a set of SVG objects designed to mimic typical reporting instruments: bar graphs, radial dials, strip charts, and numeric readouts. This test does a small amount of calculation in the script and mostly performs scaling, rotation, and text updates to the various SVG objects.

The final application is a profiled version of my Wuse program. This application is basically nothing more than eye candy, but it does exercise the SVG scripting in a much different way than the Instruments demo. On each update, this application performs some calculations, updates the position of one end of a line and modifies the color of several lines.

Obviously, the measurements only make sense in the context of the platform I am using for the tests. My tests were run on Ubuntu running on a 1.66 GHz Core 2 Duo machine. Batik was run with Sun's Java version 1.6. The numbers on a different machine could be very different.

One might ask why we should benchmark script-based animation. After all, SVG supports a very powerful declarative animation system using SMIL. There are several answers to this question.

  • SMIL is not supported in all viewers.

  • Many animation techniques (such as data-driven animations) cannot be accomplished through SMIL alone.

  • There are many other applications for using script to manipulate the SVG document besides animation.

For these reasons, scripting will probably remain a useful tool in the SVG developer's tool chest. Testing the performance of script-based animation gives some information on how useful the technique is for many applications.

I had a few applications that were already doing script-based animations. One of them, Wuse, had performed miserably on early versions of the Firefox browser. This was not unexpected since SVG support was very new at that point. I recently opened Wuse again in a more modern Firefox and was surprised at the improvement in performance. This led me to wonder how it compared to other SVG viewers that I have access to.

Rather than spend a lot of time trying to develop a powerful, comprehensive method for profiling the code, I decided to stick with a simple counter-based approach (see Measurement Methodology). It has the following advantages:

  • It is relatively independent of the application being profiled.

  • It is simple to understand.

  • It is quick to implement.

  • It gives repeatable results.

Using this profiler, we can easily test how changes to the SVG modify the performance characteristics.

The goal is not to provide a definitive answer about the speeds of the various viewers or to minutely measure the performance of various techniques. The idea is to determine if script-based animation in SVG is good enough for different purposes. It also gives a more analytical measure than "it seems faster."

The use of this profiling approach does rely on some assumptions. The code assumes that the viewer calls the scheduled update function at an interval that has some relation to the requested interval. The approach also assumes that the Date.getTime() method returns a value related to the actual time. We would also like to assume that the viewer renders the changes some time shortly after the changes are made to the image.

David Dailey has reported that an update function driven by setTimeout() will only be executed after the rendering for the previous update is complete. [2] Further testing would be needed to prove that the same effect applies for setInterval(). Dailey's paper performs a similar form of profiling. In his paper, the experiments measure differences between the expected time for updates to occur and the actual time.

The Dailey paper also focused more on testing individual techniques than on larger applications. My experiments used an automated approach to gather more data at a time, based on the (possibly invalid) assumption that I could not reasonably measure rendering. The Dailey experiments used a more flexible means of modifying parameters to test a larger number of variations in a single test. The results described here can be seen as somewhat complementary to the results from that paper.

Before examining the details, it is important to point out that all of the viewers did quite well on the tests. Comparisons of the actual measurements are less useful in general because they depend strongly on platform characteristics such as processor and operating system. The important point was that in all cases, the viewers were able to manage on the order of 20-50 updates per second. This is very different from my unscientific experiments of a few years ago. At this update rate, we can expect fairly smooth, continuous animation driven by scripting.

At very low interval sizes (high potential update rate), the variation in the measurements was fairly high. This is to be expected, since we push the viewers harder as the interval size drops. As the interval size became relatively high (low update rate), the variation was reduced and all of the viewers became more similar in their performance. This is also to be expected. Even if the overhead for each update had been very high, if we wait long enough, all of the viewers should complete their work within the specified interval.


On the charts that follow, the data points are marked by a symbol like the one to the right. This mark shows five pieces of information at once. The point where the data line crosses the center vertical line is the average value. The endpoints of the vertical line mark the minimum and maximum values. The top and bottom of the open box mark one standard deviation above and below the mean. The height of the box therefore gives an idea of the variability in the data.

The control case for the performance testing is the Minimal Test. This case is driven by the minimal_benchmark.svg file. The functionality run on each update was the minimum needed to trigger the profiler: a method call and variable increment. The purpose of this case is to establish a baseline for a minimal amount of script to run for an update.

Table 1. Mean update rate in updates/second for the Minimal test.

Interval (ms)    Batik 1.8 pre    Firefox 3.03    Opera 9.6
       5            198.76            98.63          83.54
      10             99.67            98.62          83.69
      15             66.51            66.41          62.39
      20             49.91            49.84          49.93
      25             39.94            39.94          39.93
      30             33.29            33.24          31.29
      35             28.54            28.55          27.75
      40             24.98            25.00          24.97
      45             22.20            22.22          22.28
      50             19.99            20.00          19.25

Batik provided a fairly large surprise in that its update rate remained relatively close to the theoretical maximum update rate for each interval size tested. This suggests that the overhead for the script animation is very low in Batik.

At least as far as was tested, Batik continued to increase its update rate with smaller interval sizes. Both Firefox and Opera appeared to have a minimum interval size around 10 ms. Since this would result in a theoretical rate of 100 updates per second, that's probably not an unreasonable design decision.

In the range of 50 to 20 ms interval size, the three viewers were effectively equivalent.

Interestingly, Opera did show some unusual variability in measured update rates at the 25 and 45 ms interval sizes. Further testing would be needed to determine if this was an anomaly of that particular test or some strangeness in the Opera viewer.

In order to profile an application, you need to identify what is being measured. See Profiling Assumptions.

The SVG files being profiled use a relatively standard approach to script-based animation. The setInterval() function is used to schedule an update function to be called every x milliseconds. Each update will take some period of time to execute. If the update takes more time to execute than the interval size, the next update is delayed.

This gives an obvious approach to profiling the code. Count the number of times the update method is called. This count is then sampled in another function scheduled by a second call to setInterval() with a much longer interval. The change in the count from the last value divided by the amount of time since the last sample gives a good average update rate.
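
A sketch of that sampling side might look like the following (the names are hypothetical, and the real test code may be organized differently):

    // updateCount is incremented by the scheduled update method (not shown).
    var updateCount = 0;

    // Sample the counter at a much longer interval and turn the change
    // in the count into an average update rate.
    var lastCount = 0;
    var lastTime  = new Date().getTime();

    function sample() {
        var now     = new Date().getTime();
        var seconds = (now - lastTime) / 1000;
        var rate    = (updateCount - lastCount) / seconds;   // updates per second
        lastCount   = updateCount;
        lastTime    = now;
        // ... record or report the rate ...
    }

    setInterval(sample, 10000);   // sample every 10 seconds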

By varying the time interval used for the update method, we can study the relation between requested update rate and actual measured update rate. The update time interval was varied in 5 ms increments in the range of 5 ms to 50 ms.

In a thread on mozilla.dev.tech.svg, Boris Zbarsky explains how the setTimeout() and setInterval() methods work in Gecko-based browsers. [5] One interesting point that has a major bearing on this experiment concerns scheduled code that takes longer than the defined interval. Instead of shifting the interval by a small amount to compensate for the overrun, the Gecko engine (at least) skips the update at the interval that was overrun and lets the update happen at the next interval. In other words, if we have an interval of 20ms and a function that reliably runs in 21ms, we can expect the update function to be called at 20ms, 60ms, 100ms, and so on, skipping every other interval.
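
One rough way to observe this behavior (a hypothetical diagnostic, not part of the test harness) is to schedule a handler that deliberately overruns its interval and record when each call actually fires:

    // Schedule a 20ms interval whose handler takes roughly 21ms,
    // then inspect the recorded firing times.
    var start   = new Date().getTime();
    var firings = [];

    setInterval(function () {
        firings.push(new Date().getTime() - start);
        // In Gecko we would expect roughly 20, 60, 100, ...
        // rather than 20, 41, 62, ...
        var t0 = new Date().getTime();
        while (new Date().getTime() - t0 < 21) { /* simulate slow work */ }
    }, 20);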

In addition to the update method scheduled using setInterval(), another function was scheduled at a longer interval of 10 seconds. This 10 second sampling interval seemed long enough to reduce the sampling error. The time difference between calls to this sampling function was measured using the Date.getTime() method. The difference in the count generated by the update method was also measured. The count divided by this time difference gives an average update rate. A number of samples like this were taken at each update interval, and these samples were averaged to give a value for each interval size.

The data from each sample were written to a text element to allow the user to gauge progress. In addition, the data were sent as parameters in a request to an HTML page. The results of this page request were ignored; its only purpose was to generate a webserver log entry for the request. The data would eventually be harvested from these log files.
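
One way to implement this kind of logging request (a sketch only; the actual harness may use a different mechanism and a different URL) is a fire-and-forget asynchronous request whose response is never read:

    // Report one sample by requesting a page with the data in the query
    // string, purely so that the webserver logs the values.
    function reportSample(interval, rate) {
        var url = 'log.html?interval=' + interval + '&rate=' + rate;
        var req = new XMLHttpRequest();
        req.open('GET', url, true);   // asynchronous; the response is ignored
        req.send(null);
    }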

Three different applications were tested to provide measures of different amounts of work. The same data for each application was collected for three different viewers. All of the data was graphed to allow visual comparison of the data.

Another script makes a new page request after a given (long) period of time. This allows for automated runs of the test on a given viewer. The script cycles through the interval sizes for a given application and then moves to the next application. This gives repeatable results without me needing to sit in front of the computer and change the intervals and applications. The only difficulty with this approach was that the functionality to request a new page was missing from Batik. [6] A patch was submitted for the 1.8 pre-release version to support this functionality.
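
The page-cycling itself can be as simple as scheduling a location change once the current run has collected its samples; a sketch, where nextTestUrl() is a hypothetical helper that knows the next application and interval size:

    // After one run completes, move the viewer on to the next test page.
    var RUN_LENGTH = 10 * 60 * 1000;   // length of one run in milliseconds

    setTimeout(function () {
        window.location.href = nextTestUrl();   // hypothetical helper
    }, RUN_LENGTH);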

When measuring complicated systems, there are a number of different sources for error. By using statistical methods, it is still possible to get reasonable information. See Sources of Errors for more information.

The data for a given interval contains a fair amount of variation. There is also a bit of error in the sampling interval. The data for a given interval is averaged to give an update rate for the interval. Three different sampling approaches were explored to reduce the variation in the data.

  1. Average the first N samples.

  2. Drop the first sample (to reduce startup effects) and average the next N samples.

  3. From the first N+2 samples, drop the highest and lowest and average the rest.

Each of these sampling strategies was tried, and three statistics were calculated: mean, median, and standard deviation. Comparing the mean and median showed that the two were relatively close. In addition, sometimes the mean was larger and other times the median was. There did not seem to be any systematic bias between the two.
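
A sketch of the third strategy (drop the highest and lowest of the first N+2 samples and average the rest), along with the standard deviation used for the comparison; the function name is hypothetical:

    // samples: measured update rates for one interval size; n: samples to keep.
    function trimmedStats(samples, n) {
        var kept = samples.slice(0, n + 2)
                          .sort(function (a, b) { return a - b; })
                          .slice(1, n + 1);   // drop the minimum and maximum

        var mean = 0;
        for (var i = 0; i < kept.length; i++) { mean += kept[i]; }
        mean /= kept.length;

        var variance = 0;
        for (var j = 0; j < kept.length; j++) {
            variance += (kept[j] - mean) * (kept[j] - mean);
        }
        variance /= kept.length;

        return { mean: mean, stddev: Math.sqrt(variance) };
    }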

Since the measured update rates cannot be negative and the mechanisms of bias should be mostly random, assuming a gamma distribution seems fairly reasonable.

Since a gamma distribution with a large enough mean tends to look something like a normal distribution, the standard deviation of the data should give an idea of which sampling approach would be the most stable. Of the three sampling approaches, the third gave the lowest variation.

In the end, the measurements were done using the third sampling method with N equal to 50. So, at least 52 samples must be taken.

Although there is a lot of variation between the three viewers tested at very high update rates, they all performed fairly consistently in the 20 ms to 50 ms interval size. As expected, different work in the scripts has a very strong effect on the update rate. As long as the interval size does not become too small, the viewers could sustain update rates around 20 to 50 updates per second or better. This update rate is high enough to generate fairly smooth animation.

Someone might question the need for update rates faster than the frame rate of the monitor displaying the SVG. Most movies use a frame rate of 24 fps, and monitors are capable of refresh rates between 50 Hz and 120 Hz. This gives a fairly wide range of potentially useful update rates.

Choosing an optimal update rate would depend on how quickly a human observer can detect the change. Updates that occur faster than a person can detect could be considered a waste of resources. However, a person's ability to detect changes is actually a very complicated problem. [7] There does not seem to be any firm value that we can call the fastest refresh rate that matters.

Instead of making an arbitrary cutoff, I decided to make a series of measurements to determine what was possible. While a faster update rate may not be directly useful, we might expect that a viewer capable of faster update rates may be able to generate smoother animation. Moreover, if a particular viewer can do a given amount of work with a very high update rate, it may be able to do more work while still sustaining a reasonable update rate. Proving this would make a good experiment for a different time.

The only results worth reporting are ones that others can reproduce, so I'm including the materials I used to run the tests and my initial sets of result data in an accompanying tarball. To duplicate the tests, you will need access to a webserver where you can put the appropriate files. You can then follow the instructions in the enclosed README file.

Now that the infrastructure and profiling tools have been written, there are many other directions that can be explored. The most obvious is to profile more SVG files, with different mixes of manipulations.

Tests comparing different image manipulations with similar effects might be interesting. For example, a comparison of changing the center of a circle vs. using the transform attribute to translate it.
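
As a sketch of the kind of comparison meant here (the element id and values are hypothetical):

    // Two ways to move the same circle that could be timed separately.
    var circle = document.getElementById('dot');   // hypothetical element id
    var newX = 120, dx = 20;

    circle.setAttribute('cx', newX);                               // move by changing the geometry
    circle.setAttribute('transform', 'translate(' + dx + ', 0)');  // or move it with a transform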

Measurements of different filter effects could also be interesting.

There have been a number of reports that some viewers (notably Batik) are slow with some text operations. These tools could be used to give some solid information about this issue. We could also add performance information to the ongoing debate of the difference between presentation attributes and CSS styling.

Obviously, tests of more viewers and platforms would be useful. Although we can only usefully compare viewers on the same platform, this kind of profiling is still useful to gauge the feasibility of different applications.

Although the current tests are somewhat automated, more automation of the testing would definitely be a benefit to anyone duplicating this test. At the moment, the post-processing stage is somewhat primitive and could benefit from more work. Better analysis might yield more useful information from the data logged in the test.

Finally, a few longer test runs are needed to help validate the assumption of a gamma distribution of the samples.

More testing of intervals close to the 10ms estimated cutoff for Firefox and Opera would be useful to establish the actual value of the cutoff.

It would probably be worth testing the setTimeout()-based scripting approach. That would allow comparison with the current approach. It might also be possible to test the Firefox bug described earlier. [1]

A test that verifies whether other viewers treat interval overrun the same as Firefox would be useful.

A. Resolving Bad Data

In confirming my results, I found that I had fewer samples for Firefox than expected. For some reason, the first few samples did not reach the web server for a couple of interval sizes. I changed the code to call clearInterval() before moving to the next interval and reran all tests.

In this test run, all of the data points were reported for all three viewers. The results were consistent with the previous tests. The new results are what is now reported.

B. Profiling Assumptions

Any experiment relies on assumptions about the way the system under test responds. The following are the assumptions that I am aware of making with regards to this test.

  1. The Date object provides relatively accurate time measurements in the sub-second range.

  2. Separate setInterval() calls can schedule more than one repeating function that run concurrently.

  3. Two or more functions scheduled by setInterval() can access shared memory without race conditions.

  4. The setInterval() function can schedule an anonymous function or closure.

  5. Asynchronous HTTP requests can be made without undue impact upon scheduled functions.

The first assumption seems to be correct based on the results returned and the timestamps in the webserver logs.

The second assumption is confirmed by the fact that the three independent scheduled functions do run.

The third assumption should be valid, since browsers run all of the script on a given page in a single thread.

All of the viewers tested support the anonymous function syntax for setInterval(). ASV on Internet Explorer does not support this syntax. Renesis has a bug report for the same issue. [3] Neither of these viewers were tested in this experiment.
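
For illustration, the anonymous-function form in question looks like the first call below; viewers that lack support for it generally require a separately named function (or, in some older engines, a string to be evaluated):

    var updateCount = 0;

    // Anonymous function (or closure) passed directly to setInterval():
    setInterval(function () { updateCount++; }, 20);

    // The more widely supported alternative: a named function reference.
    function update() { updateCount++; }
    setInterval(update, 20);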

All of the major browsers support asynchronous HTTP requests; this is the basis of AJAX. Since there is no processing of the response, the cost should be minimal.

There are also assumptions I have not made that are worth documenting. The code does not assume millisecond-level resolution from the Date object. (There is evidence that assuming high resolution from the JavaScript timer is not safe. [4]) The test only measures times on the order of 10 seconds. The code does not assume that the setInterval() scheduled functions are called at an exact or even interval. By using a long interval to sample the fast interval and measuring the actual time of the long interval, the errors should average out.

C. Sources of Errors

There are several potential sources of error in this test. The test has been designed to reduce some of them. The web browser-based viewers appear to have a lower limit on the interval that can be chosen. Obviously, trying to choose an interval less than that value will result in invalid measurements.

One potential error that the profiler cannot work around involves rendering of the SVG. The current code measures when updates are made, not when the changed SVG is rendered. In most viewers, there is no way for the script to find out what has been rendered. The browser could potentially skip rendering updates that happen too close together to allow a higher update rate.

This would undermine the use of the measured update rate as a profile of animation speed. The only clue is whether the animation looks smooth or jerky. In addition to being subjective, it is quite possible that updates could be lost without an observer being able to tell.

D. Update: Profiling on Linux

Both the Firefox and Opera browsers have been updated since the original performance testing. I reran the performance tests on the same laptop as before, with the most recent versions of those browsers: Firefox 3.5.3 and Opera 10.

Figure D.1, “Results of Profiling Minimal on Linux” shows the results of the minimal application when tested on the Linux laptop. There is not much to say, because the graphs track pretty close to the results described before.


Figure D.2, “Results of Profiling Instruments on Linux” shows the results of the Instruments application when tested on the Linux laptop. Again, the results match the original tests pretty closely.


Finally, Figure D.3, “Results of Profiling Wuse on Linux” shows the results of the Wuse application when tested on the Linux laptop. In this chart, Opera and Firefox do a little better than before. Of the two, Opera has more improvement.


Obviously, this laptop is underpowered for really fast animation of complicated images. But, for slower updates, it still performs respectably.

E. Update: Profiling on Mac OS X

After the initial version of this paper, I got the opportunity to rerun the tests on a Macbook 2.53 GHz Core 2 Duo laptop with 4 GB of RAM, running Mac OS X 10.5.8. The Macbook posed several annoying challenges to performing this test, including an automatic logout after 30 minutes, which caused significant difficulties for a 5.5 hour test.

After configuring the laptop not to log off, I found a large number of missing requests in the Apache logs. This made it impossible to get consistent performance statistics until I increased the length of the test significantly. I suspect the problem had to do with caching, but I never proved it. Increasing the test run to 7.5 hours per browser resolved the issue.

I tested the latest versions of three browsers: Safari 4.0.3, Firefox 3.5.3, and Opera 10. I was not able to get the Batik application running on the Mac due to differences in runtime versions.

Figure E.1, “Results of Profiling Minimal on a Macbook” shows the results of the minimal application when tested on the Macbook.


Here all three browsers track pretty close to optimal update rates for the reference application. All three also seem to have the 10ms cutoff built in.

Figure E.2, “Results of Profiling Instruments on a Macbook” shows the results of the Instruments application when tested on the Macbook.


Once again, all three browsers track pretty well. Results are slightly slower than the minimal application, with Opera lagging a bit behind the other two. However, all three browsers generate quite respectable results in the 20ms to 50ms range.

Finally, Figure E.3, “Results of Profiling Wuse on a Macbook” shows the results of the Wuse application when tested on the Macbook.


The Wuse application was the hardest on all three browsers as we had seen before. However, Opera and Safari both performed much better in the range of 20ms to 50ms. In fact, they performed almost as well as they had on the Instruments test. Firefox matches the other two down to 25ms, but begins to fall off in the 20ms range.

F. Update: Profiling on Windows XP, SP3

After the initial version of this paper, I decided to profile the same applications on my wife's 2.53 GHz Windows XP (SP3) machine with 1 GB of RAM. Although I was able to add a few browsers for testing, I could not add plugins to her Internet Explorer configuration or update Java to run Batik without compromising some of her work.

I ran the long versions of the tests that I used for the Mac testing because I did not want to run the tests repeatedly trying to get them to work, as I had with the Mac. I tested the latest versions of three browsers: Firefox 3.5.3, Chrome 3.0.195.21, and Opera 10.

Figure F.1, “Results of Profiling Minimal on Windows XP” shows the results of the minimal application when tested on Windows XP.


All three browsers track pretty close to optimal for the range 15ms to 50ms. Chrome goes on to perform very well down to 5ms, without the 10ms cutoff we saw in other tests. Opera performs a little worse down to the 10ms cutoff. For some reason, Firefox showed a cutoff around 15ms. Both Opera and Chrome showed some low outlying data points at various interval sizes. This could probably use some further investigation.

Figure F.2, “Results of Profiling Instruments on Windows XP” shows the results of the Instruments application when tested on Windows XP.


Once again, all three browsers track pretty well. Results are slightly slower than the minimal application, with Opera lagging a bit behind the other two. However, all three browsers generate reasonable results in the 15ms to 50ms range. Firefox shows a cutoff at 15ms as before.

Finally, Figure F.3, “Results of Profiling Wuse on Windows XP” shows the results of the Wuse application when tested on Windows XP.


The Wuse application appears not to have been much of a problem for the three browsers in this case. All performed very close to the Instruments test over the whole 15ms to 50ms range. At smaller intervals, they did not fare as well. In fact, Firefox seems to have reacted very badly to the 5ms interval in this test.

G. References

H. Acknowledgements

My wife, Debbie Campbell, provided the initial editing and review. Members of the SVG Developers Yahoo Group also reviewed the article. I am indebted to Frank Bruder, Erik Dahlström, Helder Magalhães, and Cameron McCormack, who provided insights and suggestions after reading a draft of the article. The article is better for their help. Any errors that remain are purely my responsibility.