Bitmap to Vector Conversion for Multi-level Analysis and Visualization

Darren MacDonald

University of Ottawa
School of Information Technology and Engineering

                        Ottawa, On.
                        K1N 6N5

Darren MacDonald of Nackawic, New Brunswick, Canada, currently resides in Ottawa, Ontario, Canada and works as a Graphical Services Developer for IBM. Darren recieved his Bachelor's degree in Computer Science at Dalhousie University in Halifax, Nova Scotia, Canada, and is awaiting his Master's degree in Computer Science at the University of Ottawa. He studied at the DISCOVER lab under Dr. Lang and held a Canada Graduate Scholarship.

Jochen Lang

University of Ottawa
School of Information Technology and Engineering

                        Ottawa, On.
                        K1N 6N5

Jochen Lang is currently an Assistant Professor at the University of Ottawa, Canada. He received his M.Sc. in Computer Science from York University, Toronto, Canada and his Ph.D. from the University of British Columbia, Vancouver, Canada in 1996 and in 2001, respectively. He holds a B.Eng.(Hons) and a Dipl.-Ing.(FH) in Electrical and Electronic Engineering and is a Professional Engineer (PEng). He gained industrial experience in real-time systems, system verification and computer vision in Canada and Germany. He was a researcher at the Max-Planck-Institut für Informatik, Saarbrücken, Germany from 2002 to 2004. His research focuses on image-based modeling in the area of visual computing combining techniques from computer graphics and computer vision. He has published on image and measurement-based modeling in the fields of computer graphics, robotics and computer vision.

Table of Contents

Related Work
HSVGen Overview
Region Merging
Region Refinement
Active Contours
Boundary Simplification
Hierarchical Scalable Vector Graphics
Hierarchical SVG Listing
Clip Art Images

Various methods have been introduced in recent years at this [BattitoPuglisi07], [PrasardSwaminarayan07], [PrasardSkourikhine05] and other venues, e.g., [SunLiangShenShum07], [LecotLevy06], for the conversion of natural images in a raster format to a vector representation. Excellent results are achieved by methods which segment the raster image first [PrasardSwaminarayan07], [BattitoPuglisi07], [LecotLevy06], followed by a conversion of the segment boundaries into a vector format. A segment is a region of the bitmap image. In general, the conversion is lossy, however, we attempt to be accurate to the image data using a minimally complex SVG, in terms of path lengths and number of polygons.

A high quality vector image can only be achieved with a high quality segmentation. A desirable segmentation of an image groups pixels corresponding to objects in the scene. This correspondence establishes a semantics for the vectorized image and is useful for various vector image applications. Our method could serve as a tool for graphic design or as part of a SVG-based image browser where users may want to interact with the image on an object level rather than on a image feature-level, e.g., colours or pixels. However, segmentations which correspond to objects in an image exist at varying levels of detail. For example, an image of a house may be considered to contain a single house object at a very high level but at a more detailed level, the house object contains a roof, windows, walls, doors, etc.. These sub-parts of the house may be considered objects in their own right and are comprised of further objects at a even finer level of detail. The most desirable level of detail for the image segmentation depends on the intent of the observer and on the capability of the display device. The same vector images should be displayed at the best level of detail for each display device and each user's intent. Hence an ideal vector image format should encode all levels and let the rendering engine decide what to display.

In our method, which we call HSVGen, conversion is performed in three main steps: colour segmentation [LuccheseMitra01] followed by active contour localization with simplification and hierarchical colour segmentation. The hierarchical colour segmentation builds a full binary tree of image regions with the complete image as the root. Each subtree represents a region and its contained subparts, simultaneously. Our hierarchical segmentation algorithm uses a statistical merging criterion involving the important visual characteristics of size, shape, colour and brightness. The relative influences of these factors were optimized in ground-truth based tests. We employ the same segmentation algorithm for the initial segmentation and the hierarchical segmentation but any segmentation algorithm which produces an image partition could be substituted for the initial segmentation.

We refine the boundary between pixels classified by the initial segmentation based on a novel use of active contours. This refinement process is characterized by region competition which allows good contour localization in the absence of a steep gradient. A vectorized image approximates the raster image well if contours are well localized. We interleave the active contour step with a contour simplification step based on edge collapses. In the edge collapse, we use a quadric simplification modeled on a method by Garland and Heckbert [GarlandHeckbert97] with can be adjusted to trade off accuracy for file size.

We encode the vectorized images obtained with the hierarchical segmentation after contour refinement as a hierarchical scalable vector graphics. Our images can be rendered by any SVG compliant render engine, however, the full potential of the hierarchical representation requires a hierarchical rendering strategy. In our approach, we render the segments as a stack of vector drawings with increasing levels of resolution rather than applying a stopping criterion to the segmentation process. These stacks of vector drawings reflect different levels of the segmentation process. Scripting allows the user to adjust the level of detail dynamically. Moreover, since lower-detail image segments can be obtained by merging high-detail segments, we encode the SVG as a hierarchy of elements with group tags. Dynamic level of detail trims off data that is at too small a scale to be significant, which is useful in applications such as mapping, especially in cases of limited bandwidths and screen sizes. The polygon hierarchy may also be useful in computer vision applications, such as depth perception and object labeling.

In Related Work, we give an introduction to segmentation focusing on hierarchical image segmentation and region merging. Our vectorization algorithm is summarized in the section on HSVGen Overview, our hierarchical segmentation method is described in the section on Region Merging, and the details of the subpixel boundary localization method are given in Vectorization of Image Segmentation. Finally, we demonstrate our Results against related software packages. We can show that the polygons generated by our automatic method have good spectral agreement with the bitmap and also tend to reflect the different objects in the scene. We end in a Conclusion.

A large proportion of raster to vector conversion systems are designed to extract linear features such as polylines from imagery. Such systems are important for the conversion of scanned technical drawings to digital formats but can typically not convert general color images (see e.g. the biannual Graphics Recognition Workshop GREC and its arc segmentation contest). A simple approach to the conversion of general bitmap imagery may proceed by clustering pixel colors in an image histogram combined with a minimum region size. Better results require the portrayal of objects and their two-dimensional projections and hence, salient, semantically meaningful regions of the image must be detected. Unfortunately, determining regions of an image that correspond to distinct objects in the scene is one of the fundamental problems in image processing. An image segmentation algorithm groups pixels according to some image-based criteria and may be considered a low-level process in recovering information about objects in a scene.

A good vectorization should be based on an image segmentation where segments correspond to objects in the scene. This correspondence establishes a semantics for the vectorized image and is useful for various vector image applications including image understanding. However, valid segmentations of images exist at varying levels of detail and the desired level depends on the intent of the observer [MacDonaldLangMcAllister06], [UnnikrishanPantofaruHerbert07]. We argue that a vector representation of an image will become more useful if it supports various level of detail. Like most modern segmentation approaches, e.g., [BroxFarinDeWith01], [ComaniciuMeer02], [VanDroogenbroeckTalbot03], our novel hierarchical vectorization method groups pixels with a set of criteria.

In general, two complementary approaches to image segmentation are region-based and edge-based methods [FreixenetMunozRabaMartiCufi02]. Region-based methods search for contiguous groups of pixels having some type of homogeneity that suggests they represent objects; edge-based methods search for steep transitions in the image as evidence of a boundary between objects. The two approaches are not quite dual because region-based methods may not produce boundaries that indicate a transition between objects and edge-based methods may not return the closed contours necessary for a partition of the image, or if they do, the regions between contours may not necessarily have the desired homogeneity properties. Improved performance for practical vision applications can be obtained with hybrid algorithms [FreixenetMunozRabaMartiCufi02]. Our algorithm may be considered a hybrid method because of the use of active contours in the process. The closest method to ours for vectorization of color raster images is the method by Prasad and Skourikhine [PrasadSkourikhine06]. Their segmentation routine is an edge-based method that utilizes a hybrid trixel feature during later stages of the vectorization. An implementation of their algorithm is available in the RaveGrid software [PrasadSkourikhine06]. The vectorization produced by RaveGrid is not hierarchical and does not locate segment boundaries beyond pixel resolution.

Our hybrid approach is based on a region merging segmentation algorithm where higher-level segments are produced by iteratively merging similar low-level segments. A commonly cited shortcoming of the algorithm is that no provision is built-in for determining at what level-of-detail to stop merging and output the segments. We circumvent this problem by applying the algorithm in two phases, an initial segmentation phase which produces intermediate-level segments followed by a second phase which produces a hierarchy of segments, i.e., all segments above the intermediate level are represented in the binary segmentation tree. We are outputting the full hierarchy of segments and allowing the user to adjust the level-of-detail dynamically. Implementations of region merging vary widely in the criteria by which the similarity of adjacent segments is measured. Traditionally size and colour have been central factors, although other experiments have included edge gradient [BroxFarinDeWith01], convexity [Beaulieu06], texture [VanDroogenbroeckTalbot03], and colour distributions [MacDonaldLangMcAllister06]. Our tests measure segment similarity with a measure combining segment size, mean colour, common boundary length and common boundary average gradient.

As stated earlier, our conversion method uses a multi-step process. The method uses both region and edge information with the goal of producing a maximally correct vectorization. An overview of the steps of our HSVGen algorithm is given by:

HSVGen: Raster to Hierachical Vector Image

  1. Initial fine-level segmentation.

  2. Refinement of the linear contours of the fine level segmentation to sub-pixel precision and simplification.

  3. Merging of the segments with the refined contours into a hierarchical tree of image segments.

  4. Conversion of the segments in the tree to a hierarchical SVG image.

The above steps are largely independent and any segmentation technique could be used for the initial segmentation in the first step and any hierarchical segmentation technique which produces a hierarchy of bi-partitions could be used for the third step. However, our approach is based on the same region merging algorithm for both of these steps with the only difference that in the first step, we start merging from the pixel level to a fine-level segmentation and in the third step, we start from the fine-level segmentation and merge until the whole image is in a single segment. Both steps work with the same set of constraints which are combined into a variational criterion. We consider luminance and color similarity within a region, and size and boundary length of a region.

Our region merging algorithm is a greedy approach where starting from an oversegmentation of the image, e.g., each pixel in a separate segment for the initial segmentation, we always merge the two most similar segments until all segments have been merged into a single region. The algorithm produces a hierarchical segmentation in the form of a binary tree. Pseudo-code for this algorithm is as follows:

Region Merging

  1. input: Oversegmentation P

  2. Set the leaves of the binary tree T to P, S=P

    1. while |S| > 1

    2. choose distinct region S1 and S2 such that argmin(S1,S2) {C(Ω,S1,S2)}

    3. S = S / {S1,S2} ∪ {S1 ∪ S2}

    4. add {S1 ∪ S2} to T as parent of S1 and S2

  3. output: T

The binary tree produced by the algorithm has a height which is O(|P|) in the worst case. Finding the segments to merge can be implemented with a adaptable priority queue which stores region pairs {S1,S2} with the merging criterion C(Ω,S1,S2) as a key. Each time through the loop, the minimum cost pair is removed and the remaining at most O(|S|) region pairs involving S1 or S2 are updated.

We use a merging criterion C(Ω,S1,S2) which is a weighted sum of merging costs with the weight vector Ω. As previously stated, our criterion weighs region size, combined boundary length, color and luminance similarity in the region. It also effectively contains a binary multiplicative term which is one if two segments are neighbors and zero otherwise. The cost of merging due to color and luminance within a merged region is determined by the increase in the standard deviation of a Gaussian distribution. The spatial compactness of the boundary is measured by the length of the common boundary between two regions plus the boundary length of the region which would be formed by merging. The region size criteria prefers equally sized regions to be merged according to a beta distribution. The result of region merging is a hierarchical binary partition of the image pixels into regions. In our vectorization method during the initial segmentation, we stop merging once a user-defined number of segments are left in the priority queue. These segments are the leaves for the second application of the region merging technique and serve as the starting point for the vectorization. More details on our region merging approach can be found in [MacDonald08].

Below we present our method to refine region boundaries using the active contour framework[KassWitkinTerzopoulos88]. The purpose of this process is to obtain better localization of the segment boundaries than is possible by using a pixel-based segmentation algorithm alone. Our region refinement has three components: a pixel-based refinement, an active contour based boundary location method and a boundary simplification step. The pixel-based refinement is a preprocessing step which reassigns pixels on region boundaries in the segmentation produced by region merging. This minimizes artifacts created by the sequential nature of the region merging algorithm. After preprocessing, alternating steps of active contour localization and simplification are performed. After the refinement has converged, it is expected that the boundaries will represent a better fit to the data (though we do not quantify this).

Our method localizes the boundaries with active contours. Active contours minimize an energy function based on internal energy, governed by regularity of the contour shape, and external energy and balanced with the fit to the image data. The energy associated with an active contour C(q) [KassWitkinTerzopoulos88] is:

E = ∫ (α |C'(q)|2 + β |C''(q)|2) dq + γ Eext(C(q))

The first two terms determine the contour's tension and rigidity and together these terms define the internal energy of the contour. The term Eext is the external energy which anchors the contour in the image. We employ a novel external energy term derived from competitive forces exerted by adjacent regions on a contour point.

In our implementation the external energy is governed by competing region forces based on the region competition principle [Rosin98]. The basis of this principle is that at the precise boundary of two (true) regions, the domain will be equally representative of each. If there is an imbalance we adjust the boundary in the direction toward one region and the domain will become increasingly akin to the property of that region and less so to the other adjacent regions. The region competition principle allows us to localize boundaries even when the gradient between regions is very gradual or noisy. Regions exert an outward force on the contour proportional to the similarity in colour between the mean of the region estimate and the image data along the contour. The outward force is zero at a position where the image data is equally representative of all incident regions. When computing region forces, image data is convolved with a small Gaussian filter. Smoothing provides robustness to noise and allows contours to be located with sub-pixel precision among mixed pixels. Moreover, the localized Gaussian integration provides improved interpolation of the neighbourhood compared to discrete circular windows used by previous methods since it approximates a narrowly-windowed sinc function.

The active contour step adjusts the contour or boundary between regions. The representation of the boundary is not changed, i.e., the number of edges for a boundary remains fixed. The boundary simplification step performs edge collapses to adjust the representation for the smoother boundary produced by the active contours. We adopt the quadric-based simplification method by Garland and Heckbert [GarlandHeckbert97] to decide if a edge may be collapsed without introducing a large amount of approximation error. The approximation error threshold controls the details of the contour and in our application the path complexity of the produced SVG image. We are satisfied with the results obtained by performing two passes of active contours followed by boundary simplification.

As stated above, one of the problems with computationally identifying salient objects in imagery is the problem of scale. Objects may be composed of smaller objects, and moreover, may not be fully distinct from the objects around them. Any vectorization algorithm should be sensitive to the scale at which an object appears and should be able to adjust the level-of-detail in the conversion. Our hierarchical vectorization goes beyond adjusting the level-of-detail in the conversion by encoding all level-of-detail in the image, simultaneously. This enables to adapt the level-of-detail during the display of the image rather than during conversion. Adaptive level of detail based on display resolution is a feature being considered for SVG 1.2.

HSVGen encodes multiple levels of detail simultaneously by including a supplemental 'detail' attribute within SVG's group element. The detail attribute takes a value between 0.0 and 1.0. When the SVG is rendered, a detail filter can be specified such that a group will be drawn if its detail attribute is smaller than or equal to the detail filter. Thus, at '0.0' only the root of the hierarchy is drawn, at a low detail value most basic shapes are rendered and at '1.0' all shapes are displayed. For hierarchical SVGs generated from raster images, level 1.0 corresponds to the level at which refinement takes place; navigation to a higher detail than this is not possible.

Note that this format redundantly encodes some of the segment contours. However, it allows less important segment contours to be filtered out and also for high-level segments to be rendered with the appropriate interior colour. Note that the detail attribute will be ignored by non-hierarchical SVG viewers.

We show results for two types of images. One type are images with a small limited number of colors, e.g., clip art, traffic signs or other types of stylized graphics. Segmentation may seem trivial but since we typically encounter these images in a lossy compression format, they provide for an interesting comparison of ours to other methods. These images demonstrate mainly the ability of our subpixel active contour method. The other set of results which we present are for images which are photographs of natural scenes. In these natural scenes, the number of objects clearly depends on scale and these results demonstrate the advantages of our hierarchical representation.

We present the conversion of two clip art images. The first set of results uses a clip art showing an elk and demonstrates the level-of-detail feature of our hierarchical conversion. In this figure, the vectorization result is given as a hierachical SVG which can be explored at this link. We also converted the same clip art image to SVG with various fixed numbers of segments (see here). In the case of the example clip art of the elk, our conversion produces SVG images at various useful level-of-detail and the refinement arguably follows a viewers expectation.

The second example is the butterfly shown below. We reduced the size of the original raster image by downsampling with the Lanczos sinc filter in Gimp. We converted the downsampled version to SVG with our HSVGen, with Ravegrid and with Potrace/Inkscape and then render the resulting SVG image in the original resolution again. Below is the result for HSVGen, Ravegrid and Potrace/Inkscape, respectively. Visually, the result for HSVGen are smooth while producing no holes in the images. Ravegrid locates boundaries only to pixel resolution while the Potrace/Inkscape approach leaves holes between regions. The Peak Signal to Noise Ratio (PSNR) corresponding to this test (see table) confirms this observation.

Original butterfly clip art.
Downsampled butterfly clip art.

Original raster image on the left (Image by lemmling available from Open Clip Art Library (in public domain); downsampled image on right (Lanczos filter in Gimp).

Figure 4. Butterfly Clip Art

We choose to characterize the performance of HSVGen with photographs based on three images from the Berkley Segmentation Dataset. A hierarchical SVG image generated by HSVGen demonstrates the usefulness of hierarchies (see the result converted back to a raster format at 220 segments and the hierarchy). We use image #108082 (tiger) to visually compare the result between Ravegrid and HSVGen, as well as tabulate the PSNR. Note that Battito and Puglisi [BattitoPuglisi07] produced visually very pleasing results for the same image but we did not compare them with our results because the SVG images did not align. The visual comparison and the PSNR table support our method. We show the utility of level-of-detail in our conversion method with the example image #102061 (castle). For comparison, we show the three level-of-details which Ravegrid provides to a user. Clearly, the wider range of level-of-details may be useful in various application ranging from background subtraction to adaptive rendering.

This paper argues for the use of hierarchical vectorization of raster images. We encode the result of the vectorization in a hierarchical SVG image which can be displayed with an adaptive level of resolution. The resolution of objects in a hierarchical SVG image may selectively be refined in visualization and analysis applications. All that is needed is a suitable script to select the corresponding group of paths in the SVG image. We also put forward a method for such a hierarchical vectorization based on a hierarchical image segmentation algorithm. We employ region merging to build the hierarchy but we also use active contours to locate region boundary to sub-pixel precision. The length of the path corresponding to the boundary of an individual region is simplified to the desired level by edge collapses. The results of our HSVGen approach compare well with existing non-hierarchical segmentation methods.