RaveGrid: Raster to Vector Graphics for Image Data

Keywords: raster to vector, image segmentation, trixel, Delaunay triangulation, polygonization, SVG, perceptual grouping

Dr. Sriram Swaminarayan
Technical Staff Member
Los Alamos National Laboratory
P. O. Box 1663, Bikini Atoll Road
Los Alamos
New Mexico
U. S. A.


Sriram Swaminarayan received a dual Ph.D. in Materials Science and Scientific Computing from the University of Michigan in 1995, and a Bachelor of Technology from the Indian Institute of Technology, Bombay, India in 1989. He is currently with the Continuum Dynamics group at Los Alamos National Laboratory. His research interests include advanced algorithms and programming methods for high performance computing on advanced architectures, simulation, and image analysis.

Dr. Lakshman Prasad
Technical Staff Member
Los Alamos National Laboratory
P. O. Box 1663, Bikini Atoll Road
Los Alamos
New Mexico
U. S. A.


Lakshman Prasad received his M.S. degree in Mathematics in 1986 from the Indian Institute of Technology, Kanpur, India, and his Ph.D. in Computer Science in 1995 from Louisiana State University, Baton Rouge, U. S. A. He is currently with the Space and Remote Sensing group at Los Alamos National Laboratory. His research interests are in the areas of digital signal and image processing, image understanding, artificial perception, computational geometry, wavelet transforms and multiscale methods.


We introduce RaveGrid, software that rapidly converts large, complex raster images composed of pixels into vector images composed of polygons of varying colors, shapes, and sizes representing image features. The vector representations typically have smaller file sizes, can be scaled arbitrarily, and can be used to identify and tag objects in images based on the polygons constituting them. RaveGrid reduces millions of pixels to a few thousand polygons, typically resulting in significant data compression. Polygons are easily scaled without changing file size. RaveGrid uses only a small subset of image pixels, namely the edge pixels separating image features, to construct polygons by algorithms that emulate human visual perception. This not only enables rapid computation but also yields visually meaningful polygons. In addition to significant data reduction, RaveGrid serves as a framework for extracting complex features by grouping polygons based on structural and spatial attributes computed from the triangles comprising them, and for developing automated feature detection and recognition algorithms. Indeed, the many digital images produced by devices ranging from simple point-and-shoot cameras to high-resolution image sensors on commercial and surveillance satellites have created a rapidly growing need to identify the objects in a digital image. Large-scale survey and mapping of terrain for developing geographic information systems (GISs), and content-based searching or blocking of images on the Internet, are just a few applications that require object identification. RaveGrid can be used to obtain geometric information about the structure of an image that can be easily combined in new ways to answer various queries about image content. The potential to address new queries with minimal recomputation on an image is important for developing the query-driven image search-and-retrieval engines of the future for the Internet and large databases.
The underlying triangular composition of the polygons produced by RaveGrid lets the software readily compute a large and richly informative set of feature attributes, such as relative size, color variability, linearity, shape structure, and neighborhood connectivity, that enables one to efficiently answer multiple queries about image content.

Table of Contents

1. Introduction
2. Background
3. Efficient Image Vectorization
4. Multiscale vectorization and salient feature extraction
     4.1 Grain agglomeration
5. Conclusion
6. References

1. Introduction

Today, images are increasingly shared across computers, cell phones, TVs, and Web pages, requiring compact file sizes for transmission and storage economy, quick and easy scaling to the resolutions of various device screens, graphical and textual representations for incorporation into multimedia Web pages, and automated object identification for content-based search. Raster bitmap images, made up of pixels, are limiting in that file sizes increase with image size, taxing storage and bandwidth resources. Scaling or rotating raster images requires interpolation, taxing the computing resources of small devices. They cannot be embedded or described economically in a Web-friendly document format such as XML or SVG. Moreover, pixel-based representation is too primitive for describing content. Vector images, on the other hand, can be described via editable text files, such as SVG, that can be seamlessly included in multimedia Web documents. Vector images can be specified in terms of graphic and geometric primitives such as points, lines, curves, polygons, fills, and gradients. Apart from scalability, this affords a much higher level of description of images than is possible with pixel-based raster images. In particular, it is desirable to be able to automatically parse a raster image into geometric primitives, each of which is structurally and spectrally adapted to the visual information underlying the image region it represents at the desired level of detail. This would enable the creation of a rich vocabulary of descriptors, such as shapes attributed with color information, adapted to image content. Such descriptors would then provide a means of characterizing content in terms of "keyshapes" for image search and retrieval. The ubiquity and quantity of imagery in media and on the Web necessitate that such automated transcriptions of raster imagery to rich vector formats be performed efficiently and economically.
Further, such procedures are most useful when they can flexibly adapt to the computational needs of various applications. In this paper, we present the vectorization software RaveGrid (Raster to Vector Graphics for Image Data), which is aimed at meeting these challenges.
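Each color-attributed polygon maps directly to an SVG polygon element, which is what makes such vector descriptions Web-friendly. The following minimal writer is an illustrative sketch only; the polygons_to_svg function and its input layout are our own assumptions, not RaveGrid's actual output code.

```python
# Minimal sketch: serialize color-attributed polygons as an SVG document.
# The data layout (list of (points, color) pairs) is an assumption made
# here for illustration; RaveGrid's internal representation may differ.
def polygons_to_svg(polygons, width, height):
    """polygons: list of (points, (R, G, B)) with points as (x, y) pairs."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for points, (r, g, b) in polygons:
        pts = " ".join(f"{x},{y}" for x, y in points)
        parts.append(f'<polygon points="{pts}" fill="rgb({r},{g},{b})"/>')
    parts.append("</svg>")
    return "\n".join(parts)

# One red triangle on a 64x64 canvas
svg = polygons_to_svg([([(0, 0), (40, 0), (20, 30)], (180, 20, 20))], 64, 64)
print(svg)
```

Because the output is plain text, it can be embedded directly in an XHTML page or compressed with gzip, as RaveGrid does for its SVG output.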

2. Background

Recently, several tools have emerged for vectorizing raster images for the purposes of further manipulation or rendering, notably in the area of Web graphics design (http://en.wikipedia.org/wiki/List_of_raster_to_vector_conversion_software). Most of these are user-interactive tools directed at the graphics designer for creating an artistic vector image from scratch or from a reference raster image. This is often labor-intensive and not amenable to converting large, complex imagery, such as satellite images, or large volumes of imagery into vector format. Other vectorization software [3,4] uses image segmentation to obtain regions of uniform pixel values, and then fits curves or polygons to the boundaries of these regions to obtain a vector description. Typically, image segmentation algorithms are computationally intensive, as they process all pixels in an image and assign them new color values based on neighborhood and image statistics [6]. This also limits the size of images that can be segmented in a reasonable time. Thus, most current image vectorization software can handle only modest-size images that are not too complex.

In earlier work [1-3] we introduced VISTA (Vectorized Image Segmentation via Trixel Agglomeration), a radically new method for efficiently decomposing images in terms of visually meaningful feature primitives. VISTA exploits both regional and edge cues in an image to obtain an efficient, data-adaptive representation using polygons instead of pixels. The essential idea behind this approach is the utilization of a sparse but salient subset of pixels in an image, namely edge pixels. These pixels, situated at feature boundaries, act as anchor points for a Delaunay triangulation of the image. The triangulation serves as an image-adaptive grid from which spatial relationships between feature boundaries can be computed. Based on these relationships, the triangles are then selectively merged to obtain polygons by means of grouping criteria that are modeled after perceptual organization principles, such as proximity, continuity, etc., long observed empirically in the workings of human vision. Figures 1-8 illustrate our vectorization process.

VISTA provides a flexible algorithmic framework for image vectorization at multiple resolutions and based on a wide range of criteria for merging trixels and polygons. Furthermore, the data reduction achieved by computing only with edge pixels instead of all image pixels for polygonal decomposition assures superior efficiency over other approaches.
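As a sketch of the triangulation stage described above, the following triangulates only the edge pixels of a toy edge map. Note a hedge: SciPy's Delaunay is unconstrained, whereas VISTA uses a constrained Delaunay triangulation that keeps edge chains intact, so this is only an approximation of that step.

```python
# Sketch of the edge-pixel triangulation step, assuming a binary edge map
# (from any edge detector) as input. scipy.spatial.Delaunay is
# *unconstrained*; VISTA's actual constrained triangulation additionally
# forces edge chains to appear as triangle sides.
import numpy as np
from scipy.spatial import Delaunay

def triangulate_edge_pixels(edge_map):
    """Triangulate only the edge pixels of an image, not all pixels."""
    ys, xs = np.nonzero(edge_map)          # sparse subset: edge pixels only
    points = np.column_stack([xs, ys])     # (x, y) anchor points
    return points, Delaunay(points)

# Toy edge map: the outline of a small square (16 edge pixels)
edges = np.zeros((8, 8), dtype=bool)
edges[1, 1:6] = edges[5, 1:6] = True
edges[1:6, 1] = edges[1:6, 5] = True
pts, tri = triangulate_edge_pixels(edges)
print(len(pts), "edge pixels,", len(tri.simplices), "triangles")
```

The data reduction is visible even here: the 64-pixel image is represented by 16 anchor points, and real images reduce far more dramatically.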


A raster color/grayscale image comprised of pixels is input to RaveGrid.

Figure 1: Raster image


An edge detection algorithm is applied, and chains of contiguous edge pixels yield an initial, incomplete set of feature boundaries (shown in maroon).

Figure 2: Low detail edges


The vertices of the Delaunay triangles (shown with gray edges) are anchored at the vertices of the contour chains (maroon) and do not cross contour chains.

Figure 3: Constrained Delaunay triangulation of edge points


A sparse sampling of triangles is performed to obtain average pixel color of the corresponding trixels.

Figure 4: Sampling triangle colors to obtain trixels (color attributed triangles).


A trixel grouping graph is constructed, wherein the vertices represent trixels and edges join adjacent trixels that are to be merged according to the composite criterion of all perceptual filters. Connected components of this grouping graph correspond to polygonal regions of the vectorized image. The polygons are obtained by merging the trixels in each connected component. Each polygon is attributed the area-weighted mean color of its constituent trixels.

Figure 5: Perceptual grouping of trixels into polygons
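The grouping step above reduces to connected components on the trixel graph, which can be sketched with a simple union-find. The adjacency list and merge votes below are toy placeholders, not RaveGrid's actual perceptual filters.

```python
# Minimal sketch of trixel grouping: vertices are trixel ids, edges join
# adjacent trixels that the perceptual filters vote to merge, and each
# connected component becomes one polygon.
def group_trixels(n_trixels, merge_edges):
    """Return a polygon label per trixel via union-find components."""
    parent = list(range(n_trixels))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for a, b in merge_edges:                # edges passing all filters
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    comp = [find(i) for i in range(n_trixels)]
    labels = {}                             # relabel components 0..k-1
    return [labels.setdefault(c, len(labels)) for c in comp]

# Six trixels; filters vote to merge {0,1,2} and {3,4}; trixel 5 stays alone.
print(group_trixels(6, [(0, 1), (1, 2), (3, 4)]))  # → [0, 0, 0, 1, 1, 2]
```

In the full algorithm each resulting component is then traced to extract its outer polygon contour.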


The resulting polygons with their attributed colors define a vector representation of the original raster image.

Figure 6: Polygonization from low level edges


Vectorization obtained from a medium level of edge detail

Figure 7: Polygonization from medium level edges


Vectorization obtained from a high level of edge detail

Figure 8: Polygonization from high level edges

3. Efficient Image Vectorization

We have recently developed RaveGrid, software that can rapidly convert large, complex raster images into vector images. RaveGrid, based on VISTA, runs an order of magnitude faster than its closest competitor, Vector Eye [4]. Additionally, RaveGrid scales linearly with image size and can handle large and highly textured images (Figure 9). RaveGrid vectorizes images at an average rate of 0.5 megapixels per second on a 2.13 GHz Pentium laptop with 2 gigabytes of RAM. Currently, RaveGrid produces vector images in SVG and EPS formats with gzip compression. RaveGrid can also vectorize several images in batch mode. A limited-functionality, non-commercial version of RaveGrid is available for evaluation at http://www.lanl.gov/software/RaveGrid/. RaveGrid can compute vector images at any given sensitivity or scale of edge detection; accordingly, the vector images are finer or coarser representations of the original raster image, with larger or smaller file sizes, respectively.


Empirical tests show better-than-linear scaling of RaveGrid's performance with image size

Figure 9: Performance scaling of RaveGrid with image size

4. Multiscale vectorization and salient feature extraction

In addition to controlling the level of detail and scale of vectorization by the sensitivity and scale of edge detection, RaveGrid offers control of the granularity of vectorization by allowing selective merging of polygons based on their shape, size, color, and adjacencies at multiple scales. This is very useful for "discovering" features at various saliencies in images for image search and retrieval applications. Since the polygons are comprised of triangles, each comes equipped with an adaptive grid for computing shape and structural properties, ranging from area and perimeter to more sophisticated attributes such as skeletons and parts. These descriptors can be used to decide which shapes are interesting, salient, or meaningless. They can also be used to decide whether two polygons are related or whether they should be merged into a larger polygon. This makes RaveGrid a flexible and open-ended tool for automated feature extraction. Elsewhere [5-8], we have developed methods for characterizing shapes via their skeletons and parts using Delaunay triangulations of shape interiors. The triangles constituting polygons can then be used to characterize their shapes for query and comparison with a minimum of recomputation. The description of these methods is outside the scope of this paper, and we refer the interested reader to the aforementioned papers. In this paper we restrict ourselves to describing and illustrating a simple technique of growing polygons to reflect salient structures in vector images produced by RaveGrid.
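Because each polygon is stored as a set of triangles, basic attributes fall out of simple sums, as the following sketch shows. The data layout (each triangle carrying its three vertices and a mean color) and the function names are illustrative assumptions, not RaveGrid's API.

```python
# Polygon attributes from constituent triangles: area is a sum of triangle
# areas, and the polygon color is the area-weighted mean of trixel colors.
def triangle_area(p, q, r):
    """Unsigned triangle area via the cross product of two edge vectors."""
    return abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2.0

def polygon_attributes(triangles):
    """triangles: list of ((p, q, r), (R, G, B)) trixels."""
    areas = [triangle_area(*verts) for verts, _ in triangles]
    total = sum(areas)
    # area-weighted mean color, as used when attributing merged polygons
    color = tuple(
        sum(a * c[i] for a, (_, c) in zip(areas, triangles)) / total
        for i in range(3)
    )
    return total, color

# Two trixels tiling a 4x4 square, one darker red and one brighter red
tris = [(((0, 0), (4, 0), (0, 4)), (100, 0, 0)),
        (((4, 0), (4, 4), (0, 4)), (200, 0, 0))]
area, color = polygon_attributes(tris)
print(area, color)  # → 16.0 (150.0, 0.0, 0.0)
```

More elaborate descriptors (perimeter, skeletons, parts) are likewise computed over the same triangle grid, which is why they cost little extra to obtain.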

4.1 Grain agglomeration

Real-world raster images can contain texture and subtle or noisy variations in neighboring pixel values that might suggest irrelevant or false boundaries to most segmentation and edge detection algorithms. In general, it is very hard to specify just the right tolerances, thresholds, or denoising to eliminate these unwanted artifacts. The presence of such artifacts tends to yield fragmentation or oversegmentation of images, obscuring the important underlying features. Rather than filter edges for these artifacts, we opt to obliterate such nonsalient boundaries by merging polygons whose sizes (areas) fall below a threshold with the neighboring polygons that are spectrally closest. The rationale behind this strategy is twofold. First, polygons below a certain size may be useless or unlikely candidates for adequately representing a shape of interest. For example, it is hard to say for sure whether a polygon corresponds to an entire car if its area is about four pixels. On the other hand, it is likely that a small, semantically insignificant polygon is a fragment due to oversegmentation, or a textural element. Second, nonsalient boundaries often occur due to gradations in shade across a region of pixels of roughly the same hue. So it is reasonable to merge a small polygon with a neighboring polygon that is closest in color. In the event that there is no neighboring polygon close enough in color to justify merging, the polygon remains isolated, being salient for the reason that it stands out among its immediate neighbors, and hence possibly of interest. This strategy can be pursued at multiple levels of grain size to obtain a hierarchical, multiscale clustering of fine to coarse polygons of increasing saliency in size and shape.

In general, it is not clear at what granularity an image will yield salient or interesting objects. Indeed, most real-world images contain meaningful features at multiple scales. It is therefore unrealistic to expect a single segmentation of the image to cater to discovering objects and features at multiple scales. At the same time, it is desirable to have a method for estimating the 'natural' granularities of an image whose homogenization through polygon fusion would precipitate interesting features. We propose a method of non-parametric hierarchical segmentation via grain agglomeration by estimating the dominant granularity at any resolution of polygonization. For this we employ a recently developed method for optimal histogram bin selection due to Shimazaki and Shinomoto [9]. Using their method, we can plot the size distribution of polygons, choose the largest peak as representing the dominant granularity, and merge polygons of area less than or equal to the upper edge of the dominant bin with the adjacent polygons that are least expensive to merge with; that is, with polygons that are closest in color/intensity and are not separated by a salient edge from the polygon to be merged. Edge saliency can be determined at the edge detection stage based on the length and strength of the edges obtained, as well as by attributing triangle edges that are retained after grouping with a saliency measure based on the saliencies of the image edges they connect. The resulting set of polygons can then be subjected to a similar analysis of dominant granularity and subsequent merging, thus yielding a hierarchy of polygonal decompositions that could contain structurally salient features at multiple scales. This results in a forest of shape trees, which can then be mined for meaningful shapes. We sketch the key steps of this method below:

Basic Algorithm

Step 1: Compute polygons from fine scale edges

Step 2: Compute optimal histogram of polygon areas

Step 3: Select the location of the biggest mode of the histogram as the dominant granularity of the vector image

Step 4: Merge polygons with areas less than or equal to the dominant granularity with the adjacent polygon of minimal distance (e.g., color, intensity, edge saliency cost)

Step 5: Repeat steps 2 through 4 until number of polygons converges or desired result is obtained
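The core of Steps 2 and 3, the Shimazaki-Shinomoto bin-width rule [9] applied to polygon areas, can be sketched as follows. The candidate-bin sweep and the toy area distribution are our own illustrative choices; only the cost function C = (2k - v) / width^2, where k and v are the mean and (biased) variance of the bin counts, comes from [9].

```python
# Sketch of Steps 2-3: pick the histogram binning that minimizes the
# Shimazaki-Shinomoto cost, then report the upper edge of the fullest bin
# as the dominant granularity. Candidate range is an illustrative choice.
import numpy as np

def dominant_granularity(areas, n_candidates=50):
    areas = np.asarray(areas, dtype=float)
    span = areas.max() - areas.min()
    best = (np.inf, None, None)
    for nbins in range(2, n_candidates + 2):
        width = span / nbins
        counts, edges = np.histogram(areas, bins=nbins)
        k, v = counts.mean(), counts.var()     # mean and biased variance
        cost = (2 * k - v) / width**2          # Shimazaki-Shinomoto cost
        if cost < best[0]:
            best = (cost, counts, edges)
    _, counts, edges = best
    return edges[np.argmax(counts) + 1]        # upper edge of dominant bin

rng = np.random.default_rng(0)
# toy area distribution: many small fragments plus a few large regions
samples = np.concatenate([rng.normal(50, 10, 500), rng.normal(400, 40, 20)])
print(dominant_granularity(samples))
```

On such a distribution the threshold lands near the small-fragment mode, so Step 4 would fuse the fragments while leaving the large regions intact.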

The method of polygonization employed by RaveGrid lends itself particularly well to implementing this strategy in a very efficient manner with minimal recomputation. Indeed, because polygons are composed of triangles, computing the area of a polygon amounts to summing the areas of its constituent triangles. Next, the merging of two adjacent polygons is easily accomplished by regrouping the triangles of both polygons together and extracting their joint outer contour via the trixel grouping graph mentioned in Section 2. Finally, the proximity in hue of neighboring polygons is easily checked by comparing the color attributes of the individual polygons. Thus polygon agglomeration based on grain size and color can be performed rapidly and at multiple scales within the framework of RaveGrid, greatly increasing the chances of extracting salient and meaningful structures in polygonal decompositions. We illustrate below the synthesis of such salient polygons in vector images produced by RaveGrid.
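A toy sketch of one agglomeration pass (Step 4) follows: polygons below the area threshold are merged into their most similar-colored neighbor, and a small polygon with no spectrally close neighbor stays isolated, i.e., salient. The data layout and the color-distance cutoff are illustrative assumptions, not RaveGrid internals.

```python
# One grain-agglomeration pass over a toy polygon set. Each polygon is
# (area, (R, G, B)); adjacency maps polygon id -> neighboring ids.
def agglomerate(polys, adjacency, area_thresh, color_thresh=60.0):
    def dist(c1, c2):
        return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

    merged_into = {}
    for pid, (area, color) in sorted(polys.items()):
        if area > area_thresh:
            continue                          # large enough to keep
        nbrs = [n for n in adjacency[pid] if n not in merged_into]
        if not nbrs:
            continue
        best = min(nbrs, key=lambda n: dist(color, polys[n][1]))
        if dist(color, polys[best][1]) <= color_thresh:
            merged_into[pid] = best           # fuse: areas add up
            a2, c2 = polys[best]
            polys[best] = (a2 + area, c2)
    return merged_into                        # small polygon -> absorber

# One big red region, one small reddish fragment, one small blue outlier
polys = {0: (1000, (200, 30, 30)), 1: (12, (190, 40, 35)), 2: (9, (10, 10, 240))}
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
result = agglomerate(polys, adjacency, area_thresh=50)
print(result)  # → {1: 0}
```

Polygon 1 is absorbed by its similar-colored neighbor, while the blue polygon 2, despite its small size, survives as a salient outlier, mirroring the behavior described in Section 4.1.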


Figure 10: Raster image of cropfields (courtesy DigitalGlobe)


Figure 11: RaveGrid vector image of cropfields at medium edge detail


The bold blue outlines indicate the result of agglomerating polygons with areas of 500 pixels or less, the green outlines are the result of agglomerating polygons with areas of 50 pixels or less, and the thin maroon lines are the outlines of polygons in the original vectorization in Figure 11.

Figure 12: Multilevel Grain agglomeration showing polygonal features at different scales


Figure 13: Raster image of mare and foal


Figure 14: RaveGrid vector image of mare and foal at high edge detail


The bold blue outline indicates the result of agglomerating polygons with area of 120 pixels or less, and the thin maroon lines are the outlines of polygons in the original vectorization in Figure 14.

Figure 15: Grain agglomeration showing mare and foal as a salient object


Figure 16: Raster image of girls in costume


Low edge detail vectorization by RaveGrid

Figure 17: RaveGrid vector image at low edge detail of girls in costume


Figure 18: Grain agglomeration showing salient polygons outlined in blue


Figure 19: Raster image of elk


Figure 20: RaveGrid vector image of elk at high edge detail


Polygon corresponding to the elk is extracted by agglomerating polygons with area of 15000 pixels or less

Figure 21: Result of grain agglomeration


Figure 22: Raster image of bridge


Figure 23: RaveGrid vector image of bridge at high edge detail


Salient feature polygons extracted by agglomerating polygons with area of 200 pixels or less

Figure 24: Result of grain agglomeration


Figure 25: Raster image of steeple


Figure 26: RaveGrid vector image of steeple at medium edge detail


Steeple polygon extracted by agglomerating polygons with area of 700 pixels or less

Figure 27: Result of grain agglomeration


Figure 28: Raster image of jet


Figure 29: RaveGrid vector image of jet at high edge detail


Jet polygon extracted by agglomerating polygons with area of 700 pixels or less

Figure 30: Result of grain agglomeration

5. Conclusion

We have briefly described the advantages and features of RaveGrid in terms of efficiency and utility for image vectorization, as well as the potential use of vector images in extracting salient objects of interest for application in content-based image query, search, and retrieval. We illustrated the concept of polygon synthesis to mine images for objects of interest using the simple criteria of polygon size and color. More sophisticated criteria based on polygon shape and adjacency properties can be developed using shape analysis methods available today. Polygon agglomeration, along with rich attribution of polygons with structural and spectral descriptors, holds the potential of enabling the extensive machinery of text search algorithms to work in the context of images as well. This could then lead to image search engines that operate on raw databases of unannotated images and automatically create metadata for efficient tagging and retrieval.

6. References

  1. L. Prasad, A. Skourikhine, Raster to Vector Conversion of Images for Efficient SVG Representation, Proceedings of SVG Open 2005, Enschede, The Netherlands, August 2005.
  2. L. Prasad, A. N. Skourikhine, Vectorized Image Segmentation via Trixel Agglomeration, Pattern Recognition, vol. 39, issue 4, pp. 501-514, April 2006.
  3. L. Prasad, A. N. Skourikhine, VISTA: Vectorized Image Segmentation via Trixel Agglomeration, U.S. Patent 7,127,104, issued 2006.
  4. Vector Eye, Raster to Vector Converter, http://www.siame.com/index.html, 2003.
  5. L. Prasad, Morphological Analysis of Shapes, Center for Nonlinear Studies, Los Alamos National Laboratory, CNLS Newsletter #139, July 1997.
  6. L. Prasad, R. L. Rao, A Geometric Transform for Shape Feature Extraction, Proceedings of the SPIE, Vision Geometry IX, San Diego, CA, pp. 222-233, 2000.
  7. L. Prasad, A. Skourikhine, B. Schlei, Feature-based Syntactic and Metric Shape Recognition, Proc. SPIE, vol. 4117, Vision Geometry IX, 2000.
  8. L. Prasad, Rectification of the Chordal Axis Transform Skeleton and Criteria for Shape Decomposition, Image and Vision Computing, vol. 25, issue 10, pp. 1557-1571, October 2007.
  9. H. Shimazaki, S. Shinomoto, A Method for Selecting the Bin Size of a Time Histogram, Neural Computation, vol. 19, no. 6, pp. 1503-1527, 2007.
