Database Driven Web-enabled Public Health GIS

Using XHTML, SVG, ECMAScript, DOM and a Three-tier Architecture

Keywords: web-enabled, GIS, SVG, Public Health GIS

Gerald I. Cheves,
Ph.D. Candidate
New Jersey Institute of Technology
New Jersey


Before beginning the PhD program in Bio-medical Informatics at New Jersey Institute of Technology (NJIT), Gerald Cheves completed two master's degrees at Columbia University in New York City, one in Statistics and the other in Mathematics Education. Gerald's research focuses on web-enabled public health GIS models and public health GIS metadata. In addition to doctoral studies, Gerald also provides consultancy services in data management and quantitative reporting using Statistical Analysis Software (SAS) at the New York City Health and Hospitals Corporation, where he was an Associate Director for Hospital Information Systems prior to beginning work on the PhD at NJIT.

Jason T.L. Wang, Ph.D.
New Jersey Institute of Technology
New Jersey



A data-driven GIS model with user interactivity, made possible by the Document Object Model (DOM), JavaScript and Scalable Vector Graphics (SVG), allows flexibility in map design to reveal spatial patterns in public health and epidemiological data. The model makes it possible for users of varying levels of GIS and cartographic knowledge to visualize many different diseases on a thematic map. A balance is struck between automation and interactivity that provides sophistication in spatial analysis and ease-of-use. The range and choice of colors that are necessary for creating the shading schemes in a choropleth map and the ability to make symbols to represent themes in the map are available in SVG.

Table of Contents

Visualization and Public Health Areal Data
Limitations in Mapping and Analyzing Areal Data
Choosing Class Number and Interval
Using JavaScript to Calculate the Frequency Distribution
Proper Shading and SVG
SVG and Spatial Databases
Model Architecture and Implementation
Concluding Remarks


The web-enabled Public Health GIS model that we propose is one solution to the problem of how to visualize public health and epidemiological data for analysis by policy makers, epidemiologists, healthcare professionals, and emergency management personnel over the internet. Accessing and retrieving data over the World Wide Web (WWW) has been uniformly accepted because it has already become the backbone for most Geospatial Data Infrastructures, by virtue of providing a network among organizations and easily facilitating intra-organizational and inter-organizational transport of massive amounts of data. [KR] Underscoring the need for a model is the US government's initiative to create such a data infrastructure - The National Spatial Data Infrastructure (NSDI) - which would have to include geo-referenced public health data for timely response to unpropitious developments that could have a detrimental effect on public health.

Scalable Vector Graphics (SVG) and the Document Object Model (DOM) make it possible to render graphics, such as maps, dynamically in an interactive web site without the high memory overhead associated with the bitmap or raster images, such as JPEG, that are typically found on the WWW. [PL]

The well-documented interactive and dynamic capabilities of SVG and the DOM form the core of the model and make it possible to offer flexibility and choice in the user interface for visualizing areal data. Special consideration has to be given to the color scheme and the distribution of the data when visualizing areal data in a choropleth thematic map design. There are some inherent limitations in using the choropleth map design to display areal data, some of which can be mitigated with spatial data analysis techniques that aid in the selection of the most appropriate model.

We begin our discussion with problems in visualizing areal data and some solutions that utilize JavaScript and interactivity, followed by considerations of color and shading for optimal visual and analytic effect, and concluding with some thoughts on SVG and spatial databases.

Visualization and Public Health Areal Data

Displaying computer graphic images of multidimensional scientific data, such as public health data, for analysis is the essence of visualization. The computing methods, which are broadly interpreted as visualization, incorporate data collection, organization, modeling and representation, and are an outgrowth of statistical analysis. [PE] Visualization can also be described as the quantitative display of data, and has influenced all forms of data analysis, including cartographic data.

In the broad sense, irrespective of the media used to display the map, geographic visualization makes use of visual representations in order to reveal spatial patterns for the most effective use of visual human information processing. In MacEachren's [ME] three-dimensional map paradigm, visualization is associated with data exploration in an interactive environment that can be best characterized by the manipulation of class break points in the choropleth map design to bring spatial patterns in the data clearly into focus.

Because public health data is often aggregated to some sort of areal unit - usually a geo-political boundary such as census tract, zip code, community board, city, county, state, etc. - a thematic or statistical map is used to emphasize the spatial distribution of one or more geographic attributes or variables. The choropleth map, which is a type of thematic map that represents different magnitudes of a variable within the enumeration units (or data collection units), is appropriate for areal phenomena where there are distinctions among the data collection unit boundaries and when the focus is on the "typical" value for the individual collection unit. [SL]

Limitations in Mapping and Analyzing Areal Data

Two caveats must be considered when using the choropleth design:

1) An average value is assigned to the data collection unit, and variation within the unit is obscured.

2) The boundaries of the data collection units have been arbitrarily drawn and do not correlate with the actual changes in density in the distribution of the phenomena over a geographic region.

Seemingly apparent patterns revealed by area-valued data could be equally attributed to the underlying distribution of the data or the result of how the zones were chosen. One solution to this problem is to standardize data collection units with an equal-sized grid, but this option may not be feasible when the data does not provide geo-referencing at the street address level, or when the public health agency must conduct an analysis based on predetermined geo-political boundaries because of issues related to political constituencies.

Another approach is to standardize data by weighting the raw data counts by the size of the data collection units to adjust for differences in zone sizes. Weighting is a technique that figures prominently in methods used to determine the class intervals and/or number of categories to be represented by shading.
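The weighting described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the record fields (zip, cases, areaSqKm) and the sample values are invented for the example and are not taken from the model's database.

```javascript
// Hypothetical zone records; field names and values are invented for illustration.
const sampleZones = [
  { zip: "A", cases: 30, areaSqKm: 10 },
  { zip: "B", cases: 30, areaSqKm: 2 },
];

// Convert raw counts to densities (cases per square kilometre) so that a
// large zone does not stand out simply because it is large.
function casesPerUnitArea(zoneList) {
  return zoneList.map((z) => ({ zip: z.zip, density: z.cases / z.areaSqKm }));
}

console.log(casesPerUnitArea(sampleZones));
```

Both sample zones report the same raw count, but the smaller zone has five times the density, which is the difference the choropleth shading should reflect. The same function could divide by population instead of area if a rate is wanted.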

Choosing Class Number and Interval

The appearance of a pattern on a choropleth map also depends greatly on the number of shading categories and their class limits. The number of classes and the size of the class interval should be chosen according to the data distribution. A uniform distribution with a large number of zones requires a large number of classes, but no more than seven or eight, because the human eye cannot easily distinguish more shades than that. A smaller number of classes would be more appropriate for a small number of zones and a bimodal frequency distribution.

The size of the interval is dependent upon the frequency distribution of the data as well. A normal or Gaussian distribution would argue for class intervals based on standard deviation units, while equal and regular intervals would be more suitable for uniform distributions. J-shaped distributions are best suited to geometric progressions, and multimodal distributions should have class intervals that correspond to breaks in the frequency distribution. [UN]
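Three of the interval rules named above can be sketched as small JavaScript functions. The function names are hypothetical, each returns only the interior class limits, and the standard deviation version assumes four classes split at one standard deviation either side of the mean; other conventions are equally possible.

```javascript
// Interior class limits for k equal-width classes (uniform distributions).
function equalIntervalBreaks(values, k) {
  const min = Math.min(...values);
  const width = (Math.max(...values) - min) / k;
  return Array.from({ length: k - 1 }, (_, i) => min + (i + 1) * width);
}

// Limits at mean - 1 SD, mean, and mean + 1 SD (normal distributions).
// Uses the population standard deviation; this choice is an assumption.
function standardDeviationBreaks(values) {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const variance =
    values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
  const sd = Math.sqrt(variance);
  return [mean - sd, mean, mean + sd];
}

// Limits in geometric progression (J-shaped distributions).
// Requires a strictly positive minimum value.
function geometricBreaks(values, k) {
  const min = Math.min(...values);
  const ratio = Math.pow(Math.max(...values) / min, 1 / k);
  return Array.from({ length: k - 1 }, (_, i) => min * Math.pow(ratio, i + 1));
}
```

Breaks for multimodal distributions are not computed here, because they depend on a visual reading of where the gaps in the frequency distribution fall.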

Using JavaScript to Calculate the Frequency Distribution

JavaScript can be used to set the number of classes and the intervals, but only after a visual examination of the frequency distribution. The frequency distribution itself can also be computed in JavaScript and displayed in a pop-up window, after which the number of classes and the intervals can be entered into a form. Class intervals based on percentiles in a uniform distribution or standard deviations in a normal distribution can be chosen from a drop-down menu, for example.
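A frequency distribution of the zone values might be computed along the following lines before it is drawn for inspection. The function name and the shape of the returned objects are assumptions made for this sketch; the display step (pop-up window, form, drop-down menu) is left out.

```javascript
// Count how many zone values fall into each of binCount equal-width bins.
function frequencyDistribution(values, binCount) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Fall back to a width of 1 when every value is identical.
  const width = (max - min) / binCount || 1;
  const counts = new Array(binCount).fill(0);
  for (const v of values) {
    // The maximum value belongs to the last bin rather than a new one.
    const i = Math.min(Math.floor((v - min) / width), binCount - 1);
    counts[i] += 1;
  }
  return counts.map((count, i) => ({
    from: min + i * width,
    to: min + (i + 1) * width,
    count,
  }));
}

console.log(frequencyDistribution([1, 2, 2, 3, 9], 4));
```

The returned bins could then be rendered as bars so that the user can judge whether the distribution looks uniform, normal, J-shaped or multimodal before choosing an interval rule.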

After selecting the number of classes and the interval widths, it is important to know whether the selection was optimal. When data are grouped into classes, the true values of the zones are obscured because they are replaced by a limited number of shades. A map with no error would be one in which the shades match the zone data exactly, and the least accurate map would have only one shade. A spatial analysis tool called the error index [UN],

E = 1 - (fitted error / total error)

where

fitted error = the sum of the squared differences between the true value of each zone and the midpoint of the category to which the zone has been assigned, weighted by the proportion of the total mapped area lying in the zone, and

total error = the sum of the squared differences between each zone value and the overall data mean, weighted in the same way,

can be calculated with JavaScript to determine which model has the best fit to the data. Because a perfect fit gives a fitted error of zero, values of E closer to 1 indicate a better fit.
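One way the error index might be computed is sketched below. The input shapes are assumptions: each zone is an object with a value and an area, and classLimits is an ascending array of class boundaries, so that n classes need n + 1 limits. The overall mean is taken here as the simple arithmetic mean of the zone values, which is one reading of the definition.

```javascript
// Error index E = 1 - (fitted error / total error), area-weighted.
// Returns NaN when every zone has the same value (total error is zero).
function errorIndex(zones, classLimits) {
  const totalArea = zones.reduce((s, z) => s + z.area, 0);
  const mean = zones.reduce((s, z) => s + z.value, 0) / zones.length;
  let fitted = 0;
  let total = 0;
  for (const z of zones) {
    const weight = z.area / totalArea;
    // Find the class whose limits contain the zone value.
    let c = 0;
    while (c < classLimits.length - 2 && z.value >= classLimits[c + 1]) c += 1;
    const midpoint = (classLimits[c] + classLimits[c + 1]) / 2;
    fitted += weight * (z.value - midpoint) ** 2;
    total += weight * (z.value - mean) ** 2;
  }
  return 1 - fitted / total;
}

// Invented sample data: four zones of equal area.
const demoZones = [2, 8, 12, 18].map((value) => ({ value, area: 1 }));
console.log(errorIndex(demoZones, [0, 10, 20])); // two classes
console.log(errorIndex(demoZones, [0, 20]));     // one class
```

Running the function for each candidate classification and comparing the results gives a simple numerical basis for preferring one set of class limits over another.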

Proper Shading and SVG

Special care must be given to the selection of shades for displaying grades in a choropleth map. Human reaction to color and human visual perception have to be considered, as do the color limitations of computer screens. When used correctly, color can heighten the perception and meaning of spatial patterns in a map; used incorrectly, it simply causes confusion.

Human vision perceives three dimensions of color - hue, value and chroma. The sensation that allows us to classify colors as red, yellow, blue, green and so forth is the attribute known as hue. The value of the color describes the lightness or darkness. Chroma, sometimes referred to as saturation, is related to the intensity or brilliance of the color, or the amount of hue.

Certain color pairings can cause blur because of chromatic aberration, which means the eye focuses on different wavelengths at different distances within the eye. Blue and red are at almost opposite poles, and using these colors together will sometimes result in one or the other being out of focus. For example, blue text on a black background can be illegible if there is white or red text in close proximity. It is, therefore, advisable to use a diluted blue to make fine patterns. [WA]

Chromatic aberration can cause strong illusory depth effects that have been used to enhance presentations. Most people perceive red to be closer when both red and blue are superimposed on a black background. Thus, placing red or white lettering on a blue background will cause the lettering to appear to stand out for most people.

The computer is capable of displaying about half of the 4.5 million shades of color that are distinguishable by the human eye. Because some fluorescent colors have an ultraviolet component that is perceptible to the eye, but cannot be produced by the blue gun, they cannot be displayed by the computer. Metallic colors also cannot be displayed because their specular component gives the color an iridescent and reflective quality that is not easily produced on a computer screen. [CA] Many colors in the very nuanced green family cannot be displayed either.

A light to dark sequential color scheme is suggested, where increases in darkness correspond to increases in the presence of the variable under consideration. Using a single hue for conveying changes in magnitude of a graded series is one option. Another, and sometimes more effective scheme, is the hue-lightness scheme that uses two or more colors, such as a bright yellow to dark red progression. If the data is bipolar - having a natural or meaningful dividing point - a diverging scheme, in which two hues diverge away from a common light hue or neutral gray, is recommended. [SL] Percentage change in population is one example of where there would be a natural dividing point at zero.
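A diverging scheme for bipolar data, such as percentage change around zero, could be sketched as follows. The two end hues and the neutral gray are arbitrary choices made for this example, not colors recommended by the sources cited, and the blend is a plain linear interpolation in RGB.

```javascript
// Map a value with a natural dividing point at zero to an rgb() string.
// maxAbs is the magnitude at which the end hue is reached; values beyond
// it are clamped.
function divergingFill(value, maxAbs) {
  const neutral = [240, 240, 240]; // light gray at the dividing point
  const below = [94, 60, 153];     // arbitrary hue for negative values
  const above = [230, 97, 1];      // arbitrary hue for positive values
  const t = Math.max(-1, Math.min(1, value / maxAbs));
  const end = t < 0 ? below : above;
  const s = Math.abs(t);
  const [r, g, b] = neutral.map((c, k) => Math.round(c + s * (end[k] - c)));
  return `rgb(${r},${g},${b})`;
}

console.log(divergingFill(0, 10));   // neutral gray at the dividing point
console.log(divergingFill(-25, 10)); // clamped to the negative end hue
```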

Most public health data is probably best suited to the sequential scheme, but either sequential or diverging schemes can be managed easily with the SVG rgb() function. Pre-selection of the color scheme for each of the class numbers is probably best because proper selection of colors is a somewhat advanced topic and should not be left to a novice GIS user.
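As an illustration of how rgb() values for a single-hue sequential scheme might be produced, the sketch below interpolates between a light and a dark red. The default end colors are taken from the lightest and darkest fills in Figure 1; the function name and the linear interpolation are assumptions, and a hand-picked palette per class number may well be perceptually better.

```javascript
// Build classCount rgb() strings running from a light to a dark shade.
function sequentialRamp(classCount, light = [254, 206, 206], dark = [252, 0, 0]) {
  const colors = [];
  for (let i = 0; i < classCount; i += 1) {
    // t runs from 0 (lightest class) to 1 (darkest class).
    const t = classCount === 1 ? 1 : i / (classCount - 1);
    const [r, g, b] = light.map((c, k) => Math.round(c + t * (dark[k] - c)));
    colors.push(`rgb(${r},${g},${b})`);
  }
  return colors;
}

console.log(sequentialRamp(3));
// → ["rgb(254,206,206)", "rgb(253,103,103)", "rgb(252,0,0)"]
```

Each string can be used directly as the value of an SVG fill attribute or style property.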

<svg width="597" height="100">
<path style="fill:#fcfefe" d="M597 0 l0 42 l-597 0 l0 -42 l597 0 z"/>
<path style="fill:#040201" d="M4 4 c194.5 0.6667 390.5 -1.333 584 1 l-0.25 29.75 c-194.4 0.4167 -389.1 0.4167 -583.5 0 l-0.25 -30.75 z"/>
<path style="fill:#fecece" d="M5 5 l50.75 0.25 c0.4166 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fecfce" d="M58 5 l50.75 0.25 c0.4166 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fecece" d="M111 5 l50.75 0.25 c0.4166 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fe9e9f" d="M164 5 l50.75 0.25 c0.4166 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fe9e9e" d="M217 5 l50.75 0.25 c0.4167 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fe6d6d" d="M270 5 l51.75 0.25 c0.4167 9.405 0.4166 19.09 0 28.5 c-17.07 0.4167 -34.43 0.4167 -51.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fe6e6d" d="M324 5 l50.75 0.25 c0.4167 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fe2d2d" d="M377 5 l50.75 0.25 c0.4167 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fe2d2d" d="M430 5 l50.75 0.25 c0.4167 9.405 0.4166 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fc0000" d="M483 5 l50.75 0.25 c0.4166 9.405 0.4167 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
<path style="fill:#fc0000" d="M536 5 l50.75 0.25 c0.4166 9.405 0.4167 19.09 0 28.5 c-16.74 0.4167 -33.76 0.4167 -50.5 0 l-0.25 -28.75 z"/>
</svg>

Figure 1: Example of the Sequential Shading Scheme With a Single Hue

SVG and Spatial Databases

In this particular instance, the spatial database is a relational model consisting of one table that contains the zip code for each of the data collection units, disease data, and other variables such as population and average income. The variables, which are just plain numbers and a unique identifier (the zip code), are in this context also geographic objects: they are spatially referenced to the data collection units and are descriptive attributes of those units, containing information that describes the characteristics of the population in each zone.

In a thematic map design, these attributes are usually displayed as symbols, which are graphical files stored in the database, sometimes as binary large objects or BLOBs. In this model, the SVG program becomes an extension of the spatial database, where the symbolic representation of the data is stored.

Two advantages are a streamlined database that can actually be stored in an XML file instead of a DBMS (size permitting), and greater flexibility in making changes to the symbology. The database is not changed, just the code in the SVG program (and possibly the JavaScript program) that translates the data into the symbols that are displayed on the map.

If the code is written in an object-oriented, modular fashion, new symbols can be constructed that inherit properties in the same way that instances of a programming object can inherit properties from a class of objects. The model extends the relational spatial database and adds some object-oriented features.
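The idea of symbols that inherit properties can be illustrated with JavaScript classes. The class names MapSymbol and SquareSymbol are hypothetical and do not come from the model; the sketch only shows a derived symbol reusing the fill and stroke handling of its parent while drawing a different shape.

```javascript
// Base symbol: holds the shared presentation properties.
class MapSymbol {
  constructor(fill) {
    this.fill = fill;
    this.stroke = "rgb(0,0,0)";
  }
  // Shared SVG presentation attributes, inherited by every subclass.
  attributes() {
    return `fill="${this.fill}" stroke="${this.stroke}"`;
  }
  // Default shape: a small circle centred on (x, y).
  toSvg(x, y) {
    return `<circle cx="${x}" cy="${y}" r="5" ${this.attributes()}/>`;
  }
}

// Derived symbol: inherits fill and stroke, overrides only the shape.
class SquareSymbol extends MapSymbol {
  constructor(fill, size) {
    super(fill);
    this.size = size;
  }
  toSvg(x, y) {
    const half = this.size / 2;
    return `<rect x="${x - half}" y="${y - half}" width="${this.size}" ` +
      `height="${this.size}" ${this.attributes()}/>`;
  }
}

console.log(new SquareSymbol("rgb(252,0,0)", 10).toSvg(20, 20));
```

A change to the base class, such as a different stroke color, would then carry through to every derived symbol without any change to the database.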

Model Architecture and Implementation

The programming logic in the middle tier is quite simple and just executes a query on a disease. After that, an array containing the value for each zone, along with input from the user interface, determines how the JavaScript program will make calculations for each class interval before it shades the SVG MapObject that is embedded in the XHTML page. The complexity lies in the programming logic of the JavaScript program because of the flexibility that the user is afforded in choosing the number of classes and the interval width of each class.
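The client-side step, from the array of zone values to a fill for each zone, might look roughly like the sketch below. The function names, the shape of the zone values object, and the assumption that each zone's SVG element carries an id derived from its zip code are all hypothetical; the DOM calls are shown only in a comment so that the logic can be run outside a browser.

```javascript
// Return the class (0-based) that a value falls into, given ascending
// interior class limits. A value equal to a limit goes to the upper class.
function classIndex(value, interiorLimits) {
  let i = 0;
  while (i < interiorLimits.length && value >= interiorLimits[i]) i += 1;
  return i;
}

// Build a lookup from zone identifier to fill color.
function fillsByZone(zoneValues, interiorLimits, palette) {
  const fills = {};
  for (const [zoneId, value] of Object.entries(zoneValues)) {
    fills[zoneId] = palette[classIndex(value, interiorLimits)];
  }
  return fills;
}

// Invented sample values for three zones and a three-class palette.
const demoFills = fillsByZone(
  { zoneA: 1, zoneB: 5, zoneC: 12 },
  [5, 10],
  ["rgb(254,206,206)", "rgb(254,109,109)", "rgb(252,0,0)"]
);
console.log(demoFills);

// In a browser, assuming each zone shape has a matching id, the fills
// could be applied through the DOM:
//   for (const [zoneId, fill] of Object.entries(demoFills)) {
//     document.getElementById(zoneId).setAttribute("fill", fill);
//   }
```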

[Figure: three stacked tiers. Client, containing the JavaScript Object with an arrow to the Map Object; Web Server, containing the Web Application Code; Database Server, containing the DBMS. Connector lines join the Client to the Web Server and the Web Application Code to the DBMS.]

Figure 2: Model Architecture

[Figure: the MapFill Object is determined by the User Selection, which branches into four interval methods, each with a note beneath it: Standard Deviations (width and number predetermined); Uniform Intervals (width determined by number of intervals); a third method whose label did not survive extraction (width determined by quantile selection); Breaks in Distribution (width and number determined by user).]

Figure 3: JavaScript Programming Logic

Concluding Remarks

Of the map design elements that must be included, the legend is the most important in a thematic map. Providing a clear explanation of what each symbol associated with a theme represents is the purpose of the legend. There are a couple of simple guidelines to follow in legend design: Place text immediately to the right of the symbol for an obvious association, and position symbols that represent largest values at the highest positions in a descending arrangement because higher is perceived as larger.

The legend has to be generated dynamically when the number of classes and intervals are selected. This operation requires a number of shaded rectangles equal to the number of classes depicted on the map to be drawn and filled with SVG, along with labels for each rectangle indicating the range of values in the interval.
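Dynamic legend generation could be sketched as a function that returns SVG markup for the shaded rectangles and their labels. The function name, the layout constants, and the input shapes (an ascending classLimits array with one more entry than the palette) are assumptions for this example. Following the guidelines above, the label sits to the right of each rectangle and the highest class is drawn at the top.

```javascript
// Build an SVG group with one shaded rectangle and one label per class.
function legendSvg(classLimits, palette, x = 10, y = 10, box = 16) {
  const rows = [];
  // Start with the highest class so that larger values appear higher up.
  for (let c = palette.length - 1, row = 0; c >= 0; c -= 1, row += 1) {
    const top = y + row * (box + 4);
    rows.push(
      `<rect x="${x}" y="${top}" width="${box}" height="${box}" fill="${palette[c]}"/>` +
      `<text x="${x + box + 6}" y="${top + box - 3}">` +
      `${classLimits[c]} - ${classLimits[c + 1]}</text>`
    );
  }
  return `<g>${rows.join("")}</g>`;
}

console.log(legendSvg([0, 10, 20], ["rgb(254,206,206)", "rgb(252,0,0)"]));
```

The returned string would need to be regenerated and swapped into the map document each time the user changes the number of classes or the interval rule.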

A title for the map and source of the data are also necessary elements.

A fully automated GIS would be simpler from an implementation perspective but would lack the flexibility to choose the best design for the data distribution. A better fit reveals more meaningful information about the spatial pattern, and that is the purpose of the map.


Special thanks to Dr. Frank King for sharing with me the history of cartography, and to my dissertation committee member Dr. Ying, of Columbia University, for the proofreading and timely suggestions.


Cagle, K. SVG Programming: The Graphical Web. Berkeley: Apress, 2002.
Croner, C.M. Public Health GIS and the Internet. National Center for Health Statistics. Public Health GIS News and Information; Annual Review of Public Health, Vol 24, May 2003.
Rigaux, P., et al. Spatial Databases With Applications to GIS. San Francisco: Morgan Kaufmann, 2002.
Kraak, M., Brown, A. Web Cartography: Developments and Prospects. New York: Taylor and Francis, Inc., 2001.
MacEachren, A. How Maps Work: Presentation, Visualization and Design. New York: The Guilford Press, 1995.
Marini, J. The Document Object Model: Processing Structured Documents. Berkeley: McGraw Hill/Osborne, 2002.
Peterson, M. Interactive and Animated Cartography. Englewood Cliffs: Prentice Hall, 1995.
Plewe, B. A Simple Web Mapping Solution For Complex Databases. SVG Open Conference, 2002.
Ramakrishnan, R., Gehrke, J. Database Management Systems, 3rd ed. New York: McGraw-Hill, 2003.
Slocum, T. Thematic Cartography And Visualization. Upper Saddle River: Prentice Hall, 1998.
Unwin, D. Introductory Spatial Analysis. New York: Methuen, 1981.
Ware, C. Information Visualization: Perception For Design, 2nd ed. San Francisco: Morgan Kaufmann, 2004.
