In this paper we present ThemeMountain, an SVG-based Visual Data Mining tool. ThemeMountain is inspired by ThemeRiver [1], a tool that identifies sequential patterns, trends and temporal relationships within large collections of documents and analyzes them over time. In ThemeRiver, a collection of documents or other data is displayed as a river, or ribbon of different colors, that flows across a period of time. Within the river, color-coded currents represent themes and widen or narrow depending on their relative strength.

Data mining is an information extraction activity that discovers hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data, inferring rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions and credit risk analysis.

The Visual Data Mining (VDM) approach combines the abilities of the human mind with the computational power of the computer in order to create a powerful discovery environment. VDM presents information in such a way that the user can easily extract underlying patterns and mathematical models. Patterns, in turn, reveal flows, relations, structures and anomalies in the data, helping the user to verify and confirm his or her knowledge and hypotheses. Moreover, patterns can suggest new questions, leading the user to new conclusions. Every day a huge amount of information is generated, so exploring and analyzing these data is increasingly difficult. Suitable Information Visualization can therefore improve VDM capabilities. Visual data exploration has the great advantage of involving the user directly in the data mining process. SVG can be used as an effective visualization tool because of its scalability and portability. Recently SVG has also been used for rendering real raster images (see for example [2], [3], [4]).

ThemeMountain is a new tool for VDM that uses the mountain metaphor instead of the river one. In this system each peak represents the strength with which a theme is present in a given period of time. ThemeMountain is a server-side interactive visualization system. Using a simple web form, the user enters a time range and the themes of interest. The tool then works as follows:

  1. send the request to the PHP module;
  2. send the query to the MySQL database;
  3. receive the answer from the MySQL database;
  4. create the SVG file;
  5. redirect the file to the browser.
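
As an illustration of step 4, the following is a minimal sketch of how per-theme strengths over a time range could be turned into an SVG document. It is written in Python rather than PHP and is not the actual ThemeMountain implementation: the function name `theme_mountain_svg`, the color palette, the example data and the choice to scale every theme against the single highest peak are all assumptions made for the example, and the database query of steps 2 and 3 is replaced by an in-memory dictionary.

```python
from xml.sax.saxutils import escape


def theme_mountain_svg(strengths, width=600, height=300):
    """Return an SVG document with one filled 'mountain' polygon per theme.

    strengths maps a theme name to a list of non-negative values, one per
    time period (all lists must have the same length, at least 2).
    Hypothetical helper: not part of the original tool.
    """
    colors = ["#4e79a7", "#f28e2b", "#59a14f", "#e15759"]  # arbitrary palette
    n_periods = len(next(iter(strengths.values())))
    # Scale all themes against the highest peak (an assumed design choice);
    # fall back to 1 so that all-zero data does not divide by zero.
    peak = max(max(values) for values in strengths.values()) or 1
    step = width / (n_periods - 1)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}" viewBox="0 0 {width} {height}">'
    ]
    for i, (theme, values) in enumerate(strengths.items()):
        # Start and end on the baseline so the outline closes into a mountain.
        points = [(0.0, float(height))]
        points += [(k * step, height - v / peak * height)
                   for k, v in enumerate(values)]
        points.append((float(width), float(height)))
        coords = " ".join(f"{x:.1f},{y:.1f}" for x, y in points)
        parts.append(
            f'<polygon points="{coords}" fill="{colors[i % len(colors)]}" '
            f'fill-opacity="0.6"><title>{escape(theme)}</title></polygon>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up strengths for two themes over four time periods.
example = {"svg": [1, 4, 9, 3], "data mining": [2, 6, 3, 5]}
svg_text = theme_mountain_svg(example)
```

In a server-side setting, the returned string would be sent to the browser with the `image/svg+xml` content type, which corresponds to step 5. Semi-transparent fills are used here only so that overlapping mountains remain visible.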

The figures below show the overall pipeline together with an example of the SVG output.


Further details, an online demo and results can be found at the following web address:

