Bill Rutherford

Group for Advanced Information Technology

3700 Willingdon Avenue

Burnaby, BC, V5G 3H2, Canada

email: wrutherf@bcit.ca

tel: (604) 451-6958

web: www.tc.bcit.ca

 

SVG Bioinformatics Collaboration Tool

 

Abstract

 

This presentation covers the design and implementation of a bioinformatics collaboration, annotation and visualization tool based on SVG. Open standards allow data to be freely exchanged between systems while maintaining its semantic integrity. This paper outlines our design and development approach, the functionality of the tool, the lessons we learned, and the remaining issues we continue to research in order to build more efficient SVG solutions.

The input to the tool is an XML file compliant with the NCBI Blast Output DTD, for both amino acid and nucleic acid sequences. The resulting SVG file is a standalone application to which several authors can successively add sequence annotations and comments, each saving their changes for the others to view. The file is created in a scalable manner by streaming over the XML input with SAX, while the event handler tree streams out and accumulates the components of the SVG file's visualization, navigation and mouse-driven information systems. Interactivity is achieved through ECMAScript and a custom naming system for the SVG file's data objects. The main view window of the tool lets the user display sequence search results in graphical format, rank-ordered by HSP score, and zoom to the underlying textual sequence information as required.
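The streaming step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual handler tree: the class name BlastSvgStreamSketch, the "hsp-N" id scheme and the rectangle layout are assumptions made for the example, while Hsp, Hsp_query-from and Hsp_query-to are element names from the NCBI Blast Output DTD. The sample input is assumed to carry no DOCTYPE declaration, so the parser does not try to fetch the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BlastSvgStreamSketch {

    /** Streams over BLAST XML and appends one SVG rect as each Hsp element closes. */
    static String toSvg(String blastXml) {
        final StringBuilder svg =
                new StringBuilder("<svg xmlns=\"http://www.w3.org/2000/svg\">\n");

        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private int from;
            private int to;
            private int row;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0); // discard whitespace seen between elements
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String value = text.toString().trim();
                switch (qName) {
                    case "Hsp_query-from":
                        from = Integer.parseInt(value);
                        break;
                    case "Hsp_query-to":
                        to = Integer.parseInt(value);
                        break;
                    case "Hsp":
                        // Hypothetical naming scheme: the id lets a script find this object later.
                        svg.append("  <rect id=\"hsp-").append(row)
                           .append("\" x=\"").append(from)
                           .append("\" y=\"").append(row * 10)
                           .append("\" width=\"").append(to - from + 1)
                           .append("\" height=\"8\"/>\n");
                        row++;
                        break;
                    default:
                        break;
                }
                text.setLength(0);
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(blastXml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse BLAST XML", e);
        }
        return svg.append("</svg>\n").toString();
    }
}
```

Because each rectangle is written as soon as its Hsp element ends, memory use in this sketch depends on the size of the output buffer rather than on a full in-memory tree of the input.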

Because the output from a BLAST search is sometimes a disordered array of XML-formatted hits, the generation software, written in Java, does a preliminary sort using a limited DOM representation of the critical parameters only (HSPs). This process is iterated over an arbitrary number of input XML files related by a discovery tree, in which a result sequence of a previous search can be used as the input parameter for a subsequent search. This yields several panels of graphical data, each limited in the number of hits displayed by a cutoff parameter. A panel can be selected either through a separate navigation tree that is part of the SVG module, or by clicking a special embedded icon that displays the search descendant as an overlay structure; overlays can be nested in the same manner to arbitrary depth.
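As a rough illustration of the preliminary sort and cutoff, the sketch below keeps only an identifier and a score per hit, orders the hits by descending score, and truncates the list. The names HitRankingSketch, HitSummary and rank are hypothetical, and using a single numeric score per hit is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HitRankingSketch {

    /** Holds only the parameters needed for ordering, not the full hit record. */
    static final class HitSummary {
        final String hitId;
        final double hspScore;

        HitSummary(String hitId, double hspScore) {
            this.hitId = hitId;
            this.hspScore = hspScore;
        }
    }

    /** Returns at most `cutoff` hits, highest HSP score first; the input list is not modified. */
    static List<HitSummary> rank(List<HitSummary> hits, int cutoff) {
        List<HitSummary> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((HitSummary h) -> h.hspScore).reversed());
        int keep = Math.max(0, Math.min(cutoff, sorted.size()));
        return new ArrayList<>(sorted.subList(0, keep));
    }
}
```

Sorting a small summary list rather than the full documents keeps this step cheap even when the search result files themselves are large.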

The top-level SVG file is composed of a number of sub-files, some of which are static and others of which are generated by the handler tree. The top two SVG windows contain the initial search sequence, which is annotatable in detail and can be zoomed to the underlying textual reference sequence data. The window directly below these shows the current reference sequence for the particular search result selected. The window to its right is an information display block that is populated by mouse events generated as the pointer passes over the related structure in any of the other sub-windows.

The right mouse button is used to access a context-sensitive popup menu, which is customized for each sub-window. This menu can be used to save the state of the entire SVG file, including current views and all annotations. The annotation popup is accessed from the context-sensitive menu while in the top window containing the initial search sequence. This modal popup dialogue allows one to embed colour-coded scalable symbols by numerical sequence index range in the reference panel, and to attach a text title, comments and the initials of the researcher.
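One way to picture an annotation record of this kind is sketched below. It is illustrative only: the class name SequenceAnnotationSketch, the "ann-from-to" id scheme and the choice of a rectangle as the symbol are assumptions, not the tool's actual format. The title and desc elements are standard SVG descriptive elements.

```java
class SequenceAnnotationSketch {
    final int fromIndex;
    final int toIndex;
    final String colour;
    final String title;
    final String comment;
    final String initials;

    SequenceAnnotationSketch(int fromIndex, int toIndex, String colour,
                             String title, String comment, String initials) {
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
        this.colour = colour;
        this.title = title;
        this.comment = comment;
        this.initials = initials;
    }

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders the annotation as an SVG group spanning the given index range. */
    String toSvg() {
        return "<g id=\"ann-" + fromIndex + "-" + toIndex + "\">"
                + "<title>" + escape(title) + " (" + escape(initials) + ")</title>"
                + "<desc>" + escape(comment) + "</desc>"
                + "<rect x=\"" + fromIndex + "\" y=\"0\" width=\"" + (toIndex - fromIndex + 1)
                + "\" height=\"4\" fill=\"" + escape(colour) + "\"/>"
                + "</g>";
    }
}
```

Keeping the researcher's text inside the SVG markup itself means that saving the file also saves every annotation, in line with the standalone-file approach described above.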

Numerous scalability problems were encountered in the course of the project, including the arbitrarily large number of input search result files and the number of hits in each. Iterating over the SVG elements in the file is also a significant scalability issue, because performance becomes sluggish as the number of elements increases. In particular, zooming from the graphical to the textual representation consumes more resources and causes a delay. Scrolling and panning performance also suffers somewhat in text view. We are currently conducting research on some of these issues.