BioViz: Genome Viewer

Development of an SVG GUI for the visualization of genome data

Christopher T Lewis
Steve Karcz
Andrew Sharpe
Isobel A.P. Parkin

Molecular Genetics
Saskatoon Research Centre
Agriculture and Agri-food Canada
107 Science Place
Saskatoon, SK.
S7N 0X2, Canada
Email: lewisCT@em.agr.ca
Phone: +1-306-956-7693
Fax: +1-306-956-7247
Webpage: www.brassica.ca

Keywords: Scientific Visualization; SVG-GUI; Scripting; Interactivity

Abstract

We have developed an interactive, browser-based SVG application for the visualization and mining of genome data from the model plant Arabidopsis thaliana. The genome of this plant consists of DNA sequence organized into five chromosomes, each up to 30 million base pairs long and carrying up to 7000 distinct genes. This linear arrangement makes the data particularly amenable to display in SVG, because position along a chromosome maps naturally onto position in the drawing and so preserves the spatial relationships between the constituent parts of the genome. Users are often interested in only a small part of the whole, so they must be able to zoom from an overview encompassing millions of base pairs to a detailed view of a specific region without losing their spatial frame of reference. For instance, the structure of an individual gene can only be seen at high magnification. Additionally, client-side scripting allows the user to create new views, or to request supplemental data that can be loaded and dismissed on demand.

To enable this visual browsing of the genome, we created what is essentially a client-server, multi-windowed application inside the web browser. The client side of this application relies on our custom GUI library (CGUI) and the Adobe SVG plug-in. CGUI is written in JavaScript and provides basic objects that make it easy to construct a GUI. Windows can be opened, closed, and moved inside the GUI, and the contents of each window can be scaled and translated independently of other windows. The Adobe postURL method lets the user request new data and views from the server. This asynchronous data retrieval eliminates the annoying page reloads common in other web-based genome browsers, and allows the user to continue working while additional data is loading. The genome data itself resides on the server in a mix of XML files, flat files and relational databases. It is made available to the client on demand, after being parsed and transformed into SVG by Perl CGIs. Presentation of the data is handled entirely on the client side.

SVG has allowed us to create a visually appealing, functional viewer for our data. The CGUI library is our attempt at a generic SVG-based GUI library, and demonstrates that such a library is feasible. While we used CGUI to display genomics data, such a library could also be useful to other web developers who need to display a large dataset in an organized manner. Inquiries about the CGUI library may be directed to lewisCT@em.agr.ca.

Screen shot: BioViz being used to view EST statistics. The ESTs displayed above and below the BAC in the BAC View have been pulled from a MySQL database. The HSP (high-scoring segment pair) statistics in the HSP View come from the same database and were produced by BLAST. The BLAST statistics come from a separate database and show significant matches to genes in other organisms.

Introduction

"Because all modern genomes have arisen from common ancestral genomes, the relationships between genomes can be studied with this fact in mind. This commonality means that information gained in one organism can have application in other even distantly related organisms. Comparative genomics enables the application of information gained from facile model systems to agricultural and medical problems. The nature and significance of differences between genomes also provides a powerful tool for determining the relationship between genotype and phenotype through comparative genomics and morphological and physiological studies." [1]

The model organism of interest in our lab is the plant Arabidopsis thaliana (thale cress or mouse-ear cress); the genome sequence for this plant was completed in 2000 [2]. As of August 10, 2001, the TIGR [3] release of the genome contains annotations for 25,617 genes, of which 40-50% have been assigned a function. Our goal is to use the similarities between genes in the Arabidopsis thaliana genome and the related Brassica napus (rutabaga, Swedish turnip, canola, rape) genome to assign function to B. napus ESTs (Expressed Sequence Tags) collected as part of our EST sequencing program.

Standard bioinformatics tools such as BLAST (Basic Local Alignment Search Tool) [4] and WU-BLAST (Washington University BLAST) find regions of similarity between given sequences. Although these are the standard tools for comparing genomic sequence, the plain-text reports they produce are not always the most informative way to visualize the relationships they find. This is especially true when visualizing alignments over a large region, such as a BAC (Bacterial Artificial Chromosome; average size 100,000 bp [5]), a chromosome (average size 20-30 Mbp), or the whole genome. We developed the BioViz: Genome Viewer to help visualize such relationships.

Goals of the Project

The primary purpose of the BioViz: Genome Viewer is to help us visualize the relationship between ESTs (or other features) derived from our crop plant of interest (B. napus) and the sequence of the model genome (A. thaliana). By extension, the Viewer is also a functional genome browser, since the user must be able to access the underlying annotation for the model genome.

The Viewer lets us efficiently display similarities between a large number of ESTs and a large region of the genome, grouped by BAC. We determined the regions of similarity between our ESTs and the BACs in the A. thaliana genome using BLAST, and then stored this information in a MySQL database. This database allows the user to view all ESTs that have similarity to a given BAC, to see how different ESTs align with the predicted A. thaliana genes, and to search for a specific EST. SVG provides an intuitive, easy-to-use, web-accessible front end for this data.
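BioViz stores these hits in MySQL, where grouping by BAC would normally be done by the database query. Purely as an illustration of the data a BAC View works with, the following minimal sketch groups a list of hit records by BAC and orders each group by start position. The record fields (est, bac, start, end), the identifiers, and the function name are hypothetical, not the actual BioViz schema.

```javascript
// Hypothetical sketch: group EST-to-BAC similarity hits by BAC and order each
// group by position along the BAC. Field names (est, bac, start, end) and the
// sample identifiers are illustrative, not the actual BioViz schema.
function groupHitsByBac(hits) {
  var byBac = {};
  for (var i = 0; i < hits.length; i++) {
    var hit = hits[i];
    if (!byBac.hasOwnProperty(hit.bac)) {
      byBac[hit.bac] = [];
    }
    byBac[hit.bac].push(hit);
  }
  // Sort each BAC's hits by start coordinate (in base pairs).
  for (var bac in byBac) {
    if (byBac.hasOwnProperty(bac)) {
      byBac[bac].sort(function (a, b) { return a.start - b.start; });
    }
  }
  return byBac;
}

var grouped = groupHitsByBac([
  { est: "EST_3", bac: "BAC_A", start: 52000, end: 52600 },
  { est: "EST_1", bac: "BAC_A", start: 1200, end: 1750 },
  { est: "EST_2", bac: "BAC_B", start: 800, end: 1300 }
]);
// grouped.BAC_A lists EST_1 before EST_3; grouped.BAC_B holds only EST_2.
```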

The Viewer is not intended to be an annotation tool; it provides no way to update the underlying dataset. The Viewer could, however, complement an annotation tool by providing a user-friendly, publicly accessible (via the web) front end to the underlying database.

It was necessary that the tool be web accessible for two reasons. Firstly, the web provides a venue for advertising the availability and extent of our EST resource. Secondly, making the data accessible in a browser makes it readily available to the scientists who need it, and the simplicity of the web format has helped to ensure user acceptance.

Comparison to Other Browsers

There are currently several genome browsers available, and while they are all useful and informative, they have two main drawbacks that BioViz aims to overcome. The Ensembl Genome Browser [6] provides a wealth of information in a straightforward manner. The data is presented as a bitmap, so when the user wants to change the view, by zooming in or by looking at a region further along the contig, a new bitmap must be loaded from the server. While this page reload is perfectly functional, it can be distracting, and it is the first drawback BioViz seeks to overcome.

Another option is the Vista Genome Browser [7], which displays the results of the Berkeley Genome Pipeline. This browser uses a Java applet and relies on servlets to update the view. It provides a graph showing regions of similarity between the mouse and human genomes. In this tool, data refreshes are less noticeable and browsing seems to flow better, but the Vista browser suffers from the second drawback of current browsers: the designers have decided where and how the data will be presented, so the display is static. At present the Vista browser contains only the relationship between the mouse and human genomes, and for additional information it links to a version of the UCSC Genome Browser [8]. The UCSC Genome Browser is very informative, but like the Ensembl Genome Browser it suffers from distracting page reloads and a static display.

The BioViz: Genome Viewer overcomes both of these drawbacks through the use of SVG. All the data is scaled and translated on the client side, so there is no need for the data to be reloaded each time the user desires a different view. This client side transformation drastically improves the flow of browsing. BioViz has been implemented using a "multi-windowed" format which provides the user with a great deal of flexibility when arranging the view of their data.
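Client-side scaling and translation can be expressed as a single SVG transform on a group whose children are drawn in base-pair coordinates. The sketch below assumes that layout; bpWindowTransform and bpToX are hypothetical helper names, not part of CGUI or BioViz.

```javascript
// Hypothetical sketch: the transform that shows the genomic interval
// [startBp, endBp] across a view viewWidth user units wide. Features are
// assumed to be drawn in base-pair coordinates inside a single <g> element.
function bpWindowTransform(startBp, endBp, viewWidth) {
  if (endBp <= startBp) {
    throw new RangeError("endBp must be greater than startBp");
  }
  var scale = viewWidth / (endBp - startBp); // user units per base pair
  var translateX = -startBp * scale;         // moves startBp to x = 0
  // SVG applies the rightmost transform first: x' = scale * x + translateX.
  return "translate(" + translateX + ",0) scale(" + scale + ",1)";
}

// The same mapping applied to a single base-pair position.
function bpToX(bp, startBp, endBp, viewWidth) {
  return (bp - startBp) * viewWidth / (endBp - startBp);
}
```

For example, bpWindowTransform(0, 100000, 500) gives "translate(0,0) scale(0.005,1)" for a whole 100,000 bp BAC, and narrowing the view to 1000 to 2000 bp gives "translate(-500,0) scale(0.5,1)". In a browser the string would be assigned with setAttribute("transform", ...) on the group, so zooming needs no new data from the server. Scaling only along x keeps track heights constant, but it also stretches any text inside the group.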

The browser has been implemented using a client-server design, with the client side responsible for presentation of the data and the server side responsible for retrieving the appropriate data at the client's request. The Adobe postURL and parseXML methods allow asynchronous data retrieval, so the user can request several pieces of supplementary information and continue working while the data loads. This further improves the flow of browsing, especially on a slow network.

Implementation

The server side of the Viewer consists of a collection of Perl CGIs which are called by client requests via the postURL method. These scripts are responsible for gathering data from the appropriate source and turning it into SVG for display in the client. Possible sources include XML files, flat files, MySQL databases, and dynamically generated text reports. The SVG returned from the server is added to the GUI using the parseXML method available in the Adobe SVG plugin.
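The postURL method sends a string of data to the server. One reasonable way to pass query parameters to a Perl CGI is the standard application/x-www-form-urlencoded format, and the sketch below builds such a string. The function and parameter names are hypothetical, and the paper does not say which request format BioViz actually uses.

```javascript
// Hypothetical sketch: build the data string for a POST request to a Perl
// CGI as application/x-www-form-urlencoded key=value pairs, a format CGI.pm
// can parse. The parameter names used with it are illustrative only.
function encodeFormBody(params) {
  var pairs = [];
  for (var key in params) {
    if (params.hasOwnProperty(key)) {
      pairs.push(encodeURIComponent(key) + "=" +
                 encodeURIComponent(String(params[key])));
    }
  }
  return pairs.join("&");
}
```

For example, encodeFormBody({ view: "bac", bac: "BAC_A", start: 1, end: 100000 }) returns "view=bac&bac=BAC_A&start=1&end=100000". encodeURIComponent writes spaces as %20 rather than +, and CGI.pm decodes both forms.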

The client portion of the project is based on the CGUI (Custom Graphical User Interface) toolkit, which provides the basic GUI elements used in the browser. These libraries are built from JavaScript objects, which make it easy to create the GUI elements. BioViz-specific extensions are included in the BioViz package. A UML diagram of the client-side hierarchy is included at the end of the paper. From here on, the discussion focuses primarily on the CGUI toolkit. Consider the following example:

sample.svg
This sample uses the CGUI toolkit to create some simple GUI elements.
Click and hold the title bar to drag the frame.
If you are using Internet Explorer, click and hold the box in the lower right to resize the frame.
The source can be viewed by right-clicking the image and choosing View Source.
Some of the more interesting portions are discussed below.

Example Code

    // called when the page is first opened
    function _onload (evt) {
      g_root.addEventListener("mousemove", new MouseTracker("mousemove"),false);
      g_root.addEventListener("mouseup", new MouseTracker("mouseup"), false); 

      CGUI.createDefs();
      sampleGUI();
    }

The _onload function is called when the onload event is fired. It creates 'MouseTracker' objects to handle mousemove and mouseup events. This functionality is currently used to handle the dragging and resizing of Frames.

The function then calls the createDefs method, which creates the definitions that give the GUI its 'look and feel'. To change the look and feel, a developer would override or extend this method.
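The MouseTracker objects registered in _onload are not listed in the paper. A plausible reason to listen on the root rather than on the title bar is that a fast-moving pointer can leave the title bar during a drag and would otherwise stop being tracked. The following is a minimal sketch of such drag logic under that assumption; DragTracker, begin(), frame.x, frame.y and frame.moveTo() are hypothetical names, and unlike the sample, a single object handles both mousemove and mouseup.

```javascript
// Hypothetical sketch of drag handling by a listener attached to the root
// element. CGUI's MouseTracker source is not shown in the paper; the names
// DragTracker, begin(), frame.x, frame.y and frame.moveTo() are assumptions.
function DragTracker() {
  this.frame = null; // frame being dragged, or null when idle
  this.dx = 0;       // pointer offset from the frame's origin
  this.dy = 0;
}

// Called from the title bar's mousedown handler.
DragTracker.prototype.begin = function (frame, evt) {
  this.frame = frame;
  this.dx = evt.clientX - frame.x;
  this.dy = evt.clientY - frame.y;
};

// An object with a handleEvent method can be passed to addEventListener.
DragTracker.prototype.handleEvent = function (evt) {
  if (!this.frame) {
    return;
  }
  if (evt.type === "mousemove") {
    this.frame.moveTo(evt.clientX - this.dx, evt.clientY - this.dy);
  } else if (evt.type === "mouseup") {
    this.frame = null; // stop dragging; later moves are ignored
  }
};
```

In the viewer, the same object would be registered with g_root.addEventListener("mousemove", tracker, false) and g_root.addEventListener("mouseup", tracker, false), matching the listener-object pattern used in _onload.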

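    function sampleGUI () {

      // Any object with a handleEvent method can act as a button's event.
      var controller = new Object();

      controller.handleEvent = function(evt) {
        // Climb from the clicked shape to the node holding jsref, the
        // reference back to the CGUI Button object; getTarget() is the
        // Adobe viewer's method form of evt.target.
        var target = evt.getTarget().parentNode.parentNode.jsref;

        if (target.toString() == "All Alone") {
          alert ("Look at me outside the box.");
        } else {
          alert (target.toString());
        }
      };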

The sampleGUI function is responsible for creating the GUI in this example. The first thing it does is create a controller object, which is then used to control the buttons. Any object that defines a handleEvent method would work in this capacity.

    // continue sampleGUI ()
      var a_buttons = new Array();
      a_buttons[0] = new Button (45, 15, "Help", help_button);
      a_buttons[1] = new Button (45, 15, "Button2", controller);
      a_buttons[2] = new Button (45, 15, "Hello", "Button3('Hello')");
      a_buttons[3] = new Button (45, 15, "Square", addSquare);

Next, an array of buttons is created. The parameters are width, height, label and 'event'. Note that the event can be a function ("Help" and "Square"), an object ("Button2"), or a string ("Hello", whose event is the string "Button3('Hello')"). This array is passed into the Frame constructor, and the buttons are displayed in the frame's button bar.
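The paper does not show how a Button interprets its 'event' argument. A sketch of one way to dispatch on the three accepted kinds (function, object with handleEvent, or string of script) follows; dispatchButtonEvent is a hypothetical name, and evaluating the string with the Function constructor is an assumption about how a string such as "Button3('Hello')" would be run.

```javascript
// Hypothetical sketch of how a button could dispatch its 'event' argument,
// which may be a function, an object with handleEvent, or a string of
// script. CGUI's real dispatch code is not shown in the paper.
function dispatchButtonEvent(action, evt) {
  if (typeof action === "function") {
    return action(evt);               // e.g. help_button, addSquare
  }
  if (action && typeof action.handleEvent === "function") {
    return action.handleEvent(evt);   // e.g. the controller object
  }
  if (typeof action === "string") {
    // e.g. "Button3('Hello')": run the string as script in global scope.
    return new Function(action)();
  }
  throw new TypeError("Unsupported button action");
}
```

Functions and controller objects avoid evaluating strings at click time, and the string form only works if the named function (here Button3) is global.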

    // continue sampleGUI ()
      var ctrl_opts = ControlBox.ALL_CTRLS;

      var main_frame = new Frame (
                         0, 0, 350, 200, 
                         "Title: Sample Framed Window",
                         false, a_buttons, ctrl_opts, true, true, true
                       );

      main_frame.addToParent (g_root);
      main_frame.setStatus ("Status: Frame Created and Displayed.");
      // ... add a square

Then a Frame is created. The parameters are x, y, width, height, title, closable, buttons, control options, status bar, content pane, resizable. The frame in this example cannot be closed, contains the 4 buttons created above (a_buttons), has all the control options (ControlBox.ALL_CTRLS), is draggable (default behaviour for a frame), has a status bar, a content pane, and is resizable.

After creating the Frame it must be added to the display. The addToParent method is used here to add the frame to the root, but it could also be used to add it to another Frame or some other CGUI component. The message in the status bar is then updated to tell the user what has happened.

    // continue sampleGUI ()
      var button = new Button (60, 15, "All Alone", controller);
      button.setX(175);
      button.setY(300);
      button.addToParent (g_root);

      var anchored_button = new Button (60, 15, "Anchored", controller);
      anchored_button.setAnchor (g_root, "left", 175, "top", 320);
      anchored_button.addToParent (g_root);

    } // end sampleGUI

Two additional buttons are created and added to the root document. One is positioned using absolute x/y coordinates; the other is positioned using the button's anchor property.

The Anchor object allows a CGUI object (and some non-CGUI objects with a little work) to be anchored relative to another GUI object. The anchor is used to position all the elements inside a Frame, and causes the elements to move as the Frame is resized. The anchor was implemented as a simple (though limited) alternative to a layout manager.
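The setAnchor call in the sample, setAnchor(g_root, "left", 175, "top", 320), suggests an anchor made of one horizontal edge and offset plus one vertical edge and offset. Assuming that reading, and assuming that a "right" or "bottom" offset is measured from the parent's edge to the element's nearest edge, the position can be recomputed on every resize as in this sketch. anchoredPosition and the anchor field names are hypothetical, not CGUI's API.

```javascript
// Hypothetical sketch of anchor-based positioning: an element keeps a fixed
// offset from one horizontal and one vertical edge of its parent and is
// repositioned whenever the parent is resized. Internals are assumed.
function anchoredPosition(anchor, parentWidth, parentHeight, width, height) {
  // "left": offset from the parent's left edge to the element's left edge;
  // "right": offset from the parent's right edge to the element's right edge.
  var x = anchor.hSide === "left"
    ? anchor.hOffset
    : parentWidth - anchor.hOffset - width;
  // "top" and "bottom" work the same way along the y axis.
  var y = anchor.vSide === "top"
    ? anchor.vOffset
    : parentHeight - anchor.vOffset - height;
  return { x: x, y: y };
}
```

For example, a 45 by 15 button anchored 10 units from the right and bottom of a 350 by 200 frame sits at (295, 175); after the frame is resized to 400 by 250 it moves to (345, 225).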

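For buttons such as "All Alone" and "Anchored", which lie outside any CGUI object, it would be useful if the Anchor object implemented Kevin Lindsey's AZAP (Anti Zoom and Pan) [9] functionality, so that they keep their position on screen when the user zooms or pans the document.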
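Anti-zoom-and-pan works by applying the inverse of the viewer's current zoom and pan to an element, so that the element keeps its screen position. The sketch below computes that inverse from the root's currentScale and currentTranslate values. It illustrates the idea only; it is not Kevin Lindsey's AZAP code, and antiZoomPanTransform is a hypothetical name. In a viewer it would be re-applied from SVGZoom and SVGScroll event listeners, which are omitted here.

```javascript
// Hypothetical sketch of the anti-zoom-and-pan idea: given the root's
// currentScale (s) and currentTranslate (tx, ty), return a transform that
// cancels them so an element stays put on screen.
function antiZoomPanTransform(currentScale, translateX, translateY) {
  var inv = 1 / currentScale;
  // The viewer maps a user point q to s*q + t. Applying this transform first
  // gives q = (p - t) / s, so the viewer draws the point back at p.
  return "translate(" + (-translateX * inv) + "," + (-translateY * inv) +
         ") scale(" + inv + ")";
}
```

With the document zoomed to 2x and panned by (100, 50), antiZoomPanTransform(2, 100, 50) gives "translate(-50,-25) scale(0.5)".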

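    function addSquare(evt) {
      // Count clicks so that each square gets a unique id attribute.
      addSquare.count = (addSquare.count || 0) + 1;
      var frame = Frame.getFrame (evt.getTarget());
      // Offset by half the 25x25 size so the square is centered.
      frame.addContentAsText (
         "<rect id='square" + addSquare.count + "' fill='blue' x='" +
          (frame.content.center_coordx - 12.5) +
         "' y='" +
          (frame.content.center_coordy - 12.5) +
         "' width='25' height='25'/>"
      );
    }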

The addSquare method demonstrates one way of adding SVG content to a CGUI element. Each time the "Square" button is clicked, a square is added to the center of the frame. Note that the SVG is written as ordinary markup and passed to the addContentAsText function. A developer who prefers to build elements with the document.createElement method can add the resulting SVG element using the addElementAsSVG method.

BioViz passes the result from the postURL call to addContentAsText and it is added to the GUI.
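The request flow described above can be sketched as follows. In the Adobe viewer, the post argument would be postURL, whose callback receives an object with success and content properties; passing it in as a parameter is an assumption made here so that the flow can be exercised without a viewer. requestIntoFrame and the status messages are hypothetical.

```javascript
// Hypothetical sketch of the BioViz request flow: post a query to a CGI and
// hand the returned SVG text to a frame. `post` stands in for postURL.
function requestIntoFrame(post, url, body, frame) {
  frame.setStatus("Status: Loading...");
  post(url, body, function (result) {
    if (result.success) {
      // CGUI adds the returned SVG text to the frame; per the paper,
      // parseXML is what turns it into SVG nodes.
      frame.addContentAsText(result.content);
      frame.setStatus("Status: Loaded.");
    } else {
      frame.setStatus("Status: Request failed.");
    }
  });
  // Returns at once; the callback runs later when the response arrives.
}
```

Because requestIntoFrame returns as soon as the request is sent, several frames can each have a request outstanding while the user keeps working.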

<script type="text/javascript"
        xlink:href="./cgui_lib.js.gz"/>

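The CGUI library is included at the end of the SVG document by a script element whose xlink:href attribute points to the library file, here a gzip-compressed copy (cgui_lib.js.gz).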

Discussion

CGUI allows the easy creation of an SVG GUI. Creation of a Frame might be considered complicated as there are quite a few parameters to remember, but this is easily overcome by extending the Frame class for your own project and writing a number of class methods for common Frame types. For example a MyFrame.messageBox method might create a Frame that cannot be resized, has no controls, no content pane, one 'ok' button, and a text message.
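A messageBox factory of the kind described above might look like the following sketch. It uses the Frame and Button constructors and the Frame argument order from the sample; the MyFrame name, passing 0 as "no controls", the 200 by 80 size, and showing the message in the status bar (since the box has no content pane) are all assumptions, not CGUI facts.

```javascript
// Hypothetical sketch of a MyFrame.messageBox factory. Frame argument order
// follows the sample: x, y, width, height, title, closable, buttons,
// control options, status bar, content pane, resizable.
var MyFrame = {};

// Builds a small, fixed-size dialog with a single OK button.
MyFrame.messageBox = function (title, message, okHandler) {
  var buttons = [new Button(45, 15, "OK", okHandler)];
  var frame = new Frame(
    0, 0, 200, 80, title,
    false,    // not closable; okHandler is expected to dismiss it
    buttons,
    0,        // control options: 0 assumed to mean "no controls"
    true,     // status bar, used here to show the message
    false,    // no content pane
    false     // not resizable
  );
  frame.setStatus(message);
  return frame;
};
```

The returned frame would then be added to the display with addToParent, as in the sample.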

SVG worked very well for our project and allowed rapid development: it took one developer 2 months on the GUI side and another developer 2-3 weeks on the server side. Neither developer had prior experience with XML/SVG or JavaScript, and only one had any Perl experience, which suggests that SVG is quite straightforward to learn.

At present CGUI feels like the initial Java GUI in terms of implementation and responsiveness. Like the initial Java GUI, it is not quite as snappy as one might desire; however, this is only an issue when displaying large amounts of data or text, and should become less of an issue as processor speeds increase.

Text handling in SVG could be improved in future releases. The browser becomes unresponsive when asked to scale or translate large quantities of text. We tried to use CGUI to display chromatograms (vector graphs in which each peak represents a base and is annotated with its letter), and while it was easy to have the client display the data, the text involved made the browser unresponsive whenever we tried to manipulate the image. We also experienced this limitation in BioViz, where moving a window containing a long stretch of sequence tends to be sluggish. Another limitation we encountered was the absence of text wrapping in SVG. This is an issue when returning data of unknown length to the client, and when resizing windows.
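Until SVG provides text wrapping, it has to be done in script. The sketch below is a simple greedy wrapper that assumes a fixed maximum number of characters per line, as with a monospace font; a real implementation would measure rendered widths, for example with getComputedTextLength(). Words longer than a line, such as raw sequence, are split into fixed-length pieces. wrapText is a hypothetical name, and each returned line could become its own tspan.

```javascript
// Hypothetical sketch of greedy word wrapping by character count.
function wrapText(text, maxChars) {
  if (maxChars < 1) {
    throw new RangeError("maxChars must be at least 1");
  }
  var words = text.split(/\s+/);
  var lines = [];
  var current = "";
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    // Break words longer than a line (e.g. raw sequence) into pieces.
    while (word.length > maxChars) {
      if (current.length > 0) {
        lines.push(current);
        current = "";
      }
      lines.push(word.slice(0, maxChars));
      word = word.slice(maxChars);
    }
    if (word.length === 0) {
      continue; // empty token from leading or trailing whitespace
    }
    if (current.length === 0) {
      current = word;
    } else if (current.length + 1 + word.length <= maxChars) {
      current += " " + word; // word fits on the current line
    } else {
      lines.push(current);   // start a new line
      current = word;
    }
  }
  if (current.length > 0) {
    lines.push(current);
  }
  return lines;
}
```

For example, wrapText("the quick brown fox jumps", 10) returns ["the quick", "brown fox", "jumps"], and wrapText("ACGTACGTACGT", 5) returns ["ACGTA", "CGTAC", "GT"].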

Conclusions

SVG has allowed us to develop a novel, web-based genome browser. While SVG has limitations of its own, it overcomes what seem to be the two key drawbacks of current web-based genome browsers: page refreshes and static displays. The CGUI toolkit was produced to aid in development of the Viewer, and will aid in the development of future SVG GUIs. This toolkit will be made publicly available in the hope that other developers will find it useful and interesting, and that they will share modifications with other interested parties. A version of the Viewer, with a subset of the functionality available in our lab, will be made publicly available at www.brassica.ca [10] as a way of sharing our EST resource with the scientific community.

Future Work

As with any software development effort, the documentation needs to be updated and in some cases completed. There are a number of planned additions to the CGUI library, for instance a class which would provide a drop down menu. Some common window types should be created, for instance the messageBox mentioned above. A minimum size check should be implemented when resizing a window to prevent it from being resized out of existence, and the user should be allowed to scroll the contents of windows using the arrow keys. The issue with resizing windows in Netscape needs to be resolved.
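The planned minimum size check could be as simple as clamping the requested size on each resize step. This is a sketch only; clampFrameSize and the 100 by 60 limits are hypothetical.

```javascript
// Hypothetical sketch of the planned minimum-size check: clamp a requested
// frame size during a resize drag. The limits are arbitrary examples.
var MIN_FRAME_WIDTH = 100;
var MIN_FRAME_HEIGHT = 60;

function clampFrameSize(width, height) {
  return {
    width: Math.max(width, MIN_FRAME_WIDTH),
    height: Math.max(height, MIN_FRAME_HEIGHT)
  };
}
```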

Planned improvements to BioViz include highlighting the EST or gene after searching for it, allowing users to paste sequence that they would like to find in the genome, enabling searching by key words (which would ideally involve incorporating the Gene Ontology (GO) [11] controlled vocabulary), and rewriting the scripts on the server as a Perl module.

Client Side Hierarchy

Figure: UML diagram of the client-side hierarchy.

[1] "What is Genomics?", Genomics at UC Davis
[2] The Arabidopsis Information Resource
[3] The Institute for Genomic Research
[4] BLAST "Basic Overview", NCBI
[5] "How do we Sequence DNA?", University of Michigan, DNA Sequencing Core
[6] Project Ensembl
[7] Vista Genome Browser, Berkeley Genome Pipeline
[8] UCSC Genome Browser, UC Santa Cruz
[9] AntiZoomAndPan, www.kevlindev.com
[10] Not available at time of submission, but coming soon!
[11] Gene Ontology Consortium

