The present invention relates to a computer-aided system and method for analysis and visualization of signalling and metabolic pathways. The present invention particularly relates to a system and a method for pathway, component and micro-array analysis and visualization of signalling and metabolic pathways.
The physiological functions of an organism are accomplished through coordinated regulation of complex networks, which occur at multiple levels. Homeostasis is maintained through the coordinated cell-cell signaling network potentiated through chemical signals.
Intracellular signaling pathways communicate extracellular information to modulate cellular functions in response to external stimuli. Biomolecular interactions serve not only as a basis to transmit information but also to process the information as it is being transmitted. Such processing occurs through interaction between various signaling pathways, which together weave a huge network. Such networks are quite complex and may have properties that are non-intuitive.
Understanding such complex networks becomes increasingly important, as it gives much-needed insight into the molecular pathogenesis of a disease and, more so, into the cause-effect relationship of an individual entity in a system. Thus, intelligent, swift and logical research-based products would hasten this understanding and help to derive logical conclusions for designing more effective approaches for targeting the disease.
The advent of a wide range of molecular tools and powerful computers provides us with an unprecedented capacity to generate data that reveal the architecture of genomes, genes and traits, and how these influence the cellular and molecular processes to bring about the desired phenotypic changes in an organism. The development of micro-array technologies provides a powerful tool by which the expression patterns of thousands of genes can be monitored simultaneously. Comparison of expression arrays from different tissue samples is proving to be quite useful in providing insight into the important genes and their function. To analyze and make sense of these data, we need computers and sophisticated algorithms.
In recent years, the field of bioinformatics has emerged to meet these challenges. By definition, bioinformatics is the science of turning biological data into information. A combination of computer science, information technology, and molecular biology, bioinformatics allows researchers to quickly access and interpret a rising tide of genomic information. This is critical for the genomic era: scientists are sequencing the genomes of many species, but they know little about how great regions of these genomes, and the proteins they give rise to, actually function.
With the increase in data, there is an ever-increasing demand for storage, analysis and retrieval of the data in the form of databases.
The most commonly used public domain databases include EMBL (the EMBL Nucleotide Sequence Database, also known as EMBL-Bank, constitutes Europe's primary nucleotide sequence resource; its main sources of DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications; http://www.ebi.ac.uk/embl/); GenBank (GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences; GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI; http://www.ncbi.nlm.nih.gov/Genbank/); PIR-NRL3D (the PIR-NRL3D Sequence-Structure Database is produced by PIR-International from sequence and annotation information extracted from three-dimensional structures in the Protein Databank (PDB); http://pir.georgetown.edu/pirwww/); PDB (protein and nucleic acid three-dimensional structures; http://www.rcsb.org/pdb/); OWL (a non-redundant composite of 4 publicly available primary sources: SWISS-PROT, PIR, GenBank (translation) and NRL-3D; http://bioinfman.ac.uk/dbbrowser/OWL/); Swiss-Prot (a curated protein sequence database which strives to provide a high level of annotation, such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc., a minimal level of redundancy and a high level of integration with other databases; http://us.expasy.org/sprot/); and TrEMBL (a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot; http://us.expasy.org/sprot/). These databases contain genomic, proteomic, biochemical, chemical, and molecular biological data as well as structural data comprising geometric and anatomical information, from the sub-cellular localization to the molecular function of the biological entity.
The databases allow researchers to search online for a given gene's composition, proteins, mutations, coverage in the scientific literature, and many other relevant parameters that are collectively termed “annotation”. Integrating such information from varied resources is of vital importance for providing single-point access to all the related information, as described by Macauley et al., A Model System for Studying the Integration of Molecular Biology Databases, 14 Bioinformatics, 575 (1998).
However, understanding gene structure and function is not by itself sufficient to understand how these genes interact with each other in a regulatory network to modulate cellular processes. One approach to representing such networks is found in the PATHDB program available from the National Centre for Genome Resources (http://www.ncgr.org/pathdb). PathDB is a beta-level research tool for scientists interested in analyzing their experimental or computational data in the context of biological pathways and networks. The main data types represented by PathDB are compounds, reactions, enzymes and other metabolic proteins, and pathways. Similar metabolic pathway databases containing gene sequence data and other biochemical information include EMP and MPW, which are available from the Argonne National Laboratory Computational Biology Group (http://emp.mcs.anl.gov/; http://wit.mcs.anl.gov/MPW).
One of the best repositories for protein-protein interactions is the Biomolecular Interaction Network Database (BIND), a collection of records documenting molecular interactions. The contents of BIND include high-throughput data submissions and hand-curated information gathered from the scientific literature, coordinated in part by Genome Canada, a genomic research organization based in Ottawa (http://www.bind.ca/).
Protein-protein interaction data are increasing enormously in volume at an unpredictable rate. Such proteomic data from various sources are available in text files or databases. Owing to their volume, the data can be understood and interpreted more easily when expressed as graphs rather than as a long list of proteins. Efforts are under way to provide better visualizations that depict protein-protein interactions in the form of 2D and 3D graphs. For example, a method for partitioned layout of interaction networks, as described in U.S. Pat. No. 59,522 A1, has been used to represent protein interaction networks as a three-dimensional graph.
Other layout algorithms for depicting protein-protein interaction data in the form of graphs are the spring-force layout algorithm and the Sugiyama algorithm. The class SpringLayout represents the spring-embedder layout algorithm by Fruchterman and Reingold [Graph Drawing by Force-Directed Placement, Software: Practice and Experience 21, pp. 1129-1164, 1991]. This algorithm draws a general graph G with straight-line edges. The drawing of a planar graph may contain crossings. The idea of the algorithm is to simulate a system of mass particles: the vertices simulate mass points repelling each other, and the edges simulate springs with attracting forces. The algorithm tries to minimize the energy of this physical system. The Sugiyama layout is a very popular and fast layout algorithm. The class Sugiyama Layout represents a general framework for drawing graphs with the hierarchical drawing method suggested by Sugiyama, How to Draw a Directed Graph, Journal of Information Processing, 13 (4), pp. 424-437, 1990.
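The spring-embedder idea described above can be illustrated with a minimal sketch. It is not the SpringLayout class referred to in the text; the class name SpringLayoutSketch, the circular starting placement, the cooling factor and the frame clamping are assumptions chosen for illustration. The force formulas (repulsion k^2/d between all vertex pairs, attraction d^2/k along edges) follow the usual presentation of the Fruchterman and Reingold method.

```java
/** Minimal spring-embedder sketch in the spirit of Fruchterman and Reingold (1991). */
class SpringLayoutSketch {

    /**
     * Lays out n vertices inside a width x height frame.
     * Each entry of edges is a pair {u, v}. Returns pos[v] = {x, y}.
     */
    static double[][] layout(int n, int[][] edges, double width, double height, int iterations) {
        double k = Math.sqrt(width * height / n);   // ideal distance between vertices
        double[][] pos = new double[n][2];
        // Assumption: deterministic start on a circle instead of random placement.
        for (int v = 0; v < n; v++) {
            double angle = 2 * Math.PI * v / n;
            pos[v][0] = width / 2 + (width / 4) * Math.cos(angle);
            pos[v][1] = height / 2 + (height / 4) * Math.sin(angle);
        }
        double temperature = width / 10;            // caps the movement per iteration
        for (int it = 0; it < iterations; it++) {
            double[][] disp = new double[n][2];
            // Repulsion between every pair of vertices: f_r(d) = k^2 / d.
            for (int v = 0; v < n; v++) {
                for (int u = 0; u < n; u++) {
                    if (u == v) continue;
                    double dx = pos[v][0] - pos[u][0];
                    double dy = pos[v][1] - pos[u][1];
                    double d = Math.max(Math.hypot(dx, dy), 1e-9);
                    double f = k * k / d;
                    disp[v][0] += dx / d * f;
                    disp[v][1] += dy / d * f;
                }
            }
            // Attraction along each edge (the "springs"): f_a(d) = d^2 / k.
            for (int[] e : edges) {
                double dx = pos[e[0]][0] - pos[e[1]][0];
                double dy = pos[e[0]][1] - pos[e[1]][1];
                double d = Math.max(Math.hypot(dx, dy), 1e-9);
                double f = d * d / k;
                disp[e[0]][0] -= dx / d * f;
                disp[e[0]][1] -= dy / d * f;
                disp[e[1]][0] += dx / d * f;
                disp[e[1]][1] += dy / d * f;
            }
            // Move each vertex by at most the temperature and keep it inside the frame.
            for (int v = 0; v < n; v++) {
                double len = Math.max(Math.hypot(disp[v][0], disp[v][1]), 1e-9);
                double step = Math.min(len, temperature);
                pos[v][0] = Math.min(width, Math.max(0, pos[v][0] + disp[v][0] / len * step));
                pos[v][1] = Math.min(height, Math.max(0, pos[v][1] + disp[v][1] / len * step));
            }
            temperature *= 0.95;                    // assumed cooling schedule
        }
        return pos;
    }
}
```

A call such as SpringLayoutSketch.layout(4, new int[][] {{0, 1}, {2, 3}}, 100, 100, 50) returns one coordinate pair per vertex, all lying inside the 100 x 100 frame.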
Many biological functions are accomplished by altering the expression of various genes through transcriptional and/or translational control. The fundamental biological processes, including cell cycle progression and regulation, cell differentiation and cell death, are characterized by variations in gene expression levels. However, expression of a particular gene is regulated by the coordinated interaction of a large number of regulatory proteins. Understanding such complex protein-protein interactions in the form of regulatory networks or molecular pathways becomes increasingly important, as it gives much-needed insight into the molecular pathogenesis of a disease and, more so, into the cause-effect relationship of an individual entity in a system. The assessment of large-scale gene expression is enabled by high-throughput gene expression studies such as microarray, SAGE, etc.
Analysis, visualization and mapping of gene expression data on maps of known metabolic and signaling pathways are of vital significance in understanding the biological relevance of gene expression. One such software tool, Gene MicroArray Pathway Profiler (GENMAPP) (http://www.genmapp.org/), is a free computer application designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes. Integrated with GenMAPP are programs to perform a global analysis of gene expression or genomic data in the context of hundreds of pathway MAPPs and thousands of Gene Ontology Terms (MAPPFinder), import lists of genes/proteins to build new MAPPs (MAPPBuilder), and export archives of MAPPs and expression/genomic data to the web. It has been developed by Gladstone-Genome, University of California at San Francisco.
Another such commercially available software product is the TRANSPATH®/NetPro™ database, which provides information about signal transduction pathways, in particular those that target transcription regulatory components.
However, disease-specific or physiology-specific networks are the missing links in such software.
Citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.
The primary object of the present invention is to provide a computer-aided system for analysis and visualization of signaling and metabolic pathways of biological entities.
An object of the present invention is to provide a computer-aided method for pathway and component search, micro-array data analysis and visualization of signaling and metabolic pathways.
Another object of the present invention is to provide information on regulatory and signalling pathways across species, information on all participating biomolecules, high priority diseases and disease responsive genes and knowledge databases.
Yet another object of the present invention is to provide pathway visualization in terms of biological entities and interactions between the biological entities.
A further object of the present invention is to identify all the genes in a network directly or indirectly influencing the disease/physiological disorder.
Another object of the present invention is to secure regulatory information stimulated by a trigger or condition in a disease/physiological disorder.
Still another object of the present invention is to identify the critical genes implicated in a disease/physiological disorder.
A further object of the invention is to provide pathways specific to a disease/physiology, organism, organ, tissue or cell line/cell type.
Another object of the present invention is to provide pathway search based on organism, disease, physiology, pathway name, etc.
Yet another object of the present invention is to provide micro-array data analysis based on genes and their expression data.
Still another object of the present invention is to interoperate with statistical visualisation packages such as Spotfire, Genespring, etc., for customised analysis of microarray expression data and for mapping refined expression data onto pathways to find its biological relevance.
A further object of the present invention is to provide easy navigation to view information on protein-protein interaction, knockout, mutagenesis, catalyst, interaction site, etc.
Another object of the present invention is to provide information on all biological entities in the pathway and represent them in the form of either a pathway diagram or report.
Yet another object of the present invention is to display the nature of interactions between two biological entities (mechanism, mode, relation and direction) in a pathway diagram.
A further object of the present invention is to display information on the expression profiles of the responsive genes.
Another object of the present invention is to generate customized reports on genes and their interactions.
Yet another object of the present invention is to provide dynamic generation of pathway diagrams with highlighting based on expression level.
Still another object of the present invention is to prioritise the pathways/diseases/physiologies based on the number of gene hits in a pathway/disease/physiology in a microarray search.
A further object of the present invention is to provide the ability to port the pathway information to XML, SBML, Resnet, etc. file formats for interoperability of data across platforms.
The present invention provides a computer system for analysis and visualization of signalling and metabolic pathways, said system comprising: a plurality of functionally inter-related databases including a data warehouse for extracting at least one attribute of a biological entity and a pathway database for storing curated signalling interaction, component and micro-array data of the biological entities, said plurality of databases including a processed database, said processed database further comprising a hierarchical arrangement of signalling interactions among the biological entities, components and micro-array data; a curator member to generate curated signalling interactions between the biological entities obtained from external sources; a processing system including a server module to fetch and/or generate desired dynamic pathways from the stored signalling interactions, components or micro-array data, said processing system obtaining information on the selected biological entities or their interactions from said dynamic pathways; and a user interface for creating, querying, and viewing the dynamic pathways. The present invention also provides a method for pathway and component search and microarray analysis and visualization of signalling and metabolic pathways of biological entities.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
a & b depict the Sequence Diagram of the Pathway search.
a & b depict the Sequence Diagram of the Component and Microarray search.
a & b depict the Sequence Diagram of the Graph Builder.
Definitions
A “biological entity” is a particular or discrete unit that is a part of, plays a role in, or affects a biological system. Biological entities include components of a biological system or objects, elements or molecules that affect biological functions.
An “interaction” defines the nature by which two or more proteins or bio-molecules are related to each other in a signaling or metabolic network, linked by directional arrows.
A “pathway diagram” is a graphical representation of relationships between and among biological entities or compositions of biological entities, involved in a biochemical cascade stimulated by a trigger or condition in a disease or physiological process.
A “component” is a gene, protein or any other bio-molecule participating in an interaction.
An “interaction map” also is a graphical representation of relationships between and among biological entities or compositions of biological entities, linked to each other irrespective of their involvement in a biochemical cascade, but due to their nature to interact with one another.
A “gene” is a fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule).
A “protein” is a polymer of amino acids linked via peptide bonds and which may be composed of two or more chains. The uniqueness of individual proteins depends on the length and order of amino acids within the proteins.
A “hit” refers to a result—a component, interaction or a pathway that matches the user query.
“Data” refer to the information gathered from the literature and public domain databases relating to the biological entities.
“Upregulation” refers to a positive regulatory effect on physiological processes at the molecular, cellular or systemic level.
“Downregulation” refers to a negative regulatory effect on physiological processes at the molecular, cellular or systemic level.
“Micro-array” refers to an array of DNA or protein samples that can be hybridized with probes to study patterns of gene expression.
“Dataset” is a collection of data records having values obtained by performing Micro-array experiments.
“Time series data” refers to data obtained by measurement of gene expression amounts of a subject group of genes over the course of time.
The present invention relates to a digitally-implemented computer system for storing, modifying, retrieving, analyzing and visualizing biological data of biological entities.
Referring to
The data storage means of the present invention comprises an external database, a pathway database and a pathart database, said databases being functionally linked to one another to facilitate transfer of data.
The external database, which is designated as the jbl_pddb schema, is an integrated platform for data from more than 13 external data sources. The external data sources are public domain databases having data pertaining to functional annotation of human, mouse and rat genes. The public domain databases include UniGene, LocusLink, HomoloGene, Genbank, Affymetrix, Agilent, Applied Biosystems and Amersham Biosciences. The data from the public domain databases are imported into the data warehouse, jbl_pddb. The sequence, function, localization and summary data obtained from public domain databases such as GO, OMIM, Pubmed, InterPro, EC, TrEMBL/SWISS-PROT and KEGG Pathway databases can also be made available to the present system by way of hyperlinks, subject to prior permission, wherever necessary, obtained from the respective owners of such sources.
The data storage means also comprises a pathway database, which is designated as jbl_pathway schema. The pathway database is a knowledge base comprising interactions between biological entities. The data of said pathway database are acquired through a data capture application means designated as curator member or curator's workbench (CWB).
The application of curator's workbench (CWB) is depicted in
The set of interactions that belong to a specific interaction property is organized under one abstract interaction. An abstract interaction is an interaction that does not contain any data but has only interaction properties, which are inherited by its child interactions. If there are multiple parents with interaction properties, all the interaction property tuples are considered. A child interaction can also have interaction properties of its own. Usually organism, physiology, disease, pathway, trigger, receptor, and organ are specified in the parent abstract interaction, whereas tissue, cell type, and cell line are specified in the specific child interactions. An interaction may involve one or more components. It comprises at least one source component and a target component. It may optionally carry other information pertaining to the interaction, such as expression, kinetics, effect, catalysts, mutation, knock out, etc. (
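The inheritance of interaction properties from abstract parents to child interactions can be sketched as follows. This is an illustrative in-memory model only, not the schema of the described system: the class name InteractionNode, its fields and the effectiveProperties method are hypothetical, and the sketch assumes a single parent per interaction, with the chain of ancestors standing in for "multiple parents".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of abstract and child interactions. */
class InteractionNode {
    final String label;
    final boolean isAbstract;       // abstract interactions carry properties only, no data
    final InteractionNode parent;   // null for the root abstract interaction
    final Map<String, String> properties = new LinkedHashMap<String, String>();

    InteractionNode(String label, boolean isAbstract, InteractionNode parent) {
        this.label = label;
        this.isAbstract = isAbstract;
        this.parent = parent;
    }

    /** Properties of all ancestors, overlaid with this interaction's own properties. */
    Map<String, String> effectiveProperties() {
        Map<String, String> merged = (parent == null)
                ? new LinkedHashMap<String, String>()
                : parent.effectiveProperties();
        merged.putAll(properties);
        return merged;
    }
}
```

For example, a root abstract interaction holding organism, disease and pathway, with a child interaction holding only a tissue, yields four effective properties for the child, so the shared values are entered once rather than on every child.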
A component participating in an interaction may be in a specific cellular location and state. The cellular location may be Nuclear Membrane, Cytoplasm, Plasma Membrane, Mitochondria, etc. The component state tells whether the component is bound to other components or phosphorylated. (
For example, suppose a component A participates in the interaction only when it is bound to B and C, where C in turn is bound to D. In the notation this is written as [bound:B,C(bound:D)].
Interaction Notation: It shows the components participating in the interaction, their location and state. It also shows the mechanism and mode by which the source components are regulating target components. It also shows the direction of interaction and relation, which tells whether the interaction is direct, indirect, or speculative.
Source component: A, at cytoplasm, which is bound to B and C, which in turn is bound to D.
Directly upregulates the target component via phosphorylation.
Target component: E, at cytoplasm, which is bound to F, which in turn is bound to H and G, which in turn is bound to H.
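The nested bound-state notation can be produced by a short recursive routine. The sketch below is illustrative only: the class BoundComponent and its methods are hypothetical names, and the only format assumed is the one shown in the example [bound:B,C(bound:D)], with nested partners placed in parentheses.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a component together with the components it is bound to. */
class BoundComponent {
    final String name;
    final List<BoundComponent> boundTo;

    BoundComponent(String name, BoundComponent... boundTo) {
        this.name = name;
        this.boundTo = Arrays.asList(boundTo);
    }

    /** Renders the state of this component, e.g. "[bound:B,C(bound:D)]"; empty if unbound. */
    String stateNotation() {
        return boundTo.isEmpty() ? "" : "[" + partners() + "]";
    }

    /** Renders "bound:" followed by the partners, recursing into each partner's own partners. */
    private String partners() {
        StringBuilder sb = new StringBuilder("bound:");
        for (int i = 0; i < boundTo.size(); i++) {
            BoundComponent p = boundTo.get(i);
            if (i > 0) sb.append(',');
            sb.append(p.name);
            if (!p.boundTo.isEmpty()) {
                sb.append('(').append(p.partners()).append(')');
            }
        }
        return sb.toString();
    }
}
```

Building A as new BoundComponent("A", new BoundComponent("B"), new BoundComponent("C", new BoundComponent("D"))) and calling stateNotation() reproduces the notation of the source component example.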
For instance, the canonical Wnt Signaling pathway is highly conserved between Drosophila, Xenopus and vertebrates. In the absence of a Wnt signal, active GSK3 is present in a multi-protein complex that targets beta-catenin for ubiquitin-mediated degradation. The phosphorylation of beta-catenin by Glycogen synthase kinase-3 (GSK3) at a series of N-terminal serine residues is greatly enhanced by the presence of Axin, which acts as a scaffold by binding to several components of the complex, including GSK3 and the product of the adenomatous polyposis coli (APC) gene.
This information is represented as
The pathway curation approach for elucidating the molecular networks includes the following steps. Identify or select a disease of interest. Study the etiology as well as the pathophysiology of the disease from published reviews. Study the normal physiological pathway in the target tissues and the affected physiology of the target tissues. Select the mediators that are known to influence the normal physiology of the target tissues by going through peer reviews. Shortlist a set of keywords for searching published papers. Find keywords related to the selected mediators for pathway building relevant to the particular disease, and to critical components in the pathway, in order to screen relevant papers.
To select the most relevant papers, search PubMed (www.ncbi.nlm.nih.gov/PubMed), other relevant online journal sites and search engines using the selected keywords, screening the titles and abstracts to identify protein-protein interactions in patients or in a diseased tissue or cell type. Select the sources that describe components of the normal pathway that are modulated in some manner by the trigger, such that normal signalling is affected, leading to a condition that ultimately results in the disease. Organise all the papers on the basis of their position in the cascade, such as trigger to receptor and receptor to other signalling components.
For instance, for Diabetes Type II, find the relevant pathophysiological conditions associated with the disease, such as insulin resistance, obesity, hyperglycemia, etc. The mediators influencing such conditions, such as Free Fatty Acid (FFA), Insulin, TNFalpha, etc., are then listed. All of these are used as keywords to search the relevant literature in literature databases such as PubMed, Highwire, etc.
Data are entered into a data acquisition application, called the Curator's Workbench, that organizes the entered data in a hierarchy to avoid redundant entries. The Curator's Workbench is used for entering pathway information or updating the existing pathway information as per the current scientific understanding of an interaction. The interactions are organized in a hierarchical fashion to ease the data acquisition process. All the interactions covered in each scientific source are manually read, extracted and entered, interaction by interaction, in the interaction form. For a particular interaction, details of the protein-protein interaction (domains, motifs, residues, etc.) are entered along with regulation details of the interaction. Any other details pertaining to the interaction, such as mutation, knock out, kinetics, catalyst and expression data, are entered in the respective forms. (
The interactions are entered in a hierarchical manner to reduce data redundancy and to speed up the data acquisition process. The root interaction is added as an abstract interaction. This is done by selecting the “is abstract” check box. By adding an interaction property form under the abstract interaction form, one can enter the interaction properties, such as pathway name, disease or physiology name and organism name, which are common to all the child interactions. All the interaction forms for this specific pathway are added under this interaction property form. Under each interaction form, an interaction property form can be added to enter properties such as Organ, Tissue, Cell Type and Cell Line, which are specific to one particular interaction. (
In order to maintain the hierarchical relationship between the interactions in one specific pathway, the interaction table comprises two columns termed interaction id and parent interaction id. A child interaction contains its parent's interaction id in the parent interaction id column. All the PMIDs for interaction, effect, mutation and knock out are stored in a single table called reference. This table contains columns such as reference id, table id, column id and reference data (PMID). This table makes it possible to have more than one PMID in a single form. The functions saveReference and loadReference are used to store and retrieve the PMIDs.
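The two-column parent/child scheme can be illustrated with a small in-memory stand-in for the interaction table. The class InteractionTable and its methods are hypothetical and do not reproduce the actual table definition; the sketch only shows how a hierarchy can be recovered from rows that each store an interaction id and a parent interaction id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the (interaction id, parent interaction id) columns. */
class InteractionTable {
    // interaction id -> parent interaction id (null for the root abstract interaction)
    private final Map<Integer, Integer> parentOf = new HashMap<Integer, Integer>();

    void addRow(int interactionId, Integer parentInteractionId) {
        parentOf.put(interactionId, parentInteractionId);
    }

    /** Ids from the given interaction up to the root, obtained by following parent ids. */
    List<Integer> pathToRoot(int interactionId) {
        List<Integer> path = new ArrayList<Integer>();
        Integer current = interactionId;
        // The size check guards against accidental cycles in the parent ids.
        while (current != null && path.size() <= parentOf.size()) {
            path.add(current);
            current = parentOf.get(current);
        }
        return path;
    }

    /** Ids of all interactions whose parent interaction id equals the given id, sorted. */
    List<Integer> childrenOf(int parentInteractionId) {
        List<Integer> children = new ArrayList<Integer>();
        for (Map.Entry<Integer, Integer> row : parentOf.entrySet()) {
            if (Integer.valueOf(parentInteractionId).equals(row.getValue())) {
                children.add(row.getKey());
            }
        }
        Collections.sort(children);
        return children;
    }
}
```

With rows (1, null), (2, 1), (3, 1) and (4, 2), pathToRoot(4) walks 4, 2, 1 and childrenOf(1) returns 2 and 3. In a relational database the same traversal would typically be a self-join or a hierarchical query on the interaction table.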
The Dimension Tables form enables the Administrator to add, modify and delete the dimension values for combo boxes. The buildCombo function is used to populate the dimension values in combo boxes.
The third type of database is a pre-computed pathart database, designated as the jbl_pathart schema, in which all the gene names and protein names from jbl_pddb are stored in a table. In the jbl_pathway schema, component names are mostly LocusLink official gene symbols. The jbl_pathart schema acts as a bridge between the jbl_pathway database and the jbl_pddb external database. jbl_pathart maintains the linkage by building a mapping between the official/alternate gene/protein symbols available in the LocusLink and UniGene databases and the gene/protein symbols stored in jbl_pathway.
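The bridging role of the symbol mapping can be illustrated with a small lookup sketch. The class SymbolBridge and its methods are hypothetical, and the case-insensitive matching is an assumption made for the example; the text does not state how symbols are normalised.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical lookup from official or alternate symbols to one official gene symbol. */
class SymbolBridge {
    private final Map<String, String> toOfficial = new HashMap<String, String>();

    /** Registers an official symbol together with any alternate symbols for the same gene. */
    void register(String officialSymbol, String... alternateSymbols) {
        toOfficial.put(normalize(officialSymbol), officialSymbol);
        for (String alt : alternateSymbols) {
            toOfficial.put(normalize(alt), officialSymbol);
        }
    }

    /** Returns the official symbol for any registered symbol, or null if it is unmapped. */
    String resolve(String symbol) {
        return toOfficial.get(normalize(symbol));
    }

    // Assumption: symbols are compared ignoring case and surrounding whitespace.
    private static String normalize(String s) {
        return s.trim().toUpperCase(Locale.ROOT);
    }
}
```

After register("CTNNB1", "beta-catenin"), both resolve("CTNNB1") and resolve("Beta-Catenin") return "CTNNB1", so a curated component name and an external annotation record can be joined on the same key.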
The address locations of the data including pathway, physiology, disease, organism, interaction and component tables from jbl_pathway are mapped into jbl_pathart. The pathway database is updated and the corresponding changes are carried out in jbl_pathart as well.
Integration of all the above databases with the pathart database is performed using data loaders (written in Java and JDBC against an Oracle database) and an SQL file for creating the relational tables.
The user interface of the system of the present invention provides means for creating, querying and viewing the processed data. The user interface is a web-based graphical visualization tool that analyzes the underlying database and dynamically builds a pathway schematic. The biological entities are displayed in a cell schematic or as a pathway diagram. The user interface also displays annotated information on different biological entities and interactions between them.
The pathway search performed by implementing the method of the present invention is based on an Organism, Disease or Physiology, and Pathway. The Pathway search is the first screen of the PathArt application. It enables identification and comparison of pathways across physiologies, diseases and organisms. The pathway of choice can be selected from the proprietary list of pathway names displayed in the Pathway name list, in combination with the physiology/disease and organism, or a combination thereof. (
PathwaySearchServlet reads the request from input stream and writes the response to output stream. This class sends the read request to PathwaySearchHandler for further database operations.
PathwaySearchHandler is a Java class designed with the DAO pattern. This class establishes the connection with the Pathart database, passes the search parameters and retrieves the result. It also constructs the result object and sends it to PathwaySearchServlet.
Pathart Database is an Oracle database in which all the curated and PDDB data are stored across tables. (
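The separation between the handler and the database that the DAO pattern provides can be sketched without a servlet container or a live Oracle connection. Everything below is hypothetical: PathwayRecord, PathwayDao, InMemoryPathwayDao and PathwaySearchHandlerSketch are illustrative names, the request keys are assumed, and the in-memory rows merely stand in for the curated tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** One curated pathway row (hypothetical shape). */
class PathwayRecord {
    final String organism;
    final String diseaseOrPhysiology;
    final String pathwayName;

    PathwayRecord(String organism, String diseaseOrPhysiology, String pathwayName) {
        this.organism = organism;
        this.diseaseOrPhysiology = diseaseOrPhysiology;
        this.pathwayName = pathwayName;
    }
}

/** Data-access interface: the handler depends on this, not on a concrete database. */
interface PathwayDao {
    List<PathwayRecord> find(String organism, String diseaseOrPhysiology, String pathwayName);
}

/** In-memory stand-in for a database-backed DAO; a null criterion matches anything. */
class InMemoryPathwayDao implements PathwayDao {
    private final List<PathwayRecord> rows;

    InMemoryPathwayDao(List<PathwayRecord> rows) {
        this.rows = rows;
    }

    public List<PathwayRecord> find(String organism, String diseaseOrPhysiology, String pathwayName) {
        List<PathwayRecord> hits = new ArrayList<PathwayRecord>();
        for (PathwayRecord r : rows) {
            if (matches(organism, r.organism)
                    && matches(diseaseOrPhysiology, r.diseaseOrPhysiology)
                    && matches(pathwayName, r.pathwayName)) {
                hits.add(r);
            }
        }
        return hits;
    }

    private static boolean matches(String wanted, String actual) {
        return wanted == null || wanted.equalsIgnoreCase(actual);
    }
}

/** Handler sketch: unpacks the request parameters, queries the DAO, builds the result. */
class PathwaySearchHandlerSketch {
    private final PathwayDao dao;

    PathwaySearchHandlerSketch(PathwayDao dao) {
        this.dao = dao;
    }

    List<String> handle(Map<String, String> request) {
        List<String> names = new ArrayList<String>();
        for (PathwayRecord r : dao.find(request.get("organism"),
                request.get("diseaseOrPhysiology"), request.get("pathway"))) {
            names.add(r.pathwayName);
        }
        return names;
    }
}
```

Because the handler sees only the PathwayDao interface, a JDBC-backed implementation could replace InMemoryPathwayDao without changing the handler or the servlet that calls it.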
For instance, as an exemplary embodiment, searching Epidermal Growth Factor (EGF) Signaling Pathway in Asthma in Homo sapiens is shown. (
The biomolecular signalling interactions are displayed as either dotted or solid lines; this information is also called regulatory information.
These details are the information curated from scientific literature and public domain databases. Finally, the Generate Report processing bar is displayed. The report generated is based on the selected parameters.
In Component Search, the proprietary list of components, along with their pathways, can be searched. The search can be performed across pathways, physiologies/diseases, and organisms. (
Pathart Client is the client UI with which the user can select the component(s), Pathways, Physiologies, Diseases and Organisms for Component Search. (
MainServlet is a servlet that provides the entry point to the Pathart Server. This servlet receives the client request from HttpServerUtil and forwards it to MicroarrayServlet.
MicroarrayServlet reads the request from input stream and writes the response to output stream. This class sends the read request to MicroarraySearchHandler for further database operations.
MicroarraySearchHandler is a Java class designed with the DAO pattern. This class establishes the connection with the Pathart database, passes the search parameters and retrieves the result. It also constructs the result object and sends it to MicroarrayServlet.
Pathart Database is an Oracle database in which all the curated and PDDB data are stored across tables. (
The user can select one or more components of choice from the Component Search feature. The selected component name(s) are put inside a hashtable (requestHash) along with the searchType (‘MicroarraySearch’) and sent to the HttpServerUtil class. MainServlet receives the request and forwards it to MicroarrayServlet. MicroarrayServlet reads from the input stream and sends the request object to MicroarraySearchHandler.
MicroarraySearchHandler checks the validity of the request object and then type casts it into MicroarraySearchParam. The database connection is obtained through the DBUtil class. The handle for the selected components is obtained by executing the PathwaySearch.getPathwaysForPathwaySearch procedure, passing the pathway name, disease and physiology as parameters. This procedure joins the component, pathway, physiology and organism tables and searches for the selected components. It stores the results in a temporary table. (
Child nodes are added to root nodes by using the addChildNode method of the PathwayTreeNode class. The final pathway tree is built in the util class and sent to MicroarraySearchHandler. MicroarraySearchHandler puts the searchResultTree into a hashtable and constructs the response object. MicroarrayServlet writes the response object to the output stream. The PathartHelper class reads the response object sent by the servlet and sends it to PathartApplet. PathartApplet extracts the root nodes and child nodes and displays them in the tree panel. (
In “Microarray search”, the user can upload a microarray data set (as shown in
Microarray search requires the input data to be in a delimited text file. The delimiter can be any valid character, such as a comma, semicolon, tab or hyphen. The format of the file can be one of the following: Time Series Data (Gene ID, Time1, Time2, . . . ), Raw Microarray Data (Gene ID, Cy3, Cy5), Raw Microarray Data (Gene ID, Cy3, Cy5, Expression Ratio), or Single Point Microarray Data (Gene ID, Expression Ratio).
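By way of illustration only, reading such a delimited file may be sketched as follows. The class and method names are hypothetical, and the sketch assumes that the file has no header row, that the first column holds the Gene ID and that every remaining column is numeric. A delimiter that also occurs inside a Gene ID or as a minus sign (for example a hyphen) would need additional handling that is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical parser sketch; not the actual Pathart upload code.
final class DelimitedMicroarrayParser {
    /** One parsed row: a gene identifier followed by its numeric columns. */
    static final class Row {
        final String geneId;
        final double[] values;

        Row(String geneId, double[] values) {
            this.geneId = geneId;
            this.values = values;
        }
    }

    static List<Row> parse(List<String> lines, char delimiter) {
        // Quote the delimiter so characters such as '|' or '.' are not read as regex syntax.
        Pattern splitter = Pattern.compile(Pattern.quote(String.valueOf(delimiter)));
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = splitter.split(line, -1);
            if (fields.length < 2) {
                throw new IllegalArgumentException("expected a Gene ID and at least one value: " + line);
            }
            double[] values = new double[fields.length - 1];
            for (int i = 1; i < fields.length; i++) {
                values[i - 1] = Double.parseDouble(fields[i].trim());
            }
            rows.add(new Row(fields[0].trim(), values));
        }
        return rows;
    }
}
```

The number of numeric columns alone does not distinguish all of the listed formats (a two-point time series and a Cy3/Cy5 file both have three columns), so the format would still have to be indicated separately, for example by the user.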
The Gene ID can be a LocusLink ID, Affymetrix Probeset ID, Amersham Probeset ID, Applied Biosystems Probe ID, GenBank Accession Number, Gene Name, Gene Symbol, etc. (
In Microarray Search, the user selects a file from the Microarray Search feature. The content of the selected file is put inside a hashtable (requestHash) along with the searchType (‘MicroarraySearch’) and sent to the HttpServerUtil class. MainServlet receives the request and forwards it to MicroarrayServlet. MicroarrayServlet reads from the input stream and sends the request object to MicroarraySearchHandler.
MicroarraySearchHandler checks the validity of the request object and then casts it to MicroarraySearchParam. A database connection is obtained through the DBUtil class.
A handle for the selected components is obtained by executing the PathwaySearch.getPathwaysForPathwaySearch procedure, passing the pathway name, disease and physiology as parameters. This procedure joins the component, pathway, physiology and organism tables and searches for the selected components. It stores the results in a temporary table. (
From the COMPONENT, PATHWAY, PHYSIOLOGY and ORGANISM tables, the organism, physiologyOrDiseaseLabel, physiologyOrDisease, pathway, pathwayid and interactions values are obtained. Root node values such as the organism name and physiologyOrDiseaseLabel are passed into the constructor of the PathwayTreeNode class. Child nodes are added to the root nodes using the addChildNode method of the PathwayTreeNode class. The final pathway tree is built in a util class and sent to MicroarraySearchHandler. MicroarraySearchHandler puts the searchResultTree into a hashtable and constructs the response object. MicroarrayServlet writes the response object to the output stream. The PathartHelper class reads the response object sent by the servlet and sends it to PathartApplet. PathartApplet extracts the root nodes and child nodes and displays them in the tree panel. (
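By way of illustration only, a tree node of the kind described above may be sketched as follows. The class name PathwayTreeNode and the method name addChildNode are taken from the description; the single-label constructor, the fields, the render method and the node labels used in main are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: a simplified stand-in for the PathwayTreeNode class named in the text.
final class PathwayTreeNode {
    private final String label;
    private final List<PathwayTreeNode> children = new ArrayList<>();

    // Assumption: one label per node (e.g. an organism name or a pathway name).
    PathwayTreeNode(String label) {
        this.label = label;
    }

    /** Adds a child and returns it, so that deeper levels can be chained. */
    PathwayTreeNode addChildNode(PathwayTreeNode child) {
        children.add(child);
        return child;
    }

    String getLabel() {
        return label;
    }

    List<PathwayTreeNode> getChildren() {
        return Collections.unmodifiableList(children);
    }

    /** Renders the subtree with two spaces of indentation per level. */
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(label).append('\n');
        for (PathwayTreeNode child : children) {
            child.render(out, depth + 1);
        }
    }

    public static void main(String[] args) {
        // Labels are illustrative placeholders for organism, physiology/disease and pathway.
        PathwayTreeNode root = new PathwayTreeNode("Homo sapiens");
        root.addChildNode(new PathwayTreeNode("Disease"))
            .addChildNode(new PathwayTreeNode("EGF Signaling Pathway"));
        System.out.print(root.render());
    }
}
```

A client such as PathartApplet could walk getChildren() recursively in the same way that render does in order to populate a tree panel.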
The results of micro-array analysis are depicted in the form of a summary sheet. The summary sheet as shown in
The components derived from the micro-array data are displayed in a pathway diagram and are colour-coded according to their expression ratios. The default colour settings are as follows: genes with an expression ratio above 2 (up-regulated) are coloured red; genes with an expression ratio between 0 and 1 (down-regulated) are coloured green; and genes with an expression ratio between 1 and 2 (unchanged) are coloured yellow. The colour thresholds and the colour gradient can be customized to suit the requirements of the user. (
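By way of illustration only, the default colour assignment may be sketched as follows. The thresholds 1 and 2 are taken from the description; the class, enum and method names are hypothetical, and the treatment of ratios exactly equal to a threshold (assigned to yellow here) is an assumption, since the description does not specify it.

```java
// Hypothetical sketch of the default colour rule; not the actual Pathart rendering code.
final class ExpressionColourSketch {
    enum Colour { RED, YELLOW, GREEN }

    /** Classifies a ratio using customizable thresholds. */
    static Colour colourFor(double ratio, double downThreshold, double upThreshold) {
        if (Double.isNaN(ratio) || ratio < 0.0) {
            throw new IllegalArgumentException("expression ratio must be a non-negative number");
        }
        if (ratio > upThreshold) {
            return Colour.RED;    // up-regulated
        }
        if (ratio < downThreshold) {
            return Colour.GREEN;  // down-regulated
        }
        return Colour.YELLOW;     // unchanged; boundary values land here by assumption
    }

    /** Classifies a ratio using the default thresholds of 1 and 2. */
    static Colour colourFor(double ratio) {
        return colourFor(ratio, 1.0, 2.0);
    }

    public static void main(String[] args) {
        double[] ratios = {3.2, 1.4, 0.3};
        for (double ratio : ratios) {
            System.out.println(ratio + " -> " + colourFor(ratio));
        }
    }
}
```

Passing the thresholds as parameters mirrors the statement that the user can customize them; a colour gradient would replace the three-valued enum with an interpolated colour.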
Normalization of Micro-Array Data
Normalization helps to remove systematic variation in microarray experiments, which affects the measured gene expression levels. Normalization is performed on raw microarray data that has Cy3 and Cy5 values for a set of Gene IDs at a single time point or condition. The format of the uploaded dataset determines whether normalization is possible. For data that cannot be normalized, the Normalizer tab is deactivated.
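The description does not name the normalization method used. Purely as an illustration of what normalizing Cy3/Cy5 data can involve, the following sketch applies one common approach, global median centering of log2 ratios. The choice of method, the use of Cy5 as the numerator and all class and method names are assumptions made for this sketch.

```java
import java.util.Arrays;

// Illustrative only: global median centering, which may differ from the method used in Pathart.
final class MedianNormalizerSketch {
    /** Returns log2(Cy5/Cy3) per gene, shifted so that the median log ratio is zero. */
    static double[] normalizedLogRatios(double[] cy3, double[] cy5) {
        if (cy3.length != cy5.length || cy3.length == 0) {
            throw new IllegalArgumentException("Cy3 and Cy5 must be non-empty and of equal length");
        }
        double[] logRatios = new double[cy3.length];
        for (int i = 0; i < cy3.length; i++) {
            if (cy3[i] <= 0.0 || cy5[i] <= 0.0) {
                throw new IllegalArgumentException("intensities must be positive");
            }
            logRatios[i] = Math.log(cy5[i] / cy3[i]) / Math.log(2.0);
        }
        double median = median(logRatios);
        for (int i = 0; i < logRatios.length; i++) {
            logRatios[i] -= median; // remove the array-wide bias between the two channels
        }
        return logRatios;
    }

    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

The sketch requires both channel intensities for every gene, which is consistent with the statement that only datasets containing Cy3 and Cy5 values can be normalized.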
Clustering of Micro-Array Data
Clustering of data is essential for identifying biologically relevant groups of genes. Clustering groups genes with similar expression profiles, which is especially useful in the analysis of large-scale gene expression data. The format of the uploaded dataset determines whether clustering is possible. Clustering of microarray data is mainly applied to time-series data. The selected gene set can be clustered using various metrics and linkages. For data that cannot be clustered, the Cluster tab is deactivated.
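The description does not list the metrics and linkages that are offered. Purely as an illustration of the two notions, the following sketch pairs one commonly used metric (Pearson correlation distance between two time-series profiles) with one commonly used linkage (average linkage between two clusters). Both choices and all names are assumptions made for this sketch.

```java
import java.util.List;

// Illustrative only: one possible metric and one possible linkage.
final class ProfileClusteringSketch {
    /** Distance 1 - r, where r is the Pearson correlation of the two profiles. */
    static double pearsonDistance(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("profiles need equal length and at least two time points");
        }
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            throw new IllegalArgumentException("correlation is undefined for a constant profile");
        }
        return 1.0 - cov / Math.sqrt(varA * varB);
    }

    /** Average linkage: the mean pairwise distance between members of two clusters. */
    static double averageLinkage(List<double[]> left, List<double[]> right) {
        if (left.isEmpty() || right.isEmpty()) {
            throw new IllegalArgumentException("clusters must not be empty");
        }
        double sum = 0.0;
        for (double[] a : left) {
            for (double[] b : right) {
                sum += pearsonDistance(a, b);
            }
        }
        return sum / (left.size() * right.size());
    }
}
```

A hierarchical clustering procedure would repeatedly merge the two clusters with the smallest linkage value; that loop is omitted here.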
Gene Report
The Gene Report displays information on the Summary, Sequence, Affymetrix probeset data, Function, Localization and Pathway of a gene. Appropriate links to PubMed citations are also given. If no Gene ID is selected, the available information for all the genes is displayed in the Gene Report.
The data set consists of the expression patterns of different cell types of colon tissue. Gene expression in 40 tumor and 22 normal colon tissue samples was analyzed with an Affymetrix oligonucleotide array (Affymetrix Hum6000 array) complementary to more than 6,500 human genes.
The types of analysis that can be done in PathArt are
5. Clustering based on GO biological process/cellular localization/molecular function: the expression data was also clustered using cellular process such as “cell cycle”. Around 83 genes were clustered, of which 24 genes were common to “Colon Cancer” and “Cell Cycle” Pathways.
The Graph Builder feature of Pathart is used for generating pathway diagrams in the Pathart application. The user can select the desired pathway from the Pathway Tree Panel and view the respective pathway diagram. (
Pathart Client is the client UI through which a user can select the desired pathway from the Pathway Tree Panel.
HttpServerUtil is an interface between the Pathart client and the server. It is a Java class that applies the façade pattern.
MainServlet is a servlet that provides the entry point to the Pathart server. This servlet receives the client request from HttpServerUtil and forwards it to InteractionMapSearchServlet.
InteractionMapSearchServlet reads the request from the input stream and writes the response to the output stream. This class passes the request it has read to InteractionMapSearchHandler for further database operations.
InteractionMapSearchHandler is a Java class designed with the DAO pattern. This class establishes the connection with the Pathart database, passes the search parameters and retrieves the result. It also constructs the result object and sends it to InteractionMapSearchServlet.
Pathart Database is an Oracle database in which all the curated and PDDB data are stored across tables. (
For example, the Pathway Map for ‘EGF Signaling Pathway’ is built by selecting the pathway from the Pathway tree panel. When the user selects ‘EGF Signaling Pathway’ from the tree panel, the pathway name is put inside a hashtable (requesthash) along with the searchType (‘InteractionMapSearch’) and sent to the HttpServerUtil class. MainServlet receives the request and forwards it to InteractionMapSearchServlet.
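By way of illustration only, the forwarding step may be sketched as a lookup keyed on the searchType entry of the request hashtable. The description states that MainServlet forwards the request to the appropriate servlet but does not state how the target is chosen, so routing on searchType is an assumption, as are the class name, the key string and the use of plain functions in place of servlets (which keeps the sketch free of the servlet API).

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatch sketch; the real MainServlet may select its target differently.
final class SearchDispatchSketch {
    private final Map<String, Function<Hashtable<String, Object>, Hashtable<String, Object>>> routes =
            new HashMap<>();

    /** Associates a searchType value with the component that handles it. */
    void register(String searchType,
                  Function<Hashtable<String, Object>, Hashtable<String, Object>> target) {
        routes.put(searchType, target);
    }

    /** Forwards the request to the target registered for its searchType. */
    Hashtable<String, Object> forward(Hashtable<String, Object> requestHash) {
        Object searchType = requestHash.get("searchType"); // assumed key name
        Function<Hashtable<String, Object>, Hashtable<String, Object>> target = routes.get(searchType);
        if (target == null) {
            throw new IllegalArgumentException("unknown searchType: " + searchType);
        }
        return target.apply(requestHash);
    }
}
```

Under this assumption, ‘MicroarraySearch’ would be registered against the microarray servlet and ‘InteractionMapSearch’ against the interaction map servlet, so that a new search type requires only a new registration.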
InteractionMapSearchServlet reads from the input stream and sends the request object to InteractionMapSearchHandler. InteractionMapSearchHandler checks the validity of the request object and then casts it to InteractionMapSearchParam. A database connection is obtained through the DBUtil class. The InteractionMapSearch.getInteractionsHandle(handle, pathwayName, physiologyName, diseaseName, organismName) procedure is then called. It searches the pathway and interaction_property tables to find the distinct list of interaction ids for the given input parameters, finds the child interactions for all unique interaction_ids and stores the data in the interaction_map global temporary table. (
A SQL query is executed to obtain the interaction values, and the interactions are built from these values. The map components are built by executing the following procedure.
The InteractionMapSearch.getMapComponents(interactionId) procedure joins the component, interaction_component and interaction_map tables to find the list of components. It inserts all the components into the map_component global temporary table and inserts all the complex components into the map_component2component global temporary table. By inserting the component_id and interaction_id into the interaction_map_intr_comp global temporary table, it builds the relationship between components and interactions. It also joins the response and catalyst tables with the interaction_component and interaction_map tables to pull the effect and catalyst data. (
The INTERACTION MAP table is queried and the result set is passed into the Linkage class.
Values are obtained from the Linkage class and the linkage between interactions and map components is built. The graph is then built using GraphBuilder and put into the ResultHash. The graph coordinates are also added to the ResultHash.
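By way of illustration only, linking interactions to map components may be sketched as building two lookup tables from (interaction id, component id) pairs such as those held in interaction_map_intr_comp. The internals of the Linkage and GraphBuilder classes are not given in the description, so the class below, its method names and its use of integer ids are assumptions made for this sketch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not the actual Linkage or GraphBuilder code.
final class InteractionGraphSketch {
    /** Groups component ids by interaction id; each row is {interactionId, componentId}. */
    static Map<Integer, Set<Integer>> link(int[][] rows) {
        Map<Integer, Set<Integer>> componentsByInteraction = new TreeMap<>();
        for (int[] row : rows) {
            // A set is used so that duplicate rows do not create duplicate links.
            componentsByInteraction.computeIfAbsent(row[0], key -> new TreeSet<>()).add(row[1]);
        }
        return componentsByInteraction;
    }

    /** Reverses the mapping, giving the interactions in which each component takes part. */
    static Map<Integer, Set<Integer>> invert(Map<Integer, Set<Integer>> componentsByInteraction) {
        Map<Integer, Set<Integer>> interactionsByComponent = new TreeMap<>();
        for (Map.Entry<Integer, Set<Integer>> entry : componentsByInteraction.entrySet()) {
            for (Integer componentId : entry.getValue()) {
                interactionsByComponent
                        .computeIfAbsent(componentId, key -> new TreeSet<>())
                        .add(entry.getKey());
            }
        }
        return interactionsByComponent;
    }
}
```

A component that appears under more than one interaction in the inverted table is a point at which two interactions join in the pathway diagram; layout coordinates would be computed in a later step that is not sketched here.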
InteractionMapSearchServlet writes the response object (ResultHash) to the output stream. The PathartHelper class reads the response object sent by the servlet and sends it to PathartApplet. PathartApplet renders the interaction map in the Pathway Panel of Pathart. (
The pathart system data can be ascertained by accessing the external data resources through the web server module as shown in
Number | Date | Country |
---|---|---|
60509956 | Oct 2003 | US |