The present disclosure relates generally to information systems for biological and life sciences research. More particularly, the disclosure relates to a network-based virtual research laboratory and collaboration portal with which biological and life sciences research may be more efficiently conducted.
Humanity passed a significant milestone in unraveling the mysteries of life on Jun. 26, 2000, when Dr. Craig Venter and Dr. Francis Collins stood proudly beside President Clinton to announce that the code of the human genome had been cracked, nearly two years ahead of schedule. In President Clinton's words, “Today, we are learning the language in which God created life.” His meaning: research scientists have now identified and recorded, in computer database form, the some 3 billion base pairs that comprise the entire human genome. This was a stunning achievement, but it is only the beginning.
According to recent estimates, there are 30,000 to 40,000 genes in the human genome. While the identity and sequences of the 3 billion base pairs have now been worked out, little is yet known about which of these base sequences correspond to the 30,000 to 40,000 genes. Similarly, little is yet known about which of these base sequences are responsible for which proteins and bodily functions, or which of these base sequences are implicated in treating disease. In short, there is much to learn.
In practical effect, the decoding and storing of the human genome in a computer database has changed biology from an information gathering science into an information processing science. Computer scientists have joined the ranks of the laboratory scientists to spawn a new field, called computational biology: the application of quantitative analytical techniques in modeling biological systems. Much of the effort in this new field has been devoted to the science of using information to understand biology. Computer scientists call this science bioinformatics.
To the bioinformatics computer scientist, the human genome represents a vast data-mining project that holds profound promise to cure disease and prolong our lives. The current approach to data-mining involves applying statistical methods and pattern recognition algorithms upon the genome database to make predictions about the information that is locked in our DNA. The nature of the problem is such that computer scientists must perform these analytical tasks without a complete understanding of where the biological data comes from or what it means.
Moreover, the bioinformatics field is still in its infancy. Currently, many life sciences researchers are struggling to learn how to employ computational tools in their work. Unfortunately, many of the computational tools require quite sophisticated knowledge of computer science and statistical mathematics, not to mention vast computational resources. This has placed many of the more promising analytical techniques off-limits to all but the largest research companies and institutions. For humanity's sake, this is quite unfortunate, because it squanders the full potential of humanity's creative minds. These are the creative minds working, without great funding, throughout the many small university and private research laboratories around the world: creative minds that would be capable of making significant, life-improving discoveries if empowered with the right tools.
Some recently developed tools and techniques related to these endeavors are discussed in the following patent applications, each assigned to the assignee of the present invention: U.S. Provisional Application No. 60/386,296, entitled Informatics System Architecture, and filed Jun. 4, 2002; U.S. Provisional Application No. 60/411,574, entitled Integration Instructions for Informatics Systems Architecture, and filed Sep. 16, 2002; U.S. application Ser. No. 10/455,262, entitled System and Method for Open Control and Monitoring of Biological Instruments, and filed Jun. 4, 2003; U.S. application Ser. No. 10/455,264, entitled System and Method for Discovery of Biological Instruments, and filed Jun. 4, 2003; U.S. application Ser. No. 10/455,579, entitled System and Method for Providing a Standardized State Interface for Instrumentation, and filed Jun. 4, 2003; U.S. application Ser. No. 10/455,263, entitled System and Method for Generating User Interfaces for Different Instrument Types, and filed Jun. 4, 2003; U.S. application Ser. No. 10/334,793, entitled Method for Placing, Accepting and Filling Orders for Products and Services, and filed Jan. 2, 2003; PCT Application No. US0234599, entitled Method for Operating a Computer and/or Computer Network to Distribute Biotechnology Products, and filed Oct. 30, 2002; U.S. Provisional Application No. 60/431,879, entitled A Browsable Database for Biological Use, and filed Dec. 19, 2002; U.S. Provisional Application No. 60/433,421, entitled Methods for Identifying Orthologous Genomic Regions Between Two or More Species, and filed Dec. 13, 2002; and U.S. Provisional Application No. 60/466,310, entitled Methodology and Graphical User Interface to Visualize Genomic Information, and filed Apr. 28, 2003. The disclosures of each of the aforementioned patent applications are incorporated herein by reference.
The present system provides a life sciences laboratory system employing at least one networked computer system that defines a virtual research environment. Users access the system through a portal associated with the networked computer system(s). The virtual research environment has a data coupling mechanism by which the user designates a set of user-specified data for bioinformatics processing. At least one processor associated with the networked computer system(s) performs bioinformatics services upon the user-specified data. In one embodiment, the data coupling mechanism enables transfer of the user-specified data to a memory space that is mediated or accessed by the processor performing the bioinformatics processing. This embodiment allows users to exploit bioinformatics processing resources that are not deployed on users' local computer environments, and to store and organize information relating to life sciences research in a secure, online workspace.
In another embodiment, the data coupling mechanism enables transfer of bioinformatics processing routines to a memory space that is mediated or accessed by the processor that locally accesses the user-specified data. This embodiment allows users to perform bioinformatics processing operations locally, without security concerns that others may be able to access their user-specified data.
According to a further aspect, a virtual community system is provided to facilitate collaboration and sharing of life sciences information. At least one networked computer system defines a virtual community that is accessible by a plurality of users. The virtual community provides information linking services whereby users may provide references to life sciences information. The system includes an index service provider, associated with the virtual community, that coordinates the provided references to life sciences information. Coordination is through an information architecture that defines hierarchical levels and defines links among related information across the hierarchical levels.
In one embodiment, the index service provider uses an indexing or cataloging system, based on the genome itself, that establishes a unified indexing schema or coordinate system. The indexing system provides a common reference system by which otherwise disparate blocks of information can be associated with one another.
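The idea of a genome-based coordinate system that ties otherwise disparate blocks of information together can be sketched as follows. This is a minimal illustration only, not the disclosed indexing system: the class names `GenomicInterval` and `CoordinateIndex`, the half-open interval convention, and the sample references are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class GenomicInterval:
    """Hypothetical half-open interval [start, end) on one chromosome."""
    chromosome: str
    start: int
    end: int

    def overlaps(self, other: "GenomicInterval") -> bool:
        # Two intervals overlap only if they share a chromosome and a base.
        return (self.chromosome == other.chromosome
                and self.start < other.end
                and other.start < self.end)


@dataclass
class CoordinateIndex:
    """Hypothetical index mapping genome coordinates to information references."""
    entries: List[Tuple[GenomicInterval, str, str]] = field(default_factory=list)

    def add(self, interval: GenomicInterval, layer: str, reference: str) -> None:
        self.entries.append((interval, layer, reference))

    def lookup(self, query: GenomicInterval) -> List[Tuple[str, str]]:
        """Return (layer, reference) pairs whose interval overlaps the query."""
        return [(layer, ref) for iv, layer, ref in self.entries
                if iv.overlaps(query)]


# Illustrative data only; the references are made-up labels.
index = CoordinateIndex()
index.add(GenomicInterval("chr7", 1000, 5000), "genomics", "gene-record-A")
index.add(GenomicInterval("chr7", 4000, 4500), "proteomics", "protein-note-B")
index.add(GenomicInterval("chr2", 1000, 5000), "genomics", "gene-record-C")

hits = index.lookup(GenomicInterval("chr7", 4200, 4300))
```

Because every entry is keyed to the same coordinates, a single query returns references from more than one discipline, which is the sense in which the genome serves as a common reference system. A linear scan is used here for brevity; a practical index would likely use an interval tree or a database.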
In yet another aspect, the system provides a life sciences network portal system employing at least one networked computer system that defines the portal. Users may access the networked computer system through the portal to conduct life sciences research. The portal system includes a workflow system that is operable to allow a user to prescribe and track the performance of a series of steps associated with that user's life sciences research.
The system includes a data store of life sciences information accessible through the portal, as well as a product specifying system that identifies offered products useful in connection with performing the series of steps. An indexing mechanism associated with the networked computer system mediates relationships among the workflow system, the data store of life sciences information and the product specifying system.
According to a further aspect, the life sciences laboratory system employs at least one networked computer system that defines a virtual research environment accessible to a user through a portal associated with the networked computer system. The computer system is configured according to a framework that defines a common communication interface to a plurality of different life sciences laboratory equipment. The framework further defines a virtual laboratory equipment interface presented through the portal, whereby the user may interact with selected ones of the plurality of different life sciences laboratory equipment.
The framework allows users to establish working links between plural different components of otherwise incompatible life sciences equipment that may be located anywhere in the world.
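One common way to present a single interface over incompatible equipment is an adapter layer. The sketch below illustrates that general pattern under stated assumptions; it is not the framework of the disclosure. The class names, the command names `start` and `stop`, and the native command strings are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class InstrumentAdapter(ABC):
    """Hypothetical common interface; subclasses translate to native commands."""

    # Maps common command names to an instrument's native commands.
    native_commands: Dict[str, str] = {}

    def run(self, command: str) -> str:
        if command not in self.native_commands:
            raise ValueError(f"unsupported command: {command}")
        return self.send_native(self.native_commands[command])

    @abstractmethod
    def send_native(self, native: str) -> str:
        """Deliver a native command; here it only returns a description."""


class VendorASequencer(InstrumentAdapter):
    native_commands = {"start": "SEQ_BEGIN", "stop": "SEQ_END"}

    def send_native(self, native: str) -> str:
        return f"vendorA:{native}"


class VendorBReader(InstrumentAdapter):
    native_commands = {"start": "begin-read", "stop": "end-read"}

    def send_native(self, native: str) -> str:
        return f"vendorB:{native}"


# The caller issues the same command to both instruments.
lab = [VendorASequencer(), VendorBReader()]
results = [instrument.run("start") for instrument in lab]
```

The caller never sees the native command sets, so adding a further instrument type would only require another adapter class.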
Still further, a life sciences workflow management system employing at least one networked computer system is configured to provide a workflow interface to a user through a portal. The workflow interface is operable to allow a user to prescribe and track the performance of a series of steps associated with life sciences research. The system employs a data store associated with the networked computer system into which the user stores a set of user-specified data for bioinformatics processing. At least one processor associated with the networked computer system is configured to perform bioinformatics processing upon the user-specified data. The workflow interface has a user interaction mechanism whereby the user can manipulate user-specified data stored in the data store and whereby the user can control the performance of the bioinformatics processing.
Further areas of applicability of the present system will become apparent from the detailed description provided hereinafter. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The present system will become more fully understood from the detailed description and the accompanying drawings, wherein:
The following description of the various embodiment(s) is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
An information system is illustrated diagrammatically in
It is anticipated that many practical implementations of the information system 20 will consist of a collection of computer and information systems that are distributed across a network such as the Internet. In this regard, the virtual community 22 may be implemented using one or more servers associated with a service provider, such as bioinformatics service provider 28. As will be illustrated by example in connection with
The illustrated bioinformatics service provider 28 may, itself, have a collection of life sciences information 30 that users of virtual community 22 may have access to. In one presently preferred embodiment the life sciences information 30 may include information at various levels, e.g., genomics, pharmacogenomics, proteomics, cellular biology and cheminformatics information. The information may be extracted from a variety of data sources and in a variety of data formats. Such formats include, but are not limited to: the FASTA format, the GenBank/EMBL/DDBJ format, the SWISS-PROT format, the Pfam format and the PROSITE format.
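As an illustration of reading one of the listed formats, the following is a minimal sketch of a FASTA reader. It assumes only the basic FASTA convention (a header line beginning with `>` followed by one or more sequence lines); the function name `parse_fasta` and the sample records are made up for the example, and a production system would more likely rely on an established parsing library.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            # The identifier is the first token after '>'.
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up sample data for illustration.
example = """>seq1 hypothetical example record
ACGTAC
GTTA
>seq2
GGCC
""".splitlines()

records = parse_fasta(example)
```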
Bioinformatics service provider 28 may also have a collection of predefined workflow patterns 32 that are made accessible to users of virtual community 22 for use in conducting biological and life sciences research. Examples of such workflow patterns will be presented in connection with
The virtual community 22 is preferably configured so that its users can also access resources that are not necessarily associated with bioinformatics service provider 28. Thus, users of virtual community 22 can access life sciences information 36, workflow patterns 38 and research appliances 40 that are made available on the network by third parties or by other members of the virtual community 22.
As will often be the case, the biological or life sciences researcher will have a particular technical discipline or technical field of endeavor that defines much about that researcher's experiments conducted in virtual laboratory 24. However, the biological and life sciences represent a vast body of knowledge that spans numerous scientific fields of endeavor. The virtual community 22 is designed with this in mind. Thus, as illustrated at 22a, the virtual community 22 preferably comprises an N-dimensional space that may be diagrammatically depicted as layers each corresponding to a different biological or life sciences discipline. At 22a, the following disciplines are illustrated: genomics, pharmacogenomics, proteomics, cellular biology and cheminformatics. The virtual community 22 is configured using an information indexing system that allows a researcher working primarily in one field of endeavor (one layer) to “tunnel” up or down to access resources or information that are defined primarily for other disciplines (other layers). Thus, a genomics researcher can use the virtual community 22 to acquire proteomics information that may be useful in an experiment that the researcher is conducting in the genomics field.
The gene indexing system of a presently preferred embodiment builds upon the coordinate system 44 to include relational links among diverse collections of information that correspond to the information layers illustrated at 22a in
The information system employs a layered information architecture illustrated at 60 in
According to the presently preferred information architecture, raw data is acquired at the data acquisition layer 64. As shown by the adjacent information scale, the acquired data is typically raw data of the type produced by research appliances 34. In the illustrated example, two such appliances 34 are shown. One is connected through a laboratory information management system 66 and the other through a suitably configured application program interface (API) 68. The appliances 34 communicate the raw data over a suitable network such as network/Internet 70, thereby making the raw data from appliances 34 accessible to the information system as raw data element 72. Note that the raw data element 72 is initially acquired by the data acquisition layer 64. Thereafter, data element 72 is passed or made available to the indexing and data conversion layer 74 where one or more bioinformatics tools 76 are applied to convert the raw data element 72 into scientific information data element 78.
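The hand-off from the data acquisition layer to the indexing and data conversion layer can be pictured with a short sketch. The code below is illustrative only: the `DataElement` type, the `acquire` and `convert` functions, and the two stand-in "tools" are assumptions, and real bioinformatics tools would do far more than normalize text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataElement:
    """Hypothetical data element tagged with its level on the information scale."""
    level: str
    payload: str
    source: str


def acquire(raw_signal: str, appliance_id: str) -> DataElement:
    """Data acquisition layer: wrap appliance output as a raw data element."""
    return DataElement(level="raw", payload=raw_signal, source=appliance_id)


def convert(element: DataElement,
            tools: List[Callable[[str], str]]) -> DataElement:
    """Conversion layer: apply each tool in order to produce scientific information."""
    payload = element.payload
    for tool in tools:
        payload = tool(payload)
    return DataElement(level="scientific", payload=payload, source=element.source)


# Stand-in tools (assumptions): trivial text normalization only.
def strip_whitespace(text: str) -> str:
    return "".join(text.split())


def to_upper(text: str) -> str:
    return text.upper()


raw = acquire(" acgt ttga\n", "appliance-34a")
scientific = convert(raw, [strip_whitespace, to_upper])
```

Each layer returns a new element rather than mutating its input, so the raw data element remains available after conversion.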
The scientific information data element represents a higher form of information on information scale 62. It is within the indexing and data conversion layer 74 that the gene indexing system 42 (
After processing at the indexing and data conversion layer 74, data element 78 is passed or made available to the life sciences portal layer 80. It is within this layer that much of the analytical work is performed by the researcher. The researcher uses a workspace 82 defined within the virtual laboratory 24 (see
Once the researcher has completed his or her analysis, the analyzed information data element 86 may be passed to the collaboration layer 88 where that element may be made available to others as a shared information data element 90. The shared data element 90 may be made available to others by placing it in a public location or shared workgroup location within the virtual community 22 (
Referring again to
The information system illustrated in
Each of the aforementioned pages or screens provides a different type of functionality, which will now be explained through the use case example illustrated in
The researcher then selects all or a portion of the result set and places it into the workspace page 108. To assist the researcher in a systematic analysis, a suitable workflow template may be loaded into workspace 108 by accessing the workflow's page 106. In addition, the researcher may elect to couple his or her in silico research (contained on workspace page 108) to a research appliance or instrument 110. In this regard framework 112 provides the necessary control and data connectivity to allow the user to control and obtain raw data from instrument 110 without the need to directly invoke the instrument control functions in the native instrument's control language. Rather, framework 112 provides a universal structured control language by which instrument 110 may be controlled and the results transmitted to the storage location specified by the researcher on the workspace page 108. The actual data storage may be assigned to a storage location associated with the virtual data store 26 (
Workspace page 108 can be used to perform many of the information processing tasks associated with the layered information architecture shown in
In some instances a given workflow template will specify that certain bioinformatics tools 76 should be utilized upon the data set being analyzed within the workspace page 108. Such analyses can be performed within workspace page 108; however, a presently preferred embodiment allocates the more computationally intensive bioinformatics tasks to a separate page designated as the workbench page 104. Results of bioinformatics processing effected on workbench page 104 can be sent back to the workspace page 108, or optionally, to an electronic notebook page 116. The electronic notebook page provides the researcher with a convenient place to store personal notes about his or her research that are not necessarily intended for sharing within the workspace page 108.
Much of the power of the information system lies in its ability to integrate information from diverse sources, across multiple scientific disciplines, and to coordinate experimental research through workflows. To further illustrate these concepts, two exemplary workflows will now be described in connection with
Meanwhile, the chromosome regions saved in workspace 202 represent linkage regions that may be converted at step 210 to three gene lists. A data union operation is performed on the gene lists at 212 and the result is converted at 214 to a transcript text.
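The data union operation on the three gene lists can be illustrated with a few lines of code. This is a sketch under assumptions: the function name `union_gene_lists` and the gene identifiers are placeholders, and preserving first-seen order is a design choice made here, not something the workflow specifies.

```python
from typing import Iterable, List


def union_gene_lists(*gene_lists: Iterable[str]) -> List[str]:
    """Merge gene lists, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged: List[str] = []
    for genes in gene_lists:
        for gene in genes:
            if gene not in seen:
                seen.add(gene)
                merged.append(gene)
    return merged


# Placeholder identifiers, one list per linkage region.
region_a = ["GENE1", "GENE2"]
region_b = ["GENE2", "GENE3"]
region_c = ["GENE4", "GENE1"]

merged = union_gene_lists(region_a, region_b, region_c)
```

A plain `set` union would give the same membership; the ordered version is used so that the merged list is reproducible when it is later converted or displayed.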
Meanwhile, at step 216 the researcher selects the Panther protease inhibitors program, which can be acquired through the search page 102 (
The saved transcript list is then converted at step 226 into GEx assays and the desired assays (GEx AoD, and GEx AbD) are selected at steps 228 and 230, with the resulting assay list being stored at 232 to comprise the GEx assays list.
Once the assays list is stored, it can be used to access an e-commerce and e-purchasing system to obtain the physical assay kit and associated supplies for conducting wet laboratory research based on the information developed.
In order to accomplish the workflow outlined in
The data analysis workflow example of
The functions required to perform the data analysis workflow of
Another workflow illustrated in
Referring to
In the illustrated hardware implementation, users interact with the information system 400 by accessing it over the Internet 402 using a suitable browser 404. The information system 400 is coupled to the Internet as at 406. Although a single Internet connection may be utilized, the illustrated embodiment shows how a second Internet connection as at 408 can be employed to connect different parts of the information system to the Internet. As illustrated, connection 406 couples a portion of the server subsystems through a distribution server 410, also designated as Big/IP 410. Big/IP system 410, in turn, supplies multiple TCP/IP connections as at 412 to the web front end system 414. Web front end system 414 comprises a plurality of servers that may be configured to provide different website functionality. In
Internet connection 408 couples to an e-commerce system 418, that includes an e-commerce store server 420, a business database 422 and a selector server 424 that functions to integrate the store server with the business database.
The lab front end 414 is coupled through a second Big/IP system 426 to a sequence retrieval system 428. The sequence retrieval system (SRS) includes a data store 430 containing gene sequence data. The SRS system 428 is coupled to a collection of servers identified as the compute farm 432. These servers perform various bioinformatics processes upon the sequence data within data store 430. For example, the compute farm could perform a BLAST search upon the sequence data.
Associated with the SRS back end system 428 is a workspace file structure 434 into which the user workspace information is stored. In the illustrated embodiment, the workspace file structure 434 allows workspace information to be conveniently stored for later retrieval and use by the user through browser 404. In this regard, the web front end 414 includes a workspace servlet 436 that provides workspace manipulation functionality at the browser 404. In the illustrated embodiment, the servlet 436 provides workspace chooser functionality within browser 404, as illustrated at 438. Servlet 436 also provides workspace explorer functionality at 440. The chooser functionality 438 allows a user to identify locations within the workspace file structure 434 for saving information. Conversely, the explorer functionality 440 gives the user access to the workspace files 434 for information retrieval and subsequent manipulation operations such as moving or renaming information.
Other functionality may also be provided using servlet technology. Thus, as illustrated, the map viewer functionality may be provided using servlet 441. The map viewer will be illustrated in greater detail below.
The information system 400 further includes a business database 442 that is used to store user information and session information as well as system utilization information. Access to the information system 400 is mediated by an access control module identified as eRights server 444. The eRights server is coupled to business database 442 and also to the web front end 414. In an exemplary embodiment, the system provides different levels of user access. In a first level a user is entitled to only view certain information available through the various websites available to the web front end 414. At a next higher level a user is authenticated and given access to additional functionality, which may include access to workspace files within workspace file structure 434 and access to other features of the system as previously described. At a third and yet higher level the user is also given access to certain premium data files, such as data files associated with the Celera Discovery System (CDS). The eRights server 444 is utilized to ascertain the user's identity, authenticate the user and then grant the user access to whatever level of use the user is entitled to enjoy.
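The three levels of access described above can be modeled as an ordered set of tiers. The following sketch is a simplified illustration and does not represent the eRights server or its interfaces: the `AccessLevel` names, the resource labels, and the `is_allowed` function are assumptions, and authentication itself is out of scope here.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical tiers, ordered so that higher levels include lower ones."""
    VIEWER = 1         # may view certain website information only
    AUTHENTICATED = 2  # additionally may use workspace files
    PREMIUM = 3        # additionally may use premium data files


# Minimum level required for each resource class (labels are made up).
REQUIRED_LEVEL = {
    "public_pages": AccessLevel.VIEWER,
    "workspace_files": AccessLevel.AUTHENTICATED,
    "premium_data": AccessLevel.PREMIUM,
}


def is_allowed(user_level: AccessLevel, resource: str) -> bool:
    """Grant access only when the user's level meets the resource's minimum."""
    required = REQUIRED_LEVEL.get(resource)
    if required is None:
        return False  # unknown resources are denied by default
    return user_level >= required
```

For example, `is_allowed(AccessLevel.AUTHENTICATED, "workspace_files")` evaluates to `True`, while the same user is refused `"premium_data"`. Denying unknown resources by default is a conservative choice made for this sketch.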
As previously described, the information system 400 provides a useful set of workflow tools or protocols that allow the researcher to organize his or her research and to integrate that research with the work of others. This workflow or protocol functionality is provided by a workflow JSP (Java Server Page) server 446 that is coupled to the web front end and also to the business database 442. Workflows or protocols are stored by the workflow JSP server 446 and may be served to selected web pages or frames within web pages on the user's browser 404. Additionally, workflows may be downloaded to a user's biotechnology instrument, personal computer, or networked instrument system. As previously described, these workflows identify predetermined steps that a user of the system may wish to follow when conducting research. At each step, the user is presented with convenient information and/or access to the e-commerce systems to purchase materials needed for conducting further research.
The e-commerce system illustrated in
As previously discussed, the system is capable of providing collaboration among users to promote virtual communities and to foster more advanced research. Sharing of information is possible through the workspace files 434. This may be implemented using the eRights server 444. The eRights server can give any designated user access to another designated user's workspace files. In this way, those two users can collaborate with one another. The eRights server 444 can also give access to selected users to the workflow JSP server, to allow authenticated users to upload and thereby share workflows with one another. The uploaded workflows would be stored in business database 442, for example.
In the illustrated implementation, there are various protocols by which data may flow. The SRS back end 428 may be configured to provide HTML data that is then proxied through the web front end 414 for display on one or more of the web server sites within the web front end. Alternatively, the SRS back end and web front end may communicate with each other using XML data. In this use, the web front end 414 treats the SRS back end 428 as a data store from which it retrieves information for display on one or more of its websites. In addition, the web front end 414 and the SRS back end 428 are both configured to communicate through respective connections 456 and 458 with the business database 442. Such communication may be by direct SQL query, for example.
Having thus described an exemplary hardware embodiment of the information system, an exemplary web portal implementation will now be described. In this regard, it will be appreciated that any web implementation involves design decisions regarding how the site will appear and how the user will navigate the site. Thus, the illustrated embodiment shown in
An exemplary workflow map is shown in
The access control system thus implemented allows different levels of content to be provided to different levels of users. For example, proprietary genome data may be provided solely to subscribers based on the need of the subscribers for privacy in their research, and based on contractual obligations relating to the proprietary nature of the data and its use. Also, publicly available genome data may be provided to all users, as this data could alternatively be accessed through other sources. Further, the registration process allows users to be accurately identified, so that related users may share a common workspace, while privacy is still maintained. Thus, users who are reluctant to have their research patterns tracked by others monitoring Internet traffic can perform research in a secure environment. Simultaneously, the system can service other users accessing publicly available data.
As illustrated in
In addition to the searching capability, the homepage illustrated in
The myScience site provides a research environment that gives users multiple ways to search for genomic information and genomic products. Illustrated in
As illustrated in
As illustrated, the map viewer correlates visually the selected information at different hierarchical levels. The user can readily expand or contract the view to “zoom in” or “zoom out” as needed by view control 513. In this regard,
Once a useful assay has been identified, the user can conveniently select it for purchase by clicking on the assay and interacting with the shopping cart basket as depicted at 514 in
In addition to the functions and features described above, the information system supports a rich environment for creating and sharing workflows to assist the researcher and to promote collaboration. If desired the information system can be implemented to include a workflow framework having tools with which a user can create new workflows and modify existing workflows. Such a workflow framework embodiment is shown in
The stages also include linking variables with which one stage is linked to another, as illustrated by the workflow arrows a, b and c in
The individual steps or rules within each stage can be used to effect a variety of different operations or data manipulations. The steps may be either passive steps, which merely provide instructional information to the researcher, or active steps, which perform or launch data manipulation steps performed by the researcher's workstation or elsewhere.
In
According to the workflow framework, the individual workflow stages may be stored as separate objects or components that may be linked together in a variety of different ways, to create new workflows, or to modify existing workflows. In addition, the individual data members and the associated steps or rules can be edited or modified by a user to create new workflows or to modify existing workflows. The framework can be implemented in a variety of different software platforms. If desired, the workflow stages, and the associated objects, components, steps and rules may be expressed using XML. This XML description of a workflow thus defines the workflow in terms of the workflow stages involved. From this description the actual implementation or instantiation of the workflow is constructed and made available to end users via the portal described above.
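To make the XML-based description of workflow stages more concrete, the sketch below parses a small workflow document and derives the order in which linked stages would be visited. The XML vocabulary shown (`workflow`, `stage`, `step`, and the `id`, `next` and `kind` attributes) is entirely hypothetical; the disclosure states only that stages, steps and rules may be expressed using XML, not what the schema looks like.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical workflow description; element and attribute names are assumptions.
WORKFLOW_XML = """
<workflow name="example">
  <stage id="a" next="b">
    <step kind="passive">Read the protocol notes.</step>
  </stage>
  <stage id="b" next="c">
    <step kind="active">run-search</step>
  </stage>
  <stage id="c">
    <step kind="passive">Review results.</step>
  </stage>
</workflow>
"""


def load_stages(xml_text: str) -> Dict[str, dict]:
    """Parse each stage into its link to the next stage and its steps."""
    root = ET.fromstring(xml_text)
    stages: Dict[str, dict] = {}
    for stage in root.findall("stage"):
        stages[stage.get("id")] = {
            "next": stage.get("next"),  # None when the stage is terminal
            "steps": [(step.get("kind"), (step.text or "").strip())
                      for step in stage.findall("step")],
        }
    return stages


def stage_order(stages: Dict[str, dict], first: str) -> List[str]:
    """Follow the 'next' links from the first stage, stopping on a cycle."""
    order: List[str] = []
    current = first
    while current in stages and current not in order:
        order.append(current)
        current = stages[current]["next"]
    return order


stages = load_stages(WORKFLOW_XML)
order = stage_order(stages, "a")
```

Storing each stage as a separate entry keyed by identifier mirrors the idea that stages are separate components: relinking a workflow amounts to changing `next` values rather than rewriting the stages themselves.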
The description of the invention is merely exemplary in nature and, thus, variations that do not depart from the gist of the invention are intended to be within the scope of the invention. Such variations are not to be regarded as a departure from the spirit and scope of the invention.
Number | Date | Country
---|---|---
60495506 | Aug 2003 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 10643204 | Aug 2003 | US
Child | 12207909 | | US