The invention relates to a system and method for dynamically and graphically relating unstructured or un-fielded data with structured or fielded database search results. Both the unstructured data and the structured data may be obtained from suitable database search results.
Current database tools generally allow a user to perform searches on database contents based on structured database contents. For example, entries in a database may be searchable based on certain fields or criteria that have been populated for a particular entry. In addition, database tools exist that offer a user the ability to search database contents based on unstructured database contents. An example of an unstructured search is a text search that looks for a particular word, phrase, or group of words within a database entry. Because the text of a database entry does not appear in any particular field, that text is said to be un-fielded or unstructured. One of the problems with known database management tools is that a database often contains more data than a user can readily process. Analyzing and understanding the data stored in a database efficiently is difficult because the relationships between unstructured data (for example, text) and structured data (for example, fields in a database) are not readily apparent to the user.
In certain embodiments, a computer-implemented method of relating structured data to unstructured data includes the steps of: displaying unstructured data in a first display area; displaying structured data related to the unstructured data in a second display area; and, in response to a change in the display of either the unstructured data in the first display area or the structured data in the second display area, automatically dynamically changing the display in the other of the first display area or the second display area to display data based on its relation to the changed data in the one of the first display area or the second display area.
In certain embodiments, displaying the unstructured data includes performing a search of one or more databases to retrieve the unstructured data displayed in the first display area.
In certain embodiments, the step of displaying the structured data includes retrieving structured data from the one or more databases based on its association with the unstructured data retrieved from the one or more databases.
In certain embodiments, the step of displaying structured data is performed automatically responsive to the step of displaying unstructured data.
In certain embodiments, the step of displaying the unstructured data includes displaying a cluster map of retrieved data in which data that are similar with respect to one or more attributes of the retrieved data are grouped together in the same cluster.
In certain embodiments, the step of displaying the unstructured data includes displaying a classification scheme of the retrieved data in which data that are similar with respect to one or more attributes of the retrieved data are grouped together in the same classification.
In certain embodiments, the step of displaying structured data includes displaying a one-dimensional display based on an attribute of the retrieved data displayed in the first display area.
In certain embodiments, the step of displaying the structured data includes displaying a two-dimensional display based on two attributes of the retrieved data displayed in the first display area.
In certain embodiments, the first display area and the second display area are respective windows in a graphical user interface on a computer display.
In certain embodiments, the displays in any two of the first display area, the second display area, and a third display area are automatically dynamically changed to reflect a change in the display of the remaining one of the first display area, the second display area, and the third display area.
In certain embodiments, the first display area displays a cluster map of documents clustered based on concept indicators associated with each document, the second display area displays a one-dimensional display that displays one attribute associated with the documents in the cluster map displayed in the first display area, and the third display area displays a multi-dimensional display that displays at least two attributes associated with the documents in the cluster map displayed in the first display area.
In certain embodiments, the method further includes receiving a selection of a subset of data in one of the first display area, second display area, or the third display area, and automatically dynamically highlighting the data in the others of the first, second, and third display areas that correspond to the selected subset of data in the one of the first display area, the second display area, and the third display area.
In certain embodiments, the method further includes providing a document viewer display area in which specific documents included in the unstructured data or lists of documents included in the unstructured data may be viewed, wherein the list of documents displayed in the document viewer display area corresponds to a selection made in one of the first display area or the second display area.
In certain embodiments, a system is provided for relating structured data to unstructured data, which includes: a display unit configured to display unstructured data in a first display area; the display unit also configured to display structured data related to the unstructured data in a second display area; and a processing unit configured, in response to a change in the display of one of the unstructured data in the first display area or the structured data in the second display area, to automatically dynamically change the display in the other of the first display area or the second display area to display changed data based on its relation to the changed data in the one of the first display area or the second display area.
In certain embodiments, a computer readable medium is provided having program code recorded thereon that, when executed on a computing system, relates structured data to unstructured data, the program code including: code for displaying unstructured data in a first display area; code for displaying structured data related to the unstructured data in a second display area; and code for, in response to a change in the display of one of the unstructured data in the first display area or the structured data in the second display area, automatically dynamically changing the display in the other of the first display area or the second display area to display changed data based on its relation to the changed data in the one of the first display area or the second display area.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
In a general aspect, the present invention provides a system, method, and software that dynamically and graphically relates unstructured data to structured data and provides a dynamic display of the relationship between the unstructured data and the structured data.
As shown in
With reference to the flowchart of
Second, the user may use a search or query interface provided by the system through which the user can access and retrieve data from the databases and data sources to which the system is connected (for example, the internal databases 210). One of the features of the system provided herein is that, if the search or query interface of the system is used, the data from external or internal databases or data sources is automatically formatted for use with the system so that no separate importation or formatting process is necessary. One skilled in the art would recognize that, in certain embodiments, a user may use both the first and second methods together to retrieve the data so that the coverage of databases and data sources is maximized.
In step 110, the data that is retrieved responsive to the user's request is processed by the system to provide the interrelated display of the structured data and unstructured data. One skilled in the art would recognize that the data could also be requested by more than one user and all the data so requested may be used for the display provided by the system of the present invention. This could be accomplished by, for example, defining groups or projects so that data could be specified by several users and the processing could be done on all the data that is included in a particular group or project.
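As a minimal sketch of pooling requests within a group or project, the Python below assumes each user's request resolves to a list of document identifiers. The `Project` class, its methods, and the identifiers are hypothetical; the sketch only illustrates taking the de-duplicated union of all users' results before processing.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical group or project shared by several users."""
    name: str
    requests: dict[str, list[str]] = field(default_factory=dict)  # user -> document IDs

    def add_request(self, user: str, doc_ids: list[str]) -> None:
        # Record the documents retrieved for one user's request.
        self.requests.setdefault(user, []).extend(doc_ids)

    def pooled_documents(self) -> list[str]:
        # Union over all users, de-duplicated, keeping first-seen order.
        pooled = dict.fromkeys(
            doc_id for doc_ids in self.requests.values() for doc_id in doc_ids
        )
        return list(pooled)


project = Project("battery-landscape")
project.add_request("alice", ["US1", "US2"])
project.add_request("bob", ["US2", "EP7"])
print(project.pooled_documents())  # ['US1', 'US2', 'EP7']
```

All later processing (harmonization, clustering, display) would then run on the pooled list rather than on any single user's results.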
Initially, the data that is retrieved is harmonized so that data that is retrieved from different databases or data sources is treated consistently by the system. For example, the structured fields associated with documents from different databases may have slightly different field names or formats. Therefore, the process of harmonization may change some of these field names to a standard name for fields of a certain type or update a reference table that shows the interrelationships between the different field names so that the subsequent processing of the data treats the similar fields semantically the same way even if the field names or formats are different across the different databases or data sources that are accessed by the system.
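The reference-table approach to harmonization described above can be sketched as follows. The alias table, the field names, and the `harmonize` function are hypothetical; a real system would likely load the table from configuration and might also normalize date or name formats.

```python
# Hypothetical reference table: source-specific field name -> standard field name.
FIELD_ALIASES = {
    "assignee": "assignee",
    "applicant": "assignee",
    "patent_owner": "assignee",
    "pub_date": "publication_date",
    "publication date": "publication_date",
}


def harmonize(record: dict[str, str]) -> dict[str, str]:
    """Rename known fields to their standard names; keep unknown fields as-is."""
    out: dict[str, str] = {}
    for name, value in record.items():
        # Case- and whitespace-insensitive lookup of the field name.
        key = FIELD_ALIASES.get(name.strip().lower(), name)
        # Note: if two source fields map to one standard name, the later one wins.
        out[key] = value.strip()
    return out


print(harmonize({"Applicant": " Acme Corp ", "Pub_Date": "2008-05-01"}))
# {'assignee': 'Acme Corp', 'publication_date': '2008-05-01'}
```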
In step 275, the document vectors are used to cluster documents together based on the similarity of their document vectors. In addition, ordination, K-means, and/or other clustering techniques well known to those skilled in the art may be used. Clustering techniques that may be used include hierarchical clustering, nearest neighbor, support vector machines, and self-organizing maps.
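K-means is one of the techniques named above. The sketch below is a plain Lloyd's k-means in Python, assuming each document has already been reduced to a numeric vector (here, hypothetical counts of three concept indicators). It uses squared Euclidean distance and a deterministic farthest-point initialization; neither choice comes from the source.

```python
Vector = list[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two document vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[Vector], k: int, max_iter: int = 50) -> list[int]:
    """Lloyd's k-means: return a cluster label (0..k-1) for each document vector."""
    # Farthest-point initialization keeps the sketch deterministic (an assumption).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each document joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical document vectors: counts of three concept indicators per document.
docs = [[5, 0, 1], [4, 1, 0], [0, 5, 4], [1, 4, 5]]
print(kmeans(docs, k=2))  # [0, 0, 1, 1]
```

Each resulting label would correspond to one cluster in the research landscape display.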
Returning to
In the research landscape display 510 (
In certain embodiments, the research landscape display may instead display the unstructured data (for example, documents) arranged in a classification scheme in which a document is classified into one of the categories or groups of the classification scheme.
The structured data related to the unstructured data needs to be organized so that it can be displayed in one or more display areas (i.e., a second and/or third display area or additional display areas). In one embodiment, the structured data related to the unstructured data may be displayed using a one-dimensional display, such as a bar chart. For example, if the documents retrieved are patents, the bar chart may display the assignees of the patents, with the length of each bar indicating the number of patents assigned to that assignee. It should be noted that there could be multiple instances of any one of the display areas discussed herein. For example, multiple bar charts (based on different attributes) or multiple research landscape displays could be provided in certain embodiments.
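Assuming each retrieved document carries its structured fields as a dictionary (a hypothetical representation), the data behind such a bar chart is a count of documents per field value. The function names below are illustrative only, and the text rendering merely stands in for a real charting widget.

```python
from collections import Counter


def attribute_counts(docs: list[dict[str, str]], attribute: str) -> list[tuple[str, int]]:
    """Number of documents per value of one structured field, largest first."""
    return Counter(d[attribute] for d in docs if attribute in d).most_common()


def render_bars(counts: list[tuple[str, int]], width: int = 20) -> list[str]:
    """Text bar chart: bar length is proportional to the document count."""
    if not counts:
        return []
    top = counts[0][1]  # largest count sets the full bar width
    return [f"{label:<8}{'#' * round(width * n / top)} {n}" for label, n in counts]


patents = [{"assignee": a} for a in ["Acme", "Beta", "Acme", "Gamma", "Acme", "Beta"]]
for line in render_bars(attribute_counts(patents, "assignee"), width=6):
    print(line)
```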
In certain embodiments, the structured data may also be displayed in a two-dimensional display, such as a matrix. In this display, the documents retrieved responsive to the user's request may be classified based on two attributes (which form the axes of the matrix). For example, if the retrieved data is patents, the matrix display may correlate the assignees with the technical fields of the patents so that one can visually assess not only which assignees are active but also the technical fields in which those assignees have focused their patents. Likewise, multiple instances of the two-dimensional display could be displayed at the same time. Furthermore, certain embodiments could also provide a multi-dimensional display having more than two dimensions. For example, graphical constructs such as circle graphs could be used to generate a display that presents information in more than two dimensions.
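The matrix display reduces to a table of document counts for each pair of attribute values. In this sketch, `attribute_matrix` and the field names `assignee` and `field` are hypothetical, and sorting rows and columns alphabetically is an arbitrary layout choice.

```python
from collections import Counter


def attribute_matrix(
    docs: list[dict[str, str]], row_attr: str, col_attr: str
) -> tuple[list[str], list[str], list[list[int]]]:
    """Count documents for every (row value, column value) pair of two fields."""
    cells = Counter((d[row_attr], d[col_attr]) for d in docs)
    rows = sorted({d[row_attr] for d in docs})
    cols = sorted({d[col_attr] for d in docs})
    # Counter returns 0 for pairs that never occur, i.e. empty matrix cells.
    grid = [[cells[(r, c)] for c in cols] for r in rows]
    return rows, cols, grid


patents = [
    {"assignee": "Acme", "field": "Batteries"},
    {"assignee": "Acme", "field": "Batteries"},
    {"assignee": "Acme", "field": "Solar"},
    {"assignee": "Beta", "field": "Solar"},
]
rows, cols, grid = attribute_matrix(patents, "assignee", "field")
print(cols)  # ['Batteries', 'Solar']
for row, counts in zip(rows, grid):
    print(row, counts)
```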
In certain embodiments, the system also provides a document viewer in which a specified document can be viewed in full (or in significant sections). If the user selects a particular document in any one of the other display areas, the document viewer automatically retrieves and displays that document. Alternatively, or in addition, the document viewer may, by default, display a list of the retrieved documents in a searchable and indexed display. A user may then select a document from the list in the document viewer itself so that the document is displayed there.
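A searchable, indexed document list can be built on an inverted index that maps each term to the documents containing it. The sketch below assumes simple lowercase alphanumeric tokenization and treats a multi-word query as requiring all terms (AND semantics); both are assumptions, and `build_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict


def _terms(text: str) -> list[str]:
    # Assumed tokenization: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> IDs of the documents whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in _terms(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """IDs of documents containing every query term (AND semantics), sorted."""
    terms = _terms(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


docs = {
    "D1": "Lithium battery cathode",
    "D2": "Solar cell with battery storage",
    "D3": "Solar tracking mount",
}
index = build_index(docs)
print(search(index, "battery"))        # ['D1', 'D2']
print(search(index, "solar battery"))  # ['D2']
```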
In certain embodiments, the document viewer display area may include several tabs (or other similar indicators) that enable a user to control the documents displayed in the document viewer display area. For example, a “highlighted” tab can be provided which lists the specific documents that are in a selected state in one of the other display areas and this list of specific documents will change each time the selected state changes in one of the other display areas. A “drill down” tab provides a user the ability to drill down on a list of documents or select a specific document for viewing. A “flagged” tab allows a user to select one or more documents that are kept in the document list in the document viewer display area irrespective of the selection state of those documents in the other display areas. Therefore, the flagged documents are kept accessible in the list of documents displayable in the document viewer display area irrespective of the selection state of the documents in one or more of the other display areas.
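The interplay between the “highlighted” and “flagged” tabs can be modeled as a small piece of state. In this hypothetical `DocumentViewer` class, the highlighted list is replaced on every selection change while the flagged set persists. The “drill down” tab is not modeled, and listing flagged documents after the highlighted ones is an assumed ordering.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentViewer:
    """Hypothetical state behind the viewer's 'highlighted' and 'flagged' tabs."""
    highlighted: list[str] = field(default_factory=list)
    flagged: set[str] = field(default_factory=set)

    def on_selection_changed(self, selected_ids: list[str]) -> None:
        # The highlighted list is rebuilt whenever the selection elsewhere changes.
        self.highlighted = list(selected_ids)

    def toggle_flag(self, doc_id: str) -> None:
        # Flagging is independent of any selection state.
        self.flagged ^= {doc_id}

    def document_list(self) -> list[str]:
        # Flagged documents stay listed even after they leave the selection.
        kept = sorted(self.flagged - set(self.highlighted))
        return self.highlighted + kept


viewer = DocumentViewer()
viewer.on_selection_changed(["D1", "D2"])
viewer.toggle_flag("D2")
viewer.on_selection_changed(["D3"])
print(viewer.document_list())  # ['D3', 'D2']
```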
With reference to
It should be noted that the system 200 displays these various display areas, for example, the first, second, third, and document viewer display areas, in a logical workspace. In certain embodiments, the entire workspace, including all the display areas, is displayed on the display of a single computing system or other similar display. Alternatively, the workspace may be physically distributed over two or more computer displays (or other similar displays) so that some of the display areas are displayed on one computer display while the other display areas are displayed on another. However, the display areas remain dynamically interoperable in the manner described herein even if they are physically displayed on different computer or other similar displays. In certain embodiments, a display unit includes a graphical user interface that independently controls and formats the first display area and the second display area. For example, the first display area and the second display area may be separate windows, frames, or panels, or combinations thereof, that are interoperable in the manner discussed herein.
In step 120, the system checks to see if there is any user input. For example, the user may select one of the clusters in the research landscape map or one of the attributes displayed in the structured data displays (for example, the bar chart or the matrix display). If there is no input, the system checks to see if the user has indicated that the session should be terminated in step 130 and if not returns to check for user input in step 120.
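The control flow of steps 120, 125, and 130 can be sketched as a polling loop. The `poll` and `update_displays` callables and the `QUIT` sentinel are hypothetical stand-ins; a real graphical user interface would normally block on an event queue rather than poll, and the sketch assumes control returns to step 120 after each update in step 125.

```python
from typing import Callable, Optional

QUIT = "quit"  # hypothetical sentinel meaning the user ended the session


def run_session(poll: Callable[[], Optional[str]],
                update_displays: Callable[[str], None]) -> int:
    """Loop over steps 120/125/130; return how many inputs triggered an update."""
    updates = 0
    while True:
        event = poll()                      # step 120: is there any user input?
        if event is not None and event != QUIT:
            update_displays(event)          # step 125: update every linked display area
            updates += 1
        elif event == QUIT:                 # step 130: has the session been terminated?
            return updates
        # No input and no termination: return to step 120.


events = iter(["select cluster A", None, "select assignee Acme", QUIT])
log: list[str] = []
print(run_session(lambda: next(events), log.append))  # 2
```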
If user input is detected in step 120, the method proceeds to step 125 in which the display automatically and dynamically changes in response to the user input. For example, if the user selects one of the clusters in the research landscape map in the first display area, that cluster may be highlighted or otherwise indicated in the research landscape map in the first display area. The bar chart in the second display area is also substantially simultaneously updated to reflect the selected cluster in the first display area so that the corresponding data elements in the bar chart are also highlighted or otherwise indicated. Likewise, the matrix display in the third display area is also substantially simultaneously updated to reflect the selected cluster in the first display area. Furthermore, the document viewer may also be updated to reflect or highlight the documents that correspond to the selected cluster in the first display area.
It should be noted that while the above discussion discloses that a change in the first display area is automatically and dynamically reflected in the other display areas, the initial change or selection could be made to any one of the display areas and the other display areas would automatically and dynamically change their display in response.
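One way to make a selection in any display area propagate to all the others, as described above, is to translate the selected element into its set of documents and then let every area highlight the elements that contain those documents. The `DisplayArea` and `Workspace` classes below are a hypothetical sketch that assumes each document falls into exactly one element (cluster, bar, or matrix cell) per display area.

```python
from dataclasses import dataclass, field


@dataclass
class DisplayArea:
    """One linked view; doc_item maps each document ID to the element it falls in."""
    name: str
    doc_item: dict[str, str]
    highlighted: set[str] = field(default_factory=set)

    def items_for(self, doc_ids: set[str]) -> set[str]:
        # Elements (clusters, bars, matrix cells) containing any selected document.
        return {self.doc_item[d] for d in doc_ids if d in self.doc_item}


class Workspace:
    """Broadcasts a selection made in any area to every area, including the source."""

    def __init__(self, areas: list[DisplayArea]) -> None:
        self.areas = {a.name: a for a in areas}

    def select(self, area_name: str, item: str) -> set[str]:
        # Translate the selected element back into the documents it contains,
        source = self.areas[area_name]
        doc_ids = {d for d, v in source.doc_item.items() if v == item}
        # then let every area highlight the elements holding those documents.
        for area in self.areas.values():
            area.highlighted = area.items_for(doc_ids)
        return doc_ids


landscape = DisplayArea("landscape", {"D1": "cluster A", "D2": "cluster A", "D3": "cluster B"})
bars = DisplayArea("assignees", {"D1": "Acme", "D2": "Beta", "D3": "Acme"})
workspace = Workspace([landscape, bars])
workspace.select("landscape", "cluster A")
print(sorted(bars.highlighted))       # ['Acme', 'Beta']
workspace.select("assignees", "Acme")
print(sorted(landscape.highlighted))  # ['cluster A', 'cluster B']
```

Because the selection is routed through the shared set of documents, the same code handles a selection made in the landscape map, the bar chart, or any other registered area.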
Once the large answer set 404 has been retrieved, the system 200 provides a multi-window display of the results in which the windows cooperatively display various aspects of the answer set. For example, one of the display areas displays a research landscape of the retrieved documents by clustering the documents into relevant clusters, for example, based on the concept indicators. Other display areas display one or more attributes of the documents in the answer set so that a user may iterate through a discovery stage 408 in which the user analyzes the documents based on the correlated changes in the display areas (which may be GUI windows in certain embodiments). In this way, a user is able to identify relevant documents from a larger and more relevant answer set based on criteria that better match the user's search strategy.
In certain embodiments, the system 200 allows two or more selections to be active in the selected state in one or more of the display areas. If two sets of data are to be displayed in a single display area (because there are two active selected states), the data corresponding to each selection could be color coded differently, or the brightness of the data could be varied to reflect the selected state to which the data corresponds. Data that belongs to both selected states could easily be tracked by displaying it in a third color that may correspond to a combination of the colors for the other two selected states.
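A minimal sketch of the color coding for two active selections follows, assuming RGB colors and taking the channel-wise average as the combination color for data in both selections. The averaging rule, the specific colors, and the function names are assumptions, not part of the source.

```python
Color = tuple[int, int, int]

RED: Color = (255, 0, 0)        # hypothetical color for the first selection
BLUE: Color = (0, 0, 255)       # hypothetical color for the second selection
GREY: Color = (160, 160, 160)   # unselected data


def blend(a: Color, b: Color) -> Color:
    """Channel-wise average of two RGB colors (the assumed combination rule)."""
    return (a[0] + b[0]) // 2, (a[1] + b[1]) // 2, (a[2] + b[2]) // 2


def color_for(doc_id: str, first: set[str], second: set[str],
              color_a: Color = RED, color_b: Color = BLUE) -> Color:
    """Pick a display color depending on which selected states contain the document."""
    in_a, in_b = doc_id in first, doc_id in second
    if in_a and in_b:
        return blend(color_a, color_b)  # belongs to both selections: third color
    if in_a:
        return color_a
    if in_b:
        return color_b
    return GREY


print(color_for("D1", {"D1", "D2"}, {"D1"}))  # (127, 0, 127)
```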
Accordingly, display area 510 (shown in
Display area 520 (shown in
Display area 530 (shown in
Display area 540 (shown in
Further details of each of these display areas and their interaction is provided with respect to
In the two dimensional display area 530A (shown in
Furthermore, the document viewer display 520 typically displays a listing of only the documents that belong to the selected cluster 511 in landscape map 510. Document viewer display 520 also includes a flag icon 521 which allows a user to “flag” specific documents so that the document viewer display 520 keeps a flagged document irrespective of a selection state of the documents based on a selection or a change in selection of the documents in any one or more of the other display areas.
Therefore, each of the other display areas automatically and dynamically changes its display to highlight or indicate data points that correspond to a list of documents selected in any one of the display areas. Furthermore, whenever the selected data in any one of the display areas is changed, the other display areas also change automatically at substantially the same time to reflect the change in that display area (for example, based on the changed selection of documents). Therefore, a user can easily visually analyze not only the documents in a research landscape map but also the attributes associated with the specific documents selected in the research landscape map 510.
Therefore, some of the benefits of the display and analysis system and method disclosed herein are that accurate and cleaned data can be used to improve an answer set derived from a search of multiple relevant databases or data sources. The data can then be visualized in multiple displays, each of which can display one or more attributes of the data or documents in the answer set. Furthermore, intelligent analysis can be performed by changing the selections as well as the attributes so that each of the display areas automatically and dynamically changes its display to show data that corresponds to the documents in the particular selected state in one of the display areas. Furthermore, the selection of documents and the choice of attributes can be changed iteratively while the displays in all the other display areas change automatically to reflect the selection change in any one of the display areas.
Furthermore, it should be appreciated that it is within the abilities of one skilled in the art to program and configure a networked computer system to implement the method and system discussed earlier herein. The present invention also contemplates providing a computer readable data storage medium with program code recorded thereon (i.e., software) for implementing the method steps described earlier herein. Programming the method steps discussed herein using custom and packaged software is within the abilities of those skilled in the art in view of the teachings disclosed herein. Furthermore, it should be recognized that data signals that embody one or more of the software instructions to implement the method disclosed herein are also within the scope of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from a consideration of the specification and the practice of the invention disclosed herein. It is intended that the specification be considered as exemplary only, with such other embodiments also being considered as a part of the invention in light of the specification and the features of the invention disclosed herein. Furthermore, it should be recognized that the present invention includes the methods and system disclosed herein together with the software and systems used to implement the methods and systems disclosed herein.