A. Field of the Invention
The present invention relates to a medium containing information gathered from material including a source, and a data processing system for generating content for the medium and permitting access to the content.
B. Description of the Related Art
The communication and manipulation of ideas is limited by the forms in which they can be packaged and transported. Books in their modern, codex form are a substantial improvement on earlier forms in the amount of information that can be packaged together, the portability of that information, the speed with which the information can be accessed, and its suitability for commerce. A typical book might consist of 400 pages, contain 160,000 words, and weigh 4 pounds. It is possible to find books larger or smaller than this by perhaps a half-order of magnitude (factor of 3). Beyond this range, larger material tends to be broken into separate book volumes, as in encyclopedias, and smaller material tends to be grouped into book volumes, as in journals of scientific articles or collections of short stories.
Essentially, the size of books in terms of physical form and number of pages is determined first by what a reader finds convenient to carry and second by what the publisher finds economical to publish and distribute. Very large books or very expensive books exist, but tend to have limited markets and distribution. On the other hand, paperback pocket books, the books of truly mass circulation, conform carefully to a portable size and economical cost.
The cost in time of accessing information in a book is much lower than accessing information outside the book, such as the contents of other publications the book references. Access to additional material not previously assembled may mean a trip to the library or ordering from a publisher, processes requiring hours or even weeks. Moreover, even if all the referenced contents have been assembled, they would not share the book's portability, i.e. they could not be readily packed off to the beach or taken home from work.
These limitations on book size mean that it is not practical to publish a book together with the contents of the material it cites. Yet, references are often pursued as a consequence of reading the book. This use of books is part of a larger process called knowledge crystallization.
Knowledge crystallization includes collecting information, making sense of it, and authoring some new work based on the research and insight. An example would be writing a scientific research paper or authoring a business slide presentation.
The idea of electronic, hyperlinked books exists. For example, D. C. Engelbart, “Augmenting Human Intellect: A Conceptual Framework,” Stanford Research Institute, Menlo Park, Calif. AFOSR-3223 (October 1962); T. H. Nelson, Literary Machines. Swarthmore, Pa.: Self-published (1981); and N. Yankelovich et al., “Intermedia: The Concept and Construction of a Seamless Information Environment,” IEEE Computer, vol. 21, pp. 81–96, 1988, developed hypertext systems in which documents were related to each other through links. Engelbart and Nelson's systems, however, emphasized merely linking in a new document that references other documents already in the system, and the links in the Engelbart, Nelson, and Van Dam systems must be explicitly authored.
J. R. Remde et al., “Superbook: An Automatic Tool for Information Exploration,” (1987) (presented at ACM Hypertext '87 Proceedings) and D. E. Egan, J. R. Remde et al., “Behavioral evaluation and analysis of a hypertext browser,” (1989) (presented at ACM CHI '89 Conference on Human Factors in Computer Systems, Austin, Tex.) describe a hyperlinked “Superbook” with integrated fisheye visualization and indexing. Creating an electronic Superbook from an existing paper statistics manual resulted in improved access time for information.
There are currently many electronic, hyperlinked books on the market. Typical of the genre are T
E. G
The prior systems, however, fail to adequately provide a user quick access to information related to a source material. Further, the prior systems fail to provide a visualization of source material and information related to a source material that can maximize the user's understanding of the material.
Systems and methods consistent with the present invention significantly improve a reader's ability to understand information provided in a source material and related secondary material. For example, systems and methods consistent with the present invention provide a medium including information regarding features of a source material and features of secondary materials related to the source material. Collecting the information on a medium permits quick access to the information.
In addition, information regarding features of a source material and features of secondary materials related to the source material can be graphically displayed in color and arranged to form patterns at a large scale, thereby aiding in the exploration of information contained in the medium. Unlike a physical book, the information can be manipulated and analyzed not just by the reader, but also by statistical processes. Thus, systems and methods consistent with the present invention can make specific recommendations for reading based on the user's indication of items of interest in the medium.
In accordance with methods consistent with the present invention, a method is provided for producing a storage medium that provides information regarding a source material. The method comprises the steps of gathering features of the source material, accessing secondary materials related to the features, gathering features of the secondary materials, determining attributes of the gathered materials, analyzing the attributes based on a predetermined characteristic, and recording information regarding the source material and the secondary materials based on the analysis.
In accordance with another method consistent with the present invention, a method is provided for providing a user interface for graphically displaying information. The method comprises the steps of displaying information regarding a source material and secondary materials, determining a selection of information based on a user input, analyzing the source material, the secondary materials, and the selection of information, and updating the display of information regarding the source material and secondary materials based on the analysis.
In accordance with an apparatus consistent with the present invention, an apparatus is provided for producing a storage medium that provides information regarding a source material. The apparatus comprises a memory including a program, a processor for executing the program, and a storage medium, wherein the program includes instructions to gather features of the source material, access secondary materials related to the features, gather features of the secondary materials, determine attributes of the gathered materials, analyze the attributes based on a predetermined characteristic, and record on the storage medium information regarding the source material and the secondary materials based on the analysis.
In accordance with a user interface consistent with the present invention, an interface is provided for graphically displaying information. The interface comprises a display that displays information regarding a source material and secondary materials, a user interface that determines a selection of information based on a user input and performs an analysis based on the source material, the secondary materials, and the selection of information, and a controller that instructs the display of information regarding the source material and secondary materials to be updated based on the analysis.
A medium produced using principles consistent with the present invention has a format for interacting with an automated information accessing device, the format including information for use in assisting a user to understand a source material, wherein the format includes information produced by a method, the method comprising gathering features of the source material, accessing secondary materials related to the features, gathering features of the secondary materials, determining attributes of the gathered materials, analyzing the attributes based on a predetermined characteristic, and recording, based on the format, information regarding the source material and the secondary materials based on the analysis.
A computer-readable medium produced consistent with the present invention contains instructions for controlling a computer to perform a method for producing a storage medium that provides information regarding a source material, including gathering features of a source material, accessing secondary materials related to the features, gathering features of the secondary materials, determining attributes of the gathered materials, analyzing the attributes based on a predetermined characteristic, and recording information regarding the source material and the secondary materials based on the analysis.
Another computer-readable medium produced consistent with the present invention contains instructions for controlling a computer to perform a method for providing an interface for graphically displaying information, including displaying information regarding a source material and secondary materials, determining a selection of information based on a user input, analyzing the source material, the secondary materials, and the selection of information, and updating the display of information regarding the source material and secondary materials based on the analysis.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the implementations of the invention and together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the construction and operation of an implementation of the present invention which is illustrated in the accompanying drawings. The present invention is not limited to this implementation but it may be realized by other implementations.
A. Overview
Systems and methods consistent with the present invention create a medium containing information related to material including a source and provide an interface to graphically display this information.
Unlike previous mediums linking information related to a source material, an automated process creates a medium consistent with the present invention. The process includes a gathering routine that accesses material, gathers features of the material, and indexes the features as objects. An analysis routine of the process then determines attributes of the objects. A stop routine of the process checks the attributes based on a predetermined characteristic. If the characteristic is found, the objects are provided on the medium. If the characteristic is not found, the stop routine calls the gathering routine again to iteratively seek additional material and features. Because the process is automated, the medium content does not have to be specifically authored. Also, in some cases, all of the features related to a source material could be provided on a medium. In other cases, the analysis routine could involve a statistical process used to limit the number of objects provided on the medium.
The interface simultaneously displays representations of all of the objects provided on the medium to allow a user to see the materials on a large scale. The display includes a representation of objects of the source material in a first area and objects of the secondary materials in a second area. Interaction with the objects permits the objects to be rearranged based on user interest. The display areas are linked so that manipulation of an object in one area will affect a view of the same object in another area, for example. Using the interface, a user can rapidly gain greater understanding of the material.
B. Architecture
A computer system used to create a medium or a data processing system using a medium could be a number of machines, a separate machine, or a portion of a machine. An exemplary computer system is illustrated in
The program to create a medium 108 includes a gathering routine, an analysis routine, and a stop rule routine. Visualization program 109 includes a graphics engine, a user interface that monitors user action and dynamically predicts user interest, a control routine for the visualization, a document database, and a browser routine.
B. Architectural Operation
1. Creating the Medium
Initially, the gathering routine accesses a source material 300 (step 200). Then, the gathering routine parses the content of the source material 300 to find features related to the source material that may be of interest to a user (step 210). As illustrated in
To parse the features, the gathering routine could search the text of the source material or access a previously-extracted feature list. The gathering routine then designates the features as objects. Alternatively, the gathering routine could permit a publisher or an author to intelligently review the gathered information and define the objects using professional judgement, thereby providing or removing materials.
In the example of
Once the features of the source material are extracted as objects, the analysis routine determines attributes of the features (step 220). For example, the medium in
Based on the determined attributes, a stop routine analyzes the attributes to see if an attribute has a predetermined characteristic (step 230). This analysis could ensure that the total amount of information is within a preset limit, such as the capacity of a physical storage medium. Also, the analysis could look for the presence or absence of certain attributes of the source material and secondary materials, such as the presence of selected key words. The stop routine could analyze a plurality of attributes, each associated with different characteristics, and provide a single result through processing.
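A stop routine that folds several attribute checks into one result might be sketched as follows. This is an illustrative Python sketch only; the `Attributes` record, the byte-capacity test, and the required-keyword test are assumptions chosen to mirror the two example characteristics named above, not a prescribed implementation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Attributes:
    """Hypothetical summary of attributes produced by the analysis routine."""
    total_bytes: int                                       # size of everything gathered so far
    keywords_found: Set[str] = field(default_factory=set)  # selected key words seen so far


def stop_rule(attrs: Attributes, capacity_bytes: int, required_keywords: Set[str]) -> bool:
    """Return True when the predetermined characteristic is present (stop gathering).

    Two illustrative characteristics are combined into a single result:
    the gathered data has reached the capacity of the target storage medium,
    or every required keyword has been found. An empty required_keywords
    set is trivially satisfied and therefore stops immediately.
    """
    over_capacity = attrs.total_bytes >= capacity_bytes
    keywords_covered = required_keywords <= attrs.keywords_found
    return over_capacity or keywords_covered
```

Combining the checks with `or` means either characteristic alone halts gathering; a publisher could equally require both, or weight them, without changing the shape of the routine.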
If the stop routine detects the presence of the predetermined characteristic, the stop routine inhibits gathering information for the medium (step 240). Otherwise, the stop routine calls the gathering routine to iteratively access and parse secondary material, thereby locating and processing more information (steps 250 and 260).
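The iterative flow of steps 200 through 260 can be pictured as a breadth-first expansion over references, one generation at a time. In the Python sketch below, `get_references` and `should_stop` are hypothetical stand-ins for the parsing performed by the gathering routine and for the combined analysis and stop routines, and documents are reduced to string identifiers for brevity.

```python
from typing import Callable, List, Set


def build_medium(
    source_id: str,
    get_references: Callable[[str], List[str]],
    should_stop: Callable[[Set[str]], bool],
    max_generations: int = 10,
) -> Set[str]:
    """Gather objects generation by generation until the stop rule fires.

    source_id      -- identifier of the source material (step 200)
    get_references -- hypothetical parser returning the references of a
                      document (steps 210 and 260)
    should_stop    -- hypothetical combined analysis and stop routine
                      (steps 220 through 240)
    """
    objects: Set[str] = {source_id}
    frontier: List[str] = [source_id]
    for _ in range(max_generations):
        if should_stop(objects) or not frontier:
            break  # step 240: inhibit further gathering
        next_frontier: List[str] = []
        for doc in frontier:
            # step 250: access each reference and record it as a new object
            for ref in get_references(doc):
                if ref not in objects:
                    objects.add(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return objects
```

Because the stop test runs between generations, a rule such as `len(objects) >= 4` applied to a reference graph `{"book": ["a", "b"], "a": ["c"], "b": ["c", "d"]}` lets the second generation (`c`, `d`) be gathered in full before it fires; a finer-grained check inside the inner loop would bound the medium more tightly.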
In the example of
The gathering routine accesses the content of the references and sets them as objects (step 250). In
Based on the objects gathered in the gathering routine, as shown in
Regardless of how medium 600 is published, it includes an index pointing to where objects contained in the medium are located for quick access to information in the objects. In the case of
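One simple way to realize such an index is to pack every object into a single contiguous image and record an (offset, length) pair for each object. The layout below is a hypothetical Python sketch, not the format of any particular CD-ROM or DVD file system.

```python
import io
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # object key -> (byte offset, byte length)


def pack_objects(objects: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate object payloads and record (offset, length) for each key."""
    buf = io.BytesIO()
    index: Index = {}
    for key, payload in objects.items():
        index[key] = (buf.tell(), len(payload))
        buf.write(payload)
    return buf.getvalue(), index


def read_object(image: bytes, index: Index, key: str) -> bytes:
    """Jump straight to one object using the index, without scanning the image."""
    offset, length = index[key]
    return image[offset:offset + length]
```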
Other mediums could also be provided. For example, as shown notationally in
For some source materials, medium 600 would ideally contain every conceivable secondary material. Nevertheless, the volume of such secondary material may exceed the maximum amount of information that can be stored on a storage medium, such as a CD-ROM or DVD, or contain so much useless information as to be meaningless. Using manual pruning or statistical analysis to identify attributes of the source and secondary materials, in step 220 medium 600 can include only the most important and relevant secondary materials. In one aspect of the invention, the statistical analysis is always performed. In another aspect of the invention, the statistical analysis is performed only when the gathered information exceeds a predetermined amount, i.e., the stop rule checks the attribute of total size of data collected and, when the total size exceeds a predetermined amount, triggers the statistical analysis by calling the analysis routine.
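The size-triggered variant can be sketched as a greedy selection: if everything fits, nothing is pruned; otherwise the highest-scoring materials are kept while they still fit. The relevance `scores` are assumed to come from one of the statistical analyses described below, and greedy selection is a heuristic chosen here for simplicity; it does not guarantee the best possible subset for a given capacity.

```python
from typing import Dict, List


def prune_to_capacity(sizes: Dict[str, int], scores: Dict[str, float],
                      capacity: int) -> List[str]:
    """Return the ids of the materials that are recorded on the medium.

    sizes    -- bytes needed by each gathered material
    scores   -- hypothetical relevance score per material (higher is better)
    capacity -- bytes available on the storage medium
    """
    if sum(sizes.values()) <= capacity:
        return sorted(sizes)  # everything fits, so the statistical step is skipped
    kept: List[str] = []
    used = 0
    # Visit materials from most to least relevant and keep those that still fit.
    for doc in sorted(sizes, key=lambda d: scores.get(d, 0.0), reverse=True):
        if used + sizes[doc] <= capacity:
            kept.append(doc)
            used += sizes[doc]
    return kept
```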
The analysis routine can use various statistical analyses to determine the attributes of gathered features. Examples of techniques to determine attributes can be found in C. Chen and L. Carr, supra; S. K. C
The analysis routine may use attributes of a cocitation statistical analysis. The cocitation statistical analysis uses a citation index, created by populating an incidence matrix (or citation matrix, an example 900 of which is shown in
For example, a cocitation statistical analysis using the features of references would include a directed graph edge between node Di and node Dj indicating that Di references Dj, i.e., that Dj receives a citation from Di. The value of the cell for row Di and column Dj denotes the number of times document Di refers to document Dj, which is called the citation frequency. In this manner, a citation matrix C illustrates the “reference” relationships and the transpose of the citation matrix, $C^T$, illustrates the “is-referenced-by” relationships. As can be seen in
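Given a citation matrix $C = (c_{ij})$ in which m features contain references to n other features, the number of references made by document Di is the sum of the row vector for Di, and the number of citations received by document Di is the sum of the column vector for Di. When C is binary and square (the same documents index its rows and columns), these sums equal the diagonal entries $(CC^T)_{ii}$ and $(C^TC)_{ii}$, respectively. In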
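A small worked example, assuming a square binary citation matrix over four hypothetical documents D0 to D3, computes the row and column sums in plain Python and confirms that they match the diagonals of CC^T and C^TC.

```python
from typing import List

Matrix = List[List[int]]

# Hypothetical citation matrix: C[i][j] == 1 when document Di cites Dj.
C: Matrix = [
    [0, 1, 1, 0],  # D0 cites D1 and D2
    [0, 0, 1, 0],  # D1 cites D2
    [0, 0, 0, 0],  # D2 cites nothing
    [0, 1, 1, 0],  # D3 cites D1 and D2
]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain-Python matrix product; zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def diagonal(m: Matrix) -> List[int]:
    return [m[i][i] for i in range(len(m))]


def reference_counts(c: Matrix) -> List[int]:
    """Number of references made by each document (row sums)."""
    return [sum(row) for row in c]


def citation_counts(c: Matrix) -> List[int]:
    """Number of citations received by each document (column sums)."""
    return [sum(col) for col in zip(*c)]


print(reference_counts(C))                # [2, 1, 0, 2]
print(diagonal(matmul(C, transpose(C))))  # [2, 1, 0, 2]
print(citation_counts(C))                 # [0, 2, 3, 0]
print(diagonal(matmul(transpose(C), C)))  # [0, 2, 3, 0]
```

If C instead records citation frequencies, the diagonal entries become sums of squared frequencies, so the row and column sums are the quantities to use directly.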
A bibliographic coupling strength, which indicates the number of references that documents Di and Dj share in common, can also be computed as an attribute. The bibliographic coupling strength is given by the equation:

$$B_{ij} = \sum_{k} c_{ik}\,c_{jk}, \quad \text{i.e.,} \quad B = CC^T$$
Once written, the references a document Di makes to other materials are fixed, yet additional papers can be written that reference Di as well as cite the references in Di. At any given point in time, one can inspect the bibliographic coupling strengths for a set of documents to gain insight into how aware authors were of each other's work, or to retrieve the set of documents most bibliographically coupled to a document. In other words, the medium could include only the documents having a bibliographic coupling strength larger than a predetermined amount.
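Selecting the documents most bibliographically coupled to a given document can be sketched in Python as follows, assuming a binary citation matrix; the matrix and the threshold values are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

# Hypothetical binary citation matrix: C[i][j] == 1 when document Di cites Dj.
C: Matrix = [
    [0, 1, 1, 0],  # D0 cites D1 and D2
    [0, 0, 1, 0],  # D1 cites D2
    [0, 0, 0, 0],  # D2 cites nothing
    [0, 1, 1, 0],  # D3 cites D1 and D2
]


def bibliographic_coupling(c: Matrix) -> Matrix:
    """B[i][j] = sum_k c[i][k] * c[j][k], i.e. B = C C^T (shared references)."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in c] for ri in c]


def most_coupled(c: Matrix, doc: int, threshold: int) -> List[int]:
    """Documents whose coupling strength with `doc` exceeds `threshold`, strongest first."""
    b = bibliographic_coupling(c)
    candidates = [j for j in range(len(c)) if j != doc and b[doc][j] > threshold]
    return sorted(candidates, key=lambda j: b[doc][j], reverse=True)


print(most_coupled(C, 0, threshold=0))  # [3, 1]
print(most_coupled(C, 0, threshold=1))  # [3]
```

Here D0 and D3 share both of their references (strength 2), while D0 and D1 share only D2 (strength 1), so raising the threshold from 0 to 1 leaves only D3 on the medium.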
As time progresses, this set of bibliographically coupled items can grow as others cite similar papers, and a medium that updates the collection of information could also update the bibliographic coupling strengths.
Cocitation strength, which is the number of documents that cite both Di and Dj, can also be used as an attribute. Cocitation strength is given by the equation:

$$S_{ij} = \sum_{k} c_{ki}\,c_{kj}, \quad \text{i.e.,} \quad S = C^TC$$
Cocitation identifies pairs of documents that are referenced together. Frequent citation of two documents together implies the shared semantic judgement of other authors that each of the documents Di and Dj in the pair DiDj is related to the other. This is an important insight because the two documents may not contain a reference to one another. Like bibliographic coupling strengths, cocitation strengths vary over time and can provide a glimpse into the papers that influence a particular field at any given time.
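The point that cocited documents need not reference each other is easy to see on a toy matrix. In this hypothetical Python example, D0 and D1 both cite D2 and D3, which never cite one another, yet D2 and D3 receive a cocitation strength of 2.

```python
from typing import List

Matrix = List[List[int]]

# Hypothetical citation matrix: D0 and D1 both cite D2 and D3,
# but D2 and D3 do not cite each other.
C: Matrix = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def cocitation(c: Matrix) -> Matrix:
    """S[i][j] = sum_k c[k][i] * c[k][j], i.e. S = C^T C (documents citing both)."""
    cols = [list(col) for col in zip(*c)]
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]


S = cocitation(C)
print(S[2][3], C[2][3], C[3][2])  # 2 0 0
```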
Typical cocitation analysis creates a correlation matrix from the cocitation strengths and applies multidimensional scaling on the results. Visually, related documents cluster together indicating sub-fields within the main field and the medium can include these most relevant materials.
The analysis routine can also use spreading activation to determine attributes. Spreading activation is a class of algorithms that propagate numerical values among a set of connected items. For any features of a source material, activation can be spread through the network of associations. The resulting activation vector can be sorted, with the highest values representing the items most closely associated with the features of the source material. Since multiple features can be used as sources of activation, the interest function is computed relative to several features at the same time.
For example, the spreading activation analysis can use a leaky capacitor model. An activation network can be represented as a square matrix R, where each element Ri,j contains the strength of association between nodes i and j, and the diagonal contains zeros. The amount of activation that flows between nodes is determined by the activation strengths, which for our purposes correspond to bibliographic coupling and cocitation strengths. In some implementations, both bibliographic coupling strengths and cocitation strengths can be used simultaneously. For example, after performing spreading activation on each of bibliographic coupling and cocitation strengths, the results can be added or “fused.” Alternatively, matrices respectively representing bibliographic coupling strengths and cocitation strengths can be normalized and summed, with the spreading activation analysis being performed on the result.
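The second fusion option, normalizing the two association matrices and summing them before spreading activation, might look like the following Python sketch. Scaling each matrix by its largest off-diagonal entry is an assumption made here; the description does not specify a normalization. The diagonal is zeroed because the activation network R is defined with zeros on its diagonal.

```python
from typing import List

Matrix = List[List[float]]


def normalize(m: Matrix) -> Matrix:
    """Scale so the largest off-diagonal entry is 1.0 and zero the diagonal."""
    n = len(m)
    peak = max((m[i][j] for i in range(n) for j in range(n) if i != j), default=0)
    if peak == 0:
        return [[0.0] * n for _ in range(n)]  # no associations at all
    return [[0.0 if i == j else m[i][j] / peak for j in range(n)] for i in range(n)]


def fuse(coupling: Matrix, cocitation: Matrix) -> Matrix:
    """Normalize both association matrices and add them element by element."""
    nb, ns = normalize(coupling), normalize(cocitation)
    return [[x + y for x, y in zip(rb, rs)] for rb, rs in zip(nb, ns)]
```

The fused matrix can then be passed as R to the spreading activation model described next.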
Source activation is represented by a vector C, where Ci represents the activation pumped in by node i. The dynamics of activation can be modeled over discrete steps t = 1, 2, ..., N, with activation at step t represented by a vector A(t), with element A(t,i) representing the activation at node i at step t. The evolution of the flow of activation is determined by:
A(t)=C+MA(t−1)
M=(1−γ)I+αR
where M is a matrix that determines the flow and decay of activation among nodes, with γ determining the relaxation of node activation back to zero when it receives no additional activation input, and α denoting the amount of activation spread from a node to its neighbors. I is the identity matrix.
The parameters of M could be fixed for each generation or could vary. Step 230 stops the spreading after a predetermined number of plies of activation have been computed, or when the activation for all of the features of a generation of the secondary material being analyzed falls below a predetermined threshold. Then, secondary materials having an activation above a predetermined activation are included in medium 600.
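The leaky capacitor update, the two stopping tests, and the final selection can be sketched in plain Python as below. Starting from A(0) = 0 is an assumption, so that A(1) = C. The function names and parameter values are hypothetical. With a constant source vector, the activation stays bounded only when the spectral radius of M is below 1, so γ and α should be chosen with that in mind.

```python
from typing import Iterable, List, Sequence, Set

Matrix = List[List[float]]


def spread_activation(r: Matrix, source: Sequence[float], gamma: float,
                      alpha: float, plies: int) -> List[float]:
    """Leaky capacitor model: A(t) = C + M A(t-1) with M = (1 - gamma) I + alpha R.

    r      -- association matrix R (zeros on the diagonal)
    source -- source activation vector C
    plies  -- predetermined number of steps to compute
    """
    n = len(r)
    m = [[(1.0 - gamma if i == j else 0.0) + alpha * r[i][j] for j in range(n)]
         for i in range(n)]
    a = [0.0] * n  # assumed A(0) = 0
    for _ in range(plies):
        a = [source[i] + sum(m[i][k] * a[k] for k in range(n)) for i in range(n)]
    return a


def generation_below(a: Sequence[float], generation: Iterable[int], floor: float) -> bool:
    """Alternative stop test: every node of a generation has activation below `floor`."""
    return all(a[i] < floor for i in generation)


def select_above(a: Sequence[float], threshold: float, exclude: Set[int]) -> List[int]:
    """Nodes (other than excluded source nodes) above `threshold`, most active first."""
    ranked = sorted(range(len(a)), key=lambda i: a[i], reverse=True)
    return [i for i in ranked if a[i] > threshold and i not in exclude]


# Chain D0 - D1 - D2, activation pumped into D0 only.
R = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A = spread_activation(R, [1.0, 0.0, 0.0], gamma=0.5, alpha=0.25, plies=3)
print(A)                                   # [1.8125, 0.5, 0.0625]
print(select_above(A, 0.1, exclude={0}))   # [1]
```

Activation falls off with distance from the source, so the direct neighbor D1 clears the 0.1 threshold while D2, two links away, does not.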
Also, the contents of referenced materials do not have to be included in medium 600. As shown in
2. Data Processing System Operation
After the medium is created, a user can manipulate, automatically or manually, the objects contained in the medium to reveal insights about the collection of ideas.
This way of working with the medium is interesting for several reasons. The author of the source material has used her knowledge to choose, e.g., the reference documents as being highly related to the source material. The objects in the medium can therefore be expected to include other works that provide evidence for the current work, contrasting views, development of related ideas, descriptions of methodology, etc. In other words, processing the information of the objects can expand the knowledge provided by the source material in a manner that is unexpected even to the author or publisher of the source material.
A data processing system consistent with the present invention uses the medium to give readers a broad view of how the source material was organized and the reason for that organization, to help the reader determine which articles are the most influential in the field discussed in the source material, and how influence flows in the source material from other information, such as the references, to suggest which materials to read next, and to allow the reader to quickly access the material of interest.
The control routine of visualization program 109 is a supervisor for the interface. The control routine controls access to a document database, including for example a medium containing information gathered from material including a source, and commands information therein to be rendered as placards, or icons, using one of a variety of layout algorithms. For example, at start-up, the control routine commands transferring of information contained in a medium into memory, starting a browser routine, and rendering the graphics scene. The control routine can use a basic event-driven model, with timer-events to update animations.
The graphics engine of visualization program 109 composes and maintains an internal scene graph of graphics objects. The graphics engine includes a graphics object database, a rendering engine, and a set of visual operations. The graphics object database stores the objects that are to be displayed. The rendering engine uses the graphics object database to set up the global state of a scene and applies the transform matrices of the objects, while the actual rendering is performed by each object itself.
A portion of the scene could include information from the browser routine of visualization program 109. The browser routine calls or includes a program, such as Microsoft's Internet Explorer component, that can present the associated hypertext markup language (HTML) in visual form.
The user interface provides the user with commands to display material in the medium, query the information in the medium by keywords in fields such as the contents, references, authorship, or institution, extract portions of the medium, and author new content based on the medium, and responds to the commands. For example, when the user selects placards representing several articles and asks for a recommendation about what to read next, the user interface can use the selections to derive an ordered result-set. The graphics engine would then graphically display the result-set.
The user interface uses statistical analysis similar to that used in creating the medium, placing the reader in control of selecting materials of interest, which are difficult to predict when the document is shipped.
The data processing system uses visualization program 109 to provide the graphic display. As illustrated in
The user interface then monitors the user's interaction with the objects in the visualization (step 1120). The monitoring could detect affirmative selections, such as a user command to select an object, or implied selections, with a process watching the history and context of a user's actions and determining a degree of interest.
The user interface predicts the preferences of the users based on, e.g., affirmative selection or a statistical process (step 1120) and provides the preferences to the control routine. The control routine instructs the graphics engine to update the view of the medium, for example, by displaying a selected reading in a browser window or highlighting a set of recommended readings in a previous view (steps 1130 and 1140). The statistical analysis could include a combination of spreading activation and citation analysis similar to the analysis used in creating the medium and can employ cocitation and bibliographic coupling strengths as association matrices in the spreading activation model. When implicit selections are used, the source vector can be seeded based upon a history of user selections weighted by time and frequency of the selections.
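Seeding the source vector from implicit selections could be sketched as below. The exponential half-life weighting is an assumption chosen for illustration; the description only states that the history is weighted by time and frequency. Repeated selections add up, which captures frequency, and older selections contribute less, which captures time.

```python
from typing import Iterable, List, Tuple


def seed_from_history(events: Iterable[Tuple[int, float]], n_docs: int,
                      now: float, half_life: float = 60.0) -> List[float]:
    """Build a source vector C from (doc_index, timestamp) selection events.

    Each selection adds 0.5 ** (age / half_life), so a document's seed grows
    with how often it was selected and decays with how long ago.
    """
    c = [0.0] * n_docs
    for doc, t in events:
        age = max(0.0, now - t)  # clamp negative ages caused by clock skew
        c[doc] += 0.5 ** (age / half_life)
    return c
```

The resulting vector can serve as the source activation C in the same leaky capacitor model used when the medium was created.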
While the results of the statistical analysis are displayed on a display in step 1140, the user could also arrange the information in structuring substrates, such as information visualization spreadsheets (for examples of information visualization spreadsheets, please see E. H. Chi et al., “A Spreadsheet Approach to Information Visualization,” ACM Symposium on User Interface Software and Technology (UIST '97) 79–80 (1997); E. H. Chi, “A Framework for Information Visualization Spreadsheets,” Ph.D. thesis, University of Minnesota (1999), all of which are incorporated by reference herein) and perspective walls (for examples of perspective walls, please see J. D. Mackinlay et al., “The Perspective Wall: Detail and Context Smoothly Integrated,” ACM Conference on Human Factors in Computing Systems (CHI '91) 173–179 (1991); and U.S. Pat. No. 5,689,287 issued to Mackinlay et al. on Nov. 18, 1997, all of which are incorporated by reference herein).
Of course, the user could forego statistical analysis and collect sets of references and/or contents into groups and arrange them in any manner that suits the user.
C. Example
To provide a concrete example of the medium and data processing system consistent with principles at the present invention, S. K. C
Statistical processing was used in this visualization. Accordingly, the user interface created a database of linkages from the objects of the medium. These linkages were used to derive citation matrices, cocitation matrices, and bibliographic coupling matrices, which form the basis of the tools with which users interact with the medium.
The lower part of contents area 1210 is a citation board 1240 which displays objects of the secondary material (the set R0 of
Color and display order can be used independently to create visual patterns. A user can select any of the icons representing the materials by, for example, clicking the left mouse button. Upon selection, the control routine and graphics engine could change the form of selected material 1250 to provide feedback that the desired material was selected (for example, by turning an icon representing the material 1250 green). If an icon of an object in content board 1230 is selected, the control routine instructs the graphics engine to change the form of an icon 1260 of the object in citation board 1240. In other words, the content board 1230 and the citation board 1240 are linked. Upon selection, the control routine, graphics engine, and linking program could display the selected material in browser area 1220. A different selection could highlight material in a set of interest. For example, selection with the right mouse button stands up items in the set of interest and turns them blue without displaying the information in browser area 1220.
Also, the user could search for material by keywords and fields through, for example, a dialog box initiated by an onscreen button. For example, in
As a default, the material in the citation board 1240 is sorted alphabetically by the first author's name. The user can, however, provide different visualizations of the medium. For example, in
To find articles of high influence, the user can rearrange the citation board by the number of times the reference material is cited. In
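The two citation-board orderings described above, the alphabetical default and the rearrangement by citation count, reduce to two sort keys. In this Python sketch the `Reference` record and its fields are hypothetical; `times_cited` is assumed to be the column sum of the citation matrix over the book's articles.

```python
from typing import List, NamedTuple


class Reference(NamedTuple):
    key: str
    first_author: str
    times_cited: int  # column sum of the citation matrix over the book's articles


def default_order(refs: List[Reference]) -> List[Reference]:
    """Alphabetical by the first author's name (the default citation-board order)."""
    return sorted(refs, key=lambda r: r.first_author.lower())


def by_influence(refs: List[Reference]) -> List[Reference]:
    """Most-cited first; ties fall back to alphabetical order."""
    return sorted(refs, key=lambda r: (-r.times_cited, r.first_author.lower()))
```

Falling back to the alphabetical key on ties keeps the board stable, so equally cited references do not jump around when the user switches views.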
To make the more heavily-cited articles stand out against a background of time, in
As another line of investigation, the user has the system compute which articles in the content board, and hence in the book, cite a particular article. For example, in
To increase the likelihood that a substantive discussion would occur in a citing material that references the target material, the user can unselect materials without substantive discussion. For example, the user could unselect the left column of content board 1230G, which represents introductions so that only articles are left.
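Finding the content-board items that cite a target reference, then dropping introductions, amounts to reading one column of the citation matrix and filtering by item kind. This Python sketch assumes rows are the book's items, columns are the references, and `kinds` is a hypothetical per-row label.

```python
from typing import List, Sequence


def citing_articles(c: List[List[int]], target: int, kinds: Sequence[str],
                    exclude_kinds: Sequence[str] = ("introduction",)) -> List[int]:
    """Rows (content-board items) with a nonzero citation of column `target`,
    skipping items of an excluded kind such as section introductions."""
    return [i for i, row in enumerate(c)
            if row[target] > 0 and kinds[i] not in exclude_kinds]
```

For example, if an introduction and one article both cite reference 0, only the article survives the default filter; passing `exclude_kinds=()` restores the unfiltered set.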
This is shown in content board 1230H of
Alternatively, the user interface could highlight the more relevant materials. To find the most relevant materials, the statistical analysis of this routine uses spreading activation on the cocitation matrix of the selected articles to produce an activation value. As shown in
A user can also select a document, and the user interface could recommend a document to read next based on, e.g., spreading activation over the cocitation matrix from that article. In
D. Conclusion
Systems and methods consistent with principles of the present invention create an electronic medium that is like no other. The medium can be viewed as an enhanced index that is generated using a source material as a seed. The generation of the index extracts information about features of the source material and features of secondary materials related to the source material. One index consistent with the present invention includes selected features of both the source and secondary materials.
In one of its aspects, the index points to a location on a storage medium for the content of secondary material related to a source material. Thereby, this content is available in seconds or minutes. In this regard, while a Web-based medium is within the scope of the present invention, a physical medium, such as a digital video disk (DVD), would have an advantage over the Web-based medium because all of the content could be accessed nearly instantaneously, rather than more slowly over a typical network connection. Broadband technologies offer the capability to reduce this disadvantage of a Web-based medium.
Because the publication is electronic, the publication overcomes the natural size and weight limitation of books. More importantly, the medium can accelerate a reader's interaction and enable new capabilities not afforded by books. For example, the present invention can provide a user with tools that respond to the user's needs and requests at a level of the collection, rather than just with a single work. This can provide the user with a greater understanding of the collected material and, perhaps, enable the user to create an original work based on the insight amassed during the interaction with the medium.
While there has been illustrated and described what are at present considered to be a preferred implementation and method of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made, and equivalents may be substituted for elements thereof without departing from the true scope of the invention.
Modifications may be made to adapt a particular element, technique, or implementation to the teachings of the present invention without departing from the spirit of the invention. For example, while the previous discussion focused on published books, the present invention could be used to create a medium for a business paper, such as a deposition in a court case. In that case, the user could create the medium from an existing bibliography, from a new work using a digital library, such as a company's accounting database, from workflow, or from any set of initial information, such as business intelligence information.
Similarly, catalogs could be used as source material. Secondary materials and features in catalogs could include technical data, specification sheets, and price lists.
In an academic setting, the medium could add to a student's understanding by providing required readings and all of the research put into the readings. This could help the student gain a better understanding of the material and, perhaps, author new works.
Also, the foregoing description is based on a client-server architecture, but those skilled in the art will recognize that a peer-to-peer architecture may be used consistent with the invention. Moreover, although the described implementation includes software, the invention may be implemented as a combination of hardware and software or in hardware alone. Additionally, although aspects of the present invention are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on other types of computer-readable media, such as secondary storage devices, like hard disks, floppy disks, or CD-ROM; a carrier wave from the Internet; or other forms of RAM or ROM.
Therefore, it is intended that this invention not be limited to the particular implementation and method disclosed herein, but that the invention include all implementations falling within the scope of the appended claims.