Existing electronic search technologies retrieve documents using keywords and other familiar search techniques. The main goal of these search technologies is to retrieve all documents that satisfy the search criteria. Some of these search technologies use algorithms to present the search results in an order that reflects the anticipated usefulness to the searcher. For example, some online search engines rank each result page according to the number of other pages that link to it.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Embodiments of the present invention generally relate to systems, methods, and computer-readable media for generating a research progression summary. A research progression summary provides a snapshot of documents, e.g., articles, that have made a significant impact on a particular field of research, or at least a portion thereof, over time. A countless number of documents relevant to a particular topic (i.e., a particular field of research or some portion thereof) is stored in one or more databases. Some documents relevant to a particular topic are more significant than others from the perspective of the academic researcher. However, the significance of a document is often not readily apparent by simply reading it. Research progression, in accordance with embodiments hereof, sorts through all accessible relevant documents, analyzes the importance of each, and summarizes for presentation only those documents determined to be of particular importance. In this manner, a researcher can readily determine how the thinking with respect to a particular topic has progressed over time. A particular research progression summary may focus on the historical developments in a particular field, current developments with respect to a topic of interest, an overall summary of a particular field/topic, or any combination thereof.
The present invention is described in detail below with reference to the attached drawing figures, wherein:
The subject matter of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
A research progression summary provides a snapshot of documents, e.g., articles, that have made a significant impact on a particular field of research, or at least a portion thereof, over time. A countless number of documents relevant to a particular topic (i.e., a particular field of research or some portion thereof) is stored in one or more databases. Some documents relevant to a particular topic are more significant than others from the perspective of the academic researcher. However, the significance of a document is often not readily apparent by simply reading it. Research progression, in accordance with embodiments hereof, sorts through all accessible relevant documents, analyzes the importance of each, and summarizes for presentation only those documents determined to be of particular importance. In this manner, a researcher can readily determine how the thinking with respect to a particular topic has progressed over time. A particular research progression summary may focus on the historical developments in a particular field, current developments with respect to a topic of interest, an overall summary of a particular field/topic, or any combination thereof.
Accordingly, in one embodiment, the present invention relates to one or more computer-readable media having computer-executable instructions embodied thereon that, when executed, perform a method of generating a research progression summary for a particular research field. The method includes receiving one or more research criteria, identifying one or more documents that satisfy the research criteria, generating a research progression summary utilizing the identified documents, and storing the generated research progression summary.
In another embodiment, the present invention relates to one or more computer-readable media having computer-executable instructions embodied thereon that, when executed, perform a method for generating a clustered-ranked-citation link graph. The method includes receiving document information for one or more documents, generating a citation link graph for the one or more documents, categorizing the one or more documents into one or more domains, generating a static rank for each of the one or more documents, and storing the clustered-ranked-citation link graph.
In a further embodiment, the present invention relates to a computerized system for generating a research progression summary for a particular topic. The computerized system includes a receiving module configured for receiving one or more research criteria, a retrieving module configured for retrieving document information from one or more documents, a generating module configured to generate a clustered-ranked-citation link graph, a research progression summary generating module configured to generate a research progression summary, and at least one database configured for storing at least document information from one of the one or more documents and the research progression summary.
Having briefly described an overview of embodiments of the present invention, an exemplary operating environment suitable for use in implementing embodiments of the present invention is described below.
Referring to the drawings in general, and initially to
The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program components including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks, or implements particular abstract data types. Embodiments of the present invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, specialty computing devices, and the like. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
With continued reference to
Computing device 100 typically includes a variety of computer-readable media. By way of example, and not limitation, computer-readable media may comprise Random Access Memory (RAM); Read-Only Memory (ROM); Electrically Erasable Programmable Read-Only Memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVD) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices; carrier waves; or any other medium that can be used to encode desired information and be accessed by computing device 100.
Memory 112 includes computer-storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc. I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O components 120, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and the like.
Turning now to
Computing system architecture 200 includes a user device 210, a server 212, and a database 214, all in communication with one another via a network 216. The network 216 may include, without limitation, one or more local area networks (LANs) and/or one or more wide area networks (WANs). Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. Accordingly, the network 216 is not further described herein.
The database 214 is configured to store documents of interest to researchers and information associated with the documents. In various embodiments, such documents may include, but are not limited to, academic papers, master's theses, Ph.D. theses, dissertations, articles published in trade journals, articles published in scholarly journals, books, online resources, conference papers, and white papers. This list is not comprehensive, and any document relevant to researchers is contemplated to be within the scope of embodiments hereof. Further, the term “researchers”, as utilized herein, encompasses anyone attempting to access information about a particular topic, including, but not limited to, medical researchers, R&D researchers, students, teachers, professors, engineers, scientists, philosophers, sociologists, journalists, and so on. All fields of interest, ranging from the hard sciences to the liberal arts, are possible topics of research.
In embodiments, the database 214 is configured to be searchable for one or more of the items stored in association therewith. It will be understood and appreciated by those of ordinary skill in the art that the information stored in the database 214 may be configurable and may include any information relevant to documents. The content and volume of such information are not intended to limit the scope of embodiments of the present invention in any way. Further, though illustrated as a single, independent component, database 214 may, in fact, be a plurality of databases, for instance, a database cluster, portions of which may reside on the end-user device 210, the server 212, another external computing device (not shown), and/or any combination thereof.
Each of the end-user device 210 and the server 212 may be any type of computing device, such as, for example, computing device 100 described above with reference to
As shown in
The receiving module 218 is configured for receiving one or more research criteria. In one embodiment, the receiving module is configured to receive the research criteria from the user device 210. Research criteria may be input by a user much like a standard keyword search query may be input, for instance, in association with an appropriate field presented on a graphical user interface, or the like.
In addition to the subject-matter-related research criteria, the receiving module 218 is also configured to receive weighting information as research criteria (not shown). The weighting information may be used by the research progression summary generating module 228 to customize the research progression summary to meet the researcher's needs. In this way, the researcher can receive a research progression summary that gives more weight to recent documents, to documents of historical significance, or to anything in between. Additionally, the researcher may choose to give more or less weight to citations from documents in different sub-domains in determining the significance of a cited document.
The retrieving module 220 is configured to search through one or more databases 214 and retrieve document information from one or more documents. Document information includes, but is not limited to, bibliographic information, citations to other documents, domain classification information, metadata, and information about the documents supplied by the author, publisher, or others that describes or classifies the document. Those of ordinary skill in the art will understand that there are many methods to search for and retrieve document information; all such methods are contemplated to be within the scope of the invention.
The clustered-ranked citation link graph generating module 222 is configured to generate a clustered-ranked citation link graph.
The communication module 224 is configured to communicate the research progression summary. The communication module may communicate with a user interface, a printer, an e-mail generator, or any other known communication means.
The graphical representation module 226 is configured to generate graphical representations to be presented, e.g., displayed, in association with the user device 210. For instance, the graphical representation module 226 may generate a display of the research progression summary, including graphical representations (e.g., icons) representing the important or significant documents selected for inclusion in the research progression summary. Methods of generating graphical representations are well known in the art and all known methods are considered to be within the scope of this invention.
The research progression summary generating module 228 is configured to generate research progression summaries. The research progression summary generating module 228 runs calculations over the clustered-ranked citation link graph and selects significant documents for inclusion in the research progression summary. The research progression summary generating module 228 may take weighting information provided by the researcher into account when determining the relative significance of each document. Based on the weighting research criteria, the research progression summary generating module 228 will give more weight to certain factors that go into determining a document's significance rank. Specifically, the date of publication can be given more or less weight, and the importance of the article within its own sub-domain can be given more or less weight. For example, citations by articles within the same sub-domain could be given more weight than citations by articles in different sub-domains, and more still than citations by articles in different domains. In some embodiments, more weight will be given to recent documents than to older ones, for example, in response to research criteria specifying a focus on the current state of a particular field of research, or some portion thereof.
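By way of illustration only, the weighted significance calculation described above might be sketched as follows. The class names, field names, and numeric weight values are hypothetical and are not taken from this description; the only property carried over is that a citation from the same sub-domain may count for more than one from a different sub-domain, which in turn may count for more than one from a different domain, with an optional adjustment for publication date.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    # Hypothetical minimal record for a document in the link graph.
    doc_id: str
    domain: str
    sub_domain: str
    year: int


@dataclass(frozen=True)
class Weights:
    # Illustrative default values only; a researcher could supply others.
    same_sub_domain: float = 1.0
    same_domain: float = 0.5
    other_domain: float = 0.25
    recency: float = 0.0  # bonus per year of publication after oldest_year


def citation_weight(cited: Doc, citing: Doc, w: Weights) -> float:
    """Weight of one citation, based on how close the citing document is."""
    if citing.domain == cited.domain:
        if citing.sub_domain == cited.sub_domain:
            return w.same_sub_domain
        return w.same_domain
    return w.other_domain


def significance(cited: Doc, citing_docs: list, w: Weights, oldest_year: int) -> float:
    """Sum of weighted citations plus an optional recency bonus."""
    citation_score = sum(citation_weight(cited, c, w) for c in citing_docs)
    return citation_score + w.recency * (cited.year - oldest_year)


target = Doc("T", "cs", "ir", 2000)
citers = [
    Doc("a", "cs", "ir", 2005),         # same sub-domain
    Doc("b", "cs", "db", 2006),         # same domain, different sub-domain
    Doc("c", "bio", "genomics", 2007),  # different domain
]
default_score = significance(target, citers, Weights(), oldest_year=1990)
recent_score = significance(target, citers, Weights(recency=0.1), oldest_year=1990)
```

Setting `recency` above zero favors later publications, which corresponds to a researcher asking for a summary focused on the current state of a field.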
In embodiments, the research progression summary generating module 228 may also take into consideration the dates of publication when weighing the significance of a document. For example, documents having earlier publication dates will generally have more citations to them than documents having later publication dates just by virtue of age. Accordingly, looking only at total citations may cause more recently published documents to be excluded. In one embodiment, the significance of a document may be determined using citations per unit of time, such as per year, rather than just the total number of citations.
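The citations-per-unit-of-time measure can be illustrated with a short sketch. The function name and the choice to treat a document published in the current year as one year old are assumptions made for the example.

```python
def citations_per_year(total_citations: int, publication_year: int, current_year: int) -> float:
    """Normalize a citation count by document age, in years."""
    # Assumption: a document published this year counts as one year old,
    # which also avoids division by zero.
    age_in_years = max(current_year - publication_year, 1)
    return total_citations / age_in_years


older = citations_per_year(300, publication_year=1990, current_year=2020)
newer = citations_per_year(100, publication_year=2015, current_year=2020)
```

In this example the older document has three times as many total citations, yet the newer document has the higher rate (20 per year against 10 per year), so it would not be excluded merely because of its age.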
The linking module 232 is configured to generate links between the user interface and the actual documents in the database 214 or the storage module 230. In one embodiment, upon selection of the link on the user interface, the document may be opened in a separate window by software configured to open the document. In one embodiment, upon selecting the link, the user may be prompted to download, email, or print the document, or to select other available options. Methods of linking documents and retrieving documents in response are well known in the art and all such methods are considered to be within the scope of the invention.
The storage module 230 is configured to provide storage for all other modules and processes that need temporary or permanent storage. Such items include, but are not limited to, retrieved document information, retrieved documents, the clustered-ranked citation link graph, the complete research progression summary, and all required steps in between.
Turning now to
The details of generating a research progression summary, in accordance with embodiments hereof, are explained subsequently. However, in general, the clustered-ranked citation link graph is analyzed and significant articles within the requested research domain or sub-domain are identified. The research progression summary 360 is generated using at least some of these documents and then presented to the researcher. The number of documents presented in a research progression summary can vary and the five shown in 360 are merely representative of one possible embodiment.
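As a rough sketch of the selection step just described, the example below keeps the most significant documents and lists them in order of publication. The tuple layout, the function name, and the chronological ordering are assumptions made for illustration; the description states only that the number of documents presented can vary.

```python
def build_summary(scored_docs, max_documents=5):
    """Pick the most significant documents and list them oldest first.

    scored_docs: iterable of (doc_id, publication_year, significance) tuples.
    """
    # Keep the max_documents entries with the highest significance.
    top = sorted(scored_docs, key=lambda d: d[2], reverse=True)[:max_documents]
    # Assumption: presenting oldest first shows how the topic progressed.
    return [doc_id for doc_id, _year, _score in sorted(top, key=lambda d: d[1])]


scored = [
    ("A", 1995, 9.0),
    ("B", 2001, 2.0),
    ("C", 2008, 7.5),
    ("D", 2012, 6.0),
]
summary = build_summary(scored, max_documents=3)
```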
Referring next to
Next, computations are run over the clustered-ranked citation link graph using the research criteria, and relevant documents are identified 412. The identification algorithm could use any of the aforementioned research criteria, or a combination of research criteria, to identify documents meeting the criteria.
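One possible form of the identification step is sketched below, assuming the graph is held as a dictionary keyed by a document identifier and that the research criteria reduce to a requested domain and a minimum static rank. Both the layout and the parameter names are hypothetical.

```python
def identify_documents(graph, domain, min_static_rank=0):
    """Return identifiers of documents in `domain` meeting the rank floor."""
    return sorted(
        doc_id
        for doc_id, node in graph.items()
        if domain in node["domains"] and node["static_rank"] >= min_static_rank
    )


# Hypothetical graph: each node carries its domains and a static rank.
example_graph = {
    "A": {"domains": ["physics"], "static_rank": 2},
    "B": {"domains": ["physics", "chemistry"], "static_rank": 1},
    "C": {"domains": ["chemistry"], "static_rank": 0},
}
physics_docs = identify_documents(example_graph, "physics")
cited_chemistry_docs = identify_documents(example_graph, "chemistry", min_static_rank=1)
```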
Next, a research progression summary is generated 414. The research progression summary will be explained in more detail with reference to
Returning to
Turning next to
In an illustrative embodiment, the static rank for each document is calculated by determining how many times an individual document is cited in other documents. Finally, in an illustrative embodiment, each document is classified into one or more subject matter domains or sub-domains. In an illustrative embodiment, the classification occurs by evaluating document information. The domain or sub-domain classification is then included in a field associated with the unique document identifier in the citation link graph.
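A minimal sketch of such a graph follows, in which each unique document identifier maps to its outgoing citations, its domain classification, and a static rank equal to the number of other documents that cite it. The dictionary layout and field names are assumptions, and the domain labels are taken as already supplied with the document information rather than computed.

```python
def build_citation_link_graph(documents):
    """Build a graph keyed by document identifier.

    documents: mapping of doc_id -> {"cites": [...], "domains": [...]}.
    """
    graph = {}
    for doc_id, info in documents.items():
        graph[doc_id] = {
            # Keep only citations to known documents; ignore self-citations.
            "cites": [c for c in info.get("cites", []) if c in documents and c != doc_id],
            "domains": list(info.get("domains", [])),
            "static_rank": 0,
        }
    # Static rank: number of distinct other documents citing each document.
    for node in graph.values():
        for cited in set(node["cites"]):
            graph[cited]["static_rank"] += 1
    return graph


documents = {
    "A": {"cites": [], "domains": ["physics"]},
    "B": {"cites": ["A"], "domains": ["physics"]},
    "C": {"cites": ["A", "B"], "domains": ["chemistry"]},
}
graph = build_citation_link_graph(documents)
```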
The clustered-ranked citation link graph allows computations to be run over the graph that can then be used to produce a research progression summary. The clustered-ranked citation link graph makes it possible to determine the relative age of the documents, because older documents cannot cite newer documents. Additionally, the number of citations per document and the domain or sub-domain of each citing document are also apparent. Further, the domain or sub-domain of each document within the clustered-ranked citation link graph is also apparent.
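The relative-age observation can be illustrated with a topological ordering of the citation links: because a document can only cite documents that already exist, every cited document precedes the documents citing it. The sketch below uses the standard-library `graphlib` module (Python 3.9 or later); the mapping layout is an assumption, and the ordering is only partial when documents are not connected by citations.

```python
from graphlib import TopologicalSorter

# Hypothetical data: each document maps to the set of documents it cites.
cites = {
    "C": {"A", "B"},
    "B": {"A"},
    "A": set(),
}

# TopologicalSorter treats the mapped values as predecessors, so cited
# (older) documents are emitted before the documents that cite them.
oldest_first = list(TopologicalSorter(cites).static_order())
```

If the data contained a citation cycle, for example two documents citing each other's preprints, `static_order()` would raise `graphlib.CycleError`, so a practical implementation would need to handle that case.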
The completed clustered-ranked citation link graph is then stored 522 and is available for use in generating the research progression summary. In one embodiment, the clustered-ranked citation link graph is stored in the storage module 230.
Turning next to
The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages which are obvious and inherent to the system and method. It will be understood that certain features and sub-combinations are of utility and may be employed without reference to other features and sub-combinations. This is contemplated by and is within the scope of the claims.