The present invention relates to software generally, and more specifically to an information search method and system.
Information is generated and accumulated at an astonishing speed. A method of effectively searching information related to a specific subject is a necessary means to resolve real-life problems. Many commercial search engines such as Google provide the function of searching internet web sites for a string of words through indexes created by their own proprietary algorithms.
A search engine is a program that searches documents in web sites for specified keywords and returns a list of documents where the keywords were found. Typically, a search engine works by sending out spiders that automatically fetch documents from web sites and feed them back to the search engine. Such a program is called a spider because it “crawls” over the web. The search engine then reads these documents and creates indexes based on its proprietary algorithm. Due to inherent limitations of the proprietary algorithms employed by search engines, some related web sites may be neglected. After receiving a query, the search engine in fact searches the indexes rather than going out to directly search the web sites again. As a result, some search results are not the latest information, although spiders periodically send back information to update the indexes. In addition, concurrently searching a plurality of databases is available in the prior art.
Patents are an important portion of the information that people in many industries would like to search. Patents usually cite, as prior art, earlier-published patents in the same or similar technology fields. Thus, the relationship among patents that cite or are cited by other patents indicates a certain degree of relevance among those patents. The identification of cited patents, such as the patent number, is generally included in a patent document. Through the citation list in patent documents, a citation search is available to provide an indication of the relationship among patents. For example, published United States patents have a field of “reference cited” listing other related patents as prior art. The web site of the United States Patent and Trademark Office provides a basic citation search function.
A computer-based information search method comprises the steps of: receiving at least a search query, the search query comprising at least one term; receiving a network resource list, the list comprising at least one web site selected from a predetermined web site list; semantically analyzing the search query; and searching the network resource list for a response to the search query using a search engine. A computer-based citation search method comprises the steps of: receiving a search query, the search query comprising at least one patent identification condition; receiving a list of patent databases, the list comprising at least one patent database; searching the list of patent databases to collect at least one reference patent that cites patents or is cited by patents satisfying the condition of the search query; and producing a citation list, the list comprising at least an owner of the reference patent.
An exemplary embodiment of the present invention provides an efficient computer-based information search method and/or citation search method.
In a computer-implemented system, the step 210 is performed by a search-query receiving means that receives at least one search condition. The search-query receiving means can be a processor programmed to receive a search condition. The program can be written in any kind of computer language such as Java, C, C++, Visual C, Visual Basic, or Assembly. Various input devices that can be used to pass the data to the processor can include but are not limited to a keyboard, a mouse, a touch-screen, a writing recognition device, a voice recognition device, a storage medium reading device, a network connection, or the like.
At step 220, a network resource list is received. The network resource list comprises at least one web site selected from a predetermined web site list. A user can request to search at least one specific web site in addition to a routine search conducted by a search engine which indexes web site information by its own proprietary algorithm. Consequently, search results with high relevance can be attained because the user may have better knowledge about which web sites contain information relevant to a specific search. In addition, by directly searching user-specified web sites, more up-to-date search results can be obtained from these web sites than from the search conducted by the search engine. Because the search engine searches indexes created by itself (rather than directly searching web sites) to find related web sites, and because the indexes are only updated periodically from the information sent back by spiders, the search result from the search engine can be outdated.
In some embodiments, the predetermined web site list from which the user can specify at least one web site to search is categorized by technologies. A tree structure is used to form technology categorization. For example, in
In addition to the specified web sites, the user can also request to search some specific databases and other network resources. For example, a U.S. patent database and a database of IEEE published papers can be searched.
In a computer-implemented system, the step 220 is performed by a network-resource-list receiving means that receives a list of web sites and/or databases. The network-resource-list receiving means can be a processor programmed to receive a list of web sites and/or databases. The program can be written in any kind of computer language such as Java, C, C++, Visual C, Visual Basic, or Assembly. Various input devices that can be used to pass the data to the processor can include but are not limited to a keyboard, a mouse, a touch-screen, a writing recognition device, a voice recognition device, a storage medium reading device, a network connection, or the like.
At step 230, the search query is semantically analyzed before the search is conducted. When the search query contains more than one word, a semantic analysis is undertaken to obtain relations between words used in a phrase, a sentence, a paragraph, or an article as a guidance for the search conducted thereafter. Several commercial products, such as PolyAnalyst from Megaputer, Knowledgist from Knowledge Management Connection Corporation, TextAnalyst, Hunter-Gatherer, Semantic Web, or Ontologies, can be incorporated to perform semantic analysis.
In a computer-implemented system, the step 230 is performed by a semantic analysis means that analyzes the search query. The semantic analysis means can be a processor programmed to analyze the search query. The program can be written in any kind of computer language such as Java, C, C++, Visual C, Visual Basic, or Assembly.
At step 240, searching the network resource list for a response to the search query is conducted by using a search engine. The search engine searches specified databases and web sites listed on the network resource list, in addition to a routine web site search conducted through the proprietary algorithm of the search engine. In some embodiments, searches are conducted at a pre-scheduled time. Several commercial products of a search engine, such as Field Search Management from Empolis, Freesearcher, KMS from Intumit Technology Corporation, Yahoo, Google, or Altavista can be employed to perform the network resource searching. A combined search result is then presented to the user.
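The combination performed at step 240 can be sketched as follows. This is a minimal illustration only, not the implementation of any product named above. The function names `combined_search`, `engine_search`, and `direct_search`, the (URL, title) result shape, the ordering of results, and the de-duplication by URL are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical result record: (url, title). This shape is an assumption.
Result = tuple[str, str]


def combined_search(
    query: str,
    resource_list: list[str],
    engine_search: Callable[[str], list[Result]],
    direct_search: Callable[[str, str], list[Result]],
) -> list[Result]:
    """Merge a routine search-engine search with direct searches of each
    user-specified resource, keeping the first occurrence of every URL."""
    # Routine index-based search first, then one direct search per resource.
    batches = [engine_search(query)]
    batches += [direct_search(resource, query) for resource in resource_list]

    combined: list[Result] = []
    seen_urls: set[str] = set()
    for batch in batches:
        for url, title in batch:
            if url not in seen_urls:  # drop duplicates found by both paths
                seen_urls.add(url)
                combined.append((url, title))
    return combined
```

Passing the two search functions as parameters keeps the sketch independent of any particular search engine or database interface.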
In a computer-implemented system, the step 240 is performed by a search means that searches the web sites and/or databases on the list received. The search means can be a processor programmed to search the web sites and/or databases. The program can be written in any kind of computer language such as Java, C, C++, Visual C, Visual Basic, or Assembly.
In a computer-implemented system, the step 310 is performed by a translation means that translates the search query into another language. The translation means can be a processor programmed to translate the search query. The program can be written in any kind of computer language such as Java, C, C++, Visual C, Visual Basic, or Assembly.
At step 330, after receiving search results from the search engine, in some embodiments, the search result is prioritized by an attribute selected by a user. For example, the search result can be prioritized by the date each document was generated. The search result can be prioritized simply by the level of word-for-word matching with the search query. The search result can also be prioritized by its relevance to the search query using subject-action-object analysis.
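Two of the prioritization attributes described for step 330 can be sketched as follows. This is a hedged illustration: the record fields `date` and `text`, the attribute names `"date"` and `"match"`, the newest-first ordering, and the term-count score are assumptions made for the example. Subject-action-object analysis is not shown.

```python
from datetime import date


def prioritize(results: list[dict], attribute: str, query: str = "") -> list[dict]:
    """Return the results reordered by a user-selected attribute."""
    if attribute == "date":
        # Assumption: the most recently generated documents come first.
        return sorted(results, key=lambda r: r["date"], reverse=True)
    if attribute == "match":
        terms = query.lower().split()

        def score(result: dict) -> int:
            # Count word-for-word occurrences of each query term in the text.
            words = result["text"].lower().split()
            return sum(words.count(term) for term in terms)

        return sorted(results, key=score, reverse=True)
    raise ValueError(f"unsupported attribute: {attribute}")


docs = [
    {"title": "a", "date": date(2001, 1, 1), "text": "fuel cell membrane"},
    {"title": "b", "date": date(2003, 1, 1), "text": "battery"},
    {"title": "c", "date": date(2002, 1, 1), "text": "fuel cell fuel"},
]
by_date = [d["title"] for d in prioritize(docs, "date")]
by_match = [d["title"] for d in prioritize(docs, "match", "fuel cell")]
```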
At step 340, a summary report of an item of the search result is produced. The search results may contain a long article or patent that takes a tremendous amount of time to read. The article or patent can be summarized. Accordingly, the user can quickly catch the gist of the article or the patent and decide whether he/she wants to read more of the article or patent. Many algorithms can be used to produce the summary report. For example, the summary report is generated by using subject-action-object analysis. Several commercial products, such as KMS from Intumit Technology Corporation, can be employed to produce summary reports.
In
At step 420, a list of patent databases is received. More than one database can be specified. The patent databases can include issued patents and published patent applications. The patent databases can be a United States patent database, a Japanese patent database, or a European patent database. When a different language is required to search a specified patent database, the search query is translated into that language for conducting the search.
At step 430, patent databases are searched to collect first tier reference patents that cite or are cited by patents satisfying the conditions of the search query. Using the aforementioned search query, for example, the first tier reference patents are patents that cite or are cited by IBM's patents issued after 1 Jan. 2002. In other words, the first tier reference patents are patents having a forward citation relationship or a backward citation relationship with IBM's patents issued after 1 Jan. 2002.
At step 440, a citation list is produced. In one embodiment, the citation list comprises owners, patent numbers, titles, and issued dates of the first tier reference patents. In some embodiments, patents commonly owned by a single entity are identified in the citation list even if those patents specify different names of assignee. Various statistical functions such as summation can be performed while producing the citation list. For example, a citation list of first tier reference patents citing IBM's patents issued after 1 Jan. 2002 can be first sorted by owners and further sorted by issued dates.
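The production of the citation list at step 440 can be sketched as follows. This is a hedged illustration: the alias table used to identify commonly owned patents, the field names, and the sample records are assumptions made for the example, and a per-owner count stands in for the summation function.

```python
from collections import Counter
from datetime import date
from typing import Optional


def build_citation_list(reference_patents: list[dict],
                        aliases: Optional[dict] = None):
    """Return (rows sorted by owner then issue date, patent count per owner)."""
    aliases = aliases or {}
    rows = []
    for patent in reference_patents:
        # Map differing assignee names onto one common owner where known.
        owner = aliases.get(patent["assignee"], patent["assignee"])
        rows.append({
            "owner": owner,
            "number": patent["number"],
            "title": patent["title"],
            "issued": patent["issued"],
        })
    rows.sort(key=lambda row: (row["owner"], row["issued"]))
    counts = Counter(row["owner"] for row in rows)  # summation per owner
    return rows, counts


# Hypothetical sample data, for illustration only.
refs = [
    {"assignee": "Acme Corp.", "number": "US3", "title": "T3", "issued": date(2004, 1, 1)},
    {"assignee": "Beta Inc.", "number": "US5", "title": "T5", "issued": date(2003, 1, 1)},
    {"assignee": "Acme", "number": "US2", "title": "T2", "issued": date(2003, 6, 1)},
]
rows, counts = build_citation_list(refs, {"Acme Corp.": "Acme"})
```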
In a computer-implemented system, the step 410 is performed by a search query receiving means to receive at least one search condition. The step 420 is performed by a patent-database-list receiving means to receive a list of patent databases. The step 430 is performed by a patent-database searching means to collect first tier reference patents. The step 440 is performed by a citation-list producing means to produce a citation list. These means can be a processor programmed to appropriately perform specific functions. The program can be written in any kind of computer language such as Java, C, C++, Visual C, Visual Basic, or Assembly.
At step 520, a notice is generated to a predetermined person when the owner of a first tier reference patent matches a predetermined entity. For example, if the predetermined entity is Intel, a notice is generated to a manager when the owner of at least one first tier reference patent is Intel. Taking the example of the search query of IBM's patents issued after 1 Jan. 2002, a notice is generated if at least one Intel patent cites or is cited by IBM's patents issued after 1 Jan. 2002. In some embodiments, the notice is automatically generated by the system and sent to the predetermined person by e-mail.
In a computer-implemented system, the step 520 is performed by a notice generating means that generates a notice to a predetermined person. The notice generating means can be a processor programmed to generate a notice. The program can be written in any kind of computer language such as Java, C, C++, Visual C, Visual Basic, or Assembly. In some embodiments, the notice can be an electronic mail automatically generated by the system and sent to the predetermined person. In other embodiments, the notice can be a fax or a phone call automatically generated by the system.
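The owner-matching check of step 520 can be sketched as follows for the e-mail embodiment. This is a minimal illustration: the function name `build_notice`, the row fields, the message wording, and the recipient address are assumptions made for the example, and actually sending the message is left out.

```python
from email.message import EmailMessage
from typing import Optional


def build_notice(citation_rows: list[dict], watched_owner: str,
                 recipient: str) -> Optional[EmailMessage]:
    """Build an e-mail notice when at least one first tier reference
    patent is owned by the predetermined entity; otherwise return None."""
    matches = [row for row in citation_rows if row["owner"] == watched_owner]
    if not matches:
        return None  # no match, so no notice is generated
    numbers = ", ".join(row["number"] for row in matches)
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"Citation notice: {watched_owner}"
    message.set_content(
        f"{len(matches)} reference patent(s) owned by {watched_owner}: {numbers}"
    )
    return message


# Hypothetical sample data, for illustration only.
notice_rows = [
    {"owner": "Intel", "number": "US2"},
    {"owner": "Acme", "number": "US3"},
]
notice = build_notice(notice_rows, "Intel", "manager@example.com")
```

The returned message could then be handed to whatever mail transport the system uses.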
At step 720, a second tier citation list is produced. In one embodiment, the second tier citation list comprises owners, patent numbers, titles, and issued dates of the second tier reference patents. Various statistical functions such as summation can be performed while producing the second tier citation list.
In a computer-implemented system incorporating processes shown in
The present invention may be embodied in the form of computer-implemented processes and apparatus for practicing those processes. The present invention may also be embodied in the form of computer program code embodied in tangible media, such as floppy diskettes, read only memories (ROMs), CD-ROMs, hard disk drives, high density (e.g., ZIP™) diskettes, electrically erasable programmable ROM (EEPROM), flash memory, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. The present invention may also be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over the electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the computer program code segments configure the processor to create specific logic circuits.
Although the invention has been described in terms of exemplary embodiments, it is not limited thereto. Rather, the appended claims should be construed broadly, to include other variants and embodiments of the invention, which may be made by those skilled in the art without departing from the scope and range of equivalents of the invention.