The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in order to provide what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
In the drawings:
The present embodiments comprise a device, system or a method, which allow for the improved use of search-engines by creating, maintaining and making available keywords from past searches, and document search pattern data to produce more focused searches. From the user point of view the searcher is able to find a document, find keywords that have been used in the past in association with the document, and use the retrieved keywords to refine his search. The user can also use the retrieved keywords as an indication for the quality of his search.
The principles and operation of a device, system and method according to the present invention may be better understood with reference to the drawings and accompanying description.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. In addition, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
A preferred embodiment of the present invention is designed to provide a device for managing document search-pattern records. The device comprises an associative memory, such as a designated repository. The associative memory is configured for recording the usage of keywords in records so that a user can access keywords used by previous searchers. For example, the user may find a first document of interest and then access keywords that allowed earlier users to find that same document so that the user can now find further documents of interest and focus his search more effectively.
The associative memory is configured for recording the usage of keywords in search queries such that each keyword is associated with a number of documents which are retrieved in response to the search queries that comprise the keyword. The device is configured for managing access to the associative memory and to the keywords which are stored therein. Preferably, each keyword is stored in a designated record in the associative memory. The designated record is associated with related documents, as fully described below.
Another embodiment of the present invention is a method for managing the documenting of search query keywords. During the first step, keywords used by a search-engine user in a search query are received. Then the usage of each one of the keywords is stored in association with documents retrieved in response to the search query that comprises the keyword. In the following step, independent access to the stored keywords, and the ability to use them, is provided, for example to allow later users to make more focused searches. The access is typically given via a communication network.
Another embodiment of the present invention is a system for facilitating access to keywords used in search queries over user networks. The system comprises a network-accessible repository which is usable for storing keywords such that each keyword is associated with documents retrieved in response to the search query for that keyword. The system further comprises user applications which are configured to be connected to the repository via a communication network. Each user application facilitates the retrieval of some of the keywords in response to submitting a document identification mark of a document which is associated with them.
A network entity may be understood as a server, a router, a personal computer, or any other computing unit, which can be used for implementing database management.
A communication network may be understood as the Internet, an Ethernet network, a wired or wireless computer network, a local area network, etc.
A keyword may be understood as a word, a number, a term, a sentence, a phrase, a trademark, a file name, a URL, an IP address, a link, etc. A keyword may also be understood as a string of keywords that comprises a number of keywords and a number of logical relationships between them.
A document may be understood as a Web page, a file, a WORD document, a PDF document, an XML page, an HTML page, an Internet page, or any other document which is accessible via the communication network.
A document identification mark may be understood as a hyperlink, a Uniform Resource Locator (URL) address, a pointer to a document, a logical address of a document in storage, a relative address of a document in storage, or a reference to a document or other resource.
Reference is now made to
Reference is now made to
In one embodiment of the present invention, the browsing application 11 is a client application which is configured to access the records which are stored in the associative memory 2 directly.
In a preferred embodiment of the present invention, the keyword managing unit 1 is configured to allow network users 10 to rely on keywords used in different user search queries in order to refine their searches, as further described below and depicted in
The network users 10 are preferably connected to the communication network 5 using computing units (not shown). Each computing unit may be understood as a personal computer, a personal digital assistant, a mobile telephone, or a laptop. Each computing unit is used for hosting a browsing application 11. In one embodiment, the browsing application 11 is a Web browser such as the Microsoft Internet Explorer™ Web browser. The Web browser allows a user to access any Web page which is available via the communication network 5. As commonly known, each Web page has an address such as a URL address, which is a standard way of specifying the location of an object on the Internet. The Web browser points to the URL of a Web page to receive the related Web page in the hosting computing unit. In another preferred embodiment, the communication network 5 is a geographically limited communications network such as a LAN. The communication network 5 may be a communication network of a business entity, such as a law office or a company, or a public entity, such as a library or a governmental organization. In such an embodiment, the keyword managing unit 1 is used to document the searching activity of local network users 10.
The keyword managing unit 1 is configured to record the searching activity of network users 10 which are connected to the communication network 5 using the browsing applications 11. In one embodiment of the present invention, the keyword managing unit 1 is connected to one or more search-engine servers 6, either directly or via the communication network 5. This connection allows the keyword managing unit 1 to document search queries and documents, which are retrieved in response to the search queries, as described below.
The search-engine server 6, which is connected to the communication network 5, is accessible to the user 10 by its IP or URL address and lets the user perform keyword searches for information on the communication network 5. As would be known by any programmer of ordinary skill in the art, a search-engine server includes the following major components: a means to access a collection of documents available over the communication network; an indexing component for building an index of the document collection; and a retrieval (or search) component that, in response to a search query, provides via the index a subset of documents or links that are identified as the search results that are relevant to the query, preferably by some ranking criteria. A document collection typically consists of a certain number of electronic documents of various formats, such as text files, HTML Web pages, or links thereto. Large-scale document retrieval systems generally use inverted indices, i.e., indices that record for each keyword (called an index keyword) a list of documents that contain that keyword. Such a list is usually termed an inverted list. Each inverted index consists of many inverted lists, each of which corresponds to a keyword in the index. In many cases, the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document. A document may contain many keywords, and hence may be included in many inverted lists.
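By way of non-limiting illustration only, the inverted index described above may be sketched as follows. The function names `build_inverted_index` and `search_all`, the whitespace tokenization and the sample documents are assumptions introduced for this sketch and do not describe any particular search-engine server.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each keyword to the sorted list of document ids containing it.

    `documents` is a dict of {document_id: text}. Tokenization is a plain
    lowercase whitespace split, which is an illustrative simplification.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for keyword in text.lower().split():
            index[keyword].add(doc_id)
    # Each value is one inverted list for one index keyword.
    return {keyword: sorted(doc_ids) for keyword, doc_ids in index.items()}


def search_all(index, query):
    """Return ids of documents that contain every keyword of the query."""
    keywords = query.lower().split()
    if not keywords:
        return []
    result = set(index.get(keywords[0], []))
    for keyword in keywords[1:]:
        result &= set(index.get(keyword, []))
    return sorted(result)


# Hypothetical sample collection.
sample_documents = {
    "doc1": "patent search engine",
    "doc2": "search pattern repository",
    "doc3": "engine repair manual",
}
sample_index = build_inverted_index(sample_documents)
```

In this sketch a document that contains several keywords appears in several inverted lists, and a multi-keyword query is answered by intersecting those lists.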
Preferably, the search-engine server 6 comprises one or more indices or inverted indices that map the document collection which is available through the communication network 5. As in many common search-engines, when a network user 10 uses the browsing application 11 to access the search-engine server 6 and makes a search query by giving keywords, the search-engine looks up the index and provides a listing of best-matching documents according to its criteria, usually with a short summary containing the document's title and, sometimes, parts of the text. The search-engine preferably supports search queries that comprise Boolean terms such as AND, OR and NOT, which are used to further narrow the search query, and other features such as a proximity search, which allows the network user 10 to define the distance between keywords. It should be noted that the manner of performing keyword searching is well known and, hence, will not be described here in detail.
The keyword managing unit 1 is used for recording the keywords in the submitted search query and the documents which are retrieved in response thereto. In one embodiment of the present invention, when a network user 10 uses the browsing application 11 to access the search-engine server 6 and make a search query, the keywords which are used in the search query and the document identification marks which are retrieved in response to the search query are transferred to the managing agent 3 of the keyword managing unit 1. The managing agent documents the keywords in one or more keyword records which are associated with one or more document records, which may be referred to as keyword records hereinafter. The document records comprise the document identification marks which were retrieved in response to the aforementioned search query. Preferably, each document which has been retrieved in response to a particular input search query is associated with a document record. The document record is associated or linked with one or more keyword records, each of which comprises a keyword used in the given input search query. Each keyword record is coupled to a counter that counts the number of occurrences of the related keyword in subsequent search queries in order to reflect the prevalence of the related keyword in different search queries that resulted in retrieving the document which was documented in the associated document record. This information is preferably collected in a dynamic manner, as further discussed below. The collected information allows a network user 10 to refine his search based upon the searching activity of other network users, which activity is documented in the associative memory 2. A remotely located network user can receive information from the associative memory 2 that indicates which keywords are usually used for retrieving certain documents, as further described below.
Such a process may be regarded as a search in reverse, hereinafter a reverse search, since the keywords are retrieved in response to document identification marks and not the opposite, as in a common search process. In one embodiment of the present invention the associative memory 2 is a designated repository which is used to document the document search-patterns, as explained in greater detail below.
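By way of non-limiting illustration only, the recording of keyword usage and the reverse search described above may be sketched as follows. The class name `SearchPatternRepository`, its method names and the sample URLs are hypothetical and are introduced solely for this sketch; an in-memory dictionary stands in for the associative memory 2.

```python
from collections import Counter, defaultdict


class SearchPatternRepository:
    """Hypothetical in-memory stand-in for the associative memory 2."""

    def __init__(self):
        # document identification mark -> Counter of keyword occurrences
        self._records = defaultdict(Counter)

    def record_search(self, keywords, document_marks):
        """Document one search query and the marks retrieved in response."""
        for mark in document_marks:
            for keyword in keywords:
                self._records[mark][keyword.lower()] += 1

    def reverse_search(self, document_mark, limit=5):
        """Return the most prevalent keywords recorded for a document."""
        if document_mark not in self._records:
            return []
        return self._records[document_mark].most_common(limit)


repo = SearchPatternRepository()
repo.record_search(["jaguar", "car"], ["http://example.com/a"])
repo.record_search(
    ["jaguar", "speed"], ["http://example.com/a", "http://example.com/b"]
)
```

In this sketch, submitting the document identification mark `http://example.com/a` to `reverse_search` returns the keywords previously used to retrieve that document, together with their counters.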
Reference is now made to
The keywords which are used by the network users are documented in the document search-pattern repository by the managing agent. A list, preferably dynamic, of document records constitutes documents retrieved in response to keywords used by network users during their searches. The exemplary database architecture, which is depicted in
Preferably, the document record 56 comprises a validation entry. The validation entry is used to store the last time a certain document record 56 has been updated. Such a validation entry may be used to refresh the repository by deleting document records which have not been updated for a certain period. Optionally, a creation entry may be stored in the document record 56. The creation entry is used to store the creation time of the document record. Such a creation entry may also be used to refresh the repository.
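By way of non-limiting illustration only, refreshing the repository according to the validation entry may be sketched as follows. The field name `last_updated`, the function name `purge_stale_records` and the ninety-day retention period are assumptions made for this sketch.

```python
from datetime import datetime, timedelta


def purge_stale_records(records, now, max_age):
    """Return only the records whose validation entry is recent enough.

    `records` maps a document identification mark to a dict that holds a
    `last_updated` datetime (the validation entry). Both the field name and
    the retention period are illustrative assumptions.
    """
    cutoff = now - max_age
    return {
        mark: record
        for mark, record in records.items()
        if record["last_updated"] >= cutoff
    }


example_records = {
    "http://example.com/fresh": {"last_updated": datetime(2006, 5, 10)},
    "http://example.com/stale": {"last_updated": datetime(2005, 1, 1)},
}
kept = purge_stale_records(
    example_records, now=datetime(2006, 5, 17), max_age=timedelta(days=90)
)
```

A creation entry could be handled in the same manner by comparing a creation time, rather than the last update time, against the cutoff.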
It should be noted that other implementations of the repository are possible. In one embodiment of the present invention, the keyword entry 58 comprises pointers to the data fields of a collective keyword list that comprises all the keywords and terms which have been documented in different search queries. Such an implementation may substantially reduce the required memory storage capacity, thus lowering the storage hardware cost, and may increase the speed of generating and processing keyword records.
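By way of non-limiting illustration only, a collective keyword list that is referenced by pointers may be sketched as follows. The class name `CollectiveKeywordList` and the use of integer positions as pointers are assumptions made for this sketch.

```python
class CollectiveKeywordList:
    """Stores each distinct keyword once and hands out integer pointers."""

    def __init__(self):
        self._keywords = []   # pointer -> keyword
        self._pointers = {}   # keyword -> pointer

    def intern(self, keyword):
        """Return the pointer for a keyword, adding it if it is new."""
        pointer = self._pointers.get(keyword)
        if pointer is None:
            pointer = len(self._keywords)
            self._keywords.append(keyword)
            self._pointers[keyword] = pointer
        return pointer

    def resolve(self, pointer):
        """Return the keyword that a pointer refers to."""
        return self._keywords[pointer]

    def __len__(self):
        return len(self._keywords)


shared = CollectiveKeywordList()
# Keyword entries of two hypothetical document records hold pointers only.
record_a = [shared.intern(k) for k in ["patent", "search"]]
record_b = [shared.intern(k) for k in ["search", "engine"]]
```

In this sketch the keyword "search" is stored once even though it is referenced by both records, which is the source of the storage saving noted above.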
Clearly, the number of records in the dynamic document search-pattern repository depends on the number of performed search queries and retrieved documents. The higher the number, the more comprehensive the document search-pattern repository will be.
Reference is now made to
As described above, the document search-pattern repository is configured to be dynamically updated according to network users' search queries. Such dynamic updating allows the document search-pattern repository to provide network users with information regarding the frequency of use of different keywords. However, in order to provide more comprehensive information regarding the search-patterns of the stored documents, the document search-pattern repository has to be expanded. In one embodiment of the present invention, as shown in
Preferably, one of the attribute entries is an IP entry 106 which is used to record the IP address of the user that submitted the search query. Other user identification marks, such as subscriber names or email addresses, may be used. Another attribute entry 108 records the country of origin from which the network user accessed the communication network for using the search tool of the search-engine server. This information can easily be tracked, as the IP address of the network user is available and its origin is generally indicative of the country.
Preferably, one of the attribute entries is a time stamp entry 107 that documents the time at which each network user accessed the communication network for using the search tool of the search-engine server. This information can easily be tracked using a clock-based module that indicates the exact time at which each network user accessed the search-engine server. Preferably, time adjustments are made in order to adjust the access hour according to the time zone of each user. The time zone can be identified according to the IP address that reflects the country of origin, as described above. It should be noted that different time intervals may be documented. Such time intervals may be hours of the day, seasons, months, or days of the week. Relative time or local time may be used.
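By way of non-limiting illustration only, adjusting a recorded access time to the local time of the user may be sketched as follows. The function name `local_access_bucket` and the parameter `utc_offset_hours` are hypothetical; the offset stands in for a time zone derived from the IP address, and that derivation is outside the scope of this sketch.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def local_access_bucket(access_time_utc, utc_offset_hours):
    """Convert a UTC access time to the user's local hour and weekday.

    `utc_offset_hours` stands in for a time zone derived from the user's IP
    address; the lookup from IP address to offset is not shown here.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = access_time_utc.astimezone(local_zone)
    return {"hour": local_time.hour, "weekday": WEEKDAYS[local_time.weekday()]}


# 23:30 UTC on Wednesday 17 May 2006, viewed from a UTC+2 location.
bucket = local_access_bucket(
    datetime(2006, 5, 17, 23, 30, tzinfo=timezone.utc), utc_offset_hours=2
)
```

The example shows why the adjustment matters: an access late on Wednesday in UTC falls in the early hours of Thursday for the user.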
The attribute entries may also be used for recording user-related information. Such information may be documented if the search-engine server or the keyword managing unit has more information about the network user that submits the search query. Examples of attribute lists that document keyword usage in search queries that retrieve the related document are presented in
As described above, the keyword managing unit is configured for documenting information about the network users. As further described above, the keyword managing unit is configured for allowing different users to submit search queries. One embodiment of the present invention allows differentiating between search queries which are submitted by different users. In such an embodiment, keywords of search queries which have been submitted by certain users may be given more weight than keywords of search queries which have been submitted by others. Users may be divided into different groups; each group preferably represents a different professional level. For example, users may be divided into novice searchers, average searchers, and professional searchers. In such an embodiment, the records of the document search-pattern repository are updated according to the user's professional level. For example, if a novice user used a certain keyword in a search query, the counter which is associated with the keyword is increased by one. However, if a professional user used the same keyword in a search, the counter is increased by 3. In order to implement such an embodiment, the document record 103 may comprise an attribute entry 113 that stores the professional level of the user.
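By way of non-limiting illustration only, updating the counters according to the professional level of the user may be sketched as follows. The weights of one for a novice searcher and three for a professional searcher follow the example given above; the weight of two for an average searcher and the names `LEVEL_WEIGHTS` and `update_keyword_counters` are assumptions made for this sketch.

```python
from collections import Counter

# Weights for novice and professional follow the example in the text;
# the weight for an average searcher is an assumed intermediate value.
LEVEL_WEIGHTS = {"novice": 1, "average": 2, "professional": 3}


def update_keyword_counters(counters, keywords, user_level):
    """Increase each keyword counter by the weight of the user's level."""
    weight = LEVEL_WEIGHTS[user_level]
    for keyword in keywords:
        counters[keyword] += weight
    return counters


counters = Counter()
update_keyword_counters(counters, ["tort", "liability"], "novice")
update_keyword_counters(counters, ["tort"], "professional")
```

After these two hypothetical queries, the counter for "tort" holds 4 (one from the novice user and three from the professional user) and the counter for "liability" holds 1.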
Preferably, the document record 101 comprises attribute entries which record navigational data. Navigational data includes log files and click stream data. Navigational data can identify a user's Web browser and operating system, when and for how long a user visits a certain Website, what pages a user views on a Website, and the address of the Website that the user visited immediately prior to that Website. This information is typically used to administer a Website, improve Website content, and compile aggregated statistics for marketing and research purposes. The navigational data may be collected on the server side by examining Web server page request logs or on the client side by monitoring user surfing patterns using, for example, a designated add-in. Such information can better reflect the relevance of the associated document to the keyword which was used in the search query in which it was retrieved. Clearly, a certain document which users spent a significant amount of time viewing, or a certain Website in which a user viewed a large number of pages, is more relevant to the keyword which was used in the search query that retrieved it than a document or Website which was viewed only briefly. Thus, documenting the navigational data may allow the user to rely on better information when conducting his search. Moreover, by using the navigational data one can avoid misleading keywords. Even if a certain keyword was used for a particular document in a large number of search queries, the related navigational data may indicate that the keyword is not relevant to the particular document, since users did not utilize the retrieved document.
In one embodiment of the present invention, the document record 102 comprises an attribute entry 112 that records the time a certain network user stays in the related Website which is pointed to by the document entry 51. Such information can be acquired by different calculations which are based on navigational data which is related to the user. Preferably, the time a certain network user stays in the related Website is updated by an external source which is designated for acquiring such information. Another preferred attribute entry documents the average number of pages the user visits in the related Website.
Since the document record 101 records all the keywords which are used in different search queries, as described above, the total number of search queries that use a certain keyword to retrieve a certain document can easily be calculated.
Reference is now made to
The display 500 is further configured to display in parallel relevant document search-pattern information which is stored in an associative memory such as the aforementioned document search-pattern repository. Preferably, the keyword managing unit is configured to receive one or more document identification marks and, accordingly, to retrieve one or more sets of related keywords and additional related information. Preferably, the keyword managing unit is configured to retrieve the most prevalent keywords which are used for retrieving the document. In one embodiment of the present invention, the user application is configured to display a pop-up window 505 that is configured to show relevant statistical document search-pattern information, when available, about the retrieved documents that comprise the search result list 503. Preferably, the pop-up window 505 is automatically displayed when the input pointer 504 is moved over one of the links, or when a designated button is pressed. Preferably, when the input pointer is moved over one of the links, a related document identification mark is sent to the keyword managing unit. The keyword managing unit retrieves matching keywords and preferably additional information. The retrieved keywords are presented in the pop-up window, as depicted at 506.
As shown in the exemplary display of
As described above, other document search-pattern information is stored in the associative memory. In use, the additional information may also be displayed in parallel in the pop-up window 505 or, if desired, in a separate pop-up window, as per the user's requirements. Preferably, the keywords 506 which are displayed in the pop-up window 505 and were submitted in text box 502 are displayed in bold letters. Such a display helps the user to distinguish between words that he already uses and words that might assist him in refining his search. Preferably, the user can move the input pointer 504 over one of the displayed keywords 506 and click on it in order to add it to his search query. A search according to the new search query that comprises the selected keyword may be performed automatically after the selected keyword has been clicked on.
In one embodiment of the present invention, the links of the search result list 503 are arranged according to the prevalence of keywords, which were submitted in searches retrieving the linked document in the past, and are currently submitted in text box 502. As commonly known, elements of search result lists are usually arranged according to numerical weighting methods which are used for evaluating the relative importance of elements that comprise the list. Usually, each element of a hyperlinked set of documents, such as the World Wide Web, is weighted for the purpose of measuring its relative importance within the set. Such methods may be applied to any collection of entities with reciprocal quotations and references. An example of such a numerical weighting method is the PageRank method by Google™.
Preferably, the links that comprise the search result list 503 are arranged not only by their numerical weight, but also by their prevalence in search result lists, which were generated according to previous searches comprising one or more of the keywords, used in generating the search result list 503. Such an embodiment can be implemented by accessing related records in the associative memory. As described above, an associative memory such as the document search-pattern repository preferably comprises records of documents or document identification marks. Each record is associated with entries that reflect the prevalence of different keywords in search queries retrieving the document stored or marked in the associated record.
Preferably, the numerical weight of links of the search result list 503 is determined by matching keywords used in the search query which is currently submitted by the user with document records that reflect the prevalence of different keywords in search queries retrieving the linked document. The higher the prevalence of the matched keywords in search queries retrieving the linked document, the higher the given numerical weight.
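By way of non-limiting illustration only, arranging the links according to both their numerical weight and the prevalence of the matched keywords may be sketched as follows. The function name `rerank` and the boost formula (the base weight multiplied by one plus the share of recorded keyword occurrences that match the current query) are assumptions made for this sketch; other combinations are equally possible.

```python
def rerank(results, query_keywords, keyword_counts):
    """Order result links by base weight boosted by keyword prevalence.

    results: list of (document_mark, base_weight) pairs.
    keyword_counts: {document_mark: {keyword: occurrences}}.
    The boost formula is an illustrative assumption, not a prescribed one.
    """
    query = {k.lower() for k in query_keywords}

    def score(item):
        mark, base_weight = item
        counts = keyword_counts.get(mark, {})
        total = sum(counts.values())
        matched = sum(n for k, n in counts.items() if k in query)
        share = matched / total if total else 0.0
        return base_weight * (1.0 + share)

    return [mark for mark, _ in sorted(results, key=score, reverse=True)]


# Hypothetical base weights and recorded keyword counters.
results = [("doc_a", 1.0), ("doc_b", 0.8)]
keyword_counts = {
    "doc_a": {"car": 1, "cat": 9},
    "doc_b": {"car": 8, "speed": 2},
}
```

In this sketch a query for "car" places `doc_b` first despite its lower base weight, because "car" accounts for most of the queries that retrieved `doc_b` in the past.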
In one embodiment of the present invention, the records of the associative memory are used for identifying similar documents. As commonly known, some search engines allow users to access similar pages by clicking on a designated link. When the user selects the link for a particular result of the result list, the search engine automatically scouts the Web for pages that are related to this result. In the present invention, when the user selects such a link 551 for a particular result of the result list, a related document identification mark is sent to the document search-pattern repository, preferably via the search engine, and similar documents are retrieved based upon related records of the document search-pattern repository. For example, the time stamp of the document record, as described above, may be used to find similar pages. Documents which are accessed by the same user at approximately the same time can be estimated to be similar documents. The similar documents may be documented offline or online. The retrieved similar documents can be chosen according to the information which is documented in the document records. For example, the similar documents may share a common user which is documented in the IP entries, a common age group, or a combination thereof.
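By way of non-limiting illustration only, estimating similar documents from a common user and a common access time may be sketched as follows. The function name `similar_documents`, the tuple layout of the access log and the ten-minute window are assumptions made for this sketch; the user identifier stands in for the IP entry.

```python
from datetime import datetime, timedelta


def similar_documents(access_log, document_mark, window=timedelta(minutes=10)):
    """Find documents accessed by the same user at about the same time.

    access_log: list of (user_id, document_mark, access_time) tuples, where
    user_id stands in for the IP entry. The ten-minute window is assumed.
    """
    anchors = [(u, t) for u, d, t in access_log if d == document_mark]
    similar = set()
    for user, mark, when in access_log:
        if mark == document_mark:
            continue
        if any(user == u and abs(when - t) <= window for u, t in anchors):
            similar.add(mark)
    return sorted(similar)


# Hypothetical access log.
log = [
    ("10.0.0.1", "doc_a", datetime(2006, 5, 17, 9, 0)),
    ("10.0.0.1", "doc_b", datetime(2006, 5, 17, 9, 5)),
    ("10.0.0.1", "doc_c", datetime(2006, 5, 17, 15, 0)),
    ("10.0.0.2", "doc_d", datetime(2006, 5, 17, 9, 1)),
]
```

In this sketch only `doc_b` is estimated as similar to `doc_a`: `doc_c` was accessed by the same user but hours later, and `doc_d` was accessed at about the same time but by a different user.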
As described above, user applications may be used for accessing the associative memory, which is part of the keyword managing unit, and for downloading related records. Such an ability allows the users to use the information stored in the associative memory to refine their searches. For example, in
As described above, the pop-up window 505 is configured to display a list of keywords according to their usage in previous searches that retrieved the related document. The list of keywords indicates the prevalence of each one of the list's keywords in previous searches. In one preferred embodiment, the pop-up window 505 is configured to display a list of keywords which is a conjunction of two or more lists of keywords which are each associated with different documents. In such an embodiment, the user can choose two or more retrieved documents from the search result list 503. The keyword managing unit receives two or more respective document identification marks and generates for each one of them a list of keywords, as described above. Then the keyword managing unit chooses the keywords with the highest number of occurrences in the sum of the occurrences from each one of the lists of keywords. In another embodiment of the present invention, this process is done automatically, as a list of keywords which is a conjunction of two or more lists of keywords is produced for a predefined number of documents in each search result list. For example, such a list of keywords may be automatically produced for the portion of the search result list which is currently displayed on the screen.
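By way of non-limiting illustration only, producing a single list of keywords from the lists associated with two or more documents may be sketched as follows. The function name `combined_keywords`, the `limit` parameter and the sample counters are assumptions made for this sketch.

```python
from collections import Counter


def combined_keywords(keyword_lists, limit=3):
    """Sum keyword occurrences over several documents and keep the top ones."""
    total = Counter()
    for counts in keyword_lists:
        total.update(counts)
    return total.most_common(limit)


# Hypothetical keyword lists of two documents chosen by the user.
list_a = {"jaguar": 5, "car": 3, "speed": 1}
list_b = {"jaguar": 2, "cat": 4, "car": 2}
top = combined_keywords([list_a, list_b], limit=2)
```

In this sketch "jaguar" (seven occurrences in total) and "car" (five occurrences in total) are chosen, whereas "cat", although prominent for one document, is outranked once the occurrences are summed.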
In another preferred embodiment the list of keywords is displayed in a diagram such as a graph or a chart. The diagram may be used for displaying a series of points or lines to demonstrate a connection between two or more attributes. For example, as depicted in
In another embodiment of the present invention, the keyword managing unit allows users to refine their searches using keywords which have been used by a certain user or group of users which are, preferably, from a common location or part of the same department. As described above, each document record may comprise an IP entry that records the IP of the user that submitted a related search query. Preferably, the keyword managing unit can retrieve the keywords a certain user used in a search query for retrieving a certain document.
Reference is now made to
In another embodiment of the present invention, the keyword managing unit may further comprise a search-engine module. The search-engine module is preferably configured to search the associative memory, using the keyword managing unit, according to a received search query or index. As described above, the associative memory documents querying information that is associated with different documents which are accessible via the communication network. An exemplary structure of the relationship between records that are stored in an associative memory such as the document search-pattern repository is depicted in
As described above, the information which is accumulated in the associative memory which is connected to the keyword managing unit reflects the behavior of network users. As such, the keyword managing unit using the search-engine module can be used as an analytic tool for analyzing the behavior and search patterns of network users. Such an analytic tool can be used in academic and commercial studies. For example, advertisers, Website administrators and promoters can utilize the database to identify which keywords are used to retrieve certain documents in order to improve their traffic, search-engine ranking, and Web presence. For instance, advertisers can use the keyword managing unit to improve their website hit rate by identifying which keywords are commonly used for retrieving their website and websites which are related to their service or product. Moreover, the keyword managing unit may be used to identify which demographic groups retrieve their website or websites which are related to their service or product. In addition, the keyword managing unit may be used to identify which segment of the population uses which words for searching their website or websites which are related to their service or product. Such information can be highly beneficial for improving marketing activities.
Psychological information can be gathered and analyzed according to the statistical information which is gathered, as aforementioned, in the database.
One embodiment of the present invention is related to the generation of document summaries in search result lists. As commonly known, search result lists include individual entries that have been identified by the search-engine as satisfying the user's search expression. Each entry includes a hyperlink that points to a URL location or a Web page. In addition to the hyperlink, certain search result pages include a short document summary that describes the content of the URL location. Typically, search-engines generate this document summary from the file at the URL, and only provide acceptable results for URLs that point to HTML format documents. For URLs that point to HTML documents or Web pages, a typical document summary includes a combination of values selected from HTML tags. These values may include text from the Web page's “title” tag, from what are referred to as “annotations” or “meta tag values” such as “description,” “keywords,” etc., from “heading” tag values (e.g., H1 or H2 tags), or from some combination of the content of these tags. Some search-engines generate the document summary according to matches between document features such as the HTML tags and the keywords that comprise the search query that initiates the retrieval of the summarized document. However, it is noted that search query keywords may not always accurately reflect the content of the summarized document and such a summary may, therefore, mislead the user.
In one embodiment of the present invention, the records of the associative memory are used for generating a summary of an associated document. As described above, the associative memory comprises documentation of the keyword usage in connection with different documents which are retrieved in response to search queries that comprise related keywords. Preferably, during the document summary generation, the generating module of the search-engine accesses the associative memory using the keyword managing unit. This allows the generating module to use keywords which are stored in association with related document records instead of keywords which comprise the user's search query. Preferably, only the most common keywords are used for generating the document summary. For instance, in the example depicted in
Reference is now made to
Then, as shown at 302, the usage of each one of the keywords is stored in the associative memory, preferably in designated records. The designated records are associated with one or more documents which are retrieved in response to the search query. If additional information is received, it is stored in association with the keywords of the search query. An exemplary database structure is explained in detail hereinabove. In the following step, as shown at 303, independent access to the designated records and the ability to use them is provided to the user via a communication network. If additional information is stored, access is given thereto as well. Preferably, the user can use user applications, as described above, to access the designated records which are stored in the associative memory using the keyword managing unit.
Reference is now made to
It is expected that during the life of this patent many relevant devices and systems will be developed, and the scope of the terms herein, particularly of the terms search-engine, server, Website, Web page, communication network, and user application, is intended to include all such new technologies a priori.
Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.
Reference is now made, once again, to
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents, and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.
This Application claims the benefit of U.S. Provisional Patent Application No. 60/747,418, filed on May 17, 2006, the contents of which are hereby incorporated by reference.
| Number | Date | Country |
|---|---|---|---|
| 60747418 | May 2006 | US |