Embodiments described herein relate generally to an interest extraction device and an interest extraction method, which determine what part of text information such as a web page or a manuscript a user browsing the text information is interested in and recommend information suitable for the user.
There have been demands for determining what part of text information (also called a “document”) such as a web page or a manuscript a user browsing the text information is interested in, and for recommending information suitable for the user. For devices of this type, a proposal has been made for technology for updating importance degrees of keywords located near keywords being operated in a page (for example, see JP-A 2001-188792 (KOKAI)).
However, with the method described above, in which keywords included in a page are simply extracted and used for a search, unrelated search results, such as results for homonyms, may be presented. In addition, even when the same document is browsed, the content that attracts attention differs depending on context. Since the point of interest cannot be adequately determined, it is not possible to estimate how well recommended content matches the interest of a user when the recommended content is presented. Among conventional proposals, there is technology for searching for relevant documents with a focus on the periphery of a word pointed to on a page. However, no technology has been proposed for presenting content recommended as a document to be browsed next after the present document, based on the interest in the immediately preceding document.
In general, according to one embodiment, an information recommendation device includes an input unit, a subject-keyword extraction unit, an interest-keyword extraction unit, an acquiring unit and a presentation unit. The input unit is configured to input a first document browsed by a user, and a second document which has been browsed before the first document. The subject-keyword extraction unit is configured to extract one or more first subject keywords from the first document, and to extract one or more second subject keywords from the second document. The interest-keyword extraction unit is configured to extract one or more first interest keywords from the first subject keywords and the second subject keywords, and to extract one or more second interest keywords from the first subject keywords and the second subject keywords, based on information items specifying the first document and the second document, the first interest keywords, the first subject keywords, and the second subject keywords, the second interest keywords being estimated to be keywords in which the user is next interested. The acquiring unit is configured to acquire, based on the second interest keywords, recommendation information items on one or more third documents which are candidates to be browsed after the first document. The presentation unit is configured to present the recommendation information items.
Hereinafter, various embodiments will be described with reference to the accompanying drawings.
According to one embodiment, content and service recommendations can be made so as to adequately match a user's interest. For example, when a user browses a page relevant to “grilled chicken-wing-tip restaurant in Kawasaki”, “Kawasaki” is understood to be the interest point if the user browsed a page on “French restaurant in Kawasaki” immediately before, whereas “chicken-wing-tip” is understood to be the interest point if the user browsed “grilled chicken-wing-tip restaurant in Yokohama” immediately before. Accordingly, the information to be presented next can be based on keywords that match the user's interest better than important keywords derived only from the document presently being browsed, either by a search that considers the interest point (continuation of an interest) or by recommending or searching for relevant keywords based on the transition of an interest.
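As a rough illustration of the restaurant example, the following Python sketch compares the keywords of the current page with those of the page browsed immediately before. The function name `shared_keywords`, the hand-written keyword lists, and the `generic` set of words that appear on every page are assumptions made for illustration only; the embodiment itself derives interest keywords through the scoring steps described later.

```python
def shared_keywords(previous_keywords, current_keywords, generic=frozenset()):
    """Return current-page keywords that also appeared on the previous page.

    Words listed in `generic` (an assumed, hand-picked set) are ignored
    because they appear on every page and do not indicate a specific interest.
    """
    previous = set(previous_keywords)
    return [kw for kw in current_keywords if kw in previous and kw not in generic]


# Hypothetical keyword lists for the three pages in the example.
current_page = ["grilled", "chicken-wing-tip", "restaurant", "Kawasaki"]
generic_words = {"restaurant"}

after_french = shared_keywords(
    ["French", "restaurant", "Kawasaki"], current_page, generic_words
)
after_yokohama = shared_keywords(
    ["grilled", "chicken-wing-tip", "restaurant", "Yokohama"],
    current_page,
    generic_words,
)
```

Under these assumed inputs, the place name carries over from the French-restaurant page, while the dish carries over from the Yokohama page.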
The following embodiment will be described based on the assumption that an interest extraction device 100 is included in a server and an information presentation device 200 is included in a terminal owned by a user. However, the same description also applies to a case in which the interest extraction device 100 and the information presentation device 200 are included in the same terminal. Further, the embodiment mainly deals with web pages as information or documents to be browsed. A web page which internally includes a still image and/or a moving image may be dealt with in the same manner.
Next, the interest extraction device 100 will be described with reference to
First, subject keywords are extracted from text information of a web page (URL(t)) which the user is presently browsing, and subject scores are calculated and assigned to the subject keywords (step S1). In the present embodiment, the positions of the keywords on the web page are used to calculate the subject scores. For example, a keyword appearing in the title or located in the early part of the body is assigned a high score.
Further, a correction depending on the display area may be performed. For example, a keyword which is originally located in the latter part of the body and is assigned a low score obtains a high score when it comes to be displayed at a high position as the web page is scrolled up.
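The position-based scoring of step S1 might be sketched in Python as follows. This is only a minimal illustration: the function name `subject_scores`, the score of 1.0 for title keywords, and the 0.8 ceiling for body keywords are assumed values that do not come from the embodiment, which only states that keywords in the title or early in the body score high.

```python
def subject_scores(title, body, keywords):
    """Assign each keyword a position-based subject score between 0 and 1."""
    scores = {}
    for kw in keywords:
        if kw in title:
            # Assumption: a keyword found in the title gets the top score.
            scores[kw] = 1.0
            continue
        pos = body.find(kw)  # index of first occurrence, or -1 if absent
        if pos < 0:
            scores[kw] = 0.0
        else:
            # Assumption: the score falls linearly with the position in the
            # body, starting from an assumed ceiling of 0.8.
            scores[kw] = 0.8 * (1.0 - pos / max(len(body), 1))
    return scores
```

For a page titled with the shop name, a keyword at the start of the body would score just below the title keyword, and keywords further down would score progressively lower.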
Next, interest keywords concerning the transition to the present web page (URL(t)) from the web page (URL(t-1)) browsed immediately before are searched for, and interest scores are calculated and assigned to these interest keywords (step S2). One method for detecting the interest keywords is, for example, to regard keywords in the periphery of a hyperlink as interest keywords when the hyperlink in a body is clicked. One method for calculating the interest scores is, for example, to increase an interest score as the corresponding interest keyword is closer to the keyword or hyperlink which the user clicked or paid attention to.
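The proximity-based interest scoring of step S2 could look roughly like the Python sketch below. It assumes that the body of the previous page has already been reduced to an ordered list of keywords, and that the score is the reciprocal of the distance to the clicked item within a fixed window; the function name `interest_scores`, the `window` parameter, and the 1/distance formula are illustrative assumptions.

```python
def interest_scores(tokens, clicked_index, window=5):
    """Score keywords near the clicked item; closer keywords score higher."""
    scores = {}
    for i, tok in enumerate(tokens):
        distance = abs(i - clicked_index)
        # Skip the clicked item itself and anything outside the window.
        if distance == 0 or distance > window:
            continue
        # Assumption: the score is the reciprocal of the distance.
        score = 1.0 / distance
        # Keep the best score if a keyword occurs more than once.
        scores[tok] = max(score, scores.get(tok, 0.0))
    return scores
```

For instance, if the clicked link text is “here” and “round roll” is the keyword directly before it, “round roll” receives the highest score among the surrounding keywords.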
Next, one or more keywords and queries to be used for chaining are determined based on weights of the calculated subject scores and interest scores (step S3). In this case, a search method for a query and a presentation method are determined by referring to the chain rules stored in the chain-rule storage unit 105, using the subject scores and interest scores. The chain rules will be described later. Further, a search result is presented together with a reason, and sets of the interest keywords and the URLs of web pages are stored in the interest-keyword-history storage unit 104 (step S4). Processing then ends. Presenting the search result together with the reason means displaying the interest keywords by using the presentation method in a chain rule.
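One way to picture the weighting in step S3 is a weighted sum of the two scores per keyword, as in the Python sketch below. The embodiment does not specify how the weights are combined, so the linear combination, the default weights of 0.5, and the name `rank_keywords` are all assumptions.

```python
def rank_keywords(subject, interest, w_subject=0.5, w_interest=0.5, top_n=3):
    """Rank keywords by an assumed weighted sum of subject and interest scores."""
    combined = {}
    for kw in set(subject) | set(interest):
        combined[kw] = (
            w_subject * subject.get(kw, 0.0) + w_interest * interest.get(kw, 0.0)
        )
    # Sort by descending score; ties are broken alphabetically for stability.
    return sorted(combined, key=lambda kw: (-combined[kw], kw))[:top_n]
```

The returned keywords would then be candidates for the query that the chain rules turn into an actual search.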
Next, operation of the interest extraction device 100 according to the embodiment will be described with reference to
First, the user browses a web page through the information selection unit 202 by using the information presentation device 200.
Next, the interest-keyword extraction unit 103 associates a keyword included in the page being browsed with a URL of a next page, as an interest keyword. For example, the expression “here” in the body of the URL(t-1) in
Assume that the interest keywords in the above paragraph are stored in the interest-keyword-history storage unit 104 and that the web page at the URL(t) is browsed. Then, descriptions in the periphery of the words “round roll” and “rolled cake” are considered to be of interest if the interest concerning the transition to the page at the URL(t) from the page at the URL(t-1) continues. Otherwise, “XX cafe Kawasaki ΔΔ plaza branch”, which is the subject of the newly browsed page, is considered to be of new interest. The interest-keyword extraction unit 103 extracts, as new interest keywords for searching for and presenting recommendation information, “XX cafe Kawasaki ΔΔ plaza branch”, which is a keyword given a high subject score; “XoXo”, which is a keyword appearing in the vicinity of the interest keyword “round roll” indicating the transition traced this time; and the set of “round roll” and “XoXo”.
Then, a search query is generated, by the chain-rule application unit 106, based on the extracted interest keywords. The chain-rule application unit 106 selects, from the chain rules stored in the chain-rule storage unit 105, an applicable chain rule based on the subject scores, interest scores, and meaning classes of the interest keywords.
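The selection of an applicable chain rule might be sketched as follows in Python. The actual format of the chain rules is not given in this passage, so the `ChainRule` fields (a meaning class, two score thresholds, a query template, and a presentation string), the first-match selection, and the sample rules are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class ChainRule(NamedTuple):
    """Hypothetical chain rule: when it applies and what it produces."""
    meaning_class: str         # assumed class label, e.g. "shop" or "food"
    min_subject_score: float   # assumed lower bound on the subject score
    min_interest_score: float  # assumed lower bound on the interest score
    query_template: str        # search method, as a query template
    presentation: str          # presentation method, as a display template


def select_rule(rules: Sequence[ChainRule], meaning_class: str,
                subject_score: float, interest_score: float) -> Optional[ChainRule]:
    """Return the first rule whose class and score thresholds are satisfied."""
    for rule in rules:
        if (rule.meaning_class == meaning_class
                and subject_score >= rule.min_subject_score
                and interest_score >= rule.min_interest_score):
            return rule
    return None


def build_query(rule: ChainRule, keyword: str) -> str:
    """Fill the rule's query template with the interest keyword."""
    return rule.query_template.format(keyword=keyword)


# Hypothetical rules, for illustration only.
RULES = [
    ChainRule("shop", 0.8, 0.0, "{keyword} map", "Location of {keyword}"),
    ChainRule("food", 0.0, 0.5, "{keyword} recipe", "More about {keyword}"),
]
```

With these sample rules, a keyword of class "food" that has a high interest score would be turned into a recipe query, and the rule's presentation string would supply the reason shown with the result.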
Concerning keywords extracted from
A search is actually performed by the recommendation-information acquiring unit 107 in accordance with the search query generated by the chain-rule application unit 106. Although the embodiment is assumed to perform the search using a web service, a search method other than a web service may be used, such as a database search of a dictionary stored in the interest extraction device 100.
The URLs acquired as results by the recommendation-information acquiring unit 107 are stored in the interest-keyword-history storage unit 104, each combined in a set with the interest keyword on which the query is based.
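A minimal in-memory stand-in for the interest-keyword-history storage unit 104 could be written in Python as shown below. The class name `InterestKeywordHistory`, its method names, and the choice of a dictionary keyed by URL are assumptions; the embodiment only states that URLs and interest keywords are stored as sets.

```python
from collections import defaultdict


class InterestKeywordHistory:
    """Hypothetical in-memory store of (URL, interest keyword) sets."""

    def __init__(self):
        self._by_url = defaultdict(list)

    def store(self, url, keyword):
        """Record that `url` was recommended because of `keyword`."""
        if keyword not in self._by_url[url]:
            self._by_url[url].append(keyword)

    def keywords_for(self, url):
        """Return the interest keywords stored for `url`, oldest first."""
        return list(self._by_url.get(url, []))
```

When the user later opens one of the stored URLs, looking it up in such a store yields the interest keywords that explain the transition to that page.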
The results acquired by the recommendation-information acquiring unit 107 are presented to the user through the information presentation device 200 by the recommendation-information presentation unit 201, by using a presentation method described in a chain rule stored in the chain-rule storage unit 105. When the user selects one of the presented content items, the web page corresponding to the URL of that recommendation result is then displayed as the page being browsed on the information presentation device 200.
Thus, when the user browses a web page, interest information can be extracted and information can be recommended in accordance with an interest.
Although the present embodiment uses, as interest keywords, only keywords included in the page browsed immediately before, keywords from a page browsed n pages before may also be used, with their scores decreased by a function of n, such as 1/n.
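The 1/n weighting mentioned here can be illustrated with the short Python sketch below. Summing the decayed scores of a keyword that appears on several preceding pages is an assumption, as are the function name `decayed_scores` and the list-of-dictionaries input format.

```python
def decayed_scores(history, decay=lambda n: 1.0 / n):
    """Combine interest scores from preceding pages, weighted by decay(n).

    history[0] holds the scores from the page browsed immediately before
    (n = 1), history[1] those from two pages before (n = 2), and so on.
    """
    combined = {}
    for n, page_scores in enumerate(history, start=1):
        for kw, score in page_scores.items():
            # Assumption: decayed scores of the same keyword are summed.
            combined[kw] = combined.get(kw, 0.0) + score * decay(n)
    return combined
```

A different decreasing function of n can be passed as `decay` without changing the rest of the sketch.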
The browsing-information input unit 101 may input a keyword expressing a situation which the user is presently in, in addition to a web page. For example, if a web browser is installed in a mobile terminal, a word such as “Kawasaki” is considered to be input as a keyword expressing a present location.
The present embodiment assumes that the interest extraction device 100 is used in a server and the information presentation device 200 is used in a terminal owned by a user. However, the interest extraction device 100 and the information presentation device 200 may be integrated with each other. The interest extraction device 100 is applicable even to an ordinary computer which includes a control device such as a CPU, a storage device such as a ROM or RAM, an external storage device such as an HDD, a display device such as a monitor, and input devices such as a keyboard and a mouse.
The interest extraction device 100 in the above embodiment can also be achieved by using, for example, a general-purpose computer device as basic hardware. The program to be executed has a module configuration including each of the functions described above. The program may be provided recorded in a computer-readable recording medium, such as a CD-ROM, floppy (registered trademark) disc, CD-R, or DVD, or may be provided preinstalled in a ROM.
Alternatively, the interest extraction device 100 can be achieved by using, for example, a general-purpose computer device as basic hardware. That is, the browsing-information input unit 101, subject-keyword extraction unit 102, interest-keyword extraction unit 103, chain-rule application unit 106, recommendation-information acquiring unit 107, recommendation-information presentation unit 201, and information selection unit 202 can be achieved by causing a processor mounted in the computer device to execute a program. At this time, the interest extraction device 100 can be achieved by pre-installing the aforementioned program in the computer device. Alternatively, the aforementioned program may be stored in a storage medium such as a CD-ROM or distributed through a network, and the interest extraction device 100 can then be achieved by appropriately installing the program in the computer device. Further, the interest-keyword-history storage unit 104 and chain-rule storage unit 105 can be achieved by appropriately using a storage medium such as a memory, hard disc, CD-R, CD-RW, DVD-RAM, or DVD-R, which is built in or externally attached to the computer device.
Hereinafter, an information recommendation device according to one embodiment will be supplementarily described.
(1) An information recommendation device according to one embodiment includes: an input unit configured to input a plurality of documents; a subject-keyword extraction unit configured to extract one or more subject keywords from a predetermined document and a document immediately preceding the predetermined document; an interest-keyword extraction unit configured to extract one or more interest keywords from the subject keywords of the immediately preceding document and the predetermined document; an interest-keyword-history storage unit configured to store the interest keywords, wherein the interest-keyword extraction unit further extracts one or more next interest keywords which a user is likely to be next interested in, based on information specifying the predetermined document, the interest keywords, and the subject keywords of the predetermined document; an acquiring unit configured to acquire one or more next documents next to the predetermined document, based on the next interest keywords; and a presentation unit configured to present the next documents.
(2) In the information recommendation device according to (1), the interest-keyword extraction unit extracts the interest keywords in consideration of the transition to the predetermined document from the subject keywords of the immediately preceding document.
(3) In the information recommendation device according to (1), the input unit acquires the predetermined document itself, based on the information specifying the predetermined document.
(4) In the information recommendation device according to (1), the input unit acquires a title, a summary, and a body area from the predetermined document.
(5) The information recommendation device according to (1) further includes a chain-rule storage unit configured to store a search rule for chaining to a next piece of content based on types of the interest keywords extracted by the interest-keyword extraction unit, and a chain-rule application unit configured to generate a search query based on the interest keywords and the stored search rule.
(6) The information recommendation device according to (1) further includes an information selection unit configured to select a next document from the next documents presented by the presentation unit.
(7) In the information recommendation device according to (1), the interest-keyword extraction unit inputs an additional keyword which expresses a situation of a user, such as a location of the user or an action of the user.
(8) In the information recommendation device according to (1), the interest-keyword extraction unit extracts, with weights, interest keywords included in documents which have been browsed within a predetermined range covering a plurality of preceding browsing operations.
(9) In the information recommendation device according to (1), if a browsed document is browsed again, the interest-keyword extraction unit decreases scores for interest keywords included in the document browsed immediately before.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2009-046795 | Feb 2009 | JP | national |
This application is a Continuation Application of PCT Application No. PCT/JP2010/051436, filed Feb. 2, 2010 and based upon and claiming the benefit of priority from prior Japanese Patent Application No. 2009-046795, filed Feb. 27, 2009, the entire contents of all of which are incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2010/051436 | Feb 2010 | US |
Child | 13217875 | | US |