The present invention relates to a contents-retrieving apparatus and a contents-retrieving method, whereby expected contents, such as image files or music data files, are retrieved from a database storing a huge amount of contents on the basis of arbitrarily entered search keywords.
Recently, databases storing a variety of contents such as text data, image data and music data have been made available through communication networks like the Internet, so that users can register contents on the database, or search the database for desired contents and download them, by operating personal computers or mobile terminals connected to the communication network.
As a method for retrieving expected contents from the database, “search based on keyword” is common. In this method, a keyword or keywords related to the expected contents are entered so as to find those contents which contain or relate to the entered keyword or keywords. Since it is unnecessary to categorize the contents in the database, the search based on keyword simplifies the management of the database and improves availability of an enormous number of contents from the database.
In the case where an enormous number of contents are stored in the database, it often occurs with some keywords that the number of contents hit by the keyword is so large that users cannot easily find the contents they expect. As a solution to this problem, a so-called narrowed search has been known, wherein the contents hit by the first keyword are winnowed by entering another keyword, and winnowed further by entering additional keywords.
Because the user is required to think of the keyword to enter for the narrowed search, if the entered keyword is irrelevant, the contents will be insufficiently winnowed or some relevant contents will be wrongly winnowed out. To solve this problem, a prior art for supporting the user in searching has been suggested, for example in JPA 2003-108594. In this prior art, histories of narrowed searches with past keywords are memorized, so that those keywords which are related to a newly entered keyword are retrieved from the past keywords and offered to the user.
However, according to the conventional search technique, the search result varies depending upon the search histories of all users as well as the search history of the present user, so the search result is influenced by the trend of the times or the period or season when the search is carried out. That means, such contents that definitely reflect the trend of the times will be hit more frequently. For example, as for a search based on a keyword “Mt. Fuji”, if it is carried out in summer, the search result will include a larger number of such contents that relate also to “Climbing”. On the contrary, if the search with the keyword “Mt. Fuji” is carried out in winter, such contents that relate also to “Climbing” will be scarcely retrieved.
Getting such search result has no problem if the user wants to get such contents that are in tune with the times or reflect the trend of the times. However, if the user wants to get such contents that relate to basic information on the entered keyword, it can be difficult to retrieve expected contents in the conventional search method, because of the influence of the trend of the times on the search result.
In view of the foregoing, a primary object of the present invention is to provide a contents-retrieving apparatus and a contents-retrieving method, which allow the user to cut off the influence of the trend of the times from the search result and retrieve proper contents while taking account of the influence of the trend of the times.
In a contents-retrieving apparatus for retrieving some contents from a database, which stores variable contents with their respective keywords attached thereto, on the basis of an entered search keyword, the present invention comprises an inter-keyword relevancy calculator for calculating an inter-keyword relevancy between every pair of keywords attached to contents as stored in the database at constant time-intervals to produce time-sequential data on the inter-keyword relevancy of every pair of keywords; a basic relevancy calculator for calculating a basic relevancy of a particular keyword to the search keyword by smoothing the time-sequential data on the inter-keyword relevancy between the search keyword and the particular keyword; a contents-extracting device for extracting at least a content from the database on the basis of the search keyword; a judging device for making a judgment as to whether the extracted content should be included in a search result, on the basis of the basic relevancy between the search keyword and a keyword which is attached to the extracted content; and an outputting device for outputting the search result.
Preferably, the basic relevancy calculator smoothes the time-sequential data on the inter-keyword relevancy by moving average.
The inter-keyword relevancy calculator calculates the relevancy between each pair of keywords on the assumption that those keywords which are attached to the same content have some relation to each other.
Preferably, the contents-retrieving apparatus further comprises a total relevancy calculator for calculating a total relevancy of a content to the search keyword when a plurality of keywords are attached to the content, the total relevancy calculator calculating the total relevancy by averaging the basic relevancies between the search keyword and the respective keywords attached to the content, wherein the result judging device judges the extracted content by its total relevancy.
Preferably, the result judging device judges those contents, of which total relevancy is greater than a predetermined value, to be included in the search result.
The contents-extracting device preferably extracts those contents which are attended by the search keyword from the database, and the basic relevancy calculator calculates the basic relevancies with respect to the extracted contents.
A contents-retrieving method for retrieving some contents from a database on the basis of an entered search keyword, wherein the database stores variable contents with their respective keywords attached thereto, the contents-retrieving method comprising steps of:
calculating an inter-keyword relevancy between every pair of keywords attached to the contents as stored in the database at constant time-intervals to produce time-sequential data on the inter-keyword relevancy of every pair of keywords; calculating a basic relevancy of a particular keyword to the search keyword by smoothing the time-sequential data on the inter-keyword relevancy between the search keyword and the particular keyword; extracting at least a content from the database on the basis of the search keyword; making a judgment as to whether the extracted content should be included in a search result on the basis of the basic relevancy between the search keyword and a keyword which is attached to the extracted content; and outputting the search result.
Since the relevancy of each content to the search keyword is determined on the basis of the basic relevancies, which are calculated by smoothing the time-sequential data and are thus less influenced by the trend of the times, the contents-retrieving apparatus and method of the present invention allow the user to cut off the influence of the trend of the times from the search result and retrieve proper contents while taking account of the influence of the trend of the times.
The above and other objects and advantages of the present invention will be more apparent from the following detailed description of the preferred embodiments when read in connection with the accompanied drawings, wherein like reference numerals designate like or corresponding parts throughout the several views, and wherein:
In
The server 11 is connected to clients' terminals 13 via a communication network 12, to constitute a network system 14. Each client's terminal 13 is constituted of a well-known personal computer, which is provided with a monitor 15 for displaying various operational screens and operating devices 18 comprising a mouse 16 and a keyboard 17. Search keywords for image-retrieval are input through the keyboard 17.
The client's terminal 13 takes images captured by a digital camera 19 or images recorded on a recording medium 20 such as a memory card or a CD-R. These images have respective keywords attached as their tags. The tag is attached to every image by operating the operating devices 18 as the image is taken into the client's terminal 13.
The digital camera 19 is connected to the client's terminal 13 through a communication cable like a USB (universal serial bus) cable or a wireless line like a wireless LAN, so that the digital camera 19 can exchange data with the client's terminal 13.
Referring to
The RAM 23 is a work memory for the CPU 21 to execute various processing. The HDD 24 stores various programs and data served for the work of the client's terminal 13 as well as the images taken from the digital camera 19 and the recording media 20. The CPU 21 reads out the program from the HDD 24 and develops it in the RAM 23 to execute a process based on the program.
The communication I/F 25 controls a communication protocol that is suitable for the communication network 12, and mediates the data-exchange through the communication network 12. The communication I/F 25 also mediates the data-exchange between the client's terminal 13 and external instruments such as the digital camera 19 and the recording media 20.
Referring to
The RAM 28 is a work memory for the CPU 26 to execute various processing. The HDD 29 stores various programs and data served for the work of the server 11. The CPU 26 reads out the program from the HDD 29 and develops it in the RAM 28 to execute a process based on the program. Note that the relevancy calculator 35 is a functional block that is constituted of a program stored in the RAM 28.
The communication I/F 30 controls a communication protocol that is suitable for the communication network 12, and mediates the data-exchange through the communication network 12. The data taken through the communication I/F 30 is stored temporarily in the RAM 28. If an image is taken as the data, it is stored in the HDD 29.
In the HDD 29, an image database (DB) 36 and a keyword information manager 37 are incorporated. The image database 36 stores images taken via the communication network 12 and the keywords attached to the images in association with each other. As shown in
The keyword information manager 37 stores time-sequential data of such information that show the degree of relevancy between two keywords which are attached to the same image as registered in the image DB 36. The degrees of relevancy between the keywords are obtained by the inter-keyword relevancy calculator 32. The inter-keyword relevancy calculator 32 refers to the keywords attached to each image, and calculates the degree of relevancy between each pair of keywords which are attached to the same image, on the assumption that the keywords attached to the same image have some relation to each other. It means that the inter-keyword relevancy Rt between two keywords gets greater as the number of such images that are attended by these two keywords increases in the image database 36. Then the inter-keyword relevancy calculator 32 systematizes the calculated inter-keyword relevancies to build up a thesaurus in the keyword information manager 37.
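The co-occurrence assumption described above can be illustrated with a minimal sketch. The text only states that the relevancy Rt grows with the number of images carrying both keywords; using the raw co-occurrence count as Rt is an assumption made here for illustration, and the function name, image names and tags other than “Mt. Fuji” and “Climbing” are hypothetical.

```python
from collections import Counter
from itertools import combinations


def inter_keyword_relevancy(image_keywords):
    """Count, for every unordered keyword pair, the images tagged with both.

    Assumption: the raw co-occurrence count stands in for the relevancy Rt.
    """
    counts = Counter()
    for keywords in image_keywords.values():
        # set() ignores duplicate tags; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical snapshot of the image database at one point in time
images = {
    "P1": ["Mt. Fuji", "Climbing", "Sunrise"],
    "P2": ["Mt. Fuji", "Climbing"],
    "P3": ["Mt. Fuji", "Lake"],
}
relevancy = inter_keyword_relevancy(images)
```

Running such a count once per period and storing each result with its date would yield one time series per keyword pair, which corresponds to the time-sequential data kept in the keyword information manager 37.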
The CPU 26 activates the inter-keyword relevancy calculator 32 periodically, e.g. once a day, on the basis of the time counted by the timer 31, to revise or restructure the thesaurus periodically and obtain time-sequential data D1 on the relevancy between every pair of keywords, as shown in
When the CPU 26 receives a search command from the client's terminal 13, the CPU 26 searches the image DB 36 for those images associated with a keyword input on the client's terminal 13, hereinafter called the search keyword. Then, the CPU 26 activates the basic relevancy calculator 33 and the total relevancy calculator 34 to execute a narrowed search, winnowing the extracted images. So the CPU 26 functions as a contents-extracting device. The basic relevancy calculator 33 applies a filtering process or smoothing process to the time-sequential data D1 with respect to the relevancy Rt between the input search keyword and any other keyword attached to the extracted images, to calculate a basic relevancy Mt of the individual keyword to the search keyword. The basic relevancy Mt is expressed as a smoothed time-sequential data D2, as shown in
Concretely, the basic relevancy Mt at a particular time “t” is obtained by calculating an average of the inter-keyword relevancies Rt obtained in a period T, e.g. thirty days, right before the particular time “t”, using a method called moving average. Provided that “N” and “ΣRt” respectively represent the number and the sum of the keyword relevancies Rt obtained in the period T, the basic relevancy Mt is expressed by an equation: Mt=ΣRt/N. Since the relevancy Rt before the filtering depends upon the times, this value Rt will be called “momentary relevancy” in contrast with the basic relevancy Mt.
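As a rough sketch of the moving average Mt=ΣRt/N, the following function averages the most recent samples of a relevancy series. The function name, the sample values and the assumption of one sample per day are illustrative only.

```python
def basic_relevancy(rt_series, period):
    """Basic relevancy Mt: mean of the last `period` samples of Rt (Mt = sum(Rt) / N)."""
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    window = rt_series[-period:]
    if not window:
        raise ValueError("no relevancy samples available")
    # N is the number of samples actually present in the window
    return sum(window) / len(window)


# Hypothetical daily Rt values with one seasonal spike
daily_rt = [10, 12, 80, 14, 9]
mt = basic_relevancy(daily_rt, period=5)
```

With these values the spike of 80 is diluted to an average of 25, which shows how the smoothing damps a short-lived peak.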
The total relevancy calculator 34 calculates a total relevancy St of each individual extracted image to the search keyword. The total relevancy calculator 34 calculates the total relevancy St on the basis of either the basic relevancies Mt or the momentary relevancies Rt between the search keyword and other keywords attached to the extracted image. Whether the basic relevancies Mt or the momentary relevancies Rt are to be used for calculating the total relevancy St can be designated on the client's terminal 13 at the start of searching.
According to the present embodiment, the total relevancy calculator 34 calculates the total relevancy St of each image as an average AMt of the basic relevancies Mt or an average ARt of the momentary relevancies Rt. Concretely, in a case where “Mt. Fuji” is input as a search keyword, and the above-mentioned image P1 is extracted, the basic relevancies Mt and the momentary relevancies Rt between the search keyword “Mt. Fuji” KA1 and other keywords KA2 to KA4 can be as shown in
The CPU 26 compares the total relevancy St of each of the extracted images with a predetermined value, and sends information on those images, of which the total relevancy St is greater than the predetermined value, to the client's terminal 13 via the communication network 12. The information on the images, including their image data and file names, is displayed as a search result on the monitor 15 of the client's terminal 13.
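The averaging and threshold comparison can be sketched as follows. The relevancy values reuse the figures that appear in the weighted-average example later in the text (15, 5, 10 and 80, 5, 5); the threshold of 20 and the function names are hypothetical.

```python
def total_relevancy(relevancies):
    """Total relevancy St as the plain average of the per-keyword relevancies."""
    if not relevancies:
        raise ValueError("image has no keywords other than the search keyword")
    return sum(relevancies) / len(relevancies)


def judge(per_image_relevancies, threshold):
    """Keep the images whose total relevancy St is strictly greater than the threshold."""
    return [
        image
        for image, values in per_image_relevancies.items()
        if total_relevancy(values) > threshold
    ]


basic = {"P1": [15, 5, 10]}      # basic relevancies Mt of the other keywords
momentary = {"P1": [80, 5, 5]}   # momentary relevancies Rt of the same keywords
THRESHOLD = 20                   # hypothetical predetermined value

hits_basic = judge(basic, THRESHOLD)          # AMt = 10, below the threshold
hits_momentary = judge(momentary, THRESHOLD)  # ARt = 30, above the threshold
```

Under these assumed numbers the image P1 is returned only by the search based on momentary relevancy.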
Now the operation of the network system 14 having the above construction will be described.
When the images have been sent from the client's terminal 13 to the server 11 in the step S12, the sequence returns to the step S10. If it is judged in the step S10 that no images are taken in, the client's terminal 13 checks whether a searching operation is done for retrieving some images from the image DB 36 of the server 11. The searching operation may be done through the operating devices 18 while watching a search command screen 40 displayed on the monitor 15, as shown in
When the search command is given in the step S13, the client's terminal 13 sends search command data, including the search keyword and information on the choice between the basic relevancy search and the momentary relevancy search, to the server 11 in the step S14. In response to the search command data, the server 11 executes an image retrieval process as set forth later. In the next step S15, the client's terminal 13 checks whether it receives any image information, such as image data and file names of the retrieved images, as a search result from the server 11. When the image information is received, the client's terminal 13 displays the search result on the monitor 15 on the basis of the image information in the step S16. After the step S16 is terminated, the sequence goes back to the step S10.
After the step S20, the server 11 checks if it receives the search command data that is sent from the client's terminal 13 in the step S14. This step S21 is made repeatedly till a predetermined time, e.g. 24 hours, is judged to have passed in the next step S22. When it is judged in the step S22 that the predetermined time has passed, the server 11 gets back to the step S20 to calculate relevancy between keywords. This way, the step S20 is repeated at the predetermined intervals, so the time-sequential data D1 showing the inter-keyword relevancy in a time-sequential fashion is provided, as shown in
When it is judged in the step S21 that the server 11 receives the search command information from the client's terminal 13, the sequence proceeds to the next step S23, wherein the CPU 26 extracts, from among the images stored in the image DB 36, those images which are attended by the search keyword received as the search command information. For example, when the search keyword is “Mt. Fuji”, such images as shown in
When the step S23 is complete, it is judged from the search command information in the step S24 whether the basic relevancy search or the momentary relevancy search is chosen. When the basic relevancy search is chosen, the sequence proceeds to the step S25, wherein the basic relevancy calculator 33 calculates the basic relevancies Mt between the search keyword and other keywords, which are attached to the images as extracted in the step S23. That is, the time-sequential data D1 of the momentary relevancy Rt of another keyword to the search keyword is subjected to a filtering or smoothing process to get the basic relevancy Mt between them. As shown for example in
In the step S26, the total relevancy calculator 34 calculates the total relevancies St of the extracted images to the search keyword on the basis of the basic relevancies Mt or the momentary relevancies Rt. That is, when the search based on basic relevancy is chosen, the total relevancy calculator 34 calculates the total relevancy St of each image as an average AMt of the basic relevancies Mt between the search keyword and other keywords attached to that image. Whereas when the search based on momentary relevancy is chosen, the total relevancy calculator 34 calculates the total relevancy St as an average ARt of the momentary relevancies Rt between the search keyword and other keywords attached to that image. As for the example shown in
In the following step S27, the CPU 26 compares the total relevancy St of each image with a predetermined threshold value, to sort out only those images, of which total relevancies St are greater than the threshold value. Then, information on the sorted images is sent to the client's terminal 13, so the client's terminal 13 displays the received information on the retrieved images as a search result on the monitor 15 (step S16).
As for the image P1, the degree of relevancy to the search keyword “Mt. Fuji” gets higher in summer because of its other keyword “Climbing”, so the probability of hitting this image P1 is higher in summer when the search based on momentary relevancy is chosen for the image searching. In contrast, through the search based on basic relevancy, the probability of hitting this image P1 is relatively low in summer. This means that the user should choose the basic relevancy search if it is desirable to reduce the influence of the times on the search result. Then the user becomes more likely to obtain expected images while eliminating such images that are certainly under the influence of the trend of the times.
In the above embodiment, the basic relevancy Mt is calculated by smoothing through moving average of the relevancies Rt as calculated by the inter-keyword relevancy calculator 32 in a predetermined period. The period of moving average may also be designated by the user on the client's terminal 13. Thereby, the user can adjust the degree of smoothing, i.e. the degree of reducing the influence of the times on the search result.
Other kinds of smoothing processes than the moving average are usable for calculating the basic relevancy Mt. For example, frequency analysis such as Fourier transformation is also usable. It is also possible to use a low-pass filter to obtain the most frequent value of the relevancies Rt as the basic relevancy (a constant value) Mt. Of course, it is also possible to allow the user to input a calculation period for the alternative method on the client's terminal 13.
Although the value calculated by the inter-keyword relevancy calculator 32 is directly used as the momentary relevancy Rt in the above embodiment, the momentary relevancy Rt may be calculated by smoothing the time-sequential data D1 for a shorter period than that applied to the basic relevancy Mt. The momentary relevancy Rt may also be calculated by subtracting the basic relevancy Mt from a value calculated by the inter-keyword relevancy calculator 32.
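The two alternative definitions of the momentary relevancy Rt mentioned above can be sketched as follows. Taking the latest sample as the "value calculated by the inter-keyword relevancy calculator" in the subtraction variant is an assumption, as are the function names, the window lengths and the sample values.

```python
def moving_average(series, period):
    """Mean of the last `period` samples of a relevancy series."""
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    window = series[-period:]
    if not window:
        raise ValueError("no relevancy samples available")
    return sum(window) / len(window)


def momentary_short_window(series, short_period):
    """Variant 1: smooth over a shorter period than the one used for Mt."""
    return moving_average(series, short_period)


def momentary_residual(series, long_period):
    """Variant 2: latest calculated value minus the basic relevancy Mt."""
    return series[-1] - moving_average(series, long_period)


# Hypothetical series with a recent rise
series = [10, 10, 10, 10, 40, 50]
short = momentary_short_window(series, short_period=2)
residual = momentary_residual(series, long_period=6)
```

The first variant still reacts to the recent rise while suppressing day-to-day noise; the second isolates how far the latest value sits above the long-term level.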
Although the above embodiment calculates the total relevancy St based on either the basic relevancy Mt or the momentary relevancy Rt, it is possible to calculate the total relevancy St based on both the basic relevancy Mt and the momentary relevancy Rt, using a coefficient α (0≦α≦1): St=αMt+(1−α)Rt. For example, α=0.9 for the search based on the basic relevancy, whereas α=0.1 for the search based on the momentary relevancy. This coefficient α may be designated by the user on the client's terminal 13.
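A minimal sketch of the weighted combination St = αMt + (1−α)Rt follows. The function name and the input values of 10 and 30 are illustrative assumptions; only the coefficients 0.9 and 0.1 come from the text.

```python
def blended_relevancy(mt, rt, alpha):
    """Total relevancy St = alpha * Mt + (1 - alpha) * Rt, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * mt + (1.0 - alpha) * rt


# Hypothetical averaged relevancies of one image
mt, rt = 10.0, 30.0
st_basic_leaning = blended_relevancy(mt, rt, alpha=0.9)      # close to Mt
st_momentary_leaning = blended_relevancy(mt, rt, alpha=0.1)  # close to Rt
```

Setting α to 1 or 0 reproduces the pure basic-relevancy search and the pure momentary-relevancy search respectively, so the two modes of the embodiment are the end points of this single parameter.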
In the above embodiment, information on those images, of which total relevancies St are greater than the threshold value, is sent as the search result to the client's terminal 13. However, it is possible to send information on a predetermined number of images whose total relevancy St to the search keyword ranks highest. It is also possible that the user may designate the threshold value of the total relevancy or the number of retrieved images as a search criterion on the client's terminal 13.
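Selecting a predetermined number of top-ranked images instead of applying a threshold can be sketched as below. The function name, the image names and the scores are hypothetical.

```python
def top_n(total_relevancies, n):
    """Return the names of the n images with the highest total relevancy St."""
    ranked = sorted(total_relevancies, key=total_relevancies.get, reverse=True)
    return ranked[:n]


# Hypothetical total relevancies of three extracted images
totals = {"P1": 30.0, "P2": 12.0, "P3": 45.0}
best_two = top_n(totals, 2)
```

Unlike the threshold, this criterion always returns a result list of bounded size, regardless of how high or low the scores are overall.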
In the above embodiment, the user alternatively chooses between the search based on the basic relevancy and the search based on the momentary relevancy. Instead of this, the present invention may be so configured that the user can execute the search based on the basic relevancy and the search based on the momentary relevancy simultaneously. In that case, respective results of these two kinds of searches should be displayed distinguishably from each other on the client's terminal 13. For example, as shown in
In the above embodiment, those images are extracted from the image DB 36, which are attended by a search keyword as entered by the user, and then the narrowed search is done based on relevancies of other keywords of the extracted images to the entered search keywords. Instead of this, relevancy (total relevancy St) to an entered search keyword may be calculated with respect to every image in the image DB 36 while calculating relevancies between the search keyword and individual keywords or a representative keyword of every image based on the thesaurus that is built in the keyword information manager 37, so as to retrieve such images that are highly relevant to the search keyword. Since the search process using the thesaurus covers those images which are not attended by the entered search keyword as search targets, so-called fuzzy search is available.
Although the above embodiment enters only one word as a search key, it is possible to use more than one keyword as search keywords for a search process. In that case, those images which are attended by these search keywords are extracted from the image DB 36, and the narrowed search is done based on relevancies of other keywords of the extracted images to the respective search keywords. For the sake of the above-mentioned fuzzy search using the thesaurus, the search process is done based on relevancies between the respective search keywords and individual keywords or a representative keyword of each image in the image DB 36. Where more than one search keyword is used for a search process, relevancies (basic relevancies Mt and momentary relevancies Rt) of all keywords of each image to the respective search keywords are averaged to calculate a total relevancy St of each image.
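The averaging over several search keywords can be sketched as follows. The text does not say how the search keywords themselves or missing keyword pairs are handled; skipping image keywords that are also search keywords and treating unknown pairs as zero relevancy are assumptions, and all names and values are hypothetical.

```python
def total_relevancy_multi(search_keywords, image_keywords, relevancy):
    """Average the relevancy of every other image keyword to every search keyword.

    Assumptions: keywords that are themselves search keywords are skipped,
    and a pair absent from the lookup table counts as relevancy 0.
    """
    others = [k for k in image_keywords if k not in search_keywords]
    values = [relevancy.get((s, k), 0.0) for s in search_keywords for k in others]
    if not values:
        raise ValueError("image has no keywords other than the search keywords")
    return sum(values) / len(values)


# Hypothetical relevancies keyed by (search keyword, image keyword)
lookup = {
    ("Mt. Fuji", "Climbing"): 15.0,
    ("Mt. Fuji", "Boat"): 5.0,
    ("Lake", "Boat"): 20.0,
}
st = total_relevancy_multi(
    ["Mt. Fuji", "Lake"],
    ["Mt. Fuji", "Lake", "Climbing", "Boat"],
    lookup,
)
```

The same function works with either basic relevancies Mt or momentary relevancies Rt in the lookup table, depending on which kind of search is chosen.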
In the above embodiment, a search keyword is entered as a text through the keyboard 17. Instead of this, it is possible to display several keywords on a list, so that the user may designate a search keyword by choosing one from among the displayed ones.
It is also possible to enter a search keyword by designating an image among several candidates, wherein each of the candidate images is attended by a keyword or keywords. As shown for example in
In the above embodiment, the basic relevancy AMt and the momentary relevancy ARt of a particular image to the search keyword are calculated by averaging basic relevancies Mt and momentary relevancies Rt of individual keywords of the particular image, respectively. If the keywords attached to the particular image are weighted differently from each other, it is preferable to calculate these values ARt and AMt by way of correspondingly-weighted average. If, for example, the individual keywords as shown in
AMt=(15×70+5×20+10×10)/100=12.5
ARt=(80×70+5×20+5×10)/100=57.5
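The two weighted averages above can be reproduced with a short sketch. The relevancies and weights are taken from the equations; the function name is an illustrative assumption.

```python
def weighted_relevancy(relevancies, weights):
    """Weighted average of per-keyword relevancies, normalised by the sum of weights."""
    if len(relevancies) != len(weights):
        raise ValueError("each relevancy needs exactly one weight")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(relevancies, weights)) / total_weight


weights = [70, 20, 10]                             # keyword weights, summing to 100
amt = weighted_relevancy([15, 5, 10], weights)     # basic relevancies Mt
art = weighted_relevancy([80, 5, 5], weights)      # momentary relevancies Rt
```

Because the heavily weighted keyword dominates, the weighted values (12.5 and 57.5) differ from the plain averages of the same relevancies (10 and 30).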
Although the above described embodiment refers to images as the contents or search targets, the contents are not limited to images but may be movie data, music data, text data, computer software, Web pages and complex mixtures of these contents. The keywords attached to the individual contents are not limited to letters or characters but may be expressed by codes, numbers or the like.
Although the above embodiment calculates the inter-keyword relevancy on the assumption that those keywords which are attached to the same content are relevant to each other, if several keywords are simultaneously entered as search keys, it is possible to calculate the inter-keyword relevancy on the assumption that the simultaneously entered keywords are relevant to each other.
Thus, the present invention is not to be limited to the above embodiments but, on the contrary, various modifications will be possible without departing from the scope of claims appended hereto.
Number | Date | Country | Kind |
---|---|---|---|
2007-324499 | Dec 2007 | JP | national |