1. Field of the Invention
The present invention relates to a method and apparatus for retrieving data.
2. Description of the Related Art
Digital images captured by portable imaging devices, such as digital cameras, can be managed with personal computers (PCs) or server computers. For example, captured images can be organized in folders on PCs or servers, and a specified image among the captured images can be printed out or inserted in a greeting card. For management on servers, opening some images to other users is possible.
To conduct these management operations, it is necessary to find an image that a user desires. If the number of images to be retrieved is small, a user can find a target image by viewing the list of thumbnails of the images. However, if hundreds of images must be retrieved, or if a group of images to be retrieved is partitioned and stored in multiple folders, finding the target image by viewing is difficult.
Sound annotations added to images on imaging devices are often used in retrieving. For example, when a user captures an image of a mountain and says “Hakone no Yama” to the image, this sound data and image data are stored as a set in an imaging device. The sound data is then speech-recognized in the imaging device or a PC to which the image is uploaded, and converted to text information indicating “hakonenoyama”. After annotation data is converted to text information, common text retrieving techniques are applicable. Therefore, the image can be retrieved by a word, such as “Yama”, “Hakone”, or the like.
Another conventional technique relating to the present invention is disclosed in Japanese Patent Laid-Open No. 2-027479 describing a technique for registering a retrieval key input by a user. According to this technique, the retrieval key input by the user is registered as an operation expression of an existing keyword in a system by the use of synonyms and the like.
In the case of retrieving performed after sound annotations are converted by speech recognition, recognition errors are inescapable under present circumstances. A high proportion of recognition errors leads to poor correlation in matching even if a retrieval key is correctly entered, thus resulting in unsatisfactory retrieval. In other words, no matter how the retrieval key is entered, because of poor speech recognition, desired image data is not retrieved at a high ranking.
Accordingly, it is necessary to introduce a technology capable of realizing a high data-retrieval accuracy even when retrieval data includes an associated annotation created by speech recognition together with recognition errors.
To solve the above problems, according to one aspect of the present invention, a method for retrieving data from a database storing a plurality of retrieval data components including associated annotation data segments, each annotation data segment including at least one subword string obtained by speech recognition, includes a receiving step for receiving a retrieval key, an acquiring step for acquiring a result by retrieving retrieval data components based on a degree of correlation between the retrieval key received by the receiving step and each of the annotation data segments, a selecting step for selecting a data segment from the result acquired by the acquiring step in accordance with an instruction from a user, and a registering step for registering the retrieval key received by the receiving step in an annotation data segment associated with the data segment selected by the selecting step.
According to another aspect of the present invention, an apparatus for retrieving data from a database storing a plurality of retrieval data components including associated annotation data segments, each annotation data segment including at least one subword string obtained by speech recognition, includes a receiving unit configured to receive a retrieval key, an acquiring unit configured to acquire a result by retrieving retrieval data components based on a degree of correlation between the retrieval key received by the receiving unit and each of the annotation data segments, a selecting unit configured to select a data segment from the result acquired by the acquiring unit in accordance with an instruction from a user, and a registering unit configured to register the retrieval key received by the receiving unit in an annotation data segment associated with the selected data segment.
Therefore, the method and the apparatus according to the present invention can realize a high data-retrieval accuracy even when retrieval data includes an associated annotation created by speech recognition together with recognition errors.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
A retrieval-key input unit 105 is used for inputting a retrieval key for retrieving a desired content data segment 102. A retrieval-key converting unit 106 is used for converting the retrieval key to a subword string having the same format as that of the speech-recognized annotation data segment 104 in order to perform matching for the retrieval key. A retrieval unit 107 is used for performing matching between the retrieval key and a plurality of speech-recognized annotation data segments 104 stored in the database 100, determining a correlation score with respect to each of the speech-recognized annotation data segments 104, and ranking a plurality of content data segments 102 associated with the speech-recognized annotation data segments 104. A display unit 108 is used for displaying the content data segments 102 ranked by the retrieval unit 107 in a ranked order. A user selecting unit 109 is used for selecting a user-desired data segment among the content data segments 102 displayed on the display unit 108. An annotation registering unit 110 is used for additionally registering the subword string to which the retrieval key is converted in the speech-recognized annotation data segment 104 associated with the data segment selected by the user selecting unit 109.
The functional structure of the apparatus for retrieving data according to the exemplary embodiment is generally as described above. Processing performed by this apparatus proceeds from the top of the blocks shown in
As mentioned earlier, the retrieval data components 101 including images, documents, or the like as their content contains the corresponding sound annotation data segments 103 and the speech-recognized annotation data segments 104, which are created by performing the speech recognition on the sound annotation data segments 103 (see
A retrieval key input by a user to the retrieval-key input unit 105 is received. The received retrieval key is transferred to the retrieval-key converting unit 106, and the retrieval key is converted to a phoneme string having the same format as that of each of the speech-recognized phoneme strings 201.
Then, the retrieval unit 107 performs phoneme matching between the phoneme string of the retrieval key and the speech-recognized annotation data segment 104 of each of the retrieval data components 101 and determines a phoneme accuracy indicating the degree of correlation between the retrieval key and each data segment. A matching technique may use a known dynamic programming (DP) matching method.
Phoneme Accuracy={(the number of phonemes of retrieval key)−(the number of insertion errors)−(the number of deletion errors)−(the number of substitution errors)}×100/(the number of phonemes of retrieval key)
In
Next, data segments are displayed on the display unit 108 in the order of retrieval.
In this step, a user can select one or more content data segments from the data segments displayed. As previously described, a recognition error may occur in speech recognition, and therefore, a desired content data segment may not appear at a high ranking and may barely appear at a low ranking. In this embodiment, even if the desired content data segment is not retrieved at a high ranking, once a user selects the desired content data segment (image), the retrieval operation using the same retrieval key for the second and subsequent times can reliably retrieve the desired content data segment at a high ranking by the processing described below.
The user selecting unit 109 selects a data segment in accordance with the user's selecting operation. In response to this, the annotation registering unit 110 additionally registers the phoneme string to which the retrieval key is converted in the speech-recognized annotation data segment 104 associated with the selected data segment.
In the exemplary embodiment described above, the score acquired by matching using phonemes as subwords is used. However, the present invention is not limited to this. For example, the score may be acquired by matching using syllables, in place of the phonemes, or by matching in units of words. A recognition likelihood determined by speech recognition may be added to this. The score may have a weight using the degree of similarity between phonemes (e.g., a high degree of similarity between “p” and “t”).
In the exemplary embodiment described above, the phoneme accuracy determined by exact matching of the phoneme string is used as the score for retrieving, as shown in
The speech-recognized annotation data segment 104 in the embodiment described above is data consisting of the speech-recognized phoneme strings 201, as shown in
The speech-recognized annotation data segment 104 in the embodiment described above is stored such that the top N recognized results are stored as subword strings (e.g. phoneme strings), as shown in
In this case, when a phoneme string is added by the annotation registering unit 110, a necessary node may be added to the subword graph shown in
The annotation registering unit 110 additionally registers the phoneme string of the retrieval key in the speech-recognized annotation data segment 104 in the embodiment described above. However, the present invention is not limited to this. For example, the N-th phoneme string among the top N speech-recognized phoneme strings (i.e., the phoneme string with the bottom recognition score among the speech-recognized annotation data segment 104) may be replaced with the phoneme string of the retrieval key.
In the embodiment described above, the phoneme string to which the retrieval key is converted is additionally registered in the speech-recognized annotation data segment 104 associated with a selected data segment. In this step, as a result of comparing the previously registered annotation data with the phoneme string to which the retrieval key is converted, when the degree of similarity is low, the phoneme string of the retrieval key may not be registered, and only when the degree of similarity is high, the phoneme string of the retrieval key may be additionally registered.
An exemplary embodiment of the present invention is described above. The present invention is applicable to a system including a plurality of devices and to an apparatus composed of a single device.
The present invention can be realized by supplying a software program for carrying out the functions of the embodiment described above directly or remotely to a system or an apparatus and reading and executing program code of the supplied program in the system or the apparatus. In this case, the program may be replaced with any form as long as it has the functions of the program.
Program code may be installed in a computer in order to realize the functional processing of the present invention by the computer. A storage medium stores the program.
In this case, the program may have any form, such as object code, a program executable by an interpreter, script data to be supplied to an operating system (OS), or some combination thereof, as long as it has the functions of the program.
Examples of storage media for supplying a program include a flexible disk, a hard disk, an optical disk, a magneto-optical disk (MO), a compact disc read-only memory (CD-ROM), a CD recordable (CD-R), a CD-Rewritable (CD-RW), magnetic tape, a nonvolatile memory card, a ROM, a digital versatile disk (DVD), including a DVD-ROM and DVD-R, and the like.
Examples of methods for supplying a program include connecting to a website on the Internet using a browser of a client computer and downloading a computer program or a compressed file of the program with an automatic installer from the website to a storage medium, such as a hard disk; and dividing program code constituting the program according to the present invention into a plurality of files and downloading each file from different websites. In other words, a World Wide Web (WWW) server may allow a program file for realizing the functional processing of the present invention by a computer to be downloaded to a plurality of users.
Encrypting a program according to the present invention, storing the encrypted program in storage media, such as CD-ROMs, distributing them to users, allowing a user who satisfies a predetermined condition to download information regarding a decryption key from a website over the Internet and to execute the encrypted program using the information regarding the key, thereby enabling the user to install the program in a computer is applicable.
Executing a read program by a computer can realize the functions of the embodiment described above. In addition, performing actual processing in part or in entirety by an operating system (OS) running on a computer in accordance with instructions of the program can realize the functions of the embodiment described above.
Moreover, a program read from a storage medium is written on a memory included in a feature expansion board inserted into a computer or in a feature expansion unit connected to the computer, and a CPU included in the feature expansion board or the feature expansion unit may perform actual processing in part or in entirety in accordance with instructions of the program, thereby realizing the functions of the embodiment described above.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all modifications, equivalent structures and functions.
This application claims the benefit of Japanese Application No. 2004-249014 filed Aug. 27, 2004, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2004-249014 | Aug 2004 | JP | national |