Claims
- 1. A multimedia system comprising:
a query module capable of generating a query in a plurality of media modalities; a database capable of storing data representing a plurality of media modalities; an object detection module capable of extracting a first plurality of object features from the query and a second plurality of object features from the database wherein the first plurality of object features and the second plurality of object features are extracted from media representing different modalities; a processor coupled to the object detection modules, wherein the processor is arranged to determine a correlation between the first plurality of object features and the second plurality of object features and to retrieve those items from the database which have a correlation at least equal to a predetermined maximum degree of correlation.
- 2. The system as in claim 1, wherein prior to retrieval, the system is trained to correlate cross-modality media using sample data.
- 3. The system as in claim 1, wherein the correlation is calculated using a canonical correlation methodology.
- 4. The system as in claim 1, wherein the correlation is calculated using a latent semantic indexing methodology.
- 5. The system as in claim 2, wherein the training produces orthogonal matrices $A = C_{xx}^{-1/2} U$ and $B = C_{yy}^{-1/2} V$, wherein $\det(A) = \det(B) = 1$ and $C_{xx} = E\{(X - m_x)(X - m_x)^T\}$, $C_{yy} = E\{(Y - m_y)(Y - m_y)^T\}$, $C_{xy} = E\{(X - m_x)(Y - m_y)^T\}$, $K = C_{xx}^{-1/2} \cdot C_{xy} \cdot C_{yy}^{-1/2} = U \cdot S \cdot V^T$, and the correlation between AX representing a first feature set in a first modality and BY representing a second feature set in a second modality is greatest, thereby enabling a transfer of features from the first modality to the second modality.
- 6. The system as in claim 5 wherein AX, the query, representing the first feature set can be identified given only BY, the result of the query, representing the second feature set, in that BY has the greatest correlation with AX.
- 7. A method of retrieving at least one item of interest to a user from a multimedia archive comprising the steps of:
generating a query; extracting a first plurality of object features from the query, the object features representing a first modality; extracting a second plurality of object features from items in the multimedia archive, the object features representing a second modality; determining a correlation between the first plurality of object features and the second plurality of object features; retrieving those items from the archive which have object features having a correlation with the object features in the query at least equal to a predetermined maximum degree of correlation.
- 8. The method as in claim 7, further comprising the step of using sample data to generate correlation matrices which are used to correlate cross-modality media.
- 9. The method as in claim 7, wherein the method of correlation is canonical correlation.
- 10. The method as in claim 7, wherein the method of correlation is latent semantic indexing.
- 11. The method as in claim 7, wherein the matrices generated are represented by $A = C_{xx}^{-1/2} U$ and $B = C_{yy}^{-1/2} V$, and wherein $\det(A) = \det(B) = 1$ and $C_{xx} = E\{(X - m_x)(X - m_x)^T\}$, $C_{yy} = E\{(Y - m_y)(Y - m_y)^T\}$, $C_{xy} = E\{(X - m_x)(Y - m_y)^T\}$, $K = C_{xx}^{-1/2} \cdot C_{xy} \cdot C_{yy}^{-1/2} = U \cdot S \cdot V^T$, and the correlation between AX representing a first feature set in a first modality and BY representing a second feature set in a second modality is greatest, thereby enabling a transfer of features from the first modality to the second modality.
- 12. The method as in claim 11, wherein AX, the query, representing the first feature set can be identified given only BY, the result of the query, representing the second feature set, in that BY has the greatest correlation with AX.
- 13. Computer-executable process steps, the computer-executable process steps being stored on a computer-readable medium enabling a user to retrieve media of interest from a database of multimedia comprising:
a query generation step for obtaining a query from the user, the query being in a first media modality; a first extracting step for extracting a first plurality of object features from the query; a second extracting step for extracting a second plurality of object features from items in the multimedia archive, the object features representing a second media modality; a correlation calculation step for determining a correlation between the first plurality of object features and the second plurality of object features; a retrieval step for retrieving those items from the database which have object features having a correlation with the object features in the query at least equal to a predetermined maximum degree of correlation.
- 14. Means for retrieving at least one item of interest to a user from a multimedia archive comprising:
means for generating a query in a first media modality; means for extracting a first plurality of object features from the query; means for extracting a second plurality of object features from items in the multimedia archive; means for determining a correlation between the first plurality of object features and the second plurality of object features, the second plurality of object features being extracted from a second media modality; and means for retrieving those items from the archive which have object features having a correlation with the object features in the query at least equal to a predetermined maximum degree of correlation.
- 15. A method for retrieving at least one video clip of a character from a multimedia archive, the method comprising the steps of:
generating a query comprising an audio clip of the character's voice; extracting a plurality of audio features from the query; extracting a plurality of video features from each video clip in the multimedia archive; calculating a correlation between the plurality of audio features and the plurality of video features; and retrieving at least one video clip of the character speaking based upon maximizing the degree of correlation between the audio and the video.
- 16. A method for retrieving at least one picture of a person, stored within a multimedia archive, the method comprising the steps of:
generating a query comprising a biometric feature of the person; extracting a plurality of visual features from the query; extracting a plurality of visual features from each picture in the multimedia archive; calculating a correlation between the plurality of visual features from the archive and the plurality of visual features from the query; and retrieving at least one picture of the person based upon maximizing the degree of correlation between the plurality of visual features extracted from the query and the plurality of visual features extracted from the multimedia archive.
- 17. The method as in claim 16, wherein the biometric feature is a retinal image.
- 18. A method for retrieving at least one item of information, stored within a multimedia archive, identifying an unknown liquid, the method comprising the steps of:
generating a query comprising an aroma; extracting a plurality of chemical features, which are represented digitally, from the query; extracting a plurality of textual features from each text item in the multimedia archive; calculating a correlation between the plurality of chemical features extracted from the query and the plurality of text features extracted from the multimedia archive; and retrieving at least one item of information identifying the unknown liquid, based upon maximizing the degree of correlation between the chemical features extracted from the query and the textual features extracted from the multimedia archive.
- 19. The method as in claim 18, wherein the unknown liquid is a beverage.
- 20. A method for retrieving from a multimedia archive a sound associated with an emotion, the emotion being chosen from the standard emoticons list, the method comprising the steps of:
generating a query comprising a word for the emotion; extracting a plurality of textual features from the query; extracting a plurality of audio features for each sound in the multimedia archive; calculating a correlation between the plurality of textual features extracted from the query and the plurality of audio features extracted from the multimedia archive; and retrieving at least one sound based upon maximizing the degree of correlation between the textual features extracted from the query and the audio features extracted from the multimedia archive.
- 21. A method for retrieving a query in a first media modality, when only a result of the query, in a second media modality, is initially known, comprising the steps of:
retrieving a stored matrix, B, for transforming features in the second modality into feature space that is correlated with the first modality, wherein the matrix B was produced during a training procedure to correlate items in the first modality A with items in the second modality B, and vice-versa, such that $A = C_{xx}^{-1/2} U$ and $B = C_{yy}^{-1/2} V$, wherein $\det(A) = \det(B) = 1$ and $C_{xx} = E\{(X - m_x)(X - m_x)^T\}$, $C_{yy} = E\{(Y - m_y)(Y - m_y)^T\}$, $C_{xy} = E\{(X - m_x)(Y - m_y)^T\}$, $K = C_{xx}^{-1/2} \cdot C_{xy} \cdot C_{yy}^{-1/2} = U \cdot S \cdot V^T$, and the correlation between AX representing a first feature set in the first modality and BY representing a second feature set in the second modality is greatest; extracting object features from items in the second modality; calculating BY for the second modality; extracting object features from items in the first modality, stored in a multimedia database; calculating AX for each of the items; correlating AX and BY; and retrieving the X having the greatest correlation between AX and BY.
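By way of non-limiting illustration only, the training recited in claims 5, 11 and 21 and the thresholded retrieval recited in claim 7 might be sketched as follows. The sketch assumes NumPy, paired training samples stored as rows, and the standard canonical-correlation projection $A^T(x - m_x)$ as the meaning of "AX"; none of these is stated in the claims. The function names `inv_sqrt`, `train_cca` and `retrieve`, the regularization term `reg`, the component count `k`, and the use of cosine similarity as the correlation score are all hypothetical choices made for this example.

```python
import numpy as np


def inv_sqrt(C, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    w, Q = np.linalg.eigh(C)
    w = np.maximum(w, eps)  # guard against tiny or negative eigenvalues
    return (Q / np.sqrt(w)) @ Q.T  # Q diag(1/sqrt(w)) Q^T


def train_cca(X, Y, reg=1e-6):
    """Estimate A and B from paired samples.

    X: (n, p) features in the first modality, one sample per row.
    Y: (n, q) features in the second modality, row i paired with X[i].
    reg: small ridge term (an assumption) to keep Cxx and Cyy invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Cxx_is, Cyy_is = inv_sqrt(Cxx), inv_sqrt(Cyy)
    K = Cxx_is @ Cxy @ Cyy_is            # K = Cxx^(-1/2) Cxy Cyy^(-1/2)
    U, S, Vt = np.linalg.svd(K, full_matrices=False)  # K = U S V^T
    A = Cxx_is @ U                       # A = Cxx^(-1/2) U
    B = Cyy_is @ Vt.T                    # B = Cyy^(-1/2) V
    return A, B, S, mx, my


def retrieve(query_x, archive_Y, A, B, mx, my, threshold, k=None):
    """Return archive indices whose score meets the threshold, best first.

    The score is the cosine similarity between the projected query and each
    projected archive item, using the first k canonical components
    (all components when k is None). This scoring rule is an assumption.
    """
    u = (query_x - mx) @ A[:, :k]        # query in the shared space
    V = (archive_Y - my) @ B[:, :k]      # archive items in the shared space
    norms = np.linalg.norm(V, axis=1) * np.linalg.norm(u) + 1e-12
    scores = (V @ u) / norms
    hits = np.flatnonzero(scores >= threshold)
    order = hits[np.argsort(-scores[hits])]
    return order, scores
```

In this sketch the singular values in `S` are the canonical correlations of the training set, so they give a rough indication of how many components `k` carry usable cross-modality signal.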
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application incorporates by reference Assignee's application entitled Speaking Face Detection in TV Domain, filed on Feb. 14, 2002, inventors M. Li, D. Li, and N. Dimitrova, Ser. No. 10/076,194. This Li application provides background for the present invention.