Claims
- 1. Apparatus for recognizing free-field audio signals, comprising:
a hand-held device having a microphone to capture free-field audio signals; a local processor, coupleable to said hand-held device, to transmit audio signal features corresponding to the captured free-field audio signals to a recognition site; one of said hand-held device and said local processor including circuitry which extracts a time series of spectrally distinct audio signal features from the captured free-field audio signals; and a recognition processor and a recognition memory at the recognition site, said recognition memory storing data corresponding to a plurality of audio templates, said recognition processor correlating the audio signal features transmitted from said local processor with at least one of the audio templates stored in said recognition processor memory, said recognition processor providing a recognition signal based on the correlation.
- 2. Apparatus according to claim 1, wherein said hand-held device includes:
an analog-to-digital converter which digitizes the captured free-field audio signals; and a processor which extracts the time series of spectrally distinct audio signal features from the captured free-field audio signals.
- 3. Apparatus according to claim 1, wherein said local processor extracts the time series of spectrally distinct audio signal features from the captured free-field audio signals.
- 4. Apparatus according to claim 1, wherein said local processor comprises a personal computer coupled to the Internet.
- 5. Apparatus according to claim 1, wherein said recognition processor memory stores a plurality of audio templates, each template corresponding to substantially an entire audio work.
- 6. Apparatus according to claim 5, wherein said hand-held device has a memory which stores free-field audio signals which correspond to less than an entire audio work.
- 7. Apparatus according to claim 6, wherein the audio work comprises a song.
- 8. Apparatus according to claim 1, wherein said recognition processor, in response to the recognition signal, transmits at least a portion of the at least one template stored in said recognition processor memory to said local processor for verification.
- 9. Apparatus according to claim 1, wherein said recognition processor mathematically correlates the audio signal features transmitted from said local processor with the at least one of the audio templates stored in said recognition processor memory.
- 10. A hand-held device for capturing audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored song information, the hand-held device comprising:
a microphone receiving analog audio signals; an A/D converter converting the received analog audio signals to digital audio signals; a signal processor extracting spectrally distinct feature signals from the digital audio signals; a memory storing the extracted feature signals; and a terminal transmitting the stored extracted feature signals to the network computer.
- 11. A device according to claim 10, further comprising an anti-aliasing filter for filtering the received analog audio signals.
- 12. A device according to claim 10, wherein said memory comprises a flash memory.
- 13. A device according to claim 10, wherein said signal processor extracts a time series of signals corresponding to energy in a plurality of different frequency bands of the digital audio signals.
- 14. A device according to claim 10, wherein said signal processor compresses the extracted feature signals, and wherein said memory stores the compressed signals.
- 15. A device according to claim 10, wherein said hand-held device comprises a cellular telephone.
- 16. A device according to claim 10, wherein said hand-held device comprises a personal digital assistant.
- 17. A device according to claim 10, wherein said hand-held device comprises a radio receiver.
- 18. A local processor for an audio signal recognition system having a hand-held device and a recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored song information, the local processor comprising:
an interface for receiving the captured audio signals from the hand-held device; a processor for forming extracted feature signals corresponding to the received captured audio signals, the extracted feature signals corresponding to different frequency bands of the captured audio signals; a memory for storing the extracted feature signals; and an activation device which causes the stored extracted feature signals to be sent to the recognition server.
- 19. A processor according to claim 18, further comprising audio structure for playing back to a user a verification signal received from the recognition server, the verification signal corresponding to the captured audio signal.
- 20. A processor according to claim 18, wherein said processor forms the extracted feature signal from less than an entire audio work.
- 21. A processor according to claim 18, wherein the local processor sends the extracted feature signals to the recognition server over the Internet.
- 22. A recognition server for an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the recognition server comprising:
an interface receiving the extracted feature signals from the local processor; a memory storing a plurality of feature signal sets, each set corresponding to an entire audio work; and processing circuitry which (i) receives an input audio stream and separates the received audio stream into a plurality of different frequency bands; (ii) forms a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream; (iii) stores in the memory the plurality of feature signal sets which correspond to the feature time series waveforms; (iv) compares the received feature signals with the stored feature signal sets; and (v) provides a recognition signal when the received feature signals match at least one of the stored feature signal sets.
- 23. A server according to claim 22, wherein said processing circuitry also (i) forms multiple feature streams from the plurality of feature time series waveforms; (ii) forms overlapping time intervals of the multiple feature streams; (iii) estimates the distinctiveness of each feature in each time interval; (iv) rank-orders the features according to their distinctiveness; (v) transforms the feature time series to obtain the complex spectra; and (vi) stores the feature complex spectra in the memory as the feature signal sets.
- 24. A server according to claim 22, wherein said interface receives extracted feature signals which comprise less than an entire audio work.
- 25. A server according to claim 22, wherein said interface is coupled to the Internet.
- 26. A server according to claim 22, wherein said processor forwards to the local processor, verification audio signals which correspond to the matched at least one stored feature signal sets.
- 27. A server according to claim 22, wherein said processor forwards to the local processor, purchase signals which correspond to the matched at least one stored feature signal sets.
- 28. A hand-held music capture device, comprising:
a microphone which receives an arbitrary portion of an analog audio signal; an analog-to-digital converter to convert the received portion of the audio signal into a digital signal; a signal processor which receives a fixed-time-portion of the digital signal and signal processes same into a digital time series representing the voltage waveform of the captured audio signal; a memory which stores the processed fixed-time portion of the digital signal that corresponds to less than a complete audio work; and a terminal which is connectable to a computer device and transmits the stored portion of the digital signal to the computer device.
- 29. A device according to claim 28, wherein the signal processor compresses the received arbitrary portion of the analog audio signal before storing it in said memory.
- 30. A device according to claim 28, wherein said signal processor forms a time series signal corresponding to the energy in different frequency bands of the received analog audio signal.
- 31. A portable device to capture and store samples of free-field audio signals and store these samples for later identification, comprising:
a microphone to receive an audio waveform; an analog to digital converter to convert the received audio waveform into a digital time series; a trigger to allow the user to manually initiate audio waveform reception; a signal processor to extract and compress spectrally distinct features of the received audio waveform; a memory to store the compressed spectrally distinct features; and an interface to allow transfer of the stored features to recognition equipment.
- 32. A method for recognizing an input data stream, comprising the steps of:
receiving the input data stream with a hand held device; with the hand held device, randomly selecting any one portion of the received data stream; forming a first plurality of feature time series waveforms corresponding to spectrally distinct portions of the received data stream; transmitting to a recognition site the first plurality of feature time series waveforms; storing a second plurality of feature time series waveforms at the recognition site; at the recognition site, correlating the first plurality of feature time series waveforms with the second plurality of feature time series waveforms; and designating a recognition when a correlation probability value between the first plurality of feature time series waveforms and one of the second plurality of feature time series waveforms reaches a predetermined value.
- 33. A method for recognizing free-field audio signals, comprising the steps of:
capturing free-field audio signals with a hand-held device having a microphone; transmitting signals corresponding to the captured free-field audio signals to a local processor; transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device; one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals; storing data corresponding to a plurality of audio templates in a memory at the recognition site; correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor; and providing a recognition signal based on the correlation.
- 34. A method according to claim 33, wherein said capturing step includes the steps of:
analog-to-digital converting the captured free-field audio signals; and extracting the time series of spectrally distinct audio signal features from the captured free-field audio signals.
- 35. A method according to claim 33, wherein said local processor extracts the time series of spectrally distinct audio signal features from the captured free-field audio signals.
- 36. A method according to claim 33, wherein said local processor comprises a personal computer coupled to the Internet.
- 37. A method according to claim 33, wherein said storing step comprises the step of storing in the recognition site memory a plurality of audio templates, each template corresponding to substantially an entire audio work.
- 38. A method according to claim 33, further comprising the step of storing, in a hand-held device memory, free-field audio signals which correspond to less than an entire audio work.
- 39. A method according to claim 33, wherein the audio work comprises a song.
- 40. A method according to claim 33, further comprising the step of the recognition processor, in response to the recognition signal, transmitting at least a portion of the at least one template stored in said recognition processor memory to the local processor for verification.
- 41. A method according to claim 33, wherein said recognition processor mathematically correlates the audio signal features transmitted from said local processor with the at least one of the audio templates stored in said recognition processor memory.
- 42. A method for a hand-held device to capture audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored song information, the method comprising the steps of:
receiving analog audio signals with a microphone; A/D converting the received analog audio signals to digital audio signals; extracting spectrally distinct feature signals from the digital audio signals with a signal processor; storing the extracted feature signals in a memory; and transmitting the stored extracted feature signals to the network computer through a terminal.
- 43. A method according to claim 42, further comprising the step of anti-alias filtering the received analog audio signals.
- 44. A method according to claim 42, wherein said memory comprises a flash memory.
- 45. A method according to claim 42, wherein said signal processor extracts a time series of signals corresponding to energy in a plurality of different frequency bands of the digital audio signals.
- 46. A method according to claim 42, wherein said signal processor compresses the extracted feature signals, and wherein said memory stores the compressed signals.
- 47. A method according to claim 42, wherein said hand-held device comprises a cellular telephone.
- 48. A method according to claim 42, wherein said hand-held device comprises a personal digital assistant.
- 49. A method according to claim 42, wherein said hand-held device comprises a radio receiver.
- 50. A local processor method in an audio signal recognition system having a hand-held device and a recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored song information, the method comprising the steps of:
receiving the captured audio signals from the hand-held device through an interface; forming extracted feature signals corresponding to the received captured audio signals with a processor, the extracted feature signals corresponding to different frequency bands of the captured audio signals; storing the extracted feature signals in a memory; and causing the stored extracted feature signals to be sent to the recognition server.
- 51. A method according to claim 50, further comprising the step of playing back to a user at the local processor, a verification signal received from the recognition server, the verification signal corresponding to the captured audio signal.
- 52. A method according to claim 50, wherein said processor forms the extracted feature signal from less than an entire audio work.
- 53. A method according to claim 50, wherein the local processor sends the extracted feature signals to the recognition server over the Internet.
- 54. A recognition server method in an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the method comprising the steps of:
receiving the extracted feature signals from the local processor through an interface; storing a plurality of feature signal sets in a memory, each set corresponding to an entire audio work; and with processing circuitry (i) receiving an input audio stream and separating the received audio stream into a plurality of different frequency bands; (ii) forming a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream; (iii) storing in the memory the plurality of feature signal sets which correspond to the feature time series waveforms; (iv) comparing the received feature signals with the stored feature signal sets; and (v) providing a recognition signal when the received feature signals match at least one of the stored feature signal sets.
- 55. A method according to claim 54, wherein said processing circuitry also (i) forms multiple feature streams from the plurality of feature time series waveforms; (ii) forms overlapping time intervals of the multiple feature streams; (iii) estimates the distinctiveness of each feature in each time interval; (iv) rank-orders the features according to their distinctiveness; (v) transforms the feature time series to obtain the complex spectra; and (vi) stores the feature complex spectra in the memory as the feature signal sets.
- 56. A method according to claim 54, wherein said interface receives extracted feature signals which comprise less than an entire audio work.
- 57. A method according to claim 54, wherein said interface is coupled to the Internet.
- 58. A method according to claim 54, wherein said processor forwards to the local processor, verification audio signals which correspond to the matched at least one stored feature signal sets.
- 59. A method according to claim 54, wherein said processor forwards to the local processor, purchase signals which correspond to the matched at least one stored feature signal sets.
- 60. Computer readable storage media storing code which causes one or more processors to carry out a method for recognizing an input data stream, the code causing the one or more processors to perform the steps of:
receiving the input data stream with a hand held device; with the hand held device, randomly selecting any one portion of the received data stream; forming a first plurality of feature time series waveforms corresponding to spectrally distinct portions of the received data stream; transmitting to a recognition site the first plurality of feature time series waveforms; storing a second plurality of feature time series waveforms at the recognition site; at the recognition site, correlating the first plurality of feature time series waveforms with the second plurality of feature time series waveforms; and designating a recognition when a correlation probability value between the first plurality of feature time series waveforms and one of the second plurality of feature time series waveforms reaches a predetermined value.
- 61. Computer readable storage media storing code which causes one or more processors to carry out a method for recognizing free-field audio signals, the code causing the one or more processors to perform the steps of:
capturing free-field audio signals with a hand-held device having a microphone; transmitting signals corresponding to the captured free-field audio signals to a local processor; transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device; at least one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals; storing data corresponding to a plurality of audio templates in a memory at the recognition site; correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor; and providing a recognition signal based on the correlation.
- 62. Computer readable storage media according to claim 61, wherein said code includes code for causing the one or more processors to perform the steps of:
analog-to-digital converting the captured free-field audio signals; and extracting the time series of spectrally distinct audio signal features from the captured free-field audio signals.
- 63. Computer readable storage media according to claim 61, wherein said code includes code for causing said local processor to extract the time series of spectrally distinct audio signal features from the captured free-field audio signals.
- 64. Computer readable storage media according to claim 61, wherein said local processor comprises a personal computer coupled to the Internet.
- 65. Computer readable storage media according to claim 61, wherein said storing step comprises the step of storing in the recognition site memory a plurality of audio templates, each template corresponding to substantially an entire audio work.
- 66. Computer readable storage media according to claim 61, further comprising code for causing the step of storing, in a hand-held device memory, free-field audio signals which correspond to less than an entire audio work.
- 67. Computer readable storage media according to claim 61, wherein the audio work comprises a song.
- 68. Computer readable storage media according to claim 61, further comprising code for causing the recognition processor, in response to the recognition signal, to transmit at least a portion of the at least one template stored in said recognition processor memory to the local processor for verification.
- 69. Computer readable storage media according to claim 61, further comprising code for causing said recognition processor to mathematically correlate the audio signal features transmitted from said local processor with the at least one of the audio templates stored in said recognition processor memory.
- 70. Computer readable storage media storing code which causes a hand-held device to capture audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored song information, the code causing the hand-held device to perform the steps of:
receiving analog audio signals with a microphone; A/D converting the received analog audio signals to digital audio signals; extracting spectrally distinct feature signals from the digital audio signals with a signal processor; storing the extracted feature signals in a memory; and transmitting the stored extracted feature signals to the network computer through a terminal.
- 71. Computer readable storage media according to claim 70, wherein the code causes said signal processor to extract a time series of signals corresponding to energy in a plurality of different frequency bands of the digital audio signals.
- 72. Computer readable storage media according to claim 70, wherein said code causes said signal processor to compress the extracted feature signals, and wherein said code causes said memory to store the compressed signals.
- 73. Computer readable storage media according to claim 70, wherein said hand-held device comprises a cellular telephone.
- 74. Computer readable storage media according to claim 70, wherein said hand-held device comprises a personal digital assistant.
- 75. Computer readable storage media according to claim 70, wherein said hand-held device comprises a radio receiver.
- 76. Computer readable storage media storing code which causes a local processor to transmit extracted feature signals to a recognition server, in an audio signal recognition system having a hand-held device and the recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored song information, the code causing the local processor to perform the steps of:
receiving the captured audio signals from the hand-held device through an interface; forming extracted feature signals corresponding to the received captured audio signals with a processor, the extracted feature signals corresponding to different frequency bands of the captured audio signals; storing the extracted feature signals in a memory; and causing the stored extracted feature signals to be sent to the recognition server.
- 77. Computer readable storage media according to claim 76, wherein the code causes the local processor to play back to a user at the local processor, a verification signal received from the recognition server, the verification signal corresponding to the captured audio signal.
- 78. Computer readable storage media according to claim 76, wherein said processor forms the extracted feature signal from less than an entire audio work.
- 79. Computer readable storage media according to claim 76, wherein the code causes the local processor to send the extracted feature signals to the recognition server over the Internet.
- 80. Computer readable storage media storing code which causes a recognition server to recognize signals in an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the code causing the recognition server to perform the steps of:
receiving the extracted feature signals from the local processor through an interface; storing a plurality of feature signal sets in a memory, each set corresponding to an entire audio work; and with processing circuitry (i) receiving an input audio stream and separating the received audio stream into a plurality of different frequency bands; (ii) forming a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream; (iii) storing in the memory the plurality of feature signal sets which correspond to the feature time series waveforms; (iv) comparing the received feature signals with the stored feature signal sets; and (v) providing a recognition signal when the received feature signals match at least one of the stored feature signal sets.
- 81. Computer readable storage media according to claim 80, wherein said code causes said processing circuitry to also (i) form multiple feature streams from the plurality of feature time series waveforms; (ii) form overlapping time intervals of the multiple feature streams; (iii) estimate the distinctiveness of each feature in each time interval; (iv) rank-order the features according to their distinctiveness; (v) transform the feature time series to obtain the complex spectra; and (vi) store the feature complex spectra in the memory as the feature signal sets.
- 82. Computer readable storage media according to claim 80, wherein said code causes said interface to receive extracted feature signals which comprise less than an entire audio work.
- 83. Computer readable storage media according to claim 80, wherein said code causes said processor to forward to the local processor, verification audio signals which correspond to the matched at least one stored feature signal sets.
- 84. A business method of recognizing free-field audio signals, comprising the steps of:
capturing free-field audio signals with a hand-held device having a microphone; transmitting signals corresponding to the captured free-field audio signals to a local processor; transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device; at least one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals; storing data corresponding to a plurality of audio templates in a memory at the recognition site; correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor; providing a recognition signal based on the correlation; and forwarding the recognition signal to a user at the local processor, together with instructions for the purchase of an audio work which corresponds to the at least one of the audio templates stored in the recognition site memory.
- 85. A business method according to claim 84, further comprising the steps of:
receiving payment authorization from said user; and in response to the authorization, forwarding the audio work which corresponds to the at least one of the audio templates stored in the recognition site memory to the user.
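The capture, feature-extraction, and correlation steps recited in the claims above can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the functions `band_energies`, `feature_series`, `correlation`, and `best_match`, the 64-sample frame, the four equal-width bands, and the 0.9 threshold. A naive DFT stands in for whatever band-splitting circuitry is actually used, Pearson correlation stands in for the claimed "correlating", and the query clip is assumed to be frame-aligned with the template, which a real free-field capture would not be.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy of one frame in n_bands equal-width bands (naive DFT, illustrative)."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def feature_series(samples, frame_len=64, n_bands=4):
    """Time series of band energies: one row per non-overlapping frame."""
    return [band_energies(samples[i:i + frame_len], n_bands)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def best_match(query, templates, threshold=0.9):
    """Slide the query over each stored template; return (name, score) or None."""
    flat_query = [v for row in query for v in row]
    best = (None, -1.0)
    for name, template in templates.items():
        for offset in range(len(template) - len(query) + 1):
            window = [v for row in template[offset:offset + len(query)] for v in row]
            score = correlation(flat_query, window)
            if score > best[1]:
                best = (name, score)
    return best if best[1] >= threshold else None


def tone_frames(bins, frame_len=64):
    """Hypothetical test signal: one pure tone per frame at the given DFT bins."""
    return [math.sin(2 * math.pi * k * t / frame_len)
            for k in bins for t in range(frame_len)]


# Templates cover an entire (synthetic) work; the query covers only part of one.
song_a = tone_frames([2, 10, 18, 26, 5, 13])
song_b = tone_frames([26, 26, 3, 3, 20, 20])
templates = {"song_a": feature_series(song_a), "song_b": feature_series(song_b)}

clip = song_a[2 * 64:5 * 64]  # frames 2..4 of song_a, i.e. less than the whole work
result = best_match(feature_series(clip), templates)
print(result[0])  # prints "song_a"

# A clip that resembles neither template stays below the threshold.
unknown = best_match(feature_series(tone_frames([10, 10, 10])), templates)
```

In this sketch the template store plays the role of the recognition memory and the `(name, score)` return value plays the role of the recognition signal; returning `None` corresponds to no correlation reaching the predetermined value.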
Parent Case Info
[0001] This application claims priority to provisional patent Application No. 60/218,824, filed Jul. 18, 2000, and is a continuation-in-part of U.S. patent application Ser. No. 09/420,945, filed Oct. 19, 1999 (incorporated herein by reference), which is based on U.S. provisional patent Application No. 60/155,064, filed Sep. 21, 1999.
Provisional Applications (2)
| Number | Date | Country |
|---|---|---|
| 60218824 | Jul 2000 | US |
| 60155064 | Sep 1999 | US |
Continuation in Parts (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09420945 | Oct 1999 | US |
| Child | 09903627 | Jul 2001 | US |