This invention relates in general to delivered content identification, and more particularly to systems, apparatuses and methods for identifying transitions from one delivered or otherwise presented media item to another.
When originally introduced into the marketplace, analog mobile telephones used exclusively for voice communications were viewed by many as a luxury. Today, mobile communication devices are highly important, multi-faceted communication tools. Many people now carry their mobile devices with them wherever they go. These mobile devices include, for example, mobile phones, Personal Digital Assistants (PDAs), laptop/notebook computers, and the like. The popularity of these devices and the ability to communicate “wirelessly” has spawned a multitude of new wireless systems, devices, protocols, etc. Consumer demand for advanced wireless functions and capabilities has also fueled a wide range of technological advances in the utility and capabilities of wireless devices. Wireless devices not only facilitate voice communication, but also messaging, multimedia communications, e-mail, Internet browsing, and access to a wide range of wireless applications and services.
More recently, wireless communication devices are increasingly equipped with other media capabilities such as radio receivers. Thus, a mobile phone can be equipped to receive amplitude modulated (AM) radio and/or frequency modulated (FM) radio signals, which can be presented to the device user via a speaker or headset. With the processing power typically available on such a mobile communication device, broadcast radio can be a richer experience than with traditional radios. For example, a terminal (e.g., mobile phone, PDA, computer, laptop/notebook, etc.) is often equipped with a display to present images, video, etc. Terminals are also often capable of transmitting and/or receiving data, such as via GSM/GPRS systems or otherwise. These technologies enable such terminals to present images, video, text, graphics and/or other visual effects in addition to presenting the audio signal received via the radio broadcast. For example, the song title, artist name, album name, and/or other information or “metadata” relating to a song broadcast from a radio station can be provided to a terminal for visual presentation in addition to the audio presentation.
Such a “visual radio service” or other media provider may provide such information during a time when the song or other media item is being presented via the user terminal(s). More particularly, in the context of radio services, visual radio services can be provided by certain radio stations that are integrated with visual radio content creation tools. If a song is being sent to a terminal's radio application, the radio station server(s) can provide information such as the artist name, album name, image graphics and the like to the terminal.
However, the visual information may not correspond to the audio signal at all times, which can adversely affect the user's experience. For example, network congestion may cause the metadata or other information to be delayed in reaching the terminal. Thus, the terminal may already be playing a new audio item (e.g., song on the radio) although the presented information (e.g., artist name, album name, etc.) reflects a prior audio item. Current techniques for synchronizing the audio and information channels may provide for poor end of song detection and/or result in an inordinate quantity of data traveling through the mobile network.
Accordingly, there is a need in the industry for, among other things, reducing the load on the network(s), improving end of song detection capabilities, and synchronizing multiple media portions such as audio signals and visual data. The present invention fulfills these and other needs, and offers other advantages over the prior art.
To overcome limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses systems, apparatuses and methods for identifying transitions of delivered content.
In accordance with one embodiment, a method is provided that involves presenting content via a device, and transmitting a calculated fingerprint(s) representative of at least a portion of the content. A content fingerprint for the content is received in response to the transmission of the calculated fingerprint. The device locally compares further calculated fingerprints to the received content fingerprint to identify an end to the presentation of the content.
According to more particular embodiments, the method may further involve receiving metadata associated with presented content. This metadata may be received in response to transmitting the calculated fingerprint(s). In a more particular embodiment, the method may further involve presenting at least some of the metadata contemporaneously with the presentation of the content via the device. One embodiment further involves discontinuing presentation of the metadata upon detection of the end to the presentation of the content. In another embodiment, the content may include at least an audio stream of audio items, and the method may involve presenting at least a portion of the metadata via a device display during the audio presentation of the corresponding audio item. In a more particular embodiment, at least one of the audio items may include a song, and presenting at least a portion of the metadata may thus involve visually presenting text, graphic or other visual information related to the song currently being played via the audio stream.
According to still other particular embodiments, such a method may further involve the device locally calculating the fingerprint(s) representative of at least a portion of the content, and locally calculating the further/additional fingerprints that are compared to the content fingerprint. According to another embodiment, the content may include multiple content items, where receiving a content fingerprint thus involves receiving a fingerprint representative of at least a remaining part of the content item from which the calculated fingerprint was based. In another embodiment, receiving a content fingerprint may involve receiving a fingerprint representing multiple portions of the content item from which the calculated fingerprint was based. In still another embodiment, receiving a content fingerprint may involve receiving a fingerprint representative of any temporal portion of the content item from which the calculated fingerprint was based. Yet another embodiment involves receiving a content fingerprint by receiving a more comprehensive fingerprint relative to the calculated fingerprint in response to transmitting the calculated fingerprint.
In another particular embodiment of such a method, the content may include streaming content. In such a case, the method may further involve the device calculating the fingerprint(s) by generating the fingerprint(s) for a temporal portion of the streaming content. In one embodiment, the calculated fingerprints are compared to the content fingerprint until the calculated fingerprint no longer corresponds to the content fingerprint. In another embodiment, the method further involves repeatedly calculating the fingerprints at the device and locally comparing the calculated fingerprints to the content fingerprint until the calculated fingerprint no longer corresponds to the content fingerprint. In yet another embodiment, the method further involves calculating the fingerprints at the device and locally comparing the calculated fingerprints to the content fingerprint substantially continuously, and doing so until the calculated fingerprint no longer corresponds to the content fingerprint.
In accordance with another embodiment of the invention, a method is provided that includes receiving a partial fingerprint representative of a portion of a content item. The method further includes locating a content fingerprint based on the content item identified by the partial fingerprint, and transmitting the content fingerprint associated with the content item corresponding to the partial fingerprint for use by devices in locally detecting an end of a local presentation of the content item.
According to one embodiment, such a method may further involve locating metadata based on the content item identified by the partial fingerprint, and transmitting the metadata associated with the content item for use by the devices in presenting the metadata in connection with presenting the content item. In another embodiment, the metadata includes information characteristics related to the located content item, and includes any one or more of textual, audio, graphical or multimedia information.
In another embodiment, locating a content fingerprint involves searching a song database to locate a song represented by the partial fingerprint. In still another embodiment, the method further involves searching a database to locate the content item represented by the partial fingerprint, and locating a content fingerprint by identifying the content fingerprint associated with the content item located in the database. In a more particular embodiment, the method further includes locating metadata from the database based on the content item identified by the partial fingerprint, and transmitting the metadata associated with the content item for use by devices in presenting at least some of the metadata in connection with presentation of the content item.
In one embodiment of such a method, the content fingerprint is a more comprehensive fingerprint for the content item relative to the partial fingerprint. In another embodiment, when a device has locally identified an end of a local presentation of a content item, the process of receiving a partial fingerprint, locating a content fingerprint, and transmitting the content fingerprint is repeated for each subsequent content item.
In accordance with another embodiment, a method is provided that involves calculating a partial fingerprint at a device for a portion of an audio segment playing on the device. The device then determines when the audio segment has stopped playing on the device by locally performing repeated partial fingerprint calculations and comparisons of the resulting partial fingerprints to one or more reference fingerprints for that audio segment.
In accordance with another embodiment of such a method, the reference fingerprint(s) includes one or more prior partial fingerprints calculated at the device for that audio segment, and where locally performing comparisons of the resulting partial fingerprints to a reference fingerprint for that audio segment involves locally performing comparisons of the resulting partial fingerprints to the prior partial fingerprint calculation(s) on the device for that audio segment.
According to another embodiment, such a method further involves performing a search at a network element for the audio segment based on the calculated partial fingerprint, and providing an audio segment fingerprint as the reference fingerprint to the device in response to locating the audio segment. In a more particular embodiment, performing repeated partial fingerprint calculations and comparisons involves performing repeated partial fingerprint calculations and comparisons of the resulting partial fingerprints to the audio segment fingerprint received from the network element.
In accordance with another embodiment of the invention, an apparatus is provided that includes a user interface configured to present content. The apparatus includes a transmitter configured to transmit at least one calculated fingerprint representative of at least a portion of the presented content. A receiver is configured to receive a content fingerprint for the presented content in response to transmitting the calculated fingerprint(s), and a comparator or other compare module is provided to compare further calculated fingerprints to the content fingerprint to identify an end of the content presentation.
In one embodiment, the user interface includes a visual and/or audio component. For example, the user interface may include a display, and/or a speaker(s), headphone jack(s), etc.
In other embodiments of such an apparatus, a fingerprint calculation module may be configured to calculate the calculated fingerprint(s) based on at least a portion of the presented content, and to calculate the further calculated fingerprints used for comparison to the content fingerprint. In another embodiment, the user interface is further configured to present metadata associated with the content and received in response to transmitting the at least one calculated fingerprint. In another particular embodiment, the user interface is further configured to discontinue presentation of the metadata upon detection of the end of the content presentation. In one embodiment, the user interface includes at least a display.
In another embodiment of such an apparatus, a memory is provided, which is configured to locally store the content fingerprint for the presented content. In such an embodiment the compare module may be further configured to compare further calculated fingerprints, of a subsequent presentation of the content, to the locally stored content fingerprint to identify the end of the subsequent presentation of the content.
In accordance with another embodiment of the invention, an apparatus is provided that includes a receiver configured to receive a calculated fingerprint representative of a portion of a content item. A content analysis module is configured to locate a content fingerprint based on the content item identified by the calculated fingerprint. A transmitter is configured to send the content fingerprint associated with the located content item for use by devices in locally identifying an end of a local presentation of the content item.
In other embodiments of such an apparatus, the content analysis module is further configured to locate metadata associated with the content item, based on the content item identified by the calculated fingerprint. In still another particular embodiment, the content analysis module includes instructions and a processing system capable of executing the instructions to search a database for the content item and the metadata identified by the calculated fingerprint.
The above summary of the invention is not intended to describe every embodiment or implementation of the present invention. Rather, attention is directed to the following figures and description which sets forth representative embodiments of the invention.
The invention is described in connection with the embodiments illustrated in the following diagrams.
In the following description of exemplary embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration various manners in which the invention may be practiced. It is to be understood that other embodiments may be utilized, as structural and operational changes may be made without departing from the scope of the present invention.
Generally, the present invention provides systems, apparatuses and methods for detecting the end of a presentation of a content/media item. In one embodiment, a device consumes (e.g., presents; plays) content, such as playing a song. The content item is identified externally to the device through the use of a fingerprint generated and provided by the device for that content item. A more comprehensive fingerprint is returned to the device, which in turn accepts responsibility for detecting the end of the song or other content item using further locally calculated fingerprints and the more comprehensive fingerprint received by the device. In this manner, the device itself performs the task of detecting when the song or other content item has ended. As is described more fully below, detection of the end of the content item may be desirable for various reasons such as, for example, knowing when to start and stop presenting metadata that corresponds to the content item being presented at the device.
The description provided herein often refers to radio content (e.g., broadcast radio such as AM/FM radio) as a media type, but it should be recognized that the present invention is equally applicable to any type of transmitted content/media. These other types of content media include, but are clearly not limited to, radio, television, webcasts, podcasts, and/or other transmitted media. In one embodiment, the invention provides approaches to content generation and detection for visual radio services (e.g., NOKIA Visual Radio Service™) for any radio station that is received by a mobile terminal. These radio stations may be any type, such as frequency modulated (FM), amplitude modulated (AM), etc. As used herein, visual radio (or analogously, visual media) involves any visually presented information associated with the audio transmission, such as the song title, artist, album name, album cover art, advertiser/product, video clips, music videos, and/or other information that may correlate to the provided audio transmission.
One embodiment of the invention proposes manners for enabling content generation for services such as visual radio services, while enhancing the timing or synchronization of different yet cooperative media items such as audio and associated visual data. Particularly, media delivered to a terminal may be “recognized” at the terminal or elsewhere, such that associated data may be provided to augment the media with the data. Where the media is an audio stream such as in the case of broadcast radio, the receiving terminal may recognize a currently played song, and may also receive data (e.g., artist name, album name, album image, etc.) to augment the audio experience with that data. For example, as a mobile device user listens to a song-A via the mobile device, data indicating the song-A name, artist, album, album cover image, and/or other related data may be presented to the user via the mobile device. The same may be applied in the case of television signals, animations, podcasts, albums, and/or other media that can be delivered to and/or played at a terminal.
In one embodiment, identification of the data associated with a currently played media item involves using media recognition technology. For example, in the case of radio transmissions, song recognition technology may be used where the mobile terminal calculates an audio fingerprint and provides it to a server(s) for recognition and content creation. Generally, “fingerprinting” is a technique used for song identification. Fingerprints are typically (but not necessarily) smaller in size than the actual digital content, yet contain enough information to uniquely identify the song or other media item. Any known “fingerprinting” technology that allows continuous/repeated recognition, or recognition from any part of the song or other content, may be used in connection with the invention. After receiving the fingerprint and identifying the music piece or other media, the visual radio server can send content that matches the currently broadcast song or other media item to the terminal.
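As a purely illustrative sketch of such fingerprinting, the toy scheme below reduces raw audio samples to a short token by comparing per-window energies and hashing the resulting bit string. The windowing, the energy-trend bits, and the SHA-1 hash are assumptions chosen for brevity; a practical fingerprinting algorithm would use spectral features robust to broadcast distortion.

```python
import hashlib

def partial_fingerprint(samples, window=256):
    """Reduce raw audio samples to a short, comparable token.

    Per-window energies are compared to their neighbours to form a bit
    string, which is then hashed. Illustrative only; real fingerprinting
    schemes use spectral features that survive broadcast distortion.
    """
    energies = []
    for start in range(0, len(samples) - window + 1, window):
        frame = samples[start:start + window]
        energies.append(sum(s * s for s in frame) / window)
    # One bit per adjacent-window comparison: did the energy rise or fall?
    bits = bytes(1 if b > a else 0 for a, b in zip(energies, energies[1:]))
    return hashlib.sha1(bits).hexdigest()[:16]
```

The essential property, preserved even in this sketch, is that the token is far smaller than the audio it summarizes while remaining stable for identical input.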
In order to generate the visual content with the radio or other media broadcast, an exemplary fingerprinting task may be performed relatively continuously, or at least repetitively. By continuously and/or repeatedly recognizing the received content, a song (or other media item) change can be determined. By recognizing the song change, visual content for the terminated song can be discontinued, and new visual content can be created for the next song. It is important to have both the audio and visual channels at least somewhat “synchronized” in that the correct visual data is presented while the corresponding audio content is being played. Otherwise, it would be confusing and frustrating for the user to be presented with visual content whose presentation timing does not substantially align with the start and end of its corresponding song or other media item. Identifying the end of or change of song (or other media item) “continuously” does not necessarily mean without any gaps—rather, “continuously” as used herein generally suggests that the task is repeated often enough to avoid significant confusion to the user.
More particularly, assume a mobile device has radio capabilities such as a frequency modulated (FM) and/or amplitude modulated (AM) radio. Assume that such mobile device includes a client side application(s) to digitize some or all of the received audio and calculate a fingerprint(s) from this digitized information. As previously indicated, such a fingerprint typically involves less data than that of the original digitized audio, but enough to uniquely identify the song or other media item. The mobile device may send the fingerprint over the mobile and/or other network(s) to a visual radio server, which uses the fingerprint to locate the appropriate information for that song/media from a database. This information may include, for example, a song name, artist name, album name and/or cover image, advertisements, and/or any other information that may be relevant to the currently played song or otherwise desirable to associate with the currently played song. It takes time for the fingerprint to be calculated and sent over the network, and for the associated media information to be located from the database and returned to the mobile terminal. Among other things, the time required includes the time, and network delays, involved in communicating the fingerprint and the resulting metadata, as well as the processing time to perform all of the transmission, locating, and/or other functions involved in the transaction. This latency can cause the audio signal and visual data received at the mobile device to be offset in time, or otherwise “unsynchronized” such that the received data does not correspond to the simultaneously received audio.
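The client-side request cycle just described can be sketched as follows. All names here (`send_fingerprint`, `recognize_current_item`, the dictionary-backed server) are hypothetical stand-ins for the real network exchange with a visual radio server, not an actual API:

```python
def send_fingerprint(fp, server_db):
    """Stand-in for the network round trip to the visual radio server;
    here the 'server' is just a dictionary keyed by fingerprint."""
    return server_db.get(fp)

def recognize_current_item(digitized_audio, fingerprint, server_db):
    """Calculate a fingerprint for digitized audio and ask the server
    for the matching song information (metadata)."""
    fp = fingerprint(digitized_audio)
    metadata = send_fingerprint(fp, server_db)
    if metadata is None:
        return {"status": "unknown", "fingerprint": fp}
    return {"status": "recognised", "fingerprint": fp, "metadata": metadata}
```

The latency discussed above is exactly the real-world cost of the `send_fingerprint` step: calculating, transmitting, searching, and returning all take time during which the broadcast continues.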
In one example, detection of when a current song ends may be accomplished by repeating the fingerprint calculation and song recognition procedure continuously. More particularly, the mobile terminal can calculate and transmit a fingerprint and receive the corresponding metadata via the network over and over until the calculated fingerprint results in the receipt of metadata for a new song. This continuous, repetitive activity places significant stress on the network and involved components. In such an approach, the end of song detection occurs with some delay since it takes time for the fingerprint to reach the server, for the server to search through the database, and for the response to arrive. During that delay, content that does not correspond to the currently consumed media stream may be presented to the terminal user. As a more particular example, the song played via a mobile device's radio module can change during the latency resulting from these processing and network delays, thereby presenting incorrect visual data for the new song being heard via the radio broadcast. Among other things, the present invention improves the timing in detecting the end of songs or other media items, and reduces the quantity of data traversing the network.
Referring now to
As will be described more fully below, one aspect of the present invention involves the content termination determination 106 performed at the local device 100. The device 100 obtains enough information regarding the media stream to perform comparisons between this information and the currently played media. In this manner, the device 100 itself can determine when the information and currently played media no longer match, thereby indicating the termination of the currently played media. For example, the device 100 may become privy to data indicative of a song, and the device 100 repeatedly calculates a fingerprint(s) for the currently played song to compare to that data. If there is a match, it indicates that the same song is still being played. If the currently played song and the data do not match, it is indicative of the termination of the prior song and/or playing of a new song/content.
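The local content termination determination 106 can be sketched as a simple comparison loop. The frame representation and the `fingerprint` callable below are illustrative assumptions; the sketch treats any fingerprint mismatch as the end of the current item:

```python
def detect_end(frames, reference_fp, fingerprint):
    """Locally detect the end of the current item: recompute a
    fingerprint for each incoming audio frame and compare it to the
    reference fingerprint obtained when the item was first recognised.
    Returns the index of the first non-matching frame, or None if the
    item is still playing at the end of the observed frames."""
    for index, frame in enumerate(frames):
        if fingerprint(frame) != reference_fp:
            return index  # the item stopped or changed at this frame
    return None
```

Because the loop runs entirely on the device, no network traffic is needed for each comparison; the network is involved only when a new, unrecognized item begins.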
In one exemplary embodiment, the device 100 partakes in a new fingerprint calculation 108, which involves calculating a fingerprint(s) for a song or other content item that has not yet been recognized or identified. This information can be sent via a network(s) 104 to a network element(s) 110, which includes or otherwise has access to a database 112 or other storage of data that may be used to accompany corresponding media/content. In one embodiment, a database 112 of content is stored, where the calculated fingerprints are used to ultimately locate the data that corresponds to that fingerprint(s). This database 112 can be stored in a stand-alone terminal, server, etc. This database 112 can alternatively be stored in a distributed server system, and/or in other distributed systems including any one or more of the terminal, server, etc. In one embodiment, the content is stored in a database 112 associated with a server 110, where the calculated fingerprint is used to index the database 112 to obtain the associated data. This data may be any data associated with the media stream. For example, where a radio broadcast represents the media stream, this “data” or “content” may be visual radio content such as a song title, author/artist, album cover art, artist photos or images, related trivia, artist biographies, and/or any other information that may pertain to the current media stream item. In other embodiments, the content may not specifically relate to the current media stream item (e.g., song), but may represent other content such as advertisements, coupons or discounts, trivia, concert information, “next song” indications, etc.
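The fingerprint-indexed database 112 described above might be sketched as follows; the record fields and key format are illustrative only, and a real system would match fingerprints approximately rather than by exact key:

```python
# Minimal stand-in for the database 112: content records keyed so that a
# calculated fingerprint indexes directly to the associated visual data.
CONTENT_DB = {
    "fp-song-a": {
        "title": "Song A",
        "artist": "Artist A",
        "album_art": "song_a_cover.png",
        "reference_fingerprint": "full-fp-song-a",
    },
}

def locate_content(calculated_fp):
    """Use the device-calculated fingerprint to index the database,
    returning the visual data and the reference fingerprint separately."""
    record = CONTENT_DB.get(calculated_fp)
    if record is None:
        return None
    visual = {k: v for k, v in record.items() if k != "reference_fingerprint"}
    return {"visual_data": visual, "reference": record["reference_fingerprint"]}
```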
The network element 110 engages in content recognition 114, such as by using the received fingerprint information to identify the correct song or other content, and thereby recognizing the data associated therewith. This associated data can be transferred 116 to the device 100, where it can be presented 118. For example, the data may include album graphics and/or text to visually present the artist, song title, etc.
In one embodiment, the network element 110 also provides actual fingerprint data for the currently played song/content. This actual fingerprint data may be located based on the database search, which in turn was based on the calculated fingerprint(s) received from the device 100 on which the content is being played. In this manner, the device 100 can then compare this actual fingerprint data to repeatedly calculated fingerprint data to determine 106 when that currently played song or other content has stopped playing. This actual fingerprint data may be referred to herein as the media fingerprint, or reference fingerprint. For example, when a partial fingerprint is calculated at a device 100, the calculation may be (and typically is) directed to a portion or subset of an entire fingerprint that would represent that song/content. In order to identify the song/content, the network element(s) 110 and/or database 112 stores fingerprint data for a larger portion of the song/content, and typically for the entire song/content. Thus, the media fingerprint or reference fingerprint relates to the actual fingerprint for that song or other content to which a comparison of the device-calculated fingerprint can be made. In another embodiment, the device 100 can compare the repeatedly calculated fingerprint data with the previously calculated fingerprint data that was sent to the network element to ultimately identify the desired associated data (often referred to herein as “metadata”). In such an embodiment, the “reference” fingerprint is provided locally, and is based on prior fingerprint data calculated at the device 100. Regardless of the origin of the reference fingerprint(s), when the content termination determination 106 indicates that the song/media item has stopped or changed to another media item, then a new fingerprint calculation 108 can be executed to obtain new associated data for presentation 118.
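Comparing a partial, device-calculated fingerprint against a full-item reference fingerprint can be sketched as a substring search over fingerprint tokens. Representing a fingerprint as a sequence of per-window tokens is an assumption made for illustration:

```python
def matches_reference(partial_tokens, reference_tokens):
    """Check whether a short run of locally calculated fingerprint
    tokens occurs anywhere inside the full-item reference fingerprint,
    so that a partial fingerprint taken midway through the song can
    still match. Token representation is illustrative."""
    n = len(partial_tokens)
    if n == 0 or n > len(reference_tokens):
        return False
    return any(
        reference_tokens[i:i + n] == partial_tokens
        for i in range(len(reference_tokens) - n + 1)
    )
```

This is why the reference fingerprint covers the entire item: the device may calculate its partial fingerprint from any temporal portion of the song, and the comparison must succeed regardless of where in the song that portion falls.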
Detection of the end of a song or other media item is beneficial where associated data is to be presented substantially contemporaneously with the play of the song or other media item. For example, in the case of a song being played on the device, it may be desirable to display information such as the song title, artist, album graphics, song time remaining, and/or other information. When that song ends, the presented song title, artist, and the like will no longer be valid. Therefore, detection of the end of the song/item is important so that the proper associated data (e.g., song title, artist, etc.) corresponds to the song/item that is being played.
In another embodiment, fingerprint data may be cached or otherwise stored at the device for future comparisons. For example, when a song plays for the first time, the media or reference fingerprint can be provided by a server or other network element, and stored on the device for later use in comparing to newly calculated fingerprints at the device. As another example, fingerprint data previously calculated at the device may be locally cached or otherwise stored at the device. Radio stations often play the most popular songs repetitively, and thus the device can recognize such songs and locally store the media/reference fingerprint data for those songs. After such fingerprint data has been locally stored, the device can first check if the song/content can be identified using the locally stored fingerprint data for previously played songs. If no match is locally found, then the identification request can be sent to the server or other network element.
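The cache-first identification described in this embodiment might be sketched as below. The class and method names are hypothetical, and a real device cache would bound its size and age out stale entries:

```python
class FingerprintCache:
    """Device-side store of reference data for songs already recognised,
    checked before any network request. Illustrative sketch only."""

    def __init__(self):
        self._store = {}

    def remember(self, fingerprint, metadata):
        self._store[fingerprint] = metadata

    def identify(self, fingerprint, remote_lookup):
        # A local hit avoids a network round trip entirely.
        if fingerprint in self._store:
            return self._store[fingerprint]
        metadata = remote_lookup(fingerprint)
        if metadata is not None:
            self.remember(fingerprint, metadata)
        return metadata
```

For frequently repeated songs, each repeat after the first is identified without any network traffic, which directly serves the stated goal of reducing load on the network.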
If the reference fingerprint matches 204 a newly calculated fingerprint, this indicates that the currently played content item is still the same content item 206 that was playing when the reference fingerprint was determined. In other words, the content item has not terminated or changed to another content item. In such case, further fingerprints can be calculated 200 for more comparisons 202 to the reference fingerprint to ultimately determine when the content item has stopped or changed to another content item. If there is no match 204, this is indicative of a changed/stopped content item 208. More particularly, if the reference fingerprint does not match the newly, locally calculated fingerprint, the content item has stopped playing and/or the media has changed to a different content item.
If it is determined 306 that the song metadata is located in the database, the server sends 312 the song metadata to the device, and in one embodiment also sends fingerprint data corresponding to the song to the device. While the device initially calculates some fingerprint data that can be used by the server to locate the particular song, the audio at a point midway through the song likely differs from the portion from which the initial fingerprint was calculated. Therefore, fingerprint data for the entire song may be stored with the song's metadata, which can then be sent 314 to the device. As will be described more fully below, this fingerprint thus serves as a reference for comparison to the song as it plays on the device.
When the device receives the metadata, it can present 316 it via the device. For example, if the metadata is audio data, it can be presented via a speaker(s) or headset associated with the device. This may be appropriate where the content/media being consumed is a visual image. For example, if the content is a still image of a piece of museum art, the metadata can provide an audio presentation indicating the name of the piece of art, the artist, the museum where the original is on display, etc. In the illustrated embodiment where the media is an audio radio signal, the metadata may be presented visually, such as via a display on the device. More particularly, as a song plays on the device, the metadata may be presented on the device display. This metadata may include, for example, text of the artist and album names, an image of the album cover art, etc.
In the embodiment of
As previously indicated, one embodiment of the invention involves the use of a network entity(s), such as a recognition server, to locate the song being played based on a calculated fingerprint from the consuming device.
In the illustrated embodiment, a radio station 400 provides a radio signal 402A. While the radio signal could be provided to the mobile device 404 via the Internet, local area network and/or other network(s), it is assumed for purposes of
When the mobile device 404 receives the metadata, it can present 424 that song metadata. For example, it may display the artist name, song title, album art and/or other information on a display screen. The mobile device 404 continues to calculate 426 fingerprints for the currently playing audio, and compares the resulting fingerprints to the more complete fingerprint received 418B from the recognition server 410. As long as the calculated fingerprints match 428 the complete (i.e., reference) fingerprint, SONG-A is still playing and the song metadata for SONG-A should continue to be presented 424. However, when SONG-A ends and some other audio starts, as depicted by the radio signal 402B, the mobile device 404 should detect the change. The device 404 detects the end of SONG-A by calculating a fingerprint for the new radio signal 402B, and since SONG-A has ended the calculated fingerprint will no longer match 428 the reference fingerprint. In this case, presentation 424 of the song metadata can be discontinued until the next song, referred to as SONG-B, can be detected and the appropriate metadata obtained. When the calculated fingerprint does not match 428 the reference fingerprint, the mobile device 404 will not have enough information to determine which song has now started to play on the radio, and will have to again calculate 406 a new fingerprint to send 408 to the server 410. The process then continues, and when the metadata for SONG-B is obtained from the database 412, this new metadata for SONG-B can be presented 424 via the mobile device 404.
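The device-side cycle just described (identify via the server, present metadata while fingerprints match, re-identify on a mismatch) can be sketched end to end as follows. Everything here is illustrative: `FakeServer` is a hypothetical recognition backend, and a hash stands in for the unspecified fingerprint calculation:

```python
import hashlib

def calc_fp(window: bytes) -> str:
    # Hypothetical fingerprint: a hash stands in for spectral features.
    return hashlib.sha1(window).hexdigest()

class FakeServer:
    """Hypothetical recognition backend (410) keyed by fingerprint."""
    def __init__(self, db):
        self.db = db  # fingerprint -> (reference_fp, metadata)

    def lookup(self, fp):
        return self.db.get(fp)

def visual_radio_loop(audio_windows, server):
    """Present metadata while calculated fingerprints match the
    reference (424/426/428); on a mismatch, discontinue the old
    metadata and ask the server to identify the new audio (406/408)."""
    reference_fp, metadata, shown = None, None, []
    for window in audio_windows:
        fp = calc_fp(window)
        if fp != reference_fp:
            metadata = None             # discontinue previous metadata
            result = server.lookup(fp)  # identify the new song
            if result is not None:
                reference_fp, metadata = result
        shown.append(metadata)
    return shown
```

In this toy model the reference fingerprint equals the partial one; in the patent the reference is the more complete song fingerprint returned by the server.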
The device then repeatedly calculates a fingerprint sufficient to compare to the reference fingerprint as shown at times F, G, H and J. While the calculated fingerprint matches the reference fingerprint, the metadata continues to be presented. At time I a new song, SONG-B, begins playing. The fingerprint calculated at the device no longer matches the reference fingerprint, and the device can remove the metadata as depicted by display 500C. Further, the device sends a new calculated fingerprint at time K to obtain the metadata for SONG-B. When this metadata is obtained at time M, it can be presented as shown by the metadata 504 on the display 500D.
A calculated fingerprint(s) that is representative of the content is transmitted 602. In one embodiment, the calculated fingerprint(s) is obtained by the device itself calculating the fingerprint(s) based on the presented 600 content. Other embodiments may involve the use of remote devices or systems to assist in the calculation of the fingerprint which may then be made accessible to the device.
In response to transmitting the calculated fingerprint(s), at least one fingerprint for the content item is received 604. In one embodiment, this “reference fingerprint” or “content fingerprint” represents a larger segment of the content item, which may include up to the entire content item. For example, a fingerprint calculated at the device may be a partial fingerprint corresponding to a portion or subset of the entire content being presented (e.g., a ten second portion of a song), yet is still representative of that content. On the other hand, a content fingerprint received 604 by the device may correspond to a larger portion or all of the entire content being presented. In the embodiment of
In the illustrated embodiment, the device receives 614 a song fingerprint for the song, and metadata associated with the song, in response to transmitting the calculated fingerprint. The received song fingerprint may be a more comprehensive fingerprint relative to the calculated fingerprint. In another example, the song fingerprint may relate to only a remaining portion of that song or other content item. For example, if the device-calculated fingerprint corresponded to a ten second period from the thirty second (00:30) point in a four minute song to the forty second (00:40) point in the song, then the song fingerprint received in response may include a fingerprint from no earlier than 00:40 until approximately the end (i.e., 04:00) of the song. In another embodiment, the song fingerprint is representative of substantially all of the song or other content item.
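The "remaining portion" case above can be illustrated with a simple windowed model. This is a sketch under stated assumptions: the whole-song fingerprint is assumed to be a list of per-window fingerprints (one per fixed-length interval), which the patent does not prescribe, and the function name is hypothetical:

```python
def remaining_reference_windows(song_fp_windows, matched_end_s, window_s=10):
    """Given per-window fingerprints covering a whole song, keep only
    the windows from the end of the already-matched portion onward.
    E.g., if the device matched audio through 00:40 of a four-minute
    song, the server need only return windows covering 00:40-04:00.

    song_fp_windows: one fingerprint per `window_s` seconds of audio.
    matched_end_s:   point in the song (seconds) already matched.
    """
    first_kept = matched_end_s // window_s
    return song_fp_windows[first_kept:]
```

For a four-minute song with ten-second windows (24 windows total) and a match ending at 00:40, the remaining reference covers the last 20 windows.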
The embodiment of
In accordance with the present invention, the end of the song can be detected. One reason to detect the end of the song is to discontinue presenting the metadata when the song has ended. In the illustrated embodiment, the device calculates further fingerprints 618, which are compared to the received song fingerprint. If the calculated fingerprint matches 620 the received song fingerprint, the same song is still being played via the device, and the metadata continues to be presented 616. Otherwise, when there is no match 620, this indicates that the song has ended, and presentation of the metadata for that song is discontinued 622. Thus, in one embodiment, this local comparison occurs until the calculated fingerprint no longer corresponds to the song fingerprint. These fingerprint calculations 618 can occur at any desired frequency or quantity. For example, the calculations 618 may occur periodically, sporadically, substantially continuously, or at any other desired frequency. It is noted that, as a statistical average, the more often the calculation 618 and comparison 620 are performed, the less time the metadata will be incorrectly presented for a previous song.
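The statistical-average remark above can be made concrete with a small worked calculation; this is illustrative reasoning, not a formula from the source:

```python
def average_stale_time(check_interval_s: float) -> float:
    """A song ends, on average, halfway between two periodic checks
    (618/620), so metadata for the previous song remains presented for
    roughly half the check interval; checking more often shortens this."""
    return check_interval_s / 2.0
```

For example, checking every ten seconds leaves stale metadata on screen for about five seconds on average, while checking every two seconds cuts that to about one second.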
In the illustrated embodiment, a server receives 710 a partial fingerprint(s) representative of at least a portion of a song. The server searches 712 a song database to locate the song represented by the partial fingerprint, and obtains a song fingerprint and metadata stored with the located song. As previously indicated, the metadata may be any information and of any form desired, such as, for example, textual, audio, graphical, multimedia and/or other information providing characteristics of the song. Furthermore, the song fingerprint may be a fingerprint that is more comprehensive than the partial fingerprint used to initially identify the song. The song fingerprint and metadata are transmitted 714 for use by devices in locally detecting an end of the song. This process 710, 712, 714 may be repeated for each subsequent song for which a partial fingerprint is received 710.
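The server-side steps 710, 712 and 714 can be sketched as below. The database contents and the exact-match lookup are assumptions for illustration: a real recognition backend would perform approximate matching over audio features rather than a dictionary lookup, and `SONG_DB` and `recognize` are hypothetical names:

```python
# Hypothetical song database: partial fingerprint -> stored record
# holding the more comprehensive song fingerprint and the metadata.
SONG_DB = {
    "partial-fp-a": {
        "song_fp": "full-fp-a",
        "metadata": {"artist": "Artist A", "title": "Song A"},
    },
}

def recognize(partial_fp):
    """Receive 710 a partial fingerprint, search 712 the song database,
    and return the song fingerprint and metadata for transmission 714,
    or None when no matching song is located."""
    record = SONG_DB.get(partial_fp)
    if record is None:
        return None
    return record["song_fp"], record["metadata"]
```

The returned song fingerprint then serves at the device as the reference for locally detecting the end of the song.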
In one embodiment, the reference fingerprint may be derived 802A at the device itself. For example, the reference fingerprint(s) may include one or more prior fingerprints calculated at the device for that audio segment. In such a case, the local comparisons of the resulting partial fingerprints to the reference fingerprint(s) for that audio segment involves locally performing the comparisons of the resulting partial fingerprints to the one or more prior partial fingerprint calculations on the device for that audio segment. In other words, as the device continues to calculate fingerprints for some content such as a song, those calculated fingerprints may then serve as the reference fingerprints to which the newly calculated fingerprints are compared. In another embodiment, the reference fingerprint may be derived 802B at a remote device, such as a server or other network element. In such an embodiment, the network element may perform a search for the audio segment based on the calculated partial fingerprint(s), and if located, provide the device with an audio segment fingerprint as the reference fingerprint.
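The device-derived variant 802A, in which prior local fingerprint calculations serve as the reference, can be sketched as a rolling buffer of recent fingerprints. The class name, buffer length, and membership test are illustrative choices, not details from the source:

```python
from collections import deque

class LocalReference:
    """Sketch of a device-derived reference (802A): fingerprints
    recently calculated for the current audio segment serve as the
    reference against which each new fingerprint is compared."""

    def __init__(self, maxlen: int = 5):
        self.recent = deque(maxlen=maxlen)  # rolling reference window

    def update_and_check(self, fp: str) -> bool:
        """Return True if fp matches a recent reference fingerprint
        (same segment still playing); always record fp so it can serve
        as a reference for subsequent comparisons."""
        same_segment = (not self.recent) or fp in self.recent
        self.recent.append(fp)
        return same_segment
```

When `update_and_check` returns false, the segment has changed and the device would fall back to the remote variant 802B, sending a fresh partial fingerprint to the network element.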
A representative system in which the present invention may be implemented or otherwise utilized is illustrated in
The representative terminal 900A utilizes computing/processing systems to control and manage the conventional device activity as well as the device functionality provided by the present invention. For example, the representative wireless terminal 900B includes a processing/control unit 910, such as a microprocessor, controller, reduced instruction set computer (RISC), or other central processing module. The processing unit 910 need not be a single device, and may include one or more processors. For example, the processing unit may include a master processor and one or more associated slave processors coupled to communicate with the master processor.
The processing unit 910 controls the basic functions of the device 900B as dictated by programs available in the program storage/memory 912. The storage/memory 912 may include an operating system and various program and data modules associated with the present invention. In one embodiment of the invention, the programs are stored in non-volatile electrically-erasable, programmable read-only memory (EEPROM), flash ROM, etc., so that the programs are not lost upon power down of the terminal. The storage 912 may also include one or more of other types of read-only memory (ROM) and programmable and/or erasable ROM, random access memory (RAM), subscriber identity module (SIM), wireless identity module (WIM), smart card, or other fixed or removable memory device/media. The programs may also be provided via other media 913, such as disks, CD-ROM, DVD, or the like, which are read by the appropriate interfaces and/or media drive(s) 914. The relevant software for carrying out terminal operations in accordance with the present invention may also be transmitted to the device 900B via data signals, such as being downloaded electronically via one or more networks, such as the data network 915 or other data networks, and perhaps an intermediate wireless network(s) 916 in the case where the device 900A/900B is a wireless device such as a mobile phone.
For performing other standard terminal functions, the processor 910 is also coupled to user input interface 918 associated with the device 900B. The user input interface 918 may include, for example, a keypad, function buttons, joystick, scrolling mechanism (e.g., mouse, trackball), touch pad/screen, and/or other user entry mechanisms.
A user interface (UI) 920 may be provided, which allows the user of the device 900A/B to perceive information visually, audibly, through touch, etc. For example, one or more display devices 920A may be associated with the device 900B. The display 920A can display web pages, images, video, text, links, television, visual radio information and/or other information. A speaker(s) 920B may be provided to audibly present instructions, information, radio or other audio broadcasts, etc. A headset/headphone jack 920C and/or other mechanisms to facilitate audio presentations may also be provided. Other user interface (UI) mechanisms can also be provided, such as tactile 920D or other feedback.
The exemplary mobile device 900B of
In one embodiment, the storage/memory 912 stores the various client programs and data used in connection with the present invention. For example, a fingerprint extractor module 930 can be provided at the device 900B to sample an audio stream (e.g., a radio signal) received by way of a broadcast receiver, such as the radio receiver/tuner 940. The fingerprint extractor module 930 may be, for example, a software/firmware program(s) executable via the processor(s) 910. The fingerprint extractor may calculate a sample of, for example, several seconds, although the particular duration may vary; longer durations may produce more accurate results. In one embodiment, at the end of a sampling period, a request is sent to the recognition backend, such as a server 950 that looks up the song or other content item in a database based on the fingerprint sample(s).
The device 900B includes a fingerprint calculation module 932 to generate the fingerprint portions previously described. A compare module 934 can perform the local comparisons previously described, such as comparing the locally generated fingerprints to the reference fingerprint to determine when the content segment has ended. These and other modules may be separate modules operable in connection with the processor 910, may be a single module performing each of these functions, or may include a plurality of such modules performing the various functions. In other words, while the modules are shown as multiple software/firmware modules, these modules may or may not reside in the same software/firmware program. It should also be recognized that one or more of these functions may be performed using hardware. For example, a compare function may be performed by comparing the contents of hardware registers or other memory locations using hardware compare functions. These modules are representative of the types of functional and data modules that may be associated with a terminal in accordance with the invention, and are not intended to represent an exhaustive list. Also, other functions not specifically shown may be implemented by the processor 910.
In accordance with one embodiment of the invention, the storage/memory 954 and/or media devices 960 store the various programs and data used in connection with the present invention. For example, the storage 954 may include a content analysis module 980 that is configured to locate a content fingerprint that represents some content item, where that content item is identifiable via the fingerprint received from the device 900B. For example, the content analysis module can compare the received partial fingerprint to all of the more complete fingerprints in the content database 982A (e.g., song database). In one embodiment, the content analysis module therefore includes a comparison module configured to compare these fingerprints. When a match is found, the song or other content item corresponding to that fingerprint is known, and the more complete fingerprint and/or associated metadata can then be returned to the device 900B. In the context of a visual radio server, the storage/memory 954 may include the content database 982A (e.g., song database) where the desired content is stored and located using the fingerprint(s) received from the device 900B. Alternatively, such a database 982B may be in a separate server, such as a music recognition server accessible via a network or otherwise.
The illustrated computing system 950 also includes DSP circuitry 966, and at least one transceiver 968 (which is intended to also refer to discrete transmitter/receiver components). While the server 950 may communicate with the data network 915 via wired connections, the server may also/instead be equipped with transceivers 968 to communicate with wireless networks 916 whereby an antenna 970 may be used.
Hardware, firmware, software or a combination thereof may be used to perform the functions and operations in accordance with the invention. Using the foregoing specification, some embodiments of the invention may be implemented as a machine, process, or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware or any combination thereof. Any resulting program(s), having computer-readable program code, may be embodied within one or more computer-usable media such as memory devices or transmitting devices, thereby making a computer program product, computer-readable medium, or other article of manufacture according to the invention. As such, the terms “computer-readable medium,” “computer program product,” or other analogous language are intended to encompass a computer program existing permanently, temporarily, or transitorily on any computer-usable medium such as on any memory device or in any transmitting device.
From the description provided herein, those skilled in the art are readily able to combine software created as described with appropriate general purpose or special purpose computer hardware to create a computing system and/or computing subcomponents embodying the invention, and to create a computing system(s) and/or computing subcomponents for carrying out the method(s) of the invention.
The foregoing description of the exemplary embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not with this detailed description, but rather determined by the claims appended hereto.