Audio signals may include both desired components, such as a user's voice, and undesired components, such as noise. Noise removal (or cancellation) attempts to remove the undesired components from the audio signals. One implementation of noise removal is dual microphone noise cancellation, where a first microphone is used to pick up primarily a desired signal (e.g., the user's voice) and a second microphone is used to pick up primarily an undesired signal (e.g., a noise signal, such as background noise). The dual microphone cancellation system may remove noise by subtracting the audio signal picked up by the second microphone from the audio signal picked up by the first microphone. This and other noise cancellation techniques have various drawbacks. For example, this technique does not perform well if the geometry of the desired audio source relative to the noise source is not fixed or known. These and other drawbacks are addressed in this disclosure.
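As a non-limiting illustration, the dual microphone subtraction described above may be sketched in a few lines of Python (the function name and the NumPy array representation are illustrative only):

```python
import numpy as np

def dual_mic_cancel(primary: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Subtract the noise-reference microphone's signal from the primary
    (voice) microphone's signal, sample by sample.

    Note: this bakes in the fixed-geometry assumption criticized above --
    the noise must reach both microphones with the same delay and gain.
    """
    n = min(len(primary), len(reference))
    return primary[:n] - reference[:n]
```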
This summary is not intended to identify critical or essential features of the disclosures herein, but instead merely summarizes certain features and variations thereof. Other details and features will also be described in the sections that follow.
Some of the various features described herein relate to a system and method for removing an audio noise component from a received audio signal. For example, a speech recognition system may attempt to decipher a user's voice command while a television in the background is on. The method may comprise receiving (e.g., for analysis) an audio signal having noise. The noise may correspond to a piece of content previously or currently being provided to a user. The method may further comprise identifying noise by identifying the piece (e.g., an item) of content provided to the user. In response to identifying the item of content, for example, an audio component of the item of content may be identified and/or received. The audio component may have been provided to the user while the audio signal having noise was generated. The method may include synchronizing the audio component of the item of content to the received audio signal. In some aspects, the synchronization may include identifying a first audio position mark (e.g., watermark) in the audio component of the item of content provided to the user, identifying a second audio position mark in the received audio signal, and matching the first audio position mark in the audio component to the second audio position mark in the received audio signal. The method may also include determining a first timestamp included in the first audio position mark and a second timestamp included in the second audio position mark, wherein matching the first audio position mark to the second audio position mark may include matching the first timestamp to the second timestamp. The audio component of the item of content may also be synchronized to the received audio signal based on a cross-correlation between the two signals. After the synchronization and further processing, the audio component of the item of content may be identified as noise and removed from the received audio signal.
In some aspects, the noise may be time-shifted from the audio component of the piece of content because the noise and audio component may be received separately and/or from different sources, and synchronizing the audio component of the piece of content to the received audio signal may include removing the time-shift between the audio component and the noise. The method may further include determining the magnitude of the noise, adjusting the magnitude of the audio component based on the magnitude of the noise, and subtracting the audio component having the adjusted magnitude from the received audio signal. In additional aspects, the piece of content may be a television program, and the audio signal may include a voice command.
A method described herein may comprise receiving an audio signal, extracting an audio watermark from the audio signal, identifying an audio component of a piece of content based on the audio watermark, and removing the audio component of the piece of content from the received audio signal. The method may further comprise extracting a second audio watermark from the audio component of the piece of content and synchronizing the audio component of the piece of content to the audio signal based on the audio watermark and the second audio watermark. Removing the audio component of the piece of content from the received audio signal may include subtracting the synchronized audio component of the piece of content from the received audio signal.
Identifying the audio component of the piece of content may include extracting an identifier identifying the piece of content from the audio watermark. The audio signal may include a voice command, and the method may further comprise forwarding, to a voice command processor, the audio signal having the audio component of the piece of content removed, wherein the voice command processor may be configured to determine an action to take based on the voice command. Additionally or alternatively, the audio signal may include a portion of a telephone conversation, and the method may further comprise forwarding, to at least one party of the telephone conversation, the audio signal having the audio component of the piece of content removed.
A method described herein may comprise delivering a piece of content to a user, receiving, from the user, a voice command having noise, identifying an audio component of the piece of content delivered to the user, synchronizing the audio component of the piece of content to the received voice command, and/or removing the audio component of the piece of content from the received voice command based on the synchronization. In some aspects, synchronizing the audio component of the piece of content to the received voice command may include identifying a first audio watermark in the audio component of the piece of content, identifying a second audio watermark in the received voice command, and matching the first audio watermark to the second audio watermark. The method may also include determining a first timestamp included in the first audio watermark and a second timestamp included in the second audio watermark, wherein matching the first audio watermark to the second audio watermark may include matching the first timestamp to the second timestamp.
In some aspects, the noise included in the received voice command may comprise a second audio component corresponding to the audio component of the piece of content. The second audio component may be time-shifted from the audio component of the piece of content. Furthermore, synchronizing the audio component of the piece of content to the received voice command may comprise removing the time-shift between the audio component and the second audio component. Next, the magnitude of the second audio component may be determined and used to adjust the magnitude of the audio component. Further, the audio component having the adjusted magnitude may be subtracted or removed from the received voice command. In some aspects, the piece of content removed from the received voice command may correspond to a television program. The method may further comprise determining whether a user device scheduled to play the piece of content is on, and in response to determining that the user device is on, performing the audio component removal step.
Some features herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.
There may be one link 101 originating from the local office 103, and it may be split a number of times to distribute the signal to various homes 102 in the vicinity (which may be many miles) of the local office 103. Although the term home is used by way of example, locations 102 may be any type of user premises, such as businesses, institutions, etc. The links 101 may include components not illustrated, such as splitters, filters, amplifiers, etc. to help convey the signal clearly. Portions of the links 101 may also be implemented with fiber-optic cable, while other portions may be implemented with coaxial cable, other links, or wireless communication paths.
The local office 103 may include an interface 104, which may be a termination system (TS), such as a cable modem termination system (CMTS), which may be a computing device configured to manage communications between devices on the network of links 101 and backend devices such as server 106 (to be discussed further below). The interface may be as specified in a standard, such as, in an example of an HFC-type network, the Data Over Cable Service Interface Specification (DOCSIS) standard, published by Cable Television Laboratories, Inc. (a.k.a. CableLabs), or it may be a similar or modified device instead. The interface may be configured to place data on one or more downstream channels or frequencies to be received by devices, such as modems at the various homes 102, and to receive upstream communications from those modems on one or more upstream frequencies. The local office 103 may also include one or more network interfaces 108, which can permit the local office 103 to communicate with various other external networks 109. These networks 109 may include, for example, networks of Internet devices, telephone networks, cellular telephone networks, fiber optic networks, local wireless networks (e.g., WiMAX), satellite networks, and any other desired network, and the interface 108 may include the corresponding circuitry needed to communicate on the network 109, and to other devices on the network such as a cellular telephone network and its corresponding cell phones.
As noted above, the local office 103 may include a variety of servers that may be configured to perform various functions. For example, the local office 103 may include a data server 106. The data server 106 may comprise one or more computing devices that are configured to provide data (e.g., content) to users in the homes. This data may be, for example, video on demand movies, television programs, songs, text listings, etc. The data server 106 may include software to validate user identities and entitlements, locate and retrieve requested data, encrypt the data, and initiate delivery (e.g., streaming) of the data to the requesting user and/or device.
An example home 102a may include an interface 117. The interface may comprise a device 110, such as a modem, which may include transmitters and receivers used to communicate on the links 101 and with the local office 103. The device 110 may comprise, for example, a coaxial cable modem (for coaxial cable links 101), a fiber interface node (for fiber optic links 101), or any other desired modem device. The device 110 may be connected to, or be a part of, a gateway interface device 111. The gateway interface device 111 may be a computing device that communicates with the device 110 to allow one or more other devices in the home to communicate with the local office 103 and other devices beyond the local office. The gateway 111 may comprise a set-top box (STB), digital video recorder (DVR), computer server, or any other desired computing device. The gateway 111 may also include (not shown) local network interfaces to provide communication signals to devices in the home, such as televisions 112, additional STBs 113, personal computers 114, laptop computers 115, wireless devices 116 (wireless laptops and netbooks, mobile phones, mobile televisions, personal digital assistants (PDA), etc.), and any other desired devices. Wireless device 116 may also be a remote control, such as a remote control configured to control other devices at the home 102a. For example, the remote control may be capable of commanding the television 112 and/or STB 113 to switch channels. As will be described in further detail in the examples below, a remote control 116 may include speech recognition services that facilitate audio commands (e.g., a command to switch to a particular program and/or channel) made by a user. Examples of the local network interfaces include Multimedia Over Coax Alliance (MoCA) interfaces, Ethernet interfaces, universal serial bus (USB) interfaces, wireless interfaces (e.g., IEEE 802.11), Bluetooth interfaces, and others.
The local office 103 and/or devices in the home 102a (e.g., a wireless device 116, such as a mobile phone or remote control device) may communicate with an audio computing device 118 via one or more interfaces 119 and 120. The interfaces 119 and 120 may include transmitters and receivers used to communicate via wire or wirelessly with local office 103 and/or devices in the home using any of the networks previously described (e.g., cellular network, optical fiber network, copper wire network, etc.). Audio computing device 118 may have a variety of servers and/or processors, such as audio processor 121, that may be configured to perform various functions. As will be described in further detail in the examples below, audio processor 121 may be configured to receive audio signals from a user device (e.g., a mobile phone 116), to receive an audio component of a piece of content being consumed by a user at the user's home 102a, and/or to remove the audio component of the piece of content from the received audio signal.
Audio computing device 118, as illustrated, may comprise one or more components within a cloud computing environment. Additionally or alternatively, computing device 118 may be located at local office 103. For example, device 118 may comprise one or more servers in addition to server 106 and/or be integrated within server 106. Device 118 may also be wholly or partially integrated within a user device, such as a device within a user's home 102a. For example, device 118 may include various hardware and/or software components integrated within a TV 112, an STB 113, a personal computer 114, a laptop computer 115, a wireless device 116, such as a user's mobile phone or remote control, an interface 117, and/or any other user device.
Content playing in the background while a user issues a voice command or conducts a phone call may contribute unwanted noise to the voice command or phone call. By removing the content playing in the background (which may be noise), a signal to noise ratio of an audio signal generated by the voice command or phone call may be improved.
In step 300, a computing device may receive an audio signal, such as an audio message signal (e.g., from a remote control having a voice recognition service, a set top box, a smartphone, etc.). As previously discussed, the computing device that receives the audio signal may be located at any number of locations, including within a cloud computing environment, at local office 103, in a user device, and/or a combination of any of these locations. The audio signal (e.g., a message) may include a desired signal, such as a voice command, and undesired signals, such as an audio component of content playing in the background (which may be considered noise). In at least some embodiments, these signals may be simultaneously received at a single microphone (or several microphones or other sensor devices). In step 305, the computing device may identify content previously or currently being presented (e.g., viewed or played) by one or more devices within the home 102a (e.g., played within a predetermined time period, such as the length of the received audio signal, the last five seconds of all content played, or the time taken to receive and analyze the audio signal). In step 310, the computing device may receive audio components of the content identified in step 305, which may have been previously played or may be currently playing on a user device or at a user home (e.g., audio components of audiovisual content). For example, if the computing device determined that television 112 was playing Television Show 1 while the user was speaking a voice command, the computing device may retrieve a recently played audio component of Television Show 1 in step 310 to account for, for example, the volume of noise sources.
In step 315, the computing device may synchronize the audio signal with the received audio component of the previously-played content. For example, the computing device may match watermarks, or any other marker associated with time or location, present in the audio signal with corresponding watermarks in the audio component. Alternatively, the audio component and audio signal may be synchronized based on a cross-correlation between the two signals. In step 320, the computing device may optionally adjust the magnitude of the audio component to correspond to the magnitude of the noise signals present in the voice command. In step 325, the computing device may remove (e.g., isolate, subtract, etc.) the audio component of the playing content from the received audio signal (e.g., a voice command), thereby removing undesired noise signals from the audio signal. In step 330, the computing device may use and/or otherwise forward the resulting audio signal for further processing. For example, the computing device may process the audio signal to determine a voice command issued by a user (e.g., a voice command to switch channels).
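As a non-limiting illustration, steps 315-325 may be sketched as follows, assuming the alignment offset (step 315) and noise gain (step 320) have already been estimated; the helper name and the non-negative `lag` convention are illustrative:

```python
import numpy as np

def remove_background_audio(received: np.ndarray,
                            content_audio: np.ndarray,
                            lag: int,
                            gain: float) -> np.ndarray:
    """Align the content's audio component with the received signal
    (step 315), scale it to the estimated noise magnitude (step 320),
    and subtract it (step 325). Assumes lag >= 0 samples."""
    aligned = np.zeros_like(received)
    segment = content_audio[lag:lag + len(received)]
    aligned[:len(segment)] = segment
    return received - gain * aligned
```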
In step 405, the computing device may identify potential noise sources. As described herein, noise may include the audio components of content generated by various devices (e.g., noise sources) that play the content (or otherwise provide the content to users). Noise sources may include various devices at the user's home 102a, such as television 112, STB 113, computer 114, laptop 115, mobile device 116, and/or other client premises equipment, as well as other sources of sound, such as refrigerators, washing machines, alarms, and street noise. Content that may contribute noise may include linear content (e.g., broadcast content or other scheduled content), content on demand (e.g., video on demand (VOD) or other programs available on demand), recorded content (e.g., content recorded and/or otherwise stored on a local or network digital video recorder (DVR)), and other types of content. As will be appreciated by one of ordinary skill in the art, other devices may be considered noise sources. For example, a gaming system (e.g., SONY PLAYSTATION, MICROSOFT XBOX, etc.) playing a movie, running a game, and/or playing music may introduce noise.
The audio component of a movie playing on television 112 or another device may constitute background noise if the user is attempting to issue a voice command to a remote control device, such as a command to switch to a particular channel or play a particular program. The audio component of the movie may interfere with processing (e.g., understanding by a voice command processor) the user's voice command. If laptop 115 is playing music, the music may constitute background noise if the user is speaking on the user's mobile phone 116 with a friend. The background music may cause the user's voice to be more difficult to understand by the friend on the other side of the conversation. Other examples of noise sources include television shows, commercials, sports broadcasts, video games, or other content having audio components.
Noise sources need not be located at the user's home 102a. For example, the user may be streaming a television show from laptop 115 at a location different from the user's home (e.g., at a friend's house, outdoors, at a coffee shop, etc.). The user may also be holding a conversation on the user's mobile phone 116 near the laptop 115 streaming the television show. The audio component of the television show, if audible to a microphone on the mobile phone 116 or other computing device, may contribute noise to the user's telephone conversation.
Noise resulting from various content may have the same or similar frequency components as the audio signal. For example, if the noise source is a television sitcom, the frequency range of the sitcom may include the frequency range of human voice. If the audio signal is a voice command, the frequency range of the voice command may also include the frequency range of human voice.
The computing device may identify potential noise sources by comparing a list of devices at the user's home (or otherwise associated with the user) to a list of known noise sources. For example, the computing device may retrieve a list of known noise sources, such as a list including televisions, STBs, laptop computers, personal computers, appliances, etc. The list may be stored at, for example, a storage device within audio computing device 118, a storage device at local office 103, or another local and/or network storage location. By comparing the user's devices with the list, the computing device may determine that the user's television 112, STB 113, personal computer 114, and laptop computer 115 are potential noise sources. On the other hand, the computing device may determine that mobile device 116 is not a potential noise source because mobile devices are not included on the list.
The computing device may also identify noise sources by determining which user devices receive content from local office 103 and/or other content providers. For example, the computing device may determine that TV 112, STB 113, and mobile device 116 are potential noise sources because they are configured to receive content from local office 103 or another content provider. TV 112 and/or STB 113 may be potential noise sources because they receive linear and/or on-demand content from the content provider or content stored on a DVR. Mobile device 116 may be a potential noise source because an application configured to display content from the content provider (e.g., a video player, music player, etc.) may be installed on the mobile device 116.
In some aspects, any device capable of accessing online content (e.g., on demand and/or streaming video, on demand and/or streaming music, etc.) from the content provider may be a potential noise source. These devices may include, for example, computers 114 and 115 or any other device capable of accessing online content. These devices may render the online content using a web browser application, an Internet media player application, etc. The computing device may identify these sources as potential noise sources based on whether a user is logged onto the user's account provided by the service provider, such as a provider of content and/or a provider of the noise removal service. Content delivered to these devices while the user is logged onto the account may be considered background noise. Potential noise sources may include devices that might, but not necessarily always, contribute noise. For example, television 112 may be capable of contributing noise (e.g., a television program), but might not actually contribute noise if the television is turned off, muted, etc. The computing device may store identifiers for the potential noise sources in the user's noise profile (e.g., an IP address, MAC address, other unique identifier, etc. for each noise source).
In step 410, the computing device may determine the location of each of the potential noise sources. This location may be the user's home 102a, such that all devices located in the user's home may be considered potential noise sources. Locations may also include more specific locations within the user's home 102a. For example, the user may have a first STB and/or television in the user's living room, a second STB and/or television in the user's bedroom, and a personal computer also in the user's bedroom. The user may provide the computing device with the locations of the noise sources. For example, the user might log onto an account provided by a service provider providing the noise removal service and input information identifying the various devices (e.g., by MAC address, IP address, or other identifier) and the location of each device (e.g., bedroom 1, living room, kitchen, etc.). The computing device may use the location of each potential noise source when identifying actual noise sources. For example, if the user conducts a telephone conversation in the user's bedroom, the second STB and/or television and the user's personal computer may be identified as actual noise sources because they are located in the user's bedroom. On the other hand, the first STB and/or television might not be identified as a noise source because the first STB and/or television are located in the living room, not the bedroom. The identified locations of the noise sources may be stored in the user's noise profile.
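As a non-limiting illustration, a noise profile of the kind described in steps 405-410 may be sketched as a simple mapping (the identifiers and room names below are hypothetical):

```python
# Hypothetical noise profile: device identifiers (e.g., MAC addresses)
# mapped to device names and their user-supplied rooms (step 410).
NOISE_PROFILE = {
    "aa:bb:cc:00:00:01": {"device": "Television 1", "room": "living room"},
    "aa:bb:cc:00:00:02": {"device": "STB 2", "room": "bedroom 1"},
    "aa:bb:cc:00:00:03": {"device": "PC 1", "room": "bedroom 1"},
}

def actual_noise_sources(user_room):
    """Filter potential noise sources down to those in the same room as
    the user, per the bedroom/living-room example above."""
    return [entry["device"] for entry in NOISE_PROFILE.values()
            if entry["room"] == user_room]

print(actual_noise_sources("bedroom 1"))  # ['STB 2', 'PC 1']
```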
In step 415, the computing device may determine the expected noise contribution of each noise source, such as the expected magnitude of the noise picked up by various microphones at the user's home 102a. Magnitude of the noise may depend on various factors, such as the volume of the noise source (e.g., the volume of television 112). The magnitude of the noise may be high if the volume of the television is high and low if the volume of the television is low. Magnitude may also depend on acoustic attenuation of the noise source. For example, losses caused by the transmission of the content from the noise source (e.g., a television) to the microphone (e.g., located on a user's mobile device 116) may occur. In general, less attenuation may occur if a microphone is located in the same room (living room, bedroom, etc.) as the noise source than if the microphone is located in a different room from the noise source. The attenuation amount may also depend on the distance between the microphone and the noise source, even if the two devices are within the same room. For example, there may be less attenuation (and thus the noise may have a higher magnitude) if the microphone is five feet from a television 112 generating noise than if the microphone is fifteen feet from the television. Acoustical and/or corresponding electrical losses may also occur at the noise source and/or the microphone (e.g., depending on the gain, amplification, sensitivity, efficiency, etc. of each device).
The computing device may obtain estimates of the expected magnitude for potential noise sources. Each room within the user's home 102a may have an estimated attenuation and/or magnitude amount. For example, the user's living room may have an attenuation amount of A decibels, the bedroom may have an attenuation amount of less than A, and the kitchen may have an attenuation amount of more than A. The attenuation amounts may be a default amount set by a noise removal service provider and/or may factor in various noise magnitude measurements or other estimates, either locally (e.g., for a particular user of the noise removal service) or globally (e.g., for all users of the noise removal service).
A profile for the noise magnitude may be generated by periodically collecting noise data (e.g., hourly, daily, weekly) or otherwise collecting the noise data (e.g., at irregular times, such as each time the user uses a microphone on a user device to issue a voice command or to make a call, each time content is detected as running in the background, etc.). The collected noise data may be used to make a local estimate of the magnitude of the noise. For example, a local noise profile may identify that the magnitude of the noise is reduced by 57% from a baseline magnitude at the user's home or within a particular room in the user's home. In some aspects, the baseline magnitude may be the default magnitude at which the content is delivered to the user from local office 103 (e.g., the magnitude level at which the content is broadcast to user devices). The computing device may use the 57% level (a delta or offset from the 100% baseline level) to adjust the audio component of the piece of content (e.g., the noise signal) to remove from a received audio signal, as will be described in further detail in the examples below. The attenuation and/or magnitude amount for a particular user may be combined with those of other users of the noise cancellation service to generate a global noise profile. For example, the global noise profile may combine the estimate for a first user (e.g., 57% acoustical loss) with an estimate for a second user (e.g., 63% acoustical loss) to obtain a global estimate (e.g., 60% acoustical loss or other weighted average). Any number of users may be factored in to determine the global estimate.
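As a non-limiting illustration, the per-user loss estimates may be combined into the global estimate described above as a (possibly weighted) average; equal weights reproduce the 57%/63% to 60% example:

```python
def global_loss_estimate(per_user_losses, weights=None):
    """Combine per-user acoustical-loss estimates (e.g., 0.57 and 0.63)
    into one global estimate; `weights` defaults to equal weighting."""
    if weights is None:
        weights = {user: 1.0 for user in per_user_losses}
    total = sum(weights.values())
    return sum(loss * weights[user]
               for user, loss in per_user_losses.items()) / total

print(global_loss_estimate({"user1": 0.57, "user2": 0.63}))  # ~0.60
```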
A profile for the noise magnitude may also be generated during configuration of the noise removal service by the user. For example, after the user is signed up for the noise removal service, the user may be prompted to configure the user's device(s) for the service.
Referring to the accompanying figures, the device 700 may display one or more interfaces for configuring the noise removal service.
Returning to the example method, in step 510, the computing device may determine the location of the device having the audio service (e.g., the user's mobile phone). If the user is in the user's home 102a, the relevant location may be the user's home or a particular room in the home (e.g., bedroom 1, kitchen, living room, etc.). The user may provide the computing device with the location of the user device. For example, the user device may display various graphical user interfaces (similar to the example interfaces described above) prompting the user to input the device's location.
The computing device may also determine the location of the user device by taking an audio sample (e.g., a noise sample) using the user device's microphone.
In step 570, the computing device may receive a request to determine the location of the user device. For example, the user may press a “Start” button (e.g., button 803, described further below) displayed on the user device to request that the location determination begin.
In step 572, the computing device may obtain an audio sample when the user presses the start button. The user device may record an audio sample (e.g., a two second sample, a five second sample), and the recorded audio sample may be forwarded to the computing device (which, as previously described, might or might not be within the user device). The computing device may use the audio sample to determine the location of the user device, as will be described in further detail in the examples below. In some aspects, the computing device may determine the location of the user device based on audio watermarks encoded in noise signals. Thus, when the microphone records the noise signals, it may also record the audio watermarks.
Audio watermarks (e.g., audio signals substantially imperceptible to human hearing) may be encoded in an audio component of a piece of content. The audio watermarks may be included in the content at predetermined time intervals (e.g., every second, every two seconds, every four seconds, etc.). Each audio watermark may include various types of information. The audio watermark may encode a timestamp (or date stamp) of the audio watermark relative to a baseline time. For example, an audio watermark may be located 23 minutes into a television program. If the baseline time is the start time of the television program (e.g., a baseline of 0 minutes), the timestamp of the audio watermark may be 23 minutes. The timestamp may also indicate an absolute time. For example, if the current time is 6:12 PM, the timestamp may indicate 6:12 PM. The timestamp may include an absolute time if, for example, the timestamp is included in the audio component of linear content (or other content scheduled to play at a particular time).
In some aspects, the audio watermark may also identify the piece of content having the audio watermark. For example, a unique identifier, such as a program identifier (PID), may be included in the audio watermark. Other globally unique identifiers may be used (e.g., identifiers unique to the piece of content that distinguish the piece of content from other pieces of content). An identifier for the source of the content (e.g., a content provider) may also be included in the audio watermark. In some aspects, audio watermarks may be NIELSEN watermarks or other types of audio fingerprints.
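As a non-limiting illustration, the decoded payload of one watermark might be modeled as follows (the field names are hypothetical; actual watermark formats, such as NIELSEN's, differ):

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class AudioWatermark:
    """Hypothetical decoded payload of one audio watermark."""
    content_id: str       # unique content identifier, e.g., a program identifier (PID)
    timestamp: timedelta  # offset from a baseline (or an absolute time, for linear content)
    source_id: str = ""   # optional identifier for the content's source/provider

# The example above: a watermark located 23 minutes into a program
# whose baseline time is 0 minutes.
mark = AudioWatermark(content_id="TVSHOW1", timestamp=timedelta(minutes=23))
```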
In step 574, the computing device may extract one or more audio watermarks from the recorded audio sample to identify the corresponding piece of content. For example, the computing device may identify the piece of content based on the unique identifier of the piece of content encoded in the audio watermark. In step 576, the computing device may compare the unique identifier to content played by various devices at the user's home 102a to identify the noise source that generated the noise. For example, if the noise sample was collected at 5:05 PM and the identifier extracted from the audio watermark indicated TV Show 1, the computing device may search various content schedules for any instances of TV Show 1 scheduled to play at or before 5:05 PM (e.g., linear content scheduled to play at or before 5:05 PM or on demand content requested to play at or before 5:05 PM). The content schedule may correspond to a television program listing, such as a listing included in a television program guide. The content schedule may also correspond to a listing of content stored by the user (e.g., in a local or network DVR). The computing device may retrieve the content schedules from one or more devices at the home 102a (e.g., an STB 113 that stores the schedule) or a network storage location (e.g., from a content provider, from local office 103, etc.).
When a match for TV Show 1 is made, the computing device, in step 578, may identify the corresponding noise source scheduled to play TV Show 1 (e.g., Television 1). For example, if TV Show 1 is listed in a content schedule stored on STB 113 that provides content to Television 1, the computing device may identify Television 1 as the noise source. In step 580, the computing device may determine the location of the user device by finding the identified noise source in the user's noise profile and its associated location (e.g., as determined and/or stored in step 410). For example, the computing device may determine that Television 1 is located in the user's living room and thus determine that the user device is also currently located in the user's living room. The computing device may also determine the location of the user device without requiring the user to press the “Start” button 803.
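As a non-limiting illustration, steps 576-580 may be sketched as a schedule lookup (the schedule entries, device names, and rooms below are hypothetical):

```python
from datetime import datetime

# Hypothetical content schedule: (content_id, start_time, device, room).
SCHEDULE = [
    ("TVSHOW1", datetime(2013, 3, 12, 17, 0), "Television 1", "living room"),
    ("MOVIE2", datetime(2013, 3, 12, 16, 30), "Television 2", "bedroom 1"),
]

def locate_user_device(content_id, sample_time):
    """Find a device scheduled to play the identified content at or
    before the time the noise sample was taken (steps 576/578), and
    return that device's room as the user device's location (step 580)."""
    for cid, start, device, room in SCHEDULE:
        if cid == content_id and start <= sample_time:
            return room
    return None

# The 5:05 PM example: TV Show 1 started at 5:00 PM on Television 1,
# so the user device is placed in the living room.
print(locate_user_device("TVSHOW1", datetime(2013, 3, 12, 17, 5)))
```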
Returning to the example method, in step 530, the computing device may determine whether an audio signal has been received from the user device (e.g., a remote control, mobile phone, etc.). For example, during a phone call, the computing device may receive an audio signal including a user's voice signal. As will be described in further detail in the examples below, the computing device may process the audio signal (e.g., by removing noise), and forward the audio signal to a phone call recipient (or an intermediate node between the computing device and the phone call recipient). Similarly, if the audio signal includes a voice command, the computing device may process the voice command signal (e.g., by removing noise), and forward the voice command signal to a voice command processor (e.g., a processor configured to identify the voice command and perform an action, such as switching channels on a television, in response to the voice command).
The computing device may wait, in step 530, to receive an audio signal. When the computing device receives an audio signal (step 530: Y), the computing device may process the received audio signal. In step 532, the computing device may determine whether an audio watermark is present in the audio signal. If the computing device does not detect an audio watermark (step 532: N), the computing device may perform additional steps, beginning with step 582, as described below.
In step 582, the computing device may determine whether the noise sources are off. If the noise sources are off (step 582: Y), the computing device may determine that the noise sources are not contributing noise signals. The computing device may take path C and forward the audio signal to the next destination (e.g., in step 565) without performing noise removal, as will be discussed in further detail in the examples below. If the noise sources are not off (step 582: N), the computing device may determine, in step 583, whether the volume of each noise source falls below a predetermined level (e.g., a volume level that might not require removal of noise signals, such as 10% of the maximum volume for the noise source). Each noise source may have its own predetermined level. If the volume levels of the noise sources are below the one or more predetermined volume levels (step 583: Y), the computing device may determine that the noise sources are not contributing noise signals (or are contributing an imperceptible amount of noise). The computing device may take path C and forward the audio signal to the next destination (e.g., in step 565) without performing noise removal. If the volume levels of the noise sources are not below the one or more predetermined levels (step 583: N), the computing device may attempt to detect watermarks in the received audio signal.
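As a non-limiting illustration, the gating decision of steps 582-583 may be reduced to a small predicate (the 10% default threshold follows the example above):

```python
def needs_noise_removal(source_on: bool, volume: float,
                        threshold: float = 0.10) -> bool:
    """Return False (take path C, skip noise removal) when the noise
    source is off or its volume is below its predetermined level."""
    return source_on and volume >= threshold
```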
In step 585, the computing device may continue to receive the audio signal received in step 530. For example, the computing device may transmit a command to the user device to continue receiving (e.g., recording) the audio signal. The user device may respond to the command by keeping the microphone used to receive the audio signal active (e.g., in an audio signal capture mode).
In step 587, the computing device may determine whether a predetermined time period has been exceeded. In some aspects, the computing device may extend the length of the captured audio signal by the predetermined time period. For example, if the audio signal captured in step 530 is two seconds in length and the predetermined time period is one second in length, the computing device may extend the captured audio signal to three seconds. The predetermined time period may be an arbitrary length of time, such as one second. The predetermined time period may also depend on the timing/frequency of the audio watermarks. The length of the recorded audio signal may be extended to guarantee detection of at least one watermark, if a watermark is present. For example, if watermarks are present in the noise signal every four seconds and a two second audio signal is captured in step 530, the computing device may set the predetermined time period to two seconds so that the total length of the captured audio signal is four seconds. The computing device may set the length of the captured audio signal (by adjusting the predetermined time period) to capture any number of audio watermarks (e.g., 8 seconds for two watermarks, 12 seconds for three watermarks, etc.).
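As a non-limiting illustration, the predetermined time period may be computed from the watermark interval (the helper below assumes the interval is known and that `marks_wanted` watermarks should be captured):

```python
def extension_needed(captured_seconds: float,
                     watermark_interval: float,
                     marks_wanted: int = 1) -> float:
    """Seconds to extend the captured audio so its total length spans
    `marks_wanted` watermark intervals -- e.g., a 2 s capture against
    4 s watermarks needs 2 s more for one mark (4 s total), with 8 s or
    12 s totals for two or three marks, per the examples above."""
    target = watermark_interval * marks_wanted
    return max(0.0, target - captured_seconds)

print(extension_needed(2.0, 4.0))  # 2.0
```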
If the predetermined time period has not yet passed (step 587: N), the computing device may determine, in step 589, whether a watermark has been detected. If a watermark has been detected (step 589: Y), the computing device may take path B in order to perform noise removal, as will be described in further detail in the examples below. If a watermark has not been detected (step 589: N), the computing device may return to step 587 to determine if the predetermined time period has been exceeded. If the predetermined time period has been exceeded (step 587: Y), the computing device may take path C and forward the audio signal to the next destination (e.g., in step 565) without performing noise removal.
Returning to the example method, in step 540, the computing device may identify the noise signals present in the received audio signal. In some aspects, the computing device may request information identifying content previously played by one or more noise sources at the home 102a. The computing device may request the information from each user device in the home 102a configured to play content (e.g., TV 112, STB 113, PC 114, laptop 115, and/or mobile device 116), an interface device that forwards content from content sources (e.g., local office 103) to the user devices (e.g., modem 110, gateway 111, DVR, etc.), and/or any other device at the home 102a that stores this information. The computing device may similarly request the information from a device located at the local office 103, a central office, and/or any other device that stores information on content delivered to devices at the home 102a. In some aspects, the computing device may request information on content played by a subset of user devices. For example, the computing device might only request information for devices located at the same location as the user's remote control and/or phone (as determined, for example, in step 515).
The computing device may request information on content played within a predetermined time period. The time period may correspond to the length of time of the received audio signal (e.g., a voice command). For example, if a two second voice command is received, the computing device may request information on content played during the two second time period of the voice command. The time period may be any predetermined length of time. For example, the computing device may request information identifying content played in the five seconds before the audio signal was received. The computing device may also extract noise signal identifiers (e.g., program identifiers) from the audio watermarks present in the received audio signal (e.g., a unique identifier for TV Show 1, such as TVSHOW1).
In step 545, the computing device may identify and/or receive various pieces of content corresponding to the noise signals identified in step 540. For example, the computing device may identify content provided to the user while the audio signal having noise was generated (e.g., created by noise sources and/or received by the user device, such as at the microphone). Receiving the pieces of content may include receiving a portion of the audio component of the content (e.g., a fraction of the audio component of a television program, such as the last ten seconds of the program), the entire audio component of the content (e.g., an entire forty minutes of the audio component if the television program is forty minutes long), the entire content (e.g., the entire audio component of the content, the entire video component of the content, and other data related to the content, such as timestamps, content identifiers, etc.), or any combination thereof (e.g., five minutes of the video component and forty minutes of the audio component of a piece of content).
The computing device may receive the audio component of content from various sources, such as a local office 103, a central office, a content provider, networked storage (e.g., cloud storage), and/or any other common storage location. For example, the computing device may receive the audio component of content from a network DVR utilized by the user to store recorded content or from the data server 106 providing the content to the user. Additionally (or alternatively), the computing device may receive the audio component of content from devices at the user's home 102a. The computing device may receive the audio component of content from the television 112, STB 113, a local DVR, and/or any other device that stores (permanently or temporarily) the content. For example, if the STB buffers, caches, and/or temporarily stores the content, the computing device may retrieve the audio component of the content from the STB. In addition to receiving the audio component of content, the computing device may receive status information on the noise sources. As previously described, status information may include whether a noise source is on or off and/or the volume of the noise source during the time frame of the audio signal (e.g., the voice command). As will be described in further detail in the examples below (e.g., with respect to step 555), the computing device may use the status information to determine the magnitude (e.g., contribution) of the noise source.
In step 550, the computing device may synchronize the audio signal having one or more noise signals included therein with one or more corresponding audio components of content (e.g., the content signals). The computing device may compare one or more watermarks included in the received audio signal (having both a desired signal, such as a voice command, and an undesired signal, such as a noise signal caused by a noise source) with one or more watermarks included in the audio components of content.
In some aspects, the computing device may synchronize the noise signal 620 and the audio signal 610 without using watermarks. For example, the computing device may compute the cross-correlation between the noise signal 620 and the audio signal 610. The noise signal 620 may be synchronized with the audio signal 610 at the point in time of the maximum of the cross-correlation function. The cross-correlation method may be more useful if the magnitude of the noise component of the audio signal 610 (e.g., a background television program) is large relative to the desired component of the audio signal 610 (e.g., the voice command). Accordingly, the computing device may determine whether to use cross-correlation or watermarks to synchronize the audio signal 610 (having the noise and desired components) and the noise signal 620 based on the magnitude of the noise component relative to the magnitude of the desired component. For example, if the magnitude of the noise component is three times greater than the magnitude of the desired component, the computing device may select the cross-correlation synchronization method. On the other hand, if the magnitude of the noise component is less than three times the magnitude of the desired component, the computing device may synchronize based on watermarks. Three times the magnitude is merely exemplary and any threshold may be used in deciding between synchronization methods.
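As a non-limiting illustration, the cross-correlation alignment and the method-selection heuristic may be sketched as follows (signal names follow the 610/620 convention above; the 3x ratio is the exemplary threshold):

```python
import numpy as np

def best_lag(audio_signal: np.ndarray, noise_ref: np.ndarray) -> int:
    """Return the sample lag at the maximum of the cross-correlation
    between the received audio signal (610) and the noise signal (620)."""
    xcorr = np.correlate(audio_signal, noise_ref, mode="full")
    return int(np.argmax(xcorr)) - (len(noise_ref) - 1)

def choose_sync_method(noise_mag: float, desired_mag: float,
                       ratio: float = 3.0) -> str:
    """Cross-correlate when the noise component dominates the desired
    component; otherwise synchronize using watermarks."""
    return "cross-correlation" if noise_mag > ratio * desired_mag else "watermarks"
```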
Returning to the example method, in step 555, the computing device may adjust the magnitude of the audio component 630 to correspond to the magnitude of the noise signal present in the received audio signal 610 (e.g., using the status information and noise profiles previously described).
In step 560, the computing device may remove noise signals from the audio signal, such as by subtracting the synchronized and/or magnitude-adjusted audio component 630 from audio signal 610. Signal 640 represents a resulting audio signal having the audio component of a noise signal 630 removed from the received audio signal 610. As will be appreciated by one of ordinary skill in the art, other ways of producing the resulting signal in step 560 may be used, such as subtracting signals, adding signals, performing other mathematical functions on signals, correlating signals (e.g., via a Fast Fourier Transform), etc.
In some aspects, the computing device might not adjust the magnitude of the audio component 630 before subtracting component 630 from the audio signal 610 (e.g., step 555 may be optional). Instead, the computing device may subtract the synchronized audio component 630 (without adjusting the magnitude of the audio component 630) from the audio signal 610 in step 560. The audio component 630 initially subtracted from the audio signal 610 may have a baseline magnitude (e.g., the magnitude of the content delivered to the user, as previously discussed). The computing device may then determine whether the signal-to-noise ratio (SNR) of the noise-removed audio signal is above a predetermined SNR threshold (e.g., an SNR that permits a voice command processor to identify the user command). If the SNR is not above the predetermined threshold, the computing device may adjust the magnitude of audio component 630 and subtract the new magnitude-adjusted audio component from the received audio signal 610. The computing device may determine the SNR of the resulting signal. The computing device may continue to adjust the magnitude of the audio component 630 and subtract the component from the audio signal 610 until the resulting noise-removed signal has reached the predetermined SNR or has reached an optimal SNR (e.g., the maximum SNR).
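As a non-limiting illustration, the optional gain search may be sketched as below; the SNR proxy (projecting the candidate onto the content audio to estimate remaining noise) is an assumption of this sketch, not a method described above:

```python
import numpy as np

def snr_db(candidate: np.ndarray, component: np.ndarray) -> float:
    """Rough SNR proxy: treat the part of `candidate` still correlated
    with the content audio as noise, and the remainder as signal."""
    unit = component / (np.linalg.norm(component) + 1e-12)
    noise = np.dot(candidate, unit) * unit
    signal = candidate - noise
    return 10.0 * np.log10((np.sum(signal**2) + 1e-12) /
                           (np.sum(noise**2) + 1e-12))

def iterative_removal(received: np.ndarray, component: np.ndarray,
                      snr_target_db: float = 20.0,
                      gains=(1.0, 1.2, 0.8, 1.4, 0.6)) -> np.ndarray:
    """Subtract at the baseline magnitude first (gain 1.0), then try
    adjusted magnitudes until the predetermined SNR is met, keeping
    the maximum-SNR result as a fallback."""
    best, best_snr = received, -np.inf
    for gain in gains:
        candidate = received - gain * component
        snr = snr_db(candidate, component)
        if snr >= snr_target_db:
            return candidate
        if snr > best_snr:
            best, best_snr = candidate, snr
    return best
```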
In step 565, the computing device may use and/or otherwise forward the noise-removed audio signal to the next destination. For example, if the audio signal is a voice command, the computing device may forward the audio signal to a voice command processor configured to process the voice command, such as to determine an action to take in response to the command (e.g., switch channels, play a requested program, etc.). Alternatively, if the computing device includes voice command services, the computing device may process the noise-removed audio signal itself to identify and act on the voice command. If the audio signal is part of a phone conversation, the computing device may forward the audio signal to a phone call recipient (or an intermediate node).
The various features described above are merely non-limiting examples, and can be rearranged, combined, subdivided, omitted, and/or altered in any desired manner. For example, features of the computing device described herein (which may be server 106 and/or audio computing device 118) can be subdivided among multiple processors and computing devices. The true scope of this patent should only be defined by the claims that follow.
This application is a continuation of U.S. application Ser. No. 16/905,239, filed Jun. 18, 2020, which is a continuation of U.S. application Ser. No. 16/437,737, filed Jun. 11, 2019 (now U.S. Pat. No. 10,726,862), which is a continuation of U.S. application Ser. No. 15/679,761, filed Aug. 17, 2017 (now U.S. Pat. No. 10,360,924), which is a continuation of U.S. application Ser. No. 15/175,105, filed Jun. 7, 2016 (now U.S. Pat. No. 9,767,820), which is a continuation of U.S. application Ser. No. 13/797,370, filed Mar. 12, 2013 (now U.S. Pat. No. 9,384,754). Each of the prior applications is hereby incorporated by reference in its entirety.