Karaoke has proven to be a popular form of entertainment. Traditionally, karaoke is the performance of popular songs to a pre-recorded instrumental soundtrack (i.e. there are no lead vocals on the track). Often the lyrics of the song are displayed along with the audio track, and are highlighted or scrolled at the correct time and tempo to make it easier for the singer to follow along. Although generally done at a karaoke bar or at a party or other event, karaoke has grown in popularity in other venues, such as in automobiles (i.e. “in-car karaoke”).
In-car karaoke is an extremely popular form of entertainment in Japan. Instead of just singing along to songs on the radio or in-car entertainment system, drivers will often play back karaoke tracks while driving and sing along. In-car karaoke has a number of disadvantages that have prevented it from penetrating the mainstream. One disadvantage is the potential distraction to the driver if there is a need to follow along with visually presented lyrics. For safety, it is important to minimize driver distraction during automobile operation. But without guide lyrics, it is often difficult for an amateur performer to properly follow along and sing at the right times and tempo.
Another disadvantage is the need to provide karaoke-ready recordings for use in the car. Pre-recorded karaoke tracks are relatively expensive and must be compiled in some re-playable format and source (e.g. CD-ROM, tape, MP3 player, etc.) to be available in a car. This requires advance preparation and can remove some of the spontaneity from enjoying in-car karaoke.
The driver can abandon pre-recorded karaoke tracks and sing along with music, whether from MP3, FM, CD, or satellite radio, but this is not quite the same as karaoke. The vocals of the recorded artist can overwhelm the vocals of the karaoke singer and diminish the performance experience.
Described herein is a karaoke system that enhances the experience of singing along with music, but without the need to display the lyrics. The system includes a combination of a vocal track reducer and an echo canceller, decision logic (a double-talk detector) for determining when a person is talking or singing, and a method for “ducking” (i.e., attenuating) the vocal track when singing is detected. No special CD or DVD with lyric tracks is required, so the system can work with CD, MP3, AM, FM, HD radio, satellite radio signals, or any other suitable content source. The result is that any content source may potentially be used as a karaoke soundtrack without any pre-modification.
The invention can be better understood with reference to the following drawings and description. The components in the Figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the Figures, like reference numerals designate corresponding parts throughout the different views.
A simplified karaoke system is provided where a singer sings along to pre-recorded music that already includes a vocal track. When the system is activated, the singer sings along to the music and the vocal track in the music is automatically attenuated whenever the person sings. As long as the person is singing, the automatic attenuation is invoked. If the person stops singing, the vocal track returns. In some cases, the system can give the impression that the singer is participating in a “duet” with the artist. The system also provides a method of teaching the lyrics to a song. While the person sings, the artist is quiet, stepping in to help only when the person cannot remember the words and falls silent.
In one embodiment, the system is envisioned as being implemented in an automobile setting. In this description the term “driver” refers to any person in the vehicle who is singing, whether the actual driver of the vehicle or anyone else in the vehicle. Although envisioned as being useful in an automobile setting, the system may be implemented in any other setting as well, and can be useful in a home or commercial environment as desired.
At step 203 the content playback begins. At decision block 204 the system determines if a live voice (non-content vocal source) is detected. For example, if the system is in a vehicle, the driver might be attempting to sing along with the content. In other embodiments, the driver and/or passengers may just be talking. The system checks at step 204 to determine if there is any vocal input from a non-content source.
If there is no detected non-content vocal source at step 204, the system simply continues with normal, non-attenuated playback at step 203, and continues checking for a non-content vocal source. If a non-content vocal source is detected at decision block 204, the system attenuates the vocal track of the pre-recorded content at step 205 and returns to step 203.
In one embodiment, the system only attenuates the pre-recorded vocal track when it detects a non-content vocal source. This means that between lines or verses of the pre-recorded content, when the driver isn't singing, the system returns to normal playback. This can assist a hesitant karaoke singer by playing the first word or words of the next line in a normal fashion if the driver/singer is not sure when to begin singing again, or what the words of the song are. This makes it easier for the driver/singer to follow along and to sing at the appropriate times.
In another embodiment, the system continuously provides attenuation throughout the duration of the pre-recorded song when it has detected a non-content vocal source, in the assumption that the driver/singer wishes to perform karaoke for the entirety of that content.
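By way of illustration only, the decision flow of steps 203 through 205 and the two embodiments above can be sketched as follows. The function name `vocal_gain_per_frame`, the `latch` flag, and the ducked gain of 0.2 are assumptions introduced for this sketch and do not appear in the description; the live-voice decisions are taken as given inputs.

```python
from typing import Iterable, List


def vocal_gain_per_frame(live_voice: Iterable[bool],
                         latch: bool = False,
                         ducked_gain: float = 0.2) -> List[float]:
    """Return the gain applied to the pre-recorded vocal track, frame by frame.

    live_voice  : one decision per frame from decision block 204
    latch       : False -> attenuate only while a live voice is detected
                  True  -> once detected, keep attenuating for the rest of the song
    ducked_gain : hypothetical attenuated gain (assumption, not from the source)
    """
    gains = []
    latched = False
    for detected in live_voice:
        if latch and detected:
            latched = True
        attenuate = detected or latched
        # Step 205 (attenuate) versus normal playback at step 203.
        gains.append(ducked_gain if attenuate else 1.0)
    return gains


# The singer is active in the two middle frames only.
print(vocal_gain_per_frame([False, True, True, False]))
print(vocal_gain_per_frame([False, True, True, False], latch=True))
```

With `latch=False` the vocal track returns to full level as soon as the singer stops, which corresponds to the first embodiment; with `latch=True` the attenuation persists, which corresponds to the second.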
Non-Content Vocal Source Detection
As noted above, the system actively attenuates the vocal track of a content source when the system detects a non-content vocal source. In one embodiment, the system accomplishes this by detecting vocal energy above a threshold level on a microphone (such as a microphone in a vehicle). When vocal energy above the threshold is detected, the system attenuates the pre-recorded vocal track.
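A minimal sketch of such a threshold test on one frame of microphone samples is given below. The mean-square energy measure, the threshold value of 0.01, and the names `frame_energy` and `voice_detected` are assumptions chosen for illustration; the description does not specify how energy is measured or what threshold is used.

```python
import math
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def voice_detected(frame: Sequence[float], threshold: float = 0.01) -> bool:
    """True when the frame energy exceeds the (hypothetical) threshold."""
    return frame_energy(frame) > threshold


# A near-silent frame and a frame holding a 200 Hz tone sampled at 8 kHz.
quiet = [0.01] * 160
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
print(voice_detected(quiet), voice_detected(loud))
```

On its own this test cannot tell the singer from music or noise reaching the microphone, which is the difficulty addressed in the paragraphs that follow.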
A microphone that is not directly in front of the person providing the non-content vocal source is called a “far-field” microphone. In other words, there is some distance between the singer and the microphone. In a vehicle for example, the microphone may be placed near the rear view mirror, or near a sun visor location. The use of a far-field microphone introduces particular energy detection problems. In particular, there are a number of audio energy sources in addition to the driver/singer that are detected by the microphone. For example, the pre-recorded music playing over the vehicle speakers is picked up by a far-field microphone at nearly the same energy as the would-be singer, making discrimination of the driver's voice and the pre-recorded music difficult. Discriminating between the signals using the power ratio is also difficult because the power ratio between the reference music and the microphone input can be significantly greater than or less than 1.0, so there is no set level of music expected on the microphone. A vehicle environment also includes a number of noise sources that are neither the singer nor the content. These noise sources include road and vehicle noise, wind noise, passenger chatter, cell phone ringing, climate control fans, and the like.
The system includes the ability to discriminate between sound sources so that a singer can be detected reliably and the operation of the system can be invoked appropriately. In one embodiment, the system uses a far-field echo canceller to remove the contribution of the music from the microphone channel and provide a reliable indicator of local voice presence to initiate attenuation of the song's vocal track.
The vocal track processor 107 also outputs the full music plus vocal signal 108 to Acoustic Echo Canceller (AEC) 109. The AEC 109 also receives input from cabin microphone 105. AEC 109 outputs a signal to node 110 that will modify (attenuate) the vocal signal 117 when a singer is detected so that the output 112 of summing node 111 will be the music signal 116 with attenuated vocal signal.
As can be seen at cabin 101, the microphone 105 receives sound signals from multiple sources, including speaker 104, singer 102, and noise 115 from noise sources 103. The speaker output 113 is an echo signal and the singer's output 114 is the non-content vocal source to be detected.
Operation of Acoustic Echo Canceller (AEC)
The Acoustic Echo Canceller (AEC) 109 determines when the driver 102 (or other passengers if the car cabin 101 contains multiple microphones) is vocally active. In a car cabin 101, the microphone 105 is typically housed in the rear-view mirror (or some other “distant” location) and is considered “far away” from the driver's mouth. The microphone signal, y 118, consists of three signals: (1) an echo signal 113, which is the processed reference signal, x 112, emitted by the loudspeaker 104; (2) local noise 115 from the car cabin 101; and (3) the driver/singer's voice 114. The AEC 109 compares the microphone signal 118 with the song's music signal 108 and determines if the driver 102 is vocally active during the song. In an acoustic echo cancellation system, this simultaneous vocal activity is referred to as “double talk” (DT). When active, the AEC 109 outputs signal 120 (which in one embodiment is 1-DT) to node 110. When double talk is detected, the combination of signal 120 with vocal signal 117 at node 110 results in attenuation of the vocal signal 117.
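The description does not specify the adaptive algorithm used by the AEC 109. As one possible illustration, the sketch below uses a normalized least-mean-squares (NLMS) filter, a common choice for echo cancellation, to subtract an estimate of the echo from the microphone signal; the energy left over then indicates local voice activity. The tap count, step size, simulated echo path, and all function names are assumptions made for this example, and a practical canceller would typically also freeze adaptation during double talk.

```python
import math
import random
from typing import List, Sequence, Tuple


def nlms_residual(x: Sequence[float], y: Sequence[float],
                  taps: int = 8, mu: float = 0.5,
                  eps: float = 1e-6) -> List[float]:
    """Subtract an adaptive estimate of the echo of x from y, sample by sample."""
    w = [0.0] * taps      # current estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = yn - echo_est                       # what the echo estimate cannot explain
        norm = sum(b * b for b in buf) + eps    # normalization by reference power
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual


def mean_energy(sig: Sequence[float]) -> float:
    return sum(s * s for s in sig) / len(sig) if sig else 0.0


def demo() -> Tuple[float, float]:
    """Residual energy without and with a local voice (simulated signals)."""
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(4000)]   # stand-in for the reference
    path = [0.6, 0.3, -0.1]                             # hypothetical cabin echo path
    echo = [sum(path[k] * x[n - k] for k in range(len(path)) if n >= k)
            for n in range(len(x))]
    # The simulated singer is silent for 3000 samples, then active for 1000.
    voice = [0.0] * 3000 + [0.5 * math.sin(2 * math.pi * 0.03 * n)
                            for n in range(1000)]
    y = [e + v for e, v in zip(echo, voice)]
    res = nlms_residual(x, y)
    return mean_energy(res[2000:3000]), mean_energy(res[3000:])


echo_only, with_voice = demo()
print(echo_only < with_voice)
```

In this simulation the residual is close to zero while only the echo is present and rises sharply once the simulated voice begins, so a threshold on residual energy can serve as a double-talk indicator even though the raw microphone energy is dominated by music in both cases.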
One aspect of the system is that it uses some of the AEC's analysis methods to attenuate the vocal track portion of the song. As the double talk level increases, the vocal track portion mixed into the reference signal, x, decreases, thereby “ducking” the song's vocals.
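The ducking at node 110 and the recombination at summing node 111 can be sketched as below, assuming the embodiment in which signal 120 equals 1-DT and assuming DT is expressed as a value between 0 and 1. The function name `duck_mix` and the clamping of DT are illustrative assumptions.

```python
from typing import List, Sequence


def duck_mix(music: Sequence[float], vocals: Sequence[float],
             dt: float) -> List[float]:
    """Mix music with vocals scaled by (1 - DT).

    dt is the double-talk level, clamped here to [0, 1] (assumption):
    0 leaves the vocals untouched, 1 removes them entirely.
    """
    dt = min(1.0, max(0.0, dt))
    gain = 1.0 - dt                      # signal 120 in the 1-DT embodiment
    return [m + gain * v for m, v in zip(music, vocals)]


music = [0.1, 0.2]
vocals = [0.4, 0.4]
print(duck_mix(music, vocals, 0.0))   # no singer: full vocals
print(duck_mix(music, vocals, 1.0))   # singer fully active: music only
```

A practical implementation would likely smooth the gain over time to avoid audible clicks when DT changes abruptly; that detail is not addressed in the description.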
Vocal Track Processing
For Karaoke purposes, a song can be considered to be composed of two components: instrumental music 116 and vocals 117. Vocal track processing provides a real-time method to separate, and subsequently attenuate, the vocal component from the music of any song material, thereby eliminating the need to use pre-processed audio material that has already separated the vocals from the rest of the instrumental music. Vocal track processing allows the system to accept any audio source, such as a decoded MP3 stream, radio (AM/FM/Satellite), CD, or any other content source as its input. By using generally available audio sources instead of special CDs (or other audio formats) that have had their vocal tracks removed, the system does not require recurring costs for purchasing new material and is not limited to the selection of special Karaoke source material.
There are a number of known ways to attenuate vocals from a song. For a stereo (2 channel) track, one simple method is to simply subtract one channel from the other. For example, if an original 2-channel stereo recording's vocals were panned to the center, then the difference between the left and right channels (e.g., L-R or R-L) can reduce the vocal component. A slightly more complicated method filters/equalizes the signals before subtraction so that instrumental music is not as likely to be mistakenly removed. More sophisticated methods analyze the song content more closely by decomposing the input signal into frequency bands and calculating various measures, including the coherence between the left and right channels, to help further isolate the vocal track from the instrumental music. The system can utilize any current or future system for vocal track removal.
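The simplest of these methods, channel subtraction, can be illustrated as follows. The 0.5 scaling factor and the function name `remove_center` are assumptions for this sketch; the toy signals place the vocal equally in both channels and an instrument in the left channel only.

```python
from typing import List, Sequence


def remove_center(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Return 0.5 * (L - R); content identical in both channels cancels out."""
    return [0.5 * (l - r) for l, r in zip(left, right)]


vocal = [0.3, -0.2, 0.1]       # panned to the center: present equally in L and R
guitar = [0.5, 0.5, -0.5]      # panned hard left: present in L only
left = [g + v for g, v in zip(guitar, vocal)]
right = list(vocal)

print(remove_center(left, right))   # approximately half the guitar, no vocal
```

The output is mono, and any instrument that is also panned to the center (often bass and kick drum) is removed along with the vocal, which is the motivation for the filtered and frequency-domain variants mentioned above.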
The application need not be karaoke; the system could simply be used to improve communication among people in a room. For example, a song could be played in a room, with the vocal track reduced any time someone talks so that conversation is easier. Once the person stops talking, the vocal track in the song returns to full level. Such a system could also improve in-car communication among vehicle occupants.
The illustrations have been discussed with reference to functional blocks identified as modules and components that are not intended to represent discrete structures and may be combined or further sub-divided. In addition, while various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that other embodiments and implementations are possible that are within the scope of this invention. Accordingly, the invention is not restricted except in light of the attached claims and their equivalents.
Number | Date | Country |
---|---|---|
20100107856 A1 | May 2010 | US |