1. Field of the Invention
The present teachings generally relate to methods and in-vehicle audio systems in which audio content can be added to or deleted from a storage device of the in-vehicle audio systems.
2. Discussion of Related Art
A speech recognition system uses one or more vocabulary dictionaries in order to phonetically match an utterance of a user. In some systems that include speech recognition, such as, for example, an in-vehicle audio system, audio content, such as music or other audio content, may be added to or deleted from the system. Each item of audio content may have a word or a phrase associated therewith. The word or the phrase may be a title of the item of audio content. A user may cause the in-vehicle audio system to play an item of audio content by speaking a command, which may include the title of the item of audio content. Thus, as items of audio content are added to and/or deleted from the in-vehicle audio system, the vocabulary dictionary of the speech recognition system becomes increasingly outdated unless the vocabulary dictionary is compiled again. However, compiling the vocabulary dictionary may take some time, during which a speech recognition feature of the in-vehicle audio system may not be available to the user.
This Summary is provided to introduce a selection of concepts in a simplified form that is further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
An in-vehicle audio system is provided which permits a user to operate the in-vehicle audio system by speaking a command. The in-vehicle audio system may include a speech recognition component and a storage device including a storage medium for storing audio content. A respective word or a respective phrase may be associated with each item of stored audio content. In some embodiments the audio content may include songs or musical pieces. The in-vehicle audio system may play one of the items of the audio content in response to the user uttering a command such as, for example, “play”, or other command, followed by the word or the phrase associated with the one of the items of the audio content.
Audio content may be copied or ripped from a storage medium, such as, for example, a compact disc (CD), a digital video disc (DVD) or another type of storage medium, to a medium of a storage device of the in-vehicle audio system. Further, audio content stored on the medium of the storage device may be deleted.
When audio content is to be added to the in-vehicle audio system, phonetics corresponding to words or phrases associated with the audio content to be added may be generated. The generated phonetics may be added to the vocabulary dictionary when the vocabulary dictionary is compiled, such that an utterance, including words or phrases corresponding to the generated phonetics, may be recognized by the in-vehicle audio system. In various embodiments, compiling of the vocabulary dictionary to add the generated phonetics may begin while the audio content is being added to the in-vehicle audio system and the compiling may be completed before the adding of the audio content to the in-vehicle audio system is completed.
In some embodiments, when the audio content is to be deleted from the in-vehicle audio system, the vocabulary dictionary may be updated by being compiled during a shutdown process of the in-vehicle audio system. In other embodiments, the vocabulary dictionary may be compiled shortly after determining that the audio content is to be deleted from the in-vehicle audio system.
In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description is provided below with reference to specific embodiments thereof which are illustrated in the appended drawings. With the understanding that these drawings depict only typical embodiments and are therefore not to be considered limiting in scope, implementations will be described and explained with additional specificity and detail through the use of the accompanying drawings.
Overview
An in-vehicle audio system may include a speech recognition component such that a user may operate the in-vehicle audio system by speaking a command. The in-vehicle audio system may include a storage device having a storage medium for storing audio content. Each item of the audio content may have a word or a phrase associated therewith. For example, in an embodiment in which the items of the audio content include songs or musical pieces, a word or a phrase associated with an item of the audio content may be a title of the item. The user may cause the in-vehicle audio system to play one of the items of the audio content by simply speaking a command such as, for example, “play”, or another verbal command, and the word or the phrase associated with the item.
The in-vehicle audio system may copy or rip audio content from a storage medium such as, for example, a compact disc (CD), a digital video disc (DVD), another type of optical medium, or another type of storage medium, to a medium of a storage device of the in-vehicle audio system. Further, audio content of the medium of the storage device may be deleted to make room for storing other audio content on the medium of the storage device.
When audio content is added to the in-vehicle audio system, words or phrases associated with the audio content to be added may be determined and corresponding phonetics may be generated. The generated phonetics may be added to the vocabulary dictionary when the vocabulary dictionary is compiled, such that an utterance, including words or phrases corresponding to the generated phonetics, may later be recognized by the speech recognition component. In various embodiments, compiling of the vocabulary dictionary to add the generated phonetics may begin while the audio content is being added to the in-vehicle audio system and the compiling may be completed before the adding of the audio content to the in-vehicle audio system is completed. Therefore, the speech recognition component of the in-vehicle audio system may be capable of recognizing words or phrases associated with added audio content when the adding of the audio content to the in-vehicle audio system is completed.
In some embodiments, when audio content is to be deleted from the in-vehicle audio system, the vocabulary dictionary may be updated by being compiled during a shutdown process of the in-vehicle audio system. The shutdown process may be initiated by detection of an occurrence of a particular event such as, for example, an ignition off event or other event. Therefore, in embodiments in which compiling of the vocabulary dictionary may be time-consuming, the vocabulary dictionary may be compiled during the shutdown process, thereby making unavailability of the speech recognition feature during the compiling less noticeable to the user.
In some embodiments, the vocabulary dictionary may be organized into a number of different portions. The portions may be arranged alphabetically by a word or a phrase associated with each item of audio content, by genre of an item of audio content, or by another type of arrangement. For example, if the items of audio content include music, the portions of the vocabulary dictionary may be arranged to correspond to respective genres of music such as, for example, classical, rock, jazz, pop, oldies, etc. As an example, phonetics corresponding to a word or a phrase associated with an item of audio content of the genre “rock” may be included in the portion of the vocabulary dictionary corresponding to the genre “rock”. When adding items of “rock” audio content to the in-vehicle audio system, phonetics corresponding to words or phrases associated with each of the items of the audio content may be added to the vocabulary dictionary by compiling only the portion of the vocabulary dictionary corresponding to the genre “rock”.
Similarly, when deleting one or more items of the “rock” audio content from the in-vehicle audio system, only the portion of the vocabulary dictionary corresponding to the genre “rock” may be compiled. When only one or more portions of the vocabulary dictionary are being compiled, a time for completing compiling is less than a time for compiling all of the vocabulary dictionary. In embodiments in which less than all of the vocabulary dictionary may be compiled, when one or more items of audio content are deleted from the in-vehicle audio system, the vocabulary dictionary may be compiled at approximately a time when the one or more items of audio content are deleted.
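The portion-wise compiling described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class name `PartitionedDictionary`, its methods, and the placeholder phonetic strings are all hypothetical, and "compiling" a portion is reduced to merging pending additions and deletions into that portion. The point shown is only that portions without pending changes are left untouched.

```python
from collections import defaultdict


class PartitionedDictionary:
    """Toy vocabulary dictionary split into per-genre portions (hypothetical)."""

    def __init__(self):
        self.compiled = defaultdict(dict)         # genre -> {title: phonetics}
        self.pending_adds = defaultdict(dict)     # changes waiting for a compile
        self.pending_deletes = defaultdict(set)
        self.compile_count = defaultdict(int)     # how often each portion was compiled

    def add(self, genre, title, phonetics):
        self.pending_adds[genre][title] = phonetics

    def delete(self, genre, title):
        self.pending_deletes[genre].add(title)

    def compile_changed(self):
        """Compile only the portions that have pending additions or deletions."""
        dirty = sorted(set(self.pending_adds) | set(self.pending_deletes))
        for genre in dirty:
            portion = self.compiled[genre]
            portion.update(self.pending_adds.pop(genre, {}))
            for title in self.pending_deletes.pop(genre, set()):
                portion.pop(title, None)
            self.compile_count[genre] += 1
        return dirty


d = PartitionedDictionary()
d.add("rock", "Song A", "placeholder-phonetics-a")
d.add("jazz", "Song B", "placeholder-phonetics-b")
first = d.compile_changed()    # both portions changed, so both are compiled
d.delete("rock", "Song A")
second = d.compile_changed()   # only the "rock" portion is compiled this time
```

In this sketch the second call touches only the "rock" portion, so the cost of a deletion scales with the size of one portion rather than with the whole dictionary.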
Exemplary Devices
Processor 102 may include one or more conventional processors that interpret and execute instructions stored in a medium, such as memory 104, a media card, a flash RAM, or other medium. A tangible storage medium may include a memory, a media card, a flash card, or other storage medium. Memory 104 may include random access memory (RAM) or another type of dynamic storage device, and read-only memory (ROM) or another type of static storage device, for storing information and instructions for execution by processor 102. RAM, or another type of dynamic storage device, may store instructions as well as temporary variables or other intermediate information used during execution of instructions by processor 102. ROM, or another type of static storage device, may store static information and instructions for processor 102.
Command input device 106 may include a microphone for speech input, one or more hard or soft buttons, a keyboard, a touchscreen, or other input device.
Storage device 108 may include a medium 110 for storing audio content, such as, for example, music or other audio content. In one embodiment, storage device 108 may be a hard disk drive and medium 110 may be a hard disk.
Audio output device 112 may include one or more speakers, a headset, or other sound reproducing device for outputting audio content.
Audio input device 114 permits audio content to be input to in-vehicle audio system 100. When operational, audio input device 114 may include a medium 116 that stores a representation of audio content. In one embodiment, audio input device 114 may include an optical medium reader such as, for example, a compact disc (CD) reader or a digital video disc (DVD) reader, and medium 116 may be a CD or a DVD, respectively.
Speech recognition component 118 may recognize speech input and may convert the recognized speech input to text. Speech recognition component 118 may include a vocabulary dictionary 120. Vocabulary dictionary 120 may include phonetics corresponding to commands and words or phrases. Each of the words or phrases may be associated with audio content. For example, when an item of audio content is music, a corresponding word or phrase, associated therewith, may be a title of the item of audio content. In some embodiments, speech recognition component 118 may include one or more software modules to be executed by processor 102.
Compiler 122 may compile at least a portion of vocabulary dictionary 120 in order to add or delete phonetics corresponding to a word or a phrase associated with audio content added to in-vehicle audio system 100 or audio content deleted from in-vehicle audio system 100, respectively.
As an example of fingerprinting, suppose that the medium is a CD and the audio content stored thereon includes items of music. Fingerprinting the CD may result in a determination that the CD has N items of music stored thereon with item 1 having a length of I1 followed by a pause of a length J1, item 2 having a length of I2 followed by a pause of a length J2, etc. After the fingerprinting, a database query may provide a fingerprint match for the CD (i.e., a database match for a CD with item 1 having a length of I1 followed by a pause of a length J1, item 2 having a length of I2 followed by a pause of a length J2, etc.). As a result of being provided the fingerprint match, information regarding contents of the CD may be provided, such as a respective word or a respective phrase associated with each item of music stored on the CD. Each of the respective words or the respective phrases may be a respective title of each of the items of music.
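The fingerprint-and-match step can be sketched in Python as follows. This is an illustration under stated assumptions, not the matching method of any real database service: the function names, the in-memory `DATABASE`, the titles, and the choice to round lengths to whole seconds and require an exact match are all hypothetical. A practical matcher would likely tolerate small timing differences.

```python
def fingerprint(tracks):
    """Build a hashable fingerprint from (item_length, pause_length) pairs in seconds.

    Lengths are rounded to whole seconds (an assumption made for this sketch).
    """
    return tuple((round(item), round(pause)) for item, pause in tracks)


# Hypothetical database: fingerprint -> title of each item, in order.
DATABASE = {
    fingerprint([(215, 2), (187, 2), (242, 0)]): [
        "First Title",
        "Second Title",
        "Third Title",
    ],
}


def lookup_titles(tracks, database=DATABASE):
    """Return the titles for a matching fingerprint, or None if there is no match."""
    return database.get(fingerprint(tracks))


# Measured lengths I1, J1, I2, J2, ... for a medium with N = 3 items.
measured = [(215.2, 2.1), (186.9, 1.8), (242.0, 0.0)]
titles = lookup_titles(measured)
unknown = lookup_titles([(100.0, 1.0)])
```

The returned titles are the words or phrases for which phonetics would subsequently be generated.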
A TTS engine may be used to produce phonetics corresponding to the respective word or the respective phrase associated with each of the items of music. The phonetics may be provided as input to compiler 122 when compiling vocabulary dictionary 120 (304). Eventually, compiler 122 completes compiling vocabulary dictionary 120 (306).
During a time in which vocabulary dictionary 120 is being compiled, speech recognition may be unavailable. Otherwise, while the items of music are being ripped, speech recognition may be available for in-vehicle audio system 100, but may not be available for the items of music being ripped. After ripping is completed (308), speech recognition may again be available for all audio content stored in in-vehicle audio system 100, including recently ripped audio content, such as the items of music.
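The overlap between ripping and compiling can be sketched with two Python threads. This is a simplified illustration with hypothetical names (`add_audio_content`, `to_phonetics`); copying a track and compiling the dictionary are reduced to trivial stand-ins. As a design assumption of this sketch, the ripping thread waits for the compile to finish before it reports completion, which is one way to guarantee that compiling is completed before adding is completed.

```python
import threading


def add_audio_content(titles, to_phonetics, dictionary):
    """Rip items and compile the dictionary concurrently; return the event log."""
    log = []
    lock = threading.Lock()
    compile_done = threading.Event()

    def record(message):
        with lock:
            log.append(message)

    def rip():
        for title in titles:
            record(f"ripped {title}")      # stand-in for copying one item
        compile_done.wait()                # do not finish before the compile does
        record("rip complete")

    def compile_dictionary():
        for title in titles:
            dictionary[title] = to_phonetics(title)   # stand-in for compiling
        record("compile complete")
        compile_done.set()

    threads = [
        threading.Thread(target=rip),
        threading.Thread(target=compile_dictionary),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


vocabulary = {}
# str.lower is a placeholder for a TTS engine that produces phonetics.
events = add_audio_content(["Song A", "Song B"], str.lower, vocabulary)
```

When the log reports that ripping is complete, every ripped title is already present in the dictionary, so the new items are recognizable as soon as they are playable.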
Although the above example refers to ripping items of music from a CD, in other embodiments, items of audio content, which may or may not include music, may be ripped from another type of medium, which may be fingerprinted and matched as described above.
In a variation of the embodiment described with respect to
In some embodiments, compiling vocabulary dictionary 120 may take a substantial amount of time, during which a speech recognition feature of in-vehicle audio system 100 may be unavailable.
After receiving a command to delete one or more items of audio content, phonetics corresponding to the one or more items of audio content may be provided to compiler 122 and vocabulary dictionary 120 may be compiled during a shutdown process, with the phonetics provided as input to compiler 122 (404). The shutdown process may begin (402) after detecting an occurrence of an event such as, for example, an ignition off event (i.e., turning off an ignition of a vehicle that includes in-vehicle audio system 100) (400). Compile process 404 may be completed (406) before an end of the shutdown process (408).
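A minimal Python sketch of deferring the compile to the shutdown process is shown below. The class `AudioSystem` and its methods are hypothetical, and compiling is reduced to removing the deleted titles from a plain dictionary. The numbers in the comments refer to the acts described above.

```python
class AudioSystem:
    """Toy model that defers dictionary compiling until shutdown (hypothetical)."""

    def __init__(self, dictionary):
        self.dictionary = dict(dictionary)   # title -> phonetics
        self.pending_deletions = []          # titles deleted since the last compile
        self.events = []

    def delete_item(self, title):
        # Record the deletion; speech recognition stays available for now.
        self.pending_deletions.append(title)

    def on_ignition_off(self):                        # event detected (400)
        self.events.append("shutdown begins")         # (402)
        if self.pending_deletions:
            for title in self.pending_deletions:      # compile (404)
                self.dictionary.pop(title, None)
            self.pending_deletions.clear()
            self.events.append("compile complete")    # (406)
        self.events.append("shutdown ends")           # (408)


system = AudioSystem({"Song A": "phonetics-a", "Song B": "phonetics-b"})
system.delete_item("Song A")
still_listed = "Song A" in system.dictionary   # True until the shutdown compile runs
system.on_ignition_off()
```

Until the shutdown compile runs, the dictionary in this sketch still contains the deleted title, which is the trade-off made for keeping speech recognition available while the vehicle is in use.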
In another embodiment, compiler 122 may compile only part of vocabulary dictionary 120, thereby shortening a duration of a compilation process of compiler 122. In such an embodiment, phonetics of vocabulary dictionary 120 may be organized in a specific manner. For example, the phonetics may be organized in alphabetical order (with respect to corresponding words or phrases associated with items of audio content), may be organized by category such as genre or other types of categories, or may be organized in a different manner. Vocabulary dictionary 120 may include a number of portions. As an example, if vocabulary dictionary 120 is organized alphabetically, then a first portion may include phonetics corresponding to words and phrases beginning with letters “a” through “d”, a second portion may include phonetics corresponding to words and phrases beginning with letters “e” through “h”, etc. In this embodiment, only those portions of vocabulary dictionary 120 that are changing may be compiled by compiler 122.
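As an illustration of the alphabetical arrangement, the following Python sketch maps a title to a portion index and collects the portions affected by a set of changes. The function names are hypothetical. The text above specifies only the first two portions (“a” through “d” and “e” through “h”); continuing in groups of four letters, and returning `None` for titles that do not begin with an ASCII letter, are assumptions of this sketch.

```python
import string


def portion_index(title, width=4):
    """Return 0 for a-d, 1 for e-h, and so on; None if no leading ASCII letter."""
    first = title.strip()[:1].lower()
    if not first or first not in string.ascii_lowercase:
        return None
    return (ord(first) - ord("a")) // width


def changed_portions(titles):
    """Return the sorted indices of the portions that need to be compiled."""
    indices = {portion_index(title) for title in titles}
    indices.discard(None)
    return sorted(indices)


affected = changed_portions(["Help", "Eleanor", "Abbey"])
```

Here three changed titles fall into only two portions, so only those two portions would be passed to the compiler.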
Exemplary Processes
If, during act 604, in-vehicle audio system 100 determines that the received command is a command only for adding audio content, then phonetics for a word or a phrase associated with each item of audio content to be added may be created or produced (act 606). Creating of the phonetics may include: fingerprinting a medium from which items of audio content are to be ripped or copied; finding, in a database, a match for the fingerprinted medium to provide a respective word or a respective phrase associated with each of the items of audio content to be ripped; and generating or producing, via a TTS engine, phonetics corresponding to the respective word or the respective phrase associated with each of the items of audio content to be ripped.
Next, audio content may begin to be added to medium 110 of storage device 108 of in-vehicle audio system 100 (act 608). Compiler 122 may then compile vocabulary dictionary 120 using the produced phonetics as input (act 610). In some embodiments, all of vocabulary dictionary 120 may be compiled and in other embodiments, only one or more portions of vocabulary dictionary 120 may be compiled. In-vehicle audio system 100 may then complete adding the audio content to medium 110 of storage device 108 (act 611). In-vehicle audio system 100 may then determine whether audio content is to be deleted (act 612). If audio content is to be deleted (as a result of receiving a command to delete the audio content from in-vehicle audio system 100), then in-vehicle audio system 100 may delete the audio content from medium 110 of storage device 108 (act 613). The process may then be completed.
If, during act 604, in-vehicle audio system 100 determines that the received command is not a command only to add audio content, then in-vehicle audio system 100 may determine whether the received command includes commands to add and delete audio content (act 614). If in-vehicle audio system 100 determines that the received command includes the commands to add and delete audio content, then in-vehicle audio system 100 may create or produce phonetics for one or more words or one or more phrases associated with one or more items of audio content to be deleted (act 616). Acts 608-613 may again be performed, as previously discussed. The process may then be completed.
If, during act 614, in-vehicle audio system 100 determines that the received command does not include commands to add and delete audio content, then the received command may be assumed to include only a command to delete audio content. Phonetics for at least one word or at least one phrase associated with one or more items of audio content to be deleted may be created or produced (act 618). Compiler 122 may then compile vocabulary dictionary 120 using the produced phonetics as input (act 620). Acts 612-613 may again be performed. The process may then be completed.
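The three branches of the process described above can be summarized in a short Python sketch that returns the sequence of acts for a given command. The function name `plan_acts` and its boolean parameters are hypothetical; the act numbers are those used in the description, and the sketch reflects one reading of it in which act 613 is performed only when the command includes a deletion.

```python
def plan_acts(add, delete):
    """Return the ordered act numbers for a command that adds and/or deletes content."""
    if not add and not delete:
        raise ValueError("command must add or delete audio content")
    acts = [604]                                   # is the command add-only?
    if add and not delete:
        # Produce phonetics, begin adding, compile, finish adding, check for deletion.
        acts += [606, 608, 610, 611, 612]
    elif add and delete:
        # Act 614 finds both commands; act 616 produces phonetics; then acts 608-613.
        acts += [614, 616, 608, 610, 611, 612, 613]
    else:
        # Delete only: produce phonetics, compile, then delete the content.
        acts += [614, 618, 620, 612, 613]
    return acts


add_only = plan_acts(add=True, delete=False)
add_and_delete = plan_acts(add=True, delete=True)
delete_only = plan_acts(add=False, delete=True)
```

Listing the branches side by side shows that compiling (act 610 or act 620) precedes the deletion itself (act 613) in every branch that deletes content.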
The exemplary process illustrated by the flowchart of
As previously discussed, in some embodiments, all of vocabulary dictionary 120 may be compiled when either adding or deleting audio content with respect to in-vehicle audio system 100. In such embodiments, when deleting audio content, compiler 122 may compile vocabulary dictionary 120 at a time when a user is unlikely to notice unavailability of a speech recognition feature. This may be accomplished by compiling vocabulary dictionary 120 during a shutdown process when audio content is to be deleted from in-vehicle audio system 100.
In-vehicle audio system 100 may then determine whether vocabulary dictionary 120 is to be compiled by compiler 122 during the shutdown process (act 806). Vocabulary dictionary 120 may be compiled during the shutdown process when a delete audio content command was previously received by in-vehicle audio system 100 and vocabulary dictionary 120 has not been compiled since the delete audio content command was received. If in-vehicle audio system 100 determines that vocabulary dictionary 120 is to be compiled, then in-vehicle audio system 100 compiles vocabulary dictionary 120 using previously produced phonetics as input (act 808). The previously produced phonetics may correspond to one or more words or one or more phrases associated with one or more items of audio content to be deleted, or already deleted. Eventually, compiler 122 of in-vehicle audio system 100 completes compiling of vocabulary dictionary 120 (act 810). Sometime after completing the compiling of vocabulary dictionary 120, the shutdown process may be completed (act 812) and the process illustrated by
If, during act 806, in-vehicle audio system 100 determines that compiler 122 is not to compile vocabulary dictionary 120 during the shutdown process, then vocabulary dictionary 120 will not be compiled by compiler 122 during the shutdown process. Eventually, in-vehicle audio system 100 will perform act 812, as previously discussed, and the process illustrated by the flowchart of
Conclusion
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms for implementing the claims.
Although the above descriptions may contain specific details, they are not to be construed as limiting the claims in any way. Other configurations of the described embodiments are part of the scope of this disclosure. In addition, acts illustrated by the flowcharts of
This application claims priority, pursuant to 35 U.S.C. §119(e), to U.S. Provisional Application 61/265,569, filed in the U.S. Patent and Trademark Office on Dec. 1, 2009, and specifically incorporated herein, in its entirety, by reference.
Number | Date | Country |
---|---|---|
20110131037 A1 | Jun 2011 | US |
Number | Date | Country |
---|---|---|
61265569 | Dec 2009 | US |