The present application relates generally to audio signal processing, and more specifically to systems and methods for assisting automatic speech recognition.
Speech recognition servers can receive and recognize voice input. Typically, speech recognition servers reside in a cloud-based computing resource and receive the input sent to them over a wired and/or wireless network(s) in real-time. Some mobile devices may have the user press a button to signal the mobile device to activate speech recognition. After the speech recognition is activated, the user can speak to the device. Various mobile devices allow the user to use a wakeup keyword to activate the speech recognition (e.g., “ok Google”) on the mobile device. In response to a command uttered by the user (e.g., “when is the next 49ers game?”), the user can expect a quick response.
Users sometimes have to utter commands in noisy conditions, such as when there are other voices in the background. In such conditions, the speech recognition (SR) engine may receive microphone input that includes speech from both the speaker (the user) and other speakers speaking in the background. Accordingly, the SR engine may not recognize the speech of the speaker accurately.
In particular, an issue may arise with some pre-processing algorithms using multiple microphones and taking time to adjust parameters to optimal values when a voice comes from a new direction. This can occur, for example, when a user changes his/her position relative to the device (for example, the user moves to a different part of a room relative to a tablet or a TV set, or changes his/her hand orientation while holding a cellphone). When a talker (speaker) first speaks from the new position, the processor/algorithm can adapt many of its internal parameters to account for this (for example, direction of arrival estimate, either explicitly or implicitly, noise estimates) and then settle on optimal parameters for the new orientation. During this transitional time, however, the processing is not optimal and may even degrade the speech signal. Thus, the beginning of the utterance can be distorted or, at best, the speech can be processed with less noise removed until the processor/algorithm settles on the optimal parameters.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
According to an aspect of the present disclosure, a system for assisting automatic speech recognition (ASR) is provided. According to an example embodiment, the system includes a buffer operable to receive sensor data. The sensor data can include at least one acoustic signal. The system may include a processor communicatively coupled to the buffer. The processor can be operable to store received sensor data in the buffer (creating stored sensor data, also referred to as buffered sensor data). In some embodiments, the processor is operable to analyze the received sensor data to produce new parameters associated with the sensor data. The processor may be further operable to process the stored/buffered sensor data based at least on the new parameters and provide at least the processed sensor data to an ASR system.
In some embodiments, the ASR system is operable to further process the processed sensor data at a speed faster than real time.
In various embodiments, the processor is further operable to replace the sensor data in the buffer with the next portion of the sensor data.
In some embodiments, the processor is further operable to provide the new parameters to an ASR system that is configured to analyze the processed sensor data based at least in part on the new parameters. In response to receiving a notification from the ASR system, the processor is further operable to send the sensor data to the ASR system for further processing based at least in part on the new parameters.
In some embodiments, the ASR system is located remotely. The processor can be communicatively coupled to the ASR system via a high-speed network.
According to various embodiments, the new parameters include one or more of the following: inter-microphone energy level differences, inter-microphone phase differences, acoustic signal energy, estimated pitch, and estimated saliency of the pitch.
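By way of illustration only, two of the listed parameters (inter-microphone energy level difference and inter-microphone phase difference) can be computed for one frame of a two-microphone signal as in the following minimal sketch. The function names, the frame length, and the test tone are hypothetical and are not taken from the disclosure; the single-bin DFT is evaluated directly so that no external library is assumed.

```python
import cmath
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def level_difference_db(primary, secondary, eps=1e-12):
    """Inter-microphone level difference in dB (positive: primary is louder)."""
    ratio = (frame_energy(primary) + eps) / (frame_energy(secondary) + eps)
    return 10.0 * math.log10(ratio)


def dft_bin(frame, k):
    """Single DFT bin k of a real frame, evaluated directly (no FFT library)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(frame))


def phase_difference(primary, secondary, k):
    """Inter-microphone phase difference in radians at DFT bin k."""
    return cmath.phase(dft_bin(primary, k) * dft_bin(secondary, k).conjugate())


# Hypothetical test frame: a tone at bin 4 that reaches the second
# microphone at half the amplitude and two samples later.
N, K = 64, 4
mic1 = [math.sin(2 * math.pi * K * i / N) for i in range(N)]
mic2 = [0.5 * math.sin(2 * math.pi * K * (i - 2) / N) for i in range(N)]

ild = level_difference_db(mic1, mic2)   # about 6 dB (energy ratio of 4)
ipd = phase_difference(mic1, mic2, K)   # about pi/4 rad (primary leads)
```

In this sketch a positive level difference and a positive phase difference both indicate that the source is closer to the primary microphone; how such values are actually used by the processor is not specified here.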
In certain embodiments, processing the sensor data includes separating a clean voice from noise in the acoustic signal and providing the processed sensor data, including the clean voice. The separating may include performing at least one of the following: noise suppression and noise reduction.
In some embodiments, the analyzing the sensor data includes determining a direction of arrival of the acoustic signal. According to various embodiments, the sensor data are provided by one or more of the following: a sound sensor configured to capture the at least one acoustic signal, a motion sensor, an environment sensor, a radio sensor, and a light sensor.
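As a hedged illustration of determining a direction of arrival, the sketch below estimates the delay between two microphone signals by cross-correlation and converts it to a far-field angle. This is one common textbook approach and is an assumption here, not necessarily the technique used by the embodiments; the function names, sample rate, microphone spacing, and test pulse are all hypothetical.

```python
import math


def best_lag(ref, other, max_lag):
    """Lag in samples at which `other` best matches `ref`.

    A positive result means `other` is a delayed copy of `ref`.
    """
    def corr(lag):
        return sum(ref[i] * other[i + lag]
                   for i in range(len(ref))
                   if 0 <= i + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle_deg(lag, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Far-field angle from broadside for a two-microphone pair."""
    s = (lag / sample_rate_hz) * speed_of_sound / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against rounding and spatial aliasing
    return math.degrees(math.asin(s))


# Hypothetical test signal: a short pulse that reaches the second
# microphone three samples after the first.
ref = [0.0] * 20 + [1.0, 0.8, -0.6, 0.4, -0.2] + [0.0] * 39
other = [0.0] * 3 + ref[:-3]

lag = best_lag(ref, other, max_lag=8)
angle = arrival_angle_deg(lag, sample_rate_hz=16000, mic_spacing_m=0.1)
```

With these assumed values the estimated lag is three samples, which corresponds to an angle of roughly 40 degrees from broadside.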
According to another aspect of the present technology, a method for assisting automatic speech recognition is provided. An example method includes receiving sensor data. The sensor data may include at least one acoustic signal. The method includes storing the sensor data in a buffer, according to exemplary embodiments. The method can allow processing the sensor data to produce new parameters. The method can include processing the sensor data based at least in part on the new parameters and providing the processed sensor data to an ASR system. In certain embodiments, the ASR system is operable to receive and further process the processed sensor data at a speed faster than real time.
According to another example embodiment of the present disclosure, the steps of the method for assisting automatic speech recognition are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
Other example embodiments of the disclosure and aspects will become apparent from the following description taken in conjunction with the following drawings.
Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
The present disclosure provides exemplary systems and methods for assisting automatic speech recognition. Embodiments of the present disclosure can be practiced on any mobile device configured to receive and/or provide audio, by way of example and not limitation, a media player, personal digital assistant, mobile telephone, smart phone, phablet, tablet computer, netbook computer, notebook computer, hand-held computing system, wearable computing system, other mobile computing system, and the like.
Mobile devices can include: radio frequency (RF) receivers, transmitters, and transceivers; wired and/or wireless telecommunications and/or networking devices; amplifiers; audio and/or video players; encoders; decoders; speakers; inputs; outputs; storage devices; user input devices. Mobile devices may include inputs such as buttons, switches, keys, keyboards, trackballs, sliders, touch screens, one or more microphones, gyroscopes, accelerometers, global positioning system (GPS) receivers, and the like. Mobile devices can include outputs, such as LED indicators, video displays, touchscreens, speakers, and the like. In some embodiments, mobile devices are hand-held devices, such as wired and/or wireless remote controls, notebook computers, tablet computers, phablets, smart phones, personal digital assistants, media players, mobile telephones, and the like. In other embodiments, mobile devices are wearable devices, such as smart watches, glasses, and the like.
According to exemplary embodiments, a method for assisting automatic speech recognition includes receiving sensor data. The sensor data may include at least one acoustic signal. The method can include storing the sensor data in a buffer. The method can further allow processing the sensor data to produce new parameters. The method includes processing the sensor data based at least in part on the new parameters and providing the processed sensor data to an ASR system, according to some embodiments. The ASR system may be operable to receive and process the processed sensor data at a speed faster than real-time.
One or more sensors 210 may include acoustic and/or non-acoustic sensor(s) and in that regard may be at least one of a sound sensor, motion and/or orientation (inertial) sensor, environmental sensor, radio, and the like. Sound sensors include, for example, transducers, such as acoustic-to-electric transducers (for example, microphones) that convert sound into an electrical signal. Sound sensors can sense speech, music, ambient sounds, and the like in acoustic environment 100 (
Motion and/or orientation (inertial) sensors include, for example, magnetometers, accelerometers, gyroscopes, and the like. A magnetometer, such as a compass, measures the strength and/or direction of a magnetic field, in order to determine a direction in a frame of reference (for example, north, south, east, and west). Accelerometers measure acceleration along one or more axes, where the axes are, for example, mutually perpendicular to each other. Gyroscopes (for example, micro-electro-mechanical systems (MEMS) gyroscopes) measure rotational movement.
Environmental sensors include, for example, thermometers (that measure an ambient temperature and/or temperature gradient), hygrometers (that measure humidity), pressure sensors (that measure an altitude), and photosensors and photodetectors (for example, cameras, ambient light sensors, and infrared (IR) sensors). Cameras are, for example, charge-coupled device (CCD) image sensors, active-pixel sensor (APS), and the like. Ambient light sensors are, for example, photodiodes, photoresistors, phototransistors, and the like. IR detectors may be thermal and/or photonic (for example, photodiodes).
Radios include, for example, Global Positioning System (GPS) receivers, mobile telephone radios, Wi-Fi devices, Bluetooth radios, and the like. Mobile telephone radios are associated with one or more mobile phone cellular networks. Wi-Fi devices, for example, are based on Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards. Bluetooth radios, for example, correspond to Bluetooth standards overseen by the Bluetooth Special Interest Group (SIG).
Although various sensors/emitters are described in relation to
Processor 220 carries out the instructions of a computer program, for example, by performing arithmetical, logical, and input/output operations. Processor 220 is illustrated by processor 510 and described further in relation to
Buffer 230 buffers/stores sensor data received from one or more sensors 210. Buffer 230 may, for example, be one or more of static random access memory (SRAM), First In First Out (FIFO) memory, one or more delay elements, one or more shift registers, and the like. Although depicted as integrated in processor 220, buffer 230 may alternatively be external to processor 220.
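A FIFO arrangement of the kind mentioned above can be sketched in software as follows. This is a minimal illustration under stated assumptions: the class name `SensorBuffer`, its methods, and the capacity are hypothetical, and a hardware buffer (for example, SRAM or shift registers) would be realized differently.

```python
from collections import deque


class SensorBuffer:
    """Fixed-capacity FIFO of sensor frames; the oldest frame is dropped when full."""

    def __init__(self, capacity_frames):
        self._frames = deque(maxlen=capacity_frames)

    def push(self, frame):
        """Store one received frame of sensor data."""
        self._frames.append(frame)

    def snapshot(self):
        """Return the buffered frames, oldest first, without removing them."""
        return list(self._frames)

    def drain(self):
        """Remove and return all buffered frames, oldest first."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


buf = SensorBuffer(capacity_frames=3)
for frame in ([1, 2], [3, 4], [5, 6], [7, 8]):
    buf.push(frame)
```

Here `snapshot` supports reprocessing the stored data in place, while `drain` models handing the buffered data onward and making room for the next portion of sensor data.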
ASR 250 and 280 translate (a representation of) spoken words into text, for example, to provide a user voice interface. ASR 250 may be external to mobile device 110 (for example, in cloud-based computing resources (the “cloud”)) as illustrated by
In
In
In some embodiments, processor 220 provides the noise-reduced representation of speech from user 120 to ASR 280 (
In block 320, the received sensor data is buffered (for example, stored in buffer 230 in
In block 330, the received sensor data is processed (for example, by processor 220 in
In block 340, the buffered sensor data is processed (for example, by processor 220 in
In block 350, (a representation of) the noise-reduced signal is provided to ASR 250 and/or 280 (
An automatic speech recognition assistance (ASRA) in mobile devices performs noise suppression and detects features in the incoming microphone input in real time, concurrently passing the stream at a real-time rate to the SR engine. A constraint on such ASRA is that it must determine what is noise and what is real user input based only on the past and present data that comes from the microphone. Because ASRA passes the “cleaned” microphone input to the server in real time, it cannot use input that the user has not yet spoken (“future input”) to perform noise suppression on the input it is currently passing.
Some embodiments include, for example, a device having one or more microphones and a digital signal processor (DSP) (or other processor) with memory, and a high-speed interface to a speech recognition server (the interface is optionally capable of bursting microphone input at a speed that is typically faster than real-time speed). The speech recognition resides in the network (for example, in the cloud), or is integrated on the device and runs on the same or an additional processor.
When the processor (for example, a DSP) processes spoken user commands (for example, after a trigger is identified that signifies the user wants to interact with an SR engine, the trigger preceding, by a few seconds, a spoken command that the SR engine should process), the processor buffers a certain amount (for example, up to a few seconds) of microphone input and analyzes the input to train its algorithm to recognize features of the input. In this way, the processor can improve ASRA processing: it applies the algorithm, using the updated features, to the original input and provides “cleaned” output to the SR engine. By way of example and not limitation, the processor (ASRA algorithm) can train itself on the voice of the real user after processing some amount (such as a few seconds) of the microphone input, and then apply a better mask to the voice distractors starting from the beginning of the recording.
To make up the “lost time” in passing the microphone data to the SR engine, the processor then sends the accumulated “cleaned” output to the SR server in a burst over a fast network interface. After the initial buffered microphone data is processed (for example, beyond the few seconds of data stored in the buffer), the processor continues to clean the microphone data in real time as the user speaks.
The above steps may be performed while the user is speaking to the device and the latency that was accumulated is transparent to the user, because the SR engine can process the clean microphone input faster than real-time speeds.
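The catch-up behavior described above can be made concrete with a short calculation. The sketch assumes a simple model that is not stated in the disclosure: new audio keeps arriving at real-time rate while the SR engine consumes cleaned audio at a constant multiple of real time, so a backlog of B seconds is drained after B / (speedup - 1) seconds. The function name and the example numbers are hypothetical.

```python
def catch_up_seconds(buffered_seconds, speedup):
    """Time needed to drain a backlog while new audio keeps arriving.

    buffered_seconds: audio accumulated before processing started.
    speedup: consumption rate as a multiple of real time (must exceed 1).
    """
    if speedup <= 1.0:
        raise ValueError("the consumer must run faster than real time to catch up")
    # Backlog grows by 1 s of audio per second and shrinks by `speedup`
    # seconds of audio per second, so it closes at (speedup - 1) per second.
    return buffered_seconds / (speedup - 1.0)


# Hypothetical example: 2 s of buffered audio, engine runs at 5x real time.
delay = catch_up_seconds(2.0, speedup=5.0)  # 0.5 s
```

Under this model, if the user speaks for longer than the catch-up time after buffering ends, the accumulated latency has been absorbed by the time the utterance is complete.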
In some embodiments, the SR engine is, for example, an embedded engine that resides on the (mobile) device (rather than in the cloud), and may be implemented on the main processor, so the DSP can provide the processed data to a shared memory on the (mobile) device that is accessible by the main processor. In this case, the DSP can notify the SR engine when to start processing the input (for example, after the buffering, training, and processing are done), and bursting (explained above) does not necessarily occur over the network, but can simply be local to the device and implemented by the SR engine processing the cleaned input buffer from the shared memory at faster than real-time speeds.
Various embodiments of the present technology are extensible to more than one microphone, and to buffering input from sensors in addition to the microphones, for use by ASRA processes that employ a buffer of the sensors' input. Buffering of multiple inputs by a DSP is important because algorithms can benefit from insight into the future (by nature of this buffered input). ASRA can then infer contextual information from multiple sensors and apply it to the original input (where a microphone is one type of input sensor). Thus, input from multiple microphones, as well as from multiple motion sensors, can be buffered. A benefit is that the mobile device typically has a larger connectivity set, so the ASRA can make better decisions with this insight. In addition, the SR engine can operate with only one input, which increases the importance of feeding it as clean an input as possible.
According to some embodiments, sensors include any of microphones, motion sensors, heat sensors, light sensors, and even radio sensors.
One of the benefits provided by embodiments of the present technology (for example, buffering sensor inputs in mobile devices, and processing the buffered inputs with SR assistance algorithms to provide insight into the future, and then processing the buffered input before sending it to the SR engine) follows from the fact that the SR engine has only one input. Cleaning this input, in such a way that the latency accumulated by the buffering is hidden from the end user, is therefore critical to the end-user experience. In addition, better accuracy from the SR engine will result. Since the SR engine can “pump” the data from the buffer much faster than real time, no discernible latency is added to the user experience in getting a response after he/she has finished saying the command.
In various embodiments, a user will wake up a mobile device (for example, smartphone, tablet, computer, or any other consumer electronic device) with a trigger (for example, a voice command, other sensor input, and combinations thereof) and then speak a verbal command/question. The processor (for example, DSP) may be tasked with both detecting the trigger and buffering the microphone and/or sensor input, which can start before the main processor is fully powered up. The buffered input of the command should be processed by the DSP before the main processor is even able to send the entire command to the SR engine. Various embodiments use buffering so that the processing may train itself, look into the “future,” and clean the buffered data, and then start passing the “cleaned” output to the SR engine in a burst as soon as the main processor has woken up and established a link to the SR engine.
Some embodiments provide a method for improving the performance of multi-microphone noise reduction for speech recognition.
For the voice signal processing chain, there can be tight latency constraints that have to be achieved which restrict the amount of buffering or reprocessing that can be done. However, for speech recognition, the latency constraints are different. The user is interested in the amount of time after the end of an utterance that the recognition result comes back. This allows for flexibility to buffer and reprocess the incoming speech. So, some embodiments process the incoming speech and at the same time buffer the microphone streams. Once the adaptive parameters have settled to their optimal values (for example, determined either by a fixed amount of time or by internal measures within the algorithm) then the processor/algorithm can go back and start processing the streams from the beginning. By doing so in this example, the initial part of the utterance now gets processed by the algorithm with its best parameter settings.
For example:
In this way, in this example, the beginning part of the utterance is processed using the algorithm's best parameter settings and is no longer damaged during the period of parameter adaptation.
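The buffered reprocessing described above can be sketched with a deliberately simplified model. Everything in the sketch is a hypothetical stand-in rather than the disclosed algorithm: each frame is reduced to a pair of levels (mixture and noise reference), the single adaptive parameter is an exponentially smoothed noise estimate, "noise reduction" is a plain subtraction, and the parameter is treated as settled after a fixed number of frames.

```python
def adapt(estimate, observation, rate=0.5):
    """One update of a toy adaptive parameter (exponential smoothing)."""
    return estimate + rate * (observation - estimate)


def suppress(mixture, noise_estimate):
    """Toy noise reduction: subtract the noise estimate, floored at zero."""
    return max(0.0, mixture - noise_estimate)


def first_pass_and_reprocess(frames, settle_frames):
    """Process frames while adapting, then reprocess them once settled.

    frames: list of (mixture_level, noise_reference_level) pairs.
    Returns (first_pass, reprocessed) outputs for the first settle_frames frames.
    """
    buffered = []
    estimate = 0.0
    first_pass = []
    for mixture, noise_ref in frames[:settle_frames]:
        buffered.append(mixture)                 # keep raw input for later
        estimate = adapt(estimate, noise_ref)    # parameter is still adapting
        first_pass.append(suppress(mixture, estimate))
    # Go back to the start of the buffer and apply the settled parameter.
    reprocessed = [suppress(mixture, estimate) for mixture in buffered]
    return first_pass, reprocessed


# Hypothetical input: speech level 2.0 plus noise level 1.0 in every frame.
frames = [(3.0, 1.0)] * 8
first_pass, reprocessed = first_pass_and_reprocess(frames, settle_frames=8)
```

In this toy run the first frame of the first pass still contains half of the noise, whereas the reprocessed first frame is within about 0.004 of the clean speech level, which mirrors the point that the beginning of the utterance benefits most from reprocessing.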
For speech recognition applications, there is the flexibility to go back and reprocess the utterance because there is no constraint to produce an output in real time. Nevertheless, there may be constraints on the length of the buffer so that the total response time of the system, including the recognizer itself, does not become too long. By using a processor that runs the algorithm faster than real time, it is possible for the reprocessing of the buffered speech to catch up, so the latency impact of the reprocessing on the overall system responsiveness can be minimized, according to some embodiments.
Many multi-microphone processing techniques have been focused on noise reduction for the voice channel. In some situations, the flexibility to do this buffered reprocessing does not exist because of the latency and real-time constraints. Different latency requirements for speech recognition, for example, make the various approaches of the present technology to buffered reprocessing feasible.
The receiver 410 can be configured to communicate with a network such as the Internet, Wide Area Network (WAN), Local Area Network (LAN), cellular network, and so forth, to receive an audio data stream, which may comprise one or more channels of audio data. The received audio data stream may then be forwarded to the audio processing system 440 and the output device 450.
The processor 420 may include hardware and software that implement the processing of audio data and various other operations depending on a type of the system 400 (for example, communication device and computer). A memory (for example, non-transitory computer readable storage medium) may store, at least in part, instructions and data for execution by processor 420.
The audio processing system 440 may include hardware and software that implement the methods according to various embodiments disclosed herein. The audio processing system 440 is further configured to receive acoustic signals from an acoustic source via acoustic sensors (e.g., microphone(s)) 430 and process the acoustic signals. After reception by the acoustic sensors (e.g., microphone(s)) 430, the acoustic signals may be converted into electric signals by an analog-to-digital converter.
In some embodiments, the acoustic sensors (e.g., microphone(s)) 430 are spaced a distance apart (for example, at top and bottom of the mobile device 110 (
In some embodiments, the audio processing system 440 is configured to carry out noise suppression and/or noise reduction based on inter-microphone level difference, level salience, pitch salience, signal type classification, speaker identification, and so forth. An example audio processing system suitable for performing noise reduction is discussed in more detail in U.S. patent application Ser. No. 12/832,901, titled “Method for Jointly Optimizing Noise Reduction and Voice Quality in a Mono or Multi-Microphone System,” filed on Jul. 8, 2010, now U.S. Pat. No. 8,473,287, issued Jun. 25, 2013, the disclosure of which is incorporated herein by reference for all purposes. By way of example and not limitation, noise reduction methods are described in U.S. patent application Ser. No. 12/215,980, entitled “System and Method for Providing Noise Suppression Utilizing Null Processing Noise Subtraction,” filed Jun. 30, 2008, and in U.S. patent application Ser. No. 11/699,732, entitled “System and Method for Utilizing Omni-Directional Microphones for Speech Enhancement,” filed Jan. 29, 2007, now U.S. Pat. No. 8,194,880, issued Jun. 5, 2012, which are incorporated herein by reference in their entireties.
The output device 450 is any device that provides an audio output to a listener (for example, the acoustic source). For example, the output device 450 may comprise a speaker, a class-D output, an earpiece of a headset, or a handset on the system 400.
The components shown in
Mass data storage 530, which can be implemented with a magnetic disk drive, solid state drive, or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit(s) 510. Mass data storage 530 stores the system software for implementing embodiments of the present disclosure for purposes of loading that software into main memory 520.
Portable storage device 540 operates in conjunction with a portable non-volatile storage medium, such as a flash drive, floppy disk, compact disc, digital video disc, or Universal Serial Bus (USB) storage device, to input and output data and code to and from the computer system 500 of
User input devices 560 can provide a portion of a user interface. User input devices 560 may include one or more microphones, an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. User input devices 560 can also include a touchscreen. Additionally, the computer system 500 as shown in
Graphics display system 570 includes a liquid crystal display (LCD) or other suitable display device. Graphics display system 570 is configurable to receive textual and graphical information and process the information for output to the display device.
Peripheral devices 580 may include any type of computer support device to add additional functionality to the computer system 500.
The components provided in the computer system 500 of
The processing for various embodiments may be implemented in software that is cloud-based. In some embodiments, the computer system 500 is implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud. In other embodiments, the computer system 500 may itself include a cloud-based computing environment, where the functionalities of the computer system 500 are executed in a distributed fashion. Thus, the computer system 500, when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices. Systems that provide cloud-based resources may be utilized exclusively by their owners or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
The cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as the computer system 500, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (for example, cloud resource customers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.
The present technology is described above with reference to example embodiments. Therefore, other variations upon the example embodiments are intended to be covered by the present disclosure.
The present application claims the benefit of U.S. Provisional Application No. 61/971,793, filed on Mar. 28, 2014. The subject matter of the aforementioned application is incorporated herein by reference for all purposes.
Number | Name | Date | Kind |
---|---|---|---|
3946157 | Dreyfus | Mar 1976 | A |
4797924 | Schnars et al. | Jan 1989 | A |
4813076 | Miller | Mar 1989 | A |
5054085 | Meisel et al. | Oct 1991 | A |
5214707 | Fujimoto et al. | May 1993 | A |
5340316 | Javkin et al. | Aug 1994 | A |
5640490 | Hansen et al. | Jun 1997 | A |
5787414 | Miike et al. | Jul 1998 | A |
6018708 | Dahan et al. | Jan 2000 | A |
6067517 | Bahl et al. | May 2000 | A |
6757652 | Lund et al. | Jun 2004 | B1 |
6954745 | Rajan | Oct 2005 | B2 |
7016836 | Yoda | Mar 2006 | B1 |
7219063 | Schalk et al. | May 2007 | B2 |
7319959 | Watts | Jan 2008 | B1 |
7698133 | Ichikawa | Apr 2010 | B2 |
8194880 | Avendano | Jun 2012 | B2 |
8275616 | Jung et al. | Sep 2012 | B2 |
8345890 | Avendano et al. | Jan 2013 | B2 |
8355511 | Klein | Jan 2013 | B2 |
8405532 | Clark et al. | Mar 2013 | B1 |
8447596 | Avendano et al. | May 2013 | B2 |
8473287 | Every et al. | Jun 2013 | B2 |
8538035 | Every et al. | Sep 2013 | B2 |
8543399 | Jeong et al. | Sep 2013 | B2 |
8712776 | Bellegarda et al. | Apr 2014 | B2 |
8718299 | Nishimura et al. | May 2014 | B2 |
8880396 | Laroche et al. | Nov 2014 | B1 |
8903721 | Cowan | Dec 2014 | B1 |
8938394 | Faaborg et al. | Jan 2015 | B1 |
9143851 | Schober | Sep 2015 | B2 |
9185487 | Solbach et al. | Nov 2015 | B2 |
9240182 | Lee et al. | Jan 2016 | B2 |
20020036624 | Ohta et al. | Mar 2002 | A1 |
20020041678 | Basburg-Ertem et al. | Apr 2002 | A1 |
20020097884 | Cairns | Jul 2002 | A1 |
20020138265 | Stevens et al. | Sep 2002 | A1 |
20030069727 | Krasny et al. | Apr 2003 | A1 |
20030161097 | Le et al. | Aug 2003 | A1 |
20030179888 | Burnett et al. | Sep 2003 | A1 |
20040029622 | Laroia et al. | Feb 2004 | A1 |
20040076190 | Goel et al. | Apr 2004 | A1 |
20040114772 | Zlotnick | Jun 2004 | A1 |
20050060155 | Chu et al. | Mar 2005 | A1 |
20050159945 | Otsuka et al. | Jul 2005 | A1 |
20050171851 | Applebaum et al. | Aug 2005 | A1 |
20060074658 | Chadha | Apr 2006 | A1 |
20060074686 | Vignoli | Apr 2006 | A1 |
20060092918 | Talalai | May 2006 | A1 |
20060100876 | Nishizaki et al. | May 2006 | A1 |
20070064817 | Dunne et al. | Mar 2007 | A1 |
20070073536 | Clark et al. | Mar 2007 | A1 |
20070081636 | Shaffer et al. | Apr 2007 | A1 |
20070154031 | Avendano et al. | Jul 2007 | A1 |
20070256027 | Daude | Nov 2007 | A1 |
20070262863 | Aritsuka et al. | Nov 2007 | A1 |
20080004875 | Chengalvarayan et al. | Jan 2008 | A1 |
20080010057 | Chengalvarayan et al. | Jan 2008 | A1 |
20080019548 | Avendano | Jan 2008 | A1 |
20080071547 | Prieto et al. | Mar 2008 | A1 |
20080140479 | Mello et al. | Jun 2008 | A1 |
20080157129 | Hsu et al. | Jul 2008 | A1 |
20080195389 | Zhang et al. | Aug 2008 | A1 |
20090024392 | Koshinaka | Jan 2009 | A1 |
20090083034 | Hernandez et al. | Mar 2009 | A1 |
20090125311 | Haulick et al. | May 2009 | A1 |
20090146848 | Ghassabian | Jun 2009 | A1 |
20090192795 | Cech | Jul 2009 | A1 |
20090220107 | Every et al. | Sep 2009 | A1 |
20090235312 | Morad et al. | Sep 2009 | A1 |
20090238373 | Klein | Sep 2009 | A1 |
20090254351 | Shin et al. | Oct 2009 | A1 |
20090270141 | Sassi | Oct 2009 | A1 |
20090323982 | Solbach et al. | Dec 2009 | A1 |
20100082346 | Rogers et al. | Apr 2010 | A1 |
20100082349 | Bellegarda et al. | Apr 2010 | A1 |
20100121629 | Cohen | May 2010 | A1 |
20100204987 | Miyauchi | Aug 2010 | A1 |
20100305807 | Basir et al. | Dec 2010 | A1 |
20100312547 | Van Os et al. | Dec 2010 | A1 |
20100324894 | Potkonjak | Dec 2010 | A1 |
20110029359 | Roeding et al. | Feb 2011 | A1 |
20110145000 | Hoepken et al. | Jun 2011 | A1 |
20110218805 | Washio et al. | Sep 2011 | A1 |
20110255709 | Nishimura et al. | Oct 2011 | A1 |
20110257967 | Every et al. | Oct 2011 | A1 |
20110275348 | Clark et al. | Nov 2011 | A1 |
20110293102 | Kitazawa et al. | Dec 2011 | A1 |
20110293103 | Park et al. | Dec 2011 | A1 |
20120010881 | Avendano et al. | Jan 2012 | A1 |
20120027218 | Every et al. | Feb 2012 | A1 |
20120166184 | Locker et al. | Jun 2012 | A1 |
20120224456 | Visser et al. | Sep 2012 | A1 |
20130097437 | Naveh et al. | Apr 2013 | A9 |
20130211828 | Gratke et al. | Aug 2013 | A1 |
20130260727 | Knudson et al. | Oct 2013 | A1 |
20130332156 | Tackin et al. | Dec 2013 | A1 |
20140006825 | Shenhav | Jan 2014 | A1 |
20140025379 | Ganapathiraju et al. | Jan 2014 | A1 |
20140114665 | Murgia | Apr 2014 | A1 |
20140244273 | Laroche | Aug 2014 | A1 |
20140274203 | Ganong et al. | Sep 2014 | A1 |
20140278435 | Ganong et al. | Sep 2014 | A1 |
20140316783 | Medina | Oct 2014 | A1 |
20140348345 | Furst et al. | Nov 2014 | A1 |
20150031416 | Labowicz et al. | Jan 2015 | A1 |
20150193841 | Bernard | Jul 2015 | A1 |
20160077574 | Bansal et al. | Mar 2016 | A1 |
Number | Date | Country |
---|---|---|
104247280 | Dec 2014 | CN |
2962403 | Jan 2016 | EP |
1020150121038 | Oct 2015 | KR |
WO2013150325 | Oct 2013 | WO |
WO2014063104 | Apr 2014 | WO |
WO2014134216 | Sep 2014 | WO |
WO2014172167 | Oct 2014 | WO |
WO2015103606 | Jul 2015 | WO |
Entry |
---|
International Search Report & Written Opinion dated Jun. 26, 2014 in Patent Cooperation Treaty Application No. PCT/US2014/018780, filed Feb. 26, 2014. |
International Search Report & Written Opinion dated May 1, 2015 in Patent Cooperation Treaty Application No. PCT/US2015/010312, filed Jan. 6, 2015, 12 pp. |
International Search Report & Written Opinion dated Apr. 29, 2014 in Patent Cooperation Treaty Application No. PCT/US2013/065765, filed Oct. 18, 2013. |
International Search Report & Written Opinion dated Sep. 11, 2014 in Patent Cooperation Treaty Application No. PCT/US2014/033559, filed Apr. 9, 2014. |
Hinton, G. et al., “Deep Neural Networks for Acoustic Modeling in Speech Recognition”, IEEE Signal Processing Magazine, Nov. 2012, pp. 82-97. |
Laroche, Jean et al., “Noise Suppression Assisted Automatic Speech Recognition”, U.S. Appl. No. 12/962,519, filed Dec. 7, 2010. |
Medina, Eitan Asher, “Cloud-Based Speech and Noise Processing”, U.S. Appl. No. 61/826,915, filed May 23, 2013. |
Laroche, Jean et al., “Adapting a Text-Derived Model for Voice Sensing and Keyword Detection”, U.S. Appl. No. 61/836,977, filed Jun. 19, 2013. |
Santos, Peter et al., “Voice Sensing and Keyword Analysis”, U.S. Appl. No. 61/826,900, filed May 23, 2013. |
Murgia, Carlo, “Continuous Voice Sensing”, U.S. Appl. No. 61/881,868, filed Sep. 24, 2013. |
Number | Date | Country | |
---|---|---|---|
61971793 | Mar 2014 | US |