Speech recognition technology allows a user of a computing device to make inputs via speech commands, rather than via a keyboard or other peripheral input device. One difficulty shared by different speech recognition systems is discerning intended speech inputs from other received sounds, including but not limited to background noise, background speech, and speech from a current system user that is not intended to be an input.
Various methods have been proposed to discern intended speech inputs from other sounds. For example, some speech input systems require a user to say a specific command, such as “start listening,” before any speech will be accepted and analyzed as an input. However, such systems may still be susceptible to background noise that randomly matches recognized speech patterns and that therefore may be interpreted as input. Such “false positives” may result in a speech recognition system performing actions not intended by a user, or performing actions even when no users are present.
Accordingly, various embodiments are disclosed herein that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. For example, one disclosed embodiment provides a method of operating a speech recognition input system. The method comprises receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from a microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational data related to a location of each person located in a field of view of an image sensor. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor. The method further comprises adjusting the confidence data based upon whether the recognized speech segment is determined to have originated from a person in the field of view of the image sensor.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
The present disclosure is directed to avoiding false positive speech recognitions in a speech recognition input system. Further, the disclosed embodiments also may help to ensure that a speech recognition event originated from a desired user in situations where there are multiple users in the speech recognition system environment. For example, where a plurality of users are playing a game show-themed video game and the game requests a specific person to answer a specific question, the disclosed embodiments may help to block answers called by other users. It will be understood that speech recognition input systems may be used to enable speech inputs for any suitable device. Examples include, but are not limited to, interactive entertainment systems such as video game consoles, digital video recorders, digital televisions and other media players, and devices that combine two or more of these functionalities.
Entertainment system 10 further comprises an input device 100 having a depth-sensing camera and a microphone array. The depth-sensing camera may be used to visually monitor one or more users of entertainment system 10, and the microphone array may be used to receive speech commands made by the players. The use of a microphone array, rather than a single microphone, allows information regarding the location of a source of a sound (e.g. a player speaking) to be determined from the audio data.
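The following is a minimal sketch of how a source direction might be derived from a pair of microphones in an array. It assumes a far-field source and a time-difference-of-arrival approach, neither of which is specified in this disclosure; the function name, the speed-of-sound constant, and the parameters are hypothetical and chosen for illustration only.

```python
import math

# Assumption: approximate speed of sound in air at room temperature.
SPEED_OF_SOUND_M_S = 343.0


def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the bearing of a far-field sound source from a microphone pair.

    delay_s is the difference in arrival time between the two microphones and
    mic_spacing_m is the distance between them. An angle of 0 degrees means the
    source is directly in front of the pair; +/-90 degrees means it lies along
    the line joining the microphones.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push the value outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A single microphone provides no such delay, which is why an array is needed to recover directional information from audio alone.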
The data acquired by input device 100 allows a player to make inputs without the use of a hand-held controller or other remote device. Instead, speech inputs, movements, and/or combinations thereof may be interpreted by entertainment system 10 as controls that can be used to affect the game being executed by entertainment system 10.
The movements and speech inputs of game player 108 may be interpreted as virtually any type of game control. For example, the example scenario illustrated in
Furthermore, some movements and speech inputs may be interpreted as controls that serve purposes other than controlling player avatar 112. For example, the player may use movements and/or speech commands to end, pause, or save a game, select a level, view high scores, communicate with a friend, etc. The illustrated boxing scenario is provided as an example, but is not meant to be limiting in any way. To the contrary, the illustrated scenario is intended to demonstrate a general concept, which may be applied to a variety of different applications without departing from the scope of this disclosure.
Input device 100 also comprises memory 206 comprising instructions executable by a processor 208 to perform various functions related to receiving inputs from depth-sensing camera 202 and microphone array 204, processing such inputs, and/or communicating such inputs to console 102. Embodiments of such functions are described in more detail below. Console 102 likewise includes memory 210 having instructions stored thereon that are executable by a processor 212 to perform various functions related to the operation of entertainment system 10, embodiments of which are described in more detail below.
As described above, it may be difficult for a speech recognition system to discern intended speech inputs from other received sounds, such as background noise, background speech (i.e. speech not originating from a current user), etc. Further, it also may be difficult for a speech recognition system to differentiate speech from a current system user that is not intended to be an input. Current methods that involve a user issuing a specific speech command, such as “start listening,” to initiate a speech-recognition session may be subject to false positives in which background noise randomly matches such speech patterns. Another method involves the utilization of a camera to detect the gaze of a current user to determine if speech from the user is intended as a speech input. However, this method relies upon a user being positioned in an expected location during system use, and therefore may not be effective in a dynamic use environment in which users move about, in which users may be out of view of the camera, and/or in which non-users may be present.
Accordingly,
Next, method 300 comprises, at 312, receiving image data. The image data may comprise, for example, processed image data that was originally received by the depth-sensing camera and then processed to identify persons or other objects in the image. In some embodiments, individual pixels or groups of pixels in the image may be labeled with metadata that represents a type of object imaged at that pixel (e.g. “player 1”), and that also represents a distance of the object from an input device. This data is shown as “visual location information” 314 in
After receiving the speech recognition data and the image data, method 300 next comprises, at 316, comparing the acoustic location information to the visual location information, and at 318, adjusting the confidence data based upon whether the recognized speech segment is determined to have originated from a person in the image sensor field of view. For example, if it is determined that the recognized speech segment did not originate from a player in view, the confidence value may be lowered, or a second confidence value may be added to the confidence data, wherein the second confidence value is an intended input confidence value configured (in this case) to communicate a lower level of confidence that the recognized speech segment came from an active user. Likewise, where it is determined that the recognized speech segment did originate from a player in view, the confidence value may be raised or left unaltered, or an intended input confidence value may be added to the confidence data to communicate a higher level of confidence that the recognized speech segment came from an active user.
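The comparison at 316 and the adjustment at 318 may be sketched as follows. This is an illustrative example only: it assumes that both the acoustic and the visual location information have been reduced to a horizontal bearing in degrees, and the class names, the angular tolerance, and the specific confidence values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class PersonLocation:
    label: str        # e.g. "player 1", as labeled in the image data
    angle_deg: float  # assumed horizontal bearing of the person from the sensor


@dataclass(frozen=True)
class ConfidenceData:
    recognition: float                      # recognition confidence value
    intended_input: Optional[float] = None  # optional second confidence value


def find_speaker(acoustic_angle_deg: float,
                 people: Sequence[PersonLocation],
                 tolerance_deg: float = 10.0) -> Optional[PersonLocation]:
    """Return the person whose bearing best matches the sound, if any is close enough."""
    candidates = [p for p in people
                  if abs(p.angle_deg - acoustic_angle_deg) <= tolerance_deg]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p.angle_deg - acoustic_angle_deg))


def adjust_confidence(confidence: ConfidenceData,
                      acoustic_angle_deg: float,
                      people: Sequence[PersonLocation]) -> ConfidenceData:
    """Add an intended input confidence value; the recognition value is left unaltered."""
    speaker = find_speaker(acoustic_angle_deg, people)
    # Hypothetical values: higher when the speech came from a person in view.
    intended = 0.9 if speaker is not None else 0.2
    return replace(confidence, intended_input=intended)
```

This sketch takes the option of adding a second confidence value rather than raising or lowering the recognition confidence value, so that an application can still see the recognizer's original output.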
In either case, the recognized speech segment and modified confidence data may be provided to an application for use. Using this data, the application may decide whether to accept or reject the recognized speech segment based upon the modified confidence data. Further, in some cases where it is determined that it is highly likely that the recognized speech segment was not intended to be a speech input, method 300 may comprise rejecting the recognized speech segment, and thus not passing it to an application. In this case, such rejection of a recognized speech segment may be considered an adjustment of a confidence level to a level below a minimum confidence threshold. It will be understood that the particular examples given above for adjusting the confidence data are described for the purpose of illustration, and that any other suitable adjustments to the confidence values may be made.
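As a brief illustration of the two outcomes described above, the sketch below either withholds a segment entirely or passes it on for the application to judge against a threshold. The threshold value and function names are hypothetical, and the example assumes the confidence data has been reduced to a single number.

```python
from typing import Optional, Tuple

# Hypothetical minimum confidence threshold used by the application.
MIN_CONFIDENCE = 0.5


def forward_to_application(segment: str,
                           confidence: float,
                           clearly_unintended: bool) -> Optional[Tuple[str, float]]:
    """Return the data to pass to the application, or None if the segment is rejected.

    Returning None has the same effect as lowering the confidence below the
    minimum threshold: the application never acts on the segment.
    """
    if clearly_unintended:
        return None
    return segment, confidence


def application_accepts(confidence: float,
                        threshold: float = MIN_CONFIDENCE) -> bool:
    """The application's own accept/reject decision based on the modified confidence."""
    return confidence >= threshold
```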
In some cases, information other than acoustic location information and visual location information may be used to help determine a level of confidence that a recognized speech segment is intended to be an input.
Method 400 comprises, at 402, receiving a recognized speech segment and confidence data. As illustrated in
The digital audio processing stage 504 may be configured to perform any suitable digital audio processing on the digitized microphone signals. For example, the digital audio processing stage 504 may be configured to remove noise, to combine the four microphone signals into a single audio signal, and to output acoustic location information 507 that comprises information on a direction and/or location from which a speech input is received. The speech recognition stage 506, as described above, may be configured to compare inputs received from the digital audio processing stage 504 to a plurality of recognized speech patterns to attempt to recognize speech inputs. The speech recognition stage 506 may then output recognized speech segments and also confidence data for each recognized speech segment to an intent determination stage 508. Further, the intent determination stage 508 may also receive the acoustic location information from the digital audio processing stage 504. It will be understood that, in some embodiments, the acoustic location information may be received via the speech recognition stage 506, or from any other suitable component.
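The disclosure does not specify how the microphone signals are combined into a single audio signal. As one simple possibility, offered only as an assumption, the sketch below averages time-aligned channels sample by sample; a practical system might instead apply per-channel delays or weights before summing. The function name is hypothetical.

```python
from typing import List, Sequence


def combine_channels(channels: Sequence[Sequence[float]]) -> List[float]:
    """Combine equal-length, time-aligned microphone channels into one signal.

    Each output sample is the mean of the corresponding input samples.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    return [sum(samples) / len(channels) for samples in zip(*channels)]
```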
Referring back to
The video processing stage 510 may output any suitable data, including but not limited to a synthesized depth image that includes information regarding the locations and depths of objects at each pixel as determined from skeletal tracking analysis.
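To illustrate how per-pixel labels and depths could be reduced to a location for each person, the following sketch averages the column index and depth of all pixels carrying the same label. The image representation (a list of rows of label and depth pairs) and the function name are assumptions made for this example and are not taken from the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed pixel representation: (label, depth in metres); a label of None is background.
Pixel = Tuple[Optional[str], float]


def person_locations(image: List[List[Pixel]]) -> Dict[str, Tuple[float, float]]:
    """Map each label to (mean column index, mean depth) over its pixels."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # column sum, depth sum, pixel count
    for row in image:
        for column, (label, depth) in enumerate(row):
            if label is None:
                continue
            entry = totals[label]
            entry[0] += column
            entry[1] += depth
            entry[2] += 1
    return {label: (col_sum / count, depth_sum / count)
            for label, (col_sum, depth_sum, count) in totals.items()}
```

The mean column index could then be converted to a bearing using the camera's field of view, giving a value that is directly comparable with an acoustic direction estimate.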
Referring again to
Returning to process 404, if it is determined that the recognized speech segment originated from a person in the field of view of the depth-sensing camera, then method 400 comprises, at 412, determining if the person is facing the depth-sensing camera. This may comprise, for example, determining if the visual location data indicates that any facial features of the player are visible (e.g. eyes, nose, mouth, overall face, etc.). Such a determination may be useful, for example, to distinguish a user sitting side by side with, and talking to, another user (i.e. speech made by a non-active user) from a user making a speech input (i.e. speech made by an active user). If it is determined at 412 that the user is not facing the camera, then method 400 comprises, at 414, adjusting the confidence data to reflect a reduction in confidence that the recognized speech input was intended to be an input. On the other hand, if it is determined that the user is facing the camera, then the confidence data is not adjusted. It will be understood that, in other embodiments, suitable adjustments other than those described herein may be made to the confidence data to reflect the different confidence levels resulting from the determination at 412.
Next, at 416, it is determined whether the person from whom the recognized speech segment originated can be identified by voice. As described above for process 406, this may be performed in any suitable manner, such as by consulting a database of user voice patterns 514. If it is determined that the recognized speech segment did not originate from a player in view and the speaker cannot be identified by voice, then method 400 comprises, at 418, adjusting the confidence data to reflect a reduction in confidence that the recognized speech input was intended to be an input. On the other hand, if it is determined that the speaker can be identified by voice, then the confidence data is not adjusted. It will be understood that, in other embodiments, suitable adjustments other than those described herein may be made to the confidence data to reflect the different confidence levels resulting from the determination at 416.
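The determinations at 412 and 416 may be viewed as successive checks that each either reduce the confidence or leave it unchanged. The sketch below is a simplified reading under stated assumptions: the two observations are supplied as booleans by earlier processing, each failed check subtracts a fixed hypothetical penalty, and the names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerObservation:
    facial_features_visible: bool  # e.g. eyes, nose, or mouth found in the image data
    identified_by_voice: bool      # e.g. a match was found among stored voice patterns


def adjust_intent_confidence(confidence: float,
                             observation: SpeakerObservation,
                             facing_penalty: float = 0.2,
                             voice_penalty: float = 0.2) -> float:
    """Apply the facing check and the voice check in turn.

    A failed check lowers the confidence; a passed check leaves it unchanged.
    The result is never allowed to fall below zero.
    """
    if not observation.facial_features_visible:
        confidence -= facing_penalty
    if not observation.identified_by_voice:
        confidence -= voice_penalty
    return max(0.0, confidence)
```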
Method 400 next comprises, at 420, determining whether the user's speech input contains a recognized keyword. Such recognized keywords may be words or phrases considered to be indicative that subsequent speech is likely to be intended as a speech input, and may be stored in a database, as indicated at 516 in
It further will be understood that the examples of, and order of, processes shown in
It also will be appreciated that the computing devices described herein may be any suitable computing device configured to execute the programs described herein. For example, the computing devices may be a mainframe computer, personal computer, laptop computer, portable data assistant (PDA), set top box, game console, computer-enabled wireless telephone, networked computing device, or other suitable computing device, and may be connected to each other via computer networks, such as the Internet. These computing devices typically include a processor and associated volatile and non-volatile memory, and are configured to execute programs stored in non-volatile memory using portions of volatile memory and the processor. As used herein, the term “program” refers to software or firmware components that may be executed by, or utilized by, one or more computing devices described herein, and is meant to encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc. It will be appreciated that computer-readable storage media may be provided having program instructions stored thereon, which upon execution by a computing device, cause the computing device to execute the methods described above and cause operation of the systems described above.
It is to be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or in some cases omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.
Number | Date | Country | |
---|---|---|---|
20110184735 A1 | Jul 2011 | US |