Not applicable.
The invention is related to controlling electronic components in a ubiquitous computing environment, and more particularly to a system and process for controlling the components using multimodal integration in which inputs from a speech recognition subsystem, gesture recognition subsystem employing a wireless pointing device and pointing analysis subsystem associated with the pointing device, are combined to determine what component a user wants to control and what control action is desired.
Increasingly our environment is populated with a multitude of intelligent devices, each specialized in function. The modern living room, for example, typically features a television, amplifier, DVD player, lights, and so on. In the near future, we can look forward to these devices becoming more inter-connected, more numerous and more specialized as part of an increasingly complex and powerful integrated intelligent environment. This presents a challenge in designing good user interfaces.
For example, today's living room coffee table is typically cluttered with multiple user interfaces in the form of infrared (IR) remote controls. Often each of these interfaces controls a single device. Tomorrow's intelligent environment presents the opportunity to present a single intelligent user interface (UI) to control many such devices when they are networked. This UI device should provide the user a natural interaction with intelligent environments. For example, people have become quite accustomed to pointing at a piece of electronic equipment that they want to control, owing to the extensive use of IR remote controls. It has become almost second nature for a person in a modern environment to point at the object he or she wants to control, even when it is not necessary. Take, as an example, the small radio frequency (RF) key fobs that have been used to lock and unlock most automobiles in the past few years. Inevitably, a driver will point the free end of the key fob toward the car while pressing the lock or unlock button. This is done even though the driver could just as well have pointed the fob away from the car, or even pressed the button while the fob was still in his or her pocket, owing to the RF nature of the device. Thus, a single UI device, which is pointed at electronic components or some extension thereof (e.g., a wall switch to control lighting in a room) to control these components, would represent an example of the aforementioned natural interaction that is desirable for such a device.
There are some so-called “universal” remote controls on the market that are preprogrammed with the known control protocols of a litany of electronic components, or which are designed to learn the command protocol of an electronic component. Typically, such devices are limited to one transmission scheme, such as IR or RF, and so can control only electronic components operating on that scheme. However, it would be desirable if the electronic components themselves were passive in that they do not have to receive and process commands from the UI device directly, but would instead rely solely on control inputs from the aforementioned network. In this way, the UI device does not have to differentiate among various electronic components, say by recognizing the component in some manner and transmitting commands using some encoding scheme applicable only to that component, as is the case with existing universal remote controls.
Of course, a common control protocol could be implemented such that all the controllable electronic components within an environment use the same control protocol and transmission scheme. However, this would require all the electronic components to be customized to the protocol and transmission scheme, or to be modified to recognize the protocol and scheme. This could add considerably to the cost of a “single UI-controlled” environment. It would be much more desirable if the UI device could be used to control any networked group of new or existing electronic components regardless of remote control protocols or transmission schemes the components were intended to operate under.
Another current approach to controlling a variety of different electronic components in an environment is through the use of speech recognition technology. Essentially, a speech recognition program is used to recognize user commands. Once recognized the command can be acted upon by a computing system that controls the electronic components via a network connection. However, current speech recognition-based control systems typically exhibit high error rates. Although speech technology can perform well under laboratory conditions, a 20%-50% decrease in recognition rates can be experienced when these systems are used in a normal operating environment. This decrease in accuracy occurs for the most part because of the unpredictable and variable noise levels found in a normal operating setting, and the way humans alter their speech patterns to compensate for this noise. In fact, environmental noise is currently viewed as a primary obstacle to the widespread commercialization of speech recognition systems.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. Multiple references will be identified by a pair of brackets containing more than one designator, for example, [2, 3]. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.
The present invention is directed toward a system and process that controls a group of networked electronic components regardless of any remote control protocols or transmission schemes under which they operate. In general this is accomplished using a multimodal integration scheme in which inputs from a speech recognition subsystem, gesture recognition subsystem employing a wireless pointing device and pointing analysis subsystem also employing the pointing device, are combined to determine what component a user wants to control and what control action is desired.
In order to control one of the aforementioned electronic components, the component must first be identified to the control system. In general this can be accomplished using the pointing system to identify the desired component by pointing at it, or by employing speech recognition, or both. The advantage of using both is to reinforce the selection of a particular component, even in a noisy environment where the speech recognition system may operate poorly. Thus, by combining inputs the overall system is made more robust. This use of divergent inputs to reinforce the selection is referred to as multimodal integration.
Once the object is identified, the electronic device can be controlled by the user informing the computer in some manner what he or she wants the device to do. This may be as simple as instructing the computer to turn the device on or off by activating a switch or button on the pointer. However, it is also desirable to control devices in more complex ways than merely turning them on or off. Thus, the user must have some way of relaying the desired command to the computer. One such way would be through the use of voice commands interpreted by the speech recognition subsystem. Another way is by having the user perform certain gestures with the pointer that the computer will recognize as particular commands. Integrating these approaches is even better as explained previously.
In regard to the user performing certain gestures with the pointer to remotely convey a command, this can be accomplished in a variety of ways. One approach involves matching a sequence of sensor values output by the pointer and recorded over a period of time, to stored prototype sequences each representing the output of the sensor that would be expected if the pointer were manipulated in a prescribed manner. This prescribed manner is the aforementioned gesture.
The stored prototype sequences are generated in a training phase for each electronic component it is desired to control via gesturing. Essentially to teach a gesture to the electronic component control system that represents a particular control action for a particular electronic component, a user simply holds down the pointer's button while performing the desired gesture. Meanwhile the electronic component control process is recording particular sensor values obtained from orientation messages transmitted by the pointer during the time the user is performing the gesture. The recorded sensor values represent the prototype sequence.
During operation, the control system constantly monitors the incoming orientation messages once an object associated with a controllable electronic component has been selected to assess whether the user is performing a control gesture. As mentioned above, this gesture recognition task is accomplished by matching a sequence of sensor values output by the pointer and recorded over a period of time, to stored prototype sequences representing the gestures taught to the system.
It is noted, however, that a gesture made by a user during runtime may differ from the gesture performed to create the prototype sequence in terms of speed or amplitude. To handle this situation, the matching process can entail not only comparing a prototype sequence to the recorded sensor values but also comparing the recorded sensor values to various versions of the prototype that are scaled up and down in amplitude and/or warped in time. Each version of a prototype sequence is created by applying a scaling and/or warping factor to the prototype sequence. The scaling factors scale each value in the prototype sequence either up or down in amplitude, whereas the warping factors expand or contract the overall prototype sequence in time. Essentially, a list is established before initiating the matching process which includes every combination of the scaling and warping factors possible, including the case where one or both of the scaling and warping factors are zero (thus corresponding to the unmodified prototype sequence).
Given this prescribed list, each prototype sequence is selected in turn and put through a matching procedure. This matching procedure entails computing a similarity indicator between the input sequence and the selected prototype sequence. The similarity indicator can be defined in various conventional ways. However, in tested versions of the control system, the similarity indicator was obtained by first computing a “match score” between corresponding time steps of the input sequence and each version of the prototype sequence using a standard Euclidean distance technique. The match scores are averaged and the maximum match score is identified. This maximum match score is the aforementioned similarity indicator for the selected prototype sequence. Thus, the aforementioned variations in the runtime gestures are considered in computing the similarity indicator. When a similarity indicator has been computed for every prototype sequence it is next determined which of the similarity indicators is the largest. The prototype sequence associated with the largest similarity indicator is the best match to the input sequence, and could indicate the gesture associated with that sequence was performed. However, unless the similarity is great enough, it might be that the pointer movements are random and do not match any of the trained gestures. This situation is handled by ascertaining if the similarity indicator of the designated prototype sequence exceeds a prescribed similarity threshold. If the similarity indicator exceeds the threshold, then it is deemed that the user has performed the gesture associated with that designated prototype sequence. As such, the control action corresponding to that gesture is initiated by the host computer. If the similarity indicator does not exceed the threshold, no control action is initiated. 
The foregoing process is repeated continuously for each block of sensor values obtained from the incoming orientation messages having the prescribed length.
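The matching procedure described above can be sketched as follows. This is a minimal illustration and not the tested implementation. The function names, the factor lists, the threshold value, and the conversion of the averaged Euclidean distance into a similarity of the form 1/(1 + distance) are all assumptions made for the example. The sketch also assumes multiplicative scaling and warping factors, so that a factor of 1.0 (rather than zero) leaves the prototype unmodified, and it compares only the overlapping time steps when a warped version differs in length from the input block.

```python
import math
from itertools import product

def resample(seq, factor):
    """Warp a sequence in time by linear interpolation (factor > 1 stretches it)."""
    n = max(2, round(len(seq) * factor))
    out = []
    for i in range(n):
        pos = i * (len(seq) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(tuple(a + frac * (b - a) for a, b in zip(seq[lo], seq[hi])))
    return out

def match_score(window, version):
    """Average the per-time-step Euclidean distances and map to (0, 1]."""
    n = min(len(window), len(version))  # assumption: compare overlapping steps only
    mean_dist = sum(math.dist(window[i], version[i]) for i in range(n)) / n
    return 1.0 / (1.0 + mean_dist)      # assumption: larger means more similar

def similarity(window, prototype, scales, warps):
    """Maximum match score over every scaling/warping combination."""
    best = 0.0
    for s, w in product(scales, warps):
        version = [tuple(s * v for v in step) for step in resample(prototype, w)]
        best = max(best, match_score(window, version))
    return best

def recognize(window, prototypes, scales=(0.8, 1.0, 1.2),
              warps=(0.8, 1.0, 1.2), threshold=0.5):
    """Return the best-matching gesture name, or None if below the threshold."""
    best_name, best_sim = None, 0.0
    for name, proto in prototypes.items():
        sim = similarity(window, proto, scales, warps)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim > threshold else None

# Hypothetical one-channel prototypes recorded during training.
prototypes = {
    "up": [(0.0,), (1.0,), (2.0,), (3.0,)],
    "down": [(0.0,), (-1.0,), (-2.0,), (-3.0,)],
}
larger_up = [(0.0,), (1.2,), (2.4,), (3.6,)]       # same gesture, larger amplitude
random_motion = [(10.0,), (-10.0,), (10.0,), (-10.0,)]
```

In this sketch, `recognize(larger_up, prototypes)` selects the "up" prototype because the version scaled by 1.2 matches it closely, while `recognize(random_motion, prototypes)` returns `None` because no version exceeds the similarity threshold, so no control action would be initiated.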
In regard to the use of simple and short duration gestures, such as for example a single upwards or downwards motion, an opportunity exists to employ a simplified approach to gesture recognition. For such gestures, a recognition strategy can be employed that looks for simple trends or peaks in one or more of the sensor values output by the pointer. For example, pitching the pointer up may be detected by simply thresholding the output of the accelerometer corresponding to pitch. Clearly such an approach will admit many false positives if run in isolation. However, in a real system this recognition will be performed in the context of an ongoing interaction, during which it will be clear to the system (and to the user) when a simple pitch up indicates the intent to control a device in a particular way. For example, the system may only use the gesture recognition results if the user is also pointing at an object, and furthermore only if the particular gesture applies to that particular object. In addition, the user can be required to press and hold down the pointer's button while gesturing. Requiring the user to depress the button while gesturing allows the system to easily determine when a gesture begins. In other words, the system records sensor values only after the user depresses the button, and thus gives a natural origin from which to detect trends in sensor values. In the context of gesturing while pointing at an object, this process induces a local coordinate system around the object, so that “up”, “down”, “left” and “right” are relative to where the object appears to the user. For example, “up” in the context of a standing user pointing at an object on the floor means pitching up from a pitched down position, and so on.
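The simplified strategy for short gestures might be sketched as shown below. The function name, the threshold value, and the units of the pitch samples are hypothetical. The sketch assumes the samples were recorded only after the button was depressed, so that the first sample serves as the natural origin from which a trend is measured.

```python
def detect_pitch_gesture(pitch_samples, button_down, pointing_at_target,
                         threshold=0.3):
    """Return "up", "down", or None for one block of pitch readings.

    pitch_samples: readings recorded since the button was depressed;
    the first reading is treated as the origin of the gesture.
    """
    # Gate on context: ignore motion unless the user holds the button
    # while pointing at an object to which the gesture applies.
    if not (button_down and pointing_at_target) or not pitch_samples:
        return None
    origin = pitch_samples[0]
    for value in pitch_samples[1:]:
        delta = value - origin          # trend relative to the origin
        if delta > threshold:
            return "up"
        if delta < -threshold:
            return "down"
    return None
```

Because the trend is measured relative to the first sample, a standing user pointing at an object on the floor who pitches up from a pitched down position, for example with readings of -0.5, -0.4 and -0.1, is still reported as "up".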
As discussed above, a system employing multimodal integration would have a distinct advantage over one system alone. To this end, the present invention includes the integration of a conventional speech control system into the gesture control and pointer systems which results in a simple framework for combining the outputs of various modalities such as pointing to target objects and pushing the button on the pointer, pointer gestures, and speech, to arrive at a unified interpretation that instructs a combined environmental control system on an appropriate course of action. This framework decomposes the desired action into a command and referent pair. The referent can be identified using the pointer to select an object in the environment as described previously or using a conventional speech recognition scheme, or both. The command may be specified by pressing the button on the pointer, or by a pointer gesture, or by a speech recognition event, or any combination thereof.
The identity of the referent, the desired command and the appropriate action are all determined by the multimodal integration of the outputs of the speech recognition system, gesture recognition system and pointing analysis processes using a dynamic Bayes network. Specifically, the dynamic Bayes network includes input, referent, command and action nodes. The input nodes correspond to the aforementioned inputs and are used to provide state information to at least one of either the referent, command, or action node. The states of the inputs determine the state of the referent and command nodes, and the states of the referent and command nodes are in turn fed into the action node, whose state depends in part on these inputs and in part on a series of device state input nodes. The state of the action node indicates the action that is to be implemented to affect the referent. The referent, command and action node states comprise probability distributions indicating the probability that each possible referent, command and action is the respective desired referent, command and action.
In addition, the dynamic Bayes network preserves ambiguities from one time step to the next while waiting for enough information to become available to make a decision as to what referent, command or action is intended. This is done via a temporal integration technique in which probabilities assigned to referents and commands in the last time step are brought forward to the current time step and are input along with new speech, pointing and gesture inputs to influence the probability distribution computed for the referents and commands in the current time step. In this way the network tends to hold a memory of a command and referent, and it is thus unnecessary to specify the command and referent at exactly the same moment in time. It is also noted that the input from these prior state nodes is weighted such that their influence on the state of the referent and command nodes decreases in proportion to the amount of time that has passed since the prior state node first acquired its current state.
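The temporal integration idea can be illustrated with the simplified sketch below. It is a stand-in for, and not an implementation of, the dynamic Bayes network. The function names, the linear decay toward a uniform distribution, the `max_age` parameter, and the treatment of new inputs as multiplicative likelihoods are all assumptions chosen for the example.

```python
def normalize(dist):
    """Rescale a dictionary of non-negative weights so it sums to 1."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def update_belief(prior, evidence, age_steps, max_age=100):
    """Carry the previous distribution forward and combine it with new inputs.

    prior:     distribution over referents (or commands) from the last step
    evidence:  likelihoods from new speech, pointing or gesture inputs;
               candidates with no new input default to 1.0
    age_steps: time steps since the prior state acquired its current value
    """
    uniform = 1.0 / len(prior)
    # Assumption: the prior's influence falls off linearly with its age.
    weight = max(0.0, 1.0 - age_steps / max_age)
    carried = {k: weight * p + (1.0 - weight) * uniform
               for k, p in prior.items()}
    posterior = {k: carried[k] * evidence.get(k, 1.0) for k in prior}
    return normalize(posterior)

# The user pointed at the lamp a moment ago; no new referent input arrives.
fresh = update_belief({"lamp": 0.9, "tv": 0.1}, {}, age_steps=0)
stale = update_belief({"lamp": 0.9, "tv": 0.1}, {}, age_steps=100)
```

With a fresh prior the lamp remains the likely referent, so a command spoken shortly after pointing still applies to it. Once the prior has aged fully, the distribution returns to uniform and the earlier selection no longer influences the result.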
The Bayes network architecture also allows the state of various devices to be incorporated via the aforementioned device state input nodes. In particular, these nodes provide state information to the action node that reflects the current condition of an electronic component associated with the device state input node whenever the referent node probability distribution indicates the referent is that component. This allows, as an example, the device state input nodes to input an indication of whether the associated electronic component is activated or deactivated. This can be quite useful in situations where the only action permitted in regard to an electronic component is to turn it off if it is on, and to turn it on if it is off. In such a situation, an explicit command need not be determined. For example if the electronic component is a lamp, all that need be known is that the referent is this lamp and that it is on or off. The action of turning the lamp on or off, as the case may be, follows directly, without the user ever having to command the system.
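The role of the device state inputs can be sketched with a deterministic simplification of the action node. The function name, the `min_prob` confidence cutoff, the `toggle_only` set, and the action labels are hypothetical, and the real network would produce a probability distribution over actions rather than a single choice.

```python
def choose_action(referent_dist, command_dist, device_is_on, toggle_only,
                  min_prob=0.7):
    """Pick (referent, action), or None while the inputs remain ambiguous."""
    referent, p_ref = max(referent_dist.items(), key=lambda kv: kv[1])
    if p_ref < min_prob:
        return None                      # referent still ambiguous
    if referent in toggle_only:
        # The device state alone determines the action; no command is needed.
        action = "turn_off" if device_is_on[referent] else "turn_on"
        return (referent, action)
    command, p_cmd = max(command_dist.items(), key=lambda kv: kv[1])
    if p_cmd < min_prob:
        return None                      # command still ambiguous
    return (referent, command)

device_is_on = {"lamp": True, "tv": False}
no_command = {"volume_up": 0.5, "volume_down": 0.5}
result = choose_action({"lamp": 0.9, "tv": 0.1}, no_command,
                       device_is_on, toggle_only={"lamp"})
```

Here `result` is `("lamp", "turn_off")` even though no command has been resolved, because the lamp is known to be on and toggling is the only action permitted for it.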
The specific features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
In the following description of the preferred embodiments of the present invention, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
In general, the present electronic component control system and process involves the integration of a unique wireless pointer-based object selection system, a unique gesture recognition system that employs the wireless pointer, and a conventional speech control system to create a multimodal interface for determining what component a user wants to control and what control action is desired.
The pointer-based object selection system will be described first in the sections to follow, followed by the gesture recognition system, and finally the integration of these systems with a conventional speech recognition system to form the present electronic component control system.
In general, the present multimodal interface control system requires an object selection system that is capable of allowing a user to point a pointing device (referred to as a pointer) at an object in the environment that is, or is associated with, an electronic component that is controllable by the control system, and by computing the orientation and location of the pointer in terms of the environment's pre-defined coordinate system, can determine that the user is pointing at the object. Any object selection system meeting the foregoing criteria can be used. One such system is the subject of a U.S. patent application entitled “A SYSTEM AND PROCESS FOR SELECTING OBJECTS IN A UBIQUITOUS COMPUTING ENVIRONMENT”, having a Ser. No. 10/160,692, and a filing date of May 31, 2002, now U.S. Pat. No. 6,982,697. Referring to
The object selection system also includes components for determining the 3D location of the pointer 10. Both the orientation and location of the pointer within the environment in which it is operating are needed to determine where the user is pointing the device. In tested embodiments of the system these components included a pair of video cameras 16, 18 with infrared-pass filters. These cameras 16, 18 are mounted at separate locations within the environment such that each images the portion of the environment where the user will be operating the pointer 10 from a different viewpoint. A wide angle lens can be used for this purpose if necessary. Each camera 16, 18 is also connected via any conventional wireless or wired pathway to the host computer 14, so as to provide image data to the host computer 14. In tested embodiments of the system, the communication interface between each camera 16, 18 and the host computer 14 was accomplished using a wired IEEE 1394 (i.e., Firewire) interface. The process by which the 3D location of the pointer 10 is determined using the image data provided from the cameras 16, 18 will also be discussed in detail later.
The aforementioned wireless pointer is a small hand-held unit that in the tested versions of the object selection system resembled a cylindrical wand, as shown in
In general, the wireless pointer is constructed from a case having the desired shape, which houses a number of off-the-shelf electronic components. Referring to the block diagram of
There is also at least one manually-operated switch connected to the microcontroller 300. In the tested versions of the wireless pointer, just one switch 308 was included, although more switches could be incorporated depending on what functions it is desired to make available for manual activation or deactivation. The included switch 308 is a push-button switch; however any type of switch could be employed. In general, the switch (e.g., button 202 of
Additionally, a pair of visible spectrum LEDs 314, 316, is connected to the microcontroller 300. Preferably, these LEDs each emit a different color of light. For example, one of the LEDs 314 could produce red light, and the other 316 could produce green light. The visible spectrum LEDs 314, 316 can be used for a variety of purposes preferably related to providing status or feedback information to the user. In the tested versions of the object selection system, the visible spectrum LEDs 314, 316 were controlled by commands received from the host computer via the base station transceiver. One example of their use involves the host computer transmitting a command via the base station transceiver to the pointer instructing the microcontroller 300 to illuminate the green LED 316 when the device is being pointed at an object that the host computer is capable of affecting, and illuminating the red LED when it is not. In addition to the pair of visible LEDs, there is an infrared (IR) LED 318 that is connected to and controlled by the microcontroller 300. The IR LED can be located at the front or pointing end of the pointer. It is noted that unless the case of the pointer is transparent to visible and/or IR light, the LEDs 314, 316, 318 whose light emissions would be blocked are configured to extend through the case of the pointer so as to be visible from the outside. It is further noted that a vibration unit such as those employed in pagers could be added to the pointer so that the host computer could activate the unit and thereby attract the attention of the user, without the user having to look at the pointer.
A power supply 320 provides power to the above-described components of the wireless pointer. In tested versions of the pointer, this power supply 320 took the form of batteries. A regulator in the power supply 320 converts the battery voltage to 5 volts for the electronic components of the pointer. In tested versions of the pointer about 52 mA was used when running normally, which decreases to 1 mA when the device is in a power saving mode that will be discussed shortly.
Tested versions of the wireless pointer operate on a command-response protocol between the device and the base station. Specifically, the pointer waits for a transmission from the base station. An incoming transmission from the base station is received by the pointer's transceiver and sent to the microcontroller. The microcontroller is pre-programmed with instructions to decode the received messages and to determine if the data contains an identifier that is assigned to the pointer and which uniquely identifies the device. This identifier is pre-programmed into the microcontroller. If such an identifier is found in the incoming message, then it is deemed that the message is intended for the pointer. It is noted that the identifier scheme allows other devices to be contacted by the host computer via the base station. Such devices could even include multiple pointers being operated in the same environment, such as in an office. In the case where multiple pointers are in use in the same environment, the object selection process which will be discussed shortly can be running as multiple copies (one for each pointer) on the same host computer, or could be running on separate host computers. Of course, if there are no other devices operating in the same environment, then the identifier could be eliminated and every message received by the pointer would be assumed to be for it. The remainder of the data message received can include various commands from the host computer, including a request to provide orientation data in a return transmission. In tested versions of the object selection system, a request for orientation data was transmitted 50 times per second (i.e., a rate of 50 Hz). The microcontroller is pre-programmed to recognize the various commands and to take specific actions in response.
For example, in the case where an incoming data message to the pointer includes a request for orientation data, the microcontroller would react as follows. Referring to the flow diagram in
It is noted that while tested versions of the object selection system used the above-described polling scheme where the pointer provided the orientation data message in response to a transmitted request, this need not be the case. For example, alternately, the microcontroller of the pointer could be programmed to package and transmit an orientation message on a prescribed periodic basis (e.g., at a 50 Hz rate).
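The polling scheme described above can be sketched as follows. The message layout shown here (a one-byte identifier followed by a one-byte command code), the constant values, and the function names are all hypothetical, since the actual encoding is not specified in this description. The sensor readings are likewise placeholders.

```python
POINTER_ID = 0x2A        # hypothetical identifier pre-programmed into the pointer
CMD_ORIENTATION = 0x01   # hypothetical code for "send orientation data"

def handle_message(message, read_sensors):
    """Decode one incoming message and build the reply, if any.

    message:      bytes received by the pointer's transceiver
    read_sensors: callable returning the current sensor values as
                  integers in the range 0..255 (placeholder format)
    """
    # Ignore messages that are too short or addressed to another device.
    if len(message) < 2 or message[0] != POINTER_ID:
        return None
    command = message[1]
    if command == CMD_ORIENTATION:
        # Package the identifier and sensor values as the return transmission.
        return bytes([POINTER_ID]) + bytes(read_sensors())
    return None              # unrecognized command: no response

reply = handle_message(bytes([POINTER_ID, CMD_ORIENTATION]), lambda: [1, 2, 3])
ignored = handle_message(bytes([0x2B, CMD_ORIENTATION]), lambda: [1, 2, 3])
```

In the alternative periodic scheme, the same packaging step would instead be driven by a timer in the pointer (e.g., firing at a 50 Hz rate) rather than by an incoming request.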
The aforementioned base station used in the object selection system will now be described. In one version, the base station is a small, stand-alone box with connections for DC power and communications with the PC, respectively, and an external antenna. In tested versions of the object selection system, communication with the PC is done serially via an RS232 communication interface. However, other communication interfaces can also be employed as desired. For example, the PC communications could be accomplished using a Universal Serial Bus (USB), or IEEE 1394 (Firewire) interface, or even a wireless interface. The antenna is designed to receive 418 MHz radio transmissions from the pointer.
Referring now to the block diagram of
It is noted that while the above-described version of the base station is a stand-alone unit, this need not be the case. The base station could be readily integrated into the host computer itself. For example, the base station could be configured as an expansion card which is installed in an expansion slot of the host computer. In such a case only the antenna need be external to the host computer.
The base station is connected to the host computer, as described previously. Whenever an orientation data message is received from the pointer it is transferred to the host computer for processing. However, before providing a description of this processing, a brief, general description of a suitable computing environment in which this processing may be implemented, and of the aforementioned host computer, will be provided. It is noted that this computing environment is also applicable to the other processes used in the present electronic component control system, which will be described shortly.
The object selection process is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like (which are collectively referred to as computers or computing devices herein).
The object selection process may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
The exemplary operating environment having now been discussed, the remaining part of this description section will be devoted to a description of the program modules embodying the object selection process performed by the host computer. Generally, referring to
The object selection process requires a series of correction and normalization factors to be established before it can compute the orientation of the pointer from the raw sensor values provided in an orientation message. These factors are computed in a calibration procedure. The first part of this calibration procedure involves computing correction factors for each of the outputs from the magnetometer representing the three axes of the 3-axis device, respectively. Correction factors are needed to relate the magnetometer outputs, which are a measure of deviation from the direction of the Earth's magnetic field referred to as magnetic north (specifically the dot product of the direction each axis of the magnetometer is pointed with the direction of magnetic north), to the coordinate frame established for the environment in which the pointer is operating. The coordinate frame of the environment is arbitrary, but must be pre-defined and known to the object selection process prior to performing the calibration procedure. For example, if the environment is a room in a building, the coordinate frame might be established such that the origin is in a corner with one axis extending vertically from the corner, and the other two horizontally along the two walls forming the corner.
Referring to
In addition to computing the aforementioned magnetometer correction factors, factors for range-normalizing the magnetometer readings are also computed in the calibration procedure. Essentially, these normalization factors are based on the maximum and minimum outputs that each axis of the magnetometer is capable of producing. These values are used later in a normalization procedure that is part of the process for determining the orientation of the pointer. A simple way of obtaining these maximum and minimum values is for the user to wave the pointer about while the outputs of the magnetometer are recorded by the host computer. Specifically, referring to
Factors for range-normalizing (in [−1,1]) the accelerometer readings are also computed in the calibration procedure. In this case, the normalization factors are determined using the accelerometer output normalization procedures applicable to the accelerometer used, such as the conventional static normalization procedure used in tested embodiments of the object selection process.
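By way of illustration only, the range normalization of the magnetometer readings described above can be sketched as follows. The sketch assumes a simple linear mapping of each axis from its recorded minimum and maximum onto [−1, 1]; the function names and the sample values are hypothetical and are not taken from the tested embodiments.

```python
import numpy as np

def range_factors(samples):
    """Per-axis (min, max) recorded while the pointer is waved about."""
    samples = np.asarray(samples, dtype=float)
    return samples.min(axis=0), samples.max(axis=0)

def range_normalize(raw, lo, hi):
    """Assumed linear mapping: lo maps to -1 and hi maps to +1 on each axis."""
    raw = np.asarray(raw, dtype=float)
    return 2.0 * (raw - lo) / (hi - lo) - 1.0

# Hypothetical raw 3-axis magnetometer samples captured during calibration.
recorded = [[100, 210, 305], [300, 190, 295], [200, 250, 355], [180, 150, 255]]
lo, hi = range_factors(recorded)

# A later reading that sits midway between the extremes on every axis.
normalized = range_normalize([200, 200, 305], lo, hi)
```

A reading equal to the recorded minimum on an axis maps to −1, and one equal to the recorded maximum maps to +1.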
Once the calibration procedure is complete, the object selection process is ready to compute the orientation of the pointer each time an orientation data message is received by the host computer. The orientation of the pointer is defined in terms of its pitch, roll and yaw angle about the respective x, y and z axes of the environment's pre-defined coordinate system. These angles can be determined via various sensor fusion processing schemes that essentially compute the angle from the readings from the accelerometer and magnetometer of the pointer. Any of these existing methods could be used, however a simplified procedure was employed in tested versions of the object selection system. In this simplified procedure, the yaw angle is computed using the recorded values of the magnetometer output. Even though the magnetometer is a 3-axis device, the pitch, roll and yaw angles cannot be computed directly from the recorded magnetometer values contained in the orientation data message. The angles cannot be computed directly because the magnetometer outputs a value that is the dot-product of the direction of each magnetometer sensor axis against the direction of magnetic north. This information is not sufficient to calculate the pitch, roll, and yaw of the device. However, it is possible to use the accelerometer readings in conjunction with the magnetometer outputs to compute the orientation. Specifically, referring to
Specifically, the range-normalized accelerometer values representing the pitch and roll are used to establish the rotation matrix Ra1,a2,0, which represents a particular instance of the Euler angle rotation matrix Rθ
mcorrected=Ra1,a2
Let N be the output of the magnetometer when the pointer is held at (pitch, roll, yaw)=(0, 0, 0), as determined in the calibration procedure. Then, project onto the ground plane and normalize as follows:
And finally, the yaw angle is found as follows:
$\text{yaw} = \operatorname{sign}(m_{np} \times N_{np}) \, \cos^{-1}(m_{np}^{T} N_{np})$ (3)
The computed yaw angle, along with the pitch and roll angles derived from the accelerometer readings, are then tentatively designated as defining the orientation of the pointer at the time the orientation data message was transmitted by the device (process action 1010).
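The final step of the yaw computation, Eq. (3), can be sketched as shown below. This is a minimal illustration that assumes the z axis of the environment is the vertical axis (so that projecting onto the ground plane amounts to dropping the z component) and that the pitch-and-roll-corrected magnetometer vector is already available; the function name and the test vectors are hypothetical.

```python
import numpy as np

def yaw_from_magnetometer(m_corrected, north):
    """Signed angle between the ground-plane projections of m_corrected and N."""
    # Assumption: z is vertical, so the ground-plane projection keeps x and y.
    m_np = np.array(m_corrected[:2], dtype=float)
    n_np = np.array(north[:2], dtype=float)
    m_np /= np.linalg.norm(m_np)
    n_np /= np.linalg.norm(n_np)
    # For 2D vectors the cross product reduces to a single scalar component.
    cross_z = m_np[0] * n_np[1] - m_np[1] * n_np[0]
    # Treat a zero cross product as positive so anti-parallel vectors give pi.
    sign = 1.0 if cross_z >= 0 else -1.0
    cos_angle = np.clip(m_np @ n_np, -1.0, 1.0)
    return sign * np.arccos(cos_angle)

north = [1.0, 0.0, 0.5]        # hypothetical calibration reading N
m_corrected = [0.0, 1.0, 0.2]  # hypothetical corrected magnetometer vector
yaw = yaw_from_magnetometer(m_corrected, north)
```

The sign convention follows Eq. (3) as written; whether a positive yaw corresponds to a clockwise or counterclockwise rotation depends on the handedness of the environment's coordinate frame.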
It is noted that there are a number of caveats to the foregoing procedure. First, accelerometers only give true pitch and roll information when the pointer is motionless. This is typically not an issue except when the orientation computations are being used to determine if the pointer is being pointed directly at an object. In such cases, the problem can be avoided by relying on the orientation information only when the device is deemed to have been motionless when the accelerometer readings were captured. To this end, the orientation (i.e., pitch, roll and yaw) of the pointer is computed via the foregoing procedure for the last orientation message received. This is then compared to the orientation computed for the next to last orientation message received, to determine if the orientation of the pointer has changed significantly between the orientation messages. If the orientation of the pointer did not change significantly, then this indicates that the pointer was motionless prior to the transmission of the last orientation message. If the pointer was deemed to have been motionless, then the orientation information is used. However, if it is found that a significant change in the orientation occurred between the last two orientation messages received, it is deemed that the pointer was in motion and the orientation information computed from the last-received orientation message is ignored. Secondly, magnetic north can be distorted unpredictably in indoor environments and in close proximity to large metal objects. However, in practice, while it was found that for typical indoor office environments magnetic north did not always agree with magnetic north found outdoors, it was found to be fairly consistent throughout a single room. 
Thus, since the above-described magnetometer correction factors relate the perceived direction of magnetic north in the environment in which the pointer is operating to the prescribed coordinate system of that environment, when the environment is a room, it will not make any difference if the perceived direction of magnetic north within the room matches that in any other room or outdoors, as the orientation of the pointer is computed for that room only. Finally, it should be noted that the foregoing computations will not provide accurate results if the perceived magnetic north in the environment happens to be co-linear to the gravity vector—a situation not likely to occur.
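The motionless test described above can be sketched as a comparison of the orientations computed from the last two orientation messages. The tolerance below is a hypothetical value chosen only for illustration; the description does not state what constitutes a significant change.

```python
import numpy as np

def angle_difference(a, b):
    """Element-wise difference between angles in radians, wrapped to [-pi, pi)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return (diff + np.pi) % (2.0 * np.pi) - np.pi

def is_motionless(last, previous, tolerance=np.radians(2.0)):
    """True if pitch, roll and yaw each changed by no more than the tolerance."""
    return bool(np.all(np.abs(angle_difference(last, previous)) <= tolerance))

# Orientations given as (pitch, roll, yaw) in radians, for illustration only.
steady = is_motionless((0.10, 0.00, 3.14), (0.10, 0.01, -3.14))
moving = is_motionless((0.10, 0.00, 0.00), (0.30, 0.00, 0.00))
```

Wrapping the difference keeps a yaw that crosses from just below π to just above −π from being mistaken for a large motion.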
The foregoing designation of the pointer's orientation is tentative because it cannot be determined from the accelerometer reading used to compute the roll angle whether the device was in a right-side up, or upside-down position with respect to roll when the accelerometer outputs were captured for the orientation data message. Thus, the computed roll angle could be inaccurate as the computations assumed the pointer was right-side up. Referring now to
One way to accomplish the foregoing task is to compute the orientation (R) as described above, except that it is computed first assuming the pitch angle derived from the accelerometer output reflects a right-side up orientation of the pointer, i.e., $\text{Pitch}_{\text{right-side up}} = -\arcsin(a)$, where a is the normalized output of the accelerometer approximately corresponding to the rotation of the pointer about the x-axis of the environment's coordinate system. The orientation is then computed assuming the pitch angle derived from the accelerometer output reflects an upside-down orientation of the pointer, i.e., $\text{Pitch}_{\text{upside-down}} = -\pi + \arcsin(a)$. A separate estimate of what the magnetometer outputs (m*) should be, given the orientation computed for the right-side up condition and for the upside-down condition, is then computed as follows:
$m^{*} = R^{T} N$, (4)
where N is the direction of magnetic north. m* is the estimated magnetometer output assuming the pointer is in the right-side up condition when R is the orientation computed assuming the pointer was in this condition, whereas m* is the estimated magnetometer output assuming the pointer is in the upside-down condition when R is the orientation computed assuming the pointer was in that condition. The error between the estimated magnetometer outputs (m*) and the actual magnetometer outputs (m) is next computed for both conditions, where the error is defined as $(m^{*} - m)^{T}(m^{*} - m)$. The pointer orientation associated with the lesser of the two error values computed is deemed to be the actual orientation of the pointer. It is noted that the roll angle derived from the accelerometer output could be used to perform a similar error analysis and determine the actual orientation of the pointer.
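The error comparison described above can be sketched as follows, assuming the two candidate orientation matrices have already been computed. The function name and the example matrices are hypothetical; in particular, the inverted candidate is represented here simply as a rotation of π about the x-axis.

```python
import numpy as np

def pick_orientation(R_upright, R_inverted, north, m):
    """Choose the candidate whose predicted magnetometer output best matches m."""
    north = np.asarray(north, dtype=float)
    m = np.asarray(m, dtype=float)
    errors = []
    for R in (R_upright, R_inverted):
        m_est = np.asarray(R, dtype=float).T @ north  # Eq. (4): m* = R^T N
        diff = m_est - m
        errors.append(float(diff @ diff))             # (m* - m)^T (m* - m)
    label = "right-side up" if errors[0] <= errors[1] else "upside-down"
    return label, errors

R_upright = np.eye(3)
R_inverted = np.diag([1.0, -1.0, -1.0])  # rotation of pi about the x-axis
north = [0.6, 0.0, 0.8]                  # hypothetical direction of magnetic north
measured = [0.6, 0.0, -0.8]              # hypothetical actual magnetometer output
label, errors = pick_orientation(R_upright, R_inverted, north, measured)
```

In this example the measured output agrees with the inverted candidate, so the upside-down orientation is selected.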
It is further noted that the 2-axis accelerometer used in the tested versions of the pointer could be replaced with a more complex 3-axis accelerometer, or an additional 1-axis accelerometer or mercury switch oriented in the appropriate direction could be employed, to eliminate the need for the foregoing error computation procedure. This would be possible because it can be determined directly from the “third”-axis readout whether the pointer was right-side up or upside-down with respect to roll. However, this change would add to the complexity of the pointer and must be weighed against the relatively minimal cost of the added processing required to do the error computation procedure.
As indicated previously, both the orientation and location of the pointer within the environment in which it is operating are needed to determine where the user is pointing the device. The position of the pointer within the environment can be determined via various methods, such as using conventional computer vision techniques [1] or ultrasonic acoustic locating systems [2, 3]. While these methods, and their like, could be used successfully, they are relatively complex and often require an expensive infrastructure to implement. A simpler, less costly process was developed for tested versions of the system and will now be described. Specifically, the position of the pointer within the environment is determined with the aid of the two video cameras having IR-pass filters. The cameras are calibrated ahead of time to the environment's coordinate system using conventional calibration methods to establish the camera parameters (both intrinsic and extrinsic) that will be needed to determine the 3D position of the pointing end of the pointer from images captured by the cameras. In operation, the aforementioned IR LED of the pointer is flashed for approximately 3 milliseconds at a rate of approximately 15 Hz by the device's microcontroller. Simultaneously, both cameras are recording the scene at 30 Hz. This means that the IR light in the environment is captured in 1/30th of a second exposures to produce each frame of the video sequence produced by each camera. Referring to the time line depicted in
Once the pointer's location and orientation at a given point in time are known, it is possible to determine where the user is pointing in anticipation of affecting an object in the vicinity. There are numerous methods that can be used to determine the pointed-to location and to identify the object at or near that location. In tested versions of the system, a Gaussian blob scheme is employed to accomplish the foregoing task. This entails first modeling, as 3D Gaussian blobs, all the objects in the environment that the user should be able to affect by pointing at them with the pointer. In other words, the location and extent of each object is modeled as a single 3D Gaussian blob defined by the coordinates of a 3D location in the environment representing the mean μ of the blob and a covariance Σ defining the outside edge of the blob. These multivariate Gaussians are probability distributions that are easily learned from data, and can coarsely represent an object of a given size and orientation.
The modeling of the objects of interest in the environment as Gaussian blobs can be accomplished in any conventional manner. In tested versions of the object selection system, two different methods were employed. Referring to
The computed mean and covariance define the Gaussian blob representing the traced object. This procedure can then be repeated for each object of interest in the environment.
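As a rough illustration of how a blob might be derived from traced points, the sketch below assumes the mean and covariance are the ordinary sample mean and sample covariance of the recorded pointer positions x_i. That choice, the function name and the example points are assumptions made for illustration.

```python
import numpy as np

def blob_from_trace(points):
    """Sample mean and covariance of traced 3D pointer positions (assumed form)."""
    pts = np.asarray(points, dtype=float)
    mu = pts.mean(axis=0)
    centered = pts - mu
    sigma = centered.T @ centered / len(pts)
    return mu, sigma

# Hypothetical trace around the outline of a flat 2 m by 1 m object.
trace = [[0, 0, 0], [2, 0, 0], [2, 1, 0], [0, 1, 0]]
mu, sigma = blob_from_trace(trace)
```

Because every traced point in this example lies in the same plane, the resulting covariance has no extent along the z axis; in practice a small minimum covariance could be added to avoid a degenerate blob.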
An alternate, albeit somewhat more complex, method to model the objects of interest in the environment as Gaussian blobs was also employed in tested versions of the object selection process. This method has particular advantage when an object of interest is out of the line of sight of one or both of the cameras, such as if it were located near a wall below one of the cameras. Since images of the object from both cameras are needed to compute the pointer's location, and so the points xi in the tracing procedure, the previously described target training method cannot be used unless both of the cameras can “see” the object.
Referring to
$x_i + s_i w_i = \mu$ (6)
where xi is the position of the pointer at the ith pointing location, wi is the ray extending in the direction the pointer is pointed from the ith pointing location, and si is an unknown distance to the target object. This defines a linear system of equations that can be solved via a conventional least squares procedure to find the mean location that best fits the data.
The covariance of the Gaussian blob representing the object being modeled is then established (process action 1522). This can be done in a number of ways. First, the covariance could be prescribed or user entered. However, in tested versions of the target training procedure, the covariance of the target object was computed by adding a minimum covariance to the spread of the intersection points, as follows:
$\Sigma = \Sigma_0 + (x_i + s_i w_i - \mu)(x_i + s_i w_i - \mu)^{T}$ (7)
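The least squares solution of Eq. (6) and the covariance of Eq. (7) can be sketched as follows. The unknowns are the three coordinates of μ and one distance s_i per pointing location. The sketch assumes the spread term of Eq. (7) is averaged over all pointing locations, which is an interpretation rather than something stated explicitly; the function name and example values are hypothetical.

```python
import numpy as np

def fit_blob_from_rays(positions, directions, sigma0):
    """Solve mu - s_i * w_i = x_i for mu and every s_i, then form the covariance."""
    x = np.asarray(positions, dtype=float)
    w = np.asarray(directions, dtype=float)
    n = len(x)
    A = np.zeros((3 * n, 3 + n))
    b = x.reshape(-1)
    for i in range(n):
        A[3 * i:3 * i + 3, :3] = np.eye(3)   # coefficients of mu
        A[3 * i:3 * i + 3, 3 + i] = -w[i]    # coefficient of s_i
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    mu, s = solution[:3], solution[3:]
    # Assumed reading of Eq. (7): average the outer products over all locations.
    residuals = x + s[:, None] * w - mu
    sigma = sigma0 + residuals.T @ residuals / n
    return mu, s, sigma

# Three hypothetical pointing locations whose rays all pass through (1, 2, 0).
positions = [[0, 0, 0], [3, 2, 0], [1, 2, 3]]
directions = [[1 / np.sqrt(5), 2 / np.sqrt(5), 0], [-1, 0, 0], [0, 0, -1]]
sigma0 = 0.01 * np.eye(3)
mu, s, sigma = fit_blob_from_rays(positions, directions, sigma0)
```

At least two pointing locations with non-parallel rays are needed for the system to determine μ uniquely.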
It is noted that the aforementioned computations do not take into account that the accuracy in pointing with the pointer is related to the angular error in the calculation of the device's orientation (and so in the ray wi). Thus, a computed pointing location that is far away from the object being modeled is inherently more uncertain than a computed pointing location which is nearby the target. Accordingly, the foregoing target training procedure can be refined by discounting the more remote pointing location to some degree in defining the Gaussian blob representing an object being modeled. This can be accomplished using a weighted least squares approach, as follows:
where Wi is the weight assigned to the ith pointing location, ŝi is an estimate of the distance to the target object, possibly computed using the previous procedure employing the non-weighted least squares approach, c and η are parameters related to the angular error of the pointer, and I is the identity matrix. As before, Eq. (8) is generated for each pointing location to define a linear system of equations that can be solved via the least squares procedure to find the mean location that best fits the data, but this time taking into consideration the angular error associated with the computed orientation of the pointer.
It is noted that the foregoing procedures for computing the mean and covariance of a Gaussian blob representing an object allow the represented shape of the object to be modified by simply adding any number of pointing locations where the pointer is pointed along the body of the target object.
Once a Gaussian blob for each object of interest in the environment has been defined, and stored in the memory of the host computer, the pointer can be used to select an object by simply pointing at it. The user can then affect the object, as mentioned previously. However, first, the processes that allow a user to select a modeled object in the environment using the pointer will be described. These processes are performed each time the host computer receives an orientation message from the pointer.
One simple technique for selecting a modeled object is to evaluate each Gaussian that represents an object of interest in the environment and is intersected by a ray cast by the pointer, at the point along that ray nearest the mean of the Gaussian. The likelihood that the pointer is being pointed at modeled object i is then:
$l_i = g(x + \lVert \mu_i - x \rVert \, w, \Sigma_i)$ (9)
where x is the position of the pointer (as represented by the IR LED), w is a ray extending from x in the direction the pointer is pointed, and g(μ,Σ) is the probability distribution function of the multivariate Gaussian. The object associated with the Gaussian blob exhibiting the highest probability l can then be designated as the selected object.
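A minimal sketch of the selection rule of Eq. (9) follows. Each blob's Gaussian density is evaluated at the point on the pointing ray whose distance from the pointer equals the distance to the blob's mean, and the blob with the highest value is selected. The object names, blob parameters and helper names are hypothetical.

```python
import numpy as np

def gaussian_pdf(point, mu, sigma):
    """Density of a multivariate Gaussian with mean mu and covariance sigma."""
    d = np.asarray(point, dtype=float) - mu
    k = len(mu)
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(sigma))
    return float(np.exp(-0.5 * d @ np.linalg.solve(sigma, d)) / norm)

def select_object(x, w, blobs):
    """Return the name of the blob with the highest likelihood, plus all scores."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    scores = {}
    for name, (mu, sigma) in blobs.items():
        mu = np.asarray(mu, dtype=float)
        point = x + np.linalg.norm(mu - x) * w  # point on the ray, per Eq. (9)
        scores[name] = gaussian_pdf(point, mu, np.asarray(sigma, dtype=float))
    return max(scores, key=scores.get), scores

blobs = {
    "lamp": ([2.0, 0.0, 0.0], 0.04 * np.eye(3)),
    "tv": ([0.0, 2.0, 0.0], 0.04 * np.eye(3)),
}
selected, scores = select_object([0.0, 0.0, 0.0], [1.0, 0.05, 0.0], blobs)
```

Here the ray passes close to the mean of the first blob, so that object is selected.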
Another approach is to project each Gaussian onto a plane normal to either w or μ−x, and then to take the value of the resulting 2D Gaussian at the point where the ray w intersects the plane. This approach can be accomplished as follows. Referring to
It is further noted that the calculation associated with the weighted least squares approach described above can be adopted to estimate the average angular error of the pointer without reference to any ground truth data. This could be useful for correcting the computed pointer orientation direction. If this were the case, then the simpler non-weighted least squares approach could be employed in the alternate target object training procedure, as well as making the object selection process more accurate. The average angular error estimation procedure requires that the pointer be modified by the addition of a laser pointer, which is attached so as to project a laser beam along the pointing direction of the pointer. The user points at the object with the pointer from a position in the environment within the line of sight of both cameras, and depresses the device's button, as was done in the alternate target object training procedure. In this case, this pointing procedure is repeated multiple times at different pointing locations with the user being careful to line up the laser on the same spot on the surface of the target object. This eliminates any error due to the user's pointing accuracy. The orientation and location of the pointer at each pointing location is computed using the procedures described previously. The average angular error is then computed as follows:
wherein i refers to the pointing location in the environment, n refers to the total number of pointing locations, w is a ray originating at the location of the pointing device and extending in a direction defined by the orientation of the device, x is the location of the pointing device, and μ is the location of the mean of the Gaussian blob representing the target object.
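One plausible reading of the average angular error, offered only as an assumption, is the mean over all pointing locations of the angle between the pointing ray w_i and the direction from the pointer location x_i to the blob mean μ. The sketch below implements that reading; the function name and the example values are hypothetical.

```python
import numpy as np

def average_angular_error(positions, directions, mu):
    """Mean angle (radians) between each ray w_i and the direction mu - x_i."""
    x = np.asarray(positions, dtype=float)
    w = np.asarray(directions, dtype=float)
    to_target = np.asarray(mu, dtype=float) - x
    to_target = to_target / np.linalg.norm(to_target, axis=1, keepdims=True)
    w = w / np.linalg.norm(w, axis=1, keepdims=True)
    cosines = np.clip(np.sum(w * to_target, axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cosines)))

# Two hypothetical pointing locations: one aimed exactly, one off by 90 degrees.
positions = [[0, 0, 0], [0, 0, 0]]
directions = [[1, 0, 0], [0, 1, 0]]
error = average_angular_error(positions, directions, mu=[1, 0, 0])
```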
Without reference to ground truth position data, this estimate of error is a measure of the internal accuracy and repeatability of the pointer pointing and target object training procedures. This measure is believed to be more related to the overall performance of the pointer than to an estimate of the error in absolute position and orientation of the device, which is subject to, for instance, the calibration of the cameras to the environment's coordinate frame.
As described above, the orientation and position of the pointer may be found by a combination of sensors and signal processing techniques. This allows an object, which is an electronic component controllable by a computer via a network connection or an extension thereof, to be selected based on a geometric model of the environment containing the object. The selection of a target object is accomplished by a user merely pointing at the object with the pointer for a moment.
Once the object is selected, the electronic device can be controlled by the user informing the computer in some manner of what he or she wants the device to do. As described above, this may be as simple as instructing the computer to turn the device on or off by activating a switch or button on the pointer. However, it is also desirable to control devices in more complex ways than merely turning them on or off. Thus, the user must have some way of relaying the desired command to the computer. One such way is by having the user perform certain gestures with the pointer that the computer will recognize as particular commands. This can be accomplished in a variety of ways.
One approach involves matching a sequence of sensor values output by the pointer and recorded over a period of time, to stored prototype sequences each representing the output of one or more sensors that would be expected if the pointer were manipulated in a prescribed manner. This prescribed manner is the aforementioned gesture. The stored prototype sequences are generated in a training phase for each electronic component it is desired to control via gesturing. To account for the fact that a gesture made by a user during runtime may differ from the gesture performed to create the prototype sequence in terms of speed and amplitude, the aforementioned matching process can not only entail comparing a prototype sequence to the recorded sensor values but also comparing the recorded sensor values to various versions of the prototype that are scaled up and down in amplitude and/or warped in time (i.e., linearly stretched and contracted). The procedure used to generate each prototype sequence associated with a particular gesture is outlined in the flow diagram shown in
During operation, the electronic component control system constantly monitors the incoming pointer orientation messages after an object associated with a controllable electronic component has been selected, to assess whether the user is performing a control gesture applicable to that component. This gesture recognition task is accomplished as follows. Referring to
As mentioned above, the matching process can entail not only comparing a prototype sequence to the recorded sensor values but also comparing the recorded sensor values to various versions of the prototype that are scaled up and down in amplitude and/or warped in time. In tested versions, the amplitude scaling factors ranged from 0.8 to 1.8 in increments of 0.2, and the time warping factors ranged from 0.6 to 2.0 in increments of 0.2. However, while it is believed the aforementioned scaling and warping factors are adequate to cover any reasonable variation in the gesture associated with a prototype sequence, it is noted that different ranges and increments could be used to generate the scaling and warping factors as desired. In fact the increments do not even have to be equal across the range. In practice, the prototype sequence is scaled up or down in amplitude by applying scaling factors to each value in the prototype sequence. Whereas, the prototype sequence is warped in time by applying warping factors that expand or contract the overall sequence in time.
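The generation of scaled and warped prototype versions can be sketched as shown below, using the factor ranges reported for tested versions. The sketch assumes a single-channel prototype sequence and implements linear time warping by resampling with linear interpolation; the function name and the sample prototype are hypothetical.

```python
import numpy as np
from itertools import product

# Factor ranges reported for tested versions (increments of 0.2).
AMPLITUDE_SCALES = np.round(np.arange(0.8, 1.8 + 1e-9, 0.2), 1)
TIME_WARPS = np.round(np.arange(0.6, 2.0 + 1e-9, 0.2), 1)

def transform_prototype(prototype, warp, scale):
    """Linearly stretch or contract the sequence in time, then scale its amplitude."""
    p = np.asarray(prototype, dtype=float)
    new_len = max(2, int(round(len(p) * warp)))
    old_t = np.linspace(0.0, 1.0, len(p))
    new_t = np.linspace(0.0, 1.0, new_len)
    return scale * np.interp(new_t, old_t, p)

prototype = [0.0, 0.5, 1.0, 0.5, 0.0]  # hypothetical single-sensor gesture
versions = {
    (w, s): transform_prototype(prototype, w, s)
    for w, s in product(TIME_WARPS, AMPLITUDE_SCALES)
}
```

With these ranges there are 8 warp factors and 6 scale factors, giving 48 versions of each prototype to compare against the input sequence.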
Essentially, a list is established before initiating the matching process which includes every combination of the scaling and warping factors possible, including the case where one or both of the scaling and warping factors are zero. Note that the instance where both the scaling and warping factors are zero corresponds to the case where the prototype sequence is unmodified. Given this prescribed list, and referring now to
for selected warp w and scale s, where pi(w,s,t) is the recorded sensor value(s) at time step t of the current version of the selected prototype sequence i, x(t) refers to the corresponding sensor values of the input sequence at time step t, and n refers to the length of the current version of the selected prototype sequence pi(w,s) and so the length of x as well. The foregoing process is then repeated for every other combination of the warp and scale factors.
Specifically, it is determined if all the warp and scale factor combinations from the prescribed list have been selected (process action 1908). If not, the process actions 1900 through 1908 are repeated. Once an average match score has been computed for every version of the prototype sequence (including the unmodified sequence), the maximum averaged match score is identified (process action 1910). This maximum averaged match score is the aforementioned similarity indicator for the selected prototype sequence.
Referring once again to
It is noted that the aforementioned prescribed length of the input sequence is made long enough to ensure that the distinguishing characteristics of each gesture are captured therein. This aids in making sure only one gesture is recognized when several gestures are employed in the system to initiate different control actions. In tested versions of the present system employing the foregoing match score procedure, this means making the input sequence as long as the longest of the scaled and warped versions of the prototype sequence. The aforementioned match score threshold is chosen similarly in that it is made large enough to ensure that the distinguishing characteristics of a gesture as captured in the prototype sequence actually exist in the input sequence, and that the final match score computed for any other prototype sequence associated with another gesture not having these distinguishing characteristics will not exceed the threshold.
As to the specific sensor output or outputs that are used to construct the prototype sequences and the input sequence, any combination of the accelerometer, magnetometer and gyroscope outputs contained in each orientation message can be employed. It should be noted, however, that the accelerometer will not provide an output indicative of the change in the yaw angle of the pointer, and the gyroscope will only provide data reflecting a change in the yaw angle of the pointer. Thus, the user could be restricted in the types of motion he or she is allowed to use in creating a gesture if just the accelerometer or gyroscope outputs are employed in the aforementioned sequences. Using fewer output values to characterize the gesture could result in lower processing costs in comparing the prototype and input sequences. However, to give the user complete freedom in choosing the types of motion used to define a gesture, both the accelerometer and gyroscope outputs, or the magnetometer outputs, would have to be included in the sequences. In addition, while the processing costs would be higher, using the outputs from all three sensors could provide better accuracy in characterizing the gesture motions.
The foregoing prototype matching approach has the advantage of allowing the electronic component control system to be trained to recognize gestures choreographed by the user, rather than requiring prescribed gestures to be used. In addition, the user can make the gesture as simple or as complex as he or she desires. A drawback of this approach, however, is that runtime variations of the gesture may involve more than simple scaling of amplitude and linear time warps. Pattern recognition techniques that incorporate multiple training examples, such as hidden Markov models (HMMs) [8], may capture other important variations that may be seen in runtime. However, such techniques model only those variations present in the training data, and so would require the user to perform the desired gesture over and over during the training process, perhaps to the point of making the procedure unacceptably tedious. In addition, for gestures having a short duration, HMMs often give many false positives due to their nonlinear time warping abilities. Thus, the use of an HMM approach should be limited to user-created gestures having longer durations.
In regard to the use of simple and short duration gestures, such as for example a single motion up, down or to either side, an opportunity exists to employ a simplified and perhaps more robust approach to gesture recognition. For such gestures, a recognition strategy can be employed that looks for trends or peaks in one or more of the sensor values output by the pointer. For example, pitching the pointer up may be detected by simply thresholding the output of the accelerometer corresponding to pitch.
In this case, the system is preprogrammed with gesture threshold definitions. Each of the definitions corresponds to a predefined threshold applicable to a particular single sensor output or a set of thresholds applicable to a particular group of sensor outputs. Each definition is associated in the process with a particular gesture, which is in turn known to the system to represent a call for a particular control action to be applied to a particular electronic component that is controllable by the host computer. The thresholds are designed to indicate that the pointer has been moved in a particular direction with an excursion from a starting point which is sufficient to ensure the gesture associated with the threshold or thresholds has occurred. The starting point could be any point desired, but for practical reasons, the starting point in tested versions of the present control system was chosen to be with the pointer pointed at the selected object. Thus, it was necessary for the user to point the pointer at the selected object. Pointing at an object establishes a local coordinate system around the object, so that “up”, “down”, “left” and “right” are relative to where the object appears to the user. For example, “up” in the context of a standing user pointing at an object on the floor means pitching up from a pitched down position, and so on.
It would be possible for the electronic component control system to determine when the user is pointing at the selected object using the procedures described above in connection with determining what the pointer is pointing at for the purpose of selecting that object. However, a simpler method is to have the user depress the button on the pointer whenever he or she is pointing at the object and wants to control the associated electronic device using a gesture. Requiring the user to depress the button while gesturing allows the system to easily determine when a gesture begins. In other words, the system records sensor values only after the user depresses the button, and thus gives a natural origin from which to detect trends in sensor values.
Recognizing gestures using a thresholding technique relies on the gestures being simple and of a short duration. One straightforward way of accomplishing this would be to restrict the gestures to a single movement of the pointer in a prescribed direction. For example, one gesture could be to rotate the pointer upward (i.e., pitch up), while another gesture could be to rotate the pointer downward (i.e., pitch down). Other examples of appropriate gestures would be to pan the pointer to the right (i.e., increase the yaw angle), or to the left (i.e., decrease the yaw angle). The sensor output or outputs used to establish the gesture threshold definitions and to create the input sequence to be discussed shortly are tailored to the gesture. Thus, the accelerometer and/or the magnetometer outputs would be an appropriate choice for the pitch up or pitch down gesture, while the gyroscope output would not. Similarly, the gyroscope and/or the magnetometer outputs would be an appropriate choice for the side-to-side gesture (i.e., changing the yaw angle), while the accelerometer output would not. In general, when a simple one directional gesture is employed to represent a control action, the sensor output or outputs that would best characterize that motion are employed to establish the threshold definitions and the input sequence.
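The thresholding idea outlined above can be illustrated with a minimal Python sketch. It is not the procedure of the tested system: the channel names `pitch` and `yaw`, the threshold magnitudes and the gesture labels are assumptions made for illustration. Samples are assumed to be recorded only while the pointer's button is depressed, so the first sample serves as the origin from which excursions are measured.

```python
from typing import Dict, List, Optional

# Hypothetical threshold definitions: gesture -> (channel, signed excursion).
# "pitch" is assumed to be derived from the accelerometer and/or magnetometer,
# "yaw" from the gyroscope and/or magnetometer. Values are illustrative only.
GESTURE_THRESHOLDS = {
    "pitch_up": ("pitch", +0.5),
    "pitch_down": ("pitch", -0.5),
    "pan_right": ("yaw", +0.5),
    "pan_left": ("yaw", -0.5),
}


def recognize_gesture(samples: List[Dict[str, float]]) -> Optional[str]:
    """Return the first gesture whose threshold is crossed, else None.

    `samples` holds the readings recorded while the button is held down;
    the first sample is the origin (pointer aimed at the selected object).
    """
    if not samples:
        return None
    origin = samples[0]
    for sample in samples[1:]:
        for gesture, (channel, threshold) in GESTURE_THRESHOLDS.items():
            excursion = sample[channel] - origin[channel]
            # A positive threshold is crossed from below, a negative one from above.
            if threshold > 0 and excursion >= threshold:
                return gesture
            if threshold < 0 and excursion <= threshold:
                return gesture
    return None
```

Because every excursion is measured relative to the first recorded sample, the same definitions apply whether the user starts from a pitched-down or a level position, which matches the local coordinate system established by pointing at the object.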
Given the foregoing ground rules, a procedure for gesture recognition based on a thresholding technique will now be described in reference to
The complementary nature of speech and gesture is well established. It has been shown that when naturally gesturing during speech, people will convey different sorts of information than is conveyed by the speech [4]. In more designed settings such as interactive systems, it may also be easier for the user to convey some information with either speech or gesture or a combination of both. For example, suppose the user has selected an object as described previously and that this object is a stereo amplifier controlled via a network connection by the host computer. Existing speech recognition systems would allow a user to control the volume by, for example, saying “up volume” a number of times until the desired volume is reached. However, while such a procedure is possible, it is likely to be more efficient and precise for the user to turn a volume knob on the amplifier. This is where the previously described gesture recognition system can come into play. Rather than having to turn a physical knob on the amplifier, the user would employ the pointer to control the volume by, for example, pointing at the stereo and rolling the pointer clockwise or counterclockwise to respectively turn the volume up or down. The latter procedure can provide the efficiency and accuracy of a physical volume knob, while at the same time providing the convenience of being able to control the volume remotely as in the case of the voice recognition control scheme. This is just one example of a situation where gesturing control is the best choice; there are others. In addition, there are many situations where using voice control would be the best choice. Still further, there are situations where a combination of speech and gesture control would be the most efficient and convenient method. Thus, a combined system that incorporates the previously described gesturing control system and a conventional speech control system would have distinct advantages over either system alone.
To this end, the present invention includes the integration of a conventional speech control system into the gesture control and pointer systems which results in a simple framework for combining the outputs of various modalities such as pointing to target objects and pushing the button on the pointer, pointer gestures, and speech, to arrive at a unified interpretation that instructs a combined environmental control system on an appropriate course of action. This framework decomposes the desired action (e.g., “turn up the volume on the amplifier”) into a command (i.e., “turn up the volume”) and a referent (i.e., “the amplifier”) pair. The referent can be identified using the pointer to select an object in the environment as described previously or using a conventional speech recognition scheme, or both. The command may be specified by pressing the button on the pointer, or by a pointer gesture, or by a speech recognition event, or any combination thereof. Interfaces that allow multiple modes of input are called multimodal interfaces. With this multimodal command/referent representation, it is possible to effect the same action in multiple ways. For example, all the following pointing, speech and gesture actions on the part of the user can be employed in the present control system to turn on a light that is under the control of the host computer:
a) Say “turn on the desk lamp”;
b) Point at the lamp with the pointer and say “turn on”;
c) Point at the lamp with the pointer and perform a “turn on” gesture using the pointer;
d) Say “desk lamp” and perform the “turn on” gesture with the pointer;
e) Say “lamp”, point toward the desk lamp with the pointer rather than other lamps in the environment such as a floor lamp, and perform the “turn on” gesture with the pointer;
f) Point at the lamp with the pointer and press the pointer's button (assuming the default behavior, when the lamp is off and the button is clicked, is to turn the lamp on).
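The command/referent decomposition behind the six alternatives above can be sketched in Python as follows. This is a deterministic simplification meant only to show how different input combinations resolve to the same pair; it is not the probabilistic integration used by the control system. The object identifiers, the `SPEECH_REFERENTS` table and the function name `interpret` are hypothetical.

```python
from typing import NamedTuple, Optional, Set

# Hypothetical mapping from spoken referent phrases to candidate objects.
SPEECH_REFERENTS = {
    "desk lamp": {"desk_lamp"},
    "lamp": {"desk_lamp", "floor_lamp"},  # ambiguous without pointing
}


class Action(NamedTuple):
    command: str
    referent: str


def interpret(speech_command: Optional[str] = None,
              speech_referent: Optional[str] = None,
              pointing_target: Optional[str] = None,
              gesture: Optional[str] = None,
              button_pressed: bool = False) -> Optional[Action]:
    """Resolve the available inputs to a (command, referent) pair, or None."""
    # Referent: start from the spoken candidates and narrow them by pointing.
    candidates: Optional[Set[str]] = None
    if speech_referent is not None:
        candidates = set(SPEECH_REFERENTS.get(speech_referent, set()))
    if pointing_target is not None:
        if candidates is None:
            candidates = {pointing_target}
        else:
            candidates &= {pointing_target}
    if not candidates or len(candidates) != 1:
        return None  # no referent, or the referent is still ambiguous
    referent = next(iter(candidates))

    # Command: speech first, then gesture, then the button's default behavior.
    command = speech_command or gesture
    if command is None and button_pressed:
        command = "turn on"  # assumed default when the lamp is currently off
    if command is None:
        return None
    return Action(command, referent)
```

For instance, `interpret(speech_referent="lamp", pointing_target="desk_lamp", gesture="turn on")` corresponds to alternative e), while the same call without `pointing_target` returns `None` because “lamp” alone matches two objects.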
By unifying the results of pointing, gesture recognition and speech recognition, the overall system is made more robust. For example, a spurious speech recognition event of “volume up” while pointing at the light is ignored, rather than resulting in the volume of an amplifier being increased, as would happen if a speech control scheme were being used alone. Also consider the example given above where the user says “lamp” while pointing toward the desk lamp with the pointer rather than other lamps in the environment, and performing the “turn on” gesture with the pointer. In that example just saying “lamp” is ambiguous, but pointing at the desired lamp clears up the uncertainty. Thus, by including the strong contextualization provided by the pointer, the speech recognition may be made more robust [5].
The speech recognition system employed in the tested versions of the present invention is Microsoft Corporation's Speech API (SAPI), which employs a very simple command and control style context-free grammar (CFG), with preset utterances for the various electronic components and simple command phrases that apply to the components. The user wears a wireless lapel microphone to relay voice commands to a receiver which is connected to the host computer and which relays the received speech commands to the speech recognition system running on the host computer.
There is still a question as to how to take in the various inputs from the pointer, gesture recognition and speech recognition events, some of which may be complementary or even contradictory, and best determine what action the user wants performed and on what electronic component. While various computational frameworks could be employed, the multimodal integration process employed in the present control system uses a dynamic Bayes network [6] which encodes the various ways that sensor outputs may be combined to identify the intended referent and command, and initiate the proper action.
The identity of the referent, the desired command and the appropriate action are all determined by combining the outputs of the speech recognition system, gesture recognition system and pointing analysis processes using a dynamic Bayes network architecture. Bayes networks have a number of advantages that make them appropriate to this task. First, it is easy to break apart and treat separately dependencies that otherwise would be embedded in a very large table over all the variables of interest. Secondly, Bayes networks are adept at handling probabilistic (noisy) inputs. And further, the network represents ambiguity and incomplete information that may be used appropriately by the system. In essence the Bayes network preserves ambiguities from one time step to the next while waiting for enough information to become available to make a decision as to what referent, command or action is intended. It is even possible for the network to act proactively when not enough information is available to make a decision. For example, if the user doesn't point at the lamp, the system might ask which lamp is meant after the utterance “lamp”.
However, the Bayes network architecture is chosen primarily to exploit the redundancy of the user's interaction so as to increase confidence that the proper action is being implemented. The user may specify commands in a variety of ways, even though the designer specified only objects to be pointed to, utterances to recognize and gestures to recognize (as well as how referents and commands combine to result in action). For example, it is natural for a person to employ deictic (pointing) gestures in conjunction with speech to relay information where the speech is consistent with and reinforces the meaning of the gesture. Thus, the user will often naturally indicate the referent and command applicable to a desired resulting action via both speech and gesturing. This includes most frequently pointing at an object the user wants to affect.
The Bayes network architecture also allows the state of various devices to be incorporated to make the interpretation more robust. For example, if the light is already on, the system may be less disposed to interpret a gesture or utterance as a “turn on” gesture or utterance. In terms of the network, the associated probability distribution over the nodes representing the light and its parents, the Action and Referent nodes, is configured so that the only admissible action when the light is on is to turn it off, and likewise when it is off the only action available is to turn it on.
Still further, the “dynamic” nature of the dynamic Bayes network can be exploited advantageously. The network is dynamic because it has a mechanism by which it maintains a short-term memory of certain values in its network. It is natural that the referent will not be determined at exactly the same moment in time as the command. In other words, a user will not typically specify the referent by whatever mode (e.g., pointing and/or speech) at the same time he or she relays the desired command using one of the various methods available (e.g., pointer button push, pointer gesture and/or speech). If the referent is identified only to be forgotten in the next instant of time, the association with a command that comes after it will be lost. The dynamic Bayes network models the likelihood of a referent or a command applying to future time steps as a dynamic process. Specifically, this is done via a temporal integration process in which probabilities assigned to referents and commands in the last time step are brought forward to the current time step and are input along with new speech, pointing and gesture inputs to influence the probability distribution computed for the referents and commands in the current time step. In this way the network tends to hold a memory of a command and referent which decays over time, and it is thus unnecessary to specify the command and referent at exactly the same moment in time. It is noted that in the tested implementation of the Bayes network, this propagation occurred four times a second.
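The decaying memory described above can be illustrated with a small Python sketch. It is a simplified stand-in for the temporal integration step, not the network used in the tested implementation: the `persistence` and `trust` parameters, their values and the "none" state are assumptions. Only the rate of four propagations per second comes from the description.

```python
STEPS_PER_SECOND = 4  # propagation rate reported for the tested implementation


def step(prior, observation=None, persistence=0.8, trust=0.9):
    """Carry a belief over referents (or commands) forward by one time step.

    prior:       dict mapping value -> probability from the last time step
    observation: value indicated by new input in this step, or None
    persistence: assumed fraction of belief retained per step (memory decay)
    trust:       assumed weight given to a new observation
    """
    # Bring last step's probabilities forward, letting them decay toward "none".
    belief = {v: persistence * p for v, p in prior.items() if v != "none"}
    belief["none"] = 1.0 - sum(belief.values())
    if observation is not None:
        # New input shifts most of the probability mass to the observed value.
        belief = {v: (1.0 - trust) * p for v, p in belief.items()}
        belief[observation] = belief.get(observation, 0.0) + trust
    return belief


def run(observations, persistence=0.8, trust=0.9):
    """Apply `step` to a sequence of per-step observations (None = no input)."""
    belief = {"none": 1.0}
    history = []
    for obs in observations:
        belief = step(belief, obs, persistence, trust)
        history.append(belief)
    return history
```

With these assumed values, a referent indicated once and then left alone for one second (four steps) retains a belief of 0.9 × 0.8⁴ ≈ 0.37, so a command arriving shortly afterward can still be associated with it, while a much later command cannot.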
An example of a Bayes network architecture implemented for the present electronic component control system is shown in
In addition, the referent node 2112 is also influenced by inputs from other nodes indicating that the user is pointing at a particular target object (PointingTarget node 2116) and that the user has specified a particular referent verbally (SpeechReferent node 2118).
The Command node 2100 and the Referent node 2112 (via a ReferentClass node 2120) in turn influence the Action node 2122, as do various device state nodes represented by Light1 node 2124, Light2 node 2126 and Light3 node 2128. The ReferentClass node 2120 maps each referent to a class type (e.g., Light1 and Light2 might both be “X10” type lights). This allows actions to be specified over a set of commands and the referent class (rather than each referent instance). Such an approach is an efficient way of setting up the network as typically multiple referents in an environment will work similarly. Without this node 2120, it would be necessary to specify a command and action over each referent even though they would likely be the same within the same class of devices.
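The role of the referent class can be sketched as a pair of lookup tables in Python. This is a deterministic illustration under stated assumptions, not the network's conditional probability tables: the "Amplifier" referent, the class labels, the command strings and the action names are hypothetical, apart from the suggestion in the text that Light1 and Light2 might both be “X10” type lights.

```python
# Each referent instance maps to a class; actions are defined once per class.
REFERENT_CLASS = {
    "Light1": "X10",
    "Light2": "X10",
    "Amplifier": "Amp",  # hypothetical additional referent
}

# Hypothetical (command, referent class) -> action table.
ACTIONS_BY_CLASS = {
    ("turn_on", "X10"): "TurnOnLight",
    ("turn_off", "X10"): "TurnOffLight",
    ("volume_up", "Amp"): "IncreaseVolume",
}


def lookup_action(command, referent):
    """Return the action for a command applied to a referent, or None."""
    referent_class = REFERENT_CLASS.get(referent)
    return ACTIONS_BY_CLASS.get((command, referent_class))
```

Adding another light of the same type then requires only one new entry in `REFERENT_CLASS`. A command that has no entry for the referent's class, such as `lookup_action("volume_up", "Light1")`, yields no action.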
The device state nodes indicate the current state of a device where that information is important to the control system. For example, if a device state node represents the state of a light (e.g., light 1), the node could indicate whether the light is on or off. It is noted that a device state node only influences the action node 2122 when the referent node 2112 indicates that the electronic component associated with the device state node is the referent. Finally, a SpeechAction node 2130 can also provide an input that influences the action node 2122 and so the action ultimately performed by the host computer. The speech action input is a way to completely specify the Action from a single utterance, thereby bypassing the whole dichotomy of Command and Referent. For example, SpeechAction node 2130 might map to a speech recognition utterance of “turn on the light” as a single unit, rather than saying “turn on” (Command) and “the light” (Referent). This node 2130 can also be useful when an utterance does not fit into the Command/Referent structure, but maps to Actions anyway. For example, the utterance “make it brighter in here” can be mapped to an Action of turning on a light, even though no specific Command or Referent was specified in the utterance.
Typically, the particular electronic component corresponding to the referent, and in many cases the particular command given by the user to affect the referent, dictate what the action is to be. However, the aforementioned device states can also play into this by restricting the number of possible actions if the device state applies to the referent. For example, assume the pointer is pointing at light 1. As a result the PointingTarget node in the Bayes network is “set” to Light1. This causes the referent node to also be “set” to Light1, assuming there are no other contrary influencing inputs to the node. In addition, as the referent is set to Light1, the state of this light will influence the Action node. Assume the light is on. Also assume there are only two possible actions in this case, i.e., turn the light off if it is on, or do nothing. Thus, the possible actions are limited and so when a command is input (e.g., the speech command to “turn off”), the confidence level will be high that this is the correct action in the circumstances. This added influence on the Action node causes the probability distribution of the node to collapse to “TurnOffLight”. The system then takes the appropriate action to turn off the light.
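The worked example above can be mirrored by a small enumeration in Python that marginalizes over the referent and command beliefs. It is a sketch under stated assumptions rather than the actual network: the probability values, the command strings and the per-referent action names (e.g., "TurnOffLight1") are hypothetical, and a deterministic rule stands in for the conditional distribution of the Action node.

```python
from itertools import product


def resolve(referent, command, is_on):
    """Deterministic stand-in for P(Action | Referent, Command, device state)."""
    if command == "turn_off" and is_on:
        return "TurnOff" + referent
    if command == "turn_on" and not is_on:
        return "TurnOn" + referent
    return "DoNothing"  # command is inadmissible for the current device state


def infer_action(referent_probs, command_probs, device_on):
    """Sum over referent/command combinations to obtain P(Action)."""
    action_probs = {}
    for (ref, p_ref), (cmd, p_cmd) in product(referent_probs.items(),
                                              command_probs.items()):
        # Only the state of the candidate referent itself affects the action.
        action = resolve(ref, cmd, device_on[ref])
        action_probs[action] = action_probs.get(action, 0.0) + p_ref * p_cmd
    return action_probs


# Pointing sets the referent to Light1 with high confidence; Light1 is on.
beliefs = infer_action(
    referent_probs={"Light1": 0.95, "Light2": 0.05},
    command_probs={"turn_off": 0.9, "none": 0.1},
    device_on={"Light1": True, "Light2": False},
)
best_action = max(beliefs, key=beliefs.get)
```

Under these assumed numbers most of the probability mass lands on turning off Light1, with the remainder on doing nothing, which is the collapse of the Action distribution described in the text.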
A prototype of the foregoing electronic component control system was constructed and used to control a variety of devices in a living room-like scenario. Specifically, the user was able to control the following electronic components using the pointer and a series of simple voice commands.
A user is able to turn multiple lights in the room on and off by pointing the pointer at a light and depressing the button on the pointer. The user then utters the phrase “turn on” or “turn off”, as desired, to turn the light on or off. In addition, a selected light may be dimmed or brightened via gesturing by respectively rotating the pointer down or up while pointing at the light.
A user is also able to control a media player. Specifically, the user points the pointer at the host computer's monitor where the media player's GUI is displayed, and depresses the pointer's button to start the player or to pause it. The user can also roll the pointer to the left or right to change the volume, and can gesture up or down to move to the previous or next track in the play list. “Volume up”, “volume down”, “next” and “previous” utterances command the player accordingly.
A user can point at a computer display and click the pointer's button to give control of the cursor to the pointer. The cursor is then moved around the display's screen by pointing the pointer around the screen [7]. The pointer's button acts as the left mouse button. Clicking on a special icon in the corner of the display exits the cursor control mode.
A user can also point the pointer at a special computer controlled array of red, green, and blue lights to brighten them over time. When the user points away, the color gradually decays. Rolling the pointer to the left or right changes the red, green and blue combination sent to the light, changing the light's color.
It is noted that for the prototype system, an audio feedback scheme was employed where an audible sound was generated by the host computer when the selected target changed. This audio feedback assures the user that the desired object has been selected, and therefore assists in the selection process. In addition, one of the aforementioned visible spectrum LEDs on the pointer (in this case the green one) was lit via a command from the host computer when the pointer was pointing at an object known to the system.
It is noted that this feedback feature could be expanded beyond that implemented in the prototype. The pointer described previously preferably has two differently colored visible spectrum LEDs with which to provide feedback to the user. For example, these could be used to indicate to the user that an input of some kind was not understood by the component control system. Thus, if for instance the voice recognition system did not understand a command or an identification of a referent, the control system could cause one of the visible LEDs (e.g., the red one) to light up. The visible spectrum LEDs could even be used to provide the status of a device associated with an object that the user has selected. For instance, one of the LEDs could be illuminated to indicate the device was on, while the other would indicate the device was off. Or, for example, the intensity of one of the LEDs could be varied in proportion to the volume setting on a stereo amplifier. These are just a few examples of the types of feedback that the visible spectrum LEDs can provide; many others are possible.
This application is a continuation of U.S. application Ser. No. 11/156,873, filed Jun. 17, 2005, now U.S. Pat. No. 7,596,767, which is a continuation of U.S. application Ser. No. 10/160,659, filed May 31, 2002, now U.S. Pat. No. 6,990,639, which claims the benefit of U.S. Provisional Application No. 60/355,368, filed Feb. 7, 2002.
Number | Name | Date | Kind |
---|---|---|---|
4288078 | Lugo | Sep 1981 | A |
4627620 | Yang | Dec 1986 | A |
4630910 | Ross et al. | Dec 1986 | A |
4645458 | Williams | Feb 1987 | A |
4695953 | Blair et al. | Sep 1987 | A |
4702475 | Elstein et al. | Oct 1987 | A |
4711543 | Blair et al. | Dec 1987 | A |
4751642 | Silva et al. | Jun 1988 | A |
4796997 | Svetkoff et al. | Jan 1989 | A |
4809065 | Harris et al. | Feb 1989 | A |
4817950 | Goo | Apr 1989 | A |
4839838 | LaBiche et al. | Jun 1989 | A |
4843568 | Krueger et al. | Jun 1989 | A |
4893183 | Nayar | Jan 1990 | A |
4901362 | Terzian | Feb 1990 | A |
4925189 | Braeunig | May 1990 | A |
5101444 | Wilson et al. | Mar 1992 | A |
5109537 | Toki | Apr 1992 | A |
5148154 | MacKay et al. | Sep 1992 | A |
5177311 | Suzuki et al. | Jan 1993 | A |
5181181 | Glynn | Jan 1993 | A |
5184295 | Mann | Feb 1993 | A |
5229754 | Aoki et al. | Jun 1993 | A |
5229756 | Kosugi et al. | Jul 1993 | A |
5239463 | Blair et al. | Aug 1993 | A |
5239464 | Blair et al. | Aug 1993 | A |
5288078 | Capper et al. | Feb 1994 | A |
5295491 | Gevins | Mar 1994 | A |
5310192 | Miyake | May 1994 | A |
5320538 | Baum | Jun 1994 | A |
5347306 | Nitta | Sep 1994 | A |
5385519 | Hsu et al. | Jan 1995 | A |
5405152 | Katanics et al. | Apr 1995 | A |
5414643 | Blackman et al. | May 1995 | A |
5417210 | Funda et al. | May 1995 | A |
5423554 | Davis | Jun 1995 | A |
5453758 | Sato | Sep 1995 | A |
5454043 | Freeman | Sep 1995 | A |
5469740 | French et al. | Nov 1995 | A |
5485565 | Saund | Jan 1996 | A |
5495576 | Ritchey | Feb 1996 | A |
5502803 | Yoshida et al. | Mar 1996 | A |
5516105 | Eisenbrey et al. | May 1996 | A |
5524637 | Erickson et al. | Jun 1996 | A |
5528263 | Platzker et al. | Jun 1996 | A |
5534917 | MacDougall | Jul 1996 | A |
5554980 | Hashimoto et al. | Sep 1996 | A |
5555003 | Montgomery et al. | Sep 1996 | A |
5559925 | Austin | Sep 1996 | A |
5563988 | Maes et al. | Oct 1996 | A |
5570113 | Zetts | Oct 1996 | A |
5572651 | Weber et al. | Nov 1996 | A |
5577981 | Jarvik | Nov 1996 | A |
5580249 | Jacobsen et al. | Dec 1996 | A |
5581276 | Cipolla et al. | Dec 1996 | A |
5587558 | Matsushima | Dec 1996 | A |
5592401 | Kramer | Jan 1997 | A |
5594469 | Freeman et al. | Jan 1997 | A |
5597309 | Riess | Jan 1997 | A |
5598187 | Ide et al. | Jan 1997 | A |
5598523 | Fujita | Jan 1997 | A |
5600765 | Ando | Feb 1997 | A |
5615132 | Horton et al. | Mar 1997 | A |
5616078 | Oh | Apr 1997 | A |
5617312 | Iura et al. | Apr 1997 | A |
5638300 | Johnson | Jun 1997 | A |
5641288 | Zaenglein | Jun 1997 | A |
5666138 | Culver | Sep 1997 | A |
5682196 | Freeman | Oct 1997 | A |
5682229 | Wangler | Oct 1997 | A |
5687254 | Poon | Nov 1997 | A |
5690582 | Ulrich et al. | Nov 1997 | A |
5703367 | Hashimoto et al. | Dec 1997 | A |
5703623 | Hall et al. | Dec 1997 | A |
5704837 | Iwasaki et al. | Jan 1998 | A |
5715834 | Bergamasco et al. | Feb 1998 | A |
5719622 | Conway | Feb 1998 | A |
5724106 | Autry et al. | Mar 1998 | A |
5732227 | Kuzunuki et al. | Mar 1998 | A |
5741185 | Kwan et al. | Apr 1998 | A |
5748186 | Raman | May 1998 | A |
5757360 | Nitta et al. | May 1998 | A |
5801704 | Oohara et al. | Sep 1998 | A |
5801943 | Nasburg | Sep 1998 | A |
5819206 | Horton | Oct 1998 | A |
5825350 | Case et al. | Oct 1998 | A |
5828779 | Maggioni | Oct 1998 | A |
5835078 | Arita et al. | Nov 1998 | A |
5862256 | Zetts | Jan 1999 | A |
5864808 | Ando et al. | Jan 1999 | A |
5867158 | Murasaki | Feb 1999 | A |
5874941 | Yamada | Feb 1999 | A |
5874942 | Walker | Feb 1999 | A |
5875108 | Hoffberg et al. | Feb 1999 | A |
5875257 | Marrin et al. | Feb 1999 | A |
5877748 | Redlich | Mar 1999 | A |
5877803 | Wee et al. | Mar 1999 | A |
5878274 | Kono et al. | Mar 1999 | A |
5884249 | Namba et al. | Mar 1999 | A |
5902968 | Sato et al. | May 1999 | A |
5909189 | Blackman et al. | Jun 1999 | A |
5913727 | Ahdoot | Jun 1999 | A |
5920024 | Moore | Jul 1999 | A |
5929844 | Barnes | Jul 1999 | A |
5933125 | Fernie | Aug 1999 | A |
5947868 | Dugan | Sep 1999 | A |
5953683 | Hansen et al. | Sep 1999 | A |
5959574 | Poore, Jr. | Sep 1999 | A |
5980256 | Carmein | Nov 1999 | A |
5989157 | Walton | Nov 1999 | A |
5995649 | Marugame | Nov 1999 | A |
5999799 | Hu et al. | Dec 1999 | A |
6002808 | Freeman | Dec 1999 | A |
6005548 | Latypov et al. | Dec 1999 | A |
6009210 | Kang | Dec 1999 | A |
6021403 | Horvitz et al. | Feb 2000 | A |
6053814 | Pchenitchnikov | Apr 2000 | A |
6054991 | Crane et al. | Apr 2000 | A |
6058349 | Kikori et al. | May 2000 | A |
6066075 | Poulton | May 2000 | A |
6072467 | Walker | Jun 2000 | A |
6072494 | Nguyen | Jun 2000 | A |
6073489 | French et al. | Jun 2000 | A |
6077201 | Cheng et al. | Jun 2000 | A |
6084572 | Yaniger et al. | Jul 2000 | A |
6097374 | Howard | Aug 2000 | A |
6098458 | French et al. | Aug 2000 | A |
6100896 | Strohecker et al. | Aug 2000 | A |
6101289 | Kellner | Aug 2000 | A |
6111580 | Kazama et al. | Aug 2000 | A |
6125337 | Rosenberg et al. | Sep 2000 | A |
6128003 | Smith et al. | Oct 2000 | A |
6130677 | Kunz | Oct 2000 | A |
6133830 | D'Angelo et al. | Oct 2000 | A |
6141463 | Covell et al. | Oct 2000 | A |
6144366 | Numazaki et al. | Nov 2000 | A |
6147678 | Kumar et al. | Nov 2000 | A |
6150947 | Shima | Nov 2000 | A |
6152856 | Studor et al. | Nov 2000 | A |
6159100 | Smith | Dec 2000 | A |
6162123 | Woolston | Dec 2000 | A |
6173066 | Peurach et al. | Jan 2001 | B1 |
6181343 | Lyons | Jan 2001 | B1 |
6184863 | Sibert | Feb 2001 | B1 |
6188777 | Darrell et al. | Feb 2001 | B1 |
6195104 | Lyons | Feb 2001 | B1 |
6215890 | Matsuo | Apr 2001 | B1 |
6215898 | Woodfill et al. | Apr 2001 | B1 |
6222465 | Kumar et al. | Apr 2001 | B1 |
6226388 | Qiane et al. | May 2001 | B1 |
6226396 | Marugame | May 2001 | B1 |
6229102 | Sato et al. | May 2001 | B1 |
6229526 | Berstis | May 2001 | B1 |
6229913 | Nayar et al. | May 2001 | B1 |
6241609 | Rutgers | Jun 2001 | B1 |
6244873 | Hill et al. | Jun 2001 | B1 |
6249606 | Kiraly et al. | Jun 2001 | B1 |
6251011 | Yamazaki | Jun 2001 | B1 |
6256033 | Nguyen | Jul 2001 | B1 |
6256047 | Isobe et al. | Jul 2001 | B1 |
6256400 | Takata et al. | Jul 2001 | B1 |
6266061 | Doi et al. | Jul 2001 | B1 |
6269172 | Rehg et al. | Jul 2001 | B1 |
6275212 | Ohtani et al. | Aug 2001 | B1 |
6275214 | Hansen | Aug 2001 | B1 |
6283860 | Lyons et al. | Sep 2001 | B1 |
6287198 | Mccauley | Sep 2001 | B1 |
6289112 | Jain et al. | Sep 2001 | B1 |
6299308 | Voronka et al. | Oct 2001 | B1 |
6300933 | Nagasaki | Oct 2001 | B1 |
6301370 | Steffens et al. | Oct 2001 | B1 |
6308565 | French et al. | Oct 2001 | B1 |
6311159 | Van Tichelen et al. | Oct 2001 | B1 |
6312335 | Tosaki et al. | Nov 2001 | B1 |
6316934 | Amorai-Moriya et al. | Nov 2001 | B1 |
6344861 | Naughton et al. | Feb 2002 | B1 |
6345111 | Yamaguchi | Feb 2002 | B1 |
6359610 | Shah et al. | Mar 2002 | B1 |
6362842 | Tahara et al. | Mar 2002 | B1 |
6363160 | Bradski et al. | Mar 2002 | B1 |
6366273 | Rosenberg et al. | Apr 2002 | B1 |
6369794 | Sakurai | Apr 2002 | B1 |
6375572 | Masuyama | Apr 2002 | B1 |
6377296 | Zlatsin et al. | Apr 2002 | B1 |
6377396 | Zlatsin et al. | Apr 2002 | B1 |
6384737 | Hsu et al. | May 2002 | B1 |
6384819 | Hunter | May 2002 | B1 |
6411278 | Kage et al. | Jun 2002 | B1 |
6411744 | Edwards | Jun 2002 | B1 |
6419580 | Ito | Jul 2002 | B1 |
6421453 | Kanevsky et al. | Jul 2002 | B1 |
6430997 | French et al. | Aug 2002 | B1 |
6464139 | Wilz, Sr. et al. | Oct 2002 | B1 |
6469633 | Wachter | Oct 2002 | B1 |
6476834 | Doval et al. | Nov 2002 | B1 |
6496598 | Harman | Dec 2002 | B1 |
6499025 | Horvitz | Dec 2002 | B1 |
6502082 | Toyama et al. | Dec 2002 | B1 |
6503195 | Keller et al. | Jan 2003 | B1 |
6509889 | Kamper et al. | Jan 2003 | B2 |
6517438 | Tosaki et al. | Feb 2003 | B2 |
6539931 | Trajkovic et al. | Apr 2003 | B2 |
6542621 | Brill et al. | Apr 2003 | B1 |
6545661 | Goschy et al. | Apr 2003 | B1 |
6569019 | Cochran | May 2003 | B2 |
6570555 | Prevost et al. | May 2003 | B1 |
6573883 | Bartlett | Jun 2003 | B1 |
6591236 | Lewis et al. | Jul 2003 | B2 |
6594616 | Zhang et al. | Jul 2003 | B2 |
6597342 | Haruta | Jul 2003 | B1 |
6600475 | Gutta et al. | Jul 2003 | B2 |
6601055 | Roberts | Jul 2003 | B1 |
6603488 | Humpleman et al. | Aug 2003 | B2 |
6633294 | Rosenthal et al. | Oct 2003 | B1 |
6640202 | Dietz et al. | Oct 2003 | B1 |
6640337 | Lu | Oct 2003 | B1 |
6641482 | Masuyama | Nov 2003 | B2 |
6661918 | Gordon et al. | Dec 2003 | B1 |
6672467 | Merkel et al. | Jan 2004 | B2 |
6681031 | Cohen et al. | Jan 2004 | B2 |
6714665 | Hanna et al. | Mar 2004 | B1 |
6720949 | Pryor | Apr 2004 | B1 |
6727885 | Ishino | Apr 2004 | B1 |
6731799 | Sun et al. | May 2004 | B1 |
6734847 | Baldeweg et al. | May 2004 | B1 |
6738066 | Nguyen | May 2004 | B1 |
6739974 | Kanno et al. | May 2004 | B2 |
6744420 | Mohri | Jun 2004 | B2 |
6750848 | Pryor | Jun 2004 | B1 |
6752719 | Himoto | Jun 2004 | B2 |
6753879 | Deleeuw | Jun 2004 | B1 |
6761637 | Weston et al. | Jul 2004 | B2 |
6765726 | French et al. | Jul 2004 | B2 |
6766066 | Kitazawa | Jul 2004 | B2 |
6788809 | Grzeszczuk et al. | Sep 2004 | B1 |
6791531 | Johnston et al. | Sep 2004 | B1 |
6795567 | Cham et al. | Sep 2004 | B1 |
6801637 | Voronka et al. | Oct 2004 | B2 |
6804396 | Higaki et al. | Oct 2004 | B2 |
6821206 | Ishida et al. | Nov 2004 | B1 |
6868383 | Bangalore et al. | Mar 2005 | B1 |
6873723 | Aucsmith et al. | Mar 2005 | B1 |
6876496 | French et al. | Apr 2005 | B2 |
6888960 | Penev et al. | May 2005 | B2 |
6890262 | Oishi et al. | May 2005 | B2 |
6894716 | Harrington | May 2005 | B1 |
6906700 | Armstrong | Jun 2005 | B1 |
6918829 | Ikariko | Jul 2005 | B2 |
6921332 | Fukunaga et al. | Jul 2005 | B2 |
6929543 | Ueshima | Aug 2005 | B1 |
6931384 | Horvitz et al. | Aug 2005 | B1 |
6937742 | Roberts et al. | Aug 2005 | B2 |
6938220 | Shigematsu et al. | Aug 2005 | B1 |
6949024 | Kaku et al. | Sep 2005 | B2 |
6950534 | Cohen et al. | Sep 2005 | B2 |
6951515 | Ohshima | Oct 2005 | B2 |
6980312 | Czyszczewski et al. | Dec 2005 | B1 |
6990639 | Wilson | Jan 2006 | B2 |
7003134 | Covell et al. | Feb 2006 | B1 |
7007236 | Dempski et al. | Feb 2006 | B2 |
7016540 | Gong et al. | Mar 2006 | B1 |
7027039 | Henty | Apr 2006 | B1 |
7028269 | Cohen-Solal | Apr 2006 | B1 |
7036094 | Cohen et al. | Apr 2006 | B1 |
7036462 | Cohen | May 2006 | B2 |
7038855 | French et al. | May 2006 | B2 |
7039676 | Day et al. | May 2006 | B1 |
7042440 | Pryor et al. | May 2006 | B2 |
7050606 | Paul et al. | May 2006 | B2 |
7056216 | Ohshima | Jun 2006 | B2 |
7058204 | Hildreth et al. | Jun 2006 | B2 |
7060957 | Lange et al. | Jun 2006 | B2 |
7070500 | Nomi et al. | Jul 2006 | B1 |
7095401 | Liu et al. | Aug 2006 | B2 |
7102616 | Sleator | Sep 2006 | B1 |
7113918 | Ahmad et al. | Sep 2006 | B1 |
7121946 | Paul et al. | Oct 2006 | B2 |
7137126 | Coffman | Nov 2006 | B1 |
7148813 | Bauer | Dec 2006 | B2 |
7170492 | Bell | Jan 2007 | B2 |
7183480 | Nishitani et al. | Feb 2007 | B2 |
7184048 | Hunter | Feb 2007 | B2 |
7202898 | Braun et al. | Apr 2007 | B1 |
7222078 | Abelow | May 2007 | B2 |
7225414 | Sharma et al. | May 2007 | B1 |
7227526 | Hildreth et al. | Jun 2007 | B2 |
7259747 | Bell | Aug 2007 | B2 |
7262760 | Liberty | Aug 2007 | B2 |
7274800 | Nefian et al. | Sep 2007 | B2 |
7308112 | Fujimura et al. | Dec 2007 | B2 |
7317836 | Fujimura et al. | Jan 2008 | B2 |
7331856 | Nakamura et al. | Feb 2008 | B1 |
7348963 | Bell | Mar 2008 | B2 |
7359121 | French et al. | Apr 2008 | B2 |
7367887 | Watabe et al. | May 2008 | B2 |
7372977 | Fujimura et al. | May 2008 | B2 |
7379563 | Shamaie | May 2008 | B2 |
7379566 | Hildreth | May 2008 | B2 |
7389591 | Jaiswal et al. | Jun 2008 | B2 |
7394346 | Bodin | Jul 2008 | B2 |
7412077 | Li et al. | Aug 2008 | B2 |
7414611 | Liberty | Aug 2008 | B2 |
7421093 | Hildreth et al. | Sep 2008 | B2 |
7430312 | Gu | Sep 2008 | B2 |
7436496 | Kawahito | Oct 2008 | B2 |
7450736 | Yang et al. | Nov 2008 | B2 |
7452275 | Kuraishi | Nov 2008 | B2 |
7460690 | Cohen et al. | Dec 2008 | B2 |
7477236 | Ofek et al. | Jan 2009 | B2 |
7489812 | Fox et al. | Feb 2009 | B2 |
7492367 | Mahajan | Feb 2009 | B2 |
7536032 | Bell | May 2009 | B2 |
7552403 | Wilson | Jun 2009 | B2 |
7555142 | Hildreth et al. | Jun 2009 | B2 |
7560701 | Oggier et al. | Jul 2009 | B2 |
7570805 | Gu | Aug 2009 | B2 |
7574020 | Shamaie | Aug 2009 | B2 |
7576727 | Bell | Aug 2009 | B2 |
7590262 | Fujimura et al. | Sep 2009 | B2 |
7593552 | Higaki et al. | Sep 2009 | B2 |
7596767 | Wilson | Sep 2009 | B2 |
7598942 | Underkoffler et al. | Oct 2009 | B2 |
7607509 | Schmiz et al. | Oct 2009 | B2 |
7620202 | Fujimura et al. | Nov 2009 | B2 |
7665041 | Wilson et al. | Feb 2010 | B2 |
7668340 | Cohen et al. | Feb 2010 | B2 |
7680298 | Roberts et al. | Mar 2010 | B2 |
7683954 | Ichikawa et al. | Mar 2010 | B2 |
7684592 | Paul et al. | Mar 2010 | B2 |
7701439 | Hillis et al. | Apr 2010 | B2 |
7702130 | Im et al. | Apr 2010 | B2 |
7704135 | Harrison, Jr. | Apr 2010 | B2 |
7710391 | Bell et al. | May 2010 | B2 |
7721231 | Wilson | May 2010 | B2 |
7729530 | Antonov et al. | Jun 2010 | B2 |
7746345 | Hunter | Jun 2010 | B2 |
7760182 | Ahmad et al. | Jul 2010 | B2 |
7809167 | Bell | Oct 2010 | B2 |
7823089 | Wilson | Oct 2010 | B2 |
7833100 | Dohta | Nov 2010 | B2 |
7834846 | Bell | Nov 2010 | B1 |
7852262 | Namineni et al. | Dec 2010 | B2 |
7890199 | Inagaki | Feb 2011 | B2 |
RE42256 | Edwards | Mar 2011 | E |
7898522 | Hildreth et al. | Mar 2011 | B2 |
7988558 | Sato | Aug 2011 | B2 |
8035612 | Bell et al. | Oct 2011 | B2 |
8035614 | Bell et al. | Oct 2011 | B2 |
8035624 | Bell et al. | Oct 2011 | B2 |
8072424 | Liberty | Dec 2011 | B2 |
8072470 | Marks | Dec 2011 | B2 |
8132126 | Wilson | Mar 2012 | B2 |
8271287 | Kermani | Sep 2012 | B1 |
8456419 | Wilson | Jun 2013 | B2 |
8553094 | Lin | Oct 2013 | B2 |
8707216 | Wilson | Apr 2014 | B2 |
8745541 | Wilson et al. | Jun 2014 | B2 |
8747224 | Miyazaki et al. | Jun 2014 | B2 |
8814688 | Barney et al. | Aug 2014 | B2 |
8834271 | Ikeda | Sep 2014 | B2 |
8858336 | Sawano et al. | Oct 2014 | B2 |
8952894 | Wilson | Feb 2015 | B2 |
9652042 | Wilson et al. | May 2017 | B2 |
20010010514 | Ishino | Aug 2001 | A1 |
20020004422 | Tosaki | Jan 2002 | A1 |
20020024500 | Howard | Feb 2002 | A1 |
20020041327 | Hildreth et al. | Apr 2002 | A1 |
20020057383 | Iwamura | May 2002 | A1 |
20020098887 | Himoto et al. | Jul 2002 | A1 |
20020103610 | Bachmann et al. | Aug 2002 | A1 |
20020157116 | Jasinschi | Oct 2002 | A1 |
20020178344 | Bourguet et al. | Nov 2002 | A1 |
20030069077 | Korienek | Apr 2003 | A1 |
20030093375 | Green et al. | May 2003 | A1 |
20030149803 | Wilson | Aug 2003 | A1 |
20030156756 | Gokturk et al. | Aug 2003 | A1 |
20030195820 | Silverbrook et al. | Oct 2003 | A1 |
20030207718 | Perlmutter | Nov 2003 | A1 |
20040001113 | Zipperer et al. | Jan 2004 | A1 |
20040056907 | Sharma et al. | Mar 2004 | A1 |
20040070564 | Dawson et al. | Apr 2004 | A1 |
20040095317 | Zhang et al. | May 2004 | A1 |
20040113933 | Guler | Jun 2004 | A1 |
20040155902 | Dempski et al. | Aug 2004 | A1 |
20040155962 | Marks | Aug 2004 | A1 |
20040189720 | Wilson et al. | Sep 2004 | A1 |
20040193413 | Wilson et al. | Sep 2004 | A1 |
20040204240 | Barney | Oct 2004 | A1 |
20040208588 | Colmenarez et al. | Oct 2004 | A1 |
20050086211 | Mayer | Apr 2005 | A1 |
20050110751 | Wilson et al. | May 2005 | A1 |
20050151850 | Ahn et al. | Jul 2005 | A1 |
20050156883 | Wilson et al. | Jul 2005 | A1 |
20050212753 | Marvit et al. | Sep 2005 | A1 |
20050238201 | Shamaie | Oct 2005 | A1 |
20050255434 | Lok et al. | Nov 2005 | A1 |
20050275637 | Hinckley et al. | Dec 2005 | A1 |
20060007142 | Wilson et al. | Jan 2006 | A1 |
20060033713 | Pryor | Feb 2006 | A1 |
20060036944 | Wilson | Feb 2006 | A1 |
20060092267 | Dempski et al. | May 2006 | A1 |
20060098873 | Hildreth et al. | May 2006 | A1 |
20060178212 | Penzias | Aug 2006 | A1 |
20070252898 | Delean | Jan 2007 | A1 |
20070060383 | Dohta | Mar 2007 | A1 |
20070091084 | Ueshima | Apr 2007 | A1 |
20080025137 | Rajan et al. | Jan 2008 | A1 |
20080026838 | Dunstan et al. | Jan 2008 | A1 |
20080036732 | Wilson et al. | Feb 2008 | A1 |
20080094351 | Nogami et al. | Apr 2008 | A1 |
20080122786 | Pryor et al. | May 2008 | A1 |
20080192007 | Wilson | Aug 2008 | A1 |
20080193043 | Wilson | Aug 2008 | A1 |
20080204410 | Wilson | Aug 2008 | A1 |
20080204411 | Wilson | Aug 2008 | A1 |
20080259055 | Wilson | Oct 2008 | A1 |
20090121894 | Wilson et al. | May 2009 | A1 |
20090198354 | Wilson | Aug 2009 | A1 |
20090221368 | Yen et al. | Sep 2009 | A1 |
20090278799 | Wilson et al. | Nov 2009 | A1 |
20100031202 | Morris et al. | Feb 2010 | A1 |
20100105479 | Wilson et al. | Apr 2010 | A1 |
20100123605 | Wilson | May 2010 | A1 |
20100146464 | Wilson et al. | Jun 2010 | A1 |
20100151946 | Wilson et al. | Jun 2010 | A1 |
20100253624 | Wilson | Oct 2010 | A1 |
20110001696 | Wilson | Jan 2011 | A1 |
20110004329 | Wilson | Jan 2011 | A1 |
20110059798 | Pryor | Mar 2011 | A1 |
20140142729 | Lobb et al. | May 2014 | A1 |
20190235645 | Wilson | Aug 2019 | A1 |
Number | Date | Country |
---|---|---|
101254344 | Jun 2010 | CN |
0357909 | Mar 1990 | EP |
0583061 | Feb 1994 | EP |
0629988 | Dec 1994 | EP |
0919906 | Feb 1999 | EP |
61161537 | Jul 1986 | JP |
8038741 | Feb 1996 | JP |
08044490 | Feb 1996 | JP |
2002153673 | May 2002 | JP |
2003058317 | Feb 2003 | JP |
9310708 | Jun 1993 | WO |
WO 9403770 | Feb 1994 | WO |
9519031 | Jul 1995 | WO |
9717598 | May 1997 | WO |
9803907 | Jan 1998 | WO |
9924890 | May 1999 | WO |
9944698 | Sep 1999 | WO |
0063874 | Oct 2000 | WO |
0073995 | Dec 2000 | WO |
0100528 | Jan 2001 | WO |
0140807 | Jun 2001 | WO |
0169365 | Sep 2001 | WO |
WO 0201589 | Jan 2002 | WO |
WO 2002009025 | Jan 2002 | WO |
0215560 | Feb 2002 | WO |
WO 2009059065 | Jul 2009 | WO |
9942920 | Aug 2009 | WO |
Entry |
---|
European Search Report, Application No. EP09006844, dated Sep. 30, 2009, 66 pp. |
Aimone, C., R. Fung, A. Khisti, M. Varia. The Head Mounted Control System. University of Toronto CSIDC Competition Submission, May 2001. |
Fisher, R.B., A.P. Ashbrook, C. Robertson, N. Werghi, A low-cost range finder using a visually located, structured light source, Proc. 2nd Int'l Conf. on 3-D Digital Imaging and Modeling, Ottawa, Canada, Oct. 1999, pp. 24-33. |
Foerster, Friedrich and Fahrenberg, Jochen, Motion Pattern and Posture: Correctly Assessed by Calibrated Accelerometers (Scientific Paper) Publication Date: Mar. 2000. |
Jojic, N., B. Brumitt, B. Meyers, S. Harris, and T. Huang, Estimation of Pointing Parameters in Dense Disparity Maps. In IEEE Intl. Conf. on Automatic Face and Gesture Recognition, (Grenoble, France, 2000). |
Kohler, M.R.J. System Architecture and Techniques for Gesture Recognition in Unconstraint Environments, Proc. Of the 1997 Int. Conf. on Virtual Systems and Multimedia (VSMM'97), Geneva, Sep. 10-12, 1997, pp. 137-146. |
Masaaki Fukumoto, et al.: “Finger-Pointer: Pointing Interface by Image Processing,” Computers and Graphics, Pergamon Press Ltd., Oxford, GB, vol. 18, No. 5, Sep. 1, 1994 (Sep. 1, 1994), pp. 633-642, XP000546603, ISSN: 0097-8493 *the whole document*. |
Masui, T. and I. Siio, Real-world graphical user interface, Proceedings of the International Symposium on Handheld and Ubiquitous Computing (HUC2000), Sep. 2000, pp. 72-84. |
Mckenzie, Mill K., et al.: “Integrating speech and two-dimensional gesture input—a study of redundancy between modes,” Computer Human Interaction Conference, 1998. Proceedings. 1998 Australian Adelaide, SA, Australia Nov. 30-Dec. 4, 1998, Los Alamitos, CA USA IEEE Comput. Soc, US Nov. 30, 1998 (Nov. 30, 1998), pp. 6-13, XP010313626, ISBN: 0-8186-9206-5 *the whole document*. |
Michael Johnston and Srinivas Bangalore, “Finite-state Multimodal Parsing and Understanding,” AT&T Labs-Research Shannon Laboratory, Jul. 2000. |
Olsen, D.R.J., T. Nielsen, Laser Pointer Interaction. In Proceedings CHI'2001:Human Factors in Computing Systems (Seattle, 2001), 17-22. |
Oviatt, S., Mutual disambiguation of recognition errors with multimodal architecture, Proc. of the CHI '99 Conference on Human Factors in Computing Systems, 1999, pp. 576-583, ACM Press. |
Oviatt, S.L. Taming Speech Recognition Errors Within a Multimodal Interface. Communications of the ACM, 43(9). 45-51. Published in 2000. |
Pavlovic, V. I., et al.: “Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Service Center, Los Alamitos, CA, US, vol. 19, No. 7, Jul. 1997 (Jul. 1997), pp. 677-695, XP000698168. ISSN: 0162-8828 *the whole document*. |
Pearl, J. Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo, California, 1988. |
Perng, J.D., Fisher, B., Hollar, S., Pister, K.S.J., Acceleration sensing glove (ASG) Meeting Date: Oct. 18, 1999-Oct. 19, 1999. |
Priyantha, N.B., A. Chakraborty, H. Balakrishnan, The Cricket Location-Support System. In Proceedings 6th ACM MOBICOM, (Boston, MA, 2000). |
Rabiner, L.R. and B-H. Juang, An Introduction to Hidden Markov Models. IEEE ASSP Magazine (Jan. 86) 4-15. |
Randell, C. and Henk Muller. Low Cost Indoor Positioning System. In Ubicomp 2001:Ubiquitous Computing, (Atlanta, Georgia, 2001), Springer-Verlag, 42-48. |
Starner, T., et al.: “The Gesture Pendant: A Self-Illuminating, Wearable, Infrared Computer Vision System for Home Automation Control and Medical Monitoring,” Intl. Symposium on Wearable Computers. Digest of Papers, No. 4, Oct. 16, 2000 (Oct. 16, 2000), pp. 87-94, XP002907652 pp. 87-88. |
Swindells, C., K. Inkpen, J. Dill, M. Tory, That one there! Pointing to establish device identity, Proceedings of the 15th Annual ACM Symposium on User Interface Software and Technology, 2002 Paris, pp. 151-160. |
Wexelblat, A., “An Approach to Natural Gesture in Virtual Environment,” ACM Transactions on Computer-Human Interaction, vol. 2, No. 3, Sep. 1995, pp. 179-200. |
Sharon Oviatt et al, “Multimodal Interfaces That Process What Comes Naturally,” Communications of the ACM, vol. 43, No. 3, Mar. 2000 (Mar. 2000), pp. 45-53. |
U.S. Appl. No. 12/116,049: Final Office Action dated Dec. 21, 2012, 13 pages. |
U.S. Appl. No. 12/116,813: Final Office Action dated Dec. 5, 2012, 10 pages. |
U.S. Appl. No. 12/116,813: Non-final Office Action dated Jun. 26, 2013, 8 pages. |
U.S. Appl. No. 12/393,045: Non-final Office Action dated Sep. 13, 2011, 7 pages. |
U.S. Appl. No. 12/393,045: Final Office Action dated May 14, 2012, 8 pages. |
United States Patent Application No. 12/393,045: Notice of Allowance dated Jul. 22, 2013. |
U.S. Appl. No. 12/489,768: Non-Final Office Action dated Aug. 6, 2013, 11 pages. |
U.S. Appl. No. 12/116,049: Non-Final Office Action dated Jul. 10, 2012, 12 pages. |
U.S. Appl. No. 12/880,901: Non-Final Office Action dated Jun. 4, 2012, 12 pages. |
U.S. Appl. No. 12/884,373: Non-Final Office Action dated Jun. 27, 2012, 14 pages. |
Young-Hoo Kwon, “Rotation Matrix”, 1998, pp. 1-4, http://www.kwon3d.com/theory/transform/rot.html. |
Kanade et al., “A Stereo Machine for Video-rate Dense Depth Mapping and Its New Applications”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1996, pp. 196-202,The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA. |
Miyagawa et al., “CCD-Based Range Finding Sensor”, Oct. 1997, pp. 1648-1652, vol. 44 No. 10, IEEE Transactions on Electron Devices. |
Rosenhahn et al., “Automatic Human Model Generation”, 2005, pp. 41-48, University of Auckland (CITR), New Zealand. |
Aggarwal et al., “Human Motion Analysis: A Review”, IEEE Nonrigid and Articulated Motion Workshop, 1997, University of Texas at Austin, Austin, TX. |
Shao et al., “An Open System Architecture for a Multimedia and Multimodal User Interface”, Aug. 24, 1998, Japanese Society for Rehabilitation of Persons with Disabilities (JSRPD), Japan. |
Kohler, “Special Topics of Gesture Recognition Applied in Intelligent Home Environments”, In Proceedings of the Gesture Workshop, 1998, pp. 285-296, Germany. |
Kohler, “Vision Based Remote Control in Intelligent Home Environments”, University of Erlangen-Nuremberg/Germany, 1996, pp. 147-154, Germany. |
Kohler, “Technical Details and Ergonomical Aspects of Gesture Recognition applied in Intelligent Home Environments”, 1997, Germany. |
Hasegawa et al., “Human-Scale Haptic Interaction with a Reactive Virtual Human in a Real-Time Physics Simulator”, Jul. 2006, vol. 4, No. 3, Article 6C, ACM Computers in Entertainment, New York, NY. |
Qian et al., “A Gesture-Driven Multimodal Interactive Dance System”, Jun. 2004, pp. 1579-1582, IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan. |
Zhao, “Dressed Human Modeling, Detection, and Parts Localization”, 2001, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA. |
He, “Generation of Human Body Models”, Apr. 2005, University of Auckland, New Zealand. |
Isard et al., “CONDENSATION—Conditional Density Propagation for Visual Tracking”, 1998, pp. 5-28, International Journal of Computer Vision 29(1), Netherlands. |
Livingston, “Vision-based Tracking with Dynamic Structured Light for Video See-through Augmented Reality”, 1998, University of North Carolina at Chapel Hill, North Carolina, USA. |
Wren et al., “Pfinder: Real-Time Tracking of the Human Body”, MIT Media Laboratory Perceptual Computing Section Technical Report No. 353, Jul. 1997, vol. 19, No. 7, pp. 780-785, IEEE Transactions on Pattern Analysis and Machine Intelligence, Cambridge, MA. |
Breen et al., “Interactive Occlusion and Collision of Real and Virtual Objects in Augmented Reality”, Technical Report ECRC-95-02, 1995, European Computer-Industry Research Center GmbH, Munich, Germany. |
Freeman et al., “Television Control by Hand Gestures”, Dec. 1994, Mitsubishi Electric Research Laboratories, TR94-24, Cambridge, MA. |
Hongo et al., “Focus of Attention for Face and Hand Gesture Recognition Using Multiple Cameras”, Mar. 2000, pp. 156-161, 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France. |
Pavlovic et al., “Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review”, Jul. 1997, pp. 677-695, vol. 19, No. 7, IEEE Transactions on Pattern Analysis and Machine Intelligence. |
Azarbayejani et al., “Visually Controlled Graphics”, Jun. 1993, vol. 15, No. 6, IEEE Transactions on Pattern Analysis and Machine Intelligence. |
Granieri et al., “Simulating Humans in VR”, The British Computer Society, Oct. 1994, Academic Press. |
Brogan et al., “Dynamically Simulated Characters in Virtual Environments”, Sep./Oct. 1998, pp. 2-13, vol. 18, Issue 5, IEEE Computer Graphics and Applications. |
Fisher et al., “Virtual Environment Display System”, ACM Workshop on Interactive 3D Graphics, Oct. 1986, Chapel Hill, NC. |
“Virtual High Anxiety”, Tech Update, Aug. 1995, pp. 22. |
Sheridan et al., “Virtual Reality Check”, Technology Review, Oct. 1993, pp. 22-28, vol. 96, No. 7. |
Stevens, “Flights into Virtual Reality Treating Real World Disorders”, The Washington Post, Mar. 27, 1995, Science Psychology, 2 pages. |
“Simulation and Training”, 1994, Division Incorporated. |
U.S. Appl. No. 12/116,813; Notice of Allowance; dated Dec. 9, 2013; 11 pages. |
U.S. Appl. No. 12/116,813; Non-Final Office Action; dated May 28, 2015; 10 pages. |
Huber; “3-D Real-Time Gesture Recognition Using Proximity Spaces”; Proceedings Third IEEE Workshop on Applications of Computer Vision, WACV 96; Dec. 1996; p. 136-141. |
Kuno et al.; “Intelligent Wheelchair Based on the Integration of Human and Environment Observations”; Information Intelligence and Systems; Proceedings 1999 Int'l Conference; p. 342-349. |
Takahashi et al.; “Recognition of Dexterous Manipulations from Time-Varying Images”; Motion of Non-Rigid and Articulated Objects; Proceedings of the 1994 IEEE Workshop; Nov. 1994; p. 23-28. |
Bruns; “Integrated Real and Virtual Prototyping”; Industrial Electronics Society; Proceedings of the 24th Annual Conf.; Sep. 1998; p. 2137-2142. |
“GWindows: Light-weight Stereo Vision for Interaction”. http://research.microsoft.com/- nuria/gwindows/gwindows.htm. Last accessed Jul. 8, 2005, 2 pages. |
Azarbayejani et al., “Real-Time Self-calibrating Stereo Person Tracking Using 3-D Shape Estimation from Blob Features”, Proceedings of ICPR, Aug. 1996, pp. 627-632, Vienna, Austria. |
Azoz, et al. “Reliable Tracking of Human Arm Dynamics by Multiple Cue Integration and Constraint Fusion”. IEEE Conference on Computer and Pattern Recognition, 1998. |
Baudel, Thomas et al. “Charade. Remote Control of Objects Using Free-Hand Gestures” Communications of the ACM, 1993, pp. 28-35, vol. 36, Issue 7. ACM Press, New York, New York, USA. |
Berard, Francois, “The Perceptual Window-Head Motion as a New Input Stream”, Proceedings of the Seventh IFIP Conference on Human-Computer Interaction, 1999, pp. 238-244. |
Bolt, Richard A., “Put-That-There: Voice and Gesture at the Graphics interface”, ACM Press, 1980, 262-270. |
Brumitt, Barry & Cadiz, J.J., “Let there be light! Comparing Interfaces for Homes of the Future”, Sep. 21, 2000. |
Buxton, et al. “A Study of Two-Handed Input”, Proceedings of CHI '86, 1986, pp. 321-326. Last accessed Jul. 8, 2005, 6 pages. |
Cedras, Claudette et al., “Motion-Based Recognition: A Survey”, University of Central Florida, 1995, pp. 1-41, Orlando, Florida, USA. |
Darrell, et al. “Integrated Person Tracking Using Stereo, Color, and Pattern Detection”, Proceedings of the Conference on Computer Vision and Pattern Recognition, 1998, pp. 601-609. Last accessed Jul. 8, 2005, 10 pages. |
Deng J. W. et al., “An HMM-based approach for gesture segmentation and recognition”, Pattern Recognition, 2000. Proceedings 15th International Conference on Sep. 3-7, 2000; [Proceedings of the International Conference on Pattern Recognition. (ICPR)], Los Alamitos, CA, USA, IEEE Comput. Soc, US, vol. 3, Sep. 3, 2000, pp. 679-682, XP010533379, ISBN: 978-0-7695-0750-7. |
Ehrenmann M. et al., “Dynamic gestures as an input device for directing a mobile platform”, Proceedings of the 2001 IEEE International Conference on Robotics and Automation. ICRA 2001. Seoul, Korea, May 21-26, 2001; [Proceedings of the IEEE International Conference on Robotics and Automation], New York, NY: IEEE, US, vol. 3, May 21, 2001 pp. 2596-2601, XP010550535, ISBN: 978-0-7803-6576-6. |
European Patent Application No. 09006844.6, Communication dated Aug. 10, 2012, 6 pages. |
European Search Report, Application No. 03002829.4-2415, Date of Completion: Oct. 30, 2007, dated Nov. 7, 2007. |
European Search Report, Application No. EP 09006844, dated Sep. 30, 2009, 6 pages. |
Fitzgerald, Will et al., “Multimodal Event Parsing for Intelligent User Interfaces”, IUI Mar. 2003, pp. 53-60, Miami, Florida, USA. |
Freeman, William T. and Weissman, Craig D., “Television Control by Hand Gestures”, International Workshop on Automatic Face and Gesture Recognition, 1995, 5 pages. |
Graham, Brian Barkley, “Using an Accelerometer Sensor to Measure Human Hand Motion”, Massachusetts Institute of Technology, May 11, 2000, 110 pp. |
Guiard, “Asymmetric Division of Labor in Human Skilled Bimanual Action: The Kinematic Chain as a Model”, Journal of Motor Behavior, 1987, pp. 486-517, vol. 19, No. 4. |
Horvitz, Eric et al., “A Computational Architecture for Conversation”, Proceedings of the Seventh International Conference on User Modeling, 1999, pp. 201-210. |
Horvitz, Eric Principles of Mixed-Initiative User Interfaces, Proceedings of CHI, 1999, 8 pages. |
Ikushi Yoda, et al., “Utilization of Stereo Disparity and Optical Flow information for Human Interaction”. Proceedings 32 of the Sixth International Conference on Computer Vision, 1998, 5 pages, IEEE Computer Society, Washington D.C. USA. |
Kabbash et al. “The “Prince” Technique: Fitts′ Law and Selection Using Area Cursors”, Proceedings of CHI '95, 1995, pp. 273-279. http://w.billbuxton.comlprince.html. Last accessed Jul. 8, 2005, 11 pages. |
Kallmann, Marcelo et al., “Direct 3D Interaction with Smart Objects”, ACM 1999. |
Kanade, et al. “Development of a Video-Rate Stereo Machine”, Proceedings of 94 ARPA Image Understanding Workshop, 1994, pp. 549-558. Last accessed Sep. 30, 2008, 4 pages. |
Kettebekov, et al., “Toward Natural Gesture/Speech Control of a Large Display”, In Proceedings of the 8th IFIP International Conference on Engineering for Human-Computer Interaction, Lecture Notes in Computer Science, 2001, 13 pages. |
Kirstein et al., “Interaction with a Projection Screen Using a Camera-tracked Laser Pointer,” 1998, 2 pages. |
Kjeldsen, Frederik C. M. “Visual Interpretation of Hand Gestures as a Practical Interface Modality”, Ph.D. Dissertation, 1997, Columbia University Department of Computer Science, 168 pages. |
Krahnstoever, et al., “Multimodal Human-Computer Interaction for Crisis Management Systems”, In Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision, 2002, 5 pages. |
Krum, et al., “Speech and Gesture Multimodal Control of a Whole Earth 3D Visualization Environment”, In Proceedings of Eurographics'IEEE Visualization Symposium, 2002, 6 pages. |
Lee H-K. et al., “An HMM-Based Threshold Model Approach for Gesture Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Service Center, Los Alamitos, CA, US, vol. 21, No. 10, Oct. 1, 1999, pp. 961-976, XP000853312, ISSN:0162-8828. |
Long, Jr., et al. “Implications for a Gesture Design Tool”, Proceedings of CHI '99, 1999, pp. 40-47. Last accessed Jul. 8, 2005, 8 pages. |
Maes, Pattie, et al., “The ALIVE System: Wireless, Full-body, Interaction with Autonomous Agents”, ACM Multimedia Systems, Special Issue on Multimedia and Multisensory Virtual Worlds, 1996. |
Mignot, Cristophe et al., “An Experimental Study of Future ‘Natural’ Multimodal Human-Computer Interaction”, Proceedings of INTERCHI '93, 1993, pp. 67-68. |
Moeslund, Thomas et al., “A Survey of Computer Vision-Based Human Motion Capture”, Computer: Vision and Image Understanding, 2001, pp. 231-268, vol. 81, Issue 3, Elsevier Science Inc., New York, New York, USA. This article may be accessed via the internet at URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.4051&rep=rep1&type=pdf. |
Moore et al., “Exploiting Human Actions and Object Context for Recognition Tasks”, presented at the 7th IEEE International Conference on Computer Vision, Corfu, Greece Sep. 20-27, 1999. |
Moyle, et al. “Gesture Navigation: An Alternative ‘Back’ for the Future”, Proceedings of CHI '02, 2002, pp. 822-823. |
Nielsen, Michael et al., “A Procedure for Developing Intuitive and Ergonomic Gesture Interfaces for Man-Machine Interaction”. Technical Report CVMT 03-01. ISSN 1601-3646, Aalborg University, Mar. 2003, 12 pages. |
Office Action dated Apr. 16, 2009 in U.S. Appl. No. 10/724,950, 21 pages. |
Office Action dated Dec. 28, 2009 in U.S. Appl. No. 10/724,950, 19 pages. |
Office Action dated Feb. 20, 2009 in U.S. Appl. No. 10/396,653, 12 pages. |
Office Action dated Feb. 25, 2008 in U.S. Appl. No. 10/396,653, 20 pages. |
Office Action dated Feb. 26, 2007 in U.S. Appl. No. 10/396,653, 18 pages. |
Office Action dated Jun. 20, 2007 in U.S. Appl. No. 10/724,950, 8 pages. |
Office Action dated Jun. 29, 2010 in U.S. Appl. No. 10/724,950, 20 pages. |
Office Action dated May 16, 2008 in U.S. Appl. No. 10/724,950, 18 pages. |
Office Action dated Nov. 14, 2008 in U.S. Appl. No. 10/724,950, 24 pages. |
Office Action dated Nov. 29, 2007 in U.S. Appl. No. 10/724,950, 16 pages. |
Office Action dated Sep. 19, 2006 in U.S. Appl. No. 10/396,653, 24 pages. |
Office Action dated Sep. 6, 2007 in U.S. Appl. No. 10/396,653, 17 pages. |
Office Action dated Sep. 8, 2008 in U.S. Appl. No. 10/396,653, 13 pages. |
Oh, et al. “Evaluating Look-to-Talk: A Gaze-Aware Interface in a Collaborative Environment”, CHI '02, 2002, pp. 650-651. Last accessed Jul. 8, 2005, 3 pages. |
Oviatt, S., “Ten Myths of Multimodal Interaction”, Communications of the ACM, 1999, pp. 74-81, vol. 42, Issue 11, ACM Press, New York, New York, USA. |
Oviatt, S., et al., “Integration and Synchronization of Input Modes during Multimodal Human-Computer Interaction”, CHI 97, Atlanta, GA, ACM Press, 1997, 415-422. |
Rigoll, et al. “High Performance Real-Time Gesture Recognition Using Hidden Markov Models”. Gesture and Sign Language in Human-Computer Interaction, vol. LNAI1371, Frolich, ed., pp. 69-80, 1997. |
Savidis et al., “Design User-Adapted Interfaces: The Unified Design Method for Transformable Interactions”, pp. 323-334, 1997 ACM. |
Sharma, et al. “Method of Visual and Acoustic Signal Co-Analysis for Co-Verbal Gesture Recognition”, 20020919. U.S. Appl. No. 60/413,998, 2002. |
Sharma, R. et al., “Speech-Gesture Driven Multimodal Interfaces for Crisis Management”. Proceedings of the IEEE, 2003, pp. 1327-1354, vol. 91, Issue 9. |
Shumin Zhai, et al., “The Silk Cursor: Investigating Transparency for 3D Target Acquisition”, CHI '94. 1994, pp. 273-279. |
Tadesse, H., Office Action, dated Mar. 16, 2005, pp. 1-12. |
U.S. Appl. No. 12/489,768: Final Office Action dated Apr. 18, 2012, 11 pages. |
U.S. Appl. No. 12/489,768: Non-Final Office Action dated Sep. 26, 2011, 7 pages. |
Welford, “Signal, Noise, Performance, and Age”, Human Factors, 1981, pp. 97-109, vol. 1, Issue 23, .http://www. ingentaconnect.com/contentlhfes/hf/1981/00000023/0000000I/art00009. |
Wilson, Andrew et al., GWindows: Towards Robust Perception-Based UI, Microsoft Research, 2003, pp. 1-8. |
Wilson, et al. “Hidden Markov Models for Modeling and Recognizing Gesture Under Variation,” Hidden Markov Models: Applications in Computer Vision. T. Caelli, ed., World Scientific, pp. 123-160, 2001. |
Worden, Aileen et al., “Making Computers Easier for Older Adults to Use: Area Cursors and Sticky Icons”, CHI 97, 1997, pp. 266-271, Atlanta. Georgia, USA. |
Zhang, “A Flexible New Technique for Camera Calibration”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Nov. 2000, pp. 1330-1334, vol. 1, No. 11. Last accessed Nov. 23, 2005, 5 pages. |
Zhang, Zhengyou, “Flexible Camera Calibration by Viewing a Plane from Unknown Orientations”, Microsoft Research, 1999, 8 pages. |
“Microsoft Computer Dictionary”, Fourth Edition, 1997, 7 Pages. |
“Office Action Issued in European Patent Application No. 03002829.4”, dated Feb. 4, 2018, 9 Pages. |
“Office Action Issued in European Patent Application No. 03002829.4”, dated May 18, 2010, 8 Pages. |
“Partial Search Report Issued in European Patent Application No. 03002829.4”, dated Sep. 6, 2007, 5 Pages. |
“Office Action Issued in European Patent Application No. 09006844.6”, dated Apr. 19, 2016, 6 Pages. |
“Office Action Issued in European Patent Application No. 09006844.6”, dated Nov. 12, 2009, 6 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 10/160,692”, dated Sep. 20, 2004, 7 Pages. |
“Final Office Action Issued in U.S. Appl. No. 10/724,950”, dated Feb. 1, 2011, 25 Pages. |
“Final Office Action Issued in U.S. Appl. No. 10/724,950”, dated Apr. 5, 2012, 31 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 10/724,950”, dated May 20, 2011, 29 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 10/724,950”, dated Oct. 13, 2011, 29 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 10/724,950”, dated Oct. 5, 2010, 23 Pages. |
“Final Office Action Issued in U.S. Appl. No. 11/118,720”, dated May 20, 2008, 9 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 11/118,720”, dated Nov. 9, 2007, 8 Pages. |
“Final Office Action Issued in U.S. Appl. No. 11/156,873”, dated Sep. 30, 2008, 13 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 11/156,873”, dated Jan. 30, 2008, 9 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 11/185,399”, dated Jun. 6, 2008, 10 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/106,091”, dated Feb. 14, 2012, 15 pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/106,091”, dated Sep. 8, 2011, 13 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/106,097”, dated Feb. 18, 2010, 11 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/116,813”, dated Nov. 16, 2015, 6 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/116,813”, dated Jun. 26, 2012, 18 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/191,883”, dated May 25, 2011, 5 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/289,099”, dated May 14, 2013, 23 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/385,796”, dated Sep. 27, 2012, 25 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/385,796”, dated Feb. 28, 2013, 29 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/385,796”, dated Dec. 17, 2013, 6 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/457,656”, dated Sep. 7, 2011, 14 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/457,656”, dated Jul. 13, 2012, 17 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated Oct. 9, 2013, 10 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated Jan. 29, 2014, 12 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated Mar. 15, 2018, 20 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated Sep. 26, 2016, 13 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated Sep. 21, 2015, 11 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated May 3, 2016, 12 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated Apr. 9, 2015, 11 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated May 15, 2013, 15 Pages. |
“Non-Final Office Action Issued in U.S. Appl. No. 12/494,303”, dated Aug. 1, 2017, 17 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/495,105”, dated Oct. 28, 2014, 15 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/495,105”, dated Aug. 29, 2012, 16 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/495,105”, dated Nov. 1, 2011, 18 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/495,105”, dated Jan. 24, 2014, 23 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/705,014”, dated May 3, 2017, 16 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/705,014”, dated Jul. 29, 2015, 12 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/705,014”, dated Jan. 6, 2014, 10 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/705,014”, dated Mar. 6, 2013, 11 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/705,014”, dated Aug. 8, 2013, 13 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/705,014”, dated Jun. 20, 2012, 16 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/705,014”, dated Apr. 10, 2015, 10 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/705,113”, dated Apr. 25, 2016, 11 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/705,113”, dated Dec. 13, 2013, 18 Pages. |
“Final Office Action Issued in U.S. Appl. No. 12/705,113”, dated Mar. 7, 2013, 13 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/705,113”, dated Sep. 27, 2012, 14 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/705,113”, dated Jul. 19, 2013, 13 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 12/705,113”, dated Nov. 12, 2015, 11 Pages. |
“Final Office Action Issued in U.S. Appl. No. 14/307,428”, dated Dec. 29, 2017, 11 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 14/307,428”, dated Jun. 1, 2018, 13 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 14/307,428”, dated Jan. 27, 2017, 16 Pages. |
“Final Office Action Issued in U.S. Appl. No. 14/803,949”, dated Feb. 23, 2018, 15 Pages. |
“Non Final Office Action Issued in U.S. Appl. No. 14/803,949”, dated May 5, 2017, 9 Pages. |
Guler, Sadiye Z.., “Split and Merge Behavior Analysis and Understanding Using Hidden Markov Models”, Oct. 8, 2002, 21 Pages. |
Lucas, et al., “An Iterative Image Registration Technique with an Application to Stereo Vision”, In Proceedings of Imaging Understanding Workshop, 1981, pp. 121-130. |
McNeill, David, “Hand and Mind”, University of Chicago Press, 1992, 2 Pages. |
Schmidt, et al., “Towards Model-Based Gesture Recognition”, In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Mar. 28, 2000, 6 Pages. |
Shi, et al., “Good Features to Track”, In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 21, 1994, 8 Pages. |
Tomasi, et al., “Detection and Tracking of Point Features”, In Technical Report CMU-CS-91-132, Apr. 1991, 38 Pages. |
Walker, et al., “Age Related Differences in Movement Control: Adjusting Submovement Structure to Optimize Performance”, In the Journals of Gerontology Series B: Psychological Sciences and Social Sciences, vol. 52, Issue 1., Jan. 1, 1997, 14 Pages. |
Number | Date | Country | |
---|---|---|---|
20080259055 A1 | Oct 2008 | US |
Number | Date | Country | |
---|---|---|---|
60355368 | Feb 2002 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 11156873 | Jun 2005 | US |
Child | 12104360 | US | |
Parent | 10160659 | May 2002 | US |
Child | 11156873 | US |