Controller-free interactive systems, such as gaming systems, may be controlled at least partially by natural movements. In some examples, such systems may employ a depth sensor, or other suitable sensor, to estimate motion of a user and translate the estimated motions into commands to a console of the system. However, in estimating the motions of a user, such systems may only estimate major joints of the user, e.g., via skeleton estimation, and may lack the ability to detect subtle gestures.
Accordingly, various embodiments directed to estimating a posture of a body part of a user are disclosed herein. For example, in one disclosed embodiment, an image is received from a sensor, where the image includes at least a portion of an image of the user including the body part. The skeleton information of the user is estimated from the image, a region of the image corresponding to the body part is identified at least partially based on the skeleton information, a shape descriptor is extracted for the region, and the shape descriptor is classified based on training data to estimate the posture of the body part. A response then may be output based on the estimated posture of the body part.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
Controller-free interactive systems, e.g., gaming systems, such as shown at 10 in
However, motion estimation routines such as skeleton mapping may lack the ability to detect subtle gestures of a user. For example, such motion estimation routines may lack the ability to detect and/or distinguish subtle hand gestures such as a user's open and closed hands, shown at 22 and 24 in
Accordingly, systems and methods, described below herein, are directed to determining a state of a hand of a user. For example, the action of closing and opening the hand may be used by such systems for triggering events such as selecting, engaging, or grabbing and dragging objects, e.g., object 26, on the screen, actions which otherwise would correspond to pressing a button when using a controller. Such refined controller-free interaction can be used as an alternative to approaches based on hand waving or hovering, which may be unintuitive or cumbersome. By determining states of a user's hand as described below herein, interactivity of a user with the system may be increased and simpler and more intuitive interfaces may be presented to a user.
At 202, method 200 includes receiving a depth image from a capture device, e.g., capture device 12 shown in
A depth image of a portion of a user is illustrated in
At 204, method 200 includes estimating skeleton information of the user to obtain a virtual skeleton from a depth image obtained in step 202. For example, in
The virtual skeleton 304 may include a plurality of joints, each joint corresponding to a portion of the user. The illustration in
As remarked above, current motion estimation from depth images, such as skeleton estimating described above, may lack the ability to detect subtle gestures of a user. For example, such motion estimation routines may lack the ability to detect and/or distinguish subtle hand gestures such as a user's open and closed hands, shown at 22 and 24 in
However, such an estimated skeleton may be used to estimate a variety of other physical attributes of the user. For example, the skeleton data may be used to estimate user body and/or body part size, orientation of one or more user body parts with respect to each other and/or the capture device, depth of one or more user body parts relative to the capture device, etc. Such estimations of physical attributes of a user then may be employed to normalize and reduce variability in detecting and classifying states of a user's hands, as described below.
At 206, method 200 includes segmenting a hand or hands of the user. In some examples, method 200 may additionally include segmenting one or more regions of the body in addition to the hands.
Segmenting a hand of a user includes identifying a region of the depth image corresponding to the hand, where the identifying is at least partially based on the skeleton information obtained in step 204. Likewise, any region of the body of a user may be identified in a similar manner as described below. At 306,
Hands or body regions may be segmented or localized in a variety of ways and may be based on selected joints identified in the skeleton estimation described above.
As one example, hand detection and localization in the depth image may be based on the estimated wrist and/or hand tip joints from the estimated skeleton. For example, in some embodiments, hand segmentation in the depth image may be performed using a topographical search of the depth image around the hand joints, locating nearby local extrema in the depth image as candidates for finger tips, and segmenting the rest of the hand by taking into account a body size scaling factor as determined from the estimated skeleton, as well as depth discontinuities for boundary identification.
As another example, a flood-fill approach may be employed to identify regions of the depth image corresponding to a user's hands. In a flood-fill approach, the depth image may be searched from a starting point and a starting direction, e.g., the starting point may be the wrist joint and the starting direction may be a direction from the elbow to the wrist joint. Nearby pixels in the depth image may be iteratively scored based on the projection on the starting direction as a way for giving preference to points moving away from the elbow and towards the hand tip, while depth consistency constraints such as depth discontinuities may be used to identify boundaries or extreme values of a user's hands in the depth image. In some examples, threshold distance values may be used to limit the depth map search in both the positive and negative directions of the starting direction based on fixed values or scaled based on an estimated size of the user, for example.
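As a minimal, non-limiting sketch of such a flood-fill search, the following Python example grows a hand region outward from the wrist joint. The function name `segment_hand`, the parameter names, and the threshold values are hypothetical and chosen only for illustration. For simplicity, the sketch limits the search with fixed projection thresholds along the elbow-to-wrist direction rather than scoring pixels iteratively, and it treats a large depth difference between neighboring pixels as a depth discontinuity.

```python
from collections import deque


def segment_hand(depth, wrist, elbow, max_depth_jump=30,
                 max_forward=6.0, max_backward=1.0):
    """Flood-fill a hand region in a depth image, starting at the wrist.

    depth: 2D list of depth values (assumed here to be millimeters).
    wrist, elbow: (row, col) pixel positions of the estimated joints.
    max_depth_jump: neighbor depth difference treated as a discontinuity.
    max_forward / max_backward: search limits, in pixels, along the
        elbow-to-wrist direction (hypothetical fixed thresholds).
    """
    rows, cols = len(depth), len(depth[0])
    # Unit vector of the starting direction (elbow -> wrist).
    dr, dc = wrist[0] - elbow[0], wrist[1] - elbow[1]
    norm = (dr * dr + dc * dc) ** 0.5
    if norm == 0:
        raise ValueError("wrist and elbow must be distinct pixels")
    dr, dc = dr / norm, dc / norm

    region = {wrist}
    queue = deque([wrist])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in region:
                continue
            # Depth consistency: stop at depth discontinuities.
            if abs(depth[nr][nc] - depth[r][c]) > max_depth_jump:
                continue
            # Projection of the pixel offset onto the starting direction.
            proj = (nr - wrist[0]) * dr + (nc - wrist[1]) * dc
            if proj < -max_backward or proj > max_forward:
                continue
            region.add((nr, nc))
            queue.append((nr, nc))
    return region


# Toy depth image: background at 2000, an arm and hand at 1000 along row 2.
depth = [[2000] * 8 for _ in range(5)]
for col in range(7):
    depth[2][col] = 1000
hand_pixels = segment_hand(depth, wrist=(2, 3), elbow=(2, 0))
```

In this toy image the returned region covers the pixels from just behind the wrist to the hand tip, while the forearm pixels nearer the elbow and the background pixels are excluded. The thresholds could instead be scaled by an estimated size of the user.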
As still another example, a bounding sphere or other suitable bounding shape, positioned based on skeleton joints (e.g. wrist or hand tip joints), may be used to include all pixels in the depth image up to a depth discontinuity. For example, a window may be slid over the bounding sphere to identify depth discontinuities which may be used to establish a boundary in the hand region of the depth image.
In some approaches, segmenting of hand regions may be performed when a user raises the hands outward or above the torso. In this way, identification of hand regions in the depth image may be less ambiguous since the hand regions may be distinguished from the body more easily.
It should be understood that the hand segmentation examples described above are presented for the purpose of example and are not intended to limit the scope of this disclosure. In general, any hand or body part segmentation methods may be used alone or in combination with each other and/or one of the example methods described above.
Continuing with method 200 in
In some examples, the shape descriptor may be invariant to one or more transformations, such as congruency (translation, rotation, reflection, etc.), isometry, depth changes, etc. For example, the shape descriptor may be extracted in such a way as to be invariant to an orientation and location of the hand with respect to the capture device or sensor. A shape descriptor can also be made invariant to reflection, in which case it does not distinguish between the left and right hand. Further, if a shape descriptor is not invariant to reflection, it can always be mirrored by flipping the input image left-right, thereby doubling the amount of training data for each hand. Further, the shape descriptor may be normalized based on an estimated body size so as to be substantially invariant to body and/or hand size differences between different users. Alternatively, a calibration step may be performed in advance where the scale of the person is pre-estimated, in which case the descriptor need not be size invariant.
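As a minimal sketch of the left-right mirroring mentioned above, the following Python example flips each training image and swaps its hand-side label, so that every example for one hand also yields an example for the other hand. The tuple layout `(image, hand_side, state)` and the function names are hypothetical.

```python
def mirror_left_right(image):
    """Return a left-right flipped copy of a 2D image (a list of rows)."""
    return [list(reversed(row)) for row in image]


def augment_with_mirrors(examples):
    """Return the original examples plus their mirrored counterparts.

    examples: list of (image, hand_side, state) tuples, where hand_side is
    "left" or "right" (a hypothetical layout used only for illustration).
    """
    other_side = {"left": "right", "right": "left"}
    mirrored = [
        (mirror_left_right(image), other_side[side], state)
        for image, side, state in examples
    ]
    return examples + mirrored


training = [([[1, 2, 3], [4, 5, 6]], "left", "open")]
augmented = augment_with_mirrors(training)
```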
As one example of shape descriptor extraction, a histogram of distances in the hand region identified in step 206 from the centroid of the hand region may be constructed. For example, such a histogram may include fifteen bins, where each bin includes the number of points in the hand region whose distance to the centroid is within a certain distance range associated with that bin. For example, the first bin in such a histogram may include the number of points in the hand region whose distance to the centroid is between 0 and 0.40 centimeters, the second bin may include the number of points in the hand region whose distance to the centroid is between 0.40 and 0.80 centimeters, and so forth. In this way, a vector may be constructed to codify the shape of the hand. Such vectors may further be normalized based on estimated body size, for example.
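The centroid-distance histogram described above may be sketched as follows. The fifteen bins and the 0.40 centimeter bin width follow the example in the text; the function name, the `scale` parameter used for body-size normalization, and the choice to place points beyond the last bin edge into the last bin are assumptions made for illustration.

```python
import math


def centroid_distance_histogram(points, num_bins=15, bin_width=0.40,
                                scale=1.0):
    """Build a histogram of point distances from the hand-region centroid.

    points: list of (x, y, z) positions in centimeters.
    scale: hypothetical body-size factor; distances are divided by it
        before binning so that users of different sizes are comparable.
    Returns a list of num_bins point counts.
    """
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    histogram = [0] * num_bins
    for p in points:
        distance = math.dist(p, centroid) / scale
        # Assumption: points beyond the last bin edge go into the last bin.
        index = min(int(distance / bin_width), num_bins - 1)
        histogram[index] += 1
    return histogram


# Two points, each 1.0 cm from their centroid, land in the third bin
# (0.80 to 1.20 cm).
descriptor = centroid_distance_histogram([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

The resulting list of counts is the vector that codifies the shape of the hand and that may be passed to the classification step.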
In another example approach, a histogram may be constructed based on distances and/or angles from points in the hand region to a joint, bone segment or palm plane from the user's estimated skeleton, e.g., the elbow joint, wrist joint, etc.
Another example of a shape descriptor is a Fourier descriptor. Construction of a Fourier descriptor may include codifying a contour of the hand region, e.g., via mapping a distance from each pixel in the hand region to a perimeter of the hand region against a radius of an elliptical fitting of the boundary of the hand and then performing a Fourier transform on the map. Further, such descriptors may be normalized, e.g., relative to an estimated body size. Such descriptors may be invariant to translation, scale, and rotation.
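As one hedged illustration, the following sketch computes a centroid-distance Fourier descriptor, which is a common simplification and not necessarily the mapping described above: it samples the distance from the contour centroid to each contour point, applies a discrete Fourier transform, and keeps the coefficient magnitudes divided by the zero-frequency term. The function name and the number of coefficients are hypothetical.

```python
import cmath
import math


def fourier_descriptor(contour, num_coeffs=8):
    """Centroid-distance Fourier descriptor of a closed 2D contour.

    contour: ordered list of (x, y) points along the hand boundary.
    Returns num_coeffs magnitudes, each divided by the zero-frequency term.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # Radial signature: distance from the centroid to each contour point.
    radii = [math.hypot(x - cx, y - cy) for x, y in contour]
    magnitudes = []
    for k in range(num_coeffs + 1):
        coeff = sum(
            r * cmath.exp(-2j * math.pi * k * i / n)
            for i, r in enumerate(radii)
        )
        magnitudes.append(abs(coeff))
    # Dividing by the zero-frequency magnitude removes the overall scale.
    return [m / magnitudes[0] for m in magnitudes[1:]]


# A small star-shaped contour with alternating far and near points.
star = [(0, 0), (2, 1), (4, 0), (3, 2), (4, 4), (2, 3), (0, 4), (1, 2)]
star_descriptor = fourier_descriptor(star, num_coeffs=3)
```

In this sketch, measuring distances from the centroid gives translation invariance, dividing by the zero-frequency term gives scale invariance, and keeping only magnitudes makes the result independent of the contour starting point and of in-plane rotation.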
Still another example of constructing a shape descriptor includes determining a convexity of the hand region, e.g., by determining a ratio of an area of a contour of the hand region to the convex hull of the hand region.
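The convexity ratio may be sketched as follows, assuming the contour of the hand region is available as an ordered list of two-dimensional points. The area is computed with the shoelace formula and the convex hull with Andrew's monotone chain algorithm; these particular algorithms and the function names are choices made for illustration only.

```python
def polygon_area(polygon):
    """Area of a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points):
    """Convex hull (counter-clockwise) via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity(contour):
    """Ratio of the contour area to the area of its convex hull."""
    return polygon_area(contour) / polygon_area(convex_hull(contour))


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
```

A convex contour such as `square` yields a ratio of 1.0, while the `notched` contour yields 0.625. Under this sketch, a contour with deep gaps, such as spread fingers, would be expected to give a lower ratio than a compact contour such as a fist.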
It should be understood that the example shape descriptors described above are exemplary in nature and are not intended to limit the scope of this disclosure. In general, any suitable shape descriptor for a hand region may be used alone or in combination with each other and/or one of the example methods described above. For example, shape descriptors, such as the histograms or vectors described above, may be mixed and matched, combined, and/or concatenated into larger vectors, etc. This may allow the identification of new patterns that were not identifiable by looking at them in isolation.
Continuing with method 200, at 210, method 200 includes classifying the state of the hands. For example, the shape descriptor extracted at step 208 may be classified based on training data to estimate the state of the hand. For example, as illustrated at 310 in
In some examples, the training data used in the classification step 210 may be based on a pre-determined set of hand examples. The hand examples may be grouped or labeled based on a representative hand state against which the shape descriptor for the hand region is compared.
In some examples, various meta-data may be used to partition the training data. For example, the training data may include a plurality of hand state examples which may be partitioned based on one or more of hand side (e.g., left or right), hand orientation (e.g., lower arm angle or lower arm orientation), depth, and/or a body size of the user, for example. Partitioning of these training hand examples into separate subsets may reduce variability in hand shape within each partition which may lead to more accurate overall classification of hand state.
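As a minimal sketch of such partitioning, the following example groups training examples by hand side and by a coarse depth bucket. The dictionary keys `hand_side` and `depth_m`, the bucket size, and the function names are hypothetical; other meta-data, such as lower arm orientation or body size, could be added to the key in the same way.

```python
from collections import defaultdict


def partition_key(example, depth_bucket_m=0.5):
    """Key an example by hand side and a coarse depth bucket."""
    return (example["hand_side"], int(example["depth_m"] // depth_bucket_m))


def partition_training_data(examples, depth_bucket_m=0.5):
    """Group training examples into subsets that share a partition key."""
    partitions = defaultdict(list)
    for example in examples:
        partitions[partition_key(example, depth_bucket_m)].append(example)
    return dict(partitions)


examples = [
    {"hand_side": "left", "depth_m": 1.2, "state": "open"},
    {"hand_side": "left", "depth_m": 1.4, "state": "closed"},
    {"hand_side": "right", "depth_m": 2.1, "state": "open"},
]
partitions = partition_training_data(examples)
```

At classification time, the same key may be computed for the observed hand so that its shape descriptor is compared only against the matching subset.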
Additionally, in some examples, the training data may be specific to a particular application. That is, the training data may depend on expected actions in a given application, e.g., an expected activity in a game, etc. Further, in some examples, the training data may be user specific. For example, an application or game may include a training module wherein a user performs one or more training exercises to calibrate the training data. For example, a user may perform a sequence of open and closed hand postures to establish a training data set used in estimating user hand states during a subsequent interaction with the system.
Classification of a user's hand may be performed based on training examples in a variety of ways. For example, various machine learning techniques may be employed in the classification. Non-limiting examples include: support vector machine training, regression, nearest neighbor, (un)supervised clustering, etc.
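As one simple, non-limiting illustration of the nearest neighbor technique named above, the following sketch labels a shape descriptor with the state of the closest training example under Euclidean distance. The function name, the toy descriptors, and the choice of distance metric are assumptions made for illustration.

```python
import math


def classify_nearest_neighbor(descriptor, training_examples):
    """Return (label, distance) of the closest labeled training descriptor.

    training_examples: list of (descriptor, label) pairs, where every
    descriptor is a sequence of numbers of the same length.
    """
    best_label, best_distance = None, math.inf
    for example, label in training_examples:
        distance = math.dist(descriptor, example)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Toy three-bin descriptors; real descriptors would be longer vectors.
training_examples = [
    ([0.7, 0.3, 0.0], "closed"),
    ([0.2, 0.3, 0.5], "open"),
]
label, distance = classify_nearest_neighbor([0.6, 0.3, 0.1], training_examples)
```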
As remarked above, such classification techniques may use labeled depth image examples of various hand states for predicting the likelihood of an observed hand as being in one of several states. Additionally, confidences may be added to a classification either during or following the classification step. For example, confidence intervals may be assigned to an estimated hand state based on the training data or by fitting a sigmoid function, or other suitable error function, to the output of the classification step.
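Fitting a sigmoid function to classifier output may be sketched as follows. The example fits the two parameters of a logistic function to raw classifier scores by plain gradient descent on the log loss; the function names, learning rate, iteration count, and toy scores are all hypothetical, and a practical system might use a library routine instead.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_sigmoid(scores, labels, learning_rate=0.1, steps=2000):
    """Fit a and b so that sigmoid(a * score + b) approximates P(closed).

    scores: raw classifier outputs; labels: 1 for closed, 0 for open.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            error = sigmoid(a * s + b) - y
            grad_a += error * s
            grad_b += error
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


def confidence(score, a, b):
    """Estimated probability that the hand is closed for a raw score."""
    return sigmoid(a * score + b)


# Toy labeled scores with some overlap between the two states.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_sigmoid(scores, labels)
```

After fitting, `confidence(score, a, b)` maps any new classifier score to a value between 0 and 1 that may accompany the estimated hand state.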
As a simple, non-limiting example of classifying a hand state, there may be two possible hand states, open or closed, such as shown at 310 in
For example, as shown in
Various post-classification filtering steps may be employed to increase accuracy of the hand state estimations. Thus, at 211, method 200 may include a filtering step. For example, a temporal-consistency filtering step, e.g., a low-pass filter, may be applied to predicted hand states between consecutive depth image frames to smooth the predictions and reduce temporal jittering, e.g., due to spurious hand movements, sensor noise, or occasional classification errors. That is, a plurality of states of a user's hand may be estimated based on a plurality of depth images from the capture device or sensor, and temporal filtering of the plurality of estimates may be performed to estimate the state of the hand. Further, in some examples, classification results may be biased toward one state or another (e.g., towards open or closed hands), as some applications may be more sensitive to false positives (in one direction or another) than other applications.
Continuing with method 200, at 212 method 200 includes outputting a response based on the estimated hand state. For example, a command may be output to a console of a computing system, such as console 16 of computing system 10. As another example, a response may be output to a display device, such as display device 20. In this way, estimated motions of the user, including estimated hand states, may be translated into commands to a console 16 of the system 10, so that the user may interact with the system as described above. Further, the method and processes described above may be implemented to determine estimates of states of any part of a user's body, e.g., mouth, eyes, etc. For example, a posture of a body part of a user may be estimated using the methods described above.
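One possible form of such temporal filtering is sketched below, assuming the classifier reports a per-frame probability that the hand is closed. An exponential moving average serves as the low-pass filter, and two switching thresholds add hysteresis; moving either threshold biases the result toward one state. The function name, the smoothing factor, and the threshold values are hypothetical.

```python
def filter_hand_states(closed_probs, alpha=0.5, close_at=0.75,
                       open_at=0.25, initial_state="open"):
    """Smooth per-frame closed-hand probabilities and emit stable states.

    closed_probs: per-frame probabilities that the hand is closed.
    alpha: weight of the newest frame in the exponential moving average.
    close_at / open_at: smoothed values at which the state switches.
    """
    states = []
    state = initial_state
    smoothed = None
    for p in closed_probs:
        # Low-pass filter: exponential moving average over frames.
        smoothed = p if smoothed is None else alpha * p + (1 - alpha) * smoothed
        if state == "open" and smoothed >= close_at:
            state = "closed"
        elif state == "closed" and smoothed <= open_at:
            state = "open"
        states.append(state)
    return states


# A single spurious frame is ignored; a sustained change is accepted.
spike = filter_hand_states([0.1, 0.1, 0.9, 0.1, 0.1])
sustained = filter_hand_states([0.1, 0.9, 0.9, 0.9, 0.9])
```

With these values the isolated high-probability frame in the first sequence never changes the reported state, while the second sequence switches to closed after the change persists for several frames, at the cost of a short delay.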
The methods and processes described herein may be tied to a variety of different types of computing systems. The computing system 10 described above is a nonlimiting example system which includes a gaming console 16, display device 20, and capture device 12. As another, more general, example,
Computing system 400 may include a logic subsystem 402, a data-holding subsystem 404 operatively connected to the logic subsystem, a display subsystem 406, and/or a capture device 408. The computing system may optionally include components not shown in
Logic subsystem 402 may include one or more physical devices configured to execute one or more instructions. For example, logic subsystem 402 may be configured to execute one or more instructions that are part of one or more programs, routines, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more devices, or otherwise arrive at a desired result. Logic subsystem 402 may include one or more processors that are configured to execute software instructions. Additionally or alternatively, logic subsystem 402 may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Logic subsystem 402 may optionally include individual components that are distributed throughout two or more devices, which may be remotely located in some embodiments.
Data-holding subsystem 404 may include one or more physical devices configured to hold data and/or instructions executable by the logic subsystem to implement the herein described methods and processes. When such methods and processes are implemented, the state of data-holding subsystem 404 may be transformed (e.g., to hold different data). Data-holding subsystem 404 may include removable media and/or built-in devices. Data-holding subsystem 404 may include optical storage devices, semiconductor memory and storage devices (e.g., RAM, EEPROM, flash, etc.), and/or magnetic storage devices, among others. Data-holding subsystem 404 may include devices with one or more of the following characteristics: volatile, nonvolatile, dynamic, static, read/write, read-only, random access, sequential access, location addressable, file addressable, and content addressable. In some embodiments, logic subsystem 402 and data-holding subsystem 404 may be integrated into one or more common devices, such as an application specific integrated circuit or a system on a chip.
Display subsystem 406 may be used to present a visual representation of data held by data-holding subsystem 404. As the herein described methods and processes change the data held by the data-holding subsystem, and thus transform the state of the data-holding subsystem, the state of display subsystem 406 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 406 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic subsystem 402 and/or data-holding subsystem 404 in a shared enclosure, or such display devices may be peripheral display devices.
Computing system 400 further includes a capture device 408 configured to obtain depth images of one or more targets and/or scenes. Capture device 408 may be configured to capture video with depth information via any suitable technique (e.g., time-of-flight, structured light, stereo image, etc.). As such, capture device 408 may include a depth camera, a video camera, stereo cameras, and/or other suitable capture devices.
For example, in time-of-flight analysis, the capture device 408 may emit infrared light to the scene and may then use sensors to detect the backscattered light from the surfaces of the scene. In some cases, pulsed infrared light may be used, wherein the time between an outgoing light pulse and a corresponding incoming light pulse may be measured and used to determine a physical distance from the capture device to a particular location on the scene. In some cases, the phase of the outgoing light wave may be compared to the phase of the incoming light wave to determine a phase shift, and the phase shift may be used to determine a physical distance from the capture device to a particular location in the scene.
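The two time-of-flight relationships described above can be written out as a short sketch. For pulsed light, the distance is half the round-trip time multiplied by the speed of light. For the phase-based case, the sketch assumes a light wave modulated at a known frequency, which is an assumption not stated in the text; the function and parameter names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_pulse(round_trip_seconds):
    """Distance in meters from the round-trip time of a light pulse."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def distance_from_phase(phase_shift_rad, modulation_hz):
    """Distance in meters from the phase shift of a modulated light wave.

    The result is unambiguous only for phase shifts below 2 * pi, that is,
    for distances shorter than SPEED_OF_LIGHT_M_PER_S / (2 * modulation_hz).
    """
    return SPEED_OF_LIGHT_M_PER_S * phase_shift_rad / (4.0 * math.pi * modulation_hz)


pulse_distance = distance_from_pulse(20e-9)            # about 3.0 m
phase_distance = distance_from_phase(math.pi, 30e6)    # about 2.5 m
```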
In another example, time-of-flight analysis may be used to indirectly determine a physical distance from the capture device to a particular location in the scene by analyzing the intensity of the reflected beam of light over time via a technique such as shuttered light pulse imaging.
In another example, structured light analysis may be utilized by capture device 408 to capture depth information. In such an analysis, patterned light (e.g., light displayed as a known pattern such as a grid pattern or a stripe pattern) may be projected onto the scene. On the surfaces of the scene, the pattern may become deformed, and this deformation of the pattern may be studied to determine a physical distance from the capture device to a particular location in the scene.
In another example, the capture device may include two or more physically separated cameras that view a scene from different angles, to obtain visual stereo data. In such cases, the visual stereo data may be resolved to generate a depth image.
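As a hedged illustration of how stereo data may be resolved into depth, the following sketch applies the standard relationship for two rectified, parallel pinhole cameras: depth equals focal length times baseline divided by disparity. The rectified-camera assumption, the function name, and the numeric values are illustrative only.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point seen by two rectified, parallel cameras.

    disparity_px: horizontal pixel offset of the point between the images.
    focal_length_px: camera focal length expressed in pixels.
    baseline_m: distance between the two camera centers in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# A 30-pixel disparity with a 600-pixel focal length and a 0.1 m baseline
# corresponds to a depth of about 2 meters.
depth_m = depth_from_disparity(30.0, focal_length_px=600.0, baseline_m=0.1)
```

Applying this relationship to every matched pixel pair yields a depth image; nearer surfaces produce larger disparities.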
In other embodiments, capture device 408 may utilize other technologies to measure and/or calculate depth values.
In some embodiments, two or more different cameras may be incorporated into an integrated capture device. For example, a depth camera and a video camera (e.g., RGB video camera) may be incorporated into a common capture device. In some embodiments, two or more separate capture devices may be cooperatively used. For example, a depth camera and a separate video camera may be used. When a video camera is used, it may be used to provide target tracking data, confirmation data for error correction of scene analysis, image capture, face recognition, high-precision tracking of fingers (or other small features), light sensing, and/or other functions. In some embodiments, two or more depth and/or RGB cameras may be placed on different sides of the subject to obtain a more complete 3D model of the subject or to further refine the resolution of the observations around the hands. In other embodiments, a single camera may be used, e.g., to obtain an RGB image, and the image may be segmented based on color, e.g., based on a color of a hand.
It is to be understood that at least some depth analysis operations may be executed by a logic machine of one or more capture devices. A capture device may include one or more onboard processing units configured to perform one or more depth analysis functions. A capture device may include firmware to facilitate updating such onboard processing logic.
Computing system 400 may further include various subsystems configured to execute one or more instructions that are part of one or more programs, routines, objects, components, data structures, or other logical constructs. Such subsystems may be operatively connected to logic subsystem 402 and/or data-holding subsystem 404. In some examples, such subsystems may be implemented as software stored on a removable or non-removable computer-readable storage medium.
For example, computing system 400 may include an image segmentation subsystem 410 configured to identify a region of the depth image corresponding to the hand, the identifying being at least partially based on the skeleton information. Computing system 400 may additionally include a descriptor extraction subsystem 412 configured to extract a shape descriptor for a region identified by image segmentation subsystem 410. Computing system 400 may further include a classifier subsystem 414 configured to classify the shape descriptor based on training data to estimate the state of the hand.
It is to be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or in some cases omitted. Likewise, the order of the above-described processes may be changed.
Additionally, it should be understood that the examples of open and closed hand detection described here are exemplary in nature and are not intended to limit the scope of this disclosure. The methods and systems described herein may be applied to estimating a variety of refined gestures in a depth image. For example, various other hand profiles may be estimated using the systems and methods described herein. Non-limiting examples include: fist postures, open palm postures, pointing fingers, etc.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.
Number | Name | Date | Kind |
---|---|---|---|
4627620 | Yang | Dec 1986 | A |
4630910 | Ross et al. | Dec 1986 | A |
4645458 | Williams | Feb 1987 | A |
4695953 | Blair et al. | Sep 1987 | A |
4702475 | Elstein et al. | Oct 1987 | A |
4711543 | Blair et al. | Dec 1987 | A |
4751642 | Silva et al. | Jun 1988 | A |
4796997 | Svetkoff et al. | Jan 1989 | A |
4809065 | Harris et al. | Feb 1989 | A |
4817950 | Goo | Apr 1989 | A |
4843568 | Krueger et al. | Jun 1989 | A |
4893183 | Nayar | Jan 1990 | A |
4901362 | Terzian | Feb 1990 | A |
4925189 | Braeunig | May 1990 | A |
5101444 | Wilson et al. | Mar 1992 | A |
5148154 | MacKay et al. | Sep 1992 | A |
5184295 | Mann | Feb 1993 | A |
5229754 | Aoki et al. | Jul 1993 | A |
5229756 | Kosugi et al. | Jul 1993 | A |
5239463 | Blair et al. | Aug 1993 | A |
5239464 | Blair et al. | Aug 1993 | A |
5288078 | Capper et al. | Feb 1994 | A |
5295491 | Gevins | Mar 1994 | A |
5320538 | Baum | Jun 1994 | A |
5347306 | Nitta | Sep 1994 | A |
5385519 | Hsu et al. | Jan 1995 | A |
5405152 | Katanics et al. | Apr 1995 | A |
5417210 | Funda et al. | May 1995 | A |
5423554 | Davis | Jun 1995 | A |
5454043 | Freeman | Sep 1995 | A |
5469740 | French et al. | Nov 1995 | A |
5495576 | Ritchey | Feb 1996 | A |
5516105 | Eisenbrey et al. | May 1996 | A |
5524637 | Erickson | Jun 1996 | A |
5534917 | MacDougall | Jul 1996 | A |
5563988 | Maes et al. | Oct 1996 | A |
5577981 | Jarvik | Nov 1996 | A |
5580249 | Jacobsen et al. | Dec 1996 | A |
5594469 | Freeman et al. | Jan 1997 | A |
5597309 | Riess | Jan 1997 | A |
5616078 | Oh | Apr 1997 | A |
5617312 | Iura et al. | Apr 1997 | A |
5638300 | Johnson | Jun 1997 | A |
5641288 | Zaenglein | Jun 1997 | A |
5682196 | Freeman | Oct 1997 | A |
5682229 | Wangler | Oct 1997 | A |
5690582 | Ulrich et al. | Nov 1997 | A |
5703367 | Hashimoto et al. | Dec 1997 | A |
5704837 | Iwasaki et al. | Jan 1998 | A |
5715834 | Bergamasco et al. | Feb 1998 | A |
5774591 | Black et al. | Jun 1998 | A |
5875108 | Hoffberg et al. | Feb 1999 | A |
5877803 | Wee et al. | Mar 1999 | A |
5913727 | Ahdoot | Jun 1999 | A |
5933125 | Fernie | Aug 1999 | A |
5980256 | Carmein | Nov 1999 | A |
5989157 | Walton | Nov 1999 | A |
5995649 | Marugame | Nov 1999 | A |
6005548 | Latypov et al. | Dec 1999 | A |
6009210 | Kang | Dec 1999 | A |
6054991 | Crane et al. | Apr 2000 | A |
6066075 | Poulton | May 2000 | A |
6072494 | Nguyen | Jun 2000 | A |
6073489 | French et al. | Jun 2000 | A |
6077201 | Cheng | Jun 2000 | A |
6098458 | French et al. | Aug 2000 | A |
6100896 | Strohecker et al. | Aug 2000 | A |
6101289 | Kellner | Aug 2000 | A |
6128003 | Smith et al. | Oct 2000 | A |
6130677 | Kunz | Oct 2000 | A |
6141463 | Covell et al. | Oct 2000 | A |
6147678 | Kumar et al. | Nov 2000 | A |
6152856 | Studor et al. | Nov 2000 | A |
6159100 | Smith | Dec 2000 | A |
6173066 | Peurach et al. | Jan 2001 | B1 |
6181343 | Lyons | Jan 2001 | B1 |
6188777 | Darrell et al. | Feb 2001 | B1 |
6215890 | Matsuo et al. | Apr 2001 | B1 |
6215898 | Woodfill et al. | Apr 2001 | B1 |
6226396 | Marugame | May 2001 | B1 |
6229913 | Nayar et al. | May 2001 | B1 |
6256033 | Nguyen | Jul 2001 | B1 |
6256400 | Takata et al. | Jul 2001 | B1 |
6283860 | Lyons et al. | Sep 2001 | B1 |
6289112 | Jain et al. | Sep 2001 | B1 |
6299308 | Voronka et al. | Oct 2001 | B1 |
6308565 | French et al. | Oct 2001 | B1 |
6316934 | Amorai-Moriya et al. | Nov 2001 | B1 |
6363160 | Bradski et al. | Mar 2002 | B1 |
6384819 | Hunter | May 2002 | B1 |
6411744 | Edwards | Jun 2002 | B1 |
6430997 | French et al. | Aug 2002 | B1 |
6476834 | Doval et al. | Nov 2002 | B1 |
6496598 | Harman | Dec 2002 | B1 |
6503195 | Keller et al. | Jan 2003 | B1 |
6539931 | Trajkovic et al. | Apr 2003 | B2 |
6570555 | Prevost et al. | May 2003 | B1 |
6633294 | Rosenthal et al. | Oct 2003 | B1 |
6640202 | Dietz et al. | Oct 2003 | B1 |
6661918 | Gordon et al. | Dec 2003 | B1 |
6681031 | Cohen et al. | Jan 2004 | B2 |
6714665 | Hanna et al. | Mar 2004 | B1 |
6721444 | Gu et al. | Apr 2004 | B1 |
6731799 | Sun et al. | May 2004 | B1 |
6738066 | Nguyen | May 2004 | B1 |
6765726 | French et al. | Jul 2004 | B2 |
6788809 | Grzeszczuk et al. | Sep 2004 | B1 |
6801637 | Voronka et al. | Oct 2004 | B2 |
6873723 | Aucsmith et al. | Mar 2005 | B1 |
6876496 | French et al. | Apr 2005 | B2 |
6937742 | Roberts et al. | Aug 2005 | B2 |
6950534 | Cohen et al. | Sep 2005 | B2 |
7003134 | Covell et al. | Feb 2006 | B1 |
7007035 | Kamath et al. | Feb 2006 | B2 |
7036094 | Cohen et al. | Apr 2006 | B1 |
7038855 | French et al. | May 2006 | B2 |
7039676 | Day et al. | May 2006 | B1 |
7042440 | Pryor et al. | May 2006 | B2 |
7050606 | Paul et al. | May 2006 | B2 |
7058204 | Hildreth et al. | Jun 2006 | B2 |
7060957 | Lange et al. | Jun 2006 | B2 |
7113918 | Ahmad et al. | Sep 2006 | B1 |
7121946 | Paul et al. | Oct 2006 | B2 |
7170492 | Bell | Jan 2007 | B2 |
7184048 | Hunter | Feb 2007 | B2 |
7202898 | Braun et al. | Apr 2007 | B1 |
7222078 | Abelow | May 2007 | B2 |
7227526 | Hildreth et al. | Jun 2007 | B2 |
7257237 | Luck et al. | Aug 2007 | B1 |
7259747 | Bell | Aug 2007 | B2 |
7289645 | Yamamoto et al. | Oct 2007 | B2 |
7308112 | Fujimura et al. | Dec 2007 | B2 |
7317836 | Fujimura et al. | Jan 2008 | B2 |
7348963 | Bell | Mar 2008 | B2 |
7359121 | French et al. | Apr 2008 | B2 |
7367887 | Watabe et al. | May 2008 | B2 |
7372977 | Fujimura et al. | May 2008 | B2 |
7379563 | Shamaie | May 2008 | B2 |
7379566 | Hildreth | May 2008 | B2 |
7389591 | Jaiswal et al. | Jun 2008 | B2 |
7412077 | Li et al. | Aug 2008 | B2 |
7421093 | Hildreth et al. | Sep 2008 | B2 |
7430312 | Gu | Sep 2008 | B2 |
7436496 | Kawahito | Oct 2008 | B2 |
7450736 | Yang et al. | Nov 2008 | B2 |
7452275 | Kuraishi | Nov 2008 | B2 |
7460690 | Cohen et al. | Dec 2008 | B2 |
7489812 | Fox et al. | Feb 2009 | B2 |
7536032 | Bell | May 2009 | B2 |
7555142 | Hildreth et al. | Jun 2009 | B2 |
7560701 | Oggier et al. | Jul 2009 | B2 |
7570805 | Gu | Aug 2009 | B2 |
7574020 | Shamaie | Aug 2009 | B2 |
7574411 | Suontausta et al. | Aug 2009 | B2 |
7576727 | Bell | Aug 2009 | B2 |
7590262 | Fujimura et al. | Sep 2009 | B2 |
7593552 | Higaki et al. | Sep 2009 | B2 |
7598942 | Underkoffler et al. | Oct 2009 | B2 |
7607509 | Schmiz et al. | Oct 2009 | B2 |
7620202 | Fujimura et al. | Nov 2009 | B2 |
7668340 | Cohen et al. | Feb 2010 | B2 |
7680298 | Roberts et al. | Mar 2010 | B2 |
7683954 | Ichikawa et al. | Mar 2010 | B2 |
7684592 | Paul et al. | Mar 2010 | B2 |
7701439 | Hillis et al. | Apr 2010 | B2 |
7702130 | Im et al. | Apr 2010 | B2 |
7704135 | Harrison, Jr. | Apr 2010 | B2 |
7710391 | Bell et al. | May 2010 | B2 |
7729530 | Antonov et al. | Jun 2010 | B2 |
7746345 | Hunter | Jun 2010 | B2 |
7760182 | Ahmad et al. | Jul 2010 | B2 |
7809167 | Bell | Oct 2010 | B2 |
7834846 | Bell | Nov 2010 | B1 |
7852262 | Namineni et al. | Dec 2010 | B2 |
RE42256 | Edwards | Mar 2011 | E |
7898522 | Hildreth et al. | Mar 2011 | B2 |
7974443 | Kipman et al. | Jul 2011 | B2 |
8035612 | Bell et al. | Oct 2011 | B2 |
8035614 | Bell et al. | Oct 2011 | B2 |
8035624 | Bell et al. | Oct 2011 | B2 |
8072470 | Marks | Dec 2011 | B2 |
20020041327 | Hildreth et al. | Apr 2002 | A1 |
20030085887 | Hunt et al. | May 2003 | A1 |
20080019589 | Yoon et al. | Jan 2008 | A1 |
20080026838 | Dunstan et al. | Jan 2008 | A1 |
20080201340 | Thonangi | Aug 2008 | A1 |
20090110292 | Fujimura et al. | Apr 2009 | A1 |
20100094800 | Sharp | Apr 2010 | A1 |
20100197392 | Geiss | Aug 2010 | A1 |
20100214322 | Lim et al. | Aug 2010 | A1 |
20100215257 | Dariush et al. | Aug 2010 | A1 |
20120092445 | McDowell et al. | Apr 2012 | A1 |
20120154373 | Finocchio et al. | Jun 2012 | A1 |
20120163723 | Balan et al. | Jun 2012 | A1 |
Number | Date | Country |
---|---|---|
201254344 | Jun 2010 | CN |
0583061 | Feb 1994 | EP |
08044490 | Feb 1996 | JP |
9310708 | Jun 1993 | WO |
9717598 | May 1997 | WO |
9944698 | Sep 1999 | WO |
Entry |
---|
Athitsos, et al., “An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation”, Retrieved at << http://luthuli.cs.uiuc.edu/˜daf/courses/AppCV/Papers/01004129.pdf >>, In Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 6. |
Li, et al., “Real time Hand Gesture Recognition using a Range Camera”, Retrieved at << http://www.araa.asn.au/acra/acra2009/papers/pap128s1.pdf >>, Australasian Conference on Robotics and Automation (ACRA), Dec. 2-4, 2009, pp. 7. |
Kanade et al., “A Stereo Machine for Video-rate Dense Depth Mapping and Its New Applications”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1996, pp. 196-202, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA. |
Miyagawa et al., “CCD-Based Range Finding Sensor”, Oct. 1997, pp. 1648-1652, vol. 44 No. 10, IEEE Transactions on Electron Devices. |
Rosenhahn et al., “Automatic Human Model Generation”, 2005, pp. 41-48, University of Auckland (CITR), New Zealand. |
Aggarwal et al., “Human Motion Analysis: A Review”, IEEE Nonrigid and Articulated Motion Workshop, 1997, University of Texas at Austin, Austin, TX. |
Shao et al., “An Open System Architecture for a Multimedia and Multimodal User Interface”, Aug. 24, 1998, Japanese Society for Rehabilitation of Persons with Disabilities (JSRPD), Japan. |
Kohler, “Special Topics of Gesture Recognition Applied in Intelligent Home Environments”, In Proceedings of the Gesture Workshop, 1998, pp. 285-296, Germany. |
Kohler, “Vision Based Remote Control in Intelligent Home Environments”, University of Erlangen-Nuremberg/ Germany, 1996, pp. 147-154, Germany. |
Kohler, “Technical Details and Ergonomical Aspects of Gesture Recognition applied in Intelligent Home Environments”, 1997, Germany. |
Hasegawa et al., “Human-Scale Haptic Interaction with a Reactive Virtual Human in a Real-Time Physics Simulator”, Jul. 2006, vol. 4, No. 3, Article 6C, ACM Computers in Entertainment, New York, NY. |
Qian et al., “A Gesture-Driven Multimodal Interactive Dance System”, Jun. 2004, pp. 1579-1582, IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan. |
Zhao, “Dressed Human Modeling, Detection, and Parts Localization”, 2001, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA. |
He, “Generation of Human Body Models”, Apr. 2005, University of Auckland, New Zealand. |
Isard et al., “Condensation—Conditional Density Propagation for Visual Tracking”, 1998, pp. 5-28, International Journal of Computer Vision 29(1), Netherlands. |
Livingston, “Vision-based Tracking with Dynamic Structured Light for Video See-through Augmented Reality”, 1998, University of North Carolina at Chapel Hill, North Carolina, USA. |
Wren et al., “Pfinder: Real-Time Tracking of the Human Body”, MIT Media Laboratory Perceptual Computing Section Technical Report No. 353, Jul. 1997, vol. 19, No. 7, pp. 780-785, IEEE Transactions on Pattern Analysis and Machine Intelligence, Cambridge, MA. |
Breen et al., “Interactive Occlusion and Collision of Real and Virtual Objects in Augmented Reality”, Technical Report ECRC-95-02, 1995, European Computer-Industry Research Center GmbH, Munich, Germany. |
Freeman et al., “Television Control by Hand Gestures”, Dec. 1994, Mitsubishi Electric Research Laboratories, TR94-24, Cambridge, MA. |
Hongo et al., “Focus of Attention for Face and Hand Gesture Recognition Using Multiple Cameras”, Mar. 2000, pp. 156-161, 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France. |
Pavlovic et al., “Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review”, Jul. 1997, pp. 677-695, vol. 19, No. 7, IEEE Transactions on Pattern Analysis and Machine Intelligence. |
Azarbayejani et al., “Visually Controlled Graphics”, Jun. 1993, vol. 15, No. 6, IEEE Transactions on Pattern Analysis and Machine Intelligence. |
Granieri et al., “Simulating Humans in VR”, The British Computer Society, Oct. 1994, Academic Press. |
Brogan et al., “Dynamically Simulated Characters in Virtual Environments”, Sep./Oct. 1998, pp. 2-13, vol. 18, Issue 5, IEEE Computer Graphics and Applications. |
Fisher et al., “Virtual Environment Display System”, ACM Workshop on Interactive 3D Graphics, Oct. 1986, Chapel Hill, NC. |
“Virtual High Anxiety”, Tech Update, Aug. 1995, pp. 22. |
Sheridan et al., “Virtual Reality Check”, Technology Review, Oct. 1993, pp. 22-28, vol. 96, No. 7. |
Stevens, “Flights into Virtual Reality Treating Real World Disorders”, The Washington Post, Mar. 27, 1995, Science Psychology, 2 pages. |
“Simulation and Training”, 1994, Division Incorporated. |
Plagemann, et al., “Real-time Identification and Localization of Body Parts from Depth Images”, Retrieved at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=arnumber=5509559 >>, IEEE International Conference on Robotics and Automation (ICRA), May 3-7, 2010, 6 Pages. |
Cohen, I. et al., “Inference of Human Postures by Classification of 3d Human Body Shape”, IEEE Workshop on Analysis and Modeling of Faces and Gestures, Mar. 2003, 8 pages. |
Jungling, et al., “Feature Based Person Detection Beyond the Visible Spectrum”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Jun. 20-25, 2009, pp. 30-37. |
Khan, et al., “Real-time Human Motion Detection and Classification”, IEEE Proceedings Students Conference, Aug. 16-17, 2002, pp. 135-139. |
“Human motion-capture for Xbox Kinect”, Retrieved at << http://research.microsft.com/en-us/projects/vrkinect/ >>, Retrieved Date: Apr. 15, 2011, 3 Pages. |
Bolan, A. et al., “Attribute State Classification,” U.S. Appl. No. 13/098,899, filed May 2, 2011, 38 pages. |
Finocchio, M. et al., “Parallel Processing Machine Learning Decision Tree Training”, U.S. Appl. No. 12/969,112, filed Dec. 15, 2010, 33 pages. |
Number | Date | Country | |
---|---|---|---|
20120163723 A1 | Jun 2012 | US |