Many computing applications such as computer games, multimedia applications, or the like use controls to allow users to manipulate game characters or other aspects of an application. Typically such controls are input using, for example, controllers, remotes, keyboards, mice, or the like. Unfortunately, such controls can be difficult to learn, thus creating a barrier between a user and such games and applications. Furthermore, such controls may be different than actual game actions or other application actions for which the controls are used. For example, a game control that causes a game character to swing a baseball bat may not correspond to an actual motion of swinging the baseball bat.
Disclosed herein are systems and methods for processing depth information of a scene that may be used to interpret human input. For example, a depth image of the scene may be received, captured, or observed. The depth image may include a human target and an environment such as a background, one or more non-human foreground objects, or the like. According to an example embodiment, the depth image may be analyzed to determine which pixels are associated with the human target and which are associated with the environment, that is, the pixels that may not be associated with the human target (the non-player pixels). The pixels associated with the environment may then be removed from the depth image such that the human target may be isolated in the depth image. The isolated human target may be used to track a model of the human target to, for example, animate an avatar and/or control various computing applications.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
As will be described herein, a user may control an application executing on a computing environment such as a game console, a computer, or the like by performing one or more gestures. According to one embodiment, the gestures may be received by, for example, a capture device. For example, the capture device may capture a depth image of a scene. In one embodiment, the depth image of the scene may be received, captured, or observed. The depth image may include a human target and an environment such as a background, foreground objects that may not be associated with the human target, or the like. In an example embodiment, the environment may include one or more non-human targets such as a wall, furniture, or the like. The depth image may be analyzed to determine whether each of one or more pixels is associated with the environment or with the human target. The one or more pixels associated with the environment may be removed or discarded to isolate the foreground object. The depth image with the isolated foreground object may then be processed. For example, as described above, the isolated foreground object may include a human target. According to an example embodiment, a model of the human target, or of any other desired shape, may be generated and/or tracked to, for example, animate an avatar and/or control various computing applications.
As shown in
As shown in
According to one embodiment, the target recognition, analysis, and tracking system 10 may be connected to an audiovisual device 16 such as a television, a monitor, a high-definition television (HDTV), or the like that may provide game or application visuals and/or audio to a user such as the user 18. For example, the computing environment 12 may include a video adapter such as a graphics card and/or an audio adapter such as a sound card that may provide audiovisual signals associated with the game application, non-game application, or the like. The audiovisual device 16 may receive the audiovisual signals from the computing environment 12 and may then output the game or application visuals and/or audio associated with the audiovisual signals to the user 18. According to one embodiment, the audiovisual device 16 may be connected to the computing environment 12 via, for example, an S-Video cable, a coaxial cable, an HDMI cable, a DVI cable, a VGA cable, or the like.
As shown in
As shown in
Other movements by the user 18 may also be interpreted as other controls or actions, such as controls to bob, weave, shuffle, block, jab, or throw a variety of different power punches. Furthermore, some movements may be interpreted as controls that may correspond to actions other than controlling the human target avatar 40. For example, the human target may use movements to end, pause, or save a game, select a level, view high scores, communicate with a friend, etc. Additionally, a full range of motion of the user 18 may be available, used, and analyzed in any suitable manner to interact with an application.
In example embodiments, the human target such as the user 18 may have an object. In such embodiments, the user of an electronic game may be holding the object such that the motions of the human target and the object may be used to adjust and/or control parameters of the game. For example, the motion of a human target holding a racket may be tracked and utilized for controlling an on-screen racket in an electronic sports game. In another example embodiment, the motion of a human target holding an object may be tracked and utilized for controlling an on-screen weapon in an electronic combat game.
According to other example embodiments, the target recognition, analysis, and tracking system 10 may further be used to interpret target movements as operating system and/or application controls that are outside the realm of games. For example, virtually any controllable aspect of an operating system and/or application may be controlled by movements of the target such as the user 18.
As shown in
As shown in
According to another example embodiment, time-of-flight analysis may be used to indirectly determine a physical distance from the capture device 20 to a particular location on the targets or objects by analyzing the intensity of the reflected beam of light over time via various techniques including, for example, shuttered light pulse imaging.
In another example embodiment, the capture device 20 may use structured light to capture depth information. In such an analysis, patterned light (i.e., light displayed as a known pattern such as a grid pattern or a stripe pattern) may be projected onto the scene via, for example, the IR light component 24. Upon striking the surface of one or more targets or objects in the scene, the pattern may become deformed in response. Such a deformation of the pattern may be captured by, for example, the 3-D camera 26 and/or the RGB camera 28 and may then be analyzed to determine a physical distance from the capture device 20 to a particular location on the targets or objects.
According to another embodiment, the capture device 20 may include two or more physically separated cameras that may view a scene from different angles to obtain visual stereo data that may be resolved to generate depth information.
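For the stereo embodiment, depth may be recovered by triangulation. The following is a minimal Python sketch that assumes two rectified, parallel cameras modeled as pinhole cameras, which the description above does not specify. It applies the standard relation Z = f * B / d between depth Z, focal length f (in pixels), camera baseline B, and disparity d (in pixels). The function name and the example focal length and baseline are hypothetical.

```python
def stereo_depth_mm(disparity_px: float, focal_length_px: float, baseline_mm: float) -> float:
    """Depth of a scene point seen by two rectified, parallel cameras.

    Uses the pinhole triangulation relation Z = f * B / d, where f is the
    focal length in pixels, B is the distance between the two camera centers,
    and d is the horizontal shift (disparity) of the point between the images.
    The result has the same unit as the baseline.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values
        # indicate a failed match between the two views.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_mm / disparity_px


# Hypothetical camera parameters: 580 px focal length, 75 mm baseline.
print(stereo_depth_mm(29, 580, 75))  # 1500.0
```

Because depth is inversely proportional to disparity, depth resolution degrades for distant points, where a one-pixel change in disparity spans a larger range of depths.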
The capture device 20 may further include a microphone 30. The microphone 30 may include a transducer or sensor that may receive and convert sound into an electrical signal. According to one embodiment, the microphone 30 may be used to reduce feedback between the capture device 20 and the computing environment 12 in the target recognition, analysis, and tracking system 10. Additionally, the microphone 30 may be used to receive audio signals that may also be provided by the user to control applications such as game applications, non-game applications, or the like that may be executed by the computing environment 12.
In an example embodiment, the capture device 20 may further include a processor 32 that may be in operative communication with the image camera component 22. The processor 32 may include a standardized processor, a specialized processor, a microprocessor, or the like that may execute instructions that may include instructions for receiving a depth image of a scene, determining whether one or more pixels are associated with an environment of the depth image, discarding the one or more pixels associated with the environment from the depth image to isolate a desired object such as a human target in the depth image, and processing the depth image with the isolated desired object, each of which will be described in more detail below.
The capture device 20 may further include a memory component 34 that may store the instructions that may be executed by the processor 32, images or frames of images captured by the 3-D camera or RGB camera, or any other suitable information, images, or the like. According to an example embodiment, the memory component 34 may include random access memory (RAM), read only memory (ROM), cache, Flash memory, a hard disk, or any other suitable storage component. As shown in
As shown in
Additionally, the capture device 20 may provide the depth information and images captured by, for example, the 3-D camera 26 and/or the RGB camera 28, and a skeletal model that may be generated by the capture device 20 to the computing environment 12 via the communication link 36. The computing environment 12 may then use the skeletal model, depth information, and captured images to, for example, control an application such as a game or word processor. For example, as shown, in
A graphics processing unit (GPU) 108 and a video encoder/video codec (coder/decoder) 114 form a video processing pipeline for high speed and high resolution graphics processing. Data is carried from the graphics processing unit 108 to the video encoder/video codec 114 via a bus. The video processing pipeline outputs data to an A/V (audio/video) port 140 for transmission to a television or other display. A memory controller 110 is connected to the GPU 108 to facilitate processor access to various types of memory 112, such as, but not limited to, a RAM (Random Access Memory).
The multimedia console 100 includes an I/O controller 120, a system management controller 122, an audio processing unit 123, a network interface controller 124, a first USB host controller 126, a second USB controller 128 and a front panel I/O subassembly 130 that are preferably implemented on a module 118. The USB controllers 126 and 128 serve as hosts for peripheral controllers 142(1)-142(2), a wireless adapter 148, and an external memory device 146 (e.g., flash memory, external CD/DVD ROM drive, removable media, etc.). The network interface 124 and/or wireless adapter 148 provide access to a network (e.g., the Internet, home network, etc.) and may be any of a wide variety of wired or wireless adapter components including an Ethernet card, a modem, a Bluetooth module, a cable modem, and the like.
System memory 143 is provided to store application data that is loaded during the boot process. A media drive 144 is provided and may comprise a DVD/CD drive, hard drive, or other removable media drive, etc. The media drive 144 may be internal or external to the multimedia console 100. Application data may be accessed via the media drive 144 for execution, playback, etc. by the multimedia console 100. The media drive 144 is connected to the I/O controller 120 via a bus, such as a Serial ATA bus or other high speed connection (e.g., IEEE 1394).
The system management controller 122 provides a variety of service functions related to assuring availability of the multimedia console 100. The audio processing unit 123 and an audio codec 132 form a corresponding audio processing pipeline with high fidelity and stereo processing. Audio data is carried between the audio processing unit 123 and the audio codec 132 via a communication link. The audio processing pipeline outputs data to the A/V port 140 for reproduction by an external audio player or device having audio capabilities.
The front panel I/O subassembly 130 supports the functionality of the power button 150 and the eject button 152, as well as any LEDs (light emitting diodes) or other indicators exposed on the outer surface of the multimedia console 100. A system power supply module 136 provides power to the components of the multimedia console 100. A fan 138 cools the circuitry within the multimedia console 100.
The CPU 101, GPU 108, memory controller 110, and various other components within the multimedia console 100 are interconnected via one or more buses, including serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can include a Peripheral Component Interconnects (PCI) bus, PCI-Express bus, etc.
When the multimedia console 100 is powered ON, application data may be loaded from the system memory 143 into memory 112 and/or caches 102, 104 and executed on the CPU 101. The application may present a graphical user interface that provides a consistent user experience when navigating to different media types available on the multimedia console 100. In operation, applications and/or other media contained within the media drive 144 may be launched or played from the media drive 144 to provide additional functionalities to the multimedia console 100.
The multimedia console 100 may be operated as a standalone system by simply connecting the system to a television or other display. In this standalone mode, the multimedia console 100 allows one or more users to interact with the system, watch movies, or listen to music. However, with the integration of broadband connectivity made available through the network interface 124 or the wireless adapter 148, the multimedia console 100 may further be operated as a participant in a larger network community.
When the multimedia console 100 is powered ON, a set amount of hardware resources is reserved for system use by the multimedia console operating system. These resources may include a reservation of memory (e.g., 16 MB), CPU and GPU cycles (e.g., 5%), networking bandwidth (e.g., 8 kbps), etc. Because these resources are reserved at system boot time, the reserved resources do not exist from the application's view.
In particular, the memory reservation preferably is large enough to contain the launch kernel, concurrent system applications and drivers. The CPU reservation is preferably constant such that if the reserved CPU usage is not used by the system applications, an idle thread will consume any unused cycles.
With regard to the GPU reservation, lightweight messages generated by the system applications (e.g., popups) are displayed by using a GPU interrupt to schedule code to render the popup into an overlay. The amount of memory required for an overlay depends on the overlay area size and the overlay preferably scales with screen resolution. Where a full user interface is used by the concurrent system application, it is preferable to use a resolution independent of application resolution. A scaler may be used to set this resolution such that the need to change frequency and cause a TV resynch is eliminated.
After the multimedia console 100 boots and system resources are reserved, concurrent system applications execute to provide system functionalities. The system functionalities are encapsulated in a set of system applications that execute within the reserved system resources described above. The operating system kernel identifies threads that are system application threads versus gaming application threads. The system applications are preferably scheduled to run on the CPU 101 at predetermined times and intervals in order to provide a consistent system resource view to the application. The scheduling is intended to minimize cache disruption for the gaming application running on the console.
When a concurrent system application requires audio, audio processing is scheduled asynchronously to the gaming application due to time sensitivity. A multimedia console application manager (described below) controls the gaming application audio level (e.g., mute, attenuate) when system applications are active.
Input devices (e.g., controllers 142(1) and 142(2)) are shared by gaming applications and system applications. The input devices are not reserved resources, but are to be switched between system applications and the gaming application such that each will have a focus of the device. The application manager preferably controls the switching of the input stream without the gaming application's knowledge, and a driver maintains state information regarding focus switches. The cameras 26, 28 and capture device 20 may define additional input devices for the console 100.
In
The computer 241 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The computer 241 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 246. The remote computer 246 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 241, although only a memory storage device 247 has been illustrated in
When used in a LAN networking environment, the computer 241 is connected to the LAN 245 through a network interface or adapter 237. When used in a WAN networking environment, the computer 241 typically includes a modem 250 or other means for establishing communications over the WAN 249, such as the Internet. The modem 250, which may be internal or external, may be connected to the system bus 221 via the user input interface 236, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 241, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
According to one embodiment, at 505, a depth image may be received. For example, the target recognition, analysis, and tracking system may include a capture device such as the capture device 20 described above with respect to
The depth image may be a plurality of observed pixels where each observed pixel has an observed depth value. For example, the depth image may include a two-dimensional (2-D) pixel area of the captured scene where each pixel in the 2-D pixel area may have a depth value such as a length or distance in, for example, centimeters, millimeters, or the like of an object in the captured scene from the capture device.
As described above, the depth image 600 may include a plurality of observed pixels where each observed pixel has an observed depth value associated therewith. For example, the depth image 600 may include a two-dimensional (2-D) pixel area of the captured scene where each pixel in the 2-D pixel area may represent a depth value such as a length or distance in, for example, centimeters, millimeters, or the like of a target or object in the captured scene from the capture device 20. In one embodiment, the depth image 600 may be colorized such that different colors of the pixels of the depth image correspond to and/or visually depict different distances of the one or more human targets 602, 604 and non-human targets 606 from the capture device 20. For example, according to one embodiment, the pixels associated with a target closest to the capture device may be colored with shades of red and/or orange in the depth image whereas the pixels associated with a target farther away may be colored with shades of green and/or blue in the depth image.
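The colorization described above can be sketched as a mapping from depth to hue. The following minimal Python sketch assumes depth values in millimeters stored as a list of rows, with 0 marking a pixel that has no valid depth, and assumes a linear hue ramp from red (near) through orange and green to blue (far) between two hypothetical clipping distances `near_mm` and `far_mm`. The description only requires that near pixels appear red/orange and far pixels green/blue; the specific ramp and the function name are illustrative choices.

```python
import colorsys
from typing import List, Tuple

RGB = Tuple[int, int, int]


def colorize_depth(depth: List[List[int]], near_mm: int, far_mm: int) -> List[List[RGB]]:
    """Color a depth image so near pixels are red/orange and far pixels green/blue.

    Depth values are in millimeters; 0 marks a pixel with no valid depth and is
    drawn black. Values outside [near_mm, far_mm] are clamped.
    """
    span = float(far_mm - near_mm)
    image: List[List[RGB]] = []
    for row in depth:
        colored: List[RGB] = []
        for d in row:
            if d <= 0:
                colored.append((0, 0, 0))
                continue
            t = min(max((d - near_mm) / span, 0.0), 1.0)  # 0 = nearest, 1 = farthest
            hue = t * (2.0 / 3.0)  # hue 0 is red, 1/3 is green, 2/3 is blue
            r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
            colored.append((round(r * 255), round(g * 255), round(b * 255)))
        image.append(colored)
    return image
```

For example, with `near_mm=1000` and `far_mm=4000`, a pixel at 1000 mm is drawn pure red, a pixel at 2500 mm pure green, and a pixel at 4000 mm pure blue.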
Referring back to
For example, in one embodiment, the target recognition, analysis, and tracking system may calculate missing and/or removed depth values for pixels associated with infrared shadows in the depth image received at 505.
According to an example embodiment, the right hand 702 and the left hand 705 that may be extended in front of a portion of the human target 602 may generate respective first and second infrared shadows 708 and 710. The first and second infrared shadows 708 and 710 may include portions of the depth image 700 observed or captured by a capture device such as the capture device 20 described above with respect to
The first and second infrared shadows 708 and 710 may separate a body part from another body part of human target 602. For example, as shown in
In one embodiment, depth values of the pixels associated with an infrared shadow such as the infrared shadow 708 may be replaced. For example, the target recognition, analysis, and tracking system may estimate one or more depth values for the shadow that may replace the invalid depth values. According to one embodiment, the depth value for an infrared shadow pixel may be estimated based on neighboring non-shadow pixels. For example, the target recognition, analysis, and tracking system may identify an infrared shadow pixel. Upon identifying the infrared shadow pixel, the target recognition, analysis, and tracking system may determine whether one or more pixels adjacent to the infrared shadow pixel have valid depth values. If one or more adjacent pixels have valid depth values, a depth value for the infrared shadow pixel may be generated based on the valid depth values of the adjacent pixels. For example, in one embodiment, the target recognition, analysis, and tracking system may estimate or interpolate a depth value for the shadow pixel from the valid depth values of the adjacent pixels. The target recognition, analysis, and tracking system may also assign the shadow pixel the depth value of one of the adjacent pixels that has a valid depth value.
According to one embodiment, the target recognition, analysis, and tracking system may identify other infrared shadow pixels and calculate depth values for those pixels as described above until each of the infrared shadow pixels may have a depth value associated therewith. Thus, in an example embodiment, the target recognition, analysis, and tracking system may interpolate a value for each of the infrared shadow pixels based on neighboring or adjacent pixels that may have a valid depth value associated therewith.
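The repeated neighbor-based estimation described above can be sketched as an iterative fill that works inward from the valid pixels bordering a shadow. This is a minimal Python sketch, not the system's actual implementation: it assumes the shadow pixels are already identified as a set of (row, column) coordinates, uses 4-connected neighbors, and estimates each value as the mean of the valid neighbors (the description also allows simply copying one valid neighbor's value). The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def fill_shadow_pixels(depth: List[List[int]], shadow: Set[Pixel]) -> List[List[int]]:
    """Estimate depth values for infrared-shadow pixels from valid neighbors.

    Each pass gives every shadow pixel that touches at least one valid
    4-connected neighbor the mean of those neighbors' depths. Pixels filled in
    one pass count as valid in the next, so the estimate grows inward until all
    shadow pixels have a value or no further pixel can be reached.
    """
    height, width = len(depth), len(depth[0])
    out = [row[:] for row in depth]  # leave the input image untouched
    pending = set(shadow)
    while pending:
        estimates: Dict[Pixel, int] = {}
        for y, x in pending:
            neighbors = [
                out[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
                and (ny, nx) not in pending and out[ny][nx] > 0
            ]
            if neighbors:
                estimates[(y, x)] = round(sum(neighbors) / len(neighbors))
        if not estimates:
            break  # the remaining shadow pixels have no valid neighbors at all
        for (y, x), value in estimates.items():
            out[y][x] = value
        pending.difference_update(estimates)
    return out
```

Computing all estimates for a pass before writing any of them keeps the result independent of the order in which shadow pixels are visited.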
Additionally, in another example embodiment, the target recognition, analysis, and tracking system may calculate depth values for one or more infrared shadow pixels based on the depth image of a previous frame. As described above, the capture device such as the capture device 20 described above with respect to
At 515, a human target in a depth image may be scanned for one or more body parts. For example, upon receiving a depth image, the target recognition, analysis, and tracking system may determine whether the depth image includes a human target such as the human targets 602 and 604 described above with respect to
In one embodiment, the target recognition, analysis, and tracking system may determine whether a human target in the depth image may have been previously scanned, at 510, before the human target may be scanned at 515. For example, the capture device such as the capture device 20 described above with respect to
At 520, an environment of the depth image may be determined. For example, as described above, the depth image may be a plurality of observed pixels in a two-dimensional (2-D) pixel area where each observed pixel has an observed depth value. In one embodiment, the target recognition, analysis, and tracking system may determine whether one or more of the pixels in the depth image may be associated with the human target or environment of the depth image. As described above, the environment of the depth image may include, for example, environment objects behind a human target, environment objects above a human target, environment objects surrounding a left and a right side of a human target, environment objects in front of a human target, or the like in the depth image.
In an example embodiment, the target recognition, analysis, and tracking system may determine the environment of the depth image by initially defining a bounding box around each foreground object such as each human target in the depth image received at 505. For example, the target recognition, analysis, and tracking system may define a bounding box for each human target such as the human targets 602, 604 described above with respect to
The body measurements, such as the length, width, or the like associated with one or more body parts, and the calculated centroid may then be used to determine the sides of the bounding box 802. For example, the bounding box 802 may be defined by the intersection of the respective first, second, third, and fourth sides 804a-804d. According to an example embodiment, the locations of the first side 804a and the third side 804c may be determined by adding the measurements, such as the lengths associated with the respective left and right arms determined by the scan, to an X value associated with the centroid in the direction of the left arm and the direction of the right arm, respectively. Additionally, in one embodiment, the locations of the second side 804b and the fourth side 804d may be determined based on the Y values associated with the location of the top of the head of the human target and the bottom of the legs determined by the scan. The bounding box 802 may then be defined by the intersection of, for example, the first side 804a and the second side 804b, the first side 804a and the fourth side 804d, the third side 804c and the second side 804b, and the third side 804c and the fourth side 804d.
According to an example embodiment, after defining the bounding box for the human target 602, the pixels in the depth image 800 outside the bounding box 802 may be identified as the non-human target pixel, or the pixels associated with the environment, of the depth image 800.
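The bounding-box construction and the classification of outside pixels can be sketched as follows. This minimal Python sketch assumes that the scan results have already been converted to pixel coordinates, that the left arm extends toward smaller column indices, and that the box is clipped to the image; `BodyScan`, `bounding_box`, and `environment_pixels` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


@dataclass
class BodyScan:
    """Hypothetical container for measurements produced by the body scan."""
    centroid_x: int     # column of the body centroid
    left_arm_px: int    # left arm length, in pixels
    right_arm_px: int   # right arm length, in pixels
    head_top_y: int     # row of the top of the head
    feet_bottom_y: int  # row of the bottom of the legs


def bounding_box(scan: BodyScan, width: int, height: int) -> Box:
    """First/third sides from centroid X +/- arm length; second/fourth from head and legs."""
    left = max(scan.centroid_x - scan.left_arm_px, 0)
    right = min(scan.centroid_x + scan.right_arm_px, width - 1)
    top = max(scan.head_top_y, 0)
    bottom = min(scan.feet_bottom_y, height - 1)
    return left, top, right, bottom


def environment_pixels(box: Box, width: int, height: int) -> List[Tuple[int, int]]:
    """(row, column) of every pixel outside the box, treated as environment."""
    left, top, right, bottom = box
    return [(y, x) for y in range(height) for x in range(width)
            if not (left <= x <= right and top <= y <= bottom)]
```

For a 10 x 8 image and a scan with its centroid at column 5, arm lengths of 2 and 3 pixels, the head at row 1 and the feet at row 6, the box spans columns 3 to 8 and rows 1 to 6, leaving 44 of the 80 pixels classified as environment.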
Referring back to
According to an example embodiment, the target recognition, analysis, and tracking system may detect edges of the foreground object such as the human target by comparing various depth values of nearby pixels that may be within the bounding box such as the bounding box 802 described above with respect to
In one embodiment, the target recognition, analysis, and tracking system may further select a predetermined number of sample points as starting points to analyze the pixels within the bounding box to determine whether the pixel may be associated with the human target or the environment. For example, the target recognition, analysis, and tracking system may randomly select one or more sample points within the bounding box. In one embodiment, the pixels associated with the randomly selected sample points may be reference pixels that may be used to initially compare pixels to detect edges of the foreground object such as the human target, which will be described in more detail below.
According to another embodiment, the various locations of the sample points 902 may be randomly selected using, for example, a shape. For example, a shape such as a diamond shape may be used to randomly select the sample points 902. The various locations along, for example, the shape such as the diamond shape may be selected as the sample points 902.
Additionally, the various locations of the sample points 902 may be based on, for example, one or more body parts of the human target 602 determined by the scan. For example, the various locations of the sample points 902 may be selected based on the shoulder width, the body length, the arm length, or the like of the human target. Additionally, the sample points 902 may be selected to cover, for example, the upper body, the lower body, or a combination of the upper and lower body of the human target 602.
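One way to realize the diamond-shaped selection is to inscribe a diamond in the bounding box and draw random points along its edges. The following minimal Python sketch assumes the diamond's corners are the midpoints of the bounding box's four sides; the description does not fix the diamond's size or placement, and a system could instead scale it using body measurements such as shoulder width or arm length. The function name is hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def diamond_sample_points(box: Box, count: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Pick `count` random (row, column) points on a diamond inscribed in `box`.

    The diamond's corners are the midpoints of the box's four sides; each
    sample picks a random edge of the diamond and a random position along it.
    """
    left, top, right, bottom = box
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    corners = [(cx, float(top)), (float(right), cy), (cx, float(bottom)), (float(left), cy)]
    points = []
    for _ in range(count):
        i = rng.randrange(4)  # which edge of the diamond
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        t = rng.random()      # position along that edge
        points.append((round(y0 + t * (y1 - y0)), round(x0 + t * (x1 - x0))))
    return points
```

Passing an explicitly seeded `random.Random` makes the selection reproducible, which can help when comparing segmentation results across frames.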
Referring back to
According to an example embodiment, if the difference between the depth values being compared is greater than a predetermined edge tolerance, the pixels may define an edge. In one embodiment, the predetermined edge tolerance may be, for example, 100 millimeters. If a pixel representing a depth value of 1000 millimeters is compared with an adjacent pixel representing a depth value of 1200 millimeters, the pixels may define an edge of a human target such as the human targets 602, 604, because the 200 mm difference in length or distance between the pixels is greater than the predetermined edge tolerance of 100 mm.
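The edge test above compares neighboring depths against a tolerance. The following minimal Python sketch uses the 100 mm example tolerance and assumes 4-connected neighbors and depth values stored as a list of rows; `is_edge` and `edge_mask` are hypothetical names.

```python
from typing import List

EDGE_TOLERANCE_MM = 100  # example value from the description above


def is_edge(depth_a: int, depth_b: int, tolerance_mm: int = EDGE_TOLERANCE_MM) -> bool:
    """Two neighboring pixels define an edge if their depths differ by more than the tolerance."""
    return abs(depth_a - depth_b) > tolerance_mm


def edge_mask(depth: List[List[int]], tolerance_mm: int = EDGE_TOLERANCE_MM) -> List[List[bool]]:
    """Mark every pixel that differs from at least one 4-connected neighbor by more than the tolerance."""
    height, width = len(depth), len(depth[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Checking only the lower and right neighbors visits each pair once;
            # both pixels of an edge pair are marked.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width and is_edge(depth[y][x], depth[ny][nx], tolerance_mm):
                    mask[y][x] = mask[ny][nx] = True
    return mask


print(is_edge(1000, 1200))  # True: the 200 mm difference exceeds 100 mm
```

A variable edge tolerance, as described next, could be supported by passing a per-pixel tolerance instead of a single constant.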
According to an example embodiment, the edge tolerance value may vary between pixels. For example, for pixels in front of a chest of the human target, a higher tolerance value may be used to detect the edge of the human target. For example, the human target may hold his/her arms in front of his/her chest. To accurately detect the edges of the hands of the human target 602, the target recognition, analysis, and tracking system may use a higher tolerance value. In another example, the human target may extend his/her arms away from his/her torso. In this example, the target recognition, analysis, and tracking system may use a lower tolerance value to detect the edges of the human target's hands. According to one embodiment, the variable edge tolerance may be determined based on, for example, a location of the pixel, a length of an arm of the human target, and/or a width of the shoulder of the human target. According to another example embodiment, the variable edge tolerance may be interpolated such that the detected edge may be a smooth curve.
In one embodiment, the pixels within the detected edges of the human targets may be flood filled to isolate and/or identify the human target such as the human targets 602, 604. The pixels that may not be flood filled may then be identified or associated with the environment of the depth image such that the pixels may be removed, which will be described in more detail below.
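The flood fill and the subsequent removal of unfilled pixels can be sketched as a breadth-first region growth that stops at depth edges and at the bounding box. This minimal Python sketch assumes 4-connected neighbors, a fixed 100 mm tolerance, a bounding box lying inside the image, and 0 as the value for removed or invalid pixels; `flood_fill_target` and `remove_environment` are hypothetical names.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]          # (row, column)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def flood_fill_target(depth: List[List[int]], seed: Pixel, box: Box,
                      tolerance_mm: int = 100) -> Set[Pixel]:
    """Grow a region from `seed` without crossing a depth edge or leaving `box`.

    A 4-connected neighbor joins the region when it has a valid depth (> 0) and
    its depth differs from the current pixel's by at most `tolerance_mm`, i.e.
    when the two pixels do not define an edge.
    """
    left, top, right, bottom = box
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (left <= nx <= right and top <= ny <= bottom
                    and (ny, nx) not in region
                    and depth[ny][nx] > 0
                    and abs(depth[ny][nx] - depth[y][x]) <= tolerance_mm):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


def remove_environment(depth: List[List[int]], target: Set[Pixel]) -> List[List[int]]:
    """Zero out every pixel that was not flood filled, leaving only the target."""
    return [[d if (y, x) in target else 0 for x, d in enumerate(row)]
            for y, row in enumerate(depth)]
```

For a human target at roughly 1000 mm standing in front of a wall at 3000 mm, the fill stays on the target because every step onto the wall exceeds the tolerance, and `remove_environment` then clears the wall pixels.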
According to an example embodiment, one body part of the human target 602 may be separated from another body part of the human body. For example, as described above with respect to
Additionally, as described above, body parts may be separated by invalid depth values caused by, for example, facial hair, various articles of clothing, or the like. For example, the capture device such as the capture device 20 described above with respect to
As shown in
As described above, each sample point may serve as a starting point to determine whether pixels are associated with a human target such that the human target may be flood filled. For example, the target recognition, analysis, and tracking system may start flood filling at a first sample point that may be the centroid of a human target 602. Thereafter, the target recognition, analysis, and tracking system may pick a second sample point to determine whether pixels are associated with the human target 602.
In an example embodiment, the target recognition, analysis, and tracking system may examine the depth value of each of the sample points. For example, if the depth value of the second sample point is close to, or within a predetermined tolerance of, the depth value of the centroid of the human target 602, the target recognition, analysis, and tracking system may identify the sample point as being associated with an isolated body part of the human target 602. As described above, according to one example embodiment, the predetermined tolerance may be determined based on values including, but not limited to, locations and/or measurements such as length, width, or the like associated with one or more body parts. Thus, according to an example embodiment, the target recognition, analysis, and tracking system may use the sample points as starting points to determine whether pixels are associated with a human target such that the pixels may be flood filled.
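The use of additional sample points to recover a body part cut off from the rest of the target (for example, by a strip of invalid shadow pixels) can be sketched as follows. This minimal Python sketch assumes fixed 100 mm tolerances for both the seed test and the fill, 4-connected neighbors, and a depth image already cropped to the bounding box; the description allows the seed tolerance to depend on body measurements instead. `fill_from_samples` and `_fill` are hypothetical names.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def _fill(depth: List[List[int]], seed: Pixel, tol: int) -> Set[Pixel]:
    """Breadth-first flood fill over valid pixels whose depth step is <= tol."""
    height, width = len(depth), len(depth[0])
    region, queue = {seed}, deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < height and 0 <= nx < width and (ny, nx) not in region
                    and depth[ny][nx] > 0 and abs(depth[ny][nx] - depth[y][x]) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


def fill_from_samples(depth: List[List[int]], centroid: Pixel, samples: List[Pixel],
                      seed_tol: int = 100, edge_tol: int = 100) -> Set[Pixel]:
    """Fill from the centroid, then from every sample point whose depth is
    within `seed_tol` of the centroid depth (an isolated body part)."""
    target = _fill(depth, centroid, edge_tol)
    ref = depth[centroid[0]][centroid[1]]
    for y, x in samples:
        if (y, x) in target:
            continue  # already reached from an earlier seed
        d = depth[y][x]
        if d > 0 and abs(d - ref) <= seed_tol:
            target |= _fill(depth, (y, x), edge_tol)
    return target
```

In the row `[1000, 0, 1010, 3000]`, a fill from the centroid at column 0 stops at the invalid pixel, but a sample point at column 2 (1010 mm) passes the seed test and is added, while a sample on the 3000 mm background is rejected.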
Referring back to
According to one embodiment, the maximum depth value of a pixel in depth history data may be estimated. For example, as shown in
According to one embodiment, the maximum depth values in the depth history data may be updated as the capture device such as the capture device 20 observes or captures depth images from frame to frame. For example, in a first frame, the depth image may capture a human target 602 on the left half of the frame while environment objects on the right half of the frame are exposed to the camera. The maximum depth values in the depth history data may be updated to reflect the depth values of pixels associated with the environment objects on the right half of the frame. Then, when the human target 602 moves to the right half of the frame, environment objects on the left half of the frame may be exposed to the capture device 20. The maximum depth values in the depth history data may then be updated to reflect the depth values of pixels associated with the environment objects on the left half of the camera view. In other words, as the human target moves from frame to frame, an environment object may become visible to the capture device such that the depth history data may be updated to reflect the depth values of pixels associated with that environment object.
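The frame-to-frame behavior above amounts to a per-pixel running maximum. The following minimal sketch assumes depth frames stored as lists of rows, a depth of 0 meaning "no reading", and a hypothetical function name `update_depth_history`; it updates the whole frame at once, which is the simplest variant.

```python
def update_depth_history(history, frame):
    """Keep, per pixel, the largest valid depth seen so far (updated in place).
    When the target moves away, the background it was hiding is observed and
    raises the stored maximum for those pixels."""
    for r, row in enumerate(frame):
        for c, d in enumerate(row):
            if d > 0 and d > history[r][c]:
                history[r][c] = d
    return history
```

With a target at 1500 mm on the left of a 3000 mm background in one frame and on the right in the next, the history ends up holding the 3000 mm background depth for every pixel.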
In example embodiments, the target recognition, analysis, and tracking system may update the maximum depth values for a subset of pixels in each frame. For example, a frame may include a predefined number of scan lines or the like. In one embodiment, the target recognition, analysis, and tracking system may update the maximum depth values for the pixels on one horizontal scan line per frame, proceeding in a top to bottom direction, by temporal averaging or another suitable mechanism for updating the history pixels over multiple frames. In other embodiments, the system may update the maximum depth values of pixels in a bottom to top direction, or may update one vertical scan line per frame in a left to right direction, a right to left direction, or the like. Accordingly, the maximum depth values of a frame may be updated gradually to keep track of objects in the camera view.
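The one-scan-line-per-frame scheme can be sketched as follows. This is an illustration under stated assumptions, not the described system's mechanism: it walks horizontal scan lines top to bottom using the frame index, and it models "temporal averaging" as a hypothetical blending factor `alpha` that moves the stored value part of the way toward a larger observed depth (`alpha=1.0` takes the new maximum immediately). The name `update_history_scanline` is hypothetical.

```python
def update_history_scanline(history, frame, frame_index, alpha=1.0):
    """Refresh the depth history for a single horizontal scan line per frame,
    proceeding top to bottom and wrapping around. Returns the row updated."""
    row = frame_index % len(history)  # frame 0 -> row 0, frame 1 -> row 1, ...
    for c, d in enumerate(frame[row]):
        if d > 0 and d > history[row][c]:
            # alpha < 1.0 averages toward the new maximum over several visits.
            history[row][c] += alpha * (d - history[row][c])
    return row
```

Updating one line per frame spreads the cost of maintaining the history across frames, at the price of the history lagging behind the scene by up to one full sweep.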
According to an example embodiment, a depth value of the pixel being examined may be compared to the maximum depth value of the pixel based on the depth history data. For example, if the pixel being examined has the same depth value as the historical maximum depth value of the pixel, the target recognition, analysis, and tracking system may determine that the pixel may be associated with the environment of the depth image. Alternatively, in one embodiment, if the depth value of the pixel being examined is less than the historical maximum depth value of the pixel, taking into account, for example, the predetermined tolerance value described above, the target recognition, analysis, and tracking system may determine that the pixel may be associated with a foreground object such as the human target, and the pixel may then be flood filled. Thus, according to an example embodiment, the depth history data may be used to confirm that a pixel may be associated with a human target.
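The comparison against the depth history can be sketched as a single test. This is one possible reading of the paragraph above, assuming a noise tolerance in the same units as the depths: a pixel at (or within tolerance of) its historical maximum is treated as environment, and a pixel noticeably nearer than that maximum is treated as foreground. The function name and the string labels are hypothetical.

```python
def classify_against_history(depth_value, history_max, tolerance):
    """Label a pixel by comparing its depth with its historical maximum depth."""
    if history_max - depth_value <= tolerance:
        # Matches the farthest depth ever seen here: background/environment.
        return "environment"
    # Clearly in front of the known background: e.g. the human target.
    return "foreground"
```

A pixel reading 2990 mm where the history holds 3000 mm is within a 20 mm tolerance and is labeled environment, while a 1500 mm reading at the same location is labeled foreground.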
Alternatively, if the depth value of a pixel in the portion 810 may be less than the historical maximum depth value of the pixel in the portion 1020, the target recognition, analysis, and tracking system may determine that the pixel may be associated with the human target 602 such that the pixel may be flood filled.
According to one embodiment, the target recognition, analysis, and tracking system may check the depth history data when an edge with a small depth difference, within a predetermined tolerance value, is detected. For example, the target recognition, analysis, and tracking system may determine whether the depth difference between two pixels that may define an edge is within a predetermined tolerance value. If the depth difference is less than the predetermined value, the target recognition, analysis, and tracking system may proceed to access the depth history data. In an example embodiment, the tolerance value may be predetermined based on noise in the depth image received, captured, or observed by the capture device such as the capture device 20 shown in
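The decision of when to consult the depth history during a flood fill can be sketched as below. This is a minimal sketch assuming two hypothetical thresholds: `edge_tolerance`, the depth jump between adjacent pixels that counts as a clear edge, and `history_tolerance`, the margin by which a pixel must be nearer than its historical maximum to count as foreground. Only small, ambiguous depth differences (such as a hand touching a wall) trigger the history check.

```python
def can_fill_neighbor(d_current, d_neighbor, history_neighbor,
                      edge_tolerance, history_tolerance):
    """Decide whether the flood fill may step from the current pixel to a neighbor."""
    if abs(d_neighbor - d_current) > edge_tolerance:
        return False  # clear depth edge: the fill stops here
    # Small difference: ambiguous, so compare the neighbor with its history.
    # Fill only if it is noticeably nearer than the background ever seen there.
    return history_neighbor - d_neighbor > history_tolerance
```

For a hand at about 1980 mm touching a wall at about 2005 mm, both neighbors pass the edge test, but only the hand pixel (history 2010 mm, 25 mm in front) is filled; the wall pixel matches its history and is left to the environment.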
The depth history data may further include floor pixels. For example, the difference between the depth values associated with the feet of a human target such as the human target 602 and the floor may be within a small predetermined tolerance or value, similar to when a hand of the human target touches a wall as described above. The target recognition, analysis, and tracking system may further track the depth values of pixels associated with the floor in the depth history data. For example, the depth values of the floor may be detected and stored or recorded in the depth history data. When examining a pixel in the floor area, the target recognition, analysis, and tracking system may compare the depth value of the pixel being examined with the corresponding floor pixel in the depth history data.
Referring back to
Referring back to
For example, according to an example embodiment, a model such as a skeletal model, a mesh human model, or the like of a user such as the user 18 described above with respect to
The visual appearance of an on-screen character may then be changed in response to changes to the model being tracked. For example, a user such as the user 18 described above with respect to
In one embodiment, the target recognition, analysis, and tracking system may not be able to process the second depth image at 530. For example, the depth image may be too noisy or include too many empty pixels to be processed. According to one embodiment, if the depth values are too noisy, the target recognition, analysis, and tracking system may generate an error message that may be provided to a user such as the user 18 described above with respect to
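A simple pre-check of the kind described above can be sketched by counting empty pixels. This sketch covers only the empty-pixel case; a noise check would follow the same pattern with a different statistic. It assumes a depth of 0 marks an empty pixel, and the 50% threshold, the message text, and the name `check_depth_image` are hypothetical choices for illustration.

```python
def check_depth_image(depth, max_empty_fraction=0.5):
    """Return an error message if too large a fraction of pixels is empty
    (depth 0); otherwise return None so processing can continue."""
    total = sum(len(row) for row in depth)
    empty = sum(1 for row in depth for d in row if d == 0)
    if total == 0 or empty / total > max_empty_fraction:
        return "Depth image could not be processed: too many empty pixels."
    return None
```

The returned message could then be surfaced to the user, for example to suggest adjusting their position relative to the capture device.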
It should be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered limiting. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or the like. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.
Number | Date | Country |
---|---|---|
20100302395 A1 | Dec 2010 | US |