Depth cameras may be utilized to capture depth information as well as additional information such as brightness, color, etc. for a matrix of pixels. Such information may then be utilized to model targets that are present in a viewed scene.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
According to one aspect of the disclosure, a method of automatically aiming a depth camera at a point of interest is provided. The method includes receiving from the depth camera one or more observed depth images of a scene. The method further includes, if a point of interest of a target is found within the scene, determining if the point of interest is within a far range relative to the depth camera. The method further includes, if the point of interest of the target is within the far range, operating the depth camera with a far logic, or if the point of interest of the target is not within the far range, operating the depth camera with a near logic.
A target tracking system may be used to recognize, analyze, and/or track one or more targets, such as game player 18.
The example scenario illustrated in
Other movements by game player 18 may be interpreted as other controls, such as controls to bob, weave, shuffle, block, jab, or throw a variety of different power punches. Furthermore, some movements may be interpreted into controls that serve purposes other than controlling player avatar 24. For example, the player may use movements to end, pause, or save a game, select a level, view high scores, communicate with a friend, etc.
Target tracking systems may be used to interpret target movements as operating system and/or application controls that are outside the realm of gaming. Virtually any controllable aspect of an operating system and/or application, such as the boxing game shown in
Gaming system 12, or another suitable computing device, may be configured to represent one or more targets observed via the depth camera with a model. The model may be represented by one or more polygonal meshes, by a set of mathematical primitives, by a skeletal model including a plurality of joint locations, and/or via other suitable machine representations of the modeled target.
In some scenarios, one or more aspects of a target may be of particular interest to a target tracking system. As one nonlimiting example, the head of a player target may be of particular interest in some scenarios. Such aspects of a target that are of particular interest may be referred to as points of interest of the target. Such points of interest may be particular body parts of a player target, a particular item or prop, or virtually any other aspect that is viewable within a scene. Such points of interest may be modeled, as introduced above. For example, a head of a player target may be modeled via a machine representation of the head (e.g., a polygonal mesh, a skeletal member, a data structure indicating a position and volume, etc.).
A point of interest can be variously positioned based on a variety of different characteristics of a particular target. For example, continuing with the example introduced above, the head of a player target can be at different positions depending on the height of the player target. Furthermore, the head of a player target can be at different angles relative to a depth camera depending on the height of the head, the height of the depth camera, and/or how far the player target is standing from the depth camera.
A point of interest of a player target may not be within the field of view of a depth camera depending on where the depth camera is aimed. To illustrate this concept,
As can be seen in the high-view 44, low-view 46, and mid-view 48 of
At 52, method 50 includes determining if a point of interest (POI) of a target is found within the scene. As an example, the point of interest may be a head of a target game player. In such a scenario, gaming system 12 of
At 54, if a point of interest of a target is found within the scene, method 50 includes holding an aiming vector of the depth camera. In other words, if the point of interest is already within the field of view, method 50 may avoid unnecessary depth camera movements by maintaining a current aiming vector. Holding the depth camera so that the aiming vector does not move may include refraining from sending “move” instructions to the depth camera and/or an aiming assembly configured to selectively change an aiming vector of the depth camera.
At 56, method 50 includes determining if the point of interest has been lost. In some embodiments, determining if the point of interest is lost includes determining if the point of interest is near a center of the field of view and/or inside the edges of the field of view by at least a predetermined tolerance. If the point of interest is not near the center and/or is deemed to be too close to an edge, the point of interest is considered lost and the aiming vector of the depth camera may be nudged so as to move the point of interest closer to the center of the field of view. However, in some embodiments, the depth camera will hold its aiming vector until the point of interest leaves the field of view. If the point of interest is not lost, method 50 may continue to hold the depth camera at 54 and monitor the point of interest at 56.
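As one nonlimiting illustration, the edge-tolerance test described above may be sketched as follows. The `FieldOfView` type, the function name, and the 40-pixel default tolerance are hypothetical and are not part of this disclosure, which only calls for a predetermined tolerance.

```python
from dataclasses import dataclass


@dataclass
class FieldOfView:
    """Image-space size of a depth frame, in pixels (hypothetical type)."""
    width: int
    height: int


def is_point_of_interest_lost(x, y, fov, edge_tolerance_px=40):
    """Return True if (x, y) is missing or too close to an edge of the frame.

    x, y: pixel position of the point of interest, or None if it was not found.
    edge_tolerance_px: assumed margin; the text only says "predetermined".
    """
    if x is None or y is None:
        return True  # the point of interest is not in the field of view at all
    inside_margin = (
        edge_tolerance_px <= x <= fov.width - 1 - edge_tolerance_px
        and edge_tolerance_px <= y <= fov.height - 1 - edge_tolerance_px
    )
    return not inside_margin
```

With a 320 by 240 frame, a head at pixel (160, 120) would be held, whereas a head at pixel (10, 120) would be treated as lost and may prompt a nudge of the aiming vector.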
At 58, method 50 includes moving the aiming vector of the depth camera if the point of interest of the target is not found within the scene at 52 and/or if the point of interest is lost at 56. As explained below, the logic for moving a depth camera may vary depending on one or more factors. As an example, depth camera aiming may be handled differently depending on if a target is within a near range, relatively close to a depth camera (e.g., player target 30a in
At 62, method 60 includes determining if a depth camera is initialized. Such a determination may include determining if a relative position and/or orientation of the depth camera is known.
At 64, method 60 includes initializing the depth camera if the depth camera is not currently initialized. As an example, the relative height of the depth camera above the floor may be a parameter that is considered when making subsequent aiming decisions. As such, initializing the depth camera may include determining the relative height of the depth camera above the floor. In some embodiments, this may be accomplished, at least in part, by analyzing one or more depth images of a scene with a floor-finding algorithm to locate the floor within the scene. In other words, a floor surface may be found within the scene and a height of the depth camera above the floor surface may be calculated using depth information from the depth camera. Any suitable floor finding algorithm may be used without departing from the scope of this disclosure. As one nonlimiting example, a plurality of rows of a depth image may be scanned in screen space, and a straight depth line may be interpolated through the deepest observed points on the left and right sides of the image for each row. A pair of straight boundary lines may then be fit to the endpoints of the straight depth lines, and a floor plane may be defined to include these straight boundary lines. In some embodiments, a user may manually input the position of the depth camera. Other methods of initializing the depth camera may be used without departing from the scope of this disclosure.
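Once a floor plane has been located, the height calculation mentioned above reduces to a point-to-plane distance. The following is a minimal sketch of that final step only; it does not implement the row-scanning floor finder described above. It assumes that three non-collinear floor points are already available in camera coordinates, in meters, with the depth camera at the origin. All names are hypothetical.

```python
import math


def plane_from_points(p0, p1, p2):
    """Return (unit_normal, d) for the plane n.p + d = 0 through three 3D points."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    # The cross product of two in-plane vectors is normal to the plane.
    normal = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("floor points are collinear")
    normal = tuple(c / length for c in normal)
    d = -sum(normal[i] * p0[i] for i in range(3))
    return normal, d


def camera_height_above_floor(floor_points):
    """Distance, in meters, from the camera origin (0, 0, 0) to the floor plane."""
    _, d = plane_from_points(*floor_points)
    return abs(d)
```

Because the result is a perpendicular distance to the plane, the same calculation applies whether or not the depth camera is tilted when the floor is observed.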
At 66, method 60 includes determining if a point of interest is within the scene. As a nonlimiting example, if the head of a player target is the point of interest for a particular application, the scene can be examined to determine if the head of a player target is visible within the scene. This may be accomplished via any suitable image analysis and/or modeling technique without departing from the scope of this disclosure. As one nonlimiting example, each pixel of an observed depth image may be labeled as either a foreground pixel belonging to the target or a background pixel not belonging to the target. Each foreground pixel may then be labeled with body part information indicating a likelihood that that foreground pixel belongs to one or more body parts of the target. The target may then be modeled with a skeleton including a plurality of skeletal points, each skeletal point including a three dimensional position derived from body part information of one or more foreground pixels. It may then be determined if the point of interest (e.g., the head) is in view and modeled by the skeleton.
At 68, method 60 includes determining if the point of interest is within a far range relative to the depth camera or if the point of interest is within a near range relative to the depth camera. The parameters of the near range and the far range may be set based on a variety of different considerations, including the field of view of the depth camera. In one embodiment, the near range is set as 0 to 2.0 horizontal meters away from the depth camera, and the far range is anything farther than 2.0 horizontal meters away from the depth camera.
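The range determination of the embodiment above may be sketched as follows. The sketch assumes that the point of interest is expressed in a gravity-aligned frame centered on the depth camera, with x and z spanning the horizontal plane and y pointing up; that convention and the function names are assumptions made for illustration.

```python
import math

FAR_RANGE_START_M = 2.0  # near/far boundary from the embodiment described above


def horizontal_distance(x, z):
    """Horizontal distance from the depth camera, ignoring height (y)."""
    return math.hypot(x, z)


def select_logic(poi_position):
    """Return "far" or "near" for a point of interest at (x, y, z), in meters."""
    x, _y, z = poi_position
    return "far" if horizontal_distance(x, z) > FAR_RANGE_START_M else "near"
```

For example, a head located 3.0 horizontal meters from the depth camera selects the far logic, whereas a head located exactly 2.0 horizontal meters away still falls within the near range.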
At 70, if the point of interest of the target is within the far range, the depth camera is operated with a far logic, as described by way of example with reference to
As mentioned above, method 60 may be used in conjunction with method 50 of
At 82, method 80 includes determining if a point of interest of a target is found within the scene. At 84, if the point of interest is found within the scene, method 80 may include continuing to operate with the far logic. As indicated at 85, this may include holding the depth camera so that the aiming vector does not move if the point of interest of the target is within the scene at the current aiming vector, as described above with reference to 54 of
At 86, if a point of interest of the target is not within the scene at a current aiming vector, method 80 includes determining if there is any motion within the scene (e.g., the depth values of pixels are changing from frame to frame as the target moves relative to a static background). At 88, if there is motion within the scene, method 80 includes determining if the motion is in the far field (e.g., the target pixels are within the far field). At 90, if the motion is within the far field, method 80 includes aiming the depth camera so that an aiming vector of the depth camera points towards detected motion in the scene. In this way, a player target that is near the edge of the field of view of the depth camera can be shifted towards a center of the field of view, thus providing the depth camera with a good opportunity to find the point of interest.
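As a nonlimiting sketch of the motion test described above, two consecutive depth frames may be compared pixel by pixel, and the centroid of the changed pixels may serve as the location toward which the aiming vector is moved. The thresholds `min_delta_m` and `min_pixels`, and both function names, are assumed values chosen for illustration and do not come from this disclosure.

```python
def motion_centroid(prev_frame, curr_frame, min_delta_m=0.05, min_pixels=3):
    """Return (row, col, mean_depth) of changed pixels, or None if no motion.

    Frames are equal-sized lists of rows of depth values, in meters.
    """
    row_sum = col_sum = depth_sum = 0.0
    count = 0
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (before, after) in enumerate(zip(prev_row, curr_row)):
            if abs(after - before) >= min_delta_m:
                row_sum += r
                col_sum += c
                depth_sum += after
                count += 1
    if count < min_pixels:
        return None  # too few changed pixels to count as motion
    return row_sum / count, col_sum / count, depth_sum / count


def aim_offset(centroid, frame_height, frame_width):
    """Offset of the motion centroid from the image center, in pixels."""
    row, col, _depth = centroid
    return row - (frame_height - 1) / 2, col - (frame_width - 1) / 2
```

The mean depth of the changed pixels may be compared against the near/far boundary to decide whether the motion is in the far field, and the pixel offset indicates the direction in which the aiming vector may be moved to bring the moving target toward the center of the field of view.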
At 92, if the motion is not within the far field (e.g., the target pixels are within the near field), method 80 includes switching to the near logic, as described with reference to
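The decisions made at 82 through 92 may be summarized as a single function, sketched below under stated assumptions. The `Action` names and the `far_logic_step` signature are hypothetical. The case in which neither the point of interest nor any motion is found is represented only by a placeholder action corresponding to the default far focus check at 94.

```python
from enum import Enum


class Action(Enum):
    HOLD = "hold aiming vector"
    AIM_AT_MOTION = "aim toward detected motion"
    SWITCH_TO_NEAR_LOGIC = "switch to near logic"
    CHECK_DEFAULT_FAR_FOCUS = "check default far focus"


def far_logic_step(poi_found, motion_distance_m, far_range_start_m=2.0):
    """Return the action for one pass of the far logic.

    poi_found: whether the point of interest is within the scene (82).
    motion_distance_m: distance of the detected motion from the depth
        camera, or None if the scene is static (86).
    """
    if poi_found:
        return Action.HOLD                      # 84 and 85
    if motion_distance_m is None:
        return Action.CHECK_DEFAULT_FAR_FOCUS   # 94
    if motion_distance_m > far_range_start_m:
        return Action.AIM_AT_MOTION             # 88 and 90
    return Action.SWITCH_TO_NEAR_LOGIC          # 92
```

Returning an action, rather than commanding the aiming assembly directly, keeps the decision separate from the hardware, which may make it easier to avoid sending unnecessary move instructions.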
If the point of interest cannot be found at 82 and no motion is detected in the scene at 86, this may indicate that the player target is not in the scene. At 94, method 80 includes determining if the depth camera is aimed at a default far focus. The default far focus may be a three dimensional coordinate vertically measured with reference to the floor and horizontally measured with reference to the depth camera. The default far focus may be selected based on an estimated position of a point of interest at a certain range. For example, if the head of a player target is the point of interest, it can be estimated that the average head will be located 1.5 meters above the floor and that the average player stands 3.0 meters away from the depth camera. As such, the default far focus may be located 3.0 meters away from the depth camera and 1.5 meters above the floor.
The height of the default far focus above the floor may be set based on an average height of people in a target demographic (e.g., average height of game players between 8 years old and 40 years old). The horizontal distance of the default far focus away from the depth camera can be set based on an estimated play position of players relative to the depth camera and display (e.g., HDTV). For example, if it is estimated that game players usually stand 3.0 meters away from the display and depth camera, the default far focus may be set 3.0 horizontal meters away from the depth camera.
The examples provided above are not limiting. It should be understood that the default far focus can be set at any location. In general, the default far focus may be chosen so as to provide a depth camera with a field of view that is likely to capture the points of interest of player targets that may be different sizes and/or standing in different positions.
In some embodiments, the default far focus may correspond to the height of the depth camera above the floor. That is, the default far focus depends on the height of the depth camera as determined during initialization. In such embodiments, a depth camera at a first height will have a different default far focus than a depth camera at a second height. Such variations may facilitate depth camera aiming along different angles, which depend at least in part on the height of the depth camera.
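To illustrate how the aiming angle may depend on the height of the depth camera, the following sketch computes the pitch needed to point at a default far focus located 3.0 meters away and 1.5 meters above the floor, as in the example above. The function name and the sign convention (positive pitch tilts the camera up) are assumptions.

```python
import math


def pitch_to_focus_deg(camera_height_m, focus_height_m=1.5, focus_distance_m=3.0):
    """Pitch angle, in degrees from horizontal, toward the default focus.

    Positive values tilt the camera up; negative values tilt it down.
    """
    rise = focus_height_m - camera_height_m
    return math.degrees(math.atan2(rise, focus_distance_m))
```

Under these assumptions, a depth camera mounted 0.5 meters above the floor would tilt up by roughly 18.4 degrees, a depth camera mounted at 1.5 meters would aim level, and a depth camera mounted at 2.0 meters would tilt down by roughly 9.5 degrees.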
Returning to
Returning to
Turning now to
At 102, method 100 includes determining if a point of interest of a target is found within the scene. At 104, if the point of interest is found within the scene, method 100 may include exploring a move target option as discussed below with reference to
Returning to
At 112, if a point of interest of the target is not within the scene at a current aiming vector, method 100 includes determining if there is any motion within the scene (e.g., the depth values of pixels are changing from frame to frame as the target moves relative to a static background). At 114, if there is motion within the scene, method 100 includes determining if the motion is in the near field (e.g., the target pixels are within the near field). At 116, if the motion is within the near field, method 100 includes aiming the depth camera so that an aiming vector of the depth camera points towards detected motion in the scene. In this way, a player target that is near the edge of the field of view of the depth camera can be shifted towards a center of the field of view, thus providing the depth camera with a good opportunity to find the point of interest.
At 118, if the motion is not within the near field (e.g., the target pixels are within the far field), method 100 includes switching to the far logic, as described with reference to
If the point of interest cannot be found at 102 and no motion is detected in the scene at 112, this may indicate that the player target is not in the scene. At 120, method 100 includes determining if the depth camera is aimed at a default near focus. Similar to the default far focus, the default near focus may be a three dimensional coordinate vertically measured with reference to the floor and horizontally measured with reference to the depth camera. The default near focus may be selected based on an estimated position of a point of interest at a certain range.
Like the default far focus, the default near focus can be set at any location. In general, the default near focus may be chosen so as to provide a depth camera with a field of view that is likely to capture the points of interest of player targets that may be different sizes and/or standing in different positions when those player targets are relatively near the depth camera. Further, like the default far focus, the default near focus may correspond to the height of the depth camera above the floor.
Returning to
Returning to
The methods and processes described herein may be tied to a variety of different types of computing systems.
Computing system 180 includes a logic subsystem 182, a data-holding subsystem 184, a depth camera 186, and an aiming assembly 188. Computing system 180 may optionally include a display subsystem 190 and/or other components not shown in
Logic subsystem 182 may include one or more physical devices configured to execute one or more instructions. For example, the logic subsystem may be configured to execute one or more instructions that are part of one or more programs, routines, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more devices, or otherwise arrive at a desired result. The logic subsystem may include one or more processors that are configured to execute software instructions. Additionally or alternatively, the logic subsystem may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. The logic subsystem may optionally include individual components that are distributed throughout two or more devices, which may be remotely located in some embodiments.
Data-holding subsystem 184 may include one or more physical, non-transitory devices configured to hold data and/or instructions executable by the logic subsystem to implement the herein described methods and processes. When such methods and processes are implemented, the state of data-holding subsystem 184 may be transformed (e.g., to hold different data). Data-holding subsystem 184 may include removable media and/or built-in devices. Data-holding subsystem 184 may include optical memory devices, semiconductor memory devices, and/or magnetic memory devices, among others. Data-holding subsystem 184 may include devices with one or more of the following characteristics: volatile, nonvolatile, dynamic, static, read/write, read-only, random access, sequential access, location addressable, file addressable, and content addressable. In some embodiments, logic subsystem 182 and data-holding subsystem 184 may be integrated into one or more common devices, such as an application specific integrated circuit or a system on a chip.
When included, display subsystem 190 may be used to present a visual representation of data held by data-holding subsystem 184. As the herein described methods and processes change the data held by the data-holding subsystem, and thus transform the state of the data-holding subsystem, the state of display subsystem 190 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 190 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic subsystem 182 and/or data-holding subsystem 184 in a shared enclosure, or such display devices may be peripheral display devices.
Computing system 180 further includes a depth camera 186 configured to obtain depth images of one or more targets. Depth camera 186 may be configured to capture video with depth information via any suitable technique (e.g., time-of-flight, structured light, stereo image, etc.).
For example, in time-of-flight analysis, the depth camera 186 may emit infrared light to the target and may then use sensors to detect the backscattered light from the surface of the target. In some cases, pulsed infrared light may be used, wherein the time between an outgoing light pulse and a corresponding incoming light pulse may be measured and used to determine a physical distance from the depth camera to a particular location on the target. In some cases, the phase of the outgoing light wave may be compared to the phase of the incoming light wave to determine a phase shift, and the phase shift may be used to determine a physical distance from the depth camera to a particular location on the target.
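The two time-of-flight relationships described above may be written out as follows. The pulsed case uses half of the round-trip travel distance. The phase case assumes a continuous wave modulated at a known frequency, which is an assumption added for illustration, as is every function name.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_pulse(round_trip_s):
    """Distance from the time between an outgoing pulse and its return."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def distance_from_phase(phase_shift_rad, modulation_hz):
    """Distance from the phase shift of a continuous-wave modulated signal.

    The result is unambiguous only up to c / (2 * modulation_hz), because
    the phase wraps around every full modulation period.
    """
    return SPEED_OF_LIGHT_M_PER_S * phase_shift_rad / (4.0 * math.pi * modulation_hz)
```

For example, a round trip of 20 nanoseconds corresponds to a distance of approximately 3.0 meters.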
In another example, time-of-flight analysis may be used to indirectly determine a physical distance from the depth camera to a particular location on the target by analyzing the intensity of the reflected beam of light over time, via a technique such as shuttered light pulse imaging.
In another example, structured light analysis may be utilized by depth camera 186 to capture depth information. In such an analysis, patterned light (i.e., light displayed as a known pattern such as grid pattern, a stripe pattern, a constellation of dots, etc.) may be projected onto the target. Upon striking the surface of the target, the pattern may become deformed, and this deformation of the pattern may be studied to determine a physical distance from the depth camera to a particular location on the target.
In another example, the depth camera may include two or more physically separated cameras that view a target from different angles to obtain visual stereo data. In such cases, the visual stereo data may be resolved to generate a depth image.
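As a nonlimiting sketch of how stereo data may be resolved into depth, the standard relationship for a rectified pair of pinhole cameras gives depth as the focal length times the baseline divided by the disparity. The rectified pinhole model, the function name, and the parameter names are assumptions and are not drawn from this disclosure.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth, in meters, for one pixel of a rectified stereo pair.

    disparity_px: horizontal shift of the same feature between the two images.
    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two camera centers, in meters.
    Returns None when the disparity is not positive (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px
```

Applying this calculation to every matched pixel yields a depth image of the kind described above.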
In other embodiments, depth camera 186 may utilize other technologies to measure and/or calculate depth values. Additionally, depth camera 186 may organize the calculated depth information into “Z layers,” i.e., layers perpendicular to a Z axis extending from the depth camera along its line of sight to the viewer.
In some embodiments, two or more different cameras may be incorporated into an integrated depth camera. For example, a depth camera and a video camera (e.g., RGB video camera) may be incorporated into a common depth camera. In some embodiments, two or more separate cameras may be cooperatively used. For example, a depth camera and a separate video camera may be used. When a video camera is used, it may be used to provide target tracking data, confirmation data for error correction of target tracking, image capture, face recognition, high-precision tracking of fingers (or other small features), light sensing, and/or other functions.
Aiming assembly 188 is configured to selectively change an aiming vector of the depth camera. The aiming assembly may include one or more machines that physically move the camera. In different embodiments, the aiming assembly may be configured to change the up/down pitch, left/right yaw, clockwise/counter-clockwise roll, up/down lift, and/or left/right translation of the camera. As nonlimiting examples, the aiming assembly may include a one to three axis gimbal with or without an up/down lift and/or a right/left slide.
The aiming assembly may include various motors, gears, lifts, slides, and other components that are used to change the aiming vector of the depth camera. The aiming methods described herein may decrease the need to repeatedly use such components. As such, physical wear and tear to bearings, wires, gears, motors, and other components may be decreased. By decreasing wear on the various components of the aiming assembly, the business value of the depth capture system is increased, because the life of the system may be increased and/or the maintenance costs of the system may be decreased.
It is to be understood that at least some target analysis and tracking operations may be executed by a logic machine of one or more depth cameras. A depth camera may include one or more onboard processing units configured to perform one or more target analysis and/or tracking functions. A depth camera may include firmware to facilitate updating such onboard processing logic.
It is to be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or in some cases omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.