Differentiating a detected object from a background using a Gaussian brightness falloff pattern

Description

FIELD OF THE INVENTION

The present disclosure relates generally to imaging systems and in particular to three-dimensional (3D) object detection, tracking and characterization using optical imaging.

BACKGROUND

Motion-capture systems are used in a variety of contexts to obtain information about the conformation and motion of various objects, including objects with articulating members, such as human hands or human bodies. Such systems generally include cameras to capture sequential images of an object in motion and computers to analyze the images to create a reconstruction of an object's volume, position, and motion. For 3D motion capture, at least two cameras are typically used.

Image-based motion-capture systems rely on the ability to distinguish an object of interest from a background. This is often achieved using image-analysis algorithms that detect edges, typically by comparing pixels to detect abrupt changes in color and/or brightness. Such conventional systems, however, suffer performance degradation under many common circumstances, e.g., low contrast between the object of interest and the background and/or patterns in the background that may falsely register as object edges.

In some instances, distinguishing object and background can be facilitated by “instrumenting” the object of interest, e.g., by having a person wear a mesh of reflectors or active light sources or the like while performing the motion. Special lighting conditions (e.g., low light) can be used to make the reflectors or light sources stand out in the images. Instrumenting the subject, however, is not always a convenient or desirable option.

SUMMARY

Certain embodiments of the present invention relate to imaging systems that improve object recognition by enhancing contrast between the object and background surfaces visible in an image; this may be accomplished, for example, by means of controlled lighting directed at the object. For example, in a motion-capture system where an object of interest, such as a person's hand, is significantly closer to the camera than any background surfaces, the falloff of light intensity with distance (1/r² for pointlike light sources) can be exploited by positioning a light source (or multiple light sources) near the camera(s) or other image capture device(s) and shining that light onto the object. Source light reflected by the nearby object of interest can be expected to be much brighter than light reflected from more distant background surfaces, and the more distant the background (relative to the object), the more pronounced the effect will be. Accordingly, in some embodiments, a threshold cutoff on pixel brightness in the captured images can be used to distinguish “object” pixels from “background” pixels. While broadband ambient light sources can be employed, various embodiments employ light having a confined wavelength range and a camera matched to detect such light; for example, an infrared source light can be used with one or more cameras sensitive to infrared frequencies.

Accordingly, in a first aspect, the invention pertains to an image capture and analysis system for identifying objects of interest in a digitally represented image scene. In various embodiments, the system comprises at least one camera oriented toward a field of view; at least one light source disposed on a same side of the field of view as the camera and oriented to illuminate the field of view; and an image analyzer coupled to the camera and the light source(s). The image analyzer may be configured to operate the camera(s) to capture a sequence of images including a first image captured at a time when the light source(s) are illuminating the field of view; identify pixels corresponding to the object rather than to the background (e.g., image components that are nearby or reflective); and, based on the identified pixels, construct a 3D model of the object, including a position and shape of the object, to geometrically determine whether it corresponds to the object of interest. In certain embodiments, the image analyzer distinguishes between (i) foreground image components corresponding to objects located within a proximal zone of the field of view, where the proximal zone extends from the camera(s) and has a depth relative thereto of at least twice the expected maximum distance between the objects corresponding to the foreground image components and the camera(s), and (ii) background image components corresponding to objects located within a distal zone of the field of view, where the distal zone is located, relative to the at least one camera, beyond the proximal zone. For example, the proximal zone may have a depth of at least four times the expected maximum distance.

In other embodiments, the image analyzer operates the camera(s) to capture second and third images when the light source(s) are not illuminating the field of view and identifies the pixels corresponding to the object based on the difference between the first and second images and the difference between the first and third images, where the second image is captured before the first image and the third image is captured after the first image.

The light source(s) may, for example, be diffuse emitters (e.g., infrared light-emitting diodes), in which case the camera(s) are infrared-sensitive. Two or more light sources may be arranged to flank the camera(s) and be substantially coplanar therewith. In various embodiments, the camera(s) and the light source(s) are oriented vertically upward. To enhance contrast, the camera may be operated to provide an exposure time no greater than 100 microseconds and the light source(s) may be activated during the exposure time at a power level of at least 5 watts. In certain implementations, a holographic diffraction grating is positioned between the lens of each camera and the field of view (i.e., in front of the camera lens).

The image analyzer may geometrically determine whether an object corresponds to the object of interest by identifying ellipses that volumetrically define a candidate object, discarding object segments geometrically inconsistent with an ellipse-based definition, and determining, based on the ellipses, whether the candidate object corresponds to the object of interest.

In another aspect, the invention pertains to a method for capturing and analyzing images. In various embodiments, the method comprises the steps of activating at least one light source to illuminate a field of view containing an object of interest; capturing a sequence of digital images of the field of view using a camera (or cameras) at a time when the light source(s) are activated; identifying pixels corresponding to the object rather than to the background; and based on the identified pixels, constructing a 3D model of the object, including a position and shape of the object, to geometrically determine whether it corresponds to the object of interest.

The light source(s) may be positioned such that objects of interest are located within a proximal zone of the field of view, where the proximal zone extends from the camera to a distance at least twice an expected maximum distance between the objects of interest and the camera. For example, the proximal zone may have a depth of at least four times the expected maximum distance. The light source(s) may, for example, be diffuse emitters—e.g., infrared light-emitting diodes, in which case the camera is an infrared-sensitive camera. Two or more light sources may be arranged to flank the camera and be substantially coplanar therewith. In various embodiments, the camera and the light source(s) are oriented vertically upward. To enhance contrast, the camera may be operated to provide an exposure time no greater than 100 microseconds and the light source(s) may be activated during exposure time at a power level of at least 5 watts.

Alternatively, object pixels may be identified by capturing a first image when the light source(s) are not activated, a second image when the light source(s) are activated, and a third image when the light source(s) are not activated, where pixels corresponding to the object are identified based on a difference between the second and first images and a difference between the second and third images.
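The three-image comparison just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed method: images are taken to be lists of rows of brightness values on a 0.0 to 1.0 scale, the name `object_mask` and the parameter `min_gain` are hypothetical, and requiring both differences to exceed the same cutoff is only one possible way of combining them.

```python
def object_mask(unlit_before, lit, unlit_after, min_gain=0.25):
    """Mark a pixel as object only if the lit frame is brighter than BOTH
    unlit frames by more than min_gain (an assumed, tunable cutoff)."""
    mask = []
    for before_row, lit_row, after_row in zip(unlit_before, lit, unlit_after):
        mask.append([
            (l - b) > min_gain and (l - a) > min_gain
            for b, l, a in zip(before_row, lit_row, after_row)
        ])
    return mask


# One-row example: dark background, nearby object, bright ambient source.
unlit_before = [[0.10, 0.10, 0.60]]
lit = [[0.15, 0.90, 0.65]]
unlit_after = [[0.10, 0.10, 0.60]]
print(object_mask(unlit_before, lit, unlit_after))  # [[False, True, False]]
```

In this example the third pixel is bright in every frame, as an ambient light source in the background might be, so it is rejected even though a plain brightness threshold on the lit frame alone would accept it.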

Geometrically determining whether an object corresponds to the object of interest may comprise or consist of identifying ellipses that volumetrically define a candidate object, discarding object segments geometrically inconsistent with an ellipse-based definition, and determining, based on the ellipses, whether the candidate object corresponds to the object of interest.

In still another aspect the invention pertains to a method of locating rounded objects within a digital image. In various embodiments, the method comprises the steps of: activating at least one light source to illuminate a field of view containing an object of interest; operating a camera to capture a sequence of images including a first image captured at a time when the at least one light source is illuminating the field of view; and analyzing the images to detect therein Gaussian brightness falloff patterns indicative of rounded objects in the field of view. In some embodiments, the rounded objects are detected without identifying edges thereof. The method may further comprise tracking the motion of the detected rounded objects through a plurality of the captured images.

Another aspect of the invention relates to an image capture and analysis system for locating rounded objects within a field of view. In various embodiments, the system comprises at least one camera oriented toward the field of view; at least one light source disposed on a same side of the field of view as the camera and oriented to illuminate the field of view; and an image analyzer coupled to the camera and the light source. The image analyzer may be configured to operate the camera(s) to capture a sequence of images including a first image captured at a time when the light source(s) are illuminating the field of view; and analyze the images to detect therein Gaussian brightness falloff patterns indicative of rounded objects in the field of view. The rounded objects may, in some embodiments, be detected without identifying edges thereof. The system may track the motion of the detected rounded objects through a plurality of the captured images.
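The specification does not state how a Gaussian brightness falloff pattern is to be detected. Purely as an illustration, the following Python sketch scores how closely a one-dimensional brightness profile (for example, a run of pixels across a finger) resembles a Gaussian with the same centroid and spread. The function name `gaussian_fit_score`, the moment-matching approach, and the use of a correlation coefficient as the score are all assumptions introduced here.

```python
import math


def gaussian_fit_score(profile):
    """Return (center, sigma, score) for a 1-D brightness profile, or None.

    center and sigma are the brightness-weighted mean and standard
    deviation of the pixel index. score is the Pearson correlation between
    the profile and a Gaussian with that center and sigma; values near 1.0
    suggest a smooth, rounded falloff rather than a flat-topped region.
    """
    total = sum(profile)
    n = len(profile)
    if n == 0 or total <= 0:
        return None
    center = sum(i * b for i, b in enumerate(profile)) / total
    variance = sum(b * (i - center) ** 2 for i, b in enumerate(profile)) / total
    if variance <= 0:
        return None
    model = [math.exp(-((i - center) ** 2) / (2 * variance)) for i in range(n)]
    mean_p = total / n
    mean_m = sum(model) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(profile, model))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in profile))
    spread_m = math.sqrt(sum((m - mean_m) ** 2 for m in model))
    if spread_p == 0 or spread_m == 0:
        return None  # perfectly flat profile: no falloff to measure
    return center, math.sqrt(variance), cov / (spread_p * spread_m)


# A rounded (Gaussian-like) profile versus a flat-topped one of similar width.
rounded = [math.exp(-((i - 10) ** 2) / 8.0) for i in range(21)]
flat_top = [0.0] * 7 + [1.0] * 7 + [0.0] * 7
print(gaussian_fit_score(rounded)[2] > gaussian_fit_score(flat_top)[2])  # True
```

Note that the score is computed from brightness values alone, without first locating edges, which is in the spirit of detecting rounded objects "without identifying edges thereof."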

As used herein, the term “substantially” or “approximately” means ±10% (e.g., by weight or by volume), and in some embodiments, ±5%. The term “consists essentially of” means excluding other materials that contribute to function, unless otherwise defined herein. Reference throughout this specification to “one example,” “an example,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the example is included in at least one example of the present technology. Thus, the occurrences of the phrases “in one example,” “in an example,” “one embodiment,” or “an embodiment” in various places throughout this specification are not necessarily all referring to the same example. Furthermore, the particular features, structures, routines, steps, or characteristics may be combined in any suitable manner in one or more examples of the technology. The headings provided herein are for convenience only and are not intended to limit or interpret the scope or meaning of the claimed technology.

The following detailed description together with the accompanying drawings will provide a better understanding of the nature and advantages of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a system for capturing image data according to an embodiment of the present invention.

FIG. 2 is a simplified block diagram of a computer system implementing an image analysis apparatus according to an embodiment of the present invention.

FIGS. 3A, 3B and 3C are graphs of brightness data for rows of pixels that may be obtained according to an embodiment of the present invention.

FIG. 4 is a flow diagram of a process for identifying the location of an object in an image according to an embodiment of the present invention.

FIG. 5 illustrates a timeline in which light sources are pulsed on at regular intervals according to an embodiment of the present invention.

FIG. 6 illustrates a timeline for pulsing light sources and capturing images according to an embodiment of the present invention.

FIG. 7 is a flow diagram of a process for identifying object edges using successive images according to an embodiment of the present invention.

FIG. 8 is a top view of a computer system incorporating a motion detector as a user input device according to an embodiment of the present invention.

FIG. 9 is a front view of a tablet computer illustrating another example of a computer system incorporating a motion detector according to an embodiment of the present invention.

FIG. 10 illustrates a goggle system incorporating a motion detector according to an embodiment of the present invention.

FIG. 11 is a flow diagram of a process for using motion information as user input to control a computer system or other system according to an embodiment of the present invention.

FIG. 12 illustrates a system for capturing image data according to another embodiment of the present invention.

FIG. 13 illustrates a system for capturing image data according to still another embodiment of the present invention.

DETAILED DESCRIPTION

Refer first to FIG. 1, which illustrates a system 100 for capturing image data according to an embodiment of the present invention. System 100 includes a pair of cameras 102, 104 coupled to an image-analysis system 106. Cameras 102, 104 can be any type of camera, including cameras sensitive across the visible spectrum or, more typically, with enhanced sensitivity to a confined wavelength band (e.g., the infrared (IR) or ultraviolet bands); more generally, the term “camera” herein refers to any device (or combination of devices) capable of capturing an image of an object and representing that image in the form of digital data. For example, line sensors or line cameras rather than conventional devices that capture a two-dimensional (2D) image can be employed. The term “light” is used generally to connote any electromagnetic radiation, which may or may not be within the visible spectrum, and may be broadband (e.g., white light) or narrowband (e.g., a single wavelength or narrow band of wavelengths).

The heart of a digital camera is an image sensor, which contains a grid of light-sensitive picture elements (pixels). A lens focuses light onto the surface of the image sensor, and the image is formed as the light strikes the pixels with varying intensity. Each pixel converts the light into an electric charge whose magnitude reflects the intensity of the detected light, and collects that charge so it can be measured. Both CCD and CMOS image sensors perform this same function but differ in how the signal is measured and transferred.

In a CCD, the charge from each pixel is transported to a single structure that converts the charge into a measurable voltage. This is done by sequentially shifting the charge in each pixel to its neighbor, row by row and then column by column in “bucket brigade” fashion, until it reaches the measurement structure. A CMOS sensor, by contrast, places a measurement structure at each pixel location. The measurements are transferred directly from each location to the output of the sensor.

Cameras 102, 104 are preferably capable of capturing video images (i.e., successive image frames at a constant rate of at least 15 frames per second), although no particular frame rate is required. The capabilities of cameras 102, 104 are not critical to the invention, and the cameras can vary as to frame rate, image resolution (e.g., pixels per image), color or intensity resolution (e.g., number of bits of intensity data per pixel), focal length of lenses, depth of field, etc. In general, for a particular application, any cameras capable of focusing on objects within a spatial volume of interest can be used. For instance, to capture motion of the hand of an otherwise stationary person, the volume of interest might be defined as a cube approximately one meter on a side.

System 100 also includes a pair of light sources 108, 110, which can be disposed to either side of cameras 102, 104, and controlled by image-analysis system 106. Light sources 108, 110 can be infrared light sources of generally conventional design, e.g., infrared light-emitting diodes (LEDs), and cameras 102, 104 can be sensitive to infrared light. Filters 120, 122 can be placed in front of cameras 102, 104 to filter out visible light so that only infrared light is registered in the images captured by cameras 102, 104. In some embodiments where the object of interest is a person's hand or body, use of infrared light can allow the motion-capture system to operate under a broad range of lighting conditions and can avoid various inconveniences or distractions that may be associated with directing visible light into the region where the person is moving. However, no particular wavelength or region of the electromagnetic spectrum is required.

It should be stressed that the foregoing arrangement is representative and not limiting. For example, lasers or other light sources can be used instead of LEDs. For laser setups, additional optics (e.g., a lens or diffuser) may be employed to widen the laser beam (and make its field of view similar to that of the cameras). Useful arrangements can also include short- and wide-angle illuminators for different ranges. Light sources are typically diffuse rather than specular point sources; for example, packaged LEDs with light-spreading encapsulation are suitable.

In operation, cameras 102, 104 are oriented toward a region of interest 112 in which an object of interest 114 (in this example, a hand) and one or more background objects 116 can be present. Light sources 108, 110 are arranged to illuminate region 112. In some embodiments, one or more of the light sources 108, 110 and one or more of the cameras 102, 104 are disposed below the motion to be detected, e.g., where hand motion is to be detected, beneath the spatial region where that motion takes place. This is an optimal location because the amount of information recorded about the hand is proportional to the number of pixels it occupies in the camera images, and the hand will occupy more pixels when the camera's angle with respect to the hand's “pointing direction” is as close to perpendicular as possible. Because it is uncomfortable for a user to orient his palm toward a screen, the optimal positions are either from the bottom looking up, from the top looking down (which requires a bridge) or from the screen bezel looking diagonally up or diagonally down. In scenarios looking up, there is less likelihood of confusion with background objects (clutter on the user's desk, for example), and if the camera is looking directly up, there is little likelihood of confusion with other people out of the field of view (and privacy is also enhanced by not imaging faces). Image-analysis system 106, which can be, e.g., a computer system, can control the operation of light sources 108, 110 and cameras 102, 104 to capture images of region 112. Based on the captured images, image-analysis system 106 determines the position and/or motion of object 114.

For example, as a step in determining the position of object 114, image-analysis system 106 can determine which pixels of various images captured by cameras 102, 104 contain portions of object 114. In some embodiments, any pixel in an image can be classified as an “object” pixel or a “background” pixel depending on whether that pixel contains a portion of object 114 or not. With the use of light sources 108, 110, classification of pixels as object or background pixels can be based on the brightness of the pixel. For example, the distance (r_O) between an object of interest 114 and cameras 102, 104 is expected to be smaller than the distance (r_B) between background object(s) 116 and cameras 102, 104. Because the intensity of light from sources 108, 110 decreases as 1/r², object 114 will be more brightly lit than background 116, and pixels containing portions of object 114 (i.e., object pixels) will be correspondingly brighter than pixels containing portions of background 116 (i.e., background pixels). For example, if r_B/r_O=2, then object pixels will be approximately four times brighter than background pixels, assuming object 114 and background 116 are similarly reflective of the light from sources 108, 110, and further assuming that the overall illumination of region 112 (at least within the frequency band captured by cameras 102, 104) is dominated by light sources 108, 110. These assumptions generally hold for suitable choices of cameras 102, 104, light sources 108, 110, filters 120, 122, and objects commonly encountered. For example, light sources 108, 110 can be infrared LEDs capable of strongly emitting radiation in a narrow frequency band, and filters 120, 122 can be matched to the frequency band of light sources 108, 110. Thus, although a human hand or body, or a heat source or other object in the background, may emit some infrared radiation, the response of cameras 102, 104 can still be dominated by light originating from sources 108, 110 and reflected by object 114 and/or background 116.
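The inverse-square reasoning above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only; the function name `brightness_ratio` is an assumption introduced here, and it relies on the same simplifications as the text (similar reflectivity, and illumination dominated by the controlled light sources).

```python
def brightness_ratio(r_object, r_background):
    """Ratio of object-pixel to background-pixel brightness under 1/r² falloff.

    Assumes object and background reflect the source light similarly and
    that the controlled light sources dominate the scene illumination.
    """
    if r_object <= 0 or r_background <= 0:
        raise ValueError("distances must be positive")
    return (r_background / r_object) ** 2


# r_B / r_O = 2 gives a fourfold brightness advantage for the object.
print(brightness_ratio(1.0, 2.0))  # 4.0
```

Because the ratio grows with the square of r_B/r_O, moving the background from twice to four times the object distance raises the expected contrast from 4:1 to 16:1.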

In this arrangement, image-analysis system 106 can quickly and accurately distinguish object pixels from background pixels by applying a brightness threshold to each pixel. For example, pixel brightness in a CMOS sensor or similar device can be measured on a scale from 0.0 (dark) to 1.0 (fully saturated), with some number of gradations in between depending on the sensor design. The brightness encoded by the camera pixels ordinarily scales linearly with the luminance of the object, typically as a result of the deposited charge or diode voltages. In some embodiments, light sources 108, 110 are bright enough that reflected light from an object at distance r_O produces a brightness level of 1.0 while an object at distance r_B=2r_O produces a brightness level of 0.25. Object pixels can thus be readily distinguished from background pixels based on brightness. Further, edges of the object can also be readily detected based on differences in brightness between adjacent pixels, allowing the position of the object within each image to be determined. Correlating object positions between images from cameras 102, 104 allows image-analysis system 106 to determine the location in 3D space of object 114, and analyzing sequences of images allows image-analysis system 106 to reconstruct 3D motion of object 114 using conventional motion algorithms.
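As a minimal sketch of the per-pixel thresholding described above, the following Python code maps raw sensor counts onto the 0.0 to 1.0 scale and classifies each pixel. The function names, the 8-bit sensor depth, and the 0.5 cutoff are assumptions chosen for illustration; an actual sensor may use a different number of gradations.

```python
def normalize(raw_rows, bit_depth=8):
    """Map raw sensor counts onto the 0.0 (dark) to 1.0 (saturated) scale."""
    full_scale = (1 << bit_depth) - 1
    return [[count / full_scale for count in row] for row in raw_rows]


def classify_pixels(rows, threshold=0.5):
    """Return a mask that is True for object pixels and False for background."""
    return [[value > threshold for value in row] for row in rows]


# 255 stands in for an object at r_O (brightness 1.0); 64 stands in for a
# background surface at about 2 * r_O (brightness close to 0.25).
raw = [
    [64, 64, 255, 255, 64],
    [64, 255, 255, 255, 64],
]
mask = classify_pixels(normalize(raw))
print(mask[0])  # [False, False, True, True, False]
```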

It will be appreciated that system 100 is illustrative and that variations and modifications are possible. For example, light sources 108, 110 are shown as being disposed to either side of cameras 102, 104. This can facilitate illuminating the edges of object 114 as seen from the perspectives of both cameras; however, a particular arrangement of cameras and lights is not required. (Examples of other arrangements are described below.) As long as the object is significantly closer to the cameras than the background, enhanced contrast as described herein can be achieved.

Image-analysis system 106 (also referred to as an image analyzer) can include or consist of any device or device component that is capable of capturing and processing image data, e.g., using techniques described herein. FIG. 2 is a simplified block diagram of a computer system 200 implementing image-analysis system 106 according to an embodiment of the present invention. Computer system 200 includes a processor 202, a memory 204, a camera interface 206, a display 208, speakers 209, a keyboard 210, and a mouse 211.

Memory 204 can be used to store instructions to be executed by processor 202 as well as input and/or output data associated with execution of the instructions. In particular, memory 204 contains instructions, conceptually illustrated as a group of modules described in greater detail below, that control the operation of processor 202 and its interaction with the other hardware components. An operating system directs the execution of low-level, basic system functions such as memory allocation, file management and operation of mass storage devices. The operating system may be or include a variety of operating systems such as the Microsoft WINDOWS operating system, the Unix operating system, the Linux operating system, the Xenix operating system, the IBM AIX operating system, the Hewlett Packard UX operating system, the Novell NETWARE operating system, the Sun Microsystems SOLARIS operating system, the OS/2 operating system, the BeOS operating system, the MACINTOSH operating system, the APACHE operating system, an OPENSTEP operating system or another operating system or platform.

The computing environment may also include other removable/nonremovable, volatile/nonvolatile computer storage media. For example, a hard disk drive may read from or write to nonremovable, nonvolatile magnetic media. A magnetic disk drive may read from or write to a removable, nonvolatile magnetic disk, and an optical disk drive may read from or write to a removable, nonvolatile optical disk such as a CD-ROM or other optical media. Other removable/nonremovable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The storage media are typically connected to the system bus through a removable or non-removable memory interface.

Processor 202 may be a general-purpose microprocessor, but depending on implementation can alternatively be a microcontroller, peripheral integrated circuit element, a CSIC (customer-specific integrated circuit), an ASIC (application-specific integrated circuit), a logic circuit, a digital signal processor, a programmable logic device such as an FPGA (field-programmable gate array), a PLD (programmable logic device), a PLA (programmable logic array), an RFID processor, smart chip, or any other device or arrangement of devices that is capable of implementing the steps of the processes of the invention.

Camera interface 206 can include hardware and/or software that enables communication between computer system 200 and cameras such as cameras 102, 104 shown in FIG. 1, as well as associated light sources such as light sources 108, 110 of FIG. 1. Thus, for example, camera interface 206 can include one or more data ports 216, 218 to which cameras can be connected, as well as hardware and/or software signal processors to modify data signals received from the cameras (e.g., to reduce noise or reformat data) prior to providing the signals as inputs to a conventional motion-capture (“mocap”) program 214 executing on processor 202. In some embodiments, camera interface 206 can also transmit signals to the cameras, e.g., to activate or deactivate the cameras, to control camera settings (frame rate, image quality, sensitivity, etc.), or the like. Such signals can be transmitted, e.g., in response to control signals from processor 202, which may in turn be generated in response to user input or other detected events.

Camera interface 206 can also include controllers 217, 219, to which light sources (e.g., light sources 108, 110) can be connected. In some embodiments, controllers 217, 219 supply operating current to the light sources, e.g., in response to instructions from processor 202 executing mocap program 214. In other embodiments, the light sources can draw operating current from an external power supply (not shown), and controllers 217, 219 can generate control signals for the light sources, e.g., instructing the light sources to be turned on or off or changing the brightness. In some embodiments, a single controller can be used to control multiple light sources.

Instructions defining mocap program 214 are stored in memory 204, and these instructions, when executed, perform motion-capture analysis on images supplied from cameras connected to camera interface 206. In one embodiment, mocap program 214 includes various modules, such as an object detection module 222 and an object analysis module 224; again, both of these modules are conventional and well-characterized in the art. Object detection module 222 can analyze images (e.g., images captured via camera interface 206) to detect edges of an object therein and/or other information about the object's location. Object analysis module 224 can analyze the object information provided by object detection module 222 to determine the 3D position and/or motion of the object. Examples of operations that can be implemented in code modules of mocap program 214 are described below. Memory 204 can also include other information and/or code modules used by mocap program 214.

Display 208, speakers 209, keyboard 210, and mouse 211 can be used to facilitate user interaction with computer system 200. These components can be of generally conventional design or modified as desired to provide any type of user interaction. In some embodiments, results of motion capture using camera interface 206 and mocap program 214 can be interpreted as user input. For example, a user can perform hand gestures that are analyzed using mocap program 214, and the results of this analysis can be interpreted as an instruction to some other program executing on processor 202 (e.g., a web browser, word processor, or other application). Thus, by way of illustration, a user might use upward or downward swiping gestures to “scroll” a webpage currently displayed on display 208, use rotating gestures to increase or decrease the volume of audio output from speakers 209, and so on.
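By way of illustration only, interpreting recognized gestures as user input might look like the following Python sketch. Every name here (`UiState`, `apply_gesture`, and the gesture labels) is hypothetical, as is the choice of which swipe or rotation direction maps to which action; the sketch assumes that gesture recognition has already produced a label for each gesture.

```python
from dataclasses import dataclass


@dataclass
class UiState:
    scroll_line: int = 0  # top line of the displayed page
    volume: int = 5       # audio volume on an assumed 0 to 10 scale


def apply_gesture(state, gesture):
    """Update state for one recognized gesture; unknown gestures are ignored."""
    if gesture == "swipe_up":
        state.scroll_line = max(0, state.scroll_line - 1)
    elif gesture == "swipe_down":
        state.scroll_line += 1
    elif gesture == "rotate_clockwise":
        state.volume = min(10, state.volume + 1)
    elif gesture == "rotate_counterclockwise":
        state.volume = max(0, state.volume - 1)
    return state


state = UiState()
for gesture in ["swipe_down", "swipe_down", "rotate_clockwise", "wave"]:
    apply_gesture(state, gesture)
print(state)  # UiState(scroll_line=2, volume=6)
```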

It will be appreciated that computer system 200 is illustrative and that variations and modifications are possible. Computer systems can be implemented in a variety of form factors, including server systems, desktop systems, laptop systems, tablets, smart phones or personal digital assistants, and so on. A particular implementation may include other functionality not described herein, e.g., wired and/or wireless network interfaces, media playing and/or recording capability, etc. In some embodiments, one or more cameras may be built into the computer rather than being supplied as separate components. Further, an image analyzer can be implemented using only a subset of computer system components (e.g., as a processor executing program code, an ASIC, or a fixed-function digital signal processor, with suitable I/O interfaces to receive image data and output analysis results).

While computer system 200 is described herein with reference to particular blocks, it is to be understood that the blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. Further, the blocks need not correspond to physically distinct components. To the extent that physically distinct components are used, connections between components (e.g., for data communication) can be wired and/or wireless as desired.

Execution of object detection module 222 by processor 202 can cause processor 202 to operate camera interface 206 to capture images of an object and to distinguish object pixels from background pixels by analyzing the image data. FIGS. 3A-3C are three different graphs of brightness data for rows of pixels that may be obtained according to various embodiments of the present invention. While each graph illustrates one pixel row, it is to be understood that an image typically contains many rows of pixels, and a row can contain any number of pixels; for instance, an HD video image can include 1080 rows having 1920 pixels each.

FIG. 3A illustrates brightness data 300 for a row of pixels in which the object has a single cross-section, such as a cross-section through a palm of a hand. Pixels in region 302, corresponding to the object, have high brightness while pixels in regions 304 and 306, corresponding to background, have considerably lower brightness. As can be seen, the object's location is readily apparent, and the locations of the edges of the object (at 308 and 310) are easily identified. For example, any pixel with brightness above 0.5 can be assumed to be an object pixel, while any pixel with brightness below 0.5 can be assumed to be a background pixel.

FIG. 3B illustrates brightness data 320 for a row of pixels in which the object has multiple distinct cross-sections, such as a cross-section through fingers of an open hand. Regions 322, 323, and 324, corresponding to the object, have high brightness while pixels in regions 326-329, corresponding to background, have low brightness. Again, a simple threshold cutoff on brightness (e.g., at 0.5) suffices to distinguish object pixels from background pixels, and the edges of the object can be readily ascertained.

FIG. 3C illustrates brightness data 340 for a row of pixels in which the distance to the object varies across the row, such as a cross-section of a hand with two fingers extending toward the camera. Regions 342 and 343 correspond to the extended fingers and have highest brightness; regions 344 and 345 correspond to other portions of the hand and are slightly less bright; this can be due in part to being farther away and in part to shadows cast by the extended fingers. Regions 348 and 349 are background regions and are considerably darker than hand-containing regions 342-345. A threshold cutoff on brightness (e.g., at 0.5) again suffices to distinguish object pixels from background pixels. Further analysis of the object pixels can also be performed to detect the edges of regions 342 and 343, providing additional information about the object's shape.

It will be appreciated that the data shown in FIGS. 3A-3C is illustrative. In some embodiments, it may be desirable to adjust the intensity of light sources 108, 110 such that an object at an expected distance (e.g., r_O in FIG. 1) will be overexposed; that is, many if not all of the object pixels will be fully saturated to a brightness level of 1.0. (The actual brightness of the object may in fact be higher.) While this may also make the background pixels somewhat brighter, the 1/r² falloff of light intensity with distance still leads to a ready distinction between object and background pixels as long as the intensity is not set so high that background pixels also approach the saturation level. As FIGS. 3A-3C illustrate, use of lighting directed at the object to create strong contrast between object and background allows the use of simple and fast algorithms to distinguish between background pixels and object pixels, which can be particularly useful in real-time motion-capture systems. Simplifying the task of distinguishing background and object pixels can also free up computing resources for other motion-capture tasks (e.g., reconstructing the object's position, shape, and/or motion).

Refer now to FIG. 4, which illustrates a process 400 for identifying the location of an object in an image according to an embodiment of the present invention. Process 400 can be implemented, e.g., in system 100 of FIG. 1. At block 402, light sources 108, 110 are turned on. At block 404, one or more images are captured using cameras 102, 104. In some embodiments, one image from each camera is captured. In other embodiments, a sequence of images is captured from each camera. The images from the two cameras can be closely correlated in time (e.g., simultaneous to within a few milliseconds) so that correlated images from the two cameras can be used to determine the 3D location of the object.

At block 406, a threshold pixel brightness is applied to distinguish object pixels from background pixels. Block 406 can also include identifying locations of edges of the object based on transition points between background and object pixels. In some embodiments, each pixel is first classified as either object or background based on whether it exceeds the threshold brightness cutoff. For example, as shown in FIGS. 3A-3C, a cutoff at a saturation level of 0.5 can be used. Once the pixels are classified, edges can be detected by finding locations where background pixels are adjacent to object pixels. In some embodiments, to avoid noise artifacts, the regions of background and object pixels on either side of the edge may be required to have a certain minimum size (e.g., 2, 4 or 8 pixels).
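By way of illustration only, the classify-then-find-edges approach of block 406 can be sketched for a single pixel row as follows. The function names, the list-of-floats row representation, and the exact way the minimum-size rule is applied (both neighboring runs must be at least `min_run` pixels long) are assumptions made for this sketch, not details taken from the disclosure.

```python
def classify_row(row, cutoff=0.5):
    """Label each pixel True (object) if its brightness exceeds the cutoff."""
    return [brightness > cutoff for brightness in row]


def find_edges(labels, min_run=2):
    """Return (index, kind) for each accepted background/object transition.

    index is the first pixel of the new run. A transition is kept only if
    the runs on both sides of it are at least min_run pixels long
    (an assumed reading of the minimum-size rule).
    """
    # Collapse the labels into runs stored as [label, start, length].
    runs = []
    for i, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1][2] += 1
        else:
            runs.append([label, i, 1])

    edges = []
    for prev, cur in zip(runs, runs[1:]):
        if prev[2] >= min_run and cur[2] >= min_run:
            kind = "rising" if cur[0] else "falling"
            edges.append((cur[1], kind))
    return edges


# Hypothetical row: one isolated bright pixel (index 2), then a 4-pixel object.
row = [0.1, 0.1, 0.9, 0.1, 0.1, 0.9, 1.0, 1.0, 0.9, 0.1, 0.1, 0.1]
labels = classify_row(row)
edges = find_edges(labels, min_run=2)
```

In this hypothetical row the isolated bright pixel is too short to form a run of two, so only the transitions bounding the four-pixel object region are reported.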

In other embodiments, edges can be detected without first classifying pixels as object or background. For example, Δβ can be defined as the difference in brightness between adjacent pixels, and |Δβ| above a threshold (e.g., 0.3 or 0.5 in terms of the saturation scale) can indicate a transition from background to object or from object to background between adjacent pixels. (The sign of Δβ can indicate the direction of the transition.) In some instances where the object's edge is actually in the middle of a pixel, there may be a pixel with an intermediate value at the boundary. This can be detected, e.g., by computing two brightness values for a pixel i: β_L=(β_i+β_(i−1))/2 and β_R=(β_i+β_(i+1))/2, where pixel (i−1) is to the left of pixel i and pixel (i+1) is to the right of pixel i. If pixel i is not near an edge, |β_L−β_R| will generally be close to zero; if pixel i is near an edge, then |β_L−β_R| will be markedly larger (approaching 0.5 on the saturation scale for a full transition between background and object), and a threshold on |β_L−β_R| can be used to detect edges.
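The two tests described above can be sketched as follows, purely as an example. The function names and default thresholds are hypothetical. Note that, with the left and right averages as defined, their difference reduces to half the difference between the two neighboring pixels, so on a 0-to-1 brightness scale it cannot exceed 0.5 and the threshold applied to it should be chosen below that value.

```python
def gradient_edges(row, threshold=0.3):
    """Return (i, sign) where the step from pixel i-1 to pixel i exceeds threshold.

    sign is +1 for a background-to-object step and -1 for the reverse.
    """
    edges = []
    for i in range(1, len(row)):
        delta = row[i] - row[i - 1]
        if abs(delta) > threshold:
            edges.append((i, 1 if delta > 0 else -1))
    return edges


def half_pixel_edges(row, threshold=0.3):
    """Return interior indices i where the left and right averages differ strongly."""
    hits = []
    for i in range(1, len(row) - 1):
        beta_left = (row[i] + row[i - 1]) / 2
        beta_right = (row[i] + row[i + 1]) / 2
        if abs(beta_left - beta_right) > threshold:
            hits.append(i)
    return hits


# Hypothetical row with an intermediate-valued boundary pixel at index 2.
row = [0.1, 0.1, 0.5, 0.9, 0.9]
boundary_pixels = half_pixel_edges(row)
sharp_edges = gradient_edges([0.1, 0.9, 0.9, 0.1], threshold=0.5)
```

For the hypothetical row, only the intermediate pixel at index 2 is flagged by the averaged test, which is the behavior the boundary-pixel discussion above calls for.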

In some instances, one part of an object may partially occlude another in an image; for example, in the case of a hand, a finger may partly occlude the palm or another finger. Occlusion edges that occur where one part of the object partially occludes another can also be detected based on smaller but distinct changes in brightness once background pixels have been eliminated. FIG. 3C illustrates an example of such partial occlusion, and the locations of occlusion edges are apparent.

Detected edges can be used for numerous purposes. For example, as previously noted, the edges of the object as viewed by the two cameras can be used to determine an approximate location of the object in 3D space. The position of the object in a 2D plane transverse to the optical axis of the camera can be determined from a single image, and the offset (parallax) between the position of the object in time-correlated images from two different cameras can be used to determine the distance to the object if the spacing between the cameras is known.
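As a minimal sketch of the parallax relationship, assume two identical pinhole cameras with parallel optical axes whose images are horizontally aligned; these assumptions, the function name, and the parameters are not taken from the disclosure. Under them, the distance equals the focal length (in pixels) times the camera spacing divided by the pixel offset between the two views.

```python
def depth_from_disparity(x_left_px, x_right_px, focal_px, baseline_m):
    """Estimate distance (meters) to a feature seen at two horizontal pixel positions.

    Assumes rectified, parallel pinhole cameras separated by baseline_m,
    with focal length focal_px expressed in pixels.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("expected a positive disparity for an object in front of both cameras")
    return focal_px * baseline_m / disparity


# Hypothetical numbers: 700 px focal length, cameras 4 cm apart, 140 px offset.
distance_m = depth_from_disparity(400, 260, focal_px=700, baseline_m=0.04)
```

With these hypothetical numbers the estimated distance is about 0.2 m; halving the offset would double the estimate, which is why nearby objects are located more precisely than distant ones.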

Further, the position and shape of the object can be determined based on the locations of its edges in time-correlated images from two different cameras, and motion (including articulation) of the object can be determined from analysis of successive pairs of images. Examples of techniques that can be used to determine an object's position, shape and motion based on locations of edges of the object are described in co-pending U.S. Ser. No. 13/414,485, filed Mar. 7, 2012, the entire disclosure of which is incorporated herein by reference. Those skilled in the art with access to the present disclosure will recognize that other techniques for determining position, shape and motion of an object based on information about the location of edges of the object can also be used.

In accordance with the ′485 application, an object's motion and/or position is reconstructed using small amounts of information. For example, an outline of an object's shape, or silhouette, as seen from a particular vantage point can be used to define tangent lines to the object from that vantage point in various planes, referred to herein as “slices.” Using as few as two different vantage points, four (or more) tangent lines from the vantage points to the object can be obtained in a given slice. From these four (or more) tangent lines, it is possible to determine the position of the object in the slice and to approximate its cross-section in the slice, e.g., using one or more ellipses or other simple closed curves. As another example, locations of points on an object's surface in a particular slice can be determined directly (e.g., using a time-of-flight camera), and the position and shape of a cross-section of the object in the slice can be approximated by fitting an ellipse or other simple closed curve to the points. Positions and cross-sections determined for different slices can be correlated to construct a 3D model of the object, including its position and shape. A succession of images can be analyzed using the same technique to model motion of the object. Motion of a complex object that has multiple separately articulating members (e.g., a human hand) can be modeled using these techniques.

More particularly, an ellipse in the xy plane can be characterized by five parameters: the x and y coordinates of the center (x_C, y_C), the semimajor axis, the semiminor axis, and a rotation angle (e.g., angle of the semimajor axis relative to the x axis). With only four tangents, the ellipse is underdetermined. However, an efficient process for estimating the ellipse in spite of this fact involves making an initial working assumption (or “guess”) as to one of the parameters and revisiting the assumption as additional information is gathered during the analysis. This additional information can include, for example, physical constraints based on properties of the cameras and/or the object. In some circumstances, more than four tangents to an object may be available for some or all of the slices, e.g., because more than two vantage points are available. An elliptical cross-section can still be determined, and the process in some instances is somewhat simplified as there is no need to assume a parameter value. In some instances, the additional tangents may create additional complexity. In some circumstances, fewer than four tangents to an object may be available for some or all of the slices, e.g., because an edge of the object is out of range of the field of view of one camera or because an edge was not detected. A slice with three tangents can be analyzed. For example, using two parameters from an ellipse fit to an adjacent slice (e.g., a slice that had at least four tangents), the system of equations for the ellipse and three tangents is sufficiently determined that it can be solved. As another option, a circle can be fit to the three tangents; defining a circle in a plane requires only three parameters (the center coordinates and the radius), so three tangents suffice to fit a circle. Slices with fewer than three tangents can be discarded or combined with adjacent slices.
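The three-tangent circle fit mentioned above can be illustrated with a short sketch. It is not the method of the ′485 application; it assumes each tangent is supplied as a line n_x·x + n_y·y = d with a unit normal pointing toward the side on which the object lies, so that the center (c_x, c_y) and radius r satisfy n_x·c_x + n_y·c_y − r = d for each tangent. That line representation and all names are assumptions for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circle_from_tangents(lines):
    """Fit a circle to three tangent lines.

    lines: three (nx, ny, d) tuples, each describing nx*x + ny*y = d with a
    unit normal (nx, ny) pointing toward the object. Returns (cx, cy, r).
    """
    # Each tangent gives one linear equation: nx*cx + ny*cy - r = d.
    coeffs = [[nx, ny, -1.0] for nx, ny, _ in lines]
    rhs = [d for _, _, d in lines]
    det = det3(coeffs)
    if abs(det) < 1e-12:
        raise ValueError("tangent lines do not determine a unique circle")

    # Cramer's rule: replace one column at a time with the right-hand side.
    solution = []
    for col in range(3):
        m = [row[:] for row in coeffs]
        for r_idx in range(3):
            m[r_idx][col] = rhs[r_idx]
        solution.append(det3(m) / det)
    return tuple(solution)


# Hypothetical tangents x = 0, y = 0 and x = 2, with normals facing the object.
cx, cy, r = circle_from_tangents([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, -2.0)])
```

Because the unknowns are exactly the three circle parameters, the three tangents yield a square linear system; an ellipse, with five parameters, would leave such a system underdetermined, consistent with the discussion above.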

To determine geometrically whether an object corresponds to an object of interest, one approach is to look for continuous volumes of ellipses that define an object and to discard object segments geometrically inconsistent with the ellipse-based definition of the object (e.g., segments that are too cylindrical or too straight or too thin or too small or too far away). If a sufficient number of ellipses remain to characterize the object and it conforms to the object of interest, it is so identified, and may be tracked from frame to frame.

In some embodiments, each of a number of slices is analyzed separately to determine the size and location of an elliptical cross-section of the object in that slice. This provides an initial 3D model (specifically, a stack of elliptical cross-sections), which can be refined by correlating the cross-sections across different slices. For example, it is expected that an object's surface will have continuity, and discontinuous ellipses can accordingly be discounted. Further refinement can be obtained by correlating the 3D model with itself across time, e.g., based on expectations related to continuity in motion and deformation.

In some embodiments, the silhouettes of an object are extracted from one or more images of the object that reveal information about the object as seen from different vantage points. While silhouettes can be obtained using a number of different techniques, in some embodiments, the silhouettes are obtained by using cameras to capture images of the object and analyzing the images to detect object edges.

With renewed reference to FIGS. 1 and 2, in some embodiments, light sources 108, 110 can be operated in a pulsed mode rather than being continually on. This can be useful, e.g., if light sources 108, 110 have the ability to produce brighter light in a pulse than in a steady-state operation. FIG. 5 illustrates a timeline in which light sources 108, 110 are pulsed on at regular intervals as shown at 502. The shutters of cameras 102, 104 can be opened to capture images at times coincident with the light pulses as shown at 504. Thus, an object of interest can be brightly illuminated during the times when images are being captured.

In some embodiments, the pulsing of light sources 108, 110 can be used to further enhance contrast between an object of interest and background. In particular, the ability to discriminate between relevant and irrelevant (e.g., background) objects in a scene can be compromised if the scene contains objects that themselves emit light or are highly reflective. This problem can be addressed by setting the camera exposure time to extraordinarily short periods (e.g., 100 microseconds or less) and pulsing the illumination at very high powers (e.g., 5 to 20 watts or, in some cases, higher levels such as 40 watts). In this period of time, most common sources of ambient illumination (e.g., fluorescent lights) are very dark by comparison to such bright, short-period illumination; that is, in microseconds, non-pulsed light sources are dimmer than they would appear at an exposure time of milliseconds or more. In effect, this approach increases the contrast of an object of interest with respect to other objects, even those emitting in the same general band. Accordingly, discriminating by brightness under such conditions allows irrelevant objects to be ignored for purposes of image reconstruction and processing. Average power consumption is also reduced; in the case of 20 watts for 100 microseconds, the average power consumption is under 10 milliwatts. In general, the light sources 108, 110 are operated so as to be on during the entire camera exposure period, i.e., the pulse width is equal to the exposure time and is coordinated therewith.
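The average-power figure can be checked with simple arithmetic, sketched below. The pulse rate is not stated above and is treated here as a free, hypothetical parameter: a 20-watt pulse lasting 100 microseconds delivers 2 millijoules, and the average power is that energy multiplied by the number of pulses per second. Under this simple model, the figure of under 10 milliwatts corresponds to fewer than five pulses per second, and higher pulse rates scale the average proportionally.

```python
def average_power_w(peak_w, pulse_s, pulses_per_s):
    """Average electrical power of a pulsed source: energy per pulse times pulse rate."""
    return peak_w * pulse_s * pulses_per_s


energy_per_pulse_j = 20 * 100e-6                    # 20 W for 100 microseconds
avg_at_4_hz = average_power_w(20, 100e-6, 4)        # hypothetical 4 pulses per second
avg_at_30_hz = average_power_w(20, 100e-6, 30)      # hypothetical 30 pulses per second
```

Even at the hypothetical rate of 30 pulses per second, the average is roughly 60 milliwatts, far below the 20-watt peak.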

It is also possible to coordinate pulsing of lights 108, 110 for purposes of comparing images taken with lights 108, 110 on and images taken with lights 108, 110 off. FIG. 6 illustrates a timeline in which light sources 108, 110 are pulsed on at regular intervals as shown at 602, while shutters of cameras 102, 104 are opened to capture images at times shown at 604. In this case, light sources 108, 110 are “on” for every other image. If the object of interest is significantly closer than background regions to light sources 108, 110, the difference in light intensity will be stronger for object pixels than for background pixels. Accordingly, comparing pixels in successive images can help distinguish object and background pixels.

FIG. 7 is a flow diagram of a process 700 for identifying object edges using successive images according to an embodiment of the present invention. At block 702, the light sources are turned off, and at block 704 a first image (A) is captured. Then, at block 706, the light sources are turned on, and at block 708 a second image (B) is captured. At block 710, a “difference” image B-A is calculated, e.g., by subtracting the brightness value of each pixel in image A from the brightness value of the corresponding pixel in image B. Since image B was captured with lights on, it is expected that B-A will be positive for most pixels.

The difference image is used to discriminate between background and foreground by applying a threshold or other metric on a pixel-by-pixel basis. At block 712, a threshold is applied to the difference image (B-A) to identify object pixels, with (B-A) above a threshold being associated with object pixels and (B-A) below the threshold being associated with background pixels. Object edges can then be defined by identifying where object pixels are adjacent to background pixels, as described above. Object edges can be used for purposes such as position and/or motion detection, as described above.
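Blocks 710 and 712 can be sketched as follows, as an example only. Images are represented as nested lists of brightness values on a 0-to-1 scale; that representation, the function name, and the threshold value of 0.3 are assumptions for this sketch.

```python
def difference_mask(image_a, image_b, threshold=0.3):
    """Return a boolean mask that is True where (B - A) exceeds the threshold.

    image_a: frame captured with the light sources off.
    image_b: frame captured with the light sources on.
    """
    return [
        [(b - a) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


# Hypothetical one-row frames. Index 2 is a bright ambient lamp in the background:
# it is bright in both frames, so its difference stays small.
lights_off = [[0.2, 0.2, 0.6, 0.2]]
lights_on = [[0.25, 0.9, 0.65, 0.3]]
mask = difference_mask(lights_off, lights_on)
```

In the hypothetical frames, a plain brightness cutoff of 0.5 on the lights-on frame would have accepted the lamp pixel, whereas the difference test rejects it because the lamp is nearly as bright with the light sources off.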

In an alternative embodiment, object edges are identified using a triplet of image frames rather than a pair. For example, in one implementation, a first image (Image1) is obtained with the light sources turned off; a second image (Image2) is obtained with the light sources turned on; and a third image (Image3) is taken with the light sources again turned off. Two difference images,

$\mathrm{Image4} = \mathrm{abs}(\mathrm{Image2} - \mathrm{Image1})$ and

$\mathrm{Image5} = \mathrm{abs}(\mathrm{Image2} - \mathrm{Image3})$

are then defined by subtracting pixel brightness values. A final image, Image6, is defined based on the two images Image4 and Image5. In particular, the value of each pixel in Image6 is the smaller of the two corresponding pixel values in Image4 and Image5. In other words, Image6=min(Image4, Image5) on a pixel-by-pixel basis. Image6 represents an enhanced-accuracy difference image and most of its pixels will be positive. Once again, a threshold or other metric can be used on a pixel-by-pixel basis to distinguish foreground and background pixels.
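A minimal sketch of the triplet computation follows, using nested lists of brightness values as a stand-in image representation (an assumption, as are the function name and the sample values).

```python
def triplet_difference(image1, image2, image3):
    """Compute Image6 = min(abs(Image2 - Image1), abs(Image2 - Image3)) per pixel.

    image1 and image3 are captured with the light sources off; image2 with them on.
    """
    return [
        [min(abs(p2 - p1), abs(p2 - p3)) for p1, p2, p3 in zip(r1, r2, r3)]
        for r1, r2, r3 in zip(image1, image2, image3)
    ]


# Hypothetical one-row frames:
#   pixel 0: object, bright only while the light sources are on
#   pixel 1: distant background, barely affected by the light sources
#   pixel 2: an ambient lamp that switched on between frames 1 and 2 and stayed on
image1 = [[0.1, 0.1, 0.1]]
image2 = [[0.9, 0.2, 0.7]]
image3 = [[0.1, 0.1, 0.7]]
image6 = triplet_difference(image1, image2, image3)
```

In this hypothetical case the lamp pixel would have passed a single-pair difference test against the first frame, but taking the smaller of the two differences suppresses it, since the change does not reverse when the light sources are turned off again.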

Contrast-based object detection as described herein can be applied in any situation where objects of interest are expected to be significantly closer (e.g., half the distance) to the light source(s) than background objects. One such application relates to the use of motion-detection as user input to interact with a computer system. For example, the user may point to the screen or make other hand gestures, which can be interpreted by the computer system as input.

A computer system 800 incorporating a motion detector as a user input device according to an embodiment of the present invention is illustrated in FIG. 8. Computer system 800 includes a desktop box 802 that can house various components of a computer system such as processors, memory, fixed or removable disk drives, video drivers, audio drivers, network interface components, and so on. A display 804 is connected to desktop box 802 and positioned to be viewable by a user. A keyboard 806 is positioned within easy reach of the user's hands. A motion-detector unit 808 is placed near keyboard 806 (e.g., behind, as shown or to one side), oriented toward a region in which it would be natural for the user to make gestures directed at display 804 (e.g., a region in the air above the keyboard and in front of the monitor). Cameras 810, 812 (which can be similar or identical to cameras 102, 104 described above) are arranged to point generally upward, and light sources 814, 816 (which can be similar or identical to light sources 108, 110 described above) are arranged to either side of cameras 810, 812 to illuminate an area above motion-detector unit 808. In typical implementations, the cameras 810, 812 and the light sources 814, 816 are substantially coplanar. This configuration prevents the appearance of shadows that can, for example, interfere with edge detection (as can be the case were the light sources located between, rather than flanking, the cameras). A filter, not shown, can be placed over the top of motion-detector unit 808 (or just over the apertures of cameras 810, 812) to filter out all light outside a band around the peak frequencies of light sources 814, 816.

In the illustrated configuration, when the user moves a hand or other object (e.g., a pencil) in the field of view of cameras 810, 812, the background will likely consist of a ceiling and/or various ceiling-mounted fixtures. The user's hand can be 10-20 cm above motion-detector unit 808, while the ceiling may be five to ten times that distance (or more). Illumination from light sources 814, 816 will therefore be much more intense on the user's hand than on the ceiling, and the techniques described herein can be used to reliably distinguish object pixels from background pixels in images captured by cameras 810, 812. If infrared light is used, the user will not be distracted or disturbed by the light.
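The size of this effect can be estimated with a short calculation, sketched below under the idealization of a pointlike source with 1/r² falloff and ignoring differences in surface reflectivity and angle. The function name is hypothetical.

```python
def illumination_ratio(object_dist, background_dist):
    """Ratio of source-light intensity at the object to that at the background.

    Assumes a pointlike source with 1/r^2 falloff; both distances use the same unit.
    """
    return (background_dist / object_dist) ** 2


ratio_5x = illumination_ratio(20, 100)    # ceiling five times farther than the hand
ratio_10x = illumination_ratio(10, 100)   # ceiling ten times farther than the hand
ratio_2x = illumination_ratio(10, 20)     # background only twice as far
```

Under this idealization, a ceiling five to ten times farther away than the hand receives roughly 1/25 to 1/100 of the illumination, while a background only twice as far still receives one quarter.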

Computer system 800 can utilize the architecture shown in FIG. 1. For example, cameras 810, 812 of motion-detector unit 808 can provide image data to desktop box 802, and image analysis and subsequent interpretation can be performed using the processors and other components housed within desktop box 802. Alternatively, motion-detector unit 808 can incorporate processors or other components to perform some or all stages of image analysis and interpretation. For example, motion-detector unit 808 can include a processor (programmable or fixed-function) that implements one or more of the processes described above to distinguish between object pixels and background pixels. In this case, motion-detector unit 808 can send a reduced representation of the captured images (e.g., a representation with all background pixels zeroed out) to desktop box 802 for further analysis and interpretation. A particular division of computational tasks between a processor inside motion-detector unit 808 and a processor inside desktop box 802 is not required.

It is not always necessary to discriminate between object pixels and background pixels by absolute brightness levels; for example, where knowledge of object shape exists, the pattern of brightness falloff can be utilized to detect the object in an image even without explicit detection of object edges. On rounded objects (such as hands and fingers), for example, the 1/r² relationship produces Gaussian or near-Gaussian brightness distributions near the centers of the objects; imaging a cylinder illuminated by an LED and disposed perpendicularly with respect to a camera results in an image having a bright center line corresponding to the cylinder axis, with brightness falling off to each side (around the cylinder circumference). Fingers are approximately cylindrical, and by identifying these Gaussian peaks, it is possible to locate fingers even in situations where the background is close and the edges are not visible due to the relative brightness of the background (either due to proximity or the fact that it may be actively emitting infrared light). The term “Gaussian” is used broadly herein to connote a curve with a negative second derivative. Often such curves will be bell-shaped and symmetric, but this is not necessarily the case; for example, in situations with higher object specularity or if the object is at an extreme angle, the curve may be skewed in a particular direction. Accordingly, as used herein, the term “Gaussian” is not limited to curves explicitly conforming to a Gaussian function.
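One way such peaks might be located in a pixel row is sketched below. It uses the discrete second difference of brightness as a stand-in for the second derivative and reports local maxima where that value is sufficiently negative. The function name and the `min_curvature` parameter are hypothetical, and a practical implementation would likely smooth the data first.

```python
def find_brightness_peaks(row, min_curvature=0.05):
    """Return indices of local brightness maxima with a clearly negative second difference."""
    peaks = []
    for i in range(1, len(row) - 1):
        second_diff = row[i - 1] - 2 * row[i] + row[i + 1]
        is_local_max = row[i] >= row[i - 1] and row[i] > row[i + 1]
        if is_local_max and second_diff < -min_curvature:
            peaks.append(i)
    return peaks


# Hypothetical row: two finger-like peaks in front of a bright, close background.
# Every pixel exceeds 0.5, so a plain brightness cutoff cannot separate them.
row = [0.6, 0.7, 0.9, 0.7, 0.6, 0.7, 0.9, 0.7, 0.6]
peaks = find_brightness_peaks(row)
```

In the hypothetical row, both peak centers are found even though no background-to-object edge would be detected by thresholding.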

FIG. 9 illustrates a tablet computer 900 incorporating a motion detector according to an embodiment of the present invention. Tablet computer 900 has a housing, the front surface of which incorporates a display screen 902 surrounded by a bezel 904. One or more control buttons 906 can be incorporated into bezel 904. Within the housing, e.g., behind display screen 902, tablet computer 900 can have various conventional computer components (processors, memory, network interfaces, etc.). A motion detector 910 can be implemented using cameras 912, 914 (e.g., similar or identical to cameras 102, 104 of FIG. 1) and light sources 916, 918 (e.g., similar or identical to light sources 108, 110 of FIG. 1) mounted into bezel 904 and oriented toward the front surface so as to capture motion of a user positioned in front of tablet computer 900.

When the user moves a hand or other object in the field of view of cameras 912, 914, the motion is detected as described above. In this case, the background is likely to be the user's own body, at a distance of roughly 25-30 cm from tablet computer 900. The user may hold a hand or other object at a short distance from display 902, e.g., 5-10 cm. As long as the user's hand is significantly closer than the user's body (e.g., half the distance) to light sources 916, 918, the illumination-based contrast enhancement techniques described herein can be used to distinguish object pixels from background pixels. The image analysis and subsequent interpretation as input gestures can be done within tablet computer 900 (e.g., leveraging the main processor to execute operating-system or other software to analyze data obtained from cameras 912, 914). The user can thus interact with tablet 900 using gestures in 3D space.

A goggle system 1000, as shown in FIG. 10, may also incorporate a motion detector according to an embodiment of the present invention. Goggle system 1000 can be used, e.g., in connection with virtual-reality and/or augmented-reality environments. Goggle system 1000 includes goggles 1002 that are wearable by a user, similar to conventional eyeglasses. Goggles 1002 include eyepieces 1004, 1006 that can incorporate small display screens to provide images to the user's left and right eyes, e.g., images of a virtual reality environment. These images can be provided by a base unit 1008 (e.g., a computer system) that is in communication with goggles 1002, either via a wired or wireless channel. Cameras 1010, 1012 (e.g., similar or identical to cameras 102, 104 of FIG. 1) can be mounted in a frame section of goggles 1002 such that they do not obscure the user's vision. Light sources 1014, 1016 can be mounted in the frame section of goggles 1002 to either side of cameras 1010, 1012. Images collected by cameras 1010, 1012 can be transmitted to base unit 1008 for analysis and interpretation as gestures indicating user interaction with the virtual or augmented environment. (In some embodiments, the virtual or augmented environment presented through eyepieces 1004, 1006 can include a representation of the user's hand, and that representation can be based on the images collected by cameras 1010, 1012.)

When the user gestures using a hand or other object in the field of view of cameras 1010, 1012, the motion is detected as described above. In this case, the background is likely to be a wall of a room the user is in, and the user will most likely be sitting or standing at some distance from the wall. As long as the user's hand is significantly closer than the wall (e.g., half the distance) to light sources 1014, 1016, the illumination-based contrast enhancement techniques described herein facilitate distinguishing object pixels from background pixels. The image analysis and subsequent interpretation as input gestures can be done within base unit 1008.

It will be appreciated that the motion-detector implementations shown in FIGS. 8-10 are illustrative and that variations and modifications are possible. For example, a motion detector or components thereof can be combined in a single housing with other user input devices, such as a keyboard or trackpad. As another example, a motion detector can be incorporated into a laptop computer, e.g., with upward-oriented cameras and light sources built into the same surface as the laptop keyboard (e.g., to one side of the keyboard or in front of or behind it) or with front-oriented cameras and light sources built into a bezel surrounding the laptop's display screen. As still another example, a wearable motion detector can be implemented, e.g., as a headband or headset that does not include active displays or optical components.

As illustrated in FIG. 11, motion information can be used as user input to control a computer system or other system according to an embodiment of the present invention. Process 1100 can be implemented, e.g., in computer systems such as those shown in FIGS. 8-10. At block 1102, images are captured using the light sources and cameras of the motion detector. As described above, capturing the images can include using the light sources to illuminate the field of view of the cameras such that objects closer to the light sources (and the cameras) are more brightly illuminated than objects farther away.

At block 1104, the captured images are analyzed to detect edges of the object based on changes in brightness. For example, as described above, this analysis can include comparing the brightness of each pixel to a threshold, detecting transitions in brightness from a low level to a high level across adjacent pixels, and/or comparing successive images captured with and without illumination by the light sources. At block 1106, an edge-based algorithm is used to determine the object's position and/or motion. This algorithm can be, for example, any of the tangent-based algorithms described in the above-referenced ′485 application; other algorithms can also be used.

At block 1108, a gesture is identified based on the object's position and/or motion. For example, a library of gestures can be defined based on the position and/or motion of a user's fingers. A “tap” can be defined based on a fast motion of an extended finger toward a display screen. A “trace” can be defined as motion of an extended finger in a plane roughly parallel to the display screen. An inward pinch can be defined as two extended fingers moving closer together and an outward pinch can be defined as two extended fingers moving farther apart. Swipe gestures can be defined based on movement of the entire hand in a particular direction (e.g., up, down, left, right) and different swipe gestures can be further defined based on the number of extended fingers (e.g., one, two, all). Other gestures can also be defined. By comparing a detected motion to the library, a particular gesture associated with detected position and/or motion can be determined.
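As one hypothetical illustration of a library entry, the pinch gestures could be recognized from the change in separation between two tracked fingertips. The input format (pairs of 3D fingertip positions in meters), the function name, and the 1 cm minimum change are all assumptions for this sketch; the disclosure does not prescribe a particular representation.

```python
import math


def classify_pinch(start_tips, end_tips, min_change_m=0.01):
    """Classify a two-finger motion as an inward pinch, an outward pinch, or neither.

    start_tips, end_tips: pairs of (x, y, z) fingertip positions in meters at
    the start and end of the motion.
    """
    start_gap = math.dist(start_tips[0], start_tips[1])
    end_gap = math.dist(end_tips[0], end_tips[1])
    change = end_gap - start_gap
    if change < -min_change_m:
        return "inward pinch"
    if change > min_change_m:
        return "outward pinch"
    return None


# Hypothetical motion: fingertips start 6 cm apart and end 2 cm apart.
gesture = classify_pinch(
    ((0.0, 0.0, 0.2), (0.06, 0.0, 0.2)),
    ((0.0, 0.0, 0.2), (0.02, 0.0, 0.2)),
)
```

Other entries in such a library (e.g., tap, trace, and swipe) could be expressed similarly as tests on fingertip or hand trajectories.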

At block 1110, the gesture is interpreted as user input, which the computer system can process. The particular processing generally depends on application programs currently executing on the computer system and how those programs are configured to respond to particular inputs. For example, a tap in a browser program can be interpreted as selecting a link toward which the finger is pointing. A tap in a word-processing program can be interpreted as placing the cursor at a position where the finger is pointing or as selecting a menu item or other graphical control element that may be visible on the screen. The particular gestures and interpretations can be determined at the level of operating systems and/or applications as desired, and no particular interpretation of any gesture is required.

Full-body motion can be captured and used for similar purposes. In such embodiments, the analysis and reconstruction advantageously occurs in approximately real-time (e.g., times comparable to human reaction times), so that the user experiences a natural interaction with the equipment. In other applications, motion capture can be used for digital rendering that is not done in real time, e.g., for computer-animated movies or the like; in such cases, the analysis can take as long as desired.

Embodiments described herein provide efficient discrimination between object and background in captured images by exploiting the decrease of light intensity with distance. By brightly illuminating the object using one or more light sources that are significantly closer to the object than to the background (e.g., by a factor of two or more), the contrast between object and background can be increased. In some instances, filters can be used to remove light from sources other than the intended sources. Using infrared light can reduce unwanted “noise” or bright spots from visible light sources likely to be present in the environment where images are being captured and can also reduce distraction to users (who presumably cannot see infrared).

The embodiments described above provide two light sources, one disposed to either side of the cameras used to capture images of the object of interest. This arrangement can be particularly useful where the position and motion analysis relies on knowledge of the object's edges as seen from each camera, as the light sources will illuminate those edges. However, other arrangements can also be used. For example, FIG. 12 illustrates a system 1200 with a single camera 1202 and two light sources 1204, 1206 disposed to either side of camera 1202. This arrangement can be used to capture images of object 1208 and shadows cast by object 1208 against a flat background region 1210. In this embodiment, object pixels and background pixels can be readily distinguished. In addition, provided that background 1210 is not too far from object 1208, there will still be enough contrast between pixels in the shadowed background region and pixels in the unshadowed background region to allow discrimination between the two. Position and motion detection algorithms using images of an object and its shadows are described in the above-referenced '485 application and system 1200 can provide input information to such algorithms, including the location of edges of the object and its shadows.
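
One way to picture the three pixel classes in system 1200 (object, unshadowed background, and shadowed background) is with two brightness cutoffs. The sketch below is illustrative only; the cutoff values, label names, and helper functions are assumptions and are not taken from the '485 application.

```python
OBJECT, BACKGROUND, SHADOW = "object", "background", "shadow"


def classify_with_shadows(image, object_cutoff, shadow_cutoff):
    """Label pixels using two hypothetical brightness cutoffs.

    Pixels above object_cutoff are treated as object, pixels below
    shadow_cutoff as shadowed background, and the rest as unshadowed
    background.
    """
    if shadow_cutoff >= object_cutoff:
        raise ValueError("shadow_cutoff must be below object_cutoff")
    labels = []
    for row in image:
        labels.append([
            OBJECT if p > object_cutoff
            else SHADOW if p < shadow_cutoff
            else BACKGROUND
            for p in row
        ])
    return labels


def edge_columns(label_row, label):
    """Return the first and last column index carrying `label`, or None."""
    cols = [i for i, value in enumerate(label_row) if value == label]
    return (cols[0], cols[-1]) if cols else None


# One hypothetical scan line of 8-bit brightness values.
row = [60, 62, 15, 14, 61, 220, 225, 63]
labels = classify_with_shadows([row], object_cutoff=150, shadow_cutoff=30)[0]
object_edges = edge_columns(labels, OBJECT)
shadow_edges = edge_columns(labels, SHADOW)
```

The edge columns found for the object and for its shadow are the kind of per-row information that could be passed on to a position and motion detection algorithm.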

The single-camera implementation 1200 may benefit from inclusion of a holographic diffraction grating 1215 placed in front of the lens of the camera 1202. The grating 1215 creates fringe patterns that appear as ghost silhouettes and/or tangents of the object 1208. Particularly when separable (i.e., when overlap is not excessive), these patterns provide high contrast facilitating discrimination of object from background. See, e.g., Diffraction Grating Handbook (Newport Corporation, Jan. 2005; available at http://gratings.newport.com/library/handbook/handbook.asp), the entire disclosure of which is hereby incorporated by reference.

FIG. 13 illustrates another system 1300 with two cameras 1302, 1304 and one light source 1306 disposed between the cameras. System 1300 can capture images of an object 1308 against a background 1310. System 1300 is generally less reliable for edge illumination than system 100 of FIG. 1; however, not all algorithms for determining position and motion rely on precise knowledge of the edges of an object. Accordingly, system 1300 can be used, e.g., with edge-based algorithms in situations where less accuracy is required. System 1300 can also be used with non-edge-based algorithms.

While the invention has been described with respect to specific embodiments, one skilled in the art will recognize that numerous modifications are possible. The number and arrangement of cameras and light sources can be varied. The cameras' capabilities, including frame rate, spatial resolution, and intensity resolution, can also be varied as desired. The light sources can be operated in continuous or pulsed mode. The systems described herein provide images with enhanced contrast between object and background to facilitate distinguishing between the two, and this information can be used for numerous purposes, of which position and/or motion detection is just one among many possibilities.

Threshold cutoffs and other specific criteria for distinguishing object from background can be adapted for particular cameras and particular environments. As noted above, contrast is expected to increase as the ratio r_B/r_O increases. In some embodiments, the system can be calibrated in a particular environment, e.g., by adjusting light-source brightness, threshold criteria, and so on. The use of simple criteria that can be implemented in fast algorithms can free up processing power in a given system for other uses.
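
A calibration step of this kind might, for example, sample pixels known to belong to the object and to the background in the target environment and place the cutoff between them. The following Python sketch shows one simple possibility, not a prescribed procedure; the midpoint rule and all names are assumptions.

```python
def calibrate_threshold(object_samples, background_samples):
    """Pick a cutoff midway between the dimmest object sample and the
    brightest background sample.

    Returns None when the two sample sets overlap, signalling that the
    light-source brightness or geometry should be adjusted first.
    """
    if not object_samples or not background_samples:
        raise ValueError("both sample sets must be non-empty")
    dimmest_object = min(object_samples)
    brightest_background = max(background_samples)
    if dimmest_object <= brightest_background:
        return None
    return (dimmest_object + brightest_background) / 2


# Hypothetical 8-bit brightness samples gathered during calibration.
threshold = calibrate_threshold([180, 200, 210], [20, 35, 60])
overlap = calibrate_threshold([90, 200], [20, 95])
```

A rule this simple costs only a minimum, a maximum, and an average per calibration pass, in keeping with the preference for criteria that can be implemented in fast algorithms.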

Any type of object can be the subject of motion capture using these techniques, and various aspects of the implementation can be optimized for a particular object. For example, the type and positions of cameras and/or light sources can be optimized based on the size of the object whose motion is to be captured and/or the space in which motion is to be captured. Analysis techniques in accordance with embodiments of the present invention can be implemented as algorithms in any suitable computer language and executed on programmable processors. Alternatively, some or all of the algorithms can be implemented in fixed-function logic circuits, and such circuits can be designed and fabricated using conventional or other tools.

Computer programs incorporating various features of the present invention may be encoded on various computer readable storage media; suitable media include magnetic disk or tape, optical storage media such as compact disk (CD) or DVD (digital versatile disk), flash memory, and any other non-transitory medium capable of holding data in a computer-readable form. Computer-readable storage media encoded with the program code may be packaged with a compatible device or provided separately from other devices. In addition, program code may be encoded and transmitted via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet, thereby allowing distribution, e.g., via Internet download.

Thus, although the invention has been described with respect to specific embodiments, it will be appreciated that the invention is intended to cover all modifications and equivalents within the scope of the following claims.

Claims

1. A method comprising: obtaining a plurality of digital images including a first digital image captured by a camera at a time when at least one light source is illuminating a field of view containing a background and a hand including at least one finger; obtaining an identification of pixels of the plurality of digital images that correspond to at least one finger of the hand that is visible in the plurality of digital images rather than to the background, the pixels being identified by: obtaining, from the plurality of digital images, a Gaussian brightness falloff pattern indicative of at least one finger of the hand; identifying an axis of the at least one finger of the hand based on the obtained Gaussian brightness falloff pattern indicative of the at least one finger without identifying edges of the at least one finger; and identifying the pixels that correspond to the at least one finger based on the identified axis; and tracking motion of the identified pixels that correspond to the at least one finger through the plurality of digital images.
2. The method of claim 1, further comprising: comparing the tracked motion to a library of gestures; identifying a particular gesture from the library of gestures that corresponds to the tracked motion; and enabling the particular gesture that corresponds to the tracked motion.
3. The method of claim 1, wherein: the method further includes constructing a model of the at least one finger based on the identified pixels that correspond to the at least one finger and the tracked motion of the identified pixels that correspond to the at least one finger; the constructing of the model includes constructing a 3D model of the at least one finger, including a position and a shape of the at least one finger, to geometrically determine whether the at least one finger corresponds to an object of interest; and the at least one light source is positioned such that objects of interest are located within a proximal zone of the field of view, the proximal zone extending from the camera to a distance less than twice an expected maximum distance between the objects of interest and the camera.
4. The method of claim 1, wherein: the constructing of the model includes constructing a 3D model of the at least one finger, including a position and a shape of the at least one finger, to geometrically determine whether the at least one finger corresponds to an object of interest; and the at least one light source is positioned such that objects of interest are located within a proximal zone of the field of view, the proximal zone extending from the camera to a distance that is half or less than a distance to the background.
5. The method of claim 1, wherein edges of the at least one finger is not visible in at least one digital image of the plurality of digital images.
6. The method of claim 1, wherein the Gaussian brightness falloff pattern is characterized by a negative second derivative.
7. The method of claim 1, wherein the Gaussian brightness falloff pattern includes an asymmetric curve.
8. The method of claim 1, wherein the identifying of the pixels that correspond to the at least one finger includes: comparing pixels in successive digital images of the plurality of digital images to distinguish pixels corresponding to the at least one finger from pixels corresponding to the background.
9. The method of claim 1, wherein the identifying of the pixels that correspond to the at least one finger includes: generating a reduced representation of each digital image of the plurality of digital images in which identified pixels corresponding to the background are zeroed out.
10. An image capture and analysis system comprising: an image analyzer configured to: obtain a plurality of digital images including a first digital image captured by a camera at a time when at least one light source is illuminating a field of view containing a background and a hand including at least one finger; obtain an identification of pixels of the plurality of digital images that correspond to at least one finger of the hand that is visible in the plurality of digital images rather than to the background, the image analyzer identifying the pixels by: obtaining, from the plurality of digital images, a Gaussian brightness falloff pattern indicative of at least one finger of the hand that is visible; identifying an axis of the at least one finger of the hand based on the obtained Gaussian brightness falloff pattern indicative of the at least one finger without identifying edges of the at least one finger; and identifying the pixels that correspond to the at least one finger based on the identified axis of the at least one fingers; and track motion of the identified pixels that correspond to the at least one finger through the plurality of digital images.
11. The image capture and analysis system of claim 10, wherein image analyzer is further configured to: compare the tracked motion to a library of gestures; identify a particular gesture from the library of gestures that corresponds to the tracked motion; and enable the particular gesture that corresponds to the tracked motion.
12. The image capture and analysis system of claim 10, wherein: the image analyzer is further configured to construct a model of the at least one finger based on the identified pixels that correspond to the at least one finger and the tracked motion of the identified pixels that correspond to the at least one finger; the image analyzer constructs the model by constructing a 3D model of the at least one finger, including a position and a shape of the at least one finger, to geometrically determine whether the at least one finger correspond to an object of interest; and the at least one light source is positioned such that objects of interest are located within a proximal zone of the field of view, the proximal zone extending from the camera to a distance less than twice an expected maximum distance between the objects of interest and the camera.
13. The image capture and analysis system of claim 10, wherein: the image analyzer constructs the model by constructing a 3D model of the at least one finger, including a position and a shape of the at least one finger, to geometrically determine whether the at least one finger corresponds to an object of interest; and the at least one light source is positioned such that objects of interest are located within a proximal zone of the field of view, the proximal zone extending from the camera to a distance that is half or less than a distance to the background.
14. The image capture and analysis system of claim 10, wherein edges of the at least one finger is not visible in at least one digital image of the plurality of digital images.
15. The image capture and analysis system of claim 10, wherein the Gaussian brightness falloff pattern is characterized by a negative second derivative.
16. The image capture and analysis system of claim 10, wherein the Gaussian brightness falloff pattern includes an asymmetric curve.
17. The image capture and analysis system of claim 10, wherein the image analyzer identifies the pixels that correspond to the at least one fingers by: comparing pixels in successive digital images of the plurality of digital images to distinguish pixels corresponding to the at least one finger from pixels corresponding to the background.
18. The image capture and analysis system of claim 10, wherein the image analyzer identifies the pixels that correspond to the at least one finger by: generating a reduced representation of each digital image of the plurality of digital images in which identified pixels corresponding to the background are zeroed out.
19. A wearable goggle, comprising: a processor configured to: obtain a plurality of digital images including a first digital image captured by a camera at a time when at least one light source is illuminating a field of view containing a background and a hand including at least one finger; obtain an identification of pixels of the plurality of digital images that correspond to at least one finger of the hand that is visible in the plurality of digital images rather than to the background, the processor identifying the pixels by: obtaining, from the plurality of digital images, a Gaussian brightness falloff pattern indicative of at least one finger of the hand that is visible; identifying an axis of the at least one finger of the hand based on the obtained Gaussian brightness falloff pattern indicative of the at least one finger without identifying edges of the at least one finger; and identifying the pixels that correspond to the at least one finger based on the identified axis of the at least one fingers; and track motion of the identified pixels that correspond to the at least one finger through the plurality of digital images.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/916,034, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING BASED ON DIFFERENCES BETWEEN IMAGES”, filed Jun. 29, 2020, which is a continuation of U.S. application Ser. No. 16/525,475, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING BASED ON DIFFERENCES BETWEEN IMAGES”, filed Jul. 29, 2019, which is a continuation of U.S. application Ser. No. 15/937,717, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING BASED ON DIFFERENCES BETWEEN IMAGES”, filed Mar. 27, 2018, issued Jul. 30, 2019 as U.S. Pat. No. 10,366,308, which is a continuation of U.S. patent application Ser. No. 15/586,048, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING BASED ON DIFFERENCES BETWEEN IMAGES”, filed May 3, 2017 and issued as U.S. Pat. No. 9,934,580 on Apr. 3, 2018, which is a continuation of Ser. No. 15/349,864, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING BASED ON DIFFERENCES BETWEEN IMAGES”, filed 11 Nov. 2016, by David Holz and Hua Yang and issued as U.S. Pat. No. 9,652,668 on May 16, 2017, which is a continuation of U.S. patent application Ser. No. 14/959,891, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING BASED ON DIFFERENCES BETWEEN IMAGES”, filed 4 Dec. 2015, by David Holz and Hua Yang and issued as U.S. Pat. No. 9,672,441 on Jun. 6, 2017, which is a continuation of U.S. patent application Ser. No. 14/106,148, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING”, filed 13 Dec. 2013, by David Holz and Hua Yang and issued as U.S. Pat. No. 9,626,591 on 18 Apr. 2017, which is a continuation of U.S. patent application Ser. No. 13/742,845, titled “ENHANCED CONTRAST FOR OBJECT DETECTION AND CHARACTERIZATION BY OPTICAL IMAGING”, filed 16 Jan. 2013, by David Holz and Hua Yang, now U.S. Pat. No. 8,693,731, issued 8 Apr. 2014, which is a continuation-in-part of U.S. patent application Ser. No. 13/724,357, titled “SYSTEMS AND METHODS FOR CAPTURING MOTION IN THREE-DIMENSIONAL SPACE”, filed 21 Dec. 2012, by David Holz, now U.S. Pat. No. 9,070,019, issued 30 Jun. 2015, and is a continuation-in-part of U.S. Ser. No. 13/414,485, titled “MOTION CAPTURE USING CROSS-SECTIONS OF AN OBJECT”, filed 7 Mar. 2012, by David Holz. Said U.S. Ser. No. 13/742,845 claims priority to U.S. Provisional Patent Application No. 61/587,554, titled “METHODS AND SYSTEMS FOR IDENTIFYING POSITION AND SHAPE OF OBJECTS IN THREE-DIMENSIONAL SPACE”, filed 17 Jan. 2012, by David Holz, to U.S. Provisional Patent Application No. 61/724,091, titled “SYSTEMS AND METHODS FOR CAPTURING MOTION IN THREE-DIMENSIONAL SPACE”, filed 8 Nov. 2012, by David Holz and to U.S. Provisional Patent Application No. 61/724,068, titled “Enhanced Contrast for Object Detection and Characterization by Optical Imaging”, filed 8 Nov. 2012, by David Holz. Said U.S. Ser. No. 13/724,357 is a continuation-in-part of U.S. patent application Ser. No. 13/414,485, and also claims priority to U.S. Provisional Patent Application No. 61/724,091, and to U.S. Provisional Patent Application No. 61/587,554. Said U.S. Ser. No. 13/414,485 claims priority to U.S. Provisional Patent Application No. 61/587,554. Each of the priority applications is incorporated herein by reference in its entirety.

US Referenced Citations (335)

Number	Name	Date	Kind
5109435	Lo et al.	Apr 1992	A
5134661	Reinsch	Jul 1992	A
5282067	Liu	Jan 1994	A
5454043	Freeman	Sep 1995	A
5538013	Brannon	Jul 1996	A
5574511	Yang et al.	Nov 1996	A
5581276	Cipolla et al.	Dec 1996	A
5594469	Freeman et al.	Jan 1997	A
5610674	Martin	Mar 1997	A
5691737	Ito et al.	Nov 1997	A
5734590	Tebbe	Mar 1998	A
5739797	Karasawa et al.	Apr 1998	A
5742263	Wang et al.	Apr 1998	A
5883969	Le Gouzouguec et al.	Mar 1999	A
5900863	Numazaki	May 1999	A
5940538	Spiegel et al.	Aug 1999	A
5983909	Yeol et al.	Nov 1999	A
5995770	Rochford et al.	Nov 1999	A
6002808	Freeman	Dec 1999	A
6031568	Wakitani	Feb 2000	A
6147678	Kumar et al.	Nov 2000	A
6154558	Hsieh	Nov 2000	A
6181343	Lyons	Jan 2001	B1
6195104	Lyons	Feb 2001	B1
6204852	Kumar et al.	Mar 2001	B1
6252598	Segen	Jun 2001	B1
6263091	Jain et al.	Jul 2001	B1
6298143	Kikuchi et al.	Oct 2001	B1
6346933	Lin	Feb 2002	B1
6417970	Travers et al.	Jul 2002	B1
6492986	Metaxas et al.	Dec 2002	B1
6493041	Hanko et al.	Dec 2002	B1
6578203	Anderson, Jr. et al.	Jun 2003	B1
6602475	Chiao	Aug 2003	B1
6603867	Sugino et al.	Aug 2003	B1
6661918	Gordon et al.	Dec 2003	B1
6674877	Jojic et al.	Jan 2004	B1
6702494	Dumler et al.	Mar 2004	B2
6738424	Allmen et al.	May 2004	B1
6758215	Begum	Jul 2004	B2
6798628	Macbeth	Sep 2004	B1
6804656	Rosenfeld et al.	Oct 2004	B1
6819796	Hong et al.	Nov 2004	B2
6919880	Morrison et al.	Jul 2005	B2
6931146	Aoki et al.	Aug 2005	B2
6950534	Cohen et al.	Sep 2005	B2
6993157	Oue et al.	Jan 2006	B1
7149356	Clark et al.	Dec 2006	B2
7215828	Luo	May 2007	B2
7244233	Krantz et al.	Jul 2007	B2
7257237	Luck et al.	Aug 2007	B1
7308112	Fujimura et al.	Dec 2007	B2
7340077	Gokturk et al.	Mar 2008	B2
7483049	Aman et al.	Jan 2009	B2
7519223	Dehlin et al.	Apr 2009	B2
7532206	Morrison et al.	May 2009	B2
7536032	Bell	May 2009	B2
7542586	Johnson	Jun 2009	B2
7598942	Underkoffler et al.	Oct 2009	B2
7606417	Steinberg et al.	Oct 2009	B2
7646372	Marks et al.	Jan 2010	B2
7656372	Sato et al.	Feb 2010	B2
7665041	Wilson et al.	Feb 2010	B2
7692625	Morrison et al.	Apr 2010	B2
7831932	Josephsoon et al.	Nov 2010	B2
7840031	Albertson et al.	Nov 2010	B2
7861188	Josephsoon et al.	Dec 2010	B2
7940885	Stanton et al.	May 2011	B2
7961934	Thrun et al.	Jun 2011	B2
7971156	Albertson et al.	Jun 2011	B2
8023698	Niwa et al.	Sep 2011	B2
8045825	Shimoyama et al.	Oct 2011	B2
8059153	Barreto et al.	Nov 2011	B1
8059894	Flagg et al.	Nov 2011	B1
8085339	Marks	Dec 2011	B2
8086971	Radivojevic et al.	Dec 2011	B2
8107687	Gold, Jr.	Jan 2012	B2
8111239	Pryor et al.	Feb 2012	B2
8112719	Hsu et al.	Feb 2012	B2
8116527	Sabol et al.	Feb 2012	B2
8159536	Wang et al.	Apr 2012	B2
8180114	Nishihara et al.	May 2012	B2
8185176	Mangat et al.	May 2012	B2
8244233	Chang et al.	Aug 2012	B2
8270669	Aichi et al.	Sep 2012	B2
8290208	Kurtz et al.	Oct 2012	B2
8319832	Nagata et al.	Nov 2012	B2
8325993	Dinerstein et al.	Dec 2012	B2
8363010	Nagata	Jan 2013	B2
8395600	Kawashima et al.	Mar 2013	B2
8514221	King et al.	Aug 2013	B2
8553037	Smith et al.	Oct 2013	B2
8567395	Savona et al.	Oct 2013	B2
8582809	Halimeh et al.	Nov 2013	B2
8593417	Kawashima et al.	Nov 2013	B2
8659594	Kim et al.	Feb 2014	B2
8693731	Holz et al.	Apr 2014	B2
8724906	Shotton et al.	May 2014	B2
8738523	Sanchez et al.	May 2014	B1
8872914	Gobush	Oct 2014	B2
8878749	Wu et al.	Nov 2014	B1
8879835	Krishnaswamy et al.	Nov 2014	B2
8891868	Ivanchenko	Nov 2014	B1
8907982	Zontrop et al.	Dec 2014	B2
8929609	Padovani et al.	Jan 2015	B2
8930852	Chen et al.	Jan 2015	B2
8954340	Sanchez et al.	Feb 2015	B2
8957857	Lee et al.	Feb 2015	B2
9014414	Katano et al.	Apr 2015	B2
9056396	Linnell	Jun 2015	B1
9063574	Ivanchenko	Jun 2015	B1
9076257	Sharma et al.	Jul 2015	B2
9122354	Sharma	Sep 2015	B2
9123176	Lu et al.	Sep 2015	B2
9124778	Crabtree	Sep 2015	B1
9153028	Holz	Oct 2015	B2
9285893	Holz	Mar 2016	B2
9317924	Aratani et al.	Apr 2016	B2
9330313	Jung et al.	May 2016	B2
9392196	Holz	Jul 2016	B2
9436288	Holz	Sep 2016	B2
9436998	Holz	Sep 2016	B2
9495613	Holz et al.	Nov 2016	B2
9501152	Bedikian et al.	Nov 2016	B2
9626591	Holz et al.	Apr 2017	B2
9646201	Horowitz	May 2017	B1
9652668	Holz et al.	May 2017	B2
9672441	Holz et al.	Jun 2017	B2
9767370	Lo et al.	Sep 2017	B1
9934580	Holz et al.	Apr 2018	B2
10007350	Holz et al.	Jun 2018	B1
10210382	Shotton et al.	Feb 2019	B2
10228242	Abovitz et al.	Mar 2019	B2
10346685	Ding et al.	Jul 2019	B2
10366308	Holz et al.	Jul 2019	B2
10395385	Zhou et al.	Aug 2019	B2
10445593	Mathiesen et al.	Oct 2019	B1
10445881	Spizhevoy et al.	Oct 2019	B2
10607413	Marcolina et al.	Mar 2020	B1
10656720	Holz	May 2020	B1
10699155	Holz et al.	Jun 2020	B2
11178384	Nakamura et al.	Nov 2021	B2
20010044858	Rekimoto	Nov 2001	A1
20020008211	Kask	Jan 2002	A1
20020080094	Biocca et al.	Jun 2002	A1
20020105484	Navab et al.	Aug 2002	A1
20030053658	Pavlidis	Mar 2003	A1
20030053659	Pavlidis et al.	Mar 2003	A1
20030123703	Pavlidis et al.	Jul 2003	A1
20030152289	Luo	Aug 2003	A1
20030202697	Simard et al.	Oct 2003	A1
20040103111	Miller et al.	May 2004	A1
20040125228	Dougherty	Jul 2004	A1
20040125984	Ito et al.	Jul 2004	A1
20040145809	Brenner	Jul 2004	A1
20040155877	Hong et al.	Aug 2004	A1
20040212725	Raskar	Oct 2004	A1
20050007673	Chaoulov et al.	Jan 2005	A1
20050094019	Grosvenor et al.	May 2005	A1
20050131607	Breed	Jun 2005	A1
20050168578	Gobush	Aug 2005	A1
20050236558	Nabeshima et al.	Oct 2005	A1
20060017807	Lee et al.	Jan 2006	A1
20060029296	King et al.	Feb 2006	A1
20060034545	Mattes et al.	Feb 2006	A1
20060050979	Kawahara	Mar 2006	A1
20060072105	Wagner	Apr 2006	A1
20060098899	King et al.	May 2006	A1
20060204040	Freeman et al.	Sep 2006	A1
20060290950	Platt et al.	Dec 2006	A1
20070042346	Weller	Feb 2007	A1
20070086621	Aggarwal et al.	Apr 2007	A1
20070130547	Boillot	Jun 2007	A1
20070206719	Suryanarayanan et al.	Sep 2007	A1
20070230929	Niwa et al.	Oct 2007	A1
20070238956	Haras et al.	Oct 2007	A1
20070268316	Kajita et al.	Nov 2007	A1
20080019576	Senftner et al.	Jan 2008	A1
20080030429	Hailpern et al.	Feb 2008	A1
20080031492	Lanz	Feb 2008	A1
20080056752	Denton et al.	Mar 2008	A1
20080064954	Adams et al.	Mar 2008	A1
20080106746	Shpunt et al.	May 2008	A1
20080126937	Pachet	May 2008	A1
20080175507	Lookingbill	Jul 2008	A1
20080187175	Kim et al.	Aug 2008	A1
20080246759	Summers	Oct 2008	A1
20080247462	Demos	Oct 2008	A1
20080273764	Scholl	Nov 2008	A1
20080278589	Thorn	Nov 2008	A1
20080304740	Sun et al.	Dec 2008	A1
20080319356	Cain et al.	Dec 2008	A1
20090002489	Yang et al.	Jan 2009	A1
20090102840	Li	Apr 2009	A1
20090103780	Nishihara et al.	Apr 2009	A1
20090122146	Zalewski et al.	May 2009	A1
20090153655	Ike et al.	Jun 2009	A1
20090203993	Mangat et al.	Aug 2009	A1
20090203994	Mangat et al.	Aug 2009	A1
20090217211	Hildreth et al.	Aug 2009	A1
20090257623	Tang et al.	Oct 2009	A1
20090274339	Cohen et al.	Nov 2009	A9
20090309710	Kakinami	Dec 2009	A1
20100013832	Xiao et al.	Jan 2010	A1
20100014781	Liu et al.	Jan 2010	A1
20100026963	Faulstich	Feb 2010	A1
20100027845	Kim et al.	Feb 2010	A1
20100046842	Conwell	Feb 2010	A1
20100053164	Imai et al.	Mar 2010	A1
20100053209	Rauch et al.	Mar 2010	A1
20100058252	Ko	Mar 2010	A1
20100066737	Liu	Mar 2010	A1
20100091110	Hildreth	Apr 2010	A1
20100118123	Freedman et al.	May 2010	A1
20100121189	Ma et al.	May 2010	A1
20100125815	Wang et al.	May 2010	A1
20100158372	Kim et al.	Jun 2010	A1
20100177929	Kurtz et al.	Jul 2010	A1
20100194863	Lopes et al.	Aug 2010	A1
20100199230	Latta et al.	Aug 2010	A1
20100201880	Iwamura	Aug 2010	A1
20100208942	Porter et al.	Aug 2010	A1
20100219934	Matsumoto	Sep 2010	A1
20100222102	Rodriguez	Sep 2010	A1
20100277411	Yee et al.	Nov 2010	A1
20100296698	Lien et al.	Nov 2010	A1
20100302357	Hsu et al.	Dec 2010	A1
20100306712	Snook et al.	Dec 2010	A1
20100309097	Raviv et al.	Dec 2010	A1
20110007072	Khan et al.	Jan 2011	A1
20110025818	Gallmeier et al.	Feb 2011	A1
20110026765	Ivanich et al.	Feb 2011	A1
20110057875	Shigeta et al.	Mar 2011	A1
20110080470	Kuno et al.	Apr 2011	A1
20110093820	Zhang et al.	Apr 2011	A1
20110107216	Bi	May 2011	A1
20110115486	Frohlich et al.	May 2011	A1
20110116684	Coffman et al.	May 2011	A1
20110134112	Koh et al.	Jun 2011	A1
20110148875	Kim et al.	Jun 2011	A1
20110169726	Holmdahl et al.	Jul 2011	A1
20110173574	Clavin et al.	Jul 2011	A1
20110181509	Rautiainen et al.	Jul 2011	A1
20110193778	Lee et al.	Aug 2011	A1
20110205151	Newton et al.	Aug 2011	A1
20110213664	Osterhout et al.	Sep 2011	A1
20110228978	Chen et al.	Sep 2011	A1
20110234840	Klefenz et al.	Sep 2011	A1
20110243451	Oyaizu	Oct 2011	A1
20110267259	Tidemand et al.	Nov 2011	A1
20110267344	Germann et al.	Nov 2011	A1
20110286676	El Dokor	Nov 2011	A1
20110289455	Reville et al.	Nov 2011	A1
20110289456	Reville et al.	Nov 2011	A1
20110291925	Israel et al.	Dec 2011	A1
20110291988	Bamji et al.	Dec 2011	A1
20110296353	Ahmed et al.	Dec 2011	A1
20110299737	Wang et al.	Dec 2011	A1
20110304650	Campillo et al.	Dec 2011	A1
20110310007	Margolis et al.	Dec 2011	A1
20120038637	Marks	Feb 2012	A1
20120050157	Latta et al.	Mar 2012	A1
20120065499	Chono	Mar 2012	A1
20120068914	Jacobsen et al.	Mar 2012	A1
20120113255	Kasuya et al.	May 2012	A1
20120159380	Kocienda et al.	Jun 2012	A1
20120194517	Izadi et al.	Aug 2012	A1
20120250936	Holmgren	Oct 2012	A1
20120270654	Padovani et al.	Oct 2012	A1
20120281873	Brown et al.	Nov 2012	A1
20120314030	Datta et al.	Dec 2012	A1
20130038694	Nichani et al.	Feb 2013	A1
20130044951	Cherng et al.	Feb 2013	A1
20130097566	Berglund	Apr 2013	A1
20130108109	Leuck et al.	May 2013	A1
20130182077	Holz	Jul 2013	A1
20130187952	Berkovich et al.	Jul 2013	A1
20130208948	Berkovich et al.	Aug 2013	A1
20130229508	Li et al.	Sep 2013	A1
20130239059	Chen et al.	Sep 2013	A1
20130252691	Alexopoulos	Sep 2013	A1
20140010441	Shamaie	Jan 2014	A1
20140028861	Holz	Jan 2014	A1
20140064566	Shreve et al.	Mar 2014	A1
20140081521	Frojdh et al.	Mar 2014	A1
20140085203	Kobayashi	Mar 2014	A1
20140125813	Holz	May 2014	A1
20140139425	Sakai	May 2014	A1
20140139641	Holz	May 2014	A1
20140168062	Katz et al.	Jun 2014	A1
20140176420	Zhou et al.	Jun 2014	A1
20140177913	Holz	Jun 2014	A1
20140192024	Holz	Jul 2014	A1
20140222385	Muenster et al.	Aug 2014	A1
20140225826	Juni	Aug 2014	A1
20140253512	Narikawa et al.	Sep 2014	A1
20140253785	Chan et al.	Sep 2014	A1
20140258886	Strong	Sep 2014	A1
20140267098	Na et al.	Sep 2014	A1
20140282224	Pedley	Sep 2014	A1
20140282274	Everitt et al.	Sep 2014	A1
20140282282	Holz	Sep 2014	A1
20140307920	Holz	Oct 2014	A1
20140364212	Osman et al.	Dec 2014	A1
20140375547	Katz et al.	Dec 2014	A1
20140376773	Holz	Dec 2014	A1
20150003673	Fletcher	Jan 2015	A1
20150009149	Gharib et al.	Jan 2015	A1
20150022447	Hare et al.	Jan 2015	A1
20150029091	Nakashima et al.	Jan 2015	A1
20150115802	Kuti et al.	Apr 2015	A1
20150159992	Buckland	Jun 2015	A1
20150172539	Neglur	Jun 2015	A1
20150198716	Romano	Jul 2015	A1
20150206320	Itani et al.	Jul 2015	A1
20150253428	Holz	Sep 2015	A1
20150261291	Mikhailov et al.	Sep 2015	A1
20150304593	Sakai	Oct 2015	A1
20150323785	Fukata et al.	Nov 2015	A1
20150363001	Malzbender	Dec 2015	A1
20160012643	Kezele et al.	Jan 2016	A1
20160062573	Dascola et al.	Mar 2016	A1
20160086046	Holz et al.	Mar 2016	A1
20160086055	Holz et al.	Mar 2016	A1
20160147376	Kim et al.	May 2016	A1
20160323564	Pacheco et al.	Nov 2016	A1
20160378294	Wright et al.	Dec 2016	A1
20170124928	Edwin et al.	May 2017	A1
20180276846	Mostafavi	Sep 2018	A1
20180285923	Fateh	Oct 2018	A1
20190012794	Radwin et al.	Jan 2019	A1
20190019303	Siver et al.	Jan 2019	A1
20190116322	Holzer et al.	Apr 2019	A1
20200019766	Choi et al.	Jan 2020	A1
20200053277	Shin et al.	Feb 2020	A1

Foreign Referenced Citations (49)

Number	Date	Country
1984236	Jun 2007	CN
201332447	Oct 2009	CN
101729808	Jun 2010	CN
101930610	Dec 2010	CN
101951474	Jan 2011	CN
102053702	May 2011	CN
201859393	Jun 2011	CN
102201121	Sep 2011	CN
102236412	Nov 2011	CN
4201934	Jul 1993	DE
10326035	Jan 2005	DE
102007015495	Oct 2007	DE
102007015497	Jan 2014	DE
0999542	May 2000	EP
1837665	Sep 2007	EP
2519418	Apr 2015	GB
H02236407	Sep 1990	JP
H08261721	Oct 1996	JP
H09259278	Oct 1997	JP
2000023038	Jan 2000	JP
2002133400	May 2002	JP
2003256814	Sep 2003	JP
2004246252	Sep 2004	JP
2006259829	Sep 2006	JP
2007272596	Oct 2007	JP
2008227569	Sep 2008	JP
2009037594	Feb 2009	JP
2011010258	Jan 2011	JP
2011065652	Mar 2011	JP
2011248376	Dec 2011	JP
4906960	Mar 2012	JP
101092909	Jun 2011	KR
2422878	Jun 2011	RU
200844871	Nov 2008	TW
9426057	Nov 1994	WO
2004114220	Dec 2004	WO
2006020846	Feb 2006	WO
2007137093	Nov 2007	WO
2010032268	Mar 2010	WO
2010076622	Jul 2010	WO
2011024193	Mar 2011	WO
2011036618	Mar 2011	WO
2011044680	Apr 2011	WO
2011045789	Apr 2011	WO
2011119154	Sep 2011	WO
2012027422	Mar 2012	WO
2013109608	Jul 2013	WO
2013109609	Jul 2013	WO
2014208087	Dec 2014	WO

Non-Patent Literature Citations (76)

Entry
Dombeck, D., et al., “Optical Recording of Action Potentials with Second-Harmonic Generation Microscopy,” The Journal of Neuroscience, Jan. 28, 2004, vol. 24(4): pp. 999-1003.
Davis et al., “Toward 3-D Gesture Recognition”, International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, No. 03, 1999, pp. 381-393.
Butail, S., et al., “Three-Dimensional Reconstruction of the Fast-Start Swimming Kinematics of Densely Schooling Fish,” Journal of the Royal Society Interface, Jun. 3, 2011, retrieved from the Internet <http://www.ncbi.nlm.nih.gov/pubmed/21642367>, pp. 0, 1-12.
Kim, et al., “Development of an Orthogonal Double-Image Processing Algorithm to Measure Bubble,” Department of Nuclear Engineering and Technology, Seoul National University Korea, vol. 39 No. 4, Published Jul. 6, 2007, pp. 313-326.
Barat et al., “Feature Correspondences From Multiple Views of Coplanar Ellipses”, 2nd International Symposium on Visual Computing, Author Manuscript, 2006, 10 pages.
Cheikh et al., “Multipeople Tracking Across Multiple Cameras”, International Journal on New Computer Architectures and Their Applications (IJNCAA), vol. 2, No. 1, 2012, pp. 23-33.
Kulesza, et al., “Arrangement of a Multi Stereo Visual Sensor System for a Human Activities Space,” Source: Stereo Vision, Book edited by: Dr. Asim Bhatti, ISBN 978-953-7619-22-0, Copyright Nov. 2008, I-Tech, Vienna, Austria, www.intechopen.com, pp. 153-173.
Heikkila, J., “Accurate Camera Calibration and Feature Based 3-D Reconstruction from Monocular Image Sequences”, Infotech Oulu and Department of Electrical Engineering, University of Oulu, 1997, 126 pages.
Olsson, K., et al., “Shape from Silhouette Scanner—Creating a Digital 3D Model of a Real Object by Analyzing Photos From Multiple Views,” University of Linkoping, Sweden, Copyright VCG 2001, Retrieved from the Internet: <http://liu.diva-portal.org/smash/get/diva2:18671/FULLTEXT01> on Jun. 17, 2013, 52 pages.
Forbes, K., et al., “Using Silhouette Consistency Constraints to Build 3D Models,” University of Cape Town, Copyright De Beers 2003, Retrieved from the internet: <http://www.dip.ee.uct.ac.za/~kforbes/Publications/Forbes2003Prasa.pdf> on Jun. 17, 2013, 6 pages.
May, S., et al., “Robust 3D-Mapping with Time-of-Flight Cameras,” 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, Piscataway, NJ, USA, Oct. 10, 2009, pp. 1673-1678.
Kanhangad, V., et al., “A Unified Framework for Contactless Hand Verification,” IEEE Transactions on Information Forensics and Security, IEEE, Piscataway, NJ, US., vol. 6, No. 3, Sep. 1, 2011, pp. 1014-1027.
Di Zenzo, S., et al., “Advances in Image Segmentation,” Image and Vision Computing, Elsevier, Guildford, GBN, vol. 1, No. 1, Copyright Butterworth & Co Ltd., Nov. 1, 1983, pp. 196-210.
Arthington, et al., “Cross-section Reconstruction During Uniaxial Loading,” Measurement Science and Technology, vol. 20, No. 7, Jun. 10, 2009, Retrieved from the Internet: http://iopscience.iop.org/0957-0233/20/7/075701, pp. 1-9.
Pedersini, et al., Accurate Surface Reconstruction from Apparent Contours, Sep. 5-8, 2000 European Signal Processing Conference EUSIPCO 2000, vol. 4, Retrieved from the Internet: http://home.deib.polimi.it/sarti/CV_and_publications.html, pp. 1-4.
Chung, et al., “International Journal of Computer Vision: Recovering LSHGCs and SHGCs from Stereo” [on-line], Oct. 1996 [retrieved on Apr. 10, 2014], Kluwer Academic Publishers, vol. 20, issue 1-2, Retrieved from the Internet: http://link.springer.com/article/10.1007/BF00144116#, pp. 43-58.
Bardinet, et al., “Fitting of iso-Surfaces Using Superquadrics and Free-Form Deformations” [on-line], Jun. 24-25, 1994 [retrieved Jan. 9, 2014], 1994 Proceedings of IEEE Workshop on Biomedical Image Analysis, Retrieved from the Internet: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=315882&tag=1, pp. 184-193.
U.S. Appl. No. 13/742,845—Office Action dated Jul. 22, 2013, 21 pages.
U.S. Appl. No. 13/742,845—Notice of Allowance dated Dec. 5, 2013, 11 pages.
PCT/US2013/021713—International Search Report and Written Opinion dated Sep. 11, 2013, 7 pages.
U.S. Appl. No. 14/106,148—Office Action dated Jul. 6, 2015, 14 pages.
Texas Instruments, “QVGA 3D Time-of-Flight Sensor,” Product Overview: OPT 8140, Dec. 2013, Texas Instruments Incorporated, 10 pages.
Texas Instruments, “4-Channel, 12-Bit, 80-MSPS ADC,” VSP5324, Revised Nov. 2012, Texas Instruments Incorporated, 55 pages.
Texas Instruments, “Time-of-Flight Controller (TFC),” Product Overview; OPT9220, Jan. 2014, Texas Instruments Incorporated, 43 pages.
U.S. Appl. No. 14/106,148—Notice of Allowance dated Dec. 2, 2015, 41 pages.
CN 2013800122765—Office Action dated Nov. 2, 2015, 17 pages.
U.S. Appl. No. 14/959,880—Notice of Allowance dated Mar. 2, 2016, 12 pages.
Matsuyama et al. “Real-Time Dynamic 3-D Object Shape Reconstruction and High-Fidelity Texture Mapping for 3-D Video,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, No. 3, Mar. 2004, pp. 357-369.
Fukui et al. “Multiple Object Tracking System with Three Level Continuous Processes” IEEE, 1992, pp. 19-27.
Mendez, et al., “Importance Masks for Revealing Occluded Objects in Augmented Reality,” Proceedings of the 16th ACM Symposium on Virtual Reality Software and Technology, 2 pages, ACM, 2009.
U.S. Appl. No. 14/959,891—Office Action dated Apr. 11, 2016, 8 pages.
U.S. Appl. No. 14/106,148—Response to Office Action dated Jul. 6, 2015 filed Nov. 6, 2015, 41 pages.
U.S. Appl. No. 14/959,880—Notice of Allowance dated Jul. 12, 2016, 8 pages.
U.S. Appl. No. 14/106,148—Notice of Allowance dated Jul. 20, 2016, 30 pages.
U.S. Appl. No. 14/959,891—Notice of Allowance dated Jul. 28, 2016, 19 pages.
JP 2014-552391—First Office Action dated Dec. 9, 2014, 6 pages.
U.S. Appl. No. 13/742,845—Response to Office Action dated Jul. 22, 2013 filed Sep. 26, 2013, 7 pages.
U.S. Appl. No. 14/959,891—Response to Office Action dated Apr. 11, 2016 filed Jun. 8, 2016, 25 pages.
JP 2014-552391—Second Office Action dated Jul. 7, 2015, 7 pages.
JP 2014-552391—Third Office Action dated Jan. 26, 2016, 5 pages.
CN 2013800122765—Second Office Action dated Jul. 27, 2016, 6 pages.
Zhang et al., A Wearable Goggle Navigation System for Dual-Mode Optical and Ultrasound Localization of Suspicious Lesions: Validation Studies Using Tissue-Simulating Phantoms and an Ex Vivo Human Breast Tissue Model, PLOS One, dated Jul. 1, 2016, 16 pages.
U.S. Appl. No. 16/525,475—Notice of Allowance, dated Feb. 26, 2020, 14 pages.
Rasmussen, Matihew K., “An Analytical Framework for the Preparation and Animation of a Virtual Mannequin for the Purpose of Mannequin-Clothing Interaction Modeling”, A Thesis Submitted in Partial Fulfillment of the Requirements for the Master of Science Degree in Civil and Environmental Engineering in the Graduate College of the University of Iowa, Dec. 2008, 98 pages.
Zenzo et al., “Advantages in Image Segmentation,” Image and Vision Computing, Elsevier Guildford, GB, Nov. 1, 1983, pp. 196-210.
VCNL4020 Vishay Semiconductors. Datasheet [online]. Vishay Intertechnology, Inc, Doc No. 83476, Rev. 1.3, Oct. 29, 2013 [retrieved Mar. 4, 2014]. Retrieved from the Internet: <www.vishay.com>. 16 pages.
VCNL4020 Vishay Semiconductors. Application Note [online]. Designing VCNL4020 into an Application. Vishay Intertechnology, Inc, Doc No. 84136, Revised May 22, 2012 [retrieved Mar. 4, 2014]. Retrieved from the Internet: <www.vishay.com>. 21 pages.
Schaar, R., VCNL4020 Vishay Semiconductors. Application Note [online]. Extended Detection Range with VCNL Family of Proximity Sensor Vishay Intertechnology, Inc, Doc No. 84225, Revised Oct. 25, 2013 [retrieved Mar. 4, 2014]. Retrieved from the Internet: <www.vishay.com>. 4 pages.
Cumani, A., et al., “Recovering the 3D Structure of Tubular Objects from Stereo Silhouettes,” Pattern Recognition, Elsevier, GB, vol. 30, No. 7, Jul. 1, 1997, 9 pages.
PCT/US2013/021713—International Preliminary Report on Patentability dated Jul. 22, 2014, 13 pages, (WO 2013/109609).
Ballan et al., “Lecture Notes Computer Science: 12th European Conference on Computer Vision: Motion Capture of Hands in Action Using Discriminative Salient Points”, Oct. 7-13, 2012 [retrieved Jul. 14, 2016], Springer Berlin Heidelberg, vol. 7577, pp. 640-653. Retrieved from the Internet: <http://link.springer.com/chapter/10.1007/978-3-642-33783-3_46>.
Cui et al., “Applications of Evolutionary Computing: Vision-Based Hand Motion Capture Using Genetic Algorithm”, 2004 [retrieved Jul. 15, 2016], Springer Berlin Heidelberg, vol. 3005 of LNCS, pp. 289-300. Retrieved from the Internet: <http://link.springer.com/chapter/10.1007/978-3-540-24653-4_30>.
Delamarre et al., “Finding Pose of Hand in Video Images: A Stereo-based Approach”, Apr. 14-16, 1998 [retrieved Jul. 15, 2016], Third IEEE Intern Conf on Auto Face and Gesture Recog, pp. 585-590. Retrieved from the Internet: <http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=671011&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D671011>.
Gorge et al., “Model-Based 3D Hand Pose Estimation from Monocular Video”, Feb. 24, 2011 [retrieved Jul. 15, 2016], IEEE Transac Pattern Analysis and Machine Intell, vol. 33, Issue: 9, pp. 1793-1805, Retrieved from the Internet: <http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5719617&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5719617>.
Guo et al., Featured Wand for 3D Interaction, Jul. 2-5, 2007 [retrieved Jul. 15, 2016], 2007 IEEE International Conference on Multimedia and Expo, pp. 2230-2233. Retrieved from the Internet: <http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4285129&tag=1&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4285129%26tag%3D1>.
Melax et al., “Dynamics Based 3D Skeletal Hand Tracking”, May 29, 2013 [retrieved Jul. 14, 2016], Proceedings of Graphics Interface, 2013, pp. 63-70. Retrieved from the Internet: <http://dl.acm.org/citation.cfm?id=2532141>.
Oka et al., “Real-Time Fingertip Tracking and Gesture Recognition”, Nov./Dec. 2002 [retrieved Jul. 15, 2016], IEEE Computer Graphics and Applications, vol. 22, Issue: 6, pp. 64-71. Retrieved from the Internet: <http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1046630&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1046630>.
Schlattmann et al., “Markerless 4 gestures 6 DOF real-time visual tracking of the human hand with automatic initialization”, 2007 [retrieved Jul. 15, 2016], Eurographics 2007, vol. 26, No. 3, 10 pages, Retrieved from the Internet: <http://cg.cs.uni-bonn.de/aigaion2root/attachments/schlattmann-2007-markerless.pdf>.
Wang et al., “Tracking of Deformable Hand in Real Time as Continuous Input for Gesture-based Interaction”, Jan. 28, 2007 [retrieved Jul. 15, 2016], Proceedings of the 12th International Conference on Intelligent User Interfaces, pp. 235-242. Retrieved from the Internet: <http://dl.acm.org/citation.cfm?id=1216338>.
Zhao et al., “Combining Marker-Based Mocap and RGB-D Camera for Acquiring High-Fidelity Hand Motion Data”, Jul. 29, 2012 [retrieved Jul. 15, 2016], Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 33-42, Retrieved from the Internet: <http://dl.acm.org/citation.cfm?id=2422363>.
Palmer, Diffraction Grating Handbook, Newport Corporation, 6th Edition, dated 2005, 54 pages.
JP 2016-104145—First Office Action dated Feb. 21, 2017, 8 pages.
U.S. Appl. No. 14/106,148—Notice of Allowance dated Dec. 14, 2016, 40 pages.
U.S. Appl. No. 15/349,864—Notice of Allowance dated Jan. 13, 2017, 12 pages.
U.S. Appl. No. 15/586,048, Office Action dated Jun. 12, 2017, 68 pages.
U.S. Appl. No. 15/586,048, Response to Office Action dated Jun. 12, 2017, filed Sep. 6, 2017, 8 pages.
U.S. Appl. No. 15/937,717—Notice of Allowance dated Nov. 21, 2018, 9 pages.
U.S. Appl. No. 15/937,717—Amendment filed Dec. 13, 2018 (after Notice of Allowance), 7 pages.
U.S. Appl. No. 16/525,475—Non-Final Office Action dated Jan. 10, 2020, 77 pages.
CN 201710225106.5—First Office Action dated Oct. 8, 2019, 20 pages.
U.S. Appl. No. 16/525,475—Response to Non-Final Office Action dated Jan. 10, 2020 filed Jan. 30, 2020, 12 pages.
CN 201710225106.5—Response to First Office Action dated Oct. 8, 2019, as filed Feb. 24, 2020, 17 pages.
CN 201710225106.5—Notice of Allowance dated May 8, 2020, 2 pages.
U.S. Appl. No. 16/916,034—Notice of Allowance dated Feb. 2, 2022, 10 pages.
Castle et al., Video-rate Localization in Multiple Maps for Wearable Augmented Reality, IEEE dated Jan. 2008, pp. 1-8.
Krainin et al. Manipulator and Object Tracking for In Hand Model Acquisition, University of Washington, dated Jul. 2011, pp. 1-8.

Related Publications (1)

	Number	Date	Country
	20220198776 A1	Jun 2022	US

Provisional Applications (3)

Number	Date	Country
61724068	Nov 2012	US
61724091	Nov 2012	US
61587554	Jan 2012	US

Continuations (8)

	Number	Date	Country
Parent	16916034	Jun 2020	US
Child	17693200		US
Parent	16525475	Jul 2019	US
Child	16916034		US
Parent	15937717	Mar 2018	US
Child	16525475		US
Parent	15586048	May 2017	US
Child	15937717		US
Parent	15349864	Nov 2016	US
Child	15586048		US
Parent	14959891	Dec 2015	US
Child	15349864		US
Parent	14106148	Dec 2013	US
Child	14959891		US
Parent	13742845	Jan 2013	US
Child	14106148		US

Continuation in Parts (2)

	Number	Date	Country
Parent	13724357	Dec 2012	US
Child	13742845		US
Parent	13414485	Mar 2012	US
Child	13724357		US

Abstract