The present invention relates generally to depth sensing.
In exemplary implementations of this invention, a system includes multiple light sources, multiple cameras, a pattern generator and one or more computers. The system measures depth (distance to points in a scene).
Light from the multiple light sources illuminates the pattern generator. The pattern generator refracts, reflects or selectively attenuates the light, to create textured visual patterns. The pattern generator projects the textured light onto the scene. The multiple cameras capture images of the scene from different viewpoints, while the scene is illuminated by the textured light. One or more computers process the images and compute the depth of points in the scene, by a computation that involves stereoscopic triangulation.
The multiple cameras image a scene from different vantage points. The multi-view data captured by these cameras is used for the stereoscopic triangulation. For example, in some cases, the multiple cameras comprise a pair of cameras.
In some cases, each of the multiple cameras has a wide field of view (FOV). The ability to measure depth over a wide FOV is advantageous for many applications. For example, in some cases, this invention is installed in a store, restaurant, lobby, public transit facility, or other wide space where it is desirable to measure depth over a wide FOV.
In illustrative implementations, the multiple light sources are positioned at different angles from the pattern generator. Thus, they illuminate the pattern generator from different angles. In some cases, the wider the range of angles at which they illuminate the pattern generator, the wider the range of angles of light projected by the pattern generator.
In illustrative implementations, the field of illumination (FOI) of the projected textured light is controllable. For example, in some cases, actuators translate the light sources (in a direction that is not directly toward or directly away from the pattern generator), and thereby change the respective angles of the translated light sources relative to the pattern generator. This, in turn, changes the angles at which light exits the pattern generator, and thus changes the FOI.
Furthermore, in some cases: (a) the light sources are directional (emit a greater radiance in some directions than in others); and (b) actuators rotate a directional light source (e.g., such that radiance emitted by the light source in the direction of the pattern generator is greater immediately after the rotation than immediately before the rotation).
A well-known problem with conventional stereoscopic depth ranging is that it is difficult to accurately measure the depth of regions of a scene that have zero or low visual texture. For example, a flat wall with uniform visual features has very little visual texture. Thus, it would be difficult, with conventional stereoscopic depth ranging, to accurately measure depth of points on such a wall.
This invention mitigates this low-texture problem by projecting a textured light pattern onto the scene. For example, in some cases, the textured pattern comprises bright dots or patches, sharp edges, or other features with a high spatial frequency. The pattern generator creates these patterns, when illuminated by the light sources. The projected light patterns add visual texture to the scene.
In some cases: (a) the pattern generator comprises a refractive optical element; and (b) the pattern generator refracts light (from the light sources) to create the visual texture. For example, in some cases, the refractive optical element creates caustic light patterns that add texture to the scene.
In other cases: (a) the pattern generator comprises a reflective optical element; and (b) the pattern generator reflects light (from the light sources) from a specular surface of the pattern generator, in order to create the visual texture. For example, in some implementations, the specular surface is uneven (with “hills” and “valleys”).
In other cases, the pattern generator comprises a spatial light modulator (SLM) and the textured light comprises light that passes through the SLM. For example, in some implementations, the SLM is a pinhole mask, and the textured light pattern is an array of dots of light, which correspond to the holes in the mask.
In some implementations of this invention, a lens is used to widen the FOI. Light from one or more of the light sources passes through, and is diverged by, the lens. The lens may be placed either in front of or behind the pattern generator.
A problem with projecting a textured light pattern onto a distant scene plane is that the resolution of the textured pattern decreases as depth increases (i.e., as distance from the pattern generator to the scene plane increases).
In some implementations, to mitigate this resolution problem, one or more lenses are used to create virtual images of the light sources. For each actual light source: (a) a lens is positioned between the actual light source and the pattern generator, such that the distance between the lens and the light source is less than the focal length of the lens; (b) the lens creates a virtual image of the light source; and (c) the distance between the virtual image and the pattern generator is greater than the distance between the actual light source and the pattern generator. This optical setup (with the lens) causes the projected light texture to have a greater resolution at the scene plane than it would in the absence of the lens.
In some implementations, only a single light source is used, but mirrors are employed so that the single light source appears, from the vantage point of the pattern generator, to comprise multiple light sources. Two or more mirrors are positioned so that light from the single light source reflects off the mirrors and travels to the pattern generator. The multiple mirrors create the appearance of multiple virtual light sources, when seen from the pattern generator. Light from the actual light source and virtual light sources impacts the pattern generator from different angles. For example, in some cases, if a single actual light source and two mirrors are used, then from the vantage point of the pattern generator, there appear to be three light sources (one actual and two virtual), each at a different angle from the pattern generator.
In some implementations of this invention, the multiple light sources simplify the task of registering the multiple cameras, relative to the scene being imaged. This registration is performed by turning on and off the multiple light sources, one at a time.
In some implementations, a visual display screen displays an image that conveys information about the computed depth of points in the scene. For example, in some cases, the image is a depth map.
The description of the present invention in the Summary and Abstract sections hereof is just a summary. It is intended only to give a general introduction to some illustrative implementations of this invention. It does not describe all of the details of this invention. This invention may be implemented in many other ways. Likewise, the description of this invention in the Field of the Technology section is not limiting; instead it identifies, in a general, non-exclusive manner, a field of technology to which some embodiments of this invention generally relate.
The above Figures show some illustrative implementations of this invention, or provide information that relates to those implementations. However, this invention may be implemented in many other ways.
In illustrative embodiments of this invention, a depth-sensing system includes multiple light sources, multiple cameras, a pattern generator and a computer.
A housing 150 houses or structurally supports the components of the depth-sensing device 100. The housing 150 includes a top wall 151, bottom wall 152, and four side walls 153, 154, 155, 156. A user input module 130 comprises two buttons 131, 132 for receiving input (e.g., instructions) from a human user.
In the example shown in
The setups shown in
First, the light sources (e.g., 111, 112, 113, 171, 172, 173) are each at a different direction from the pattern generator. Thus, the light from these light sources strikes the pattern generator 120, 175 over a range of directions. This, in turn, causes the field of illumination of the textured light projected by the pattern generator 120, 175 to be wider than it would be if the pattern generator were illuminated by light from only one direction (e.g., from only one light source). A wide field of illumination is helpful in many scenarios, such as determining depth in a large room (e.g., in a store, restaurant, lobby or other public space).
Second, the pattern generator projects textured light onto the scene. The visual texture added by the projected light makes it easier to determine the depth of scene features that otherwise would have little or no visual texture. The projected texture makes it easier to determine corresponding points in the multi-view images.
For example, consider a region of the scene (such as a flat, visually uniform surface of a wall) with little or no visual texture. The absence of texture makes it difficult to find corresponding points in two stereo images of the scene (e.g., a pair of images, one taken by camera 140, the other taken by camera 141). The textured light projected by the pattern generator 120 mitigates this problem. For example, in some cases, the textured light pattern includes feature points that are easily identifiable in both images, thereby simplifying the task of finding corresponding points. In illustrative implementations, finding corresponding points in images taken by different cameras is a step in measuring depth in a scene by triangulation.
Third, the multiple light sources make it easy to register stereo images, as described in more detail below.
For ease of illustration, in
This invention is not limited to the type of the illumination sources (LEDs) shown in
In illustrative implementations of this invention, the illumination sources include one or more active light sources, such as light-emitting diodes (LEDs), lasers, masers, incandescent light sources, fluorescent light sources, electroluminescent light sources, other luminescent or phosphorescent light sources, gas discharge light sources (including neon lights) or plasma light sources. As used herein, an “active light source” means a light source that emits light by emission of photons. As used herein, “emission of photons” does not mean (i) reflection of light, (ii) refraction of light, or (iii) mere transmission of pre-existing light. For example, in some cases, a photon is emitted when an electron drops from a higher energy state to a lower energy state.
In some implementations, the illumination sources also include one or more passive light sources that comprise specular surfaces, such as planar mirrors, concave mirrors, convex mirrors, or optical fibers. In these implementations, light is emitted by an active light source, travels to the specular surface, reflects off the specular surface, and then travels to the pattern generator. As used herein, a “passive light source” means a specular surface.
In illustrative implementations, light shines on the pattern generator from multiple different positions. These positions are occupied by any combination of active light sources or passive light sources.
In illustrative embodiments, the arrangements of illumination sources shown in
In illustrative implementations, one or more of the active light sources emit NIR (near infrared) light or other light outside of the visible light spectrum. For example, in illustrative implementations, the illumination module includes (i) two or more NIR active light sources or (ii) at least one NIR active light source and at least one passive light source for reflecting the NIR light. Also, in these implementations, at least two cameras measure NIR light.
Advantageously, NIR is invisible to humans. Thus, human users are not distracted by the NIR light when the NIR light is projected onto the scene to add visual texture to the scene. As used herein, “visual texture” means the texture of a pattern of light. The term “visual texture” does not imply that the pattern of light exists in the visible light spectrum. For example, in many implementations of this invention, the visual texture occurs in NIR light.
In illustrative implementations, a depth sensing device includes one or more actuators for translating or rotating one or more components (such as an illumination source, pattern generator, lens, or camera) of the device.
In illustrative implementations, one or more actuators control the position or orientation of one or more of the light sources, mirrors or lenses. Controlling the position or orientation of these components alters the distribution of the projected light pattern over the scene being imaged. The projected light pattern provides visual texture to the scene for more accurate depth computation. Generally, the walls, floor and ceiling are the places in the scene where the projected pattern is desirable. The positions of the walls, floor and ceiling in the image vary from scene to scene. Therefore, actuators are useful for changing the position or orientation of the light sources, mirrors or lenses in order to project the pattern onto those parts of the scene that need visual texture for depth computation.
In some use scenarios, the actuators translate or rotate these components (light sources, lenses or mirrors) to cause the projected pattern to be concentrated in a small portion of the scene, or to be spread out over a large part of the scene, or to be projected in sequence from one part of the scene to another, depending on what is desirable for the particular scene.
Here are five non-limiting examples of the use of actuators, in illustrative implementations of this invention:
Actuator Example 1: In some implementations, actuators translate illumination sources (a) to move the illumination sources farther apart from each other (so that they are less densely arranged); or (b) to move the illumination sources closer together (so that they are more densely arranged). In some cases, moving the illumination sources farther apart causes the FOI to be larger; and moving them closer together causes the FOI to be smaller. More generally, translating an illumination source, in a direction other than directly at or directly away from the pattern generator, affects the size, shape or direction of the FOI.
In alternative embodiments, a set of stationary, active light sources is used. Light sources in the set are turned on and off, creating an effect similar to translating an active light source. For example, in some cases: (a) a depth-sensing device includes a stationary array of lights; and (b) a computer outputs control signals to selectively turn different lights in the array on and off, and thereby to control the range of angles at which light is incident on the pattern generator, and thus the size, direction or shape of the FOI. (A minimal sketch of this selective-illumination approach follows Actuator Example 5 below.)
For example, in
Actuator Example 2: In some implementations, actuators rotate one or more directional illumination sources (e.g., to point a directional illumination source at the pattern generator). For example, in some cases: (a) translation of a directional illumination source causes or would cause the directional illumination source to no longer be pointed at the pattern generator; and (b) rotation is used to compensate for this effect. The rotation may follow, precede or occur concurrently with the translation. More generally, rotating a directional illumination source (e.g., to point it at, or away from, the pattern generator) affects the size, direction, or shape of the FOI.
Actuator Example 3: In some implementations, an actuator translates a pattern generator toward, or away from, an illumination module. In some cases: (a) moving the pattern generator closer to the illumination module increases the FOI; and (b) moving the pattern generator further from the illumination module decreases the FOI.
Actuator Example 4: In some implementations, an actuator translates a lens (e.g., to different positions along the optical axis of the lens). For example, in some cases: (a) the optical axis of a lens intersects an active light source and a pattern generator; (b) the actuator translates the lens to a position along the axis where the focal length of the lens is greater than the distance between the active light source and the lens; (c) in this position, the lens creates (from the vantage point of the pattern generator) a virtual image of the active light source; (d) the distance between the virtual image and the pattern generator is greater than the distance between the active light source and the pattern generator; and (e) the FOI is smaller than it would be if the lens were absent. This example is illustrated in
Actuator Example 5: In some implementations, one or more actuators rotate one or more cameras. Such rotation has many practical applications, including in the following use scenarios: (a) to rotate cameras so that they image a desired region of the scene (e.g., to image a region that is directly in front of the device, or instead to image a region that is off center); or (b) to compensate for changes in depth at which the cameras are focused. As a non-limiting illustration of the latter application, consider the following use scenario: Two cameras are pointed such that they image the same region of a scene at a first depth. If the cameras then refocus at a new, second depth, this may cause the cameras to image different (e.g., partially overlapping) regions of the scene at the second depth. Actuators may compensate, by rotating the cameras to image a single region at the second depth.
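Here is a minimal sketch of the selective-illumination alternative described after Actuator Example 1 (a stationary array of lights turned on and off under computer control). The source angles, the target FOI value and the set_light function are hypothetical placeholders, not features of any particular embodiment; for illustration only, the sketch assumes that the range of angles of the powered sources roughly tracks the resulting FOI.

# Minimal sketch (hypothetical values): choose which stationary light sources
# to power so that light strikes the pattern generator over approximately a
# desired range of angles, and thus produces a desired field of illumination.

def select_sources(source_angles_deg, desired_foi_deg):
    # Keep sources whose angle (relative to the pattern generator's axis)
    # lies within half of the desired FOI.
    half = desired_foi_deg / 2.0
    return [i for i, angle in enumerate(source_angles_deg) if abs(angle) <= half]

def set_light(index, on):
    # Placeholder for a driver call that switches one light source on or off.
    print(f"light {index} -> {'ON' if on else 'OFF'}")

if __name__ == "__main__":
    angles = [-30, -20, -10, 0, 10, 20, 30]   # hypothetical source angles (degrees)
    active = set(select_sources(angles, desired_foi_deg=40))
    for i in range(len(angles)):
        set_light(i, i in active)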
In
In
In some implementations, the light projected by the pattern generator has a wide field of illumination (FOI). For example, in some cases, the maximum angle subtended by two points in the FOI (as seen from the pattern generator) is 45 degrees, or 50 degrees, or 55 degrees, or 60 degrees, or 65 degrees, or 70 degrees, or more than 70 degrees. Angle B in
In illustrative implementations, the illumination sources are positioned in different directions relative to the pattern generator. For example, in some cases, the maximum angle subtended by two illumination sources (as seen from a pattern generator) is 45 degrees, or 50 degrees, or 55 degrees, or 60 degrees, or 65 degrees, or 70 degrees, or more than 70 degrees. Angle A in
In illustrative implementations, increasing the size of the FOI at a given scene plane tends to decrease the resolution of the visual pattern projected by the pattern generator onto the given scene plane. Conversely, decreasing the size of the FOI at the scene plane tends to increase the resolution.
Thus, in illustrative implementations, the size and shape of the FOI and the resolution of the projected visual texture pattern are controllable. In some implementations, actuators translate or rotate illumination sources or turn active light sources on and off. This in turn controls the range of directions at which light from illumination sources impact the pattern generator, which in turn controls the range of directions at which light exits the pattern generator, which in turn controls the size and shape of the FOI and the resolution of the projected visual texture.
In the depth-sensing system shown in
D=B(1+A/C) (Eq. 1)
where A is the distance between a pattern generator 530 and a scene plane 533, B is the diameter of a pinhole 535 in the pattern generator 530, C is the distance between light source 541 and pattern generator 530, and D is the length of the FOI 537 that would be projected on scene plane 533 if lens 539 were not present.
In
The presence of the lens 539 causes the length of the FOI 537 to be less than it would be if the lens 539 were absent. In
In the example shown in
In the depth-sensing system shown in
F=B(1+A/E) (Eq. 2)
As is evident from Equation 2, the limit of F (the length of the FOI when the lens is present) as E (the distance between the virtual image and the pattern generator) approaches infinity is B (the diameter of pinhole 535). Put differently, as E increases, F approaches B.
In some cases, it is desirable to reduce F (the length of the FOI when the lens is present) as close as possible to B (the diameter of the pinhole), in order to increase the resolution of the projected pattern at scene plane 533.
Equation 2 indicates that, in the example shown in
In some implementations, an actuator 534 translates lens 539 along its optical axis 540. For example, in some use scenarios, actuator 534 translates lens 539 closer to the light source 541, and thereby increases the size of the FOI projected at scene plane 533 (and thus reduces the resolution of the projected pattern). In other use scenarios, actuator 534 translates lens 539 away from light source 541, and thereby decreases the size of the FOI projected at scene plane 533 (and thus increases the resolution of the projected pattern).
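For illustration, the following sketch evaluates Equations 1 and 2 numerically, and uses the thin-lens relation to obtain E (the distance from the virtual image to the pattern generator) when the light source lies inside the focal length of the lens. All numeric values are arbitrary assumptions chosen only to illustrate the relationships; they are not taken from any embodiment.

# Sketch of Equations 1 and 2 for a pinhole pattern generator, with a
# thin-lens calculation of the virtual-image distance E.  All values are
# assumptions for illustration only.

def foi_length(pinhole_diameter, scene_distance, source_distance):
    # Eq. 1 / Eq. 2: FOI length = B * (1 + A / C), where C is replaced by E
    # when the lens creates a more distant virtual image of the light source.
    return pinhole_diameter * (1.0 + scene_distance / source_distance)

def virtual_image_to_generator(source_to_lens, focal_length, lens_to_generator):
    # Thin lens: when the source is closer to the lens than the focal length,
    # a virtual image forms at u*f/(f - u) behind the lens.
    u, f = source_to_lens, focal_length
    assert u < f, "a virtual image requires the source inside the focal length"
    return (u * f) / (f - u) + lens_to_generator

A = 2.0       # pattern generator to scene plane, meters (assumed)
B = 0.001     # pinhole diameter, meters (assumed)
u = 0.02      # light source to lens, meters (assumed)
g = 0.03      # lens to pattern generator, meters (assumed)
f = 0.05      # lens focal length, meters (assumed); u < f, so the image is virtual

C = u + g                                    # actual source to pattern generator
E = virtual_image_to_generator(u, f, g)      # virtual image to pattern generator

D = foi_length(B, A, C)    # Eq. 1: FOI length without the lens
F = foi_length(B, A, E)    # Eq. 2: FOI length with the lens (F < D, and F -> B as E grows)

print(f"C = {C:.3f} m, E = {E:.3f} m, D = {D*1000:.1f} mm, F = {F*1000:.1f} mm")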
Alternatively (or in addition), in some implementations, a lens is positioned in front of the pattern generator, at a given location between the pattern generator and the scene. For example, a negative lens at that given location will diverge light and thereby (if the given location is far enough from the scene) increase the size of the FOI as measured at the scene. In contrast, a positive lens at that given location will converge light and thereby (if the given location is far enough from the scene) decrease the size of the FOI.
In the example shown in
Alternatively, in some cases, another way to widen the FOI is to widen the pattern generator, or to widen a region of the pattern generator which does not block light from reaching the scene.
In some cases (such as
In
In
In
In
In illustrative implementations, the pattern generator is illuminated by incoming light. The pattern generator projects light onto a scene. The projected light adds visual texture to the scene.
In illustrative implementations, a pattern generator modifies light, such that the input light (i.e., light incident on the pattern generator) is different from the output light (i.e., outgoing light that leaves the pattern generator). As used herein: (a) “input light” means light that is incident on a pattern generator; and (b) “output light” means light that leaves a pattern generator. For example, in
In illustrative implementations, modification of the light by a pattern generator can be described by at least the following six attributes: variance, uniformity, average entropy, number of edge crossings, spatial frequency factor, and number of intensity peaks. For example:
In illustrative implementations, the shape of the pattern generator is such that the variance of the output light is substantially greater than the variance of the input light.
In illustrative implementations, the shape of the pattern generator is such that the uniformity of the output light is substantially less than the uniformity of the input light.
In illustrative implementations, the shape of the pattern generator is such that the average entropy of the output light is substantially greater than the average entropy of the input light.
In illustrative implementations, the shape of the pattern generator is such that the number of edge crossings in the output light is substantially greater than the number of edge crossings of the input light.
In illustrative implementations, the shape of the pattern generator is such that the spatial frequency factor of the output light is substantially greater than the spatial frequency factor of the input light.
In illustrative implementations, the shape of the pattern generator is such that the number of intensity peaks of the output light is substantially greater than the number of intensity peaks of the input light.
Illustrative implementations mentioned in each of the previous six paragraphs include: (a) an embodiment in which the pattern generator projects a pattern shown in
As used herein, the “variance” of an image is the second statistical moment of the intensity histogram of the image. That is, the variance ν is defined as:
ν = Σi (zi − m)² p(zi)
where Z is a random variable denoting intensity, zi are the allowed intensity levels, p(zi) is the corresponding intensity histogram for i=0, 1, 2, . . . , L−1, L is the number of distinct intensity levels allowed in the digital image, each sum runs over i=0, 1, 2, . . . , L−1, and m is the mean value of Z (that is, the average intensity):
m = Σi zi p(zi)
As used herein, the “uniformity” U of an image is defined as:
U = Σi p(zi)²
where p(zi) has the same meaning as defined above, and the sum runs over i=0, 1, 2, . . . , L−1.
As used herein, the “average entropy” e of an image is defined as:
e = −Σi p(zi) log₂ p(zi)
where p(zi) has the same meaning as defined above, and the sum runs over i=0, 1, 2, . . . , L−1.
As used herein, the number of edge crossings in an image means the total number of times that an edge is crossed when going row-by-row across all the rows of pixels in the image, and then going column-by-column down all the columns of pixels in the image. Under this definition: (a) a single edge may have multiple edge crossings; and (b) a single pixel may be located on two edge crossings (once in a row and once in a column).
As used herein, to say that an image has “multiple edges” means that in at least one specific row or column of pixels in the image, more than one edge crossing occurs in that specific row or column. For example, under this definition, an image may be counted as having “multiple edges” if the image has an edge that forms a loop or that waves back and forth across the image.
As used herein, the “spatial frequency factor” of an image is a quantity determinable from a magnitude spectrum of a 2D DFT (Discrete Fourier Transform) of the image, where the origin of the spectrum is centered. Specifically, the “spatial frequency factor” is the length of the radius of the largest circle in the 2D DFT, which circle is centered at the origin, such that the total intensity of the pixels of the 2D DFT that are located on or outside the circle is greater than the total intensity of the pixels of the 2D DFT that are located inside the circle.
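As an illustration of how these statistics could be computed, the following sketch implements the variance, uniformity, average entropy and spatial frequency factor defined above for an 8-bit grayscale image stored as a numpy array. The 256-level assumption and the brute-force radius search are simplifications for illustration only.

import numpy as np

def intensity_histogram(img, levels=256):
    # p(z_i): normalized histogram over the allowed intensity levels.
    # img is assumed to be an unsigned 8-bit grayscale array.
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    return hist / hist.sum()

def variance(img, levels=256):
    # Second statistical moment of the intensity histogram about the mean.
    p = intensity_histogram(img, levels)
    z = np.arange(levels)
    m = (z * p).sum()
    return (((z - m) ** 2) * p).sum()

def uniformity(img, levels=256):
    p = intensity_histogram(img, levels)
    return (p ** 2).sum()

def average_entropy(img, levels=256):
    p = intensity_histogram(img, levels)
    nz = p[p > 0]
    return -(nz * np.log2(nz)).sum()

def spatial_frequency_factor(img):
    # Radius of the largest origin-centred circle in the centred 2D DFT
    # magnitude spectrum such that the energy on or outside the circle
    # still exceeds the energy strictly inside it.
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(float))))
    rows, cols = mag.shape
    y, x = np.indices(mag.shape)
    r = np.hypot(y - rows / 2.0, x - cols / 2.0)
    total = mag.sum()
    best = 0
    for radius in range(1, int(r.max()) + 1):
        outside = mag[r >= radius].sum()
        if outside > total - outside:
            best = radius
        else:
            break
    return best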
It can be helpful to standardize measurements for testing whether a definition is satisfied. To say that an image is captured under “Standard Conditions” means that: (a) the digital image is a 16.1 megapixel image; (b) the digital image is an image of an entire screen and does not include any regions outside of the screen; (c) the screen is a diffuse, planar surface; (d) the screen is located 1 meter from a pattern generator; and (e) the sole source of illumination of the screen is light from a single point source of light, which point source is distant from the pattern generator, and which light travels in an optical path from the point source directly to the pattern generator, and then from the pattern generator directly to the screen.
As can be seen from the preceding definition, under Standard Conditions, a pattern generator lies in an optical path between a single distant point source of light and the scene.
It can be helpful to describe the effect of a pattern generator by comparing a first image of a screen taken when a pattern generator is projecting light onto the screen and a second image of a screen taken under identical conditions, except that the pattern generator has been removed. In other words, it can be helpful to compare a first image taken under Standard Conditions with a pattern generator present, and a second image taken under identical conditions except that the pattern generator is absent. The second image (taken with the pattern generator absent) effectively measures the input light.
As used herein, to say that “the variance of the output light is substantially greater than the variance of the input light” means that, if a first digital image of a screen were taken under Standard Conditions while the screen is illuminated by light that travels from a light source to a pattern generator and then to the screen, and a second digital image of the screen were taken under identical conditions except that the pattern generator is absent, then the variance of the first digital image would be substantially greater than the variance of the second digital image.
As used herein, to say that “the uniformity of the output light is substantially less than the uniformity of the input light” means that, if a first digital image of a screen were taken under Standard Conditions while the screen is illuminated by light that travels from a light source to a pattern generator and then to the screen, and a second digital image of the screen were taken under identical conditions except that the pattern generator is absent, then the uniformity of the first digital image would be substantially less than the uniformity of the second digital image.
As used herein, to say that “the average entropy of the output light is substantially greater than the average entropy of the input light” means that, if a first digital image of a screen were taken under Standard Conditions while the screen is illuminated by light that travels from a light source to a pattern generator and then to the screen, and a second digital image of the screen were taken under identical conditions except that the pattern generator is absent, then the average entropy of the first digital image would be substantially greater than the average entropy of the second digital image.
As used herein, to say that “the number of edge crossings of the output light is substantially greater than the number of edge crossings of the input light” means that, if a first digital image of a screen were taken under Standard Conditions while the screen is illuminated by light that travels from a light source to a pattern generator and then to the screen, and a second digital image of the screen were taken under identical conditions except that the pattern generator is absent, then the number of edge crossings of the first digital image would be substantially greater than the number of edge crossings of the second digital image.
As used herein, to say that “the spatial frequency factor of the output light is substantially greater than the spatial frequency factor of the input light” means that, if a first digital image of a screen were taken under Standard Conditions while the screen is illuminated by light that travels from a light source to a pattern generator and then to the screen, and a second digital image of the screen were taken under identical conditions except that the pattern generator is absent, then the spatial frequency factor of the first digital image would be substantially greater than the spatial frequency factor of the second digital image.
As used herein, to say that “the number of intensity peaks of the output light is substantially greater than the number of intensity peaks of the input light” means that, if a first digital image of a screen were taken under Standard Conditions while the screen is illuminated by light that travels from a light source to a pattern generator and then to the screen, and a second digital image of the screen were taken under identical conditions except that the pattern generator is absent, then the number of intensity peaks of the first digital image would be substantially greater than the number of intensity peaks of the second digital image.
The preceding six definitions do not mean that, in normal operation of this invention, the first and second images contemplated by these six definitions would actually be taken. Rather, each of the preceding six definitions precisely describes a difference between output light and input light, by stating what would be measured, if such first and second images were taken. For each of these six definitions, the contemplated difference between output light and input light either exists or does not exist, regardless of whether the first and second images are actually taken. For example, consider a statement that the variance of the output light is substantially greater than the variance of the input light. If this statement is true, then it is true regardless of whether such first and second images are actually taken.
In some cases, it is helpful to describe the effect of the pattern generator on local neighborhoods of an image. For example:
In illustrative implementations, the shape of the pattern generator is such that the variance of the output light is locally substantially greater than the variance of the input light.
In illustrative implementations, the shape of the pattern generator is such that the uniformity of the output light is locally substantially less than the uniformity of the input light.
In illustrative implementations, the shape of the pattern generator is such that the average entropy of the output light is locally substantially greater than the average entropy of the input light.
In illustrative implementations, the shape of the pattern generator is such that the number of edge crossings in the output light is locally substantially greater than the number of edge crossings of the input light.
In illustrative implementations, the shape of the pattern generator is such that the spatial frequency factor of the output light is locally substantially greater than the spatial frequency factor of the input light.
Illustrative implementations mentioned in each of the previous five paragraphs include: (a) an embodiment in which a pattern generator projects a pattern shown in
This paragraph provides a definition of what it means for a variable to be “locally” greater or less. Whether a variable is “locally” greater (or less) in a first image than in a second image is determined by comparing values of the variable in corresponding neighborhoods of the first and second images. Specifically, to say that a variable is “locally” greater in a first image than in a second image means that the total number of neighborhoods in which the variable is greater in the first image than in the corresponding neighborhood of the second image exceeds the total number of neighborhoods in which the variable is less in the first image than in the corresponding neighborhood of the second image. Likewise, to say that a variable is “locally” less in a first image than in a second image means that the total number of neighborhoods in which the variable is less in the first image than in the corresponding neighborhood of the second image exceeds the total number of neighborhoods in which the variable is greater in the first image than in the corresponding neighborhood of the second image. For purposes of this definition, the neighborhoods are created by subdividing each image into square, side-by-side, non-overlapping neighborhoods of 25×25 pixels, starting at the origin of the image (i.e., the upper left corner of the image), and disregarding any portions of the image that do not fit into a complete 25×25 pixel neighborhood (e.g., along the bottom border or right border of the image).
It can be helpful to describe the local effect of a pattern generator by locally comparing a first image of a screen taken when a pattern generator is projecting light onto the screen and a second image of the screen taken under identical conditions, except that the pattern generator has been removed. For example, to say that the variance of the output light is “locally” substantially greater than the variance of the input light means that, if a first digital image of a screen were taken under Standard Conditions while the screen is illuminated by light that travels from a light source to a pattern generator and then to the screen, and a second digital image of the screen were taken under identical conditions except that the pattern generator is absent, then the variance of the first digital image would be locally substantially greater than the variance of the second digital image. The same approach (to defining what a “locally” substantial difference means) also applies to uniformity, average entropy, number of edge crossings, and spatial frequency factor.
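The following sketch illustrates the “locally greater” test defined above for two equally-sized grayscale numpy arrays. The statistic is passed in as a function; in the usage example, np.var is used purely as a stand-in for whichever of the statistics defined above is being compared, and the image names are hypothetical.

import numpy as np

def neighborhoods(img, size=25):
    # Square, side-by-side, non-overlapping size x size blocks, starting at
    # the upper-left corner; partial blocks along the bottom and right
    # borders are discarded, as in the definition above.
    h = (img.shape[0] // size) * size
    w = (img.shape[1] // size) * size
    for r in range(0, h, size):
        for c in range(0, w, size):
            yield img[r:r + size, c:c + size]

def locally_greater(first, second, statistic, size=25):
    # True if the statistic is greater in more neighborhoods of the first
    # image than it is less, relative to the corresponding neighborhoods
    # of the second image.
    greater = less = 0
    for a, b in zip(neighborhoods(first, size), neighborhoods(second, size)):
        sa, sb = statistic(a), statistic(b)
        if sa > sb:
            greater += 1
        elif sa < sb:
            less += 1
    return greater > less

# Usage (hypothetical images: with_pattern, without_pattern):
# locally_greater(with_pattern, without_pattern, np.var)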
To be clear, this invention does not require that, in normal operation, images be taken under Standard Conditions. For example, this invention does not require that images be taken of a screen, or that the images be 16.1 megapixel, or that only a single distant point source of light be used. The Standard Conditions are merely used for purposes of certain definitions.
Likewise, this invention does not require that, in normal operation, any local image processing be done with 25×25 pixel neighborhoods. The 25×25 pixel neighborhood is merely the subdivision scheme used in a definition of how variables “locally” increase or decrease.
In illustrative implementations, the outgoing, modified light that leaves the pattern generator comprises textured light. The pattern generator projects textured light onto the scene. In illustrative implementations, the projected light pattern has a high spatial frequency.
In some cases, a pattern generator comprises a refractive optical element, a reflective optical element, or a spatial light modulator (SLM).
In some cases, a pattern generator includes an external surface that includes planar faces, facets or regions. In some cases, a pattern generator has an external surface that includes curved regions. In some cases, a pattern generator is pierced by holes that extend from one side to an opposite side of the pattern generator.
In
In some cases, the pattern generator is a reflective or refractive optical element (e.g., 621, 651) that creates caustic light patterns, when illuminated by one or more illumination sources. For example, in some cases, the caustic pattern includes bright patches and edges, which contrast with a darker background. In some cases, the surface geometry of the reflective or refractive element is chosen by an inverse caustic design algorithm, which starts with a desired caustic light pattern and computes a surface geometry that would create this caustic pattern. For example, in some implementations, well-known inverse caustic design techniques (such as those described by Thomas Kiser, Mark Pauly and others at the Computer Graphics and Geometry Laboratory at the École Polytechnique Fédérale de Lausanne (EPFL)) are used to determine a surface geometry, and then a reflective or refractive pattern generator is fabricated with this surface geometry.
In exemplary implementations, the pattern generator comprises an optical element with a shape such that, when the optical element is illuminated by light incident on the optical element: (a) the optical element reflects, refracts or selectively attenuates the light; and (b) the light leaves the optical element in a significantly non-uniform pattern of intensity.
In illustrative embodiments of this invention, the pattern generator comprises a patterned optical element.
As used herein, a “patterned optical element” (also called a “POE”) means an optical element with a shape such that, for at least one specific direction of incident light, the optical element reflects, refracts or selectively attenuates the incident light; and light leaving the optical element has a significantly non-uniform pattern of intensity, which pattern is not present in the incident light. For purposes of the preceding sentence: (a) “incident light” means light incident on the optical element, and (b) incident light is treated as being in a specific direction if it is emitted by a single, distant point source of light, which source is in that specific direction from the optical element. For example, an optical element that satisfies the first sentence of this paragraph for only one specific direction of incident light, and not for any other direction of incident light, is a POE. Also, for example, an optical element that satisfies that first sentence, when illuminated simultaneously by multiple directions of incident light, is a POE.
To be clear, the preceding definition of “patterned optical element” does not require that, in normal operation, a patterned optical element be illuminated in only one specific direction. When used in this invention, a patterned optical element is normally illuminated in multiple directions by multiple illumination sources.
“POE” is an acronym for “patterned optical element”.
In illustrative implementations, each pattern generator shown in
In some implementations, the shape of the POE is such that, for at least one specific direction of incoming light, the POE projects a light pattern that has multiple large intensity peaks. In such implementations, if the POE projected the light pattern onto a screen and a digital image of a screen were captured under Standard Conditions, the image would include multiple large intensity peaks.
Notwithstanding anything to the contrary herein, however: (a) a single, simple lens that is spherical or cylindrical is not a “patterned optical element” or “pattern generator”; and (b) a single Serial Lens System is not a “patterned optical element” or “pattern generator”. As used herein, a “Serial Lens System” means a lens system in which (i) the optical elements of the system consist only of simple lenses that are each either spherical or cylindrical, and (ii) a light ray transmitted through the system travels through each lens of the system, one lens at a time.
Under the base illumination shown in
In some implementations, one or more computers compute scene depth, then output control signals to cause a visual display screen to display the calculated depth information in a humanly-readable format. For example, in some cases, a depth map is displayed.
In some implementations, multiple cameras capture near-infrared (NIR) images of a scene, and at least one camera captures a visible light image of a scene. One or more computers output control signals to cause a visual display screen to display calculated depth information, overlaid on an ordinary visible light image of the scene.
This invention is not limited to the particular shapes of the pattern generators shown in the
It is well-known that, starting with a projected pattern of light, one can calculate the surface geometry of an object (e.g., a reflective or refractive optical element or an SLM) that produces the projected pattern of light. For example, a conventional algorithm solves this “reverse design” problem for a reflective optical element by: (a) optimizing a 2D mesh representation of a specular surface; (b) then calculating a normal field from the deformed mesh that results from the optimization; and (c) integrating to a height field surface.
For example, consider the following problem: given a desired grayscale intensity image, find the shape of a surface that will project a light pattern that reproduces this image. In some cases, a conventional “brightness warping” algorithm is used to solve this problem for a specular surface. In this conventional algorithm, the goal is formulated as an optimization problem. For the optimization, a fixed mesh is used to describe the light pattern, and the points on the mesh of the specular surface which cast the corresponding rays are moved. The mesh of the specular surface is divided into quadrangular patches which correspond to faces of the projected light mesh. By optimizing the area of these patches in the mesh for the specular surface, the brightness of the corresponding quads of the mesh for the projected light is adjusted to a desired distribution. The larger the area of a face in the warped mesh on the specular plane, the more light is projected on the unchanged area in the mesh for the projected light, increasing the brightness. Thus, the optimization deforms the mesh for the specular surface, such that the desired amounts of light are allocated to the corresponding faces of the mesh for the light pattern. The boundary vertices of the warped mesh (in the specular plane) are confined to remain on the border. Once the deformation of this mesh is computed, the normal field is obtained by interpolating the outgoing ray directions at the grid nodes using barycentric coordinates. The normal field is then integrated to a height field. Quadratic brightness constraints are employed. A consistency term is used to ensure integrability.
A conventional “brightness warping” algorithm for a refractive object works in the same manner. For a reflective surface, a single height map is used. For a refractive object, there are two height maps: one for the surface where light enters the object and one for the surface where the light exits the object. But one of the refractive surfaces is assumed to be planar (and the incident light parallel), so the algorithm is run only on a single surface for a refractive object.
Also, well-known techniques can be employed to “reverse design” an SLM. Starting with a desired projected pattern of light, these techniques calculate an SLM that projects the light. In these calculations, conventional ray tracing is used for light rays that pass through apertures of the SLM without diffraction. In some cases, small apertures in an SLM create significant diffraction effects. In those cases, conventional formulas for modeling diffraction effects are employed.
Thus, by specifying a light pattern projected by a pattern generator, one specifies the structure (i.e., the shape) of the pattern generator. This applies for all pattern generators (including reflective or refractive pattern generators, and SLMs).
In illustrative implementations, images are taken by two cameras placed apart from each other. The change in position (between the two images) of a near scene point is expected to be greater than the change in position (between the two images) of a far scene point. A computer uses this disparity between the features to determine, with simple triangulation, the depth of the features.
In exemplary implementations of this invention, a computer performs a conventional algorithm to determine depth by triangulation from the images taken by the multiple cameras.
Many conventional algorithms exist for determining depth by triangulation from two images taken by two cameras from different vantage points. The following four paragraphs provide a brief overview of some features of these conventional methods.
Conventionally, if information about the cameras used to take the images is known by calibration (such as their locations, focal lengths, etc.), the exact coordinates of each feature can be reconstructed and used to produce a three-dimensional model of the scene.
Conventionally, a computer performs an algorithm to compute depth by triangulation. Part of this algorithm determines which features of one image match those of the other image (i.e., the algorithm solves what is known as the correspondence problem). Consider two cameras placed side by side, taking a left and a right image of the scene, respectively. To find the depth of a feature on the left image, the algorithm first finds the corresponding feature on the right image. Instead of searching the complete right image, the search space is reduced by using the epipolar geometry constraint. A point on the left image can correspond only to a point lying on the epipolar line in the right image. This constraint reduces the search space from the complete right image to just a line in the right image. The process of determining epipolar lines yields the essential matrix and fundamental matrix for the camera pair.
Conventionally, the algorithm also includes steps collectively known as rectification. Rectification transforms the left and right images such that epipolar lines become horizontal. After the images have been rectified, for a pixel at the kth row of the transformed left image, the algorithm searches for correspondence along the kth row of the transformed right image.
Many different conventional algorithms exist for determining depth by triangulation from stereo images. Typically, these algorithms include one or more of: (1) matching cost computation; (2) cost aggregation; (3) disparity computation/optimization; and (4) disparity refinement. Examples of some approaches include sum of squared differences, cross-correlations, graph cut methods, dynamic programming, scanline optimization, genetic algorithms, and stochastic diffusion.
As noted above, in exemplary embodiments of this invention, any of these conventional methods can be used to compute depth by triangulation from stereo images taken by multiple cameras.
In exemplary implementations of this invention, the cameras are calibrated to determine intrinsic and extrinsic parameters (encoded in the essential and fundamental matrices). These parameters are used to apply a projective transform that converts the disparity map into a real-world depth map.
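As a hedged sketch of this conventional pipeline, the following uses OpenCV to rectify a calibrated stereo pair, compute a disparity map by block matching, and reproject the disparities to a depth map. The calibration values and image file names are placeholders (assumptions); in practice they would come from a calibration procedure such as cv2.stereoCalibrate, and block matching is only one of the many correspondence methods mentioned above.

import cv2
import numpy as np

# Hypothetical calibration results: intrinsics K1/K2, distortion d1/d2, and
# the rotation R and translation T between the two cameras.
K1 = K2 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
d1 = d2 = np.zeros(5)
R = np.eye(3)
T = np.array([0.1, 0.0, 0.0])          # assumed 10 cm baseline
size = (640, 480)

left = cv2.imread("left_nir.png", cv2.IMREAD_GRAYSCALE)      # placeholder file names
right = cv2.imread("right_nir.png", cv2.IMREAD_GRAYSCALE)

# Rectification: transform the images so that epipolar lines are horizontal,
# reducing the correspondence search to a single image row.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
left_r = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
right_r = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)

# Disparity by block matching (StereoBM returns fixed-point values scaled by 16).
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left_r, right_r).astype(np.float32) / 16.0

# Reproject disparities to 3D using the projective transform Q;
# the Z channel is the depth map.
points_3d = cv2.reprojectImageTo3D(disparity, Q)
depth_map = points_3d[:, :, 2]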
In some implementations of this invention, one or more computers compute a rectification matrix (either automatically or in response to user input that signifies an instruction to do so). In some cases, the computers are also programmed to detect an error in a rectification matrix and, upon detecting the error, to compute a new, corrected rectification matrix. For example, changed conditions, such as a change in temperature, sometimes cause the baseline distance between cameras to change, and thereby cause a previously computed rectification matrix to become inaccurate. In that case, the computer would calculate a new, accurate rectification matrix.
In some implementations of this invention, one or more computers perform an algorithm to apply a low pass filter to images captured by the camera. The low pass filter tends to remove noise from the images. Also, the low pass filter tends to remove, from the images of the scene, any high spatial frequency light pattern projected onto the scene by the pattern generator.
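A minimal sketch of such a low-pass filter follows; the kernel size and sigma are assumptions, and a Gaussian blur is just one of several suitable low-pass filters.

import cv2

def low_pass(image, ksize=9, sigma=2.0):
    # A Gaussian blur attenuates high spatial frequencies: it suppresses
    # noise and largely removes the projected high-frequency texture.
    return cv2.GaussianBlur(image, (ksize, ksize), sigma)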
In illustrative implementations, any type of camera is used, including any digital camera or digital video camera.
In illustrative embodiments, the pattern generator projects NIR light which has a visual texture in the NIR frequency spectrum. In these embodiments, the cameras capture NIR images, and any hot mirrors on the cameras are removed (or not functional). In some cases, full spectrum cameras (e.g., for capturing images in a range from 300 nm to 1000 nm) are used.
In some implementations, the illumination sources include one or more strobe lights, and the multiple cameras include one or more high-speed cameras. In some use scenarios, the strobe lights and high-speed cameras are used for capturing images of a rapidly moving object without blurring.
The system 800 includes a power source module 810. In some cases, the power source module 810 steps down or rectifies power from a wall outlet. Lines (e.g. 811) between the above hardware components in
User interface module 809 may vary, depending on the particular implementation of this invention. In some cases, user interface module 809 includes a combination of one or more of input devices (such as a touchscreen, contact-sensitive display, keypad, mouse, joystick, scroll-wheel, buttons, dials, sliders, microphone, haptic transducer, or motion sensing input device) and one or more output devices (such as a touchscreen or other visual display screen, projector, speaker, or haptic transducer).
In the example shown in
In some cases, all or part of system 800 is mounted on a wall or affixed (at least semi-permanently) to another surface. For example, in some cases, multiple copies of the system are mounted at different points along a perimeter of a large room (e.g., in a store, restaurant, lobby, mall or other public space).
Alternatively, all or part of system 800 is housed in a portable electronic device. In some cases, some components of system 800 (e.g., the user interface 809) are housed or affixed at a location removed from other components of the system.
The lighting system 836 includes an illumination module 837, which in turn includes one or more active light sources (e.g., 849). In some cases, lighting system 836 also includes one or more specular surfaces (e.g., mirrors 841, 843) for reflecting light from one or more active light sources. In some cases, lighting system 836 includes a lens system 844. The lighting system 836 illuminates the pattern generator 843.
Optionally, the projector 835 includes additional optical system 845. Light from the pattern generator 843 strikes the additional optical system 845. After exiting the additional optical system 845, outgoing light travels to the scene. The additional optical system 845 comprises a lens system 846, or a reflective system 847, or a combination of both. Reflective system 847 comprises one or more specular surfaces for reflecting light.
Each lens system 844, 846 comprises one or more lenses. Taken as a whole, each lens system 844, 846 comprises or is the functional equivalent of a positive lens (for converging light), a negative lens (for diverging light) or an optical element that transmits but neither converges nor diverges light.
The illumination modules shown in
In illustrative implementations, hardware components of a depth-finding system are supported by one or more support structures (such as housing, beams, trusses, cantilevers, fixtures, fasteners, cables, or other components for supporting a load).
In some implementations, a calibration step includes turning light sources in the illumination module on and off.
For example, in some cases, turning light sources on and off, one at a time, makes it easier to register images and to calculate a correspondence matrix.
Also, for example, in some cases, light sources are turned on and off rapidly to determine any time difference between when an event is seen by different cameras taking a video of the scene from different vantage points. If one video camera first observes an event at time T, and another camera first observes the same event at time T+t, then, in some cases, a computer uses an offset of time period ‘t’ between the cameras when processing images, in order to synchronize the cameras computationally.
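A sketch of this computational synchronization follows; the timestamp lists are hypothetical and would, in practice, be the frame times at which each camera first observes a light source switching on or off.

def estimate_offset(event_times_cam_a, event_times_cam_b):
    # Average difference between when the same on/off events are first seen
    # by the two cameras.
    diffs = [b - a for a, b in zip(event_times_cam_a, event_times_cam_b)]
    return sum(diffs) / len(diffs)

# Example (hypothetical timestamps, in seconds):
t = estimate_offset([0.00, 1.00, 2.00], [0.04, 1.03, 2.05])
# Shift camera B's timestamps by -t so the two videos are aligned computationally.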
In some implementations, high wattage light sources are used to increase the range of distances over which depth can be sensed. For example, in some cases, high wattage LEDs in an illumination module are used to project an intense light pattern so that a bright textured light pattern is clearly visible at greater scene depths (e.g., at depths of 8-10 meters). The ability to measure depth at a large distance (e.g., 8-10 meters) is desirable in many settings, including, in some cases, in public places such as restaurants, banks, and office buildings.
In some implementations, the FOI of a single depth-sensing device is too small for the size of the scene. To solve that problem, multiple depth-sensing devices are employed (e.g., in different positions along the perimeter of a large room). Overlap between the FOIs of different depth-sensing devices is generally not a problem. Instead, the visual patterns in overlapping FOIs add by superposition, creating an even richer visual texture, and thereby facilitating depth detection.
In some cases, a computer processes an image and detects a moving object in the scene. The computer then sends control signals to power-on a selected set of one or more light sources, so that a textured light pattern is projected at a region in which the moving object is currently located, and not at other regions. For example, in some cases, a look-up table is stored in electronic memory and maps a light source (or a set of light sources) to a region of the scene that is illuminated by the light (or set of lights). A computer accesses the look-up table to determine which light sources to turn on to illuminate the moving object at its current location. In this example, illuminating only the region where the moving object is located reduces power consumption.
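A minimal sketch of the look-up-table approach described in the preceding paragraph follows; the region labels, light-source indices and the tracked region are hypothetical.

# Look-up table mapping each scene region to the light sources that illuminate it.
REGION_TO_LIGHTS = {
    "left":   [0, 1],
    "center": [2, 3],
    "right":  [4, 5],
}

def lights_for_region(region, table=REGION_TO_LIGHTS):
    # Return the light sources to power so that textured light is projected
    # only where the moving object currently is.
    return table.get(region, [])

# e.g. a tracker reports that the moving object is in the "center" region:
active_lights = lights_for_region("center")    # -> [2, 3]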
In exemplary implementations of this invention, one or more electronic computers (e.g. 143, 183, 569, 599, 805, 822) are adapted: (1) to control the operation of, or interface with, hardware components of a depth-sensing device, including any light sources, cameras, or actuators; (2) to perform any calculation described above, including any calculation of a correlation matrix, rectification matrix or any computation of depth by triangulation; (3) to receive signals indicative of human input; (4) to output signals for controlling transducers for outputting information in human perceivable format; and (5) to process data, to perform computations, to execute any algorithm or software, and to control the read or write of data to and from memory devices. The one or more computers may be in any position or positions within or outside of the depth-sensing device. For example, in some cases (a) at least one computer is housed in or together with other components of the depth-sensing device, and (b) at least one computer is remote from other components of the depth-sensing device. The one or more computers may be connected to each other or to other components in the depth-sensing device either: (a) wirelessly, (b) by wired connection, or (c) by a combination of wired and wireless connections.
In exemplary implementations, one or more computers are programmed to perform any and all algorithms described herein, and any and all functions described in the immediately preceding paragraph. For example, in some cases, programming for a computer is implemented as follows: (a) a machine-accessible medium has instructions encoded thereon that specify steps in an algorithm; and (b) the computer accesses the instructions encoded on the machine-accessible medium, in order to determine steps to execute in the algorithm. In exemplary implementations, the machine-accessible medium comprises a tangible non-transitory medium. In some cases, the machine-accessible medium comprises (a) a memory unit or (b) an auxiliary memory storage device. For example, while a program is executing, a control unit in a computer may fetch the next coded instruction from memory.
In some cases, each computer includes or interfaces with one or more of the following features: a digital signal processor, microprocessor, a processor with accompanying digital signal processor, a processor without accompanying digital signal processor, a special-purpose computer chip, a field-programmable gate array, a controller, an application-specific integrated circuit, an analog to digital converter, a digital to analog converter, or a multi-core processor such as a dual or quad core processor.
The terms “a” and “an”, when modifying a noun, do not imply that only one of the noun exists.
“Active light source” is defined elsewhere in this document.
As a non-limiting example of “allowed intensity levels”, consider a grayscale image in which intensity is encoded by 8 bits, the lowest possible intensity level is 0, and the highest possible intensity level is 255. In this example, there are 256 “allowed intensity levels”, which are integers ranging from 0 to 255.
To say that projected light “adds visual texture” to a scene means that a significantly non-uniform pattern of intensity exists in the illumination of the scene when the scene is lit by both the projected light and the base lighting of the scene, which pattern is not present in the illumination of the scene when the scene is lit by only the base lighting of the scene. For purposes of the preceding sentence, “base lighting” means total illumination of the scene minus the projected light if any.
Here are some non-limiting examples of a “camera”: (a) a digital camera; (b) a video camera; (c) a NIR camera; and (d) a full spectrum camera (which images at least visible and NIR light).
“Average entropy” is defined elsewhere in this document.
The term “comprise” (and grammatical variations thereof) shall be construed as if followed by “without limitation”. If A comprises B, then A includes B and may include other things.
The term “computer” includes any computational device that performs logical and arithmetic operations. For example, in some cases, a “computer” comprises an electronic computational device, such as an integrated circuit, a microprocessor, a mobile computing device, a laptop computer, a tablet computer, a personal computer, or a mainframe computer. For example, in some cases, a “computer” comprises: (a) a central processing unit, (b) an ALU (arithmetic/logic unit), (c) a memory unit, and (d) a control unit that controls actions of other components of the computer so that encoded steps of a program are executed in a sequence. For example, in some cases, the term “computer” also includes peripheral units, including an auxiliary memory storage device (e.g., a disk drive or flash memory). However, a human is not a “computer”, as that term is used herein.
“Defined Term” means a term that is set forth in quotation marks in this Definitions section.
A point source of light that illuminates an object is “distant” from the object if the distance between the point source of light and the object is greater than ten times the maximum dimension of the object. For example, if a point source of light (such as a single LED) illuminates a pattern generator, the maximum dimension of the pattern generator is 4 cm, and the distance between the point source and the pattern generator is 45 cm, then the point source is “distant” from the pattern generator.
A “depth map” means (a) a set of data regarding depth of points in a scene, or (b) a visual display that conveys all or part of this data in humanly-perceptible format.
A “directional” light source means a light source that emits (or reflects or transmits) greater radiance in at least one direction than in other directions.
For an event to occur “during” a time period, it is not necessary that the event occur throughout the entire time period. For example, an event that occurs during only a portion of a given time period occurs “during” the given time period.
As used herein, an “edge” means a feature of an image that would be treated as an edge by a Marr-Hildreth edge detection algorithm, using a 5×5 LoG (Laplacian of a Gaussian) mask, which 5×5 LoG mask has the values that are conceptually shown in mask 1001 in the drawings.
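The following is a non-limiting, illustrative sketch (in Python) of Marr-Hildreth edge detection with a 5×5 LoG mask. The mask values shown are a commonly used LoG approximation selected for illustration only; they are not asserted to be the values of mask 1001.

# Illustrative sketch only: convolve the image with a 5x5 LoG mask and mark
# zero crossings of the response as edge pixels.
import numpy as np
from scipy.ndimage import convolve

LOG_5x5 = np.array([
    [ 0,  0, -1,  0,  0],
    [ 0, -1, -2, -1,  0],
    [-1, -2, 16, -2, -1],
    [ 0, -1, -2, -1,  0],
    [ 0,  0, -1,  0,  0],
], dtype=float)

def marr_hildreth_edges(image):
    """Convolve with the LoG mask and mark zero crossings as edge pixels."""
    response = convolve(image.astype(float), LOG_5x5, mode="nearest")
    sign = np.sign(response)
    # A zero crossing occurs where the sign of the response changes between
    # horizontally or vertically adjacent pixels.
    zc = np.zeros_like(sign, dtype=bool)
    zc[:, :-1] |= (sign[:, :-1] * sign[:, 1:]) < 0
    zc[:-1, :] |= (sign[:-1, :] * sign[1:, :]) < 0
    return zc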
The term “e.g.” means for example.
“Emission of photons” is defined elsewhere in this document.
The fact that an “example” or multiple examples of something are given does not imply that they are the only instances of that thing. An example (or a group of examples) is merely a non-exhaustive and non-limiting illustration.
Unless the context clearly indicates otherwise: (1) a phrase that includes “a first” thing and “a second” thing does not imply an order of the two things (or that there are only two of the things); and (2) such a phrase is simply a way of identifying the two things, respectively, so that they each can be referred to later with specificity (e.g., by referring to “the first” thing and “the second” thing later). For example, unless the context clearly indicates otherwise, if an equation has a first term and a second term, then the equation may (or may not) have more than two terms, and the first term may occur before or after the second term in the equation. A phrase that includes a “third” thing, a “fourth” thing and so on shall be construed in like manner.
“FOI” means field of illumination.
The term “for instance” means for example.
The term “frame” shall be construed broadly. For example, the term “frame” includes measured data about a scene that is captured by a camera during a single time period or single exposure, even if (i) the data is not humanly perceptible, (ii) the data has not been computationally processed, and (iii) there is not a one-to-one mapping between the data and the scene being imaged.
In the context of a camera (or components of the camera), “front” is optically closer to the scene being imaged, and “rear” is optically farther from the scene. In the context of a projector (or components of the projector), “front” is optically closer to the surface upon which light is projected by the projector, and “rear” is optically farther from that surface. The “front” and “rear” of a camera or projector continue to be the front and rear, even when the camera or projector is not being used.
“Herein” means in this document, including text, specification, claims, abstract, and drawings.
The term “hole” means a hole, cavity, gap, opening or orifice.
The terms “horizontal” and “vertical” shall be construed broadly. For example, “horizontal” and “vertical” may refer to two arbitrarily chosen coordinate axes in a Euclidean two-dimensional space, regardless of whether the “vertical” axis is aligned with the orientation of the local gravitational field. For example, a “vertical” axis may be oriented along a local surface normal of a physical object, regardless of the orientation of the local gravitational field.
“Illumination source” is defined elsewhere in this document.
As used herein: (1) “implementation” means an implementation of this invention; (2) “embodiment” means an embodiment of this invention; (3) “case” means an implementation of this invention; and (4) “use scenario” means a use scenario of this invention.
The term “include” (and grammatical variations thereof) shall be construed as if followed by “without limitation”.
“Intensity” means any measure of or related to intensity, energy or power. For example, the “intensity” of light includes any of the following measures: irradiance, spectral irradiance, radiant energy, radiant flux, spectral power, radiant intensity, spectral intensity, radiance, spectral radiance, radiant exitance, radiant emittance, spectral radiant exitance, spectral radiant emittance, radiosity, radiant exposure or radiant energy density. Notwithstanding anything to the contrary herein, in the context of a digital image, “intensity” means a measure of achromatic light intensity, such as (1) grayscale intensity, (2) the intensity component of the HSI (hue, saturation, intensity) color model, or (3) luma.
As used herein, an “intensity peak” means a relative maximum of light intensity.
As used herein, a “large intensity peak” of an image means an intensity peak of the image, such that at least one specific pixel in the intensity peak has an intensity equal to the highest intensity in a square neighborhood of the image, which square neighborhood is centered at the specific pixel and has a size equal to at least one fiftieth of the total number of pixels in the image. Solely for purposes of the preceding sentence, if the specific pixel is so close to a border of the image that a portion of the square neighborhood would extend outside the border (and thus beyond the confines of the image) if the neighborhood were centered at the specific pixel, then the neighborhood is treated as if it extended outside the border and any pixel in the neighborhood that would be outside the border is treated as having an intensity of zero.
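The following is a non-limiting, illustrative sketch (in Python) of one way to test whether a given pixel satisfies the “large intensity peak” condition described above, treating pixels outside the image border as having intensity zero. The function name and the use of the NumPy library are merely illustrative assumptions.

# Illustrative sketch only: a pixel qualifies if its intensity equals the maximum
# over a centered square neighborhood containing at least 1/50 of the image's
# pixels, with out-of-border pixels treated as zero.
import math
import numpy as np

def is_large_peak_pixel(image, row, col):
    h, w = image.shape
    # Smallest odd side length such that side * side >= (h * w) / 50.
    side = math.ceil(math.sqrt(h * w / 50.0))
    if side % 2 == 0:
        side += 1
    half = side // 2
    # Zero-pad so neighborhoods extending past the border see zeros.
    padded = np.pad(image.astype(float), half, mode="constant", constant_values=0)
    r, c = row + half, col + half
    neighborhood = padded[r - half:r + half + 1, c - half:c + half + 1]
    return image[row, col] >= neighborhood.max()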
“Light” means electromagnetic radiation of any frequency. For example, “light” includes, among other things, visible light and infrared light. Likewise, any term that directly or indirectly relates to light (e.g., “imaging”) shall be construed broadly as applying to electromagnetic radiation of any frequency.
As used herein, (i) a single scalar is not a “matrix”, and (ii) one or more entries, all of which are zero (i.e., a so-called null matrix), is not a “matrix”.
The “maximum dimension” of an object is the longest Euclidean distance between any two points on the exterior surface of the object.
The term “mobile computing device” or “MCD” includes any of the following electronic devices: a smartphone, cell phone, mobile phone, phonepad, tablet, laptop, notebook, notepad, personal digital assistant, enterprise digital assistant, ultra-mobile PC, or any handheld computing device. A device may be an MCD even if it is not configured for direct or indirect connection to an internet or world wide web.
“Multiple edges” is defined elsewhere in this document.
To “multiply” includes to multiply by an inverse. Thus, to “multiply” includes to divide.
“NIR” means near infrared.
“Number of edges” is defined elsewhere in this document.
The term “optical element” is not limited to a refractive optical element (e.g., a lens) or a reflective optical element (e.g., a mirror). In some cases, an optical element is an SLM.
The term “or” is inclusive, not exclusive. For example, “A or B” is true if A is true, or B is true, or both A and B are true. Also, for example, a calculation of A or B means a calculation of A, or a calculation of B, or a calculation of A and B.
“Passive light source” is defined elsewhere in this document.
“Patterned optical element” is defined elsewhere in this document.
To “point” a directional illumination source in a given direction means to orient the illumination source such that the radiance leaving the illumination source in the given direction is greater than or equal to the radiance leaving the illumination source in any other direction.
To “program” means to encode, in tangible, non-transitory, machine-readable media, instructions for a computer program. To say that a computer is “programmed” to perform a task means that instructions for the computer to perform the task are encoded in tangible, non-transitory, machine-readable media, such that the instructions are accessible to the computer during operation of the computer.
To say that an object “projects” light means that the light leaves the object (e.g., by reflection, refraction or transmission).
A parenthesis is simply to make text easier to read, by indicating a grouping of words. A parenthesis does not mean that the parenthetical material is optional or can be ignored.
To say that an object “selectively attenuates” light means that the object non-uniformly attenuates the light, such that the amount of attenuation of a light ray incident at a point on a surface of the object depends on at least the 2D spatial position of the point on the surface.
As used herein, the term “set” does not include a so-called empty set (i.e., a set with no elements). Mentioning a first set and a second set does not, in and of itself, create any implication regarding whether or not the first and second sets overlap (that is, intersect).
The “shape” of an SLM includes the spatial pattern of light-transmitting and light-attenuating areas of the SLM. For example, the “shape” of a pinhole mask includes the spatial pattern of the mask holes (which transmit light) and of the mask opaque regions (which block light). Also, for example, the “shape” of an LCD includes the spatial arrangement of LCD pixels that attenuate light by different amounts. Also, in some cases, the “shape” of an LCD includes the shape of twisted nematic crystals or other liquid crystals in the LCD pixels, which in turn determine degree of attenuation of light incident on the LCD pixels.
To say that a pattern of intensity is “significantly non-uniform” means that, in the pattern, intensity as a function of spatial position is not substantially constant.
“Some” means one or more.
“Spatial frequency factor” is defined elsewhere in this document.
A “spatial light modulator”, also called an “SLM”, means a device that (i) either transmits light through the device or reflects light from the device, and (ii) attenuates the light, such that the amount of attenuation of a light ray incident at a point on a surface of the device depends on at least the 2D spatial position of the point on the surface. A modulation pattern displayed by an SLM may be either time-invariant or time-varying.
“Standard Conditions” is defined elsewhere in this document.
As used herein, a “subset” of a set consists of less than all of the elements of the set.
A “substantial” increment of a value means a change of at least 10% in that value. For example: “Substantially increase” means to increase by at least 10 percent. “Substantially decrease” means to decrease by at least 10 percent. To say that a value X is “substantially greater” than a value Y means that X is at least 10 percent greater than Y, that is, X ≥ (1.1)Y. To say that a value X is “substantially less” than a value Y means that X is at least 10 percent less than Y, that is, X ≤ (0.9)Y.
To say that a value is “substantially constant” means that at least one constant number exists, such that the value is always within a single range, where: (a) the bottom of the range is equal to the constant number minus ten percent of the constant number; and (b) the top of the range is equal to the constant number plus ten percent of the constant number.
The term “such as” means for example.
“Uniformity” is defined elsewhere in this document.
“Variation” is defined elsewhere in this document.
“Visual texture” is defined elsewhere in this document.
Spatially relative terms such as “under”, “below”, “above”, “over”, “upper”, “lower”, and the like, are used for ease of description to explain the positioning of one element relative to another. The terms are intended to encompass different orientations of an object in addition to the orientations depicted in the figures.
A matrix may be indicated by a bold capital letter (e.g., D). A vector may be indicated by a bold lower case letter (e.g., α). However, the absence of these indicators does not indicate that something is not a matrix or not a vector.
Except to the extent that the context clearly requires otherwise, if steps in a method are described herein, then: (1) steps in the method may occur in any order or sequence, even if the order or sequence is different than that described; (2) any step or steps in the method may occur more than once; (3) different steps, out of the steps in the method, may occur a different number of times during the method; (4) any step or steps in the method may be done in parallel or serially; (5) any step or steps in the method may be performed iteratively; and (6) the steps described are not an exhaustive listing of all of the steps in the method, and the method may include other steps.
This Definitions section shall, in all cases, control over and override any other definition of the Defined Terms. For example, the definitions of Defined Terms set forth in this Definitions section override common usage or any external dictionary. If a given term is explicitly or implicitly defined in this document, then that definition shall be controlling, and shall override any definition of the given term arising from any source (e.g., a dictionary or common usage) that is external to this document. If this document provides clarification regarding the meaning of a particular term, then that clarification shall, to the extent applicable, override any definition of the given term arising from any source (e.g., a dictionary or common usage) that is external to this document. To the extent that any term or phrase is defined or clarified herein, such definition or clarification applies to any grammatical variation of such term or phrase, taking into account the difference in grammatical form. For example, the grammatical variations include noun, verb, participle, adjective, or possessive forms, or different declensions, or different tenses. In each case described in this paragraph, Applicant is acting as Applicant's own lexicographer.
More Examples:
This invention may be implemented in many different ways. Here are some non-limiting examples:
In one aspect, this invention is a system comprising: (a) a set of multiple light sources; (b) a pattern generator for projecting light, when the pattern generator is illuminated by the multiple light sources; (c) multiple cameras for capturing, from different viewpoints, images of a scene illuminated by the light; and (d) one or more computers for processing the images and computing the depth of different points in the scene, by a computation that involves triangulation. In some cases, the pattern generator comprises a refractive optical element. In some cases, the system further comprises a positive lens positioned such that (a) the positive lens is in an optical path between the pattern generator and a given light source, out of the set of multiple light sources; and (b) the focal length of the positive lens is greater than the distance between the positive lens and the given light source. In some cases, the system further comprises actuators for translating at least some of the multiple light sources relative to the pattern generator, or for rotating at least some of the multiple light sources. In some cases, the system further comprises mirrors that: (a) are positioned for reflecting light from one or more of the light sources to the pattern generator; and (b) cause the maximum angle subtended by two light sources out of the multiple light sources, when viewed from the pattern generator, to be greater than such angle would be in the absence of the mirrors. Each of the cases described above in this paragraph is an example of the system described in the first sentence of this paragraph, and is also an example of an embodiment of this invention that may be combined with other embodiments of this invention.
In another aspect, this invention is a system comprising: (a) a set of multiple illumination sources; (b) a patterned optical element (POE), which POE is positioned such that each illumination source in the set is in a different direction, relative to the POE, than the other illumination sources in the set, and such that an optical path exists for light from each of the illumination sources to travel to the POE; (c) multiple cameras for capturing, from different viewpoints, images of a scene illuminated by output light that leaves the POE; and (d) one or more computers that are programmed to process the images and to compute the depth of different points in the scene, by a computation that involves triangulation. In some cases, the POE comprises a spatial light modulator. In some cases, the POE comprises a reflective optical element that includes a specular surface. In some cases, the POE comprises a refractive optical element. In some cases, the POE has a shape such that, when the POE is illuminated by input light and output light leaves the POE, the number of edge crossings in the output light is greater than the number of edge crossings in the input light. In some cases, the POE has a shape such that, when the POE is illuminated by input light and output light leaves the POE, the spatial frequency factor of the output light is greater than the spatial frequency factor of the input light. In some cases, the POE has a shape such that, when the POE is illuminated by input light and output light leaves the POE, the variance of the output light is greater than the variance of the input light. In some cases, the one or more computers are programmed to output control signals to control at least one illumination source in the set and to control the multiple cameras, such that the images are captured while the at least one illumination source illuminates the POE. In some cases, an angle subtended by two illumination sources, out of the multiple illumination sources, when viewed from the viewpoint of the POE, exceeds sixty degrees. In some cases, the system further comprises a positive lens positioned such that (a) the positive lens is in an optical path between the POE and a given light source, out of the set of multiple light sources; and (b) the focal length of the positive lens is greater than the distance between the positive lens and the given light source. In some cases, the system further comprises one or more actuators for translating one or more illumination sources, mirrors or lenses. In some cases, the system further comprises mirrors that: (a) are positioned for reflecting light from one or more of the light sources to the POE; and (b) cause the maximum angle subtended by two light sources out of the multiple light sources, when viewed from the POE, to be greater than such angle would be in the absence of the mirrors. Each of the cases described above in this paragraph is an example of the system described in the first sentence of this paragraph, and is also an example of an embodiment of this invention that may be combined with other embodiments of this invention.
In another aspect, this invention is a method comprising, in combination: (a) using multiple light sources to illuminate an optical element, such that the optical element projects light that adds visual texture to a scene; (b) using multiple cameras for capturing, from different viewpoints, images of the scene illuminated by the light; and (c) using one or more computers to process the images and to compute the depth of different points in the scene, by a computation that involves triangulation. In some cases, the optical element comprises a patterned optical element. In some cases, the method further comprises using a display screen to display a depth map, or outputting control signals to control display of a depth map. Each of the cases described above in this paragraph is an example of the method described in the first sentence of this paragraph, and is also an example of an embodiment of this invention that may be combined with other embodiments of this invention.
While exemplary implementations are disclosed, many other implementations will occur to one of ordinary skill in the art and are all within the scope of the invention. Each of the various embodiments described above may be combined with other described embodiments in order to provide multiple features. This invention includes not only the combination of all identified features but also each combination and permutation of one or more of those features. Furthermore, while the foregoing describes a number of separate embodiments of the apparatus and method of the present invention, what has been described herein is merely illustrative of the application of the principles of the present invention. Other arrangements, methods, modifications, and substitutions by one of ordinary skill in the art are therefore also within the scope of the present invention. Numerous modifications may be made by one of ordinary skill in the art without departing from the scope of the invention.