The present disclosure is generally related to computer stereo vision and, more particularly, to techniques pertaining to camera configuration for active stereo without image quality degradation.
Unless otherwise indicated herein, approaches described in this section are not prior art to the claims listed below and are not admitted as prior art by inclusion in this section.
Computer stereo vision is a technology that provides three-dimensional (3D) information from digital images of a scene. By comparing two digital images of the scene taken from two vantage points, 3D information can be obtained through stereo matching, which compares the relative positions of objects in the two images. For instance, with a first image of the scene as a base, a corresponding patch may be identified in a second image of the scene. The greater the displacement of the corresponding patch between the first image and the second image, the closer the object in the scene is to the camera(s) capturing the images. However, there are some limitations associated with stereo matching. For example, pixels may be occluded, in which case stereo matching cannot be performed for those pixels. As another example, an ambiguous matching result (e.g., due to low texture or a repeated pattern) can lead to unreliable depth information. Moreover, although sophisticated depth algorithms are available, some of the limitations associated with stereo matching still cannot be avoided.
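The inverse relationship between displacement and distance noted above can be illustrated with the standard pinhole relation for a rectified stereo pair, depth = focal length × baseline / disparity. This relation is not stated in the present disclosure; it is the conventional textbook model, and the function name, parameter names and numeric values below are assumptions chosen purely for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return depth in metres for a rectified stereo pair (pinhole model).

    Returns None when the disparity is not positive, i.e. when no valid
    match was found (for example, for an occluded pixel).
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


# Illustrative values only (assumed, not taken from the disclosure).
near = depth_from_disparity(disparity_px=40.0, focal_length_px=800.0, baseline_m=0.05)
far = depth_from_disparity(disparity_px=10.0, focal_length_px=800.0, baseline_m=0.05)
occluded = depth_from_disparity(disparity_px=0.0, focal_length_px=800.0, baseline_m=0.05)
```

In this sketch the patch with the larger displacement (40 pixels) maps to the smaller depth, while a pixel with no valid displacement yields no depth value at all.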
The following summary is illustrative only and is not intended to be limiting in any way. That is, the following summary is provided to introduce concepts, highlights, benefits and advantages of the novel and non-obvious techniques described herein. Select implementations are further described below in the detailed description. Thus, the following summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.
An objective of the present disclosure is to propose schemes, solutions, concepts, designs, methods and apparatuses that address the aforementioned issues. Specifically, the various schemes, solutions, concepts, designs, methods and apparatuses proposed in the present disclosure pertain to camera configuration for active stereo without image quality degradation.
In one aspect, a method may involve controlling a first sensor and a second sensor to capture images of a scene. The method may also involve extracting depth information about the scene from the images. The first sensor may be configured to sense light in a first spectrum, and the second sensor may be configured to sense light in both the first spectrum and a second spectrum different from the first spectrum.
In another aspect, an apparatus may include a first sensor, a second sensor, and a control circuit coupled to the first sensor and the second sensor. The first sensor may be configured to sense light in a first spectrum. The second sensor may be configured to sense light in both the first spectrum and a second spectrum different from the first spectrum. The control circuit may be configured to control the first sensor and the second sensor to capture images of a scene. The control circuit may also be configured to extract depth information about the scene from the images.
It is noteworthy that, although description provided herein may be in the context of certain technologies, the proposed concepts, schemes and any variation(s)/derivative(s) thereof may be implemented in, for and by other technologies. Thus, the scope of the present disclosure is not limited to the examples described herein.
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of the present disclosure. The drawings illustrate implementations of the disclosure and, together with the description, serve to explain the principles of the disclosure. It is appreciable that the drawings are not necessarily to scale, as some components may be shown out of proportion to their size in an actual implementation in order to clearly illustrate the concept of the present disclosure.
Detailed embodiments and implementations of the claimed subject matters are disclosed herein. However, it shall be understood that the disclosed embodiments and implementations are merely illustrative of the claimed subject matters which may be embodied in various forms. The present disclosure may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments and implementations set forth herein. Rather, these exemplary embodiments and implementations are provided so that description of the present disclosure is thorough and complete and will fully convey the scope of the present disclosure to those skilled in the art. In the description below, details of well-known features and techniques may be omitted to avoid unnecessarily obscuring the presented embodiments and implementations.
To address the aforementioned issues, active three-dimensional (3D) sensing may be utilized to improve the accuracy of depth information and eliminate some of the limitations described above. In general, active 3D sensing can be achieved by using structured light or active stereo. Under the structured light approach, one infrared (IR) projector or emitter and one IR camera may be utilized to obtain depth information from the deformation of light pattern(s) (e.g., a dot pattern or a stripe pattern), hereinafter referred to as “Algorithm 1.” However, this approach cannot be employed or does not work well in a brightly-illuminated environment. Under the active stereo approach, one IR projector/emitter and two IR cameras may be utilized to obtain depth information by stereo matching, hereinafter referred to as “Algorithm 2.” When employed in a brightly-illuminated environment, however, active 3D sensing under the active stereo approach may fall back to a passive mode in which no projector/emitter is utilized (e.g., with the IR projector/emitter turned off).
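The fallback behavior described for the active stereo approach can be sketched as a simple mode selection. This is a minimal illustration under stated assumptions: the present disclosure does not specify how brightness is measured or what threshold applies, so the ambient-light reading in lux, the threshold value and all names below are hypothetical.

```python
from enum import Enum


class DepthMode(Enum):
    ACTIVE_STEREO = "active_stereo"    # IR projector/emitter on, stereo matching
    PASSIVE_STEREO = "passive_stereo"  # IR projector/emitter off, stereo matching only


# Hypothetical threshold; the disclosure gives no numeric value.
BRIGHT_LUX_THRESHOLD = 10000.0


def select_depth_mode(ambient_lux, threshold=BRIGHT_LUX_THRESHOLD):
    """Fall back to the passive mode when the scene is brightly illuminated."""
    if ambient_lux >= threshold:
        return DepthMode.PASSIVE_STEREO
    return DepthMode.ACTIVE_STEREO


def emitter_enabled(mode):
    """The projector/emitter is only used in the active mode."""
    return mode is DepthMode.ACTIVE_STEREO


indoor = select_depth_mode(ambient_lux=300.0)
outdoor = select_depth_mode(ambient_lux=50000.0)
```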
Referring to
However, this proposed scheme is not without its shortcomings. For instance, because different sensor vendors design different RGB-IR patterns, there may be distortion or degradation in the resultant RGB-IR image quality. Additionally, a quality drop in the main camera may be unavoidable since some of the visible-light sensing pixels are replaced with IR sensing pixels in the RGB-IR sensor.
Under this proposed scheme, both Algorithm 1 (i.e., using structured light to obtain depth information by pattern deformation in IR image(s)) and Algorithm 2 (i.e., using active stereo to obtain depth information by stereo matching) may be utilized based on the two RGB images and one IR image to generate a resultant depth map of the scene. Accordingly, for any patch in the RGB images where there is repeated pattern(s) or no/low texture, depth information for that patch may still be obtained with the IR image using structured light (Algorithm 1), thereby enhancing performance in depth sensing.
Referring to
Thus, an optimized camera combination is proposed, as shown in
Apparatus 300 may be a part of an electronic apparatus, a portable or mobile apparatus, a wearable apparatus, a wireless communication apparatus or a computing apparatus. For instance, apparatus 300 may be implemented in a smartphone, a smartwatch, a personal digital assistant, a digital camera, or computing equipment such as a tablet computer, a laptop computer or a notebook computer. Moreover, apparatus 300 may also be a part of a machine type apparatus, which may be an Internet-of-Things (IoT) or narrowband (NB)-IoT apparatus such as an immobile or a stationary apparatus, a home apparatus, a wired communication apparatus or a computing apparatus. For instance, apparatus 300 may be implemented in a smart thermostat, a smart fridge, a smart door lock, a wireless speaker or a home control center. Alternatively, apparatus 300 may be implemented in the form of one or more integrated-circuit (IC) chips such as, for example and without limitation, one or more single-core processors, one or more multi-core processors, one or more reduced-instruction-set-computing (RISC) processors or one or more complex-instruction-set-computing (CISC) processors.
Apparatus 300 may include at least some of those components shown in
In one aspect, control circuit 310 may be implemented in the form of an electronic circuit comprising various electronic components. Alternatively, control circuit 310 may be implemented as part of or in the form of one or more single-core processors, one or more multi-core processors, one or more RISC processors, or one or more CISC processors. That is, even though a singular term “a processor” is used herein to refer to control circuit 310, control circuit 310 may include multiple processors in some implementations and a single processor in other implementations in accordance with the present disclosure. In another aspect, control circuit 310 may be implemented in the form of hardware (and, optionally, firmware) with electronic components including, for example and without limitation, one or more transistors, one or more diodes, one or more capacitors, one or more resistors, one or more inductors, one or more memristors and/or one or more varactors that are configured and arranged to achieve specific purposes in accordance with the present disclosure. In other words, in at least some implementations, control circuit 310 is a special-purpose machine specifically designed, arranged and configured to perform specific tasks pertaining to camera configuration for active stereo without image quality degradation in accordance with various implementations of the present disclosure. In some implementations, control circuit 310 may include an electronic circuit with hardware components implementing one or more of the various proposed schemes in accordance with the present disclosure. Alternatively, control circuit 310 may utilize software codes and/or instructions in addition to hardware components to implement camera configuration for active stereo without image quality degradation in accordance with various implementations of the present disclosure.
Under various proposed schemes in accordance with the present disclosure, first sensor 330 may be configured to sense light in a first spectrum, and second sensor 340 may be configured to sense light in both the first spectrum and a second spectrum different from the first spectrum. Control circuit 310 may be configured to control EM wave emitter 320 to project a structured light toward a scene. Control circuit 310 may also be configured to control first sensor 330 and second sensor 340 to capture images of the scene. Control circuit 310 may be further configured to extract depth information about the scene from the images.
In some implementations, first sensor 330 may include an RGB sensor configured to sense light in a visible band, and second sensor 340 may include an RGB-IR sensor configured to sense light in the visible band and an IR band. In some implementations, at least one of the RGB sensor and the RGB-IR sensor comprises a color filter array (CFA) with RGB color filters arranged in a pattern as a Bayer filter mosaic.
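The difference between a Bayer color filter array and an RGB-IR array can be illustrated by tiling a small filter pattern across the sensor and counting the sites assigned to each channel. The Bayer tile below is the conventional RGGB arrangement. The RGB-IR tile is a hypothetical example in which one green site is replaced by an IR site; actual RGB-IR patterns differ from vendor to vendor and are not specified in the present disclosure. All names are assumptions made for illustration.

```python
# Conventional 2x2 Bayer tile (RGGB): half of all sites are green.
BAYER_TILE = (("R", "G"),
              ("G", "B"))

# Hypothetical RGB-IR tile: one green site is replaced by an IR site.
# Real vendor patterns vary and are often larger than 2x2.
RGBIR_TILE = (("R", "G"),
              ("IR", "B"))


def build_mosaic(tile, height, width):
    """Repeat the filter tile to cover a sensor of height x width sites."""
    tile_h, tile_w = len(tile), len(tile[0])
    return [[tile[r % tile_h][c % tile_w] for c in range(width)]
            for r in range(height)]


def channel_fraction(mosaic, channel):
    """Fraction of sensor sites covered by the given filter channel."""
    total = sum(len(row) for row in mosaic)
    count = sum(row.count(channel) for row in mosaic)
    return count / total


bayer = build_mosaic(BAYER_TILE, 4, 4)
rgbir = build_mosaic(RGBIR_TILE, 4, 4)
green_bayer = channel_fraction(bayer, "G")
green_rgbir = channel_fraction(rgbir, "G")
ir_rgbir = channel_fraction(rgbir, "IR")
```

Under this assumed pattern the RGB-IR mosaic has fewer visible-light sites than the Bayer mosaic, which is consistent with the image quality concern raised for RGB-IR sensors.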
In some implementations, in extracting the depth information about the scene from the images, control circuit 310 may be configured to extract the depth information about the scene from the images by using heterogeneous techniques. In some implementations, in extracting the depth information about the scene from the images by using the heterogeneous techniques, control circuit 310 may be configured to extract the depth information about the scene by using a first technique based on a first image captured by first sensor 330 and a second image captured by second sensor 340 in the first spectrum and by using a second technique based on a third image captured by second sensor 340 in the second spectrum. In such cases, the first technique may include obtaining first depth information based on stereo matching, and the second technique may include obtaining second depth information based on pattern deformation using a structured light.
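The first technique, stereo matching between the first image and the second image, can be sketched on a single scanline with a sum-of-absolute-differences (SAD) block match. This is only one simple form of stereo matching and is not presented as the matching algorithm used by control circuit 310; the function name, the window size, the search range and the sample rows are assumptions for illustration. The sketch takes the first image as the base and assumes rectified images, so that a patch at column x in the base row appears at column x - d in the other row.

```python
def match_disparity(base_row, other_row, x, half_window=1, max_disparity=8):
    """Return the disparity d minimizing the SAD cost for the patch centered at x.

    Returns None when the patch does not fit inside the base row.
    Ties are resolved in favor of the smallest disparity.
    """
    lo, hi = x - half_window, x + half_window
    if lo < 0 or hi >= len(base_row):
        return None
    best_d, best_cost = None, None
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # the shifted patch would leave the other row
        cost = sum(abs(base_row[i] - other_row[i - d]) for i in range(lo, hi + 1))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# A textured feature (9, 5, 7) shifted by three pixels between the two rows.
base_row = [0, 0, 0, 0, 0, 9, 5, 7, 0, 0]
other_row = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]
textured = match_disparity(base_row, other_row, x=6)

# A texture-less row: every candidate has the same cost, so the match is ambiguous.
flat_row = [4] * 10
ambiguous = match_disparity(flat_row, flat_row, x=6)
```

For the texture-less row every candidate disparity yields the same cost, so the returned value is arbitrary. This is the kind of patch for which the second technique, based on pattern deformation of the structured light, may still provide depth information.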
In some implementations, in extracting the depth information about the scene from the images, control circuit 310 may be configured to extract the depth information about the scene from the images by using a single technique based on a first image captured by first sensor 330 and a second image captured by second sensor 340 in the first spectrum. In such cases, the single technique may include obtaining the depth information based on stereo matching.
In some implementations, in extracting the depth information about the scene, control circuit 310 may be configured to perform certain operations. For instance, control circuit 310 may obtain first depth information based on a first image captured by first sensor 330 and a second image captured by second sensor 340 in the first spectrum. Additionally, control circuit 310 may obtain second depth information based on a third image captured by second sensor 340 in the second spectrum. Moreover, control circuit 310 may fuse or otherwise combine the first depth information and the second depth information to generate a combined result as the depth information.
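The fusing operation can be sketched as a per-pixel combination of two depth maps. The present disclosure does not specify a fusion rule, so the rule below is an assumption: where both maps hold a value the two are blended with a fixed weight, and where only one holds a value that value is used. Depth maps are represented as nested lists with None marking pixels for which no depth was obtained; this representation and all names are hypothetical.

```python
def fuse_depth(first_depth, second_depth, first_weight=0.5):
    """Combine two equally sized depth maps pixel by pixel.

    first_depth:  depth from stereo matching (first and second images).
    second_depth: depth from pattern deformation (third image).
    A pixel stays None only when neither map has a value for it.
    """
    fused = []
    for row_a, row_b in zip(first_depth, second_depth):
        fused_row = []
        for a, b in zip(row_a, row_b):
            if a is not None and b is not None:
                fused_row.append(first_weight * a + (1.0 - first_weight) * b)
            elif a is not None:
                fused_row.append(a)
            else:
                fused_row.append(b)  # may still be None
        fused.append(fused_row)
    return fused


# Illustrative depth values in metres (assumed).
stereo_depth = [[1.0, None],
                [2.0, None]]
structured_light_depth = [[1.2, 3.0],
                          [None, None]]
combined = fuse_depth(stereo_depth, structured_light_depth)
```

In this example the top-right pixel, for which stereo matching produced no value, is filled from the structured-light result, while the bottom-right pixel remains without depth because neither technique covered it.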
At 410, process 400 may involve control circuit 310 controlling EM wave emitter 320 to project a structured light toward a scene. Process 400 may proceed from 410 to 420.
At 420, process 400 may involve control circuit 310 controlling first sensor 330 and second sensor 340 to capture images of the scene, with first sensor 330 configured to sense light in a first spectrum and with second sensor 340 configured to sense light in both the first spectrum and a second spectrum different from the first spectrum. Process 400 may proceed from 420 to 430.
At 430, process 400 may involve control circuit 310 extracting depth information about the scene from the images.
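The sequence of blocks 410, 420 and 430 can be sketched as a short control routine. The classes below are hypothetical stand-ins for EM wave emitter 320, first sensor 330 and second sensor 340; they return placeholder labels instead of image data, and the depth extraction step is passed in as a function so that the sketch does not commit to any particular algorithm.

```python
class Emitter:
    """Hypothetical stand-in for EM wave emitter 320."""

    def __init__(self):
        self.projecting = False

    def project(self):
        self.projecting = True


class Sensor:
    """Hypothetical stand-in for a sensor that captures one frame per band."""

    def __init__(self, name, bands):
        self.name = name
        self.bands = tuple(bands)

    def capture(self):
        # Placeholder frames: one label per sensed band instead of pixel data.
        return {band: f"{self.name}:{band}" for band in self.bands}


def process_400(emitter, first_sensor, second_sensor, extract_depth):
    emitter.project()                     # block 410: project structured light
    first = first_sensor.capture()        # block 420: capture images of the scene
    second = second_sensor.capture()
    return extract_depth(first, second)   # block 430: extract depth information


def describe_inputs(first, second):
    """Stub extractor that reports which images would feed which technique."""
    return {
        "stereo_matching": (first["visible"], second["visible"]),
        "structured_light": (second["ir"],),
    }


emitter = Emitter()
result = process_400(
    emitter,
    Sensor("sensor_330", ["visible"]),
    Sensor("sensor_340", ["visible", "ir"]),
    describe_inputs,
)
```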
In some implementations, first sensor 330 may include an RGB sensor configured to sense light in a visible band, and second sensor 340 may include an RGB-IR sensor configured to sense light in the visible band and an IR band. In some implementations, at least one of the RGB sensor and the RGB-IR sensor comprises a color filter array (CFA) with RGB color filters arranged in a pattern as a Bayer filter mosaic.
In some implementations, in extracting the depth information about the scene from the images, process 400 may involve control circuit 310 extracting the depth information about the scene from the images by using heterogeneous techniques. In some implementations, in extracting the depth information about the scene from the images by using the heterogeneous techniques, process 400 may involve control circuit 310 extracting the depth information about the scene by using a first technique based on a first image captured by first sensor 330 and a second image captured by second sensor 340 in the first spectrum and by using a second technique based on a third image captured by second sensor 340 in the second spectrum. In such cases, the first technique may include obtaining first depth information based on stereo matching, and the second technique may include obtaining second depth information based on pattern deformation using a structured light.
In some implementations, in extracting the depth information about the scene from the images, process 400 may involve control circuit 310 extracting the depth information about the scene from the images by using a single technique based on a first image captured by first sensor 330 and a second image captured by second sensor 340 in the first spectrum. In such cases, the single technique may include obtaining the depth information based on stereo matching.
In some implementations, in extracting the depth information about the scene, process 400 may involve control circuit 310 performing certain operations. For instance, process 400 may involve control circuit 310 obtaining first depth information based on a first image captured by first sensor 330 and a second image captured by second sensor 340 in the first spectrum. Additionally, process 400 may involve control circuit 310 obtaining second depth information based on a third image captured by second sensor 340 in the second spectrum. Moreover, process 400 may involve control circuit 310 fusing or otherwise combining the first depth information and the second depth information to generate a combined result as the depth information.
The herein-described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely examples, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable”, to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
Further, with respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
Moreover, it will be understood by those skilled in the art that, in general, terms used herein, and especially in the appended claims, e.g., bodies of the appended claims, are generally intended as “open” terms, e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc. It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an,” e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more;” the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number, e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations. 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
From the foregoing, it will be appreciated that various implementations of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various implementations disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.