The present disclosure relates to a processing system and, in particular, to an image processing system and an image processing method.
In general, dual camera lenses are often used to construct disparity maps for depth estimation. The main concept of depth estimation is matching corresponding pixels in the different field-of-view (FOV) images captured by a dual camera lens. However, pixels on low-textured surfaces do not have obvious matching features, which results in unstable matching results in depth estimation. On the other hand, in a low-light environment, the processor needs to increase the brightness gain to maintain the brightness of the output image. However, a higher brightness gain may introduce noise into the output image, which results in unstable depth estimates and reduced reliability. A depth estimate with lower confidence can affect the quality of subsequent applications. Depth estimation can be applied in virtual reality or augmented reality, such as three-dimensional (3D) reconstruction of objects or environments. Although longer exposure times or noise suppression can alleviate the problem, these methods can also cause other imaging problems, such as motion blur or loss of detail in the image. Existing dual-camera multi-view methods can maintain the temporal consistency of parallax, but their large and complicated processing pipelines require a great deal of computation.
Therefore, how to improve the quality and stability of the depth map, especially in areas with low texture or noise in the image, has become one of the problems to be solved in the art.
In accordance with one feature of the present invention, the present disclosure provides an image processing system. The image processing system includes a camera module and a processor. The camera module includes a first camera lens and a second camera lens. The first camera lens is configured to capture a first field-of-view (FOV) image at a current position. The second camera lens is configured to capture a second FOV image at the current position. The processor is configured to generate a current depth map and a current confidence map according to the first FOV image and the second FOV image, wherein the current confidence map comprises the confidence value of each pixel. The processor receives a previous camera pose corresponding to a previous position, the previous position corresponding to a first depth map and a first confidence map. The processor maps at least one pixel position of the first depth map to at least one pixel position of the current depth map according to the previous camera pose and the current camera pose of the current position. The processor compares the confidence value of at least one pixel of the first confidence map with the confidence value of the corresponding pixel of the current confidence map and selects the pixel with the highest confidence value. The processor then generates an optimized depth map for the current position according to the pixels corresponding to the highest confidence values.
In accordance with one feature of the present invention, the present disclosure provides an image processing method. The image processing method comprises: capturing a first field-of-view (FOV) image at a current position using a first camera lens; capturing a second FOV image at the current position using a second camera lens; generating a current depth map and a current confidence map according to the first FOV image and the second FOV image, wherein the current confidence map comprises the confidence value of each pixel; receiving a previous camera pose corresponding to a previous position, the previous position corresponding to a first depth map and a first confidence map; mapping at least one pixel position of the first depth map to at least one pixel position of the current depth map according to the previous camera pose and the current camera pose of the current position; comparing the confidence value of at least one pixel of the first confidence map with the confidence value of the corresponding pixel of the current confidence map and selecting the pixel with the highest confidence value; and generating an optimized depth map of the current position according to the pixels that correspond to the highest confidence values.
In summary, the embodiments of the present invention provide an image processing system and an image processing method, which enable a camera module to refer to the confidence value of each pixel in the current image and the previous images when shooting a low-textured object or a low-light environment. The image processing system and image processing method of the present invention can generate optimized depth information for the current image, and this optimized depth information can be applied to produce a more accurate three-dimensional image.
The invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto and is only limited by the claims. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for the use of the ordinal term).
Please refer to
In one embodiment, the image processing system 100 includes a camera module CA and a processor 10. Images can be transmitted between the camera module CA and the processor 10 through a wireless or wired connection. The camera module CA includes a camera lens LR and a camera lens LL. In one embodiment, the camera module CA is a dual-lens camera module. In one embodiment, the camera lens LR is a right-eye camera lens. When the camera module CA shoots point A on the desktop TB, the field-of-view (FOV) image captured by the camera lens LR is a right-eye image. The camera lens LL is a left-eye camera lens. When the camera module CA shoots point A on the desktop TB, the FOV image captured by the camera lens LL is a left-eye image.
In one embodiment, the camera module CA can be disposed in a head-mounted device to capture images as the user's head moves.
In one embodiment, as shown in
In one embodiment, the process of moving the camera module CA from position P1 through position P2 to position P3 can be a continuous action, and continuous shooting can be performed to capture images of point A. For convenience of description, the present invention takes three shots as an example, that is, one shot at each of positions P1, P2, and P3. A set of right-eye and left-eye images is obtained with each shot. However, a person having ordinary skill in the art should understand that, during the process of moving the camera module CA from position P1 to position P3, multiple shots can be taken, and the number of shots is not limited.
In an embodiment, the point A captured by the camera module CA may be on a low-textured object or in a low-light (or noisy) environment. Low-textured objects are, for example, smooth desktops, spheres, or mirrors. The surfaces of these objects are too smooth or their features are indistinct, and the captured images may contain reflections, which makes the captured images unclear. It is therefore difficult for the processor 10 to compare the parallax between the right-eye image and the left-eye image. A low-light environment causes too much noise in the captured image, and the brightness of the image needs to be increased in order to compare the parallax between the right-eye image and the left-eye image.
In one embodiment, the depth of each pixel can be estimated by using the parallax of the corresponding pixels in the right-eye image and the left-eye image to generate a depth map.
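For illustration only, the geometric relationship between disparity and depth for a rectified stereo pair is depth = focal length x baseline / disparity. A minimal sketch of this conversion is given below, assuming Python with NumPy; the parameter names (focal_length_px, baseline_m) are illustrative and are not specified by the present disclosure.

    import numpy as np

    def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
        # Convert a disparity map (in pixels) to a depth map, assuming a
        # rectified stereo pair. Illustrative sketch only.
        disparity = np.asarray(disparity, dtype=np.float32)
        depth = np.zeros_like(disparity)
        valid = disparity > eps                      # avoid division by zero
        depth[valid] = focal_length_px * baseline_m / disparity[valid]
        return depth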
More specifically, when the camera module CA shoots a low-textured object or a low-light environment, the captured image has lower confidence values. The confidence value represents the degree of similarity between corresponding pixels in the right-eye image and the left-eye image, for example, the degree of similarity between the top-right pixel of the right-eye image and the top-right pixel of the left-eye image. The set of all degrees of similarity (that is, one value for each pair of corresponding pixels in the right-eye image and the left-eye image) is called a confidence map.
The processor 10 can apply a known matching cost algorithm to calculate the confidence value. For example, the matching cost algorithm uses the absolute intensity differences between corresponding pixels in the right-eye image and the left-eye image as matching costs, and regards these matching costs as the confidence values of the corresponding pixels in the right-eye image and the left-eye image. In other words, each pixel of the right-eye image and its corresponding pixel of the left-eye image correspond to a confidence value. The matching cost algorithm is well known, so it will not be described herein.
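As a hedged sketch of such a matching cost (assuming Python with NumPy, which the present disclosure does not prescribe), an absolute-intensity-difference cost can be computed per pixel for a given disparity map and converted into a confidence value; the conversion from cost to confidence used here (a normalized, negated cost) is an assumption made for illustration only.

    import numpy as np

    def confidence_from_matching_cost(left_gray, right_gray, disparity):
        # Illustrative per-pixel confidence from absolute intensity differences.
        # left_gray, right_gray: rectified grayscale images (H x W).
        # disparity: integer disparity map (H x W) aligned to the left image.
        h, w = left_gray.shape
        ys = np.arange(h)[:, None].repeat(w, axis=1)
        xs = np.arange(w)[None, :].repeat(h, axis=0)
        xr = np.clip(xs - disparity, 0, w - 1)       # matched column in the right image
        cost = np.abs(left_gray.astype(np.float32) - right_gray[ys, xr].astype(np.float32))
        confidence = 1.0 - cost / 255.0              # higher value = more similar
        return confidence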
As shown in
It can be seen that when the camera module CA shoots a low-textured object or a low-light environment, the captured image is likely to have indistinct gray levels due to reflections or low light, which makes the confidence value calculated by the processor 10 unstable. Therefore, the present invention addresses this situation by referring to previous images to generate optimized depth information for the current image. Please refer to
In step 210, a first camera lens captures a first field-of-view (FOV) image at a current position, and a second camera lens captures a second FOV image at the current position.
In one embodiment, as shown in
In step 220, a processor generates a current depth map and a current confidence map according to the first FOV image and the second FOV image, and the current confidence map includes the confidence value of each pixel.
In one embodiment, the processor 10 generates the current depth map according to the right-eye image and the left-eye image captured when the camera module CA is located at the current position P3. The processor 10 applies a known algorithm, such as a stereo matching algorithm, to generate the current depth map.
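By way of a non-limiting sketch (the present disclosure does not specify any particular library), a block-matching stereo algorithm such as OpenCV's StereoBM may be used to obtain a disparity map from the rectified left-eye and right-eye images, from which the current depth map can be derived as described above. The parameter values below are assumptions for illustration.

    import cv2

    def compute_disparity(left_gray, right_gray):
        # Illustrative block matching; numDisparities must be a multiple of 16
        # and blockSize must be odd.
        stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
        # StereoBM returns fixed-point disparities scaled by 16.
        return stereo.compute(left_gray, right_gray).astype('float32') / 16.0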
In one embodiment, the processor 10 applies a known matching cost algorithm to calculate the confidence value. The confidence value represents the degree of similarity between corresponding pixels in the right-eye image and the left-eye image. The set of all degrees of similarity (that is, one value for each pair of corresponding pixels in the right-eye image and the left-eye image) is called a confidence map.
In step 230, the processor 10 receives a previous camera pose corresponding to a previous position, and the previous position corresponds to a first depth map and a first confidence map.
In one embodiment, the previous camera pose is provided by a tracking system. In one embodiment, the tracking system can be located inside or outside the image processing system 100. In one embodiment, the tracking system can be an inside-out tracking system, an outside-in tracking system, a lighthouse tracking system, or other tracking systems that can provide camera pose.
In one embodiment, the previous camera pose can be calculated by the camera module CA when shooting at position P1 (the previous position). In addition, when the camera module CA is shooting at position P1, the processor 10 can also first calculate the first depth map and the first confidence map. Therefore, position P1 (the previous position) has a corresponding depth map and a corresponding confidence map.
In one embodiment, the camera module CA sequentially shoots at positions P1-P3. Therefore, when the camera module CA is shooting at the current position P3, the camera module CA has already completed shooting at positions P1 and P2, the processor 10 has generated a depth map and a confidence map corresponding to each of positions P1 and P2, and the camera pose of the camera module CA at each of positions P1 and P2 has been recorded.
In one embodiment, the camera module CA first shoots an object (for example, point A) or an environment at position P1. The processor 10 generates a depth map and a confidence map corresponding to position P1, and records the confidence value of each pixel in the confidence map in a confidence value queue. The camera module CA then shoots the object at position P2. The processor 10 generates a depth map and a confidence map corresponding to position P2, and records the confidence value of each pixel in the confidence map in the confidence value queue. Finally, the camera module CA shoots the object at the current position P3, and the processor 10 records the confidence value of each pixel in the current confidence map in the confidence value queue.
In this example, the queue can record three confidence maps. Therefore, when the first confidence map is generated, the first confidence map is stored in the confidence value queue. When the second confidence map is generated, the first confidence map and the second confidence map are stored in the confidence value queue. When the third confidence map is generated, the first, second, and third confidence maps are stored in the confidence value queue. When the fourth confidence map is generated, the second, third, and fourth confidence maps are stored in the confidence value queue. This means that the current depth map, generated at the current position, can refer to the confidence maps generated by the previous two shots. For example, when shooting at the current position P3, the current depth map can be generated with reference to the confidence maps generated by shooting at positions P1 and P2. As another example, when shooting at a current position P4, the current depth map can be generated with reference to the confidence maps generated by shooting at positions P2 and P3.
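One possible realization of such a confidence value queue, sketched here under the assumption of Python's standard collections module (which the present disclosure does not mandate), is a fixed-length queue that automatically discards the oldest confidence map once three maps have been stored.

    from collections import deque

    # A queue that keeps only the three most recent confidence maps; appending
    # a fourth map automatically evicts the oldest one (illustrative sketch).
    confidence_queue = deque(maxlen=3)

    def record_confidence_map(confidence_map):
        confidence_queue.append(confidence_map)

    # Example: after shooting at P1, P2, and P3 the queue holds their three maps;
    # shooting at P4 leaves the maps for P2, P3, and P4 in the queue.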
In one embodiment, the processor 10 receives the camera pose of the camera module CA at position P1 (i.e., the previous position), and calculates a depth map and a confidence map corresponding to position P1. In one embodiment, the processor 10 generates the depth map and the confidence map according to the right-eye image and the left-eye image captured when the camera module CA is located at position P1. In one embodiment, the camera pose of the camera module CA can be expressed by a rotation degree and a translation distance. In an embodiment, the camera module CA can obtain its camera pose in the environmental space through an external tracking system, such as lighthouse technology. The external tracking system can transmit the camera pose of the camera module CA to the processor 10 in a wired or wireless manner.
In step 240, the processor 10 maps at least one pixel position of the first depth map to at least one pixel position of the current depth map according to the previous camera pose and the current camera pose of the current position.
In one embodiment, referring to
In one embodiment, after the processor 10 obtains the depth map F2 corresponding to position P2, the processor 10 attempts to shift or rotate the depth map F2 corresponding to position P2. More specifically, the processor 10 calculates a rotation and translation matrix by means of a conversion formula for calculating a rotation and a translation, according to the camera pose of the camera module CA at position P2 (that is, the other previous camera pose) and the camera pose of the camera module CA at the current position P3 (that is, the current camera pose). At least one pixel position of the depth map F2 is mapped to at least one pixel position of the current depth map F3 by the rotation and translation matrix. In one embodiment, the processor 10 maps the pixel PT1 at the top right corner of the depth map F2 to the pixel PT1 at the top right corner of the current depth map F3. Since the shooting position and camera pose corresponding to the depth map F2 (that is, the other previous camera pose at position P2) differ from the shooting position and camera pose corresponding to the current depth map F3 (that is, the current camera pose at position P3), mapping all pixels of the depth map F2 to the current depth map may produce a deformed mapped depth map MF2.
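A minimal sketch of this mapping is given below, assuming Python with NumPy, a 3x3 pinhole intrinsic matrix K, and camera poses expressed as 4x4 world-to-camera matrices; these names and conventions are assumptions for illustration and are not specified by the present disclosure.

    import numpy as np

    def warp_depth_map(prev_depth, K, prev_pose, curr_pose):
        # Map pixels of a previous depth map into the current view (illustrative).
        # prev_depth: H x W depth map of the previous position.
        # K: 3 x 3 camera intrinsic matrix.
        # prev_pose, curr_pose: 4 x 4 world-to-camera pose matrices.
        h, w = prev_depth.shape
        # Relative rotation-and-translation from the previous camera to the current camera.
        rel = curr_pose @ np.linalg.inv(prev_pose)
        ys, xs = np.mgrid[0:h, 0:w]
        pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(np.float32)
        # Back-project previous pixels to 3D points in the previous camera frame.
        pts_prev = (np.linalg.inv(K) @ pix.T) * prev_depth.reshape(1, -1)
        pts_prev_h = np.vstack([pts_prev, np.ones((1, pts_prev.shape[1]))])
        # Transform into the current camera frame and project back to pixel coordinates.
        pts_curr = (rel @ pts_prev_h)[:3]
        proj = K @ pts_curr
        z = proj[2]
        u = np.round(proj[0] / np.maximum(z, 1e-6)).astype(int)
        v = np.round(proj[1] / np.maximum(z, 1e-6)).astype(int)
        mapped = np.zeros(prev_depth.shape, dtype=np.float32)
        valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # Simple scatter without occlusion handling; pixels outside the view are dropped.
        mapped[v[valid], u[valid]] = z[valid]
        return mapped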
In step 250, the processor 10 compares the confidence value of at least one pixel of the first confidence map with the confidence value of the corresponding pixel of the current confidence map and selects the highest confidence value.
In one embodiment, after the mapped depth maps MF1 and MF2 are generated, the processor 10 can determine which pixel position of the current depth map F3 each pixel in the mapped depth maps MF1 and MF2 maps to. For at least one pixel (for example, pixel PT1) in the current depth map F3, the processor 10 selects the highest confidence value from the confidence value queue. That is, the processor 10 selects the highest confidence value by respectively comparing the confidence value of at least one pixel in the confidence map corresponding to position P1 and the confidence value of at least one pixel in the confidence map corresponding to position P2 with the confidence value of the corresponding pixel of the current confidence map.
In one embodiment, after the mapped depth maps MF1 and MF2 are generated, the processor 10 can know which pixel position of the current depth map F3 each pixel in the mapped depth maps MF1 and MF2 maps to (for example, the pixels in the top right corners of the mapped depth maps MF1 and MF2 and of the current depth map F3 all correspond to the pixel PT1). For each pixel, the processor 10 selects the one with the highest confidence value as the output. For example, as shown in
In step 260, the processor 10 generates an optimized depth map of the current position according to the pixels corresponding to the highest confidence values.
In one embodiment, the processor 10 compares each pixel in the current depth map F3 with the corresponding pixels in the mapped depth maps MF1 and MF2 and individually selects the pixel corresponding to the highest confidence value as the output. For example, the processor 10 selects the pixel PT1 in the top right corner of the depth map F2, which has the highest confidence value, as the output for the pixel PT1 of the current depth map F3. As another example, if the confidence value of the pixel PT2 of the mapped depth map MF1 is 70, the confidence value of the pixel PT2 of the mapped depth map MF2 is 40, and the confidence value of the pixel PT2 of the current depth map F3 is 30, the processor 10 will select the pixel corresponding to the highest confidence value for the pixel PT2 of the current depth map F3. That is, the depth corresponding to the pixel PT2 of the depth map F1 is used as the output (assuming that the pixel PT2 in the current depth map F3 corresponds to the pixel PT2 in each of the mapped depth maps MF1 and MF2). For pixels that do not correspond to any pixel in the mapped depth maps MF1 and MF2, the processor 10 uses the depth of the corresponding pixels in the current depth map F3 as the output. After the processor 10 has compared every pixel in the current depth map F3 and selected the output depth for each pixel, the complete set of output depths is regarded as an optimized depth map.
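The per-pixel selection described above may be sketched as follows (Python with NumPy is assumed for illustration only, and is not required by the present disclosure): the current depth map and the mapped depth maps are stacked together with their confidence maps, and for each pixel the depth whose confidence is highest is kept. Pixels not covered by any mapped depth map simply keep the current depth when the current map is placed first and uncovered pixels carry zero confidence.

    import numpy as np

    def fuse_depth_maps(depth_maps, confidence_maps):
        # Select, per pixel, the depth with the highest confidence (illustrative).
        # depth_maps: list of H x W depth maps, e.g. [current F3, mapped MF1, mapped MF2].
        # confidence_maps: list of H x W confidence maps aligned with depth_maps.
        depth_stack = np.stack(depth_maps)           # shape: (N, H, W)
        conf_stack = np.stack(confidence_maps)       # shape: (N, H, W)
        best = np.argmax(conf_stack, axis=0)         # index of highest confidence per pixel
        optimized = np.take_along_axis(depth_stack, best[None], axis=0)[0]
        return optimized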
In summary, the embodiments of the present invention provide an image processing system and an image processing method, which enable a camera module to refer to the confidence value of each pixel in the current image and the previous images when shooting a low-textured object or a low-light environment. The image processing system and image processing method of the present invention can generate optimized depth information for the current image, and this optimized depth information can be applied to produce a more accurate three-dimensional image.
Although the invention has been illustrated and described with respect to one or more implementations, equivalent alterations and modifications will occur or be known to others skilled in the art upon the reading and understanding of this specification and the annexed drawings. In addition, while a particular feature of the invention may have been disclosed with respect to only one of several implementations, such a feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application.
This application claims the benefit of U.S. Provisional Application No. 62/760,920, filed Nov. 14, 2018, the entirety of which is incorporated by reference herein.