The present disclosure is directed to the technical field of photographing, in particular to a camera system, a mobile terminal and a method for acquiring a three-dimensional (3D) image.
In the field of mobile terminals, there exist monocular (single infrared receiving camera) grating-type structured light solutions applied with a dot projector and pulsed-light solutions applied with a Time of Flight (TOF) sensor. Cameras can be classified into front-facing cameras and rear-facing cameras according to whether their orientation is consistent with the display direction, that is, the direction commonly used by users. At present, front-facing cameras are mainly structured light cameras, and rear-facing cameras are mainly TOF cameras in the applications currently being promoted. Due to the following limitations, it is rare to deploy structured light or TOF cameras in multiple directions (at least front and rear): structured light is suitable for short-range, high-precision 3D photographing, but the intensity of the diffracted light attenuates quickly and suffers strong interference from ambient light at long distances; TOF can ensure 3D recognition accuracy at a certain distance owing to the long-range intensity of pulsed light, but it is challenging for TOF to achieve the same near-distance accuracy as structured light.
Therefore, it is necessary to provide a new camera system, a mobile terminal, and a method for acquiring a 3D image.
Provided are a camera system, a mobile terminal, and a method for acquiring a three-dimensional (3D) image in various embodiments of the present disclosure, which can improve the photographing performance of the camera system.
An embodiment of the present disclosure provides a camera system, which may include a first photographing device, a second photographing device, a photographing assistance device, and a processor, where the photographing assistance device is configured to emit a first feature light to an object to be photographed; the first photographing device is configured to collect a second feature light reflected by the object to be photographed after the first feature light is emitted by the photographing assistance device; the second photographing device includes a main camera and at least one secondary camera, where the main camera is configured to collect a first image of the object to be photographed, and the secondary camera is configured to collect a second image of the object to be photographed; and the processor is configured to acquire depth information of the object to be photographed according to the second feature light, and is further configured to perform feature fusion on the first image and the second image, and perform stereo registration on a result of the feature fusion and the depth information, to acquire a three-dimensional (3D) image of the object to be photographed.
An embodiment of the present disclosure further provides a mobile terminal, which may include a body and a camera system arranged on the body.
An embodiment of the present disclosure further provides a method for acquiring a three-dimensional (3D) image, which may include: emitting a first feature light to an object to be photographed; acquiring a second feature light reflected by the object to be photographed and collected by a first photographing device, a first image of the object to be photographed captured by a main camera, and a second image of the object to be photographed captured by a secondary camera; acquiring depth information of the object to be photographed according to the second feature light; and performing feature fusion on the first image and the second image, and performing stereo registration on a result of the feature fusion and the depth information to acquire a 3D image of the object to be photographed.
One or more embodiments are illustrated by the corresponding figures in the drawings, which do not constitute a limitation of the embodiments. The same reference numerals in the drawings are assigned to the same elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.
Various embodiments of the present disclosure will be described in detail below in conjunction with the drawings to illustrate the purpose, technical scheme, and advantages of the present disclosure. However, it shall be appreciated by those having ordinary skills in the art that many technical details are put forward merely to help the reader understand the present disclosure; the technical solutions claimed in the present disclosure can be practiced even without these technical details, and with various alterations and modifications based on the following embodiments.
An embodiment of the present disclosure relates to a camera system 10. As shown in
The photographing assistance device 3 is configured to emit a first feature light to the object to be photographed. The first photographing device 1 is configured to collect a second feature light reflected by the object to be photographed after the first feature light is emitted by the photographing assistance device 3. The second photographing device 2 includes a main camera 21 and at least one secondary camera 22. The main camera 21 is configured to collect a first image of the object to be photographed, and the secondary camera 22 is configured to collect a second image of the object to be photographed. The processor is configured to acquire depth information of the object to be photographed according to the second feature light, and is further configured to perform feature fusion on the first image and the second image, and perform stereo registration on a result of feature fusion and the depth information to acquire a three-dimensional image of the object to be photographed.
It should be noted that the processor in this embodiment can be arranged in the camera system 10. Alternatively, the processor can be arranged in a mobile terminal having the camera system 10. It is not intended to limit the position of the processor in this embodiment, and the processor can be arranged according to practical requirements.
Compared with the prior art, this embodiment of the present disclosure has the advantage that the processor acquires the depth information of the object to be photographed according to the second feature light collected by the first photographing device, and then fuses the depth information with the images photographed by a plurality of color cameras (i.e., the main camera and at least one secondary camera), so that both static multi-direction (especially forward and backward) three-dimensional recognition or reconstruction and continuous, dynamic three-dimensional recognition and reconstruction can be realized. Thus, the application scenes of the system are diversified, the contents of the images are richer, the imaging is significantly enhanced and improved, and the photographing performance of the camera system is improved.
Please refer to
In particular, the infrared dot projector is configured to project a structured light coded pattern to the object to be photographed. The first photographing device is configured to collect the infrared structured speckles reflected by the object to be photographed after the structured light coded pattern is projected by the infrared dot projector. The processor is configured to acquire the depth information according to the infrared structured speckles. In order to facilitate understanding, the acquisition of the depth information in this embodiment will be illustrated below.
The infrared dot projector modulates fringes programmed or preset by a computer onto the infrared speckles and projects the infrared speckles onto the object to be photographed. The infrared camera is configured to photograph the degree to which the fringes are bent after being modulated by the object, demodulate the bent fringes to acquire the phases, and then convert the phases into the height of the whole field, thereby acquiring the complete depth information of the object to be photographed.
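For illustration only, the following is a minimal sketch of the classic four-step phase-shifting demodulation that the above description alludes to: four fringe images shifted by 90 degrees each are combined to recover the wrapped phase, which is then unwrapped and scaled to height. The synthetic images, the phase-to-height factor `K`, and all names are assumptions made for the example, not the claimed implementation.

```python
import numpy as np

# Four-step phase shifting: I_k = A + B * cos(phi + k * pi / 2), k = 0..3.
h, w = 240, 320
yy, xx = np.mgrid[0:h, 0:w]
height_true = 5.0 * np.exp(-((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / 3000.0)

K = 0.5                   # assumed phase-to-height factor (calibrated in practice)
phi = height_true / K     # phase modulation introduced by the object's surface
A, B = 0.5, 0.4           # background intensity and fringe amplitude
I0, I1, I2, I3 = (A + B * np.cos(phi + k * np.pi / 2) for k in range(4))

# Demodulation: the four-step formula isolates the phase term.
wrapped = np.arctan2(I3 - I1, I0 - I2)          # phase wrapped to (-pi, pi]

# Unwrap along rows, then along columns (sufficient for a smooth surface).
phase = np.unwrap(np.unwrap(wrapped, axis=1), axis=0)

# Convert the phase of the whole field back into height.
height = K * phase
print("max height error:", np.abs(height - height_true).max())
```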
The camera system 10 shown in
Referring to
1. The infrared dot projector projects the structured light coded pattern onto the object to be photographed to mark the features of the object.
2. Two infrared cameras symmetrically disposed on the same baseline respectively acquire the left and right images of the distortion generated when the structured light is projected onto the object to be photographed.
3. Distortion rectification and epipolar rectification are performed on the left and right images according to the stereo calibration information, so that the two images are row-aligned.
4. The same features (gray scale or others) are searched for in the left and right images, and a parallax (disparity) image is output.
5. Based on triangulation and the geometric position of the binocular cameras sharing a common baseline, the depth value is acquired from the parallax-to-depth formula Z = f·B/d (where Z is the depth, f the focal length in pixels, B the baseline, and d the disparity), and depth information with high resolution and precision is generated (a code sketch of this step follows this list).
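The sketch below illustrates steps 4 and 5 using OpenCV's semi-global block matching on synthetic random-texture views; the calibration values `f_px` and `baseline_m` and the matcher parameters are assumptions chosen for the example, not the patented implementation.

```python
import numpy as np
import cv2  # pip install opencv-python

# Simulate a random infrared texture (step 1: the projector adds texture)
# and a right view shifted by a known disparity, as for a flat scene.
rng = np.random.default_rng(0)
left = (rng.random((240, 320)) * 255).astype(np.uint8)
true_disp = 16
right = np.roll(left, -true_disp, axis=1)

# Step 4: search for the same features along rows; output a disparity map.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disp = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM is fixed-point

# Step 5: triangulate. With focal length f (pixels) and baseline B (meters),
# depth Z = f * B / d for each pixel with a valid disparity d.
f_px, baseline_m = 500.0, 0.04          # assumed calibration values
valid = disp > 0
depth = np.zeros_like(disp)
depth[valid] = f_px * baseline_m / disp[valid]
print("median disparity:", np.median(disp[valid]))  # ~16 for this synthetic scene
```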
It is worth mentioning that, in the above process, the participation of the structured light in the binocular depth calculation mainly solves the difficulty of feature calibration in traditional binocular algorithms, since the projected pattern supplies matchable texture even on featureless surfaces.
In this embodiment, after the binocular structured light assembly rotates, the typical effective viewing area spans 0-180 degrees or 0-360 degrees, and the assembly can rotate to any angle within that area. The binocular structured light assembly enhances the texture of the target object, and the binocular positioning is independent of the infrared projector. Thus, the binocular structured light assembly can perform high-precision three-dimensional recognition over a viewing area of 0-180 degrees or 0-360 degrees, and is well suited to static and dynamic scenes and dark environments (video information is collected by the infrared cameras). In this case, the binocular structured light assembly can meet the optical application requirements of mobile terminals by rotating from its default unidirectional arrangement.
Referring to
1. The binocular structured light assembly acquires the depth information of the object to be photographed (binocular parallax analysis and block-matching calculation based on the dual infrared cameras). At the same time, the plurality of color cameras preprocess the target image and fuse the information from two cameras (in practice, two of the plurality of color cameras operate at the same time; usually the high-definition main camera is one of the two, and the information fusion is realized through the calibration between the main camera and the secondary camera), so as to acquire the color information of the object to be photographed.
2. Stereo registration is performed on the color information and the depth information, that is, two or more images captured by different image acquisition devices are matched to the same coordinate system; the main purpose is to determine the spatial coordinate relationship between corresponding points in different images.
3. A three-dimensional (3D) point cloud is formed. A 3D point cloud is formed instead of a depth map or mesh mainly because point cloud data is easy to acquire and store, is discrete and sparse by nature, and readily extends into high-dimensional feature information (a minimal sketch of forming such a colored point cloud follows this list).
4. An artificial intelligence (AI) engine is loaded to classify and segment the 3D point cloud. The data of a 3D point cloud is unordered, and multi-camera acquisition multiplies the noise, which makes it difficult to apply convolution directly to the point cloud data to obtain local correlation information between three-dimensional points. At the same time, the collected point cloud data is likely to be unevenly distributed, with different densities in different areas, which makes it difficult to sample the data points during feature extraction. Therefore, an AI engine is loaded based on the 3D point cloud, and a deep learning approach is utilized, such as learning a cross-transformation from the input points, which is then utilized to simultaneously weight the input features associated with the points and rearrange them into a latent, potentially canonical order; product and summation operations are then performed on the elements, and the 3D point cloud is thus classified and segmented.
5. 3D image recognition or reconstruction is realized. 3D recognition is mainly utilized for secure unlocking and secure payment by users, and 3D reconstruction is mainly utilized in game modeling, virtual reality, and augmented reality.
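To make steps 2 and 3 concrete, here is a minimal sketch, under assumed pinhole intrinsics, of back-projecting a depth map that has already been registered to a color image into a colored 3D point cloud; the intrinsic values and the placeholder depth/color arrays are illustrative assumptions.

```python
import numpy as np

# Assumed pinhole intrinsics of the (already registered) depth/color pair.
fx, fy, cx, cy = 500.0, 500.0, 160.0, 120.0
h, w = 240, 320

# Placeholder stand-ins for a registered depth map (meters) and color image.
depth = np.full((h, w), 0.8, dtype=np.float32)
color = np.zeros((h, w, 3), dtype=np.uint8)

# Back-project each pixel (u, v) with depth Z to camera coordinates:
# X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy.
v, u = np.mgrid[0:h, 0:w]
Z = depth
X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy

valid = Z > 0                                                # drop empty pixels
points = np.stack([X[valid], Y[valid], Z[valid]], axis=1)    # N x 3 geometry
colors = color[valid].astype(np.float32)                     # N x 3 appearance
cloud = np.hstack([points, colors])                          # N x 6 colored cloud
print(cloud.shape)   # discrete, sparse, and easy to extend with more features
```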
It can be understood that in the camera system 10 as shown in
The camera system 10 as shown in
It is worth mentioning that 3D recognition and reconstruction enable the images captured by the camera to reflect the actual state of objects in 3D space as faithfully as possible, that is, to reconstruct a realistic 3D scene from the 2D images captured by the cameras. Such reconstructions are realized by means of the binocular parallax of the binocular structured light assembly (two infrared cameras), the dual-camera calibration between the color cameras, and the stereo registration between the binocular structured light assembly and the color cameras described in the above method, all of which involve the processing of mapping matrices and distortion calibration. In terms of the mapping matrix, whenever two types of cameras or two types of image systems are present, it is necessary to carry out coordinate transformations between the real world and the imaging plane (involving the transformation between world coordinates and camera coordinates, the transformation between camera coordinates and image coordinates, and the transformation between world coordinates and image coordinates). A transformation matrix can substantially include internal parameters (referring to the internal geometric and optical characteristics of a camera; each camera corresponds to unique internal parameters) and external parameters (the position and orientation of the camera in the external world coordinate system (spatial three-dimensional coordinates), or the translated and rotated position of the camera relative to the origin of the world coordinate system). In terms of distortion, lenses are deployed in cameras instead of pinholes in order to improve the luminous flux. The lenses deployed in large numbers today are spherical lenses rather than aspherical lenses fully conforming to the ideal optical system, thus resulting in radial distortion and tangential distortion, which shall be calibrated and eliminated.
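As an illustrative sketch of this projection pipeline (not the disclosure's implementation), the code below maps world points to pixels through assumed external parameters (rotation and translation), lens distortion coefficients, and an internal parameter matrix, using OpenCV's projectPoints; all calibration values are made up for the example.

```python
import numpy as np
import cv2  # pip install opencv-python

# Assumed calibration: internal parameter matrix K, radial/tangential
# distortion (k1, k2, p1, p2, k3), and external parameters (R as a
# Rodrigues rotation vector, plus a translation vector).
K = np.array([[500.0, 0.0, 160.0],
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.1, 0.01, 0.001, 0.001, 0.0])   # radial + tangential terms
rvec = np.array([0.0, 0.1, 0.0])                    # rotation (Rodrigues vector)
tvec = np.array([0.0, 0.0, 0.5])                    # translation (meters)

# World points -> image pixels: world-to-camera (external parameters),
# lens distortion, then camera-to-image (internal parameters).
world_pts = np.array([[0.0, 0.0, 1.0],
                      [0.1, -0.05, 1.2]])
pixels, _ = cv2.projectPoints(world_pts, rvec, tvec, K, dist)
print(pixels.reshape(-1, 2))
```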
Referring to
It can be understood that the binocular structured light and the common-baseline multi-color cameras realize information fusion, which not only enables static multi-direction (especially forward and backward) three-dimensional recognition or reconstruction, but also enables continuous and dynamic three-dimensional recognition and reconstruction, so that the application scenarios are diversified, the content is richer, and the user experience is better. Although single-form 3D image recognition or reconstruction can also achieve more special image features by means of pure digital processing, such processing is substantially “post-processing” by a processor and is limited by the original image acquisition ability of the hardware; a large number of special image effects still cannot be realized, or the effects are poor. For example, digital zoom is limited by the optical zoom performance of the original camera no matter how the zoom is performed. By comparison, the three different forms of three-dimensional image recognition or reconstruction as shown in
Referring to
The infrared laser emitter emits hundreds of thousands of pulsed laser spots, which are diffused evenly onto the object to be photographed. The processor then acquires the depth information of the object according to the difference between the time at which the infrared light is emitted toward the object to be photographed and the time at which the infrared camera receives the infrared light reflected by the object, and combines it with the color information acquired by the color cameras to acquire the 3D information of the object to be photographed.
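The time-of-flight principle described here reduces to depth = c·Δt/2, since the pulse travels to the object and back; the following is a minimal sketch with names assumed for illustration.

```python
# Minimal time-of-flight sketch: depth from the round-trip delay of a pulse.
C = 299_792_458.0  # speed of light, m/s

def tof_depth(t_emit_s: float, t_receive_s: float) -> float:
    """Depth = c * (round-trip time) / 2, as the pulse travels out and back."""
    return C * (t_receive_s - t_emit_s) / 2.0

# Example: a 4 ns round trip corresponds to roughly 0.6 m.
print(tof_depth(0.0, 4e-9))  # ~0.5996 m
```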
It should be noted that the infrared dot projector consumes less power and is more suitable for static scenes, while the infrared laser emitter has lower noise at long distances and a higher frame rate, and is more suitable for dynamic scenes. When the orientation of the camera system 10 is set to front-facing or rear-facing by default, the infrared dot projector is usually oriented to face forward, and the infrared laser emitter is usually oriented to face backward. When the default single orientation is the top, both components can be deployed. When the camera system 10 faces both forward and backward, the infrared dot projector is usually oriented to face forward and the infrared laser emitter is usually oriented to face backward. For rotating applications, it is advisable to deploy an infrared laser emitter to ensure a more balanced multi-direction depth information acquisition performance of the camera system 10 (the intensity of the dot matrix light projected by the infrared dot projector attenuates quickly and is easily interfered with and weakened by typical strong light such as sunlight, so it is only suitable for short-range depth information acquisition in specific directions).
An embodiment of the present disclosure relates to a mobile terminal 100, which is schematically shown in
In particular, the camera system 10 includes a rectangular first side surface 101, on which a first photographing device A, a second photographing device B and a photographing assistance device C are arranged. The centers of the first photographing device A, the second photographing device B and the photographing assistance device C are all located on the midline L of the long side of the first side surface 101.
The camera system 10 is rotatably connected with the body 4. The body 4 includes a first surface 41 on which a display is arranged and a second surface 42 opposite to the first surface 41. A controller (not shown) is arranged within the body 4 and is configured to control the rotation of the camera system 10. The first side surface 101 can rotate at least from facing the same side as the first surface 41 to facing the same side as the second surface 42. With this configuration, the multi-angle photographing demands on the mobile terminal 100 can be met, thus improving the reliability of the mobile terminal 100.
The body 4 includes a top 40 on which the camera system 10 is arranged, and each side of the top 40 is provided with a sliding rail 401. The body 4 further includes a periscope mechanism 43 that is movably connected with the sliding rails 401 and rotatably connected with the camera system 10. The controller is further configured to control the periscope mechanism 43 to move along the sliding rails 401, so that the camera system 10 moves together with the periscope mechanism 43 along its moving direction.
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
To sum up, the 3D recognition assembly on the 3D recognition assembly rotator, whether on the periscope mechanism or on the mobile terminal with the sliding cover, is set by default to a single orientation (facing the front or the back) or to a dual orientation facing both the front and the back. When rotated, the 3D recognition assembly can be oriented at any angle within the 180-degree dome at the top.
Referring to
Referring to
The cover may be formed of a material that is the same as or similar to that of the touch screen cover of the display. A transparent area is formed in the cover at the position corresponding to the effective rotation range of the optical transceiver to receive and transmit optical signals, while the remaining positions are mainly dark or opaque to obscure the user's view. Alternatively, it is also possible to form transparent windows at only a few fixed positions, such as the front, back, or top; in this case, optical signals are sent and received only at those fixed positions. In an embodiment, the shape of the cover is a semicircle close to the running track of the optical transceiver, to avoid signal distortion of the optical transceiver caused by an irregular shape.
Referring to
Referring to
Referring to
In
Through the rotatable binocular structured light assembly, 3D recognition at any angle within the top semicircle of 180 degrees (tablet or sliding-cover type) or 360 degrees (foldable-screen type) of the mobile terminal can be achieved. In terms of some typical applications of the front-facing and rear-facing cameras, the functions of secure unlocking (high-precision face recognition), secure payment (based on high-precision face recognition), selfies, and face beautification can be realized by the front-facing camera(s), while AR, Virtual Reality (VR), and Mixed Reality (MR) applications, and three-dimensional object reconstruction in games, shopping, or the like, can be realized by the rear-facing camera(s). With the addition of multiple color cameras, the application of the camera system can be further expanded. A typical example is that users can experience a continuously rotating color stereo dynamic real-time AR, VR, and MR application environment on the mobile terminal within the range of 180 degrees or 360 degrees, instead of the poor user experience that relies on a virtual reality environment pre-stored within the mobile terminal. In other words, the rotating 3D recognition assembly rotator is intended for a better user experience of AR, VR, and MR in continuous wide stereoscopic scenes, and the binocular structured light scheme is selected for 3D recognition because of its balanced multi-direction application performance.
An embodiment of the present disclosure provides a device and a method for continuously rotating the three-dimensional recognition device of a mobile terminal. The method includes: arranging a continuously rotatable three-dimensional recognition device on the top of the mobile terminal or on a rotating shaft of the mobile terminal; and realizing an application of stereoscopic real-time dynamic virtual reality by loading a local artificial intelligence engine, or an artificial intelligence engine arranged at an edge computing end, onto the three-dimensional image information collected by the continuously rotatable three-dimensional recognition device.
An embodiment of the present disclosure relates to a method for acquiring three-dimensional images. The schematic flow of this embodiment is shown in
At S301, a first feature light is emitted to an object to be photographed.
In an implementation, the first feature light in this embodiment can be a structured light coded pattern projected by an infrared dot projector or a pulsed laser spot emitted by an infrared laser emitter. It is not intended to limit the type of the first feature light in this embodiment, and the type of the first feature light can be set according to practical requirements.
At S302, a second feature light reflected by the object to be photographed and collected by a first photographing device, a first image of the object to be photographed captured by a main camera, and a second image of the object to be photographed captured by a secondary camera are acquired.
When the first feature light is the structured light coded pattern projected by the infrared dot projector, the second feature light is the infrared structured speckle reflected by the object to be photographed. When the first feature light is the pulsed laser spot emitted by the infrared laser emitter, the second feature light is the infrared light reflected by the object to be photographed.
In this embodiment, the main camera is a high-definition camera, the secondary camera is a periscope multi-fold optical zoom camera or a wide-angle camera, and both the first image and the second image captured are color images of the object to be photographed.
In this embodiment, acquiring the second feature light reflected by the object to be photographed and collected by the first photographing device is performed through the following operations: acquiring a first infrared structured speckle reflected by the object to be photographed and collected by the first infrared camera; and acquiring a second infrared structured speckle reflected by the object to be photographed and collected by a second infrared camera. In monocular structured light, a single infrared camera receives the texture information projected onto the object by an infrared dot projector, and the depth information is acquired by calculating the distortion of the texture. Monocular structured light therefore requires an infrared dot projector of very high accuracy, with a corresponding increase in cost, and the diffracted light decays in intensity quickly and is susceptible to serious interference by ambient light. In this embodiment, binocular structured light is utilized instead. In other words, texture information is added to the object to be photographed while the information of the object is collected in a binocular manner, so that the recognition distance and accuracy for the whole object to be photographed (whether at a long distance or a short distance) are significantly improved. Moreover, a general random-texture infrared projector is deployed in the binocular structured light assembly, which is assembled with a general double-lens assembly process, thus greatly simplifying the calibration process, improving the yield and mass-producibility, and providing a relative cost advantage.
At S303, depth information of the object to be photographed is acquired according to the second feature light.
When the first feature light is the structured light coded pattern projected by the infrared dot projector, the infrared camera is configured to photograph the degree to which the fringes are bent after being modulated by the object, demodulate the bent fringes to acquire the phases, and then convert the phases into the height of the whole field, thereby acquiring the complete depth information of the object to be photographed. When the first feature light is a pulsed laser spot emitted by an infrared laser emitter, the depth information of the object is acquired according to the difference between the time at which the infrared light is emitted toward the object to be photographed and the time at which the infrared camera receives the infrared light reflected by the object, in conjunction with the color information acquired by the color cameras.
At S304, feature fusion is performed on the first image and the second image, and stereo registration is performed on a result of feature fusion and the depth information to acquire a three-dimensional image of the object to be photographed.
In an implementation, two or more images captured by different image acquisition devices are matched to the same coordinate system to form a three-dimensional point cloud, and an AI engine is loaded to classify and segment the three-dimensional point cloud, so as to realize three-dimensional image recognition or reconstruction.
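As an illustration of matching data from different acquisition devices into one coordinate system, the sketch below applies an assumed rigid extrinsic transform [R|t] to express depth-camera points in the color camera's frame; the rotation, translation, and point values are made up for the example.

```python
import numpy as np

# Assumed extrinsics between the depth camera and the color camera:
# a small rotation about the Y axis plus a 2 cm horizontal offset.
theta = np.deg2rad(2.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.02, 0.0, 0.0])

# N x 3 points expressed in the depth camera's coordinate system.
pts_depth = np.array([[0.0, 0.0, 0.8],
                      [0.1, -0.05, 1.0]])

# Registration: express the same physical points in the color camera frame,
# so corresponding pixels can be associated across the two devices.
pts_color = pts_depth @ R.T + t
print(pts_color)
```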
It can be understood that the above processes require the powerful computing performance of the processor of the mobile terminal, and the results are finally stored in the memory of the mobile terminal in the form of software. As far as the storage and computing capacity of the mobile terminal is concerned, the processing flow in the static operating mode with limited directions or in the dynamic operating mode with limited frames is within reach. However, when the binocular structured light assembly and the multiple color cameras rotate continuously through 180 degrees or 360 degrees for data processing in 3D dynamic real-time AI with Extended Reality (XR, covering AR, VR, and MR) application environments, it is challenging for the mobile terminal alone to handle all the above processing. In this case, it is necessary to utilize a cloud or Mobile Edge Computing (MEC) platform based on the 5th Generation (5G) wireless network. The mobile terminal then operates in the low-latency, wide-bandwidth Enhanced Mobile Broadband (eMBB) scenario of the 5G mobile communication network, so as to ensure that the cloud platform or MEC platform can be cooperatively integrated with local computing in real time, thereby achieving a good user experience (as shown in
The cloud computing service of a cloud platform is generally located in a cloud data center on the core network side, and the transmission network from users to the data center is under great pressure at service peaks; at such times, the user experience will be extremely poor, or the network may even be inaccessible. The MEC technique upgrades the traditional wireless base station to a smart base station and offloads the computing, network, and storage of the cloud data center from the core network to the distributed base stations. As a bridge between network and service, the MEC technique is a key factor in delivering the wide-bandwidth, low-latency, and localized vertical industry applications related to 5G communication.
As shown in
For the convenience of understanding, the application of interaction between the mobile terminal and the MEC platform in this embodiment will be illustrated below.
1. The mobile terminal establishes a link with the base station and establishes a session with the MEC platform.
2. The binocular structured light assembly and the color cameras of the mobile terminal rotate continuously (for 180 degrees or 360 degrees) to acquire pre-processing information or preliminary fusion information about an object.
3. The dynamic 3D image data generated by continuous rotation is compressed (generally, the image is first decomposed into blocks and then compressed by discrete cosine transform and wavelet transform; a minimal sketch of block-DCT compression follows this list).
4. The mobile terminal transmits data to the base station (from the physical layer to the packet data aggregation layer).
5. The MEC platform analyzes and calculates the compressed data of binocular structured light and multi-color camera pre-processing or fusion information uploaded by the mobile terminal through AI+XR by means of the data analyzing ability thereof to form 3D recognition and reconstruction information (necessary image recovery, as well as image enhancement, are needed during the 3D recognition and reconstruction).
6. A determination is performed as to whether the 3D recognition and reconstruction information is associated with an enterprise network/Internet application related to the capability open channel; if so, further fusion processing is carried out.
7. The 3D recognition and reconstruction information, or the information associated with the enterprise network/Internet application, is returned to the mobile terminal to complete the low-latency localized services.
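To illustrate the block-DCT compression mentioned in step 3 above, here is a minimal sketch assuming 8x8 blocks and a crude keep-the-low-frequencies truncation in place of real quantization and entropy coding; all names and sizes are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn  # pip install scipy

def compress_block(block: np.ndarray, keep: int = 4) -> np.ndarray:
    """2-D DCT of a block, zeroing all but the top-left keep x keep
    low-frequency coefficients (a stand-in for quantization)."""
    coeffs = dctn(block, norm="ortho")
    truncated = np.zeros_like(coeffs)
    truncated[:keep, :keep] = coeffs[:keep, :keep]
    return truncated

def decompress_block(coeffs: np.ndarray) -> np.ndarray:
    return idctn(coeffs, norm="ortho")

# Toy frame: decompose into 8x8 blocks, transform, truncate, reconstruct.
frame = np.random.default_rng(0).random((64, 64))
recon = np.zeros_like(frame)
for i in range(0, frame.shape[0], 8):
    for j in range(0, frame.shape[1], 8):
        block = frame[i:i + 8, j:j + 8]
        recon[i:i + 8, j:j + 8] = decompress_block(compress_block(block))
print("mean abs reconstruction error:", np.abs(frame - recon).mean())
```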
Compared with the prior art, this embodiment of the present disclosure has the advantage that the depth information of the object to be photographed is acquired according to the second feature light collected by the first photographing device, and the depth information is then fused with the images photographed by a plurality of color cameras (i.e., the main camera and at least one secondary camera), so that both static multi-direction (especially forward and backward) three-dimensional recognition or reconstruction and continuous, dynamic three-dimensional recognition and reconstruction can be realized. Thus, the application scenes of the system are diversified, the contents of the images are richer, the imaging is significantly enhanced and improved, and the photographing performance of the camera system is improved.
It shall be understood by those having ordinary skill in the art that the above are some embodiments for implementing the present disclosure, and that in practical application, various alterations in form and details can be made without departing from the scope of the present disclosure.
This application is a national stage filing under 35 U.S.C. § 371 of international application number PCT/CN2021/098750, filed Jun. 7, 2021, which claims priority to Chinese patent application No. 202010622551.7, filed Jun. 30, 2020. The contents of these applications are incorporated herein by reference in their entirety.