This application claims priority to Chinese Patent Application No. 202311411802.7, filed on Oct. 27, 2023, the entire disclosure of which is incorporated herein by reference.
The present disclosure relates to the fields of optical computing, depth detection, and three-dimensional perception technologies, and in particular to an all-optical intelligent computing three-dimensional perception system and an all-optical intelligent computing three-dimensional perception device.
Three-dimensional perception provides a rich representation and understanding of depth information in the physical world, and has enabled unprecedented innovations in various fields, such as the construction industry, virtual reality, and autonomous driving. Humans have long pursued a simple and effective three-dimensional perception technique to better perceive the world.
A first aspect of the present disclosure provides an optical superpixel modeling and training method for a three-dimensional perception for an arbitrarily shaped object, applied to an all-optical intelligent computing three-dimensional perception device, wherein the all-optical intelligent computing three-dimensional perception device comprises: a solid state laser, an encoding diffractive surface, a plurality of optical elements, a decoding diffractive surface, and a photodetector; wherein the solid state laser is configured to generate a laser beam; the encoding diffractive surface consists of a phase modulator based on encoding phase modulation information and is configured to modulate a wave-front phase based on the laser beam to generate a structured light with a spatial pattern varying with depth; the plurality of optical elements are configured to irradiate the generated structured light onto a surface of an object; the decoding diffractive surface consists of a phase modulator based on decoding phase modulation information and is configured to map depth information in the structured light reflected by the object to a light intensity of an output plane; and the photodetector is configured to obtain light intensity information on the output plane for a depth information acquisition to achieve a three-dimensional perception. The method includes: constructing first training data based on a preset first superpixel optimized region; training first neuron data in an encoding diffractive surface and a decoding diffractive surface of a superpixel using the first training data, to determine first target depth information according to depth information in a spatial pattern of a structured light of an encoding superpixel and a reflected optical pattern of the structured light of a decoding superpixel; and mapping the first target depth information to a light intensity of a region of interest (ROI) of an output plane by an intelligently optimized optical connection.
A second aspect of the present disclosure provides a superpixel classification modeling and training method for a robust three-dimensional perception for an arbitrarily shaped object under different illumination and reflectivity conditions, applied to an all-optical intelligent computing three-dimensional perception device, wherein the all-optical intelligent computing three-dimensional perception device comprises: a solid state laser, an encoding diffractive surface, a plurality of optical elements, a decoding diffractive surface, and a photodetector; wherein the solid state laser is configured to generate a laser beam; the encoding diffractive surface consists of a phase modulator based on encoding phase modulation information and is configured to modulate a wave-front phase based on the laser beam to generate a structured light with a spatial pattern varying with depth; the plurality of optical elements are configured to irradiate the generated structured light onto a surface of an object; the decoding diffractive surface consists of a phase modulator based on decoding phase modulation information and is configured to map depth information in the structured light reflected by the object to a light intensity of an output plane; and the photodetector is configured to obtain light intensity information on the output plane for a depth information acquisition to achieve a three-dimensional perception. The method includes: constructing second training data based on a preset second superpixel optimized region; training second neuron data in an encoding diffractive surface and a decoding diffractive surface of a superpixel using the second training data, to determine second target depth information according to depth information in a spatial pattern of a structured light of an encoding superpixel and a reflected optical pattern of the structured light of a decoding superpixel; and classifying the second target depth information into a first preset number of depth intervals, and mapping the first preset number of depth intervals to a second preset number of preset regions on an output plane, to determine a region where a maximum light intensity is located as a classification result.
A third aspect of the present disclosure provides an optical global modeling and training method for an object with priori shape information, applied to an all-optical intelligent computing three-dimensional perception device, wherein the all-optical intelligent computing three-dimensional perception device comprises: a solid state laser, an encoding diffractive surface, a plurality of optical elements, a decoding diffractive surface, and a photodetector; wherein the solid state laser is configured to generate a laser beam; the encoding diffractive surface consists of a phase modulator based on encoding phase modulation information and is configured to modulate a wave-front phase based on the laser beam to generate a structured light with a spatial pattern varying with depth; the plurality of optical elements are configured to irradiate the generated structured light onto a surface of an object; the decoding diffractive surface consists of a phase modulator based on decoding phase modulation information and is configured to map depth information in the structured light reflected by the object to a light intensity of an output plane; and the photodetector is configured to obtain light intensity information on the output plane for a depth information acquisition to achieve a three-dimensional perception. The method includes: constructing third training data based on an optimized region with a global size; training third neuron data in an encoding diffractive surface and a decoding diffractive surface of a superpixel using the third training data, to form the encoding diffractive surface and the decoding diffractive surface according to depth information in a spatial pattern of a structured light of an encoding superpixel and a reflected optical pattern of the structured light of a decoding superpixel; and retrieving a region depth according to a preset demand based on a light intensity of output regions of the encoding diffractive surface and the decoding diffractive surface.
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in combination with the accompanying drawings.
It is noted that the embodiments of the present disclosure and the features in the embodiments may be combined with each other without conflict. The present disclosure is described in detail below in the embodiments with reference to the accompanying drawings.
In order to enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in combination with the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, rather than all of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without inventive effort shall fall within the scope of protection of the present disclosure.
Three-dimensional perception provides a rich representation and understanding of depth information in the physical world, and has enabled unprecedented innovations in various fields, such as the construction industry, virtual reality, and autonomous driving. Humans have long pursued a simple and effective three-dimensional perception technique to better perceive the world. However, the existing three-dimensional perception methods face huge challenges in acquiring three-dimensional images directly using an optical system, because the depth information inherently contained in the propagation of the light field is lost when captured by conventional two-dimensional cameras. Stereo vision methods, for example, use feature matching, computational stereo and deep learning algorithms to estimate disparities between multi-view two-dimensional images to obtain a three-dimensional image. However, complex post-processing imposes a heavy computational burden on the electronic hardware, which limits the speed and applicability of three-dimensional perception, especially as model complexity grows in artificial neural network (ANN) based methods. Compared with relying completely on post-processing, directly intervening in the optical process of three-dimensional perception can reduce the computational burden on electronic hardware. For example, a light detection and ranging (LiDAR) solution mainly uses time of flight (ToF) or frequency modulated continuous wave (FMCW) techniques to realize three-dimensional perception without complex reconstruction algorithms. However, this solution requires precise scanning and accurate time or frequency measurement, which creates a bottleneck for parallel detection. Nowadays, no three-dimensional perception technique is as widely used as the two-dimensional imaging sensor. With the deepening exploration of the physical essence of three-dimensional perception, it is desirable in the present disclosure to directly obtain a depth image through an optical imaging device, that is, to obtain the depth information in an optical manner without post-processing. Benefiting from its advantages in energy efficiency and speed, the field of optical computing has triggered a research boom in recent years. For example, a diffractive surface can transmit light from an input field of view (FoV) to an output FoV with a specific modulation. By modulating light propagation and utilizing its physical properties, diffractive surfaces have realized all-optical linear transformations, image classification, pulse shaping, wave sensing, and logical operations. Using optical computing to promote three-dimensional perception will provide unprecedented possibilities for high-speed and low-power application scenarios, including ultra-fast obstacle avoidance in autonomous driving, low-power unmanned systems, energy-saving smart factories, and the like.
The all-optical intelligent computing three-dimensional perception system, method and device are described below with reference to the accompanying drawings.
It can be understood that the demand for autonomous vehicles has driven huge advances in three-dimensional perception technology. However, existing technologies, such as stereo vision cameras and LiDAR, require complex post-processing or point-by-point scanning, resulting in a heavy computational burden on the hardware and a slow operating speed. The present disclosure provides the all-optical intelligent computing three-dimensional perception system, to realize three-dimensional perception at the speed of light without post-processing. The present disclosure encodes depth information into an optical pattern of custom structured light and decodes it into an output light intensity through an optical interconnection optimized by a diffractive surface. Three-dimensional perception without post-processing is accomplished during the light propagation process on a passive surface, and the captured light intensity directly reflects a depth image.
As illustrated in the accompanying drawings, the all-optical intelligent computing three-dimensional perception system includes a reference light inputting module 111, an all-optical structured light encoding module 112, a structured light illuminating module 113, an all-optical reflected light decoding module 114, and a three-dimensional information acquisition module 115.
The reference light inputting module 111 includes a solid state laser with a working wavelength, and the solid state laser is configured to generate a laser beam.
The all-optical structured light encoding module 112 includes an encoding diffractive surface consisting of a phase modulator based on encoding phase modulation information, and the encoding diffractive surface is configured to modulate a wave-front phase based on the laser beam to generate a structured light with a spatial pattern varying with depth.
The structured light illuminating module 113 includes a plurality of optical elements, and the optical elements are configured to irradiate the generated structured light onto a surface of an object.
The all-optical reflected light decoding module 114 includes a decoding diffractive surface consisting of a phase modulator based on decoding phase modulation information, and the decoding diffractive surface is configured to map depth information in the structured light reflected by the object to a light intensity on an output plane.
The three-dimensional information acquisition module 115 is configured to obtain light intensity information on the output plane by using a photodetector for depth information acquisition to achieve three-dimensional perception.
In detail, the reference light inputting module 111 generates the laser beam by the solid state laser with the working wavelength to provide a working light source for the system. The all-optical structured light encoding module 112 modulates the wave-front phase using the encoding diffractive surface consisting of the phase modulator with the encoding phase modulation information, to generate the structured light with the spatial pattern varying with depth. The structured light illuminating module 113 uses beam splitters and other optical elements to irradiate the generated structured light onto the surface of the object. The all-optical reflected light decoding module 114 uses the decoding diffractive surface consisting of the phase modulator with the decoding phase modulation information to map the depth information in the structured light reflected by the object to the light intensity of the output plane. The light intensity of the output plane directly reflects a depth image, and the three-dimensional information acquisition module 115 obtains the light intensity information on the output plane captured by the photodetector, to realize the depth information acquisition and complete the three-dimensional perception.
It is understood that the intelligent diffractive surface of the present disclosure consists of a phase modulator. The phase modulator may be realized by using a spatial light modulator (SLM) or a phase plate made of a specific material. The phase modulator determines the amplitude modulation coefficient and the phase modulation coefficient of each pixel for the light, and each pixel is an optical neuron. The intelligent diffractive surfaces form a neural network with a specific function through free-space propagation, to realize tasks such as intelligent computation and feature learning.
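As an illustrative numerical sketch only (the scalar angular spectrum propagation model, the toy parameter values, and all function names below are assumptions made for illustration, not the specific implementation of the present disclosure), the behavior of such a system can be simulated by applying a per-pixel complex modulation at each diffractive surface and propagating the field through free space between them:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_pitch, distance):
    """Propagate a complex optical field over `distance` in free space
    using the scalar angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel_pitch)            # spatial frequencies (cycles/m)
    fxx, fyy = np.meshgrid(fx, fx)
    k = 2 * np.pi / wavelength
    # Evanescent components are suppressed by clipping the argument of the square root.
    kz = np.sqrt(np.maximum(0.0, k**2 - (2 * np.pi * fxx)**2 - (2 * np.pi * fyy)**2))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * distance))

def diffractive_surface(field, phase, amplitude=1.0):
    """Each pixel of the surface is an optical neuron applying an amplitude
    and phase modulation to the incident field."""
    return amplitude * np.exp(1j * phase) * field

# Hypothetical toy parameters for illustration only.
wavelength, pitch, n = 700e-9, 8e-6, 256
incident = np.ones((n, n), dtype=complex)                  # collimated laser beam
encode_phase = np.random.uniform(0, 2 * np.pi, (n, n))     # stand-in for trained encoding phases
decode_phase = np.random.uniform(0, 2 * np.pi, (n, n))     # stand-in for trained decoding phases

# Encoding surface -> propagation to the object plane at a given depth.
structured = angular_spectrum_propagate(
    diffractive_surface(incident, encode_phase), wavelength, pitch, distance=0.5)
# Reflection at the object (unit reflectivity assumed) and the return trip.
reflected = angular_spectrum_propagate(structured, wavelength, pitch, distance=0.5)
# Decoding surface -> propagation to the output plane captured by the photodetector.
output = angular_spectrum_propagate(
    diffractive_surface(reflected, decode_phase), wavelength, pitch, distance=0.3)
intensity = np.abs(output)**2                               # |O|^2 read out by the detector
```

In such a simulation, the phase map of each surface plays the role of the trainable optical neurons described above.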
The all-optical intelligent computing three-dimensional perception system enables light-speed and post-processing-free three-dimensional perception via diffractive surfaces. The present disclosure works by encoding depth information into the spatial pattern of custom structured light and decoding it into an output light intensity through optical interconnections optimized by the diffractive surfaces. Unlike methods relying on reconstruction algorithms, a three-dimensional map is immediately recorded using two-dimensional sensors, and the captured light intensity directly represents the depth information.
At step S1, first training data is constructed based on a preset first superpixel optimized region.
At step S2, first neuron data in an encoding diffractive surface and a decoding diffractive surface of a superpixel is trained using the first training data, to determine first target depth information according to depth information in a spatial pattern of a structured light of an encoding superpixel and a reflected optical pattern of the structured light of a decoding superpixel.
At step S3, the first target depth information is mapped to a light intensity of a region of interest (ROI) of an output plane by an intelligently optimized optical connection.
In detail, in the method of the embodiment of the present disclosure, a square region with a suitable size is selected as a superpixel optimized region, on which a suitable training set, a test set and a suitable loss function are established. The amplitude and phase modulation coefficients of each neuron in the superpixel intelligent diffractive surface are trained in combination with an error back propagation and a stochastic gradient descent algorithm, so that an encoding superpixel generates a customized structured light and a decoding superpixel learns the depth information in the spatial pattern. The depth information is linearly mapped to the light intensity of the ROI of the output plane through the intelligently optimized optical connection. Preferably, the training requires that the structured light of the encoding superpixel is confined within a superpixel region and that the output of the decoding superpixel is also confined within the superpixel region, to prevent crosstalk between neighboring superpixels. After a single encoding superpixel and a single decoding superpixel are optimized, they are replicated at any integer multiple and spliced together to form an encoding intelligent diffractive surface and a decoding intelligent diffractive surface.
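A minimal training sketch of this superpixel optimization is given below, assuming a PyTorch-style differentiable simulation; the window sizes, depth range, propagation distances, loss weights, and all names are hypothetical toy values chosen for illustration rather than the actual training configuration of the present disclosure:

```python
import math
import torch

def propagate(field, wavelength, pitch, distance):
    """Differentiable scalar angular-spectrum free-space propagation."""
    n = field.shape[-1]
    fx = torch.fft.fftfreq(n, d=pitch)
    fyy, fxx = torch.meshgrid(fx, fx, indexing="ij")
    k = 2 * math.pi / wavelength
    kz = torch.sqrt(torch.clamp(k**2 - (2 * math.pi * fxx)**2 - (2 * math.pi * fyy)**2, min=0.0))
    transfer = torch.polar(torch.ones_like(kz), kz * distance)
    return torch.fft.ifft2(torch.fft.fft2(field) * transfer)

# Hypothetical sizes: 128x128 simulation window, 64x64 superpixel, 32x32 readout ROI.
N, wl, pitch = 128, 700e-9, 8e-6
sp  = slice(32, 96)          # superpixel region (light must stay inside to avoid crosstalk)
roi = slice(48, 80)          # central ROI whose mean intensity encodes depth
enc = torch.zeros(N, N, requires_grad=True)     # encoding superpixel phases (optical neurons)
dec = torch.zeros(N, N, requires_grad=True)     # decoding superpixel phases
alpha = torch.ones((), requires_grad=True)      # trainable intensity magnification
opt = torch.optim.SGD([enc, dec, alpha], lr=0.05)

incident = torch.zeros(N, N, dtype=torch.complex64)
incident[sp, sp] = 1.0                          # plane wave confined to the superpixel

for step in range(2000):
    z = float(torch.empty(1).uniform_(0.4, 0.8))                       # random training depth
    sl = propagate(incident * torch.polar(torch.ones_like(enc), enc), wl, pitch, z)
    back = propagate(sl, wl, pitch, z)                                 # unit-reflectivity return
    out = propagate(back * torch.polar(torch.ones_like(dec), dec), wl, pitch, 0.3)
    inten = out.abs() ** 2
    loss_depth = (alpha * inten[roi, roi].mean() - z) ** 2             # linear depth-to-intensity map
    conf_sl  = 1 - (sl.abs() ** 2)[sp, sp].sum() / (sl.abs() ** 2).sum()   # confine structured light
    conf_out = 1 - inten[sp, sp].sum() / inten.sum()                       # confine decoded output
    loss = loss_depth + 0.1 * (conf_sl + conf_out)
    opt.zero_grad()
    loss.backward()
    opt.step()
```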
In an embodiment of the present disclosure, the first training data includes the training set, the test set, and the loss function, and the first neuron data includes an amplitude modulation coefficient and a phase modulation coefficient of each neuron.
At step S101, second training data is constructed based on a preset second superpixel optimized region.
At step S102, second neuron data in an encoding diffractive surface and a decoding diffractive surface of a superpixel is trained using the second training data, to determine second target depth information according to depth information in a spatial pattern of a structured light of an encoding superpixel and a reflected optical pattern of the structured light of a decoding superpixel.
At step S103, the second target depth information is classified into a first preset number of depth intervals, and the first preset number of depth intervals are mapped to a second preset number of preset regions on an output plane, to determine a region where a maximum light intensity is located as a classification result.
In detail, in the embodiment of the present disclosure, a square region with a suitable size is selected as a superpixel optimized region, on which a suitable training set, a test set and a suitable loss function are established. Amplitude and phase modulation coefficients of each neuron in a superpixel intelligent diffractive surface are trained in combination with an error back propagation and a stochastic gradient descent algorithm, so that an encoding superpixel generates a customized structured light and a decoding superpixel learns the depth information in the spatial pattern of the reflected structured light. Target depth information is divided into a specific number of depth intervals. The trained superpixel maps the depth intervals to a specific number of predefined regions on the output plane. The region where the maximum light intensity is located is determined as the classification result and represents the quantization depth. After a single encoding superpixel and a single decoding superpixel are optimized, they are replicated at any integer multiple and spliced together to form the encoding intelligent diffractive surface and the decoding intelligent diffractive surface.
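As an illustrative sketch only (the region layout, depth intervals, and names below are hypothetical), the classification readout described above amounts to summing the light intensity over each predefined output region and taking the interval associated with the region of maximum intensity as the quantized depth:

```python
import numpy as np

def quantize_depth(intensity, regions, depth_intervals):
    """Pick the predefined output region with the maximum summed intensity
    and return the centre of the depth interval it represents."""
    energies = [intensity[r0:r1, c0:c1].sum() for (r0, r1, c0, c1) in regions]
    k = int(np.argmax(energies))                      # classification result
    lo, hi = depth_intervals[k]
    return 0.5 * (lo + hi)                            # quantized depth estimate

# Hypothetical example: 4 depth intervals mapped to 4 quadrants of a 64x64 output plane.
regions = [(0, 32, 0, 32), (0, 32, 32, 64), (32, 64, 0, 32), (32, 64, 32, 64)]
depth_intervals = [(0.4, 0.5), (0.5, 0.6), (0.6, 0.7), (0.7, 0.8)]    # meters
intensity = np.random.rand(64, 64)                    # stand-in for the captured |O|^2
print(quantize_depth(intensity, regions, depth_intervals))
```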
In an embodiment of the present disclosure, the second training data includes the training set, the test set, and the loss function, and the second neuron data includes an amplitude modulation coefficient and a phase modulation coefficient of each neuron.
At step S10, third training data is constructed based on an optimized region with a global size.
At step S20, third neuron data in an encoding diffractive surface and a decoding diffractive surface of a superpixel is trained using the third training data, to form the encoding diffractive surface and the decoding diffractive surface according to depth information in a spatial pattern of a structured light of an encoding superpixel and a reflected optical pattern of the structured light of a decoding superpixel.
At step S30, a region depth is retrieved according to a preset demand based on a light intensity of output regions of the encoding diffractive surface and the decoding diffractive surface.
In detail, in the embodiment of the present disclosure, a region with a global size is selected as an optimized region, on which a suitable training set having priori shape information, a test set and a suitable loss function are established. Amplitude and phase modulation coefficients of each neuron in an intelligent diffractive surface are trained in combination with an error back propagation and a stochastic gradient descent algorithm, so that an encoding superpixel generates a customized structured light and a decoding superpixel learns the depth information in the spatial pattern of the reflected structured light, so as to generate the encoding intelligent diffractive surface and the decoding intelligent diffractive surface. The region depth can be retrieved according to a demand from the light intensity of an output region or from the location of the maximum light intensity in the output region.
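The two readout options mentioned above can be sketched as follows (an illustrative sketch under assumed names; the scaling factor and the row-to-depth lookup are hypothetical):

```python
import numpy as np

def region_depth_from_intensity(intensity, region, alpha):
    """Linear readout: the mean intensity of an output region, scaled by the
    trained magnification alpha, is taken directly as the region depth."""
    r0, r1, c0, c1 = region
    return alpha * intensity[r0:r1, c0:c1].mean()

def region_depth_from_peak(intensity, region, depth_of_row):
    """Positional readout: the row of the maximum intensity inside the region
    is mapped to depth through a precomputed lookup table `depth_of_row`."""
    r0, r1, c0, c1 = region
    patch = intensity[r0:r1, c0:c1]
    peak_row = np.unravel_index(np.argmax(patch), patch.shape)[0]
    return depth_of_row[r0 + peak_row]
```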
In an embodiment of the present disclosure, the third training data includes the training set, the test set, and the loss function, and the third neuron data includes an amplitude modulation coefficient and a phase modulation coefficient of each neuron.
It is understood that the intelligent diffractive surface consists of a phase modulator. The phase modulator may be realized by using a spatial light modulator (SLM) or a phase plate made of a specific material. The phase modulator determines the amplitude modulation coefficient and the phase modulation coefficient of each pixel for the light, and each pixel is an optical neuron. The intelligent diffractive surfaces form a neural network with a specific function through free-space propagation, to realize tasks such as intelligent computation and feature learning.
To ensure that the light intensity of an output optical field O is linearly correlated with depth, the phase modulation parameters of the encoding and decoding surfaces of the present disclosure are optimized. In the present disclosure, for the solution of the three-dimensional perception for objects with priori shape information, a mean square error (MSE) between |O|² and a true depth image O^gt is used as the loss function:
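In an example, such an MSE loss may be formulated as

$$\mathcal{L}_{1}=\frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\alpha\,\lvert O_{i,j}\rvert^{2}-O^{gt}_{i,j}\right)^{2},$$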
where α is a trainable parameter used to adjust the magnification of the light intensity, i and j are image pixel indexes, and N is the number of pixels in a row or column of the image.
The present disclosure highlights a central region of the output field O for depth estimation. The width and height of the central region are half of those of the superpixel region (75% in the classification solution); this central region is denoted as RoI2, and the remaining region is used for light propagation. The loss function is defined as:
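In an example, this loss may be formulated over the RoI2 as

$$\mathcal{L}_{2}=\frac{1}{N_{\mathrm{RoI2}}^{2}}\sum_{(i,j)\in\mathrm{RoI2}}\left(\alpha\,\lvert O_{i,j}\rvert^{2}-O^{gt}_{i,j}\right)^{2},$$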
where α is a trainable parameter used to adjust the magnification of the light intensity, and N_RoI2 is the number of pixels in a row or column of the RoI2. In order to limit light propagation within a superpixel region (denoted as RoI3), the loss function is defined as:
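In an example, such a confinement loss may be formulated as an energy ratio, e.g.,

$$\mathcal{L}_{3}=1-\frac{\sum_{(i,j)\in\mathrm{RoI3}}\lvert O_{i,j}\rvert^{2}}{\sum_{i,j}\lvert O_{i,j}\rvert^{2}},$$

which penalizes any light intensity falling outside the RoI3.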
A two-stage scheme is used, i.e., the output field O and the structured light SL are constrained simultaneously:
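In an example, the structured light may be expressed and constrained as

$$SL=P_{d}\left(t_{en}\cdot I\right),\qquad \mathcal{L}_{SL}=1-\frac{\sum_{(i,j)\in\mathrm{RoI3}}\lvert SL_{i,j}\rvert^{2}}{\sum_{i,j}\lvert SL_{i,j}\rvert^{2}},$$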
where P_d is a free-space diffraction propagation operator over a distance d, t_en is the transmission function of the encoding diffractive surface, and I is the incident light field.
Therefore, the total loss function is:
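In an example, the total loss may be formulated as a weighted combination of the above terms, e.g.,

$$\mathcal{L}_{\mathrm{total}}=\lambda_{1}\,\mathcal{L}_{2}+\lambda_{2}\,\mathcal{L}_{3}+\lambda_{3}\,\mathcal{L}_{SL},$$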
where λ1, λ2 and λ3 are weights whose sum is 1.

Therefore, in examples of the present disclosure, a superpixel optimization method is used when facing the three-dimensional perception for arbitrarily shaped objects in a general scene. The superpixel optimization method involves selecting a square region with a suitable size as a depth perceiving unit, and training the phase modulation parameters of the encoding and decoding superpixels by the error back propagation algorithm, to limit the propagation of the structured light generated by the encoding superpixel to the superpixel region and simultaneously limit the output of the decoding superpixel to the superpixel region, thereby preventing crosstalk between neighboring superpixels. Meanwhile, the decoding superpixel linearly maps the depth information to the light intensity of the ROI of the output plane via the intelligently optimized optical connection. After a single encoding superpixel and a single decoding superpixel are optimized, they are replicated at any integer multiple and spliced together to form the encoding intelligent diffractive surface and the decoding intelligent diffractive surface.

In the examples of the present disclosure, the superpixel classification method is used when facing the three-dimensional perception for arbitrarily shaped objects under different illumination and reflectivity conditions. The target depth range is divided into a specific number of depth intervals. The trained superpixel maps the depth intervals to a number of predefined regions on the output plane. The region where the maximum light intensity is located is determined as the classification result and represents the quantization depth. Similarly, structured light propagation is constrained within each superpixel region. A single superpixel is replicated and spliced to form the encoding intelligent diffractive surface and the decoding intelligent diffractive surface.

The global optimization method is used when facing a scenario with priori shape information. The global optimization method takes an entire diffractive surface as an optimization object, and trains the phase modulation parameters of the intelligent diffractive surface by the error back propagation algorithm, to optically map the depth information of the entire scenario to the light intensity of the output plane. In the present disclosure, the FoV can be increased by horizontally expanding the encoding and decoding diffractive surfaces, and the three-dimensional perception performance can also be enhanced by vertically stacking additional diffractive surface layers.
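The replication and splicing of a single optimized superpixel into a full diffractive surface amounts to tiling the trained phase map by integer factors, for example (an illustrative sketch with hypothetical sizes and names):

```python
import numpy as np

def tile_superpixel(superpixel_phase, reps_y, reps_x):
    """Replicate a single optimized superpixel phase map by integer factors and
    splice the copies into a full diffractive-surface phase map."""
    return np.tile(superpixel_phase, (reps_y, reps_x))

# e.g. a trained 64x64 encoding superpixel spliced into a 1024x1024 encoding surface
encoding_superpixel = np.random.uniform(0, 2 * np.pi, (64, 64))   # stand-in for trained phases
encoding_surface = tile_superpixel(encoding_superpixel, 16, 16)
assert encoding_surface.shape == (1024, 1024)
```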
In conclusion, the method of the present disclosure is distinguished from the methods using ToF or FMCW in LiDAR solutions. The method works by encoding the depth information into the spatial pattern of the customized structured light and decoding it into the output light intensity by the optical interconnection optimized by the diffractive surfaces. Different from the reconstruction algorithms, three-dimensional maps are immediately recorded using two-dimensional sensors, and the depth information is directly represented through the captured light intensity.
In the present disclosure, a customized superpixel model is developed, which independently manipulates light propagation in each superpixel to achieve three-dimensional perception for arbitrarily shaped objects. The encoding and decoding surfaces are co-optimized by deep learning to achieve the best three-dimensional perception performance. The diffractive surfaces are produced by simple three-dimensional printing or lithography, and are assembled into a three-dimensional imager. The present disclosure can be flexibly designed for different depth ranges. The encoding and decoding surfaces can be cascaded horizontally for a larger FoV and stacked vertically for better performance. Based on this scalable architecture, the present disclosure has been successfully applied in a variety of scenarios, including high-resolution three-dimensional perception scenarios with priori knowledge, three-dimensional perception for arbitrarily shaped objects, robust three-dimensional perception under different illumination and reflection conditions, and all-optical obstacle avoidance in actual autonomous driving tasks. In the present disclosure, by accomplishing three-dimensional perception using passive diffractive surfaces and consuming no energy other than optical power, the computational load is reduced and a light-speed, power-efficient solution is provided for depth perception. In the present disclosure, the device is able to work at a speed of 600 Hz (limited by the camera frame rate) and a power of 0.3 μW. Compared with the typical 30 Hz and watt-level power of conventional methods, a 20-fold improvement in speed and a 6 orders of magnitude improvement in energy efficiency are achieved in the present disclosure.
In conclusion, an all-optical neural network system for depth detection is established in the present disclosure, which exhibits excellent performance over electronic networks with the same number of parameters (taking the U-Net structure as an example). The method and device in the present disclosure work well at long distances (up to 6 meters), low power (as low as 0.3 μW), high speed (up to 600 Hz), and high transmittance, proving their performance advantages over traditional depth perception methods such as LiDAR.
The method based on the all-optical intelligent computing three-dimensional perception system in the present disclosure enables light-speed and post-processing-free three-dimensional perception via diffractive surfaces. The present disclosure works by encoding the depth information into the spatial pattern of customized structured light and decoding it into the output light intensity through optical interconnections optimized by diffractive surfaces. Unlike the reconstruction algorithms, three-dimensional maps are immediately recorded using two-dimensional sensors, and the captured light intensity directly represents the depth information. The present disclosure proposes a new idea of three-dimensional perception, which greatly simplifies the three-dimensional perception process and supports the next generation of high-speed, low-power three-dimensional perception technology.
In order to realize the above embodiments, as illustrated in the accompanying drawings, the present disclosure further provides an all-optical intelligent computing three-dimensional perception device, which includes a solid state laser 100, an encoding diffractive surface 200, a plurality of optical elements 300, a decoding diffractive surface 400, and a photodetector 500.
The solid state laser 100 is configured to generate a laser beam.
The encoding diffractive surface 200 consists of a phase modulator based on encoding phase modulation information, and is configured to modulate a wave-front phase based on the laser beam to generate a structured light with a spatial pattern varying with depth.
The plurality of optical elements 300 are configured to irradiate the generated structured light onto a surface of an object.
The decoding diffractive surface 400 consists of a phase modulator based on decoding phase modulation information, and is configured to map depth information in the structured light reflected by the object to a light intensity of an output plane.
The photodetector 500 is configured to obtain light intensity information on the output plane for depth information acquisition to achieve three-dimensional perception.
The intelligent diffractive surface consists of a phase modulator. The phase modulator can be realized by methods such as a spatial light modulator (SLM) or a phase plate made of a specific material. The phase modulator determines the amplitude modulation coefficient and the phase modulation coefficient of each pixel for the light. Each pixel is an optical neuron. The intelligent diffractive surfaces can form a neural network with a specific function through free-space propagation to realize tasks such as intelligent computation and feature learning.
With the all-optical intelligent computing three-dimensional perception device of the embodiment of the present disclosure, light-speed and post-processing-free three-dimensional perception can be achieved through diffractive surfaces. The present disclosure works by encoding the depth information into the spatial pattern of the custom structured light and decoding it into the output light intensity through optical interconnections optimized by the diffractive surfaces. Unlike the use of reconstruction algorithms, three-dimensional maps are immediately recorded using two-dimensional sensors, and the captured light intensity directly represents the depth information. The present disclosure provides a new idea for three-dimensional perception, which greatly simplifies the three-dimensional perception process and supports the next generation of high-speed, low-power three-dimensional perception technology.
In the present disclosure, the reference terms "an embodiment", "some embodiments", "example", "specific example", "some examples" and the like mean that specific features, structures, materials, or characteristics described in combination with the embodiment or example are included in at least one embodiment or example of the present disclosure. In the present disclosure, the schematic expressions of the above terms do not have to be directed to the same embodiments or examples. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, without contradicting each other, those skilled in the art may combine different embodiments or examples described in the present disclosure and features of different embodiments or examples.
In addition, terms such as “first” and “second” are used herein for purposes of description and are not intended to indicate or imply relative importance or to imply the number of technical features indicated. The feature defined with “first” or “second” may explicitly or implicitly include at least one of these features. In the description of the present disclosure, “a plurality of” means at least two, for example, two or three, unless specified otherwise.