The technical field generally relates to an optical deep learning physical architecture or platform that can perform, at the speed of light, various complex functions and tasks that current computer-based neural networks can implement. The optical deep learning physical architecture or platform has applications in image analysis, feature detection, object classification, camera designs, and other optical components that can learn to perform unique functions or tasks.
Deep learning is one of the fastest-growing machine learning methods. It uses multi-layered artificial neural networks implemented in a computer to digitally learn data representation and abstraction, and to perform advanced tasks at a level comparable to, or even superior to, that of human experts. Recent examples where deep learning has made major advances in machine learning include medical image analysis, speech recognition, language translation, and image classification, among others. Beyond some of these mainstream applications, deep learning methods are also being used for solving inverse imaging problems.
Optics in machine learning has been widely explored due to its unique advantages, encompassing power efficiency, speed and scalability. Yu et al., for example, describe different types of optical neural networks that are formed as liquid-crystal television (LCTV)-based optical neural networks, compact optical neural networks, mirror-array interconnected neural networks, and optical disk-based neural networks. See Yu et al., Optical Neural Networks: Architecture, Design and Models. In Progress in Optics; Wolf, E., Ed., Elsevier, 1993, Vol. 32, pp 61-144. Some of the earlier work includes optical implementations of various neural network architectures to perform specific tasks. For example, Javidi et al. describe the optical implementation of neural networks for face recognition by the use of nonlinear joint transform correlators. See Javidi et al., Optical Implementation of Neural Networks for Face Recognition by the Use of Nonlinear Joint Transform Correlators. Appl. Opt. 1995, 34 (20), 3950-3962. Optical-based neural networks have the advantage that they can perform various complex functions at the speed of light.
In one embodiment, an all-optical deep learning framework or architecture is disclosed where the neural network is physically formed by multiple layers of diffractive surfaces that work in collaboration with one another to optically perform an arbitrary function that the digital version of the network can statistically learn. Thus, while the inference/prediction of the physical network is all-optical, the learning part that leads to the design of the physical network embodiment is done through a computer. This framework is sometimes described herein as a Diffractive Deep Neural Network (D2NN), and its inference capabilities are demonstrated through both simulations and experiments. A D2NN can be physically created by using several transmissive and/or reflective substrate layers, where individual points or small regions located on a given physical layer either transmit or reflect the incoming wave, each representing an artificial “neuron” that is connected to other “neurons” of the subsequent or following layers through optical diffraction. A D2NN encompasses structures that have only transmissive substrate layers, only reflective substrate layers, as well as combinations of transmissive and reflective substrate layers.
In one embodiment, the artificial neurons are created by physical features that are formed on a surface of or within a substrate. These physical features may be used to alter the phase and/or amplitude of the light wave that is transmitted through or reflected by the substrate. In some embodiments, the physical features that form the various neurons that exist in a given layer may include different thicknesses of material used in the substrate. In other embodiments, the physical features used to form the neurons may include different material compositions or material properties formed at discrete locations in the substrate. These different physical features that form the physical “neurons” in the substrate may be formed, in some embodiments, as an array of discrete regions or areas that are located across the two- or three-dimensional surface of the physical substrate layers. In one particular embodiment, the physical features are created by additive manufacturing techniques such as 3D printing, but it should be appreciated that other techniques such as lithography or the like may be used to generate the “neurons” in the different layers.
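As a hedged illustration of how a thickness-based physical feature can encode a phase value, the sketch below uses the standard thin-layer relation, phase = 2π·(n_material − n_medium)·h/λ, and neglects absorption and surface reflections. The function name, the wavelength (0.75 mm), the refractive index (1.7) and the base thickness (0.5 mm) are illustrative assumptions and are not values taken from this description.

```python
import numpy as np

def phase_to_thickness(phase, wavelength, n_material, n_medium=1.0, base=0.0):
    """Map desired phase delays (radians) to feature thickness (same unit as wavelength).

    Assumes a thin, lossless layer: phase = 2*pi*(n_material - n_medium)*h / wavelength.
    Phases are wrapped into [0, 2*pi) so the added relief never exceeds one wave.
    """
    wrapped = np.mod(phase, 2.0 * np.pi)
    delta_n = n_material - n_medium
    return base + wrapped * wavelength / (2.0 * np.pi * delta_n)

# Hypothetical values chosen only for illustration.
phases = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
heights = phase_to_thickness(phases, wavelength=0.75e-3, n_material=1.7, base=0.5e-3)
print(heights)  # thickness in meters for each neuron
```

The `base` offset stands in for a uniform supporting thickness; it adds the same phase to every neuron of a layer and therefore does not change the relative phase pattern.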
In one embodiment, an all-optical diffractive deep neural network device includes a plurality of optically transmissive substrate layers arranged in an optical path, each of the plurality of optically transmissive substrate layers including a plurality of physical features formed on or within the plurality of optically transmissive substrate layers and having different complex-valued transmission coefficients as a function of lateral coordinates across each substrate layer, wherein the plurality of optically transmissive substrate layers and the plurality of physical features thereon collectively define a trained mapping function between an input optical image or input optical signal to the plurality of optically transmissive substrate layers and an output optical image or output optical signal created by optical diffraction through the plurality of optically transmissive substrate layers. The device includes one or more optical sensors configured to capture the output optical image or output optical signal resulting from the plurality of optically transmissive substrate layers.
In another embodiment, an all-optical diffractive deep neural network device includes a plurality of optically reflective substrate layers arranged along an optical path, each of the plurality of optically reflective substrate layers including a plurality of physical features formed on or within the plurality of optically reflective substrate layers, wherein the plurality of optically reflective substrate layers and the plurality of physical features collectively define a trained mapping function between an input optical image or input optical signal to the plurality of optically reflective substrate layers and an output optical image or output optical signal from the plurality of optically reflective substrate layers. The device includes one or more optical sensors configured to capture the output optical image or output optical signal from the plurality of optically reflective substrate layers.
In another embodiment, an all-optical diffractive deep neural network device includes a plurality of substrate layers positioned along an optical path, the plurality of substrate layers having one or more optically reflective substrate layers and one or more optically transmissive substrate layers including a plurality of physical features formed on or within the respective optically reflective substrate layer(s) and optically transmissive substrate layer(s), wherein the plurality of substrate layers collectively define a trained mapping function between an input optical image or input optical signal to the plurality of substrate layers and an output optical image or output optical signal from the plurality of substrate layers. The device further includes one or more optical sensors configured to capture the output optical image or output optical signal from the plurality of substrate layers.
In another embodiment, an all-optical diffractive deep neural network device includes a plurality of substrate layers positioned along an optical path, the plurality of substrate layers having one or more optically reflective substrate layers and/or optically transmissive substrate layers including a plurality of physical features formed on or within the respective optically reflective substrate layer(s) and/or optically transmissive substrate layer(s), at least one of the plurality of substrate layers including spatial light modulator(s) therein or thereon, wherein the plurality of substrate layers collectively define a trained mapping function between an input optical image or input optical signal to the plurality of substrate layers and an output optical image or output optical signal from the plurality of substrate layers. The device includes one or more optical sensors configured to capture the output optical image or output optical signal from the plurality of substrate layers.
In another embodiment, a method of forming an all-optical multi-layer diffractive network includes training a software-based deep neural network to perform one or more specific optical functions for a multi-layer transmissive and/or reflective network having a plurality of optically diffractive or optically reflective physical features located in different two dimensional locations in each of the layers of the network, wherein the training comprises feeding an input layer of the multi-layer network with training images or training optical signals and computing an optical output of the network through optical transmission and/or reflection through the multi-layer network and iteratively adjusting complex-valued transmission and/or reflection coefficients for each layer of the network until optimized transmission/reflection coefficients are obtained. A physical embodiment of the multi-layer transmissive or reflective network is then manufactured that includes a plurality of substrate layers having physical features that match the optimized transmission/reflection coefficients obtained by the trained deep neural network.
In another embodiment, a method of using an all-optical multi-layer transmissive and/or reflective network includes providing a multi-layer transmissive and/or reflective network having a plurality of substrate layers positioned along an optical path, the plurality of substrate layers including one or more optically reflective and/or optically transmissive substrate layers, wherein the plurality of substrate layers collectively define a trained mapping function between an input optical image or input optical signal to the plurality of substrate layers and an output optical image or output optical signal from the plurality of substrate layers. An object is illuminated with a light source to create the input optical image or input optical signal that is directed to the plurality of substrate layers positioned along the optical path. The output optical image or output optical signal is captured from the plurality of substrate layers with one or more optical sensors.
In another embodiment, a hybrid optical and electronic neural network-based system includes an all-optical front-end having a plurality of optically transmissive substrate layers arranged in an optical path, each of the plurality of optically transmissive substrate layers including a plurality of physical features formed on or within the plurality of optically transmissive substrate layers having different complex-valued transmission coefficients as a function of lateral coordinates across each substrate layer, wherein the plurality of optically transmissive substrate layers and the plurality of physical features collectively define a trained mapping function between an input optical image or input optical signal to the plurality of optically transmissive substrate layers and an output optical image or output optical signal created by optical diffraction through the plurality of optically transmissive substrate layers. The system includes one or more optical sensors configured to capture the output optical image or output optical signal resulting from the plurality of optically transmissive substrate layers. The system further includes a trained, digital neural network configured to receive as an input the output optical image or output optical signal resulting from the plurality of optically transmissive substrate layers and output a final output optical image or final output optical signal.
In another embodiment, a hybrid optical and electronic neural network-based system includes an all-optical front-end having a plurality of optically reflective substrate layers arranged along an optical path, each of the plurality of optically reflective substrate layers including a plurality of physical features, wherein the plurality of optically reflective substrate layers and the plurality of physical features thereon collectively define a trained mapping function between an input optical image or input optical signal to the plurality of optically reflective substrate layers and an output optical image or output optical signal from the plurality of optically reflective substrate layers. The system includes one or more optical sensors configured to capture the output optical image or output optical signal from the plurality of optically reflective layers. The system further includes a trained digital neural network configured to receive as an input the output optical image or output optical signal resulting from the plurality of optically reflective substrate layers and output a final output optical image or final output optical signal.
In another embodiment, a method of forming an optical-based multi-layer deep neural network includes training a software-based deep neural network to perform a specific function or task using a multi-layer transmissive and/or reflective network having a plurality of neurons located in each of the layers, wherein the training comprises feeding an input layer of the multi-layer network with training images or signals and computing an output of the network through optical transmission and/or reflection through the multi-layer network and iteratively adjusting the complex-valued transmission and/or reflection coefficients for the neurons of each layer of the network until optimized transmission/reflection coefficients are obtained, wherein the optimized transmission/reflection coefficients are obtained by parameterization of neuron transmission and/or reflection values and error back-propagation. A physical embodiment of the multi-layer transmissive and/or reflective network is then manufactured that includes a plurality of substrate layers having physical features corresponding to the neurons that match the optimized transmission/reflection coefficients obtained by the trained deep neural network.
While
Each substrate layer 16 of the D2NN 10 has a plurality of physical features 18 formed on the surface of the substrate layer 16 or within the substrate layer 16 itself that collectively define a pattern of physical locations along the length and width of each substrate layer 16 that have varied complex-valued transmission coefficients (or varied complex-valued transmission reflection coefficients for the embodiment of
The pattern of physical locations formed by the physical features 18 may define, in some embodiments, an array located across the surface of the substrate layer 16. With reference to
As seen in
Alternatively, the complex-valued transmission function of a neuron 24 can also be engineered by using metamaterial or plasmonic structures. Combinations of all these techniques may also be used. In other embodiments, non-passive components such as spatial light modulators (SLMs) may be incorporated into the substrates 16. SLMs are devices that impose spatially varying modulation of the phase, amplitude, or polarization of light. SLMs may include optically addressed SLMs and electrically addressed SLMs. Electric SLMs include liquid crystal-based technologies that are switched by using thin-film transistors (for transmission applications) or silicon backplanes (for reflective applications). Another example of an electric SLM includes magneto-optic devices that use pixelated crystals of aluminum garnet switched by an array of magnetic coils using the magneto-optical effect. Additional electronic SLMs include devices that use nanofabricated deformable or moveable mirrors that are electrostatically controlled to selectively deflect light.
As noted above, the particular spacing of the substrates 16 that make the D2NN 10 may be maintained using the holder 30 of
The D2NN front-end 42 is the same as the D2NN 10 described herein. The D2NN front-end 42 may operate in transmission mode like that illustrated in
Operation 330 illustrates the step of manufacturing, or having manufactured, the physical embodiment of the D2NN front-end 42 in accordance with the design. The design, in some embodiments, may be embodied in a software format (e.g., SolidWorks, AutoCAD, Inventor, or other computer-aided design (CAD) program or lithographic software program) and may then be manufactured into a physical embodiment that includes the plurality of substrates 16. The physical substrate layers 16, once manufactured, may be mounted or disposed in a holder 30 such as that illustrated in
Experimental—All-Optical D2NN
D2NN Architecture. Experiments were conducted using a transmission-based D2NN as illustrated in
After this learning phase, the D2NN 10 design is fixed, and once it is fabricated (e.g., 3D-printed or the like), the physical D2NN 10 manifestation performs the learned function or task at the speed of light.
Wave analysis in a D2NN.
Following the Rayleigh-Sommerfeld diffraction equation, one can consider every single neuron 24 of a given D2NN substrate layer 16 as a secondary source of a wave that is composed of the following optical mode:
$w_i^l(x, y, z)=\frac{z-z_i}{r^2}\left(\frac{1}{2\pi r}+\frac{1}{j\lambda}\right)\exp\left(\frac{j2\pi r}{\lambda}\right)$, (1)
where l represents the l-th layer of the network, i represents the i-th neuron located at $(x_i, y_i, z_i)$ of layer l, λ is the illumination wavelength, $r=\sqrt{(x-x_i)^2+(y-y_i)^2+(z-z_i)^2}$ and $j=\sqrt{-1}$. The amplitude and relative phase of this secondary wave are determined by the product of the input wave to the neuron 24 and its transmission coefficient (t), both of which are complex-valued functions. Based on this, for the l-th layer of the network, one can write the output function ($n_i^l$) of the i-th neuron located at $(x_i, y_i, z_i)$ as:
$n_i^l(x, y, z)=w_i^l(x, y, z)\cdot t_i^l(x_i, y_i, z_i)\cdot\sum_k n_k^{l-1}(x_i, y_i, z_i)=w_i^l(x, y, z)\cdot|A|\cdot e^{j\Delta\theta}$, (2)
where $m_i^l(x_i, y_i, z_i)=\sum_k n_k^{l-1}(x_i, y_i, z_i)$ defines the input wave to the i-th neuron of layer l, |A| refers to the relative amplitude of the secondary wave, and Δθ refers to the additional phase delay that the secondary wave encounters due to the input wave to the neuron 24 and its transmission coefficient. These secondary waves diffract between the substrate layers 16 and interfere with each other, forming a complex wave at the surface of the next layer and feeding its neurons 24. The transmission coefficient of a neuron 24 is composed of amplitude and phase terms, i.e., $t_i^l(x_i, y_i, z_i)=a_i^l(x_i, y_i, z_i)\exp(j\phi_i^l(x_i, y_i, z_i))$, and for a phase-only D2NN 10 architecture the amplitude $a_i^l(x_i, y_i, z_i)$ is assumed to be a constant, ideally 1, ignoring the optical losses, which are addressed herein. In general, a complex-valued modulation at each substrate layer 16 improves the inference performance of the diffractive network (see e.g.,
Through deep learning, the phase values of the neurons 24 of each substrate layer 16 of the diffractive network are iteratively adjusted (trained) to perform a specific function or task by feeding training data at the input layer and then computing the network's output through optical diffraction. Based on the calculated error with respect to the target output, determined by the desired function, the network structure and its neuron phase values are optimized using an error back-propagation algorithm, which is based on the stochastic gradient descent approach used in conventional deep learning.
Compared to standard deep neural networks, a D2NN 10 is not only different in that it is a physical and all-optical deep network, but it also possesses some unique architectural differences. First, the inputs for neurons 24 are complex-valued, determined by wave interference and a multiplicative bias, i.e., the transmission/reflection coefficients. Second, the individual function of a neuron 24 is the phase and amplitude modulation of its input to output a secondary wave, unlike e.g., a sigmoid, a rectified linear unit (ReLU) or other nonlinear neuron functions used in modern deep neural networks. Third, each neuron's 24 output is coupled to the neurons 24 of the next substrate layer 16 through wave propagation and coherent (or partially-coherent) interference, providing a unique form of interconnectivity within the network 10. For example, the way that a D2NN 10 adjusts its receptive field, which is a parameter used in convolutional neural networks, is quite different from that of traditional neural networks, and is based on the axial spacing between different substrate layers 16, the signal-to-noise ratio (SNR) at the output layer as well as the spatial and temporal coherence properties of the illumination source. The secondary wave of each neuron 24 will in theory diffract at all angles, affecting in principle all the neurons 24 of the following layer. However, for a given spacing between the successive substrate layers 16, the intensity of the wave from a neuron 24 will decay below the detection noise floor after a certain propagation distance; the radius of this propagation distance at the next substrate layer 16 practically sets the receptive field of a diffractive neural network and can be physically adjusted by changing the spacing between the substrate layers 16, the intensity of the input optical beam, the detection SNR or the coherence length and diameter of the illumination source 12.
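The dependence of the receptive field on layer spacing can be illustrated with a short numerical sketch. It assumes the standard first Rayleigh-Sommerfeld form of the secondary wave, a single isolated neuron, fully coherent illumination, and a hypothetical noise floor expressed relative to the on-axis intensity; the function names and all numerical values are illustrative assumptions. Coherence and input-power effects mentioned above are not modeled.

```python
import numpy as np

def secondary_wave_intensity(rho, d, wavelength):
    """|w|^2 of one neuron's secondary wave at lateral offset rho on a plane a distance d away."""
    r = np.sqrt(rho**2 + d**2)
    w = (d / r**2) * (1.0 / (2.0 * np.pi * r) + 1.0 / (1j * wavelength)) \
        * np.exp(2j * np.pi * r / wavelength)
    return np.abs(w) ** 2

def receptive_field_radius(d, wavelength, relative_floor, rho_max, samples=20001):
    """Largest sampled offset at which the intensity stays at or above the relative noise floor."""
    rho = np.linspace(0.0, rho_max, samples)
    rel = secondary_wave_intensity(rho, d, wavelength) / secondary_wave_intensity(0.0, d, wavelength)
    return rho[rel >= relative_floor].max()

# Hypothetical numbers: 0.75 mm wavelength, noise floor at 1% of the on-axis intensity.
wavelength = 0.75e-3
for spacing in (4e-3, 30e-3):
    radius = receptive_field_radius(spacing, wavelength, relative_floor=0.01, rho_max=0.2)
    print(f"spacing {spacing * 1e3:.0f} mm -> receptive field radius {radius * 1e3:.1f} mm")
```

When the spacing is much larger than the wavelength, the intensity ratio reduces to (d²/(ρ²+d²))², so a 1% floor corresponds to a radius of roughly three times the layer spacing; increasing the spacing therefore widens the receptive field in this simplified model.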
D2NN trained for handwritten digit classification. To demonstrate the performance of the D2NN platform, a D2NN was first trained as a digit classifier to perform automated classification of handwritten digits, from zero to nine (
After its training, the design of the D2NN digit classifier was numerically tested using 10,000 images from the MNIST test dataset (which were not used as part of the training or validation image sets) and achieved a classification accuracy of 91.75% (
As reported in
Following these numerical results, the 5-layer D2NN design was 3D printed (
Next, the classification performance of the D2NN framework was tested with a more complicated image dataset, i.e., the Fashion MNIST (github.com/zalandoresearch/fashion-mnist), which includes ten classes, each representing a fashion product (t-shirts, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots; see
To experimentally demonstrate the performance of fashion product classification using a physical D2NN 10, a phase-only five (5) substrate 16 design was 3D-printed and fifty (50) fashion products were used as test objects (i.e., 5 per class) based on the same procedures employed for the digit classification diffractive network (
Next, the performance of a phase-only D2NN 10 was tested, composed of five (5) 3D-printed transmission substrate layers 16 (see
After its training and blind testing, numerically proving the imaging capability of the network as shown in
To evaluate the point spread function of the D2NN 10, pinholes with different diameters (1 mm, 2 mm and 3 mm) were imaged, which resulted in output images 22 with full-width-at-half-maximum (FWHM) values of 1.5 mm, 1.4 mm and 2.5 mm, respectively (
Note also that, based on the large area of the 3D-printed network substrate layers 16 (9×9 cm) and the short axial distance between the input (output) plane and the first (last) layer of the D2NN 10, i.e., 4 mm (7 mm), one can infer that the theoretical numerical aperture of this system approaches 1 in air (see
Discussion
For a D2NN 10, after all the parameters are trained and the physical diffractive D2NN 10 is fabricated or otherwise manufactured, the computation of the network function (i.e., inference) is implemented all-optically using a light source 12 and optical diffraction through passive components (i.e., the substrates 16). Therefore, the energy efficiency of a D2NN 10 depends on the reflection and/or transmission coefficients of the substrates 16. Such optical losses can be made negligible, especially for phase-only networks that employ e.g., transparent materials that are structured using e.g., optical lithography, creating D2NN 10 designs operating at the visible part of the spectrum. In these experiments, a standard 3D-printing material (VeroBlackPlus RGD875) was used to provide phase modulation, and each layer of the D2NN 10 shown in
The operation principles of D2NN 10 can be easily extended to amplitude-only or phase/amplitude-mixed transmissive or reflective designs. Whether the network layers perform phase-only or amplitude-only modulation, or a combination of both, what changes from one design to another is only the nature of the multiplicative bias terms, $t_i^l$ or $r_i^l$ for a transmissive or reflective neuron 24, respectively, and each neuron 24 of a given substrate layer 16 will still be connected to the neurons 24 of the former layer through a wave-interference process, $\sum_k n_k^{l-1}(x_i, y_i, z_i)$, which provides the complex-valued input to a neuron 24. Compared to a phase-only D2NN design, where $|t_i^l|=|r_i^l|=1$, a choice of $|t_i^l|<1$ or $|r_i^l|<1$ would introduce additional optical losses, and would need to be taken into account for a given illumination power and detection SNR at the network output plane 22. In some embodiments, one can potentially also create diffractive D2NN 10 networks that employ a physical gain (e.g., through optical or electrical pumping, or nonlinear optical phenomena, including but not limited to plasmonics and metamaterials) to explore the domain of amplified bias terms, i.e., $|t_i^l|>1$ or $|r_i^l|>1$. At the cost of additional complexity, such amplifying layers can be useful for the diffractive neural network to better handle its photon budget and can be used after a certain number of passive layers to boost up the diffracted signal, intuitively similar to e.g., optical amplifiers used in fiber optic communication links.
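A back-of-the-envelope sketch of the photon budget discussed above is given below. It assumes, purely for illustration, that every neuron of a layer shares the same modulus |t| and that no light is lost to diffraction outside the layer apertures, so the remaining power fraction is simply the product of |t|² over the layers. The function name and the values 0.9 and 1.5 are hypothetical.

```python
import numpy as np

def power_throughput(moduli):
    """Fraction of input power left after a stack of layers with uniform |t| per layer.

    Simplifying assumptions: each layer scales the field amplitude by |t|
    (power by |t|^2) and diffraction losses outside the apertures are ignored.
    """
    moduli = np.asarray(moduli, dtype=float)
    return float(np.prod(moduli**2))

# Five passive, slightly lossy layers (hypothetical |t| = 0.9 each).
passive = power_throughput([0.9] * 5)

# Same stack with one amplifying layer (hypothetical |t| = 1.5) after three passive layers.
with_gain = power_throughput([0.9, 0.9, 0.9, 1.5, 0.9, 0.9])

print(f"passive stack:   {passive:.3f} of input power")
print(f"with gain layer: {with_gain:.3f} of input power")
```

Under these assumptions a single amplifying layer with |t| > 1 offsets part of the accumulated loss of the passive layers, which is the role described for gain layers in the paragraph above.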
Optical implementation of learning in artificial neural networks is promising due to the parallel computing capability and power efficiency of optical systems. Compared to previous opto-electronics based learning approaches, the D2NN framework provides a unique all-optical deep learning engine that efficiently operates at the speed of light using passive components and optical diffraction. An important advantage of D2NNs 10 is that they can be easily scaled up using various high-throughput and large-area 3D fabrication methods (e.g., soft-lithography, 3D printing, additive manufacturing) and wide-field optical components and detection systems, to cost-effectively reach tens to hundreds of millions of neurons 24 and hundreds of billions of connections in a scalable and power-efficient manner. For example, integration of a D2NN 10 with lens-free on-chip imaging systems could provide extreme parallelism within a cost-effective and portable platform. Such large-scale D2NNs 10 might be transformative for various applications, including all-optical image analysis, feature detection, object classification, and might also enable new microscope or camera designs that can learn to perform unique imaging tasks/functions using D2NNs 10.
Some of the main sources of error in the experiments include the alignment errors, fabrication tolerances and imperfections. To mitigate these, a 3D-printed holder (
For an inexpensive 3D-printer or fabrication method, printing/fabrication errors and imperfections, and the resulting alignment problems, can be further mitigated by increasing the area of each substrate layer 16 and the footprint of the D2NN 10. This way, the physical feature 18 size at each substrate layer 16 can be increased, which will partially relax the alignment requirements. The disadvantage of such an approach of printing larger diffractive networks, with an increased feature 18 size, would be an increase in the physical size of the system and its input optical power requirements. Furthermore, to avoid bending of the network layers over larger areas, an increase in layer thickness and hence its stiffness would be needed, which can potentially also introduce additional optical losses, depending on the illumination wavelength and the material properties. In order to minimize alignment errors and improve the performance of a D2NN 10, a monolithic D2NN 10 design that combines all the substrate layers 16 of the network as part of a 3D fabrication method (i.e., there are no gaps between adjacent substrate layers 16) can be used. Among other techniques, laser lithography based on two-photon polymerization can provide a desired solution for creating such monolithic D2NNs 10.
Another embodiment is the use of spatial light modulators (SLMs) as part of a D2NN 10. This approach of using SLMs in D2NNs 10 has several advantages, at the cost of an increased complexity due to deviation from an entirely passive optical network to a reconfigurable electro-optic one. First, a D2NN 10 that employs one or more SLMs can be used to learn and implement various tasks because of its reconfigurable architecture. Second, this reconfigurability of the physical network can be used to mitigate alignment errors or other imperfections in the optical system of the network. Furthermore, when the optical network statistically fails, e.g., a misclassification or an error in its output is detected, it can mend itself through a transfer learning-based re-training with appropriate penalties attached to some of the discovered errors of the network as it is being used. For building a D2NN 10 that contains SLMs, both reflection and transmission-based modulator devices can be used to create an optical network that is either entirely composed of SLMs or a hybrid one, i.e., employing some SLMs in combination with fabricated (i.e., passive) substrate layers 16.
Materials and Methods
The D2NN 10 design was implemented using TensorFlow (Google Inc.) framework, as shown in
At the detector/output plane 22, the intensity of the network output was measured, and its mean square error (MSE) against the target image was used as the loss function to train the imaging D2NN. The classification D2NNs were also trained using a nonlinear loss function, where the aim was to maximize the normalized signal of each target's corresponding detector region, while minimizing the total signal outside of all the detector regions (see, e.g.,
After the training phase of the optimized D2NN architecture, the 3D model of the network layers to be 3D-printed (i.e., the design of the physical D2NN 10) was generated by Poisson surface reconstruction (see
Following the corresponding design of each D2NN 10, the axial distance between two successive 3D-printed substrate layers 16 was set to be 3.0 cm and 4.0 mm for the classifier and lens networks, respectively. The larger axial distance between the successive layers of the classifier D2NNs increased the number of neuron connections to ~8 billion, which is approximately 100-fold larger compared to the number of the neuron connections of the imaging D2NN 10, which is much more compact in depth (see
Terahertz Set-up. The schematic diagram of the experimental setup is given in
Forward Wave Propagation Model.
The forward model of the D2NN 10 architecture is illustrated in
where i refers to a neuron of the l-th layer, and p refers to a neuron 24 of the next substrate layer 16, connected to neuron i by optical diffraction. The same expressions would also apply for a reflective D2NN 10 with a reflection coefficient per neuron: $r_i^l$. The input pattern $h_k^0$, which is located at layer 0 (i.e., the input plane), is in general a complex-valued quantity and can carry information in its phase and/or amplitude channels. The resulting wave function due to the diffraction of the illumination plane-wave interacting with the input can be written as:
$n_{k,p}^0=w_{k,p}^0\cdot h_k^0$, (4)
which connects the input to the neurons 24 of layer 1. Assuming that the D2NN design is composed of M substrate layers (excluding the input and output planes), then a detector at the output plane measures the intensity of the resulting optical field:
$s_i^{M+1} = |m_i^{M+1}|^2$. (5)
The comparison of the forward model of a conventional artificial neural network and a diffractive neural network is summarized in
Error Backpropagation. To train a design for a D2NN 10, the error back-propagation algorithm was used together with the stochastic gradient descent optimization method. A loss function was defined to evaluate the performance of the D2NN output with respect to the desired target, and the algorithm iteratively optimized the diffractive neural network parameters to minimize the loss function. Without loss of generality, the discussion here focuses on the imaging D2NN 10 architecture and defines the loss function (E) using the mean square error between the output plane intensity $s_i^{M+1}$ and the target $g_i^{M+1}$:
where K refers to the number of measurement points at the output plane. Different loss functions can also be used in D2NN. Based on this error definition, the optimization problem for a D2NN design can be written as:
minϕ
To apply the backpropagation algorithm for training a D2NN 10, the gradient of the loss function with respect to all the trainable network variables needs to be calculated, which is then used to update the network layers during each cycle of the training phase. The gradient of the error with respect to $\phi_i^l$ of a given layer l can be calculated as:
In Eq. (8),
quantifies the gradient of the complex-valued optical field at the output layer (mkM+1=Σk
where, 3≤L≤M−1. In the derivation of these partial derivatives, an important observation is that, for an arbitrary neuron at layer l≤M, one can write:
where k1,2 represent dummy variables. During each iteration of the error backpropagation, a small batch of the training data is fed into the diffractive neural network to calculate the above gradients for each substrate layer 16 and accordingly update the D2NN 10.
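As a minimal sketch of the loss evaluation described in this section, the following Python snippet computes the detector intensity from a complex output field (Eq. (5)) and the mean square error against a target over K measurement points. The function name `mse_loss` and the 2×2 example values are hypothetical and chosen only for illustration; the gradient computation itself is not shown.

```python
import numpy as np

def mse_loss(output_intensity, target):
    """Mean square error averaged over the K measurement points of the output plane."""
    s = np.asarray(output_intensity, dtype=float)
    g = np.asarray(target, dtype=float)
    return float(np.mean((s - g) ** 2))

# Hypothetical 2x2 output plane, i.e., K = 4 measurement points.
output_field = np.array([[1 + 1j, 0.5], [0.0, 1j]])  # complex field at the output plane
intensity = np.abs(output_field) ** 2                # the detector measures intensity, Eq. (5)
target = np.array([[2.0, 0.0], [0.0, 1.0]])          # hypothetical target image

loss = mse_loss(intensity, target)
```

In a training loop, this scalar would be the quantity whose gradient with respect to the layer phases is back-propagated for each mini-batch.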
Imaging D2NN Architecture. Structural similarity index (SSIM) values between the D2NN output plane 22 and the ground truth (i.e., target images) were calculated to optimize the architecture of the diffractive neural network. In this way, the number of network substrate layers 16 and the axial distance between two consecutive substrate layers 16 were optimized, as shown in
Dataset Preprocessing. To train and test the D2NN 10 as a digit classifier, the MNIST handwritten digit database was used, which is composed of 55,000 training images, 5,000 validation images and 10,000 testing images. Images were up-sampled to match the size of the D2NN model. For the training and testing of the imaging or “lens” D2NN 10, ImageNet was used, from which a subset of 2,000 images was randomly selected. Each color image was converted into grayscale and resized to match the D2NN 10. It should be noted that color image data can also be applied to the D2NN framework, although a single-wavelength THz system was used for testing. For color images, as an example, the Red, Green and Blue channels of an image can be used as separate parallel input planes 20 to a diffractive neural network 10. Turning back to the training used herein, the selected images were then randomly divided into 1,500 training images, 200 validation images and 300 testing images. Very similar imaging performance was obtained by using 10,000 images in the training phase (instead of 2,000 images); this is expected since each training image contains various spatial features at different parts of the image, all of which provide valuable patches of information for successfully training the diffractive imaging network.
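The random 1,500/200/300 division of the 2,000 selected images can be sketched as follows. This is an illustrative sketch only: the seed, the variable names and the channel-averaging grayscale conversion are assumptions, since the text does not specify how the split or the grayscale conversion was implemented.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, used only for reproducibility
indices = rng.permutation(2000)       # shuffle the indices of the 2,000 selected images

train_idx = indices[:1500]            # 1,500 training images
val_idx = indices[1500:1700]          # 200 validation images
test_idx = indices[1700:]             # 300 testing images

def to_grayscale(rgb_image):
    """Average the color channels (assumed conversion; the source does not state one)."""
    return np.asarray(rgb_image, dtype=float).mean(axis=-1)

gray = to_grayscale(np.ones((4, 4, 3)))  # hypothetical 4x4 RGB image
```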
To test the performance of the D2NN 10 digit classifier experimentally, 50 handwritten digits were extracted from the MNIST test database. To solely quantify the match between the numerical testing results and experimental testing, these 3D-printed handwritten digits were selected from among the same 91.75% of the test images for which numerical testing was successful. The digits were up-sampled and binarized, as implemented during the training process. Binarized digits were stored as vector images, in .svg format, before they were 3D printed. The images were then fed into Autodesk Fusion Software (Autodesk Inc.) to generate their corresponding 3D models. To provide amplitude-only image inputs to the digit classifier D2NN 10, the 3D-printed digits were coated with aluminum foil to block the light transmission in desired regions.
In addition to MNIST digit classification, to test the D2NN framework with a more challenging classification task, the Fashion MNIST database was used which has more complicated targets as exemplified in
D2NN Neuron Numbers and Connectivity. D2NN uses optical diffraction to connect the neurons at different layers of the network. The maximum half-cone diffraction angle can be formulated as $\varphi_{max} = \sin^{-1}(\lambda f_{max})$, where $f_{max} = 1/(2d_f)$ is the maximum spatial frequency and $d_f$ is the layer feature size. Here, a D2NN 10 was used operating at 0.4 THz by using low-cost 3D-printed substrate layers 16. The 3D printer that was used has a spatial resolution of 600 dpi with 0.1 mm accuracy and the wavelength of the illumination system is 0.75 mm in air.
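The half-cone diffraction angle formula above can be evaluated numerically, as in the following sketch. The function name is hypothetical, and the two feature sizes (0.4 mm and 0.9 mm) are used here simply as example inputs at the 0.75 mm illumination wavelength.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def half_cone_angle_deg(wavelength, feature_size):
    """phi_max = asin(lambda * f_max), with f_max = 1 / (2 * d_f).

    Both arguments must use the same length unit. The formula is only valid
    when feature_size >= wavelength / 2, otherwise asin is undefined.
    """
    f_max = 1.0 / (2.0 * feature_size)
    return math.degrees(math.asin(wavelength * f_max))

# Wavelength in air at 0.4 THz, in millimetres (approximately 0.75 mm).
wavelength_mm = SPEED_OF_LIGHT_M_PER_S / 0.4e12 * 1e3

angle_fine = half_cone_angle_deg(0.75, 0.4)    # example feature size of 0.4 mm
angle_coarse = half_cone_angle_deg(0.75, 0.9)  # example feature size of 0.9 mm
```

A smaller feature size gives a wider diffraction cone, so each neuron reaches more neurons of the following layer for a given axial distance.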
For the digit and fashion product classification D2NNs 10, the pixel size was set to 400 μm for packing 200×200 neurons over each substrate layer 16 of the network 10, covering an area of 8 cm×8 cm per substrate layer 16. Five (5) transmissive diffraction substrate layers 16 were used with the axial distance between the successive layers set to be 3 cm. These choices create a fully-connected diffractive neural network structure because of the relatively large axial distance between the two successive substrate layers 16 of the diffractive network. This corresponds to 200×200×5=0.2 million neurons 24 (each containing a trainable phase term) and (200×200)^2×5=8.0 billion connections (including the connections to the output layer). This large number of neurons 24 and their connections offers a large degree of freedom to train the desired mapping function between the input amplitude (handwritten digit classification, 20) or input phase (fashion product classification, 20) and the output intensity measurement 22 for classification of input objects 14.
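The neuron and connection counts quoted above follow from simple arithmetic, reproduced in the sketch below under the fully-connected assumption stated in the text (every neuron of one plane connects to every neuron of the next plane, with the output layer included in the five sets of connections). Variable names are illustrative.

```python
neurons_per_side = 200
pixel_size_um = 400
num_layers = 5

neurons_per_layer = neurons_per_side ** 2                 # 200 x 200 = 40,000
total_neurons = neurons_per_layer * num_layers            # 200,000 (0.2 million)
total_connections = neurons_per_layer ** 2 * num_layers   # 8,000,000,000 (8.0 billion)

# Physical side length of one layer: 200 pixels x 400 um = 80,000 um = 8 cm.
layer_side_cm = neurons_per_side * pixel_size_um / 10_000
```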
For the imaging lens D2NN 10 design, the smallest feature size was ~0.9 mm with a pixel size set to 0.3 mm, which corresponds to a half-cone diffraction angle of ~25°. The axial distance between two successive substrate layers 16 was set to be 4 mm for 5 layers, and the width of each layer was 9 cm×9 cm. This means the amplitude imaging D2NN 10 design had 300×300×5=0.45 million neurons 24, each having a trainable phase term. Because of the relatively small axial distance (4 mm) between the successive substrate layers 16 and the smaller diffraction angle due to the larger feature size, there are <0.1 billion connections in this imaging D2NN design (including the connections to the output layer, which is 7 mm away from the 5th layer of the diffractive network). Compared to the classification D2NNs 10, this amplitude imaging embodiment is much more compact in the axial direction as also pictured in
There are some unique features of a D2NN 10 that make it easier to handle large scale connections (e.g., 8 billion connections as reported in
Performance analysis of D2NN as a function of the number of layers and neurons. A single diffractive substrate layer cannot achieve the same level of inference that a multi-layer D2NN 10 structure can perform. Multi-layer architecture of D2NN 10 provides a large degree-of-freedom within a physical volume to train the transfer function between its input and the output planes, which, in general, cannot be replaced by a single phase-only or complex modulation layer (employing phase and amplitude modulation at each neuron).
To expand on this, a single diffractive layer performance is quite primitive compared to a multi-layered D2NN 10. As shown in
Error Sources and Mitigation Strategies. There are five main sources of error that contribute to the performance of a 3D-printed D2NN 10: (1) Poisson surface reconstruction is the first error source. After the transmission substrate layers 16 are trained, the 3D structure of each substrate layer 16 is generated through the Poisson surface reconstruction as detailed earlier. However, for practical purposes, one can only use a limited number of sampling points, which distorts the 3D structure of each substrate layer 16. (2) Alignment errors during the experiments form the second source of error. To minimize the alignment errors, the transmission substrate layers 16 and input objects 14 are placed into a single 3D-printed holder. However, because 3D-printed materials have some elasticity, the thin transmission substrate layers 16 do not stay perfectly flat, and they will have some curvature. Alignment of the THz source and detector with respect to the transmission layers creates another error source in the experiments. (3) 3D-printing is the third and one of the most dominant sources of error. This originates from the lack of precision and accuracy of the 3D-printer used to generate the network substrate layers 16. It smoothens the edges and fine details on the transmission layers. (4) Absorption of each transmissive substrate layer 16 is another source that can deteriorate the performance of a D2NN design. (5) The measurements of the material properties that are extensively used in the simulations, such as the refractive index and extinction coefficient of the 3D-printed material, might have some additional sources of error, contributing to a reduced experimental accuracy.
It is hard to quantitatively evaluate the overall magnitude of these various sources of errors; instead the Poisson surface reconstruction errors, absorption related losses at different layers and 0.1 mm random misalignment error for each network layer were incorporated during the testing phase of the D2NNs as shown in
To minimize the impact of the 3D printing error, a relatively large pixel size, i.e. 0.4 mm and 0.3 mm was used for the classification and imaging D2NNs, respectively. Furthermore, a 3D-printed holder (
Reconfigurable D2NN Designs. As explained herein, some embodiments use SLMs as part of a D2NN 10. In addition to using SLMs as part of a reconfigurable D2NN 10, another option is to use a given 3D-printed or fabricated D2NN 10 as a fixed input block of a new diffractive network where one trains only the additional layers that are planned to be fabricated. Assume for example that a 5-layer D2NN 10 has been printed/fabricated for a certain inference task. If its prediction performance degrades or changes slightly due to, e.g., a change in the input data, one can train a few additional layers of substrate 16 to be physically added/patched to the existing printed/fabricated network 10 to improve its inference performance. In some cases, one can even peel off (i.e., discard or remove) some of the existing layers of substrates 16 of the printed network and treat the remaining fabricated substrate layers 16 as a fixed (i.e., non-learnable) input block to a new network where the new layers to be added/patched are trained for an improved inference task (coming from the entire diffractive network: old layers and new layers). Intuitively, one can think of each D2NN 10 as akin to a Lego® piece (with several layers following each other); one can either add a new substrate layer 16 (or multiple substrate layers 16) on top of existing (i.e., already fabricated) ones, or peel off/remove some substrate layers 16 and replace them with the newly trained diffractive substrates 16. This provides a unique physical implementation (like blocks of Lego®) for transfer learning or mending the performance of a printed/fabricated D2NN 10 design.
This modular design for the Fashion MNIST diffractive network 10 was implemented and the results are summarized in
Discussion of Unique Imaging Functionalities using D2NNs. The D2NN framework will help imaging at the macro and micro/nano scale by enabling all-optical implementation of some unique imaging tasks. One possibility for enhancing imaging systems could be to integrate D2NN 10 designs with sample holders or substrates used in microscopic imaging to enhance certain bands of spatial frequencies and create new contrast mechanisms in the acquired images. In other words, as the sample on a substrate (e.g., cells or tissue samples, etc.) diffracts light, a D2NN 10 can be used to project magnified images of the cells/objects onto a CMOS/CCD imaging sensor or chip with certain spatial features highlighted or enhanced, depending on the training of the diffractive network. This could form a very compact chip-scale microscope (just a passive D2NN 10 placed on top of an imager chip) that implements, all-optically, task-specific contrast imaging and/or object recognition or tracking within the sample. Similarly, macro-scale imaging tasks such as, for example, face recognition could be achieved as part of a sensor design, without the need for a high mega-pixel imager. For instance, tens to hundreds of different classes can potentially be detected using a modest (e.g., <1 Mega-pixel) imager chip placed at the output plane 22 of a D2NN 10 that is built for this inference task.
For the THz part of the spectrum, as another possible use example, various biomedical applications that utilize THz imagers for chemical sensing, for analyzing the composition of drugs to detect, e.g., counterfeit medicine, or for assessing the healing of wounds could benefit from D2NN 10 designs that automate predictions in such THz-based analysis of specimens using a diffractive neural network.
Optical Nonlinearity in Diffractive Deep Neural Networks. Optical nonlinearity can be incorporated into the D2NN 10 using various optical non-linear materials (crystals, polymers, semiconductor materials, doped glasses, among others as detailed below). A D2NN 10 operates based on controlling the diffraction or reflection of light through complex-valued diffractive/reflective elements to perform a desired/trained task. Adding nonlinear optical components is both practical and synergetic to the D2NN 10 structures described herein. Assuming that the input object 14, together with the D2NN diffractive substrate layers 16, creates a spatially varying complex field amplitude E(x,y) at a given substrate layer 16, then the use of a nonlinear medium (e.g., optical Kerr effect based on third-order optical nonlinearity, $\chi^{(3)}$) will introduce an all-optical refractive index change which is a function of the input field's intensity, $\Delta n \propto \chi^{(3)} E^2$. This intensity-dependent refractive index modulation and its impact on the phase and amplitude of the resulting waves through the diffractive network 10 can be numerically modeled and therefore is straightforward to incorporate as part of the network training phase. Any third-order nonlinear material with a strong $\chi^{(3)}$ could be used to form the nonlinear diffractive substrate layers 16: glasses (e.g., As₂S₃, metal nanoparticle doped glasses), polymers (e.g., polydiacetylenes), organic films, semiconductors (e.g., GaAs, Si, CdS), graphene, among others. There are different fabrication methods that can be employed to structure each nonlinear layer of a diffractive neural network using these materials.
In addition to third-order all-optical nonlinearity, another method to introduce nonlinearity into a D2NN 10 is to use saturable absorbers that can be based on materials such as semiconductors, quantum-dot films, carbon nanotubes or even graphene films. There are also various fabrication methods, including standard photo-lithography, that can be employed to structure such materials as part of a D2NN design. For example, in THz wavelengths, recent research has demonstrated inkjet printing of graphene saturable absorbers. Graphene-based saturable absorbers are further advantageous since they work well even at relatively low modulation intensities.
Another promising avenue to bring non-linear optical properties into D2NNs 10 is to use nonlinear metamaterials. These materials have the potential to be integrated with diffractive or reflective networks owing to their compactness and the fact that they can be manufactured with standard fabrication processes. While a significant part of the previous work in the field has focused on second and third harmonic generation, recent studies have demonstrated very strong optical Kerr effect for different parts of the electromagnetic spectrum, which can be incorporated into the deep diffractive neural network architecture to bring all-optical nonlinearity into its operation.
Finally, one can also use the DC electro-optic effect to introduce optical nonlinearity into the layers of a D2NN although this would deviate from the “all-optical” operation of the device 10 and require a DC electric-field for each substrate layer 16 of the diffractive neural network 10. This electric-field can be externally applied to each layer of a D2NN 10. Alternatively, one can also use poled materials with very strong built-in electric fields as part of the material (e.g., poled crystals or glasses). The latter will still be all-optical in its operation, without the need for an external DC field. To summarize, there are several practical approaches that can be integrated with diffractive neural networks to bring physical all-optical nonlinearity into D2NNs 10.
Experimental—Improved All-Optical D2NNs and Hybrid Optical D2NN and Electronic Neural Network-Based System
Furthermore, in an alternative embodiment, a hybrid optical and electronic neural network-based system 40 is disclosed that uses an all-optical front end 42 along with a back-end electronic neural network 44 to create hybrid machine learning and computer vision systems. Such a hybrid system 40 utilizes an all-optical D2NN front-end 42, before the electronic neural network 44, and if it is jointly optimized (i.e., optical and electronic as a monolithic system design), it presents several important advantages. This D2NN-based hybrid system 40 approach can all-optically compress the information needed by the electronic network 44 using a D2NN front-end 42, which can then significantly reduce the number of pixels of the optical sensor 26 (e.g., detectors) that need to be digitized for an electronic neural network 44 to act on. This would further improve the frame-rate of the entire system, also reducing the complexity of the electronic network 44 and its power consumption. This D2NN-based hybrid system 40 can potentially create ubiquitous and low-power machine learning systems that can be realized using relatively simple and compact imagers, with e.g., a few tens to hundreds of pixels at the opto-electronic sensor plane, preceded by an ultra-compact all-optical diffractive network 42 with a layer-to-layer distance of a few wavelengths, which presents important advantages compared to some other hybrid network configurations involving e.g., a 4-f configuration to perform a convolution operation before an electronic neural network.
To better highlight these unique opportunities enabled by the D2NN-based hybrid system 40, an analysis was conducted which revealed that a 5-layer phase-only (or complex-valued) D2NN that is jointly optimized with a single fully-connected layer, following the optical diffractive layers, achieves a blind classification accuracy of 98.71% (or 98.29%) and 90.04% (or 89.96%) for the recognition of hand-written digits and fashion products, respectively. In these results, the input image to the electronic network 44 (created by diffraction through the jointly-optimized front-end D2NN 42) was also compressed by more than 7.8 times, down to 10×10 pixels, which confirms that a D2NN-based hybrid system 40 can achieve competitive classification performance even using a relatively simple, one-layer electronic network that uses a significantly reduced number of input pixels.
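The compression figure and the role of the single fully-connected back-end can be illustrated with the sketch below. It assumes that the compression ratio is measured against the 28×28 object size stated elsewhere in this description for both datasets, and it uses random, untrained placeholder values for the detector intensities, weights and bias; none of these values come from the disclosed designs.

```python
import numpy as np

input_pixels = 28 * 28                         # object size assumed as the reference
detector_pixels = 10 * 10                      # opto-electronic sensor plane
compression = input_pixels / detector_pixels   # 784 / 100

rng = np.random.default_rng(0)
# Placeholder for the intensity pattern produced by the D2NN front-end.
detector_intensity = rng.random((10, 10))

# Hypothetical single fully-connected layer mapping 100 pixels to 10 class scores.
weights = 0.01 * rng.standard_normal((10, detector_pixels))
bias = np.zeros(10)

scores = weights @ detector_intensity.ravel() + bias
predicted_class = int(np.argmax(scores))
```

In a jointly optimized system, the weights of this layer and the modulation values of the diffractive layers would be trained together rather than drawn at random.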
In addition to potentially enabling ubiquitous, low-power and high-frame rate machine learning and computer vision platforms, these hybrid neural network systems 40, which utilize D2NN-based all-optical processing at their front-end 42, will find other applications in the design of compact and ultra-thin optical imaging and sensing systems by merging fabricated D2NNs with optical sensors 26 such as opto-electronic sensor arrays. This will create intelligent systems benefiting from various CMOS/CCD imager chips and focal plane arrays at different parts of the electromagnetic spectrum, merging the benefits of all-optical computation with simple and low-power electronic neural networks that can work with lower dimensional data, all-optically generated at the output of a jointly-optimized D2NN design.
Mitigating Vanishing Gradients in Optical Neural Network Training
In the D2NN framework, each neuron 24 has a complex transmission coefficient, i.e., $t_i^l(x_i, y_i, z_i) = a_i^l(x_i, y_i, z_i)\exp(j\phi_i^l(x_i, y_i, z_i))$, where i and l denote the neuron and diffractive layer number, respectively. $a_i^l$ and $\phi_i^l$ are represented during the network training as functions of two latent variables, α and β, defined in the following form:
$a_i^l = \mathrm{sigmoid}(\alpha_i^l)$, (14a)
$\phi_i^l = 2\pi \times \mathrm{sigmoid}(\beta_i^l)$, (14b)
where $\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}$
is a non-linear, differentiable function. In fact, the trainable parameters of a D2NN are these latent variables, $\alpha_i^l$ and $\beta_i^l$, and eqs. (14a, 14b) define how they are related to the physical parameters ($a_i^l$ and $\phi_i^l$) of a diffractive optical network. Note that in eqs. (14a, 14b), the sigmoid acts on an auxiliary variable rather than the information flowing through the network. Being a bounded analytical function, sigmoid confines the values of $a_i^l$ and $\phi_i^l$ inside the intervals (0,1) and (0,2π), respectively. On the other hand, it is known that the sigmoid function suffers from the vanishing gradient problem due to its relatively flat tails, and when it is used in the context depicted in eqs. (14a, 14b), it can prevent the network from utilizing the available dynamic range considering both the amplitude and phase terms of each neuron. To mitigate these issues, eqs. (14a, 14b) were replaced as follows:
where ReLU refers to Rectified Linear Unit, and M is the number of neurons per layer. Based on eqs. (15a, 15b), the phase term of each neuron, $\phi_i^l$, becomes unbounded, but since the $\exp(j\phi_i^l(x_i, y_i, z_i))$ term is periodic (and bounded) with respect to $\phi_i^l$, the error back-propagation algorithm is able to find a solution for the task at hand. The amplitude term, $a_i^l$, on the other hand, is kept within the interval (0,1) by using an explicit normalization step shown in eqs. (15a, 15b).
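The two parameterizations discussed above can be contrasted in a short sketch. The sigmoid form follows eqs. (14a, 14b). The ReLU form is an assumption made for illustration: the amplitude is taken as the ReLU of the latent variable divided by the maximum ReLU value over the neurons of the layer, and the phase as 2π times the latent variable without any bounding. The function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # flat tails: close to 0 for large |x|

def modulation_sigmoid(alpha, beta):
    """Bounded parameterization of eqs. (14a, 14b)."""
    return sigmoid(alpha), 2.0 * np.pi * sigmoid(beta)

def modulation_relu(alpha, beta):
    """Assumed ReLU-based form: normalized amplitude, unbounded phase."""
    relu = np.maximum(alpha, 0.0)
    peak = relu.max()
    amplitude = relu / peak if peak > 0 else relu   # explicit normalization step
    phase = 2.0 * np.pi * beta                      # no bounding function applied
    return amplitude, phase

def transmission(amplitude, phase):
    return amplitude * np.exp(1j * phase)

alpha = np.array([-2.0, 0.5, 3.0])    # hypothetical latent variables of one layer
beta = np.array([-1.0, 0.25, 4.0])
amp, phase = modulation_relu(alpha, beta)
tail_gradient = sigmoid_grad(10.0)    # illustrates the vanishing gradient in the tail
```

Because the transmission coefficient depends on the phase only through a complex exponential, adding any multiple of 2π to the phase leaves it unchanged, which is why an unbounded phase variable is harmless.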
To exemplify the impact of this change alone in the training of an all-optical D2NN 10 design, for a 5-layer, phase-only (complex-valued) diffractive optical network 10 with an axial distance of 40×λ between its layers, the classification accuracy for the Fashion-MNIST dataset increased from the reported 81.13% (86.33%) to 85.40% (86.68%) following the above-discussed changes in the parameterized formulation of the neuron transmission values. Further improvements were made in the inference performance of an all-optical D2NN 10 after the introduction of the loss-function-related changes into the training phase, which are discussed below.
Effect of the Learning Loss Function on the Performance of All-Optical Diffractive Neural Networks
As an alternative to the mean squared error (MSE) loss for D2NNs 10, the cross-entropy loss may be used. Since minimizing the cross-entropy loss is equivalent to minimizing the negative log-likelihood (or maximizing the likelihood) of an underlying probability distribution, it is in general more suitable for classification tasks. Note that cross-entropy acts on probability measures, which take values in the interval (0,1), and the signals coming from the detectors (one for each class) at the output layer of a D2NN 10 are not necessarily in this range; therefore, in the training phase, a softmax layer is introduced to be able to use the cross-entropy loss. It is important to note that although softmax is used during the training process of a D2NN 10, once the diffractive design converges and is fixed, the class assignment at the output plane of a D2NN 10 is still based solely on the maximum optical signal detected at the output plane, where there is one detector assigned for each class of the input data (see
When one combines D2NN training related changes reported above on the parametrization of neuron modulation (eqs. (15a, 15b)), with the cross-entropy loss outlined above, a significant improvement in the classification performance of an all-optical diffractive neural network 10 is achieved. For example, for the case of a 5-layer, phase-only D2NN 10 with 40×λ axial distance between the substrate layers 16, the classification accuracy for MNIST dataset increased from 91.75% to 97.18%, which further increased to 97.81% using complex-valued modulation, treating the phase and amplitude coefficients of each neuron as learnable parameters. The training convergence plots and the confusion matrices corresponding to these results are also reported in
All these results demonstrate that the D2NN framework using linear optical materials can already achieve a decent classification performance, also highlighting the importance of the potential of integrating optical nonlinearities into the substrate layers 16 of a D2NN 10, using e.g., plasmonics, metamaterials or other nonlinear optical materials, in order to come closer to the performance of state-of-the-art digital deep neural networks.
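The softmax-cross-entropy training loss and the maximum-signal class assignment described in this section can be sketched as follows. The detector signal values are hypothetical, and any scaling of the detector signals before the softmax, which the text does not specify, is omitted.

```python
import numpy as np

def softmax(signals):
    shifted = signals - np.max(signals)   # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(signals, label):
    """Training-time loss: negative log-probability of the correct class."""
    return float(-np.log(softmax(signals)[label]))

def classify(signals):
    """Inference-time rule: the detector with the maximum optical signal wins."""
    return int(np.argmax(signals))

# Hypothetical optical signals measured by ten class detectors.
signals = np.array([0.2, 1.5, 0.3, 0.1, 0.4, 0.2, 0.1, 0.3, 0.2, 0.1])
loss_if_label_is_1 = cross_entropy(signals, 1)
loss_if_label_is_0 = cross_entropy(signals, 0)
prediction = classify(signals)
```

Since softmax is monotonic, the class with the largest detector signal is also the class with the largest softmax probability, so dropping the softmax after training does not change the class assignment.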
Performance Trade-Offs in D2NN Design
Despite the significant increase observed in the blind testing accuracy of D2NNs, the use of softmax-cross-entropy (SCE) loss function in the context of all-optical networks also presents some trade-offs in terms of practical system parameters. MSE loss function operates based on pixel-by-pixel comparison of a user-designed output distribution with the output optical intensity pattern, after the input light interacts with the diffractive layers (see e.g.,
This performance improvement with the use of the SCE loss function in a diffractive neural network design comes at the expense of some compromises in terms of the expected diffracted power efficiency and signal contrast at the network output. To shed more light on this trade-off, the power efficiency of a D2NN 10 is defined as the percentage of the optical signal detected at the target label detector (IL) corresponding to the correct data class with respect to the total optical signal at the output plane of the optical network (E).
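The power efficiency defined above, together with the signal contrast metric defined in the following paragraph, can be computed per test sample as in this sketch. The function names and the numerical values are hypothetical, and the contrast is returned as a fraction rather than a percentage, which is an assumption about the reporting convention.

```python
import numpy as np

def power_efficiency(detector_signals, label, total_output_signal):
    """Percentage of the total output-plane signal E captured by the target detector (IL)."""
    return 100.0 * detector_signals[label] / total_output_signal

def signal_contrast(detector_signals, label, total_output_signal):
    """(IL - ISC) / E, where ISC is the strongest competing detector signal."""
    strongest_competitor = np.delete(detector_signals, label).max()
    return (detector_signals[label] - strongest_competitor) / total_output_signal

# Hypothetical sample: three detectors; E also includes light falling outside them.
detector_signals = np.array([0.5, 3.0, 1.0])
total_output_signal = 10.0

efficiency = power_efficiency(detector_signals, 1, total_output_signal)
contrast = signal_contrast(detector_signals, 1, total_output_signal)
```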
Next the signal contrast of diffractive neural networks was analyzed, which was defined as the difference between the optical signal captured by the target detector (IL) corresponding to the correct data class and the maximum signal detected by the rest of the detectors (i.e., the strongest competitor (ISC) detector for each test sample), normalized with respect to the total optical signal at the output plane (E). The results of the signal contrast analysis are reported in
Comparing the performances of MSE-based and SCE-based D2NN designs in terms of classification accuracy, power efficiency and signal contrast, as depicted in
Advantages of Multiple Diffractive Layers in D2NN Framework
As demonstrated in
This is not in contradiction with the fact that, for an all-optical D2NN 10 that is made of linear optical materials, the entire diffraction phenomenon that happens between the input and output planes 20, 22 can be squeezed into a single matrix operation (in reality, every material exhibits some volumetric and surface nonlinearities, and what is meant here by a linear optical material is that these effects are negligible). In fact, such an arbitrary mathematical operation defined by multiple learnable diffractive layers cannot be performed in general by a single diffractive layer placed between the same input and output planes 20, 22; additional optical components/layers would be needed to all-optically perform an arbitrary mathematical operation that multiple learnable diffractive layers can in general perform. The D2NN 10 creates a unique opportunity to use deep learning principles to design multiple diffractive substrate layers 16, within a very tight layer-to-layer spacing of less than 50×λ, that collectively function as an all-optical classifier, and this framework will further benefit from nonlinear optical materials and resonant optical structures to enhance its inference performance.
In summary, “depth” is a feature/property of a neural network, which means that the network in general gets better at its inference and generalization performance with more layers. The mathematical origins of the depth feature for standard electronic neural networks relate to the nonlinear activation functions of the neurons. But this is not the case for a diffractive optical network since it is a different type of network, not following the same architecture or the same mathematical formalism as an electronic neural network.
Connectivity in Diffractive Neural Networks
In the design of a D2NN 10, the layer-to-layer connectivity of the optical network is controlled by several parameters: the axial distance between the layers (ΔZ), the illumination wavelength (λ), the size of each fabricated neuron 24 and the width of the diffractive substrate layers 16. In numerical simulations, a neuron size of approximately 0.53×λ was used. In addition, the height and width of each diffractive substrate layer 16 was set to include 200×200=40K neurons 24 per layer. In this arrangement, if the axial distance between the successive diffractive layers is set to be ˜40×λ, then the D2NN 10 becomes fully-connected. On the other hand, one can also design a much thinner and more compact diffractive network by reducing ΔZ at the cost of limiting the connectivity between the diffractive substrate layers 16. To evaluate the impact of this reduction in network connectivity on the inference performance of a diffractive neural network 10, the performance of the D2NN 10 was tested using ΔZ=4×λ, i.e., 10-fold thinner compared to the earlier discussed diffractive networks. With this partial connectivity between the diffractive layers, the blind testing accuracy for a 5-layer, phase-only D2NN decreased from 97.18% (ΔZ=40×λ) to 94.12% (ΔZ=4×λ) for MNIST dataset (see
Integration of Diffractive Neural Networks with Electronic Networks: Performance Analysis of D2NN-Based Hybrid Machine Learning Systems
Integration of passive diffractive neural networks with electronic neural networks (see e.g.,
$1 \times 10^{6}$
$4 \times 10^{9}$
$3 \times 10^{-3}$
For the electronic neural networks that were considered in this analysis, in terms of complexity and the number of trainable parameters, a single fully-connected (FC) digital layer and a custom-designed 4-layer convolutional neural network (CNN) (referred to as 2C2F-1 due to the use of 2 convolutional layers with a single feature and subsequent 2 FC layers) represent the lower end of the spectrum (see Tables 1, 2); on the other hand, LeNet, ResNet-50 and another 4-layer CNN (referred to as 2C2F-64, pointing to the use of 2 convolutional layers, subsequent 2 FC layers and 64 high-level features at its second convolutional layer) represent some of the well-established and proven deep neural networks with more advanced architectures and a considerably higher number of trainable parameters (see Table 2). All these digital networks used in the analysis were individually placed after both a fully-connected (ΔZ=40×λ) and a partially-connected (ΔZ=4×λ) D2NN front-end 42, and the entire hybrid system 40 in each case was jointly optimized at the second stage of the hybrid system training procedure.
Among the all-optical D2NN-based classifiers presented in the previous sections, the fully-connected (ΔZ=40×λ) complex modulation D2NNs 10 have the highest classification accuracy values, while the partially-connected (ΔZ=4×λ) designs with phase-only restricted modulation are at the bottom of the performance curve (see the all-optical parts of
The 2nd, 3rd and 4th rows of the “Hybrid Systems” sub-tables in
Among the three (3) different detector array arrangements that were investigated, 10×10 detectors represent the case where the intensity on the opto-electronic sensor plane is severely undersampled. Therefore, the case of 10×10 detectors represents a substantial loss of information for the imaging-based scenario (note that the original size of the objects 14 in both image datasets is 28×28). This effect is especially apparent in results illustrated in
On the other hand, for designs that involve higher pixel counts and more advanced electronic neural networks (with higher energy and memory demand), the results reveal that D2NN-based hybrid systems 40 perform worse than perfect imager-based computer vision systems in terms of inference performance. For example, based on the table data of
Methods
Diffractive Neural Network Architecture
In the diffractive neural network model, the input plane represents the plane of the input object or its data, which can also be generated by another optical imaging system or a lens, e.g., by projecting an image of the object data. Input objects were encoded in amplitude channel (MNIST) or phase channel (Fashion-MNIST) of the input plane and were illuminated with a uniform plane wave at a wavelength of λ for all-optical classification. In the hybrid system simulations, on the other hand, the objects in both datasets were represented as amplitude objects at the input plane, providing a fair comparison between the two tables of
Optical fields at each plane of a diffractive network were sampled on a grid with a spacing of ~0.53λ in both x and y directions. Between two diffractive layers, the free-space propagation was calculated using the angular spectrum method. Each diffractive layer, with a neuron size of 0.53λ×0.53λ, modulated the incident light in phase and/or amplitude, where the modulation value was a trainable parameter and the modulation method (phase-only or complex) was a pre-defined design parameter of the network. The number of layers and the axial distance from the input plane to the first diffractive layer, between the successive diffractive layers, and from the last diffractive layer to the detector plane were also pre-defined design parameters of each network. At the detector plane, the output field intensity was calculated.
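By way of illustration only, the sampling, free-space propagation and layer modulation described above can be sketched numerically as follows. This is a minimal NumPy sketch, not the TensorFlow implementation used for the reported results; the function names, the unit wavelength, the random phase mask and the choice to discard evanescent spatial frequencies are assumptions made for the example.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a sampled complex field over `distance` in free space
    using the angular spectrum method (illustrative sketch)."""
    n_y, n_x = field.shape
    fx = np.fft.fftfreq(n_x, d=pitch)  # spatial frequencies along x
    fy = np.fft.fftfreq(n_y, d=pitch)  # spatial frequencies along y
    fxx, fyy = np.meshgrid(fx, fy)
    # Squared longitudinal frequency; non-positive values are evanescent.
    kz_sq = (1.0 / wavelength) ** 2 - fxx ** 2 - fyy ** 2
    propagating = kz_sq > 0
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, kz_sq, 0.0))
    # Assumption: evanescent components are simply set to zero.
    transfer = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def diffractive_layer(field, phase, amplitude=1.0):
    """Apply per-neuron modulation; amplitude=1.0 gives a phase-only layer."""
    return field * amplitude * np.exp(1j * phase)

# Example values taken from the text: 200x200 neurons, ~0.53*lambda grid,
# 40*lambda layer spacing. The wavelength is normalized to 1 (assumption).
wavelength = 1.0
pitch = 0.53 * wavelength
rng = np.random.default_rng(0)
incident = np.ones((200, 200), dtype=complex)            # uniform plane wave
phase_mask = rng.uniform(0.0, 2.0 * np.pi, (200, 200))   # stand-in for trained values
after_layer = diffractive_layer(incident, phase_mask)
at_next_plane = angular_spectrum_propagate(after_layer, wavelength, pitch, 40 * wavelength)
intensity = np.abs(at_next_plane) ** 2                   # what a detector would record
```

In an actual design, the phase (and optionally amplitude) arrays would be the trainable parameters, and the propagate/modulate pair would be repeated once per diffractive layer before the intensity is computed at the detector plane.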
Forward Propagation Model
Forward propagation was modeled as described previously herein.
Training Loss Function
To perform classification by means of all-optical diffractive networks with minimal post-processing (i.e., using only a max operation), discrete detectors were placed at the output plane. The number of detectors (D) is equal to the number of classes in the target dataset. The geometrical shape, location and size of these detectors (6.4λ×6.4λ) were determined before each training session. Having set the detectors at the output plane, the final loss value (L) of the diffractive neural network was defined through two different loss functions, and their impact on D2NN-based classifiers was explored. The first loss function was defined using the mean squared error (MSE) between the output plane intensity, $S^{l+1}$, and the target intensity distribution for the corresponding label, $G^{l+1}$, i.e.,
$$L = \frac{1}{K}\sum_{i=1}^{K}\left(S_i^{l+1} - G_i^{l+1}\right)^2, \qquad (16)$$
where K refers to the total number of sampling points representing the entire diffraction pattern at the output plane.
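As a non-limiting illustration, the MSE loss over the K sampling points of the output plane may be computed as in the following NumPy sketch. The helper names, the toy 4×4 output plane and the choice of a target with unit intensity inside the detector of the true class and zero elsewhere are assumptions for the example, not details taken from the disclosure.

```python
import numpy as np

def target_intensity(shape, detector_slices, label):
    """Hypothetical target: unit intensity inside the detector region
    assigned to the true class, zero everywhere else."""
    target = np.zeros(shape)
    target[detector_slices[label]] = 1.0
    return target

def mse_loss(output_intensity, target):
    """Mean squared error over all K sampling points of the output plane."""
    s = np.asarray(output_intensity, dtype=float)
    g = np.asarray(target, dtype=float)
    return float(np.mean((s - g) ** 2))

# Toy output plane with two 2x2 detector regions (one per class).
detectors = {
    0: (slice(0, 2), slice(0, 2)),
    1: (slice(2, 4), slice(2, 4)),
}
g = target_intensity((4, 4), detectors, label=1)
s = np.zeros((4, 4))
s[2:4, 2:4] = 0.5            # half of the desired signal reaches detector 1
loss = mse_loss(s, g)        # 4 points off by 0.5 -> 4 * 0.25 / 16 = 0.0625
```

Because every sampling point contributes to the sum, light falling outside the detectors is penalized as well, which is the property that distinguishes this loss from the detector-only cross-entropy loss.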
The second loss function used in combination with the all-optical D2NN 10 is the cross-entropy. To use the cross-entropy loss function, an additional softmax layer is introduced and applied to the detected intensities (only during the training phase of a diffractive neural network design). Since the softmax function is not scale invariant, the intensities measured by the D detectors at the output plane are normalized such that they lie in the interval (0,10) for each sample. With $I_l$ denoting the total optical signal impinging onto the lth detector at the output plane, the normalized intensities, $I'_l$, can be found by,
$$I'_l = \frac{I_l}{\max\{I_l\}} \times 10. \qquad (17)$$
In parallel, the cross-entropy loss function can be written as follows:
$$L = -\sum_{l=1}^{D} g_l \log(p_l), \qquad (18)$$
where $p_l = \dfrac{e^{I'_l}}{\sum_{j=1}^{D} e^{I'_j}}$ and $g_l$ refer to the lth element in the output of the softmax layer, and the lth element of the ground truth label vector, respectively.
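For illustration, the softmax-cross-entropy loss acting on the D detector signals can be sketched as follows. The text specifies only that the signals are normalized to lie in the interval (0,10); dividing by the largest detector signal and multiplying by 10 is the normalization assumed in this sketch, and the function name is hypothetical.

```python
import numpy as np

def softmax_cross_entropy(detector_signals, label_one_hot, scale=10.0):
    """Cross-entropy between softmax of normalized detector signals and a
    one-hot label. Returns (loss, softmax probabilities)."""
    i = np.asarray(detector_signals, dtype=float)
    g = np.asarray(label_one_hot, dtype=float)
    # Assumed normalization: largest detector signal maps to `scale`.
    i_norm = scale * i / np.max(i)
    # Subtracting the maximum does not change the softmax; it avoids overflow.
    z = i_norm - np.max(i_norm)
    p = np.exp(z) / np.sum(np.exp(z))
    return float(-np.sum(g * np.log(p))), p

signals = [1.0, 4.0, 2.0]                      # toy signals for D = 3 detectors
loss, probs = softmax_cross_entropy(signals, [0, 1, 0])
```

With this normalization the loss depends only on the ratios between detector signals, so uniformly brighter or dimmer illumination of the same object yields the same loss value.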
A key difference between the two loss functions is already apparent from eq. (16) and eq. (18). While the MSE loss function is acting on the entire diffraction signal at the output plane of the diffractive network, the softmax-cross-entropy is applied to the detected optical signal values ignoring the optical field distribution outside of the detectors (one detector is assigned per class). This approach based on softmax-cross-entropy loss brings additional degrees-of-freedom to the diffractive neural network training process, boosting the final classification performance, at the cost of reduced diffraction efficiency and signal contrast at the output plane. For both the imaging optics-based and hybrid (D2NN+electronic) classification systems presented in the tables of
Diffractive Network Training
All neural networks (optical and/or digital) were simulated using Python (v3.6.5) and the TensorFlow (v1.10.0, Google Inc.) framework. All-optical, hybrid and electronic networks were trained for 50 epochs using a desktop computer with a GeForce GTX 1080 Ti graphics processing unit (GPU), an Intel® Core™ i9-7900X CPU @3.30 GHz and 64 GB of RAM, running the Windows 10 operating system (Microsoft).
Two datasets were used in the training of the presented classifiers: MNIST and Fashion-MNIST. Both datasets have 70,000 objects/images, out of which 55,000 and 5,000 were selected as training and validation sets, respectively. The remaining 10,000 were reserved as the test set. During the training phase, after each epoch the performance of the current model was tested on the 5K validation set and, upon completion of the 50th epoch, the model with the best performance on the 5K validation set was selected as the final design of the network models. All the numbers reported herein are blind testing accuracy results obtained by applying these selected models to the 10K test sets.
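The data split and the model selection rule described above can be summarized by the following plain-Python sketch. The function names and the example accuracy values are hypothetical, and breaking ties in favor of the earliest epoch is an assumption; the text does not state how ties were handled.

```python
def split_sizes(n_total=70_000, n_train=55_000, n_val=5_000):
    """Return (train, validation, test) set sizes; the test set is the remainder."""
    n_test = n_total - n_train - n_val
    return n_train, n_val, n_test

def select_best_epoch(val_accuracy_per_epoch):
    """Return the 0-based index of the epoch with the highest validation
    accuracy. Assumption: the earliest epoch wins a tie."""
    best = 0
    for epoch, accuracy in enumerate(val_accuracy_per_epoch):
        if accuracy > val_accuracy_per_epoch[best]:
            best = epoch
    return best

sizes = split_sizes()
# Made-up validation accuracies for four epochs, for illustration only.
best_epoch = select_best_epoch([0.90, 0.95, 0.95, 0.93])
```

Only the model saved at the selected epoch would then be evaluated on the test set, so the test images play no part in choosing the final design.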
The trainable parameters in a diffractive neural network are the modulation values of each layer, which were optimized using a back-propagation method by applying the adaptive moment estimation optimizer (Adam) with a learning rate of 10⁻³. A diffractive layer size of 200×200 neurons 24 per substrate layer 16 was chosen, and the neurons were initialized with π for phase values and 1 for amplitude values. The training time was approximately 5 hours for a 5-layer D2NN design with the hardware outlined above.
D2NN-Based Hybrid Network Design and Training
To further explore the potential of the D2NN framework, diffractive network layers were co-trained together with digital neural networks to form hybrid systems. In these systems, the detected intensity distributions at the output plane 22 of the diffractive network 42 were taken as the input for the digital neural network 44 at the back-end of the system. To begin with, keeping the optical architecture and the detector arrangement at the output plane of the diffractive network the same as in the all-optical case, a single fully-connected layer was introduced as an additional component (replacing the simple max operation of an all-optical network), which maps the optical signal values coming from D individual detectors into a vector of the same size (i.e., the number of classes in the dataset). Since there are 10 classes in both the MNIST and Fashion-MNIST datasets, this simple fully-connected digital structure brings an additional 110 trainable variables (i.e., 100 coefficients in the weight matrix and 10 bias terms) into the hybrid system 40.
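The parameter count of this single fully-connected readout can be checked with a short sketch. The function names and the toy detector signals below are hypothetical and serve only to illustrate the arithmetic and the mapping from D detector signals to D class scores.

```python
import numpy as np

def fc_param_count(n_inputs, n_outputs, use_bias=True):
    """Number of trainable variables in one fully-connected layer."""
    return n_inputs * n_outputs + (n_outputs if use_bias else 0)

def fc_readout(detector_signals, weights, bias):
    """Map D detector signals to class scores; the predicted class is the argmax."""
    return np.asarray(detector_signals, dtype=float) @ weights + bias

n_classes = 10
n_params = fc_param_count(n_classes, n_classes)   # 100 weights + 10 biases

# With identity weights and zero bias the layer reduces to the plain max
# operation of the all-optical classifier, so it can only add flexibility.
signals = np.array([0.1, 0.3, 0.2, 0.9, 0.1, 0.0, 0.4, 0.2, 0.1, 0.3])
scores = fc_readout(signals, np.eye(n_classes), np.zeros(n_classes))
predicted = int(np.argmax(scores))
```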
Hybrid configurations that pair D2NNs with CNNs, a more popular architecture than fully-connected networks for object classification tasks, were also assessed. In such an arrangement, when the optical and electronic parts are directly cascaded and jointly-trained, the inference performance of the overall hybrid system was observed to stagnate at a local minimum (see
In the second stage of the training process, the already trained 5-layer D2NN optical front-end 42 (preceding the detector array 26) was cascaded and jointly-trained with a digital neural network 44. It is important to note that the digital neural network in this configuration was trained from scratch. This type of procedure “resembles” transfer learning, where the additional layers (and data) are used to augment the capabilities of a trained model. Using the above described training strategy, the impact of different configurations was studied, by increasing the number of detectors forming an opto-electronic detector array 26, with a size of 10×10, 25×25 and 50×50 pixels. Having different pixel sizes (see Table 1), all the three configurations (10×10, 25×25 and 50×50 pixels) cover the central region of approximately 53.3λ×53.3λ at the output plane of the D2NN 42. Note that each detector configuration represents different levels of spatial undersampling applied at the output plane 22 of a D2NN 42, with 10×10 pixels corresponding to the most severe case. For each detector configuration, the first stage of the hybrid system training, shown in
For the digital part, five different networks were analyzed, representing different levels of complexity regarding (1) the number of trainable parameters, (2) the number of FLOPs in the forward model and (3) the energy consumption; see Table 1. The comparative analysis of energy consumption in Table 1 assumes that 1.5 pJ is needed for each multiply-accumulate (MAC) operation; based on this assumption, the 4th column of Table 1 reports the energy needed for each network configuration to classify an input image. The first of these digital neural networks was a single fully-connected (FC) network connecting every pixel of the detector array with each one of the 10 output classes, providing as few as 1,000 trainable parameters (see Table 1 for details). The 2C2F-1 network was a custom designed CNN with 2 convolutional and 2 FC layers and only a single filter/feature at each convolutional layer (see Table 2). The third network was LeNet, which requires an input size of 32×32 pixels; the detector array values were therefore resized using bilinear interpolation before being fed into the electronic neural network. The fourth network architecture used in the comparative analysis (i.e., 2C2F-64), as described in (https://www.tensorflow.org/tutorials/estimators/cnn), has 2 convolutional and 2 fully-connected layers similar to the second network, but with 32 and 64 features at the first and second convolutional layers, respectively, and has larger FC layers compared to the 2C2F-1 network. The last network choice was ResNet-50 with 50 layers, which was only jointly-trained using the 50×50 pixel detector configuration, the output of which was resized using bilinear interpolation to 224×224 pixels before being fed into the network. The loss function of the D2NN-based hybrid system was calculated by cross-entropy, evaluated at the output of the digital neural network.
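The energy estimate described above is simple arithmetic: the MAC count of the forward pass multiplied by 1.5 pJ. The following sketch illustrates the calculation; the function names are hypothetical, bias additions are ignored, and the convolutional layer dimensions in the example are illustrative values rather than entries from Table 1 or Table 2.

```python
PJ_PER_MAC = 1.5  # energy per multiply-accumulate assumed in the text

def fc_macs(n_inputs, n_outputs):
    """MACs for one fully-connected layer (bias additions ignored)."""
    return n_inputs * n_outputs

def conv2d_macs(out_h, out_w, k_h, k_w, c_in, c_out):
    """MACs for one 2D convolutional layer with the given output size."""
    return out_h * out_w * k_h * k_w * c_in * c_out

def energy_joules(macs, pj_per_mac=PJ_PER_MAC):
    """Energy per inference, in joules, for a given MAC count."""
    return macs * pj_per_mac * 1e-12

# Single FC layer fed by a 10x10 detector array with 10 output classes.
fc_energy = energy_joules(fc_macs(10 * 10, 10))        # 1000 MACs

# Illustrative convolutional layer: 28x28 output, 5x5 kernel, 1 -> 32 features.
conv_energy = energy_joules(conv2d_macs(28, 28, 5, 5, 1, 32))
```

Under this model a deeper back-end costs energy in direct proportion to its MAC count, which is why the size of the detector array and the choice of electronic network dominate the energy budget of a hybrid system with a passive optical front-end.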
As in D2NN-based hybrid systems, the objects were assumed to be purely amplitude modulating functions for perfect imager-based classification systems presented in the tables of
The two-stage training procedure was developed mainly because of the unbalanced nature of the D2NN-based hybrid systems, especially if the electronic part of the hybrid system is a powerful deep convolutional neural network (CNN) such as ResNet. Being the more powerful of the two and coming later in the information processing order, deep CNNs adapt and converge faster than D2NN-based optical front-ends. Therefore, directly cascading and jointly-training D2NNs with deep CNNs offers a suboptimal solution for the classification accuracy of the overall hybrid system. In this regard, the tables in
After training this model for 50 epochs, the layers of the diffractive network preceding the detector array 26 are taken as the initial condition for the optical part in the second stage of the training process (see
While embodiments of the present invention have been shown and described, various modifications may be made without departing from the scope of the present invention. The drawings may refer to various dimensions such as spacing between substrate layers 16. Such dimensional information is for explanatory purposes and should not limit the scope of the invention. In addition, features of one specific embodiment may be used in another embodiment even though not explicitly described herein. For example, optically reflective substrates 16 may be combined with optically transmissive substrates 16 in some embodiments. Likewise, the electronic network 44 back-end may be used in conjunction with a reflective embodiment such as that disclosed in
This Application is a U.S. National Stage filing under 35 U.S.C. § 371 of International Application No. PCT/US2019/027275, filed Apr. 12, 2019, which claims priority to U.S. Provisional Patent Application No. 62/657,405 filed on Apr. 13, 2018, U.S. Provisional Patent Application No. 62/703,029 filed on Jul. 25, 2018 and U.S. Provisional Patent Application No. 62/740,724 filed on Oct. 3, 2018, which are hereby incorporated by reference. Priority is claimed pursuant to 35 U.S.C. §§ 119, 371 and any other applicable statute.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/027275 | 4/12/2019 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/200289 | 10/17/2019 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
4963725 | Hong | Oct 1990 | A |
5080464 | Toyoda | Jan 1992 | A |
5095459 | Ohta et al. | Mar 1992 | A |
5255362 | Brandstetter | Oct 1993 | A |
5440671 | Shiratani | Aug 1995 | A |
5842191 | Stearns | Nov 1998 | A |
6445470 | Jenkins et al. | Sep 2002 | B1 |
7512573 | Martinez | Mar 2009 | B2 |
10217023 | Rubin | Feb 2019 | B1 |
11017309 | Roques-Carmes | May 2021 | B2 |
20100209830 | Carcasi et al. | Aug 2010 | A1 |
20170351293 | Carolan et al. | Dec 2017 | A1 |
Entry |
---|
Bagherian, H. et al., On-Chip Optical Convolutional Neural Networks, arXiv:1808.03303v2 [cs.ET] Aug. 16, 2018. |
Brunner, D. et al., Parallel photonic information processing at gigabyte per second data rates using transient states, Nature Communications, 4:1364, DOI: 10.1038/ncomms2368, www.nature.com/naturecommunications, Publication Year 2013. |
Bueno, J. et al., Reinforcement Learning in a large scale photonic Recurrent Neural Network, arXiv:1711.05133v2 [cs.NE] Nov. 15, 2017. |
Chakraborty, I. et al., Toward Fast Neural Computing using All-Photonic Phase Change Spiking Neurons, arXiv:1804.00267v2 [cs.ET] Aug. 28, 2018. |
Chang, J. et al., Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification, Scientific Reports (2018) 8:12324, DOI: 10.1038/s41598-018-30619-y. |
Chen, Y. et al., Deep Learning-Based Classification of Hyperspectral Data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, No. 6, Jun. 2014. |
Cignoni, P. et al., MeshLab: an Open-Source Mesh Processing Tool, Eurographics Italian Chapter Conference (2008). |
Farhat, N. H. et al., Optical implementation of the Hopfield model, May 15, 1985, vol. 24, No. 10, Applied Optics, 1469-1475. |
Girshick, R. et al., Rich feature hierarchies for accurate object detection and semantic segmentation Tech report (v5), arXiv:1311.2524V5 [cs.CV] Oct. 22, 2014. |
Golik, P. et al., Cross-Entropy vs. Squared Error Training: a Theoretical and Experimental Comparison, Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, 52056 Aachen, Germany (2013). |
Grischkowsky, D. et al., Far-infrared time-domain spectroscopy with terahertz beams of dielectrics and semiconductors, J. Opt. Soc. Am. B/vol. 7, No. 10/Oct. 1990. |
Hermans, M. et al., Trainable & Dynamic Computing: Error Backpropagation Through Physical Media, arXiv:1407.6637v1 [cs.NE] Jul. 24, 2014. |
Hughes, T.W. et al., Training of photonic neural networks through in situ backpropagation, arXiv:1805.09943v1 [physics.optics] May 25, 2018. |
Kazhdan, M. et al., Screened Poisson Surface Reconstruction, ACM Transactions on Graphics, vol. VV, No. N, Article XXX, Publication date: 2013. |
Khorasaninejad, M. et al., Metalenses at visible wavelengths: Diffraction-limited focusing and subwavelength resolution imaging, sciencemag.org, SCIENCE, Jun. 3, 2016, vol. 352, Issue 6290, 1190-1194. |
Kildishev, A. V. et al., Planar Photonics with Metasurfaces, Science 339, 1232009 (2013). DOI: 10.1126/science.1232009. |
Kingma, D. P. et al., ADAM: A Method for Stochastic Optimization, arXiv:1412.6980v9 [cs.LG] Jan. 30, 2017. |
LeCun, Y. et al., Gradient-Based Learning Applied to Document Recognition, Proc. of the IEEE, Nov. 1998 (46 pages). |
Lin, X. et al., All-optical machine learning using diffractive deep neural networks, Science 361, 1004-1008 (2018), Sep. 7, 2018. |
Psaltis, D. et al., Optical information processing based on an associative-memory model of neural nets with thresholding and feedback, Optics Letters, vol. 10, No. 2, Feb. 1985. |
Psaltis, D. et al., Adaptive optical networks using photorefractive crystals, Applied Optics, vol. 27, No. 9, May 1, 1988. |
Shastri, B. et al., Principles of Neuromorphic Photonics, arXiv:1801.00016v1 [cs.ET] Dec. 29, 2017. |
Shen, Y. et al., Deep Learning with Coherent Nanophotonic Circuits, arXiv:1610.02365v1 [physics.optics] Oct. 7, 2016. |
Srivastava, N. et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (2014) 1929-1958 Submitted 11/13; Published Jun. 2014. |
Tang, Y. et al., Deep Learning using Linear Support Vector Machines, arXiv:1306.0239v4 [cs.LG] Feb. 21, 2015. |
Trabelsi, C. et al., Deep Complex Networks, arXiv:1705.09792v4 [cs.NE] Feb. 25, 2018. |
Wagner, K. et al., Multilayer optical learning networks, Dec. 1, 1987, vol. 26, No. 23, Applied Optics, 5061-5076. |
Wan, Li et al., Regularization of Neural Networks using DropConnect, Proceedings of the 30 th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. |
Wang, Z. et al., Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, vol. 13, No. 4, Apr. 2004. |
Weverka, R. T. et al., Fully interconnected, two-dimensional neural arrays using wavelength-multiplexed volume holograms, Optics Letters, vol. 16, No. 11, Jun. 1, 1991. |
Xiao, Y. et al., Nonlinear Metasurface Based on Giant Optical Kerr Response of Gold Quantum Wells, ACS Photonics, 5(5), May 1, 2018, DOI:10.1021/acsphotonics.7b01140. |
Yin, X. et al., Artificial Kerr-type medium using metamaterials, Apr. 9, 2012, vol. 20, No. 8, Optics Express, 8543-8550. |
PCT International Search Report for PCT/US2019/027275, Applicant: The Regents of the University of California, Form PCT/ISA/210 and 220, dated Aug. 19, 2019 (5 pages). |
PCT Written Opinion of the International Search Authority for PCT/US2019/027275, Applicant: The Regents of the University of California, Form PCT/ISA/237, dated Aug. 19, 2019 (9 pages). |
Javidi, B. et al., Optical implementation of neural networks for face recognition by the use of nonlinear joint transform correlators, Applied Optics, vol. 34, No. 20, Jul. 10, 1995, 3950-3962. |
Jutamulia, S. et al., Overview of hybrid optical neural networks, Optics & Laser Technology, vol. 28, No. 2, pp. 59-72, 1996. |
Psaltis, D. et al., Holography in artificial neural networks. Nature. 343, 325-330 (1990). |
Soures, N. et al., Neuro-MMI: A Hybrid Photonic-Electronic Machine Learning Platform. In 2018 IEEE Photonics Society Summer Topical Meeting Series (SUM); 2018; pp. 187-188. |
Yu, F. T. S., II Optical Neural Networks: Architecture, Design and Models. In Progress in Optics; Wolf, E., Ed.; Elsevier, 1993; vol. 32, pp. 61-144. |
Yu, N. et al., Flat optics with designer metasurfaces. Nature Materials. 13, 139-150 (2014). |
Caulfield, H.J. et al., Optical Neural Networks. Proc. IEEE 1989, 77 (10), 1573-1583. |
PCT International Preliminary Report on Patentability (Chapter 1 of the Patent Cooperation Treaty) for PCT/US2019/027275, Applicant: The Regents of the University of California, Form PCT/IB/326 and 373, dated Oct. 22, 2020 (11 pages). |
Number | Date | Country | |
---|---|---|---|
20210142170 A1 | May 2021 | US |
Number | Date | Country | |
---|---|---|---|
62657405 | Apr 2018 | US | |
62703029 | Jul 2018 | US | |
62740724 | Oct 2018 | US |