The present disclosure relates generally to the fields of photonic devices, neural networks, and artificial intelligence, and in particular to systems and methods for fully or partially processing neural network data in the optical domain.
Neural networks are often utilized for data classification, including of images, video, and 3D objects. In conventional photonic neural network implementations, there are significant computational challenges when analyzing large data sets, which may include optical, image, and other data. For example, raw optical data is often analyzed using an image sensor serving as a pixel array, through methods such as photo-detection and digitization. For larger data sets, such as those with a large number of input pixels, the computational load quickly becomes great and processing times lengthen as the data is passed through a plurality of neural network layers. In addition, optical power drops significantly from layer to layer in these processes, which, together with other implementation difficulties, makes realization of non-linear functions challenging. Hence, only a limited number of neuron layers can be implemented before the computational costs, power costs, and non-linear functionality become overly burdensome. Thus, there is a need for improved neural networks, and in particular, for neural networks able to process different types of data.
The present disclosure provides systems and methods for photonic-electronic neural network computation. Embodiments provide the direct processing of raw optical data and/or conversion of various types of input data to the optical domain, and application into neural networks. Through the direct use of data in the optical domain, disclosed systems and methods are able to significantly reduce processing time and computational load, compared to traditional neural network implementations. In various examples, both processing time and power consumption are orders of magnitude lower than conventional methods.
In an embodiment, arrays of input data are processed in an optical domain and applied through a plurality of photonic-electronic neuron layers, such as in a neural network. The data may be passed through one or more convolution cells, training layers, and classification layers to generate output information. Various types of input data, e.g., audio, video, speech, analog, digital, etc., may be directly processed in the optical domain and applied to any numbers of layers and neurons in various neural network configurations. Systems and methods may also be integrated with one or more photonic-electronic systems, including but not limited to 3D imagers, optical phased arrays, photonic assisted microwave imagers, high data-rate photonic links, and photonic neural networks.
The appended drawings are illustrative only and are not necessarily drawn to scale. In the drawings:
The present disclosure may be understood more readily by reference to the following detailed description of desired embodiments and the examples included therein.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. In case of conflict, the present document, including definitions, will control. Preferred methods and materials are described below, although methods and materials similar or equivalent to those described herein can be used in practice or testing. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.
The singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.
As used in the specification and in the claims, the term “comprising” may include the embodiments “consisting of” and “consisting essentially of.” The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that require the presence of the named ingredients/steps and permit the presence of other ingredients/steps. However, such description should be construed as also describing compositions or processes as “consisting of” and “consisting essentially of” the enumerated ingredients/steps, which allows the presence of only the named ingredients/steps, along with any impurities that might result therefrom, and excludes other ingredients/steps.
As used herein, the terms “about” and “at or about” mean that the amount or value in question can be the designated value, some other value approximately the same, or about the same value. It is generally understood, as used herein, that the nominal value indicated includes a ±10% variation unless otherwise indicated or inferred. The term is intended to convey that similar values promote equivalent results or effects recited in the claims. That is, it is understood that amounts, sizes, formulations, parameters, and other quantities and characteristics are not and need not be exact, but can be approximate and/or larger or smaller, as desired, reflecting tolerances, conversion factors, rounding off, measurement error and the like, and other factors known to those of skill in the art. In general, an amount, size, formulation, parameter or other quantity or characteristic is “about” or “approximate” whether or not expressly stated to be such. It is understood that where “about” is used before a quantitative value, the parameter also includes the specific quantitative value itself, unless specifically stated otherwise.
Unless indicated to the contrary, the numerical values should be understood to include numerical values which are the same when reduced to the same number of significant figures and numerical values which differ from the stated value by less than the experimental error of conventional measurement technique of the type described in the present application to determine the value.
All ranges disclosed herein are inclusive of the recited endpoints and independently combinable (for example, the range of “from 2 grams to 10 grams” is inclusive of the endpoints, 2 grams and 10 grams, and all the intermediate values). The endpoints of the ranges and any values disclosed herein are not limited to the precise range or value; they are sufficiently imprecise to include values approximating these ranges and/or values.
As used herein, approximating language may be applied to modify any quantitative representation that may vary without resulting in a change in the basic function to which it is related. Accordingly, a value modified by a term or terms, such as “about” and “substantially,” may not be limited to the precise value specified, in some cases. In at least some instances, the approximating language may correspond to the precision of an instrument for measuring the value. The modifier “about” should also be considered as disclosing the range defined by the absolute values of the two endpoints. For example, the expression “from about 2 to about 4” also discloses the range “from 2 to 4.” The term “about” may refer to plus or minus 10% of the indicated number. For example, “about 10%” may indicate a range of 9% to 11%, and “about 1” may mean from 0.9-1.1. Other meanings of “about” may be apparent from the context, such as rounding off, so, for example “about 1” may also mean from 0.5 to 1.4. Further, the term “comprising” should be understood as having its open-ended meaning of “including,” but the term also includes the closed meaning of the term “consisting.” For example, a composition that comprises components A and B may be a composition that includes A, B, and other components, but may also be a composition made of A and B only. Any documents cited herein are incorporated by reference in their entireties for any and all purposes.
Building on work on large-scale integrated electronic-photonic systems, including 3D imagers, optical phased arrays, photonic assisted microwave imagers, high data-rate photonic links, and photonic neural networks, the inventors have been designing and implementing multi-layer integrated photonic-mmWave deep neural networks for image, video, and 3D object classification. In the disclosed system, images are taken using an array of pixels and directly processed in the optical domain for both or either of the learning and classification phases, with part of the processing (including the non-linear function) performed in electrical (analog, digital, RF, mm-wave, etc.) blocks. The invention also includes processing of other types of input data, including but not limited to audio, video, speech, and/or the analog or digital representation of any type of data.
Compared to state-of-the-art GPU-based systems, the disclosed architecture, which can be implemented at any number of layers and neurons in many different configurations, directly processes the raw optical data or any type of data after up-conversion to the optical domain (without photo-detection/digitization) with orders-of-magnitude faster processing time, orders-of-magnitude lower power consumption, and scalability to complex practical deep networks.
Unlike recent implementations of photonic neural networks, where optical power drops significantly layer by layer (hence a limited number of neuron layers can be implemented), the disclosed monolithic electronic-photonic system (1) contains several neuron layers and can be utilized in practical applications, (2) utilizes a strong and programmable yet ultra-fast mmWave non-linear function, and (3) is highly scalable to many layers, as the same optical power is available to each layer.
The inventors have already designed and successfully measured many blocks of this system, such as the photonic-mmWave neuron, non-linear function, and 3D imager front-end, and have taped-out the first version of the multi-layer deep network to be demonstrated in the course of the competition. Chip simulations show 280 ps classification time (per frame) and 2 ns training time (per iteration).
The inventors disclose a design and implementation of an integrated photonic deep neural network for image, video, and 3D object classification. While the disclosed integrated photonic architecture directly processes the raw optical (image) data collected at the input pixels, which significantly reduces the system complexity and power consumption by eliminating the photo-detection and digitization of the input image data, it can also be used for other types of data after up-conversion to the optical domain.
The elements of the correlation output matrix are arranged and fed to the neurons in the first layer (i.e., input layer) of the neural network. Besides the input layer, the typical deep network architecture is composed of an output layer and intermediate “hidden” layers. For networks with a large number of input pixels, multiple convolution layers can be used to further lower the computation load.
Sample images of hand-written numbers (for step 1) are shown in
To realize the convolution layer using overlapping sliding windows, a photonic waveguide network is designed to route the optical signals from twelve 3×3 overlapping windows of pixels to an array of convolution cells (CC). Different sizes and types of windows can be used. Each 3×3 waveguide array forms the inputs of a convolution cell. Within each CC, the inner product of the input optical signals and the pre-programmed 3×3 convolution matrix is photonically calculated. The outputs of the 12 convolution cells are arranged and routed to four photonic-electronic neurons (i.e., 3 inputs per neuron) forming the input layer of the deep learning network. Within each photonic-electronic neuron, the input optical waves are combined after their amplitudes are adjusted according to the weight associated with each input. The non-linear activation function is realized in the electro-optical or electrical domain, and the signal is up-converted back to the optical domain to form the neuron output. Additional devices and systems within each neuron are implemented, enabling the electronic-photonic neuron to be used in both forward propagation (in the classification phase) and backward propagation (in the training phase). The second layer, the hidden layer, is composed of three 4-input photonic-electronic neurons and is followed by the output layer with two photonic-electronic neurons. This photonic deep neural network will be used to perform 2-class classification of images. For example, the system can be trained with images of two digits (e.g., “0” and “2”) and used to classify the images of these two digits. The details of each component of the architecture in
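The windowing and inner-product operation described above can be sketched in software as follows. This is an illustrative numpy emulation only (the disclosed system performs it photonically in waveguides and convolution cells); the averaging kernel and the stride of one are assumptions chosen so that a 6×5 pixel array yields twelve 3×3 overlapping windows.

```python
# Illustrative emulation of the photonic convolution layer: twelve overlapping
# 3x3 windows of the 6x5 pixel array, each reduced by an inner product with a
# shared, pre-programmed 3x3 kernel (a stand-in averaging kernel here).
import numpy as np

def convolution_layer(pixels, w_conv, window=3, stride=1):
    """pixels: 2D array of optical intensities; w_conv: 3x3 kernel shared by all cells."""
    rows, cols = pixels.shape
    outputs = []
    for r in range(0, rows - window + 1, stride):
        for c in range(0, cols - window + 1, stride):
            patch = pixels[r:r + window, c:c + window]
            outputs.append(np.sum(patch * w_conv))  # inner product per convolution cell
    return np.array(outputs)

pixels = np.random.rand(6, 5)        # stand-in for the 6x5 grating-coupler array
w_conv = np.full((3, 3), 1.0 / 9.0)  # hypothetical averaging kernel
j = convolution_layer(pixels, w_conv)
# 4 row positions x 3 column positions = 12 convolution-cell outputs, grouped
# as four 3-element input vectors I1..I4 for the input-layer neurons
i_vectors = j.reshape(4, 3)
```

A 6×5 array with a 3×3 window and unit stride gives 4×3 = 12 windows, matching the twelve convolution cells and the four 3-input neurons described above.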
In another embodiment of the neuron in this disclosure, neurons can be used to perform complex signal analysis where both the amplitude and phase of the electric field of light are processed. An example is shown in
When the input current to the TIA, iin, is small enough (less than a certain threshold), the output power is set to Pout = 0.003Ps. As iin increases, the modulator output power increases almost linearly as Pout = Ps(1 + 0.07Kiin), where iin is in mA and K = KTKd. For large enough iin, the electronic-photonic neuron output saturates at Pout = 0.65Ps. Note that the shape of the activation function can be adjusted by changing the TIA gain, the BL power (Ps), and the DC current at the modulator driver output. The DC part of the modulator driver current can be used to adjust the relative location of the notch with respect to the wavelength. For Pout < 0.65Ps, corresponding to the non-saturated response, the activation function can be approximated by the rectified linear unit (ReLU), which is a known activation function for neural networks [12]. For the case that Pout includes the saturation region in
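A software sketch of this activation shape is given below. It is hedged: the text gives the floor (0.003Ps), the saturation ceiling (0.65Ps), and states the non-saturated response approximates a ReLU; the linear slope and the threshold at zero current used here are illustrative assumptions, since the real shape depends on the TIA gain, the BL power Ps, and the modulator driver DC current.

```python
# Hedged sketch of the described electro-optic activation: a floor of 0.003*Ps
# for small input current, a ReLU-like linear rise, and saturation at 0.65*Ps.
# The slope value and zero-current threshold are assumptions for illustration.
def activation(i_in_mA, Ps=1.0, slope=0.07):
    if i_in_mA <= 0.0:
        return 0.003 * Ps  # stated floor for small input current
    # ReLU-like rise, clipped at the stated saturation level
    return min(0.003 * Ps + slope * Ps * i_in_mA, 0.65 * Ps)
```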
The inventors have also designed a TIA and ring modulator driver as one block in the GlobalFoundries GF9WG CMOS SOI process, with a simulated bandwidth of 27 GHz and current gain of 10 A/A. This disclosure includes other types of TIAs and amplifiers used between the photodiodes and the modulating device within a neuron.
For the deep neural network in
Over the past few years, the inventors have designed, implemented, and measured many photonic devices and components on the GlobalFoundries GF9WG CMOS SOI process as well as other photonic and photonic-enabled CMOS processes, and created Verilog-A models for many photonic devices based on their measured or simulated performances. On this process, electronic and photonic devices and blocks can be co-simulated using Cadence tools. The same approach has been used to design and successfully demonstrate a few monolithically co-integrated electronic-photonic systems on the GlobalFoundries GF7SW CMOS SOI process as well as hybrid-integrated electronic-photonic systems. The inventors will use the GF9WG process to implement the photonic deep learning networks. To validate the entire design of the photonic deep learning network to be implemented in the first step (in
In this section, an example of the classification of the 6×5-pixel handwritten numbers is used to explain the principle of operation of the forward propagation process for the system taped-out and to be demonstrated. As the target image is formed on the input 6×5 grating coupler array, optical waves are coupled into the input waveguides, passed through the routing network to generate 108 optical signals (corresponding to 12 overlapping 3×3 subimages), and arrive at 12 convolution cells used to compute the convolution. The outputs of the convolution cells are arranged into 4 rows of 3 optical signals and routed to the input of 4 neurons of the input layer. If the output of the 6×5 grating coupler array is rearranged into a column vector, Px (of size 30×1), twelve different 9×30 matrices C1 to C12 representing the distribution network (including the corresponding optical losses) can be defined to find the intensity of light at the convolution cells. In this case, the input to the ith convolution cell is written as Qi = Ci × Px, where Qi is a 9×1 vector. Within each convolution cell, the inner product of the input vector and the 1×9 convolution weight vector, Wconv, is calculated as the cell output: Ji = Wconv × Qi = Wconv × Ci × Px. Note that the convolution weight vector is the same for all 12 convolution cells and does not change during the training and classification phases. The 12 outputs of the convolution cells are arranged into four 3×1 arrays, each used as the input to one of the four electronic-photonic neurons of the input layer as I1 = [J1 J2 J3]T, I2 = [J4 J5 J6]T, I3 = [J7 J8 J9]T, and I4 = [J10 J11 J12]T, where I1, I2, I3, and I4 represent 3×1 input vectors for the four neurons in the input layer. The output of each neuron is generated by passing the weighted sum of its inputs through the non-linear activation function. Thus, the output of the ith neuron in the first layer is written as Oin,i = ƒ(Win,i × Ii), where Win,i and f(.)
represent the 3-element weight vector for the ith neuron in the input layer (i = 1, 2, 3, 4) and the activation function, respectively. Similarly, the output of the ith neuron in the hidden layer (2nd layer) is written as Oh,i = ƒ(Wh,i × [Oin,1 Oin,2 Oin,3 Oin,4]T), where Wh,i represents the 4-element weight vector in the ith neuron in the hidden layer (i = 1, 2, 3) and T denotes the transpose operation. In the matrix format, assuming that Oin = [Oin,1 Oin,2 Oin,3 Oin,4]T and Oh = [Oh,1 Oh,2 Oh,3]T, then
where Wh is a 3×4 matrix whose rows are the Wh,i vectors for i = 1, 2, 3. Finally, the outputs of the output layer (3rd layer) are calculated as Oo,i = ƒ(Wo,i × [Oh,1 Oh,2 Oh,3]T), where Wo,i represents the 3-element weight vector in the ith neuron (i = 1, 2) in the output layer. In the matrix format, assuming that Oo = [Oo,1 Oo,2]T, then Oo = ƒ(Wo × Oh), where Wo is a 2×3 matrix whose rows are the Wo,i vectors for i = 1, 2. The outputs of the 3rd layer, Oo,1 and Oo,2, are used to determine the class of the input image. While the distribution network matrices (C1 to C12) depend only on the layout of the distribution network, and the convolution weight vector is pre-defined and unchanged during the training and classification, the weight vectors for all other layers (i.e., Win,i, Wh,i, and Wo,i) are calculated during the training phase and updated electronically by setting the currents of the optical attenuators. Note that in this work, similar to the typical CNN, the weights of the convolution cells in the convolution layers are set to the same values; however, in another embodiment, the weights could be different for different convolution cells.
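The layered computation above can be emulated in plain numpy as follows. This is a behavioral sketch only (the chip performs these operations photonically), the ReLU stands in for the electro-optic activation per the approximation noted earlier, and the random weights are placeholders for the trained attenuator settings.

```python
# Behavioral sketch of the forward pass: 12 convolution outputs J -> four
# 3-input neurons (input layer) -> three 4-input hidden neurons -> two
# output neurons, with ReLU standing in for the electro-optic activation.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(j_outputs, W_in, W_h, W_o):
    """j_outputs: 12 convolution-cell outputs; W_in: 4x3 (one 3-vector per
    input neuron); W_h: 3x4 matrix; W_o: 2x3 matrix. Returns Oo (2 values)."""
    I = j_outputs.reshape(4, 3)                                 # I1..I4
    O_in = relu(np.array([W_in[i] @ I[i] for i in range(4)]))   # input layer
    O_h = relu(W_h @ O_in)                                      # hidden layer
    O_o = relu(W_o @ O_h)                                       # output layer
    return O_o

rng = np.random.default_rng(0)
O_o = forward(rng.random(12), rng.random((4, 3)),
              rng.random((3, 4)), rng.random((2, 3)))
```

The two entries of `O_o` correspond to Oo,1 and Oo,2, whose comparison determines the class of the input image.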
The array of 6×5 grating couplers can be similar to the one the inventors used for coherent imaging [5] but with a larger fill-factor. In this case, if an amplified laser emitting 50 mW at 1550 nm is used for illumination using a narrow-beam collimator from 0.5 m distance, once a focused image is formed, each pixel of the on-chip grating coupler array receives about 0.5 µW. To examine the performance of the photonic neural network in
The labels corresponding to the images are also loaded into the Cadence simulator and are used for supervised training. The entire system is realized in Cadence using the Verilog-A models of the photonic components next to the electronic devices instantiated from the GF9WG process PDK, and simulated using the Cadence SpectreRF tool. Images in the training set are fed to the system one-by-one. Digital computation and weight setting are performed using Verilog-A blocks emulating an off-chip microcontroller. First, random initial weights (within the valid expected range) are set for all neurons. Then, the images within the training set (1800 images) are input to the system one-by-one. For each image, after forward propagation is completed, the outputs of the network, Oo,1 and Oo,2, are calculated and read by the microcontroller (emulated using Verilog-A blocks in Cadence simulation).
Output error signals, eo,1 and eo,2, are calculated by subtracting the network outputs from the target values Target1 and Target2 (that are hard-coded in the Verilog-A code), that is, eo = [eo,1 eo,2]T = [Target1 − Oo,1 Target2 − Oo,2]T. At this point, the error signals will be propagated backward and used to update the weight vectors for the photonic-electronic neurons within different layers. First, the output error signals are used to find the equivalent error signals referred to the hidden layer based on the corresponding weights [9]. The current weight vectors are stored in the microcontroller (emulated by Verilog-A blocks in Cadence). Therefore, the equivalent error signals back propagated to the hidden layer are calculated as
where eh = [eh,1 eh,2 eh,3]T and
is the normalized output layer weight function, with ∑ Wo,i representing the sum of all 3 elements of Wo,i. Using the gradient descent method with a quadratic cost function [9], and assuming a ReLU activation function (see
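The error back-propagation step just described can be sketched as follows. This is a hedged software emulation: each output-layer weight vector is normalized by the sum of its elements, and the transpose of the normalized matrix maps the output errors back onto the hidden layer, as in the expression above.

```python
# Hedged sketch of the back-propagated hidden-layer errors: e_h is obtained by
# applying the transpose of the element-sum-normalized output weight matrix
# to the output error vector e_o.
import numpy as np

def backprop_errors(e_o, W_o):
    """e_o: 2-element output error vector; W_o: 2x3 output-layer weight matrix."""
    # normalize each neuron's weight vector by the sum of its 3 elements
    W_bar = W_o / W_o.sum(axis=1, keepdims=True)
    return W_bar.T @ e_o   # e_h = [eh,1 eh,2 eh,3]^T
```

For example, with uniform rows in W_o, each hidden-layer error is simply the equally-shared sum of the output errors.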
In the previous section, an all-electronic training process, including the error back propagation and neuron weight update, was explained and used to verify the photonic-electronic forward propagation using Cadence tools. For deep networks with many layers and a large number of neurons per layer, all-electronic training may slow down the training process significantly. In this section, the inventors disclose a novel photonic-electronic architecture capable of backward propagation calculation.
Consider the case that this neuron is placed in layer M. The error from layer M+1 can enter this neuron in the form of an optical signal. Half of this optical signal is guided to a PIN optical attenuator. This attenuator is set to high attenuation during the forward propagation phase and low attenuation during the back propagation phase, to avoid generating errors during the forward propagation phase (classification). The PIN attenuator output at point Z is split into 12 branches with equal powers using a 1×12 MMI coupler splitter (see Table 1). Each output of the MMI is then coupled to one of the neuron input waveguides using a 50/50 directional coupler. Assuming the optical error signal back propagating from the (M+1)th layer to the neuron in the Mth layer has a power of Po, for an N-input neuron, the back propagating optical signal in each output of the MMI (after splitting) will have a power of
Since the PIN attenuators setting the signal weights are bidirectional, the error signals back propagated to the input of the neuron can be written as
where Wi represents the weight of the ith input and the factor ⅛ represents the effect of the two Y-junctions before point Z and the 50/50 coupler after the MMI. Similarly, these error signals continue to back propagate layer by layer to get to the first layer. Note that the power splitting performed by the MMI can be viewed as the error normalization, as the power in each input path is divided by the total number of the neuron inputs.
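Under the stated assumptions, the optically back-propagated error per input can be sketched numerically. This is a hedged reconstruction: the closed-form expressions referenced above are in the figures and not reproduced here, so the sketch simply combines the stated ⅛ factor, the bidirectional weight Wi, and the 1/N MMI normalization.

```python
# Hedged sketch of the optical error back-distribution for an N-input neuron:
# the error power P_o is normalized by the number of inputs (the MMI split),
# attenuated by the bidirectional weight W_i, and scaled by the stated factor
# 1/8 from the two Y-junctions and the 50/50 coupler.
def backprop_input_errors(P_o, weights):
    n = len(weights)  # N-input neuron; MMI split acts as a 1/N normalization
    return [0.125 * w * (P_o / n) for w in weights]
```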
After error back propagation, the weights need to be updated. To explain the weight adjustment process, consider the output and hidden layers for the network shown in
the goal is to use the gradient descent method to find the amount that each weight should be adjusted to minimize Etotal. In another embodiment, other optimization methods could be used for weight calculations. In this case, each weight W should be adjusted by
For example, for the first neuron of the output layer,
Defining the MMI output as zo,1 = 0.5(wo,1,1oh,1 + wo,1,2oh,2 + wo,1,3oh,3), the output of this neuron is written as oo,1 = ƒ(Rzo,1), where f(.) represents the ReLU activation function. For this case, the change in wo,1,1 is written as
where α is the slope of the ReLU function (corresponding to its derivative). Then, this weight can be adjusted as wo,1,1 → wo,1,1 - LrΔwo,1,1. Interestingly, LrΔwo,1,1 can also be calculated opto-electronically. As shown in
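The update rule just described can be sketched in software. This is a hedged emulation of the gradient-descent step for one output-layer weight under a quadratic cost E = ½(target − o)², with z = 0.5(w · oh), o = f(Rz), and a ReLU of slope α; the constants R and the 0.5 combiner factor are folded in explicitly, and the sign convention follows the document's w → w − LrΔw.

```python
# Hedged sketch of the weight update for one output-layer weight w_{o,1,1}:
# dE/dw = -(target - o_out) * alpha * R * 0.5 * o_h_j  for E = 0.5*(target-o_out)^2,
# then the weight moves down the gradient: w -> w - Lr * dw.
def update_weight(w, target, o_out, o_h_j, Lr=0.01, alpha=1.0, R=1.0):
    err = target - o_out                   # output error e_{o,1}
    dw = -err * alpha * R * 0.5 * o_h_j    # gradient of the quadratic cost wrt w
    return w - Lr * dw
```

For instance, when the output undershoots the target and the corresponding hidden output is positive, the weight increases, as expected for gradient descent.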
where R, β and GM are the PDi responsivity, the gain of the trans-impedance amplifier, and the gain of the ring modulator R1, respectively. The output of the ring modulator R1 is photo-detected and amplified resulting in a mm-wave voltage that can be written as
where G is the gain of the amplifier after the photo-diode. Defining
this voltage can be written as
Therefore, the learning rate, Lr, can be adjusted by changing the gain of the amplifiers. This mm-wave voltage is connected to an on-chip analog weight and bias adjustment unit. This unit changes the value of wo,1,1, which is stored in a capacitor, to (wo,1,1 - LrΔwo,1,1). Similarly, all weight vectors in the output layer are updated. As shown in
The forward propagation time is mainly limited by the bandwidths of the photodiode, p-n ring modulator, and the mm-wave blocks within the activation functions. To provide a fair comparison between the performance of a deep network implemented on a state-of-the-art GPU platform and a similar photonic-electronic deep network, the inventors have used an NVIDIA Titan V (5120) GPU [10] to implement a typical 7-layer deep network to classify 256×256-pixel images. Using this GPU, the training (3000 iterations) and classification (at 99% accuracy) take 20 min. and 3.8 ms, respectively. The power consumption of this GPU is about 65 W. For the same performance, the training and classification using the disclosed photonic deep network are estimated to take 2.8 ms and 0.5 ns, respectively. Compared to the GPU platform, the power consumption is reduced from 65 W to 1.2 W.
In the second step, the array of grating couplers can be replaced with an alternative device, e.g., an optical phased array (OPA). In this case, both the amplitude and phase of the target object would be available to the deep network, enabling interesting applications such as 3D image classification and phase-contrast image classification. Also, the OPA enables instantaneous free-space image correlation calculation and/or can be used for tracking and classification of fast-moving objects within a large field-of-view. The following references are provided for background and are incorporated herein in their entireties for any and all purposes.
The following embodiments are illustrative only and do not necessarily limit the scope of the present disclosure or the appended claims.
Embodiment 1. A method for artificial neural network computation, comprising: receiving an array of input data; processing the input data in an optical and electro-optical domain; applying the processed input data through a plurality of electronic-photonic neuron layers in a neural network; and generating an output comprising classification information from the neural network.
Embodiment 2. The method of Embodiment 1, wherein the input data comprises at least one of optical data, audio data, image data, video data, speech data, analog data, and digital data.
Embodiment 3. The method of any one of Embodiments 1-2, further comprising upconverting the input data to be directly processed in the optical domain.
Embodiment 4. The method of Embodiment 3, wherein the upconverting occurs without digitization or photo-detection.
Embodiment 5. The method of any one of Embodiments 1-4, wherein the input data is optical data extracted from at least one of a data center connection, a fiber optic communication, and a 3D image.
Embodiment 6. The method of any one of Embodiments 1-5, wherein, at the input layer, the processed input data is weighted and passed through an activation function.
Embodiment 7. The method of any one of Embodiments 1-6, wherein the activation function is electro-optical or optical.
Embodiment 8. The method of any one of Embodiments 1-7, wherein the input data is complex with amplitude and phase.
Embodiment 9. The method of any one of Embodiments 1-8, wherein a pixel array provides the input data, and the input data is converted to an optical phased array.
Embodiment 10. The method of any one of Embodiments 1-9, wherein processing the input data comprises routing the input data through one or more convolution cells.
Embodiment 11. The method of Embodiment 10, wherein a photonic waveguide routes optical data to the one or more convolution cells.
Embodiment 12. The method of any one of Embodiments 1-11, wherein the plurality of electronic-photonic neuron layers includes at least one training layer and a classification layer.
Embodiment 13. An artificial neural network system, comprising: at least one processor; and at least one memory comprising instructions that, when executed on the processor, cause the computing system to receive an array of input data; process the input data in an optical domain; apply the processed input data through a plurality of electronic-photonic neuron layers in a neural network; and generate an output comprising classification information from the neural network.
Embodiment 14. The system of Embodiment 13, wherein the input data comprises at least one of optical data, audio data, image data, video data, speech data, analog data, and digital data.
Embodiment 15. The system of any one of Embodiments 13-14, further comprising upconverting the input data to be directly processed in the optical domain, and the upconverting occurs without digitization or photo-detection.
Embodiment 16. The system of any one of Embodiments 13-15, further comprising a plurality of optical attenuators to adjust the processed input data.
Embodiment 17. The system of any one of Embodiments 13-16, further comprising a bias adjustment unit.
Embodiment 18. The system of any one of Embodiments 13-17, wherein the electronic-photonic neuron layers each comprise a biasing light.
Embodiment 19. The system of any one of Embodiments 13-18, further comprising at least one of a 3D imager, an optical phased array, and a photonic assisted microwave imager.
Embodiment 20. The system of any one of Embodiments 13-19, wherein generating an output has a classification time of less than 280 ps.
Embodiment 21. The system of any one of Embodiments 13-20, wherein, at the input layer, the processed input data is weighted and passed through an activation function.
Embodiment 22. The system of any one of Embodiments 13-21, wherein processing the input data comprises routing the input data through one or more convolution cells, and the plurality of electronic-photonic neuron layers includes a training layer and a classification layer.
This application claims priority to and the benefit of U.S. Pat. Application No. 63/054,692, “Photonic-Electronic Deep Networks” (filed Jul. 21, 2020), the entirety of which is incorporated herein by reference for any and all purposes.
This invention was made with government support under N00014-19-1-2248 awarded by the Office of Naval Research. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2021/042526 | Jul. 21, 2021 | WO |
Number | Date | Country
---|---|---
63054692 | Jul. 2020 | US