The disclosure made in the US Patent Application Publication No. 2016/0139280 to Sahu, et al, issued as U.S. Pat. No. 9,568,623; and the disclosure made in the publication entitled “Reservoir computing using dynamic memristors for temporal information processing” to Du et al., NATURE COMMUNICATIONS|8, Article number: 2204 (2017)|DOI: 10.1038/s41467-017-02337-y, (hereafter, “Du”) are hereby incorporated by reference.
This invention relates generally to an optical reservoir computing (ORC) system and a method of using the ORC system. More particularly, the present invention relates to an ORC system with high dimensionality (HD), non-linearity (NL), fading memory (FM), and separation property (SP).
Introduction:
Machine Learning/Artificial Intelligence (ML/AI) algorithms using modern hardware and programming languages have been advancing rapidly during the last decade. The hardware is almost exclusively built on semiconductor microprocessors. Current state-of-the-art (SoA) machine learning algorithms, such as Deep Neural Networks (DNNs), are implemented on high performance semiconductor microprocessor clusters comprising central processing units (CPUs), graphics processing units (GPUs) and specialized digital accelerators, for example, tensor processing units (TPUs).
Those high performance semiconductor microprocessor clusters follow the von Neumann architecture of computing, and have become powerful in terms of processing speed over the past decades as transistor sizes have shrunk. ML/AI experts have taken advantage of this power to invent and implement new computational workloads for cognitive data analysis. The algorithms and their von Neumann implementations have become increasingly important and ubiquitous. However, it has been recognized that doing more complex cognitive tasks with the von Neumann architecture requires an increasing number of processors, and increasing energy and cost. It has been contemplated that cognitive workloads may run far more efficiently on some architectures outside of von Neumann. The human brain, for example, is a non-von Neumann machine that processes cognitive problems (for example, face recognition) far faster than a von Neumann machine while using far less energy.
ML/AI algorithms, for example, DNNs, serving cognitive workloads can be mapped into non-von Neumann system architectures. The non-von Neumann architectures have been implemented in different fields of science and mathematics under the names such as “Reservoir computing”, “Echo State Networks”, “Recurrent Neural Network”, and “Liquid State Network”. The non-von Neumann architectures are generally implemented with inherent memory, dimensional expansion, and subsequent integration with feedback.
Reservoir computing (RC) is an example of a Recurrent Neural Network for dynamic data analysis, which has been implemented into traditional semiconductor hardware in the electrical domain still using von Neumann architecture for fundamental computing. A big step forward in this direction is to use the optical domain for RC, which may fundamentally use a non-von Neumann processing. The hardware methods are called Optical Reservoir Computing (ORC).
Generally, an ORC system uses bulk optical and electro-optical components for delayed-feedback systems to realize reservoirs with several hundreds or thousands of nodes. The advantages of an ORC relative to an RC in electrical domain are:
The present disclosure describes a type of ORC system.
Reservoir computing (RC) is a framework for computation that may be viewed as an extension of neural networks. Typically an input signal is fed into a fixed (random) dynamical system called a reservoir and the dynamics of the reservoir map the input to a higher dimension. Then a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output. The main benefit is that training is performed only at the readout stage and the reservoir is fixed. RC is synonymous with Liquid-State Machines (LSM) and Echo State Networks (ESN), terms also widely used in the field.
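The framework described above can be illustrated with a minimal software sketch. This is a generic echo-state-style reservoir, not the optical system of the present disclosure; the node count, the `tanh` non-linearity, the spectral-radius rescaling, and the next-sample prediction task are all assumptions chosen only for illustration. The point it shows is that the reservoir weights stay fixed and only the linear readout is fitted.

```python
import numpy as np

def run_reservoir(u, n_nodes=50, spectral_radius=0.9, seed=0):
    """Drive a fixed random reservoir with a 1-D input sequence u; return all states."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=n_nodes)      # fixed input weights (never trained)
    w_res = rng.normal(size=(n_nodes, n_nodes))      # fixed recurrent weights (never trained)
    # Rescale so the largest eigenvalue magnitude equals spectral_radius (< 1),
    # a common heuristic for obtaining fading memory.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    x = np.zeros(n_nodes)
    states = np.empty((len(u), n_nodes))
    for t, u_t in enumerate(u):
        x = np.tanh(w_res @ x + w_in * u_t)          # non-linear map to a higher dimension
        states[t] = x
    return states

u = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))       # toy one-dimensional input
states = run_reservoir(u)

# Training touches only the readout: least squares from reservoir state to next sample.
w_out, *_ = np.linalg.lstsq(states[:-1], u[1:], rcond=None)
prediction = states[:-1] @ w_out
```

The reservoir maps a one-dimensional signal into 50 state variables per time step, and the only trained quantity is the 50-element vector `w_out`.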
In the electrical domain, RC has been demonstrated successfully using memristors by “Du”. A review of RC in several physical domains, including the optical domain, namely Optical Reservoir Computing (ORC), is provided in “Tanaka”. In ORC, high dimensionality (HD) is usually provided with diffraction. However, HD with higher diffractive orders has progressively lower intensity, necessitating costly and complex intensity-balancing and optical amplification for spatial cross-coupling of nodes. Non-linearity (NL) is usually enabled by driving optical amplifiers to saturation. This requires much power, and optical amplifiers are also expensive to build. Fading Memory (FM) is usually made possible with looped optical fibers or waveguides, which require much space. When such waveguides are built in a compact area, signals are very lossy and require additional power-hungry optical amplifiers.
In the present disclosure for an ORC, HD is provided by the random but fixed time-wavelength multiplexing on to an XY plane by a Fresnel-Kohler Integrator (FKI). NL is introduced by overlapping non-linear responses to input signals by an array of fluorescers. FM is provided by different decay time constants of elements of the array of fluorescers.
The present invention discloses an ORC system comprising a Light Emitting Diode Modulator (LED-M), a Beam Expander (BE), a Fluorescer Array (FA), a Fresnel-Kohler Integrator (FKI), a Liquid Crystal Spatial Light Modulator (LC-SLM), a Photo-Detector Array (PDA), and PDA signal processing electronics, which may include a Field Programmable Gate Array (FPGA) and a logic controller. The LED-M receives an input electrical signal and outputs an optical signal. The optical signal passes through the BE, is made incident upon the FA, is processed in the FKI, and is multiplexed onto the LC-SLM. The LC-SLM, the PDA, the FPGA, and the logic controller form a feedback loop. A method of using the optical reservoir computing system is also disclosed. The method comprises the steps of minimizing, by a regression model, an error function of the difference between a measured state of the PDA and a target state of the PDA; and tuning different combinations of the LC-SLM states.
An optical signal from the LED-M 110 passes through the BE 120, is made incident upon the FA 140, is processed in the FKI 160, and is multiplexed onto the LC-SLM 180, which selectively masks the incoming light and illuminates the PDA 190.
An optical signal from the LED-M 210 passes through the BE 220, is made incident upon the FA 240, is processed in the FKI 260, and is multiplexed onto the LC-SLM 280. In examples of the present disclosure, the LC-SLM 280, the PDA 290, the FPGA 292, and the controller 294 form a feedback loop. The elements of LC-SLM 280 are tuned to have different attenuations based on the learning from analysis of PDA 290 signals by FPGA 292.
In examples of the present disclosure, the input signal 202 to the LED-M 210 is an electronic signal representing the cognitive data that needs to be processed. The data may be fed at any frequency of interest, but specifically in the range from 1 kHz to 10 GHz. The processing can be done in real time (such as WiFi signature detection) or off-line (such as face recognition from photographs). The LED-M 210 comprises a near-ultraviolet light-emitting diode with a wavelength of emission between 200 and 450 nm. The electronic signal modulates the near-ultraviolet light-emitting diode so as to generate the optical signal, thus achieving the electro-optic conversion, so that the reservoir computing can happen in the optical domain.
In examples of the present disclosure, the optical integrator 260 is an FKI. The PDA 290 includes photodiodes, and electronics for amplification, shaping and discrimination of the optical signal. The FPGA 292 outputs data to an external data processing and controlling device 298, which may be a computer. The FPGA 292 also controls the pixels of the LC-SLM which does a fixed or programmable masking of FKI 260 output before it goes to PDA 290.
In examples of the present disclosure, the optical signal from LED-M is a modulated light beam with a wavelength in a range from 200 nm to 450 nm. The input electronic signal for cognitive processing is converted to the near-ultraviolet optical signal with the LED-M 210. In examples of the present disclosure, the LED-M 210 includes a 280 nm LED (for example, XR-280 from RAYVIO Corporation) and a high-speed LED driver (for example, ONET4201LD from TEXAS INSTRUMENTS Incorporated). In examples of the present disclosure, the optical signal is a non-linear function of the input electrical signal, for example, see “Modeling Laser-Diode Non-linearity in a Radio-over-Fibre Link”, Pre-print, Research Gate, 2003, by Baghersalimi et al. This electro-optic conversion is non-linear, which contributes to the non-linearity (NL) of signal transformation required for Reservoir Computing.
A modern technique to create and tune elements of the FA is to dissolve a strong fluorescer in a relaxation ionic liquid (see U.S. Pat. No. 9,568,623) to obtain the desired temporal and spectral behavior. For example, from
In examples of the present disclosure, fluorescer compounds can be selected to have emission time constants starting from nano-seconds to sub-seconds, thus making this architecture suitable not just for RF signals (ns resolution), but also for audio (micro-sec) and seismic (milli-sec) signals.
Conversion of electrical to optical signal through LED-M 210 and FA 240 is mathematically depicted in
$$P_{opt} = \sum_{n=0}^{3} \alpha_n \{A(t) - I_{th}\}^{n} \qquad (1)$$
where $A(t)$ is the amplitude function of the input electrical signal $u(t)$, $I_{th}$ is the threshold current of the LED-M, and $\alpha_n$ are the coefficients of electro-optic modulation, which are empirical constants for the LED-M type.
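Equation (1) is a cubic polynomial in the drive amplitude above threshold, and can be evaluated as in the following sketch. The coefficient values and the threshold are hypothetical placeholders; the text states only that the coefficients are empirical constants for the LED-M type.

```python
import numpy as np

def optical_power(amplitude, i_th, alpha):
    """Evaluate P_opt = sum over n of alpha[n] * (A(t) - I_th)**n, for n = 0..3."""
    x = np.asarray(amplitude, dtype=float) - i_th
    return sum(a * x**n for n, a in enumerate(alpha))

# Hypothetical values, chosen only to illustrate the shape of the curve.
alpha = [0.0, 1.0, -0.2, 0.05]           # alpha_0 .. alpha_3
i_th = 0.5                               # threshold current (arbitrary units)
a_t = np.array([0.5, 1.0, 1.5, 2.0])     # sampled amplitude A(t)

p_opt = optical_power(a_t, i_th, alpha)
```

The quadratic and cubic terms make the output depart from a straight line as the amplitude rises above threshold, which is the non-linear contribution the text attributes to the electro-optic conversion.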
When this optical signal Popt 424 from the LED-M 210 is expanded with the BE 220 and made incident on the FA 240, each element of the FA 240 fluoresces differently in response. The response function of a single element of the FA 240, FAout, is generally described as a function of time t and wavelength band λ:
$$FA_{out}(t,\lambda) = P_{opt} \cdot e^{-t/\tau(\lambda)} \cdot \sum_{p=0}^{N} H(t - t_p) \qquad (2)$$
where τ(λ) is the fluorescence decay time constant for wavelength band λ, N is the number of incoming signal pulses, p is the index of the pulse in the sequence of pulses, tp is the time at which the p'th pulse was generated. The Heaviside function H is defined as:
where k is known as the logistic factor.
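The response in equation (2) can be explored numerically with the sketch below. The exact definition of H is not reproduced in this text, so the code assumes a common logistic approximation to the step, 1/(1 + exp(-2kx)); that form, the value of k, the decay constants, and the pulse times are all assumptions made for illustration.

```python
import numpy as np

def smooth_step(x, k):
    """Assumed logistic approximation to the Heaviside step: 1 / (1 + exp(-2*k*x))."""
    return 1.0 / (1.0 + np.exp(-2.0 * k * x))

def fa_out(t, p_opt, tau, pulse_times, k=50.0):
    """Evaluate equation (2) for one fluorescer element with decay constant tau."""
    t = np.asarray(t, dtype=float)
    steps = sum(smooth_step(t - t_p, k) for t_p in pulse_times)   # sum over pulses p
    return p_opt * np.exp(-t / tau) * steps

t = np.linspace(0.0, 10.0, 101)      # time axis, arbitrary units
pulses = [1.0, 3.0]                  # hypothetical pulse arrival times t_p
fast = fa_out(t, p_opt=1.0, tau=2.0, pulse_times=pulses)   # short-lived element
slow = fa_out(t, p_opt=1.0, tau=8.0, pulse_times=pulses)   # long-lived element
```

Two elements driven by the same pulses but with different decay constants retain the input for different lengths of time, which is how the array as a whole carries a spread of memory depths.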
The exponential term $e^{-t/\tau(\lambda)}$ provides the Fading Memory (FM) and the sum of Heaviside functions $\sum_{p=0}^{N} H(t - t_p)$ provides the Non-Linearity (NL). The sequence of the input signal (one-dimensional) is now encoded in both time and wavelength. The tensor that describes the fluorescer array (FA) 240 is [FA]τλ, as shown in
All the 16 time-wavelength emission outputs are made incident on the Fresnel-Kohler Integrator (FKI), one example of integrator 260, in
Due to the spectral dispersion of the Fresnel lens and convergence by the aspheric lens, the FKI 260 system as a whole creates a near-uniform illumination on the target plane (αβ) by coupling all the input sources in spectral (λ) and temporal (τ) dimensions in a non-linear, complex, but fixed manner. The tensor that describes the FKI 260 is [FKI]αβ, as shown in
FKI 260 images the signal on to the LC-SLM 280 of
Light going through the LC-SLM 280 is made incident upon the PDA 290 of
A copy of the PDA 290 outputs is fed into an FPGA 292 of
The dimensional expansion of information and squashing works in the following manner and depicted in
The dimensional compression to output nodes and training works in the following manner. The image of the FKI 260 is presented to the imaging plane 580, which is the input surface of the LC-SLM 280. The LC-SLM provides a tunable transformation with attenuation in 2 spatial dimensions, denoted as μ, σ. Since in this implementation there are only 4×4 segments in the LC-SLM 280, the number of elements in dimensions μ, σ is [4×4]=16. The tensor that describes the transformation through the LC-SLM 280 is [SLM]αβμσ. A large dimensional compression happens at this stage, since the α, β information is lost, giving rise to the SLM tuning for attenuation. The output nodes are just the segments of the LC-SLM, and their dimensionality is μ, σ, which is [4×4]=16. There are just 16 output nodes which need to be tuned for training of the ORC without altering the 400,000 internal nodes, a fundamental feature of Reservoir Computing.
The output of the LC-SLM 280 is presented to the PDA 290, which converts the optical signals back into the electrical domain. In this implementation, the PDA 290 is a matrix of 2 dimensions A, B of [3×3]=9 photo-diodes. The spatial information is further compressed from μ, σ: [4×4] to A, B: [3×3]. Since the photo-detectors are blind to the different wavelengths, the λ information is completely collapsed and integrated into the response of photo-detector. The squashing of the nodes, a typical feature of neural nets, can be applied here by under or over-saturating the photo-detector response under desired conditions. The time information τ is also shaped and integrated to a slower time vector T. The photo-detector transformation tensor is therefore written as [PD]λτμσTAB.
The electrical signal after being further processed in PDA 290 is read by the FPGA 292. The information read by the FPGA has two spatial dimensions—the plurality of PDA elements, namely A, B: [3×3]. It also has a temporal dimension T. In this implementation, the temporal resolution of the FPGA is 10 nano-seconds, processing signals that can be 10-110 nano-seconds wide—therefore the vector T is 10 elements long. The final information matrix from FPGA has a dimensionality of 3; T,A,B: [10×3×3].
In this implementation, the dimensionality expansion and compression are summarized as follows.
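$$[FA]_{\tau\lambda} \cdot [FKI]_{\alpha\beta} \cdot [SLM]_{\alpha\beta\mu\sigma} \cdot [PD]_{\lambda\tau\mu\sigma TAB} \equiv [FPGA]_{TAB} \qquad (4)$$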
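The index bookkeeping in equation (4) can be checked with a small numerical sketch. It reads the equation as a sum over every repeated index, which is an interpretation rather than something the text states, and it fills the tensors with random numbers. The sizes for μ, σ, T, A, and B follow the text; the 4×4 split of the 16 time-wavelength outputs and the small α×β size are assumptions, the latter standing in for a much larger FKI target plane.

```python
import numpy as np

rng = np.random.default_rng(0)

# tau x lambda assumed 4 x 4 (16 emission outputs); alpha x beta is a small
# hypothetical stand-in; mu x sigma = 4 x 4, T = 10, A x B = 3 x 3 per the text.
n_tau, n_lam = 4, 4
n_alpha, n_beta = 6, 5
n_mu, n_sigma = 4, 4
n_T, n_A, n_B = 10, 3, 3

FA = rng.random((n_tau, n_lam))
FKI = rng.random((n_alpha, n_beta))
SLM = rng.random((n_alpha, n_beta, n_mu, n_sigma))
PD = rng.random((n_lam, n_tau, n_mu, n_sigma, n_T, n_A, n_B))

# Letters: t=tau, l=lambda, a=alpha, b=beta, m=mu, s=sigma; T, A, B survive.
FPGA = np.einsum("tl,ab,abms,ltmsTAB->TAB", FA, FKI, SLM, PD)
print(FPGA.shape)  # (10, 3, 3)
```

Every index except T, A, and B is summed away, so the result has the [10×3×3] shape stated for the final FPGA information matrix.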
This is where the output weights are applied from training. The LC-SLM 280 does a resolution-compression from αβ to μσ, depicted by the transformation tensor [SLM]αβμσ of
The training algorithm for this device is rather simple, since the RC is entirely in the optical domain. The input signature is u(t), which is transcribed onto the LC-SLM 280 as X(t,λ,x″,y″). The weights applied by virtue of the LC-SLM 280 attenuation are W(x′,y′). The PDA 290 outputs constitute the final read-out Y(T,x,y).
Generally, if plurality is defined as P, then: P(x″)>>P(x′)>P(x) and P(y″)>>P(y′)>P(y). Also, time scale T>>time scale of t.
The separation property (SP) of RC requires that different inputs u(t) produce distinct classes of Y(T,x,y). The weights W must be trained to achieve that. W is recursively trained by minimizing the error function E.
In this notation, Y is the PDA 290 state and Ytarget is the desired state when SP is achieved. The training method iterates until SP is achieved.
Regression methods are used to minimize the error function. The methods regress on the tensor W(x′,y′) such that W(x′,y′)·X = Y.
First, simple Linear Regression methods are tried to find W, using the regular inverse and the Moore-Penrose pseudoinverse:
$$WX = Y \Rightarrow W = (YX^{T})(XX^{T})^{-1} \qquad (6)$$
$$WX = Y \Rightarrow W = (YX^{T})(XX^{T})^{+} \qquad (7)$$
For better results, the Ridge Regression method is used, with a regularization parameter λ, such that:
$$W = (YX^{T})(XX^{T} + \lambda I)^{-1} \qquad (8)$$
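The closed-form solutions in equations (7) and (8) can be written in a few lines of NumPy, as in the sketch below. The matrix sizes and the synthetic data are hypothetical; in the device, X and Y would come from measured LC-SLM and PDA states, which this example does not model. Here the columns of X and Y are samples.

```python
import numpy as np

def train_readout(X, Y, ridge=None):
    """Solve W X = Y for W.

    ridge=None  -> Moore-Penrose pseudoinverse of X X^T, as in equation (7).
    ridge=value -> regularized inverse of X X^T + ridge * I, as in equation (8).
    """
    gram = X @ X.T
    if ridge is None:
        return (Y @ X.T) @ np.linalg.pinv(gram)
    return (Y @ X.T) @ np.linalg.inv(gram + ridge * np.eye(gram.shape[0]))

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 200))       # 16 features x 200 samples (hypothetical sizes)
W_true = rng.normal(size=(9, 16))    # hypothetical weights used to synthesize targets
Y = W_true @ X

W_pinv = train_readout(X, Y)                 # equation (7)
W_ridge = train_readout(X, Y, ridge=1e-3)    # equation (8)
```

With noise-free synthetic data both solutions recover `W_true` closely. The regularization term matters when `X @ X.T` is ill-conditioned or the targets are noisy, in which case the plain inverse of equation (6) can be unstable.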
Other well-known minimization techniques borrowed from the fields of Echo State Networks and Liquid State Machines can be used if the above methods do not produce the desired result. Training will be deemed complete when desired SP is achieved for a certain number of different signature signals.
Those of ordinary skill in the art may recognize that modifications of the embodiments disclosed herein are possible. For example, a package size of an optical reservoir computing system may vary. Other modifications may occur to those of ordinary skill in this art, and all such modifications are deemed to fall within the purview of the present invention, as defined by the claims.
Number | Name | Date | Kind |
---|---|---|---|
9568623 | Sahu et al. | Feb 2017 | B2 |
20160139280 | Sahu et al. | May 2016 | A1 |
Entry |
---|
“Advances in photonic reservoir computing”, Guy Van der Sande, et al., Nanophotonics 2017; 6(3): 561-576, DOI 10.1515/nanoph-2016-0132. |
“Reservoir computing using dynamic memristors for temporal information processing”, Du et al., Nature Communications | 8: 2204 | DOI: 10.1038/s41467-017-02337-y. |
“Recent Advances in Physical Reservoir Computing: A Review”, Tanaka et al., Preprint submitted to Neural Networks, Dec. 20, 2018, arXiv:1808.04962v2 [cs.ET] Dec. 19, 2018. |
“Recent trends in concentrated photovoltaics concentrators' architecture”, Marina Buljan et al., Journal of Photonics for Energy, 040995-1, vol. 4, 2014. |
“Differences in the behavior of dicationic and monocationic ionic liquids as revealed by time resolved-fluorescence, NMR and fluorescence correlation spectroscopy”, Debashis Majhi et al., Phys.Chem.Chem.Phys., 2018, 20, 7844, DOI: 10.1039/C7CP08630J. |
“Modelling Laser-Diode Non-linearity in a Radio-over-Fibre Link”, G. Baghersalimi, et al., Pre-print, Research Gate, 2003. |
Number | Date | Country | |
---|---|---|---|
20210027192 A1 | Jan 2021 | US |