This invention relates to the field of analog-to-digital converters (ADCs) using memristors in a neural network.
High performance data converters are key components in modern mixed-signal systems, in advanced technology nodes, and emerging data-driven applications. However, the analog performance in the same process is dramatically degraded due to reduced signal-to-noise ratio (SNR), low intrinsic gain, device leakage, and device mismatch. These deep-submicron effects exacerbate the intrinsic speed-power-accuracy tradeoff in ADCs, which has become a chronic bottleneck of modern system design. Moreover, these effects are poorly handled with specific and time-consuming design techniques for special purpose applications, resulting in considerable overhead and severely degrading their performance.
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.
The following embodiments and aspects thereof are described and illustrated in conjunction with systems, tools and methods which are meant to be exemplary and illustrative, not limiting in scope.
There is provided, in an embodiment, a pipelined analog-to-digital converter (ADC) system comprising: a first ADC stage comprising a trainable neural network layer, wherein said first ADC stage is configured to (i) receive an analog input signal, and (ii) convert it into a first n-bit digital output representing said analog input signal; a digital-to-analog converter (DAC) circuit comprising a trainable neural network layer, wherein said DAC circuit is configured to (iii) receive said first n-bit digital output, and (iv) convert it into an analog output signal representing said first n-bit digital output; and a second ADC stage comprising a trainable neural network layer, wherein said second ADC stage is configured to (v) receive a residue analog input signal of said analog input signal, and (vi) convert it into a second n-bit digital output representing said residue analog input signal; wherein said first and second n-bit digital outputs are combined to generate a combined digital output representing said analog input signal.
In some embodiments, the system further comprises one or more subsequent ADC stages connected in a pipelined arrangement, wherein each of said subsequent ADC stages comprises: a DAC circuit comprising a trainable neural network layer, wherein said DAC circuit is configured to receive an n-bit digital output from a preceding ADC stage and convert it into an analog output signal representing said n-bit digital output; and an ADC circuit comprising a trainable neural network layer, wherein said ADC circuit is configured to receive a residue analog input signal of said analog input signal, and convert it into an n-bit digital output representing said residue analog input signal.
There is also provided, in an embodiment, an analog to digital (ADC) conversion method, the method comprising: receiving an analog input signal; converting said analog input signal, using a first ADC stage comprising a trainable neural network layer, into a first n-bit digital output representing said analog input signal; converting said first n-bit digital output, using a digital-to-analog converter (DAC) circuit comprising a trainable neural network layer, into an analog output signal representing said first n-bit digital output; and converting a residue analog input signal of said analog input signal, using a second ADC stage comprising a trainable neural network layer, into a second n-bit digital output representing said residue signal; and combining said first and second n-bit digital outputs to generate a combined digital output representing said analog input signal.
In some embodiments, the method further comprises using one or more subsequent ADC stages connected in a pipelined arrangement, wherein said method comprises, with respect to each of said subsequent ADC stages: (i) receiving, from a preceding ADC stage, an n-bit digital output; (ii) converting said n-bit digital output, using a DAC circuit comprising a trainable neural network layer, into an analog output signal representing said n-bit digital output; and (iii) converting a residue analog input signal of said analog input signal, using an ADC circuit comprising a trainable neural network layer, into an n-bit digital output representing said residue signal.
In some embodiments, each of the first and second n-bit digital outputs is a 4-bit digital output.
In some embodiments, the residue analog input signal is determined based, at least in part, on a comparison between said analog input signal and said analog output signal.
In some embodiments, the first n-bit digital output, said second n-bit digital output, and all of said n-bit digital outputs of said subsequent ADC stages are combined to generate said combined digital output.
In some embodiments, the combined digital output comprises a number of bits equal to n multiplied by the number of all of said ADC stages.
In some embodiments, the first n-bit digital output represents a most significant bits (MSB) portion of said combined digital output.
In some embodiments, the n-bit digital output of a last of said subsequent ADC stages in the pipeline represents a least significant bits (LSB) portion of said combined digital output.
In some embodiments, each of the trainable neural network layers comprises a plurality of neurons connected with synapses, and wherein each of said synapses is set with an adjustable weighting.
In some embodiments, each of the synapses comprises a memristor, and wherein each of said trainable neural network layers is arranged as a memristive crossbar array comprising a synaptic weightings matrix.
In some embodiments, an output vector of each of the trainable neural network layers is calculated as a weighted sum of said outputs of said neurons multiplied by said synaptic weightings matrix.
In some embodiments, at a training stage, each of the neural network layers is trained by an iterative process comprising: (i) comparing said output vector of said neural network layer to a respective training input; and (ii) adjusting, based on said comparing, said synaptic weightings matrix of said neural network layer, wherein said adjusting minimizes a cost function based on a gradient descent algorithm.
In some embodiments, with respect to each of the ADC stages, the training input comprises an n-bit portion of a desired digital output of said system, and wherein said n-bit portion corresponds to bit positions of said n-bit digital output of said ADC stage within said combined digital output.
In some embodiments, with respect to each of the DAC circuits, the training input comprises an output of a preceding trained ADC stage.
In some embodiments, the training stage is performed simultaneously and independently with respect to all of said ADC stages.
In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the figures and by study of the following detailed description.
Exemplary embodiments are illustrated in referenced figures. Dimensions of components and features shown in the figures are generally chosen for convenience and clarity of presentation and are not necessarily shown to scale. The figures are listed below.
Disclosed herein are a system and method providing for a neuromorphic analog-to-digital converter (ADC).
In some embodiments, the present design employs a pipelined neural network ADC architecture. In some embodiments, the present design provides for a large-scale ADC based on coarse-resolution neuromorphic ADC and DAC, modularly cascaded in a high-throughput pipeline, which are then trained using a training algorithm for multiple full-scale voltages and sampling frequencies. In some embodiments, the training algorithm may be configured to tune the neural network in non-ideal test conditions, as an accurate, fast, and low-power ADC.
In some embodiments, an ADC of the present disclosure comprising a hybrid CMOS-memristor design may achieve a figure-of-merit (FOM) of 0.97 fJ/conv at the maximum conversion rate.
As noted above, deep-submicron effects in current ADCs exacerbate the intrinsic speed-power-accuracy tradeoff. For example, current designs may achieve high resolution combined with moderate-to-high speeds, but they rely on proper component matching and require complex op-amps which are increasingly difficult to design and scale in state-of-the-art CMOS technologies. Additionally, they typically employ flash-type sub-ADCs, which require high power and have a large physical footprint, due to a large number of accurate comparators, pushing them out of the application band of interest.
The analog-to-digital conversion task can be characterized as an example of simple pattern recognition, where the analog input can be classified into one of the 2^N different patterns for N bits, and thus can be readily solved using artificial neural networks (ANNs). The calibration and training process of these networks can be viewed as modification of neural parameters based on the measured error calculated during learning.
Four-bit single-stage neural network (NN) ADCs have been previously proposed. However, four-bit resolution is insufficient for practical applications, while direct scaling of this architecture is challenging due to the quadratic increase in number of synaptic weights (with exponentially large values), large footprint, high power consumption, longer training time, and limited sampling frequency.
Accordingly, in some embodiments, the present disclosure provides for a large-scale, general-purpose neuromorphic ADC. In some embodiments, the present ADC comprises a hybrid CMOS-memristor design with multiple trainable cores of four-bit NN ADCs and DACs in a two-stage pipeline. This architecture takes advantage of light-weight low-power sub-ADC cores combined with high throughput and resolution achievable through the pipeline. Furthermore, each sub-ADC optimizes the effective number of bits (ENOB) and power dissipation during training for the chosen sampling frequency.
In some embodiments, the present disclosure employs neuro-inspired approaches to create ADCs that could be trained in real time for general purpose applications, and break through conventional ADC limitations.
In some embodiments, the present disclosure leverages neural network architectures and artificial intelligence learning algorithms, to create an ADC which integrates memristor technology with CMOS.
In some embodiments, a learning algorithm of the present disclosure implements one or more supervised machine learning algorithms, e.g., a stochastic gradient descent algorithm, which fits multiple application specifications such as full-scale voltage ranges and sampling frequencies.
In some embodiments, the present disclosure provides for using the converted signal to train a neural network of the present ADC, in order to autonomously adapt to the exact specifications of the running application as well as to adjust to environmental variations.
In some embodiments, the present disclosure utilizes an artificial neural network (ANN) architecture comprising memristors. Memristors are two-terminal passive devices with varying resistance which changes according to the current flowing through the device, or alternatively, the voltage across the device. Memristors primarily serve as non-volatile memory and can be used for both digital and analog applications. The activation-dependent dynamics of memristors make them promising candidates for registering and updating synaptic weights. Consequently, memristors are now being widely adopted in the design of synapses for artificial neural systems because of their small footprint, analog storage properties, energy efficiency, and non-volatility. These characteristics allow for synapse-like behavior, where the conductance of the memristor is considered as the weight of the synapse. Accordingly, in some embodiments, the use of memristors as synapses helps to achieve a high-precision, high-speed, low-power, simple, cost-efficient, and reconfigurable single-channel ADC architecture that improves on the typical speed-power-accuracy tradeoff.
Although embodiments of the present disclosure will be detailed herein with reference to specific components and/or architectures, the present invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated herein.
While the analog domain is mainly characterized by its energy efficiency in data processing, its digital counterpart outperforms it in reliable computation. ADCs are mixed-signal systems that inherently combine hybrid analog-digital principles along with the pros and cons of each domain. Therefore, these systems are optimally customized to fit a specific subset from a wide functional spectrum.
Design tradeoff is an extreme case when the system is pushed toward its performance limits. The ADC comprises a signal sampler that discretely samples the continuous-time signal at a constant rate, and a quantizer that converts the sampled value to the corresponding discrete-time N-bit resolution binary-coded form. The quality of a system is considered ideal when it achieves high speed and accuracy with a low power drain. In practice, however, the resolution decreases as the conversion rate increases, and greater power consumption is required to achieve the same resolution.
Device mismatch is the dominant factor affecting system accuracy. Larger devices are necessary to improve system accuracy, but the capacitive loading of the circuit nodes increases as a result and greater power is required to attain a certain speed. The maximal speed of the system is a function of the gain-bandwidth, but it is limited by the input pole.
Aside from device mismatches, four loss mechanisms affect the ADC resolution and limit the signal-to-noise-and-distortion ratio (SNDR): quantization noise, jitter, comparator ambiguity, and thermal noise.
Quantization noise is the only error in an ideal ADC. Jitter is a sample-to-sample variation of the instant in time at which sampling occurred. Additionally, the conversion speed is limited by the ability of the comparator to make assertive decisions regarding the relative amplitude of the input voltage. This limitation is called comparator ambiguity and it is related to the speed of the device used to fabricate the ADC. Device speed is measured as the frequency, fT, at which there is unity current gain. As a result of these limitations, approximately one bit of resolution is lost each time the sampling rate doubles.
Whereas non-linear distortions, memory effects, and device mismatches can be somewhat compensated for, thermal white noise cannot; consequently, it is one of the more dominant limiters of ADC performance. It is modeled by KT/C noise, where K denotes Boltzmann's constant, T denotes temperature, and C denotes sampler capacitance. Lowering the noise floor by a factor of two in purely thermal-noise limited circuits would quadruple the power consumption. The limit that device mismatch imposes on the power consumption is approximately two orders of magnitude higher than the limit imposed by thermal noise.
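The kT/C relation above can be illustrated numerically. The following is a minimal sketch, assuming the "factor of two" refers to the RMS noise voltage and that power scales with the sampling capacitance; the capacitor values and the 300 K temperature are illustrative choices, not values from the disclosure.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value of Boltzmann's constant


def ktc_noise_rms(capacitance_f, temperature_k=300.0):
    """RMS thermal noise voltage sampled onto a capacitor: sqrt(kT/C)."""
    return math.sqrt(BOLTZMANN * temperature_k / capacitance_f)


# Illustrative values (assumptions): a 1 pF sampler versus one four times larger.
c_small = 1e-12
c_large = 4 * c_small
v_small = ktc_noise_rms(c_small)
v_large = ktc_noise_rms(c_large)

# Quadrupling the capacitance halves the RMS noise voltage.
ratio = v_small / v_large
```

Under these assumptions, halving the noise voltage costs a fourfold increase in capacitance, which is the mechanism behind the fourfold power increase stated above.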
The need to digitize so many signal types has produced a broad range of data converters diverse in their resolution, sampling rates, and power consumption budget. These considerations profoundly affect system architectures and their performance. The speed-power-accuracy tradeoff has resulted in a wide range of ADC architectures optimized for special purpose applications, from high-speed, to high-resolution, to low-power applications.
When comparing ADCs with different specifications, a numerical quantity known as a figure of merit (FOM) is used to characterize the performance of each ADC relative to its alternatives. Two or more metrics can be combined into a single FOM that accurately reflects the merits of the ADC in a certain context and for a specified purpose. One of the most widely used FOMs is defined as
$$\mathrm{FOM} = \frac{P}{2^{\mathrm{ENOB}} \cdot f_s} \quad [\mathrm{J/conv}],$$
and relates the ADC power dissipation during conversion, P, to its performance in terms of sampling frequency, fs, and effective number of resolution bits (ENOB).
Lower FOM values indicate better ADC performance. The ENOB is calculated from the SNDR as
$$\mathrm{ENOB} = \frac{\mathrm{SNDR}\,[\mathrm{dB}] - 1.76}{6.02}.$$
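As a worked illustration of the two quantities just described, the sketch below assumes the commonly used definitions FOM = P / (2^ENOB · fs) and ENOB = (SNDR[dB] − 1.76) / 6.02. The function names and the numeric inputs are hypothetical and are not results from the disclosure.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def fom_joules_per_conversion(power_w, enob, fs_hz):
    """Energy per conversion step: P / (2^ENOB * fs)."""
    return power_w / (2.0 ** enob * fs_hz)


# Illustrative inputs only (assumptions, not measured data).
enob = enob_from_sndr(49.92)                       # about 8 effective bits
fom = fom_joules_per_conversion(1e-3, 8.0, 1e8)    # 1 mW at 100 MSPS, 8 ENOB
fom_fj = fom * 1e15                                # same value in fJ/conv
```

A lower returned value corresponds to a more efficient converter, so doubling the sampling frequency at constant power and ENOB halves the FOM.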
The aforementioned FOM best captures the fundamental speed-power-accuracy tradeoff. The ongoing saga of CMOS technology trends toward smaller transistor dimensions has resulted thus far in ultra-deep submicron transistors. The FOM evolution also best describes Moore's law of ADCs. Technology scaling improves sampling frequencies, because fT allows for faster operation. However, the speed of sampling frequency is limited by the comparator ambiguity. In the same context, the impact of technology scaling on power dissipation optimization is also limited by the supply voltages, and by leakage currents that inevitably lead to an increase in the power consumption required to maintain SNDR. These limitations, along with manufacturing process variations and device mismatches in ultra-deep submicron technologies, are the biggest obstacle to achieving high linearity, wide dynamic range, and high-resolution converters.
Techniques for circumventing the tradeoff have recently been investigated, with the goal of achieving ultra-low-power converters with high resolution through a combination of systematic, architectural, and technological approaches. Examples of such methods are digitally assisted background calibration, time-interleaving, pipelining, sub-ranging, folding, interpolating, and oversampling. These techniques have succeeded in postponing the FOM saturation. Modern ADC architectures are custom-designed circuits that are fine-tuned to optimize specific capabilities and design parameters up to the application's specification.
The field of machine learning (ML) is devoted to the study and implementation of systems capable of learning from data using their evolving perceptual ability to make crucial decisions, predictions, and classifications based on examples learned from the past. Data conversion could be viewed as a special case of the classification optimization and signal restoration problem that could easily be solved using ML to learn from the data.
Accordingly, a trainable ADC architecture for general purpose applications may be trained by a machine learning algorithm in real time to optimize the ENOB and power dissipation, by providing a specific training dataset. This procedure is equivalent to a dynamic FOM optimization. The technique is not exclusive to reconfiguration, but can also be applied for device mismatch self-calibration, adaptation, and noise tolerance. Furthermore, the trainability of the architecture adds flexibility that makes it cost-effective and versatile, with a minimalistic design that uses one channel and an intelligent machine learning algorithm.
The deterministic four-bit neural network ADC in Danial (2018) converts an analog input voltage (Vin) to a digital output code (D3D2D1D0) according to the following iterative expressions,
where Vref is the reference voltage, equal to one full-scale voltage quantum (LSB), and u(·) is the signum neural activation function (neuron) having either zero or full-scale voltage output.
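The iterative expressions themselves are not reproduced in this text. The following is a minimal sketch of one plausible form, assuming a binary-weighted, successive-approximation-like structure in which each bit neuron thresholds the input after subtracting its own reference (2^i · Vref) and the contributions of the more significant bits. The function names are hypothetical, and bits are represented as 0/1 rather than as zero or full-scale voltages.

```python
def u(x):
    """Hard-threshold neuron: 1 if the net input is non-negative, else 0."""
    return 1 if x >= 0 else 0


def neural_adc_4bit(v_in, v_ref):
    """Assumed iterative form: each bit depends on all more significant bits."""
    d3 = u(v_in - 8 * v_ref)
    d2 = u(v_in - 4 * v_ref - 8 * v_ref * d3)
    d1 = u(v_in - 2 * v_ref - 4 * v_ref * d2 - 8 * v_ref * d3)
    d0 = u(v_in - 1 * v_ref - 2 * v_ref * d1 - 4 * v_ref * d2 - 8 * v_ref * d3)
    return d3, d2, d1, d0


# Example with a 1 V full scale, so one LSB is 1/16 V (illustrative values).
code = neural_adc_4bit(0.3, 1.0 / 16)   # 0.3 V lies in code 4 -> (0, 1, 0, 0)
```

In this form the fixed coefficients 2, 4, and 8 play the role of the inter-neuron synaptic weights that the training procedure is meant to learn.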
The neural network shown in
Synaptic weights are tuned to minimize the mean square error (MSE) by using the stochastic gradient descent (SGD) learning rule
$$\Delta W_{ij(j>i)}^{(k)} = -\eta\left(T_i^{(k)} - D_i^{(k)}\right)T_j^{(k)}, \qquad (2)$$
where η is the learning rate (a small positive constant), and in each iteration k, the output of the network Di(k) is compared to the desired teaching label Ti(k) that corresponds to the input Vin(k). The training continues until the training error falls to Ethreshold, a predefined constant that defines the learning accuracy. The FOM is optimized and the network is configured from a random initial state to the desired ADC.
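A single application of the learning rule in Eq. (2) can be sketched as follows. This is an illustration under stated assumptions: weights are stored in a plain N×N list indexed as `weights[i][j]`, only entries with j > i are updated, and the function name `sgd_step` is hypothetical.

```python
def sgd_step(weights, teach, out, eta):
    """Apply dW_ij = -eta * (T_i - D_i) * T_j for every pair with j > i."""
    n = len(teach)
    new_w = [row[:] for row in weights]          # copy, leave the input untouched
    for i in range(n):
        for j in range(i + 1, n):
            new_w[i][j] = weights[i][j] - eta * (teach[i] - out[i]) * teach[j]
    return new_w


# Illustrative iteration: the desired code has T0 = 0, but the network gave D0 = 1.
w0 = [[0.0] * 4 for _ in range(4)]
teach = [0, 0, 1, 1]      # T0..T3
out = [1, 0, 1, 1]        # D0..D3
w1 = sgd_step(w0, teach, out, eta=0.5)
```

Only the weights feeding the erroneous neuron from active teaching bits change; rows whose output already matches the label are left as they were.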
The neural network DAC in L. Danial et al., “DIDACTIC: A Data-Intelligent Digital-to-Analog Converter with a Trainable Integrated Circuit using Memristors,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 8, No. 1, pp. 146-158, March 2018, converts the four-bit digital input code (V3V2V1V0) to an analog output (A) as
where the binary weights (2^i) are implemented with reconfigurable synaptic weights Wi, having a similar realization as in
$$\Delta W_i^{(k)} = -\eta(t)\left(V_{out}^{(k)} - t^{(k)}\right)D_i^{(k)}, \qquad (4)$$
where η(t) is the time-varying learning rate, and t(k) is the analog teaching label. The feedback is disconnected after the training is complete (E<Ethreshold).
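The update in Eq. (4) has the form of a least-mean-squares rule, and its behavior can be sketched in software. The example below makes several assumptions that are not in the original: the DAC output is modeled as a plain weighted sum of the input bits, the learning rate is held constant rather than time-varying, the weights start at zero rather than at random, and the teaching label for each code is the ideal level `v_fs * code / 16`.

```python
def dac_output(weights, bits):
    """Modeled DAC output: weighted sum of the digital input bits."""
    return sum(w * b for w, b in zip(weights, bits))


def train_dac(v_fs=1.0, eta=0.05, epochs=500, n_bits=4):
    """Repeatedly apply dW_i = -eta * (V_out - t) * D_i over all input codes."""
    weights = [0.0] * n_bits
    for _ in range(epochs):
        for code in range(1 << n_bits):
            bits = [(code >> i) & 1 for i in range(n_bits)]   # bits[i] is D_i
            target = v_fs * code / (1 << n_bits)              # analog teaching label t
            error = dac_output(weights, bits) - target        # V_out - t
            for i in range(n_bits):
                weights[i] -= eta * error * bits[i]
    return weights


trained = train_dac()
ideal = [2 ** i / 16 for i in range(4)]   # binary-weighted targets for v_fs = 1
```

In this idealized setting the learned weights settle at the binary-weighted values, after which the feedback path is no longer needed.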
Increasing the scale of the neural network ADC beyond 4 bits is challenging. Table 1 below highlights the effect of scaling on design and performance parameters of the ADC.
The number of synapses in the network increases quadratically. Consequently, the footprint and power consumption rise significantly. Moreover, there is an exponential rise in the aspect ratio of synaptic weights, which is practically limited by the high-to-low resistive states ratio (HRS/LRS), the number of resistive levels, the endurance of the memristor (e.g., multiple trainings per day for multiple days), and the time and power consumption of the training phase, ultimately limiting the practically achievable resolution to four bits. Additionally, a higher number of neurons requires a longer conversion time, which limits the maximal Nyquist sampling frequency.
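The scaling trend can be made concrete with a rough count. The sketch below counts only the inter-neuron weights W_ij with j > i (input and reference synapses are ignored) and assumes binary weighting, so that the largest-to-smallest weight ratio is 2^(N−1). These are simplifying assumptions, and the figures may differ from those in Table 1.

```python
def interneuron_synapses(n_bits):
    """Number of weights W_ij with j > i in an n-bit single-stage network."""
    return n_bits * (n_bits - 1) // 2


def max_weight_ratio(n_bits):
    """Largest-to-smallest binary weight ratio, 2^(n-1)."""
    return 2 ** (n_bits - 1)


# Single 8-bit stage versus two pipelined 4-bit stages (illustrative comparison).
single_stage = (interneuron_synapses(8), max_weight_ratio(8))
two_stage = (2 * interneuron_synapses(4), max_weight_ratio(4))
```

Under these assumptions, splitting eight bits across two four-bit stages reduces both the synapse count and the required weight ratio.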
In some embodiments, the present disclosure provides for using coarse-resolution neural network-based ADCs and DACs, to create a fine-resolution pipelined network.
The output of the sub-ADC is converted back to an analog signal A by the DAC according to
where Wi are the synaptic weights. Next, this output is subtracted from the held input to produce a residue Q as
$$Q = V_{in} - A. \qquad (7)$$
This residue is sent to the next stage of the pipeline, where it is first sampled and held. The second-stage sub-ADC is designed similarly to that of the first stage, except that the resistive weights of the input are modified from Rin=Rf (feedback resistance of neuron) to Rf/16. This is done in order to scale the input from VFS/16 to the full-scale voltage VFS. The LSBs of the digital output are obtained from this stage as
The sample-and-hold circuit enables concurrent operation of the two stages, achieving a high throughput rate, but introduces a latency of two clock cycles. Thus, D flip-flop registers are used to time-align the MSBs and the LSBs.
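The two-stage signal flow described above (coarse conversion, reconstruction, residue, gain of 16, fine conversion) can be sketched behaviorally. In this illustration an ideal four-bit quantizer and an ideal DAC stand in for the trained neural network cores, and sample-and-hold timing is ignored; all function names are hypothetical.

```python
def ideal_sub_adc(v, v_fs, n_bits=4):
    """Ideal n-bit quantizer used here in place of a trained sub-ADC."""
    levels = 1 << n_bits
    code = int(v / (v_fs / levels))
    return max(0, min(levels - 1, code))      # clamp to the valid code range


def pipelined_adc_8bit(v_in, v_fs=1.0):
    msb = ideal_sub_adc(v_in, v_fs)           # first stage: D7..D4
    a = msb * v_fs / 16                       # ideal DAC reconstruction A
    q = v_in - a                              # residue Q = V_in - A
    lsb = ideal_sub_adc(16 * q, v_fs)         # gain of 16, as from R_in = R_f/16
    return (msb << 4) | lsb                   # MSBs and LSBs combined


code = pipelined_adc_8bit(0.3)                # 0.3 V of a 1 V full scale
```

The residue never exceeds VFS/16 in the ideal case, which is why a gain of 16 restores it to the full-scale range of the second stage.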
Conventional pipeline implementations generally use power-hungry flash sub-ADC cores and rely on redundancies and complex calibration techniques for high resolution. Conversely, in the present disclosure, trainable neural network ADC/DAC cores have a minimalistic design with mismatch self-calibration, noise tolerance, and power consumption optimization. This eliminates the need for an exclusive inter-stage gain unit and calibration mechanism, because the residue is amplified by the input resistive weight of the second sub-ADC. Although resistors are highly prone to manufacturing variations, they can be effectively used as the input weights because their mismatches will be calibrated by other memristive weights in the second stage. Furthermore, the training algorithm ensures that the quantization error remains within tolerable limits without using digital calibration techniques. This eliminates the area and power overheads of the calibration circuits, which account for around 33% and 17% of the total area and power, respectively.
The aim of the training is to configure the network from a random initial state (random synaptic weights) to an accurate eight-bit ADC. It is achieved by minimizing the mean-square-error (MSE) of each sub-ADC and the DAC by using specific teaching labels for desired quantization. During the training phase, switches S1 and S2 are in position 1.
The DAC is supplied with four-bit digital teaching labels corresponding to an analog ramp input, as shown in
The accuracy requirements of each stage decrease through the pipeline, and the first stage should be accurate to the overall resolution. Moreover, the two stages operate on different inputs for different quantization. Thus, their teaching datasets must be different to execute the online SGD algorithm as
$$\Delta W_{ij(j>i)}^{(k)} = -\eta_{ADC}\left(T_i^{(k)} - D_i^{(k)}\right)T_j^{(k)}, \quad 0 \le i, j \le 3, \qquad (9)$$
$$\Delta W_{ij(j>i)}^{(k)} = -\eta_{ADC}\left(T_i^{(k)} - D_i^{(k)}\right)T_j^{(k)}, \quad 4 \le i, j \le 7, \qquad (10)$$
Interestingly, Eqs. (9) and (10) can be implemented using different teaching inputs, as shown in
For the training dataset, an analog ramp signal is sampled at 4·2^8 (=1024) points. Four adjacent samples are given the same digital labels, providing an eight-bit training dataset, shown as Vt1 in
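The construction of the ramp dataset can be sketched as follows. This is an illustration only: it assumes a linear ramp from 0 to just below the full-scale voltage, and it splits each eight-bit label into a four-bit MSB part and a four-bit LSB part as teaching labels for the two stages. The function name and tuple layout are hypothetical.

```python
def build_ramp_dataset(v_fs=1.0, n_bits=8, repeat=4):
    """Return (voltage, label, msb_label, lsb_label) for each ramp sample."""
    half = n_bits // 2
    n_samples = repeat * (1 << n_bits)          # 4 * 2^8 = 1024 samples
    samples = []
    for k in range(n_samples):
        v = v_fs * k / n_samples                # assumed linear ramp
        label = k // repeat                     # four adjacent samples share a label
        msb_label = label >> half               # teaching label, first stage
        lsb_label = label & ((1 << half) - 1)   # teaching label, second stage
        samples.append((v, label, msb_label, lsb_label))
    return samples


dataset = build_ramp_dataset()
```

Repeating each label over four adjacent samples gives every code several training points across its input interval.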
The present pipelined ADC was simulated and evaluated in SPICE (Cadence Virtuoso) using a 180 nm CMOS process and memristors fitted by the VTEAM memristor model to a Pt/HfOx/Hf/TiN RRAM device. The device has an HRS/LRS of 50.
First, the learning algorithm was evaluated in terms of training error and learning time. Next, the circuit was statistically and dynamically evaluated, and finally, power consumption was analyzed. The circuit parameters are listed in Table 2 above. To test the robustness of the design, device non-idealities and noise were incorporated.
The basic deterministic functionality of the pipeline ADC was demonstrated during training by the online SGD algorithm.
Linearity plots (
The proposed 8-bit pipelined architecture is compared to the scaled version of the neural network ADC in Danial (2018). As shown in Table 3 below, the pipelined ADC consumes less power and achieves a higher conversion rate and a better FOM, with a lower HRS/LRS device ratio and fewer resistive levels.
To test the scalability of the present architecture, the present inventors performed behavioral simulations in MATLAB. The results for 12-bit design with ideal device parameters are summarized in Table 4 below.
Furthermore, when the full-scale voltage is reduced to 0.9V and the sampling frequency is increased to 10 MSPS, the network converges to a new steady state to operate correctly under different specifications.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. The present specification is to be read as if all such single embodiments and separate embodiments and sub-combinations are explicitly set forth herein. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.
In some embodiments, the present disclosure uses the Voltage Threshold Adaptive Memristor (VTEAM) model to accurately model the memristor's behavior in design and simulations. The model is given by the following equations
where w is an internal state variable, v(t) is the voltage across the memristive device, i(t) is the current passing through the memristive device, G(w,v) is the device conductance, koff, kon, αoff, and αon are constants, and von and voff are threshold voltages.
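The model equations are not reproduced in this text. The sketch below follows the general form of the published VTEAM model, in which the state variable changes only when the applied voltage exceeds one of the two thresholds. The window functions `f_off` and `f_on` are part of the published model but are not mentioned above, so they are treated here as an assumption and default to 1. All parameter values in the example are illustrative and are not the fitted device parameters.

```python
def vteam_dw_dt(v, w, k_off, k_on, alpha_off, alpha_on, v_off, v_on,
                f_off=lambda w: 1.0, f_on=lambda w: 1.0):
    """State derivative dw/dt for a threshold-type memristor (VTEAM form)."""
    if v > v_off:                      # region 0 < v_off < v
        return k_off * (v / v_off - 1.0) ** alpha_off * f_off(w)
    if v < v_on:                       # region v < v_on < 0
        return k_on * (v / v_on - 1.0) ** alpha_on * f_on(w)
    return 0.0                         # between thresholds: state is retained


def memristor_current(v, conductance):
    """i(t) = G(w, v) * v(t), with G supplied by the caller."""
    return conductance * v


# Illustrative parameters only (assumptions, not fitted values).
params = dict(k_off=1.0, k_on=-2.0, alpha_off=1, alpha_on=2, v_off=0.5, v_on=-0.5)
rate_idle = vteam_dw_dt(0.2, 0.0, **params)     # below both thresholds
rate_off = vteam_dw_dt(1.0, 0.0, **params)      # above v_off
rate_on = vteam_dw_dt(-1.0, 0.0, **params)      # below v_on
```

The zero derivative between the thresholds is what allows the device to be read at low voltage without disturbing the stored weight.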
In some embodiments, the present disclosure uses the multi-level linearized Pt/HfOx/Hf/TiN RRAM device. For this device, post fitting to the VTEAM model, the I-V relationship is given by,
Synapses are the building blocks of a neural network, as they connect one neuron to another. The strength of this connection is determined by the synaptic weight. A higher synaptic weight means a stronger dependency of the output of a neuron on its preceding neuron. When a neuromorphic architecture is implemented on a conventional computing architecture, the synaptic weights are fetched from the memory unit to the processor unit, where they are read and updated. The updated weights are stored back to the memory unit, and the Von Neumann bottleneck remains a challenge.
Accordingly, in some embodiments, the present disclosure implements artificial synapses using a hybrid CMOS-memristor design. The resistance of memristors can be changed based on the history of applied electrical stimuli. This closely resembles biological synapses, where the strength of the connection increases or decreases based on the applied action potential. The memristive synapse can not only store the weight but also naturally transmit information into post-neurons, overcoming the Von Neumann bottleneck. The design consists of a voltage-controlled memristor connected to the shared terminal of PMOS and NMOS, as shown in
The deterministic four-bit neural network ADC converts an analog input voltage (Vin) to a digital output code (D3D2D1D0) according to the following iterative expressions,
where Vref is the reference voltage, equal to one full-scale voltage quantum (LSB), and u(·) is the signum neural activation function (neuron) having either zero or full-scale voltage output. The neural network shown in
Synaptic weights are tuned to minimize the mean square error (MSE) by using the stochastic gradient descent (SGD) learning rule,
$$\Delta W_{ij(j>i)}^{(k)} = -\eta\left(T_i^{(k)} - D_i^{(k)}\right)T_j^{(k)},$$
where η is the learning rate (a small positive constant), and in each iteration k, the output of the network Di(k) is compared to the desired teaching label Ti(k) that corresponds to the input Vin(k). The training continues until the training error falls to Ethreshold, a predefined constant that defines the learning accuracy.
A previously proposed neural network DAC converts the four-bit digital input code (V3V2V1V0) to an analog output (A) as,
where the binary weights (2^i) are implemented with reconfigurable synaptic weights Wi, having a similar realization as in
As shown in
$$\Delta W_i^{(k)} = -\eta(t)\left(V_{out}^{(k)} - t^{(k)}\right)D_i^{(k)},$$
where η(t) is the time-varying learning rate, and t(k) is the analog teaching label. The feedback is disconnected after the training is complete (E<Ethreshold).
The ADC is evaluated statistically for differential non-linearity (DNL) and integral non-linearity (INL). These are defined as,
where Vj and Vj+1 are adjacent code transition voltages, and j ∈ {x | 1 ≤ x ≤ 2^N − 2}.
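The defining expressions are not reproduced in this text. The sketch below uses the conventional definitions as an assumption: DNL at code j is the width between adjacent transition voltages, normalized to the ideal LSB, minus one, and INL is the running sum of DNL. The function name and the example transition voltages are hypothetical.

```python
def dnl_inl(transitions, lsb_ideal):
    """Return (DNL, INL) lists, in LSB units, from code transition voltages."""
    dnl = [(transitions[j + 1] - transitions[j]) / lsb_ideal - 1.0
           for j in range(len(transitions) - 1)]
    inl, running = [], 0.0
    for d in dnl:
        running += d                  # INL as the cumulative sum of DNL
        inl.append(running)
    return dnl, inl


lsb = 1.0 / 16
# Ideal 4-bit converter: transitions at k/16, so every code is exactly 1 LSB wide.
ideal_dnl, ideal_inl = dnl_inl([k / 16 for k in range(1, 16)], lsb)
# One transition shifted up by half an LSB (illustrative non-ideality).
skew_dnl, skew_inl = dnl_inl([0.0625, 0.15625, 0.1875], lsb)
```

A single shifted transition widens one code and narrows its neighbor, so the DNL values have opposite signs and the INL returns to zero afterwards.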
The Signal to Noise and Distortion Ratio (SNDR) is calculated from the FFT plot of ADC's output as,
where Ppeak is the peak signal power from the FFT plot, Pnoise-floor is the average noise power, N is the total number of bits, and CPG, Scalloping_Loss, ENBW are window-dependent parameters.
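The Effective Number of Bits (ENOB) is calculated from the SNDR as,
$$\mathrm{ENOB} = \frac{\mathrm{SNDR}\,[\mathrm{dB}] - 1.76}{6.02}.$$
The figure-of-merit (FOM) relates the ADC's sampling frequency, fs, power consumption during conversion, P, and effective number of bits, ENOB. A lower value of FOM signifies better overall performance. FOM is defined as,
$$\mathrm{FOM} = \frac{P}{2^{\mathrm{ENOB}} \cdot f_s} \quad [\mathrm{J/conv}].$$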
Scaling the neural network ADC described above beyond four bits is challenging. Table 1 above highlights the effect of scaling on the design and performance parameters of the ADC. The number of synapses in the network increases quadratically with resolution; consequently, the area and power consumption rise significantly. Moreover, the aspect ratio of the synaptic weights rises exponentially. This ratio is practically limited by the high-to-low resistive states ratio (HRS/LRS), the number of resistive levels, the endurance of the memristor, and the time and power consumption of the training phase, which ultimately limits the practically achievable resolution to four bits. Additionally, a larger number of neurons requires a longer conversion time, which limits the maximal Nyquist sampling frequency.
Pipelining is a technique in which multiple instructions are overlapped during execution. The pipeline is divided into stages that are connected to one another to form a pipe-like structure, as shown in
In some embodiments, the present disclosure uses light-weight coarse-resolution neural network ADCs and DACs to build a fine-resolution pipelined network. An eight-bit two-stage pipelined ADC is shown in
In the first-stage sub-ADC, a synapse Wij is present between a pre-synaptic neuron with index j and digital output Dj, and a post-synaptic neuron with index i and digital output Di. A neuron for each bit collectively integrates inputs from all synapses and produces an output by the signum neural activation function u(·). The sub-ADC coarsely quantizes the sampled input Vin to the MSBs of the digital code, D7D6D5D4 (MSB to LSB), as,
The output of the sub-ADC is converted back to an analog signal A by the DAC as,
where Wi are the synaptic weights. Next, this output is subtracted from the held input to produce a residue Q as,
Q=Vin−A.
This residue is sent to the next stage of the pipeline, where it is first sampled and held. The second-stage sub-ADC is designed similarly to that of the first stage, except that the resistive weights of the input are modified from Rin=Rf (the feedback resistance of the neuron) to Rf/16. This is done to scale the input from VFS/16 to the full-scale voltage VFS. The LSBs of the digital output are obtained from this stage as
The sample-and-hold circuit enables concurrent operation of the two stages, achieving a high throughput rate, but introduces a latency of two clock cycles. Thus, D flip-flop registers are used to time-align the MSBs and the LSBs.
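The two-stage data path described above can be sketched as an ideal behavioral model. The sketch assumes binary-weighted sub-ADC decisions made from the MSB downward, an ideal DAC, and a gain of 16 on the residue (standing in for Rin = Rf/16). The function names `sub_adc`, `dac`, and `pipelined_adc` are hypothetical, and the model ignores training, noise, and the two-cycle latency.

```python
def sub_adc(v_in, v_fs, n_bits=4):
    """Ideal coarse quantizer: bits are decided MSB first, each decision
    subtracting the binary-weighted contribution of the higher bits."""
    v_ref = v_fs / 2 ** n_bits  # one LSB of this stage
    bits = [0] * n_bits         # index 0 = LSB
    for i in reversed(range(n_bits)):
        higher = sum(2 ** j * v_ref * bits[j] for j in range(i + 1, n_bits))
        bits[i] = 1 if v_in - higher >= 2 ** i * v_ref else 0
    return bits


def dac(bits, v_fs):
    """Ideal binary-weighted DAC producing the analog estimate A."""
    v_ref = v_fs / 2 ** len(bits)
    return sum(2 ** i * v_ref * b for i, b in enumerate(bits))


def pipelined_adc(v_in, v_fs=1.8):
    """Eight-bit code from two four-bit stages."""
    msb_bits = sub_adc(v_in, v_fs)            # D7..D4
    residue = v_in - dac(msb_bits, v_fs)      # Q = Vin - A
    lsb_bits = sub_adc(16.0 * residue, v_fs)  # residue scaled to full range
    msb = sum(b << i for i, b in enumerate(msb_bits))
    lsb = sum(b << i for i, b in enumerate(lsb_bits))
    return (msb << 4) | lsb


# Mid-bin inputs should map to their own code.
codes = [pipelined_adc((k + 0.5) * 1.8 / 256) for k in range(256)]
```

In hardware the two stages work on consecutive samples at the same time, so the MSBs of a sample must be delayed by registers until its LSBs are available; the sequential model above omits this timing.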
Trainable neural network ADC/DAC cores in this design have a minimalistic design with mismatch self-calibration, noise tolerance, and power consumption optimization. This eliminates the need for a dedicated inter-stage gain unit and calibration mechanism, because the residue is amplified by the input resistive weight of the second sub-ADC. Although resistors are highly prone to manufacturing variations, they can be used effectively as the input weights because their mismatches are calibrated out by the other memristive weights in the second stage. Furthermore, the training algorithm ensures that the quantization error remains within tolerable limits without using digital calibration techniques.
The aim of the training is to configure the network from a random initial state (random synaptic weights) to an accurate eight-bit ADC. It is achieved by minimizing the mean-square-error (MSE) of each sub-ADC and the DAC by using specific teaching labels for desired quantization. During the training phase, switches S1 and S2 are in position 1.
The DAC is supplied with four-bit digital teaching labels corresponding to an analog ramp input, as shown in
The accuracy requirements of each stage decrease through the pipeline, and the first stage should be accurate to the overall resolution. Moreover, the two stages operate on different inputs for different quantization. Thus, their teaching datasets must be different to execute the online SGD algorithm as,
ΔWij(j>i)(k)=−ηADC(Ti(k)−Di(k))Tj(k),0≤i,j≤3,
ΔWij(j>i)(k)=−ηADC(Ti(k)−Di(k))Tj(k),4≤i,j≤7,
Interestingly, the above equations can be implemented using different teaching inputs, as shown in
For the training dataset, an analog ramp signal is sampled at 4·2^8 (=1024) points. Four adjacent samples are given the same digital labels, providing an eight-bit training dataset, shown as Vt1 in
The proposed pipelined ADC was simulated and comprehensively evaluated in SPICE (Cadence Virtuoso) using a 180 nm CMOS process and memristors fitted by the VTEAM memristor model to a Pt/HfOx/Hf/TiN RRAM device. The device has an HRS/LRS ratio of 50. First, the learning algorithm was evaluated in terms of training error and learning time. Next, the circuit was evaluated statically and dynamically, and finally, power consumption was analyzed. The circuit parameters are listed in Table 2 above. To test the robustness of the design, device non-idealities and noise were incorporated.
The basic deterministic functionality of the pipeline ADC is demonstrated during training by the online SGD algorithm.
Linearity plots (
The pipelined ADC is tested for reconfigurability by changing the full-scale voltage from 1.8 V to 0.9 V and the sampling frequency from 0.1 MS/s to 10 MS/s. The synaptic weights of the sub-ADCs and the DAC converge to a new steady state to operate correctly under the different specifications, as shown in
This 8-bit pipelined architecture is compared to the scaled version of the neural network ADC. As shown in Table 3 above, the pipelined ADC consumes less power, achieves a higher conversion rate, and attains a better FOM with a lower HRS/LRS device ratio.
To test the scalability of the present architecture, behavioral simulations were performed in MATLAB. Results for a 12-bit design with ideal device parameters are summarized in Table 4 above.
A logarithmic ADC performs conversions with non-uniform quantization, where small analog amplitudes are quantized with fine resolution, while large amplitudes are quantized with coarse resolution.
For several biomedical applications, such as cochlear implants, hearing aids, neural recording and stimulation, a nonlinear analog-to-digital converter (ADC) seems a more appealing choice for a signal processing system than a linear ADC. Audio signals, for example, are well-suited to log encoding because the human ear is less able to distinguish sound levels when the dynamic range of the signals is larger. The benefits of a nonlinear ADC include the ability to handle input signals with a large dynamic range, reduction of noise and data bit-rate, and compensation for nonlinear sensor characteristics.
An N-bit logarithmic ADC converts an analog input voltage (Vin) to an N-bit digital output code (Dout=DN-1, . . . ,D0) according to a logarithmic mapping described by,
where N is the number of bits, B is the base of the logarithmic function (e.g., 10), C is defined as the code efficiency factor, and VFS is the full-scale analog input voltage range. Larger values of C result in more logarithmic conversion, capturing smaller signals and a higher dynamic range. The equation above implies that the logarithmic ADC achieves good resolution for small input signals, but still allows coarsely quantized large input signals. Quantization noise is thus lower when the signal amplitude is small, and it grows with the signal amplitude.
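The behavior described above can be illustrated numerically. The sketch assumes one commonly used form of the logarithmic mapping, Dout = floor((2^N/C)·log_B((Vin/VFS)·B^C)), which is consistent with the parameters N, B, C, and VFS but is an assumption here rather than a quotation of the mapping above. The function names and parameter values (N = 3, B = 10, C = 1, VFS = 1.8 V) are hypothetical.

```python
import math


def log_adc(v_in, n_bits=3, base=10.0, c=1.0, v_fs=1.8):
    """Ideal logarithmic quantizer under the assumed mapping
    Dout = floor((2^N / C) * log_B((Vin / VFS) * B^C)), clamped to N bits."""
    levels = 2 ** n_bits
    if v_in <= 0:
        return 0
    code = math.floor(levels / c * math.log((v_in / v_fs) * base ** c, base))
    return max(0, min(levels - 1, code))


def transition_voltage(code, n_bits=3, base=10.0, c=1.0, v_fs=1.8):
    """Input voltage at which the output reaches `code` (inverse mapping)."""
    return v_fs * base ** (c * code / 2 ** n_bits - c)


# Step sizes between adjacent transitions grow with the code value.
steps = [transition_voltage(d + 1) - transition_voltage(d) for d in range(7)]
# Inputs placed midway (in the log domain) inside each bin.
mid_codes = [log_adc(transition_voltage(d + 0.5)) for d in range(8)]
```

With these assumed values the smallest resolvable input is VFS·B^(−C) = 0.18 V, and each step is larger than the one before it, so the quantization error grows with the signal amplitude.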
For small input amplitudes, the LSB size is small and has a minimum value of,
when Dout changes from 0 to 1. For large input amplitudes, the LSB size is larger and has a maximum value of,
when Dout changes from 2^N−2 to 2^N−1. The dynamic range (DR) of an ADC is defined by the ratio of the maximum input amplitude to the minimum resolvable input amplitude,
The DNL and INL for logarithmic ADC are defined similarly to the linear ADC except that in a logarithmic ADC the ideal step size varies with each step,
where Vj and Vj+1 are adjacent code transition voltages, and j ∈ {x | 1 ≤ x ≤ 2^N − 2}.
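A sketch of the logarithmic DNL and INL calculation follows, in which each measured step is normalized by its own ideal step. It assumes DNL(j) = (Vj+1 − Vj)/LSB_ideal(j) − 1 and INL(j) as the running sum of DNL. The function name `log_dnl_inl` and the example voltages are hypothetical.

```python
def log_dnl_inl(transitions, ideal_transitions):
    """DNL and INL (in units of the local ideal LSB) for a non-uniform
    quantizer. Assumed: DNL(j) = step_j / ideal_step_j - 1, INL = running sum."""
    ideal_steps = [b - a
                   for a, b in zip(ideal_transitions, ideal_transitions[1:])]
    steps = [b - a for a, b in zip(transitions, transitions[1:])]
    dnl = [s / ideal - 1.0 for s, ideal in zip(steps, ideal_steps)]
    inl = []
    running = 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl


# Example: geometrically spaced ideal transitions, one measured value shifted.
ideal = [0.125, 0.25, 0.5, 1.0]
measured = [0.125, 0.25, 0.5625, 1.0]
dnl, inl = log_dnl_inl(measured, ideal)
```

The same absolute shift produces a smaller DNL at high codes than at low codes, because the ideal step it is compared against is larger.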
An N-bit logarithmic DAC converts an N-bit digital input code (Din) to an analog output voltage (Vout) according to a logarithmic (exponential) mapping described by
An exponential DAC, cascaded with a logarithmic ADC, is required to reproduce the linear analog input of the ADC. The INL, DNL, and ENOB for the logarithmic DAC are defined as for the linear DAC, after applying a logarithmic transformation to Vout.
In some embodiments, the present disclosure utilizes the learning capabilities of ANNs, applying linear vector-matrix-multiplication and non-linear decision-making operations to train them to perform logarithmic quantization. Therefore, the logarithmic ADC equations are formulated in an ANN-like manner as follows, using three bits as an example,
where Vin is the analog input and D2 D1D0 is the corresponding digital form (i=2 is the MSB), while each
In real-time operation, where non-ideal, stochastic, and varying conditions affect the conversion accuracy, the correct weights are not distributed deterministically in binary-weighted style. Rather, the weights should be updated in real time, in situ, by a training mechanism. Four interconnected weights are needed to implement a three-bit logarithmic ADC. The interconnected synaptic weights of the network are described by an asymmetric matrix W, and each element Wij represents the synaptic weight of the connection from pre-synaptic neuron j to post-synaptic neuron i. In the linear ADC case, i and j were bounded by the network dimensions, which are equal to N. However, in this case, where additional synaptic connections are present due to the AND product between neurons and their complements, the matrix dimensions approach (2N-1+2).
To train this network, W is tuned to minimize some measure of error (e.g., MSE) between the estimated and desired labels, over a training set. The online stochastic gradient descent (SGD) algorithm is used to minimize the error,
ΔWij(j>i)(k)=−η(Ti(k)−Di(k))Tj(k),
where η is the learning rate, a small positive constant, and in each iteration k, a single empirical sample Vin(k) is chosen randomly and compared to a desired teaching label T(k). The training phase continues until the error is below Ethreshold.
The logarithmic DAC equations are formulated in an ANN-like manner as follows, using three bits as an example,
Vout=2^0
Thus, the logarithmic DAC is realized by a single-layer ANN with a linear neural activation output function and 2N synapses. The DAC is trained using online SGD, with a time-varying learning rate and a teaching analog signal t(k),
ΔWi(k)=−η(t)(Vout(k)−t(k))Di(k).
The neural network ADC/DAC architectures and their building blocks, including neurons, synapses, and training feedbacks, are illustrated in
The synapse and neuron circuit designs are explained above. The memristive crossbar (2T1R) inherently implements Ohm's and Kirchhoff's laws for ANN hardware realization. The present ADC/DAC was designed using a 0.18 μm CMOS process and memristors fitted by the VTEAM model to a Pt/HfOx/Hf/TiN RRAM device.
This device has a high-to-low resistance state (HRS/LRS) ratio of 50 to 1000. The aspect weight ratio of the ADC/DAC is equal to 22
Neuron values are multiplied using AND gates, added to the DAC and ADC in the frontend and backend, respectively. The online SGD algorithm is executed by the feedback circuit, which precisely regulates the synaptic reconfiguration. The aim is to implement the equations above and execute basic subtraction and multiplication operations.
While the feedback of the ADC is simple and realized by digital circuits, the feedback of the DAC is implemented by a pulse width modulator (PWM) with time proportional to the error and ±VDD, 0 V pulse levels. After the training is complete (E≤Ethreshold), the feedback is disconnected from the conversion path.
The proposed three-bit logarithmic ANN ADC/DAC design was simulated and evaluated using Cadence Virtuoso. First, the MSE and training time of the learning algorithm were evaluated. Next, the circuit was evaluated statically and dynamically, and finally, power consumption was analyzed. Functionality and robustness were extensively tested under extreme conditions using MATLAB. The design parameters are listed in Table 5 below. Furthermore, circuit variations and noise sources were quantified and validated.
The basic deterministic functionality of the three-bit logarithmic ADC/DAC is demonstrated during training by the online SGD algorithm.
The proposed training algorithm is shown to compensate for variations by reconfiguring the synaptic weights. The response of the proposed ADC to a DC logarithmic ramp signal is evaluated statically.
The DAC is evaluated using similar methodologies. The proposed networks can also be trained to perform linear ADC/DAC using linearly quantized teaching data-sets. Table VI lists the full performance metrics and comparison with the linear ADC/DAC.
In some embodiments, the present disclosure presents a novel pipelined neural network ADC architecture. This large-scale design is based on coarse-resolution neuromorphic ADCs and DACs, modularly cascaded in a high-throughput pipeline and precisely trained online using the SGD algorithm for multiple full-scale voltages and sampling frequencies. The learning algorithm successfully tuned the neural network in non-ideal test conditions and configured the network as an accurate, fast, and low-power ADC. The hybrid CMOS-memristor design with 1.8 V full-scale voltage achieved 0.97 fJ/conv FOM at the maximum conversion rate.
In some embodiments, the present disclosure also presents a novel logarithmic quantization of an ANN ADC/DAC that is trained online using the SGD algorithm, enabling reconfigurable quantization. A hybrid CMOS-memristor circuit design was presented for the realization of a three-bit neural network ADC/DAC. The learning algorithm successfully adjusted the memristors and reconfigured the ADC/DAC along with the full-scale voltage range, quantization distribution, and sampling frequency. The simulations achieved a 77.19 pJ/conv FOM, exceeding the performance of a linear ADC.
This application claims the benefit of priority of U.S. Provisional Patent Application Nos. 62/945,293, filed on Dec. 9, 2019 and 62/957,854, filed on Jan. 7, 2020. The contents of the above applications are all incorporated by reference as if fully set forth herein in their entirety.
Number | Date | Country
---|---|---
62945293 | Dec 2019 | US
62957854 | Jan 2020 | US