1. Field of the Invention
This invention relates to signal receivers. Particularly, this invention relates to double sampling receivers for optical or electrical signaling.
2. Description of the Related Art
(Note: This application references a number of different publications as indicated throughout the specification by one or more reference numbers within brackets, e.g., [x]. A list of these different publications ordered according to these reference numbers can be found below in the section entitled “References.” Each of these publications is incorporated by reference herein.)
Integrated circuit scaling has enabled a huge growth in processing capability, which necessitates a corresponding increase in inter-chip communication bandwidth. This trend is expected to continue, requiring an increase in both the per-pin data rate and the I/O count. Unfortunately, the bandwidth of the electrical channels and the number of pins per chip do not follow the same trend. As data rates scale to meet increasing bandwidth requirements, the shortcomings of copper channels are becoming more severe. While I/O circuit performance benefits from technology scaling, the bandwidth of electrical channels does not scale at the same rate. In particular, as the data rate increases, these channels exhibit excessive frequency-dependent loss, which results in significant intersymbol interference (ISI) [1]-[3]. In order to continue scaling data rates, equalization techniques can be employed to compensate for the ISI. However, the power and area overhead associated with equalization makes it difficult to achieve the target bandwidth within a realistic power budget. As a result, rather than being technology-limited, current high-speed I/O link designs are becoming channel- and power-limited.
A promising solution to the I/O bandwidth problem is the use of optical interchip communication links. The negligible frequency-dependent loss of optical channels provides the potential for optical link designs to fully utilize increased data rates provided through CMOS technology scaling without excessive equalization complexity. Optics also allows very high information density through wavelength-division multiplexing (WDM). Hybrid integration of optical devices with electronics has been demonstrated to achieve high performance [4]-[9], and recent advances in silicon photonics have led to fully integrated optical signaling [10]-[11]. These approaches pave the way to massively parallel optical communications. In order for optical interconnects to become viable alternatives to established electrical links, they must be low-cost and have competitive energy and area-efficiency metrics. Dense arrays of optical detectors require very low-power, sensitive, and compact optical receiver circuits. Existing designs for the input receiver, such as the transimpedance amplifier (TIA), require large power consumption to achieve high bandwidth and low noise and can occupy a large area due to bandwidth enhancement inductors. Moreover, these analog circuits require extensive engineering effort to migrate and scale to future technologies.
With the increasing bandwidth requirements of computing systems and limitations on power consumption, optical signaling for chip-to-chip interconnects has gained a lot of interest. Dense arrays of optical detectors require very low-power, sensitive, and compact optical receiver circuits. Existing designs for the input receiver, such as the transimpedance amplifier (TIA), require large power consumption to achieve high bandwidth and low noise, and can occupy a large area due to bandwidth enhancement inductors. In most optical receivers, the photodiode current is converted to a voltage signal. A simple resistor can perform the I-V conversion if the resulting RC time constant is on the order of the bit interval (Tb). However, for a given photodiode capacitance and target signal-to-noise ratio (SNR), the RC time constant limits the bandwidth and hence the data rate. To avoid this problem, TIAs are commonly employed, but they are highly analog and power hungry and do not scale well with technology. One alternative is to integrate the front-end to eliminate the need for resistance and break the bandwidth trade-off. However, this technique suffers from voltage headroom limitations and requires short-length DC-balanced inputs.
In view of the foregoing, there is a need in the art for improved apparatuses and methods for optical receivers. There is particularly a need for such apparatuses and methods to operate at less than full bandwidth. Furthermore, there is a need for such apparatuses and methods to operate at very low power. These and other needs are met by embodiments of the present invention as detailed hereafter.
An optical receiver architecture is disclosed which employs an RC double-sampling front-end and dynamic offset modulation technique. The low-voltage double-sampling technique provides high power efficiency by avoiding linear high-gain elements conventionally employed in typical transimpedance-amplifier (TIA) receivers. Various applications are described, including electrical on-chip interconnects as well as pulse amplitude modulation.
An embodiment of the invention comprises an apparatus for signal receiving, comprising a front-end including a double sampling circuit for sampling the input voltage at an end of two consecutive bit times and determining a voltage difference of the sampled input voltage at the end of the two consecutive bit times, and an offset voltage or dynamic offset modulation (DOM) circuit for applying a dynamic offset voltage to the voltage difference. The voltage difference can be used to determine a signal received on the front end. The DOM circuit can avoid input-dependent performance degradation and the front-end can have a bandwidth that is at least an order of magnitude less than a bandwidth of the signal.
The front end can include a resistor or transimpedance amplifier (TIA) coupled to a capacitor at an input voltage. The bandwidth of the TIA can be a fraction of the data rate, e.g., less than 10% of the operating data rate 1/Tb.
Typically, a binary output representing the signal is determined from the voltage difference with the applied dynamic offset. However, in other embodiments, the input voltage may be a pulse amplitude modulation signal.
In the simplest case, the TIA is a single resistor. The dynamic offset voltage gain may be
where Tb is a bit time and τ is an RC time constant of the front end.
Typically, the input voltage is from a photodiode receiving light. In further embodiments, the input voltage may be from an on-chip interconnect. The front-end, the double sampling circuit, and the offset voltage circuit may be implemented in complementary metal oxide semiconductor (CMOS).
The front end can be for low power (less than 0.5 pJ/b) and high speed (higher than 20 Gb/s) signal communication.
The apparatus can be scalable and portable.
The double sampling circuit can perform de-multiplexing.
The signal can be a data bit sequence, wherein each bit in the sequence is indexed by an integer n and the input produces a signal in response thereto.
The front end can include a resistor-capacitor (RC) circuit that integrates the signal to produce an exponential signal, wherein a time constant RC=τ of the RC circuit is greater than a bit time T of the data bit sequence.
The double sampling circuit can sample a first level V(n−1) of the exponential signal and a next level V(n) of the exponential signal at the bit time T later.
The voltage difference can be V[n]=V(n)−V(n−1), where V[n]>0 can indicate that the nth bit is a one and V[n]<0 can indicate that the nth bit is a zero.
The DOM circuit can convert V[n] into a V′[n] having a constant magnitude if the V[n] is different from the constant magnitude.
A sense amplifier or comparator can receive the V′[n] or the V[n] having the constant magnitude at its input to read each data bit in the data bit sequence, and an isolation or buffer amplifier can isolate the sense amplifier from the double sampler. The constant magnitude can be $(\alpha/2)(1 - e^{-T/\tau})$, where α can be a gain between the double sampling circuit and the input of the sense amplifier.
The RC circuit can comprise a parasitic capacitance C and a resistance of the input, and a shunt resistor R in parallel with the resistance, and RC=τ can be given by the product of the resistance of R and a parasitic capacitance C, such that an output of the input in response to the data bit sequence is integrated over the parasitic capacitance to produce the exponential signal.
The signal can be a current detected by a photodiode having a parasitic capacitance, and the exponential voltage signal across the parasitic capacitance and applied to the double sampling circuit can be $V_{PD} = V_{DD} - R I_1 e^{-t/RC}$, where t is time, I1 is the current generated by the input in response to a bit comprising a one, VDD is a voltage of a power rail of the receiver, and R is selected to prevent out-of-range input voltages that would saturate the sense amplifier or comparator.
A method embodiment for signal receiving can comprise receiving an input voltage at a front-end including a transimpedance amplifier (TIA) coupled to a capacitor, sampling the input voltage at an end of two consecutive bit times and determining a voltage difference of the sampled input voltage at the end of the two consecutive bit times with a double sampling circuit, and applying a dynamic offset voltage to the voltage difference with an offset voltage circuit. This method embodiment of the invention may be further modified consistent with the apparatus embodiments described herein.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
In the following description of the preferred embodiment, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
Technical Description
1. Introduction
Some exemplary optical receiver front-end topologies are shown in
One or more embodiments of the proposed receiver resolve problems of conventional systems by employing an integrating RC front-end along with a dynamic offset modulation technique that decouples the bandwidth/data-rate and integration/headroom trade-offs. In this technique, the input current from the photodiode is integrated over the parasitic capacitor, while a shunt resistor limits the voltage. This resistor along with the photodiode capacitance creates a time constant that is much larger than the bit time (Tb). The resulting voltage at the input is sampled every bit interval to resolve the received bit. This process is performed by comparing two consecutive samples. However, due to the RC nature of the front-end, after several consecutive ones or zeros the voltage approaches the saturation level, which results in a close-to-zero voltage difference at the input of the next-stage comparator. Embodiments of the present invention resolve this dependency on the input by introducing a dynamic offset to the sense amplifier based on the value of the input voltage. This offset effectively increases the double-sampled voltage for weak ones/zeros.
The introduced offset is proportional to the difference between the sampled voltage and a fixed voltage with a proportionality coefficient, β. For an exponential decaying signal, β can be chosen such that the resulting double-sampled voltage is always equal to ΔVmax/2, where ΔVmax is the maximum voltage difference and occurs for a transition after many consecutive ones or zeros. As a result, a constant voltage difference can be guaranteed at the input of the comparator, regardless of the received input.
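The effect of the dynamic offset can be illustrated with a small behavioral model. The following is a minimal sketch rather than the disclosed circuit: it assumes a normalized first-order RC front-end that settles toward 1.0 for a one and toward 0.0 for a zero, a fixed reference at mid-swing, and a coefficient β = 1 − e^(−Tb/τ). The function name, the normalization, and the bit pattern are all hypothetical.

```python
import math

def double_sample(bits, tb_over_tau, beta=0.0, v_ref=0.5):
    """Return the double-sampled voltage for each bit of a normalized RC front-end.

    The input node moves exponentially toward 1.0 for a one and toward 0.0
    for a zero. v_ref is the fixed voltage used by the offset term.
    """
    a = math.exp(-tb_over_tau)   # per-bit decay factor e^(-Tb/tau)
    v_prev = v_ref               # assumed starting point: mid-swing
    out = []
    for b in bits:
        v = a * v_prev + (1.0 - a) * b           # sample at the end of this bit
        # double-sampled difference plus offset proportional to (V[n-1] - v_ref)
        out.append((v - v_prev) + beta * (v_prev - v_ref))
        v_prev = v
    return out

# Long runs followed by a few transitions (hypothetical test pattern).
bits = [1] * 8 + [0] * 8 + [1, 0, 1, 1, 0]
tb_over_tau = 0.1                                # time constant ten times the bit time
beta_dom = 1.0 - math.exp(-tb_over_tau)          # assumed DOM coefficient

plain = double_sample(bits, tb_over_tau)             # no offset: input-dependent
with_dom = double_sample(bits, tb_over_tau, beta_dom)
decoded = [1 if dv > 0 else 0 for dv in with_dom]

print("without DOM: min/max |dV| =",
      min(abs(x) for x in plain), max(abs(x) for x in plain))
print("with DOM:    min/max |dV| =",
      min(abs(x) for x in with_dom), max(abs(x) for x in with_dom))
```

In this model the difference without the offset shrinks as a run of identical bits gets longer, while the offset version holds a fixed magnitude equal to half of the largest possible difference, and the sign of the result recovers each bit.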
One or more embodiments of the invention can also be applied to on-chip interconnects. The limited bandwidth of on-chip interconnects is mainly due to their RC nature. This problem is more pronounced in highly scaled technologies with highly resistive interconnects. In addition, the ever-increasing processing capability of microprocessors due to technology scaling necessitates a matching increase in I/O bandwidth. The conventional solution to the limited bandwidth of interconnects is to employ techniques such as decision feedback equalization (DFE) at the receiver or FIR pre-emphasis at the transmitter. However, the maximum data rate achieved using these techniques is less than 10 Gb/s with a power efficiency of about 1 pJ/b while providing 2.5 Gb/s/μm throughput per area. The double sampling receiver with dynamic offset modulation of the present invention can achieve higher than 20 Gb/s with less than 0.2 pJ/b power efficiency and 12.5 Gb/s/μm throughput.
In addition, one or more embodiments can be further extended to more complex amplitude modulations with high spectral density, such as 4-PAM, in order to achieve a higher data rate over bandwidth-limited channels, as will be understood by those skilled in the art. For example, 4-PAM modulation achieves twice the data rate of conventional OOK modulation with little extra complexity. This modulation can be employed for both optical and electrical links.
2. Optical Receiver Architecture According to One or More Embodiments
The task of the optical receiver is to resolve the value of the incoming signal by sensing the changes in the magnitude of photodiode current. To minimize the transmit optical power, the receiver has to be able to resolve small optically generated current from the photodiode. In order to achieve a robust data resolution with low BER, the total input-referred noise current from the circuitry and the diode itself should be well below the optically generated current. In general, design of a low-noise front-end with a very high bandwidth is difficult and requires high electrical power consumption. In most optical receivers, the photodiode current is converted to a voltage signal.
A simple resistor, shown in
the maximum bandwidth, and hence the data rate, supported by TIA is proportional to its gain A.
As a result, to achieve a high data-rate, a TIA with large gain-bandwidth product is required, which can result in high power consumption. Passive components such as inductors can be employed to enhance the bandwidth of TIA [4]-[6], but impose a significant area overhead.
An alternative to TIA is the integrating front-end [15] as shown in
The double-sampling technique allows for immediate demultiplexing at the front-end by employing multiple clock phases and samplers. It also eliminates the need for high-gain stages, such as a TIA, that operate at the input data rate. Another advantage associated with this technique is the inherent single-ended to differential conversion that happens at the front-end and reduces receiver sensitivity to common-mode interference. A significant advantage of this technique is that it mainly employs digital circuitry, which allows for considerable power savings when scaling to advanced technology nodes. However, this technique suffers from voltage headroom limitations and requires short-length DC-balanced inputs such as 8B/10B encoded data.
for a long sequence of “0” following a long sequence of “1,” where VPD denotes the input voltage, R is the front-end resistance, Cin is the total capacitance at the input, and I1 is the current due to a “1” input. Double-sampling can be applied with a double-sampling circuit 204 to sample the input voltage at the end of two consecutive bit times Vn-1 and Vn, as shown in
where Tb denotes the bit time. For instance, a “1” after a long sequence of “0” generates larger ΔV[n] than a “1” after a long sequence of “1.” The dependency of the voltage difference on the input signal can be resolved by introducing a “dynamic offset” to the sense amplifier using the system of
As an example, a long sequence of ones, followed by a long sequence of zeros is considered as illustrated in
$\Delta V(z) = (1 - z^{-1})\,V(z)$. (5)
After subtracting the previous sample, V[n−1], the resulting voltage difference ΔV′[n] can be written in the z-domain as
$\Delta V'(z) = (1 - z^{-1})\,V(z) + \beta z^{-1} V(z)$ (6)
where β is the DOM coefficient and V(z) is equal to
In order to have a constant ΔV′[n] regardless of the received input sequence, β is found for which ΔV′(z) is independent of z. By substituting (7) in (6), it can be shown that for
ΔV′(z) will be independent of z and equal to
is the double-sampled voltage due to a one (zero) following a long sequence of zeros (ones).
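The condition on β can also be checked directly in the time domain. The following is a sketch under stated assumptions, not a reproduction of the numbered equations: it assumes first-order settling of the sampled voltage toward a final level V_b (V_hi for a one, V_lo for a zero), a fixed reference V_ref at the midpoint of those two levels, and an offset term proportional to V[n−1] − V_ref, consistent with (6). The symbols V_b, V_hi, V_lo and V_ref are introduced here for illustration only.

```latex
\begin{align*}
V[n] &= e^{-T_b/\tau}\,V[n-1] + \bigl(1 - e^{-T_b/\tau}\bigr)V_b \\
\Delta V'[n] &= V[n] - V[n-1] + \beta\,\bigl(V[n-1] - V_{ref}\bigr) \\
             &= \bigl(1 - e^{-T_b/\tau}\bigr)\bigl(V_b - V[n-1]\bigr)
                + \beta\,\bigl(V[n-1] - V_{ref}\bigr) \\
\beta = 1 - e^{-T_b/\tau}
  \;&\Rightarrow\;
  \Delta V'[n] = \bigl(1 - e^{-T_b/\tau}\bigr)\bigl(V_b - V_{ref}\bigr)
  = \pm\tfrac{1}{2}\bigl(1 - e^{-T_b/\tau}\bigr)\bigl(V_{hi} - V_{lo}\bigr)
\end{align*}
```

Under these assumptions the dependence on V[n−1] cancels, so the magnitude no longer depends on the preceding bit pattern and equals half of the largest double-sampled difference, which occurs for a transition after a long run.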
3. Analysis of an Exemplary Optical Receiver Embodiment
In this example design, a demultiplexing factor of four is chosen as the minimum possible demux factor to allow for proper operation of the double sampler and the following comparator stage. The front-end S/H is comprised of a PMOS switch and the parasitic capacitor (Cs) from the following stage. The optimum size of Cs is chosen considering the noise performance of the front-end and S/H speed as will be explained later. An amplifier with about 6 dB of gain may be inserted between the S/H and the comparator to provide isolation between the sensitive sampling node and the comparator and minimize kickback noise. This also creates a constant common-mode voltage at the comparator input and improves its speed and offset performance. A StrongARM sense amplifier may be employed to achieve high sampling rate and low power.
For Tb<<RCin equation (11) can be approximated by
As a result, the receiver sensitivity is a strong function of the bit period Tb, total input capacitance Cin, photodiode responsivity ρ, and the total input-referred noise.
The receiver input capacitance is comprised of
$C_{in} = C_{PD} + C_{pad} + C_{WB} + C_{int} + 2C_S$ (14)
where CPD is the photodiode capacitance, Cpad denotes the bonding pad capacitance, CWB is the wirebond capacitance, Cint is the input interconnect capacitance, and CS is the total sampling capacitance of each sampler. The required ΔVb is set by the minimum signal-to-noise ratio (SNR) for the target BER and by the residual input-referred offset of the sense amplifier after correction, Voffset. As a result, the minimum required ΔVb is equal to
where σn is the total input voltage noise variance, which is computed by input referring the receiver circuit noise and the effective clock jitter noise.
The main sources of noise in the RC front-end are the sampler noise, buffer noise, sense-amp noise, and, finally, clock jitter noise, as shown in
where CA is the internal sense amplifier node capacitance, which is set to approximately 15 fF in order to obtain sufficient offset correction range. The sense amplifier gain Avsa is estimated to be near unity for the 0.8 V common-mode input level set by the buffer output, resulting in a sense amplifier voltage noise sigma of 0.75 mV. The buffer noise can be written as
where is the transistor noise coefficient. According to simulation, the input-referred voltage noise variance of the buffer stage is equal to 0.6 mVrms while it provides about 6 dB of gain. Sampler voltage noise variance is equal to
where the factor of two is due to the two sampling capacitors employed in the sampler block, which generate the differential input voltage to the buffer.
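The order of magnitude of the sampler noise can be estimated with a short calculation. This is a minimal sketch that assumes the standard kT/C expression for a sampled capacitor, doubled for the two sampling capacitors. The 25 fF capacitance and the 300 K temperature are hypothetical values chosen only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def sampler_noise_rms(c_s, temp_k=300.0, n_caps=2):
    """RMS sampled noise voltage for n_caps sampling capacitors of value c_s (farads)."""
    return math.sqrt(n_caps * K_B * temp_k / c_s)

# Hypothetical sampling capacitor of 25 fF at an assumed 300 K.
sigma_s = sampler_noise_rms(25e-15)
print(f"sampler noise = {sigma_s * 1e3:.3f} mV rms")
```

Because the noise voltage varies with the inverse square root of the capacitance, halving the sampling capacitor raises this term by a factor of about 1.41.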
Clock jitter also has an impact on the receiver sensitivity because any deviation from the ideal sampling time results in a reduced double-sampled differential voltage, as shown in
Using the measured clock jitter of about 1 psrms, it is estimated to be about 0.5 mVrms. As shown in
where A is the buffer DC gain. As β/A<<1, the noise contribution of the DOM is negligible.
Combining the input-referred circuit noise and effective clock jitter noise, ignoring σDOM, results in the total input noise power equal to
In order for the receiver to achieve adequate sensitivity, it is essential to minimize the sense amplifier input-referred offset caused by device and capacitive mismatches. While the input-referred offset can be compensated by increasing the total area of the sense amplifier [16], this reduces the buffer bandwidth by increasing input capacitance and also results in higher power consumption. Thus, in order to minimize the input-referred offset while still using relatively small devices, a capacitive trimming offset correction technique may be used [14]. In this technique the capacitance is digitally adjusted to unbalance the amplifier and cancel the offset voltage. The residual offset is limited by the minimum offset cancelling capacitance possible.
As shown in
$V_{PD,max} - V_{PD,min} = R\,I_{1,max} = R\,\rho\,P_{max}$. (22)
In this design, the variable resistor (R) at the input changes from about 0.8 to 4 kΩ, which allows the receiver to operate for up to 0-dBm input optical power with a photodiode responsivity of about 1 A/W. According to simulation, the receiver may operate at higher input optical powers, as the double-sampled voltage is quite large; however, the excessive voltage at VPD will stress the transistors connected to this node. The minimum input optical power is determined by the noise performance of the front-end as explained earlier in this section.
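The relation in (22) can be evaluated with the values quoted above. The sketch below is illustrative only; the helper names are hypothetical, and the responsivity is taken as exactly 1 A/W.

```python
def dbm_to_watts(p_dbm):
    """Convert optical power from dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)

def input_voltage_range(r_ohm, responsivity_a_per_w, p_max_w):
    """Input voltage range R * rho * P_max, following the form of equation (22)."""
    return r_ohm * responsivity_a_per_w * p_max_w

# Lowest resistor setting (about 0.8 kOhm) at 0 dBm input power.
swing = input_voltage_range(800.0, 1.0, dbm_to_watts(0.0))
print(f"input range at 0 dBm with 0.8 kOhm: {swing:.2f} V")
```

With the smallest resistor setting the range at 0 dBm is about 0.8 V, which indicates why the resistance is reduced as the optical power increases.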
In a fabricated prototype, the DOM coefficient β may be adjusted manually. In the following, an adaptive algorithm is introduced which can automatically set β for optimum operation. In addition, the required clock signals may be provided from off-chip. In a complete system the clock may be generated on-chip using a CDR. A bang-bang CDR technique can be applied to the novel receiver as described in the next section.
4. Design Considerations for an Exemplary Optical Receiver Embodiment
A number of additional design considerations such as adaptation techniques for DOM, scaling behavior of the receiver, and suitable clocking techniques can be readily evaluated by those skilled in the art. The feasibility of these techniques is validated through circuit- and system-level simulations.
a. Adaptation of Dynamic Offset Modulation
As previously shown, the DOM coefficient depends on the front-end time constant (RC). As a result, at the beginning of the operation and in order to maintain the operation of the receiver over slow dynamic variations such as temperature or supply drifts, an adaptation technique should be employed. The RC front-end may be considered first without the DOM. As previously discussed and illustrated in
A block diagram of a bang-bang-controlled gain adjustment loop is shown in
The adaptation loop can be designed to operate only occasionally to correct for slow variations, and the same hardware can be reused for clock recovery as explained hereafter.
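A bang-bang adjustment of β can be sketched behaviorally. The following is an illustration under stated assumptions and is not the disclosed loop: it assumes a normalized first-order RC front-end, a mid-swing reference, and an update rule that steps β by a fixed amount according to the sign of the difference between the magnitudes of two consecutive double-sampled voltages, evaluated only when the two bits are equal. For simplicity both differences are computed with the current β and the transmitted bits are assumed known. All names, the step size, and the seed are hypothetical.

```python
import math
import random

def adapt_beta(bits, tb_over_tau, beta0=0.0, step=1e-3, v_ref=0.5):
    """Sign-based (bang-bang) adaptation of the DOM coefficient on a normalized RC model."""
    a = math.exp(-tb_over_tau)       # per-bit decay factor
    beta = beta0
    v2 = v1 = v_ref                  # V[n-2] and V[n-1], assumed to start at mid-swing
    b_prev = None
    for b in bits:
        v0 = a * v1 + (1.0 - a) * b  # V[n]
        if b == b_prev:
            # consecutive double-sampled voltages, both evaluated with the current beta
            dv_now = (v0 - v1) + beta * (v1 - v_ref)
            dv_old = (v1 - v2) + beta * (v2 - v_ref)
            err = abs(dv_now) - abs(dv_old)
            if err > 0:              # magnitude grows along a run: over-compensated
                beta -= step
            elif err < 0:            # magnitude shrinks along a run: under-compensated
                beta += step
        v2, v1, b_prev = v1, v0, b
    return beta

rng = random.Random(0)               # fixed seed so the run is repeatable
bits = [rng.randint(0, 1) for _ in range(4000)]
beta_hat = adapt_beta(bits, tb_over_tau=0.1)
print("adapted beta:", beta_hat, "model target:", 1.0 - math.exp(-0.1))
```

In this model the loop settles to within one step of 1 − e^(−Tb/τ) and then dithers around it, which is the expected behavior of a sign-only update.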
b. Scaling
Silicon photonics has offered high-performance optical components, such as germanium photodiodes, waveguides and modulators. This integration allows for very small photodiode parasitic capacitance. Here, the effect of photodiode capacitance scaling on the performance of the proposed receiver is evaluated. According to equation (10), the double-sampling voltage is inversely proportional to the photodiode capacitance. As a result, a larger double-sampling voltage can be achieved using a smaller parasitic capacitance. This allows for scaling the receiver sensitivity Pavg for a fixed data rate. This argument is valid under the assumption that no charge sharing happens between the photodiode capacitance and the receiver sampling capacitors. In order to minimize this charge sharing, a certain ratio between the photodiode capacitance and the sampling capacitor has to be kept. In this example design, this ratio is chosen to be about 10. Therefore, while the photodiode capacitance is scaled, the sampling capacitor should also scale at the same rate. This in turn increases the kT/C noise of the sampler and degrades the front-end SNR. However, as the noise is inversely proportional to the square root of the capacitor size, the overall SNR and hence the sensitivity of the receiver increases in proportion to the square root of the photodiode scaling factor, as shown in
The receiver maximum data rate is also a function of the photodiode capacitance. According to
for a given sensitivity, the data rate can be increased by Rb scaling the photodiode capacitance. As mentioned earlier, in order to minimize charge sharing, the sampling capacitor scales with the same rate as the photodiode capacitance. As a result the input referred-noise, σn, changes accordingly. For the target RX sensitivity of 100 μA,
c. Clocking
An interesting problem in a clocked integrating front-end is to recover the clock from the incoming data. As mentioned previously, the clock jitter could be one of the limiting factors in the receiver sensitivity. As a result, an efficient low-jitter clocking technique is crucial. For highly parallel links, a dual-loop CDR [17] can be employed with one loop for the frequency synthesis, which can be shared between all of the channels, and the other for phase correction in each channel (alternatively, in a source-synchronous clocking scheme, the frequency synthesis loop can be eliminated, and a phase correction loop will be sufficient). An alternative technique is to employ a forwarded-clock scheme in a WDM link using one of the channels (wavelengths), which allows for simple phase correction loops to set the optimal sampling time.
The most common phase detection technique employed in electrical signaling is the 2×-oversampled phase detector known as an Alexander phase detector [18]. A similar technique can be applied to the proposed double-sampled front-end.
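The decision logic of a conventional 2×-oversampled (Alexander) phase detector can be sketched as follows. This illustrates the general technique for binary samples and is not the detector used with the double-sampled front-end. The function name and the sign convention (+1 for an early clock, −1 for a late clock) are assumptions made for the example.

```python
def alexander_pd(d_prev, edge, d_next):
    """Early/late decision from two data samples and the edge sample between them.

    Returns +1 if the clock is early, -1 if it is late, and 0 when there is
    no data transition and therefore no timing information.
    """
    if d_prev == d_next:
        return 0                     # no transition between the two data samples
    # The edge sample still equals the old bit: it was taken before the transition.
    return 1 if edge == d_prev else -1

# Hypothetical sample triplets: (previous data, edge sample, next data).
triplets = [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0)]
votes = [alexander_pd(*t) for t in triplets]
print(votes)
```

A loop filter would typically accumulate these votes and shift the sampling phase in the direction that drives their average toward zero.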
Removing the extra phases for oversampled phase recovery can help to reduce the power consumption in the oscillator and clock buffers and relax the difficulties of phase spacing control. The RC front-end allows us to create an efficient baud-rate phase recovery scheme similar to [9], [19] based only on data samples as shown in
Due to the lower update density in the baud-rate phase detection technique, the overall loop gain is smaller than that of conventional 2×-oversampling by almost a factor of 2.67 [19]. As a result, the 2×-oversampling phase correction loop provides higher bandwidth for an identical loop filter and charge-pump, and hence superior jitter tolerance. On the other hand, the baud-rate phase detector has the additional advantage of being less sensitive to clock phase errors, as the same clocks are used for both the data and phase samples, whereas the 2×-oversampling detector relies on quadrature phase matching.
Another important aspect of the phase correction loop is its effect on the operation of the β correction loop. As explained earlier, these two loops operate based on the same correction signal P to minimize the difference between the two consecutive double-sampled voltages, ΔV′[n−1] and ΔV′[n]. As a result, they can operate concurrently to adjust β and clock phase. This can be validated in simulation for a PRBS7 pattern when the initial phase is about half UI apart from the optimal point. The bandwidth of the β and phase correction loop in this simulation can be approximately 2 MHz. The experiment can be repeated for the case where the clock phase was leading and lagging with respect to the optimal clock phase as well as over and undercompensated β.
As mentioned earlier in this section, the only difference between the adjustment loop and the CDR loop is the length of the pattern that should be monitored. As a result, the same hardware (P comparators) employed in the adaptation loop can be reused to perform clock recovery except for the pattern detection logic. This allows for savings in power consumption and area.
An exemplary embodiment of the invention can be fabricated in a 65-nm CMOS technology with the receiver occupying less than 0.0028 mm² as shown in
The functionality of the receiver may first be validated using the on-chip emulator and PRBS7, PRBS9, and PRBS15 sequences. R and Cin may be 2.2 kΩ and 250 fF (RCin>550 ps).
In a second set of measurements, the receiver can be wirebonded to a high speed photodiode and tested at different data rates. The photodiode, bonding pad, wire-bond, and the receiver front-end are estimated to introduce more than 200 fF capacitance.
where PS is the optical power sensitivity, IS=I1−I0 is the current sensitivity and ER is the extinction ratio. The measured extinction ratio at 14.2 Gb/s is about 13 dB using the external modulator. As a result, the nominal optical sensitivity according to the current sensitivity of 75 μA will be equal to −14 dBm. The difference between the nominal and measured optical sensitivity is about 5 dB, which is believed to be due to the coupling loss. This difference grows as the data rate increases due to the limited bandwidth of the external modulator. Therefore, the sensitivity can be improved by employing advanced optical packaging technologies.
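The nominal sensitivity figure can be reproduced with a short calculation. The sketch below assumes the commonly used relation between average optical power, current sensitivity, responsivity, and extinction ratio, P = IS/(2ρ)·(ER+1)/(ER−1), and a responsivity of 1 A/W. Both are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def optical_sensitivity_dbm(i_s, responsivity, er_db):
    """Average optical power in dBm for a given current sensitivity and extinction ratio."""
    er = 10 ** (er_db / 10)                                   # extinction ratio, linear
    p_avg_w = i_s / (2.0 * responsivity) * (er + 1.0) / (er - 1.0)
    return 10.0 * math.log10(p_avg_w / 1e-3)                  # watts to dBm

# 75 uA current sensitivity, assumed 1 A/W responsivity, 13 dB extinction ratio.
p_nominal = optical_sensitivity_dbm(75e-6, 1.0, 13.0)
print(f"nominal sensitivity = {p_nominal:.1f} dBm")
```

Under these assumptions the result is close to the −14 dBm nominal value stated above.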
Table 1 summarizes the performance of the proposed optical receiver and compares it with other methods.
5. Electrical Signaling and Application to On-Chip Interconnects
a. Introduction
As VLSI technologies and multi-core processor chips continue to scale, long on-chip wires will present increasing performance limitations. While transistors benefit from technology scaling, the shrinking cross-sectional area of the on-chip wires increases their electrical resistance and hence their latency, which has a quadratic relation with the wire length. Simple inverter-based repeaters can partially mitigate the latency problem, where an optimal design makes the repeated wire delay linear with length instead of quadratic. However, the associated power and area become prohibitive as the technology scales due to the increased number of repeaters per unit length,
Due to the RC nature of the on-chip wires, binary signals suffer from a long train of post-cursor inter-symbol interference (ISI). To eliminate ISI, equalization techniques such as decision feedback equalization (DFE) can be utilized, but the long post-cursor tail necessitates many DFE taps, which results in significant power overhead. This problem is exacerbated as the technology scales. RC signal emulation in a DFE is also an attractive solution to eliminate many taps of post-cursors. The main limiting factor in this technique is meeting the timing requirement in the feedback loop, especially at high data rates.
b. Electrical Signaling Receiver Architecture
Applying the principle of the receiver embodiment previously described, one or more embodiments can achieve an on-chip link using minimum-pitch wires for high-speed signaling to address the bandwidth requirement of future microprocessors. One or more embodiments can employ the double-sampling technique with a feed-forward dynamic offset modulation (DOM) to achieve high data rates over minimum-pitch and long on-chip wires that suffer from excessive loss and latency. In order to further improve data rate and reduce power consumption, a capacitively-driven transmitter [25] may be employed. One or more embodiments achieve low power consumption, high bandwidth density and scalability to future technology nodes.
Minimum-pitch wires can have a slow exponential response to a fast transition, with a time-constant (τ) much larger than the bit time (T). Instead of conventional equalization techniques, a mostly-digital double-sampling technique may be applied to break the tradeoff between the data-rate and the on-chip wire time-constant.
As shown in
where α is the main path gain. This results in a constant double-sampled voltage, ΔV′[n], equal to
Another advantage of this technique is the capability to perform immediate demultiplexing at the front-end. A quarter-rate architecture (demultiplexing factor of 4) is employed in this design. As a result, the comparators can operate at a fraction of the data rate.
Utilizing low-swing signaling also reduces the power consumption in an on-chip interconnect, where most of the power is associated with the dynamic charging and discharging of the wire capacitance (Cw). A separate supply can be employed for an inverter-based transmitter to reduce the signal swing and hence improve power efficiency. However, it is not desirable to have multiple supplies on chip, as this complicates the power distribution. An alternative approach to achieve low swing is to drive the wire through a capacitor, Cp. This helps reduce the signal swing on the wire through a capacitive voltage divider. Ignoring the parasitic capacitance associated with the driver and the receiver, the resulting signal swing at the receiver side will be equal to Cp/(Cw+Cp)×Vdd. This capacitor also pre-emphasizes transitions and reduces the driver's load. Because it acts as a high-pass filter, the capacitor increases the bandwidth of the wire by almost a factor of Cw/Cp and decreases latency.
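The capacitive divider can be illustrated numerically. This is a minimal sketch that assumes the usual charge-sharing relation for a series coupling capacitor driving a capacitive load, in which the swing scales with Cp/(Cw+Cp), and that ignores driver and receiver parasitics. The capacitance and supply values are hypothetical.

```python
def received_swing(c_p, c_w, v_dd):
    """Signal swing on a wire of capacitance c_w driven through a series capacitor c_p."""
    return v_dd * c_p / (c_p + c_w)

# Hypothetical values: 400 fF coupling capacitor, 2 pF wire, 1.0 V supply.
swing = received_swing(400e-15, 2e-12, 1.0)
print(f"swing = {swing * 1e3:.0f} mV")
```

A smaller coupling capacitor lowers the swing and the switching energy further, at the cost of less signal at the receiver.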
c. Exemplary Implementation
In this embodiment, a capacitive driver is employed to achieve small voltage swing and reduce power consumption. It should be noted that the coupling capacitor limits the maximum number of consecutive ones/zeros due to the voltage drift associated with the high-pass behavior of the link. As a result, the coupling capacitor, Cp, is optimized to reduce the time constant of the drift process while providing reasonable voltage swing and bandwidth enhancement. In this design, a PMOS transistor realizes a 400 fF capacitor for the driver. This results in about 140 mV voltage swing over a 7 mm wire and less than about 1 mV drift in voltage after more than 40 consecutive ones/zeros. The termination resistor sets the receiver's DC voltage to Vdd.
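As a rough numerical check of the capacitive-divider behavior, the sketch below uses the standard relation for a series coupling capacitor Cp driving a wire capacitance Cw, swing = Vdd×Cp/(Cw+Cp), with driver and receiver parasitics ignored. The 400 fF capacitor and the 140 mV swing are taken from the text; the 1.0 V supply is an assumption made only for illustration, so the wire capacitance it implies is an estimate under that assumption, not a measured value.

```python
def wire_swing(vdd, c_p, c_w):
    """Swing on the wire when the driver is coupled through series cap c_p.

    Driver and receiver parasitic capacitances are ignored.
    """
    return vdd * c_p / (c_p + c_w)


def wire_cap_for_swing(vdd, c_p, swing):
    """Invert the divider: wire capacitance that would produce `swing`."""
    return c_p * (vdd - swing) / swing


VDD = 1.0        # volts; ASSUMED supply, not stated in the text
C_P = 400e-15    # farads; coupling capacitor from the text
SWING = 0.140    # volts; swing over the 7 mm wire from the text

c_w = wire_cap_for_swing(VDD, C_P, SWING)
print(f"implied wire capacitance: {c_w * 1e12:.2f} pF")
print(f"swing recomputed from the divider: {wire_swing(VDD, C_P, c_w) * 1e3:.1f} mV")
```

Under the assumed supply, the implied Cw is several times larger than Cp, which is the regime in which the divider yields a low swing.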
The high DC voltage at the input of the receiver guarantees best operation of the PMOS samplers as shown in
An amplifier with about 6 dB gain provides isolation between the sensitive sampling node and the sense amplifier. It also creates a constant common-mode voltage and prevents input dependent offset. A StrongARM sense amplifier may be employed to achieve high speed and low power. The sense amplifier has a separate offset cancellation for mismatch compensation through the variable capacitors shown in
An exemplary embodiment can be fabricated in 28 nm LP CMOS technology with the receiver and transmitter occupying less than 950 μm² and 160 μm², respectively. The functionality of the transceiver can be validated using single-ended on-chip wires with different lengths (4-7 mm). PRBS-7 to PRBS-31 data can be generated off-chip and sent to the on-chip transmitters.
The proposed link embodiment offers over 4× improvement in energy efficiency and about 40% lower latency compared to the repeated link. The receiver offers a peak energy efficiency of 136 fJ/b at 10 Gb/s data rate for 7 mm wires. The transceiver may also be tested using a 4 mm wire. This link may comprise two adjacent wires to investigate the effect of crosstalk as shown in
Table 2 summarizes the performance of the proposed link and compares it with other methods.
6. Analysis of a Further Optical Receiver Embodiment
And the difference of samples is given by:
After dynamic offset modulation,
As a reminder, β, the DOM coefficient, is chosen to be:
so that the resulting voltage is independent of n.
In the case of LBW TIA front-end there are two poles associated with the two nodes at the input and output of the LBW TIA.
where A is the gain of the amplifier and RF is the feedback resistance.
In this case the DOM coefficient is chosen to be:
to cancel the dominant pole at the output of DOM. It should be noted that the time constants associated with the input and output nodes are approximately:
For 28 nm technology, the minimum controllable CL, which is the load of two sampling caps, is approximately 15 fF (at any time, two sampling caps load the LBW TIA). The gain of a one-stage inverter-based amplifier, optimized for power, is about A≈12 dB. So, for any Cp smaller than 70 pF, τ2 is the dominant pole. Therefore
is chosen to be the DOM coefficient to cancel sampling voltage variations due to the dominant pole.
The BER is determined and the eye is formed for n>>1
Note that τ1, τ2>>Tb. Therefore,
In addition, note that the dominant pole associated with the RC front-end is due to Cp, while the dominant pole associated with the LBW TIA, for a small enough PD cap, is due to CL.
In order to analyze the sensitivity of the receiver, all noise sources are taken into account from the PD to the sense amplifier as shown in
Considering the small power overhead added by the LBW TIA, a figure of merit (FOM) has been defined as FOM=sensitivity×power and is simulated to get a sense of where the LBW TIA front-end is beneficial compared to a simple RC front-end. For this FOM, a minimum of 70 fF PD cap is required to see the enhancement.
Table 3 shows simulation results and a comparison of other optical receivers to the design according to one or more embodiments.
7. Pulse Amplitude Modulation
One or more embodiments can be further extended to more complex amplitude modulations with high spectral density, such as 4-PAM, in order to achieve a higher data rate over bandwidth-limited channels. For instance, 4-PAM modulation achieves twice the data rate of conventional OOK modulation with little extra complexity. This modulation can be employed for both optical and electrical links.
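The doubling of data rate follows from each 4-PAM symbol carrying two bits instead of one. The sketch below is a generic illustration of that mapping, not the encoding of any particular embodiment: the Gray-coded level assignment, the normalized levels 0 to 3, and the three slicer thresholds are all assumptions.

```python
# Assumed Gray mapping: adjacent levels differ in exactly one bit.
GRAY_MAP = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
INVERSE_MAP = {level: pair for pair, level in GRAY_MAP.items()}


def pam4_encode(bits):
    """Pack pairs of bits into 4-PAM levels (two bits per symbol)."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_MAP[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(levels, thresholds=(0.5, 1.5, 2.5)):
    """Slice each received level against three thresholds, then unpack bits."""
    bits = []
    for v in levels:
        symbol = sum(v > t for t in thresholds)  # number of thresholds exceeded
        bits.extend(INVERSE_MAP[symbol])
    return bits


bits = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = pam4_encode(bits)
print(symbols, len(bits) / len(symbols))      # symbols and bits per symbol
print(pam4_decode([0.2, 1.1, 1.9, 2.8]) == bits)  # mildly perturbed levels
```

Eight bits occupy four symbol times here, compared with eight symbol times for OOK at the same symbol rate.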
9. Process Steps
Block 2700 represents receiving an input voltage at a front-end including a resistor or transimpedance amplifier (TIA) coupled to a capacitor.
Block 2702 represents sampling the input voltage at an end of two consecutive bit times and determining a voltage difference of the sampled input voltage at the end of the two consecutive bit times, using a double sampling circuit.
Block 2704 represents applying a dynamic offset voltage to the voltage difference with an offset voltage circuit.
Block 2706 represents determining a binary output representing the signal from the voltage difference with the applied dynamic offset voltage.
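The four blocks above can be sketched as a normalized behavioral model. This is a simplified illustration under stated assumptions, not the circuit implementation: the front end is modeled as an ideal first-order low-pass stage whose settled level for a sustained one is `v_max`, the DOM gain is taken as 1−e^(−T/τ), and a fixed offset of half the settled swing times that gain is assumed so that the decision threshold sits at zero. All function and variable names are hypothetical.

```python
import math


def simulate_receiver(bits, bit_time, tau, v_max=1.0, v0=0.0):
    """Behavioral model of double sampling with dynamic offset modulation."""
    decay = math.exp(-bit_time / tau)
    beta = 1.0 - decay                 # assumed DOM gain, 1 - e^(-T/tau)
    v_prev = v0
    raw, dom, decided = [], [], []
    for b in bits:
        # Block 2700: front-end voltage at the end of this bit time
        v_now = v_prev * decay + b * v_max * beta
        # Block 2702: difference of samples taken one bit time apart
        dv = v_now - v_prev
        # Block 2704: offset that tracks the previous sample (assumed form)
        dv_dom = dv + beta * (v_prev - v_max / 2.0)
        # Block 2706: binary decision on the sign of the result
        decided.append(1 if dv_dom > 0 else 0)
        raw.append(dv)
        dom.append(dv_dom)
        v_prev = v_now
    return raw, dom, decided


bits = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1]
raw, dom, decided = simulate_receiver(bits, bit_time=1.0, tau=5.0)
print(decided == bits)
print(min(abs(x) for x in raw), min(abs(x) for x in dom))
```

In this model the raw difference shrinks after long runs of identical bits, while the value after the offset keeps a constant magnitude of (v_max/2)(1−e^(−T/τ)) regardless of the pattern.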
Block 2800 represents providing a (e.g. low bandwidth) resistive front-end for (e.g., low power high speed) signal communication including a double sampling circuit for sampling an input voltage, at the end of two consecutive bit times, producing a voltage difference used to determine a signal received on the front end; and a dynamic offset modulation (DOM) circuit for applying a dynamic offset voltage to the voltage difference, wherein the DOM circuit avoids input-dependent performance degradation. The apparatus can avoid a full bandwidth front end or the front-end can have a bandwidth that is at least an order of magnitude less than a bandwidth of the signal.
The signal can be a data bit sequence, wherein each bit in the sequence is indexed by an integer n and the input produces a signal in response thereto; the front end can include a resistor-capacitor (RC) circuit that integrates the signal to produce an exponential signal, wherein a time constant RC=τ of the RC circuit is greater than a bit time T of the data bit sequence; the double sampling circuit can sample a first level V(n−1) of the exponential signal and a next level V(n) at the bit time T later of the exponential signal, the voltage difference can be V[n]=V(n)−V(n−1), V[n]>0 can indicate the nth bit is a one, and V[n]<0 can indicate the nth bit is a zero; and a DOM circuit can convert V[n] into a V′[n] having a constant magnitude if the V[n] is different from the constant magnitude.
The constant magnitude can be (α/2)(1−e^(−T/τ)), where α is a gain between the double sampling circuit and the input of the sense amplifier.
The RC circuit can comprise a parasitic capacitance C and a resistance of the input, and a shunt resistor R in parallel with the resistance, and RC=τ can be given by the product of the resistance of R and a parasitic capacitance C, such that an output of the input in response to the data bit sequence is integrated over the parasitic capacitance to produce the exponential signal.
The signal can be a current detected by a photodiode having a parasitic capacitance, and the exponential voltage signal across the parasitic capacitance and applied to the double sampling circuit can be V_PD = V_DD − R·I_1·e^(−t/RC), where t is time, I_1 is the current generated by the input in response to a bit comprising a one, V_DD is a voltage of a power rail of the receiver, and R is selected to prevent out-of-range input voltages that would saturate the sense amplifier or comparator.
A gain of the dynamic offset voltage can be (1−e^(−T/τ)), where T is a bit time and τ is an RC time constant of the front end.
The input voltage can be from an on-chip interconnect, from a photodiode receiving light, and/or be a pulse amplitude modulation signal, for example.
The front-end, the double sampling circuit, and the offset voltage circuit can be implemented in complementary metal oxide semiconductor (CMOS).
The resistor can be replaced with a TIA. The front end can comprise a resistor-capacitor (RC) circuit, and the resistance R in the RC circuit can be provided by a low-bandwidth transimpedance amplifier (TIA).
The front end can be for low power (less than 0.5 pJ/b) and high speed (higher than 20 Gb/s) signal communication. Speed and power numbers depend on fabrication technology. However, for 65 nm CMOS technology the speed can be higher than 20 Gb/s and the energy per bit can be lower than 0.5 pJ/b. As the technology scales to smaller nodes, these numbers improve.
Block 2802 represents providing a device (e.g., sense amplifier or comparator) to receive the V′[n] or the V[n] having the constant magnitude at its input to read each data bit in the data bit sequence. An isolation or buffer amplifier can isolate the sense amplifier from the double sampler.
Block 2804 represents the end result, an apparatus for signal communication. The apparatus can be scalable and portable. In reference to scalability, one or more embodiments of the design can be implemented in any CMOS technology node (in fact, the two prototypes tested were in 65 nm CMOS and 28 nm CMOS). So as the technology advances to smaller nodes (e.g., 20 nm or 14 nm), the proposed design remains valid.
In one embodiment, the double sampling circuit can perform de-multiplexing (e.g., time division demultiplexing), as shown in the embodiment of
10. Advantages and Improvements
As CMOS technology scales along the ITRS roadmap, there is an ever-increasing gap between core processing power and inter-chip I/O electrical channel bandwidth. Optical channels have negligible frequency-dependent loss and offer orders of magnitude higher bandwidth density than electrical channels. This makes optical channels an attractive alternative for parallel I/O links. Recent improvements in silicon photonics have established a new milestone in chip-to-chip I/O bandwidth. Low-capacitance, high-speed photo-detectors have surpassed 30 GHz of bandwidth while maintaining a very low capacitive load. In addition, novel hybrid integration techniques, such as micro bumps or through-silicon vias, provide extremely low-capacitance bonds between photo-detectors and the receiver circuitry, eliminating the need for high-capacitance traditional wire-bonds or flip-chip bonds. As the total capacitive load of the photo-detector and its bonding to circuitry decreases, it becomes comparable to the load of the receiver stages. This paradigm shift necessitates rethinking the optical receiver architecture to take full advantage of state-of-the-art technology. One or more embodiments of the invention satisfy this need.
One or more embodiments include a compact low-power optical receiver that scales well with technology for use in optical signaling in chip-to-chip and on-chip communication.
One or more embodiments achieve a dense, high-speed, power-efficient optical receiver in 65 nm CMOS that supports data rates up to 24 Gb/s. The low-voltage RC front-end receiver uses mostly digital building blocks and avoids the use of linear high-gain analog elements. The proposed receiver employs double-sampling and dynamic offset modulation to resolve arbitrary patterns. An efficient adaptation algorithm for adjustment of DOM gain is proposed and investigated. The application of baud-rate clock recovery to the receiver is also analyzed. The receiver consumes less than 0.36 pJ/b at 20 Gb/s, and operates up to 24 Gb/s with −4.7 dBm optical sensitivity (BER < 10^(−12)). Since a large percentage of power consumption is due to the clock buffers and digital blocks, the overall power consumption can greatly benefit from technology scaling. It is also shown that this design is highly suitable for hybrid integration with low-capacitance photodiodes to achieve high optical sensitivity and high data rate. Experimental results validate the feasibility of the receiver for ultra-low-power, high-data-rate, and highly parallel optical links.
One or more embodiments achieve a high-data-rate, low-power on-chip link in 28 nm CMOS. This embodiment features a double-sampling receiver with dynamic offset modulation and a capacitively-driven transmitter. The functionality of one embodiment of the link was validated using 4-7 mm minimum-pitch on-chip wires, achieving up to 20 Gb/s of data rate (13.9 Gb/s/μm) with BER < 10^(−12), better than 136 fJ/b of power efficiency at 10 Gb/s, and a total transmitter and receiver area of less than 1110 μm². In one embodiment, a transceiver for repeater-less on-chip communication demonstrates high bandwidth density, low latency, and low power consumption. The mainly digital architecture of this embodiment is well-suited for highly-scaled technologies. Experimental results for one or more embodiments validate the functionality of the link in 28 nm CMOS. One or more embodiments offer up to 20 Gb/s/ch data rate and 12.5 Gb/s/μm bandwidth density with better than 180 fJ/b energy efficiency.
The following references are incorporated by reference herein.
[30] Y. Liu et al., “A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS,” IEEE ISSCC Digest of Technical Papers, pp. 182-183, February 2009.
This concludes the description of the preferred embodiment of the present invention. The foregoing description of one or more embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.
This application claims the benefit under 35 U.S.C. §119(e) of the following U.S. provisional patent application, which is incorporated by reference herein: U.S. Provisional Patent Application No. 61/643,086, filed May 4, 2012, by Azita Emami-Neyestanak and Meisam Hoarvar Nazari, entitled “Double-Sampling Receiver with Dynamic Offset Modulation for Optical and Electrical Signaling”, (CIT-6189-P).
This invention was made with government support under ECCS0747768 awarded by the National Science Foundation. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61643086 | May 2012 | US