This is the first application filed in respect of the present invention.
The present invention relates to high-speed communications networks, and in particular to methods and systems for soft thermal failure in a high capacity transmission system.
In the field of communications networks, telecommunications equipment is required to meet stringent availability and environmental requirements. In very general terms, telecommunications equipment (such as transmitters, receivers, routers and switches) must be able to function properly under any anticipated environmental conditions. For example, the Synchronous Optical Network (SONET) standard specifies that an OC-192 signal operate at a line rate of 9.95328 Gb/s (gigabits/second) with a maximum permissible bit error rate of 10^-12. All SONET equipment designed to process such a signal must be able to achieve this level of performance, even in the presence of simultaneous events such as worst case ambient air temperatures, fan failures, dusty filters, thin air at a high altitude location, worst case device processing, and worst case power supply voltages. Other network protocols provide different line rates and maximum permissible error rates, but in all cases, the installed equipment must meet the specified performance under anticipated worst-case conditions. It will also be noted that, at the physical network layer, the data rate (line rate) is generally fixed and all data bits in the signal must be processed.
As is well known in the art, Integrated Circuit (IC) components have a maximum safe operating temperature. For example, a CMOS chip may operate with a junction temperature of up to 100° C. Above that temperature, timing and signal errors can occur such that the chip may temporarily fail to perform its function, and the probability of permanent chip failure also increases.
The heat generated in semiconductor circuits (such as CMOS) is the combined effect of leakage currents and state transitions due to signal processing. Leakage currents are present any time power is supplied to the circuit, so that, even when the circuit is not processing a signal, a base level of heat will always be generated within the circuit. The active processing of a signal involves state transitions within the logic gates forming the circuit, and the resulting currents produce an “active” heating of the circuit. The magnitude of this active heating is approximately proportional to the frequency of the clock applied to the circuit. Thus, all other things being equal, doubling the clock speed will double the active heating of the circuit.
In high capacity telecommunications systems, clock speeds of 20 GHz and higher are commonly encountered. At these speeds, the amount of active heat generated in an IC can amount to several Watts, which can easily raise junction temperatures to dangerous levels. This problem is expected to become increasingly severe due to continuing demand for ever faster channel line rates (and thus IC clock speeds).
Accordingly, techniques that enable reliable operation of IC components in high capacity telecommunications systems are highly desirable.
The present invention addresses the above-noted problems by providing a technique for soft thermal failure in a high speed IC.
Thus, an aspect of the present invention provides a method of managing operation of an Integrated Circuit (IC) designed to process a signal. A temperature of the IC is detected, and signal processing performed by the IC is adjusted based on the detected temperature.
Further features and advantages of the present invention will become apparent from the following detailed description, taken in combination with the appended drawings, in which:
It will be noted that throughout the appended drawings, like features are identified by like reference numerals.
The present invention provides a technique for soft thermal failure in a high speed IC. Embodiments of the present invention are described below, by way of example only, with reference to
In very general terms, the present invention provides techniques in which the signal processing performed by an IC can be adjusted based on changes in the IC temperature. Thus, the present invention also provides a control circuit and method in which the temperature of the IC is detected, and the detected temperature used to generate a control signal used to adjust (either by reducing or increasing) the signal processing performed by the IC. If desired, the temperature may be detected at multiple locations within the IC, and the multiple temperature values used singly, for example to detect portions of the IC which may be generating most of the active heat load. Alternatively, the multiple temperature measurements may be combined, for example to obtain an average temperature of the IC. If desired, the rate of change of the IC temperature with time may be calculated and used. In any such case, specific devices for detecting the temperature of an IC, such as thermocouples or temperature-sensitive resistors, for example, are well known, and thus will not be described herein.
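By way of illustration only, the handling of multiple temperature readings described above may be sketched as follows. The function names, the example readings and the 0.5 second sampling interval are assumptions introduced for this sketch and do not describe any particular sensor or control circuit.

```python
from statistics import mean


def summarize_temperatures(readings_c):
    """Return (hottest, average) of per-location readings in degrees C."""
    if not readings_c:
        raise ValueError("at least one temperature reading is required")
    return max(readings_c), mean(readings_c)


def rate_of_change(previous_c, current_c, interval_s):
    """Temperature slope in degrees C per second between two readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (current_c - previous_c) / interval_s


# Hypothetical readings from three locations within the IC.
hottest, average = summarize_temperatures([82.0, 91.5, 78.5])

# Hypothetical earlier reading of the hottest location, taken 0.5 s before.
slope = rate_of_change(previous_c=84.0, current_c=hottest, interval_s=0.5)
```

In such a sketch, the hottest reading would serve to locate the portion of the IC generating most of the active heat load, while the average and the slope could be used as inputs to the control signal generation.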
Various techniques may be used to process the detected temperature value(s) to generate one or more control signals. These may include comparing a detected temperature to a predetermined threshold. Thus, in the following examples, the processing performed by the IC is reduced if the detected temperature rises above a predetermined threshold. This predetermined threshold may correspond with a maximum permissible IC temperature, or alternatively may correspond with a selected temperature below the maximum permissible, for example to provide a safety margin. It will be appreciated that the reverse operation is also possible; that is, the processing performed by the IC may be increased (within the physical design constraints of the IC, of course) when the detected temperature is below the threshold. In some cases, it may be desirable to provide two thresholds, one representing a higher temperature than the other. Thus, for example, the processing performed by the IC can be reduced if the detected temperature rises above the first (higher) threshold, and increased when the detected temperature drops below the second (lower) threshold. As may be appreciated, this two-threshold operation tends to reduce rapid changes in the IC processing.
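The two-threshold operation described above may be sketched, by way of example only, as follows. The class name, the threshold values of 95° C. and 85° C., and the abstract "processing level" (which might stand for a number of FEC iterations or a number of retained bits) are assumptions made for this sketch; the one-step-per-update policy is likewise merely one possible choice.

```python
class HysteresisController:
    """Two-threshold controller: step processing down when hot, up when cool."""

    def __init__(self, upper_c, lower_c, min_level, max_level, level):
        if lower_c >= upper_c:
            raise ValueError("lower_c must be below upper_c")
        self.upper_c = upper_c
        self.lower_c = lower_c
        self.min_level = min_level
        self.max_level = max_level
        self.level = level

    def update(self, temperature_c):
        """Adjust and return the processing level for one temperature reading."""
        if temperature_c > self.upper_c:
            # Above the higher threshold: reduce processing by one step.
            self.level = max(self.min_level, self.level - 1)
        elif temperature_c < self.lower_c:
            # Below the lower threshold: increase processing by one step.
            self.level = min(self.max_level, self.level + 1)
        # Between the thresholds the level is left unchanged.
        return self.level


controller = HysteresisController(
    upper_c=95.0, lower_c=85.0, min_level=1, max_level=10, level=6
)
trace = [controller.update(t) for t in (90.0, 96.0, 97.0, 92.0, 84.0, 88.0)]
# trace == [6, 5, 4, 4, 5, 5]
```

Readings lying between the two thresholds leave the level untouched, so a temperature hovering near either threshold does not cause the processing to toggle on every update.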
As will be appreciated, the design of control circuits capable of performing the above operations is well within the purview of those of ordinary skill in the art, and so does not require detailed explanation herein.
For the purposes of the present description the invention will be described by way of two embodiments, namely a Frequency-Domain Processor (FDP) and a Forward Error Correction (FEC) decoder, both of which may be implemented as IC components in a receiver. Those of ordinary skill in the art will recognise, however, that the present invention is by no means limited to these examples. It is expected that those of ordinary skill in the art will be able, based on the teachings of the present application, to apply the techniques to numerous other components in a high-capacity signal processing system.
Each raw digital sample stream IX, QX, and IY, QY generated by the A/D converters 16 is composed of a series of multi-bit digital values. The resolution (in bits) of each sample stream is established by the design of the A/D converter circuit, and is selected based on a compromise between cost and the desired precision of the signal processing performed by the equalizer 18. In practice a resolution of between 6 and 8 bits has been found to be satisfactory.
In general, the equalizer 18 operates to compensate chromatic dispersion and polarization rotation impairments. The compensated signals 20 output from the equalizer 18 represent multi-bit estimates X′ (n) and Y′ (n) of the symbols encoded on each transmitted polarization 34 of the received optical signal 4. The symbol estimates 20 X′ (n), Y′ (n), are supplied to a carrier recovery block 22 for LO frequency control, symbol detection and data recovery, such as described in Applicant's co-pending U.S. patent application Ser. No. 11/366,392 filed Mar. 2, 2006.
In the embodiment of
In the embodiment of
The cross-compensation block 34 applies X-polarization vectors HXX, HXY to the X-polarization intermediate array {TAX} and Y-polarization vectors HYY, HYX to the Y-polarization intermediate array {TAY}. The multiplication results are then added together to generate modified arrays {VAX} and {VAY}, as may be seen in
The modified arrays {VAX} and {VAY} output by the FDP 30 are then supplied to respective retiming blocks 36, which operate to re-time the modified arrays {VAX} and {VAY} from the sample timing of the A/D converters 16 to the desired T-spaced timing of the multi-bit symbol estimates 20.
By way of example only, consider an embodiment in which a 35 GBaud optical signal 4 is sampled at a sample rate of 1/TS=40 GHz. The A/D converters 16, FFT blocks 26, and FDP 30 will all operate at the sample rate of 1/TS=40 GHz. In this case, each of the arrays {RAX} and {RAY} output by the FFT blocks 26, and thus each of the modified arrays {VAX} and {VAY} output by the FDP 30 span a frequency range of 0-40 GHz, with upper side band (USB) spectrum at 0-20 GHz and lower side band (LSB) at 20-40 GHz, respectively. In principle, retiming of the modified arrays {VAX} and {VAY} can be accomplished by extracting the center portion of each array, and then supplying the remaining portions to the IFFT blocks 38.
In an embodiment in which each FFT block 26 has a width of 1024 taps (that is taps n=0 . . . 1023), with USB at n=0-511 and LSB at n=512-1023, retiming the modified arrays {VAX} and {VAY} can be accomplished by recognising that, within each array, the combined spectrum is nominally symmetrical about the center of the array. Consequently, the upper and lower halves of the array can be overlapped by a selected number of taps in the center portion of the array, and the thus “overlapped taps” added together. The number of overlapped taps is selected so that, after the addition operation, the total number of remaining taps corresponds with the desired width of the retimed array. In the above example, the 1024 taps of the FFT output spans a frequency range of 0-40 GHz, and it is desired to reduce the frequency range to 0-35 GHz. This corresponds with a reduction of 128 taps. Thus, the upper and lower halves of each modified array {VAX} and {VAY} are overlapped by 128 taps and the overlapped taps added together. This results in the 128 taps lying above the center of the array (taps n=512 . . . 639) being added to the 128 taps lying below the center of the array (taps n=384 . . . 511), and the summation result supplied to taps 384-511 of the retimed array, as shown in
The retimed arrays {VAX}′ and {VAY}′ are then supplied to the IFFT blocks 38, which operate at the T-spaced symbol rate of 35 GHz to generate time domain data 40, in the form of a complex valued vector having a width equal to that of the IFFT 38, which, in the illustrated embodiment, is N=896 taps. The IFFT output data 40 is divided into two blocks {v0X} and {v1X}, of which {v1X} is selected as the equalizer output 20 in the form of a complex valued vector {υIX+jυQX} representing p=448 T-spaced complex valued estimates X′ (n) and Y′ (n) of the transmitted symbols. The other IFFT output block, {v0X}, is discarded.
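The overlap-and-add retiming of a 1024-tap array to the 896-tap width of the IFFT may be sketched, by way of example only, as follows. The function name is hypothetical. The sketch assumes that tap 512 is paired with tap 384 (that is, the upper half is slid down by the overlap) and that taps 640-1023 are carried to taps 512-895 of the retimed array; the description above fixes only the overlapped range and the resulting width.

```python
def retime(taps, out_width):
    """Overlap-add the taps around the array center to shrink it to out_width."""
    n = len(taps)
    overlap = n - out_width
    half = n // 2
    if n % 2 or not 0 <= overlap <= half:
        raise ValueError("unsupported array width or output width")
    # Taps below the overlapped region pass through unchanged
    # (taps 0..383 when 1024 taps are reduced to 896).
    lower = taps[:half - overlap]
    # Overlapped taps are added pairwise (384..511 plus 512..639 in that case).
    summed = [taps[half - overlap + k] + taps[half + k] for k in range(overlap)]
    # Remaining upper taps shift down by the overlap (640..1023 -> 512..895).
    upper = taps[half + overlap:]
    return lower + summed + upper


# Illustrative input only: tap k holds the value k so the mapping is visible.
taps = [complex(k, 0) for k in range(1024)]
retimed = retime(taps, 896)
```

With this test input, the retimed array has 896 taps, tap 384 holds the sum of original taps 384 and 512, and tap 512 holds original tap 640.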
As may be appreciated, the amount of active heat generated by the FDP 30, for any given sample clock speed, is a function of the average number of gates exhibiting state transitions at any particular instant. Thus, it is possible to control the active heat generated by controlling this number. This can be accomplished by any of the following strategies, which can be used alone, or in any desired combination:
zero-out one or more least significant bits (LSBs) of the multi-bit samples generated by each A/D converter 16;
zero-out one or more LSBs of each tap of the FFT 26; and
reduce the impulse response width of the equalizer 18 (or, equivalently, FFT 26).
A directly analogous approach can be used to zero-out one or more LSBs of each tap of the FFT 26.
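Zeroing-out of least significant bits may be sketched, by way of example only, as a simple bit mask applied to each sample. The function name and the 6-bit sample values are assumptions made for this sketch; in hardware the same effect would ordinarily be obtained by gating the relevant bit lines rather than by arithmetic.

```python
def zero_lsbs(sample, num_lsbs):
    """Clear the num_lsbs least significant bits of an integer sample."""
    if num_lsbs < 0:
        raise ValueError("num_lsbs must be non-negative")
    # Build a mask with the low num_lsbs bits set, then clear those bits.
    return sample & ~((1 << num_lsbs) - 1)


# Hypothetical 6-bit A/D samples with the two lowest bits forced to zero.
samples = [0b101101, 0b010011, 0b111111]
masked = [zero_lsbs(s, 2) for s in samples]
# masked == [0b101100, 0b010000, 0b111100]
```

For complex valued FFT taps, the same mask could be applied separately to the real and imaginary parts of each tap. Because the cleared bits no longer toggle, fewer downstream gates exhibit state transitions, at the cost of reduced precision.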
As will be appreciated, zeroing-out the LSBs of the A/D samples and/or the FFT output array {RAX} values will inherently reduce the precision of the equalizer 18, and so increase the error rate of the symbol estimates 20. However, sufficiently robust Forward Error Correction (FEC) may be able to compensate for the increased raw error rate.
Iterative FEC decoding is known in the art. Normally, each iteration corrects some of the bit errors, and enables additional bit errors to be corrected by the next iteration. In the embodiment of
The second stage 48 encompasses any desired number of additional FEC decode blocks and a selector block 50, which enables a desired number of additional FEC decode iterations to be performed. In the illustrated embodiment, the output of each FEC decode block 44 in the second stage 48 is supplied to both the next FEC decode block in the cascade, and to the selector block 50. A select signal supplied by a controller (not shown) can then be used to control the selector block 50 to route the output of a selected one of the FEC decode blocks to the output of the FEC decoder. The embodiment of
As may be seen in
For example, consider a communications system that uses an iteratively decoded product code. As is well known in the art, each error that is corrected in any given iteration tends to allow other errors to be corrected in later iterations. Such a method can convert a raw error rate of 0.3% to better than 1×10^-12 after ten iterations.
Further consider a FEC decoder IC in which, under worst case conditions, a junction temperature of 100° C. is obtained at a power dissipation of 20 Watts. If the Input/Output (IO) heat of the IC is 5 Watts and the static heat is 5 Watts, then the “power budget” for active heat is 10 Watts. If each FEC decode iteration produces 2 Watts of active heat, then a maximum of five FEC decode iterations can be performed.
However, under optimum conditions (e.g. the decoder is operating at sea level, fans are operating to blow cool air over the IC, the device fabrication achieves typical parameters, and the battery voltage is nominal), the same FEC decode chip may achieve a junction temperature of only 65° C. at 20 Watts. If each iteration (2 Watts of active heat) raises the junction temperature by 4° C., then there is room for 8 more iterations (for a total of 13 iterations) before the thermal limit of 100° C. is reached.
With a temperature sensor on the chip, an on-chip controller can determine the current junction temperature, and adjust the number of iterations (in real-time) so as to maximize the number of iterations performed while at the same time keeping the junction temperature below the maximum permitted value.
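The arithmetic of the foregoing example may be sketched as follows, by way of illustration only. The function names are hypothetical, and the sketch assumes, as the example does, that each iteration contributes a fixed amount of active heat and a fixed rise in junction temperature.

```python
def iterations_from_power(total_w, io_w, static_w, per_iteration_w):
    """FEC iterations that fit in the active-heat power budget."""
    active_budget_w = total_w - io_w - static_w
    return int(active_budget_w // per_iteration_w)


def extra_iterations_from_temperature(junction_c, limit_c, rise_per_iteration_c):
    """Additional iterations that fit in the remaining temperature headroom."""
    headroom_c = limit_c - junction_c
    if headroom_c <= 0:
        return 0
    return int(headroom_c // rise_per_iteration_c)


# Worst case: 20 W total, 5 W IO heat, 5 W static heat, 2 W per iteration.
worst_case = iterations_from_power(20, 5, 5, 2)

# Optimum conditions: 65 C at 20 W, 100 C limit, 4 C rise per iteration.
extra = extra_iterations_from_temperature(65, 100, 4)
total = worst_case + extra
```

The worst case budget of 10 Watts admits five iterations, and the 35° C. of headroom under optimum conditions admits eight more, for the total of 13 iterations noted above.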
The active heat generated by the FEC decoder 52 of
If one of the FEC decode blocks 54 is disabled, for example by means of a disable signal from the controller 60, then the number of active FEC decode blocks 54 is n=7, and the time available for computing FEC iterations is reduced to approximately n-1=6 clock cycles. The total number of FEC iterations that can be completed during this period will be
If a second FEC decode block 54 is disabled, then the period for completing FEC iterations will be further reduced, and the total number of FEC iterations that can be completed during this period will be m=8. Accordingly, as the number of active FEC decode blocks 54 is reduced, the total number of FEC iterations performed by each of the remaining active FEC decode blocks must be reduced proportionately, so as to ensure that the FEC iterations will be completed before the next data block arrives.
With a temperature sensor on the chip, an on-chip controller 60 can determine the current junction temperature, and adjust the number of active FEC decode blocks 54 (and thus the number of FEC iterations performed by each active FEC decode block), in real-time, so as to maximize the number of active FEC decode blocks (and thus the number of FEC iterations performed) while at the same time keeping the junction temperature below the maximum permitted value.
In the foregoing description, junction temperatures are determined by means of an on-chip temperature sensor. However, this is not essential. Other means of estimating junction temperature may be used. For example, a temperature sensor could be located in the environment of the chip, but separate from the chip, such as on a heat sink or in a cooling air inlet or exhaust stream. Either contact (e.g. resistive) or non-contact (e.g. infrared) sensors may be used. Parameters other than temperature can be used to deduce the IC temperature, such as supply voltage, supply current, power, air velocity, air pressure, fan failure, IC circuit parameters, temperature differential, bit or frame error count, iteration count, or application dependent provisioning.
The embodiments of the invention described above are intended to be illustrative only. The scope of the invention is therefore intended to be limited solely by the scope of the appended claims.