The present invention was made under joint research agreements involving École Polytechnique de Montréal, Université du Québec à Montréal and Gestion TechnoCap dated Nov. 23, 2006 and Jan. 1, 2009 and expanded to include among the parties Université du Québec en Outaouais on Nov. 29, 2007 and a Financing and Invention Agreement between Gestion TechnoCap Inc. and Richard Norman dated May 6, 2006.
The present invention relates to integrated circuits, and more particularly to integrated circuit interconnect devices and support circuits and devices for integrated circuit systems.
This section explains the prior work on which the present technical contributions are based.
Prior Art of the Invention: Software and Hardware Strategies for Defect Tolerance in Large Area Integrated Circuit.
The size of an integrated circuit (IC) increases not only its cost, but also the probability that manufacturing defects appear on its surface. When the surface of an IC is that of an entire wafer, the probability of finding at least one defect on the surface approaches certainty. Not all defects cause dramatic global failure of the entire IC. Some defects are benign; some generate various faults such as opens, shorts, stuck-at-one and stuck-at-zero. Some defects can be overcome by defect-tolerant strategies.
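The trend described above can be illustrated with a simple Poisson defect model. This is only a sketch under stated assumptions: the Poisson model, the uniform defect density `D0` and the three example areas are illustrative choices and are not taken from the cited prior art.

```python
import math

def prob_at_least_one_defect(area_cm2: float, defect_density_per_cm2: float) -> float:
    """P(at least one defect) for a Poisson model with uniform defect density."""
    return 1.0 - math.exp(-area_cm2 * defect_density_per_cm2)

# Hypothetical defect density, chosen only for illustration.
D0 = 0.1  # defects per cm^2

# Assumed areas: a small die, one 2 cm x 3 cm reticle field, roughly a full wafer.
for area in (1.0, 6.0, 700.0):
    p = prob_at_least_one_defect(area, D0)
    print(f"area = {area:6.1f} cm^2   P(>=1 defect) = {p:.6f}")
```

Under these assumptions the probability grows monotonically with area and is numerically indistinguishable from 1 at wafer scale, which is why defect tolerance rather than defect avoidance is the relevant design goal.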
It is not possible with current deep-sub-micron lithography to process an entire wafer as a single reticle image. Large area integrated circuit (LAIC) systems are fabricated with reticle image fields that span a maximum typical area of about 2 cm×3 cm. The size of a wafer is more than an order of magnitude greater than the biggest reticle image, so every Wafer Scale Integrated (WSI) design must take this into account and build a functional circuit composed of big “macroscopic” repetitive cells. The larger the reticle image field is, the more defects can appear on each reticle image, so each reticle-sized circuit must tolerate more defects.
Even for mature microfabrication processes, manufacturing defects significantly reduce the yield of large functional ICs. A conventional yield is the fraction of functional ICs produced without defect. Defects appear randomly on a LAIC, and most of the time they cannot be detected by visual inspection, so it is impossible to know where defects are located by means other than electrical testing. It is therefore required to add “test” phases in the workflow of the LAIC under production.
Fast fault diagnosis is needed by the microelectronics industry. Rapid testing cuts diagnosis cost in the production chain and, if done in a defect-aware design flow, can increase productivity. Also, a fast diagnosis algorithm applied in a defect-aware design flow can make the difference between a product that is usable and one that is not from a user standpoint.
While
In general, a Test Controller (TC) assumes that the links and the topology connecting the units under test (UUTs) together are static. The internal programming of a TC relies on the topology of the UUTs and the relative position of every UUT being known in advance and not changing.
A very good hybrid solution that offers the best of star architecture and the best of daisy-chain architecture is the multi-dropout architecture, as depicted in
If the multi-dropout bus contains a single fault, the whole PCB or IC is dysfunctional. However, if only a one-to-one link between ICs or between the IC submodule and the bus is broken, only that submodule and the components it contains are isolated from the test controller. To improve the robustness of the system, other busses can be added, so even if one bus is defective, the system is still testable in part. The multi-dropout architecture improves defect tolerance and test time compared to the daisy chained PCB test architecture.
Designing a system architecture that can survive multiple defects can be very profitable for complex and large systems. The loss of a large and dense system due to failure of the test hardware can be very expensive and can impact its profitability. For the same reason, large and complex systems are preferably designed to have sufficient defect tolerance.
Various methodologies are known to produce fault tolerant LAICs. By default, LAICs produced with advanced semiconductor manufacturing are complex systems in which multiple defects are expected with a high probability. For example, a laser restructuring process can give a second life to an LAIC if the fault is properly diagnosed. If the defect is located, laser fuses or anti-fuses previously installed in the LAIC can be used to disable defective zones. Consequently, enabling fuses or anti-fuses allows proper isolation of defective zones and activation of new zones in the LAIC. This healing process can be applied as long as the capability and wiring resources still exist to perform such operations.
Another transistor level technique for fault-tolerant LAICs is using self-healing sub-circuits able to overcome a finite number of faults. This is complex to implement and it usually demands redesigning logic cells using full-custom circuit layout.
Fault tolerance can be achieved by software-based partial reconfiguration and duplication of vulnerable functionality. This method consists of diagnosing systems to know exactly where the faults are located. Once all faults have been located, a software-based control system can make a partial reconfiguration of the system to exclude faulty zones in the circuit. Assuming each vulnerable functional module is at least duplicated, it is possible to preserve the functionality of the whole system.
Another well known solution is to use a voting scheme to increase the probability of getting a functional scan chain. A group of 3 TAPs (Test Access Ports) controls the same portion of the system. Each TAP is associated with one redundant scan chain for one cell of the system. Each group of TAPs produces a viable communication link to its adjacent cell, because the probability of having two dysfunctional scan chains or two dysfunctional TAPs in the same cell is very low. The use of a voting scheme allows the external controller to view the defect-tolerant scan chain as one scan chain, reducing reprogramming and redesign cost at the controller level. This solution is patented by Savaria and Lu in U.S. Pat. No. 6,928,606 entitled “Fault tolerant scan chain for a parallel processing system” [2] and published in [3].
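The voting idea can be sketched in software as a bitwise two-out-of-three majority over the outputs of three redundant scan chains. This is a minimal illustration only: the function names are hypothetical and the bit-by-bit majority is an assumption about how such a vote could be modeled, not the circuit disclosed in [2].

```python
def majority3(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority of three single-bit values."""
    return (a & b) | (a & c) | (b & c)

def vote_scan_out(chain_a, chain_b, chain_c):
    """Combine the serial outputs of three redundant scan chains bit by bit."""
    return [majority3(a, b, c) for a, b, c in zip(chain_a, chain_b, chain_c)]

# One of the three chains is stuck at 0; the voted output still matches the good data.
good = [1, 0, 1, 1, 0]
stuck_at_zero = [0, 0, 0, 0, 0]
print(vote_scan_out(good, stuck_at_zero, good))
```

The vote masks any single faulty chain, so the external controller sees what looks like one fault-free scan chain. Two faulty chains in the same cell would defeat the vote, which is the low-probability case mentioned above.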
Test of interconnect traces between every IC soldered on a PCB can be done through so-called boundary scans. This has been standardized [53] and is a well known method to access and control from the outside all the input and output pins of every IC soldered on a PCB. A boundary scan (BS) chain can be set up using different methods. Well known methods are to connect all BS cells in a daisy-chain scheme or to have a set of test busses to get parallel access to the ICs' JTAG ports (multi-dropout boundary scan control architecture). If every pin of every IC is associated with one unique BS cell and if the order of appearance of every BS cell is known in the scan chain, it is possible to achieve very efficient diagnosis by applying special test schemes.
Another well known method for PCB diagnosis is a “walking one” sequence that can be easily generated with special built-in hardware. The output from a counter can be connected to a NOR (OR) gate of the same width as the counter output. The resulting generated sequential vectors are 1000 . . . (0111 . . . ). This walking sequence allows detection of shorts and stuck-at (SA) faults. Diagnosis is possible provided that the location of every PCB input and output terminal is known in the scan chain. The walking one sequence is used as the input sequence in the scan chain applied to every output terminal. A test compactor can be used to compress the data coming out of the PCB to improve test speed. Counting the number of “1”s that come out of the PCB is an efficient compression method.
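A software model can make the walking one test and the ones-count compaction concrete. The sketch below is illustrative only: the fault model (shorted nets behave as a wired-OR, stuck nets hold a fixed value) and all function names are assumptions introduced here, not part of the cited methods.

```python
def walking_one(n):
    """Yield n vectors of width n with a single 1 moving from left to right."""
    for i in range(n):
        yield [1 if j == i else 0 for j in range(n)]

def apply_faults(vector, shorts=(), stuck_at=None):
    """Hypothetical fault model: shorted nets act as a wired-OR, stuck nets hold a value."""
    out = list(vector)
    for a, b in shorts:
        out[a] = out[b] = vector[a] | vector[b]
    for net, value in (stuck_at or {}).items():
        out[net] = value
    return out

def ones_count(vector):
    """Test compactor: reduce a response vector to its number of 1s."""
    return sum(vector)

def suspicious_steps(n, shorts=(), stuck_at=None):
    """Return the steps whose compacted response differs from the fault-free count of 1."""
    return [step for step, vec in enumerate(walking_one(n))
            if ones_count(apply_faults(vec, shorts, stuck_at)) != 1]

print(suspicious_steps(8))                    # fault-free board
print(suspicious_steps(8, shorts=[(2, 5)]))   # nets 2 and 5 shorted together
print(suspicious_steps(8, stuck_at={3: 0}))   # net 3 stuck at zero
```

In this model a short shows up at exactly the two steps that drive the shorted nets, and a stuck-at-zero net shows up at the single step that drives it, so the failing step numbers localize the fault when the terminal order in the scan chain is known. The price is n test vectors for n terminals.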
Another example of a method used for short and stuck-at fault detection and localization is the checkerboard method. Using boundary scan, and knowing the location of every boundary scan cell in the PCB and its respective order in the scan chain spanning the PCB, it is possible to diagnose shorts and stuck-at faults (SAFs) efficiently in log2 n time, where n is the number of output terminals. The concept is to apply a special vector scheme on the PCB. Rather than a walking one vector scheme, checkerboard applies a sequence of periodic “cyclic” patterns of decreasing period to every output terminal of the PCB. For example, with a PCB containing 8 output terminals, the test vector sequence is “11110000, 11001100, 10101010”. With 16 output terminals the test vector sequence is “1111111100000000, 1111000011110000, . . . ” and so on.
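The vector sequence quoted above can be generated mechanically by halving the pattern period at each step. The sketch below assumes the number of output terminals is a power of two; the function name is hypothetical.

```python
def checkerboard_vectors(n):
    """Return the log2(n) checkerboard vectors for n terminals (n a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    vectors = []
    half_period = n // 2
    while half_period >= 1:
        # Blocks of `half_period` ones alternate with blocks of zeros.
        vectors.append("".join("1" if (i // half_period) % 2 == 0 else "0"
                               for i in range(n)))
        half_period //= 2
    return vectors

for vector in checkerboard_vectors(8):
    print(vector)

# Reading the vectors column by column gives each terminal its own bit signature.
signatures = list(zip(*checkerboard_vectors(8)))
print(len(set(signatures)) == len(signatures))
```

Each terminal receives a distinct bit signature across the sequence, which is why log2 n vectors are enough to tell any two terminals apart, compared with n vectors for the walking one scheme.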
Using the concepts described above with proper modifications, improvements and adaptations for a regular reconfigurable network on chip (RRNoC) allows the design of an efficient diagnosis methodology. The difference between PCB test and diagnosis and RRNoC test and diagnosis is that not every network on chip input and output is associated with a boundary scan. On the other hand, the network is by definition reconfigurable, so methods can take advantage of the network reconfigurability to improve its observability and controllability.
FPGA test and diagnosis methods have been proposed for finding faults (1) in configurable logic blocks (CLBs) and (2) in interconnect resources. Only fault diagnosis in interconnects relates to the present invention. A subdivision exists in each class of methods: fault diagnosis using the programmable fabric of the FPGA and fault diagnosis using DFT (design for testability).
There exist numerous papers (such as [4-7]) and U.S. Pat. No. 7,302,625 entitled “Built-in self test (BIST) technology for testing field programmable gate arrays (FPGAs) using partial reconfiguration” [8] covering the approach of fault diagnosis in interconnect resource using the programmability of FPGA. The key idea underlying this subclass of diagnosis methodology is to use the existing FPGA's configuration hardware infrastructure to auto-validate test of interconnects using a built-in self test (BIST) built from CLBs. The goal is to create as few as possible temporary and global data paths that travel along the FPGA to test interconnects. The number of configuration cycles can be optimized to make the diagnosis more efficient.
U.S. Pat. No. 6,966,020 entitled “Identifying faulty programmable interconnect resources of field programmable gate arrays” [9] discloses background information relating to “on-the-fly” diagnosis of FPGA interconnects. Test patterns are generated and this data travels via two or three identical groups of selected interconnect resources (called wires under test (WUTs)). If a difference is observed between two or three identical groups of WUTs, it proves that a fault exists somewhere in these WUTs. Then, the FPGA's interconnects can be reprogrammed and re-tested to narrow down the possible location of the faults. Based on this principle, searching can locate faults in the network as precisely as possible. Three identical groups are also used to allow multiple fault detection.
Design for testability principles have also been exploited in a second class of solution for diagnosing faults in interconnects. In this category, the main approach is to use power supply current (known as Iddq) monitoring as a means for locating faults. As the test vector sequence is applied to the IC under test, if the Iddq increases, then a bridging fault can be detected. Fault diagnosis using a search algorithm can be accurate enough to locate faults in interconnects, and bridging fault coverage of 100% is reachable with this technique. This method is very efficient for detecting faults in regular structures.
The present inventors have recently developed and published in [10] basic foundations of a test methodology related to the present disclosure.
As shown in
The preferred embodiment of the configurable crossbar from the prior art is shown in
The prior art includes a crossbar of 7 incoming interconnects and 7 outgoing interconnects in each of four directions. Two signals can be redirected to at most two IC pin signals, therefore, m=2 and n=7 according to the convention of
In the prior art the basic walking-one diagnosis methodology is applied in 3 phases as explained in [10]. In each phase, a specific test type is applied. These are Test Types A, B and C, corresponding respectively to phases A, B and C. Test Type A is depicted in the
Test type A: this test takes advantage of the available local control and observation registers to test concurrently each crossbar of the network.
Test type B (covered in the flowchart in
Test type C is depicted in
Test phase C begins with the test results from test phases A and B to generate a list of suspect cells 1401. In each suspect cell, a subset of suspect interconnects exists. Therefore, the next step 1402 is to create, for each suspect cell in the list, a list named “ttp” of suspect interconnects. For each element of the list “ttp”, a unique network reconfiguration must be completed 1403. Each network reconfiguration is associated with a path, i.e. a set of activated interconnects between two distant points in the network 1404. At the end of the path, the broadcast is created on the crossbar as explained earlier. Both broadcast “1” and broadcast “0” are applied on the crossbar. The result of the test can be shifted out 1405 to complete the defect map of the circuit 1407. It is important to notice that all suspect cells can be tested concurrently because of the local nature of this test. However, if there are multiple suspect interconnects on the same cell, they must be tested sequentially.
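The control flow of test phase C can be summarized in a short software sketch. Everything below is hypothetical scaffolding: `configure_path` and `broadcast_and_read` stand in for the hardware reconfiguration and shift-out operations, and the dictionary layout of the suspect list is an assumption made for illustration.

```python
def phase_c(suspect_cells, configure_path, broadcast_and_read):
    """Sketch of phase C.

    suspect_cells: dict mapping a cell to its "ttp" list of suspect interconnects.
    configure_path(cell, link): hypothetical hook that reconfigures the network.
    broadcast_and_read(cell, link, bit): hypothetical hook returning the observed bit.
    """
    defect_map = {}
    for cell, ttp in suspect_cells.items():   # cells could be handled concurrently in hardware
        for link in ttp:                      # interconnects of one cell are tested sequentially
            configure_path(cell, link)        # one reconfiguration per suspect interconnect
            passed = all(broadcast_and_read(cell, link, bit) == bit for bit in (1, 0))
            defect_map[(cell, link)] = not passed
    return defect_map

# Stand-in "hardware": one interconnect is stuck at 0, all others behave correctly.
STUCK_AT_ZERO = {("cell_1", "north_3")}

def fake_configure(cell, link):
    pass

def fake_broadcast(cell, link, bit):
    return 0 if (cell, link) in STUCK_AT_ZERO else bit

suspects = {"cell_1": ["north_3", "east_0"], "cell_7": ["south_2"]}
print(phase_c(suspects, fake_configure, fake_broadcast))
```

The nested loops show why the cost of this phase is driven by the largest number of suspect interconnects within a single cell rather than by the number of suspect cells.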
It is known from previous work that the basic walking one approach is too slow to test and diagnose large networks. Improvements and new methods are needed to reach acceptable diagnosis efficiency.
A common use of logic diagnosis is to support fault tolerance of reconfigurable circuits. Knowing the precise location of faults in any homogeneous and highly regular structure with reconfigurable capabilities permits the system to adapt to those faults. Fault tolerance (or defect tolerance) becomes an unavoidable topic as the scale of ICs is decreasing toward the physical limits of the photolithographic process. Furthermore, the increasing interest in wafer scale packaging and wafer scale integration systems makes defect tolerance a very important design issue to improve production yields.
Prior Art of the Invention: Configurable Interposer for Three Dimensional Large Area Integrated Circuits.
Three-dimensional (3D) chip integration is a means to create miniature, low-power and high-performance electronic systems. Significant improvements in performance of future electronic systems could be obtained from 3D chip stacks of two or more dies enabling dense, high-bandwidth and low-delay Z-axis interfaces between chips included in the 3D system.
3D stacked ICs are a very hot research topic [11-13]. There are already several 3D stacked ICs in production and the market is increasing significantly. A research and development roadmap has been proposed by the 3D stacked IC industry [14].
The main function of an interposer is to make mechanical and electrical connections between two layers. Interposers are used extensively in the microelectronic industry for three dimensional connections of integrated circuits (3D IC), such as in system in package (SiP), multi-dies stacks or multi-stack packages.
Designers working on 3D chip architectures face the major problem of increased power density. Power generates heat that must be channeled outside of the 3D structures. High temperatures create problems such as frequency throttling, increased noise, decreased chip life expectancy and degraded chip reliability. The disclosed configurable interposer with dynamic thermal management can alleviate thermal management issues.
Another problem created by heat appears in LAICs. Thermal gradients generate thermal stress in the silicon substrate. If the gradients are too large, it could result in breaking the silicon substrate and permanently damaging the system.
In 3D stacked ICs, multiple active die layers are stacked vertically and are interconnected together. Stacked layers are very densely interconnected making observation of 3D interconnects very difficult. Efficient and standardized tests of 3D stacked ICs are difficult to achieve. Furthermore, for the same reason, it is harder to diagnose faults in 3D stacked IC for devices being prototyped and devices under validation.
Several interposers used in 3D stacked ICs have already been patented. For example, U.S. Pat. No. 7,649,368, entitled “Wafer level interposer” [15] discloses an interposer that is designed to ease chip testability. This interposer is static and no configurable device is integrated. Other special static interposers without active components are presented in U.S. Pat. Application Publication No. 2008/0265,391 entitled “Etched Interposer for Integrated Circuit Devices” [16].
Some aspects of programmable interposers that map a packaged or unpackaged component's contacts to a different pattern have been disclosed in U.S. Pat. Application No. 2008/0143,379, entitled “Reprogrammable Circuit Board with Alignment-Insensitive Support for Multiple Component Contact Types” [1], and were based on the WaferIC™. This WaferIC™ is achieved by adding through-wafer vias for signal contacts as well as for power contacts. These programmable interposers can then map a component's contacts to a different pattern. This can be used, for example, to avoid redesigning a PCB when the contact pattern of a layer changes with a new generation of that layer, or when substituting a layer with a different contact pattern when assembling a PCB. Such an interposer is also used to adapt a component to a programmable PCB that does not support the contact type or spacing of that component. Using the alignment-insensitive contacts and programmable connectivity of the programmable interposer eliminates the need to have a custom interposer design for each component whose contacts are to be re-mapped. The configurable interposer is in fact an active substrate that can transmit data between any IC pins connected on this surface. The IC can be any CPU, microcontroller, FPGA or any IC whose pinout is compatible with the configurable interposer.
Several test methods exist that essentially control and observe many internal points and state bits through a limited number of access points, using some suitable protocol generally supported by a controller or wrapper of some sort. Some are based on conventional scan, often implemented using the IEEE 1149.1 standard [53] that proposes a Test Access Port and Boundary-Scan Architecture. Other standards extend the capability of IEEE 1149.1, such as IEEE 1149.6 [54] that includes AC-coupled and/or differential nets, IEEE 1149.7 [55] that reduces the number of pins and enhances the functionality, or the p1500 standard [56] that particularly supports a wide range of previously known test standards using a bus interface. This facilitates design, test and verification and provides a useful means of partitioning a system across large design teams.
Configurable Networks on Chip (NoCs) are extensively used in the SoC and FPGA industry to improve communication bandwidth and latency between various functional parts of the system. Configurable interposers offer a configurable network on chip that spans the entire active surface of the interposer. This feature does not exist on any previously reported interposer.
Hardware assertion checking is becoming an important method to debug complex electronic systems in the semiconductor industry [17]. Hardware assertion checking is an efficient means to detect errors in complex digital systems where complex communication protocols are used. Circuits for assertion checking are synthesized in FPGA or in SoC logic and are embedded in devices under verification, and observe key signals to compare the actual circuit behavior with previously defined logical and temporal behavior of the design modeled in a high level language. In case of a fault in the hardware or a bug in the software, an assertion checker embedded in the device under verification can precisely identify the source of the problem in space (localized fault) and time (when the fault occurs according to what condition). Techniques already exist to create an efficient implementation (hardware synthesis) of assertions expressed in a high level language. But no existing system can program assertions in dedicated hardware inserted in a programmable interposer.
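As a purely illustrative software model of what such a checker observes, the sketch below monitors a per-cycle trace of two signals and reports the cycle at which a temporal property fails. The property itself (a request must be acknowledged within a bounded number of cycles), the signal names and the function name are all hypothetical and are not taken from [17].

```python
def check_req_ack(trace, max_latency):
    """Return the cycles at which a request was not acknowledged in time.

    trace: list of (req, ack) bit pairs, one pair per clock cycle.
    max_latency: latest allowed distance, in cycles, between req and ack.
    """
    failures = []
    pending = None  # cycle at which the outstanding request was observed
    for cycle, (req, ack) in enumerate(trace):
        if pending is not None and ack:
            pending = None                      # request satisfied
        if pending is not None and cycle - pending >= max_latency:
            failures.append(cycle)              # deadline reached without ack
            pending = None
        if req and not ack and pending is None:
            pending = cycle                     # start tracking a new request
    return failures

# First request (cycle 0) is acknowledged at cycle 2; second request (cycle 3) never is.
trace = [(1, 0), (0, 0), (0, 1), (1, 0), (0, 0), (0, 0), (0, 0)]
print(check_req_ack(trace, max_latency=2))
```

The returned cycle numbers correspond to the "when" of the failure, while the pair of monitored signals identifies the "where", mirroring the localization in time and space described above.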
A scan chain spanning a whole 3D stack of chips is used to observe and force signals in a circuit for logical test of that circuit. As in the PCB industry, Design for Testability (DFT) is therefore used to test shorts and stuck-at faults between metal traces. At-speed observability and controllability of 3D stacked chips is hard to achieve because interconnects could be buried in the core of the 3D stacked chips. Therefore, the increased miniaturization of the 3D stacked chips makes at-speed DFT harder to achieve. No previous system offers the possibility to observe all the digital pins of all chips in the system.
Built-in self test (BIST) is a class of techniques through which a system can test itself using embedded electronic modules that generate test vectors and interpret the results locally in a circuit. BIST is extensively used in industry, but no existing interposer offers the possibility to program a BIST for rapid prototyping of DFT in 3D stacked chips. To diagnose problems encountered in some system under test, it is desirable to implement a BIST specialized for diagnosis; however, no existing interposer can configure an embedded BIST circuit dedicated to diagnosis of 3D stacked chips.
Prior Art of the Invention: Distributed Hardware and Software Strategy for Rapid Prototyping of Reliable and Energy-Efficient Three Dimensional Large Area Integrated Circuit System
Higher performance electronic systems are required by many applications. On the other hand, energy efficiency is becoming a strategic issue in electronics. For example, the market for portable devices is increasing every year and new products are designed demanding a very high level of performance from handheld devices. To maximize battery life, it is necessary to create energy efficient electronic systems. Furthermore, one of the most important challenges is to invest resources in research to develop new technologies that can ease the evolution towards a more sustainable society. Reducing the energy use of electronic systems can contribute to this goal.
Electronic systems can be viewed as a set of heterogeneous interacting components. Some components are analog (e.g. a radio frequency filter circuit), some are purely digital (e.g. a CPU) and some contain electro-optical elements such as display. For example, a smart phone contains a central CPU connected to a cell phone, which is interacting with the user through a touch screen. Each component can be activated according to logical rules and according to the context. They can be activated in parallel or serially. They can be activated while a portion of the system is in a sleep state depending on the power budget.
To be competitive, an electronic system must be able to achieve peak performance on demand [18]. The peak may not last for a long time. Therefore, components of the system can be forced into a sleep or idle state to minimize power consumption most of the time. The ability to dynamically shut down and/or adjust the level of performance of each module is a way to reduce system energy consumption. Designing a system that is able to reconfigure its own state according to pre-defined rules to maximize energy efficiency is called dynamic power management (DPM). This methodology is used in portable devices, but is increasingly used in stationary systems to create non-negligible energy savings in buildings, data centers, etc.
Extensive research has been done on Dynamic Power Management (DPM) to create more energy-efficient systems [18, 19]. The existing types of DPM are related to predictive capabilities of a PM (power manager) able to observe the components under its control. Most of the time, DPM policies are implemented in an operating system (OS). This class of methods is called OS Power Management (OSPM) [18]. The control and the intelligence needed to analyze data coming from the hardware in relation to DPM is done by the OS [20].
The power state machine (PSM) can be used as a model to represent the behavior of power managed components (PMC). Each state transition is associated with a power and delay cost.
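A PSM can be modeled in a few lines of code. The sketch below is a toy model under stated assumptions: the state names, the power and delay figures and the class names are hypothetical, and the cost of a transition is taken to be its power multiplied by its delay.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    power_w: float   # average power drawn while switching states
    delay_s: float   # time needed to complete the switch

class PowerStateMachine:
    """Toy PSM that accumulates energy and elapsed time."""

    def __init__(self, state_power_w, transitions, initial):
        self.state_power_w = state_power_w   # state name -> steady-state power (W)
        self.transitions = transitions       # (source, target) -> Transition
        self.state = initial
        self.energy_j = 0.0
        self.time_s = 0.0

    def dwell(self, seconds):
        """Stay in the current state for the given duration."""
        self.energy_j += self.state_power_w[self.state] * seconds
        self.time_s += seconds

    def switch(self, target):
        """Change state, paying the power and delay cost of the transition."""
        t = self.transitions[(self.state, target)]  # KeyError if not allowed
        self.energy_j += t.power_w * t.delay_s
        self.time_s += t.delay_s
        self.state = target

# Hypothetical component with two states; all numbers are illustrative.
psm = PowerStateMachine(
    state_power_w={"active": 1.0, "sleep": 0.01},
    transitions={("active", "sleep"): Transition(0.5, 0.002),
                 ("sleep", "active"): Transition(1.5, 0.010)},
    initial="active",
)
psm.dwell(1.0)
psm.switch("sleep")
psm.dwell(10.0)
print(psm.state, psm.energy_j, psm.time_s)
```

Because every transition costs both energy and time, entering a low-power state only pays off when the component stays there long enough, which is what a power management policy has to predict.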
Some conditions must apply in order to be able to save energy with the DPM design methodology [18]. The first condition is to have components that consume variable power during system operation. The second condition is to be able to predict the future workload of the most power hungry components of the system. The third condition is to be able to achieve such prediction with negligible power consumption. These conditions can be satisfied by observing signals that trigger shut-down or power-up events. Furthermore, it is required to use a Power Manager (PM) implementing the control of shut down and power-up of components. Such components are called power managed components (PMC). The set of all control commands for power managed components is called a policy.
A recent initiative, known as the advanced configuration and power interface (ACPI) standard, has been proposed by the industry [21]. This standard targets personal computer power and defines the interface between the motherboard and the control system, which is implemented in software. However, the standard does not provide specific DPM methods to improve energy efficiency.
Adaptive techniques for power management exist in the academic literature [22]. Adaptive techniques consist of learning from statistics gathered on the past workload. When the statistical behavior of the workload changes over time, the accuracy of the wake-up and shut-down predictions is directly compromised. In order to avoid predictive degradation, the DPM policy depends on a learning algorithm based on past events. Some existing learning algorithms were implemented in software [23]. No existing method can capture data coming from the software and from any digital pin of the system to learn from the past workload, because observability of every pin of every system component has never been achieved before.
The existing DPM policies are very basic due to the complexity of the problem. The presented DPM are mainly used on personal computers and to apply a DPM methodology on other important electronic designs such as smart phones, telecom electronic systems, digital video or FPGA based systems [19].
Dynamic thermal management (DTM) is already a very well known research subject [18, 19, 24, 25]. This method can dynamically respond to temperature when it is larger than a certain threshold in 2D ICs or 3D stacked ICs by reducing the power of the processor or of other power-manageable components. DTM proactively reacts to predicted thermal crises by using scheduling algorithms, but inevitably with performance degradation.
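The threshold behavior can be sketched as a single control step. This is a deliberately simplified model: the threshold value, the discrete power levels and the rule that the level is raised again once the temperature is back under the threshold are assumptions made for illustration, not a policy taken from the cited works.

```python
def dtm_step(temperature_c, power_level, threshold_c=85.0, min_level=0, max_level=3):
    """Return the next power level of a component for one control period."""
    if temperature_c > threshold_c:
        return max(min_level, power_level - 1)   # throttle: trade performance for heat
    return min(max_level, power_level + 1)       # assumed recovery rule

# Hypothetical temperature readings over six control periods.
level = 3
for temperature in (70.0, 88.0, 91.0, 86.0, 80.0, 75.0):
    level = dtm_step(temperature, level)
    print(temperature, level)
```

Every period spent below the maximum level is lost performance, which is the degradation noted above.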
Boulé et al. [17] have proposed the synthesis of hardware assertions integrated in ASIC or FPGA designs. This specialization is relatively new and a lot of research must be done in order to achieve a high level of maturity.
Prior Art of the Invention: Differential Electrical Signal Propagation in Integrated Circuit Networks with Configurable Pair Location
The use of differential signaling is prevalent in high speed I/Os. Existing solutions include LVDS (low-voltage differential signaling), LVPECL (low-voltage positive emitter-coupled logic), CML (current mode logic), HSTL (High-speed transceiver logic) and many others [26].
A solution to propagate a differential signal on a LAIC has already been proposed [24]; however, such an approach does not offer spatial reconfiguration as needed by the system.
Several electrical and physical constraints to support differential signaling must be met. Differential buffers transmit two different signals that are compared at the receiver end. The configurable interface must support a pair of balanced input signals and a pair of balanced output signals to transmit differential data. The differential signal quality is strongly dependent on the symmetry between the complementary signals. Dissymmetry induces jitter between the two differential signals and can lead to loss of the transmitted information. Very stringent jitter constraints exist for most high-speed interfaces. For example, in the PCIe transmission protocol, 30 percent of the bit length is the maximum allowed jitter [27, 28], which represents a jitter of 120 ps for a data rate of 2.5 Gbps. This very short propagation time difference can be caused by slight length or load dissymmetry between paired signal paths.
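The jitter figure quoted above follows directly from the bit period. The short calculation below reproduces it; the function name is hypothetical and the 30 percent fraction is the value stated in the text.

```python
def max_jitter_ps(data_rate_gbps, fraction=0.30):
    """Maximum allowed jitter, in picoseconds, as a fraction of one bit period."""
    bit_period_ps = 1000.0 / data_rate_gbps   # 1 Gbps corresponds to 1000 ps per bit
    return fraction * bit_period_ps

# 2.5 Gbps gives a 400 ps bit period, so 30 percent of it is 120 ps.
print(max_jitter_ps(2.5))
```

The same function shows how quickly the budget shrinks at higher data rates, since the allowed jitter scales inversely with the data rate.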
Proper signal integrity is required to propagate high frequency differential signals on long PCB traces, to avoid wave reflections, attenuation and parasitic coupling [27]. This is typically achieved with impedance matching at every level of the transmission chain. In a configurable integrated interconnection system, there is no PCB trace and the input and output driver impedances of the configurable differential interface need to match the uIC (user IC) input/output differential pin impedances in order to meet their input and output specifications. The input/output impedances in differential signaling are typically set to 50 Ω [27].
Prior Art of the Invention: Apparatus and Methods to Sustain Thermo-Mechanical Stability in Large Area Integrated Circuit Systems
Prior Art of the Invention: Smart Thermo-Mechanical Prediction Unit and Monitoring Methods to Reliably Sustain Transient Thermo-Mechanical Stress Peaks in LAIC (Large Area Integrated Circuit) Systems
Wafer-scale integrated circuits provide the advantage that interconnections between different sub-circuits on the wafer are made during manufacture of the wafer. The number of handling steps and the manufacturing time are then reduced.
Furthermore, wafer-scale integration allows faster switching speeds since the interconnection lengths on a wafer between the subcircuits are shorter than interconnections and bonding wires in classical printed circuit board technologies.
Wafer-scale integration is a way to implement so-called “More than Moore” scaling, since a variety of functions can be implemented on the same wafer, which is much larger than a conventional IC, using standard lithographic technologies.
Wafer-scale integration offers the possibility of getting a large and unique active surface useful for many different applications such as high resolution displays, high resolution sensor arrays or high resolution configurable network arrays.
The rapid development of semiconductor technology [41] has enabled integration of entire electronic systems on a single chip. Today's systems on chip (SOCs) can be designed to incorporate mixed-technology, including high-performance/low-power logic, analog, embedded SRAM/DRAM, radio frequency (RF) modules, micro-electromechanical systems (MEMS), and optical electronic systems [42].
From a mechanical perspective, ICs can be thought of as composite structures (multilevel) fabricated from highly dissimilar materials. These structures are commonplace in the electronic industry. Because these structures are made of materials that have different properties, specifically different coefficients of thermal expansion (CTEs), thermal stresses, distortion and warping are a source of concern. Additional thermally induced stresses can be produced from heat dissipated by local high power density during normal operation.
A main reliability challenge is to ensure transient thermo-mechanical stability in LAIC systems due to the multiple embedded heat sources and the presence of heterogeneous materials assembled in a multi-layer structure. Typically, different materials will tend to have mismatches in Thermal Coefficients of Expansion (TCEs).
Heat expansion and contraction due to circuits operating can result in buckling and cracking of a LAIC system, particularly a full-wafer LAIC if attached to a rigid substrate. Performing experiments to measure or predict the stress and temperature generated in the multilevel devices using a finite element analysis tool is costly, time consuming and device dependent.
Transient thermo-mechanical stress issues are critical for the large IC industry. Thermal expansion and contraction due to the circuits performing normal operations can result in localized peak stress and cracking of the device, particularly in LAIC systems if they are supported or fixed to a rigid substrate or if such systems are insufficiently cooled.
Several mechanisms are used to measure temperature and stress in integrated circuits. U.S. Pat. No. 6,453,218 entitled “Integrated RAM Thermal Sensor” [29] discloses a method and apparatus for an integrated thermal sensor to regulate the temperature of RAM devices. This uses traditional techniques such as a diode to sense temperature variations to create an analog signal which will be converted into a digital signal prior to being sent to an external host computer for data processing.
Embedded test structure methods in U.S. Pat. No. 5,625,288 entitled “On-chip high frequency reliability and failure test structures” [43] use self-stressing test structures for realistic high frequency reliability characterizations. An on-chip high frequency oscillator, controlled by DC signals from off-chip, provides a range of high frequency pulses to test structures. The test structures provide information with regard to a variety of reliability failure mechanisms, including hot-carriers, electromigration, and oxide breakdown. The system is normally integrated at the wafer level to predict the failure mechanisms of the production integrated circuits on the same wafer.
U.S. Pat. No. 5,639,163, an “On-chip Temperature Sensing System” [30], makes use of a differential pair of diodes to sense the temperature, and of two external resistors that generate a constant current injected into each diode.
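By way of illustration only, the relation commonly exploited by differential diode sensors can be sketched as follows. The sketch assumes the ideal diode equation, under which the voltage difference between two diodes operated at different current densities is proportional to absolute temperature. The function names, the ideality factor parameter and the current density ratio of 8 are assumptions chosen for the example and are not taken from the cited patent.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C

def delta_v(temp_kelvin, current_density_ratio, ideality=1.0):
    """Voltage difference between two ideal diodes (assumed model)."""
    thermal_voltage = K_BOLTZMANN * temp_kelvin / Q_ELECTRON
    return ideality * thermal_voltage * math.log(current_density_ratio)

def temperature_from_delta_v(dv, current_density_ratio, ideality=1.0):
    """Invert the relation above to recover absolute temperature."""
    return dv * Q_ELECTRON / (
        ideality * K_BOLTZMANN * math.log(current_density_ratio)
    )

# Example: hypothetical 8:1 current density ratio at 300 K.
dv_300 = delta_v(300.0, 8.0)
recovered = temperature_from_delta_v(dv_300, 8.0)
```

Under these assumptions the difference voltage at 300 K is a few tens of millivolts, which is why such sensors are usually followed by an amplification or conversion stage.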
Moreover, in U.S. Pat. No. 4,768,170, a “MOS Temperature Sensing circuit” [31] formed on the silicon substrate has been disclosed. This circuit uses two diodes with different sizes, and exploits the canceling effect of the leakage current of a smaller diode with respect to a larger diode whose leakage is due to process variations, thereby creating a temperature dependent circuit.
As with thermal sensors, many ways to sense pressure have been disclosed, including the “Capacitive pressure sensor” of U.S. Pat. No. 4,322,775 [32]. The use of capacitances as transducers has been very well documented in the past. Some of them are used in applications such as the “Silicon Pressure Sensor” defined in U.S. Pat. No. 4,317,126 [33].
Recently, a method for tracking thermal mini-cycle stress was disclosed [44]. With this method, the temperature excursions experienced by an assembly over its life are tracked. A modifier value for a figure of merit (FOM) value is computed and added to a cumulative figure of merit value. When the cumulative figure of merit value exceeds the cumulative stress figure of merit budget, a stress management solution is proposed.
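The accumulation scheme described above can be sketched as follows. This is a minimal illustration and not the method of reference [44]: the quadratic modifier function, the function name and the numeric budget are assumptions introduced only for the example.

```python
def track_mini_cycles(excursions_c, budget, modifier=lambda dt: dt ** 2):
    """Accumulate a figure of merit over a list of temperature excursions.

    excursions_c: temperature swings in degrees C (hypothetical input).
    modifier: maps one excursion to its FOM contribution (assumed quadratic).
    Returns (cumulative_fom, index of the excursion that first exceeded
    the budget, or None if the budget was never exceeded).
    """
    cumulative = 0.0
    exceeded_at = None
    for index, swing in enumerate(excursions_c):
        cumulative += modifier(abs(swing))
        if exceeded_at is None and cumulative > budget:
            exceeded_at = index
    return cumulative, exceeded_at

# Three excursions against a hypothetical budget of 450 FOM units.
total, first_over = track_mini_cycles([10, 20, 5], budget=450)
```

In this example the budget is exceeded on the second excursion, which is the point at which a stress management action would be triggered.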
Due to aggressive technology scaling, VLSI integration density as well as power density increase drastically. For example, the power density of high performance microprocessors has already reached 50 W/cm² at 100 nm technology and it will reach 100 W/cm² at 50 nm technology [45]. This evolution towards higher integration levels is motivated by the needs of advanced high performance, lighter and more compact systems with less power consumption. Meanwhile, to mitigate the overall power consumption, many low power techniques such as dynamic power management [46], clock gating [47], voltage islands [48], dual Vdd/Vth [49] and power gating [50, 51] were recently proposed.
These techniques, though helpful to reduce the overall power consumption, may cause significant on-chip thermal gradients and local hot spots due to different clock/power gating activities and varying voltage scaling. It has been reported in [52] that temperature variations of 30° C. can occur in a high performance microprocessor design. The magnitude of thermal gradients and associated thermo-mechanical stress is expected to increase further as VLSI and SoC designs move into nanometer processes and multi-GHz frequencies.
An important issue with VLSI systems and micro-systems is how to perform their thermal monitoring, to detect overheating, without complicated control circuits. The traditional approach consists of distributing multiple sensors over a chip, and then reading their outputs simultaneously and comparing them to a reference voltage recognized as the overheating level.
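The traditional approach can be summarized by the following sketch. It assumes, for illustration only, that each sensor output voltage rises with temperature, so that a reading above the reference indicates overheating; a sensor with the opposite slope would need the comparison reversed. The function name and the voltages are hypothetical.

```python
def overheated_sensors(sensor_voltages, reference_voltage):
    """Return the indices of sensors whose output exceeds the reference.

    Assumes sensor voltage increases with temperature (an assumption of
    this sketch, not a property of any particular sensor).
    """
    return [
        index
        for index, voltage in enumerate(sensor_voltages)
        if voltage > reference_voltage
    ]

# Four hypothetical sensor readings compared against a 0.70 V reference.
hot = overheated_sensors([0.61, 0.72, 0.58, 0.80], 0.70)
```

The sketch makes the cost of the approach visible: one comparison per sensor is required on every reading, so the monitoring overhead grows with the number of distributed sensors.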
Prior Art of the Invention: Propagation of Analog Signals on a Digital Interconnect Network and Support for Analog Signals
More and more integrated circuits use analog pins to read or provide analog signals. For instance, several state-of-the-art processors that are landmark digital ICs, such as the Intel Pentium 4 and Pentium M [34] as well as the IBM PowerPC [23], use on-chip thermal sensors to monitor their thermal profiles in real time [28]. While some systems such as the POWER5 processors from IBM [35] use digital thermal sensors based on a ring oscillator whose actual frequency increases with temperature, others use analog thermal sensors, which are based on temperature-sensing diodes and whose output is a current whose intensity is temperature-controlled.
Several well known circuit techniques can be used to build Analog-to-Digital converters to convert the signals from analog to digital [36-38], such as direct conversion, successive approximation, ramp-compare, Wilkinson, multi-slope, pipeline, Sigma-Delta conversion [30], and conversion with an intermediate FM stage.
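As an illustration of one of the techniques listed above, a successive-approximation conversion can be sketched as a binary search over the output code. The sketch is a behavioral model only; the function name, the ideal comparator and the 3.3 V reference are assumptions made for the example.

```python
def sar_adc(v_in, v_ref, n_bits):
    """Behavioral model of a successive-approximation conversion.

    Each bit is tried from the most significant to the least significant
    and kept only if the trial code does not exceed the input voltage.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        # Ideal internal DAC level for the trial code.
        if v_in >= trial * v_ref / (1 << n_bits):
            code = trial
    return code

# 1.3 V converted with a hypothetical 8-bit converter and 3.3 V reference.
sample = sar_adc(1.3, 3.3, 8)
```

The conversion needs one comparison per output bit, which is the main reason this technique is popular when area and power matter more than raw speed.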
Well known circuit techniques can also be used to build Digital-to-Analog converters to convert the signals from digital-to-analog [37-39], such as pulse-width modulation, oversampling, interpolating, binary-weighted, R-2R ladder and thermometer-coded.
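Two of the listed digital-to-analog techniques can be compared with the following behavioral sketch. It models ideal, perfectly matched elements; the function names and the reference voltages are assumptions made for the example.

```python
def binary_weighted_dac(code, v_ref, n_bits):
    """Ideal binary-weighted DAC: each bit contributes a power-of-two share."""
    return v_ref * code / (1 << n_bits)

def thermometer_code(code, n_bits):
    """Unary (thermometer) representation: 'code' ones followed by zeros."""
    levels = (1 << n_bits) - 1
    return [1 if i < code else 0 for i in range(levels)]

def thermometer_dac(code, v_ref, n_bits):
    """Ideal thermometer-coded DAC: every active unit element adds one LSB."""
    return v_ref * sum(thermometer_code(code, n_bits)) / (1 << n_bits)

# Mid-scale output of a hypothetical 8-bit converter with a 3.3 V reference.
mid_scale = binary_weighted_dac(128, 3.3, 8)
```

With ideal elements both structures give the same output. They differ in practice because a thermometer-coded converter switches only one unit element per code step, at the cost of many more elements than the binary-weighted form.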
Analog signals are important even in predominantly digital systems. While an interconnect network propagating analog signals could be implemented in parallel with a digital network to transmit these analog signals, the capabilities of analog networks are limited (due to noise, crosstalk, delay, as well as voltage, current, and frequency range). A dedicated parallel analog network would also be costly and very frequently left unused in predominantly digital systems.
One way to perform Analog-to-Digital (A/D) or Digital-to-Analog (D/A) conversion is to use a voltage controlled oscillator (VCO). Some VCOs convert analog signals into a digital stream or a signal whose frequency varies with the magnitude of the analog input signal. A frequency to analog conversion can then be done at the destination. A similar conversion principle can be applied with delta-sigma modulation.
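The VCO-based principle can be sketched with the following behavioral model. It assumes a linear VCO characteristic and a receiver that counts whole cycles in a fixed window; the centre frequency, gain and window length are hypothetical values chosen for the example.

```python
def vco_frequency(v_in, f0_hz, gain_hz_per_volt):
    """Linear VCO model (assumed): frequency rises with the input voltage."""
    return f0_hz + gain_hz_per_volt * v_in

def count_cycles(freq_hz, window_s):
    """Number of whole cycles seen by a counter during the window."""
    return int(freq_hz * window_s)

def voltage_from_count(count, window_s, f0_hz, gain_hz_per_volt):
    """Recover the analog value at the destination from the cycle count."""
    return (count / window_s - f0_hz) / gain_hz_per_volt

# Hypothetical parameters: 1 kHz centre frequency, 2 kHz/V, 0.5 s window.
F0, GAIN, WINDOW = 1000.0, 2000.0, 0.5
count = count_cycles(vco_frequency(0.75, F0, GAIN), WINDOW)
recovered = voltage_from_count(count, WINDOW, F0, GAIN)
```

With these values one count corresponds to 1 mV, so the resolution of the recovered quantity is set by the product of the VCO gain and the counting window.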
Integrated circuit, as depicted by 9401, 9402, 9403 in
The present invention relates to tools and methodologies for interfacing with large area integrated circuits (LAIC), made from photo-repetition of one or more reticle image fields, and large area Micro-Electro-Mechanical Systems (LAMS).
The present invention can also be applied to any WSI (wafer scale integrated) system.
The present invention can also be applied to three dimensional stacked integrated circuit systems.
The present invention also relates to electronic serial communication systems needing robust defect tolerant features to improve production yields.
The present invention also relates to electrical signal propagation supporting configurable differential interconnect stages in LAIC.
The present invention also relates to distribution of power supplies integrated in LAIC structures.
The present invention also relates to massively distributed sensors integrated in LAIC structures and tools and methods to improve the reliability and integrity of the power distribution.
The present invention relates to supporting and supplying large area micro-systems (LAMS) and is particularly well suited for the WaferBoard™ defined in U.S. Pat. Application Publication No. 2008/0143,379, entitled “Reprogrammable Circuit Board with Alignment-Insensitive Support for Multiple Component Contact Types” [1].
The present invention also relates to hardware architecture and algorithms to locate short and stuck-at faults for efficient diagnosis in LAIC.
The present invention also relates to tools and methods for prototyping LAMS.
The present invention also relates to distributed analog-to-digital converters (or a subset of) and digital-to-analog converters (or a subset of) that are linked by a configurable digital interconnect network to propagate analog quantities.
The present invention relates to predicting and monitoring transient thermo-mechanical stress peaks in LAIC (Large Area Integrated Circuit) systems and it also relates to monitoring methods to sustain transient thermo-mechanical stress peaks that can affect system reliability.
It is an object of the present invention to provide a generic architecture of defect tolerant scan chain that is robust to manufacturing defects in LAIC. That scan chain supports test and diagnosis under the control of software or embedded hardware possibly located in an external computer.
It is a further object of the present invention to add recovery capabilities for one or more scan chains included in LAICs.
It is an object of the present invention to provide a configurable scan chain bus that can make bifurcation, loop back and direct access by “jumping above” an arbitrary number of cells in the scan chain by having a random access port to the input or output of any cell.
It is a further object of the present invention to provide a configurable scan chain bus that can be configured by external software or one or more embedded controllers.
It is a further object of the present invention to provide a configurable scan chain that is divided into one or several modules, each having its own TAP controller, input and output data ports.
It is a further object of the present invention to allow modules in the configurable scan chain with defects seen at their input data ports to be replaced by extending the scan chain until it reaches a module with a functional input port. The same strategy can be applied to the output port.
It is a further object of the present invention to provide modules in the configurable scan chain, with defects seen at an output data port that can use another output data port to complete the scan chain.
It is an object of the present invention to provide physical links between input and output ports of adjacent modules, each controlled by an external controller or software, deciding how to activate these links to build a single macroscopic scan chain spanning all modules in a LAIC.
It is an object of the present invention to provide a TAP controller in each module that can potentially control an adjacent module that has a faulty TAP controller.
It is an object of the present invention to provide a power domain supplying one or several modules in the LAIC to its adjacent modules.
It is an object of the present invention to provide independent clock or reset trees for each module in a LAIC to build fault tolerance onto these critical signals.
It is an object of the present invention to provide fault tolerant clock trees in a LAIC so that faults affecting clock tree branches do not compromise the integrity of the whole tree, but stay confined to the defective branch and its associated children, and to avoid a defect at the root of the clock tree causing failure of the whole clock tree.
It is a further object of the present invention to provide independent clock trees (or trees used for signals with large fanout) for each module in a LAIC, and to share the clock root signal to ensure that when a clock root signal is blocked by a fault, a functional clock tree shares its root signal with the faulty tree so that it recovers from its breakdown.
It is a further object of the present invention to provide independent clock trees for each module in a LAIC and to share the successive children from the root of the clock tree to ensure that when a branch of the clock tree is defective, the clock signal from the same branch level of the clock tree can be used instead to drive the children of the defective branch.
It is a further object of the present invention to provide a diagnosis algorithm capable of locating all the faults included in the configurable scan chain network.
It is an object of the present invention to provide several possible paths to link all functional modules in the configurable scan chain of a LAIC, to bypass or go-around a faulty link or faulty TAP that is blocking a path according to a map of defect locations.
It is a further object of the present invention to provide a software mechanism to register one or several of these paths in a database, that behave as a standard scan chain, i.e. that can be used to test and configure the non-defective modules on each path.
It is a further object of the present invention to provide a mechanism to extract paths in the database to know how the modules with their respective TAP controllers are linked together to properly generate the data stream and to properly interpret the data stream that comes out of the path.
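The idea of linking all functional modules while going around defective ones can be sketched as a search on a grid of modules. The sketch below is an illustration under stated assumptions and not the disclosed algorithm: modules are assumed to sit on a rectangular grid, links are assumed to exist only between orthogonal neighbors, the defect map is given as a set of faulty coordinates, and a path is not allowed to return to a module it has already used. The exhaustive backtracking search is practical only for small grids.

```python
def find_scan_path(rows, cols, faulty, start):
    """Find a path visiting every functional module exactly once.

    faulty: set of (row, col) modules to bypass (hypothetical defect map).
    Returns the list of modules in scan order, or None if no path exists.
    """
    good = {(r, c) for r in range(rows) for c in range(cols)} - set(faulty)
    if start not in good:
        return None

    def neighbors(cell):
        r, c = cell
        for nxt in ((r, c + 1), (r + 1, c), (r, c - 1), (r - 1, c)):
            if nxt in good:
                yield nxt

    def extend(path, visited):
        if len(path) == len(good):
            return path  # every functional module is on the chain
        for nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                result = extend(path + [nxt], visited)
                if result:
                    return result
                visited.discard(nxt)  # backtrack and try another link
        return None

    return extend([start], {start})

# 3x3 array with a defective centre module, entered at the corner.
path = find_scan_path(3, 3, faulty={(1, 1)}, start=(0, 0))
```

A path found this way could then be stored, together with the order of the TAP controllers it traverses, so that data streams shifted in and out of the chain can be generated and interpreted correctly.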
It is an object of the present invention to provide for the configurable scan chain different ways to implement inter-TAP connections that can each be configured by control software: the CICU link (configurable inter-cell unidirectional link), the CICB link (configurable inter-cell bidirectional link), and the RA link (random access link).
It is a further object of the present invention to provide the CICU link (configurable inter-cell unidirectional link) for inter-TAP connection that does not allow a path to return back to a used module.
It is a further object of the present invention to provide the CICB (configurable inter-cell bidirectional) link for inter-TAP connection that can make a link to its neighbor modules and can receive a feedback from them.
It is a further object of the present invention to provide the RA link for inter-TAP connection that creates links via a bus, where each bus is connected to more than one TAP controller, enabling parallel access or the serial information to jump directly and randomly from a module to a distant module, requiring a special multi-dropout module.
It is an object of the present invention to provide a mechanism to control the internal resources of a module in the configurable scan chain from the module's TAP controller or from a TAP controller in an adjacent module, to increase the robustness of the fault-tolerant capabilities of a configurable scan chain.
It is an object of the present invention to provide different combinations of defect tolerant strategies (CICU link, CICB link, RA link, external control, clock sharing) that can be adapted to a particular implementation of the configurable scan chain.
It is an object of the present invention to provide diagnosis strategies for regular reconfigurable network (RRN). Most of the diagnosis methodologies are currently done using stimulation on every I/O port of an IC to increase the speed of the diagnosis. It is an object of the present invention to provide a software-based fault diagnosis methodology requiring a smart diagnosis controller to optimize the test sequence according to the collected data from the configurable scan chain.
It is a further object of the present invention to provide three main classes of solutions to effectively diagnose faults in the RRN. The common basis of all three methods is the limited control used to perform tests. Only JTAG ports are used, or multiple scan chains can be used in parallel. The main classes of solutions are: (1) optimized and concurrent diagnosis with a versatile fault tolerant scan chain; (2) concurrent BIST with multiple scan chains; and (3) BIST and ring signal propagation.
It is also an object of the present invention to provide test and diagnosis using versatile reconfigurable scan chains such as with CICU, CICB and RA link architectures, to make the diagnosis more robust to defects as well as faster by the use of cells connected together by daisy scan to configure only a specific crossbar and a very specific register in the crossbar.
It is also an object of the present invention to provide techniques to test short faults. A significant increase in test speed can be reached with this technique.
It is also an object of the present invention to provide methods that use the reprogrammability of the network to create rings of any form, particularly closed loops, and that associate with each ring a test pattern generator (TPG) playing the role of the transmitter and a response analyzer playing the role of the data receiver.
It is an object of the present invention to provide a third class of methods using a concurrent BIST architecture with multiple scan chains only.
It is an object of the present invention to provide a new domain of application for walking-one diagnosis, which can be applied to detect and locate shorts in a matrix of CMPIOs integrated on various kinds of LAIC.
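The walking-one principle can be sketched as follows. The sketch is a software model only: pads are numbered from zero, the hardware is replaced by a simulated read-back function, and a short is assumed to pull every connected pad to the driven level. All names are hypothetical.

```python
def read_back(driven, n_pads, shorts):
    """Simulated hardware: pads shorted to the driven pad also read 1."""
    group = {driven}
    changed = True
    while changed:  # follow chains of shorts until the group is stable
        changed = False
        for a, b in shorts:
            if (a in group) != (b in group):
                group |= {a, b}
                changed = True
    return [1 if pad in group else 0 for pad in range(n_pads)]

def walking_one(n_pads, read):
    """Drive a single 1 on each pad in turn and report pads that follow it."""
    found = set()
    for driven in range(n_pads):
        levels = read(driven)
        for pad, level in enumerate(levels):
            if level and pad != driven:
                found.add((min(driven, pad), max(driven, pad)))
    return sorted(found)

# Six pads with two hypothetical shorts injected into the model.
injected = [(1, 2), (4, 5)]
located = walking_one(6, lambda d: read_back(d, 6, injected))
```

Each step drives exactly one pad, so a pad that reads 1 without being driven directly identifies both ends of the short.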
It is an object of the present invention to provide a new kind of interposer containing advanced design for testability and diagnosability modules to accelerate the test and diagnosis of 3D ICs or 3D LAICs.
A further object is to accomplish these objectives with an interposer that embeds configurable logic cells to create intelligent and dynamic power and thermal management.
Yet a further object is to accomplish this objective with an array of identical cells that spans the entire active surface of an interposer.
Another further object is to accomplish these objectives with multiple cells that use the CMPIO (Configurable Multi-Purpose I/O) technology enabling alignment insensitive interconnection between IC dies deposited on the interposer.
It is an even further object of this invention to combine this interposer with embedded configurable logic cells that spans its entire active surface with CMPIO technology with a fault tolerant JTAG configuration system such as the versatile serial communication system as disclosed above in the configurable interposer.
It is a further object of the present invention to provide configurable interposers that can be used as a means to interconnect heterogeneous LAICs stacked in a 3D structure. A yet further object of the invention is to accomplish this with interposers interconnected through configurable crossbars to create a configurable 3D network of interconnects. A yet further object of the invention is to provide a configurable interposer where configuration is controlled by software. This software supports on-the-fly reconfiguration of the network that enables rapid prototyping of systems embedding 3D stacked chips or 3D stacked LAICs.
It is a further object of the present invention to provide programmable assertion checkers embedded in a LAIC.
It is a further object of the present invention to provide programmable assertion checkers embedded in a configurable interposer.
It is a further object of the present invention to provide observability on the majority of electronic system pins using a special network for System on Chip (SoC) that possesses the ability to redirect a large number of signals to external software able to analyze the captured data.
It is also an object of the present invention to provide programmable logical cells integrated in the LAIC or in the configurable interposer that can emulate the behavior of complex Built-In Self Test (BIST).
It is yet an object of the present invention to provide, for wafer-scale integrated circuits and especially 3D stacks of chips, dynamic thermal stress management to prevent the silicon crystal from breaking because of temperature gradients. Implementation of the dynamic behavior can be done by means of temperature sensors. A further possibility is to have an array of local controllers that generate heat with a special resistive heating circuit to make the temperature gradient smoother along the substrate XY, XZ and YZ planes. Dynamic thermal stress management can be implemented inside the configurable interposer or inside any type of LAIC.
It is yet an object of the present invention to provide a system capable of supporting advanced computer-aided design for rapid prototyping of digital low power electronic systems.
It is further an object of the present invention to provide the computer-aided design tool with the ability to get signal data (observability) from every I/O pin and current consumption on every VDD pin.
It is therefore an object of the present invention to know the current consumption and the voltage level through a dense array of current sensors and voltage sensors that allows the CAD tool to be aware of the real-time power consumption of the system under prototyping developed with a configurable interposer or a WaferIC.
It is yet an object of the present invention to provide a LAIC or a configurable interposer with an array of temperature sensors.
It is an object of the present invention to provide algorithms applicable on the electronic system under prototyping to minimize the applied voltage level of every PMC and digital IC of the system.
Another object of this invention is to provide passive and active mechanical, thermal and electrical solutions to support and allow the correct operation of any fragile and thin Large Area Micro System (LAMS) and wafer-scale integrated circuits.
A further object of this invention is to allow the implementation of a high-density programmable system board that includes a wafer-scale integrated circuit (WaferIC), a fault tolerant interconnect network implemented on the WaferIC, called WaferNet, and a circuit that allows detecting ICs laid over the WaferIC.
It is therefore one object of the present invention to provide a stable mechanical support to Large Area Micro-Systems (LAMS) that includes large area integrated circuits, large area Micro-Electro-Mechanical Systems (MEMS) and Nano-electro-mechanical systems (NEMS) that can compensate thermal and mechanical stresses applied to the large and fragile LAMS substrates, stresses due to difference between different coefficients of thermal expansion (CTE) of material or due to applied external mechanical stresses.
It is a further object of the present invention to provide a mechanical and electrical support to LAMS devices by keeping their active surfaces clean of any mechanical or electrical components.
It is a further object of the present invention to have active and passive thermal devices that efficiently evacuate the heat generated by the normal activity of LAMS devices.
It is also an object of the invention to have a smart network of embedded thermal and pressure sensors distributed on the whole surface of LAMS device to get feedback of the thermal behavior of the supported application and then to enhance its operations by adjusting some parameters such as its power supply voltages, its power consumption, its operating speed, its clock frequencies or other system parameters that can be externally configured and tuned.
It is a further object of the invention to have a network of embedded programmable heaters and coolers distributed on the whole surface of the LAMS device to smooth its surface temperature distribution and then to avoid thermal spots and high thermal gradients which can cause local mechanical breaks, operating dysfunctions or variations.
It is a yet further object of the invention to provide several programmable AC-DC and DC-DC voltage converters and robust ground planes needed to support operation of the electrical features of the supported LAMS devices.
It is another object of the invention to have a network of programmable voltage regulators and passive electrical devices distributed on the whole surface of the LAMS device in order to provide hierarchical power supply that is programmable, stable and provides good signal integrity.
It is also an object of the invention to provide power from one side of the LAMS device through Through Silicon Vias, to free up the other side.
It is an object of the present invention to have a network of embedded electrical (voltage, current, radiations) and physical (temperature, pressure, stress . . . ) sensors distributed on the whole surface of the LAMS to get feedback on physical and electrical spatial distributions of the supported application and then to enhance its operations by adjusting some of its controllable external parameters.
It is an object of the invention to provide all electronic circuitry, electrical connections and structures needed to generate, to probe, to propagate, to amplify or to process any electrical signal generated or needed by the supported LAMS device.
It is also an object of the invention to provide a means to support differential electrical signal propagation in the supported LAMS device.
It is a further object of the present invention to provide a network of programmable circuits and passive devices distributed on the whole surface of the LAMS device needed to generate, to probe, to propagate, to amplify or to process any electrical signal generated or needed by the supported LAMS device.
It is a further object of the invention to provide methods and apparatus to leverage configurable digital interconnect to propagate analog quantities.
It is a yet further object of the invention to provide methods and apparatus to leverage configurable digital interconnect to propagate analog quantities by using distributed analog-to-digital converters and digital-to-analog converters that are linked by a configurable digital interconnect network.
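The end-to-end principle of carrying an analog quantity over a digital interconnect can be sketched as follows. This is a behavioral illustration only: the converters are ideal, the interconnect is modeled as an error-free bit-serial link sending the most significant bit first, and the 10-bit resolution and unit reference are assumptions made for the example.

```python
def adc(v_in, v_ref, n_bits):
    """Ideal quantizer at the source, clamped to the valid code range."""
    code = int(v_in / v_ref * (1 << n_bits))
    return max(0, min(code, (1 << n_bits) - 1))

def serialize(code, n_bits):
    """Turn a code into a bit stream for the digital network (MSB first)."""
    return [(code >> i) & 1 for i in reversed(range(n_bits))]

def deserialize(bits):
    """Rebuild the code from the received bit stream."""
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return code

def dac(code, v_ref, n_bits):
    """Ideal reconstruction at the destination, centred on the code step."""
    return v_ref * (code + 0.5) / (1 << n_bits)

def propagate(v_in, v_ref=1.0, n_bits=10):
    """Source ADC -> digital interconnect -> destination DAC."""
    bits = serialize(adc(v_in, v_ref, n_bits), n_bits)
    return dac(deserialize(bits), v_ref, n_bits)

# A hypothetical 0.1234 V quantity sent across the digital network.
received = propagate(0.1234)
```

Once the quantity is in digital form, noise and crosstalk on the interconnect no longer degrade it as long as the bits are received correctly; the remaining error is the quantization step of the converters.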
It is also one object of the present invention to propagate analog signal through dedicated metal grids (typically used for power supply distribution) coupled with large transmission gates. That option is of interest when such metal grids are not used for their primary intended purpose.
It is therefore one object of the present invention to provide a thermal sensor cell network and dynamical thermal peak prediction to control thermal stress on a Large Area Integrated Circuit system.
It is a further object of the present invention to provide LAIC systems with a configurable temperature sensor cell network embedded into a LAIC and to use data provided by the temperature sensor network in a dynamic thermal management policy.
It is a further object of the present invention to have active and passive thermal devices that efficiently evacuate heat generated by the normal activity of devices in a LAIC system.
It is a yet further object of the present invention to provide a smart thermo mechanical prediction unit and peak stress monitoring to control transient thermal stress in LAIC systems.
It is also an object of the present invention to provide LAIC systems with temperature sensor arrays and smart thermal stress prediction units, embedded into LAICs, and a further object to use data provided by these sensor cell networks in a dynamic management policy to increase the reliability of the LAIC system by adjusting some parameters such as its power supply voltages, its power consumption, its operating speed, its clock frequencies or other system parameters.
It is therefore one object of the present invention to provide a stable mechanical support to the LAIC system.
It is a yet further object of the present invention to provide configurable sensor cells to 3D LAIC systems.
It is a further object of the invention to have a network of embedded configurable sensor cells distributed in a LAIC system or LAIC to predict its peak surface temperature location and thereby avoid thermal spots and high thermal gradients, which can cause local mechanical breaks, device operating dysfunctions or electrical parameter variations.
The present invention also relates to methodologies to make integrated circuit components that include surface contacts for making contact with a plurality of integrated circuit components. These integrated circuit components typically receive and process external data from said surface contacts and drive some other surface contacts with processing results. The integrated circuit component may or may not be a LAMS or LAIC. When the integrated circuit is not a LAMS or LAIC, the wafer from which it is derived is separated into dies embedded into packages protecting them from scratches and from environmental conditions, and providing mechanical strength facilitating manipulation by humans or by system assembly equipment. The pads of the die are normally connected to external pins or balls through embedded conducting paths to make further contact to system integration technologies such as printed circuit boards or multiple chip modules that allow connecting together multiple pins or balls from the same chip or from different chips.
It is an object of the invention to have a network of programmable voltage regulators and passive electrical devices distributed in the integrated circuit component in order to provide hierarchical power supply to internal and external components that is programmable, more stable and that provides better signal integrity.
It is an object of the present invention to have a network of embedded electrical (voltage, current, radiations) and physical (temperature, pressure, stress . . . ) sensors distributed in the integrated circuit component to get feedback on physical and electrical spatial distributions of the supported application and then to enhance its operations by adjusting some of its controllable external parameters.
It is an object of the invention to provide all electronic circuitry, electrical connections and structures needed to generate, to probe, to propagate, to amplify or to process any electrical signal generated or needed by the integrated circuit component.
It is a further object of the present invention to provide a network of programmable circuits and passive devices distributed on the whole surface of the integrated circuit component needed to generate, to probe, to propagate, to amplify or to process any electrical signal generated or needed by the supported integrated circuit component.
It is further an object of the present invention to provide the ability to get signal data (observability) from every I/O pin and current consumption on every VDD pin.
It is a further object of the invention to provide methods and apparatus to leverage configurable digital interconnects to propagate analog quantities in the integrated circuit component.
It is a yet further object of the invention to provide methods and apparatus to leverage configurable digital interconnect to propagate analog quantities by using distributed analog-to-digital converters and digital-to-analog converters that are linked by a configurable digital interconnect network that can propagate analog quantities by digital means.
It is also an object of the present invention to propagate analog signals through dedicated metal grids (typically used for power supply distribution) coupled with large transmission gates. That option is of interest when such metal grids are not used for their primary intended purpose.
The following definitions are provided in an alphabetical order to facilitate searching.
By the expression “Alignment-insensitive” as used herein is meant not rendered inoperable by small changes in placement or angle of something affixed relative to what it is affixed to.
By the expression “Alignment-insensitive contacts” is meant an array of substrate contacts of a size and spacing such that components can be placed in registration anywhere within the array of substrate contacts such that at least one of the substrate contacts will be in contact with each one of the component contacts and none of the substrate contacts will be in contact with more than one of the component contacts. Switch circuitry can be used for selecting substrate contacts in contact with component contacts for providing an interconnecting path for the component contacts to other devices.
The expression BIST as used herein is the acronym for Built-In Self-Test. BIST is often used as the name of an embedded electronic sub-module that permits an IC to test itself. The BIST technique is used to improve test time and reduce test cost, reducing the demand for external automatic test equipment (ATE). Properly designed, BIST can be used for defect diagnosis.
The expression “boundary scan” as used herein is a scan chain inserted in IC input and output pins to create control and/or observation points otherwise difficult to access by other means. Boundary scan cells can collect data from IC pins or force data or signals on IC pins.
The expression “cell” as used herein refers to a hardware module that is instantiated/printed/fabricated on the substrate of an IC.
The expression “CHAC” as used herein is the acronym for Configurable Hardware Assertion Checker.
The expression CMPIO as used herein is an acronym for Configurable Multi-Purpose IO. A CMPIO array forms an array of tiny pads with respective dimensions of the order of 50 μm×50 μm and even smaller for subsequent generation of the same technology. CMPIOs can provide data and power to other devices. CMPIOs can be configured as floating, as digital or analog input/outputs, as power supplies or as ground.
The expression “defect” means a physical alteration on a circuit as compared to its designed parameters. A fault, which is a logical discrepancy over the specified behavior, is often, but not always, the consequence of a defect. Not all defects cause faults, and not all faults are visible to some user on the system boundaries.
The expression “Defect tolerant architecture” as used herein means an architecture that can be reprogrammed or reconfigured to avoid one or more dysfunctions of a system due to defects in its fabrication process.
The expression “diagnosis” as used herein is the process of locating faults.
The expression “direct contact” as used herein means an electrical contact between balls, pads or surface contacts of integrated circuits through IC pins, where an IC pin touches directly another IC pin. Electrical contact in a direct contact can be made through any short conductive material, such as a metallic ball or a Z-axis film with embedded conductive paths.
The expression “fault” as used herein means a behavior of an electronic circuit that departs from the nominal or specified behavior. In a digital circuit, a static fault is a change of its logical behavior. Some faults can be transient or dynamic and some may only affect timing. Faults are often, but not always, caused by defects.
The expression “fault-tolerant architecture” as used herein means an architecture that can be reprogrammed or configured to avoid one or more dysfunctions of a system.
The expression “green meter” as used herein means modules of an instrumented electronic system that can extract energy consumption in real-time to help optimizing power consumption on existing designs.
The expression “hardware assertion module” as used herein means a circuit that verifies properties of a design. Some properties may manifest themselves over time. Some others can be verified statically. These properties typically define logical and temporal behavior of the design. A hardware assertion module is a hardware device that can identify when and where a specified property is violated.
The expression “IC”, or “integrated circuit” as used herein means an electronic circuit fabricated over a monolithic substrate that comprises multiple components such as transistors, resistors, capacitors and/or inductors. An IC is a miniaturized version of an electronic circuit that could possibly exist as a set of discrete electronic or solid state devices connected together for a purpose. ICs are commonly integrated on a single die of silicon, but other technologies such as gallium arsenide (GaAs) exist. In conventional IC fabrication, multiple copies of the same circuit are ‘printed’ over a semiconductor wafer. That wafer is diced, and dies are mounted and encapsulated in packages to form ‘chips’ or ICs. A bare IC, die or encapsulated IC can also be identified as an integrated circuit or as an integrated circuit component.
The expression “Interposer” as used herein means a component that serves as an intermediate layer between two integrated circuits.
The expression “LAIC” (Large-Area Integrated Circuit) as used herein means any integrated circuit made from photo-repetition of one or several reticle image fields on the same circuit layer that are interconnected into a single integrated circuit.
The expression “large area micro-system (LAMS)” as used herein means an array or collection of micro-systems larger than a reticle image produced with one or several monolithic substrates such as LAICs.
The expression MEMS as used herein means Micro-Electro-Mechanical Systems. The term MEMS is often used loosely, in which cases MEMS integrates one or more of the following components: mechanical elements, sensors, actuators, and electronics on a common substrate using some microfabrication technology.
The expression “micro-substrate” as used herein means a small piece of planar material that mechanically, electrically and thermally supports another fragile planar material deposited on it.
The expression “micro-system” as used herein means some electronic or mechanical components, usually made through a lithographic process, that contain small parts with dimensions between one micron and one millimeter on a side.
The expression MISR as used herein means a multiple input signature register. A MISR is a parallel input register that can be used for test response compaction. MISRs are usually used as part of BIST systems to increase test speed by compressing results produced by a set of test vectors.
The expression NEMS as used herein means Nano-Electro-Mechanical Systems. A NEMS integrates one or more of the following components: mechanical elements, sensors, actuators, and electronics on a common substrate through nanofabrication technology.
The expression “NoC” as used herein means Network on Chip.
The expression “NoW” as used herein means Network on Wafer.
The expression “PCB” as used herein means Printed Circuit Board. A PCB is a mechanical support that also electrically connects discrete electronic components or ICs using conductive traces etched from conductive sheets laminated onto a non-conductive substrate.
The expression “PMC” as used herein means Power Manageable Component.
The expression “PSA” as used herein means Programmable Shut-down Assertion.
The expression “PSM” as used herein means Power State Machine.
The expression “PWA” as used herein means Programmable Wake-up Assertion.
The expression “Reticle” as used herein refers to a physical object used as part of a micro-fabrication process to print an image of one layer of one or more ICs over some area of a wafer. More than one IC may be printed at a time when they are sufficiently small. To improve the resolution of the manufacturing process, the reticle is often enlarged by some factor, say 5×, compared to the part of a wafer printed in one exposure. Typically, the maximum size that can be printed on a wafer with a reticle is 2.5 cm by 2.5 cm. That maximum image size corresponds to the normal maximum size of an IC. A reticle image field is what gets printed on a wafer.
The expression “reticle image field” as used herein means the geometrical zone that gets printed on the surface of a wafer where some micro-fabrication step such as a lithographic process step takes place. It defines the maximum size that a regular IC that is not stitched can have. By stitching multiple field images together, a Large Area Integrated Circuit (LAIC) is formed. In the most common micro-fabrication processes, a stepper covers a whole semiconductor wafer, which can be larger than 30 cm in diameter, by imaging multiple copies of the reticle at regular intervals.
The expressions RRN, RRNoC or RRNoW as used herein stand for Regular Reconfigurable Network (RRN), RRN on Chip and RRN on Wafer. This type of network includes the WaferNet network, but can include every type of network on chip (RRNoC) or on Wafer (RRNoW) that contains a regular array of reconfigurable crossbars interconnected together.
The expression “Scan chain” as used herein means a sub-circuit within an IC that is composed of a chain of memory elements. This chain is typically accessed through a serial protocol like JTAG or any other types of interconnect network to minimize the number of connections.
The expression “Scan chain path” as used herein means a path between two distant points in a circuit made of a scan chain. Several scan chain paths can exist between two distant points in a circuit.
The expression “SiP” as used herein means System in Package.
The expression “SoC” as used herein means System on Chip.
The expression “SoW” as used herein means System on Wafer.
The expression “stuck-at fault” as used herein relates to the most common fault model, where it is assumed that the logical value on some electrical node is “stuck” at a constant logical value. Therefore conventional stuck-at faults can be of two types, either stuck at logic-0 or at logic-1, respectively named stuck-at-0 (SA0) and stuck-at-1 (SA1).
The expression “support frame” as used herein refers to a multi-layer stack structure such that each layer can be a heatsink, PCB, ceramic, silicon PCB, thermal grease, balls, MEMS, NEMS or any material that can be used to reduce mechanical stress on fragile LAMS devices and/or to interconnect devices for power supply or data signal propagation.
The expression “support circuitry” as used herein refers to any circuit that supports another circuit, which can include devices such as, but not limited to, passive devices (e.g. resistors, capacitors, inductors), active devices (e.g. transistors, diodes, etc.) and any combination of devices to build functional or control modules.
The expression “substrate” as used herein means the base layer of a structure such as an integrated circuit, multichip module (MCM), printed circuit. Silicon is the most widely used substrate for integrated circuits. Fiberglass (FR4) is mostly used for printed circuit boards, and ceramic is used for MCMs.
The expression “Test controller” as used herein refers to a module that controls the transfer of test vectors or data. A test controller can be a simple “go/no-go” logical unit. It can include software capable of providing complex diagnoses about the functionality of the unit under test or of generating complex sets of data streams. A test controller can be used to communicate data between two modules, such as data to configure or test one module or data from/to sensor modules or actuator modules. A test controller can be substantially serial when it uses a scan chain protocol or substantially parallel when it uses a bus-based protocol. Some test controllers include one or more test access ports (TAPs).
The expression “TSV” as used herein means Through Silicon Via. TSVs play the role of direct vertical interconnects in 3D ICs.
The expression “UUT” as used herein means Unit Under Test. A UUT can be any IC, SoC module, or part of a LAIC (including WSI) that is under test. A UUT is controlled by an external means such as an external test controller. The term UUT is therefore used in relation to configuration, programming or testing of a system.
The expression “VDD” as used herein is the abbreviation used for the power supply voltage of integrated circuits.
The expression “Wafer” as used herein refers to a slice of very pure semiconductor mono-crystal (typically silicon material even though other materials such as GaAs, InP and others are used). IC dies are micro fabricated over the surface of a wafer using photolithography and related processes. A wafer is typically disk-shaped, as a consequence of how it was obtained by slicing it from a mono-crystal cylindrical ingot.
The expression “WSI” as used herein means Wafer scale Integration. WSI is a process from which integrated circuits that cover substantially the whole surface of a semiconductor wafer are fabricated.
The expression wafer-scale micro-system device as used herein means an array or collection of micro-systems larger than a reticle image produced on a full wafer or a superposition of different wafers.
The expression WaferBoard as used herein refers to dynamically reconfigurable and reusable platforms that can be used to rapidly prototype and validate electronic systems.
These and other objects, features and advantages of the invention will be more readily apparent from the following detailed description of the preferred embodiments, in which:
First Family of Preferred Embodiments: Software and Hardware Strategies for Defect Tolerance in Large Area Integrated Circuit
An efficient and fault tolerant scan chain for large and complex LAICs can use the generic reticle image architecture depicted in
Communication links can be bidirectional (as shown in
The preferred embodiment for the LAIC architecture shown in
Each TC can be embedded in the LAIC or can be externally implemented (off-LAIC TC) with their respective control software.
Fault tolerance is achieved with the multi-reticle field image architecture of
Each reticle field image can be linked to its adjacent neighbors. Links 1803 and/or 1804 can be activated in case of failed communication links 1806 between an external TC and a reticle. Therefore, if a TC-reticle link is dysfunctional, or for any reticle not linked to a TC, one of its adjacent reticle field images can dispatch the data stream to this reticle.
One independent external TC 1805 can be used to control one or more reticle field images. Each link 1806 is independent from the others, which means that it provides and gets its own set of signals to/from its reticles.
For example, the preferred embodiment for link 1806 is a standard JTAG link that includes a clock signal (tck), an optional reset signal (trst), a control signal (tms) and two data signals for the serial communication protocol (tdi and tdo).
In the preferred embodiment, each cell has a Test-Access-Port (TAP) module (1902, 1903) that controls the flow of data received from any neighbor cells. This TAP module allows a direct access to user's registers 1907 and to the forward link register (freg) 1906.
The goal of freg 1906 is to select the next cell to which the outgoing data stream is forwarded. All registers freg 1906 must be set such that one and only one cell forwards a data stream to a targeted cell. The mechanism used to select the cell-to-cell link can be based on demultiplexers, tri-state buffers, decoders or others.
A preferred embodiment is that register freg 1906 sets the state of an output demultiplexer 1904 which redirects the data stream toward one of the neighbor cells (through link 1c.2). For example, the link 1908 connects the cell (x, y) 1901 to cell (x+1, y) 1911. To forward data stream to 1911's TAP module from cell (x, y) 1901, the register freg 1906 is configured to set the demultiplexer 1904. Then, only the link 1908 forwards data to the OR gate 1905 of cell 1911, while all the other OR gates 1905 inputs of cell 1911 are set to zero.
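The forwarding mechanism described above can be illustrated with a minimal behavioral sketch. It assumes a hypothetical four-direction encoding for freg (the `Dir` enum) and models each link as a single bit per clock cycle; none of these names come from the disclosure.

```python
from enum import IntEnum

class Dir(IntEnum):
    """Hypothetical encoding of the neighbor selected by freg."""
    EAST = 0   # toward cell (x+1, y)
    WEST = 1
    NORTH = 2
    SOUTH = 3

def demux(freg, bit):
    """Output demultiplexer: the data bit appears only on the selected link;
    every other outgoing link is driven to 0."""
    return {d: (bit if d == freg else 0) for d in Dir}

def or_gate(incoming_bits):
    """Input OR gate of a receiving cell: merges the links from its neighbors."""
    out = 0
    for b in incoming_bits:
        out |= b
    return out

def forward_stream(stream, freg, target, idle_inputs=3):
    """Bits seen by the TAP of the neighbor located in direction `target`."""
    received = []
    for bit in stream:
        links = demux(freg, bit)
        # The target cell ORs this link with its other inputs, assumed idle (0).
        received.append(or_gate([links[target]] + [0] * idle_inputs))
    return received
```

With freg set to `Dir.EAST`, the eastern neighbor receives the stream unchanged, while any other setting leaves that neighbor's OR gate at zero. This is why exactly one cell may forward data to a targeted cell: a second active input would corrupt the OR-ed stream.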
a, 21b, 21c and 21d further illustrate the required successive steps in order to create a path between the head cell 2105, which has a direct connection 2151 to the TC and the tail cell 608, which sends data back through the direct connection 2152 to the TC. The example shows four successive steps 21a, 21b, 21c and 21d depicted for a very simple 2×2 cell reticle image field. Each step is associated with configuration commands sent to the 2×2 cells. The goal of the step 1 in
Once the register freg 1906 is properly configured, a path between cells 2105 and 2112 through link 1603 is created. The instruction register 1906 of cell 2105 is set into bypass mode. Then the system is ready to access the next cell 2112 with the configured link 1603. The second step in
The disclosed LAIC architecture is fault tolerant: defective cells, defective cell-to-cell links, defective TCs and defective TAP controllers can be bypassed, worked around or replaced. Several strategies can be implemented to overcome those faults.
The first fault tolerance strategy is the “external control” that allows a functional module to be controlled by a neighbor cell.
Each cell can be put in one of the following four states: (1) inactive state, where the inter-cellular scan chain or internal scan chain are not used (such as cell 608); (2) bypass state, where the internal scan chain is not used but the data stream is redirected to the next cell through the inter-cellular scan chain (such as 2206 in cell 2316); (3) scan-in state, where the cell's internal scan chain is accessed (such as cell 2310); (4) external scan state, where the cell takes control of the internal scan chain of a neighbor cell (such as cell 2151 takes control of 2305's internal scan chain).
A combination of one of these four states for each cell can be used to bypass or go around defective CLCs, inter-cell links or internal scan chains.
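As an illustration, the four cell states can be captured in a small model that computes the length of the scan path resulting from a given per-cell configuration. This is only a sketch: the one-bit bypass cost and the tuple layout are assumptions, not details taken from the disclosure.

```python
from enum import Enum

class CellState(Enum):
    INACTIVE = "inactive"            # neither chain is used
    BYPASS = "bypass"                # stream forwarded, internal chain skipped
    SCAN_IN = "scan-in"              # the cell's own internal chain is accessed
    EXTERNAL_SCAN = "external scan"  # a neighbor's internal chain is accessed

BYPASS_BITS = 1  # assumption: a bypassed cell adds one flip-flop to the path

def chain_length(path, internal_bits):
    """Total number of bits between the head and the tail of a configured path.

    path: list of (cell, state, neighbor) tuples in head-to-tail order;
          neighbor is only used for EXTERNAL_SCAN.
    internal_bits: dict mapping a cell to the length of its internal scan chain.
    """
    total = 0
    for cell, state, neighbor in path:
        if state is CellState.BYPASS:
            total += BYPASS_BITS
        elif state is CellState.SCAN_IN:
            total += internal_bits[cell]
        elif state is CellState.EXTERNAL_SCAN:
            total += internal_bits[neighbor]
        # INACTIVE cells contribute nothing: they are not on the path.
    return total
```

A test controller needs this length to know how many clock cycles are required to shift a vector through the configured path.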
For example, under a dysfunctional CLC in cell 2305, a fault tolerance strategy consists of reaching the internal scan chain in cell 2305 with cell 2151 configured in external scan state.
In another example, under dysfunctional links 2315 and 2316, a fault tolerance strategy consists of creating a path between head cell 2151 and tail cell 2152 by going around these broken links.
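Going around broken links amounts to a path search in the cell lattice. The sketch below uses a breadth-first search over a rectangular grid of cells; the coordinate scheme, the function name and the treatment of a link as unusable when either direction is broken are illustrative assumptions.

```python
from collections import deque

def find_path(width, height, head, tail,
              broken_links=frozenset(), bad_cells=frozenset()):
    """Return a list of (x, y) cells from head to tail, or None if none exists.

    broken_links: set of ((x1, y1), (x2, y2)) pairs known to be dysfunctional.
    bad_cells: set of cells that must be avoided entirely.
    """
    def usable(a, b):
        return (a, b) not in broken_links and (b, a) not in broken_links

    previous = {head: None}
    queue = deque([head])
    while queue:
        cell = queue.popleft()
        if cell == tail:
            path = []
            while cell is not None:      # walk back to the head
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < width and 0 <= ny < height
                    and nxt not in previous
                    and nxt not in bad_cells
                    and usable(cell, nxt)):
                previous[nxt] = cell
                queue.append(nxt)
    return None
```

For a 2×2 field whose direct head-to-tail link is broken, the search returns the detour through the two remaining cells.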
If the head 2151 or tail 2152 cells are dysfunctional, the entire reticle image field is lost. Other head cells or tail cells must be respectively used. Redundant head cells or tail cells can be added in the reticle or the head or tail cells of adjacent reticles can be used with links between neighbor reticle image fields also called inter-reticle links. Inter-reticle links are created with reticle stitching techniques. In the preferred embodiment, there is one head cell and one tail cell per reticle and inter-reticle links to each adjacent complete cell in the horizontal and vertical directions.
Inter-reticle links also increase the fault tolerance capability, especially for cells isolated due to dysfunctional inter-cell links or CLCs.
For example,
With inter-reticle links, isolated cells 2410 and 2411 can be accessed with paths from head and tail cells in adjacent reticles (paths 2414 and 2413). The head and tail cells could also be in different reticles.
The differences between the bidirectional intercellular link (BICL) and UICL architectures imply variations in the cell hardware architecture and the TC software.
Once the planned path is set and proven functional, step in
While more routing resources are needed to realize the BICL architecture, testing is faster than with UICL because the diagnosis is easier and faster to complete. It also offers finer coverage of functional cells.
The test controller TC is an essential part of the solution. In a preferred embodiment, it includes a software resource to plan a possible path in the cell-matrix network, and it must be able to diagnose the faults. The diagnosis algorithm is shown in
At the beginning of the diagnosis process, the functional or dysfunctional status of each cell, CLC, internal scan chain and link is unknown.
As the database grows, inference rules are applied and defective cells are diagnosed and registered in a defect map 2907. Then, the iterations are stopped in 2909 when all cells have been diagnosed or when a satisfactory set of functional paths has been found.
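The inference step can be sketched as follows. The text does not enumerate the inference rules, so the two rules used here are assumptions chosen for illustration: every element on a passing path is functional, and a failing path with exactly one element not yet known to be good pins the defect on that element.

```python
def diagnose(path_results):
    """Infer element status from a database of tested paths.

    path_results: list of (elements, passed) pairs, where elements is an
    iterable of names (cells, CLCs, links, scan chains) used by the path.
    Returns (good, defective) as two sets; anything else stays unknown.
    """
    good, defective = set(), set()

    # Assumed rule 1: every element on a passing path is functional.
    for elements, passed in path_results:
        if passed:
            good.update(elements)

    # Assumed rule 2: if all but one element of a failing path are known
    # good, the remaining element is recorded in the defect map.
    for elements, passed in path_results:
        if not passed:
            suspects = set(elements) - good
            if len(suspects) == 1:
                defective.update(suspects)

    return good, defective
```

Elements that appear only on failing paths with several suspects remain unknown, which is what drives further iterations with new paths.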
For example, in
The example on
In most designs, there is one clock tree for simplicity. But in some designs, more than one clock tree is required. For example, in a LAIC system, a single circuit cannot span the whole wafer, so a single clock tree cannot be implemented. Furthermore, each reticle must be identical because of the nature of the fabrication process (already discussed). Therefore, there is at least one clock tree on each reticle. If a clock tree is not functional, the whole reticle becomes dysfunctional. To overcome these vulnerabilities, it is possible to share clocks between reticles by configurable means. If the whole reticle is dysfunctional, odds are good that the cause is a faulty clock tree. The
An extension of the first family of preferred embodiments is to apply the invention to make a fault-tolerant JTAG for large area micro-systems (LAMS). If there is only one daisy-chained scan chain between all units under test and if there is a fault in this scan chain, the whole LAMS becomes untestable, and therefore dysfunctional. This vulnerability can be avoided by the system embodiment depicted in
When n equals 2, if there is one fault in one link, it is possible to overcome the faulty link by using the other one. This process of bypassing the fault by finding an alternative path is fundamentally the same fault tolerance method as proposed in the first family of preferred embodiments with the UICL architecture for LAIC. The need for diagnosis remains in this system because it is required to know where the faulty link is located in order to activate the other path.
Because the network chaining all LAMS's ICs is not a lattice, the algorithm depicted on the
A versatile reconfigurable scan chain has been designed not only for defect tolerance but also to optimize the diagnosis and test speed of large NoC or NoW. The method disclosed here is a speed optimized version of the basic walking-one depicted in the Prior Art section.
An example of the capabilities of the versatile reconfigurable scan chains is illustrated on
Another benefit of having a versatile reconfigurable scan chain is to take advantage of the TAP controller array (not shown on the
As stated earlier, fault diagnosis using the walking one approach has three test phases. The first test phase (A) is fast and easy to complete, therefore it does not need optimization.
Test type B is by far the longest step of the walking one algorithm and therefore needs to be optimized.
Because test type B can be used concurrently among network crossbars, a test point list must be created 1302 that schedules the test of each crossbar input of the circuit. This list, once created, can be partitioned to share the test workload across more than one crossbar input (more detail on concurrent testing follows shortly). Once the list is created, the RRN must be prepared by forcing an “S” logical value on all input terminals of each crossbar 1303. At this step, each crossbar must be configured in a one-to-one state, where every crossbar input is redirected to one crossbar output, to activate all interconnects and allow interconnect observation. The logical “S” value is “0” if the algorithm is shifting a walking “1”, and “S” is “1” if the algorithm is applying a walking “0” on the crossbar inputs.
The next step 1304, 1305 is to create a bypass list, i.e. a list of cells that are unused and unprogrammed during the application of the current walking one or zero in the network. From the bypass list, a test list can (implicitly) be generated, because the test list and the bypass list form a partition of the whole test point list. Step 1304 simply executes the network modification only on crossbars affected by the walking one or zero. Because of the bypass list, only affected crossbars are reconfigured. Furthermore, only the proper crossbar inputs are changed through the crossbar test scan chain as the walking one moves forward in the test point list 1305. Therefore, unnecessary scans are limited and the test speed can be improved significantly. Once the algorithm has visited the entire test point list, the procedure is repeated with a walking zero 1308.
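A minimal simulation helps visualize the procedure. It assumes a one-to-one crossbar configuration in which input i is observed on output i, a simple stuck-at fault model, and hypothetical function names; the background value plays the role of “S” and is the complement of the walking value.

```python
def observe(driven, stuck_at):
    """Values captured on the outputs, given stuck-at faults {index: value}."""
    return [stuck_at.get(i, bit) for i, bit in enumerate(driven)]

def split_points(all_points, bypass):
    """Test list: what remains of the test point list without the bypass list."""
    return [p for p in all_points if p not in bypass]

def walking_test(n_inputs, walking_value, test_points, stuck_at):
    """Apply a walking 1 (or 0) to each test point; return failing indices."""
    s = 1 - walking_value              # background "S" value
    failing = set()
    for point in test_points:
        driven = [s] * n_inputs        # force S on every input...
        driven[point] = walking_value  # ...except the input under test
        seen = observe(driven, stuck_at)
        failing.update(i for i in range(n_inputs) if seen[i] != driven[i])
    return failing
```

For example, with four inputs, input 3 on the bypass list and input 2 stuck at 0, a walking-one pass over the remaining test points reports index 2 as failing.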
The flowchart step 1306, “shift-out the test result to the test controller”, is the most time-consuming sub-step of test type B. Shifting out the result to the test controller means that all the observation registers of all crossbars are included in this shift. This can be a very substantial amount of data if the NoW or the NoC contains numerous interconnections. Therefore, this step must be optimized. There are two methods to accelerate this part of the algorithm: (1) use concurrent walking ones; or (2) use the principle of the cone of influence to shift out only the needed information for every test.
Knowing the exact extent and geometrical form of the cone of influence is the basis for a dramatic improvement in the speed of test type B and, consequently, in the diagnosis speed of the whole algorithm. Each new walking “1” or “0” applied to the interconnect under test creates a new and unique cone of influence. Therefore, the algorithm must keep track of how the cone of influence changes as new walking ones are applied to the network under test. This knowledge can then be leveraged to shift data only from cells that are part of the cone of influence. This is possible with the use of the TAP controller cell array available in any of the CICU, CICB or RA link architectures disclosed in this document. The list of cells that are part of the cone of influence is the basis for the generation of a test bypass list applied at step 1306 of the flowchart of test type B.
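The saving can be illustrated by computing the cone of influence as the set of cells reachable from the test point under the current crossbar configuration, then comparing shift-out lengths. The routing dictionary, the cell names and the one-bit cost of a bypassed cell are assumptions made for this sketch.

```python
from collections import deque

def cone_of_influence(routes, source):
    """Cells whose capture registers can be affected by a value applied at source.

    routes: dict mapping a cell to the cells driven by its crossbar outputs
            under the current configuration.
    """
    cone = {source}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        for nxt in routes.get(cell, ()):
            if nxt not in cone:
                cone.add(nxt)
                queue.append(nxt)
    return cone

def shift_cost(all_cells, cone, capture_bits, bypass_bits=1):
    """Bits shifted out when only cells inside the cone expose their registers."""
    return sum(capture_bits if c in cone else bypass_bits for c in all_cells)
```

With five cells of 16 capture bits each and a cone of three cells, the shift-out shrinks from 80 bits to 50 bits under these assumptions; the gain grows with the ratio of network size to cone size.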
The second method to improve speed is to use multiple walking ones concurrently, as shown in
A diagnosis method disclosed in this document is to create rings of any form, particularly closed loops, and to associate with each ring a test pattern generator (TPG) that plays the role of the transmitter and a response analyzer that plays the role of the data receiver. In a fault-free ring, the receiver should receive the same signal that the transmitter sent; otherwise, a fault is detected in the ring.
To avoid fault masking caused by multiple faults in the network, each network interconnect is tested by multiple rings, each having a unique form and location. BIST PR has the advantage of being able to test dynamic faults, SA faults and even short faults. Moreover, this document discloses special techniques to make diagnosis as efficient as possible and to enable the detection and localization of short faults.
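The ring principle can be sketched with a small linear-feedback shift register standing in for the TPG and a bit-by-bit comparison standing in for the response analyzer. The LFSR polynomial, the segment names and the single-fault intersection rule used for localization are illustrative assumptions rather than the disclosed implementation.

```python
def lfsr_bits(seed, taps, width, count):
    """Fibonacci LFSR used here as a stand-in test pattern generator."""
    state, out = seed, []
    for _ in range(count):
        out.append(state & 1)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (width - 1))
    return out

def send_through_ring(bits, segments, stuck_at=None):
    """Propagate each bit through the ring segments; a stuck segment overrides it."""
    stuck_at = stuck_at or {}
    received = []
    for bit in bits:
        for seg in segments:
            if seg in stuck_at:
                bit = stuck_at[seg]
        received.append(bit)
    return received

def ring_passes(segments, stuck_at=None, cycles=15):
    sent = lfsr_bits(0b1001, (0, 1), 4, cycles)   # x^4 + x + 1, period 15
    return send_through_ring(sent, segments, stuck_at) == sent

def locate(rings, stuck_at):
    """Segments common to all failing rings and absent from every passing ring
    (valid under a single-fault assumption)."""
    suspects, cleared = None, set()
    for segments in rings:
        if ring_passes(segments, stuck_at):
            cleared.update(segments)
        else:
            suspects = set(segments) if suspects is None else suspects & set(segments)
    return (suspects or set()) - cleared
```

Because each interconnect belongs to several rings of different shapes, the intersection of failing rings narrows the fault location, while passing rings clear their segments.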
Another diagnosis method disclosed in this document is based on a “RING BIST”. BIST means built-in self-test. In order to design a BIST system, it is required to include a device that enables auto-generation of test vectors and at least performs compaction of the result data generated from tests. RING BIST places both test vector generation and test vector reception and compaction at the same cell position. Moreover, this approach uses the reprogrammable capabilities of the network to create multiple concurrent rings to test the circuit. Most of the faults detected by this RING BIST are localizable.
Ring BIST is able to detect crosstalk faults. To do so, multiple signal rings must intersect each other or be close together in as many combinatory patterns as possible. Moreover, at-speed tests can be completed, because the signal emitter and signal receiver can be clocked by the same signal coming from a close source to maximize clock speed. An example of overlapped RING is shown on the
Multiple “RING BIST” can overlap each other in order to reveal crosstalk faults or shorts. Such faults are called active faults 3809, because they involve at least two known interconnects in the process. If used as non-overlapped rings, tests can reveal the location of at least noise faults or delay faults. Locating active faults with precision needs a special algorithm. This special algorithm is detailed later in the present application.
On the contrary, a passive fault 3808 involves one activated interconnect transmitting a signal and one passive interconnect capturing a constant value from a distant source. In order to detect and locate such faults, the same logic is used as in the walking-one algorithm.
During the application of the walking ‘1’, it is required to force a ‘0’ on all control registers of the network. Such a precaution enables the diagnosis of shorts on any pair of interconnects (parallel or perpendicular). The same idea applies to the diagnosis of shorts with rings. Every passive cell (for example 3804 and 3810) must force a constant logical value that is the complement of the logical value forced on the ring interconnects. If there is a short (such as 3808) between a ring and any passive interconnect, such as the interconnect that is shown on the figure between 3804 and 3810, then the capture register will reveal the fault.
A general and fast algorithm to locate shorts, crosstalk, delay faults as rapidly as possible is depicted as a flowchart on the
The same process is repeated again (step 4009), but with a new value (step 4010) to force onto the observation register of each crossbar. In this second test phase the same test list “G” is applied sequentially to the network under test, but every TPG must generate a new test vector set to reveal new faults. An easy way to create a new test vector set is to change the number of cycles from M to M′. A second condition must be respected in order to reach 100% test coverage for short faults: the last test vector generated by the TPG must be composed of “1” if S=1 and composed of “0” if S=0. The last value generated by the TPG is very important to diagnose passive faults. This is why the same process is repeated again to reveal all the bridge fault types (wired-AND fault, wired-OR fault, A-dominates-B fault, etc.).
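Why both polarities are needed can be seen with the conventional bridging fault models. In the sketch below, net A is a ring interconnect and net B is a passive interconnect forced to the complement of the ring's last value; the function names and the exact mapping to the two test phases are assumptions made for illustration.

```python
BRIDGE_KINDS = ("wired_and", "wired_or", "a_dominates_b", "b_dominates_a")

def bridge(a, b, kind):
    """Values observed on shorted nets A and B for a given bridging model."""
    if kind == "wired_and":
        v = a & b
        return v, v
    if kind == "wired_or":
        v = a | b
        return v, v
    if kind == "a_dominates_b":
        return a, a
    if kind == "b_dominates_a":
        return b, b
    raise ValueError(kind)

def passive_detects(kind, passive_value):
    """True if the passive capture register deviates from its forced value."""
    ring_value = 1 - passive_value      # passive cells force the complement
    _, seen_on_passive = bridge(ring_value, passive_value, kind)
    return seen_on_passive != passive_value

def covered(kinds=BRIDGE_KINDS):
    """Bridge types revealed on the passive side by at least one polarity."""
    return {k for k in kinds if passive_detects(k, 0) or passive_detects(k, 1)}
```

In this model, a wired-AND short is only visible when the passive net is forced to 1, and a wired-OR short only when it is forced to 0, so a single phase cannot reveal both. A short where the passive net dominates is never visible on the passive side and has to be caught by the ring receiver instead.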
The Ring test list applied to the network creates a list of test results composed of test vectors captured from observation registers in the network. From this result, interconnect faults can be detected and located. Passive faults (e.g., 3808) can be easily located for shorts. Active faults (3809), i.e., faults detected from the ring receivers, must be located using inference rules.
Coverage can rapidly degenerate by applying too many rings at the same time. It is important to create plenty of space between ring sources in order to create a perimeter for un-activated cells. Each un-activated cell (shown in white in
The concurrent BIST diagnosis can be done from one test controller outside of the device under test (DUT) or from a test controller embedded in the DUT. The device can be accessed through a JTAG port where the multiple scan chains can be selected and shifted through the standard instruction register from IEEE 1149.1. The device can also be accessed and diagnosed through direct access to the multiple scan chains, with the TAP controller outside of the test controller.
As with each diagnosis system disclosed in this document, the concurrent BIST is dedicated to a RRN. The preferred embodiment for concurrent BIST is depicted in
The walking sequence is generated from a NOR gate or an OR gate. With a “k”-bit counter, the walking sequence can be 2^k bits long. Areg is partitioned into two distinct parts. The first part is composed of “k” bits that define the number of registers associated with the counter. The second part is composed of a single bit that determines whether the NOR gate or the OR gate is activated to generate a walking-one or walking-zero sequence.
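The generator can be modeled as a free-running k-bit counter whose bits feed a NOR gate (walking one) or an OR gate (walking zero). The sketch assumes the sequence period equals the 2^k counter states, which is how a k-bit counter decoded this way behaves; the function name is hypothetical.

```python
def walking_sequence(k, walking_one=True, cycles=None):
    """Serial bit stream produced by decoding a free-running k-bit counter."""
    period = 2 ** k
    cycles = period if cycles is None else cycles
    out = []
    for t in range(cycles):
        count = t % period
        or_of_bits = int(count != 0)     # OR gate over the k counter bits
        # The NOR output is 1 only when the counter is zero: a single walking "1".
        out.append(1 - or_of_bits if walking_one else or_of_bits)
    return out
```

With k = 3, the NOR output is a single 1 followed by seven 0s; shifted through the scannable control registers, that single 1 walks from one crossbar input to the next.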
The crossbar contains 4*n+m input ports (4107). Each crossbar input must be controlled with a scannable control register 4104 in contact with the output of the NOR gate (for walking one) or OR gate (for walking zero). The test mode is triggered by the test mode signal coming from the counter module 4106. The variable “n” is the number of interconnects in each direction coming out from the crossbar, and “m” is the number of supported signal redirections from CMPIOs (see section prior art,
The crossbar output 4108 must be observed with a capture register included in the walking 1/0 interpreter module 4105. The capture registers are connected (4103) to the crossbar output (4108) and they can be shifted out directly and entirely to the test controller to locate the faulty interconnect. To make the diagnosis faster, it is possible to use the “walking-1/0 interpreter module” 4105, whose function is to compress the data coming from the capture register. The compressed data is transferred to the Ireg to be shifted out to the test controller. Normally, in a fault-free network, all interconnects of the network must give an equal constant logical value (same value on all crossbar outputs). An exception occurs in only two cases. First, if the interconnect under test that is associated with the walking sequence works properly, that interconnect will differ from the others. Second, if the number of ‘1’ values is greater than or equal to two, this proves that a fault is present in the interconnect under test included in the cell. A decoder therefore detects the occurrence. If a logical value is found in the Ireg, it is either a faulty interconnect or the normal output of the walking sequence on the observation register. Because the order of appearance of each cell is known in the scan chain, it is possible to locate faults.
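The interpretation rule described above can be illustrated with a minimal software sketch. It assumes a hypothetical model in which each captured crossbar output is a list of bits and the network is represented by a function that maps a driven pattern to a captured pattern. The function names and the treatment of a missing walking one as a fault are illustrative assumptions, not part of the disclosed hardware.

```python
def walking_one_patterns(width):
    """Yield the walking-one test vectors for `width` crossbar inputs."""
    for position in range(width):
        yield [1 if i == position else 0 for i in range(width)]


def interpret_capture(captured, position):
    """Classify one captured crossbar output for one walking-one step.

    Assumed rule set modelled on the text: exactly one '1' at the driven
    position is the normal response; two or more '1' values indicate a
    fault; a missing or misplaced '1' is also treated as a fault here.
    """
    ones = [i for i, bit in enumerate(captured) if bit == 1]
    if ones == [position]:
        return "ok"
    if len(ones) >= 2:
        return "fault: two or more ones observed"
    return "fault: walking one not observed at the driven position"


def diagnose(capture_fn, width):
    """Apply every walking-one step and return the faulty positions."""
    faulty = []
    for position, pattern in enumerate(walking_one_patterns(width)):
        if interpret_capture(capture_fn(pattern), position) != "ok":
            faulty.append(position)
    return faulty


def fault_free_network(pattern):
    """Hypothetical fault-free network: outputs equal inputs."""
    return list(pattern)


def shorted_network(pattern, a=1, b=2):
    """Hypothetical network with a wired-OR short between lines a and b."""
    out = list(pattern)
    if out[a] or out[b]:
        out[a] = out[b] = 1
    return out
```

Calling diagnose(shorted_network, 4) reports the two shorted lines, while diagnose(fault_free_network, 4) reports none. Because the position of each cell in the scan chain is known, the same position index would identify the faulty interconnect in hardware.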
The hardware architecture described above is designed to apply multiple concurrent walking-one sequences on the same network with the same scan chain. The flowchart depicted in the
Second Family of Preferred Embodiments: Contact Detection Methodology by Locating Short Between CMPIO.
By default, no shorts should be present between CMPIO. If a short exists, it must be located with a proper algorithm.
To diagnose short locations in the sea of cells (WaferIC), diagnosis algorithms are disclosed in this section: (1) the walking one algorithm.
Third Family of Preferred Embodiments: Configurable Interposer for Three Dimensional Large Area Integrated Circuits
The configurable interposer is in fact an active substrate containing active digital and analog circuits. The first usage of the configurable interposer considered is as a configurable NoC to receive or transmit data on each pin of each IC die of the system. Any set of conventional chips or ICs placed anywhere on the configurable interposer can be connected to any other chips or ICs placed on another system layer in a 3D stacked chip. Moreover, the configurable interposer embeds an array of tiny “CMPIO” to enable electrical contact between uIC pins and provide power to the user's ICs. The uICs can be any CPU, microcontroller, FPGA or any IC whose pinout or ball-grid is compatible with the configurable interposer CMPIO array. The use of a sea of tiny CMPIO supports compatibility with a wide range of pin and ball types, spacings and patterns.
The configurable interposer comprises a regular array of unit cells, with each cell comprising at least a configurable crossbar, an array (preferably 4×4 or more) of CMPIO, a configurable assertion checker, a configurable logic cell, and a microcontroller. The configuration to a particular state for each cell's crossbar creates a unique interconnection network mirroring the desired topology of the system composed of multiple layers of interconnected uICs.
Because the tiny CMPIO array is deployed on the entire active surface of the configurable interposer, the wire bonds (4503) that connect multiple layers together are in contact with the CMPIO arrays, for example at 4504. This electrical contact can become a real digital connection by properly configuring the configurable interposer's crossbar to create the desired connection between elements of different system levels. For example, uICs 4501 and 4502 can be connected to uIC 4507 through wires 4510 and 4503 and through the configurable interposers 4509 and 4505. A layer of power blocks (4506) is used in each level to provide power to the configurable interposers.
The power is delivered to the ICs by the means of the configurable interposer. Each CMPIO can be configured as a VDD, GND or as an I/O. All power rails of the system can supply current or drain the current outside of the SiP.
Rather than using FPGAs 4702, the interposer 4701 could be populated with advanced reconfigurable GPUs each having the possibility to interconnect with other adjacent GPUs to optimize that array of GPUs for a given processing application (e.g. bioinformatics, interpretation of seismic data, etc.) because the interposer is configurable.
The uIC and configurable interposer arrangement can be expanded in axis Z (in number of layers) and in the XY plane as shown on the
The internal architecture of the configurable interposer is shown in
The configurable network of crossbars (5006 and 5013) enables the configuration of any kind of netlist between dies' pins. Signals travel between layers 5002 and 5003 by passing through the TSV I/O (5010 and 5009). The configurable network of crossbars is connected not only on the CMPIO, but also on the TSV I/O. It means that the designer can activate a particular interconnection between the two configurable interposer layers.
In another preferred version of the structure proposed in
In a preferred arrangement, the stack would alternate the layer types as necessary. For example, a stack could alternate layers as follows: interposer-chip-interposer-chip-interposer . . . , or interposer-chip-Z_axis_film-interposer-chip-Z_axis_film . . . . As many useful variations are possible, the disclosed combinations are only exemplary and it should be understood that 3D stacks combining such layers differently are possible and can be useful. The disclosed structure assumes that interposers and chip layers provide a sufficient number of vias to propagate signals, ground and power supplies needed by the assembly. As the assembly could include ICs from various vendors in die or in packaged form, such ICs may not have been designed specifically to be embedded in the disclosed 3D assembly. Such devices with desired functionality may not provide any vias supporting vertical connections.
To ensure that suitable numbers of vertical connections are available, special dummy dies of the kind disclosed in
As disclosed dummy dies can be simple pieces of silicon with TSVs, generic dummy dies could be reused for multiple applications and thus manufactured in very high volumes for very low cost. They could also be thinned on demand to fit a particular use, and/or be available in a variety of standard thicknesses.
The disclosed 3D stack could be based on interposers supporting alignment insensitive contacts. This would be useful for assembling very dense 3D stacks in low volume, possibly using Z-axis films. Alternately, interposers or dies and dummy dies could receive balls as in the ball grid array (BGA) technology. Pick and place machines with an accuracy sufficiently better than the size of large 100 micron TSVs could be used to assemble 3D stacks composed of interposers, dies and dummy dies without Z-axis film and alignment insensitive contacts. These elements could be reflowed together in a compact low cost 3D assembly.
The preferred embodiment in
In a variation shown in
Stacking multiple dies as shown on the example of
A key challenge of 3D integration relates to test and control in general. Test and control relates to determining the presence or absence of faults and defects. Once the presence of a fault is known, its location can be determined precisely, a process called diagnosis. Setting up alternate paths around faults/defects to obtain a desired functionality in spite of them leads to fault tolerance. This process can be called configuration. Configuration is also useful beyond fault tolerance as it may allow enabling modes and desired functionality at will. It may allow programming a clock speed, changing the operating voltage produced by an internal regulator, gating some modules ON/OFF, or putting them in standby or sleep modes to save power. Supporting these functionalities at the system level is useful. This can be done using the general objective of testing known as controllability and observability of internal nodes or states stored in memory elements.
Several test methods exist. Some are based on conventional scan, often implemented using the IEEE1149.1 standard [53]. Other standards such as IEEE1149.6 [54], IEEE1149.7 [55], and P1500 [56] exist. Other methods such as random access scan are known but never evolved into widely accepted industry standards. A key idea of many, if not all, such test methods is the ability to control and observe many internal points and state bits through a limited number of access points using some suitable protocol generally supported by a controller or wrapper of some sort. A wrapper, as the name suggests, wraps some circuitry using an interface. The P1500 standard is particularly open to supporting a wide range of previously known test standards using a bus interface. This facilitates design, test and verification and provides a useful means of partitioning a system across large design teams. Using the concept of interface based design and design contract, modules designed by teams that have minimal and possibly no interaction can work together. That is particularly useful when a programmable interposer is designed to accept virtually all available chips from diverse sources, where the group developing the interposer has limited knowledge of what is in a chip that may have been designed by a team that has been dissolved or may be part of a future design project.
The methods disclosed earlier to implement test and fault tolerance of the interconnect structure of a LAIC programmable interconnect device apply directly to a system composed of a 3D stack comprising a plurality of programmable interposer and IC layers. The need to test and configure 3D stacks is equally important and such stacks may be composed of a wide range of ICs found on the market. An interposer that can flexibly support such ICs, possibly requiring heterogeneous test methods, is useful. As proposed earlier, a programmable interconnect fabric embedding test controllers or a modern variant based on a test wrapper such as the P1500 is directly applicable and could be used not only to test the programmable interconnect device but also various ICs embedded in a 3D stack.
The need to ensure power integrity through distributed regulators, to support analog signals, to measure various parameters that relate to thermo-mechanical integrity, to supply current, to supply voltages, and their respective integrity are all useful in a 3D stack.
In a preferred embodiment, the 5 real TSVs propagate VSS, VDD1, VDD2, ANi, Tj. Here VSS is the ground, VDD1 and VDD2 are two different power supplies tied to respective metallic power distribution grids, ANi would propagate some signal vertically in an analog way through a metallic stack and Tj would distribute to all layers one of the test signals of one of the IEEE1149 or P1500 standards. A digital signal can be propagated through an analog ohmic connection. Other TSV patterns and arrangements can be useful. For instance, more than two VDDs could be provided, or a larger number of test or analog interconnect signals could be included in each TSV pattern. By combining compatible interposers and ICs or dummy ICs, the proposed arrangement allows building low-impedance metallic connections through a 3D stack, particularly useful to connect ground and supplies and to bring analog signals as well as test signals inside a 3D stack. The preferred purpose of dummy vias is to systematically insert infrastructure circuits needed to support test and management of analog signals.
A dummy via in the regular fabric of an interconnect device is a zone where instead of having a regular TSV, some circuits needed to complete an electronic system would be available and could be connected to suitable parts of the system. Some support circuits that are useful in electronic systems include pull-up devices, pull-down devices, a voltage reference, a programmable voltage reference, a typical RC power-on reset circuit, analog-to-digital and digital-to-analog converters as well as analog switches. This list of possible analog support circuits is not exhaustive or restrictive.
In the preferred embodiment, the programmable interconnect device has more than one metallic grid to distribute power and one of these grids can be used to connect analog support circuits to analog pins of user ICs as needed. Alternately, a metallic grid dedicated to the distribution of analog signals could be embedded in the programmable interconnect device. Also, more than one type of dummy via could be designed if the desired circuitry does not fit in the area of a single one. Various forms of regular interlaced distribution patterns of such a plurality of dummy vias are useful. This is not restrictive and other uses of the dummy via zone could be useful.
Fourth Family of Preferred Embodiments: Distributed Hardware and Software Strategy for Rapid Prototyping of Reliable and Energy Efficient Three Dimensional Large Area Integrated Circuit System
The present invention aims not only to aid design of energy efficient electronic systems, but also to form a whole new family of integrated circuits. The methodology disclosed herein can be applied to 3D stacked ICs with one or several configurable interposers. A configurable interposer could be used as a tool to implement adaptable power management policies, or dynamical thermal management (DTM).
Just as FPGAs allow dramatically reducing development time and cost as compared to ASICs by allowing easy changes and architecture exploration, the use of one or more programmable interposers can reduce the development time of effective DPM or DTM policies.
Furthermore, as FPGAs are considered for production in high complexity applications, the proposed programmable interposer solution is an attractive solution for production of highly complex systems that need to be energy efficient.
The configurable interposer includes design for testability features to improve the quality and the efficiency of the test and diagnosis of complex 3D stack chips or complex SoW.
Embedded programmable assertion (5502) allows checking for complex patterns of signals coming from the uIC under test. Assertion checkers can detect logical faults based on the observation of the traffic on specifiable sets of interconnects in the 3D chips. The hardware implemented assertions are obtained by programming special logical cells embedded in the LAIC. Interconnecting sub-groups of logical cells allows the creation of the desired behavior. Such hardware implemented embedded assertion checkers facilitate diagnosing the location in space and time of the root cause of observed undesired system behaviors. This embedded programmable assertion can be used for a large number of applications, not just for diagnosis and testability. By definition, assertion models check the expected logical and temporal behavior of the device under test (or diagnosis). Assertions are expressed in a high-level language, such as PSL, and a subset of this language is synthesizable in the hardware assertion integrated in the configurable interposer. Normally, simulation or emulation is done on the design to validate the behavior. The assertion can verify if an expected behavior occurs in the circuit, and it is able to detect potential or confirmed problems during uIC operation. The innovation integrates assertion in the configurable interposer to make advanced, at-speed diagnosis of complex 3D stack LAIC systems.
In order to create, validate and test energy efficient electronic systems, the designer needs to have extensive observability of the key system parameters that determine power and energy consumption. Data obtained from sensors can be analyzed to find patterns and correlations, with as much accuracy as possible, that determine power and energy consumption and from which shut down events for each component can be planned. The system can control each PMC (Power Managed Component), according to a wide range of architectures or DPM methodologies that can be tested effectively and for which the energy consumption can be directly measured.
The configurable interposer Iddq testing device is shown as 5507. A current sensor is associated with every CMPIO and a dedicated ADC 5506 converts the current sensor's analog output to N digital bits, which are then converted to a serial signal with a Serdes (5505). The serial signal is connected to the NoC; therefore the signal can be redirected to any uIC chip integrated in the 3D structure or outside of the system to a software analyzer. Iddq testing is used for diagnosis and efficient testing of ICs. In the present invention, the current sensor can also be used for evaluating the energy efficiency of the device under test.
A preferred embodiment of the internal architecture of the configurable assertion integrated in the configurable interposer is shown in
The pattern detector contains a small local crossbar (5609), named the CHAC crossbar, that interconnects any input port to any set of hardware emulations (5608) of the Boolean behavior of a k-SAT. The k-SAT is a very well known formulation of the propositional satisfiability problem. The local crossbar 5609 and the k-SAT module 5608 are software configurable by means of the serial scan chain 5602.
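As a rough software analogue of the pattern detector, the following sketch evaluates a formula in conjunctive normal form over a set of observed signal bits. The DIMACS-style literal numbering and the function name evaluate_cnf are assumptions chosen for illustration; the disclosure does not specify how clauses are encoded in the configuration bitstream.

```python
def evaluate_cnf(clauses, signals):
    """Evaluate a CNF formula over a sequence of observed signal bits.

    Each clause is a list of non-zero integers. Literal +i is satisfied
    when signals[i - 1] is 1, and literal -i is satisfied when
    signals[i - 1] is 0 (numbering convention assumed for this sketch).
    The formula is true when every clause has at least one satisfied
    literal.
    """
    return all(
        any(signals[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
        for clause in clauses
    )


# Hypothetical 3-SAT pattern over four observed inputs:
# (s1 OR NOT s2 OR s3) AND (NOT s1 OR s2 OR s4)
EXAMPLE_PATTERN = [[1, -2, 3], [-1, 2, 4]]
```

In this model, the local crossbar corresponds to the choice of which observed ports are placed in the signals sequence, and the clause list corresponds to the configuration loaded through the scan chain.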
The Boolean result from the configurable pattern detector can be received by a configurable state machine (5606). The configurable state machine is configured with a serial scan chain (5602) and the specific configuration bitstream is generated by the embedded software or external software controlling the system. To create a large span of emulated behavior, the state of the configurable state machine and a set of signals are connected back to the RRN to give an observability access to every external device in connection with the system.
The RRN provides a system clock. It means that the system clock can potentially come from anywhere or anything that is in contact with the network. The same principle holds for logical signals dedicated to control the state of the AND-OR plane (5607) or the pattern detector. This is the key to aggregate multiple hardware assertions checker together to increase arbitrarily the complexity of the assertion checker.
Referring back to
The configurable hardware assertion checker is fully fault tolerant for multiple reasons. First, the configuration system that provides the configuration bitstream to the logical AND-OR plane and pattern detector is fully fault tolerant. Second, the system contains as many CHAC 4602 as the number of cells in the circuit. Because the number of cells is high and many of the cells will not be used, a cell that contains a failed CHAC will simply not be used. In order to be able to avoid the use of a faulty CHAC, one must be able to diagnose the fault. As shown in
All the cells of the configurable interposer contain the same architecture.
Each BIST is composed of a vector generator (LFSR) and a signature analyzer (MISR). The signal generated by the LFSR can be redirected to desired chip pins in a LAIC or in a 3D structure. The LAIC or the configurable interposer MISR can observe any signal in the system with the use of the configurable network. The complexity of the LFSR or the MISR can be enhanced arbitrarily by combining together a large number of programmable cells. BIST for diagnosis such as a walking one sequence can be generated and results interpreted to perform a precise diagnosis by the programmable logical cell embedded in the LAIC or in the configurable interposer.
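The following is a minimal software sketch of a vector generator and a signature analyzer of the kind named above. The 4-bit width, the tap positions (corresponding to the polynomial x^4 + x^3 + 1) and the function names are assumptions made for illustration only; the disclosure does not fix a particular polynomial or register width.

```python
def _feedback(state, taps):
    """XOR of the state bits at the given tap positions (bit 0 = LSB)."""
    bit = 0
    for t in taps:
        bit ^= (state >> t) & 1
    return bit


def lfsr_sequence(seed, taps, width, count):
    """Return `count` successive states of a left-shifting Fibonacci LFSR."""
    mask = (1 << width) - 1
    state = seed & mask
    states = []
    for _ in range(count):
        states.append(state)
        state = ((state << 1) | _feedback(state, taps)) & mask
    return states


def misr_signature(words, taps, width, seed=0):
    """Compress a stream of response words into one signature.

    Each step shifts the register as in the LFSR and XORs the incoming
    response word into the new state.
    """
    mask = (1 << width) - 1
    state = seed & mask
    for word in words:
        state = (((state << 1) | _feedback(state, taps)) ^ word) & mask
    return state
```

A test session would apply the states returned by lfsr_sequence as stimuli, collect the responses of the circuit under test, and compare misr_signature of those responses against the signature of a known-good circuit. With the taps (3, 2) and width 4 assumed here, a non-zero seed cycles through all 15 non-zero states before repeating.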
Some conditions must apply in order to be able to save energy with the DPM design methodology. The first condition is to have components that consume variable power during system operation. The second condition is to predict the future workload of the most power hungry components of the system. The third condition is to be able to achieve such prediction with negligible power consumption. These conditions can be satisfied by observing signals that trigger shut-down or power-up events. Furthermore, it is required to use a Power Manager (PM) implementing the control of shut-down and power-up of components. Such components are called power managed components (PMCs). The set of all control commands for power managed components is called a policy. The PM can be distributed on the whole configurable interposer or the WaferIC. Instead of having a central PM as in the prior art, the logical behaviors of the PM are inserted in the configurable logic included in the configurable interposer or the WaferIC. This is achievable because the external software is in connection with a configurable interposer or a LAIC that has access to all VDD power supply pins of the system. Knowing the correct location of every VDD pin of the system and having the possibility to force a particular voltage level (between 1 V and 3.3 V) on the CMPIO is the key to finding the minimal applicable VDD voltage level on every IC of the system.
A preferred embodiment of the configurable interposer integrates hardware assertion embedded in the configurable interposer to enhance thermal management. Furthermore, the configurable interposer uses the extensive observability on every signal of the system to prototype, validate and fully implement software based thermal management policy.
Preferably multiple features are included in the same configurable interposer, including: (1) integration of hardware assertion embedded in the configurable interposer to enhance thermal management; (2) the configurable interposer using the extensive observability on every signal of the system to prototype, validate and fully implement software based thermal management policy.
The interposer comprises current, voltage and power monitoring of every VDD pin of every uIC deposited on the active surface of the configurable interposer. The current and the voltage are directly measured and the measurements are redirected to an embedded or an external software module. The role of the software module is to analyze the raw voltage and current data and compute the power consumed by every uIC, from which energy efficiency statistics can be gathered in a database and shown to the user.
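The role of the software module can be sketched as follows, assuming a hypothetical sample format in which each measurement is a tuple of uIC name, VDD pin name, measured volts and measured amperes. The tuple layout and the function name power_per_uic are illustrative assumptions.

```python
from collections import defaultdict


def power_per_uic(samples):
    """Sum instantaneous power (volts * amperes) over the VDD pins of each uIC.

    `samples` is an iterable of (uic_name, pin_name, volts, amperes)
    tuples taken at the same instant (format assumed for this sketch).
    Returns a dict mapping each uIC name to its total power in watts.
    """
    totals = defaultdict(float)
    for uic_name, _pin_name, volts, amperes in samples:
        totals[uic_name] += volts * amperes
    return dict(totals)
```

Repeating this computation for each sampling instant would give the per-uIC power series from which the energy efficiency statistics are derived.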
A preferred method to automate the search for the optimal DPM policy is to mix the massive data gathering capacity on power consumption of the electronic system with the possibility to control every PMC of the system. The data gathered on the power consumption fluctuation is stored in the database.
Then the data is analyzed by the software to create a predictive model of future shut down events and future power-on events. The data gathered is not only power data coming from the current consumption, but also signal data coming from every observation pin of the system. The massive quantity of data is analyzed by software running on a high-performance external computer during the design phase. Once the best possible predictive model is found with the computer, the model is expressed in terms of assertions and then synthesized into the configurable assertion checker. The predictive model can be based on statistical analysis of the power data and the digital signal data coming from I/O pins. The innovative aspect consists in implementing the predictive model by means of the massively distributed configurable assertions embedded in the configurable interposer or in the WaferIC.
The next step (step 5903) consists of evaluating the energy efficiency of every chip or die placed on the active surface of the system. This stage is crucial to pinpoint the most important places in time (when) and location (where, on which chips or die) to search for power management policies. Because the search is very time consuming, a heuristic is added to the Predictive Model Search (PMS) algorithm. The search criteria are based on the energy efficiency of every die or uIC deposed on the system. The most logical choice is to search for a predictive model only on the least-energy-efficient component of the system.
The energy efficiency of the component is estimated by combining the power consumption with a metric called the “ExIn” index (for Exchange Intensity index). The ExIn index is computed from the number of data exchanged inward or outward for a component over a small time interval, and the index thus changes over time. The time precision depends on the sampling rate of the signal data captured in the system. The ExIn index can be mixed with the power consumption to get a relatively accurate estimation of the energy efficiency of the component over time. A preferred method to mix power and ExIn data is to create another metric called the EE index, defined as the ExIn index divided by the current power consumption (EE(t)=ExIn(t)/P(t)). If the EE index is high, this means that the energy efficiency of the component is high. The EE index varies over time and if the EE index is relatively low compared to other components or compared to previous time intervals, then the space-time interval can be chosen and placed in a list of data to search for power management policies. Therefore, this stage finds when and where to apply a DPM policy to improve the inefficient part of the system.
Selecting the best DPM policy is the next step (step 5904). This selection can be made by the user, who selects from a library of DPM policies. Each DPM policy is then tested with the reconfigurable logic cell and the reconfigurable network to force signals on the PMC and to observe the system signals that trigger the PMC to shut down or to wake up (step 5905).
The power manager must be able to implement the policies without significant degradation of the system power consumption. In other words, the power consumption required for the power manager to implement the DPM policy must be small enough to be negligible. In order to do that, the number of power managers to implement and synthesize in the configurable interposer or in the WaferIC must be optimized (step 5906).
The last step (step 5907) consists of activating the configurable links between the configurable interposer or the WaferIC and the power manageable components. All the links can be configured by software with a fault tolerant serial communication link such as those previously discussed (CICU or CICB).
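The ExIn and EE indices can be sketched in software as follows. The input format (one pair of inward and outward transfer counts per time interval), the fixed threshold used to flag inefficient intervals and all function names are assumptions for illustration; the text only states that a relatively low EE index marks a candidate space-time interval.

```python
def exin_index(transfer_counts):
    """ExIn per interval: data items exchanged inward plus outward."""
    return [inward + outward for inward, outward in transfer_counts]


def ee_index(exin, power):
    """EE(t) = ExIn(t) / P(t) for each interval; each P(t) must be > 0."""
    return [e / p for e, p in zip(exin, power)]


def inefficient_intervals(ee_by_component, threshold):
    """List (component, interval) pairs whose EE index is below threshold.

    `ee_by_component` maps a component name to its EE series. A fixed
    threshold is an assumed stand-in for "relatively low" in the text.
    """
    return [
        (name, t)
        for name, series in ee_by_component.items()
        for t, value in enumerate(series)
        if value < threshold
    ]
```

The pairs returned by inefficient_intervals correspond to the "when" and "where" on which the search for power management policies would then concentrate.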
The main benefit of using an array of CMPIO and a configurable network on chip is to gain an observability of every I/O of the system. This is the foundation for generating assertion-based management policies. As shown in
In order to find those assertions, embedded or external software (shown as 6002 in
The shut-down and wake-up events must be automatically generated from the observed data. While many methods to accomplish this are possible, the preferred method is to use regression analysis. The workload is defined as the total computing done in a small interval of time. The workload of the whole system can fluctuate around an average value with more or less time variation. The workload can be defined for a single component. In that case, the workload can drop down to zero during a non-negligible time interval. During such events a shut down applied to this component can be forced without compromising the system functionality. To detect this kind of event, a correlation must exist between the signals exchanged between components and a particular time interval with zero workload. In other words, a series of precursor signals must be detected before a shut down takes place. Therefore, the algorithm consists of finding such correlations. The same principle holds in finding correlations between a series of precursor signals and a wake-up event (step 6104) as stated in
This methodology can be fully automated and installed in the OS of the whole system (configurable interposer and 3D stack chip). Therefore, this design flow is the basis to create an adaptive DPM policy. The assertions can be changed “on the fly” to reflect new workload patterns observed by the software installed in the OS of the system.
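The search for precursor signals can be illustrated with a deliberately simplified sketch. Instead of the regression analysis named as the preferred method, it uses a plain hit-rate score: the fraction of assertions of a candidate signal that are followed, a fixed number of samples later, by the start of a zero-workload run. The sample-based data format, the lead and min_length parameters and the function names are all assumptions.

```python
def idle_starts(workload, min_length):
    """Indices where a run of at least `min_length` zero-workload samples begins."""
    starts = []
    t = 0
    n = len(workload)
    while t < n:
        if workload[t] == 0:
            end = t
            while end < n and workload[end] == 0:
                end += 1
            if end - t >= min_length:
                starts.append(t)
            t = end
        else:
            t += 1
    return starts


def precursor_score(signal, workload, lead, min_length):
    """Fraction of assertions of `signal` followed by an idle run `lead` samples later.

    A score close to 1.0 suggests the signal is a usable precursor of a
    shut-down opportunity; 0.0 is returned if the signal never asserts.
    """
    starts = set(idle_starts(workload, min_length))
    asserted = [t for t, bit in enumerate(signal) if bit == 1]
    if not asserted:
        return 0.0
    hits = sum(1 for t in asserted if t + lead in starts)
    return hits / len(asserted)
```

Signals with a high score would be candidates for translation into assertions. The same scoring could be applied to wake-up events by replacing idle_starts with a function that returns the indices where the workload resumes.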
The configurable interposer or the WaferIC can force a specific voltage level on every uIC pin dedicated to power. This is possible through the regular array of CMPIO.
The ability to force voltage to supply current to any VDD pin is the key to automatically adjusting the appropriate voltage according to the uIC's specifications. It is possible to gain a non-negligible amount of energy saving in the whole system by minimizing the applied voltage level on every VDD of the system.
The algorithm disclosed in
The first step (step 6201) is to create from a database a list of all the power rails of every IC under prototyping. The second step (step 6202) is to configure the interposer or WaferIC to find the minimum VDD for each component. The following convention is used: the list of all ICs in the system is uICL and the list of all power rails of the current uICL[i] is PR, which is created in step 6203. The next step (step 6204) is to initialize the VDD voltage level to the nominal value stated in the specification documents. The voltage is then slightly decremented (step 6205). Then, the whole system is tested with automated and auto-validating test cases (step 6206). As long as the test cases pass, the VDD of the power rail under minimization is slightly decremented again. If a test case does not pass, the minimal voltage level for the power rail has been found and corresponds to the level applied in the previous (i-1) search iteration on the whole system (6207). The same process is repeated on each power rail of the system. Consequently, this algorithm automates the search for the minimal voltage on every power rail of the system and thereby accelerates the whole design flow.
To limit peak temperature in a 3D chip stack, dynamic thermal management can be integrated in the system. Such techniques can be implemented in 2D chips with dynamic frequency and voltage scaling. The same set of techniques can be implemented in 3D. Because there is a strong correlation between chips and stacked chips, a configurable interposer can assume the role of dynamical thermal manager.
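The search loop of steps 6204 to 6207 can be sketched as follows. The callback passes_tests stands in for the automated, auto-validating test cases and is hypothetical, as are the step size, the lower voltage bound and the dictionary format used to describe the rails. The sketch also simplifies the procedure by testing each rail independently.

```python
def find_min_vdd(nominal, step, floor, passes_tests):
    """Lower VDD from `nominal` in `step` decrements until the tests fail.

    Returns the last voltage at which `passes_tests(vdd)` returned True,
    never going below `floor`. The nominal voltage is assumed to pass.
    """
    steps_taken = 0
    last_good = nominal
    while True:
        # Recompute from nominal each time to limit floating-point drift.
        candidate = round(nominal - (steps_taken + 1) * step, 6)
        if candidate < floor or not passes_tests(candidate):
            return last_good
        last_good = candidate
        steps_taken += 1


def minimize_all_rails(rails, step, floor, passes_tests):
    """Apply the search to every rail.

    `rails` maps a (uic_name, rail_name) key to its nominal VDD, and
    `passes_tests(key, vdd)` runs the test cases with that rail at `vdd`.
    """
    return {
        key: find_min_vdd(nominal, step, floor,
                          lambda vdd, k=key: passes_tests(k, vdd))
        for key, nominal in rails.items()
    }
```

For example, with a nominal value of 1.2 V, a 0.05 V step and a test suite that begins to fail below 1.08 V, the search stops at 1.10 V, which is the level of the last passing iteration.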
An array of temperature sensors is embedded in the configurable interposer. Data obtained from these sensors can then be gathered for software based quantitative evaluations of the effectiveness of the implemented DTM policy. The configurable and interconnected logic cells can be the base from which the thermal management policy is executed and controlled. On the other hand as depicted in
A current sensor, a voltage sensor and/or a power sensor is associated with every surface contact support circuitry 9501 and a dedicated analog-to-digital circuit converts the sensor's analog output to a digital signal as stated in the previous preferred embodiments. The digital signal can be redirected to any internal memory, internal controller or external controller for analysis. The sensor(s) can also be used for evaluating the energy efficiency of the integrated circuit component or the signal integrity at the surface contact.
Fifth Family of Preferred Embodiments: Mosaic of Miniature Printed Circuit Board for Mechanical Support and Power of Large Area Integrated Circuits
An aspect of the present invention is the stacking of different mechanical and electrical layers to support a LAMS device. This stacking architecture acts firstly as a flat and stable mechanical support for very fragile LAMS devices. Secondly, it supplies power and signals to LAMS devices using only one side of the LAMS device. A layered arrangement of different layers allows supporting sub-micrometer devices with existing improved millimeter systems and technologies. The invention is a hierarchical layered system from mechanical and electrical points of view. The invention structure is described as follows and illustrated in
The main structure of the invention supports any mechanical and electrical devices needed for the LAMS application. The support frame 6002 is the first level of the hierarchical structure. It acts as the LAMS device mechanical support and supplies electrical power and signals to LAMS devices. The power circuitry of the support frame is similar to a power supply unit in an electronic system. It is designed to provide stable voltage(s) and high current to LAMS devices. Typical LAMS power ranges are from 300 W to 1000 W, depending on the current capability needed by the LAMS application. The support frame can be a multi-layer printed circuit board or any multi-layer thin or thick film technologies with or without common electronic components (ICs, passive devices, connectors, etc.).
The multilevel structure is made of different materials with their own properties, specifically different coefficients of thermal expansion (CTE, also written TCE below). These differences could induce thermal stresses, distortion, and warping that could cause problems if not managed properly. The power used in any layer generates heat, and the LAMS device could fail prematurely due to thermal expansion. LAMS devices are indeed very sensitive to mechanical stresses (such as bending or pressure) and to thermo-mechanical stresses.
Some pre-processing and post-processing technologies that can be used to interconnect LAMS devices (6001) to the support substrate (6002), such as through-wafer-via or through-silicon-via, could significantly increase the sensitivity to mechanical stresses.
The support substrate (6003) is designed in order to get a surface as flat as possible under the LAMS device (6001) to minimize mechanical stresses. The support substrate (6003) is also made as thin as possible to maximize heat transfer between LAMS devices (6001) and the large and robust main mechanical support (6002).
The TCE of each layer in the multi-layer structure is made as close as possible to the TCE of the LAMS devices.
But even if perfectly TCE-matched materials are used for each layer, it is impossible to avoid temperature differences between layers. It is therefore impossible to avoid thermo-mechanical stresses if two layers are rigidly attached together.
In the present invention, mechanical stresses are reduced by ensuring that a layer attached to another is made as an array of mechanically independent devices. The devices in this array must be small enough to reduce the stress to a tolerable level in the X and Y directions.
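The reasoning above can be made concrete with a first-order estimate: the free (unconstrained) length mismatch between two layers grows linearly with the attached span, so splitting a layer into small independent devices shrinks the mismatch each attachment must absorb. The sketch below is an illustrative calculation, not part of the described embodiments. It uses the silicon TCE of 2.6×10⁻⁶ K⁻¹ quoted in this description, while the 200 mm span, the 10 mm device size and the temperature rises are assumed values.

```python
def differential_expansion_um(length_mm, tce_a, tce_b, dt_a_k, dt_b_k):
    """Free length mismatch, in micrometres, between two layers of equal span.

    length_mm is the attached span; tce_a and tce_b are the expansion
    coefficients (per kelvin); dt_a_k and dt_b_k are the temperature rises
    of each layer in kelvin.
    """
    return length_mm * 1e3 * abs(tce_a * dt_a_k - tce_b * dt_b_k)


SI_TCE = 2.6e-6  # per kelvin, silicon

# Two perfectly TCE-matched layers with a 20 K temperature difference (assumed).
full_span = differential_expansion_um(200.0, SI_TCE, SI_TCE, 60.0, 40.0)  # about 10.4 um
tile_span = differential_expansion_um(10.0, SI_TCE, SI_TCE, 60.0, 40.0)   # about 0.52 um
```

Under these assumptions, a rigid attachment across the full 200 mm span would have to absorb roughly twenty times the mismatch seen by one 10 mm device, even though both layers have identical TCEs.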
Preferred methods for the main support structure are illustrated in
In the first method, as shown in
The second method,
The interfaces between the support substrate (
The interface substrate must be able to compensate for the TCE mismatch between the main mechanical support and the LAMS devices, but must also ensure maximum mechanical stability of the whole system.
The connections between the interface substrate and the LAMS device can be made with solder balls to hold it and make electrical connections to it as shown in
Another method for the LAMS interface is preferred when the active side of the LAMS device must be freed up for the application, such as described in U.S. Ser. No. 11/611,263.
One of the most important reliability issues of LAMS-scale packaging is the thermo-mechanical stress caused by the mismatch of the coefficients of thermal expansion (TCE) between the LAMS device and the main support substrate. This thermo-mechanical stress can be reduced by using an interface substrate material whose TCE matches that of the LAMS device (AlN, Si or GaAs) or one that is able to compensate for the TCE difference. Thermo-mechanical stress in a LAMS application can lead to breakage of the LAMS device.
The interface substrate can be made with a material that has a TCE equal or nearly equal to that of the LAMS device. For instance, silicon has a TCE of 2.6×10⁻⁶ K⁻¹; silicon, alumina silicate glasses (TCE of 2.9×10⁻⁶ K⁻¹) or aluminum nitride (TCE of 4.5×10⁻⁶ K⁻¹) can be used for the interface substrate, but such substrates are extremely expensive and not suitable for high-volume products. Note that even perfectly TCE-matched layers will develop mechanical stress if their respective temperatures are different.
An alternative method to minimize both the cost and the thermal stress on the LAMS device during its operation is to split its support substrate into a mosaic of cheaper micro-substrates as shown on
A more detailed illustration is given on
The interface substrate, once fixed and connected to the LAMS device, forms what is defined as the packaged LAMS. The packaged LAMS has to be placed on the main system support 6302.
The packaged LAMS can be fixed and connected to the main support, but it can also simply be laid on the main support substrate so that it can slide on this surface to compensate for X and Y TCE mismatches, as described in [000486].
If the packaged LAMS is fixed on the main substrate support, the connections and attachment are made with classic PCB or packaging soldering techniques (solder balls and metal lines).
If the packaged LAMS is only laid on the main substrate support and must slide on it, as 6801, the electrical connections are made by face-to-face metal rails on the backside of the packaged LAMS and the topside of the main substrate support.
To give the set of substrates more mechanical and thermo-mechanical stability during the mounting steps, an interposer layer (7105) can also be added between the micro-substrate array (7101) and the heatsink (7103).
If the LAMS device can be split into several parts (typically identical) that interact only electrically, an alternative assembly of the whole system is preferred. The LAMS device can be diced into an array of identical cells or other parts. This array is placed and/or fixed on a substrate support as depicted in
A more detailed view is given in
Sixth Family of Preferred Embodiments: Distributed and Fault Tolerant Power Supply for Large Area Integrated Circuits
Another aspect of the present invention is the ability to power any LAMS device. A hierarchical and distributed architecture of a programmable power supply voltage regulator is proposed to satisfy LAMS device power requirements. The global architecture of the power supply system is depicted on
The hierarchical architecture is well suited to distributing power efficiently over the whole LAMS area. Depending on the LAMS application, the large area of the system requires a power supply distribution strategy that is as robust as possible, in order to provide voltage sources that are as stable, homogeneous and fast in response as possible. The first level of hierarchy (1) feeds a second, distributed level (3) through dedicated interconnections (2). The second level (3) feeds a third, more distributed level (6), also through dedicated interconnections (5), which then reaches the entire LAMS area (7). This power supply tree architecture is generic and can be used in all systems where voltage/current sources have to be distributed homogeneously in space and time (over an area or a volume).
A preferred implementation of this hierarchical and distributed power supply system is given in the following parts.
The first and main stage (1) of the power supply architecture is similar to a computer power supply unit. It is designed to convert 100-120 V (North America and Japan) or 220-240 V (Europe, Africa, Asia and Australia) AC power from the mains to usable low-voltage DC power for the LAMS application. Typical power ranges are from 300 W to 1000 W, depending on the voltage and the current capability needed by the wafer-scale system. All circuitry lies on the backside or main PCB shown in
Most AC-DC and DC-DC converters have decoupling capacitors and inductors to enhance their dynamic performance. Those capacitors have to be as close as possible to the application to avoid long power lines and the related electromagnetic issues. Large decoupling capacitances are placed directly on the PCB-wafer interface, depending on the adopted wafer supporting strategy.
Power and ground connections with the LAMS device are made with solder balls. The distribution of those power and ground solder balls is important to minimize electromagnetic effects between them and to enhance the power supply performance. The power (7501) and ground (7502) balls are equally distributed as depicted on
In the particular case where the active side of the wafer-scale device is useful, through-LAMS vias (TLVs) are distributed over the whole surface of the LAMS device, following the same distribution described in the previous paragraph.
To distribute the power to the whole LAMS device area, classical techniques for integrated circuits are used. Typically, power distribution within an integrated circuit is done from the top-level metal layer, which is connected to the package, down through inter-layer vias and finally to the active devices, as illustrated in
To power wafer-scale applications that need large currents and to avoid electro-migration issues, the power and ground grids can be strengthened with post-processed metal layers, deposited on the topside or the backside of the LAMS device by using standard Wafer Level Packaging (WLP) processes or redistribution layers.
If very robust and stable power supply voltages are needed by the LAMS application, other levels of hierarchy (5-6) can be embedded in the LAMS device (4) in the schematic given in
A first possibility to improve the power supply system capabilities is to embed local and fast power supply regulators in the LAMS application to provide stable strong currents very close to the final application.
The architecture of the embedded voltage regulators is also a hierarchical architecture and is depicted in
Each master stage can contain an accurate voltage sensor. The measured voltage is converted into digital data and then sent to the global system control stage. An accurate and real-time power supply voltage map of the wafer-scale device can be elaborated from the data provided by the voltage sensor network.
The real-time LAMS surface voltage sensing is useful to adequately control each block of the LAMS power supply chain in order to get the best electrical response of the system to any power supply requirements and/or constraints.
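A minimal sketch of how a control stage might assemble such a voltage map from digitized sensor readings and flag regions that need attention is given below. The data layout (readings keyed by row and column), the function names and the numeric values are assumptions made for illustration; the description does not specify them.

```python
def voltage_map(readings, rows, cols):
    """Arrange {(row, col): volts} sensor readings into a rows x cols grid.

    Positions without a reading are left as None.
    """
    grid = [[None] * cols for _ in range(rows)]
    for (r, c), volts in readings.items():
        grid[r][c] = volts
    return grid


def out_of_tolerance(grid, nominal, tolerance):
    """List (row, col, volts) for every reading farther than tolerance from nominal."""
    return [(r, c, v)
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if v is not None and abs(v - nominal) > tolerance]


# Hypothetical readings from four master stages of a 1.0 V supply.
readings = {(0, 0): 1.00, (0, 1): 0.93, (1, 0): 1.02, (1, 1): 0.99}
flagged = out_of_tolerance(voltage_map(readings, 2, 2), nominal=1.0, tolerance=0.05)
```

The flagged list identifies the sites whose local regulators the control stage could then adjust.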
The programmable voltage reference has to provide a stable and programmable voltage depending on the LAMS requirements. Different microelectronic (LMOS, CMOS, biCMOS, bipolar) circuitries can be designed to meet those requirements.
Another way to stabilize the power supply voltage on the LAMS surface is to add a level of integrated passive devices.
Decoupling capacitors can be placed on the surface of the LAMS device. Wafer Level Packaging and Integrated Passive Device post-processing steps allow passive devices, such as capacitors, to be deposited on a semiconductor surface. Using post-processed MEMS switches, those capacitors can be connected to, or left disconnected from, the LAMS device power lines to enhance its power capability.
A large ground plane can be deposited on the LAMS surface (WLP technology) to enhance the electromagnetic behavior of the whole system. Distributed MEMS switches on the LAMS surface allow any LAMS point to be connected to a clean ground. This configurable network of Kelvin ground points is useful for electromagnetically sensitive systems or for high-power systems.
The present invention provides a configurable network of passive devices. With this network, any contact of the LAMS device can be connected to passive devices such as a resistor, capacitor, inductor or ground point. This possibility can be used to strengthen the power supply capability of the LAMS application. It is also useful for adapting the impedance of certain kinds of electrical signal paths. This network can also be used to clamp any electrical signal of the LAMS device to a fixed voltage (ground or power voltage, for instance). This network is externally programmable and can be configured ‘on the fly’ during the operation of the LAMS application.
This configurable passive device network is implemented with a superposition of post-processing layers and micro/nano scale technologies deposited on the LAMS application itself. A collection of Integrated Passive Devices is distributed on the LAMS surface. Any device can be connected to neighboring dedicated nodes of the LAMS application with programmable MEMS switches as depicted on
The distributed network of MEMS switches (3) ensures low-resistance electrical paths between the passive devices and some dedicated LAMS device nodes (2).
The passive device network is obtained by using classical post-processed integrated passive device technologies (IPD).
The three principal classes of integrated passive component technologies that are available today include thin-film technology, low-temperature co-fired ceramic (LTCC) technology, and technologies based on extensions of high-density interconnection (HDI) and other printed circuit board (PCB) technologies. The HDI and PCB technologies are most commonly employed in digital applications, where distributed capacitance and medium precision pull-up resistor functions can be realized at reasonable yield and cost. Of the technologies suited for RF integration, the thin-film integrated passive technologies generally provide the level of precision, range of component values, and functional density to allow a more integrated, smaller, and lighter implementation of a given RF function.
A collection of metal and/or polysilicon resistors with different resistance values is distributed on the whole wafer surface.
A collection of metal capacitors with different capacitance values is distributed on the whole wafer surface.
A collection of metal inductors with different inductance values is distributed on the whole wafer surface.
A low-impedance ground grid is implemented with WLP processes on the topside or backside of the application. Distributed and configurable MEMS switches can connect any node of the wafer-scale application to the clean ground plane. This configurable ground network enhances the power and EMI characteristics of the system.
An integrated circuit component is typically connected to other components through its surface contacts, as shown in
The CMPIOs have their own programmable analog and digital circuitries that allow powering many different electrical devices. The output power supply voltage can be externally controlled and is also regulated.
Distributed voltage regulator support circuitry can be activated to feed power to any integrated circuit components connected to its surface contact. This distributed voltage regulator support circuitry has a hierarchical structure similar to that in
Each master stage can contain an accurate voltage sensor. The measured voltage is converted into digital data and then sent to the global system control stage. An accurate and real-time power supply voltage map of the integrated circuit component can be elaborated from the data provided by the voltage sensor network.
The real-time voltage sensing of one or more integrated circuit surface contacts is useful to adequately control each block of the power supply chain of integrated circuit components in order to get the best electrical response to any power supply requirements and/or constraints.
The programmable voltage reference has to provide a stable and programmable voltage depending on the integrated circuit requirements. Different microelectronic (LMOS, CMOS, biCMOS, bipolar) circuitries can be designed to meet those requirements.
Seventh Family of Preferred Embodiments: Thermo-Mechanical Stability in LAIC (Large Area Integrated Circuit) Systems
One of the main aims of the invention is to limit as much as possible thermal and pressure stresses on the supported LAMS device. Those thermal effects can have disastrous consequences on the LAMS application. An object of the invention is a hierarchical and distributed thermal regulation system.
Thermal and pressure sensors are embedded and distributed over the whole LAMS surface. Those thermal and pressure sensors can be made by using different technologies, depending on the LAMS application technology used. The measured temperatures and pressures are converted into digital data and then sent to the global system control stage. Accurate, real-time thermal and pressure maps of the LAMS device can be elaborated from the data provided by the thermal and pressure sensor network.
Programmable thermal heaters and coolers are embedded and distributed over the whole LAMS surface. Those heaters and coolers can be made by using different technologies, depending on the LAMS application technology used.
The LAMS distributed thermal sensor and generator networks are directly linked to the system control, which can be local or global. Thermal and pressure mechanisms are very slow physical phenomena and can be regulated and controlled with a real-time software approach. Dangerous temperatures and pressures are detected and their potential consequences are avoided by controlling the thermal generator networks adequately to reduce the differential temperature or, in the worst case, by switching off the LAMS device and its components.
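One possible shape for such a software regulation step is sketched below, assuming a simple rule: switch off when any site reaches a shutdown temperature, otherwise drive local heaters and coolers to pull each site toward the mean. The rule, the thresholds and all names are hypothetical and only illustrate the idea of reducing the differential temperature.

```python
def regulate(temps_c, shutdown_c=110.0, max_spread_c=5.0):
    """Return ("shutdown", {}) or ("regulate", {site: command}).

    temps_c maps a sensor site to its temperature in Celsius. A site more
    than max_spread_c / 2 above the mean gets "cool"; a site more than
    max_spread_c / 2 below the mean gets "heat"; other sites get no command.
    """
    if max(temps_c.values()) >= shutdown_c:
        return "shutdown", {}
    mean = sum(temps_c.values()) / len(temps_c)
    half = max_spread_c / 2.0
    commands = {}
    for site, temp in temps_c.items():
        if temp - mean > half:
            commands[site] = "cool"
        elif mean - temp > half:
            commands[site] = "heat"
    return "regulate", commands
```

Because thermal phenomena are slow, a step like this could be run periodically by the local or global system control on each new temperature map.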
Eighth Family of Preferred Embodiments: Differential Electrical Signal Propagation in Integrated Circuit Networks with Configurable Pair Location
An object of the invention is a smart CMOS module that is useful to support all described functionalities. This CMOS circuit is a Configurable Multi-Purposes IO (CMPIO). The output stage of the described module is fully configurable and is able to realize many functions with the same device. The use of the same output stage for different functions allows minimizing the silicon area needed for this smart Configurable Multi-Purposes IO. The output stage is a combination of PMOS and NMOS transistors.
The common functionalities of the CMPIOs are given below. CMPIOs have their own programmable analog and digital circuitries that support many single-ended digital Input/Output standards (CMOS, TTL). The output or input voltage and the output and input impedances can be externally controlled.
CMPIOs have their own programmable analog and digital circuitries that support many differential digital Input/Output standards. The output or input voltages and currents and the output and input impedances can be externally controlled.
The CMPIOs also include original features. They have their own programmable analog and digital circuitries that allow powering many different electrical devices. The output power supply voltage can be externally controlled and is also regulated.
A fault-tolerant RRN propagates single-ended digital signals over the wafer-scale application surface (referred to as WaferNet™).
CMPIOs are able to support differential signaling with the particularity that the complementary pair of differential nodes can be placed anywhere on the LAMS application surface. To support this spatial uncertainty, a dedicated configurable differential signaling structure is described below.
CMPIOs can drive and can be driven by configurable input/output balanced H-tree networks called WaferDiffNet™.
WaferDiffNet is a hierarchical configurable input/output H-tree network that propagates balanced differential signals from CMPIOs to RRN or from RRN to CMPIOs. It can be considered as a differential signal ‘window’ on the wafer surface, that can be resized or moved depending on the differential signal ball locations.
A cell-based hierarchical approach is used to simplify the physical implementation of such complex balanced configurable H-tree networks on a wafer-scale application. The edge length of a unit square cell tiled across the full wafer-scale active surface is denoted Lcell.
A four hierarchical level WaferDiffNet logical structure is depicted on
The 4 level WaferDiffNet can support differential IOs separated by a distance ranging from a minimum of √2·Lcell to a maximum of 4·√2·Lcell, depending on the differential IO placements and orientations.
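As a worked illustration of this range, the sketch below checks whether two contact positions fall within the stated reach. It assumes the separation is measured as the straight-line distance between the two contacts, which the description does not state explicitly; the function names are hypothetical.

```python
import math


def four_level_reach(l_cell):
    """Return the (minimum, maximum) pair separation for the 4-level network."""
    return math.sqrt(2) * l_cell, 4 * math.sqrt(2) * l_cell


def pair_supported(p, q, l_cell):
    """True if contacts p and q (x, y in the same unit as l_cell) are within reach.

    A tiny relative margin keeps floating-point rounding from rejecting
    pairs that sit exactly on a bound.
    """
    lo, hi = four_level_reach(l_cell)
    distance = math.hypot(q[0] - p[0], q[1] - p[1])
    return lo * (1 - 1e-9) <= distance <= hi * (1 + 1e-9)
```

For example, with Lcell taken as the length unit, contacts at (0, 0) and (2, 2) are about 2.83 apart and fall inside the range, while contacts at (0, 0) and (1, 0) are closer than √2·Lcell and do not.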
Each stage of the output WaferDiffNet™ is configurable and can propagate or not a single ended digital signal to the 4 connected following stages as a digital de-multiplexer. Classical three-state buffers or inverters can be used to implement fast digital de-multiplexers for each stage of the output WaferDiffNet.
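The behavior of a chain of such 1-to-4 de-multiplexing stages can be sketched as follows. This is a behavioral model only: the number of stages in the chain, the numbering of branches and leaves, and the use of None to mean "do not propagate" are assumptions made for the example.

```python
def demux_enables(select):
    """One-hot enables for the four three-state buffers of one 1-to-4 stage.

    select is a branch index 0..3, or None to block the signal at this stage.
    """
    return [select == branch for branch in range(4)]


def route(selects):
    """Follow one select per stage from the root of the tree.

    Returns the index of the leaf that is driven (leaves numbered in base 4
    along the path), or None if any stage blocks the signal.
    """
    leaf = 0
    for select in selects:
        if select is None:
            return None
        if select not in range(4):
            raise ValueError("select must be 0..3 or None")
        leaf = leaf * 4 + select
    return leaf
```

For instance, route([1, 0, 3]) drives leaf 19 of a three-stage chain, and any None in the list leaves every leaf undriven.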
Metal interconnections between each stage are regular and, for delay dispersion and jitter considerations, are implemented using the top-level metal layers of the CMOS technology used.
The three-state buffers used in each stage are well sized and balanced considering their loads, especially the metal line interconnection lengths and capacitances, in order to be able to propagate high-speed digital signals.
Each stage of the input WaferDiffNet is configurable and can propagate or not analog signals. Analog multiplexers coupled with differential to single-ended signal converters are used in each stage of the input WaferDiffNet.
Each stage of the input WaferDiffNet can be set in a low-power mode by an external configuration in order to minimize the whole structure power consumption.
Considering the 4 level WaferDiffNet depicted on
The first stage of the 4 level WaferDiffNet is only a 4-to-1 analog multiplexer (8003) that propagates or not an analog signal from CMPIOs at its inputs (8002) to the second stage input (8006) depending on the external configuration (8004-8005).
Considering the 4 level WaferDiffNet depicted on
The second stage of the 4 level WaferDiffNet is a 4-to-1 analog multiplexer (8103) that propagates or not analog signals coupled with a configurable differential to single ended converter (8107).
The 4-to-1 analog multiplexer (8103) of the second stage allows propagating or not analog signals from the outputs of the first stages (8102) to the third stage input (8106) depending on the external configuration (8104-8105).
The configurable differential to single ended converter (8107) of the second stage can select a pair of differential signals provided by the previous stages (8102) between 4 pair possibilities and then transform them into a single ended digital signal that is directly sent to the global WaferNet™ (8108).
The third stage of the 4-level WaferDiffNet can also be depicted on
The 4-to-4 analog multiplexer (8203) of the third stage allows propagating or not analog signals from the outputs (8202) of the second stages to inputs (8206) of the fourth stages depending on the external configuration (8204-8205). The possibility to address 4 different fourth stages around a third one allows the configurable network to cover the whole wafer area and to support all differential signal ball pitches. The differential configurable network ‘window’ can slide with a step of a half ‘window’.
The configurable differential to single ended converter (8207) of the third stage can select a pair of differential signals (8202) provided by the previous stages between 12 pair possibilities and then transform them into a single ended digital signal that is directly sent to the global WaferNet™(8208).
The fourth stage of the 4 level WaferDiffNet is a configurable differential to single ended converter and is depicted on
The configurable differential to single ended converter (8213) of the fourth stage can select a pair of differential signals provided by the previous stages (8212) between 12 pair possibilities depending on the configuration (8214-8215) and then transform them into a single ended digital signal that is directly sent to the global WaferNet™ (8216).
CMPIOs are also able to support differential signaling with the particularity that the complementary pair of differential nodes can be placed at any surface contact of the integrated circuit component.
CMPIOs can drive and can be driven by configurable input/output balanced H-tree networks called DiffNet.
A digital network (DN) inside the integrated circuit component allows propagating single ended digital signals on the integrated circuit application surface. The integrated circuit can have any size up to a full wafer.
DiffNet is a hierarchical configurable input/output H-tree network that propagates balanced differential signals from CMPIOs to DN or from DN to CMPIOs. It can be considered as a differential signal ‘window’ on the integrated circuit surface that can be resized or moved depending on the differential signal surface contact locations.
A cell-based hierarchical approach is used to simplify the physical implementation of such complex balanced configurable H-tree networks on the integrated circuit component. The edge length of a unit square cell tiled across the integrated circuit surface is denoted Lcell.
In a preferred embodiment, a four hierarchical level DiffNet logical structure is depicted in
The 4 level DiffNet can support differential IOs (differential surface contacts) separated by a distance ranging from a minimum of √2·Lcell to a maximum of 4·√2·Lcell, depending on the differential IO placement and orientation.
Each stage of the output DiffNet is configurable and can propagate or not a single ended digital signal to the 4 connected following stages as a digital de-multiplexer. Classical three-state buffers or inverters can be used to implement fast digital de-multiplexers for each stage of the output DiffNet.
In the proposed configuration, metal interconnections between each stage are regular and, for delay dispersion and jitter considerations, can be implemented using the top-level metal layers of the CMOS technology used.
The three-state buffers used in each stage are well sized and balanced considering their loads, especially the metal line interconnection lengths and capacitances, in order to be able to propagate high-speed digital signals.
Each input stage of DiffNet is configurable and can propagate or not analog signals. Analog multiplexers coupled with differential to single-ended signal converters are used in each stage of the input DiffNet.
Each stage of the input DiffNet can be set in a low-power mode by an external configuration in order to minimize the power consumption of the whole structure.
Considering the 4 level DiffNet depicted on
The first stage of the 4 level DiffNet is only a 4-to-1 analog multiplexer (8003) that propagates or not an analog signal from CMPIOs at its inputs (8002) to the second stage input (8006) depending on the external configuration (8004-8005).
Considering the 4 level DiffNet depicted in
The second stage of the 4 level DiffNet is a 4-to-1 analog multiplexer (8103) that propagates or not analog signals coupled with a configurable differential to single ended converter (8107).
The 4-to-1 analog multiplexer (8103) of the second stage allows propagating or not analog signals from the outputs of the first stages (8102) to the third stage input (8106) depending on the external configuration (8104-8105).
The configurable differential to single ended converter (8107) of the second stage can select a pair of differential signals provided by the previous stages (8102) between 4 pair possibilities and then transform them into a single ended digital signal that is directly sent to the global DN (8108).
A preferred structure for the third stage of the 4-level DiffNet is also depicted in
The 4-to-4 analog multiplexers (8203) of the third stage allow propagating or not analog signals from the outputs (8202) of the second stages to inputs (8206) of the fourth stages depending on the external configuration (8204-8205). The possibility to select 4 different fourth stages around a third one allows the configurable network to cover the whole wafer area and to support all differential signal ball pitches. The differential configurable network ‘window’ can slide with a step of a half ‘window’.
The configurable differential to single ended converter (8207) of the third stage can select a pair of differential signals (8202) provided by the previous stages between 12 pair possibilities and then transform them into a single ended digital signal that is directly sent to the global DN (8208).
The fourth stage of the 4 level DiffNet is a configurable differential to single ended converter depicted in
The configurable differential to single ended converter (8213) of the fourth stage can select a pair of differential signals provided by the previous stages (8212) between 12 pair possibilities depending on the configuration (8214-8215) and then transform them into a single ended digital signal that is directly sent to the global DN (8216).
Ninth Family of Preferred Embodiments: Propagation of Analog Signals on a Digital Interconnect Network and Support for Analog Signals
To fulfill the need for reprogrammable circuits carrying analog signals, the present invention discloses converting one or more analog signals to one or more digital signals or quantities that can reliably be propagated through a reprogrammable digital network, and then converting the signal back to analog at the destination. Any known conversion technique can be implemented. In a preferred embodiment of the invention, an analog signal is converted to a digital signal or quantities that can reliably be propagated through embedded digital interconnects, and then converted back to an analog signal at the destination.
Any conversion technique can be implemented. One way to obtain that functionality is to embed Analog-to-Digital or Digital-to-Analog converters to convert the signals from analog to digital and vice versa.
Another technique is to use a voltage-controlled oscillator (VCO) [37-39] to convert the analog signal into a digital stream or a signal whose frequency varies with the magnitude of the analog input signal. A frequency-to-analog conversion is then done at the destination.
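The principle can be sketched numerically with an idealized model: a linear VCO at the source, and a counter with a fixed gate time at the destination that turns the frequency back into a value. The linear VCO law, the centre frequency, the gain and the gate time below are assumptions chosen for illustration, not parameters from the description.

```python
def vco_frequency_hz(volts, f0_hz=1.0e6, gain_hz_per_v=1.0e6):
    """Ideal linear VCO: output frequency rises linearly with the input voltage."""
    return f0_hz + gain_hz_per_v * volts


def count_cycles(freq_hz, gate_s):
    """Whole cycles seen by a counter during the gate time."""
    return int(freq_hz * gate_s)


def recover_volts(count, gate_s, f0_hz=1.0e6, gain_hz_per_v=1.0e6):
    """Invert the VCO law from a cycle count taken over gate_s seconds."""
    return (count / gate_s - f0_hz) / gain_hz_per_v


gate_s = 1.0e-3
count = count_cycles(vco_frequency_hz(0.4321), gate_s)
estimate = recover_volts(count, gate_s)
```

With these values the counter resolves one cycle per millisecond, so the recovered value is accurate to within 1 mV of the 0.4321 V input; a longer gate time trades bandwidth for resolution.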
The same conversion principle can be applied with delta-sigma modulation [37-39].
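A first-order delta-sigma modulator illustrates this: the analog value is carried by the density of ones in a one-bit stream, which a digital network can propagate, and a simple average at the destination recovers it. The sketch below is a textbook first-order loop with an input range of -1 to +1 and an averaging decoder, both assumed for the example rather than taken from the description.

```python
def delta_sigma_modulate(samples):
    """First-order delta-sigma modulator for inputs in the range [-1, 1].

    Returns a list of bits (0 or 1); the running integrator accumulates the
    difference between the input and the previous output level (+1 or -1).
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def delta_sigma_decode(bits):
    """Recover the average input by averaging the +1 / -1 output levels."""
    return sum(2 * b - 1 for b in bits) / len(bits)


bits = delta_sigma_modulate([0.25] * 1000)
estimate = delta_sigma_decode(bits)
```

For a constant input, the averaging error of this loop shrinks in proportion to the number of bits averaged, so the 1000-bit estimate here lands close to 0.25.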
Another invention is to propagate an analog signal between two I/Os with one or several metal grids (typically used for power) coupled with large transmission gates.
Once Analog-to-Digital or Digital-to-Analog converters are introduced in a programmable or configurable fabric such as the one proposed in this invention, in addition to propagating analog quantities from one I/O to another, it becomes possible to convert an analog quantity that originates from an integrated system, or to drive some internal node with an analog quantity, using a suitable converter. Thus information detected by an internal sensor in analog form can be propagated over a digital switch fabric, and used elsewhere inside or outside of the integrated system. It should be understood that I/O here could be any of a CMPIO, a port of the test controller or some TSV propagating some electrical signal in a 3D assembly. This allows measuring the voltage drop in a TSV used as a shunt, the drop (peak or average) in a power distribution path, or the temperature or mechanical stress at some internal point. It also allows distributing and controlling an internal node meant to receive an analog reference or any form of analog quantity, as in any regular analog feedback path, except that part of that path would in fact be propagated as a form of digital signal over one or more connections in a substantially serial or substantially parallel way. The invention also covers the digital path used to propagate information in the form of a modulation in frequency or pulse width of a substantially digital signal.
Another use of the concept of propagating a digital signal to drive a substantially analog quantity would allow trimming an analog function using a digital correction that could be stored in a register, in some form of non-volatile memory such as fuses, zener-zapped devices or floating-gate devices, or in devices whose parametric value may be altered by the application of a current or of an external stimulus such as one or more laser pulses. In the case of floating-gate devices, they may also store an analog quantity. The support circuitry could thus be any circuitry required to support known trimming of an analog function. A known benefit of such trimming is to adjust for process or environment variation to correct the accuracy of the analog function. That could be applied, for instance, to voltage references, current sensors, voltage sensors, embedded amplifiers and voltage regulators. This list is not restrictive. It should be understood that temperature gradients or voltage drops in power distribution networks are prone to induce environment parametric variations. Other sources of parametric variations could also be managed in the same way.
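As a minimal sketch of such trimming, the routine below searches for the digital code that brings a measured analog output closest to its target; the selected code is what would then be stored in a register or non-volatile element. The exhaustive search, the 5-bit code width and the example reference (1.180 V untrimmed, 2 mV per code) are hypothetical choices made for illustration.

```python
def find_trim_code(measure, target, bits=5):
    """Return the code in 0 .. 2**bits - 1 whose measured output is closest to target.

    measure(code) is a caller-supplied function that applies the trim code
    and returns the resulting analog value.
    """
    return min(range(2 ** bits), key=lambda code: abs(measure(code) - target))


def example_reference(code):
    """Hypothetical voltage reference: 1.180 V untrimmed, plus 2 mV per trim code."""
    return 1.180 + 0.002 * code


trim_code = find_trim_code(example_reference, target=1.200)
```

An exhaustive search makes no assumption about the trim characteristic; where the characteristic is known to be monotonic, a binary search over the code would need fewer measurements.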
Tenth Family of Preferred Embodiments: Smart Thermo-Mechanical Prediction Unit and Monitoring Methods to Sustain Transient Thermo-Mechanical Stress Peaks Reliability in Large Area Integrated Circuit Systems
The effects of mechanical stress on VLSI device behavior are of significance to modern integrated circuit manufacturers, since large values of stress can be induced by various steps during fabrication and by a variety of packaging processes, including die attachment and encapsulation. One of the major concerns in designing such packages is the reliability of solder joints, of the die, and of the various material interfaces present in the package.
A smart thermo-mechanical prediction unit and monitoring methods to sustain reliability under thermo-mechanical stress peaks in embedded high density VLSI systems, according to a first family of preferred embodiments of the present invention, overcome these drawbacks of the prior art by providing a useful prediction unit and monitoring methods to detect, and therefore avoid, critical stress by monitoring the whole active LAIC surface.
Preferred embodiments of the present invention therefore use a more efficient method based on an embedded network of pressure and thermal sensor arrays. While many such networks are known in the art, two that are considered exemplary for their simplicity, efficiency, flexibility and robustness are discussed herein, with flexibility for the grouping and interconnection needs of the present invention.
A smart thermo-mechanical prediction unit and monitoring methods to sustain reliability under thermo-mechanical stress peaks in SoC, according to a second family of preferred embodiments of the present invention, can help designers of future high density SoC by controlling critical hot spots and the associated stress level during operation. These embodiments avoid the drawbacks of the prior art by providing a useful prediction unit and monitoring methods to detect, and therefore avoid, critical stress by monitoring the whole active SoC surface.
A possible configurable thermal sensor cell couple (8403) is selected from the thermal sensor network (8401) embedded on LAIC systems by grouping three individual thermal sensors (8402) to build a thermal sensor cell (8404), which allows the peak value of the surface temperature to be measured and its position to be localized.
One sensor unit cell (8504) depicted in
For two sensors A (8502) and C (8503) placed at a distance a from each other on line AC (8505), the difference between their output voltages, or between their output frequencies when RO (Ring Oscillators) are used, is proportional to the change in temperature along the line 8505. This is true only when the heat source lies directly on the line AC 8505; in all other cases, the value of the angle α 8506 has to be taken into account for the proper calculation of ΔT.
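By way of non-limiting illustration only, the on-axis case described above may be sketched in Python as follows for ring-oscillator sensors. The sketch assumes that both sensors share one linear sensitivity, expressed in hertz per degree Celsius and obtained by a calibration that is not described here; that constant and the function names are hypothetical. The correction for the angle α is deliberately not modeled, since its form is not specified in this passage.

```python
def delta_t_on_axis(f_a_hz, f_c_hz, sensitivity_hz_per_degc):
    """Temperature difference between sensors A and C from their RO frequencies.

    Valid only when the heat source lies on line AC. The shared linear
    sensitivity (Hz per degree C) is an assumed calibration constant; it is
    typically negative, since an RO usually slows down as it heats up.
    """
    if sensitivity_hz_per_degc == 0:
        raise ValueError("sensitivity must be non-zero")
    return (f_a_hz - f_c_hz) / sensitivity_hz_per_degc


def gradient_on_axis(delta_t, spacing_a):
    """Average temperature gradient along AC, in degrees per unit of `spacing_a`."""
    if spacing_a <= 0:
        raise ValueError("spacing must be positive")
    return delta_t / spacing_a


# Example with assumed values: A runs 1 MHz slower than C,
# sensitivity -0.2 MHz per degree C, sensors 2 length units apart.
dt = delta_t_on_axis(99.0e6, 100.0e6, -0.2e6)
grad = gradient_on_axis(dt, 2.0)
```

With these assumed values, sensor A is found to be warmer than sensor C, which is consistent with A being the closer of the two to the heat source.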
Using one sensor unit cell (8504), information on the temperature distribution and, in part, on the position of the heat source is obtained. In order to obtain the temperature value of a single point heat source, the distance between the sensor and the heat source is calculated.
The information on the temperature distribution and, in part, on the position of the heat source is obtained by using multiple sensor couples, as shown on
In the preferred embodiment, two sensor unit cells 8604 and 8605 are required for this purpose. The cells are placed a given distance H (8606) apart, and each of them gives information about the angle α (α1 and α2, respectively) in the direction of the heat source 8607.
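By way of non-limiting illustration only, the localization of the heat source from the two angles and the distance H may be sketched in Python as a plain triangulation. The sketch assumes that the first cell is at the origin, that the second cell is at (H, 0), and that α1 and α2 are the interior angles, in radians, between the baseline joining the cells and the direction of the heat source; this coordinate convention and the function name are assumptions of the sketch and are not taken from the disclosure.

```python
import math


def locate_heat_source(h, alpha1, alpha2):
    """Triangulate a point heat source seen from two sensor unit cells.

    Assumed geometry: cell 1 at (0, 0), cell 2 at (h, 0); alpha1 and alpha2
    are the interior angles (radians) at each cell between the baseline and
    the bearing to the source. Returns (x, y, r1, r2), where r1 and r2 are
    the distances from cell 1 and cell 2 to the source.
    """
    total = alpha1 + alpha2
    if h <= 0 or alpha1 <= 0 or alpha2 <= 0 or total >= math.pi:
        raise ValueError("bearings do not intersect on one side of the baseline")
    # Law of sines: the angle at the source is pi - total, and sin(pi - t) = sin(t).
    r1 = h * math.sin(alpha2) / math.sin(total)
    r2 = h * math.sin(alpha1) / math.sin(total)
    x = r1 * math.cos(alpha1)
    y = r1 * math.sin(alpha1)
    return x, y, r1, r2


# Example: cells 4 length units apart, both bearings at 45 degrees.
x, y, r1, r2 = locate_heat_source(4.0, math.radians(45), math.radians(45))
```

In this example the source is found midway between the cells, at a height of approximately 2 length units above the baseline. The distances r1 and r2 returned by the sketch correspond to the sensor-to-source distance that is needed to estimate the temperature value of a point heat source.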
The conceptual block diagram on
A preferred embodiment for finding heat source with peak temperature and the corresponding localization is described in the conceptual block diagram of
A preferred embodiment for confirmation of peak temperature and localization of the heat source (8905) is depicted in conceptual block diagram of
A preferred embodiment to extract a dynamic thermo mechanical map of the state of thermo mechanical stress (9005) is described in the conceptual block diagram of
A preferred embodiment shown on
A preferred embodiment shown on
The present application is a continuation of PCT/CA2011/050537 designating the United States, which claims the benefits of U.S. provisional application Ser. Nos. 61/275,722, filed on Sep. 7, 2010 and 61/420,766 filed on Dec. 7, 2010.
Number | Date | Country
---|---|---
61275722 | Sep 2010 | US
61420766 | Dec 2010 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CA2011/050537 | Sep 2011 | US
Child | 13782868 | | US