1. Field of Invention
The present invention relates generally to the field of Structured ASICs. Embodiments of the present invention relate to a circuit for a Structured ASIC.
2. Description of Related Art
The present invention relates generally to an improved Digitally Controlled Delay Line (DCDL) for a Structured ASIC.
A Structured ASIC is an ASIC (Application-Specific Integrated Circuit) having some pre-made elements that are manufactured once in a first manufacturing process and kept in inventory; the elements are then interconnected, or customized for a customer, in a second manufacturing process using masks (mask-programmable), rather than the circuit being made all at once as in a traditional ASIC. In a Structured ASIC the customization occurs by configuring one or more via layers between metal layers in the ASIC.
A configurable logic block (CLB) may be an element of a field-programmable gate array (FPGA), a structured ASIC device, and/or other devices. CLBs may be configured, for example, to implement different random logic functions, including combinational logic, such as NANDs, NORs, or inverters, and/or sequential logic, such as flip-flops or latches.
Broadly defined, structured application-specific integrated circuits (ASICs) may attempt to reduce the effort, expense and risk of producing ASICs by standardizing portions of the physical implementation across multiple products. By amortizing the expensive mask layers of the device across a large set of different designs, the non-recurring engineering (NRE) costs of a customized ASIC seen by a particular customer, which are one-time costs that do not depend on the number of units sold, can be significantly reduced. Standardizing some portion of the mask set may bring additional benefits, which may include improved yield through higher regularity and/or reduced manufacturing time from tape-out to packaged chip.
ASICs can be broken down further into a full-custom ASIC, a Standard Cell-based ASIC (standard-cell), and a gate array ASIC. At the opposite end of the spectrum from an ASIC is a field-programmable gate array (FPGA), an integrated circuit designed to be configured by the customer or designer after manufacturing, in the field, using software commands rather than at a foundry or IC fab. Other non-ASICs include simple and complex PLDs (Programmable Logic Devices), and off-the-shelf small and medium scale IC components (SSI/MSI).
A full-custom ASIC customizes every layer in an ASIC device, which can have 10 to 15 layers and therefore requires 10 to 15 masks in the lithography process. Since the customized design of the ASIC occurs at the transistor level, and modern ASICs have tens if not hundreds of millions of transistors, a full-custom ASIC is typically economically feasible only for applications that require millions of units. Examples of such applications are the cell phone digital modem and the flat panel television video processing device.
In a standard cell ASIC, circuits are constructed from predefined logic components known as cells. Designers work at the gate level rather than the finer transistor level, which simplifies the process, and instead of 10-15 layers only 3-5 layers may exist. The fab manufacturing the device provides a library of basic building blocks that can be used in the cells, such as basic logic gates, combinational components (and-or-inverter, multiplexer, 1-bit full adder), and basic memory, such as the D-type latch and flip-flop. A library of other function blocks such as adders, barrel shifters and random access memory (RAM) may also exist. While the layout of each cell in a standard cell ASIC is predetermined, the circuit itself has to be uniquely constructed by connecting all layers to one another, and the cells within each layer, in a custom manner, which takes time and effort.
A register is a standard component in an ASIC, and is a group of flip-flops that stores a bit pattern. Registers can hold information from components, or hold state between clock cycles so that it can be accessed by other components, enabling I/O synchronization, handshaking of data between clock domains, pipelining, and the like.
In a gate-array ASIC, the level of abstraction is one level higher than in a standard cell ASIC, in that each building block in a gate array is drawn from an array of predefined cells, known as base cells, each of which resembles a logic gate. Since the location and type of each cell are predetermined, gate-array ASICs can be manufactured in advance in greater quantities and inventoried for later use. A circuit is manufactured by customizing the interconnect between these cells, which is done at the metal layers via masks. In gate-array ASICs, typically fewer metal layers have to be customized to specify the interconnect required to complete the circuit, which simplifies the manufacturing process.
A synchronous digital system has a clock distribution network that defines a reference point for moving data within the system. A clock distribution network distributes the clock signals from a common point to all the elements in the system that need them. Clock signals generally have a large fanout, travel over comparatively long distances, and operate at higher speeds than the other signals within the synchronous system. Clock waveforms must therefore be particularly clean and sharp. In addition, long global interconnect lines become significantly more resistive as line dimensions are decreased, which is one of the primary reasons for the increasing significance of clock distribution to synchronous performance. Differences and uncertainty in the arrival times of the clock signals must be controlled, since they can limit the maximum performance of the entire system and create race conditions in which an incorrect data signal may latch within a register. The clock distribution network often accounts for a significant portion of the power consumed by a chip; furthermore, significant power can be wasted in transitions within blocks whose output is not needed. Power may be saved by clock gating, which involves adding logic gates to the clock distribution tree so that portions of the tree can be turned off when not needed.
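By way of illustration only, the effect of clock gating on switching activity can be sketched in Python. The sketch assumes an ideal clock sampled once per half-period and a hypothetical enable signal for a single block, and simply counts the clock transitions that reach that block, a rough proxy for dynamic power. A plain AND gate is used for simplicity; practical designs typically use a latch-based clock-gating cell so that a change in the enable signal cannot produce a glitch on the gated clock. The function names are hypothetical.

```python
# Illustrative sketch only: ideal clock samples (one per half-period) and a
# hypothetical per-block enable. Counting clock-pin transitions stands in for
# dynamic switching power; no real cell library or power model is implied.

def gated_clock(clock, enable):
    """AND-style gating: the clock reaches the block only while it is enabled."""
    return [c & e for c, e in zip(clock, enable)]

def transitions(wave):
    """Count 0->1 and 1->0 edges; dynamic power grows with this count."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a != b)

clock = [0, 1] * 8                     # 8 clock cycles
enable = [1] * 4 + [0] * 8 + [1] * 4   # block idle for the middle 4 cycles;
                                       # enable changes only while clock is 0

print(transitions(clock))                       # 15
print(transitions(gated_clock(clock, enable)))  # 7
```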
A complex field programmable device is the most versatile non-ASIC, as its generic logic cells can be more sophisticated than ASIC cells, and its interconnect structure can be programmed in the field using software, rather than at a fab using, for example, photolithographic masks. A complex field programmable device can be re-programmed to a different circuit in hours, rather than being programmable only once at a fab like an ASIC. Complex field programmable devices can be broadly divided into two categories, the Complex Programmable Logic Device (CPLD) and the Field Programmable Gate Array (FPGA). The logic cell of a CPLD is more complex than that of an FPGA, and has a D-type flip-flop and a programmable logic array, such as a PAL™-type programmable logic device, with configurable product terms. The interconnect of a CPLD is more centralized, with fewer, concentrated routing lines. An FPGA logic cell is smaller, with a D-type flip-flop and a small Look-Up Table (LUT), a multi-input, single-output block that is widely used for logic mapping, or multiplexers for routing signals through the interconnect and logic cells. The interconnect structure in an FPGA tends to be more distributed and flexible than that of a CPLD, making it better suited to high-capacity, complex devices. In an SRAM-based FPGA, the design that defines a circuit is stored in RAM, so when the FPGA is powered off, the circuit design is lost; when the FPGA is powered back up, the circuit design must be reloaded from non-volatile memory.
A simple PLD, historically called a programmable logic device, is much more limited in application, as it does not have a general interconnect structure. Today these devices are relatively rare by themselves and are now used as internal components in an ASIC or CPLD. Likewise, off-the-shelf small and medium scale IC components (SSI/MSI) are rarely used anymore, as they are first-generation devices, such as the 7400 series transistor-transistor logic (TTL), manufactured by various companies and used in the 1960s and 1970s to build computers. These components are no longer supported by modern EDA (Electronic Design Automation) software and have very limited functionality.
A complex field programmable device can be thought of as a form of programmable logic fabric. One such programmable logic fabric is the SRAM-programmable Look-Up Table (LUT) technology that forms the basis of Field Programmable Gate Arrays and Complex Programmable Logic Devices. The programmable fabric technology allows a logic design described in a Hardware Description Language (HDL) to be synthesized onto the logic fabric in order to perform the required logic function. The logic fabric includes memory blocks, embedded multipliers, registers and Look-Up Table logic blocks. The interconnect between logic elements is also SRAM programmable. Because the SRAM state is volatile, being lost when power is removed and reloaded at power-up, the function of a programmable logic fabric incorporating SRAM can be changed.
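By way of illustration only, the behavior of an SRAM-programmable LUT can be sketched as follows. The sketch assumes a k-input LUT holding 2^k configuration bits and treats the logic inputs as an address into those bits; the class and method names are hypothetical and do not describe any particular FPGA or CPLD architecture.

```python
# Illustrative sketch of a k-input look-up table: the 2**k configuration bits
# model the SRAM cells, and the logic inputs select one of them, as the select
# lines of a multiplexer would. Class and method names are hypothetical.

class LUT:
    def __init__(self, k, config_bits):
        self.k = k
        self.load(config_bits)

    def load(self, config_bits):
        """(Re)program the SRAM contents; this changes the implemented function."""
        if len(config_bits) != 2 ** self.k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.bits = list(config_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("expected exactly k logic inputs")
        # inputs[0] is the least significant address bit
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[index]

lut = LUT(2, [0, 1, 1, 0])     # programmed as 2-input XOR
print([lut.evaluate(a, b) for b in (0, 1) for a in (0, 1)])  # [0, 1, 1, 0]

lut.load([0, 0, 0, 1])         # same hardware, reprogrammed as 2-input AND
print(lut.evaluate(1, 1))      # 1
```

Reloading the configuration bits changes the implemented function without changing the hardware, which is the property of SRAM-based fabrics noted above.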
The ASIC design flow as a whole is a complex endeavor that involves many tasks, as described further herein, such as: logic synthesis; Design-for-Test (DFT) insertion; Electrical Rule Check (ERC) on the gate-level netlist; floorplanning; die size; I/O structure; design partitioning; macro placement; power distribution structure; clock distribution structure; preliminary checks (e.g., IR (voltage) drop and Electrostatic Discharge (ESD)); placement and routing; parasitic extraction and reduction; generation of Standard Delay Format (SDF) timing data by EDA tools; and various checks including but not limited to static timing analysis, cross-talk analysis, IR drop analysis, and electromigration analysis.
In the first step of the ASIC design flow, the design entry step, the circuit is described, as in a design specification of what the circuit is to accomplish, including functionality goals, performance constraints such as power and speed, technology constraints such as physical dimensions, and fabrication technology and design techniques specific to a given IC foundry. Further in the design entry step is a behavioral description that describes at a high level the intended functional behavior of the circuit (such as adding two numbers, for an adder), without reference to hardware. Next is an RTL (Register Transfer Level) structural description, which references hardware, albeit at a high level of abstraction, using registers. RTL focuses on the flow of signals between registers and captures the change in the design at each clock cycle. In a synchronous circuit all registers are updated at the same time in a given clock cycle, which further requires in the design flow that the clocks be synchronized and that the circuits meet timing constraints and achieve timing closure. A synchronous circuit consists of two kinds of elements: registers and combinational logic. Registers have a clock, input data, output data and an enable signal port. Every clock cycle in which the register is enabled, the input data is stored internally and the output data is updated to match the internal data. Registers, often implemented as flip-flops, have memory and synchronize the circuit's operation to the edges of the circuit clock signal. Combinational logic performs all the logical functions in the circuit and typically consists of logic gates. RTL is usually expressed in Verilog or VHDL, which are industry-standard hardware description languages (HDLs). An HDL is a language used to describe a digital system, for example, a network switch, a memory or a flip-flop; by using an HDL one can describe any digital hardware.
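By way of illustration only, the RTL rule that all registers in a synchronous circuit update together on a clock edge can be sketched in Python rather than in an HDL. The Register fields mirror the ports listed above (input data, output data and enable, with the clock edge modeled as a method call); the register names and the swap logic are hypothetical.

```python
# Illustrative sketch of RTL register semantics in a single clock domain:
# combinational logic computes every register's next value from the current
# state, and all enabled registers then update together on the clock edge.
# Register names and the swap logic are hypothetical examples.

from dataclasses import dataclass

@dataclass
class Register:
    q: int = 0        # output data (stored value)
    d: int = 0        # input data presented before the clock edge
    enable: int = 1   # when 0, the register holds its current value

    def clock_edge(self):
        if self.enable:
            self.q = self.d

def step(regs, next_state):
    """Advance one clock cycle: evaluate logic, then update all registers at once."""
    current = {name: r.q for name, r in regs.items()}
    for name, value in next_state(current).items():
        regs[name].d = value
    for r in regs.values():
        r.clock_edge()

def swap(state):
    # Equivalent of the RTL statements  a <= b;  b <= a;
    return {"a": state["b"], "b": state["a"]}

regs = {"a": Register(q=1), "b": Register(q=2)}
step(regs, swap)
print(regs["a"].q, regs["b"].q)   # 2 1
```

Because every next value is computed before any register changes, the two registers exchange contents in one cycle; updating them one at a time would instead leave both holding the same value.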
A design flow progresses from logical design steps to more physical design steps. Throughout this flow timing is of critical importance and must be constantly reassessed so that timing closure is realized throughout the circuit, since timing between circuits can change at different stages of the flow. Furthermore, the circuit must be designed so that it can be tested for faults. The insertion of test circuitry can be done at the logic synthesis step, where the register transfer level (RTL) description is turned into a design implementation in terms of logic gates such as NAND gates. Logic synthesis is thus the process of generating a structural view from the RTL design using an optimal number of primitive gate-level components (NOT, NAND, NOR, and the like) that are not tied to a particular device technology (such as 32 nm features) and carry no information on the components' propagation delay or size. In logic synthesis the circuit can be manipulated with Boolean algebra. Logic synthesis may be divided into two-level synthesis and multilevel synthesis. Because two-level logic requires gates with a large number of fan-ins (the number of inputs to a gate), two-level synthesis employs special ASIC structures known as Programmable Logic Arrays (PLAs) and modified Programmable Array Logic (PAL)-based CPLD devices. Multilevel synthesis is more efficient and flexible, as it eliminates the stringent requirements on the number of gates and fan-ins in a design, and is preferred. Multilevel synthesis is realized by optimizing area and delay in a circuit. However, optimizing multilevel logic is more difficult than optimizing two-level logic, and often employs heuristic techniques.
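By way of illustration only, the Boolean-algebra manipulation performed during logic synthesis can be sketched by mapping NOT, AND and OR onto a single primitive gate, the 2-input NAND, and confirming the mapping by exhaustive truth-table comparison. Exhaustive comparison is practical only for small functions; production flows use formal equivalence checking on large netlists. The function names are hypothetical.

```python
# Illustrative sketch of technology-independent logic synthesis: NOT, AND and
# OR are re-expressed using a single primitive, the 2-input NAND, and an
# exhaustive truth-table comparison confirms each mapping is equivalent.

from itertools import product

def nand(a, b):
    return 1 - (a & b)

def not_(a):
    return nand(a, a)                  # NOT a = a NAND a

def and_(a, b):
    return not_(nand(a, b))            # a AND b = NOT (a NAND b)

def or_(a, b):
    return nand(not_(a), not_(b))      # De Morgan: a OR b = (NOT a) NAND (NOT b)

def equivalent(f, g, n_inputs):
    """True if f and g agree on every one of the 2**n_inputs input patterns."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_inputs))

print(equivalent(not_, lambda a: 1 - a, 1))        # True
print(equivalent(and_, lambda a, b: a & b, 2))     # True
print(equivalent(or_, lambda a, b: a | b, 2))      # True
```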
Functional verification is performed at the design entry stage to check that a design implements the specified architecture. Once functional verification is completed, the RTL is converted into an optimized gate-level netlist, using smaller building blocks, in a step called logic synthesis or RTL synthesis. In EDA this task is performed by third-party tools. The synthesis tool takes as input an RTL hardware description and a standard cell library for a particular manufacturer, and produces a gate-level netlist as output. The standard cell library is the basic building-block repository for today's IC design. Constraints for timing, area, speed, testability, and power are considered. Synthesis tools attempt to meet the constraints by calculating the engineering cost of various implementations. The tool then attempts to generate the best gate-level implementation for a given set of constraints, targeting the particular manufacturing process under consideration. The resulting gate-level netlist is a completely structural description with only standard cells at the “leaves” of the design. At logic/RTL synthesis it is also verified, by simulation, whether the gate-level conversion has been performed correctly. The netlist is typically modified to ensure that each large net in the netlist is driven by a cell of proper drive strength, which indicates how many devices (the fanout) a gate can drive. A driving gate can be any cell in the standard cell library. During compilation of the netlist the EDA tool may adjust the size of the gate driving each net so that area and power are not wasted in the circuit by an unnecessarily large drive strength. Buffer cells are inserted when a large net is broken into smaller sections by the EDA tool.
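By way of illustration only, the buffering of a large net can be sketched under the hypothetical simplifying rule that any driver, gate or buffer, may drive at most a fixed number of loads. Real synthesis and placement tools size gates and buffers from library timing and capacitance data rather than from a single fanout limit; the function name and the limit used here are assumptions.

```python
# Illustrative buffering of a high-fanout net under a hypothetical rule that
# any single driver (gate or buffer) may drive at most `max_fanout` loads.
# Loads are split into groups, each group gets a buffer, and the buffers are
# grouped again until the original driver's fanout is within the limit.

import math

def buffer_tree(num_loads, max_fanout):
    """Return the number of buffers per level, from the loads toward the driver."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels = []
    fanout = num_loads
    while fanout > max_fanout:
        buffers = math.ceil(fanout / max_fanout)
        levels.append(buffers)
        fanout = buffers          # the level above now drives only these buffers
    return levels

levels = buffer_tree(num_loads=100, max_fanout=8)
print(levels, sum(levels))   # [13, 2] 15
```

Here a net with 100 loads is split across 13 buffers, which are in turn driven by 2 further buffers, so the original gate drives only 2 loads.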
Throughout the logical design stage, an EDA tool performs computer simulation of the design before actual physical design.
The next step in the ASIC flow is the physical implementation of the gate level netlist, or physical design, such as system partitioning, floorplanning, placement and routing. The gate level netlist is converted into a geometric representation of the layout of the design. The layout is designed according to the design rules specified in the library for the fab that is to build the digital device. The design rules are guidelines based on the limitations of the fabrication process.
The physical implementation step consists of several sub-steps: system partitioning, floorplanning, placement and routing. These steps relate to how the digital device is to be represented by functional blocks, as one ASIC or several (system partitioning); how the functional blocks are to be laid out on one ASIC (floorplanning); how the logic cells are to be placed within the functional blocks (placement); and how these logic cells are to be interconnected with wiring (routing). The file produced at the output of this physical implementation is the so-called GDSII file, which is the file used by the foundry to fabricate the ASIC.
Floorplanning involves inputting into a floorplanning tool a netlist that describes the interconnection of ASIC blocks (RAM, ROM, ALU, cache controller, and the like); the logic cells (NAND, NOR, D flip-flop, and so on) within the blocks; and the logic cell connectors (e.g., terminals, pins, or ports). Floorplanning maps the logical description as found in the netlist to the physical description, the floorplan.
The goals of floorplanning are to arrange the ASIC blocks on the silicon chip, to decide the location of the I/O pads, to decide the location and number of the power pads and the type of power distribution, and to decide the location and type of clock distribution. Design constraints in floorplanning include minimizing the silicon chip area and minimizing timing delay. Delay is often estimated from the total length of the interconnect and from an estimate of the total capacitance. Interconnect length and predicted interconnect capacitance are estimated from statistics of previously routed chips, including such factors as net fanout and the block size of the circuits in the ASIC.
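By way of illustration only, a pre-layout delay estimate of the kind described above can be sketched as follows: interconnect length is looked up by fanout from a table assumed to come from previously routed chips, converted to capacitance, and combined with the pin loads in a lumped first-order RC product. Every constant is a hypothetical placeholder rather than data for any real process or library, and distributed wire resistance is ignored.

```python
# Illustrative pre-layout net delay estimate in the spirit of statistical
# wire-load tables. Every constant below is a hypothetical placeholder, not
# data from any real process or cell library.

WIRE_LENGTH_UM = {1: 20.0, 2: 35.0, 3: 50.0, 4: 65.0}  # by fanout, from past chips
CAP_PER_UM_FF = 0.25     # wire capacitance per micrometre (fF/um)
PIN_CAP_FF = 2.0         # input capacitance of each driven pin (fF)
DRIVER_RES_KOHM = 5.0    # effective output resistance of the driver (kOhm)

def estimated_length_um(fanout):
    if fanout < 1:
        raise ValueError("a net needs at least one load")
    top = max(WIRE_LENGTH_UM)
    if fanout <= top:
        return WIRE_LENGTH_UM[fanout]
    # Extrapolate linearly beyond the table using the last slope
    slope = WIRE_LENGTH_UM[top] - WIRE_LENGTH_UM[top - 1]
    return WIRE_LENGTH_UM[top] + slope * (fanout - top)

def estimated_delay_ps(fanout):
    total_cap_ff = estimated_length_um(fanout) * CAP_PER_UM_FF + fanout * PIN_CAP_FF
    return DRIVER_RES_KOHM * total_cap_ff   # first-order RC: kOhm x fF = ps

print(estimated_delay_ps(2))   # 63.75
print(estimated_delay_ps(6))   # 178.75
```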
For any design to work at a specific speed, timing analysis has to be performed throughout the ASIC design flow. A static timing analysis tool in EDA must be used to check whether the design meets the speed requirements of the specification. Industry-standard static timing tools include PrimeTime (Synopsys), which verifies the timing performance of a design by checking the design for all possible timing violations caused by the physical design process.
During placement, for example, timing is affected, since the interconnect length resulting from placement changes the capacitance of the interconnect and hence the delay in the interconnect. The goal of an EDA placement tool is to arrange all the logic cells within the flexible blocks on a chip to achieve objectives such as: guaranteeing that the router can complete the routing step, minimizing all the critical net delays, making the chip as dense as possible, minimizing power dissipation, and minimizing cross talk between signals. Modern EDA placement tools use more specific and achievable criteria than the above. The most commonly used placement objectives are one or more of the following: minimizing the total estimated interconnect length, meeting the timing requirements for critical nets, and minimizing the interconnect congestion.
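By way of illustration only, the objective of minimizing total estimated interconnect length can be sketched with half-perimeter wirelength (HPWL), a widely used estimate that charges each net the half-perimeter of the bounding box around its pins. HPWL is an assumption here, not a metric named above, and the cell names, coordinates and nets are hypothetical.

```python
# Illustrative total-wirelength estimate using half-perimeter wirelength
# (HPWL). Cell names, coordinates and nets are hypothetical.

def hpwl(net, positions):
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [positions[cell][0] for cell in net]
    ys = [positions[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets, positions):
    return sum(hpwl(net, positions) for net in nets)

positions = {"u1": (0, 0), "u2": (4, 1), "u3": (2, 5), "u4": (6, 3)}
nets = [("u1", "u2", "u3"), ("u2", "u4"), ("u3", "u4")]

print(total_wirelength(nets, positions))   # 19  (9 + 4 + 6)

# A candidate placement move: swap the locations of u1 and u4
swapped = dict(positions, u1=positions["u4"], u4=positions["u1"])
print(total_wirelength(nets, swapped))     # 20, so this move is worse
```

A placer would evaluate many such candidate moves and keep those that reduce the total; the swap shown would be rejected.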
Algorithms for estimating interconnect during placement do exist; for example, the minimum rectilinear Steiner tree (MRST) is the shortest interconnect connecting a set of terminals using a rectangular grid. Determining the MRST is, in general, an NP-complete problem, which is difficult to solve in a reasonable time. For small numbers of terminals heuristic algorithms exist, but they are computationally expensive. Several approximations to the MRST exist and are used by EDA tools.
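By way of illustration only, one well-known approximation to the MRST is the rectilinear minimum spanning tree (RMST), which connects the terminals directly using Manhattan distances and can be built in polynomial time with Prim's algorithm. The RMST is never shorter than the MRST and is known to be at most 3/2 times its length. The terminal coordinates are hypothetical.

```python
# Illustrative MRST approximation: a rectilinear minimum spanning tree (RMST)
# over the terminals, built with Prim's algorithm using Manhattan distance.
# Terminal coordinates are hypothetical.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rmst_length(pins):
    """Total length of a rectilinear minimum spanning tree, O(n^2) Prim's."""
    if len(pins) < 2:
        return 0
    # Distance from each not-yet-connected pin to the growing tree
    dist = {i: manhattan(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while dist:
        nearest = min(dist, key=dist.get)
        total += dist.pop(nearest)
        for i in dist:
            dist[i] = min(dist[i], manhattan(pins[nearest], pins[i]))
    return total

pins = [(0, 0), (2, 2), (4, 0)]
print(rmst_length(pins))   # 8
# An MRST adds a Steiner point at (2, 0) and needs only 2 + 2 + 2 = 6.
```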
In the routing step, the wiring between the elements is planned. A Structured ASIC cross-section has several metal layers. A standard cell ASIC may have nine metal layers, but in many structured ASICs not all metal layers need to be used for routing: some layers may be pre-routed, and only the top layers are used for routing. This lowers cost and shortens turnaround: non-recurring engineering costs are much lower, as photolithographic masks are required only for the few customized metal layers rather than for every layer, and production cycles are much shorter, as metallization is a comparatively quick process. The metal layers may be interconnected with one another at selected vertical holes, called vias, that are filled with metal or another conductor and that together form a ‘via’ layer; the device can thus be configured at this interconnecting layer, i.e., it is ‘via configurable’. If the logic fabric comprising the Structured ASIC is configured with traditional IC optical lithography involving photolithographic masks, it can be thought of as “mask programmable”. The mask for a Structured ASIC is programmed at the vias, which can be termed a via-configurable logic block (VCLB) architecture. The VCLB configuration and programmability may be performed by changing properties of so-called “configurable vias”, which are connections between VCLB internal nodes. A configurable or programmable via may be in one of two possible states: it may be either enabled or disabled. If a programmable via is enabled, then it can conduct a signal (i.e., the via exists and has low resistance). If a via is disabled, then it cannot practically conduct a signal (i.e., the via has very high resistance or does not physically exist). In some designs, such as those by the present assignee of this invention, eASIC Corporation, the customizable metallization layers may be reduced to a few or even a single via layer where the customization is performed; see, by way of example and not limitation, U.S. Pat. No. 6,953,956, issued to eASIC Corporation on Oct. 11, 2005; U.S. Pat. No. 6,476,493, issued to eASIC Corporation on Nov. 5, 2002; and U.S. Pat. No. 6,331,733, issued to eASIC Corporation on Dec. 18, 2001; all incorporated herein by reference in their entirety. Further, this single via layer could be customized without resorting to mask-based optical lithography, using instead a maskless e-beam process, as taught by the '956 patent.
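By way of illustration only, the effect of enabling or disabling configurable vias can be sketched as a connectivity problem: each via joins two internal nodes only when it is enabled, and a union-find pass reports which nodes end up electrically connected. The node and via names are hypothetical, and the sketch models connectivity only, not via resistance.

```python
# Illustrative model of configurable vias: an enabled via joins two internal
# nodes, a disabled via does not conduct. Union-find groups the nodes that end
# up electrically connected. Node and via names are hypothetical.

def connected_nets(nodes, vias, enabled):
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for via, (a, b) in vias.items():
        if via in enabled:                  # disabled via: no conduction
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())

nodes = ["lut_out", "mux_in0", "mux_in1", "ff_d"]
vias = {"v1": ("lut_out", "mux_in0"),
        "v2": ("lut_out", "mux_in1"),
        "v3": ("mux_in0", "ff_d")}

print(connected_nets(nodes, vias, enabled={"v1", "v3"}))
# [['ff_d', 'lut_out', 'mux_in0'], ['mux_in1']]
```

Enabling a different set of vias on the same prefabricated nodes yields a different circuit, which is the essence of via configurability.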
During circuit extraction and post-layout simulation, a back-annotated netlist with timing information is used to check whether the physical design has achieved the objectives of speed, power and the like specified for the design. If not, the entire ASIC design flow is repeated. In modern EDA tools, the delays calculated during the physical design steps from a simulation library of the cells used in the design are placed in a special file called the SDF (Standard Delay Format) file. Each cell can have its own delay based on where in the netlist it is found, its neighboring cells, the load on the cell, the fan-in, and the like. Each internal path through a cell, known as a timing arc, can have a different propagation delay for a signal. The maximum possible clock rate is determined by the slowest logic path in the circuit, called the critical path.
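By way of illustration only, the critical path can be found by treating the netlist as a directed acyclic graph whose edges are timing arcs annotated with delays, such as those back-annotated from an SDF file, and computing the longest path. The node names and delays are hypothetical, and the sketch assumes Python 3.9 or later for the standard-library graphlib module.

```python
# Illustrative critical-path search: netlist nodes form a DAG whose edges are
# timing arcs with delays (ns). The critical path is the longest
# input-to-output path. Node names and delays are hypothetical.

from graphlib import TopologicalSorter

ARCS = {  # (from_node, to_node): delay in ns
    ("in", "g1"): 0.3, ("in", "g2"): 0.2,
    ("g1", "g3"): 0.5, ("g2", "g3"): 0.9,
    ("g3", "out"): 0.4,
}

def critical_path(arcs):
    """Return (delay, node list) of the longest path through the arc graph."""
    preds = {}
    for u, v in arcs:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    arrival, came_from = {}, {}
    for node in TopologicalSorter(preds).static_order():
        # Latest arrival over all incoming arcs; primary inputs start at 0
        arrival[node], came_from[node] = max(
            ((arrival[p] + arcs[(p, node)], p) for p in preds[node]),
            default=(0.0, None),
        )
    end = max(arrival, key=arrival.get)
    path = [end]
    while came_from[path[-1]] is not None:
        path.append(came_from[path[-1]])
    return arrival[end], path[::-1]

delay_ns, path = critical_path(ARCS)
print(path, round(delay_ns, 3))   # ['in', 'g2', 'g3', 'out'] 1.5
# Upper bound on clock rate, ignoring register clock-to-q and setup times
print(round(1000 / delay_ns, 1), "MHz")   # 666.7 MHz
```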
Compounding the problem of delay is that in a synchronous ASIC one must avoid clock skew, and different parts of the ASIC may be controlled by different clock domains, with the wiring nets that carry the clock signal forming a clock net that branches out in the form of a clock tree. Establishing this tree, which often requires additional circuitry such as buffer cells to help drive the massive clock tree, is called clock tree synthesis. As an ASIC is a synchronous circuit, all the clocks in the clock tree must be synchronized and chip timing control achieved, typically by using Phase-Locked Loops (PLLs) and/or Delay-Locked Loops (DLLs). If the clock signal arrives at different components at different times, there is clock skew. Clock skew can be caused by many factors, such as interconnect wire length, temperature variations and differences in input capacitance on the clock inputs of the devices using the clock. Further, timing must satisfy register setup and hold time requirements, and both data propagation delay and clock skew play important parts in these calculations. Problems of clock skew can be addressed by eliminating short data paths, adding delay in a data path, clock reversing, and the like. Thus, during the physical synthesis steps, clock tree synthesis is an important step, which distributes the clock network throughout the ASIC and minimizes the clock skew and delay.
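By way of illustration only, the setup and hold calculations mentioned above can be sketched with the common textbook formulation, in which skew is taken as the capture clock's arrival time minus the launch clock's arrival time. The register timing values and path delays are hypothetical.

```python
# Illustrative register-to-register setup and hold checks with clock skew,
# using the common textbook formulation. Skew = capture clock arrival minus
# launch clock arrival. All delays (ns) are hypothetical. Positive slack
# means the check passes.

def setup_slack(period, skew, t_clk_to_q, t_logic_max, t_setup):
    # Data launched on one edge must arrive t_setup before the next capture edge
    return (period + skew) - (t_clk_to_q + t_logic_max + t_setup)

def hold_slack(skew, t_clk_to_q, t_logic_min, t_hold):
    # New data must not reach the capture register until t_hold after
    # that same clock edge arrives there
    return (t_clk_to_q + t_logic_min) - (t_hold + skew)

T_CQ, T_SETUP, T_HOLD = 0.10, 0.15, 0.05
PERIOD = 2.0
LOGIC_MAX, LOGIC_MIN = 1.80, 0.10   # slowest / fastest combinational path

for skew in (0.0, 0.25):
    s = setup_slack(PERIOD, skew, T_CQ, LOGIC_MAX, T_SETUP)
    h = hold_slack(skew, T_CQ, LOGIC_MIN, T_HOLD)
    print(f"skew={skew:.2f}  setup slack={s:+.2f}  hold slack={h:+.2f}")
# skew=0.00  setup slack=-0.05  hold slack=+0.15
# skew=0.25  setup slack=+0.20  hold slack=-0.10
```

The two cases show the trade-off described above: skew that relieves the setup check on the long path tightens the hold check on the short path. Adding 0.15 ns of delay to the short path (LOGIC_MIN = 0.25) would restore a hold slack of about +0.05 ns in the second case without affecting the long path.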
Finally, IP in the form of proprietary third-party functionality, such as a processor core, may be embedded in an ASIC using soft macros, firm macros and hard macros that can be bought from third parties. A soft macro describes the IP as RTL code and is neither timing-closed for the design specification nor layout-optimized for the process under consideration. However, as RTL code, a soft macro can be modified by a designer with EDA tools and synthesized into the designer's library. By contrast, a hard macro is timing-guaranteed and layout-optimized for a particular design specification and process technology, but is not portable outside that design and process, and is not represented in RTL code; rather, a hard macro is tailored to a particular foundry and is closer to a GDSII layout. A firm macro falls between a hard macro and a soft macro. Firm macros are in netlist format, are optimized for performance, area and power using a specific fabrication technology, are more flexible and portable than hard macros, and are more predictable in performance and area than soft macros. Macros spare a designer from having to design every component from scratch, and are a great time saver. Third-party designers favor firm and hard macros, since it is easier to hide intellectual property (IP) in such macros than in a soft macro.
Given the above, the pros and cons of standard cell ASICs versus a complex field programmable device such as an FPGA are as follows. The advantages of FPGAs are that they are easy to design, have shorter development times and thus faster time-to-market, and have lower NRE costs. These are also the disadvantages of standard cell ASICs: they are difficult to design, have long development times, and have high NRE costs. The disadvantages of FPGAs are that design size is limited to relatively small production designs, design complexity is limited, performance is limited, power consumption is high, and the cost per unit is high. These FPGA disadvantages are standard-cell advantages, as standard cells support large and complex designs, and offer high performance, low power consumption and low per-unit cost at high volume.
A Structured ASIC falls between an FPGA and a Standard Cell-based ASIC in classification and performance. Structured ASICs are used for mid-volume level designs. In a Structured ASIC the task for the designer is to map the circuit into a fixed arrangement of known cells.
Structured ASICs are closer to standard cells in their advantages over FPGAs. The disadvantage of structured ASICs compared to FPGAs is that FPGAs do not require any user design information during manufacturing. Therefore, FPGA parts can be manufactured in larger volumes and can exist in larger inventories, which reduces the latency of delivering parts to customers in the right volumes. FPGAs can also be modified after their initial configuration, which means that design bugs can be removed without requiring a fabrication cycle. Design improvements can be made in the field, and even done remotely, which removes the need for a technician to physically interact with the system.
Given these pros and cons, structured ASICs combine the best features of FPGAs and standard cell ASICs. Structured ASICs can have three main architectures: fine-grained, where the structured elements are unconnected discrete components, including transistors, resistors and other components; medium-grained, where the structured elements contain generic logic, such as gates, MUXs, LUTs or flip-flops; and, finally, hierarchical, where the structured elements contain mini-structured elements such as gates, MUXs and LUTs but no flip-flops for storage, with the flip-flops or registers added later. A hierarchical design has blocks and sub-blocks in a hierarchy, and takes more run time to build in an EDA tool than a flat design. Architecturally, fine-grained structured ASICs require many connections into and out of a structured element, while the higher granularities reduce the connections to the structured element but decrease the functionality it can support. Each individual design will benefit differently at these various granularities.
Advantages of structured ASICs over standard cell ASICs and FPGAs include that they are largely prefabricated, with components that are almost fully connected in a variety of predefined configurations and ready to be customized into any one of these configurations. Only a few metal layers are needed for fabrication of a Structured ASIC, which dramatically reduces the turnaround time. Structured ASICs are easier and faster to design than standard cell ASICs. Multiple global and local clocks are prefabricated in a Structured ASIC; consequently, there are no skew problems that need to be addressed by the ASIC designer. Signal integrity and timing issues are thus inherently addressed, making design of a circuit simpler and faster. Capacity, performance, and power consumption in a Structured ASIC are closer to those of a standard cell ASIC. Further, structured ASICs have faster design time, reduced NRE costs, and quicker turnaround than standard cell ASICs. Thus, with structured ASICs the per-unit cost is reasonable for production runs from several hundred to 100k units.
A technology comparison between standard cell ASICs, structured ASICs, and FPGAs, respectively, is roughly as follows: generally speaking, there is a ratio of 100:33:1 between the number of gates in a given area for standard cell ASICs, structured ASICs, and FPGAs, respectively; a ratio of 100:75:15 for performance (based on clock frequency); and a ratio of 1:3:12 for power, though these ratios change year by year and at different process lithographic nodes.
Compared to a field-programmable gate array (FPGA), the unit price of a Structured ASIC solution may be reduced by a significant amount due to the removal of the storage and logic required for configuration storage and implementation. The unit cost of a Structured ASIC may be somewhat higher than a full custom ASIC, primarily due to the imperfect fit between design requirements and a standardized base layer, with certain I/O, memory and logic capacities.
Structured ASIC products may be differentiated by the point at which the user customization occurs and how that customization is actually implemented. Most structured ASICs may standardize only the transistors and the lowest levels of metal. A large set of metal and via masks may then be needed to customize a product, which yields only a marginal reduction in NRE cost. Manufacturing latency and yield benefits may also be compromised using this approach.
An ideal ASIC device may combine the field programmability of FPGAs with the power and size efficiency of ASICs or structured ASICs.
A system-on-chip (SoC) is an integrated circuit that implements many or all of the functions of a complete electronic system. The components of a SoC vary with the application. Some SoCs contain mixed signal and analog input/output (IO), but usually most of a SoC is digital. The SoC may contain memory, CPUs (central processing units)/microprocessors, busses, specialized logic and other digital functions. The architecture of the SoC is tailored to an application rather than being general-purpose.
A FET (Field Effect Transistor) is a transistor that uses an electric field to control the conductivity of a charge carrier channel in a semiconductor. A common type of FET is the Metal Oxide Semiconductor FET (MOSFET). MOSFETs work by inducing a conducting channel between two contacts, called the source and the drain, by applying a voltage to the oxide-insulated gate electrode. The two types of MOSFET are called nMOSFET (commonly known as nMOS or NFET) and pMOSFET (commonly known as pMOS or PFET), depending on the type of carriers flowing through the channel. An nMOS transistor is made up of an n-type source and drain and a p-type substrate. The three modes of operation of an nMOS transistor are called cut-off, triode and saturation. nMOS logic is easy to design and manufacture, but devices made of nMOS logic gates dissipate static power when the circuit is idling, since DC current flows through the logic gate when the output is low. By contrast, a pMOS transistor is made up of a p-type source and drain and an n-type substrate. pMOS technology is low cost and has good noise immunity; pMOS devices are more immune to noise than nMOS devices. In an nMOS transistor the carriers are electrons, while in a pMOS transistor the carriers are holes; since electrons have higher mobility than holes, all things being equal NFETs are roughly twice as fast as PFETs. When a high voltage is applied to the gate, with the gate-source voltage exceeding a threshold value (V_GS > V_TH), the nMOS will conduct while the pMOS will not; conversely, when a low voltage is applied to the gate, the nMOS will not conduct and the pMOS will. Thus, with the gate held low, PFETs act as closed switches and NFETs act as open switches. PFETs often occupy more silicon area than NFETs when forming logic blocks. Furthermore, nMOS ICs are smaller than pMOS ICs with the same functionality, since an nMOS transistor provides one-half of the impedance of a pMOS transistor under the same geometry and operating conditions.
Complementary metal-oxide-semiconductor (CMOS) is a technology for constructing integrated circuits. CMOS is sometimes referred to as complementary-symmetry metal-oxide-semiconductor (or COS-MOS). The words “complementary-symmetry” refer to the fact that the typical digital design style with CMOS uses complementary and symmetrical pairs of p-type and n-type metal oxide semiconductor field effect transistors (MOSFETs) for logic functions. Complementary metal-oxide-silicon circuits require both nMOS and pMOS transistor technology on the same substrate. An n-type well is provided in the p-type substrate. Alternatively, one can use a p-well, or both an n-type and a p-type well in a low-doped substrate. The gate oxide, polysilicon gate and source-drain contact metal are typically shared between the pMOS and nMOS technology, while the source-drain implants are done separately. Since CMOS circuits contain pMOS devices, which are affected by the lower hole mobility, CMOS circuits are not faster than their all-nMOS counterparts. Even when the pMOS devices are scaled so that they provide the same current, the larger pMOS device has a higher capacitance.
The CMOS advantage is that the output of a CMOS inverter can be as high as the power supply voltage and as low as ground. This large voltage swing and the steep transition between logic levels yield large operating margins and therefore also a high circuit yield. In addition, there is ideally no static power dissipation in either logic state, apart from leakage; instead, power is dissipated mainly when a transition is made between logic states. CMOS circuits are therefore not faster than nMOS circuits but are better suited for very/ultra large-scale integration (VLSI/ULSI).
In electronics, a multiplexer (MUX or mux), sometimes called a data selector, is a circuit that selects one of several analog or digital input signals and forwards the selected input to a single output line. A multiplexer with 2^n inputs has n select lines, which are used to select which input line to send to the output. Demultiplexers take one data input and a number of selection inputs, and they have several outputs. Similarly, a decoder is a circuit that performs the reverse operation of an encoder.
In integrated circuits, clock signals often need to be adjusted for skew and timing, with the adjustment being a fraction of a normal clock cycle period. This can be done with Digitally Controlled Delay Line (DCDL) circuitry, such as found in the prior art; see U.S. Pat. No. 5,465,076, to Yamauchi et al., issued Nov. 7, 1995, incorporated by reference herein. DCDLs are also used with Delay Locked Loops (DLLs).
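As a behavioral illustration only, and not part of the disclosed circuit, the relation between 2^n data inputs and n select lines can be sketched in Python. The function name `mux` and the MSB-first ordering of the select bits are assumptions made for this example.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def mux(inputs: Sequence[T], select_bits: Sequence[int]) -> T:
    """Forward one of 2**n inputs to the output, chosen by n select bits (MSB first)."""
    n = len(select_bits)
    if len(inputs) != 2 ** n:
        raise ValueError(f"{n} select lines address exactly {2 ** n} inputs, got {len(inputs)}")
    index = 0
    for bit in select_bits:
        index = (index << 1) | (1 if bit else 0)  # build the binary select index
    return inputs[index]


# A 4-input mux needs 2 select lines; select = 10 (binary 2) forwards input "c".
print(mux(["a", "b", "c", "d"], [1, 0]))  # → c
```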
The trouble with some of the prior art is that the minimum delay may be too large, the range may be too small or not scalable, glitches of the clock signal may result from the architecture employed for the DCDL, and the resolution may not be fine enough. A large minimum delay arises in a DCDL architecture in which a clock signal must pass through a number of delay elements before it is output again, since these delays sum to a minimum delay that may be unacceptably large for a design. Range is a function of how many stages can safely be added to a design while still achieving a scalable, usable output, and depends on the architecture. Clock glitches should always be avoided but are sometimes unavoidable in certain DCDL architectures that are otherwise acceptable. Regarding resolution, certain architectures achieve only a “coarse” tuning of the clock signal, meaning the unit of time by which the clock signal may be delayed is relatively large, as opposed to a “fine” tuning, which achieves a comparatively smaller delay step. Coarse tuning results from using an entire gate to slow down a clock signal, whereas fine tuning uses sub-gates. However, coarse tuning is sometimes useful to give a designer a wide range of clock delays, and should optimally be available in addition to fine tuning.
The 2-bit Binary-to-Thermometer Decoder 20 supplies a plurality of different voltages T, output as thermometer values (unary coding) in response to a two-bit binary code input. In thermometer coding a binary value N is output as N ones: for the 2-bit decoder, a binary 0 is output as 000, a binary 1 as 001, a binary 2 as 011 and a binary 3 as 111; continuing the same scheme for wider codes, a binary 4 is 1111, a binary 5 is 11111, a binary 6 is 111111, a binary 7 is 1111111, a binary 8 is 11111111, a binary 9 is 111111111 and a binary 10 is 1111111111. The incorporation of zero is also possible in such a unary coding scheme, as are alternative schemes where the complement of the output is taken.
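The decoding rule can be written as a short Python sketch. It models only the logical mapping, not the voltage levels, and the function names are hypothetical. The output width of 2^n - 1 bits for an n-bit input follows from the 2-bit example, which uses three output bits for four values.

```python
def binary_to_thermometer(value: int, width: int) -> str:
    """Return `value` ones, right-aligned in a `width`-bit thermometer (unary) code.

    Uses the right-filled convention of the example above (1 -> 001, 2 -> 011).
    """
    if not 0 <= value <= width:
        raise ValueError(f"{value} does not fit in a {width}-bit thermometer code")
    return "0" * (width - value) + "1" * value


def complement(code: str) -> str:
    """The alternative scheme in which every output bit is inverted."""
    return "".join("1" if bit == "0" else "0" for bit in code)


# An n-bit binary input needs 2**n - 1 thermometer outputs: 3 for a 2-bit decoder.
N_BITS = 2
WIDTH = 2 ** N_BITS - 1
for v in range(2 ** N_BITS):
    code = binary_to_thermometer(v, WIDTH)
    print(v, code, complement(code))
```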
The 2-bit Binary-to-Thermometer Decoder 20 converts a 2-bit binary input into an equivalent thermometer-coded output, whose bits can be represented as voltage levels. When a given thermometer-coded output is applied to the gates 12, 14 and 16 of pMOS transistors 13, 15 and 17, those pMOS transistors whose gate-source voltage exceeds the threshold value in magnitude will conduct, depending on the value received. This increases the flow of current into the source of the PFET transistor 22, which forms part of the inverter 32. Increasing the thermometer value output from decoder block 20 from a lower number to a higher number causes more of the pMOS transistors at the top of the circuit to conduct.
A similar process occurs when thermometer-coded voltages are applied to the nMOS transistors 24, 26 and 28, connected in parallel as shown. The first nMOS transistor 24 is connected at its gate to Vdd, the positive supply voltage, so it always conducts, while the remaining nMOS transistors, such as 26 and 28, conduct when the thermometer-coded voltage from decoder block 20 applied to their gates makes their gate-source voltage exceed the threshold value; this increases the flow of current into the source of the NFET transistor 30, which forms part of the inverter 32. The net effect of increasing the thermometer value is that more current flows into the sources of the PFET transistor 22 and the NFET transistor 30, which increases the current through inverter 32. An analysis of the fine-tuning DCDL circuit 10 of
What is lacking in the prior art is a DCDL circuit for use in a Structured ASIC that combines fine-tuning and coarse-tuning in a single circuit, has a small minimum delay, a large range that is scalable, provides a fine resolution when used in fine-tuning mode and whose output is glitch free. What is further needed is a DCDL tied to a via-configurable, balanced and scalable high-speed routing fabric of novel configuration. The present invention has these features.
Accordingly, an aspect of the present invention is to provide a Digitally Controlled Delay Line (DCDL) for a Structured ASIC, manufactured in a CMOS process using NFET/nMOS and PFET/pMOS transistors, which may include, together with the DCDL, a via-configurable logic block (VCLB) architecture. VCLB configuration may be performed by changing properties of so-called “configurable vias”, which are connections between VCLB internal nodes and elements in a Structured ASIC.
An aspect of the present invention is to provide a DCDL circuit that combines fine-tuning and coarse-tuning in a single circuit.
An aspect of the present invention is to provide a DCDL that has a fine delay resolution, that is, a small minimum delay step.
An aspect of the present invention is to provide a DCDL that has a small minimum delay.
A further aspect of the present invention is to provide a DCDL that is scalable and has a large range, from minimum delay to maximum delay.
Another aspect of the present invention is to provide for a DCDL which produces glitch free output over its entire range.
Yet another aspect of the present invention is to tie the DCDL to a high-speed routing fabric that is automatically balanced, inherently supports a tree, and is scalable.
The sum total of all of the above advantages, as well as the numerous other advantages disclosed and inherent from the invention described herein, creates an improvement over prior techniques.
The above described and many other features and attendant advantages of the present invention will become apparent from a consideration of the following detailed description when considered in conjunction with the accompanying drawings.
Detailed description of preferred embodiments of the invention will be made with reference to the accompanying drawings. Disclosed herein is a detailed description of the best presently known mode of carrying out the invention. This description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating the general principles of the invention. The section titles and overall organization of the present detailed description are for the purpose of convenience only and are not intended to limit the present invention.
In an actual chip layout the exact placement of the blocks shown therein may vary from the simple stylized representations as shown in the drawings, and in addition there may be several layers in an ASIC chip that achieve the functionality shown in the figures, superimposed on one another, and not necessarily a single layer as shown in the drawings. This is true for most of the elements in the present invention, as understood by one of ordinary skill, and that does not detract from any of the teachings of the functional relationships between the elements of the present invention as shown herein. Furthermore, designations of orientations such as north-south or east-west are relative to the observer and depend on the chip as outlined in the drawings; hence these orientations are for convenience only and do not limit the invention, other than indicating that the north-south direction is orthogonal to the east-west direction, in the same way that a vertical direction is orthogonal to a horizontal direction.
It should be understood that one skilled in the art may, using the teachings of the present invention, vary the embodiments shown in the drawings without departing from the spirit of the invention herein. In the figures, like reference numbers in different figures indicate previously defined identical elements.
The method and apparatus of the present invention may be described in software, such as the representation of the invention in an EDA tool, or realized in hardware, such as the actual physical instantiation.
Regarding the floorplan of the present invention, the drawings sometimes show elements as blocks that in a physical implementation may differ from this stylized representation, but the essential features of the floorplan should be apparent to one of ordinary skill in the art from the teachings herein.
The elements in the floor plan of the present invention are operatively connected to one another where necessary, as can be appreciated by one of ordinary skill in the art from the teachings herein.
The Digitally Controlled Delay Line (DCDL) of the present invention, in particular as shown in the drawings, is for delaying signals such as PLL, DLL or clock signals, and may also be used to delay IO signals (which sometimes require delay due to various IO standards) and other signals into or out of the core logic 715 of the chip 100, which is shown in the drawings in
One purpose of the DCDL is to provide a wide range of timing delays that can be controlled and calibrated using digital signals from control state machines implemented in the core logic. Each delay line may be composed of eight independent Delay Taps, as further described herein and as shown in
The controller for the DCDL is found in the core 715 and has its own control logic. The control lines from the DCDL controller in core 715 to the DCDL in the eIOMOTIF portion 660 of the chip carry Gray-encoded binary code rather than thermometer code in order to save space on the chip 100, since Gray code takes fewer signal lines to send; the Gray code is then converted to thermometer code for controlling the DCDL circuit.
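The space saving can be quantified with simple arithmetic, sketched here in Python under the assumption that the all-zeros thermometer code is itself one of the settings (some descriptions count one extra line). The function name is hypothetical.

```python
def control_lines(num_settings: int) -> dict:
    """Wires needed to select one of `num_settings` delay settings.

    Encoded (Gray or plain binary): ceil(log2(num_settings)) lines.
    Thermometer: num_settings - 1 lines, counting the all-zeros code as a setting.
    """
    if num_settings < 2:
        raise ValueError("need at least two settings")
    encoded = (num_settings - 1).bit_length()  # integer ceil(log2(num_settings))
    thermometer = num_settings - 1
    return {"encoded": encoded, "thermometer": thermometer}


for settings in (8, 16, 256):
    print(settings, control_lines(settings))
```

Gray code uses the same number of lines as plain binary. Its commonly cited advantage over binary is that adjacent values differ in exactly one bit.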
In order to meet the timing requirements, in the present invention the DCDL delay circuit is implemented using a fine controllable delay section, such as shown in
Turning attention to
The DCDL of
The five coarse delay modules 22, 24, 26, 28, 30 operate as traditional gate-delay devices comprising muxes: when a predetermined control signal is received by the multiplexer, the input signal (typically a clock signal) is sent either to output Z1 or to output Z2. Hence in
At some point, depending on how many stages of coarse delay modules there are (five are shown in
As discussed, the coarse grain modules 22, 24, 26, 28, 30 respond to five thermometer control signals. The thermometer control signals are output from a coarse delay decoder 230, as shown in
In contrast to the coarse delay modules, the fine-delay modules 12, 14 can delay a signal, such as a clock signal, by a more graduated and precise series of unit times, namely “sub-gate delay” unit times, whose minimum value (the minimum resolution) is smaller than the coarse grain module “gate-delay” unit of time, and which, even when summed together, may be smaller than that gate-delay unit of time. Referring to
The sub-gate delay logic array 250 is a fine-delay, sub-gate delay circuit shown in detail in the embodiments of
In
Operation of this delay inverter controlled by a thermometer decoder is as follows. When a suitable thermometer-coded control voltage is applied at gate inputs CN1, CN2, CN3, CN4, CN5, CN6, CN7 for the PFET transistors and gate inputs C1, C2, C3, C4, C5, C6, C7 for the NFET transistors, sufficient that the magnitude of each MOSFET gate-source voltage exceeds the threshold value, the pMOS transistors 262, 264, 266, 268, 270, 272, 274 and nMOS transistors 261, 263, 265, 267, 269, 271, 273 conduct, delivering maximum current via their drains into the sources of CMOS transistors 253, 255; this can be shown empirically and theoretically to produce a minimum delay through the inverter 252. Conversely, turning off all of the controlled PFET and NFET transistors at the top and bottom produces a maximum delay through inverter 252, while turning off some pairs of PFET and NFET transistors but keeping other pairs on results in a delay somewhere between these two extremes. Consequently this sub-gate delay logic array structure, as in
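A crude first-order model can make the trend concrete: delay falls as more thermometer-controlled pairs conduct. The Python sketch below is a hypothetical constant-current approximation, not a circuit simulation. The load capacitance, supply voltage and per-branch current are invented illustrative values, and real branch currents depend on PVT and on transistor sizing.

```python
def starved_inverter_delay(k_enabled: int, n_pairs: int = 7, c_load: float = 5e-15,
                           vdd: float = 0.9, i_unit: float = 20e-6) -> float:
    """First-order delay (seconds) of an inverter starved by parallel current branches.

    Hypothetical model: the always-on pair and each enabled thermometer-controlled
    pair contribute an equal current i_unit, and delay ~ C_load * (Vdd / 2) / I.
    """
    if not 0 <= k_enabled <= n_pairs:
        raise ValueError(f"k_enabled must be in 0..{n_pairs}")
    current = i_unit * (1 + k_enabled)   # always-on pair + k enabled pairs
    return c_load * (vdd / 2) / current  # time to move the output by half a swing


for k in range(8):  # thermometer codes 0000000 .. 1111111
    print(f"{k} pairs on: {starved_inverter_delay(k) * 1e12:6.1f} ps")
```

In this model the delay is proportional to 1/(1 + k), so successive steps shrink as k grows. A real design would typically be characterized by simulation or measurement rather than by this formula.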
Hence it can be seen that the fine grain sub-gate delay logic array structure shown in
Glitch-free operation of the present invention has been found to occur when the DCDL structure disclosed herein is operated using thermometer coding for the control signals and the thermometer code is not changing from one value to another while the signal being delayed is in transition; hence the present DCDL is substantially glitch-free. This is true both for the fine-grain control blocks 12, 14 and for the coarse grain control blocks 22, 24, 26, 28, 30. In
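One property often cited in favor of thermometer control codes can be checked in a few lines of Python. Stepping the setting by one toggles exactly one control line, and the set of enabled elements only grows, whereas a plain binary step can toggle several lines at once. This is a sketch of the coding property only and does not model the circuit. The glitch-free condition stated above, that the code is held stable while the delayed signal is in transition, still applies.

```python
def thermometer(value: int) -> int:
    """Thermometer code as an integer bit mask with `value` low-order ones."""
    return (1 << value) - 1


def bits_changed(a: int, b: int) -> int:
    """Number of control lines that toggle when the code changes from a to b."""
    return bin(a ^ b).count("1")


codes = [thermometer(v) for v in range(8)]  # 0000000 .. 1111111
steps = [bits_changed(a, b) for a, b in zip(codes, codes[1:])]
nested = all((a & b) == a for a, b in zip(codes, codes[1:]))  # enabled set only grows
print(steps, nested)  # → [1, 1, 1, 1, 1, 1, 1] True
print(bits_changed(3, 4))  # plain binary 011 -> 100 toggles 3 lines → 3
```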
Thus for the thermometer coding of the coarse tuning blocks 22, 24, 26, 28, 30, using the earlier notation, and if five bits are used for thermometer values, then a control thermometer value for minimum delay would be 10000, i.e. a delay path of A→22→Z. For the next larger delay, the thermometer value might be 11000, i.e. a delay path of A→22→24→22→Z. The next larger delay after this step might have a thermometer value of 11100, i.e. a delay path of A→22→24→26→24→22→Z. The maximum delay would be to traverse all the coarse tuning blocks, and might have a thermometer control voltage value of 11111, i.e. a delay path of A→22→24→26→28→30→28→26→24→22→Z. Of course one could elect to turn off the control modules altogether, e.g. with a thermometer value of 00000. The same is true for coarse delay that may be produced by the fine grain delay modules 12 and 14 (which can take two bits at D1, D2 in
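The path rule in the preceding paragraph can be expressed as a short Python sketch. A code with k leading ones sends the signal out through k coarse modules and back through k - 1 of them, for 2k - 1 module traversals. The treatment of 00000 as "no active path" and the function name are assumptions for illustration.

```python
COARSE_MODULES = [22, 24, 26, 28, 30]  # reference numerals of the coarse stages


def coarse_path(code: str) -> list:
    """Signal path A -> ... -> Z for a leading-ones thermometer code such as '11000'."""
    k = code.count("1")
    if len(code) != len(COARSE_MODULES) or code != "1" * k + "0" * (len(code) - k):
        raise ValueError(f"{code!r} is not a valid {len(COARSE_MODULES)}-bit thermometer code")
    if k == 0:
        return []  # assumption: all stages off means no active path
    forward = COARSE_MODULES[:k]
    # The signal travels out through k stages and returns through the first k - 1.
    return ["A", *forward, *reversed(forward[:-1]), "Z"]


for code in ("10000", "11000", "11100", "11111"):
    path = coarse_path(code)
    print(code, "→".join(map(str, path)), f"({len(path) - 2} module traversals)")
```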
The controlling CMOS transistors comprise transistors 260A, 262A, 264A, 266A, 268A, 270A, 272A, 274A (pMOS) and transistors 259A, 261A, 263A, 265A, 267A, 269A, 271A, 273A (nMOS). The first two of these transistors, 259A and 260A, are connected at their gates to Vdd and Vss (the positive supply voltage and ground), respectively, so that they are always on, while the remaining seven P-type MOSFET transistors 262A, 264A, 266A, 268A, 270A, 272A, 274A and seven N-type MOSFET transistors 261A, 263A, 265A, 267A, 269A, 271A, 273A have their gates connected to the fine-stage thermometer decoder outputs CN1, CN2, CN3, CN4, CN5, CN6, CN7 for the PFET transistors and C1, C2, C3, C4, C5, C6, C7 for the NFET transistors. The outputs (i.e. the drains) of these transistors are operatively tied to the inverter 252A, as shown. Hence, as shown in
The fine-delay structure described in the preceding paragraph and in reference to
Operation of the fine-tune delay of this fine-stage delay inverter, controlled by a thermometer decoder, is substantially the same as in the
Turning now to the Structured ASIC in which the DCDL of the present invention appears, there is shown in
As shown in the figures, in particular
All of these first, second, third and fourth routing fabrics are distinct. Ordinarily the first and third routing fabrics, which deal with IO and testing, are not directly connected, but a designer may choose to operatively connect one fabric to another and to the core 715.
The first IO fabric of IO sub-bank 630, has four sub-banks 632, 634, 636, 638 on the left side of the Structured ASIC in
As shown in
The Structured ASIC chip 100 of the present invention has eight signal metal layers (M1-M8), with one of those eight layers being customizable, or via-configurable, by the customer of the Structured ASIC and the others being fixed prior to customization by the customer, and three metal layers M9/M10/M11 for power distribution.
In
IO path areas include areas for power-related macros and sub-bank routing, as well as logical pin IO repeater areas, where any IO signal may be buffered and/or repeated for eventual transmission to the logical/physical pins that contact the Structured ASIC chip 100 at the periphery, for input/output of external signals. The eIOMOTIF boundary region 660 can contain logic to configure the eIO cell blocks 670, is also tied to the DCDL blocks, and can be considered part of the core 715.
For the Structured ASIC chip 100 there are several IO sub-bank routing blocks 630, as can be seen in
As best shown in
As best shown in
As best shown in
As best seen in
A plurality of planar connection blocks or connectors 1094 can be used to connect what is normally an open circuit at each of the lines 1092, the connectors being placed inline with the lines 1092. When a connector is filled, preferably in a via-configurable manner, it closes, and the line 1092 goes from an open-circuit to a closed-circuit state and conducts a signal. Once the connectors 1094 are closed there can be electrical conduction in the horizontally extending wires 1092. The via-programmable planar connection blocks 1094 are placed along a diagonal line, as shown, to provide a better layout. Inverters or inverting buffers 1096 are also placed along a diagonal line, to create a balanced signal and buffer it, and connect to the horizontally placed wires 1092. The inverters 1096 are equally spaced from the connectors 1094, so that a signal branching from a connector takes the same amount of time to traverse the branch leading up as it does to traverse the branch leading down. The HS units 1082, 1084 each have a planar network end 1097 and an open end 1098. To form a planar network, as shown, the planar network ends of HS units 1082, 1084 are abutted end to end. The area of intersecting vertical and horizontal signal wires 1090, 1092, together with the associated programmable vias, inverters and planar box connection blocks, forms a fourth routing fabric switch.
The high-speed routing fabric of
An illustration of the myriad connections that may be possible given the structure of
In an actual design the more general case is to have several trees in parallel, each using different lines in the high-speed fabric 1080. Hence one might have, say, eight entry points on the left-hand side of the HS fabric 1080, which runs down the north-south side of the chip 100, and eight destination points running into the core 715 of the chip 100, all handled by the HS fabric working with the eIOMOTIF fabric 660 and running into the boundary eMotif cells 603. Eight entry points are often used for the phases of PLLs/DLLs in the chip 100. Multiple entry points are also used with DDR SDRAM interfaces, as explained further herein. The routing delay will be the same for any and all of these entry and destination points due to the balanced nature of the HS fabric 1080.
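The balancing argument can be illustrated with an abstract Python model in which each level of a tree splits into an up branch and a down branch of equal delay, in the spirit of the equally spaced inverters 1096. The classes, names and unit delays are hypothetical, and loading, crosstalk and PVT effects are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """One node of a routing tree; `delay` is that of the segment feeding it."""
    name: str
    delay: float = 0.0
    children: list = field(default_factory=list)


def build_balanced(levels: int, segment_delay: float, name: str = "") -> Branch:
    """Binary tree whose up and down branches have equal delay at every level."""
    node = Branch(name)
    if levels > 0:
        for side in ("u", "d"):  # branch leading up / branch leading down
            child = build_balanced(levels - 1, segment_delay, name + side)
            child.delay = segment_delay
            node.children.append(child)
    return node


def leaf_delays(node: Branch, acc: float = 0.0) -> dict:
    """Accumulated entry-to-destination delay for every leaf of the tree."""
    acc += node.delay
    if not node.children:
        return {node.name: acc}
    result = {}
    for child in node.children:
        result.update(leaf_delays(child, acc))
    return result


tree = build_balanced(levels=3, segment_delay=1.5)  # 2**3 = 8 destinations
delays = leaf_delays(tree)
print(len(delays), "destinations, skew =", max(delays.values()) - min(delays.values()))
# → 8 destinations, skew = 0.0
```

Lengthening any single segment, e.g. `tree.children[0].delay = 2.0`, makes the four destinations below it 0.5 units later than the others. That is the kind of skew the equal spacing of the inverters 1096 is meant to avoid.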
The HS fabric 1080 abuts a single eMotif 203 module on one side as shown in
The HS fabric 1080 can be operatively connected to the eIOMOTIF fabric 660, which is tied to both the eMotif cell modules 603 and the eIOs of IO sub-bank 630. The HS fabric and the trees that can be built in it can support the global clock tree for chip 100.
The HS fabric 1080 can also support an interface for memory, such as DDR SDRAM, and any associated logic for this DDR interface (the DDR memory itself is located outside the chip 100). The HS fabric 1080 also supports eIOs and DLLs/PLLs in the IO sub-bank 630, including but not limited to the single-ended IOs and differential IOs found therein. A byte of DDR interface includes data for eight single-ended IOs, a differential IO for the synchronization strobe, and data for the PLL/DLL. This DDR interface is readily implementable from the hardware of the present invention, despite the strict requirements for skew, cross-talk and balancing, by utilizing the eIOMOTIF fabric and eMOTIF modules. Using the hardware, one could even construct a hard macro to achieve the functionality of the DDR interface. Using the present invention, any interface may be implemented, including but not limited to serial data streams, serializers/deserializers, network interfaces and other data interfaces.
The architecture of the present invention has been found not to produce clock glitches when the control signals are thermometer coded as taught herein, and to have a wide range of operation across process, voltage and temperature (PVT) variations.
A designer using the DCDL architecture of the present invention can thus produce various delays, from fine to coarse, over a wide range. Hence the present invention achieves a glitch-free DCDL with scalable range by combining, in a serial, pipelined-stage manner, a sub-gate delay fine stage structure with a coarse stage structure, as shown in the figures, as long as thermometer coding is employed for the control code and the control code does not change during a transition of any data signal, such as a clock signal. Hence the present invention is substantially glitch-free.
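How the two stages combine can be explored with an idealised Python model in which the coarse stage sets the large steps and the fine stage fills in between them. Every timing number, and the linear treatment of the fine stage, is a hypothetical simplification. The sketch only shows that the combined line has a small minimum delay and a large range, and that the fine stage must span roughly one coarse step (here two mux traversals) to avoid holes in the delay range.

```python
from itertools import product


def dcdl_delays(coarse_stages: int = 5, fine_steps: int = 7, t_fixed: int = 30,
                t_mux: int = 20, t_sub: int = 6) -> dict:
    """Total delay in ps for every (coarse, fine) setting of an idealised linear DCDL.

    All timing values are hypothetical. A coarse code with k ones traverses
    2k - 1 mux modules; a fine code with j ones adds j sub-gate steps of t_sub.
    """
    table = {}
    for k, j in product(range(1, coarse_stages + 1), range(fine_steps + 1)):
        table[(k, j)] = t_fixed + (2 * k - 1) * t_mux + j * t_sub
    return table


def summarize(table: dict) -> dict:
    """Minimum delay, maximum delay and largest gap between achievable delays."""
    delays = sorted(set(table.values()))
    gaps = [b - a for a, b in zip(delays, delays[1:])]
    return {"min": delays[0], "max": delays[-1], "largest_gap": max(gaps)}


print(summarize(dcdl_delays()))         # → {'min': 50, 'max': 252, 'largest_gap': 6}
print(summarize(dcdl_delays(t_sub=4)))  # → {'min': 50, 'max': 238, 'largest_gap': 12}
```

In this model the largest gap stays at one fine step only while (fine_steps + 1) * t_sub >= 2 * t_mux. With t_sub = 4 the fine range no longer bridges a coarse step, and 12 ps holes appear.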
Placement of the blocks that make up the DCDL of the present invention is shown in
As mentioned, the controller for the DCDL is found in the core 715 and has its own control logic, with signal lines from the DCDL controller in core 715 carrying a Gray-encoded signal to the DCDL in the eIOMOTIF fabric 660 to save space on the chip 100, since Gray codes are more compact than thermometer codes and require fewer signal lines or less bandwidth to transmit. Thus the thermometer values used in the DCDL originate as Gray code values in the DCDL controller in core 715 and are converted to thermometer code by the DCDL Binary-to-Thermometer decoder. This saves signal lines for transmission from the DCDL controller to the DCDL circuit since, for example, four encoded bits can be decoded into up to 16 control lines (2^4 = 16). In actual practice eight bits in Gray code are sent by a DCDL controller found in core 715 and decoded by the decoders as shown in
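The conversion path can be sketched in Python as a Gray-to-binary step followed by a binary-to-thermometer expansion. This is a behavioral sketch under stated assumptions: binary-reflected Gray code and right-aligned thermometer outputs are assumed, the function names are hypothetical, and the actual decoder logic in the drawings is not reproduced.

```python
def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-folding all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_to_thermometer(g: int, width: int) -> str:
    """Decode a Gray-coded setting to a right-aligned thermometer code."""
    value = gray_decode(g)
    if value > width:
        raise ValueError(f"setting {value} needs more than {width} thermometer lines")
    return "0" * (width - value) + "1" * value


# Four Gray-coded lines select one of 16 settings; 15 thermometer lines cover 0..15.
for setting in (0, 1, 5, 15):
    g = gray_encode(setting)
    print(f"setting {setting:2d}: gray {g:04b} -> {gray_to_thermometer(g, 15)}")
```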
Regarding the present invention, it is important to reiterate that the floorplan of the Structured ASIC is providing an infrastructure for a customer to use to build some sort of circuit of value to the customer, primarily through programmable vias. The number of circuits that can be built, and the various interconnections between the elements of the Structured ASIC, is a large set. Thus by definition not every conceivable variation of interconnection that is possible using the architecture of the present invention can be readily described in a single document of reasonable size, but the essential features are described in the present application, as can be appreciated by one of ordinary skill in the art.
Regarding manufacture of the present semiconductor circuit comprising a DCDL in a via-configurable Structured ASIC, it may be manufactured on a 28 nm CMOS process lithographic node or smaller, with feature sizes of this dimension or smaller. The method of manufacturing the ASIC may follow the flow described herein in connection with an ASIC and/or Structured ASIC, with the DCDL being a block of logic within that ASIC. The DCDL, as well as the floor plan of the Structured ASIC of the present invention, is manufactured using a CMOS semiconductor process using NFET/nMOS and PFET/pMOS transistors, which includes a via-configurable logic block (VCLB) architecture. VCLB configuration may be performed by changing properties of so-called “configurable vias”, which are connections between VCLB internal nodes. The configurable vias are used to customize the chip at a plurality of metal layers, preferably between two metal layers with a single via layer, and are changed by the customer that deploys the Structured ASIC. Thus it is possible in this design for the customizable metallization layers to be reduced to a few, or even a single, via layer where the customization is performed; see, by way of example and not limitation, the patents to eASIC Corporation, the assignee of the present invention: U.S. Pat. No. 6,953,956, issued to eASIC Corporation on Oct. 11, 2005; U.S. Pat. No. 6,476,493, issued to eASIC Corporation on Nov. 5, 2002; and U.S. Pat. No. 6,331,733, issued to eASIC Corporation on Dec. 18, 2001; all incorporated herein by reference in their entirety. Further, a single via layer could be customized without resorting to mask-based optical lithography, using a maskless e-beam process instead, as taught by the '956 patent.
Modifications, subtractions and/or additions can be applied by one of ordinary skill from the teachings herein without departing from the scope of the present invention. For example, the invention discusses the architecture of a DCDL but does not make any claims on how to configure the DCDL to achieve control, e.g. to construct a first-order DCDL or a second-order DCDL. Constructing such a DCDL is limited only by the imagination of the designer using the architecture of the present invention. As another example, in both the fine-tune stage and the coarse-tune stage the delay elements are inverters, but this term should be thought of as synonymous with any sub-gate element that is capable of delaying a signal. Inverters are generally favored because the delay each one produces is relatively small, so a finer resolution of delay is possible by the cumulative addition of such delays; in general, however, any electronic structure that produces delay can be thought of as functioning as, and synonymous with, the delay-producing inverter taught herein. Thus the scope of the invention is limited solely by the claims.
It is intended that the scope of the present invention extends to all such modifications and/or additions and that the scope of the present invention is limited solely by the claims set forth below.
The present application is related to: U.S. application Ser. No. ______, Attn. Docket No. EAS 12-1-2 for “VIA-CONFIGURABLE HIGH-PERFORMANCE LOGIC BLOCK INVOLVING TRANSISTOR CHAINS” by Alexander Andreev, Sergey Gribok, Ranko Scepanovic, Phey-Chuin TAN, Chee-Wei KUNG, filed the same day as the present invention, ______ 2012; U.S. application Ser. No. ______, Attn. Docket No. EAS 12-2-2 for “ARCHITECTURAL FLOORPLAN FOR A STRUCTURED ASIC MANUFACTURED ON A 28 NM CMOS PROCESS LITHOGRAPHIC NODE OR SMALLER” by Alexander Andreev, Ranko Scepanovic, Ivan Pavisic, Alexander Yahontov, Mikhail Udovikhin, Igor Vikhliantsev, Chong-Teik LIM, Seow-Sung LEE, Chee-Wei KUNG, filed the same day as the present invention, ______ 2012; U.S. application Ser. No. ______, Attn. Docket No. EAS 12-3-2 for “CLOCK NETWORK FISHBONE ARCHITECTURE FOR A STRUCTURED ASIC MANUFACTURED ON A 28 NM CMOS PROCESS LITHOGRAPHIC NODE” by Alexander Andreev, Andrey Nikishin, Sergey Gribok, Phey-Chuin TAN, Choon-Hun CHOO, filed the same day as the present invention, ______ 2012; U.S. application Ser. No. ______, Attn. Docket No. EAS 12-4-2 for “MICROCONTROLLER CONTROLLED OR DIRECT MODE CONTROLLED NETWORK-FABRIC ON A STRUCTURED ASIC” by Alexander Andreev, Andrey Nikitin, Marian Serbian, Massimo Verita, filed the same day as the present invention, ______ 2012; Attn. Docket No. EAS 12-5-2 for “TEMPERATURE CONTROLLED STRUCTURED ASIC MANUFACTURED ON A 28 NM CMOS PROCESS LITHOGRAPHIC NODE” by Alexander Andreev and Massimo Verita, filed the same day as the present invention, ______ 2012; and all assigned to the same Assignee as the present invention, all of which are specifically incorporated herein by reference.