This disclosure relates generally to multi-chip packages and, in particular, to improved techniques for reducing the number of die-design types required for a multi-chip package implementation.
The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:
In some embodiments, programmable devices such as FPGAs may be used to create multi-chip computing systems, for example, to prototype system designs prior to committing to fixed CPUs, GPUs, ASICs, and the like. Conventionally, several different programmable die designs have been needed to implement such a system. Unfortunately, creating and verifying the different masks and manufacturing processes needed for multiple die designs can cost excessive money and time. Accordingly, in some embodiments, new die configuration types are provided that may be used together with other instances of the same design to create multi-die modules requiring just a single die type. For example, in some embodiments, a module may use a bridge with through-silicon via (TSV) capabilities or an interposer to facilitate a single die tape-in instead of requiring multiple, unique die tape-ins. A reduced number of required die types may result in a reduced number of required tape-ins, which can result in mask cost savings, as well as improvements in design turn-around time. In addition, test program development may be simplified, and product costs may be improved as volume is served by the reduced number of die types and wafers, resulting in improved yield.
The general IO blocks 115 may be used for die-to-die (D2D) connectivity, e.g., employing PAM, single-ended, differential, SerDes, and/or parallel interface implementations. In addition, they could also be used to implement external communication interfaces such as for off-chip memory (e.g., DDR, GDDR) or programming, test, or monitoring (e.g., I2C, SPI) depending on particular design objectives.
The interior processing region 120 generally includes a plurality of functional circuit blocks 122 coupled together through a programmable interconnect fabric 124. As indicated, functional blocks 122 may include a variety of different processing block types including configurable logic blocks (CLBs), intellectual property (e.g., hardened IP, HIP) blocks for performing specific functions, and memory blocks (Mx).
CLBs typically include three elements: look-up tables (LUTs), multiplexers, and flip-flops. The IP blocks typically comprise specialized logic, mixed-signal, and/or analog circuits for implementing functionality such as fundamental arithmetic (e.g., adders, multiply-accumulate (MAC) units), security, DSP, GPU and CPU cores, memory and IO controllers, clock generation circuits, and the like. As technologies advance, more and more functional block options become feasible for FPGA incorporation.
With programmable logic, LUTs are the primary elements for implementing configurable logical functions. For example, they can be arranged and controlled to implement the truth table of any desired combinational logic function. The flip-flops are used for sequential logic implementation. They also may be used to efficiently incorporate adders/multipliers and DSP logic, for example, inside the CLBs themselves, to reduce latency, facilitate faster computation, reduce routing, and increase throughput. The multiplexers, among other things, are used to select the data output and pathways between the LUTs and flip-flops to configure the desired logical functionality.
The memory blocks may include a combination of volatile and non-volatile memory such as RAM (random access memory), ROM (read only memory), flash memory and the like. The memory may be used for a variety of purposes such as for storing programmable logic configurations, implementing processor architecture memory (e.g., distributed and block RAM for cache functionality), buffering data, and the like.
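As an illustration only, the roles of the three CLB elements can be sketched in software. The following Python model is a simplified sketch, not a description of any particular CLB circuit: the names make_lut, lut_read, mux, and FlipFlop are hypothetical, and the full-adder configuration is merely one example of a truth table that a 3-input LUT could hold.

```python
from itertools import product


def make_lut(func, num_inputs):
    """Build a LUT as a tuple of output bits, one per input combination.

    The table is ordered by address, with the first input as the most
    significant address bit.
    """
    return tuple(
        int(bool(func(*bits))) for bits in product((0, 1), repeat=num_inputs)
    )


def lut_read(lut, *bits):
    """Read the output bit; the inputs form the table address (MSB first)."""
    address = 0
    for bit in bits:
        address = (address << 1) | (bit & 1)
    return lut[address]


def mux(select, a, b):
    """2:1 multiplexer: pass a when select is 0, b when select is 1."""
    return b if select else a


class FlipFlop:
    """D flip-flop model: the stored bit changes only when clocked."""

    def __init__(self):
        self.q = 0

    def clock(self, d):
        self.q = d & 1
        return self.q


# Hypothetical configuration: two 3-input LUTs programmed as a full adder.
sum_lut = make_lut(lambda a, b, c: a ^ b ^ c, 3)
carry_lut = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)

# Combinational path: the LUT output is available immediately.
combinational = lut_read(sum_lut, 1, 0, 0)

# Sequential path: the same value is captured by a flip-flop on a clock edge.
register = FlipFlop()
register.clock(combinational)

# The multiplexer selects whether the block outputs the raw or registered bit.
output = mux(1, combinational, register.q)
```

In this sketch, reprogramming the logic amounts to loading a different tuple into the LUT, which loosely mirrors how a stored configuration determines the combinational function.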
The functional blocks 122 are coupled to each other and to the IO interfaces through programmable interconnect fabric 124. The interconnect fabric may be implemented in any suitable manner. For example, it may be formed as a routing matrix comprising programmable switches, wires, clock network elements, and the like. The routing elements provide connections between the IO blocks 110, 115 and the data processing section 120, and also between the functional blocks 122 themselves.
The multi-chip FPGA device of
Unfortunately, these conventional approaches require the use of different die designs (e.g., layouts), which can dramatically increase the time and resources needed to make all of the dies. They require multiple tape-ins, resulting in significant mask cost. In addition, fixing bugs for different die types can result in additional tape-ins when trouble-shooting multiple, different designs.
It should be appreciated that any suitable technology for implementing a multi-chip package of dies (e.g., multiple FPGA dies, or even multiple CPU, GPU, or ASIC dies) including 2D, 2.5D and/or 3D methodologies may be employed. For example, wafer-level fan-out redistribution, using reconstituted wafer substrates of molding compounds as the surface for interconnections between dies may be used in 2D or 2.5D implementations. Similarly, with some methods, a separate, usually silicon-based, interconnect layer for redistribution could be used. For example, either an interposer (passive and/or active, typically formed from silicon) or die-to-die bridges (e.g., silicon bridges) embedded in an organic surface (e.g., substrate surface or interposer) could be employed.
An interposer is typically formed from a piece of silicon, large enough to accommodate the multiple chips with the chips being bonded to the interposer. Interposers typically include multiple signal lines (e.g., data lines), and because the data is being moved from silicon to silicon, the loss of power may be minimized.
Bridges, such as EMIB (Embedded Multi-Die Interconnect Bridge), developed by Intel Corp., may also be employed. EMIB is an example of a 2.5D MCP bridge interconnect technology. In some forms, EMIB may be a combination of both interposer and substrate. Rather than simply employing a large interposer, this technique may use a small sliver of silicon (the bridge) embedded into the substrate. Such a bridge may include hundreds or thousands of connections to couple adjacent sides of two chips together. In this way, data between the chips may be transferred through silicon without excessive restrictions. Also, multiple bridges between two chips may be employed if more bandwidth is needed, or multiple bridges for designs using more than two chips could also be used.
Any suitable architectures for implementing the general IO blocks may be employed. For example, for D2D implementations, proprietary or standard protocols such as Advanced Interface Bus (AIB) or Universal Chiplet Interconnect Express (UCIe) may be used. Regardless, the physical layer architecture can be SerDes-based or parallel-based. A SerDes-based architecture typically includes parallel-to-serial (serial-to-parallel) data conversion, impedance matching circuitry, and in some cases, clock data recovery or clock forwarding functionality. The primary role for using a SerDes architecture may be to minimize the number of IO interconnects in simple 2D-type multi-chip packaging, e.g., as with organic substrates employing bridges, or the like, for the D2D connections.
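The parallel-to-serial and serial-to-parallel conversion at the heart of a SerDes-based physical layer can be illustrated with a short behavioral model. This is a sketch under stated assumptions: the function names serialize and deserialize, the least-significant-bit-first ordering, and the fixed word width are all hypothetical choices made for illustration, and the model omits impedance matching, clock data recovery, and any protocol framing.

```python
def serialize(words, width):
    """Flatten parallel words into one bit stream, least significant bit first.

    Assumption: each word is an unsigned integer that fits in `width` bits.
    """
    bits = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word} does not fit in {width} bits")
        bits.extend((word >> i) & 1 for i in range(width))
    return bits


def deserialize(bits, width):
    """Regroup a serial bit stream into parallel words of `width` bits."""
    if len(bits) % width:
        raise ValueError("bit stream length is not a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= (bit & 1) << i
        words.append(word)
    return words


# Three 8-bit words cross the die boundary on a single serial lane
# rather than on eight parallel wires.
stream = serialize([3, 12, 255], 8)
recovered = deserialize(stream, 8)
```

The trade-off the model hints at is the one noted above: a single lane carries the data that would otherwise need one interconnect per bit, at the cost of a proportionally higher bit rate on that lane.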
On the other hand, a parallel-based architecture typically includes many low-speed, simple IO channels in parallel, each made of a driver and a receiver, with clock-forwarding techniques to further simplify the architecture. It supports DDR-type signaling and, for certain multi-chip designs, may be well suited for D2D applications. For example, a parallel architecture may be well suited for minimizing power in dense 2.5D-type packaging, as with, for example, the use of silicon interposers.
The signal lines 244 couple the adjacent D2D block pairs (217A-219B and 217B-219C) to one another for chip-to-chip communications. The reference layer 242, from a top view perspective (not shown) may have gaps or openings to accommodate the signal lines and possibly other signals (e.g., IO from active transceivers). Alternatively, the signal lines could be formed from vias or micro vias with insulating lateral surfaces.
Compute system prototyping may be a highly beneficial use of FPGA modules 660 in accordance with some embodiments. Hardware platforms such as FPGA prototyping are growing in popularity due to their relatively low expense and their ability to test system designs at speed, in contrast to simulation, which is often too slow and cannot provide an accurate assessment of design behavior. FPGA-based prototyping may be well suited for even the largest designs. An FPGA-based prototype system allows engineers to use the same software in the prototype system as in the final product, thus allowing an early start in software development. The architecture for the prototype need only include minor additions compared to the final architecture. Therefore, the evaluation of different configurations and functionality verification may be simple, reliable, and fast. It also allows for the evaluation of large system-on-chips using one or more multi-FPGA modules such as a multi-FPGA module 200, 500, and/or 660, as previously discussed. When combined with the ability to control the clocking of individual components, such a configuration allows analysis of both software and hardware. Another benefit is that FPGA-based compute system prototypes can use synthesizable RTL (register transfer level) code developed for an actual hardened design to provide cycle-accurate, high-performance execution and real-world interface connectivity. This performance can scale with the complexity of designs because prototyping solutions allow a design to be partitioned across multiple FPGAs in order to handle very large design sizes requiring massive verification throughput. This also brings the added benefit of more time to perform exhaustive verification of large designs, or to allow additional exploration of design options.
While verification may be a primary use, physical prototyping supports other use cases, including proof-of-concept research, test pattern generation, IP development, end-user evaluation, and even use as highly configurable computer systems for a variety of applications.
It should be appreciated that the FPGA modules 660 may be a component included in any suitable data processing system, such as a data processing system 600, shown in
In some embodiments, the data processing system 600 may be part of a data center that processes a variety of different requests. For instance, the data processing system 600 may receive a data processing request via the host/network interface 610 to perform encryption, decryption, machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, digital signal processing, or some other specialized task.
Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any compatible combination of, the examples described below.
Example 1 is a multi-chip module that includes first and second dies that each have a first side including a peripheral region with both transceiver and D2D blocks and a second side that includes a peripheral region with both transceiver and D2D blocks. The first die is disposed next to the second die such that the second side of the first die is next to the first side of the second die, and at least a portion of the D2D block of the first die's second side is coupled to at least a portion of the D2D block of the second die's first side. Moreover, at least some of the D2D blocks are unused when in a side that does not have a neighboring die, and some of the transceiver blocks are unused when in a side of the die that does have a neighboring die. It should be appreciated that any suitable die type, e.g., PLD, FPGA, CPU, GPU, ASIC, and the like, could be used for implementing these dies in this and in the other examples presented throughout the specification.
Example 2 includes the subject matter of example 1, and wherein the coupled together general IO blocks from the first and second dies include die-to-die IO interfaces for communicatively coupling the first die to the second die.
Example 3 includes the subject matter of any of examples 1-2, and wherein the first and second dies are field programmable gate array dies.
Example 4 includes the subject matter of any of examples 1-3, and wherein the first and second dies are separate instances of the same die design.
Example 5 includes the subject matter of any of examples 1-4, and wherein the general IO blocks are disposed between the transceiver blocks.
Example 6 includes the subject matter of any of examples 1-5, and wherein the transceiver and general IO blocks in each peripheral region are interleaved, whereby each block is at an outer edge of its die.
Example 7 includes the subject matter of any of examples 1-6, and wherein the first and second dies are mounted on an interposer having signal lines for coupling general IO block portions from the first die's second side to the second die's first side.
Example 8 includes the subject matter of any of examples 1-7, and wherein the interposer has a reference plane coupled to the transceivers on the first die's second side and second die's first side to render them inert.
Example 9 includes the subject matter of any of examples 1-8, and wherein the first and second dies are mounted atop an organic substrate portion that includes at least one bridge for coupling general IO block portions from the first die's second side to the second die's first side.
Example 10 is an apparatus that includes a programmable integrated circuit die. The die has an interior processing region and a peripheral region that includes an inner general IO block and an outer transceiver block. The inner general IO block is disposed between the outer transceiver block and the interior processing region.
Example 11 includes the subject matter of example 10, and wherein the general IO block includes D2D circuitry.
Example 12 includes the subject matter of any of examples 10-11, and wherein the D2D circuitry comprises SerDes circuits.
Example 13 includes the subject matter of any of examples 10-12, and wherein the peripheral region occupies opposite sides of the die.
Example 14 includes the subject matter of any of examples 10-13, and further comprising a second die that is a separate instance of the programmable integrated circuit die, which is a first die.
Example 15 includes the subject matter of any of examples 10-14, and wherein the first and second dies are mounted on an interposer having signal lines for coupling general IO block portions from the first die to the second die.
Example 16 includes the subject matter of any of examples 10-15, and wherein the interposer has a reference plane coupled to the transceiver blocks that are to be inert from the first and second dies.
Example 17 includes the subject matter of any of examples 10-16, and wherein the first and second dies are mounted atop an organic substrate portion that includes at least one bridge for coupling general IO block portions from the first and second dies to one another.
Example 18 is an apparatus that includes a substrate and first and second FPGA dies. The first and second FPGA dies are of the same design and are mounted to the substrate. The first and second dies have adjacent sides that each include (i) a D2D block within a peripheral region, and (ii) an unused transceiver block within the peripheral region. The D2D blocks are coupled to one another, and the transceiver blocks are coupled to a reference rail to render them inert.
Example 19 includes the subject matter of example 18, and wherein the first and second dies are mounted to the substrate through an organic material.
Example 20 includes the subject matter of any of examples 18-19, and wherein the D2D blocks are coupled together through at least one bridge having multiple signal lines.
Example 21 includes the subject matter of any of examples 18-20, and further comprising through-silicon vias (TSVs) passing through the at least one bridge to couple the unused transceiver blocks to the reference rail.
Example 22 includes the subject matter of any of examples 18-21, and wherein the reference rail is a ground plane.
Example 23 includes the subject matter of any of examples 18-22, and wherein the first and second dies are mounted to an interposer.
Example 24 includes the subject matter of any of examples 18-23, and wherein the interposer is a silicon interposer.
Example 25 is a data processing apparatus including at least one FPGA module having the first and second dies in accordance with any of examples 18-24.
Example 26 is a programmable device module that includes a substrate and first and second FPGA dies. The first and second FPGA dies are of the same design and are mounted to the substrate. Moreover, the first and second dies have adjacent sides that each include (i) a D2D block within a peripheral region, and (ii) an unused transceiver block within the peripheral region. The module also includes means for coupling the D2D blocks to one another.
Example 27 includes the subject matter of example 26, and wherein the transceiver blocks are coupled to a reference rail to render them inert.
Example 28 includes the subject matter of any of examples 26-27, and wherein the first and second dies are mounted to the substrate through an organic layer.
Example 29 includes the subject matter of any of examples 26-28, and wherein the coupling means comprises at least one bridge.
Example 30 includes the subject matter of any of examples 26-29, and wherein the coupling means comprises through-silicon vias (TSVs) passing through the at least one bridge to couple the unused transceiver blocks to a reference rail.
Example 31 includes the subject matter of any of examples 26-30, and wherein the reference rail is a ground plane.
Example 32 includes the subject matter of any of examples 26-31, and wherein the first and second dies are mounted to an interposer.
Example 33 includes the subject matter of any of examples 26-32, and wherein the interposer is a silicon interposer.
Example 34 is a computing system with at least one module having the subject matter of any of examples 26-33.
Example 35 includes the subject matter of example 34 and further comprising at least one FPGA module to prototype a hardware system.
Example 36 is a multi-chip module that includes identical first and second dies having a peripheral region containing die-to-die IO on at least two sides of the die. At least some of the die-to-die IO in the package are unused when that side of the die does not have a neighboring die, and some of the off-package IO are unused when that side of the die does have a neighboring die.
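The neighbor-dependent use of peripheral blocks recited in several of the examples above can be summarized with a small planning model. This is a minimal sketch under simplifying assumptions, not a description of any claimed implementation: it assumes a single row of identical dies, treats each side as having one D2D block and one transceiver block, and marks a block as wholly used or unused, whereas the examples only require that at least some blocks be unused. The names SideUsage and plan_module are hypothetical.

```python
from dataclasses import dataclass

SIDES = ("left", "right")


@dataclass(frozen=True)
class SideUsage:
    """Which peripheral blocks on one side of a die are active."""
    d2d_used: bool
    transceiver_used: bool


def plan_module(num_dies):
    """Return per-die side usage for a single row of identical dies.

    Assumption: on a side facing a neighboring die, the D2D block is used
    and the transceiver block is left unused (for example, tied to a
    reference rail). On an outward-facing side, the reverse holds.
    """
    if num_dies < 1:
        raise ValueError("a module needs at least one die")
    plan = []
    for index in range(num_dies):
        has_neighbor = {
            "left": index > 0,
            "right": index < num_dies - 1,
        }
        plan.append({
            side: SideUsage(
                d2d_used=has_neighbor[side],
                transceiver_used=not has_neighbor[side],
            )
            for side in SIDES
        })
    return plan


# Two instances of the same die design placed side by side.
two_die_plan = plan_module(2)
```

Because every die in the plan is the same design, only the package-level connections differ between positions, which reflects the single-die-type approach described in the specification.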
Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments. If the specification states a component, feature, structure, or characteristic “may,” “might,” or “could” be included, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the elements. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional elements.
Throughout the specification, and in the claims, the term “connected” means a direct connection, such as electrical, mechanical, or magnetic connection between the things that are connected, without any intermediary devices.
The term “coupled” means a direct or indirect connection, such as a direct electrical, mechanical, or magnetic connection between the things that are connected or an indirect connection, through one or more passive or active intermediary devices.
The term “circuit” or “module” may refer to one or more passive and/or active components that are arranged to cooperate with one another to provide a desired function. Different circuits or modules may share or even consist of common components. For example, a controller circuit may be a circuit to perform a first function and at the same time, the same controller circuit may also be a circuit to perform another function, related or not related to the first function.
The term “signal” may refer to at least one current signal, voltage signal, magnetic signal, or data/clock signal. The meaning of “a,” “an,” and “the” includes plural references. The meaning of “in” includes “in” and “on.”
Unless otherwise specified, the use of the ordinal adjectives “first,” “second,” and “third,” etc., to describe a common object merely indicates that different instances of like objects are being referred to and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
For the purposes of the present disclosure, phrases “A and/or B” and “A or B” mean (A), (B), or (A and B). For the purposes of the present disclosure, the phrase “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B and C).
It is pointed out that those elements of the figures having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described but are not limited to such.
Furthermore, the particular features, structures, functions, or characteristics may be combined in any suitable manner in one or more embodiments. For example, a first embodiment may be combined with a second embodiment anywhere the particular features, structures, functions, or characteristics associated with the two embodiments are not mutually exclusive.
In addition, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown within the presented figures, for simplicity of illustration and discussion, and so as not to obscure the disclosure. Further, arrangements may be shown in block diagram form in order to avoid obscuring the disclosure, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are dependent upon the platform within which the present disclosure is to be implemented.