Examples of the present disclosure relate generally to methods, apparatuses and computer program products for providing three-dimensional architecture for computer chips.
For integrated circuits, electronic circuits may be fabricated on a semiconductor die, which may form a chip or chiplet. Each chip or chiplet may include one or more processing elements capable of performing specified functions on a given set of data or information.
Typically, the more processing elements a chip has, the more processing power the chip is capable of. However, conventional techniques to add processing power to a chip are limited. For example, in a two-dimensional approach, additional processing elements may be added along the width and length dimensions of the die. This two-dimensional approach is typically limited due to the size constraints (e.g., area) of a die, particularly for microelectronic technologies.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to limit the scope of the claimed subject matter. The foregoing needs are met, to a great extent, by the present application described in more detail below.
Three-dimensional application specific integrated circuit architecture is described herein. An integrated circuit may include computing chiplets, which may be stacked atop random access memory chiplets. In some cases, the stacking may include a compute die stacked atop another compute die. The stacking may enable the computing chiplets and random access memory chiplets to communicate via short, low-latency, and/or low-attenuation connections. The stacking may also allow for minimizing the area footprint of the integrated circuit.
In this regard, examples of the present disclosure may provide higher memory per unit area and increased compute processing (e.g., tera operations per second (TOPS)) for integrated circuits. For instance, in some examples, the integrated circuit architecture of the examples of the present disclosure may increase the TOPS by at least four times (for example by utilizing a three-dimensional (3D) topology/configuration) and in some examples may increase memory (e.g., static random-access memory (SRAM)) capacity by eight to ten times. Additionally, the examples of the present disclosure may improve integrated circuit system latency and power efficiency between chiplets or chips. For example, by introducing another dimension as a degree of freedom, the communication networks in such an architecture may deliver lower latency than a conventional 2D topology.
In one example aspect of the present disclosure, an integrated circuit (IC) is provided. The integrated circuit may include an interposer layer. The integrated circuit may further include a plurality of random access memory chiplets stacked atop the interposer layer, and a plurality of compute chiplets. The plurality of compute chiplets may be stacked atop a respective random access memory chip of the plurality of random access memory chiplets, such that the plurality of compute chiplets may be in electrical communication with the respective random access memory chip of the plurality of random access memory chiplets.
In another example aspect of the present disclosure, a method is provided. The method may include transmitting at least one electrical signal from a computing chiplet. The method may further include transferring the at least one electrical signal to a random access memory chiplet. The computing chiplet may be stacked atop the random access memory chiplet, thereby forming a three-dimensional stacked integrated circuit.
In yet another example aspect of the present disclosure, a computer program product is provided. The computer program product may include at least one non-transitory computer-readable medium including computer-executable program code instructions stored therein. The computer-executable program code instructions may include program code instructions configured to transmit at least one electrical signal from a computing chiplet. The computer program product may further include program code instructions configured to transfer the at least one electrical signal to a random access memory chiplet. The computing chiplet may be stacked atop the random access memory chiplet, thereby forming a three-dimensional stacked integrated circuit.
In some examples, the plurality of random access memory chiplets may comprise static random access memory (SRAM) chiplets. In some examples, the plurality of compute chiplets may comprise a plurality of processing elements (PEs). In some examples, the IC may include a plurality of high bandwidth memory (HBM) stacked atop the interposer layer. In some examples, the IC may include a set of chiplets (e.g., network chiplets) stacked atop the interposer layer. The chiplets may for example include functionality, including but not limited to, networking, input/output (I/O), a network interface card or a network interface controller (NIC) and/or the like.
In some examples, the plurality of random access memory chiplets are configured to electrically communicate with one another via the interposer layer. In some examples, more than one compute chiplet of the plurality of compute chiplets is stacked atop a random access memory chiplet of the plurality of random access memory chiplets. In some examples, one or more random access memory chiplets of the plurality of random access memory chiplets are configured to electrically communicate with other random access memory chiplets of the plurality of random access memory chiplets via the interposer layer.
In some examples, the IC may further include a substrate layer, wherein the interposer layer is stacked atop the substrate layer; and at least one host central processing unit (CPU) stacked atop the substrate layer, wherein the at least one host CPU is in electrical communication with the interposer layer via the substrate layer. In some examples, the at least one host CPU comprises two host CPUs.
In some examples, the IC may further include a set of network and CPU chiplets, wherein the set of network and CPU chiplets are stacked atop the interposer layer. In some examples, the IC may further include a CPU stacked atop the interposer layer. In some examples, the IC may further include another random access memory stacked atop the interposer layer; and a CPU stacked atop the another random access memory, wherein the CPU is in electrical communication with the another random access memory. In some examples, the another random access memory is configured to be in electrical communication with the plurality of random access memory chiplets via the interposer layer.
There has thus been outlined, rather broadly, certain embodiments of the present disclosure in order that the detailed description thereof may be better understood, and in order that the present contribution to the art may be better appreciated.
Additional advantages will be set forth in part in the description which follows or may be learned by practice. The advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive, as claimed.
In order to facilitate a more robust understanding of the application, reference is now made to the accompanying drawings, in which like elements are referenced with like numerals. These drawings should not be construed to limit the application and are intended only to be illustrative.
A detailed description of the illustrative embodiments will be discussed in reference to various figures, embodiments, and aspects herein. Although this description provides detailed examples of possible implementations, it should be understood that the details are intended to be examples and thus do not limit the scope of the application.
Reference in this specification to “one embodiment,” “an embodiment,” “one or more embodiments,” “an aspect” or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Moreover, the term “embodiment” in various places in the specification is not necessarily referring to the same embodiment. That is, various features are described which may be exhibited by some embodiments and not by others.
It is understood that any or all of the systems, methods and processes described herein may be embodied in the form of computer executable instructions, e.g., program code, stored on a computer-readable storage medium, which instructions, when executed by a machine, such as a computer, server, transit device or the like, perform and/or implement the systems, methods and processes described herein. Specifically, any of the steps, operations or functions described above may be implemented in the form of such computer executable instructions. Computer readable storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, but such computer readable storage media do not include signals. Computer readable storage media may include, but are not limited to, random access memory (RAM), read only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD ROM), digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical medium which may be used to store the desired information and which may be accessed by a computer.
As defined herein a “computer-readable storage medium,” which refers to a non-transitory, physical or tangible storage medium (e.g., volatile or non-volatile memory device), may be differentiated from a “computer-readable transmission medium,” which refers to an electromagnetic signal.
Some manufacturers may attempt to maximize processing and storage capabilities of an application specific integrated circuit (ASIC) by including more processing/storage components on a given chip. There may also be a cost advantage, as smaller chip sizes may provide for more dies to be cut or etched from a single wafer. However, ASIC dimensions are typically limited based on the reticle limit, which is the maximum area of chip that may be manufactured. Manufacturers may be unable to overcome the reticle limit by increasing the width and/or length dimensions of the chip. Moreover, large, full-reticle-size chips may be prone to defects, which may result in poor yield and increased chip cost. These issues may be addressed by using smaller chip sizes in particular technologies (e.g., microelectronics). According to the present disclosure, a chip may include a three-dimensional architecture. The chip may include computing chiplets configured to perform various processing functions. The computing chiplets may be stacked atop random access memory chiplets. Likewise, other components of the chip may either be stacked on an interposer, or may be stacked on memory as well, to enable communicating between non-stacked entities. This stacking may facilitate high speed, low latency communications between the memory and computing chiplets, which may enable more powerful processing capabilities of the chip.
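The relationship between die area and yield noted above may be illustrated with a simple Poisson defect model, in which the expected fraction of defect-free dies falls exponentially with die area. The following is a minimal sketch only: the Poisson model is a common textbook approximation rather than a model taken from the present disclosure, and the defect density and die areas are hypothetical values chosen purely for illustration.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)


# Hypothetical values, assumed for illustration only.
DEFECT_DENSITY = 0.001     # defects per mm^2
FULL_RETICLE_MM2 = 800.0   # one large, near-reticle-limit die
CHIPLET_MM2 = 100.0        # one small chiplet

large_yield = poisson_yield(FULL_RETICLE_MM2, DEFECT_DENSITY)
chiplet_yield = poisson_yield(CHIPLET_MM2, DEFECT_DENSITY)

# Under this model the smaller die has the higher expected yield.
print(f"large die yield: {large_yield:.3f}")
print(f"chiplet yield:   {chiplet_yield:.3f}")
```

The sketch ignores assembly and bonding yield, which a stacked design would also need to account for.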
Chips including 2.5D configurations are typically limited in capabilities, in part due to the communication routes imposed by the configuration. Each component positioned on the interposer may transmit communications to other components on the interposer via the interposer. The communication route length and material of the interposer may cause communication delay as well as communication attenuation. This lag may also impose limitations on processing capabilities of the chip, particularly in examples where multiple components may require communicating or coordinating with one another to fulfill a task.
Further, in examples where the chip 100 is also an integrated circuit, a host CPU 125 may also be included in the chip 100. However, host CPUs 125 are typically unable to be placed atop the interposer 120, and thus an additional substrate layer 130 (e.g., a printed circuit board (PCB) layer) may be implemented, where the host CPU 125 and the interposer 120 are placed atop the substrate layer 130. This may further cause communication delays, as the host CPU 125 communicates with the components of the interposer 120 by transmitting a communication to the substrate layer 130, which travels to the interposer 120, and then to the particular component (e.g., the compute chiplet 105).
The chip 200 may include a substrate 205. The substrate 205 may be a section of semiconducting material (e.g., layer) that may support other components of the chip 200. For example, the substrate 205 may be composed of silicon. The substrate 205 may form a plane, where the substrate 205 may include a length and width that are substantially greater than a substrate thickness. The substrate 205 may be configured to house, include, or contain a circuitry component(s) along a given surface.
The chip 200 may also include an interposer 210. The interposer 210 may be disposed atop the substrate 205. The interposer 210 may facilitate electrical communications from the substrate 205 to components atop the interposer 210, or between different components atop the interposer 210. The interposer 210 may be composed of semiconducting material, such as for example silicon, and may include electrical connections (e.g., on a top surface) capable of carrying electrical communications to and from different components atop the interposer 210. The interposer 210 may form a plane, where the interposer 210 may include a width and length that are substantially greater than an interposer thickness. Further, the interposer plane may be generally parallel to the substrate plane. In some examples, the interposer length, width, or both, may be smaller than the respective length and/or width of the substrate 205.
One or more bottom chiplets (e.g., random access memory chiplets 215) may be disposed atop the interposer 210. The random access memory chiplets 215 may be electrically connected to the interposer 210 (e.g., via the interposer top surface and the memory bottom surface), such that a random access memory chiplet 215 is configured to transmit and receive electrical communications via the interposer 210 to other components atop the interposer 210, to other components atop the substrate 205, or both. The random access memory chiplets 215 may in some examples be static random access memory (SRAM), such that the data stored by an SRAM is statically maintained. In some cases, the random access memory chiplets 215 may connect to DRAM, and may include in-memory compute, near memory, network-on-chip, and/or the like.
The chip 200 may also include one or more computing chiplets 220. Computing chiplets 220 may be disposed atop a random access memory chiplet 215. A computing chiplet 220 may be in electrical communication with the random access memory chiplet 215 on which the computing chiplet 220 is disposed. For example, a computing chiplet 220 may include electrical connections on its bottom surface, which may be coupled to electrical connections on a top surface of the respective random access memory chiplet 215. Thus, a computing chiplet 220 may be configured to be in electrical communication with its respective random access memory chiplet 215. In some examples, a computing chiplet 220 may be configured to electrically communicate with other components atop the interposer 210, such as other computing chiplets 220 (e.g., an electrical communication travels from the computing chiplet 220, to its respective random access memory chiplet 215, to the interposer 210, to another random access memory chiplet 215, and to another computing chiplet 220), other components atop the substrate 205 (e.g., a CPU), and/or the like. In some examples, the electrical connection between the computing chiplet 220 and the respective random access memory chiplet 215 may be direct, such as by bump bonds or electrical pins. In some examples, the electrical connection may be indirect, such as by implementing electrical leads between a computing chiplet 220 and a respective random access memory chiplet 215.
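The communication routes described above may be modeled as a small graph in which each component is a node and each electrical connection is an edge. The following is a minimal sketch, assuming a hypothetical chip with two compute/memory stacks and one host CPU; the node names and the `route` helper are illustrative assumptions and are not part of the present disclosure.

```python
from collections import deque

# Hypothetical connectivity: each compute chiplet connects only to the
# memory chiplet beneath it; memory chiplets connect to the interposer;
# the host CPU reaches the interposer through the substrate.
LINKS = {
    "compute_A": ["sram_A"],
    "sram_A": ["compute_A", "interposer"],
    "compute_B": ["sram_B"],
    "sram_B": ["compute_B", "interposer"],
    "interposer": ["sram_A", "sram_B", "substrate"],
    "substrate": ["interposer", "host_cpu"],
    "host_cpu": ["substrate"],
}


def route(src, dst):
    """Breadth-first search returning the shortest component path, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in LINKS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


local = route("compute_A", "sram_A")      # stacked pair: a single link
remote = route("compute_A", "compute_B")  # crosses the interposer
print(local)
print(remote)
```

In this sketch, a computing chiplet reaches its own memory chiplet over one link, whereas reaching another computing chiplet traverses four links, which mirrors the route enumerated in the paragraph above.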
Each computing chiplet 220 may be configured to include one or more processing elements 221. The processing elements 221 may be configured to receive one or more data sets and perform distinct process operations on the data. For example, a processing element 221 may be configured to perform an arithmetic function (e.g., multiplication, derivation, addition, and/or the like) on the data set. In some examples, a processing element 221 may be configured to perform a logical function (e.g., AND, OR, NOR, and/or the like) on a data set. In some examples, the processing elements 221 may include functions specific to that processing element (e.g., the processing elements may be distinct from one another). In some examples, the processing elements 221 may receive the data set(s) from a particular circuitry component (e.g., another processing element), process the data, and output the processed results to another particular circuitry component (e.g., another processing element) to form a processing pipeline. In some examples, a computing chiplet 220 may include a predetermined quantity of processing elements 221 (e.g., 32 processing elements, 16 processing elements, etc.).
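The processing pipeline described above, in which each processing element receives data, applies its own function, and passes the result to the next element, may be sketched in software as a chain of functions. This is an illustrative software analogy only; the helper names (`multiply_by`, `add_constant`, `bitwise_and`, `run_pipeline`) and the input values are hypothetical and do not represent the hardware implementation.

```python
from functools import reduce
from typing import Callable, List

# A processing element is modeled as a function from a data set to a data set.
ProcessingElement = Callable[[List[int]], List[int]]


def multiply_by(factor: int) -> ProcessingElement:
    """Arithmetic element: multiplies every value by a fixed factor."""
    return lambda data: [x * factor for x in data]


def add_constant(constant: int) -> ProcessingElement:
    """Arithmetic element: adds a fixed constant to every value."""
    return lambda data: [x + constant for x in data]


def bitwise_and(mask: int) -> ProcessingElement:
    """Logical element: applies a bitwise AND with a fixed mask."""
    return lambda data: [x & mask for x in data]


def run_pipeline(stages: List[ProcessingElement], data: List[int]) -> List[int]:
    """Feeds the output of each stage into the next stage, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


pipeline = [multiply_by(3), add_constant(1), bitwise_and(0b1111)]
result = run_pipeline(pipeline, [1, 2, 5])
print(result)  # [4, 7, 0]
```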
More than one computing chiplet 220 may be stacked atop a given random access memory chiplet 215. For example, as shown in
The chip 200 may also include one or more high bandwidth memory (HBM) chiplets 225. In some examples of the present disclosure, the HBM chiplets 225 may be referred to herein as HBMs 225. The HBMs 225 may provide additional memory capacity of the chip 200, for example, for AI model/inference training, and/or the like. The HBMs 225 may be disposed on the interposer 210. For example, the HBMs 225 may be in direct contact with the interposer 210, such that a bottom surface of the HBMs 225 may contact a top surface of the interposer 210. The HBMs 225 may be in electrical communication with the interposer 210. For example, the bottom surface of an HBM 225 may include electrical connections that may couple to respective electrical connections of the top surface of the interposer 210. In some examples, the electrical connections may include bump bonds, electrical pins, and/or the like. The HBMs 225 may in some examples include second generation of high bandwidth memory (HBM2), second generation evolutionary high bandwidth memory (HBM2E), third generation of high bandwidth memory (HBM3), fourth generation of high bandwidth memory (HBM4), HBM with processing-in-memory, and/or the like.
The chip 200 may also include one or more chiplets 230. The chiplets 230 may for example include functionality, including but not limited to, networking, input/output (I/O), a network interface card or a network interface controller (NIC) and/or the like. The chiplets 230 may be disposed on the interposer 210. For example, the chiplets 230 may be in direct contact with the interposer 210, such that a bottom surface of the chiplets 230 may contact a top surface of the interposer 210. The chiplets 230 may be in electrical communication with the interposer 210. For example, the bottom surface of a chiplet 230 may include electrical connections that may couple to respective electrical connections of the top surface of the interposer 210. In some examples, the electrical connections may include bump bonds, electrical pins, and/or the like. The chiplets 230 may be configured to facilitate communications between the various components of the chip 200. For example, in some examples the chiplets 230 may include network-on-chip logic. For example, the chiplets 230 may manage processing pipelines of processing elements either within or between computing chiplets 220. In some examples, the chiplets 230 may be configured to facilitate communications between the chip 200 and external components.
The chip 200 may also include one or more host CPUs 235. The host CPUs 235 may be disposed on the substrate 205. For example, the host CPUs 235 may be in direct contact with the substrate 205, such that a bottom surface of the host CPUs 235 may contact a top surface of the substrate 205. The host CPUs 235 may be in electrical communication with the substrate 205. For example, the bottom surface of a host CPU 235 may include electrical connections that may couple to respective electrical connections of the top surface of the substrate 205. In some examples, the electrical connections may include bump bonds, electrical pins, and/or the like. The host CPUs 235 may include logic to manage the various other components of the chip 200. For example, the host CPUs 235 may be configured to execute artificial intelligence, machine learning, video processing, and/or the like.
The chip 200 may utilize 3D architecture, at least with respect to the computing chiplets 220 being stacked atop the random access memory chiplets 215, and likewise the random access memory chiplets 215 being stacked atop the interposer 210. The stacking of the computing chiplets 220 atop the random access memory chiplets 215 may allow for very low latency for communications between the computing chiplets 220 and the random access memory chiplets 215, which may enhance the processing capabilities of the chip 200. Further, the implementation of various components on the interposer 210 (e.g., in a 2.5D architecture) may further reduce latency across the chip 200. For example, components may include an embedded silicon bridge, embedded metal connectors, passives, capacitors, chiplet-to-chiplet physical layer, and/or the like.
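The effect of stacking on the area footprint may be illustrated by modeling each component together with the components stacked atop it. The following is a minimal sketch under stated assumptions: the `Component` class, the component names, and all area values are hypothetical, and each stacked component is assumed to fit within the outline of the component beneath it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A die placed on the interposer, with any dies stacked atop it."""
    name: str
    area_mm2: float  # hypothetical area, for illustration only
    stacked: List["Component"] = field(default_factory=list)


def total_silicon_area(component: Component) -> float:
    """Area of the component plus everything stacked on it."""
    return component.area_mm2 + sum(total_silicon_area(c) for c in component.stacked)


def make_stack(index: int) -> Component:
    """One memory chiplet carrying two compute chiplets (assumed sizes)."""
    return Component(
        f"sram_{index}",
        100.0,
        [Component(f"compute_{index}a", 50.0), Component(f"compute_{index}b", 50.0)],
    )


on_interposer = [make_stack(0), make_stack(1), Component("hbm_0", 80.0)]

# Stacked layout: only the bottom die of each stack occupies interposer area.
footprint_stacked = sum(c.area_mm2 for c in on_interposer)
# Side-by-side layout: every die would occupy interposer area.
footprint_flat = sum(total_silicon_area(c) for c in on_interposer)

print(footprint_stacked, footprint_flat)  # 280.0 480.0
```

With these assumed values, the same set of dies occupies less interposer area when the computing chiplets are stacked than when every die is placed side by side.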
While the chips described with reference to
As shown in
The chip 32 may be an integrated circuit, and in particular, for example, an ASIC. The chip 32 may be an example of chip 200, chip 300, chip 400, chip 500, chip 600, chip 700, chip 800 of
In some examples, the chip 32 may execute computer-executable instructions stored in the memory (e.g., memory 44 and/or memory 46) of the computing device 30 in order to perform the various required functions of the computing device 30. For example, the chip 32 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the computing device 30 to operate in a wireless or wired environment. In some examples, the chip 32 may run application-layer programs (e.g., browsers) and/or radio access-layer (RAN) programs and/or other communications programs. In some examples, the chip 32 may also perform security operations such as authentication, security key agreement, and/or cryptographic operations, such as at the access-layer and/or application layer for example.
The chip 32 may be coupled to communication circuitry (e.g., transceiver 34 and transmit/receive element 36). In some examples, the chip 32, through the execution of computer executable instructions, may control the communication circuitry in order to cause the computing device 30 to communicate with other computing devices via the network to which it is connected.
The transmit/receive element 36 may be configured to transmit signals to, or receive signals from, other computing devices or networking equipment. For example, in an embodiment, the transmit/receive element 36 may be an antenna configured to transmit and/or receive radio frequency (RF) signals. The transmit/receive element 36 may support various networks and air interfaces, such as wireless local area network (WLAN), wireless personal area network (WPAN), cellular, and the like. In yet another embodiment, the transmit/receive element 36 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 36 may be configured to transmit and/or receive any combination of wireless or wired signals.
The transceiver 34 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 36 and to demodulate the signals that are received by the transmit/receive element 36. As noted above, the computing device 30 may have multi-mode capabilities. Thus, the transceiver 34 may include multiple transceivers for enabling the computing device 30 to communicate via multiple radio access technologies (RATs), such as universal terrestrial radio access (UTRA) and Institute of Electrical and Electronics Engineers (IEEE 802.11), for example.
The chip 32 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 44 and/or the removable memory 46. For example, the chip 32 may store session context in its memory, as described above. The non-removable memory 44 may include RAM, ROM, a hard disk, or any other type of memory storage device. The removable memory 46 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the chip 32 may access information from, and store data in, memory that is not physically located on the computing device 30, such as on a server or a home computer.
The chip 32 may receive power from the power source 48 and may be configured to distribute and/or control the power to the other components in the computing device 30. The power source 48 may be any suitable device for powering the computing device 30. For example, the power source 48 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
The chip 32 may also be coupled to the GPS chipset 50, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the computing device 30. It will be appreciated that the computing device 30 may acquire location information by way of any suitable location-determination method while remaining consistent with an exemplary embodiment.
The foregoing description of the exemplary aspects of the present disclosure has been presented for the purpose of illustration. It is not intended to be exhaustive or to limit the patent rights to the precise forms disclosed. Persons skilled in the relevant art may appreciate that many modifications and variations are possible in light of the above disclosure.
Some portions of this description describe the exemplary aspects of the present disclosure in terms of applications and/or symbolic representations of operations on information. These application descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as components, without loss of generality. The described operations and their associated components may be embodied in software, firmware, hardware, or any combinations thereof.
Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software components, alone or in combination with other devices. In one exemplary aspect of the present disclosure, a software component may be implemented with a computer program product comprising a computer-readable medium containing computer program code, which may be executed by a computer processor for performing any or all of the steps, operations, or processes described.
Some exemplary aspects of the present disclosure also may relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Some exemplary aspects of the present disclosure also may relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any example aspect of a computer program product or other data combination described herein.
The language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the patent rights be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the exemplary aspects is intended to be illustrative, but not limiting, of the scope of the patent rights, which is set forth in the following claims.
This application claims the benefit of U.S. Provisional Application No. 63/514,970 filed Jul. 21, 2023, the entire content of which is incorporated herein by reference.
Number | Date | Country
---|---|---
63/514,970 | Jul 2023 | US