The disclosure relates in general to a semiconductor structure, and more particularly to a processor integrated circuit (IC) having a plurality of monolithic dies respectively having a processing unit circuit, a plurality of static random access memory (SRAM) arrays or a plurality of dynamic random access memory (DRAM) arrays.
Information technology (IT) systems are rapidly evolving in businesses and enterprises across the board, including those in factories, healthcare, and transportation. Nowadays, system on chip (SOC) or artificial intelligence (AI) technology is the keystone of IT systems, making factories smarter, improving patient outcomes, and increasing autonomous vehicle safety. Data from manufacturing equipment, sensors, and machine vision systems could easily total 1 petabyte per day. Therefore, a high performance computing (HPC) SOC or AI chip is required to handle such petabyte-scale data.
Generally speaking, AI chips could be categorized as graphic processing units (GPUs), field programmable gate arrays (FPGAs), and application specific ICs (ASICs). Originally designed to handle graphical processing applications using parallel computing, GPUs began to be used more and more often for AI training. A GPU's training speed and efficiency are generally 10 to 1000 times greater than those of a general-purpose CPU.
FPGAs have blocks of logic that interact with each other and can be designed by engineers to support specific algorithms, which makes them suitable for AI inference. Due to faster time to market, lower cost, and flexibility, FPGAs are often preferred over ASIC designs, although they have disadvantages such as larger size, slower speed, and higher power consumption. Owing to this flexibility, any portion of an FPGA can be partially reprogrammed as required. An FPGA's inference speed and efficiency are 10 to 100 times greater than those of a general-purpose CPU.
On the other hand, ASICs implement their functions directly in tailored circuitry and are generally more efficient than FPGAs. For a customized ASIC, the training/inference speed and efficiency could be 10 to 1000 times greater than those of a general-purpose CPU. However, unlike FPGAs, which are easier to customize as AI algorithms continue to evolve, ASICs can gradually become obsolete as new AI algorithms are developed.
In GPUs, FPGAs, and ASICs alike (as well as other similar SOCs, CPUs, NPUs, etc.), the logic circuit and the SRAM circuit are the two major circuits, which together occupy approximately 90% of the AI chip size. The remaining 10% of the AI chip may include I/O pad circuits. Meanwhile, advanced (scaled) process technology nodes for manufacturing AI chips are becoming increasingly necessary to train AI models efficiently and quickly, because they offer better efficiency and performance. Improvement in integrated circuit performance and cost has been achieved largely by process scaling according to Moore's Law, but scaling down to 3 nm to 5 nm encounters many technical difficulties, so the semiconductor industry's R&D and capital investment costs are increasing dramatically.
For example, with miniaturization down to the 28 nm (or lower) manufacturing process, it becomes increasingly difficult to achieve SRAM device scaling for increased storage density, reduction in operating voltage (VDD) for lower standby power consumption, and the enhanced yield necessary to realize larger-capacity SRAM.
Some of the reasons for the dramatic increase in the total area of the SRAM cell, expressed in λ² or F², as the minimum feature size decreases can be described as follows. The traditional 6T SRAM has six transistors connected through multiple interconnections, and uses its first interconnection layer M1 to connect the gate level (“Gate”) and the diffusion level of the source region and the drain region (generally called “Diffusion”) of the transistors. When a second interconnection layer M2 and/or a third interconnection layer M3 is needed to facilitate signal transmission (such as the word-line (WL) and/or the bit-lines (BL and BL Bar)) without the die-size enlargement that would result from using M1 alone, a structure Via-1, composed of some type of conductive material, is formed to connect the second interconnection layer M2 to the first interconnection layer M1.
Thus, there is a vertical structure formed from the Diffusion through a Contact (Con) to the first interconnection layer M1, i.e. “Diffusion-Con-M1”. Similarly, another structure connecting the Gate through a Contact to the first interconnection layer M1 can be formed as “Gate-Con-M1”. Additionally, if a connection structure needs to be formed from an M1 interconnection through a Via1 to an M2 interconnection, it is named “M1-Via1-M2”. A more complex interconnection structure from the Gate level to the second interconnection layer M2 can be described as “Gate-Con-M1-Via1-M2”. Furthermore, a stacked interconnection system may have an “M1-Via1-M2-Via2-M3” or “M1-Via1-M2-Via2-M3-Via3-M4” structure, etc.
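Because these labels follow a regular pattern (a Contact joins the Gate or Diffusion to M1, then each Via<k> joins M<k> to M<k+1>), they can be generated mechanically. The sketch below is purely illustrative: the function `stack_name` and its parameters are hypothetical and only encode the naming convention described in the preceding paragraph, not any layout tool.

```python
def stack_name(start: str, top_metal: int) -> str:
    """Return the vertical connection label, e.g. 'Gate-Con-M1-Via1-M2'.

    Hypothetical helper: `start` is 'Gate', 'Diffusion', or a metal layer such
    as 'M1'; `top_metal` is the index of the highest metal layer reached.
    """
    if start in ("Gate", "Diffusion"):
        first = 1
        parts = [start, "Con", "M1"]  # the Contact always lands on M1
    elif start.startswith("M") and start[1:].isdigit():
        first = int(start[1:])
        parts = [start]
    else:
        raise ValueError(f"unknown starting level: {start!r}")
    if top_metal < first:
        raise ValueError("top_metal must not be below the starting level")
    # Each Via<k> connects metal layer M<k> to the next layer M<k+1>.
    for k in range(first, top_metal):
        parts += [f"Via{k}", f"M{k + 1}"]
    return "-".join(parts)


for args in [("Diffusion", 1), ("Gate", 1), ("M1", 2), ("Gate", 2), ("M1", 4)]:
    print(stack_name(*args))
```

Running the loop reproduces the five labels named in the text, from “Diffusion-Con-M1” up to “M1-Via1-M2-Via2-M3-Via3-M4”.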
Since the Gate and the Diffusion in two access transistors (NMOS pass-gate transistors PG-1 and PG-2, as shown in
As a result, the necessary space between one M1 interconnection and another M1 interconnection will increase the die size, and in some cases the wiring connections may block efficient routing that uses M2 directly to bypass M1 regions. In addition, it is difficult to form a self-aligned structure between Via1 and Contact while both Via1 and Contact are connected to their own respective interconnection systems.
Additionally, in a traditional 6T SRAM, at least one NMOS transistor and one PMOS transistor are located, respectively, in adjacent regions of the p-substrate and the n-well that are formed next to each other in close proximity. Consequently, a parasitic junction structure, called an n+/p/n/p+ parasitic bipolar device, is formed, with its contour starting from the n+ region of the NMOS transistor, through the p-well and the neighboring n-well, up to the p+ region of the PMOS transistor.
When significant noise occurs on either the n+/p junction or the p+/n junction, an extraordinarily large current may flow abnormally through this n+/p/n/p+ junction, which can shut down some operations of the CMOS circuits and cause the entire chip to malfunction. This abnormal phenomenon, called Latch-up, is detrimental to CMOS operation and must be avoided. One way to increase immunity to Latch-up, which is an inherent weakness of CMOS, is to increase the distance from the n+ region to the p+ region. However, increasing the distance from the n+ region to the p+ region to avoid the Latch-up issue also enlarges the size of the SRAM cell.
Even with miniaturization of the manufacturing process down to 28 nm or lower (the so-called “minimum feature size”, “Lambda (λ)”, or “F”), due to interference among the sizes of the contacts and among the layouts of the metal wires connecting the word-line (WL), the bit-lines (BL and BL Bar), the high level voltage VDD, the low level voltage VSS, etc., the total area of the SRAM cell expressed in λ² or F² increases dramatically as the minimum feature size decreases, as shown in
A similar situation occurs in logic circuit scaling. Logic circuit scaling for increased density, reduction in operating voltage (Vdd) for lower standby power consumption, and the enhanced yield necessary to realize larger-capacity logic circuits become increasingly difficult to achieve. Standard cells are the commonly used basic elements of logic circuits. The standard cells may comprise basic logic function cells (such as an inverter cell, a NOR cell, and a NAND cell).
Similarly, even with miniaturization of the manufacturing process down to 28 nm or lower, due to interference among the sizes of the contacts and the layouts of the metal wires, the total area of the standard cell expressed in λ² or F² increases dramatically as the minimum feature size decreases.
It is noted that some active regions or fins between the PMOS and NMOS (called “dummy fins”) are not utilized in the PMOS/NMOS of this standard cell, likely because of the latch-up issue between the PMOS and NMOS. Thus, the latch-up distance between the PMOS and NMOS in
The scaling trend of area size (2Cpp × Cell_Height) versus process technology node for three foundries is shown in
From another point of view, high performance computing (HPC) chips, such as SOC, AI, NPU (Network Processing Unit), GPU, CPU, and FPGA chips, currently use monolithic integration to include as many circuits as possible. However, as shown in
Thus, there is a need for a new integration system, including an HPC logic chip and an SRAM chip with a high storage volume, which could solve the above-mentioned problems so that a more powerful and efficient SOC or AI single chip based on monolithic integration can be realized in the near future.
One aspect of the present disclosure is to provide an IC package, wherein the IC package includes a substrate, a first monolithic die, a second monolithic die, and a third monolithic die. A processing unit circuit is formed in the first monolithic die. A plurality of SRAM arrays are formed in the second monolithic die, wherein the plurality of SRAM arrays include at least 2-15 G Bytes. A plurality of DRAM arrays are formed in the third monolithic die, wherein the plurality of DRAM arrays include at least 16-256 G Bytes. The first monolithic die, the second monolithic die, and the third monolithic die are vertically stacked above the substrate.
In one embodiment of the present disclosure, the first monolithic die has a die area the same or substantially the same as a scanner maximum field area defined by a specific technology process node; the second monolithic die has a die area the same or substantially the same as the scanner maximum field area defined by the specific technology process node; and the third monolithic die has a die area the same or substantially the same as the scanner maximum field area defined by the specific technology process node.
In one embodiment of the present disclosure, the scanner maximum field area is not greater than 26 mm by 33 mm, or 858 mm². In one embodiment of the present disclosure, the first monolithic die and the second monolithic die are enclosed within a single package, and the third monolithic die is electrically connected to the first monolithic die through the second monolithic die. In one embodiment of the present disclosure, the plurality of DRAM arrays include at least 128 G Bytes, 256 G Bytes or 512 G Bytes.
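As a quick check of the field-size figure, and of why die area alone does not guarantee that a die fits within the scanner maximum field, consider the following sketch. The function names are hypothetical, and scribe lines and the scanner's fixed field orientation are deliberately ignored as simplifying assumptions.

```python
FIELD_W_MM = 26.0  # scanner maximum field width, from the text
FIELD_H_MM = 33.0  # scanner maximum field height, from the text


def field_area_mm2(w_mm: float = FIELD_W_MM, h_mm: float = FIELD_H_MM) -> float:
    """Area of the exposure field in mm^2 (26 mm x 33 mm = 858 mm^2)."""
    return w_mm * h_mm


def fits_in_field(die_w_mm: float, die_h_mm: float,
                  w_mm: float = FIELD_W_MM, h_mm: float = FIELD_H_MM) -> bool:
    """True if a rectangular die fits inside one field edge by edge.

    Simplification: the die may be rotated by 90 degrees; scribe lines are ignored.
    """
    return ((die_w_mm <= w_mm and die_h_mm <= h_mm) or
            (die_h_mm <= w_mm and die_w_mm <= h_mm))


print(field_area_mm2())           # 858.0
print(fits_in_field(26.0, 33.0))  # True
print(fits_in_field(30.0, 28.0))  # False: 840 mm^2 < 858 mm^2, but no edge fits within 26 mm
```

The last case illustrates that a die whose area is below 858 mm² can still exceed the field if its aspect ratio does not match 26 mm by 33 mm.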
In one embodiment of the present disclosure, the processing unit circuit comprises a first processing unit circuit and a second processing unit circuit, wherein the first processing unit circuit includes a plurality of first logic cores, and each of the plurality of first logic cores includes a first SRAM set; the second processing unit circuit includes a plurality of second logic cores, and each of the plurality of second logic cores includes a second SRAM set, wherein the first processing unit circuit or the second processing unit circuit is selected from a group consisting of a graphic processing unit (GPU), a central processing unit (CPU), a tensor processing unit (TPU), a network processing unit (NPU) and a field programmable gate array (FPGA).
In one embodiment of the present disclosure, the plurality of DRAM arrays include a counter electrode on the top of the third monolithic die.
In one embodiment of the present disclosure, the processor IC further comprises a molding or shielding compound encapsulating the first monolithic die, the second monolithic die, and the third monolithic die, wherein a top surface of the counter electrode is revealed and not covered by the molding or shielding compound.
In one embodiment of the present disclosure, the processor IC further includes a top lead-frame contacted to the top surface of the counter electrode and the substrate; and a molding or shielding compound encapsulating the first monolithic die, the second monolithic die, the third monolithic die, and the top lead-frame.
Another aspect of the present disclosure is to provide a dual DRAM IC package, wherein the dual DRAM package includes a substrate, a first DRAM monolithic die, and a second DRAM monolithic die. A first plurality of DRAM arrays are formed in the first DRAM monolithic die, wherein the first plurality of DRAM arrays include at least 16-256 G Bytes, and the first plurality of DRAM arrays include a first counter electrode on the top portion of the first DRAM monolithic die. A second plurality of DRAM arrays are formed in the second DRAM monolithic die, wherein the second plurality of DRAM arrays include at least 16-256 G Bytes, and the second plurality of DRAM arrays include a second counter electrode on the top portion of the second DRAM monolithic die. The first DRAM monolithic die and the second DRAM monolithic die are vertically stacked over the substrate; the second counter electrode of the second DRAM monolithic die is contacted to the substrate; and the first DRAM monolithic die is electrically connected to the substrate through the second DRAM monolithic die.
In one embodiment of the present disclosure, the second DRAM monolithic die is electrically coupled to the substrate through electrical bonding.
Another aspect of the present disclosure is to provide an integration system, wherein the integration system includes a carrier substrate, a first IC package, a second IC package, and a metal shielding case. The first IC package and the second IC package are bonded to the carrier substrate, and the metal shielding case encapsulates the first IC package and the second IC package.
In one embodiment of the present disclosure, the integration system further includes a third IC package and a metal shielding case, wherein the third IC package is bonded to the carrier substrate; and the metal shielding case encapsulates the first IC package, the second IC package, and the third IC package.
In one embodiment of the present disclosure, the metal shielding case is thermally coupled to a first counter electrode on the top portion of the first DRAM monolithic die of the second IC package, and thermally coupled to a first counter electrode on the top portion of the first DRAM monolithic die of the third IC package.
The above and other aspects of the disclosure will become better understood with regard to the following detailed description of the preferred but non-limiting embodiment(s). The following description is made with reference to the accompanying drawings:
The present disclosure provides an integration system.
Several embodiments of the present disclosure are disclosed below with reference to accompanying drawings. However, the structure and contents disclosed in the embodiments are for exemplary and explanatory purposes only, and the scope of protection of the present disclosure is not limited to the embodiments. It should be noted that the present disclosure does not illustrate all possible embodiments, and anyone skilled in the technology field of the disclosure will be able to make suitable modifications or changes based on the specification disclosed below to meet actual needs without breaching the spirit of the disclosure. The present disclosure is applicable to other implementations not disclosed in the specification.
The disclosure has proposed to integrate the following inventions:
For example,
Furthermore, each of the first conductor pillar portion 731a and the third conductor pillar portion 731b also has a seed region or seed pillar in its upper portion, and such a seed region or seed pillar could be used for the subsequent selective epitaxial growth. Subsequently, a second conductor pillar portion 732a is formed on the first conductor pillar portion 731a by a second selective epitaxial growth, and a fourth conductor pillar portion 732b is formed on the third conductor pillar portion 731b.
This embodiment, as shown in
Furthermore, a localized isolation 48 (such as nitride or other high-k dielectric material) is located in one trench and positioned under the source region, and another localized isolation 48 is located in another trench and positioned under the drain region. The localized isolation 48 is below the horizontal silicon surface (HSS) of the silicon substrate and could be called localized isolation into silicon substrate (LISS) 48. The LISS 48 could be a thick nitride layer or a composite of dielectric layers. For example, the localized isolation or LISS 48 could comprise a composite localized isolation which includes an oxide layer 481 covering at least a portion of the sidewall of the trench and another oxide layer 482 covering at least a portion of the bottom wall of the trench. The oxide layers 481 and 482 could be L-shaped oxide layers formed by a thermal oxidation process.
The composite localized isolation 48 could further include a nitride layer 483 over the oxide layer 482 and/or the oxide layer 481. The shallow trench isolation (STI) region could comprise a composite STI 49 which includes an STI-1 layer 491 and an STI-2 layer 492, wherein the STI-1 layer 491 and the STI-2 layer 492 could be made of thick oxide material by different processes, respectively.
Moreover, the source (or drain) region could comprise a composite source region 55 and/or a composite drain region 56. For example, in the NMOS transistor 52, the composite source region 55 (or drain region 56) comprises at least a lightly doped drain (LDD) 551 and an N+ heavily doped region 552 in the trench. In particular, the lightly doped drain (LDD) 551 abuts an exposed silicon surface with a uniform (110) crystalline orientation. The exposed silicon surface has a vertical boundary recessed by a suitable thickness relative to the edge of the gate structure. The exposed silicon surface is substantially aligned with the gate structure. The exposed silicon surface could be a terminal face of the channel of the transistor.
The lightly doped drain (LDD) 551 and the N+ heavily doped region 552 could be formed by a selective epitaxial growth (SEG) technique (or another suitable technology, such as atomic layer deposition (ALD) or selective-growth ALD (ALD-SALD)) that grows silicon from the exposed TEC area, which serves as the crystalline seed, to form a new well-organized (110) lattice across the LISS region; the LISS region has no seeding effect that would change the (110) crystalline structure of the newly formed crystals of the composite source region 55 or drain region 56. Such newly formed crystals (including the lightly doped drain (LDD) 551 and the N+ heavily doped region 552) could be named TEC-Si.
In one embodiment, the TEC is aligned or substantially aligned with the edge of the gate structure 33, the length of the LDD 551 is adjustable, and the sidewall of the LDD 551 opposite to the TEC could be aligned or substantially aligned with the sidewall of the spacer 34. The composite source (or drain) region could further comprise tungsten (or other suitable metal materials, such as TiN/tungsten) plugs 553 formed in horizontal connection with the TEC-Si portion to complete the entire source/drain regions. The active channel current flowing to the subsequent metal interconnection, such as the Metal-1 layer, passes through the LDD 551 and the N+ heavily doped region 552 to the tungsten 553 (or other metal material), which is directly connected to Metal-1 through a good metal-to-metal ohmic contact with much lower resistance than the traditional silicon-to-metal contact.
The source/drain contact resistance of the NMOS transistor 52 can be kept within a reasonable range owing to the merged metal-semiconductor junction utilized in the source/drain structure. This merged metal-semiconductor junction can mitigate the current crowding effect and reduce contact resistance. Additionally, because the bottom of the source/drain structure is isolated from the substrate by the bottom oxide (oxide layer 482), the n+ to n+ or p+ to p+ isolation can be kept within a reasonable range. Therefore, the spacing between two adjacent active regions of the PMOS transistor (not shown) could be scaled down to 2λ. The bottom oxide (oxide layer 482) can significantly reduce the source/drain junction leakage current and thus reduce the n+ to n+ or p+ to p+ leakage current.
It results in a much longer path from the n+/p junction through the p-well (or p-substrate)/n-well junction to the n/p+ junction. As shown in
Moreover, the composite STI 49 may be raised (for example, the STI-2 layer 492 is higher than the original semiconductor surface and extends up to the top surface of the gate structure), such that the selectively grown source/drain regions are confined by the composite STI 49 and do not extend over it. The metal contact plug (such as the tungsten plug 553) can be deposited in the hole between the composite STI 49 and the gate structure without using another contact mask to create a contact hole. Moreover, the top surface and one sidewall of the heavily doped region 552 are in direct contact with the metal contact plug, so the contact resistance of the source/drain regions could be dramatically reduced.
Furthermore, in a conventional design, the metal wires for the high level voltage Vdd and the low level voltage Vss (or ground) are distributed above the original silicon surface of the silicon substrate, and such a distribution will interfere with other metal wires if there is not enough space among those metal wires. The present invention also discloses a new standard cell or SRAM cell in which the metal wires for the high level voltage Vdd and/or the low level voltage Vss could be distributed under the original silicon surface of the silicon substrate; thus, interference among the sizes of the contacts and among the layouts of the metal wires connecting the high level voltage Vdd, the low level voltage Vss, etc., could be avoided even when the standard cell is shrunk.
For example, in the drain region of the NMOS 51, the tungsten or other metal material 553 is directly coupled to the P-well (by removing the LISS 48), which is electrically coupled to Vdd. Similarly, in the source region of the NMOS 51, the tungsten or other metal material 553 is directly coupled to the P-well or P-substrate (by removing the LISS 48), which is electrically coupled to ground. Thus, the openings for the source/drain regions that are originally used to electrically couple the source/drain regions with the metal-2 layer (M2) or the metal-3 layer (M3) for Vdd or ground connection could be omitted in the new standard cell and SRAM cell.
To sum up, there are at least the following advantages:
(1) The linear dimensions of the source, the drain, and the gate of the transistors in the standard cell/SRAM could be precisely controlled, and the linear dimension can be as small as the minimum feature size, Lambda (λ), as shown in the incorporated U.S. patent application Ser. No. 17/138,918. Therefore, when two adjacent transistors are connected together through the drain/source, the length dimension of the transistor could be as small as 3λ, and the distance between the edges of the gates of the two adjacent transistors could be as small as 2λ. Of course, for tolerance purposes, the length dimension of the transistor could be around 3λ-6λ or larger, and the distance between the edges of the gates of the two adjacent transistors could be 8λ or larger.
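To make the λ-based rules in item (1) concrete, the following sketch converts them to physical lengths for a chosen λ. The value λ = 5 nm is a hypothetical input used only for illustration, and the helper name is an assumption; only the multipliers (3λ, 2λ, 3λ-6λ, 8λ) come from the text.

```python
def lambda_dimensions_nm(lam_nm: float) -> dict:
    """Convert the λ-based dimensions of item (1) into nanometres.

    Minimum values: transistor length 3λ, gate-edge spacing 2λ.
    Tolerance values: transistor length about 3λ-6λ, gate-edge spacing 8λ or larger.
    """
    if lam_nm <= 0:
        raise ValueError("lambda must be positive")
    return {
        "transistor_length_min": 3 * lam_nm,
        "gate_spacing_min": 2 * lam_nm,
        "transistor_length_tolerance": (3 * lam_nm, 6 * lam_nm),
        "gate_spacing_tolerance_min": 8 * lam_nm,
    }


# Hypothetical example value of λ; not taken from the disclosure.
print(lambda_dimensions_nm(5.0))
```

Keeping the rules in units of λ and converting only at the end mirrors how the text states them, so the same table applies at any process node.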
(2) The first metal interconnection (M1 layer) directly connects the Gate, Source and/or Drain regions through self-aligned miniaturized contacts without using a conventional contact-hole-opening mask and/or a Metal-0 translation layer for M1 connections.
(3) The Gate and/or Diffusion (Source/Drain) areas are directly connected to the metal-2 (M2) interconnection layer in a self-aligned manner without connecting to the metal-1 (M1) layer. Therefore, the necessary space between one M1 interconnection and another M1 interconnection, as well as the blocking issue in some wiring connections, will be reduced. Furthermore, the same structure could be applied so that a lower metal layer is directly connected to an upper metal layer by a conductor pillar, while the conductor pillar is not electrically connected to any middle metal layer between the lower metal layer and the upper metal layer.
(4) The metal wires for the high level voltage Vdd and/or the low level voltage Vss in the standard cell could be distributed under the original silicon surface of the silicon substrate; thus, interference among the sizes of the contacts and among the layouts of the metal wires connecting the high level voltage Vdd, the low level voltage Vss, etc., could be avoided even when the standard cell is shrunk. Moreover, the openings for the source/drain regions that are originally used to electrically couple the source/drain regions with the metal-2 layer (M2) or the metal-3 layer (M3) for Vdd or ground connection could be omitted in the new standard cell and SRAM cell.
Based on the above-mentioned,
Therefore, an integrated scaling and/or stretching platform (ISSP) for monolithic die design is proposed to provide an integration system using any combination of the proposed technologies (such as the new transistor, interconnection-to-transistor, SRAM cell, and standard cell designs), such that the area of a die implementing an original schematic circuit can be scaled down by 2-3 times or more.
From another point of view, more SRAM or more different major function blocks (such as CPUs or GPUs) could be formed within the original size of a single monolithic die. Thus, the device density and computing performance of an integration system (such as an AI chip or SOC) can be significantly increased in comparison with a conventional one of the same size, without shrinking the technology node used to manufacture the integration system.
Using the 5 nm technology process node as an example, a CMOS 6T SRAM cell size can be shrunk to about 100 F² (where F is the minimum feature size made on silicon wafers), as shown in
That is, if a single monolithic die has a circuit (such as an SRAM circuit, a logic circuit, a combination of an SRAM and a logic circuit, or a major function block circuit such as a CPU, GPU, or FPGA) that occupies a die area (such as Y nm²) based on a technology process node, then with the help of the present invention, the total area of the monolithic die with the same schematic circuit could be shrunk, even if the monolithic die is still manufactured with the same technology process node. The new die area occupied by the same schematic circuit in the monolithic die will be smaller than the original die area, such as 20% to 80% (or 30% to 70%) of Y nm².
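The arithmetic behind a cell size quoted as k·F² and behind a new die area quoted as a fraction of Y can be sketched as follows. Setting F equal to the nominal 5 nm node value is a simplifying assumption made only for illustration (node names and physical feature sizes generally differ), and the helper names are hypothetical.

```python
def cell_area_nm2(cell_size_in_f2: float, feature_nm: float) -> float:
    """Area of a cell quoted as k * F^2, in nm^2."""
    return cell_size_in_f2 * feature_nm ** 2


def shrink_factor(new_area_fraction: float) -> float:
    """If the new area is a fraction p of the original area Y, the area shrinks by 1/p."""
    if not 0.0 < new_area_fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / new_area_fraction


# Illustration only: treats F as equal to the nominal 5 nm node value.
area = cell_area_nm2(100, 5.0)
print(area, "nm^2 =", area / 1e6, "um^2")  # 2500.0 nm^2 = 0.0025 um^2
for p in (0.2, 0.3, 0.5, 0.7, 0.8):
    print(f"{p:.0%} of Y -> {shrink_factor(p):.2f}x smaller")
```

Under this reading, a new area of 20% to 80% of Y corresponds to an area reduction of 5× to 1.25×, and the 2-3× reduction mentioned for the ISSP corresponds to roughly 50% to 33% of Y.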
For example,
In one view of shrinking the size of the ISSP integration system 1000, as shown in the middle of
In one embodiment, the combined area of the SRAM circuit 1001B and the logic circuit 1001A in the single monolithic die 1001 is shrunk by 3.4 times relative to the area of the conventional monolithic die 1011. In other words, in comparison with the conventional monolithic die 1011, the ISSP of the present invention may shrink the area of the logic circuit 1001A of the single monolithic die 1001 by 5.3×, shrink the area of the SRAM circuit 1001B of the single monolithic die 1001 by 5.3×, and shrink the combined area of the SRAM circuit 1001B and the logic circuit 1001A in the single monolithic die 1001 by 3.4× (as shown in the middle of
In another view, in which more devices are added, as shown on the right-hand side of
In the present embodiment, the monolithic die 1101 of the ISSP integration system 1100 includes different levels of caches L1, L2, and L3, commonly made of SRAMs. The caches L1 and L2 (collectively, “low level cache”) are usually allocated one per CPU or GPU core unit; the cache L1 is divided into L1i and L1d, which store instructions and data, respectively, while the cache L2 does not distinguish between instructions and data. The cache L3 (which could be one of the “high level caches”) is shared by multiple cores and usually does not distinguish between instructions and data either.
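The cache organization described above (a split L1 and a unified L2 private to each core, plus an L3 shared across cores) can be modelled as a small data structure to tally the SRAM a die must hold. This is a sketch under assumptions: the class names, the 8-core count, and the 32 KB / 32 KB / 512 KB / 32 MB sizes are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Core:
    """One CPU or GPU core with its private (low level) caches."""
    name: str
    l1i_kb: int  # L1 instruction cache
    l1d_kb: int  # L1 data cache
    l2_kb: int   # unified L2 (instructions and data not distinguished)

    def private_kb(self) -> int:
        return self.l1i_kb + self.l1d_kb + self.l2_kb


@dataclass
class MonolithicDie:
    """Cores with private L1/L2 plus one L3 shared by all cores."""
    cores: List[Core] = field(default_factory=list)
    shared_l3_mb: int = 0  # unified, shared high level cache

    def total_sram_kb(self) -> int:
        return sum(c.private_kb() for c in self.cores) + self.shared_l3_mb * 1024


# Hypothetical sizes for illustration only.
die = MonolithicDie(
    cores=[Core(f"core{i}", l1i_kb=32, l1d_kb=32, l2_kb=512) for i in range(8)],
    shared_l3_mb=32,
)
print(die.total_sram_kb())  # 37376
```

In this hypothetical tally the shared L3 accounts for most of the SRAM (32768 KB of 37376 KB), and cores of different kinds (for example, CPU and GPU cores) can be placed in the same `cores` list when they share one L3.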
For high speed operation, therefore, based on the ISSP of the present disclosure, the die area of the monolithic die 1101 may be the same or substantially the same as a scanner maximum field area (SMFA) defined by a specific technology process node, while the storage volume of the cache L1/L2 (low level cache) and the cache L3 (high level cache) of the ISSP integration system 1100 could be increased. As shown in
Alternatively, in addition to the existing major function block, another different major function block, such as an FPGA, can be integrated together in the same monolithic die.
For example, the XPU 1101A′ of the ISSP integration system 1100′ could serve as a CPU, and the YPU 1101C of the ISSP integration system 1100′ could serve as a GPU. Each of the XPU 1101A′ and the YPU 1101C has multiple logic cores, each core has a low level cache (such as cache L1/L2 with 512K or 1M/128K bits), and a high-volume high level cache (such as cache L3 with 32 MB, 64 MB or more) is shared by the XPU 1101A′ and the YPU 1101C; these three levels of cache may each include a plurality of SRAM arrays.
Because GPUs are increasingly critical for AI training, and FPGAs have blocks of logic that interact with each other, can be designed by engineers to support specific algorithms, and are suitable for AI inference, in some embodiments of the present disclosure an ISSP integration system 1100″ having a single monolithic die 1101″ could include a GPU and an FPGA, as shown in
In addition, as shown in
In some embodiments of the present disclosure, a larger-capacity shared SRAM (or embedded SRAM, “eSRAM”) can be designed into one monolithic (single) die owing to the smaller SRAM cell area achieved according to the present invention. Since a high storage volume of eSRAM can be used, it is faster and more effective than conventional embedded DRAM or external DRAMs. Thus, it is reasonable and possible to have a high-bandwidth, high-storage-volume SRAM within a single monolithic die whose die size is the same or substantially the same (such as 80%-99%) as the scanner maximum field area (SMFA, such as 26 mm by 33 mm, or 858 mm²).
Therefore, the integration system 1200 provided by the integrated scaling and/or stretching platform (ISSP) of the present disclosure could include at least two single monolithic dies, and those two monolithic dies could have the same or substantially the same size. For example,
As shown in
According to
Of course, in consideration of the selective usage of the different technologies proposed herein and the conventional Back End of Line technology, the SMFA (26 mm by 33 mm) of the single monolithic die 1202 may accommodate a smaller volume of SRAM, such as ¼-¾ of the SRAM size at the different technology nodes in Table 1 above. For example, owing to the selective usage of the different technologies proposed herein and the conventional Back End of Line technology, the single monolithic die 1202 may accommodate around 2-15 GB of SRAM (such as 5-15 GB or 2.5-7.5 GB).
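The two example ranges are consistent with applying the ¼-¾ factor to full-field capacities of 20 GB and 10 GB, respectively. Those two inputs are inferred from the quoted ranges rather than read from Table 1, which is not reproduced in this section, and the helper below is a hypothetical sketch of the arithmetic only.

```python
def accommodated_range_gb(full_capacity_gb: float,
                          low: float = 0.25, high: float = 0.75) -> tuple:
    """Return the (low, high) SRAM capacity when only 1/4 to 3/4 of the full value is used."""
    return (full_capacity_gb * low, full_capacity_gb * high)


# Full-SMFA capacities back-calculated from the quoted ranges (not read from Table 1).
for full in (20.0, 10.0):
    print(full, "GB ->", accommodated_range_gb(full))
```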
As shown in
Of course, three, four, or more HBSRAM dies can be integrated in a single package of the integration system 1500, in which case the caches L3 and L4 in the integration system 1500 could exceed 128 GB or 256 GB of SRAM. In some embodiments of the present disclosure, the single monolithic dies 1301 and 1302 of the integration system 1500 could be enclosed in the same IC package.
Compared with currently available HBM DRAM memory, which provides around 24 GB from a stack of 12 DRAM chips, the present invention could replace the HBM3 memory with more HBSRAM (such as one HBSRAM chip with around 5-10 GB or 15-20 GB). Therefore, no HBM memory, or only a small amount of HBM memory (such as less than 4 GB or 8 GB of HBM), is required in the ISSP.
The application of the integration system provided by the integrated scaling and stretching platform (ISSP) of the present invention is not limited to the examples discussed above; the ISSP can also be applied to form an integration system with a DRAM cell structure, such as a rack server having DRAM dual in-line memory modules (DRAM DIMMs).
Nowadays, rack servers are commonly used for data center and cloud computing applications. Each rack server may include one or two top-tier server processors and 4-8 memory slots for inserting DRAM DIMMs. A traditional top-tier server processor 1600, such as an AMD 3rd generation EPYC™ processor as shown in
However, the distance between the server processor 1600 and the DIMM slots on the motherboard of the rack server may be 3-10 cm, the operating frequency of the server processor may be up to 3.5-4 GHz, and the data rate of DDR5 memory may be up to 4.8 GT/s (DDR5-4800). Therefore, signal propagation distortion and EMI are persistent challenges in such a rack server.
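To show why a 3-10 cm trace matters at these rates, the following minimal sketch compares the one-way flight time across the motherboard with the DDR5 bit period. It rests on assumptions that the text does not state: the trace propagation velocity is taken as c/√εr with εr = 4.0 (a typical FR-4-like value), and the DDR5 figure of 4.8 is treated as a 4.8 GT/s data rate. Both helper names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum (m/s)


def flight_time_ps(length_cm: float, eps_r: float = 4.0) -> float:
    """One-way trace delay in ps, assuming a velocity of c / sqrt(eps_r)."""
    velocity_m_per_s = C_M_PER_S / math.sqrt(eps_r)
    return (length_cm / 100.0) / velocity_m_per_s * 1e12


def unit_interval_ps(data_rate_gtps: float) -> float:
    """Bit period (unit interval) in ps for a transfer rate in GT/s."""
    return 1e3 / data_rate_gtps


ui = unit_interval_ps(4.8)
for length in (3.0, 10.0):
    t = flight_time_ps(length)
    print(f"{length:4.1f} cm: {t:6.1f} ps = {t / ui:.2f} UI")
```

Under these assumptions, a 3 cm trace delays the signal by about one unit interval (about 208 ps) and a 10 cm trace by about three. This means reflections and skew over that distance are large compared with the bit period.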
The problems can be solved by applying the integrated scaling and stretching platform (ISSP) of the present invention, as previously mentioned in
In the present embodiment, the processing chips 17011 and 17012, each of which may include 16 or 32 cores (each core with L1/L2 caches), and the other circuits 17013 (e.g., I/O, security, and communication circuits) originally arranged in the top-tier server processor 1600 can be integrated in a single monolithic die 1701, while the 2-5 GB (or 5-10 GB, or 10-15 GB) L3/L4 SRAM caches originally arranged in the top-tier server processor 1600 can be integrated in a second single monolithic die 1702.
Thus, the nine separate ICs originally arranged in the up-to-date server processor 1600 (AMD 3rd generation EPYC™ processor) could be transformed into two separate monolithic dies 1701 and 1702 based on the ISSP proposed by this invention, wherein the single monolithic die 1701 has 32-64 processing cores, L1/L2 SRAM caches, and other circuits (e.g., I/O, security, and communication circuits), and the monolithic die 1702 has 2-5 GB (or 5-10 GB, or 10-15 GB) or more of L3/L4 SRAM caches, as shown in
Moreover, a new DRAM cell structure (“M-Cell 1800”) based on the integrated scaling and stretching platform (ISSP) of the present invention is disclosed, whose area could be as small as 4-6λ² or 4-10λ².
First, the word lines and the gate structures (including a high-k insulator layer 1304 and a gate material 1306) of a plurality of access transistors AQ1, AQ2 and AQ3 are formed in U-shaped concaves in the horizontal silicon surface (hereinafter, “HSS”) of the substrate 202. As shown in
A suitable high-k insulator layer 1304 is formed as the gate dielectric layer of the access transistor, wherein the tops of the two edges of the high-k insulator layer 1304 could be higher than the HSS. Afterwards, a suitable gate material 1306 is selected that provides adequate word line conductance and achieves a targeted work function, so that the access transistor has a lower threshold voltage. (The goal of selecting the suitable gate material 1306 is to reduce the boosted word line voltage level as much as possible while still providing sufficient device drive, on the one hand to restore enough charge into the capacitor and, on the other hand, to facilitate faster charge transfer for signal sensing.)
The gate material 1306 is thick enough to fill the U-shaped concaves between two adjacent longitudinal stripes (the oxide-3 layer 1102 and the nitride-2 layer 1104). Then, the gate material 1306 is etched back to form a longitudinal (Y-direction) word line sandwiched between the two adjacent longitudinal stripes. The newly proposed access transistor with the U-shaped channel 1302 (hereafter called the U-transistor) differs from the recessed transistor commonly used in state-of-the-art buried word line designs. The body of the U-transistor is bounded on two sides by the CVD-STI-oxide2 along the Y direction (i.e. the channel width direction), and its channel length includes the depth of one edge of the U-shaped channel 1312 on the drain side of the U-transistor, the length of the bottom of the U-shaped channel 1312, and the depth of the other edge of the U-shaped channel 1312 on the source side of the U-transistor.
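The channel-length definition above can be summarized as a single sum. The symbols are introduced here for illustration only and are not reference numerals from the figures. $d_D$ is the depth of the drain-side edge of the U-shaped channel, $w_B$ is the length of its bottom, and $d_S$ is the depth of the source-side edge:

$$
L_{\mathrm{ch}} = d_{D} + w_{B} + d_{S}
$$

Because both edge depths are counted, the effective channel length is longer than the lateral width $w_B$ of the U-shaped concave alone.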
Because of this structural difference between the U-transistor and the recessed transistor, the channel length of the U-transistor can be much better controlled. In addition, since the HSS is fixed, the dopant concentration profiles of the drain and the source of the U-transistor are much more controllable, with less variation in device-design parameters, as will be described later in connection with how the drain and the source of the U-transistor are completed. Moreover, the gate structure of the U-transistor and the word line are formed simultaneously in the longitudinal direction by self-alignment between the two adjacent longitudinal stripes (the oxide-3 layer 1102 and the nitride-2 layer 1104), so that the word line is not below the HSS. This gives design and performance parameters quite different from those of the commonly used buried word line. In addition, the height of the word line (i.e. the gate material 1306) is designed to be lower than that of the composite layers (composed of the oxide-3 layer 1102 and the nitride-2 layer 1104) by using the etching-back technique (shown in
Next, an oxide-7 plug made of an oxide-7 layer is formed in the hole-1/3 in the center of the source region below the HSS-1/3. A tungsten plug made of a metal layer 2802 is formed inside the hole-1/2 in the drain region to connect with the UGBL (underground bit line, which is below the HSS). A necklace-type conductive n+ silicon 3202 (named the n+ silicon drain-collar) connects to the HSS on the two sides of the hole-1/2, serving as the drain-1 and the drain-2 of the access transistors AQ1 and AQ2, respectively, and also as a conductive bridge (i.e. bridge contact) between the UGBL and the access transistors AQ1, AQ2 (as shown in
Elevated source electrodes EH-1S and elevated drain electrodes EH-1D are formed in the vertical direction above the HSS by selective epitaxial silicon growth, using the exposed HSS as the seed. Elevated source electrodes EH-2S and elevated drain electrodes EH-2D are then formed by another selective epitaxial silicon growth process, using the exposed silicon surfaces of the source electrode EH-1S and the drain electrode EH-1D as high-quality silicon seeds (as shown in
The elevated source electrode EH-1S and the elevated drain electrode EH-1D could be pure (single-crystal) silicon rather than polycrystalline or amorphous silicon, since they are grown gradually using the exposed HSS as the seed. Both the elevated source electrode EH-1S and the elevated drain electrode EH-1D are surrounded by the gate structure/word line and the oxide-5 spacer on the left and right sidewalls along the X direction. Although the other two sidewalls along the Y direction are wide open, the CVD-STI-oxide2 cannot act as a seed for selective epitaxial silicon growth. Therefore, the growth results in some laterally over-grown pure silicon that stops at the edges of the CVD-STI-oxide2 and cannot connect neighboring electrodes. In addition, after the elevated source electrode EH-1S and the elevated drain electrode EH-1D are grown, an optional RTA (rapid thermal annealing) step can be used to form an NLDD (n-type lightly doped drain) 4012 under the elevated source electrode EH-1S or the elevated drain electrode EH-1D, so that the elevated electrode has a better electrical connection to the channel region of the transistor.
During the selective epitaxial silicon growth process for the elevated source electrodes EH-2S and the elevated drain electrodes EH-2D, a well-designed, heavier in-situ n+ doping concentration can be achieved in the elevated source electrode EH-2S and the elevated drain electrode EH-2D. This prepares a low-resistivity connection between the elevated source electrode EH-2S (or the elevated drain electrode EH-2D) and the storage electrode of the stacked storage capacitor (SSC), which will be made later. The combination of the elevated source electrodes EH-1S and EH-2S is called the elevated source electrode EH-1+2S (similarly, the combination of the elevated drain electrodes EH-1D and EH-2D is called the elevated drain electrode EH-1+2D). Taking the elevated source electrode EH-1+2S as an example, its upper portion, i.e. the elevated source electrode EH-2S, has high-quality n+ doped silicon that directly abuts the spacer on one sidewall. The opposite sidewall is close to the gate structure/word line, and the other two sidewalls are wide open in the Y direction along the longitudinal word line. The height of the elevated source electrode EH-1+2S (and of the elevated drain electrode EH-1+2D) is designed to be lower than that of the spacer.
As shown in
As shown in
Next, as shown in
In summary, the proposed HCoT cell not only compacts the size of the DRAM cell but also enhances the signal-to-noise ratio during DRAM operation. The capacitor is located over the access transistor and largely encompasses it, and both vertical and horizontal self-alignment techniques are used to arrange and connect the geometries of these essential micro-structures in the DRAM cell. As a result, the new HCoT cell architecture can retain a cell area of about 4 to 10 square units (λ²) even when the minimum physical feature size is much less than 10 nanometers. The H-capacitor may occupy 50%-70% of the HCoT cell area. The detailed manufacturing process of the HCoT cell structure is described in U.S. application Ser. No. 17/337,391, filed on Jun. 2, 2021 and entitled “MEMORY CELL STRUCTURE”, the whole content of which is incorporated by reference herein.
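To put the 4-10λ² cell area and the 50%-70% H-capacitor share into physical units, the following minimal sketch evaluates both at an illustrative λ of 5 nm. That λ value is an assumption chosen within the “much less than 10 nanometers” regime, and the helper name is hypothetical.

```python
def hcot_areas_nm2(lambda_nm: float, cell_units: float = 4.0,
                   cap_fraction: float = 0.5) -> tuple[float, float]:
    """Return (cell area, H-capacitor footprint) in nm^2 (hypothetical helper).

    cell_units:   cell size in units of lambda^2 (4 to 10 per the text)
    cap_fraction: share of the cell occupied by the H-capacitor (0.5 to 0.7)
    """
    cell_area = cell_units * lambda_nm ** 2
    return cell_area, cell_area * cap_fraction


# Bracket the quoted ranges at an illustrative lambda of 5 nm.
for units in (4.0, 10.0):
    for frac in (0.5, 0.7):
        cell, cap = hcot_areas_nm2(5.0, units, frac)
        print(f"{units:4.0f} λ², {frac:.0%} capacitor: "
              f"cell {cell:6.1f} nm², capacitor {cap:6.1f} nm²")
```

At λ = 5 nm, this gives cell areas of 100-250 nm², of which roughly 50-175 nm² would be H-capacitor.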
Furthermore, the metal electrode of the capacitor in the new HCoT cell architecture offers an efficient route for heat dissipation, so the temperature of the HCoT cell during operation could be lower. Such a lower temperature in turn reduces both the leakage currents from the capacitor and the thermal/operational noise. Additionally, the metal electrode encompasses the word line passing through the access transistor. The combination of these encompassed word lines with the underground bit lines (UGBLs) formed below the silicon surface could effectively shield cross-coupling noise among different word lines and bit lines, so the problematic pattern-sensitivity issue in traditional DRAM cell array operation could be dramatically reduced. Besides, the UGBLs below the silicon surface of the present invention can flexibly lower the resistance and capacitance of the bit lines. Therefore, the signal sensitivity during the charge-sharing period between the capacitor and the bit line could be improved, and the operation speed of the new HCoT cell architecture could be enhanced as well.
Using 4λ² for the area of the M-Cell as an example, suppose a 50% DRAM cell utilization rate (that is, 50% of the SMFA is used for DRAM cells and the rest of the SMFA is used for DRAM I/O circuits). The total bytes within the 26 mm by 33 mm SMFA at different technology nodes could then be 25 times the total bytes of SRAM in the aforesaid Table 1, since the size of the new SRAM cell according to the present invention is 100λ² (100λ²/4λ² = 25). For example, the 26 mm by 33 mm SMFA could accommodate at least 537 GB (21.5 GB×25) of DRAM at the 5 nm technology node, and may provide more if the utilization rate is more than 50%. The 26 mm by 33 mm SMFA could accommodate at least 68.5 GB (2.74 GB×25) of DRAM at the 14 nm node, 134 GB (5.36 GB×25) at the 10 nm node, and 272.5 GB (10.9 GB×25) at the 7 nm node. Thus, a monolithic DRAM die with 64-512 GB (such as 64 GB, 128 GB, 256 GB, or 512 GB) could be available, and the top of the monolithic DRAM die is covered by the counter-electrode. Of course, in consideration of tolerance, variation, and the conventional Back End of Line technology, the SMFA (26 mm by 33 mm) of the single monolithic die may accommodate a smaller volume of M-Cell DRAM, such as ¼-½ times the DRAM sizes at the technology nodes mentioned above. For example, the single monolithic die may accommodate around 16-128 GB (such as 16, 32, 64, or 128 GB) or 32-256 GB (such as 32, 64, 128, or 256 GB), due to the selective usage of the different technologies proposed herein and the conventional Back End of Line technology.
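The per-node figures above can be cross-checked with a short calculation. This minimal sketch makes three assumptions: λ equals the technology node, 1 GB = 10⁹ bytes, and the same 50% utilization rate was applied to the 100λ² SRAM in Table 1. Under those assumptions it reproduces the quoted SRAM values (2.74, 5.36, 10.9, and 21.5 GB) and the 25× DRAM figures to within rounding. It is a reconstruction, not necessarily how Table 1 was derived, and the function name is hypothetical.

```python
# 26 mm x 33 mm SMFA expressed in nm^2 (1 mm = 1e6 nm).
SMFA_NM2 = 26e6 * 33e6

SRAM_CELL_UNITS = 100.0  # 100 lambda^2 SRAM cell stated in the text
DRAM_CELL_UNITS = 4.0    # 4 lambda^2 M-Cell used in the example


def capacity_gb(node_nm: float, cell_units: float,
                utilization: float = 0.5) -> float:
    """Decimal GB in one SMFA when `utilization` of it is cell array.

    Assumes lambda == node_nm and one bit per cell.
    """
    cell_area_nm2 = cell_units * node_nm ** 2
    bits = SMFA_NM2 * utilization / cell_area_nm2
    return bits / 8 / 1e9


for node in (14, 10, 7, 5):
    sram = capacity_gb(node, SRAM_CELL_UNITS)
    dram = capacity_gb(node, DRAM_CELL_UNITS)
    print(f"{node:>2} nm: SRAM {sram:6.2f} GB, DRAM {dram:7.1f} GB "
          f"(x{dram / sram:.0f}), derated DRAM {dram / 4:6.1f}-{dram / 2:6.1f} GB")
```

The ¼-½ derated DRAM column runs from roughly 17 GB (14 nm) to roughly 268 GB (5 nm), which is in the same range as the 16-256 GB capacities cited above.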
However, the structure of the single molding package is not limited in this regard. For example,
Moreover, for high performance computing, a new ISSP rack server unit 2100 is proposed, which includes two ISSP server processors (such as the server processors 2000 and 2000′ as shown above) attached to another substrate 2101 (such as an ABF substrate or a PCB substrate) and encapsulated by a metal shielding casing 2102.
To increase the DRAM capacity in the ISSP rack server unit 2200, two monolithic DRAM chips 2201 and 2202 based on M-Cell structure 1800 could be encapsulated in a molding/shielding compound 2205.
Such a new ISSP rack server unit 2300 may include 80-640 GB (such as 512 GB), 1 TB, or 2 TB of DRAM, and 2-15 GB (such as 10 GB) or more of SRAM. Furthermore, since all the DRAMs are encapsulated by the shielding compound 1912 and the metal shielding casing 2302, EMI issues could be mitigated. Additionally, the tops of the ISSP server processor (the server processor 1900) and the Dual DRAM package (the ISSP rack server unit 2200) are covered by the counter electrodes of the DRAM chips (e.g. the top metal of the counter-electrode 1903a of the DRAM monolithic die 1903 and/or the counter electrode 2202a of the top DRAM chip 2202). The metal shielding casing 2302 could therefore be thermally coupled (not shown) to the counter electrodes 1903a and/or 2202a for better heat dissipation.
Monolithic integration on a single die, which has enabled the success of Moore's Law, is now facing its limits, especially the limits of photolithographic printing technologies. On one hand, further scaling down the minimum feature size printed on the die is very costly; on the other hand, the die size is limited by the scanner maximum field area. Meanwhile, more and more diversified processor functions are emerging, and such requirements are hard to integrate on a monolithic die. In addition, the somewhat duplicated presence of small eSRAMs on each major function die, together with external or embedded DRAMs, is not a desirable or optimized solution. Based on the integrated scaling and/or stretching platform (ISSP) in a monolithic die or SOC die:
(1) A single major function block, such as an FPGA, TPU, NPU, CPU, or GPU, can be shrunk to a much smaller size.
(2) More SRAM could be formed in the monolithic die.
(3) Two or more major function blocks, such as a GPU and an FPGA (or other combinations), which have also gone through this ISSP to become smaller, can be integrated together in the same monolithic die.
(4) More levels of caches could exist in a monolithic die.
(5) Such an ISSP monolithic die could be combined with other dies (such as eDRAMs) based on heterogeneous integration.
(6) An HPC Die 1 with L1 and L2 caches could be electrically connected (such as by wire bonding or flip-chip bonding) to one or more HBSRAM Dice 2, which are utilized as L3 and L4 caches in a single package; each of the HPC Die 1 and the HBSRAM Die 2 has a die size the same or substantially the same as the SMFA.
(7) No HBM memory, or only a small amount of HBM memory, is required in the ISSP.
(8) For data center and cloud computing applications, an ISSP server processor is proposed with three monolithic dice in a single molding package: one is a single monolithic die that comprises logic circuits (such as XPU and YPU, with more than 32 or 64 cores), I/O circuits, and a few L1 and L2 SRAM caches; another is an SRAM monolithic die with 10 GB, 20 GB, or more of L3/L4 caches; and the third is a DRAM monolithic die with 128 GB, 256 GB, 512 GB, or more.
(9) Two or more ISSP server processors could be attached to a PCB substrate and encapsulated by a metal shielding casing, as an ISSP rack server unit for high performance computing.
(10) One aforesaid ISSP server processor and two “Dual DRAM packages” could be attached to a PCB substrate and then encapsulated by a metal shielding casing, as an ISSP rack server unit for high storage capacity.
While the invention has been described by way of example and in terms of the preferred embodiment(s), it is to be understood that the invention is not limited thereto. On the contrary, it is intended to cover various modifications and similar arrangements and procedures, and the scope of the appended claims therefore should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements and procedures.
This application claims the benefit of U.S. provisional application Ser. No. 63/303,542 filed Jan. 27, 2022, the subject matter of which is incorporated herein by reference.
Number | Date | Country
---|---|---
63303542 | Jan 2022 | US