The present invention relates to a semiconductor module.
Conventionally, volatile memories such as DRAM (Dynamic Random Access Memory) have been known as storage devices. DRAM is required to have a capacity large enough to keep pace with the increasing performance of arithmetic units (hereinafter referred to as logic chips) and the growing amount of data. The capacity has therefore been increased by miniaturizing the memory (memory cell array, memory chip) and increasing the number of cells in a plane. However, this approach to increasing capacity has reached its limit owing to the susceptibility to noise caused by miniaturization, the increase in die area, and the like.
In view of this, in recent years, a technology has been developed that realizes a large capacity by stacking a plurality of planar memories to form a three-dimensional (3D) structure (for example, refer to Patent Documents 1 to 4).
Incidentally, as the performance of the arithmetic unit (for example, an MPU) and the amount of data increase, an improvement in the communication rate between the MPU and the DRAM is also required along with the increase in capacity. Although the communication rate between the MPU and the DRAM can be improved by widening the memory bandwidth, the data transfer power (consumed power) also increases as the communication rate improves. For example, assuming the energy required to transfer one bit of data between a sense amplifier of the DRAM and a processing element of the processor is 1 pJ, the data transfer power reaches 1024 W at a memory bandwidth of 128 TB/s. Therefore, it would be very useful to widen the memory bandwidth while improving the data transfer efficiency by reducing the power consumption.
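The 1024 W figure follows from multiplying the bit rate by the energy per bit. The following is a minimal sketch of that arithmetic, assuming 1 TB = 10^12 bytes; the function and constant names are hypothetical and are used only for illustration.

```python
# Illustrative check of the data transfer power figure quoted above.
# Assumptions (not from the specification): 1 TB = 10**12 bytes,
# and all names below are hypothetical.

BITS_PER_BYTE = 8
PJ_PER_JOULE = 10**12


def transfer_power_watts(bandwidth_bytes_per_s: int, energy_pj_per_bit: int) -> float:
    """Return the power in watts needed to move data at the given bandwidth."""
    bits_per_s = bandwidth_bytes_per_s * BITS_PER_BYTE
    picojoules_per_s = bits_per_s * energy_pj_per_bit
    return picojoules_per_s / PJ_PER_JOULE


bandwidth = 128 * 10**12  # 128 TB/s expressed in bytes per second
power = transfer_power_watts(bandwidth, energy_pj_per_bit=1)
print(power)  # 1024.0
```

Because the power scales linearly with the energy per bit, halving the energy per bit at the same bandwidth would halve the transfer power.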
It is an object of the present invention to provide a semiconductor module which can increase memory bandwidth and reduce power consumption to improve data transfer efficiency.
The present invention relates to a semiconductor module including: an interposer; and a processing unit including a plurality of processing unit main bodies arranged side by side in a first direction along a plate surface of the interposer, the processing unit being mounted on the interposer and electrically connected to the interposer, in which the processing unit main bodies each include a plurality of subset units each having one arithmetic unit including at least one core and one memory unit arranged side by side in the first direction of the arithmetic unit and configured by a stacked RAM module, and the plurality of subset units is arranged side by side in a second direction intersecting with the first direction.
Furthermore, it is preferred that the processing unit further includes a router unit that relays data communication between the plurality of processing unit main bodies and that is arranged side by side in the second direction of the processing unit main bodies.
Furthermore, it is preferred that the interposer includes a communication line that connects a plurality of router units.
Furthermore, it is preferred that the arithmetic unit includes a first interface unit at one end adjacent to the memory unit arranged side by side, and the memory unit includes a second interface unit at one end adjacent to the arithmetic unit arranged side by side.
According to the present invention, it is possible to provide a semiconductor module which can increase the memory bandwidth and reduce the power consumption to improve the data transfer efficiency.
Hereinafter, a semiconductor module according to an embodiment of the present invention will be described with reference to the accompanying drawings. The semiconductor module 1 according to the present embodiment is, for example, a system in package (SiP) in which an arithmetic unit (hereinafter referred to as an MPU) and a stacked DRAM are disposed on an interposer. The semiconductor module 1 is disposed on another interposer or a package substrate, and is electrically connected thereto by using micro bumps. The semiconductor module 1 obtains power from the other interposer or package substrate, and transmits and receives data to and from the other interposer or package substrate.
As shown in
The processing unit 20 is mounted on the interposer 10, and is electrically connected to the interposer 10. As shown in
The processing unit main body 21 is formed in a rectangular shape when viewed from the front. The processing unit main body 21 includes an arithmetic unit group C in which a plurality of arithmetic units 23 described later are arranged side by side, and a memory unit group D in which a plurality of memory units 24 described later are arranged side by side.
The arithmetic unit group C is formed in a rectangular shape when viewed from the front, and is configured by arranging the arithmetic units 23, which will be described later, along the plate surface of the interposer 10 in the second direction F2 intersecting with the first direction F1. In other words, the arithmetic unit group C is formed in a rectangular shape that is long in the second direction F2 when viewed from the front.
The memory unit group D is formed in a rectangular shape when viewed from the front, and the memory units 24, which will be described later, are arranged side by side in the second direction F2. In other words, the memory unit group D is formed in a rectangular shape that is long in the second direction F2 when viewed from the front. The memory unit group D is arranged side by side with the arithmetic unit group C in the first direction F1. Here, as shown in
In the present embodiment, sixteen pieces (plural) of processing unit main bodies 21 are provided. As shown in
As shown in
The arithmetic unit 23 is formed in a rectangular shape when viewed from the front, and is disposed on the interposer 10. The arithmetic unit 23 is connected to the interposer 10 by using an anisotropic conductive film (ACF), hybrid bonding, micro bumps, or the like. The arithmetic unit 23 includes at least one core 25.
In the present embodiment, as shown in
The memory unit 24 is configured by a stacked RAM module, and is formed in a rectangular shape when viewed from the front. In the present embodiment, the memory unit 24 is configured by a stacked DRAM module. The memory unit 24 is disposed on the interposer 10. The memory unit 24 is connected to the interposer 10 by using an anisotropic conductive film (ACF), Hybrid Bonding, a micro bump, or the like. The memory unit 24 is arranged side by side in the first direction F1 of the arithmetic unit 23, which is one of the left and right sides along the plane of the drawing in
According to the subset units 22 described above, the entirety of the processing unit main body 21 is configured by 256 cores 25 (256 processing elements (PEs)/cores), and has a 64-channel configuration (64 MB/channel). Each channel is configured with a 256 b width and a 4 Gbps communication rate, which gives a memory bandwidth of 128 GB/s per channel and 8 TB/s for the 64 channels as a whole. The memory units 24 of one processing unit main body 21 have a total capacity of 4 GB. Since the entire module comprises 16 processing unit main bodies 21, the entire module is configured with 4096 cores 25, 1024 channels, a memory bandwidth of 128 TB/s, and a total memory unit 24 capacity of 64 GB.
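The figures above can be cross-checked with a short calculation. The sketch below is illustrative only: the class and method names are hypothetical, the 4 Gbps rate is assumed to apply to each of the 256 data lines of a channel, and the conversions 1 GB = 1024 MB and 1 TB = 1024 GB are assumed because they reproduce the rounded values stated in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MainBodyConfig:
    """Example parameters of one processing unit main body (hypothetical model)."""

    cores: int = 256
    channels: int = 64
    channel_width_bits: int = 256
    rate_gbps_per_line: int = 4      # assumed to be a per-line rate
    channel_capacity_mb: int = 64

    def channel_bandwidth_gb_s(self) -> int:
        # 256 lines x 4 Gbps = 1024 Gb/s = 128 GB/s
        return self.channel_width_bits * self.rate_gbps_per_line // 8

    def bandwidth_tb_s(self) -> float:
        # Assumes 1 TB = 1024 GB, matching the rounded figure in the text.
        return self.channels * self.channel_bandwidth_gb_s() / 1024

    def capacity_gb(self) -> float:
        # Assumes 1 GB = 1024 MB.
        return self.channels * self.channel_capacity_mb / 1024


body = MainBodyConfig()
num_bodies = 16  # processing unit main bodies in the module

module_totals = {
    "cores": num_bodies * body.cores,
    "channels": num_bodies * body.channels,
    "bandwidth_tb_s": num_bodies * body.bandwidth_tb_s(),
    "capacity_gb": num_bodies * body.capacity_gb(),
}
print(module_totals)
```

Changing `num_bodies` or any field of `MainBodyConfig` shows how the totals scale when a different number of main bodies, channels, or a different communication rate is chosen.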
Furthermore, in the plurality of subset units 22, the arithmetic unit 23 and the memory unit 24 are disposed in the same order in the first direction F1, as shown in
The router unit 30 relays data communication between the plurality of processing unit main bodies 21. The router unit 30 is connected to another router unit 30 via the communication line 12 of the interposer 10. The router unit 30 is arranged side by side in the second direction F2 of the processing unit main body 21. Specifically, the router unit 30 is arranged side by side in the second direction F2 of the arithmetic unit 23 of the processing unit main body 21. In the present embodiment, as shown in
Next, the wiring unit 26 will be described. The wiring unit 26 is wiring formed on the interposer 10, and is disposed in a layered shape on the interposer 10. The wiring unit 26 electrically connects one end of the arithmetic unit 23 of the subset unit 22 with one end of the memory unit 24 in the first direction F1. In addition, a plurality of wiring units 26 are disposed in accordance with the respective positions of the subset units 22 arranged side by side in the second direction F2. In the present embodiment, the wiring unit 26 is configured by two copper pads (not shown) of 2 μm pitch and copper or aluminum wiring (not shown) of 1 μm pitch. One of the copper pads is connected to one end of the arithmetic unit 23 and the other is connected to one end of the memory unit 24 in one subset unit 22, and the two ends of the copper or aluminum wiring are respectively connected to the two copper pads. The copper or aluminum wiring is formed with a length L2 of, for example, 0.2 mm in the first direction F1.
The above semiconductor module 1 operates as follows. As shown in
In the semiconductor module 1 shown in
According to the semiconductor module 1 according to an embodiment as described above, the following effects are obtained.
(1) The semiconductor module 1 includes an interposer 10 and a processing unit 20 having a plurality of processing unit main bodies 21 arranged side by side in a first direction F1 along the plate surface of the interposer 10, the processing unit 20 being mounted on the interposer 10 and electrically connected to the interposer 10. In addition, the processing unit main body 21 includes a plurality of subset units 22 each having one arithmetic unit 23 including at least one core 25 and one memory unit 24 arranged side by side in the first direction F1 of the arithmetic unit 23 and configured by a stacked RAM module. The plurality of subset units 22 is arranged side by side in the second direction F2 intersecting with the first direction F1. As a result, the core 25 of the arithmetic unit 23 and the memory unit 24 can be disposed close to each other, so that the connection distance therebetween can be shortened. Consequently, the memory bandwidth can be widened and the power required for data communication can be reduced, so that the data transfer efficiency can be improved.
(2) The processing unit 20 further includes a router unit 30 that relays data communication between the plurality of processing unit main bodies 21 and that is arranged side by side in the second direction F2 of the processing unit main body 21. As a result, data communication between the processing unit main bodies 21 becomes possible, so that it is possible to improve the arithmetic efficiency using the plurality of subset units 22.
(3) The interposer 10 includes a communication line 12 that connects a plurality of router units 30. Since the communication line 12 is provided in the interposer 10, the router units 30 can be easily connected to each other without providing separate wiring.
(4) The arithmetic unit 23 includes the first interface unit 27 at one end thereof adjacent to the memory unit 24 arranged side by side, and the memory unit 24 includes the second interface unit 28 at one end thereof adjacent to the arithmetic unit 23 arranged side by side. Since the first interface unit 27 and the second interface unit 28 are disposed close to each other, the length of the signal line connecting the arithmetic unit 23 and the memory unit 24 can be further shortened.
Although the preferred embodiment of the semiconductor module of the present invention has been described above, the present invention is not limited to the above-described embodiment, and can be modified as appropriate.
For example, in the above embodiment, the combination of the stacking direction power supply connection terminal of the memory unit 24 and the stacking direction signal connection terminal of the memory unit 24 can be formed as shown in Table 1 below.
In the above embodiment, the processing unit 20 is configured by a total of 16 processing unit main bodies 21 arranged in 8 rows in the first direction F1 and 2 columns in the second direction F2; however, the numbers arranged in the first direction F1 and the second direction F2 are not limited thereto. In a case in which a plurality of processing unit main bodies 21 is disposed in the first direction F1 and one processing unit main body 21 is disposed in the second direction F2, the router unit 30 is disposed adjacent to the column of arithmetic units 23 for each set of the processing unit main bodies 21. Furthermore, in a case in which three or more processing unit main bodies 21 are disposed in the second direction F2, the router unit 30 may be disposed adjacent to the two arithmetic unit groups C between the processing unit main bodies 21 in the second direction F2. In a case in which the processing unit main body 21 is disposed as a single unit instead of a pair in the first direction F1, the router unit 30 is disposed adjacent to the arithmetic unit group C of the single processing unit main body 21. Furthermore, the router unit 30 and the arithmetic unit 23 in the processing unit main body 21 may be connected by a Network on Chip (NoC). The location of the router unit 30 may be changed as appropriate, and a plurality of router units 30 may be disposed.
In the above embodiment, the scales, the number of channels, the communication rate, the number of cores 25, the number of stacks, and the like of the arithmetic unit 23, the memory unit 24, and the wiring unit 26 are merely examples, and the present invention is not limited thereto.
In the above embodiment, the second direction F2 is a direction orthogonal to the first direction F1; however, the present invention is not limited thereto. In other words, the second direction F2 may be a direction substantially orthogonal to the first direction F1 along the plate surface of the interposer 10, or may be a direction inclined with respect to the first direction F1.
In the embodiment described above, one arithmetic unit 23 and one memory unit 24 constituting the subset unit 22 are disposed in contact with each other; however, the present invention is not limited thereto. One arithmetic unit 23 and one memory unit 24 may be disposed at a predetermined interval. In addition, in the first direction F1, the subset units 22 may be disposed in contact with each other or may be disposed at predetermined intervals.
Furthermore, the arithmetic unit is not limited to an MPU, and the present invention may be applied to a wide range of logic chips. The memory is not limited to DRAM, and the present invention may be applied to a wide range of RAM (Random Access Memory), including nonvolatile RAM (e.g., MRAM, ReRAM, and FeRAM).
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2017/020690 | 6/2/2017 | WO | 00 |