Current memory circuits that use double data rate (DDR) and quadruple data rate (QDR) access schemes have separate address, write data, read data and status pins. These access schemes require high frequency data transmission links that provide low bit error rate (BER), high bandwidth and low on-chip latency. Bandwidth is the amount of information exchanged during read and write operations. Latency is the time elapsed between an event in an input signal and the corresponding event in the output signal that results from it. For example, in a memory circuit, latency is the time elapsed between the receipt of a ‘Read’ command at an input pin of the memory circuit and the transmission of the corresponding read data to the output pins of the memory circuit.
In a device that has a serial transmission link, one or more serializer-deserializer (SERDES) circuits convert data packets between serial and parallel formats. It is common practice to place the SERDES circuits and other associated logic components along the periphery of the silicon chip. Such an architecture results in a wide spread in latencies in the silicon, depending on the distance between the SERDES and the specific functional block that is the source or the destination of the data. Thus, worst case timing latency is determined by the longest path, which is set by the I/O that is furthest away from any one device resource. A typical layout of I/O at the periphery would result in a worst case path from one corner of the die to the opposite corner. The resulting distance that an input signal must traverse could be the width plus the height of the die.
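The latency spread described above can be illustrated with a minimal sketch that assumes rectilinear (Manhattan) routing. The die dimensions, the SERDES position and the block locations are hypothetical values chosen for illustration and do not come from this description.

```python
def manhattan(a, b):
    """Rectilinear routing distance between two (x, y) points, in mm."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Hypothetical 10 mm x 8 mm die with a peripheral SERDES at one corner.
DIE_W, DIE_H = 10.0, 8.0
serdes = (0.0, 0.0)

# Hypothetical functional-block locations (assumed, not from the source).
blocks = {
    "near": (1.0, 1.0),
    "mid": (5.0, 4.0),
    "far_corner": (DIE_W, DIE_H),
}

# Routing distance from the SERDES to each block.
distances = {name: manhattan(serdes, xy) for name, xy in blocks.items()}

# Spread between the closest and the furthest block.
spread = max(distances.values()) - min(distances.values())

for name, d in distances.items():
    print(f"{name}: {d} mm")
print(f"spread: {spread} mm")
```

Under these assumptions, the block at the opposite corner is reached only after a route equal to the width plus the height of the die, while a block next to the SERDES is reached almost immediately.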
Error rates are expected to increase for high speed data links. Many circuits have a cyclic redundancy check (CRC) circuit to perform error checking on data packets. Error checking is performed across the entire data packet, which may be striped across multiple data lines to increase bandwidth and to reduce latency. However, such an approach requires that multiple data lines converge into the CRC circuit to allow error checking, thus adding to the length of the traces that signals must traverse for an operation.
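A minimal sketch of this arrangement follows. It assumes byte-wise round-robin striping across four lanes and uses the CRC-32 from Python's standard `zlib` module as a stand-in; the description does not specify a striping scheme or a CRC polynomial, and the payload and helper names are hypothetical.

```python
import zlib


def stripe(packet: bytes, lanes: int) -> list:
    """Distribute packet bytes round-robin across `lanes` data lines."""
    return [packet[i::lanes] for i in range(lanes)]


def unstripe(stripes: list) -> bytes:
    """Reassemble the packet at the point where the lanes converge."""
    lanes = len(stripes)
    out = bytearray(sum(len(s) for s in stripes))
    for i, lane_bytes in enumerate(stripes):
        out[i::lanes] = lane_bytes
    return bytes(out)


def packet_crc(packet: bytes) -> int:
    """CRC over the entire packet (CRC-32 is an assumption here)."""
    return zlib.crc32(packet) & 0xFFFFFFFF


packet = b"READ addr=0x1F40 burst=8"  # hypothetical payload
sent_crc = packet_crc(packet)
lanes = stripe(packet, 4)

# Error-free transfer: all lanes converge and the CRC matches.
received_ok = unstripe(lanes)

# Flip a single bit on one lane; the whole-packet CRC no longer matches.
corrupted = [bytearray(lane) for lane in lanes]
corrupted[2][0] ^= 0x01
received_bad = unstripe([bytes(lane) for lane in corrupted])

print(packet_crc(received_ok) == sent_crc)
print(packet_crc(received_bad) == sent_crc)
```

The check can run only after `unstripe` has gathered every lane, which mirrors the routing cost noted above: all striped data lines must be brought to a single CRC circuit.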
Moreover, the heaviest packet traffic in a device typically occurs as communication among functional blocks formed in or on the silicon substrate. Data lines formed in or on the silicon substrate are dimensionally constrained, and thus present significant capacitive and resistive loads to the paths the signals must traverse. In addition, communication lines in or on silicon need to circumvent the functional blocks that create barriers to signal routing, which adds to the lengths of the communication lines. As a result, on-die packet traffic routed through communication lines on a silicon substrate with a significant density of functional blocks will experience increased latencies.
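The sensitivity of latency to on-die wire length can be sketched with the Elmore approximation for an unrepeated distributed RC line, in which delay is one half of the total resistance times the total capacitance. This model is an assumption introduced here for illustration and is not stated in this description; the per-millimeter resistance and capacitance and the two route lengths are likewise hypothetical.

```python
def wire_delay_ps(r_ohm_per_mm: float, c_ff_per_mm: float, length_mm: float) -> float:
    """Elmore delay of a distributed RC wire: 0.5 * R_total * C_total, in ps."""
    r_total = r_ohm_per_mm * length_mm          # ohms
    c_total = c_ff_per_mm * length_mm * 1e-15   # farads
    return 0.5 * r_total * c_total * 1e12       # seconds -> picoseconds


# Hypothetical wire parameters (assumed, not from the source).
R_PER_MM = 100.0   # ohm/mm
C_PER_MM = 200.0   # fF/mm

direct_mm = 5.0    # straight route between two blocks
detour_mm = 10.0   # route forced around an intervening functional block

direct_ps = wire_delay_ps(R_PER_MM, C_PER_MM, direct_mm)
detour_ps = wire_delay_ps(R_PER_MM, C_PER_MM, detour_mm)

print(f"direct: {direct_ps:.1f} ps, detour: {detour_ps:.1f} ps")
```

Because both resistance and capacitance grow with length, the modeled delay grows with the square of the length, so a detour that doubles the route roughly quadruples the delay of an unrepeated wire.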
In an application using a SERDES circuit, placement of a power pin next to a data pin in a package substrate complicates “signal escape” to an external component. Routing signals in a printed circuit board from a signal pad at the center of the chip through a “picket fence” of power pins exposes the data signal on the signal pad to interference, cross-talk, and distortion. Thus, packages where the signal pins are toward the outer edges of the package reduce the picket fence effect. To overcome the above problem, it is customary to place I/O signals at the edge of the silicon substrate. However, such placement can negatively impact the overall latency of the circuit. Package pin-out configuration is therefore a concern in integrated circuit design.
Tx/Rx differential pairs are typically grouped closely together in high speed communication systems. Each Tx transmitter includes a transmit channel that conveys read data and status information out of a package. Each Rx receiver includes a receive channel that receives address, control and write data from outside of the package. In networking devices, the proximity of Tx and Rx channels can result in data crosstalk and an increase in bit flips.
Bandwidth becomes more significant when a SERDES block is combined with a high speed memory block. Due to the proximate locations of Tx to Rx, conventional systems have a significantly limited signal line density, which adversely affects the available bandwidth. In high speed communication systems, it is increasingly critical to have a high line/signal density to improve the device bandwidth.
U.S. Pat. No. 7,405,946 to Hall et al. (“Hall”) separates transmitter contacts from receiver contacts in a high speed interface pattern. However, Tx data channels in Hall's pattern must be positioned parallel to Rx data channels to convey data from the transmitter out to the host. Parallel Tx/Rx channels tend to degrade data signals and increase error rates. In Hall's Tx/Rx pattern, the data line transporting a high speed Tx signal must cross over an Rx data line before exiting the PC board. Such proximity of Rx contacts to Tx contacts contributes to noise coupling between Tx and Rx signals. Thus, Hall does not resolve the problem of Inter Signal Interference (ISI) for high speed data links.
Accordingly, there is a need for an IC device layout that takes into account the routing delay for high speed data signals on a PCB or a SOC. In addition, a need exists for simplified data path routing for high speed networking devices to minimize the routing length through the silicon die. Further, a need exists for reducing the amount of interference between Rx and Tx signals while easing printed circuit board layout.
The present invention provides a layout for a semiconductor device coupled to a second device. To optimize the high speed transmission rates in the present invention, at least two functional circuit blocks (“IP cores”) are symmetrically located with respect to a central axis on a semiconductor die; each core being accessible via a plurality of Tx and Rx data lines. A serial interface is centered on the die between the two IP cores. The serial interface includes multiple ports which serve as nodes coupled to various data lines. In particular, the serial interface includes multiple transmitter ports and multiple receiver ports. The ports are coupled together by Tx data lines and Rx data lines. The die itself has multiple metal layers and is encapsulated in a package having multiple routing layers.
The present invention is also directed to a semiconductor device coupled to a second device, where the semiconductor device contains a die divided into two partitions. An IP core is contained in each partition. Further, multiple receiver terminals are located in the first partition of the die, and multiple transmitter terminals are located in the second partition of the die. A serial interface is further incorporated on the die and is positioned adjacent to one of the IP cores, wherein the serial interface includes transmitter ports and receiver ports. The IC device also includes Tx data lines, originating from respective Tx ports wherein each Tx port serializes and transmits a serial data signal for output on a Tx data line to one of said IP cores; and Rx data lines, originating from respective receiver ports, wherein each receiver port receives and deserializes a serial data signal for output on an Rx data line to one of said IP cores.
Another embodiment of the invention is directed to a stacked die that includes multiple dies attached together. At least one die in the stack assembly has Rx terminals in a first partition of the die and Tx terminals in a second partition of the die. At least one of the dies in the stack has a serial interface in a central region of the chip layout. Thus, it is not necessary for all the dies in the stack assembly to have the same chip layout as the die of the present invention.
The invention is also directed to a stacked die assembly that operates with reduced power and propagation delay. By centrally locating the SERDES interface on the top surface of the die, the driving distance is reduced by approximately one half. The reduced driving distance that results from the layout of the invention reduces the system latency as well as power.
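The approximately one-half reduction can be checked with a minimal sketch that assumes rectilinear (Manhattan) routing and measures the worst-case distance from the interface to any die corner. The die dimensions are hypothetical and the function name is introduced here for illustration only.

```python
def worst_case_distance(io_xy, width, height):
    """Longest rectilinear distance from an interface location to any die corner."""
    corners = [(0.0, 0.0), (width, 0.0), (0.0, height), (width, height)]
    return max(abs(io_xy[0] - cx) + abs(io_xy[1] - cy) for cx, cy in corners)


# Hypothetical 10 mm x 8 mm die (assumed, not from the source).
W, H = 10.0, 8.0

corner_io = worst_case_distance((0.0, 0.0), W, H)       # interface at a die corner
center_io = worst_case_distance((W / 2, H / 2), W, H)   # interface at the die center

ratio = center_io / corner_io
print(f"corner: {corner_io} mm, center: {center_io} mm, ratio: {ratio}")
```

With the interface at a corner, the worst case is the width plus the height of the die; with the interface at the center, it is half the width plus half the height, and every corner is equally distant.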
Other features of the invention will be described in connection with the accompanying drawings.
The present invention balances the access time and propagation delays for a signal entering a die across all physical corners of the silicon. This is achieved by providing a SERDES interface in the center of the die.
It is not necessary for the IP cores of the present invention to have the same function or to be limited to memory blocks. In all embodiments, at least one IP core (functional block) is located in each partition. In one embodiment, each partition may constitute an equivalent half, that is, each partition may have the same area. However, it is not necessary that the partitions of the present invention have the same area as illustrated in
Each SERDES block 115 contains Rx/Tx units 122a, 124a/122b, 124b, respectively. Each Tx port in Tx unit 122b, 124b contains a differential pair of transmitters; the transmitter pairs are grouped with the transmitters of the same Tx unit. Each Rx port in Rx unit 122a, 124a contains a differential pair of receivers that are isolated from the Tx ports in Tx unit 122b, 124b. In addition, each Tx port and each Rx port has clocking functionality to implement PLL circuitry. Although 16 Tx ports and 16 Rx ports are shown, the present invention is also applicable to a SERDES block that has a different number of Tx/Rx ports. Preferably, the Rx ports in Rx unit 122a, 124a occupy a portion of the upper partition 50 of the die layout and the Tx ports in Tx unit 122b, 124b occupy a portion of the lower partition 52 of the die layout. By placing the SERDES block in approximately the center of the die, the distance of the data access from opposite edges of the die is more uniform than in the prior art. As a result, the layout of the present invention provides a symmetrical or nearly symmetrical point of entry for each data signal.
Portion 375-1 (375-2) may be used to provide an extra Tx data channel 551-1 (551-2) (see,
In the present invention, a Tx signal will take longer to travel from bump 30 in the serial interface 322b through the die (400 of
Conducting balls 216-1a,b to 216-16a,b are coupled to Rx data lines 552-1 to 552-16; conducting balls 516-1a,b are coupled to Rx data line 553-1; and conducting balls 516-2a, 516-2b are coupled to Rx data line 553-2. Conducting balls 215-1a, 215-1b to 215-16a, 215-16b are coupled to Tx data lines 550-1 to 550-16; conducting balls 515-1a, 515-1b are coupled to Tx data line 551-1; and conducting balls 515-2a, 515-2b are coupled to Tx data line 551-2. All other elements in
A semiconductor device that contains the layout of the present invention will be referred to in this description as a Bandwidth Engine (BE) device. The problems overcome by adopting the layout of the BE device will be explained in reference to the prior art system of
Data line 70 in
The present invention will be further explained in reference to
The present invention may also be implemented by positioning chip 100 on either side of chip 200. For example,
The present invention has been described by various examples above. However, the aforementioned examples are illustrative only and are not intended to limit the invention in any way. The skilled artisan would readily appreciate that the examples above are capable of various modifications. Thus, the invention is defined by the claims set forth below.
Number | Name | Date | Kind |
---|---|---|---|
4796224 | Kawai et al. | Jan 1989 | A |
5281151 | Arima et al. | Jan 1994 | A |
5363279 | Cha | Nov 1994 | A |
5640048 | Selna | Jun 1997 | A |
5654877 | Burns | Aug 1997 | A |
5880987 | Merritt | Mar 1999 | A |
6237130 | Soman et al. | May 2001 | B1 |
6259649 | Kim | Jul 2001 | B1 |
6317804 | Levy et al. | Nov 2001 | B1 |
6479758 | Arima | Nov 2002 | B1 |
6662250 | Peterson | Dec 2003 | B1 |
6687842 | DiStefano | Feb 2004 | B1 |
6730540 | Siniaguine | May 2004 | B2 |
6747362 | Barrow | Jun 2004 | B2 |
6833287 | Hur et al. | Dec 2004 | B1 |
7227254 | Devnani | Jun 2007 | B2 |
7263678 | Byrn et al. | Aug 2007 | B2 |
7365573 | Okada | Apr 2008 | B2 |
7405946 | Hall et al. | Jul 2008 | B2 |
7493511 | Yin et al. | Feb 2009 | B1 |
7522468 | Norman | Apr 2009 | B2 |
7579683 | Falik et al. | Aug 2009 | B1 |
7663903 | Kang et al. | Feb 2010 | B2 |
7679168 | Shu | Mar 2010 | B2 |
7786591 | Khan et al. | Aug 2010 | B2 |
7829997 | Hess et al. | Nov 2010 | B2 |
7863738 | Romig | Jan 2011 | B2 |
7999383 | Hollis | Aug 2011 | B2 |
8004070 | Chen | Aug 2011 | B1 |
8049303 | Osaka et al. | Nov 2011 | B2 |
8072971 | McKernan | Dec 2011 | B2 |
20030183919 | Devnani | Oct 2003 | A1 |
20040136411 | Hornbuckle et al. | Jul 2004 | A1 |
20040243894 | Smith | Dec 2004 | A1 |
20050098886 | Pendse | May 2005 | A1 |
20060091542 | Zhao et al. | May 2006 | A1 |
20060201704 | Heng et al. | Sep 2006 | A1 |
20060273468 | Mahajan et al. | Dec 2006 | A1 |
20070137029 | Schoenfeld et al. | Jun 2007 | A1 |
20080052451 | Pua et al. | Feb 2008 | A1 |
20080054493 | Leddige et al. | Mar 2008 | A1 |
20080143379 | Norman | Jun 2008 | A1 |
20090016714 | Soto et al. | Jan 2009 | A1 |
20090039492 | Kang et al. | Feb 2009 | A1 |
20090052218 | Kang | Feb 2009 | A1 |
20090194864 | Dang | Aug 2009 | A1 |
20100059898 | Keeth et al. | Mar 2010 | A1 |
20100085392 | Usui | Apr 2010 | A1 |
20100102434 | Kang et al. | Apr 2010 | A1 |
20100314761 | Yoshikawa et al. | Dec 2010 | A1 |
20100327457 | Mabuchi | Dec 2010 | A1 |
20110049710 | Kao et al. | Mar 2011 | A1 |
20110057291 | Slupsky et al. | Mar 2011 | A1 |
20110121443 | Danno et al. | May 2011 | A1 |
20110193227 | Chuang et al. | Aug 2011 | A1 |
20110298127 | Seta et al. | Dec 2011 | A1 |
Entry |
---|
Kyu-Hyoun Kim, Uksong Kang, Hoe-Ju Chung, Duk-Ha Park, Woo-Seop Kim, Young-Chan Jang, Moonsook Park, Hoon Lee, Jin-Young Kim, Jung Sunwoo, Hwan-Wook Park, Hyun-Kyung Kim, Su-Jin Chung, Jae-Kwan Kim, Hyung-Seuk Kim, Kee-Won Kwon, Young-Taek Lee, Joo Sun Choi, Changhyun Kim, An 8Gb/s/pin 9.6ns Row-Cycle 288Mb Deca-Data, Circuits Conference Feb. 2014. |
Christoforos Kozyrakis, David Patterson, Katherine Yelick, “Computers for the PostPC Era: Microprocessors for Gadgets,” Presented Sep. 25, 2000, IBM TJ Watson Research, New York, http://www.eecs.berkeley.edu/~pattrsn/talks/iram.html. |
David A. Patterson, “The Future of Microprocessors Embedded in Memory,” 4th System LSI Symposium, Presented Oct. 15, 1998, Japan, http://www.eecs.berkeley.edu/~pattrsn/talks/iram.html. |
David Patterson, Krste Asanovic, Aaron Brown, Ben Gribstad, Richard Fromm, Jason Golbus, Kimberly Keeton, Christoforos Kozyrakis, Stelianos Perissakis, Randi Thomas, Noah Treuhaft, John Wawrzynek, and Katherine Yelick, “An Overview of Intelligent RAM (IRAM),” Presented at the Telecosm Conference, Sponsored by Forbes Magazine and George Gilder, Ritz-Carlton Rancho Mirage, Palm Springs, CA, Sep. 14-16, 1997, http://www.eecs.berkeley.edu/~pattrsn/talks/iram.html. |
David Patterson, Thomas Anderson, Krste Asanovic, Ben Gribstad, Neal Cardwell, Richard Fromm, Jason Golbus, Kimberly Keeton, Christoforos Kozyrakis, Stelianos Perissakis, Randi Thomas, Noah Treuhaft, John Wawrzynek, and Katherine Yelick, “Intelligent RAM (IRAM): Chips that remember and compute (Revision 3),” Presented at DARPA ACS Meeting, IBM, and Micron in Jun.-Jul. 1997, http://www.eecs.berkeley.edu/~pattrsn/talks/iram.html. |
Number | Date | Country |
---|---|---|
20120025397 A1 | Feb 2012 | US |