This patent application relates to the field of integrated-circuit (IC) engineering, and more particularly, to high-speed digital microarchitecture.
Digital data may flow through an IC via numerous logic paths. Such paths may include sequential logic—clocks, one-shots, and memory circuits such as flip-flops. In some ICs, the overall throughput of data may be limited by the data-to-output lag (tDQ) of a memory circuit, which is a function of the data-setup time (tS) and the clock-to-output lag (tCQ). It may be desirable, therefore, to reduce both the tS and the tCQ of memory circuits that limit data throughput.
Furthermore, sequential logic operating at very high clock speeds may be prone to the effects of clock skew, clock jitter, and within-die delay variations, which can cause logic errors. One way to avoid such errors is to reduce the clock speed, which also reduces data throughput. A better alternative may be to implement time borrowing. Time borrowing is useful for absorbing clock skew and clock jitter and for averaging out within-die delay variations. This approach can extend the useful range of clock speed in an IC. Time-borrowing concepts may not be applicable, however, to every type of memory circuit.
Accordingly, the disclosure herein provides a novel class of memory circuit which exhibits attractively short tS and tCQ characteristics and is amenable to time borrowing.
Aspects of this disclosure will now be described by example and with reference to the illustrated embodiments listed above. Components that may be substantially the same in one or more embodiments are identified coordinately and are described with minimal repetition. It will be noted, however, that elements identified coordinately may also differ to some degree. The claims appended to this description uniquely define the subject matter claimed herein. The claims are not limited to the example structures and numerical ranges set forth below, nor to implementations that address the herein-identified problems or disadvantages of the current state of the art.
As described in further detail below, memory circuit 10 includes flip-flop 16, which is configured to store the input data D. In some data paths, tCQ—the time required for input data to be stored in and propagate through the flip-flop—may be undesirably great. Therefore, memory circuit 10 also includes selection logic 18A. The selection logic forces data output 14 to the logic level of the stored data once the input data is stored—i.e., once it is fully and stably latched in the flip-flop. Before the input data is stored, the selection logic, on receiving clock pulse 12, forces the data output to the logic level of the unstored input data—20 in
In the embodiment of
Continuing in
In this and other embodiments, the selection logic is configured to expose a logic level dependent on whether the upstream memory logic has latched the input data. The exposed logic level is derived from the input data before the input data is latched, and from the latched input data after the input data is latched. In the embodiments considered herein, an output of the upstream memory logic reveals whether the input data is latched. That output is presented to the selection logic for the purpose of determining whether the input data is latched. In the embodiment of
Continuing in
In memory circuit 10, receipt of clock pulse 12 triggers flip-flop 16 to store the logic level of data input 20. In general, such storing may be triggered by either edge of a clock pulse—i.e., a leading or trailing, rising or falling edge. For ease of description, it will be assumed hereinafter that flip-flop 16 is triggered to store the logic level of the data input on receiving a leading edge of the clock pulse.
With selection logic 18A configured as illustrated, data output 14 is driven to the logic level of data input 20 only when timing input 24 and each of the first and second control lines (26, 28) are high, and is driven otherwise to the logic level of stored-data line 22. The first and second control lines are maintained high prior to receipt of clock pulse 12—i.e., when the timing input is low. The BYP_SEL line is high under these conditions, but BYP_CLK is low. Accordingly, multiplexer 30 maintains the data output at the logic level of the most recently stored input data (whichever state is present at stored-data line 22). The logic level presented at the data output is held until receipt of the clock pulse in the selection logic. When the clock pulse is initially received—i.e., when the timing input goes high—the first and second control lines remain high momentarily, causing BYP_CLK to go high as well. As a result, multiplexer 30 switches the data output to the logic level of the data input. Due to buffer 36, the clock pulse is received in the selection logic before it is received in the upstream memory logic. Thus, the data output is driven to the logic level of the data input before the input data is stored in flip-flop 16, and more specifically, on receipt of the clock pulse in selection logic 18A.
Through buffer 36, clock pulse 12 is received, delayed, in upstream memory logic 32. The upstream memory logic is thereby triggered to latch the logic level of data input 20. This logic level then appears at first control line 26, while the complementary logic level appears at second control line 28. Latching causes the first and second control lines to be complements of each other, so that BYP_SEL and BYP_CLK are forced low. At this point, multiplexer 30 switches data output 14 to the logic level of the stored data at stored-data line 22. In this manner, the data output is driven to the logic level of the stored-data line as soon as the logic level of the input data is stored in flip-flop 16.
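The switching behavior described in the two preceding paragraphs can be sketched as a small behavioral model. This is an illustrative sketch only, not the disclosed circuit: it assumes that BYP_SEL is the AND of the first and second control lines and that BYP_CLK is the AND of BYP_SEL and the timing input, which is consistent with the text but is not stated there gate by gate. The function and argument names are hypothetical.

```python
def selection_logic_18a(timing_input, ctrl_26, ctrl_28, data_input, stored_data):
    """Behavioral sketch of selection logic 18A using 0/1 logic levels.

    Assumption: BYP_SEL = ctrl_26 AND ctrl_28, and
    BYP_CLK = BYP_SEL AND timing_input. Names are illustrative.
    """
    byp_sel = ctrl_26 & ctrl_28
    byp_clk = byp_sel & timing_input
    # The multiplexer passes the data input only while BYP_CLK is high;
    # otherwise it passes the stored-data line.
    data_output = data_input if byp_clk else stored_data
    return byp_sel, byp_clk, data_output


# Three phases for new input data D = 1 while the previously stored data is 0.
phases = [
    # (label, timing_input, ctrl_26, ctrl_28, data_input, stored_data)
    ("before clock pulse", 0, 1, 1, 1, 0),
    ("clock received, not yet latched", 1, 1, 1, 1, 0),
    ("input data latched", 1, 1, 0, 1, 1),
]

for label, clk, c26, c28, d, q in phases:
    byp_sel, byp_clk, out = selection_logic_18a(clk, c26, c28, d, q)
    print(f"{label}: BYP_SEL={byp_sel} BYP_CLK={byp_clk} output={out}")
```

In this sketch the output follows the stored data before the clock pulse, switches to the unstored input data while both control lines are still high, and returns to the stored-data line once the control lines become complementary.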
The timing chart of
As noted above, flip-flop 16 is triggered by the leading edge of clock pulse 12 through buffer 36. The buffer causes the triggering of the flip-flop to be delayed relative to receipt of the clock pulse in the selection logic. The delay may be suitable in duration for softening the hard clock edge typically exhibited by an SA latch. Such softening enables time borrowing for clock skew and clock jitter absorption, and for averaging out within-die delay variations. In general, the amount of skew and jitter absorption may depend on the particulars of the clock-pulse distribution scheme. In one example, however, an absorption of 25 ps may be applied, effectively reducing tDQ from 30 ps to 5 ps.
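The arithmetic behind the example figures can be written out as follows. The symbols for the absorbed interval and the effective lag are labels introduced here for illustration; the text itself gives only the three numerical values.

```latex
t_{DQ,\mathrm{eff}} = t_{DQ} - t_{\mathrm{abs}} = 30\ \mathrm{ps} - 25\ \mathrm{ps} = 5\ \mathrm{ps}
```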
Memory circuit 10 offers short tS and tCQ, which combine to yield a short overall tDQ. Furthermore, the circuit is amenable to time borrowing. The price paid for these benefits is a rather long data hold-time requirement τD, which may be 70 ps in some examples. If the input data is changed after receipt of the clock pulse but before τD, those changes will propagate directly through to the output, possibly causing a logic error. Accordingly, the disclosed memory circuit is most advantageous in throughput-limiting data paths where the input data is not susceptible to change within the τD interval.
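The hold-time constraint can be illustrated with a minimal check, assuming times are measured in picoseconds from receipt of the clock pulse and using the 70 ps example value for τD. The function name and the treatment of the window boundaries are assumptions made for this sketch.

```python
HOLD_TIME_PS = 70.0  # example hold-time requirement from the text


def change_reaches_output(change_time_ps, hold_time_ps=HOLD_TIME_PS):
    """Return True if an input change falls inside the hold window.

    change_time_ps is measured from receipt of the clock pulse.
    A change after the clock pulse but before the hold time elapses
    may propagate to the output. Boundary handling is an assumption.
    """
    return 0.0 < change_time_ps < hold_time_ps


print(change_reaches_output(40.0))  # inside the window
print(change_reaches_output(90.0))  # after the window has closed
```

A data path is a candidate for the disclosed circuit when no input change can fall inside this window.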
No aspect of the foregoing description should be understood in a limiting sense, for numerous other embodiments are contemplated as well. The selection logic, for example, may be configured to receive the timing input and only one control line from the flip-flop. This adaptation could be accomplished trivially by incorporating AND gate 38 into flip-flop 16 instead of selection logic 18A. Other variants are contemplated in which the flip-flop is configured to drive a single control line that reflects whether the input data has or has not been stored.
Furthermore, the selection logic need not include a multiplexer. Instead of the multiplexer, it may include an inverting complex gate, as shown in
Thus, selection logic 18B is functionally analogous to selection logic 18A, but may offer an even shorter tCQ due to the decreased number of logic stages in the data-to-output path of inverting complex gate 40 relative to multiplexer 30.
In the embodiment shown in
The speed advantage of inverting complex gate 40 is due largely to the fact that the signal from data input 20 need only propagate through a single transistor stage before arriving at data output 14. Despite the advantages of the illustrated embodiments, it will be understood that various other configurations, including other inverting complex-gate variants, are contemplated as well.
In the embodiment shown in
The schematic diagram of
In the embodiment illustrated in
Returning now to
Buffer 36 is configured to delay receipt of clock pulse 12 in upstream memory logic 32 relative to receipt of the clock pulse in downstream memory logic 34B. When clock pulse 12 is received in the upstream memory logic, after the preselected delay, the input data is stored in the upstream memory logic. At this point, the INT and INTB outputs become complements of each other, with INT and ˜INTB both assuming the logic level of the stored input data. Under these conditions, the D input of downstream memory logic 34B is set to the inverse of the stored input data. Again, because data output 14 is the inverting output of the downstream memory logic, the logic level presented at data output 14 is non-inverted relative to the logic level stored in the upstream memory logic.
An advantage of memory circuit 46 relative to memory circuit 10 of
A disadvantage of memory circuit 46 relative to memory circuit 10 is a slight increase in tDQ. In memory circuit 10, tDQ is simply the delay through multiplexer 30, which may include a first inverter, followed by a transfer gate, followed by a second inverter. In memory circuit 46, the first inverter is effectively replaced by AOI structure 58. Because the AOI structure is stacked, it may be slower than an inverter. Accordingly, the increase in tDQ is the difference between the delay through the AOI structure and the delay through an inverter (5 ps in some examples).
The schematic diagram of
The configurations described above enable various methods to present input data at a data output of a memory circuit promptly on receiving a clock pulse in the memory circuit. Accordingly, some such methods are now described, by way of example, with continued reference to the above configurations. It will be understood, however, that the methods here described, and others within the scope of this disclosure, may be enabled by different configurations as well. The methods may be entered upon any time the memory circuit is operating, and may be executed repeatedly. Further, some of the process steps described and/or illustrated herein may, in some embodiments, be omitted without departing from the scope of this disclosure. Likewise, the indicated sequence of the process steps may not always be required to achieve the intended results, but is provided for ease of illustration and description. One or more of the illustrated actions, functions, or operations may be performed repeatedly, depending on the particular strategy being used.
At 78 the logic level exposed by the selection logic is presented to the downstream memory logic. At 80 that logic level is stored in the downstream memory logic. At 82 the logic level stored in the downstream memory logic is presented to the data output on receipt of the clock pulse. From 82 the method returns.
As noted above, the memory circuits described herein may be used to an advantage in logic paths where a very short tDQ is desired, and an acceptably long τD is available. Such paths exist in numerous, varied environments in IC microarchitecture. One example environment is illustrated in
Microprocessor 84 includes substructures 90 through 104, in addition to numerous control and interconnect structures not shown in
In the embodiment shown in
Virtually any of the microprocessor substructures 90 through 104 may include logic paths that can potentially limit overall data throughput. In such paths, a non-transparent, fast-bypass memory circuit may be used to an advantage. For example, memory circuit 10 or 46 may be used in decoder logic 92, execute logic 80A/B, and/or writeback logic 100A/B.
Another use for memory circuits 10 or 46 in the various substructures of microprocessor 84 may be to reduce the effects of clock jitter and clock skew. Clock jitter refers to the inherent period-length variation of the pulse train from a clock; it may result from various environmental factors. Clock skew is a scenario in which different microprocessor substructures receive imperfectly synchronized clock pulses due to so-called within-die (WID) delay variations. Some WID delay variations can result from nonidealities in fabrication—geometric and/or material inconsistencies that affect signal-path impedances, for example. Other delay variations are merely the result of the clock pulse having to travel different distances to reach the various substructures of the microprocessor.
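The two quantities defined above can be made concrete with a short numerical sketch. It assumes simple peak-to-peak definitions (spread of arrival times for skew, spread of measured periods for jitter), which are common conventions rather than definitions taken from this disclosure. The function names and sample timestamps are hypothetical.

```python
def clock_skew_ps(arrival_times_ps):
    """Spread between the earliest and latest arrival of the same clock edge
    at different substructures (peak-to-peak skew, an assumed definition)."""
    return max(arrival_times_ps) - min(arrival_times_ps)


def period_jitter_ps(edge_times_ps):
    """Spread of the period lengths measured between successive clock edges
    at one point (peak-to-peak period jitter, an assumed definition)."""
    periods = [b - a for a, b in zip(edge_times_ps, edge_times_ps[1:])]
    return max(periods) - min(periods)


# Hypothetical arrival times of one clock edge at three substructures.
print(clock_skew_ps([0, 3, 7]))

# Hypothetical times of four successive clock edges at one substructure.
print(period_jitter_ps([0, 1000, 2004, 2999]))
```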
The memory circuits described herein may also be used in so-called repeater-type interconnects that carry data signals among the various substructures of microprocessor 84. This aspect is illustrated with further reference to
Interconnect 108, in particular, is configured to carry data between two substructures of microprocessor 84. This interconnect includes a monodirectional signal path 110 through which a data signal is carried over a conductor or series of conductors. In some embodiments, an interconnect may include a bidirectional signal path (e.g., two antiparallel, monodirectional signal paths). In still other embodiments, an interconnect may include virtually any multiplicity of monodirectional or bidirectional signal paths: 64 bidirectional signal paths, for example, for bidirectional exchange of 64-bit data between substructures of the microprocessor.
As illustrated in
Referring again to
It will be understood, finally, that the circuits and methods described hereinabove are embodiments of this disclosure: non-limiting examples for which numerous variations and extensions are contemplated as well. Accordingly, this disclosure includes all novel and non-obvious combinations and sub-combinations of such circuits and methods, as well as any and all equivalents thereof.
This application is a continuation-in-part of U.S. patent application Ser. No. 13/327,693 filed 15 Dec. 2011 and entitled FAST-BYPASS MEMORY CIRCUIT, the entirety of which is hereby incorporated by reference herein for all purposes.
Number | Date | Country |
---|---|---|
20130155783 A1 | Jun 2013 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 13327693 | Dec 2011 | US |
Child | 13447037 | | US |