The present disclosure generally relates to the field of electronics. More particularly, an embodiment of the invention relates to techniques for dynamic and/or idle power reduction sequence using recombinant clock and/or power gating.
Power consumption is quickly becoming a major issue for computing device manufacturers. For example, high energy costs and/or environmental concerns require lower power consumption. Also, from a practical perspective, reduction of power consumption may allow for use of a computing device in more settings, e.g., due to lighter power source components (e.g., batteries or power supplies) and/or reduction in heat generation.
The detailed description is provided with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIGS. 1 and 5-6 illustrate block diagrams of embodiments of computing systems, which may be utilized to implement various embodiments discussed herein.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of various embodiments. However, some embodiments may be practiced without the specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments.
Some embodiments relate to techniques for dynamic and/or idle power reduction sequence using recombinant clock and/or power gating. In an embodiment, a power reduction sequence is introduced that combines coarse clock gating and power sequence gating modes to reduce dynamic and idle power significantly compared to earlier implementations.
Generally, clock gating may be classified into two components. First, coarse grain clock gating, which may include gating off an entire cluster or block of hardware when idle or not in use. Examples include gating off the entire PCIe interface (Peripheral Component Interconnect Express™ (PCIe) interconnect, in accordance with the PCI Express™ Base Specification version 2.0 (published Jan. 17, 2007), such as the Transmit (Tx), Link, and Phy (Physical) Layers), the entire DMA (Direct Memory Access) controller, the entire IOAPIC (Input/Output Advanced Programmable Interrupt Controller), etc. Second, fine grain clock gating, which may generally not require direct instantiation of clock gating cells; it may instead rely on coding data path enables in the RTL (Register Transfer Level) description and on the use of special backend synthesis tools to create the clock gating cells. For example, the RTL may be coded so that flops are synthesized as enabled D-flops. The backend tools may then convert the enabled D-flops to ordinary D-flops with a clock gating cell.
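The fine grain conversion described above can be illustrated with a small behavioral model. This is only a sketch in Python, not RTL and not the output of any synthesis tool; the function names and the stimulus values are hypothetical. It shows that an enabled D-flop and an ordinary D-flop behind a gated clock hold the same state, while the gated version receives no clock edge on cycles where the enable is low.

```python
def enabled_dflop(q, d, enable):
    """Next state of an enabled D-flop: capture d only when enable is high."""
    return d if enable else q


def gated_dflop(q, d, enable):
    """Ordinary D-flop behind a clock gating cell.

    The gating cell passes the clock edge only when enable is high, so the
    flop captures d on a delivered edge and otherwise keeps its state.
    """
    clock_edge_delivered = bool(enable)
    return d if clock_edge_delivered else q


def simulate(step, stimulus, q0=0):
    """Apply one (d, enable) pair per clock cycle and record the flop output."""
    q = q0
    trace = []
    for d, enable in stimulus:
        q = step(q, d, enable)
        trace.append(q)
    return trace


# Hypothetical stimulus: (data, enable) per cycle.
stimulus = [(1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
enabled_trace = simulate(enabled_dflop, stimulus)
gated_trace = simulate(gated_dflop, stimulus)

# Cycles on which the gated flop sees no clock edge at all.
clock_edges_saved = sum(1 for _, enable in stimulus if not enable)
```

Both traces are identical, which is why a backend tool may make this substitution without changing function; the saving comes from the clock edges that never reach the flop.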
Moreover, while L1 (idle) power state may not be achievable in servers as frequently as in mobile devices (e.g., phones, laptops, tablets, etc.) due to performance requirements, an embodiment addresses cases where even when one or more blocks are inactive, the coarse clock gating may be used for the inactive block(s) and put the system in the partial clock gating state; thus, exploiting almost every opportunity to go down to full clock gating state, and eventually to sleep state (as will be further discussed herein with reference to
Additionally, in some current implementations, a processor may communicate with input/output (I/O) devices via an I/O Hub (IOH). Furthermore, the processor may be provided on a different integrated circuit (IC) device than the IOH. A bus may be used to communicate between these IC devices. Such implementations may, however, reduce speed, e.g., due to delay associated with communicating signals between the IC devices, and/or increase power consumption, e.g., due to presence of additional circuitry required to allow for communication between the IC devices. Also, additional (board) space may be required for the discrete IOH component(s).
In one embodiment, an IOH may be integrated on the same IC device as a processor (which may include one or more processor cores as discussed herein in more detail below). This allows for removal of logic that is only needed for off-chip communication. For example, logic associated with transmission of signals off chip across an interconnect (e.g., physical link) may be removed. Also, logic that would normally control transmitting, training, testing, power state management, etc. of the physical link may be removed. Even though the physical link and additional logic are removed, the same communication mechanisms may still be maintained in some embodiments, e.g., to allow for compatibility with other existing logic, communication protocols, design requirements, etc. For instance, in a QPI (Quick Path Interconnect) based processor, the QPI physical layer and lower link layer may be removed. For the accompanying IOH, the physical layer may be removed.
One or more of the above-mentioned elements may be provided in various computing environments. More particularly,
As illustrated in
The IIO 120 may include a sideband control logic 124 (e.g., to communicate sideband signals with the logic 112), one or more FIFOs 126 (e.g., to enable deterministic data transfer between the upper link layer 110 and IIO 120 via an interconnect/bus 127), a link logic 128 (e.g., to provide link support for communication between the processor 102 and the IIO 120), and a protocol logic 130 (e.g., to provide the support for shutting down or waking system 100).
In an embodiment, a relatively wider and/or slower bus/interconnect 127 may eliminate high speed circuit and power challenges (when compared with the bus/interconnect that couples the non-integrated processor and IOH, for example). In one embodiment, the bus 127 is widened four times, allowing the frequency to be reduced by four times. A valid bit may be added to allow for more flexibility (null flits are now optional, etc.) and/or to support tester modes.
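The trade-off between bus width and frequency can be checked with simple arithmetic. The sketch below assumes raw bandwidth is width multiplied by clock frequency; the baseline width and frequency are hypothetical values chosen only to make the numbers concrete and do not come from the description.

```python
def bandwidth_bits_per_second(width_bits, clock_hz):
    """Raw bandwidth of a parallel bus that moves width_bits per clock."""
    return width_bits * clock_hz


# Hypothetical baseline values, chosen only to make the arithmetic concrete.
narrow_width_bits = 20
fast_clock_hz = 3_200_000_000

# Widen the bus four times and reduce the frequency by four times.
wide_width_bits = narrow_width_bits * 4
slow_clock_hz = fast_clock_hz // 4

baseline = bandwidth_bits_per_second(narrow_width_bits, fast_clock_hz)
widened = bandwidth_bits_per_second(wide_width_bits, slow_clock_hz)
```

Under these assumptions the two configurations carry the same raw bandwidth, while the widened bus runs at a quarter of the clock rate.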
In some embodiments, FIFOs 126 going in both directions (to and from IIO 120) are added. When transferring data between the processor 102 components (e.g., logic 110) and IIO 120, the data is written into a FIFO based on a write pointer and is read by the receiver based on a read pointer. The separation of the write and read pointers may be programmable, for example, to account for clock skew differences between the processor 102 components (e.g., logic 110) and IIO 120. This allows the processor 102 and IIO 120 to run off of different Phase-Locked Loops (PLLs) for flexibility, finer granularity of power states, etc.
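A minimal sketch of such a pointer-based FIFO follows. It assumes one write and one read per cycle and models the programmable separation as the number of slots by which the write pointer leads the read pointer; the class name, depth, and flit labels are hypothetical and are not taken from the description.

```python
class PointerFifo:
    """Circular buffer written and read through independent pointers."""

    def __init__(self, depth, separation):
        if not 0 < separation < depth:
            raise ValueError("separation must be between 1 and depth - 1")
        self.depth = depth
        self.slots = [None] * depth
        # The write pointer starts `separation` slots ahead of the read pointer.
        self.write_ptr = separation
        self.read_ptr = 0

    def write(self, value):
        """Sender side: store value at the write pointer, then advance it."""
        self.slots[self.write_ptr] = value
        self.write_ptr = (self.write_ptr + 1) % self.depth

    def read(self):
        """Receiver side: return the slot at the read pointer, then advance it."""
        value = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.depth
        return value


# Hypothetical run: one write and one read per cycle, separation of 2 slots.
fifo = PointerFifo(depth=8, separation=2)
outputs = []
for cycle in range(5):
    fifo.write(f"f{cycle}")
    outputs.append(fifo.read())
```

In this model each entry emerges a fixed number of cycles after it is written, equal to the programmed separation, which is one way the transfer can stay deterministic while leaving margin for clock skew between the two sides.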
In an embodiment, the following sideband signals may be used (e.g., via logics 112 and/or 124):
1. From the IIO 120:
2. From the processor 102:
As shown in
In an embodiment, in order to achieve power optimization, the IIO block implements one or more requirements on all the I/O interfaces and interacting functional blocks, and takes into account the dynamic nature of the link operation to switch to various power saving modes. For example, the QPI and PCIe ports both support the L1 power state. Some PCIe devices may not support the L1 state, or L1 could be broken during operation. Other blocks/interfaces (such as JTAG (Joint Test Action Group), reset, VLW (Virtual Legacy Wire), and VT-d) may not support specific defined power states, but such blocks/interfaces may be held inactive while the coarse grain clocks are gated.
In one or more embodiments, the inactive blocks are clock gated (e.g., as early as possible) for item 1a above, which puts the system in the static clock gating state during normal operation. Also, the dynamic clock gating state may be used if the opportunity for items 1b and 1c above arises. In some embodiments, only item 1b above may be chosen for implementation, as the RTL code may be reused from the TBG (Time Base Generator), which may generate clocks at programmable frequencies (and in an embodiment based on a reference clock provided by the motherboard of the computing system), and resource and schedule constraints may have higher priority. However, all embodiments are not limited to this implementation.
In an embodiment, the Power Control/Management Unit (PMU) 150 of
3a. Static Clock gating: This may be achieved by clock gating one or more inactive blocks (items 1a and 1b above), e.g., as early as possible, such as after system boot up in an embodiment.
3b. Dynamic Clock gating: This may be achieved by clock gating one or more inactive blocks, e.g., as soon as the block becomes inactive during system operation or runtime (item 1c above).
3c. Full Clock gating: This is achieved by full coarse clock gating of one or more blocks of the IIO logic.
3d. Pre-sleep: This is achieved by preparing the PLL (of the IIO logic) for shut down, e.g., via disabling any master clock gate, divider, etc.
3e. Light Sleep: This is performed by PLL shut down.
3f. Deep Sleep: This is achieved by power gating one or more blocks of the IIO logic.
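The six stages listed above can be captured as a small enumeration. This is a sketch only: the identifier names and the two helper functions are hypothetical, and the mechanism strings paraphrase items 3a to 3f. The helpers assume that Deep Sleep is entered after PLL shut down, so the PLL is treated as off in both sleep stages.

```python
from enum import Enum


class Stage(Enum):
    """The six power reduction stages, keyed by their item labels."""
    STATIC_CLOCK_GATING = "3a"
    DYNAMIC_CLOCK_GATING = "3b"
    FULL_CLOCK_GATING = "3c"
    PRE_SLEEP = "3d"
    LIGHT_SLEEP = "3e"
    DEEP_SLEEP = "3f"


# Paraphrase of the mechanism behind each stage.
MECHANISM = {
    Stage.STATIC_CLOCK_GATING: "clock gate inactive blocks early, e.g. after boot",
    Stage.DYNAMIC_CLOCK_GATING: "clock gate blocks as they become inactive at runtime",
    Stage.FULL_CLOCK_GATING: "full coarse clock gating of IIO blocks",
    Stage.PRE_SLEEP: "prepare the PLL for shut down",
    Stage.LIGHT_SLEEP: "shut down the PLL",
    Stage.DEEP_SLEEP: "power gate IIO blocks",
}


def pll_is_off(stage):
    """Assumption: the PLL is off from Light Sleep onward."""
    return stage in (Stage.LIGHT_SLEEP, Stage.DEEP_SLEEP)


def is_power_gated(stage):
    """Only Deep Sleep removes power from the gated blocks."""
    return stage is Stage.DEEP_SLEEP


stage_labels = [stage.value for stage in Stage]
```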
In some embodiments, the IIO logic sequence is:
1. QPI link is in L1 state;
2. PCIe is either not connected or in L1 or idle for a (e.g., predefined or programmed) period of time. An idle PCIe link may be in L0 or L0s during which active idle is used since the link may not be in L1 but there is no traffic. A packet detect scheme may be used for discerning cases where a PCIe component/card does not support L1 or does not transition into L1 for any reason;
3. DMI is in L1 or idle for a (e.g., pre-defined or programmable) period of time defined through a register.
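The three link conditions listed above can be sketched as a predicate. The sketch assumes that all three conditions must hold together, that idleness is counted in cycles, and that one threshold applies to both PCIe and DMI; the `LinkStatus` type, its field names, and the threshold value are hypothetical and do not describe actual register definitions.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    state: str             # link power state, e.g. "L0", "L0s", or "L1"
    connected: bool = True
    idle_cycles: int = 0   # cycles since the last packet was detected


def in_l1_or_idle(link, idle_threshold):
    """True if the link is in L1 or has seen no traffic for the programmed period."""
    return link.state == "L1" or link.idle_cycles >= idle_threshold


def link_conditions_met(qpi, pcie, dmi, idle_threshold):
    """Combine the three conditions; requiring all of them is an assumption."""
    qpi_ok = qpi.state == "L1"
    pcie_ok = (not pcie.connected) or in_l1_or_idle(pcie, idle_threshold)
    dmi_ok = in_l1_or_idle(dmi, idle_threshold)
    return qpi_ok and pcie_ok and dmi_ok


# Hypothetical scenario: the PCIe card never enters L1 but has gone quiet in L0s.
quiet_pcie = LinkStatus(state="L0s", idle_cycles=1000)
met = link_conditions_met(
    LinkStatus(state="L1"), quiet_pcie, LinkStatus(state="L1"), idle_threshold=500
)
```

The idle counter stands in for the packet detect scheme mentioned in item 2: a link that stays in L0 or L0s can still satisfy the condition once no packets have been seen for the programmed period.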
In an embodiment, in the CGCG state, clocks to all the IIO blocks/clusters are gated off in sequence, and then the clock tree to the IIO is gated off, except for the PCIe Phy/link and Transaction/layer. The Full Clock gating state may be initiated either from an active state or from the dynamic clock gating state when the following conditions are true, denoting a pure idle state (no traffic): (1) QPI link is in L1 state, (2) PCIe and DMI go to L1 (or QPI and DMI are in L1 and PCIe is not connected). In this state, the IIO logic may perform the full clock gating stage as shown in
The sleep state may be achieved from the full clock gating state either by implementing items 3d and 3e above (Light Sleep) or items 3d, 3e, and 3f above (Deep Sleep). At the Deep Sleep stage, the IIO logic may reduce idle power drastically (e.g., to about 500 mW in some implementations). Some embodiments also cover the small system where clock gating is not required, and the Light Sleep or Full sleep state may be achieved directly from the Active State.
The Sleep State may be achieved through several routes as below:
a) From Active to Dynamic clock gating, to Active to Full Clock gating, then to sleep. One reason to get back to Active State before going into Full clock gating state is that the implementation could choose to get back to Active State to put the PCIe link into L1.
b) From Active to Dynamic clock gating, to full clock gating, and to sleep mode. The implementation could choose to isolate the necessary L1 entry logic from other inactive blocks.
c) From Active to Full clock gating, then to sleep.
d) From Active to Sleep. This works for a small system where di/dt is an issue.
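Routes a) through d) above can be written down as paths through a small state graph. In the sketch below the state names are shortened labels, the helper functions are hypothetical, and the set of allowed transitions is simply the union of the steps appearing in the four routes; wake-up transitions out of Sleep are not modeled because the routes above do not list them.

```python
ROUTES = {
    "a": ["Active", "Dynamic clock gating", "Active", "Full clock gating", "Sleep"],
    "b": ["Active", "Dynamic clock gating", "Full clock gating", "Sleep"],
    "c": ["Active", "Full clock gating", "Sleep"],
    "d": ["Active", "Sleep"],
}


def allowed_transitions(routes):
    """Collect every (source, destination) step used by any route."""
    edges = set()
    for path in routes.values():
        edges.update(zip(path, path[1:]))
    return edges


def is_valid_path(path, edges):
    """True if every consecutive pair of states in path is an allowed step."""
    return all(step in edges for step in zip(path, path[1:]))


EDGES = allowed_transitions(ROUTES)
```

For example, `is_valid_path(["Dynamic clock gating", "Sleep"], EDGES)` is false, since none of the four routes goes to sleep directly from dynamic clock gating.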
Some embodiments, though using the IIO logic as an example (as in the laptop segment), are applicable to a broader range of market segments. Any combination of the six stages discussed above could be implemented in various market segments. Several examples are given below: (1) a full six-stage implementation could be used for the MID (Mobile Intelligent/Internet Device, such as a mobile phone with a PDA (Personal Digital Assistant), a digital camera, etc.) segment, such as Atom® processors and chipsets for Atom processors; (2) a full six-stage implementation or a five-stage implementation (items 3b to 3f) could be used for the laptop segment; and/or (3) a four-stage implementation (items 3b to 3e) or a two-stage implementation (items 3b and 3c) could be used for the desktop or server segment.
In various embodiments, a combination of new six-stage power optimization flows optimizes TDP (Thermal Design Power), idle power, and average power. In one embodiment, a four stage power optimization sequence may utilize, in order, stages 3c, 3d, 3e, and 3f discussed above. In another embodiment, a five stage power optimization sequence may utilize, in order, stages 3b, 3c, 3d, 3e, and 3f discussed above. Accordingly, some embodiments provide for a SOC (System On Chip) in which power gating, clock gating (coarse and fine grain), and PLL shut-off are implemented, which directly contributes to a significant reduction in idle power. The direct transition from CGCG to sleep may also provide a significant amount of efficiency as discussed above.
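The segment examples above can be summarized as a lookup table. This is an illustrative sketch: the dictionary keys and the helper function are hypothetical, and the contiguity check only reflects the examples given here, since the text allows any combination of the six stages.

```python
STAGE_ORDER = ["3a", "3b", "3c", "3d", "3e", "3f"]

# Stage combinations named in the examples, per market segment.
SEGMENT_OPTIONS = {
    "MID": [["3a", "3b", "3c", "3d", "3e", "3f"]],
    "laptop": [
        ["3a", "3b", "3c", "3d", "3e", "3f"],
        ["3b", "3c", "3d", "3e", "3f"],
    ],
    "desktop or server": [
        ["3b", "3c", "3d", "3e"],
        ["3b", "3c"],
    ],
}


def is_contiguous_run(stages):
    """True if stages form an unbroken run of STAGE_ORDER, in order."""
    start = STAGE_ORDER.index(stages[0])
    return stages == STAGE_ORDER[start:start + len(stages)]


stage_counts = {
    segment: [len(option) for option in options]
    for segment, options in SEGMENT_OPTIONS.items()
}
```

Every combination listed in the examples happens to be a contiguous run of stages, which the helper can confirm, although other combinations are not excluded.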
As illustrated in
In one embodiment, the system 500 may support a layered protocol scheme, which may include a physical layer, a link layer, a routing layer, a transport layer, and/or a protocol layer. The fabric 504 may further facilitate transmission of data (e.g., in form of packets) from one protocol (e.g., caching processor or caching aware memory controller) to another protocol for a point-to-point or shared network. Also, in some embodiments, the network fabric 504 may provide communication that adheres to one or more cache coherent protocols.
Furthermore, as shown by the direction of arrows in
As illustrated in
In an embodiment, the processors 602 and 604 may be one of the processors 602 discussed with reference to
In at least one embodiment, the I/O functionality may be integrated into the processors 602/604. Other embodiments of the invention, however, may exist in other circuits, logic units, or devices within the system 600 of
The chipset 620 may communicate with a bus 640 (e.g., using an interface circuit 641). The bus 640 may have one or more devices that communicate with it, such as a bus bridge 642 and I/O devices 643 (which may communicate with the IIO via other components such as shown in
In various embodiments of the invention, the operations discussed herein, e.g., with reference to
The storage medium may include volatile and/or nonvolatile memory (or storage). For example, nonvolatile memory may include one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a disk drive (e.g., 528), a floppy disk, a compact disk ROM (CD-ROM), a digital versatile disk (DVD), flash memory, a magneto-optical disk, or other types of nonvolatile machine-readable media capable of storing electronic data (e.g., including instructions). Volatile memory may include devices such as random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), static RAM (SRAM), etc.
Also, the term “logic” may include, by way of example, software, hardware, or combinations of software and hardware. The machine-readable medium may include a storage device such as those discussed herein. Additionally, such computer-readable media may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) through data signals provided in a propagation medium via a communication link (e.g., a bus, a modem, or a network connection).
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least an implementation. The appearances of the phrase “in one embodiment” in various places in the specification may or may not be all referring to the same embodiment.
Also, in the description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. In some embodiments of the invention, “connected” may be used to indicate that two or more elements are in direct physical or electrical contact with each other. “Coupled” may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other.
Thus, although embodiments of the invention have been described in language specific to structural features and/or methodological acts, it is to be understood that claimed subject matter may not be limited to the specific features or acts described. Rather, the specific features and acts are disclosed as sample forms of implementing the claimed subject matter.
The present application relates to and is a continuation-in-part of U.S. patent application Ser. No. 12/791,836, filed Jun. 1, 2010, entitled “Integration of processor and Input/Output hub”, which is incorporated herein by reference and for all purposes.
Number | Date | Country | |
---|---|---|---|
20110296222 A1 | Dec 2011 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 12791836 | Jun 2010 | US |
Child | 12978452 | US |