Aspects of the disclosure relate generally to data communication, and more specifically, to dynamic serial data link configuration.
A Peripheral Component Interconnect (PCI) bus is a common connection interface for attaching computer peripherals to a host computer. Early versions of PCI use a parallel bus to connect the peripherals with the computer. Some examples of PCI peripherals are data storage systems, network cards, USB hubs, and graphics cards. Peripheral Component Interconnect Express (PCI-Express or PCIe) is a modification of the standard PCI bus. PCIe uses a high-speed serial point-to-point communication link instead of a parallel bus. More detail on PCIe can be found, for example, in PCI Express® Base Specification Revision 4.0, Version 1.0, which is incorporated herein by reference.
A PCIe link between two devices may consist of one or more lanes. For example, a PCIe link may have one lane (×1), four lanes (×4), eight lanes (×8), twelve lanes (×12), sixteen lanes (×16), or thirty-two lanes (×32). A link can achieve a higher bandwidth by using more lanes. However, using more lanes per PCIe link increases power consumption.
The following presents a simplified summary of one or more aspects of the present disclosure, in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated features of the disclosure, and is intended neither to identify key or critical elements of all aspects of the disclosure nor to delineate the scope of any or all aspects of the disclosure. Its sole purpose is to present some concepts of one or more aspects of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.
Aspects of the present disclosure provide systems and methods for dynamic serial data link (e.g., Peripheral Component Interconnect Express) reconfiguration to optimize power consumption of the serial data link. In some embodiments, Peripheral Component Interconnect Express (PCIe) link reconfiguration may be based on power state based transition, utilization based transition, and/or host based transition.
One aspect of the disclosure provides a method of operating a first device using a serial data link including one or more lanes. In one example, the serial data link may be a Peripheral Component Interconnect Express (PCIe) link. The first device communicates data with a second device using the serial data link configurable to operate in a plurality of link configurations. Each link configuration includes a lane width and a technology generation. The technology generation defines a set of rules for operating the serial data link. The first device detects a condition for changing the link configuration of the serial data link. Then, the first device may select a link configuration among the plurality of link configurations that prioritizes reduction of the lane width over downgrading the technology generation to meet a predetermined performance requirement of the serial data link. After selecting the link configuration, the first device modifies the serial data link to use the selected link configuration.
Another aspect of the disclosure provides an apparatus configured to communicate with a host using a serial data link including one or more lanes. In one example, the serial data link may be a PCIe link. The apparatus has a communication interface configured to communicate data with the host using the serial data link configurable to operate in a plurality of link configurations. Each link configuration includes a lane width and a technology generation. The technology generation defines a set of rules for operating the serial data link. The apparatus has a controller operatively coupled with the communication interface. The controller is configured to detect a condition for changing the link configuration of the serial data link. The controller selects a link configuration among the plurality of link configurations that prioritizes reduction of the lane width over downgrading the technology generation to meet a predetermined performance requirement of the serial data link. Then, the controller modifies the serial data link to use the selected link configuration.
Another aspect of the disclosure provides an apparatus configured to communicate with a host using a serial data link comprising one or more lanes. The apparatus includes means for communicating data with a second device using the serial data link configurable to operate in a plurality of link configurations. Each link configuration includes a lane width and a technology generation, and the technology generation defines a set of rules for operating the serial data link. The apparatus further includes means for detecting a condition for changing the link configuration of the serial data link. The apparatus further includes means for selecting a link configuration among the plurality of link configurations that prioritizes reduction of the lane width over downgrading the technology generation to meet a predetermined performance requirement of the serial data link. The apparatus further includes means for modifying the serial data link to use the selected link configuration.
Referring now to the drawings, systems and methods are illustrated for reconfiguring a Peripheral Component Interconnect Express (PCIe) link to dynamically manage PCIe bandwidth, optimize power consumption, and reduce underutilized bandwidth.
The PCIe link 110 can include one or more lanes for transmitting and receiving data between the devices. Each lane includes a set of differential signal pairs, one pair for transmission and one pair for reception. A ×N link (e.g., ×1, ×2, ×4, ×8, ×16) is composed of N lanes. For example, a ×1 PCIe link includes one lane, and a ×16 PCIe link includes 16 lanes. When a PCIe link 110 includes multiple lanes, the bandwidth of the individual lanes is aggregated to provide more bandwidth. During hardware initialization, device A and device B negotiate the lane width and frequency of operation used by the PCIe link. In general, the frequency of operation of the PCIe link increases in later PCIe generations, resulting in a higher data rate per lane. However, using more lanes and/or a higher frequency (i.e., a newer generation) increases power consumption by the devices. PCIe generations may be referred to as technology generations in this disclosure.
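The lane aggregation described above can be illustrated with a minimal sketch. The per-lane data rates below are assumptions chosen only to agree with the examples given later in this description (a ×1 4th generation link and a ×2 3rd generation link each supporting about 1.6 GB/s); they are not taken from the PCIe specification, and the function name is hypothetical.

```python
# Hypothetical per-lane data rates in GB/s, keyed by technology generation.
# These are illustrative assumptions, not values from the PCIe specification.
PER_LANE_GBPS = {3: 0.8, 4: 1.6}


def link_bandwidth(lanes: int, generation: int) -> float:
    """Aggregate bandwidth of a xN link: N lanes times the per-lane rate."""
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    if generation not in PER_LANE_GBPS:
        raise ValueError(f"no assumed rate for generation {generation}")
    return lanes * PER_LANE_GBPS[generation]
```

Under these assumed rates, halving the lane width and moving up one generation leaves the aggregate bandwidth unchanged, which is why several lane and generation combinations can satisfy the same I/O requirement.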
Table 1 below illustrates data rates (GB/s) of some exemplary PCIe lane and technology generation combinations.
Table 2 below illustrates power consumption (Watts) of some exemplary PCIe lane and technology generation combinations.
In some embodiments, device A and device B each include one or more processors 112 and 114 to control various operations including PCIe operations and data communication between the devices. The processors 112 and 114 may be implemented as any type of processing devices, such as microprocessors, microcontrollers, embedded controllers, logic circuits, software, firmware, or the like, for controlling the operation of the devices 102 and 104. In one embodiment, the processors 112 and 114 can be special purpose controllers specifically configured/programmed to perform any of the functions and procedures contained within the application.
In some embodiments, some or all of the functions, processes, and procedures described herein as being performed by the processors 112 and 114 may instead be performed by one or more elements of the devices 102 and 104. For example, each device 102 or 104 may include a microprocessor, a microcontroller, an embedded controller, a logic circuit, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), software, firmware, or any kind of processing device, for performing one or more of the functions described herein as being performed by the processors 112/114 and/or PCIe modules 106/108.
PCIe standards define various link power management states that can be called link states in short.
For example, link state L0s 302 is a low latency energy saving standby state. In this link state, no data can be communicated on the PCIe link, and some circuit components of the device can be disabled or turned off to reduce power consumption. The transition time from L0s back to L0 is typically the shortest among all the power saving link states. The time needed to transition out of a power saving link state (e.g., L0s) is called the exit latency. Link state L0s can be used to reduce wasted power during short idle intervals between PCIe link activities.
Power saving link state L1 306 has a higher exit latency than the L0s link state. For example, link state L1 306 may be used to reduce power when the device becomes aware of a lack of outstanding PCIe requests or pending transactions. Link state L1 provides more power saving than link state L0s at the expense of higher exit latency. In link state L1, a device can turn off its transmitter and enter an electrical idle state. When returning from L1 to L0, both devices may go through a link recovery process to retrain the PCIe link and reestablish synchronization.
Link state L1 may optionally include substates, for example, link substate L1.1 and link substate L1.2. Substate L1.1 (308) may be used as a low power link state in Peripheral Component Interconnect Power Management (PCIPM), and substate L1.2 (310) may be used as a low power link state in Active State Power Management (ASPM). In the L1.1 substate, the link common mode voltages are maintained. In the L1.2 substate, the link common mode voltages may not be maintained. The L1.2 substate is entered when the PCIe link is in the L1 substate and conditions for entry into the L1.2 substate are met.
For example, the devices A and B may enter a power saving PCIe link state after certain predetermined conditions are met including a timeout when the PCIe link is idle (i.e., no data communication between devices). In general, the low power saving link state (e.g., L0s) has a shorter timeout than the high power saving link state (e.g., L1.2).
Aspects of the present disclosure provide various dynamic PCIe link reconfiguration methods to optimize power consumption of a PCIe link. In some embodiments, PCIe link reconfiguration may be based on power state based transition, utilization based transition, and/or host based transition. These different approaches are described in more detail below in turn. To avoid unnecessary power consumption, a PCIe link is dynamically set in a minimum configuration that can support an input-output (I/O) requirement. A PCIe configuration includes the number of lanes used and technology generation. A minimum PCIe configuration refers to a certain combination of lanes and technology generation that can support the corresponding I/O requirement (e.g., data rate) without substantial underutilized bandwidth.
As illustrated in Tables 1 and 2, reducing lane number results in more power saving than moving to a lower PCIe generation, for the same data rate reduction. Therefore, it will be more effective to save power by prioritizing lane number reduction when transitioning between different PCIe configurations.
At block 402, a first device may communicate data with a second device using a serial data link configurable to operate in a plurality of link configurations. Each link configuration includes a lane width and a technology generation. For example, the lane width may be ×1, ×2, ×4, ×8, or ×16 of a PCIe link. The technology generation defines a set of rules for operating the serial data link. For example, a technology generation may refer to the PCIe version or generation such as 1st, 2nd, 3rd, or 4th PCIe generation.
At block 404, the first device detects a condition for changing the link configuration of the serial data link. For example, the device may determine the condition based on whether the current PCIe configuration (e.g., lane width and technology generation) can meet the power consumption, performance (e.g., data transfer rate), and/or link utilization requirement of the serial data link. In some embodiments, the device may use a power state based method, utilization based method, and/or host based method to determine the condition. These methods will be described in more detail in relation to
At block 406, the device selects a link configuration among the plurality of link configurations that prioritizes reduction of the lane width over downgrading the technology generation to meet a predetermined performance requirement of the serial data link. In some embodiments, the performance requirement may include data rate (bandwidth) and/or power consumption. In some examples, reducing lane width or downgrading technology generation can reduce power consumption as illustrated in Table 2. However, lane width reduction generally results in more saving in power consumption than downgrading technology generation. Therefore, the device, in block 406, may prioritize lane width reduction over technology generation downgrade.
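One possible reading of the selection in block 406 is sketched below: among the configurations that meet the bandwidth requirement, prefer the fewest lanes, and only then the lowest power. The power figures are the ones quoted in this description; the bandwidth figures, the `LinkConfig` type, and the `select_config` function are assumptions introduced for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class LinkConfig(NamedTuple):
    lanes: int             # lane width (x1, x2, x4, ...)
    generation: int        # technology generation
    bandwidth_gbps: float  # assumed supported data rate in GB/s
    power_w: float         # power consumption in watts, as quoted in the text


# Only the four combinations discussed in the text are listed here.
CONFIGS = (
    LinkConfig(4, 4, 6.4, 0.54),  # bandwidth is an assumption
    LinkConfig(4, 3, 3.2, 0.37),
    LinkConfig(2, 3, 1.6, 0.20),
    LinkConfig(1, 4, 1.6, 0.15),
)


def select_config(required_gbps: float,
                  configs: Sequence[LinkConfig] = CONFIGS) -> Optional[LinkConfig]:
    """Pick a configuration that meets the requirement, narrowest link first."""
    candidates = [c for c in configs if c.bandwidth_gbps >= required_gbps]
    if not candidates:
        return None
    # Lane width is the primary key, so a lane reduction always wins over a
    # generation downgrade; power breaks ties between equal lane widths.
    return min(candidates, key=lambda c: (c.lanes, c.power_w))
```

With these inputs, a 3.2 GB/s requirement yields the ×4 3rd generation configuration, while a 1 GB/s requirement yields the ×1 4th generation configuration rather than the ×2 3rd generation alternative.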
At block 408, the device modifies the serial data link to use the selected link configuration. In some examples, the first device and second device may communicate with each other according to a PCIe handshake protocol, and both devices support dynamic reconfiguration of PCIe link states and technology generations. The transmitter (first device or second device) may send a request over the serial data link to initiate reconfiguration to the selected link configuration.
In one example, it is assumed that the device is initially configured in a link configuration that has a lane width of 4 using 4th generation PCIe. If the device is in the active idle power state 404, the device has no bandwidth requirement. In that case, the device can reconfigure the PCIe link 110 to use a ×1 configuration without downgrading the technology generation. As shown in the example of Table 2, power consumption can be reduced from 0.54 W to 0.15 W, which is below the 400 mW limit of the active idle power state.
If the device is now in the full power state 506, the device has a bandwidth requirement of 3.2 GB/s. In that case, the device can reconfigure the PCIe link 110 to maintain a lane width of 4 but downgrade to 3rd generation PCIe. As shown in the example of Table 2, power consumption can be reduced from 0.54 W to 0.37 W while still meeting the bandwidth requirement of the full power state 506.
If the device is now in the light throttling power state 506, the device has a bandwidth requirement of 1.6 GB/s. In that case, the device can reconfigure the PCIe link 110 to use a lane width of 2 and downgrade to 3rd generation PCIe. As shown in the example of Table 2, power consumption can be reduced from 0.54 W to 0.2 W.
If the device is now in the heavy throttling power state 508, the device has a bandwidth requirement of 1 GB/s. In that case, the device can reconfigure the PCIe link 110 to use a lane width of 2 and downgrade to 3rd generation PCIe. This configuration can support a bandwidth of 1.6 GB/s. As shown in the example of Table 2, power consumption can be reduced from 0.54 W to 0.2 W. In another example, the device can reconfigure the PCIe link 110 to use a lane width of 1 and continue to use 4th generation PCIe. This configuration also supports a bandwidth of 1.6 GB/s. As shown in the example of Table 2, power consumption can be reduced from 0.54 W to 0.15 W. In this case, reducing the lane width achieves a larger reduction in power consumption than downgrading the technology generation.
If the device is now in the extreme throttling power state 510, the device has a bandwidth requirement of 400 MB/s. In that case, the device can reconfigure the PCIe link 110 to use a lane width of 1 and continue to use 4th generation PCIe. This configuration can support a bandwidth of 1.6 GB/s. As shown in the example of Table 2, power consumption can be reduced from 0.54 W to 0.15 W. Further reduction in power consumption may be obtained by downgrading the technology generation, although with diminishing returns.
In some embodiments, the device may use more aggressive PCIe link state timeout in the extreme throttling power state. In some embodiments, the device may reduce the timeout values of some PCIe power saving link states (e.g., L0s and/or L1.2). In one example, PCIe link state L0s may have a default timeout of about 30 μs and an aggressive timeout of about 1 μs. In one example, PCIe link state L1.2 may have a default timeout of about 100 milliseconds (ms) and an aggressive timeout of about 1 ms. A shorter timeout allows faster transition to the energy saving link state. The device may dynamically change the timeout values based on the current power state of the device.
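The timeout adjustment described above can be sketched as a simple lookup. The timeout values are the example values given in this description, expressed in microseconds; the power state name and the function are hypothetical, and applying the aggressive values only in the extreme throttling state is one possible policy rather than a required one.

```python
# Example timeouts from the text, in microseconds.
DEFAULT_TIMEOUTS_US = {"L0s": 30.0, "L1.2": 100_000.0}   # about 30 us and 100 ms
AGGRESSIVE_TIMEOUTS_US = {"L0s": 1.0, "L1.2": 1_000.0}   # about 1 us and 1 ms


def link_state_timeouts(power_state: str) -> dict:
    """Return idle timeouts per link state for the given device power state.

    The state name "extreme_throttling" is a hypothetical label.
    """
    if power_state == "extreme_throttling":
        return dict(AGGRESSIVE_TIMEOUTS_US)
    return dict(DEFAULT_TIMEOUTS_US)
```

Returning a copy keeps callers from accidentally modifying the shared default tables when they adjust a timeout at run time.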
The above described PCIe reconfiguration examples prioritize lane width reduction over technology generation downgrade because power reduction is more significant when reducing lane width than downgrading PCIe generation, for example, as shown in Table 2. In some embodiments, the power state based PCIe link reconfiguration techniques described above may be performed with the technology generation fixed, for example, to the 3rd generation.
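As a worked check of this prioritization, the sketch below compares the two configurations that the examples above describe as supporting 1.6 GB/s, starting from the ×4 4th generation baseline. The power figures are those quoted in this description; the dictionary keys and the function name are hypothetical.

```python
BASELINE_W = 0.54  # x4, 4th generation, as quoted in the text

# Two configurations that the text describes as supporting 1.6 GB/s.
ALTERNATIVES_W = {
    "x2_gen3": 0.20,  # fewer lanes and a generation downgrade
    "x1_gen4": 0.15,  # lane width reduction only
}


def power_savings(baseline_w: float, alternatives_w: dict) -> dict:
    """Watts saved by each alternative relative to the baseline."""
    return {name: round(baseline_w - watts, 2)
            for name, watts in alternatives_w.items()}


savings = power_savings(BASELINE_W, ALTERNATIVES_W)
best = max(savings, key=savings.get)
```

Here `savings` maps `x2_gen3` to 0.34 W and `x1_gen4` to 0.39 W, so `best` is the lane-reduction-only option.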
At block 702, the device determines the utilization of the PCIe link 110. The device may collect front end utilization statistics, for example, bytes transferred over the PCIe link, to determine a link utilization percentage per time window. For example, if X bytes are transferred over the PCIe link over a period of time T, the transfer rate is R=X/T. The PCIe utilization may be determined as R divided by the PCIe link's rated bandwidth. Some examples of PCIe rated bandwidths are shown in Table 1.
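The utilization calculation of block 702 can be written out as a short sketch. The function name and its parameters are hypothetical, and the conversion assumes that 1 GB/s means 10^9 bytes per second.

```python
def link_utilization(bytes_transferred: int, window_s: float,
                     rated_gbps: float) -> float:
    """Utilization as a fraction: (X / T) divided by the rated bandwidth."""
    if window_s <= 0 or rated_gbps <= 0:
        raise ValueError("window and rated bandwidth must be positive")
    rate_bytes_per_s = bytes_transferred / window_s   # R = X / T
    rated_bytes_per_s = rated_gbps * 1e9              # assumes 1 GB = 1e9 bytes
    return rate_bytes_per_s / rated_bytes_per_s
```

For example, 800,000,000 bytes transferred in a one-second window on a link rated at 1.6 GB/s corresponds to a utilization of 0.5, or 50%.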
At decision block 704, the device determines whether or not link utilization is smaller than 50%. If the utilization is smaller than 50%, the device further determines whether or not the current lane width is the smallest lane width at decision block 706. For example, the PCIe link may have a smallest lane width of one (e.g., ×1). At block 708, if the current lane width is not the smallest lane width, the device reduces the lane width to the next lower lane width while not changing the PCIe generation. For example, if the PCIe link is currently configured to have a lane width of 4 using 4th generation PCIe, the device may reconfigure the PCIe link to have a lane width of 2 and continue to use 4th generation PCIe. In this case, according to Table 2, the device can reduce power consumption from 0.54 W to 0.15 W. Similarly, if the PCIe link is currently configured to have a lane width of 2 using 4th generation PCIe, the device may reconfigure the PCIe link to have a lane width of 1 and continue to use 4th generation PCIe. Other examples are possible.
At block 710, if the utilization is greater than 50%, the device increases the lane width of the PCIe link 110 to its largest supported lane width (e.g., ×4). For example, if the PCIe link is currently configured to have a lane width of 1 or 2 using 4th generation PCIe, the device may reconfigure the PCIe link to have a lane width of 4 (largest supported lane width of the device) and continue to use 4th generation PCIe. The device may repeat the above described procedures of method 700 to dynamically reconfigure the PCIe link based on the utilization of the link.
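One iteration of method 700 can be sketched as follows. The supported lane widths (×1, ×2, ×4) follow the examples above; the function name is hypothetical, and treating a utilization of exactly 50% as the "increase" case is an assumption, since the description only covers values below and above the threshold.

```python
LANE_WIDTHS = (1, 2, 4)  # assumed supported widths, smallest to largest


def next_lane_width(current: int, utilization: float,
                    widths=LANE_WIDTHS, threshold: float = 0.5) -> int:
    """Return the lane width to use after one pass through blocks 704-710."""
    if utilization < threshold:
        idx = widths.index(current)
        # Blocks 706/708: step down one width unless already at the smallest.
        return widths[idx - 1] if idx > 0 else current
    # Block 710: jump straight to the largest supported width.
    return widths[-1]
```

Stepping down one width at a time but returning to full width in a single step reflects the asymmetry in the description: bandwidth is released gradually and restored immediately when demand rises.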
At block 802, the host device determines an expected workload intensity of the PCIe link. The host device may determine the workload based on various factors such as command queue utilization of the PCIe link. In some examples, the host device may receive certain information that may be used to determine the expected workload from an application running on the host device. In one example, the application may indicate that autosave is performed every minute. In that case, the host device may determine the expected PCIe workload or traffic generated by the autosave function based on, for example, statistics collected on previous autosave traffic of the application. In one example, the host device can sense user activity and determine that the host has been idle for a while, so that the host may reduce the lane width in response to the reduced PCIe workload. When the host device senses that the user has resumed work, the host device can transition to a full lane width configuration. In another example, the host device can identify that a user is opening an application that is known for a high (or low) PCIe bandwidth requirement.
At block 804, if the expected workload is low (e.g., smaller than 25% link bandwidth), the device may reconfigure the PCIe link to a lane width of ×1. At block 806, if the expected workload is medium (e.g., between 25% and 50% link bandwidth), the device may reconfigure the PCIe link to a lane width of ×2. At block 808, if the expected workload is high (e.g., greater than 50% link bandwidth), the device may reconfigure the PCIe link to a lane width of ×4. The lane widths and workload levels described above in relation to
In some embodiments, the host device may initiate the lane width change itself based on the expected workload intensity, or the other device connected to the PCIe link may initiate the lane width change based on information (e.g., expected workload) received from the host device.
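The workload thresholds of blocks 804 to 808 can be sketched as a simple mapping from the expected workload, expressed as a fraction of link bandwidth, to a lane width. The function name is hypothetical, and assigning the boundary values of exactly 25% and 50% to the medium band is an assumption.

```python
def lane_width_for_workload(expected_fraction: float) -> int:
    """Map expected workload (fraction of link bandwidth) to a lane width."""
    if expected_fraction < 0:
        raise ValueError("expected workload cannot be negative")
    if expected_fraction < 0.25:
        return 1   # block 804: low workload
    if expected_fraction <= 0.50:
        return 2   # block 806: medium workload
    return 4       # block 808: high workload
```

Either the host or the other device on the link could evaluate such a mapping, depending on which side initiates the lane width change.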
The above described PCIe reconfiguration methods may be used individually or in any combinations including one or more of the methods and procedures described in relation to
While the above description contains many specific embodiments of the invention, these should not be construed as limitations on the scope of the invention, but rather as examples of specific embodiments thereof. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain method, event, state or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described tasks or events may be performed in an order other than that specifically disclosed, or multiple tasks or events may be combined in a single block or state. The example tasks or events may be performed in serial, in parallel, or in some other suitable manner. Tasks or events may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.