PEER TO PEER POWER MANAGEMENT

Abstract
An apparatus and associated method contemplating first and second electronic devices configured to execute input/output (I/O) commands via a network. At least one of the electronic devices has a power manager application configured to control an amount of power supplied to the one of the electronic devices based on an amount of power being supplied to one or more of the electronic devices.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

None.


BACKGROUND OF THE INVENTION

1. Field of The Invention


The present invention relates generally to power management in systems that include a number of electronic devices.


2. Description of Related Art


Power budgets in complex computer systems are receiving more scrutiny lately. Distributed data storage systems, for example, use a number of electronic devices that are capable of operation at different power levels corresponding to different operational modes. For purposes of this description the electronic devices can include storage devices such as disk drives and solid state drives, and they can be read/write devices, such as a tape drive. The storage devices experience a significantly higher utilization at some times in comparison to other times. Some storage activities can be time shifted to off-peak times to extend the benefits of reduced power level operation. It is to these improvements that the embodiments of the present technology are directed.


SUMMARY OF THE INVENTION

Some embodiments of this technology contemplate an apparatus having a first electronic device and a second electronic device. The first electronic device is configured to execute input/output (I/O) commands via a network. The second electronic device is also configured to execute I/O commands via the network. Additionally, the second electronic device has a power manager application configured to control an amount of power supplied to the second electronic device based on an amount of power being supplied to the first electronic device.


Some embodiments of this technology contemplate a computer apparatus having a plurality of electronic devices, each electronic device including a controller configured to communicate input/output (I/O) commands and power control circuitry configured to selectively increase an amount of power supplied to one of the electronic devices based on an amount of power being supplied to at least one of the other electronic devices.


Some embodiments of this technology contemplate a method that includes: obtaining an apparatus including a plurality of electronic devices operably executing input/output (I/O) commands via a network; and individually controlling amounts of power supplied by each of the electronic devices, based on one of the electronic devices determining the amount of power supplied by at least one other of the electronic devices.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram of a computer system employing embodiments of this technology.



FIG. 2 is a block diagram of a storage controller in each server of the computer system of FIG. 1.



FIG. 3 is a block diagram of a power manager in each controller of the computer system of FIG. 1.



FIG. 4 is a more detailed block diagram of the storage controller and power manager in the computer system of FIG. 1.



FIG. 5 diagrammatically depicts different resources that are employed in executing different input/output (I/O) access commands according to embodiments of this technology.



FIG. 6 diagrammatically depicts selectively setting different power levels for the resources in FIG. 5.



FIG. 7 is a flowchart depicting steps in a method for POWER MANAGEMENT in accordance with embodiments of this technology.



FIG. 8 is an isometric depiction of a tape storage library utilizing embodiments of this technology.





DETAILED DESCRIPTION

Initially, this disclosure is by way of example only, not by limitation. The power management concepts herein are not limited to use or application with any specific system or method. That is, the embodiments for storing digital data are illustrative and not limiting of the contemplated embodiments of this technology. Thus, although the instrumentalities described herein are for the convenience of explanation, shown and described with respect to exemplary embodiments, the skilled artisan understands that the principles herein may be applied equally in other types of systems and associated methods.


To illustrate an exemplary environment in which embodiments of the present technology can be practiced, FIG. 1 is a block diagram depiction of a computer system 100. One or more hosts 102 (depicted as clients) are networked to one or more network-attached servers 104 (three depicted) via a storage area network (SAN). The hosts 102 access software applications residing in the servers 104 that routinely store data to and retrieve data from a data store 108. Data is transferred with the storage devices via various communication protocols, such as serial ATA and Fibre Channel.



FIG. 2 is a block diagram depiction of a storage controller 112 controlling the data transfers with the storage devices in each of the servers 104. Each controller 112 can be embodied in a single integrated circuit or can be distributed among a number of discrete circuits, as desired. The controller 112 can reside anywhere within the computer system 100, such as in a network or in a network-attached device. A processor 114, such as a programmable computer processor, provides top level control in accordance with stored programming steps and processing data, such as can be stored in non-volatile memory 116 (flash memory or similar) and in dynamic random access memory (DRAM) 118. A fabric interface (I/F) circuit 120 communicates with other storage controllers (not depicted, such as in the other servers 104) and with the hosts 102 via the fabric 106 (FIG. 1), and a device I/F circuit 122 communicates with the storage devices in the data store 108. The I/F circuits 120, 122 and a path controller 124 form a communication path to pass commands and data (“access commands”) between the hosts 102 (FIG. 1) and the data store 108. A cache 126 can be used as a temporary repository for the access commands. Writeback caching is a method of retaining data in the cache for transfer at a later, more operationally convenient time. By employing a writeback cache policy, a completion acknowledgement can be transmitted back to the initiating device (such as the host 102) before the data is actually written to the storage device(s). That permits scheduling execution of any access command to occur at a later, more convenient (efficient) time.
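
By way of illustration only, the writeback behavior described above can be sketched as follows. The class name `WritebackCache`, its methods, and the dictionary standing in for a storage device are hypothetical and are not part of the disclosure; the sketch shows only that the acknowledgement is returned before the data reaches storage.

```python
from collections import deque


class WritebackCache:
    """Hypothetical writeback cache: acknowledge first, write to storage later."""

    def __init__(self):
        self.pending = deque()  # cached write commands awaiting flushing

    def write(self, address, data):
        # Retain the data in the cache and acknowledge immediately,
        # before anything is written to the storage device.
        self.pending.append((address, data))
        return "ack"

    def flush_one(self, storage):
        # At a later, more convenient time, move the oldest cached
        # write to the storage device (modeled here as a dict).
        address, data = self.pending.popleft()
        storage[address] = data
```

In this sketch the caller receives the acknowledgement while the storage dictionary is still empty; the data appears in storage only after `flush_one` is called.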


Access commands are flushed (moved) from the cache 126 into a command queue 128.


In other words, the command queue 128 is populated to, in effect, grant permissions to issue cached access commands. A power manager application 130 allocates power resources to enable electronic devices that are necessary for executing the flushed access commands. For purposes of this description, in illustrative embodiments each power manager application 130 controls the amount of power provided to its respective data store 108.


The capacity of the data store 108 is organized into logical addresses that are referenced when transferring access commands with the storage devices. System configuration information defines the storage relationships between user data and any associated parity and/or mirror data. The system configuration information furthermore identifies the storage relationships between physical block addresses and the associated logical block addresses.


The controller 112 architecture advantageously provides scalable, highly functional data management and control of the data store 108. The system configuration information can further include data structures that are aligned to stripe boundaries on the storage devices. The data structures can reference data buffers in the cache 126 that are dedicated to storing the data associated with a storage stripe.


During operation, executing the access commands generally involves input-output (I/O) data transfers between the hosts 102 and the respective data store 108. Readback data retrieved from the storage devices, including non-requested speculative data, can be retained for a time in the respective cache 126 in an expectation of a “cache hit” in response to a subsequent access command. Speculative data is data that is retrieved in addition to the requested data, based on a predicted value of its access in the future. The throughput capacity of the computer system 100 is advantageously increased when a subsequent access command can be satisfied directly from the cache 126 instead of scheduling an I/O transfer with one or more of the storage devices.



FIG. 3 is an illustrative functional block depiction of how each power manager 130 (FIG. 2) changes a power state of one of the storage devices 132 in the data store 108. Although only one storage device 132 is depicted, it will be understood that the power manager 130 can change power states to two or more of the storage devices 132 in the data store 108 in the same manner.


Each power manager 130 includes a power control block 134 coupled to a switching device 136. The power control block 134 provides control outputs to the switching device 136 in response to operational inputs from the respective controller 112 (FIG. 2). The switching device 136 can be characterized, for example, as an n-channel MOSFET with a gate input coupled to the output of the power control block 134. The use of a MOSFET is merely illustrative and not limiting of the contemplated embodiments of the claimed invention. The switching device 136 further has source-drain conduction paths 138, 140, 142 connected between different input power sources 144, such as V1, V2, . . . Vn depicted, and the storage device 132. In the absence of a gate input, the source-drain conduction path is preferably in a high impedance state.


The same structure can be repeated for each of the other storage devices, although not depicted. For example, there can be ten data storage devices 132 in the data store 108, each storage device 132 capable of operating at any of the three power inputs depicted (V1, V2, and Vn). In that example, it will be understood that the power control block 134 can include a total of thirty such conduction paths to the data storage devices 132₁, 132₂, . . . 132₁₀.



FIG. 4 is a more detailed block diagram of the controller 112, configured in accordance with the present technology, that is arranged to increase data processing performance. The command queue 128 contains a number of prioritized access commands (sometimes referred to as “I/O commands”) from the cache 126. Although only one command queue 128 is depicted to simplify this description, there can be a dedicated command queue 128 for each storage device 132 to simultaneously transfer data with all of the storage devices 132.


Write commands received via the fabric I/F 120 are writeback cached in the cache 126 and held there until they are flushed from the cache 126. Flushing is controlled by a policy engine 146 that determines particular sets of write commands that can be simultaneously stored to the different storage devices 132. A policy engine is a processor-based module that includes an application executing an operating system. For example, without limitation, the sets of write commands can be defined by RAID (“redundant array of independent devices”) container services (RCS) 147. The RCS 147 perform data transfers in accordance with RAID algorithms that aim to transfer parity stripes of data across a plurality of the storage devices 132. The functional blocks depicted in FIG. 4 can exist in software or hardware. In the latter, for example, the policy engine 146 can be constructed as a finite state machine.


The controller 112 continuously manages the writeback data processes to optimize throughput performance at each of a number of different operational modes, depending on system-wide conditions and requirements. For example, one operational mode generally involves periodically flushing blocks of data of a predetermined, constant size, such as flushing a stripe of RAID data when the entire RAID stripe is cached. Another operational mode generally involves flushing smaller and perhaps varying-size blocks of data, such as indexing the cached data by age (e.g., the time the data has spent in the cache awaiting flushing) and flushing an access command when its age exceeds a predetermined age limit.
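
By way of illustration only, the age-based operational mode can be sketched as follows. The names `CachedCommand` and `select_aged_commands`, and the parameter `age_limit_s`, are hypothetical and do not appear in the disclosure; the sketch merely selects the cached access commands whose age exceeds the predetermined limit.

```python
from dataclasses import dataclass


@dataclass
class CachedCommand:
    command_id: int
    cached_at_s: float  # time the command entered the cache, in seconds


def select_aged_commands(cache, now_s, age_limit_s):
    """Return cached commands whose age exceeds the limit, oldest first."""
    aged = [c for c in cache if now_s - c.cached_at_s > age_limit_s]
    return sorted(aged, key=lambda c: c.cached_at_s)
```

A command cached at the 5-second mark, for instance, is selected at the 10-second mark when the age limit is 4 seconds, whereas a command cached at 9.5 seconds is left in the cache.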


In accordance with this technology, the aggressiveness with which the cache 126 is flushed can be tied to the I/O load. That is, not flushing aggressively enough during a relatively high I/O load can cause the cache 126 to reach saturation. Conversely, flushing too aggressively during a relatively low I/O load can leave the cache deficient for satisfying potential cache hits. Both scenarios adversely affect data throughput performance.


The policy engine 146 can continuously collect qualitative data about access commands received via the fabric I/F 120 on an I/O-by-I/O basis via path 148. The policy engine 146 can dynamically characterize the I/O load and consequently issue rules via path 150 that govern the cache 126 which, in turn, populates the command queue 128 to define a command profile. The policy engine 146 also stays continuously apprised of the cache 126 state via path 152.


The policy engine 146 can also collect quantitative data about the I/O load in real time, such as the current rate of access commands coming from one or more network requesting devices. That enables the policy engine 146 to dynamically characterize the I/O load and continuously adjust the command profile to the storage devices 132 in relation to the characterization. For example, the policy engine 146 can continuously collect real time data characterizing the I/O load in terms of the ratio of rate sensitive commands (illustrated below) to latency sensitive commands (illustrated below). Writeback cache commands are considered to be rate sensitive commands because it does not matter so much which requests are flushed to the storage devices 132 at any point in time. In fact, rate sensitive commands may even be overwritten while pending in cache 126 as dirty data. What matters is that rate sensitive commands get flushed at a rate that prevents the cache 126 from reaching saturation or starvation.


On the other hand, an access command to read data that is stored in one or more of the storage devices 132 will likely cause the host application to block further processing until the access command is satisfied. The time it takes to satisfy the access command for read data, the latency period, is critical to the performance of the application. Such access commands are thereby referred to as latency sensitive commands. Further, in certain circumstances the host can opt to not authorize writeback caching. In that case an access command for writing data, called a writethrough command, is likewise categorized as a latency sensitive command.
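
By way of illustration only, the distinction between rate sensitive and latency sensitive commands can be sketched as follows. The names `Sensitivity` and `classify` and the two boolean parameters are hypothetical; the sketch assumes each access command is known to be either a read or a write, and that the host's authorization of writeback caching is known.

```python
from enum import Enum


class Sensitivity(Enum):
    RATE = "rate"
    LATENCY = "latency"


def classify(is_write: bool, writeback_allowed: bool) -> Sensitivity:
    """Classify an access command per the description above."""
    # A write that may be writeback cached is rate sensitive.
    if is_write and writeback_allowed:
        return Sensitivity.RATE
    # Reads, and writethrough writes, are latency sensitive.
    return Sensitivity.LATENCY
```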


The policy engine 146 can also collect quantitative data characterizing the I/O load in other terms such as the size of the associated data file (bandwidth), the particular host 102 and/or network device initiating the access command, storage device 132 information such as access history, timestamp data, RAID class, and the LUN class to which the access command is directed.


In collecting quantitative data the policy engine 146 preferably tallies counts during each of a predetermined sample interval, such as each one-second interval. A free running counter can be set with a pointer moving the index on one-second boundaries to continuously track the ratio. The counter holds a desired number of previously observed ratios, such as the previous eight one-second sample ratios, with a ninth slot for tallying the current one-second ratio. On the one-second boundaries the index cycles, subtracts the indexed historical value and adds the latest sample value, then divides by eight to calculate the most recent running average of the ratio.
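
By way of illustration only, the running average described above can be sketched as follows. The class name `RunningRatioAverage` and the method `on_boundary` are hypothetical. The sketch assumes the caller tallies the current one-second ratio separately (the "ninth slot") and supplies it on each one-second boundary.

```python
class RunningRatioAverage:
    """Running average over the last `window` one-second sample ratios."""

    def __init__(self, window=8):
        self.window = window
        self.history = [0.0] * window  # previously observed sample ratios
        self.index = 0                 # slot holding the oldest sample
        self.total = 0.0               # sum of the values in `history`

    def on_boundary(self, sample_ratio):
        """Fold in the sample just completed and return the new average."""
        self.total -= self.history[self.index]  # subtract the indexed historical value
        self.total += sample_ratio              # add the latest sample value
        self.history[self.index] = sample_ratio
        self.index = (self.index + 1) % self.window  # cycle the index
        return self.total / self.window         # divide by eight by default
```

In this sketch the history is initialized to zero, so the returned average understates the ratio until eight samples have been collected; how the start-up interval is handled is not specified in the description and is an assumption here.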


The policy engine 146 can be responsive to performance parameters 154 in formulating rules for flushing the cache 126. The parameters 154 can be quantitative or qualitative. The parameters 154 can include goals, such as but not limited to enforcing a desired command profile that is some factor of the network I/O load in terms of the ratio of latency sensitive commands to rate sensitive commands (ratio of read to write commands for writeback caching), enforcing assigned priorities to different LUN classes, enforcing a desired read command latency, and the like. The policy engine 146 thus can correlate the I/O load characterization with the predefined performance parameters 154 to define the rules for flushing the cache 126. Additionally, the parameters 154 can include system condition information. For example, a power supply indicator may inform the policy manager 146 that the computer system 100 has switched to a backup battery power source. In this condition the policy manager 146 can respond by implementing contingencies to aggressively flush the cache 126 because of the limited power availability. The parameters 154 can also include the state of pending background I/Os, meaning I/Os that are not directly involved with executing access commands.



FIGS. 5, 6, and 7 diagrammatically depict illustrative methodology by which the computer system 100 performs power management in accordance with this technology. For purposes of an example, FIG. 5 depicts two I/O command streams presently stored in the cache 126 (FIG. 4) of one server 104, the I/O1 stream transferring data with data storage devices D1, D2, D3, and the I/O2 stream transferring data with data storage devices D1, D4, D5. FIG. 6 depicts the amount of power allocated to each of the storage devices 132. For purposes of this example each of the storage devices is selectively operable at different power levels from lowest power level “low” to highest power level “high” and an intermediate power level “med.” For example, “low” can be a standby power level of the data storage devices 132, whereas “high” can be a higher read/write power level. In that case “med” can be an intermediate, low-power idle level.


For purposes of this example, FIG. 6 depicts the changes in power that are necessary to execute I/O1 and I/O2 command streams. That is, at time t1 executing the I/O1 stream requires at least the med power level in storage device D1 and the high power level in storage devices D2 and D3. Executing the I/O2 stream requires the high power level in storage device D1, and the high power level in storage devices D4 and D5. The power manager 130 is configured to individually control amounts of power supplied to each of the storage devices 132, as described above. In doing so, the power manager 130 selectively increases an amount of power supplied to one of the storage devices 132, such as increasing the power level from the low power level to the high power level in storage device D1, and such as increasing the power level from the low power level (such as sleep mode) to the med power level in storage device D4. Generally, in this technology the power manager 130 selectively increases the power to its storage devices 132 based on an amount of power being supplied to the storage devices of a different server 104. Particularly, as explained below, the power manager 130 increases the amount of power supplied to one or more of its storage devices 132 if the increased amount of power is less than a predetermined threshold level of power. Generally, the power manager 130 selectively increases the amount of power to its storage device 132 based on the amount of power already being supplied to all of the storage devices 132 in the entire computer system 100. Alternatively, the power manager 130 can inquire individually to one or more of the other servers 104 to determine what the existing power level is. That power level can be used to define the threshold for determining whether or not to increase the power level to its storage device 132.



FIG. 7 (in conjunction with FIG. 4) depicts a flowchart of steps in a method 158 for POWER MANAGEMENT in accordance with illustrative embodiments of the present technology. The method 158 begins in block 160 where a particular policy manager 146 and power manager 130 cooperatively optimize the total amount of power to the computer system 100, in view of the present operating parameters. As discussed above, for example, during business hours with a heavy I/O load the optimal total power is likely to be relatively high to support aggressive flushing of the cache 126 to prevent saturation. The policy manager 146 also derives a threshold value, T, that is greater than the optimal power level for purposes of the power management control that follows. The value of T can be derived from predetermined margins of the optimal power within which normal variation is either predicted or empirically observed.


In block 162 the particular policy manager 146 considers the next I/O command received from the network. In block 164 the policy manager 146 defines which storage devices (D1-D6) are necessary (the “set”) to execute this I/O command, and the voltage levels at which the set must operate to execute the I/O command. Instead of operating on its individual storage devices 132, the power manager 130 advantageously increases the amount of power to the entire set of storage devices D1-D6 only if a predetermined condition is met. For example, the power manager 130 increases power levels to each storage device D1-D6 in the set if a sum of the increased amount of power to the set and the amount of power already being supplied to another server 104 is less than T.


Assume for the example of FIG. 6 that the computer system 100 is at the moment sufficiently powered to execute I/O1, and the particular policy manager 146 is presently considering I/O2 in block 162. In that event the set defined in block 164 is storage device D1 at V2 volts, and storage devices D4 and D5 at V1 volts. If the I/O command pending in block 162 is flushed from the cache then the total power requirement, Ptot, is (3*V2+2*V1). Block 161 proceeds with consideration of just the first storage device D1 in the set defined in block 164. If the determination of block 166 is favorable, then blocks 163 and 165 increment consideration to the next storage device in the set until all the storage devices D1-D6 in the set are included in the consideration.


In block 166 the particular policy manager 146 determines whether Ptot is less than the predetermined T. If the determination of block 166 is “no,” then the I/O command (sometimes referred to as “access command”) considered in block 162 remains cached in block 168. However, if the determination of block 166 is “yes,” then in block 170 the power manager 130 switches the power to enable the set derived in block 164. The I/O command considered in block 162 is flushed to the command queue 128 in block 172. The command queue 128 is continuously executed in blocks 174, 176 until empty.
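
By way of illustration only, the determination of block 166 and its two outcomes can be sketched as follows. The function name `consider_command`, its parameters, and the use of dictionaries keyed by device name are hypothetical. The numeric levels in the usage note are arbitrary illustrative units and are not values from the disclosure.

```python
def consider_command(required, current, threshold):
    """Decide whether an I/O command can be flushed under the power threshold.

    required:  dict mapping each device in the set to the level it needs
    current:   dict mapping each powered device to the level now supplied
    threshold: the predetermined value T
    Returns (flush, levels): flush is True when Ptot is less than T, and
    levels holds the settings to apply in that case (else unchanged).
    """
    proposed = dict(current)
    for device, level in required.items():
        # Raise a device only if it is below what the command requires.
        proposed[device] = max(proposed.get(device, 0.0), level)
    p_tot = sum(proposed.values())
    if p_tot < threshold:
        return True, proposed      # switch power, then flush the command
    return False, dict(current)    # command remains cached
```

For instance, with D1 at 1.0 and D2 and D3 at 2.0, a command requiring D1 at 2.0 and D4 and D5 at 1.0 yields a total of 8.0 in this sketch, so the command is flushed for a threshold of 9.0 and remains cached for a threshold of 8.0.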


The particular policy manager 146 can continuously flush the cache 126 in view of the current power settings, as depicted by input 178 and flushing stream 180. Otherwise, aged access commands in the cache 126 can be re-evaluated via control branch 182. The Ptot and T are periodically evaluated in block 184, in view of any changing parameters. Ptot can be decreased, for example, if the current parameters allow and/or a particular storage device D has been unused at its present power setting for longer than a predetermined time. In any event, control then passes back to consideration of the next I/O command in block 162, whether it be from the network or the cache 126.


Although not depicted in FIG. 7, in equivalent embodiments each power manager, after learning the determination of block 166 is “no,” can opt to nonetheless increase the amount of power to one or more storage devices of the set but to a derated power level that satisfies the Ptot being less than T.
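
By way of illustration only, selecting a derated power level can be sketched as follows. The function name `derated_level` and its parameters are hypothetical, and the sketch assumes the device supports a discrete list of candidate power levels, which the description does not specify.

```python
def derated_level(other_power, requested, candidate_levels, threshold):
    """Return the highest level not above `requested` that keeps Ptot below T.

    other_power:      power already being supplied elsewhere
    requested:        the level the I/O command originally called for
    candidate_levels: levels the device can operate at (an assumption)
    threshold:        the predetermined value T
    Returns None when no candidate level satisfies the threshold.
    """
    for level in sorted(candidate_levels, reverse=True):
        if level <= requested and other_power + level < threshold:
            return level
    return None
```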


Embodiments of the present invention can be commercially practiced in a Spectra Logic T-950 tape cartridge library manufactured by Spectra Logic of Boulder, Colorado. FIG. 8 shows a commercial embodiment of one T-950 tape library without an enclosure. The T-950 tape library has first and second shelf systems 190₁, 190₂ that support a plurality of the mobile media, such as the magazine 192 holding a plurality of LTO tape cartridges with MAMs, archived by the tape library. The shelf systems 190₁, 190₂ can each have at least one auxiliary memory reader. Disposed next to the second shelf system 190₂ are at least four IBM LTO tape drives to write data to and read data from a tape cartridge. The IBM LTO tape drives each have the capability of storing data to an auxiliary radio frequency memory device contained in an LTO tape cartridge. Between the first and second shelf systems 190₁, 190₂ is a magazine transport space 198. The magazine transport space 198 provides adequate space for a magazine 192 to be moved, via the transport unit, from a position in the first shelf system 190₁, for example, to a tape drive. The transport unit can further accommodate at least one auxiliary radio frequency memory device reader. Magazines 192 can be transferred into and out from the T-950 tape library via the entry/exit port 200. Transferring magazines 192 in and out of the T-950 tape library can be accomplished by an operator, or by an automated material handling system. The T-950 tape library has cooling fans 202 located in the base. The T-950 tape library can be linked to a central database to control movement of the auxiliary radio frequency memory devices as indicated by readings from the device readers. The T-950 tape library also includes a library central processing unit providing top-level control and coordination of all processes. The T-950 tape library also provides a graphical user interface displaying assessment results or simple messages such as an audible or visual alert accompanying recommendations for further action(s).


It is to be understood that even though numerous characteristics and advantages of various embodiments of the present invention have been set forth in the foregoing description, together with the details of the structure and function of various embodiments of the invention, this disclosure is illustrative only, and changes may be made in detail, especially in matters of structure and arrangement of parts within the principles of the present invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed. For example, multiple write commands can be simultaneously interleaved by the path controller 124 in performing the I/O command throughput, while still maintaining substantially the same functionality without departing from the scope and spirit of the claimed invention. Another example can include using these techniques across multiple libraries, while still maintaining substantially the same functionality without departing from the scope and spirit of the claimed invention. Further, though communication is described herein as between a host and the tape library, communication can be received directly by a tape drive, via the fabric interface for example, without departing from the scope and spirit of the claimed invention. Further, for purposes of illustration, a tape drive and tape cartridges are used herein to simplify the description for a plurality of drives and tape cartridges. Finally, although the preferred embodiments described herein are directed to tape drive systems, and related technology, it will be appreciated by those skilled in the art that the claimed invention can be applied to other systems, without departing from the spirit and scope of the present invention.


It will be clear that the claimed invention is well adapted to attain the ends and advantages mentioned as well as those inherent therein. While presently preferred embodiments have been described for purposes of this disclosure, numerous changes may be made which readily suggest themselves to those skilled in the art and which are encompassed in the spirit of the claimed invention disclosed and as defined in the appended claims.


It is to be understood that even though numerous characteristics and advantages of various aspects have been set forth in the foregoing description, together with details of the structure and function, this disclosure is illustrative only, and changes may be made in detail, especially in matters of structure and arrangement to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.

Claims
  • 1. An apparatus comprising: a first electronic device configured to execute input/output (I/O) commands via a network; and a second electronic device configured to execute I/O commands via the network, the second electronic device comprising a power manager application configured to control an amount of power supplied to the second electronic device based on an amount of power being supplied to the first electronic device.
  • 2. The apparatus of claim 1 wherein the power manager application is a second power manager application, and wherein the first electronic device comprises a first power manager application configured to control an amount of power supplied to the first electronic device based on an amount of power being supplied to the second electronic device.
  • 3. The apparatus of claim 2 comprising a third electronic device comprising a third power manager application configured to control an amount of power supplied to the third electronic device based on an amount of power being supplied to at least one of the first and second electronic devices.
  • 4. The apparatus of claim 3 wherein each of the power manager applications is configured to increase the amount of power supplied to the respective electronic device based on the amount of power being supplied to the plurality of electronic devices.
  • 5. The apparatus of claim 4 wherein each power manager application is configured to increase the amount of power to the respective electronic device if a sum of the increased amount of power and the amount of power being supplied to all of the electronic devices is less than a predetermined threshold.
  • 6. The apparatus of claim 5 wherein each of the electronic devices comprises a plurality of data storage devices that are selectively powered in order to execute a particular I/O command.
  • 7. The apparatus of claim 6 wherein each of the power manager applications is configured to selectively inquire to at least one of the other electronic devices to determine the current power level of the data storage devices in the other electronic device.
  • 8. The apparatus of claim 7 wherein the sum is a first sum and each power manager, after not increasing the amount of power because the first sum is greater than the predetermined threshold, is configured to selectively increase a derated amount of power if a second sum of the amount of the derated power and the amount of power being supplied to at least one of the other electronic devices is less than the predetermined threshold.
  • 9. The apparatus of claim 7 comprising a policy manager application in each controller configured to continuously characterize a network I/O load on the respective electronic devices, and wherein each power manager application is configured to set the predetermined threshold to a value that is based on the network I/O load characterization.
  • 10. The apparatus of claim 9 wherein the policy manager application is configured to quantitatively characterize the network I/O load.
  • 11. The apparatus of claim 10 wherein the policy manager application is configured to qualitatively characterize the network I/O load.
  • 12. The apparatus of claim 11 wherein the policy manager application is configured to set the predetermined threshold to a value that maintains operation of the apparatus within a desired power mode setting.
  • 13. The apparatus of claim 12 wherein the power manager application is configured to change the power mode setting based on an available capacity of at least one of the data storage devices for storing unexecuted network I/O commands.
  • 14. The apparatus of claim 12 wherein each of two or more of the data storage devices is a nonvolatile storage device.
  • 15. The apparatus of claim 14 wherein each of two or more of the data storage devices is a tape drive device.
  • 16. The apparatus of claim 15 wherein each power manager application, after not increasing the amount of power to the first electronic device to execute a selected one of the I/O commands because either the first or the second sum is greater than the predetermined threshold, is configured to cache the selected I/O command.
  • 17. A computer apparatus comprising a plurality of electronic devices, each electronic device comprising a controller configured to communicate input/output (I/O) commands and power control circuitry configured to selectively increase an amount of power supplied to one of the electronic devices based on an amount of power being supplied to at least one of the other electronic devices.
  • 18. The computer apparatus of claim 17 wherein each power control circuitry is configured to increase the amount of power supplied to at least one of the electronic devices if a sum of the increased amount of power and the amount of power being supplied to at least some of the electronic devices is less than a predetermined threshold.
  • 19. The computer apparatus of claim 18 comprising a policy manager application configured to continuously characterize a network I/O command load on the data storage apparatus, and wherein each power control circuitry is configured to set the predetermined threshold to a value that is based on the network I/O command load characterization.
  • 20. A method comprising: obtaining an apparatus including a plurality of electronic devices operably executing input/output (I/O) commands via a network; and individually controlling amounts of power supplied by each of the electronic devices, based on one of the electronic devices determining the amount of power supplied by at least one other of the electronic devices.