The present invention generally relates to environmental control systems, such as heating, ventilation, and air conditioning (HVAC) systems, which can be used to control the temperature and humidity of common spaces, e.g., as can exist in data centers containing server computers. More specifically, the present invention can relate to efficiently maintaining certain environmental conditions by increasing or decreasing an operation level (e.g., starting and stopping) of respective units (modules) of an environmental control system.
Modern data centers use HVAC systems to control indoor temperature, humidity, and other variables. It is common to have many HVAC units deployed throughout a data center. They are often floor-standing units, but may be wall-mounted, rack-mounted, or ceiling-mounted. The HVAC units also often provide cooled air either to a raised-floor plenum, to a network of air ducts, or to the open air of the data center. The data center itself, or a large section of a large data center, typically has an open-plan construction, i.e., no permanent partitions separating the air in one part of the data center from the air in another part. Thus, in many cases, these data centers have a common space that is temperature-controlled and humidity-controlled by multiple HVAC units.
HVAC units for data centers are typically operated with decentralized, stand-alone controls. It is common for each unit to operate in an attempt to control the temperature and humidity of the air entering the unit from the data center. For example, an HVAC unit may contain a sensor that determines the temperature and humidity of the air entering the unit. Based on the measurements of this sensor, the controls of that HVAC unit will alter operation of the unit in an attempt to change the temperature and humidity of the air entering the unit to align with the set points for that unit.
For reliability, most data centers are designed with an excess number of HVAC units. Since the open-plan construction allows free flow of air throughout the data center, the operation of one unit can be coupled to the operation of another unit. The excess units, and the fact that they deliver air to substantially overlapping areas, provide redundancy, which ensures that if a single unit fails, the data center equipment (servers, routers, etc.) will still have adequate cooling.
Embodiments of the present invention provide systems and methods for evaluating operational redundancy of a system based on environmental maintenance modules (e.g. HVAC units). In various embodiments, a system can heat and/or cool an environment. Sensors can measure temperatures, power consumption and other information at various points within the environment. The calculated operational redundancy values are useful tools for evaluating the likelihood that the system can withstand extreme events and/or component failures and still keep an environmental value such as temperature within a desired range.
In an embodiment, a method of obtaining an operational redundancy value for a system including a plurality of environmental maintenance modules for maintaining an environmental value within a specified range includes monitoring the plurality of environmental maintenance modules, while the environmental maintenance modules are running, to receive operational data regarding a level of operation of each of the plurality of environmental maintenance modules. The method also includes determining an operational weight for each of the plurality of environmental maintenance modules based on the operational data of each of the environmental maintenance modules, computing an available capacity of the system based on the operational weights of the plurality of environmental maintenance modules, and determining a required capacity for the system to maintain the environmental value within the specified range when a load exists for the plurality of environmental maintenance modules. The method also includes calculating the operational redundancy value based on the available capacity and the required capacity and providing a message based on the operational redundancy value. In a further embodiment, a computer product includes instructions for implementing the method. Still further embodiments are directed to systems and computer readable media associated with methods described herein.
A better understanding of the nature and advantages of embodiments herein may be gained with reference to the accompanying drawings and remaining portions of the specification, including the claims. In the drawings, like reference numbers can indicate identical or functionally similar elements.
An “environmental maintenance system” may include any system for controlling the environment of a space (an “environmentally-controlled space”). Environmental maintenance systems can include one or more “environmental maintenance modules” such as heating, ventilation, and air conditioning (HVAC) units, air handling units (AHUs), computer room air conditioner (CRAC) units, etc. Each of the environmental maintenance modules may include one or more sensors.
A “sensor” may include any device that measures a quantity at a location. For example, a sensor may measure temperature, humidity, pressure or flow of a liquid or gas, speed of a motor, electrical current, voltage or power consumption, etc. In some cases, a sensor may be a part of an environmental maintenance module. In other cases, a sensor may be standalone; for example, it may not be integrated or associated with a specific environmental maintenance module.
“Operational data” may include any number, percentage, or other quantity that measures, or is calculated from measurements of, the operation, effect, efficiency or operational health of an environmental maintenance system. For example, raw data from a sensor may be considered operational data; similarly, statistics derived from such data (e.g., heat extraction rate for an airflow, calculated from incoming temperature, final temperature, and flow rate of the airflow) are also operational data. An example of operational data based on other operational data is a “Coefficient of Performance” (COP). COP is an operational performance metric for a piece of equipment that quantifies its actual performance; in the case of a cooling unit, COP may be expressed as the ratio of the unit's cooling rate to its power consumption.
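As a rough illustration of deriving operational data from raw sensor readings, the sketch below computes a sensible heat extraction rate from an airflow's incoming temperature, final temperature, and volumetric flow rate, then forms a COP from that rate and the unit's measured power draw. The constant air properties (density 1.2 kg/m³, specific heat 1.005 kJ/(kg·K)), the neglect of latent (moisture) heat, and the function names are all assumptions made for illustration.

```python
# Assumed properties of air near room conditions (illustrative constants).
AIR_DENSITY_KG_PER_M3 = 1.2
AIR_SPECIFIC_HEAT_KJ_PER_KG_K = 1.005


def heat_extraction_rate_kwt(inlet_temp_c: float, outlet_temp_c: float,
                             flow_m3_per_s: float) -> float:
    """Sensible heat removed from an airflow, in thermal kW (kWt).

    Q = rho * V_dot * c_p * (T_in - T_out); latent (moisture) heat is ignored.
    """
    mass_flow_kg_per_s = AIR_DENSITY_KG_PER_M3 * flow_m3_per_s
    delta_t = inlet_temp_c - outlet_temp_c
    # kg/s * kJ/(kg*K) * K = kJ/s = kW
    return mass_flow_kg_per_s * AIR_SPECIFIC_HEAT_KJ_PER_KG_K * delta_t


def coefficient_of_performance(cooling_kwt: float, power_kwe: float) -> float:
    """COP of a cooling unit: cooling rate (kWt) divided by electrical power (kWe)."""
    if power_kwe <= 0:
        raise ValueError("power consumption must be positive")
    return cooling_kwt / power_kwe


# Hypothetical reading: 10 m^3/s of return air at 30 C leaves the unit at 20 C
# while the unit draws 40 kWe. Q is roughly 120.6 kWt and COP roughly 3.0.
q = heat_extraction_rate_kwt(30.0, 20.0, 10.0)
cop = coefficient_of_performance(q, 40.0)
print(f"{q:.1f} kWt, COP {cop:.2f}")
```

In practice both quantities would usually be averaged over a time window before use, as discussed for COP further on.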
“Available capacity” means a number or capacity of one or more environmental maintenance modules, expressed in terms of their current ability to maintain a desired environmental value. In some implementations, environmental maintenance modules that are known to be operating with some degree of impairment may be counted towards available capacity. In other embodiments, an impaired module is counted only partially toward available capacity, with its contribution limited to the degree of its impaired capacity; for example, it may be weighted by a Coefficient of Performance (COP) that is below the design value for the impaired module, or by a measured value such as heat transfer capacity.
Redundancy is often employed in a variety of systems to ensure performance to critical specifications, so that if one component fails, others can carry the load and the single failure does not trigger a failure of the whole system. In environmental maintenance systems, redundancy often takes the form of installing more heating or cooling subsystems than “should be” necessary to heat or cool a physical space.
Embodiments herein recognize that a further layer of security can be realized by not simply relying on redundancy as installed, but rather by periodically evaluating and calculating operational redundancy, taking into account measured status and/or health of the heating or cooling systems, as well as the actual load on those systems. The general concept of redundancy will be discussed first, followed by introduction of operational redundancy principles and calculations.
To ensure that an environment (e.g., a data center) is sufficiently cool or warm, standard operating procedure is to deploy, and sometimes operate, extra HVAC units (or other environmental maintenance modules) beyond what is strictly required. Recommended levels of system redundancy (for all types of data center infrastructure, including that of heating/cooling systems, hereinafter called environmental maintenance modules) for data centers are specified in industry standard documents such as TIA-942, the Telecommunications Industry Association's Telecommunications Infrastructure Standard for Data Centers. TIA-942 assigns “tiers” to data center facilities that depend on various factors including environmental maintenance module redundancy.
Tier 1 data centers need only have enough design capacity to meet the data center's needs under nominal operating conditions. If a number of environmental maintenance modules that is adequate to meet such needs when operating at design capacity is defined as a number N, then the Tier 1 requirement is for N modules. Tier 2 data centers require at least some design redundancy in case of an environmental maintenance module failure; the Tier 2 requirement is for N+1 modules. Tier 3 and Tier 4 data center environmental maintenance module redundancy requirements vary depending on architecture of the modules (e.g., whether they derive power from common sources and/or reject heat to other units in a laddered approach); redundancy of up to 2(N+1) modules is required in certain cases.
Thus, generally speaking, redundancy in cooling systems and electric power systems of mission-critical facilities is traditionally defined as the total number of units installed minus the number of units required to service the load, assuming each unit operates at its design operating point. Redundancy is traditionally expressed in terms of the number of redundant units. Examples are N+1, N+2, or 2N, where N is the number of units required to service the load. In a mission-critical cooling or power distribution application such as data center cooling, telecom office cooling, or cellular site cooling, redundancy is a necessary feature to guarantee uptime in the event of a cooling unit failure. The traditional definition of redundancy is a design metric. It does not account for the fact that cooling units and uninterruptible power supply (UPS) units degrade with time and use.
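For contrast with the operational metric described in this specification, the traditional design redundancy can be sketched as installed units minus N, where N is derived from design capacities alone. The function name is hypothetical, and rounding N up with `math.ceil` is an assumption of this sketch.

```python
import math


def design_redundancy(units_installed: int, design_load_kwt: float,
                      unit_design_capacity_kwt: float) -> int:
    """Traditional (design) redundancy: installed units minus N.

    N is the number of units needed to serve the load if every unit delivered
    exactly its design capacity; degradation with time and use is ignored.
    """
    n_required = math.ceil(design_load_kwt / unit_design_capacity_kwt)
    return units_installed - n_required


# 10 units of 100 kWt serving an 800 kWt load: N = 8, so the design is "N+2".
print(design_redundancy(10, 800.0, 100.0))  # 2
```

Because every unit is credited with its full design capacity, this count stays at N+2 even if several units have degraded badly, which is the limitation the operational metric is meant to address.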
Embodiments can use an operational redundancy metric that accounts for performance degradation of environmental maintenance modules (e.g., cooling units, heating units, UPS units, etc.) over time. This redundancy metric can be correlated with failure so that alerts and warnings can be dispatched to operators when the level of operational redundancy has reached a low enough threshold to indicate high risk. Thus, equipment maintenance can be performed as an optimized, quantitative tradeoff between cost and risk. Furthermore, the energy-saving benefits of maintaining equipment to reduce risk can be factored in to offset the cost of maintenance.
A performance-based (e.g., operational) redundancy metric can improve capacity planning. For example, a colocation operator ideally knows quantitatively (not just as a design assumption) if there is enough excess cooling capacity to sell additional information technology (IT) services to a new customer. The new IT services will produce additional heat that must be extracted. If the traditional design redundancy calculation were used to determine excess capacity that could be sold, it might cause the colocation operator to sell poorly performing capacity with a high likelihood of cooling system failure in the future.
Embodiments of operational redundancy can analyze data from sensors throughout an environment (e.g., sensors within environmental maintenance modules, sensors at locations outside of modules, or internal health check or self-diagnostic information from the modules) to determine actual operational health of specific modules. An operational redundancy value is then calculated, in embodiments, starting with actual operational data for specific modules and deriving an available capacity metric for the entire system, instead of basing redundancy calculations on assumptions such as design capacities of the modules. This metric may be called the Redundancy Value (RV). Related metrics express redundancy in various terms, such as number of redundant modules deployed, percentage of redundancy as a percentage of total modules deployed, redundancy in terms of heat transfer capacity, and the like.
The operational redundancy value thus varies according to the operational health of the modules, and can also vary according to a load presented to the system (e.g., heat generated by data center equipment that must be removed). In certain embodiments, load is estimated or assumed, while in other embodiments, load is calculated from measured parameters (such as electrical power consumed by data center equipment). With a calculation or estimation of load in place, a required capacity to meet the load can be calculated, again taking into account the actual operational data for specific modules. The operational redundancy value can provide valuable insight into the effective redundancy of the system; for example the operational redundancy value can be calculated in real time and used to alert appropriate personnel when it drops below a threshold, or can be calculated based on data for a historical period to correlate to system performance over the historical period.
Embodiments can be used to know when a cooling or power system is at risk of failing. In the case of cooling systems, this risk could be caused by too much heat generation from IT equipment, by performance degradation of cooling equipment, or both. Embodiments can also be used to alert a data center service provider about the risk of selling capacity that is not healthy, therefore helping avoid a customer outage. Embodiments can also enable maintenance optimization to manage risk of failure. For example, instead of maintaining all equipment on a scheduled basis, an operator can maintain cooling or power equipment to within an acceptable level of risk, thereby achieving lower energy consumption while avoiding unnecessary maintenance costs. For colocation data centers, embodiments allow the colocation operator to maximize revenue without incurring too much risk of a customer outage due to cooling system or power system failure.
The performance-weighted redundancy value can be based on performance measurements from instrumentation that can be readily acquired and installed. Embodiments can be applied to cooling systems of all sizes and configurations, from very large data centers with hundreds of cooling units to small cellular base stations that typically have just two air conditioners and an outdoor air economizer fan.
The performance-weighted redundancy value can be easily understood by a cooling system operator. The values of the performance-weighted redundancy value can be presented in traditional redundancy terms (e.g., N+1, N+2, 2N, 2(N+1)) and they can be directly related to compliance with design standards such as TIA-942, supra. Alternatively, the performance-weighted redundancy value can be presented as a ratio or percentage, either of the number of cooling units or of the amount of cooling capacity. Another advantage is that embodiments yield one or more metrics that are actionable for the user and may be used for “what-if” type scenarios to determine a more cost-effective repair strategy than traditional unit counting.
The techniques herein do not require an automatic control system. A monitoring and alerting/reporting system is advantageous but not essential. For example, the disclosed metrics can be calculated based on historical data and/or correlated to known thermal events, to support business decisions about implementing additional environmental module capacity.
Certain embodiments benefit from more instrumentation than is typically factory-installed in cooling units. In particular, for certain types of cooling equipment, such embodiments benefit from power monitoring instrumentation and/or flow monitoring instrumentation.
In one embodiment, fan 250 is a centrifugal fan driven by an alternating current (AC) induction motor. The induction motor may have a variable speed (frequency) drive (VSD) 255 for changing its speed. An optional sensor 260 measures return air temperature, and an optional sensor 270 measures discharge air temperature.
Sensors 222, 224, 226, 260 and/or 270 may be for example wireless sensors that acquire and transmit information wirelessly, or they may be connected via wires or optical (e.g., fiber optic) connections; for example, sensors 222, 224, 270 and 260 may be probes tethered to a local host 280.
Sensors 222, 224, 226, 260 and/or 270 send information to a host computer 290. It should be understood that host computer 290 receives information from more than one set of sensors 222, 224, 226, 260 and/or 270 and is thus typically located remotely from AHU 200, but in embodiments host computer 290 may form part of, or be located with, one AHU 200 while receiving temperature information from sensors of other AHUs 200. In one example, sensors 222, 224, 226, 260 and/or 270 transmit wirelessly through a wireless network gateway to host computer 290. In another example, sensors 222, 224, 226, 260 and/or 270 pass at least some part of the information to local host 280, which relays the temperature information to host computer 290, either wirelessly or through wired or optical connections. Alternatively, some of the information can be passed directly from sensors 222, 224, 226, 260 and/or 270 to host computer 290, while other information is transmitted first to local host 280 and relayed to host computer 290. In other embodiments, AHU 200 has capability to monitor itself, and formulates one or more operational health and/or self-diagnostic metrics that can be used in place of raw data from sensors to determine operational health of AHU 200.
Host computer 290 monitors the information received from AHUs 200 and calculates an operational redundancy value for the system that includes AHUs 200 (e.g., data center 10). The operational redundancy value, sometimes referred to herein as a Coefficient of Redundancy or RV, is calculated based on operationally weighted performance of each AHU 200, instead of a heat extraction design specification or capacity of each AHU 200. The operationally weighted performance is based on sensor data (e.g., from sensors 222, 224, 226, 260 and/or 270) or operational health and/or self-diagnostic metrics of each AHU 200. Each AHU 200 may perform above or below its stated heat extraction design capacity, and performance of an AHU 200 typically degrades over time due to a variety of wearout mechanisms.
An operational redundancy value may be based on theoretical load on the system, or on one or more measurements of system load. For example, when the system is a data center that requires cooling, the load may be measured by assessing power consumed by the data center, or by measuring and adding the heat removed by the AHUs. The load may be expressed in terms of an equivalent number of AHUs required to remove the heat, with excess AHUs being considered redundant.
A. Modules Operating Separately
An example calculation of a redundancy value assumes a number T of environmental maintenance modules, in this case cooling units, of similar capacity that operate separately from one another in terms of heat dissipation capability. For example, each cooling unit may have a dedicated condenser. The modules may operate together in other respects, such as being controlled by a common control system or sharing a common power source, but the efficiency of each unit does not depend significantly on the efficiency of the other units. This case is illustrated for example in
Part of the redundancy value calculation involves calculating a number of available environmental maintenance modules, S, based on the number and operational condition of the modules that are present and operating. Sensors that evaluate performance of each AHU provide information to a host computer, which calculates a coefficient of performance (COP) or weight Wi associated with each AHU i (where i is an index value). Certain COP calculations and appropriate values are specified in standards such as the American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) Standard 90.1.
The weights used to define S can be computed based on the measured performance of the cooling units relative to a standard or expectation. In certain embodiments, Wi has a value from zero (the AHU is effectively broken; it removes no heat) to 1 (the AHU is performing at its design capability). In other embodiments, Wi may be allowed to have a value greater than one (the AHU's performance exceeds its design capability). For a direct-expansion cooling unit, the weight can be a function of the coefficient of performance (COP) of the unit. Thus, Wi may be the ratio of the unit's heat extraction rate in thermal kilowatts (kWt) to its electrical consumption (kWe). In embodiments, Wi may be calculated in other ways, such as by averaging over time, or as a binary function that compares a COP of AHU i with a minimum performance threshold, MinStdCOP. In these embodiments, Wi=1 when COP>MinStdCOP, otherwise Wi=0. The performance threshold MinStdCOP can be determined in a variety of ways, such as basing MinStdCOP on design capacity of the AHU, evaluating the kWt/kWe ratio and the like. For example, a useful value of MinStdCOP is the minimum standard level defined by ASHRAE Standard 90.1. For a medium-capacity, air-cooled direct-expansion (DX) unit, this value is 2.1, meaning that the cooling rate of the unit (kWt) should be at least 210% of its electrical power consumption (kWe). Units with a COP below MinStdCOP are said to be poorly performing; they are operating with a sub-standard level of efficiency.
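The binary weighting just described can be sketched as follows, using the ASHRAE 90.1 value of 2.1 cited above for a medium-capacity, air-cooled DX unit. The unit names and measured COPs are hypothetical.

```python
# Minimum standard COP cited in the text for a medium-capacity,
# air-cooled DX unit (ASHRAE Standard 90.1).
MIN_STD_COP = 2.1


def binary_weight(cop: float, min_std_cop: float = MIN_STD_COP) -> int:
    """W_i = 1 when COP > MinStdCOP, otherwise 0 (a poorly performing unit)."""
    return 1 if cop > min_std_cop else 0


cops = {"AHU-1": 3.0, "AHU-2": 2.4, "AHU-3": 1.7}  # hypothetical measured COPs
weights = {name: binary_weight(c) for name, c in cops.items()}
print(weights)  # {'AHU-1': 1, 'AHU-2': 1, 'AHU-3': 0}
```

Note the strict inequality: a unit sitting exactly at MinStdCOP receives a weight of zero, matching the "Wi=1 when COP>MinStdCOP" rule.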
Alternatively, in embodiments, MinStdCOP could be variable, and dependent on exogenous variables such as outdoor air temperature, return air temperature, discharge air temperature, or any other parameter that affects the performance (e.g., COP) of the cooling unit. Then MinStdCOP could be defined as a fraction of the expected COP.
Partial or complete failure of a cooling unit is known to have an adverse impact on COP, which is why COP is a good choice for a DX cooling unit. To attenuate noise, COP may be computed as the average or sum of heat extraction rate over a period of time divided by the average or sum of electrical energy consumption over the same period of time. For DX cooling units, the weights can be a binary function of the COP, a linear function of the COP, or any other monotonically increasing function of the COP.
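The noise-attenuating COP, a linear weight, and a variable threshold can be sketched as below. The window-summed COP follows the text directly; the linear form (COP divided by a design COP, clipped at zero) and the 0.8 fraction used for the variable MinStdCOP are illustrative assumptions, and all function names are hypothetical.

```python
from typing import Sequence


def aggregated_cop(cooling_kwt: Sequence[float], power_kwe: Sequence[float]) -> float:
    """COP over a window: summed heat extraction divided by summed power.

    Summing before dividing damps noise in individual samples."""
    total_power = sum(power_kwe)
    if total_power <= 0:
        raise ValueError("no power consumption recorded in window")
    return sum(cooling_kwt) / total_power


def linear_weight(cop: float, design_cop: float) -> float:
    """Hypothetical linear weight: COP relative to the unit's design COP,
    clipped at zero. Values above 1 (better than design) are allowed."""
    return max(0.0, cop / design_cop)


def variable_min_std_cop(expected_cop: float, fraction: float = 0.8) -> float:
    """MinStdCOP as a fraction of the COP expected under current conditions
    (e.g. outdoor or return air temperature). 0.8 is an arbitrary example."""
    return fraction * expected_cop


cop = aggregated_cop([100.0, 110.0, 90.0], [40.0, 40.0, 40.0])  # 300 / 120 = 2.5
print(cop, linear_weight(cop, design_cop=3.0))
# With an expected COP of 3.5 the threshold is about 2.8, so this unit
# would count as poorly performing under the variable threshold.
print(cop > variable_min_std_cop(expected_cop=3.5))
```

How the expected COP itself is obtained from exogenous variables is left open here; it is passed in as a number.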
A weight Wi can also be a calculated or modeled probability that an environmental maintenance module will continue to operate for an additional period of time. This probability is typically called a survival function, and could be a function of the COP or exogenous variables such as a type, make, or model of environmental maintenance module.
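One way to realize a survival-function weight is a conditional survival probability under a parametric lifetime model. The sketch below swaps in a Weibull model, which the text does not specify; the shape and scale values are hypothetical stand-ins for parameters that might be fitted per make or model of unit.

```python
import math


def weibull_survival_weight(age_h: float, horizon_h: float,
                            shape: float, scale_h: float) -> float:
    """Probability that a unit which has already run age_h hours keeps running
    for another horizon_h hours, under an assumed Weibull lifetime model
    S(t) = exp(-(t / scale)^shape). Weight = S(age + horizon) / S(age)."""
    def log_survival(t: float) -> float:
        return -((t / scale_h) ** shape)
    return math.exp(log_survival(age_h + horizon_h) - log_survival(age_h))


# Hypothetical parameters: a unit with 20,000 running hours, 30-day horizon.
w = weibull_survival_weight(age_h=20_000, horizon_h=720, shape=1.5, scale_h=60_000)
print(round(w, 3))
```

With shape = 1 the model reduces to an exponential lifetime, and the weight no longer depends on age; a shape above 1 makes older units progressively less likely to survive the horizon, which is one way to reflect wear-out.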
Having determined Wi for each AHU i, an effective number S of AHUs at the system level is:
$$S = \sum_{i=1}^{T} W_i \qquad \text{(Eq. 1)}$$
In embodiments, to provide a conservative measure of redundancy, S may be rounded down to the nearest integer.
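Eq. 1 with the optional conservative rounding is a one-line sum; the sketch below makes the rounding explicit. The function name and the weights are hypothetical.

```python
import math


def effective_units(weights: list[float], conservative: bool = True) -> float:
    """Eq. 1: S = sum of W_i over all T units, optionally rounded down."""
    s = sum(weights)
    return math.floor(s) if conservative else s


weights = [1.0, 0.9, 0.75, 1.0, 0.4]  # hypothetical W_i for T = 5 units
print(effective_units(weights, conservative=False), effective_units(weights))
```

Rounding down (4.05 becomes 4) never credits the system with a fraction of a unit it may not have, which is why it gives the more conservative redundancy figure.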
Next in the calculation of a redundancy value is determination of a load L and its expression as a required capacity to maintain the environmental value (e.g., temperature). In embodiments, L is determined in terms of equivalent AHUs required by first calculating a cooling rate hi for each AHU i, typically averaging the cooling rate over some time interval. A sum of the cooling rates hi provides a net cooling rate H:
$$H = \sum_{i=1}^{T} h_i \qquad \text{(Eq. 2)}$$
For systems that use environmental maintenance modules with identical design capacity, H is divided by the design capacity (and optionally, for a conservative measure, rounded up to the nearest integer) to get L, representing the required capacity:
$$L = \operatorname{int}\!\left(\frac{H}{\text{design capacity}}\right) + 1 \qquad \text{(Eq. 3)}$$
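Eqs. 2 and 3 for identical units can be sketched directly; the per-unit cooling rates are hypothetical time-averaged values and the function names are illustrative.

```python
def net_cooling_rate(cooling_rates_kwt: list[float]) -> float:
    """Eq. 2: H = sum of the (time-averaged) cooling rates h_i of all T units."""
    return sum(cooling_rates_kwt)


def required_units_identical(h_total_kwt: float, design_capacity_kwt: float) -> int:
    """Eq. 3: L = int(H / design capacity) + 1.

    Adding one after truncation is conservative: it never understates L,
    and adds a full unit even when H is an exact multiple of the capacity."""
    return int(h_total_kwt / design_capacity_kwt) + 1


h = net_cooling_rate([80.0, 95.0, 60.0, 70.0])  # hypothetical averaged h_i, kWt
print(h, required_units_identical(h, design_capacity_kwt=100.0))  # 305.0 4
```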
For systems that use environmental maintenance modules with differing heat extraction design specifications or capacities, required capacity L is the largest number of available cooling units that are collectively required to provide cooling rate H. In these embodiments, AHUs are considered in increasing order of design capacity, that is, the available units with the lowest capacity are considered first. The design capacity of each AHU is subtracted from H until the result is negative, with required capacity L being the number of AHUs subtracted to obtain the first negative result. This is a conservative result because it makes L as large as possible, leaving fewer AHUs available for redundancy.
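The ascending-subtraction procedure for mixed design capacities can be sketched as below. The text does not say what L should be when even all units together do not drive the remainder negative; raising an error in that case is an assumption of this sketch, as is the function name.

```python
def required_units_mixed(h_total_kwt: float, design_capacities_kwt: list[float]) -> int:
    """Required capacity L for units with differing design capacities.

    Capacities are taken in ascending order and subtracted from H until the
    remainder first becomes negative; L is the number of units subtracted.
    Taking the smallest units first makes L as large as possible."""
    remaining = h_total_kwt
    for count, capacity in enumerate(sorted(design_capacities_kwt), start=1):
        remaining -= capacity
        if remaining < 0:
            return count
    # Not defined in the text: the load is not exceeded by all units combined.
    raise ValueError("load exceeds the combined design capacity of the units")


# H = 200 kWt; sorted capacities 60, 80, 100, 120 leave 140, 60, then -40.
print(required_units_mixed(200.0, [120.0, 60.0, 100.0, 80.0]))  # 3
```

When all capacities are equal this procedure reproduces Eq. 3; for example, H = 200 kWt with 100 kWt units gives remainders 100, 0, then -100, so L = 3 = int(2) + 1.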
Once L is determined, the operational redundancy value RV is determined as:
$$RV = S - L \qquad \text{(Eq. 4)}$$
The operational redundancy value RV can be interpreted to provide useful conclusions about the system that it characterizes. A negative RV implies that poorly performing units are carrying the burden of maintaining the environmental value. Negative RV implies a high level of operational risk. That is, the system may be unable to maintain the environmental value at all; if it does, even a slight degradation in performance or any additional load may make the system unable to maintain the environmental value. An RV that is greater than or equal to zero, but less than a number of redundant units desired for the type of system being characterized, means that there is less redundancy available than is desired. While the system may be operating normally, it does not have the robustness normally expected for the type of system or for its design intent. Such levels of RV imply a medium level of operational risk. An RV that meets or exceeds the number of redundant units desired for the type of system being characterized implies an acceptable level of operational risk.
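An end-to-end sketch for T identical units operating separately strings Eqs. 1 through 4 together and maps the result onto the three risk levels just described. It assumes binary COP weights with the 2.1 threshold cited earlier; the COPs, cooling rates, and desired redundancy are hypothetical.

```python
import math

MIN_STD_COP = 2.1  # threshold cited in the text for a medium-capacity DX unit


def operational_rv(cops: list[float], cooling_rates_kwt: list[float],
                   design_capacity_kwt: float) -> int:
    """RV for T identical units that operate separately."""
    weights = [1 if cop > MIN_STD_COP else 0 for cop in cops]  # binary W_i
    s = math.floor(sum(weights))                               # Eq. 1, rounded down
    h = sum(cooling_rates_kwt)                                 # Eq. 2
    l = int(h / design_capacity_kwt) + 1                       # Eq. 3
    return s - l                                               # Eq. 4


def risk_level(rv: float, desired_redundant_units: int) -> str:
    """Qualitative operational risk implied by RV."""
    if rv < 0:
        return "high"        # poorly performing units are carrying the load
    if rv < desired_redundant_units:
        return "medium"      # operating, but with less redundancy than desired
    return "acceptable"


cops = [3.0, 2.6, 1.8, 2.9, 2.2, 1.5]              # two units below MinStdCOP
rates = [80.0, 70.0, 20.0, 85.0, 75.0, 10.0]       # H = 340 kWt, so L = 4
rv = operational_rv(cops, rates, design_capacity_kwt=100.0)
print(rv, risk_level(rv, desired_redundant_units=1))  # 0 medium
```

By design counting alone, six 100 kWt units against a 340 kWt load would look like N+2; once the two poorly performing units are discounted, the operational redundancy is zero.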
The operational redundancy value can also be computed using physical units of heat transfer, or as a percentage of total units or of total cooling capacity. When computed using units of heat transfer, the operational redundancy value may be designated as RVh; computed as a percentage of total units it may be designated as RVu; computed as a percentage of total cooling capacity it may be designated as RVc. By extension, calculation of RV for other types of systems would involve determining and converting actual results of system components over time, and calculating corresponding sums and/or ratios of the quantities that are exemplified by calculations related to cooling systems in Eqs. 5-8 below.
The operational redundancy value RV as computed according to Eq. 4 above is an integer value of redundant cooling units, and the variable RV as used herein without a subscript is assumed to refer to RV as computed by Eq. 4. However, in embodiments, it is also possible to calculate an operational redundancy value in other terms. For example, to compute an operational redundancy value in units of heat transfer (e.g., kWt), first an available cooling capacity Sh in heat transfer units (e.g., kWt) is calculated using the following equation:
$$S_h = \sum_{i=1}^{T} W_i C_i \qquad \text{(Eq. 5)}$$
where Ci is the design capacity of cooling unit i in units of heat transfer (e.g., kWt).
Then RVh is computed by the following equation:
$$RV_h = S_h - \sum_{i=1}^{L} C_i \qquad \text{(Eq. 6)}$$
where the values of C in Eq. 6 are sorted in ascending order as the index i goes from 1 to the load L, as described above.
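Eqs. 5 and 6 can be sketched as follows, taking L as an input (for instance from the ascending-subtraction step). The weights and design capacities are hypothetical.

```python
def available_capacity_kwt(weights: list[float], capacities_kwt: list[float]) -> float:
    """Eq. 5: S_h = sum over all T units of W_i * C_i."""
    if len(weights) != len(capacities_kwt):
        raise ValueError("need one weight per unit")
    return sum(w * c for w, c in zip(weights, capacities_kwt))


def redundancy_heat_units_kwt(weights: list[float], capacities_kwt: list[float],
                              required_units: int) -> float:
    """Eq. 6: RV_h = S_h minus the sum of the L smallest design capacities.

    Sorting ascending mirrors the conservative ordering used to find L."""
    smallest = sorted(capacities_kwt)[:required_units]
    return available_capacity_kwt(weights, capacities_kwt) - sum(smallest)


weights = [1.0, 1.0, 0.5, 1.0]           # hypothetical W_i
capacities = [60.0, 80.0, 100.0, 120.0]  # hypothetical C_i, kWt
print(available_capacity_kwt(weights, capacities))        # 310.0
print(redundancy_heat_units_kwt(weights, capacities, 3))  # 70.0
```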
To compute an operational redundancy value in units of percent of total units, certain embodiments use the following equation:
$$RV_u = \frac{S - L}{T} \times 100\% \qquad \text{(Eq. 7)}$$
where S, L and T are as described above.
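To compute an operational redundancy value in units of percent of total cooling capacity, certain embodiments use the following equation:
$$RV_c = \frac{RV_h}{\sum_{i=1}^{T} C_i} \times 100\% \qquad \text{(Eq. 8)}$$
where RVh is given by Eq. 6 and Ci is the design capacity of cooling unit i.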
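The two percentage forms can be sketched as below, assuming RVu = 100·(S − L)/T and RVc = 100·RVh/ΣCi over all T units. Both forms are assumptions inferred from the stated units, and the function names and inputs are hypothetical.

```python
def redundancy_percent_units(s: float, l: int, t: int) -> float:
    """RV_u, assumed here to be 100 * (S - L) / T."""
    return 100.0 * (s - l) / t


def redundancy_percent_capacity(rv_h_kwt: float, capacities_kwt: list[float]) -> float:
    """RV_c, assumed here to be 100 * RV_h / (sum of C_i over all T units)."""
    return 100.0 * rv_h_kwt / sum(capacities_kwt)


print(redundancy_percent_units(4, 3, 5))                               # 20.0
print(redundancy_percent_capacity(70.0, [60.0, 80.0, 100.0, 120.0]))
```

Expressing redundancy as a percentage makes systems of different sizes comparable, for example a large data center and a small cellular site.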
B. Modules Operating in Hierarchical Designs
Some systems of environmental maintenance modules (such as, but not limited to, cooling systems) have a hierarchical design. Where the environmental maintenance modules are cooling units, the units that extract heat from the controlled space are in turn served by other units that reject that heat from the controlled-space cooling units to the atmosphere. One example of this design is a system where direct-expansion (DX) space cooling units are served by one or more dry coolers. A second example of this design is a system where chilled-water space cooling units are served by one or more chiller plants (e.g., as shown in
Consider a system in which eight environmental maintenance modules are served by two master units, and assume without loss of generality that one master unit serves four of the modules while a second master unit serves the other four. The effective number S of available modules is then:
$$S = W_{D,1}\sum_{i=1}^{4} W_{C,i} + W_{D,2}\sum_{i=5}^{8} W_{C,i} \qquad \text{(Eq. 9)}$$
where the D subscript refers to a particular one of the master units 340, and the C subscript refers to a particular one of the environmental maintenance modules 330. The weights of the master units 340 can be computed in a similar manner to the weights of environmental maintenance modules that operate separately from one another, where the weight can be a function of a COP or similar metric (e.g., heat transfer divided by power consumption), or another performance metric such as expected cooling rate of the master unit 340. If an expected cooling rate were used instead of, or in addition to COP, its value could be dependent on exogenous variables such as outdoor temperature and humidity.
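Eq. 9 can be sketched with each master unit's weight scaling the summed weights of the modules it serves. Allowing any number of master units, rather than the two in Eq. 9, is an extrapolation made for this sketch; the weights are hypothetical.

```python
def hierarchical_effective_units(groups: list[tuple[float, list[float]]]) -> float:
    """Eq. 9, generalized: S = sum over master units d of W_D,d * sum(W_C,i)
    for the modules i served by master unit d.

    `groups` is a list of (master_weight, [module weights]) pairs."""
    return sum(w_master * sum(module_ws) for w_master, module_ws in groups)


# Eight modules served by two master units, four each, as in Eq. 9.
groups = [
    (1.0, [1.0, 1.0, 0.0, 1.0]),  # master unit 1 healthy; one module failed
    (0.5, [1.0, 1.0, 1.0, 1.0]),  # master unit 2 degraded to half weight
]
print(hierarchical_effective_units(groups))  # 5.0
```

A degraded master unit discounts every module behind it, so in this example four healthy modules contribute only two effective units.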
The operational redundancy values calculated herein can be used to characterize robustness of systems in a wide variety of proactive and reactive ways. For example, in an embodiment, environmental maintenance modules of a system can be monitored in real time, operational weights for each of the modules can be determined, and available capacity can be calculated from the operational weights. A load on the system can be measured or assumed, and an operational redundancy value can be calculated based on a difference between the available capacity and required capacity to meet the load.
The operational redundancy value can form the basis of messages to a system operator. In particular, the operational redundancy value can be compared to one or more thresholds to assign an alert level to the system, and the messages may include only the alert, or may also contain the operational redundancy value itself, and/or related information about specific environmental maintenance modules, system loads and the like. Messages may be sent in the form of items displayed on a computer monitor, or may be telephone or Web based alerts such as emails, text messages, and the like.
For example, as noted above, a negative operational redundancy value implies that poorly performing units are carrying the burden of maintaining the environmental value, and implies a high level of operational risk. A system that calculates an operational redundancy value can compare the result to zero and assign a “Red” alert level (or other color or label) based on the operational redundancy value being negative. A message may be sent to the system operator when the assigned alert level is one of a selected subset of alert levels. For example, a message might include the system level “Red” alert as well as indications of which environmental maintenance modules are performing poorly, abnormal load conditions and the like. The operator might be prompted to take actions such as reducing load, turning on additional environmental maintenance modules, notifying a supervisor and the like. An RV that is greater than or equal to zero, but less than a number of redundant units desired for the type of system being characterized, means that there is less redundancy available than is desired, and implies a medium level of operational risk. A system that calculates RV can compare the result to zero and/or a desired number of redundant units, and assign a “Yellow” alert level (or other color or label) based on RV being in this range. The system may provide similar messages based on selected alert levels to prompt similar responses by the operator as those discussed above. Similar actions can be taken on the basis of operational redundancy values other than the unsubscripted RV.
An RV that meets or exceeds the number of redundant units desired for the type of system being characterized implies an acceptable level of operational risk. A system that calculates RV can compare the result to a desired number of redundant units, and assign a “Green” alert level (or other color or label) based on RV being in this range. An RV that significantly exceeds the number of redundant units desired for the type of system being characterized implies both an acceptable level of operational risk and a possibility that some units of excess capacity could be shut down (e.g., to reduce operational cost, or for maintenance), but still leave the system with enough redundancy to maintain the acceptable level of operational risk. A system that calculates RV can compare the result to a desired number of redundant units, and assign a “Blue” alert level (or other color or label) based on RV being in this range. Similar actions can be taken on the basis of operational redundancy values other than the unsubscripted RV.
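The four alert levels described above can be sketched as a simple threshold function. This is a minimal illustration under stated assumptions, not the method of the embodiments: the `blue_margin` parameter (standing in for RV "significantly" exceeding the desired number of redundant units R) and the set of levels that trigger an operator message are both hypothetical choices.

```python
from enum import Enum
from typing import Optional


class AlertLevel(Enum):
    RED = "Red"        # RV < 0: poorly performing units carry the load (high risk)
    YELLOW = "Yellow"  # 0 <= RV < R: less redundancy than desired (medium risk)
    GREEN = "Green"    # RV >= R: acceptable operational risk
    BLUE = "Blue"      # RV well above R: some excess units could be shut down


def assign_alert_level(rv: float, desired_redundant: float,
                       blue_margin: float = 2.0) -> AlertLevel:
    """Map an operational redundancy value RV to an alert level.

    blue_margin is hypothetical: how far above R counts as "significantly exceeds".
    """
    if rv < 0:
        return AlertLevel.RED
    if rv < desired_redundant:
        return AlertLevel.YELLOW
    if rv >= desired_redundant + blue_margin:
        return AlertLevel.BLUE
    return AlertLevel.GREEN


# Hypothetical choice of the alert levels that trigger a message to the operator.
NOTIFY_LEVELS = {AlertLevel.RED, AlertLevel.YELLOW}


def alert_message(rv: float, desired_redundant: float) -> Optional[str]:
    """Return an operator message for notifying levels, otherwise None."""
    level = assign_alert_level(rv, desired_redundant)
    if level in NOTIFY_LEVELS:
        return f"{level.value} alert: operational redundancy RV = {rv:g}"
    return None
```

With a desired redundancy of one unit, for instance, RV = −1 yields a "Red alert" message, while RV = 2 is classified Green and produces no message.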
In another embodiment, a monitoring business can implement a monitoring system as a service to a data center business. The monitoring business may add sensors to existing environmental maintenance modules and/or access information already available from the modules, periodically calculate an operational redundancy value, send messages and/or alerts, store the operational redundancy value calculations or provide other services that help the data center business manage its environmental maintenance resources.
In another embodiment, operational redundancy values can be generated from historical data of a system, and the operational redundancy values (and/or alerts generated from the values) can be correlated to system events such as failures. In this embodiment, correlation of operational redundancy values to system events can be used to inform decision-making about investments in system capacity (e.g., whether to invest in additional environmental maintenance modules or master units) and/or monitoring capacity (e.g., whether to invest in sensing and analysis equipment that can produce operational redundancy values and alerts in real time).
In still another embodiment, operational redundancy values can be generated based on combinations of historical data of a system and assumptions about the system, as “what if” exercises. For example, data center operators generally strive to sell or rent as much space in data centers as possible, but use of such space may be constrained by the data center's ability to remove heat from both existing and proposed operations, with or without redundant capacity. If the load L is expressed as a number of environmental maintenance modules sufficient to meet a cooling need (e.g., see Eqs. 2 and 3 above), and R is the number of redundant environmental maintenance modules required for an expected level of redundancy (as per an applicable tier requirement in TIA-942), then the excess number of cooling units E can be expressed as:
E=T−L−R Eq. 10
E thus represents cooling capacity that can be considered available to meet cooling needs for new equipment that may be added to a data center, or as additional redundancy/security for existing IT equipment. When the addition of servers to an existing data center is considered, it is highly advantageous to evaluate E using actual data for the data center, to minimize the chance that unwarranted assumptions are made about the cooling capacity. If some of what appears to be excess capacity is poorly performing, it should not be sold until the performance of the environmental maintenance modules has been restored above a minimum standard level indicative of sound operation. The number of available or allowable units out of the excess that can be sold, denoted A, is therefore computed from S rather than T, and is equal to the maximum of S−L−R and 0:
A=max(0,S−L−R) Eq. 11
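Eqs. 10 and 11 reduce to two one-line functions. The sketch below assumes that T, S, L and R are all already expressed as numbers of environmental maintenance modules; the function names are illustrative only.

```python
def excess_units(total: float, load: float, redundant: float) -> float:
    """Eq. 10: E = T - L - R, counting every installed module."""
    return total - load - redundant


def allowable_units(healthy: float, load: float, redundant: float) -> float:
    """Eq. 11: A = max(0, S - L - R), counting only modules performing to standard."""
    return max(0, healthy - load - redundant)
```

For a hypothetical room with T=20 installed modules, of which S=18 perform to standard, a load of L=12 and a desired redundancy of R=2, the sketch gives E=6 but A=4: the two poorly performing modules look like excess capacity but should not be sold.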
In yet another “what if” exercise, operational redundancy values may be calculated from operational data as shown in the above equations, but with weights Wi of specific environmental maintenance modules excluded from the calculation of available cooling capacity S. When S calculated in this manner is then utilized in Eq. 4 to generate RV, the value of RV reflects the operational redundancy that would exist if the specific environmental maintenance modules were not operating. The resulting value of RV can then be utilized to understand how much redundancy would remain in the system should the specific modules be taken offline for repair or replacement.
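A minimal sketch of this exclusion exercise, assuming the operational weights Wi and the load L (in modules) have already been computed. The `excluded` indices, a hypothetical parameter, identify the modules that would be taken offline.

```python
from typing import Iterable, Sequence


def redundancy_without(weights: Sequence[float], load: float,
                       excluded: Iterable[int] = ()) -> float:
    """What-if RV (Eq. 4), leaving the modules at indices `excluded` out of S (Eq. 1)."""
    skip = set(excluded)
    s = sum(w for i, w in enumerate(weights) if i not in skip)
    return s - load
```

For example, a hypothetical room of 14 healthy modules (all Wi=1) carrying L=12 has RV=2, and taking two of them offline for repair leaves RV=0.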
The specific details of particular aspects of the present invention may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
It should be understood that the present invention as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Software may be stored, for example, in non-transitory, computer readable media, and when executed by a processor, will cause the processor to execute calculations and methods such as discussed above. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present invention using hardware and a combination of hardware and software.
A step 402 monitors environmental maintenance modules to receive operational data. Step 402 may be done in real time or may be done by gathering stored data from the environmental maintenance modules. The operational data may be raw data from sensors of the environmental maintenance modules, or may be one or more operational health and/or self-diagnostic metrics provided by the environmental maintenance modules. An example of step 402 is receiving data from any of sensors 222, 224, 226, 260 and/or 270.
A step 404 determines an operational weight Wi for each of the environmental maintenance modules based on the operational data. An example of step 404 is calculating the operational weights from the operational data, utilizing a lookup table to determine the operational weights from the operational data, or comparing the operational data with one or more thresholds to determine the operational weights Wi.
A step 406 computes an available capacity metric for the system based on a sum of the operational weights Wi. An example of step 406 is adding together the operational weights Wi to form a value of S (Eq. 1). S is the effective number of cooling units that are operating to some minimal performance standard; that is, S is an operational value, not a design assumption.
A step 408 determines a system capacity that is required to maintain an environmental value within a specified range, given a system load. The load may be measured or estimated. An example of step 408 is calculating a load L (Eq. 3) expressed as a number of environmental maintenance modules required to maintain the environmental value.
A step 410 of method 400 calculates an operational redundancy value based on a difference between the available capacity metric and the required system capacity. One example of step 410 is subtracting L from S to form a redundancy value RV (as per Eq. 4 above); other examples include expressing available capacity and load in differing units that relate to module performance, and calculating appropriate sums and/or ratios thereof, as per Eqs. 5-8 above.
Method 400 optionally returns to step 402 after step 410, but in embodiments, an optional step 412 provides a message based on the operational redundancy value. In embodiments, the message is simply storage of the calculated operational redundancy value; alternatively, the message may be display of the operational redundancy value, and/or an alert based thereon, to an operator of the system. If optional step 412 is performed, method 400 thereafter returns to step 402.
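Steps 404 through 410 of method 400 can be sketched as follows. The sketch assumes the binary COP weighting and the smallest-units-first load count used in the worked example below (MinStdCOP = 2.1). It also assumes that a cumulative capacity exactly equal to H counts as meeting the load. The monitoring of step 402 and the messaging of step 412 are omitted, and the function names are hypothetical.

```python
from typing import Sequence

MIN_STD_COP = 2.1  # MinStdCOP used in the worked example (ASHRAE Standard 90.1)


def binary_weight(cop: float, min_cop: float = MIN_STD_COP) -> int:
    """Step 404: Wi = 1 if the unit's COP exceeds MinStdCOP, otherwise 0."""
    return 1 if cop > min_cop else 0


def required_units(design_capacities: Sequence[float], heat_load: float) -> int:
    """Step 408 (Eq. 3): number of smallest units whose summed capacity covers H."""
    total = 0.0
    for count, capacity in enumerate(sorted(design_capacities), start=1):
        total += capacity
        if total >= heat_load:  # assumption: exactly meeting H counts as covered
            return count
    raise ValueError("heat load exceeds the combined design capacity of all units")


def operational_redundancy(weights: Sequence[float],
                           design_capacities: Sequence[float],
                           heat_load: float) -> float:
    """Steps 406-410: RV = S - L (Eqs. 1, 3 and 4)."""
    s = sum(weights)                                      # step 406, Eq. 1
    load = required_units(design_capacities, heat_load)  # step 408, Eq. 3
    return s - load                                       # step 410, Eq. 4
```

Take the 13 design capacities of Example 1 with H=927 kW. Give the two units at COP 1.44 and 1.96 a weight of 0 and the other eleven a weight of 1. Then `required_units` returns 12 and `operational_redundancy` returns −1, matching the values worked out in the text.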
In method 420, a step 422 monitors environmental maintenance modules to receive operational data relative to heat transfer capacity. Step 422 may be done in real time or may be done by gathering stored data from the environmental maintenance modules. The operational data may be raw data from sensors of the environmental maintenance modules, or may be one or more operational health and/or self-diagnostic metrics provided by the environmental maintenance modules. An example of step 422 is receiving data from any of sensors 222, 224, 226, 260 and/or 270.
A step 424 determines an operational weight Wi for each of the environmental maintenance modules based on the operational data. Like step 404 of method 400 above, an example of step 424 is calculating the operational weights from the operational data, utilizing a lookup table to determine the operational weights from the operational data, or comparing the operational data with one or more thresholds to determine the operational weights Wi.
A step 426 computes an available heat transfer capacity based on a sum of the operational weights multiplied by the respective design capacities of the environmental maintenance modules. An example of step 426 is multiplying the operational weight Wi for each environmental maintenance module by the design capacity of that module, and adding together the products to form a value of Sh (Eq. 5). Sh is the effective amount of available heat transfer capacity at the system level; that is, Sh is an operational value, not a design assumption.
A step 428 determines a required capacity in terms of heat transfer, for the system to maintain the selected environmental variable within a specified range, given a system load. The load may be measured or estimated. An example of step 428 is calculating a load L (Eq. 3) expressed as a number of environmental maintenance modules required to maintain the selected environmental variable.
A step 430 of method 420 calculates an operational redundancy value based on a difference between the available heat transfer capacity, from step 426, and the required capacity, using information from step 428. One example of step 430 is subtracting the design capacities of the environmental maintenance modules needed to meet load L from Sh, to form a redundancy value RVh as per Eq. 6 above. That is, the design capacities Ci of the environmental maintenance modules, in units of heat transfer, are first summed from the smallest design capacity upward until the total exceeds the system heat load H; the number of modules so summed is L. The sum is then subtracted from Sh to yield RVh, as per Eq. 6.
Method 420 optionally returns to step 422 after step 430, but in embodiments, an optional step 432 divides the operational redundancy value RVh by the total of the design heat capacities of the environmental maintenance modules, to express the operational redundancy as a percentage of design capacity, RVc (as per Eq. 8 above). It will be appreciated that since the total of the design heat capacities is a constant for a given system (e.g., is unaffected by the operational health of the environmental maintenance modules), this amounts to scaling RVh and expressing it in different units (e.g., percentage) as RVc.
Method 420 optionally returns to step 422 after optional step 432, but in embodiments, an optional step 434 provides a message based on the operational redundancy value RVh. In embodiments, the message is simply storage of the calculated operational redundancy value RVh; alternatively, the message may be display of RVh, and/or an alert based thereon, to an operator of the system. If optional step 434 is performed, method 420 thereafter returns to step 422.
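Steps 426 through 432 of method 420 can be sketched in the same style. As in step 430, the required heat transfer capacity is taken as the sum of the smallest design capacities that together cover H. RVc is returned as a fraction rather than a percentage. This is an illustrative sketch with hypothetical names, not a reference implementation.

```python
from typing import Sequence, Tuple


def heat_redundancy(weights: Sequence[float],
                    design_capacities: Sequence[float],
                    heat_load: float) -> Tuple[float, float]:
    """Return (RVh, RVc) following steps 426-432 (Eqs. 5, 6 and 8)."""
    # Step 426 (Eq. 5): weighted sum of design capacities.
    sh = sum(w * c for w, c in zip(weights, design_capacities))
    # Step 428: design capacity of the smallest units that together cover H.
    needed = 0.0
    for c in sorted(design_capacities):
        needed += c
        if needed >= heat_load:
            break
    else:
        raise ValueError("heat load exceeds the combined design capacity of all units")
    rv_h = sh - needed                    # step 430 (Eq. 6), same units as H
    rv_c = rv_h / sum(design_capacities)  # step 432 (Eq. 8), as a fraction
    return rv_h, rv_c
```

Using the Example 1 data, the sketch returns RVh = −79 kW and RVc ≈ −0.074 (−7.4%).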
A step 442 monitors environmental maintenance modules to receive operational data. Step 442 may be done in real time or may be done by gathering stored data from the environmental maintenance modules. The operational data may be raw data from sensors of the environmental maintenance modules, or may be one or more operational health and/or self-diagnostic metrics provided by the environmental maintenance modules. An example of step 442 is receiving data from any of sensors 222, 224, 226, 260 and/or 270.
A step 444 determines an operational weight Wi for each of the environmental maintenance modules based on the operational data. An example of step 444 is calculating the operational weights from the operational data, utilizing a lookup table to determine the operational weights from the operational data, or comparing the operational data with one or more thresholds to determine the operational weights Wi.
A step 446 computes available system capacity based on a sum of the operational weights. An example of step 446 is adding together the operational weights to form a value of S (Eq. 1). S is the effective number of cooling units that are operating to some minimal performance standard; that is, S is an operational value, not a design assumption.
A step 448 determines a required capacity to maintain an environmental value within a specified range, given a system load. The load may be measured or estimated. An example of step 448 is calculating a load L (Eq. 3) of environmental maintenance modules required to maintain the environmental value.
A step 450 of method 440 calculates an operational redundancy percentage from the difference between the available capacity from step 446 and the required capacity, divided by the total number of environmental maintenance modules. One example of step 450 is subtracting L from S to form a redundancy value, and dividing by T to form RVu (as per Eq. 7 above). It will be appreciated that since the total number of environmental maintenance modules is a constant for a given system (e.g., is unaffected by the operational health of the environmental maintenance modules), this amounts to scaling RV and expressing it in different units (e.g., percentage) as RVu.
Method 440 optionally returns to step 442 after step 450, but in embodiments, an optional step 452 provides a message based on the operational redundancy value. In embodiments, the message is simply storage of the calculated operational redundancy value; alternatively, the message may be display of the operational redundancy value, and/or an alert based thereon, to an operator of the system. If optional step 452 is performed, method 440 thereafter returns to step 442.
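Step 450 of method 440 simply rescales RV by the total number of modules T (Eq. 7). The following is a minimal sketch, assuming T defaults to the number of weights supplied (a hypothetical convenience).

```python
from typing import Optional, Sequence


def redundancy_fraction_of_units(weights: Sequence[float], load: float,
                                 total: Optional[int] = None) -> float:
    """Step 450 (Eq. 7): RVu = (S - L) / T, with T defaulting to len(weights)."""
    t = len(weights) if total is None else total
    return (sum(weights) - load) / t
```

For Example 1 (S=11, L=12, T=13) it returns about −0.077, which Python's `.1%` format specification renders as −7.7%.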
The following sections provide examples of operational redundancy calculations according to Eqs. 1-8 above.
A room has 13 direct-expansion (DX) cooling units; thus T, the total number of cooling units available, is 13. During a one-week period, the average heat extraction rate from the 13 cooling units is H=927 kW. The coefficients of performance (COPs) of the 13 cooling units over that week are 1.44, 1.96, 2.33, 2.75, 2.93, 2.98, 3.08, 3.65, 3.80, 3.88, 4.00 and 4.19 respectively. The design capacities of the units corresponding to the COP values are 115, 79, 79, 79, 79, 68, 88, 68, 68, 79, 68, 79 and 115 kW respectively. This data will be used to calculate RV, RVh, RVu and RVc as described above.
First, an operational redundancy calculation based on number of redundant cooling units will be illustrated. According to Eq. 3 above, L=12 because the sum of the design capacity of the 11 smallest units is 834 kW (less than H) while the sum of the design capacity of the 12 smallest units is 949 kW (greater than H). Based on the capacity and design of these units, the minimum COP specified by ASHRAE Standard 90.1 is 2.1. By this metric, the units with COPs of 1.44 and 1.96 are poorly performing. Using a value of 2.1 for a minimum performance threshold MinStdCOP, and a binary function for the weights Wi, such that Wi=1 when COP>MinStdCOP, otherwise Wi=0, the number of healthy units, using Eq. 1 above, is S=11. Then, using Eq. 4 above, RV=11−12=−1.
Next, an operational redundancy calculation based on excess cooling capacity is illustrated. In this example, using Eq. 5 above, the available cooling capacity in heat transfer units Sh=870 kW (the design capacities of the poorly performing units are not counted). Then, using Eq. 6 above, an operational redundancy value in units of heat transfer is RVh=870−949=−79 kW.
Next, an operational redundancy calculation based on percent of total units is illustrated. Using S, L and T as defined above, operational redundancy value in percent of total units is RVu=(11−12)/13=−7.7%.
Next, an operational redundancy calculation based on total cooling capacity is illustrated. RVh is calculated as −79 kW just above, and the total sum of design capacities is 1064 kW. Thus, using Eq. 8 above, an operational redundancy value in units of percent total cooling capacity is RVc=−79/1064=−7.4%.
In each of the above calculations, since the operational redundancy values are negative, the risk level is high: poorly performing cooling units are required to remove the heat from the room.
If the two poorly performing units in Example 1 degrade in a way that causes their power consumption rates to be reduced in proportion to their degraded heat extraction rates, h, then the COPs of those units may stay above the MinStdCOP threshold of 2.1. This might happen in a dual-fan, dual-compressor unit if both a fan and a compressor fail at the same time. One way to handle this case is to declare such a unit as failed, and set its weight to something less than unity (e.g., zero) in the redundancy calculation. Another way to account for this type of failure is to use an improved calculation that may use a different performance metric than COP. In an embodiment, one alternative performance metric to COP is an expected heat extraction rate. The expected heat extraction rate could be a function of exogenous variables such as return air temperature of the cooling unit, power consumption of the cooling unit (if the cooling unit contains compressor(s)), outdoor air temperature (if the cooling unit rejects heat directly through a condenser), chilled water temperature (if the cooling unit rejects heat to a chiller plant), and/or condenser water temperature (if the cooling unit rejects heat to a dry cooler or cooling tower). For a cooling unit with compressorized cooling, such as a direct-expansion cooling unit, the following equation represents the expected heat transfer rate:
$$h_e = \mathrm{COP}_d \, P \, f_o(\mathrm{OAT}) \, f_r(\mathrm{RAT}) \qquad \text{Eq. 12}$$
where h_e is the expected heat transfer rate, COP_d is the coefficient of performance at the design operating point, P is the power consumption of the cooling unit, f_o( ) is a function that captures the effect of outdoor air temperature on the capacity of the unit, OAT is the outdoor air temperature, f_r( ) is a function that captures the effect of return air temperature on the capacity of the unit, and RAT is the return air temperature.
For a cooling unit with chilled water cooling, the following equation represents the expected heat transfer rate:
$$h_e = C \, f_c(\mathrm{ChWT}, \mathrm{Vlv}) \, f_r(\mathrm{RAT}) \qquad \text{Eq. 13}$$
where C is the heat extraction rate at the design operating point (i.e., the design capacity), f_c( ) is a function that captures the effect of chilled water temperature and chilled water valve position on unit capacity, ChWT is the chilled water temperature, and Vlv is the chilled water valve position.
When the expected heat extraction rate is used, the weights in certain redundancy calculations (e.g., Wi in Eq. 1 and Eq. 5) are computed as a function of the expected and actual heat extraction rates. For example, the weights Wi could be binary functions where Wi=0 if h_i < Pct × h_e, and Wi=1 otherwise, where h_i is the actual heat extraction rate of unit i and Pct is a configurable percentage (e.g., 75%).
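The expected-heat-extraction weighting can be sketched as below. Eqs. 12 and 13 are implemented directly. The text, however, does not specify the correction functions f_o( ), f_r( ) and f_c( ), so they are passed in as callables and default to 1.0 purely as placeholders; in practice they would presumably be derived from unit performance data. The function names and the default Pct of 0.75 (the example value above) are assumptions of this sketch.

```python
from typing import Callable

Correction1 = Callable[[float], float]
Correction2 = Callable[[float, float], float]


def _unity(*_args: float) -> float:
    """Placeholder correction: the text does not specify f_o, f_r or f_c."""
    return 1.0


def expected_heat_dx(cop_design: float, power: float, oat: float, rat: float,
                     f_o: Correction1 = _unity, f_r: Correction1 = _unity) -> float:
    """Eq. 12: h_e = COP_d * P * f_o(OAT) * f_r(RAT) for a compressorized unit."""
    return cop_design * power * f_o(oat) * f_r(rat)


def expected_heat_chw(design_capacity: float, chwt: float, valve: float, rat: float,
                      f_c: Correction2 = _unity, f_r: Correction1 = _unity) -> float:
    """Eq. 13: h_e = C * f_c(ChWT, Vlv) * f_r(RAT) for a chilled-water unit."""
    return design_capacity * f_c(chwt, valve) * f_r(rat)


def expected_rate_weight(actual: float, expected: float, pct: float = 0.75) -> int:
    """Binary weight: Wi = 0 if h_i < Pct * h_e, otherwise 1."""
    return 0 if actual < pct * expected else 1
```

Consider a hypothetical DX unit with COP_d = 3.0 drawing 20 kW. Under the placeholder corrections, its expected rate is h_e = 60 kW. If it actually extracts only 40 kW, which is below 0.75 × 60 = 45 kW, its weight is 0.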
A room has two cooling units, A and B. Cooling unit A has a design capacity of 68 kW, cooling unit B has a design capacity of 115 kW, and H=90 kW. Because unit A alone cannot carry 90 kW, both units are needed to meet the load (L=2). In this example, even if the COPs of both units are greater than the MinStdCOP of 2.1 (so that S=2), RV=S−L=0, because a single failure (of unit B) would cause a high-temperature condition.
RV is designed to be a measure of performance-weighted redundancy that is correlated with a risk of failure. To demonstrate this correlation, RV was computed for 146 rooms, using a 1-week averaging window for cooling rate and power averages. There were 16 instances where RV was negative (a qualitatively High level of risk), 17 instances where RV had a value between 0 and an as-designed level of redundancy (a Medium level of risk), and 113 cases where RV was greater than the as-designed level of redundancy (a Low level of risk). All of these calculations were performed based on historical data from the same 1-week time window.
Then, a much longer historical period was searched for extreme-temperature events, where such an event was defined as one sensor reading above 100° F. while 5 or more additional sensors were reading above 90° F.
The odds of getting this outcome by chance are low. For example, combine the Medium and High categories into a single Risky category. The general population of rooms is Risky just 23% of the time (33 out of 146). Given that rate, the probability that 6 or more of the 9 rooms with an extreme-temperature event would be categorized as Risky is just 0.006, or 0.6%. This demonstrates that a low RV value is an indicator of elevated risk of an extreme-temperature event.
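The 0.6% figure can be checked with a binomial tail calculation. This assumes that each of the 9 rooms with an extreme-temperature event independently has a 33/146 chance of being categorized Risky. The binomial model is an assumption of this sketch, used only to check the arithmetic.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# 33 of 146 rooms were Risky (Medium + High); 9 rooms had an extreme-temperature event.
p_risky = 33 / 146
print(f"P(6 or more of 9 rooms Risky by chance) = {binomial_tail(9, 6, p_risky):.4f}")
```

Run as written, it prints a value of about 0.0059, which rounds to the 0.006 reported above.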
The following pseudocode illustrates exemplary formulas and strategies for calculating relevant items such as COP, Load and RV. This pseudocode is not necessarily intended to be executable code (although certain programming environments may, in fact, be able to execute it). Rather, this pseudocode will be understood by one skilled in the art to illustrate relevant calculations and definitions of variables utilized in the calculations according to certain embodiments.
The techniques detailed above may be implemented using systems such as a control system, computer, or controller. Any of the control systems, computers, or controllers may utilize any suitable number of subsystems. Examples of such subsystems or components are shown in
A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 581 or by an internal interface. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.
It should be understood that any of the embodiments of the present invention can be implemented in the form of control logic using hardware (e.g. an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.
Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, or a scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a plurality or series of instructions or commands on a computer readable medium for storage and/or transmission; suitable media include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer program product (e.g. a hard drive or an entire computer system), and may be present on or within different computer program products within a system or network.
The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
It should be apparent that various different modifications can be made to embodiments without departing from the scope and spirit of this disclosure. In particular, the techniques and calculations disclosed herein may be adapted to any kind of system that utilizes multiple units in parallel toward a common system goal. Examples include cooling systems, heating systems, material processing or treatment systems, power distribution systems, manufacturing systems, data processing systems, and transportation systems.
A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary.
The above description of exemplary embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the teaching above. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.
This application is a continuation of PCT Application No. PCT/US2015/029302, filed May 5, 2015, which claims priority to U.S. Provisional Patent Application No. 61/988,720, filed May 5, 2014. Both of the above-identified applications are hereby incorporated by reference in their entireties for all purposes.
| Number | Date | Country |
|---|---|---|
| 61988720 | May 2014 | US |

| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/US2015/029302 | May 2015 | US |
| Child | 15340713 | | US |