This invention relates to systems and methods for redundant data center cooling and electrical systems.
Even the best public utility systems are inadequate to meet the needs of mission-critical applications. Mission-critical facilities within various organizations require power that is not subject to loss or substantial variations. Variations in power across a system may result in data loss and component failure. Data centers, for example, consist of several components, and each may be a potential point of failure, which can incur significant financial and data losses. Such components may include power sources, backup generators, uninterruptible power supplies (“UPS”), power distribution units (“PDU”), and equipment power supplies (e.g., servers, routers, switches, etc.).
Many organizations, when faced with the likelihood of downtime and data processing errors caused by utility power, choose to implement a UPS system between the public power distribution system and their mission-critical loads. The UPS system design configuration chosen for the application directly impacts the availability of the critical equipment it supports. There are many variables that affect a system's availability, including human error, reliability of components, maintenance schedules, and recovery time. The impact that each of these variables has on the overall system's availability is determined, to a large degree, by the configuration chosen. Currently, several UPS solutions exist for supporting critical loads, including those systems known as “parallel redundant,” “isolated redundant,” “distributed redundant,” “multiple parallel bus,” “system plus system,” and “isolated parallel.” (See McCarthy, et al. Comparing UPS System Design Configurations, available at: https://www.apc.com/salestools/SADE-5TPL8X/SADE-5TPL8X_R3_EN.pdf.)
Each type of UPS system configuration offers its own features and level of protection. Passive-standby systems, for example, are considered “off-line” systems and monitor incoming power and switch to a battery source when an interruption occurs. This transfer takes place in milliseconds and is acceptable for some applications. But the loss of power during the transfer can disrupt the operation of sensitive electronic equipment. These UPS also do not filter power-line noise or voltage spikes or sags. Because of these limitations, their use is limited largely to systems not performing critical tasks.
Line-interactive UPS systems, in contrast, include a transformer or an inductor between the power source and the connected equipment. Such systems further include a bank of batteries to condition and filter incoming power. These types of systems offer more protection than passive-standby configurations, but do not completely isolate the protected equipment from irregularities in the incoming power. These systems offer adequate protection for many facility applications, but not enough protection for mission-critical operations, such as data centers.
Double-conversion systems, however, eliminate the momentary loss of power found in the other two types of UPS in the transfer from incoming power to battery-supplied power by using a bank of batteries connected to the direct-current part of the system. These UPS fully isolate protected equipment from the power source, thereby eliminating most power disturbances.
Such UPS system topologies can become quite complex. For example, “distributed redundant” configurations, also known as tri-redundant, are commonly used in the large data center market, sometimes within financial organizations. The basis of this design uses three or more UPS modules with independent input and output feeders. The independent output buses are connected to the critical load via multiple PDUs.
“System plus system” configurations are often located in standalone, specially-designed buildings. It is not uncommon for the infrastructure support spaces (UPS, battery, cooling, generator, utility, and electrical distribution rooms) to be equal in size to the data center equipment space, or even larger.
From the utility service entrance to the UPS, a distributed redundant design and a system plus system design may be similar. Both provide for concurrent maintenance and minimize single points of failure. The major difference is in the quantity of UPS modules that are required in order to provide redundant power paths to the critical load, and the organization of the distribution from the UPS to the critical load. As the load requirement, “N”, grows, the savings in quantity of UPS modules also increase.
Choosing a traditional UPS system to protect facilities and systems may be difficult; such a system must be sized properly for the load it is designated to protect. Managers also need to properly size the batteries in the UPS to provide the desired runtime in the event of a power loss. For some applications, the UPS only needs to provide power long enough to allow an orderly shutdown of connected equipment. But in other applications, the batteries will need enough capacity to provide power for the duration of common power interruptions. The required battery capacity will depend on the nature of the functions performed by the protected load. But there is a need in the art to increase the reliability of these critical power components by implementing redundancy, in order to provide a high-availability environment.
Redundancy refers to a system design where a component is duplicated so that in the event of a component failure, IT equipment is not impacted. The main goal of redundancy is to ensure zero downtime. Active redundancy eliminates performance declines by monitoring the performance of individual devices, and this monitoring is used in voting logic. The voting logic is linked to switching that automatically reconfigures the components. Electrical power distribution provides an example of active redundancy.
Cooling is also a major cost factor in data centers. If cooling is implemented poorly, the power required to cool a data center can match or exceed the power used to run the IT equipment itself. Cooling also is often the limiting factor in data center capacity. In some cases, heat removal can be a bigger problem than getting power to the equipment.
In one form, the system of the invention comprises a data center system comprising at least three independent, shared-airspace cooling system modules, and at least three, fully-compartmentalized electrical or power system modules, in which the load is preferably spread near-evenly through the systems, and in which a failure or maintenance of any one of the cooling or electrical/power modules does not impact the critical load.
On the whole, the system will preferably maintain at least 51% utilization efficiency of capacity at full 100% critical load under normal operating conditions.
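By way of non-limiting illustration, the relationship between the number of modules and the utilization floor stated above may be sketched as follows. The sketch assumes equal modules, each sized so that the surviving modules run at exactly 100% of rated capacity after the tolerated number of removals; the function names `normal_utilization` and `min_modules` are hypothetical and are not part of the disclosed embodiments.

```python
from fractions import Fraction


def normal_utilization(modules: int, tolerated_failures: int = 1) -> Fraction:
    """Fraction of each module's capacity in use under normal operation.

    Assumes equal modules sized so that the survivors run at exactly
    100% of rated capacity once `tolerated_failures` modules are removed.
    """
    if modules <= tolerated_failures:
        raise ValueError("need more modules than tolerated failures")
    return Fraction(modules - tolerated_failures, modules)


def min_modules(threshold: Fraction, tolerated_failures: int = 1) -> int:
    """Smallest module count whose normal utilization meets `threshold`."""
    n = tolerated_failures + 1
    while normal_utilization(n, tolerated_failures) < threshold:
        n += 1
    return n


# Under these assumptions, two modules give 1/2, which is below 51%,
# while three modules give 2/3 and four modules give 3/4.
floor = Fraction(51, 100)
fewest = min_modules(floor)
```

Under these assumptions, a two-module arrangement cannot reach the 51% figure while still tolerating one removal, which is consistent with the stated minimum of three modules.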
Some embodiments of the invention are not computer-controlled, but mechanically-controlled, so that hacking or failure of controller software is not a potential point of failure. And within the system, communication occurs only within subsystems, not between subsystems.
The invention is disclosed with reference to the accompanying drawings, wherein:
Corresponding reference characters indicate corresponding parts throughout the several views. The example(s) set out herein illustrate several embodiments of the invention but should not be construed as limiting the scope of the invention in any manner.
Referring now to
Turning to
Turning to
Turning to
Each cooling system module 110 preferably runs an average of no less than 51% of the critical load of the overall data center system 100 under full normal operation when the load is near balance, and upon failure or maintenance, each cooling system module 110 will independently increase cooling, based on environmental inputs, to assume the critical load within ASHRAE Thermal Guidelines For Data Processing TC9.9 3rd Edition.
In one embodiment, there is no electronic communication between each cooling system module 110, though communication (wired or wireless, including software-mediated communication) within a module (e.g., between or among cooling units 120) may occur, such as shown in
Turning to
Each power system module 210 and pathway is fully compartmentalized from the others until the point of demarcation. Compartmentalization requires a minimum level of dust, smoke, and splash resistance meeting NEMA TYPE 3; thirty (30) minutes of fire rating when tested to ASTM E814/UL 1479; and mostly non-shared airspace under normal operating conditions (sealed, but not necessarily hermetically sealed, from one another).
When one power system module 210 fails or is taken offline for maintenance, the remaining power system modules 210 automatically assume the deficit, maintaining the critical load without fault through an active-active (rather than active-passive) design. As a whole, the electrical design will preferably maintain at least 51% efficiency of total capacity at full 100% critical load under normal operating conditions. All power system modules 210 preferably run in active-active state under normal operating conditions.
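The active-active behavior described above may be illustrated, without limitation, by the following sketch. It assumes that the load of the removed module is split evenly among the remaining modules, which is only one possible sharing policy; the kilowatt figures and the names `redistribute` and `overloaded` are hypothetical.

```python
def redistribute(loads_kw: dict[str, float], failed: str) -> dict[str, float]:
    """Per-module loads after `failed` drops out.

    The failed module's load is split evenly among the survivors
    (an assumed policy, chosen for illustration only).
    """
    if failed not in loads_kw:
        raise KeyError(failed)
    survivors = {name: kw for name, kw in loads_kw.items() if name != failed}
    if not survivors:
        raise ValueError("no surviving modules")
    extra = loads_kw[failed] / len(survivors)
    return {name: kw + extra for name, kw in survivors.items()}


def overloaded(loads_kw: dict[str, float], capacity_kw: dict[str, float]) -> list[str]:
    """Names of modules whose load exceeds their rated capacity."""
    return [name for name, kw in loads_kw.items() if kw > capacity_kw[name]]


# Hypothetical three-module system: each module rated 150 kW, carrying 100 kW.
capacity = {"A": 150.0, "B": 150.0, "C": 150.0}
normal = {"A": 100.0, "B": 100.0, "C": 100.0}
after_a_fails = redistribute(normal, "A")
```

In this hypothetical case, modules B and C each rise from 100 kW to 150 kW, reaching but not exceeding their rated capacity, so `overloaded(after_a_fails, capacity)` is empty.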
As shown in
Turning to
Turning to
Power system module 210 A, as shown in
Each power system module 210 may be optionally fed by multiple power sources (utility, generator, renewable and alternative energy) but one of each source must be fully-independent to each power system module 210. Each power system module 210 may have non-equal energy storage capacity (runtime) and equipment types (when compared to other power system modules 210 in the system), so long as total output wattage of each power system module 210 within a given system is near equal, as shown with power system modules 210 A, B, and C.
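As a hedged illustration of the near-equal output condition, the following sketch checks that each module's total output lies within a tolerance of the mean while allowing runtime to differ. The 5% tolerance, the wattage and runtime values, and the names `PowerModule` and `outputs_near_equal` are assumptions made for the example; the specification does not quantify "near equal".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerModule:
    name: str
    output_watts: float     # total output wattage of the module
    runtime_minutes: float  # energy storage runtime; may differ between modules


def outputs_near_equal(modules: list[PowerModule], tolerance: float = 0.05) -> bool:
    """True if every module's output is within `tolerance` of the mean output.

    `tolerance` is a fraction of the mean; the 5% default is an assumed value.
    """
    if not modules:
        raise ValueError("no modules given")
    mean = sum(m.output_watts for m in modules) / len(modules)
    return all(abs(m.output_watts - mean) <= tolerance * mean for m in modules)


# Hypothetical modules with unequal runtimes but near-equal output wattage.
modules = [
    PowerModule("A", 500_000.0, 10.0),
    PowerModule("B", 510_000.0, 25.0),
    PowerModule("C", 495_000.0, 15.0),
]
balanced = outputs_near_equal(modules)
```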
In an embodiment of data center system 300 of the invention shown in
Cooling system modules 110 A, B, and C, as shown, do not communicate with one another via data-based (e.g., software-based or networked) communications, though they may communicate internally (i.e., within a module, from cooling unit 120 to cooling unit 120) via wired or wireless data-based communication system 310.
Each cooling system module 110 uses a thermal temperature input to monitor and ultimately adjust the temperature as needed (as may be known to those of skill in the art) to maintain the load across data center system 300.
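The specification does not prescribe a particular control law for this adjustment. Purely as an illustrative assumption, a proportional response to the module's own temperature input could look like the sketch below. Each module would evaluate it independently, with no messages exchanged between modules; because the airspace is shared, the loss of one module would appear to the others only as a rise in measured temperature. The name `cooling_demand` and all numeric values are hypothetical.

```python
def cooling_demand(
    inlet_temp_c: float,
    setpoint_c: float,
    base: float,
    gain_per_c: float,
) -> float:
    """Cooling output as a fraction of rated capacity, clamped to 0.0..1.0.

    Proportional control is an assumed technique for illustration; the
    setpoint, base output, and gain are hypothetical tuning values.
    """
    demand = base + gain_per_c * (inlet_temp_c - setpoint_c)
    return max(0.0, min(1.0, demand))


# A module sees only its own sensor reading.
at_setpoint = cooling_demand(24.0, setpoint_c=24.0, base=0.5, gain_per_c=0.1)
after_rise = cooling_demand(30.0, setpoint_c=24.0, base=0.5, gain_per_c=0.1)
```

With these hypothetical values, the module holds its base output at the setpoint and saturates at full rated output once the measured temperature has risen far enough.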
Each cooling system module 110 A, B, and C is paired to a corresponding power system module 210, identified as power system modules 210 A, B, and C, such that for each power system module 210, there is one paired cooling system module 110.
The electrical system is composed of a minimum of three (3) compartmentalized power system modules 210 and pathways (identified as power system modules 210 A, B, and C) with each power system module 210 preferably individually utilizing at minimum 51% of its capacity when the critical load is at 100%.
In some embodiments, the critical load may preferably be divided near-equally among the power system modules 210 and pathways. Each power system module 210 is compartmentalized from the others, and each pathway is compartmentalized until its point of demarcation.
In some embodiments, when one power system module 210 fails or is taken offline for maintenance, the remaining power system modules 210 automatically assume the deficit, maintaining the critical load without fault.
In another embodiment, shown in
In a further embodiment, a data center system is provided that comprises no less than three (3) independent shared-airspace cooling system modules 110, and no less than four (4) fully-compartmentalized power system modules 210, in which the load is preferably spread near-evenly through the system, and in which a failure or maintenance of any one cooling system module 110 or power system module 210 does not impact the critical load.
Turning to
Under normal operating conditions, each of the four power system modules 210 is preferably operated at a utilization of at least 51%, and in this example, at least 75%. Each of the cooling system modules 110 is preferably operated at 66.6% utilization and in no event less than 51% utilization. All of the power system modules 210 and cooling system modules 110 are active under normal operating conditions, i.e., none are in “stand-by” mode.
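The utilization figures in this example can be checked by brute force. The sketch below normalizes the critical load to 1, derives each module's capacity from its stated share and utilization, and enumerates every possible removal. It assumes equal modules and an exactly even load split; the name `survives_removal` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def survives_removal(capacities: dict[str, Fraction], load: Fraction, removed: int) -> bool:
    """True if every way of removing `removed` modules still leaves
    enough total capacity to carry `load`."""
    names = list(capacities)
    for out in combinations(names, removed):
        remaining = sum(capacities[n] for n in names if n not in out)
        if remaining < load:
            return False
    return True


load = Fraction(1)  # critical load, normalized

# Four power modules, each carrying 1/4 of the load at 75% utilization.
power = {name: Fraction(1, 4) / Fraction(3, 4) for name in "ABCD"}

# Three cooling modules, each carrying 1/3 of the load at 2/3 utilization.
cooling = {name: Fraction(1, 3) / Fraction(2, 3) for name in "ABC"}

power_ok = survives_removal(power, load, removed=1)
cooling_ok = survives_removal(cooling, load, removed=1)
```

Exact fractions are used instead of floating point so that the boundary case, in which the surviving modules sit at exactly 100% of capacity, is not misjudged through rounding.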
This configuration allows for one of each of the power system modules 210 (A, B, C, or D) and cooling system modules 110 (A, B, or C) to be removed from the data center system 400 (due to fault, maintenance, etc.), while still maintaining 100% of the critical IT load, as shown in
In an alternative embodiment, data center system 500 is shown in
This configuration allows for one pair of the power system modules 210 and cooling system modules 110 (A, B, C, D, or E) to be removed from the system (due to fault, maintenance, etc.), while still maintaining 100% of the critical IT load, as shown in
In an alternative embodiment, data center system 600 is shown in
This configuration allows for two pairs of the power system modules 210 and cooling system modules 110 (A, B, C, D, or E) to be removed from the data center system 600 (due to fault, maintenance, etc.), while still maintaining 100% of the critical IT load, as shown in
The total load that can be carried by a power system module 210 depends in part on the rating of the facility's input. If the actual load exceeds the rating on the input for a sufficient period of time, the input breaker will trip, and power will be interrupted to everything that receives power from that input. To design a data center system where power is not interrupted, the load for the equipment (e.g., “IT Load”) must be estimated by some means. There are various ways known in the art to estimate the power of an IT equipment deployment in a data center (e.g. faceplate rating, direct power measurement). The approach chosen depends on the goal of the end user. The actual power consumption for a server, for example, depends on many factors. First, and most obviously, server power depends heavily on the configuration. Even for similarly configured hardware, power consumption can vary from system to system. In view of the potential variability, any general power number that is used for capacity budgeting should be conservative. The consequence of under-provisioning power is increased downtime risk.
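As a minimal sketch of conservative capacity budgeting, the following example pads a per-server power figure by a margin and compares the result with the rating of the input. The 25% margin, the wattage values, and the function names are assumptions chosen for illustration. The check is static and does not model how long an overload must persist before the input breaker trips.

```python
def budgeted_load_watts(per_server_watts: float, servers: int, margin: float = 0.25) -> float:
    """Conservative IT load estimate: per-server power times server count,
    padded by `margin` (an assumed 25% by default)."""
    if servers < 0 or per_server_watts < 0 or margin < 0:
        raise ValueError("inputs must be non-negative")
    return per_server_watts * servers * (1.0 + margin)


def fits_input_rating(load_watts: float, input_rating_watts: float) -> bool:
    """True if the budgeted load does not exceed the rating of the input."""
    return load_watts <= input_rating_watts


# Hypothetical deployment: 100 servers measured at 400 W each.
budget = budgeted_load_watts(400.0, 100)
ok = fits_input_rating(budget, input_rating_watts=60_000.0)
```

The per-server figure may come from a faceplate rating or from direct measurement; a faceplate figure would typically need less added margin than a measured one, since it already overstates typical draw.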
Turning now to
In
In data center system 700, each power system module 210 is separately contained and compartmentalized from other power system modules 210 until the power reaches a point of demarcation/distribution (POD) 720, for example, in a data center processing space, so the load is protected from a single fault or failure in one power system module 210.
Each power system module 210 is fed power by power generation sources 710 running in parallel in an active/passive state. In the example shown in
In this example, downstream of transfer switch 730 are multiple parallel-connected uninterruptible power supply (UPS) 220 units. In one embodiment, the UPS 220 units are APC Symmetra MW UPS units. UPS 220 units are configured to provide reliable power to the load when transfer switch 730 transfers between power generation sources (PGS) 710. The UPS 220 units are connected downstream to power distribution units (PDUs) 222. The PDUs 222 are configured to distribute power to the load 740.
In this example, load 740 represents any need for uninterrupted critical power, e.g., information technology (servers), cooling, or infrastructure. Load 740 is preferably near-equally balanced between each power system module 210 as an operational requirement under normal operating conditions. In this example, each power system module 210 maintains a maximum aggregate average below 75% of the rated load capacity for available utilization under normal load conditions. Should a power system module 210 have a fault or failure or need to be taken offline for maintenance (FFM), the net result will increase to a maximum aggregate average below 100% of the rated load capacity for available utilization under FFM load conditions across the remaining operational units.
Turning now to
In the example shown, each cooling system module 800 is configured with two main loops 810, 820, as a hybrid system, interconnected with a heat exchanger 830. Internal cooling loop (ICL) 810 is located, for example, inside a data center processing space, where a waterless system must be used to prevent threat to electrical systems. Internal cooling loop 810 may use compressed liquid inert gas (refrigerant) for heat rejection. The compressed inert liquid reverts to a gas state at room temperature in the event of a leak. Internal cooling loop 810 may further comprise a refrigerant delivery network (RDN) 840, for example, available under the trade name Opticool, and an active heat exchanger (AHX) 830. Internal cooling loop 810 is interconnected to a refrigerant pump system (RPS) 850/heat exchanger 830 in an external cooling loop (ECL) 820.
External cooling loop (ECL) 820 may be housed outside the data center processing space, where the use of a water-based system does not pose a threat to critical electrical systems. External cooling loop 820 may use a water-based glycol unit for heat rejection. External cooling loop 820 for each cooling system module 800 is compartmentalized to chiller 860 where it vents to atmosphere. ECL 820 may further comprise a refrigerant pump system (RPS) 850, water piping and pump system (WPPS) 870, and chiller system 860. Refrigerant delivery network 840 pumps the compressed liquid inert gas in a loop from the active heat exchanger 830 (where heat is removed from the load) in internal cooling loop 810 to the refrigerant pump system 850 in external cooling loop 820, where the heat is exchanged and pushed downstream in external cooling loop 820 to the chiller 860 and removed.
Each cooling system module 800 may comprise several stand-alone, coupled internal and external cooling loops 810, 820. Each cooling system module 800 is fed by only one power system module 210. In the example, data center system 900 shown in
The load is near-equally balanced between each cooling system module 800 as an operational requirement under normal operating conditions. Each cooling system module 800 maintains a maximum aggregate average below 75% of the rated load capacity for available utilization under normal load conditions. If one cooling system module 800 has a fault or failure or needs to be taken offline for maintenance (FFM), the net result will be to increase to a maximum aggregate average below 100% of the rated load capacity for available utilization under FFM load conditions across the remaining operational units.
As further shown in the example, data center system 900 shown in
In
While the invention has been described with reference to particular embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the scope of the invention.
Therefore, it is intended that the invention not be limited to the particular embodiments disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope and spirit of the appended claims.
This application is a continuation of U.S. Non-Provisional patent application Ser. No. 16/784,719, filed Feb. 7, 2020, which in turn claims priority from U.S. Provisional Patent Application Ser. No. 62/802,426, filed Feb. 7, 2019, which are incorporated herein by reference in their entireties.
Number | Name | Date | Kind |
---|---|---|---|
6465909 | Soo et al. | Oct 2002 | B1 |
7385862 | Dubey | Jun 2008 | B2 |
7459803 | Mosman | Dec 2008 | B2 |
7679217 | Dishman et al. | Mar 2010 | B2 |
7681404 | Bean, Jr. | Mar 2010 | B2 |
7877622 | Gruendler | Jan 2011 | B2 |
7906871 | Freeman et al. | Mar 2011 | B2 |
8276000 | Humphrey et al. | Sep 2012 | B2 |
8423806 | Cheng et al. | Apr 2013 | B2 |
8644997 | Lillis et al. | Feb 2014 | B2 |
8671287 | DeCusatis et al. | Mar 2014 | B2 |
8688413 | Healey et al. | Apr 2014 | B2 |
8707095 | Grimshaw | Apr 2014 | B2 |
8907520 | Chapel et al. | Dec 2014 | B2 |
8949081 | Healey | Feb 2015 | B2 |
9192069 | Emert et al. | Nov 2015 | B2 |
9519517 | Dalgas et al. | Dec 2016 | B2 |
9552053 | O'Connor et al. | Jan 2017 | B2 |
9583973 | Vogman | Feb 2017 | B2 |
9698589 | Leyh | Jul 2017 | B1 |
9703363 | Zhou | Jul 2017 | B2 |
9778717 | Kaplan | Oct 2017 | B2 |
9846467 | Gardner et al. | Dec 2017 | B2 |
9904331 | VanGilder et al. | Feb 2018 | B2 |
10225958 | Gao | Mar 2019 | B1 |
10432017 | Morales | Oct 2019 | B1 |
20070216229 | Johnson, Jr. et al. | Sep 2007 | A1 |
20100097044 | Gipson | Apr 2010 | A1 |
20100223085 | Gauthier | Sep 2010 | A1 |
20100300650 | Bean, Jr. | Dec 2010 | A1 |
20110016893 | Dawes | Jan 2011 | A1 |
20130046514 | Shrivastava et al. | Feb 2013 | A1 |
20130253716 | Gross et al. | Sep 2013 | A1 |
20150051749 | Hancock et al. | Feb 2015 | A1 |
20150121113 | Ramamurthy et al. | Apr 2015 | A1 |
20150249363 | Humphrey, Jr. et al. | Sep 2015 | A1 |
20170098956 | Sarti | Apr 2017 | A1 |
20170332510 | Sarti | Nov 2017 | A1 |
20180052431 | Shaikh et al. | Feb 2018 | A1 |
20200106297 | Ross | Apr 2020 | A1 |
Number | Date | Country |
---|---|---|
WO2015148686 | Oct 2015 | WO |
Entry |
---|
Fluegeman, Michael. 2(N+1) and 3N/2 Redundancy: High Reliability Options. https://www.facilitiesnet.com/datacenters/article/2N1-and-3N2-Redundancy-High-Reliability-Options-Facilities-Management-Data-Centers-Feature--16763 Aug. 5, 2016. |
Allen, Mike. Redundancy: N+1, N+2 vs. 2N+1 (Part II). https://www.datacenters.com/news/redundancy-n1-n2-vs-2n-vs-2n1-part-ii Nov. 16, 2016. |
McCarthy, et al. Comparing UPS System Design Configurations Jun. 14, 2018. |
Piper, James, Understanding Types of UPS, https://www.facilitiesnet.com/powercommunication/article/Understanding-Types-of-UPS--13558 Oct. 12, 2012. |
Piper, James; Emergency Power; The ABCs of UPS; https://www.facilitiesnet.com/powercommunication/article/Emergency-Power-The-ABCs-of-UPS--8596 Apr. 1, 2008. |
Number | Date | Country | |
---|---|---|---|
20230380113 A1 | Nov 2023 | US |
Number | Date | Country | |
---|---|---|---|
62802426 | Feb 2019 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16784719 | Feb 2020 | US |
Child | 18341260 | US |