The present invention relates to the field of computing. More specifically, the present invention relates to the field of implementing a disaster recovery appliance.
A typical network requires the use of one or more servers to store, distribute and process data. Furthermore, the network typically implements a backup system to save data in the event a server loses data whether it be due to a virus, software issue or hardware failure. Although the data is backed up, that does not remedy the problem of the server being inaccessible as a result of a malfunction. Once a server fails, an administrator has to replace the server with another server which is a process that could take a few hours or days if a new server needs to be purchased. With a server down, data stored on that server is likely inaccessible which causes problems such as a website being unavailable. It has been estimated that millions of dollars are lost due to system inaccessibility. Furthermore, there is a significant effect on reputation when a company's website is down. Moreover, for Local Area Networks (LANs) within an organization, a server being down would halt productivity if employees are unable to access their data.
One solution for ensuring that the server data is continuously available is to utilize a dedicated backup server for each server. While that works well with one server, it becomes a high cost solution with multiple servers as each server typically costs many thousands of dollars.
Another common feature when utilizing networks is Lights Out Management (LOM) which allows a system administrator to monitor and manage servers remotely. A typical LOM system includes a hardware component called a LOM module and an application for monitoring system variables such as temperature and CPU utilization. The application also provides the system administrator with remote abilities such as rebooting, fan speed control, troubleshooting and operating system installation. Although LOM provides some remote management abilities, there are many issues that LOM is unable to handle.
A disaster recovery appliance is described herein. The disaster recovery appliance is coupled to one or more servers. The disaster recovery appliance continuously receives backup data for each of the one or more servers. When a server fails, the disaster recovery appliance replaces the failed server. While the failed server is inaccessible, the disaster recovery appliance is able to mimic the functionality of the failed server. In some embodiments, the disaster recovery appliance is able to act as a server in addition to a backup device for the other servers.
In one aspect, a system for providing network stability and data reliability comprises one or more servers and a computing device coupled to the one or more servers, wherein the computing device backs up data from the one or more servers and replaces a failed server in the one or more servers. The computing device replaces the failed server upon detecting a condition indicating that the server is about to fail. The computing device is a server and a backup device. The computing device uses a continuous backup scheme to back up the data. The computing device stores a system image of each of the one or more servers. The computing device is coupled to the one or more servers over a network. The system further comprises a storage server coupled between the one or more servers and the computing device for backing up data. The computing device continues backing up data from active servers in the one or more servers after the failed server fails. The system further comprises a standby computing device coupled to the computing device to temporarily replace the failed server. The system further comprises a virtual server generated by the computing device to temporarily replace the failed server.
In another aspect, a method of providing network stability and data reliability comprises backing up data from one or more servers to a computing device and serving the data utilizing the computing device when a server of the one or more servers fails. Backing up data includes storing an image of the one or more servers. The method further comprises continuing to back up the data from active servers of the one or more servers on the computing device. The computing device backs up the data using a continuous backup scheme. The computing device is a server and a backup device. The computing device is coupled to the one or more servers over a network.
In another aspect, a method of providing network stability and data reliability comprises backing up data from one or more servers to a computing device and utilizing a standby computing device to temporarily replace a failed server of the one or more servers when the server fails. Backing up data includes storing an image of the one or more servers. The computing device backs up the data using a continuous backup scheme. The method further comprises generating a virtual server with the computing device to temporarily replace a failed second server if the standby computing device is unavailable. The method further comprises initializing the computing device into server mode to serve the data for a failed second server if the standby computing device is unavailable. The method further comprises continuing to back up data from the active servers of the one or more servers on the computing device.
In yet another aspect, an apparatus for providing network stability and data reliability comprises a storage component, a data backup application stored on the storage component for backing up data received from one or more servers, a data restore application stored on the storage component for restoring the data received from the one or more servers and a server application stored on the storage component for serving the data received from a failed server of the one or more servers. The data received comprises a server image. In some embodiments, the data backup application and the data restore application continue executing while the server application is executing. In other embodiments, the data backup application and the data restore application stop executing when the server application is executing.
In another aspect, a system for providing network stability and data reliability comprises one or more servers, a first computing device coupled to the one or more servers, wherein the first computing device backs up data from the one or more servers and replaces a failed server of the one or more servers and a second computing device coupled to the one or more servers and the first computing device, wherein the second computing device is coupled after the failed server failed, further wherein the second computing device backs up data from the one or more servers and the first computing device. The first computing device copies the data to the second computing device. The second computing device replaces a second failed server of the one or more servers and the first computing device.
A disaster recovery appliance is described herein. A server configuration includes one or more servers in addition to a storage server or a backup server. In an embodiment, the disaster recovery appliance is coupled to the storage server. Using a continuous backup scheme, the one or more servers continuously back up their data on the storage server which then backs up the data on the disaster recovery appliance. The storage server stores all of the relevant application and user data corresponding to each server. The storage server also stores and is aware of the environment on each server. For instance, if one of the servers is a SQL server, the storage server contains the necessary software and/or image to replicate the SQL server. The disaster recovery appliance contains an operating system and utilities to back up and restore data when needed. Specifically, when one of the servers fails, the disaster recovery appliance is available to take the place of the failed server. The disaster recovery appliance becomes a temporary or permanent replacement server in real-time (e.g. instantaneously aside from set up time if any) so that the change is seamless.
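The backup chain described above, in which servers back up to the storage server and the storage server in turn backs up to the disaster recovery appliance, can be illustrated with a minimal sketch. The class name `ImageStore`, its `put` method, and the dictionary layout of an image are hypothetical and chosen only for illustration; the description does not specify any particular interface or image format.

```python
class ImageStore:
    """Holds the latest image per server and optionally relays it onward.

    Hypothetical model: one instance stands in for the storage server and
    another for the disaster recovery appliance.
    """

    def __init__(self, downstream=None):
        self.images = {}              # server name -> latest image
        self.downstream = downstream  # next store in the chain, if any

    def put(self, server, image):
        # Keep the newest image, including the server's environment,
        # then forward the same image to the next store in the chain.
        self.images[server] = image
        if self.downstream is not None:
            self.downstream.put(server, image)


# The appliance sits behind the storage server.
appliance = ImageStore()
storage_server = ImageStore(downstream=appliance)

# A server backs up twice; only the latest image is retained at each hop.
storage_server.put("sql", {"environment": "SQL server", "version": 1})
storage_server.put("sql", {"environment": "SQL server", "version": 2})
```

After these calls, both stores hold the second image for the "sql" server, so the appliance has what it needs to reproduce that server's environment.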
As described herein, the storage server is aware of each server's environment, and thus is able to provide the disaster recovery appliance with the same environment as the faulty server. After the failed server becomes inaccessible, the disaster recovery appliance is able to mimic the actions and data contained on the failed server. Thus, when users attempt to access an application or data that is on the failed server, they will continue to access the data uninterrupted as if the failed server were up and running. In some embodiments, the disaster recovery appliance is the storage server. In some embodiments, the disaster recovery appliance couples to a network to receive backup data from the one or more servers. In some embodiments, the disaster recovery appliance is capable of backing up a single server, and in other embodiments, the disaster recovery appliance is capable of backing up more than one server. Furthermore, in some embodiments, once a server fails, the disaster recovery appliance is able to operate in dual modes such that the disaster recovery appliance continues to back up data from other servers, while the disaster recovery appliance also continues to serve data as the replacement server. In some embodiments, once a server fails, the disaster recovery appliance operates in a single mode of serving data, and the backup functionality is shut down.
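The difference between the dual mode and the single mode described above can be sketched as follows. This is an illustrative model only: the names `FailoverPolicy`, `Appliance`, `receive_backup` and `take_over` are assumptions and do not appear in the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FailoverPolicy(Enum):
    DUAL = "dual"      # keep backing up active servers while serving
    SINGLE = "single"  # shut down backup once serving begins


@dataclass
class Appliance:
    policy: FailoverPolicy
    images: dict = field(default_factory=dict)  # server name -> latest image
    serving_for: Optional[str] = None           # failed server being replaced

    def backup_enabled(self) -> bool:
        # Backup always runs before any failure; afterwards only in dual mode.
        return self.serving_for is None or self.policy is FailoverPolicy.DUAL

    def receive_backup(self, server: str, image: str) -> bool:
        # The failed server no longer sends backups; others depend on the mode.
        if not self.backup_enabled() or server == self.serving_for:
            return False
        self.images[server] = image
        return True

    def take_over(self, failed: str) -> str:
        # Switch to server mode using the last image stored for the failed server.
        image = self.images[failed]
        self.serving_for = failed
        return image
```

In this sketch, an appliance created with `FailoverPolicy.DUAL` still accepts backups from active servers after `take_over` is called, whereas one created with `FailoverPolicy.SINGLE` rejects them.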
By backing up data on a disaster recovery appliance, and then being able to switch from backup mode to server mode, a network utilizing the disaster recovery appliance is able to maintain full operation with only an extremely short interruption when a server fails. Additional disaster recovery appliances are able to be coupled to a server system to provide additional backup capabilities. The disaster recovery appliance also utilizes plug-and-play technology so that it is able to be installed easily.
In some embodiments, for mission critical operations where the amount of down-time must be as close to 0 seconds as possible, additional components are able to be used to ensure down-time is minimized. In addition to backing up data such as user data and/or applications on a disaster recovery appliance and then serving the data using the disaster recovery appliance, the data is able to be served using a virtual server and/or a warm standby appliance. The virtual server is described in U.S. patent application Ser. No. 11/644,451 filed Dec. 21, 2006, entitled, “Virtual Recovery Server,” which is incorporated by reference herein. The warm standby appliance is described in U.S. patent application Ser. No. 11/644,581 filed Dec. 21, 2006, entitled, “Warm Standby Appliance,” which is also incorporated by reference herein. Although these additional components are able to be included to further ensure a minimal down-time, it is possible to have minimal down-time simply using the disaster recovery appliance.
Although only one warm standby appliance and only one virtual server are described above, any number of warm standby appliances and virtual servers are able to be implemented. For example, for a large company with fifty servers where the absolute minimum downtime is required, the company may have two warm standby appliances and the ability to generate multiple virtual servers in case many servers fail at roughly the same time. Furthermore, although a warm standby appliance and a virtual server are described above as both being part of the system, it is possible to use one or more warm standby appliances without a virtual server, or to use one or more virtual servers without a warm standby appliance.
If, in the step 504, a warm standby appliance is not available and a virtual server is not able to be generated to replace the failed server, then the disaster recovery appliance is initialized into server mode, in the step 518. In the step 520, the disaster recovery appliance continues serving data. The system is continuously backing up data in addition to monitoring for server failures. Therefore, when a server does fail, the system is able to adapt and utilize the necessary resources whether they be one or more warm standby appliances, one or more virtual servers and/or one or more disaster recovery appliances. In some embodiments, the process automatically occurs; whereas, in other embodiments an administrator maintains the process.
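The fallback order described above (a warm standby appliance if one is available, otherwise a virtual server, otherwise the disaster recovery appliance in server mode) can be sketched as a small selection routine. The function name `assign_replacements`, its parameters, and the returned labels are hypothetical; the sketch also assumes that each warm standby appliance and each virtual server replaces exactly one failed server.

```python
def assign_replacements(failed_servers, standbys_free, virtual_slots_free):
    """Map each failed server to the resource that replaces it.

    Preference order follows the description: warm standby appliance,
    then virtual server, then the appliance itself in server mode.
    """
    plan = {}
    for server in failed_servers:
        if standbys_free > 0:
            plan[server] = "warm_standby"
            standbys_free -= 1
        elif virtual_slots_free > 0:
            plan[server] = "virtual_server"
            virtual_slots_free -= 1
        else:
            plan[server] = "appliance_server_mode"
    return plan


# Three servers fail at roughly the same time, with one standby
# and capacity for one virtual server.
plan = assign_replacements(["mail", "web", "sql"], 1, 1)
```

With these inputs, "mail" is assigned the warm standby appliance, "web" a virtual server, and "sql" falls through to the disaster recovery appliance in server mode.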
The disaster recovery appliance is utilized by coupling a disaster recovery appliance to a storage server wherein the storage server then transfers server images to the disaster recovery appliance periodically. Alternatively, the disaster recovery appliance is directly coupled to one or more data/application servers to back up the data and/or applications and then when a server fails, the disaster recovery appliance replaces the failed server. The disaster recovery appliance is updated often with captured images of the servers, so that minimal data is lost if a server were to fail. The disaster recovery appliance is then able to mimic the failed server as it functioned before the failure, and the disaster recovery appliance remains the replacement server. Thus, from a customer or user perspective, there will be little downtime affecting the user's interaction with the server.
In operation, the disaster recovery appliance serves as a permanent replacement server when a server fails. One or more servers operate by serving data to users, where serving includes hosting a website, providing/storing data, executing applications or anything a server is capable of doing. Furthermore, each of these servers typically has a dedicated task or at least partitioned tasks, so that one server may be deemed an SQL server while another is focused on a different aspect of serving. In some embodiments, a storage or backup server is utilized to back up these servers, and the storage or backup server then sends the backup data to the disaster recovery appliance. In some embodiments, the servers are directly coupled to the disaster recovery appliance, and in some embodiments, the servers are coupled to the disaster recovery appliance through a network. The data and/or application backups are performed utilizing any backup technology, preferably by receiving images of each server. When one or more of the servers fails, the disaster recovery appliance takes the place of that server. Therefore, the server is only down for a very short amount of time while the disaster recovery appliance takes over. Once the disaster recovery appliance is running, users should experience no difference from when the server was still running. In some embodiments, the disaster recovery appliance continues backing up data from the active servers in addition to acting as a server.
In addition to utilizing the disaster recovery appliance when a server has failed, the disaster recovery appliance is able to detect when a server is about to fail, so that the disaster recovery appliance starts taking over the serving processes before the server fails. With such a detection, it is possible to have zero downtime. A failing server is able to be detected in a number of ways such as by monitoring the system environment. For example, if the server's internal temperature is approaching a dangerously high level, that is an indicator that the server is about to shut down, and thus the disaster recovery appliance should take over. Other methods of detecting a failing server are possible.
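The temperature-based detection described above can be sketched as a simple check over recent readings. The function name `about_to_fail`, the 85 degree Celsius limit, and the requirement of three consecutive high readings are all assumptions made for illustration; the description does not specify thresholds or a sampling scheme.

```python
def about_to_fail(temps_c, limit_c=85.0, consecutive=3):
    """Return True once `consecutive` readings in a row reach `limit_c`.

    Requiring several readings in a row (an assumption of this sketch)
    avoids triggering a takeover on a single spurious sample.
    """
    run = 0
    for temp in temps_c:
        run = run + 1 if temp >= limit_c else 0
        if run >= consecutive:
            return True
    return False


# Hypothetical readings from a server's internal temperature sensor.
overheating = about_to_fail([70.0, 86.0, 87.5, 91.0])
transient = about_to_fail([86.0, 70.0, 86.0, 87.0])
```

Here `overheating` is true because the last three readings all reach the limit, while `transient` is false because the high readings are interrupted. A monitoring loop could call such a check and start the takeover before the server actually shuts down.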
The present invention has been described in terms of specific embodiments incorporating details to facilitate the understanding of principles of construction and operation of the invention. Such reference herein to specific embodiments and details thereof is not intended to limit the scope of the claims appended hereto. It will be readily apparent to one skilled in the art that other various modifications may be made in the embodiment chosen for illustration without departing from the spirit and scope of the invention as defined by the claims.