Computer data is vital to today's organizations, and a significant part of protection against disasters is focused on data protection. As solid-state memory has advanced to the point where cost of memory has become a relatively insignificant factor, organizations can afford to operate with systems that store and process terabytes of data.
Conventional data protection systems include tape backup drives, for storing organizational production site data on a periodic basis. Such systems suffer from several drawbacks. First, they require a system shutdown during backup, since the data being backed up cannot be used during the backup operation. Second, they limit the points in time to which the production site can recover. For example, if data is backed up on a daily basis, there may be several hours of lost data in the event of a disaster. Third, the data recovery process itself takes a long time.
Another conventional data protection system uses data replication, by creating a copy of the organization's production site data on a secondary backup storage system, and updating the backup with changes. The backup storage system may be situated in the same physical location as the production storage system, or in a physically remote location. Data replication systems generally operate either at the application level, at the file system level, or at the data block level.
Current data protection systems try to provide continuous data protection, which enables the organization to roll back to any specified point in time within a recent history. Continuous data protection systems aim to satisfy two conflicting objectives as well as possible; namely, (i) minimize the down time, in which the organization production site data is unavailable, during a recovery, and (ii) enable recovery as close as possible to any specified point in time within a recent history.
In one aspect, a method to upgrade software on nodes in a clustered environment includes terminating processes on a first node before upgrading the software on the first node; upgrading the software to a first version from a second version on the first node; running the processes on the first node after upgrading the software on the first node to the first version; determining whether a second node is about to upgrade to the first version of software; allowing transfer of site control from the second node to the first node if the second node is about to upgrade to the first version of software; and upgrading the software on the second node to the first version of software after the transferring of site control from the second node to the first node.
In another aspect, an article includes a non-transitory machine-readable medium that stores executable instructions to upgrade software on nodes in a clustered environment. The instructions cause a machine to terminate processes on a first node before upgrading the software on the first node; upgrade the software to a first version from a second version on the first node; run the processes on the first node after upgrading the software on the first node to the first version; determine whether a second node is about to upgrade to the first version of software; transfer site control from the second node to the first node if the second node is about to upgrade to the first version of software; and upgrade the software on the second node to the first version of software after the transfer of site control from the second node to the first node.
In a further aspect, a first node includes circuitry configured to terminate processes on the first node before upgrading the software on the first node; upgrade the software to a first version from a second version on the first node; run the processes on the first node after upgrading the software on the first node to the first version; determine whether a second node is about to upgrade to the first version of software; and be allowed to receive site control if the second node is about to upgrade to the first version of software.
Described herein is an approach to upgrade software on nodes. In particular, the methods and techniques described herein allow for two nodes to use the same software upgrade process independently from each other while allowing only one of the nodes to have site control at a time. In one example, a node can crash in the middle of an upgrade, recover and continue upgrading seamlessly. While the description describes one particular pair of nodes as being data protection appliances (DPAs), the nodes may be any nodes in a clustered computing environment where one of the nodes is determined as site control. As used herein, site control is a determination of one and only one of the nodes as the controller of all other nodes in a system (e.g., a site). While the manner of selecting the site control and the responsibilities assumed by it are determined by a leader election protocol not described herein, more than one site control is strictly prohibited; however, the system can function for brief periods of time without a site control.
Referring to
The DPAs 14a, 14b are redundant in case of failure so that one of the DPAs 14a, 14b controls the production site (i.e., has site control) at a time. Likewise, the DPAs 14c, 14d are redundant in case of failure so that one of the DPAs 14c, 14d controls the replication site at a time. The DPAs 14c, 14d are also redundant to the DPAs 14a, 14b, in the event the production site fails.
Each of the DPAs 14a-14d includes an upgrade status object. For example, the DPA 14a includes an upgrade status object 22a, the DPA 14b includes an upgrade status object 22b, the DPA 14c includes an upgrade status object 22c and the DPA 14d includes an upgrade status object 22d. The upgrade status objects 22a-22d indicate whether a software upgrade is in progress for the respective DPA 14a-14d. In one example, the upgrade status object 22a-22d is a persistent bit that is not removed or copied over during a software upgrade of the respective DPA 14a-14d. In one particular example, if the upgrade status object 22a-22d is set to “True,” no processes on the respective DPA 14a-14d will run when a script is executed to run all processes on the DPA. For example, nothing on the DPA 14a-14d, whether processes, services, daemons, webservers and so forth, will be run automatically or manually.
In one example, the upgrade status objects 22a-22d may be used in the event of a crash during upgrade so that their respective DPA 14a-14d can recognize its previous state prior to the crash.
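The role of the upgrade status object as a persistent gate can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the class `UpgradeStatus`, the JSON file used to hold the bit, and the function `run_all_processes` are hypothetical and are not taken from the described system. The sketch shows two properties stated above: the run-all script starts nothing while the flag is True, and the flag survives a restart so that a DPA can recognize that it crashed in the middle of an upgrade.

```python
import json
import os
import tempfile


class UpgradeStatus:
    """Hypothetical persistent flag standing in for an upgrade status object."""

    def __init__(self, path):
        self.path = path

    def get(self):
        # A missing file is treated as "no upgrade in progress" (an assumption).
        if not os.path.exists(self.path):
            return False
        with open(self.path) as f:
            return bool(json.load(f)["upgrading"])

    def set(self, value):
        # Write to a temporary file and rename it, so that a crash in the
        # middle of the write does not leave a half-written flag behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"upgrading": bool(value)}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def run_all_processes(status, start_fns):
    """Start every process unless an upgrade is flagged as in progress."""
    if status.get():
        return []  # flag is True: nothing is started
    return [fn() for fn in start_fns]


def demo_status():
    with tempfile.TemporaryDirectory() as d:
        status = UpgradeStatus(os.path.join(d, "upgrade_status.json"))
        status.set(True)
        blocked = run_all_processes(status, [lambda: "replication"])
        # A fresh object on the same path models a restart after a crash.
        after_crash = UpgradeStatus(status.path).get()
        status.set(False)
        started = run_all_processes(status, [lambda: "replication"])
        return blocked, after_crash, started
```

In this sketch the flag lives in a file outside the software being replaced; any storage that is not removed or copied over during the upgrade would serve the same purpose.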
The storage volume 16a includes a DPA upgrade object 26a and a DPA upgrade object 26b. The DPA upgrade object 26a is updated by the DPA 14a and the DPA upgrade object 26b is updated by the DPA 14b. The DPA 14a can read or access the DPA upgrade object 26b but it cannot write to it. Likewise, the DPA 14b can read or access the DPA upgrade object 26a but it cannot write to it.
Similarly, the storage volume 16b includes a DPA upgrade object 26c and a DPA upgrade object 26d. The DPA upgrade object 26c is updated by the DPA 14c and the DPA upgrade object 26d is updated by the DPA 14d. The DPA 14c can read or access the DPA upgrade object 26d but it cannot write to it. Likewise, the DPA 14d can read or access the DPA upgrade object 26c but it cannot write to it.
The DPA upgrade objects 26a-26d include two fields. For example, the DPA upgrade object 26a includes a site control field 32a and a version field 36a, the DPA upgrade object 26b includes a site control field 32b and a version field 36b, the DPA upgrade object 26c includes a site control field 32c and a version field 36c and the DPA upgrade object 26d includes a site control field 32d and a version field 36d.
The site control field 32a-32d indicates whether the respective DPA 14a-14d can take over site control. In one example, a “True” in the site control field 32a-32d indicates that the DPA can take over site control while a “False” in the site control field 32a-32d indicates that the DPA cannot take over site control. The version field 36a-36d indicates what version of software is on or is about to be on the respective DPA 14a-14d.
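The DPA upgrade objects and their access rule (each object is written only by its own DPA and is readable by the other) can be sketched as a small data structure. The names `DpaUpgradeObject` and `UpgradeObjectStore`, the in-memory dictionary standing in for the storage volume, and the string identifiers "14a" and "14b" are assumptions made for illustration; the text does not specify how the objects are stored or how the write restriction is enforced.

```python
from dataclasses import dataclass


@dataclass
class DpaUpgradeObject:
    site_control: bool  # True: this DPA can take over site control
    version: str        # version on, or about to be on, the DPA


class UpgradeObjectStore:
    """Hypothetical in-memory stand-in for the objects on a storage volume."""

    def __init__(self, initial):
        self._objects = dict(initial)

    def read(self, owner):
        # Any DPA may read any object; a copy is returned so that the
        # caller cannot modify the stored state through it.
        obj = self._objects[owner]
        return DpaUpgradeObject(obj.site_control, obj.version)

    def write(self, writer, owner, obj):
        # Only the owning DPA may update its own object.
        if writer != owner:
            raise PermissionError(f"{writer} cannot write the object of {owner}")
        self._objects[owner] = DpaUpgradeObject(obj.site_control, obj.version)


def demo_store():
    store = UpgradeObjectStore({
        "14a": DpaUpgradeObject(True, "n"),
        "14b": DpaUpgradeObject(True, "n"),
    })
    store.write("14a", "14a", DpaUpgradeObject(False, "n+1"))
    try:
        store.write("14a", "14b", DpaUpgradeObject(False, "n+1"))
        cross_write_rejected = False
    except PermissionError:
        cross_write_rejected = True
    return store.read("14a"), store.read("14b"), cross_write_rejected
```

Because each object has exactly one writer, the two DPAs never contend over the same record, which is one plausible reason for keeping a separate object per DPA.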
Referring to
The following is an example of process 200 executing on the DPA 14a first before executing on the DPA 14b. Process 200 receives notification that a new software version (n+1) is available (202), terminates the processes running on the DPA 14a (204) and sets the upgrade status object 22a to “True” (208). Process 200 determines if the software version for the DPA 14a is the same as the software version for the DPA 14b (214). For example, the DPA 14a reads the version field 36b in the DPA upgrade object 26b to determine the version of software that is on, or is about to be put on, the DPA 14b. If the software version for the DPA 14a is the same as the software version for the DPA 14b, the process 200 sleeps for a predetermined time (220). For example, the DPA 14a sleeps for 30 seconds. By allowing the DPA 14a to sleep, the DPA 14b can take over site control from the DPA 14a if the DPA 14a has site control. In other examples, as an alternative to sleeping, if the current node has site control it can electively relinquish control to another node if such a mechanism exists (e.g., using a push mechanism instead of a pull mechanism). Using a sleep mechanism is one example of a way to minimize the time in which the system is without site control. System 10 can function for a while without site control; however, this is costly, so reducing the time during which there is no site control is desired.
Process 200 sets the fields in the DPA upgrade object 26a (228). For example, the DPA 14a sets the site control field to False and the version field from “n” to “n+1.”
Process 200 upgrades the software on the DPA 14a (234) and sets the upgrade status object 22a to False from True (238). Process 200 runs the processes on the DPA 14a (244). For example, the DPA 14a runs a script that runs all the processes on the DPA 14a. The script is allowed to execute if the upgrade status object 22a is set to False.
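The sequence of processing blocks 204 through 244 can be sketched as straight-line code. This is a minimal sketch under stated assumptions, not the described implementation: the node is modelled as a plain dictionary with hypothetical keys, the upgrade itself is reduced to changing a version string, and the `sleep` function is injectable so the 30-second wait can be replaced in a test. The block numbers in the comments are the ones used in the text.

```python
import time


def run_blocks_204_to_244(node, peer_version, new_version,
                          sleep=time.sleep, sleep_seconds=30):
    """Walk blocks 204-244 on a hypothetical dict-based node model."""
    log = []
    node["processes_running"] = False      # 204: terminate processes
    log.append(204)
    node["upgrade_status"] = True          # 208: upgrade in progress
    log.append(208)
    if node["version"] == peer_version:    # 214: same version as the peer?
        sleep(sleep_seconds)               # 220: let the peer take over site control
        log.append(220)
    node["site_control_field"] = False     # 228: cannot take over site control
    node["version_field"] = new_version    # 228: version about to be installed
    log.append(228)
    node["version"] = new_version          # 234: the upgrade itself (stubbed)
    log.append(234)
    node["upgrade_status"] = False         # 238: upgrade no longer in progress
    log.append(238)
    if not node["upgrade_status"]:         # 244: script runs only if the flag is False
        node["processes_running"] = True
        log.append(244)
    return log


def make_node(version):
    # Hypothetical starting state of a DPA before process 200 begins.
    return {
        "version": version,
        "version_field": version,
        "site_control_field": True,
        "upgrade_status": False,
        "processes_running": True,
    }
```

When the second DPA later runs the same code, its own version ("n") differs from the version field already written by the first DPA ("n+1"), so the sleep at block 220 is skipped in this model.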
Process 200 determines if the DPA upgrade object 26b is not corrupted (252) and determines whether the software versions between the DPAs 14a, 14b are the same (254). If the DPA upgrade object 26b is not corrupted and the software versions are the same, process 200 sets the site control field 32a to True from False and allows transfer of site control (264). Thus, the DPA 14a waits until just before the DPA 14b starts its upgrade (i.e., when the DPA 14b executes processing block 228 and changes the version field 36b from “n” to “n+1”) to allow transfer of the site control at the production site, thereby ensuring that only one DPA 14a, 14b is in control of the production site at a time. The transfer of site control is determined by a leader election protocol such as described, for example, in U.S. Pat. No. 7,840,662.
The process 200 persistently updates fields in the DPA upgrade object 26a (270). For example, the site control field is updated with a “True” and the version field is updated with “n+1.”
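The condition checked in processing blocks 252 through 270 can be sketched as a single function. The names are hypothetical, the upgrade objects are modelled as dictionaries, a corrupted peer object is modelled as `None`, and the comparison at block 254 is assumed to use the version fields of the two objects. The text does not say what happens when the condition is not met; this sketch simply leaves the site control field unchanged in that case.

```python
def allow_transfer_if_ready(own, peer, persist):
    """Blocks 252-270 on hypothetical dict-shaped DPA upgrade objects."""
    # 252: a peer object that could not be read intact is modelled as None.
    peer_intact = peer is not None
    # 254: compare the version fields of the two objects.
    same_version = peer_intact and peer["version"] == own["version"]
    if same_version:
        own["site_control"] = True   # 264: allow transfer of site control
        persist(dict(own))           # 270: persist the updated fields
    return own["site_control"]
```

For example, with `own = {"site_control": False, "version": "n+1"}`, a peer still reporting version "n" leaves the field False, while a peer whose version field has already changed to "n+1" causes the field to be set to True and persisted.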
Referring to
The processes described herein (e.g., process 200) are not limited to use with the hardware and software of
The system may be implemented, at least in part, via a computer program product (e.g., in a machine-readable storage device), for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). Each such program may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs may be implemented in assembly or machine language. The language may be a compiled or an interpreted language and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network. A computer program may be stored on a storage medium or device (e.g., CD-ROM, hard disk, or magnetic diskette) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the processes described herein. The processes described herein may also be implemented as a machine-readable storage medium, configured with a computer program, where upon execution, instructions in the computer program cause the computer to operate in accordance with the processes.
The processes described herein are not limited to the specific examples described. For example, the process 200 is not limited to the specific processing order of
The processing blocks (for example, process 200) associated with implementing the system may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system. All or part of the system may be implemented as special purpose logic circuitry (e.g., an FPGA (field-programmable gate array) and/or an ASIC (application-specific integrated circuit)).
Elements of different embodiments described herein may be combined to form other embodiments not specifically set forth above. Other embodiments not specifically described herein are also within the scope of the following claims.
Number | Name | Date | Kind |
---|---|---|---|
6212557 | Oran | Apr 2001 | B1 |
6892316 | Eide | May 2005 | B2 |
7203741 | Marco et al. | Apr 2007 | B2 |
7206910 | Chang | Apr 2007 | B2 |
7260818 | Iterum | Aug 2007 | B1 |
7360208 | Joshi | Apr 2008 | B2 |
7719443 | Natanzon | May 2010 | B1 |
7840536 | Ahal et al. | Nov 2010 | B1 |
7840662 | Natanzon | Nov 2010 | B1 |
7844856 | Ahal et al. | Nov 2010 | B1 |
7860836 | Natanzon et al. | Dec 2010 | B1 |
7882286 | Natanzon et al. | Feb 2011 | B1 |
7934262 | Natanzon et al. | Apr 2011 | B1 |
7958372 | Natanzon | Jun 2011 | B1 |
8037162 | Marco et al. | Oct 2011 | B2 |
8041940 | Natanzon et al. | Oct 2011 | B1 |
8060713 | Natanzon | Nov 2011 | B1 |
8060714 | Natanzon | Nov 2011 | B1 |
8103937 | Natanzon et al. | Jan 2012 | B1 |
8108634 | Natanzon et al. | Jan 2012 | B1 |
8214612 | Natanzon | Jul 2012 | B1 |
8250149 | Marco et al. | Aug 2012 | B2 |
8271441 | Natanzon et al. | Sep 2012 | B1 |
8271447 | Natanzon et al. | Sep 2012 | B1 |
8326800 | Cunningham | Dec 2012 | B2 |
8332687 | Natanzon et al. | Dec 2012 | B1 |
8335761 | Natanzon | Dec 2012 | B1 |
8335771 | Natanzon et al. | Dec 2012 | B1 |
8341115 | Natanzon et al. | Dec 2012 | B1 |
8370648 | Natanzon | Feb 2013 | B1 |
8380885 | Natanzon | Feb 2013 | B1 |
8392680 | Natanzon et al. | Mar 2013 | B1 |
8429362 | Natanzon et al. | Apr 2013 | B1 |
8433869 | Natanzon et al. | Apr 2013 | B1 |
8438135 | Natanzon et al. | May 2013 | B1 |
8464101 | Natanzon et al. | Jun 2013 | B1 |
8478955 | Natanzon et al. | Jul 2013 | B1 |
8495304 | Natanzon et al. | Jul 2013 | B1 |
8498967 | Chatterjee et al. | Jul 2013 | B1 |
8510279 | Natanzon et al. | Aug 2013 | B1 |
8521691 | Natanzon | Aug 2013 | B1 |
8521694 | Natanzon | Aug 2013 | B1 |
8521853 | Rathunde et al. | Aug 2013 | B2 |
8543609 | Natanzon | Sep 2013 | B1 |
8583885 | Natanzon | Nov 2013 | B1 |
8589535 | Calder | Nov 2013 | B2 |
8600945 | Natanzon et al. | Dec 2013 | B1 |
8601085 | Ives et al. | Dec 2013 | B1 |
8627012 | Derbeko et al. | Jan 2014 | B1 |
8683592 | Dotan et al. | Mar 2014 | B1 |
8694700 | Natanzon et al. | Apr 2014 | B1 |
8706700 | Natanzon et al. | Apr 2014 | B1 |
8712962 | Natanzon et al. | Apr 2014 | B1 |
8712974 | Datuashvili | Apr 2014 | B2 |
8719497 | Don et al. | May 2014 | B1 |
8725691 | Natanzon | May 2014 | B1 |
8725692 | Natanzon et al. | May 2014 | B1 |
8726066 | Natanzon et al. | May 2014 | B1 |
8738813 | Natanzon et al. | May 2014 | B1 |
8745004 | Natanzon et al. | Jun 2014 | B1 |
8751828 | Raizen et al. | Jun 2014 | B1 |
8769336 | Natanzon et al. | Jul 2014 | B1 |
8805786 | Natanzon | Aug 2014 | B1 |
8806161 | Natanzon | Aug 2014 | B1 |
8825848 | Dotan et al. | Sep 2014 | B1 |
8832399 | Natanzon et al. | Sep 2014 | B1 |
8850143 | Natanzon | Sep 2014 | B1 |
8850144 | Natanzon et al. | Sep 2014 | B1 |
8862546 | Natanzon et al. | Oct 2014 | B1 |
8892835 | Natanzon et al. | Nov 2014 | B1 |
8898112 | Natanzon et al. | Nov 2014 | B1 |
8898409 | Natanzon et al. | Nov 2014 | B1 |
8898515 | Natanzon | Nov 2014 | B1 |
8898519 | Natanzon et al. | Nov 2014 | B1 |
8914595 | Natanzon | Dec 2014 | B1 |
8924668 | Natanzon | Dec 2014 | B1 |
8930500 | Marco et al. | Jan 2015 | B2 |
8930947 | Derbeko et al. | Jan 2015 | B1 |
8935498 | Natanzon | Jan 2015 | B1 |
8949180 | Natanzon et al. | Feb 2015 | B1 |
8954673 | Natanzon et al. | Feb 2015 | B1 |
8954796 | Cohen et al. | Feb 2015 | B1 |
8959054 | Natanzon | Feb 2015 | B1 |
8977593 | Natanzon et al. | Mar 2015 | B1 |
8977826 | Meiri et al. | Mar 2015 | B1 |
8996460 | Frank et al. | Mar 2015 | B1 |
8996461 | Natanzon et al. | Mar 2015 | B1 |
8996827 | Natanzon | Mar 2015 | B1 |
9003138 | Natanzon et al. | Apr 2015 | B1 |
9026696 | Natanzon et al. | May 2015 | B1 |
9031913 | Natanzon | May 2015 | B1 |
9032160 | Natanzon et al. | May 2015 | B1 |
9037818 | Natanzon et al. | May 2015 | B1 |
9063994 | Natanzon et al. | Jun 2015 | B1 |
9069479 | Natanzon | Jun 2015 | B1 |
9069709 | Natanzon et al. | Jun 2015 | B1 |
9081754 | Natanzon et al. | Jul 2015 | B1 |
9081842 | Natanzon et al. | Jul 2015 | B1 |
9087008 | Natanzon | Jul 2015 | B1 |
9087112 | Natanzon et al. | Jul 2015 | B1 |
9104529 | Derbeko et al. | Aug 2015 | B1 |
9110914 | Frank et al. | Aug 2015 | B1 |
9116811 | Derbeko et al. | Aug 2015 | B1 |
9128628 | Natanzon et al. | Sep 2015 | B1 |
9128855 | Natanzon et al. | Sep 2015 | B1 |
9134914 | Derbeko et al. | Sep 2015 | B1 |
9135119 | Natanzon et al. | Sep 2015 | B1 |
9135120 | Natanzon | Sep 2015 | B1 |
9146878 | Cohen et al. | Sep 2015 | B1 |
9152339 | Cohen et al. | Oct 2015 | B1 |
9152578 | Saad et al. | Oct 2015 | B1 |
9152814 | Natanzon | Oct 2015 | B1 |
9158578 | Derbeko et al. | Oct 2015 | B1 |
9158630 | Natanzon | Oct 2015 | B1 |
9160526 | Raizen et al. | Oct 2015 | B1 |
9177670 | Derbeko et al. | Nov 2015 | B1 |
20070074201 | Lee | Mar 2007 | A1 |
20090144720 | Roush | Jun 2009 | A1 |
20090313630 | Hori | Dec 2009 | A1 |
20100042869 | Szabo et al. | Feb 2010 | A1 |
20100162226 | Borissov | Jun 2010 | A1 |
20110161949 | Kodaka | Jun 2011 | A1 |
20130031403 | Mordani | Jan 2013 | A1 |
Entry |
---|
Webopedia, “What is Sleep Mode?” Feb. 1, 2001, last retrieved from http://www.webopedia.com/TERM/S/sleep_mode.html on Aug. 20, 2015. |