Enterprise environments are gradually scaling up to include distributed database systems that consolidate ever-increasing amounts of data. Designing backup solutions for such large-scale environments is becoming more and more challenging as a variety of factors need to be considered. One such factor is the prolonged time required to initialize, and subsequently complete, the backup of data across failover databases.
Other aspects of the invention will be apparent from the following description and the appended claims.
Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. In the following detailed description of the embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
In the following description of
Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to necessarily imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and a first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
In general, embodiments of the invention relate to a decoupled backup solution for distributed databases across a failover cluster. Specifically, one or more embodiments of the invention improve upon a limitation of existing backup mechanisms involving distributed databases across a failover cluster. The limitation entails restraining backup agents, responsible for executing database backup processes across the failover cluster, from immediately initiating these aforementioned processes upon receipt of instructions. Rather, due to this limitation, these backup agents must wait until all backup agents, across the failover cluster, receive their respective instructions before being permitted to initiate the creation of backup copies of their respective distributed database. Consequently, the limitation imposes an initiation delay on the backup processes, which one or more embodiments of the invention omit, thereby granting any particular backup agent the capability to immediately (i.e., without delay) initiate those backup processes.
In one embodiment of the invention, the above-mentioned components may be directly or indirectly connected to one another through a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, or any other network). The network may be implemented using any combination of wired and/or wireless connections. In embodiments in which the above-mentioned components are indirectly connected, there may be other networking components or systems (e.g., switches, routers, gateways, etc.) that facilitate communications, information exchange, and/or resource sharing. Further, the above-mentioned components may communicate with one another using any combination of wired and/or wireless communication protocols.
In one embodiment of the invention, a CUC (102A-102N) may be any computing system operated by a user of the DFC (106). A user of the DFC (106) may refer to an individual, a group of individuals, or an entity for which the database(s) of the DFC (106) is/are intended, or who accesses the database(s). Further, a CUC (102A-102N) may include functionality to: submit application programming interface (API) requests to the DFC (106), where the API requests may be directed to accessing (e.g., reading data from and/or writing data to) the database(s) of the DFC (106); and receive API responses, from the DFC (106), entailing, for example, queried information. One of ordinary skill will appreciate that a CUC (102A-102N) may perform other functionalities without departing from the scope of the invention. Examples of a CUC (102A-102N) include, but are not limited to, a desktop computer, a laptop computer, a tablet computer, a server, a mainframe, a smartphone, or any other computing system similar to the exemplary computing system shown in
In one embodiment of the invention, the CAC (104) may be any computing system operated by an administrator of the DFC (106). An administrator of the DFC (106) may refer to an individual, a group of individuals, or an entity that may be responsible for overseeing operations and maintenance pertinent to hardware, software, and/or firmware elements of the DFC (106). Further, the CAC (104) may include functionality to: submit data backup requests to a primary backup agent (PBA) (110) (described below), where the data backup requests may pertain to the performance of decoupled distributed backups of the active and one or more passive databases across the DFC (106); and receive, from the PBA (110), aggregated reports based on outcomes obtained through the processing of the data backup requests. One of ordinary skill will appreciate that the CAC (104) may perform other functionalities without departing from the scope of the invention. Examples of the CAC (104) include, but are not limited to, a desktop computer, a laptop computer, a tablet computer, a server, a mainframe, a smartphone, or any other computing system similar to the exemplary computing system shown in
In one embodiment of the invention, the DFC (106) may refer to a group of linked nodes—i.e., database failover nodes (DFNs) (108A-108N) (described below)—that work together to maintain high availability (or minimize downtime) of one or more applications and/or services. The DFC (106) may achieve the maintenance of high availability by distributing any workload (i.e., applications and/or services) across or among the various DFNs (108A-108N) such that, in the event that any one or more DFNs (108A-108N) go offline, the workload may be subsumed by, and therefore may remain available on, other DFNs (108A-108N) of the DFC (106). Further, reasons for which a DFN (108A-108N) may go offline include, but are not limited to, scheduled maintenance, unexpected power outages, and failure events induced through, for example, hardware failure, data corruption, and other anomalies caused by cyber security attacks and/or threats. Moreover, the various DFNs (108A-108N) in the DFC (106) may reside in different physical (or geographical) locations in order to mitigate the effects of unexpected power outages and failure (or failover) events. By way of an example, the DFC (106) may represent a Database Availability Group (DAG) or a Windows Server Failover Cluster (WSFC), which may each encompass multiple Structured Query Language (SQL) servers.
In one embodiment of the invention, a DFN (108A-108N) may be a physical appliance—e.g., a server or any computing system similar to the exemplary computing system shown in
In one embodiment of the invention, the various DFNs (108A-108N) in the DFC (106) may operate under an active-standby (or active-passive) failover configuration. That is, under the aforementioned failover configuration, one of the DFNs (108A) may play the role of the active (or primary) node in the DFC (106), whereas the remaining one or more DFNs (108B-108N) may play the role of the standby (or secondary) node(s) in the DFC (106). With respect to roles, the active node may refer to a node to which client traffic (i.e., network traffic originating from one or more CUCs (102A-102N)) may currently be directed. A standby node, on the other hand, may refer to a node that may currently not be interacting with one or more CUCs (102A-102N).
In one embodiment of the invention, each DFN (108A-108N) may host a backup agent (110, 112A-112M) thereon. Specifically, the active (or primary) DFN (108A) of the DFC (106) may host a primary backup agent (PBA) (110), whereas the one or more standby (or secondary) DFNs (108B-108N) of the DFC (106) may each host a secondary backup agent (SBA) (112A-112M). In general, a backup agent (110, 112A-112M) may be a computer program or process (i.e., an instance of a computer program) tasked with performing data backup operations entailing the replication, and subsequent remote storage, of data residing on the DFN (108A-108N) on which the backup agent (110, 112A-112M) may be executing. In one embodiment of the invention, a PBA (110) may refer to a backup agent that may be executing on an active DFN (108A), whereas a SBA (112A-112M) may refer to a backup agent that may be executing on a standby DFN (108B-108N).
In one embodiment of the invention, the PBA (110) may include functionality to: receive data backup requests from the CAC (104), where the data backup requests may pertain to the initialization of data backup operations across the DFC (106); initiate primary data backup processes on the active (or primary) node (i.e., on which the PBA (110) may be executing) in response to the data backup requests from the CAC (104); also in response to the data backup requests from the CAC (104), issue secondary data backup requests to the one or more SBAs (112A-112M) executing on the one or more standby (or secondary) nodes in the DFC (106), where the secondary data backup requests pertain to the initialization of data backup operations at each standby node, respectively; obtain an outcome based on the performing of the data backup operations on the active node, where the outcome may indicate that performance of the data backup operations was either a success or a failure; receive data backup reports from the one or more SBAs (112A-112M) pertaining to outcomes obtained based on the performing of data backup operations on the one or more standby nodes, where each data backup report may indicate that the performance of the data backup operations, on the respective standby node, was either a success or a failure; aggregate the various outcomes, obtained at the active node or through data backup reports received from one or more standby nodes, to generate aggregated data backup reports; and issue, transmit, or provide the aggregated data backup reports to the CAC (104) in response to data backup requests received therefrom.
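By way of illustration only, the PBA workflow described above may be sketched as follows. This is a minimal, hypothetical sketch; the function names, arguments, and report structure are illustrative assumptions and do not represent any particular disclosed implementation.

```python
# Hypothetical sketch of the PBA workflow described above; all names and
# data structures are illustrative assumptions, not a disclosed design.

def primary_backup_handler(backup_local_db, secondary_agents):
    """Handle a data backup request received from the CAC.

    backup_local_db  -- callable that performs the primary data backup
                        process on the active node and returns an outcome
                        ("success" or "failure")
    secondary_agents -- iterable of callables, one per SBA; each accepts a
                        secondary data backup request and returns a data
                        backup report (its outcome)
    """
    # Initiate the primary data backup process on the active node.
    primary_outcome = backup_local_db()

    # Issue secondary data backup requests to the SBAs; each SBA returns a
    # data backup report for its own (immediately initiated) process.
    secondary_reports = [agent("secondary-backup-request")
                         for agent in secondary_agents]

    # Aggregate all outcomes into a single report for the CAC.
    return {"primary": primary_outcome, "secondary": secondary_reports}
```

In this sketch, the aggregated dictionary stands in for the aggregated data backup report that the PBA (110) would issue back to the CAC (104).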
In one embodiment of the invention, a SBA (112A-112M) may include functionality to: receive secondary data backup requests from the PBA (110), where the secondary data backup requests pertain to the initialization of data backup operations on the standby (or secondary) node on which the SBA (112A-112M) may be executing; immediately afterwards (i.e., not waiting on other SBAs (112A-112M) to receive their respective secondary data backup requests), initiate secondary data backup processes on their respective standby node; obtain an outcome based on the performing of the secondary data backup processes on their respective standby node, where the outcome may indicate that performance of the secondary data backup processes was either a success or a failure; generate data backup reports based on the obtained outcome(s); and issue, transmit, or provide the data backup reports to the PBA (110) in response to the secondary data backup requests received therefrom.
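The SBA behavior described above—initiating the backup immediately upon receipt of a request, then reporting the outcome—may be sketched as follows. The handler and its arguments are hypothetical illustrations, not part of any disclosed implementation.

```python
# Hypothetical SBA request handler; names are illustrative assumptions.

def secondary_backup_handler(request, backup_local_db, submit_to_bss):
    """Handle a secondary data backup request from the PBA.

    The secondary data backup process is initiated immediately upon
    receipt of the request -- the SBA does not wait for other SBAs to
    receive their respective requests.
    """
    try:
        backup_copy = backup_local_db()   # create the backup copy
        submit_to_bss(backup_copy)        # submit to the cluster BSS
        return {"request": request, "outcome": "success"}
    except Exception as err:              # e.g., database-related errors
        return {"request": request, "outcome": "failure", "error": str(err)}
```

A successful run yields a report with outcome "success"; any database-related error during backup creation yields a "failure" report, matching the two outcomes described above.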
In one embodiment of the invention, data backup operations (or processes), performed by any backup agent (110, 112A-112M), may entail creating full database backups, differential database backups, and/or transaction log backups of the database copy (114, 116A-116M) (described below) residing on or operatively connected to the DFN (108A-108N) on which the backup agent (110, 112A-112M) may be executing. A full database backup may refer to the generation of a backup copy containing all data files and the transaction log residing on the database copy (114, 116A-116M). The transaction log may refer to a data object or structure that records all transactions, and database changes made by each transaction, pertinent to the database copy (114, 116A-116M). A differential database backup may refer to the generation of a backup copy containing all changes made to the database copy (114, 116A-116M) since the last full database backup, and the transaction log, residing on the database copy (114, 116A-116M). Meanwhile, a transaction log backup may refer to the generation of a backup copy containing all transaction log records that have been made between the last transaction log backup, or the first full database backup, and the last transaction log record that may be created upon completion of the data backup process. In one embodiment of the invention, upon the successful creation of a full database backup, a differential database backup, and/or a transaction log backup, each backup agent (110, 112A-112M) may include further functionality to submit the created backup copy to the cluster BSS (118) for remote consolidation.
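The distinction among the three backup types described above may be illustrated with the following sketch over a toy in-memory "database copy." The dictionary layout and function names are assumptions made purely for illustration.

```python
# Illustrative sketch of full, differential, and transaction log backups,
# modeled over a toy in-memory database copy; all structures are assumed.

def full_backup(db):
    # All data files plus the transaction log.
    return {"type": "full", "data": dict(db["data"]), "log": list(db["log"])}

def differential_backup(db, last_full):
    # All changes made since the last full backup, plus the transaction log.
    changed = {k: v for k, v in db["data"].items()
               if last_full["data"].get(k) != v}
    return {"type": "differential", "data": changed, "log": list(db["log"])}

def transaction_log_backup(db, last_log_len):
    # Transaction log records created since the last log (or full) backup.
    return {"type": "log", "log": db["log"][last_log_len:]}
```

Note how the differential backup captures only data changed since the last full backup, while the transaction log backup captures only log records appended since the prior backup point.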
In one embodiment of the invention, a database copy (114, 116A-116M) may be a storage system or media for consolidating various forms of information pertinent to the DFN (108A-108N) on which the database copy (114, 116A-116M) may be residing or to which the database copy (114, 116A-116M) may be operatively connected. Information consolidated in the database copy (114, 116A-116M) may be partitioned into either a data files segment (not shown) or a log files segment (not shown). Information residing in the data files segment may include, for example, data and objects such as tables, indexes, stored procedures, and views. Further, any information written to the database copy (114, 116A-116M), by one or more end users, may be retained in the data files segment. On the other hand, information residing in the log files segment may include, for example, the transaction log (described above) and any other metadata that may facilitate the recovery of any and all transactions in the database copy (114, 116A-116M).
In one embodiment of the invention, a database copy (114, 116A-116M) may span logically across one or more physical storage units and/or devices, which may or may not be of the same type or co-located at a same physical site. Further, information consolidated in a database copy (114, 116A-116M) may be arranged using any storage mechanism (e.g., a filesystem, a collection of tables, a collection of records, etc.). In one embodiment of the invention, a database copy (114, 116A-116M) may be implemented using persistent (i.e., non-volatile) storage media. Examples of persistent storage media include, but are not limited to: optical storage, magnetic storage, NAND Flash Memory, NOR Flash Memory, Magnetic Random Access Memory (M-RAM), Spin Torque Magnetic RAM (ST-MRAM), Phase Change Memory (PCM), or any other storage media defined as non-volatile Storage Class Memory (SCM).
In one embodiment of the invention, within the DFC (106), there may be one active database copy (ADC) (114) and one or more passive database copies (PDCs) (116A-116M). The ADC (114) may refer to the database copy that resides on, or is operatively connected to, the active (or primary) DFN (108A) in the DFC (106). Said another way, the ADC (114) may refer to the database copy that may be currently hosting the information read therefrom and written thereto by the one or more CUCs (102A-102N) via the active DFN (108A). Accordingly, the ADC (114) may be operating in read-write (RW) mode, which grants read and write access to the ADC (114). On the other hand, a PDC (116A-116M) may refer to a database copy that resides on, or is operatively connected to, a standby (or secondary) DFN (108B-108N) in the DFC (106). Said another way, a PDC (116A-116M) may refer to a database copy with which the one or more CUCs (102A-102N) may not be currently engaging. Accordingly, a PDC (116A-116M) may be operating in read-only (RO) mode, which grants only read access to a PDC (116A-116M). Further, in one embodiment of the invention, a PDC (116A-116M) may include functionality to: receive transaction log copies of the transaction log residing on the ADC (114) from the PBA (110) via a respective SBA (112A-112M); and apply transactions, recorded in the received transaction log copies, to the transaction log residing on the PDC (116A-116M) in order to maintain the PDC (116A-116M) up-to-date with the ADC (114).
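The replay of received transaction log copies onto a PDC, as described above, may be sketched as follows. The representation of the PDC and of log records is a hypothetical assumption for illustration only.

```python
# Hypothetical sketch of keeping a passive database copy (PDC) up to date
# by replaying transaction log records received from the active copy (ADC).

def apply_transaction_log(pdc, log_copy):
    """Apply transactions from a received transaction log copy to the PDC.

    pdc      -- dict with "data" (table state) and "log" (applied records)
    log_copy -- list of (key, value) transactions recorded on the ADC
    """
    for key, value in log_copy:
        pdc["data"][key] = value          # replay the recorded change
        pdc["log"].append((key, value))   # extend the PDC's transaction log
    return pdc
```

Replaying each recorded transaction in order keeps the PDC's state consistent with the ADC without granting the PDC write access from client traffic.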
In one embodiment of the invention, the cluster BSS (118) may be a data backup, archiving, and/or disaster recovery storage system or media that consolidates various forms of information. Specifically, the cluster BSS (118) may be a consolidation point for backup copies (described above) created, and subsequently submitted, by the PBA (110) and the one or more SBAs (112A-112M) while performing data backup operations on their respective DFNs (108A-108N) in the DFC (106). In one embodiment of the invention, the cluster BSS (118) may be implemented using one or more servers (not shown). Each server may be a physical server (i.e., in a datacenter) or a virtual server (i.e., residing in a cloud computing environment). In another embodiment of the invention, the cluster BSS (118) may be implemented using one or more computing systems similar to the exemplary computing system shown in
In one embodiment of the invention, the cluster BSS (118) may further be implemented using one or more physical storage units and/or devices, which may or may not be of the same type or co-located in a same physical server or computing system. Further, the information consolidated in the cluster BSS (118) may be arranged using any storage mechanism (e.g., a filesystem, a collection of tables, a collection of records, etc.). In one embodiment of the invention, the cluster BSS (118) may be implemented using persistent (i.e., non-volatile) storage media. Examples of persistent storage media include, but are not limited to: optical storage, magnetic storage, NAND Flash Memory, NOR Flash Memory, Magnetic Random Access Memory (M-RAM), Spin Torque Magnetic RAM (ST-MRAM), Phase Change Memory (PCM), or any other storage media defined as non-volatile Storage Class Memory (SCM).
While
Turning to
In Step 202, in response to the data backup request (received in Step 200), a primary data backup process is initiated on an active (or primary) DFN (see e.g.,
In Step 204, one or more secondary data backup requests is/are issued to one or more secondary backup agents (SBAs), respectively. In one embodiment of the invention, each SBA may be a computer program or process (i.e., an instance of a computer program) executing on the underlying hardware of one of the one or more standby (or secondary) DFNs (see e.g.,
In one embodiment of the invention, upon receipt of a secondary data backup request, each SBA may immediately proceed with the initiation of the data backup operation (or process) on their respective standby/secondary DFNs. This immediate initiation of data backup operations/processes represents a fundamental improvement (or advantage) that embodiments of the invention provide over existing or traditional data backup mechanisms for database failover clusters (DFCs). That is, through existing/traditional data backup mechanisms, each SBA is required to wait until all SBAs in the DFC have received a respective secondary data backup request before each SBA is permitted to commence the data backup operation/process on their respective standby/secondary DFN. For DFCs hosting or operatively connected to substantially large database copies (e.g., where the total data size collectively consolidated on the database copies may reach up to 25,600 terabytes (TB) or 25.6 petabytes (PB) of data), the elapsed backup time, as well as the allocation and/or utilization of resources on an active/primary DFN, associated with performing backup operations (or processes) may proportionally be large in scale. Accordingly, by enabling SBAs to immediately initiate data backup operations/processes upon receipt of a secondary data backup request (rather than having them wait until all SBAs have received their respective secondary data backup request), one or more embodiments of the invention reduce the overall time expended to complete the various data backup operations/processes across the DFC.
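The timing advantage described above may be illustrated with a simple model (not taken from the disclosure): under the decoupled mechanism, each SBA starts as soon as its request arrives, whereas under the coupled mechanism no SBA starts until the last request has been delivered. All quantities below are assumed, hypothetical values.

```python
# Illustrative timing model (assumed values, not from the disclosure)
# comparing the traditional coupled mechanism with the decoupled one.

def completion_time_decoupled(deliveries, durations):
    # Each SBA starts its backup as soon as its request is delivered.
    return max(d + b for d, b in zip(deliveries, durations))

def completion_time_coupled(deliveries, durations):
    # No SBA may start until every SBA has received its request.
    start = max(deliveries)
    return max(start + b for b in durations)
```

For example, with request delivery times [1, 5, 20] and backup durations [30, 10, 5], the decoupled mechanism completes at time 31, while the coupled mechanism completes at time 50; the decoupled completion time can never exceed the coupled one.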
In Step 206, a data backup report is received from each SBA (to which a secondary data backup request had been issued in Step 204). In one embodiment of the invention, each data backup report may entail a message that indicates an outcome of the initiation of a secondary data backup process on a respective standby/secondary DFN. Subsequently, in one embodiment of the invention, a data backup report may relay a successful outcome representative of a successful secondary data backup process—i.e., the successful creation of a backup copy and the subsequent submission of the backup copy to the cluster BSS. In another embodiment of the invention, a data backup report may alternatively relay an unsuccessful outcome representative of an unsuccessful secondary data backup process—i.e., the unsuccessful creation of a backup copy due to one or more database-related errors. In one embodiment of the invention, similar outcomes may be obtained from the performance of the primary data backup process on the active/primary DFN (initiated in Step 202).
In Step 208, an aggregated data backup report is issued back to the CAC (wherefrom the data backup request had been received in Step 200). In one embodiment of the invention, the aggregated data backup report may entail a message that indicates the outcomes (described above) pertaining to the various data backup processes across the DFC. Therefore, the aggregated data backup report may be generated based on the outcome obtained through the primary data backup process performed on the active/primary DFN in the DFC, as well as based on the data backup report received from each SBA, which may specify the outcome obtained through a secondary data backup process performed on a respective standby/secondary DFN in the DFC.
In one embodiment of the invention, the computer processor(s) (302) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing system (300) may also include one or more input devices (310), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device. Further, the communication interface (312) may include an integrated circuit for connecting the computing system (300) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.
In one embodiment of the invention, the computing system (300) may include one or more output devices (308), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (302), non-persistent storage (304), and persistent storage (306). Many different types of computing systems exist, and the aforementioned input and output device(s) may take other forms.
Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.
Turning to
Turning to the example, consider a scenario whereby the CAC (402) issues a data backup request to the DFC (404). Following embodiments of the invention, the PBA (410) receives the data backup request. In response to receiving the data backup request, the PBA (410) initiates a primary data backup process at the primary DFN (406) directed to creating a backup copy of the ADC (414). After initiating the primary data backup process, the PBA (410) issues secondary data backup requests to the first and second SBAs (412A, 412B). Thereafter, upon receipt of a secondary data backup request, the first SBA (412A) immediately initiates a secondary data backup process at the first secondary DFN (408A), where the secondary data backup process is directed to creating a backup copy of the first PDC (416A). Similarly, upon receipt of another secondary data backup request, the second SBA (412B) immediately initiates another secondary data backup process at the second secondary DFN (408B), where the other secondary data backup process is directed to creating a backup copy of the second PDC (416B).
In contrast, had the first and second SBAs (412A, 412B) been operating using the existing or traditional backup mechanism for DFCs, upon receipt of a secondary data backup request, the first SBA (412A) refrains from initiating a secondary data backup process at the first secondary DFN (408A) until after all other SBAs (i.e., the second SBA (412B)) have received their respective secondary data backup request. Substantively, an initiation delay is built into the existing or traditional mechanism, which prevents any and all SBAs across the DFC from initiating a secondary data backup process until every single SBA receives a secondary data backup request. In the example system (400) portrayed, with the DFC (404) consisting of only two SBAs (412A, 412B), the initiation delay may be negligible. However, in real-world environments, where DFCs may include hundreds, if not thousands, of SBAs, the initiation delay may be substantial, thereby resulting in longer backup times, over-utilization of production (i.e., primary DFN (406)) resources, and other undesirable effects, which burden the performance of the DFC (404) and the overall user experience.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.