The present application claims priority from Japanese application JP2005-191702 filed on Jun. 30, 2005, the content of which is hereby incorporated by reference into this application.
1. Field of the Invention
The present invention relates to a technology for constructing a database management system of parallel-server configuration. More particularly, it relates to a technology which is effective when applied to the construction of a database management system which performs system fail-over when a failure has occurred.
2. Description of the Related Art
The technology of system fail-over has been employed as a method for enhancing the availability of a database management system. System fail-over is a technology for recovering the system and for shortening the service-down time caused by the occurrence of a failure. This recovery and shortening are accomplished by switching the processing on the machine where the failure has occurred over to another machine.
In a database management system to which the system fail-over control is applied, a machine which is supposed to become the switching destination at the time of a failure occurrence has been defined for each machine which is executing a service at present. Moreover, if the system detects a failure in an execution-system (i.e., operation-system) machine, the system performs the system fail-over to a standby-system machine. As this kind of technology, there has been known the one disclosed in, e.g., JP-A-2001-282763.
In the system fail-over, at first, the resources in the execution system, e.g., the discs storing the database and the network address, are switched over to the standby system. When the resource switching has been completed, the database server in the standby system refers to the log of the database inherited from the execution system, thereby executing a recovery processing for the database. At the point in time when the database recovery processing has been completed, the standby system starts reception of the service, thereby becoming the execution system. This step completes the system fail-over. This technology is explained in Jim Gray and Andreas Reuter: "Transaction Processing: Concepts and Techniques", Morgan Kaufmann Publishers, 1993.
When applying the system fail-over to a database management system where a plurality of database servers operate on one and the same machine, the following configuration can be employed: namely, the plurality of database servers on the single machine where a failure has occurred are switched over to a plurality of different machines. The processing of the failed machine is thus inherited by the plurality of different machines in a shared manner, thereby distributing the load caused by the failure. This configuration makes it possible to suppress a load increase in the switching-destination machines. Here, concerning which switching-destination machine each of the plurality of switching targets will switch its system to, the user has studied the plan in advance, and has described it in a definition file at the time of system initial construction.
Generally speaking, the machines which configure the database management system have basically the same performance. Accordingly, distributing the load equally or substantially equally results in the highest processing efficiency for the system as a whole.
In constructing the system, the user describes system information, such as the placement of the database servers on each machine in the system, in the definition file. Then, the system is started up based on this definition file. When applying the system fail-over configuration, as described earlier, the information on the system fail-over is also defined at the same time. In many cases, no consideration is given to the balance among the loads occurring after the system fail-over, and the system guarantees no balance of the loads among the machines after the system fail-over. Consequently, at the time of system initial construction, the user must design the system so that the loads will be balanced among all the machines even after the system fail-over.
At the time of system initial construction, the user must design the system correctly by taking into consideration the balance of the loads among all the machines occurring after the system fail-over. If the user fails in this task, the load balance in the system will become unequal at the time of initial start-up, or at the time when a failure has occurred and the system fail-over is performed. As a result, the loads become biased and are concentrated onto a particular machine after the system fail-over. This situation, in some cases, lowers the throughput of the entire system.
It is an object of the present invention to provide a technology which solves the above-described problem, and which guarantees that, at the initial start-up time and after the system fail-over, the loads among all the information processing devices will become equal or substantially equal.
It is another object of the present invention to provide a technology which reduces the labor of re-inputting the system configuration of an already-existing database management system.
In the present invention, in a database management system constructing device for constructing a database management system which stores a database divided into a plurality of storage areas and which performs data processing by relating a database processing server to each storage area, the database management system is constructed by calculating the number of database processing servers which can be equally distributed among the information processing devices that remain under operation at the time of a failure occurrence.
In the database management system constructing device of the present invention, input information is first read which includes the number of information processing devices configuring the database management system and a multi-failure tolerance count indicating the number of successive multi failures to be tolerated. Then, the total number of database processing servers is calculated such that, when a failure has occurred in an information processing device at each failure count up to the multi-failure tolerance count, the database processing servers of the failed device can be equally distributed among the other information processing devices. Moreover, the correspondences between the database processing servers and the database-used storage devices are determined, and a system-configuration definition file including these pieces of information is stored into the storage devices.
Next, the calculated total number of database processing servers is allocated among the information processing devices which are operable at the initial start-up time or at the system fail-over time, thereby determining the information processing device which is to execute each database processing server at the initial start-up time or at the system fail-over time. After that, a system fail-over definition file in which these pieces of information are set is stored into the storage devices.
After that, the stored system-configuration definition file is delivered via a communications device to each of the information processing devices configuring the database management system. In each information processing device, the database processing servers are executed based on the delivered definition file, thereby performing the database processing. At the time of a failure occurrence, the system fail-over is performed. Based on this system fail-over, the database processing servers which had been executed in the failed information processing device are executed by the information processing device specified in the system fail-over definition file.
According to the present invention, it becomes possible to guarantee that, at the initial start-up time and after the system fail-over, the loads among all the information processing devices will become equal or substantially equal.
Hereinafter, an explanation will be given concerning a database management system constructing device in a first embodiment for constructing a database management system including a plurality of information processing devices.
The database management system indicated in the present embodiment is constructed by a setting machine 100 which is the database management system constructing device. In the setting machine 100, an initial construction program 106 is operating. The initial construction program 106 includes a system-configuration determination processing unit 300 and a definition-file delivery processing unit 301. Incidentally, it is assumed that machines in the present embodiment are the information processing devices.
The system-configuration determination processing unit 300 receives input information 107. This input information 107 includes information such as the multi-failure tolerance count and the number of discs in the system. Based on this input information 107, the system-configuration determination processing unit 300 determines a system-configuration definition and a system fail-over definition in which, even if failures have occurred up to the specified tolerance count, the loads will remain equal among all the remaining machines. Then, the unit 300 outputs the determined contents as a system-configuration definition file 108 and a system fail-over definition file 109.
The system-configuration definition file 108 describes the respective types of settings needed for constructing the system. The system fail-over definition file 109 describes information on the configuration of the system fail-over and the system fail-over destinations for each DB processing server. Moreover, the definition-file delivery processing unit 301 delivers the system-configuration definition file 108 and the system fail-over definition file 109 to each machine.
A program for allowing the setting machine 100 in the present embodiment to function as the system-configuration determination processing unit 300 and the definition-file delivery processing unit 301 is recorded on a recording medium such as a CD-ROM, and is stored onto a magnetic disc or the like. After that, the program is loaded onto a memory and then executed. Incidentally, the recording medium for recording the above-described program may also be a recording medium other than a CD-ROM. Also, the program may be used by installing it into the information processing devices from the recording medium, or by accessing the recording medium via a network.
In the machine 0 (101), the machine 1 (102), the machine 2 (103), the machine 3 (104), and the machine 4 (105), database processing servers are started up based on the definition information in the system-configuration definition file 108, thereby configuring the system. What are called "database processing servers" here are database-processing execution environments which include process groups, memory areas, and the like; namely, the "database processing servers" mean the database processing functions provided by executing the programs for the database processing servers. External storage devices 115, each of which forms a pair with one of the DB processing servers 111 to 114, store the DB data. Also, a system fail-over function 116 operates in each machine in which the DB processing servers are located. The system fail-over function 116 monitors the operating situation of the DB processing servers in each machine. Then, at the time of a failure occurrence, the function 116 switches the system in accordance with the contents of the system fail-over definition file 109.
The architecture of the database management system in the present embodiment is a Shared Nothing architecture. The database (e.g., tables or indexes) managed by the present system is divided into a plurality of division tables and division indexes by using various techniques, and is stored in this divided manner into a plurality of DB storage areas within the external storage devices 115. Each DB storage area is caused to correspond to a determined DB processing server, and a DB processing server accesses only the data (e.g., table data or index data) stored within the DB storage areas which are caused to correspond to that DB processing server.
The setting machine 100 includes a CPU 2001, a main storage device 2002, a communications control device 2003, an I/O control device 2004, a terminal 2005, and an external storage device 2006 such as a magnetic disc. The communications control device 2003 is a device for controlling communications between the setting machine 100 and the other machines via a network 2007. The I/O control device 2004 is a device for controlling read/write of data from/into the external storage device 2006.
A processing program 2010 for implementing the initial construction program 106 is stored on the external storage device 2006. The CPU 2001 loads the initial construction program 106 onto the main storage device 2002, then executing the program 106. Also, the CPU 2001 receives input of the input information 107 from the user.
Next, at a step 602, the system-configuration determination processing unit 300 determines the number of DB processing servers with which the loads will remain equal among all the operating machines even after the system fail-over. Moreover, at a step 603, the unit 300 judges whether or not the system construction is executable. In the database management system in the present embodiment, it is assumed that a single DB processing server exclusively occupies a single external storage device 115. It is also assumed that the judgment as to whether or not the system construction is executable is performed by comparing the number of DB processing servers calculated at the step 602 with the number of discs in the system acquired from the input information 107.
If the number of discs is smaller than the number of DB processing servers, the unit 300 judges that the system construction is impossible. Accordingly, the unit 300 outputs, to an output device, a notification that the DB processing servers of a failed machine cannot be equally distributed among the operating machines. After that, at a step 604, the unit 300 notifies the user of the number of discs whose addition is necessary by outputting this number to the output device. Then, the unit 300 receives a response to this notification via the input device, thereby confirming whether or not the addition of the discs is possible. If the addition of the discs is permissible, at a step 606, the unit 300 acquires information on the necessary number of discs.
Meanwhile, if, at the step 604, the addition of the discs is impossible, at a step 605, the unit 300 outputs, to the output device, an inquiry as to whether or not the system is to be constructed with the number of DB processing servers set equal to the number of discs specified in the input information 107. Then, the unit 300 receives an instruction from the user via the input device.
If, at the step 605, the unit 300 receives an instruction from the user that the system is not to be constructed with the number of DB processing servers equal to the number of discs specified in the input information 107, the unit 300 proceeds to a step 607. Here, the unit 300 decrements, by "1", the value of the multi-failure tolerance count specified in the input information 107. Decrementing the number of multi failures to be tolerated makes it possible to construct the system with a smaller number of DB processing servers, i.e., a smaller number of discs. Furthermore, the unit 300 returns to the step 602, then determining the number of DB processing servers once again.
Meanwhile, if, in the processing at the step 605, the unit 300 receives an instruction from the user that the system is to be constructed with the number of DB processing servers equal to the number of discs specified in the input information 107, the unit 300 proceeds to a step 608, setting the number of DB processing servers to the number of discs specified in the input information 107.
At the step 608, the unit 300 determines the correspondences between the DB processing servers and the discs, then proceeding to a step 609. At the step 609, the unit 300 outputs, to the external storage device 2006, the system-configuration information including the determined number of DB processing servers and the correspondences between the DB processing servers and the discs, as the system-configuration definition file 108. At a step 610, the unit 300 determines a system fail-over sequence by setting the machines which execute the DB processing servers at the initial start-up time, and the machines which become the system fail-over destinations at the system fail-over time. At a step 611, the unit 300 outputs the information indicating the system fail-over sequence to the external storage device 2006 as the system fail-over definition file 109.
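The loop of the steps 602 to 607 can be sketched as follows. This is a hedged Python sketch; all function and variable names are hypothetical, `required_servers` condenses the calculation of the step 602, and the interactive confirmations of the steps 604 and 605 are reduced to two boolean inputs:

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def required_servers(machines, tolerance):
    # Step 602 (detailed as steps 701-707): per-machine count X times the machine count.
    x = 1
    for i in range(1, tolerance + 1):
        x = lcm(x, lcm(i, machines - i) // i)
    return x * machines

def plan(machines, discs, tolerance, can_add_discs, accept_disc_count):
    """Steps 602-607; one disc per DB processing server is assumed."""
    while True:
        servers = required_servers(machines, tolerance)  # step 602
        if servers <= discs:                             # step 603: constructible
            return servers, tolerance
        if can_add_discs:                                # steps 604, 606: add discs
            return servers, tolerance
        if accept_disc_count:                            # steps 605, 608: use disc count
            return discs, tolerance
        tolerance -= 1                                   # step 607: tolerate fewer failures
```

For example, four machines with twelve discs and a tolerance count of two can be constructed as-is, while with too few discs and both confirmations declined, the tolerance count is decremented and the calculation is repeated.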
First, at a step 701, the failed-machine count i is set at "1", then proceeding to a step 702. At the step 702, after the operating-machine count has been determined from the difference between the number of machines and the failed-machine count i, the value of the least common multiple S(i) of the failed-machine count i and the operating-machine count is determined. Moreover, at a step 703, the value T(i) is determined which results from dividing the least common multiple S(i) by the failed-machine count i.
At a step 704, it is judged whether or not the failed-machine count i has reached the multi-failure tolerance count. If the count has not been reached, the processing proceeds to a step 705, where the failed-machine count i is incremented by "1", then returning to the step 702. Meanwhile, if, at the step 704, it is judged that the failed-machine count i has reached the multi-failure tolerance count, the processing proceeds to a step 706.
At the step 706, the least common multiple X of all the values T(n) which have been determined at the step 703 up until now is determined. Here, n denotes the values ranging from 1 to the multi-failure tolerance count. At a step 707, the total number of DB processing servers in the system is determined. The value of X determined at the step 706 becomes the number of DB processing servers per machine. Accordingly, the total number of DB processing servers is determined by multiplying this value of X by the number of machines. Also, here, server names from DB server 00 onward are allocated to the respective DB processing servers.
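The calculation of the steps 701 to 707 can be sketched as follows (a minimal Python sketch; the function and variable names are hypothetical, not part of the embodiment):

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def total_db_servers(machine_count, tolerance):
    """Steps 701-707: find the smallest per-machine server count X such
    that, for every failed-machine count i up to the tolerance count, the
    servers of the i failed machines divide equally among the survivors."""
    x = 1
    for i in range(1, tolerance + 1):      # steps 701, 705: i = 1, 2, ...
        operating = machine_count - i      # step 702: operating-machine count
        s = lcm(i, operating)              # step 702: S(i)
        t = s // i                         # step 703: T(i)
        x = lcm(x, t)                      # step 706: X = lcm of all T(n)
    return x * machine_count               # step 707: total server count
```

With four machines and a tolerance count of two, this yields X = lcm(3, 1) = 3 servers per machine, i.e., twelve servers in total, which agrees with the construction example described later.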
At a step 802, the initial location machine in which each DB processing server is located at the initial start-up time is determined. Here, the number of DB processing servers determined at the step 602 is divided by the number of machines. Next, the DB processing servers are sequentially allocated to the machines in units of this quotient, and the names of the machines are stored into the field of the initial location machine in the system fail-over definition table 1000. Here, if the number of DB processing servers cannot be divided equally, it is divided as equally as possible.
Next, at a step 803, the system fail-over destinations at the first time are determined for each DB processing server. The DB processing servers which have one and the same machine as their initial location machine are equally allocated among the machines other than this initial location machine. Next, the names of these machines are stored into the field of the system fail-over destinations at the first time in the system fail-over definition table 1000, then proceeding to a step 804.
Moreover, at a step 804 and thereafter, the machines which become the system fail-over destinations at the second time and thereafter are determined. At the step 804, for a DB processing server whose system fail-over destination machine is to be determined, all of the DB processing servers are extracted which have so far been located in the same combination of machines as that DB processing server. Furthermore, the processing proceeds to a step 805.
At the step 805, the system fail-over destination machines are determined so that the DB processing servers extracted at the step 804 will be equally distributed among the machines other than the machines in which these DB processing servers have been located so far. Next, the names of the machines are stored into the fields of the fail-over destinations at the second time and thereafter in the system fail-over definition table 1000, then proceeding to a step 806.
At the step 806, it is confirmed whether or not there exists a DB processing server to which no system fail-over destination machine has been allocated. If there exists such an unallocated DB processing server, the processing returns to the step 804, then continuing the allocation. Meanwhile, if there exists no unallocated DB processing server, the processing proceeds to a step 807.
At the step 807, it is confirmed whether or not all the fields of the system fail-over definition table 1000 have been filled. If not all the fields have been filled, the processing returns to the step 804, then continuing the determination of the system fail-over destinations. Meanwhile, if all the items of the system fail-over definition table 1000 have been filled, the processing proceeds to a step 808, terminating the processing flow for the system fail-over sequence determination.
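The sequence of the steps 801 to 808 can be sketched as follows. This is a simplified Python sketch assuming an equal split; the names are hypothetical, and the exact tie-breaking order of the embodiment may differ:

```python
def build_failover_table(machine_count, server_count, tolerance):
    """Sketch of steps 801-808: returns, per server, the list
    [initial machine, 1st fail-over destination, 2nd, ...].
    Machines are numbered 1..machine_count; server_count is assumed
    to divide equally over machine_count."""
    per_machine = server_count // machine_count
    # Step 802: initial location, per_machine servers on each machine.
    table = {s: [s // per_machine + 1] for s in range(server_count)}
    for _ in range(tolerance):
        # Step 804: group the servers by the set of machines occupied so far.
        groups = {}
        for s, history in table.items():
            groups.setdefault(frozenset(history), []).append(s)
        # Steps 803 and 805: spread each group equally over the unused machines.
        for occupied, servers in groups.items():
            targets = [m for m in range(1, machine_count + 1)
                       if m not in occupied]
            for k, s in enumerate(sorted(servers)):
                table[s].append(targets[k % len(targets)])
    return table
```

For four machines, twelve servers, and a tolerance count of two, the DB server 00 starts on the machine 1, fails over first to the machine 2, and then to the machine 3, while the DB server 03 (which has occupied the same pair of machines) is sent to the machine 4, as in the construction example described later.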
Next, the explanation will be given below concerning a processing example in the case of constructing a database management system of the configuration in
Each of a machine 1 (102), a machine 2 (103), a machine 3 (104), and a machine 4 (105) includes a CPU 2201, a main storage device 2202, a communications control device 2203, an I/O control device 2204, an external storage device 2700 such as a magnetic disc, and the external storage devices 115. The external storage devices 115, which are shared disc devices, can be accessed from each of the machine 1 (102), the machine 2 (103), the machine 3 (104), and the machine 4 (105). The external storage devices 115 permanently or temporarily store the data which becomes the access target in the present database management system. Since the external storage devices 115 are shared disc devices, the system fail-over function controls the accesses to them.
A processing program 2701 for implementing the DB processing servers is stored on the external storage device 2700. A DB processing server in the operating state operates on the main storage device 2202 using the CPU 2201. The DB processing server performs read/write of the data from/into the external storage devices 115 and 2700 via the I/O control device 2204, and performs transmission/reception of the data with the machines connected thereto via a network 2500 by the communications control device 2203.
First, the user inputs the input information 107. Hereinafter, the system definition will be determined in accordance with the flowchart in
In accordance with the step 601, the system-configuration determination processing unit 300 in the setting machine 100 reads the contents of the input information 107. Here, the machines for locating the DB processing servers are four in number, and the external storage devices 115 for storing the DB data are twelve in number. Also, here, the multi-failure tolerance count is set at two. Namely, the present database management system is configured as a system in which, even if failures have occurred twice, its operation can be continued by distributing the load.
Next, in accordance with the step 602, the unit 300 determines the number of DB processing servers. The number of DB processing servers will be determined in accordance with the flowchart in
First, in accordance with the step 701, the failed-machine count i is set at "1". Next, in accordance with the step 702, the least common multiple S1 of the failed-machine count i and the operating-machine count is determined. In the present example, the number of machines for locating the DB processing servers is "4". Accordingly, when the failed-machine count is "1", the operating-machine count becomes equal to "3". Consequently, S1 becomes the least common multiple of "1" and "3", thereby becoming equal to "3". Next, the processing proceeds to the step 703, where the value of T1 is determined as being "3" by dividing the value "3" of S1 by the failed-machine count "1".
In the judgment at the step 704, the value of the failed-machine count i is "1". Accordingly, the failed-machine count i has not reached the value "2" of the multi-failure tolerance count in the present example, thus proceeding to the step 705. At the step 705, the failed-machine count i is incremented by "1", thereby making the failed-machine count i equal to "2". Moreover, the processing proceeds to the step 702 once again, where the least common multiple S2 is determined. The least common multiple S2 is the least common multiple of the value "2" of the failed-machine count i and the operating-machine count "2", which results from subtracting the value "2" of the failed-machine count i from the number of machines "4". Consequently, the least common multiple S2 becomes equal to "2". Furthermore, the processing proceeds to the step 703, where the value of T2 is determined. The value of T2 becomes equal to "1" by dividing the value "2" of S2 by the value "2" of the failed-machine count i. In addition, the processing proceeds to the step 704.
At the step 704, the value of the multi-failure tolerance count is "2", and the value of the failed-machine count i is "2". Accordingly, the judgment result turns out to be affirmative, thus proceeding to the step 706. At the step 706, the value X of the least common multiple is determined from all the values of T which have been determined at the step 703 up until now. In the present example, since T1 has been found to be "3" and T2 has been found to be "1", the value X of the least common multiple becomes equal to "3".
At the step 707, the total number of DB processing servers in the system is determined by multiplying the value X by the number of machines. Since the value X is "3" and the number of machines is "4", the total number of DB processing servers in the system becomes equal to "12".
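The arithmetic of this example can be checked directly (a Python sketch; the variables follow the S, T, and X of the flowchart, and are otherwise hypothetical):

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

machines = 4                 # machines locating the DB processing servers
s1 = lcm(1, machines - 1)    # step 702: one failed, three operating -> S1 = 3
t1 = s1 // 1                 # step 703: T1 = 3
s2 = lcm(2, machines - 2)    # step 702: two failed, two operating -> S2 = 2
t2 = s2 // 2                 # step 703: T2 = 1
x = lcm(t1, t2)              # step 706: X = 3 servers per machine
total = x * machines         # step 707: 12 DB processing servers in total
```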
Next, the judgment at the step 603 is carried out. In the present construction example, since the number of DB processing servers is "12" and the number of discs is "12", the system is constructible. In the present embodiment, the discs are required to correspond to the DB processing servers individually. Accordingly, if the number of discs is insufficient, additional installation of discs or the like is required. Moreover, in accordance with the step 608, the correspondences between the respective DB processing servers and the respective discs are determined. Furthermore, at the step 609, the system-configuration definition file 108 is outputted.
Next, in accordance with the step 610, the unit 300 determines the system fail-over sequence. The system fail-over sequence will be determined in accordance with the flowchart in
First, at the step 801, the system fail-over definition table is prepared. After that, at the step 802, the initial locations of all the DB processing servers are determined. Namely, the twelve DB processing servers determined at the step 602 are to be located in the four machines. The DB processing servers are sequentially distributed to the machines, with three servers distributed to each machine (
At the step 803, the system fail-over destinations at the first time are determined. For this task, it is sufficient to determine the system fail-over destinations for the case where a failure has occurred in any one of the four machines. Accordingly, the DB processing servers in each machine are sequentially distributed to the machines other than their initial location machine, thereby determining the system fail-over sequence (
Moreover, in accordance with the step 804, the system fail-over destinations at the second time and thereafter are determined for each DB processing server. For example, when determining the system fail-over destination for the DB server 00, another DB processing server is extracted which has been located in the same combination of machines, i.e., the machine 1 and the machine 2, in which the DB server 00 has been located so far. Here, the DB server 03 corresponds to this DB processing server to be extracted.
Next, the processing proceeds to the step 805. The DB server 00 and the DB server 03 are distributed to the machines other than the machine 1 and the machine 2. Here, the DB server 00 is located into the machine 3, and the DB server 03 is located into the machine 4. Hereinafter, in accordance with the judgment at the step 806, the system fail-over destination at the second time is determined for each server (
In the present example, the multi-failure tolerance count is set at two. Consequently, the definition of the system fail-over sequence is completed with the determination of the system fail-over destinations at the second time. If the multi-failure tolerance count is specified at a larger value, the system fail-over destination machine is similarly determined for each DB processing server in accordance with the judgment at the step 807.
Furthermore, in accordance with the step 611, the unit 300 outputs the determined contents as the system fail-over definition file 109. In addition, the definition-file delivery processing unit 301 delivers the definition file to the machine 0 (101), the machine 1 (102), the machine 2 (103), the machine 3 (104), and the machine 4 (105), thereby constructing the system.
A diagram (a) in
A diagram (b) in
A diagram (c) in
In this way,
As has been explained so far, according to the database management system constructing device in the present embodiment, the database management system is constructed by calculating the number of DB processing servers which can be equally distributed among the information processing devices under operation at the time of a failure occurrence. As a result, it becomes possible to guarantee that, at the initial start-up time and after the system fail-over, the loads among all the information processing devices will become equal or substantially equal.
Hereinafter, an explanation will be given concerning a database management system constructing device in a second embodiment for constructing a database management system by inputting an already-existing system-configuration definition file.
In the present embodiment, an explanation will be given concerning a processing which, when performing a configuration modification of applying the system fail-over function to a database management system to which the system fail-over function has not been applied, takes advantage of the information in the system-configuration definition file 108 of that database management system as the input information 107.
Moreover, at a step 1002, the unit 300a judges whether to employ the configuration of the DB processing servers of the already-existing system, or to employ a configuration where the loads are made equal among all the machines, i.e., whether or not the number of BESs (Back-End Servers) needs to be modified. This judgment is made in accordance with the instruction contents inputted from the user via the terminal 2005. Furthermore, at the step 1002, if the unit 300a has judged that the configuration of the DB processing servers is to be reset, the processing proceeds to the step 602. Meanwhile, if the unit 300a has judged that the already-existing configuration is to be used, the processing proceeds to the step 610. Hereinafter, the processing contents at the steps 602 to 611 are basically the same as those in the first embodiment.
Next, the definition-file delivery processing unit 301 delivers the system fail-over definition file 109 to each of the machines configuring the database management system. Each machine performs the system fail-over based on this system fail-over definition file 109, whereby the system is modified into a configuration to which the system fail-over function is applied. Also, if the unit 300a has judged at the step 1002 that the loads should be equalized among all the machines and the system-configuration definition file 108 has been created, the definition-file delivery processing unit 301 also delivers the system-configuration definition file 108 to each machine in the system.
Next, a concrete example will be described for the case where, while the system is operating with a configuration to which the system fail-over function is not applied like
First, at the step 1001, the system-configuration determination processing unit 300a reads the system-configuration definition file of the existing system and acquires from it the machine configuration and disc configuration of the system. In the present example, it is assumed that the number of machines in which the DB processing servers are located is "4" and that the disc number is "12".
Next, at the step 1002, the unit 300a determines whether or not the DB-processing-server configuration should be reset. Here, assuming that the configuration is to be reset and that, as in the first embodiment, the system fail-over definition is created with the multi-failure tolerance set at two failures, the system-configuration determination processing unit 300a creates and outputs the system-configuration definition file 108 and the system fail-over definition file 109. As a result, the twelve DB processing servers are located just as in the construction example of the first embodiment.
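As an arithmetic check on this example (the machine count, server count, and tolerance are from the text; the loop itself is only illustrative), twelve DB processing servers redistribute evenly among four machines after zero, one, or two machine failures:

```python
servers = 12    # DB-processing-server (BES) number from the example
machines = 4    # machine number from the example
tolerance = 2   # multi-failure tolerance from the example

for failures in range(tolerance + 1):
    operating = machines - failures
    # 12 is divisible by 4, 3, and 2: the load stays equal at every stage,
    # with 3, 4, and 6 servers per operating machine respectively.
    assert servers % operating == 0
```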
Next, the definition-file delivery processing unit 301 delivers the system-configuration definition file 108 and the system fail-over definition file 109 to each machine on the system. Each machine starts up the DB processing servers based on the system-configuration definition file 108 and operates the system fail-over function based on the system fail-over definition file 109. The resulting operating state is basically the same as in the first embodiment.
In this way, the present embodiment makes it possible to reuse the contents of the already-existing system-configuration definition file 108 when modifying a system to which the system fail-over function is not applied into a configuration to which the function is applied.
As explained above, according to the database management system constructing device in the present embodiment, the database management system is constructed by inputting the already-existing system-configuration definition file. As a result, the burden of re-entering the system configuration of the existing database management system is reduced.
Next, referring to
The third embodiment of the database constructing system to which the present invention is applied makes it possible to specify, as an input value, DB-processing-server location machines to be used exclusively for standby. In this case, for failures up to the number of standby-specific machines, all the DB processing servers of a failed machine are switched to a standby machine, so that the throughput of the entire system is not lowered as long as a standby machine remains. When a failure occurs in a state where no standby machine remains, the system fail-over is performed onto the other remaining machines, which keeps the lowering of the throughput of the entire system as small as possible. The initial construction of such a system is thereby facilitated.
Hereinafter, referring to a flowchart illustrated in
The processing steps illustrated in the present diagram differ at a step 1701 from the processing steps of the system-configuration determination in the first embodiment illustrated in
First, as in the first embodiment, the system-configuration determination processing unit 300b reads the input information at the step 601 and then determines the DB-processing-server number at the step 602. Note that the initial value of the operating-machine number used when determining the DB-processing-server number at the step 602 is the value obtained by subtracting the standby-specific machine number from the machine number specified in the input information 107b. As in the first embodiment, the flow in
Next, at the step 1701, the unit 300b determines the system fail-over sequence. The system fail-over sequence is determined at the step 1701 in accordance with a flowchart in
First, at a step 1800, the system fail-over definition table 1000 is prepared. At this time, the column number of the table is set at "the standby-machine number + 1". Next, at a step 1801, the initial location of all the DB processing servers is determined; that is, all the DB processing servers determined at the step 602 are located in the machines in equal numbers. Then, at a step 1802, it is judged whether or not system fail-over destinations have been determined a number of times equal to the standby-machine number. If the judgment at the step 1802 is that the determinations have not been completed, the processing proceeds to a step 1803, where the system fail-over destinations are determined sequentially. Since it suffices to determine the fail-over destinations for the case where a failure occurs in any one of the machines, the standby-specific machines are specified as the system fail-over destinations. If the judgment at the step 1802 is that the determinations have been completed, the present flow is terminated.
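Steps 1800 to 1803 can be sketched as follows. The machine and server names, and the dictionary representation of the definition table 1000, are assumptions for illustration only:

```python
def determine_failover_sequence(active_machines, standby_machines,
                                servers_per_machine):
    """Sketch of steps 1801-1803 for the standby-specific configuration."""
    # Step 1801: locate the DB processing servers in equal numbers on the
    # initially active machines (server names are hypothetical).
    location = {
        m: [f"BES-{m}-{i}" for i in range(servers_per_machine)]
        for m in active_machines
    }
    # Steps 1802-1803: only single-machine failures need covering at each
    # stage, so every active machine receives the standby-specific machines,
    # in order, as its successive fail-over destinations.
    failover = {m: list(standby_machines) for m in active_machines}
    return location, failover
```

For example, with three active machines, one standby machine, and four servers per machine, each active machine's fail-over destination list is the single standby machine.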
The processing contents at the step 611 are hereinafter basically the same as those in the first embodiment.
Next, the definition-file delivery processing unit 301 delivers the system-configuration definition file 108 and the system fail-over definition file 109 to each of the machines configuring the database management system. Each of these machines operates based on the system-configuration definition file 108. If a failure occurs in a machine in which DB processing servers are operating, each machine carries out the system fail-over based on the system fail-over definition file 109, thereby continuing the operation of the database management system.
Also, after the database management system has started operating, the standby-machine number count processing unit 1600 counts the number of standby machines that remain on standby in preparation for a failure occurrence. The unit 1600 takes the standby-machine number specified in the input information 107b as the initial value and decrements the count by "1" every time a system fail-over occurs. When the count reaches "0", the unit 1600 activates the system fail-over sequence redefinition processing unit 1601. A flowchart in
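The counting behavior of the unit 1600 can be sketched as a simple counter that invokes the redefinition step when it reaches zero. The class name and callback mechanism below are hypothetical, not from the document:

```python
class StandbyMachineCounter:
    """Sketch of unit 1600: counts remaining standby machines and triggers
    the fail-over sequence redefinition (unit 1601) when none remain."""

    def __init__(self, standby_count, on_exhausted):
        self.count = standby_count        # initial value from input 107b
        self.on_exhausted = on_exhausted  # stands in for unit 1601

    def notify_failover(self):
        # Called once per system fail-over.
        self.count -= 1
        if self.count == 0:
            self.on_exhausted()  # redefine the fail-over sequence
```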
At the step 801, the system fail-over definition table 1000 is prepared in the same manner as in the first embodiment.
The system fail-over definition table 1000 specifies the machine names of the system fail-over destinations up to the multi-failure tolerance. At this point in the processing, failures have already occurred a number of times equal to the standby-machine number, and the corresponding system fail-overs have been performed. Accordingly, the system fail-over definition table 1000 becomes a table covering the number of fail-overs obtained by subtracting the standby-machine number from the multi-failure tolerance.
At the step 2000, the names of the machines in which the respective DB processing servers are operating at the point in time when the location information 1602 was read at the step 1901 in
The processing contents at the steps 803 through 807 are hereinafter basically the same as those in the first embodiment.
Next, the definition-file delivery processing unit 301 delivers the system fail-over definition file 109a to each of the machines configuring the database management system. If a failure occurs thereafter, each machine performs the system fail-over based on this system fail-over definition file 109a.
Next, referring to
First, the user inputs the input information 107b. Here, the standby-specific machine number is specified as "1". The other input information is basically the same as in the first embodiment, and the multi-failure tolerance is set at "2". Hereinafter, in accordance with the flowchart in
If a failure occurs in the machine 1 (102), as illustrated in a diagram (b) in
Here, the standby-machine number counted by the standby-machine number count processing unit 1600 decreases from "1" to "0". As a result, the system fail-over sequence redefinition processing unit 1601 redefines the system fail-over sequence in accordance with the flowchart in
Moreover, if a failure also occurs in the machine 2 (103), as illustrated in a diagram (c) in
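The two failures in this example can be walked through with a small simulation. The machine and server names are assumed, and the even redistribution after the standby machine is exhausted is a simplification of the redefined fail-over sequence:

```python
# Initial state: three operating machines with four servers each, plus one
# standby-specific machine (names are hypothetical).
servers = {
    "machine1": ["BES1", "BES2", "BES3", "BES4"],
    "machine2": ["BES5", "BES6", "BES7", "BES8"],
    "machine3": ["BES9", "BES10", "BES11", "BES12"],
}
standby = ["machine4"]


def fail(machine):
    moved = servers.pop(machine)
    if standby:
        # A standby-specific machine absorbs every server of the failed
        # machine, so the total throughput is unchanged.
        dest = standby.pop(0)
        servers[dest] = moved
    else:
        # No standby remains: spread the servers over the survivors.
        survivors = sorted(servers)
        for i, s in enumerate(moved):
            servers[survivors[i % len(survivors)]].append(s)


fail("machine1")  # machine4 takes over all four servers of machine1
fail("machine2")  # the four servers split between machine3 and machine4
```

After the first failure, every operating machine still holds four servers; after the second, the two survivors hold six servers each.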
As explained above, according to the present embodiment, it becomes possible to construct a system in which, even if a failure occurs, the throughput of the entire system is not lowered as long as a standby-specific machine remains, and in which, even when no standby-specific machine remains, the operation is continued using the other machines that still remain at the time of the failure. This configuration keeps the lowering of the throughput of the entire system as small as possible.
It should be further understood by those skilled in the art that although the foregoing description has been made on embodiments of the invention, the invention is not limited thereto and various changes and modifications may be made without departing from the spirit of the invention and the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2005-191702 | Jun 2005 | JP | national |