This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2010-153742, filed on Jul. 6, 2010, the entire contents of which are incorporated herein by reference.
This technique relates to a technique for holding data in a distributed manner across plural nodes.
In the sphere of cloud computing, large-scale data is stored and processed in a large-scale distributed system (in which about a hundred to tens of thousands of computers are connected through a network). This kind of system is called, for example, a distributed data store. Such a distributed data store has higher scalability and fault tolerance than a distributed database, which is a system resulting from distributing a typical relational database (RDB). However, at the current stage, the distributed data store is functionally still no match for the RDB. Hereafter, the distributed data store and the distributed database will be collectively called a “distributed data system”.
In this kind of distributed data system, acquiring a snapshot (in other words, a copy of the entire data at a certain time) is considered. Because the snapshot is a copy at a certain time, the snapshot is typically not the most recent except for right after the acquiring. However, the snapshot is useful when desiring to see information going back in time to the certain time. Moreover, when performing a summing processing in which all data is referenced in a system in which updates occur often, there is a possibility of coming into collision with the processing by another user. However, because the snapshot does not cause any updates, other processing can be carried out without any collision with the snapshot. In a processing that uses the snapshot, data that is not the most recent is often sufficient. Therefore, it is unnecessary to worry much about the most recent data. However, consistency of data must be maintained.
For example, when acquiring a snapshot while a transaction is in progress, there is a possibility that consistency will be lost. Therefore, there is a problem when a transaction is carried out in plural nodes that are included in the distributed data system. More specifically, as illustrated in
As for a snapshot, a method called copy-on-write may be used. The copy-on-write makes it appear that a copy has been obtained, and when a write is performed later (in other words, when a “write” command is executed), the values before the write are copied. In a core system in which updates are carried out frequently, when the summing processing and/or analysis processing that uses the snapshot is executed at the same time, there is a high possibility that a collision will occur. However, because the copy-on-write makes the processing to acquire the snapshot appear to complete immediately, both kinds of processing can be executed simultaneously without any collision. Nevertheless, how the copy-on-write should be used in a situation in which a transaction is executed across the plural nodes as described above has not been taken into consideration.
Namely, when data is distributedly held in plural nodes of a system such as the distributed data system, it is difficult for conventional arts to obtain the consistent snapshot.
An information processing method relating to a first aspect of this technique includes: in response to receipt of a snapshot request from a first node that receives an instruction to obtain a snapshot, identifying transactions in progress; transmitting data representing states of the identified transactions in progress to the first node; after the identifying, carrying out a first processing to prevent the transactions in progress from normally completing; receiving a list of first transactions whose results are reflected to snapshot data or a list of second transactions whose results are not reflected to the snapshot data; and causing to execute copy-on-write on a basis of a specific time after removing the first transactions from among transactions to be processed in the first processing and confirming that the respective first transactions are normally completed or cancelled.
In addition, an information processing method relating to a second aspect of this technique includes: in response to receipt of an instruction to obtain a snapshot, transmitting a snapshot request to each of a plurality of first nodes; receiving, from each of the plurality of first nodes, identifiers of transactions in progress and data representing states of the transactions in progress, and storing the received identifiers and the received data into a data storage unit in association with a transmission source node; identifying first transactions for which an acknowledgement response has been outputted in each of relating transmission source nodes from among the transactions whose identifiers are stored in the data storage unit; and transmitting a list of the identified first transactions or a list of second transactions that are transactions other than the identified first transactions among the transactions whose identifiers are stored in the data storage unit.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
The system illustrated in
In response to the request, the snapshot processing unit 1220 of the computer 1200 receives the snapshot request from the computer 1100 (step S1005). Then, the transaction manager 1210 of the computer 1200 identifies transactions in progress (for example, transaction process 1230) in the computer 1200, generates data representing the states of the transactions in progress, and outputs the generated data to the snapshot processing unit 1220. The snapshot processing unit 1220 transmits identifiers of the transactions in progress and data representing the states of the transactions in progress to the computer 1100 (or in other words, the coordinator node) (step S1007).
Furthermore, after identifying the transactions in progress, the transaction manager 1210 carries out a prevention processing to prevent the transactions in progress from normally completing (step S1011). As for the transactions other than selected transactions that will be described later, the normal completion of the transaction is delayed so that the results of the transactions are not reflected to the snapshot. However, at this step, the selected transactions whose results are reflected to the snapshot and other transactions whose results are not reflected to the snapshot have not yet been identified. Therefore, this step is executed uniformly for all of the transactions in progress so that none of them completes normally.
The communication unit 1110 of the computer 1100 receives the identifiers of the transactions in progress and the data representing the states of the transactions in progress from each of the computers 1200, stores the received identifiers of the transactions in progress and the received data into the data storage unit 1130 in association with the transmission source computer 1200 (step S1009).
After the aforementioned data is received from all of the computers 1200, the transaction selector 1120 identifies, from among the transactions whose identifiers are stored in the data storage unit 1130, transactions for which an acknowledgement response (in other words, “ack”) has been outputted in each of the related transmission source participant nodes (in other words, computers 1200) as selected transactions whose results are reflected to snapshot data (step S1013). That is because, when an acknowledgement response has been outputted from all of the transmission source participant nodes, the transactions complete normally with no delay or are cancelled. Therefore, it is preferable that the results of such transactions are included in the snapshot in view of securing the immediacy of the snapshot. As for transactions for which an acknowledgement response has not been outputted from at least one of the transmission source participant nodes, the time until the transactions complete normally or are cancelled is highly uncertain, and when trying to include the results of the transactions in the snapshot, the snapshot acquisition timing is delayed by that uncertain amount of time. Therefore, the results of such transactions are not included in the snapshot. Then, the communication unit 1110 transmits a list of the selected transactions, or a list of the transactions other than the selected transactions among the transactions whose identifiers are stored in the data storage unit 1130, to each of the computers 1200 (step S1015).
In response to the transmission of the list, the snapshot processing unit 1220 in each of the computers 1200 receives the list of the selected transactions or the list of transactions that are transactions other than the selected transactions from the computer 1100 (step S1017). Incidentally, the list of the selected transactions may be a list that is common for all of the computers 1200, or may be an individual list for each of the computers 1200. The individual list may include a list of the selected transactions relating to the computer 1200 that is a destination of the individual list, and may also not include a list of transactions that are not to be processed by the destination computer 1200.
Then, the transaction manager 1210 removes the selected transactions (typically one or plural transactions, although the list may be empty, in which case no transaction is included) from the transactions for which the prevention processing is carried out, and the snapshot processing unit 1220 causes the copy-on-write to be executed on a basis of a specific time T after the respective selected transactions have normally been completed or cancelled (step S1019). In other words, as for a transaction that completed normally before the time point T, when the data resulting from the processing of the transaction is updated after the time point T, a processing (copy-on-write) for saving the data before the update is executed. Therefore, the states of the data after the processing of the transaction are recorded as the snapshot. Incidentally, the selected transactions are consistent among all of the computers 1200, so that a transaction that is a selected transaction in a certain computer 1200 is never a non-selected transaction in another computer 1200. The same consistency holds for transactions that are not selected transactions.
On the other hand, the processing to prevent from normally completing is carried out for the transactions other than the selected transactions to control the timing of the completion of the transactions so that the transactions do not complete normally before the time point T described above, and so that the normal completion of the transactions is made after the time point T has passed. As a result, when data update is caused by the normal completion of the transactions after the time point T has passed, the copy-on-write (data copy before the update) is performed immediately. Therefore, as for the transactions other than the selected transactions, the data before the transaction completes normally is included into the snapshot, and the processing result of such a transaction is not reflected on the snapshot.
As described above, by properly categorizing transactions according to their states and adjusting the time point of the commit, a lack of consistency is avoided, such as the result of a transaction T1 being reflected onto the snapshot at a certain node but not being reflected onto the snapshot at another node.
The coordinator node 3 has a snapshot coordinator 310, and a data storage unit 320 that stores data that is processed by that snapshot coordinator 310. The snapshot coordinator 310 has a transaction selector 311 and a message communication unit 313.
The transaction coordinator node 5 has a transaction coordinator 51 and a data storage unit 52.
The participant node 7 has a snapshot participant 71 that cooperates with the snapshot coordinator 310, a transaction manager 73 that cooperates with the transaction coordinator 51, a transaction process 77 that is generated from the transaction manager 73 and functions as a transaction participant, a data storage unit 75 that stores data that is processed by the transaction manager 73 and snapshot participant 71, and a copy-on-write processing unit 79 that performs a processing for the copy-on-write. In addition, the participant node 7 also manages a database 82 that is processed by a transaction (in
The transaction coordinator node 5 may also sometimes be a participant node 7. In other words, the transaction coordinator 51 may also sometimes be included in the participant node 7. Similarly, the coordinator node 3 may also sometimes be a participant node 7. In other words, the snapshot coordinator 310 may also sometimes be included in the participant node 7. However, in the following, in order to simplify the explanation, they will be explained as being included in a separate node.
First, a few basic items of this embodiment will be explained.
In this embodiment, the term “transaction” is one of terms in database terminology, and indicates a group of processes that have consistency. For example, in a bank transfer, a processing to transfer a certain amount of money from one account to another account corresponds to one transaction. When there is consistency in the state before execution of the transaction, there is consistency also after execution of the transaction. In terms of the example above, there is consistency in that the total amount of money does not change even after execution of the transaction. In this embodiment, a transaction across plural nodes is called a distributed transaction. Moreover, when noted as a transaction, this also includes transactions within one node, and distributed transactions.
The transaction is processed as all or nothing. For example, in case of transferring a certain amount of money from one account to another account in the wire transfer, when an error occurs during the transaction, the data is not left in an unfinished state, and finally all of the data are reflected (commit) or all processes are cancelled (abort). Otherwise, inconsistency occurs.
Typically, the update results in a transaction are stored into a temporary storage area such as the log storage unit 81, and at the commit, the update results are finally reflected to the actual storage location of the data system, such as the database 82. On the other hand, at the abort, the update results that were written in the log are discarded.
Furthermore, in the distributed transaction, the execution that is processed at each node is called a sub-transaction. However, when it is known that the sub-transaction is executed at a specific node, such a sub-transaction is also called a transaction. In the distributed data system, plural transactions may be processed simultaneously in the entire system. Therefore, there is a possibility that plural sub-transactions are operating at each node. In the distributed transaction, when there is even one sub-transaction to be aborted, all of the sub-transactions must also be aborted. If even one of the other sub-transactions were committed in that case, inconsistency would occur. Therefore, the transaction coordinator 51 manages all of the sub-transactions together. In this embodiment, even when there is only one sub-transaction in the distributed transaction, similar control is made.
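The log-then-reflect behavior described above can be illustrated with a minimal sketch. The class `SubTransaction` and its method names are hypothetical, and plain dictionaries merely stand in for the log storage unit 81 and the database 82; this is an illustration under those assumptions, not the embodiment's implementation.

```python
class SubTransaction:
    """Buffers updates in a log and reflects them to the database only at commit."""

    def __init__(self, database):
        self.database = database  # dict standing in for the database 82 (assumption)
        self.log = {}             # dict standing in for the log storage unit 81 (assumption)

    def write(self, key, value):
        # Updates are stored in the temporary log, not in the database.
        self.log[key] = value

    def commit(self):
        # At the commit, every logged update is reflected to the database.
        self.database.update(self.log)
        self.log.clear()

    def abort(self):
        # At the abort, the logged updates are discarded; the database is untouched.
        self.log.clear()


db = {"account_a": 100, "account_b": 0}
tx = SubTransaction(db)
tx.write("account_a", 70)
tx.write("account_b", 30)
before_commit = dict(db)  # the database still holds the old values at this point
tx.commit()
```

In this sketch, the transfer becomes visible in `db` only after `commit()` is called, and calling `abort()` instead would leave `db` unchanged.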
As described above, in the distributed transaction, when some of the sub-transactions are committed and the remaining are aborted, the consistency is lost. In order that such a case does not occur and the consistency is maintained, there is a two-phase commit protocol as illustrated in
The transaction participant illustrated in
As illustrated in
The distributed transaction in this example is an example in which the transaction is committed at the nodes 1 and 2, and the consistency is maintained. As illustrated in
When any processing up to writing the update result into the log is not successful, a negative acknowledgement response (nack) is returned. Even when there is just one negative acknowledgement response (nack), the transaction coordinator transmits an abort, and the transaction is aborted.
Before transmitting the acknowledgement response (ack) message, a log representing that this acknowledgement response is transmitted is written into the log data storage unit. Also, immediately after the commit message is received, a log representing that the commit has been received is written into the log data storage unit. This is carried out so that, in the case some kind of trouble occurs after that, it is possible to know the transaction state and restore the transaction.
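The two-phase commit exchange described above may be sketched as follows. The names `Participant`, `prepare`, `finish` and `two_phase_commit` are hypothetical, message passing is replaced by direct method calls, and a Python list stands in for the log data storage unit; the sketch only illustrates the ack/nack decision rule and the logging order.

```python
class Participant:
    """A transaction participant that votes ack or nack and logs what it does."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # False simulates a failure before the log write
        self.log = []                   # stands in for the log data storage unit (assumption)
        self.state = "active"

    def prepare(self):
        if not self.can_prepare:
            return "nack"
        # A log entry is written before the acknowledgement response is sent.
        self.log.append("ack")
        return "ack"

    def finish(self, decision):
        # The received decision is logged immediately, then applied.
        self.log.append(decision)
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(participants):
    """Return "commit" only when every participant answers ack; otherwise "abort"."""
    # Phase 1: send "prepare" and collect the responses.
    votes = [p.prepare() for p in participants]
    # A single nack is enough to abort the whole distributed transaction.
    decision = "commit" if all(v == "ack" for v in votes) else "abort"
    # Phase 2: send the same decision to every participant.
    for p in participants:
        p.finish(decision)
    return decision


good = [Participant("node1"), Participant("node2")]
mixed = [Participant("node1"), Participant("node2", can_prepare=False)]
good_result = two_phase_commit(good)
mixed_result = two_phase_commit(mixed)
```

Because every participant receives the same decision, the case in which some sub-transactions are committed and the others are aborted does not arise in this sketch.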
Next, the outline of the processing flow of the system illustrated in
The transaction coordinator 51, as described above, transmits a “begin transaction” to nodes (here, the transaction participant node 7) that will execute a sub-transaction. This is the same as normal.
The transaction manager 73 manages and controls the transaction (more precisely, the sub-transaction) in the participant node 7, and the transaction manager 73 captures the “begin transaction” from the transaction coordinator 51, generates a process for a transaction participant, and further transmits the “begin transaction” to the transaction participant. After this, the transaction coordinator 51 transmits “command” and “prepare” messages to the transaction participant, and the transaction participant receives the “command” and “prepare” messages without the transaction manager 73 taking part in the exchange of this kind of messages. In response to this, the transaction participant outputs an acknowledgment response (ack) or negative acknowledgement response (nack). The transaction manager 73 captures this response. Incidentally, in the case of a situation as will be described below, transmission of the acknowledgement response to the transaction coordinator 51 is delayed. Similarly, in response to the acknowledgement response or negative acknowledgement response, the transaction coordinator 51 transmits a commit or abort, and the transaction manager 73 captures that message. In the case of a situation as described below, output of the commit to the transaction participant is delayed.
In this embodiment, after a log representing that a commit or abort was received has been written, the transaction participant outputs a commit completion notification or abort completion notification to the transaction manager 73. After that, in the case of the commit, the transaction participant reflects the processing result to the database 82. Incidentally, after the processing result has been reflected onto the database, or after the processing for the abort has been completed, a commit completion notification or abort completion notification may be outputted. However, as described above, there is no problem even when the notification is transmitted after writing the log, and this is advantageous because the notification is made earlier. As a result of this, the transaction manager 73 knows that the processing on the transaction participant side is complete.
In this way, the transaction manager 73 manages the transaction (in other words, sub-transaction) in progress at the participant node 7, and grasps the processing state. For example, data such as illustrated in
Next, the snapshot protocol will be explained using
Here, the snapshot participant 71 outputs a request for a list of transactions in progress to the transaction manager 73 (step (12)). The transaction manager 73 identifies the transactions in progress according to a predetermined rule, generates a list of transactions in progress and outputs the generated list to the snapshot participant 71 (step (13)). As will be explained below, after the transactions in progress have been identified, the transaction manager 73 captures the acknowledgement responses and commits for the transactions listed in the list of transactions in progress and delays the output thereof.
The snapshot participant 71 receives the list of the transactions in progress from the transaction manager 73, and transmits the list to the snapshot coordinator 310 of the coordinator node 3 (step (14)). The snapshot coordinator 310 receives the list of the transactions in progress from all of the participant nodes 7, then performs a processing as will be described below to select the transactions whose results will be reflected onto the snapshot, generates a list of selected transactions for each participant node, and transmits the generated list to each participant node 7 (step (15)). The selected transactions are transactions for which an acknowledgement response (ack) has been outputted at all of the relating nodes.
The snapshot participant 71 of the participant node 7 receives the list of the selected transactions, and then outputs the list of the selected transactions to the transaction manager 73 (step (16)).
The transaction manager 73 receives the list of the selected transactions, and carries out a processing for the commit or abort for the transactions listed in the list of the selected transactions. In other words, the transaction manager 73 transmits the captured acknowledgement responses and outputs the commit. Incidentally, because transmission of an abort is not delayed, the processing for an aborted transaction is completed as is, with the transaction failing. However, the transaction manager 73 checks that the abort was transmitted (or received).
After writing into the data storage unit 75 that the commit or abort was received, each of the selected transaction participants outputs a commit completion notification or abort completion notification to the transaction manager 73. After that, when the commit completion notification or abort completion notification has been received from all of the transactions listed in the list of selected transactions, the transaction manager 73 outputs a notification to notify the completion of a selected transaction processing to the snapshot participant 71 (step (17)).
When the snapshot participant 71 receives the notification to notify the completion of the selected transaction processing from the transaction manager 73, the snapshot participant 71 determines the final snapshot time. The copy-on-write is carried out based on this final snapshot time. The copy-on-write will be explained in detail later.
Furthermore, the snapshot participant 71 transmits a snapshot completion notification to the snapshot coordinator 310 (step (18)). It is not shown in
After the final snapshot time, the snapshot participant 71 transmits a transaction completion request to the transaction manager 73 in order to complete transactions that are listed in the list of the transactions in progress but not listed in the list of the selected transactions (step (19)). The transaction manager 73 causes the transactions to be completed by transmitting the captured acknowledgement responses (ack) to the transaction coordinator 51, and outputting the captured commit to the transaction participants.
In this way, while the consistency of the data in the overall system is maintained by suitably categorizing transactions according to the states of the transactions and adjusting the commit timing, immediacy by the copy-on-write is enabled by adequately setting the final snapshot time, which is the timing to carry out the copy-on-write.
In order to make it easy to understand the following explanation, the control for delaying the output of the acknowledgement response (ack) and commit is explained using
As described above, the transaction manager 73 carries out the control for delaying the outputs by capturing the commits and acknowledgement responses of transactions listed in the list of the transactions in progress on and after the temporary snapshot time. In
After that, after the processing for the commit has been performed for all of the selected transactions included in the list of the selected transactions, the snapshot participant 71 sets the final snapshot time. As a result, at step (18), the snapshot participant 71 transmits a snapshot processing completion notification to the snapshot coordinator 310. Furthermore, at step (19), the snapshot participant 71 outputs a transaction completion request to the transaction manager 73. When the transaction manager 73 receives the transaction completion request, the transaction manager 73 transmits the captured and delayed acknowledgement response (ack), then causes the transactions to execute the subsequent processing (arrow B). As a result, because the transaction coordinator 51 transmits a commit, for example, the commit is received in the process of the transaction t3 as well, and the processing for the commit is carried out.
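The delaying control described above can be sketched as a small message filter. The class `DelayingManager` and its method names are hypothetical, messages are modeled as `(transaction id, kind)` pairs, and only "ack" and "commit" messages are held back, on the assumption (taken from the description) that aborts are passed on without delay.

```python
class DelayingManager:
    """Holds back acks and commits of transactions in progress until they are released."""

    DELAYED_KINDS = ("ack", "commit")

    def __init__(self, in_progress):
        self.in_progress = set(in_progress)  # list fixed at the temporary snapshot time
        self.selected = set()                # transactions released early
        self.released_all = False            # True once the final snapshot time has passed
        self.held = []                       # captured and delayed messages
        self.forwarded = []                  # messages passed on, in order

    def _must_delay(self, tx_id, kind):
        return (kind in self.DELAYED_KINDS
                and tx_id in self.in_progress
                and tx_id not in self.selected
                and not self.released_all)

    def capture(self, tx_id, kind):
        # Every message is captured; only some are delayed.
        if self._must_delay(tx_id, kind):
            self.held.append((tx_id, kind))
        else:
            self.forwarded.append((tx_id, kind))

    def _flush(self):
        still_held = []
        for tx_id, kind in self.held:
            if self._must_delay(tx_id, kind):
                still_held.append((tx_id, kind))
            else:
                self.forwarded.append((tx_id, kind))
        self.held = still_held

    def release_selected(self, selected):
        # Called when the list of the selected transactions arrives.
        self.selected |= set(selected)
        self._flush()

    def release_all(self):
        # Called on the transaction completion request after the final snapshot time.
        self.released_all = True
        self._flush()


manager = DelayingManager(["t1", "t3"])
manager.capture("t1", "commit")  # delayed: t1 is in progress
manager.capture("t3", "ack")     # delayed: t3 is in progress
manager.capture("t9", "ack")     # passed on: t9 began after the list was made
manager.capture("t3", "abort")   # passed on: aborts are not delayed
after_capture = list(manager.forwarded)
manager.release_selected(["t1"])
after_selected = list(manager.forwarded)
manager.release_all()
after_final = list(manager.forwarded)
```

The three recorded lists show the order in which messages leave the manager: first the undelayed ones, then those of the selected transaction, and finally the remaining delayed ones.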
In this way, by carrying out the copy-on-write based on the final snapshot time after completing the transactions whose processing results are reflected on the snapshot, it is possible to obtain consistent snapshot data based on the final snapshot time, apparently immediately.
In a core system in which updates occur frequently, when a summing processing and analysis processing, which include reference to the database, are simultaneously executed, there is a high possibility of collision. However, by instantaneously obtaining the snapshot and carrying out the summing processing and analysis processing on the obtained snapshot data, it becomes possible to execute both simultaneously without collision.
The detailed processing will be explained next using
When the message communication unit 313 of the snapshot coordinator 310 in the coordinator node 3 receives an instruction to obtain the snapshot from the user terminal 9, for example (
When the snapshot participant 71 in the participant node 7 receives the snapshot request (step S5), the snapshot participant 71 outputs a request for a list of transactions in progress to the transaction manager 73 (step S7). The transaction manager 73 receives the request for the list of the transactions in progress from the snapshot participant 71 (step S9), generates the list of the transactions in progress, and outputs the generated list to the snapshot participant 71 (step S11).
As illustrated in
Therefore, in a case such as illustrated in
Furthermore, after the transactions in progress have been identified, the transaction manager 73 not only captures the acknowledgement response and commit that were outputted or transmitted for the transactions listed in the list of the transactions in progress, but also delays the output or transmission of them (step S13). As was described above, the transaction manager 73 also captures negative acknowledgement responses and aborts, and updates the transaction management table as illustrated in
On the other hand, when the snapshot participant 71 receives the list of the transactions in progress from the transaction manager 73 (step S15), the snapshot participant 71 transmits the list of the transactions in progress to the snapshot coordinator 310 (step S17).
When the message communication unit 313 of the snapshot coordinator 310 receives the list of the transactions in progress from each of the participant nodes 7 (step S19), the message communication unit 313 stores the received list into the data storage unit 320 in association with the identifiers of the transmission source nodes (or snapshot participants). After the message communication unit 313 receives the list of the transactions in progress from all of the participant nodes 7, the message communication unit 313 notifies the transaction selector 311 of this event.
For example, it is assumed that the progress states of transactions in node A are as illustrated in
In such a case, a table as illustrated in
From the examples in
After receiving the notification from the message communication unit 313, the transaction selector 311 carries out a transaction selection processing (step S21). The transaction selection processing will be explained using
The transaction selector 311 identifies one unprocessed transaction (step S31). Then, the transaction selector 311 checks whether or not an acknowledgement response has been outputted in each of the nodes from which the identified transaction is notified by the list of the transactions in progress (step S33). At this step, as illustrated by the examples in
As for transaction t2, since the commit has already been received in the node B, the state of the transaction t2 is not represented in the list of the transactions in progress for the node B. In other words, it is not known whether the transaction t2 has been executed in the node B. In such a case, only nodes for which the report was received are checked. The reason for this is as follows. As for a node in a state before an acknowledgement response is outputted, the output of the acknowledgement response, as will be described below, is delayed until the final snapshot time. Therefore, there is no commit until then. That is, when there is a node in which the commit has already been made, this means that there is no node in the state before an acknowledgement response is outputted. Therefore, all of the states in the list of the transactions in progress must be the states after the acknowledgement response has been outputted, and regardless of the information relating to the committed nodes, the result is finally determined to be included in the snapshot.
When the acknowledgement response has been outputted for the identified transaction in all of the nodes from which the notification was received (step S35: YES route), the transaction selector 311 sets ON to the selection flag in the management table such as illustrated in
On the other hand, when the acknowledgement response has not been outputted in at least one of the nodes from which the notification of the identified transaction was made (step S35: NO route), the transaction selector 311 sets OFF to the selection flag in the management table such as illustrated in
To sum up, when there are two nodes, the judgment criteria are as illustrated in
The transaction selector 311 then determines whether or not all transactions have been processed (step S41). When there is an unprocessed transaction, the processing returns to the step S31. When the processing has been completed for all transactions, the processing returns to the calling-source processing.
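The transaction selection processing above can be sketched as a single pass over the reported lists. The function name `select_transactions`, the state labels `"before_ack"` and `"after_ack"`, and the sample data are assumptions made for illustration only; the management table is reduced to a dictionary of selection flags.

```python
def select_transactions(reports):
    """Return the set of transactions whose results are reflected onto the snapshot.

    reports maps a node name to {transaction id: state} for the transactions
    in progress at that node. Only nodes that reported a transaction are
    checked for it, as in the description.
    """
    selection_flag = {}
    for node_states in reports.values():
        for tx_id, state in node_states.items():
            acked = state == "after_ack"
            # The flag stays ON only while every reporting node has output an ack.
            selection_flag[tx_id] = selection_flag.get(tx_id, True) and acked
    return {tx_id for tx_id, flag in selection_flag.items() if flag}


# Hypothetical reports: t2 is already committed at node B, so node B does not list it.
reports = {
    "A": {"t1": "after_ack", "t2": "after_ack", "t3": "before_ack"},
    "B": {"t1": "after_ack", "t3": "after_ack"},
}
selected = select_transactions(reports)
```

With this sample data, t1 and t2 are selected, while t3 is not because node A has not yet output its acknowledgement response.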
In the example in
Returning to the explanation of the processing in
Moving to an explanation of the processing in
On the other hand, when the list of the selected transactions is not empty, the transaction manager 73 carries out a selected transaction processing (step S57). This selected transaction processing will be explained using
The transaction manager 73 outputs a commit, which had been captured and delayed for the selected transactions listed in the list of the selected transactions, to the corresponding transaction process 77 (transaction participant) (step S81). In addition, when a commit is newly captured for the selected transactions, the transaction manager 73 immediately outputs that commit to the corresponding transaction process 77 (step S83).
By doing so, the selected transactions are completed before the final snapshot time is set. As was explained above, when a commit is received, the transaction participant registers, into the log storage unit 81, that the commit was received, then outputs a commit completion notification to the transaction manager 73. After that, the processing results that were stored in the data storage unit 75 are reflected on the database 82. As for an abort, the transaction manager 73 captures it but immediately outputs it without any delay. The transaction manager 73 also manages the states of the transactions in the management table as illustrated in
For example, a selected transaction management table as illustrated in
The transaction manager 73 then determines whether all of the selected transactions have been completed (step S85). The transaction manager 73 determines whether ON is set to the completion flag in the selected transaction management table as illustrated in
When it is determined that all of the selected transactions have been completed, the processing returns to the calling-source processing. On the other hand, when there is a selected transaction that is not completed, the transaction manager 73 waits for receipt of a commit completion notification or abort completion notification for that selected transaction (step S87). When a commit completion notification or abort completion notification is not received (step S89: NO route), the processing returns to the step S87. On the other hand, when a commit completion notification or abort completion notification is received for any selected transaction (step S89: YES route), the transaction manager 73 carries out a completion registration in the selected transaction management table for the transmission source transaction of the commit completion notification or abort completion notification (step S91). In other words, ON is set to the completion flag. After that, the processing returns to the step S85.
In this way, the transaction manager 73 confirms that the selected transactions that are listed in the list of the selected transactions are completed in its own participant node 7.
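The flow of the steps S81 to S91 can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the class name `SelectedTransactionTracker`, its method names, and the `send_commit` callback are hypothetical and stand in for the selected transaction management table and the output to the transaction process 77.

```python
from typing import Callable, Dict, Iterable, Set


class SelectedTransactionTracker:
    """Hypothetical model of the selected transaction management table."""

    def __init__(self, selected: Iterable[str], delayed_commits: Set[str],
                 send_commit: Callable[[str], None]) -> None:
        # Completion flag per selected transaction, initially OFF.
        self.completed: Dict[str, bool] = {tx: False for tx in selected}
        self.send_commit = send_commit
        # Step S81: release commits that were captured and delayed.
        for tx in self.completed:
            if tx in delayed_commits:
                send_commit(tx)

    def on_commit_captured(self, tx: str) -> bool:
        """Step S83: forward a newly captured commit at once if tx is selected.

        Returns False when tx is not selected, meaning the commit stays delayed.
        """
        if tx in self.completed:
            self.send_commit(tx)
            return True
        return False

    def on_completion_notification(self, tx: str) -> None:
        """Step S91: set the completion flag ON for a commit or abort completion."""
        if tx in self.completed:
            self.completed[tx] = True

    def all_completed(self) -> bool:
        """Step S85: check whether every selected transaction has completed."""
        return all(self.completed.values())


sent = []
tracker = SelectedTransactionTracker(["t1", "t2"], {"t1"}, sent.append)
tracker.on_commit_captured("t2")   # selected: forwarded immediately
tracker.on_commit_captured("t9")   # not selected: remains delayed
tracker.on_completion_notification("t1")
tracker.on_completion_notification("t2")
```

In this sketch the waiting of the steps S87 and S89 is replaced by explicit calls to `on_completion_notification`; an actual implementation would block on incoming notifications instead.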
Returning to the explanation of the processing in
The snapshot participant 71 receives the message to notify the completion of the selected transaction processing from the transaction manager 73 (step S65). Here, because the snapshot participant 71 has completed preparation to carry out the copy-on-write, the snapshot participant 71 determines the final snapshot time at this time (step S67).
The snapshot participant 71 then causes the copy-on-write processing unit 79 to start the copy-on-write (step S68). For example, the snapshot participant 71 generates a snapshot file in the data storage unit 75. By doing so, when the transaction process 77 carries out the next update of data (for example, data in page or record units) in the database 82, the copy-on-write processing unit 79 copies the data before the update and stores the copied data into the data storage unit 75, for example. Thus, acquisition of the snapshot appears to be completed instantly. However, the actual snapshot is gradually stored in the data storage unit 75 every time an update is carried out.
After that, the snapshot participant 71 transmits a snapshot completion message to the snapshot coordinator 310 of the coordinator node 3 (step S69). The message communication unit 313 of the snapshot coordinator 310 receives the snapshot completion message from the snapshot participant 71 (step S71). After that, when snapshot completion messages have been received from all of the participant nodes 7, the message communication unit 313 of the snapshot coordinator 310 transmits completion notification to the user terminal 9 or the like (step S73).
When the user terminal 9 or the like receives the completion notification, it becomes possible to request the snapshot data after that.
On the other hand, the snapshot participant 71 outputs a transaction completion request to the transaction manager 73 (step S75). The transaction manager 73 receives the transaction completion request from the snapshot participant 71 (step S77). Here, the transaction manager 73 transmits or outputs acknowledgement responses and commits that were captured and delayed for transactions that are not listed in the list of the selected transactions but are listed in the list of the transactions in progress (step S79). As a result, when the transaction process 77 receives the commit, the processing results that were written in the log are reflected on the database 82. However, at this time, the copy-on-write processing unit 79 copies the data before being updated and stores the copied data into the snapshot file in the data storage unit 75.
After that, until the snapshot participant 71 actually reads the snapshot data, when the transaction process 77 updates the database 82, the copy-on-write processing unit 79 copies the data before being updated and stores that data into the data storage unit 75, as long as the data has not already been copied. By repeating such a process, the snapshot data is gradually stored into the snapshot file in the data storage unit 75.
Here, update of the database 82 and change in the snapshot file will be explained using
After the final snapshot time, the snapshot file is generated in the data storage unit 75. As illustrated by the example in
As for the transaction t3, as processing advances, update data 5004 is generated for the page 4 and stored in the data storage unit 75. However, the time reaches the temporary snapshot time before an acknowledgement response is transmitted. Therefore, the transaction manager 73 captures the acknowledgement response and delays its output. Because the acknowledgement response has not been transmitted by the temporary snapshot time, the processing results are not reflected on the snapshot. After the time reaches the final snapshot time, the acknowledgement response is released, and, for example, a commit is also outputted. After the commit is received, the database 82 is updated with the update data 5004. At this time, the copy-on-write is executed, and the data for the page 4 before being updated with the update data 5004 is stored in the snapshot file. The ID “4” of the copied page is also registered in the header.
After that, the transaction t5 is executed, update data 5005 for the page 1 is generated and stored in the data storage unit 75, and after a commit is outputted, the data of the page 1 in the database 82 is updated with the update data 5005. Here, because the page 1 is not registered in the header of the snapshot file, the copy-on-write is performed, and the data for the page 1 before being updated with update data 5005 is stored. The ID “1” of the copied page is also registered in the header.
By repeating such a processing, the snapshot data is stored in the snapshot file.
The processing that is carried out after that when a request to obtain the snapshot data is outputted from the user terminal 9 to the transaction coordinator node 5, for example, will be explained using
The transaction coordinator 51 receives the data read from the database 82 and the data read from the snapshot file from the transaction process 77, and stores the data in the data storage unit 52 (step S121). When the transaction coordinator 51 receives data from all of the nodes, the transaction coordinator 51 transmits all of the snapshot data to the requesting source user terminal 9 (step S123). Incidentally, since the amount of data may become very large, data that represents the storage location in the transaction coordinator node 5 (for example, a URI (Uniform Resource Identifier)) may be sent as a notification to the user terminal 9, so that the user terminal 9 may download the data. Also, instead of outputting all of the snapshot data together, the data may be divided into plural portions, or, as in the case of a normal database, only the data satisfying certain conditions may be outputted in response to a request to return such data.
In this way, the user terminal 9 obtains snapshot data, and may perform analysis or summing of the obtained data. Analysis or summing of the snapshot data may also be partially executed by the transaction process at each node without returning the data itself to the user terminal 9 (for example, sums can be found at each node). The results can then be returned to the user terminal 9, after which the remaining analysis and summing can be performed at the user terminal 9 (for example, the total of the sums found at each node can be calculated).
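The partial summing mentioned above can be sketched as follows. The function names, the record layout, and the column name `amount` are assumptions introduced only for illustration.

```python
from typing import Dict, Iterable, List


def node_partial_sum(rows: Iterable[Dict[str, float]], column: str) -> float:
    """Sum one column over the snapshot data held at a single node."""
    return sum(row[column] for row in rows)


def total_at_terminal(partial_sums: Iterable[float]) -> float:
    """Combine the per-node sums at the user terminal."""
    return sum(partial_sums)


# Hypothetical snapshot data held at two participant nodes.
node_snapshots: Dict[str, List[Dict[str, float]]] = {
    "node1": [{"amount": 10.0}, {"amount": 5.0}],
    "node2": [{"amount": 7.5}],
}
partials = [node_partial_sum(rows, "amount") for rows in node_snapshots.values()]
total = total_at_terminal(partials)  # 22.5
```

Only one number per node crosses the network in this arrangement, instead of the full snapshot data.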
The explanation up to this point has centered on transactions that normally complete. Here, a supplementary explanation will be given for a case in which a transaction cannot normally complete, because of an error that occurred at a transaction participant, or an error that occurred at another transaction participant or transaction coordinator.
As described above, when it is clear that a transaction will not complete normally (for example, the negative acknowledgement response is transmitted, or the abort is received), the transaction is not listed in the list of the transactions in progress. This is not a problem for the following reason.
In other words, in the case of a transaction for which an error occurred, an abort is finally received and the transaction is cancelled. That is, the transaction is not included in the snapshot, and in that sense, basically no problem will occur. However, when an error occurs at a different transaction participant, and the transaction is not listed in the list of the transactions in progress, there is a possibility that the transaction will be set to be included in the snapshot. However, in that case, the transaction is finally aborted, so the result is not included in the snapshot. When the transaction is aborted, transmission is not delayed, so the processing does not stall. Therefore, there is no problem.
Although the embodiments were explained, this technique is not limited to these embodiments. For example, the functional block diagram illustrated in
In addition, the user terminal 9, the coordinator node 3, the participant node 7 and the transaction coordinator node 5 are computer devices as shown in
More specifically, functions such as the snapshot coordinator 310, transaction coordinator 51, snapshot participant 71, transaction manager 73, and copy-on-write processing unit 79 may be realized by executing, by the CPU 2503, the programs. In addition, the HDD 2505 and memory 2501 are used to realize at least a portion of the data storage unit 320, data storage unit 75, log storage unit 81 and database 82.
For completeness, supplementary explanations of the three-phase commit protocol and the copy-on-write are given below.
(A) Three-Phase Commit Protocol
In the two-phase commit protocol, when a coordinator fails after a participant has transmitted an acknowledgement response but before the participant receives the commit, a state in which neither a commit nor an abort can be made, so-called “blocking”, occurs. The three-phase commit protocol has been proposed in order to solve this problem.
The difference with the two-phase commit protocol is that, after exchanging “prepare” and “ack” as illustrated in
Incidentally, as with the two-phase commit protocol, it is the exchange of the final ack (not the “ack” after “prepare”) and the commit that relates to this embodiment.
In addition, the writing of logs concerning “ack” and “commit” is similar to the two-phase commit protocol.
(B) Copy-on-Write
Here, the relationship between the concentrated copy-on-write and data update will be explained based on a storage structure in page units. A page is a memory block having a fixed length, such as 4 KB or 8 KB, and is the unit of input and output to a disk device. However, record units may be used instead of page units.
As illustrated in
At the snapshot acquisition time, a snapshot file that is a file to store the snapshot data is prepared. Information representing what page was updated after the snapshot is stored in page 0 of this file. It is assumed that pages 1 to 5 of the snapshot file respectively correspond to pages with the same number in the database file. However, at the snapshot acquisition time, these areas are not allocated and empty.
At update time 1 after the snapshot acquisition time, it is assumed that the page 3 in the database file is updated. Then, before the page 3 is updated, its contents are copied into the page 3 in the snapshot file.
After that, the page 3 of the database file is updated. The arrows in the figure represent the update.
At update time 2, it is assumed that the page 5 in the database file is updated. Also at this time, like the page 3, the page 5 of the database file is updated after a copy is stored to the page 5 of the snapshot file.
It is also assumed that, at update time 3, the page 3 is updated again. At this time, although the database file is updated, the page before the update is not copied to the snapshot file. This is because the contents at the snapshot acquisition time have already been stored in the snapshot file, and it is not required to copy the contents. If the contents were copied again, the information at the snapshot acquisition time would be overwritten and lost.
Next, reference to the pages will be explained. As for pages that have not been updated after the snapshot, the page in the database file is referenced, and as for pages that have been updated, the page in the snapshot file is referenced. For example, at and after the update time 2, the snapshot file is referenced for the pages 3 and 5, and the database file is referenced for the other pages. It is possible to judge which file should be referenced, based on the information in the page 0 of the snapshot file.
Thus, in this copy-on-write method, it is sufficient to prepare an almost empty file at the snapshot acquisition time. Therefore, it is possible to immediately obtain the snapshot. However, because the actual copy is deferred and performed at the time of the update, the processing amount for the update processing increases by the processing amount of the copy. In addition, when all pages are updated after the snapshot acquisition time, almost the same storage area as the database file is required, as in a method that copies all data at the snapshot acquisition time.
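The page-level behavior described above can be sketched as follows. This is a minimal in-memory sketch, not the embodiment's implementation: the class name `CopyOnWriteSnapshot` is hypothetical, a dictionary stands in for the database file, and a set of page numbers stands in for the information held in the page 0 of the snapshot file.

```python
from typing import Dict, Set


class CopyOnWriteSnapshot:
    """In-memory stand-in for a database file and its snapshot file."""

    def __init__(self, database: Dict[int, bytes]) -> None:
        self.database = database                   # pages of the database file
        self.copied_pages: Set[int] = set()        # plays the role of page 0
        self.snapshot_pages: Dict[int, bytes] = {}  # allocated lazily

    def update(self, page_no: int, new_contents: bytes) -> None:
        # Copy the old contents only on the first update after the snapshot;
        # copying again would overwrite the snapshot-time contents.
        if page_no not in self.copied_pages:
            self.snapshot_pages[page_no] = self.database[page_no]
            self.copied_pages.add(page_no)
        self.database[page_no] = new_contents

    def read_snapshot(self, page_no: int) -> bytes:
        # Updated pages are read from the snapshot file,
        # untouched pages from the database file.
        if page_no in self.copied_pages:
            return self.snapshot_pages[page_no]
        return self.database[page_no]


db = {n: f"page{n}-v0".encode() for n in range(1, 6)}
snap = CopyOnWriteSnapshot(db)     # snapshot acquisition time
snap.update(3, b"page3-v1")        # update time 1: page 3 is copied, then updated
snap.update(5, b"page5-v1")        # update time 2: page 5 is copied, then updated
snap.update(3, b"page3-v2")        # update time 3: page 3 is not copied again
```

After these three updates, `snap.read_snapshot(3)` still returns the contents of the page 3 at the snapshot acquisition time, while `db[3]` holds the latest contents.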
The aforementioned embodiments are outlined as follows:
A snapshot acquisition processing method executed by a computer that is a snapshot participant node includes: (A) in response to receipt of a snapshot request from a first node that receives an instruction to obtain a snapshot, identifying transactions in progress; (B) transmitting data representing states of the identified transactions in progress to the first node; (C) after the identifying, carrying out a first processing to prevent the transactions in progress from normally completing; (D) receiving a list of first transactions whose results are reflected to snapshot data or a list of second transactions whose results are not reflected to the snapshot data; and (E) causing to execute copy-on-write on a basis of a specific time after removing the first transactions from among transactions to be processed in the first processing and confirming that the respective first transactions are normally completed or cancelled.
Thus, when the transactions whose processing results may be reflected to the snapshot are caused to complete normally or to be cancelled by a specific time, and the copy-on-write is carried out on a basis of the specific time, it becomes possible to immediately obtain a consistent snapshot. Incidentally, when there are a lot of first transactions, the communication amount may be reduced by employing the list of the second transactions.
Incidentally, the aforementioned first processing may include a processing to prevent receipt of a commit, and the aforementioned causing may include outputting the commit whose receipt was prevented to a process for the first transactions. This makes it possible to handle a protocol in which a commit is outputted in response to an acknowledgement response, such as the two-phase commit protocol.
In addition, the transactions in progress may be defined by excluding a transaction that has outputted a negative acknowledgement response and a transaction that has received an abort from transactions that have not received the commit. By limiting to the transactions having a possibility that the results are reflected onto the database or the like, the processing load of the snapshot coordinator is reduced.
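This definition of the transactions in progress can be sketched as a simple filter. The record type `TxState` and its field names are hypothetical and serve only to illustrate the three conditions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TxState:
    """Hypothetical per-transaction state kept at a participant node."""
    tx_id: str
    commit_received: bool = False
    nack_sent: bool = False        # negative acknowledgement response outputted
    abort_received: bool = False


def transactions_in_progress(states: Iterable[TxState]) -> List[str]:
    """Keep transactions without a commit, excluding those known to fail."""
    return [
        s.tx_id
        for s in states
        if not s.commit_received and not s.nack_sent and not s.abort_received
    ]


states = [
    TxState("t1"),                        # still running: in progress
    TxState("t2", commit_received=True),  # commit already received
    TxState("t3", nack_sent=True),        # will not complete normally
    TxState("t4", abort_received=True),   # will be cancelled
]
in_progress = transactions_in_progress(states)  # ['t1']
```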
Furthermore, the aforementioned first processing may further include a processing to prevent transmission of an acknowledgement response from a transaction that has not received the commit. In such a case, the method may further include: after the specific time, transmitting the acknowledgement response whose transmission was prevented to a transaction coordinator; and after the specific time, causing the second transactions to execute a normal completion or cancellation. Thus, it becomes possible to reliably exclude the processing results of the second transactions from the snapshot.
Furthermore, the aforementioned transmitting may include storing a second list of identifiers of the identified transactions in progress into a data storage unit. Moreover, the aforementioned first processing may include, based on the second list stored in the data storage unit, preventing the transactions in progress from normally completing. In addition, the aforementioned receiving may include: storing the list received from the first node into the data storage unit; and checking, based on the list received from the first node and stored in the data storage unit, whether the respective first transactions have completed or cancelled. Thus, the processing is surely carried out.
A snapshot acquisition processing method executed by a computer that is a snapshot coordinator node includes: (A) in response to receipt of an instruction to obtain a snapshot, transmitting a snapshot request to each of a plurality of first nodes; (B) receiving, from each of the plurality of first nodes, identifiers of transactions in progress and data representing states of the transactions in progress, and storing the received identifiers and the received data into a data storage unit in association with a transmission source node; (C) identifying first transactions for which an acknowledgement response has been outputted in each of relating transmission source nodes from among the transactions whose identifiers are stored in the data storage unit; and (D) transmitting a list of the identified first transactions or a list of second transactions that are transactions other than the identified first transactions among the transactions whose identifiers are stored in the data storage unit. Incidentally, the list may be generated for each participant node.
Incidentally, it is possible to create a program causing a computer to execute the aforementioned processing, and such a program is stored in a computer readable storage medium or storage device such as a flexible disk, CD-ROM, DVD-ROM, magneto-optic disk, a semiconductor memory, and hard disk. In addition, the intermediate processing result is temporarily stored in a storage device such as a main memory or the like.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2010-153742 | Jul 2010 | JP | national |