The invention relates to control of a server computer and, in particular, relates to a method of controlling writing to a disk device by the server computer.
In many of the cases where massive data processing by computer is demanded, the data is distributed to a plurality of computers to be simultaneously processed in parallel. In some of the cases where a series of complex steps are required to complete processing, independent computers dedicated to different steps are prepared to perform the processing while transferring results of the individual steps among them. In such data processing utilizing a plurality of computers, it is common, for the purpose of convenience in giving and receiving data, to construct a shared file system that contains files to be processed or to retain results of processing on the network, so that all computers can access the same file.
A shared file system easily becomes a single point of failure because it provides files to the entire system. That is, if the computer providing the shared file system goes down, file access becomes unavailable throughout the system, causing a catastrophic system halt. For this reason, a shared file system created for a large-scale system is configured with multiple computers to provide a failover function that enables uninterrupted file access when a failure occurs in one of the computers.
Lustre 2.0 Operations Manual, Chapter 30, [online] Internet, accessed Sep. 2, 2011, discloses a method as follows. When a computer (hereinafter, client) for making a request to create or write to a file issues a request involving writing to a computer for processing the request (hereinafter, server), the server assigns a unique transaction number to the processing.
The server writes the requested matter and the latest transaction number to a disk. The client is notified of the transaction number and keeps the details of the request and the transaction number until the writing to the disk by the server is confirmed. If a failure occurs in the server, a substitute computer is activated to check the contents of the disk used by the previous server. Since the disk holds a record of the transaction numbers of the writes that have been completed, the substitute computer checks the numbers and requests the client to re-execute the processing that has not yet been written to the disk.
Non-Patent Literature 1: Lustre 2.0 Operations Manual, Chapter 30, [online] Internet [accessed Sep. 2, 2011]
The existing technique as disclosed in the aforementioned Lustre 2.0 Operations Manual assigns a transaction number to a request from the client and writes the transaction number to the disk together with the requested matter, so that the client's request can be restored without duplication or lack in the case of failure in the server. Such an example is illustrated in
However, incorporating a new function into the standard disk file system included in an operating system is likely to be difficult, because the source code may not be obtainable and compatibility with existing files must be maintained. To eliminate this difficulty, an entire disk file system has to be developed, which increases the cost of software development and maintenance.
The invention is accomplished in view of the foregoing problems; an object of the invention is, when an auxiliary server takes over processing from a failed main server, to prevent an error caused by double execution of writing without providing a special function of the disk file system, such as a function for recording transaction numbers.
A representative example of the invention disclosed in this application is as follows. Upon receipt of a request signal for writing to a storage device from a management computer, at least one server computer performs provisional writing, which is writing related to the information processing request, to a first storage part included in the at least one server computer and sends a first notification signal indicating completion of the provisional writing to the management computer. Upon receipt of the first notification signal, the management computer sends a second notification signal indicating that the management computer has received the first notification signal to the at least one server computer. Upon receipt of the second notification signal, the at least one server computer performs writing to the storage device.
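The exchange described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `Server`, its method names, the request identifier, and the use of a dict as a stand-in for the storage device are all hypothetical.

```python
class Server:
    """Hypothetical server: stages a write in memory, commits it only after the second notification."""

    def __init__(self, storage_device):
        self.first_storage = {}               # first storage part (in-memory staging area)
        self.storage_device = storage_device  # stand-in for the disk device

    def on_write_request(self, request_id, path, data):
        # Provisional writing: only the in-memory staging area is touched.
        self.first_storage[request_id] = (path, data)
        return ("first_notification", request_id)

    def on_second_notification(self, request_id):
        # The requester has confirmed receipt of the first notification,
        # so the staged write may now go to the storage device.
        path, data = self.first_storage[request_id]
        self.storage_device[path] = data
        return ("write_done", request_id)


disk = {}
server = Server(disk)
staged = server.on_write_request(1, "/a.txt", b"hello")
disk_before_ack = dict(disk)       # snapshot taken before the second notification
server.on_second_notification(1)   # only now does the write reach the storage device
```

In this sketch the storage device is untouched until the second notification arrives, so a request whose outcome the requester never learned has left no trace on the disk.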
The invention prevents an error caused by double execution of writing without providing a special function in the disk file system, such as a function for recording transaction numbers, when an auxiliary server takes over processing from a failed main server.
Hereinafter, embodiments of the invention will be described in detail with drawings.
This embodiment first provides an example of device configuration to embody the invention, thereafter, outlines overall processing, and then explains the details.
First, a configuration of a computer system in Embodiment 1 is explained with a block configuration diagram shown in
The network 104 is, for example, a LAN (Local Area Network). The network 105 is, for example, a SAN (Storage Area Network).
The client computer 101 is a computer apparatus to be used by a system administrator or a user, including a processor 111, a memory 112, a storage device 113, and a network interface 114, and connected to the network 104 via the network interface 114. The memory 112 of the client computer 101 holds a user program 140 and a file system client 141; for example, the user program 140 issues a data input/output command to the file system client 141. The user program 140 and the file system client 141 are computer programs; they are loaded to the memory 112 from the storage device 113 or from a different computer via the network 104 using the network interface 114 and executed by the processor 111. The file system client 141 includes a user request transfer module 151, an ACK processing module 152, a resend processing module 153, and request history information 171, which will be described later.
The server computer 102a is a computer apparatus for receiving file input/output requests from the client computer 101 and accessing the disk devices 103a to 103c, and includes a processor 121a, a memory 122a, a storage device 123a, a network interface 124a, and a storage interface 125a. The server computer 102a is connected to the network 104 via the network interface 124a and connected to the network 105 via the storage interface 125a.
The memory of the server computer 102a holds a file system server 142a to process requests from the file system client 141. The file system server 142a is a computer program, which is loaded to the memory 122a from the storage device 123a or from a different computer via the network 104 using the network interface 124a and is executed by the processor 121a. The file system server 142a includes a client request provisional execution module 161a, a failover processing module 162a, a disk file system module 163a, and memory file system information 181a, which will be described later.
The server computer 102b is a computer apparatus having the same configuration as that of the server computer 102a including the aforementioned file system server 142a, processor 121a, storage device 123a, and the like, and is connected to the network 104 via a network interface and connected to the network 105 via a storage interface. The memory 122b of the server computer 102b holds a client request provisional execution module 161b having the same configuration and function as the client request provisional execution module 161a, a failover processing module 162b having the same configuration and function as the failover processing module 162a, a disk file system module 163b having the same configuration and function as the disk file system module 163a, and memory file system information 181b having the same configuration and function as the memory file system information 181a, and they are executed by the processor 121b.
This embodiment is described assuming that the server computer 102a is a server computer to be used normally and the server computer 102b is a substitute server computer to be used when a failure occurs in the server computer 102a; however, the roles of the server computers 102a and 102b may be exchanged because they have no difference in configuration and function. Alternatively, both of them may be configured to be a substitute server of the other.
The disk devices 103a to 103c are storage devices connectable to the network 105; they may be hard disk drives (HDDs), semiconductor disks (SSDs), or a storage array in which HDDs and SSDs are combined as a RAID system.
Next, an overview of the functional blocks in the file system client 141 is provided.
The user request transfer module 151 transfers file input/output commands from the user program to the server computer. The ACK processing module 152 manages whether a processing request sent by the user request transfer module 151 has been successfully processed or not. The resend processing module 153 requests the server computer 102b working as a substitute server after occurrence of a failure in the server computer 102a to re-execute the request which was received by the server computer 102a before the failure but has not been completely processed. The request history information 171 is management information for the ACK processing module 152 to manage uncompleted processing.
Next, an overview of the file system server 142a is provided.
The client request provisional execution module 161a receives a request from the file system client 141 and performs processing on the memory file system information 181a. The failover processing module 162a receives a request from the resend processing module 153 of the file system client. The disk file system module 163a manages the data structure of data stored in the disk devices 103a to 103c and provides a variety of processing such as reading a file and writing a file. The disk file system module 163a converts a manipulation request to create, delete, read, or write to a file into a read/write request that designates a recording position in the disk device 103a or other disk device and conforms to the recording format of the disk devices, and issues the read/write request to the disk. The memory file system information 181a is a data structure with which the client request provisional execution module 161a executes a processing request from the file system client on the memory on a temporary basis.
Next, with
Described above is a configuration example of a computer system in this embodiment; hereinafter, operations of the components shown in
First,
The client request provisional execution module 161a that has received the message M220 executes the request on the memory file system information 181a under a request R225. If the client request provisional execution module 161a needs data which is not in the memory file system information 181a, it notifies the disk file system module 163a of a read request with a message M222. In response to the message M222, the disk file system module 163a receives required data from the disk device 103a or other disk device under a request R226 and forwards this data to the client request provisional execution module 161a with a message M227.
The client request provisional execution module notifies the ACK processing module 152 in the file system client 141 of a processing result with a message M230. The ACK processing module 152 that has received the message M230 notifies the user request transfer module of the processing result with a message M240. Further, the user request transfer module 151 notifies the user program of the processing result with a message M250.
If the message M230 indicates error termination or if the file system has not been changed because of read processing, the processing is terminated.
If the message M230 indicates normal termination of processing of a request involving writing, the user request transfer module 151 registers the request in the request history information 171 under a request R215. The ACK processing module 152 further notifies the client request provisional execution module 161a of receipt of the message M230 with a message M260. The client request provisional execution module that has received the message M260 writes the same request as the request R225 to the disk device 103a or other disk device via the disk file system module 163a (M270 and R275). The disk file system module 163a notifies the client request provisional execution module of completion of the write to the disk with a message M280 and the client request provisional execution module returns a notice of write to the disk device 103a or other disk device to the ACK processing module 152 with a message M290. The ACK processing module 152 that has received the message M290 deletes the information on this processing from the request history information under a request R295. Through the above-described registering and deleting request history information, the request history information holds a list of requests in the course of writing to the disk device 103a or other disk device under the request R275. The presence of the request history information enables re-execution of a request for which completion of writing to the disk device 103a or other disk device has not been confirmed when a failure occurs in the server.
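The registering and deleting of request history described above can be sketched as follows. The class `RequestHistory`, its method names, and the dict layout of a request are hypothetical and chosen only for illustration; the source does not specify the data structure of the request history information 171.

```python
class RequestHistory:
    """Hypothetical stand-in for the request history information 171."""

    def __init__(self):
        self._pending = {}  # request id -> request details, kept in registration order

    def register(self, request_id, request):
        # Corresponds to R215: M230 reported normal termination of a request involving writing.
        self._pending[request_id] = request

    def delete(self, request_id):
        # Corresponds to R295: M290 reported completion of the write to the disk device.
        self._pending.pop(request_id, None)

    def pending(self):
        # Requests whose disk write has not been confirmed; candidates for re-execution.
        return list(self._pending.values())


history = RequestHistory()
history.register(1, {"op": "write", "path": "/a", "data": b"x"})
history.register(2, {"op": "write", "path": "/b", "data": b"y"})
history.delete(1)                # M290 arrived for request 1
to_replay = history.pending()    # request 2 is still unconfirmed
```

If the server failed at this point, only request 2 would remain as a candidate for re-execution on the substitute server.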
As an effect of the processing in accordance with the sequence of
Between
First,
The sequences starting from circles 381, 382, and 383 respectively represent the processing of the client request provisional execution module 161b, the failover processing module 162b, and the disk file system module 163b in the server computer 102b that takes over the processing from the server computer 102a.
When the ACK processing module 152 does not receive a response to the message M260 corresponding to the message M290 in
Next,
Next, with reference to
Described above are outlines of overall processing in this embodiment. Hereinafter, operations in each module to perform such processing and a method of storing data in the memory are described in detail.
The user request transfer module 151 starts running in response to receipt of a file processing request from the user program 140 (S601). Upon start, at Step S602, the user request transfer module 151 forwards the file processing request to the client request provisional execution module 161a of the file system server in the server computer 102a. At Step S603, the user request transfer module 151 waits for return of a result of processing in the client request provisional execution module 161a or the disk file system module 163a of the file system server in the server computer 102a through the ACK processing module 152. Step S604 is to wait for a message from the server; unless the user request transfer module 151 receives a message from the server within a specific time period, it determines that a communication error has occurred. Then, after waiting for the completion of later-described processing of the resend processing module 153 at Step S605, the user request transfer module 151 changes the destination of the request to the substitute server 102b at Step S606 and resends the request to the substitute server 102b at Step S602.
If, at Step S604, the user request transfer module 151 receives a message from the server within the specific time period, it determines that no communication error has occurred and proceeds to Step S607. Step S607 is to check a response from the server computer 102a, or from the server computer 102b if the route going through Step S606 is taken; the user request transfer module 151 checks whether the processing requested to the server at Step S602 involves writing to the disk device 103a or other disk device. If the result of checking is that the processing requested to the server at Step S602 does not involve writing to the disk device 103a or other disk device, the user request transfer module 151 returns, at Step S610, the result of the processing by the client request provisional execution module (161a or 161b) or the disk file system module (163a or 163b) of the file system server (142a or 142b) to the user program as a response to the request received at S601. If the result of the determination at Step S607 is that the processing requested to the server at Step S602 involves writing to the disk device 103a or other disk device, the user request transfer module 151 further determines, at Step S608, whether the result of processing the request by the client request provisional execution module or the disk file system module of the file system server (142a or 142b) is successful. If the result of the determination is that the processing failed, the user request transfer module 151 performs Step S610, which has already been described. If the result of the determination is that the processing has been successfully completed, the user request transfer module 151 stores the request to the request history information at Step S609 and performs Step S610.
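The flow of Steps S602 to S610 can be sketched as follows. All names are hypothetical: `ServerTimeout`, `transfer_request`, the callables standing in for the two servers, and the `writes` and `ok` fields are assumptions made for illustration. The source also does not say what happens if the substitute server times out as well; the sketch simply re-raises in that case.

```python
class ServerTimeout(Exception):
    """Raised by a hypothetical transport when no message arrives within the time limit."""


def transfer_request(request, primary, substitute, history, wait_for_resend):
    """Sketch of Steps S602 to S610; every parameter is an assumed interface."""
    target = primary
    while True:
        try:
            result = target(request)            # S602: forward the request and wait for the result
        except ServerTimeout:                   # S604: communication error
            if target is substitute:
                raise                           # assumption: not covered by the text
            wait_for_resend()                   # S605: wait for resend processing to finish
            target = substitute                 # S606: change destination, then resend (S602)
            continue
        if request["writes"] and result["ok"]:  # S607 and S608
            history.append(request)             # S609: keep until the disk write is confirmed
        return result                           # S610: respond to the user program


def dead_server(request):
    raise ServerTimeout()


def live_server(request):
    return {"ok": True, "served_by": "102b"}


events = []
history = []
reply = transfer_request({"writes": True, "path": "/a"}, dead_server, live_server,
                         history, lambda: events.append("resend done"))
```

Here the first attempt times out, resend processing is awaited once, and the same request is then served by the substitute and recorded in the history.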
Next, with reference to
The client request provisional execution module 161a starts running in response to receipt of a request from the above-described user request transfer module 151 (Step S701). Upon start, the client request provisional execution module 161a determines whether the received request is for processing involving writing to the disk device 103a or other disk device (S702).
If the determination at Step S702 is that the received request is for processing involving writing to the disk device 103a or other disk device, the client request provisional execution module 161a first determines, at Step S703, whether the memory file system information 181a has free space. If the determination at Step S703 is that the memory file system information 181a has no free space, the client request provisional execution module 161a deletes data with a release-enable flag ON to release the storage area at Step S704. Thereafter, the client request provisional execution module 161a performs the requested writing to the memory file system information 181a at Step S705. If the determination at Step S703 is that the memory file system information 181a has free space, the client request provisional execution module 161a skips S704 to perform the requested writing to the memory file system information 181a at Step S705.
If the determination at Step S702 is that the received request is not for processing involving writing to the disk device 103a or other disk device, the client request provisional execution module checks whether the designated data exists in the memory file system information 181a at Step S706. If, at Step S706, the designated data exists in the memory file system information 181a, the client request provisional execution module 161a retrieves the data from the memory file system information 181a at Step S707. If the designated data does not exist in the memory file system information 181a at Step S706, the client request provisional execution module 161a issues a read command to the disk file system module 163a at Step S708 and acquires the requested data. Finally, at Step S709, the client request provisional execution module 161a sends a notification indicating whether an error has occurred in the foregoing operations on the memory file system to terminate the processing.
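Steps S702 to S709 can be sketched as follows. The class `ProvisionalExecutor`, the entry-count capacity, the `releasable` field standing in for the release-enable flag, and the `disk_read` callable are all hypothetical. How the flag comes to be set is outside this sketch, and the behavior when nothing is releasable is an assumption because the text does not cover it.

```python
class ProvisionalExecutor:
    """Hypothetical stand-in for the client request provisional execution module."""

    def __init__(self, capacity, disk_read):
        self.capacity = capacity    # assumed limit, counted in entries
        self.memory_fs = {}         # path -> {"data": bytes, "releasable": bool}
        self.disk_read = disk_read  # assumed callable wrapping the disk file system module

    def _is_full(self, path):
        return path not in self.memory_fs and len(self.memory_fs) >= self.capacity

    def handle(self, request):
        path = request["path"]
        try:
            if request["op"] == "write":                   # S702
                if self._is_full(path):                    # S703: no free space
                    releasable = [p for p, e in self.memory_fs.items() if e["releasable"]]
                    for p in releasable:                   # S704: release flagged entries
                        del self.memory_fs[p]
                if self._is_full(path):
                    raise MemoryError("nothing to release")  # assumption: case not in the text
                self.memory_fs[path] = {"data": request["data"], "releasable": False}  # S705
                return {"ok": True}
            entry = self.memory_fs.get(path)               # S706
            if entry is not None:
                data = entry["data"]                       # S707: served from memory
            else:
                data = self.disk_read(path)                # S708: read via the disk file system
            return {"ok": True, "data": data}
        except Exception as exc:
            return {"ok": False, "error": str(exc)}        # S709: report whether an error occurred


disk = {"/on_disk": b"old"}
ex = ProvisionalExecutor(capacity=1, disk_read=disk.__getitem__)
w1 = ex.handle({"op": "write", "path": "/a", "data": b"1"})
ex.memory_fs["/a"]["releasable"] = True   # pretend the release-enable flag was turned ON
w2 = ex.handle({"op": "write", "path": "/b", "data": b"2"})
r_mem = ex.handle({"op": "read", "path": "/b"})
r_disk = ex.handle({"op": "read", "path": "/on_disk"})
```

Neither write in this usage touches `disk`; both live only in `memory_fs`, which is the point of provisional execution.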
The processing of the client request provisional execution module (161a or 161b) is characterized by performing processing that involves a change to the disk device 103a or other disk device on the memory file system information before actually requesting the disk device 103a or other disk device to perform the processing. The existence of the client request provisional execution module has the effect that whether writing to the disk device 103a or other disk device requested by the user request transfer module 151 in the file system client can be performed is determined without actually manipulating the disk device 103a or other disk device.
The processing on the memory file system information (181a or 181b) is characterized in that information in the memory file system information (181a or 181b) will not be deleted without going through later-described Step S905 in
Next, with reference to
If the determination at Step S903 is that the command is for processing involving writing to the disk device 103a or other disk device, the ACK processing module 152 determines whether a notification of successful processing has been received from the client request provisional execution module (161a or 161b) at Step S904. If the determination at Step S904 is that the processing is not successfully completed, the ACK processing module 152 terminates the processing (S909). If the determination at Step S904 is that the processing has been successfully completed, the ACK processing module 152 sends a message (M260) acknowledging a message M230 in
If the determination at Step S906 is that a response to the message has been received, the ACK processing module 152 deletes the request from the request history information at Step S907 and terminates the processing at Step S909. If the determination at Step S906 is that no response has been received, the ACK processing module 152 invokes the resend processing module 153 at Step S908 and terminates the processing at Step S909. The Step S907 to delete the request history information prevents unnecessary re-execution by the later-described resend processing module 153 in
Next, with reference to
Next, with reference to
Next, with reference to
The failover processing module 162b receives a message M310 explained in
The failover processing module 162b intermediates between the resend processing module 153 and the disk file system module 163b. The existence of the failover processing module 162b eliminates the necessity for the disk file system module 163b to have a function equivalent to the resend processing module 153, so that the file system included in an existing operating system can be used without change.
Described above is Embodiment 1 of the invention. This embodiment guarantees that, as explained with reference to
In the sequence in
As set forth above, the server computer is controlled so as not to perform writing to the disk device 103a or other disk device with respect to a request for which processing result, whether successful or failed, is unknown. This control provides a failover function without adding special processing such as writing transaction numbers to the disk device 103a or other disk device.
Hereinafter, Embodiment 2 is explained. Embodiment 2 is the same as Embodiment 1 in the basic configurations and operations of the client computer and the server computers but provides these configurations and operations in a parallel file system. Therefore, this embodiment explains only the configurations and operations different from Embodiment 1.
In Embodiment 2 of the invention, the server computers to be the parallel file servers are each composed of a main server and an auxiliary server as shown in
Next, at determination Step S1504, the parallel file system client module determines whether the contents of the processing involve an access to the contents of the file. If the determination at Step S1504 is that the contents of the processing involve an access to the contents of the file, the parallel file system client module requests the file system client 1451 or 1452 to access the file at Step S1505. Since the operations of the file system clients 141, 1451, and 1452 are the same as those in the foregoing Embodiment 1, explanation is omitted here. If the determination at Step S1504 is that the processing does not involve an access to the contents of the file or when Step S1505 has been completed, the parallel file system client module terminates the processing at Step S1507.
As described above, according to Embodiment 2 which applies the invention to a parallel file system, if a failure occurs in one of the servers 102a, 1401a, and 1402a constituting a parallel file system, the substitute server 102b, 1401b, or 1402b can take over the processing. This configuration can eliminate an error causing an inconsistency, for example, a state where the processing is completed successfully up to S1503 in
The invention is not limited to the above-described embodiments but includes various modifications. The above-described embodiments are explained in detail for better understanding of the invention, and the invention is not limited to embodiments including all the configurations described above. A part of the configuration of one embodiment may be replaced with that of another embodiment; the configuration of one embodiment may be incorporated into the configuration of another embodiment. A part of the configuration of each embodiment may be added to, deleted, or replaced by a different configuration.
The above-described configurations, functions, processing modules, and processing means may be implemented, in whole or in part, by hardware: for example, by designing an integrated circuit. The above-described configurations and functions may also be implemented by software, which means that a processor interprets and executes programs providing the functions. The information of programs, tables, and files to implement the functions may be stored in a storage device such as a memory, a hard disk drive, or an SSD (Solid State Drive), or a storage medium such as an IC card or an SD card. The drawings show control lines and information lines as considered necessary for explanation but do not show all control lines or information lines in the products. It can be considered that almost all components are actually interconnected.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/005154 | 9/14/2011 | WO | 00 | 9/27/2013 |