The present invention pertains to a unified storage and a method of controlling a unified storage, and is suitable to be applied to a unified storage that functions as a block storage and a file storage and a method of controlling this unified storage.
In recent years, various types of data are handled, including structured data such as databases and business systems and unstructured data such as images and videos. Because the suitable storage system differs according to the data, both a block storage and a file storage are necessary to store these various types of data. However, costs increase when a block storage and a file storage are purchased separately. Accordingly, a unified storage, which supports with one device a plurality of data access protocols for files and blocks, such as Network File System (NFS)/Common Internet File System (CIFS), Internet Small Computer System Interface (iSCSI), and Fibre Channel (FC), is coming into wide use.
For example, U.S. Pat. No. 8,117,387 (hereinafter referred to as Patent Document 1) discloses a technique for realizing a unified storage by combining a server (a network-attached storage (NAS) head), which performs file processing, with a block storage. In addition, U.S. Pat. No. 8,156,293 (hereinafter referred to as Patent Document 2) discloses a technique for realizing a unified storage by using some resources, such as a central processing unit (CPU) or a memory, of a block storage to perform file processing.
Incidentally, because file processing has a higher processing overhead than block processing, it is necessary to be able to scale out file processing. However, unified storages according to the techniques of Patent Documents 1 and 2 have the following problems.
Firstly, with the technique according to Patent Document 1, there is the problem that an additional server (a NAS head) is necessary to scale out file processing, and costs increase accordingly.
In addition, with the technique according to Patent Document 2, there is the problem that file processing cannot be scaled out because the resources that perform file processing are integrated with the block storage. To describe this in detail, a block storage system typically has two storage control modules for high availability; when a failure occurs in one storage control module, failover processing is performed and processing is continued by the other storage control module. The technique according to Patent Document 2 realizes high availability for file processing by a similar method, using the resources of each storage control module in the block storage. Accordingly, the technique according to Patent Document 2 cannot scale out file performance.
The present invention is made in consideration of the above points, and thus an objective of the present invention is to propose a unified storage that can maintain availability and scale out file performance while suppressing costs, and a method of controlling this unified storage.
In order to achieve the foregoing objective, the present invention provides a unified storage configured to function as a block storage and a file storage, the unified storage having a plurality of controllers that are storage controllers and a storage apparatus, each of the plurality of controllers being equipped with one or more main processors and one or more channel adapters, each main processor processing data inputted to and outputted from the storage apparatus by causing a block storage control program to operate, each channel adapter having a processor, the processor performing transmission and reception to and from the one or more main processors after accepting an access request, and the processors in a plurality of the channel adapters cooperating to cause a distributed file system to operate and make a request to the plurality of controllers to distributively store data that is written as a file.
In addition, in order to achieve the foregoing objective, the present invention provides a method of controlling a unified storage configured to function as a block storage and a file storage, the unified storage having a plurality of controllers that are storage controllers and a storage apparatus, each of the plurality of controllers being equipped with one or more main processors and one or more channel adapters, each main processor processing data inputted to and outputted from the storage apparatus by causing a block storage control program to operate, each channel adapter having a processor, the processor performing transmission and reception to and from the one or more main processors after accepting an access request, and the processors in a plurality of the channel adapters cooperating to cause a distributed file system to operate and distributively store data, written as a file, to the plurality of controllers.
By virtue of the present invention, it is possible to maintain availability and scale out file performance, while suppressing costs.
With reference to the drawings, description is given in detail below regarding embodiments of the present invention.
Note that the following description and the drawings are examples for describing the present invention, and are omitted and simplified, as appropriate, in order to clarify the description. In addition, not all combinations of the features described in the embodiments are necessarily essential to the means for solving the problems of the invention. The present invention is not limited to the embodiments, and every possible example of application that matches the idea of the present invention is included in the technical scope of the present invention. A person skilled in the art can make, inter alia, various additions or modifications to the present invention within the scope of the present invention. The present invention can be implemented in various other forms. Unless otherwise specified, components may be singular or plural.
In the following description, various items of information may be described by expressions such as “table,” “list,” and “queue,” but the various items of information may be expressed as data structures different from these. To indicate independence from a data structure, “xx table,” “xx list,” etc. may be referred to as “xx information.” When describing details for each item of information, in a case where expressions such as “identification information,” “identifier,” “name,” “ID,” and “number” are used, these can be mutually interchanged.
In addition, in the following description, it may be that, in a case where description is given without distinguishing elements of the same kind, a reference symbol or a common number from among reference symbols is used and, in a case where description is given while distinguishing elements of the same kind, the reference symbol for the element is used or, in place of the reference symbol, an ID assigned to the element is used.
In addition, in the following description, processing performed by executing a program may be described, but because the program is executed by at least one processor (for example, a CPU) to perform defined processing while appropriately using a storage resource (for example, a memory) and/or an interface device (for example, a communication port) or the like, the processor may be given as the performer of the processing. Similarly, the performer of processing performed by executing a program may be a controller, an apparatus, a system, a computer, a node, a storage system, a storage apparatus, a server, a management computer, a client, or a host that each have a processor. The performer (for example, a processor) of processing performed by executing a program may include a hardware circuit that performs some or all of the processing. For example, the performer of processing performed by executing a program may include a hardware circuit that executes encryption and decryption or compression and decompression. A processor operates in accordance with a program to thereby operate as a functional unit for realizing a predetermined function. An apparatus and a system that include the processor are an apparatus and system that include such functional units.
A program may be installed to an apparatus such as a computer from a program source. The program source may be a non-transitory storage medium that can be read by a program distribution server or a computer, for example. In a case where the program source is in a program distribution server, it may be that the program distribution server includes a processor (for example, a CPU) and a non-transitory storage resource, and the storage resource also stores a distribution program and a program to be distributed. It may be that the processor in the program distribution server executes the distribution program, whereby the processor in the program distribution server distributes the program to be distributed to another computer. In addition, in the following description, two or more programs may be realized as one program, and one program may be realized as two or more programs.
A unified storage 1 according to a first embodiment of the present invention mounts one or more high-performance front-end interfaces (FE-I/Fs) in each of a plurality of storage controllers (hereinafter referred to simply as controllers), and uses the CPUs and memories in the FE-I/Fs to cause a distributed file system (distributed FS) to operate. The distributed FS is configured to recognize each controller and to distributively dispose data on the controllers while protecting the data with a redundant configuration in which the data is present across two or more controllers, so that no item of data is present in only one controller.
A detailed internal configuration is described below, but a high-performance FE-I/F with which the unified storage 1 is equipped is a channel adapter having a processor (for example, a CPU) or the like; specifically, it is, for example, a smart network interface card (smart NIC). It is known that a channel adapter such as a smart NIC is less expensive than a server such as a NAS head.
As illustrated in
The distributed file system control program P11 requests processors in two or more controllers 100 to store data by having a redundant configuration between the controllers 100. To describe in detail, a block storage control program P1 (refer to
Here, for example, when a failure occurs in the #0 controller 100 as illustrated in
The unified storage 1 has a storage control apparatus 10 and a storage device unit 20.
The storage control apparatus 10 has a plurality of controllers 100. In the case in
Note that, although not illustrated in
In addition, in the case in
Each controller 100 has one or more FE-I/Fs 110, a backend interface (BE-I/F) 120, one or more CPUs 130, a memory 140, and a cache 150. These are connected to each other by a communication channel such as a bus, for example.
Note that an FE-I/F 110 in the present embodiment may be realized by a configuration that is integrated with a controller 100, or may be realized by a separate apparatus (device) that can be mounted in a controller 100. In order to facilitate understanding of a connection relation,
An FE-I/F 110 is an interface device for communicating with an external device present on the front end, such as a client 40. The distributed FS operates on the FE-I/Fs 110, and may operate on any number of FE-I/Fs 110 from among the plurality of FE-I/Fs 110. The BE-I/F 120 is an interface device for the controller 100 to communicate with the storage device unit 20.
A CPU 130 is an example of a processor that performs operation control for block storage. The memory 140 is, for example, a random-access memory (RAM) and temporarily stores a program and data for operation control by the CPU 130. The memory 140 stores the block storage control program P1. Note that the block storage control program P1 may be stored in the storage device unit 20. The block storage control program P1 is a control program for block storage, and is a program for processing data inputted to and outputted from the storage device unit 20. Specifically, for example, the block storage control program P1 provides the FE-I/F 110 with a logical volume (logical Vol in
The cache 150 temporarily stores data written by a block protocol from a client 40 or a distributed FS that operates on the FE-I/F 110, and data read from the storage device unit 20.
The network 30 is specifically, for example, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or the like.
A client 40 is an apparatus that accesses the unified storage 1, and transmits a data input/output request (a data write request or a data readout request) in units of blocks or units of files to the unified storage 1.
The management terminal 50 is a computer terminal operated by a user or an operator. The management terminal 50 is provided with a user interface in accordance with, inter alia, a graphical user interface (GUI) or a command-line interface (CLI), and provides functionality for the user or operator to control or monitor the unified storage 1.
The storage device unit 20 has a plurality of physical devices (PDEVs) 21. A PDEV 21 is, for example, a hard disk drive (HDD), but may be another type of non-volatile storage device, including a flash memory device such as a solid-state drive (SSD), for example. The storage device unit 20 may have different types of PDEVs 21. In addition, a group of a redundant array of independent disks (RAID) may be configured by a plurality of PDEVs 21 of the same type. Data is stored to the RAID group according to a predetermined RAID level.
The network I/F 111 is an interface device for communicating with an external device such as a client 40 or with another FE-I/F 110. Note that communication with an external device such as a client 40 and communication with another FE-I/F 110 may be performed by different network I/Fs. The internal I/F 112 is an interface device for communicating with the block storage control program P1. The internal I/F 112 is connected with, inter alia, the CPU 130 in the controller 100 by Peripheral Component Interconnect Express (PCIe), for example.
The CPU 113 is a processor that performs operation control of the FE-I/F 110. The memory 114 temporarily stores programs and data used for the operation control by the CPU 113. The CPU 113 accepts an access request to thereby perform transmission and reception to and from a processor (CPU 130) in the controller 100. The CPUs 113 in a plurality of FE-I/Fs 110 cooperate to cause a distributed file system 2 to operate, and make a request to the plurality of controllers 100 to distributively store data that is written as a file.
The memory 114 stores a distributed file system control program P11, a file protocol server program P13, a block protocol server program P15, and a node management table T11. Note that it may be that the memory 114 stores only the block protocol server program P15, or it may be that the memory 114 stores only the distributed file system control program P11, the file protocol server program P13, and the node management table T11, and not the block protocol server program P15. In addition, respective programs and data stored to the memory 114 may be stored to the storage device 116.
The distributed file system control program P11 is executed by the CPU 113 to thereby cooperate with the distributed file system control program P11 in another FE-I/F 110 to manage and control the distributed file system and provide the distributed file system (FS 2 in
The file protocol server program P13 receives various requests for, inter alia, a read or a write from a client 40 or the like, and processes a file protocol included in such requests. A file protocol processed by the file protocol server program P13 is specifically, for example, NFS, CIFS, a file-system-specific protocol, Hypertext Transfer Protocol (HTTP), or the like.
The block protocol server program P15 receives various requests for, inter alia, a read or a write from a client 40 or the like, and processes a block protocol included in such requests. A block protocol processed by the block protocol server program P15 is specifically, for example, iSCSI, FC, or the like.
The cache 115 temporarily stores data written from a client 40, or data read from the block storage control program P1.
The storage device 116 stores, inter alia, an operating system or management information for the FE-I/F 110.
The network I/F 41 is an interface device for communicating with the unified storage 1.
The CPU 42 is a processor that performs operation control for the client 40. The memory 43 temporarily stores programs and data used for the operation control by the CPU 42. The memory 43 stores an application program P41, a file protocol client program P43, and a block protocol client program P45. Note that it may be that the memory 43 stores only the application program P41 and the block protocol client program P45, or it may be that the memory 43 stores only the application program P41 and the file protocol client program P43. In addition, respective programs and data stored to the memory 43 may be stored to the storage device 44.
The application program P41 is executed by the CPU 42 to thereby make a request to the file protocol client program P43 and the block protocol client program P45 and read and write data from and to the unified storage 1.
The file protocol client program P43 receives various requests for, inter alia, a read or a write from the application program P41 or the like, and processes a file protocol included in such requests. A file protocol processed by the file protocol client program P43 is specifically, for example, NFS, CIFS, a file-system-specific protocol, HTTP, or the like. In addition, in a case where a file protocol processed by the file protocol client program P43 is a protocol that supports data distribution such as Parallel NFS (pNFS) or a client-specific protocol (for example, Ceph), it may be that the file protocol client program P43 calculates a storage destination node (target data storage destination node calculation illustrated in
The block protocol client program P45 receives various requests for, inter alia, a read or a write from a client 40 or the like, and processes a block protocol included in such requests. A block protocol processed by the block protocol client program P45 is specifically, for example, iSCSI, FC, or the like.
The storage device 44 stores, inter alia, an operating system or management information for the client 40.
Specifically, in the case in
Note that, in a case where the number of controllers 100 is three or more, data may be distributively disposed among controllers 100 in addition to just within a controller 100. In this case, instead of duplicative data protection in which the same data is stored to two FE-I/Fs 110, triplicate or higher data protection in which the same data is stored to three or more FE-I/Fs 110 may be employed.
In addition, the above-described distributive disposition of data is not limited to user data. File metadata or file system metadata may similarly be subjected to data protection that is across controllers 100, and may distributively be disposed among FE-I/Fs 110.
The node management table T11 illustrated in
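Although the concrete layout of the node management table T11 is as set forth in the drawing, a minimal sketch of such a table is given below for reference only. The field names and helper functions are illustrative assumptions derived from how the table is used in the processing described later (namely, to look up which controller 100 a node is connected to and whether the node is in a normal or failure state), and are not limiting.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal"
    FAILURE = "failure"


@dataclass
class NodeEntry:
    """One row of an illustrative node management table (T11)."""
    node_id: int        # identifier of the FE-I/F (node)
    controller_id: int  # identifier of the controller 100 the node is connected to
    state: NodeState    # whether the node is currently usable


# Example: four nodes (#0 to #3) spread over two controllers (#0 and #1).
node_management_table = [
    NodeEntry(0, 0, NodeState.NORMAL),
    NodeEntry(1, 0, NodeState.NORMAL),
    NodeEntry(2, 1, NodeState.NORMAL),
    NodeEntry(3, 1, NodeState.NORMAL),
]


def controller_of(node_id: int) -> int:
    """Look up which controller a node is connected to."""
    return next(e.controller_id for e in node_management_table if e.node_id == node_id)


def is_normal(node_id: int) -> bool:
    """Check whether a node is in the normal state."""
    return next(e.state for e in node_management_table if e.node_id == node_id) is NodeState.NORMAL
```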
According to
Next, the distributed file system control program P11 calculates a storage destination node for storing target data (the file) (step S103). A detailed processing procedure for calculating the storage destination node for target data is described below with reference to
Next, the distributed file system control program P11 requests the storage destination node calculated in step S103 to write the target data (step S105). Note that, in a case where the target data is divided into a plurality of chunks, the distributed file system control program P11 executes the processing in step S105 for each chunk. In addition, write requests for a plurality of chunks may be performed in parallel.
Note that, in a case where a file protocol processed by the file protocol client program P43 in the client 40 is a protocol that supports data distribution such as pNFS or a client-specific protocol, the processing in steps S101 to S105 is performed by the client 40.
Next, with respect to the data write request in step S105, the distributed file system control program P11 in the storage destination node (FE-I/F 110) for the target data, upon receiving the data write request, performs a process for writing the data to a region corresponding to the data (step S107). When the process for writing the data completes, the distributed file system control program P11 makes a completion response with respect to the data write request in step S105 (step S109).
Next, in the node (FE-I/F 110) that has received the file storage request from the client 40, the distributed file system control program P11 receives the completion response with respect to the data write request, from the distributed file system control program P11 in the storage destination node, and makes a file storage completion response to the file protocol server program P13. Further, the file protocol server program P13 performs protocol processing and makes a completion response, with respect to the file storage request, to the client 40 (step S111), and the file storage process ends.
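For reference only, the flow of steps S101 to S111 may be sketched, for example, as follows. The chunk size, the in-memory stand-ins for the regions held by the nodes, and the simplified storage destination node calculation are illustrative assumptions and are not limiting; the hash-based calculation that guarantees placement across controllers is sketched separately for steps S301 to S307 and S401 to S409.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024   # illustrative chunk size
NODE_COUNT = 4                 # illustrative number of FE-I/Fs (nodes), two per controller

# Stand-in for the regions managed by each node's distributed file system
# control program P11: node id -> {(file name, chunk offset): chunk data}.
node_regions: dict[int, dict[tuple[str, int], bytes]] = {n: {} for n in range(NODE_COUNT)}


def calculate_storage_destination_nodes(file_name: str, chunk_offset: int) -> list[int]:
    """Simplified stand-in for step S103; the hash-based calculation that also
    guarantees placement across different controllers 100 is sketched
    separately for steps S301-S307 and S401-S409."""
    h = int.from_bytes(hashlib.sha256(f"{file_name}:{chunk_offset}".encode()).digest()[:4], "big")
    primary = h % NODE_COUNT
    secondary = (primary + NODE_COUNT // 2) % NODE_COUNT  # a node on the other controller in this layout
    return [primary, secondary]


def write_to_node(node_id: int, file_name: str, chunk_offset: int, chunk: bytes) -> None:
    """Stand-in for steps S105-S109: the storage destination node writes the
    chunk to the region corresponding to the data and responds on completion."""
    node_regions[node_id][(file_name, chunk_offset)] = chunk


def store_file(file_name: str, data: bytes) -> None:
    """Illustrative flow of the file storage process (steps S101-S111)."""
    chunks = [(off, data[off:off + CHUNK_SIZE]) for off in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor() as pool:   # write requests for a plurality of chunks may run in parallel
        futures = [
            pool.submit(write_to_node, node_id, file_name, off, chunk)           # step S105
            for off, chunk in chunks
            for node_id in calculate_storage_destination_nodes(file_name, off)   # step S103
        ]
        for f in futures:
            f.result()                   # wait for the completion responses (step S109)
    # Step S111: a completion response to the file storage request is then returned to the client.


store_file("example.dat", b"x" * (10 * 1024 * 1024))
```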
According to
Next, the distributed file system control program P11 calculates the storage destination node that stores target data (the file) (step S203). A process for calculating the storage destination node for target data is similar to step S103 in
Next, the distributed file system control program P11, referring to the node management table T11, confirms whether or not the target data storage destination node calculated in step S203 is in a normal state, and selects a storage destination node in the normal state (step S205). To give a description in detail, in a case where any storage destination node is in a failure state, the distributed file system control program P11 selects a storage destination node that is in the normal state. In addition, if all of the storage destination nodes are in the failure state, the distributed file system control program P11 returns an error to the client 40, and ends the file readout process.
Next, the distributed file system control program P11 requests the storage destination node selected in step S205 to read out the target data (step S207). Note that, in a case where the target data is divided into a plurality of chunks, the distributed file system control program P11 executes the processing in steps S205 and S207 for each chunk. In addition, readout requests for a plurality of chunks may be performed in parallel.
Note that, in a case where a file protocol processed by the file protocol client program P43 in the client 40 is a protocol that supports data distribution such as pNFS or a client-specific protocol, the processing in steps S201 to S207 is performed by the client 40.
Next, with respect to the data readout request in step S207, the distributed file system control program P11 in the storage destination node (FE-I/F 110) for the target data, upon receiving the data readout request, performs a process for reading out data from a region corresponding to the data (step S209). When processing for reading out the data completes, the distributed file system control program P11 returns the data that has been read out (readout data) to the request source for the data readout request (step S211).
Next, in the node (FE-I/F 110) that has received the file readout request from the client 40, the distributed file system control program P11 receives the readout data from the distributed file system control program P11 in the storage destination node, and returns the readout data to the file protocol server program P13 (step S213). Further, the file protocol server program P13 performs protocol processing and returns the readout data to the client 40 (step S213), and the file readout process ends.
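Similarly, for reference only, the flow of steps S201 to S213 may be sketched as follows. The data structures standing in for the node management table T11 and for the regions held by each node are illustrative assumptions and are not limiting.

```python
import hashlib

NODE_COUNT = 4                 # illustrative number of FE-I/Fs (nodes), two per controller
CHUNK_SIZE = 4 * 1024 * 1024   # illustrative chunk size

# Illustrative stand-ins: the chunk data held by each node, and each node's
# state as it would be recorded in the node management table T11.
node_regions: dict[int, dict[tuple[str, int], bytes]] = {n: {} for n in range(NODE_COUNT)}
node_state: dict[int, str] = {n: "normal" for n in range(NODE_COUNT)}


def calculate_storage_destination_nodes(file_name: str, chunk_offset: int) -> list[int]:
    """Same simplified stand-in as in the file storage sketch (step S203)."""
    h = int.from_bytes(hashlib.sha256(f"{file_name}:{chunk_offset}".encode()).digest()[:4], "big")
    primary = h % NODE_COUNT
    return [primary, (primary + NODE_COUNT // 2) % NODE_COUNT]


def read_chunk(file_name: str, chunk_offset: int) -> bytes:
    """Steps S203-S211 for one chunk: select a storage destination node that is
    in the normal state and request it to read out the data."""
    for node_id in calculate_storage_destination_nodes(file_name, chunk_offset):  # steps S203-S205
        if node_state[node_id] == "normal":
            return node_regions[node_id][(file_name, chunk_offset)]               # steps S207-S211
    raise IOError("all storage destination nodes are in the failure state")       # error returned to the client


def read_file(file_name: str, length: int) -> bytes:
    """Illustrative flow of the file readout process (steps S201-S213)."""
    # Readout requests for a plurality of chunks could also be issued in parallel.
    data = b"".join(read_chunk(file_name, off) for off in range(0, length, CHUNK_SIZE))
    return data   # step S213: the readout data is returned to the client
```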
According to
Next, the distributed file system control program P11 calculates another storage destination node for the target data (step S303). A calculation method may be similar to that in step S301, but processing in this step calculates a hash value after adding a value (for example, 1, 2, 3, . . . ), which is the same each time a storage destination node is calculated for the same data, to the file name and the chunk offset.
Next, the distributed file system control program P11, referring to the node management table T11, confirms whether or not the storage destination node obtained in step S301 and the storage destination node obtained in step S303 are connected to different controllers 100 (step S305). In a case where the two storage destination nodes are connected to different controllers 100 (YES in step S305), the process proceeds to step S307. In contrast, in a case where the two storage destination nodes are connected to the same controller 100 (NO in step S305), the value is changed (for example, if the value added to the offset in the immediately prior step S303 is 1, the value is changed to 2, etc.), step S303 is returned to, and a storage destination node is calculated again.
In step S307, the distributed file system control program P11 returns, to the call source node (FE-I/F 110), a target data storage destination node list resulting from converting the storage destination nodes calculated in step S301 and step S303 into a list, and ends the processing.
By virtue of the processing for calculating a target data storage destination node as above, it is possible to select nodes (FE-I/Fs 110) in a plurality of controllers 100 as target data storage destination nodes. Accordingly, it is possible to realize data protection that is across controllers 100. Further, a hash value is used for each item of data, whereby it is possible to distributively dispose data to respective nodes connected to the same controller 100.
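As an aid to understanding, one possible concrete form of the calculation in steps S301 to S307 (for the case of duplicative data protection) is sketched below. The choice of hash function, the manner in which the added value is combined with the file name and the chunk offset, and the representation of the node management table T11 as a dictionary are illustrative assumptions and are not limiting.

```python
import hashlib


def node_hash(file_name: str, chunk_offset: int, added_value: int = 0) -> int:
    """Hash over the file name and the chunk offset (plus an added value)."""
    digest = hashlib.sha256(f"{file_name}:{chunk_offset + added_value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def calculate_storage_destination_nodes(
    file_name: str,
    chunk_offset: int,
    node_to_controller: dict[int, int],   # from the node management table T11: node id -> controller id
) -> list[int]:
    """Illustrative form of steps S301-S307 for duplicative data protection."""
    node_ids = sorted(node_to_controller)
    # Step S301: determine the first storage destination node from the hash value.
    first = node_ids[node_hash(file_name, chunk_offset) % len(node_ids)]
    # Steps S303-S305: recalculate with an added value 1, 2, 3, ... until a node
    # connected to a different controller 100 is obtained (this assumes that
    # nodes of at least two controllers are registered in the table).
    added_value = 1
    while True:
        second = node_ids[node_hash(file_name, chunk_offset, added_value) % len(node_ids)]
        if node_to_controller[second] != node_to_controller[first]:
            break
        added_value += 1
    # Step S307: return the target data storage destination node list to the call source.
    return [first, second]


# Example: nodes #0 and #1 are connected to controller #0, nodes #2 and #3 to controller #1.
print(calculate_storage_destination_nodes("example.dat", 0, {0: 0, 1: 0, 2: 1, 3: 1}))
```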
Note that, although description is given by taking duplicative data protection as illustrated in
Regarding which controller 100 a calculated storage destination node is connected to, the target data storage destination node calculation method illustrated in
Note that the processing in
According to
Next, the distributed file system control program P11 calculates another storage destination controller for the target data (step S403). A calculation method may be similar to that in step S401, but processing in this step calculates a hash value after adding a value (for example, 1, 2, 3, . . . ), which is the same each time a storage destination controller is calculated for the same data, to the file name and the chunk offset.
Next, the distributed file system control program P11 confirms whether or not the controller 100 obtained in step S401 and the controller 100 obtained in step S403 are different controllers (step S405). In a case where the two controllers 100 are different (YES in step S405), step S407 is advanced to. In contrast, in a case where the two controllers 100 are the same controller (NO in step S405), the value is changed (for example, if the value added to the offset in the immediately prior step S403 is 1, the value is changed to 2, etc.), step S403 is returned to, and a storage destination controller is calculated again.
In step S407, the distributed file system control program P11 calculates a target data storage destination node in each controller 100 obtained as a storage destination controller. Specifically, the distributed file system control program P11 determines, on the basis of the file name and the chunk offset, a logical volume in an FE-I/F 110 for storing the target data. For example, a hash value for the file name and the chunk offset is calculated, and the logical volume in the FE-I/F 110 is determined on the basis of this hash value. By performing such processing, it is possible to distributively dispose target data at nodes within controllers 100. Note that there is no limitation to using a file name and a chunk offset; for example, a determination may alternatively be made on the basis of information such as a file mode number and a chunk serial number. In addition, it may be that a storage destination node is calculated for only one storage destination controller, and a result of this calculation is applied to each storage destination controller.
Finally, the distributed file system control program P11 returns, to the call source node (FE-I/F 110), a target data storage destination node list resulting from converting the storage destination nodes calculated in step S407 into a list (step S409), and ends the processing.
By virtue of the processing for calculating a target data storage destination node as above, it is possible to select nodes (FE-I/Fs 110) in a plurality of controllers 100 as target data storage destination nodes. Accordingly, it is possible to realize data protection that is across controllers 100. Further, a hash value is used for each item of data, whereby it is possible to distributively dispose data to respective nodes connected to the same controller 100.
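Likewise, one possible concrete form of the calculation in steps S401 to S409 is sketched below for reference only. The hash function and the representation of the correspondence between controllers 100 and nodes are illustrative assumptions and are not limiting.

```python
import hashlib


def data_hash(file_name: str, chunk_offset: int, added_value: int = 0) -> int:
    """Hash over the file name and the chunk offset (plus an added value)."""
    digest = hashlib.sha256(f"{file_name}:{chunk_offset + added_value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def calculate_storage_destination_nodes(
    file_name: str,
    chunk_offset: int,
    controller_to_nodes: dict[int, list[int]],   # controller id -> ids of the FE-I/Fs connected to it
) -> list[int]:
    """Illustrative form of steps S401-S409: controllers are selected first,
    and then a node is selected within each selected controller."""
    controller_ids = sorted(controller_to_nodes)
    # Step S401: determine the first storage destination controller from the hash value.
    first_ctl = controller_ids[data_hash(file_name, chunk_offset) % len(controller_ids)]
    # Steps S403-S405: recalculate with an added value 1, 2, 3, ... until a
    # different controller 100 is obtained (at least two controllers are assumed).
    added_value = 1
    while True:
        second_ctl = controller_ids[data_hash(file_name, chunk_offset, added_value) % len(controller_ids)]
        if second_ctl != first_ctl:
            break
        added_value += 1
    # Step S407: within each storage destination controller, determine the node
    # (the logical volume in an FE-I/F 110) on the basis of the hash value.
    destinations = []
    for ctl in (first_ctl, second_ctl):
        nodes = controller_to_nodes[ctl]
        destinations.append(nodes[data_hash(file_name, chunk_offset) % len(nodes)])
    # Step S409: return the target data storage destination node list to the call source.
    return destinations


# Example: controller #0 has nodes #0 and #1, controller #1 has nodes #2 and #3.
print(calculate_storage_destination_nodes("example.dat", 0, {0: [0, 1], 1: [2, 3]}))
```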
Regarding which controller 100 a calculated storage destination node is connected to, the target data storage destination node calculation method illustrated in
As described above, in the unified storage 1 according to the present embodiment, each of the plurality of controllers 100 is equipped with one or more channel adapters (specifically, FE-I/Fs 110 each having a CPU 113), each having a processor that, after accepting an access request, performs transmission and reception to and from a processor (CPU 130) in the controller 100. The processors in the plurality of channel adapters cooperate to cause a distributed file system to operate, and a channel adapter requests two or more controllers 100 to distributively store data, written as a file, to the plurality of controllers 100. To describe in further detail, the distributed file system recognizes the plurality of controllers 100 and distributively stores a plurality of items of data, corresponding to a file for which a write request has been made, to a plurality of volumes corresponding to the plurality of controllers 100 so as to achieve redundancy across the controllers 100, and thus availability for this data can be maintained. By increasing the number of channel adapters, mounted in the controllers 100, on which the distributed file system operates, it is possible to scale out file performance. Accordingly, the unified storage 1 according to the present embodiment can scale out file processing without adding a server such as a NAS head, and can suppress the costs of scaling out.
In other words, in the unified storage 1 according to the present embodiment, the distributed file system recognizes each controller 100 and distributively stores a plurality of items of data, corresponding to a file for which a write request has been made, to a plurality of volumes corresponding to the plurality of controllers 100 so as to achieve redundancy between the controllers 100, whereby all data can be accessed even when a failure has occurred in any controller 100 or channel adapter (FE-I/F 110).
Accordingly, by virtue of the unified storage 1 according to the present embodiment, it is possible to maintain availability and scale out file performance, while suppressing costs.
Note that, as functionality pertaining to the block storage in the unified storage 1 according to the present embodiment, in a case of having received an access request for block data, a channel adapter (FE-I/F 110) transfers the access request to the block storage control program P1 without going through the distributed file system. This is similar in other embodiments described below.
In a unified storage 1A according to a second embodiment of the present invention, in addition to the configuration of the unified storage 1 according to the first embodiment, each controller 100 has a management hardware (management H/W) 160. A distributed file system control program P61 operates in the management H/W 160 of any one controller 100. However, this distributed file system control program P61 differs from the distributed file system control program P11 in the first embodiment in that it performs only processing pertaining to the majority logic in the distributed file system and does not perform a file storage process or the like. The management H/W 160 on which the distributed file system control program P61 is not operating confirms whether or not a failure has arisen in the management H/W 160 of the other controller 100 and, when a failure has occurred, performs a failover process for the distributed file system control program P61.
As illustrated in
Comparing the configuration in
Each management H/W 160 is hardware for management and, in addition to a CPU, a memory, and the like, is provided with a user interface in accordance with a GUI, a CLI, or the like, and provides functionality for a user or an operator to control or monitor the unified storage 1A. In addition, each management H/W 160 executes processing pertaining to the majority logic in the distributed file system, and performs a failover process for the distributed file system control program P61 when a failure has occurred in the management H/W 160 mounted in (connected to) a controller 100 different from its own.
The network I/F 161 is an interface device for communicating with an external device such as a client 40 or with a distributed file system control program P11 in an FE-I/F 110. Note that communication with an external device such as a client 40 and communication with an FE-I/F 110 may be performed by different network I/Fs. The internal I/F 162 is an interface device for communicating with the block storage control program P1. The internal I/F 162 is, for example, connected by PCIe to, inter alia, a CPU 130 in a controller 100.
The CPU 163 is a processor that performs operation control of the management H/W 160. The memory 164 temporarily stores programs and data used for the operation control by the CPU 163. The memory 164 stores a distributed file system control program P61, a storage management program P63, and a failure management program P65.
The distributed file system control program P61 differs from the distributed file system control program P11 in the FE-I/Fs 110 in the first embodiment in that it performs only processing pertaining to the majority logic in the distributed file system 2 and does not perform a file storage process or the like. Operation of the distributed file system that includes, inter alia, a file storage process is realized by the distributed file system control program P11 in the FE-I/Fs 110, similarly to the first embodiment. Note that information requiring persistence is not limited to being stored in the memory 164, and may be stored in a logical volume provided by the block storage.
The storage management program P63 is provided with a user interface in accordance with a GUI, a CLI, or the like, and provides functionality for a user or an operator to control or monitor the unified storage 1A.
The failure management program P65 is executed by a management H/W 160 on which the distributed file system control program P61 is not operating. The failure management program P65 confirms whether or not a failure has arisen in the management H/W 160 for another controller and, when a failure has occurred, performs a failover process for the distributed file system control program P61.
In other words, at a normal time when a failure has not occurred, the management H/W 160 (for example, the #0 management H/W 160) in one controller 100 (a first controller) performs processing pertaining to the majority logic by executing the distributed file system control program P61, and the management H/W 160 (for example, the #1 management H/W 160) in the other controller 100 (a second controller) executes the failure management program P65 to monitor for the occurrence of a failure in the #0 management H/W 160. In a case where a failure has occurred in the #0 management H/W 160 (which may also be understood as a failure of the node or the controller connected to this management H/W 160), the failure management program P65 in the #1 management H/W 160 performs a failover process for the distributed file system control program P61 in the #0 management H/W 160.
Note that, in the case for
The storage device 165 stores, inter alia, an operating system or management information for the management H/W 160.
According to
In step S505, the failure management program P65 performs a failover for the distributed file system control program P61 in the other management H/W 160. At this time, if necessary for the failover, the failure management program P65 may mount a logical volume used by the distributed file system control program P61 to thereby refer to data.
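For reference, the behavior of the failure management program P65 described above may be sketched, for example, as follows. The failure detection mechanism, the monitoring interval, the correspondence to the individual step numbers other than step S505, and the function names are illustrative assumptions and are not limiting.

```python
import time


def other_management_hw_has_failed() -> bool:
    """Stand-in for confirming whether a failure has arisen in the management
    H/W 160 of the other controller 100 (for example, by a health check)."""
    return False


def mount_logical_volume_used_by_p61() -> None:
    """Stand-in for mounting, if necessary for the failover, the logical volume
    that the distributed file system control program P61 has used, so that its
    data can be referred to."""


def start_distributed_fs_control_program_p61() -> None:
    """Stand-in for starting the distributed file system control program P61
    (the majority logic processing) on the local management H/W 160."""


def failure_management_loop(check_interval_sec: float = 5.0) -> None:
    """Illustrative behavior of the failure management program P65 on the
    management H/W 160 on which the program P61 is not operating."""
    while True:
        if other_management_hw_has_failed():
            # Step S505: fail over the distributed file system control program
            # P61 to this management H/W 160.
            mount_logical_volume_used_by_p61()
            start_distributed_fs_control_program_p61()
            return
        time.sleep(check_interval_sec)
```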
By a failover process as above being performed, the unified storage 1A according to the second embodiment can maintain the majority logic for the distributed file system even after a failure has occurred for a controller 100, and can continue processing for the distributed file system and reading and writing of data using the majority logic.
In addition, besides the management H/W 160, the unified storage 1A has a configuration similar to that of the unified storage 1 according to the first embodiment, and various types of processes such as a file storage process and a file readout process are executed by processing procedures similar to those in the first embodiment. Accordingly, the unified storage 1A according to the second embodiment can achieve effects similar to those of the unified storage 1 according to the first embodiment.
In a unified storage 1B according to a third embodiment of the present invention, similarly to the configuration of the unified storage 1 according to the first embodiment, each of a plurality of controllers 100 is equipped with one or more high-performance FE-I/Fs 170, and the CPUs and memories in the FE-I/Fs 170 are used to cause a distributed file system to operate. Further, as a difference from the first embodiment, the unified storage 1B according to the third embodiment performs failure monitoring among the FE-I/Fs 170 across controllers. In a case where a failure has occurred in any controller, a failover process is performed in which an FE-I/F 170 in a controller in which no failure has occurred takes over the processing of the FE-I/F 170 in the controller in which the failure has occurred and restores the data stored in the controller in which the failure has occurred, by using data held by the controller in which no failure has occurred, from among the data stored redundantly (for example, parity or the like) among the controllers. Note that the third embodiment has a configuration in which data protection is not performed at the file layer, but data protection may be performed by the block storage.
As illustrated in
In the case in
By such a failover process being performed, even after a failure has occurred for one controller 100, the unified storage 1B can continue to access data via an FE-I/F 170 in a corresponding other controller 100.
Similarly to the memory 114 in the FE-I/F 110, the memory 171 stores a distributed file system control program P11, a file protocol server program P13, a block protocol server program P15, and a node management table T11. Note that, as a point different from the first embodiment, it may be that the distributed file system control program P11 in the memory 171 only distributively disposes data without performing data protection that is across controllers 100.
As a program and data not in the memory 114 in the FE-I/F 110, the memory 171 stores a failure management program P17 and a failure pair management table T12.
The failure management program P17 identifies a failure pair FE-I/F 170 (failure pair node) on the basis of the failure pair management table T12, and confirms whether or not a failure has arisen for the failure pair node on the basis of the node management table T11. In a case where a failure has arisen for the failure pair node, the failure management program P17 allocates, to its own node, the logical volume used by the distributed file system control program P11 in the failure pair FE-I/F 170, and enables access from the distributed file system control program P11 in its own node.
The failure pair management table T12 illustrated in
Specifically, for example, a case where a value for the node ID pair C21 is (0, 3) and a value for the controller ID pair C22 is (0, 1) means that the #0 FE-I/F 170 (a node) mounted to the #0 controller 100 and the #3 FE-I/F 170 (a node) mounted to the #1 controller 100 are joined in a failure pair.
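As an illustration only, the failure pair management table T12 may be represented, for example, as follows. The second table entry and the lookup function are assumptions added for the sake of the example and are not limiting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePairEntry:
    """One row of an illustrative failure pair management table (T12)."""
    node_id_pair: tuple[int, int]        # C21: the two FE-I/Fs (nodes) joined in a failure pair
    controller_id_pair: tuple[int, int]  # C22: the controllers 100 to which those nodes are connected


# The first entry matches the example in the text: node #0 on controller #0 is
# paired with node #3 on controller #1. The second entry is a further
# illustrative assumption for the remaining two nodes.
failure_pair_management_table = [
    FailurePairEntry((0, 3), (0, 1)),
    FailurePairEntry((1, 2), (0, 1)),
]


def failure_pair_node_of(own_node_id: int) -> int | None:
    """Return the failure pair node of the given node, if it is in the table."""
    for entry in failure_pair_management_table:
        if own_node_id in entry.node_id_pair:
            a, b = entry.node_id_pair
            return b if own_node_id == a else a
    return None
```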
According to
Next, the failure management program P17, referring to the failure pair management table T12, confirms whether or not the failure node for which information has been obtained in step S601 is a failure pair node for its own node (step S603). Step S605 is advanced to in a case where the failure node is the failure pair node (YES in step S603), and the process ends in a case where the failure node is not the failure pair node (NO in step S603).
In step S605, the failure management program P17 allocates, to its own node, the logical volume that the distributed file system control program P11 in the failure pair node has used. Allocation of a logical volume may be performed via the block protocol server program P15 or may be performed by making a direct request to the block storage control program P1 stored in the memory 140 of a controller 100, using the internal I/F 112.
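For reference only, the flow of steps S601 to S605 may be sketched as follows. The representation of the failure pair management table T12 as a dictionary, the callable standing in for the logical volume allocation, and the example values are illustrative assumptions and are not limiting.

```python
from typing import Callable


def failover_on_node_failure(
    own_node_id: int,
    failure_node_id: int,                   # the node for which failure information was obtained (step S601)
    failure_pair_table: dict[int, int],     # T12 reduced to: node id -> its failure pair node id
    allocate_logical_volume: Callable[[int, int], None],
) -> None:
    """Illustrative flow of steps S601-S605 executed by the failure management
    program P17 in an FE-I/F 170 of a controller in which no failure has occurred."""
    # Step S603: confirm whether the failure node is this node's failure pair node.
    if failure_pair_table.get(own_node_id) != failure_node_id:
        return                              # not this node's failure pair: end the process
    # Step S605: allocate, to this node, the logical volume that the distributed
    # file system control program P11 in the failure pair node has used; this may
    # be done via the block protocol server program P15 or by a direct request to
    # the block storage control program P1 through the internal I/F 112.
    allocate_logical_volume(failure_node_id, own_node_id)


# Example: node #0 on controller #0 takes over from its failure pair node #3 on controller #1.
failover_on_node_failure(
    0, 3, {0: 3, 3: 0, 1: 2, 2: 1},
    lambda owner, new_owner: print(f"allocate the logical volume of node {owner} to node {new_owner}"),
)
```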
A file storage process and a file readout process in the unified storage 1B according to the third embodiment have similar processing procedures to those illustrated by
Note that, although the above-described unified storage 1B is described as not protecting data across controllers 100, as a variation of the third embodiment, it may be that the unified storage 1B employs a redundant configuration similar to that in the first embodiment and protects data across controllers 100.
The unified storage 1B according to the third embodiment as described above has a configuration that does not perform data protection across controllers 100 (in other words, data protection among logical volumes on different controllers), but can cause a distributed file system to operate by using the plurality of channel adapters (FE-I/Fs 170) mounted in the respective controllers 100. Further, in the unified storage 1B according to the third embodiment, the occurrence of a failure in a failure pair node is monitored, in accordance with execution of the failure management program P17 in the channel adapters, through a combination of the channel adapters (FE-I/Fs 170) that goes across controllers 100. In a case where a failure has occurred in a controller 100 that includes a channel adapter and a processor (CPU 130), a failover can be realized by a channel adapter in a controller 100 in which no failure has occurred taking over the processing of the channel adapter in the controller 100 in which the failure has occurred and restoring, on the basis of redundancy, the data stored in the controller 100 in which the failure has occurred, to continue processing. Accordingly, by virtue of the unified storage 1B according to the third embodiment, the effect of making it possible to maintain availability of the storage system and scale out file performance while suppressing cost increases due to adding servers is achieved.
In a unified storage 1C according to a fourth embodiment of the present invention, similarly to the configuration of the unified storage 1 according to the first embodiment, each of a plurality of controllers 100 is equipped with one or more high-performance FE-I/Fs 180. As a difference from the first through third embodiments, the CPUs and memories in the FE-I/Fs 180 are used to cause local file systems 3 to operate in the unified storage 1C.
In addition, similarly to the unified storage 1B according to the third embodiment, the unified storage 1C performs failure monitoring among the FE-I/Fs 180 across controllers. In a case where a failure has occurred in any controller, a failover process is performed in which an FE-I/F 180 in a controller in which no failure has occurred takes over the processing of the FE-I/F 180 in the controller in which the failure has occurred and restores the data stored in the controller in which the failure has occurred, by using data held by the controller in which no failure has occurred, from among the data stored redundantly (for example, parity or the like) among the controllers. Note that the fourth embodiment protects data by the block storage and does not protect data at the file layer.
As illustrated in
The file system program P12, by being executed by the CPU 113, provides a local file system (a file system 3) to the file protocol server program P13. The file system program P12 stores data in a logical volume allocated to itself. Storage of data to a logical volume may be performed via the block protocol server program P15 or may be performed by directly communicating with the block storage control program P1 (refer to
The failure management program P17 identifies a failure pair FE-I/F 180 (failure pair node) on the basis of the failure pair management table T12, and confirms whether or not a failure has arisen for the failure pair node on the basis of the node management table T11. In a case where a failure has arisen for the failure pair node, the failure management program P17 allocates, to its own node, the logical volume used by the file system program P12 in the failure pair FE-I/F 180, and enables access from the file system program P12 in its own node.
According to
Next, the failure management program P17, referring to the failure pair management table T12, confirms whether or not the failure node for which information has been obtained in step S701 is a failure pair node for its own node (step S703). The configuration of the failure pair management table T12 is as exemplified in
In step S705, the failure management program P17 allocates, to its own node, the logical volume that the file system program P12 in the failure pair node has used. Allocation of a logical volume may be performed via the block protocol server program P15 or may be performed by making a direct request to the block storage control program P1 stored in the memory 140 of a controller 100, using the internal I/F 112.
Next, the failure management program P17 mounts a file system from the logical volume allocated in step S705, and provides the file system 3 to the file protocol server program P13 (step S707).
The failure management program P17 assigns a virtual internet protocol (IP) address that the failure pair node has used to its own node (step S709), and the process ends.
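For reference only, the flow of steps S701 to S709 may be sketched as follows. The representation of the failure pair management table T12 as a dictionary and the callables standing in for the individual steps are illustrative assumptions and are not limiting.

```python
from typing import Callable


def failover_on_node_failure(
    own_node_id: int,
    failure_node_id: int,                   # the node for which failure information was obtained (step S701)
    failure_pair_table: dict[int, int],     # T12 reduced to: node id -> its failure pair node id
    allocate_logical_volume: Callable[[int, int], None],
    mount_and_provide_file_system: Callable[[int], None],
    take_over_virtual_ip: Callable[[int, int], None],
) -> None:
    """Illustrative flow of steps S701-S709 executed by the failure management
    program P17 in an FE-I/F 180 of a controller in which no failure has occurred."""
    # Step S703: confirm whether the failure node is this node's failure pair node.
    if failure_pair_table.get(own_node_id) != failure_node_id:
        return                              # not this node's failure pair: end the process
    # Step S705: allocate, to this node, the logical volume that the file system
    # program P12 in the failure pair node has used.
    allocate_logical_volume(failure_node_id, own_node_id)
    # Step S707: mount the file system from the allocated logical volume and
    # provide the file system 3 to the file protocol server program P13.
    mount_and_provide_file_system(own_node_id)
    # Step S709: assign to this node the virtual IP address that the failure
    # pair node has used, so that clients can continue to access the data.
    take_over_virtual_ip(failure_node_id, own_node_id)
```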
By the failover process as above being executed, even after a failure has occurred in a controller 100, the unified storage 1C according to the fourth embodiment can continue to access data via the failure pair node belonging to a corresponding controller (refer to the controller ID pair C22 and the node ID pair C21 in
Foreign Application Priority Data: Japanese Patent Application No. 2022-155471, filed September 2022 (JP, national).
References Cited:
U.S. Pat. No. 8,117,387 B2, Matsuki et al., February 2012.
U.S. Pat. No. 8,156,293 B2, Shitomi et al., April 2012.
U.S. Pat. No. 8,356,072 B1, Chakraborty, January 2013.
U.S. Patent Application Publication No. 2008/0244030 A1, Leitheiser, October 2008.
U.S. Patent Application Publication No. 2023/0208439 A1, Hong, June 2023.
Publication: US 2024/0104064 A1, March 2024, United States.