This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2012-258254, filed on Nov. 27, 2012, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are directed to a control system, an abnormality diagnosis method of a control system, and a computer-readable recording medium having stored therein an abnormality diagnosis program of a control system.
There is a data storage system which includes a plurality of input/output controllers (IOCs) that function as initiators giving commands to a plurality of storage devices. An IOC included in such a data storage system is also referred to as a serial attached small computer system interface controller (SAS controller). A data storage system is known that has a function of, when an abnormality of any of the IOCs is detected, separating the IOC in which the abnormality is detected.
Here, the abnormality of the IOC which is recognized by the data storage system is as follows:
(1) when the IOC reports an abnormal status as a SAS controller,
(2) when the IOC does not respond,
(3) when an error regarding a SAS path occurs while accessing the plurality of storage devices which are controlled by the IOC, and
(4) when an abnormality of data as a storage system such as misalignment of a data integrity field is detected.
When the data storage system detects any of the above-mentioned abnormalities, it is difficult to discriminate whether the abnormality is caused by an error in the hardware of the IOC, by an abnormal operation of the firmware of the IOC, or by factors other than the IOC.
For example, it is known that when the abnormality of the IOC is detected, a chip of the IOC is reset. After resetting the chip, when the IOC normally operates, the data storage system determines that the detected abnormality is not caused by the error in the hardware of the IOC and continuously uses the IOC. In contrast, after resetting the chip, when the IOC does not normally operate or an abnormality of the IOC is detected again, the data storage system separates the IOC from the system.
However, in the technique of the related art, when the abnormality detected by the data storage system is caused by a hardware error of the IOC, the abnormality may occur again after the chip is reset. Further, even when the abnormality detected by the data storage system is caused by a factor other than the IOC, if the abnormality is detected again after the chip is reset, the IOC is separated although the IOC itself is not abnormal. Further, even when the abnormality detected by the data storage system is caused by a partial error of the hardware of the IOC, the whole IOC is undesirably separated as the abnormal portion.
A control system includes at least two controllers configured to serve as initiators to control a control target device, the control system including: a confirmation unit configured to operate one of the two controllers as an initiator and the other controller as a target to confirm statuses of the two controllers; and a validation unit configured to operate an abnormal controller which is confirmed by the confirmation unit as a target and a normal controller as an initiator and to perform a data access process on the target to validate a function of the abnormal controller.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Hereinafter, embodiments of a control system, an abnormality diagnosis method of the control system, and a computer-readable recording medium having stored therein an abnormality diagnosis program of the control system will be described with reference to the drawings. However, the embodiments described below are merely illustrative and are not intended to exclude various modification examples or technologies which are not described in the embodiments. That is, the embodiments may be modified in various manners (combination of embodiments and modification examples) without departing from the gist of the invention.
Further, each drawing does not intend to include only components described in the drawings but may further include other functions.
[A-1] System Configuration
A control system (storage system) 1 according to the embodiment, as illustrated in
Hereinafter, as a reference numeral indicating the storage device, when it is required to specify one of the plurality of storage devices, reference numerals 30-1 to 30-m are used. But, when any of storage devices is indicated, a reference numeral 30 is used.
The CM 10 and the expander 20 are connected through phys 50a-1 to 50a-4, and 50b-1 to 50b-4 as physical wiring lines (physical links). Further, the expander 20 and the storage device 30 are connected to each other through a phy 50c. Further, the CM 10 and the host device 40 are connected through a phy 50d.
The host device 40 is, for example, a computer (information processing device) having a function as a server. Even though one host device 40 is provided in an example illustrated in
The expander 20 relays between the CM 10 and the storage device 30 and transmits data based on an input/output (I/O) of the host device 40. In other words, the CM 10 accesses the storage device 30 provided in the storage system 1 through the expander 20.
The expander 20, as illustrated in
The wide port 21-1 is a port which is connected to a wide port 121-1 of the CM 10, which will be described below, through a plurality (four in this embodiment) of phys 50a-1 to 50a-4. Hereinafter, as reference numerals which indicate phys connecting between the wide ports 121-1 and 21-1, when it is required to specify one of the plurality of phys, reference numerals 50a-1 to 50a-4 are used, but when an arbitrary phy is indicated, a reference numeral 50a is used.
The wide port 21-2 is a port which is connected to a wide port 121-2 of the CM 10, which will be described below, through a plurality (four in this embodiment) of phys 50b-1 to 50b-4. Hereinafter, as reference numerals which indicate phys connecting between the wide ports 121-2 and 21-2, when it is required to specify one of the plurality of phys, reference numerals 50b-1 to 50b-4 are used, but when an arbitrary phy is indicated, a reference numeral 50b is used.
In other words, in the wide ports 21-1 and 21-2, the same number (four in this embodiment) of the ports as the number of the phys 50a and 50b are provided and phys 50a and 50b are connected to the ports one to one. That is, the wide ports 21-1 and 21-2 are provided so as to correspond to the phys 50a and 50b. Further, the same number of wide ports 21 as the number of IOC 12-1 and IOC 12-2 of the CM 10, which will be described below, is provided (two in the embodiment).
The storage device 30 is a storing device which stores data to be readable and for example, is a hard disk drive (HDD). In an example illustrated in
The CM 10 includes a central processing unit (CPU) 11, an IOC 12-1, an IOC 12-2, a memory 13, and a host adapter (HA) 14.
Hereinafter, the IOC 12-1 is referred to as an IOC #0 and the IOC 12-2 is referred to as an IOC #1 in some cases.
Further, hereinafter, when a specific IOC is indicated, the IOC is denoted as the “IOC 12-1”, the “IOC #0”, the “IOC 12-2”, or the “IOC #1”. However, when an arbitrary IOC is indicated, the IOC is denoted as an “IOC 12”.
The CPU 11, the IOC 12, the memory 13, and the HA 14 are connected through a peripheral component interconnect bus (PCI bus) BS so as to communicate with each other.
The HA 14 has a function that connects a local device (CM 10) and the host device 40 so that they can communicate with each other.
The IOC #0 and the IOC #1 include wide ports 121-1 and 121-2, respectively.
The wide port 121-1 is a port which is connected to the wide port 21-1 of the expander 20 through the phy 50a.
The wide port 121-2 is a port which is connected to the wide port 21-2 of the expander 20 through the phy 50b.
In other words, in the wide ports 121-1 and 121-2, the same number (four in this embodiment) of the ports as the number of the phys 50a and 50b are provided and phys 50a and 50b are connected to the ports one to one. That is, the wide ports 121-1 and 121-2 are provided so as to correspond to the phys 50a and 50b.
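The wide-port and phy layout described above can be pictured with a small data model. The following is only an illustrative sketch: the class names `Phy` and `WidePort` and the helper `make_wide_port` are hypothetical and do not appear in the embodiment; the reference numerals and the count of four phys per wide port are taken from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phy:
    """One physical link, e.g. phy 50b-1 (hypothetical model)."""
    name: str
    separated: bool = False  # set to True once the link has been cut off


@dataclass
class WidePort:
    """A wide port that bundles several phys, e.g. wide port 121-2."""
    name: str
    phys: List[Phy] = field(default_factory=list)


def make_wide_port(port_name: str, phy_prefix: str, count: int = 4) -> WidePort:
    """Build a wide port whose phys are numbered <prefix>-1 .. <prefix>-<count>."""
    return WidePort(port_name, [Phy(f"{phy_prefix}-{i}") for i in range(1, count + 1)])


# Wide ports of the IOC #0 and the IOC #1, four phys each as in the embodiment.
port_121_1 = make_wide_port("121-1", "50a")
port_121_2 = make_wide_port("121-2", "50b")
print([p.name for p in port_121_2.phys])
```

Keeping a per-phy `separated` flag mirrors the idea that individual phys, rather than the whole IOC, may be cut off later.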
In the embodiment, the IOC 12 has a function as an initiator and a function as a target.
Here, the function as an initiator is a function that the IOC 12 gives a command to another device (for example, the storage device 30 or another IOC 12). Further, the function as a target is a function that the IOC 12 receives a command from another device (for example, another IOC 12).
When there is an access request from the host device 40 to the storage device 30, the IOC #0 functions as an initiator and gives a command to read/write data to the storage device 30 through the phy 50a, the expander 20 and the phy 50c. Similarly, when there is an access request from the host device 40 to the storage device 30, the IOC #1 functions as an initiator and gives a command to read/write data to the storage device 30 through the phy 50b, the expander 20 and the phy 50c.
Further, by functions as a confirmation unit 111 and a validation unit 113 of the CPU 11 which will be described below, the IOC #0 issues an access command to access the memory 13 to the IOC #1 through the phy 50a, the expander 20, and the phy 50b. In this case, the IOC #0 functions as an initiator and the IOC #1 functions as a target. Similarly, by functions as the confirmation unit 111 and the validation unit 113 of the CPU 11 which will be described below, the IOC #1 issues an access command to access the memory 13 to the IOC #0 through the phy 50b, the expander 20, and the phy 50a. In this case, the IOC #1 functions as an initiator and the IOC #0 functions as a target.
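The route taken by an access command between the two IOCs can be written down as a short sketch. The function `access_path` and the table `PHY_GROUP` are hypothetical names introduced only for illustration; the hops themselves (phy 50a, expander 20, phy 50b) follow the description above.

```python
from typing import List

# Phy group attached to each IOC, as described in the text.
PHY_GROUP = {"IOC #0": "50a", "IOC #1": "50b"}


def access_path(initiator: str, target: str) -> List[str]:
    """Return the hops an access command traverses from initiator to target."""
    if initiator == target:
        raise ValueError("initiator and target must be different IOCs")
    return [
        f"phy {PHY_GROUP[initiator]}",  # link on the initiator side
        "expander 20",                  # relay between the two wide ports
        f"phy {PHY_GROUP[target]}",     # link on the target side
    ]


print(access_path("IOC #0", "IOC #1"))  # ['phy 50a', 'expander 20', 'phy 50b']
print(access_path("IOC #1", "IOC #0"))  # ['phy 50b', 'expander 20', 'phy 50a']
```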
Further, even though two IOCs 12 are provided in an example illustrated in
The memory 13 is a recording device including a read only memory (ROM) and a random access memory (RAM). An operating system (OS), a software program related to an abnormality diagnosis of a control system (an abnormality diagnosis program of the control system) or data for the program is written in the ROM of the memory 13. The software program on the memory 13 is appropriately read in the CPU 11 so as to be executed. Further, the RAM of the memory 13 is used as a primary recording memory or a working memory.
In an example of the embodiment, the memory 13 includes a work area which is not illustrated and when the abnormality diagnosis of the IOC 12 is performed, the IOC 12 reads out data in the work area.
The CPU 11 is a processing device which performs various control or operation and executes OS or a program stored in the memory 13 to implement various functions. That is, the CPU 11, as illustrated in
Therefore, the CPU 11 executes the abnormality diagnosis program of the control system to function as the confirmation unit 111, the cut-off processing unit 112, and the validation unit 113.
Further, a program (the abnormality diagnosis program of the control system) which implements the function as the confirmation unit 111, the cut-off processing unit 112, and the validation unit 113 is provided so as to be recorded in a computer-readable recording medium such as a flexible disk, a CD (a CD-ROM, a CD-R, or a CD-RW), a DVD (a DVD-ROM, a DVD-RAM, a DVD-R, a DVD+R, a DVD-RW, a DVD+RW, or an HD DVD), a Blu-ray Disc, a magnetic disk, an optical disk, or a magneto-optical disk. Therefore, the computer reads out the program from the recording medium and transmits the program to an internal recording device or an external recording device so as to be recorded therein. Alternatively, the program is recorded in the recording device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk and provided to the computer from the recording device through the communication path.
When the function as the confirmation unit 111, the cut-off processing unit 112, and the validation unit 113 is implemented, the program stored in the internal recording device (the memory 13 in the embodiment) is executed by a microprocessor (a CPU 11 in the embodiment) of the computer. In this case, the program recorded in the recording medium may be read out by the computer to be executed.
Further, in the embodiment, the computer is a concept including a hardware and an OS and refers to a hardware which operates under the control of the OS. When the OS is not necessary and the application program operates the hardware by itself, the hardware itself corresponds to the computer. The hardware at least includes a microprocessor such as the CPU 11 and a unit which reads out a computer program recorded in the recording medium. In the embodiment, the CM 10 and the host device 40 have a function as a computer.
The confirmation unit 111 causes one of the IOCs 12 to operate as an initiator and another IOC 12 to operate as a target to confirm whether the IOCs 12 normally operate. The confirmation of the operation of the IOC 12 by the confirmation unit 111 uses a known method and the detailed description will be omitted.
Here, the abnormality of the IOC 12 which is recognized by the confirmation unit 111 is as follows:
(1) when the IOC reports an abnormal status as a SAS controller,
(2) when the IOC does not respond,
(3) when an error regarding a SAS path occurs while accessing the plurality of storage devices which are controlled by the IOC, and
(4) when an abnormality of data as a storage system such as misalignment of a data integrity field is detected.
The cut-off processing unit 112 temporarily separates the IOC 12 which is confirmed by the confirmation unit 111 to be abnormal from the storage system 1. Further, the cut-off processing unit 112 separates the IOC 12 or the phys 50a and 50b indicated by the validation unit 113. The cut-off processing is implemented by various known methods and detailed description thereof will be omitted.
The validation unit 113 validates whether the abnormality of the IOC 12 which is confirmed by the confirmation unit 111 is an abnormality of the IOC 12 itself or an abnormality of any of phys 50a (or 50b) connected to the IOC 12.
[A-2] Example of Validation Method of Abnormal Portion
In
In
First, the confirmation unit 111 confirms the abnormality of the IOC #1.
The cut-off processing unit 112 temporarily separates the IOC #1 which is confirmed by the confirmation unit 111 to be abnormal from the storage system 1.
The validation unit 113 validates whether the abnormality of the IOC #1 which is confirmed by the confirmation unit 111 is an abnormality of the IOC #1 itself or an abnormality of any of phys 50b connected to the IOC #1. The validation unit 113, first, as illustrated in
First, as illustrated by an arrow A of
Here, as any one of phys 50a, for example, the phy 50a which is not used for the access request to the storage device 30 from the host device 40 is desirably selected. Here, for example, the IOC #0 accesses the memory 13 through the phy 50a-1.
Specifically, the validation unit 113 causes the IOC #0 to access the data stored in the memory 13 with respect to the IOC #1 through the phy 50a-1, the expander 20, and the phy 50b-1. The validation unit 113 performs the data access by sequentially changing the phys 50b-1, 50b-2, 50b-3, and 50b-4 to validate all phys 50b. That is, the validation unit 113 next causes the IOC #0 to access the data stored in the memory 13 with respect to the IOC #1 through the phy 50a-1, the expander 20, and the phy 50b-2. Further, the validation unit 113 causes the IOC #0 to access the data stored in the memory 13 with respect to the IOC #1 through the phy 50a-1, the expander 20, and the phy 50b-3. Finally, the validation unit 113 causes the IOC #0 to access the data stored in the memory 13 with respect to the IOC #1 through the phy 50a-1, the expander 20, and the phy 50b-4.
As described above, the validation unit 113 causes the IOC #0 to access the data stored in the memory 13 while sequentially changing all (four in an example of the embodiment) phys 50b in the abnormal IOC #1. Further, the order of the phys 50b used when the IOC #0 accesses the memory 13 is not limited to the above-mentioned order, but, for example, the order of phys 50b-4, 50b-3, 50b-2, and 50b-1 may be used.
In the example illustrated in
When the abnormal phy 50b-1 of the common function is specified, the cut-off processing unit 112 separates the phy 50b-1 from the wide port 21-2 of the expander 20.
Further, when the common function of the IOC #1 itself is abnormal (for example, when the hardware of the IOC #1 is abnormal), the IOC #0 may not access the memory 13 through any of the phys 50b. Accordingly, the validation unit 113 recognizes that all phys 50b are abnormal. The cut-off processing unit 112 separates all phys 50b from the wide port 21-2 of the expander 20.
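The validation of the common function described above amounts to a loop over the phys of the abnormal IOC. The following is a minimal sketch under stated assumptions: the function `validate_target_function` and the callback `try_access` are hypothetical stand-ins for the data access that the IOC #0 performs on the memory 13, and the simulated fault on the phy 50b-1 mirrors the example in the text rather than any real measurement.

```python
from typing import Callable, Dict, List


def validate_target_function(
    abnormal_phys: List[str],
    validation_phy: str,
    try_access: Callable[[str, str], bool],
) -> Dict[str, bool]:
    """Return {phy: ok} for each phy of the abnormal IOC, tested as a target."""
    results: Dict[str, bool] = {}
    for phy in abnormal_phys:
        # The normal IOC (initiator) reaches the memory 13 through
        # validation_phy -> expander 20 -> phy of the abnormal IOC (target).
        results[phy] = try_access(validation_phy, phy)
    return results


def fake_access(initiator_phy: str, target_phy: str) -> bool:
    """Simulated access (assumption): only the phy 50b-1 fails as a target."""
    return target_phy != "50b-1"


results = validate_target_function(
    ["50b-1", "50b-2", "50b-3", "50b-4"], "50a-1", fake_access
)
to_separate = [phy for phy, ok in results.items() if not ok]
print(to_separate)  # ['50b-1']
```

If every access fails, `to_separate` contains all four phys, which corresponds to the case where the common function of the IOC #1 itself is abnormal.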
Next, as illustrated by an arrow B of
Here, similarly to the validation processing of the common portion of the abnormal IOC described above, for example, the IOC #1 accesses the memory 13 through the phy 50a-1.
Specifically, the validation unit 113 causes the IOC #1 to access the data stored in the memory 13 with respect to the IOC #0 through the phy 50b-2, the expander 20, and the phy 50a-1. The validation unit 113 performs the data access by sequentially changing the phys 50b-2, 50b-3, and 50b-4 to validate all phys 50b excluding the phy 50b-1 which is separated in the validation processing on the common portion of the abnormal IOC. That is, the validation unit 113 causes the IOC #1 to access the data stored in the memory 13 with respect to the IOC #0 through the phy 50b-3, the expander 20, and the phy 50a-1. The validation unit 113 causes the IOC #1 to access the data stored in the memory 13 with respect to the IOC #0 through the phy 50b-4, the expander 20, and the phy 50a-1.
As described above, the validation unit 113 causes the IOC #1 to access the data in the memory 13 while sequentially changing all (three in this example) phys 50b in the abnormal IOC #1 side excluding the phy 50b-1 which is separated in the validation processing of the common portion of the abnormal IOC. Further, the order of the phys 50b used when the IOC #1 accesses the memory 13 is not limited to the above-mentioned order, but, for example, the order of phys 50b-4, 50b-3, and 50b-2 may be used.
In the example illustrated in
When the abnormal phy 50b-2 of the initiator function is specified, the cut-off processing unit 112 separates the phy 50b-2 from the wide port 121-2 of the IOC #1.
Further, when the initiator function of the IOC #1 itself is abnormal (for example, when the hardware of the IOC #1 is abnormal), the IOC #1 cannot access the memory 13 through any of the phys 50b. Accordingly, the validation unit 113 recognizes that all phys 50b are abnormal. The cut-off processing unit 112 separates all phys 50b from the wide port 121-2 of the IOC #1.
By the above-described processing, the abnormal phys 50b-1 and 50b-2 are completely separated, and the cut-off processing unit 112 releases the temporary separation of the IOC #1 so that the IOC #1 is returned to the storage system 1.
Further, when all phys 50b are separated, the cut-off processing unit 112 does not release the temporary separation of the IOC #1.
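The validation of the initiator function and the subsequent release decision can be sketched as follows. The names `validate_initiator_function`, `can_release`, and `try_access` are hypothetical; the simulated fault on the phy 50b-2 follows the example in the text, and the set `{"50b-1"}` stands for the phy already separated during the validation of the common function.

```python
from typing import Callable, List, Set


def validate_initiator_function(
    abnormal_phys: List[str],
    already_separated: Set[str],
    validation_phy: str,
    try_access: Callable[[str, str], bool],
) -> Set[str]:
    """Return the phys of the abnormal IOC that fail when used as an initiator."""
    failed: Set[str] = set()
    for phy in abnormal_phys:
        if phy in already_separated:
            continue  # separated during the common-function validation
        # The abnormal IOC (initiator) reaches the memory 13 through
        # phy -> expander 20 -> validation_phy of the normal IOC (target).
        if not try_access(phy, validation_phy):
            failed.add(phy)
    return failed


def can_release(abnormal_phys: List[str], separated: Set[str]) -> bool:
    """The temporary separation is released only if at least one phy remains."""
    return any(phy not in separated for phy in abnormal_phys)


phys_50b = ["50b-1", "50b-2", "50b-3", "50b-4"]
separated = {"50b-1"}
separated |= validate_initiator_function(
    phys_50b, separated, "50a-1", lambda src, dst: src != "50b-2"
)
print(sorted(separated))                 # ['50b-1', '50b-2']
print(can_release(phys_50b, separated))  # True
```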
[A-3] Operation
An abnormality diagnosis process in the storage system 1 as an example of the embodiment configured as described above will be described with reference to the flowchart (steps A10 to A140) of
When abnormality occurs in the storage system 1, the confirmation unit 111 confirms whether the generated abnormality is related with the IOC 12 (step A10 of
When the generated abnormality is related with the IOC 12 (see YES route of step A10 of
The validation unit 113 performs the validation processing A of the common function of the abnormal IOC (steps A30 to A70 of
The validation unit 113 causes the normal IOC 12 to function as an initiator and the abnormal IOC 12 to function as a target and connects any one of phys (hereinafter, referred to as a phy for validation) in the normal IOC 12 to the abnormal IOC 12 (step A30 of
The validation unit 113 causes the normal IOC 12 to access the data stored in the work area of the memory 13 with respect to the abnormal IOC 12 through the phy for validation, the expander 20, and one of phys in the abnormal IOC 12. By doing this, the validation unit 113 checks the target function of the phy in the used abnormal IOC 12 (step A40 of
The validation unit 113 determines whether the check result is normal, that is, whether the memory 13 was accessed successfully (step A50 of
When the check result is normal (see Yes route of step A50 of
In contrast, when the check result is not normal (see No route of step A50 of
When not all phys of the abnormal IOC 12 have been validated (see No route of step A60 of
In the meantime, when all phys of the abnormal IOC 12 are completely validated (see Yes route of step A60 of
The validation unit 113 causes the abnormal IOC 12 to function as an initiator and the normal IOC 12 to function as a target and connects a phy for validation to the abnormal IOC 12 (step A80 of
The validation unit 113 causes the abnormal IOC 12 to access the data stored in the work area of the memory 13 with respect to the normal IOC 12 through one of phys in the abnormal IOC 12, the expander 20, and the phy for validation. By doing this, the validation unit 113 checks the initiator function of the phy in the used abnormal IOC 12 (step A90 of
The validation unit 113 determines whether the check result is normal, that is, whether the memory 13 was accessed successfully (step A100 of
When the check result is normal (see Yes route of step A100 of
In contrast, when the check result is not normal (see No route of step A100 of
When not all phys of the abnormal IOC 12, excluding the phys separated in step A70, have been validated (see No route of step A110 of
In contrast, when all phys in the abnormal IOC 12 except the phys separated in step A70 have been validated (see Yes route of step A110 of
As described above, the abnormality diagnosis processing in the storage system 1 is completed.
In the meantime, if the generated abnormality is not related with the IOC 12 (for example, the storage device 30 or the phy 50c is abnormal) (See No route of step A10 of
[A-4] Effect
As described above, according to the storage system 1 as the example of the embodiment, the IOC 12 in which the abnormality is detected may be efficiently separated from the system.
Further, the cut-off processing unit 112 may separate every abnormal phy and may avoid the separation of the overall IOC 12 in which the abnormality is detected so that a redundant system may be achieved.
Further, the validation unit 113 performs the validation processing of the initiator function after performing the validation processing of the common function of the IOC 12 in which the abnormality is detected so that it is possible to reduce the influence on the normal IOC 12.
Further, the validation unit 113 uses only one phy in the normal IOC 12 for the diagnosis processing so that it is possible to continuously perform the normal operation of the system during the diagnosis processing.
The disclosed technology is not limited to the embodiment described above and various modifications may be made in the invention without departing from the purpose of the embodiment. The configurations or processing of the embodiment may be selected if necessary or appropriately combined.
Hereinafter, in the drawings, reference numerals same as the previously described reference numerals indicate the same components denoted by the previously described reference numerals so that the description thereof will not be repeated.
In a storage system 1 as the first modification example of the embodiment, as illustrated in
The reset processing unit 114 resets the chip of the IOC 12 in which the abnormality is confirmed. Further, when the confirmation unit 111 confirms an abnormality related to the IOC 12, the reset processing unit 114 determines whether the chip of the IOC 12 in which the abnormality is confirmed has been reset in the past. When the reset has not been performed in the past, the reset processing unit 114 resets the chip. Further, the reset processing unit 114 determines whether the chip was successfully reset, that is, whether the IOC 12 in which the abnormality is confirmed has restarted.
For example, the reset processing unit 114 stores a log concerning whether the chip of the IOC 12 has been reset in the past in the memory 13 and determines whether to perform the reset processing referring to the log. The reset processing unit 114 may delete the log stored in the memory 13 when a predetermined period has elapsed.
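A reset log with expiry, as outlined above, might look like the following sketch. The class `ResetLog`, its methods, and the retention period are hypothetical; the embodiment only states that a log is stored in the memory 13 and may be deleted after a predetermined period. Time is passed in explicitly so that the behavior is easy to follow.

```python
from typing import Dict


class ResetLog:
    """Remembers when each IOC chip was last reset (hypothetical sketch)."""

    def __init__(self, retention_seconds: float) -> None:
        # Stand-in for the "predetermined period" after which the log is deleted.
        self.retention_seconds = retention_seconds
        self._reset_at: Dict[str, float] = {}

    def record_reset(self, ioc: str, now: float) -> None:
        """Store the time at which the chip of the given IOC was reset."""
        self._reset_at[ioc] = now

    def was_reset(self, ioc: str, now: float) -> bool:
        """Return True if a non-expired reset record exists for the IOC."""
        reset_time = self._reset_at.get(ioc)
        if reset_time is None:
            return False
        if now - reset_time >= self.retention_seconds:
            del self._reset_at[ioc]  # the record has expired, so delete it
            return False
        return True


log = ResetLog(retention_seconds=3600.0)
print(log.was_reset("IOC #1", now=0.0))     # False: never reset
log.record_reset("IOC #1", now=0.0)
print(log.was_reset("IOC #1", now=10.0))    # True: reset recently
print(log.was_reset("IOC #1", now=3600.0))  # False: record expired
```

Expiring the record means that an IOC which has run without trouble for a long time is given another chance to recover by a chip reset before the phy-by-phy validation is started.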
The abnormality diagnosis processing in the storage system 1 as the first modification example of the embodiment configured as described above will be described with reference to the flowchart illustrated in
When abnormality occurs in the storage system 1, the confirmation unit 111 confirms whether the generated abnormality is related with the IOC 12 (step B10). For example, the determination is implemented by determining which one of the abnormalities (1) to (4) of the IOC indicates the abnormality, referring to an error log.
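The determination in step B10 can be pictured as a lookup from error-log entries to the abnormalities (1) to (4). The following is a sketch under loud assumptions: the log entry layout (a dictionary with a `code` key) and every code string are invented for illustration, since the embodiment does not specify the format of the error log.

```python
from enum import Enum
from typing import Optional


class IocAbnormality(Enum):
    ABNORMAL_STATUS = 1   # (1) the IOC reports an abnormal status as a SAS controller
    NO_RESPONSE = 2       # (2) the IOC does not respond
    SAS_PATH_ERROR = 3    # (3) SAS path error while accessing the storage devices
    DATA_ABNORMALITY = 4  # (4) data abnormality such as a data integrity field problem


# Hypothetical error-log codes; real firmware would define its own.
LOG_CODE_TO_ABNORMALITY = {
    "IOC_STATUS": IocAbnormality.ABNORMAL_STATUS,
    "IOC_TIMEOUT": IocAbnormality.NO_RESPONSE,
    "SAS_PATH": IocAbnormality.SAS_PATH_ERROR,
    "DIF_ERROR": IocAbnormality.DATA_ABNORMALITY,
}


def classify(entry: dict) -> Optional[IocAbnormality]:
    """Return the matching abnormality, or None if the entry is not IOC-related."""
    return LOG_CODE_TO_ABNORMALITY.get(entry.get("code"))


def is_ioc_related(entry: dict) -> bool:
    """Yes route of step B10 when the entry matches one of (1) to (4)."""
    return classify(entry) is not None


print(classify({"code": "IOC_TIMEOUT"}))      # IocAbnormality.NO_RESPONSE
print(is_ioc_related({"code": "DISK_FAIL"}))  # False
```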
When the generated abnormality is related with the IOC 12 (see Yes route of step B10), the reset processing unit 114 determines whether the chip of the IOC 12 in which the abnormality is generated has been reset in the past (step B20).
When the chip of the IOC 12 in which the abnormality is generated has not been reset in the past (No route of step B20), the reset processing unit 114 resets the chip of the IOC 12 in which the abnormality is generated (step B30).
The reset processing unit 114 determines whether the chip was successfully reset, that is, whether the IOC 12 in which the abnormality is generated has restarted (step B40).
When the chip was successfully reset (see Yes route of step B40), the abnormality diagnosis processing in the storage system 1 is completed.
In contrast, when the chip was not successfully reset (see No route of step B40), the cut-off processing unit 112 separates the IOC 12 in which the abnormality is generated from the storage system 1 (step B50) and completes the abnormality diagnosis processing in the storage system 1.
Further, when the chip of the IOC 12 in which the abnormality is generated has been reset in the past (see Yes route of step B20), the cut-off processing unit 112 temporarily separates the IOC 12 in which the abnormality is generated from the storage system 1 (step B60).
The validation unit 113 performs the validation processing A (see steps A30 to A70 of
The validation unit 113 performs the validation processing B (see steps A80 to A120 of
The cut-off processing unit 112 releases the temporary separation of the abnormal IOC 12 to return the abnormal IOC 12 to the storage system 1 (step B90) and completes the abnormality diagnosis processing in the storage system 1. Further, when all phys of the abnormal IOC 12 are separated, the cut-off processing unit 112 does not release the temporary separation of the abnormal IOC 12.
In the meantime, when the generated abnormality is not related with the IOC 12 (for example, the storage device 30 or the phy 50c is abnormal) (see No route of step B10), the CPU 11 or an operator performs general abnormality processing by an existing method (step B100) and then the abnormality diagnosis processing in the storage system 1 is completed.
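The branching of steps B10 to B60 and B100 can be summarized as a small decision function. This is only a sketch: the enum `Action`, the function `decide`, and the callback `reset_chip` are hypothetical names, and the callback simply reports whether the chip restarted after the reset.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    GENERAL_PROCESSING = "general abnormality processing (step B100)"
    DONE_AFTER_RESET = "finish, because the chip reset succeeded"
    SEPARATE_IOC = "separate the IOC from the system (step B50)"
    VALIDATE = "temporarily separate the IOC and validate it (from step B60)"


def decide(ioc_related: bool, reset_before: bool, reset_chip: Callable[[], bool]) -> Action:
    """Pick the branch taken for one detected abnormality."""
    if not ioc_related:         # No route of step B10
        return Action.GENERAL_PROCESSING
    if reset_before:            # Yes route of step B20
        return Action.VALIDATE
    if reset_chip():            # steps B30 and B40: reset, then check the restart
        return Action.DONE_AFTER_RESET
    return Action.SEPARATE_IOC  # No route of step B40


print(decide(True, False, lambda: True).name)   # DONE_AFTER_RESET
print(decide(True, True, lambda: True).name)    # VALIDATE
```

Note that the chip is reset only on the branch where no earlier reset is recorded, so the slower phy-by-phy validation runs only when a reset has already been tried.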
As described above, according to the storage system 1 as the first modification example of the embodiment, the same operation and effect as the example of the embodiment described above may be obtained and the following effect may be also achieved.
The reset processing unit 114 confirms whether the chip of the IOC 12 has been reset and, when the chip has not been reset, the chip of the IOC 12 in which the abnormality is detected is reset before the validation unit 113 validates the abnormal portion, so that it takes less time to perform the abnormality diagnosis processing.
Hereinafter, in the drawings, reference numerals same as the previously described reference numerals indicate the same components as the reference numerals so that the description thereof will not be repeated.
In a storage system 1 as the second modification example of the embodiment, as illustrated in
The load confirmation unit 115 confirms whether the load of I/O is high, that is, whether the load of the normal IOC 12 which is used for normal operation of the storage system 1 is high. For example, the load confirmation unit 115 confirms whether the load of I/O is high based on whether the load exceeds a predetermined threshold value.
The redundancy determination unit 116 determines whether a predetermined number or more of phys in the normal IOC 12 are usable, that is, whether the number of phys in the normal IOC 12 which are not separated is equal to or larger than a predetermined number (for example, two).
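The two checks described above reduce to simple predicates. In the sketch below, the function names, the unit of the load value, and the threshold are hypothetical; only the comparison against a threshold and the example minimum of two usable phys come from the description.

```python
from typing import List


def io_load_is_high(current_load: float, threshold: float) -> bool:
    """The load is high when it exceeds the predetermined threshold value."""
    return current_load > threshold


def phys_are_redundant(phy_separated: List[bool], minimum_usable: int = 2) -> bool:
    """True when at least minimum_usable phys of the normal IOC are not separated."""
    usable = sum(1 for separated in phy_separated if not separated)
    return usable >= minimum_usable


# Load expressed as a fraction of capacity (an assumption for this example).
print(io_load_is_high(0.9, threshold=0.8))             # True
# One flag per phy of the normal IOC; True means the phy has been separated.
print(phys_are_redundant([False, False, True, True]))  # True: two usable phys
print(phys_are_redundant([False, True, True, True]))   # False: one usable phy
```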
The abnormality diagnosis processing in the storage system 1 as the second modification example of the embodiment configured as described above will be described with reference to the flowchart illustrated in
When abnormality occurs in the storage system 1, the confirmation unit 111 confirms whether the generated abnormality is related with the IOC 12 (step C10). For example, the determination is implemented by determining which one of the abnormalities (1) to (4) of the IOC indicates the abnormality, referring to an error log.
When the generated abnormality is related with the IOC 12 (see Yes route of step C10), the reset processing unit 114 determines whether the chip of the IOC 12 in which the abnormality is generated has been reset in the past (step C20).
When the chip of the IOC 12 in which the abnormality is generated has not been reset in the past (No route of step C20), the reset processing unit 114 resets the chip of the IOC 12 in which the abnormality is generated (step C30).
The reset processing unit 114 determines whether the chip was successfully reset, that is, whether the IOC 12 in which the abnormality is generated is restarted (step C40).
When the chip was successfully reset (see Yes route of step C40), the abnormality diagnosis processing in the storage system 1 is completed.
In contrast, when the chip was not successfully reset (see No route of step C40), the cut-off processing unit 112 separates the IOC 12 in which the abnormality is generated from the storage system 1 (step C50) and completes the abnormality diagnosis processing in the storage system 1.
Further, when the chip of the IOC 12 in which the abnormality is generated has been reset in the past (see Yes route of step C20), the cut-off processing unit 112 temporarily separates the IOC 12 in which the abnormality is generated from the storage system 1 (step C60).
The load confirmation unit 115 confirms whether the load of I/O is high (step C70).
When the load of I/O is not high (see No route of step C70), the redundancy determination unit 116 confirms whether a plurality of phys in the normal IOC 12 can be used (step C80).
When the plurality of phys of the normal IOC 12 can be used (see Yes route of step C80), the validation unit 113 performs the validation processing A (see steps A30 to A70 of
By doing this, only when the phys of the normal IOC 12 are redundant, the validation processing A of the common function of the abnormal IOC and the validation processing B of the initiator function of the abnormal IOC may be performed.
The validation unit 113 performs the validation processing B (see steps A80 to A120 of
The cut-off processing unit 112 releases the temporary separation of the abnormal IOC 12 to return the abnormal IOC 12 to the storage system 1 (step C110) and completes the abnormality diagnosis processing in the storage system 1. Further, when all phys of the abnormal IOC 12 are separated, the cut-off processing unit 112 does not release the temporary separation of the abnormal IOC 12.
Further, when the plurality of phys of the normal IOC 12 cannot be used (see No route of step C80), the processing proceeds to step C50.
In contrast, when the load of I/O is high (see Yes route of step C70), the load confirmation unit 115 returns to step C70 in order to be in a standby status until the load of I/O is lowered.
By doing this, the validation processing A of the common function of the abnormal IOC and the validation processing B of the initiator function of the abnormal IOC are not performed until the load of I/O is lowered.
In the meantime, when the generated abnormality is not related with the IOC 12 (for example, the storage device 30 or the phy 50c is abnormal) (see No route of step C10), the CPU 11 or an operator performs general abnormality processing by an existing method (step C120) and then the abnormality diagnosis processing in the storage system 1 is completed.
In the second modification example of the embodiment, either step C70 or step C80 may be omitted.
Further, in step C70, in case the load of I/O is not lowered even when a predetermined time has elapsed, the processing proceeds to step C50 and the cut-off processing unit 112 separates the abnormal IOC 12 from the storage system 1.
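The standby at step C70, the timeout, and the redundancy check at step C80 can be combined in a short sketch. The function names and the return strings are hypothetical, and a bounded number of polls stands in for the predetermined time; a real implementation would presumably wait between polls.

```python
from typing import Callable


def wait_for_low_load(load_is_high: Callable[[], bool], max_polls: int) -> bool:
    """Poll step C70; return True once the load is not high, False on timeout."""
    for _ in range(max_polls):
        if not load_is_high():
            return True
    return False


def next_step(load_is_high: Callable[[], bool], phys_redundant: bool, max_polls: int = 3) -> str:
    """Decide what follows the temporary separation of the abnormal IOC."""
    if not wait_for_low_load(load_is_high, max_polls):
        return "separate_ioc"  # load stayed high: proceed to step C50
    if not phys_redundant:
        return "separate_ioc"  # No route of step C80: proceed to step C50
    return "validate"          # run validation processing A and then B


# Simulated load readings (assumption): high, high, then low.
readings = iter([True, True, False])
print(next_step(lambda: next(readings), phys_redundant=True))  # validate
print(next_step(lambda: True, phys_redundant=True))            # separate_ioc
```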
As described above, according to the storage system 1 as the second modification example of the embodiment, the same operation and effect as the example of the embodiment described above may be obtained and the following effect may be also achieved.
The load confirmation unit 115 confirms the load of I/O and, when the load of I/O is not high, the validation unit 113 validates the abnormal portion, so that the task in progress is not interrupted.
Further, the redundancy determination unit 116 determines the redundancy of the phy. When the phy is redundant, the validation unit 113 validates the abnormal portion so that reliability may be improved.
The abnormality diagnosis method of the storage system 1 as the example of the embodiment or the modification examples of the embodiment described above may be achieved not only during the normal operation of the storage system 1, but also during the test of operation confirmation of the IOC 12 in a device manufacturing factory.
Further, after the abnormality diagnosis processing of the IOC 12 is performed, the cut-off processing unit 112 may separate the IOC 12 in which the abnormality is generated, as appropriate, depending on the load of I/O or the number of phys which may be used by the IOC 12.
According to the disclosed control system, it is possible to efficiently separate the IOC in which the abnormality is detected from the system.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2012-258254 | Nov 2012 | JP | national |