This invention relates to technology for controlling the power saving of disk devices that store file system data.
The need for a file service using a NAS (Network Attached Storage) system, which includes a file server for managing file systems and a disk array system for storing file system data, has been increasing rapidly in recent years. Accordingly, the amount of data stored in the NAS system has grown, and the number of disk devices in the disk array system included in the NAS system also tends to increase.
As the number of disk devices increases, the power consumed each time a disk device is spun up to operate also increases, and so does the power consumption of the entire disk array system. Therefore, in a large-capacity disk array system including a plurality of disk devices and configuring a RAID (Redundant Array of Inexpensive Disks) from the plurality of disk devices, it is particularly necessary to inhibit the power consumption of the entire disk array system.
In this case, by stopping the rotation of the disk devices that are not accessed by the host systems for a certain length of time, the power consumption of the entire disk array system can be inhibited.
However, if frequently accessed data and seldom accessed data are stored in the same disk device, this disk device will be frequently accessed, which lowers the frequency of stopping the rotation, and the power consumption of the entire disk array system cannot be inhibited.
Meanwhile, if the length of time from the last access to a disk device until its rotation is stopped is set short so that the rotation is stopped frequently, the disk device must be spun up (i.e. its rotation must be started) each time it is accessed, which may delay the response to the access.
Therefore, Patent Document 1 describes a system for saving power consumption while inhibiting the degradation of the response speed to accesses.
However, conventional technologies do not give sufficient consideration to inhibiting a time-out in the host system when the host system sends an access request to a disk device in the spin-off status.
That is, as in the conventional technologies, if the logical volumes are separated into a first logical volume, configured of disk devices whose rotation may be stopped, and a second logical volume for storing the data of files accessed with a higher frequency than a predetermined reference value, power consumption can be inhibited by stopping the rotation of the disk devices for accesses to the first logical volume, and degradation of the response time caused by spin up can be prevented for accesses to the second logical volume.
However, when consideration is given to reducing power consumption, for example, if all the disk devices whose access frequency has decreased are set to the spin off status, all of those disk devices may become the access target of an access request from the host system. In this case, if a response is returned to the host system only after the disk devices in the spin off status have been spun up and made accessible, the host system shifts to its time-out processing and the access request is suspended.
Therefore, it is desirable that, even if all the disk devices in the spin off status become access targets, a response be returned to the host system within a range in which a time-out does not occur in the host system.
This invention was devised in view of the above-mentioned problems of the conventional technologies. Thus, an object of this invention is to provide a storage system and an information processing method thereof capable of returning a response to the host system, within a range in which a time-out does not occur in the host system, even if a disk device in the power saving status is the access target of the host system.
In order to achieve the foregoing object, this invention is characterized in that a primary controller for processing access requests from a host system analyzes an access request from the host system and, if the access target designated in the access request includes a disk device in the power saving status, outputs to the disk controller a command for returning the disk device in the power saving status to the accessible status, and sends response information to the host system, in response to the access request, to the effect that the processing for accessing the access target is being performed.
This invention enables the inhibition of a time-out in the host system even if a disk device in the power saving status is the access target of the host system.
The embodiments of this invention are now described with reference to the attached drawings. Note that the embodiments described below are merely exemplary of the invention and that this invention is not limited to the specific embodiments described below.
This embodiment describes a case in which a file system service is provided by using a NAS controller, which includes a file server function, and a disk array system.
More specifically, this embodiment comprises a NAS controller which analyzes an access request from a client terminal and, if the access target specified by the access request includes a disk device in the power saving status, outputs to the disk controller of the disk array system a command for returning the disk device in the power saving status to the accessible status, and, in response to the access request, sends response information to the client terminal to the effect that the processing for accessing the access target is being performed until the disk device in the power saving status returns to the accessible status.
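The following is a minimal sketch of this control flow in Python, given only for illustration; the class and method names (NasControllerSketch, is_spun_off, request_spin_up, send_interim_response, and so on) are assumptions and do not appear in the embodiment.

```python
import time

class NasControllerSketch:
    """Illustrative only: the real NAS controller 101 is not implemented this way."""

    def __init__(self, disk_controller, client_session):
        self.disk_controller = disk_controller   # stands in for the disk controller of the disk array system
        self.client_session = client_session     # stands in for the connection to the client terminal

    def handle_access_request(self, target_folder):
        if self.disk_controller.is_spun_off(target_folder):
            # Ask the disk controller to return the disk devices to the accessible status.
            self.disk_controller.request_spin_up(target_folder)
            # Until spin-up completes, keep answering that the access is in progress
            # so that the client terminal does not shift to its time-out processing.
            while not self.disk_controller.is_ready(target_folder):
                self.client_session.send_interim_response(
                    "processing access to " + target_folder)
                time.sleep(50)   # delayed-response interval (assumed value)
        # Once the devices are accessible, perform the actual file or folder operation.
        result = self.disk_controller.access(target_folder)
        self.client_session.send_result(result)
```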
The NAS controller 101 includes a network I/O processing unit 102, a data list management unit 103, a disk control command unit 104, a file system control unit 106, a data processing unit 107, a metadata cache unit 108, and an I/F (interface control unit) 109.
The NAS controller 101 is configured as a primary controller for exchanging information with a disk array system 112 via a SAN 111 as well as exchanging information with a client terminal 150 of a host system via an external network 110, and performing the processing for the client terminal 150 and the disk array system 112 with reference to the access request from the client terminal 150.
The network I/O processing unit 102 exchanges data with the client terminal 150, the management terminal, and other components via the external network 110. The data list management unit 103 manages various types of information such as the file system configuration information 301, the volume management information 401, and the disk power saving control table 501, which are described later. The disk control command unit 104 includes a timer unit 105 for measuring time and outputs commands for starting or stopping the rotation of the disk devices to the disk array system 112. The file system control unit 106 performs file system related control, such as commands for changing files and directories to the WORM (Write Once Read Many) attribute. The data processing unit 107 performs data processing. The metadata cache unit 108 retains a copy of the metadata. The I/F 109 exchanges data with the I/F 113 included in the disk array system 112 via the communication path 111 such as a SAN (Storage Area Network).
In addition to the I/F (interface control unit) 113, the disk array system 112 includes the disk control unit 114 and a plurality of logical volumes (LU0 117, LU1 118, LU2 119, LU3 120, LU4 121, etc.). Note that the number of logical volumes is not limited to this example.
The disk control unit 114 includes a disk rotation management unit 115 and an I/O processing unit 116. The I/O processing unit 116 reads and writes data from and to the logical volumes, and the disk rotation management unit 115 controls the start, the stop, and the rotation frequency of the disks in the disk devices configuring the logical volumes.
In this case, when controlling the start, the stop, and the rotation frequency of the disks in the disk devices configuring the logical volumes, the disk rotation management unit 115 spins up a disk device until it becomes accessible (e.g. until the disk rotation frequency becomes higher than in the standby status), or spins off a disk device until it changes to the power saving status, in which power consumption is smaller than in the accessible status (for example, until the disks are stopped).
Note that, in this description, the expression "rotation of the disk device" means the rotation of the disks included in that disk device.
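As an illustrative note only, the statuses handled by the disk rotation management unit 115 can be pictured with the following hypothetical enumeration; the state names below are assumptions and do not appear in the embodiment.

```python
from enum import Enum

class DiskDeviceStatus(Enum):
    """Hypothetical status values mirroring the description above."""
    ACCESSIBLE = "ready"   # spun up; rotation frequency high enough to serve accesses
    STANDBY = "standby"    # rotating at a reduced frequency
    SPUN_OFF = "off"       # rotation stopped; power saving status

# The disk rotation management unit 115 moves a device between these statuses:
# spin up  : STANDBY / SPUN_OFF -> ACCESSIBLE
# spin off : ACCESSIBLE / STANDBY -> SPUN_OFF
```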
The NAS controller 101 includes a network interface 204, a CPU (Central Processing Unit) 205, a memory 206, a local disk 207, and an adapter 208, and these are connected so as to be communicable with each other via an internal communication path.
The disk array system 112 includes an interface 209, a cache memory 210, a disk controller 211, and a plurality of disk devices 212 connected with the disk controller 211, and the interface 209, the cache memory 210, and the disk controller 211 are connected so as to be communicable with each other via an internal communication path.
The NAS controller 101 and the disk array system 112 are connected via the SAN 111. The connection method between the NAS controller 101 and the disk array system 112 is not limited to the SAN 111, and a private line, a TCP/IP (Transmission Control Protocol/Internet Protocol) network, or the like may also be used.
The network interface 204 in the NAS controller 101 exchanges data with the external network 110. The local disk 207 stores various types of management data such as the programs to be executed by the NAS controller 101, file system configuration information 301, volume management information 401, a disk power saving control table 501, and others.
The memory 206 holds the various types of data and programs read from the local disk 207, as well as temporary data used for processing.
The CPU 205 executes the processing to be performed by the NAS controller 101. The adapter 208 exchanges data with the disk array system 112 via the SAN 111 or other networks.
Note that the data list management unit 103, the disk control command unit 104, the file system control unit 106, the data processing unit 107, and others are configured by the CPU 205 executing the programs read from the local disk 207 to the memory 206. Note that these programs may also be configured of one or more codes for causing the CPU 205 to perform various types of processing described later.
Furthermore, the metadata cache unit 108 described above is provided, for example, as an area in the memory 206.
The interface 209 in the disk array system 112 exchanges data with the NAS controller 101 via the SAN 111 or other networks. The disk controller 211 controls data read/write and rotation for each disk device 212. The cache memory 210 temporarily retains the data which is read from or written to each disk device 212. Each disk device 212 includes one or more disks whose stop, start, and rotation frequency are controlled by the disk rotation management unit 115 described above.
Note that a plurality of disk devices 212 are combined to configure a RAID group, and logical volumes are configured using a part of or the entire RAID group. The information on the correspondence of the disk devices 212 with the RAID group and the correspondence of the RAID group with the logical volumes belonging to this RAID group is managed in the memory included in the disk controller 211.
Furthermore, the disk controller 211 includes a memory and a CPU (not shown in the figure), and this memory stores an I/O processing program and a disk rotation management program to be executed by this CPU. The disk control unit 114 described above is configured by this CPU executing these programs.
The data type 302 is information showing the type of the data as the management target. For example, item #1 shows that the data type is inode information (inode information is a type of metadata, which is management data of the file system), and #2 shows that the data type is mount information, which is also a type of metadata. Furthermore, #3 shows that the data managed by #3 is data in the hierarchy of mnt/fs0/ or lower (i.e. the data of the file system identified by "fs0"). #m and #n show data types defined with reference to data attributes: #n shows data of the WORM (Write Once Read Many) attribute described later, while #m shows data given another type of attribute (other than the metadata and WORM attributes).
The storage destination LU 303 is information showing the LUs in the disk array system 112 in which the management target data is stored. The cache flag 304 shows whether the management target data should be cached in the metadata cache unit 108 of the NAS controller 101. Note that this embodiment shows that the management target data is cached in the metadata cache unit 108 if the cache flag 304 is “1” and that the data is not cached if the cache flag 304 is “0.” Furthermore, the NAS controller 101 does not have to include the metadata cache unit 108, and in this case, the file system configuration information 301 is not required to include the cache flag 304.
Note that the data types shown here are merely examples, and the data types to be managed are not limited thereto.
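As a sketch only, one row of the file system configuration information 301 could be modeled as follows; the field names and the example LU assignments are assumptions, not values taken from the embodiment.

```python
from dataclasses import dataclass

@dataclass
class FileSystemConfigEntry:
    """One row of the file system configuration information 301 (field names assumed)."""
    data_type: str    # data type 302, e.g. "inode", "mount", a directory subtree, or an attribute such as WORM
    storage_lu: str   # storage destination LU 303 in the disk array system 112
    cache_flag: int   # cache flag 304: 1 = cache in the metadata cache unit 108, 0 = do not cache

# Example rows loosely following the items described above (LU assignments are assumed).
fs_config_301 = [
    FileSystemConfigEntry(data_type="inode information", storage_lu="LU0", cache_flag=1),
    FileSystemConfigEntry(data_type="mount information", storage_lu="LU0", cache_flag=1),
    FileSystemConfigEntry(data_type="mnt/fs0/ and lower", storage_lu="LU1", cache_flag=0),
    FileSystemConfigEntry(data_type="WORM attribute data", storage_lu="LU2", cache_flag=0),
]
```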
Furthermore, if a plurality of file systems, each containing hierarchically arranged files, are built on a plurality of logical volumes and mounted, the NAS controller 101 can operate as a file server managing the plurality of file systems associated with a plurality of shared folders.
The LU number 402 is the identification information of logical volumes.
The associated LU number 403 is the identification information of the LUs (hereinafter referred to as associated LUs) sharing disk devices with the LU identified by the corresponding LU number 402. That is, an associated LU is an LU belonging to the same RAID group as a specific LU, and the identification information of the associated LUs for the LU identified by the corresponding LU number 402 is registered in the associated LU number 403. Therefore, for example, if LU1 118 belongs to the same RAID group as another LU, the identification information of that LU is registered as the associated LU number 403 for LU1 118.
Each time the relevant LU is accessed, the time measured by the timer unit 105 (i.e. the access time) is registered in the last access time 404. Therefore, the last access time 404 shows the time of the latest access to the relevant LU.
The continuous rotation flag 405 shows whether to perform the continuous rotation of the disk device configuring the relevant LU, and the setting to “1” shows that the continuous rotation of the disk device is performed regardless of the access frequency and the elapsed time after the last access time, while the setting to “0” shows that the rotation of the disk device might be stopped.
Note that, in this example, the continuous rotation flag 405 is set to "1" for the LUs storing metadata, which is expected to be accessed frequently.
As for LUs storing such metadata, if the rotation of the disk device configuring an LU is stopped, spin up is required each time it is accessed, which causes latency for the spin up. As a result, the response time performance to the access request is deteriorated. Therefore, as for LUs storing data, such as metadata, whose access frequency is predicted to be high in advance, deterioration in the response time performance to the access request can be prevented by setting the continuous rotation flag 405 to “1.”
The setting method of the continuous rotation flag 405 is not particularly limited; for example, an administrator may set it in advance for LUs that are expected to be accessed frequently.
If the value of the continuous rotation flag 405 is “0,” the stop latency 406 is set for the volume management information 401. The stop latency 406 is the value used for determining the timing to stop the rotation of the disk device and, when the length of time equal to the stop latency 406 elapses after the last access time 404, the processing of stopping the rotation of the disk device is performed. That is, if a longer period of time than this stop latency 406 elapses without any access to the relevant LU, the processing of stopping the rotation of the disk device configuring the relevant LU will be performed. If the rotation of the disk device is stopped, that time is set as the rotation stop time 407 for the volume management information 401.
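A sketch of the volume management information 401 and of the stop decision based on the stop latency 406 might look as follows; the field and function names are assumptions, and the times are simplified to numeric seconds.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VolumeManagementEntry:
    """One row of the volume management information 401 (field names assumed)."""
    lu_number: str                               # LU number 402
    associated_lus: List[str]                    # associated LU numbers 403 (LUs sharing the RAID group)
    last_access_time: float                      # last access time 404 (epoch seconds here for simplicity)
    continuous_rotation_flag: int                # continuous rotation flag 405: 1 = keep rotating, 0 = may stop
    stop_latency: Optional[float] = None         # stop latency 406 in seconds (meaningful only when the flag is 0)
    rotation_stop_time: Optional[float] = None   # rotation stop time 407, set when rotation is stopped

def should_stop_rotation(entry: VolumeManagementEntry, now: float) -> bool:
    # Rotation may be stopped only when continuous rotation is not requested and
    # the stop latency has elapsed since the last access.
    if entry.continuous_rotation_flag == 1 or entry.stop_latency is None:
        return False
    return (now - entry.last_access_time) >= entry.stop_latency
```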
The common resource name 502 is information for identifying the common resource names published to the external network 110; for example, shared folder names. The FS name 503 is information for identifying the file system configured of the disk devices. The attribute 504 shows whether a spin off entry has been made for the RG configuring the file system; for example, it shows "off" if the spin off entry is reserved, and "-" if it is not.
The RG number 505 is the number of the RAID group configuring the file system. The RG status 506 shows whether the disk device belonging to the RAID group is spun up or spun down. For example, it shows “off” if the disk device belonging to the RAID group is spun off as a result of the power being turned off, and “ready” if it is spun up and in the accessible status.
The tree information 507 is information on the storage destination of the tree information (e.g. the directories configuring the file system and the attribute information forming the file structure) when the RG status 506 is "off." The storage destination of the tree information is, for example, the metadata cache unit 108. The last update date and time 508 is the date and time (year, month, day, hour, minute, second) of the last access to the file system belonging to the RAID group. This last update date and time 508 is updated each time the file system is accessed.
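Likewise, one row of the disk power saving control table 501 could be sketched as follows, with assumed field names and with the last update date and time simplified to a numeric timestamp.

```python
from dataclasses import dataclass

@dataclass
class PowerSavingControlEntry:
    """One row of the disk power saving control table 501 (field names assumed)."""
    common_resource_name: str   # common resource name 502, e.g. a shared folder name
    fs_name: str                # FS name 503
    attribute: str              # attribute 504: "off" once the spin off entry is reserved, "-" otherwise
    rg_number: int              # RG number 505 of the RAID group configuring the file system
    rg_status: str              # RG status 506: "ready" if spun up, "off" if spun off
    tree_info: str              # tree information 507: storage destination of the saved tree information
    last_update: float          # last update date and time 508 (epoch seconds here for simplicity)
```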
Next, the processing performed in the NAS controller at the time of spin off is described below.
Firstly, in order to spin off the disk devices belonging to a file system, the NAS controller 101 refers to the last update date and time 508 in the disk power saving control table 501 at an arbitrarily set frequency (for example, once an hour), retrieves the last access time of the file system as the target of power saving control (S1), and determines whether the elapsed time since the last access is shorter than the limit value of the non-access time; for example, whether or not it is shorter than one hour (S2).
At step S2, if the elapsed time since the last access is determined to be over the limit value, the NAS controller 101 enters the spin off command for the RAID group belonging to the file system as the power saving target, and records "off" in the attribute 504 of the disk power saving control table 501 (S3).
Next, the NAS controller 101 saves the tree information of the file system corresponding with the relevant RAID group; for example, folder tree information, to the area which can be referred to even after the spin off; for example, in the metadata cache unit 108 (S4), records the storage destination of the folder tree information in the tree information 507 in the disk power saving control table 501 (S5), and proceeds to the processing of step S6.
Meanwhile, at step S2, if the elapsed time after the last access is determined to be shorter than the limit value of the non-access time, the NAS controller 101 extracts the name of the next target file system (S6), and determines whether or not the processing related to the extracted target file system has been performed (S7).
If the NAS controller 101 determines at step S7 that the processing related to the extracted target file system has not been performed, it returns to step S1 and repeats steps S1 to S7 for the next target file system; if it determines at step S7 that the processing for all the extracted file systems has been completed, it completes the processing of this routine.
Next, further processing performed in the NAS controller at the time of spin off is described below.
The NAS controller 101 starts this processing when the spin off command related to the RAID group belonging to the file system as the target of power saving is entered; it refers to the disk power saving control table 501 by the file system name, retrieves the information of the attribute 504 as the entry request of the target file system as well as the information of the RG number 505 (S11), and determines whether or not there are any other file systems configuring the same RAID group (S12).
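A sketch of this scan (steps S1 to S7) is shown below; it assumes the table record sketched earlier, the limit value of one hour used in the example above, and a hypothetical helper save_tree_information standing in for saving the folder tree information.

```python
import time

LIMIT_SECONDS = 3600  # limit value of the non-access time (one hour in the example above)

def save_tree_information(fs_name):
    # Placeholder: in the embodiment the folder tree information is saved to an
    # area that can still be referred to after spin off, e.g. the metadata cache unit 108.
    return "metadata_cache:/" + fs_name

def spin_off_scan(power_saving_table, now=None):
    """Sketch of steps S1 to S7: scan the target file systems and enter a
    spin off request for each one not accessed within the limit time."""
    now = now if now is not None else time.time()
    for entry in power_saving_table:                   # S1, S6, S7: iterate over the target file systems
        elapsed = now - entry.last_update              # S1: elapsed time since the last access
        if elapsed < LIMIT_SECONDS:                    # S2: accessed recently, so skip this file system
            continue
        entry.attribute = "off"                        # S3: record the spin off entry in the attribute 504
        entry.tree_info = save_tree_information(entry.fs_name)  # S4, S5: save and record the tree information
```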
Next, if the NAS controller 101 determines that there is another file system configuring the same RAID group, it refers to the attribute 504 of that other file system (S13). That is, if the same RAID group is shared by a plurality of file systems, the NAS controller 101 refers to the entry information of the spin off requests of the other file systems in order to ascertain whether the spin off command has been entered for them. In this case, if there are a plurality of other file systems, the corresponding number of pieces of information are referred to.
Subsequently, the NAS controller 101 determines whether all the other file systems have entered the spin off command (S14); if they have not, spin off is not performed, and the NAS controller 101 completes the processing of this routine. If it determines that all the entries have been made, it proceeds to the processing of step S15.
Meanwhile, if the NAS controller 101 determines that there is no other file system configuring the same RAID group, or once all the entries have been confirmed at step S14, it issues the spin off request for spinning off the target RAID group to the disk controller 211 by using a command of the disk storage (S15), and completes the processing of this routine.
In this case, when the spin off request is issued to the disk controller 211, the disk rotation management unit 115 of the disk controller 211, recognizing that the NAS controller 101 has ordered spin off, performs the power saving control for the disk devices 212 belonging to the target RAID group, and changes those disk devices 212 to the spin off status.
If the disk device 212 is changed to the spin off status and the rotation of the same is stopped, the information related to the disk device 212 in the spin off status is transferred from the disk control unit 114 to the NAS controller 101. By this processing, the data list management unit 103 of the NAS controller 101 updates the RG status 506 corresponding with the target RAID group in the disk power saving control table 501 from “ready” to “off.”
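The check performed in steps S11 to S15 can be sketched as follows; the function and parameter names are assumptions, and the final status update is simplified (in the embodiment the RG status is updated only after the disk controller reports that the rotation has actually stopped).

```python
def try_spin_off_raid_group(power_saving_table, fs_name, disk_controller):
    """Sketch of steps S11 to S15: spin off the target RAID group only when every
    file system sharing that group has entered the spin off request."""
    target = next(e for e in power_saving_table if e.fs_name == fs_name)      # S11
    others = [e for e in power_saving_table
              if e.rg_number == target.rg_number and e.fs_name != fs_name]    # S12, S13
    if any(e.attribute != "off" for e in others):                             # S14
        return  # some file system on the same RAID group has no spin off entry yet
    disk_controller.request_spin_off(target.rg_number)                        # S15
    # Simplification: the RG status 506 is updated here; in the embodiment it is
    # updated after the disk controller reports that the rotation has stopped.
    for e in [target] + others:
        e.rg_status = "off"
```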
Next, the processing performed in the NAS controller 101 at the time of spin up is described below.
Firstly, in response to the access request from the client terminal 150, the NAS controller 101 performs addressing for specifying the communication target as name resolution (S21), then performs the processing for making a TCP (Transmission Control Protocol) session (S22), and then performs the processing for making a NetBIOS (Network Basic Input/Output System) session, that is, the processing for making a communication path (S23).
Next, the NAS controller 101 performs the SMB (Server Message Block) negotiation (S24), performs the processing for making the SMB session (S25), and obtains the list of the shared folders (S26). That is, the NAS controller 101 obtains the list of the shared folders published on the external network 110.
Subsequently, the NAS controller 101 performs the processing for connecting the client terminal 150 with the shared folders (S27). In this processing, if the disk device 212 belonging to the file system specified by the access request from the client terminal 150 is in the spin off status, the NAS controller 101 performs at least two round-trip packet transmissions with the client terminal 150.
That is, since it takes several minutes for the disk device 212 in the spin off status to be spun up and return to the accessible status, during that time the NAS controller 101 creates response information, with reference to the tree information saved in the metadata cache unit 108, to the effect that the processing for making the disk device 212 in the spin off status accessible is being performed, and performs at least two packet transmissions in which the created response information is added to the command and returned to the client terminal 150.
For example, SMB_COM_TREE_CONNECT_ANDX and TRANS2_QUERY_PATH_INFORMATION are available as commands used for the two packet transmissions.
At this time, if each packet transmission is delayed by approximately fifty to fifty-five seconds by using quasi response information added to the command, the two packet transmissions can gain a total of one hundred to one hundred and ten seconds. Gaining this time enables the inhibition of the time-out processing by the client terminal 150.
That is, if the NAS controller 101 does not return any response to the client terminal 150 while performing the processing for connecting the client terminal 150 with the shared folders, the client terminal 150 shifts to the time-out processing for terminating the access to the NAS controller 101 once the time-out period has elapsed.
Therefore, while the disk device 212 in the spin off status is being spun up and returned to the accessible status, the NAS controller 101 performs at least two packet transmissions with the client terminal 150 so as to prevent the client terminal 150 from terminating the access to the NAS controller 101 due to the elapse of the time-out period.
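A sketch of this time-gaining behavior is shown below; the session and disk controller interfaces and the response names are assumptions and do not represent the actual SMB command handling.

```python
import time

RESPONSE_DELAY = 50  # roughly fifty to fifty-five seconds per packet, as described above

def hold_client_during_spin_up(session, disk_controller, rg_number, tree_info):
    """Sketch: while the RAID group spins up, answer the tree-connection sequence
    with delayed responses built from the saved tree information so the client
    terminal does not reach its time-out."""
    responses = ["tree_connect_response", "query_path_information_response"]  # at least two round trips
    for payload in responses:
        time.sleep(RESPONSE_DELAY)                    # delay each response to gain time
        session.send(payload, quasi_info=tree_info)   # reply using the cached tree information
        if disk_controller.is_ready(rg_number):       # spin up finished early, no further delay needed
            break
```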
Next, on condition that the disk device 212 in the spin off status has been spun up and changed to the accessible status, for example, on condition that it has received from the disk controller 211 the information that the disk device 212 in the spin off status has been spun up and changed to the accessible status, the NAS controller 101 stops sending the response information to the client terminal 150 and performs the processing for operating the file or the folder (S28). That is, at this step, the processing in which the disk control unit 114 actually accesses the disk device 212 belonging to the shared folder selected with reference to the operation information from the client terminal 150 is performed.
Next, the NAS controller 101, on condition that it has obtained the information that the access to the disk device 212 is completed from the disk control unit 114, disconnects the SMB session (S29), disconnects the TCP session (S30), and completes the processing of this routine.
Next, the processing performed in the NAS controller when receiving a shared folder connection request is described below.
Firstly, the NAS controller 101 starts this processing when it receives the shared folder connection request from the client terminal 150 via smbd (server message block daemon): it receives the tree connection request for a connection to the specified shared folder among the plurality of shared folders (S41), refers to the disk power saving control table 501 with reference to the received tree connection request, retrieves the information related to the file system name and the RG status 506 of the required shared folder (S42), and checks the RG status 506 with reference to the retrieved information (S43).
That is, at step S43, in order to determine whether or not the file system belonging to the required shared folder is in the spin up status, the NAS controller 101 determines whether "ready" exists in the RG status 506 corresponding with the file system belonging to the required shared folder. If the RG status 506 is determined to be "off" at step S43, the NAS controller 101, judging that the file system belonging to the required shared folder is in the spin off status, sets the packet transmission inhibiting flag (S44), enters the spin up command for spinning up the disk devices 212 in the target RAID group including the file system belonging to the required shared folder (S45), and proceeds to the processing of step S46.
Meanwhile, at step S43, if “ready” is determined to exist in the RG status 506 corresponding with the file system belonging to the required shared folder, the NAS controller 101 proceeds to the processing of step S46, and creates a response packet at step S46.
Next, the NAS controller 101 determines whether the packet transmission inhibiting flag is set for the required shared folder (S47) and, if it determines that the packet transmission inhibiting flag is set (Yes), enters the response packet into the sleep transmission queue (S48), and completes the processing of this routine.
Meanwhile, if it determines at step S47 that the packet transmission inhibiting flag is not set for the required shared folder (No), the NAS controller 101 enters the response packet into the transmission queue (S49), and completes the processing of this routine. Note that the sleep transmission queue automatically sends the response packet when the period of time set for the sleep timer (not shown in the figure) installed in the NAS controller 101 elapses.
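Steps S41 to S49 can be sketched as follows, again with assumed names; the request is treated as a simple dictionary and the response packet is a placeholder built from the saved tree information.

```python
def handle_tree_connect(request, power_saving_table, disk_controller,
                        transmission_queue, sleep_transmission_queue):
    """Sketch of steps S41 to S49 for a shared folder connection request."""
    entry = next(e for e in power_saving_table
                 if e.common_resource_name == request["shared_folder"])   # S41, S42
    inhibit = False
    if entry.rg_status == "off":                                          # S43: the RAID group is spun off
        inhibit = True                                                    # S44: set the inhibiting flag
        disk_controller.request_spin_up(entry.rg_number)                  # S45: enter the spin up command
    packet = {"shared_folder": entry.common_resource_name,
              "tree": entry.tree_info}                                    # S46: create the response packet
    if inhibit:                                                           # S47
        sleep_transmission_queue.append(packet)                           # S48: sent later by the sleep timer
    else:
        transmission_queue.append(packet)                                 # S49: sent immediately
```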
Next, further processing performed in the NAS controller at the time of spin up is described below.
Firstly, the NAS controller 101 starts the processing with reference to the file system name, issues a spin up request to the disk control unit 114 of the disk controller 211 for spinning up the disk device 212 belonging to the target RAID group by using the command of the disk storage (S51), and completes the processing of this routine.
In this case, when the disk control unit 114 receives the spin up request, it spins up the disk device 212 as the target of the spin up, brings the rotation frequency to the specified value, and changes the device to the accessible status. Subsequently, the disk control unit 114 sends the information that the spin up of the disk device 212 is completed to the NAS controller 101.
By this means, the data list management unit 103 of the NAS controller 101 updates, from "off" to "ready," the RG status 506 in the disk power saving control table 501 corresponding with the RG including the disk device 212 as the target of the spin up. Furthermore, the NAS controller 101 reconnects the response packet from the sleep transmission queue to the transmission queue, and immediately switches to the processing for sending the response packet.
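A sketch of this completion handling is shown below, with assumed names; it updates the RG status and drains the sleep transmission queue into the transmission queue.

```python
def on_spin_up_complete(rg_number, power_saving_table,
                        sleep_transmission_queue, transmission_queue):
    """Sketch: when the disk controller reports that spin up is complete, mark the
    RAID group as ready and move the delayed response packets to the transmission queue."""
    for entry in power_saving_table:
        if entry.rg_number == rg_number:
            entry.rg_status = "ready"                 # RG status 506: "off" -> "ready"
    while sleep_transmission_queue:                   # reconnect packets from the sleep transmission queue
        transmission_queue.append(sleep_transmission_queue.pop(0))
```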
In this embodiment, the NAS controller 101 analyzes the access request from the client terminal 150, outputs to the disk controller 211 the command (request) for spinning up the disk devices in the spin off status if the access target includes shared folders corresponding with disk devices in the spin off status, and sends to the client terminal 150, by at least two packet transmissions, response information to the effect that the processing for the connection with the shared folder as the access target is being performed in response to the access request; it is therefore possible to inhibit the client terminal 150 from shifting to the time-out processing.
According to this embodiment, even if the disk device in the power saving mode is the access target of the client terminal 150, time-out by the client terminal 150 can be inhibited.
Furthermore, this embodiment achieves the power saving control over the NAS system and a broader application of the power saving function of the disk array system.