This invention relates to an information-processing system, and specifically to management of the lives of rewritable nonvolatile memories.
In a rewritable nonvolatile memory, the number of times that writing can be performed is limited. Patent Document 1 discloses, for a memory device including a rewritable nonvolatile memory, a technology of averaging the numbers of times of writing in the respective physical blocks of the nonvolatile memory. This technology of averaging the numbers of times of writing in the respective physical blocks of the rewritable nonvolatile memory is called wear leveling.
The inventors of the application have studied an information-processing system having a first information-processing unit that performs writing of information in a first memory device having a nonvolatile memory and a second information-processing unit that performs writing of information in a second memory device having a nonvolatile memory. They have found that, when the concept of wear leveling is applied to distribution of workloads to the respective information-processing units, the lives of the nonvolatile memories of the first information-processing unit and the second information-processing unit end at almost the same time, and continuous operation of the information-processing system is obstructed.
An information-processing system of the invention has a first information-processing unit that performs writing of information in a first memory device having a nonvolatile memory, a second information-processing unit that performs writing of information in a second memory device having a nonvolatile memory, a first counter that counts a number of times of writing in the first memory device, and a second counter that counts a number of times of writing in the second memory device. Assignment of workloads to the first information-processing unit and the second information-processing unit is performed based on a replacement time of the first memory device, a replacement time of the second memory device, output of the first counter, and output of the second counter. Thereby, the above-described problem is solved.
The memory devices including the nonvolatile memories may be replaced while some of the information-processing units are stopped and the other information-processing units continue running, and thereby, continuous operation of the information-processing system may be performed.
An example will be explained below using the drawings.
Each of the server devices 102 to 107 has a central processing unit (CPU) 110, a main memory device 111, a memory device 112 having a rewritable nonvolatile memory, a controller 113 of the memory device, and a network interface (I/F) 114 for connection to the network switch 108. In the example, the main memory device 111 includes a DRAM and the memory device 112 includes a NAND flash memory as the rewritable nonvolatile memory. Note that the invention may also be applied to an embodiment in which the memory device 112 includes a phase-change memory as the nonvolatile memory. The memory device controller 113 controls writing in the memory device 112 and reading from the memory device 112. Further, the memory device controller 113 has a counter 115 that counts the number of times of writing in the memory device 112 that it controls. The respective server devices 102 to 107 may be stopped independently, and the memory device 112 of a stopped server device may be replaced by a new memory device 112.
The server device 102 controls assignment of workloads within the information-processing system 101 as a scheduling node. Modules stored in the main memory device 111 of the server device 102 are shown in
The server devices 103 to 106 as calculation nodes read out the program 404 and the data 405 necessary for execution of the workloads from the storage device 109 according to the assignment instructions from the server device 102 as the scheduling node, and execute the assigned workloads.
The server device 107 as the test server device collects missing information with respect to a workload lacking at least some of the information of the amount of load of the workload, the execution time, and the number of times of writing in the memory device 112. Further, in the case of a workload with no entry in the workload information table 403, the server device 107 also adds a new entry to the workload information table 403. Modules stored in the main memory device 111 of the server device 107 are shown in
As below, an operation of the information-processing system 101 will be explained using
At step 201, the information collection module 301 of the server device 102 reads out the assignment-scheduled workload list 402, the workload information table 403, and the workload assignment table 406 from the storage device 109.
At step 202, the information collection module 301 of the server device 102 sends queries to the server devices 103 to 106 about presence or absence of workloads being executed, and the information update module 304 deletes the entries no longer being executed among the entries of the workload assignment table 406 based on the query results.
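The pruning at step 202 can be illustrated with a minimal sketch. The table layout (a list of dictionaries with `workload`, `server`, and `assigned_at` keys) and the function name `prune_finished` are assumptions made for illustration; the example does not specify how the workload assignment table 406 is stored or how the query results are returned.

```python
def prune_finished(assignment_table, running_by_server):
    """Keep only entries whose workload is still reported as running.

    assignment_table: list of dicts (hypothetical layout of table 406).
    running_by_server: dict mapping a server number to the set of workload
    names that the server reported as being executed (hypothetical format).
    """
    return [
        entry
        for entry in assignment_table
        if entry["workload"] in running_by_server.get(entry["server"], set())
    ]


table = [
    {"workload": "A", "server": 103, "assigned_at": "09:00"},
    {"workload": "B", "server": 104, "assigned_at": "10:30"},
]
# Server 103 still runs workload A; server 104 reports nothing running.
running = {103: {"A"}, 104: set()}
print(prune_finished(table, running))
```

Here only the entry for workload A survives, because server device 104 no longer reports workload B.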
At step 203, the server device 102 determines whether or not the workload assignment to be performed is the first assignment of the day. If the workload assignment is the first assignment of the day, the operation of the information-processing system 101 moves to step 204 and, if not, moves to step 209.
At step 204, the information collection module 301 of the server device 102 reads out the maintenance plan information 401 from the storage device 109, and collects the numbers of times of writing in the respective memory devices 112, i.e., the count values as output of the counters 115 from the server devices 103 to 106.
At step 205, the scheduling module 302 calculates, from the maintenance plan information 401 and the count values of the respective counters 115 obtained at step 204, the average number of times of writing per day with which each memory device 112 of the server devices 103 to 106 reaches the end of its life on its scheduled replacement date. The scheduling module 302 sets the calculated number as the scheduled remaining number of times of writing of the day for each memory device 112. Here, the life in the example is the maximum value of the number of times of writing set for each memory device 112, and may be a value with a margin for securing reliability.
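The calculation at step 205 can be sketched as follows. The function name `daily_write_budget`, its parameters, and the convention that the remaining days are counted from the current date up to (not including) the replacement date are assumptions; the example only states that the writes are averaged per day so that the life ends on the scheduled replacement date.

```python
from datetime import date


def daily_write_budget(life_writes, writes_so_far, today, replacement_date):
    """Average writes per day that use up the remaining life by the replacement date.

    life_writes: maximum number of writes set for the memory device (its life).
    writes_so_far: current count value of the counter.
    """
    days_left = (replacement_date - today).days  # assumed day-counting convention
    if days_left <= 0:
        raise ValueError("replacement date must be later than today")
    remaining = max(life_writes - writes_so_far, 0)
    return remaining / days_left


# 60,000 writes remain and 30 days are left until replacement.
budget = daily_write_budget(100_000, 40_000, date(2014, 1, 1), date(2014, 1, 31))
print(budget)  # 2000.0
```

A device whose replacement date is nearer, or whose counter is lower, receives a larger daily value, which is what lets the scheduler steer write-heavy workloads toward it.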
At step 206, the server device 102 checks whether or not there is a workload being continuously executed from the previous day in the server devices 103 to 106 based on the workload assignment table 406. If there is a workload, the operation of the information-processing system 101 moves to step 207 and, if not, moves to step 209.
At step 207, the scheduling module 302 of the server device 102 calculates the scheduled numbers of times of writing in the memory devices 112 of the day by the workloads being continuously executed from the previous day. The calculation is based on the assignment times of the workloads in the workload assignment table 406 and on the execution times and the numbers of times of writing in the memory device 112 per hour in the workload information table 403.
At step 208, the scheduling module 302 updates the scheduled remaining numbers of times of writing in the memory devices 112 of the day by subtracting the scheduled numbers of times of writing calculated at step 207 for the workloads being continuously executed from the previous day from the scheduled remaining numbers of times of writing set at step 205.
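Steps 207 and 208 can be sketched for a single carried-over workload. The function name `carried_over_writes`, the use of midnight as the start of the day, and the assumption that a workload writes at a constant hourly rate until its execution time elapses are all illustrative; the example does not fix these details.

```python
from datetime import datetime, timedelta


def carried_over_writes(assigned_at, execution_hours, writes_per_hour, day_start):
    """Writes expected today from a workload that started on a previous day."""
    finish = assigned_at + timedelta(hours=execution_hours)
    day_end = day_start + timedelta(days=1)
    # Only the part of the run between the start of today and its finish
    # (or the end of today, whichever comes first) counts toward today.
    hours_today = (min(finish, day_end) - day_start).total_seconds() / 3600.0
    return max(hours_today, 0.0) * writes_per_hour


day_start = datetime(2014, 1, 2, 0, 0)
writes = carried_over_writes(
    assigned_at=datetime(2014, 1, 1, 20, 0),  # assigned at 20:00 the previous day
    execution_hours=10,
    writes_per_hour=100,
    day_start=day_start,
)
remaining_today = 2000.0 - writes  # step 208: subtract from the value set at step 205
print(writes, remaining_today)  # 600.0 1400.0
```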
At step 209, the information collection module 301 of the server device 102 reads out the assignment-scheduled workload list 402 from the storage device 109, and collects workload status information from each of the server devices 103 to 106. Here, in the example, the workload status information is information containing the CPU utilization and the memory utilization of each of the server devices 103 to 106.
At step 210, the scheduling module 302 determines whether or not, among the workloads in the assignment-scheduled workload list 402 read out at step 209, there is a workload that has no entry in the workload information table 403 or whose entry lacks at least some of the information of the amount of load of the workload, the execution time, and the number of times of writing in the memory device 112. If there is a workload lacking the information, the operation of the information-processing system 101 moves to step 211 and, if there is no workload lacking the information, moves to step 212.
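The check at step 210 can be sketched as below. The dictionary layout of the workload information table 403 and the field names in `REQUIRED_FIELDS` are hypothetical; they stand for the amount of load (CPU and memory utilization), the execution time, and the number of times of writing per hour named in the example.

```python
# Hypothetical field names for an entry of the workload information table 403.
REQUIRED_FIELDS = (
    "cpu_utilization",
    "memory_utilization",
    "execution_hours",
    "writes_per_hour",
)


def lacks_information(name, info_table):
    """True if the workload has no entry or any required field is missing."""
    entry = info_table.get(name)
    if entry is None:
        return True
    return any(entry.get(field) is None for field in REQUIRED_FIELDS)


def split_scheduled(scheduled, info_table):
    """Separate workloads for the test server (step 211) from the rest (step 212)."""
    to_test = [w for w in scheduled if lacks_information(w, info_table)]
    ready = [w for w in scheduled if not lacks_information(w, info_table)]
    return to_test, ready


info = {
    "A": {"cpu_utilization": 0.3, "memory_utilization": 0.2,
          "execution_hours": 4, "writes_per_hour": 200},
    "B": {"cpu_utilization": 0.1, "memory_utilization": 0.1,
          "execution_hours": 2, "writes_per_hour": None},
}
print(split_scheduled(["A", "B", "C"], info))  # (['B', 'C'], ['A'])
```

Workload B has an entry but no write rate, and workload C has no entry at all, so both would go to the test server device.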
At step 211, the scheduling module 302 determines that the workload determined to lack the information at step 210 is to be assigned to the test server device 107. Further, the assignment instruction module 303 gives instructions of execution of the workload to the server device 107 as the test server device, and executes addition of the entry of the workload in the workload assignment table 406 and deletion of the entry of the workload from the assignment-scheduled workload list 402. The server device 107 acquires the program 404 and the data 405 for execution of the workload from the storage device 109 and executes the workload. In the examples shown in
At step 212, the scheduling module 302 of the server device 102 determines whether or not there is a workload assignable to the server devices 103 to 106 among the unassigned workloads in the assignment-scheduled workload list 402 read out at step 209. The determination of assignability is performed based on the value of the scheduled remaining number of times of writing in each memory device 112 and the workload information table 403. In the example, the determination compares the amount of load of the unassigned workload, i.e., its CPU utilization, memory utilization, execution time, and number of times of writing in the memory device 112 per hour, with the CPU utilization, the memory utilization, and the scheduled remaining number of times of writing in the memory device 112 of each server device.
If at least one server device of the server devices 103 to 106 has room in the workload status and there is an unassigned assignable workload, the operation of the information-processing system 101 moves to step 213. If none of the server devices 103 to 106 has room in the workload status, if there is no assignable workload among the unassigned workloads, or if there is no unassigned workload, the flow is executed again from step 201 after a fixed waiting time.
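One way to express the assignability determination of step 212 is sketched here. The `Workload` and `Server` classes, the representation of utilization as a fraction between 0 and 1, and the rule that the summed utilization must not exceed 1.0 are assumptions; the example names the quantities that are compared but not the thresholds.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu: float              # CPU utilization the workload adds (0..1, assumed)
    memory: float           # memory utilization the workload adds (0..1, assumed)
    execution_hours: float
    writes_per_hour: float


@dataclass
class Server:
    name: str
    cpu_used: float
    memory_used: float
    remaining_writes_today: float  # scheduled remaining number of writes of the day


def is_assignable(w, s, hours_left_today):
    """True if the server has room for the workload's load and writes today."""
    hours = min(w.execution_hours, hours_left_today)
    writes_today = hours * w.writes_per_hour
    return (
        s.cpu_used + w.cpu <= 1.0
        and s.memory_used + w.memory <= 1.0
        and writes_today <= s.remaining_writes_today
    )


server = Server("103", cpu_used=0.5, memory_used=0.4, remaining_writes_today=1000)
light = Workload("A", cpu=0.3, memory=0.2, execution_hours=4, writes_per_hour=200)
heavy = Workload("B", cpu=0.3, memory=0.2, execution_hours=4, writes_per_hour=400)
print(is_assignable(light, server, hours_left_today=6))  # True  (800 writes fit)
print(is_assignable(heavy, server, hours_left_today=6))  # False (1600 writes do not)
```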
At step 213, the scheduling module 302 of the server device 102 determines the assignment based on the workload information table 403. Among the assignable workloads determined at step 212, priority is given to the workload with the larger number of times of writing in the nonvolatile memory, i.e., in the example, the workload with the larger number of times of writing in the memory device 112 of the day, and that workload is assigned to the server device closer to the stop time, i.e., the server device having the memory device 112 closer to the replacement time.
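The priority rule of step 213 can be read as a greedy pairing, sketched below. This is one possible interpretation, not necessarily the procedure of the example: the function name `plan_assignments`, the dictionary keys, and the simplification that only the write budget is checked (CPU and memory room are omitted) are assumptions.

```python
from datetime import date


def plan_assignments(workloads, servers):
    """Pair write-heavy workloads with the servers nearest to replacement.

    workloads: dicts with 'name' and 'writes_today' (hypothetical keys).
    servers: dicts with 'name', 'replacement_date', 'remaining_writes_today'.
    """
    remaining = {s["name"]: s["remaining_writes_today"] for s in servers}
    # Largest number of writes first; earliest replacement date first.
    by_writes = sorted(workloads, key=lambda w: w["writes_today"], reverse=True)
    by_replacement = sorted(servers, key=lambda s: s["replacement_date"])
    plan = []
    for w in by_writes:
        for s in by_replacement:
            if w["writes_today"] <= remaining[s["name"]]:
                remaining[s["name"]] -= w["writes_today"]
                plan.append((w["name"], s["name"]))
                break
    return plan


servers = [
    {"name": 104, "replacement_date": date(2014, 3, 1), "remaining_writes_today": 1500},
    {"name": 103, "replacement_date": date(2014, 2, 1), "remaining_writes_today": 1000},
]
workloads = [
    {"name": "C", "writes_today": 300},
    {"name": "A", "writes_today": 800},
    {"name": "B", "writes_today": 500},
]
print(plan_assignments(workloads, servers))
# [('A', 103), ('B', 104), ('C', 104)]
```

Workload A, the heaviest writer, goes to server device 103, whose memory device is replaced first; B and C no longer fit in the 200 writes left on 103 and fall through to 104.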
The examples shown in
At step 214, the assignment instruction module 303 of the server device 102 gives instructions to start execution of workloads to the server devices 103 to 106 according to the assignment of the workloads determined at step 213. Further, the assignment instruction module 303 that has given the instructions of assignment of the workloads executes addition of the entries of the workloads in the workload assignment table 406 and deletion of the entries of the workloads from the assignment-scheduled workload list 402. The server devices 103 to 106 to which the instructions to start execution of workloads have been given read out the program 404 and the data 405 necessary for execution of the respective workloads from the storage device 109, store them in the main memory devices 111 and the memory devices 112 within the server devices, and start execution of the workloads.
At step 215, the scheduled numbers of times of writing in the respective memory devices 112 in the remaining time of the day by the workloads assigned at step 214 are calculated from the assignment times of the workload assignment table 406 and the numbers of times of writing in the memory devices 112 per hour of the workload information table 403. The calculation results are subtracted from the scheduled remaining numbers of times of writing in the respective memory devices 112 of the day, and the values of the scheduled remaining numbers of times of writing are thereby updated. After step 215, the information-processing system 101 returns to step 212 and executes the flow again.
As described above, the numbers of times of writing in the respective memory devices 112 are not controlled to be averaged. Instead, the workloads are distributed based on the replacement times of the respective memory devices 112. Thereby, the memory devices 112 may be replaced while some of the information-processing units are stopped in a planned manner based on the replacement times and the other information-processing units continue running, and the continuous operation of the information-processing system 101 may be performed.
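The update at step 215 can be sketched for one newly assigned workload. The function name `charge_new_assignment`, the treatment of midnight as the end of the day, and the constant hourly write rate are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def charge_new_assignment(remaining_today, assigned_at, execution_hours, writes_per_hour):
    """Subtract the writes a newly assigned workload is expected to make today."""
    # Midnight following the assignment time is taken as the end of the day.
    day_end = datetime.combine(assigned_at.date(), datetime.min.time()) + timedelta(days=1)
    finish = assigned_at + timedelta(hours=execution_hours)
    hours_today = (min(finish, day_end) - assigned_at).total_seconds() / 3600.0
    return remaining_today - hours_today * writes_per_hour


# Assigned at 20:00 with a 10-hour run: only 4 hours fall within today.
updated = charge_new_assignment(
    remaining_today=1400.0,
    assigned_at=datetime(2014, 1, 2, 20, 0),
    execution_hours=10,
    writes_per_hour=100,
)
print(updated)  # 1000.0
```

The part of the run that extends past the end of the day is not charged here; under the flow of the example it would be accounted for on the following day at steps 207 and 208.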
101: information-processing system, 102-107: server devices, 108: network switch, 109: storage device, 110: central processing unit (CPU), 111: main memory device, 112: memory device, 113: controller of memory device, 114: network interface (I/F), 115: counter.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/003414 | 5/25/2012 | WO | 00 | 11/25/2014 |