Embodiments of the present invention relate to a memory controller and a memory system.
An SSD (Solid State Drive) is provided with a buffer area (buffer memory) for temporarily storing read-out data.
An SSD drive including a plurality of ports is capable of connecting to a plurality of hosts via the plurality of ports. In such a configuration, the plurality of hosts can cause the SSD drive to execute read operations in parallel. The SSD drive handles the processing of read commands received via the respective ports as threads, each thread being a set of processes for one port. Accordingly, the SSD drive operates with a plurality of threads for the plurality of hosts.
Conventionally, when operating with a plurality of threads, the amount of the buffer area allotted to each thread was fixed. Accordingly, when the allotted amount of the buffer area is not ample, the buffer area becomes insufficient in cases where one thread processes commands with a large read-out size, or where a large number of read commands are issued simultaneously from the hosts.
A memory controller that reads data from nonvolatile memory according to an embodiment of the present invention includes: first and second ports that receive commands; a thread executing unit that executes a first thread that is a set of processes based on the command received by the first port, and a second thread that is a set of processes based on the command received by the second port; a buffer; and a buffer managing unit that manages a first buffer area to be allotted to the first thread and a second buffer area to be allotted to the second thread, wherein the thread executing unit stores read data in the first buffer area upon executing the first thread, and stores read data in the second buffer area upon executing the second thread, and the buffer managing unit dynamically allots regions in the buffer to the first and second buffer areas.
Hereinbelow, embodiments of a memory controller and a memory system will be described in detail with reference to the attached drawings. Note that these embodiments do not limit the present invention.
The memory system 100 of the embodiment handles a set of processes based on a command received from outside via the port 21 as the thread 0, and handles a set of processes based on a command received from outside via the port 22 as the thread 1. A thread is a set of processes based on a command inputted via a port.
The read buffer 6 of the memory system 100 of the embodiment includes regions corresponding to sixty-four clusters, one cluster corresponding to eight sectors. Assuming that a region that stores one cluster of data is one resource, the read buffer 6 includes sixty-four resources.
The regions of the read buffer 6 of the embodiment include resources allotted fixedly (statically) to each of the thread 0 and the thread 1 (fixed resources), and resources shared by the thread 0 and the thread 1 and allotted dynamically and variably (shared resources). For each thread, as illustrated in (b) in
The shared resources are allotted dynamically according to an amount of the resources that become necessary in each thread. In the embodiment, the resources that become necessary in the process based on the read command executed in the thread 0 are allotted from the shared resources. Similarly, the resources that become necessary in the process based on the read command executed in the thread 1 are allotted from the shared resources.
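The buffer layout described above can be sketched as follows. This is an illustrative sketch only; the names `BufferManager`, `FIXED_PER_THREAD`, and `SHARED_POOL` are assumptions for illustration and do not appear in the embodiment.

```python
# Illustrative sketch of the read buffer 6 layout: sixty-four one-cluster
# resources, eight fixed resources per thread, and a shared pool that the
# buffer managing unit 4 allots dynamically. All identifiers are hypothetical.

TOTAL_RESOURCES = 64      # read buffer 6 holds sixty-four one-cluster resources
FIXED_PER_THREAD = 8      # fixed (static) resources per thread
NUM_THREADS = 2

SHARED_POOL = TOTAL_RESOURCES - FIXED_PER_THREAD * NUM_THREADS  # 48 shared resources

class BufferManager:
    """Tracks shared-resource allotment, mirroring buffer managing unit 4."""

    def __init__(self):
        self.shared_free = SHARED_POOL
        self.shared_held = [0] * NUM_THREADS  # shared resources held per thread

    def allot_shared(self, thread, requested):
        """Dynamically allot up to `requested` shared resources to a thread."""
        granted = min(requested, self.shared_free)
        self.shared_free -= granted
        self.shared_held[thread] += granted
        return granted

    def release_shared(self, thread, count):
        """Release shared resources once their data has gone to the host."""
        count = min(count, self.shared_held[thread])
        self.shared_held[thread] -= count
        self.shared_free += count

mgr = BufferManager()
print(mgr.allot_shared(0, 48))  # 48: thread 0 can take the whole shared pool
print(mgr.allot_shared(1, 10))  # 0: nothing left for thread 1 until a release
```

A resource freed by `release_shared` immediately becomes available to either thread, which is what makes the allotment dynamic rather than fixed.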
Taking a read command process in the thread 0 as an example, an operation of the memory system 100 will be described using a flow chart of
Firstly, in step S301, the read command is received via the port 21. Accordingly, the process based on the read command is included in the thread 0. In step S302, the amount of resources that will be necessary for the process based on the read command is specified. For example, assuming that the read-out size of the read command is 256 kbytes and the data size that can be stored in one resource is 4 kbytes, the resources necessary for executing the read command number sixty-four. Next, a determination is made as to whether allotment of the shared resources is necessary (step S303). In a case where the allotment of the shared resources is necessary because the read-out cannot be completed with the eight fixed resources (step S303: Yes), the buffer managing unit 4 allots the forty-eight shared resources to the thread 0 as illustrated in (b) of
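The arithmetic of steps S302 and S303 can be made explicit with a short sketch. The function names are illustrative assumptions, not part of the embodiment; the numbers are the ones from the example above (256-kbyte read, 4-kbyte resources, eight fixed resources, forty-eight shared resources).

```python
# Illustrative sketch of steps S302-S303: a 256-Kbyte read with 4-Kbyte
# resources needs sixty-four resources; eight come from the thread's fixed
# resources, and the shared pool supplies up to forty-eight more.

RESOURCE_BYTES = 4 * 1024   # one resource stores one 4-Kbyte cluster
FIXED_PER_THREAD = 8
SHARED_POOL = 48

def resources_needed(read_bytes):
    # ceiling division: a partial cluster still occupies a whole resource
    return -(-read_bytes // RESOURCE_BYTES)

def shared_allotted(read_bytes):
    """Shared resources allotted beyond the fixed allotment (step S304)."""
    beyond_fixed = max(0, resources_needed(read_bytes) - FIXED_PER_THREAD)
    return min(SHARED_POOL, beyond_fixed)

print(resources_needed(256 * 1024))   # 64
print(shared_allotted(256 * 1024))    # 48
```

Note that sixty-four resources are needed but only 8 + 48 = 56 are obtainable, which is why the remaining read-out is later suspended in step S310.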
The process proceeds to step S305 after step S304. Further, in a case where the read-out size of the read command can be read out with the fixed resources alone and the shared resources are thus unnecessary (step S303: No), the process also proceeds to step S305. In step S305, a determination is made on whether there is a vacancy in the fixed resources of the thread 0, that is, whether there is a region that does not hold read-out data among the regions of the fixed resources of the thread 0 in the read buffer 6.
In a state where no data has yet been read from the NAND chips 41, 42, . . . 4n into the read buffer 6, all eight fixed resources of the thread 0 are vacant, so there is a vacancy in the fixed resources (step S305: Yes) and the process proceeds to step S308. In step S308, the thread executing unit 3 reads one resource worth of data from the NAND chips 41, 42, . . . 4n via the NAND controller 5. That is, the thread executing unit 3 causes one resource worth of read-out data from the NAND chips 41, 42, . . . 4n to be stored in a region of the fixed resources in the read buffer 6.
Thereafter, the process proceeds to step S309, and a determination is made on whether data that has not yet been read out remains. For example, just after the first resource worth has been read out, only one of the sixty-four resources worth that are now necessary has been read out, so the process returns to step S305 because unread data remains (step S309: Yes). This is repeated until the vacancy in the eight fixed resources is used up. When the reading of the data worth the eight fixed resources is completed, the vacancy in the fixed resources of the thread 0 becomes zero in step S305 (step S305: No). That is, since there is no ninth fixed resource, the process proceeds to step S306.
In step S306, a determination is made on whether there is a vacancy in the shared resources of the thread 0, that is, whether there is a region that does not hold read-out data among the regions of the shared resources in the read buffer 6. Since the buffer managing unit 4 allotted the forty-eight shared resources to the thread 0 in step S304, there is a vacancy in the shared resources (step S306: Yes), whereby the process proceeds to step S307 and one of the shared resources is used. That is, in step S308, one resource worth of data is read out from the NAND chips 41, 42, . . . 4n via the NAND controller 5 into the region of that shared resource. Thereafter, the process proceeds to step S309, and a determination is made on whether data that has not yet been read out remains. The process ends in a case where no unread data remains (step S309: No), and returns to step S305 in a case where unread data still remains (step S309: Yes). This is repeated until the vacancy in the forty-eight shared resources is used up.
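The loop through steps S305 to S310 can be sketched compactly. This is a minimal illustration under assumed names (`read_loop` and its parameters are not from the embodiment): fixed resources are consumed first, then shared resources, and the thread suspends when both are exhausted.

```python
# Minimal sketch of steps S305-S310: consume fixed resources, then shared
# resources, one single-resource read at a time; suspend when none remain.
# All identifiers are illustrative.

def read_loop(needed, fixed_free, shared_free):
    """Returns (resources read, suspended?)."""
    done = 0
    while done < needed:            # step S309: does unread data remain?
        if fixed_free > 0:          # step S305: vacancy in fixed resources?
            fixed_free -= 1
        elif shared_free > 0:       # step S306: vacancy in shared resources?
            shared_free -= 1        # step S307: use one shared resource
        else:
            return done, True       # step S310: suspend the rest
        done += 1                   # step S308: one-resource read from NAND
    return done, False

print(read_loop(64, 8, 48))   # (56, True): suspends after 8 fixed + 48 shared
print(read_loop(1, 8, 48))    # (1, False): finishes within the fixed resources
```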
When the read-out worth the forty-eight shared resources is finished, there no longer is a vacancy in the shared resources (step S306: No), so the process proceeds to step S310, where the thread executing unit 3 suspends the rest of the read-out process of the thread 0. Thereafter, the process proceeds to the read command process of the thread 1 (step S311). In the read command process of the thread 1, processes similar to the above are also executed.
In a case where the fifty-six resources became insufficient for processing the read command of the thread 0 (step S306: No), the process of the thread 0 is suspended as described above (step S310), and the process of the read command of the subsequent thread 1 is started (step S311). Similarly, when the resources of the thread 1 become insufficient, or when the process of the thread 1 finishes quickly, for example because the read-out data size of its read command is small, the thread executing unit 3 restarts the process of the read command that was suspended in the thread 0. Switching the thread processes in this manner, at resource exhaustion or at boundaries between read command processes, is referred to as switching the threads by round robin.
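The round-robin switching described above can be sketched as a simple scheduler. This is a hedged illustration only (the function and its per-visit budget are assumptions, not the embodiment's implementation): a thread that suspends is revisited after the other thread gets its turn.

```python
# Illustrative round-robin sketch: each visit reads up to the thread's
# resource budget; a thread with data left over is requeued for a later
# turn. Identifiers and the fixed per-visit budget are hypothetical.

from collections import deque

def round_robin(remaining, budget):
    """remaining: resources left to read per thread; budget: per-visit cap."""
    order = deque(range(len(remaining)))
    schedule = []                       # (thread, resources read) per visit
    while order:
        t = order.popleft()
        burst = min(remaining[t], budget[t])
        remaining[t] -= burst
        schedule.append((t, burst))
        if remaining[t] > 0:            # suspended: requeue for a later turn
            order.append(t)
    return schedule

# Thread 0 needs 64 resources but can hold only 56 at once; thread 1 needs 1.
print(round_robin([64, 1], [56, 56]))   # [(0, 56), (1, 1), (0, 8)]
```

In the actual embodiment the budget is not fixed per visit but depends on which resources have been released in the meantime; the sketch simply shows the visiting order.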
Notably, the read-out data stored in the shared resources is thereafter transferred sequentially to the host, and a shared resource whose read-out data has been transferred to the host is released. The releasing of the shared resources is an operation independent of the switching of the thread processes described above.
According to the memory system 100 of the present embodiment, a total of fifty-six resources, namely the eight fixed resources and the forty-eight shared resources, can be used for one thread. That is, compared to a method of fixedly allotting the sixty-four resources to two threads by thirty-two each as illustrated in (a) of
For example, in a case where the read command controller 2 received a read command that reads out 256 kbytes, worth sixty-four resources, as the thread 0, and received a read command that reads out 4 kbytes, worth one resource, as the thread 1, the surplus of the resources becomes as illustrated in
In the method that fixedly allots the sixty-four resources to two threads by thirty-two each as illustrated in (a) of
According to the first embodiment, in the memory system that works on a plurality of threads, the resources are allotted dynamically depending on the load of each thread. Due to this, a larger number of resources can be allotted to a thread with read commands with a large amount of read-out data, or with a large number of read commands issued simultaneously from a host, whereby the performance of the read command process can be improved.
In the present embodiment also, the regions of a read buffer 6 include fixed resources to be allotted to each of two threads, and shared resources that can be used in common by all threads. The shared resources are allotted dynamically to each thread by the following method. In the present embodiment, the shared resources are allotted dynamically based on the number of resources used by read commands that have been processed in each thread. An operation of the memory system 100 will be described using a flow chart of
Firstly, an amount of resources that can be used in the thread 0 is acquired in step S501. Specifically, according to the flow chart of
Accordingly, in the present embodiment, the load on each thread is determined based on, for example, the number of processed read commands, or an average over a most recent certain time period of the accumulated sum of the read-out data sizes of the read commands. Then, a larger amount of the shared resources is allotted to a thread with a greater load. Further, the load of each thread may be re-evaluated at a timing such as after a certain number of commands have been processed, or at certain time intervals.
In the above, the shared resources are distributed by the amount of resources processed in each thread, that is, by the ratio of the amount of the read buffer 6 used in the past by each thread; however, the method is not limited to the above so long as it is based on the amount of the read buffer 6 used in the past by each thread.
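One way the proportional distribution above might be computed is sketched below. The function name, the even split when there is no history, and the choice of giving integer-division leftovers to the first thread are all illustrative assumptions, not details of the embodiment.

```python
# Illustrative sketch: distribute the forty-eight shared resources in
# proportion to each thread's past read-buffer usage. All identifiers and
# the tie-breaking rules are hypothetical.

SHARED_POOL = 48

def allot_by_history(past_usage):
    """past_usage: resources used by past read commands, per thread."""
    total = sum(past_usage)
    if total == 0:
        # no history yet: split the shared pool evenly as a starting point
        share, extra = divmod(SHARED_POOL, len(past_usage))
        return [share + (1 if i < extra else 0) for i in range(len(past_usage))]
    grants = [SHARED_POOL * u // total for u in past_usage]
    grants[0] += SHARED_POOL - sum(grants)   # hand rounding leftovers to thread 0
    return grants

# Thread 0 processed three times the data of thread 1 in the recent past.
print(allot_by_history([48, 16]))  # [36, 12]
print(allot_by_history([38, 10]))  # [38, 10]
```

With the second history, thread 0 may use its 8 fixed resources plus 38 shared resources, i.e. 46 resources in total, matching the example in the flow chart description below.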
Returning to
Further, in step S504, a determination is made on whether there is a vacancy in the total of forty-six resources, namely the fixed resources and the shared resources of the thread 0, that is, whether there is a region that does not hold read-out data among the regions of the fixed resources or the shared resources of the thread 0 in the read buffer 6.
In a state where no data has yet been read from the NAND chips 41, 42, . . . 4n into the read buffer 6, all of the forty-six resources that the thread 0 can use are vacant, so there is a vacancy (step S504: Yes) and the process proceeds to step S505. In step S505, the thread executing unit 3 reads one resource worth of data from the NAND chips 41, 42, . . . 4n via the NAND controller 5. That is, the thread executing unit 3 causes one resource worth of read-out data from the NAND chips 41, 42, . . . 4n to be stored in a region of the fixed resources or the shared resources in the read buffer 6.
Thereafter, the process proceeds to step S506, where a determination is made on whether data that has not yet been read out remains; the process ends in a case where no unread data remains (step S506: No), and returns to step S504 in a case where unread data remains (step S506: Yes). This is repeated until the thread 0 has read out data into the forty-six available resources. When the read-out of the data worth the forty-six resources is finished, the vacancy in the available resources of the thread 0 becomes zero in step S504 (step S504: No), whereby the process proceeds to step S507, and the thread executing unit 3 suspends the rest of the read-out process of the thread 0. Thereafter, the process proceeds to the read command process of the thread 1 (step S508). In the read command process of the thread 1, processes similar to the above are also executed.
According to the memory system 100 of the present embodiment, the shared resources are allotted to each thread in accordance with the ratio of the number of resources already processed in each thread. As illustrated in (c) of
Further, for example, in a case where the read command controller 2 received a read command that reads out 256 kbytes, worth sixty-four resources, as the thread 0, and received a read command that reads out 4 kbytes, worth one resource, as the thread 1, the surplus of the resources becomes as illustrated in
In the method that fixedly allots the sixty-four resources to two threads by thirty-two each as illustrated in (a) of
According to the second embodiment, in the memory system that works on a plurality of threads, the buffer regions (resources) are allotted dynamically depending on data on the past load of each thread. Due to this, a larger number of resources can be allotted to a thread with read commands with a large amount of read-out data, or with a large number of read commands issued simultaneously from a host, whereby the performance of the read command process can be improved.
The present embodiment is another form of the second embodiment. In the present embodiment, the shared resources are allotted to each thread based on a queue length that measures, in predetermined units (for example, 4 kbytes, which is the cluster size), the read-out data amount of the read commands waiting for processing in a command queue for each thread in the read command controller 2.
For example, as illustrated in
When the thread executing unit 3 finishes processing a read command of the thread 0, the queue length of the command queue 210 decreases by the data amount read out by that read command; when a read command of the thread 1 finishes processing, the queue length of the command queue 220 decreases by the data amount read out by that read command. The buffer managing unit 4 allots the shared resources to the threads 0 and 1 based on the queue lengths of the command queues 210 and 220, that is, the data amounts scheduled to be read out in each thread. The method of calculating the allotment amounts from the queue lengths is not limited so long as a larger number of shared resources is allotted to a thread with a longer queue length. For example, a predetermined amount of the shared resources may be allotted when the queue length of a certain thread is at or above a predetermined threshold.
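The threshold rule mentioned above can be sketched as follows. The threshold and grant values are hypothetical example figures, and the function names are illustrative assumptions, not part of the embodiment.

```python
# Illustrative sketch of the third embodiment's queue-length rule: each
# thread's command queue length is measured in 4-Kbyte cluster units, and a
# thread whose queue length reaches a threshold is granted a predetermined
# block of shared resources. Threshold/grant values are hypothetical.

CLUSTER_BYTES = 4 * 1024
THRESHOLD = 16     # queue length (in clusters) that triggers a grant
GRANT = 24         # shared resources granted when the threshold is met

def queue_length(pending_read_bytes):
    """Sum of pending read sizes in cluster units (partial clusters round up)."""
    return sum(-(-b // CLUSTER_BYTES) for b in pending_read_bytes)

def allot_by_queue(queues):
    return [GRANT if queue_length(q) >= THRESHOLD else 0 for q in queues]

# Thread 0 has a 256-Kbyte read pending (64 clusters); thread 1 has a
# 4-Kbyte read pending (1 cluster).
print(allot_by_queue([[256 * 1024], [4 * 1024]]))  # [24, 0]
```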
In the present embodiment, the load of each thread is determined based on the number of read commands in the unprocessed command queue, or the total sum of the sizes of the data scheduled to be read out by those read commands. Then, a larger number of shared resources is allotted to a thread with a greater load. Further, the load of each thread may be re-evaluated at a timing such as after a certain number of commands have been processed, or at certain time intervals.
According to the third embodiment, in the memory system that works on a plurality of threads, the buffer regions (resources) are allotted dynamically based on the data amount scheduled to be read out in each thread. Due to this, a larger number of resources can be allotted to a thread with read commands with a large amount of read-out data, or with a large number of read commands issued simultaneously from a host, whereby the performance of the read command process can be improved.
Notably, although the port number was set to two, the thread number was set to two, and the ports and the threads were on a one-to-one basis in the first to third embodiments, no limitation is necessarily made thereto. Further, the port number and the thread number are not limited to two and may be greater than two. Even if the thread number is three or more, the processes of the threads are switched by round robin. The amount of the fixed resources may be changed according to the thread number.
Further, in a case where there is only one thread, the aforementioned fixed allotment becomes unnecessary, and all of the resources can be used entirely for that thread.
In the present embodiment, in a case where the resources available to a certain thread have been used up and the process has shifted to the processing of another thread in the first to third embodiments, read-out to a page register (page read) is performed for the rest of the data of the suspended read process.
Specifically, process of
In step S901 of
As illustrated in
According to the fourth embodiment, in a case where the available resources are used up in a certain thread and the process has proceeded to the processing of another thread, the page read is performed for the rest of the data of the suspended read process. Due to this, an improvement in reading performance becomes possible.
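The effect of the fourth embodiment can be sketched as a simple split of the requested pages. This is a minimal illustration under assumed names (the function and page list are not from the embodiment): pages that fit in the available resources are read into the buffer, while the rest are pre-sensed into the NAND page register so that, on resumption, only the register-to-buffer transfer remains.

```python
# Illustrative sketch of the fourth embodiment's page-read idea: when a
# thread suspends for lack of buffer resources, page read is issued for the
# remaining pages so their data waits in the page register. Names are
# hypothetical.

def process_until_suspend(pages, capacity):
    """Returns (pages read into the buffer, pages pre-sensed by page read)."""
    buffered = pages[:capacity]      # fits in the available resources
    presensed = pages[capacity:]     # page read issued for the rest (step S901)
    return buffered, presensed

pages = list(range(10))              # ten pages requested by a read command
buffered, presensed = process_until_suspend(pages, 7)
print(len(buffered), len(presensed))  # 7 3
```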
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
This application is based upon and claims the benefit of priority from Provisional Patent Application No. 61/876,015, filed on Sep. 10, 2013; the entire contents of which are incorporated herein by reference.