Embodiments described herein relate generally to a memory system.
A memory system comprising a memory capable of queuing a plurality of messages, and a processing unit which processes these messages is known.
Also, in a memory system comprising a nonvolatile memory and a controller which controls the nonvolatile memory, access to the nonvolatile memory is performed by, for example, translating logical address LBA managed by a host into logical address LCA which is the read/write unit of the nonvolatile memory.
In general, according to one embodiment, a memory system comprises: a first memory including a message queue having first to nth addresses (n is a natural number greater than or equal to 2), a first pointer showing one of the first to nth addresses, and a second pointer showing one of the first to nth addresses; a monitor unit which detects whether the first and second pointers show the first address; and a processing unit which changes an address shown by the first pointer from the first address to an ith address (i is a natural number greater than or equal to 2 and less than or equal to n) when the first and second pointers show the first address. An address shown by the second pointer is incremented from the first address to a (j+1)th address (j is a natural number greater than or equal to 1) when first to jth messages are queued in the first to jth addresses, the monitor unit outputs information when (j+1) is equal to i, and the processing unit processes the first to jth messages on the basis of the information.
Memory System
A memory system 10 comprises a CPU 11, a memory 12, a monitor unit 13, and a bus 14 for connecting these elements.
The memory 12 is a buffer memory. The memory 12 may be a volatile RAM such as a static random access memory (SRAM) or a dynamic random access memory (DRAM), or a nonvolatile RAM such as a magnetic random access memory (MRAM), a resistance random access memory (ReRAM), or a ferroelectric random access memory (FeRAM).
The memory 12 includes a message queue MQ having first, second, . . . , nth addresses A1, A2, . . . An (where n is a natural number greater than or equal to 2), and a first pointer (a head pointer) HP and a second pointer (a tail pointer) TP holding an address of the message queue MQ. The first, second, . . . , nth addresses A1, A2, . . . An may be a part of all addresses in the message queue MQ. That is, the first address A1 may be an initial address in the message queue MQ or may be an address other than the initial address, for example, a middle address in the message queue MQ.
The monitor unit 13 is, for example, a hardware monitor. The monitor unit 13 monitors the first and second pointers HP and TP.
From the state in which the first and second pointers HP and TP show the same address, the CPU 11 (firmware FW) increments the address shown by the first pointer HP by a predetermined number. The predetermined number corresponds to the number of messages required by the CPU 11. The number of messages required by the CPU 11 means the number of messages that can be processed only after all of these messages have been queued in the message queue MQ.
For example, in a case where the first and second pointers HP and TP both show the first address A1, when the CPU 11 is to process k messages (where k is a natural number greater than or equal to 2), the CPU 11 changes the address shown by the first pointer HP from the first address A1 to a (k+1)th address A(k+1).
Meanwhile, in each of the addresses in the message queue MQ, a message transferred from a message source 20 is queued. The message source 20 is, for example, a CPU or a hardware engine. When the message source 20 queues one message in address Ai in the message queue MQ, the message source 20 increments the address shown by the second pointer TP from Ai to A(i+1).
That is, when the message source 20 queues k messages in addresses from the first address A1 to a kth address Ak, for example, the message source 20 changes the address shown by the second pointer TP from the first address A1 to a (k+1)th address A(k+1). As a result, the addresses shown by the first and second pointers HP and TP become the same again.
When the first and second pointers HP and TP show the same address, the monitor unit 13 notifies the CPU 11 of this fact.
When the CPU 11 confirms that the first and second pointers HP and TP show the same address, the CPU 11 retrieves the k messages queued in the message queue MQ and processes these messages. Also, the CPU 11 may increment the address shown by the first pointer HP by a predetermined number (i.e., the number of messages required by the CPU 11) again, and repeat the above-described operation.
According to the aforementioned memory system, when the addresses shown by the first and second pointers HP and TP are the same, the CPU 11 increments the first pointer HP by a predetermined number corresponding to the number of necessary messages in advance. Further, every time a message from the message source 20 is queued, the message source 20 increments the address shown by the second pointer TP.
That is, the state in which the addresses shown by the first and second pointers HP and TP have become the same again means that a given number of messages required by the CPU 11 are all gathered. Therefore, when the addresses shown by the first and second pointers HP and TP have become the same, the monitor unit 13 notifies the CPU 11 of this fact. Also, when a notification is received from the monitor unit 13, the CPU 11 can process the given number of messages.
The above means that the CPU 11 can process the given number of messages efficiently. That is, in the past, whenever a message was queued, the CPU 11 was notified of the queuing, and every time the CPU 11 received the notification, the CPU 11 needed to confirm whether all of the necessary messages had been gathered. For this reason, when the messages required by the CPU 11 were not yet completely queued, this confirmation by the CPU 11 was performed uselessly, causing degradation in the performance of the memory system. Note that this will be described later.
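The pointer handshake described above can be sketched in software as follows. This is only a minimal Python model under stated assumptions, not the hardware of the embodiment: the class name `MessageQueueModel`, its methods, and the `on_equal` callback (standing in for the notification from the monitor unit 13) are hypothetical names introduced for illustration, and addresses A1 to An are modelled as 0-based indices without wrap-around.

```python
class MessageQueueModel:
    """Toy model of message queue MQ with head pointer HP and tail pointer TP."""

    def __init__(self, n, on_equal):
        self.slots = [None] * n    # addresses A1..An, stored 0-based
        self.hp = 0                # first pointer (head pointer) HP
        self.tp = 0                # second pointer (tail pointer) TP
        self.start = 0             # address where the current batch begins
        self.on_equal = on_equal   # hypothetical stand-in for the monitor's notification

    def request(self, k):
        """CPU side: advance HP by the number of messages required."""
        if self.hp != self.tp:
            raise RuntimeError("HP and TP must match before a new request")
        if self.hp + k >= len(self.slots):
            raise ValueError("address A(k+1) lies outside the modelled queue")
        self.start = self.hp
        self.hp += k

    def enqueue(self, message):
        """Message-source side: queue one message, then increment TP."""
        if self.tp == self.hp:
            raise RuntimeError("no outstanding request from the CPU")
        self.slots[self.tp] = message
        self.tp += 1
        if self.tp == self.hp:     # monitor: HP and TP show the same address again
            self.on_equal(self.slots[self.start:self.tp])


batches = []
mq = MessageQueueModel(8, batches.append)
mq.request(4)                      # HP: A1 -> A5
for m in ["M1", "M2", "M3", "M4"]:
    mq.enqueue(m)                  # TP: A1 -> A2 -> ... -> A5
```

In this sketch the callback fires exactly once, after the fourth message, and receives all four messages together.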
Therefore, according to the present embodiment, advantages such as improving the performance of the memory system by reducing the load of the CPU 11, and reducing power consumption by eliminating unnecessary operation of the CPU 11 can be obtained.
Operation
An embodiment of the operation of the memory system of
In the initial state, addresses indicated by the first and second pointers are the same. In this case, it is assumed that the first and second pointers both indicate the first address A1 (step ST01: HP=TP).
First, the CPU increments the address shown by the first pointer HP based on the required number of messages. In this case, since the number of messages required by the CPU is four, the CPU changes the address shown by the first pointer HP from the first address A1 to a fifth address A5 (step ST02).
After that, the CPU is brought into a stall state. That is, the CPU waits until information indicating that all of four messages M1, M2, M3, and M4 are gathered is received from the monitor unit (step ST03).
Meanwhile, a message is transferred from the message source MS to the message queue MQ. The message source MS increments the address shown by the second pointer TP every time the message is transferred (incrementation of TP).
Further, when all of the four messages M1, M2, M3, and M4 required by the CPU are queued in the message queue MQ, the first and second pointers HP and TP both show the fifth address A5.
When the monitor unit detects that the first and second pointers HP and TP show the same address, the monitor unit informs the CPU of this fact (informs that HP=TP).
When the CPU receives a notification from the monitor unit that the first and second pointers HP and TP show the same address, the CPU exits the stall state, retrieves the four messages M1, M2, M3, and M4 from the message queue MQ, and processes these messages (step ST04).
Note that the above operations are performed repeatedly.
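Steps ST01 to ST04 can be illustrated with two cooperating threads. The following is a hedged sketch only: the function `run_once`, the `threading.Event` objects, and the dictionary holding HP and TP are hypothetical stand-ins for the CPU, the message source MS, and the monitor unit, and the stall state is modelled simply as a blocking wait.

```python
import threading


def run_once(messages):
    """Run steps ST01-ST04 once and return the messages the CPU processed."""
    if not messages:
        raise ValueError("at least one message is required")
    needed = len(messages)
    queue = [None] * (needed + 1)        # addresses A1..A(needed+1), 0-based
    pointers = {"HP": 0, "TP": 0}        # ST01: HP = TP
    hp_moved = threading.Event()         # lets the source start after ST02
    hp_equals_tp = threading.Event()     # models the monitor unit's notification
    processed = []

    def cpu():
        pointers["HP"] += needed         # ST02: e.g. A1 -> A5 for four messages
        hp_moved.set()
        hp_equals_tp.wait()              # ST03: stall until HP = TP is reported
        processed.extend(queue[:needed])  # ST04: retrieve and process

    def message_source():
        hp_moved.wait()
        for message in messages:
            queue[pointers["TP"]] = message
            pointers["TP"] += 1          # incrementation of TP
            if pointers["TP"] == pointers["HP"]:
                hp_equals_tp.set()       # monitor unit informs the CPU (HP = TP)

    threads = [threading.Thread(target=cpu),
               threading.Thread(target=message_source)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


result = run_once(["M1", "M2", "M3", "M4"])
```

The CPU thread is woken once, after the fourth message has been queued, rather than once per message.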
In the initial state, addresses indicated by the first and second pointers are the same. In this case, it is assumed that the first and second pointers both indicate a first address A1 (step ST11: HP=TP=A1,
The CPU waits until information indicating that a message is queued is received from the monitor unit (step ST12).
Meanwhile, as shown in
When the monitor unit detects that the first and second pointers HP and TP show different addresses, as shown in
When the CPU receives a notification from the monitor unit that the first and second pointers HP and TP show different addresses, as shown in
Further, when the CPU determines that not all of a given number of messages that are required are gathered, the CPU suspends a process of message Mi (i=1) retrieved from the message queue MQ (step ST14).
Further, as shown in
That is, as shown in
When the monitor unit detects that the first and second pointers HP and TP show different addresses, as shown in
When the CPU receives a notification from the monitor unit that the first and second pointers HP and TP show different addresses, as shown in
Further, when the CPU determines that not all of the given number of messages that are required are gathered, the CPU suspends a process of messages M1 and M2 retrieved from the message queue MQ (step ST14).
Further, the CPU processes the messages when all of the required four messages M1, M2, M3, and M4 are gathered (step ST15).
In the above-described comparative example, every time a message is transferred from the message source MS to the message queue MQ, the fact that HP≠TP is notified to the CPU from the monitor unit. Accordingly, the CPU must retrieve a message every time such a notification is received, and when the given number of messages that are required are not completely gathered, a process of the messages must be suspended.
That is, since the CPU (firmware) is required to retrieve a message and determine whether to execute or suspend a message process every time the message is queued, a load of the CPU is increased and the performance of the memory system is degraded. Further, since the rate of operation of the CPU is increased, the power consumption is increased.
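The difference in CPU load between the comparative example and the embodiment described earlier can be made concrete by counting notifications. The sketch below is illustrative only; the function names `comparative_wakeups` and `embodiment_wakeups` are hypothetical, and each notification is counted as one CPU wake-up.

```python
def comparative_wakeups(total, needed):
    """Comparative example: HP != TP is reported for every queued message.

    Returns (wake-ups, number of times processing had to be suspended).
    """
    wakeups = 0
    suspended = 0
    gathered = 0
    for _ in range(total):
        gathered += 1
        wakeups += 1               # one notification per queued message
        if gathered < needed:
            suspended += 1         # not all required messages gathered yet
        else:
            gathered = 0           # batch complete, messages are processed
    return wakeups, suspended


def embodiment_wakeups(total, needed):
    """Embodiment: HP is advanced first; the CPU is notified only when TP reaches HP."""
    hp, tp, wakeups = needed, 0, 0
    for _ in range(total):
        tp += 1
        if tp == hp:
            wakeups += 1           # HP = TP: the whole batch is ready
            hp += needed           # CPU requests the next batch
    return wakeups
```

For four messages with a required batch of four, the comparative model wakes the CPU four times and suspends processing three times, whereas the embodiment model wakes it once.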
In contrast, the embodiment can resolve the problems as described above. That is, as has been described referring to
Application Example
The example of
For example, when messages M1, . . . , Mn are queued in the message queue MQ, and those messages include an error message E, detailed information e regarding the error message E is queued in the error queue EQ.
In the memory system of
In this case, after the CPU has received a notification that the given number of messages are completely gathered from the monitor unit, the CPU retrieves these given number of messages and also retrieves the detailed information e on the error message E at the same time. The CPU can perform a predetermined error process on the basis of the detailed information e on the error message E.
Note that the monitor unit can also immediately inform the CPU of the fact that the error message E is received when the detailed information e on the error message E is queued in the error queue EQ. In this case, the CPU can confirm the error message E and the detailed information e thereon before the given number of messages are completely gathered.
That is, the CPU can perform a predetermined error process to the error message E before the given number of messages are completely gathered.
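The interaction between the message queue MQ and the error queue EQ might be modelled as below. This is a sketch under assumptions: the class `MonitorModel`, its `notify` callback, and the event labels "error" and "batch" are hypothetical, and it shows the variant in which the monitor unit informs the CPU immediately when detailed information e is queued.

```python
class MonitorModel:
    """Toy monitor watching a message queue MQ and an error queue EQ."""

    def __init__(self, needed, notify):
        self.needed = needed       # number of messages the CPU requires
        self.notify = notify       # hypothetical callback into the CPU
        self.mq = []               # message queue MQ
        self.eq = []               # error queue EQ

    def queue_message(self, message, error_detail=None):
        self.mq.append(message)
        if error_detail is not None:
            self.eq.append(error_detail)
            self.notify("error", [error_detail])   # immediate error notification
        if len(self.mq) == self.needed:
            self.notify("batch", list(self.mq))    # all required messages gathered


events = []
monitor = MonitorModel(4, lambda kind, payload: events.append((kind, payload)))
monitor.queue_message("M1")
monitor.queue_message("E", error_detail="e")
monitor.queue_message("M3")
monitor.queue_message("M4")
```

Here the error notification reaches the CPU before the batch notification, so the error process can start before all four messages are gathered.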
A host 30 is an electronic apparatus such as a personal computer and a portable terminal. The host 30 may be an imaging device such as a digital still camera or a video camera, a tablet computer, a smartphone, a game console, a car navigation system, a printer, a scanner or a server system.
A storage device 31 is a device connectable to the host 30, and is, for example, a solid-state drive (SSD) or a USB memory. The storage device 31 may be incorporated into the host 30, or may be connected to the host 30 via a cable or a network.
The host 30 and the storage device 31 are connected to each other by a predetermined interface standard. The predetermined interface standard is, for example, a Peripheral Component Interconnect Express (PCIe) standard, which features broad bandwidth, low latency, and high expandability, or a Non-Volatile Memory Express (NVMe) standard which deals with a nonvolatile memory (storage memory) on the interface of the PCIe standard.
The predetermined interface standard may be a Serial Advanced Technology Attachment (SATA) standard, a Universal Serial Bus (USB) standard, a Serial Attached SCSI (SAS) standard, a Mobile Industry Processor Interface (MIPI) standard, a Unified Protocol (UniPro) standard, etc.
The storage device 31 includes a device controller 32 and a nonvolatile memory 33. The nonvolatile memory 33 is a storage memory whose capacity can be increased, that is, a NAND flash memory, for example. The nonvolatile memory 33 may comprise a memory cell having a two-dimensional structure or may comprise a memory cell having a three-dimensional structure. The nonvolatile memory 33 may include a plurality of memory chips which are stacked.
The nonvolatile memory 33 includes a plurality of channels (in this case, four channels) CH0, CH1, CH2, and CH3, and a plurality of banks (in this case, two banks) BANK0 and BANK1.
The plurality of channels CH0, CH1, CH2, and CH3 are elements which can be operated in parallel. For example, in parallel with read/write in one channel CH0, read/write in the remaining three channels CH1, CH2, and CH3 can be executed. As can be seen, the plurality of channels CH0, CH1, CH2, and CH3 realize high-speed read/write.
The plurality of banks BANK0 and BANK1 are elements for executing an interleave operation. For example, each of the channels includes two chips. In this case, when the read/write in chips CP00, CP10, CP20, and CP30 within BANK0 is busy (i.e., execution in progress), data transfer is executed between chips CP01, CP11, CP21, and CP31 within BANK1 and the device controller 32. In this way, the data transfer between the nonvolatile memory 33 and the device controller 32 is performed efficiently.
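The chip naming used above (CP followed by the channel number and the bank number) and the two-bank interleave can be expressed compactly. The helper names `chips_in_bank` and `transfer_bank` below are hypothetical and serve only to illustrate the arrangement of four channels and two banks.

```python
CHANNELS = 4   # CH0..CH3
BANKS = 2      # BANK0 and BANK1


def chips_in_bank(bank):
    """Names of the chips in one bank, one chip per channel."""
    return [f"CP{channel}{bank}" for channel in range(CHANNELS)]


def transfer_bank(busy_bank):
    """Bank whose chips can exchange data with the controller while busy_bank is busy."""
    return (busy_bank + 1) % BANKS


busy = 0
idle_chips = chips_in_bank(transfer_bank(busy))
```

With BANK0 busy, the sketch selects the chips of BANK1 for data transfer, matching the interleave operation described above.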
The device controller 32 comprises, for example, a CPU 11-0, . . . , an LUT controller 11′, a RAM (memory) 12, a monitor unit 13, a host interface 34, a hardware engine 35, a memory controller 36, and a bus 14 which connects these elements. The CPU 11-0, . . . , the RAM 12, the monitor unit 13, and the bus 14 correspond to the CPU 11, the RAM 12, the monitor unit 13, and the bus 14 of the memory system of
The host interface 34 receives various commands such as read/write commands from the host 30. For example, a message queued in the message queue MQ within the RAM 12 is an access command (read/write command) to the nonvolatile memory 33 which is transferred from the host 30.
In access to the nonvolatile memory 33, the LUT controller 11′ refers to an LUT (an address translation table), and translates a logical address from the host 30 into a physical address of the nonvolatile memory 33.
The CPU 11-0, . . . , 11-n are assigned to perform various functions for access when, for example, the nonvolatile memory 33 is to be accessed.
The memory controller 36 comprises an encoding/decoding unit 361, and a memory interface 362. The encoding/decoding unit 361 includes, for example, an error correction code (ECC) encoder, and an ECC decoder.
When data is to be written to the nonvolatile memory 33, the encoding/decoding unit 361 adds an error correction code (ECC) to the write data as a redundant code by encoding (i.e., ECC encoding) the write data. Also, when the data is to be read from the nonvolatile memory 33, the encoding/decoding unit 361 uses the ECC added to the read data and performs error correction of the read data (ECC decoding).
The memory interface 362 serves as the interface of the nonvolatile memory 33.
In this case, although the RAM 12 is arranged within the device controller 32, the RAM 12 can be provided independently of the device controller 32 in place of the present structure.
As an example of use of the storage device described above, for example, as shown in
In this case, the CPU 11-0 changes the address shown by the first pointer HP from the first address A1 to the fifth address A5, for example.
When responses to the four messages (read commands of data A, B, C, and D) M1, M2, M3, and M4 are transferred from the nonvolatile memory 33 as the message source MS to the message queue MQ, these four messages M1, M2, M3, and M4 are processed by the CPU 11-0. Further, the CPU 11-0 changes the address shown by the second pointer TP from the first address A1 to the fifth address A5, for example.
In this case, data A, B, C, and D, for example, are read in parallel in the CPU 11-0 from channels CH0, CH1, CH2, and CH3. Also, the CPU 11-0 is enabled to simultaneously process these items of data A, B, C, and D.
As described above, according to the first embodiment, it is possible to improve the performance of the memory system by reducing the load of the CPU, and reduce power consumption by eliminating unnecessary operation of the CPU.
A memory system 40 comprises a PCIe/NVMe port 41, a device controller 42, a nonvolatile memory 43, and a buffer memory 44.
The nonvolatile memory 43 is a storage memory, and is a NAND flash memory, for example. The device structure of the nonvolatile memory 43 may be a two-dimensional structure or a three-dimensional structure.
The buffer memory 44 is, for example, a volatile RAM such as a DRAM or an SRAM. The buffer memory 44 may be a nonvolatile RAM such as an MRAM, an ReRAM, or an FeRAM. The buffer memory 44 is used as a cache memory, for example. Although the buffer memory 44 is arranged outside the device controller 42, the buffer memory 44 may be arranged inside the device controller 42.
The memory system 40 can be connected to a host via the PCIe/NVMe port 41. The nonvolatile memory 43 stores, for example, user data and an LUT (a look-up table) which translates logical address LBA into logical address LCA when the host is to access the user data. The LUT is also stored in the buffer memory 44 as cache data. The LUT may be stored in a memory in the host.
By referring to the LUT, the device controller 42 translates logical address LBA into logical address LCA.
The host manages a namespace. When the maximum capacity of the namespace is defined as Cmax, the namespace includes LUT chunks each having a predetermined capacity of A=Cmax/Y, where Y is a natural number greater than or equal to 2. Further, logical address LBA is translated into logical address LCA via the LUT chunks.
Note that the definition and the detailed explanation of each of the logical addresses LBA and LCA, the namespace, and the LUT chunk will be provided later.
The number of LUT chunks in the namespace is determined based on the capacity of the namespace provided by the host. When the maximum number of namespaces managed by the host is assumed as X, the total number N of the LUT chunks is (X+Y). Here, Y is greater than or equal to X (Y≧X). Further, Y is the maximum number of LUT chunks which can be set in one namespace. Furthermore, preferably, A×N should be greater than the capacity of the nonvolatile memory 43.
The device controller 42 comprises, for example, a CPU 421, a PCIe/NVMe interface 422, an Advanced Host Controller Interface (AHCI) controller 423, a buffer controller 424, a code/decode module 425, a memory interface 426, and a bus 427 connecting these elements.
The PCIe/NVMe interface 422 controls data communication using the PCIe/NVMe port 41 based on the PCIe/NVMe standard. The AHCI controller 423 interprets the nonvolatile memory 43 as a storage device connected to the host, and controls data communication between the host and the storage device. The buffer controller 424 serves as an interface of the buffer memory 44.
The code/decode module 425 performs coding of write data to the nonvolatile memory 43, and decoding of read data from the nonvolatile memory 43. A method of coding/decoding is not particularly limited. For example, as the method of coding/decoding, Reed Solomon (RS) coding/decoding, Bose Chaudhuri Hocquenghem (BCH) coding/decoding, Low Density Parity Check (LDPC) coding/decoding, etc., can be used.
The memory interface 426 serves as the interface of the nonvolatile memory 43. The memory interface 426 controls data communication with the nonvolatile memory 43. The CPU 421 controls a read/write operation for the nonvolatile memory 43, on the basis of an instruction from the host. Also, the CPU 421 controls the operations such as garbage collection and refresh.
The nonvolatile memory includes a block BK.
The block BK comprises a plurality of cell units CU arranged in a first direction. One cell unit CU includes a memory cell string extending in a second direction intersecting the first direction, a transistor (field effect transistor: FET) S1 which is connected to one end of a current path of the memory cell string, and a transistor (FET) S2 which is connected to the other end of the current path of the memory cell string. The memory cell string includes eight memory cells, i.e., memory cells MC0 to MC7, whose current paths are connected in series.
One memory cell MCk (k is one of 0 to 7) includes a charge storage layer (for example, a floating gate electrode) FG, and a control gate electrode CG.
In this case, one cell unit CU includes eight memory cells MC0 to MC7, but the number of memory cells is not limited to this. One cell unit CU may comprise two or more memory cells, for example, 32 or 56 memory cells. Also, each of the memory cells may be a single-level cell (SLC) capable of storing 1-bit data, or a multilevel cell (MLC) capable of storing data of 2 bits or more.
A source line SL is connected to one end of the current path of the memory cell string via select transistor S1. A bit line BLm-1 is connected to the other end of the current path of the memory cell string via select transistor S2.
Word lines WL0 to WL7 are connected in common to the control gate electrodes CG of the respective memory cells MC0 to MC7 arranged in the first direction. Similarly, a select gate line SGS is connected in common to gate electrodes of the respective select transistors S1 arranged in the first direction, and a select gate line SGD is connected in common to gate electrodes of the respective select transistors S2 arranged in the first direction.
One page PP comprises m memory cells connected to a single word line WLi (i is one of 0 to 7). The read/write operation of the nonvolatile memory is performed per page PP (corresponding to a cluster), and the erase operation is performed per block BK.
The memory system 40 described above provides one or more address spaces for the host. The address space is a range of addresses which can be specified by the host. Address information indicating a position within the address space is represented as a logical address. When a plurality of address spaces are provided, names are given in order to distinguish between the address spaces. The address space provided by the memory system 40 is represented as a namespace (NS).
The host specifies the position of data within the memory system 40 by using an identifier of the namespace (namespace ID: NS_ID), and a logical address. The logical address is represented by, for example, a logical block address (LBA). The logical address LBA is an address of a data unit (region) managed by the host. Accordingly, the logical address LBA is translated into a logical address (logical cluster address) LCA of the read/write unit (cluster) of the nonvolatile memory 43.
Recently, in order to meet the demands of making the capacity of the memory system 40 larger, a memory capacity of the nonvolatile memory 43 as the storage tends to be increased. In accordance with the above, for example, the size of the LUT (look-up table) which associates logical address LBA and logical address LCA with each other, that is, the memory capacity (the number of pointers) required by the LUT, is increased. Also, since the number of pointers which associate logical address LBA and logical address LCA with each other is increased, the time required for processes such as addition and deletion of the namespace is increased, and the load of the device controller 42 is increased.
Hence, in the present embodiment, the maximum capacity (Max capacity) of one namespace is divided by a given unit A. For example, when the maximum number of the namespaces which the host can set is X, the number of divisions Y of the maximum capacity of one namespace is assumed to be a number which is equal to X or greater than X. Further, management of the namespace is assumed to be carried out by a given unit A×n. Here, n is one of 1 to Y.
A memory space of the given unit A is referred to as an LUT chunk.
As can be seen, by managing the namespace by using the LUT chunks, it becomes unnecessary to associate logical address LBA and logical address LCA with each other one by one. That is, the number of pointers which associate logical address LBA and logical address LCA with each other can be reduced, and thus, processes such as addition and deletion of the namespace do not take time, and the load of the device controller 42 can be reduced. This will be described later.
Further, it is assumed that the total number N of the LUT chunks is (X+Y).
In this case, a total capacity of all of the LUT chunks becomes (X+Y)×A, and becomes greater than the maximum capacity Y×A of one namespace. The total capacity of all of the LUT chunks is set greater than the maximum capacity of one namespace in order to allow for a case where not all of capacity A of one LUT chunk is used.
Hereinafter, a specific example will be described.
For example, as shown in
In this case, for example, eight namespaces (NS_ID=0, NS_ID=1, NS_ID=2, . . . , NS_ID=7) can be set. Also, the total number N of the LUT chunks is (X+Y)=16 (CNK_0, CNK_1, CNK_2, . . . , CNK_12, CNK_13, CNK_14, CNK_15). When it is assumed that the maximum capacity Cmax of one namespace is 2 terabytes, the capacity of one LUT chunk is 256 gigabytes.
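The figures in this example follow from the relations A=Cmax/Y and N=X+Y given earlier. The short calculation below reproduces them; the function name `lut_chunk_layout` is hypothetical, the values X=8 and Y=8 are inferred from the stated results (eight namespaces, 16 chunks, 256 gigabytes per chunk), and 1 terabyte is taken as 1024 gigabytes.

```python
def lut_chunk_layout(x, y, cmax_gb):
    """Return (capacity A of one LUT chunk in gigabytes, total number N of LUT chunks)."""
    if y < x:
        raise ValueError("Y must be greater than or equal to X")
    chunk_gb = cmax_gb // y    # A = Cmax / Y
    total_chunks = x + y       # N = X + Y
    return chunk_gb, total_chunks


# Inferred example values: X = 8 namespaces, Y = 8 divisions, Cmax = 2 terabytes.
chunk_gb, total_chunks = lut_chunk_layout(8, 8, 2 * 1024)
chunk_names = [f"CNK_{i}" for i in range(total_chunks)]
```

Under these assumptions the calculation gives a chunk capacity of 256 gigabytes and 16 chunks, CNK_0 to CNK_15.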
The host can set the capacity of one namespace within the range from 0 bytes to 2 terabytes.
For example, when the capacity of the namespace (NS_ID=0) is set within the range from 513 gigabytes to 768 gigabytes, three LUT chunks CNK_1, CNK_3, and CNK_8 are associated with the namespace (NS_ID=0). Also, when the capacity of the namespace (NS_ID=1) is set within the range from 1 gigabyte to 256 gigabytes, one LUT chunk CNK_5 is associated with the namespace (NS_ID=1).
Similarly, when the capacity of the namespace (NS_ID=2) is set within the range from 1 gigabyte to 256 gigabytes, one LUT chunk CNK_0 is associated with the namespace (NS_ID=2). Also, when the capacity of the namespace (NS_ID=7) is set within the range from 257 gigabytes to 512 gigabytes, two LUT chunks CNK_7 and CNK_11 are associated with the namespace (NS_ID=7).
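The number of LUT chunks associated with a namespace in these examples is the namespace capacity divided by the chunk capacity, rounded up. A minimal sketch, assuming capacities expressed in whole gigabytes and a hypothetical helper name `chunks_needed`:

```python
CHUNK_GB = 256   # capacity A of one LUT chunk in the example above


def chunks_needed(capacity_gb, chunk_gb=CHUNK_GB):
    """Number of LUT chunks required for a namespace of the given capacity."""
    if capacity_gb < 0:
        raise ValueError("capacity must not be negative")
    return -(-capacity_gb // chunk_gb)   # ceiling division on integers


examples = {cap: chunks_needed(cap) for cap in (1, 256, 257, 512, 513, 768, 2048)}
```

This reproduces the ranges stated above: 1 to 256 gigabytes need one chunk, 257 to 512 gigabytes need two, and 513 to 768 gigabytes need three.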
Further, the host specifies the position of data within the memory system 40 by using an identifier of the namespace (NS_ID), and logical address LBA.
For example, when the host requests the memory system 40 to read the user data, the host transfers the identifier of the namespace (NS_ID) and logical address LBA to the memory system 40. The device controller 42 translates logical address LBA into logical address LCA based on the identifier of the namespace (NS_ID) and logical address LBA.
In this case, as shown in
Further, when the namespace specified by the host is NS_ID=0, and logical address LBA is within the range of BAc to BAd, since LUT chunk CNK_3 is referred to, logical address LBA (BAc to BAd) is translated into logical address LCA (CAc to CAd).
Furthermore, when the namespace specified by the host is NS_ID=0, and logical address LBA is within the range of BAe to BAf, since LUT chunk CNK_1 is referred to, logical address LBA (BAe to BAf) is translated into logical address LCA (CAe to CAf).
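The chunk-based translation can be sketched as a two-level lookup: the namespace selects an ordered list of LUT chunks, and the logical address LBA selects one chunk and an entry within it. Everything concrete in the following model is an assumption made for illustration: the tiny chunk size of four entries, the toy LCA values stored in each chunk, the ordering CNK_8, CNK_3, CNK_1 within NS_ID=0 (which assumes the ranges BAa to BAb, BAc to BAd, and BAe to BAf are ascending), and the names `NAMESPACE_CHUNKS`, `CHUNK_TABLES`, and `translate`.

```python
ENTRIES_PER_CHUNK = 4   # toy value; a real LUT chunk covers capacity A

# Namespace ID -> LUT chunks in ascending LBA order (taken from the example above).
NAMESPACE_CHUNKS = {0: [8, 3, 1], 1: [5], 2: [0], 7: [7, 11]}

# Chunk ID -> table of LCAs. The contents are made-up placeholder values.
CHUNK_TABLES = {
    chunk: [chunk * ENTRIES_PER_CHUNK + i for i in range(ENTRIES_PER_CHUNK)]
    for chunk in range(16)
}


def translate(ns_id, lba):
    """Translate (NS_ID, LBA) into an LCA by referring to one LUT chunk."""
    chunks = NAMESPACE_CHUNKS[ns_id]
    index, offset = divmod(lba, ENTRIES_PER_CHUNK)
    if index >= len(chunks):
        raise IndexError("LBA lies outside the capacity of the namespace")
    return CHUNK_TABLES[chunks[index]][offset]


lca_first = translate(0, 0)    # falls in the first range, resolved via CNK_8
lca_second = translate(0, 5)   # falls in the second range, resolved via CNK_3
lca_third = translate(0, 11)   # falls in the third range, resolved via CNK_1
```

Only the short per-namespace chunk list has to be consulted to find the right chunk, which is the property the LUT chunk scheme relies on.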
According to the second embodiment, for example, when logical address LBA (BAa to BAb) of namespace NS_ID=0 is to be accessed, by referring to LUT chunk CNK_8, translation from logical address LBA (BAa to BAb) into logical address LCA (CAa to CAb) is enabled. In contrast, in the comparative example, in namespace NS_ID=0, association between logical address LBA (BAa to BAb) and logical address LCA (CAa to CAb) must be performed for each address.
Similarly, for example, when logical address LBA (BAc to BAd) of namespace NS_ID=0 is to be accessed, by referring to LUT chunk CNK_3, translation from logical address LBA (BAc to BAd) into logical address LCA (CAc to CAd) is enabled. In contrast, in the comparative example, in namespace NS_ID=0, association between logical address LBA (BAc to BAd) and logical address LCA (CAc to CAd) must be performed for each address.
That is, the size of the LUT required in the second embodiment (i.e., the number of pointers for associating logical address LBA and logical address LCA with each other) can be made smaller than that required in the comparative example (i.e., the number of pointers for associating logical address LBA and logical address LCA with each other). Therefore, according to the second embodiment, processes such as addition and deletion of the namespace do not take time, and the load of the device controller 42 can be reduced.
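One way to see why addition and deletion of a namespace become inexpensive is to model them as handing out and returning whole LUT chunks. The class `ChunkAllocator` below is a hypothetical sketch, not the method of the embodiment; in particular, it hands out chunks in ascending order, whereas the example above shows that any free chunk may be associated with a namespace.

```python
class ChunkAllocator:
    """Toy manager that associates LUT chunks with namespaces."""

    def __init__(self, total_chunks=16, chunk_gb=256):
        self.chunk_gb = chunk_gb
        self.free = list(range(total_chunks))   # unassigned chunk IDs
        self.namespaces = {}                    # NS_ID -> list of chunk IDs

    def add_namespace(self, ns_id, capacity_gb):
        if ns_id in self.namespaces:
            raise ValueError("namespace already exists")
        needed = -(-capacity_gb // self.chunk_gb)   # round up to whole chunks
        if needed > len(self.free):
            raise RuntimeError("not enough free LUT chunks")
        self.namespaces[ns_id] = [self.free.pop(0) for _ in range(needed)]
        return self.namespaces[ns_id]

    def delete_namespace(self, ns_id):
        # Only the chunk pointers are released; no per-address entries are touched.
        self.free.extend(self.namespaces.pop(ns_id))


allocator = ChunkAllocator()
ns0 = allocator.add_namespace(0, 768)   # three chunks
ns1 = allocator.add_namespace(1, 256)   # one chunk
allocator.delete_namespace(0)
```

In this model the cost of adding or deleting a namespace depends on the number of chunks, at most Y per namespace, rather than on the number of logical addresses.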
As described above, according to the first embodiment, it is possible to improve the performance of the memory system by reducing the load of the CPU, and reduce power consumption by eliminating unnecessary operation of the CPU. Also, according to the second embodiment, by introducing the concept of LUT chunks, processes such as addition and deletion of the namespace do not take time, and the load of the CPU can be reduced.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
This application claims the benefit of U.S. Provisional Application No. 62/395,799, filed Sep. 16, 2016, the entire contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
62395799 | Sep 2016 | US