The field of invention pertains generally to the computing sciences, and, more specifically, to a mass storage device capable of fine grained read and/or write operations.
Computing system designers are constantly considering ways to improve the design of their systems. “Efficiency” is a respectable measure of system operation, and therefore, system design. In particular, accesses to and/or movements of large amounts of un-needed information within the system are to be avoided to the extent practicable.
A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:
Here, the main memory's address space is typically viewed as being broken down into pages. A software program operates by issuing memory read requests for instructions and read/write requests for/to data on the pages that have been allocated for the software program in main memory 102. Often, a software program's footprint is larger than the number of pages that have been allocated for it in main memory 102. As such, during run-time execution of the software, pages that are not needed, or believed by the software to not be imminently needed, may be swapped out of main memory 102 and stored in non-volatile mass storage 105. Likewise, pages that are needed, or believed by the software to be imminently needed, and that are not in main memory 102 are called up from mass storage 105 and entered into main memory 102.
Unfortunately, each page can contain a significant amount of information (e.g., 4 kilo-bytes (KB)). Transferring large data amounts to/from mass storage 105 in units of pages stems from a traditional inability of mass storage devices (e.g., hard disk drives, FLASH memory devices, tape drives, etc.) to access information rapidly at small levels of granularity. For instance, in the case of a traditional disk drive, considerable amounts of time are consumed moving a read/write head over an appropriate track on a disk. Once the read/write head is situated in the correct location, large amounts of information can be accessed at relatively high speeds. Thus, by its nature, the performance of a disk drive is optimized if it is asked to read/write large amounts of data at a single track location rather than “hopping” from track to track at slow speeds only to access small amounts of data.
A similar situation exists with FLASH memory devices, as data has traditionally only been erasable in large blocks, and, moreover, as a replacement for hard disk drives, FLASH memory devices are designed to support read/write accesses in units of large amounts of data (blocks).
Main memory 102, by contrast, is traditionally implemented as a true random access memory such as dynamic random access memory (DRAM), and is capable of accessing data at much finer granularities. Typically, data is physically read/written from/to main memory 102 in cache lines which are much smaller than pages. For instance, traditionally, whereas a single page may be 4096 bytes (4KB) of data, a single cache line may only be, e.g., 64 bytes (64B) of data. Moreover, main memory 102 is typically referred to as being “byte addressable” in that data segments as small as a single byte may be written to and/or read from main memory 102 from the perspective of a processing core 101 that issues memory read/write requests. That is, the processing core 101 may issue a write request that requests only a single byte to be written into main memory 102, and/or, the processing core 101 may issue a read request that requests only a single byte of information.
Thus, traditionally, there is a great disparity between the granularity at which accesses can be made to main memory 102 versus the granularity at which accesses can be made to mass storage 105.
Importantly, computing system designers are recognizing that transferring large amounts of data to/from mass storage 105 may be too inefficient. For instance, consider a scenario where an entire page is called up from mass storage 105 and entered in main memory 102 but only a small amount of information on the page is actually utilized by its software program. Here, not only are large amounts of consumed memory space not being utilized, but also, large amounts of data were moved from mass storage 105 to main memory 102 for essentially no purpose. The presence of unneeded data in main memory 102 and the movement of large amounts of data with no end-purpose can translate into a noticeable power consumption inefficiency of the computer system 100. The problem may be particularly relevant, e.g., in the case of battery powered systems (such as handheld devices (e.g., smartphones) or laptop computers).
As such,
In various embodiments, consistent with traditional SSD operation, the SSD accesses information at "block" granularities (a page and block may be of the same size, or, e.g., one is a multiple of the other). As such, the controller 303 manages a mapping table (also referred to as an address translation table) that maps logical block addresses to physical block addresses. When a host sends a block of data to be written into the SSD 301, the host also appends a block address to the data which is referred to as a logical block address (LBA). The SSD 301 then writes the block of data into the non-volatile media 302 and associates, within the mapping table, the data's LBA with the physical location(s) within the SSD where the block's pages of data are stored. The specific physical locations are specified with a physical block address (PBA) that uniquely identifies the one or more die, plane and/or other resources within the SSD 301 where the pages are stored.
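The LBA-to-PBA translation described above can be sketched as follows. This is a minimal illustrative model, not the implementation of any particular controller; the class and field names, and the (die, plane, offset) tuple used as a PBA, are assumptions made purely for illustration.

```python
# Hypothetical sketch of the controller's mapping (address translation)
# table: host-supplied logical block addresses (LBAs) are associated with
# physical block addresses (PBAs) identifying die/plane resources in the SSD.
class MappingTable:
    def __init__(self):
        self._lba_to_pba = {}

    def record_write(self, lba, pba):
        # On a host write, associate the block's LBA with the physical
        # location(s) where the block's pages were actually stored.
        self._lba_to_pba[lba] = pba

    def translate(self, lba):
        # On a subsequent access, translate the LBA back to its PBA.
        return self._lba_to_pba[lba]

table = MappingTable()
table.record_write(lba=7, pba=("die0", "plane1", 0x40))
```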
Importantly, the controller 303 also supports finer grained data accesses such as, e.g., byte level read operations and/or byte level write operations. In the case of a byte level read operation, referring to
Here, the amount of stripped information that is included in the response may be slightly larger than the requested data. For example, the physical layer of the I/O interface may establish a minimum packet payload size that is larger than the smallest granularity at which the SSD can support read/write requests. For example, the SSD may have the ability to support byte level requests but the I/O interface has a minimum packet payload size of eight bytes. In this case, an eight byte chunk that includes the requested byte may be returned in the response 3. The host will then pass at least the requested byte up to the memory controller which will write it into the main memory.
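The widening of a byte-level request to the interface's minimum packet payload size can be sketched as below, assuming (as in the example above) an eight byte minimum payload; the function name and alignment policy are illustrative assumptions, not a specification of any real interface.

```python
# Sketch: widen a byte-level read to the I/O interface's assumed minimum
# packet payload size, returning the boundaries of the aligned chunk that
# contains the requested byte.
MIN_PAYLOAD = 8  # assumed minimum packet payload size, in bytes

def chunk_for(byte_offset):
    # Align the requested byte's offset down to a payload boundary; the
    # response carries the whole [start, end) chunk containing that byte.
    start = (byte_offset // MIN_PAYLOAD) * MIN_PAYLOAD
    return start, start + MIN_PAYLOAD

# e.g. a request for byte 13 is satisfied with the 8-byte chunk [8, 16)
```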
In the case of a byte level write operation, referring to
In various embodiments, the device driver software for the SSD 301 is enhanced to accept software commands that support smaller grained non-volatile mass storage operations as described above (here device driver software may execute on a general purpose processing core 201 or, potentially, some other processing intelligence in the system). For example, according to one embodiment, the device driver software maintains an application programming interface (API) that supports the following "ReadPartial" software command (method call) through the interface:
pData<=ReadPartial(LBA, X, Int N)
where LBA is the logical block address of the block that contains the desired read data, X is an offset from the location of the first byte in the targeted block where the desired data begins and N is the amount of requested data specified in bytes. pData is the returned read data that is written into main memory, or at least provided by the mass storage device in response to the method call.
Thus, the ReadPartial operation is capable of providing read amounts in a range of 1 to N bytes and is broader in functionality than only providing a single byte. As discussed above, the actual returned amount may differ from the actual needed amount, e.g., owing to differences between the granularities that the SSD is capable of and those of the channel(s) between the SSD and main memory. For example, in the aforementioned example where the physical layer of the I/O supports minimum packet payload sizes that are larger than the minimum granularity that the SSD can perform, the software that invokes the method call may set Int N=the minimum packet payload size.
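The ReadPartial semantics described above can be modeled with a minimal host-side sketch. The block store is simulated here as a dictionary of LBA to byte array; in a real system the request would travel through the device driver to the SSD. All names and sizes are illustrative assumptions.

```python
# Minimal model of ReadPartial(LBA, X, Int N): return N bytes starting at
# offset X within the block addressed by LBA.
BLOCK_SIZE = 4096  # assumed block size, matching the 4KB page example above
blocks = {0: bytearray(BLOCK_SIZE)}   # simulated non-volatile block store
blocks[0][100:104] = b"ABCD"          # some data of interest inside block 0

def read_partial(lba, x, n):
    # pData <= ReadPartial(LBA, X, Int N): only the requested bytes are
    # returned, not the whole block containing them.
    return bytes(blocks[lba][x:x + n])
```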
In various embodiments, the device driver's API also supports the following “WritePartial” software command (method call) through the interface:
WritePartial(LBA, X, Int N, pData)
where LBA is the logical block address of the block that contains the data to be written over, X is an offset from the location of the first byte in the targeted block where the data to be written over begins and N is the amount of data to be written specified in bytes. pData is the data to be written.
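The WritePartial semantics can be sketched the same way. Again the block store is a simulated dictionary and the names are illustrative; the point is that only pData itself, not the whole block, crosses the host/SSD channel.

```python
# Minimal model of WritePartial(LBA, X, Int N, pData): overwrite N bytes
# starting at offset X within the block addressed by LBA.
BLOCK_SIZE = 4096
blocks = {5: bytearray(BLOCK_SIZE)}   # simulated non-volatile block store

def write_partial(lba, x, n, p_data):
    # Only the N bytes of pData are transferred and written over the
    # corresponding bytes of the targeted block; the rest is untouched.
    blocks[lba][x:x + n] = p_data[:n]

write_partial(5, 200, 4, b"WXYZ")
```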
In various embodiments, the aforementioned SSD remains capable of block level accesses in which blocks to be read are specified by LBA and provided by the SSD, and, blocks to be written are provided to the SSD and written over the block having the same LBA within the SSD.
The ability to specify, in a single command, read or write data not only with granularity as to amount of data (e.g., only a single byte) but also as to start point and end point (even if large data amounts are requested, conceivably even larger than a block), corresponds to much more precise usage models and resulting efficiencies within the computing system.
In particular, if only a small amount of data is to be written/read to/from mass storage, only a corresponding small amount of data is passed between the memory controller and peripheral control hub. As compared to traditional approaches, the passing of large amounts of unneeded data is avoided thereby improving the power efficiency of the computing system. If large chunks of information that are not aligned on block boundaries are in need of a write/read operation, then, the specific write/read data (and little if anything else) is passed between the memory controller and the peripheral control hub which, likewise, at least eliminates the possibility of passing large amounts of un-needed data within the system (e.g., in the case where affected data starts near the end of a first block and ends near the beginning of a last block).
In various system implementations, the software that oversees the functionality of an improved SSD as described herein and the page management of system memory (e.g., operating system, virtual machine monitor, operating system instance, application software, etc.) does not apply the granularity of the improved operations described herein to a mass storage device until the pages for all data affected by such improved operations are resident in main memory.
That is, for example, when an application needs instructions or data on a page that is not currently resident in main memory, the page is called up from mass storage as per traditional methods. Here, again, a page may be equal in size to a block of data within the SSD, or, one may be a multiple of the other (in the case where a block is a multiple of more than one page, generally, multiple pages are passed between memory and storage as a single unit of transfer). Once the page is resident in main memory, however, the software can begin writing/reading finer grained accesses to data that is on that page into mass storage.
Here, although software is typically executed by accessing information on pages of data in main memory, operations to mass storage for information of pages that are already in main memory can also happen. An example is, e.g., transactional software (e.g., two-phase commit protocol software) in which certain data that has been calculated and is retained in main memory needs to be "committed" to mass storage for, e.g., permanent keeping because of mass storage's non-volatile characteristic. In this case, data on a page in main memory is written back to mass storage. Here, with the improved operations described herein, if such data is only a small amount (e.g., just a byte) then just the small amount can be written back from main memory to mass storage, whereas, as per traditional operations, the entire page that the data is on needs to be written back from main memory to mass storage.
A reverse scenario is also possible in the case of a read where, e.g., permanent data (e.g., look-up table data) is written over when being processed from a page in main memory. If the software needs the original data again, it can only fetch it from mass storage. Here, if the data is, e.g., as small as a byte, such data can be called up from mass storage using the improved operations described herein, whereas, according to traditional operation, the data's entire page would need to be called up from mass storage.
For operations that affect multiple pages (e.g., a large write of data that extends from the middle of a first page to the middle of a second page), each of the affected pages should be present in memory in order to use an improved write function to mass storage that only includes the write data itself and not the data of all the pages on which the write data resides. Similarly, in the case of a large read whose read data spans multiple pages, generally, the pages having the affected data should already reside in main memory. However, in the case where a large read includes one or more whole pages in the middle of the requested data amount, only the pages having the beginning and end portions of the requested data should be in memory in order to execute the command. Conceivably, the intervening whole pages in between the pages having the beginning and end portions can be called up initially through the execution of the improved command (they do not need to be in main memory as a pre-requisite to execution of the improved read operation).
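The residency rule just described can be sketched as a small calculation. This is an illustrative model only: a 4KB page size is assumed, and the rule encoded is the one above for a large read, i.e., only the pages holding the partial beginning and end portions of the requested span must already be resident, since whole pages in the middle can be called up by the read itself.

```python
# Sketch: which pages does a byte span [start, end) touch, and which of
# those must already be resident in main memory for an improved read?
PAGE = 4096  # assumed page size

def pages_touched(start, end):
    # All pages overlapped by the span [start, end).
    return list(range(start // PAGE, (end - 1) // PAGE + 1))

def pages_required_for_read(start, end):
    touched = pages_touched(start, end)
    required = set()
    if start % PAGE:           # span begins mid-page: partial first page
        required.add(touched[0])
    if end % PAGE:             # span ends mid-page: partial last page
        required.add(touched[-1])
    return sorted(required)

# e.g. a read of bytes 2048..10240 touches pages 0, 1 and 2, but only the
# partial first and last pages (0 and 2) must already be resident.
```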
In other embodiments, rather than being posed as new/separate commands that differ, e.g., in input parameter syntax, from legacy commands that support block based accesses, the above described method calls are merged with legacy operations so that only one command syntax is used for both fine grained and legacy method calls. More specifically, for instance, rather than being expressed as a "partial" read, the generic read method call is expressed as:
pData<=Read(LBA, X, Int N)
where, for partial reads, the input parameters LBA, X and Int N are used as described above for ReadPartial, but for a legacy read command X=0 and Int N=the block size. Similarly, rather than being expressed as a "partial" write, the generic write method call is expressed as:
Write(LBA, X, Int N, pData)
where, for partial writes, the input parameters LBA, X and Int N are used as described above for WritePartial, but for a legacy write command X=0 and Int N=the block size.
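The unified syntax can be illustrated with a single read routine serving both cases; a legacy full-block read is simply the call with X=0 and Int N=the block size. The block store is again a simulated dictionary, and all names are illustrative assumptions.

```python
# Sketch of the merged syntax: one Read(LBA, X, Int N) call serves both
# fine grained and legacy block reads.
BLOCK_SIZE = 4096
blocks = {3: bytes(range(256)) * 16}   # simulated 4KB block at LBA 3

def read(lba, x, n):
    # Fine grained: any X and N.  Legacy block read: X=0, N=BLOCK_SIZE.
    return blocks[lba][x:x + n]

one_byte   = read(3, 42, 1)          # fine grained single-byte read
full_block = read(3, 0, BLOCK_SIZE)  # legacy whole-block read
```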
In various embodiments, the aforementioned mass storage device may be fronted, internally or externally, by a mass storage cache ("disk cache"). A cache, as is known in the art, is used to keep frequently accessed blocks in a faster storage medium (such as DRAM) so that the performance of the mass storage device may be observed to be faster than the storage device's underlying, deeper non-volatile mass storage medium. Here, if the cache is large enough for caching frequently accessed blocks, some portion of the cache or some additional smaller extended cache region may be used to cache frequently accessed items of small granularity. For instance, a mere 8B cache can cache up to eight bytes of information that are the frequent target of smaller grained read/write operations (e.g., if the mass storage device frequently receives read/write accesses for eight, different, non-contiguous bytes (as articulated with eight different input parameter sets in their respective method calls), the 8B cache can keep each of these bytes).
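An 8B small-granularity cache of this kind can be modeled as below. The keying by (LBA, offset) and the FIFO eviction policy are illustrative assumptions made purely for the sketch; the embodiments above do not prescribe any particular organization or replacement policy.

```python
from collections import OrderedDict

# Sketch: a tiny cache holding up to eight frequently accessed,
# non-contiguous bytes, each identified by its (LBA, offset) pair.
class ByteCache:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()

    def put(self, lba, offset, value):
        if (lba, offset) not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest entry (FIFO)
        self.entries[(lba, offset)] = value

    def get(self, lba, offset):
        # Returns the cached byte, or None on a miss.
        return self.entries.get((lba, offset))
```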
An applications processor or multi-core processor 750 may include one or more general purpose processing cores 715 within its CPU 701, one or more graphical processing units 716, a memory management function 717 (e.g., a memory controller) and an I/O control function 718. The general purpose processing cores 715 typically execute the operating system and application software of the computing system. The graphics processing unit 716 typically executes graphics intensive functions to, e.g., generate graphics information that is presented on the display 703. The memory control function 717 interfaces with the system memory 702 to write/read data to/from system memory 702. The power management control unit 712 generally controls the power consumption of the system 700.
Each of the touchscreen display 703, the communication interfaces 704-707, the GPS interface 708, the sensors 709, the camera(s) 710, and the speaker/microphone codec 713, 714 all can be viewed as various forms of I/O (input and/or output) relative to the overall computing system including, where appropriate, an integrated peripheral device as well (e.g., the one or more cameras 710). Depending on implementation, various ones of these I/O components may be integrated on the applications processor/multi-core processor 750 or may be located off the die or outside the package of the applications processor/multi-core processor 750.
The computing system also includes non-volatile storage 720 which may be the mass storage component of the system. Here, for example, the mass storage may be composed of one or more SSDs that are composed of FLASH memory chips whose multi-bit storage cells are programmed at different storage densities depending on SSD capacity utilization as described at length above.
Embodiments of the invention may include various processes as set forth above. The processes may be embodied in machine-executable instructions. The instructions can be used to cause a general-purpose or special-purpose processor to perform certain processes. Alternatively, these processes may be performed by specific/custom hardware components that contain hardwired logic circuitry or programmable logic circuitry (e.g., FPGA, PLD) for performing the processes, or by any combination of programmed computer components and custom hardware components.
Elements of the present invention may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, FLASH memory, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media or other type of media/machine-readable medium suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.