Fail-Safe Write Back Caching Mode Device Driver For Non Volatile Storage Device
Computing systems typically include system memory (or main memory) that contains data and program code of the software that the system's processor(s) are currently executing. Traditionally, non volatile storage (such as a disk drive) is used to store the program code when the system is powered off. Computer scientists are frequently trying to squeeze more performance out of non volatile storage (because it is usually slower than system memory) and reduce system memory power consumption.
A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:
In a common application the storage device 101 is “block” based, which means units of data are read from the storage device 101 and written into the storage device 101 in larger chunks (e.g., “blocks”, “sectors”, “pages”) than nominal accesses to system memory (or “main” memory), which are typically written and read in smaller sized data units (e.g., byte addressable cache lines).
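The granularity difference can be illustrated with a minimal sketch. The 4096 byte block size and 64 byte cache line size are assumptions chosen for illustration only (the description does not fix either value), and the function names are hypothetical.

```python
BLOCK_SIZE = 4096      # assumed block size in bytes (illustrative only)
CACHE_LINE_SIZE = 64   # assumed cache line size in bytes (illustrative only)


def block_index(byte_offset: int) -> int:
    """Return the number of the block that contains the given byte offset."""
    return byte_offset // BLOCK_SIZE


def bytes_transferred(byte_offset: int, length: int, unit: int) -> int:
    """Bytes moved when [byte_offset, byte_offset + length) is accessed in whole units."""
    first = byte_offset // unit
    last = (byte_offset + length - 1) // unit
    return (last - first + 1) * unit
```

Under these assumed sizes, touching 64 bytes through a block based interface moves a full block, whereas a cache line based access moves only the cache lines that the range overlaps.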
A problem is that traditional block based storage devices (e.g., hard disk drives, solid state drives (SSDs)) tend to be slow. As such, referring to
As observed in
The caching layer 105, as implemented by the filter driver 104, is typically a block based storage resource. That is, units of information are written to and read from the caching layer 105 in block units. Even in the case where the caching layer 105 is implemented as a section of DRAM system memory (in which case the filter driver 104 is referred to as a “DRAM filter driver”), data is written to and read from the caching layer 105 in units of blocks (e.g., by aggregating multiple system memory cache lines into a block). In cases where the cache 105 is implemented in system memory, the filter driver 104 is allocated a region of system memory which the filter driver 104 uses as the cache 105.
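The aggregation of system memory cache lines into a block might be sketched as follows. This is an illustration under assumed sizes (64 byte cache lines, 4096 byte blocks) with hypothetical helper names, not the filter driver's actual code.

```python
from typing import List

BLOCK_SIZE = 4096      # assumed block size in bytes (illustrative only)
CACHE_LINE_SIZE = 64   # assumed cache line size in bytes (illustrative only)
LINES_PER_BLOCK = BLOCK_SIZE // CACHE_LINE_SIZE


def assemble_block(cache_lines: List[bytes]) -> bytes:
    """Concatenate exactly one block's worth of cache lines into a block."""
    if len(cache_lines) != LINES_PER_BLOCK:
        raise ValueError("a block needs exactly LINES_PER_BLOCK cache lines")
    if any(len(line) != CACHE_LINE_SIZE for line in cache_lines):
        raise ValueError("every cache line must be CACHE_LINE_SIZE bytes")
    return b"".join(cache_lines)


def split_block(block: bytes) -> List[bytes]:
    """Split a block back into its constituent cache lines."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must be BLOCK_SIZE bytes")
    return [block[i:i + CACHE_LINE_SIZE]
            for i in range(0, BLOCK_SIZE, CACHE_LINE_SIZE)]
```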
As can be seen in
With respect to data consistency issues, in the case of a DRAM filter driver, because of the volatile nature of the DRAM caching layer 105, a “write-through” cache is typically implemented. In the case of a write-through cache, as observed in
Additionally, more traffic is introduced internally within the system (here, traffic is understood to be the various flows of information within the system). That is, the write-through process 112 not only introduces more traffic within the system but also requires the filter driver 104 to include additional complex code in order to set up, arrange and control the write-through caching system. Further still, even if write-through caching is not adopted, again in the case of a DRAM filter driver, because of the volatile nature of DRAM, the content of the caching layer 105 will need to be “dumped” 113 into the low level storage 101 of a system storage hierarchy upon a system power down cycle to preserve the cached information. The problem of additional internal traffic has been handled by reducing the effectiveness or “enjoyment” of the cache for write operations. That is, in some configurations, write operations are denied usage of the cache and the cache is only used for read operations.
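The extra traffic of a write-through scheme can be made concrete with a minimal sketch. The class and attribute names are hypothetical, and dictionaries stand in for both the volatile caching layer and the low level storage device; the counter simply records how many block writes reach storage.

```python
class BackingStore:
    """Stands in for the low level storage device; counts block writes."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, lba, data):
        self.blocks[lba] = data
        self.writes += 1

    def read(self, lba):
        return self.blocks.get(lba)


class WriteThroughCache:
    """Volatile cache in which every write is duplicated to the backing store."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def write(self, lba, data):
        self.cache[lba] = data
        self.store.write(lba, data)  # duplicate copy written through

    def read(self, lba):
        if lba in self.cache:
            return self.cache[lba]
        data = self.store.read(lba)
        if data is not None:
            self.cache[lba] = data  # fill the cache on a read miss
        return data
```

In this sketch, three successive writes to the same block produce three writes to storage, even though only the last value is ever needed there.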
Such emerging non volatile random access memory technologies typically have some combination of the following: 1) higher storage densities than DRAM (e.g., by being constructed in three dimensional (3D), e.g., crosspoint or otherwise, circuit structures); 2) lower power consumption densities than DRAM (e.g., for a same clock speed); and/or 3) access latency that is slower than DRAM yet still faster than traditional non-volatile memory technologies such as FLASH. The latter characteristic in particular permits the emerging non volatile memory technology to be used in a main system memory role rather than a low level storage role of a system storage hierarchy (which is the traditional architectural location of non volatile storage (other than BIOS/firmware)).
Thus, even though the lower level 214 is comprised of a non volatile memory, in various embodiments at least a portion of the non volatile memory acts as a true system memory in that it supports finer grained data accesses (e.g., byte addressable cache lines) rather than the larger block based accesses associated with traditional, low level non volatile storage of a system storage hierarchy, and/or otherwise acts as an addressable memory that the program code being executed by processor(s) of the CPU operates out of.
The upper layer 213 may act as a cache for the lower layer 214 or as a level of system memory having a higher priority than the lower layer 214 (e.g., where more time sensitive (e.g., “real time”) data is kept). In the former case (upper layer 213 acts as a cache for the lower layer 214), the upper layer 213 may not have its own uniquely addressable system memory space (unique memory addresses are assigned to the lower level 214). In the latter case (upper layer 213 acts as a higher priority system memory level), both the upper and lower layers 213, 214 may have their own separate uniquely addressable system memory space. In various embodiments the upper layer 213 is comprised of a DRAM based memory.
The presence of a non volatile level 214 of system memory opens up a wealth of possible system performance improvements and novel internal system workings and/or processes.
Here, because the caching layer 305 is non-volatile, the need to synchronize a data block in cache 305 with any copy of itself (if any) in the low level storage device 301 of a system storage hierarchy in real time is greatly reduced. Should the system suffer a sudden power failure the data blocks in cache 305 will be preserved because of the non-volatile nature of the cache 305. As such, the motivation for a write-through caching scheme is largely diminished. This frees the filter driver 304 and the overall system of the costly internal write-through processes associated with the prior art approach of
Because of the lack of motivation to implement a write-through caching process, the filter driver 304 may configure itself (e.g., as a default) in a non write-through mode (e.g., a write-back mode as discussed further below). Here, a user may be specifically informed by the filter driver 304 that write-through caching will not be implemented unless the user specifically requests it. For example, the user may be informed by the filter driver 304 that a write-back cache will be implemented and/or that write-through caching is not being implemented. As such, whereas prior art solutions may have only used the cache for read operations to avoid write-through penalties for writes, with the new system there is no write-through penalty for writes, and writes are free to use the cache as much as reads.
In the case of a write-back cache, no duplicate copy of a data block that is written 311 to cache 305 is written back to the storage device 301. Thus, in an embodiment, a filter driver 304 that implements a caching layer 305 within a non volatile region of system memory may default to, or be hard-coded into, a write-back mode rather than a write-through mode. To the extent the filter driver 304 may offer write-through mode, in an embodiment, a user has to affirmatively select it over and above a (e.g., default, preferred or suggested) write-back mode.
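The mode selection described above might be sketched as follows. All names are hypothetical. The fallback to write-through for a volatile caching layer is an assumption drawn from the DRAM filter driver discussion, and the notice text is illustrative rather than any actual driver message.

```python
from enum import Enum


class CacheMode(Enum):
    WRITE_BACK = "write-back"
    WRITE_THROUGH = "write-through"


def select_cache_mode(cache_is_non_volatile: bool,
                      user_requested_write_through: bool = False) -> CacheMode:
    """Default to write-back when the caching layer is non volatile."""
    if not cache_is_non_volatile:
        # Assumption: a volatile (e.g., DRAM) cache keeps the write-through scheme.
        return CacheMode.WRITE_THROUGH
    if user_requested_write_through:
        # Write-through is used only if the user affirmatively selects it.
        return CacheMode.WRITE_THROUGH
    return CacheMode.WRITE_BACK


def user_notice(mode: CacheMode) -> str:
    """Build the message that informs the user which mode is in effect."""
    if mode is CacheMode.WRITE_BACK:
        return "A write-back cache will be used; write-through caching is not enabled."
    return "Write-through caching is enabled."
```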
The implementation of the write-back mode may result in an immediate improvement in performance from the perspective of the user 303 relative to the prior art solution of
Additionally, also as observed in
Thus, as a basis of comparison, the prior art approach of
By contrast, the improved approach of
The approach of
Here, the device driver 402 includes caching functionality code 406 (including, e.g., caching inclusion/eviction policy code). The caching functionality code 406 includes a mode of operation in which blocks of information that are written to cache 405 are not automatically written through to low level storage 401 of a system storage hierarchy, nor are blocks of information in cache 405 “dumped” into low level storage 401 of a system storage hierarchy upon a system power down cycle. As such, only a single item of program code (the device driver 402) needs to be installed into the system in order to effect system memory level caching for a storage device 401 that employs a write-back caching mode (and not write-through caching) and yet is still a power fail-safe solution.
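A minimal sketch of this mode of operation follows. The class name is hypothetical, and a dictionary that outlives the driver object stands in for the non volatile region of system memory. Writing a block to storage when it is evicted is an assumption about the eviction policy, which the description does not spell out.

```python
class WriteBackDriver:
    """Device driver sketch with a write-back cache held in non volatile memory."""

    def __init__(self, nv_cache: dict, storage: dict):
        self.nv_cache = nv_cache    # stands in for the non volatile cache region
        self.storage = storage      # stands in for the low level storage device
        self.storage_writes = 0

    def write(self, lba, data):
        # The block is placed in the cache only; nothing is written through.
        self.nv_cache[lba] = data

    def read(self, lba):
        if lba in self.nv_cache:
            return self.nv_cache[lba]
        return self.storage.get(lba)

    def evict(self, lba):
        # Assumed policy: a block reaches storage only when it is evicted.
        data = self.nv_cache.pop(lba)
        self.storage[lba] = data
        self.storage_writes += 1
```

A power cycle can be simulated by discarding the driver object and constructing a new one over the same `nv_cache` dictionary: the cached blocks remain readable even though no write-through or power-down dump took place.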
In any of the embodiments described above with respect to
An applications processor or multi-core processor 650 may include one or more general purpose processing cores 615 within its CPU 601, one or more graphical processing units 616, a memory management function 617 (e.g., a memory controller) and an I/O control function 618. The general purpose processing cores 615 typically execute the operating system and application software of the computing system. The graphics processing units 616 typically execute graphics intensive functions to, e.g., generate graphics information that is presented on the display 603. The memory control function 617 interfaces with the system memory 602. The system memory 602 may be a multi-level system memory such as the multi-level system memory 212 observed in
Each of the touchscreen display 603, the communication interfaces 604-607, the GPS interface 608, the sensors 609, the camera 610, and the speaker/microphone codec 613, 614 can be viewed as a form of I/O (input and/or output) relative to the overall computing system including, where appropriate, an integrated peripheral device as well (e.g., the camera 610). Depending on implementation, various ones of these I/O components may be integrated on the applications processor/multi-core processor 650 or may be located off the die or outside the package of the applications processor/multi-core processor 650.
Embodiments of the invention may include various processes as set forth above. The processes may be embodied in machine-executable instructions. The instructions can be used to cause a general-purpose or special-purpose processor to perform certain processes. Alternatively, these processes may be performed by specific hardware components that contain hardwired logic for performing the processes, or by any combination of programmed computer components and custom hardware components.
Elements of the present invention may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, FLASH memory, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media or other types of media/machine-readable medium suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.