The present disclosure generally relates to the field of electronics. More particularly, some embodiments generally relate to improving storage cache performance by using compressibility of the data as a criterion for cache insertion or allocation.
Generally, data stored in a cache can be accessed many times faster than the same data stored in other types of memory. Generally, as the size of a cache media is increased, the likelihood that data is found in the cache increases (e.g., resulting in a better hit rate). However, growing the size of the cache adds to overall system costs.
The detailed description is provided with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIGS. 3A1, 3A2, 3B1, 3B2, and 3C illustrate flow diagrams of methods according to some embodiments.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of various embodiments. However, various embodiments may be practiced without the specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments. Further, various aspects of embodiments may be performed using various means, such as integrated semiconductor circuits (“hardware”), computer-readable instructions organized into one or more programs (“software”), or some combination of hardware and software. For the purposes of this disclosure reference to “logic” shall mean either hardware, software, firmware, or some combination thereof.
As discussed above, utilizing a cache can be beneficial to performance. To this end, storage caches are widely used. For example, Solid State Drives (SSDs) may be used as the cache media. In general, all things being equal, the hit rate of the cache will grow as the size of the cache media grows. Therefore, some cache implementations using SSDs may use hardware compression in the SSD to compress data so that more data fits into the cache, resulting in an improved cache hit rate.
To this end, some embodiments relate to improving storage cache performance by using compressibility of the data as a criterion for cache insertion or allocation. To efficiently use a cache, a decision is made whether a piece of data should be cached (or evicted from the cache). This decision (also referred to herein as “cache insertion” or “cache allocation”) is aimed at ensuring that the data being cached is likely to be accessed in the (e.g., relatively near) future and that the limited space in the cache media is only used for frequently accessed data. Hence, whether some piece of data is cached (or evicted from the cache) can be a critical decision in cache utilization efficiency.
More specifically, one embodiment improves the cache hit rate of storage caches that utilize data compressing non-volatile memory (e.g., SSDs) by giving preference to data (e.g., in a cache line or other granularity of cache storage media) that has higher compressibility as a factor in cache policy decisions (or when data is cached or evicted from the cache). Previously, this was not possible because there was no way for the cache policy logic/software in the host to know the compressibility of the data on a cache line by cache line basis (or other cache granularity). Part of the optimization (that can be implemented in various non-volatile memory such as those discussed herein) includes a feature in the compression process where the host logic/software is explicitly given information regarding the compressibility of each Input/Output (IO) data, e.g., as it is written (or prior to writing the data) to the cache media. Therefore, cache policy logic/software in the host (or a server) can explicitly know the compressibility of each cache line of data, even though the actual compression is performed by hardware in the non-volatile memory device (e.g., SSD) itself. The cache policy logic/software can then give preference to data that is more compressible, thus increasing the overall compressibility of the data in the cache. Hence, the cache can hold more cache lines than it would have if compressibility were not used as a factor, and therefore, all other factors being equal, the hit rate of the cache will improve. Thus, the compressibility of the data in a cache line is used to augment traditional factors (sequentiality, process ID, size, and file type, to name a few) used to decide whether or not to move storage data into the cache or remove storage data from the cache.
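The insertion decision described above can be illustrated with a minimal Python sketch. The function names, the linear weighting scheme, and the threshold value are illustrative assumptions for exposition only; the disclosure does not specify a particular scoring formula, only that device-reported compressibility augments traditional policy factors.

```python
def compressibility(uncompressed_size: int, compressed_size: int) -> float:
    """Fraction of space saved by compression, as could be derived from the
    sizes reported by the device (0.0 = incompressible, approaching 1.0 =
    highly compressible)."""
    return 1.0 - compressed_size / uncompressed_size

def should_insert(uncompressed_size: int, compressed_size: int,
                  traditional_score: float, weight: float = 0.5,
                  threshold: float = 0.5) -> bool:
    """Combine a traditional cache-policy score (reflecting factors such as
    sequentiality, process ID, size, and file type) with the compressibility
    of this IO, giving more compressible data a better chance of insertion."""
    score = ((1.0 - weight) * traditional_score
             + weight * compressibility(uncompressed_size, compressed_size))
    return score >= threshold
```

For example, data that compresses from 4096 bytes to 1024 bytes (compressibility 0.75) passes the hypothetical threshold even with a middling traditional score, whereas incompressible data with a weak traditional score does not.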
Furthermore, even though some embodiments are discussed with reference to SSDs (e.g., including NAND and/or NOR type of memory cells), embodiments are not limited to SSDs and non-volatile memory of any type (in a format other than SSD but still usable for storage) may be used. The storage media (whether used in SSD format or otherwise) can be any type of storage media including, for example, one or more of: nanowire memory, Ferro-electric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM), flash memory, Spin Torque Transfer Random Access Memory (STTRAM), Resistive Random Access Memory, byte addressable 3-Dimensional Cross Point Memory, PCM (Phase Change Memory), etc. Also, any type of Random Access Memory (RAM) such as Dynamic RAM (DRAM), backed by battery or capacitance to retain the data, may be used. Hence, even volatile memory capable of retaining data during power failure or power disruption (e.g., backed by battery or capacitance) may be used for the storage cache.
The techniques discussed herein may be provided in various computing systems (e.g., including a non-mobile computing device such as a desktop, workstation, server, rack system, etc. and a mobile computing device such as a smartphone, tablet, UMPC (Ultra-Mobile Personal Computer), laptop computer, Ultrabook™ computing device, smart watch, smart glasses, smart bracelet, etc.), including those discussed with reference to
In an embodiment, the processor 102-1 may include one or more processor cores 106-1 through 106-M (referred to herein as “cores 106,” or more generally as “core 106”), a processor cache 108 (which may be a shared cache or a private cache in various embodiments), and/or a router 110. The processor cores 106 may be implemented on a single integrated circuit (IC) chip. Moreover, the chip may include one or more shared and/or private caches (such as processor cache 108), buses or interconnections (such as a bus or interconnection 112), logic 120, memory controllers (such as those discussed with reference to
In one embodiment, the router 110 may be used to communicate between various components of the processor 102-1 and/or system 100. Moreover, the processor 102-1 may include more than one router 110. Furthermore, the multitude of routers 110 may be in communication to enable data routing between various components inside or outside of the processor 102-1.
The processor cache 108 may store data (e.g., including instructions) that are utilized by one or more components of the processor 102-1, such as the cores 106. For example, the processor cache 108 may locally cache data stored in a memory 114 for faster access by the components of the processor 102. As shown in
As shown in
System 100 also includes Non-Volatile (NV) storage (or Non-Volatile Memory (NVM)) device such as an SSD 130 coupled to the interconnect 104 via SSD controller logic 125. Hence, logic 125 may control access by various components of system 100 to the SSD 130. Furthermore, even though logic 125 is shown to be directly coupled to the interconnection 104 in
As shown in
Furthermore, logic 125 and/or SSD 130 may be coupled to one or more sensors (not shown) to receive information (e.g., in the form of one or more bits or signals) to indicate the status of or values detected by the one or more sensors. These sensor(s) may be provided proximate to components of system 100 (or other computing systems discussed herein such as those discussed with reference to other figures including 4-6, for example), including the cores 106, interconnections 104 or 112, components outside of the processor 102, SSD 130, SSD bus, SATA bus, logic 125, etc., to sense variations in various factors affecting power/thermal behavior of the system/platform, such as temperature, operating frequency, operating voltage, power consumption, and/or inter-core communication activity, etc.
As illustrated in
FIGS. 3A1 to 3C illustrate flow diagrams of methods according to some embodiments. More particularly, FIGS. 3A1 and 3A2 illustrate methods to address two types of read misses. FIGS. 3B1 and 3B2 illustrate methods to address two types of write misses.
Referring to
Referring to
Referring to
Referring to
Furthermore, the insertion decision would be yes/no for the data currently being read or written. The deletion would be made based on factors like LRU (Least Recently Used) plus compressibility information and would be in response to the need for space, and in this case, logic would search for the “Best” cache line to delete. In various embodiments, the data may be cached in a dedicated cache (not shown) and/or in NVM (such as memory cells 292, SSD 130, etc.). Also, the methods of FIGS. 3A1-3C may be performed in response to a read or a write operation directed at a backing store (such as the backing store 180, the disk drive 428 of
Accordingly, an embodiment improves the effectiveness of storage caches by using the compressibility of the data in a “line” of the cache to be a factor in the algorithms/policies deciding when to insert/allocate/retain a line in the cache and when to delete/evict a line from the cache. Preference can be given to cache lines that are more compressible; thus, increasing the number of lines the cache holds. Hence, the hit rate, and the overall performance of the storage subsystem will improve. In some embodiments, there is an assumption that there is either no correlation or positive correlation between compressibility and the likelihood of the data being needed in the near future.
In some implementations, when queried, NVM (e.g., SSD 130 and/or logic 160) returns a size that grows/shrinks in proportion to the aggregate compressibility of all the data on the media. When the size grows, additional cache lines can be added to the cache. When the size shrinks, lines are removed from the cache. Hence, some embodiments provide an improved implementation because, by using the compressibility of an individual cache line as a criterion, preference can be given to more compressible cache lines as a factor in cache insertion/retention and/or deletion policies, and thus the overall compressibility of the aggregate data can be improved, resulting in more cache lines being stored.
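The grow/shrink behavior described above can be sketched in a few lines of Python. The cache-line granularity of 64 KiB and the function name are assumptions made for illustration; the disclosure does not fix a line size.

```python
LINE_SIZE = 64 * 1024  # bytes per cache line (illustrative granularity)

def adjust_capacity(current_lines: int, reported_media_size: int) -> int:
    """Given the effective size the device reports when queried (which grows
    with the aggregate compressibility of the data on the media), return the
    change in the number of lines the cache may hold. A positive result means
    lines can be added; a negative result means lines must be removed."""
    new_capacity = reported_media_size // LINE_SIZE
    return new_capacity - current_lines
```

For instance, if a cache currently holds 1000 lines and the device reports an effective size of 70,000,000 bytes, roughly 68 more 64 KiB lines can be added; if the reported size later shrinks, the same function yields a negative delta and that many lines are evicted.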
Moreover, in an embodiment, host caching policies (e.g., implemented in processors 102/402/502/620/630 of
In an embodiment, one or more of the processors 402 may be the same or similar to the processors 102 of
A chipset 406 may also communicate with the interconnection network 404. The chipset 406 may include a graphics and memory control hub (GMCH) 408. The GMCH 408 may include a memory controller 410 (which may be the same or similar to the memory controller 120 of
The GMCH 408 may also include a graphics interface 414 that communicates with a graphics accelerator 416. In one embodiment, the graphics interface 414 may communicate with the graphics accelerator 416 via an accelerated graphics port (AGP) or Peripheral Component Interconnect (PCI) (or PCI express (PCIe) interface). In an embodiment, a display 417 (such as a flat panel display, touch screen, etc.) may communicate with the graphics interface 414 through, for example, a signal converter that translates a digital representation of an image stored in a storage device such as video memory or system memory into display signals that are interpreted and displayed by the display. The display signals produced by the display device may pass through various control devices before being interpreted by and subsequently displayed on the display 417.
A hub interface 418 may allow the GMCH 408 and an input/output control hub (ICH) 420 to communicate. The ICH 420 may provide an interface to I/O devices that communicate with the computing system 400. The ICH 420 may communicate with a bus 422 through a peripheral bridge (or controller) 424, such as a peripheral component interconnect (PCI) bridge, a universal serial bus (USB) controller, or other types of peripheral bridges or controllers. The bridge 424 may provide a data path between the CPU 402 and peripheral devices. Other types of topologies may be utilized. Also, multiple buses may communicate with the ICH 420, e.g., through multiple bridges or controllers. Moreover, other peripherals in communication with the ICH 420 may include, in various embodiments, integrated drive electronics (IDE) or small computer system interface (SCSI) hard drive(s), USB port(s), a keyboard, a mouse, parallel port(s), serial port(s), floppy disk drive(s), digital output support (e.g., digital video interface (DVI)), or other devices.
The bus 422 may communicate with an audio device 426, one or more disk drive(s) 428, and a network interface device 430 (which is in communication with the computer network 403, e.g., via a wired or wireless interface). As shown, the network interface device 430 may be coupled to an antenna 431 to wirelessly (e.g., via an Institute of Electrical and Electronics Engineers (IEEE) 802.11 interface (including IEEE 802.11a/b/g/n/ac, etc.), cellular interface, 3G, 4G, LTE, etc.) communicate with the network 403. Other devices may communicate via the bus 422. Also, various components (such as the network interface device 430) may communicate with the GMCH 408 in some embodiments. In addition, the processor 402 and the GMCH 408 may be combined to form a single chip. Furthermore, the graphics accelerator 416 may be included within the GMCH 408 in other embodiments.
Furthermore, the computing system 400 may include volatile and/or nonvolatile memory (or storage). For example, nonvolatile memory may include one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a disk drive (e.g., 428), a floppy disk, a compact disk ROM (CD-ROM), a digital versatile disk (DVD), flash memory, a magneto-optical disk, or other types of nonvolatile machine-readable media that are capable of storing electronic data (e.g., including instructions).
As illustrated in
In an embodiment, the processors 502 and 504 may be one of the processors 402 discussed with reference to
In one embodiment, one or more of the cores 106 and/or processor cache 108 of
The chipset 520 may communicate with a bus 540 using a PtP interface circuit 541. The bus 540 may have one or more devices that communicate with it, such as a bus bridge 542 and I/O devices 543. Via a bus 544, the bus bridge 542 may communicate with other devices such as a keyboard/mouse 545, communication devices 546 (such as modems, network interface devices, or other communication devices that may communicate with the computer network 403, as discussed with reference to network interface device 430 for example, including via antenna 431), audio I/O device, and/or a data storage device 548. The data storage device 548 may store code 549 that may be executed by the processors 502 and/or 504.
In some embodiments, one or more of the components discussed herein can be embodied as a System On Chip (SOC) device.
As illustrated in
The I/O interface 640 may be coupled to one or more I/O devices 670, e.g., via an interconnect and/or bus such as discussed herein with reference to other figures. I/O device(s) 670 may include one or more of a keyboard, a mouse, a touchpad, a display, an image/video capture device (such as a camera or camcorder/video recorder), a touch screen, a speaker, or the like. Furthermore, SOC package 602 may include/integrate the logic 125 in an embodiment. Alternatively, the logic 125 may be provided outside of the SOC package 602 (i.e., as a discrete logic).
The following examples pertain to further embodiments. Example 1 includes an apparatus comprising: memory to store one or more cache lines corresponding to a compressed version of data in response to a determination that the data is compressible; and logic to determine whether the one or more cache lines are to be retained or inserted in the memory based at least in part on an indication of compressibility of the data. Example 2 includes the apparatus of example 1, wherein the one or more cache lines are to be stored in the memory prior to the determination of whether the one or more cache lines are to be retained in the memory. Example 3 includes the apparatus of example 1, wherein the one or more cache lines are to be stored in the memory after the determination of whether the one or more cache lines are to be retained in the memory. Example 4 includes the apparatus of example 1, comprising logic to determine whether to remove the one or more cache lines. Example 5 includes the apparatus of example 1, comprising logic to determine whether to remove the one or more cache lines based at least in part on the indication of compressibility of the data. Example 6 includes the apparatus of example 1, wherein the compressibility of the data is to be determined based at least in part on a size of an uncompressed version of the data and a size of the compressed version of the data. Example 7 includes the apparatus of example 1, wherein the memory is to include non-volatile memory comprising one of: nanowire memory, Ferro-electric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM), flash memory, Spin Torque Transfer Random Access Memory (STTRAM), Resistive Random Access Memory, Phase Change Memory (PCM), NAND, 3-Dimensional NAND, and byte addressable 3-Dimensional Cross Point Memory. Example 8 includes the apparatus of example 1, wherein an SSD is to comprise the memory and the logic. 
Example 9 includes the apparatus of example 1, wherein the memory is to store uncompressed data.
Example 10 includes a method comprising: storing one or more cache lines, corresponding to a compressed version of data, in memory in response to a determination that the data is compressible; and determining whether the one or more cache lines are to be retained or inserted in the memory based at least in part on an indication of compressibility of the data. Example 11 includes the method of example 10, further comprising storing the one or more cache lines in the memory prior to the determination of whether the one or more cache lines are to be retained in the memory. Example 12 includes the method of example 10, further comprising storing the one or more cache lines in the memory after the determination of whether the one or more cache lines are to be retained in the memory. Example 13 includes the method of example 10, further comprising determining whether to remove the one or more cache lines. Example 14 includes the method of example 10, further comprising determining whether to remove the one or more cache lines based at least in part on the indication of compressibility of the data. Example 15 includes the method of example 10, further comprising determining the compressibility of the data based at least in part on a size of an uncompressed version of the data and a size of the compressed version of the data. Example 16 includes the method of example 10, further comprising storing uncompressed data in the memory. Example 17 includes the method of example 10, wherein the memory includes non-volatile memory comprising one of: nanowire memory, Ferro-electric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM), flash memory, Spin Torque Transfer Random Access Memory (STTRAM), Resistive Random Access Memory, Phase Change Memory (PCM), NAND, 3-Dimensional NAND, and byte addressable 3-Dimensional Cross Point Memory.
Example 18 includes a system comprising: memory; and at least one processor core to access the memory; the memory to store one or more cache lines corresponding to a compressed version of data in response to a determination that the data is compressible; logic to determine whether the one or more cache lines are to be retained or inserted in the memory at least in part based on an indication of compressibility of the data. Example 19 includes the system of example 18, wherein the one or more cache lines are to be stored in the memory prior to the determination of whether the one or more cache lines are to be retained in the memory. Example 20 includes the system of example 18, wherein the one or more cache lines are to be stored in the memory after the determination of whether the one or more cache lines are to be retained in the memory. Example 21 includes the system of example 18, comprising logic to determine whether to remove the one or more cache lines based at least in part on the indication of compressibility of the data. Example 22 includes the system of example 18, wherein the compressibility of the data is to be determined based at least in part on a size of an uncompressed version of the data and a size of the compressed version of the data. Example 23 includes the system of example 18, wherein the memory is to store uncompressed data. Example 24 includes the system of example 18, wherein the memory is to include non-volatile memory comprising one of: nanowire memory, Ferro-electric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM), flash memory, Spin Torque Transfer Random Access Memory (STTRAM), Resistive Random Access Memory, Phase Change Memory (PCM), NAND, 3-Dimensional NAND, and byte addressable 3-Dimensional Cross Point Memory. Example 25 includes the system of example 18, wherein an SSD is to comprise the memory and the logic.
Example 26 includes a computer-readable medium comprising one or more instructions that when executed on a processor configure the processor to perform one or more operations to: store one or more cache lines, corresponding to a compressed version of data, in memory in response to a determination that the data is compressible; and determine whether the one or more cache lines are to be retained or inserted in the memory based at least in part on an indication of compressibility of the data. Example 27 includes the computer-readable medium of example 26, further comprising one or more instructions that when executed on the processor configure the processor to perform one or more operations to store the one or more cache lines in the memory prior to the determination of whether the one or more cache lines are to be retained in the memory. Example 28 includes the computer-readable medium of example 26, further comprising one or more instructions that when executed on the processor configure the processor to perform one or more operations to store the one or more cache lines in the memory after the determination of whether the one or more cache lines are to be retained in the memory.
Example 29 includes an apparatus comprising means to perform a method as set forth in any preceding example. Example 30 comprises machine-readable storage including machine-readable instructions, when executed, to implement a method or realize an apparatus as set forth in any preceding example.
In various embodiments, the operations discussed herein, e.g., with reference to
Additionally, such tangible computer-readable media may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals (such as in a carrier wave or other propagation medium) via a communication link (e.g., a bus, a modem, or a network connection).
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least an implementation. The appearances of the phrase “in one embodiment” in various places in the specification may or may not be all referring to the same embodiment.
Also, in the description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. In some embodiments, “connected” may be used to indicate that two or more elements are in direct physical or electrical contact with each other. “Coupled” may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other.
Thus, although embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that claimed subject matter may not be limited to the specific features or acts described. Rather, the specific features and acts are disclosed as sample forms of implementing the claimed subject matter.