Examples described herein are generally related to techniques for improving performance of storing and accessing data in storage devices in computing systems.
A storage device includes one or more types of memory. A multi-level cell (MLC) is a memory element capable of storing more than a single bit of information, in contrast to a single-level cell (SLC), which can store only one bit per memory element. Triple-level cells (TLC) and quad-level cells (QLC) are versions of MLC memory that can store 3 and 4 bits per cell, respectively. (Note that, by convention, the name “multi-level cell” is sometimes used to refer specifically to the two-bit-per-cell case.) Overall, memories are commonly referred to as SLC (1 bit per cell; fastest, highest cost), MLC (2 bits per cell), TLC (3 bits per cell), and QLC (4 bits per cell; slowest, lowest cost). One example of an MLC memory is QLC NAND flash memory.
Some computing systems use different types of storage devices to store data objects depending on the sizes of the data objects, the frequencies of access of the data objects, the desired access times, and so on. Some computing systems may include one or more storage nodes, with each storage node including one or more storage devices. A computing system may have storage devices of various types of memory, with various operating characteristics and capabilities. In some computing systems, hashing techniques are used to provide a deterministic way to distribute and locate data objects across the entire set of storage nodes in a computing system. One known hashing algorithm uses relative weights of storage nodes to identify how data objects are to be distributed evenly in a cluster of storage nodes without creating hot spots.
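As a non-limiting illustration of how relative node weights may drive deterministic placement, the sketch below uses weighted rendezvous (highest-random-weight) hashing, which is one known weight-aware placement scheme and is not necessarily the specific algorithm referenced above; the node names, weights, and function names are hypothetical.

```python
import hashlib
import math

def _score(node_id: str, key: str, weight: float) -> float:
    """Weighted rendezvous score for one (node, key) pair."""
    digest = hashlib.sha256(f"{node_id}:{key}".encode()).digest()
    # Map the 64-bit hash prefix to a float strictly inside (0, 1).
    h = (int.from_bytes(digest[:8], "big") + 1) / float((1 << 64) + 1)
    # A higher node weight yields a higher expected score, so the node
    # receives proportionally more keys.
    return -weight / math.log(h)

def place(key: str, node_weights: dict) -> str:
    """Deterministically map a data object key to the highest-scoring node."""
    return max(node_weights, key=lambda n: _score(n, key, node_weights[n]))

# Hypothetical storage node weights; heavier nodes receive more objects.
nodes = {"storage-node-1": 1.0, "storage-node-2": 2.0, "storage-node-3": 0.5}
print(place("object-1234", nodes))
```

With this family of schemes, adding or removing a node only remaps the keys that scored highest on that node, which helps spread load evenly without creating hot spots when node weights change.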
Data center administrators currently use command line tools at a system console to identify types of storage devices, group storage devices into logical pools, and manually assign weights based on documented storage device specifications. This manual storage setup (in some cases implemented as customized command line scripts based on known reference configurations) is used to set storage node weights for consistent hashing during the storage pool provisioning step.
Current solutions are manual and error prone because there is no clear way to assign weights to storage nodes in a storage pool based on storage device properties. Further, storage device specifications may not be available, or may be incorrect or outdated, which makes this information an unreliable source for assessing storage device performance. A data center administrator typically runs a few synthetic (e.g., artificial or contrived) benchmarks to identify storage device performance characteristics and then manually assigns weights for the storage devices and storage nodes. Given the increasingly large number of storage devices in modern computer server farms, this approach is problematic.
As contemplated in the present disclosure, a storage device may expose performance characteristics information (e.g., rating information) that is used by a storage management component to determine a storage management policy for a computing system. In an embodiment, the storage management policy may be based on automated memory grouping (also called pooling or tiering) and on relative weights assigned according to the storage device performance characteristics information, to improve hashing-based data distribution within a computing system.
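One possible shape for the exposed rating information is sketched below; the record and its field names are assumptions chosen to match the metrics discussed later (memory type, IOPS, throughput, capacity, read/write ratio, and endurance), not a defined device interface.

```python
from dataclasses import dataclass

@dataclass
class DriveRating:
    """Hypothetical rating record a storage device might expose.

    Field names are illustrative assumptions; a real device would report
    these values through whatever rating query its interface defines.
    """
    memory_type: str         # e.g. "SLC NAND", "TLC 3D NAND", "QLC 3D NAND"
    iops: float              # rated random I/O operations per second
    throughput_mbps: float   # rated sequential throughput
    capacity_gb: float       # usable capacity
    read_write_ratio: float  # rated read/write throughput ratio (8:2 -> 4.0)
    dwpd: float              # drive writes per day (endurance)
```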
Each data center region in computing system 100 may include one or more storage nodes. For example, data center region 1 102 includes “J” number of storage nodes, denoted storage node 1-1 110, storage node 1-2 112, . . . storage node 1-J 114, where J is a natural number. For example, data center region 2 104 includes “K” number of storage nodes, denoted storage node 2-1 116, storage node 2-2 118, . . . storage node 2-K 120, where K is a natural number. For example, data center region N 106 includes “L” number of storage nodes, denoted storage node N-1 122, storage node N-2 124, . . . storage node N-L 126, where L is a natural number.
Thus, in some examples, depending on the overall storage requirements for computing system 100, the computing system may include many storage nodes, with each storage node possibly including many storage devices. Further, each storage device may include one or more memories. Each storage device 204, 206, . . . 208 may have performance characteristics information that is discoverable by storage management component 108.
According to some examples, as shown in
In some examples, storage controller 324 may include logic and/or features to receive transaction requests to storage memory device(s) 322 at storage device 320. For these examples, the transaction requests may be initiated by or sourced from OS 311 that may, in some embodiments, utilize file system 313 to write/read data to/from storage device 320 through input/output (I/O) interfaces 303 and 323.
In some examples, memory 326 may include volatile types of memory including, but not limited to, RAM, D-RAM, DDR SDRAM, SRAM, T-RAM or Z-RAM. One example of volatile memory includes DRAM, or some variant such as SDRAM. A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), LPDDR4 (LOW POWER DOUBLE DATA RATE (LPDDR) version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide I/O 2 (WideIO2), JESD229-2, originally published by JEDEC in August 2014), HBM (HIGH BANDWIDTH MEMORY DRAM, JESD235, originally published by JEDEC in October 2013), DDR5 (DDR version 5, currently in discussion by JEDEC), LPDDR5 (LPDDR version 5, currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), and/or others, and technologies based on derivatives or extensions of such specifications.
However, examples are not limited in this manner, and in some instances, memory 326 may include non-volatile types of memory, whose state is determinate even if power is interrupted to memory 326. In some examples, memory 326 may include non-volatile types of memory that are block addressable, such as NAND or NOR technologies. Thus, memory 326 can also include future generations of non-volatile memory types, such as a 3-dimensional cross-point memory (3D XPoint™ commercially available from Intel Corporation), or other byte addressable non-volatile types of memory. According to some examples, memory 326 may include types of non-volatile memory that include chalcogenide glass, multi-threshold level NAND flash memory, NOR flash memory, single or multi-level Phase Change Memory (PCM), resistive memory, nanowire memory, FeTRAM, MRAM that incorporates memristor technology, STT-MRAM, a combination of any of the above, or other memory.
In some examples, storage memory device(s) 322 may be a device to store data from write transactions and/or write operations. Storage memory device(s) 322 may include one or more chips or dies having gates that may individually include one or more types of non-volatile memory, including, but not limited to, NAND flash memory, NOR flash memory, 3-D cross-point memory (3D XPoint™), ferroelectric memory, SONOS memory, ferroelectric polymer memory, FeTRAM, FeRAM, ovonic memory, nanowire, EEPROM, phase change memory, memristors or STT-MRAM. For these examples, storage device 320 may be arranged or configured as a solid-state drive (SSD). The data may be read and written in blocks, and mapping or location information for the blocks may be kept in memory 326.
According to some examples, communications between storage device driver 315 and storage controller 324 for data stored in storage memory device(s) 322 and accessed via files 313-1 to 313-n may be routed through I/O interface 303 and I/O interface 323. I/O interfaces 303 and 323 may be arranged as a Serial Advanced Technology Attachment (SATA) interface to couple elements of server 310 to storage device 320. In another example, I/O interfaces 303 and 323 may be arranged as a Serial Attached SCSI (SAS) interface to couple elements of server 310 to storage device 320. In another example, I/O interfaces 303 and 323 may be arranged as a Peripheral Component Interconnect Express (PCIe) interface to couple elements of server 310 to storage device 320. In another example, I/O interfaces 303 and 323 may be arranged as a Non-Volatile Memory Express (NVMe) interface to couple elements of server 310 to storage device 320. For this other example, communication protocols may be utilized to communicate through I/O interfaces 303 and 323 as described in industry standards or specifications (including progenies or variants) such as the Peripheral Component Interconnect (PCI) Express Base Specification, revision 3.1, published in November 2014 (“PCI Express specification” or “PCIe specification”) or later revisions, and/or the Non-Volatile Memory Express (NVMe) Specification, revision 1.2, also published in November 2014 (“NVMe specification”) or later revisions.
In some examples, system memory device(s) 312 may store information and commands which may be used by circuitry 316 for processing information. Also, as shown in
In some examples, storage device driver 315 may include logic and/or features to forward commands associated with one or more read or write transactions and/or read or write operations originating from OS 311. For example, the storage device driver 315 may forward commands associated with write transactions such that data may be caused to be stored to storage memory device(s) 322 at storage device 320.
System memory device(s) 312 may include one or more chips or dies having volatile types of memory such as RAM, D-RAM, DDR SDRAM, SRAM, T-RAM or Z-RAM. However, examples are not limited in this manner, and in some instances, system memory device(s) 312 may include non-volatile types of memory, including, but not limited to, NAND flash memory, NOR flash memory, 3-D cross-point memory (3D XPoint™), ferroelectric memory, SONOS memory, ferroelectric polymer memory, FeTRAM, FeRAM, ovonic memory, nanowire, EEPROM, phase change memory, memristors or STT-MRAM.
Persistent memory 319 may include one or more chips or dies having non-volatile types of memory, including, but not limited to, NAND flash memory, NOR flash memory, 3-D cross-point memory (3D XPoint™), ferroelectric memory, SONOS memory, ferroelectric polymer memory, FeTRAM, FeRAM, ovonic memory, nanowire, EEPROM, phase change memory, memristors or STT-MRAM.
According to some examples, server 310 may include, but is not limited to, a server, a server array or server farm, a web server, a network server, an Internet server, a work station, a mini-computer, a main frame computer, a supercomputer, a network appliance, a web appliance, a distributed computing system, a personal computer, a tablet computer, a smart phone, multiprocessor systems, processor-based systems, or combination thereof, in a data center region.
In an embodiment, flow 500 may be implemented in storage management component 108 of system 100 shown in
Included herein is a set of logic flows representative of example methodologies for performing novel aspects of the disclosed architecture. While, for purposes of simplicity of explanation, the one or more methodologies shown herein are shown and described as a series of acts, those skilled in the art will understand and appreciate that the methodologies are not limited by the order of acts. Some acts may, in accordance therewith, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all acts illustrated in a methodology may be required for a novel implementation.
A logic flow may be implemented in software, firmware, and/or hardware. In software and firmware embodiments, a logic flow may be implemented by computer executable instructions stored on at least one non-transitory computer readable medium or machine readable medium, such as an optical, magnetic or semiconductor storage. The embodiments are not limited in this context.
Storage management component 108 may be executed to automatically determine a storage policy for system 100, taking into account characteristics and performance ratings of one or more storage nodes in the system, and one or more storage devices in each storage node. The storage policy may be used by system 100 to make decisions for allocating data to be stored in the storage node(s), and a storage device(s) within a storage node, that may be best suited for overall system performance. In an embodiment, storage management component 108 may be executed upon startup of system 100. In an embodiment, storage management component 108 may be executed on demand (e.g., manually) by a system administrator or may be scheduled to be executed periodically. In another embodiment, storage management component 108 may be executed whenever a storage node is activated or deactivated in the system. In another embodiment, storage management component 108 may be executed whenever a storage device is activated or deactivated in the system. In an embodiment, storage management component 108 may automatically determine the storage policy based on an analysis of one or more storage devices in one or more storage nodes of system 100.
Prior to processing the storage nodes and their storage devices, storage management component 108 may initialize variables to be used in further calculations. In an embodiment, storage management component 108 may initialize an IOPS Denominator, a Throughput Denominator, a Capacity Denominator, an IOPS Relative Weight, a Capacity Relative Weight, and a Throughput Relative Weight. Processing may begin with a first storage device within a first storage node of system 100 at block 502. Storage management component 108 may get the storage device rating for the storage device. At block 504, storage management component 108 may assign the storage device to a storage pool. In an embodiment, a storage pool may be a group or collection of storage devices that have similar operating characteristics.
If the memory type at block 604 is SLC NAND, then the storage device may be assigned to journaling pool 614. In an embodiment, the journaling pool may be used to store log files of changes to data. Because SLC NAND is used, a higher level of NAND performance with high endurance, but with lower cost than 3-D cross-point, may be provided for the journaling pool. In an embodiment, updates to the data in the journaling pool may be write intensive. If the memory type at block 604 is TLC 3D NAND, then the storage device may be assigned to performance pool 616. Performance pool 616 may be used for performance-oriented workloads that do not have extremely low latency requirements. If the memory type at block 604 is QLC 3D NAND, in an embodiment storage management component 108 may check the read/write throughput ratio of the storage device at block 618. QLC 3D NAND may provide lower endurance and lower write bandwidth performance than SLC NAND or TLC 3D NAND, but at higher capacity and lower cost. In an embodiment, the read/write throughput ratio may be obtained by performing the Get storage drive rating command. In one embodiment, if the ratio is greater than a predefined value such as 8:2, then storage management component 108 may check the drive writes per day (DWPD) endurance metric for the storage device. In an embodiment, the DWPD metric may be obtained by performing the Get storage drive rating command. In an embodiment, if the DWPD of the storage device is greater than a predefined value such as 0.3, then the storage device may be assigned to throughput pool 622. In an embodiment, throughput pool 622 may be used to store, for example, streaming data for applications that require higher writes per day. In an embodiment, if the DWPD of the storage device is less than or equal to the predefined value such as 0.3, or if the read/write throughput ratio is less than or equal to the predefined value such as 8:2, then the storage device may be assigned to capacity pool 624. In an embodiment, capacity pool 624 may be used to store, for example, data to be archived for longer periods of time with less frequent access.
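The decision tree just described can be summarized with the following sketch, reusing the hypothetical DriveRating record from the earlier example; the 8:2 ratio and 0.3 DWPD thresholds are the example values given above, and the pool names are placeholders for pools 614, 616, 622, and 624.

```python
def assign_pool(rating: "DriveRating") -> str:
    """Assign a storage device to a pool from its rating (illustrative sketch)."""
    if rating.memory_type == "SLC NAND":
        return "journaling"      # write-intensive journal/log data (pool 614)
    if rating.memory_type == "TLC 3D NAND":
        return "performance"     # performance workloads without extreme latency needs (pool 616)
    if rating.memory_type == "QLC 3D NAND":
        # Example thresholds from the text: 8:2 read/write ratio (8/2 = 4.0) and 0.3 DWPD.
        if rating.read_write_ratio > 4.0 and rating.dwpd > 0.3:
            return "throughput"  # e.g. streaming data with higher writes per day (pool 622)
        return "capacity"        # archival data with less frequent access (pool 624)
    return "other"               # remaining memory types are handled at other blocks of the flow
```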
Although six different storage pool types are shown in
Turning back now to
Drive IOPS Weight = Drive IOPS / IOPS Denominator;
Drive Capacity Weight = Drive Capacity / Capacity Denominator;
Throughput Weight = Drive Throughput / Throughput Denominator;
wherein the values for Drive IOPS, Drive Capacity, and Drive Throughput may be obtained from the storage device.
In other embodiments, other or additional individual storage device weights may be used.
Next, at block 508, storage management component 108 may calculate a relative storage device weight based at least in part on the individual storage device weights. In an embodiment, the relative storage device weight may be calculated as:
Relative Storage Device Weight = (IOPS Relative Weight * Drive IOPS Weight) + (Capacity Relative Weight * Drive Capacity Weight) + (Throughput Relative Weight * Throughput Weight).
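A direct transcription of these formulas follows, reusing the hypothetical DriveRating record sketched earlier. The text does not define how the denominators or the three relative-weight coefficients are chosen, so this sketch assumes the denominators are totals of each metric across the devices being weighted and the coefficients are administrator-chosen values that sum to 1.

```python
def relative_device_weight(rating: "DriveRating", all_ratings: list,
                           iops_rel: float = 0.4,
                           capacity_rel: float = 0.3,
                           throughput_rel: float = 0.3) -> float:
    """Combine per-metric drive weights into one relative storage device weight."""
    # Assumed denominators: totals of each metric across the devices being weighted.
    iops_denom = sum(r.iops for r in all_ratings)
    capacity_denom = sum(r.capacity_gb for r in all_ratings)
    throughput_denom = sum(r.throughput_mbps for r in all_ratings)

    drive_iops_weight = rating.iops / iops_denom
    drive_capacity_weight = rating.capacity_gb / capacity_denom
    throughput_weight = rating.throughput_mbps / throughput_denom

    return (iops_rel * drive_iops_weight
            + capacity_rel * drive_capacity_weight
            + throughput_rel * throughput_weight)
```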
At block 510, storage management component 108 determines if more storage devices for the current storage node need to be processed. If so, processing continues at block 502 with the next storage device for the current storage node. If not, processing continues with block 512, where a storage node weight may be calculated based at least in part on the relative storage device weights. In an embodiment, the storage node weight represents the aggregated weight of the storage devices of that storage node. In an embodiment, the storage node weight may be calculated as:
Storage Node Weight = Σ (Relative Storage Device Weights)
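Continuing the same sketch, the storage node weight is simply the sum of the relative weights of the node's devices; these node weights could then feed a weighted placement scheme such as the one illustrated earlier.

```python
def storage_node_weight(node_ratings: list, all_ratings: list) -> float:
    """Aggregate a node's weight as the sum of its devices' relative weights."""
    return sum(relative_device_weight(r, all_ratings) for r in node_ratings)
```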
At block 514, storage management component 108 determines if more storage nodes need to be processed. If so, processing continues with the first storage device of the next storage node in system 100 at block 502. If not, all storage devices in all storage nodes have now been processed. At block 516, storage management component 108 may automatically determine a storage policy for system 100 based at least in part on the storage node weight for each storage node and the pools. The storage policy may be determined without manual intervention or activation by a system administrator. The storage policy may be used by system 100 to automatically determine which storage nodes and storage devices within storage nodes are to be used for storing data.
According to some examples, a component called circuitry 316 of
Server 310 may be part of a computing device that may be, for example, user equipment, a computer, a personal computer (PC), a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet, a smart phone, embedded electronics, a gaming console, a server array or server farm, a web server, a network server, an Internet server, a work station, a mini-computer, a main frame computer, a supercomputer, a network appliance, a web appliance, a distributed computing system, multiprocessor systems, processor-based systems, or combination thereof. Accordingly, functions and/or specific configurations of server 310 described herein, may be included or omitted in various embodiments of server 310, as suitably desired.
The components and features of server 310 may be implemented using any combination of discrete circuitry, ASICs, logic gates and/or single chip architectures. Further, the features of server 310 may be implemented using microcontrollers, programmable logic arrays and/or microprocessors or any combination of the foregoing where suitably appropriate. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “logic”, “circuit” or “circuitry.”
Some examples may be described using the expression “in one example” or “an example” along with their derivatives. These terms mean that a particular feature, structure, or characteristic described in connection with the example is included in at least one example. The appearances of the phrase “in one example” in various places in the specification are not necessarily all referring to the same example.
Some examples may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
It is emphasized that the Abstract of the Disclosure is provided to comply with 37 C.F.R. Section 1.72(b), requiring an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single example for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed examples require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed example. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate example. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein,” respectively. Moreover, the terms “first,” “second,” “third,” and so forth, are used merely as labels, and are not intended to impose numerical requirements on their objects.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.