Data storage systems are arrangements of hardware and software in which storage processors are coupled to arrays of non-volatile storage devices, such as magnetic disk drives, electronic flash drives, and/or optical drives. The storage processors service storage requests arriving from host machines (“hosts”), which specify blocks, files, and/or other data elements to be written, read, created, deleted, etc. Software running on the storage processors manages incoming storage requests and performs various data processing tasks to organize and secure the data elements on the non-volatile storage devices.
Performance metrics are often employed in data storage systems. Certain performance metrics of data storage systems may be measured directly, while others, such as behavioral metrics, are more complicated to measure. Such behavioral metrics may instead be estimated in various ways from the directly measured metrics, for example, using analytical formulas or trained neural networks.
The foregoing background is presented for illustrative purposes to assist the reader in readily understanding the background in which the invention was developed. However, the foregoing background is not intended to set forth any admission that any particular subject matter has the legal effect of prior art.
Conventional approaches to estimating the behavioral metrics may suffer from deficiencies. Although analytical formulas may produce fast results without using many processing resources, trained neural networks tend to produce more accurate results. Unfortunately, the data representing these trained neural networks is often quite large, so running them requires a large amount of memory and processing resources. Administrators are therefore often reluctant to run neural networks on their data storage systems, as the neural networks can compete for resources with the storage system's main task of servicing I/O requests. Instead of running on the data storage systems themselves, the neural networks could be run on a remote server. However, such an approach may result in considerable latency in receiving results.
Thus, it would be desirable to operate a data storage system that is able to estimate its behavioral performance metrics accurately using a neural network but without suffering from either high latency or high utilization of data storage system resources. This result may be accomplished by running a full neural network on a remote server and creating a scaled-down version of that full neural network to run on the data storage system itself. The scaled-down version may be a neural network that runs at a lower level of numerical precision. For example, the neural network may be “discretized,” in which synapses of the full neural network are either eliminated if their weights are below a threshold or converted into simple unweighted synapses if their weights are above the threshold. In effect, the original floating point representation of the synapse's weight is rounded to an integer representation with only two distinct values (1 and 0). This discretization allows many nodes of the full neural network to be eliminated in the scaled-down version, reducing the memory footprint on the data storage system. In addition, both the reduced size and the elimination of weighting allow the scaled-down neural network to be operated using far fewer processing resources. Further, a discretized representation allows the use of integer math for any necessary calculations on the discretized neural network rather than the much slower floating point math used by the full neural network. The full neural network is still available to check the accuracy of the results, while the scaled-down version is still able to produce a sufficiently accurate approximation in real-time or near real-time. In addition, the scaled-down version is able to receive updates in response to continued training of the full neural network.
In one embodiment, a method is performed by a computing device for monitoring storage performance of a remote data storage apparatus (DSA). The method includes (a) receiving performance metrics of the DSA and a first set of behavioral estimates generated by a first neural network (NN) running on the DSA operating on the performance metrics; (b) operating a second NN on the computing device with the received performance metrics as inputs, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics, the second NN running at a higher level of precision than the first NN; and (c) sending to the remote DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates. An apparatus and computer program product for performing a similar method are also provided.
In one embodiment, a method is performed by a computerized apparatus for monitoring storage performance of the apparatus. The method includes (1) operating a first neural network (NN) on the apparatus with performance metrics of the apparatus as inputs, the first NN configured to produce a first set of behavioral estimates as outputs in response to the performance metrics; (2) sending the performance metrics and the first set of behavioral estimates to a remote computing device configured to run a second NN, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics, the second NN running at a higher level of precision than the first NN; (3) receiving updated parameters of the first NN from the remote computing device in response to the remote computing device updating the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates; and (4) updating the first NN with the received updated parameters and operating the updated first NN on the apparatus to produce additional behavioral estimates. An apparatus and computer program product for performing a similar method are also provided.
In one embodiment, a system is provided. The system includes (I) a plurality of computerized data storage apparatuses (DSAs) and (II) a remote computing device remote from the DSAs. Each DSA is configured to (A) operate a first neural network (NN) on that DSA with performance metrics of that DSA as inputs, the first NN configured to produce a first set of behavioral estimates as outputs in response to the performance metrics; (B) send the performance metrics and the first set of behavioral estimates to the remote computing device; (C) receive updated parameters of the first NN from the remote computing device; and (D) update the first NN with the received updated parameters and operate the updated first NN on that DSA to produce additional behavioral estimates. The remote computing device is configured to, for each DSA, (i) receive the performance metrics and the first set of behavioral estimates from that DSA; (ii) operate a second NN for that DSA with the received performance metrics as inputs, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics, the second NN running at a higher level of precision than the first NN; and (iii) send to that DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates.
The foregoing summary is presented for illustrative purposes to assist the reader in readily grasping example features presented herein. However, the foregoing summary is not intended to set forth required elements or to limit embodiments hereof in any way.
The foregoing and other features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings, in which like reference characters refer to the same or similar parts throughout the different views.
Embodiments are directed to techniques for operating a data storage system that is able to estimate its behavioral performance metrics accurately using a neural network but without suffering from either high latency or high utilization of data storage system resources. This result may be accomplished by running a full neural network on a remote server and creating a scaled-down version of that full neural network to run on the data storage system itself. The scaled-down version may be a neural network that runs at a lower level of numerical precision. For example, the neural network may be “discretized,” in which synapses of the full neural network are either eliminated if their weights are below a threshold or converted into simple unweighted synapses if their weights are above the threshold. In effect, the original floating point representation of the synapse's weight is rounded to an integer representation with only two distinct values (1 and 0). This discretization allows many nodes of the full neural network to be eliminated in the scaled-down version, reducing the memory footprint on the data storage system. In addition, both the reduced size and the elimination of weighting allow the scaled-down neural network to be operated using far fewer processing resources. Further, a discretized representation allows the use of integer math for any necessary calculations on the discretized neural network rather than the much slower floating point math used by the full neural network. The full neural network is still available to check the accuracy of the results, while the scaled-down version is still able to produce a sufficiently accurate approximation in real-time or near real-time. In addition, the scaled-down version is able to receive updates in response to continued training of the full neural network.
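For purposes of illustration only, a minimal sketch of such a discretization is shown below in Python. The function name, the use of NumPy, the 0/1 matrix representation, and the handling of weights exactly at the threshold are assumptions of the sketch rather than features of any particular embodiment.

```python
import numpy as np

def discretize(weights: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Round floating point synapse weights to two integer values.

    Synapses with weights below the threshold are eliminated (0);
    synapses with weights at or above the threshold become simple
    unweighted synapses (1).
    """
    return (weights >= threshold).astype(np.int8)

# A small layer of full-precision weights...
layer = np.array([[0.05, 0.91, 0.42],
                  [0.77, 0.08, 0.33],
                  [0.19, 0.52, 0.01]])
print(discretize(layer))
# ...becomes a 0/1 matrix; rows or columns that end up all zero
# correspond to nodes that can be dropped from the scaled-down network.
```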
The server 50 and each DSA 32 include at least network interface circuitry 34, processing circuitry 36, and memory 40, as well as interconnection circuitry and various other circuitry and parts (not depicted).
Network interface circuitry 34 may include one or more Ethernet cards, cellular modems, Fibre Channel (FC) adapters, Wireless Fidelity (Wi-Fi) wireless networking adapters, and/or other devices for connecting to a network 35. Network 35 may include a LAN, WAN, VPN, cellular network, wireless network, the Internet, other types of computer networks, and various combinations thereof.
Processing circuitry 36 may be any kind of processor or set of processors configured to perform operations, such as, for example, a microprocessor, a multi-core microprocessor, a digital signal processor, a system on a chip, a collection of electronic circuits, a similar kind of controller, or any combination of the above. Processing circuitry 36 is typically general-purpose, suitable for performing various types of processing. In some embodiments, server 50 also includes specialized processing circuitry 37, such as, for example, a graphics processing unit (GPU) or general-purpose GPU (GPGPU) like an NVIDIA GEFORCE GPU or an AMD RADEON GPU. In some embodiments, a DSA 32 does not include such specialized processing circuitry 37.
Memory 40 may include any kind of digital system memory, such as, for example, random access memory (RAM). Memory 40 stores an operating system (OS, not depicted, such as, for example, a Linux, UNIX, Windows, MacOS, or similar operating system) as well as various drivers and applications (not depicted) in operation. Memory 40 may also store various other data structures used by the OS, drivers, and applications.
Each DSA 32 includes storage interface circuitry 38 and persistent data storage 39. Storage interface circuitry 38 controls and provides access to the persistent storage 39. Storage interface circuitry 38 may include, for example, SCSI, SAS, ATA, SATA, FC, M.2, and/or other similar controllers and ports. Persistent storage 39 may be made up of one or more persistent storage devices, such as, for example, magnetic disks, flash drives, solid-state storage drives, or other types of storage drives.
In some embodiments, memory 40 may also include a persistent storage portion (not depicted). The persistent storage portion of memory 40 may be made up of one or more persistent storage devices, such as, for example, magnetic disks, flash drives, solid-state storage drives, or other types of storage drives. The persistent storage portion of memory 40 or persistent storage 39 is configured to store programs and data even while the DSA 32 or server 50 is powered off. The OS, applications, and drivers are typically stored in this persistent storage portion of memory 40 or persistent storage 39 so that they may be loaded into a system portion of memory 40 upon a system restart or as needed. The various applications, when stored in non-transitory form either in the volatile portion of memory 40 or in the persistent portion of memory 40 or in persistent storage 39, each form a computer program product. The processing circuitry 36, 37 running one or more applications thus forms a specialized circuit constructed and arranged to carry out the various processes described herein.
Each DSA 32 operates an I/O driver stack (not depicted) to process data storage commands with respect to the persistent storage 39.
Server 50 includes a full-precision neural network 52 (depicted as full-precision neural networks 52(a), 52(b), . . . ) for each DSA 32. Each full-precision neural network 52 includes various nodes and interconnecting weighted synapses (not depicted) and is configured to receive a set of performance metrics 44 (depicted as performance metrics 44(a), 44(b), . . . ) for a particular DSA 32 as input values. The full-precision neural network 52 is configured to operate on those input values and to produce behavioral estimates 56 (depicted as behavioral estimates 56(a), 56(b), . . . ) for the particular DSA 32 as output values.
Each DSA 32 also includes a reduced-precision neural network 42 (for example, reduced-precision neural network 42(a) for DSA 32(a)). Each reduced-precision neural network 42 is configured to operate on the set of performance metrics 44 of its DSA 32 as input values and to produce behavioral estimates 46 (depicted as behavioral estimates 46(a), 46(b), . . . ) for the DSA 32 as output values. Each reduced-precision neural network 42 is a scaled-down version of the corresponding full-precision neural network 52 from the server 50. Typically, the reduced-precision neural network 42 includes fewer nodes and synapses (not depicted) than the corresponding full-precision neural network 52, allowing it to be stored fully in memory 40. The synapses of the reduced-precision neural network 42 have a lower level of numerical precision than the synapses of the full-precision neural network 52. In one embodiment, in which the reduced-precision neural network 42 is “discretized,” each synapse of the reduced-precision neural network 42 is unweighted, allowing faster processing. An example full-precision neural network 52 and corresponding reduced-precision neural network 42 are described in more detail below.
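To make the integer-math point concrete, the following sketch evaluates such a discretized network. The matrix representation of unweighted synapses and the ReLU-style activation are assumptions of the sketch; embodiments may use any per-node activation functions.

```python
import numpy as np

def reduced_precision_forward(x: np.ndarray, layers: list) -> np.ndarray:
    """Evaluate a discretized network on integer inputs.

    Each entry of `layers` is a 0/1 synapse matrix, so each node's
    pre-activation is simply an integer count of its active inputs;
    no floating point multiplication is needed anywhere.
    """
    a = x.astype(np.int64)
    for synapses in layers:
        a = np.maximum(synapses.T @ a, 0)  # integer sums, then clamp at zero
    return a
```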
In some embodiments, although the full-precision neural network 52 includes many nodes and synapses operating at a high degree of numerical precision (e.g., 64-bit floating point), its operation can be accelerated by running it on the specialized processing circuitry 37 of the server 50. This allows the server 50 to operate full-precision neural networks 52 for many DSAs 32.
Behavioral estimates 46 are not as accurate as behavioral estimates 56, but they are still accurate enough for many purposes.
In operation, a DSA 32 runs its reduced-precision neural network 42 operating on its performance metrics 44 as inputs, yielding behavioral estimates 46 as outputs. DSA 32 may then use those behavioral estimates 46 for various purposes, such as displaying to a user and adjusting its operation, as needed. For example, the behavioral estimates 46 may include a compression ratio (or, more generally, a data reduction ratio), and the DSA 32 may use that compression ratio or data reduction ratio to calculate how much to throttle incoming writes.
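By way of a hypothetical illustration (the rule, names, and numbers below are not drawn from any embodiment above), such a throttling calculation might look like the following:

```python
def write_throttle_limit(backend_mbps: float, estimated_drr: float,
                         front_end_cap_mbps: float) -> float:
    """Cap incoming host writes so post-reduction traffic fits the backend.

    With an estimated data reduction ratio (DRR) of 4.0, every 4 MB
    written by hosts lands as roughly 1 MB on disk, so the front end
    can accept up to DRR times the backend bandwidth.
    """
    return min(front_end_cap_mbps, backend_mbps * estimated_drr)

# e.g., a 500 MB/s backend and an estimated DRR of 4.0 support up to
# 2000 MB/s of incoming writes, subject to the front-end cap.
print(write_throttle_limit(500.0, 4.0, 3000.0))  # 2000.0
```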
DSA 32 also sends a signal 48 (depicted as signals 48(a), 48(b)) to the server 50, including the set of performance metrics 44 and the corresponding set of behavioral estimates 46. In response, the server 50 operates its full-precision neural network 52 for that DSA 32 on the performance metrics 44 as inputs, yielding behavioral estimates 56 as outputs. Server 50 may compare the behavioral estimates 46, 56, and if they differ significantly, server 50 may update the reduced-precision neural network 42 for that DSA 32. In some embodiments, this may also include updating the full-precision neural network 52 for that DSA 32, such as by running machine learning techniques. If an update is performed, server 50 sends an update signal 58 back to the DSA 32 including updated parameters (e.g., a topology and set of activation functions for each node) of the reduced-precision neural network 42 for that DSA 32.
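A minimal sketch of this server-side exchange follows. The callable parameters, the 5% tolerance, and the specific rule for deciding when the estimates “differ significantly” are all assumptions of the sketch.

```python
import numpy as np

def handle_dsa_report(metrics: np.ndarray, dsa_estimates: np.ndarray,
                      full_predict, rediscretize,
                      tolerance: float = 0.05):
    """Server-side handling of a report (signal 48) from one DSA.

    full_predict runs the full-precision network 52 on the metrics;
    rediscretize produces updated reduced-precision parameters
    (topology and activation functions). Returns those updated
    parameters when the estimates diverge, or None when no update
    (signal 58) needs to be sent.
    """
    server_estimates = full_predict(metrics)
    if np.max(np.abs(dsa_estimates - server_estimates)) <= tolerance:
        return None        # estimates agree closely; no update needed
    return rediscretize()  # updated parameters to send in signal 58
```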
In step 110, server 50 receives performance metrics 44 of the DSA 32 and a first set of behavioral estimates 46 generated by a first neural network (e.g., reduced-precision neural network 42) running on the DSA 32 operating on the performance metrics 44.
In step 120, server 50 operates a second neural network (e.g., full-precision neural network 52) with the received performance metrics 44 as inputs, the second neural network configured to produce a second set of behavioral estimates 56 as outputs in response to the performance metrics 44, the second neural network running at a higher level of precision than the first neural network. In some embodiments, step 120 is performed on specialized processing circuitry 37.
In step 130, server 50 generates an updated version of the first neural network based at least in part on the performance metrics 44 and the first and second sets of behavioral estimates 46, 56. The updated version of the first neural network includes updated parameters of the first neural network. In some embodiments, step 130 may be illustrated with reference to an example full-precision neural network 52 made up of input nodes 302, hidden nodes 304, and output nodes 306 interconnected by weighted synapses 310, as follows.
In this example, synapses 310 whose weights fall below a threshold of 0.2 are eliminated when generating a reduced-precision arrangement 311.
Other synapses 310, with weights W3, W4, W7, W8, W10, W11, and W12, may be maintained as unweighted synapses 312 in arrangement 311 since they have values above the threshold of 0.2. In some embodiments, as depicted, since hidden node 304(4) would only have one input synapse 312 in arrangement 311, hidden node 304(4) is not used in arrangement 311, and synapses 312 are instead inserted directly between nodes 304′(2) and 306′(1) and between nodes 304′(2) and 306′(2).
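The bypassing of such single-input hidden nodes may be sketched as a simple graph rewrite. The edge-set representation and the treatment of a one-input unweighted node as a pure pass-through are simplifying assumptions of the sketch.

```python
def prune_pass_through_nodes(edges, hidden):
    """Bypass hidden nodes left with a single input synapse.

    `edges` is a set of (source, destination) pairs of unweighted
    synapses; `hidden` is the set of hidden node identifiers. Any
    hidden node with exactly one input merely forwards that input,
    so its outputs are rewired directly to its single source
    (as with hidden node 304(4) in arrangement 311).
    """
    changed = True
    while changed:
        changed = False
        for h in list(hidden):
            inputs = [s for s, d in edges if d == h]
            if len(inputs) != 1:
                continue
            src = inputs[0]
            outputs = [d for s, d in edges if s == h]
            edges = {(s, d) for s, d in edges if h not in (s, d)}
            edges |= {(src, d) for d in outputs}
            hidden.discard(h)
            changed = True
    return edges
```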
In some embodiments, an alternative arrangement 311′ of the reduced-precision neural network 42 is generated, including a confidence value 316 as an additional output. Thus, input node 302(1) is maintained as input node 302′(1), and additional hidden nodes 304′(6), 304′(7) are added, together with corresponding unweighted synapses 312, to generate a new output node that produces confidence value 316. Confidence value 316 indicates how likely the output nodes 306′ are to be close in value to the output nodes 306. If this value is below a confidence threshold (e.g., 0.8), then the values of output nodes 306′ may be ignored, and the DSA 32 may instead choose to ask the server 50 for the values of its output nodes 306.
In some embodiments, confidence value 316 may be an array of values. In these embodiments, each value of the array corresponds to a respective one of the other output nodes 306′, indicating whether that output value 306′ is to be ignored. Thus, in these embodiments, each array value may be all zeros or all ones (e.g., 11111111), allowing them to be XORed with the values of the other output nodes 306′ to quickly identify invalid results.
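One possible encoding of such an array-valued confidence output, assuming 8-bit integer output values, is sketched below; the specific dtype and helper name are assumptions of the sketch.

```python
import numpy as np

VALID = np.uint8(0x00)    # all zeros: output usable as-is
INVALID = np.uint8(0xFF)  # all ones (11111111): output to be ignored

def apply_confidence_mask(outputs: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """XOR each 8-bit output with its per-output confidence entry.

    A zero entry leaves a valid output untouched, while an all-ones
    entry inverts every bit of an invalid one, so invalid results are
    flagged with a single integer operation.
    """
    return outputs ^ mask

out = np.array([42, 7, 19], dtype=np.uint8)
mask = np.array([VALID, INVALID, VALID])
print(apply_confidence_mask(out, mask))  # [ 42 248  19]; 7 ^ 0xFF == 248
```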
Returning to the operation of server 50, once the updated version of the first neural network has been generated, server 50 sends the updated parameters of the first neural network to the DSA 32 (e.g., in update signal 58).
In step 210, DSA 32 operates a first neural network (e.g., reduced-precision neural network 42) with performance metrics 44 of the DSA 32 as inputs. The first neural network is configured to produce a first set of behavioral estimates 46 as outputs in response to the performance metrics 44. In some embodiments, the performance metrics 44 may initially be converted from floating point values into integer values so that integer mathematical operations may be utilized throughout the neural network 42, thereby speeding up operation.
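One common way to perform such a conversion is fixed-point scaling; the 16-bit fractional format below is an assumption of this sketch.

```python
FRACTIONAL_BITS = 16          # assumed fixed-point format (Q16)
SCALE = 1 << FRACTIONAL_BITS

def metrics_to_fixed_point(metrics):
    """Convert floating point performance metrics 44 to scaled integers
    so that the reduced-precision network 42 can use integer math only."""
    return [round(m * SCALE) for m in metrics]

print(metrics_to_fixed_point([0.73, 12.5]))  # [47841, 819200]
```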
In step 220, DSA 32 sends the performance metrics 44 and the first set of behavioral estimates 46 to a remote computing device (e.g., server 50) configured to run a second neural network (e.g., full-precision neural network 52). The second neural network 52 is configured to produce a second set of behavioral estimates 56 as outputs in response to the performance metrics 44. In addition, the second neural network 52 runs at a higher level of precision than the first neural network 42.
In some embodiments, optional steps 230-248 may be performed. In step 230, DSA 32 determines whether or not a confidence value 316 of the behavioral estimates 46 exceeds a confidence threshold. If it does, then operation proceeds with step 235, in which the DSA 32 utilizes the first set of behavioral estimates 46 (e.g., informing a user of the DSA 32 of values of the first set of behavioral estimates 46 and/or throttling intake of write commands based in part on a data reduction ratio of the first set of behavioral estimates 46, etc.). Otherwise, operation proceeds with steps 240-248: the behavioral estimates 46 are not used by the DSA 32 (step 240); instead, the DSA 32 requests (step 242) and receives (step 244) the behavioral estimates 56 generated by the full-precision neural network 52 from the server 50 and uses them in place of the first set of behavioral estimates 46 (step 248). In some embodiments, steps 230-248 are performed separately for each output value 306′ of the behavioral estimates 46, the confidence value 316 being an array of values.
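A compact sketch of this gating logic follows; the callable standing in for the round trip to the server 50 is an assumption of the sketch, and the 0.8 threshold follows the example given above.

```python
def choose_estimates(local_estimates, confidence, fetch_from_server,
                     confidence_threshold=0.8):
    """Trust the local reduced-precision estimates when the confidence
    value clears the threshold (step 235); otherwise fall back to the
    server's full-precision estimates (steps 240-248)."""
    if confidence >= confidence_threshold:
        return local_estimates     # step 235
    return fetch_from_server()     # steps 240-248
```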
In any case, operation may proceed with step 250. It should be understood that step 250 may not always follow step 220. Rather, step 250 is performed only if the server 50 generates an updated version of the reduced-precision neural network 42. In step 250, DSA 32 receives updated parameters of the first neural network 42 from the server 50 in response to the server 50 updating the first neural network 42 based at least in part on the performance metrics 44 and the first and second sets of behavioral estimates 46, 56.
In step 260, DSA 32 updates the first neural network 42 with the received updated parameters and operates the updated first neural network 42 on the DSA 32 to produce additional behavioral estimates 46 going forward.
As used throughout this document, the words “comprising,” “including,” “containing,” and “having” are intended to set forth certain items, steps, elements, or aspects of something in an open-ended fashion. Also, as used herein and unless a specific statement is made to the contrary, the word “set” means one or more of something. This is the case regardless of whether the phrase “set of” is followed by a singular or plural object and regardless of whether it is conjugated with a singular or plural verb. Further, although ordinal expressions, such as “first,” “second,” “third,” and so on, may be used as adjectives herein, such ordinal expressions are used for identification purposes and, unless specifically indicated, are not intended to imply any ordering or sequence. Thus, for example, a “second” event may take place before or after a “first” event, or even if no first event ever occurs. In addition, an identification herein of a particular element, feature, or act as being a “first” such element, feature, or act should not be construed as requiring that there must also be a “second” or other such element, feature, or act. Rather, the “first” item may be the only one. Although certain embodiments are disclosed herein, it is understood that these are provided by way of example only and that the invention is not limited to these particular embodiments.
While various embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the appended claims.
For example, although various embodiments have been described as being methods, software embodying these methods is also included. Thus, one embodiment includes a tangible non-transitory computer-readable storage medium (such as, for example, a hard disk, a floppy disk, an optical disk, flash memory, etc.) programmed with instructions, which, when performed by a computer or a set of computers, cause one or more of the methods described in various embodiments to be performed. Another embodiment includes a computer that is programmed to perform one or more of the methods described in various embodiments.
Furthermore, it should be understood that all embodiments which have been described may be combined in all possible combinations with each other, except to the extent that such combinations have been explicitly excluded.