The present disclosure relates to performing mathematical operations on changed versions of data objects via a storage compute device. In one embodiment, a method involves receiving and storing a data object from a host. A first mathematical operation is performed on the data object via a storage compute device. Update data from the host is received and stored separately from the data object, the update data including a portion of the data object that has subsequently changed. A second mathematical operation is performed on a changed version of the data object using the update data. The method may be implemented via a storage compute device and system.
These and other features and aspects of various embodiments may be understood in view of the following detailed discussion and accompanying drawings.
In the following diagrams, the same reference numbers may be used to identify similar/same components in multiple figures. The drawings are not necessarily to scale.
Some computational tasks are suited for massively distributed computing solutions. For example, data centers that provide web services, email, data storage, Internet search, etc., often distribute tasks among hundreds or thousands of computing nodes. The nodes are interchangeable, and tasks may be performed in parallel by multiple computing nodes. This parallelism increases processing and communication speed, as well as increasing reliability through redundancy. Generally, the nodes may include rack-mounted computers that are designed to be compact and power efficient, but otherwise operate similarly to a desktop computer or server.
For certain types of tasks, it may be desirable to rearrange how data is processed within the individual nodes. For example, applications such as neuromorphic computing, scientific simulations, etc., may utilize large matrices that are processed in parallel by multiple computing nodes. In a traditional computing setup, matrix data may be stored in random access memory and/or non-volatile memory, where it is retrieved, operated on by relatively fast central processing unit (CPU) cores, and the results sent back to volatile and/or non-volatile memory. It has been shown that the bus lines and I/O protocols between the CPU cores and the memory can be a bottleneck for some types of computation.
This disclosure generally relates to use of a data storage device that performs internal computations on data on behalf of a host, and is referred to herein as a storage compute device. While a data storage device, such as a hard drive, solid-state drive (SSD), hybrid drive, etc., generally includes data processing capabilities, such processing is mostly related to the storage and retrieval of user data. So while the data storage device may perform some computations on the data, such as compression, error correction, etc., these computations are invisible to the host. Similarly, other computations, such as logical-to-physical address mapping, involve tracking host requests, but are intended to hide these tracking operations from the host. In contrast, a storage compute device makes computations based on express or implied computation instructions from the host, with the intention that some form of a result of the computation will be returned to the host and/or be retrievable by the host.
While a storage compute device as described herein may be able to perform as a conventional storage device, e.g., handling host data storage and retrieval requests, such storage compute devices may include additional computational capability that can be used for certain applications. For example, scientific and engineering simulations may involve solving equations on data objects such as very large matrices. Even though the matrices may be sparse, and therefore amenable to a more concise/compressed format for storage, the matrices may still be cumbersome to move in and out of storage for performing operations. For example, if available volatile, random access memory (RAM) is significantly smaller than the objects being operated on, then there may be a significant amount of swapping data between RAM and persistent storage.
While a conventional storage device can be used to store data objects, such a device may not be given information that allows it to identify the objects. For example, host interfaces may only describe data operations as acting on logical block addresses (or sectors), which the storage device translates to physical addresses. In contrast, a storage compute device obtains additional data that allows it to manage the objects internally. This management may include, but is not limited to, selection of storage location, management of object identifiers and other metadata (e.g., data type, extents, access attributes, security attributes), compression, and performance of single or multiple object computations and transformations.
In embodiments described below, a storage compute device includes two or more compute sections that perform computations on computation objects. For purposes of this discussion, computation objects may at least include objects that facilitate performing computations on data objects. Computation objects may include stored instructions, routines, formulas, definitions, etc., that facilitate performing repeatable operations. A computation object may include data objects, such as scalars/constants that are utilized in all of the relevant computations and accessible by the compute section (e.g., using local or shared volatile memory). Other data objects are used as inputs and outputs of the computations, and may also include temporary objects used as part of the computations, e.g., intermediate computation objects. While the examples below may refer to matrix data objects, the term “data object” as used herein is not intended to be limited to matrices. It will be understood that the embodiments described herein may be used to perform computations on other large data sets, such as media files/streams, neural networks, etc.
In the storage compute devices described below, a controller receives and stores a computation object and one or more data objects from a host. The computation object defines a mathematical operation that is then performed on the one or more data objects. The host then provides update data, the update data including a subset of the data object that has subsequently changed. The mathematical operation is repeated on a changed version of the one or more data objects using the update data.
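By way of illustration only, this flow can be modeled in a short Python sketch. The names used here (StorageCompute, store_object, store_update, compute) are hypothetical and not part of this disclosure; the sketch only shows a base object being stored once, update data being kept separately, and a stored operation being repeated on the assembled changed version.

    import numpy as np

    class StorageCompute:
        """Minimal model of the store/compute/update/recompute flow."""

        def __init__(self):
            self.objects = {}   # object id -> base version (full data)
            self.updates = {}   # object id -> list of sparse updates

        def store_object(self, obj_id, data):
            self.objects[obj_id] = np.array(data, dtype=float)
            self.updates[obj_id] = []

        def store_update(self, obj_id, rows, cols, values):
            # The update data is kept separately from the base object.
            self.updates[obj_id].append((rows, cols, values))

        def _current_version(self, obj_id):
            # Assemble the changed version from the base plus all updates.
            data = self.objects[obj_id].copy()
            for rows, cols, values in self.updates[obj_id]:
                data[rows, cols] = values
            return data

        def compute(self, operation, obj_id):
            # 'operation' stands in for a stored computation object.
            return operation(self._current_version(obj_id))

    dev = StorageCompute()
    dev.store_object("A", [[1.0, 2.0], [3.0, 4.0]])
    trace = lambda m: float(np.trace(m))    # the "computation object"
    first = dev.compute(trace, "A")         # first operation: 1 + 4 = 5
    dev.store_update("A", rows=[1], cols=[1], values=[10.0])
    second = dev.compute(trace, "A")        # repeated on the changed version: 1 + 10 = 11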
These features of a storage compute device can be used for operations where the device needs to repeat the same analysis at intervals as the data changes. The data may be changing slowly or quickly. For example, the storage device computation may be part of a larger iterative computation, which may involve repeating of the same calculation with incrementally updated objects. By incrementally updating currently stored objects instead of replacing them, performance can be improved and data storage requirements reduced. This may also provide other features, such as point-in-time snapshots and versioning.
In FIG. 1, a block diagram illustrates a storage compute device 100 coupled to a host 104 according to an example embodiment.
The storage compute device 100 includes a processing unit 106. The processing unit 106 includes hardware such as general-purpose and/or special-purpose logic circuitry configured to perform functions of the storage compute device 100, including functions indicated in functional blocks 108-112. Functional block 112 provides legacy storage functionality, such as read, write, and verify operations on data that is stored on media. Blocks 108-111 represent specialized functionalities that allow the storage compute device 100 to provide internal computations on behalf of the host 104.
Block 108 represents a command parser that manages object-specific and computation-specific communications between the host 104 and storage compute device 100. For example, the block 108 may process commands that define objects (matrices, vectors, scalars, sparse distributed representations) and operations (e.g., scalar/matrix mathematical and logical operations) to be performed on the objects. A computation section 109 performs the operations on the objects, and may be specially configured for a particular class of operation. For example, if the storage compute device 100 is configured to perform a set of matrix operations, then the computation section 109 may be optimized for that set of operations. The optimization may include knowledge of how best to store and retrieve objects for the particular storage architecture used by the storage compute device 100, and how to combine and compare data objects.
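By way of illustration only, the following Python fragment sketches the kind of dispatching such a command parser might perform. The disclosure does not define a particular command format, so the command names and tuple layout here are assumptions for illustration.

    def parse_command(cmd):
        """Dispatch hypothetical object-definition and computation commands."""
        op, *args = cmd
        if op == "DEFINE_MATRIX":          # define a data object
            obj_id, rows, cols = args
            return ("define", obj_id, (rows, cols))
        if op == "MULTIPLY":               # define a computation on objects
            a_id, b_id, result_id = args
            return ("compute", "matmul", a_id, b_id, result_id)
        raise ValueError("unknown command: %s" % (op,))

    print(parse_command(("DEFINE_MATRIX", "A", 1000, 1000)))
    print(parse_command(("MULTIPLY", "A", "B", "C")))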
An object storage module 110 manages object creation, storage, and access on the storage compute device 100. This may involve, among other things, storing metadata describing the objects in a database 115. The database 115 may also store logical and/or physical addresses associated with the object data. The object storage module 110 may manage other metadata associated with the objects via the database 115, such as permissions, object type, host identifier, local unique identifier, etc.
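The kind of record such a database might hold can be sketched as follows; the field set mirrors the metadata named above (permissions, object type, host identifier, local unique identifier, addresses), but the exact schema is an assumption made only for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class ObjectMetadata:
        obj_id: str        # local unique identifier
        host_id: str       # host identifier
        obj_type: str      # e.g., "matrix", "vector", "scalar"
        permissions: str   # access/security attributes
        addresses: list = field(default_factory=list)  # logical/physical addresses

    database = {}
    database["A"] = ObjectMetadata("A", "host-0", "matrix", "rw", addresses=[0x1000])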
An object versioning module 111 manages host-initiated changes to stored data objects. The host 104 may issue a command that causes a data object currently stored on the storage compute device 100 to be changed. This may involve deleting or keeping the older version of the object. For example, if the data object is a matrix, the change command could include a first array of matrix row/column indicators and a second array with data values associated with the row/column indicators. The changes may be specified in other ways, such as providing a sub-array (which may include single rows or columns of data) and an index to where the sub-array is to be placed in the larger array to form the updated version.
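Both forms of change command described above can be modeled in a few lines of Python (using NumPy) as shown below. The function names are hypothetical, and a real device would apply the update to its stored representation rather than to an in-memory array.

    import numpy as np

    def apply_index_update(matrix, rows, cols, values):
        """First form: arrays of row/column indicators plus associated values."""
        out = matrix.copy()
        out[rows, cols] = values
        return out

    def apply_subarray_update(matrix, sub, row0, col0):
        """Second form: a sub-array and an index of where it is placed."""
        out = matrix.copy()
        r, c = sub.shape
        out[row0:row0 + r, col0:col0 + c] = sub
        return out

    m = np.zeros((4, 4))
    m = apply_index_update(m, rows=[0, 3], cols=[1, 2], values=[5.0, 7.0])
    m = apply_subarray_update(m, np.ones((2, 2)), row0=1, col0=1)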
As noted above, the functional blocks 108-112 may at some point need to access persistent storage, and this can be done by way of a channel interface 116 that provides access to the storage unit 114. There may be multiple channels, and there may be a dedicated channel interface 116 and computation section 109 for each channel. The storage unit 114 may include both volatile memory 120 (e.g., DRAM and SRAM) and non-volatile memory 122 (e.g., flash memory, magnetic media). The volatile memory 120 may be used as a cache for read/write operations performed by read/write block 112, such that a caching algorithm ensures data temporarily stored in volatile memory 120 eventually gets stored in the non-volatile memory 122. The computation section 109 may also have the ability to allocate and use volatile memory 120 for calculations. Intermediate results of calculations may remain in volatile memory 120 until complete and/or be stored in non-volatile memory 122.
As noted above, it is expected that data objects may be too large in some instances to be stored in volatile memory 120, and so may be accessed directly from non-volatile memory 122 while the calculation is ongoing. While non-volatile memory 122 may have slower access times than volatile memory 120, it still may be more efficient to work directly with non-volatile memory 122 rather than, e.g., breaking the problem into smaller portions and swapping in and out of volatile memory 120.
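By way of illustration only, the following Python sketch uses a NumPy memory map as a rough stand-in for a data object resident in non-volatile memory 122, computing a column sum in chunks without ever holding the whole matrix in RAM. The file name, shape, and chunk size are arbitrary choices for the example.

    import numpy as np

    # Create a stand-in for a large matrix resident in non-volatile memory.
    big = np.memmap("matrix.dat", dtype=np.float64, mode="w+", shape=(10000, 100))
    big[:] = 1.0
    big.flush()

    # Operate directly on the stored data in chunks instead of loading it all.
    stored = np.memmap("matrix.dat", dtype=np.float64, mode="r", shape=(10000, 100))
    total = np.zeros(100)
    chunk = 1000
    for start in range(0, stored.shape[0], chunk):
        total += stored[start:start + chunk].sum(axis=0)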
In FIG. 2, versioning of data objects on a storage compute device 200 is illustrated according to an example embodiment.
As seen in FIG. 2, a host 202 stores a data object, here a matrix, on the storage compute device 200. The data of the object is written to a storage unit 208, and metadata 206c describing the object, including versioning data, is maintained in a database 210.
The versioning data within the metadata 206c will be of interest to an object versioning component 212. The object versioning component 212 may receive a communication, as indicated by dashed line 214, when the object is created, or at least when the object is changed. In one configuration, the objects may receive a default initial revision upon creation, and the object versioning component 212 may only need to track versions after updates occur. An example of updating the illustrated matrix is shown in FIG. 3.
In FIG. 3, an example of updating the stored matrix is illustrated according to an example embodiment.
In FIG. 4, assembly of a requested version of a data object is illustrated according to an example embodiment. A request 400 from the host 202 for a particular version of the data object results in an assembly operation 404 being performed by the object versioning component 212.
The assembly operation 404 involves retrieving metadata 406 from the database 210. The metadata 406 at least includes information regarding where particular portions 408-410 of the matrix data can be accessed in the storage unit 208. The metadata 406 may also include indicators of where the data portions are inserted into a base version, as well as identifiers, names, timestamps, and/or events associated with the particular version, etc. The metadata 406 may be indexed via a unique identifier associated with the data object, e.g., provided in the host request 400.
Based on the metadata 406, the object versioning component 212 assembles the data portions 408-410 into the requested version 412 of the data object. This version 412 may be further processed (e.g., adding other metadata, formatting) to form a data object 414 that is passed to the host 202 in response to the request 400. It will be understood that the host 202 need not be aware that the requested object is versioned. Each version may have its own unique identifier, and the host 202 need not be aware of the assembly process used to retrieve a particular version. Also, while the example of a host request is used here for purposes of illustration, forming particular versions of objects may be performed in response to an internal request. For example, the host 202 may load initial objects to the storage compute device 200 and specify particular, repeated operations to be performed on the initial objects. For each iteration, the storage compute device 200 may decide internally to use versioned objects to perform the repeated operations, or may do so at the request of the host 202.
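By way of illustration only, the assembly operation can be sketched as follows, assuming for the example that each stored portion is a set of row/column indices and values keyed by version number; the real device would instead locate each portion 408-410 through the metadata 406 in the database 210.

    import numpy as np

    def assemble_version(base, deltas, target_version):
        """Apply stored update portions, in order, up to the requested version."""
        data = base.copy()
        for version in range(1, target_version + 1):  # version 0 is the base object
            rows, cols, values = deltas[version]
            data[rows, cols] = values
        return data

    base = np.zeros((3, 3))
    deltas = {1: ([0], [0], [1.0]),   # version 1 changed element (0, 0)
              2: ([2], [2], [9.0])}   # version 2 changed element (2, 2)
    v2 = assemble_version(base, deltas, target_version=2)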
While versioned objects are described above as being changed by host commands such as shown in FIG. 3, versions may also be created as the result of computations performed within the storage compute device. In FIG. 5, a sequence diagram illustrates interactions between a host 500 and a storage compute device 502 according to an example embodiment. The host 500 sends one or more commands to an object storage component 504 defining first and second data objects, A and B.
In response to the command(s), the object storage component 504 writes data 510 of the objects to a storage unit 506 and writes metadata 511 of the objects to a database 507. The object storage component 504 then provides an acknowledgement 512 of success. The host 500 also defines a third object via command 513. This object is different in that it is a resultant of a computation, and so the object storage component 504 only writes metadata 514 of the object. The object storage component 504 may also perform other actions that are not shown, such as allocating space in the storage unit 506 and initializing the allocated space.
The host 500 sends a computation command, e.g., computation object 516, to a compute engine 508 that causes the compute engine 508 to multiply the first two objects A and B and put the result in the third object C. When complete, the compute engine 508 writes the result 517 to the storage unit 506 and acknowledges 518 completion to the host 500. Thereafter, the host gets the resultant object C via commands/actions 519-522. In this case, the resultant object may be part of a larger, iterative computation performed via a number of storage compute devices and/or hosts. As a result of this iteration, the value of one of the inputs to the computation, object A, is changed.
This change to object A is communicated to the storage compute device 502, here by a command 523 shown being sent directly to an object versioning component 505. The object versioning component 505 saves the update data 524 and metadata 525 and acknowledges 526 completion. Thereafter, the host 500 requests the same computation as was performed by computation object 516, except that, as seen in computation object 527, the computation involves the next version of object A and the result is the next version of object C. The computation object 527 may be a stored version of the earlier computation object 516, but expressly or impliedly applied to the new versions as indicated.
While not shown in this diagram, performance of computation 527 may involve the object versioning component 505 providing an updated version of the input object A (now labeled A.1) to the compute engine 508. An example of this is shown and described above in relation to FIG. 4.
After completion of the computation, the host 500 requests update data for the resultant object C.1 via computation object 530. Because this is a request for only the changes from a different version of object C, the computation object 530 is processed by the object versioning component 505, which retrieves the data via actions 531-533 in a similar way as the original object was retrieved via actions 520-522. The difference is that the data 533 received by the host 500 represents just the difference from the original object 522 received earlier, and it is up to the host 500 to apply the changes to obtain the full resultant object C.1. If the host 500 and storage compute device 502 are part of a larger system that is solving a distributed problem, then communicating just the changes between iterations may be sufficient to solve some types of problems. Such a system is shown in FIG. 6.
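By way of illustration only, the exchange of just the changed portion of the resultant can be mirrored on the host side in the following Python sketch. Here the device's role is reduced to computing C and C.1 so that the delta and its application by the host can be shown; all of the variable names are illustrative.

    import numpy as np

    A = np.array([[1.0, 0.0], [0.0, 1.0]])
    B = np.array([[2.0, 3.0], [4.0, 5.0]])
    C = A @ B                    # full resultant object C, as first retrieved

    A1 = A.copy()
    A1[0, 0] = 2.0               # update data for A, yielding version A.1
    C1 = A1 @ B                  # device computes resultant version C.1

    # The device returns only the cells of C.1 that differ from C ...
    delta = [(int(r), int(c), float(C1[r, c])) for r, c in np.argwhere(C1 != C)]

    # ... and the host applies the changes to its copy of the original object.
    C_host = C.copy()
    for r, c, v in delta:
        C_host[r, c] = v
    assert np.array_equal(C_host, C1)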
In reference now to FIG. 6, a system according to an example embodiment includes a host device 601 with a host processor 602 coupled to one or more storage compute devices 606-608. Each of the storage compute devices 606-608 may include a storage section 610, a compute section 612, and a controller 614.
The controller 614 receives a data object from the host processor 602 and stores it in the storage section 610. The compute section 612 performs a first mathematical operation on the data object. Thereafter, the controller 614 receives update data from the host processor 602. The update data includes a portion of the data object that has subsequently changed. The update data may be stored in the storage section 610 separately from the data object. The compute section 612 then performs a second mathematical operation on a changed version of the data object using the update data. The changed version may be assembled dynamically for use in the calculation based on the original version plus any update data for the target version and intermediary versions.
The storage compute devices 606-608 may be able to coordinate communication of object data and distribution of parallel tasks on a peer-to-peer basis, e.g., without coordination by the host processor 602. In other arrangements, the host processor 602 may provide some or all of the direction in dividing and distributing tasks, e.g., in response to resource collisions. The host device 601 may be coupled to a network 618 via a network interface 616. The tasks can also be extended to like-configured nodes 620 of the network 618, e.g., nodes having their own storage compute devices. If the distribution of tasks extends to the nodes 620, then the host processor 602 may generally be involved, at least in providing underlying network services, e.g., managing access to the network interface, processing of network protocols, service discovery, etc.
In reference now to FIG. 7, a flowchart illustrates a method according to an example embodiment. The method involves receiving and storing a data object from a host. A first mathematical operation is performed on the data object via a storage compute device. Update data from the host is received and stored separately from the data object, the update data including a portion of the data object that has subsequently changed. A second mathematical operation is then performed on a changed version of the data object using the update data.
The various embodiments described above may be implemented using circuitry and/or software modules that interact to provide particular results. One of skill in the computing arts can readily implement such described functionality, either at a modular level or as a whole, using knowledge generally known in the art. For example, the flowcharts illustrated herein may be used to create computer-readable instructions/code for execution by a processor. Such instructions may be stored on a non-transitory computer-readable medium and transferred to the processor for execution as is known in the art.
The foregoing description of the example embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the inventive concepts to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Any or all features of the disclosed embodiments can be applied individually or in any combination and are not meant to be limiting, but purely illustrative. It is intended that the scope be limited not with this detailed description, but rather determined by the claims appended hereto.