Various entities are increasingly relying on “cloud” storage services provided by various cloud storage vendors, and many applications have therefore been designed to employ application program interfaces (“APIs”) provided by these vendors. Presently, a commonly used cloud storage service is AMAZON's Simple Storage Service (“S3”). A second commonly employed cloud storage service is MICROSOFT AZURE.
Although entities desire to use these applications that are designed to function with one or more cloud service APIs, they also sometimes want more control over how and where the data is stored. As an example, many entities prefer to use data storage systems that they have more control over, e.g., data storage servers commercialized by NetApp, Inc., of Sunnyvale, Calif. Such data storage systems have met with significant commercial success because of their reliability and sophisticated capabilities that remain unmatched, even among cloud service vendors. Entities typically deploy these data storage systems in their own data centers or at “co-hosting” centers managed by a third party.
Data storage systems provide their own protocols and APIs that are different from the APIs provided by cloud service vendors, and so applications designed to be used with one often cannot be used with the other. Thus, some entities are interested in using applications designed for use on cloud storage services, but with data storage systems they can exercise more control over.
Technology is disclosed for event processing using distributed tables for storage services compatibility (“disclosed technology”). In various embodiments, the disclosed technology supports capabilities for enabling a data storage system to provide aspects of a cloud data storage service API. The technology may employ an eventually consistent database for storing metadata relating to stored objects. The metadata can indicate various attributes relating to data that is stored separately. These attributes can include a mapping between how data is stored at a data storage system and how that data may be represented at a cloud data storage service, e.g., in an object storage namespace. For example, data may be stored in a file in the data storage system, but retrieved using an object identifier (e.g., similar to a uniform resource locator) provided by a cloud storage service.
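As a non-limiting illustration of the mapping described above, the following minimal sketch in Python shows how metadata might associate an object identifier with a separately stored file. The names ObjectMetadata, MetadataIndex, and resolve, the field layout, and the example path are all hypothetical and are not taken from the disclosure; an in-memory dictionary stands in for the eventually consistent database.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ObjectMetadata:
    """Hypothetical metadata row mapping an object name to a stored file."""
    bucket: str
    key: str
    file_path: str   # where the data itself lives on the data storage system
    size_bytes: int


class MetadataIndex:
    """In-memory stand-in for the metadata database (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], ObjectMetadata] = {}

    def put(self, meta: ObjectMetadata) -> None:
        # The metadata is stored separately from the data it describes.
        self._rows[(meta.bucket, meta.key)] = meta

    def resolve(self, bucket: str, key: str) -> Optional[str]:
        """Translate an object identifier into the backing file path."""
        meta = self._rows.get((bucket, key))
        return meta.file_path if meta else None


index = MetadataIndex()
index.put(ObjectMetadata("photos", "2014/cat.jpg", "/vol/vol1/objs/0001.dat", 48213))
path = index.resolve("photos", "2014/cat.jpg")
missing = index.resolve("photos", "no-such-object")
```

In this sketch an application addresses the data by bucket and key, as it would through a cloud storage API, while the data itself remains in a file on the data storage system.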
A commercialized example of an eventually consistent database is “Cassandra,” but the technology can function with other databases. Such databases are capable of handling large amounts of data without a single point of failure, and are generally known in the art. These databases have partitions that can be clustered. Each partition can be stored in a separate computing device (“node”), and each row has an associated partition key that forms the first part of the primary key for the table storing the row. Rows are clustered by the remaining columns of the key. Data that is stored at nodes is “eventually consistent” in that other locations may be informed of the additional data (or changed data) over time.
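The partitioning and clustering described above can be illustrated with a short sketch in Python. It does not use any database client; it merely groups hypothetical rows by a partition key and orders each group by a clustering column. The row shape (partition key, clustering column, value) and the function name layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row shape: (partition_key, clustering_column, value)
Row = Tuple[str, int, str]


def layout(rows: List[Row]) -> Dict[str, List[Row]]:
    """Group rows by partition key, then order each partition by its clustering column."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[1])
    return dict(partitions)


rows = [
    ("bucket-a", 2, "update"),
    ("bucket-b", 1, "create"),
    ("bucket-a", 1, "create"),
]
partitions = layout(rows)
```

Each key of the returned dictionary corresponds to one partition, which in a deployed database could be stored at a separate node.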
Changes to an object can be stored as separate data in Cassandra, e.g., as “events.” Each event can indicate a particular change to an object, e.g., creation, one or more updates, and deletion. In some embodiments, a “generation” column of a table tracks the various events and is incremented so that the latest generation indicates the latest state. Eventually consistent databases like Cassandra can be very fast for write operations, but slower for some other operations. Thus, in some embodiments, every change or deletion can be written as an event. However, when multiple nodes are involved in an eventually consistent database, the disclosed technology performs additional processing to ensure that semantics, e.g., application semantics, are enforced. As an example, a deletion of an object cannot precede creation of the object. The additional processing is done because a particular node may not have all events needed to reflect a current view for an object, because additional events were stored at a different node.
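A minimal single-node sketch in Python of the event and generation scheme described above follows. The class names Event and EventLog and the method names append and latest are hypothetical. The sketch assigns the next generation by reading the current maximum, which is an assumption that holds only on a single node; it does not address events stored at other nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    object_id: str
    generation: int      # incremented per object; highest value is the latest state
    kind: str            # "create", "update", or "delete"
    payload: str = ""


class EventLog:
    """Append-only event store for illustration; every change writes a new event."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, object_id: str, kind: str, payload: str = "") -> Event:
        current = max(
            (e.generation for e in self._events if e.object_id == object_id),
            default=0,
        )
        event = Event(object_id, current + 1, kind, payload)
        self._events.append(event)   # existing events are never modified in place
        return event

    def latest(self, object_id: str) -> Optional[Event]:
        candidates = [e for e in self._events if e.object_id == object_id]
        return max(candidates, key=lambda e: e.generation) if candidates else None


log = EventLog()
log.append("obj-1", "create", "v1")
log.append("obj-1", "update", "v2")
log.append("obj-1", "delete")
last = log.latest("obj-1")
```

Because each change is recorded as a new event, a write in this sketch never rewrites an existing row, which is consistent with the observation above that write operations can be fast in such databases.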
Regardless of the sequence of events, the events can be broken down using a finite number of “base sequences” of events that map to a single event that in turn represents the chosen resolution of the sequence. The strategy to resolve a sequence of events to a “correct” state at the latest point in time becomes a substitution of base sequence resolutions into an original arbitrary sequence until a correct current state is reflected. To reflect the correct state, the following processing can occur: (1) events can be processed in time order; and (2) events occurring earlier in time are assumed not to apply to events occurring later.
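The substitution of base sequence resolutions can be sketched in Python as a left-to-right reduction over time-ordered events. The particular table of base sequences below is an assumption chosen for illustration and is not the set of base sequences of any particular embodiment; the names BASE_SEQUENCES and resolve are likewise hypothetical. Events are modeled as (timestamp, kind) pairs.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]   # (timestamp, kind)

# Hypothetical base sequences: each adjacent pair of event kinds resolves to one kind.
BASE_SEQUENCES: Dict[Tuple[str, str], str] = {
    ("create", "create"): "create",
    ("create", "update"): "create",   # object exists, now with updated contents
    ("create", "delete"): "delete",
    ("update", "create"): "create",
    ("update", "update"): "update",
    ("update", "delete"): "delete",
    ("delete", "create"): "create",   # object is re-created after deletion
    ("delete", "update"): "delete",   # an update to a deleted object is ignored
    ("delete", "delete"): "delete",
}


def resolve(events: List[Event]) -> Optional[Event]:
    """Process events in time order, replacing each leading pair with its resolution."""
    ordered = sorted(events, key=lambda e: e[0])
    if not ordered:
        return None
    return reduce(
        lambda earlier, later: (later[0], BASE_SEQUENCES[(earlier[1], later[1])]),
        ordered,
    )


out_of_order = resolve([(3, "delete"), (1, "create"), (2, "update")])
late_update = resolve([(1, "create"), (2, "delete"), (3, "update")])
```

In this sketch, events that arrive out of order are first sorted by time, so the result does not depend on the order in which a node happened to receive them.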
The technology can include a resolution processor for each different application that is supported. As an example, the technology can include a first resolution processor for AMAZON S3 and a second resolution processor for a Cloud Data Management Interface (CDMI). These different resolution processors can process events according to their own respective storage application semantics and resolve conflicts according to their own protocols for doing so. As an example, a CDMI event processor may combine all events from oldest to newest in a timewise manner, but an S3 event processor may choose to ignore some events (e.g., a sequence of update events if there is a delete event later in time).
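The following Python sketch illustrates one way per-application resolution processors might differ. The class names CdmiProcessor and S3Processor, the registry PROCESSORS, and the event shape (timestamp, kind, fields) are hypothetical. The CDMI-style processor merges all events from oldest to newest, while the S3-style processor skips every event at or before the last delete and, as a further assumption, treats the newest surviving write as the whole object.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str, Dict[str, str]]   # (timestamp, kind, fields)


class CdmiProcessor:
    """Combines every event from oldest to newest (illustrative semantics)."""

    def resolve(self, events: List[Event]) -> Optional[Dict[str, str]]:
        state: Optional[Dict[str, str]] = None
        for _, kind, fields in sorted(events, key=lambda e: e[0]):
            if kind == "delete":
                state = None
            else:
                state = {**(state or {}), **fields}
        return state


class S3Processor:
    """Ignores events made moot by a later delete (illustrative semantics)."""

    def resolve(self, events: List[Event]) -> Optional[Dict[str, str]]:
        ordered = sorted(events, key=lambda e: e[0])
        last_delete = max(
            (i for i, e in enumerate(ordered) if e[1] == "delete"), default=-1
        )
        survivors = ordered[last_delete + 1:]
        if not survivors:
            return None
        return dict(survivors[-1][2])   # assumption: newest write replaces the object


PROCESSORS = {"cdmi": CdmiProcessor(), "s3": S3Processor()}

events: List[Event] = [
    (1, "create", {"a": "1"}),
    (2, "update", {"b": "2"}),
    (3, "update", {"a": "3"}),
]
cdmi_view = PROCESSORS["cdmi"].resolve(events)
s3_view = PROCESSORS["s3"].resolve(events)
deleted_view = PROCESSORS["s3"].resolve(events + [(4, "delete", {})])
```

The same stored events thus yield different resolved states depending on which processor is applied, which is the reason for keeping one processor per supported application.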
Several embodiments of the described technology are described in more detail in reference to the Figures. The computing devices on which the described technology may be implemented may include one or more central processing units, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), storage devices (e.g., disk drives), and network devices (e.g., network interfaces). The memory and storage devices are computer-readable storage media that may store instructions that implement at least portions of the described technology. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection. Thus, computer-readable media can comprise computer-readable storage media (e.g., “non-transitory” media) and computer-readable transmission media.
Those skilled in the art will appreciate that the logic illustrated in
Thus, the technology is capable of handling queries in an eventually consistent database, e.g., Cassandra, without locking rows. As is known in the art, locking rows would cause significant deterioration in performance.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. Accordingly, the invention is not limited except as by the appended claims.