1. Field
Embodiments of the invention generally relate to computing applications. More specifically, embodiments provide a multi-threaded application with an in-memory database management system (DBMS) using a collection of automatically generated programming objects.
2. Description of the Related Art
A broad variety of computer software applications access data stored in databases. Similarly, application programs often create and manipulate complex graph data structures in order to perform a variety of application functions. Typically, a program developer creates such data structures from object-oriented programming objects, e.g., instances of a Java® programming language class or a C++ class. Using the Java programming language as an example, a developer may compose a collection of “plain old Java objects,” where references between objects in the graph data structure are represented as Java object variables that point to other Java objects. However, this approach is not thread safe. In some cases, thread safety can be achieved by using, e.g., synchronization mechanisms provided by the Java programming language on a “root” object of a complex data structure. But doing so limits the throughput of a multithreaded program that makes frequent access to the data structure. More fine-grained locking can be used on the data structure, e.g., by using separate locks on separate elements, but this approach introduces the possibility of deadlock conditions. More generally, Java thread synchronization does not address transactions, automatic deadlock detection and rollback, or two-level locking.
Another solution to providing a multithreaded application with access to data is to forgo use of graph data structure objects and instead to configure each thread to access another application, typically a relational database. In such a case, an application program typically uses some form of object-relational mapping mechanism to map data records stored in a relational database to attributes of program objects as well as to provide independent access to data from each thread. The relational database coordinates multiple threads accessing the data. However, DBMSs are frequently much slower for write accesses and thus are suited to applications that are read-mostly, rather than applications that make heavy use of writing to (changing) the graph data structure from multiple threads.
Embodiments presented herein include a method for accessing a concurrent graph in-memory database management system (DBMS). This method may generally include launching a multithreaded application configured to access a concurrent graph data structure included in the multithreaded application and initializing the concurrent graph data structure. The concurrent graph data structure includes a factory object configured to instantiate a plurality of objects in the concurrent graph data structure in response to requests from one of a plurality of threads in the multithreaded application. This method also includes, while executing a first thread of the plurality of threads, initiating a transaction and accessing at least a first one of the plurality of objects in the concurrent graph data structure, wherein the first object is configured to maintain concurrency control while being accessed by the first thread as part of the transaction. Upon determining the transaction is complete, the transaction is committed. Upon determining an exception has occurred, a state of the first object is rolled back to undo any changes made while performing the transaction.
Other embodiments include, without limitation, a computer-readable medium that includes instructions that enable a processing unit to implement one or more aspects of the disclosed methods as well as a system having a processor, memory, and application programs configured to implement one or more aspects of the disclosed methods.
So that the manner in which the above recited aspects are attained and can be understood in detail, a more particular description of embodiments of the invention, briefly summarized above, may be had by reference to the appended drawings. Note, however, the appended drawings illustrate only typical embodiments of this invention and do not limit the scope thereof, for the invention may admit to other equally effective embodiments.
Embodiments presented herein provide an object-oriented, multithreaded application program that both supports a specific object schema and provides transactional semantics for threads launched by the application to access a concurrent graph data structure, which itself provides an in-memory DBMS for the application threads. Embodiments presented herein also provide techniques for generating source code for the concurrent graph data structure, transaction patterns for accessing the concurrent graph data structure, as well as source code for creating, reading, updating, and deleting attributes for objects in the graph structure. At the same time, the generated code handles concurrency issues and deadlocks that occur when multiple threads access the concurrent graph data structure.
In one embodiment, the generated code includes a factory class used to instantiate objects (i.e., nodes) in the concurrent graph data structure, manage indexes of objects in the concurrent graph, and resolve deadlocks that may occur when multiple threads access the concurrent graph simultaneously. The resulting application code allows a multithreaded program to access the graph data structure quickly and efficiently, including performing frequent writes (changes) to the concurrent graph data structure, as well as frequent reading of the concurrent graph, from multiple threads executing simultaneously.
In one embodiment, the concurrent graph data structure incorporates functionality of a conventional DBMS into the implementation of a set of programmatic objects (e.g., Java or C++ classes) accessed by a multithreaded application, by using encapsulation. For example, the concurrent graph data structure may manage concurrency issues, e.g., using two-level locking or after-the-fact optimistic concurrency detection, deadlock detection (if pessimistic concurrency is used), rollback of incomplete transactions (in case of rollback due to concurrency violations, deadlock, or Java exceptions interrupting a transaction), without requiring a developer to explicitly build this functionality into the multithreaded application or concurrent graph objects. Instead, the source code generated from a schema description in conjunction with the use of transaction annotation in the application itself encapsulates this functionality into the objects of the concurrent graph structure.
The generated code may include a factory object for creating instances of the concurrent graph objects. The factory object may also include an extent or realized collection of all instances of each class of object in the concurrent graph data structure, and indexes on the objects in an extent based on an extensible set of unique keys for each object. In one embodiment, the code generation tools described herein automatically generate an implementation of the objects that make up the concurrent graph data structure from a high level data schema language that describes the objects and relationships as well as the factory from the same high level data schema language. The schema language allows a developer to represent relationships between objects explicitly, including the cardinality of the relationship, and relationships may be modified from either of the two objects that have the relationship, and both ends of the relationship are automatically maintained consistently by objects of the concurrent graph. In one embodiment, the two-way relationship maintenance is encapsulated within the implementation of the objects created by the code generator for a given data schema defined using the data schema language.
The concurrent graph data structure, i.e., the in-memory DBMS, allows for representation of graph data structures in memory using familiar object navigation semantics, while at the same time providing the atomicity, concurrency and integrity properties of a conventional DBMS, including concurrent access and modification of the concurrent graph data structure from multiple threads. Thus, the concurrent graph data structure serves as a “traffic cop” between multiple application threads, preventing them from seeing unfinished and inconsistent changes made by other threads performing transactions against the concurrent graph, and ensuring atomicity of changes. It also provides automatic detection of deadlocks, and correct rollback of a thread's incomplete transaction when exceptions or deadlocks occur.
Aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the current context, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Embodiments of the invention may be provided to end users through a cloud computing infrastructure. Cloud computing generally refers to the provision of scalable computing resources as a service over a network. More formally, cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Thus, cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources. A user can access any of the resources that reside in the cloud at any time, and from anywhere across the Internet.
Note, embodiments of the invention are described below using the Java programming language as an example of a programming language used to provide source code for an in-memory DBMS using a concurrent graph data structure. One of ordinary skill in the art will recognize, however, that embodiments of the invention may be adapted for use with other object oriented programming languages that support multithreaded applications.
In one embodiment, the application threads 105(1-n) and API threads 115(1-2) initiate and commit transactions against the concurrent graph data structure 120, e.g., threads 105, 115 may create, read, update and delete data elements (i.e., objects and attributes of objects) in the concurrent graph data structure 120. In turn, the concurrent graph data structure 120 may be configured to ensure that transactions performed concurrently by multiple threads are (i) atomic, i.e., a transaction initiated by a thread 105, 115 is either completed fully or not at all, including rolling back a partially completed transaction; (ii) consistent, i.e., any completed transaction will bring the database from one valid state to another, e.g., deleting a parent object will result in any child objects being deleted as well; and (iii) isolated, i.e., two threads executing independent transactions concurrently produce a state of the concurrent graph data structure that could have been obtained had the transactions been executed one after the other.
As shown, the concurrent graph data structure 120 includes an object factory 122 and concurrent graph objects 125. In one embodiment, the object factory 122 provides a programmatic object configured to create the nodes (i.e., instantiate a concurrent graph object 125) as part of transactions initiated by threads 105, 115. More generally, the concurrent graph data structure 120, or just concurrent graph, provides an in-memory data structure which includes object instances (i.e., concurrent graph objects 125) and relationships among object instances. Unlike conventional object-oriented programming objects, each concurrent graph object 125 includes a locking mechanism to prevent an object's state from being modified by two different threads 105, 115 at the same time and also includes a rollback mechanism allowing object state to be restored to the value it had at the start of a transaction (if the transaction fails).
In one embodiment, the concurrent graph data structure 120 includes a mechanism to determine which object instances (i.e., which concurrent graph objects 125) have been read and/or modified by a given transaction. Additionally, the locking mechanism of the concurrent graph data structure 120 is able to determine when a deadlock occurs, e.g., when two threads are each waiting for access to a lock held by the other. In one embodiment, the concurrent graph data structure 120 may be persisted, i.e., stored in a persistent storage medium, e.g., a disk drive. Doing so allows the in-memory state of the concurrent graph objects 125 to be persisted to storage 135 and later read from storage 135.
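The pairing of a per-object lock with a rollback snapshot can be illustrated with a minimal sketch. The class and method names below (`ConcurrentNode`, `commit`, `rollback`) are hypothetical and chosen only for illustration; the sketch assumes a single exclusive lock per object and a single attribute, which is simpler than the two-level locks discussed later.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration (names are assumptions, not from the description):
// a graph node that guards its state with a lock and remembers the value it
// held before the current transaction first modified it.
class ConcurrentNode {
    private final ReentrantLock lock = new ReentrantLock();
    private String name;       // current, possibly uncommitted, state
    private String savedName;  // value at the start of the transaction
    private boolean dirty;     // true once the transaction has written

    ConcurrentNode(String name) { this.name = name; }

    // Accessors take the lock once per transaction and keep it until
    // commit() or rollback() is called by the same thread.
    String getName() {
        if (!lock.isHeldByCurrentThread()) lock.lock();
        return name;
    }

    void setName(String newName) {
        if (!lock.isHeldByCurrentThread()) lock.lock();
        if (!dirty) {            // snapshot the old value on first write only
            savedName = name;
            dirty = true;
        }
        name = newName;
    }

    // Makes the changes permanent and releases the lock.
    void commit() {
        if (!lock.isHeldByCurrentThread()) return;
        dirty = false;
        savedName = null;
        lock.unlock();
    }

    // Restores the value held at the start of the transaction.
    void rollback() {
        if (!lock.isHeldByCurrentThread()) return;
        if (dirty) name = savedName;
        dirty = false;
        savedName = null;
        lock.unlock();
    }
}
```

Because the lock is held until `commit()` or `rollback()`, another thread calling `getName()` blocks until the writing transaction ends and so never observes an uncommitted value.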
In one embodiment, a code generator 210 may generate the in-memory DBMS source code 215 based on a schema description 205 of the entities (e.g., objects) in a given concurrent graph. The schema itself 205 may be composed according to a schema definition language used to describe concurrent objects and relationships among them including various relationship cardinalities. The code generator 210 may be configured to transform a given concurrent graph schema (e.g., schema 205) defined using the schema definition language into fully implemented objects that use a collection of inheritable base classes and a factory class (i.e. the concurrent graph classes 220) that performs basic CRUD (create, retrieve, update, and delete) operations on the concurrent graph objects as part of thread-initiated transactions.
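As a rough illustration of the transformation from a schema entry to class source, the following toy generator emits a derived class with locking accessors for a list of attributes. Everything here is an assumption made for the sketch: the `TinyCodeGenerator` name, the map-based schema input, and the emitted `ConcurrentObjectBase`, `lockForRead` and `lockForWrite` identifiers. The actual schema definition language and generated code are not reproduced in this text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy generator: turns a class name plus attribute-name/type
// pairs into Java source text for a class with lock-acquiring accessors.
class TinyCodeGenerator {
    static String generate(String className, Map<String, String> attributes) {
        StringBuilder src = new StringBuilder();
        src.append("class ").append(className)
           .append(" extends ConcurrentObjectBase {\n");
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            String name = attr.getKey();
            String type = attr.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            // Field declaration
            src.append("    private ").append(type).append(' ')
               .append(name).append(";\n");
            // Accessor that takes a read lock before returning the value
            src.append("    public ").append(type).append(" get").append(cap)
               .append("() { lockForRead(); return ").append(name).append("; }\n");
            // Mutator that takes a write lock before changing the value
            src.append("    public void set").append(cap).append('(')
               .append(type).append(" v) { lockForWrite(); ")
               .append(name).append(" = v; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("name", "String");
        attrs.put("age", "int");
        System.out.print(generate("Child", attrs));
    }
}
```

A real generator would also emit keys, relationships and schema-specific methods; the sketch covers only attributes.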
While the syntax and semantics of the schema description language may be tailored to suit the needs of a particular case,
More generally, the data schema 300 includes not only each object's attributes, but also includes relationships, constraints on the attributes and relationships, a declaration of unique keys, and methods that manipulate the objects. For the purposes of identifying the object, each class has a primary key. In addition, the object may have other unique keys by which an object possessing a particular key value may be found using the factory class generated for a given data schema.
Relationships between classes in the schema may specify a cardinality of that relationship (e.g., as being one-to-one, one-to-many, many-to-one, or many-to-many). Relationships among objects are bi-directional, meaning that if class A has a relationship to class B, then class B will have a corresponding inverse relationship to class A. Each direction of a relationship can be single-valued (one) or multi-valued (many). A relationship may exist between objects of two distinct classes, or between a class and itself. For example, in data schema 300, there is a one-to-one relationship between Husband and Wife. To generate source code for this relationship, the code generator may represent this one-to-one relationship using one-way Java object references on each side of the relationship, whose names indicate the relationship from that side. The bi-directional relationship between Husband and Wife is an example of a one-to-one relationship. Note that the relationship is declared only on one side in data schema 300 (as shown in
One-to-many relationships from an object to multiple other objects may be represented with a set of Java object references from the “one” side class to the many side class and a single Java object reference from the many side to the one side class. The bi-directional relationships between Husband and Child and the separate relationship between Wife and Child are two examples of one-to-many relationships. Many-to-many relationships between an object and another object may be represented by a set of Java object references in each class. The bi-directional relationship between two Child instances (idol/admirer) is an example of a many-to-many relationship. Specifically, a Child may idolize multiple other children, and a Child may have multiple other children as admirers (note, at least as defined in this example, a child may admire him or herself).
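The two-way maintenance described for one-to-one and one-to-many relationships can be sketched as follows. This is a minimal illustration under stated assumptions: the method names (`setWife`, `setHusband`, `setFather`, `addChild`) and the `father` reference are hypothetical, and locking is omitted so that only the relationship bookkeeping is visible.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: each setter updates the inverse end of the
// relationship, so the two ends cannot disagree.
class Husband {
    private Wife wife;                                        // one-to-one
    private final Set<Child> children = new LinkedHashSet<>(); // one-to-many

    Wife getWife() { return wife; }

    void setWife(Wife newWife) {
        if (wife == newWife) return;
        Wife old = wife;
        wife = newWife;
        // Detach the previous partner only if she still points at this object.
        if (old != null && old.getHusband() == this) old.setHusband(null);
        if (newWife != null && newWife.getHusband() != this) newWife.setHusband(this);
    }

    Set<Child> getChildren() { return Collections.unmodifiableSet(children); }

    // The relationship may be modified from the "one" side...
    void addChild(Child child) { child.setFather(this); }

    // Helpers used by Child to keep the set in sync.
    void linkChild(Child child) { children.add(child); }
    void unlinkChild(Child child) { children.remove(child); }
}

class Wife {
    private Husband husband;

    Husband getHusband() { return husband; }

    void setHusband(Husband newHusband) {
        if (husband == newHusband) return;
        Husband old = husband;
        husband = newHusband;
        if (old != null && old.getWife() == this) old.setWife(null);
        if (newHusband != null && newHusband.getWife() != this) newHusband.setWife(this);
    }
}

class Child {
    private Husband father;   // single reference on the "many" side

    Husband getFather() { return father; }

    // ...or from the "many" side; both paths end up here.
    void setFather(Husband newFather) {
        if (father == newFather) return;
        if (father != null) father.unlinkChild(this);
        father = newFather;
        if (newFather != null) newFather.linkChild(this);
    }
}
```

Reassigning a partner from either side detaches the previous partner as well, which is how the one-to-one cardinality is enforced in this sketch.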
By including the relationships, cardinality, and other constraints on relationships between objects in the data schema 300, the code generator can create source code for classes that support transactional semantics for multiple threads accessing the concurrent graph data structure. Further, in addition to specifying data attributes, the data schema 300 may also specify method operations for a particular class. For example, the “child” class of data schema 300 includes a “parentNames” procedure that returns the names of each parent associated with a child instance. Note, to do so, an instance of a child class in the concurrent graph data structure must traverse the relationships of that child object to identify the parent names from the related objects in the concurrent graph data structure. To support this traversal, the generated code may automatically obtain read locks when a thread accesses the concurrent graph using this method. Doing so allows the developer to simply access the concurrent graph data structure using familiar object oriented mechanisms, without having to explicitly build concurrency, atomicity, or deadlock resolution into the application. Note, in addition to any specific methods supplied in the data schema 300, the code generator may also create accessor and mutator methods for the data attributes of each class, e.g., methods to perform create, read, update and delete operations for attributes of an object defined by data schema 300.
In one embodiment, the code generator creates a derived class from the concurrent object base class 410 for each class described in the data schema. The source code generated for each such derived class encapsulates functionality allowing multiple threads to concurrently read, update, and delete objects in the concurrent graph data structure, as well as capture (and enforce) relationships between classes specified by the data schema 300. For example, the generated code will enforce the cardinality specified by a given relationship (e.g., an instance of the husband class can have a relationship to at most one instance of the wife class, but can be related to multiple instances of the child class). The generated classes 420 also include any specific methods or procedures described by the data schema 300, along with a collection of methods inherited from the concurrent object base class 410.
The lock map 525 allows the factory object 510 to identify when a deadlock occurs and throw the appropriate exceptions in response. Doing so allows a thread requesting a lock that resulted in a deadlock condition to roll back and/or retry a given transaction. In one embodiment, concurrency issues are managed by the concurrent graph data structure using two level locks 530. In such an embodiment, a thread may obtain a lock to a given concurrent graph object 535 whenever a transaction is performed that includes that concurrent graph object 535. The two level locks 530 include one (or more) read locks for a given concurrent graph object 535 and a single write lock for that concurrent graph object 535. That is, multiple threads may obtain a read lock for a given concurrent graph object 535, but only one thread may obtain a write lock at any given time. When requesting a write lock, a thread performing a transaction needs to wait until all read locks on that object have been released; the write lock is then obtained, allowing the transaction to continue. Similarly, if a write lock is active for a given object, any thread requesting a read lock for that object needs to wait until the write lock for that object is released; the read lock is then obtained. The lock map 525 identifies what locks have been requested for a given object and what thread (or threads) is waiting for a given read or write lock. In the event of a deadlock, the concurrent graph factory object 510 can resolve the deadlock by throwing an exception caught by the threads causing the deadlock. In response, a thread can roll back its partially completed transaction, causing it to release all of its locks, thus resolving the deadlock.
At step 715, the code generator generates source code for each class identified in the data schema. For example, in one embodiment, the code generator may create a derived class from a concurrent object base class. Such a derived class may include the attributes, keys, and methods specified by the data schema for that class. Further, the derived class may include source code that allows the derived object to interact with the two level locks and the factory object. For example, in addition to any schema-specific methods, the code generator may create methods to access, read and write to the data attributes of that class. Importantly, the derived class includes code needed to obtain read/write locks automatically when methods to read or write to the attributes are invoked by an application thread as part of a transaction.
At step 720, the code generator generates source code for a factory object for the in-memory DBMS. As described above, in one embodiment, the factory object may be derived from a concurrent graph base class and provide the functionality needed to create instances of the object classes generated at step 715, as well as source code to identify and resolve deadlocks that occur when multiple threads access locks to objects in the in-memory database. Additionally, the factory object may include source code configured to create indexes and extents of objects created by the application threads as part of a transaction at runtime. The indexes allow object references to quickly and efficiently be obtained by an application thread and the extents allow an application thread to quickly identify all objects of a given object type. Further, the code generator may also include source code in the factory object for creating and maintaining a map indicating what objects are waiting for a given object lock and include source code for resolving deadlocks when they occur.
At step 725, the code generator generates source code for the in-memory database that does not depend on the contents of the data schema received at step 705. For example, the support classes may include the locking and deadlock objects described above as well as code used to persist (or restore) a concurrent graph data structure from non-volatile storage. At step 730, the code generator outputs the source code for the classes generated at steps 715, 720, and 725.
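One common way to detect the deadlock condition described above is to follow the chain of "thread waits for object held by thread" links and check whether it leads back to the requester. The sketch below illustrates that idea only; it is not taken from the description. The `LockMap` and `DeadlockException` names are hypothetical, threads and objects are identified by plain strings, and only exclusive (write) locks are modeled, so the read/write distinction of the two level locks is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical exception a requester would catch in order to roll back.
class DeadlockException extends RuntimeException {
    DeadlockException(String message) { super(message); }
}

// Simplified lock map: records who holds each object and who waits for what.
class LockMap {
    private final Map<String, String> holderOf = new HashMap<>();   // object -> holding thread
    private final Map<String, String> waitingFor = new HashMap<>(); // thread -> awaited object

    // Returns true if the lock was granted and false if the caller must wait.
    // Throws DeadlockException if waiting would close a cycle.
    synchronized boolean request(String thread, String object) {
        String holder = holderOf.get(object);
        if (holder == null || holder.equals(thread)) {
            holderOf.put(object, thread);
            waitingFor.remove(thread);
            return true;
        }
        // Follow the wait-for chain starting at the current holder.
        Set<String> visited = new HashSet<>();
        String current = holder;
        while (current != null && visited.add(current)) {
            if (current.equals(thread)) {
                throw new DeadlockException(thread + " would deadlock on " + object);
            }
            String awaited = waitingFor.get(current);
            current = (awaited == null) ? null : holderOf.get(awaited);
        }
        waitingFor.put(thread, object);
        return false;
    }

    // Called on commit or rollback: drops every lock and pending request.
    synchronized void releaseAll(String thread) {
        holderOf.values().removeIf(thread::equals);
        waitingFor.remove(thread);
    }
}
```

In this sketch the thread whose request would close the cycle receives the exception; after it calls `releaseAll` as part of its rollback, the other thread's pending request can be granted.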
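The automatic lock acquisition in generated accessors might look roughly like the following. This is a sketch under assumptions: `Txn`, `ConcurrentObjectBase`, `lockForRead`, `lockForWrite` and `Person` are hypothetical names, and the standard `java.util.concurrent.locks.ReentrantReadWriteLock` stands in for the two level locks. That class cannot upgrade a read lock to a write lock, so the sketch rejects that case instead of handling it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical per-thread record of the locks taken by the current transaction.
class Txn {
    private static final ThreadLocal<Deque<Lock>> HELD =
            ThreadLocal.withInitial(ArrayDeque::new);

    static void acquire(Lock lock) {
        lock.lock();
        HELD.get().push(lock);
    }

    // Called at commit or rollback: releases locks in reverse order.
    static void releaseAll() {
        Deque<Lock> held = HELD.get();
        while (!held.isEmpty()) held.pop().unlock();
    }

    static int heldCount() { return HELD.get().size(); }
}

// Hypothetical base class that every generated class would extend.
abstract class ConcurrentObjectBase {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    protected void lockForRead() { Txn.acquire(rw.readLock()); }

    protected void lockForWrite() {
        // ReentrantReadWriteLock would block forever on a read-to-write
        // upgrade, so this sketch fails fast instead.
        if (rw.getReadHoldCount() > 0 && !rw.isWriteLockedByCurrentThread()) {
            throw new IllegalStateException("read-to-write upgrade not handled in this sketch");
        }
        Txn.acquire(rw.writeLock());
    }
}

// What a generated class for a schema entry with one attribute might contain.
class Person extends ConcurrentObjectBase {
    private String name;

    public String getName() { lockForRead(); return name; }

    public void setName(String newName) { lockForWrite(); name = newName; }
}
```

Application code calls only `getName()` and `setName()`; the locks are taken as a side effect and stay held until the transaction ends and `Txn.releaseAll()` runs.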
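The roles of extents and unique-key indexes in the factory object can be sketched with a single map. The names `GraphFactory` and `ChildRecord`, and the choice of `name` as the primary key, are assumptions for illustration; locking and deadlock handling are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical node type with a primary key.
class ChildRecord {
    final String name;   // assumed primary key

    ChildRecord(String name) { this.name = name; }
}

// Hypothetical factory: the only way to create nodes, so it can keep the
// extent (all instances) and the primary-key index up to date.
class GraphFactory {
    // Serves as both the extent and the primary-key index in this sketch.
    private final Map<String, ChildRecord> childrenByName = new ConcurrentHashMap<>();

    ChildRecord createChild(String name) {
        ChildRecord child = new ChildRecord(name);
        // putIfAbsent enforces key uniqueness atomically.
        if (childrenByName.putIfAbsent(name, child) != null) {
            throw new IllegalArgumentException("duplicate key: " + name);
        }
        return child;
    }

    // Index lookup: returns null if no object has the given key.
    ChildRecord findChild(String name) { return childrenByName.get(name); }

    // Extent: a snapshot of every instance of the class.
    List<ChildRecord> allChildren() { return new ArrayList<>(childrenByName.values()); }

    void deleteChild(ChildRecord child) { childrenByName.remove(child.name, child); }
}
```

Additional unique keys would each get their own map maintained alongside this one.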
Once created (or restored), multiple application threads may read from and write to object nodes in the concurrent graph data structure. As shown by method 800, e.g., a loop begins following block 812 where the multithreaded application selects a thread to execute (until it blocks) or relinquishes control. At step 815, a thread initiates (or resumes) a transaction. In the present context, a transaction refers to an operation performed against the in-memory DBMS that should either be committed or rolled back. While performing a transaction, e.g., while the thread invokes accessor and mutator methods for one of the concurrent objects, the concurrent objects obtain read and/or write locks when accessing data objects in the in-memory DBMS (step 825). At step 830, the thread determines whether a transaction has been successfully completed. If so, then the thread commits the transaction (step 835). Otherwise, if the transaction fails (e.g., because a deadlock occurs) any changes made by the transaction are rolled back, and the thread may restart the transaction (step 840). In either case, the method 800 returns to step 815 where another thread is executed (allowing another transaction to be resumed/initiated). For example, the following table illustrates an example pattern for a thread to perform a transaction using the Java programming language.
The code between cg.start() and cg.commit() may throw exceptions that are not caught by the above pattern. In that case, a cg.rollback() will occur due to the finally clause. Thus, uncaught exceptions are considered to be errors that abort the transaction and all changes to the concurrent graph data structure will be rolled back if the uncaught exception passes through the transaction boilerplate. Another approach to provide this transaction pattern would be to use Java annotation semantics. For example, a “@begin_transaction” and an “@end_transaction” annotation could be used to hide the boilerplate code, allowing the developer to simply bracket their transactions with the annotations.
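A minimal sketch of such a pattern follows. It is reconstructed from the surrounding description rather than copied from the original table: only `cg.start()`, `cg.commit()`, `cg.rollback()` and the use of a finally clause come from the text. The `ConcurrentGraph` stand-in, the `RetryableDeadlock` exception, the retry loop and the assumption that `rollback()` does nothing after a successful commit are all hypothetical.

```java
// Hypothetical exception signalling that the transaction lost a deadlock.
class RetryableDeadlock extends RuntimeException {}

// Hypothetical stand-in for the generated factory object; it only counts calls.
class ConcurrentGraph {
    private boolean active;
    int commits;
    int rollbacks;

    void start() { active = true; }

    void commit() { active = false; commits++; }

    // Assumed to be a no-op when no transaction is open, so that calling it
    // from a finally clause after a successful commit is harmless.
    void rollback() {
        if (active) {
            active = false;
            rollbacks++;
        }
    }
}

class TransactionPattern {
    static void runInTransaction(ConcurrentGraph cg, Runnable work) {
        boolean done = false;
        while (!done) {
            try {
                cg.start();
                work.run();       // accessor and mutator calls go here
                cg.commit();
                done = true;
            } catch (RetryableDeadlock e) {
                // Deadlock: the finally clause rolls back, then the loop retries.
            } finally {
                cg.rollback();    // undoes changes unless the commit succeeded
            }
        }
    }
}
```

Any exception other than the deadlock signal propagates to the caller, but only after the finally clause has rolled the transaction back.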
The CPU 905 retrieves and executes programming instructions stored in the memory 920 as well as stores and retrieves application data residing in the storage 930. The interconnect 917 is used to transmit programming instructions and application data between the CPU 905, I/O devices interface 910, storage 930, network interface 915, and memory 920. Note, CPU 905 is included to be representative of a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like. And the memory 920 is generally included to be representative of a random access memory. The storage 930 may be a disk drive storage device. Although shown as a single unit, the storage 930 may be a combination of fixed and/or removable storage devices, such as fixed disc drives, removable memory cards, or optical storage, network attached storage (NAS), or a storage area-network (SAN).
Illustratively, the memory 920 includes a concurrent graph data structure 922, a multithreaded application 924, and a code generation tool 926. And the storage 930 includes a schema description 932 and persisted DBMS 934. As described above, the concurrent graph data structure 922 provides an in-memory DBMS accessed by the multithreaded application 924. At the same time, for the application developer, the objects of the concurrent graph data structure 922 are accessed using familiar semantics for creating, reading, updating, and deleting objects. That is, the developer may interact with the objects instantiated in the concurrent graph data structure as a collection of “plain old Java objects.” The code generation tool 926 is generally configured to create the classes needed for the concurrent graph data structure 922 from a schema description 932. The persisted DBMS 934 represents a serialized copy of the concurrent graph data structure 922 written to disk. Note, while computing system 900 shows both the code generation tool 926 and the concurrent graph data structure 922 on the same computing device, one of ordinary skill in the art will recognize that the code generation tool 926 need not be included or distributed with the multithreaded application 924.
As described, embodiments presented herein provide an object-oriented, multithreaded application program that both supports a specific object schema and provides transactional semantics for threads launched by the application to access a concurrent graph data structure, which itself provides an in-memory DBMS for the application threads. Embodiments presented herein also provide techniques for generating source code for the concurrent graph data structure, transaction patterns for accessing the concurrent graph data structure, as well as source code for creating, reading, updating, and deleting attributes for objects in the graph structure. At the same time, the generated code handles concurrency issues and deadlocks that occur when multiple threads access the concurrent graph data structure.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims priority to U.S. Provisional Patent Application Ser. No. 61/548,142 filed Oct. 17, 2011, entitled “Concurrent Graph In-Memory DBMS and Automatic Generation of Concurrent Graph In-Memory DBMS,” which is hereby incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
61548142 | Oct 2011 | US |