Graph databases are powerful tools that find countless applications in data science, information technology, and virtually any field in which it is desired to track diverse types of information. As compared with relational databases, which store data in tables, graph databases store data in the form of nodes. Each node may have a name as well as one or more labels, and multiple nodes may be grouped according to label. Relationships are formed between nodes, and each relationship has a start node and an end node. Each node of a graph database can have properties, and properties may be stored as key-value pairs.
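As a rough illustration only, the data model just described can be sketched in Python as follows. This is a hypothetical in-memory structure chosen for clarity. It does not reflect how Neo4j or any other product stores data, and the names `Node`, `Relationship`, and `nodes_with_label` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A graph node: a name, zero or more labels, and key-value properties."""
    name: str
    labels: set[str] = field(default_factory=set)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed relationship with a type, a start node, and an end node."""
    rel_type: str
    start: Node
    end: Node


def nodes_with_label(nodes: list[Node], label: str) -> list[Node]:
    """Group nodes according to label, e.g., every node labeled 'Server'."""
    return [n for n in nodes if label in n.labels]
```

For instance, `Node("SVR-361", {"Server"}, {"IP_Addy": "192.168.10.11"})` would model a server node with one property, and `nodes_with_label(all_nodes, "Server")` (where `all_nodes` is a hypothetical list) would group every node carrying the Server label.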
Because graph databases represent data using nodes rather than tables, graph databases can run complex queries involving many diverse nodes using a relatively small amount of memory. Running similar queries using table-based databases might require numerous table joins, which can consume a great deal of memory and require long execution times. A popular graph database management system is Neo4j, which was developed by Neo4j, Inc. and which employs a query language called Cypher. Other examples of graph databases include JanusGraph, Giraph, and Dgraph, and other query languages include Gremlin and SPARQL.
Graph databases are typically schema-less, meaning that there is no enforcement of rules regarding the structures that nodes, properties, relationships, and labels can assume. This schema-less feature is an advantage to many users, who require flexibility in terms of data processing and analysis.
Unfortunately, the lack of schemas inherent in most graph databases can cause difficulties for non-experts. Indeed, the very lack of structure that experts enjoy is often a handicap to non-experts, who could benefit from the simplicity of more highly structured data, particularly when performing manual data entry. Thus, there is a need for applying schemas to graph databases to make them easier to use for non-experts and others.
To help address this need, an improved technique for managing a graph database provides a schema for a data model in the graph database by creating a metamodel. The metamodel resides alongside the data model and includes a set of nodes, which define respective classes of nodes that may exist in the data model. Users may operate a user interface to create new nodes in the data model as instances of the node classes defined in the metamodel.
In some examples, the new nodes inherit characteristics of the node classes, which limit the scope of permitted properties and/or relationships of the new nodes in the data model to those established for the respective node classes in the metamodel. In this manner, the node classes defined in the metamodel enforce structure on the instances of the node classes created in the data model, thereby enabling the metamodel to function as a schema for the data model.
Certain embodiments are directed to a method of managing a graph database. The method includes creating a metamodel in the graph database. The metamodel includes a first set of nodes, the first set of nodes defining respective node classes. The metamodel is created based at least in part on operation of a user interface served by a computing machine in communication with the graph database. The method further includes providing a data model in the graph database, the data model including a second set of nodes, the second set of nodes distinct from the first set of nodes. The method still further includes applying the metamodel as a schema for the data model, including instantiating, in response to operation of the user interface, new nodes of the data model as instances of respective node classes defined by the first set of nodes.
Other embodiments are directed to a computerized apparatus constructed and arranged to perform a method of managing a graph database, such as the method described above. Still other embodiments are directed to a computer program product. The computer program product stores instructions which, when executed by control circuitry of a computerized apparatus, cause the computerized apparatus to perform a method of managing a graph database, such as the method described above.
In some examples, both the metamodel and the data model operate within a single instance of the graph database.
According to some examples, creating the metamodel includes establishing a set of properties of a particular node of the first set of nodes. The set of properties define respective types of characteristics that the user interface permits users to assign to instances of the particular node created in the data model.
In some examples, establishing the set of properties of the particular node includes presenting, by the user interface, a control for specifying whether a property is required to be assigned to instances of the particular node created in the data model.
In some examples, creating the metamodel includes establishing a set of relationships for the particular node in the metamodel. The set of relationships define respective types of connections that are permitted between instances of the particular node and other nodes in the data model.
In some examples, establishing the set of relationships for the particular node in the metamodel includes establishing a particular relationship by (i) identifying another node in the metamodel and (ii) specifying a type of relationship between the particular node and the other node. Here, each instance of the particular node in the data model inherits the specified type of relationship from the particular node in the metamodel and thereby is permitted to form a relationship of the specified type with a respective instance of the other node in the data model.
In some examples, establishing the particular relationship is further performed by specifying a directionality of the particular relationship.
In some examples, establishing the particular relationship is further performed by specifying a forward cardinality and/or a reverse cardinality of the particular relationship.
According to some examples, creating the metamodel further includes assigning each of the first set of nodes (i) a respective label that identifies the node as a component of the metamodel and (ii) a respective name that identifies the node class represented by the respective node.
According to some examples, instantiating the new nodes of the data model includes assigning each of the new nodes (i) a respective label that matches the name of one of the first set of nodes in the metamodel and thereby identifies a node class of which the new node is an instance and (ii) a respective name that identifies the specific instance of the node class.
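The following Python sketch shows one way the label and name conventions above might be expressed as Cypher statements. It assumes the metamodel-identifying label text is "Component" (any text applied consistently would do), and the helper names are hypothetical. Because Cypher has traditionally not accepted labels as query parameters, the sketch splices the class name in as a backtick-quoted identifier and passes only the node name as the parameter `$name`.

```python
# Assumed label text marking metamodel nodes; the exact text is arbitrary.
METAMODEL_LABEL = "Component"


def quote_identifier(name: str) -> str:
    """Backtick-quote a Cypher label, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_node_class(class_name: str) -> tuple[str, dict]:
    """Metamodel node: metamodel label plus a name identifying the node class."""
    query = f"CREATE (c:{quote_identifier(METAMODEL_LABEL)} {{name: $name}}) RETURN c"
    return query, {"name": class_name}


def create_instance(class_name: str, instance_name: str) -> tuple[str, dict]:
    """Data-model node: label matches the class node's name; name identifies the instance."""
    query = f"CREATE (n:{quote_identifier(class_name)} {{name: $name}}) RETURN n"
    return query, {"name": instance_name}
```

Each returned pair could then be executed with, e.g., `session.run(query, params)` on a session from the official Neo4j Python driver. That deployment detail is an assumption rather than part of the described technique.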
In some examples, the technique further includes creating a new node in the data model by (i) receiving, via the user interface, user input for selecting a particular node class from the metamodel, (ii) in response to a selection of the particular node class, displaying, by the user interface, a set of property fields for receiving user entry of properties of the new node, the set of property fields based on properties established for the selected node class in the metamodel, and (iii) accepting user input of properties into at least one of the set of property fields.
In some examples, the method further includes failing creation of the new node in response to a required property of the particular node class not being entered.
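A minimal sketch of the creation flow just described, assuming the metamodel is available as an in-memory mapping from node-class name to property definitions. The `PropertyDef` type, its `required` flag, and the function names are hypothetical; the "Server", "IP_Addy", and "CPU" names in the usage note echo the server example described for the data model.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PropertyDef:
    """A property established for a node class in the metamodel (hypothetical shape)."""
    name: str
    required: bool = False


class MissingRequiredProperty(ValueError):
    """Raised to fail creation when a required property was not entered."""


def _is_blank(value: Any) -> bool:
    return value is None or str(value).strip() == ""


def property_fields(metamodel: dict[str, list[PropertyDef]], node_class: str) -> list[str]:
    """Property fields to display once the user selects a node class."""
    return [p.name for p in metamodel[node_class]]


def create_new_node(metamodel: dict[str, list[PropertyDef]], node_class: str,
                    instance_name: str, entered: dict[str, Any]) -> dict[str, Any]:
    """Build a new data-model node from user entries, failing if a required one is blank."""
    defs = metamodel[node_class]
    missing = [p.name for p in defs if p.required and _is_blank(entered.get(p.name))]
    if missing:
        raise MissingRequiredProperty(f"{node_class} requires: {', '.join(missing)}")
    # Keep only fields defined for this class; omit blank optional fields.
    shown = {p.name for p in defs}
    props = {k: v for k, v in entered.items() if k in shown and not _is_blank(v)}
    # The label matches the class name; the name identifies this specific instance.
    return {"label": node_class, "name": instance_name, "properties": props}
```

For example, with `mm = {"Server": [PropertyDef("IP_Addy", required=True), PropertyDef("CPU")]}`, calling `create_new_node(mm, "Server", "SVR-362", {"CPU": "i5-200"})` would raise `MissingRequiredProperty` because no IP address was entered.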
According to some examples, creating the new node in the data model is further performed by displaying a set of relationship fields for receiving user entry of relationships that the new node is permitted to have with other nodes in the data model. The set of relationship fields is based on relationships established for the selected node class in the metamodel.
The foregoing summary is presented for illustrative purposes to assist the reader in readily grasping example features presented in this disclosure. However, this summary is not intended to set forth required elements or to limit embodiments hereof in any way. One should appreciate that the above-described features can be combined in any manner that makes technological sense, and that all such combinations are intended to be disclosed herein, regardless of whether such combinations are identified explicitly or not.
The foregoing and other features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings, in which like reference characters refer to the same or similar parts throughout the different views.
Embodiments of the invention will now be described. One should appreciate that such embodiments are provided by way of example to illustrate certain features and principles of the invention but that the invention hereof is not limited to the particular embodiments described.
An improved technique for managing a graph database provides a metamodel as a schema for a data model. The metamodel resides in the graph database and includes a set of nodes, which define respective classes of nodes that may exist in the data model. Users may operate a user interface to create new nodes in the data model as instances of the node classes defined in the metamodel.
In the example shown, the computing machine 120 includes one or more communication interfaces 122, a set of processors 124, and memory 130. The communication interfaces 122 include, for example, network interface adapters (e.g., Ethernet, Wi-Fi, or the like) for converting signals received over the network 114 to a suitable form for use by the computing machine 120. The set of processors 124 includes one or more processing chips and/or assemblies, such as one or more multi-core CPUs (central processing units). The memory 130 includes both volatile memory, e.g., Random Access Memory (RAM), and non-volatile memory, such as one or more ROMs (Read-Only Memories), disk drives, solid state drives, or the like. In some examples, the computing machine 120 is coupled to external storage, such as a NAS (Network-Attached Storage) or a SAN (Storage Area Network). The set of processors 124 and the memory 130 together form control circuitry, which is constructed and arranged to carry out various methods and functions as described herein. Also, the memory 130 includes a variety of software constructs realized in the form of executable instructions. When the executable instructions are run by the set of processors 124, the set of processors 124 is made to carry out the operations of the software constructs. Although certain software constructs are specifically shown and described, it is understood that the memory 130 typically includes many other components, which are not shown, such as an operating system, various applications, processes, and daemons.
As further shown in
In a particular example, the graph database 150 is implemented using Neo4j, although other graph database technologies may be used instead. In some examples, the UI 142 and schema manager 144 are tightly integrated, and may indeed be part of a single programming environment, e.g., with forms and code served from the same program. In a particular example, which is not intended to be limiting, the schema application 140 is implemented on a Grails framework and uses a JavaScript API. The schema manager 144 may use a combination of server-side and client-side code. In such cases, the client-side code is executed on the client machine 112, e.g., in a web browser or client-side application running on the client machine 112.
As further shown in
In example operation, the user 110 accesses the schema application 140 over the network 114, e.g., using a browser or client application. The schema application 140 exposes the UI 142 via the web server 132, and the user 110 operates the UI 142 from the client machine 112 to control the schema application 140. For example, the user 110 may direct the schema application 140 to create the metamodel 170, e.g., by creating nodes 160a, establishing relationships among the nodes 160a, and assigning properties, labels, and/or names to the nodes 160a. As the user 110 creates and configures the nodes 160a, the schema application 140 internally identifies those nodes as components of the metamodel 170, e.g., by assigning each of the nodes 160a a specific label. For example, by giving each node 160a of the metamodel 170 the label “Component,” the schema application 140 can distinguish nodes 160a of the metamodel 170 from other nodes in the graph database 150. The specific text used in the metamodel-identifying label is not important. However, the schema application 140 should preferably apply the label consistently to nodes 160a of the metamodel 170 and should avoid using that label on nodes that are not part of the metamodel 170.
With the metamodel 170 or some portion thereof created, the user 110 may direct the schema application 140 to create new nodes 160b in the data model 180, based on the nodes 160a in the metamodel 170. For example, the schema application 140 treats the nodes 160a of the metamodel 170 as node classes, and instantiates nodes 160b of the data model 180 as specific instances of those node classes.
As instances of node classes, the nodes 160b of the data model 180 inherit characteristics from the node classes (nodes 160a) from which they are created. For example, any properties defined for nodes 160a of the metamodel 170 become properties that are allowed to be configured for instances of those nodes in the data model 180. Likewise, any relationships established among the nodes 160a in the metamodel 170 become allowable relationships that may be configured for the corresponding data-model instances. In some examples, the schema application 140 uses the properties and relationships of nodes 160a in the metamodel 170 to constrain how properties and relationships may be configured for instances in the data model 180. For example, the schema application 140 limits available properties for a node 160b in the data model 180 to only those types of properties that have been defined for the node class in the metamodel 170 from which the node in the data model 180 was instantiated. The schema application 140 also limits available relationships for a node 160b in a similar manner, such that nodes in the data model 180 can only have the types of relationships with specified types of other components as defined in the metamodel 170. As a result, the schema application 140 treats the metamodel 170 as a kind of “schema by example,” such that nodes 160b of the data model 180 are specific examples of the corresponding node classes in the metamodel 170, with the same types of properties and the same types of relationships as those defined in the metamodel 170.
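The constraining behavior described above might be sketched as two checks, one for properties and one for relationships. This is an illustrative assumption, not the schema application's actual code: each node class is modeled as a set of permitted property names plus a set of permitted `(relationship type, target class)` pairs, and the `DataCenter` class name and `HOSTS` relationship type in the usage note are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeClass:
    """A metamodel node: permitted property names and outbound relationship types."""
    name: str
    properties: set[str] = field(default_factory=set)
    # Each entry is (relationship type, name of the target node class).
    relationships: set[tuple[str, str]] = field(default_factory=set)


def check_properties(node_class: NodeClass, props: dict) -> None:
    """Raise ValueError if an instance uses a property its class does not define."""
    unknown = set(props) - node_class.properties
    if unknown:
        raise ValueError(f"{node_class.name} does not define {sorted(unknown)}")


def check_relationship(classes: dict[str, NodeClass], start_class: str,
                       rel_type: str, end_class: str) -> None:
    """Raise ValueError unless the metamodel links start_class to end_class with rel_type."""
    if (rel_type, end_class) not in classes[start_class].relationships:
        raise ValueError(f"({start_class})-[:{rel_type}]->({end_class}) is not in the metamodel")
```

With a `DataCenter` class permitting `("HOSTS", "Server")`, the call `check_relationship(classes, "DataCenter", "HOSTS", "Server")` passes, while the reverse direction raises `ValueError` because the metamodel defines no such outbound relationship for Server.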
Although the schema application 140 may create new nodes in the data model 180 based on classes defined in the metamodel 170, some nodes in the data model 180 may be created by other means. For example, the data model 180 may receive data feeds from various sources, where the data are already organized by some schema in the data source, such that the schema application 140 can ingest the data with only minimal adjustment. Thus, it is not required that all nodes 160b of the data model 180 be created from the metamodel 170 directly.
As shown in
Names and properties of nodes 160b in the data model 180 are established as specific values that are particular to the respective instances. For example, node 160b1 has the name “SVR-361,” which may be used to identify a specific server instance. Likewise, the value “192.168.10.11” is a specific instance of the “IP_Addy” property, and “i5-200” is a specific instance of the “CPU” property. Names and properties of other nodes correspond in a similar fashion.
Each relationship has a direction, which may be inbound or outbound, as indicated by the arrows. Each relationship may also have a forward cardinality 222 and/or a reverse cardinality 224. Each cardinality 222 or 224 indicates how many nodes of the related type may be linked through that relationship. For example, relationship 220a shows that a data center may link to “ZERO_OR_MORE” servers, but a server may link to only “ONE” data center. Other possible values for cardinality include “ZERO” and “ONE_OR_MORE.”
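One possible way to encode and check these cardinality values is sketched below in Python. The (minimum, maximum) encoding, the method names, and the reading of forward cardinality as "links per start node" and reverse cardinality as "links per end node" are assumptions made for illustration. Upper bounds can be checked as each link is added, whereas lower bounds (e.g., ONE) can only be verified once the data are complete, so the sketch keeps the two checks separate.

```python
from enum import Enum


class Cardinality(Enum):
    """Cardinality values named in the description, as (minimum, maximum) link counts."""
    ZERO = (0, 0)
    ONE = (1, 1)
    ZERO_OR_MORE = (0, None)  # None means no upper bound
    ONE_OR_MORE = (1, None)

    def allows_another(self, current: int) -> bool:
        """True if one more link can be added without exceeding the upper bound."""
        upper = self.value[1]
        return upper is None or current < upper

    def satisfied_by(self, count: int) -> bool:
        """True if a final link count lies within [minimum, maximum]."""
        lower, upper = self.value
        return count >= lower and (upper is None or count <= upper)


def can_link(forward: Cardinality, reverse: Cardinality,
             start_out_count: int, end_in_count: int) -> bool:
    """Check both ends before linking a start node (e.g., a data center) to an end node.

    forward: how many end nodes one start node may link to.
    reverse: how many start nodes one end node may be linked from.
    """
    return forward.allows_another(start_out_count) and reverse.allows_another(end_in_count)
```

For relationship 220a (forward ZERO_OR_MORE, reverse ONE), `can_link` would let a data center gain any number of servers but would refuse a second data-center link for a server that already has one.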
The schema application 140 preferably provides the UI 142 in the form of a GUI, which promotes user-friendly access to the functionality of the schema application 140 for managing metamodels and data models without requiring expert-level knowledge. Although this disclosure focuses on certain features related to metamodels and data models, one should appreciate that the schema application 140 may be extended to provide a comprehensive front-end to the graph database 150, e.g., to promote ease of use by ordinary users.
Users may select other views, as shown above the graph, such as “Extended Fields,” “Outbound Links” (relationships), and “Inbound Links,” with the numbers in parentheses specifying the number of extended fields (2), outbound links (1), and inbound links (1) that have been configured for the selected component (Server). Users may also view “Nodes of Type,” which provides a display of all nodes in the data model 180 that are instances of the Server node class. In the depicted example, there is a total of 158 Servers in the data model 180.
Links at the top of the screen provide additional options. These include a “Component List” link 320, for displaying a list of all components in the metamodel 170, a “New Component” link 330, for creating a new component in the metamodel 170, and a “Create Server” link 340, for creating a new instance in the data model 180 of the Server class. They also include an “Edit” link 350 for editing the current component and a “Delete” link 360 for deleting it.
In an example, the property fields displayed on screen 800 are limited to those extended fields that are defined in the metamodel 170 for the Server component (node 160a1). Accordingly, the schema application 140 displays different property fields for different metamodel components. Users can change the component “Type” by clicking the field 810. In an example, when the user 110 clicks the field 810 to select a different component type, the schema application 140 performs a live lookup into the graph database 150, e.g., via a Cypher query, to identify all components of the metamodel 170, e.g., all nodes 160a that have the “Component” label. The schema application 140 then displays a drop-down list of the returned components and allows the user 110 to select one. When the new component type is selected, the displayed property fields change to reflect those that have been defined for the new component type.
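A sketch of the live lookup, assuming the metamodel label is "Component" and that each component's class name is stored in a `name` property. The query text is an illustrative Cypher statement, not one taken from the described system. The hypothetical `component_types` function accepts any object with a `run(query)` method returning records indexable by column name, which a session from the official Neo4j Python driver provides.

```python
from typing import Any

# Illustrative Cypher: every metamodel node carries the Component label.
COMPONENT_TYPES_QUERY = "MATCH (c:Component) RETURN c.name AS name ORDER BY name"


def component_types(session: Any) -> list[str]:
    """Return component (node class) names used to fill the Type drop-down.

    `session` is any object whose run(query) returns records indexable by
    column name; a Neo4j Python driver session is one such object.
    """
    result = session.run(COMPONENT_TYPES_QUERY)
    return [record["name"] for record in result]
```

Sorting in the query keeps the drop-down order stable without extra client-side work, and because the lookup runs each time the field is clicked, newly created components appear without reloading the page.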
As part of creating the new Server instance in
In some examples, the schema application 140 supports numerous related features for operating the data model 180, e.g., to avoid the need for non-expert users to access the graph database 150 directly. For example, the schema application 140 may support common reports. It may also support smart graphs, which allow users to view report results in graphical form. The schema application 140 may store pre-designed queries (e.g., Cypher queries) within nodes of the graph database 150, e.g., as text-area properties. It may also support parameterized queries, e.g., by presenting data entry fields for accepting any needed parameters. In some examples, the schema application 140 includes a query builder, which reads the metamodel 170 and guides users to design queries that conform to the structure of the metamodel 170.
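For the parameterized queries mentioned above, one simple approach is to scan a stored Cypher query for `$name` parameters and present one entry field per parameter. The sketch below uses hypothetical helpers, not the described system's implementation. Its regular expression ignores complications such as `$` inside string literals or comments and backtick-quoted parameter names.

```python
import re

# Cypher parameters are written $name; this pattern finds them in stored query text.
_PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def query_parameters(stored_query: str) -> list[str]:
    """Distinct parameter names, in order of first appearance, for building entry fields."""
    seen: list[str] = []
    for name in _PARAM.findall(stored_query):
        if name not in seen:
            seen.append(name)
    return seen


def bind(stored_query: str, entered: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Pair the stored text with user entries, requiring every parameter to be filled in."""
    names = query_parameters(stored_query)
    missing = [p for p in names if p not in entered]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    # Extra entries that the query does not use are dropped.
    return stored_query, {p: entered[p] for p in names}
```

The query text and its parameter map are kept apart so that user input is sent as query parameters rather than spliced into the Cypher text.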
At 1010, a metamodel 170 is created in the graph database 150. The metamodel 170 includes a first set of nodes 160a, which define respective node classes. The metamodel 170 is created based at least in part on operation of a user interface 142 served by a computing machine 120 in communication with the graph database 150, which may run within the computing machine 120 or remotely (e.g., on a separate server).
At 1020, a data model 180 is provided in the graph database 150. As shown for example in
At 1030, the metamodel 170 is applied as a schema for the data model 180, e.g., by instantiating, in response to operation of the user interface 142, new nodes 160b of the data model 180 as instances of respective node classes defined by the first set of nodes 160a.
An improved technique has been described which provides a schema for a data model 180 in a graph database 150 by creating a metamodel 170. The metamodel 170 resides in the graph database 150 and includes a set of nodes 160a, which define respective classes of nodes that may exist in the data model 180. Users may operate a user interface 142 to create new nodes 160b in the data model 180 as instances of the node classes defined in the metamodel 170. The new nodes 160b may inherit characteristics of the node classes, which limit the scope of permitted properties and/or relationships of the new nodes in the data model 180 to those established for the respective node classes in the metamodel 170. In this manner, the node classes defined in the metamodel 170 enforce structure on the instances of those node classes created in the data model 180, thereby enabling the metamodel 170 to function as a schema for the data model 180.
Having described certain embodiments, numerous alternative embodiments or variations can be made. For example, although names and labels have been used herein to associate node classes with instances of those node classes, this is merely an example. Alternatively, properties or other constructs may be used to establish these associations. Also, although particular features of a GUI have been shown for enabling users to control the schema application 140, the particular features as shown are intended as illustrative examples rather than as limiting.
Further, although features have been shown and described with reference to particular embodiments hereof, such features may be included and hereby are included in any of the disclosed embodiments and their variants. Thus, it is understood that features disclosed in connection with any embodiment are included in any other embodiment.
Further still, the improvement or portions thereof may be embodied as a computer program product including one or more non-transient, computer-readable storage media, such as a magnetic disk, magnetic tape, compact disk, DVD, optical disk, flash drive, solid state drive, SD (Secure Digital) chip or device, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), and/or the like (shown by way of example as medium 1050 in
As used throughout this document, the words “comprising,” “including,” “containing,” and “having” are intended to set forth certain items, steps, elements, or aspects of something in an open-ended fashion. Also, as used herein and unless a specific statement is made to the contrary, the word “set” means one or more of something. This is the case regardless of whether the phrase “set of” is followed by a singular or plural object and regardless of whether it is conjugated with a singular or plural verb. Further, although ordinal expressions, such as “first,” “second,” “third,” and so on, may be used as adjectives herein, such ordinal expressions are used for identification purposes and, unless specifically indicated, are not intended to imply any ordering or sequence. Thus, for example, a “second” event may take place before or after a “first” event, or even if no first event ever occurs. In addition, an identification herein of a particular element, feature, or act as being a “first” such element, feature, or act should not be construed as requiring that there must also be a “second” or other such element, feature, or act. Rather, the “first” item may be the only one. Although certain embodiments are disclosed herein, it is understood that these are provided by way of example only and that the invention is not limited to these particular embodiments.
Those skilled in the art will therefore understand that various changes in form and detail may be made to the embodiments disclosed herein without departing from the scope of the invention.
This application claims the benefit of U.S. Provisional Patent Application No. 62/743,334, filed Oct. 9, 2018, the contents and teachings of which are incorporated by reference herein in their entirety.
Number | Date | Country
---|---|---
62743334 | Oct 2018 | US