In certain computing environments, analytical tools can be employed to provide users and administrators with insightful information for making decisions and improvements relating to the operation of those environments. For example, the analytical tools can be configured to determine the risk posed to data security by continuously or periodically evaluating the activities of a given entity in the environment. These tools gather data from various products or data sources to build dashboards and reports and for other analytical purposes. The data represents, for example, information about various users, devices, and networks along with their relationships. Structured Query Language (SQL) relational databases have been used to store this data, which, in turn, is accessed through various endpoints when the data is queried. SQL is a standardized query language for constructing queries to access and manipulate relational databases. However, SQL is not compatible with other types of databases, such as graph databases, due to their structural differences. Therefore, a different query language must be used with such databases. The format of the query depends on the type of database, since different types of databases can utilize different query formats. Thus, building such queries can be incommodious to users who are unfamiliar with the specific database query requirements.
One example provides a graph database query construction and execution method including receiving a first database query including one or more selection sets each defining at least one database field to be queried from a graph database, where the first database query is coded in a generic query language, where the at least one database field is represented in the graph database as a property of a vertex; generating, for each of the one or more selection sets, a second database query including a select clause representing a request to retrieve the property of the vertex from the graph database, where the second database query is coded in a graph query language; and encapsulating the second database query into a third database query configured to be executed on the graph database, the third database query including the second database query, a graph query type, and a graph name associated with the graph database. In some examples, the first database query includes a query condition, and the method includes inserting the query condition into the select clause. In some examples, the query condition includes one or more of: a where clause, an order by clause, and/or a limit clause. In some examples, the method includes determining whether the vertex includes a relation annotation, where the relation annotation is represented in the graph database by a relation on the vertex and/or by a relation on an edge connected to the vertex; and inserting, in response to determining that the vertex includes the relation annotation, a pattern constraint into the select clause, the pattern constraint corresponding to the relation annotation. In some examples, the method includes causing the third database query to be executed on the graph database to produce a response to the third database query; and causing the response to be rendered to a user via a user interface of a client computing device.
In some examples, the response is coded in the graph query language, and the method includes recoding the response in the generic query language for rendering via the user interface. In some examples, the first database query is a GraphQL query, and the response is a GraphQL response. In some examples, the generic query language is different from the graph query language.
Another example provides a computer program product including one or more non-transitory machine-readable mediums having instructions encoded thereon that when executed by at least one processor cause a process to be carried out, the process including: receiving a first database query including one or more selection sets each defining at least one database field to be queried from a graph database, where the first database query is coded in a generic query language, where the at least one database field is represented in the graph database as a property of a vertex; generating, for each of the one or more selection sets, a second database query including a select clause representing a request to retrieve the property of the vertex from the graph database, where the second database query is coded in a graph query language; and encapsulating the second database query into a third database query configured to be executed on the graph database, the third database query including the second database query, a graph query type, and a graph name associated with the graph database. In some examples, the first database query includes a query condition, and the process includes inserting the query condition into the select clause. In some examples, the query condition includes one or more of: a where clause, an order by clause, and/or a limit clause. In some examples, the process includes determining whether the vertex includes a relation annotation, where the relation annotation is represented in the graph database by a relation on the vertex and/or by a relation on an edge connected to the vertex; and inserting, in response to determining that the vertex includes the relation annotation, a pattern constraint into the select clause, the pattern constraint corresponding to the relation annotation.
In some examples, the process includes causing the third database query to be executed on the graph database to produce a response to the third database query; and causing the response to be rendered to a user via a user interface of a client computing device. In some examples, the response is coded in the graph query language, and the process includes recoding the response in the generic query language for rendering via the user interface. In some examples, the first database query is a GraphQL query, and the response is a GraphQL response.
Another example provides a system including a storage; and at least one processor operatively coupled to the storage, the at least one processor configured to execute instructions stored in the storage that when executed cause the at least one processor to carry out a process including receiving a first database query including one or more selection sets each defining at least one database field to be queried from a graph database, where the first database query is coded in a generic query language; generating, for each of the one or more selection sets, a second database query, where the second database query is coded in a graph query language; and encapsulating the second database query into a third database query configured to be executed on the graph database. In some examples, the first database query includes a query condition, and the process includes inserting the query condition into the second database query, where the query condition includes one or more of: a where clause, an order by clause, and/or a limit clause. In some examples, the process includes determining whether the graph database includes a relation annotation; and inserting, in response to determining that the graph database includes the relation annotation, a pattern constraint into the second database query, the pattern constraint corresponding to the relation annotation. In some examples, the process includes causing the third database query to be executed on the graph database to produce a response to the third database query; and causing the response to be rendered to a user via a user interface of a client computing device. In some examples, the response is coded in the graph query language, and the process includes recoding the response in the generic query language for rendering via the user interface.
Other aspects, examples, and advantages of these aspects and examples, are discussed in detail below. It will be understood that the foregoing information and the following detailed description are merely illustrative examples of various aspects and features and are intended to provide an overview or framework for understanding the nature and character of the claimed aspects and examples. Any example or feature disclosed herein can be combined with any other example or feature. References to different examples are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the example can be included in at least one example. Thus, terms like “other” and “another” when referring to the examples described herein are not intended to communicate any sort of exclusivity or grouping of features but rather are included to promote readability.
Various aspects of at least one example are discussed below with reference to the accompanying figures, which are not intended to be drawn to scale. The figures are included to provide an illustration and a further understanding of the various aspects and are incorporated in and constitute a part of this specification but are not intended as a definition of the limits of any particular example. The drawings, together with the remainder of the specification, serve to explain principles and operations of the described and claimed aspects. In the figures, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every figure.
As summarized above, at least some examples described in this disclosure are directed to techniques for translating a generic database query, such as a GraphQL query, to a graph database query, such as Cypher for a Neo4j graph database or GSQL for a TigerGraph database. Such techniques are useful in conjunction with services that provide, for example, analytical insights into data received from one or more products. Such services collect data associated with entities in the user's environment, such as users, devices, and network information along with the relationships between these entities. The data generated from various onboarded products is stored in a graph database or datastore. The graph database can be queried to retrieve data for building reports, dashboards, and the like. This is achieved by translating a generic database query to a graph language query.
In accordance with an example of the present disclosure, a customer adds one or more products, such as a virtual application or desktop, a collaboration application or desktop, or other application to an analytical service. Data from these products flow into the analytical service. The data can represent, for example, device logins, network access, application execution, file creation and sharing, and other activities. The data are ingested into a graph database. Subsequently, users can query the graph database via the analytical service to retrieve data of interest. However, the format of the query depends on the type of database, since different types of databases can utilize different query formats. Furthermore, every query requirement corresponds to a separate query, and each query requires a new data endpoint for processing. Thus, as noted above, building such queries can be incommodious to users who are unfamiliar with the specific query requirements because they must be constructed according to, and with knowledge of, the structure of the database being queried. This poses challenges when the database structure is complex or unknown to the user.
To this end, examples of the present disclosure provide techniques for automatically generating a query for a graph database using a generic query language, such as GraphQL, which does not require the user to know the structure of the graph database. A schema representing a structure of the graph database is used to automatically translate the generic query to a graph query that comports with the structure of the graph database. A query language is a specification that defines the syntax and procedure for retrieving information from a database. Different query languages exist for different types of databases. For example, GraphQL is a language-independent (or generic) data query language developed as an alternative to Representational State Transfer (REST) and ad-hoc webservice architectures. GraphQL can be used as a substitute for a REST Application Programming Interface (API) to access a graph database. REST APIs can become difficult to maintain, especially when there are many endpoints. Also, REST APIs are dependent on the structure of the database, and thus require the developer of the API to have an intimate knowledge of that structure and how the endpoints correspond to the structure. In contrast to a REST API, GraphQL, or another suitable generic query language, can be used with any language and any database system because it is language-independent. Furthermore, in contrast to a REST API, GraphQL exposes only one endpoint.
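By way of a non-limiting illustration, the single-endpoint model can be sketched as follows. The sketch assumes the common GraphQL-over-HTTP convention of sending a JSON body with "query" and "variables" keys; the resolver name "graphuser" is taken from this disclosure, while the field "name" and the helper build_request_body are hypothetical and are introduced only for illustration.

```python
import json
from typing import Optional

# A generic (GraphQL) query: one selection set on the "graphuser" resolver.
# The "name" field is a hypothetical example field.
GRAPHQL_QUERY = """
query {
  graphuser {
    name
  }
}
"""


def build_request_body(query: str, variables: Optional[dict] = None) -> str:
    """Serialize a GraphQL query into the JSON body sent to the one endpoint.

    Every request uses this same body shape, whatever data it asks for,
    so no per-query endpoint has to be defined.
    """
    return json.dumps({"query": query, "variables": variables or {}})


if __name__ == "__main__":
    print(build_request_body(GRAPHQL_QUERY))
```

The caller names only the fields it wants; nothing in the request reflects the vertices or edges of the underlying graph database.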
Example Data Query/Response Processes
An end user client device 202 executes a REST client/user interface (UI) 204, which interacts with multiple REST-based endpoints 212 associated with the graph database 210. The REST client/UI 204 exposes the endpoints 212 to the end user client device 202. The endpoints 212 are used to get, post, update, and/or delete data 216 from, to, or in the graph database 210. For example, the endpoints 212 can be used to retrieve data to build reports and dashboards via a calling process. Each request by the REST client/UI 204 from the calling process corresponds to an individual graph query 214 written by a developer. The graph query 214 is processed by a REST controller server 206 via the data access layer 208, to obtain a response 218 from the graph database 210. As with the SQL query 114, the graph query 214 results in a unique endpoint 212 (i.e., each request corresponds to a unique endpoint). In operation, the REST client/UI 204 invokes one of the endpoints 212 via the server 206 and the graph queries 214 are executed on the graph database 210 via a data access layer 208, resulting in a response 218 to the REST client/UI 204 via the calling process.
The process 200 is similar to the process 100 of
An end user client device 302 executes a generic query language (e.g., GraphQL) client/user interface (UI) 304, which interacts with one or more resolvers exposed by the GraphQL controller 306 through a single endpoint 312 to obtain data from the graph database 310. The resolvers define one or more functions for generating a response to a graph query and include at least one database field to be queried. The generic query language client/UI 304 exposes the resolver(s), through the endpoint 312, to the end user client device 302. The endpoint 312 is used to get, post, update, and/or delete data 316 from, to, or in the graph database 310. For example, the endpoint 312 can be used to retrieve data to build reports and dashboards via a calling process. Each request by the generic query language client/UI 304 from the calling process corresponds to a generic query 314 (e.g., a query constructed in the GraphQL query language), which is processed by a generic query language (e.g., GraphQL) controller 306 to obtain a response 318 from the graph database 310. The generic query 314 results in a single endpoint 312.
In operation, the generic query language client/UI 304 invokes the endpoint 312 via the GraphQL controller 306. A graph query generator 308 translates the generic query 314 into a graph query 320 constructed in a graph query language, such as GSQL, according to a schema 322 for the graph database 310, as described in further detail below. The graph query 320 is executed on the graph database 310, resulting in a response 318 to the generic query language client/UI 304 via the calling process.
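As a minimal sketch of the flow just described, and not a definitive implementation, the following shows a single entry point that routes each selection set to a graph query generator and then to the database. The resolver-to-vertex mapping, the function names, the "email" field, and the GSQL-like output text are assumptions made for illustration; the output has not been validated against a graph database.

```python
from typing import Callable, Dict, List

# Hypothetical schema fragment: which vertex type each resolver reads from.
RESOLVER_TO_VERTEX = {"graphuser": "User", "graphdevice": "Device"}


def generate_graph_query(resolver: str, fields: List[str]) -> str:
    """Translate one selection set into a GSQL-like select statement."""
    vertex = RESOLVER_TO_VERTEX[resolver]  # KeyError for unknown resolvers
    projected = ", ".join(f"s.{name}" for name in fields)
    return f"SELECT {projected} FROM {vertex}:s"


def handle_request(
    selection_sets: Dict[str, List[str]],
    execute: Callable[[str], list],
) -> Dict[str, list]:
    """Single entry point: every generic query is handled here.

    `execute` stands in for the call that runs a graph query on the
    graph database and returns its rows.
    """
    return {
        resolver: execute(generate_graph_query(resolver, fields))
        for resolver, fields in selection_sets.items()
    }


if __name__ == "__main__":
    # A stand-in database that just echoes the query it was given.
    print(handle_request({"graphuser": ["name", "email"]}, lambda q: [q]))
```

Because the translation is driven by the mapping rather than by hand-written queries, supporting a new report requires no additional endpoint.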
Example Graph Database Schema
In this example, the graph database schema 400 includes the following entities: User 402, Network 404, Device 406, Shares 408, and RiskIndicator 410. Each of these entities is represented in the graph database schema 400 as a vertex in the graph database. The graph database schema 400 further includes the following relations between entities: NetworkOperation 412, Own 414, HasUserRisk 416, ShareOperation 418, HasNetworkRisk 420, HasDeviceRisk 422, and HasShareRisk 424. Each of these relations is represented in the graph database schema 400 as an edge between corresponding vertices in the graph database. Each of the vertices and edges in the graph database schema 400 can be associated with data relating to the entities and relations, as will be described by example below.
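For illustration only, the schema can be held as a small in-memory structure such as the following sketch. The vertex and edge names come from this disclosure. The endpoints of NetworkOperation, Own, and HasUserRisk are stated in the description; the endpoints of the remaining edges are inferred from their names and are assumptions, as are the helper functions.

```python
from typing import Dict, List, Set, Tuple

VERTICES: Set[str] = {"User", "Network", "Device", "Shares", "RiskIndicator"}

EDGES: Dict[str, Tuple[str, str]] = {
    # Endpoints stated in the description.
    "NetworkOperation": ("User", "Network"),
    "Own": ("User", "Device"),
    "HasUserRisk": ("User", "RiskIndicator"),
    # Endpoints below are inferred from the edge names (assumption).
    "ShareOperation": ("User", "Shares"),
    "HasNetworkRisk": ("Network", "RiskIndicator"),
    "HasDeviceRisk": ("Device", "RiskIndicator"),
    "HasShareRisk": ("Shares", "RiskIndicator"),
}


def dangling_edges(vertices: Set[str], edges: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the names of edges that reference an undeclared vertex."""
    return sorted(
        name
        for name, (src, dst) in edges.items()
        if src not in vertices or dst not in vertices
    )


def edges_touching(vertex: str, edges: Dict[str, Tuple[str, str]] = EDGES) -> List[str]:
    """Return the names of edges that have `vertex` as either endpoint."""
    return sorted(name for name, ends in edges.items() if vertex in ends)


if __name__ == "__main__":
    print(edges_touching("User"))
```

A query generator can consult such a structure to decide which edge to traverse when a selection set spans two entities.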
In an example, consider a user Adam whose account is being attacked. The user Adam is represented by the User 402 vertex in the graph database schema 400, and Adam's computing device (e.g., desktop, laptop, tablet, etc.) is represented by the Device 406 vertex. The relation Own 414 represents the relationship between the User 402 Adam and his Device 406. A hacker attempts to log in to Adam's account multiple times from a network with IP 10.0.0.4 but fails each time. All login attempts on Adam's account are events, which are loaded into the graph database by creating the User vertex “Adam” 402 and the Network vertex “10.0.0.4”. The relation NetworkOperation 412 between the two vertices User 402 and Network 404 is created, with the access time set to the current time.
The events are then used to predict or detect any risk using one or more machine learning (ML) models or rule-based models. In this example, the models predict an excessive authorization failures risk, which is associated with the user Adam. The risk is updated in the graph database by creating the RiskIndicator 410 vertex for excessive authorization failures and a relation HasUserRisk 416 between the User 402 and RiskIndicator 410 vertices, with the current time stamp of occurrence and any other related information. Other examples will be apparent in light of this disclosure.
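The Adam scenario can be sketched with a toy in-memory graph, as follows. This is an illustration under stated assumptions rather than the ingestion pipeline itself: the Graph class, the attribute names access_time, success, and timestamp, and the fixed failure threshold that stands in for the ML or rule-based model are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple

VertexKey = Tuple[str, str]  # (vertex type, identifier)


@dataclass
class Graph:
    vertices: Dict[VertexKey, dict] = field(default_factory=dict)
    edges: List[Tuple[str, VertexKey, VertexKey, dict]] = field(default_factory=list)

    def upsert_vertex(self, vtype: str, vid: str, **attrs) -> VertexKey:
        """Create the vertex if absent, then merge in any attributes."""
        self.vertices.setdefault((vtype, vid), {}).update(attrs)
        return (vtype, vid)

    def add_edge(self, etype: str, src: VertexKey, dst: VertexKey, **attrs) -> None:
        self.edges.append((etype, src, dst, attrs))


def record_failed_login(graph: Graph, user: str, ip: str, when: datetime) -> None:
    """Load one failed login event as User, Network, and NetworkOperation."""
    u = graph.upsert_vertex("User", user)
    n = graph.upsert_vertex("Network", ip)
    graph.add_edge("NetworkOperation", u, n, access_time=when, success=False)


def flag_excessive_failures(graph: Graph, user: str, threshold: int, when: datetime) -> bool:
    """Toy rule (an assumption) standing in for the risk model."""
    failures = sum(
        1
        for etype, src, _dst, attrs in graph.edges
        if etype == "NetworkOperation" and src == ("User", user) and not attrs["success"]
    )
    if failures < threshold:
        return False
    risk = graph.upsert_vertex("RiskIndicator", "excessive_authorization_failures")
    graph.add_edge("HasUserRisk", ("User", user), risk, timestamp=when)
    return True


if __name__ == "__main__":
    g = Graph()
    now = datetime.now(timezone.utc)
    for _ in range(3):
        record_failed_login(g, "Adam", "10.0.0.4", now)
    print(flag_excessive_failures(g, "Adam", threshold=3, when=now))
```

Repeated attempts reuse the same User and Network vertices and add one NetworkOperation edge per event, so the history of attempts remains queryable.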
As noted above,
The method 500 further includes generating 504, for each of the one or more selection sets, and any optional query conditions (e.g., where, order by, limit by, etc.), a second database query. The second database query can be generated via a calling process. The second database query includes a select clause representing a request to retrieve the property of the vertex corresponding to the selection set (e.g., “graphuser”) from the graph database, such as shown in
Referring to
Next, the generating of the second database query 504 includes determining 542 whether a query condition exists on the selection set 530. Examples of query conditions include but are not limited to a where clause, an order by clause, and/or a limit clause. A “where clause” is, for example, a clause in the second database query that defines a parameter that is to be matched in the database. For example, the query “get all users who own a device named ‘Macbook’” can be constructed as a graph query that includes results from the graph database where the device name is “Macbook,” as will be understood by one of skill in the art. The “where clause” can also exclude results, such as by requesting all results where the result does not include the parameter defined in the query (e.g., return all results where the device name is not “Macbook”). An “order by clause” is, for example, a clause in the second database query that causes the results of the query to be returned in a particular order or sequence. For example, the query “get all users who own a device named ‘Macbook’” can include an “order by name” clause so that the results are returned sorted according to the name. A “limit clause” is, for example, a clause in the second database query that defines a constraint on the number of unique results returned by the query. For example, the query “get all users who own a device named ‘Macbook’” can include a “limit by 5” to limit the number of results returned by the query to five or fewer.
If the second database query includes a query condition, then the query condition is inserted 544 into the select clause of the second database query, which is a raw graph query.
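One way the insertion of query conditions could look is sketched below, using the “Macbook” example. The function build_select, its parameters, and the aliases s, e, and t are hypothetical, and the emitted text is GSQL-like for illustration only. The optional pattern argument stands in for a pattern constraint derived from a relation annotation, here the Own relation between User and Device.

```python
from typing import Optional, Sequence, Tuple


def build_select(
    vertex: str,
    fields: Sequence[str],
    pattern: Optional[Tuple[str, str]] = None,
    where: Optional[str] = None,
    order_by: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a GSQL-like select clause and insert any query conditions.

    `where` is passed through as raw text to keep the sketch short; a real
    generator would need to validate or parameterize it.
    """
    source = f"{vertex}:s"
    if pattern is not None:
        edge, target = pattern
        # Pattern constraint: traverse `edge` from the source to `target`.
        source += f" -({edge}:e)- {target}:t"
    clauses = [
        "SELECT " + ", ".join(f"s.{name}" for name in fields),
        "FROM " + source,
    ]
    if where:
        clauses.append("WHERE " + where)
    if order_by:
        clauses.append("ORDER BY s." + order_by)
    if limit is not None:
        clauses.append(f"LIMIT {limit}")
    return " ".join(clauses)


if __name__ == "__main__":
    # "Get all users who own a device named 'Macbook'", ordered and limited.
    print(
        build_select(
            "User",
            ["name"],
            pattern=("Own", "Device"),
            where='t.name == "Macbook"',
            order_by="name",
            limit=5,
        )
    )
```

Each condition is optional, so a selection set without conditions yields only the select and from portions.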
Referring again to
Referring next to
For example, while the first database query refers generically to a resolver “graphuser” exposed by the GraphQL server, the second database query includes the structural parameters ultimately needed to execute the query on the graph database once the query is translated into the graph query language. For example, the schema of
After the second database query is executed on the graph database, the database returns a response constructed in the graph query language (e.g., GSQL). The graph query language response is then handed back to the calling process for translation into a generic query language (e.g., GraphQL) prior to rendering the query response to the user.
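The encapsulation and recoding steps can be sketched as follows. The envelope layout (keys query_type, graph_name, and body), the default query type value, the graph name "SecurityGraph", and the row format returned by the database are assumptions for illustration. The top-level "data" key follows the general shape of a GraphQL response.

```python
from typing import Dict, List, Sequence


def encapsulate(
    select_statements: Sequence[str],
    graph_name: str,
    query_type: str = "INTERPRET",
) -> Dict[str, str]:
    """Wrap the generated select statements into one executable query.

    The result carries the three parts named in this disclosure: the second
    database query (body), a graph query type, and a graph name.
    """
    body = "\n".join(stmt.rstrip(";") + ";" for stmt in select_statements)
    return {"query_type": query_type, "graph_name": graph_name, "body": body}


def recode_response(
    resolver: str,
    rows: List[dict],
    requested_fields: Sequence[str],
) -> Dict[str, dict]:
    """Reshape raw result rows into a GraphQL-style response.

    Only the fields named in the original selection set are kept.
    """
    return {
        "data": {
            resolver: [{f: row.get(f) for f in requested_fields} for row in rows]
        }
    }


if __name__ == "__main__":
    print(encapsulate(["SELECT s.name FROM User:s"], "SecurityGraph"))
    print(recode_response("graphuser", [{"name": "Adam", "id": 1}], ["name"]))
```

Filtering to the requested fields keeps the rendered response aligned with what the first database query asked for, even if the graph database returns additional attributes.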
The computing platform or device 800 includes one or more processors 810, volatile memory 820 (e.g., random access memory (RAM)), non-volatile memory 830, one or more network or communication interfaces 840, a user interface (UI) 860, a display screen 870, and a communications bus 850. The computing platform 800 may also be referred to as a computer or a computer system.
The non-volatile (non-transitory) memory 830 can include: one or more hard disk drives (HDDs) or other magnetic or optical storage media; one or more solid state drives (SSDs), such as a flash drive or other solid-state storage media; one or more hybrid magnetic and solid-state drives; and/or one or more virtual storage volumes, such as a cloud storage, or a combination of such physical storage volumes and virtual storage volumes or arrays thereof.
The user interface 860 can include one or more input/output (I/O) devices (e.g., a mouse, a keyboard, a microphone, one or more speakers, one or more biometric scanners, one or more environmental sensors, and one or more accelerometers, etc.).
The display screen 870 can provide a graphical user interface (GUI) and in some cases, may be a touchscreen or any other suitable display device.
The non-volatile memory 830 stores an operating system (OS) 825, one or more applications 834, and data 836 such that, for example, computer instructions of the operating system 825 and the applications 834 are executed by the processor(s) 810 out of the volatile memory 820. In some examples, the volatile memory 820 can include one or more types of RAM and/or a cache memory that can offer a faster response time than a main memory. Data can be entered through the user interface 860. Various elements of the computing platform 800 can communicate via the communications bus 850.
The illustrated computing platform 800 is shown merely as an example computing device and can be implemented by any computing or processing environment with any type of machine or set of machines that can have suitable hardware and/or software capable of operating as described herein.
The processor(s) 810 can be implemented by one or more programmable processors to execute one or more executable instructions, such as a computer program, to perform the functions of the system. As used herein, the term “processor” describes circuitry that performs a function, an operation, or a sequence of operations. The function, operation, or sequence of operations can be hard coded into the circuitry or soft coded by way of instructions held in a memory device and executed by the circuitry. A processor can perform the function, operation, or sequence of operations using digital values and/or using analog signals.
In some examples, the processor can be embodied in one or more application specific integrated circuits (ASICs), microprocessors, digital signal processors (DSPs), graphics processing units (GPUs), microcontrollers, field programmable gate arrays (FPGAs), programmable logic arrays (PLAs), multicore processors, or general-purpose computers with associated memory.
The processor 810 can be analog, digital, or mixed. In some examples, the processor 810 can be one or more physical processors, which may be remotely located or local. A processor including multiple processor cores and/or multiple processors can provide functionality for parallel, simultaneous execution of instructions or for parallel, simultaneous execution of one instruction on more than one piece of data.
The network interfaces 840 can include one or more interfaces to enable the computing platform 800 to access a computer network 880 such as a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or the Internet through a variety of wired and/or wireless connections, including cellular connections. In some examples, the network 880 may allow for communication with other computing platforms 890, to enable distributed computing. In some examples, the network 880 may allow for communication with one or more of the end user client device(s) 102, 202, 302, the REST controller server 106, 206, the REST client/UI 104, 204, the data access layer 108, 208, the SQL database 110, the GraphQL client/UI 304, the GraphQL controller 306, the graph query generator 308, and/or the graph database 210, 310 of
The foregoing description and drawings of various embodiments are presented by way of example only. These examples are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Alterations, modifications, and variations will be apparent in light of this disclosure and are intended to be within the scope of the invention as set forth in the claims.
Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. Any references to examples, components, elements or acts of the systems and methods herein referred to in the singular can also embrace examples including a plurality, and any references in plural to any example, component, element or act herein can also embrace examples including only a singularity. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements. The use herein of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. References to “or” can be construed as inclusive so that any terms described using “or” can indicate any of a single, more than one, and all of the described terms. In addition, in the event of inconsistent usages of terms between this document and documents incorporated herein by reference, the term usage in the incorporated references is supplementary to that of this document; for irreconcilable inconsistencies, the term usage in this document controls.