Analytical services gather data from various sources and build reports, dashboards, and other products that provide insights into the data. The data represent information about entities such as users, devices, networks, and shares, along with correlations among the data, which are collected over time from the entities. The data are analyzed by various models, such as those defined to detect any threats, risks, or vulnerabilities associated with the entities. These models process the data in real time and in batch fashion so that preventive measures, such as responsive actions, are taken to mitigate the threats. The data received and generated during processing by the services are ingested into various data stores, such as time series, graph, relational, or other types of databases, based on the nature of the data (e.g., what the data represents) and how the data is to be used. With growing requirements and needs, a large amount of data is generated continuously, and existing data is enhanced with additional attributes and information which can be leveraged to build more use cases and solutions.
One example provides a method of updating a database schema. The method includes receiving a plurality of messages from an application environment; analyzing, as specified by a configuration, each of the messages to produce an analysis including a list of attributes and corresponding attribute values; fetching a schema from either a database or a schema registry associated with the database; updating the schema to produce an updated schema based on one or more of the attributes in the list of attributes; and applying the updated schema to the database.
At least some examples of the method include one or more of the following. The configuration specifies a start time and an end time for messages to be analyzed. The method further includes generating an ingestion specification based on the database and the one or more of the attributes in the list of attributes, and registering the ingestion specification with the database. The method further includes transmitting the analysis and/or the updated schema to an operator console. The method further includes receiving a modified analysis from the operator console, where the modified analysis includes at least one modification to the one or more of the attributes in the list of attributes, and where updating the schema is further based on the modified analysis. The method further includes registering the updated schema with the schema registry associated with the database. The plurality of messages includes a user login event, a security event, a processing event, and/or data representing activity occurring within the application environment. At least one attribute of the plurality of messages includes a numerical value, and the analysis includes a minimum, maximum, average, mean, or a statistical value of the at least one attribute of the plurality of messages. At least one attribute of the plurality of messages includes a text value, and the analysis includes a number of empty, null, duplicate, or missing values of at least one attribute of the plurality of messages.
Another example provides a computer program product including one or more non-transitory machine-readable mediums having instructions encoded thereon that when executed by at least one processor cause a process to be carried out. The process includes receiving a plurality of messages from an application environment; analyzing, as specified by a configuration, each of the messages to produce an analysis including a list of attributes and corresponding attribute values; fetching a schema from either a database or a schema registry associated with the database; updating the schema to produce an updated schema based on one or more of the attributes in the list of attributes; and applying the updated schema to the database.
At least some examples of the computer program product include one or more of the following. The configuration specifies a start time and an end time for messages to be analyzed. The process further includes generating an ingestion specification based on the database and the one or more of the attributes in the list of attributes, and registering the ingestion specification with the database. The process further includes transmitting the analysis and/or the updated schema to an operator console. The process further includes receiving a modified analysis from the operator console, where the modified analysis includes at least one modification to the one or more of the attributes in the list of attributes, and where updating the schema is further based on the modified analysis. The process further includes registering the updated schema with the schema registry associated with the database. The plurality of messages includes a user login event, a security event, a processing event, and/or data representing activity occurring within the application environment.
Yet another example provides a system including a storage and at least one processor operatively coupled to the storage. The at least one processor is configured to execute instructions stored in the storage that when executed cause the at least one processor to carry out a process. The process includes receiving, by an event manager, a plurality of messages from an application environment; analyzing, by the event manager and as specified by a configuration, each of the messages to produce an analysis including a list of attributes and corresponding attribute values; fetching, by the event manager, a schema from either a database or a schema registry associated with the database; updating, by the event manager, the schema to produce an updated schema based on one or more of the attributes in the list of attributes; and applying, by the event manager, the updated schema to the database.
At least some examples of the system include one or more of the following. The configuration specifies a start time and an end time for messages to be analyzed. The process further includes generating, by the event manager, an ingestion specification based on the database and the one or more of the attributes in the list of attributes, and registering the ingestion specification with the database. The process further includes transmitting, by the event manager, the analysis and/or the updated schema to an operator console. The process further includes receiving, by the event manager, a modified analysis from the operator console, where the modified analysis includes at least one modification to the one or more of the attributes in the list of attributes, and where updating the schema is further based on the modified analysis. The process further includes registering, by the event manager, the updated schema with the schema registry associated with the database. The plurality of messages includes a user login event, a security event, a processing event, and/or data representing activity occurring within the application environment.
Other aspects, examples, and advantages of these aspects and examples, are discussed in detail below. It will be understood that the foregoing information and the following detailed description are merely illustrative examples of various aspects and features and are intended to provide an overview or framework for understanding the nature and character of the claimed aspects and examples. Any example or feature disclosed herein can be combined with any other example or feature. References to different examples are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the example can be included in at least one example. Thus, terms like “other” and “another” when referring to the examples described herein are not intended to communicate any sort of exclusivity or grouping of features but rather are included to promote readability.
Various aspects of at least one example are discussed below with reference to the accompanying figures, which are not intended to be drawn to scale. The figures are included to provide an illustration and a further understanding of the various aspects and are incorporated in and constitute a part of this specification but are not intended as a definition of the limits of any particular example. The drawings, together with the remainder of the specification, serve to explain principles and operations of the described and claimed aspects. In the figures, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every figure.
According to some examples of the present disclosure, data is received from various application environments on a regular and ongoing basis. The data can be used for security or analytical purposes in the maintenance of those application environments. For example, incoming messages can include events that are related to user activity (e.g., login and logout) that is monitored for security threats. The data are ingested into one or more databases as part of this process. The ingestion of data involves configuring one or more database schemas and, in some cases, ingestion specifications, that define how the incoming messages are to be stored in the database. For example, certain fields or attributes of the messages may define how the data is represented within the structure of the database(s) in which the data are to be stored. Determining which attributes are to be stored, and in which databases, involves analyzing at least a subset of the messages to help define the corresponding database schemas and ingestion specifications that can be subsequently used to process incoming messages.
In some instances, the analysis of the data can be carried out by developers, engineers, or other users to identify various attributes of interest and their presence in all types of events. After identifying the attributes, engineers, developers, or other users typically write and define the schema and ingestion specifications to store the data in the desired database as per its signature, syntax, etc. The process of analyzing the data, identifying fields of interest, writing the database schema, and updating the existing schema is undertaken whenever changes occur to the use case or whenever enhancements and other requirements are to be implemented. Therefore, there remain non-trivial problems associated with event analysis and management.
To this end, example embodiments of the present disclosure provide techniques for analyzing events incoming through a message broker and configuration of a database schema for storing the events based on the analysis. The analysis is performed on all the attributes of the incoming events with reference to a primary identifier of an event source. The analysis determines the characteristics of the attributes, which facilitates development of the database schema with availability, accuracy, existence, and other factors of various attributes. Analysis is supported for various formats of events, such as AVRO, XML, complex JSON, etc. In some examples, the attributes of interest for database schema generation can be provided via a configuration for the respective databases including relational, time-series, analytical, graph, etc. Also, if a given database supports direct ingestion of data through the message broker, then the ingestion specification can be generated. Various examples will be apparent in light of the present disclosure.
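By way of illustration only, the attribute analysis described above can be sketched as follows. The sketch assumes JSON-encoded events and uses hypothetical names (`flatten`, `analyze`, and the report keys) that do not appear in the disclosure. For brevity, it does not group results by the primary identifier of the event source.

```python
import json
from collections import defaultdict
from statistics import mean


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into dotted attribute paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def analyze(raw_messages):
    """Summarize every attribute across a batch of JSON-encoded events."""
    events = [flatten(json.loads(raw)) for raw in raw_messages]
    values = defaultdict(list)
    for event in events:
        for attr, value in event.items():
            values[attr].append(value)

    report = {}
    for attr, seen in values.items():
        # Values that are neither null nor an empty string.
        present = [v for v in seen if v not in (None, "")]
        # Numeric values only; bool is excluded although it subclasses int.
        numbers = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        entry = {
            "present": len(present),
            "null_or_empty": len(seen) - len(present),
            "missing": len(events) - len(seen),
            "duplicates": len(present) - len(set(map(str, present))),
        }
        if numbers:
            entry.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
        report[attr] = entry
    return report


if __name__ == "__main__":
    sample = [
        '{"user": {"id": "u1"}, "latency_ms": 10}',
        '{"user": {"id": "u1"}, "latency_ms": 30}',
        '{"user": {"id": ""}}',
    ]
    print(json.dumps(analyze(sample), indent=2))
```

Other event formats, such as AVRO or XML, would first need to be decoded into an equivalent nested structure before the same summary could be computed.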
Example System
The application environment 102 includes one or more applications that are executing on a client or server computing device, such as applications executing in a virtual workspace or another virtual computing environment that provides computing resources (processing and/or data) to end users. The message broker 104 includes one or more modules configured to provide a standardized flow of data from the application environment 102 to the event manager 108 and to the database(s) 106. In some examples, the message broker 104 receives one or more messages 120 from the application environment 102. The messages 120 represent events or other actions that occur within, and are generated by, the application environment 102, such as user login events, security events, processing events, or other data representing activity occurring within the application environment 102. The message broker 104 routes the messages 120 to the event manager 108 and/or the database 106.
The configuration 114 is provided to the event manager 108. The configuration 114 specifies details about the database 106 (e.g., the type of database) and associated parameters for database schema generation (e.g., the database field attributes), such as shown in the examples of
If the database 106 supports direct ingestion from the message broker 104, then a database ingestion specification 118 is generated that can be used for ingesting the messages 120 into the database 106. For example, the ingestion specification 118 can include a database schema, which configures the database name and other parameters; an input configuration, which instructs the database about how to connect to the message broker 104 and how to parse the messages 120; and any other parameters needed to support the ingestion method used to ingest or otherwise load data from the messages 120 into the database 106. The ingestion specification 118 can be validated and/or updated by the user as needed.
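As a non-limiting sketch, an ingestion specification of the kind described above could be assembled as a simple nested structure. The field names (`schema`, `input`, `columns`, `servers`, `topic`, `format`) and the function `build_ingestion_spec` are assumptions made for illustration; the actual layout depends on the ingestion method supported by the particular database.

```python
import json


def build_ingestion_spec(database_name, attributes, broker):
    """Assemble a hypothetical ingestion specification as a plain dict.

    attributes: mapping of attribute name to a database column type.
    broker: mapping that tells the database how to reach the message
            broker and how to parse the messages.
    """
    return {
        # Database schema portion: database name and column definitions.
        "schema": {
            "database": database_name,
            "columns": [
                {"name": name, "type": kind}
                for name, kind in attributes.items()
            ],
        },
        # Input configuration portion: broker connection and message parsing.
        "input": {
            "servers": broker["servers"],
            "topic": broker["topic"],
            "format": broker.get("format", "json"),
        },
    }


if __name__ == "__main__":
    spec = build_ingestion_spec(
        "events",
        {"user_id": "string", "latency_ms": "long"},
        {"servers": ["broker-1:9092"], "topic": "logins"},
    )
    print(json.dumps(spec, indent=2))
```

Keeping the specification as plain data makes it straightforward to present to the user for validation or modification before it is registered with the database.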
The event manager 108 is configured to produce an analysis 122 and to transmit the analysis 122 to the administrator console 112. As discussed in further detail with respect to
In some examples, the system 100 can include a workstation, a laptop computer, a tablet, a mobile device, or any suitable computing or communication device. One or more components of the system 100, including the event manager 108, can include or otherwise be executed using one or more processors, volatile memory (e.g., random access memory (RAM)), non-volatile machine-readable mediums (e.g., memory), one or more network or communication interfaces, a user interface (UI), a display screen, and a communications bus. The non-volatile (non-transitory) machine-readable mediums can include: one or more hard disk drives (HDDs) or other magnetic or optical machine-readable storage media; one or more machine-readable solid state drives (SSDs), such as a flash drive or other solid-state storage media; one or more hybrid machine-readable magnetic and solid-state drives; and/or one or more virtual machine-readable storage volumes, such as a cloud storage, or a combination of such physical storage volumes and virtual storage volumes or arrays thereof. The user interface can include one or more input/output (I/O) devices (e.g., a mouse, a keyboard, a microphone, one or more speakers, one or more biometric scanners, one or more environmental sensors, and one or more accelerometers, etc.). The display screen can provide a graphical user interface (GUI) and in some cases, may be a touchscreen or any other suitable display device. The non-volatile memory stores an operating system (OS), one or more applications, and data such that, for example, computer instructions of the operating system and the applications, are executed by processor(s) out of the volatile memory. In some examples, the volatile memory can include one or more types of RAM and/or a cache memory that can offer a faster response time than a main memory. Data can be entered through the user interface. 
Various elements of the system 100 (e.g., the application environment 102, the message broker 104, the event manager 108, the database 106, the administrator console 112, and/or the schema registry 110) can communicate via the communications bus or another data communication network.
The system 100 described herein is an example computing device and can be implemented by any computing or processing environment with any type of machine or set of machines that can have suitable hardware and/or software capable of operating as described herein. For example, the processor(s) of the system 100 can be implemented by one or more programmable processors to execute one or more executable instructions, such as a computer program, to perform the functions of the system. As used herein, the term “processor” describes circuitry that performs a function, an operation, or a sequence of operations. The function, operation, or sequence of operations can be hard coded into the circuitry or soft coded by way of instructions held in a memory device and executed by the circuitry. A processor can perform the function, operation, or sequence of operations using digital values and/or using analog signals. In some examples, the processor can be embodied in one or more application specific integrated circuits (ASICs), microprocessors, digital signal processors (DSPs), graphics processing units (GPUs), microcontrollers, field programmable gate arrays (FPGAs), programmable logic arrays (PLAs), multicore processors, or general-purpose computers with associated memory. The processor can be analog, digital, or mixed. In some examples, the processor can be one or more physical processors, which may be remotely located or local. A processor including multiple processor cores and/or multiple processors can provide functionality for parallel, simultaneous execution of instructions or for parallel, simultaneous execution of one instruction on more than one piece of data.
The network interfaces can include one or more interfaces to enable the system 100 to access a computer network such as a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or the Internet through a variety of wired and/or wireless connections, including cellular connections. In some examples, the network may allow for communication with other computing platforms, to enable distributed computing. In some examples, the network may allow for communication with the application environment 102, the message broker 104, the event manager 108, the database 106, the administrator console 112, the schema registry 110, and/or other parts of the system 100 of
Example Process
The process 200 further includes determining 204 the message broker 104 from which the event manager 108 receives the messages 120. Note that there can be more than one message broker 104, depending on how the messages 120 are to be received from the application environment 102. Each message broker 104 is specified by the configuration 114. For example, the message broker 104 can be Apache Kafka, Azure Service Bus, or any other message broker that is configured to process messages from the application environment 102. The message broker 104 is the source of all incoming messages to be processed by the event manager 108. In some examples, if the database 106 supports an ingestion specification 118, the database 106 can be configured to ingest messages directly from the message broker 104 according to the ingestion specification 118.
The process 200 further includes receiving 206, by the event manager 108, the messages 120 transmitted from the application environment 102 via the message broker 104. For example, as noted above, the configuration 114 can specify a time window or time frame to be used for filtering the messages 120 that are to be analyzed (e.g., between 1 Jan. 2020 at 12:01 am and 7 Jan. 2020 at 11:59 pm), and/or the configuration 114 can specify other attributes for filtering the messages 120 (e.g., by user, by source, by topic, by subscription, by database type, etc.). In this manner, the event manager 108 will receive and process any messages 120 that include events occurring within the time window specified by the configuration 114, or any messages 120 that otherwise satisfy any other criteria specified by the configuration 114.
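The filtering described above can be illustrated with the following minimal sketch. It assumes that each message is a dictionary carrying an ISO 8601 `timestamp` field and that the configuration uses the hypothetical keys `start_time`, `end_time`, and `filters`. The sketch also assumes that the time window and the other criteria must all be satisfied, which is only one of the possible combinations.

```python
from datetime import datetime


def in_window(message, start, end):
    """Return True when the message timestamp lies within [start, end]."""
    timestamp = datetime.fromisoformat(message["timestamp"])
    return start <= timestamp <= end


def select_messages(messages, config):
    """Keep the messages that match the time window and attribute filters."""
    start = datetime.fromisoformat(config["start_time"])
    end = datetime.fromisoformat(config["end_time"])
    wanted = config.get("filters", {})  # e.g., {"topic": "logins"}
    selected = []
    for message in messages:
        if not in_window(message, start, end):
            continue
        if all(message.get(key) == value for key, value in wanted.items()):
            selected.append(message)
    return selected


if __name__ == "__main__":
    config = {
        "start_time": "2020-01-01T00:01:00",
        "end_time": "2020-01-07T23:59:00",
        "filters": {"topic": "logins"},
    }
    messages = [
        {"timestamp": "2020-01-03T08:00:00", "topic": "logins"},
        {"timestamp": "2020-01-09T08:00:00", "topic": "logins"},
        {"timestamp": "2020-01-04T08:00:00", "topic": "billing"},
    ]
    print(select_messages(messages, config))
```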
The process 200 further includes analyzing 208, by the event manager 108, the incoming messages 120 received from the message broker 104 to produce the analysis 122 of
The analysis 122 performed on the messages 120 is collated and transmitted as an output 210 to the administrator console 112, such as shown in
The analysis 122, as transmitted by the event manager 108 and/or the administrator via the administrator console 112, is then registered 212 with the schema registry 110 as the schema 116. If there are multiple databases 106, then separate schemas 116 can be registered for each database. An example schema 116 is shown in
The process 200 further includes determining 214, by the event manager 108, the type of the database 106 from the configuration 114. For example, the database type can be a relational database, a time series or analytical database, a graph database, or any other type of database or datastore. After the database is determined from the configuration 114, the existing (current) schema 116 is fetched 216 from the database 106 by the event manager 108. Next, the event manager 108 updates 218 the schema 116 according to the attributes specified in the analysis 122, as described above. For example, if the analysis 122 specifies one or more attributes that are not in the existing schema 116, then the schema 116 is updated to include those attributes specified in the analysis 122. If the configuration 114 provides that the updated schema 116 is to be applied to the database 106, then the event manager 108 applies 220 the updated schema 116 to the database 106. If the event manager 108 determines 214 that there are multiple databases, then the event manager 108 fetches 216, updates 218, and applies 220 the updated schema 116 to each of the databases 106, accordingly. If the configuration 114 does not provide that the updated schema 116 is to be applied to the database 106, then the event manager 108 transmits 222 the updated schema 116 as an output via the administrator console 112 for further processing by the administrator.
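The fetch, update, and apply sequence described above can be sketched as follows. The `InMemoryDatabase` class is a hypothetical stand-in for a real database client, and the schema layout (a `fields` list of name and type pairs) is likewise an assumption made for illustration.

```python
import copy


class InMemoryDatabase:
    """Hypothetical stand-in for a client of the database."""

    def __init__(self, schema):
        self._schema = schema

    def fetch_schema(self):
        return copy.deepcopy(self._schema)

    def apply_schema(self, schema):
        self._schema = copy.deepcopy(schema)


def update_schema(current_schema, analysis):
    """Return a copy of the schema extended with attributes new to it."""
    updated = copy.deepcopy(current_schema)
    known = {field["name"] for field in updated["fields"]}
    for attr, details in analysis.items():
        if attr not in known:
            updated["fields"].append(
                {"name": attr, "type": details.get("type", "string")}
            )
    return updated


def process_databases(databases, analysis, apply_changes):
    """Fetch, update, and optionally apply the schema for each database.

    Returns the updated schemas that were not applied, keyed by database
    name, so that they can be output for review instead.
    """
    pending = {}
    for name, database in databases.items():
        updated = update_schema(database.fetch_schema(), analysis)
        if apply_changes:
            database.apply_schema(updated)
        else:
            pending[name] = updated
    return pending


if __name__ == "__main__":
    db = InMemoryDatabase({"fields": [{"name": "user_id", "type": "string"}]})
    analysis = {"user_id": {"type": "string"}, "latency_ms": {"type": "long"}}
    print(process_databases({"events": db}, analysis, apply_changes=False))
```

In this sketch the update only adds attributes; handling of removed or retyped attributes is omitted.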
If the database 106 supports direct ingestion of the messages 120 from the message broker 104, and the configuration 114 provides that the ingestion specification 118 is to be generated, then the event manager 108 generates 224 the ingestion specification 118, such as shown in
It will be appreciated that the process 200 is extensible for use with any message broker and database that is specified in the configuration 114. For example, analysis of events can be performed on any messages 120 that are received by the event manager 108, and the schema 116 and/or the ingestion specification 118 can be accordingly created or otherwise modified based on the resulting analysis 122.
The foregoing description and drawings of various embodiments are presented by way of example only. These examples are not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed. Alterations, modifications, and variations will be apparent in light of this disclosure and are intended to be within the scope of the present disclosure as set forth in the claims.
Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. Any references to examples, components, elements or acts of the systems and methods herein referred to in the singular can also embrace examples including a plurality, and any references in plural to any example, component, element or act herein can also embrace examples including only a singularity. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements. The use herein of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. References to “or” can be construed as inclusive so that any terms described using “or” can indicate any of a single, more than one, and all of the described terms. In addition, in the event of inconsistent usages of terms between this document and documents incorporated herein by reference, the term usage in the incorporated references is supplementary to that of this document; for irreconcilable inconsistencies, the term usage in this document controls.