This disclosure relates in general to processing of objects processed by systems or applications, for example, requests and traces, and in particular to an object tagging language for categorizing objects processed by systems.
Several applications require analyzing objects processed by systems, for example, requests processed by applications, reports generated by applications, or traces generated by systems such as database management systems. For example, systems and applications are often migrated from physical datacenters to cloud platforms such as AWS (AMAZON WEB SERVICES), GOOGLE cloud platform, or MICROSOFT AZURE. Various objects processed by these systems are analyzed to determine whether the migration and subsequent execution of the systems is as expected. Such systems generate a large number of objects, for example, a typical system may generate several hundred thousand objects that need to be analyzed.
In a multi-tenant system, different tenants may run different applications and may even run proprietary code. Therefore, there is no consistent format of the objects that need to be analyzed. For example, different applications or systems may log data using different formats and may even log unstructured data. Even replicas of a replicated system that run on different platforms may generate traces that have different formats. For example, one replica of an application may use an ORACLE database, whereas another replica may use a different database such as a POSTGRESQL database. These two replicas may generate traces using different formats. Accordingly, it is difficult to analyze the traces of the two replicas to determine whether the execution of the two replicas matches. For example, it is possible that one of the traces includes errors or warnings that are not logged in the trace generated by the other replica. The varied nature of the objects processed by such systems makes it difficult to analyze the processed objects and monitor execution of these systems.
The figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the embodiments described herein.
The figures use like reference numerals to identify like elements. A letter after a reference numeral, such as “115a,” indicates that the text refers specifically to the element having that particular reference numeral. A reference numeral in the text without a following letter, such as “115,” refers to any or all of the elements in the figures bearing that reference numeral.
A system allows users to perform analysis of objects processed by systems, for example, requests, traces, logs, and so on. The system allows users to use an object tagging language to categorize objects. The system creates a tagging metadata index based on the tagged objects. The tagging metadata index allows efficient execution of queries used for analyzing the objects.
According to an embodiment, the system (e.g., an online system or a multi-tenant system) receives from one or more systems, objects processed by the systems, for example, traces based on execution of the systems. A trace includes a set of trace objects. A trace object includes a set of fields. One or more fields may store unstructured data. A trace object has an identifier.
The system receives a declarative tagging specification comprising tagging rules. A tagging rule specifies criteria for identifying a category of a trace object. A criterion may describe a pattern of unstructured data stored in a field that is specific to a category of the trace object. The system executes the tagging rules of the tagging specification for the objects. The execution causes at least some of the objects to be annotated with tags. A tag for an object identifies a category of the object. The system generates a tagging metadata index that maps categories of tags to identifiers of objects. The system receives a query for analyzing the objects and executes the query using the tagging metadata index. The use of the tagging metadata index allows efficient analysis of the objects processed by the system.
The multi-tenant system 110 stores information of one or more tenants 115. Each tenant may be associated with an enterprise that represents a customer of the multi-tenant system 110. Each tenant may have multiple users that interact with the multi-tenant system via client devices 105. With the multi-tenant system 110, data for multiple tenants may be stored in the same physical database. However, the database is configured so that data of one tenant is kept logically separate from that of other tenants so that one tenant does not have access to another tenant's data, unless such data is expressly shared. It is transparent to tenants that their data may be stored in a table that is shared with data of other customers. A database table may store rows for a plurality of tenants. Accordingly, in a multi-tenant system, various elements of hardware and software of the system may be shared by one or more tenants. For example, the multi-tenant system 110 may execute an application server that simultaneously processes requests for a number of tenants. However, the multi-tenant system enforces tenant-level data isolation to ensure that jobs of one tenant do not access data of other tenants.
A cloud platform may also be referred to as a cloud computing platform or a public cloud environment. A tenant may use the cloud platform infrastructure language to provide a declarative specification of a data center that is created on a target cloud platform 120. A tenant 115 may create one or more data centers on a cloud platform 120. A data center represents a set of computing resources including servers, applications, storage, memory, and so on that can be used by users, for example, users associated with the tenant.
The computing resources of a data center are secure and may not be accessed by users that are not authorized to access them. For example, a data center 125a that is created for users of tenant 115a may not be accessed by users of tenant 115b unless access is explicitly granted. Similarly, data center 125b that is created for users of tenant 115b may not be accessed by users of tenant 115a, unless access is explicitly granted. Furthermore, services provided by a data center may be accessed by computing systems outside the data center, only if access is granted to the computing systems in accordance with the declarative specification of the data center.
Examples of cloud platforms include AWS (AMAZON WEB SERVICES), GOOGLE cloud platform, and MICROSOFT AZURE. A cloud platform 120 offers computing infrastructure services that may be used on demand by a tenant 115 or by any computing system external to the cloud platform 120. Examples of the computing infrastructure services offered by a cloud platform include servers, storage, databases, networking, security, load balancing, software, analytics, intelligence, and other infrastructure service functionalities. These infrastructure services may be used by a tenant 115 to build, deploy, and manage applications in a scalable and secure manner.
The multi-tenant system 110 may include a tenant data store that stores data for various tenants of the multi-tenant system. The tenant data store may store data for different tenants in separate physical structures, for example, separate database tables or separate databases. Alternatively, the tenant data store may store data of multiple tenants in a shared structure. For example, user accounts for all tenants may share the same database table. However, the multi-tenant system stores additional information to logically separate data of different tenants.
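The shared-structure alternative can be illustrated with a minimal sketch. It assumes, purely for illustration, that the additional information used for logical separation is a tenant identifier column; the table name, column names, and sample rows are hypothetical and are not taken from the disclosure.

```python
import sqlite3

# Hypothetical shared table: the tenant_id column plays the role of the
# "additional information" that logically separates rows of different tenants.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_accounts (tenant_id TEXT NOT NULL, user_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO user_accounts VALUES (?, ?)",
    [("115a", "alice"), ("115a", "bob"), ("115b", "carol")],
)


def users_for_tenant(tenant_id: str) -> list[str]:
    """Return the user names of one tenant; the query is always scoped by tenant_id."""
    rows = conn.execute(
        "SELECT user_name FROM user_accounts WHERE tenant_id = ? ORDER BY user_name",
        (tenant_id,),
    )
    return [name for (name,) in rows]
```

In this sketch, every read path goes through a function that requires a tenant identifier, so rows of other tenants stored in the same table are never returned.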
Each component shown in
The interactions between the various components of the system environment 100 are typically performed via a network, not shown in
Although the techniques disclosed herein are described in the context of a multi-tenant system, the techniques can be implemented using other systems that may not be multi-tenant systems. For example, an online system used by a single organization or enterprise may use the techniques disclosed herein to create one or more data centers on one or more cloud platforms 120.
According to an embodiment, the cloud platform 120 receives requests for processing by the application 230. The received requests may be stored in a request store 220. The cloud platform 120 provides the requests received to both application replicas 230a and 230b. Each application replica 230a, 230b executes the requests received. Since there are differences in the installations of the application replicas 230a, 230b, it is likely that the same request may be processed differently by the two application replicas 230a, 230b. The execution of the requests may be different for other reasons, for example, the execution may depend on parameters such as location, time, environment variables that are defined locally by an administrator, and so on, which may be different when the request is executed by each application replica.
As an example, the same request may execute successfully in application replica 230a but return an error in application replica 230b. Alternatively, the same request may execute with warnings in application replica 230a but run without any warnings in application replica 230b. Similarly, it is likely that the same request returns results that are different. For example, if the request returns a JSON (JAVASCRIPT OBJECT NOTATION) object as the result, the two application replicas may return JSON objects with differences in some of the attributes or differences in the structures of the returned objects. The execution of the requests may generate different traces in the two application replicas. A trace refers to a sequence of logs that are generated and stored by an application or system and that represents an execution of a task, for example, a beginning to end execution of a request processed by the system. Accordingly, the trace will store any error or warning generated during execution of the request. The execution of the requests may result in the two application replicas storing different logs.
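To make the comparison of JSON results concrete, the following is a minimal sketch of one way to locate differences in attributes or structure between the results returned by two replicas. The function name, the path notation, and the sample results are assumptions made for illustration only.

```python
import json


def diff_paths(left, right, path="$"):
    """Return the paths, in key order, at which two parsed JSON values differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        diffs = []
        for key in sorted(set(left) | set(right)):
            child = f"{path}.{key}"
            if key not in left or key not in right:
                diffs.append(child)  # structural difference: key on one side only
            else:
                diffs.extend(diff_paths(left[key], right[key], child))
        return diffs
    if isinstance(left, list) and isinstance(right, list) and len(left) == len(right):
        diffs = []
        for i, (l_item, r_item) in enumerate(zip(left, right)):
            diffs.extend(diff_paths(l_item, r_item, f"{path}[{i}]"))
        return diffs
    # Scalars, mismatched types, or lists of different length.
    return [] if left == right else [path]


# Made-up results for the same request executed by two replicas.
result_230a = json.loads('{"status": "ok", "rows": [1, 2], "meta": {"db": "oracle"}}')
result_230b = json.loads(
    '{"status": "ok", "rows": [1, 3], "meta": {"db": "postgres"}, "warning": "slow"}'
)
differences = diff_paths(result_230a, result_230b)
```

Here `differences` lists one changed nested attribute, one changed list element, and one attribute present in only one result.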
A system may need to analyze the various objects associated with the application 230 to determine the differences in the execution of requests. For example, if the application was recently installed in the cloud platform, a system administrator may have to monitor the execution of the two application replicas to observe any differences. An application may be associated with a large number of requests, for example, millions of requests per day. As a result, the application or application replicas are associated with a very large number of objects that are stored by the system. Furthermore, each application may store objects such as traces or results in a proprietary format. Attributes of the objects may be scalar values, nested objects, unstructured data, semi-structured data, structured data, and so on. The variety of the formats of the data stored in the objects makes analysis of such objects difficult.
The system according to various embodiments allows users such as system administrators or data analysts to analyze objects processed by applications and systems, such as traces, logs, requests, and so on. Examples of traces analyzed by the system include database traces, result traces (traces that store responses generated by applications or systems), log traces, error traces, application tier traces, query language execution traces, and so on. The analysis of the traces may be performed to compare execution of two replicas of a system or application. The analysis of the traces may be performed to determine differences in execution of requests over a period of time, for example, to determine whether the system started generating errors or warnings over a period of time. The analysis of the traces may be performed to identify anomalies during execution of requests, for example, if there was a systemic change in execution time of requests over a time interval.
The object tagging interface 320 allows users to provide an object tagging specification for tagging objects in a system. For example, a user associated with a tenant of a multi-tenant system may use a client device 305a to provide an object tagging specification to tag traces generated by an application executed by the tenant. In an embodiment, the object tagging interface 320 allows users to use an object tagging language to provide a declarative specification for tagging objects. The declarative specification based on the object tagging language specifies declarative rules mapping patterns of objects to specific tags. The declarative tagging rules simply specify the mapping without requiring the users to specify how the mapping is implemented. The object tagging module 330 determines how the declarative specification is implemented. The object tagging interface 320 may present a graphical user interface that allows users to build declarative tagging rules. The object tagging interface 320 may allow users to invoke application programming interfaces (APIs) to specify declarative tagging rules. The object tagging interface 320 stores the received declarative tagging rules in the tagging rule store 315. A declarative tagging rule may also be referred to herein as a tagging rule.
The object tagging module 330 receives a declarative specification comprising one or more declarative tagging rules and performs tagging of objects stored in a system according to the declarative specification. The object tagging module 330 transforms and tags objects stored in an object store 335 based on the category, expressions, and filters specified in the declarative tagging rules. The object tagging module 330 executes the declarative tagging rules on an object to generate a collection of one or more tags, each represented as a key/value pair, for the object based on the patterns defined in the declarative tagging rules. The framework supported by the object tagging system 130 allows users to specify custom tagging rules. For example, each tenant of a multi-tenant system may define a distinct set of rules. Accordingly, the same type of objects generated by different instances of the same application may be analyzed differently by different tenants of the multi-tenant system. The details of the architecture of the object tagging module 330 are provided in the
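The effect of tenant-specific rule sets can be sketched as follows. This is an illustration under assumptions, not the module's actual implementation: the rule structure, the tenant identifiers used as dictionary keys, the tag names, and the sample object are all hypothetical, and criteria are assumed to be regular expressions evaluated against a single field.

```python
import re

# Hypothetical per-tenant rule sets: each rule maps a regular expression over a
# field to one key/value tag.
TENANT_RULES = {
    "115a": [
        {"field": "ResultLog_t", "pattern": r"\bERROR\b", "tag": ("severity", "error")},
    ],
    "115b": [
        {"field": "ResultLog_t", "pattern": r"\btimeout\b", "tag": ("cause", "timeout")},
        {"field": "ResultLog_t", "pattern": r"\bERROR\b", "tag": ("needs_review", "true")},
    ],
}


def tags_for(tenant_id, obj):
    """Evaluate one tenant's rules against an object and collect the matching tags."""
    tags = {}
    for rule in TENANT_RULES.get(tenant_id, []):
        text = str(obj.get(rule["field"], ""))
        if re.search(rule["pattern"], text):
            key, value = rule["tag"]
            tags[key] = value
    return tags


# A made-up trace object with one unstructured field.
trace_object = {"id": "obj-1", "ResultLog_t": "ERROR: query timeout after 30s"}
```

With these assumed rules, the same object yields `{"severity": "error"}` for the first tenant and `{"cause": "timeout", "needs_review": "true"}` for the second, which is the sense in which the same type of object may be analyzed differently by different tenants.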
The object store 335 stores various types of objects that are analyzed by the object tagging system 130. The object store 335 may include one or more data stores. The object store 335 may be associated with one or more applications that are being analyzed by the object tagging system 130 and may be located in a system running the applications. The object store 335 stores objects including reports, traces (generated by databases or applications), logs, queries, requests, and so on that are analyzed by the object tagging system 130.
The index store 340 stores a tagging metadata index generated by the object tagging module 330. The tagging metadata index maps various tags to objects stored in the object store 335. Each object is associated with an object identifier that allows the system to uniquely identify the object. The tagging metadata index allows efficient execution of queries for analyzing the objects.
The object query interface 350 allows users to query objects stored in the object store 335 for analysis purposes. For example, the object query interface 350 may provide a graphical user interface that allows a user to extract portions of traces that represent errors or warnings. The object query interface 350 may provide a graphical user interface that allows a user to compare execution of a set of requests across two replicas of an application. The execution of the object queries uses the tagging metadata index for efficient execution of the user requests.
As shown in
The tagging rule processing module 410 obtains the declarative tagging rules from the object tagging interface 320 and parses the rules to perform syntactic and semantic analysis of the declarative tagging rules to validate them. If the tagging rule processing module 410 determines a syntactic or semantic error in a declarative tagging rule, the tagging rule processing module 410 may report the error to the user, for example, via the object tagging interface 320. The user may revise the declarative tagging rule to fix any errors. The tagging rule processing module 410 may receive a modified version of the declarative tagging rule via the object tagging interface 320 and analyze the declarative tagging rule again.
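A minimal sketch of such validation is shown below. It assumes a rule is represented as a dictionary, that criteria are regular expressions, and that the set of valid categories is known in advance; the attribute names, the category list, and the error messages are hypothetical.

```python
import re

# Assumed category names; only classification_t and db_trace_t appear in the
# examples of this description, the third is made up.
KNOWN_CATEGORIES = {"classification_t", "db_trace_t", "result_log_t"}
REQUIRED_KEYS = ("name", "category", "criteria", "tags")


def validate_rule(rule):
    """Return a list of error messages; an empty list means the rule is valid."""
    errors = [f"missing attribute: {key}" for key in REQUIRED_KEYS if key not in rule]
    if errors:
        return errors  # without the required attributes, further checks cannot run
    # Semantic check: the category must be one the system knows how to process.
    if rule["category"] not in KNOWN_CATEGORIES:
        errors.append(f"unknown category: {rule['category']}")
    # Syntactic check: every criterion must be a compilable regular expression.
    for criterion_name, pattern in rule["criteria"].items():
        try:
            re.compile(pattern)
        except re.error as exc:
            errors.append(f"criterion {criterion_name}: invalid regex ({exc})")
    return errors
```

The returned messages could be reported to the user, who then revises the rule and resubmits it for another validation pass.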
The object transformation module 420 accesses objects from the object store 335 and processes the declarative tagging rules applicable to the objects. The object transformation module 420 transforms the objects according to the declarative tagging rules applicable to the objects and may add one or more tags to the objects. The object transformation module 420 may store the transformed objects in the object store 335.
The tagging index generation module 430 generates and updates the tagging metadata index based on objects and declarative tagging rules processed by the object tagging module 330.
The system receives 510 a declarative tagging specification. The declarative tagging specification comprises one or more tagging rules. A tagging rule specifies criteria for identifying a category of a trace object. At least one criterion describes a pattern of unstructured data stored in a field, the pattern being specific to a category of the trace object.
The system retrieves 520 from one or more systems, objects processed by the systems, for example, traces, requests, logs, and so on. For example, the traces generated by a system may include a set of trace objects. Each object includes a set of fields that may be stored as key-value pairs. The set of fields may include one or more fields storing unstructured data. An object has an object identifier that uniquely identifies the object.
The system executes the steps 530 and 540 for each object being processed and for each tagging rule of the declarative tagging specification that is applicable to the object being processed. The system executes 530 the tagging rule for the object. The execution causes the object to be annotated with one or more tags. Each tag for the object may identify a category of the trace object, for example, database trace, result log, and so on. The system updates 540 the tagging metadata index to include a mapping from the tags that were used to annotate the object to the identifier of the object being annotated.
The system receives 550 a query for analyzing the traces and executes 560 the query using the tagging metadata index.
Following is an example tagging rule definition illustrating various attributes specified in a tagging rule.
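The steps 510 through 560 can be sketched end to end as follows. This is a simplified illustration under stated assumptions: the specification is modeled as a list of dictionaries with regular-expression criteria, the index is an in-memory mapping from tag to a set of object identifiers, and the rule structure, object identifiers, and log text are made up.

```python
import re
from collections import defaultdict

# Step 510: a declarative specification (hypothetical structure).
SPEC = [
    {"tag": "LoadingReportsFailed", "field": "ResultLog_t",
     "pattern": r"(?i)\b(failed|exception)\b"},
    {"tag": "FilterCardinality", "field": "ResultLog_t",
     "pattern": r"(?i)\bcardinality\b"},
]

# Step 520: objects retrieved from the monitored systems (made-up sample data).
OBJECTS = [
    {"id": "req-1", "ResultLog_t": "Report load FAILED: connection reset"},
    {"id": "req-2", "ResultLog_t": "Plan chosen; cardinality estimate 1200"},
    {"id": "req-3", "ResultLog_t": "Completed in 14 ms"},
]


def build_index(spec, objects):
    """Steps 530 and 540: run each rule on each object, annotate it, and index it."""
    index = defaultdict(set)
    for obj in objects:
        obj_tags = obj.setdefault("tags", [])
        for rule in spec:
            if re.search(rule["pattern"], str(obj.get(rule["field"], ""))):
                obj_tags.append(rule["tag"])       # step 530: annotate the object
                index[rule["tag"]].add(obj["id"])  # step 540: tag -> object identifier
    return index


def query(index, *tags):
    """Steps 550 and 560: return identifiers carrying all requested tags."""
    id_sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*id_sets) if id_sets else set()


index = build_index(SPEC, OBJECTS)
```

The query function never reads the unstructured fields; it consults only the index, which is why a lookup by tag avoids rescanning every object.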
Examples of attributes of a tagging rule include a rule name that identifies and describes the rule, a rule type indicating whether the rule is a built-in rule or user specified rule, a status indicating whether the rule should be included or excluded during execution in a given context, a list of tags (each tag represented as a key value pair), a message that is appended with an object when the object is annotated with a particular tag, a category identifying the category of objects to which the tag is applied (for example, a classification category for classifying unstructured data, a result log category, a database trace category, a row count category for classifying rows as objects, and so on), one or more criteria that should be met by an object in order for the system to apply a particular tag to the object, and so on. According to an embodiment a criterion is represented as an expression, for example, a regular expression, a database query language expression, and so on. The expression is evaluated against an object to determine whether the tag should be applied to the object.
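The attributes listed above can be pictured with the following sketch, which models a tagging rule as a Python data class. The concrete syntax of the object tagging language is not reproduced here; the field names, the status values, the criterion name, and the sample rule contents are assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class TaggingRule:
    name: str          # identifies and describes the rule
    rule_type: str     # "built-in" or "user-specified"
    status: str        # "include" or "exclude" in the current context
    category: str      # category of objects to which the tag is applied
    tags: dict         # each tag is a key/value pair
    criteria: dict     # criterion name -> expression (here, a regular expression)
    message: str = ""  # text attached to an object when it is tagged

    def is_active(self) -> bool:
        """A rule takes part in execution only when its status is 'include'."""
        return self.status == "include"


# A hypothetical rule instance; all values are illustrative.
example_rule = TaggingRule(
    name="LoadingReportsFailed",
    rule_type="user-specified",
    status="include",
    category="result_log_t",
    tags={"LoadingReportsFailed": "true"},
    criteria={"failure_keywords": r"(?i)\b(failed|error)\b"},
    message="Report did not load",
)
```

Calling `asdict(example_rule)` turns the rule into a plain dictionary, which is one convenient shape for storing rules in a rule store or passing them through an API.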
Following is an example tagging rule for processing a query language (QL) trace. The rule has the name QLTrace. The rule specifies a tag QLStatement for finding a generated QL statement. The tagging rule specifies two criteria, one of which, named diffdateliteral, specifies a regular expression. The regular expression may check for patterns of String, Date, or Alphanumeric fields using wild card characters such as ‘*’ and ‘?’ and other regular expression features.
The following tagging rule filters objects that store information describing cardinality. The tag named FilterCardinality specifies a criterion named Filter_criteria that executes a regular expression to identify a specific term in the objects of category classification_t and db_trace_t.
The following tagging rule identifies reports that failed due to an error. The tag named LoadingReportsFailed specifies a criterion that checks for keywords that indicate failure in the objects.
Following is a portion of an example object that may be generated by a system during execution, for example, while processing a request. The object includes various fields that may be stored as key-value pairs. Certain fields, for example, ResultLog_t, store unstructured text. An expert with knowledge of the objects generated by the system can identify patterns in the various fields that represent certain characteristics of execution of the system while processing the requests. Accordingly, the expert can provide tagging rules that match the patterns to determine the tags for each object.
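As an illustration of how wild card characters can be turned into a regular expression, the following sketch uses `fnmatch.translate` from the Python standard library, in which ‘*’ matches any run of characters and ‘?’ matches exactly one character. The pattern and the sample statements are hypothetical and are not the diffdateliteral criterion itself.

```python
import fnmatch
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a '*' and '?' wild card pattern into an equivalent regular expression."""
    return re.compile(fnmatch.translate(pattern))


# Hypothetical criterion: a statement that filters on a date-shaped literal.
date_filter = wildcard_to_regex("SELECT * WHERE created = ????-??-??")

matches_date = bool(date_filter.match("SELECT id FROM t WHERE created = 2024-01-31"))
matches_other = bool(date_filter.match("SELECT id FROM t WHERE created = yesterday"))
```

The translated expression is anchored at the end of the input, so `match` succeeds only when the whole statement fits the pattern. Note that ‘?’ accepts any character, not only digits; a stricter date check would need an explicit regular expression.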
If the example tagging rules described herein are executed using the above example object, the following tags may be generated. These tags are added to the object and the transformed object annotated with these tags is stored. These tags may be added as a field of the object.
Following is a portion of an example tagging metadata index that stores the mapping from tags to objects. The tagging metadata index maps tags to objects such as requests. The tagging metadata index identifies the objects using their respective identifiers. The tagging metadata index may store additional attributes, for example, a category of the object, certain flags applicable to the object, and so on. The additional metadata attributes stored in the tagging metadata index allow specific queries to be processed efficiently. For example, a user may be able to filter the requests based on specific attributes stored in the tagging metadata index.
The system disclosed allows efficient analysis of arbitrary objects processed by multiple heterogeneous data sources. The system does not enforce any predetermined schema on the data representation of objects. The system receives tagging rules, extracts data, and transforms the objects to annotate the objects with tags representing various categories and attributes describing the objects. This allows a user to add structure to unstructured and arbitrary objects processed by various systems. The system further creates a tagging metadata index that allows users to efficiently search across the objects.
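The role of the additional attributes can be sketched as follows. The entry layout, the flag values, and the sample identifiers are assumptions made for illustration; an actual index may be stored and organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    object_id: str                  # identifier of the tagged object
    category: str                   # category of the object
    flags: frozenset = frozenset()  # flags applicable to the object


# Hypothetical tagging metadata index: tag -> entries for the tagged objects.
TAG_INDEX = {
    "LoadingReportsFailed": [
        IndexEntry("req-1", "result_log_t", frozenset({"replica_a"})),
        IndexEntry("req-7", "db_trace_t", frozenset({"replica_b"})),
    ],
    "FilterCardinality": [IndexEntry("req-2", "db_trace_t")],
}


def find(tag, category=None, flag=None):
    """Return object identifiers for a tag, optionally narrowed by category or flag."""
    return [
        entry.object_id
        for entry in TAG_INDEX.get(tag, [])
        if (category is None or entry.category == category)
        and (flag is None or flag in entry.flags)
    ]
```

For example, `find("LoadingReportsFailed", category="db_trace_t")` narrows the result to database trace objects using only attributes held in the index, without loading the objects themselves.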
The storage device 608 is a non-transitory computer-readable storage medium, such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 606 holds instructions and data used by the processor 602. The pointing device 614 may be a mouse, track ball, or other type of pointing device, and is used in combination with the keyboard 610 to input data into the computer system 600. The graphics adapter 612 displays images and other information on the display 618. The network adapter 616 couples the computer system 600 to a network.
As is known in the art, a computer 600 can have different and/or other components than those shown in
The computer 600 is adapted to execute computer modules for providing the functionality described herein. As used herein, the term “module” refers to computer program instructions and other logic for providing a specified functionality. A module can be implemented in hardware, firmware, and/or software. A module can include one or more processes, and/or be provided by only part of a process. A module is typically stored on the storage device 608, loaded into the memory 606, and executed by the processor 602.
The types of computer systems 600 used by the entities of
The particular naming of the components, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the embodiments described may have different names, formats, or protocols. Further, the systems may be implemented via a combination of hardware and software, as described, or entirely in hardware elements. Also, the particular division of functionality between the various system components described herein is merely exemplary, and not mandatory; functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component.
Some portions of the above description present features in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain embodiments described herein include process steps and instructions described in the form of an algorithm. It should be noted that the process steps and instructions of the embodiments could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The embodiments described also relate to apparatuses for performing the operations herein. An apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a non-transitory computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present embodiments are not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the embodiments as described herein.
The embodiments are well suited for a wide variety of computer network systems over numerous topologies. Within this field, the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.
Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting.