The disclosure concerns the field of information technology. In particular, the disclosure concerns the filtering of input data records against a list of predefined rules. Each input data record comprises multiple data fields. In this document the rules are also referred to as matcher queries. If an input data record fulfills a rule's condition or matcher query, the action associated with that rule is executed. In some cases, the action does not modify the input data record; in other cases, the input data record is modified.
In prior art solutions, the filtering of input data records is performed sequentially, i.e. one rule after the other. In other words, an input data record is sequentially matched with all rules in the list of predefined filter rules. If the matcher query belonging to a rule is fulfilled, the action associated with that rule is executed. Otherwise, the action associated with that rule is not executed. Subsequently, the next rule is processed until all rules are processed. After all rules have been processed, the next input data record is filtered.
Examples of prior art solutions are the SPLUNK Data Stream Processor (https://docs.splunk.com/Documentation/DSP/1.4.0/User/Filter, see
In the SPLUNK Data Stream Processor of
Likewise in the CRIBL Stream of
The prior art is also described in U.S. Pat. No. 10,896,175 B2, where it is stated that “The modification to the first data processing pipeline can include a first set of pipelined commands corresponding to the first search query being modified, and the dependency can be enforced by causing a second set of pipelined commands corresponding to the second search query to be modified to include the modified first set of pipelined commands.” It should be noted that the patent describes a solution that works when reading data, not when ingesting data.
Another prior art solution for rule-based data stream processing is described in US 2022/0121689 A1, which states that “Systems and methods for rule-based data stream processing by data collection, indexing, and visualization systems. An example method includes: receiving, by the computer system, an input data stream comprising raw machine data; processing the raw machine data by a data processing pipeline that produces transformed machine data”. It should be noted that this publication describes a data processing solution based on pipeline stages, not a data matching solution.
This section provides background information related to the present disclosure which is not necessarily prior art.
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
The object of the disclosure is to find a new computer-implemented method for the filtering of input data records against an ordered list of pre-defined rules. The method shall be computationally more efficient than prior art solutions, both in cases where the filtering modifies the input data record and in cases where it does not. In case the filtering modifies the input data record, filtering and modifying the input data record shall be done in a single processing step. In other words, separate processing steps for filtering and modifying the input data record shall be avoided.
According to a first aspect of the disclosure, the objective is solved by a computer-implemented method for filtering an input data record against an ordered list of pre-defined rules according to claim 1.
In particular, the computer-implemented method comprises the following steps:
The disclosed method comprises two phases: a pre-processing phase followed by a matching phase. In the pre-processing phase, each rule in the list of pre-defined rules is pre-processed into a number of unique, univariate conditions and a number of logical operators operating on the unique, univariate conditions. A unique, univariate condition depends on a single data field only. By pre-processing each rule, at least one unique, univariate condition is derived. In addition to the at least one unique, univariate condition, a number of logical operators operating on the unique, univariate conditions are derived from the rule. The logical operators include Boolean operators, such as AND, OR and NOT, and the Identity operator. Formally, the Identity operator implements the identity function ƒ, such that ƒ(X)=X for all X. In terms of Boolean logic, each transformed rule consisting of logical operator(s) operating on at least one unique, univariate condition is logically equivalent to the original rule. In most cases, some conditions are present in multiple rules. In order to minimize the number of unique, univariate conditions for all rules, multiple instances of identical conditions are discarded, thereby defining an ordered list of unique, univariate conditions with minimal length. After that, the conditions in the list are preferably grouped by the data field accessed in the input data record, i.e. the data field on which the univariate condition depends. Essentially, in the pre-processing phase, the disclosed method derives an ordered list of data fields, an ordered list of unique, univariate conditions, and a set of logical operators operating on the unique, univariate conditions.
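By way of non-limiting illustration, the pre-processing phase may be sketched in Python as follows. The rules, the field names, the tuple encoding of the logical operators, and the names `Cond` and `preprocess` are hypothetical and are not taken from the disclosure; the sketch only shows one possible way of discarding repeated conditions, grouping the remaining conditions by data field, and forming the transformed rules.

```python
from typing import NamedTuple

class Cond(NamedTuple):
    """A univariate condition: an equality check on a single data field."""
    field: str
    value: str

# Hypothetical rules (not from the disclosure). A rule is either a single
# Cond or a nested tuple ("AND" | "OR" | "NOT", operand, ...).
RULES = [
    ("AND", Cond("status", "ERROR"), Cond("host", "web-1")),
    Cond("status", "ERROR"),
    ("OR", Cond("host", "web-1"), ("NOT", Cond("level", "NONE"))),
]

def preprocess(rules):
    conditions = []  # ordered list of unique, univariate conditions
    index = {}       # condition -> position in `conditions`

    def transform(node):
        if isinstance(node, Cond):
            if node not in index:          # discard repeated instances
                index[node] = len(conditions)
                conditions.append(node)
            return ("ID", index[node])     # Identity operator on one condition
        op, *operands = node
        return (op, *(transform(o) for o in operands))

    transformed = [transform(rule) for rule in rules]
    by_field = {}    # data field -> indexes of the conditions that read it
    for i, cond in enumerate(conditions):
        by_field.setdefault(cond.field, []).append(i)
    return conditions, by_field, transformed

conditions, by_field, transformed = preprocess(RULES)
```

In this sketch the five condition occurrences in the three hypothetical rules collapse into three unique conditions, and each transformed rule refers to conditions by their position in the ordered list.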
In the matching phase of the disclosed method, the input data records are processed sequentially. Starting with the first input data record, the statuses of the conditions are derived by matching the data fields in the input data record to the list of unique, univariate conditions. Next, the input data record is filtered by sequentially applying the at least one logical operator of a transformed rule to the unique, univariate conditions. By evaluating the statuses of the conditions and by applying the logical operators belonging to a transformed rule, the status of the transformed rule is evaluated. The rules are processed in ascending order; however, in some cases not all rules need to be processed. Note that a condition present in multiple rules is evaluated only once, and its status is re-used when processing other rules. If the status of a transformed rule is TRUE, i.e. the input data record fulfills the requirements of the matcher query, the respective action associated with that rule is executed. Otherwise, no action is executed. After all rules have been processed, the next input data record is filtered.
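A minimal sketch of the matching phase, under stated assumptions, is given below in Python. The conditions, the transformed rules (written here as functions over the list of condition statuses), and the action names are hypothetical; the sketch merely illustrates that each condition is evaluated once per input data record and that the resulting statuses are re-used by all transformed rules.

```python
# Hypothetical ordered list of unique, univariate conditions (field, value).
CONDITIONS = [("status", "ERROR"), ("host", "web-1"), ("level", "NONE")]

# Hypothetical transformed rules in ascending order. Each rule applies
# logical operators to the condition statuses c[0], c[1], c[2].
RULES = [
    (lambda c: c[0] and c[1], "Action0"),     # C0 AND C1
    (lambda c: c[0], "Action1"),              # Identity(C0)
    (lambda c: c[1] or not c[2], "Action2"),  # C1 OR NOT C2
]

def match(record):
    # Step 1: evaluate every condition exactly once against the record.
    statuses = [record.get(field) == value for field, value in CONDITIONS]
    # Step 2: evaluate the transformed rules in ascending order and collect
    # the actions of all rules whose status is TRUE.
    executed = [action for rule, action in RULES if rule(statuses)]
    return statuses, executed

statuses, executed = match({"status": "ERROR", "host": "db-7", "level": "INFO"})
```

For the assumed record, C0 is TRUE and C1 and C2 are FALSE, so the second and third hypothetical rules are fulfilled while the first is not.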
According to a preferred embodiment of the disclosure, the logical operators for all transformed rules comprise positive operators only, i.e. do not comprise a NOT operator. By doing so, it is possible to compile a total set of pre-selected rules, such that only those rules comprised in this set are evaluated. In the prior art, all rules always need to be evaluated.
In another preferred embodiment, after evaluating the conditions, for each condition having a TRUE status a set of pre-selected rules is compiled. The set of pre-selected rules comprises those transformed rules where the status of the condition is evaluated.
It is particularly advantageous to compile a total set of pre-selected rules for all conditions having a TRUE status, and to process only the transformed rules comprised in the total set of pre-selected rules. This minimizes the number of rules to be evaluated. If, based on the conditions having a TRUE status, certain rules are not comprised in the total set of pre-selected rules, those rules are not evaluated at all.
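The compilation of the total set of pre-selected rules may be sketched as follows. The conditions, rules, and the mapping `USED_BY` are hypothetical. The sketch relies on positive operators only: a rule built solely from AND, OR and the Identity operator cannot be TRUE unless at least one of its conditions is TRUE, so rules outside the pre-selected set can safely be skipped.

```python
# Hypothetical conditions (field, value) and positive-only transformed rules.
CONDITIONS = [("status", "ERROR"), ("host", "web-1"),
              ("level", "DEBUG"), ("region", "eu")]
RULES = {
    0: lambda c: c[0] and c[1],
    1: lambda c: c[2],
    2: lambda c: c[1] or c[3],
    3: lambda c: c[0],
}
# Derived in the pre-processing phase: condition index -> rules reading it.
USED_BY = {0: {0, 3}, 1: {0, 2}, 2: {1}, 3: {2}}

def match(record):
    statuses = [record.get(field) == value for field, value in CONDITIONS]
    # Union of the pre-selected rule sets of all conditions that are TRUE.
    preselected = set()
    for i, status in enumerate(statuses):
        if status:
            preselected |= USED_BY[i]
    evaluated = sorted(preselected)            # ascending rule order
    fulfilled = [r for r in evaluated if RULES[r](statuses)]
    return evaluated, fulfilled

evaluated, fulfilled = match(
    {"status": "ERROR", "host": "db-7", "level": "INFO", "region": "us"}
)
```

For the assumed record only C0 is TRUE, so only the hypothetical rules 0 and 3 are evaluated, and rules 1 and 2 are never touched.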
According to another preferred embodiment, after an action associated with a rule modifies a data field in the input data record, only those conditions that depend on the modified data field are re-evaluated. This feature is extremely helpful since neither all rules nor all conditions are re-evaluated, but only those conditions that depend on the modified data field in the input data record. This feature makes it possible to filter and modify the input data record in a single step, thereby avoiding the at least two processing stages of the prior art.
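The selective re-evaluation may be sketched as follows. All names, rules, and actions are hypothetical; in particular, the convention that an action returns the list of data fields it modified is an assumption made for this sketch only.

```python
# Hypothetical conditions and a field -> dependent-conditions index that
# would be derived in the pre-processing phase.
CONDITIONS = [("service", "payment"), ("log level", "NONE")]
DEPENDS_ON = {"service": [0], "log level": [1]}

def set_level_info(record):
    record["log level"] = "INFO"
    return ["log level"]        # assumed convention: report modified fields

def no_change(record):
    return []

RULES = [
    (lambda c: c[0], set_level_info),   # Identity(C0), modifies the record
    (lambda c: not c[1], no_change),    # NOT C1
    (lambda c: c[1], no_change),        # Identity(C1)
]

def evaluate(index, record):
    field, value = CONDITIONS[index]
    return record.get(field) == value

def process(record):
    statuses = [evaluate(i, record) for i in range(len(CONDITIONS))]
    fulfilled = []
    for rule_id, (rule, action) in enumerate(RULES):
        if not rule(statuses):
            continue
        fulfilled.append(rule_id)
        # Re-evaluate only the conditions that read a modified field.
        for field in action(record):
            for i in DEPENDS_ON.get(field, []):
                statuses[i] = evaluate(i, record)
    return fulfilled

record = {"service": "payment", "log level": "NONE"}
fulfilled = process(record)
```

Here the first hypothetical rule changes “log level”, only the one condition depending on that field is re-evaluated, and the processing then continues with the next rule in the same pass over the record.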
It is preferred to repeat the steps
Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The accompanying drawings, which are incorporated in and constitute part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure. The embodiments illustrated herein are presently preferred, it being understood, however, that the disclosure is not limited to the precise arrangements and instrumentalities shown, wherein:
Example embodiments will now be described more fully with reference to the accompanying drawings.
Conditions in the set of pre-defined rules are first enumerated at 72 to form an ordered list of conditions, where each condition in the ordered list of conditions is univariate, i.e., depends on a single data field. Select conditions may be removed from the ordered list of conditions as indicated at 73, where each of the removed conditions has an identical condition preceding it in the ordered list of conditions. Additionally, the conditions in the ordered list of conditions may be grouped by data field as indicated at 74.
Next, a set of transformed rules is formed at 75. For each given rule in the set of pre-defined rules, conditions in the ordered list of conditions that are associated with a given rule are connected by logical operators to form a transformed rule corresponding to the given rule in the set of pre-defined rules.
Input data records can now be filtered in relation to the set of transformed rules. To do so, a data record is retrieved at 76 and the data record is evaluated at 77 in relation to the set of transformed rules. More specifically, the data record is evaluated in relation to the set of transformed rules by evaluating each condition in the ordered list of conditions in relation to the data record; evaluating the logical operators in the set of transformed rules; and for each transformed rule in the set of transformed rules, executing the action associated with a given transformed rule when conditions for the given transformed rule are satisfied by the data record. When evaluating the logical operators in the set of transformed rules, select rules can be skipped, for example where the select rules have at least one condition that is not satisfied by the data record.
In some instances, an action executed for a particular transformed rule modifies a value of a given data field in the data record. In these instances, the data record needs to be re-evaluated in relation to the set of transformed rules. In particular, conditions in the ordered list of conditions that depend upon the value of the given data field are re-evaluated and the logical operators in the set of transformed rules are re-evaluated. Processing the set of transformed rules then continues sequentially starting from the particular transformed rule that resulted in the modification to the data record.
With continued reference to
In a first application example, the disclosed method for filtering a single input data record by a set of six rules, called R0 . . . R5, is demonstrated. In this example it is assumed that none of the actions Action0 . . . Action5 being taken after matching an input data record to the set of rules modifies the input data record. The rules used in this example are:
In the matcher queries, the symbol “==” is used for an equality check between the left and the right side, and “!=” for an inequality check. The rules, short R0 for Rule #0 etc., and the respective actions are displayed in
In a first step in the preparation phase, the positive unique, univariate conditions comprised in each rule are extracted from the rules. A univariate condition depends on a single field of the input data record only. This results in the following unique, univariate conditions per rule:
Discarding multiple instances of identical conditions results in the following list of ordered positive unique, univariate conditions for all rules:
The conditions C0 . . . C5 above are positive conditions only, i.e. no negations such as a NOT operator in «log level!=“NONE”» are present. Alternatively, it would be possible to allow negative conditions only, or even mixed conditions. The column “Rule” in Tab. 3 indicates the transformed rule(s) evaluating the respective condition. E.g. the status of the condition C0 is used when evaluating the status of the transformed rules R0* and R1*.
By using positive conditions only instead of e.g. the expressions «log level!=“NONE”» in rule R4 and «log level==“NONE”» forming part of rule R5, only a single condition C5 needs to be evaluated. Its negation is realized by the Boolean NOT operator. This results in fewer conditions and is computationally much more efficient.
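The effect of using positive conditions only may be illustrated by the following sketch. The two expressions are the ones quoted above; the helper `normalize`, the tuple encoding, and the sample record are hypothetical. An inequality is rewritten as the positive condition plus a negation flag, so that both expressions share a single condition.

```python
def normalize(field, op, value):
    """Return (positive condition, negated flag) for an == or != check."""
    if op == "==":
        return (field, value), False
    if op == "!=":
        return (field, value), True
    raise ValueError(f"unsupported operator: {op}")

expressions = [("log level", "!=", "NONE"), ("log level", "==", "NONE")]

conditions = []   # ordered list of unique, positive conditions
refs = []         # per expression: (condition index, negated flag)
for expression in expressions:
    condition, negated = normalize(*expression)
    if condition not in conditions:
        conditions.append(condition)
    refs.append((conditions.index(condition), negated))

record = {"log level": "INFO"}   # hypothetical input data record
# The single shared condition is evaluated once ...
statuses = [record.get(field) == value for field, value in conditions]
# ... and the NOT operator is applied where the flag is set (XOR).
results = [statuses[i] != negated for i, negated in refs]
```

Both expressions refer to the same condition index, so one comparison against the record serves both of them.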
Next, the conditions are grouped according to the data field in the input data record, i.e. the data field the condition depends on. This results in the following table of conditions and fields:
To conclude the preparation phase, the rules are rewritten as logically equivalent transformed rules featuring logical operators, comprising Boolean operators and the Identity operator, operating on the conditions:
The mapping from the ordered list of rules to an ordered list of conditions, and from the ordered list of conditions to an ordered list of fields is shown in
In the matching phase after the preparation phase, the following input data record is assumed to be present:
The direction of mappings in the matching phase is from fields to conditions and, by applying logical operators to the conditions, from conditions to transformed rules (see
As a first step in the matching phase, the conditions are evaluated against the input data record. Preferably, the order of evaluation follows the fields. This yields the following statuses of the conditions:
Consequently, the conditions C0 and C1 are TRUE, and C2 . . . C5 are FALSE.
Next, the transformed rules comprising the conditions C0 . . . C5 and logical operators are evaluated sequentially in ascending order. This yields the following results:
In case the status of a transformed rule is TRUE, the respective action will be executed. In this example Action0, Action1, Action2, and Action4 are executed, whereas Action3 and Action5 are not executed.
In contrast to prior-art solutions, the disclosed method is suitable for high and extremely high data input rates since thousands of rules can be processed in real-time. Because the rules are split into an ordered list of unique, univariate conditions, a condition that is present in multiple rules is evaluated just once per input data record, which makes the evaluation of rules considerably quicker than in the prior art.
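The saving can be made concrete with a small count. The rule set below is hypothetical, and the sequential figure assumes that a sequential matcher evaluates every condition of every rule without short-circuiting, so it is an upper bound rather than a measurement.

```python
# Hypothetical rule set: each rule lists the conditions its matcher query reads.
rules = {
    "R0": ["a", "b"],
    "R1": ["a"],
    "R2": ["b", "c"],
    "R3": ["a", "c"],
}

# Sequential matching: every condition occurrence is evaluated per rule.
sequential_evaluations = sum(len(conds) for conds in rules.values())

# Shared conditions: each unique condition is evaluated once per record.
shared_evaluations = len({c for conds in rules.values() for c in conds})
```

In this assumed rule set, seven condition evaluations per input data record shrink to three.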
In a second application example, the rules R0 . . . R5 from the first application example are reused. Apart from using a different input data record, it is assumed that the action “Action3” following R3 changes the field “log level” in the input data record to “INFO”. The other actions following rules R0 . . . R2, R4 and R5 do not modify data (see
As the rules are the same as in the first application example, the preparation phase is identical to that of the first application example. Therefore, the transformed rules from the first application example are reused.
In the matching phase, the following data record is assumed:
First in the matching phase, the conditions are evaluated against the input data record. This yields the following statuses of the conditions:
In other words, the conditions C4 and C5 are TRUE, and C0 . . . C3 are FALSE.
Next, the transformed rules are evaluated sequentially in ascending order based on the statuses of the conditions C0 . . . C5, i.e. starting with R0*, then R1*, R2* etc.
It turns out that the status of R0*, R1* and R2* is FALSE and the status of R3* is TRUE. The intermediate states of conditions C0 . . . C5 and transformed rules R0* . . . R3* before executing Action3 are shown in
As only one field F3 in the input data record changed, only one condition, namely C5, which depends on the data field “log level”, needs to be re-evaluated. This yields the following result:
Following this modification of the input data record and the re-evaluation of the affected condition, the processing of rules is continued. After R3*, the transformed rule R4* is evaluated. This yields the following result:
As Action4 does not change the input data record, the processing of rules is continued with R5*. This yields:
In total, both Action3 and Action4 are executed. The final status is displayed in
In a third application example four filtering rules are present and the actions following the rules do not modify the data record. The following rules and actions are assumed to be present:
As in the previous examples, the rules are split into an ordered list of unique, univariate conditions. This yields the following conditions:
Following this, the conditions can be grouped according to the data fields in the input data record:
To conclude the preparation phase, the rules can be rewritten as transformed rules featuring conditions and logical operators operating on these conditions as follows:
The structure of data fields, transformed rules comprising conditions and logical operators, and actions is shown in
In the matching phase of the disclosed method, the following input data record is assumed:
The evaluation of conditions yields the following states of the conditions:
The column “Pre-selected rules Id” maps a TRUE status of a condition, e.g. C2, to a respective transformed rule, here R1*, as C2 is used when evaluating R1*. As the transformed rule R0* is not present in the column “Pre-selected rules Id” above, R0* does not need to be evaluated at all. This reduces the number of rules to be evaluated.
Based on the status of conditions, only the transformed rules R1*, R2* and R3* are evaluated. The evaluation yields the following results:
Finally, the actions following the rules R1*, R2* and R3* having a “TRUE” status are executed. The statuses of conditions, transformed rules, and actions are shown in
A fourth application example shows another case where actions modify multiple data fields of the input data record. The following rules and actions are assumed:
First, the rules R0 . . . R3 are split into conditions, which yields the following list of ordered, unique, univariate conditions:
Next, the conditions are grouped according to the visited field:
Next, the transformed rules are represented by logical operators operating on the conditions:
In the preparation phase, the structure of data fields, transformed rules comprising conditions and logical operators, and actions was derived as shown in
In the matching phase of the method, the following inputs are used:
Based on these input data records, the conditions are evaluated:
It turns out that only conditions C0, C1, C3 and C4 are fulfilled.
Next, the pre-selected rules associated with TRUE conditions are evaluated one by one, starting with the lowest-ranking rule, here R0*.
Note that as R0* has the status FALSE, no action is taken and R1* is processed. As R1* has a TRUE status, the action A1 associated with R1* is executed. Thus the input data record is modified as follows:
As only the field “log level” is changed, it is sufficient to re-evaluate the dependent conditions, namely C8. The evaluation of C8 yields:
Continuing the evaluation of rules with R2* yields:
Executing the action associated with R2* yields the following data record:
As only the data field “service.type” was changed, the conditions associated with this field are re-evaluated. This yields:
The evaluation of rules is continued with R3*, which yields:
Executing A3 yields the following data record:
As no condition depends on the data field “api.error”, no condition needs to be re-evaluated. In addition, as the last pre-selected rule was already executed, the matching phase of the method is finished. The final status of conditions, transformed rules, and actions is shown in
A fifth application example shows another case similar to the third application example where actions do not modify the input data records. The following rules and actions are assumed:
In this case, some matcher queries feature wildcards and a function “matchesValue”, which checks whether a term, e.g. “PROD”, is present in a specific data field, e.g. “instance_name”.
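The exact semantics of “matchesValue” are not spelled out here, so the following Python sketch is only one possible reading. It assumes shell-style wildcards (“*”), a case-sensitive comparison, and that a data field may hold either a single value or several values; the helper name `matches_value` is hypothetical.

```python
from fnmatch import fnmatchcase

def matches_value(field_value, pattern):
    """Return True if any value held by the field matches the pattern.

    Assumptions of this sketch: "*" is a wildcard, matching is
    case-sensitive, and a missing field never matches.
    """
    if field_value is None:
        return False
    if isinstance(field_value, (list, tuple)):
        values = field_value
    else:
        values = [field_value]
    return any(fnmatchcase(str(value), pattern) for value in values)

record = {"instance_name": "PROD-eu-1", "technology": [".NET", "azure"]}
in_prod = matches_value(record.get("instance_name"), "*PROD*")
on_azure = matches_value(record.get("technology"), "azure")
missing = matches_value(record.get("unknown_field"), "*")
```

Such a function still reads a single data field, so a matcher query built from it remains a univariate condition.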
In a first step, the rules are split into an ordered list of unique, univariate conditions:
The 15 conditions are then grouped according to the visited data fields, which yields:
As the last step in the preparation phase, the rules are represented as transformed rules featuring logical operators operating on the conditions. This yields:
The structure of data fields, transformed rules comprising conditions and logical operators, and actions as derived in the preparation phase is displayed in
In the matching phase, the following input data records are assumed to be present:
As a first step in the matching phase, the conditions C0 . . . C14 are evaluated against the input data record. This yields the following states:
[Table: statuses of the 15 conditions; nine conditions have the status TRUE.]
In a preferred embodiment of the disclosure, the statuses of the conditions, in this case C0 . . . C14, are mapped to a bit array having at least as many bits as there are unique conditions. As there are 15 conditions, the statuses can be mapped to a 15-bit array or to a 16-bit array (2 bytes). Mapping a TRUE status to 1 and a FALSE status to 0, the 16-bit array has a binary value of “001 110 010 110 011 1”. The status of the conditions is shown in
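The mapping of condition statuses to a bit array may be sketched as follows. The statuses are hypothetical, and the bit order (the first condition in the leftmost bit) is an assumption of this sketch. The mask test for an AND of several conditions is likewise only one possible use of such a bit array and is not stated in the disclosure.

```python
def pack(statuses):
    """Pack booleans into an int; statuses[0] becomes the leftmost bit."""
    bits = 0
    for status in statuses:
        bits = (bits << 1) | int(status)
    return bits

def status_of(bits, index, length):
    """Read the status of condition `index` from a `length`-bit array."""
    return ((bits >> (length - 1 - index)) & 1) == 1

def mask_for(indexes, length):
    """Build a mask with the bits of the given condition indexes set."""
    mask = 0
    for i in indexes:
        mask |= 1 << (length - 1 - i)
    return mask

statuses = [False, True, False, True, True]   # hypothetical C0 . . . C4
bits = pack(statuses)
as_text = format(bits, "05b")

# An AND over several conditions becomes a single mask comparison.
mask_c3_c4 = mask_for([3, 4], len(statuses))
c3_and_c4 = (bits & mask_c3_c4) == mask_c3_c4
mask_c0_c1 = mask_for([0, 1], len(statuses))
c0_and_c1 = (bits & mask_c0_c1) == mask_c0_c1
```

A compact integer of this kind keeps all statuses of one input data record in a machine word or two, which is one plausible reason for preferring it over a list of booleans.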
Next, the transformed rules being present in the above table are evaluated based on the Boolean operators operating on the conditions C0 . . . C14. This yields the following results:
Status | Action
---|---
TRUE | Action1
TRUE | Action4
TRUE | Action6
TRUE | Action7
Consequently, only the actions A1, A4, A6 and A7 are executed (see
A sixth application example shows another case where actions modify multiple data fields of the input data record. The following rules and actions are assumed:
In a first step, the matcher queries for rules R0 . . . R7 are split into an ordered list of unique, univariate conditions. This yields the following conditions C0 . . . C14:
The column “Rules” notes the rule(s) where a specific condition is applied.
Next, the conditions are grouped in order of the visited field. This yields:
As the last step in the preparation phase, the transformed rules are represented as conditions and logical operators, yielding:
The structure of data fields, transformed rules comprising conditions and logical operators, and actions is displayed in
In the matching phase of the disclosed method, the following input data records are assumed to be present:
Note that the field “process.technology” in the input data record comprises two values, namely “.NET” and “azure”.
As a first step in the matching phase following the preparation phase, the conditions are evaluated against the input data records, yielding:
[Table: statuses of the 15 conditions; five conditions have the status TRUE.]
As above, the statuses of the conditions can be represented by a binary value, here “010110100000100”, in a 15-bit array. Next in the matching phase, the rules are evaluated sequentially in ascending order, starting from the lowest transformed rule R0* in the column “Pre-selected rules ID” above. This yields:
The status of the conditions and transformed rules R0* and R1* are shown in
Note that the logical representation of R0* yields FALSE. Consequently, the next higher-ranking rule, R1*, is evaluated. As the evaluation of R1* yields TRUE, the action A1 associated with R1* is executed. This changes the input data record as follows:
The changed data field is printed in bold above.
As only one data field was changed, only the conditions depending on that field need to be re-evaluated. This yields:
Following this, the evaluation of rules is continued with R2*.
As the matching query for R2* is TRUE, the action A2 associated with it is executed. This changes the input data record as follows:
As two conditions depend on the field service.type, these two conditions, namely C9 and C11, need to be re-evaluated. This yields:
The evaluation of rules is continued. However, as the rule R3* is not listed as a pre-selected rule ID for any condition, R3* is skipped and rule R4* is evaluated next.
The evaluation of R4* yields:
As the matching query for R4* is TRUE, the action A4 associated with R4* is executed. This changes the input data record as follows:
As only condition C13 depends on the field backend.service.error, only C13 needs to be re-evaluated, yielding:
After this, the evaluation of rules is continued with rule R5*. As the condition of R5* is not fulfilled, the evaluation continues with R6*.
Executing action A6 associated with R6* yields the following input data records:
As only condition C14 depends on the field message, only this condition needs to be re-evaluated, yielding:
Next, the evaluation of rules is continued with rule R7*. As the condition of R7* is fulfilled (see table below), the action A7 is executed:
The action A7 changes the input data records as follows:
As no condition depends on the field error.id, no condition needs to be re-evaluated. As R7* is the last transformed rule in the list of predefined rules, the disclosed method stops.
The techniques described herein may be implemented by one or more computer programs executed by one or more processors. The computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium. The computer programs may also include stored data. Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
Some portions of the above description present the techniques described herein in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects of the described techniques include process steps and instructions described herein in the form of an algorithm. It should be noted that the described process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present disclosure is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein.
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.
This application claims the benefit of U.S. Provisional Application No. 63/528,998, filed on Jul. 26, 2023. The entire disclosure of the above application is incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
63528998 | Jul 2023 | US |