The present disclosure generally relates to identifying related event processing for a transaction, and in particular to identifying a transaction lineage from event processing.
Entities like financial institutions process many transactions on a daily basis. Each transaction involves multiple steps and processes. For example, transferring funds from a first person to a second person may involve verifying the identity of the first person, checking whether the first person has enough funds in his account for the transfer, checking whether the transfer has the characteristics of an inappropriate or reportable type of transfer (e.g., money laundering), and other steps. Typically, when storing data on such a transaction, a storage system will store general information for the transaction, such as X amount of funds were transferred from account A to account B on a certain date. However, there is typically no way to verify that the correct steps were followed in processing the transaction based on the storage of such general information.
In addition, modern transaction processing may occur across many different processing systems, each of which may have a separate part of the processing to perform. Each of these systems may have several processing steps that modify the transaction within a system, and different systems may represent the data in different data storage schema. As a result, the same transaction may generate many types of events at these different systems and be associated with different types of events and related data during this processing. When systems report completion of events related to processing the transaction, determining the transaction lineage and correlating processing of a particular transaction across systems may be challenging due to the changing nature of the data within and across systems.
An event lineage system processes data received for events to determine link signatures to associate received events with other events. Each event may represent a state or portion of processing for completing one or more transactions. The event may be described by event data that may describe a state or condition of the data after a processing event. The event data may thus describe a category of the event, processing codes, data values, and other event data. When link signatures match across events, the event lineage system may determine that the events are a part of the same transaction and thereby generate a transaction lineage for the events.
To generate the link signatures, the event lineage system maintains a set of lineage rules. The lineage rules describe parameters for converting the data elements for an event to link signatures of the event. Each lineage rule may include conditions that may be used to identify what type of events the rule should be applied to. These conditions may describe an event type, field values, data schema types, and other aspects of event data. When an event (or respective event data) is received (or identified) by the event lineage system, the event lineage system determines which lineage rules match the event data and meet the conditions for applying those lineage rules. For each matching lineage rule, the event lineage system applies the lineage rule to determine one or more link signatures for the event. The link signatures may be categorized as a child link signature or a parent link signature, designating whether the link signature is expected to match a following event or a preceding event, respectively.
To determine the event signature, the lineage rule specifies data elements of the event data (e.g., data values for particular fields) and an order for the data elements. To obtain a signature, the ordered data elements are hashed to generate a signature, for example by calculating the root of a Merkle tree having the data elements. Though the data schemas may differ across systems and have varying data elements, because the same underlying values can be identified in the different data schemas and ordered by the rules (which may differ in varying schemas and correspond to different field names), the resulting signature may still match. Using the link signatures, the event lineage system can match a series of events and determine a transaction lineage that represents the time-ordered sequence of events, even as events may be split to several systems and under differing data schemas.
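As a non-limiting sketch of how matched link signatures may be walked to recover a lineage, the following Python example assumes a hypothetical in-memory layout in which each event carries an identifier, a list of parent link signatures, and a list of child link signatures. The names `build_lineage`, `parent_sigs`, and `child_sigs` are illustrative assumptions and do not appear in the disclosure. The walk follows links breadth-first from a starting event, so the result reflects link order rather than recorded timestamps.

```python
from collections import deque


def build_lineage(events, start_id):
    """Follow child link signatures to events whose parent link signature matches.

    `events` is a list of dicts with hypothetical keys "id", "parent_sigs",
    and "child_sigs"; the list may be in any order.
    """
    by_id = {e["id"]: e for e in events}

    # Index events by the parent link signatures they carry.
    by_parent_sig = {}
    for e in events:
        for s in e["parent_sigs"]:
            by_parent_sig.setdefault(s, []).append(e["id"])

    order, seen, queue = [], {start_id}, deque([start_id])
    while queue:
        current = queue.popleft()
        order.append(current)
        for s in by_id[current]["child_sigs"]:
            for nxt in by_parent_sig.get(s, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return order


# Events deliberately listed out of processing order.
events = [
    {"id": "settle", "parent_sigs": ["s2"], "child_sigs": []},
    {"id": "request", "parent_sigs": [], "child_sigs": ["s1"]},
    {"id": "debit", "parent_sigs": ["s1"], "child_sigs": ["s2"]},
]
lineage = build_lineage(events, "request")
print(lineage)
```

Because the index is built from the stored signatures alone, the order in which the events were received does not affect the recovered lineage.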
In addition, the event signatures may be used to audit or evaluate successful transaction processing. The link signatures in some examples may represent “expected” prior and subsequent processing for a transaction. For example, when a parent link signature is generated by a lineage rule, this may indicate that the subject event is expected to come after a prior event, and should not be the initial event in a process. Likewise, when a child link signature is generated, this may indicate that the subject event is expected to have a subsequent event that will match the child link signature. When these link signatures are unmatched (e.g., a child link signature has no matching parent link signature or a parent link signature has no matching child link signature), it may thus indicate an error with successfully completing processing of that transaction, and may be used to identify or diagnose errors within the systems.
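One way such an audit might be sketched, assuming a hypothetical in-memory list of events that each carry their generated parent and child link signatures, is shown below in Python. The function name `find_unmatched` and the dictionary keys are illustrative assumptions only.

```python
from collections import defaultdict


def find_unmatched(events):
    """Return (unmatched child signatures, unmatched parent signatures).

    Each result maps a signature to the identifiers of the events that
    generated it. The event layout here is a hypothetical assumption.
    """
    child_sigs = defaultdict(list)
    parent_sigs = defaultdict(list)
    for e in events:
        for s in e["child_sigs"]:
            child_sigs[s].append(e["id"])
        for s in e["parent_sigs"]:
            parent_sigs[s].append(e["id"])

    # A child signature with no matching parent signature suggests a missing
    # subsequent event; the reverse suggests a missing prior event.
    missing_next = {s: ids for s, ids in child_sigs.items() if s not in parent_sigs}
    missing_prior = {s: ids for s, ids in parent_sigs.items() if s not in child_sigs}
    return missing_next, missing_prior


events = [
    {"id": "e1", "parent_sigs": [], "child_sigs": ["sig-a"]},
    {"id": "e2", "parent_sigs": ["sig-a"], "child_sigs": ["sig-b"]},
    {"id": "e3", "parent_sigs": ["sig-c"], "child_sigs": []},
]
missing_next, missing_prior = find_unmatched(events)
print(missing_next, missing_prior)
```

In this sketch, event "e2" expects a subsequent event that never arrived, and event "e3" expects a prior event that was never recorded.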
In addition, the lineage rules allow events to be received and processed by the event lineage system in parallel and without requiring the event lineage system to receive events in a particular order. To generate the link signatures for an event, typically the lineage rules use the data of the event itself, rather than some known relationship between this particular event and another event. As a result, the events can be processed in parallel without maintaining a known list of pending transactions and attempting to link events to a pending transaction as events are received. This is of additional benefit because the transaction lineage may be useful or required only infrequently, such as when demonstrating compliance for an audit or identifying the source of an error. Accordingly, storing the events and related link signatures may permit later determination of a transaction lineage when needed, rather than doing so at the time events are received.
The figures depict, and the detailed description describes, various non-limiting embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
The figures use like reference numerals to identify like elements. A letter after a reference numeral, such as “102A,” indicates the text refers specifically to the element having that particular reference numeral. A reference numeral in the text without a following letter, such as “102,” refers to any or all of the elements in the figures bearing that reference numeral (e.g., “102” in the text refers to reference numerals “102A,” “102B,” “102C” and/or “102D” in the figures).
As the transaction is processed, records of the processing may be generated, and represented here as events. These records may capture the state of the processing at a particular point, such as upon entry of a request to a processing system 110, and at intermediate processing steps at a processing system 110. For example, a processing system may generate each event related to the transaction and capture the state of the transaction when the event occurred. Continuing with the example of transferring funds from the first account to the second account, an event with related event data may be generated (e.g., as a record) for each of the events mentioned above that occurred for the transaction.
The event data for each particular event may be stored in varying schemas, according to the transaction, the particular event, configurations of the processing system 110 performing the event, and so forth. For example, event 100 at processing system 110A is stored as “Schema A” while other events 101-103 are associated with different schemas B, C, and D respectively. Each schema is a defined organization or structure of relevant event data. A schema may define a set of various data labels, associated data types, and permissible values of the data. For example, a schema may include a label “Transaction Id” of a data type “String” with permissible values of any string of characters up to a maximum length. As another example, a schema may define a data label of “processing code” as an integer with permissible values in the range of 1-8. These schemas may differ across different processing systems 110 and across different events. For example, within processing system 110B, event 101 has event data stored in Schema B. The same Schema B is used for event 104. However, at processing system 110C and 110D, the transactions between event 102 and 105, as well as between 103 and 106, change schemas within the respective processing systems 110. Typically, these changing schemas may or may not have equivalent or identical fields or data labels between different schemas.
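As a minimal illustration of a schema as a set of data labels, data types, and permissible values, the following Python sketch encodes the two example labels given above. The dictionary layout, the `conforms` helper, and the maximum string length of 64 are assumptions made for illustration; the disclosure does not specify them.

```python
# Hypothetical schema: each label maps to a data type and a permissible-value check.
MAX_ID_LENGTH = 64  # assumed maximum length; not specified in the description

SCHEMA_EXAMPLE = {
    "Transaction Id": {"type": str, "check": lambda v: len(v) <= MAX_ID_LENGTH},
    "processing code": {"type": int, "check": lambda v: 1 <= v <= 8},
}


def conforms(schema, event_data):
    """Return True when every label in the schema is present with a permitted value."""
    for label, spec in schema.items():
        if label not in event_data:
            return False
        value = event_data[label]
        if not isinstance(value, spec["type"]) or not spec["check"](value):
            return False
    return True


print(conforms(SCHEMA_EXAMPLE, {"Transaction Id": "TX-1001", "processing code": 3}))
print(conforms(SCHEMA_EXAMPLE, {"Transaction Id": "TX-1001", "processing code": 9}))
```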
The flow of a transaction may also split or combine across different processing systems 110. As shown in
Though shown here as relating to a single transaction, the schemas and processing systems 110 may not readily provide information or a unifying identifier to identify that the data relates to the same transaction. As discussed further below, “lineage rules” may be used to describe the relationships between the different events and event data. By applying the lineage rules to the event data, the event data itself may be used to describe “link signatures” from the event data to provide a means for identifying links between executed events when executing a specific transaction. This process is discussed in further detail below.
The processing systems 110A-C are computer systems that process at least part of a transaction. As discussed with respect to
Processing a transaction involves multiple steps and the execution of multiple processes. For example, transferring funds from a first account to a second account may involve validating that the first and second accounts exist, verifying that the user that initiated the transfer is authorized to make the request, determining whether the first account has sufficient funds for the transfer, determining whether the amount of the transfer exceeds an established limit, determining whether the transfer has the characteristics of a money laundering type of transfer, etc. As the processing systems 110 complete events in the processing, the processing systems 110 report the related event data to the event lineage system 202.
In some embodiments, the processing systems 110 include a data storage system that stores data for each event of a transaction performed by that processing system. In addition, or as an alternative, the event data may be transmitted to and stored by the event lineage system 202. The storage of the event data may be performed by storing the events as one or more progressions related to each transaction. A progression comprises multiple records (e.g., the event data) that are chronologically and cryptographically linked. Each record of a progression represents an event related to the transaction of the progression. In the embodiment where multiple processing systems 110 collaborate to process transactions, each processing system 110 may store a subset of records of a progression or progressions that are linked to other progressions stored by another processing system 110.
The event lineage system 202 receives event data related to various events as transactions are processed. As discussed more fully below, the event lineage system 202 receives events and uses the event data and lineage rules to identify relationships between events and identify lineages between occurring events.
The network 206 represents the communication pathways between the processing system(s) 204, the event lineage system 202, and any other systems (not shown) communicating over the network 206. In one embodiment, the network 206 is the Internet and uses standard communications technologies and/or protocols. The network 206 can also utilize dedicated, custom, or private communications links that are not part of the public Internet. The network 206 may comprise any combination of local area and/or wide area networks, using both wired and wireless communication systems. In one embodiment, information exchanged via the network 206 is cryptographically encrypted and decrypted using cryptographic keys of the senders and the intended recipients.
The event management module 300 receives event data to evaluate event lineages and may store event data. When an event for a transaction occurs, the event management module 300 receives data from the processing system 110 for the event. An event may be, for example, a process executed as part of the transaction, a function applied to the transaction data or any other step of the transaction. The data identified by the event management module 300 may include the data processed, data input into a function, an identifier of the process/function applied, and the results of the process/function.
In some embodiments, the event management module 300 may store event data in an event data store 308. The event data may be stored by various means, and in one embodiment is stored as a set of progressions. These progressions may be cryptographically linked and immutable such that the event data may be subsequently verified after storage. In these circumstances, the event lineage system 202 may also operate to verify records and operate as a trusted record or ledger for the events.
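The description does not specify how the records of a progression are cryptographically linked. One common approach, offered here only as an assumed sketch, is a hash chain in which each record stores the hash of the preceding record, so that altering any stored record invalidates every later hash. The helper names `record_hash`, `append_record`, and `verify` are hypothetical.

```python
import hashlib
import json


def record_hash(event_data, prev_hash):
    """Hash the event data together with the previous record's hash."""
    payload = json.dumps(event_data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_record(progression, event_data):
    """Append a record linked to the current tail of the progression."""
    prev_hash = progression[-1]["hash"] if progression else ""
    progression.append(
        {"data": event_data, "prev": prev_hash, "hash": record_hash(event_data, prev_hash)}
    )


def verify(progression):
    """Recompute every link; return False if any record was altered."""
    prev_hash = ""
    for rec in progression:
        if rec["prev"] != prev_hash or rec["hash"] != record_hash(rec["data"], prev_hash):
            return False
        prev_hash = rec["hash"]
    return True


progression = []
append_record(progression, {"step": "validate accounts"})
append_record(progression, {"step": "check funds"})
print(verify(progression))
```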
The event management module 300 uses the lineage rules 302 to generate one or more link signatures reflecting expected prior and future events associated with the received event data. The lineage rules 302 define how to generate one or more link signatures from the event data. The lineage rules may be stored in a structured markup language, a scripting language, or another form. For example, in various embodiments the lineage rules may be stored as YAML or JSON.
The lineage rules 302 may define a set of conditions for defining which event data the lineage rules apply to. When a received event matches these conditions, the lineage rule is applied to generate the link signatures designated by the lineage rule. The conditions for applying a lineage rule may identify a data schema or processing system from which the event data was received. The conditions may also include an event type, or a data field value for a particular data item in the event data. These conditions may be particular to the schema designated by the lineage rule. For example, the lineage rule may specify that it relates to SchemaA when the value for field “ActivityName” in SchemaA has a value of “BOOK.”
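The condition check described above may be sketched as follows in Python, assuming a hypothetical rule layout in which the conditions name a schema and a set of required field values. The keys `conditions`, `schema`, and `fields`, and the function `rule_applies`, are illustrative assumptions rather than a format defined by the disclosure.

```python
def rule_applies(rule, event):
    """Return True when the event satisfies every condition of the lineage rule."""
    cond = rule["conditions"]
    # Schema condition: skip the rule when the event uses a different schema.
    if cond.get("schema") is not None and cond["schema"] != event["schema"]:
        return False
    # Field conditions: every listed field must hold the required value.
    return all(event["data"].get(f) == v for f, v in cond.get("fields", {}).items())


# Hypothetical rule mirroring the SchemaA / ActivityName = BOOK example.
book_rule = {
    "name": "book-rule",
    "conditions": {"schema": "SchemaA", "fields": {"ActivityName": "BOOK"}},
}

book_event = {"schema": "SchemaA", "data": {"ActivityName": "BOOK", "Amount": 100}}
cancel_event = {"schema": "SchemaA", "data": {"ActivityName": "CANCEL"}}

print(rule_applies(book_rule, book_event))
print(rule_applies(book_rule, cancel_event))
```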
To generate the link signature, the data identified by the lineage rule is hashed by a hashing function to determine a unique signature for the information to be linked. In one example, a hash function is applied to each data element to create hash values for each data element. In one embodiment, the hash function applied is an SHA-256 (Secure Hash Algorithm-256) function. These hash values may be organized as a tree in which hash values for data elements are combined. The order of data elements defined by the lineage rule is used to determine the order of data items being hashed and combined. In this example, the hashes may be combined to generate a root of a Merkle tree. This Merkle tree root may be used as the link signature for the event. In another example, the link signature may be generated by other hashing means, for example by concatenating data values in the defined order and determining a hash value of the concatenated data values.
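Both hashing approaches may be sketched in Python using the standard `hashlib` module. The description does not specify how an odd number of nodes is handled when building the tree; duplicating the last node, as done below, is one common convention and is an assumption here, as are the function names and the separator used for concatenation.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(elements):
    """Hash each ordered data element, then combine pairwise up to a single root."""
    if not elements:
        raise ValueError("at least one data element is required")
    level = [sha256(str(e).encode("utf-8")) for e in elements]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: duplicate the last node
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


def concat_signature(elements, sep="\x1f"):
    """Alternative: hash the data values concatenated in the defined order."""
    return sha256(sep.join(str(e) for e in elements).encode("utf-8")).hex()


ordered_values = ["SystemA", "DBT", "REQ-42"]
print(merkle_root(ordered_values))
print(concat_signature(ordered_values))
```

The separator in `concat_signature` guards against two different value lists concatenating to the same string (for example, "AB" with "C" versus "A" with "BC"). In either approach, changing the order of the data elements changes the resulting signature, which is why the lineage rule fixes the order.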
As shown in
The link signatures may be associated with an event node 404 for the event. An event node may represent the event when stored in association with the link signature, for example in a graph or other structure or data storage scheme. Together, the generated link signatures and event are termed an event lineage, representing the characteristic signatures of the received event. After generating the link signatures and event lineage, the link signatures may be stored in a lineage signature data store 304 shown in
In this example, the Parent Link Signature for lineage rule 450A is not shown for convenience. The child link signature for lineage rule 450A designates the values and ordering to be used in generating the link signature for a child link. In this example, the order specifies the SystemID, ActionCode, and RequestorID fields are used, in that order, for generating the link signature. These values may be selected from the data values of the Schema. In this example, the SystemID is “SystemA” and the “ActionCode” is DBT (which is known because the condition required the ActionCode field to equal DBT).
Lineage rule 450B shows a corresponding lineage rule for an event expected to be a child of lineage rule 450A. Here, the parent link signature describes the data values for generating a corresponding link signature to the link signature generated by lineage rule 450A. However, since the lineage rule 450B relates to a different event and different Schema (Schema G), different data fields and values may be available. For example, Schema G may have no data fields corresponding to fields of Schema F, such as the “SystemID” or “ActionCode” values. Although that data thus may not be in the Schema of the matching data event for lineage rule 450B, these values may be designated in the lineage rule itself. In this case, the first two data values for the parent link signature are defined as strings, having values “SystemA” and “DBT.” These correspond to the expected values that would be used when lineage rule 450A uses its SystemID and ActionCode values from Schema F. By including these values in lineage rule 450B, this rule may be used to connect related events, even when the related schema does not directly have that data in its data fields. In addition, the parent link of lineage rule 450B includes the value of field SourceRequestID, which in Schema F corresponds to the RequestorID field. As a result, the child link signature of lineage rule 450A that uses values of [SystemID, ActionCode, RequestorID] (three values) may match the signature generated from lineage rule 450B that uses values of [“SystemA”, “DBT”, SourceRequestID] (three values).
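The matching of lineage rules 450A and 450B may be illustrated with the following Python sketch. The rule encoding (a list of items that are either a field reference or a literal string), the sample values such as “REQ-42,” and the use of a hash over concatenated values rather than a Merkle root are assumptions made for brevity.

```python
import hashlib


def link_signature(spec, event_data):
    """Resolve each item of the ordered spec to a value, then hash the values."""
    values = [
        item["literal"] if "literal" in item else event_data[item["field"]]
        for item in spec
    ]
    joined = "\x1f".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Child link of rule 450A: all three values come from Schema F fields.
child_spec_450a = [{"field": "SystemID"}, {"field": "ActionCode"}, {"field": "RequestorID"}]
# Parent link of rule 450B: two literals plus one Schema G field.
parent_spec_450b = [{"literal": "SystemA"}, {"literal": "DBT"}, {"field": "SourceRequestID"}]

# Hypothetical event data under each schema.
event_f = {"SystemID": "SystemA", "ActionCode": "DBT", "RequestorID": "REQ-42"}
event_g = {"SourceRequestID": "REQ-42", "SettledAmount": "100.00"}

child_sig = link_signature(child_spec_450a, event_f)
parent_sig = link_signature(parent_spec_450b, event_g)
print(child_sig == parent_sig)
```

Although Schema G has no SystemID or ActionCode field, the two signatures agree because both rules resolve to the same ordered values.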
By appropriately designating fields of various granularity, the lineage rules can account for different types of processes and transactions. For example, a link to represent an aggregation of all “DBT” actions from a particular system may need to represent a large number of parent events for the event aggregating these actions. This may be considered a “fan-in” relationship between these events, where one later event relies on many prior events. To do so in a lineage rule, the lineage rule may specify the type of events being aggregated, rather than refer to specific transaction identifiers. For example, the link signature may be defined as using only the “SystemID” or “ActionCode” fields in the rule for the event being aggregated. Likewise, the lineage rule for the aggregating event may create a link signature for defined values (e.g., specified strings) for the relevant system and ActionCode. In this way, link signatures can be used to define such “fan-in” or “fan-out” relationships across events.
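A fan-in relationship of this kind may be sketched in Python as follows. The event layout, the sample events, and the hash over concatenated values are illustrative assumptions. Each aggregated event derives its child link signature from only its system and action code, so every “DBT” event from “SystemA” yields the same signature as the parent link signature of the aggregating event.

```python
import hashlib


def sig(values):
    """Hash an ordered list of values (concatenation variant, for brevity)."""
    joined = "\x1f".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical events; only the coarse fields feed the child link signature.
events = [
    {"id": "d1", "SystemID": "SystemA", "ActionCode": "DBT", "Amount": 10},
    {"id": "d2", "SystemID": "SystemA", "ActionCode": "DBT", "Amount": 25},
    {"id": "c1", "SystemID": "SystemA", "ActionCode": "CRD", "Amount": 5},
    {"id": "d3", "SystemID": "SystemA", "ActionCode": "DBT", "Amount": 40},
]
child_sigs = {e["id"]: sig([e["SystemID"], e["ActionCode"]]) for e in events}

# The aggregating event's rule uses literal strings for the same two values.
aggregate_parent_sig = sig(["SystemA", "DBT"])

parents_of_aggregate = [eid for eid, s in child_sigs.items() if s == aggregate_parent_sig]
print(parents_of_aggregate)
```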
The lineage rules may also specify additional operations for generating link signatures. For example, the lineage rules may also specify an ordering of data field values within a data field type for the schema. For example, a schema may permit the listing of any number of data elements, such as transaction times, or a list of strings. To ensure that these values are consistent across links, the lineage rule may specify that these values are “ordered by” a data field value or parameter. For example, strings (whichever strings are present in the event data for that field) may be “ordered by” an alphabetical ordering, or transaction times may be ordered chronologically. In addition, the lineage rule may designate that for each separate value of a field present in the event data, a link signature is to be generated for each value. Thus, if the data specifies three strings, a link signature may be generated for each of the three strings, using each string respectively in the generation of the link signature.
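These two operations may be sketched in Python as shown below, under the assumption that a rule supplies a fixed prefix of values followed by the values of a multi-valued field. The function names `ordered_signature` and `per_value_signatures` are hypothetical, and a hash over concatenated values is used for brevity.

```python
import hashlib


def sig(values):
    """Hash an ordered list of values (concatenation variant, for brevity)."""
    joined = "\x1f".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def ordered_signature(prefix, values):
    """'Ordered by': sort the multi-valued field so the signature is stable."""
    return sig(list(prefix) + sorted(values))


def per_value_signatures(prefix, values):
    """'For each': generate one link signature per value present in the field."""
    return [sig(list(prefix) + [v]) for v in values]


# The same three strings, listed in different orders by two systems.
sig_one = ordered_signature(["SystemA"], ["beta", "alpha", "gamma"])
sig_two = ordered_signature(["SystemA"], ["gamma", "beta", "alpha"])
print(sig_one == sig_two)

per_value = per_value_signatures(["SystemA"], ["alpha", "beta", "gamma"])
print(len(per_value))
```

Sorting strings gives an alphabetical order for same-case text; chronological ordering of transaction times would follow the same pattern with values that sort by time.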
In these examples, the generation of a link signature may represent that there is an “expected” subsequent event that, when the transaction is complete, should generate a matching link signature. Accordingly, the link signatures may also be conditionally generated based on whether a further event is expected. The conditional generation may be performed by designating that an event is terminal when a condition is evaluated. In that case, child link signatures (or another type) may not be generated, or, link signatures may be flagged to not expect or require a match for that link signature to consider the transaction as succeeding.
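Conditional generation may be sketched in Python as follows. The `terminal_when` parameter, the “Status” field, and its values are hypothetical, chosen only to illustrate an event that is designated terminal when a condition evaluates as true.

```python
import hashlib


def sig(values):
    """Hash an ordered list of values (concatenation variant, for brevity)."""
    joined = "\x1f".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def generate_child_links(event_data, child_fields, terminal_when):
    """Return child link signatures, or none when the event is terminal.

    `terminal_when` maps field names to values; the event is treated as
    terminal only when the mapping is non-empty and every entry matches.
    """
    is_terminal = bool(terminal_when) and all(
        event_data.get(f) == v for f, v in terminal_when.items()
    )
    if is_terminal:
        return []  # no subsequent event is expected
    return [sig([event_data[f] for f in child_fields])]


rejected = {"RequestID": "REQ-42", "Status": "REJECTED"}
accepted = {"RequestID": "REQ-42", "Status": "ACCEPTED"}

print(generate_child_links(rejected, ["RequestID"], {"Status": "REJECTED"}))
print(len(generate_child_links(accepted, ["RequestID"], {"Status": "REJECTED"})))
```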
These transaction lineages may also be used to audit and verify that transactions executed correctly. Returning to
The storage device 708 is a non-transitory computer-readable storage medium, such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 706 holds instructions and data used by the processor 702. The pointing device 714 may be a mouse, track ball, or other type of pointing device, and is used in combination with the keyboard 710 to input data into the computer system 700. The graphics adapter 712 displays images and other information on the display 718. The network adapter 716 couples the computer system 700 to the network 206. Some embodiments of the computer system 700 have different and/or other components than those shown in
The computer 700 is adapted to execute computer program modules for providing the functionality described herein. As used herein, the term “module” refers to computer program instructions and other logic for providing a specified functionality. A module can be implemented in hardware, firmware, and/or software. A module is typically stored on the storage device 708, loaded into the memory 706, and executed by the processor 702.
A module can include one or more processes, and/or be provided by only part of a process. Embodiments of the entities described herein can include other and/or different modules than the ones described here. In addition, the functionality attributed to the modules can be performed by other or different modules in other embodiments. Moreover, this description occasionally omits the term “module” for purposes of clarity and convenience.
The types of computer systems 700 used by the systems of
Some portions of this description describe the embodiments in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, described modules may be embodied in software, firmware, hardware, or any combinations thereof.
Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” or “a preferred embodiment” in various places in the specification are not necessarily referring to the same embodiment.
Some portions of the above are presented in terms of methods and symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art. A method is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times, to refer to certain arrangements of steps requiring physical manipulations of physical quantities as modules or code devices, without loss of generality.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “displaying” or “determining” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects disclosed herein include process steps and instructions described herein in the form of a method. It should be noted that the process steps and instructions described herein can be embodied in software, firmware or hardware, and when embodied in software, can be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
The embodiments discussed above also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The methods and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings described herein, and any references below to specific languages are provided for disclosure of enablement and best mode.
While the disclosure has been particularly shown and described with reference to a preferred embodiment and several alternate embodiments, it will be understood by persons skilled in the relevant art that various changes in form and details can be made therein without departing from the spirit and scope of the invention.
Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the invention.
This disclosure claims the priority benefit of U.S. provisional application No. 62/713,542, the contents of which are incorporated by reference in their entirety.
Number | Date | Country
---|---|---
62713542 | Aug 2018 | US