Embodiments of the present invention relate generally to computer-implemented systems for information management, and more particularly to a method and system for maintaining historical data for data receivers.
Banks and other institutions may use computerized systems to manage information. A computerized system may include one or more software components, associated interfaces, databases and supporting hardware. Components may comprise a plurality of applications. Various ones of the components may require information from, or supply information to, others of the components, where the components may belong either to a common computerized system or distinct computerized systems.
An example occurs in banks. One software component of a computerized system of a bank might be used for managing loans made by the bank—e.g., taking loan applications, administering payments and the like. Another software component of a computerized system of the bank might relate to collateral management. The collateral management component would typically need information from the loan management component, and vice versa, for example to determine whether a loan applicant had provided enough security for a given loan to be approved. Further, both the loan management component and the collateral management component might supply information, from time to time, to an analyzer component that used the information to perform calculations, generate statistics and perform regulatory reporting.
Sharing of information between computerized systems as described in the foregoing need not be confined to a single institution such as a bank. Various independent entities (e.g., businesses, government agencies) need to be able to obtain the information of other independent entities, and to provide information to other independent entities. There could be a one-way flow of information between entities, or a two-way flow. That is, an entity might supply information to another, but not receive information from the other; or might receive information from another but not supply it to the other. On the other hand, two entities might mutually exchange information.
In the various computerized systems described in the preceding, in many cases the supplying of information and the obtaining of information need to be possible on demand in order for the businesses and other institutions served by the systems to operate properly. For example, in a bank, a loan could be initially approved, but then the requested amount of the loan might increase. In this event, the loan management component would need to determine whether the value of the collateral securing the loan was enough to secure the increased loan amount. Therefore, the loan management component would need to obtain the value of the collateral from the collateral management component. Similarly, the collateral management component would need to be informed about the increased loan amount.
Moreover, different entities need different information at different times. For example, a bank analyzer component might need historical information about loans stretching back weeks or months in order to perform its functions. On the other hand, a collateral management component might never need anything more than the most current “snapshot” of a loan status.
Computerized systems are known for handling information flow as described above. Such systems may abstract suppliers of information as “senders,” and requesters or receivers of information as “receivers.” In view of the differing needs of receivers, one challenge for the systems associated with on-demand service is ensuring that a given receiver gets the kind of information it needs, when it needs it. The challenge is presented largely by the fact that information is continually changing and being added to, while computer processing and data storage capabilities are finite.
By way of illustration, consider again the example of a bank. Information about a given account might change a number of times over the course of a day, week or month. For a first receiver of information about the account, only current information might be needed. On the other hand, a second receiver of information about the account might need to know the information for one or more points in the past. Still a third receiver might need to know information about the account for one or more points in the past different from the points that the second receiver is interested in.
There might be many more such changing accounts in the bank, and many more receivers with varying needs for information about the accounts. One straightforward approach to meeting the needs of all the receivers might be to simply maintain independent copies of the information as needed for each receiver. However, this approach is clearly unworkable because of the huge demands it would place on data storage and processing capacity.
Accordingly, it is known to supply and receive only changes in the information. That is, assuming an initial or base information set, only changes to the base information set, such as modifications, additions or deletions, are supplied to interested receivers. In the case of a bank loan, for example, the base information set might include such data as an account number, a borrower's name and address, and initial conditions, such as an initial interest rate. Then, the base information set might be changed, for example, by a modification of the initial interest rate, the addition or deletion of participants in the loan, the occurrence of an early termination, or the like. Interested receivers, assuming they already have all or some of the base information set, can keep up-to-date in accordance with their respective needs by being informed only of the changes.
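By way of illustration only, the change-only approach may be sketched as follows. The sketch assumes a flat record for the base information set; the names `Change` and `apply_changes`, the field names and the sample values are hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Change:
    """One change to a base information set (hypothetical structure)."""
    kind: str          # "modify", "add" or "delete"
    field: str
    value: Any = None


def apply_changes(base: Dict[str, Any], changes: List[Change]) -> Dict[str, Any]:
    """Return a copy of the base set brought up to date by the given changes."""
    result = dict(base)  # the receiver's existing copy is left untouched
    for change in changes:
        if change.kind in ("modify", "add"):
            result[change.field] = change.value
        elif change.kind == "delete":
            result.pop(change.field, None)
        else:
            raise ValueError(f"unknown change kind: {change.kind}")
    return result


# Illustrative values only.
base_loan = {"account": "12345", "borrower": "A. Borrower", "interest_rate": 5.0}
changes = [
    Change("modify", "interest_rate", 4.5),
    Change("add", "co_borrower", "B. Partner"),
]
receiver_copy = apply_changes(base_loan, changes)
```

A receiver holding `base_loan` thus needs only the two `Change` records, rather than a fresh copy of the whole record, to arrive at the current state.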
Existing techniques for propagating such changes include a “push” technique and a “pull” technique. In a push technique, when a change occurs in a sender's information, the sender sends the change information, without being requested to, to all known interested receivers at substantially the same time as the change occurs, or at some later, previously arranged or convenient time. In a pull technique, a receiver requests information when it wants it, and a sender returns information in response to the request.
However, there are disadvantages associated with known techniques. One disadvantage is that a receiver that uses a pull technique (a “pull receiver”) cannot obtain reliable historical data. This is because a pull receiver may not know when information that it is interested in has changed, and consequently may not request the change information. Thus, if the information changes again before the pull receiver requests the earlier change information, that earlier change information may be lost for the pull receiver. While it would be possible for a sender to push change information to all interested receivers whenever a change occurs, this would not be an acceptable arrangement for most pull receivers, since most pull receivers are only able to process information at times of their choosing. Further, the arrangement would in general place excessive demands on and lower the performance of the associated computer systems. Another alternative would be for the sender to save copies of all pre-change data in the event a pull receiver later wants historical data. However, this alternative also has disadvantages, since, along lines discussed earlier, it would be costly in terms of data processing and storage capacity, and there may be cases when there is no actual need for the saved data.
Embodiments of the present invention address disadvantages in the prior art as discussed above. The embodiments relate to preserving historical data only for those receivers that require it. Thus, historical data is kept only when necessary and the excessive demands associated with alternative implementations are avoided. Moreover, the needs of pull receivers requiring historical data are efficiently met.
The embodiments relate more specifically to creating a new change pointer to indicate a change to be made to data on a database. For a receiver of the data, it may be determined whether the receiver requires historical data, and if so, an image of the data may be created before changing it. This pre-change image may be stored on an image database, and then the data may be changed on a current database. The new change pointer may be related to the changed data on the current database, and the image may be related to a previous change pointer.
A plurality of images may be created in this way over time. When a receiver, such as a pull receiver, that needs historical data requests the historical data, the images may be retrieved for the receiver and the associated pointers may be correspondingly updated to indicate that the receiver has been provided with the historical data (the images). The images may also be provided to a receiver by a push mechanism.
Before changing the data, the application 101 may call a change notification API (Application Program Interface) 103, and pass it parameters relating to the change. For example, the parameters could include an object type, object key, and other information. In response to the call by the application 101 and based on the parameters, the change notification API 103 may perform operations represented by block 104, labeled “processing,” to add a new change pointer corresponding to the change to a change pointer database 106 embodied on a machine-readable medium. The new change pointer may include such information as what object is being changed, the time that it is being changed, and a reason or reasons for the change. The new change pointer may further include such information as a unique change pointer identifier, a change origination, and a time stamp.
As part of the processing 104, the change notification API 103 may consult a configuration database 105 embodied on a machine-readable medium to determine whether there are any receivers that need historical data for the object whose associated data is being changed. Assume in this example that there is at least one such receiver. The processing 104 may then further include checking for whether any other change pointers have been created earlier for the same object that have not yet been processed for the at least one receiver. If there were an unprocessed change pointer, this would be an indication that the receiver had not yet received the associated change information. In the present example, assume that no earlier change pointer exists for the object whose associated data is being changed. The application 101 may then proceed to change data associated with the object on the application database 102.
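By way of illustration only, a change pointer and the check for unprocessed pointers may be sketched as follows. The class and attribute names are hypothetical, and the in-memory list merely stands in for the change pointer database 106. Recording the receivers for which a pointer has been processed as a set on the pointer itself is an assumption made for brevity.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set


@dataclass
class ChangePointer:
    """Hypothetical record for one change to one object."""
    pointer_id: int                 # unique change pointer identifier
    object_type: str
    object_key: str
    reason: str                     # reason for the change
    origin: str                     # change origination
    timestamp: datetime
    processed_by: Set[str] = field(default_factory=set)


class ChangePointerDatabase:
    """In-memory stand-in for the change pointer database (assumption)."""

    def __init__(self) -> None:
        self._pointers: List[ChangePointer] = []
        self._ids = itertools.count(1)

    def add(self, object_type: str, object_key: str,
            reason: str, origin: str) -> ChangePointer:
        """Create and store a new change pointer for the object."""
        pointer = ChangePointer(next(self._ids), object_type, object_key,
                                reason, origin, datetime.now(timezone.utc))
        self._pointers.append(pointer)
        return pointer

    def unprocessed(self, object_type: str, object_key: str,
                    receiver: str) -> List[ChangePointer]:
        """Earlier pointers for the object not yet processed for the receiver."""
        return [p for p in self._pointers
                if p.object_type == object_type
                and p.object_key == object_key
                and receiver not in p.processed_by]
```

An empty result from `unprocessed` corresponds to the case assumed in the present example, in which no earlier change pointer exists for the object.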
As discussed earlier in connection with
According to embodiments of the present invention, based at least in part on the condition that there is an unprocessed change pointer for the interested receiver, an image 201 may be made. An image is a copy of the current data before it is changed.
To create the image 201, the processing 104 may call back to the application 101 via another API 202. In response to the call back, the application 101 may read the pre-change data from the application database 102 and pass it as an image 201 via the API 202 to the processing 104. The processing 104 may store the image 201 on an image database 203 embodied in a machine-readable medium. The processing 104 may further relate the image 201 to the previously-created pointer (the pointer created as described with reference to
A plurality of images may be created in the foregoing way over time, for any receiver that needs historical data for data being changed. More specifically, there may be only one image created for all receivers that need the image, but a number of new images may be created as additional changes occur. It may be understood in view of the above that historical data is preserved for receivers that need it in an efficient manner, since pre-change data is only copied when necessary (i.e., only when an interested receiver that needs historical data has not yet received it), thereby conserving data processing and storage capacity. Moreover, the needs of pull receivers are efficiently met.
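By way of illustration only, the image-on-demand behavior described above may be sketched as follows. The class `ChangeNotifier`, its attributes and the `read_current` callback (standing in for the call back to the application) are hypothetical, and plain dictionaries stand in for the change pointer and image databases.

```python
import copy
from typing import Any, Callable, Dict, List, Set


class ChangeNotifier:
    """Hypothetical sketch of change notification with conditional images."""

    def __init__(self, history_receivers: Dict[str, Set[str]],
                 read_current: Callable[[str], Any]) -> None:
        # object key -> receivers that require historical data
        self.history_receivers = history_receivers
        # call back used to read pre-change data from the application
        self.read_current = read_current
        self.pointers: List[Dict[str, Any]] = []   # stand-in for pointer database
        self.images: Dict[int, Any] = {}           # pointer id -> pre-change image

    def notify(self, key: str) -> int:
        """Must be called before the application changes the data for `key`."""
        receivers = self.history_receivers.get(key, set())
        earlier = [p for p in self.pointers if p["key"] == key]
        if earlier:
            previous = earlier[-1]
            # Copy the data only if some receiver needing history has not
            # yet processed the previous pointer. One image serves them all.
            if receivers - previous["processed_by"]:
                self.images[previous["id"]] = copy.deepcopy(self.read_current(key))
        pointer = {"id": len(self.pointers) + 1, "key": key, "processed_by": set()}
        self.pointers.append(pointer)
        return pointer["id"]


# Illustrative usage with made-up data.
app_db = {"loan-1": {"rate": 5.0}}
notifier = ChangeNotifier({"loan-1": {"analyzer"}}, lambda key: app_db[key])

notifier.notify("loan-1")          # first change: no earlier pointer, no image
app_db["loan-1"]["rate"] = 4.5
notifier.notify("loan-1")          # second change: image of the 4.5 state is kept
app_db["loan-1"]["rate"] = 4.0
```

After the second change, the image is related to the first pointer, while the second pointer relates to the current data. Had the receiver already processed the first pointer, no copy would have been made.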
The information as to what receivers need historical data (images) may be kept, as noted earlier, in a configuration database 105. The configuration database 105, more specifically, could be “customizing” data that is specific to particular user(s) and/or application(s). The customizing data could include a definition, per “export object” (object subject to being pushed or pulled), of: whether images of the object are available at all (some objects may not support images); which receivers are interested in the object (subscription); whether a subscribing receiver requires historical data for the object; and other parameters.
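By way of illustration only, customizing data of this kind may be sketched as follows. The class names, field names and sample entries are hypothetical; an actual configuration database could hold further parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Subscription:
    """Hypothetical per-receiver settings for one export object."""
    needs_history: bool = False


@dataclass
class ExportObjectConfig:
    """Hypothetical customizing entry for one export object."""
    images_supported: bool
    subscriptions: Dict[str, Subscription] = field(default_factory=dict)

    def receivers_needing_history(self) -> Set[str]:
        """Subscribed receivers for which images must be kept."""
        if not self.images_supported:
            return set()  # some objects may not support images at all
        return {name for name, sub in self.subscriptions.items()
                if sub.needs_history}


# Illustrative entries only.
config = {
    "loan": ExportObjectConfig(
        images_supported=True,
        subscriptions={
            "analyzer": Subscription(needs_history=True),
            "collateral": Subscription(needs_history=False),
        },
    ),
    "rate_table": ExportObjectConfig(
        images_supported=False,
        subscriptions={"analyzer": Subscription(needs_history=True)},
    ),
}
```

In this sketch, an object that does not support images yields no receivers needing history, regardless of what its subscribers request.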
When a receiver subsequently pulls (requests and receives) change data, and consequently, a corresponding image or images, the associated change pointer(s) may be correspondingly updated to indicate that the pointer(s) has/have been processed for a particular receiver. Thus, if a check is subsequently performed for whether a receiver that needs historical data has already received the corresponding image or images, it can be determined from the processed change pointers.
Any new pointer created since the last pull may be returned by the change pointer API 304 to the pull extractor tool 303. If the pulling receiver 301 requires images, for each pointer associated with an image, the corresponding image may be retrieved from the image database 203 by the pull extractor tool 303 and returned to the pulling receiver 301. Additionally, the current data on the application database 302 may be retrieved and returned to the pulling receiver 301. After any images and/or current data are retrieved for the pulling receiver 301, the change pointer database 106 may be updated to indicate that the associated pointers have been processed for that particular pulling receiver. Thus, if the change notification API 103 subsequently consults the change pointer database 106 to determine whether there are any pointers associated with change data that have not yet been processed for the receiver, the pointers will correctly indicate that the receiver has received the needed change data.
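By way of illustration only, the pull sequence may be sketched as follows. The function `pull`, its parameters and the dictionary-based stand-ins for the change pointer, image and application databases are hypothetical.

```python
from typing import Any, Dict, List


def pull(receiver: str, key: str,
         pointers: List[Dict[str, Any]],
         images: Dict[int, Any],
         current_data: Dict[str, Any],
         needs_images: bool) -> Dict[str, Any]:
    """Return images and current data for pointers new to this receiver."""
    new_pointers = [p for p in pointers
                    if p["key"] == key and receiver not in p["processed_by"]]
    result: Dict[str, Any] = {"images": [], "current": None}
    if not new_pointers:
        return result  # nothing has changed since the last pull
    if needs_images:
        # Only pointers that have an associated image contribute history.
        result["images"] = [images[p["id"]] for p in new_pointers
                            if p["id"] in images]
    result["current"] = current_data[key]
    # Mark the pointers as processed for this particular receiver.
    for p in new_pointers:
        p["processed_by"].add(receiver)
    return result
```

Because the pointers are marked per receiver, a second pull by the same receiver returns nothing until a further change creates a new pointer, while other receivers remain unaffected.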
To send data, the application 401 may call a push extractor tool 403. The push extractor tool 403 may call the change pointer API 304. The change pointer API 304 may read the change pointer database 106 and determine whether in fact there is any data that needs to be sent, based, for example, on whether any new change pointers were created for the data to be sent since the last push. The change pointer API 304 may return any new pointers to the push extractor tool 403. The change pointer API 304 may further read the configuration database 105 to determine whether any receiver requires images. If there is only one pointer, this is an indication that there are no images associated with the data to be sent. In other words, if there is any data to be sent, it is current data on the application database 302. On the other hand, if there is more than one pointer associated with the data to be sent, this is an indication that there are images associated with the data to be sent.
If a receiver 406 requires images, for each pointer associated with an image, the corresponding image may be retrieved from the image database 203. Additionally, the current data on the application database 302 may be retrieved. According to embodiments, the push extractor tool 403 may store retrieved images in a “container” 404, which is a grouping of data items where the grouping has a predetermined size limit to facilitate subsequent handling. If current data is retrieved from the application database 302, it may be converted into image format before being placed in the container 404. If and when the container 404 reaches its size limit, or there is no more data to put in the container 404, the container 404 may be sent to a “middleware” layer of software 405 that distributes the data in the container 404 to various interested receivers 406. A plurality of containers 404 may be filled and sent for distribution in a loopwise manner, until all required data is pushed. After any images and/or current data are pushed to a receiver 406, the change pointer database 106 may be updated to indicate that the associated pointers have been processed for that particular receiver. Although not shown in
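By way of illustration only, the loopwise filling of containers may be sketched as follows. The function name `fill_containers` is hypothetical, and measuring the size limit as a count of data items is an assumption; a limit expressed in bytes would follow the same pattern.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fill_containers(items: Iterable[T], size_limit: int) -> Iterator[List[T]]:
    """Group data items into containers holding at most `size_limit` items."""
    if size_limit < 1:
        raise ValueError("size_limit must be at least 1")
    container: List[T] = []
    for item in items:
        container.append(item)
        if len(container) == size_limit:
            yield container      # container is full: hand it over for sending
            container = []
    if container:
        yield container          # no more data: send the partly filled container
```

Each yielded container would then be handed to the distribution layer. For example, five items with a limit of two produce three containers, the last of which holds a single item.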
A computer program or collection of programs comprising computer-executable instructions according to embodiments of the present invention may be stored and transported on machine-readable media such as diskette 501, CD-ROM 502, magnetic tape 503 and fixed disk 504. The computer instructions may be retrieved from the machine-readable media 501-504 using their respective reading devices 505-508 into memory 500, and executed by a processor 510. The functionality disclosed hereinabove for performing the embodiments may find specific implementations in a variety of forms, which are considered to be within the abilities of a programmer of ordinary skill in the art after having reviewed the specification.
More specifically, the components could execute on a plurality of hardware platforms in a client-server environment. The environment may include client machines to handle front-end processing (e.g., user interfaces), application servers to process application logic, and database servers to handle the database access.
It may be appreciated in view of the foregoing description that embodiments of the invention efficiently address disadvantages in the prior art by reducing a number of images to a minimum based on a receiver's requirements. Moreover, the embodiments provide an extraction process (the push and pull extractors described above) that minimizes the number of objects replicated, and supports both push and pull scenarios using a common framework.
Several embodiments of the present invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.