Claims
- 1. A method for knowledge discovery through analytic learning cycles, comprising:
defining a problem associated with an enterprise;
executing a cycle of analytic learning which is founded on a view of data from across the enterprise, the data having been captured and aggregated and is available at a central repository, the analytic learning cycle employs data mining including
exploring the data at the central repository in relation to the problem,
preparing a modeling data set from the explored data,
building a model from the modeling data set,
assessing the model,
deploying the model back to the central repository, and
applying the model to a set of inputs associated with the problem to produce results, thereby creating historic data that is saved at the central repository; and
repeating the cycle of analytic learning using the historic as well as current data accumulated in the central repository, thereby creating up-to-date knowledge for evaluating and refreshing the model.
- 2. The method of claim 1, wherein the enterprise experiences a plurality of events occurring at a plurality of sites across the enterprise in association with its operations, wherein a plurality of applications are run in conjunction with these operations, wherein the operations, the plurality of events and applications, and the data are integrated so as to achieve the view as a coherent, real-time view of the data from across the enterprise as well as to achieve enterprise-wide coherent and zero latency operations, and wherein the integration is backed by the central repository.
- 3. The method of claim 1, wherein the data is explored using enterprise-specific predictors related to the problem such that through the analytic learning cycle the data is analyzed in relation to the problem in order to establish patterns in the data.
- 4. The method of claim 1, wherein a plurality of organizations includes a retail organization, a healthcare organization, a research institute, a financial institution, an insurance company, a manufacturing organization, and a government entity, wherein the enterprise is one of the plurality of organizations, and wherein the problem is defined in relation to operations of the enterprise.
- 5. The method of claim 1, wherein the problem is defined in the context of asset protection and is formulated for fraud detection.
- 6. The method of claim 1, wherein the problem is defined in the context of financial transactions with a bank representative or via an ATM (automatic teller machine), the problem being formulated for presenting customer-specific offers in the course of such transactions.
- 7. The method of claim 1, wherein the problem is defined in the context of business transactions conducted at a point of sale, via a call center, or via a web browser, the problem being formulated for presenting customer-specific offers in the course of such transactions.
- 8. The method of claim 1, wherein the problem definition creates a statement of the problem and a way of assessing and later evaluating the model, and wherein, based on model assessment and evaluation results, the problem is redefined before the analytic learning cycle is repeated.
- 9. The method of claim 1, wherein the results are patterns established through the application of the model, wherein the results are logged in the central repository and used for formalizing responses to events, the responses becoming part of the historic data and along with the responses are used in preparing modeling data sets for subsequent analytic learning cycles.
- 10. The method of claim 1, wherein the data is held at the central repository in the form of tables in relational databases and is explored using database queries.
- 11. The method of claim 1, wherein the preparation of modeling data set includes transforming explored data to suit the problem and the model.
- 12. The method of claim 11, wherein the transformation includes reformatting the data to suit the set of inputs.
- 13. The method of claim 1, wherein the modeling data set holds data in denormalized form.
- 14. The method of claim 13, wherein the denormalized form is fashioned by taking data in normalized form and lining it up flatly and serially end-to-end in a logically contiguous record so that it becomes retrievable more quickly relative to normalized data.
- 15. The method of claim 1, wherein the modeling data set is held at the central repository in a table containing one record per entity.
- 16. The method of claim 15, wherein the modeling data set is provided to a target file, and wherein the table holding the modeling data set is identified along with the target file and a transfer option.
- 17. The method of claim 16, wherein the modeling data set is provided to the target file in bulk via multiple concurrent streams, and wherein the transfer option determines the number of concurrent streams.
- 18. The method of claim 1, wherein the modeling data set is provided from the central repository to a mining server in bulk via multiple concurrent streams.
- 19. The method of claim 1, wherein based on the assessment of the model one or more of the defining, exploring, preparing, building, and assessing steps are reiterated in order to create another version of the model that more closely represents the problem and provide predictions with better accuracy.
- 20. The method of claim 1, wherein the data set is prepared using part of the explored data and wherein the model is assessed using a remaining part of the explored data in order to determine whether the model provides predictions with expected accuracy in view of the problem.
- 21. The method of claim 1, wherein the model is formed with a structure, including one of a decision tree model, a logistic regression model, a neural network model, a nearest neighbor model, a Naïve Bayes model, or a hybrid model.
- 22. The method of claim 21, wherein the decision tree contains a plurality of nodes, in each of which there is a test corresponding to a rule that leads to decision values corresponding to the results of the test.
- 23. The method as in claim 21, wherein the neural network includes input and output layers and any number of hidden layers.
- 24. The method as in claim 1, wherein the defining, exploring, preparing, building, and assessing steps are used to build a plurality of models that upon being deployed are placed in a table at the central repository and are differentiated from one another by their respective identification information.
- 25. The method as in claim 1, wherein the model is applied to the set of inputs in response to a prompt from an application to which the results or information associated with the results are returned.
- 26. A system for knowledge discovery through analytic learning cycles, comprising:
a central repository;
means for providing a definition of a problem associated with an enterprise;
means for executing a cycle of analytic learning which is founded on a view of data from across the enterprise, the data having been captured and aggregated and is available at the central repository, the analytic learning cycle execution means employs data mining means including
means for exploring the data at the central repository in relation to the problem,
means for preparing a modeling data set from the explored data,
means for building a model from the modeling data set,
means for assessing the model,
means for deploying the model back to the central repository, and
means for applying the model to a set of inputs associated with the problem to produce results, thereby creating historic data that is saved at the central repository; and
means for repeating the cycle of analytic learning using the historic as well as current data accumulated in the central repository, thereby creating up-to-date knowledge for evaluating and refreshing the model.
- 27. The system of claim 26, further comprising:
a plurality of applications, wherein the enterprise experiences a plurality of events occurring at a plurality of sites across the enterprise in association with its operations, wherein the plurality of applications are run in conjunction with these operations; and means for integrating the operations, the plurality of events and applications, and the data so as to achieve the view as a coherent, real-time view of the data from across the enterprise as well as to achieve enterprise-wide coherent and zero latency operations, and wherein the integration means is backed by the central repository.
- 28. The system of claim 26, wherein the data is explored using enterprise-specific predictors related to the problem such that through the analytic learning cycle the data is analyzed in relation to the problem in order to establish patterns in the data.
- 29. The system of claim 26, wherein a plurality of organizations includes a retail organization, a healthcare organization, a research institute, a financial institution, an insurance company, a manufacturing organization, and a government entity, wherein the enterprise is one of the plurality of organizations, and wherein the problem is defined in relation to operations of the enterprise.
- 30. The system of claim 26, wherein the problem is defined in the context of asset protection and is formulated for fraud detection.
- 31. The system of claim 26, wherein the problem is defined in the context of financial transactions with a bank representative or via an ATM (automatic teller machine), the problem being formulated for presenting customer-specific offers in the course of such transactions.
- 32. The system of claim 26, wherein the problem is defined in the context of business transactions conducted at a point of sale, via a call center, or via a web browser, the problem being formulated for presenting customer-specific offers in the course of such transactions.
- 33. The system of claim 26, wherein the means for providing the problem definition is configured for
creating a statement of the problem as defined for the enterprise and a way of assessing and later evaluating the model, and providing a modified definition of the problem, if necessary based on model assessment and evaluation results, before the analytic learning cycle is repeated.
- 34. The system of claim 26, wherein the results are patterns established through the means for applying the model, wherein the results are logged in the central repository and used for formalizing responses to events, the responses becoming part of the historic data and along with the responses are used in preparing modeling data sets for subsequent analytic learning cycles.
- 35. The system of claim 26, wherein the central repository is configured to hold the data in the form of tables in relational databases, and wherein the data exploring means is configured to explore the data at the central repository using database queries.
- 36. The system of claim 26, wherein the modeling data set preparation means includes means for transforming explored data to suit the problem and the model.
- 37. The system of claim 36, wherein the transforming means is configured for reformatting the data to suit the set of inputs.
- 38. The system of claim 26, wherein the modeling data set holds data in denormalized form.
- 39. The system of claim 38, wherein the preparing means is configured for fashioning the denormalized form by taking data in normalized form and lining it up flatly and serially end-to-end in a logically contiguous record so that it becomes retrievable more quickly relative to normalized data.
- 40. The system of claim 26, wherein the modeling data set is held at the central repository in a table containing one record per entity.
- 41. The system of claim 40, further comprising:
means for providing the modeling data set to a target file, the providing means being configured for identifying the table holding the modeling data along with the target file and a transfer option.
- 42. The system of claim 41, wherein the modeling data set is provided to the target file in bulk via multiple concurrent streams, and wherein the transfer option determines the number of concurrent streams.
- 43. The system of claim 26, further comprising:
a mining server, wherein the modeling data set is provided from the central repository to the mining server in bulk via multiple concurrent streams.
- 44. The system of claim 26, wherein, based on an assessment of the model, the system is further configured to prompt one or more of the defining means, exploring means, preparing means, building means, and assessing means to reiterate their operation in order to create another version of the model that more closely represents the problem and provide predictions with better accuracy.
- 45. The system of claim 26, wherein the data set is prepared using part of the explored data and wherein the model is assessed using a remaining part of the explored data in order to determine whether the model provides predictions with expected accuracy in view of the problem.
- 46. The system of claim 26, wherein the model is formed with a structure, including one of a decision tree model, a logistic regression model, a neural network model, a nearest neighbor model, a Naïve Bayes model, or a hybrid model.
- 47. The system of claim 46, wherein the decision tree contains a plurality of nodes, in each of which there is a test corresponding to a rule that leads to decision values corresponding to the results of the test.
- 48. The system as in claim 46, wherein the neural network includes input and output layers and any number of hidden layers.
- 49. The system as in claim 26, wherein the defining, exploring, preparing, building, and assessing means are used to build a plurality of models that upon being deployed are placed in a table at the central repository and are differentiated from one another by their respective identification information.
- 50. The system as in claim 26, further comprising:
a plurality of applications, wherein the applying means is configured for applying the model to the set of inputs in response to a prompt from one of the applications to which the results or information associated with the results are returned.
- 51. A computer readable medium embodying a program for knowledge discovery through analytic learning cycles, comprising:
program code configured to cause a computer to provide a definition of a problem associated with an enterprise;
program code configured to cause a computer system to execute a cycle of analytic learning which is founded on a view of data from across the enterprise, the data having been captured and aggregated and is available at a central repository in real time, wherein the analytic learning cycle employs data mining including
exploring the data at the central repository in relation to the problem,
preparing a modeling data set from the explored data,
building a model from the modeling data set,
assessing the model,
deploying the model back to the central repository, and
applying the model to a set of inputs associated with the problem to produce results, thereby creating historic data that is saved at the central repository; and
program code configured to cause a computer system to repeat the cycle of analytic learning using the historic as well as current data accumulated in the central repository, thereby creating up-to-date knowledge for evaluating and refreshing the model.
- 52. A system for knowledge discovery through analytic learning cycles, comprising:
a central repository at which the real-time data is available having been aggregated from across the enterprise, the real-time data being associated with events occurring at one or more sites throughout an enterprise;
enterprise applications;
enterprise application interface which is configured for integrating the applications and real-time data and is backed by the central repository so as to provide a coherent, real-time view of enterprise operations and data;
a data mining server configured to participate in an analytic learning cycle by building one or more models from the real-time data in the central repository, wherein the central repository is designed to store such models;
a hub with core services including a scoring engine configured to obtain a model from the central repository and apply the model to a set of inputs from among the real-time data in order to produce results, wherein the central repository is configured for containing the results along with historic and current real-time data for use in subsequent analytic learning cycles.
- 53. The system of claim 52, wherein the scoring engine has a companion calculation engine configured to calculate scoring engine inputs by aggregating real-time and historic data in real time.
- 54. The system of claim 52, wherein the central repository contains one or more data sets prepared to suit a problem and a set of inputs from among the real-time data to which a respective model is applied, the problem being defined for finding a pattern in the events and to provide a way of assessing the respective model.
- 55. The system as in claim 54, wherein, based on results of the respective model assessment, the problem is redefined before an analytic learning cycle is repeated.
- 56. The system of claim 52, further comprising:
tools for data preparation configured to provide intuitive and graphical interfaces for viewing the structure and contents of the real-time data at the central repository as well as for providing interfaces that specify data transformation.
- 57. The system of claim 52, further comprising:
tools for data transfer and model deployment configured to provide intuitive and graphical interfaces for viewing the structure and contents of the real-time data at the central repository as well as for providing interfaces that specify transfer options.
- 58. The system of claim 52, wherein the central repository contains relational databases in which the real-time data is held in normalized form and a space for modeling data sets in which reformatted data is held in denormalized form.
- 59. The system of claim 52, wherein the central repository is associated with a relational database management system configured to support database queries.
- 60. The system of claim 52, wherein the central repository contains a table for holding models, each model being associated with an identifier, and one or more of a version number, names and data types of the set of inputs, and a description of model prediction logic formatted as IF-THEN rules.
- 61. The system of claim 59, wherein the description of model prediction logic consists of JAVA code.
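For illustration only, the cycle recited in claim 1 (explore, prepare, build, assess, deploy, apply, repeat) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: `CentralRepository` is a hypothetical in-memory stand-in for the central repository, the model is a single-threshold decision rule chosen only for brevity, and the record fields `amount` and `is_fraud`, the model identifier `"fraud"`, and the sample values are all invented for the example. It also assumes that the true outcome of each scored input is known by the time the result is logged.

```python
from dataclasses import dataclass, field


@dataclass
class CentralRepository:
    """Hypothetical in-memory stand-in for the central repository."""
    records: list = field(default_factory=list)  # current and historic data
    models: dict = field(default_factory=dict)   # deployed models keyed by (id, version)


def explore(repo):
    # Explore: select records relevant to the problem (here, any record with an amount).
    return [r for r in repo.records if "amount" in r]


def prepare(explored):
    # Prepare: flatten to (input, label) rows; every third row is held back for assessment.
    rows = [(r["amount"], r["is_fraud"]) for r in explored]
    holdout = rows[2::3]
    train = [row for i, row in enumerate(rows) if i % 3 != 2]
    return train, holdout


def build(train):
    # Build: pick the amount threshold with the fewest training errors (illustrative model).
    best_threshold, best_errors = None, None
    for threshold in sorted({amount for amount, _ in train}):
        errors = sum((amount >= threshold) != label for amount, label in train)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return {"threshold": best_threshold}


def assess(model, holdout):
    # Assess: accuracy on the part of the explored data not used for building.
    hits = sum((amount >= model["threshold"]) == label for amount, label in holdout)
    return hits / len(holdout)


def deploy(repo, model_id, version, model):
    # Deploy: store the model back in the repository under its identification information.
    repo.models[(model_id, version)] = model


def apply_model(repo, model_id, version, inputs):
    # Apply: score the inputs and log the results so they become historic data.
    model = repo.models[(model_id, version)]
    results = []
    for row in inputs:
        predicted = row["amount"] >= model["threshold"]
        results.append(predicted)
        repo.records.append({**row, "predicted": predicted})
    return results


def run_cycle(repo, model_id, version, inputs):
    train, holdout = prepare(explore(repo))
    model = build(train)
    accuracy = assess(model, holdout)
    deploy(repo, model_id, version, model)
    return accuracy, apply_model(repo, model_id, version, inputs)


# Invented sample data: small amounts are legitimate, large amounts are fraudulent.
amounts = [10, 500, 20, 600, 30, 700, 40, 800, 50, 900, 60, 1000]
repo = CentralRepository(records=[{"amount": a, "is_fraud": a >= 500} for a in amounts])

# First cycle, then a repeat that also draws on the results logged by the first.
acc1, res1 = run_cycle(repo, "fraud", 1, [{"amount": 25, "is_fraud": False},
                                          {"amount": 750, "is_fraud": True}])
acc2, res2 = run_cycle(repo, "fraud", 2, [{"amount": 650, "is_fraud": True}])
```

Each call to `run_cycle` deploys a new model version alongside the earlier one, and the second call builds its modeling data set from both the original records and the results logged by the first call.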
REFERENCE TO PRIOR APPLICATION
[0001] This application claims the benefit of and incorporates by reference U.S. Provisional Application No. 60/383,367, titled “ZERO LATENCY ENTERPRISE (ZLE) ANALYTIC LEARNING CYCLE,” filed May 24, 2002.
[0002] This application is related to and incorporates by reference U.S. patent application Ser. No. 09/948,928, filed Sep. 7, 2001, entitled “Enabling a Zero Latency Enterprise”, U.S. patent application Ser. No. 09/948,927, filed Sep. 7, 2001, entitled “Architecture, Method and System for Reducing Latency of Business Operations of an Enterprise”, and U.S. patent application Ser. No. ______ (Attorney docket No. 200300827-1), filed Mar. 27, 2003, entitled “Interaction Manager.”
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60383367 | May 2002 | US |