This disclosure relates to monitoring, detecting, analyzing, and revising machine learning models in production.
Machine learning models are increasingly applied to automate, with improved efficiency and accuracy, tasks previously performed by humans. The machine learning models are developed with a number of assumptions about how they will operate in production. However, there are risks that these assumptions may be violated, whether by operational issues, software/hardware failures, unknown circumstances, or intentional adversarial attacks from outside sources attempting to manipulate the results of the machine learning models.
The presently disclosed features address these technical problems and increase accuracy, fairness, and robustness in the performance of machine learning models in production.
Lifecycles of machine learning models involve interconnecting tasks such as defining, data collection, building, testing, deploying, monitoring, evaluating, retraining, and updating the machine learning models. Effective lifecycle management of machine learning models in a production environment poses a technical challenge on multiple disciplinary levels and requires an integration of a diverse skillset including but not limited to business intelligence, domain knowledge, machine learning and data science, data ETL (extract/transform/load) techniques, software development, DevOps (software development (Dev) and information technology operations (Ops)), and QA (quality assurance). As such, lifecycle management of a machine learning model usually requires complex processes involving a diverse group of computer engineers, domain experts, software developers, and/or data scientists.
A lifecycle of a machine learning model begins with its initial definition and development by data scientists and domain experts. Depending on the types of input data and the types of prediction tasks and outputs, the machine learning model may first be architected to include various choices of machine learning algorithms. A training dataset may be collected/generated, processed, and labeled. The machine learning model may then be trained using the training dataset. The trained machine learning model may include one or more data processing layers embedded with model parameters determined during the training process. The trained machine learning model may then be further tested before being provided to a production environment.
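As a minimal illustrative sketch of this train/test/release portion of the lifecycle (the public dataset, model choice, and artifact file name below are illustrative assumptions, not elements of this disclosure):

```python
from joblib import dump
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Collect/label data, train, hold out a test set, evaluate, then release.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
dump(model, "model_v1.joblib")  # artifact handed off to the production environment
```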
As an additional important part of the lifecycle of a trained machine learning model, its predictive performance may be further continuously monitored and evaluated while it is being deployed in the production environment. Based on such continuous monitoring and evaluation, a retraining of the machine learning model may be triggered when needed, and then the retrained machine learning model may be retested/reevaluated and updated in the production environment. Such a monitoring, evaluation, retraining, reevaluation, and updating process provides some degree of surety, reliability, and safety in the performance of the machine learning model in the production environment.
Many practical industrial, enterprise, and other applications require a large number of interconnecting machine learning models integrated into a single complex data analytics and/or information/signal control system. In a particular example of a sensor network application, thousands of real-time machine learning models may be integrated in a framework for collecting data from tens of thousands or more distributed sensors and for performing real time predictive data analytics. Such a framework for machine learning models, for example, may be adapted in an application for use in an industrial plant for analyzing various real time sensor outputs and generating real time control signals to a large number of components of the plant based on predictions from the machine learning models. For these complex applications, the development, deployment, monitoring/evaluation, retraining, reevaluation, and updating of the large number of machine learning models to provide improved surety in the predictive performance and model safety present a daunting task and can only be effectively achieved with at least some degree of automation. The disclosure below describes various implementations of such an automated framework and technical components therein for developing, deploying, monitoring/evaluating, retraining, reevaluating, and updating machine learning models. Such a framework may be referred to as a machine learning model management framework (MLMM framework).
The term “model surety” is used in this disclosure to broadly represent various aspects of the automated model development, deployment, monitoring, evaluation, retraining, reevaluation, and updating in the MLMM framework for ensuring that a machine learning model generates predictions that are reasonably accurate, fair, robust, and safe in the production environment and during its lifecycle. These aspects may include but are not limited to the examples described below.
As a particular example, the automated MLMM framework above may include technical components that facilitate detection and correction of concept drift of a machine learning model in the production environment. Specifically, development and training of a machine learning model may rely on a set of rules, relationships, and assumptions. Such rules, relationships, and data distributions may change or shift over time, leading to a drop in the performance of the machine learning model over time. In some aspects, the real-time input data distributions and/or underlying data relationships may change or shift away from those of the original training dataset. As a result, the trained machine learning model may become stale and inaccurate in predicting the target variables for new incoming data items. For example, in a machine learning model for predicting whether a user (input data) would click (target variable) a particular online advertisement (input data), the input data distribution, such as the demographics of the users, may shift over time, e.g., the population may age (the portion of elderly increases) over time, and the underlying data relationship, such as the user behavior (likelihood of clicking an online advertisement) of a particular demographic group, may evolve over time. Such changes, shifts, or evolution of the underlying data distribution or data relationship, referred to as concept drift, may lead to a decrease in the predictive accuracy of the machine learning model. Such a predictive performance decrease may be detected by the MLMM framework. The machine learning model may be retrained based on updated training data, rules, and model assumptions. Further details for handling concept drift of machine learning models in the MLMM framework are included in U.S. Provisional Patent Application No. 62/963,961, filed on Jan. 21, 2020, the entirety of which is herein incorporated by reference.
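As a minimal sketch of one way a shift in an input feature's distribution might be flagged (the two-sample Kolmogorov-Smirnov test is only one applicable statistic, and the synthetic data and significance level below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_col, live_col, alpha=0.01):
    """Compare a live feature sample against the training sample with a
    two-sample KS test; a small p-value suggests the distribution shifted."""
    stat, p_value = ks_2samp(train_col, live_col)
    return {"statistic": stat, "p_value": p_value, "drift": p_value < alpha}

# Synthetic example: the live data's mean has drifted from the training data.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5000)
live = rng.normal(0.4, 1.0, 1000)
print(detect_feature_drift(train, live))  # flags drift
```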
As another particular example, regardless of their types, purposes, and inner workings, machine learning models in production may be exposed to security threats in the form of, e.g., adversarial attacks. Such in-production adversarial attacks may include but are not limited to various types of attacks such as evasion, poisoning, trojaning, backdooring, reprogramming, and inference attacks. An adversarial attack may be general-purpose or may specifically target a particular machine learning model or a particular machine learning architecture or algorithm. An adversarial attack may incorporate adversarial noise into input data aimed at inducing untargeted or targeted mis-predictions and/or reducing the confidence level of the machine learning model. A machine learning model may be built to be safe against some adversarial attacks known a priori during the training stage. However, it is difficult to consider all possible adversarial attacks a priori, and unknown new adversarial attacks may be developed by hackers after the machine learning model is placed in a production environment. A machine learning model in a production environment thus may be vulnerable to various types of existing and new adversarial attacks. The automated MLMM framework above may include various components for detecting known or unknown adversarial attacks in the input data and for further providing retraining and updating of the machine learning model to render the adversarial attack ineffective. Further details for handling adversarial attacks on machine learning models in the MLMM framework are included in U.S. Provisional Patent Application No. 62/966,410, filed on Jan. 27, 2020, the entirety of which is herein incorporated by reference.
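As a minimal sketch of one published detection heuristic, feature squeezing (comparing model predictions on raw and bit-depth-reduced inputs), is shown below; `predict_proba` is a hypothetical stand-in for the production model's scoring function, and the threshold would need tuning:

```python
import numpy as np

def squeeze_bit_depth(x, bits=4):
    """Reduce the bit depth of inputs scaled to [0, 1] (feature squeezing)."""
    levels = 2 ** bits - 1
    return np.round(np.asarray(x) * levels) / levels

def adversarial_score(predict_proba, x):
    """L1 distance between predictions on the raw vs. squeezed input;
    a large distance suggests an adversarial perturbation."""
    return float(np.abs(predict_proba(x) - predict_proba(squeeze_bit_depth(x))).sum())

def is_adversarial(predict_proba, x, threshold=0.5):
    """Flag the input when the squeezing disagreement exceeds a tuned threshold."""
    return adversarial_score(predict_proba, x) > threshold
```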
As another particular example, the automated MLMM framework above may include technical components that provide explainable reasoning behind the predictions of a machine learning model. Such explainability provides the logic behind predictive decisions made by the machine learning model and allows for human interaction, collaboration, and model sensitivity analysis through “what-if” algorithms. For example, these technical components may be designed to analyze prediction results corresponding to an input dataset and derive the factors and features in the input dataset that contribute the most to the prediction results. These technical components may further provide the capability to perform “what-if” analysis, in which some aspects of the input data may be modified for new predictions and the new predictions may be further analyzed to provide counter-factual reasoning.
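As a minimal sketch of such a “what-if” probe (`predict` is a hypothetical stand-in for the production model's scoring function):

```python
import numpy as np

def what_if(predict, x, feature_index, candidate_values):
    """Sweep one input feature over candidate values, re-run the model, and
    report how the prediction moves -- a simple "what-if" probe that also
    supports counter-factual questions (e.g., which value flips the outcome)."""
    outcomes = []
    for value in candidate_values:
        x_mod = np.array(x, dtype=float)
        x_mod[feature_index] = value
        outcomes.append((value, predict(x_mod)))
    return outcomes
```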
As yet another particular example, the automated MLMM framework above may include technical components that provide detection, testing, quantification, and removal of bias in the prediction outcomes of the machine learning model. Fairness of the machine learning model may thus be improved. For example, a machine learning model embedded with word/document/language processing algorithms may be used by recruiters to identify resumes of job applicants that match or qualify for a particular job advertisement. In one example, the machine learning model may give high scores to a set of resumes associated mainly with male job applicants in response to a search for, e.g., a computer programmer. The technical components of the MLMM framework related to bias detection may be designed to detect whether such a male-dominant output is biased (due to, e.g., gender-biased language algorithms used in the machine learning model) or unbiased (e.g., the male-dominant results reflect a predominance of male applicants and their true qualifications). Detailed example implementations are provided below.
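As a minimal sketch of one group-aware bias statistic, the demographic parity gap (the difference in positive-outcome rates across groups); the data below are synthetic, and this is only one of many tests such components might apply:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-outcome rates between any two groups;
    a gap near 0 indicates demographic parity for the monitored attribute."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Synthetic example: 80% of group "m" vs. 20% of group "f" get the positive outcome.
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
group = np.array(["m", "m", "m", "m", "m", "f", "f", "f", "f", "f"])
print(demographic_parity_gap(y_pred, group))  # prints 0.6
```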
The various technical components of the automated MLMM framework may be designed and integrated to provide model surety in the various aspects described above. As a result, the MLMM framework provides data integrity and model governance and may be configured to provide transparency, fairness, consistency, and robustness in the predictive behavior of the machine learning models. These machine learning models may be updated to be resilient against existing and future adversarial attacks and to be adaptive against concept drift.
These various technical components may be provided in a modular fashion for integration in any implementation of the MLMM. For example, the monitoring and testing may be performed on any of the input data 1502, the machine learning model 1504, and the operating unit 1506 as a consumer of the machine learning model. Monitoring and testing may also be performed on the ETL processes for the input data, shown as 1510, and the model execution process, shown as 1520. The monitoring and testing may be performed on ETL, model-data integrity, model performance, model reasoning or explainability, model robustness, and model fairness, shown by 1530, 1532, 1534, 1536, 1538, and 1540, respectively, with increasing algorithm complexity, as shown by arrow 1550.
The monitoring and testing units 1530 for ETL, for example, may be configured to perform source-to-target data consistency, meta-data verification, schema verification, data integration, relational, data veracity, and data privacy monitoring and testing. The model-data integrity monitoring and testing units 1532 may be configured to perform, for example, feature completeness, feature validity (including types, ranges, and missingness), data and/or model integrity, data-to-model consistency, data type and precision, and data masking monitoring and testing. The model performance monitoring and testing units 1534 may be configured to perform, for example, accuracy metrics monitoring and testing such as classification precision, classification recall, and regression metrics; statistical metrics monitoring and testing such as coverage, confidence intervals, and p-values; model consistency monitoring and testing such as correlations, expected trends, and proxy models; and resource monitoring and testing such as model latency, memory consumption, etc. The model reasoning/explainability monitoring and testing units 1536 may be configured to perform, for example, counterfactual reasoning, “what-if” reasoning, and feature-aware reasoning (such as LIME, k-LIME, Shapley, and causality reasoning). The model robustness monitoring and testing units 1538 may be configured to perform, for example, monitoring and testing of robustness against adversarial attacks, concept drift, outliers in input data, input noise, and missingness in input data. The model fairness monitoring and testing units 1540 may be configured to perform, for example, group-aware bias detection (such as gender, age, and racial bias), business/regulation-aware detection (such as a demographic parity test in a loan application approval model), detection of bias in training data (bias due to a pretrained language model, for example), and detection of individual-aware bias such as counter-factual individual outcomes from a machine learning model.
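As a minimal sketch of the kind of model-data integrity check the units 1532 might run (the schema format and feature names below are illustrative assumptions):

```python
def validate_features(record, schema):
    """Minimal model-data integrity check: verify feature completeness, type,
    and range against a schema such as
    {"age": {"type": (int, float), "min": 0, "max": 120}}."""
    issues = []
    for name, spec in schema.items():
        if name not in record or record[name] is None:
            issues.append(f"missing feature: {name}")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            issues.append(f"bad type for {name}: {type(value).__name__}")
        elif not spec.get("min", float("-inf")) <= value <= spec.get("max", float("inf")):
            issues.append(f"out-of-range {name}: {value}")
    return issues

# Example: an out-of-range value is reported before it reaches the model.
print(validate_features({"age": 250}, {"age": {"type": (int, float), "min": 0, "max": 120}}))
```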
The various units above and other technical components may be provided as plug-ins of the automated MLMM framework.
The MLMM framework may include a model surety pipeline 202 that augments a production machine learning pipeline 210. For example, the surety pipeline 202 may include detection circuitries 204 and inspection circuitries 206.
The detection circuitries 204 may generate detection results 208, which may be processed by the inspection circuitries 206 to generate inspection results 209. The detection circuitries 204 and the inspection circuitries 206 thus augment the production machine learning pipeline 210 so that any issues occurring during model execution on live input data are identified. The detection circuitries 204, in particular, analyze the model output 216 for the live input data to determine whether the model results 216 are trustworthy. The results 208 from the detection circuitries are further inspected by the inspection circuitries 206 to ensure consistency in the detection of issues with the machine learning model. The surety circuitries 202, including the detection circuitries 204 and the inspection circuitries 206, may be supported by various levels of machine learning models, referred to as surety machine learning models, or surety models for simplicity. Like the machine learning models in the production machine learning model pipeline 210, the surety machine learning models may be trained, and further, may be evaluated, retrained, and updated as described below.
Correction circuitries may be further added to the model surety pipeline 202 described above.
As further shown in the figures, the correction circuitries 301 may include a correction engine 302 and a data store 304; the correction engine 302 may include a machine learning model correction component 410 and a surety model correction component 412.
The data store 304 may act as a repository for various data needed by the correction engine 302. Such data from the data store 304 are used either by the machine learning model correction component 410 for generating updated machine learning models or by the surety model correction component 412 for generating updated surety machine learning models such as detection models and inspection models. Various data supplied by the data store 304 to the correction engine 302 may or may not be shared by the machine learning model correction component 410 and the surety model correction component 412. The data from the data store 304, for example, may include but are not limited to training data 420, machine learning model performance metrics data 422, and surety model performance metrics data 424. The training data 420 include datasets for retraining of the machine learning models and the surety models (the various models used by the detection circuitries 204 and inspection circuitries 206). The data store 304 maintains an updated and corrected version of the training datasets that, for example, include the datasets used for the original training of the machine learning models and the surety models, and live data processed by the surety pipeline 202.
The data store 304 may be in communication with an input data processing component 430 as part of the correction circuitries 301 in real time, at predetermined times, or periodically. The input data processing component 430 may be configured to perform ETL of input data to the surety pipeline 202.
The data store 304 may further be in communication with the surety pipeline 202 in real time, at predetermined times, or periodically, to receive the prediction output 216 based on the processed input data from the machine learning models in the surety pipeline 202, as well as the detection results 208 and the inspection results 209 from the detection circuitries 204 and the inspection circuitries 206 of the surety pipeline 202, respectively. These results are provided, for example, to the machine learning model performance metrics component 422 and the surety model performance metrics component 424 of the data store 304. An additional data catalog component 440 may be designed to manage and track the data contents in the data store 304. For example, the data catalog component 440 may include a metadata collection for the data store and data lineage tracking of the data store.
The machine learning model correction component 410 of the correction engine is responsible for retraining the machine learning models and communicating the updated models to the model catalog 402. The machine learning model correction component 410 may include data cleaning components 460 for cleansing the various training data and machine learning performance metrics data from the data store 304, data evaluation components 462 for analyzing the data, a model retraining component 464 for retraining the machine learning models, and a model evaluation component 466 for evaluating and testing the retrained machine learning models. The machine learning model correction component 410 may also include a model encryption component 468 for encrypting the machine learning models using various model protection algorithms.
Likewise, the surety model correction component 412 of the correction engine is responsible for retraining the surety models and communicating the updated surety models or algorithms to the model catalog 402. The surety model correction component 412 may include an algorithm evaluation component 470 and a model evaluation component 472 for evaluating and selecting various data analytics, modeling algorithms, and surety models, a retraining component 474 for retraining the surety models (including the detection modules and inspection modules), and a model versioning component 476 for controlling and managing various versions of the surety models.
The model catalog 402 may be configured as a model repository and log. The model catalog 402 may include various data, models, and algorithms. For example, the model catalog 402 may include verified training datasets 450, verified machine learning models 452, verified surety algorithms 454, verified surety models 456, and verified surety machine learning pipelines 458. These verified data, algorithms, and models may be provided to other components of the MLMM via an Application Programming Interface (API) and be incorporated into the surety pipeline 202.
Access to the various components of the correction circuitries 301 may be provided to various types of users through the MLMM. For example, access to these components may be provided via API functions. These API functions may be integrated to provide applications and/or user interfaces for the various types of users of the MLMM. These users, for example, may include model engineers, data scientists, business analysts, and the like.
In some implementations, model engineers may be provided with user interfaces for configuring automated retraining of the machine learning models and surety models. For example, a model engineer may configure the surety pipeline 202 to detect noticeable drops in model accuracy and then call a model retraining pipeline to generate a more robust model version. The model engineer thus (1) uses functionalities provided by the data store 304 for processing and fetching the latest training data, machine learning model results, and detection results, (2) cleans the data and uses the cleaned data to retrain the model and create a new model version via the correction engine 302, and (3) stores the new model in the model catalog 402 for access by any user on the same project.
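As a minimal sketch of such an accuracy-triggered retraining flow; all of the callables and the accuracy floor below are hypothetical stand-ins for the data store 304, correction engine 302, and model catalog 402 interfaces, which this disclosure does not prescribe:

```python
def maybe_retrain(fetch_metrics, fetch_training_data, clean, retrain,
                  evaluate, catalog, accuracy_floor=0.9):
    """When monitored accuracy falls below a configured floor, fetch and
    clean the latest data, retrain, evaluate, and version the new model
    in the catalog; otherwise leave the deployed model untouched."""
    if fetch_metrics()["accuracy"] >= accuracy_floor:
        return None                     # no noticeable drop; nothing to do
    data = clean(fetch_training_data()) # steps (1) and (2)
    model = retrain(data)
    report = evaluate(model, data)
    catalog.store(model, report)        # step (3): new version for the project
    return model
```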
In some implementations, data scientists may be provided with user interfaces for adding various algorithms to the model catalog 402 for use by the retrained machine learning models and the surety models. For example, a data scientist may be responsible for configuring the use of multiple detection algorithms for ensemble detection of issues in the machine learning models (e.g., ensemble detection of concept drift of the machine learning models, as further described below), and may have discovered a new detection algorithm and wish to add it to the detection circuitries 204. The data scientist may evaluate the code of the new detection algorithm and add the code to the detection circuitries 204, creating a new version of the detection module. The new version of the detection module is then added to the model catalog so that other data scientists can reuse the more robust detection module on their projects.
In some other implementations, a business analyst may be provided with a verified surety pipeline when the business analyst commences a project. For a specific example, the business analyst may start a new project to classify satellite images. The business analyst may begin by searching the model catalog to find verified machine learning models for satellite imagery and verified adversarial attack detection modules. The verified satellite imagery machine learning models and the detection modules are then sent to a model engineer, who configures them in a surety pipeline, views the results, and configures a retraining pipeline to automatically improve the machine learning models and detection models over time.
As shown in the figures, the surety pipeline may include an online pipeline 502 and an on-demand pipeline 504.
Specifically, the online pipeline 502 handles live data in real time and may include three main example phases: a detection phase (510), a data transformation phase (507), and a model execution phase (508). The detection phase 510 may be handled by a generalizable detection engine that analyzes the live input data 506 to the surety pipeline. The live input data enter the system and pass through the generalizable detection engine of the detection phase 510 first. For example, the detection engine may be configured to determine the safety of the input data (e.g., whether the input data contains adversarial attacks). The generalizable detection engine may be designed and configured to be agnostic to the various machine learning models. From there, the detection engine assesses the live input data as either “safe” or “unsafe”. If the detection engine in the detection phase 510 determines that the live input data is “unsafe” for the machine learning model to run on, the data transformation and model execution phases indicated by 507 and 508 are not run. Instead, an alert may be generated through the escalation process as indicated by 516, the input data and the assessment information from the detection phase 510 are made available via, e.g., API, and an on-demand inspection of the input data may be triggered, as shown by the arrow 520 crossing from the detection phase 510 to the on-demand pipeline 504.
Otherwise, as shown by the arrow 522, if the data is “safe” according to the detection engine of the detection phase 510, the online pipeline 502 continues to the data transformation phase 507 and the machine learning model execution phase 508 with the live input data 506. The data transformation phase 507 is responsible for performing any necessary data transformations and normalization before proceeding to model execution. Such data transformation, for example, may include but is not limited to feature squeezing for image data, tokenization for text data, data normalization, outlier handling, or anything else that facilitates pre-processing of the live input data prior to the execution of the machine learning model. In some implementations, regardless of which decision the detection engine in the detection phase 510 makes, the data returned by the detection engine may be made available for all other stages and components of the MLMM to consume via API.
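As a minimal sketch of this gating control flow (the callables and the verdict format below are hypothetical stand-ins rather than prescribed interfaces):

```python
def run_online_pipeline(detect, transform, model, escalate, inspect_on_demand, x):
    """Control flow of the online pipeline 502: gate on the detection engine
    (phase 510); divert "unsafe" data to escalation (516) and the on-demand
    pipeline (arrow 520 into 504); otherwise transform (phase 507) and
    execute the model (phase 508)."""
    verdict = detect(x)                # e.g., {"safe": True, "details": ...}
    if not verdict["safe"]:
        escalate(x, verdict)           # alerting/escalation process 516
        inspect_on_demand(x, verdict)  # trigger the on-demand pipeline 504
        return None                    # phases 507/508 are not run
    return model(transform(x))         # phases 507 and 508
```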
In the example implementation, a further detection engine may be included after the model execution phase 508, analyzing the model output for issues in addition to the pre-execution analysis of the live input data.
The on-demand pipeline 504 operates outside the real-time data flow of the online pipeline 502 and performs deeper inspection and correction, as described below.
The on-demand pipeline 504 may be configured to be triggered by the generalizable detection engine in the detection phase 510 of the online pipeline 502, which may be configured to be agnostic to the machine learning models in production (or in the online pipeline). When the detection engine, for example, detects incoming data samples as adversarial above a given confidence threshold, the on-demand pipeline 504 may be triggered. The triggering of the on-demand pipeline 504 and the escalation and alerting functions indicated by 516 of the online pipeline are not mutually exclusive. Alerts may be sent to relevant components in the MLMM and/or parties for manual intervention while the on-demand pipeline 504 is automatically triggered.
The on-demand pipeline 504 handles further inspection and correction of issues with the data or model via an inspection engine 540 and the correction circuitries (including the data store 304 and the correction engine 302). The correction engine 302, as described above, may retrain the machine learning models and the surety models and communicate the updated models to the model catalog 402.
While the example online and on-demand pipeline architecture above is described with particular components and phases, other arrangements of the detection, inspection, and correction components are possible.
In some implementations, the detection engines described above, whether placed before or after the execution phase of the machine learning model in the online pipeline 502, may be implemented with an ensemble of detectors for improving detection accuracy and for a broader scope of detectable issues. Additionally or optionally, a particular production machine learning model in the online pipeline may also be implemented using an ensemble of machine learning models rather than a single machine learning model for processing live input data and to improve prediction accuracy. The ensemble of machine learning models for a particular task in the production online pipeline 502 may be selected from a machine learning model school (or a machine learning model library).
The use of ensemble detectors and/or ensembles of production machine learning models is illustrated in the figures, in which an ensemble of production models for the online pipeline may be selected from a model school 604.
Using the production model ensemble selected from the model school 604 in the online pipeline improves the accuracy of the model prediction. While such a model ensemble implementation is particularly helpful for reducing the impact on prediction accuracy due to concept drift, it may be generally used to improve the performance of machine learning models against other issues faced by the machine learning models. The ensemble of models may be generated and trained using different model architectures, different model algorithms, different training data sets, and/or the like, to promote model diversity for improved overall prediction accuracy.
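As a minimal sketch of combining predictions from an ensemble selected from the model school (the `predict_proba` interface and the optional weighting below are illustrative assumptions):

```python
import numpy as np

def ensemble_predict(models, x, weights=None):
    """Average the class-probability predictions of a diverse set of models;
    weights, if given, can favor recently retrained ensemble members."""
    preds = np.stack([m.predict_proba(x) for m in models])  # (n_models, n_samples, n_classes)
    if weights is None:
        return preds.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    return (preds * w[:, None, None]).sum(axis=0) / w.sum()
```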
The training of the various models in the model school 604 may follow the retraining and evaluation processes described above in relation to the correction engine 302.
The ensemble approach above for the detection engine combines the benefits of various single detection algorithms to provide more accurate and broader detection. It allows for a flexible detector with a configurable design and complex optimization of various tunable parameters in the ensemble detector, including detection accuracy/precision/sensitivity levels (for example, via the detection threshold) and delays between the various detection branches (the hold time 907). The individual detection algorithms can be chosen for diversity. For example, a concept drift detector ensemble may include various detection algorithms based on one or more of the Drift Detection Method (DDM), the Early Drift Detection Method (EDDM), Adaptive Windowing (ADWIN) detection, Page-Hinkley (P-H) statistical detection, CUmulative SUM (CUSUM) detection, and the like.
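As a minimal sketch of an ensemble detector with a configurable vote threshold and hold time; the member detectors are hypothetical stand-ins exposing an `update(error) -> bool` interface, which could wrap DDM, EDDM, ADWIN, P-H, or CUSUM implementations:

```python
class EnsembleDriftDetector:
    """Combine several single drift detectors with a vote threshold and a
    hold time between confirmations (cf. the detection threshold and the
    hold time 907 described above)."""

    def __init__(self, detectors, vote_threshold=0.5, hold_time=100):
        self.detectors = detectors
        self.vote_threshold = vote_threshold
        self.hold_time = hold_time   # samples to wait after a confirmed detection
        self._cooldown = 0

    def update(self, error):
        """Feed one prediction-error observation; return True on confirmed drift."""
        if self._cooldown > 0:
            self._cooldown -= 1
            return False
        votes = sum(d.update(error) for d in self.detectors)
        if votes / len(self.detectors) >= self.vote_threshold:
            self._cooldown = self.hold_time
            return True
        return False
```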
Finally, the detection circuits or detection engines described above may further be configured to detect bias in the prediction outcomes of the machine learning models.
The correction circuitries and the correction engine may then retrain the machine learning models to remove the detected bias.
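As a minimal sketch of one classic pre-processing correction, reweighing the training data so that outcome and group membership become statistically independent; retraining with such weights is only one of the debiasing strategies a correction engine might use:

```python
import numpy as np

def reweigh_for_parity(labels, group):
    """Per-sample weights that equalize the joint distribution of outcome and
    group in the reweighted training data (the classic "reweighing" step);
    a model retrained with these sample weights sees a debiased dataset."""
    labels, group = np.asarray(labels), np.asarray(group)
    weights = np.zeros(len(labels), dtype=float)
    for g in np.unique(group):
        for c in np.unique(labels):
            mask = (group == g) & (labels == c)
            observed = mask.mean()
            if observed > 0:
                expected = (group == g).mean() * (labels == c).mean()
                weights[mask] = expected / observed
    return weights
```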
The various technical components above for the MLMM may be implemented using any type of computing device, e.g., a computer device 1400 including communication interfaces 1402, system circuitry 1404, input/output (I/O) interface circuitry 1406, and graphical user interfaces (GUIs) 1410.
The GUIs 1410 and the I/O interface circuitry 1406 may include touch sensitive displays, voice or facial recognition inputs, buttons, switches, speakers and other user interface elements. Additional examples of the I/O interface circuitry 1406 includes microphones, video and still image cameras, headset and microphone input/output jacks, Universal Serial Bus (USB) connectors, memory card slots, and other types of inputs. The I/O interface circuitry 1406 may further include magnetic or optical media interfaces (e.g., a CDROM or DVD drive), serial and parallel bus interfaces, and keyboard and mouse interfaces.
The communication interfaces 1402 may include wireless transmitters and receivers (“transceivers”) 1412 and any antennas 1414 used by the transmit and receive circuitry of the transceivers 1412. The transceivers 1412 and antennas 1414 may support WiFi network communications, for instance, under any version of IEEE 802.11, e.g., 802.11n or 802.11ac, or other wireless protocols such as Bluetooth, WLAN, and cellular (4G, LTE/A). The communication interfaces 1402 may also include serial interfaces, such as universal serial bus (USB), serial ATA, IEEE 1394, Lightning port, I2C, SLIMbus, or other serial interfaces. The communication interfaces 1402 may also include wireline transceivers 1416 to support wired communication protocols. The wireline transceivers 1416 may provide physical layer interfaces for any of a wide range of communication protocols, such as any type of Ethernet, Gigabit Ethernet, optical networking protocols, data over cable service interface specification (DOCSIS), digital subscriber line (DSL), Synchronous Optical Network (SONET), or other protocols.
The system circuitry 1404 may include any combination of hardware, software, firmware, APIs, and/or other circuitry. The system circuitry 1404 may be implemented, for example, with one or more systems on a chip (SoC), application specific integrated circuits (ASIC), microprocessors, discrete analog and digital circuits, and other circuitry. The system circuitry 1404 may implement any desired functionality of the MLMM tool. As just one example, the system circuitry 1404 may include one or more instruction processors 1418 and memory 1420.
The memory 1420 stores, for example, control instructions 1422 for executing the features of the MLMM tool, as well as an operating system 1421. In one implementation, the processor 1418 executes the control instructions 1422 and the operating system 1421 to carry out any desired functionality for the MLMM tool, including those attributed to passive modules 1423 (e.g., relating to monitoring of ML models), and/or active modules 1424 (e.g., relating to applying adversarial attack tests or verifying a ML model's robustness). The control parameters 1425 provide and specify configuration and operating options for the control instructions 1422, operating system 1421, and other functionality of the computer device 1400.
The computer device 1400 may further include various data sources 1430. Each of the databases that are included in the data sources 1430 may be accessed by the MLMM tool to obtain data for feeding into a machine learning model.
As described above, the MLMM tool provides management, orchestration, and governance of machine learning models in their production environments. The MLMM tool further provides easy deployment of machine learning models, executes data pipelines feeding the machine learning models, provides pipelines of models where applicable, and retrieves and/or maintains a log of the results of the machine learning models. The MLMM tool further provides a modular approach to automatic or semi-automatic monitoring, testing, and/or correction of machine learning models in their production environments. The MLMM tool further provides a modular design and deployment of passive and/or active monitoring agents with feedback loops and result logging features. The MLMM tool further provides generalizable detection engines designed to verify machine learning models in production with transparent and adjustable complexity levels, reasoning logic, and business requirement dependencies. The MLMM tool further provides automatic or semi-automatic aid for training machine learning models to become robust to outliers, missing data (data scarcity), concept drift, or even instances of adversarial attacks. The MLMM tool optionally utilizes ensemble techniques for the production machine learning models and/or the detection engines to provide enhanced prediction and detection accuracy. The MLMM tool also provides features that detect ways to improve pipeline performance once machine learning models have already been deployed.
Various implementations have been specifically described. However, other implementations that include a fewer, or greater, number of features and/or components for each of the apparatuses, methods, or other embodiments described herein are also possible.
This application claims priority to U.S. Provisional Patent Application No. 62/856,904, filed on Jun. 4, 2019, U.S. Provisional Patent Application No. 62/963,961, filed on Jan. 21, 2020, and U.S. Provisional Patent Application No. 62/966,410, filed on Jan. 27, 2020, the entireties of which are incorporated herein by reference.