This application is related to U.S. patent application Ser. No. 17/663,423, entitled UNSUPERVISED LEARNING FOR REAL-TIME DETECTION OF DANGEROUS CORNERING EVENTS IN FORKLIFT TRAJECTORIES FOR EDGE-LOGISTICS ENVIRONMENTS, filed 14 May 2022, which is incorporated herein in its entirety by this reference.
Embodiments of the present invention generally relate to the implementation and use of machine learning models. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods, for the detection of drift in the operation of machine learning models.
Machine learning model drift detection in fully unsupervised domains has proven to be a difficult problem. Taking, for example, the operation of mobile edge nodes such as forklifts operating in a warehouse environment, labeled data acquisition in the context of dangerous cornering detection would require acquiring and annotating the mobile device trajectories and all their correlated sensor information, such as positioning and acceleration for example, in order to be able to determine that a dangerous cornering event is occurring, or is about to occur.
Such an approach is not feasible, however. First, the volume of data may be too large for practical labeling by human experts. Further, the raw sensor data is not easily interpretable, and the detection of cornering events in that raw sensor data is a challenge in itself. As well, dangerous cornering events cannot be easily generated on-demand in the environment. It is infeasible and impractical, for example, to ask an operator to carelessly drive a mobile edge device such as a forklift so as to enable the generation and collection of data concerning anomalous cornering events, or to have an autonomous vehicle operate repeatedly in a real environment, and in every configuration of unsafe behavior possible, to generate a sufficiently representative training set for a machine learning model.
Still another problem is that the absence of labels, that is, event indications in the training data, affects the training of the event detection model itself, as building predictive (supervised) machine learning models requires the collection of a large amount of labeled data. In contrast, an unsupervised model may be used for real-time event detection.
Finally, and with particular reference to the task of drift detection, labels may become available only after the events of interest have taken place, if they become available at all. This means that the performance of an event detection model is not trivially verified, and also means that the performance of the model cannot be used to assess drift in a straightforward fashion.
In order to describe the manner in which at least some of the advantages and features of the invention may be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.
Embodiments of the present invention generally relate to the implementation and use of machine learning models. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods, for the detection of drift in the operation of machine learning models. Some particular embodiments may be employed in connection with unsupervised event-detection machine learning (ML) models, which may be deployed in mobile edge devices that are operating in an edge computing environment, and that are operable to communicate with one or more near-edge nodes, and a central node.
In general, example embodiments of the invention may employ an autoencoder-based approach for event detection in unsupervised domains. For the purposes of illustration only, reference is made herein to an example use-case of a large-scale logistics warehouse in which mobile entities, such as forklifts for example, are equipped with sensors and operate as far-edge nodes in relation to a near-edge local infrastructure. Example embodiments may operate to perform model drift detection in these scenarios. Example embodiments may operate to determine, possibly in real time, whether a proportion of reconstruction errors has changed from an expected baseline proportion of reconstruction errors. Such determinations may be based on a known proportion of events of interest in the domain of interest, as well as the performance of the autoencoder in reconstructing normative samples.
Embodiments of the invention, such as the examples disclosed herein, may be beneficial in a variety of respects. For example, and as will be apparent from the present disclosure, one or more embodiments of the invention may provide one or more advantageous and unexpected effects, in any combination, some examples of which are set forth below. It should be noted that such effects are neither intended, nor should be construed, to limit the scope of the claimed invention in any way. It should further be noted that nothing herein should be construed as constituting an essential or indispensable element of any invention or embodiment. Rather, various aspects of the disclosed embodiments may be combined in a variety of ways so as to define yet further embodiments. Such further embodiments are considered as being within the scope of this disclosure. As well, none of the embodiments embraced within the scope of this disclosure should be construed as resolving, or being limited to the resolution of, any particular problem(s). Nor should any such embodiments be construed to implement, or be limited to implementation of, any particular technical effect(s) or solution(s). Finally, it is not required that any embodiment implement any of the advantageous and unexpected effects disclosed herein.
For example, an embodiment may operate to perform drift detection with respect to the performance of an ML model that is operating to implement unsupervised event detection in a domain. Thus, such embodiments may enable improvements to the performance of such an ML model, and/or the performance of an associated mobile edge device, by determining whether the ML model is drifting or not. A drifting ML model may then be refined, or replaced. Various other advantageous aspects of example embodiments will be apparent from this disclosure.
It is noted that embodiments of the invention, whether claimed or not, cannot be performed, practically or otherwise, in the mind of a human. Accordingly, nothing herein should be construed as teaching or suggesting that any aspect of any embodiment of the invention could or would be performed, practically or otherwise, in the mind of a human. Further, and unless explicitly indicated otherwise herein, the disclosed methods, processes, and operations, are contemplated as being implemented by computing systems that may comprise hardware and/or software. That is, such methods, processes, and operations, are defined as being computer-implemented.
Example embodiments may generally be concerned with the task of drift detection in the performance of ML models, or simply ‘models.’ Some particular embodiments may be concerned with detecting drift in unsupervised event-detection models applied over sensor streams in edge environments.
As a representative use-case, not limiting in any way the scope of the invention, example embodiments may consider the task of detecting dangerous cornering events in trajectories of mobile devices, such as forklifts for example, at the far edge, that is, in an edge computing environment.
One particular challenge in real-time event detection at the edge is that labels are typically not available regarding data that is generated and/or collected by an edge device. However, conventional supervised learning approaches require data labels, or simply ‘labels,’ indicating the events of interest.
Because, as noted, labels may not be available in some circumstances, an unsupervised approach to event detection in an edge environment may be required. One possible implementation of such an unsupervised approach is disclosed in the ‘Related Application’ identified herein. In that example approach, an autoencoder may be trained using a training set X which includes only normative behavior, that is, for example, behavior that comprises normal, safe, mobile edge device cornering events. When applied over a test set Y, the autoencoder may yield reconstruction errors for each sample. Samples with a high reconstruction error, above a predetermined threshold for example, may be assigned as relevant events, that is, potentially dangerous cornering events. Some example embodiments of the invention may employ an approach such as is disclosed in the ‘Related Application,’ and may further perform drift detection over, that is, with respect to, the event-detection.
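By way of illustration only, and not as a description of the particular autoencoder of the ‘Related Application,’ the thresholding scheme just described may be sketched as follows. A linear (PCA-based) reconstruction stands in here for a trained autoencoder, and the function names and percentile-based threshold are illustrative assumptions:

```python
import numpy as np

# Linear (PCA-based) reconstruction standing in for a trained autoencoder.
def fit_reconstructor(X, k=2):
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T                    # mean and principal directions

def reconstruction_errors(Y, mu, W):
    codes = (Y - mu) @ W                   # 'encode'
    Y_hat = mu + codes @ W.T               # 'decode'
    return np.linalg.norm(Y - Y_hat, axis=1)

# Train on a normative set X only; flag test samples whose reconstruction
# error exceeds a threshold taken from the training-error distribution.
def detect_events(X_train, Y_test, k=2, pct=99):
    mu, W = fit_reconstructor(X_train, k)
    thresh = np.percentile(reconstruction_errors(X_train, mu, W), pct)
    return reconstruction_errors(Y_test, mu, W) > thresh
```

In a deployment, the reconstruction step would be performed by the trained autoencoder itself, and the threshold would be chosen according to the domain of interest.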
The ‘Related Application’ discloses a representative embodiment of a cornering detection approach via anomaly detection. The approach performs a training stage, leveraging the data collected at a near-edge node. This is represented in the configuration 100 of
In the example of
The training stage may further comprise, in the illustrative example of mobile edge devices operating in a warehouse, a cornering detection algorithm 104. The cornering detection algorithm may operate to capture triplets of positioning data, along with associated inertial measurements, and may operate to compose a training set of cornering events.
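A simplified, hypothetical sketch of such a cornering detection step is shown below. The heading-change criterion and the 30-degree threshold are assumptions made purely for illustration, and are not a description of the algorithm 104 itself:

```python
import numpy as np

def detect_cornering(positions, angle_thresh_deg=30.0):
    """Flag indices where the heading change between consecutive
    displacement vectors exceeds a threshold (hypothetical criterion)."""
    v = np.diff(positions, axis=0)               # displacement vectors
    headings = np.arctan2(v[:, 1], v[:, 0])      # heading of each segment
    dh = np.abs(np.rad2deg(np.diff(headings)))   # heading change per step
    dh = np.minimum(dh, 360.0 - dh)              # wrap to [0, 180] degrees
    return np.where(dh > angle_thresh_deg)[0] + 1
```

In practice, the detected positions would be combined with the associated inertial measurements to compose the training-set samples.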
As well, the training stage may comprise an autoencoder model that may be trained to reconstruct typical cornering events. In some instances, a reconstruction error distribution may additionally be obtained for the typical cornering events in the training set 106. The trained model may then be deployed to each mobile entity for online decision making, leveraging the sensor data stream at the entity itself, as disclosed in
Particularly, the configuration 200 of
The approach referred to in
Drift detection is an area of significant research in machine learning, and there are many approaches applicable for the detection of nonstationary distributions comprising data and/or concept drift. At least some example embodiments are concerned particularly with unsupervised drift detection in streaming data, such as may be generated by edge devices such as sensors for example. At least some embodiments of the invention may consider the number of samples in an uncertainty region of an event detection model. In contrast with approaches which determine similar uncertainty regions for classifier models, example embodiments may implement a fully unsupervised approach. That is, an event detection model according to some embodiments of the invention may, itself, be unsupervised, and no labeled data is made available to that event detection model to correct or guide the drift detection approach over time.
Example embodiments of the invention may define, implement, and operate, a drift detection approach for an unsupervised event-detection model that is operating in an edge environment. Embodiments may comprise an offline training stage for the drift detection approach, which may coincide with the training stage for the event detection model itself, and an online drift detection stage, which may coincide with the online event detection inferencing stage.
The approach taken by some example embodiments may present various advantages. For example, this approach may be applicable to use cases in a fully-unsupervised domain. As another example, this approach may be applicable to the use case of event detection. Further, this approach may implement online determination of drift, albeit for a window of past samples. Further, this approach may operate unsupervised in a drift detection stage. Finally, this approach may not require the existence of a temporal relation between samples, but only the reconstruction errors, in the training stage.
The example embodiments disclosed herein may be useful in a variety of domains, at least insofar as such embodiments may operate to serve as a warning system of possible drift in the operation of an unsupervised event detection model.
Furthermore, example embodiments may be used in conjunction with other approaches, such that, for example, upon identification of a window of drift by an example embodiment, a more robust, and possibly more expensive or delayed approach, may be selectively applied to a particular model and/or set of circumstances.
As noted above, example embodiments may comprise an offline stage, which may be implemented in an edge environment, such as the example edge environment 300 disclosed in
In particular,
The edge device Ei, equipped with sensors 306, may operate to collect and process data as a sensor stream Si 308. The data from these sensor streams Si 308 may be collected over time at the central node ‘A’ 302 into a centralized repository 310. The management and orchestration of the repository 310, such as with the discarding of outliers, compression, or discarding of too-old samples, may be implemented in some embodiments.
As noted earlier, embodiments may obtain an event detection model ‘M’ 312. In the case of
Example implementations of a model ‘M’ 312 deployed at the edge nodes 304 may perform event detection and yield an event indication q for each appropriate input, which input may comprise a collection of sensor data and/or contextual information. In example embodiments, q may correspond to a reconstruction error yielded by an autoencoder, that is, a drift detection model, that is included as part of the model ‘M’ 312. In similar fashion to the sensor data, the event indications may be communicated and stored at the central node 302 in a repository 316. Note that q may indicate how likely it is that particular data correspond to a particular, possibly dangerous, event.
With reference now to
Regardless of the method for obtaining Z, embodiments may assume a proportion P=|Z|/|X ∪ Z| of anomalous events in the domain of interest, such as mobile edge devices operating in a warehouse for example. In the case (A) above, and with reference to
Notice that all these cases (A), (B), and (C), may provide data that may not be normative of the behavior in the domain, and hence may not be able to be used for training an autoencoder for event detection, nor do these cases provide labeled data for the purposes of training. Rather, the data Z 404 may simply comprise samples of synthetic data that may reasonably resemble actual samples, and may be likely to be poorly reconstructed by an autoencoder, as discussed below in connection with
Particularly, in case (A), Z may comprise mostly outliers from sensor readings and drop-off periods, for example. In case (B), the normative behavior of another instance of the problem is considered. In case (C), the data Z may be purely synthetic, and the model used to generate it may determine the resulting samples. Thus, it is noted that while embodiments may use “known” anomalous data, the use of that anomalous data may still satisfy the requirements for operating the ML model in a fully unsupervised domain.
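Purely by way of illustration, case (C) might be sketched as follows. The perturbation scheme, the function names, and the assumption that X and Z are disjoint when computing P, are hypothetical choices, not requirements of any embodiment:

```python
import numpy as np

def make_synthetic_anomalies(X, n, scale=10.0, seed=0):
    """Perturb randomly chosen normative samples far outside the
    observed per-axis range (one hypothetical generation scheme)."""
    rng = np.random.default_rng(seed)
    span = X.max(axis=0) - X.min(axis=0)
    base = X[rng.integers(0, len(X), size=n)]
    return base + rng.choice([-1.0, 1.0], size=base.shape) * scale * span

X = np.random.default_rng(1).normal(size=(95, 3))   # normative data
Z = make_synthetic_anomalies(X, n=5)                # synthetic anomalies
P = len(Z) / (len(X) + len(Z))                      # assumed proportion of anomalies
```

As discussed above, the samples in Z are likely to be poorly reconstructed by an autoencoder trained only on X, which is what makes them useful for determining the margin.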
With reference now to
Given the reconstruction error distributions, embodiments may define a margin of confidence over the results of the event detection model. In T. S. Sethi and M. Kantardzic, “On the reliable detection of concept drift from streaming unlabeled data,” Expert Systems with Applications (2017), an approach is proposed for detecting drift from data streams when labels are not readily available, considering uncertainty regions from supervised classifier models. Some embodiments of the invention may adapt, and extend, that approach to be applied over the reconstruction error distributions (see
In more detail, and with reference now to
Embodiments of the invention may then count the number of samples in the training dataset whose reconstruction errors E fall within the margin. Recall the discussion regarding cases (A), (B) and (C) above. The anomalies in Z may not be representative of different modes of operation in the domain. In case (C), for example, Z is directly related to the specification of the function for generating synthetic samples. Hence, while these samples may be useful for determining the margin, some example embodiments may consider the reference dataset X only, that is, only the original normative data samples. Embodiments may then denote the proportion of samples with reconstruction error within the margin in the reference set as r. Formally, r=|{x ∈ X : m⁻ ≤ E(x) ≤ m⁺}|/|X|, where m⁻ and m⁺ denote the lower and upper bounds of the margin.
This is represented in
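For illustration only, the margin and the reference proportion r may be sketched as follows. Taking the margin as the region between an upper tail of the normative error distribution and a lower tail of the anomalous-sample error distribution is one assumed criterion among many; the percentile parameters are likewise hypothetical:

```python
import numpy as np

def compute_margin(E_X, E_Z, lo_pct=95, hi_pct=5):
    """Margin between the upper tail of normative reconstruction errors
    E_X and the lower tail of anomalous-sample errors E_Z."""
    m_lo = np.percentile(E_X, lo_pct)
    m_hi = np.percentile(E_Z, hi_pct)
    return min(m_lo, m_hi), max(m_lo, m_hi)

def proportion_in_margin(E, m_lo, m_hi):
    # r: fraction of reference errors falling within the margin
    return float(np.mean((E >= m_lo) & (E <= m_hi)))
```

Consistent with the discussion above, the proportion r would be computed over the reference set X only, even though both error distributions inform the margin.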
In some example embodiments, the online stage for drift detection may coincide with the online event detection inferencing stage. As an extension to the operations performed for event detection, embodiments of the invention may operate to continuously keep track of rcurr as the current proportion of samples being observed, or extracted from a sensor stream, whose reconstruction errors E fall within the margin.
When the value of rcurr is much larger, or much smaller, than the reference margin r, drift may be signaled. The reasoning is that if a larger than expected, or smaller than expected, number of samples are ambiguous, the distribution of the reconstruction errors of the observed samples differs from that which was observed during the training period. This may mean that the underlying distribution of the data itself is changed, or that the autoencoder model no longer works as expected for the current data stream. Thus, if the margin m increases, the model may be drifting. As such, example embodiments may specify a maximum acceptable margin m. The acceptable size of a margin m may be a function of various considerations such as, but not limited to, the application domain where the model M is deployed.
A maximum change δ may be assumed to be provided to the method as an argument. A potential drift may be signaled, for example, when |rcurr−r|>δ. In alternative embodiments, different values for the maximum positive and negative changes δ may be determined.
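For illustration, the online tracking of rcurr and the δ-based signaling may be sketched as follows. The sliding-window bookkeeping, the class name, and the window size are assumed implementation details:

```python
from collections import deque

class DriftMonitor:
    """Track rcurr over a sliding window of reconstruction errors and
    signal drift when |rcurr - r| exceeds a tolerance delta."""

    def __init__(self, m_lo, m_hi, r_ref, delta, window=100):
        self.m_lo, self.m_hi = m_lo, m_hi
        self.r_ref, self.delta = r_ref, delta
        self.errors = deque(maxlen=window)   # most recent reconstruction errors

    def update(self, e):
        # Feed one reconstruction error; return True if drift is signaled.
        self.errors.append(e)
        in_margin = sum(self.m_lo <= x <= self.m_hi for x in self.errors)
        r_curr = in_margin / len(self.errors)
        return abs(r_curr - self.r_ref) > self.delta
```

Each call to update may be performed alongside the event detection inference for the same sample, so that drift monitoring adds little overhead to the online stage.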
Embodiments may additionally determine a series of windows w0, w1, . . . and keep a corresponding series of values r0, r1 . . . , such that ri denotes the proportion of samples, among those extracted from window wi, whose reconstruction errors fall within the margin. In this case, embodiments of the invention may indicate which windows of recent samples yield a different proportion of samples within the margin.
For example, the drift indication may be triggered only by a sequence of windows showing a proportion smaller or greater than r by at least δ. Alternative approaches that consider overlapping windows, as well as other drift parameters such as drift start, drift end, and drift duration, may be used for determining a likely drift duration.
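A hypothetical sketch of the consecutive-window criterion follows. The non-overlapping window partitioning and the run length k are assumptions made for illustration:

```python
def window_proportions(errors, m_lo, m_hi, win):
    """Proportion r_i of in-margin errors for each non-overlapping window w_i."""
    return [sum(m_lo <= e <= m_hi for e in errors[i:i + win]) / win
            for i in range(0, len(errors) - win + 1, win)]

def drift_windows(r_series, r_ref, delta, k=3):
    """Window indices at which k consecutive windows deviate from r_ref by >= delta."""
    run, flagged = 0, []
    for i, ri in enumerate(r_series):
        run = run + 1 if abs(ri - r_ref) >= delta else 0
        if run >= k:
            flagged.append(i)   # a drift run of length k is confirmed here
    return flagged
```

The flagged indices identify the windows of recent samples that yield a deviating proportion, which may in turn bound the likely drift duration.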
Finally, as part of the online stage, example embodiments of the invention may consider the updating of the drift detection model D. This may take place in two scenarios:
As will be apparent from this disclosure, example embodiments of the invention may possess various useful features and advantages. For example, embodiments may provide for drift detection in the performance of unsupervised event detection models. An embodiment may be applicable to the use case of unsupervised event detection at edge domains. An embodiment may enable online determination of drift, albeit for a window of past samples. An embodiment may operate in an unsupervised manner both in training, and in the drift detection stage. Finally, embodiments may not require temporal relations between samples, but only the reconstruction errors, in the training stage.
It is noted with respect to the disclosed methods, including the example method of
Directing attention now to
With reference now to the particular example of
Using the reconstruction errors determined at 804 and 806, a margin may then be defined 808. The margin may define a range of reconstruction error values, and a size of the margin may be defined according to any suitable criteria. Thus, the margin may be referred to as a reference margin.
Finally, given a number of incoming data samples from the unsupervised event detection model to the drift detection model, a proportion of those data samples may have a reconstruction error that falls within the range of reconstruction errors defined by the margin. When that proportion is much larger, or much smaller, than a threshold value or range, drift in the performance of the unsupervised event detection model may be signaled 810.
When drift is signaled 810, for example, because the performance of the model is outside an acceptable range, various actions may be taken. For example, the unsupervised event detection model may be refined to improve its performance. Alternatively, the unsupervised event detection model may be replaced with a different model. Note that a certain amount of drift may be deemed to be acceptable, so long as, for example, the drift is within a defined range.
Following are some further example embodiments of the invention. These are presented only by way of example and are not intended to limit the scope of the invention in any way.
Embodiment 1. A method, comprising: receiving a stream of unlabeled data samples from a model; obtaining a first reconstruction error for the unlabeled data samples; obtaining a second reconstruction error for a set of normative data; defining a margin based on the first reconstruction error and the second reconstruction error; computing an initial proportion of samples from the set of normative data whose reconstruction errors fall within a range of reconstruction errors defined by the margin; computing a new proportion of unlabeled data samples that fall within the range of reconstruction errors defined by the margin; and signaling drift in the performance of the model when said new proportion differs from said initial proportion by more than a predefined tolerance threshold.
Embodiment 2. The method as recited in embodiment 1, wherein the model is an unsupervised event detection model operable to detect events in a domain in which mobile edge devices are deployed.
Embodiment 3. The method as recited in any of embodiments 1-2, wherein when the drift is signaled, the model is retrained, and the margin and proportion are recomputed.
Embodiment 4. The method as recited in any of embodiments 1-3, wherein the stream of unlabeled data samples is generated by one or more mobile edge nodes.
Embodiment 5. The method as recited in any of embodiments 1-4, wherein prior to receiving the stream of unlabeled data samples, a model that performs the signaling of the drift is trained using a combination of anomalous data and the normative data.
Embodiment 6. The method as recited in any of embodiments 1-5, further comprising comparing a sequence of differences between the current proportions and the initial proportion to determine a drift in the performance of the model.
Embodiment 7. The method as recited in any of embodiments 1-6, wherein boundaries of the margin are defined by a plot of the second reconstruction error.
Embodiment 8. The method as recited in any of embodiments 1-7, wherein the model is deployed at each of a plurality of edge nodes.
Embodiment 9. The method as recited in any of embodiments 1-8, wherein the stream of unlabeled data samples comprises data about a movement and/or a position of a physical mobile edge device.
Embodiment 10. The method as recited in any of embodiments 1-9, wherein a size of the margin is variable based on constraints associated with an application domain where the model is deployed.
Embodiment 11. A system, comprising hardware and/or software, operable to perform any of the operations, methods, or processes, or any portion of any of these, disclosed herein.
Embodiment 12. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising the operations of any one or more of embodiments 1-10.
The embodiments disclosed herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below. A computer may include a processor and computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein, or any part(s) of any method disclosed.
As indicated above, embodiments within the scope of the present invention also include computer storage media, which are physical media for carrying or having computer-executable instructions or data structures stored thereon. Such computer storage media may be any available physical media that may be accessed by a general purpose or special purpose computer.
By way of example, and not limitation, such computer storage media may comprise hardware storage such as solid state disk/device (SSD), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage devices which may be used to store program code in the form of computer-executable instructions or data structures, which may be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also embraces cloud-based storage systems and structures, although the scope of the invention is not limited to these examples of non-transitory storage media.
Computer-executable instructions comprise, for example, instructions and data which, when executed, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. As such, some embodiments of the invention may be downloadable to one or more systems or devices, for example, from a website, mesh topology, or other source. As well, the scope of the invention embraces any hardware system or device that comprises an instance of an application that comprises the disclosed executable instructions.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts disclosed herein are disclosed as example forms of implementing the claims.
As used herein, the term ‘module’ or ‘component’ may refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein may be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.
In terms of computing environments, embodiments of the invention may be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or other machine may reside and operate in a cloud environment.
With reference briefly now to
In the example of
Such executable instructions may take various forms including, for example, instructions executable to perform any method or portion thereof disclosed herein, and/or executable by/at any of a storage site, whether on-premises at an enterprise, or a cloud computing site, client, datacenter, data protection site including a cloud storage site, or backup server, to perform any of the functions disclosed herein. As well, such instructions may be executable to perform any of the other operations and methods, and any portions thereof, disclosed herein.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.