The present disclosure relates generally to methods, systems, and apparatuses for producing virtual sensors in a quasi-optimal fashion for a collection of related devices. The disclosed techniques may be applied to, for example, generate virtual sensors that detect vehicle engine characteristics, distribute these virtual sensors to existing engines, or install these virtual sensors in new engines before those engines are distributed.
Virtual sensors are software-based measurement systems that produce a real-time measurement of a quantity. Two motivating factors drive the substitution of physical sensors with virtual sensors. First, physical sensors may be expensive to purchase, install, or maintain. Second, it may be inconvenient or impractical to measure a particular quantity directly with a physical sensor. In both cases, the physical sensor can be approximated by a set of less expensive, more convenient physical sensors together with a model that generates the virtual sensor values from their readings.
If there is sufficient a priori knowledge of a particular device, and the accompanying physics of the dynamics within the device, it may be possible to produce an analytic model of the virtual sensor from the actual sensors. For example, the temperature at every point in a vehicle engine may be approximated by a few temperature sensors accompanied by knowledge of the temperature diffusion characteristics of the materials in the engine.
However, in many cases, the appropriate background knowledge, or the expertise to convert this knowledge into a working model, is not present. In these cases, an inductive (or empirical) approach is warranted. Such an approach first constructs a model of the virtual sensor values from the actual sensors, and then continuously produces a series of scores (i.e., predictions) from this model meant to represent the value of the virtual sensor. The inductive approach, while not relying on subject-matter expertise, does rely on a training signal in order to construct the appropriate model. This training signal can only be derived from an actual physical sensor. Thus, even though the goal is to replace the physical sensor with a model, one cannot eliminate the use of the target physical sensor entirely.
Conventional virtual sensor techniques employ a number of different supervised machine learning methods to construct such a model, including neural networks and support vector machines, although in principle any inductive technique may be used. Time-series methods, such as deep learning approaches that take into account past values of the actual sensors to predict the future value of the virtual sensor, may improve predictive accuracy. Regardless of the accuracy of the inductive technique, however, conventional virtual sensor implementations largely ignore the fact that these supervised models of necessity rely on a training signal that originates from the actual physical sensor to be replaced. This means that this presumably expensive sensor must be present at least during the training period, cancelling out any cost savings obtained by eliminating it.
Accordingly, it is desired to exploit the similarity between devices in an ecosystem in order to minimize the use of actual physical sensors, while accurately reproducing target sensor values.
Embodiments of the present invention address and overcome one or more of the above shortcomings and drawbacks by providing methods, systems, and apparatuses related to producing a virtual sensor model and distributing this model to a collection of related devices. More specifically, the techniques described herein provide a solution to the more general problem of creating a virtual sensor for a distributed set of devices that are similar but not necessarily identical in nature. Optimizations with respect to training time, the number of physical sensors needed for this training, and the number and cost of actual sensors needed to generate the virtual sensor are also provided by the techniques described herein.
According to some embodiments, a computer-implemented method for producing an inductive virtual sensor model of a target device within an ecosystem of devices includes a computing system identifying a first subset of devices in the ecosystem of devices. Each device in the first subset comprises a target sensor and a plurality of additional sensors. The computing system collects target sensor data from the target sensor of each device in the first subset of devices, and additional sensor data from the additional sensors of each device in the first subset. A predictive model is trained to predict the target sensor data based on the additional sensor data. The computing system identifies a second subset of devices in the ecosystem of devices lacking the target sensor. Each device in the second subset of devices comprises the plurality of additional sensors. The computing system distributes the predictive model to each device in the second subset of devices.
According to other embodiments of the present invention, a computer-implemented method for producing an inductive virtual sensor model of a target device within an ecosystem of devices includes a computing system grouping devices in the ecosystem of devices into a type-of hierarchy of devices. The computing system determines a measure of similarity between each node in the type-of hierarchy. A first subset of devices in the ecosystem of devices is identified. Each device in the first subset comprises a target sensor and a plurality of additional sensors. The computing system collects target sensor data from the target sensor of each device in the first subset of devices, as well as additional sensor data from the additional sensors of each device in the first subset. A predictive model is trained to predict the target sensor data based on the additional sensor data. The computing system determines a first node in the type-of hierarchy corresponding to the first subset of devices, as well as a second node in the type-of hierarchy. The second node is selected such that the measure of similarity between the first node and the second node is above a predetermined threshold. The computing system identifies a second subset of devices in the ecosystem of devices that (a) correspond to the second node in the type-of hierarchy; (b) lack the target sensor; and (c) comprise the plurality of additional sensors. The computing system may then distribute the predictive model to each device in the second subset of devices.
In some embodiments of the second method described above, after collecting the additional sensor data, the computing system generates a listing of possible combinations of the additional sensors. Then, the computing system applies a heuristic search algorithm to the listing of possible combinations to identify an optimal combination of the additional sensors with respect to (i) number of sensors and (ii) ability to predict the target sensor data based on the additional sensor data. The computing system can then train the predictive model to predict the target sensor data based on the additional sensor data corresponding to the optimal combination of the additional sensors, rather than the full set of additional sensor data.
Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments that proceeds with reference to the accompanying drawings.
The foregoing and other aspects of the present invention are best understood from the following detailed description when read in connection with the accompanying drawings. For the purpose of illustrating the invention, there are shown in the drawings embodiments that are presently preferred, it being understood, however, that the invention is not limited to the specific instrumentalities disclosed. Included in the drawings are the following Figures:
Systems, methods, and apparatuses are described herein which relate generally to various techniques for the optimization of virtual sensing in a multi-device environment (referred to herein as an “ecosystem”). More specifically, the techniques described herein may be applied to create a virtual sensor model for multiple devices of identical or similar behaviors such that the number of physical sensors needed to create this model is minimized. Other optimizations, including reduction in the number and cost of the physical sensors predicting the virtual sensor values, are also provided in various embodiments discussed herein.
The table below shows a hypothetical flat view of the training problem within a single device 100:
For simplicity, the table only shows 5 discrete time steps. Each sensor i has a value s_ij at time j. In addition, the target sensor (i.e., the sensor to be virtualized) has a value t_j for time j.
For the purposes of training the virtual sensor model, this table can be rotated into the more typical form below:
The purpose of the training is to produce a model that takes the s_ij values and produces a virtual value for the target sensor t_j at each time step j. In practice, many more time steps would be used in training than are shown here. If these data allow a sufficiently accurate model, then the physical sensor producing the last column in this table can subsequently be removed and replaced by the model itself.
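For illustration, the following minimal Python sketch (using the pandas library and entirely hypothetical sensor values) builds the flat view and rotates it into the canonical training form just described:

```python
import pandas as pd

# Hypothetical flat view: one row per sensor, one column per time step j.
flat = pd.DataFrame(
    {
        "j=1": [1.2, 0.4, 7.1, 55.0],
        "j=2": [1.3, 0.5, 7.0, 56.2],
        "j=3": [1.1, 0.4, 7.3, 54.8],
        "j=4": [1.4, 0.6, 6.9, 57.1],
        "j=5": [1.2, 0.5, 7.2, 55.9],
    },
    index=["s1", "s2", "s3", "target"],
)

# Rotate into canonical form: one row per time step, with the predicting
# sensor values s_ij as features and the target value t_j as the label.
canonical = flat.T
X = canonical[["s1", "s2", "s3"]]
y = canonical["target"]
```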
Once the data is in this canonical form, various inductive training algorithms generally known in the art can be used to create a predictive model that generates target sensor values from the values of the other sensors. Example predictive models that may be used in different embodiments of the present invention include, without limitation, linear regression, neural networks, support vector machines, and gradient boosting machines. The choice of inductive model will entail a trade-off between training time, the distribution of data in the training space, and the degree of predictive accuracy required.
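Continuing the sketch above, any standard regressor can stand in for the inductive model; a gradient boosting machine is used here purely as an illustration:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Fit an illustrative inductive model on the canonical table built above.
# In practice, far more rows and a held-out validation set would be used.
model = GradientBoostingRegressor().fit(X, y)

# At scoring time, the model (rather than a physical sensor) supplies the
# target value from the other sensors' current readings.
virtual_reading = model.predict(X.iloc[[0]])
```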
In some embodiments, past values of the sensors may contain information that improves the accuracy of the model. For example, in the table below, the model is trained to generate a target sensor value t_j not only from the current sensor values s_ij but also from the two previous values s_i(j-1) and s_i(j-2). For the purposes of training, rows corresponding to values occurring before recording began (shown here as “-”) can be ignored. Alternatively, values such as the per-sensor mean can be imputed for these empty cells, and training can then include such rows.
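A sketch of this expansion, continuing the example above; note that no lagged copies of the target column are created, for the reason given in the next paragraph:

```python
def add_lags(table, sensors=("s1", "s2", "s3"), n_lags=2):
    """Append lagged columns s_i(j-1), s_i(j-2), ... for each predicting sensor."""
    out = table.copy()
    for s in sensors:
        for lag in range(1, n_lags + 1):
            out[f"{s}(j-{lag})"] = out[s].shift(lag)
    return out

lagged = add_lags(canonical)

# Option 1: ignore rows that predate the start of recording (the "-" cells).
trainable = lagged.dropna()

# Option 2: impute the per-sensor mean into those cells and keep every row.
imputed = lagged.fillna(lagged.mean())
```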
Time-series methods generally known in the art, such as the deep-learning long short-term memory (LSTM) algorithm, may then be used to produce such a model from such a table. Note, however, that unlike a typical time-series problem, which can profit from the so-called autoregressive past values of the data stream to be predicted, past values of the target sensor do not find their way into the transformed table in virtual sensor time-series training. The reason is that once the sensor is removed, these values will be unavailable; that is, there will be no physical sensor present generating past values, and the model must therefore rely on past (or present) values of the other sensors s1-s3 alone.
It is important to realize that, regardless of the transforms and algorithms applied, the values in the column for the virtual target sensor can only be filled in by having an actual physical sensor in place for a period of time. This forms the crux of the problem addressed by the technology described herein. Specifically, given a collection of devices with similar operating characteristics, one would have to install an actual sensor in each device to obtain the training column, and then remove it, to produce the set of virtual sensors for these devices. However, this defeats the purpose of replacing an actual sensor with a virtual model, because this method would require n physical target sensors for n devices and would result in no savings whatsoever.
One solution is to move a single actual sensor from device to device and train the virtual model in a sequential fashion. But this is costly from the point of view of both logistics and time. Consider that there may be n devices, each with an average training time of m seconds and a mean transfer-and-installation time of p seconds. The total training time over the set of n devices is then n × (m + p). One could decrease the total training time by using more than one actual sensor at once, training on one subset, moving to a new subset, and so on until all devices are trained; however, this entails greater investment in the presumably expensive physical sensor to be virtualized.
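For instance, under hypothetical figures of n = 1,000 devices, an average training time of m = 86,400 seconds (one day), and a mean transfer-and-installation time of p = 43,200 seconds (half a day), sequential training with a single sensor would require 1,000 × (86,400 + 43,200) ≈ 1.3 × 10^8 seconds, or roughly four years; even ten physical sensors operating in parallel would still require several months.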
The method described herein addresses this problem by aggregating training data collected from a small subset of the devices into a single training table, as illustrated below:
As before, this table is a simplified version of the actual training data; in most actual cases, there would be many more than 3 training devices and many more than 5 examples from each. In addition, this table may be expanded in a time-series fashion as described above. This illustrates the fundamental aggregation technique: rows of data, each comprising the values of the predicting sensors together with a single target sensor value, may be extracted from each device and appended to form a larger table. This table then constitutes a larger training set than could be produced from a single device, and therefore yields a model with presumably greater generalizability to the devices for which training is not explicitly carried out.
Typically, the first column, holding the device number, will be ignored in this training process, as it cannot be generalized to other devices. However, in some embodiments, informative static features may be included in the aggregation. This is illustrated in the table below, which shows the identical table from above augmented by a feature indicating whether the device is in a factory setting.
Note that this static feature does not change from time step to time step within a device. It may vary from device to device, however, and this variation may be useful in generalizing to new devices on which the virtual sensor has not been explicitly trained.
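A minimal sketch of this aggregation, using randomly generated stand-ins for the per-device canonical tables and a hypothetical factory/field static feature:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_device_table(n_rows=5):
    # Stand-in for a device's canonical table (as built in the earlier sketch).
    df = pd.DataFrame(rng.random((n_rows, 3)), columns=["s1", "s2", "s3"])
    df["target"] = df.sum(axis=1) + rng.normal(scale=0.1, size=n_rows)
    return df

# Hypothetical static feature: whether each device operates in a factory.
in_factory = {"dev1": True, "dev2": False, "dev3": True}

frames = []
for dev, flag in in_factory.items():
    frame = make_device_table()
    frame["factory"] = flag   # constant within a device, varies across devices
    frame["device"] = dev     # bookkeeping only; dropped before training
    frames.append(frame)

# Append the per-device rows into one larger aggregated training table.
aggregated = pd.concat(frames, ignore_index=True)
X = aggregated.drop(columns=["target", "device"])
y = aggregated["target"]
```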
In some embodiments, the locus of aggregation will be the cloud rather than an explicit data repository in a server physically connected to the devices in question. The advantages of aggregating data and training the model in the cloud are numerous: the ability to easily combine data from diverse geographic locations, the ability to process large amounts of data during training without purchasing permanent processing power for this spurt of activity, and access to a number of standard database and training regimes without the need to maintain them.
In yet another embodiment, the selection of an adequate subsample of devices is made empirically. Recall that the goal is to reduce the number of devices involved in training as much as possible, in order to minimize training cost. However, if too few devices are used, the resulting virtual sensor model will not generalize well to the untrained devices. There may be operating conditions, for example, that do not appear in a single device or a small number of devices because of changes arising in the manufacturing process or differences in background conditions. When these conditions arise on the untrained devices, predictive accuracy may be poor because such conditions were never encountered during training.
Once a virtual sensor model has been created, regardless of both the method for doing so and the physical substrate upon which this is carried out, the model can be distributed to the other devices for which training was not explicitly carried out. This distribution can be done through the normal channels for distributing software, or via downloading from the cloud, as appropriate. Various formats generally known in the art may be used for distributing the model. For example, in some embodiments, the model is distributed via the Predictive Model Markup Language (PMML). Formats such as PMML have the benefit of efficiently compressing a potentially large amount of information. It should be noted that, regardless of the format employed for distribution, the target devices must be equipped to run models encoded in the distribution format.
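As an illustration only, the sklearn2pmml package (one of several PMML exporters; it requires a Java runtime) can serialize a scikit-learn model for distribution:

```python
from sklearn.linear_model import LinearRegression
from sklearn2pmml import PMMLPipeline, sklearn2pmml

# Wrap the virtual sensor model in a PMML pipeline and serialize it to a
# file that can be pushed to any device equipped with a PMML scoring engine.
pipeline = PMMLPipeline([("virtual_sensor", LinearRegression())])
pipeline.fit(X, y)
sklearn2pmml(pipeline, "virtual_sensor.pmml")
```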
It should also be noted that this distribution can include future devices that have not yet been built (assuming they are of the same type), which extends the range of applicability of this method. For example, if the original data was derived from a subset of 50 engines out of a total of 1000, the virtual sensor model can be inherited not only by the remaining 1000 − 50 = 950 engines, but also by new engines of this type constructed after the training takes place.
In the discussion presented above, it was assumed that all devices were identical in nature. However, there may be cases wherein devices that are similar but not identical can benefit from the created sensor model. For example, two engine models from the same product family may share nearly all of their components and sensors, and a virtual sensor model trained on data from one model may then be transferred to the other.
This analysis also may be extended to more complex hierarchies, although more caution should be exercised when the distance in the hierarchy between devices is greater. For example, the mere fact that two different engines are produced by the same manufacturer will not, in general, be sufficient motivation for the transfer of virtual sensor models between such engines, unless there is sufficient a priori knowledge to conclude that they are sufficiently similar or the virtual sensor model produced from the data from one engine is validated on data from the other engine.
In some embodiments, each physical device may include a file indicating its compatibility with certain models or model types. As a particular device is connected to the network or otherwise gets activated, the computing system performing the modeling may retrieve the file and decide how to model the device. To continue with the engine example above, a newly activated engine's compatibility file may indicate which related engine models' virtual sensor models it is capable of executing, allowing the computing system to distribute an existing model rather than train a new one.
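A hypothetical compatibility file of this kind (all field names here are illustrative, not part of any standard) might look as follows:

```python
import json

compat = {
    "device_model": "engine-B7",
    "compatible_virtual_sensor_models": ["engine-B5", "engine-B6", "engine-B7"],
    "supported_model_formats": ["PMML"],
}

# The computing system would retrieve and parse this file on activation to
# decide whether an existing model can be distributed to the device.
print(json.dumps(compat, indent=2))
```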
The techniques described above are concerned with the production of a virtual sensor model from the entire collection of physical sensors present on a device. When the aggregated data is translated into tabular format as in the tables presented above, this becomes a matter of applying a standard predictive algorithm to this table. But, as in any learning process, some columns of the table may add little or nothing to the predictive power of the resulting model, and may be removed. In this case, the corresponding sensors may also be removed, unless they have an intrinsic value. That is, unless a sensor is needed to predict the virtual sensor value, or has some other use, such a sensor may be eliminated on all devices, resulting in further cost savings.
In one embodiment, columns in the training table, and the corresponding sensors, can be eliminated by removing all those whose mutual information with respect to the training column (i.e., the virtual sensor column) falls below a given threshold, or by matrix reduction techniques such as principal component analysis (PCA) that retain only those columns whose contribution to the principal components is above a given threshold.
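A sketch of the mutual-information variant, continuing the aggregated example above (the cutoff value is hypothetical):

```python
from sklearn.feature_selection import mutual_info_regression

# Estimate mutual information between each predicting column and the target.
mi = mutual_info_regression(X, y, random_state=0)
threshold = 0.05  # hypothetical cutoff; tuned per application
X_reduced = X.loc[:, mi >= threshold]  # drop low-information sensor columns
```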
In another embodiment, the optimal set of columns (and therefore the optimal set of predictive sensors) is not determined by removing columns before training as described above. Rather, a search of the training space is conducted over candidate sets of predictive sensors, and the set that provides the highest predictive accuracy with the smallest number of sensors is retained. That is, multiple trainings are conducted, each corresponding to one element of the powerset of the predictive sensors. For example, with 3 sensors a, b, and c, the powerset (excluding the empty set) comprises the 7 elements {{a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}}. The table below shows this powerset, together with hypothetical Pearson correlation values between the predicted and actual virtual sensor values.
In this case, the set {b,c} is the most accurate of the two-sensor pairs. The full set {a,b,c} adds only marginally to the predictive power of the model, so the presumably less expensive set {b,c} may be used to predict the virtual sensor, unless sensor a has intrinsic value and is therefore in use already. Because there are on the order of 2^n elements in this powerset, where n is the number of predictive sensors, exhaustive search becomes very costly for n greater than about 5. Thus, in some embodiments, approximate heuristic methods can be used. One such method is a beam search that retains the best m single sensors initially, then the best m two-sensor combinations, and so on, until the desired number of sensors is reached. Other heuristic search techniques generally known in the art may be applied in other embodiments.
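The following sketch shows both the exhaustive powerset evaluation and a simple beam search, assuming the aggregated X and y from the earlier sketches, linear models as the inductive technique, and enough rows for five-fold cross-validation:

```python
from itertools import combinations

from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

SENSORS = ["s1", "s2", "s3"]

def score(subset):
    """Pearson correlation between predicted and actual target values."""
    preds = cross_val_predict(LinearRegression(), X[list(subset)], y, cv=5)
    return pearsonr(preds, y)[0]

# Exhaustive search: evaluate every non-empty element of the powerset.
powerset = [c for r in range(1, len(SENSORS) + 1)
            for c in combinations(SENSORS, r)]
exhaustive_best = max(powerset, key=score)

# Beam search: keep only the best m candidates at each level instead.
def beam_search(sensors, m=2):
    beam = sorted(((s,) for s in sensors), key=score, reverse=True)[:m]
    best = beam[0]
    for _ in range(len(sensors) - 1):
        grown = {tuple(sorted(set(c) | {s})) for c in beam
                 for s in sensors if s not in c}
        beam = sorted(grown, key=score, reverse=True)[:m]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best
```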
In another embodiment, the utility function guiding the search is not the total number of sensors but the total cost. Here, the objective is to produce the most accurate virtual sensor model such that the combined cost of the predictive sensors is under a pre-specified threshold. As before, a number of heuristic search techniques generally known in the art can be used. In the ideal case, the result of this search process will be a collection of low-cost sensors that accurately predicts the values of the expensive sensor being virtualized. For example, suppose the maximum allowable cost is $42. The table below shows both the predictive power, as revealed by the Pearson correlation between predicted and actual sensor values, and the cost of each sensor combination.
The cost threshold eliminates the last two rows of the table, and the best predictive set of the remaining rows will therefore be {a,c}.
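A self-contained sketch of this cost-constrained selection, with hypothetical correlations and sensor prices chosen to mirror the discussion above:

```python
# Hypothetical Pearson correlations and per-sensor costs (illustrative only).
accuracy = {("a",): 0.61, ("b",): 0.70, ("c",): 0.68,
            ("a", "b"): 0.75, ("a", "c"): 0.84,
            ("b", "c"): 0.88, ("a", "b", "c"): 0.89}
costs = {"a": 10, "b": 25, "c": 20}
budget = 42  # maximum allowable combined sensor cost

# Discard combinations over budget ({b,c} at 45 and {a,b,c} at 55), then
# keep the most accurate remaining combination: ("a", "c").
affordable = [s for s in accuracy if sum(costs[x] for x in s) <= budget]
best_affordable = max(affordable, key=accuracy.get)
```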
In another embodiment, the utility function is a combination of cost and predictive accuracy, and the search is conducted with this as the guiding evaluation function. Again, the goal is to predict the virtual sensor at low cost, but in this case, depending on the weight given to predictive accuracy in the utility function, medium-cost solutions with greater accuracy may be preferred. In some embodiments, this search, or the analogous search for a minimal number of sensors regardless of cost, may be accelerated by distributing each element in the powerset to a unique processor for evaluation; these evaluation steps are completely independent of each other and therefore completely parallelizable.
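Continuing the sketch above, a weighted utility can replace the hard budget, and because each subset is scored independently the evaluation parallelizes trivially. The weight of 0.9 is hypothetical, and in a full implementation each call would train and validate a model rather than look up a stored correlation:

```python
from concurrent.futures import ProcessPoolExecutor

def utility(subset, weight=0.9):
    # Weighted trade-off between predictive accuracy and normalized cost.
    cost = sum(costs[x] for x in subset) / sum(costs.values())
    return weight * accuracy[subset] - (1 - weight) * cost

if __name__ == "__main__":
    subsets = list(accuracy)
    # One independent evaluation per processor; no coordination is needed.
    with ProcessPoolExecutor() as pool:
        utilities = list(pool.map(utility, subsets))
    best_by_utility = subsets[utilities.index(max(utilities))]
```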
At step 510, with the first subset identified, the computing system collects target sensor data from the target sensor of each device in the first subset of devices. Then, at step 515, additional sensor data is collected from the additional sensors of each device in the first subset. In some embodiments, each device is configured to push its sensor data to the computing system as it is generated or at regular intervals (e.g., hourly). In other embodiments, the computing system communicates with each device (e.g., over the Internet) to retrieve one or more files including the sensor data. Once the additional sensor data is retrieved, at step 520, a predictive model is trained to predict the target sensor data based on the additional sensor data.
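A minimal sketch of the push-style collection mentioned above, with a hypothetical ingestion endpoint (any transport would serve equally well):

```python
import requests  # assumed available; any HTTP client would do

def push_readings(device_id, readings,
                  url="https://modeling-host.example/ingest"):
    # Each device posts its latest readings as they are generated,
    # or on a fixed schedule (e.g., hourly).
    payload = {"device": device_id, "readings": readings}
    requests.post(url, json=payload, timeout=10)
```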
Once the predictive model is trained, it may be used to create a “virtual sensor” that takes the place of the target sensor. Continuing with reference to the method described above, at step 525, the computing system identifies a second subset of devices in the ecosystem of devices that lack the target sensor, where each device in the second subset comprises the plurality of additional sensors. Then, at step 530, the computing system distributes the predictive model to each device in the second subset of devices.
Continuing with reference to the method 600, at step 605, the computing system groups the devices in the ecosystem of devices into a type-of hierarchy of devices. At step 610, the computing system identifies a first subset of devices in the ecosystem of devices, where each device in the first subset comprises a target sensor and a plurality of additional sensors. At steps 615 and 620, the computing system collects target sensor data from the target sensor, and additional sensor data from the additional sensors, of each device in the first subset.
At step 625, the computing system trains a predictive model to predict the target sensor data based on the additional sensor data. Techniques for training predictive models from a set of training data vary with the type of model employed; however, these techniques are generally known in the art and are thus not detailed herein. In some embodiments, a subset of the additional sensor data may be used rather than all of the sensor data. This may be especially useful in instances where there are a large number of additional sensors and a correspondingly large set of additional sensor data. To reduce the amount of additional sensor data needed for model training, the computing system may first generate a listing of possible combinations of the additional sensors. Then, the computing system can apply a heuristic search algorithm (e.g., beam search) to the listing in order to identify an optimal combination of the additional sensors with respect to (i) the number of sensors and (ii) the ability to predict the target sensor data based on the additional sensor data.
As noted above, devices in the first subset correspond to a first node in the type-of hierarchy generated at step 605. At step 630, the computing system determines the measure of similarity between the first node and other nodes in the type-of hierarchy (e.g., based on the data calculated at step 605). Next, at step 635, the computing system selects a second node in the type-of hierarchy having a measure of similarity above a threshold value. This threshold value may be set during each execution of the method 600 (for example, by a user), or a fixed value may be employed. If multiple nodes are above the threshold value, the node with the maximum measure of similarity may be selected, or other selection mechanisms may be used (e.g., random selection). Once the second node has been selected, at step 640, the computing system identifies a second subset of devices in the ecosystem of devices that meet three criteria: (a) each device corresponds to the second node in the type-of hierarchy; (b) each device lacks the target sensor; and (c) each device comprises the plurality of additional sensors. Once the second subset of devices is identified, at step 645, the computing system distributes the predictive model to each device in the second subset of devices. The computing system may use techniques similar to those discussed above with respect to step 530 of the method described above.
Parallel portions of a big data platform and/or big simulation platform may be executed on the platform 700 as “device kernels” or simply “kernels.” A kernel comprises parameterized code configured to perform a particular function. The parallel computing platform is configured to execute these kernels in an optimal manner across the platform 700 based on parameters, settings, and other selections provided by the user. Additionally, in some embodiments, the parallel computing platform may include additional functionality to allow for automatic processing of kernels in an optimal manner with minimal input provided by the user.
The processing required for each kernel is performed by a grid of thread blocks (described in greater detail below). Using concurrent kernel execution, streams, and synchronization with lightweight events, the platform 700 may parallelize portions of the processing described herein, such as the independent evaluations of candidate sensor subsets discussed above.
The device 710 includes one or more thread blocks 730 which represent the computation unit of the device 710. The term thread block refers to a group of threads that can cooperate via shared memory and synchronize their execution to coordinate memory accesses. For example, the threads of a given thread block 730 may execute concurrently while exchanging intermediate results through a memory region private to that block.
Continuing with reference to the platform 700, each thread block 730 is scheduled onto a processing unit of the device 710, and multiple thread blocks may execute concurrently, with the grid of blocks sized to cover the data being processed.
Each thread can have one or more levels of memory access. For example, in the platform 700, each thread may have three levels of memory access: its own registers, which only that thread may access; the shared memory of its thread block, which every thread in that block may access; and the global memory of the device 710, which all threads on the device may access. Register access is the fastest of these levels, while global memory access is the slowest.
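By way of illustration, a minimal device kernel written with Numba's CUDA bindings (an assumption; the platform could equally be programmed in CUDA C) shows the threads of one block cooperating through shared memory, synchronizing their accesses, and writing a per-block result to global memory:

```python
import numpy as np
from numba import cuda, float32

THREADS_PER_BLOCK = 128  # hypothetical block size

@cuda.jit
def block_sum(data, partial):
    # Shared memory: visible only to threads within this thread block.
    shared = cuda.shared.array(THREADS_PER_BLOCK, float32)
    tid = cuda.threadIdx.x
    i = cuda.grid(1)
    shared[tid] = data[i] if i < data.size else 0.0
    cuda.syncthreads()  # synchronize the block's memory accesses
    stride = THREADS_PER_BLOCK // 2
    while stride > 0:  # tree reduction within the block's shared memory
        if tid < stride:
            shared[tid] += shared[tid + stride]
        cuda.syncthreads()
        stride //= 2
    if tid == 0:
        partial[cuda.blockIdx.x] = shared[0]  # block result to global memory

data = np.random.rand(1 << 20).astype(np.float32)
blocks = (data.size + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
partial = np.zeros(blocks, dtype=np.float32)
block_sum[blocks, THREADS_PER_BLOCK](data, partial)
assert np.isclose(partial.sum(), data.sum(), rtol=1e-3)
```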
The embodiments of the present disclosure may be implemented with any combination of hardware and software. For example, aside from the parallel processing architecture presented above, standard computing platforms (e.g., servers or desktop computers) may be specially configured to perform the techniques discussed herein.
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
An executable application, as used herein, comprises code or machine readable instructions for conditioning the processor to implement predetermined functions, such as those of an operating system, a context data acquisition system or other information processing system, for example, in response to user command or input. An executable procedure is a segment of code or machine readable instruction, sub-routine, or other distinct section of code or portion of an executable application for performing one or more particular processes. These processes may include receiving input data and/or parameters, performing operations on received input data and/or performing functions in response to received input parameters, and providing resulting output data and/or parameters.
A graphical user interface (GUI), as used herein, comprises one or more display images, generated by a display processor and enabling user interaction with a processor or other device and associated data acquisition and processing functions. The GUI also includes an executable procedure or executable application. The executable procedure or executable application conditions the display processor to generate signals representing the GUI display images. These signals are supplied to a display device which displays the image for viewing by the user. The processor, under control of an executable procedure or executable application, manipulates the GUI display images in response to signals received from the input devices. In this way, the user may interact with the display image using the input devices, enabling user interaction with the processor or other device.
The functions and process steps herein may be performed automatically or wholly or partially in response to user command. An activity (including a step) performed automatically is performed in response to one or more executable instructions or device operation without user direct initiation of the activity.
The system and processes of the figures are not exclusive. Other systems, processes and menus may be derived in accordance with the principles of the invention to accomplish the same objectives. Although this invention has been described with reference to particular embodiments, it is to be understood that the embodiments and variations shown and described herein are for illustration purposes only. Modifications to the current design may be implemented by those skilled in the art, without departing from the scope of the invention. As described herein, the various systems, subsystems, agents, managers and processes can be implemented using hardware components, software components, and/or combinations thereof. No claim element herein is to be construed under the provisions of 35 U.S.C. 112(f) unless the element is expressly recited using the phrase “means for.”
This application claims priority to U.S. Provisional Patent Application Ser. No. 62/567,147, filed on Oct. 2, 2017, the entire contents of which are hereby incorporated by reference herein.