The present application relates to a method for determining a sensor configuration in a vehicle which includes a plurality of sensors.
Modern vehicles include a large number of sensors for detecting a variety of state variables, as for example rotational speeds of wheels, shafts, gears etc., temperature, force, torque, voltage, current, acceleration about roll axis, pitch axis, yaw axis, etc. Further, vehicles sometimes include sensors that determine a location of the vehicle, or a distance of the vehicle from other vehicles or from obstacles. Other sensors are cameras that detect visual or non-visual images, for example rear view cameras, infrared cameras, etc. The sensors are based on a variety of different technologies, like for example rotary encoders, temperature probes, voltmeters, radar transmitters and receivers, CCD chips, etc.
The large number of sensors in a vehicle contributes to the weight, the complexity and the costs of the vehicle.
The present application aims to at least partially solve the above problems.
The above object may be achieved by a method for determining a sensor configuration in a vehicle which includes a plurality of sensors, comprising the steps of: establishing a preliminary sensor configuration for the vehicle, which sensor configuration includes a first number of real sensors, each of which outputs a real sensor signal; determining whether at least one of the real sensors can be replaced by a virtual sensor; and changing the preliminary sensor configuration into a final sensor configuration which includes a second number of real sensors and at least one virtual sensor, wherein the second number is smaller than the first number.
A real sensor is a piece of hardware that measures a certain state variable, particularly a physical quantity, as for example a rotational speed, a force, a torque, light, etc.
A virtual sensor is a software module that receives at least one measurement signal from a real sensor and optionally other parameters and/or variables or signals, and calculates a physical target value from these inputs, preferably in real time.
An idea of the present application is to find an optimum sensor configuration of both real and virtual sensors in the vehicle, namely to replace as many real sensors as possible by virtual sensors and preferably to find an optimum balance between the accuracy that can be achieved by the virtual sensors and the costs caused by the real sensors.
The step of determining whether at least one of the real sensors can be replaced by a virtual sensor includes preferably the use of artificial intelligence, particularly the use of machine learning technology.
In many cases, real sensor signals are recorded and then evaluated. The recording of the real sensor signals may be conducted during a test run of the vehicle, wherein the evaluation of the recorded real sensor signals is conducted subsequently on a stationary evaluation computer.
In an alternative embodiment, the recording of the real sensor signals is conducted during a test run of the vehicle, wherein the evaluation of the recorded real sensor signals and the replacement of at least one of the real sensors by a virtual sensor is conducted during the test run on a mobile evaluation computer.
In addition, it is possible to conduct at least one of the recording of the real sensor signals, the evaluation of the recorded real sensor signals and the replacement of at least one of the real sensors by a virtual sensor on a simulation computer.
The use of a mobile evaluation computer has the advantage that the impact of the replacement of a real sensor on the vehicle behavior can be immediately experienced. On the other hand, the use of a simulation computer has the advantage that real test drives can be dispensed with.
In some evaluation examples, the step of determining whether at least one of the real sensors can be replaced by a virtual sensor is not conducted for each of the real sensors. Rather, some of the real sensors may be categorized as “irreplaceable”, due to safety considerations, for example. Secondly, some sensors are very cheap and have a low weight. Therefore, one might consider conducting the evaluation of whether a certain real sensor can be replaced by a virtual sensor only in the case that the real sensor has a significant weight and/or significant costs. Further, some real sensors in certain environments can be defined as “must be replaced”. This applies for example to development environments, in which the preliminary sensor configuration includes not only sensors that are to be realized in the vehicle that is being produced, but also sensors that are set up and connected for development purposes only. These “development sensors” are not available anymore in the series-production vehicle, and are therefore considered to be “must be replaced”.
In addition, accuracy of the virtual sensor may be a relevant consideration, as well as a time delay which a virtual sensor might have in comparison to the real sensor. The time delay might be caused by complex calculations on the basis of the inputs to the virtual sensor. On the other hand, some virtual sensors may not be as accurate as the real sensor which is replaced by this virtual sensor. The loss of accuracy and the time delay may have an impact on the vehicle behavior which, in some cases, is to be analyzed and evaluated as well in order to determine whether the replacement of a real sensor is possible or not. The question of whether a real sensor can be replaced by a virtual sensor is therefore often not a clear yes or no but a matter of considering several boundary conditions that might, in addition, be weighted in order to arrive at a preferred final sensor configuration.
In addition, while it is possible to replace a real sensor for cost reasons, the present application might also be used in order to not replace a real sensor but create a secondary—virtual—sensor for the real sensor, so as to improve redundancy and possibly safety of the sensor configuration.
The object is achieved in full.
In a preferred embodiment, the determining step includes recording the real sensor signals of at least a subset of the first number of real sensors, evaluating the recorded real sensor signals in order to determine whether at least a first one of the real sensors can be replaced by a first virtual sensor that receives at least one real sensor signal from a second real sensor and outputs a virtual sensor signal that emulates the real sensor signal of the first real sensor.
As discussed above, the evaluating step can be conducted in a number of different ways.
In a preferred embodiment, the evaluating step includes the use of a Boltzmann machine having a number of visible nodes, each visible node representing a real sensor, and having a number of hidden nodes, the hidden nodes being computed by exploiting combinations of nodes.
The use of a Boltzmann machine in the evaluating step is a brute force approach. The Boltzmann machine is an undirected generative stochastic neural network that can learn the probability distribution over its set of inputs. It is always capable of generating different states of a system.
A Boltzmann machine is able to represent any system with many states given infinite training data. In the present case, the system at first represents the preliminary sensor configuration. The visible nodes are features/inputs to the system which are the real sensors in the vehicle. The hidden nodes are nodes to be trained that will identify and exploit the combination of the visible nodes. Essentially, a Boltzmann machine tries to learn how the nodes are influencing each other by estimating the weights in their edges (edges resemble the conditional probability distributions).
In theory, once the model is trained, the Boltzmann machine is capable of reconstructing all sensors given only one sensor. In other words, this theoretical approach would lead to a concept wherein only one single physical sensor is necessary in order to reconstruct all other sensors in a vehicle.
While in theory the Boltzmann machine is a great model and can solve many problems, in practice it is very difficult to implement. This is due to the required computation power, because increasing the number of nodes leads to a steep (quadratic) increase of the number of edges/connections. If a preliminary sensor configuration uses 200 sensors in a vehicle, and if an additional 400 hidden nodes are added, then the number of edges will be 600×(600−1)/2=179,700 edges.
Therefore, typically a restricted Boltzmann machine (RBM) is used, where nodes of the same type do not connect to each other. This concept trades performance for the ability to actually run the computations. The RBM is trained and predicts in the same way as the Boltzmann machine, using a contrastive divergence algorithm.
A Boltzmann machine and a restricted Boltzmann machine are structures which might not respect the temporal dependency in a time series. Therefore, the Boltzmann machine that is used is preferably a Recurrent Temporal Restricted Boltzmann Machine, since this type of Boltzmann machine can be used when dealing with signals and time series. The recurrent temporal restricted Boltzmann machine (RTRBM) uses recurrent neurons as memory cells that remember the path, and uses backpropagation through time within a contrastive divergence algorithm to train the model. A very advanced and powerful type of an RTRBM is an RNN-Gaussian dynamic Boltzmann machine, which is preferably used to model the sensor configuration.
In general, and as explained above, the step of determining whether at least one of the real sensors can be replaced by a virtual sensor is a function of an accuracy of the virtual sensor and/or of a driving behavior of the vehicle and/or of the costs of the real sensor to be replaced.
It is preferred here if the accuracy of the virtual sensor and/or the driving behavior of the vehicle and/or the costs of the real sensor to be replaced are weighted and combined into a target value.
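As an illustration only, such a weighted target value could be computed as in the following Python sketch; the criteria, their normalization and the weights are hypothetical and would have to be chosen for the specific project.

```python
def replacement_score(accuracy, behavior_impact, sensor_cost,
                      w_acc=0.5, w_beh=0.3, w_cost=0.2):
    """Combine the weighted criteria into a single target value.

    accuracy:        expected accuracy of the virtual sensor (0..1)
    behavior_impact: how little the replacement disturbs the driving
                     behavior (0 = severe impact, 1 = no impact)
    sensor_cost:     normalized cost of the real sensor (0..1);
                     expensive sensors are more attractive to replace
    The weights are purely illustrative and would be tuned per project.
    """
    return w_acc * accuracy + w_beh * behavior_impact + w_cost * sensor_cost

# Example: consider the real sensor for replacement if the combined score
# exceeds an (again illustrative) threshold.
if replacement_score(accuracy=0.92, behavior_impact=0.85, sensor_cost=0.7) > 0.8:
    print("candidate for replacement by a virtual sensor")
```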
The above concepts of determining whether at least one of the real sensors can be replaced by a virtual sensor are more or less based on a brute force approach. However, there are also ways to conduct this step on the basis of a causation analysis.
Accordingly, it is preferred if the determining step includes detecting and recording the outputs of at least a subset of the real sensors for a predetermined number of temporally subsequent sampling steps, and conducting a causation analysis which determines causations between the recorded outputs of the real sensors.
The term causation or causality is to be understood as a relationship between causes and effects. The basic question in this approach is whether and to which extent one real sensor causes another real sensor. The terms causation, causality and correlation are used within this application in an exchangeable manner. For each of those terms, the broadest interpretation is to be applied.
Further, in the present application, if one sensor output causes another sensor output, this means essentially that the other sensor output is dependent on the one sensor output.
Preferably, the outputs of all sensors of the preliminary sensor configuration are passed to an algorithm which will decide whether sensors can be replaced, and which preferably is able to build a model of the sensor (the virtual sensor) that replaces the real sensor.
In a preferred embodiment, it can be assumed that the preliminary sensor configuration forms a sensor space X. A dependency graph, which reflects the dependencies or causations between the real sensors, has a plurality of edges, which can be built by the following statement:
{∀x, ∀y ∈ X; E_{x,y} = C(x, y) + f | x ≠ y}
In this statement, C is a bivariate causation function, f is a penalty factor taking for instance cost or safety aspects (like redundancy) into account, and E_{x,y} are the coefficients of the edges.
Some measures that are said to measure causal relations are Granger causality, Transfer Entropy, Convergent Cross Mapping, and Mutual information, while correlations can be estimated by Pearson correlation or autocorrelation algorithms.
For each mentioned measure one has to specify a maximum lag (shift). The lag typically corresponds to a certain number of temporally spaced samples. Within one lag, the number of samples of different sensors may be different, because different sensors may have different sampling frequencies. For example, if one signal has a sampling time of 10 ms (corresponding to a sampling frequency of 100 Hz) and if another signal has a sampling time of 100 ms, a lag of 10 would mean a time period of 100 ms being considered for the first signal, and a time period of 1 s for the second signal. For the causation calculation itself, it does not matter whether the signals have the same time basis or not. This might, however, become relevant for the training of a virtual sensor later on.
The result of the causation metrics is a matrix, which preferably includes the relations between the sensors. The relations or values or causations of the matrix are preferably normalized or standardized so that a maximum causation has a value of 1 and a minimal causation has a value of 0.
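As a minimal sketch of how such a normalized matrix could be built, the following Python code uses a simple lagged Pearson correlation as a stand-in for the bivariate causation function C(x, y); in practice, one of the causation measures listed above (e.g. Granger causality or Transfer Entropy) would be used instead, and the function and variable names are illustrative.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Stand-in for the bivariate causation function C(x, y): the strongest
    absolute Pearson correlation of y with a lagged copy of x."""
    best = 0.0
    for lag in range(1, max_lag + 1):
        c = np.corrcoef(x[:-lag], y[lag:])[0, 1]
        if np.isfinite(c):
            best = max(best, abs(c))
    return best

def causation_matrix(signals, max_lag=10):
    """signals: dict mapping sensor name -> 1D numpy array (same length)."""
    names = list(signals)
    m = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j:
                m[i, j] = lagged_correlation(signals[a], signals[b], max_lag)
    # normalize so that the maximum causation is 1 and the minimum is 0
    if m.max() > m.min():
        m = (m - m.min()) / (m.max() - m.min())
    return names, m
```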
In a preferred embodiment, the causations between the recorded outputs of the real sensors are determined for at least a subset of the samples, wherein the causations determined for the subset of samples are subjected to a post-processing in order to determine a final causation set or matrix between the recorded outputs of the real sensors.
A subset of samples can be defined to be a signal vector of the form [lag; now], where lag ≤ max lag.
Further, in a preferred embodiment, a Directed Cyclic Graph (DCG) is established on the basis of the determined causations. The weights in the directed edges explain “how much the sensor causes/correlates to the other sensor”. One can detect cycles in a graph using a Depth-First Search (DFS) algorithm.
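A standard DFS-based cycle check on such a directed graph could look as follows; the graph representation (a dictionary of successor lists) and the sensor names are only illustrative.

```python
def has_cycle(graph):
    """graph: dict mapping node -> iterable of successor nodes (directed edges).
    Returns True if the directed graph contains at least one cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def dfs(node):
        color[node] = GREY                          # node is on the current DFS path
        for succ in graph.get(node, ()):
            if color.get(succ, WHITE) == GREY:      # back edge found -> cycle
                return True
            if color.get(succ, WHITE) == WHITE and dfs(succ):
                return True
        color[node] = BLACK                         # node and its descendants are done
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

# Example: X1 -> X2 -> X3 -> X1 forms a cycle.
print(has_cycle({"X1": ["X2"], "X2": ["X3"], "X3": ["X1"]}))  # True
```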
In order to find out which real sensor can be replaced best, it is preferred if the DCG is converted into a Directed Acyclic Graph (DAG), wherein either the real sensor with the highest or the one with the lowest causation is taken as a root for the DAG.
One can use several algorithms in order to convert the DCG into the DAG. One simple strategy is to take the most dependent sensor as a root and to build the tree from there.
The directed acyclic graph is a tree having a root, a stem and, finally, leaves. For example, sensors can be replaced by removing the leaves of the tree. Each sensor at the leaves will go through a model identification pipeline where the target value is the sensor signal that is to be reconstructed, and wherein the inputs are the corresponding parents in the tree. Further, one can remove more levels, but it should be borne in mind that the more levels are removed, the less accurate the reconstruction becomes.
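One possible, greatly simplified way of converting the causation matrix (DCG) into such a tree and of collecting its leaves as replacement candidates is sketched below; the greedy (Prim-like) strategy and the choice of the root via the row sums are assumptions, and dedicated maximum-spanning-arborescence algorithms could be used instead.

```python
import numpy as np

def dcg_to_dag_tree(names, m):
    """Greedy (Prim-like) conversion of a causation matrix into a tree (DAG).
    names: list of sensor names; m[i, j] = causation of sensor j by sensor i.
    Returns the tree as {child: parent} and the leaf sensors."""
    n = len(names)
    # Root: sensor with the highest total outgoing causation; the column sums
    # could be used instead if the most dependent sensor is preferred as root.
    root = int(np.argmax(m.sum(axis=1)))
    parent = {names[root]: None}
    in_tree = {root}
    while len(in_tree) < n:
        best, best_i, best_j = -1.0, None, None
        for i in in_tree:                 # strongest edge leaving the current tree
            for j in range(n):
                if j not in in_tree and m[i, j] > best:
                    best, best_i, best_j = m[i, j], i, j
        parent[names[best_j]] = names[best_i]
        in_tree.add(best_j)
    nodes_with_children = set(parent.values()) - {None}
    leaves = [s for s in names if s not in nodes_with_children]
    return parent, leaves

# The leaves (the least depended-upon sensors) are the first candidates for
# replacement by a virtual sensor trained on their parents.
```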
As another example, it is possible to identify replaceable sensors by using a graph ranking algorithm on the DCG, in order to identify the most important sensors based on their outgoing and incoming causation edges. One such algorithm is the personalized PageRank algorithm.
Correspondingly, it is preferred if at least one real sensor which forms a leaf or a root, respectively, in the DAG is determined to be replaceable.
When a DCG is established, it is an alternative preferred approach to compute a rank matrix from the DCG, wherein at least one real sensor is determined to be low rank and thus replaceable.
The rank matrix may be computed on the basis of a ranking algorithm. One example of such a ranking algorithm is the PageRank algorithm as it is used in search engines.
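A sketch of such a ranking, assuming the networkx library and a causation matrix as built in the earlier sketch, is given below; the rank threshold is purely illustrative.

```python
import networkx as nx

def rank_sensors(names, m, threshold=0.1):
    """Rank sensors with PageRank on the weighted causation graph (DCG).
    Sensors with a rank below the (illustrative) threshold are candidates
    for replacement by a virtual sensor."""
    g = nx.DiGraph()
    g.add_nodes_from(names)
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j and m[i, j] > 0.0:
                g.add_edge(a, b, weight=float(m[i, j]))
    ranks = nx.pagerank(g, alpha=0.85, weight="weight")
    low_rank = [s for s, r in ranks.items() if r < threshold]
    return ranks, low_rank
```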
As another preferred example for using a DCG, it is possible to generate a stochastic probabilistic process from the DCG, wherein a state of at least one real sensor can be reached by the state of another real sensor and can thus be determined to be replaceable.
A stochastic probabilistic process can be realized by a probability algorithm, as for example Markov Chain Monte Carlo (MCMC).
Further, it is preferred if a mathematical model for the real sensor that has been determined to be replaceable is determined on the basis of a statistical or deterministic approach (algorithm).
Particularly, for each pruned leaf from the previous step, the leaf is taken as a label and the causation branches (starting from the root to the leaf) are taken as features to train a model. As an advantageous way to identify a model that is able to replace the given sensor (i.e. to turn the real sensor into a virtual sensor), any statistical or deterministic algorithm that can learn the representation of signals can be used to build the model that will be used to reconstruct the sensor.
For example, a neural network architecture called Time Delayed Neural Network (TDNN) can be used.
Such a network is a feed forward neural network that can be applied to time series. The general architecture will be used for all pruned leaves. However, the hyperparameters of the model should be optimized using optimization algorithms, such as Grid search, Random search or Bayesian hyperparameter optimization, in order to help the general algorithm architecture to be specific for the given problem.
For the example of the TDNN, the following hyperparameters can be optimized: number of neurons, number of layers, drop-out rate, etc.
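A minimal TDNN sketch in Keras, assuming windows of the parent-sensor signals as input, could look as follows; the number of filters, number of layers, kernel size and drop-out rate are exactly the kind of hyperparameters mentioned above, and their values here are arbitrary.

```python
import tensorflow as tf

def build_tdnn(lag, n_inputs, n_filters=32, n_layers=2, dropout=0.1):
    """Small TDNN sketch: 1D convolutions over a window of `lag` past samples
    of the parent sensors, predicting the current value of the target sensor."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv1D(n_filters, kernel_size=3, padding="causal",
                                     activation="relu",
                                     input_shape=(lag, n_inputs)))
    model.add(tf.keras.layers.Dropout(dropout))
    for _ in range(n_layers - 1):
        model.add(tf.keras.layers.Conv1D(n_filters, kernel_size=3,
                                         padding="causal", activation="relu"))
        model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(1))        # reconstructed sensor value
    model.compile(optimizer="adam", loss="mse")
    return model

# Illustrative usage: model = build_tdnn(lag=10, n_inputs=3)
#                     model.fit(windows, targets, epochs=20, validation_split=0.2)
```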
Once the model is trained and evaluated against a test set, it is preferred if weights and parameters of the model are extracted and if the prediction of the model is calculated through feedforward calculation.
When using the approach of a causation analysis, there is one important aspect, which is that a causation can depend on the current system state. For example, the speeds involved with a transmission of the car might have the following causalities: while a starting clutch is closed, there is a high causality between the engine speed and the wheel speed. On the other hand, when the clutch is open, the causality is lower. There are several approaches to tackle this issue, like for instance:
The whole method is highly parallelizable, where building the graph, building the model for each pruned leaf and model optimization can be multi-threaded.
Further, there are many ways to set the maximum lag. One heuristic approach is suggested by Schwert (1989) as a rule of thumb. It is calculated as follows:
max_lag = ⌊12 × (T/100)^0.25⌋
where T is the number of observations in a signal, i.e. the length of the signal. As mentioned, the Schwert rule of thumb is an ad hoc approach, and getting the lag value right is challenging, because too small lag values will bias the statistical test, whereas too large values will reduce the power of the statistical test. There are many publications that suggest that it is better to err on the side that includes too many lags (type 2 error).
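A direct transcription of the rule of thumb, assuming that the brackets denote the integer part, is:

```python
import math

def schwert_max_lag(n_observations):
    """Schwert (1989) rule of thumb for the maximum lag."""
    return int(math.floor(12 * (n_observations / 100) ** 0.25))

print(schwert_max_lag(400))   # 12 * 4**0.25 ≈ 16.97 -> 16
```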
In another preferred aspect of the application, the determining step includes detecting and recording the outputs of at least a subset of the real sensors, and conducting a causation analysis which determines causations between the recorded outputs of the subset of real sensors, wherein the causation analysis includes building a component-wise neural network, CWNN, where each real sensor of the subset of real sensors corresponds to one of the components of the CWNN, wherein each component is formed by a virtual sensor which is trained so as to emulate a respective real sensor.
The virtual sensor is preferably a sub-model of the neural network. In a preferred embodiment, the training step uses the outputs of some or each of the other real sensors of the subset of real sensors. Further, past outputs of the real sensor to be emulated (the so-called target) may be used as well for training the virtual sensor.
The virtual sensors (the sub-models) of the neural network may be trained individually or all together.
Preferably, the training step includes applying a sparsity inducing penalty to respective first hidden layers of at least some of the virtual sensors.
Preferably, the sparsity inducing penalty is applied to the respective first hidden layers of each of the virtual sensors. When applying the sparsity inducing penalty, similar features are grouped together using a parameter tying technique, and features that do not Granger-cause the target are zeroed out.
In one embodiment it is preferred if the sparsity inducing penalty is chosen from the family of Group Lasso regularizations.
In another preferred embodiment the sparsity inducing penalty is chosen from the family of Group Order Weighted Lasso (GrOWL) regularizations.
In addition, it is preferred if the sparsity inducing penalties are optimized using a sparsity inducing optimizer so as to generate a sparse model.
Here, it is preferred if the sparse model is optimized using a semi-stochastic Proximal Gradient Descent, SPGD, algorithm.
In an alternative preferred embodiment, the sparse model is optimized using a Follow the Regularized Leader, FtRL, algorithm.
In this preferred aspect of the application, it is generally preferred if a causation vector is computed for each trained virtual sensor (sub-model), wherein the causation vectors are then concatenated to generate a causation matrix.
In this case, it is preferred if computing the causation vectors for the respective sub-models includes:
Here, it is advantageous if ranking the clusters by importance is done by a permutation test method.
In an alternative embodiment, ranking the clusters by importance is done by a Zero-out method.
It will be understood that the features of the application mentioned above and those yet to be explained below can be used not only in the respective combination indicated, but also in other combinations or in isolation, without leaving the scope of the present invention.
Exemplary embodiments of the application are explained in more detail in the following description and are represented in the drawings, in which:
In
The rear wheels 16L, 16R are driven wheels driven by a drive train 18.
The drive train 18 includes an internal combustion engine 20 and a transmission arrangement 24. The internal combustion engine 20 and the transmission arrangement 24 are preferably connected via a clutch arrangement 22, for example a starting clutch.
Typically, the transmission arrangement 24 includes multiple shiftable gear stages 25 for establishing a number of gear stages.
An output of the transmission arrangement 24 is connected to a differential 26 which is adapted to distribute drive power to the driven rear wheels 16L, 16R.
The vehicle 10 includes a number of sensors, for example an engine speed sensor 30 for detecting the rotary speed Seng of the internal combustion engine 20.
Further, the transmission arrangement 24 includes a first transmission speed sensor 32 which detects the speed of an input shaft of the transmission arrangement. Further, the transmission arrangement 24 includes a second transmission speed sensor 34 which detects a second transmission speed, for example the rotary speed ST, Strn of an output shaft of the transmission arrangement 24.
In addition, the drive train 18 may include a left driven wheel sensor 36 for measuring a rotary speed SL, Swl of the left driven rear wheel 16L, as well as a right driven wheel sensor 38 for detecting a rotary speed SR, Swr of the right driven wheel 16R.
The sensors 30 to 38 are connected to a controller 40, which can be a controller of the drive train 18. The controller 40 may be a multi-system controller, comprising for example a transmission controller, an internal combustion engine controller, etc.
Further, the vehicle 10 may include further sensors, for example an engine torque sensor 42 for detecting a torque provided by the internal combustion engine 20. Further sensors may include a clutch position sensor 44 for detecting a clutch position of the clutch arrangement 22, as well as one or more temperature sensors 46, for measuring for example the temperature of fluid in the transmission 24.
The vehicle 10 may include a large number of further sensors, which measure for example the rotary speed of electric motors for adjusting an inclination of a vehicle seat, a temperature sensor for measuring the temperature in a vehicle compartment, radar or LIDAR sensors for measuring distances, camera sensors for detecting the surroundings of the vehicle, and acceleration sensors for detecting roll movements, pitch movements and/or yaw movements. In addition, a number of electrical sensors for measuring electrical voltages, electrical currents etc. may be provided.
At least some of the sensors, preferably each of the sensors are connected to a controller of the vehicle, which might include the drive train controller 40 mentioned above.
In addition, any controller (for example controller 40) may be connected via a wireless communication 48 to a network 46 outside of the vehicle 10, for example the Internet, a GPS network, a cellular telephone network, a wireless local area (WLAN, Wifi) network, etc.
The evaluation computer 50 is connected to at least one of the controllers of the vehicle, for example the controller 40, and is adapted to conduct a method for determining a sensor configuration in the vehicle 10, which includes the plurality of sensors, including the steps of determining a preliminary sensor configuration for the vehicle, which preliminary sensor configuration includes a first number of real sensors, each of which outputs a real sensor signal, the step of determining whether at least one of the real sensors can be replaced by a virtual sensor, and the step of changing the preliminary sensor configuration into a final sensor configuration which includes a second number of real sensors and at least one virtual sensor, wherein the second number is smaller than the first number.
The method may be conducted in accordance with a number of different embodiments, some of which are explained below. The below embodiments mainly relate to a sensor configuration for the drive train 18. However, the embodiments that are presently applied to the drive train 18 may be applied to other parts of the vehicle 10 as well, for example to a navigational system configuration, to a temperature control configuration, etc.
In the diagrams, one assumes that the present time is t. Further, it is assumed that each of the sensors has the same sampling frequency, corresponding to an identical sampling period, although this is not necessary.
In many cases, there will be a so-called best lag 60, which corresponds typically to a number of single lags 56 and is smaller than the maximum lag 58. The best lag 60 corresponds to a window 54′. At present, the best lag 60 corresponds to eight single lags, i.e. to a time period from t to t-8.
The best lag can be determined by one or more of the following:
In the diagrams of
In a period from t-25 to t-20, SL deviates from ST and is larger than ST. Similarly SR is smaller than ST during the time period t-25 to t-20.
At t-15, the transmission output speed ST starts to decrease to zero. The output transmission speed of zero is achieved at t-10.
At this point, the vehicle is at a stop. Correspondingly, SL and SR are also zero.
If the driver wishes to start the vehicle again, he might experience, on a µ-split road, a situation where for example the right driven wheel speed SR remains at zero for a few samples, while the other driven wheel speed SL increases.
The right driven wheel speed SR remains at zero from t-10 to t-5 and then picks up speed again, for example due to a braking effect that an anti-slip control imparts onto the right driven wheel.
At t, the speeds ST, SL and SR are identical again.
From
Nevertheless, one can say that the driven wheel speeds SL, SR are causing the output transmission speed ST, at least for certain situations, and preferably for most of the time.
The question arises whether any of these three sensors, which are real sensors in the example of
The sensor configuration process includes the use of a so-called causation stage 72, into which are input real sensor signals and optionally other parameters. The causation stage 72 includes a causation matrix 74 that is established on the basis of the real sensor signals, which are looked at for a certain lag, ideally the best lag.
The causation matrix 74 is based on the causations between the recorded outputs of the real sensors, which are determined for at least a subset of the samples (e.g. a best lag), and wherein the causations determined for the subset of samples are subjected to a post processing in order to determine a final causation set or matrix between the recorded outputs of the real sensors.
In other words, the causation matrix is one representation of the result of a causation analysis which determines causations between the recorded outputs of the real sensors.
The causation matrix 74 is used to establish a directed cyclic graph (DCG).
In box 76 of the causation stage 72, a conversion process is conducted, in order to convert the DCG into a directed acyclic graph (DAG), wherein either the real sensor with the highest or the one with the lowest causation is taken as a root for the directed acyclic graph.
In the directed acyclic graph (DAG), at least one real sensor which forms a leaf or a root of the graph, respectively, is determined to be replaceable.
In other words, the causation stage 72 determines which of the real sensors can be replaced by a virtual sensor.
The output of the causation stage 72 is entered into a modeling stage 78, which is used for modeling a virtual sensor that shall replace a real sensor. The modeling stage 78 includes a model building process 80 in which a model of the virtual sensor is built. Further, the modeling stage 78 includes a model optimization process 82 in which the model of process 80 is optimized.
Finally, the virtual sensor is included in the final sensor configuration, which is shown at 84, on the basis of which code is generated for implementing the virtual sensor.
In line one, sensor X1 is shown to cause sensor X2 at a factor of 0.8 (causation 75a), and sensor X5 at a factor of 0.7, while X1 does not cause any of the sensors X3, X4, X6 at all.
The causation is typically a value between 0 and 1, wherein 0 means that a sensor does not cause another sensor at all. On the other hand, a “1” means that a sensor fully causes another sensor, so that the other sensor is redundant or even superfluous. In any case the other sensor can be replaced by the first sensor.
Another example is for instance that sensor X4 causes sensor X3 at a value of 0.4 (causation 75b), while sensor X3 causes X4 at a value of 0.7. These two sensors X3, X4 do not cause any of the other sensors.
In
In the directed cyclic graph (DCG), the weights in the directed edges explain “how much the respective sensor causes/correlates to the other sensor”. One can detect the cycles in a graph using a depth-first search (DFS) algorithm. In order to identify the dependency order of the sensors, the DCG must be transformed into a directed acyclic graph (DAG).
The directed cyclic graph DCG 88 of
The DAG 90 is
On the other hand, X2 causes X3, and X3 causes X4.
The graph or tree in
In the DAG of
The above example illustrates a simple sensor space or configuration, wherein the DAG tree is built based on the understanding that all sensors in the configuration are generally replaceable, and the least dependent sensors are then removed. However, in a dynamic system like a car, there are many sensors that are redundant for safety reasons and should be neither removed nor replaced. Therefore, it is important for the algorithm to distinguish these important, unreplaceable sensors in the configuration/space. To this end, for example, one can either associate with each sensor a flag variable that indicates whether it is replaceable or not, or reflect the irreplaceability with a high penalty factor f. Then, the algorithm that converts the DCG into the DAG may take the irreplaceability into consideration when building the tree. For example, in the former approach (the flag), the algorithm assigns a sensor Xi that has the flag value “irreplaceable” and is the most dependent sensor as a root and builds the tree from there, or such a sensor can be excluded from the procedure entirely.
In some cases, a combination of two or more sensors (any time signal operation such as a summation, differencing or dynamic scaling) can cause another sensor. If the caused sensor is worth replacing, then one merges the causing sensors into a new hybrid sensor, which will be added to the sensor configuration. A simple example can be seen when reconstructing the output speed of a transmission: the speed of the left or the right wheel alone does not cause the transmission output speed during cornering, due to the differential. However, the mean or average value of both wheel speeds directly causes the transmission output speed ST and can be used to infer it. When two or more sensors are combined, then all sensors involved in the combination should be flagged as “irreplaceable”.
From the above analysis, one knows that a combination of both wheel speeds Swr, Swl directly causes the transmission speed ST (Swl,wr). Therefore, one can create a new linearly combined sensor Swl,wr, add it to the sensor space and set the “irreplaceable” flag for Swl and Swr. This has been done in the causation matrix of
When one assumes that a causation graph is built with a maximum lag of 50, the values in the causation matrix might differ on the basis of the lag value (in the example above, the values are assumed based on domain knowledge and are not actually computed).
The causation matrix 74″ of
The DCG 88″ of
It can be seen that the speed Swl, wr corresponding to the output transmission speed ST has been taken as a root, which real sensor signal causes Seng, Swl, and Swr.
Further, Seng causes Stran to some extent (corresponding to ST).
In view of the above, it can be seen that Stran forms the leaf of the directed acyclic graph 90″ and thus indicates that the corresponding real sensor 32 might be replaced by a virtual sensor.
In
In the modelling stage 78, the causation branches of the DAG are taken as features to train the model. As an advantageous way to identify a model that is able to replace the given sensor (i.e. to turn the real sensor into a virtual sensor), any statistical or deterministic algorithm that can learn the representation of signals can be used to build the model that will be used to reconstruct the sensor. For example, a neural network architecture called Time Delayed Neural Network (TDNN) can be used. This is a type of feed forward neural network that is suitable for time series. This general architecture can be used for all pruned leaves. However, the hyperparameters of the model should be optimized using an optimization algorithm (in 82), such as Grid Search, Random Search or Bayesian hyperparameter optimization, in order to help the general algorithm architecture to be specific for the given problem. For the example of the TDNN, the following hyperparameters can be optimized: number of neurons, number of layers, drop-out rate, etc.
Once the model is trained and evaluated against a test set, the final sensor configuration of the process 70 will extract the weights and the parameters of the model and calculate the prediction of the model through feed forward calculation.
In
The Boltzmann machine, which is shown for example in
The Boltzmann Machine and its variations train the model using a Contrastive Divergence algorithm. In a nutshell, the training works as follows (a compact sketch in code follows the list):
1. randomly initialize the weights between the nodes;
2. feed a sample input vector to the visible nodes;
3. compute the hidden nodes based on the weights and a global bias (feed forward approach);
4. reconstruct the visible nodes from the hidden nodes;
5. compare the visible nodes with the reconstructed visible nodes, using for example a Kullback-Leibler divergence;
6. update the weights based for example on the Kullback-Leibler divergence loss function using a gradient descent; and
7. repeat steps 2 to 6 for all feature samples until convergence.
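The seven steps above can be sketched in a few lines of Python for the restricted (bipartite) variant discussed below; the sketch uses the standard CD-1 update, in which the positive-phase and negative-phase statistics approximate the gradient of the divergence-based loss from steps 5 and 6, and all sizes and learning rates are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=8, lr=0.05, epochs=100, seed=0):
    """Minimal CD-1 training of a binary restricted Boltzmann machine.
    data: array of shape (n_samples, n_visible) with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    w = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))   # step 1: random weights
    b_v = np.zeros(n_visible)                                # visible bias
    b_h = np.zeros(n_hidden)                                 # hidden bias
    for _ in range(epochs):
        v0 = data                                            # step 2: feed visible nodes
        h0 = sigmoid(v0 @ w + b_h)                           # step 3: hidden activations
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ w.T + b_v)                  # step 4: reconstruction
        h1 = sigmoid(v1 @ w + b_h)
        # steps 5/6: contrastive divergence update (positive minus negative phase)
        w += lr * (v0.T @ h0 - v1.T @ h1) / len(data)
        b_v += lr * (v0 - v1).mean(axis=0)
        b_h += lr * (h0 - h1).mean(axis=0)
    return w, b_v, b_h
```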
While in theory the Boltzmann Machine of
Therefore, one can use a variant of the Boltzmann machine called Restricted Boltzmann Machine (RBM), where nodes from the same type do not connect to each other, as is shown in
Here the visible nodes 102′ connect to the hidden nodes 104′ by edges 106′, but the visible nodes 102′ do not connect to each other, nor do the hidden nodes 104′.
In the above example for different speeds, namely the engine speed Seng, the transmission speed Stran, the speed Swl of the left driven wheel, and the speed Swr of the right driven wheel, a Restricted Boltzmann Machine (RBM) can be established as shown at 100″ in
Here, the visible nodes 102″ correspond to the above-mentioned four speeds. Further, a number of hidden nodes is established, wherein the number of hidden nodes is preferably larger than the number of visible nodes.
During training, a model requires a big data set of all sensors as shown in
After training the model, one can provide the information of the physical sensors (real sensors) that one does not wish to replace. These sensors help to identify the current state of the system and to compute the values of the missing sensors, as is shown in
Here, the engine speed and the transmission speed are measured by real sensors, and the speeds FL (corresponding to Swl) and FR (corresponding to Swr) are computed by the RBM as shown in
The advantage of this brute force algorithm is that, once the model is trained, one can at any time remove or add a sensor without the need to re-train or reconfigure the model. This is quite useful if a physical sensor fails, because then the model will keep working sufficiently.
As mentioned above, in theory, a BM would probably be the best concept to represent a system, particularly a recurrent version of it. However, it is difficult to be implemented due to lack of computation power. Nevertheless, this might be easier in the future.
There are further variations of a Boltzmann Machine, such as a Deep Boltzmann Machine. But the intuition is the same. The only difference is that it will require more effort and resources to compute in an attempt to generalize better to the given problem.
The Boltzmann machine is inspired by the Markov Chain Monte Carlo (MCMC) algorithm. More specifically, the training algorithm, Contrastive Divergence, is based on Gibbs Sampling, which is used in MCMC for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution.
With a Boltzmann Machine approach, the trained model for a vehicle is only applicable to this vehicle. There is no guarantee that it is applicable to other vehicles, even from the same series.
Even if BM sounds tempting, one might still prefer the above graph networks approach of
In the above description, several terms have been used, which will be defined as follows:
A lag refers to a past point in time of a time signal.
A maximum lag is the furthest point in the past that one can look back to.
Best lag. This is a time point in the past that lies between the observed time and the maximum lag. It is the best lag because the sliding window from the observed time back to this lag produces the best causality value, which in turn is potentially the best window to model the needed observed value.
Sliding window. It is a way to restructure a time series into overlapping windows whose size equals the lag, the window then being shifted by a step. For example, when a lag of 3 and a step of 1 are chosen, each window contains three consecutive samples and is shifted by one sample relative to the previous window.
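The original example series is not reproduced here; the following sketch illustrates the windowing with a hypothetical series.

```python
def sliding_windows(series, lag, step=1):
    """Cut a time series into overlapping windows of length `lag`,
    shifting the window by `step` samples each time."""
    return [series[i:i + lag] for i in range(0, len(series) - lag + 1, step)]

# Hypothetical series for illustration:
print(sliding_windows([10, 20, 30, 40, 50], lag=3, step=1))
# [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
```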
The term “feature” is a terminology from machine learning. These are the inputs to an algorithm to train and be fit to predict the output (label or target). In other words, features are the input variables used in making predictions.
A label is also a term used in machine learning terminology. It is the output of the algorithm. Moreover, it is the prediction that a fit model will produce given the features (inputs).
Hyperparameters are also used in machine learning. These are parameters whose values are set before the learning process begins. For example, the number of neurons in a hidden layer in a neural network is a hyperparameter. Another example is the number of decision trees in a Random Forest.
In
The method 120 includes a first step D2 which is conducted after a start of the method. In step D2 in
The recorded outputs of the real sensors are sampled time series of the real sensor data.
The recorded outputs of the real sensors (X0, X1, . . . XN in
The sub-models of the Neural GC are shown at C1, C2, . . . , CN in
The Neural GC is trained. Particularly, each of the virtual sensors (sub-models) of the Neural GC are trained. The virtual sensors can be trained individually or together, as is described later.
The Neural GC is a non-sequential neural network that branches into several internal neural networks (sub-models). Each of those sub-models can be trained individually to predict a sensor (real sensor), given all the other sensors as an input (the recorded outputs of the other real sensors, excluding the one which is to be predicted by that particular sub-model). In an alternative approach, all of the sub-models are trained together by adding up their losses and back-propagating them to optimize the weights of the sub-models.
In other words, the Neural GC is a component-wise model wherein each component can be viewed as an independent neural network which is denoted sub-model or virtual sensor (or component).
The training of the Neural GC is shown at D6 in which the question arises whether the Neural GC is fit. If not, the training has to be resumed (word “no” in
In a subsequent step D10 in
As described earlier, the causation matrix can then be converted into a directed cyclic graph (DCG), as is for example shown at 88′″ in
Subsequently, at least one real sensor which forms a leaf or a root of the DAG may be determined to be replaceable and preferably be replaced by a virtual sensor in the final sensor configuration of the vehicle.
In
As described above, step D6 determines whether the Neural GC is fit. The Neural GC is fit if all of its sub-models (virtual sensors) are fit. The definition of “fit” is provided below.
As mentioned before, all sub-models have preferably the same architecture, which is essentially shown in the flow chart of
In step T2, the sub-model receives as inputs the outputs of each of the real sensors (in one embodiment except the one which corresponds to the sub-model that is actually trained).
In step T4 and T6, the inputs are split into continuous time series (T4) and categorical time series (T6). Categorical time series are time series in which the values at each time point are categories rather than measurements, wherein a sampled value of a categorical time series may for example be an integer value. A categorical time series is for example the output of an ignition key sensor (ignition on or ignition off), or a gear number sensor.
The categorical time series are transformed into their respective embedding layers (shown at T8), before they are concatenated with the continuous time series T4 and fed to the first hidden layer. The layers of the sub-model neural network are shown at T10-1 to T10-N. The first layer T10-1 is a first hidden layer. All subsequent layers are preferably 1D Convolutional layers. Such 1D Convolutional layers work well with time series. However, the layers may as well be Recurrent or Dense layers.
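A minimal Keras sketch of one such sub-model is given below; the input names, the gear-number example and all sizes are assumptions, and the sparsity inducing penalty on the first hidden layer (described next) is omitted here for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_submodel(lag, n_continuous, n_gears=8, embed_dim=3):
    """One sub-model (virtual sensor) of the component-wise network:
    continuous sensor windows plus one categorical signal (e.g. gear number)."""
    cont_in = layers.Input(shape=(lag, n_continuous), name="continuous")
    cat_in = layers.Input(shape=(lag,), dtype="int32", name="categorical")
    cat_emb = layers.Embedding(input_dim=n_gears, output_dim=embed_dim)(cat_in)
    x = layers.Concatenate(axis=-1)([cont_in, cat_emb])  # fed to first hidden layer
    x = layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu")(x)
    x = layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu")(x)
    x = layers.Flatten()(x)
    out = layers.Dense(1, name="predicted_sensor")(x)
    return tf.keras.Model([cont_in, cat_in], out)
```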
For the first hidden layer T10-1, it is preferred if a Group Lasso or a Group Order Weighted Lasso (GrOWL) regularization penalty is used to group the similar features together using a parameter tying technique, and to zero-out those features that do not Granger-cause the target with the help of PGD (Proximal Gradient Descent), or another sparse inducing optimizer.
In other words, weights of the respective layers are established, as shown at T24-1 to T24-N, using sparsity inducing penalty only for the weights T24-1 of the first hidden layer T10-1.
The layers T10-1 to T10-N lead to a prediction of the output of the real sensor which is to be emulated. This is shown at T14.
T18 is an input of the true values (the output of the real sensor) which is to be predicted/emulated.
In T16, a loss function is computed. In other words, the loss between the predicted and the true value is computed. The losses are shown at T20 in
The losses T20 are used to optimize the weights T24-1 to T24-N using a sparse inducing optimizer, as shown at T22 in
The sparse inducing optimizer may be PGD, semi-stochastic PGD (SPGD), or FtRL (“Follow the Regularized Leader”).
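For the Group Lasso, the proximal operator amounts to a block soft-thresholding of the first-layer weight groups. The following plain-NumPy sketch shows one proximal gradient step, assuming that each input feature forms one group and that the gradient of the data loss is supplied by the surrounding training loop.

```python
import numpy as np

def group_lasso_prox(w, lam):
    """Proximal operator of the Group Lasso penalty: block soft-thresholding.
    w: first-hidden-layer weights of shape (n_features, n_hidden); each row
    (all weights of one input feature) forms one group."""
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return w * scale                      # weak feature groups are zeroed out

def proximal_gradient_step(w, grad, lr, lam):
    """One PGD step: gradient step on the data loss, then prox of the penalty."""
    return group_lasso_prox(w - lr * grad, lr * lam)
```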
The proximal operator in the sparse inducing optimizer needs to be optimized to work with the regularized penalty. A sub-model is fit if one of the following conditions is met:
Again, each of the sub-models, as shown in
On the other hand, the sub-models may be trained together, as shown in
In this case, the losses T20 will be accumulated, as shown in T26, wherein the accumulated losses are used to optimize the weights using a sparse inducing optimizer at T28. The output of the sparse inducing optimizer (T28) is in this case back propagated to each of the other sub-models and their respective weights, and not only to the weights T24-1 to T24-N of the present sub-model.
If all the sub-models of the Neural GC are fit, the Neural GC is fit.
Once the Neural GC is fit, the weights of the respective first layers of each sub-model should be sparse (wherein features with assigned zeros do not Granger-cause the target (prediction) of that sub-model).
To generate the causation matrix as shown in
The transformation is made as follows:
In the first step, the weight matrix is converted into the Affinity Matrix (similarity matrix) using a pairwise similarity metric like cosine similarity. Subsequently, the features are clustered using the generated Affinity Matrix with any clustering algorithm that works with an Affinity Matrix, like an Affinity Propagation algorithm. In step 3, the clusters are ranked by importance using feature importance measures like the Permutation Test or the Zero-out Test. In the Permutation Test, for example, the original data set (the recorded output data of the other real sensors), i.e. the data set the respective sub-model is trained on, is randomly shuffled and fed again to predict. A cluster that yields a higher loss has a higher importance than the rest. Similarly, each of the features is ranked.
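A simple Permutation-Test sketch for one cluster of features is given below; the prediction function, the mean-squared-error loss and the feature column layout are assumptions for illustration.

```python
import numpy as np

def permutation_importance(predict_fn, X, y, feature_indices, seed=0):
    """Importance of one cluster of features: increase of the sub-model loss
    when exactly those feature columns are randomly shuffled."""
    rng = np.random.default_rng(seed)
    base_loss = np.mean((predict_fn(X) - y) ** 2)
    X_shuffled = X.copy()
    for j in feature_indices:
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
    return np.mean((predict_fn(X_shuffled) - y) ** 2) - base_loss
```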
In step 5, for example, the absolute global ranking Fjimportance of a feature j found in a cluster Pi may be computed by the following equation (other equations may be used as well):
where:
Finally, in step 6, the ranking is normalized so that all rankings add up to 1.
On the basis of these causation or causality vectors, the causation matrix can be generated by concatenating them. On the basis of the causation matrix, a directed cyclic graph (as DCG 88′″ in
It is to be understood that the foregoing is a description of one or more preferred exemplary embodiments of the invention. The invention is not limited to the particular embodiment(s) disclosed herein, but rather is defined solely by the claims below. Furthermore, the statements contained in the foregoing description relate to particular embodiments and are not to be construed as limitations on the scope of the invention or on the definition of terms used in the claims, except where a term or phrase is expressly defined above. Various other embodiments and various changes and modifications to the disclosed embodiment(s) will become apparent to those skilled in the art. All such other embodiments, changes, and modifications are intended to come within the scope of the appended claims.
As used in this specification and claims, the terms “for example,” “e.g.,” “for instance,” “such as,” and “like,” and the verbs “comprising,” “having,” “including,” and their other verb forms, when used in conjunction with a listing of one or more components or other items, are each to be construed as open-ended, meaning that the listing is not to be considered as excluding other, additional components or items. Other terms are to be construed using their broadest reasonable meaning unless they are used in a context that requires a different interpretation.
Number | Date | Country | Kind |
---|---|---|---|
10 2019 121 589.7 | Aug 2019 | DE | national |
This application is a continuation application of PCT application PCT/EP2020/072196, which has been filed Aug. 6, 2020 and which claims the priority of German patent application DE 10 2019 121 589.7, filed Aug. 9, 2019.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2020/072196 | Aug 2020 | US |
Child | 17667214 | US |