The present invention relates to classifying the state of a system and in particular to methods, apparatus and computer program products for classifying the state of a system.
Classification lies in the field of machine learning. In particular, statistical classification is the problem of identifying a sub-group to which a new data item belongs, where the identity or label of the sub-group is unknown, on the basis of a training set of data containing data items whose sub-group association is known. Such classifications will show a variable behaviour which can be investigated using statistics. New individual items of data need to be placed into groups based on quantitative information on one or more measurements, traits or characteristics, etc. of the data and based on the training set for which previously decided groupings or classes have already been established.
A wide variety of different classifiers are known and some of the most widely used classifier algorithms include neural network, support vector machines, k-nearest neighbours, Gaussian mixture model, Gaussian, naïve Bayes and decision tree classifiers.
However, a significant issue, particularly for complex systems, is the computational burden imposed by the classification algorithm. Very significant computing resources may need to be made available in order to implement the algorithm with sufficient speed and accuracy of classification. Further, for some systems, real-time classification may be needed so that a control and/or data signal can be issued in a timely manner. For some classification algorithms, reliable real-time classification may not be possible irrespective of the computing resources available, or may not be possible with the computing resources practically available in a real world or industrial environment.
Hence, there is a need for a reliable classification method which has a low computational overhead.
A first aspect of the invention provides a method for classifying a system and in particular the state of a system. The state of the system can be in one of a plurality of different classes. The system can have at least one property represented by a set of data items. A current data item representing a property of the system can be received. The system can be classified as being in one of a plurality of different classes based on the probability that the system is in any one of the plurality of classes. The probability can be calculated using the current data item, a recursively calculated mean value for the set of data items representing the property of the system and one or more recursively calculated statistical parameters for the set of data items representing the property. Whether to output a signal can be determined based on the class in which the system is classified.
The method uses recursive calculations and so the computational burden is low. Hence, the method can operate in real-time, even for complicated systems having tens, hundreds or even thousands of different properties that can be represented by a set of data, for example data output by a sensor or the like. The recursive calculations use only a current data item and stored data items which summarise, in a statistical way, the past operation of the system. Hence, the method does not need to process all, or a large number of, past or historical data items.
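As a minimal sketch of what "recursive" means here, the following shows a running mean that is updated from only the stored mean, the stored count and the current data item. The function name `update_mean` and the sample readings are illustrative assumptions, not part of the described method.

```python
def update_mean(prev_mean, count, x):
    """Return the mean of count + 1 items, given the mean of the first
    `count` items and the newly received item x."""
    return prev_mean + (x - prev_mean) / (count + 1)


# Only the summary (mean, count) is stored; past readings are never revisited.
mean, count = 0.0, 0
for reading in [2.0, 4.0, 6.0, 8.0]:
    mean = update_mean(mean, count, reading)
    count += 1
```

The same pattern extends to the other stored statistics: each is refreshed from its previous value and the current data item alone.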
The input to the system can be a set of data items having numerical values, and in some cases a sequence of numerical values, representing one or more physical entities. In many cases the data items can come from, or have been generated by, physical sensors, but not necessarily. For example the data items might represent the number of sales of a specific product in a given time interval by a supermarket in which case the data set might be the result of a query to a database and only indirectly arise from a physical sensor (in this example a supermarket checkout barcode scanner).
The data items are not necessarily a time series. The benefits of the invention also arise for off-line unordered data. However, the invention is particularly applicable to real-time applications where other approaches are not suitable because of their relative computational inefficiency.
The system can include one or a plurality of sensors or transducers. Each sensor or transducer can output data representing a different property of the system. Each sensor or transducer can output a single set of data or a plurality of sets of data.
The method can be applied to on-line data or off-line data. On-line data might include time series data being received in real time. Off-line data might include batch data. The batch data might include time series data which has been collected over a time period.
The method can be a real-time classification method.
The method can further comprise recursively calculating and/or storing an updated mean value for the data item representing the property of the system using the current data item. The method can further comprise recursively calculating and/or storing a plurality of updated statistical parameters for the set of data items representing the property.
The method can further comprise receiving an input of an actual class of the system for the current data item. This can be used to train the classifier.
The method can further comprise setting the actual class of the system to the input actual class, or else to the class that the system was classified as being in.
The method can further comprise maintaining a data structure which stores, for each of the plurality of classes, data items representing the recursively calculated mean, the recursively calculated statistical parameters and the associated class of the system. The data structure can comprise a single entity or a plurality of entities. The data structure can be in the form of a single table having a separate row for each different class. The data structure can be in the form of a plurality of tables each corresponding to a different class.
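One possible shape for such a data structure is sketched below, with one record per class. The type name `ClassTable`, its field names, the helper `new_class_table` and the starting value `alpha` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassTable:
    """Summary statistics held for a single class (field names are illustrative)."""
    label: str                   # class label or placeholder identifier
    count: int                   # number of data items absorbed so far
    mean: List[float]            # recursively updated mean vector
    inv_cov: List[List[float]]   # recursively updated inverse covariance matrix
    det_cov: float               # recursively updated covariance determinant


def new_class_table(label: str, n: int, alpha: float = 1e-3) -> ClassTable:
    """Create an empty table for n properties, starting from covariance alpha * I."""
    inv_cov = [[1.0 / alpha if i == j else 0.0 for j in range(n)] for i in range(n)]
    return ClassTable(label, 0, [0.0] * n, inv_cov, alpha ** n)


# A plurality of tables, one per class, keyed by class identifier.
tables: Dict[str, ClassTable] = {}
tables["Class_1"] = new_class_table("Class_1", n=3)
```

A single table with one row per class would hold the same fields; the dictionary of records is simply the closest Python analogue.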
The method can further comprise creating a new data structure storing data items representing the recursively calculated mean, the recursively calculated statistical parameters and the associated class of the system when it is determined that the system is in a class which does not correspond to any previous class.
The recursive calculating and storing can be only carried out if either an actual class has been input or no actual class has been input and there is a low classification error associated with the class in which the system has been classified. This can help to improve the reliability of the classifier.
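The two rules above (which class to treat as the actual class, and when to update the stored statistics) can be sketched as follows. The function names and the `max_error` threshold of 0.05 are hypothetical; the text does not specify how low the classification error must be.

```python
from typing import Optional


def resolve_class(predicted: str, actual: Optional[str] = None) -> str:
    """Use the input actual class when one was supplied, else the predicted class."""
    return actual if actual is not None else predicted


def should_update(actual: Optional[str], classification_error: float,
                  max_error: float = 0.05) -> bool:
    """Update the stored statistics only if an actual class was input, or if
    none was input and the classification error is low (assumed threshold)."""
    if actual is not None:
        return True
    return classification_error <= max_error
```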
Each recursive calculation can use a previously stored value representing all of the previously received data items. Hence, as only a current data item value and a summary of all previously received data items are used, the complexity of calculation, and memory required for storing data, are both very low.
The values used in the recursive calculations representing previously received data items can be stored in a data structure. The values can be stored associated with a class in which the system has been classified. The data structure can be in the form of a table. A class data item representing the class in which the system has been classified can be stored in the data structure. The class data item can be stored in a same row of a table as the values used in the recursive calculations representing previously received data items. The class data item can be a class label data item.
The statistical parameters can include one or more of the covariance matrix, the inverse of the covariance matrix and the determinant of the covariance matrix. The statistical parameters can be determined for each different class of the data.
A value representing the normalised outer product of all previously received data items and/or the mean of the current data items can be used to calculate the statistical parameters.
The system can have a plurality of properties. The method can be applied for the plurality of properties. Each property of the system can be represented by a set of data items. A current data item representing each property of the system can be received. The probability can be calculated using the current data items, a recursively calculated mean value for the set of data items representing the properties of the system and one or more recursively calculated statistical parameters for the set of data items representing the properties. A mean value for each data item can be recursively calculated and updated using the current data items. A plurality of statistical parameters for the data items representing each of the properties can be recursively calculated and updated.
The system can include one or more sensors for outputting data representing one or more properties of the system. The data can be time series data.
The method can include outputting a variety of different kinds of signal. The signal can encode or correspond to a control, command and/or data. The signal can be selected from: a data signal; a control signal; a feedback signal; an alarm signal; a command signal; a warning signal; an alert signal; a servo signal; a trigger signal; a data capture signal; and a data acquisition signal.
The method can include recursively calculating a covariance matrix. The method can include occasionally regularising the covariance matrix to avoid singularity of the covariance matrix. The covariance matrix can be regularised each time a pre-defined number of data items have been received.
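A minimal sketch of periodic regularisation is given below. It assumes that regularising means adding a small constant `alpha` to the diagonal of the covariance matrix and that `period` is the pre-defined number of data items; both the form of the correction and the default values are assumptions.

```python
def regularise_if_due(cov, count, period=100, alpha=1e-3):
    """Add alpha to the diagonal of cov (in place) every `period` data items.

    Returns True when the matrix was regularised on this call.
    """
    if count > 0 and count % period == 0:
        for i in range(len(cov)):
            cov[i][i] += alpha
        return True
    return False


# A singular 2 x 2 covariance becomes invertible once regularised.
cov = [[1.0, 1.0], [1.0, 1.0]]
applied = regularise_if_due(cov, count=100)
det_after = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
```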
The system can be any electrical or electro-mechanical system. The system can be a machine, an apparatus, a vehicle, an engine, a plant, a piece of plant, a piece of machinery, an electrical or electronic device or similar.
The system can be a video system. The sensor can be an image sensor. The data can be image data. The property can relate to a sub-region of a frame of image data. The method can further comprise processing the image data to extract one or more image features. The image data can be video data or still data.
A second aspect of the invention describes a data processing apparatus for classifying the state of a system. The data processing apparatus can comprise a data processing device and a storage device in communication with the data processing device. The storage device can store computer program code executable by the data processing device to carry out the method aspect of the invention and any preferred features thereof.
A third aspect of the invention provides a system. The system can comprise at least one operative part and at least one sensor which can output data representing a property of the system or of the operative part. The data can be time series data. The system can also include a data processing apparatus according to the second aspect of the invention. The data processing apparatus can be in communication with the sensor to receive the data from the sensor.
The system can include a plurality of sensors. Each sensor can output data representing a different property of the system or a part of the system. The data can be time series data.
The data processing apparatus can have an output. The output can be in communication with the system to output the signal to the system. Additionally or alternatively, the output can be in communication with another system or a sub-part or sub-system of the system. The data processing apparatus can have a plurality of outputs. Each output can be in communication with a different part of the system, a different system or other apparatus or devices.
The system can be an imaging system. The operative part can comprise an image sensor. The image sensor can output video image data or still image data.
A fourth aspect of the invention provides a computer readable medium storing computer program code executable by a data processing device to carry out the method aspect of the invention and any preferred features thereof.
Embodiments of the invention will now be described in detail, by way of example only, and with reference to the accompanying drawings, in which:
Similar items in different Figures share common reference signs unless indicated otherwise.
Generally, the invention provides an adaptive way of classifying the behaviour of complex systems, which can be carried out in real-time. In the context of certain systems, the system behaviour may be classified as a fault, whereas in other systems, a specific classification of the overall system may be a trigger for a control signal, a feedback signal, data recording or some other action. Importantly, no a priori knowledge of the system is required. The invention may be configured with a suitable temporal sampling interval for capturing data from the system (for example, a few or a few tens of samples per second), or may determine a suitable sampling interval itself. In particular, the invention has no need for knowledge of ranges of sensor data, operating limits for sensor data or the meaning of sensor data.
A period of learning is allowed (either in real-time, or by being fed captured historical data), and a model of various “normal” behaviours of the system can be built up. This behaviour may include multiple classes of normal operating modes, and the invention can automatically discover these modes. For example, the sensor data from an aircraft will take different normal values depending on the phase of the flight, such as take-off, cruising and landing. A signal, such as an alarm, control signal or trigger signal, can be output when data is received resulting in the current state of the system being sufficiently statistically outside a normal learned mode, i.e. classified as being in a non-normal or anomalous mode.
Before describing example embodiments of the invention in detail, the mathematical basis of the classifier of the invention will be described. The classifier is optimal and non-linear (quadratic). The approach assumes a Gaussian distribution for the probability of the data describing the system state and that distances between new data items and mean values, which serve as prototypes, are of Mahalanobis type. Exact formulas are introduced for the recursive calculation of the inverse covariance matrix and of the determinant of the covariance matrix. Both are used to allow recursive calculation, using current data items, historical mean values and the covariance matrix, of the maximum likelihood criterion, which guarantees that the classifier is optimal. This makes it possible to recalculate, after each new data sample, the exact value of the criterion without having to store all past data.
The approach is also useful for other types of data distribution, but the results will not be optimal as they are in the case of a Gaussian distribution.
An optimal Bayesian type classifier with quadratic Fisher discriminant criteria will now be described. Further details of Bayesian classifiers in general are provided in C. Bishop, Pattern Recognition and Machine Learning, Springer, N.Y., USA, 2nd edition, 2009, the content of which is incorporated herein by reference in its entirety for all purposes.
Consider the problem of classifying data samples from an n-dimensional space of features represented by real numbers, $x \in \mathbb{R}^n$, into a finite set of non-overlapping classes $C = \{1, \ldots, c\}$. It is assumed that the data in each class (say $c^*$) have the same type of distribution which can be parameterised by its mean, $\mu_{c^*}$, and covariance, $\Sigma_{c^*}$. The probability density function (pdf) of a data sample $x$ to be of class $c^*$ is denoted:
$$p(x \mid c^*) = N(\mu_{c^*}, \Sigma_{c^*}) \qquad (1)$$
where the * superscript denotes a certain selected c out of all of the classes, 1 to c. Then the optimal (i.e. optimal in terms of maximum likelihood) Bayesian classifier with quadratic Fisher discriminant criteria is provided by:
$$\arg\max_{c^* \in C}\left[\ln(p_{c^*}\lambda_{c^*}) - \tfrac{1}{2}(x - \mu_{c^*})\,\Sigma_{c^*}^{-1}\,(x - \mu_{c^*})^T - \tfrac{1}{2}\ln\left(\det(\Sigma_{c^*}^{-1})\right)\right] \qquad (2)$$
where $p_{c^*}$ is the a priori probability that the data sample is of class $c^*$, $\lambda_{c^*}$ is the penalty for misclassifying the data sample to class $c^*$, $\mu_{c^*}$ is the expectation of the class $c^*$ (i.e. the mean value) and $\Sigma_{c^*}$ is the covariance of the class $c^*$. The expression within the square brackets is evaluated for each class of the classifier and involves matrix operations. As indicated above, $x$ is an n-dimensional vector for a current data sample in an n-dimensional space of features.
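The per-class evaluation can be sketched as below, using stored values of the mean, inverse covariance and covariance determinant so that no matrix is inverted at classification time. The sketch writes the log-determinant term in the standard Gaussian log-density form, minus one half of ln det(Σ), which is an assumption about the intended sign convention. The function names and the example classes are also illustrative.

```python
import math


def discriminant(x, mean, inv_cov, det_cov, prior=1.0, penalty=1.0):
    """Quadratic discriminant score for one class from its stored statistics."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    # Mahalanobis term (x - mean) * inv_cov * (x - mean)^T
    maha = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    # Standard Gaussian log-density form of the determinant term (assumption).
    return math.log(prior * penalty) - 0.5 * maha - 0.5 * math.log(det_cov)


def classify(x, classes):
    """Return the label whose discriminant score is largest.

    `classes` maps a label to a (mean, inv_cov, det_cov) tuple.
    """
    return max(classes, key=lambda label: discriminant(x, *classes[label]))


identity = [[1.0, 0.0], [0.0, 1.0]]
classes = {
    "A": ([0.0, 0.0], identity, 1.0),
    "B": ([5.0, 5.0], identity, 1.0),
}
winner = classify([4.5, 5.2], classes)
```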
For practical applications a finite set, or stream, of data samples can be considered, in which the data set or stream is composed of n-dimensional real numbers and C={c1, . . . , cC} provides a set of C class labels.
Then the maximum likelihood solution is
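where $N_j$ is the number of data samples of class j,
$$\mu_j = \frac{1}{N_j}\sum_{i=1}^{N_j} x_i \qquad (4)$$
is the mean and
$$\Sigma_j = \frac{1}{N_j}\sum_{i=1}^{N_j}(x_i - \mu_j)^T(x_i - \mu_j) \qquad (5)$$
is the covariance matrix, with each $x_i$ treated as a row vector as in equation (2) and the sums running over the data samples of class j.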
A recursive solution is adopted. In a real-time (also referred to herein as “on-line”) implementation, the data samples are arriving one by one. An issue with real-time, or on-line, classification is how to automatically update the classifier. Re-designing the classifier (i.e. solving equations (2) to (5)) for each new data sample is not efficient. Moreover, inverting the matrix of the covariance is prone to problems such as singularities. Further, calculating the determinant of the covariance matrix is also computationally very expensive.
While updating the mean (4) and the covariance (5) is not very computationally expensive (having quadratic computational complexity, $O(n^2)$), calculating the inverse and the determinant of the covariance matrix is an order of magnitude more computationally expensive (i.e. cubic, $O(n^3)$). An exact derivation of recursive forms of both the inverse covariance matrix and the determinant of the covariance matrix is described in what follows.
An exact derivation of a recursive form of the inverse covariance matrix will now be described. From analysing the classifier defined by equations (2)-(5), the following quantities need to be known and updated in real-time for every new data sample available: $\mu$, $\Sigma$, $\Sigma^{-1}$ and $\det(\Sigma)$. The determinants are related such that $\det(\Sigma^{-1}) = 1/\det(\Sigma)$.
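For orientation, the rank-one structure that makes these four quantities cheap to update can be sketched as follows. This is an illustrative derivation under stated assumptions: data items are treated as column vectors, $\Phi_k$ denotes the normalised sum of outer products of the first k data items, $k \geq 1$, and the symbols $A$, $u$ and $v$ are introduced here only to state the two standard identities used.

```latex
% Two standard identities for a rank-one change of an invertible matrix A
(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u},
\qquad
\det(A + u v^T) = \det(A)\,\bigl(1 + v^T A^{-1} u\bigr)

% Mean and normalised outer product after data item k+1
\mu_{k+1} = \frac{k}{k+1}\,\mu_k + \frac{1}{k+1}\,x_{k+1},
\qquad
\Phi_{k+1} = \frac{k}{k+1}\,\Phi_k + \frac{1}{k+1}\,x_{k+1}x_{k+1}^T

% Inverse and determinant of \Phi, each obtained in O(n^2) operations
\Phi_{k+1}^{-1} = \frac{k+1}{k}\left[\Phi_k^{-1}
  - \frac{\Phi_k^{-1}x_{k+1}x_{k+1}^T\Phi_k^{-1}}{k + x_{k+1}^T\Phi_k^{-1}x_{k+1}}\right],
\qquad
\det(\Phi_{k+1}) = \left(\frac{k}{k+1}\right)^{n}\det(\Phi_k)
  \left(1 + \frac{x_{k+1}^T\Phi_k^{-1}x_{k+1}}{k}\right)

% Covariance \Sigma_{k+1} = \Phi_{k+1} - \mu_{k+1}\mu_{k+1}^T is one more rank-one change
\Sigma_{k+1}^{-1} = \Phi_{k+1}^{-1}
  + \frac{\Phi_{k+1}^{-1}\mu_{k+1}\mu_{k+1}^T\Phi_{k+1}^{-1}}{1 - \mu_{k+1}^T\Phi_{k+1}^{-1}\mu_{k+1}},
\qquad
\det(\Sigma_{k+1}) = \det(\Phi_{k+1})\bigl(1 - \mu_{k+1}^T\Phi_{k+1}^{-1}\mu_{k+1}\bigr)
```

Every step involves only matrix-vector products and outer products, which is where the quadratic rather than cubic cost comes from.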
The following formula, which is known as the matrix inverse lemma, is used in the derivation:
Starting from the expression for the covariance matrix:
the means, μ, can easily be updated recursively. The first element in the expression (7) is denoted by:
which is a normalised outer product of all data items with themselves up to a time corresponding to that of the kth data item.
For k+1:
Using equation (6) the recursive update of the inverse of the quantity, Φ, is given by:
Returning to equation (7), which is the expression for the covariance matrix, at time step k+1:
$$\Sigma_{k+1} = \Phi_{k+1} - \mu_{k+1}\mu_{k+1}^T = \Phi_{k+1} + (i\mu_{k+1})(i\mu_{k+1}^T) \qquad (11)$$
where $i = \sqrt{-1}$. The inverse of the covariance is then (using equation (6))
Starting with an initial estimate for the covariance matrix, Σ0, the starting conditions in the covariance estimate are defined as
$$\Phi_0 = \alpha I, \quad \mu_0 = 0 \qquad (13)$$
where α is a small constant. In this way, the covariance matrix will be non-singular from the very beginning. Finally, the expressions required for the update of the covariance and inverse covariance matrix are provided by:
Using equations (13), (14), (16) and (17), the following expression is obtained:
In practice, the last regularisation component term of equation (19) can tend towards zero which can lead to computational problems. Theoretically, that corresponds to tending to the true covariance matrix, but in practice this can lead to singularity of the matrix and a computational algorithm may stop. This practical problem can be solved by setting a limit on the number of steps, N1, after which the matrix is regularised again by
This gives the following expression for the covariance matrix:
An exact derivation of a recursive form of the determinant of the covariance matrix will now be described. The aim is to calculate the determinant of the covariance matrix at the moment in time k+1, det(Σk+1). From equations (14) and (17):
Starting with the first two components in the brackets:
The following proposition is adopted. The n eigen-values of a matrix A are denoted by $\lambda_1, \ldots, \lambda_n$. Then the eigen-values of the matrix $(A + \alpha I)$ will be $(\lambda_1 + \alpha), \ldots, (\lambda_n + \alpha)$ and the eigenvectors remain the same. The determinant of the matrix
can be found as a product of eigen-values:
Thus not more than a single eigen-value in the respective matrices differs from zero. Therefore, it is possible to write:
Because the trace function tr(A) is invariant for any Hermitian operator in any basis (from the theorem for the characteristic polynomial of the operator) it follows that:
then
In a similar way:
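$$\det(\Phi_{k+1} - \mu_{k+1}^T\mu_{k+1}) = \det(\Phi_{k+1})\,\det(I - \Phi_{k+1}^{-1}\mu_{k+1}^T\mu_{k+1}), \qquad \det(I - \Phi_{k+1}^{-1}\mu_{k+1}^T\mu_{k+1}) = 1 - \langle \Phi_{k+1}^{-1}\mu_{k+1}, \mu_{k+1} \rangle$$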
Finally,
Hence, a recursive approach can be used in the classifier algorithm. The recursive approach reduces the computational complexity of the algorithm to quadratic, i.e. $O(n^2)$. Updated values are calculated and applied only to the class to which the new data sample belongs. The principle of on-line, or real-time, classifiers is similar to the principle of adaptive control and estimation-update sequences used in signal processing and estimation theory. The low computational complexity and recursive updates enable a rapid, real-time update of the optimal classifier defined by equation (2). The mean, covariance, inverse covariance matrix and determinant of the covariance matrix are calculated recursively using equations (16) to (18) and (29), together with the fact that the determinant of the inverse covariance matrix is equal to the inverse of the determinant of the covariance matrix. In some embodiments the classification does not need to be carried out in an on-line or real-time mode in order to take advantage of the low computational burden. However, owing to its use of recursive calculations, and the resultant ease of computation, the method is particularly suitable for real-time classification applications.
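A minimal sketch of such a recursive update is given below in plain Python. It is not a transcription of the numbered equations. As an implementation choice of this sketch, it stores the inverse and determinant of the un-normalised matrix S = αI + Σ x xᵀ, so that the first update needs no special case; the covariance is then S/k − μμᵀ. The class name, method names and the value of `alpha` are assumptions.

```python
def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class RecursiveGaussian:
    """Keeps mean, inverse covariance and covariance determinant up to date
    in O(n^2) per data item, without storing past data items."""

    def __init__(self, n, alpha=1e-3):
        self.n, self.k = n, 0
        self.mean = [0.0] * n
        # S starts at alpha * I, so its inverse and determinant are known exactly.
        self.s_inv = [[1.0 / alpha if i == j else 0.0 for j in range(n)]
                      for i in range(n)]
        self.det_s = alpha ** n

    def update(self, x):
        self.k += 1
        self.mean = [m + (xi - m) / self.k for m, xi in zip(self.mean, x)]
        v = mat_vec(self.s_inv, x)      # S^-1 x (S is symmetric)
        gain = 1.0 + dot(x, v)          # 1 + x^T S^-1 x
        # Rank-one update of the inverse, and matching update of the determinant.
        self.s_inv = [[self.s_inv[i][j] - v[i] * v[j] / gain
                       for j in range(self.n)] for i in range(self.n)]
        self.det_s *= gain

    def _phi_inv_and_w(self):
        phi_inv = [[self.k * a for a in row] for row in self.s_inv]  # (S / k)^-1
        return phi_inv, mat_vec(phi_inv, self.mean)

    def inverse_covariance(self):
        phi_inv, w = self._phi_inv_and_w()
        denom = 1.0 - dot(self.mean, w)
        return [[phi_inv[i][j] + w[i] * w[j] / denom
                 for j in range(self.n)] for i in range(self.n)]

    def det_covariance(self):
        _, w = self._phi_inv_and_w()
        return self.det_s / self.k ** self.n * (1.0 - dot(self.mean, w))


data = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
g = RecursiveGaussian(2, alpha=1e-3)
for x in data:
    g.update(x)
```

After the loop, `g.inverse_covariance()` and `g.det_covariance()` agree with values computed directly from all four data items, to within floating-point error.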
The algorithm on which the invention is based uses exact formulas for the automatic real-time update of the Fisher discriminant criteria of equation (2), assuming a Gaussian type pdf and Mahalanobis type distance, which can have various applications. A significant advantage is that the pdf is exact, and not approximate, and that it is recursively calculated. As mentioned above, for non-Gaussian distributions, the results are not exact but can still be used to give useful classification results.
A first embodiment of the invention will now be described in the field of flight data analysis to which the algorithm can be successfully applied. However, it will be appreciated that the invention is not limited to flight data analysis. Rather, the invention has application in relation to all kinds of electronic, mechanical and electro-mechanical systems in which it is useful to be able to classify the behaviour of the system and provide some output signal or data responsive to the determined classification. For example, a second embodiment of the invention is described below applied to the field of image processing.
In many aircraft there is typically a flight data recorder (FDR) which may record between a dozen and 1400 parameters relating to the aircraft (for example, values of properties of the aircraft, such as physical variables which might include pitch, approach speed, altitude, gear speed, acceleration, rate of descent, etc.). For example, the FDR of an Airbus A330 records about 1400 parameters at a frequency of 16 Hz (i.e. one set of readings every sixteenth of a second), an Embraer 190 FDR records about 380 parameters, an ATR 42 FDR records about 60, and some Fokker aircraft FDRs record merely 13 different parameters. Conventionally, Flight Data Analysis (FDA) is routinely performed off-line or not in real-time (i.e. after the flight) and primarily only on flights which had some easily identifiable problems. However, using the classifier of the invention it is possible to have an automatic classification of the state of the flight into a ‘Normal’ or an ‘Abnormal’ class, in real-time and during the flight, meaning that an alarm and/or other signals can be generated during the flight so that an emergency, or un-scheduled, landing can be made. This can be fully automated or the pilot/air crew on board can simply be notified of the abnormal state of the flight.
Returning to
The data processing apparatus 102 includes a data processing unit 120 including one or more central processing units, local memory and other hardware as typically found in a conventional electronic general purpose programmable computer. The data processing unit 120 is in communication with a data store 122 which may be in the form of a database. Data processing unit 120 has a plurality of outputs 124, 126, 128. A first output 124 is in communication with a further part of the system 100, such as a display unit 130 in the cockpit of the aircraft. The system 100 may include a further part 132, such as a further computing or data processing device to which an output signal can be supplied by the data processing unit 120. Finally, a third output 128 is in communication with sub-system 112, and in particular, allows a signal path to wing servo 114. Hence, the data processing unit 120 may output various different signals to different parts of the system in order to control or otherwise interact with other parts of the system 100.
Data processing unit 120 locally stores computer program code to implement a data processing method also according to an aspect of the invention and which will be described in greater detail below. For example, the computer program code may be stored in compiled form in a local ROM. A local RAM is also provided to provide working memory and storage for the data processing unit in order to execute the computer program instructions.
As can be seen from equation (8), Φ is effectively a normalised outer product of all data values ‘to date’ or put another way a normalised outer product of each data item with itself for all preceding or previously received data items. The recursive calculation of each of the values illustrated in
Field 218 can store a class label, as described below, which indicates which particular class of behaviour the data structure corresponds to, e.g. “normal”, “fault”, “unknown”. Initially field 218 can simply store a class identifier data item which allows the different classes to be differentiated, e.g. Class_1, Class_2, Class_3, before any real world salience is attached to the different classes. Once real world class labels are established, then the class identifier can be replaced with the real world label. In other embodiments, the class label associated with each table is not provided as part of the table but is simply associated with its respective table by some other data structure, for example by using a mapping table or providing a reference or a pointer for each class label to the table with which it is associated. Hence, the class label is associated with a table, but need not be a part of the table.
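The mapping-table alternative can be sketched as a separate dictionary from class identifier to real world label, so that a class table never needs to be rewritten when a label is attached. The names `class_labels` and `label_for` are illustrative assumptions.

```python
# Separate mapping from class identifier to real world label (initially empty).
class_labels = {}


def label_for(class_id):
    """Return the real world label if one has been attached, else the identifier."""
    return class_labels.get(class_id, class_id)


before = label_for("Class_2")       # no label attached yet
class_labels["Class_2"] = "fault"   # a real world label is established later
after = label_for("Class_2")
```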
The classification process and creation of the tables involves equation (2) and is described in greater detail below with particular reference to
As mentioned above, and illustrated in
Generally, two different types of training can be used: off-line training, based on batch sets of training pairs of inputs or features and corresponding outputs or class labels or classification identifiers; and on-line training in which data samples come one by one and when there are k−1 pairs of inputs/features plus correct class labels/IDs the classifier is trained as described in equations (2), (16)-(18) and (29) above in order to predict the correct class label of the sample k. After that, if and when the correct class label of sample k is available the training can continue. These are the so called prior and posterior information pairs or predict and update used in automatic control and estimation theory. For the example of flight data, once a fault is confirmed by a human (for example the pilot or a ground controller) the class label can be set to ‘FAULT’ from ‘NORMAL’ and the input data or features can be used for re-training. Before that, off-line pre-training can be used. Future use of the same classifier works automatically and does not require re-training from the beginning but only based on the new labels. For the image processing embodiment described below, once a landmark is determined by a human user to be of a specific class, the labels can be used for re-training, but again not from the start but only based on the new data samples.
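The predict-then-update sequence of on-line training can be sketched as below. To keep the example short it swaps in a one-dimensional nearest-mean classifier in place of the quadratic classifier of equation (2). The stream values, labels and function names are invented for illustration.

```python
def predict(means, x):
    """Predict the class whose running mean is nearest to x (None if no classes yet)."""
    if not means:
        return None
    return min(means, key=lambda label: abs(x - means[label][0]))


def update(means, label, x):
    """Recursively update the (mean, count) summary of one class with x."""
    mean, n = means.get(label, (0.0, 0))
    means[label] = (mean + (x - mean) / (n + 1), n + 1)


# Each item is (value, correct label or None when no label ever arrives).
stream = [(1.0, "NORMAL"), (1.2, "NORMAL"), (9.0, "FAULT"), (1.1, None), (8.7, None)]

means = {}
predictions = []
for x, correct in stream:
    guess = predict(means, x)                          # prior: predict before the label is known
    predictions.append(guess)
    actual = correct if correct is not None else guess
    update(means, actual, x)                           # posterior: update with the best label
```

The third sample is first predicted as NORMAL, because no FAULT class exists yet, and the confirmed label then creates that class without retraining from the beginning.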
The process of classifier training 302 and classifier use 304 will now be described with reference to
Following initialization of the program, at step 504 a first data set or data sample of n data values 506 from the n sensors is received and at step 508 the data sample count index, k, is incremented by 1. As this is the first data sample (k=1) there is only a single class and so processing effectively skips to steps 534 and 536, at which the mean values for the sensor data S1 to Sn are calculated (step 534) and values for the various statistical parameters are calculated (step 536) and written to the corresponding fields of table 202. Processing continues (to be described in greater detail below) and returns to step 504 at which a second data sample 506 is received and the data sample index is incremented at step 508 (to k=2).
At step 510 a classification process is carried out using the most recently received data sample data items (i.e. for k=2) to determine which class the current data sample corresponds to. The classification step 510 is effectively a prediction or estimate of which class the system is believed to be in, at this stage of processing.
At step 606 it is determined whether the likelihood for the current class is greater than a current maximum likelihood. During a first loop, the current maximum likelihood will be zero and so the likelihood for the current class will be greater. Hence, at step 608, the current maximum likelihood is set to the current likelihood and the classification is set to the current class, in this case Class 1. Processing then proceeds to step 610, at which a next class, if any remain, is selected for evaluation, and processing returns (612). Hence steps 602 to 612 implement a loop by which the method determines which of the currently existing classes the current data sample most likely corresponds to.
As there is only a single class at present, processing proceeds to step 614 at which the maximum likelihood determined during loop 602-612 is compared to a threshold likelihood value, for example $e^{-1}$, which is approximately 0.37 and which represents the so called ‘one sigma’ condition. If the maximum likelihood for the current data sample is extremely low, then this may indicate that there is an error in the data or some other problem (for example a sensor malfunctioning or noise) in which case at step 616 an exception or error handling routine may be called, for example to discard the current data sample. Alternatively, if the maximum likelihood is merely very low, then this may indicate that the current data sample corresponds to a genuine class of behaviour of the system, but one which is different to the behaviour corresponding to the currently existing class tables, in the current example, a class of behaviour different to that corresponding to the first class table 202. Hence, at step 616 a new class table 204 is created corresponding to the second class of behaviour. At step 618 a class label is assigned to the new class table. The class label can be an input, either from a user or another computer or system, which provides a real world class label to be associated with the new class table. However, the real world class label does not need to be received at this stage and can be received subsequently or at some other time. Hence, if no real world class label is assigned at step 618, then the method automatically assigns a class label by incrementing a count of the number of different classes and assigning the class table that label, e.g. class_2 in the current example.
The system may then store an indication that the current data sample (k=2) corresponds to the system being in class—2 at step 620. Alternatively, if a new class table was not set up, then from step 614 processing may proceed directly to step 620, at which the system stores an indication that the current data sample corresponds to the system being in class—1. Hence, method 600 assigns an initial or estimated classification, either from amongst the plurality of already existing classes or a newly added class, to the most recent data sample. Processing then returns to method 500.
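The selection loop of steps 602 to 612 and the threshold test of step 614 can be sketched as follows. This is a minimal illustration only: the function name `classify_sample`, the dictionary of per-class likelihoods and the automatic label format `class_N` are assumptions made for the example, and the separate error-handling branch for extremely low likelihoods is omitted.

```python
import math

ONE_SIGMA_THRESHOLD = math.exp(-1)  # 'one sigma' threshold, about 0.37


def classify_sample(likelihoods, threshold=ONE_SIGMA_THRESHOLD):
    """Return (label, is_new_class) for one data sample.

    likelihoods maps each existing class label to the likelihood that the
    current sample belongs to that class (hypothetical input format).
    """
    best_label = None
    best_likelihood = 0.0  # the current maximum likelihood starts at zero
    for label, likelihood in likelihoods.items():  # loop 602-612
        if likelihood > best_likelihood:  # step 606
            best_likelihood = likelihood  # step 608
            best_label = label

    if best_likelihood < threshold:  # step 614
        # No existing class explains the sample well: add a class and label
        # it automatically from the incremented class count (assumed format).
        new_label = "class_%d" % (len(likelihoods) + 1)
        return new_label, True
    return best_label, False
```

With a single existing class whose likelihood for the sample is 0.1, the sketch returns a new automatically labelled class; with a likelihood of 0.9 it returns the existing class.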
Hence, the current state of operation of the aircraft has been provisionally classified as being in one of a plurality of classes. Depending on the classification of the state of operation of the aircraft, and possibly secondary considerations, some output may or may not be required. Hence, at step 512, it is determined whether any output is required from the classifier. For example, if the aircraft is classified as being in a fault state, then at step 512, it may be determined that one or more output signals are required, such as an alarm signal, and at step 514, an alarm signal may be issued to the pilots' instrumentation 130 for display. The output determining step 512 may also take other input in addition to the estimated classification of the state of operation of the aircraft. For example, logical input may also be taken from other systems or sub-systems of the aircraft and applied in a rule based approach to determine what output, if any, may be required at step 514. If it is determined at step 512 that no output is required, for example because the aircraft is classified as being in a normal state of operation, then output step 514 is bypassed and processing proceeds to step 516.
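A rule based output determination of the kind described for steps 512 and 514 might look like the following sketch. The function name `required_outputs`, the label strings, the `engine_warning` flag and the particular rules are all hypothetical and serve only to show how the estimated class can be combined with logical input from other sub-systems.

```python
def required_outputs(estimated_label, subsystem_flags=None):
    """Return the list of output signals required for one classification.

    estimated_label is the real world label of the estimated class and
    subsystem_flags is an optional dict of booleans supplied by other
    systems (both are hypothetical formats chosen for this sketch).
    """
    flags = subsystem_flags or {}
    outputs = []
    # Primary rule: a fault classification always raises an alarm.
    if estimated_label == "fault":
        outputs.append("alarm")
    # Secondary rule: combine the class with input from another sub-system.
    if estimated_label == "abnormal" and flags.get("engine_warning", False):
        outputs.append("alarm")
    return outputs  # an empty list means output step 514 is bypassed
```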
At step 516, the method 500 can pause or wait. The extent of the delay can vary depending on the field of application of the method. For example, when the method is being applied to quickly varying input data sets then the delay can be a few seconds, tenths or hundredths of a second. In other applications the delay can be a few minutes or hours. Hence, delay 516 is simply a suitable delay to provide time for any real world feedback as to the actual class that the current data sample (k=2) belongs to. This is likely to be provided during the training phase, but may not be needed after the training phase. Hence, at step 518 there may optionally be some input of data 520 identifying the actual class that the current data sample corresponds to. The input may be from a user or may be from some other computer, system or apparatus. For example, if the current data sample corresponds to a normal mode of flight, then the actual class input at 518 may be that corresponding to ‘normal’, e.g. class—1. Alternatively, if the current data sample corresponds to an abnormal mode of flight, then the actual class input may be that corresponding to ‘abnormal’, e.g. class—2. It will be appreciated that the input may be either a class indicator (e.g. class—1 or class—2) or may be the real world class label (“normal” or “abnormal”). At some stage during the training phase a real world label will need to be input in order to attach real world significance to each of the classes created by the method.
Steps 518 and 520 are optional in the sense that they do not have to be completed for every data sample received. However, the more often they are completed, particularly during the early stage of the classifier, then the more rapidly the method will be able to train and/or improve its reliability of classification.
The method 500 can optionally include step 522 which can further improve the reliability of the classifier. At step 522 it is determined whether any actual class was input at step 518 and also whether there is a sufficiently large classification error associated with the estimated class. If no actual class was input and there is a large classification error associated with the estimated class generated by step 510, then processing returns to step 504 and the class tables are not updated. The classification error can be quantified by maintaining a count of the number of times the estimated class does not correspond to the input actual class (i.e. the number of wrong classifications) and also maintaining a count of the number of times that the system has been classified as being in the actual class (the total number of classifications). The classification error is then given by the number of wrong classifications divided by the total number of classifications. If that classification error exceeds some threshold, e.g. 5%, then the classification error can be considered large and processing can return to step 504.
Hence, if there is no actual class input and there is a large classification error associated with the estimated class, then it is preferable not to update the class table for the estimated class as there is no actual class available to confirm that the estimated class is correct and so the new data sample may make the classification method less reliable if the class table is updated using the current data sample. The large classification error may be an indication that the system is still effectively training for this class and so more actual class data may be needed.
If there is no actual class input and there is a low classification error associated with the estimated class, then this may indicate that the classification behaviour is reliable and, even though there is no actual class, it is acceptable to update the class table for the estimated class.
If there is an actual class input, then the classification error is irrelevant, and so the class tables can be updated using the actual class to help train the classifier.
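The three cases above can be summarised in a short sketch of the decision made at step 522. The function names and the handling of a class with no classifications yet (treated here as unreliable) are assumptions made for illustration; the 5% default follows the example threshold given above.

```python
def classification_error(wrong_count, total_count):
    """Number of wrong classifications divided by total classifications."""
    if total_count == 0:
        return 1.0  # assumption: an untested class is treated as unreliable
    return wrong_count / total_count


def should_update_tables(actual_class_given, wrong_count, total_count,
                         max_error=0.05):
    """Decide whether the class tables are updated with the current sample."""
    if actual_class_given:
        # An actual class confirms the label, so the error is irrelevant.
        return True
    # No actual class: update only if the estimated class has proved reliable.
    return classification_error(wrong_count, total_count) <= max_error
```

For example, with no actual class input, 1 wrong classification out of 100 permits an update, whereas 10 out of 100 does not.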
After the determination at step 522, processing can proceed to step 526. If there was an input indicating the actual class at step 518 then the actual class is set as the input class, otherwise, in the absence of an input class, the estimated class is assumed to be accurate and the actual class is set to the estimated class at step 526.
At step 530, the method determines whether the actual class corresponds to any of the currently existing classes (e.g. class—1 or class—2). This determination can be carried out simply by comparing class labels. The class label for the actual class is compared with the class labels for all currently existing classes to see if a class corresponding to the actual class already exists or not. If not, the processing proceeds to step 532 and a new data structure 206 for a new class (class—3) is created at step 532. Otherwise, if the actual class is determined to correspond to one of the existing classes at step 530, then at step 534, the mean values for the sensor inputs S1 to Sn are re-calculated (using the stored mean values and the current data sample values and equation (16)) and overwritten in field 220 for the table corresponding to the determined class. As indicated above if an actual class data item 520 is not received at step 518, then the preliminary classification determined by step 510 is assumed to be correct. If an actual class data item is received at step 518, then that classification is assumed to be correct (and may or may not correspond to the preliminary classification generated by step 510). Then at step 526, updated values for the statistical parameters are recursively calculated and the values updated in the table corresponding to the determined class by overwriting fields 222 to 232 respectively. Also, the data item indicating the number of data samples allocated to the classification, e.g. NC1 218, is incremented.
In particular at step 526, an updated value for 1 is recursively calculated using equation (14), an updated value of the covariance matrix is recursively calculated using equation (19), the inverse of 1 is recursively calculated using equation (15), the inverse of the covariance matrix is recursively calculated using equation (18), the determinant of 1 is recursively calculated using equation (28) and the determinant of the covariance matrix is recursively calculated using equation (29).
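To illustrate what a recursive update of this kind involves, the sketch below folds one new sample into a running mean and covariance matrix without storing earlier samples. It uses the standard textbook recursions for the mean and the population covariance, which are an assumption here and are not necessarily identical to equations (14) to (19); the recursive inverse and determinant updates are not shown. The function name and the list-based data layout are hypothetical.

```python
def update_mean_and_covariance(mean, cov, count, sample):
    """Fold one sample into a running mean and population covariance.

    mean is a list of n floats, cov an n x n list of lists and count the
    number of samples already folded in. Returns (new_mean, new_cov,
    new_count). Only the previous statistics are needed, not old samples.
    """
    k = count + 1
    # Deviation of the new sample from the previous mean.
    delta = [x - m for x, m in zip(sample, mean)]
    new_mean = [m + d / k for m, d in zip(mean, delta)]
    scale_old = (k - 1) / k
    scale_new = (k - 1) / (k * k)
    n = len(sample)
    new_cov = [
        [scale_old * cov[i][j] + scale_new * delta[i] * delta[j]
         for j in range(n)]
        for i in range(n)
    ]
    return new_mean, new_cov, k
```

Because each update uses only the stored mean, the stored covariance and the sample count, the memory and computation per sample stay constant however many samples have been processed.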
As described above, in order to avoid singularities in the covariance matrix, the covariance matrix may require regularisation. Hence, at step 538, it is determined whether the algorithm has been applied for a fixed number of steps, equal to a limit, N1. If the algorithm has not been applied N1 times, then processing proceeds to step 540 at which the number of times the algorithm has been applied (steps) is incremented by one. Otherwise, if the algorithm has been applied N1 times, then at step 542, the covariance matrix is regularised, corresponding to the application of equation (20). The count (steps) of the number of times the method 500 has been applied is then reset to zero at step 544. Processing then returns to step 504 at which a next set of sensor data items is input, corresponding to k=3. Processing then continues to loop as described above.
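The counting logic of steps 538 to 544 can be sketched as below. The regularisation shown, adding a small constant to the diagonal of the covariance matrix, is a common technique assumed here purely for illustration and may differ from equation (20); the function names and the value of `epsilon` are likewise hypothetical.

```python
def regularise(cov, epsilon=1e-6):
    """Add a small constant to the diagonal (assumed form of regularisation)."""
    n = len(cov)
    return [
        [cov[i][j] + (epsilon if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def regularisation_step(cov, steps, limit):
    """Return (cov, steps) after one pass through steps 538 to 544.

    steps counts how many times the method has been applied since the last
    regularisation and limit corresponds to N1.
    """
    if steps < limit:  # step 538: not yet applied N1 times
        return cov, steps + 1  # step 540: increment the count
    return regularise(cov), 0  # steps 542 and 544: regularise and reset
```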
Having described a first embodiment of the invention, in relation to an aircraft operation system, a second embodiment of the invention will now be described in the context of image processing.
With reference to
The data items delivered by the image processing unit 706 to the data processing apparatus 720 may be chosen to reflect the nature of any particular classification task. The image processing unit may be considered as a pre-processing unit, configured so as to deliver useful data items. The data processing unit 720 has no concept of the meaning of the data items, rather it acts simply as a classifier.
For example as described above, each image frame may be considered as a single entity or may be divided into a set of bins, each one being a single entity. Then for each entity, a range of numerical data items may be calculated. These data items may, for example, be the average values of each of the red, green and blue signals (RGB) for each such entity. Alternatively the data items may be derived such as the average values of hue, saturation and brightness value (HSV) for each entity. Equally the data items may be grey scale values. It will be understood that other techniques of “binning” the image may be used, and various image processing algorithms may be used, so that a range of appropriate data items may be derived from the original captured image. Suitable methods and transformations include those found in tools such as Photoshop and the like, and/or as described in “Pattern Recognition and Machine Learning” incorporated by reference above.
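As an illustration of deriving data items from a binned frame, the following sketch divides an image into a regular grid of bins and computes the average red, green and blue value for each bin. The function name, the nested-list image representation and the requirement that the image dimensions divide evenly by the grid size are assumptions made for the example.

```python
def mean_rgb_per_bin(image, bin_rows, bin_cols):
    """Split an image into a grid of bins and average R, G and B in each.

    image is a list of rows, each row a list of (r, g, b) tuples. The image
    height and width are assumed to be divisible by bin_rows and bin_cols.
    Returns one (mean_r, mean_g, mean_b) tuple per bin in row-major order.
    """
    height, width = len(image), len(image[0])
    bin_h, bin_w = height // bin_rows, width // bin_cols
    features = []
    for br in range(bin_rows):
        for bc in range(bin_cols):
            # Collect every pixel that falls inside the current bin.
            pixels = [
                image[r][c]
                for r in range(br * bin_h, (br + 1) * bin_h)
                for c in range(bc * bin_w, (bc + 1) * bin_w)
            ]
            n = len(pixels)
            features.append(
                tuple(sum(p[ch] for p in pixels) / n for ch in range(3))
            )
    return features
```

Calling the function with a grid of one row and one column treats the whole frame as a single entity, which corresponds to the whole frame alternative mentioned above.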
Additionally the image processing unit may extract features from the data using well known techniques such as principal component analysis (PCA), GiST (as described in A. Oliva, A. Torralba, “Modelling the shape of the scene: a holistic representation of the Spatial Envelope”, International Journal of Computer Vision, 42: 145-175, 2001, which is incorporated herein by reference for all purposes) and the like.
With reference to
A next bin of the current frame is optionally selected at step 1010 for processing and processing loops 1014 in this way until all of the bins of the current image frame have been processed. Classification of the current image frame is then carried out at step 1016. Then, at step 1018, a next image frame, in this instance the second, is selected for processing and processing returns, as illustrated by process flow line 1020, to step 1004 at which the second image frame is selected for processing. Hence, as images are provided by the image capture device, each image frame is processed on a frame by frame basis in order to classify each of the image frames. Hence, it will be appreciated that the general method 1000 can be applied both to video data and to non-sequentially captured still images. In an alternative embodiment, processing and classification is carried out on a whole frame basis, rather than using bins, and so steps 1006, 1010 and 1014 are omitted from the process illustrated in
The image classification step 1016 uses a method essentially the same as classification method 500 and so only the significant differences will be described. Instead of a data sample being a set of sensor outputs S1 to Sn, the image classification method uses a plurality of image features X1 to Xn which may be directly or indirectly obtained from a frame of image data. Each frame is initially classified and new classes can be added to table 900 by adding rows. During training actual class labels can be input (e.g. car, lorry, plane) to be associated with the different classes of image identified by the classifier. As before, the mean values of the data items are recursively calculated and updated in field 906 of table 900 for the row corresponding to the determined class, and the statistical parameters are similarly recursively calculated and updated in the corresponding fields 908 to 918 of the same row of table 900. The classification assigned to the frame of video data is indicated by field 904. For example, a frame may be classified as relating to one of multiple different types of objects, for example, a car, lorry, plane or unknown. In other embodiments, for example, parts of the landscape which stand out from the surroundings can be classified as landmarks and used for navigation, simple maps, arranging rendezvous for mobile robots or for video diaries.
Once the classification has been determined the method can determine whether any output is required based on the classification assigned to the currently considered frame. One output can be to set the current frame as a ‘landmark’ and assign to it an ID incremented from the previous ID: ID(k)=ID(k−1)+1. This can then be used for simple map building, navigation, rendezvous, video diaries, etc. If it is determined that an output is required in response to the determined classification, then the data processing apparatus 720 can output a signal to control or otherwise interact with the image capture system 700. For example, an alert signal or trigger signal may be issued, causing further or other data to be captured or displayed. If no further output beyond storing the classification for the image frame is required, then no further output signal may be generated.
Generally, embodiments of the present invention, and in particular the processes involved in the identification of anomalous states of the system employ various processes involving data processed by, stored in or transferred through one or more computing or data processing devices. Embodiments of the present invention also relate to an apparatus, which may include one or more individual data processing devices, for performing these operations. This apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer or data processing device, or devices, selectively activated or reconfigured by a computer program and/or data structure stored in the computer or devices. The processes presented herein are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required method steps.
In addition, embodiments of the present invention relate to computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media; semiconductor memory devices, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The data and program instructions of this invention may also be embodied on a carrier wave or other transport medium. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
CPU 902 can also be coupled to an interface 910 that can connect to one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers. Finally, CPU 902 optionally may be coupled to an external device such as a database or a computer or telecommunications network using an external connection as shown generally at 912. With such a connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the method steps described herein.
Although the above has generally described the present invention according to specific processes and apparatus, the present invention has a much broader range of applicability. In particular, aspects of the present invention are not limited to any specific type of industrial system and can be applied to virtually any type of industrial system in which one or more sensors are available to provide time series data relating to one or more properties of the system. One of ordinary skill in the art would recognize other variants, modifications and alternatives in light of the foregoing discussion.
Number | Date | Country | Kind |
---|---|---|---|
1218209.3 | Oct 2012 | GB | national |
This application is a Continuation of International Application No. PCT/GB2013/052635, filed on Oct. 9, 2013, the contents of which are incorporated herein by reference in their entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/GB2013/052635 | Oct 2013 | US |
Child | 14677269 | | US |