The present disclosure relates to machine learning. In particular it relates to explainable machine learning.
The dramatic success of Deep Neural Networks (DNNs) has led to an explosion in their applications. However, the effectiveness of DNNs can be limited by an inability to explain how the models arrived at their predictions.
According to a first aspect of the present disclosure, there is provided a computer-implemented method for machine learning comprising: training an autoencoder having a set of input units, a set of output units and at least one set of hidden units, wherein connections between each of the sets of units are provided by way of interval type-2 fuzzy logic systems each including one or more rules, and the fuzzy logic systems are trained using an optimization algorithm; and generating a representation of rules in each of the interval type-2 fuzzy logic systems triggered beyond a threshold by input data provided to the input units so as to indicate the rules involved in generating an output at the output units in response to the data provided to the input units.
In some embodiments, the optimization algorithm is a Big-Bang Big-Crunch algorithm.
In some embodiments, each type-2 fuzzy logic system is generated based on a type-1 fuzzy logic system adapted to add a degree of uncertainty to a membership function of the type-1 fuzzy logic system.
In some embodiments, the type-1 fuzzy logic system is trained using the Big-Bang Big-Crunch optimization algorithm.
In some embodiments, the representation is rendered for display as an explanation of an output of the machine learning method.
According to a second aspect of the present disclosure, there is provided a computer system including a processor and memory storing computer program code for performing the steps of the method set out above.
According to a third aspect of the present disclosure, there is provided a computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of the method set out above.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:
Artificial Intelligence (AI) systems are being adopted very rapidly across many industries and fields, such as robotics, finance, insurance, healthcare, the automotive sector and speech recognition, as there are strong incentives to use AI systems for business needs such as cost reduction, productivity improvement and risk management. However, the use of complex AI systems such as deep learning, random forests and support vector machines (SVMs) can result in a lack of transparency, creating “black/opaque box” models. These transparency issues are not specific to deep learning or complex models; other classifiers, such as kernel machines, linear or logistic regressions and decision trees, can also become very difficult to interpret for high-dimensional inputs.
Hence, it is necessary to build trust in AI systems by moving towards “explainable AI” (XAI). XAI is a DARPA (Defense Advanced Research Projects Agency) project intended to enable “third-wave AI systems” in which machines understand the context and environment in which they operate and, over time, build underlying explanatory models that allow them to characterize real-world phenomena.
An example of why interpretability is important is the Husky vs. Wolf experiment (Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, N.Y., USA, 1135-1144. DOI: https://doi.org/10.1145/2939672.2939778). In this experiment a neural network was trained to differentiate between dogs and wolves. It did not learn the difference between them; instead, it learned that wolves usually stand near snow and dogs usually stand on grass. It is therefore especially desirable to provide a model for high-dimensional inputs that offers better interpretability than existing black/opaque box models.
Deep Neural Networks have been applied to a variety of tasks such as time series prediction, classification, natural language processing, dimensionality reduction and speech enhancement. Deep learning algorithms use multiple layers to extract inherent features and use them to discover patterns in the data.
Embodiments of the present disclosure use an Interpretable Type-2 Multi-Layer Fuzzy Logic System which is trained using greedy layer-wise training, similar to the way Stacked Autoencoders are trained (Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in Advances in Neural Information Processing Systems, 2007, pp. 153-160). Greedy layer-wise training is used to learn important features or to combine features. This allows the system to handle a much larger number of inputs than standard Fuzzy Logic Systems. A further benefit is that it allows the system to be trained on unlabeled data.
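As a rough illustration of greedy layer-wise training in this context, the following Python sketch trains each autoencoder layer on the encodings produced by the layer below it. The layer objects and their fit/encode methods are hypothetical placeholders, not an API defined by the present disclosure.

```python
# Minimal sketch of greedy layer-wise training (hypothetical layer API).
# Each fuzzy autoencoder layer is trained to reconstruct its own input;
# its encodings then become the training data for the next layer.

def greedy_layerwise_pretrain(layers, X):
    """Train each autoencoder layer on the encodings of the previous one.

    layers : list of objects exposing fit(X) and encode(X) (assumed interface)
    X      : unlabeled training data, shape (n_samples, n_features)
    """
    current = X
    for layer in layers:
        layer.fit(current)               # unsupervised reconstruction training
        current = layer.encode(current)  # compressed features feed the next layer
    return current                       # top-level learned features
```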
The interval type-2 fuzzy logic system (IT2FLS) 200 operates in the following way: crisp inputs in the data are first fuzzified by the fuzzifier 202 into input type-2 fuzzy sets. A type-2 fuzzy set is characterized by a membership function. Herein, we use interval type-2 fuzzy sets such as those depicted in
Once inputs are fuzzified, the inference engine 204 activates a rule base 206 using the input type-2 fuzzy sets and produces output type-2 fuzzy sets. There may be no difference between the rule base of a type-1 FLS and that of a type-2 FLS except that the fuzzy sets are interval type-2 fuzzy sets instead of type-1 fuzzy sets.
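To illustrate the fuzzification and rule-activation steps just described, the following sketch computes the lower and upper membership of a crisp input in a trapezoidal interval type-2 set and combines antecedent memberships into a rule firing interval with a minimum t-norm. The parameterization (separate upper and lower trapezoids) and the function names are illustrative assumptions rather than the exact formulation used in the embodiments.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership of crisp input x, assuming a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def it2_membership(x, upper_params, lower_params, lower_height=1.0):
    """Interval membership [lower, upper] of x in an interval type-2 set."""
    upper = trapezoid(x, *upper_params)
    lower = min(lower_height, trapezoid(x, *lower_params))
    return min(lower, upper), upper

def firing_interval(x_vec, rule_antecedents):
    """Firing interval of one rule: minimum t-norm over its antecedent sets.

    rule_antecedents: list of (upper_params, lower_params), one per input.
    """
    lowers, uppers = [], []
    for x, (up, lo) in zip(x_vec, rule_antecedents):
        l, u = it2_membership(x, up, lo)
        lowers.append(l)
        uppers.append(u)
    return min(lowers), min(uppers)
```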
Subsequently, the output type-2 sets produced in the previous step are converted into a crisp number. There are two methods for doing this: in a first method, a two-step process is used in which the output type-2 sets are converted into type-reduced interval type-1 sets, followed by defuzzification of the type-reduced sets; in a second method, a direct defuzzification process, introduced to avoid the computational complexity of the first method, is used. There are different types of type reduction and direct defuzzification, such as those described by J. Mendel in “Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions” (Upper Saddle River, N.J.: Prentice Hall, 2001).
According to embodiments of the present disclosure, for a type-2 FLS, Center of Sets type reduction is used as it has a reasonable computational complexity that lies between the computationally expensive centroid type reduction and the simple height and modified height type reductions, which have problems when only one rule fires (R. Chimatapu, H. Hagras, A. Starkey and G. Owusu, “Interval Type-2 Fuzzy Logic Based Stacked Autoencoder Deep Neural Network For Generating Explainable AI Models in Workforce Optimization,” 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Rio de Janeiro, 2018, pp. 1-8). After the type reduction, the type-reduced sets are defuzzified by taking their average. For a type-1 FLS, center of sets defuzzification is used.
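One common way to realize Center of Sets type reduction is the iterative Karnik-Mendel procedure sketched below, which computes the left and right endpoints of the type-reduced interval from the rule consequent centroids and firing intervals, and then defuzzifies by averaging the two endpoints. This is a minimal sketch of that standard procedure, not necessarily the exact computation used in the embodiments.

```python
import numpy as np

def km_endpoint(centroids, f_lower, f_upper, right=True, tol=1e-9, max_iter=100):
    """One endpoint of the Center of Sets type-reduced interval (Karnik-Mendel)."""
    order = np.argsort(centroids)
    c = np.asarray(centroids, dtype=float)[order]
    fl = np.asarray(f_lower, dtype=float)[order]
    fu = np.asarray(f_upper, dtype=float)[order]

    w = (fl + fu) / 2.0                      # initial weights: mid firing strengths
    y = np.dot(w, c) / np.sum(w)
    for _ in range(max_iter):
        k = int(np.clip(np.searchsorted(c, y) - 1, 0, len(c) - 2))  # switch point
        if right:   # maximize: low weights on small centroids, high on large ones
            w = np.concatenate([fl[:k + 1], fu[k + 1:]])
        else:       # minimize: high weights on small centroids, low on large ones
            w = np.concatenate([fu[:k + 1], fl[k + 1:]])
        y_new = np.dot(w, c) / np.sum(w)
        if abs(y_new - y) < tol:
            break
        y = y_new
    return y_new

def center_of_sets_defuzzify(c_left, c_right, f_lower, f_upper):
    """Type reduce with Center of Sets, then defuzzify by averaging the endpoints."""
    y_l = km_endpoint(c_left, f_lower, f_upper, right=False)   # left endpoint
    y_r = km_endpoint(c_right, f_lower, f_upper, right=True)   # right endpoint
    return (y_l + y_r) / 2.0
```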
The Big Bang Big Crunch (BB-BC) algorithm is a heuristic population-based evolutionary approach presented by Erol and Eksin (O. Erol and I. Eksin, “A new optimization method: big bang-big crunch,” Advances in Engineering Software, vol. 37, no. 2, pp. 106-111, 2006). Key advantages of the BB-BC are its low computational cost, ease of implementation and fast convergence. The algorithm is similar to a Genetic Algorithm with respect to creating an initial population randomly. The creation of the initial random population is called the Big Bang phase. The Big Bang phase is followed by a Big Crunch phase, which acts as a convergence operator that picks out one output from many inputs via a center-of-mass or minimum-cost approach (B. Yao, H. Hagras, D. Alghazzawi, and M. Alhaddad, “A Big Bang-Big Crunch Optimization for a Type-2 Fuzzy Logic Based Human Behaviour Recognition System in Intelligent Environments,” in Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on, 2013, pp. 2880-2886: IEEE). All subsequent Big Bang phases are randomly distributed around the output picked in the previous Big Crunch phase, typically within a spread that shrinks as the iterations progress, and the two phases alternate until convergence.
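A minimal Python sketch of this loop is given below: an initial random population (Big Bang), a collapse to a cost-weighted centre of mass (Big Crunch), and new populations scattered around that centre with a spread that shrinks over iterations. The parameters, bounds and inverse-cost weighting are illustrative choices, not prescribed by the disclosure.

```python
import numpy as np

def big_bang_big_crunch(cost_fn, dim, bounds, pop_size=50, iterations=100, alpha=1.0):
    """Minimal Big Bang-Big Crunch sketch: shrinking random search around a centre of mass."""
    low, high = bounds
    rng = np.random.default_rng()
    # Initial Big Bang: population drawn uniformly at random over the search space.
    population = rng.uniform(low, high, size=(pop_size, dim))
    best, best_cost = None, np.inf
    for it in range(1, iterations + 1):
        costs = np.array([cost_fn(p) for p in population])
        if costs.min() < best_cost:
            best_cost = float(costs.min())
            best = population[costs.argmin()].copy()
        # Big Crunch: collapse the population to a centre of mass weighted by
        # inverse cost (assumes non-negative costs such as reconstruction error).
        weights = 1.0 / (costs + 1e-12)
        centre = (weights[:, None] * population).sum(axis=0) / weights.sum()
        # Next Big Bang: scatter new candidates around the centre of mass,
        # with a spread that shrinks as the iteration count grows.
        spread = alpha * (high - low) / it
        population = centre + rng.normal(size=(pop_size, dim)) * spread
        population = np.clip(population, low, high)
    return best, best_cost
```

For example, calling big_bang_big_crunch(lambda p: float(np.sum(p**2)), dim=10, bounds=(-5.0, 5.0)) would minimize a simple quadratic cost; in the embodiments the cost function would instead measure the reconstruction or prediction error of the fuzzy system encoded by each particle.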
Optimization Method for the Multi-Layer Fuzzy Logic System
Architecture of the Proposed Multi-Layer FLS
To optimize the Fuzzy Autoencoder, the Membership Functions (MFs) and the rule base are tuned using a method similar to autoencoder training, with some modifications. Firstly, the BB-BC algorithm is used in place of, for example, a gradient descent algorithm. Secondly, each autoencoder is trained in multiple steps instead of in a single step, as set out in the numbered steps below and illustrated in the sketch that follows them.
The steps followed for training the IT2 Fuzzy Autoencoder (FAE) are as follows:
1. Train a Type-1 FAE using BB-BC; the parameters of the membership functions and the rule base are encoded in the following format to create the particles of the BB-BC algorithm:
where M_i represents the membership functions for the inputs and consequents; there are j membership functions per input and four points per MF, representing the four points of a trapezoidal membership function.
where R_l represents the l-th rule of the FLS, with a antecedents and c consequents per rule.
where M_i^e represents the membership functions for the inputs of the encoder FLS, along with the MFs for the k consequents, created using (3); R_l^e represents the rules of the encoder FLS, with l rules, created using (4). Similarly, M_g^d and R_l^d represent the membership functions and rules of the decoder FLS.
2. In the second step, a footprint of uncertainty is added to the membership functions of the inputs and the consequents, and the system is trained using the BB-BC algorithm. The parameters for this step are encoded in the following format to create the particles of the BB-BC algorithm:
where F_{i+k}^e represents the Footprint of Uncertainty (FOU) for each of the i input and k consequent membership functions of the encoder FLS. Similarly, F_{g+h}^d represents the FOUs for the decoder FLS.
3. In the third step, the rules of the IT2 FAE are retrained using the BB-BC algorithm. The parameters for this step are represented as follows:
Note: two default consequents can be added, representing the maximum and minimum of the output range, which improves the performance of the FLS.
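To make the staged procedure above concrete, the sketch below (reusing the big_bang_big_crunch helper sketched earlier) treats the MF points, rule parameters and FOU widths as a single flat particle and runs the three training stages in turn. The evaluate_fae callback, the parameter counts and the search bounds are hypothetical placeholders; in practice they would follow from the membership function and rule encodings (3) and (4) referred to above.

```python
# Hypothetical staged training of the IT2 FAE with BB-BC.
# evaluate_fae(mfs, rules, fous, X) is assumed to build a fuzzy autoencoder
# from the decoded parameters and return its reconstruction error on X.

def decode_particle(p, n_mf_params, n_rule_params, n_fou_params=0):
    """Split a flat BB-BC particle into MF points, rule parameters and FOU widths."""
    mfs = p[:n_mf_params]
    rules = p[n_mf_params:n_mf_params + n_rule_params]
    fous = p[n_mf_params + n_rule_params:
             n_mf_params + n_rule_params + n_fou_params]
    return mfs, rules, fous

def train_fae_in_stages(X, n_mf_params, n_rule_params, n_fou_params, evaluate_fae):
    # Stage 1: type-1 FAE -- optimise MF points and rules together.
    def cost1(p):
        mfs, rules, _ = decode_particle(p, n_mf_params, n_rule_params)
        return evaluate_fae(mfs, rules, None, X)
    best1, _ = big_bang_big_crunch(cost1, n_mf_params + n_rule_params, bounds=(0.0, 1.0))
    mfs, rules, _ = decode_particle(best1, n_mf_params, n_rule_params)

    # Stage 2: fix the MFs and rules, optimise only the footprints of uncertainty.
    def cost2(p):
        return evaluate_fae(mfs, rules, p, X)
    fous, _ = big_bang_big_crunch(cost2, n_fou_params, bounds=(0.0, 0.2))

    # Stage 3: keep the interval type-2 MFs, retrain the rule base.
    def cost3(p):
        return evaluate_fae(mfs, p, fous, X)
    rules, _ = big_bang_big_crunch(cost3, n_rule_params, bounds=(0.0, 1.0))
    return mfs, rules, fous
```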
The full ML FLS system, including the final layer, is trained by starting from the FAE system trained using the method described above and removing the decoder layer of the FAE (per
where M_i^e represents the MFs for the inputs of the first FLS, along with the MFs for the k consequents, created using (3); F_{i+k}^e is the FOU for those MFs; and R_l^e represents the rules of the encoder FLS, with l rules, created using (4). Similarly, M_g^f, F_{g+h}^f and R_l^f represent the membership functions, the FOUs of the MFs and the rules of the second/final FLS.
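Once the decoder is removed, a forward pass through the resulting two-layer system simply chains the encoder FLS and the final FLS, roughly as below; the predict method names are assumptions rather than an interface defined by the disclosure.

```python
def ml_fls_predict(x, encoder_fls, final_fls):
    """Forward pass of the stacked two-layer FLS (assumed predict() interfaces)."""
    features = encoder_fls.predict(x)    # encoder layer: raw inputs -> k learned features
    return final_fls.predict(features)   # final layer: learned features -> crisp output
```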
Experiments were conducted using a predefined dataset. The IT2 Multi-Layer FLS is compared with a sparse autoencoder (SAE) with a single neuron as its final layer, trained using greedy layer-wise training (see, for example, Bengio et al.). The M-FLS system has 100 rules with 3 antecedents in the first layer and 10 consequents. The second layer also has 100 rules and 3 antecedents. Each input has 3 membership functions (Low, Mid and High) and there are 7 consequents at the output layer.
An exemplary visualization of the rules triggered when input is provided to the system is depicted in
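As a rough illustration of how such a representation could be generated, the following sketch lists, for each layer, the rules whose firing strength exceeds a threshold, rendered as human-readable IF-THEN statements using the Low/Mid/High membership function labels mentioned above. The firing_strengths and rules attributes are assumed interfaces, not part of the disclosure.

```python
def explain_prediction(x, layers, threshold=0.5, mf_labels=("Low", "Mid", "High")):
    """List the rules in each layer whose firing strength exceeds a threshold.

    Assumes each layer exposes firing_strengths(x) -> per-rule strengths and a
    `rules` list of (antecedent_mf_indices, consequent) tuples.
    """
    explanation = []
    for li, layer in enumerate(layers):
        strengths = layer.firing_strengths(x)
        for ri, s in enumerate(strengths):
            if s >= threshold:
                antecedents, consequent = layer.rules[ri]
                clause = " AND ".join(
                    f"input{j} is {mf_labels[m]}" for j, m in enumerate(antecedents))
                explanation.append(
                    f"Layer {li + 1}, rule {ri + 1} (strength {s:.2f}): "
                    f"IF {clause} THEN output is {consequent}")
    return explanation
```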
Insofar as embodiments of the disclosure described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present disclosure. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.
Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilizes the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present disclosure.
It will be understood by those skilled in the art that, although the present disclosure has been described in relation to the above described example embodiments, the disclosure is not limited thereto and that there are many possible variations and modifications which fall within the scope of the disclosure.
The scope of the present disclosure includes any novel features or combination of features disclosed herein. The applicant hereby gives notice that new claims may be formulated to such features or combination of features during prosecution of this application or of any such further applications derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.
The present application is a National Phase entry of PCT Application No. PCT/EP2020/057529, filed Mar. 18, 2020, which claims priority from EP Patent Application No. 19164777.5, filed Mar. 23, 2019, each of which is hereby fully incorporated herein by reference.