The present invention relates to the provision of machine learning algorithms and the execution of machine learning algorithms.
Machine learning algorithms are increasingly deployed to address challenges that are unsuitable for being, or too costly to be, addressed using traditional computer programming techniques. Increasing data volumes, widening varieties of data and more complex system requirements tend to require machine learning techniques. It can therefore be necessary to produce models that can analyse larger, more complex data sets and deliver faster, more accurate results and preferably without programmer intervention.
Many different machine learning algorithms exist and, in general, a machine learning algorithm can be expressed as a method to approximate an ideal target function, f, that best maps input variables x (the domain) to output variables y (the range), thus:
y=f(x)
The machine learning algorithm as an approximation of f is therefore suitable for providing predictions of y. Supervised machine learning algorithms generate a model for approximating f based on training data sets, each of which is associated with an output y. Supervised algorithms generate a model approximating f by a training process in which predictions can be formulated based on the output y associated with a training data set. The training process can iterate until the model achieves a desired level of accuracy on the training data.
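By way of illustration only, the iterative supervised training described above can be sketched as follows. The linear model, the learning rate `lr`, the tolerance `tol` and the example target f(x) = 2x + 1 are all assumptions chosen for the sketch and are not taken from this description.

```python
# Minimal sketch (assumed example): approximate f(x) = 2x + 1 from labelled
# training pairs (x, y) by gradient descent on the mean squared error.
def train_linear(samples, lr=0.05, tol=1e-6, max_iters=10_000):
    """Fit y ~ w*x + b, iterating until the training loss falls below tol."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(max_iters):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            err = (w * x + b) - y          # prediction error for this sample
            loss += err * err / n
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        if loss < tol:                     # desired accuracy on training data
            break
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


samples = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = train_linear(samples)
```

The loop stops as soon as the loss on the training data is below the chosen tolerance, which corresponds to iterating until a desired level of accuracy is achieved.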
Other machine learning algorithms do not require training. Unsupervised machine learning algorithms generate a model approximating f by deducing structures, relationships, themes and/or similarities present in input data. For example, rules can be extracted from the data, a mathematical process can be applied to systematically reduce redundancy, or data can be organised based on similarity.
Semi-supervised algorithms can also be employed, such as a hybrid of supervised and unsupervised approaches.
Notably, the range, y, of f can be, inter alia: a set of classes of a classification scheme, whether formally enumerated, extensible or undefined, such that the domain x is classified e.g. for labelling, categorising etc.; a set of clusters of data, where clusters can be determined based on the domain x and/or features of an intermediate range y′; or a continuous variable such as a value, series of values or the like.
Regression algorithms for machine learning can model f with a continuous range y. Examples of such algorithms include: Ordinary Least Squares Regression (OLSR); Linear Regression; Logistic Regression; Stepwise Regression; Multivariate Adaptive Regression Splines (MARS); and Locally Estimated Scatterplot Smoothing (LOESS).
Clustering algorithms can be used, for example, to infer f to describe hidden structure from data including unlabelled data. Such algorithms include, inter alia: k-means; mixture models; neural networks; and hierarchical clustering. Anomaly detection algorithms can also be employed.
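As a hedged illustration of clustering unlabelled data, a one-dimensional k-means can be sketched as below. The function name `kmeans_1d`, the fixed initial centroids and the sample points are assumptions made for the example.

```python
# Minimal sketch (assumed example): k-means on one-dimensional, unlabelled data.
def kmeans_1d(points, centroids, iters=20):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 6.0])
```

Here the two groups of points are recovered without any labels, so the structure is inferred from the data alone.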
Classification algorithms address the challenge of identifying to which of a set of classes or categories (range y) one or more observations (domain x) belong. Such algorithms are typically supervised or semi-supervised based on a training set of data. Algorithms can include, inter alia: linear classifiers such as Fisher's linear discriminant, logistic regression, Naïve Bayes classifier; support vector machines (SVMs) such as a least squares support vector machine; quadratic classifiers; kernel estimation; decision trees; neural networks; and learning vector quantisation.
While the detailed implementation of any machine learning algorithm is beyond the scope of this description, the manner of their implementation will be familiar to those skilled in the art with reference to relevant literature including, inter alia: “Machine Learning” (Tom M. Mitchell, McGraw-Hill, 1 Mar. 1997); “Elements of Statistical Learning” (Hastie et al, Springer, 2003); “Pattern Recognition and Machine Learning” (Christopher M. Bishop, Springer, 2006); “Machine Learning: The Art and Science of Algorithms that Make Sense of Data” (Peter Flach, Cambridge, 2012); and “Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies” (John D. Kelleher, MIT Press, 2015).
Thus it can be seen that selection of a machine learning algorithm to address a problem can be challenging in view of the numerous alternatives available, each with varying suitability. Furthermore, machine learning algorithms are tailored specifically for a task and implemented in a manner that tightly couples algorithms to tasks. It would be beneficial to address these challenges in the state of the art to provide for more effective execution and arrangement of machine learning algorithms.
According to a first aspect of the present invention, there is provided a computer implemented method of a machine learning algorithm modelling a target function mapping inputs in an input domain to outputs in an output range, the machine learning algorithm including an array of processing nodes arranged in a network of layers of nodes including an input layer for receiving an input value, an output layer for providing an output value, and one or more intermediate layers between the input and output layers, each node in the processing set being outside the input layer receiving input from at least some adjacent nodes logically closer to the input layer via weighted connections between nodes, and each node being outside the output layer generating output to at least some adjacent nodes logically closer to the output layer via weighted connections between nodes, wherein each node includes: an adjustable weight for application to each input to the node, the adjustable weight being responsive to a threshold function applied to a value of the node input; a combination function for combining outputs of the threshold function; and a node bypass function for selectively mapping one or more of the inputs to the node to the output of the node, the method comprising iteratively training the machine learning algorithm to model the target function by adjustment, at each iteration, of at least weights of connections between at least a subset of the nodes, such that the nodes of the network are programmable during operation of the algorithm by adjustment of the threshold function and the bypass function so as to selectively emphasise subsets of nodes in the network.
Preferably, the target function is defined through example by a set of inputs each associated with an output.
Preferably, the algorithm is iteratively trained using backpropagation.
Preferably, the machine learning algorithm is trained by an evolutionary algorithm whereby adjustments to the threshold functions and/or weights of connections between nodes are made by mutation and measurement of a degree of fitness of the machine learning algorithm to model the target function.
Preferably, the threshold function of at least a subset of nodes is adjusted during training in response to a measure of a degree of fitness of the algorithm for modelling the target function.
Preferably, the bypass function of at least a subset of nodes selectively maps in response to a measure of a degree of fitness of the algorithm for modelling the target function.
According to a second aspect of the present invention, there is provided a computer system including a processor and memory storing computer program code for performing the steps of the method set out above.
According to a third aspect of the present invention, there is provided a computer system including a processor and memory storing computer program code for performing the steps of the method set out above.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
In use, the machine learning algorithm is trained iteratively using a conventional training approach such as supervised or unsupervised machine learning including, for example, backpropagation, based on training data provided via nodes 210 in the input layer 204. During training, adjustments are made to the weights of weighted connections between nodes. Preferably, training is continued until a measure of a degree of fitness of the machine learning algorithm 200 to model the target function 202 meets a threshold degree. For example, a degree of fitness can be measured by way of test data for which proper or expected output of the target function applied to the test data is known. Such test data provided as input to the trained machine learning algorithm 200 to generate output at the output layer 208 can be used to compare with proper or expected output of the target function 202 to measure a degree of affinity or fitness of the trained algorithm 200 to model the target function 202.
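The stopping rule described above, in which training continues until a measured degree of fitness meets a threshold, can be sketched as follows. This is a minimal sketch under stated assumptions: the single-weight model, the stand-in target function f(x) = 3x, the tolerance `tol` and the names `fitness` and `train_until_fit` are hypothetical and are not part of the described embodiments.

```python
# Minimal sketch (assumed example): train until fitness on test data meets a
# threshold degree. The model is a single weight w, with output w * x.
def target(x):
    """Stand-in for the target function; chosen only for illustration."""
    return 3.0 * x


train = [(x, target(x)) for x in (1.0, 2.0, 3.0)]
test = [(x, target(x)) for x in (0.5, 1.5, 2.5, 3.5)]


def fitness(w, data, tol=0.05):
    """Fraction of inputs whose output is within tol of the expected output."""
    return sum(abs(w * x - y) <= tol for x, y in data) / len(data)


def train_until_fit(threshold=1.0, lr=0.05, max_iters=1000):
    w = 0.0
    for iteration in range(max_iters):
        if fitness(w, test) >= threshold:   # fitness meets threshold degree
            return w, iteration
        # One gradient step on mean squared error over the training data.
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad
    return w, max_iters


w, iterations = train_until_fit()
```

Fitness is measured on test data whose expected output is known, separately from the data used to adjust the weight.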
Embodiments of the present invention provide for programming of the machine learning algorithm 200 by way of programming the network during operation of the algorithm by adjustment of characteristics of the nodes 210 so as to selectively emphasise subsets of the nodes in the network. Such selective emphasis provides for the formation of dominant subsets of nodes in the machine learning algorithm 200 and for the provision of an improved memory capability of the algorithm 200.
In contrast to nodes, neurons or comparable processing elements in conventional machine learning algorithms, the node 210 according to the present invention includes a bypass function 304, and one or more threshold functions 306, 308 with indicators f0, f1, fm for determining a weight for application to inputs X0, X1, Xm to emphasise or deemphasise inputs in the node 210.
The bypass function 304 selectively maps one or more of the node inputs X0, X1, Xm to the output 314, Y of the node 210. The bypass function 304 is programmable at a runtime of the machine learning algorithm 200 to influence a value of the output 314 of the node 210 such as by locking the value to a value of one of the inputs X0, X1, Xm. The selection of the bypass 304 can be made by a process or parameter external to the algorithm 200 such as based on an input or configuration of the algorithm 200.
Where the bypass function 304 is not selected, inputs X0, X1, Xm are each processed by a threshold function 306 such as a sigmoid function. Responsive to the threshold function 306, an indicator f0, f1, fm identifies whether an input X0, X1, Xm, so processed by the threshold function 306, is to be emphasised or deemphasised such as by indicating an excitatory or inhibitory effect of the respective input. For example, an excitatory effect can be realised by emphasising an input such as by magnifying, multiplying, scaling or increasing a value of the input. In contrast, an inhibitory effect can be realised by deemphasising an input such as by reducing a value of the input. In some embodiments, inputs may be further processed by one or more further threshold functions 308. Threshold functions of the node 210 may be adjusted, reconfigured or adapted as part of the training process to improve fitness of the algorithm 200 to model the target function 202.
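One possible reading of the node behaviour described above is sketched below, purely by way of example. The class name `Node`, the `cutoff` used to derive each indicator, the `gain` and `damp` scaling factors and the summation used as the combination function are all assumptions of this sketch and are not specified by the description.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class Node:
    """Hypothetical node with per-input threshold, indicator and bypass."""

    def __init__(self, weights, cutoff=0.5, gain=2.0, damp=0.5):
        self.weights = list(weights)   # adjustable weight per input
        self.cutoff = cutoff           # assumed: indicator decision level
        self.gain = gain               # assumed: excitatory scaling
        self.damp = damp               # assumed: inhibitory scaling
        self.bypass_index = None       # None means bypass is not selected

    def output(self, inputs):
        if self.bypass_index is not None:
            # Bypass selected: lock the output to the value of one input.
            return inputs[self.bypass_index]
        total = 0.0
        for x, w in zip(inputs, self.weights):
            t = sigmoid(x)                      # threshold function
            excitatory = t >= self.cutoff       # indicator for this input
            scale = self.gain if excitatory else self.damp
            total += w * t * scale              # combine threshold outputs
        return total


node = Node([1.0, 1.0])
normal = node.output([1.0, -1.0])
node.bypass_index = 1          # selection made externally, e.g. by configuration
bypassed = node.output([0.3, -0.7])
```

In this sketch, setting `bypass_index` from outside the node stands in for selection of the bypass by a process or parameter external to the algorithm.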
Thus, in use, the machine learning algorithm 200 is trained iteratively to model the target function 202 by adjustment, at each iteration, of weights of connections between at least a subset of nodes 210 in the algorithm 200. Furthermore, the nodes of the algorithm 200 are programmable during operation of the algorithm 200 by adjustment of the threshold function 306 for emphasising or deemphasising inputs to a node 210, and by selective bypassing of a node 210 by the bypass function 304.
In one embodiment, the machine learning algorithm 200 is trained by an evolutionary algorithm technique whereby adjustments to the threshold function(s) 306, 308 and/or the weights of connections between nodes 210 are made by mutation. The evolutionary algorithm can operate on the basis of an objective function such as a measurement of a degree of fitness of the machine learning algorithm to model the target function such that exemplars in generations of evolutionary adjustments are retained or discarded based on such a measure.
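A mutation-based training loop of this general kind can be sketched as follows. The sketch assumes a simple scheme with one parent and one mutated child per generation, Gaussian mutation of weights only, and negative mean squared error as the fitness measure; none of these choices is prescribed by the description, and the names `fitness` and `evolve` are hypothetical.

```python
import random


def fitness(weights, data):
    """Negative mean squared error of a weighted sum: higher is fitter."""
    total = 0.0
    for xs, y in data:
        prediction = sum(w * x for w, x in zip(weights, xs))
        total += (prediction - y) ** 2
    return -total / len(data)


def evolve(data, n_weights, generations=500, sigma=0.1, seed=0):
    rng = random.Random(seed)          # seeded for repeatability
    parent = [0.0] * n_weights
    best = fitness(parent, data)
    for _ in range(generations):
        # Mutation: perturb every weight with Gaussian noise.
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        score = fitness(child, data)
        if score >= best:              # retain the fitter exemplar
            parent, best = child, score
        # Otherwise the child is discarded and the parent is kept.
    return parent, best


data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
weights, best = evolve(data, n_weights=2)
```

Because a child replaces the parent only when it is at least as fit, the retained exemplar never becomes less fit from one generation to the next.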
Thus, in one embodiment, for a machine learning algorithm 200 having an array of nodes 210 of predetermined dimension, training data is presented to the nodes at the input layer 204 and the algorithm 200 can be tested using a standard loss/error function. Error values can be propagated through the array of nodes and used to modify one or more of the following:
Each node 210 can also contain an internal logic state table that modifies the inhibition/excitation weights so permitting the algorithm 200 to act as a programmable logic array. The logic state can be defined by a binary truth table which determines whether an input to a node 210 excites or inhibits the node 210 to a degree that can be variable by weightings applied to each input to the node 210. The state of the truth table for each node 210 may either be predefined or dynamically adapted as part of a training phase. Unlike a conventional programmable gate array, the logic can be part of a learned response of the algorithm.
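One way such a truth table might be realised is sketched below, as an assumption-laden illustration only. The binarisation `cutoff`, the table layout (a tuple of input bits mapped to a per-input sign) and the name `make_node` are hypothetical.

```python
# Minimal sketch (assumed layout): a binary truth table selects, for each
# combination of binarised inputs, whether each input excites (+1) or
# inhibits (-1) the node; the weights set the degree of the effect.
def make_node(truth_table, weights, cutoff=0.5):
    def node(inputs):
        bits = tuple(int(x >= cutoff) for x in inputs)
        signs = truth_table[bits]
        return sum(s * w * x for s, w, x in zip(signs, weights, inputs))
    return node


# An XOR-like table: only the single active input excites the node.
xor_like = {
    (0, 0): (-1, -1),
    (0, 1): (-1, +1),
    (1, 0): (+1, -1),
    (1, 1): (-1, -1),
}
node = make_node(xor_like, weights=(1.0, 1.0))
one_active = node((1.0, 0.0))
both_active = node((1.0, 1.0))
```

The table here is predefined; in a trained variant its entries would be among the parameters adapted during the training phase.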
In some embodiments, the algorithm can adapt its internal topology such that sub-networks are dynamically formed to perform specific tasks required to properly model the target function 202. Such sub-networks can be considered to operate in a manner similar to subroutines. Further, in some embodiments the dimensions of the array of nodes 210 in the algorithm can be dynamically adjusted such as by growing or shrinking the array during training, such as to adjust a rate of learning or to accommodate constraints or availability of computing resource. Such adjustments permit the algorithm 200 to respond to, and dynamically adjust for, changes in computing or network performance.
The bypass function 304 within each node 210 provides a “phase tracking” facility which can be employed to “phase lock” nodes to a current state of a connected node. This is beneficial if the algorithm 200 is modelling time-dependent functions such as signal processing, or real-time speech analysis. It can also be used to help form localised blocks of nodes 210 with specific logic functions. For example, time constants to regulate such a phase-locking process may be included as part of learning hyperparameters of the machine learning algorithm. In some embodiments, the algorithm 200 can learn to group nodes 210 for a specific target function, such as a group of phase locked nodes 210 emerging from training in response to specific features within the training data, e.g. edges in images, or phonemes in speech analysis.
In embodiments where an evolutionary algorithm is employed, groups of nodes 210 in the algorithm 200, such as layers or columns of nodes 210, can be mapped to a single chromosome and parameters of each node 210 within such a group can form corresponding genes within the chromosome. In such an embodiment, a training phase can use a number of fitness evaluation steps of the algorithm 200.
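The mapping of a group of nodes to a chromosome can be sketched as follows. The parameter set held in `NodeParams` (weights, a threshold `cutoff` and a `bypass` flag) and the function names are assumptions made for the example; the description does not fix which node parameters form a gene.

```python
from dataclasses import dataclass


@dataclass
class NodeParams:
    """Hypothetical per-node parameters that are subject to evolution."""
    weights: list
    cutoff: float
    bypass: bool


def to_chromosome(group):
    """Map a group of nodes (e.g. a layer) to one chromosome, one gene per node."""
    return [(tuple(n.weights), n.cutoff, n.bypass) for n in group]


def from_chromosome(chromosome):
    """Rebuild the group of nodes from its chromosome."""
    return [NodeParams(list(w), c, b) for w, c, b in chromosome]


layer = [NodeParams([0.1, 0.2], 0.5, False),
         NodeParams([0.3, -0.4], 0.7, True)]
chromosome = to_chromosome(layer)
restored = from_chromosome(chromosome)
```

A round trip through the chromosome reproduces the original group, so mutation can be applied to genes and the result decoded back into nodes for fitness evaluation.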
An algorithm 200 such as that implemented in accordance with embodiments of the present invention can take a form of a recurrent neural network model with each node 210 containing a primary transfer function plus state logic to control the effects of each input connection on the node 210 output. A resulting node 210 can also act as an external forgetting gate, or memory gate to a neighbouring node 210, via its application of an inhibitory or excitatory signal as previously described.
Insofar as embodiments of the invention described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.
Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilises the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present invention.
It will be understood by those skilled in the art that, although the present invention has been described in relation to the above described example embodiments, the invention is not limited thereto and that there are many possible variations and modifications which fall within the scope of the invention.
The scope of the present invention includes any novel features or combination of features disclosed herein. The applicant hereby gives notice that new claims may be formulated to such features or combination of features during prosecution of this application or of any such further applications derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.
Number | Date | Country | Kind |
---|---|---|---|
2020439.2 | Dec 2020 | GB | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/083782 | 12/1/2021 | WO | |