The present invention relates to a machine learning device and a machine learning method.
Machine learning and artificial intelligence, which have been attracting high expectations recently, rely on deep learning as their core technology. Meanwhile, reservoir computing has also made remarkable progress recently. The two techniques are not so different from each other in terms of configuration, but differ in the learning process. Specifically, deep learning involves learning all of the coefficients for the calculation performed at each node, whereas reservoir computing involves learning only the coefficients related to the output, with the coefficients at each node fixed. In other words, the coefficients are learned over multiple layers in deep learning, but only in a single layer in reservoir computing. This is the fundamental difference between the two methods, and it characterizes each of them. In deep learning, the coefficients of all nodes are learned. This generally results in a long learning time, but also in high calculation versatility. In contrast, in reservoir computing the learning is performed in the output layer only. Thus, compared with deep learning, a shorter learning time can be achieved, but with the coefficients fixed at all nodes except the output layer, the versatility is compromised. Still, with reservoir computing, learning is not required for layers other than the output layer, meaning that versatility of the calculation unit is not required in the first place. Thus, while deep learning presupposes a calculator with relatively high versatility, reservoir computing does not presuppose versatility of the calculation unit. In view of this, active research has recently been conducted on using physical systems such as spin systems and quantum systems in reservoir computing.
For example, WO2017/131081 discloses a quantum information processing system where reservoir computing is applied to a quantum system. WO2017/131081 discloses a "technique of a system including a quantum system including multiple qubits interacting with each other, in which a quantum reservoir includes multiple virtual nodes that hold signals from the respective qubits at each of time points dividing a certain time interval into multiple intervals, a weight for determining a linear weight in a linear coupling between the multiple virtual nodes is determined, and an output signal is read that is obtained from quantum superposition of the states at the virtual nodes and is linearly combined using the linear weight determined."
Gouhei Tanaka, Toshiyuki Yamane, Jean Benoit Héroux, Ryosho Nakane, Naoki Kanazawa, Seiji Takeda, Hidetoshi Numata, Daiju Nakano, and Akira Hirose, "Recent Advances in Physical Reservoir Computing: A Review," arXiv:1808.04962v2, discloses an example where reservoir computing is applied to a spin system, as one example of application of reservoir computing to a physical system (6. Spintronic RC).
When a physical system is applied to a machine learning device, its controllability is insufficient but the degree of freedom of the system is high. The latter feature can offer the rich dynamics (dynamic characteristics) required in machine learning. In particular, in quantum systems, physical quantities evolve in time in accordance with an equation different from that in classical systems, and thus quantum systems can achieve dynamics unachievable in classical systems. The disadvantage of insufficient controllability has been a problem for the application of physical systems to machine learning devices; however, as described above, this is no longer the case if the concept of reservoir computing is used.
For the above reasons, physical systems have become usable as machine learning devices. However, actual application of a physical system to a calculation unit encounters many hardware limitations, and thus there are cases in which the dynamics that the physical system is supposed to provide cannot be fully exploited, owing to reasons such as difficulty in measuring output values. This is particularly the case with a quantum system. Thus, to improve the effectiveness of the physical system in computing, a solution is required that can draw out the dynamics of the physical system even when the number of measurement times is small. However, the above prior art does not consider this point, i.e., sufficiently drawing out the dynamics of the physical system even when the number of measurement times is small.
In view of the above, an object of the present invention is to provide a machine learning device capable of sufficiently drawing out the dynamics in a machine learning unit with a small number of measurement times.
A machine learning device according to the present invention preferably includes:
With the present invention, a machine learning device capable of sufficiently exploiting the dynamics in a machine learning unit with a small number of measurement times can be provided.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following description and the drawings are examples for describing the present invention, and they may be partly omitted and simplified as appropriate for clarity of the description. The present invention can be implemented in other various embodiments. Unless specified otherwise, each component may be provided singly or in plurality.
For the sake of understanding of the invention, the positions, sizes, shapes, ranges, and the like of the components shown in the drawings and the like may not represent actual positions, sizes, shapes, ranges, and the like. Thus, the present invention is not necessarily limited to the position, size, shape, range, and the like disclosed in the drawings and the like.
In the following description, various types of information may be described using expressions such as “table”, “list”, and “queue”. However, various types of information may be expressed by other data structures. To indicate that the information does not depend on the data structure, “XX table”, “XX list”, etc. may be referred to as “XX information”. Identifying information may be described using expressions such as “identifying information”, “identifier”, “name”, “ID”, and “number”. These can be interchangeably used.
When there are a plurality of components having the same or similar functions, such components may be described with the same reference numerals and different suffixes. When such components need not be distinguished from each other, the suffixes may be omitted in the description.
Furthermore, in the following descriptions, processing performed by executing a program may be described, and the subject of such processing may be expressed in various ways. In some cases, a program is executed mainly by a processor (e.g., a CPU or a GPU), and a storage resource (e.g., a memory) and/or an interface device (e.g., a communication port) may be used as appropriate. Similarly, the subject of processing performed by executing a program may be a controller, a device, a system, a computer, or a node including a processor. Alternatively, the subject of processing performed by executing a program may include a dedicated circuit (e.g., an FPGA or an ASIC) for executing specific processing.
The program may be installed on a device such as a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable storage medium. When the program source is a program distribution server, the program distribution server may include a processor and a storage resource for storing the program to be distributed, and the processor of the program distribution server may distribute the subject program to another computer. In the following descriptions, two or more programs may be implemented as a single program, or a single program may be implemented as two or more programs.
Hereinafter, embodiments of the present invention will be described with reference to
Hereinafter, a first embodiment will be described with reference to
A machine learning device according to the first embodiment uses reservoir computing as a basic feature.
First of all, a configuration of the machine learning device according to the first embodiment will be described with reference to
The machine learning device according to the first embodiment has substantially the same configuration as an ordinary computer, but differs from an ordinary computer in that a reservoir arithmetic device 100 as a machine learning arithmetic device is provided in addition to a general arithmetic device 902. A calculation procedure is almost the same as that in an ordinary computer. Specifically, the procedure is implemented by exchanging data between a main storage device 901 and the general arithmetic device 902 and between the main storage device 901 and the reservoir arithmetic device 100 under the control of a control device 903.
Work data and an execution program are loaded into the main storage device 901, to be referred to from the control device 903 and the general arithmetic device 902. An auxiliary storage device 904 is a nonvolatile large-capacity storage device such as a hard disk drive (HDD) or a solid state drive (SSD), and an execution program and data installed therein are loaded into the main storage device 901 as necessary. Data input to the main storage device 901 is performed through an input device 905, and data output from the main storage device 901 is performed through an output device 906. These operations are performed under the control of the control device 903.
The reservoir arithmetic device 100 is a dedicated calculation unit that is dedicated for supervised learning based on reservoir computing. The reservoir arithmetic device 100 includes a reservoir 20, an input unit 10, and an output unit 30. The reservoir 20 is a body including multiple interacting nodes 201 to 204 and an input/output system.
Next, an operation of the machine learning device will be described with reference to
First of all, an operation performed by the machine learning device will be described.
It is assumed that (x_v, y_v^0) is input as training data. Here, the subscript v indicates that x_v and y_v^0 are vectors, and their components are represented as x_v = (x_1, x_2, ..., x_{n_x})^T and y_v^0 = (y_1^0, y_2^0, ..., y_{n_y}^0)^T. Here, x_v and y_v^0 are column vectors, and T represents transposition. Furthermore, x_v is referred to as the first training data and y_v^0 is referred to as the second training data. The main storage device 901 stores (x_v, y_v^0) input through the input device 905.
The first training data x_v is input to the node 201, through the input unit 10 in the reservoir arithmetic device 100, in the order of x_1, x_2, ..., x_{n_x}. The subscripts of these represent time points. In
In the example of the nodes of the reservoir 20 illustrated in
Let z_i(t_k) be the measurement outcome at a node i (i is a subscript corresponding to the node 202, the node 203, and the node 204) at a certain time point t_k. This z_i(t_k) is a value obtained by measuring a physical quantity at the node and converting it into a voltage or current. Through the output unit 30, z_i(t_k) is transmitted to and stored in the main storage device 901. Thus, the measurement outcome z_i(t_k) is obtained.
Next, a training process and a subsequent inference process using unknown input data will be described.
First of all, the training process and the inference process in a case where each measurement outcome is used once will be described. These are general training and inference processes in ordinary machine learning.
An output at the time point t_k is assumed to be given, using coefficients w_i, by y(t_k) = Σ_i z_i(t_k) w_i. In training according to supervised learning, w_i is determined so that y_v = (y(t_1), y(t_2), ..., y(t_{n_y}))^T coincides with y_v^0. Specifically, w_i minimizing Σ_k (y(t_k) − y^0(t_k))^2 should be found.
Let w_v be a column vector whose components are w_i, and let Z be the matrix whose (k,i) element is z_i(t_k). Then, y_v = Z w_v. If there exists w_v = w_v^0 resulting in the output y_v^0, then y_v^0 = Z w_v^0. Since Z is generally not a square matrix, it generally has no inverse matrix, but a pseudo-inverse matrix (Moore-Penrose pseudo-inverse matrix) can be defined. Assuming that Z is an m×n matrix, that rank(Z) = n, and that Z^+ is the pseudo-inverse matrix of Z, the pseudo-inverse matrix is Z^+ = (Z^T Z)^{-1} Z^T. Here, (Z^T Z)^{-1} is the inverse matrix of Z^T Z. Using the pseudo-inverse matrix Z^+, w_v^0 = Z^+ y_v^0 holds, whereby w_v^0 is obtained. This w_v^0 corresponds to the w_v achieving the minimum value of Σ_k (y(t_k) − y^0(t_k))^2.
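For illustration, the training step described above can be reproduced with a few lines of Python, as in the minimal sketch below. The array shapes, the toy data, and the use of numpy.linalg.pinv (which coincides with (Z^T Z)^{-1} Z^T when rank(Z) = n) are assumptions made for this example and are not part of the device itself.

```python
import numpy as np

# Z: measurement outcomes, one row per time point t_k, one column per node i.
# y0: second training data y_v^0 (one target value per time point).
def train_readout(Z: np.ndarray, y0: np.ndarray) -> np.ndarray:
    """Return w_v^0 minimizing sum_k (y(t_k) - y^0(t_k))^2, i.e. w_v^0 = Z^+ y_v^0."""
    return np.linalg.pinv(Z) @ y0

# Toy example: 300 time points, 3 measured nodes (nodes 202 to 204).
rng = np.random.default_rng(0)
Z = rng.standard_normal((300, 3))
y0 = rng.standard_normal(300)
w0 = train_readout(Z, y0)          # shape (3,)
print(w0)
```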
The calculation described above is implemented with the main storage device 901 and the general arithmetic device 902 exchanging data, and the w_v^0 obtained in the end is stored in the main storage device 901.
The above training can be summarized as a process of "determining the coefficient w_v^0 to reproduce the second training data y_v^0". In the process of reproducing the second training data y_v^0 from the measurement outcome (output value) z_i(t_k) at the node i (i is a subscript corresponding to the node 202, the node 203, and the node 204), the measurement outcome z_i(t_k) is the object processed. Thus, the process of determining the coefficient w_v^0 can be regarded as a process of determining a rule for processing the measurement outcome z_i(t_k). Furthermore, successful determination of the coefficient w_v^0 means successful determination of the rule for processing the measurement outcome z_i(t_k) to reproduce the second training data y_v^0.
After the training, an output signal can be inferred from input data x_v^s using the obtained w_v^0, to obtain y_v^s. That is, x_v^s is transmitted to the node 201 through the input unit 10, and z_i^s(t_k), which results from the time evolution of x_v^s in accordance with the dynamics of the node 201 itself, the dynamics based on the interaction among the nodes 201 to 204, and the dynamics of the nodes other than the node 201, is transmitted to the main storage device 901 through the output unit 30 and stored therein. Using this z_i^s(t_k), i.e., using Z^s, y_v^s is obtained by the relation y_v^s = Z^s w_v^0.
This inference process can be summarized as follows: “general input data xvs is arithmetically processed in the reservoir arithmetic device 100, and the arithmetic result is processed based on the rule obtained in the training to obtain an output”.
As illustrated in
Next, in the inference (prediction, decision, recognition) process, x_v^s is input to the reservoir arithmetic device 100, and z_i^s(t_k) is output. Then, the general arithmetic device 902 uses Z^s, whose (k,i) element is z_i^s(t_k), and the w_v^0 obtained in the training process to determine y_v^s.
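Continuing the same illustrative assumptions, the inference step is a single matrix-vector product. The matrix Zs below is a stand-in for the measurement outcomes z_i^s(t_k) obtained for the unknown input, and the coefficient values are placeholders.

```python
import numpy as np

def infer(Zs: np.ndarray, w0: np.ndarray) -> np.ndarray:
    """Inference: y_v^s = Z^s w_v^0, one output value per time point."""
    return Zs @ w0

# Toy example: outcomes for unknown input data, same 3 measured nodes as in training.
rng = np.random.default_rng(1)
Zs = rng.standard_normal((300, 3))
w0 = np.array([0.2, -0.5, 0.1])    # coefficients obtained in the training step
ys = infer(Zs, w0)                  # shape (300,)
```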
The processing described above corresponds to a conventional technique, where the training process and the inference process use each measurement outcome once. Next, let us describe the extended version of the processing described above, where each measurement outcome is used multiple times in the process of obtaining output data from input data.
In the case where each measurement outcome is used once, the output at the time point t_k is calculated as y(t_k) = Σ_{i=1}^{n} z_i(t_k) w_i, as described above. In terms of a matrix and a vector, this is represented as y_v = Z w_v. This is illustrated in
Let Z_r be the matrix with the elements of Z shifted by r rows (r = 0, ..., q−1); Z_0 = Z.
When each measurement outcome is used once, the sum with respect to i runs from i = 1 to i = n. This means that the size of the vector w_v is n. As can be clearly seen in
Thus, the machine learning device according to the first embodiment is the same as the conventional technique in that w_v^0 is determined to reproduce the second training data in the training process and the output is inferred from unknown input data using that w_v^0, but is different in that the measurement signal z_i(t_k) at the node i at the time point t_k, obtained through the output unit 30, is used redundantly to draw out the dynamics of the reservoir arithmetic device 100 as much as possible.
As described above, in the present embodiment, the output (measurement outcome) from the machine learning device is used redundantly so that the number of processed data items can be made larger than the number of measurement outcomes. Thus, the dynamics of the machine learning unit can be sufficiently exploited even with a small number of measurement times.
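One plausible way to realize this redundant use is sketched below in Python. The row-shift direction, the zero padding at the boundary, and the reuse count q are assumptions made for the illustration; the point is only that concatenating the shifted matrices Z_0, ..., Z_{q−1} column-wise expands the readout from n to q·n coefficients without any additional measurement.

```python
import numpy as np

def expand_redundantly(Z: np.ndarray, q: int) -> np.ndarray:
    """Concatenate q row-shifted copies Z_0, ..., Z_{q-1} of Z column-wise,
    so that each measurement outcome contributes to q different time points.
    Rows shifted past the edge are zero-padded (an assumed convention)."""
    T, n = Z.shape
    blocks = []
    for r in range(q):
        Zr = np.zeros_like(Z)
        Zr[r:, :] = Z[: T - r, :]      # shift the rows of Z down by r
        blocks.append(Zr)
    return np.hstack(blocks)            # shape (T, q*n)

# Training with redundant use: the readout vector now has q*n components.
rng = np.random.default_rng(2)
Z = rng.standard_normal((300, 3))       # 300 time points, 3 measured nodes
y0 = rng.standard_normal(300)
q = 5
Zq = expand_redundantly(Z, q)           # shape (300, 15)
w0 = np.linalg.pinv(Zq) @ y0            # w_v^0 with q*n = 15 components
```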
A machine learning device according to a second embodiment is described below with reference to
The machine learning device according to the present embodiment has substantially the same configuration as the machine learning device according to the first embodiment, and the concepts of the training and inference processes are also the same between the two embodiments.
The present embodiment is different from the first embodiment in the configuration of the reservoir arithmetic device 100. In the reservoir arithmetic device 100 according to the first embodiment, the reservoir 20 is a single reservoir. In contrast, the reservoir arithmetic device 100 illustrated in
A machine learning device according to a third embodiment is described below.
In the first embodiment, a description is given of processing by the machine learning device that repeatedly uses the measurement outcomes. This is a response to situations in which the number of measurement times is reduced.
Application of a physical system to the reservoir arithmetic device 100 may encounter various limitations. The number of measurement times is represented by the product of the number of measurement nodes and the measurement frequency. In physical systems, in some cases the number of nodes where measurement is possible is limited, and in other cases the measurement frequency is limited. In some cases, the number of nodes where measurement is possible is strongly limited (the number of nodes cannot be made large), but the measurement frequency is not so limited. In such a case, it is effective to reduce the number of measurement nodes and increase the measurement frequency.
In the first embodiment, a signal is input at each time point t_k, and measurement (output from a node) is also performed at each time point t_k. This means that the input frequency and the measurement frequency are the same. When the measurement frequency is increased so that the measurement is performed M times within δ = t_{k+1} − t_k (the time period between the time point t_k and the time point t_{k+1}), this is equivalent to increasing the number of measurement nodes by a factor of M. Also considering that each measurement outcome is used q times, the output y(t_k) at the time point t_k is expressed as y(t_k) = Σ_{i=1}^{qnM} z_i(t_k) w_i.
As described, the number of measurement nodes and the measurement frequency can be changed in accordance with the characteristics of the system. Actual values may be determined based on the constraint conditions of each system.
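As a rough illustration of this bookkeeping, the sketch below (with assumed array shapes) flattens the M sub-interval measurements per input step into the per-time-point feature vector and then applies the q-fold redundant use of the first embodiment, so that the readout ends up with q·n·M coefficients.

```python
import numpy as np

# Raw measurements: T input time points, n measured nodes, M measurements per interval.
T, n, M, q = 300, 3, 4, 5
rng = np.random.default_rng(3)
raw = rng.standard_normal((T, n, M))

# Flatten the M sub-interval measurements into the per-time-point feature vector.
Z = raw.reshape(T, n * M)               # shape (T, n*M)

# Redundant use (q times) as in the first embodiment expands this to q*n*M columns,
# so the readout y(t_k) = sum_{i=1}^{q*n*M} z_i(t_k) w_i has q*n*M coefficients.
Zq = np.hstack([np.vstack([np.zeros((r, n * M)), Z[: T - r]]) for r in range(q)])
print(Zq.shape)                          # (300, 60) = (T, q*n*M)
```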
A machine learning device according to a fourth embodiment is described below with reference to
In the first to the third embodiments, the operation principle of the present invention is described using reservoir computing as an example. In the present embodiment, the idea of the invention will be described using a specific calculation result. It is assumed that a quantum system (Ising model) is used for the reservoir arithmetic device 100, and the result of a computer simulation performed under this condition is used. The Ising model was developed as a physical model in statistical mechanics. Each position (node) can take only two states, and only interactions between pairs of nodes are taken into consideration.
In the present embodiment, there are four nodes as in the model of
The task is a timer task as illustrated in
According to the notation in the first embodiment, x(t_k) = 0 for 0 ≤ t_k ≤ 99, and x(t_k) = 1 for t_k ≥ 100.
The output value y^0(t_k) of the training data is set as y^0(t_i) = 1 at an arbitrary certain time point t_i and as y^0(t_k) = 0 at the other time points t_k (≠ t_i). That is, the output is a pulse at an arbitrary time point t_i.
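For concreteness, the timer-task training data described here can be generated as in the short sketch below; the sequence length of 300 steps and the pulse position t_i = 120 are illustrative choices, not values prescribed by the embodiment.

```python
import numpy as np

T = 300                      # number of time points used for training (assumed)
t_i = 120                    # position of the target pulse (arbitrary, assumed)

x = np.zeros(T)              # first training data: x(t_k) = 0 for t_k <= 99
x[100:] = 1.0                # x(t_k) = 1 for t_k >= 100

y0 = np.zeros(T)             # second training data: a single pulse at t_i
y0[t_i] = 1.0
```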
When the value of z_i(t_k) is used in a range of 0 ≤ k ≤ 299 under the condition that each measurement outcome is used only once, Z is a 300×30 matrix. The pseudo-inverse matrix Z^+ is derived and w_v^0 = Z^+ y_v^0 is calculated; the training is then completed.
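The following Python sketch mimics the overall flow of this embodiment under stated assumptions: a four-qubit transverse-field Ising model with random couplings serves as the reservoir, the input x(t_k) is encoded as an X rotation on the input qubit (one common choice in quantum reservoir computing, not necessarily the one used here), each of the three remaining qubits is measured once per step (so the outcome matrix is 300×3 rather than the 300×30 of the embodiment), and the readout is trained with the pseudo-inverse exactly as in the first embodiment. The couplings, field strength, evolution time, and pulse position are all illustrative.

```python
import numpy as np
from scipy.linalg import expm

# Single-qubit operators.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(op, site, n):
    """Embed a single-qubit operator at position `site` of an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for i in range(n):
        out = np.kron(out, op if i == site else I2)
    return out

n_qubits = 4                               # node 201 (input) + nodes 202-204 (measured)
rng = np.random.default_rng(0)

# Assumed transverse-field Ising Hamiltonian with random pairwise couplings.
H = sum(rng.uniform(-1, 1) * embed(Z, i, n_qubits) @ embed(Z, j, n_qubits)
        for i in range(n_qubits) for j in range(i + 1, n_qubits))
H = H + 0.5 * sum(embed(X, i, n_qubits) for i in range(n_qubits))
U = expm(-1j * H * 1.0)                    # one-step time evolution (dt = 1 assumed)

# Timer-task training data: step input and a single target pulse at t_i = 120 (assumed).
T = 300
x = np.zeros(T); x[100:] = 1.0
y0 = np.zeros(T); y0[120] = 1.0

# Drive the reservoir: encode x(t_k) as an X rotation on the input qubit,
# evolve one step, and record <Z> of the three measured qubits.
psi = np.zeros(2 ** n_qubits, dtype=complex); psi[0] = 1.0
Zmat = np.zeros((T, n_qubits - 1))
Zops = [embed(Z, i, n_qubits) for i in range(1, n_qubits)]
X0 = embed(X, 0, n_qubits)
for k in range(T):
    theta = np.pi * x[k]
    Rx = np.cos(theta / 2) * np.eye(2 ** n_qubits) - 1j * np.sin(theta / 2) * X0
    psi = U @ (Rx @ psi)
    Zmat[k] = [float(np.real(psi.conj() @ op @ psi)) for op in Zops]

# Training as in the first embodiment: w_v^0 = Z^+ y_v^0.
w0 = np.linalg.pinv(Zmat) @ y0
y_pred = Zmat @ w0
```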
Owing to the forgetting effect of the reservoir arithmetic device 100, the peak height decreases as the pulse position proceeds from t=100 to t=150. In this simulation, relaxation time τ was set to τ=30. Actually, in
Next,
The reason why the output signals became clearly identifiable (the peaks in
The calculation of the present invention is performed by the general arithmetic device 902 and the reservoir arithmetic device 100 (machine learning arithmetic device) as described in the first embodiment. One point in this context is that versatility is not required of the machine learning arithmetic device. For this reason, a system without versatility but with a special capability can be used as the machine learning arithmetic device. As an example of this, a quantum system is described in the present embodiment. Special systems, such as quantum systems, may involve many limitations, and the number of measurement times should preferably be reduced as much as possible. As a solution to this, the measurement outcomes are used multiple times in the inference process of the machine learning device of the present invention. The machine learning arithmetic device is engaged only in producing the measurement outcomes; it is the general arithmetic device 902 that uses the measurement outcomes multiple times. That is, the general arithmetic device 902 and the machine learning arithmetic device each take charge of part of the calculation; the former is engaged in the calculation requiring high versatility, and the latter is engaged in the calculation that does not require versatility but preferably is performed with rich dynamics. Thus, one aspect of the present invention is that it lets each arithmetic device dedicate itself to what it is good at.
Another possible aspect is as follows.
In recurrent neural networks (RNNs) and reservoir computing, an outcome at a node at one time point is reflected in an outcome at the node at a different time point. These methods and the present invention are similar in that both utilize values at different time points. However, the present invention is clearly different from these methods. Specifically, while the approach used in general RNNs and reservoirs concerns a mechanism inside the RNN or the reservoir, the approach according to the present invention concerns a mechanism outside the RNN or reservoir. The approach used in general RNNs and reservoirs improves the dynamics by temporally connecting the values at the respective nodes within a time sequence. In contrast, the approach according to the present invention effectively extracts the dynamics that the general approaches for RNNs and reservoirs can generate; the aims are different. In order to maximize the performance of a system, both means for generating dynamics and means for drawing the dynamics out are required. The method of the present invention achieves the latter, so that both of these means can be provided.
Furthermore, while the time sequence in the RNN or reservoir flows only in one direction from the previous time point to the subsequent time point, the time sequence in accordance with the present invention may be in either direction. This is another major difference between the two concepts.
In the fourth embodiment, an example of the execution of a timer task was described as the most easily understandable example. Alternatively, the present invention can be applied to a wide variety of tasks such as voice recognition.
A machine learning device according to a fifth embodiment is described below with reference to
In the first to the fourth embodiments, one input x(t_k) and one output y(t_k) are assumed at the time point t_k. Alternatively, both the input and the output can be more than one, as illustrated in
A machine learning device according to a sixth embodiment is described below with reference to
In the first to the fifth embodiments, the embodiments of the present invention are described using reservoir computing as an example. Alternatively, the present invention can be applied to other types of machine learning. In the present embodiment, an example in which the present invention is applied to deep learning will be described.
In a machine learning model illustrated in
As described above in the embodiments, the present invention can make the number of processed data items large even when the number of measurement times at the machine-learning-dedicated device is small. Thus, a large amount of information can be extracted with a small number of measurement times, whereby the dynamics offered by physical systems can be sufficiently drawn out even when measurement is difficult.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 2019-158104 | Aug 2019 | JP | national |