This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2023-014633, filed Feb. 2, 2023, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an RNN training apparatus, a method, and a storage medium.

BACKGROUND
There is a method of constructing mini-batches from time-series data having various sequence lengths and training a recurrent neural network (RNN) in units of mini-batches. For the RNN, there is a technique called truncated back propagation through time (TBPTT). In TBPTT, the sequence data is not subjected to error back propagation in the temporal direction without limit during training of the RNN; instead, error back propagation is performed in units of blocks cut out at a fixed number of time steps (for example, 128).
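As a hedged illustration of the block splitting that TBPTT performs, the following is a minimal Python sketch; the function name and the block length of 128 are only an example taken from the description above, not part of any particular implementation:

def split_into_tbptt_blocks(sequence, tbptt_length=128):
    # Cut the sequence into consecutive blocks of at most tbptt_length time steps;
    # error back propagation is later confined to each block.
    return [sequence[i:i + tbptt_length]
            for i in range(0, len(sequence), tbptt_length)]

# For example, a sequence of 300 time steps yields blocks of 128, 128, and 44 steps.
blocks = split_into_tbptt_blocks(list(range(300)))
assert [len(block) for block in blocks] == [128, 128, 44]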
In the RNN, it is necessary to construct mini-batches and train the RNN while ensuring the continuity of the time series by carrying over the hidden state and the like. In addition, in the training of a deep neural network, it is necessary to shuffle the time-series data supplied to the RNN in order to avoid data bias. In the training of the RNN with time-series data in units of mini-batches, if shuffling is performed on the blocks divided by TBPTT, the continuity of the hidden state that should be propagated across TBPTT blocks is interrupted. For this reason, when the RNN is trained with time-series data in units of mini-batches by TBPTT and the continuity of the hidden state is to be preserved, the degree of freedom in selecting the time-series data constituting each mini-batch is reduced, the time-series data supplied to the RNN becomes biased, the convergence of the training becomes unstable, and the efficiency deteriorates.
The RNN training apparatus according to the embodiment includes a storage unit, a construction unit, a reading unit, an optimization unit, and a writing unit. The storage unit stores hidden states, which are intermediate output data of the recurrent neural network, for N sequences. The construction unit selects data of M sequences, where M is smaller than N, from the data of the N sequences used for training of the recurrent neural network to construct a mini-batch, and outputs sequence information identifying the selected sequences. The reading unit reads the unprocessed hidden states of the sequences corresponding to the sequence information from the storage unit according to the sequence information. The optimization unit executes optimization calculation of the recurrent neural network based on the unprocessed hidden states and the mini-batch. The writing unit writes the processed hidden states, which are intermediate output data of the recurrent neural network obtained by the optimization calculation, in the storage unit according to the sequence information.
Hereinafter, an RNN training apparatus, a method, and a program according to the present embodiment will be described with reference to the drawings.
The processing circuitry 1 includes a processor such as a central processing unit (CPU) and a memory such as a random access memory (RAM). The processing circuitry 1 includes an obtainment unit 11, a construction unit 12, a reading unit 13, an optimization unit 14, a writing unit 15, and a training control unit 16. The processing circuitry 1 implements each function of the respective units 11 to 16 by executing the RNN training program. The RNN training program is stored in a non-transitory computer-readable storage medium such as the storage 2. The RNN training program may be implemented as a single program that describes all the functions of the respective units 11 to 16 described above, or may be implemented as a plurality of modules divided into several functional units. Each of the respective units 11 to 16 may be implemented by an integrated circuit such as an application specific integrated circuit (ASIC). In this case, the units 11 to 16 may be mounted on a single integrated circuit or may be individually mounted on a plurality of integrated circuits.
The obtainment unit 11 obtains data of N sequences (hereinafter, sequence data) used for training the RNN. The sequence data includes a plurality of elements following an arbitrary rule. Examples of the sequence data according to the present embodiment include time-series data including a plurality of elements along a time series, linguistic data including a plurality of elements along a word order, and the like. The time-series data is, for example, data having, as elements, a plurality of measurement values continuously output from various measuring instruments. The linguistic data is data having, as elements, a plurality of words arranged in a word order.
The construction unit 12 selects M sequences (M is a natural number smaller than N) from the N pieces of sequence data used for training of the RNN to construct a mini-batch, and outputs sequence information identifying the selected sequences. The number M of sequences constituting each mini-batch is referred to as a mini-batch size. As the sequence information, an identifier that uniquely identifies each sequence is used.
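A minimal sketch of this interface in Python, assuming that the next unprocessed block of each of the N sequences is held in a dictionary keyed by its identifier; the function name build_mini_batch and the data layout are assumptions, not the prescribed implementation:

import random

def build_mini_batch(next_blocks, m, rng=random):
    # next_blocks: dictionary mapping each sequence identifier to the next
    # unprocessed block of that sequence.
    # Select M of the N sequences (M < N) and return the mini-batch together
    # with the sequence information, i.e. the identifiers of the selected sequences.
    sequence_info = rng.sample(list(next_blocks.keys()), m)
    mini_batch = [next_blocks[seq_id] for seq_id in sequence_info]
    return mini_batch, sequence_info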
The reading unit 13 reads an unprocessed hidden state of the sequence corresponding to the sequence information from the storage 2 according to the sequence information. The reading of the unprocessed hidden state is performed before the optimization calculation by the optimization unit 14. The hidden state means intermediate output data from the RNN based on the sequence data. The unprocessed hidden state means a hidden state used for the optimization calculation of the optimization unit 14.
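A minimal sketch of the read processing, assuming that the hidden state storage area is modeled as a Python dictionary keyed by the sequence identifier and that a zero vector stands in for the initial hidden state of a sequence that has not been processed yet; the names and the hidden size are assumptions:

import torch

HIDDEN_SIZE = 64                     # hypothetical hidden dimension
hidden_state_store = {}              # sequence identifier -> latest hidden state

def read_unprocessed_hidden(sequence_info):
    # One hidden state per selected sequence, in the order of the sequence information.
    return [hidden_state_store.get(seq_id, torch.zeros(HIDDEN_SIZE))
            for seq_id in sequence_info]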
The optimization unit 14 executes optimization calculation of the RNN based on the unprocessed hidden state and the mini-batch. In the optimization calculation, the optimization unit 14 performs forward propagation calculation, back propagation calculation, and parameter update. In the forward propagation calculation and/or the back propagation calculation, the optimization unit 14 calculates a processed hidden state. The processed hidden state means a hidden state obtained by the optimization calculation.
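The embodiment does not prescribe a particular framework; as a hedged sketch only, the following PyTorch code shows one optimization calculation that consumes the unprocessed hidden state and the mini-batch and yields the processed hidden state. The model, the loss function, and all names here are assumptions:

import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    # Hypothetical single-layer GRU with a linear head, used only for illustration.
    def __init__(self, input_size=8, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x, h0):
        out, h_n = self.rnn(x, h0)   # h_n is the hidden state after the block
        return self.head(out), h_n

def optimization_calculation(model, optimizer, mini_batch, targets, h_unprocessed):
    # Forward propagation, back propagation, and parameter update for one mini-batch.
    # mini_batch: tensor of shape (M, block_length, input_size)
    # h_unprocessed: tensor of shape (1, M, hidden_size)
    optimizer.zero_grad()
    prediction, h_processed = model(mini_batch, h_unprocessed)
    loss = nn.functional.mse_loss(prediction, targets)
    loss.backward()                  # gradients stay within the current block (TBPTT)
    optimizer.step()
    return h_processed.detach()      # detach so later blocks do not back-propagate into this one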
The writing unit 15 writes the processed hidden state, which is the intermediate output data of the RNN obtained by the optimization calculation, in the storage 2 according to the sequence information.
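Continuing the dictionary-based sketch above (again an assumption, not the prescribed implementation), the write processing overwrites each selected sequence's entry with its processed hidden state:

def write_processed_hidden(sequence_info, h_processed):
    # h_processed: tensor of shape (1, M, hidden_size) returned by the
    # optimization calculation; column j belongs to the j-th identifier
    # in the sequence information, so the stored value is simply overwritten.
    for j, seq_id in enumerate(sequence_info):
        hidden_state_store[seq_id] = h_processed[0, j, :].detach()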
The training control unit 16 controls the training processing of the RNN. The training control unit 16 determines whether the update end condition is satisfied, and controls the obtainment unit 11, the construction unit 12, the reading unit 13, the optimization unit 14, and the writing unit 15 to repeat the training processing until it is determined that the update end condition is satisfied. In a case where it is determined that the update end condition is satisfied, the training control unit 16 ends the training processing.
The storage 2 includes a read only memory (ROM), a hard disk drive (HDD), a solid state drive (SSD), an integrated circuit storage apparatus, and the like. The storage 2 stores an RNN training program and the like. In addition, the storage 2 stores the hidden state for the N sequences in a readable/writable manner.
The input device 3 receives various types of commands from the user. As the input device 3, a keyboard, a mouse, various switches, a touch pad, a touch panel display, and the like can be used. An output signal from the input device 3 is supplied to the processing circuitry 1. Note that the input device 3 may be an input device of a computer connected to the processing circuitry 1 in a wired or wireless manner.
The communication device 4 is an interface for performing data communication with an external apparatus connected to the RNN training apparatus 100 via a network.
The display 5 displays various types of information. As the display 5, a cathode-ray tube (CRT) display, a liquid crystal display, an organic electroluminescence (EL) display, a light-emitting diode (LED) display, a plasma display, or any other display known in the art can be appropriately used. The display 5 may be a projector.
Hereinafter, an operation example of the RNN training apparatus 100 according to the present embodiment will be described.
First, the structure of the sequence data according to the present embodiment will be described with reference to
Next, the RNN training processing by the RNN training apparatus 100 according to the present embodiment will be described.
First, the training control unit 16 sets the index i to the value “0” (step SA1). The index i is a variable representing the epoch number for determining an update end condition.
In a case where step SA1 is performed, the training control unit 16 initializes a hidden state storage area 21 (step SA2). The hidden state storage area 21 is a storage area for a hidden state provided in the storage 2.
In a case where step SA2 is performed, the construction unit 12 selects M pieces of sequence data from the N pieces of sequence data 41 to construct the mini-batch 42, and outputs sequence information 43 identifying the selected sequences (step SA3).
The construction unit 12 selects three blocks from among the six pieces of sequence data. The three selected blocks constitute one mini-batch 42m. There are various methods for selecting the three blocks. For example, the construction unit 12 first randomly selects three pieces of sequence data from among the six pieces of sequence data. As another example, the construction unit 12 may select three pieces of sequence data from the six pieces of sequence data according to a predetermined rule. Next, the construction unit 12 sequentially selects an unprocessed block from each piece of the selected sequence data. Specifically, among the unprocessed blocks of each selected sequence, the construction unit 12 selects blocks in order from a block at an earlier position in the sequence to a block at a later position. For example, as the first mini-batch 421, a block s0 of the sequence data s, a block q0 of the sequence data q, and a block u0 of the sequence data u are selected.
In a case where a block is selected, the construction unit 12 outputs sequence information of the selected block. For example, in a case where a block s0 of the sequence data s, a block q0 of the sequence data q, and a block u0 of the sequence data u are selected as the first mini-batch 421, an identifier representing the sequence s, an identifier representing the sequence q, and an identifier representing the sequence u are output. For example, the construction unit 12 may hold a sequence information database, query the sequence information database with the sequence data of the selected block, and output the sequence information corresponding to the block. The sequence information database is a database that systematically associates sequence information with each type of sequence data. The sequence information may include at least an identifier of a sequence.
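Purely as a hedged illustration of this example, the first mini-batch 421 and its sequence information could be written as follows; the tuple representation of a block by its sequence identifier and block index is an assumption, not the embodiment's data format:

# Each entry pairs the identifier of the sequence with the index of the block
# cut from it; the identifiers alone form the sequence information.
first_mini_batch = [
    ("s", 0),   # block s0 of the sequence data s
    ("q", 0),   # block q0 of the sequence data q
    ("u", 0),   # block u0 of the sequence data u
]
sequence_information = [seq_id for seq_id, _ in first_mini_batch]
assert sequence_information == ["s", "q", "u"]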
In a case where step SA3 is performed, the training control unit 16 determines the presence or absence of an unprocessed mini-batch (step SA4). The presence or absence of an unprocessed mini-batch can be determined based on the presence or absence of an unselected block. Specifically, the training control unit 16 determines that there is an unprocessed mini-batch in a case where there is an unselected block, and determines that there is no unprocessed mini-batch in a case where there is no unselected block.
In step SA4, when it is determined that there is an unprocessed mini-batch (step SA4: YES), the reading unit 13 reads the unprocessed hidden state 44 from the hidden state storage area 21 according to the sequence information 43 output in step SA3 (step SA5). The hidden state storage area 21 is a storage area provided in the storage 2. The hidden state storage area 21 is secured for each sequence identifier and stores the hidden state related to that sequence identifier. The reading unit 13 searches the hidden state storage area 21 using the sequence information 43 as an index, and reads the hidden state (the unprocessed hidden state) from the storage area of the sequence information 43. Since the hidden state is overwritten in the hidden state storage area 21, only the latest hidden state for each piece of sequence data, that is, the hidden state obtained by the most recent optimization calculation, is stored.
In a case where step SA5 is performed, the optimization unit 14 executes optimization calculation based on the mini-batch 42 constructed in step SA3 and the unprocessed hidden state 44 read in step SA5 (step SA6). In the optimization calculation, the optimization unit 14 performs forward propagation calculation, back propagation calculation, and parameter update. In the forward propagation calculation, the input/output of each layer is calculated. In the back propagation calculation, the gradient of the calculated input/output of each layer is calculated. In the parameter update, the parameters are updated based on the calculated gradients. In the forward propagation calculation, the optimization unit 14 calculates a hidden state as the last forward propagation calculation output of the RNN layer (the forward propagation calculation output from the n-th RNN). This hidden state is stored in the hidden state storage area 21. Hereinafter, this hidden state is referred to as a processed hidden state. Note that the hidden state obtained by a forward propagation calculation performed after the parameter update may instead be stored in the hidden state storage area 21 as the "processed hidden state".
The optimization unit 14 executes forward propagation calculation by recursively applying, to the RNN 60, the unprocessed hidden states h_(u,2n-1), h_(v,n-1), and h_(q,4n-1) together with the n elements X_(u,2n) to X_(u,3n-1), the n elements X_(v,n) to X_(v,2n-1), and the n elements X_(q,4n) to X_(q,5n-1). More specifically, as illustrated in
In a case where step SA6 is performed, the writing unit 15 writes the processed hidden state 45 obtained by the optimization calculation in step SA6 in the hidden state storage area 21 according to the sequence information 43 output in step SA3 (step SA7). The writing unit 15 searches the hidden state storage area 21 using the sequence information 43 of the sequence data as an index for each of the three pieces of sequence data included in the mini-batch 42 to be processed, and overwrites the storage area of the sequence information 43 with the processed hidden state 45. As a result, the latest hidden state for each piece of the sequence data is stored in the hidden state storage area 21.
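Combining the hypothetical helpers sketched above, one pass of steps SA5 to SA7 for a single mini-batch could look as follows; this is again a PyTorch-based sketch under the same assumptions, not the prescribed implementation:

import torch

def process_mini_batch(model, optimizer, store, mini_batch, targets, sequence_info,
                       hidden_size=64):
    # Step SA5: read the unprocessed hidden states by sequence identifier.
    h0 = torch.stack([store.get(seq_id, torch.zeros(hidden_size))
                      for seq_id in sequence_info]).unsqueeze(0)   # (1, M, hidden_size)

    # Step SA6: forward propagation, back propagation, and parameter update.
    optimizer.zero_grad()
    prediction, h_n = model(mini_batch, h0)
    loss = torch.nn.functional.mse_loss(prediction, targets)
    loss.backward()
    optimizer.step()

    # Step SA7: overwrite the stored hidden state of each selected sequence
    # so that only the latest hidden state remains in the storage area.
    for j, seq_id in enumerate(sequence_info):
        store[seq_id] = h_n[0, j, :].detach()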
In a case where step SA7 is performed, in step SA3 again, the construction unit 12 selects M pieces of sequence data from the unprocessed sequence data 41, constructs the next mini-batch 42, and outputs the next sequence information 43. The read processing (step SA5), the optimization calculation (step SA6), and the write processing (step SA7) are executed for the next mini-batch 42 and/or the next sequence information 43.
In this manner, steps SA3 to SA7 are repeated until it is determined in step SA4 that there is no unprocessed mini-batch 42. As illustrated in
In step SA4, in a case where it is determined that there is no unprocessed mini-batch 42 (step SA4: NO), the training control unit 16 adds the value "1" to the index i and determines whether the index i is less than an upper limit epoch number TH (step SA8). The upper limit epoch number TH may be set to any value according to experience or an arbitrary algorithm.
When it is determined in step SA8 that the index i is less than the upper limit epoch number TH (step SA8: YES), steps SA2 to SA7 are repeated again for the next epoch until it is determined in step SA4 that there is no unprocessed mini-batch 42.
Then, in a case where it is determined in step SA8 that the index i is equal to or larger than the upper limit epoch number TH (step SA8: NO), the training control unit 16 outputs the trained network parameters 46 (step SA9). The trained network parameters 46 are stored in the storage 2. The trained network parameters 46 are assigned to the RNN, thereby constructing the trained RNN.
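As a hedged illustration under the PyTorch assumption used above (reusing the hypothetical TinyRNN model from the earlier sketch; the file name is also hypothetical), outputting the trained network parameters and reconstructing the trained RNN could amount to:

import torch

# Save the trained parameters.
torch.save(model.state_dict(), "trained_rnn_parameters.pt")

# Assign the stored parameters to an RNN of the same architecture to
# reconstruct the trained RNN.
restored_model = TinyRNN()
restored_model.load_state_dict(torch.load("trained_rnn_parameters.pt"))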
As described above, the RNN training processing by the RNN training apparatus 100 ends.
The RNN training processing illustrated in
Next, a detailed embodiment of the mini-batch initialization process (step SA2) to the mini-batch presence/absence determination process (step SA4) in
As illustrated in
In a case where step SB1 is performed, the construction unit 12 randomly selects sequences of a mini-batch size (M pieces) from the dictionary remain_len (step SB2). In step SB2, M sequences are randomly selected from sequences having a remaining length of 1 or more among the N sequences.
In a case where step SB2 is performed, the construction unit 12 extracts, from each sequence (selection sequence) selected in step SB2, a block with seq_len[id] − remain_len[id] as an offset, constructs a mini-batch, and outputs sequence information (step SB3). seq_len[id] is a dictionary that outputs the sequence length of the selection sequence for the sequence identifier id. remain_len[id] is a dictionary that outputs the remaining length of the selection sequence for the sequence identifier id. seq_len[id] − remain_len[id] is the position, in the sequence data, of the block selected in the current time step.
In a case where step SB3 is performed, the training control unit 16 subtracts the TBPTT length from the remaining length of the selection sequence (step SB4). That is, in step SB4, the training control unit 16 executes remain_len[id]−=TBPTT_length. TBPTT_length represents a TBPTT length.
In a case where step SB4 is performed, the training control unit 16 deletes the identifier id of the selection sequence from the dictionary remain_len if remain_len[id] is 0 or less (step SB5). remain_len[id] being 0 or less means that the sequence has no remaining block. In this case, since the sequence no longer needs to be selected in the remaining time steps, its identifier id is deleted from the dictionary remain_len.
In a case where step SB5 is performed, the training control unit 16 determines whether the dictionary remain_len is empty (step SB6). The dictionary remain_len having content means that there is a sequence having a remaining block, and the dictionary remain_len being empty means that there is no sequence having a remaining block.
In step SB6, in a case where it is determined that the dictionary remain_len is not empty (step SB6: NO), the read processing (step SA5), the optimization calculation (step SA6), and the write processing (step SA7) illustrated in
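A minimal Python sketch of Example 1, assuming that each piece of sequence data is a list of elements, that step SB1 initializes remain_len with the full length of every sequence (that step is only summarized above), and that the read, optimization, and write processing of steps SA5 to SA7 is represented by a placeholder callback:

import random

def run_one_epoch(sequences, m, tbptt_length, process_mini_batch):
    # sequences: dictionary mapping a sequence identifier to its list of elements.
    seq_len = {seq_id: len(seq) for seq_id, seq in sequences.items()}
    remain_len = dict(seq_len)                        # step SB1 (assumed initialization)

    while remain_len:                                 # step SB6: loop while content remains
        # Step SB2: randomly select a mini-batch size of sequences whose remaining
        # length is 1 or more. (Guarding against fewer than M remaining sequences
        # is an added assumption.)
        selected = random.sample(list(remain_len), min(m, len(remain_len)))

        # Step SB3: extract one block per selection sequence, with
        # seq_len[id] - remain_len[id] as the offset, and collect the sequence information.
        mini_batch, sequence_info = [], []
        for seq_id in selected:
            offset = seq_len[seq_id] - remain_len[seq_id]
            mini_batch.append(sequences[seq_id][offset:offset + tbptt_length])
            sequence_info.append(seq_id)

        for seq_id in selected:
            remain_len[seq_id] -= tbptt_length        # step SB4
            if remain_len[seq_id] <= 0:               # step SB5: no remaining block
                del remain_len[seq_id]

        # Steps SA5 to SA7: read hidden states, run the optimization calculation,
        # and write the hidden states back.
        process_mini_batch(mini_batch, sequence_info)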
This is the end of Example 1.
As illustrated in
This is the end of Example 2.
As illustrated in
This is the end of Example 3.
Effects according to the present embodiment will be described with reference to
Here, a comparative example will be briefly described with reference to
On the other hand, since the RNN training apparatus 100 according to the present embodiment includes the reading unit 13, the writing unit 15, and the hidden state storage area 21, it is possible to read and write the hidden state at any timing. Therefore, in the present embodiment, since padding can be reduced as compared with that in the comparative example, it is possible to reduce a calculation load for a padded block. In addition, since the type of the sequence data or the block can be made different between adjacent mini-batches, in the present embodiment, as compared with that in the comparative example, the bias of the training data is reduced, the convergence of the training is stabilized, and the performance of the finally converged RNN is improved. As illustrated in
Thus, according to the present embodiment, it is possible to provide an RNN training apparatus, a method, and a program capable of improving convergence stability and efficiency of training of a recurrent neural network.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.