This application relates to the field of artificial intelligence technologies, and in particular, to a data processing method and apparatus, and a storage medium.
With continuous development of artificial intelligence (AI) technologies, a recurrent neural network (RNN) has many application requirements on a terminal device, for example, in applications such as voice wake-up, speech noise cancellation, and speech recognition. However, storage resources and computing resources of the terminal device are limited, whereas the recurrent neural network includes a large quantity of parameters and requires a large calculation amount. Consequently, it is difficult to deploy the recurrent neural network on the terminal device. Therefore, how to reduce the calculation amount and the quantity of parameters in the recurrent neural network and accelerate network computing while ensuring network precision becomes an urgent problem to be resolved.
In view of this, a data processing method and apparatus, and a storage medium are provided.
According to a first aspect, an embodiment of this application provides a data processing method. The method includes: extracting a feature sequence of target data, where the feature sequence includes T input features, T is a positive integer, and t∈[1, T]; obtaining T hidden state vectors based on a recurrent neural network, where a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector, and the (t−1)th extended state vector is obtained by performing lightweight processing based on the (t−1)th hidden state vector; and obtaining a processing result of the target data based on the T hidden state vectors by using a downstream task network.
According to this embodiment of this application, because a part of the state vector that currently needs to be input to the recurrent neural network is an extended state vector obtained through lightweight processing, the recurrent neural network may be controlled to output a hidden state vector of a small dimension. In this way, a quantity of parameters and a calculation amount that are required for outputting the hidden state vector by the recurrent neural network can be reduced. Although the dimension of the hidden state vector output by the recurrent neural network is reduced, the extended state vector obtained by performing lightweight processing on the hidden state vector and the hidden state vector jointly form a complete state vector input to the recurrent neural network, which is equivalent to supplementing the state information input to the recurrent neural network. In this way, a network computing speed can be improved, network precision can be ensured during data processing, and processing efficiency of the target data can be improved. In addition, a recurrent neural network with a reduced quantity of parameters and a reduced calculation amount can be deployed on a terminal device, and has higher universality.
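To make the recurrence of the first aspect concrete, the following is a minimal Python/NumPy sketch under assumed settings: the dimensions, the plain tanh recurrent cell, the single-matrix lightweight transform, and the mean-pooling classifier standing in for the downstream task network are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D_X, D_H, D_E = 16, 8, 8          # feature dim, reduced hidden dim, extended dim (assumed)
N_CLASSES = 4

# Parameters of a plain recurrent cell that outputs a small hidden state.
W = rng.standard_normal((D_H, D_X + D_H + D_E)) * 0.1
b = np.zeros(D_H)

# Lightweight transform: a single small matrix (far fewer parameters than the cell).
W_light = rng.standard_normal((D_E, D_H)) * 0.1

# Downstream task head (illustrative): mean-pool the T hidden states, then classify.
W_out = rng.standard_normal((N_CLASSES, D_H)) * 0.1

def lightweight(h):
    """Extended state vector obtained by lightweight processing of the hidden state."""
    return np.tanh(W_light @ h)

def step(x_prev, h_prev, e_prev):
    """One recurrent step: the next hidden state depends on the previous input
    feature, the previous (small) hidden state, and its extended state vector."""
    z = np.concatenate([x_prev, h_prev, e_prev])
    return np.tanh(W @ z + b)

def process(features):
    """features: list of T input features extracted from the target data."""
    h = np.zeros(D_H)              # 0th hidden state vector (initial value)
    e = lightweight(h)             # 0th extended state vector
    hidden_states = []
    for x in features:             # t = 1 .. T
        h = step(x, h, e)          # t-th hidden state vector
        e = lightweight(h)         # extended state for the next step
        hidden_states.append(h)
    pooled = np.mean(hidden_states, axis=0)
    return W_out @ pooled          # processing result (e.g. class scores)

print(process([rng.standard_normal(D_X) for _ in range(5)]))
```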
In an embodiment, the recurrent neural network includes a first-type recurrent neural network. The first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector. The update gate layer is used to control information to be added to a hidden state vector. That a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: determining first gated vectors based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector respectively by using first gated neurons at the reset gate layer and the update gate layer; determining, by using a candidate neuron in the first-type recurrent neural network, a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, and the (t−1)th hidden state vector, or determining a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector; and determining the tth hidden state vector based on the first gated vector determined by the first gated neuron at the update gate layer, the (t−1)th hidden state vector, and the first candidate hidden state vector.
According to this embodiment of this application, the tth hidden state vector is determined based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using the first-type recurrent neural network, so that the first-type recurrent neural network can output a hidden state vector of a small dimension, thereby reducing a quantity of parameters and a calculation amount in the first-type recurrent neural network.
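The following is a minimal sketch of such a first-type (GRU-style) step, assuming sigmoid/tanh gate activations, a standard GRU-style update equation, and illustrative dimensions; the section names the reset and update gate layers but does not fix their exact formulas, so these are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D_X, D_H, D_E = 16, 8, 8
D_IN = D_X + D_H + D_E

# First gated neurons at the reset and update gate layers each take the previous
# input feature, hidden state vector, and extended state vector.
W_r = rng.standard_normal((D_H, D_IN)) * 0.1
W_z = rng.standard_normal((D_H, D_IN)) * 0.1
# Candidate neuron (fed here with the reset-gated hidden state only; the variant
# that also uses the extended state vector would concatenate it as well).
W_c = rng.standard_normal((D_H, D_X + D_H)) * 0.1

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def first_type_step(x_prev, h_prev, e_prev):
    z_in = np.concatenate([x_prev, h_prev, e_prev])
    r = sigmoid(W_r @ z_in)                      # first gated vector, reset gate layer
    z = sigmoid(W_z @ z_in)                      # first gated vector, update gate layer
    c_in = np.concatenate([x_prev, r * h_prev])
    h_cand = np.tanh(W_c @ c_in)                 # first candidate hidden state vector
    return (1.0 - z) * h_prev + z * h_cand       # t-th hidden state vector

h = first_type_step(rng.standard_normal(D_X), np.zeros(D_H), np.zeros(D_E))
print(h.shape)   # (8,): a hidden state vector of reduced dimension
```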
In an embodiment, the recurrent neural network includes a first-type recurrent neural network. The first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector. The update gate layer is used to control information to be added to a hidden state vector. That a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector by using the recurrent neural network includes: determining a first gated vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using a first gated neuron at the reset gate layer or the update gate layer in the first-type recurrent neural network; performing lightweight processing on the first gated vector by using a first transform neuron in the first-type recurrent neural network, to obtain a first supplementary gated vector; and determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector, or determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector includes: determining a second candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the first supplementary gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first gated vector, the (t−1)th hidden state vector, and the second candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector includes: determining a third candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the first gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first supplementary gated vector, the (t−1)th hidden state vector, and the third candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector includes: determining a fourth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the (t−1)th extended state vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first gated vector, the (t−1)th hidden state vector, and the fourth candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector includes: determining a fifth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first gated vector, and the (t−1)th extended state vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first supplementary gated vector, the (t−1)th hidden state vector, and the fifth candidate hidden state vector.
According to this embodiment of this application, lightweight processing is performed on the first gated vector to obtain the first supplementary gated vector. This is equivalent to generating a part of the gated vectors through lightweight processing. Compared with a related technology in which two gated neurons in the first-type recurrent neural network directly output two gated vectors based on the (t−1)th input feature and a (t−1)th spliced state vector, this application reduces a quantity of parameters and a calculation amount for generating a gated vector, thereby reducing a quantity of parameters and a calculation amount in the entire first-type recurrent neural network and improving a network computing speed. In addition, control of the first-type recurrent neural network over the hidden state is preserved, so that the first-type recurrent neural network has higher universality.
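For the variant in which a single first gated neuron is kept and the other gate is generated by lightweight processing, a minimal sketch (taking the update-gate case as the concrete instance, and assuming sigmoid/tanh activations and a GRU-style update) might look as follows; the transform-neuron shape is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
D_X, D_H, D_E = 16, 8, 8
D_IN = D_X + D_H + D_E

# Single first gated neuron (taken here to sit at the update gate layer).
W_z = rng.standard_normal((D_H, D_IN)) * 0.1
# First transform neuron: lightweight processing of the first gated vector
# replaces the second full gated neuron.
W_t = rng.standard_normal((D_H, D_H)) * 0.1
W_c = rng.standard_normal((D_H, D_X + D_H)) * 0.1

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def first_type_step_lightweight(x_prev, h_prev, e_prev):
    z = sigmoid(W_z @ np.concatenate([x_prev, h_prev, e_prev]))   # first gated vector
    r = sigmoid(W_t @ z)                     # first supplementary gated vector (reset role)
    h_cand = np.tanh(W_c @ np.concatenate([x_prev, r * h_prev]))  # second candidate hidden state
    return (1.0 - z) * h_prev + z * h_cand                        # t-th hidden state vector

print(first_type_step_lightweight(rng.standard_normal(D_X),
                                  np.zeros(D_H), np.zeros(D_E)))
```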
In an embodiment, the recurrent neural network includes a second-type recurrent neural network. That a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: splicing the (t−1)th hidden state vector and the (t−1)th extended state vector, to obtain a (t−1)th spliced state vector; and determining the tth hidden state vector and a tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and a (t−1)th cell state vector by using the second-type recurrent neural network, where the tth cell state vector is determined based on the (t−1)th spliced state vector, the (t−1)th input feature, and the (t−1)th cell state vector, the tth hidden state vector is determined based on the (t−1)th spliced state vector, the (t−1)th input feature, and the tth cell state vector, and a 0th cell state vector is an initial value.
According to this embodiment of this application, the second-type recurrent neural network can output a hidden state vector of a small dimension, thereby reducing a quantity of parameters and a calculation amount in the second-type recurrent neural network.
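A minimal sketch of the second-type (LSTM-style) step with a spliced state vector, assuming standard sigmoid/tanh LSTM gate equations, a single-matrix lightweight transform, and illustrative dimensions, might look as follows.

```python
import numpy as np

rng = np.random.default_rng(3)
D_X, D_H, D_E = 16, 8, 8
D_S = D_H + D_E                    # dimension of the spliced state vector
D_IN = D_X + D_S

# Standard second-type gate and candidate parameters; the cell state keeps the
# reduced dimension D_H.
W_f = rng.standard_normal((D_H, D_IN)) * 0.1   # forget gate layer
W_i = rng.standard_normal((D_H, D_IN)) * 0.1   # input gate layer
W_o = rng.standard_normal((D_H, D_IN)) * 0.1   # output gate layer
W_c = rng.standard_normal((D_H, D_IN)) * 0.1   # candidate neuron
W_light = rng.standard_normal((D_E, D_H)) * 0.1

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def second_type_step(x_prev, h_prev, c_prev):
    e_prev = np.tanh(W_light @ h_prev)           # (t-1)-th extended state vector
    s_prev = np.concatenate([h_prev, e_prev])    # (t-1)-th spliced state vector
    z_in = np.concatenate([x_prev, s_prev])
    f = sigmoid(W_f @ z_in)
    i = sigmoid(W_i @ z_in)
    o = sigmoid(W_o @ z_in)
    c_cand = np.tanh(W_c @ z_in)
    c = f * c_prev + i * c_cand                  # t-th cell state vector
    h = o * np.tanh(c)                           # t-th hidden state vector
    return h, c

h, c = second_type_step(rng.standard_normal(D_X), np.zeros(D_H), np.zeros(D_H))
print(h.shape, c.shape)
```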
In an embodiment, the determining the tth hidden state vector and a tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and a (t−1)th cell state vector by using the second-type recurrent neural network includes: determining a second gated vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a second gated neuron in the second-type recurrent neural network; performing lightweight processing on the second gated vector by using a second transform neuron in the second-type recurrent neural network, to obtain a second supplementary gated vector; determining a first candidate cell state vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a candidate neuron in the second-type recurrent neural network; and determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector.
In an embodiment, the second-type recurrent neural network includes a forget gate layer, an input gate layer, and an output gate layer. The forget gate layer is used to control information to be discarded from a cell state vector. The input gate layer is used to control information to be added to a cell state vector. The output gate layer is used to control information in a to-be-output cell state vector.
In an embodiment, when the second gated neuron is a gated neuron at the forget gate layer, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the input gate layer and the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the input gate layer, the second gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the input gate layer, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the forget gate layer and the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the forget gate layer, the second gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the output gate layer in the second-type recurrent neural network, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the forget gate layer and the input gate layer in the second-type recurrent neural network. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by the second transform neurons at the forget gate layer and the input gate layer, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the input gate layer in the second-type recurrent neural network, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by the second gated neuron at the forget gate layer and/or the input gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vectors respectively determined by the second gated neurons at the forget gate layer and the input gate layer, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the output gate layer, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by a second gated neuron at the forget gate layer and/or the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vector determined by the second gated neuron at the forget gate layer, the second supplementary gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector determined by the second gated neuron at the output gate layer.
In an embodiment, when the second gated neuron includes gated neurons at the input gate layer and the output gate layer in the second-type recurrent neural network, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by a second gated neuron at the input gate layer and/or the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vector determined by the second gated neuron at the input gate layer, the second supplementary gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector determined by the second gated neuron at the output gate layer.
According to this embodiment of this application, lightweight processing is performed on the second gated vector to obtain the second supplementary gated vector. This is equivalent to generating a part of the gated vectors through lightweight processing. Compared with a related technology in which three gated neurons in the second-type recurrent neural network directly output three gated vectors based on the (t−1)th input feature and a (t−1)th spliced state vector, this application reduces a quantity of parameters and a calculation amount for generating a gated vector, thereby reducing a quantity of parameters and a calculation amount in the entire second-type recurrent neural network and improving a network computing speed. In addition, control of the second-type recurrent neural network over the hidden state is preserved, so that the second-type recurrent neural network has higher universality.
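For the variant in which only one full second gated neuron is kept, the following minimal sketch takes the forget-gate case as the concrete instance and assumes sigmoid/tanh activations and a standard LSTM-style state update; the transform-neuron shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
D_X, D_H, D_E = 16, 8, 8
D_IN = D_X + D_H + D_E

# Second gated neuron at the forget gate layer; the input and output gates are
# produced by second transform neurons that lightweight-process its output.
W_f = rng.standard_normal((D_H, D_IN)) * 0.1
W_ti = rng.standard_normal((D_H, D_H)) * 0.1   # transform neuron, input gate layer
W_to = rng.standard_normal((D_H, D_H)) * 0.1   # transform neuron, output gate layer
W_c = rng.standard_normal((D_H, D_IN)) * 0.1
W_light = rng.standard_normal((D_E, D_H)) * 0.1

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def second_type_step_lightweight(x_prev, h_prev, c_prev):
    s_prev = np.concatenate([h_prev, np.tanh(W_light @ h_prev)])  # spliced state vector
    z_in = np.concatenate([x_prev, s_prev])
    f = sigmoid(W_f @ z_in)            # second gated vector (forget gate layer)
    i = sigmoid(W_ti @ f)              # second supplementary gated vector (input role)
    o = sigmoid(W_to @ f)              # second supplementary gated vector (output role)
    c_cand = np.tanh(W_c @ z_in)       # first candidate cell state vector
    c = f * c_prev + i * c_cand        # t-th cell state vector
    h = o * np.tanh(c)                 # t-th hidden state vector
    return h, c

print(second_type_step_lightweight(rng.standard_normal(D_X),
                                   np.zeros(D_H), np.zeros(D_H))[0])
```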
In some embodiments, the lightweight processing includes nonlinear transformation and/or linear transformation.
According to this embodiment of this application, a corresponding extended state vector and/or a corresponding supplementary gated vector are/is obtained through nonlinear transformation and/or linear transformation, so that an overall quantity of parameters and a calculation amount can be reduced by using low-cost lightweight processing.
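As an illustration of the intended cost saving, the following sketch contrasts an assumed purely linear and an assumed nonlinear lightweight transform with the parameter count of a full gated neuron; the specific shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
D_X, D_H, D_E = 16, 8, 8

# A full gated neuron reads the input feature plus the state and therefore has
# D_H * (D_X + D_H + D_E) weights; a lightweight transform only maps the
# D_H-dimensional vector it processes, so it has at most D_E * D_H weights.
W_linear = rng.standard_normal((D_E, D_H)) * 0.1
W_nonlinear = rng.standard_normal((D_E, D_H)) * 0.1

def lightweight_linear(h):
    return W_linear @ h                # linear transformation only

def lightweight_nonlinear(h):
    return np.tanh(W_nonlinear @ h)    # linear transformation followed by a nonlinearity

full_gate_params = D_H * (D_X + D_H + D_E)
lightweight_params = D_E * D_H
print(full_gate_params, lightweight_params)   # 256 vs. 64 with these assumed dimensions
```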
In some embodiments, the target data includes at least one of the following: voice data, image data, and text data; and the processing result includes at least one of the following: a speech recognition result of the voice data, a speech noise cancellation result of the voice data, a voice wake-up result of the voice data, a text recognition result of the image data, and a text translation result of the text data.
In some embodiments, the quantity of parameters in the recurrent neural network is positively correlated with a dimension of a hidden state vector output by the recurrent neural network.
According to this embodiment of this application, because the quantity of parameters in the recurrent neural network is positively correlated with the dimension of the hidden state vector output by the recurrent neural network, a hidden state vector with a small dimension may be output by using a recurrent neural network with a small quantity of parameters, thereby reducing a calculation amount in the recurrent neural network and improving a network processing speed.
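As a rough worked example (assuming standard GRU-style and LSTM-style parameter formulas and ignoring biases), the following shows how the parameter count grows with the hidden dimension, and hence why outputting a hidden state vector of a smaller dimension reduces both parameters and computation.

```python
# Rough parameter counts for standard gated recurrent cells (biases omitted):
#   GRU-style (3 neurons):  3 * d_h * (d_x + d_h)
#   LSTM-style (4 neurons): 4 * d_h * (d_x + d_h)
def gru_params(d_x, d_h):
    return 3 * d_h * (d_x + d_h)

def lstm_params(d_x, d_h):
    return 4 * d_h * (d_x + d_h)

for d_h in (64, 128, 256):
    print(d_h, gru_params(40, d_h), lstm_params(40, d_h))
# Halving d_h roughly quarters the dominant d_h * d_h term, which is why a
# recurrent neural network that outputs a smaller hidden state vector needs
# fewer parameters and less computation.
```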
According to a second aspect, an embodiment of this application provides a data processing method. The method includes: extracting a feature sequence of target data, where the feature sequence includes T input features, T is a positive integer, and t∈[1, T]; obtaining T hidden state vectors based on a first-type recurrent neural network, where a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector, the third gated vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a first gated neuron in the first-type recurrent neural network, a 0th hidden state vector is an initial value, and the third supplementary gated vector is obtained by performing lightweight processing on the third gated vector by using a first transform neuron in the first-type recurrent neural network; and obtaining a processing result of the target data based on the T hidden state vectors by using a downstream task network.
In an embodiment, the first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector. The update gate layer is used to control information to be added to a hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector includes: determining a sixth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the third supplementary gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the third gated vector, the (t−1)th hidden state vector, and the sixth candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector includes: determining a seventh candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the third gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the third supplementary gated vector, the (t−1)th hidden state vector, and the seventh candidate hidden state vector.
According to this embodiment of this application, lightweight processing is performed on the third gated vector to obtain the third supplementary gated vector. This is equivalent to generating a part of the gated vectors through lightweight processing. Compared with a related technology in which two gated neurons in the first-type recurrent neural network directly output two gated vectors based on the (t−1)th input feature and a (t−1)th hidden state vector, this application reduces a quantity of parameters and a calculation amount for generating a gated vector, thereby reducing a quantity of parameters and a calculation amount in the entire first-type recurrent neural network and improving a network computing speed. In addition, control of the first-type recurrent neural network over the hidden state is preserved, so that the first-type recurrent neural network has higher universality.
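A minimal sketch of this second-aspect step, taking the update-gate case as the concrete instance and assuming sigmoid/tanh activations, a GRU-style update, and illustrative dimensions, might look as follows.

```python
import numpy as np

rng = np.random.default_rng(6)
D_X, D_H = 16, 8

# First gated neuron (taken here to sit at the update gate layer) reads only the
# previous input feature and hidden state vector.
W_z = rng.standard_normal((D_H, D_X + D_H)) * 0.1
# First transform neuron: lightweight processing of the third gated vector.
W_t = rng.standard_normal((D_H, D_H)) * 0.1
W_c = rng.standard_normal((D_H, D_X + D_H)) * 0.1

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def second_aspect_step(x_prev, h_prev):
    z = sigmoid(W_z @ np.concatenate([x_prev, h_prev]))   # third gated vector
    r = sigmoid(W_t @ z)                                   # third supplementary gated vector
    h_cand = np.tanh(W_c @ np.concatenate([x_prev, r * h_prev]))  # sixth candidate hidden state
    return (1.0 - z) * h_prev + z * h_cand                 # t-th hidden state vector

h = np.zeros(D_H)                                          # 0th hidden state vector
for x in [rng.standard_normal(D_X) for _ in range(5)]:     # T input features
    h = second_aspect_step(x, h)
print(h)
```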
In some embodiments, the lightweight processing includes nonlinear transformation and/or linear transformation. According to this embodiment of this application, a corresponding supplementary gated vector is obtained through nonlinear transformation and/or linear transformation, so that an overall quantity of parameters and a calculation amount can be reduced by using low-cost lightweight processing.
According to a third aspect, an embodiment of this application provides a data processing method. The method includes: extracting a feature sequence of target data, where the feature sequence includes T input features, T is a positive integer, and t∈[1, T]; obtaining T hidden state vectors based on a second-type recurrent neural network, where a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector, a 0th cell state vector is an initial value, the fourth gated vector is determined based on a (t−1)th input feature and a (t−1)th hidden state vector by using a second gated neuron in the second-type recurrent neural network, a 0th hidden state vector is an initial value, the fourth supplementary gated vector is obtained by performing lightweight processing on the fourth gated vector by using a second transform neuron in the second-type recurrent neural network, and the second candidate cell state vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a candidate neuron in the second-type recurrent neural network; and obtaining a processing result of the target data based on the T hidden state vectors by using a downstream task network.
In an embodiment, the second-type recurrent neural network includes a forget gate layer, an input gate layer, and an output gate layer. The forget gate layer is used to control information to be discarded from a cell state vector. The input gate layer is used to control information to be added to a cell state vector. The output gate layer is used to control information in a to-be-output cell state vector.
In an embodiment, when the second gated neuron is a gated neuron at the forget gate layer, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at the input gate layer and the output gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the input gate layer, the fourth gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the input gate layer in the second-type recurrent neural network, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at the forget gate layer and the output gate layer in the second-type recurrent neural network. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the forget gate layer, the fourth gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the output gate layer, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at the forget gate layer and the input gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by the second transform neurons at the forget gate layer and the input gate layer, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the input gate layer, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by the second gated neuron at the forget gate layer and/or the input gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vectors respectively determined by the second gated neurons at the forget gate layer and the input gate layer, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the output gate layer, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by a second gated neuron at the forget gate layer and/or the output gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vector determined by the second gated neuron at the forget gate layer, the fourth supplementary gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector determined by the second gated neuron at the output gate layer.
In an embodiment, when the second gated neuron includes gated neurons at the input gate layer and the output gate layer, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by a second gated neuron at the input gate layer and/or the output gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vector determined by the second gated neuron at the input gate layer, the fourth supplementary gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector determined by the second gated neuron at the output gate layer.
According to this embodiment of this application, lightweight processing is performed on the fourth gated vector to obtain the fourth supplementary gated vector. This is equivalent to generating a part of the gated vectors through lightweight processing. Compared with a related technology in which three gated neurons in the second-type recurrent neural network directly output three gated vectors based on the (t−1)th input feature and a (t−1)th hidden state vector, this application reduces a quantity of parameters and a calculation amount for generating a gated vector, thereby reducing a quantity of parameters and a calculation amount in the entire second-type recurrent neural network and improving a network computing speed. In addition, control of the second-type recurrent neural network over the hidden state is preserved, so that the second-type recurrent neural network has higher universality.
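A minimal sketch of this third-aspect step, taking the forget-gate case as the concrete instance and assuming sigmoid/tanh activations, a standard LSTM-style state update, and illustrative dimensions, might look as follows.

```python
import numpy as np

rng = np.random.default_rng(7)
D_X, D_H = 16, 8
D_IN = D_X + D_H

# Second gated neuron at the forget gate layer reads the previous input feature
# and hidden state vector; the input and output gates come from second transform
# neurons that lightweight-process its output.
W_f = rng.standard_normal((D_H, D_IN)) * 0.1
W_ti = rng.standard_normal((D_H, D_H)) * 0.1
W_to = rng.standard_normal((D_H, D_H)) * 0.1
W_c = rng.standard_normal((D_H, D_IN)) * 0.1

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def third_aspect_step(x_prev, h_prev, c_prev):
    z_in = np.concatenate([x_prev, h_prev])
    f = sigmoid(W_f @ z_in)            # fourth gated vector (forget gate layer)
    i = sigmoid(W_ti @ f)              # fourth supplementary gated vector (input role)
    o = sigmoid(W_to @ f)              # fourth supplementary gated vector (output role)
    c_cand = np.tanh(W_c @ z_in)       # second candidate cell state vector
    c = f * c_prev + i * c_cand        # t-th cell state vector
    h = o * np.tanh(c)                 # t-th hidden state vector
    return h, c

h, c = np.zeros(D_H), np.zeros(D_H)    # 0th hidden and cell state vectors
for x in [rng.standard_normal(D_X) for _ in range(5)]:
    h, c = third_aspect_step(x, h, c)
print(h)
```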
In some embodiments, the lightweight processing includes nonlinear transformation and/or linear transformation. According to this embodiment of this application, a corresponding supplementary gated vector is obtained through nonlinear transformation and/or linear transformation, so that an overall quantity of parameters and a calculation amount can be reduced by using low-cost lightweight processing.
According to a fourth aspect, an embodiment of this application provides a data processing apparatus. The apparatus includes the following modules. A feature extraction module is configured to extract a feature sequence of target data. The feature sequence includes T input features. Herein, T is a positive integer, and t∈[1, T]. A first determining module is configured to obtain T hidden state vectors based on a recurrent neural network. A tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector. The (t−1)th extended state vector is obtained by performing lightweight processing based on the (t−1)th hidden state vector. A processing result determining module is configured to obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
In an embodiment, the recurrent neural network includes a first-type recurrent neural network. The first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector. The update gate layer is used to control information to be added to a hidden state vector. That a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: determining first gated vectors based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector respectively by using first gated neurons at the reset gate layer and the update gate layer; determining, by using a candidate neuron in the first-type recurrent neural network, a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, and the (t−1)th hidden state vector, or determining a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector; and determining the tth hidden state vector based on the first gated vector determined by the first gated neuron at the update gate layer, the (t−1)th hidden state vector, and the first candidate hidden state vector.
In an embodiment, the recurrent neural network includes a first-type recurrent neural network. The first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector. The update gate layer is used to control information to be added to a hidden state vector. That a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: determining a first gated vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using a first gated neuron at the reset gate layer or the update gate layer in the first-type recurrent neural network; performing lightweight processing on the first gated vector by using a first transform neuron in the first-type recurrent neural network, to obtain a first supplementary gated vector; and determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector, or determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector includes: determining a second candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the first supplementary gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first gated vector, the (t−1)th hidden state vector, and the second candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector includes: determining a third candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the first gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first supplementary gated vector, the (t−1)th hidden state vector, and the third candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector includes: determining a fourth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the (t−1)th extended state vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first gated vector, the (t−1)th hidden state vector, and the fourth candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector includes: determining a fifth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first gated vector, and the (t−1)th extended state vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first supplementary gated vector, the (t−1)th hidden state vector, and the fifth candidate hidden state vector.
In an embodiment, the recurrent neural network includes a second-type recurrent neural network. That a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: splicing the (t−1)th hidden state vector and the (t−1)th extended state vector, to obtain a (t−1)th spliced state vector; and determining the tth hidden state vector and a tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and a (t−1)th cell state vector by using the second-type recurrent neural network, where the tth cell state vector is determined based on the (t−1)th spliced state vector, the (t−1)th input feature, and the (t−1)th cell state vector, the tth hidden state vector is determined based on the (t−1)th spliced state vector, the (t−1)th input feature, and the tth cell state vector, and a 0th cell state vector is an initial value.
In an embodiment, the determining the tth hidden state vector and a tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and a (t−1)th cell state vector by using the second-type recurrent neural network includes: determining a second gated vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a second gated neuron in the second-type recurrent neural network; performing lightweight processing on the second gated vector by using a second transform neuron in the second-type recurrent neural network, to obtain a second supplementary gated vector; determining a first candidate cell state vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a candidate neuron in the second-type recurrent neural network; and determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector.
In an embodiment, the second-type recurrent neural network includes a forget gate layer, an input gate layer, and an output gate layer. The forget gate layer is used to control information to be discarded from a cell state vector. The input gate layer is used to control information to be added to a cell state vector. The output gate layer is used to control information in a to-be-output cell state vector.
In an embodiment, when the second gated neuron is a gated neuron at the forget gate layer in the second-type recurrent neural network, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the input gate layer and the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the input gate layer, the second gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the input gate layer in the second-type recurrent neural network, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the forget gate layer and the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the forget gate layer, the second gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the output gate layer, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the forget gate layer and the input gate layer in the second-type recurrent neural network. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by the second transform neurons at the forget gate layer and the input gate layer, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the input gate layer, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by the second gated neuron at the forget gate layer and/or the input gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vectors respectively determined by the second gated neurons at the forget gate layer and the input gate layer, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the output gate layer, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by a second gated neuron at the forget gate layer and/or the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vector determined by the second gated neuron at the forget gate layer, the second supplementary gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector determined by the second gated neuron at the output gate layer.
In an embodiment, when the second gated neuron includes gated neurons at the input gate layer and the output gate layer, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by a second gated neuron at the input gate layer and/or the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vector determined by the second gated neuron at the input gate layer, the second supplementary gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector determined by the second gated neuron at the output gate layer.
In some embodiments of the fourth aspect, the lightweight processing includes nonlinear transformation and/or linear transformation.
According to a fifth aspect, an embodiment of this application provides a data processing apparatus. The apparatus includes the following modules. A feature extraction module is configured to extract a feature sequence of target data. The feature sequence includes T input features. Herein, T is a positive integer, and t∈[1, T]. A second determining module is configured to obtain T hidden state vectors based on a first-type recurrent neural network. A tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector. The third gated vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a first gated neuron in the first-type recurrent neural network. A 0th hidden state vector is an initial value. The third supplementary gated vector is obtained by performing lightweight processing on the third gated vector by using a first transform neuron in the first-type recurrent neural network. A result determining module is configured to obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
In an embodiment, the first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector. The update gate layer is used to control information to be added to a hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector includes: determining a sixth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the third supplementary gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the third gated vector, the (t−1)th hidden state vector, and the sixth candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector includes: determining a seventh candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the third gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the third supplementary gated vector, the (t−1)th hidden state vector, and the seventh candidate hidden state vector.
In some embodiments of the fifth aspect, the lightweight processing includes nonlinear transformation and/or linear transformation.
According to a sixth aspect, an embodiment of this application provides a data processing apparatus. The apparatus includes the following modules. A feature extraction module is configured to extract a feature sequence of target data. The feature sequence includes T input features. Herein, T is a positive integer, and t∈[1, T]. A third determining module is configured to obtain T hidden state vectors based on a second-type recurrent neural network. A tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector. A 0th cell state vector is an initial value. The fourth gated vector is determined based on a (t−1)th input feature and a (t−1)th hidden state vector by using a second gated neuron in the second-type recurrent neural network. A 0th hidden state vector is an initial value. The fourth supplementary gated vector is obtained by performing lightweight processing on the fourth gated vector by using a second transform neuron in the second-type recurrent neural network. The second candidate cell state vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a candidate neuron in the second-type recurrent neural network. A result determining module is configured to obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
In an embodiment, the second-type recurrent neural network includes a forget gate layer, an input gate layer, and an output gate layer. The forget gate layer is used to control information to be discarded from a cell state vector. The input gate layer is used to control information to be added to a cell state vector. The output gate layer is used to control information in a to-be-output cell state vector.
In an embodiment, when the second gated neuron is a gated neuron at the forget gate layer, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at the input gate layer and the output gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the input gate layer, the fourth gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the input gate layer, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at the forget gate layer and the output gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the forget gate layer, the fourth gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the output gate layer, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at the forget gate layer and the input gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by the second transform neurons at the forget gate layer and the input gate layer, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the input gate layer, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by the second gated neuron at the forget gate layer and/or the input gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vectors respectively determined by the second gated neurons at the forget gate layer and the input gate layer, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the output gate layer, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by a second gated neuron at the forget gate layer and/or the output gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vector determined by the second gated neuron at the forget gate layer, the fourth supplementary gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector determined by the second gated neuron at the output gate layer.
In an embodiment, when the second gated neuron includes gated neurons at the input gate layer and the output gate layer, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by a second gated neuron at the input gate layer and/or the output gate layer. That a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vector determined by the second gated neuron at the input gate layer, the fourth supplementary gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector determined by the second gated neuron at the output gate layer.
In some embodiments, the lightweight processing includes nonlinear transformation and/or linear transformation.
According to a seventh aspect, an embodiment of this application provides a data processing apparatus. The apparatus includes a processor, and a memory configured to store instructions executable by the processor. When the processor is configured to execute the instructions, the data processing method in the first aspect or one or more of the possible implementations of the first aspect is implemented.
According to an eighth aspect, an embodiment of this application provides a non-volatile computer-readable storage medium. The non-volatile computer-readable storage medium stores computer program instructions. When the computer program instructions are executed by a processor, the data processing method according to the first aspect or one or more of the possible implementations of the first aspect is implemented.
According to a ninth aspect, an embodiment of this application provides a terminal device. The terminal device may perform the data processing method in the first aspect or one or more of the possible implementations of the first aspect.
According to a tenth aspect, an embodiment of this application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code is run in an electronic device, a processor in the electronic device performs the data processing method in the first aspect or one or more of the possible implementations of the first aspect.
These aspects and other aspects of this application are described more concisely and comprehensibly in the following embodiments.
The accompanying drawings, which are included in and constitute a part of this specification, together with the specification show example embodiments, features, and aspects of this application, and are intended to explain the principles of this application.
The following describes various example embodiments, features, and aspects of this application in detail with reference to the accompanying drawings. Identical reference signs in the accompanying drawings indicate elements that have the same or similar functions. Although various aspects of embodiments are illustrated in the accompanying drawings, the accompanying drawings are not necessarily drawn to scale unless otherwise specified.
The term "example" herein means "used as an example, an embodiment, or an illustration". Any embodiment described herein as an "example" is not necessarily to be construed as superior to or better than other embodiments.
In addition, to better describe this application, numerous specific details are given in the following specific implementations. A person skilled in the art should understand that this application can also be implemented without some specific details. In some instances, methods, means, elements, and circuits that are well-known to a person skilled in the art are not described in detail, so that the subject matter of this application is highlighted.
For better understanding of solutions in embodiments of this application, the following first describes related terms and concepts that may be used in embodiments of this application.
(1) A recurrent neural network (RNN) is a recursive neural network that takes sequence data as an input, performs recursion in the direction in which the sequence evolves, and connects all nodes (recurrent units) in a chain. The RNN is referred to as a recurrent neural network because a current output of a sequence is also related to a previous output of the sequence. Specifically, the network memorizes previous information and applies it to calculation of the current output: nodes at a hidden layer are connected, and an input of the hidden layer includes not only an output of an input layer but also an output of the hidden layer at a previous moment. In other words, in terms of a network structure, the recurrent neural network memorizes the previous information and uses it to affect an output of a subsequent node.
(2) A long short-term memory (LSTM) neural network is a recurrent neural network with three gate structures and can learn long-term dependencies. The name "gate" is used because the three neurons use a sigmoid activation function, and a gate outputs a value ranging from 0 to 1 to indicate how much of the currently input information can pass through the gate. When the gate is opened (for example, an output of a sigmoid neural network layer is 1), all information can pass through the gate. When the gate is closed (for example, the output of the sigmoid neural network layer is 0), no information can pass through the gate. Compared with a hidden state in a conventional RNN, a cell state is added to the LSTM. The three gate structures in the LSTM protect and control the cell state. The cell state represents long-term memory information, and the hidden state represents short-term memory information.
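For ease of reference, the standard LSTM update is commonly written as follows. This is a sketch of the widely used formulation, adapted to the (t−1)-indexing convention of this application; the exact formulas originally referenced by the symbol explanation below may differ.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_{t-1} + U_f h_{t-1} + b_f\right) &&\text{(forget gate)}\\
i_t &= \sigma\!\left(W_i x_{t-1} + U_i h_{t-1} + b_i\right) &&\text{(input gate)}\\
\tilde{c}_t &= \tanh\!\left(W_c x_{t-1} + U_c h_{t-1} + b_c\right) &&\text{(candidate cell state)}\\
o_t &= \sigma\!\left(W_o x_{t-1} + U_o h_{t-1} + b_o\right) &&\text{(output gate)}\\
C_t &= f_t \circ C_{t-1} + i_t \circ \tilde{c}_t &&\text{(cell state)}\\
h_t &= o_t \circ \tanh\!\left(C_t\right) &&\text{(hidden state)}
\end{aligned}
```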
Herein, ∘ represents a Hadamard product; Wf, Wi, Wc, and Wo respectively represent weight matrices of the neurons for xt-1; Uf, Ui, Uc, and Uo respectively represent weight matrices of the neurons for ht-1; and bf, bi, bo, and bc may respectively represent biases in the neurons. It should be understood that the bias in the neuron may be 0.
(3) A gated recurrent unit (GRU) neural network is an LSTM-based recurrent neural network variant. The gated recurrent unit neural network combines the forget gate and the input gate in the LSTM into a single update gate, combines the cell state and the hidden state, and makes some other modifications. A network structure of the GRU is simpler than that of the LSTM.
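For ease of reference, the standard GRU update is commonly written as follows. This is again a sketch adapted to the (t−1)-indexing of this application, using the same update convention that appears later in Formula (4-4).

```latex
\begin{aligned}
z_t &= \sigma\!\left(W_z x_{t-1} + U_z h_{t-1} + b_z\right) &&\text{(update gate)}\\
r_t &= \sigma\!\left(W_r x_{t-1} + U_r h_{t-1} + b_r\right) &&\text{(reset gate)}\\
\tilde{h}_t &= \tanh\!\left(W_h x_{t-1} + U_h\,(r_t \circ h_{t-1}) + b_h\right) &&\text{(candidate hidden state)}\\
h_t &= z_t \circ \tilde{h}_t + (1 - z_t) \circ h_{t-1} &&\text{(hidden state)}
\end{aligned}
```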
Herein, Wz, Wr, and Wn respectively represent weight matrices of the neurons for xt-1; Uz, Ur, and Uh respectively represent weight matrices of the neurons for ht-1; and bz, br, and bh may respectively represent biases in the neurons. It should be understood that the bias in the neuron may be 0.
It should be noted that the LSTM and the GRU are two recurrent neural networks provided in this application. Actually, the recurrent neural network in this application is not limited thereto. The recurrent neural network in this application may include a first-type recurrent neural network (for example, a recurrent neural network that uses only a hidden state, such as a gated recurrent unit neural network or a bidirectional gated recurrent unit neural network), and may further include a second-type recurrent neural network (for example, a recurrent neural network that uses a cell state and a hidden state, such as a long short-term memory neural network or a bidirectional long short-term memory neural network). In addition to the two types of recurrent neural networks, the data processing method in this application may also be applied to another neural network that needs to use a state vector to cache historical information. In addition, the recurrent neural network in this application may further be used for signal processing in other fields, for example, to process a serialized signal such as a time signal or a communication signal.
To better understand the solutions of embodiments of the data processing method in this application, description is provided by using an example in which the gated recurrent unit neural network represents the first-type recurrent neural network, and the long short-term memory neural network represents the second-type recurrent neural network. A processing process of another type of recurrent neural network is similar to that of the LSTM or the GRU. In addition, in this application, a neuron that uses a sigmoid function in a recurrent neural network is referred to as a gated neuron, and a neuron that uses a tanh function in a recurrent neural network is referred to as a candidate neuron.
With continuous development of artificial intelligence (AI) technologies, a recurrent neural network has a large quantity of application requirements in a terminal device, for example, applications such as voice wake-up, speech noise cancellation, and speech recognition. In a current recurrent neural network, a quantity of parameters in an entire network may usually reach a level of hundreds of thousands, millions, or tens of millions. If a 32-bit floating-point number is used for representation, a memory or a cache of hundreds of megabytes is needed. However, memory and cache resources of a terminal device are very limited. In addition, because a calculation amount in the recurrent neural network is positively correlated with a time step of input data, floating-point operations (FLOPs) of a recurrent neural network including hundreds of thousands of parameters may reach a level of tens of millions, and the recurrent neural network consumes a large quantity of computing resources when the terminal device performs computing. Therefore, how to reduce the calculation amount and the quantity of parameters in the recurrent neural network and accelerate a network computing speed with network precision being ensured becomes an urgent problem to be resolved.
Currently, to reduce the quantity of parameters and the calculation amount in the recurrent neural network, the recurrent neural network is usually pruned. For example, some gated neurons in the recurrent neural network are directly deleted, and a state is kept and updated by using a remaining gated neuron.
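One well-known pruned variant of this kind removes the reset gate entirely and replaces the tanh candidate with a ReLU activation plus batch normalization (the light GRU). The following sketch is given only to illustrate such pruning, using the same update convention as the GRU above; it is an assumption for illustration and not necessarily the exact formulation referenced below.

```latex
\begin{aligned}
z_t &= \sigma\!\left(\mathrm{BN}\!\left(W_z x_{t-1}\right) + U_z h_{t-1}\right)\\
\tilde{h}_t &= \mathrm{ReLU}\!\left(\mathrm{BN}\!\left(W_h x_{t-1}\right) + U_h h_{t-1}\right)\\
h_t &= z_t \circ \tilde{h}_t + (1 - z_t) \circ h_{t-1}
\end{aligned}
```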
Herein, ReLU represents a ReLU function, and BN represents batch normalization.
Compared with the GRU shown in
In view of this, this application provides several data processing methods. According to a data processing method in embodiments of this application, a complete hidden state vector and a complete gated vector can be generated through lightweight processing such as linear transformation based on a hidden state vector and a gated vector that are generated by a recurrent neural network, to effectively reduce a quantity of parameters and a calculation amount of neurons in the recurrent neural network, thereby improving running efficiency of the recurrent neural network. The data processing method in embodiments of this application is applicable to various data processing tasks that use the recurrent neural network, so that the quantity of parameters and the calculation amount in the network can be reduced while network precision is ensured, thereby improving processing efficiency of target data.
Specifically, embodiments of this application provide an extension manner of a hidden state vector for constructing a high-efficiency recurrent neural network. To be specific, lightweight processing such as matrix transformation, normalization, and nonlinear transformation is performed on the hidden state vector generated by the recurrent neural network. This is equivalent to extending the hidden state vector through calculation at a lightweight level, to obtain a complete state vector, thereby constructing a miniaturized model. Embodiments of this application further provide a manner of supplementing a gated vector for constructing a high-efficiency recurrent neural network. To be specific, lightweight processing such as matrix transformation, normalization, and nonlinear transformation is performed on a gated vector generated by a gated neuron, to obtain a supplementary gated vector. This is equivalent to supplementing the gated vector through calculation at a lightweight level, thereby constructing a miniaturized model.
The data processing method in embodiments of this application can be applied to processing of various serialized target data, for example, scenarios in which voice data, text data, or image data is processed, to reduce the quantity of parameters and the calculation amount in the recurrent neural network. This improves a network running speed and further improves processing efficiency of the target data. In addition, the recurrent neural network can be deployed on a terminal device. For example, the following briefly describes a voice data processing scenario.
Voice wake-up in a voice assistant:
Speech noise cancellation in a MeeTime call:
Speech recognition in a voice input method: A user can use the voice input method to convert, into text, content that the user says. The data processing method in embodiments of this application may be used as a speech recognition model, to reduce a quantity of parameters and a calculation amount in the speech recognition model, thereby improving speech recognition efficiency. For example, the smartphone (for example, a microphone on the smartphone) may obtain voice data input when a user uses a voice input method, extract a feature sequence of the voice data by using the speech recognition model deployed on the smartphone, and output a text sequence corresponding to the target data.
It should be understood that the data processing method in embodiments of this application may be applied to various scenarios in which voice data needs to be processed, or may be applied to various scenarios in which a terminal device processes voice data by using a recurrent neural network. The scenarios include but are not limited to the foregoing three application scenarios. For example, the data processing method may be further used to recognize text in image data, and may be further used to translate text data.
The terminal device in this application may alternatively be another terminal device. The data processing method in embodiments of this application may be deployed on various terminal devices through software or hardware reconstruction, so that storage resources and computing resources required for deploying the recurrent neural network can be reduced, thereby improving processing efficiency of target data. For example, the terminal device in this application may include but is not limited to a terminal device such as a tablet computer, an in-vehicle device, an augmented reality (AR) device/a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an artificial intelligence (AI) device, and a wearable device. The wearable device may be a smart watch, a smart band, a wireless headset, smart glasses, a smart helmet, a glucometer, a blood pressure meter, or the like.
The terminal device in this application may be a touchscreen device, may be a non-touchscreen device, or may have no screen. The touchscreen device may be controlled by performing tapping, sliding, or the like on a display by using a finger, a stylus, or the like. The non-touchscreen device may be connected to an input device such as a mouse, a keyboard, or a touch panel. The terminal device is controlled by using the input device. The terminal device that has no screen may be, for example, a Bluetooth speaker without a screen. For example, in a speech noise cancellation scenario, the user may tap a corresponding control on the terminal device by using a finger, to trigger an operation of a voice call or a video call. In this way, in the data processing method in this application, target data can be obtained in response to the operation of the user, to perform speech noise cancellation.
The terminal device in this application may be a device with a wireless connection function. The wireless connection function means that the terminal device may be connected to another terminal device or a server in a wireless connection manner such as Wi-Fi or Bluetooth. The terminal device in this application may also have a function of performing communication through a wired connection. For example, in the speech noise cancellation scenario, target data of both parties in a call can be transmitted through communication between a terminal device and a server, so that noise cancellation processing is performed on transmitted voice data on a terminal device on which the data processing method in this application is deployed.
It should be noted that the data processing method in embodiments of this application can also be deployed on a server. The server may be located on a cloud or located locally, may be a physical device, or may be a virtual device such as a virtual machine or a container. The server has a wireless communication function. The wireless communication function may be set on a chip (system) or another component or part of the server. The server may be a device with a wireless connection function. The wireless connection function means that the server may be connected to another server or terminal device in a wireless connection manner such as Wi-Fi or Bluetooth. The server in this application may also have a function of performing communication through a wired connection. For example, the server in this application may be located on a cloud, communicate with the terminal device, receive target data sent by the terminal device, output a processing result (for example, voice data obtained after speech noise cancellation and a text sequence obtained through speech recognition) of the target data by using the data processing method deployed on the server, and return the processing result to the terminal device.
The following describes in detail the data processing method provided in embodiments of this application by using
Operation S601: Extract a feature sequence of target data, where the feature sequence includes T input features, and T is a positive integer.
The target data may include at least one of the following: voice data, image data, and text data. The target data may be target data collected by a data collection apparatus (such as a microphone) of the foregoing terminal device, or may be target data obtained by the terminal device from a local storage or a cloud server, or the like. A source of the target data is not limited in this application.
For example, for the voice data, a feature sequence of the voice data may be extracted by using a mel-frequency cepstral coefficient (MFCC). The MFCC is a cepstral parameter extracted in a mel scale frequency domain. A mel scale is used to describe the nonlinear characteristic of human ear frequency perception. MFCC extraction may include pre-emphasis, frame segmentation, windowing, fast Fourier transform, a mel filter bank, discrete cosine transform, and the like. The MFCC is used to extract an acoustic feature from a segment of voice data. Because some information in the voice data is irrelevant to speech recognition and makes speech recognition more complex, acoustic feature extraction is performed on the voice data so that the voice data can be described by a given quantity of signal components, to extract a feature sequence that helps data processing.
It should be understood that, in a process of extracting the acoustic feature of the voice data by using the MFCC, the entire voice data is usually divided into a plurality of segments based on a time step (that is, a window movement step) for feature extraction. In this case, the feature sequence may include the T sequentially extracted input features, and the T input features may be arranged in a time order.
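As an illustration of this framing-and-windowing pipeline, the following sketch extracts an MFCC feature sequence with the librosa library. The library choice, the 16 kHz sample rate, and the 25 ms window with a 10 ms movement step are assumptions made for the example and are not part of this application.

```python
import librosa

def extract_feature_sequence(wav_path, n_mfcc=13, frame_ms=25, hop_ms=10):
    """Extract T MFCC input features from a voice file, ordered in time."""
    y, sr = librosa.load(wav_path, sr=16000)           # load mono audio at 16 kHz
    n_fft = int(sr * frame_ms / 1000)                   # window length in samples
    hop_length = int(sr * hop_ms / 1000)                # window movement step in samples
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop_length)
    # librosa returns shape (n_mfcc, T); transpose so that each row is one input feature
    return mfcc.T                                       # shape (T, n_mfcc)
```

Each of the T rows returned here corresponds to one input feature of the feature sequence, arranged in time order.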
It should be noted that a feature extraction manner of the target data is not limited in this embodiment of this application. For example, a gammatone frequency cepstral coefficient (GFCC), a shifted delta cepstrum (SDC), or the like may be further used for the voice data, and a convolutional neural network may be used for the image data.
Operation S602: Obtain T hidden state vectors based on a recurrent neural network, where a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector, the (t−1)th extended state vector is obtained by performing lightweight processing based on the (t−1)th hidden state vector, and t∈[1, T]. The lightweight processing may include linear transformation and/or nonlinear transformation. The linear transformation and the nonlinear transformation may be transformation manners at a lightweight level. Matrix transformation, normalization, convolution processing, or the like may be used for the linear transformation. An activation function or the like may be used for the nonlinear transformation.
Lightweight processing is performed based on the (t−1)th hidden state vector to obtain the (t−1)th extended state vector. This is equivalent to extending the hidden state vector through calculation at the lightweight level. Because a quantity of parameters in the recurrent neural network is positively correlated with a dimension of a hidden state vector output by the recurrent neural network, extending the hidden state vector through calculation at the lightweight level enables the recurrent neural network to output a hidden state vector of a small dimension, thereby reducing the quantity of parameters in the recurrent neural network and reducing an overall calculation amount.
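A minimal sketch of this extension step is shown below, assuming the lightweight processing is a single small matrix multiplication optionally followed by a tanh nonlinearity. The function and parameter names are illustrative; normalization or convolution, as mentioned above, could equally be used.

```python
import numpy as np

def extend_hidden_state(h_prev, G, b=None, nonlinear=np.tanh):
    """Lightweight extension: map a small hidden state h_prev (dimension d_h)
    to an extended state g_prev (dimension d_g) with a single matrix multiply."""
    g_prev = h_prev @ G                       # linear transformation, G has shape (d_h, d_g)
    if b is not None:
        g_prev = g_prev + b
    return nonlinear(g_prev)                  # optional nonlinear transformation

# The complete state fed to the recurrent cell is then the spliced vector [h_prev, g_prev].
```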
It should be understood that operation S602 is a recursive recurrent process. A 0th hidden state vector may be a customized initial value, for example, may be 0 or any empirical value. 1st to Tth hidden state vectors may be output values of the recurrent neural network.
As described above, the recurrent neural network may include a first-type recurrent neural network represented by a gated recurrent unit neural network.
As shown in
As shown in
Herein, [ht-1, gt-1] represents a spliced state vector obtained by splicing ht-1 and gt-1, Gh represents a weight matrix of a neuron for gt-1, zt1 represents a gated vector output by a gated neuron at an update gate layer, rt1 represents a gated vector output by a gated neuron at a reset gate layer, {tilde over (h)}t1 represents a candidate hidden state vector output by a candidate neuron, and rtg represents an intermediate gated vector that is obtained by performing lightweight processing on rt1 and that has the same dimension as gt-1. It should be understood that dimensions of ht-1 and gt-1 may be different. If it is expected to perform same processing on gt-1 as ht-1 when ht is determined, rt1 needs to be transformed into a gated vector rtg of the same dimension as gt-1. The lightweight processing may be linear transformation and/or nonlinear transformation. A dashed line in
Formula (4-1) and Formula (4-2) indicate to obtain two gated vectors zt1 and rt1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to two gated neurons in the gated recurrent unit neural network. Formula (4-3) indicates to obtain a candidate hidden state vector {tilde over (h)}t1 by inputting the (t−1)th input feature xt-1, a product (rt1∘ht-1) of the (t−1)th hidden state vector ht-1 and the gated vector rt1, and the (t−1)th extended state vector gt-1 to a candidate neuron in the gated recurrent unit neural network; or obtain a candidate hidden state vector {tilde over (h)}t1 by inputting the (t−1)th input feature xt-1, a product (rt1∘ht-1) of the (t−1)th hidden state vector ht-1 and the gated vector rt1, and a product (rtg∘gt-1) of the (t−1)th extended state vector gt-1 and the intermediate gated vector rtg to a candidate neuron in the gated recurrent unit neural network; or obtain a candidate hidden state vector {tilde over (h)}t1 by inputting the (t−1)th input feature xt-1 and a product (rt1∘ht-1) of the (t−1)th hidden state vector ht-1 and the gated vector rt1 to a candidate neuron in the gated recurrent unit neural network. Formula (4-4) indicates to obtain the tth hidden state vector ht by multiplying the gated vector zt1 by the candidate hidden state vector {tilde over (h)}t1, multiplying a difference between a unit vector and the gated vector zt1 by the (t−1)th hidden state vector ht-1, and adding two multiplication results.
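Putting Formulas (4-1) to (4-4) together, the following NumPy sketch shows one possible realization of the extended-state gated recurrent unit step, using the first variant of Formula (4-3). The weight names and dictionary layout are assumptions made for the example, not a definitive implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extended_gru_step(x_prev, h_prev, g_prev, W, U, G, b):
    """One recurrence step of Formulas (4-1) to (4-4).
    x_prev: (t-1)th input feature; h_prev: (t-1)th hidden state (small dimension);
    g_prev: (t-1)th extended state obtained by lightweight processing of h_prev."""
    s = np.concatenate([h_prev, g_prev])                    # spliced state [h_{t-1}, g_{t-1}]
    z = sigmoid(W['z'] @ x_prev + U['z'] @ s + b['z'])      # Formula (4-1): update gate
    r = sigmoid(W['r'] @ x_prev + U['r'] @ s + b['r'])      # Formula (4-2): reset gate
    # Formula (4-3), first variant: candidate from x_{t-1}, r∘h_{t-1}, and g_{t-1}
    h_cand = np.tanh(W['h'] @ x_prev + U['h'] @ (r * h_prev) + G @ g_prev + b['h'])
    h = z * h_cand + (1.0 - z) * h_prev                     # Formula (4-4)
    return h                                                # the tth hidden state vector
```

Here U['z'] and U['r'] act on the spliced state of dimension d_h + d_g, whereas the candidate neuron receives the extended state through its own weight matrix G.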
As described above, the quantity of parameters in the recurrent neural network is positively correlated with the dimension of the hidden state vector output by the recurrent neural network. The hidden state vector is extended through calculation at the lightweight level, so that the recurrent neural network can output a hidden state vector of a small dimension, thereby reducing the quantity of parameters in the recurrent neural network. It is assumed that biases of all neurons in the gated recurrent unit neural networks in
times the quantity of parameters in the gated recurrent unit neural network in
times the calculation amount in
It can be obtained based on
It can be obtained based on
Herein, Paramsgru represents the total quantity of parameters in the gated recurrent unit neural network in
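As a purely illustrative calculation (the counting ignores biases, and the dimensions are assumed for the example only), a standard GRU with input dimension dx and hidden dimension dh has roughly 3(dx·dh + dh²) weights, whereas outputting a hidden state of reduced dimension dh′ and extending it to dimension dg = dh − dh′ through a single dh′ × dg transform gives roughly:

```latex
\mathrm{Params}_{\mathrm{gru}} \approx 3\left(d_x d_h + d_h^2\right), \qquad
\mathrm{Params}_{\mathrm{ext}} \approx 3\left(d_x d_h' + (d_h' + d_g)\,d_h'\right) + d_h' d_g .
```

For example, with dx = 40, dh = 128, and dh′ = dg = 64, this gives about 64,512 weights for the standard GRU versus about 36,352 for the extended-state variant, a reduction of roughly 44%.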
As described above, the recurrent neural network may include a second-type recurrent neural network represented by a long short-term memory neural network. When the recurrent neural network includes the second-type recurrent neural network, a process of determining a hidden state vector by using the second-type recurrent neural network is described with reference to
As shown in
Herein, [ht-1, gt-1] represents a spliced state vector obtained by splicing ht-1 and gt-1. A processing process of Formulas (6-1) to (6-6) may be expressed as follows: inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to three gated neurons and a candidate neuron in the long short-term memory neural network, to obtain three gated vectors ft0, it0, and ot0, and a candidate cell state vector {tilde over (c)}t0; multiplying the gated vector ft0 by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the gated vector it0 and the candidate cell state vector {tilde over (c)}t0, to obtain the tth cell state vector Ct; and then, multiplying the gated vector ot0 by tanh(Ct), to obtain the tth hidden state vector ht.
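By analogy with the GRU sketch above, the following NumPy sketch shows one possible realization of Formulas (6-1) to (6-6), where the long short-term memory neural network receives the spliced state. The weight names are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extended_lstm_step(x_prev, h_prev, g_prev, c_prev, W, U, b):
    """One recurrence step of Formulas (6-1) to (6-6) with spliced state [h_{t-1}, g_{t-1}]."""
    s = np.concatenate([h_prev, g_prev])                     # spliced state vector
    f = sigmoid(W['f'] @ x_prev + U['f'] @ s + b['f'])       # forget gate
    i = sigmoid(W['i'] @ x_prev + U['i'] @ s + b['i'])       # input gate
    o = sigmoid(W['o'] @ x_prev + U['o'] @ s + b['o'])       # output gate
    c_cand = np.tanh(W['c'] @ x_prev + U['c'] @ s + b['c'])  # candidate cell state
    c = f * c_prev + i * c_cand                              # the tth cell state vector
    h = o * np.tanh(c)                                       # the tth hidden state vector (small dimension)
    return h, c
```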
It is assumed that biases of all neurons in the long short-term memory neural networks in
times a quantity of parameters in the long short-term memory neural network in
It can be obtained based on
It can be obtained based on
Herein, Paramslstm represents the total quantity of parameters in the long short-term memory neural network in
Operation S603: Obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
It should be understood that the downstream task network may be customized based on a specific downstream task, and different downstream task networks may be used for different downstream tasks. A network structure, a network type, and the like of the downstream task network are not limited in this embodiment of this application. For example, the downstream task may include at least one of the following: a speech recognition task, a speech noise cancellation task, a voice wake-up task, a text recognition task, and a text translation task. Correspondingly, the processing result may include at least one of the following: a speech recognition result of the voice data, a speech noise cancellation result of the voice data, a voice wake-up result of the voice data, a text recognition result of the image data, and a text translation result of the text data.
For example, in the speech recognition task, a text sequence corresponding to voice data may be determined based on T hidden state vectors by using a downstream task network (for example, a decoder network). For example, the decoder network may determine, based on the T hidden state vectors, a probability that each hidden state vector belongs to each word in a language model, and provide a text sequence with a maximum probability as a processing result of the target data. In the voice wake-up task, after a text sequence of the voice data is output by using a downstream task network, whether the text sequence corresponding to the voice data matches a specified word or sentence of a voice assistant may be detected, and a matching result is used as a processing result of the voice data. In the speech noise cancellation task, a processing process reverse to feature extraction in operation S601 may be performed on T hidden state vectors, for example, processing reverse to MFCC is performed, to obtain voice data after noise cancellation.
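For the speech recognition example, a very simple downstream decoding step could look like the following greedy sketch. The output projection, vocabulary, and greedy argmax selection are illustrative assumptions; a practical decoder network would typically use beam search together with a language model.

```python
import numpy as np

def greedy_decode(hidden_states, W_out, b_out, vocabulary):
    """Greedily map T hidden state vectors to a text sequence.
    hidden_states: shape (T, d_h); W_out: (d_h, vocab_size); b_out: (vocab_size,)."""
    logits = hidden_states @ W_out + b_out                  # per-step scores over the vocabulary
    # softmax per time step to obtain word probabilities
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    ids = probs.argmax(axis=1)                              # word with the maximum probability
    return [vocabulary[i] for i in ids]
```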
According to this embodiment of this application, because a partial state vector in a complete state vector that currently needs to be input to the recurrent neural network is an extended state vector obtained through lightweight processing, the recurrent neural network may be controlled to output a hidden state vector of a small dimension. In this way, a quantity of parameters and a calculation amount that are required for outputting the hidden state vector by the recurrent neural network can be reduced. A dimension of the hidden state vector output by the recurrent neural network is reduced. However, because an extended state vector obtained by performing lightweight processing on the hidden state vector and the hidden state vector jointly form a complete state vector input to the recurrent neural network, this is equivalent to a supplementary to status information input to the recurrent neural network. In this way, a network computing speed can be improved, network precision can be ensured during data processing, and processing efficiency of the target data can be improved. In addition, a recurrent neural network with a reduced quantity of parameters and a reduced calculation amount can be deployed on a terminal device, and has higher universality.
As described above, the recurrent neural network may include the first-type recurrent neural network represented by the gated recurrent unit neural network. The first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector (that is, to-be-discarded old information). The update gate layer is used to control information to be added to a hidden state vector (that is, to-be-added new information). Based on the process of determining the hidden state vector shown in
Operation S6021: Determine first gated vectors based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector respectively by using first gated neurons at the reset gate layer and the update gate layer in the first-type recurrent neural network.
Operation S6022: Determine, by using a candidate neuron in the first-type recurrent neural network, a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, and the (t−1)th hidden state vector, or determine a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector.
Operation S6023: Determine the tth hidden state vector based on the first gated vector determined by the first gated neuron at the update gate layer, the (t−1)th hidden state vector, and the first candidate hidden state vector.
In operation S6021, in the processing manners shown in the foregoing Formulas (4-1) and (4-2), the first gated vectors may be determined based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector respectively by using the first gated neurons at the reset gate layer and the update gate layer in the first-type recurrent neural network. Specifically, the (t−1)th hidden state vector and the (t−1)th extended state vector may be spliced, to obtain a spliced state vector. Then, the (t−1)th input feature and the spliced state vector are input to the first gated neurons at the reset gate layer and the update gate layer, to obtain the two first gated vectors zt1 and rt1. To be specific, zt1 may represent the first gated vector output by the first gated neuron at the update gate layer, and rt1 may represent the first gated vector output by the first gated neuron at the reset gate layer.
In operation S6022, the three processing manners shown in the foregoing Formula (4-3) may be used to implement: determining the first candidate hidden state vector {tilde over (h)}t1 based on the first gated vector rt1 determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, and the (t−1)th hidden state vector, or determining the first candidate hidden state vector {tilde over (h)}t1 based on the first gated vector rt1 determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector.
Specifically, when the first candidate hidden state vector is determined based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, and the (t−1)th hidden state vector, the (t−1)th input feature and a product of the first gated vector rt1 and the (t−1)th hidden state vector may be input to the candidate neuron, to obtain the first candidate hidden state vector {tilde over (h)}t1; when the first candidate hidden state vector is determined based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector, the (t−1)th input feature, a product of the (t−1)th hidden state vector and the first gated vector rt1, and the (t−1)th extended state vector may be input to the candidate neuron in the first-type recurrent neural network, to obtain the first candidate hidden state vector {tilde over (h)}t1; or the first gated vector rt1 may be first transformed into an intermediate gated vector rtg of the same dimension as the (t−1)th extended state vector, and then the (t−1)th input feature xt-1, a product of the (t−1)th hidden state vector and the first gated vector rt1, and a product of the (t−1)th extended state vector and the intermediate gated vector rtg are input to the candidate neuron, to obtain the first candidate hidden state vector {tilde over (h)}t1.
In operation S6023, with reference to the processing manner shown in the foregoing Formula (4-4), the tth hidden state vector may be determined based on the first gated vector determined by the first gated neuron at the update gate layer, the (t−1)th hidden state vector, and the first candidate hidden state vector. Specifically, the first gated vector zt1 may be multiplied by the first candidate hidden state vector {tilde over (h)}t1, a difference between the unit vector and the first gated vector zt1 is multiplied by the (t−1)th hidden state vector, and two multiplication results are added, to obtain the tth hidden state vector ht.
According to this embodiment of this application, the tth hidden state vector is determined based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using the first-type recurrent neural network, so that the first-type recurrent neural network can output a hidden state vector of a small dimension, thereby reducing a quantity of parameters and a calculation amount in the first-type recurrent neural network.
In an embodiment, to further reduce the quantity of parameters and the calculation amount in the recurrent neural network and accelerate a network running speed, a calculation process in the recurrent neural network may be further improved. For example, calculation of a gated neuron is simplified through lightweight processing at the lightweight level. In a possible implementation, when the recurrent neural network includes the first-type recurrent neural network, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector in operation S602 may include the following operations:
Operation S6024: Determine a first gated vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using a first gated neuron at the reset gate layer or the update gate layer in the first-type recurrent neural network.
Operation S6025: Perform lightweight processing on the first gated vector by using a first transform neuron in the first-type recurrent neural network, to obtain a first supplementary gated vector.
Operation S6026: Determine the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector; or determine the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector.
In operation S6024, the first gated neuron may be a gated neuron at the update gate layer in the first-type recurrent neural network, or a gated neuron at the reset gate layer in the first-type recurrent neural network. The first gated neuron may determine the first gated vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector with reference to the processing process shown in the foregoing Formula (4-1) or (4-2).
In operation S6025, the lightweight processing may include linear transformation and/or nonlinear transformation. The linear transformation and the nonlinear transformation may be transformation manners at the lightweight level. Matrix transformation, normalization, convolution processing, or the like may be used for the linear transformation. An activation function or the like may be used for the nonlinear transformation. It should be understood that a dimension of the first supplementary gated vector obtained by performing lightweight processing on the first gated vector is the same as a dimension of the first gated vector, that is, the same as a dimension of the (t−1)th hidden state vector. In addition, the lightweight processing performed by the first transform neuron on the first gated vector may be different from, or may be the same as, the lightweight processing performed on the (t−1)th hidden state vector in operation S602.
Lightweight processing is performed on the first gated vector in operation S6025, to obtain the first supplementary gated vector. In comparison with a case in which two gated vectors are output based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using two gated neurons in
As shown in
As shown in
The processing process shown in
Formula (8-1) indicates to obtain the first gated vector zt1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the first gated neuron at the update gate layer. Formula (8-2) indicates to obtain the first supplementary gated vector rt1* by inputting the first gated vector zt1 to the first transform neuron φ1. Formula (8-3) indicates to obtain the second candidate hidden state vector {tilde over (h)}t2 by multiplying the first supplementary gated vector rt1* by the (t−1)th hidden state vector ht-1, and inputting a multiplication result and the input feature xt-1 to the candidate neuron. Formula (8-4) indicates to obtain the tth hidden state vector ht by multiplying the first gated vector zt1 by the second candidate hidden state vector {tilde over (h)}t2, multiplying a difference between the unit vector and the first gated vector zt1 by the (t−1)th hidden state vector ht-1, and adding two multiplication results.
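A sketch of Formulas (8-1) to (8-4) is given below, assuming the first transform neuron φ1 is a single matrix multiplication followed by a sigmoid (one of the lightweight options mentioned above). The names are illustrative and not a definitive implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step_with_supplementary_gate(x_prev, h_prev, g_prev, W, U, Phi, b):
    """Only the update-gate neuron is computed from the input; the reset-style
    gate is supplemented from it by the lightweight transform neuron Phi."""
    s = np.concatenate([h_prev, g_prev])                     # spliced state [h_{t-1}, g_{t-1}]
    z = sigmoid(W['z'] @ x_prev + U['z'] @ s + b['z'])       # Formula (8-1): first gated vector
    r_sup = sigmoid(Phi @ z)                                 # Formula (8-2): supplementary gate
    h_cand = np.tanh(W['h'] @ x_prev + U['h'] @ (r_sup * h_prev) + b['h'])  # Formula (8-3)
    h = z * h_cand + (1.0 - z) * h_prev                      # Formula (8-4)
    return h                                                 # the tth hidden state vector
```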
As shown in
As shown in
The processing process shown in
Herein, Gh represents a weight matrix of the candidate neuron for the (t−1)th extended state vector, and rtg* represents an intermediate gated vector that is obtained by performing lightweight processing on rt1* and that has the same dimension as gt-1. It should be understood that dimensions of ht-1 and gt-1 may be different. If it is expected to perform same processing on gt-1 as ht-1 when {tilde over (h)}t4 is determined, rt1* needs to be transformed into a gated vector rtg* of the same dimension as gt-1. The lightweight processing may be linear transformation and/or nonlinear transformation.
Formula (8-5) indicates to obtain the fourth candidate hidden state vector {tilde over (h)}t4 by inputting the (t−1)th input feature xt-1, a product (rt1*∘ht-1) of the (t−1)th hidden state vector ht-1 and the first supplementary gated vector rt1*, and the (t−1)th extended state vector gt-1 to the candidate neuron in the first-type recurrent neural network; or obtain the fourth candidate hidden state vector {tilde over (h)}t4 by inputting the (t−1)th input feature xt-1, a product (rt1*∘ht-1) of the (t−1)th hidden state vector ht-1 and the first supplementary gated vector rt1*, and a product (rtg*∘gt-1) of the (t−1)th extended state vector gt-1 and the intermediate gated vector rtg* to the candidate neuron in the first-type recurrent neural network. Formula (8-6) indicates to obtain the tth hidden state vector ht by multiplying the first gated vector zt1 by the fourth candidate hidden state vector {tilde over (h)}t4, multiplying a difference between the unit vector and the first gated vector zt1 by the (t−1)th hidden state vector ht-1, and adding two multiplication results.
As shown in
As shown in
The processing process shown in
Formula (9-1) indicates to obtain the first gated vector rt1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the first gated neuron at the reset gate layer. Formula (9-2) indicates to obtain the first supplementary gated vector zt1* by inputting the first gated vector rt1 to the first transform neuron φ1. Formula (9-3) indicates to obtain the third candidate hidden state vector {tilde over (h)}t3 by multiplying the first gated vector rt1 by the (t−1)th hidden state vector ht-1, and inputting a multiplication result and the input feature xt-1 to the candidate neuron. Formula (9-4) indicates to obtain the tth hidden state vector ht by multiplying the first supplementary gated vector zt1* by the third candidate hidden state vector {tilde over (h)}t3, multiplying a difference between the unit vector and the first supplementary gated vector zt1* by the (t−1)th hidden state vector ht-1, and adding two multiplication results.
As shown in
As shown in
The processing process shown in
Herein, rtg represents an intermediate gated vector that is obtained by performing lightweight processing on rt1 and that has the same dimension as gt-1. It should be understood that dimensions of ht-1 and gt-1 may be different. If it is expected to perform same processing on gt-1 as ht-1 when {tilde over (h)}t5 is determined, rt1 needs to be transformed into a gated vector rtg of the same dimension as gt-1. The lightweight processing may be linear transformation and/or nonlinear transformation.
Formula (9-5) indicates to obtain the fifth candidate hidden state vector {tilde over (h)}t5 by inputting the (t−1)th input feature xt-1, a product (rt1∘ht-1) of the (t−1)th hidden state vector ht-1 and the first gated vector rt1, and the (t−1)th extended state vector gt-1 to the candidate neuron in the first-type recurrent neural network; or obtain the fifth candidate hidden state vector {tilde over (h)}t5 by inputting the (t−1)th input feature xt-1, a product (rt1∘ht-1) of the (t−1)th hidden state vector ht-1 and the first gated vector rt1, and a product (rtg∘gt-1) of the (t−1)th extended state vector gt-1 and the intermediate gated vector rtg to the candidate neuron in the first-type recurrent neural network. Formula (9-6) indicates to obtain the tth hidden state vector ht by multiplying the first supplementary gated vector zt1* by the fifth candidate hidden state vector {tilde over (h)}t5, multiplying a difference between the unit vector and the first supplementary gated vector zt1* by the (t−1)th hidden state vector ht-1, and adding two multiplication results.
It can be learned based on Formulas (8-1) to (8-6) and Formulas (9-1) to (9-6) that a quantity of weight matrices of a neuron in the recurrent neural network in Formulas (8-1) to (8-6) and Formulas (9-1) to (9-6) is smaller than that in the foregoing Formulas (4-1) to (4-4), thereby significantly reducing the quantity of parameters and the calculation amount in the recurrent neural network.
It should be noted that the first transform neuron φ1 may perform lightweight processing at the lightweight level such as linear transformation and/or nonlinear transformation. For example, φ1 may also use a sigmoid function.
As described above, the quantity of parameters in the recurrent neural network is positively correlated with the dimension of the hidden state vector output by the recurrent neural network. A main difference between
According to this embodiment of this application, lightweight processing is performed on the first gated vector, to obtain the first supplementary gated vector. This is equivalent to supplementing a gated vector through lightweight processing at the lightweight level. Two gated neurons in the first-type recurrent neural network are directly used to output two gated vectors based on the (t−1)th input feature and a (t−1)th spliced state vector. In comparison, in this application, a quantity of parameters and a calculation amount for generating a gated vector can be reduced, thereby reducing a quantity of parameters and a calculation amount in the entire first-type recurrent neural network and improving a network computing speed. In addition, compared with a current manner in which a quantity of parameters and a calculation amount in a network are compressed through pruning processing, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced, and control of the first-type recurrent neural network on a hidden state can be ensured, so that the first-type recurrent neural network has higher universality.
As described above, the recurrent neural network may include the second-type recurrent neural network represented by the long short-term memory neural network. The second-type recurrent neural network includes a forget gate layer, an input gate layer, and an output gate layer. The forget gate layer is used to control information to be discarded from a cell state vector (that is, to-be-discarded old information). The input gate layer is used to control information to be added to a cell state vector (that is, to-be-added new information). The output gate layer is used to control output information in a cell state vector (that is, output partial information screened out from the cell state vector). Based on the process of determining the hidden state vector shown in
Operation S6027: Splice the (t−1)th hidden state vector and the (t−1)th extended state vector, to obtain a (t−1)th spliced state vector.
Operation S6028: Determine the tth hidden state vector and a tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and a (t−1)th cell state vector by using the second-type recurrent neural network.
In an embodiment, in operation S6028, by using the processing process shown in
It should be understood that operation S602 is a recursive recurrent process. The 0th cell state vector may be a customized initial value, for example, may be 0 or any empirical value. 1st to Tth cell state vectors may be output values of the second-type recurrent neural network.
In an embodiment, to further reduce the quantity of parameters and the calculation amount in the recurrent neural network and accelerate a network running speed, a calculation process in the recurrent neural network may be further improved. For example, calculation of a gated neuron is simplified through lightweight processing at the lightweight level. In a possible implementation, determining the tth hidden state vector and the tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and the (t−1)th cell state vector by using the second-type recurrent neural network in operation S6028 may include the following operations:
Operation S60281: Determine a second gated vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a second gated neuron in the second-type recurrent neural network.
Operation S60282: Perform lightweight processing on the second gated vector by using a second transform neuron in the second-type recurrent neural network, to obtain a second supplementary gated vector.
Operation S60283: Determine a first candidate cell state vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a candidate neuron in the second-type recurrent neural network.
Operation S60284: Determine the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector.
In operation S60281, the second gated neuron may be any gated neuron at the forget gate layer, the input gate layer, or the output gate layer in the second-type recurrent neural network, or may be any two gated neurons at the forget gate layer, the input gate layer, and the output gate layer. The second gated neuron may determine the second gated vector based on the (t−1)th input feature and the (t−1)th spliced state vector with reference to the processing process shown in the foregoing Formulas (6-1), (6-2), and (6-3).
In operation S60282, the lightweight processing may include linear transformation and/or nonlinear transformation. The linear transformation and the nonlinear transformation may be transformation manners at the lightweight level. Matrix transformation, normalization, convolution processing, or the like may be used for the linear transformation. An activation function or the like may be used for the nonlinear transformation. It should be understood that the lightweight processing performed by the second transform neuron on the second gated vector may be different from, or may be the same as, the lightweight processing performed on the (t−1)th hidden state vector in operation S602 and the lightweight processing performed on the first gated vector.
When the second gated neuron in operation S60281 is any gated neuron at the forget gate layer, the input gate layer, or the output gate layer in the second-type recurrent neural network, in operation S60282, lightweight processing may be separately performed on the second gated vector by using two second transform neurons, to obtain two second supplementary gated vectors. It should be understood that weight matrices of the two second transform neurons may be different. When the second gated neuron in operation S60281 is any two gated neurons at the forget gate layer, the input gate layer, and the output gate layer in the second-type recurrent neural network, in operation S60282, lightweight processing may be performed by using the second transform neuron on a second gated vector output by one of the two gated neurons; or an operation such as splicing or adding may be first performed on two second gated vectors output by the two gated neurons, and then lightweight processing is performed on a result obtained after the operation such as splicing or adding, to obtain the second supplementary gated vector.
In operation S60282, lightweight processing is performed on a second gated vector generated by one or two gated neurons in the second-type recurrent neural network, to obtain the second supplementary gated vector. In comparison with a case in which three gated vectors are output based on the (t−1)th input feature and the (t−1)th spliced state vector by using three gated neurons in
In operation S60283, the candidate neuron may determine the first candidate cell state vector based on the (t−1)th input feature and the (t−1)th spliced state vector with reference to the processing process shown in the foregoing Formula (6-4).
With reference to
As shown in
As shown in
The processing process shown in
Formula (10-1) indicates to obtain the second gated vector ft2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the second gated neuron at the forget gate layer. Formula (10-2) indicates to obtain the second supplementary gated vector it2* by inputting the second gated vector ft2 to the second transform neuron at the input gate layer. Formula (10-3) indicates to obtain the second supplementary gated vector ot2* by inputting the second gated vector ft2 to the second transform neuron at the output gate layer. Formula (10-4) indicates to obtain the first candidate cell state vector {tilde over (c)}t1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1,gt-1] to the candidate neuron. Formula (10-5) indicates to obtain the tth cell state vector Ct by multiplying the second gated vector ft2 by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the second supplementary gated vector it2* and the first candidate cell state vector {tilde over (c)}t1. Formula (10-6) indicates to obtain the tth hidden state vector ht by multiplying the second supplementary gated vector ot2* by tanh(Ct).
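A sketch of Formulas (10-1) to (10-6) is given below, assuming each second transform neuron is a single matrix multiplication followed by a sigmoid. The names are illustrative and not a definitive implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_forget_gate_only(x_prev, h_prev, g_prev, c_prev, W, U, Phi, b):
    """Only the forget-gate neuron is computed from the input; the input and
    output gates are supplemented from it by two lightweight transform neurons."""
    s = np.concatenate([h_prev, g_prev])                     # spliced state [h_{t-1}, g_{t-1}]
    f = sigmoid(W['f'] @ x_prev + U['f'] @ s + b['f'])       # Formula (10-1): second gated vector
    i_sup = sigmoid(Phi['i'] @ f)                            # Formula (10-2): supplementary input gate
    o_sup = sigmoid(Phi['o'] @ f)                            # Formula (10-3): supplementary output gate
    c_cand = np.tanh(W['c'] @ x_prev + U['c'] @ s + b['c'])  # Formula (10-4): candidate cell state
    c = f * c_prev + i_sup * c_cand                          # Formula (10-5): the tth cell state
    h = o_sup * np.tanh(c)                                   # Formula (10-6): the tth hidden state
    return h, c
```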
As shown in
As shown in
The processing process shown in
Formula (11-1) indicates to obtain the second gated vector it2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1,gt-1] to the second gated neuron at the input gate layer. Formula (11-2) indicates to obtain the second supplementary gated vector ft2* by inputting the second gated vector it2 to the second transform neuron at the forget gate layer. Formula (11-3) indicates to obtain the second supplementary gated vector ot2* by inputting the second gated vector it2 to the second transform neuron at the output gate layer. Formula (11-4) indicates to obtain the first candidate cell state vector {tilde over (c)}t1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1,gt-1] to the candidate neuron. Formula (11-5) indicates to obtain the tth cell state vector Ct by multiplying the second supplementary gated vector ft2* by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the second gated vector it2 and the first candidate cell state vector {tilde over (c)}t1. Formula (11-6) indicates to obtain the tth hidden state vector ht by multiplying the second supplementary gated vector ot2* by tanh(Ct).
As shown in
As shown in
The processing process shown in
Formula (12-1) indicates to obtain the second gated vector ot2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1,gt-1] to the second gated neuron at the output gate layer. Formula (12-2) indicates to obtain the second supplementary gated vector ft2* by inputting the second gated vector ot2 to the second transform neuron at the forget gate layer. Formula (12-3) indicates to obtain the second supplementary gated vector it2* by inputting the second gated vector ot2 to the second transform neuron at the input gate layer. Formula (12-4) indicates to obtain the first candidate cell state vector {tilde over (c)}t1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the candidate neuron. Formula (12-5) indicates to obtain the tth cell state vector Ct by multiplying the second supplementary gated vector ft2* by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the second supplementary gated vector it2* and the first candidate cell state vector {tilde over (c)}t1. Formula (12-6) indicates to obtain the tth hidden state vector ht by multiplying the second gated vector ot2 by tanh(Ct).
As described above, the quantity of parameters in the recurrent neural network is positively correlated with the dimension of the hidden state vector output by the recurrent neural network. A main difference between
As shown in
Two dashed lines in
As shown in
The processing process shown in
Formula (13-1) indicates to obtain the second gated vector ft2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the second gated neuron at the forget gate layer. Formula (13-2) indicates to obtain the second gated vector it2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the second gated neuron at the input gate layer. Formula (13-3) indicates to obtain the second supplementary gated vector ot2* by inputting the second gated vector it2 and/or ft2 to the second transform neuron at the output gate layer. Formula (13-4) indicates to obtain the first candidate cell state vector {tilde over (c)}t1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the candidate neuron. Formula (13-5) indicates to obtain the tth cell state vector Ct by multiplying the second gated vector ft2 by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the second gated vector it2 and the first candidate cell state vector {tilde over (c)}t1. Formula (13-6) indicates to obtain the tth hidden state vector ht by multiplying the second supplementary gated vector ot2* by tanh(Ct).
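As a sketch only, Formulas (13-1) to (13-6) may be expressed as follows, under the same assumptions as the earlier sketch (numpy, a sigmoid helper, illustrative weight names); the spliced-input option of Formula (13-3) is shown, although transforming only ft2 or only it2 is equally possible.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step_formulas_13(x_prev, h_prev, g_prev, c_prev, p):
    """One recurrent step following Formulas (13-1) to (13-6)."""
    z = np.concatenate([x_prev, h_prev, g_prev])
    f = sigmoid(p["W_f"] @ z + p["b_f"])                           # (13-1) f_t^2
    i = sigmoid(p["W_i"] @ z + p["b_i"])                           # (13-2) i_t^2
    o_sup = sigmoid(p["W_o"] @ np.concatenate([f, i]) + p["b_o"])  # (13-3) o_t^2* from f and i
    c_cand = np.tanh(p["W_c"] @ z + p["b_c"])                      # (13-4) candidate cell state
    c = f * c_prev + i * c_cand                                    # (13-5) C_t
    h = o_sup * np.tanh(c)                                         # (13-6) h_t
    return h, c
```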
As shown in
Two dashed lines in
As shown in
The processing process shown in
Formula (14-1) indicates to obtain the second gated vector ft2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the second gated neuron at the forget gate layer. Formula (14-2) indicates to obtain the second gated vector ot2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the second gated neuron at the output gate layer. Formula (14-3) indicates to obtain the second supplementary gated vector it2* by inputting the second gated vector ft2 and/or ot2 to the second transform neuron at the input gate layer. Formula (14-4) indicates to obtain the first candidate cell state vector {tilde over (c)}t1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the candidate neuron. Formula (14-5) indicates to obtain the tth cell state vector Ct by multiplying the second gated vector ft2 by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the second supplementary gated vector it2* and the first candidate cell state vector {tilde over (c)}t1. Formula (14-6) indicates to obtain the tth hidden state vector ht by multiplying the second gated vector ot2 by tanh(Ct).
As shown in
Two dashed lines in
As shown in
The processing process shown in
Formula (15-1) indicates to obtain the second gated vector it2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the second gated neuron at the input gate layer. Formula (15-2) indicates to obtain the second gated vector ot2 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1,gt-1] to the second gated neuron at the output gate layer. Formula (15-3) indicates to obtain the second supplementary gated vector ft2* by inputting the second gated vector it2 and/or ot2 to the second transform neuron at the forget gate layer. Formula (15-4) indicates to obtain the first candidate cell state vector {tilde over (c)}t1 by inputting the (t−1)th input feature xt-1 and the spliced state vector [ht-1, gt-1] to the candidate neuron. Formula (15-5) indicates to obtain the tth cell state vector Ct by multiplying the second supplementary gated vector ft2* by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the second gated vector it2 and the first candidate cell state vector {tilde over (c)}t1. Formula (15-6) indicates to obtain the tth hidden state vector ht by multiplying the second gated vector ot2 by tanh(Ct).
As described above, the quantity of parameters in the recurrent neural network is positively correlated with the dimension of the hidden state vector output by the recurrent neural network. A main difference between
According to this embodiment of this application, lightweight processing is performed on the second gated vector, to obtain the second supplementary gated vector. In a related technology, three gated neurons in the second-type recurrent neural network are directly used to output three gated vectors based on the (t−1)th input feature and a (t−1)th spliced state vector. In comparison, in this application, a quantity of parameters and a calculation amount for generating a gated vector can be reduced, thereby reducing a quantity of parameters and a calculation amount in the entire second-type recurrent neural network and improving a network computing speed. In addition, compared with a current manner in which a quantity of parameters and a calculation amount in a network are compressed through pruning processing, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced, and control of the second-type recurrent neural network on a hidden state can be ensured, so that the second-type recurrent neural network has higher universality.
In the foregoing embodiment of this application, operation S601 to operation S603 may be understood as generating an extended state vector through lightweight processing at the lightweight level outside the recurrent neural network, to reduce the quantity of parameters and the calculation amount in the recurrent neural network. Operation S6024 to operation S6026, operation S60281 to operation S60284, and the like may be understood as a combination of generating an extended state vector through lightweight processing at the lightweight level outside the recurrent neural network and generating a supplementary gated vector through lightweight processing at the lightweight level inside the recurrent neural network, to comprehensively reduce the quantity of parameters and the calculation amount in the recurrent neural network. In fact, the supplementary gated vector may alternatively be generated only through lightweight processing at the lightweight level inside the recurrent neural network, to reduce the quantity of parameters and the calculation amount in the recurrent neural network. The following describes this in detail with reference to
Operation S121: Extract a feature sequence of target data, where the feature sequence includes T input features, T is a positive integer, and t∈[1, T].
The feature sequence of the target data may be extracted with reference to the feature sequence extraction process in operation S601 in the foregoing embodiment of this application. Details are not described herein again.
For t∈[1, T], the following operation S122 is performed to obtain T hidden state vectors.
Operation S122: Obtain the T hidden state vectors based on a first-type recurrent neural network, where a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector. The third gated vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a first gated neuron in the first-type recurrent neural network. The third supplementary gated vector is obtained by performing lightweight processing on the third gated vector by using a first transform neuron in the first-type recurrent neural network.
The first gated neuron may be a gated neuron at an update gate layer in the first-type recurrent neural network, or a gated neuron at a reset gate layer in the first-type recurrent neural network. The first gated neuron may determine the third gated vector based on the (t−1)th input feature and the (t−1)th hidden state vector with reference to the processing process shown in the foregoing Formula (2-1) or (2-2).
It should be understood that operation S122 is a recursive recurrent process. A 0th hidden state vector may be a customized initial value, for example, may be 0 or any empirical value. 1st to Tth hidden state vectors may be output values of the recurrent neural network.
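The recursive process of operation S122 may be sketched as a simple loop, shown below for illustration only; the step function, the zero initial hidden state, and the zero-based indexing of the feature list are assumptions of the example.

```python
import numpy as np

def run_recurrence(features, step_fn, params, hidden_dim):
    """Run a recurrent cell over T input features and collect T hidden state vectors."""
    h = np.zeros(hidden_dim)            # 0th hidden state vector (customized initial value)
    hidden_states = []
    for x_prev in features:             # features[0..T-1] stand in for x_{t-1}, t = 1..T
        h = step_fn(x_prev, h, params)  # t-th hidden state vector from x_{t-1} and h_{t-1}
        hidden_states.append(h)
    return hidden_states                # 1st to T-th hidden state vectors
```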
The lightweight processing may include linear transformation and/or nonlinear transformation. The linear transformation and the nonlinear transformation may be transformation manners at a lightweight level. Matrix transformation, normalization, an activation function, or the like may be used for the linear transformation. Convolution processing or the like may be used for the nonlinear transformation.
Lightweight processing is performed on the third gated vector, to obtain the third supplementary gated vector. In comparison with a case in which two gated vectors zt and rt are output based on the (t−1)th input feature and the (t−1)th hidden state vector by using two gated neurons, a quantity of parameters and a calculation amount for generating a gated vector can be reduced.
With reference to
As shown in
As shown in
The processing process shown in
Formula (16-1) indicates to obtain the third gated vector zt3 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the first gated neuron at the update gate layer. Formula (16-2) indicates to obtain the third supplementary gated vector rt3* by inputting the third gated vector zt3 to the first transform neuron φ1. Formula (16-3) indicates to obtain the sixth candidate hidden state vector {tilde over (h)}t6 by multiplying the third supplementary gated vector rt3* by the (t−1)th hidden state vector ht-1, and inputting a multiplication result and the input feature xt-1 to the candidate neuron tanh. Formula (16-4) indicates to obtain the tth hidden state vector ht by multiplying the third gated vector zt3 by the sixth candidate hidden state vector {tilde over (h)}t6, multiplying a difference between a unit vector and the third gated vector zt3 by the (t−1)th hidden state vector ht-1, and adding two multiplication results.
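As a sketch only, Formulas (16-1) to (16-4) may be expressed as follows, assuming numpy, a sigmoid helper, and illustrative weight names; φ1 is shown here as a small matrix multiplication followed by a sigmoid, which is only one of the possible lightweight transforms.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step_formulas_16(x_prev, h_prev, p):
    """One recurrent step following Formulas (16-1) to (16-4)."""
    xh = np.concatenate([x_prev, h_prev])
    z = sigmoid(p["W_z"] @ xh + p["b_z"])              # (16-1) third gated vector z_t^3
    r_sup = sigmoid(p["W_phi"] @ z + p["b_phi"])       # (16-2) phi_1 gives r_t^3*
    cand_in = np.concatenate([x_prev, r_sup * h_prev])
    h_cand = np.tanh(p["W_h"] @ cand_in + p["b_h"])    # (16-3) sixth candidate hidden state
    return z * h_cand + (1.0 - z) * h_prev             # (16-4) t-th hidden state vector h_t
```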
As shown in
As shown in
The processing process shown in
Formula (17-1) indicates to obtain the third gated vector rt3 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the first gated neuron at the reset gate layer. Formula (17-2) indicates to obtain the third supplementary gated vector zt3* by inputting the third gated vector rt3 to the first transform neuron φ1. Formula (17-3) indicates to obtain the seventh candidate hidden state vector {tilde over (h)}t7 by multiplying the third gated vector rt3 by the (t−1)th hidden state vector ht-1, and inputting a multiplication result and the input feature xt-1 to the candidate neuron. Formula (17-4) indicates to obtain the tth hidden state vector ht by multiplying the third supplementary gated vector zt3* by the seventh candidate hidden state vector {tilde over (h)}t7, multiplying a difference between a unit vector and the third supplementary gated vector zt3* by the (t−1)th hidden state vector ht-1, and adding two multiplication results.
It can be learned from Formulas (16-1) to (16-4) and Formulas (17-1) to (17-4) that the quantity of weight matrices required by the neurons in the recurrent neural network is smaller than that in the foregoing Formulas (2-1) to (2-4), thereby significantly reducing the quantity of parameters and the calculation amount in the recurrent neural network.
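For a rough sense of the saving, the following counts weight parameters (biases ignored) under the assumption that a standard first-type recurrent neural network uses three neurons over [x, h] while the variant of Formulas (16-1) to (17-4) uses two such neurons plus a d_h × d_h transform neuron φ1; the actual saving depends on how lightweight φ1 is.

```python
def weight_counts(d_x, d_h):
    """Illustrative weight counts for a standard GRU-like network vs. the variant."""
    standard = 3 * d_h * (d_x + d_h)              # update, reset, and candidate neurons
    variant = 2 * d_h * (d_x + d_h) + d_h * d_h   # one gated neuron replaced by phi_1
    return standard, variant

# Example: 40-dimensional input features, 128-dimensional hidden state.
# The transform neuron replaces one full gated neuron, saving d_h * d_x weights here;
# an even lighter phi_1 (for example, element-wise scaling) would save more.
print(weight_counts(40, 128))
```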
It should be noted that the first transform neuron φ1 may perform lightweight processing at the lightweight level such as linear transformation and/or nonlinear transformation. For example, φ1 may use a sigmoid function.
Operation S123: Obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
The downstream task network may be customized based on a specific downstream task, and different downstream task networks may be used for different downstream tasks. A network structure, a network type, and the like of the downstream task network are not limited in this embodiment of this application. For example, the downstream task may include at least one of the following: a speech recognition task, a speech noise cancellation task, a voice wake-up task, a text recognition task, and a text translation task. Correspondingly, the processing result may include at least one of the following: a speech recognition result, a speech noise cancellation result, and a voice wake-up result of voice data, a text recognition result of image data, and a text translation result of text data.
For example, in the speech recognition task, a text sequence corresponding to voice data may be determined based on T hidden state vectors by using a decoder. For example, the decoder may determine, based on the T hidden state vectors, a probability that each hidden state vector belongs to each word in a language model, and provide a text sequence with a maximum probability as a processing result of the voice data. In the voice wake-up task, after a text sequence of voice data is output by using a decoder, whether the text sequence corresponding to the voice data matches a specified word or sentence of a voice assistant may be detected, and a matching result is used as a processing result of the voice data. In the speech noise cancellation task, a processing process reverse to feature extraction in operation S121 may be performed on T hidden state vectors, for example, processing reverse to MFCC is performed, to obtain voice data after noise cancellation.
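For illustration only, a hypothetical greedy decoder over the T hidden state vectors might look as follows; the vocabulary projection W_vocab and b_vocab and the per-step argmax are assumptions of the example rather than a definition of the downstream task network.

```python
import numpy as np

def greedy_decode(hidden_states, W_vocab, b_vocab, vocab):
    """Pick, for each hidden state vector, the most probable word of a language model."""
    tokens = []
    for h in hidden_states:
        logits = W_vocab @ h + b_vocab
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                       # probability over the vocabulary
        tokens.append(vocab[int(np.argmax(probs))])
    return " ".join(tokens)                        # text sequence with the maximum probability
```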
According to this embodiment of this application, lightweight processing is performed on the third gated vector, to obtain the third supplementary gated vector. This is equivalent to generating a partial gated vector through lightweight processing. In a related technology, two gated neurons in the first-type recurrent neural network are directly used to output two gated vectors based on the (t−1)th input feature and a (t−1)th hidden state vector. In comparison, in this application, a quantity of parameters and a calculation amount for generating a gated vector can be reduced, thereby reducing a quantity of parameters and a calculation amount in the entire first-type recurrent neural network and improving a network computing speed. In addition, compared with a current manner in which a quantity of parameters and a calculation amount in a network are compressed through pruning processing, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced, and control of the first-type recurrent neural network on a hidden state can be ensured, so that the first-type recurrent neural network has higher universality.
Operation S141: Extract a feature sequence of target data, where the feature sequence includes T input features, T is a positive integer, and t∈[1, T].
The feature sequence of the target data may be extracted with reference to the feature sequence extraction process in operation S601 in the foregoing embodiment of this application. Details are not described herein again.
Operation S142: Obtain T hidden state vectors based on a second-type recurrent neural network, where a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector. The fourth gated vector is determined based on a (t−1)th input feature and a (t−1)th hidden state vector by using a second gated neuron in the second-type recurrent neural network. The fourth supplementary gated vector is obtained by performing lightweight processing on the fourth gated vector by using a second transform neuron in the second-type recurrent neural network. The second candidate cell state vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a candidate neuron in the second-type recurrent neural network.
The second gated neuron may be any one of the gated neurons at a forget gate layer, an input gate layer, and an output gate layer in the second-type recurrent neural network, or may be any two of the gated neurons at the forget gate layer, the input gate layer, and the output gate layer. The second gated neuron may determine the fourth gated vector based on the (t−1)th input feature and the (t−1)th hidden state vector with reference to the processing process shown in the foregoing Formulas (1-1), (1-2), and (2-3).
The lightweight processing may include linear transformation and/or nonlinear transformation. The linear transformation and the nonlinear transformation may be transformation manners at the lightweight level. Matrix transformation, normalization, an activation function, or the like may be used for the linear transformation. Convolution processing or the like may be used for the nonlinear transformation.
When the second gated neuron in operation S142 is any one of the gated neurons at the forget gate layer, the input gate layer, and the output gate layer in the second-type recurrent neural network, lightweight processing may be separately performed on the fourth gated vector by using two second transform neurons, to obtain two fourth supplementary gated vectors. It should be understood that weight matrices of the two second transform neurons may be different. When the second gated neuron in operation S142 is any two of the gated neurons at the forget gate layer, the input gate layer, and the output gate layer in the second-type recurrent neural network, lightweight processing may be performed by using the second transform neuron on a fourth gated vector output by either of the two gated neurons; or an operation such as splicing or adding may be first performed on the two fourth gated vectors output by the two gated neurons, and then lightweight processing is performed on a result obtained after the operation such as splicing or adding, to obtain the fourth supplementary gated vector.
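The two alternatives described above may be sketched as follows, for illustration only; the weight names are assumptions, and whether splicing or adding is used in the second option is a design choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def supplementary_from_one(f, W, b):
    # Option 1: lightweight processing on the gated vector output by one of the two gated neurons.
    return sigmoid(W @ f + b)

def supplementary_from_both(f, i, W, b):
    # Option 2: splice (or add) the two gated vectors first, then perform lightweight processing.
    return sigmoid(W @ np.concatenate([f, i]) + b)
```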
Lightweight processing is performed on a fourth gated vector generated by one or two gated neurons in the second-type recurrent neural network, to obtain the fourth supplementary gated vector. In comparison with a case in which three gated vectors ft, it, and ot are output based on the (t−1)th input feature and the (t−1)th hidden state vector by using three gated neurons, a quantity of parameters and a calculation amount for generating a gated vector can be reduced.
The candidate neuron may determine the second candidate cell state vector based on the (t−1)th input feature and the (t−1)th hidden state vector with reference to the processing process shown in the foregoing Formula (1-4).
It should be understood that operation S142 is a recursive recurrent process. A 0th hidden state vector is an initial value, and a 0th cell state vector may be a customized initial value, for example, may be 0 or any empirical value. 1st to Tth cell state vectors and 1st to Tth hidden state vectors may be output values of the recurrent neural network.
With reference to
As shown in
As shown in
The processing process shown in
Formula (18-1) indicates to obtain the fourth gated vector ft4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the forget gate layer. Formula (18-2) indicates to obtain the fourth supplementary gated vector it4* by inputting the fourth gated vector ft4 to the second transform neuron at the input gate layer. Formula (18-3) indicates to obtain the fourth supplementary gated vector ot4* by inputting the fourth gated vector ft4 to the second transform neuron at the output gate layer. Formula (18-4) indicates to obtain the second candidate cell state vector {tilde over (c)}t2 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the candidate neuron. Formula (18-5) indicates to obtain the tth cell state vector Ct by multiplying the fourth gated vector ft4 by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the fourth supplementary gated vector it4* and the second candidate cell state vector {tilde over (c)}t2. Formula (18-6) indicates to obtain the tth hidden state vector ht by multiplying the fourth supplementary gated vector ot4* by tanh(Ct).
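As a sketch only, Formulas (18-1) to (18-6) may be expressed as follows, under the same assumptions as the earlier sketches (numpy, a sigmoid helper, illustrative weight names); note that the recurrent input here is the plain hidden state, with no extended state vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step_formulas_18(x_prev, h_prev, c_prev, p):
    """One recurrent step following Formulas (18-1) to (18-6)."""
    xh = np.concatenate([x_prev, h_prev])
    f = sigmoid(p["W_f"] @ xh + p["b_f"])        # (18-1) fourth gated vector f_t^4
    i_sup = sigmoid(p["W_i"] @ f + p["b_i"])     # (18-2) supplementary gated vector i_t^4*
    o_sup = sigmoid(p["W_o"] @ f + p["b_o"])     # (18-3) supplementary gated vector o_t^4*
    c_cand = np.tanh(p["W_c"] @ xh + p["b_c"])   # (18-4) second candidate cell state vector
    c = f * c_prev + i_sup * c_cand              # (18-5) t-th cell state vector C_t
    h = o_sup * np.tanh(c)                       # (18-6) t-th hidden state vector h_t
    return h, c
```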
As shown in
As shown in
The processing process shown in
Formula (19-1) indicates to obtain the fourth gated vector it4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the input gate layer. Formula (19-2) indicates to obtain the fourth supplementary gated vector ft4* by inputting the fourth gated vector it4 to the second transform neuron at the forget gate layer. Formula (19-3) indicates to obtain the fourth supplementary gated vector ot4* by inputting the fourth gated vector it4 to the second transform neuron at the output gate layer. Formula (19-4) indicates to obtain the second candidate cell state vector {tilde over (c)}t2 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the candidate neuron. Formula (19-5) indicates to obtain the tth cell state vector Ct by multiplying the fourth supplementary gated vector ft4* by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the fourth gated vector it4 and the second candidate cell state vector {tilde over (c)}t2. Formula (19-6) indicates to obtain the tth hidden state vector ht by multiplying the fourth supplementary gated vector ot4* by tanh(Ct).
As shown in
As shown in
The processing process shown in
Formula (20-1) indicates to obtain the fourth gated vector ot4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the output gate layer. Formula (20-2) indicates to obtain the fourth supplementary gated vector ft4* by inputting the fourth gated vector ot4 to the second transform neuron at the forget gate layer. Formula (20-3) indicates to obtain the fourth supplementary gated vector it4* by inputting the fourth gated vector ot4 to the second transform neuron at the input gate layer. Formula (20-4) indicates to obtain the second candidate cell state vector {tilde over (c)}t2 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the candidate neuron. Formula (20-5) indicates to obtain the tth cell state vector Ct by multiplying the fourth supplementary gated vector ft4* by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the fourth supplementary gated vector it4* and the second candidate cell state vector {tilde over (c)}t2. Formula (20-6) indicates to obtain the tth hidden state vector ht by multiplying the fourth gated vector ot4 by tanh(Ct).
As shown in
Two dashed lines in
As shown in
The processing process shown in
Formula (21-1) indicates to obtain the fourth gated vector ft4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the forget gate layer. Formula (21-2) indicates to obtain the fourth gated vector it4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the input gate layer. Formula (21-3) indicates to obtain the fourth supplementary gated vector ot4* by inputting the fourth gated vector it4 and/or ft4 to the second transform neuron at the output gate layer. Formula (21-4) indicates to obtain the second candidate cell state vector {tilde over (c)}t2 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the candidate neuron. Formula (21-5) indicates to obtain the tth cell state vector Ct by multiplying the fourth gated vector ft4 by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the fourth gated vector it4 and the second candidate cell state vector {tilde over (c)}t2. Formula (21-6) indicates to obtain the tth hidden state vector ht by multiplying the fourth supplementary gated vector ot4* by tanh(Ct).
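As a sketch only, Formulas (21-1) to (21-6) may be expressed as follows, again with assumed weight names and with the spliced-input option shown for Formula (21-3).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step_formulas_21(x_prev, h_prev, c_prev, p):
    """One recurrent step following Formulas (21-1) to (21-6)."""
    xh = np.concatenate([x_prev, h_prev])
    f = sigmoid(p["W_f"] @ xh + p["b_f"])                           # (21-1) f_t^4
    i = sigmoid(p["W_i"] @ xh + p["b_i"])                           # (21-2) i_t^4
    o_sup = sigmoid(p["W_o"] @ np.concatenate([f, i]) + p["b_o"])   # (21-3) o_t^4* from f and i
    c_cand = np.tanh(p["W_c"] @ xh + p["b_c"])                      # (21-4) candidate cell state
    c = f * c_prev + i * c_cand                                     # (21-5) C_t
    h = o_sup * np.tanh(c)                                          # (21-6) h_t
    return h, c
```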
As shown in
Two dashed lines in
As shown in
The processing process shown in
Formula (22-1) indicates to obtain the fourth gated vector ft4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the forget gate layer. Formula (22-2) indicates to obtain the fourth gated vector ot4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the output gate layer. Formula (22-3) indicates to obtain the fourth supplementary gated vector it4* by inputting the fourth gated vector ft4 and/or ot4 to the second transform neuron at the input gate layer. Formula (22-4) indicates to obtain the second candidate cell state vector {tilde over (c)}t2 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the candidate neuron. Formula (22-5) indicates to obtain the tth cell state vector Ct by multiplying the fourth gated vector ft4 by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the fourth supplementary gated vector it4* and the second candidate cell state vector {tilde over (c)}t2. Formula (22-6) indicates to obtain the tth hidden state vector ht by multiplying the fourth gated vector ot4 by tanh(Ct).
As shown in
Two dashed lines in
As shown in
The processing process shown in
Formula (23-1) indicates to obtain the fourth gated vector it4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the input gate layer. Formula (23-2) indicates to obtain the fourth gated vector ot4 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the second gated neuron at the output gate layer. Formula (23-3) indicates to obtain the fourth supplementary gated vector ft4* by inputting the fourth gated vector it4 and/or ot4 to the second transform neuron at the forget gate layer. Formula (23-4) indicates to obtain the second candidate cell state vector {tilde over (c)}t2 by inputting the (t−1)th input feature xt-1 and the (t−1)th hidden state vector ht-1 to the candidate neuron. Formula (23-5) indicates to obtain the tth cell state vector Ct by multiplying the fourth supplementary gated vector ft4* by the (t−1)th cell state vector Ct-1, and adding a multiplication result and a product of the fourth gated vector it4 and the second candidate cell state vector {tilde over (c)}t2. Formula (23-6) indicates to obtain the tth hidden state vector ht by multiplying the fourth gated vector ot4 by tanh(Ct).
Operation S143: Obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
The downstream task network may be customized based on a specific downstream task, and different downstream task networks may be used for different downstream tasks. A network structure, a network type, and the like of the downstream task network are not limited in this embodiment of this application. For example, the downstream task may include at least one of the following: a speech recognition task, a speech noise cancellation task, a voice wake-up task, a text recognition task, and a text translation task. Correspondingly, the processing result may include at least one of the following: a speech recognition result, a speech noise cancellation result, and a voice wake-up result of voice data, a text recognition result of image data, and a text translation result of text data.
For example, in the speech recognition task, a text sequence corresponding to voice data may be determined based on T hidden state vectors by using a decoder. For example, the decoder may determine, based on the T hidden state vectors, a probability that each hidden state vector belongs to each word in a language model, and provide a text sequence with a maximum probability as a processing result of the voice data. In the voice wake-up task, after a text sequence of voice data is output by using a decoder, whether the text sequence corresponding to the voice data matches a specified word or sentence of a voice assistant may be detected, and a matching result is used as a processing result of the voice data. In the speech noise cancellation task, a processing process reverse to feature extraction in operation S141 may be performed on T hidden state vectors, for example, processing reverse to MFCC is performed, to obtain voice data after noise cancellation.
According to this embodiment of this application, lightweight processing is performed on the fourth gated vector, to obtain the fourth supplementary gated vector. This is equivalent to generating a partial gated vector through lightweight processing. In a related technology, three gated neurons in the second-type recurrent neural network are directly used to output three gated vectors based on the (t−1)th input feature and a (t−1)th hidden state vector. In comparison, in this application, a quantity of parameters and a calculation amount for generating a gated vector can be reduced, thereby reducing a quantity of parameters and a calculation amount in the entire second-type recurrent neural network and improving a network computing speed. In addition, compared with a current manner in which a quantity of parameters and a calculation amount in a network are compressed through pruning processing, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced, and control of the second-type recurrent neural network on a hidden state can be ensured, so that the second-type recurrent neural network has higher universality.
A feature extraction module 171 is configured to extract a feature sequence of target data. The feature sequence includes T input features. Herein, T is a positive integer, and t∈[1, T].
A first determining module 172 is configured to obtain T hidden state vectors based on a recurrent neural network. A tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector. The (t−1)th extended state vector is obtained by performing lightweight processing based on the (t−1)th hidden state vector.
A result determining module 173 is configured to obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
According to this embodiment of this application, because a partial state vector in a complete state vector that currently needs to be input to the recurrent neural network is an extended state vector obtained through lightweight processing, the recurrent neural network may be controlled to output a hidden state vector of a small dimension. In this way, a quantity of parameters and a calculation amount that are required for outputting the hidden state vector by the recurrent neural network can be reduced. A dimension of the hidden state vector output by the recurrent neural network is reduced. However, because an extended state vector obtained by performing lightweight processing on the hidden state vector and the hidden state vector jointly form a complete state vector input to the recurrent neural network, this is equivalent to a supplement to the status information input to the recurrent neural network. In this way, a network computing speed can be improved, network precision can be ensured during data processing, and processing efficiency of the target data can be improved. In addition, a recurrent neural network with a reduced quantity of parameters and a reduced calculation amount can be deployed on a terminal device, and has higher universality.
In an embodiment, the recurrent neural network includes a first-type recurrent neural network. The first-type recurrent neural network includes a reset gate layer and an update gate layer. The reset gate layer is used to control information to be discarded from a hidden state vector. The update gate layer is used to control information to be added to a hidden state vector. For the first determining module 172, when the recurrent neural network includes the first-type recurrent neural network, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: determining first gated vectors based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector respectively by using first gated neurons at the reset gate layer and the update gate layer in the first-type recurrent neural network; determining, by using a candidate neuron in the first-type recurrent neural network, a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, and the (t−1)th hidden state vector, or determining a first candidate hidden state vector based on the first gated vector determined by the first gated neuron at the reset gate layer, the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector; and determining the tth hidden state vector based on the first gated vector determined by the first gated neuron at the update gate layer, the (t−1)th hidden state vector, and the first candidate hidden state vector.
According to this embodiment of this application, the tth hidden state vector is determined based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using the first-type recurrent neural network, so that the first-type recurrent neural network can output a hidden state vector of a small dimension, thereby reducing a quantity of parameters and a calculation amount in the first-type recurrent neural network.
In an embodiment, for the first determining module 172, when the recurrent neural network includes the first-type recurrent neural network, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: determining a first gated vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the (t−1)th extended state vector by using a first gated neuron at the reset gate layer or the update gate layer in the first-type recurrent neural network; performing lightweight processing on the first gated vector by using a first transform neuron in the first-type recurrent neural network, to obtain a first supplementary gated vector; and determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector, or determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector includes: determining a second candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the first supplementary gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first gated vector, the (t−1)th hidden state vector, and the second candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the first gated vector includes: determining a third candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the first gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first supplementary gated vector, the (t−1)th hidden state vector, and the third candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the update gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector includes: determining a fourth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, and the (t−1)th extended state vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first gated vector, the (t−1)th hidden state vector, and the fourth candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at the reset gate layer in the first-type recurrent neural network, the determining the tth hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first supplementary gated vector, the first gated vector, and the (t−1)th extended state vector includes: determining a fifth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, the first gated vector, and the (t−1)th extended state vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the first supplementary gated vector, the (t−1)th hidden state vector, and the fifth candidate hidden state vector.
According to this embodiment of this application, lightweight processing is performed on the first gated vector, to obtain the first supplementary gated vector. This is equivalent to generating a partial gated vector through lightweight processing. In a related technology, two gated neurons in the first-type recurrent neural network are directly used to output two gated vectors based on the (t−1)th input feature and a (t−1)th spliced state vector. In comparison, in this application, a quantity of parameters and a calculation amount for generating a gated vector can be reduced, thereby reducing a quantity of parameters and a calculation amount in the entire first-type recurrent neural network and improving a network computing speed. In addition, compared with a current manner in which a quantity of parameters and a calculation amount in a network are compressed through pruning processing, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced, and control of the first-type recurrent neural network on a hidden state can be ensured, so that the first-type recurrent neural network has higher universality.
In an embodiment, the recurrent neural network includes a second-type recurrent neural network. The second-type recurrent neural network includes a forget gate layer, an input gate layer, and an output gate layer. The forget gate layer is used to control information to be discarded from a cell state vector. The input gate layer is used to control information to be added to a cell state vector. The output gate layer is used to control information in a to-be-output cell state vector. For the first determining module 172, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, and a (t−1)th extended state vector includes: splicing the (t−1)th hidden state vector and the (t−1)th extended state vector, to obtain a (t−1)th spliced state vector; and determining the tth hidden state vector and a tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and a (t−1)th cell state vector by using the second-type recurrent neural network, where the tth cell state vector is determined based on the (t−1)th spliced state vector, the (t−1)th input feature, and the (t−1)th cell state vector, the tth hidden state vector is determined based on the (t−1)th spliced state vector, the (t−1)th input feature, and the tth cell state vector, and a 0th cell state vector is an initial value.
According to this embodiment of this application, the second-type recurrent neural network can output a hidden state vector of a small dimension, thereby reducing a quantity of parameters and a calculation amount in the second-type recurrent neural network.
In an embodiment, the determining the tth hidden state vector and a tth cell state vector based on the (t−1)th input feature, the (t−1)th spliced state vector, and a (t−1)th cell state vector by using the second-type recurrent neural network includes: determining a second gated vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a second gated neuron in the second-type recurrent neural network; performing lightweight processing on the second gated vector by using a second transform neuron in the second-type recurrent neural network, to obtain a second supplementary gated vector; determining a first candidate cell state vector based on the (t−1)th input feature and the (t−1)th spliced state vector by using a candidate neuron in the second-type recurrent neural network; and determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector.
In an embodiment, when the second gated neuron is a gated neuron at the forget gate layer in the second-type recurrent neural network, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the input gate layer and the output gate layer in the second-type recurrent neural network. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the input gate layer, the second gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the input gate layer in the second-type recurrent neural network, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the forget gate layer and the output gate layer in the second-type recurrent neural network. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the forget gate layer, the second gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector that is obtained by performing lightweight processing on the second gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the output gate layer in the second-type recurrent neural network, the second supplementary gated vector includes second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by second transform neurons at the forget gate layer and the input gate layer in the second-type recurrent neural network. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second supplementary gated vectors that are obtained by performing lightweight processing on the second gated vector respectively by the second transform neurons at the forget gate layer and the input gate layer, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the input gate layer in the second-type recurrent neural network, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by the second gated neuron at the forget gate layer and/or the input gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vectors respectively determined by the second gated neurons at the forget gate layer and the input gate layer, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second supplementary gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the output gate layer in the second-type recurrent neural network, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by a second gated neuron at the forget gate layer and/or the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vector determined by the second gated neuron at the forget gate layer, the second supplementary gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector determined by the second gated neuron at the output gate layer.
In an embodiment, when the second gated neuron includes gated neurons at the input gate layer and the output gate layer in the second-type recurrent neural network, the second supplementary gated vector includes a second supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a second gated vector determined by a second gated neuron at the input gate layer and/or the output gate layer. The determining the tth hidden state vector and the tth cell state vector based on the second gated vector, the second supplementary gated vector, the first candidate cell state vector, and the (t−1)th cell state vector includes: determining the tth cell state vector based on the second gated vector determined by the second gated neuron at the input gate layer, the second supplementary gated vector, and the first candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the second gated vector determined by the second gated neuron at the output gate layer.
According to this embodiment of this application, lightweight processing is performed on the second gated vector, to obtain the second supplementary gated vector. This is equivalent to generating a partial gated vector through lightweight processing. In a related technology, three gated neurons in the second-type recurrent neural network are directly used to output three gated vectors based on the (t−1)th input feature and a (t−1)th spliced state vector. In comparison, in this application, a quantity of parameters and a calculation amount for generating a gated vector can be reduced, thereby reducing a quantity of parameters and a calculation amount in the entire second-type recurrent neural network and improving a network computing speed. In addition, compared with a current manner in which a quantity of parameters and a calculation amount in a network are compressed through pruning processing, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced, and control of the second-type recurrent neural network on a hidden state can be ensured, so that the second-type recurrent neural network has higher universality.
In an embodiment, the lightweight processing includes nonlinear transformation and/or linear transformation.
A feature extraction module 181 is configured to extract a feature sequence of target data. The feature sequence includes T input features. Herein, T is a positive integer, and t∈[1, T].
A second determining module 182 is configured to obtain T hidden state vectors based on a first-type recurrent neural network, where a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector. The third gated vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a first gated neuron in the first-type recurrent neural network. The third supplementary gated vector is obtained by performing lightweight processing on the third gated vector by using a first transform neuron in the first-type recurrent neural network.
A result determining module 183 is configured to obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
In an embodiment, when the first gated neuron is a gated neuron at an update gate layer in the first-type recurrent neural network, for the second determining module 182, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector includes: determining a sixth candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the third supplementary gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the third gated vector, the (t−1)th hidden state vector, and the sixth candidate hidden state vector.
In an embodiment, when the first gated neuron is a gated neuron at a reset gate layer in the first-type recurrent neural network, for the second determining module 182, that a tth hidden state vector is determined based on a (t−1)th input feature, a (t−1)th hidden state vector, a third supplementary gated vector, and a third gated vector includes: determining a seventh candidate hidden state vector based on the (t−1)th input feature, the (t−1)th hidden state vector, and the third gated vector by using a candidate neuron in the first-type recurrent neural network; and determining the tth hidden state vector based on the third supplementary gated vector, the (t−1)th hidden state vector, and the seventh candidate hidden state vector.
In an embodiment, the lightweight processing includes nonlinear transformation and/or linear transformation.
According to this embodiment of this application, lightweight processing is performed on the third gated vector, to obtain the third supplementary gated vector. This is equivalent to generating a partial gated vector through lightweight processing. In a related technology, two gated neurons in the first-type recurrent neural network are directly used to output two gated vectors based on the (t−1)th input feature and a (t−1)th hidden state vector. In comparison, in this application, a quantity of parameters and a calculation amount for generating a gated vector can be reduced, thereby reducing a quantity of parameters and a calculation amount in the entire first-type recurrent neural network and improving a network computing speed. In addition, compared with a current manner in which a quantity of parameters and a calculation amount in a network are compressed through pruning processing, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced, and control of the first-type recurrent neural network on a hidden state can be ensured, so that the first-type recurrent neural network has higher universality.
A feature extraction module 191 is configured to extract a feature sequence of target data. The feature sequence includes T input features. Herein, T is a positive integer, and t∈[1, T].
A third determining module 192 is configured to obtain T hidden state vectors based on a second-type recurrent neural network, where a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector. The fourth gated vector is determined based on a (t−1)th input feature and a (t−1)th hidden state vector by using a second gated neuron in the second-type recurrent neural network. The fourth supplementary gated vector is obtained by performing lightweight processing on the fourth gated vector by using a second transform neuron in the second-type recurrent neural network. The second candidate cell state vector is determined based on the (t−1)th input feature and the (t−1)th hidden state vector by using a candidate neuron in the second-type recurrent neural network.
A result determining module 193 is configured to obtain a processing result of the target data based on the T hidden state vectors by using a downstream task network.
In an embodiment, when the second gated neuron is a gated neuron at a forget gate layer in the second-type recurrent neural network, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at an input gate layer and an output gate layer in the second-type recurrent neural network. For the third determining module 192, that a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the input gate layer, the fourth gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at an input gate layer in the second-type recurrent neural network, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at a forget gate layer and an output gate layer in the second-type recurrent neural network. For the third determining module 192, that a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the forget gate layer, the fourth gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector that is obtained by performing lightweight processing on the fourth gated vector by the second transform neuron at the output gate layer.
In an embodiment, when the second gated neuron is a gated neuron at the output gate layer in the second-type recurrent neural network, the fourth supplementary gated vector includes fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by second transform neurons at the forget gate layer and the input gate layer in the second-type recurrent neural network. For the third determining module 192, that a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth supplementary gated vectors that are obtained by performing lightweight processing on the fourth gated vector respectively by the second transform neurons at the forget gate layer and the input gate layer, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector determined by the second gated neuron at the output gate layer.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the input gate layer in the second-type recurrent neural network, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by the second gated neuron at the forget gate layer and/or the input gate layer. For the third determining module 192, that a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vectors respectively determined by the second gated neurons at the forget gate layer and the input gate layer, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth supplementary gated vector.
In an embodiment, when the second gated neuron includes gated neurons at the forget gate layer and the output gate layer in the second-type recurrent neural network, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by a second gated neuron at the forget gate layer and/or the output gate layer. For the third determining module 192, that a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vector determined by the second gated neuron at the forget gate layer, the fourth supplementary gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector determined by the second gated neuron at the output gate layer.
In an embodiment, when the second gated neuron includes gated neurons at the input gate layer and the output gate layer in the second-type recurrent neural network, the fourth supplementary gated vector includes a fourth supplementary gated vector that is obtained by the second transform neuron by performing lightweight processing on a fourth gated vector determined by a second gated neuron at the input gate layer and/or the output gate layer. For the third determining module 192, that a tth hidden state vector and a tth cell state vector are determined based on a fourth gated vector, a fourth supplementary gated vector, a second candidate cell state vector, and a (t−1)th cell state vector includes: determining the tth cell state vector based on the fourth gated vector determined by the second gated neuron at the input gate layer, the fourth supplementary gated vector, and the second candidate cell state vector; and determining the tth hidden state vector based on the tth cell state vector and the fourth gated vector determined by the second gated neuron at the output gate layer.
According to this embodiment of this application, lightweight processing is performed on the fourth gated vector to obtain the fourth supplementary gated vector. This is equivalent to generating a partial gated vector through lightweight processing. In a related technology, three gated neurons in the second-type recurrent neural network are directly used to output three gated vectors based on the (t−1)th input feature and the (t−1)th hidden state vector. In comparison, in this application, the quantity of parameters and the calculation amount for generating a gated vector can be reduced, thereby reducing the quantity of parameters and the calculation amount in the entire second-type recurrent neural network and improving the network computing speed. In addition, compared with a conventional manner in which the quantity of parameters and the calculation amount in a network are compressed through pruning, in this embodiment of this application, the quantity of parameters and the calculation amount can be reduced while control of the second-type recurrent neural network over the hidden state is preserved, so that the second-type recurrent neural network has higher universality.
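As an illustration of one of the variants above, where the forget gate is computed by a full gated neuron and the input and output gates are obtained as supplementary gated vectors, the following NumPy sketch shows such an LSTM-style cell. The elementwise affine-plus-sigmoid form of the lightweight processing and all names are assumptions made for the example only, not the exact construction described in this application.

```python
# A minimal sketch of a second-type (LSTM-style) cell, assuming the
# lightweight processing is an elementwise affine map followed by a sigmoid.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LightweightLSTMCell:
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # Full gated neuron for the forget gate.
        self.W_f = rng.standard_normal((hidden_size, input_size)) * 0.1
        self.U_f = rng.standard_normal((hidden_size, hidden_size)) * 0.1
        self.b_f = np.zeros(hidden_size)
        # Candidate neuron for the candidate cell state.
        self.W_c = rng.standard_normal((hidden_size, input_size)) * 0.1
        self.U_c = rng.standard_normal((hidden_size, hidden_size)) * 0.1
        self.b_c = np.zeros(hidden_size)
        # Assumed lightweight transforms for the supplementary input and output gates.
        self.a_i, self.b_i = np.ones(hidden_size), np.zeros(hidden_size)
        self.a_o, self.b_o = np.ones(hidden_size), np.zeros(hidden_size)

    def step(self, x, h_prev, c_prev):
        f = sigmoid(self.W_f @ x + self.U_f @ h_prev + self.b_f)   # gated vector (forget gate)
        i = sigmoid(self.a_i * f + self.b_i)                        # supplementary gate (input)
        o = sigmoid(self.a_o * f + self.b_o)                        # supplementary gate (output)
        c_cand = np.tanh(self.W_c @ x + self.U_c @ h_prev + self.b_c)
        c = f * c_prev + i * c_cand                                  # new cell state vector
        h = o * np.tanh(c)                                           # new hidden state vector
        return h, c

cell = LightweightLSTMCell(input_size=40, hidden_size=64)
h, c = np.zeros(64), np.zeros(64)
for x in np.random.default_rng(1).standard_normal((10, 40)):   # 10 input features
    h, c = cell.step(x, h, c)
```

In this sketch, only one of the three gates requires full input and recurrent weight matrices; the other two gates add only a few vector-sized parameters, which is the source of the parameter and computation savings discussed above.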
An embodiment of this application provides a data processing apparatus, including a processor and a memory configured to store instructions executable by the processor. When executing the instructions, the processor is configured to implement the foregoing method.
An embodiment of this application provides a terminal device. The terminal device may perform the foregoing data processing method.
An embodiment of this application provides a non-volatile computer-readable storage medium. The non-volatile computer-readable storage medium stores computer program instructions. When the computer program instructions are executed by a processor, the foregoing method is implemented.
An embodiment of this application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code is run in a processor in an electronic device, the processor in the electronic device performs the foregoing method.
The following specifically describes the components of the electronic device 1300 with reference to the accompanying drawing.
The processor 1801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control execution of the foregoing solution program. The processor 1801 may include one or more processing units. For example, the processor 1801 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components, or may be integrated into one or more processors.
The communication interface 1803 is configured to communicate with another electronic device or a communication network, for example, an Ethernet, a radio access network (RAN), a core network, or a wireless local area network (WLAN).
The memory 1802 may be a read-only memory (ROM), another type of static storage device that can store static information and instructions, a random access memory (RAM), or another type of dynamic storage device that can store information and instructions; or may be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM), other compact disc storage, optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium, another magnetic storage device, or any other medium that can be used to carry or store expected program code in a form of instructions or a data structure and that is accessible to a computer. However, this is not limited thereto. The memory may exist independently, and is connected to the processor through a bus. The memory may alternatively be integrated with the processor.
The memory 1802 is configured to store application program code for executing the foregoing solution, and the processor 1801 controls the execution. The processor 1801 is configured to execute the application program code stored in the memory 1802.
In the foregoing embodiments, the description of each embodiment has its own focus. For a part that is not described in detail in an embodiment, refer to the related descriptions in other embodiments.
The computer-readable storage medium may be a tangible device that can retain and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM, or flash memory), a static random-access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, for example, a punch card or a raised structure in a groove that stores instructions, and any suitable combination thereof.
Computer-readable program instructions or code described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include a copper transmission cable, optical fiber transmission, wireless transmission, a router, a firewall, a switch, a gateway computer, and/or an edge server. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.
The computer program instructions used to perform the operations in this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, status setting data, or source code or object code written in any combination of one or more programming languages. The programming languages include object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or a similar programming language. The computer-readable program instructions may be executed entirely on a user computer, partly on a user computer, as a stand-alone software package, partly on a user computer and partly on a remote computer, or entirely on a remote computer or a server. In a case involving a remote computer, the remote computer may be connected to a user computer over any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, connected over the Internet by using an Internet service provider). In some embodiments, an electronic circuit, for example, a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is customized by using state information of the computer-readable program instructions. The electronic circuit may execute the computer-readable program instructions, to implement various aspects of this application.
The various aspects of this application are described herein with reference to the flowcharts and/or block diagrams of the method, the apparatus (system), and the computer program product according to embodiments of this application. It should be understood that each block of the flowcharts and/or block diagrams and a combination of blocks in the flowcharts and/or block diagrams may be implemented by the computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a dedicated computer, or another programmable data processing apparatus to produce a machine, so that when the instructions are executed by the processor of the computer or the another programmable data processing apparatus, an apparatus for implementing functions/actions specified in one or more blocks in the flowcharts and/or block diagrams is generated. These computer-readable program instructions may alternatively be stored in the computer-readable storage medium. These instructions enable a computer, a programmable data processing apparatus, and/or another device to work in a specific manner. Therefore, the computer-readable medium storing the instructions includes an artifact that includes instructions for implementing the various aspects of the functions/actions specified in the one or more blocks in the flowcharts and/or block diagrams.
The computer-readable program instructions may alternatively be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operation steps are executed on the computer, the another programmable data processing apparatus, or the another device to produce a computer-implemented process. Therefore, the instructions executed on the computer, the another programmable data processing apparatus, or the another device implement the functions/actions specified in the one or more blocks in the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show possible implementations of system architectures, functions, and operations of apparatuses, systems, methods, and computer program products according to a plurality of embodiments of this application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of the instructions, and the module, the program segment, or the part of the instructions includes one or more executable instructions for implementing a specified logical function. In some alternative implementations, a function marked in the block may also occur in an order different from that marked in the accompanying drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and may sometimes be executed in a reverse order, depending on a related function.
It should also be noted that each block in the block diagrams and/or the flowcharts, and a combination of blocks in the block diagrams and/or the flowcharts may be implemented by hardware (for example, a circuit or an application specific integrated circuit (ASIC)) that performs a corresponding function or action, or may be implemented by a combination of hardware and software, for example, firmware.
Although the present invention is described with reference to embodiments, in a process of implementing the claimed invention, a person skilled in the art may understand and implement other variations of the disclosed embodiments by viewing the accompanying drawings, the disclosed content, and the appended claims. In the claims, "comprising" does not exclude another component or another step, and "a" or "one" does not exclude a plurality. A single processor or another unit may implement several functions enumerated in the claims. The mere fact that some measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to produce a good effect.
The foregoing has described the embodiments of this application. The foregoing descriptions are examples rather than an exhaustive enumeration, and are not limited to the disclosed embodiments. Many modifications and variations are apparent to a person of ordinary skill in the art without departing from the scope of the described embodiments. The terms used in this specification are selected to best explain the principles of the embodiments, their practical application, or improvements to technologies in the market, or to enable another person of ordinary skill in the art to understand the embodiments disclosed in this specification.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202211258515.2 | Oct 2022 | CN | national |
This application is a continuation of International Application No. PCT/CN2023/103854, filed on Jun. 29, 2023, which claims priority to Chinese Patent Application No. 202211258515.2, filed on Oct. 13, 2022. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2023/103854 | Jun 2023 | WO |
| Child | 19176382 | | US |