The present disclosure relates to artificial intelligence (AI) technologies, and in particular, to an item recommendation method and a related device thereof.
With rapid development of computer technologies, to meet an Internet access requirement of a user, developers are increasingly inclined to display content of interest on a page of an application. In view of this, for an application, it is usually necessary to predict which items the user is likely to purchase, that is, to identify items of interest to the user, and present these items on the page of the application, thereby providing a service for the user.
Currently, a neural network model of an AI technology can be used to predict an item that can be recommended to the user. Specifically, historical information of the user may be first collected, and the historical information indicates items that the user has interacted with and behaviors of the user for these items. Because there are a plurality of types of behaviors of the user for the item, the historical information of the user may be classified based on the types of behaviors, and various types of information are separately processed by using the neural network model, to obtain processing results of the various types of information. Finally, the processing results of the various types of information may be superimposed to obtain an item recommendation result, thereby determining a target item to be recommended to the user.
In the foregoing process, when the neural network model processes information, mutual impact of a plurality of behaviors belonging to a same category is mainly considered, and the factors that are considered are limited. As a result, accuracy of the item recommendation result finally output by the model is not high, affecting user experience.
Embodiments of the present disclosure provide an item recommendation method and a related device thereof. An item recommendation result output by a neural network model used by the method can have high accuracy, thereby helping optimize user experience.
A first aspect of embodiments of the present disclosure provides an item recommendation method. The method includes the following steps.
When a user uses an application, to display an item of interest on a page of the application, some historical data of previously using the application by the user may be first acquired, and N pieces of first information may be obtained based on the historical data. An ith piece of first information indicates an ith first item (that is, a historical item) that has been operated by the user when the user uses the application and an ith behavior. The ith behavior may be understood as a behavior performed by the user when the user operates the ith item. N behaviors performed by the user may be classified into M categories. i=1, . . . , N, N≥M, and M>1. For example, when the user uses shopping software, to predict a commodity that can be recommended to the user, five pieces of first information generated when the user previously used the software may be obtained. A 1st piece of first information indicates a piece of clothing and a tap behavior of the user on the clothing. A 2nd piece of first information indicates a pair of shoes and an add-to-favorites behavior of the user on the shoes. A 3rd piece of first information indicates a hat and a purchase behavior of the user on the hat. A 4th piece of first information indicates a pair of trousers and a tap behavior of the user on the trousers. A 5th piece of first information indicates another pair of trousers and a purchase behavior of the user on the trousers. It can be learned that the five behaviors of the user on the five commodities may be classified into three types: a tap behavior, an add-to-favorites behavior, and a purchase behavior.
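To make the structure of the input concrete, the following is a minimal sketch of how the five pieces of first information in the example above could be represented; the class and field names are illustrative and are not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class FirstInfo:
    """One piece of first information: a historical item and the behavior on it."""
    item: str       # the i-th first item (historical item)
    behavior: str   # the i-th behavior, falling into one of M categories

# The N=5 pieces of first information from the shopping example (M=3 categories).
history = [
    FirstInfo("clothing",   "tap"),
    FirstInfo("shoes",      "add_to_favorites"),
    FirstInfo("hat",        "purchase"),
    FirstInfo("trousers_1", "tap"),
    FirstInfo("trousers_2", "purchase"),
]

categories = {f.behavior for f in history}
assert len(history) >= len(categories) > 1  # N >= M and M > 1
```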
After the N pieces of first information are obtained, the N pieces of first information may be input into a target model, so that the target model may process the N pieces of first information based on a multi-head self-attention mechanism, to correspondingly obtain N pieces of second information. After the N pieces of second information are obtained, the target model may obtain an item recommendation result based on the N pieces of second information. The item recommendation result may be used to determine, from K second items (that is, candidate items), a target item recommended to the user, and K≥1.
It can be learned from the foregoing method that, when the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
In a possible implementation, processing the N pieces of first information based on the multi-head self-attention mechanism, to obtain the N pieces of second information includes: performing linear processing on the ith piece of first information, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information; and performing an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior, to obtain an ith piece of second information, where a jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N. In the foregoing implementation, after the N pieces of first information are received, for any one of the N pieces of first information, that is, the ith piece of first information, the target model may first perform linear processing on the ith piece of first information, to obtain the ith piece of Q information, the ith piece of K information, and the ith piece of V information. For the remaining first information other than the ith piece of first information, the target model may also perform an operation similar to that performed on the ith piece of first information. Therefore, a total of N pieces of Q information, N pieces of K information, and N pieces of V information can be obtained. To be specific, the target model may perform linear processing on the 1st piece of first information, to obtain a 1st piece of Q information, a 1st piece of K information, and a 1st piece of V information, may further perform linear processing on the 2nd piece of first information, to obtain a 2nd piece of Q information, a 2nd piece of K information, a 2nd piece of V information, . . . , and may further perform linear processing on an Nth piece of first information, to obtain an Nth piece of Q information, an Nth piece of K information, and an Nth piece of V information.
For the ith piece of Q information, the target model may perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, and the N pieces of weight information corresponding to the ith behavior, to obtain the ith piece of second information. The jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and the jth behavior, and j=1, . . . , N. For the remaining Q information other than the ith piece of Q information, the target model may also perform an operation similar to that performed on the ith piece of Q information. Therefore, the N pieces of second information can be obtained. To be specific, the target model may first perform an operation on the 1st piece of Q information, the N pieces of K information, the N pieces of V information, and N pieces of weight information corresponding to a 1st behavior, to obtain a 1st piece of second information, the target model may further perform an operation on the 2nd piece of Q information, the N pieces of K information, the N pieces of V information, and N pieces of weight information corresponding to a 2nd behavior, to obtain a 2nd piece of second information, . . . , and the target model may further perform an operation on the Nth piece of Q information, the N pieces of K information, the N pieces of V information, and N pieces of weight information corresponding to an Nth behavior, to obtain an Nth piece of second information. A 1st piece of weight information corresponding to the 1st behavior is determined based on the 1st behavior, a 2nd piece of weight information corresponding to the 1st behavior is determined based on the 1st behavior and the 2nd behavior, . . . , an Nth piece of weight information of the 1st behavior is determined based on the 1st behavior and the Nth behavior, . . . , a 1st piece of weight information corresponding to the Nth behavior is determined based on the Nth behavior and the 1st behavior, a 2nd piece of weight information corresponding to the Nth behavior is determined based on the Nth behavior and the 2nd behavior, . . . , and an Nth piece of weight information of the Nth behavior is determined based on the Nth behavior.
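The text above fixes what enters the operation (the ith piece of Q information, the N pieces of K and V information, and the N pieces of weight information) but not how they are combined. A common instantiation, assumed here purely for illustration, adds a learned per-category-pair bias to the scaled dot-product attention logits before the softmax; all names, shapes, and the single-head simplification below are assumptions rather than the disclosed formula.

```python
import numpy as np

def behavior_aware_attention(X, behaviors, Wq, Wk, Wv, pair_weight):
    """Single-head sketch. X holds the N pieces of first information (N x d),
    behaviors[i] is the category index of the i-th behavior, and
    pair_weight[bi, bj] is the weight information for the behavior pair
    (bi, bj); adding it to the logits is an assumed combination rule."""
    N, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # linear processing -> Q/K/V info
    logits = Q @ K.T / np.sqrt(d)                  # ordinary attention scores
    logits += pair_weight[np.ix_(behaviors, behaviors)]  # weight info for (i, j)
    A = np.exp(logits - logits.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)             # softmax over j
    return A @ V                                   # N pieces of second information

# Toy usage: N=5 behaviors in M=3 categories, d=8 features per piece.
rng = np.random.default_rng(0)
N, d, M = 5, 8, 3
X = rng.normal(size=(N, d))
behaviors = np.array([0, 1, 2, 0, 2])              # tap / favorites / purchase
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
pair_weight = rng.normal(size=(M, M))              # learned in practice
second_info = behavior_aware_attention(X, behaviors, Wq, Wk, Wv, pair_weight)
print(second_info.shape)                           # (5, 8)
```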
In a possible implementation, the method further includes: obtaining N pieces of third information, where an ith piece of third information indicates the ith behavior; and performing an operation on the ith piece of third information and the N pieces of third information, to obtain N pieces of fourth information corresponding to the ith behavior, where a jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. The performing an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior, to obtain an ith piece of second information includes: performing an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior, to obtain the ith piece of second information. In the foregoing implementation, after the N pieces of third information are received, for any one of the N pieces of third information, that is, the ith piece of third information, the target model may perform an operation on the ith piece of third information and the N pieces of third information, to obtain the N pieces of fourth information corresponding to the ith behavior. The jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. For the remaining third information other than the ith piece of third information, the target model may also perform an operation similar to that performed on the ith piece of third information. Therefore, a total of N pieces of fourth information corresponding to the 1st behavior, N pieces of fourth information corresponding to the 2nd behavior, . . . , and N pieces of fourth information corresponding to the Nth behavior can be obtained. To be specific, the target model may perform an operation on the 1st piece of third information and the 1st piece of third information to obtain a 1st piece of fourth information (indicating a distance between 1st behaviors) corresponding to the 1st behavior, may further perform an operation on the 1st piece of third information and a 2nd piece of third information to obtain a 2nd piece of fourth information (indicating a distance between the 1st behavior and the 2nd behavior) corresponding to the 1st behavior, . . . , may further perform an operation on the 1st piece of third information and an Nth piece of third information to obtain an Nth piece of fourth information (indicating a distance between the 1st behavior and the Nth behavior) corresponding to the 1st behavior, . . . , may further perform an operation on the Nth piece of third information and the 1st piece of third information to obtain a 1st piece of fourth information (indicating a distance between the Nth behavior and the 1st behavior) corresponding to the Nth behavior, may further perform an operation on the Nth piece of third information and the 2nd piece of third information to obtain a 2nd piece of fourth information (indicating a distance between the Nth behavior and the 2nd behavior) corresponding to the Nth behavior, . . . , and may further perform an operation on the Nth piece of third information and the Nth piece of third information to obtain an Nth piece of fourth information (indicating a distance between Nth behaviors) corresponding to the Nth behavior. In this case, for the ith piece of Q information, the target model may perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior, to obtain the ith piece of second information. For the remaining Q information other than the ith piece of Q information, the target model may also perform an operation similar to that performed on the ith piece of Q information. Therefore, the N pieces of second information can be obtained. To be specific, the target model may first perform an operation on the 1st piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the 1st behavior, and the N pieces of fourth information corresponding to the 1st behavior to obtain the 1st piece of second information, the target model may further perform an operation on the 2nd piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the 2nd behavior, and the N pieces of fourth information corresponding to the 2nd behavior to obtain the 2nd piece of second information, . . . , and the target model may further perform an operation on the Nth piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the Nth behavior, and the N pieces of fourth information corresponding to the Nth behavior to obtain the Nth piece of second information. It can be learned that, in a process of processing the N pieces of first information based on the multi-head self-attention mechanism, the target model further considers impact caused by a distance between orders of different behaviors. Factors that are considered are more comprehensive in comparison with a related technology. The item recommendation result output by the target model may also accurately fit a real intention of the user, thereby further improving accuracy of the item recommendation result.
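Concretely, the fourth information behaves like a relative-position bias: each (i, j) pair gets a term derived from the gap between the orders of the two behaviors, and that term is added to the same attention logits as the weight information. The lookup-table form below is an assumption; the disclosure only requires that the fourth information indicate the distance.

```python
import numpy as np

def distance_bias(N, rel_table):
    """Fourth-information sketch: rel_table[g] is a learned scalar for an order
    gap of g, so entry (i, j) indicates the distance between behaviors i and j.
    The lookup-table form is an assumption."""
    idx = np.arange(N)
    gaps = np.abs(idx[:, None] - idx[None, :])   # |order_i - order_j|
    return rel_table[gaps]                        # (N, N) bias matrix

# Folded into the previous sketch, the logits would become:
#   logits = Q @ K.T / np.sqrt(d)
#   logits += pair_weight[np.ix_(behaviors, behaviors)]  # weight information
#   logits += distance_bias(N, rel_table)                # fourth information
N = 5
rel_table = np.linspace(0.5, 0.0, N)             # nearer behaviors weigh more
print(distance_bias(N, rel_table))
```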
In a possible implementation, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior, for example, an interval between a time at which the user performs the ith behavior and a time at which the user performs the jth behavior.
In a possible implementation, obtaining the item recommendation result based on the N pieces of second information includes: performing feature extraction on the N pieces of second information to obtain fifth information and sixth information, where the fifth information indicates a difference between the N behaviors, and the sixth information indicates a same point between the N behaviors; fusing the fifth information and the sixth information to obtain seventh information, where the seventh information indicates interest distribution of the user; and calculating matching degrees between the seventh information and K pieces of eighth information, where the matching degree is used as the item recommendation result, a tth piece of eighth information indicates a tth second item, and t=1, . . . , K. In the foregoing implementation, after the N pieces of second information are obtained, the target model may perform feature extraction on the N pieces of second information in one manner, to obtain the fifth information. The fifth information includes an exclusive characteristic of each of the N behaviors. Therefore, the fifth information may be used to indicate a difference between the N behaviors. At the same time, the target model may further perform feature extraction on the N pieces of second information in another manner, to obtain the sixth information. The sixth information includes a common characteristic of the N behaviors. Therefore, the sixth information may indicate a same point between the N behaviors. After the fifth information and the sixth information are obtained, the target model may perform weighted summation on the fifth information and the sixth information to obtain the seventh information. The seventh information is a behavior representation of the user. Therefore, the seventh information may indicate the interest distribution of the user. After the seventh information is obtained, the target model may further obtain the K pieces of eighth information. The tth piece of eighth information indicates the tth second item, and t=1, . . . , K. For the tth piece of eighth information among the K pieces of eighth information, the target model may calculate a matching degree between the seventh information and the tth piece of eighth information. For eighth information other than the tth piece of eighth information, the target model may also perform an operation similar to that performed on the tth piece of eighth information. Therefore, the matching degrees between the seventh information and the K pieces of eighth information can be obtained. These matching degrees can then be used as the final item recommendation result output by the target model. It can be learned that the target model may perform deeper information mining on a processing result of the multi-head self-attention mechanism, to mine exclusive information of a plurality of behaviors of the user and common information of the plurality of behaviors, thereby constructing a behavior representation of the user. In this case, an item that can match the behavior representation of the user may be used as the target item recommended to the user, so that accuracy of item recommendation can be improved.
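The two feature extractors and the fusion are left open by the text above; the sketch below assumes the simplest reading: two projection-and-pooling paths over the N pieces of second information (one for the exclusive characteristics, one for the common characteristics), a weighted summation into the seventh information, and dot products against K candidate-item embeddings as the matching degrees. Every operator choice here is an assumption.

```python
import numpy as np

def recommend(second_info, W_diff, W_common, alpha, item_emb):
    """fifth  : pooled behavior-specific features (difference between behaviors)
    sixth  : pooled shared features (same point between behaviors)
    seventh: weighted fusion, i.e. the user's interest representation
    returns: K matching degrees against the eighth information (item embeddings)."""
    fifth = (second_info @ W_diff).mean(axis=0)      # exclusive characteristics
    sixth = (second_info @ W_common).mean(axis=0)    # common characteristics
    seventh = alpha * fifth + (1.0 - alpha) * sixth  # weighted summation
    return item_emb @ seventh                        # matching degree per item

rng = np.random.default_rng(1)
N, d, K = 5, 8, 10
second_info = rng.normal(size=(N, d))
W_diff, W_common = rng.normal(size=(d, d)), rng.normal(size=(d, d))
item_emb = rng.normal(size=(K, d))                   # one piece of eighth info per item
scores = recommend(second_info, W_diff, W_common, alpha=0.5, item_emb=item_emb)
print(int(scores.argmax()))                          # index of the target item
```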
In a possible implementation, the K second items include the N first items.
A second aspect of embodiments of the present disclosure provides a model training method. The method includes: inputting N pieces of first information into a to-be-trained model to obtain a predicted item recommendation result, where the to-be-trained model is configured to: obtain the N pieces of first information, where an ith piece of first information indicates an ith first item and an ith behavior, the ith behavior is a behavior of a user for the ith item, N behaviors of the user correspond to M categories, i=1, . . . , N, N≥M, and M>1; process the N pieces of first information based on a multi-head self-attention mechanism, to obtain N pieces of second information; and obtain the predicted item recommendation result based on the N pieces of second information, where the predicted item recommendation result is used to determine, from K second items, a target item recommended to the user, and K≥1; obtaining a target loss based on the predicted item recommendation result and a real item recommendation result, where the target loss indicates a difference between the predicted item recommendation result and the real item recommendation result; and updating a parameter of the to-be-trained model based on the target loss until a model training condition is met, to obtain a target model.
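The second aspect fixes only the structure: a forward pass, a target loss measuring the difference between the predicted and the real item recommendation results, and parameter updates until a training condition is met. A minimal PyTorch-style sketch of that loop follows; the stub model, the binary cross-entropy loss, and the fixed step count all stand in for details the text leaves unspecified.

```python
import torch
import torch.nn as nn

class ToBeTrainedModel(nn.Module):
    """Stub for the to-be-trained model; the real model implements the
    behavior-aware attention of the first aspect (omitted here)."""
    def __init__(self, d=8, K=10):
        super().__init__()
        self.encoder = nn.Linear(d, d)      # placeholder for the attention stack
        self.item_emb = nn.Embedding(K, d)  # eighth information per second item

    def forward(self, first_info):          # first_info: (N, d) tensor
        seventh = self.encoder(first_info).mean(dim=0)  # user representation
        return self.item_emb.weight @ seventh           # K matching degrees

model = ToBeTrainedModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()            # assumed form of the target loss

first_info = torch.randn(5, 8)              # N=5 pieces of first information
real = torch.zeros(10); real[3] = 1.0       # real item recommendation result

for step in range(100):                     # "until a model training condition is met"
    predicted = model(first_info)           # predicted item recommendation result
    loss = loss_fn(predicted, real)         # difference between predicted and real
    optimizer.zero_grad()
    loss.backward()                         # back propagation of the target loss
    optimizer.step()                        # update the model parameters
```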
The target model obtained through training in the foregoing method has a function of recommending an item to the user. When the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
In a possible implementation, the to-be-trained model is configured to: perform linear processing on the ith piece of first information, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information; and perform an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior, to obtain an ith piece of second information, where a jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N.
In a possible implementation, the to-be-trained model is further configured to: obtain N pieces of third information, where an ith piece of third information indicates the ith behavior; and perform an operation on the ith piece of third information and the N pieces of third information, to obtain N pieces of fourth information corresponding to the ith behavior, where a jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. The to-be-trained model is configured to perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior, to obtain the ith piece of second information.
In a possible implementation, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior.
In a possible implementation, the to-be-trained model is configured to: perform feature extraction on the N pieces of second information to obtain fifth information and sixth information, where the fifth information indicates a difference between the N behaviors, and the sixth information indicates a same point between the N behaviors; fuse the fifth information and the sixth information to obtain seventh information, where the seventh information indicates interest distribution of the user; and calculate matching degrees between the seventh information and K pieces of eighth information, where the matching degree is used as the item recommendation result, a tth piece of eighth information indicates a tth second item, and t=1, . . . , K.
In a possible implementation, the K second items include the N first items.
A third aspect of embodiments of the present disclosure provides an item recommendation apparatus. The apparatus includes: a first obtaining module configured to obtain N pieces of first information by using a target model, where an ith piece of first information indicates an ith first item and an ith behavior, the ith behavior is a behavior of a user for the ith item, N behaviors of the user correspond to M categories, i=1, . . . , N, N≥M, and M>1; a processing module configured to process the N pieces of first information by using the target model based on a multi-head self-attention mechanism, to obtain N pieces of second information; and a second obtaining module configured to obtain an item recommendation result by using the target model based on the N pieces of second information, where the item recommendation result is used to determine, from K second items, a target item recommended to the user, and K≥1.
It can be learned from the foregoing apparatus that, when the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
In a possible implementation, the processing module is configured to: perform linear processing on the ith piece of first information by using the target model, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information; and perform an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior by using the target model, to obtain an ith piece of second information, where a jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N.
In a possible implementation, the apparatus further includes: a third obtaining module configured to obtain N pieces of third information by using the target model, where an ith piece of third information indicates the ith behavior; an operation module configured to perform an operation on the ith piece of third information and the N pieces of third information by using the target model to obtain N pieces of fourth information corresponding to the ith behavior, where a jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior; and a processing module configured to perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior by using the target model, to obtain the ith piece of second information.
In a possible implementation, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior.
In a possible implementation, the second obtaining module is configured to: perform feature extraction on the N pieces of second information by using the target model to obtain fifth information and sixth information, where the fifth information indicates a difference between the N behaviors, and the sixth information indicates a same point between the N behaviors; fuse the fifth information and the sixth information by using the target model to obtain seventh information, where the seventh information indicates interest distribution of the user; and calculate matching degrees between the seventh information and K pieces of eighth information by using the target model, where the matching degree is used as the item recommendation result, a tth piece of eighth information indicates a tth second item, and t=1, . . . , K.
In a possible implementation, the K second items include the N first items.
A fourth aspect of embodiments of the present disclosure provides a model training apparatus, where the apparatus includes: a processing module configured to input N pieces of first information into a to-be-trained model to obtain a predicted item recommendation result, where the to-be-trained model is configured to: obtain the N pieces of first information, where an ith piece of first information indicates an ith first item and an ith behavior, the ith behavior is a behavior of a user for the ith item, N behaviors of the user correspond to M categories, i=1, . . . , N, N≥M, and M>1; process the N pieces of first information based on a multi-head self-attention mechanism, to obtain N pieces of second information; and obtain the predicted item recommendation result based on the N pieces of second information, where the predicted item recommendation result is used to determine, from K second items, a target item recommended to the user, and K≥1; an obtaining module configured to obtain a target loss based on the predicted item recommendation result and a real item recommendation result, where the target loss indicates a difference between the predicted item recommendation result and the real item recommendation result; and an updating module configured to update a parameter of the to-be-trained model based on the target loss until a model training condition is met, to obtain a target model.
The target model obtained through training by the apparatus has a function of recommending an item to the user. When the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
In a possible implementation, the to-be-trained model is configured to: perform linear processing on the ith piece of first information, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information; and perform an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior, to obtain an ith piece of second information, where a jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N.
In a possible implementation, the to-be-trained model is further configured to: obtain N pieces of third information, where an ith piece of third information indicates the ith behavior; and perform an operation on the ith piece of third information and the N pieces of third information to obtain N pieces of fourth information corresponding to the ith behavior, where a jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. The to-be-trained model is configured to perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior, to obtain the ith piece of second information.
In a possible implementation, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior.
In a possible implementation, the to-be-trained model is configured to: perform feature extraction on the N pieces of second information to obtain fifth information and sixth information, where the fifth information indicates a difference between the N behaviors, and the sixth information indicates a same point between the N behaviors; fuse the fifth information and the sixth information to obtain seventh information, where the seventh information indicates interest distribution of the user; and calculate matching degrees between the seventh information and K pieces of eighth information, where the matching degree is used as the item recommendation result, a tth piece of eighth information indicates a tth second item, and t=1, . . . , K.
In a possible implementation, the K second items include the N first items.
A fifth aspect of embodiments of the present disclosure provides an item recommendation apparatus. The apparatus includes a memory and a processor. The memory stores code, and the processor is configured to execute the code. When the code is executed, the item recommendation apparatus performs the method according to any one of the first aspect or the possible implementations of the first aspect.
A sixth aspect of embodiments of the present disclosure provides a model training apparatus. The apparatus includes a memory and a processor. The memory stores code, and the processor is configured to execute the code. When the code is executed, the model training apparatus performs the method according to any one of the second aspect or the possible implementations of the second aspect.
A seventh aspect of embodiments of the present disclosure provides a circuit system. The circuit system includes a processing circuit. The processing circuit is configured to perform the method according to any one of the first aspect, the possible implementations of the first aspect, the second aspect, or the possible implementations of the second aspect.
An eighth aspect of embodiments of the present disclosure provides a chip system. The chip system includes a processor configured to invoke a computer program or computer instructions stored in a memory, so that the processor performs the method according to any one of the first aspect, the possible implementations of the first aspect, the second aspect, or the possible implementations of the second aspect.
In a possible implementation, the processor is coupled to the memory through an interface.
In a possible implementation, the chip system further includes the memory. The memory stores the computer program or the computer instructions.
A ninth aspect of embodiments of the present disclosure provides a computer storage medium. The computer storage medium stores a computer program. When the program is executed by a computer, the computer is enabled to perform the method according to any one of the first aspect, the possible implementations of the first aspect, the second aspect, or the possible implementations of the second aspect.
A tenth aspect of embodiments of the present disclosure provides a computer program product. The computer program product stores instructions. When the instructions are executed by a computer, the computer is enabled to perform the method according to any one of the first aspect, the possible implementations of the first aspect, the second aspect, or the possible implementations of the second aspect.
In embodiments of the present disclosure, when the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
Embodiments of the present disclosure provide an item recommendation method and a related device thereof. An item recommendation result output by a neural network model used by the method can have high accuracy, thereby helping optimize user experience.
With rapid development of computer technologies, to meet an Internet access requirement of a user, developers are increasingly inclined to display content of interest on a page of an application. In view of this, for an application, it is usually necessary to predict which items the user is likely to purchase, that is, to identify items of interest to the user, and present these items on the page of the application, thereby providing a service for the user. For example, for shopping software, it is required to predict commodities that the user is inclined to purchase when using the shopping software, that is, recommend commodities of interest to the user, and display these commodities on a page of the shopping software for the user to browse and purchase.
Currently, a neural network model of an AI technology can be used to predict an item that can be recommended to the user. Specifically, historical information of the user may be first collected, and the historical information indicates items that the user has interacted with and behaviors of the user for these items. Because there are a plurality of types of behaviors (for example, various types of behaviors such as a tap behavior, an add-to-favorites behavior, a search behavior, an add-to-cart behavior, and a purchase behavior) of the user for the item, the historical information of the user may be classified based on the types of behaviors, and various types of historical information are separately processed by using the neural network model, to obtain processing results of the various types of historical information. Finally, the processing results of the various types of historical information may be superimposed to obtain an item recommendation result, thereby determining a target item recommended to the user.
In the foregoing process, when the neural network model processes historical information, mutual impact of a plurality of behaviors belonging to a same category is mainly considered, and the factors that are considered are limited. As a result, accuracy of the item recommendation result finally output by the model is not high, affecting user experience.
Further, a plurality of behaviors of the user are usually sorted (for example, in a time sequence), and orders of the behaviors usually affect a purchase decision of the user. For example, a purchase behavior of the user for an item some time ago still has a large impact on a current interest of the user in another item (whether to perform a purchase behavior). The foregoing neural network model cannot capture this impact. As a result, an item recommendation result output by the neural network model cannot accurately match a real intention of the user, and accuracy of the item recommendation result is also reduced.
Further, in a training process of the neural network model, a function of training data indicating an auxiliary behavior (for example, a tap behavior, an add-to-favorites behavior, a search behavior, and an add-to-cart behavior) is usually ignored, and only training data indicating a main behavior (for example, a purchase behavior) is used to complete model training, leading to poor performance of a model obtained through training.
To resolve the foregoing problem, an embodiment of the present disclosure provides an item recommendation method. The method may be implemented with reference to AI technology. The AI technology is a technical discipline that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by a digital computer. The AI technology obtains an optimal result by perceiving an environment, obtaining knowledge, and using knowledge. In other words, the AI technology is a branch of computer science, and attempts to understand essence of intelligence and produce a new intelligent machine that can react in a similar manner to human intelligence. Using AI to process data is a common application manner of AI.
An overall working process of an AI system is first described.
The infrastructure provides computing capability support for the AI system, implements communication with an external world, and implements support by using a basic platform. The infrastructure communicates with the outside by using a sensor. A computing capability is provided by a smart chip, for example, a central processing unit (CPU), a neural processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another hardware acceleration chip. The basic platform includes related platforms such as a distributed computing framework and a network for assurance and support, including cloud storage and computing, an interconnection network, and the like. For example, the sensor communicates with the outside to obtain data, and the data is provided to a smart chip in a distributed computing system provided by the basic platform for computing.
Data at an upper layer of the infrastructure indicates a data source in the AI field. The data relates to a graph, an image, a speech, and text, further relates to Internet of things (IoT) data of a device, and includes service data of an existing system and perception data such as force, displacement, a liquid level, a temperature, and humidity.
Data processing usually includes data training, machine learning, deep learning, searching, inference, decision making, and the like.
Machine learning and deep learning may mean performing symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Inference is a process in which human intelligent inference is simulated in a computer or an intelligent system, and machine thinking and problem resolving are performed by using formal information based on an inference control policy. A typical function is searching and matching.
Decision making is a process of making a decision after intelligent information is inferred, and usually provides functions such as classification, ranking, and prediction.
After data processing mentioned above is performed on data, some general capabilities may further be formed based on a data processing result, for example, an algorithm or a general system such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
The intelligent product and industry application are products and applications of the AI system in various fields. The intelligent product and industry application involve packaging overall AI solutions, to productize and apply intelligent information decision-making. Application fields of the intelligent information decision-making mainly include a smart terminal, smart transportation, smart health care, autonomous driving, a smart city, and the like.
The following describes several application scenarios of the present disclosure.
The data processing device may be a device or a server that has a data processing function, for example, a cloud server, a network server, an application server, or a management server. The data processing device receives the item recommendation request from the smart terminal through an interactive interface, and then performs item recommendation processing in a manner such as machine learning, deep learning, searching, inference, and decision making by using a memory storing data and a processor processing data. The memory in the data processing device may be a general name, and includes a local storage and a database storing historical data. The database may be in the data processing device, or may be in another network server.
In a process in which the execution device 110 preprocesses the input data, or in a process in which a calculation module 111 of the execution device 110 performs related processing such as calculation (for example, performs function implementation of a neural network in the present disclosure), the execution device 110 may invoke data, code, and the like in a data storage system 150 for corresponding processing, and may further store, into the data storage system 150, data, an instruction, and the like that are obtained through corresponding processing.
Finally, the I/O interface 112 returns a processing result to the client device 140, to provide the processing result for the user.
It should be noted that, for different objectives or different tasks, a training device 120 may generate corresponding target model/rules based on different training data. The corresponding target model/rules may be used to achieve the foregoing objectives or complete the foregoing tasks, thereby providing a required result for the user. The training data may be stored in a database 130, and is a training sample acquired by a data acquisition device 160.
An embodiment of the present disclosure further provides a chip. The chip includes a neural-network processing unit NPU. The chip may be disposed in the execution device 110 described above.
The neural-network processing unit NPU serves as a coprocessor, and may be disposed on a host CPU. The host CPU assigns a task. A core part of the NPU is an operation circuit, and a controller controls the operation circuit to extract data in a memory (a weight memory or an input memory) and perform an operation.
In some implementations, the operation circuit includes a plurality of processing engines (PE) inside. In some implementations, the operation circuit is a two-dimensional systolic array. The operation circuit may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit is a general-purpose matrix processor.
For example, it is assumed that there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches, from a weight memory, data corresponding to the matrix B, and caches the data on each PE in the operation circuit. The operation circuit fetches data of the matrix A from an input memory, performs a matrix operation between the matrix A and the matrix B, and stores an obtained partial result or an obtained final result of the matrix in an accumulator.
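To make the role of the accumulator concrete, the following plain-Python sketch accumulates C = A x B from partial products over slices of the shared dimension, mirroring in software (not in hardware) how partial results build up before the final result is stored.

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """Accumulate C = A @ B tile by tile along the shared dimension, the way the
    operation circuit stores partial results in the accumulator."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))                      # the accumulator
    for s in range(0, k, tile):
        C += A[:, s:s+tile] @ B[s:s+tile, :]  # partial result accumulated
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
assert np.allclose(tiled_matmul(A, B), A @ B)
```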
A vector calculation unit may perform further processing on an output of the operation circuit, for example, vector multiplication, vector addition, an exponent operation, a logarithm operation, or a value comparison. For example, the vector calculation unit may be configured to perform network computation, such as pooling, batch normalization, or local response normalization at a non-convolutional/non-FC layer in the neural network.
In some implementations, the vector calculation unit can store a processed output vector in a unified buffer. For example, the vector calculation unit may apply a non-linear function to the output of the operation circuit, for example, a vector of an accumulated value, to generate an activation value. In some implementations, the vector calculation unit generates a normalized value, a combined value, or both a normalized value and a combined value. In some implementations, the processed output vector can be used as an activation input to the operation circuit, for example, used at a subsequent layer in the neural network.
A unified memory is configured to store input data and output data.
A direct memory access controller (DMAC) transfers input data in the external memory to the input memory and/or the unified memory, stores weight data in the external memory into the weight memory, and stores data in the unified memory into the external memory.
A bus interface unit (BIU) is configured to implement interaction between the host CPU, the DMAC, and an instruction fetch buffer by using a bus.
The instruction fetch buffer connected to the controller is configured to store instructions used by the controller.
The controller is configured to invoke the instructions buffered in the instruction fetch buffer, to control a working process of an operation accelerator.
Usually, the unified memory, the input memory, the weight memory, and the instruction fetch buffer each are an on-chip memory. The external memory is a memory outside the NPU. The external memory may be a double data rate synchronous dynamic random-access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
Embodiments of the present disclosure relate to massive application of a neural network. Therefore, for ease of understanding, the following first describes terms and concepts related to the neural network in embodiments of the present disclosure.
The neural network may include a neuron. The neuron may be an operation unit that uses xs and an intercept of 1 as an input. An output of the operation unit may be as follows:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$$
s=1, 2, . . . , or n. n is a natural number greater than 1. Ws is a weight of xs. b is a bias of the neuron. f is an activation function of the neuron, and is used to introduce a non-linear feature into the neural network, to convert an input signal of the neuron into an output signal. The output signal of the activation function may serve as an input of a next convolutional layer. The activation function may be a sigmoid function. The neural network is a network formed by connecting many single neurons together. To be specific, an output of a neuron may be an input of another neuron. An input of each neuron may be connected to a local receptive field of a previous layer to extract a feature of the local receptive field. The local receptive field may be a region including several neurons.
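As a concrete check of the formula above, the following minimal sketch computes a single neuron's output with the sigmoid chosen as the activation function; the numeric values are arbitrary.

```python
import numpy as np

def neuron_output(xs, Ws, b):
    """f(sum_s Ws[s] * xs[s] + b), with f chosen as the sigmoid function."""
    return 1.0 / (1.0 + np.exp(-(np.dot(Ws, xs) + b)))

# Arbitrary example values: two inputs, their weights, and a bias.
print(neuron_output(xs=np.array([0.5, -1.0]), Ws=np.array([0.8, 0.3]), b=0.1))
```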
Work at each layer of the neural network may be described by using a mathematical expression y=a(Wx+b). At a physical level, work at each layer of the neural network may be understood as completing transformation from input space to output space (namely, from row space to column space of a matrix) by performing five operations on the input space (a set of input vectors). The five operations include: 1. dimension increasing/dimension reduction; 2. scaling up/scaling down; 3. rotation; 4. translation; and 5. "bending". The operations 1, 2, and 3 are performed by Wx, the operation 4 is performed by +b, and the operation 5 is performed by a. The word "space" is used herein for expression because a classified object is not a single thing, but a type of things. Space is a collection of all individuals of this type of things. W is a weight vector, and each value in the vector represents a weight value of one neuron at this layer of the neural network. The vector W determines the space transformation from the input space to the output space described above. In other words, a weight W at each layer controls how to transform space. A purpose of training the neural network is to finally obtain a weight matrix (a weight matrix formed by vectors W at a plurality of layers) of all layers of a trained neural network. Therefore, a training process of the neural network is essentially a manner of learning control of space transformation, and more specifically, learning a weight matrix.
Because it is expected that an output of the neural network is as close as possible to a value that is actually expected to be predicted, a current predicted value of the network may be compared with an actually expected target value, and then a weight vector at each layer of the neural network is updated based on a difference between the current predicted value and the target value (certainly, there is usually an initialization process before a first update, that is, a parameter is preconfigured for each layer of the neural network). For example, if the predicted value of the network is large, the weight vector is adjusted to decrease the predicted value, and adjustment is continuously performed, until the neural network can predict the actually expected target value. Therefore, “how to obtain a difference between the predicted value and the target value through comparison” needs to be predefined. This is a loss function or an objective function. The loss function and the objective function are important equations that measure the difference between the predicted value and the target value. The loss function is used as an example. A higher output value (loss) of the loss function indicates a larger difference. Therefore, training of the neural network is a process of minimizing the loss as much as possible.
In a training process, a neural network may correct a value of a parameter in an initial neural network model by using an error back propagation (BP) algorithm, so that a reconstruction error loss of the neural network model becomes increasingly smaller. Specifically, an input signal is forward transferred until the error loss is generated at an output, and the parameter of the initial neural network model is updated through back propagation of information about the error loss, so that the error loss converges. The back propagation algorithm is an error-loss-centered back propagation process intended to obtain a parameter, such as a weight matrix, of an optimal neural network model.
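As an illustration of this training process, the following sketch minimizes a mean squared error loss for a one-layer network by gradient descent; the loss function, learning rate, and toy data are assumptions, and back propagation here reduces to the analytic gradient of this small model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: inputs and the actually expected target values.
X = np.random.randn(8, 3)
t = np.random.rand(8)

W = np.random.randn(3) * 0.1   # initialization: parameters preconfigured before the first update
b = 0.0
lr = 0.5

for step in range(200):
    y = sigmoid(X @ W + b)           # forward transfer: current predicted value
    loss = np.mean((y - t) ** 2)     # loss function: measures the predicted/target difference
    # Back propagation: gradient of the loss with respect to W and b.
    dy = 2 * (y - t) / len(t)
    dz = dy * y * (1 - y)            # propagate through the sigmoid
    W -= lr * (X.T @ dz)             # adjust the weight vector to decrease the loss
    b -= lr * dz.sum()
```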
The following describes the method provided in the present disclosure from a neural network training side and a neural network application side.
The model training method provided in this embodiment of the present disclosure relates to data sequence processing, and may be specifically applied to methods such as data training, machine learning, and deep learning, to perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on training data (for example, N pieces of first information and N pieces of third information in the model training method provided in this embodiment of the present disclosure), and finally obtain a trained neural network (for example, a target model in the model training method provided in this embodiment of the present disclosure). In addition, in the item recommendation method provided in this embodiment of the present disclosure, input data (for example, the N pieces of first information and the N pieces of third information in the item recommendation method provided in this embodiment of the present disclosure) may be input into the foregoing trained neural network, to obtain output data (for example, the item recommendation result in the item recommendation method provided in this embodiment of the present disclosure). It should be noted that the model training method and the item recommendation method provided in embodiments of the present disclosure are based on a same idea, and may also be understood as two parts in a system, or two phases of an entire procedure, for example, a model training phase and a model application phase.
401: Obtain N pieces of first information, where an ith piece of first information indicates an ith first item and an ith behavior, the ith behavior is a behavior of a user for the ith item, N behaviors of the user correspond to M categories, i=1, . . . , N, N≥M, and M>1.
In this embodiment, when a user uses an application, to display an item of interest on a page of the application, some historical data of previously using the application by the user may be first acquired. The historical data may include attribute information (for example, a name of a first item, a price of the first item, a function of the first item, and a category of the first item) of N first items (which may also be referred to as historical items) that the user has operated on the application, and related information (for example, a type of a behavior) of the N behaviors performed by the user when the user operates the N first items. It should be noted that, based on the related information of the N behaviors, the N behaviors may be classified into M categories, and one category may include at least one behavior. For example, for shopping software, it is assumed that the user once taps three commodities, adds two commodities to favorites, and purchases two commodities on the software. It can be learned that the user has operated seven commodities, and therefore, the user correspondingly performs seven behaviors. The seven behaviors may be classified into three types. A first type includes three tap behaviors, a second type includes two add-to-favorites behaviors, and a third type includes two purchase behaviors. In this case, when historical data of the user for the software is acquired, the historical data includes attribute information of the seven commodities and related information of the seven behaviors.
In this case, the attribute information of the N first items may be separately mapped to latent space, to correspondingly obtain vector representations of the N first items, where a vector representation of the ith first item indicates the ith first item. Similarly, the related information of the N behaviors may be separately mapped to the latent space, to correspondingly obtain vector representations (that is, N pieces of third information) of the N behaviors, where a vector representation of the ith behavior (that is, an ith piece of third information) indicates a behavior of the user for the ith first item. For example, after the attribute information of the N first items is mapped, vector representations x=[x1, x2, . . . , xN] of the N first items may be obtained. A vector representation x1 of a 1st first item indicates the 1st first item, a vector representation x2 of a 2nd first item indicates the 2nd first item, . . . , and a vector representation xN of an Nth first item indicates the Nth first item. Similarly, after the related information of the N behaviors is mapped, vector representations b=[b1, b2, . . . , bN] of the N behaviors may be obtained. A vector representation b1 of a 1st behavior (that is, a 1st piece of third information) indicates a behavior of the user for the 1st first item, a vector representation b2 of a 2nd behavior (that is, a 2nd piece of third information) indicates a behavior of the user for the 2nd first item, . . . , and a vector representation bN of an Nth behavior (that is, an Nth piece of third information) indicates a behavior of the user for the Nth first item.
Based on this, the vector representation of the ith first item and the vector representation of the ith behavior may be spliced to obtain the ith piece of first information. The ith piece of first information indicates the ith first item and a behavior of the user for the ith first item. Similar splicing operations may also be performed on vector representations of remaining first items and vector representations of remaining behaviors, so that the N pieces of first information can be obtained. Still as in the foregoing example, the vector representation x1 of the 1st first item and the vector representation b1 of the 1st behavior may be spliced to obtain a 1st piece of first information h1, the vector representation of the 2nd first item and the vector representation of the 2nd behavior may be spliced to obtain a 2nd piece of first information h2, . . . , and the vector representation of the Nth first item and the vector representation of the Nth behavior are spliced to obtain an Nth piece of first information hN. In this way, the N pieces of first information H=[h1, h2, . . . , hN] are obtained.
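A minimal sketch of this construction is given below; the embedding tables and names such as item_emb and behavior_emb are illustrative assumptions, not from the original embodiment:

```python
import numpy as np

num_items, num_behavior_types, dim = 100, 3, 8
item_emb = np.random.randn(num_items, dim) * 0.1           # maps item attribute information to latent space
behavior_emb = np.random.randn(num_behavior_types, dim) * 0.1  # maps behavior information to latent space

item_ids = [12, 40, 7, 55, 55]   # N = 5 first items the user has operated
behavior_ids = [0, 1, 2, 0, 2]   # categories: 0 = tap, 1 = add-to-favorites, 2 = purchase

x = item_emb[item_ids]                # vector representations of the N first items
b = behavior_emb[behavior_ids]        # N pieces of third information (behavior representations)
H = np.concatenate([x, b], axis=-1)   # splice item and behavior vectors: N pieces of first information
# H[i] indicates the i-th first item and the behavior of the user for that item.
```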
After the N pieces of first information are obtained, the N pieces of first information may be input into a target model (a trained neural network model), to process the N pieces of first information by using the target model and obtain an item recommendation result.
402: Process the N pieces of first information by using the target model based on a multi-head self-attention mechanism, to obtain N pieces of second information.
The target model obtains the N pieces of first information, and the N pieces of first information may be processed based on the multi-head self-attention mechanism to obtain the N pieces of second information.
It should be noted that, as shown in the accompanying drawing, the target model may include a first module, a second module, and a third module.
Specifically, the first module of the target model may process the N pieces of first information based on the multi-head self-attention mechanism in the following manners, to obtain the N pieces of second information.
(1) After the N pieces of first information are received, for any one of the N pieces of first information, that is, the ith piece of first information, the first module may first perform linear processing on the ith piece of first information, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information. For the remaining first information other than the ith piece of first information, the first module may further perform an operation similar to that performed on the ith piece of first information. Therefore, a total of N pieces of Q information, N pieces of K information, and N pieces of V information can be obtained. To be specific, the first module may perform linear processing on the 1st piece of first information, to obtain a 1st piece of Q information, a 1st piece of K information, and a 1st piece of V information, may further perform linear processing on the 2nd piece of first information, to obtain a 2nd piece of Q information, a 2nd piece of K information, a 2nd piece of V information, . . . , and may further perform linear processing on the Nth piece of first information, to obtain an Nth piece of Q information, an Nth piece of K information, and an Nth piece of V information.
(2) For the ith piece of Q information, the first module may perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, and N pieces of weight information corresponding to the ith behavior, to obtain an ith piece of second information. A jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N. For the remaining Q information other than the ith piece of Q information, the first module may further perform an operation similar to that performed on the ith piece of Q information. Therefore, the N pieces of second information can be obtained. To be specific, the first module may first perform an operation on the 1st piece of Q information, the N pieces of K information, the N pieces of V information, and N pieces of weight information corresponding to a 1st behavior, to obtain a 1st piece of second information, the first module may further perform an operation on the 2nd piece of Q information, the N pieces of K information, the N pieces of V information, and N pieces of weight information corresponding to a 2nd behavior, to obtain a 2nd piece of second information, . . . , and the first module may further perform an operation on the Nth piece of Q information, the N pieces of K information, the N pieces of V information, and N pieces of weight information corresponding to an Nth behavior, to obtain an Nth piece of second information. A 1st piece of weight information corresponding to the 1st behavior is determined based on the 1st behavior, a 2nd piece of weight information corresponding to the 1st behavior is determined based on the 1st behavior and the 2nd behavior, . . . , an Nth piece of weight information of the 1st behavior is determined based on the 1st behavior and the Nth behavior, . . . , a 1st piece of weight information corresponding to the Nth behavior is determined based on the Nth behavior and the 1st behavior, a 2nd piece of weight information corresponding to the Nth behavior is determined based on the Nth behavior and the 2nd behavior, . . . , and an Nth piece of weight information of the Nth behavior is determined based on the Nth behavior.
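The following single-head sketch illustrates manners (1) and (2); the realization of the weight information as one matrix per ordered pair of behavior categories, and the scaling by the square root of the dimension, are assumptions for illustration:

```python
import numpy as np

N, d, M = 5, 16, 3
H = np.random.randn(N, d)                 # N pieces of first information
Wq, Wk, Wv = (np.random.randn(d, d) * 0.1 for _ in range(3))
Q, K, V = H @ Wq, H @ Wk, H @ Wv          # linear processing: N pieces of Q/K/V information

cat = np.array([0, 1, 2, 0, 2])           # category of each of the N behaviors
# Assumed realization of the weight information: one d x d matrix per ordered
# pair of behavior categories, so that the pair (i-th behavior, j-th behavior)
# determines the j-th piece of weight information corresponding to the i-th behavior.
W_pair = np.random.randn(M, M, d, d) * 0.01

scores = np.empty((N, N))
for i in range(N):
    for j in range(N):
        scores[i, j] = Q[i] @ W_pair[cat[i], cat[j]] @ K[j] / np.sqrt(d)

A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)         # row-wise normalization
second_info = A @ V                       # N pieces of second information (single head, no position term)
```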
Further, the second module of the target model may further cooperate with the first module of the target model to jointly obtain the N pieces of second information.
(1) After the N pieces of third information are received, for any one of the N pieces of third information, that is, the ith piece of third information, the second module may perform an operation on the ith piece of third information and the N pieces of third information, to obtain N pieces of fourth information corresponding to the ith behavior. A jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. For the remaining third information other than the ith piece of third information, the second module may also perform an operation similar to that performed on the ith piece of third information. Therefore, a total of N pieces of fourth information corresponding to the 1st behavior, N pieces of fourth information corresponding to the 2nd behavior, . . . , and N pieces of fourth information corresponding to the Nth behavior can be obtained. To be specific, the second module may perform an operation on the 1st piece of third information and the 1st piece of third information to obtain a 1st piece of fourth information (indicating a distance between 1st behaviors) corresponding to the 1st behavior, may further perform an operation on the 1st piece of third information and the 2nd piece of third information to obtain a 2nd piece of fourth information (indicating a distance between the 1st behavior and the 2nd behavior) corresponding to the 1st behavior, . . . , may further perform an operation on the 1st piece of third information and an Nth piece of third information to obtain an Nth piece of fourth information (indicating a distance between the 1st behavior and the Nth behavior) corresponding to the 1st behavior, . . . , may further perform an operation on the Nth piece of third information and the 1st piece of third information to obtain a 1st piece of fourth information (indicating a distance between the Nth behavior and the 1st behavior) corresponding to the Nth behavior, may further perform an operation on the Nth piece of third information and the 2nd piece of third information to obtain a 2nd piece of fourth information (indicating a distance between the Nth behavior and the 2nd behavior) corresponding to the Nth behavior, . . . , and may further perform an operation on the Nth piece of third information and the Nth piece of third information to obtain an Nth piece of fourth information (indicating a distance between Nth behaviors) corresponding to the Nth behavior.
(2) After the N pieces of fourth information corresponding to the 1st behavior, the N pieces of fourth information corresponding to the 2nd behavior, . . . , and the N pieces of fourth information corresponding to the Nth behavior from the second module are received, for the ith piece of Q information, the first module may perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior, to obtain the ith piece of second information. For the remaining Q information other than the ith piece of Q information, the first module may further perform an operation similar to that performed on the ith piece of Q information. Therefore, the N pieces of second information can be obtained. To be specific, the first module may first perform an operation on the 1st piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the 1st behavior, and the N pieces of fourth information corresponding to the 1st behavior to obtain the 1st piece of second information, the first module may further perform an operation on the 2nd piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the 2nd behavior, and the N pieces of fourth information corresponding to the 2nd behavior to obtain the 2nd piece of second information, . . . , and the first module may further perform an operation on the Nth piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the Nth behavior, and the N pieces of fourth information corresponding to the Nth behavior to obtain the Nth piece of second information.
Further, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior, for example, an interval between a time at which the user performs the ith behavior and a time at which the user performs the jth behavior.
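A sketch of this computation of the second module is given below, assuming the distance is the order interval, and assuming a learnable per-interval table (with clipping) that turns the raw interval into the fourth information:

```python
import numpy as np

N = 5
order = np.arange(N)                   # order in which the user performed the N behaviors
# j-th piece of fourth information corresponding to the i-th behavior: based on
# the interval between the order of the i-th behavior and that of the j-th behavior.
P = order[None, :] - order[:, None]    # P[i, j] = j - i

# Assumed parameterization: a learnable scalar per clipped interval produces
# the fourth information P1[i, j] later fused with the attention scores.
k = 4
w = np.random.randn(2 * k + 1) * 0.1
P1 = w[np.clip(P, -k, k) + k]
```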
Still as in the foregoing example, as shown in the accompanying drawing, the first module may include N groups of self-attention modules and a multilayer perceptron (MLP) module, where each group of self-attention modules includes N self-attention modules.
The following describes working processes of the N groups of self-attention modules and the MLP module. Because the working processes of the N groups of self-attention modules are similar, for ease of description, the following describes any one of the N groups of self-attention modules, that is, an ith group of self-attention modules.
For a jth self-attention module in the ith group of self-attention modules, the ith piece of first information hi and a jth piece of first information hj are input. The self-attention module may first perform linear processing on the ith piece of first information hi to obtain the ith piece of Q information qi, the ith piece of K information ki, and the ith piece of V information vi, and perform linear processing on the jth piece of first information hj to obtain a jth piece of Q information qj, a jth piece of K information kj, and a jth piece of V information vj.
Then, the module may multiply the ith piece of Q information qi, the jth piece of K information kj, and a jth first weight matrix W1(bi, bj) corresponding to the ith behavior, to obtain a jth piece of ninth information F[i, j] corresponding to the ith behavior, where the jth first weight matrix W1(bi, bj) is determined based on the ith behavior and the jth behavior.
At the same time, the module may further receive the jth piece of fourth information P1[i, j] corresponding to the ith behavior from the second module. The jth piece of fourth information P1[i, j] corresponding to the ith behavior may be obtained by the second module by performing an operation on the ith piece of third information bi and a jth piece of third information bj. In this operation, a term (j−i) may be determined based on the ith piece of third information bi and the jth piece of third information bj, and indicates an interval between an order of the ith behavior and an order of the jth behavior.
Then, the module may perform fusion (for example, addition) on the jth piece of ninth information F[i, j] corresponding to the ith behavior and the jth piece of fourth information P1[i, j] corresponding to the ith behavior, and then perform normalization (for example, softmax processing), to obtain a jth piece of tenth information A[i, j] corresponding to the ith behavior.
Subsequently, the module may multiply the jth piece of tenth information A[i, j] corresponding to the ith behavior, a jth second weight matrix W2(bi, bj) corresponding to the ith behavior, and the jth piece of V information vj, to obtain a jth piece of eleventh information R[i, j] corresponding to the ith behavior.
It should be noted that the jth first weight matrix W1(bi, bj) and the jth second weight matrix W2(bi, bj) are both determined based on the ith behavior and the jth behavior, that is, based on the categories of the two behaviors. In this way, mutual impact between behaviors of a same category and mutual impact between behaviors of different categories can both be captured by the weight matrices.
Similarly, the remaining self-attention modules other than the jth self-attention module in the ith group of self-attention modules may also perform an operation similar to that performed by the jth self-attention module. Therefore, the ith group of self-attention modules may obtain N pieces of eleventh information R[i]=[R[i, 1], R[i, 2], . . . , R[i, N]] corresponding to the ith behavior, and perform weighted summation on R[i, 1], R[i, 2], . . . , R[i, N], to obtain an ith piece of twelfth information gi.
Similarly, the remaining groups of self-attention modules other than the ith group of self-attention modules may also perform an operation similar to that performed by the ith group of self-attention modules. Therefore, the N groups of self-attention modules may obtain N pieces of twelfth information G=[g1, g2, . . . , gN] in total, that is, the 1st piece of twelfth information g1, the 2nd piece of twelfth information g2, . . . , and the Nth piece of twelfth information gN.
Finally, after the N pieces of twelfth information G=[g1, g2, . . . , gN] are received, the MLP module may process the N pieces of twelfth information G=[g1, g2, . . . , gN] with reference to the N pieces of third information b=[b1, b2, . . . , bN], to obtain N pieces of second information H′=[h′1, h′2, . . . , h′N]. To be specific, the 1st piece of third information b1 and the 1st piece of twelfth information g1 are processed (for example, through feature extraction and non-linear processing) to obtain the 1st piece of second information h′1, the 2nd piece of third information b2 and the 2nd piece of twelfth information g2 are processed to obtain the 2nd piece of second information h′2, . . . , and the Nth piece of third information bN and the Nth piece of twelfth information gN are processed to obtain the Nth piece of second information h′N.
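Putting the foregoing together, the working process of the ith group of self-attention modules and the MLP module may be sketched as follows; the shapes, the single head, the scaling, the per-interval position table, and the MLP parameterization are assumptions, while F, P1, A, R, and g correspond to the ninth, fourth, tenth, eleventh, and twelfth information described above:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

N, d, M, k = 5, 16, 3, 4
H = np.random.randn(N, d)                 # first information h_1..h_N
B = np.random.randn(N, d)                 # third information b_1..b_N
cat = np.array([0, 1, 2, 0, 2])           # behavior category of each of the N behaviors

Wq, Wk, Wv = (np.random.randn(d, d) * 0.1 for _ in range(3))
W1 = np.random.randn(M, M, d, d) * 0.01   # first weight matrices W1(b_i, b_j)
W2 = np.random.randn(M, M, d, d) * 0.01   # second weight matrices W2(b_i, b_j)
w_pos = np.random.randn(2 * k + 1) * 0.1  # assumed per-interval position table
W_mlp = np.random.randn(2 * d, d) * 0.1   # assumed MLP weights

Q, K, V = H @ Wq, H @ Wk, H @ Wv          # linear processing: Q/K/V information

H_out = np.empty_like(H)                  # second information h'_1..h'_N
for i in range(N):
    # Ninth information F[i, :]: q_i, W1(b_i, b_j), and k_j multiplied (scaling assumed).
    F = np.array([Q[i] @ W1[cat[i], cat[j]] @ K[j] for j in range(N)]) / np.sqrt(d)
    # Fourth information P1[i, :]: based on the order interval (j - i).
    P1 = np.array([w_pos[np.clip(j - i, -k, k) + k] for j in range(N)])
    A = softmax(F + P1)                   # tenth information: fusion (addition) + normalization
    # Eleventh information R[i, :]: A[i, j], W2(b_i, b_j), and v_j multiplied.
    R = np.array([A[j] * (W2[cat[i], cat[j]] @ V[j]) for j in range(N)])
    g = R.sum(axis=0)                     # weighted summation -> twelfth information g_i
    # MLP module: process g_i together with b_i to obtain the i-th piece of second information.
    H_out[i] = np.tanh(np.concatenate([g, B[i]]) @ W_mlp)
```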
403: Obtain an item recommendation result by using the target model based on the N pieces of second information, where the item recommendation result is used to determine, from K second items, a target item recommended to the user, and K≥1.
After the N pieces of second information are obtained, the target model may obtain the item recommendation result based on the N pieces of second information. The item recommendation result may be used to determine, from the K second items (which may also be understood as candidate items), the target item recommended to the user, and K≥1. Usually, the K second items include the N first items.
Specifically, the third module of the target model may obtain the item recommendation result in the following manners.
(1) After the N pieces of second information from the first module are received, a first expert network of the third module may perform feature extraction on the N pieces of second information to obtain fifth information. The fifth information includes an exclusive characteristic of each of the N behaviors. Therefore, the fifth information may be used to indicate a difference between the N behaviors. At the same time, a second expert network of the third module may perform feature extraction on the N pieces of second information to obtain sixth information. The sixth information includes a common characteristic of the N behaviors. Therefore, the sixth information may indicate a same point between the N behaviors.
(2) After the fifth information and the sixth information are obtained, the third module may perform weighted summation on the fifth information and the sixth information (a used weight may be determined by the third module based on the N pieces of third information), to obtain seventh information. The seventh information is a behavior representation of the user. Therefore, the seventh information may indicate interest distribution of the user.
(3) After the seventh information is obtained, the third module may further obtain K pieces of eighth information. A tth piece of eighth information indicates a tth second item, and t=1, . . . , K. For the tth piece of eighth information, the third module may calculate a matching degree between the seventh information and the tth piece of eighth information. For the eighth information other than the tth piece of eighth information, the third module may also perform an operation similar to that performed on the tth piece of eighth information. Therefore, matching degrees between the seventh information and the K pieces of eighth information can be obtained, and these matching degrees can be used as the final item recommendation result output by the target model.
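A sketch of manners (1) to (3) is given below, assuming two small feed-forward expert networks, a gate derived from the N pieces of third information, and dot products as the matching degrees:

```python
import numpy as np

def mlp(x, Wa, Wb):
    # Small two-layer expert network used for feature extraction.
    return np.maximum(0.0, x @ Wa) @ Wb

N, d, K = 5, 16, 10
H2 = np.random.randn(N, d)     # N pieces of second information
B = np.random.randn(N, d)      # N pieces of third information
items = np.random.randn(K, d)  # K pieces of eighth information (candidate second items)

Wa1, Wa2 = np.random.randn(N * d, d) * 0.1, np.random.randn(d, d) * 0.1
Wb1, Wb2 = np.random.randn(N * d, d) * 0.1, np.random.randn(d, d) * 0.1
Wg = np.random.randn(N * d, 2) * 0.1

h = H2.reshape(-1)
fifth = mlp(h, Wa1, Wa2)       # exclusive characteristics: differences between the N behaviors
sixth = mlp(h, Wb1, Wb2)       # common characteristics: same points between the N behaviors

gate = np.exp(B.reshape(-1) @ Wg)
gate /= gate.sum()             # weights determined based on the N pieces of third information
seventh = gate[0] * fifth + gate[1] * sixth   # behavior representation (interest distribution)

scores = items @ seventh       # matching degree between seventh and each eighth information
target = int(scores.argmax())  # second item with a high matching degree -> target item
```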
In this way, based on the item recommendation result, it may be determined that some second items with a high matching degree are target items recommended to the user.
In addition, the target model provided in this embodiment of the present disclosure may be further compared with a neural network model provided in a related technology, to compare performance of these models on different datasets. Comparison results are shown in Table 1.
It can be learned from Table 1 that, in terms of recommendation accuracy, the target model provided in embodiments of the present disclosure can obtain an optimal experimental result on both indicators, which proves effectiveness of the item recommendation manner provided in embodiments of the present disclosure.
In embodiments of the present disclosure, when the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
Further, in a process of processing the N pieces of first information based on the multi-head self-attention mechanism, the target model further considers impact caused by an interval (for example, an interval between times at which the user performs different behaviors) between orders of different behaviors. Factors that are considered are more comprehensive in comparison with a related technology. The item recommendation result output by the target model may also accurately fit a real intention of the user, thereby further improving accuracy of the item recommendation result.
The foregoing describes in detail the item recommendation method provided in embodiments of the present disclosure. The following describes a model training method provided in embodiments of the present disclosure.
701: Input N pieces of first information into a to-be-trained model to obtain a predicted item recommendation result, where the to-be-trained model is configured to: obtain the N pieces of first information, where an ith piece of first information indicates an ith first item and an ith behavior, the ith behavior is a behavior of a user for the ith item, N behaviors of the user correspond to M categories, i=1, . . . , N, N≥M, and M>1; process the N pieces of first information based on a multi-head self-attention mechanism, to obtain N pieces of second information; and obtain the predicted item recommendation result based on the N pieces of second information, where the predicted item recommendation result is used to determine, from K second items, a target item recommended to the user, and K≥1.
In this embodiment, when the to-be-trained model (that is, a neural network model that needs to be trained) needs to be trained, a batch of training data may be first obtained. The batch of training data includes the N pieces of first information. The ith piece of first information indicates the ith first item (that is, a historical item) and the ith behavior, the ith behavior is the behavior of the user for the ith item, the N behaviors of the user correspond to the M categories, i=1, . . . , N, N≥M, and M>1. It should be noted that real item recommendation results corresponding to the N pieces of first information are known. Therefore, based on the real item recommendation results, a real item recommended to the user may be determined from the K second items (that is, candidate items).
In this case, after the N pieces of first information are obtained, the N pieces of first information may be input into the to-be-trained model. In this way, after the N pieces of first information are received, the to-be-trained model may process the N pieces of first information based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Then, the to-be-trained model may obtain the predicted item recommendation result based on the N pieces of second information, and the predicted item recommendation result is used to determine, from the K second items, the target item (a predicted item) recommended to the user.
In a possible implementation, the to-be-trained model is configured to: perform linear processing on the ith piece of first information, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information; and perform an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior, to obtain an ith piece of second information, where a jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N.
In a possible implementation, the to-be-trained model is further configured to: obtain N pieces of third information, where an ith piece of third information indicates the ith behavior; and perform an operation on the ith piece of third information and the N pieces of third information, to obtain N pieces of fourth information corresponding to the ith behavior, where a jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. The to-be-trained model is configured to perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior, to obtain the ith piece of second information.
In a possible implementation, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior.
In a possible implementation, the to-be-trained model is configured to: perform feature extraction on the N pieces of second information to obtain fifth information and sixth information, where the fifth information indicates a difference between the N behaviors, and the sixth information indicates a same point between the N behaviors; fuse the fifth information and the sixth information to obtain seventh information, where the seventh information indicates interest distribution of the user; and calculate matching degrees between the seventh information and K pieces of eighth information, where the matching degree is used as the item recommendation result, a tth piece of eighth information indicates a tth second item, and t=1, . . . , K.
In a possible implementation, the K second items include the N first items.
It should be noted that for descriptions of step 701, refer to the related descriptions of step 401 to step 403 in the foregoing embodiment.
702: Obtain a target loss based on the predicted item recommendation result and a real item recommendation result, where the target loss indicates a difference between the predicted item recommendation result and the real item recommendation result.
After the predicted item recommendation result output by the to-be-trained model is obtained, because the real item recommendation result is known, a preset target loss function may be used to process the predicted item recommendation result and the real item recommendation result, to obtain the target loss. The target loss indicates the difference between the predicted item recommendation result and the real item recommendation result.
703: Update a parameter of the to-be-trained model based on the target loss until a model training condition is met, to obtain a target model.
After the target loss is obtained, the parameter of the to-be-trained model may be updated based on the target loss, and the to-be-trained model obtained after the parameter is updated continues to be trained by using a next batch of training data until the model training condition is met (for example, the target loss converges), to obtain the foregoing target model.
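Steps 701 to 703 may be sketched as the following loop; the binary labels, the cross-entropy target loss function, and the simple linear scorer standing in for the to-be-trained model are assumptions for demonstration:

```python
import numpy as np

def bce(pred, label):
    # Target loss: difference between predicted and real item recommendation results.
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return -(label * np.log(pred) + (1 - label) * np.log(1 - pred)).mean()

K, d = 10, 16
theta = np.random.randn(d) * 0.1      # stand-in for the parameters of the to-be-trained model
items = np.random.randn(K, d)         # K second items (candidate items)
label = np.zeros(K); label[3] = 1.0   # real item recommendation result: item 3 is the real item

lr, prev = 0.1, np.inf
for step in range(1000):
    pred = 1.0 / (1.0 + np.exp(-(items @ theta)))   # step 701: predicted recommendation result
    loss = bce(pred, label)                          # step 702: obtain the target loss
    if abs(prev - loss) < 1e-8:                      # model training condition: loss converges
        break
    prev = loss
    grad = items.T @ ((pred - label) / K)            # step 703: update the parameter
    theta -= lr * grad
```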
The target model obtained through training in this embodiment of the present disclosure has a function of recommending an item to the user. When the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
Further, in a process of processing the N pieces of first information based on the multi-head self-attention mechanism, the target model further considers impact caused by an interval (for example, an interval between times at which the user performs different behaviors) between orders of different behaviors. Factors that are considered are more comprehensive in comparison with a related technology. The item recommendation result output by the target model may also accurately fit a real intention of the user, thereby further improving accuracy of the item recommendation result.
Further, in a training process of the target model, the used training data, that is, the N pieces of first information, indicates the N behaviors that can be classified into the M categories. The N behaviors may include behaviors such as a tap behavior, an add-to-favorites behavior, a search behavior, and an add-to-cart behavior, and may further include a purchase behavior. Therefore, in this embodiment of the present disclosure, not only a function of training data indicating a main behavior in model training is considered, but also a function of training data indicating an auxiliary behavior in model training is considered, so that the target model obtained through training may have good performance.
The foregoing describes in detail the model training method provided in embodiments of the present disclosure. The following describes an item recommendation apparatus and a model training apparatus provided in embodiments of the present disclosure.
In embodiments of the present disclosure, when the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
In a possible implementation, the processing module 802 is configured to: perform linear processing on the ith piece of first information by using the target model, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information; and perform an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior by using the target model, to obtain an ith piece of second information, where a jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N.
In a possible implementation, the apparatus further includes: a third obtaining module configured to obtain N pieces of third information by using the target model, where an ith piece of third information indicates the ith behavior; and an operation module configured to perform an operation on the ith piece of third information and the N pieces of third information by using the target model to obtain N pieces of fourth information corresponding to the ith behavior, where a jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. The processing module 802 is further configured to perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior by using the target model, to obtain the ith piece of second information.
In a possible implementation, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior.
In a possible implementation, the second obtaining module 803 is configured to: perform feature extraction on the N pieces of second information by using the target model to obtain fifth information and sixth information, where the fifth information indicates a difference between the N behaviors, and the sixth information indicates a same point between the N behaviors; fuse the fifth information and the sixth information by using the target model to obtain seventh information, where the seventh information indicates interest distribution of the user; and calculate matching degrees between the seventh information and K pieces of eighth information by using the target model, where the matching degree is used as the item recommendation result, a tth piece of eighth information indicates a tth second item, and t=1, . . . , K.
In a possible implementation, the K second items include the N first items.
The target model obtained through training in this embodiment of the present disclosure has a function of recommending an item to the user. When the target item of interest needs to be recommended to the user, the N pieces of first information may be first input to the target model. The ith piece of first information indicates the ith first item and the ith behavior. The ith behavior is a behavior of the user for the ith item. The N behaviors of the user correspond to M categories. i=1, . . . , N, N≥M, and M>1. Then, the N pieces of first information may be processed by using the target model based on the multi-head self-attention mechanism, to obtain the N pieces of second information. Finally, the item recommendation result can be obtained by using the target model based on the N pieces of second information. The item recommendation result is used to determine, from the K second items, the target item recommended to the user, and K≥1. In the foregoing process, the N pieces of first information not only indicate N first items, but also indicate the N behaviors that can be classified into the M categories. Therefore, in a process in which the target model processes the N pieces of first information to correspondingly obtain the N pieces of second information, not only mutual impact of a plurality of behaviors belonging to a same category and mutual impact of a plurality of first items may be considered, but also mutual impact of a plurality of behaviors belonging to different categories may be considered. Factors that are considered are comprehensive. Therefore, the item recommendation result output by the target model based on the N pieces of second information can have high accuracy, thereby helping optimize user experience.
In a possible implementation, the to-be-trained model is configured to: perform linear processing on the ith piece of first information, to obtain an ith piece of Q information, an ith piece of K information, and an ith piece of V information; and perform an operation on the ith piece of Q information, N pieces of K information, N pieces of V information, and N pieces of weight information corresponding to the ith behavior, to obtain an ith piece of second information, where a jth piece of weight information corresponding to the ith behavior is determined based on the ith behavior and a jth behavior, and j=1, . . . , N.
In a possible implementation, the to-be-trained model is further configured to: obtain N pieces of third information, where an ith piece of third information indicates the ith behavior; and perform an operation on the ith piece of third information and the N pieces of third information, to obtain N pieces of fourth information corresponding to the ith behavior, where a jth piece of fourth information corresponding to the ith behavior indicates a distance between the ith behavior and the jth behavior. The to-be-trained model is configured to perform an operation on the ith piece of Q information, the N pieces of K information, the N pieces of V information, the N pieces of weight information corresponding to the ith behavior, and the N pieces of fourth information corresponding to the ith behavior, to obtain the ith piece of second information.
In a possible implementation, the distance between the ith behavior and the jth behavior includes an interval between an order of the ith behavior and an order of the jth behavior.

In a possible implementation, the to-be-trained model is configured to: perform feature extraction on the N pieces of second information to obtain fifth information and sixth information, where the fifth information indicates a difference between the N behaviors, and the sixth information indicates a same point between the N behaviors; fuse the fifth information and the sixth information to obtain seventh information, where the seventh information indicates interest distribution of the user; and calculate matching degrees between the seventh information and K pieces of eighth information, where the matching degree is used as the item recommendation result, a tth piece of eighth information indicates a tth second item, and t=1, . . . , K.
In a possible implementation, the K second items include the N first items.
It should be noted that content such as information exchange between the modules/units of the apparatuses and an execution process is based on the same concept as that of the method embodiments of the present disclosure, and produces the same technical effects as those of the method embodiments of the present disclosure. For specific content, refer to the foregoing descriptions in the method embodiments of the present disclosure.
An embodiment of the present disclosure further relates to an execution device.
The memory 1004 may include a read-only memory and a random access memory, and provide instructions and data for the processor 1003. A part of the memory 1004 may further include a non-volatile random-access memory (NVRAM). The memory 1004 stores a program and operation instructions, an executable module or a data structure, a subset thereof, or an extended set thereof. The operation instructions may include various operation instructions for implementing various operations.
The processor 1003 controls an operation of the execution device. During specific application, the components of the execution device are coupled together through a bus system. In addition to a data bus, the bus system may further include a power bus, a control bus, a status signal bus, and the like. However, for clear description, various types of buses in the figure are referred to as the bus system.
The method disclosed in embodiments of the present disclosure may be applied to the processor 1003, or may be implemented by the processor 1003. The processor 1003 may be an integrated circuit chip, and has a signal processing capability. In an implementation process, steps in the methods can be implemented by using a hardware integrated logic circuit in the processor 1003, or by using instructions in a form of software. The processor 1003 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller. The processor 1003 may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate, a transistor logic device, or a discrete hardware component. The processor 1003 may implement or perform the methods, the steps, and the logical block diagrams that are disclosed in embodiments of the present disclosure. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps in the methods disclosed with reference to embodiments of the present disclosure may be directly performed and completed by a hardware decoding processor, or may be performed and completed by using a combination of hardware in the decoding processor and a software module. The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1004, and the processor 1003 reads information in the memory 1004 and completes the steps in the foregoing methods in combination with hardware of the processor.
The receiver 1001 may be configured to receive input digital or character information, and generate a signal input related to setting and function control of the execution device. The transmitter 1002 may be configured to output the digital or character information through a first interface. The transmitter 1002 may be configured to send instructions to a disk group through the first interface, to modify data in the disk group. The transmitter 1002 may further include a display device such as a display.
In this embodiment of the present disclosure, in one case, the processor 1003 is configured to process the user-associated information by using the target model in the foregoing embodiment.
An embodiment of the present disclosure further relates to a training device.
The training device 1100 may further include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and one or more operating systems 1141, for example, Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
Specifically, the training device may perform the model training method in the foregoing embodiment.
An embodiment of the present disclosure further relates to a computer-readable storage medium. The computer-readable storage medium stores a program used for signal processing. When the program runs on a computer, the computer is enabled to perform the steps performed by the foregoing execution device, or the computer is enabled to perform the steps performed by the foregoing training device.
An embodiment of the present disclosure further relates to a computer program product. The computer program product stores instructions. When the instructions are executed by a computer, the computer is enabled to perform the steps performed by the foregoing execution device, or the computer is enabled to perform the steps performed by the foregoing training device.
The execution device, the training device, or the terminal device provided in embodiments of the present disclosure may be specifically a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor. The communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device performs the item recommendation method described in embodiments, or a chip in the training device performs the model training method described in embodiments. Optionally, the storage unit is a storage unit in the chip, for example, a register or a cache. Alternatively, the storage unit may be a storage unit in a wireless access device but outside the chip, for example, a read-only memory (ROM), another type of static storage device that can store static information and instructions, or a random-access memory (RAM).
Specifically, the chip may be a neural-network processing unit (NPU). A core part of the NPU is an operation circuit 1203, and a controller 1204 controls the operation circuit 1203 to extract matrix data from a memory and perform a multiplication operation.
In some implementations, the operation circuit 1203 includes a plurality of processing engines (PE) inside. In some implementations, the operation circuit 1203 is a two-dimensional systolic array. The operation circuit 1203 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1203 is a general-purpose matrix processor.
For example, it is assumed that there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches, from a weight memory 1202, data corresponding to the matrix B, and caches the data on each PE in the operation circuit. The operation circuit fetches data of the matrix A from an input memory 1201, performs a matrix operation with the matrix B, and stores an obtained partial result or an obtained final result of the matrix in an accumulator 1208.
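The role of the accumulator 1208 may be illustrated by a blocked matrix multiplication in which partial results are accumulated; this is a software analogy of the data flow, not the circuit itself:

```python
import numpy as np

def blocked_matmul(A, B, tile=4):
    # C = A @ B computed tile by tile: each pass over a tile of B produces a
    # partial result that is accumulated, as in the accumulator of the operation circuit.
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))                           # accumulator for partial results
    for t in range(0, k, tile):
        C += A[:, t:t + tile] @ B[t:t + tile, :]   # partial result of the matrix operation
    return C

A = np.random.randn(8, 8)
B = np.random.randn(8, 8)
assert np.allclose(blocked_matmul(A, B), A @ B)    # final result matches the full product
```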
A unified memory 1206 is configured to store input data and output data. Weight data is directly transferred to the weight memory 1202 by using a direct memory access controller (DMAC) 1205. The input data is also transferred to the unified memory 1206 by using the DMAC.
A bus interface unit (BIU) 1213 is configured to perform interaction between an advanced extensible interface (AXI) bus and the DMAC and between the AXI bus and an instruction fetch buffer (IFB) 1209.
The BIU 1213 is used by the instruction fetch buffer 1209 to obtain instructions from an external memory, and is further used by the direct memory access controller 1205 to obtain original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly configured to transfer input data in the external memory DDR to the unified memory 1206, transfer weight data to the weight memory 1202, or transfer input data to the input memory 1201.
A vector calculation unit 1207 includes a plurality of operation processing units. If required, further processing is performed on an output of the operation circuit 1203, for example, vector multiplication, vector addition, an exponential operation, a logarithmic operation, or a value comparison. The vector calculation unit 1207 is mainly configured to perform network computation at a non-convolutional/fully-connected layer of a neural network, for example, batch normalization, pixel-level summation, and upsampling of a predicted label plane.
In some implementations, the vector calculation unit 1207 can store a processed output vector in the unified memory 1206. For example, the vector calculation unit 1207 may apply a linear function or a non-linear function to the output of the operation circuit 1203. For example, linear interpolation is performed on a predicted label plane extracted at a convolutional layer. For another example, vectors whose values are accumulated are used to generate an activation value. In some implementations, the vector calculation unit 1207 generates a normalized value, a pixel-level summation value, or both a normalized value and a pixel-level summation value. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1203, for example, used at a subsequent layer in the neural network.
The instruction fetch buffer 1209 connected to the controller 1204 is configured to store instructions used by the controller 1204.
The unified memory 1206, the input memory 1201, the weight memory 1202, and the instruction fetch buffer 1209 are all on-chip memories. The external memory is private to a hardware architecture of the NPU.
Any one of the processors mentioned above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling program execution.
In addition, it should be noted that the described apparatus embodiment is merely an example. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all the modules may be selected according to actual needs to achieve the objectives of the solutions of embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in the present disclosure, a connection relationship between modules indicates that the modules have a communication connection with each other, and may be specifically implemented as one or more communication buses or signal cables.
Based on the description of the foregoing implementations, a person skilled in the art may clearly understand that the present disclosure may be implemented by software in addition to necessary universal hardware, or by dedicated hardware, including an ASIC, a dedicated CPU, a dedicated memory, a dedicated component, and the like. Usually, any function implemented by a computer program may be easily implemented by using corresponding hardware. In addition, specific hardware structures used to implement a same function may be various, for example, an analog circuit, a digital circuit, or a dedicated circuit. However, in the present disclosure, a software program implementation is a better implementation in most cases. Based on such an understanding, the technical solutions of the present disclosure may be implemented in a form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a Universal Serial Bus (USB) flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for instructing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods in embodiments of the present disclosure.
All or some of the foregoing embodiments may be implemented by using software, hardware, firmware, or any combination thereof. When software is used to implement embodiments, the foregoing embodiments may be implemented completely or partially in a form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the procedures or functions according to embodiments of the present disclosure are completely or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable apparatuses. The computer instructions may be stored in a computer-readable storage medium, or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, a computer, a training device, or a data center to another website, computer, training device, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium that can be stored by a computer, or a data storage device, for example, a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state drive (SSD)), or the like.
This is a continuation of International Patent Application No. PCT/CN2023/101248 filed on Jun. 20, 2023, which claims priority to Chinese Patent Application No. 202210705920.8 filed on Jun. 21, 2022. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Parent application: PCT/CN2023/101248, filed Jun. 2023 (WO). Child application: U.S. Application No. 18989318.