The present disclosure relates to the technical field of Internet applications, and more particularly, to information pushing.
In the field of Internet information pushing, in order to improve the accuracy of information pushing, an information pushing platform typically uses a machine learning model to select the information to be pushed.
In the related art, when information is to be pushed, the information pushing platform inputs an information feature of each piece of information that may be pushed into a trained probability estimation model to obtain an estimated probability (for example, an estimated conversion rate) of a specified event occurring after the information is pushed and displayed, and then determines the information to be pushed this time according to the estimated conversion rate of each piece of information.
However, in an information pushing scenario, the estimated conversion rate determined in this way may deviate from the actual conversion rate, thereby affecting the accuracy of information pushing.
Embodiments of the disclosure provide an information pushing method and apparatus, a computer device, and a storage medium, which may improve the accuracy of information pushing. The technical solution is as follows:
In accordance with certain embodiments of the present disclosure, an information pushing method performed by at least one processor is provided. The method includes extracting an information feature of candidate information, the information feature comprising a coarse-grained feature and a fine-grained feature, a number of tail value samples of the coarse-grained feature being greater than a number of tail value samples of the fine-grained feature; obtaining a first feature of the candidate information based on an intermediate feature, the intermediate feature being obtained in a process of extracting the coarse-grained feature; obtaining a second feature of the candidate information based on the information feature and the intermediate feature; obtaining target information from a plurality of pieces of candidate information, based on the first feature and the second feature; and pushing the target information.
In accordance with other embodiments of the present disclosure, an information pushing apparatus is provided, and includes at least one memory configured to store program code and at least one processor configured to read the program code and operate as instructed by the program code. The program code includes information feature extraction code, configured to cause the at least one processor to extract an information feature of candidate information, the information feature comprising a coarse-grained feature and a fine-grained feature, a number of tail value samples of the coarse-grained feature being greater than a number of tail value samples of the fine-grained feature; first feature obtaining code, configured to cause the at least one processor to obtain a first feature of the candidate information based on an intermediate feature, the intermediate feature being obtained in a process of extracting the coarse-grained feature; second feature obtaining code, configured to cause the at least one processor to obtain a second feature of the candidate information based on the information feature and the intermediate feature; information obtaining code, configured to cause the at least one processor to obtain target information from a plurality of pieces of the candidate information based on the first feature and the second feature; and information pushing code, configured to cause the at least one processor to push the target information.
In accordance with still other embodiments of the present disclosure, a non-transitory computer-readable storage medium storing at least one computer instruction is provided. The at least one computer instruction is executable by at least one processor to cause the at least one processor to extract an information feature of candidate information, the information feature comprising a coarse-grained feature and a fine-grained feature, a number of tail value samples of the coarse-grained feature being greater than a number of tail value samples of the fine-grained feature; obtain a first feature of the candidate information based on an intermediate feature, the intermediate feature being obtained in a process of extracting the coarse-grained feature; obtain a second feature of the candidate information based on the information feature and the intermediate feature; obtain target information from a plurality of pieces of candidate information, based on the first feature and the second feature; and push the target information.
It is to be understood that, the foregoing general descriptions and the following detailed descriptions are merely for illustration and explanation purposes and are not intended to limit the disclosure.
The above and other aspects and features of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
Exemplary embodiments are described in detail herein, and examples of the exemplary embodiments are shown in the accompanying drawings. When the following description involves the accompanying drawings, unless otherwise indicated, the same numerals in different accompanying drawings represent the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure. On the contrary, the implementations are merely examples of apparatuses and methods that are recited in detail in the appended claims and that are consistent with some aspects of the disclosure.
Before describing the various embodiments shown in the disclosure, several concepts involved in the disclosure are first introduced.
1) Artificial Intelligence (AI)
AI is a theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive an environment, acquire knowledge, and use knowledge to obtain an optimal result. In other words, AI is a comprehensive technology in computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that may react in a manner similar to human intelligence. AI seeks to study the design principles and implementation methods of various intelligent machines, so as to enable the machines to have the functions of perception, reasoning, and decision-making. AI technology is a comprehensive discipline and relates to a wide range of fields, including both hardware-level technologies and software-level technologies. Basic AI technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and electromechanical integration. AI software technologies mainly include several major directions such as computer vision (CV), speech processing, natural language processing, and machine learning/deep learning.
2) Machine Learning (ML)
ML is a multi-field interdiscipline, and relates to a plurality of disciplines such as the probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. ML specializes in studying how a computer simulates or implements a human learning behavior to obtain new knowledge or skills, and reorganize an existing knowledge structure, so as to keep improving its performance. ML is the core of AI, is a basic way to make the computer intelligent, and is applied to various fields of AI. ML and deep learning generally include technologies such as an artificial neural network, a belief network, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
3) Big Data
“Big Data” refers to a data set that cannot be captured, managed, and processed by conventional software tools within a certain time range; it is a massive, high-growth, and diversified information asset that requires new processing modes to provide stronger decision-making power, insight and discovery ability, and process optimization ability. With the advent of the cloud era, big data has attracted more and more attention. Big data requires special techniques to effectively process the large amounts of data collected over a long period of time. Technologies suitable for big data processing include large-scale parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and extensible storage systems.
The user terminal 120 may be a mobile phone, a tablet computer, an e-book reader, a Moving Picture Experts Group Audio Layer III (MP3) player, a Moving Picture Experts Group Audio Layer IV (MP4) player, a smart wearable device, a laptop computer, a desktop computer, and the like.
The user terminal 120 is connected to the server 140 through a communication network. Optionally, the communication network is a wired network or a wireless network.
The server 140 may be an independent physical server, may also be a server cluster or a distributed system composed of multiple physical servers, and may further be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Optionally, the server 140 may include a server configured to implement an information delivery platform 142. Optionally, the server 140 may further include a server configured to implement an information pushing platform 144.
Optionally, the information delivery platform 142 has functions of pushing and maintaining an information delivery interface, and receiving information delivered by an information delivery person.
The information above is information that may be displayed in many different applications at the same or similar times, such as an advertisement. As used herein, the term “advertisement” may include a non-economic advertisement and an economic advertisement. The term “non-economic advertisement” refers to an advertisement not for the purpose of profit, also known as an effect advertisement, such as various announcements, notices, and statements of government administrative departments, social institutions, and even individuals. The term “economic advertisement”, also known as a “commercial advertisement”, refers to an advertisement for the purpose of profit.
Optionally, the information pushing platform 144 has functions of managing and maintaining messages and pushing information to user terminals.
It should be noted that, the servers for implementing the information delivery platform 142 and the information pushing platform 144 may be servers independent from each other, and may also be implemented in a same physical server.
Optionally, the system may further include a management device (not shown in the drawing), which is connected with the server 140 through the communication network. Optionally, the communication network is a wired network or a wireless network.
Optionally, the wireless network or the wired network uses a standard communications technology and/or protocol. The network is usually the Internet, but may be any other network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, a dedicated network or a virtual dedicated network, or any combination thereof. In some embodiments, data exchanged by using a network may be represented by using a technology and/or format such as a Hyper Text Mark-up Language (HTML) and an Extensible Markup Language (XML). In addition, all or some links may be encrypted by using conventional encryption technologies such as a Secure Socket Layer (SSL), a Transport Layer Security (TLS), a Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In some other embodiments, customized and/or dedicated data communication technologies may also be used to replace or supplement the foregoing data communication technologies.
Operation 201. Extract an information feature of candidate information, the information feature including a coarse-grained feature and a fine-grained feature; where the number of tail value samples of the coarse-grained feature is greater than that of the fine-grained feature.
As used herein, a “tail value” of a feature refers to a feature value corresponding to one or more categories arranged at the tail position of a queue, where the queue is obtained by classifying each piece of sample information according to each value of a certain feature and sorting the categories in descending order of the number of pieces of information in each category. For example, a tail value may be a feature value that is arranged at the tail position of the queue and whose corresponding number of pieces of information is less than a quantity threshold. That is to say, the number of tail value samples above is the number of pieces of sample information falling in the categories at the tail position of the queue.
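As an illustration of this definition, the number of tail value samples can be computed from per-value sample counts. The following Python sketch is not from the disclosure; the data and the choice of taking the single smallest category as the tail are illustrative assumptions:

```python
from collections import Counter

def tail_value_sample_count(feature_values, tail_k=1):
    """Classify samples by feature value, sort the resulting categories by
    sample count in descending order, and count the samples falling in the
    last tail_k categories of the queue (the "tail values")."""
    counts = sorted(Counter(feature_values).values(), reverse=True)
    return sum(counts[-tail_k:])

# A coarse-grained feature (few values, each with many samples): even the
# least common product type still has a fairly large number of samples.
product_type = ["sports"] * 500 + ["news"] * 300 + ["games"] * 40
# A fine-grained feature (many values, each with few samples): the least
# common advertisement ID has almost no samples.
ad_id = [f"ad_{i}" for i in range(10) for _ in range(i + 1)]

print(tail_value_sample_count(product_type))  # 40
print(tail_value_sample_count(ad_id))         # 1
```

This matches the distinction drawn above: a coarse-grained feature retains many more samples at its tail values than a fine-grained feature does.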
For example, in the accompanying drawings, the sample number histogram 31 corresponds to the advertisement ID, the sample number histogram 32 corresponds to the advertiser ID, and the sample number histogram 33 corresponds to the product type.
Herein, the above three features, i.e., the advertisement ID, the advertiser ID, and the product type, are presented as examples to introduce and explain the division into coarse-grained and fine-grained features. This is for convenience and clarity purposes; the principles disclosed herein are generally applicable to other features.
The coarse-grained and fine-grained features may be manually divided by developers according to the number of tail samples of each feature, or the coarse-grained and fine-grained features may also be automatically divided by a computer device based on statistical results of the number of tail samples of each feature according to division rules set by developers, which is not limited in the embodiments of the disclosure.
In an embodiment of the disclosure, when there is an information displaying opportunity, the computer device may obtain each piece of information satisfying the information displaying opportunity as a group of candidate information and extract the information features of these pieces of candidate information, where each information feature is divided into the coarse-grained feature and the fine-grained feature.
Operation 202. Obtain a first feature of candidate information based on the coarse-grained feature; where the first feature is obtained based on an intermediate feature; and the intermediate feature is obtained in a process of extracting the coarse-grained feature.
In an embodiment of the disclosure, for the coarse-grained feature of each piece of candidate information, the computer device may perform further feature extraction. For example, the computer device first performs feature extraction on the coarse-grained feature to obtain the intermediate feature, and then further processes the intermediate feature corresponding to the coarse-grained feature to obtain the first feature.
Operation 203. Obtain a second feature of the candidate information based on the information feature and the intermediate feature.
In an embodiment of the disclosure, in order to extract a more accurate feature characterization, when extracting the second feature of the candidate information, the intermediate feature of the candidate information is shared in addition to using the information feature of the candidate information, so that multi-level feature characterization of the candidate information (an overall information level, a coarse-grained feature level, and a fine-grained feature level) may be learned.
Operation 204. Obtain target information from at least two pieces of candidate information based on the first feature and the second feature.
Operation 205. Push the target information.
To sum up, in various embodiments of the disclosure, the information feature is divided into the coarse-grained feature with a large number of tail value samples and the fine-grained feature with a small number of tail value samples; the first feature is extracted from the coarse-grained feature, and the second feature is extracted from the information feature including the coarse-grained feature and the fine-grained feature. When extracting the second feature, the intermediate feature between the coarse-grained feature and the first feature is combined, so that multi-level feature characterization is synchronously learned from the information feature. Therefore, the characterization effect of the extracted features on the candidate information at multiple granularities may be improved, the target information for pushing may be accurately obtained from the candidate information through the first feature and the second feature, and the accuracy of information pushing may be improved.
In an embodiment of the disclosure, the method shown in the figure may include the following operations.
Operation 401. Extract an information feature of candidate information.
Operation 401 may be equivalent to Operation 201 in the embodiment shown in the figure above.
Operation 402. Obtain a first feature of candidate information based on the coarse-grained feature.
In an embodiment of the disclosure, when the computer device extracts the first feature, it may first perform feature extraction on the coarse-grained feature to obtain multiple intermediate features, and then weight the multiple intermediate features to obtain the first feature.
For example, the process of obtaining the first feature of the candidate information based on the coarse-grained feature may include:
For each piece of candidate information, the computer device may perform the above processes respectively, that is, the first feature corresponding to each piece of candidate information may be obtained.
For example, in an embodiment of the disclosure, the m first intermediate features may be extracted from the coarse-grained feature by m preset expert networks; the computer device also obtains, based on the coarse-grained feature, the first weights respectively corresponding to the m first intermediate features, and then weights the m first intermediate features based on the first weights to obtain the first feature of each piece of candidate information.
By determining the first weights of the first intermediate features, when obtaining the first feature of the candidate information, the importance degree of each first intermediate feature with respect to the first feature may be determined based on the first weights respectively corresponding to the m first intermediate features. This helps improve the accuracy of the first feature, so that feature characterization at the coarse-grained feature level is performed more accurately.
In one possible implementation, the process of obtaining the first feature of candidate information based on the coarse-grained feature may include: processing the coarse-grained feature through a first extraction branch in the probability estimation model to obtain the first feature.
The first extraction branch may include three parts: a feature extraction network, a weight obtaining network, and a weighting network.
In an exemplary solution, the feature extraction network may include m expert networks, which respectively process the input coarse-grained feature and respectively output one piece of expert information (i.e., one first intermediate feature).
In an exemplary solution, the weight obtaining network may be a gate network, and the gate network in the first extraction branch may process the input coarse-grained feature and output the weights respectively corresponding to the m expert networks (i.e., the first weights).
In an exemplary solution, the weighting network may include a weighting layer and a tower-shaped network. The weighting layer of the weighting network in the first extraction branch may perform weighted summation on the expert information output by the m expert networks based on the weights output by the gate network in the first extraction branch, and the tower-shaped network of the weighting network in the first extraction branch may extract features from the weighted summation result of the weighting layer by means of knowledge distillation to obtain the first feature output by the first extraction branch.
In an embodiment of the disclosure, the first extraction branch may also be called a grouping layer. The grouping layer exists to learn a generalized characterization of each information group, which contains common knowledge transmitted among all information in the group.
In an embodiment of the disclosure, the expert network may be composed of a single-layer neural network, and a Rectified Linear Unit (ReLU) is adopted as an activation function. For example, the output of the expert network at the grouping layer may be represented as:
t_g^k = ReLU(W_1^k x_g)

where x_g is the input feature of the grouping layer, and W_1^k represents the coefficient matrix by which the k-th expert network maps the input feature from the initial embedding space to a new space.
In order to self-adaptively fuse the expert networks, a gate network is introduced in the framework shown in the figure, and its output may be represented as:

w_g = Softmax(W_2 x_g)

where W_2 is the coefficient matrix of the gate network, and m is the number of expert networks in the grouping layer.
In the first extraction branch 51 shown in the figure, the characterization vector of the grouping layer may be obtained as:

e_g = h_g(Σ_{k=1}^{m} w_g[k] · t_g^k)

where h_g stands for the tower-shaped network of the grouping layer, and w_g[k] denotes the k-th element of w_g.
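The first extraction branch described above can be sketched as a small mixture-of-experts module. The following PyTorch code is a minimal sketch, not the disclosed implementation: the dimensions d_in and d_hid and the exact tower shape are assumptions, since the text does not fix them. The module returns the intermediate expert outputs t_g^k so that they can later be shared with the second extraction branch:

```python
import torch
import torch.nn as nn

class GroupingLayer(nn.Module):
    """First extraction branch (grouping layer): m single-layer ReLU experts,
    a Softmax gate, a weighted sum, and a tower-shaped network h_g."""
    def __init__(self, d_in: int, d_hid: int, m: int):
        super().__init__()
        # Each expert computes t_g^k = ReLU(W_1^k x_g).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU()) for _ in range(m)]
        )
        # Gate: w_g = Softmax(W_2 x_g).
        self.gate = nn.Linear(d_in, m, bias=False)
        # Tower-shaped network h_g (assumed two-layer funnel shape).
        self.tower = nn.Sequential(nn.Linear(d_hid, d_hid // 2), nn.ReLU())

    def forward(self, x_g):
        t_g = torch.stack([expert(x_g) for expert in self.experts], dim=1)  # (B, m, d_hid)
        w_g = torch.softmax(self.gate(x_g), dim=-1)                         # (B, m)
        e_g = self.tower((w_g.unsqueeze(-1) * t_g).sum(dim=1))              # first feature
        return e_g, t_g  # t_g is the intermediate feature shared downstream
```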
Operation 403. Obtain a second feature of the candidate information based on the information feature and the intermediate feature.
An embodiment of the disclosure adopts an asymmetric feature sharing mode to extract features, where the asymmetric feature sharing mode refers to sharing, when extracting the second feature, the intermediate feature obtained in the process of extracting the first feature.
In one possible implementation, the process of obtaining the second feature of the candidate information based on the information feature and the intermediate feature may be as follows:
For each piece of candidate information to be processed, the computer device may perform the above processes respectively, that is, the second feature corresponding to each piece of candidate information may be obtained.
For example, in an embodiment of the disclosure, the n second intermediate features may be obtained by n preset expert networks respectively performing extraction on the coarse-grained feature and the fine-grained feature, and the computer device also obtains, based on the coarse-grained feature and the fine-grained feature, the second weights respectively corresponding to the n second intermediate features. In addition, the computer device also obtains, based on the coarse-grained feature and the fine-grained feature, the second weights respectively corresponding to the m first intermediate features; then, based on these second weights, the m first intermediate features and the n second intermediate features are weighted to obtain the second feature of each piece of candidate information.
By determining the second weights of the first intermediate features and the second intermediate features with respect to the information feature, when obtaining the second feature of the candidate information, the importance degree of each first intermediate feature and each second intermediate feature with respect to the second feature, and their influence when determining the second feature, may be determined based on the second weights respectively corresponding to the m first intermediate features and the n second intermediate features. This helps improve the accuracy of the second feature, so that the overall information level, the coarse-grained feature level, and the fine-grained feature level may be characterized more accurately.
In one possible implementation, the process of obtaining the second weight of n second intermediate features and the second weight of m first intermediate features based on information feature may include:
In an embodiment of the disclosure, in order to learn the features of candidate information more accurately to improve the accuracy of subsequent information pushing, the popularity of each candidate information may also be considered when obtaining the second weight.
In one possible implementation, the process of obtaining the second weight of n second intermediate features and the second weight of m first intermediate features based on the information feature and popularity vector of candidate information may include:
In an embodiment of the disclosure, the computer device may splice the fine-grained feature, the coarse-grained feature, and the popularity vector of the candidate information, and then process the spliced feature to obtain the second weights. Through feature splicing, the information carried by the popularity vector may be better integrated into the fine-grained feature and the coarse-grained feature, so as to effectively determine accurate second weights according to the popularity feature.
In one possible implementation, the process of obtaining the second feature of the candidate information based on the information feature and the intermediate feature may include:
In an exemplary solution, the feature extraction network in the second extraction branch may include n expert networks, which respectively process the input information feature (the coarse-grained feature plus the fine-grained feature) and respectively output one piece of expert information (i.e., one second intermediate feature).
In an exemplary solution, the weight obtaining network in the second extraction branch may be a gate network, and the gate network in the second extraction branch may process the input information feature and output the weights respectively corresponding to the n expert networks in the second extraction branch and the m expert networks in the first extraction branch (i.e., the second weights).
In an exemplary solution, the weighting network may include a weighting layer and a tower-shaped network. The weighting layer in the second extraction branch may perform weighted summation on the expert information output by the m+n expert networks based on the weights output by the gate network in the second extraction branch, and the tower-shaped network of the weighting network in the second extraction branch may extract features from the weighted summation result of the weighting layer by means of knowledge distillation to obtain the second feature output by the second extraction branch.
In an implementation of the disclosure, the second extraction branch shown in the figure may also be called an information layer, and the output of its k-th expert network may be represented as:

t_a^k = ReLU(W_3^k x_a)

where x_a is the input feature of the information layer, and W_3^k is the transformation matrix of the k-th expert network.
In the information layer shown in
In addition, in an embodiment of the disclosure, information with rich positive samples is distinguished from new information with few positive samples through the historical conversion count of information. In order to let the model learn the differences in the popularity of information, the characterization of popularity is explicitly defined and constructed in the gate network of the information layer.
For example, in an embodiment of the disclosure, popularity is first divided into buckets according to its numerical range, and a characterization is learned for each bucket. Considering the oligopoly effect of popularity, the numerical range of a bucket expands as popularity increases.
For example, the computer device may divide the numerical range of popularity into r numerical intervals arranged end to end. For a certain piece of candidate information, the historical conversion count of the candidate information (which may be the total conversion count or the conversion count in a recent time period) is obtained, the numerical interval in which the historical conversion count falls (assumed to be the s-th interval) is determined, and a popularity vector with dimension r is generated, in which the s-th element is 1 and the other elements are 0.
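The bucketing may be sketched as follows. The interval edges below are an illustrative assumption; they widen exponentially to reflect the oligopoly effect described above:

```python
import numpy as np

# End-to-end intervals: [0,1), [1,2), [2,4), [4,8), ..., [128, inf); the
# interval width grows with popularity (illustrative edges, r = 9 buckets).
BUCKET_EDGES = [1, 2, 4, 8, 16, 32, 64, 128]

def popularity_vector(conversion_count: int) -> np.ndarray:
    """Return the r-dimensional one-hot popularity vector: the s-th element
    is 1, where s is the interval holding the historical conversion count."""
    r = len(BUCKET_EDGES) + 1
    s = int(np.searchsorted(BUCKET_EDGES, conversion_count, side="right"))
    e_popu = np.zeros(r, dtype=np.float32)
    e_popu[s] = 1.0
    return e_popu

print(popularity_vector(10))  # one-hot at the [8, 16) interval
```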
The characterization of popularity is spliced with the other input features and, after transformation, used as the output of the gate network of the information layer, so that the output of the gate network of the information layer may be represented as:

w_a = Softmax(W_4(x_g ⊕ x_a ⊕ e_popu))

where e_popu represents the popularity vector, ⊕ is the splicing operation, and W_4 is the parameter matrix of the gate network. Based on this lightweight design, the popularity of information may affect the characterization fusion more conveniently and directly.
For example, the characterization vector of the information layer may be obtained by the following formula:

e_a = h_a(Σ_{k=1}^{m} w_a[k] · t_g^k + Σ_{k=1}^{n} w_a[m+k] · t_a^k)

where w_a[k] denotes the k-th element of w_a, m and n are the numbers of expert networks in the grouping layer and the information layer, respectively, and h_a represents the tower-shaped network in the information layer.
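Continuing the earlier grouping-layer sketch, the second extraction branch may be written as below. The asymmetric sharing is the key point: the gate produces m + n weights covering both the shared grouping-layer experts t_g and the information layer's own n experts, and the popularity vector enters the gate input. Dimensions are again assumptions:

```python
import torch
import torch.nn as nn

class InformationLayer(nn.Module):
    """Second extraction branch (information layer): n experts over the full
    information feature x_a, a gate over x_g ⊕ x_a ⊕ e_popu producing m + n
    weights, and a tower-shaped network h_a."""
    def __init__(self, d_g: int, d_a: int, d_popu: int, d_hid: int, m: int, n: int):
        super().__init__()
        # Each expert computes t_a^k = ReLU(W_3^k x_a).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_a, d_hid), nn.ReLU()) for _ in range(n)]
        )
        # Gate: w_a = Softmax(W_4 (x_g ⊕ x_a ⊕ e_popu)).
        self.gate = nn.Linear(d_g + d_a + d_popu, m + n, bias=False)
        self.tower = nn.Sequential(nn.Linear(d_hid, d_hid // 2), nn.ReLU())

    def forward(self, x_g, x_a, e_popu, t_g):
        t_a = torch.stack([expert(x_a) for expert in self.experts], dim=1)  # (B, n, d_hid)
        t_all = torch.cat([t_g, t_a], dim=1)       # shared m experts + own n experts
        gate_in = torch.cat([x_g, x_a, e_popu], dim=-1)
        w_a = torch.softmax(self.gate(gate_in), dim=-1)                     # (B, m + n)
        e_a = self.tower((w_a.unsqueeze(-1) * t_all).sum(dim=1))            # second feature
        return e_a
```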
After obtaining the first feature and the second feature, the computer device may obtain target information from at least two pieces (that is, from a plurality of pieces) of candidate information based on the first feature and the second feature, and the process may further include the following operations.
Operation 404. Fuse the first feature and the second feature to obtain a fused feature of the candidate information.
In one possible implementation, the process of fusing the first feature and the second feature to obtain the fused feature of the candidate information may include:
In an embodiment of the disclosure, when the computer device fuses the first feature and the second feature of the candidate information, the second feature is first weighted and then fused with the first feature, where the third weight of the second feature is obtained from the information feature (the coarse-grained feature plus the fine-grained feature) of the candidate information. The third weight may accurately embody the importance degree of the second feature with respect to the information feature and the influence degree of the second feature when generating the fused feature, thereby effectively improving the accuracy of the fused feature.
In one possible implementation, the process of obtaining the third weight of the second feature based on the information feature may include:
In an embodiment of the disclosure, when calculating the third weight of the second feature of the candidate information, the influence of the popularity of the candidate information on the weight of the second feature may also be considered, so as to further improve the accuracy of the third weight during feature fusion.
In one possible implementation, the process of obtaining the third weight of the second feature based on the information feature and the popularity vector may include:
In an embodiment of the disclosure, when considering the influence of the popularity of the candidate information on the weight of the second feature, the popularity vector of the candidate information may be spliced with the information feature of the candidate information, so that the fusion degree of the popularity vector and the information feature may be improved by splicing, and the third weight is calculated based on the obtained spliced feature.
In one possible implementation, the process of fusing the first feature and the second feature based on the third weight of the second feature to obtain the fused feature may include:
After the second feature is weighted, when it is fused with the first feature, the weighted result of the second feature and the third weight may be added to the first feature to obtain the fused feature. This weighting better reflects the third weight's indication of the importance degree of the second feature and improves the accuracy of the fused feature.
In one possible implementation, the process of fusing the first feature and the second feature to obtain the fused feature of the candidate information may include:
In an embodiment of the disclosure, the process of fusing the first feature and the second feature may be called dynamic characterization fusion. Referring to the figure, the fusion may be represented as:

v_fuse = tanh(W_5(x_a ⊕ e_popu))

e = e_a + v_fuse ⊗ e_g

where e is the final characterization vector output by the model, e_a and e_g are the characterization vectors of the information layer and the grouping layer, respectively, W_5 is the coefficient matrix, ⊗ is the vector element product operation, v_fuse is the learned fusion weight vector (i.e., the third weight mentioned above), and v_fuse ⊗ e_g is the weighted feature.
The combination of the information layer characterization and the grouping layer characterization contains a large amount of effective information, so that the final characterization of information has a stronger generalization ability; therefore, it may alleviate the impact of the cold start issue in estimating the probability of an event occurring after information display.
In an embodiment of the disclosure, the third weight is explained using a weight vector as an example. Optionally, the third weight may also take other representation forms; for example, the third weight may be a single weight value.
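The dynamic characterization fusion above may be sketched in the same style as the earlier modules; the layer shapes are assumptions, and W_5's output dimension must match the characterization vectors:

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """e = e_a + v_fuse ⊗ e_g, where v_fuse = tanh(W_5 (x_a ⊕ e_popu)) is the
    learned fusion weight vector (the third weight)."""
    def __init__(self, d_a: int, d_popu: int, d_e: int):
        super().__init__()
        self.w5 = nn.Linear(d_a + d_popu, d_e, bias=False)

    def forward(self, e_a, e_g, x_a, e_popu):
        v_fuse = torch.tanh(self.w5(torch.cat([x_a, e_popu], dim=-1)))  # third weight
        return e_a + v_fuse * e_g  # element-wise product realizes ⊗
```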
Operation 405. Obtain the estimated event probability of the candidate information based on the fused feature; where the estimated event probability indicates the estimated probability of the specified event occurring after the corresponding information is displayed.
The specified event may be at least one of a conversion event, a click event or an exposure event for candidate information.
In an embodiment of the disclosure, the computer device may estimate the probability that pushing and displaying the candidate information results in effective pushing meeting the specified event (i.e., that an event such as a conversion, click, or exposure occurs after pushing). The estimated event probability is related to the specific type of the specified event; for example, the estimated event probability may be at least one of an estimated conversion rate, an estimated click rate, and an estimated exposure rate.
In one possible implementation, the process of obtaining the estimated event probability of candidate information based on the fused feature may include:
In an embodiment of the disclosure, the computer device may also train the probability estimation model before obtaining the candidate information.
In one possible implementation, the training process of the probability estimation model may be as follows:
The computer device may regularly collect a pushing situation of various information in the network within a certain period of time (for example, within 48 hours before the current moment), such as whether it is pushed, and whether click, exposure and conversion events occur after pushing, and construct the sample information and the labeling probability of the sample information based on the pushing situation of various information in the network.
In an embodiment of the disclosure, the probability estimation model may focus on learning an optimal characterization vector for each piece of information, and a multi-layer neural network may be adopted to learn the characterization vector of the user. Taking the estimated event probability being an estimated conversion rate as an example, the estimated conversion rate may be represented as:

ŷ_i = Sigmoid(e · e_u)

where e_u is the characterization vector output by the user side.
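In code, with e the fused characterization and e_u a user-side vector (the user-side network is not detailed above, so random stand-ins are used here), the estimate is the sigmoid of their dot product:

```python
import torch

B, d_e = 4, 32
e = torch.randn(B, d_e)    # fused information characterization (stand-in)
e_u = torch.randn(B, d_e)  # user-side characterization vector (stand-in)
y_hat = torch.sigmoid((e * e_u).sum(dim=-1))  # ŷ_i = Sigmoid(e · e_u); shape (B,)
```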
In an embodiment of the disclosure, a logarithmic loss may be used as the loss function; the logarithmic loss is a common loss function in conversion rate estimation. Because the positive samples in a real data set may be concentrated on a small amount of information with high popularity, in order to prevent the loss function from being influenced too much by these samples, in an embodiment of the disclosure, the loss function is optimized as follows:

L = -(1/N) Σ_{i=1}^{N} w_i · [y_i · log(ŷ_i) + (1 - y_i) · log(1 - ŷ_i)]

where y_i and ŷ_i represent an actual value of user conversion and an estimated value of the conversion rate, respectively, w_i is the weight value of training sample i, and N is the total number of training samples. The significance of introducing the weight into the loss function is that it may appropriately reduce the sensitivity of the loss to popular advertisements and further focus on new advertisements.
Optionally, the formula for calculating the weight of the training sample is:
K_i represents the popularity of training sample i; for example, K_i may be the historical conversion count of training sample i. In an embodiment of the disclosure, the weight difference between an advertisement with higher popularity and a new advertisement with lower popularity may reach two orders of magnitude, which would lead to unsatisfactory training results. Therefore, in an embodiment of the disclosure, K_i may be truncated; for example, the maximum value of K_i is set to 20.
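A sketch of the truncated, popularity-weighted logarithmic loss follows. Because the weight formula itself is elided above, the inverse-log weight used here is a hypothetical stand-in chosen only to down-weight popular samples, as described; the truncation of K_i at 20 does come from the text:

```python
import torch

def weighted_log_loss(y_true, y_pred, k_popularity, k_max=20.0):
    """L = -(1/N) Σ w_i [y_i log ŷ_i + (1 - y_i) log(1 - ŷ_i)], with K_i
    truncated at k_max; w_i = 1 / log(2 + K_i) is an assumed weighting."""
    k = torch.clamp(k_popularity, max=k_max)       # truncate K_i at 20
    w = 1.0 / torch.log(2.0 + k)                   # hypothetical weight formula
    p = torch.clamp(y_pred, 1e-7, 1.0 - 1e-7)      # numerical stability
    ll = y_true * torch.log(p) + (1.0 - y_true) * torch.log(1.0 - p)
    return -(w * ll).mean()
```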
Operation 406. Obtain target information from at least two pieces of candidate information based on the estimated event probability.
In an embodiment of the disclosure, the computer device may rank the at least two pieces (that is, the plurality of pieces) of candidate information in descending order of the estimated event probability, and select one or more top-ranked pieces of candidate information as the target information.
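For example, the ranking and selection in Operation 406 reduce to a top-k over the estimated probabilities (the scores and k below are illustrative):

```python
import torch

probs = torch.tensor([0.12, 0.48, 0.30, 0.05])  # estimated event probabilities
target_idx = torch.topk(probs, k=2).indices     # top-ranked candidate indices
print(target_idx)  # tensor([1, 2])
```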
Operation 407. Push the target information.
Various embodiments of the disclosure adopt a strategy of feature grouping and asymmetric sharing. Firstly, the input features are grouped, and the information layer is completely isolated from the grouping layer: the expert networks of the information layer only receive the fine-grained feature as input, and the gate network of the information layer only fuses the expert networks of the information layer. In the output part of the information layer and the grouping layer, numerical-value-based fusion is adopted, so that the final output of the whole system is the weighted sum of the information layer and the grouping layer. This variant is labeled "V1" in the figure.
In an embodiment of the disclosure, a further variant is labeled "V2" in the figure.
Various embodiments of the disclosure also consider popularity embedding characterization. The correlation between the features of the information layer and the grouping layer is very complex and is affected by the sample distribution. Accordingly, AutoFuse uses popularity embedding characterization to self-adaptively guide this fusion, resulting in the variant labeled "V3" in the figure.
Various embodiments of the disclosure also adopt a strategy of dynamic fusion and self-adaptive loss. Dynamic fusion self-adaptively combines the characterization outputs of the information layer and the grouping layer. The weighted sum method based on numerical values may reduce the order of magnitude of each vector, whereas AutoFuse adopts vector-based fusion, which gives different weight values to different dimensions of the input vectors. This method is more flexible and may introduce more nonlinearity; the resulting model is labeled "V4" in the figure.
To sum up, in various embodiments of the disclosure, the information feature is divided into the coarse-grained feature with a large number of tail value samples and the fine-grained feature with a small number of tail value samples; the first feature is extracted from the coarse-grained feature, and the second feature is extracted from the complete information feature. Moreover, when extracting the second feature, the intermediate feature between the coarse-grained feature and the first feature is combined, so that multi-level feature characterization may be learned synchronously from the information feature. Therefore, the characterization effect of the extracted features on the information may be improved, and the accuracy of information pushing may be improved when information is selected and pushed through the extracted first feature and second feature.
Various embodiments of the disclosure may be realized or executed in combination with a blockchain. For example, some or all of the operations in the various embodiments may be performed in a blockchain system; or, data used for the execution of each operation in the various embodiments or the generated data may be stored in the blockchain system; for example, the model input data such as the training samples used during model training and the candidate information in the model application process may be obtained from the blockchain system by the computer device; for another example, the parameters of the model obtained after the model training may be stored in the blockchain system.
In one possible implementation, the first feature obtaining module 1002 is configured to,
In one possible implementation, the second feature obtaining module 1003 is configured to,
In one possible implementation, the second feature obtaining module 1003 is configured to obtain the second weight of the n second intermediate features and the second weight of the m first intermediate features based on the information feature and the popularity vector of the candidate information; where the popularity vector is used for indicating historical conversion times of the candidate information.
In one possible implementation, the second feature obtaining module 1003 is configured to,
In a possible implementation, the information obtaining module 1004 is configured to:
In a possible implementation, the information obtaining module 1004 is configured to:
In a possible implementation, the information obtaining module 1004 is configured to:
In a possible implementation, the information obtaining module 1004 is configured to:
In a possible implementation, the information obtaining module 1004 is configured to:
In one possible implementation, the first feature obtaining module 1002 is configured to process the coarse-grained feature through a first extraction branch in a probability estimation model to obtain the first feature;
In a possible implementation, the apparatus further includes:
The apparatus further includes:
To sum up, in various embodiments of the disclosure, the information feature is divided into the coarse-grained feature with a large number of tail value samples and the fine-grained feature with a small number of tail value samples; the first feature is extracted from the coarse-grained feature, and the second feature is extracted from the complete information feature. Moreover, when extracting the second feature, the intermediate feature between the coarse-grained feature and the first feature is combined, so that multi-level feature characterization may be learned synchronously from the information feature. Therefore, the characterization effect of the extracted features on the information may be improved, and the accuracy of information pushing may be improved when information is selected and pushed through the extracted first feature and second feature.
The mass storage device 1107 is connected to the CPU 1101 by using a mass storage controller (not shown) connected to the system bus 1105. The mass storage device 1107 and a computer-readable medium associated therewith provide non-volatile storage for the computer device 1100. That is to say, the mass storage device 1107 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
In general, the computer-readable medium may include a computer storage medium and a communications medium. The computer storage medium includes volatile and non-volatile media, and removable and non-removable media implemented by using any method or technology used for storing information such as computer-readable instructions, data structures, program modules, or other data. The computer storage medium includes RAM, ROM, flash memory or other solid-state storage technologies, CD-ROM, or other optical storage, magnetic tape cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices. A person skilled in the art will understand that the computer storage medium is not limited to the foregoing several types. The system memory 1104 and the mass storage device 1107 may be collectively described as a memory.
The computer device 1100 may be connected to the Internet or other network devices through a network interface unit 1111 connected to the system bus 1105.
The memory further includes one or more programs. The one or more programs are stored in the memory. The CPU 1101 implements all or some of the operations of the methods shown in the foregoing embodiments by executing the one or more programs.
In an exemplary embodiment, a non-transitory computer-readable storage medium including an instruction is further provided, for example, a memory including a computer program (an instruction), and the foregoing program (instruction) may be executed by a processor of the computer device to complete the methods shown in various embodiments of the disclosure. For example, the non-transitory computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or computer program including computer instructions stored in a computer-readable storage medium is also provided. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method shown in the above-mentioned various embodiments.
The technical solutions provided in the embodiments of the disclosure have beneficial effects including, but not limited to, the following: An information feature is divided into a coarse-grained feature with a large number of tail value samples and a fine-grained feature with a small number of tail value samples. A first feature is extracted from the coarse-grained feature, and a second feature is extracted from the information feature including the coarse-grained feature and the fine-grained feature. When extracting the second feature, the second feature is extracted by combining an intermediate feature between the coarse-grained feature and the first feature, and multi-level feature characterization is synchronously learned from the information feature. Therefore, a characterization effect of the extracted feature on candidate information at multiple granularities may be improved, target information for pushing may be accurately obtained from the candidate information through the first feature and the second feature, and the accuracy of information pushing may be improved.
After considering and practicing the present disclosure, a person skilled in the art may easily conceive of other implementations thereof. This disclosure is intended to cover any variations, uses, or adaptive changes thereof. These variations, uses, or adaptive changes follow the general principles of the disclosure and include common general knowledge or common technical means in the art, which are not disclosed herein. The disclosed embodiments are considered as merely exemplary, and the scope and spirit of the disclosure are pointed out in the following claims.
It should be understood that the disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from the scope of the disclosure, which is subject only to the appended claims.
Number | Date | Country | Kind
---|---|---|---
202110898411.7 | Aug 2021 | CN | national
This application is a bypass continuation application of International Patent Application No. PCT/CN2022/102583, filed on Jun. 30, 2022, which is based on and claims priority to Chinese Patent Application No. 202110898411.7, filed with the China National Intellectual Property Administration on Aug. 5, 2021, the disclosures of which are incorporated by reference herein in their entireties.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2022/102583 | Jun 2022 | US
Child | 18332398 | | US