The present application claims priority to and incorporates by reference the entire contents of Japanese Patent Application No. 2016-133353 filed in Japan on Jul. 5, 2016.
The present invention relates to an information analysis apparatus, an information analysis method, and an information analysis program.
Conventionally, research has been conducted on techniques for displaying, on a shopping site on the Internet, goods or services that match a user's tastes and preferences as recommendations. In this regard, a technique is known for predicting CTR (Click Through Rate) by performing machine learning using advertisement click logs as learning data (for example, refer to JP 2014-174753 A).
In the conventional technique, because the goods or services to recommend are decided by using click log data, there have been cases where goods or services in which the user has little interest are recommended. As a result, it may be difficult to increase the user's willingness to purchase.
It is an object of the present invention to at least partially solve the problems in the conventional technology.
According to one aspect of an embodiment, an information analysis apparatus includes a weight assigning unit that assigns a weight to each of a plurality of items based on an action taken by a user who has viewed a sales content on which the plurality of items to be recommended are posted. The information analysis apparatus includes a selection unit that selects a plurality of pairs, each pair being two items that are selected from among the plurality of items posted on the sales content and associated with each other. The information analysis apparatus includes an evaluation unit that evaluates a characteristic based on characteristic information indicating a property of each of the two items selected as a pair by the selection unit and the weight assigned by the weight assigning unit to the two items.
The above and other objects, features, advantages and technical and industrial significance of this invention will be better understood by reading the following detailed description of presently preferred embodiments of the invention, when considered in connection with the accompanying drawings.
Hereinafter, an information analysis apparatus, an information analysis method, and a non-transitory computer readable storage medium having stored therein an information analysis program to which the present invention is applied will be described with reference to the drawings.
Overview
The information analysis apparatus is realized by one or more processors. The information analysis apparatus is a device that evaluates a characteristic indicating a property of a recommended item based on the action of a user who browses sales content on which a plurality of items to be recommended (hereinafter, recommended items) are displayed.
The sales content includes a website (sales site) displayed by a UA (User Agent) such as a web browser, an application screen displayed when the application program installed in the terminal device cooperates with the server, and the like. In the following description, it is assumed that the sales content is a sales site displayed by the web browser.
An item includes one or both of goods and services. An item may be displayed as an image or text (characters) in a part or all of the sales site, or may be displayed in a new pop-up window over the window displaying the sales site.
The characteristic includes a word included in an introduction text such as a title displayed when an item is posted on the sales site, attribute information such as a category previously assigned to items, and other information.
Evaluation of a characteristic is performed from the viewpoint of whether the action of the user who has browsed the sales site has been guided in a preferable direction (for example, toward a purchase) when the item recommended at the sales site has that characteristic. For example, the evaluation is performed by exhaustively selecting pairs of two recommended items (not every possible pair needs to be selected), analyzing the disparity in the user's actions between the items of each selected pair, and applying machine learning to the result. As a result, it is possible to generate information for recommending an item in which the user has a high interest. By applying this evaluation result to the criteria for adopting recommended items from the next time onward, it is possible to improve the sales performance of the sales site.
Overall Structure
Each device shown in
Each of the plurality of terminal devices 10-1 to 10-n is a terminal device used by a user. Hereinafter, in the case where each of the plurality of terminal devices 10-1 to 10-n is not distinguished, they will be described while being simply referred to as the terminal device 10. The terminal device 10 is, for example, a mobile phone such as a smartphone, a tablet terminal, a PDA (Personal Digital Assistant), or a personal computer. The user operates the terminal device 10 and accesses the website provided by the web server device 100.
For example, a UA such as a web browser is activated, and a predetermined operation is performed by the user, whereby the terminal device 10 transmits an HTTP (Hypertext Transfer Protocol) request to the web server device 100. Then, the terminal device 10 displays the web page on the display unit based on the HTTP response returned from the web server device 100. Data transmitted as an HTTP response includes, for example, text data described in a markup language such as HTML (Hyper Text Markup Language), a style sheet, still image data, moving image data, audio data, and the like.
The web server device 100 is, for example, a server device that provides a sales site such as a shopping site, an auction site, a flea market site or the like. The web server device 100 posts a recommended item to a sales site provided by itself. This recommended item may be limited to an item handled in the sales site provided by the web server device 100 itself or may include an item handled in a web site provided by another web server device.
The information analysis apparatus 200 evaluates the characteristic of the recommended item posted on the sales site by the web server device 100. Details will be described later.
Web Server Device
The respective configurations of the web server device 100 and the information analysis apparatus 200 will be described below.
The communication unit 110 includes, for example, a communication interface such as NIC (Network Interface Card). The communication unit 110 communicates with the terminal device 10 and the information analysis apparatus 200 via the network NW. For example, the communication unit 110 receives an HTTP request from the terminal device 10. Further, the communication unit 110 may receive information on the browsing history of the web browser from the terminal device 10.
The server side control unit 120 includes, for example, an HTTP processing unit 122, a recommendation processing unit 124, and a recommended item determination unit 126. These components are implemented, for example, by a processor such as a CPU (Central Processing Unit) executing a program stored in the server side storage unit 130. In addition, some or all of the components of the server side control unit 120 may be implemented by hardware (circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or may be realized by cooperation of software and hardware.
When the HTTP request is received by the communication unit 110, the HTTP processing unit 122 reads data for generating a web page stored in advance in the server side storage unit 130, and using the communication unit 110, the HTTP processing unit 122 transmits the data read out to the transmission source of the HTTP request as an HTTP response.
In order to post a recommended item on a web page requested as an HTTP request before the HTTP processing unit 122 transmits an HTTP response, the recommendation processing unit 124 edits data transmitted as an HTTP response. For example, the recommendation processing unit 124 stores still image data, moving image data, audio data, and the like related to the recommended item in the data transmitted as an HTTP response. Further, the recommendation processing unit 124 may write a description designating the placement position and the font size of an image, a description, or the like indicating the recommended item of the web page on the text data or the style sheet to be transmitted together with these data, and may newly generate text data or style sheet in which these descriptions are written.
The recommended item determination unit 126 performs collaborative filtering based on browsing item information 132, cart item information 134, and purchased item information 136 to be described later, and determines the recommended item for each session. Collaborative filtering is processing that extracts, from preference information of a large number of users (the information 132, 134, 136, etc. described above), preference information of other users whose preferences are similar to those of the user to whom the item is to be recommended, and estimates an item that matches the preference of that target user.
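As an illustration only, the collaborative filtering described above might be sketched as follows. The function names, the cosine similarity over sets of item IDs, and the merging of the information 132, 134, and 136 into one per-user item set are assumptions made for this sketch and are not specified by the embodiment.

```python
from math import sqrt


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two sets of item IDs."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(target: str, history: dict, top_n: int = 3) -> list:
    """Score items the target user has not touched yet by the
    similarity of the other users who have touched them."""
    seen = history[target]
    scores = {}
    for user, items in history.items():
        if user == target:
            continue
        sim = cosine(seen, items)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Descending score, then item ID for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:top_n]


# Hypothetical per-user item sets (browsed, in cart, or purchased).
history = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"D"},
}
print(recommend("u1", history))
```

In this toy data only the user u2 overlaps with u1, so the one item that u2 touched and u1 did not becomes the candidate.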
A session is a period of time from accessing a certain web page in the sales site to switching to another web page in the sales site or a web page in another website. In addition, the session may be a period from accessing a certain web page in the sales site to closing the web browser displaying the web page. In addition, the session may be a period from accessing a certain web page in the sales site until a predetermined time passes (timeout). The recommended item determination unit 126 may update the recommended item according to the change of the session.
Further, the recommended item determination unit 126 may determine the priority (rank) of items to be adopted as recommended items when performing the collaborative filtering process, and may determine the items to be finally adopted as recommended items after adding a probabilistic element such as a random number.
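One possible reading of combining a rank with a probabilistic element is sketched below. The scoring rule (a rank-based score plus a scaled random term) and the names `pick_with_noise` and `noise` are hypothetical; the embodiment does not state how the random number is applied.

```python
import random


def pick_with_noise(ranked_items, k, noise=0.5, rng=None):
    """Perturb rank-based scores with a random term and keep the top k.

    ranked_items is ordered from highest to lowest priority.
    noise=0.0 reproduces the original ranking exactly.
    """
    rng = rng or random.Random()
    n = len(ranked_items)
    scored = []
    for rank, item in enumerate(ranked_items):
        base = (n - rank) / n  # 1.0 for the top-ranked item, decreasing
        scored.append((base + noise * rng.random(), item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:k]]


print(pick_with_noise(["a", "b", "c", "d"], k=2, noise=0.0))
```

A larger `noise` lets lower-ranked candidates occasionally displace higher-ranked ones, which is one way to gather user actions on items that would otherwise never be posted.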
In addition, when information on the browsing history of the web browser in the terminal device 10 is acquired by the communication unit 110, the recommended item determination unit 126 may determine the recommended item by performing the collaborative filtering while further taking the information into consideration.
Also, the recommended item determination unit 126 may determine the placement order of items to be posted as recommended items based on the evaluation result by the information analysis apparatus 200. For example, when there is a limit on the number of recommended items that can be posted in the same sales site during one session, under this limitation, an item to be preferentially posted as a recommended item is selected from candidates of items indicated by recommended item candidate information 138 to be described later.
Also, the server side control unit 120 transmits, using the communication unit 110, information on items to be posted on the sales site as browsing item information 132, cart item information 134, purchased item information 136 to be described later, and recommended items, to the information analysis apparatus 200.
The server side storage unit 130 is realized by, for example, an HDD (Hard Disk Drive), a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), a ROM (Read Only Memory), a RAM (Random Access Memory), or a hybrid storage device combining a plurality of these. The server side storage unit 130 stores various programs such as firmware and application programs, information received by the communication unit 110, and the like. In addition, the server side storage unit 130 stores the browsing item information 132, the cart item information 134, the purchased item information 136, and the recommended item candidate information 138.
The browsing item information 132 is information in which an item ID for identifying an item selected at the sales site is associated with each user ID for identifying a user. For example, the user ID may be a login ID of the sales site or a session ID managed by the web browser. The session ID is, for example, identification information that is written in a cookie stored in a header of an HTTP response and is passed from the web server device 100 that manages the sales site to the web browser of the terminal device 10. This cookie may include information indicating the presence or absence of browsing of the item (for example, information on the browsing history of the web browser). The web browser of the terminal device 10 stores the cookie including the received session ID in the HTTP request, and transmits the HTTP request to the web server device 100. The HTTP processing unit 122 compares the session ID included in the HTTP request with the session ID that was included in the HTTP response, thereby identifying whether the session is the same session by the same user. As a result, the item ID of the selected item is associated with the user ID.
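The session ID exchange described above can be sketched with the Python standard library. The cookie name `session_id`, the in-memory dictionaries, and the function names are assumptions for illustration; they stand in for whatever the HTTP processing unit 122 and the browsing item information 132 actually use.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}        # session ID -> user ID (hypothetical in-memory store)
browsing_items = {}  # user ID -> item IDs (stands in for information 132)


def issue_session(user_id: str) -> str:
    """Create a session ID and return it as a cookie string
    of the kind carried in an HTTP response header."""
    sid = secrets.token_hex(16)
    sessions[sid] = user_id
    cookie = SimpleCookie()
    cookie["session_id"] = sid
    return cookie["session_id"].OutputString()


def record_selection(cookie_header: str, item_id: str) -> bool:
    """Match the session ID sent back in a request against the issued
    ones and, on a match, associate the item ID with the user ID."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None or morsel.value not in sessions:
        return False
    browsing_items.setdefault(sessions[morsel.value], []).append(item_id)
    return True


header = issue_session("user-1")
record_selection(header, "item-9")
print(browsing_items)
```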
The cart item information 134 is information in which the item ID of an item to be purchased in a cart is associated with the user ID. The purchased item information 136 is information in which the item ID of the already purchased item is associated with the user ID. For example, the user ID in this case is the login ID of the sales site.
The recommended item candidate information 138 is information indicating a plurality of items that are candidates for recommended items. In the case where an item handled at the sales site provided by the web server device 100 is to be a recommended item, the web server device 100 may extract a plurality of items that are candidates for the recommended item from a part or all of the items handled at the sales site provided by the web server device 100. Furthermore, in the case where an item sold at a website provided by another server device is to be a recommended item, the web server device 100 may extract a plurality of items that are candidates for the recommended item from a part or all of the items handled at the other website.
Information Analysis Apparatus
The communication unit 210 includes, for example, a communication interface such as a NIC. The communication unit 210 communicates with the web server device 100 via the network NW. For example, the communication unit 210 receives, from the web server device 100, the above-described browsing item information 132, cart item information 134, and purchased item information 136, as well as information on the recommended items posted on the sales site (information corresponding to the recommended item information 232).
The control unit 220 includes, for example, a per-conversion label assigning unit 222, a pairwise learning unit 224, and an evaluation unit 226. These constituent elements are realized, for example, by a processor such as a CPU executing a program stored in the storage unit 230. In addition, some or all of the components of the control unit 220 may be realized by hardware (circuitry) such as LSI, ASIC, FPGA, etc., or may be realized by cooperation of software and hardware.
The storage unit 230 is realized by, for example, an HDD, a flash memory, an EEPROM, a ROM (Read Only Memory), a RAM, or a hybrid type storage device combining a plurality of these. The storage unit 230 stores various programs such as firmware and application program, information received by the communication unit 210, and the like. In addition, the storage unit 230 stores recommended item information 232, item-by-item label information 234, and learning model information 236.
The per-conversion label assigning unit 222 determines whether various conversions are established based on the action of the user who has viewed the sales site during one session. A conversion means that a user who has selected the recommended item takes an action expected by a client who has requested the publication of the recommended item (for example, a site administrator or a store manager who earns revenue from the sales site). This action includes, for example, purchasing the recommended item after selecting it, purchasing an item different from the recommended item after selecting the recommended item at the sales site on which the recommended item is posted (that is, purchasing some item different from the recommended item in the same sales site), and simply selecting the recommended item without purchasing any item (including the recommended item) in the sales site. Here, selecting means an operation in which the user clicks or taps an area of the recommended item using the terminal device 10 and requests the web server device 100 to transmit a web page relating to the recommended item.
For example, when a user purchases a recommended item, the per-conversion label assigning unit 222 determines that a first conversion has been established. In addition, the per-conversion label assigning unit 222 determines that a second conversion has been established when the user purchases another item that is not the recommended item. In addition, the per-conversion label assigning unit 222 determines that a third conversion has been established when the user selects the recommended item and thereafter the session is switched without any item being purchased. Whether these conversions have been established is judged by referring to tracking information that can be included in a cookie (HTTP cookie) managed by each web browser for each terminal device 10, information in the Web Storage function, or the like.
Then, the per-conversion label assigning unit 222 assigns a label to the recommended item according to the presence or absence of a conversion and/or the type of the established conversion. A label is represented by a numerical value, for example, and is treated as a weight (coefficient) in the pairwise learning described below. The per-conversion label assigning unit 222 is an example of a “weight assigning unit”.
As the action of the user who has viewed the sales site is closer to the action expected by a client such as a site administrator, the per-conversion label assigning unit 222 assigns a label having a larger value to the recommended item. When a site administrator or the like expects improvement of profit by posting a recommended item, a label having the largest value is assigned to the action of purchasing the recommended item, and a label having a value smaller than the label value assigned to the action of purchasing the recommended item is assigned to the action of purchasing another item which is not a recommended item. A label having a value smaller than the label value assigned to the action of purchasing another item is assigned to the action of simply selecting the recommended item without purchasing any item, including the recommended item.
The pairwise learning unit 224 derives the relevance between the characteristics corresponding to each of the plurality of recommended items to which the label is assigned, by pairwise learning. The pairwise learning in this embodiment is executed as a supervised learning that classifies target data into binary by treating the differential vector of the pair of two feature vectors as an index. The pairwise learning unit 224 is an example of a “selection unit”.
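The idea of classifying difference vectors into two classes can be sketched as follows. This is a minimal pure-Python illustration, assuming a linear model trained by subgradient descent on the hinge loss as a simple stand-in for a ranking SVM; the function names, the hyperparameters, and the toy feature vectors are hypothetical and do not reproduce the embodiment's implementation.

```python
from itertools import permutations


def pairwise_transform(features, labels):
    """Return (difference vector, +1/-1) for every ordered pair of
    items whose labels differ: +1 when the first item has the larger
    label, -1 otherwise."""
    samples = []
    for i, j in permutations(range(len(features)), 2):
        if labels[i] == labels[j]:
            continue
        diff = [a - b for a, b in zip(features[i], features[j])]
        target = 1 if labels[i] > labels[j] else -1
        samples.append((diff, target))
    return samples


def train_linear(samples, epochs=100, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss.
    The returned weight vector defines a hyperplane w . x = 0."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, y in samples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for k in range(len(w)):
                grad = reg * w[k] - (y * x[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w


# Two hypothetical items described by one-hot title words.
features = [[1, 0], [0, 1]]
labels = [4, 0]  # the first item was purchased, the second was ignored
w = train_linear(pairwise_transform(features, labels))
print(w)
```

In this toy case the weight of the word attached to the purchased item ends up positive and the other negative, which is the kind of per-characteristic signal the later evaluation relies on.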
For example, in one session, the pairwise learning unit 224 selects two non-overlapping labels from the four labels associated with the conversion type, and pairs the two labels, in a combination of all labels. At this time, a pair in which the order of the two labels is exchanged with respect to the previously selected pair may be selected as a pair different from the previously selected pair. Thus, in the example of
The pairwise learning unit 224 derives a distance between a feature vector and a boundary line of the dimension represented by a hyperplane HP for each of the plurality of feature vectors, in the feature space where the difference between the two labels in pairs is a feature vector (difference vector). The hyperplane HP is a subspace of the feature space, and is, for example, a space having a diminished dimension by 1 from the dimension number of the feature space. As shown in
Note that the pairwise learning unit 224 may change the boundary line indicating the hyperplane HP by learning, by using machine learning such as Ranking SVM described above such that the magnitude relation of the distance between the point indicating each feature vector and the boundary line tends to be the same as the magnitude relationship of the value indicating the feature vector (the difference of the label value). For example, the pairwise learning unit 224 may change the boundary line indicating the hyperplane HP by changing the parameters of the kernel function (such as the Radial Basis Function kernel). An equation modeling the boundary line indicating the hyperplane HP derived by the machine learning is stored in the storage unit 230 as the learning model information 236.
The evaluation unit 226 evaluates the relevance between the characteristics of the recommended item based on the distance to the hyperplane HP for each feature vector derived in the feature space by the pairwise learning unit 224.
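For reference, the distance from a feature vector to a hyperplane can be computed as below in the linear case. The representation of the hyperplane HP as `w . x + b = 0` and the function name are assumptions for this sketch; with a non-linear kernel the distance would be measured in the kernel-induced space instead.

```python
from math import sqrt


def signed_distance(w, x, b=0.0):
    """Signed distance from point x to the hyperplane w . x + b = 0.
    The sign tells on which side of the hyperplane the point lies."""
    norm = sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be a non-zero vector")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


print(signed_distance([3.0, 4.0], [3.0, 4.0]))   # positive side
print(signed_distance([1.0, 0.0], [-2.0, 7.0]))  # negative side
```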
Hereinafter, in order to describe the evaluation method, attention is paid only to the feature vectors on the positive side; however, the negative side may also be evaluated in the same way as the positive side. In the following description, the characteristic of the recommended item corresponding to the label 4 is denoted f4, the characteristic of the recommended item corresponding to the label 3 is denoted f3, the characteristic of the recommended item corresponding to the label 2 is denoted f2, and the characteristic of the recommended item corresponding to the label 0 is denoted f0.
For example, when attention is paid to the feature vectors (x1-x2) and (x1-x3) in the above described
The evaluation unit 226 transmits the evaluation result described above, that is, the evaluation result of the degree of contribution for each characteristic with respect to the action leading to the conversion, for example, to the web server device 100 using the communication unit 210. For example, the evaluation result may be information arranged in descending order in ranking form from a highly evaluated characteristic. As a result, the recommended item determination unit 126 in the web server device 100 refers to the recommended item candidate information 138 and determines the order of priority when posting the item as the recommended item. For example, when there are a plurality of similar items of the same category as the recommended item candidates indicated by the recommended item candidate information 138, the recommended item determination unit 126 may compare the characteristics of respective items and sequentially determine the item in order from the item with the high evaluation value as the recommended item.
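A hedged sketch of how such a ranking-form evaluation result could be consumed on the web server side is shown below. The rule of scoring an item by the sum of the evaluation values of its characteristics, and all names and values, are assumptions for illustration only.

```python
def rank_characteristics(scores: dict) -> list:
    """Characteristics in descending order of evaluation value."""
    return sorted(scores, key=lambda c: (-scores[c], c))


def order_candidates(candidates: dict, scores: dict, limit: int) -> list:
    """Order candidate items by the summed evaluation values of their
    characteristics and keep at most `limit` of them.

    candidates maps an item ID to its characteristics (e.g. title words).
    """
    def item_score(item_id):
        return sum(scores.get(c, 0.0) for c in candidates[item_id])

    ordered = sorted(candidates, key=lambda i: (-item_score(i), i))
    return ordered[:limit]


# Hypothetical evaluation values per characteristic.
scores = {"sale": 0.9, "new": 0.4, "used": -0.2}
candidates = {"i1": ["used"], "i2": ["sale", "new"], "i3": ["new"]}
print(rank_characteristics(scores))
print(order_candidates(candidates, scores, limit=2))
```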
In addition, the evaluation unit 226 may transmit the evaluation result to a computer operated by a site administrator or a store manager of the sales site using the communication unit 210, and may output the evaluation result to a display device (not shown) of the information analysis apparatus 200 or the like. As a result, for example, the site administrator or the like can change the word to be added to the title of the item to be handled to a word with a higher evaluation (more easily purchased).
Processing Flow
Next, the per-conversion label assigning unit 222 compares the browsing item information 132 and the recommended item information 232 and determines whether or not the recommended item is selected for each session (S102). If no recommended item is selected, the per-conversion label assigning unit 222 assigns the label 0 to the item ID of the recommended item (S104).
On the other hand, if the recommended item is selected, the per-conversion label assigning unit 222 determines whether or not the recommended item is purchased (S106). If the recommended item is purchased, the per-conversion label assigning unit 222 assigns the label 4 to the item ID of the recommended item (S108).
On the other hand, if the recommended item is not purchased, the per-conversion label assigning unit 222 determines whether or not another item that is not the recommended item is purchased (S110). If another item is purchased, the per-conversion label assigning unit 222 assigns the label 3 to the item ID of the recommended item (S112).
On the other hand, if another item is not purchased, the per-conversion label assigning unit 222 assigns the label 2 to the item ID of the recommended item (S114).
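The branching of steps S102 to S114 can be written compactly as follows. The function name and the three boolean inputs are hypothetical; the label values 0, 2, 3, and 4 are the ones given in the flow above.

```python
def assign_label(selected: bool,
                 purchased_recommended: bool,
                 purchased_other: bool) -> int:
    """Label for one recommended item in one session."""
    if not selected:
        return 0  # S104: the recommended item was not selected
    if purchased_recommended:
        return 4  # S108: the recommended item itself was purchased
    if purchased_other:
        return 3  # S112: another item was purchased
    return 2      # S114: selected, but nothing was purchased


print(assign_label(True, False, True))
```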
Next, the pairwise learning unit 224 generates a total of twelve pairs by solving the permutation problem of 4P2 by using the four types of labels assigned for each recommended item by the per-conversion label assigning unit 222 (S116).
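The count of twelve follows from 4P2 = 4 × 3 = 12 ordered pairs, which can be checked with a short sketch. The variable names are illustrative only.

```python
from itertools import permutations

LABELS = (4, 3, 2, 0)  # the four label values used in the flow

# Ordered pairs: (4, 3) and (3, 4) count as different pairs.
pairs = list(permutations(LABELS, 2))
differences = [a - b for a, b in pairs]

print(len(pairs))
print(pairs[:3], differences[:3])
```

Because order matters, each unordered pair contributes one positive and one negative difference.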
Next, in the feature space with a difference between the labels of the 12 pairs as the feature vector, the pairwise learning unit 224 derives a distance between the feature vector and the boundary line of the dimension represented by the hyperplane HP for each of the plurality of feature vectors using Ranking SVM (S118).
Next, the evaluation unit 226 evaluates the relevance between the characteristics of the recommended item based on the distance to the hyperplane HP for each feature vector derived in the feature space by the pairwise learning unit 224 (S120).
Next, the evaluation unit 226 outputs the evaluation result to an external device or the like (S122). As a result, the processing of this flowchart ends.
Validation Example
The applicant of the present application conducted the following experiment and verified the evaluation method proposed in this embodiment.
The technique using the CTR, which is the conventional technique, is a method of performing machine learning using the determination result as to whether or not the third conversion is established among the conversions in the present embodiment. Only the vector of difference between the label 2 and the label 0 is taken as a feature vector. Also, the TF-IDF method performs evaluation based on two indexes of a word appearance frequency TF (Term Frequency) obtained by dividing the number of occurrences of a word of interest appearing in one document by the sum of appearance frequencies of all words appearing in one document, and an inverse document frequency IDF (Inverse Document Frequency) obtained by dividing the total number of documents in the data by the number of documents containing the target word.
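The two TF-IDF indexes described above can be sketched as follows. The IDF here is the plain ratio given in the text; many implementations take the logarithm of that ratio, which is offered as an option. The function names and the toy documents are hypothetical.

```python
from math import log


def tf(word, doc):
    """Occurrences of `word` divided by the number of words in `doc`."""
    return doc.count(word) / len(doc) if doc else 0.0


def idf(word, docs, use_log=False):
    """Total number of documents divided by the number of documents
    containing `word` (optionally log-scaled, a common variant)."""
    containing = sum(1 for d in docs if word in d)
    if containing == 0:
        return 0.0
    ratio = len(docs) / containing
    return log(ratio) if use_log else ratio


def tf_idf(word, doc, docs, use_log=False):
    return tf(word, doc) * idf(word, docs, use_log)


# Hypothetical item titles split into words.
docs = [["red", "shoes"], ["blue", "shoes"], ["red", "hat", "red"]]
print(tf_idf("shoes", docs[0], docs))
print(tf_idf("hat", docs[2], docs))
```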
The evaluation indices used for verification as KPIs (Key Performance Indicators) are, for example, macro-auc (%), MRR (Mean Reciprocal Rank) (%), and a plurality of NDCGs (Normalized Discounted Cumulated Gain) (%) with different maximum ranking depths. The macro-auc is an index represented by the area under the ROC (Receiver Operating Characteristic) curve showing the correlation between the correct data and the error data. The correct data and the error data may be acquired by classifying the test data into binary according to the boundary line of the hyperplane HP derived from the training data. For example, macro-auc is 100% if the test data can be completely classified into the correct data and the error data, and is 50% if the test data is randomly classified. MRR is an evaluation index obtained by calculating, for each ranking, the reciprocal of the rank at which correct data first appears counted from the first data (RR (Reciprocal Rank)), and averaging these reciprocals. For example, MRR becomes 0 if no correct data appears. NDCG is an index indicating the correctness of the ranking proposed by machine learning, and its value is normalized so that a perfectly correct ranking yields 100%. The larger the value of NDCG, the better the evaluation. In the present embodiment, NDCG@1, which evaluates the correctness of the highest ranking, NDCG@3, which evaluates the correctness of the top three rankings, and NDCG@5, which evaluates the correctness of the top five rankings, are used to perform evaluation. As shown in
In addition, the applicant of the present application verified real-time evaluation by transmitting the training data at any time from the web server device 100 to the information analysis apparatus 200 by a live test format.
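For readers unfamiliar with the ranking indices used in this validation example, the commonly used definitions of MRR and NDCG at a cutoff k can be sketched as follows. These are the standard textbook forms (with a base-2 logarithmic discount for NDCG) and are assumed here; the exact variants used in the experiment are not stated.

```python
from math import log2


def reciprocal_rank(relevance):
    """relevance: 0/1 flags in ranked order; 0.0 if nothing is correct."""
    for pos, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / pos
    return 0.0


def mrr(rankings):
    """Mean of the reciprocal ranks over several rankings."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def dcg_at_k(gains, k):
    """Discounted cumulated gain over the top k positions."""
    return sum(g / log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


print(mrr([[0, 1, 0], [1, 0, 0]]))
print(ndcg_at_k([4, 3, 2, 0], 3))
```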
Based on the above evaluation results, it can be evaluated that, with this method, recommended items in which the user is more interested than with the conventional method are posted on the sales site. That is, it can be evaluated that the user's willingness to purchase is increased.
According to the above-described embodiment, based on the action taken by the user who has viewed the sales content on which a plurality of recommended items are posted, by assigning a weight to each of a plurality of recommended items, selecting a plurality of pairs associating two items from a plurality of recommended items, and evaluating the characteristic based on characteristic information indicating the property of each of the two items selected as a pair and the weight assigned to the two items, it is possible to generate information for recommending an item with high interest of the user.
It is to be noted that although the above-described terminal device 10 has been described as providing the sales site by the web browser as the sales content, the present invention is not limited to this. For example, an application screen corresponding to the sales site may be provided by a previously installed application program. In this case, the web server device 100 may be an application server cooperating with the application program installed in the terminal device 10.
Further, the evaluation unit 226 in the information analysis apparatus 200 described above may determine a feature vector to be evaluated from a plurality of feature vectors in the feature space according to an attribute of the user. The attribute may be, for example, sex, age, or occupation, but is not limited thereto. For example, the evaluation unit 226 extracts from the feature space only the feature vectors labeled based on the actions (conversions) taken by users matching an attribute such as men under 30 years old, and evaluates the relevance between the characteristics obtained from these extracted feature vectors. In this way, it is possible to post on the sales site a recommended item that can attract the interest of a particular group of users.
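A minimal sketch of this attribute-based extraction is given below, assuming each feature vector is stored together with the attributes of the user whose action produced its label. The record layout and the names are hypothetical.

```python
def select_vectors(samples, matches):
    """Keep only the samples whose user attributes satisfy `matches`."""
    return [s for s in samples if matches(s["user"])]


# Hypothetical labeled difference vectors with user attributes attached.
samples = [
    {"user": {"sex": "male", "age": 24}, "diff": [1, -1], "target": 1},
    {"user": {"sex": "female", "age": 41}, "diff": [-1, 1], "target": -1},
    {"user": {"sex": "male", "age": 35}, "diff": [0, 1], "target": 1},
]

men_under_30 = select_vectors(
    samples, lambda u: u["sex"] == "male" and u["age"] < 30
)
print(men_under_30)
```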
In addition, one or both of the recommendation processing unit 124 and the recommended item determination unit 126 in the above-described web server device 100 may be included in the control unit 220 of the information analysis apparatus 200.
Further, some or all of the functions of the pairwise learning unit 224 and the evaluation unit 226 in the information analysis apparatus 200 may be provided by other analysis apparatuses.
Hardware Configuration
The web server device 100 and the information analysis apparatus 200 of the embodiment described above are realized by a hardware configuration as shown in
The web server device 100 has a structure in which a NIC 100-1, a CPU 100-2, a RAM 100-3, a ROM 100-4, a secondary storage device 100-5 such as a flash memory or an HDD, and a drive device 100-6 are mutually connected by an internal bus or a dedicated communication line. A portable storage medium such as an optical disk is mounted on the drive device 100-6. The advertisement moving image management program stored in the secondary storage device 100-5 or in the portable storage medium mounted on the drive device 100-6 is loaded into the RAM 100-3 by a DMA controller (not shown) or the like, and executed by the CPU 100-2, thereby realizing the server side control unit 120. The program referred to by the server side control unit 120 may be downloaded from another device via the network NW.
The information analysis apparatus 200 has a structure in which a NIC 200-1, a CPU 200-2, a RAM 200-3, a ROM 200-4, a secondary storage device 200-5 such as a flash memory or an HDD, and a drive device 200-6 are mutually connected by an internal bus or a dedicated communication line. A portable storage medium such as an optical disk is mounted on the drive device 200-6. The advertisement moving image management program stored in the secondary storage device 200-5 or in the portable storage medium mounted on the drive device 200-6 is loaded into the RAM 200-3 by a DMA controller (not shown) or the like, and executed by the CPU 200-2, thereby realizing the control unit 220. The program referred to by the control unit 220 may be downloaded from another device via the network NW.
According to an aspect of the present invention, it is possible to generate information for recommending an item in which the user has a high interest.
Although the invention has been described with respect to specific embodiments for a complete and clear disclosure, the appended claims are not to be thus limited but are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art that fairly fall within the basic teaching herein set forth.
Number | Date | Country | Kind |
---|---|---|---|
2016-133353 | Jul 2016 | JP | national |