Marketing on the World Wide Web (the Web) is a significant business. Users often purchase products through a company's Website. Further, advertising revenue can be generated in the form of payments to the host or owner of a Website when users click on advertisements that appear on the Website. The amount of revenue earned through Website advertising and product sales may depend on a Website's ability to attract clients and develop a loyal base of returning clients. Often, the ability to attract a client to a particular Website depends on the organization of the Website and whether the user is able to effectively navigate the Website to locate relevant information or products.
Certain exemplary embodiments are described in the following detailed description and in reference to the drawings, in which:
Exemplary embodiments of the present invention provide techniques for delivering personalized Web page content that more closely represents the interests of a client to a Web page. As used herein, the term “exemplary” merely denotes an example that may be useful for clarification of the present invention. The examples are not intended to limit the scope, as other techniques may be used while remaining within the scope of the present claims. The techniques disclosed herein can improve a Website experience by personalizing the appearance and content of the Website, which may lead to increased traffic and, thus, revenue for the Website. This personalizing of the Website may be particularly important when the Website first encounters a particular client identifier (user ID) for which prior Website use information is not available.
A user ID is a unique identifier used to identify a particular system used to access a Website, for example, an IP address, a client name, and the like. In the exemplary embodiments of the present invention, a relatively small number of questions are presented in sequence to the user ID, and the answers received to those questions are used to personalize the Website. The answer received to a question may be used to determine, based on a decision tree, the next question that is presented to the user ID. In this manner, the next question asked depends on the answers to all the previous questions. Based on an analysis of the received answers, specific Website content may be selected to be presented to the user ID.
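The question sequence described above can be sketched as a walk down a binary tree. This is only an illustrative sketch: the `Question` and `Leaf` types, the `classify` function, and the sample question wording are hypothetical and do not come from the specification; only the segment names are taken from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    segment: str  # usage segment reached at the end of a path


@dataclass
class Question:
    text: str
    if_yes: Union["Question", Leaf]
    if_no: Union["Question", Leaf]


def classify(node: Union[Question, Leaf],
             ask: Callable[[str], bool]) -> Tuple[str, List[Tuple[str, bool]]]:
    """Ask one question per tree level until a leaf is reached.

    `ask` is a hypothetical callback that presents a question to the
    user ID and returns True for yes, False for no.
    """
    asked = []
    while isinstance(node, Question):
        answer = ask(node.text)
        asked.append((node.text, answer))
        # The answer just received selects the next question.
        node = node.if_yes if answer else node.if_no
    return node.segment, asked


# Hypothetical two-level tree (the question wording is an assumption).
tree = Question(
    "Do you purchase airline tickets online?",
    if_yes=Leaf("Spenders"),
    if_no=Question(
        "Do you track stocks online?",
        if_yes=Leaf("Finance"),
        if_no=Leaf("Social Net Usage"),
    ),
)

# Canned answers stand in for a real user ID's responses.
canned = {
    "Do you purchase airline tickets online?": False,
    "Do you track stocks online?": True,
}
segment, asked = classify(tree, lambda q: canned[q])
```

Because each answer selects a branch, the question reached at any level is determined by all earlier answers, which is the dependency the paragraph above describes.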
A first task in accordance with embodiments of the present invention is to categorize possible Website clients, as represented by a user ID, into use segments. This may be achieved by identifying and statistically processing a source of information on computer usage by consumers to identify clusters as described below. One source of such computer usage information may be a computer usage survey such as may be provided by FORRESTER RESEARCH, INC. (400 Technology Square, Cambridge, MA 02139). However, other survey suppliers may also provide computer usage surveys. These surveys may typically include a hundred or more yes/no questions, answered by thousands of people, related to activities those surveyed perform on a home or other computer.
In an exemplary embodiment of the present invention, the identified computer usage information is statistically processed and cluster information is generated and used to provide a cluster type or a vocabulary of possible client interests for a user ID that is used to access one or more Websites. The resulting cluster information may provide groupings of words that pertain to the content of Websites. The groupings, referred to herein as “clusters,” may be used to characterize the content of individual Websites in terms of the interests of clients that visit those Websites. Each cluster can represent a unique cluster type and may be assigned a unique cluster-type descriptor. The resulting cluster information can also provide words that pertain to the use that the surveyed computer clients reported making of the Websites they visited.
A use-case refers to a particular market or markets that a Website's content is intended to address. As used herein, a Website may include one or more Web pages, each of which may have, or may be configured to have, different content. In addition, each Web page may also have sub Web pages.
Usage segment types corresponding to the interests of a particular client are determined initially by answers to questions provided by that client's user ID. These answers are utilized, upon accessing a selected Website, to make an initial determination of which usage segments and cluster types relate to content available from the selected Website. The Website may use the cluster types to customize the Website according to the interests indicated by the answers provided from the user ID. This is useful when a user ID is received for the first time by a Website and information relating to prior computer usage associated with that user ID may not be available to the Website.
An exemplary embodiment of the present invention enables a Website to provide relevant client interest information to a first time client while reducing the likelihood that extraneous or irrelevant information will be presented to the client. This may provide the Website client with a more favorable initial impression of the Website when prior information of the client's interest is not available to the Website.
The client system 102 can have other units operatively coupled to the processor 112 through the bus 113. These units can include tangible, machine-readable storage media, such as a storage system 122 for the long term storage of operating programs and data, including the programs and data used in exemplary embodiments of the present techniques. The storage system 122 may also store a database of cluster information and a client profile generated in accordance with exemplary embodiments of the present techniques. Further, the client system 102 can have one or more other types of non-transitory, computer readable storage media, such as a memory 124, for example, which may comprise read-only memory (ROM) and/or random access memory (RAM). In an exemplary embodiment, the client system 102 includes a network interface adapter 126, for connecting the client system 102 to a network, such as a local area network (LAN 128), a wide-area network (WAN), or another network configuration. The LAN 128 can include routers, switches, modems, or any other kind of interface device used for interconnection.
Through the LAN 128, the client system 102 can connect to a business server 130. The business server 130 can have a storage array 132 for storing enterprise data, buffering communications, and storing operating programs for the business server 130. The business server 130 can have associated printers 134, scanners, copiers and the like. The business server 130 can access the Internet 110 through a connected router/firewall 136, providing the client system 102 with Internet access. Those of ordinary skill in the art will appreciate that business networks can be far more complex and can include numerous business servers 130, printers 134, routers 136, and client systems 102, among other units. Moreover, the business network discussed above should not be considered limiting as any number of other configurations may be used. For example, in embodiments, the client system 102 may be directly connected to the Internet 110 through the network interface adapter 126, or may be connected through a router or firewall 136. Any system that allows the client system 102 to access the Internet 110 should be considered to be within the scope of the present techniques.
Through the router/firewall 136, the client system 102 can access a search engine 104 connected to the Internet 110. In exemplary embodiments of the present invention, the search engine 104 can include generic search engines, such as GOOGLE™, YAHOO®, BING™, and the like. The client system 102 can also access the Websites 106 through the Internet 110. The Websites 106 can have single Web pages, or can have multiple sub pages 138. The Websites 106 can also provide search functions, for example, searching sub pages 138 to locate products or publications provided by the Website 106. For example, the Websites 106 may include sites such as EBAY®, AMAZON.COM™, WIKIPEDIA™, CRAIGSLIST™, FOXNEWS.COM™, and the like. Further, one or more of the Websites 106 may be configured to receive information from a client to the Website, for example, from a unit located at a particular user ID, regarding interests of the client, and the Website may use the information to determine, in part, the content to deliver to the user ID.
One or more Websites 106 may also access a database 144, which is connected to the Internet 110 and includes computer usage information from, for example, a survey of computer usage. The database 144 may also include cluster information, which may be generated, at least in part, by an automated or other analysis of the computer usage information as described below in reference to
The method begins at block 202, wherein a source of information on consumer computer usage may be filtered 204. The output of the filtering process 204 is a list of yes/no questions relevant to a particular use-case of activities performed on a home or other computer. Such activities include Internet usage, social activities, audio and video usage, gaming participation, online shopping, and other activities. The answers to these questions form a multidimensional binary vector that can be used to classify each surveyed client, where a value of 1 may be used to correspond to answering yes to a question. If, for example, 5000 computer clients were surveyed and 150 questions were selected, then the computer usage of each of the 5000 surveyed clients may be represented by a 150-dimensional binary vector based on that client's answers to the 150 selected survey questions. In some embodiments, the answers may be in the form of preferences such as, for example, a rating of 1 to 5 instead of in binary form. The questions are selected to be relevant to a target market, or use-case, of a particular Website which may be utilized by a user ID. This selection of relevant questions may be made from a list that may include more than a hundred questions, some of which may not be relevant to a use-case of interest. The non-relevant questions may therefore be discarded or not further utilized. The selection of relevant questions may be performed by automated or manual means.
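As a minimal sketch of the encoding described above, each surveyed client's answers to the selected questions can be turned into a 0/1 vector with one entry per question. The function name `encode_answers`, the question strings, and the choice to treat an unanswered question as "no" are all assumptions made for illustration.

```python
from typing import Dict, List


def encode_answers(selected_questions: List[str],
                   answers: Dict[str, bool]) -> List[int]:
    """Return one 0/1 entry per selected question (1 means 'yes').

    Questions missing from `answers` are treated as 'no'; that choice
    is an assumption, not something the description specifies.
    """
    return [1 if answers.get(q, False) else 0 for q in selected_questions]


# Hypothetical survey with one question that is not relevant to the use-case.
survey_questions = [
    "Do you shop online?",
    "Do you play games online?",
    "Do you own a fax machine?",   # assumed irrelevant, filtered out below
    "Do you share photos online?",
]
relevant = {"Do you shop online?", "Do you play games online?",
            "Do you share photos online?"}

# Filtering step: keep only questions relevant to the use-case, in survey order.
selected_questions = [q for q in survey_questions if q in relevant]

respondent = {
    "Do you shop online?": True,
    "Do you play games online?": False,
    "Do you own a fax machine?": True,
    "Do you share photos online?": True,
}
vector = encode_answers(selected_questions, respondent)
```

With 150 selected questions, the same function would yield a vector of length 150 for each of the surveyed clients.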
At block 206, cluster information is generated from the selected questions. The cluster information may be generated by automated analysis of the questions by, for example, a statistical analysis such as clustering, co-clustering, information-theoretic co-clustering, and the like based on a specific use-case. In one exemplary embodiment of the present invention, the automated analysis includes segmenting the questions into cluster types. In an implementation where the set of selected questions is sufficiently small, the cluster information may be generated manually based on a specific use-case. As used herein, the term “cluster type(s)” refers to a unique cluster that represents a particular client's interest or type of Web content. Each cluster may also be assigned a unique cluster-type descriptor, as will be explained further below. For example, questions relating to photography can be assigned to cluster type “Q,” where Q is a unique cluster identification reference. In like manner, questions relating to stocks can be assigned to cluster type “G.” It should also be noted that a cluster may be a single question such as “do you purchase airline tickets?” Therefore, a cluster may also be considered a category or usage type. Exemplary individual cluster types that may be identified by a cluster analysis are detailed in Table 1. Of course, the use of different computer usage information or other analytical tools may generate the same, fewer, more, or different cluster types.
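To make the idea of segmenting questions into cluster types concrete, the sketch below groups questions whose answer columns overlap strongly. It is a simplified stand-in, not the co-clustering analysis named above: the greedy grouping rule, the Jaccard similarity measure, the 0.5 threshold, and the sample data are all assumptions chosen for brevity.

```python
from typing import Dict, List


def jaccard(a: List[int], b: List[int]) -> float:
    """Share of respondents answering yes to both, among those answering yes to either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def group_questions(columns: Dict[str, List[int]],
                    threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass grouping.

    A question joins the first cluster whose first member it resembles
    (similarity >= threshold); otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, col in columns.items():
        for cluster in clusters:
            if jaccard(col, columns[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical answer columns: one 0/1 entry per surveyed client.
columns = {
    "shares photos": [1, 1, 0, 0, 1, 0],
    "edits photos":  [1, 1, 0, 0, 0, 0],
    "tracks stocks": [0, 0, 1, 1, 0, 1],
    "banks online":  [0, 0, 1, 1, 0, 0],
}
clusters = group_questions(columns)
```

On this toy data the photography questions fall into one group and the finance questions into another, mirroring the cluster type “Q” and cluster type “G” examples in the text. A question that resembles no other would form a single-question cluster.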
At block 208, computer usage segments are identified by applying a topic modeling analysis such as, for example, Probabilistic Latent Semantic Indexing (“PLSI”) analysis or Latent Dirichlet Allocation (“LDA”) to the identified binary vectors. In the exemplary embodiment, four usage segments were identified: Social Net Usage, Spenders, Enthusiast, and Finance. The segment names such as “Spenders” are arbitrary, but are selected to aid human understanding of aspects of the related segment. For example, the “Spenders” segment can represent computer purchasing usage such as the purchase of airline, movie and other event tickets. The relationship between the Clusters and the usage Segments is illustrated in
At block 210, a decision tree is generated from the cluster data from block 206. An example of a decision tree is graphically illustrated in
The maximum depth of the resultant tree is limited to about 6 levels in the exemplary embodiment discussed herein, but in some applications a deeper tree may be useful. However, a tree of depth 6 will provide a set of questions that may generally provide an adequate level of information from a first time Website client, as represented by a user ID, without the number of questions becoming objectionable. While a more accurate categorization of a first time client may be had by asking 150 questions, most Website clients would find having to answer so many questions undesirable and would refuse to use the associated Website. The answers from a user ID to these questions can be subsequently utilized to determine the content of a displayed Website. Once the decision tree is generated, it may remain fixed for a particular use-case and utilized to classify any user ID that is presented to the Website for the first time.
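The depth limit can be illustrated with a small tree builder that stops splitting after a fixed number of levels. The description does not state how the tree is induced, so the information-gain splitting rule used here is an assumption, as are the function names, the question keys, and the toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of segment labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def build_tree(rows, labels, questions, max_depth=6):
    """rows: dicts mapping question -> 0/1; labels: one segment per row.

    Returns a segment name (leaf) or a dict with 'question', 'yes', 'no'.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1 or not questions:
        return majority

    def gain(q):
        yes = [l for r, l in zip(rows, labels) if r[q]]
        no = [l for r, l in zip(rows, labels) if not r[q]]
        if not yes or not no:
            return 0.0  # the question does not separate these rows
        rest = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        return entropy(labels) - rest

    best = max(questions, key=gain)
    if gain(best) <= 1e-12:
        return majority
    remaining = [q for q in questions if q != best]
    yes_i = [i for i, r in enumerate(rows) if r[best]]
    no_i = [i for i, r in enumerate(rows) if not r[best]]
    return {
        "question": best,
        "yes": build_tree([rows[i] for i in yes_i], [labels[i] for i in yes_i],
                          remaining, max_depth - 1),
        "no": build_tree([rows[i] for i in no_i], [labels[i] for i in no_i],
                         remaining, max_depth - 1),
    }


def depth(tree):
    """Number of questions on the longest path from root to leaf."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max(depth(tree["yes"]), depth(tree["no"]))


def walk(tree, answers):
    """Follow yes/no answers down the tree and return the segment reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if answers[tree["question"]] else tree["no"]
    return tree


# Hypothetical training data: answer vectors with known segments.
questions = ["tickets", "stocks", "photos"]
rows = [
    {"tickets": 1, "stocks": 0, "photos": 0},
    {"tickets": 1, "stocks": 0, "photos": 1},
    {"tickets": 0, "stocks": 1, "photos": 0},
    {"tickets": 0, "stocks": 1, "photos": 1},
    {"tickets": 0, "stocks": 0, "photos": 1},
    {"tickets": 0, "stocks": 0, "photos": 1},
]
labels = ["Spenders", "Spenders", "Finance", "Finance",
          "Enthusiast", "Enthusiast"]
tree = build_tree(rows, labels, questions, max_depth=6)
```

Since `max_depth` bounds the number of questions on any path, a first time user ID is never asked more than six questions under the default, regardless of how many questions were selected from the survey.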
At block 504, using the decision tree of
Once a usage Segment 302-308 is identified, content likely relevant to that usage Segment may be selected and displayed or made available to the user ID by a Website. This may provide a first time client to the Website, as represented by a user ID, a more satisfying experience. In other embodiments, once a specific cluster type A-Q is determined to be relevant to the user ID as indicated by the received answers, the content of the Website may be customized to present or otherwise make available content to the user ID without relying on or determining one or more relevant usage Segments 302-308.
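Both selection paths described above, by usage Segment or directly by cluster type, can be sketched as simple lookups. The content identifiers, the mapping tables, and the fallback to default content are hypothetical; only the segment names and the cluster letters “Q” and “G” come from the description.

```python
from typing import Iterable, List, Optional

# Hypothetical content catalogs keyed by usage segment and by cluster type.
SEGMENT_CONTENT = {
    "Spenders": ["ticket-deals", "event-calendar"],
    "Finance": ["market-summary", "portfolio-tools"],
    "Enthusiast": ["product-reviews", "how-to-guides"],
    "Social Net Usage": ["community-feed", "share-widgets"],
}
CLUSTER_CONTENT = {
    "Q": ["photo-printing", "camera-accessories"],  # photography cluster
    "G": ["stock-quotes"],                          # stocks cluster
}
DEFAULT_CONTENT = ["home-page-highlights"]  # assumed fallback for unknown cases


def select_content(segment: Optional[str] = None,
                   cluster_types: Iterable[str] = ()) -> List[str]:
    """Pick content by segment if one was identified, else by cluster type."""
    if segment in SEGMENT_CONTENT:
        return SEGMENT_CONTENT[segment]
    items: List[str] = []
    for cluster in cluster_types:
        items.extend(CLUSTER_CONTENT.get(cluster, []))
    return items or DEFAULT_CONTENT


by_segment = select_content(segment="Finance")
by_cluster = select_content(cluster_types=["Q", "G"])
```

The second call shows the alternative embodiment, where cluster types drive the customization without any usage Segment being determined.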
The various software components discussed herein can be stored on the non-transitory, computer readable medium 600 as indicated in
A fourth block 612 can include a cluster type comparator for analyzing information received from a user ID to identify one or more matching computer usage Segments associated with the Website. A fifth block 614 can include a Website or Web page configurator to customize a Web page or a Website to display information related to the matching computer usage Segments.
Although shown as contiguous blocks, the software components can be stored in any order or configuration. For example, if the non-transitory, computer readable medium 600 is a hard drive, the software components can be stored in non-contiguous, or even overlapping, sectors.