A variety of different search systems have been developed to assist users in identifying products, such as movies, music, news, books, research articles, web pages, search queries, social tags, restaurants, and descriptions of persons on online dating platforms. These systems typically involve one or more collaborative or content-based filtering techniques. Collaborative filtering typically involves automatically predicting the interests of a user based on the preferences collected from the user and other users. Content-based filtering typically involves comparing product descriptions with a profile of the user's preferences. In another approach, recommendations are generated based on a conceptual or semantic matching process that involves parsing text information relating to a movie or other content into components (e.g., scenes or clips of a movie), assigning predefined semantics (e.g., concepts or themes, such as “chase scene,” “fight scene,” “anger,” and “happiness”) to these components based on the text information, indexing and categorizing the content based on the assigned semantics, and recommending contents based on the likelihoods that the semantics assigned to their respective components match user or group profiles or preferences.
In the following description, like reference numbers are used to identify like elements. Furthermore, the drawings are intended to illustrate major features of exemplary embodiments in a diagrammatic manner. The drawings are not intended to depict every feature of actual embodiments nor relative dimensions of the depicted elements, and are not drawn to scale.
A “product” is any tangible or intangible good or service that is available for purchase or use.
A “document” is a persistent text-based information record.
A “word group” is a set of word-based elements of a document and an assigned weight.
An “element” is a word, name, or phrase.
A “weight” is a numerical quantity assigned to an element that indicates an importance level of the element relative to other elements.
A “vector” is a set of one or more word groups.
“Classic literature” refers to written works judged over a period of time to be of the highest quality and outstanding of their kind.
“Punctuation” refers to marks, such as periods, commas, parentheses, page breaks, and other demarcations that are used in writing to separate, for example, chapters, paragraphs, sentences and other elements, and to clarify meaning.
A “computer” is any machine, device, or apparatus that processes data according to computer-readable instructions that are stored on a computer-readable medium either temporarily or permanently. A “computer operating system” is a software component of a computer system that manages and coordinates the performance of tasks and the sharing of computing and hardware resources. A “software application” (also referred to as software, an application, computer software, a computer application, a program, and a computer program) is a set of instructions that a computer can interpret and execute to perform one or more specific tasks. A “data file” is a block of information that durably stores data for use by a software application.
The term “computer-readable medium” (also referred to as “memory”) refers to any tangible, non-transitory medium capable of storing information (e.g., instructions and data) that is readable by a machine (e.g., a computer). Storage devices suitable for tangibly embodying such information include, but are not limited to, all forms of physical, non-transitory computer-readable memory, including, for example, semiconductor memory devices, such as random access memory (RAM), EPROM, EEPROM, and Flash memory devices, magnetic disks such as internal hard disks and removable hard disks, magneto-optical disks, DVD-ROM/RAM, and CD-ROM/RAM.
A “network node” (also referred to simply as a “node”) is a physical junction or connection point in a communications network. Examples of network nodes include, but are not limited to, a terminal, a computer, and a network switch. A “server node” is a network node that responds to requests for information or service. A “client node” is a network node that requests information or service from a server node.
As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
A. Introduction
The examples that are described herein provide improved systems and methods for recommending products to users. These examples provide a conceptual product recommendation service that allows users to define the parameters that drive a search for one or more target products as a concept that can be specified in a variety of different ways, ranging from the specification of an abstract or generic idea (e.g., “courage” or “loneliness”) to the specification of a particular instance of a product (e.g., a particular movie, book, music, news item, web page, encyclopedia entry, or other document) that embodies one or more conceptual elements (e.g., idea, theme, mood, place, person, or item) sought by the user. In the process of matching the user-specified concept to a set of target products, the conceptual product recommendation service compares a word vector based representation of the user-specified concept to respective word vector based representations of multi-document compilations relating to the target products. In this way, these systems and methods provide results that better reflect the user's intention than other product recommendation approaches, such as those that rely on preconceived concepts or themes for matching products to user inputs or profile preferences.
B. Exemplary Operating Environment
The first client network node 12 includes a tangible computer-readable memory 22, a processor 24, and input/output (I/O) hardware 26 (including a display). The processor 24 executes at least one network-enabled application 28 (e.g., a web browser) that is stored in the memory 22. Each of the other client network nodes 14 typically is configured in substantially the same general way as the first client network node 12, with a tangible computer-readable memory storing at least one communications application, a processor, and input/output (I/O) hardware (including a display).
The product provider 18 includes at least one server network node 30 that includes a product recommendation and provision application 32 that hosts a product recommendation and provision service. In some examples, the product provider 18 is a content source (e.g., Amazon.com, Netflix, Inc., Comcast Corporation, and Apple Inc.) that supplies digital media content to the users' client network nodes 12, 14. The product recommendation and provision service maintains a product database 34, a concept database 36, and a conceptual mappings database 38. The product database 34 includes records that describe various target products (e.g., physical products, non-physical products, or both physical and non-physical products) that are available from the product provider 18. In some examples, the product database 34 also includes digital media content or links to digital media content that may be transmitted to the client network nodes 12, 14. The products listed in the product database 34 typically correspond to a particular market, which may encompass one or more product categories. The listed products within each product category may encompass a particular segment (e.g., all movies having a popularity above a threshold level) within that product category. The concept database 36 includes records that describe various search concepts. The concepts listed in the concept database 36 may be selected in a wide variety of different ways. In some examples, the selected concepts correspond to all of the products in the product database 34 and a subset of the entries in an online encyclopedia (e.g., Wikipedia). The conceptual mappings database 38 includes records that describe associations between the search concepts and respective ones of the target products.
C. Interfacing Users With the Conceptual Product Recommendation Service
In response to user input, the product recommendation service returns a ranked list of product descriptions (e.g., titles or synopses) from the product database 34 that match the user-specified concept based on the mappings described in the conceptual mappings database 38, as described in detail below.
In accordance with the method of
As the user enters text into the text input box 62, the product recommendation service automatically matches the user input to concepts (
The product recommendation service displays the content tags that are associated with respective ones of the concepts, sorted by their associated concept ratings (
The product recommendation service receives user selection of a respective one of the displayed concept tags (
In some examples, if user input in the text input box 62 matches a respective one of the Concept Title Tag entries 84, the product recommendation service automatically displays the associated sorted list of Matching Product Titles 86. If the user input matches more than one of the Concept Title Tag entries 84, the matching Concept Title Tag entries 84 are displayed in the drop-down menu 68 (
D. Conceptually Mapping Concepts to Products
For each product listed in the product database 34 (
Based on an analysis of the identified target conceptual documents 98, the conceptual document selection engine 94 selects a respective mix 102 of target conceptual documents 98 (also referred to as a “target multi-document compilation”) that conceptually “describes” the product (
In some examples, for each of respective ones of the target products, the conceptual document selection engine 94 selects different types of the identified target conceptual documents 98 for the respective mix 102. Exemplary document types include descriptive documents that include descriptions of the target product, review documents that include reviews of the target product (e.g., user reviews and professional critic reviews), and reference documents that include technical specifications of the target product (e.g., for movies, technical specifications include director, actors, release date, title, characters, synopsis, etc.). In some examples, one or more product types are associated with respective target proportions of document content from descriptive documents, review documents, and reference documents. In these examples, for each of respective ones of the target products, the conceptual document selection engine 94 selects document content from descriptive documents, review documents, and reference documents based on the respective target proportion associated with the type of the target product. In some examples, each of the movie and book product types is associated with a target document proportion of document content selected from user review documents, critic review documents, and reference documents, with the proportion of document content from user review documents being greater than the proportions of document content from critic review documents and reference documents combined. In one example, each of the movie and book product types is associated with a target document proportion of document content selected from four parts user review documents, one part critic review documents, and one part reference documents.
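The four-to-one-to-one example above can be illustrated with a minimal sketch. It assumes that document content is measured in words and that a fixed word budget is split across document types; the names TARGET_PARTS and allocate_budget are hypothetical and are not part of the described service.

```python
# Hypothetical target proportions (in parts) for the movie and book product
# types, following the 4:1:1 example in the text.
TARGET_PARTS = {"user_review": 4, "critic_review": 1, "reference": 1}


def allocate_budget(total_words, parts=TARGET_PARTS):
    """Split a word budget across document types in proportion to `parts`.

    Assumption: content is measured in words; integer division may leave a
    small unallocated remainder.
    """
    total_parts = sum(parts.values())
    return {doc_type: total_words * n // total_parts
            for doc_type, n in parts.items()}


budget = allocate_budget(6000)
print(budget)
```

With these proportions, the user review share exceeds the critic review and reference shares combined, as the example requires.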
Similarly, for each concept listed in the concept database 36 (
For each product listed in the product database 34, the conceptual mapping engine 96 determines a respective target word vector representation of the respective target multi-document compilation (
For each concept in the concept database 36, the conceptual mapping engine 96 compares the search word vector and respective ones of the target word vectors to associate the concept with target products and respective match scores corresponding to degrees of match between the concept and the respective target products (
In accordance with the method of
For each of multiple search concepts, the product recommendation service chooses search conceptual documents relating to the search concept (
In some examples, the product recommendation service chooses the search conceptual documents by analyzing respective ones of the selected target conceptual documents for references to entries in an online encyclopedia (e.g., Wikipedia), and choosing a number of the most highly referenced ones of the entries in the online encyclopedia as search conceptual documents. These entries may include, for example, words (e.g., “brain” and “whistling”), names (e.g., Julius Caesar and Tony Curtis), or phrases (e.g., “labor camp” or “muscle car”). In addition, the selected target conceptual documents themselves may be used as search conceptual documents to search other target search documents. For example, if the target products consisted of a selection of books, the target conceptual document “Moby-Dick” may be used as a search conceptual document to find other books that are similar to “Moby-Dick” such as “Hunters of the Dark Sea” by Mel Odom. Likewise, for movies, a user may want to know movies that are similar to his favorite movies.
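One way to picture the selection of the most highly referenced encyclopedia entries is the following sketch. It assumes that a “reference” is a case-insensitive occurrence of an entry title in the document text, which is a deliberately naive stand-in for whatever reference detection an actual implementation uses; the function name choose_search_concepts is hypothetical.

```python
from collections import Counter


def choose_search_concepts(target_documents, encyclopedia_titles, top_n):
    """Return up to `top_n` entry titles, most frequently referenced first.

    Assumption: a reference is a case-insensitive substring match of the
    title, so "brain" would also match inside "brainstorm".
    """
    counts = Counter()
    for text in target_documents:
        lowered = text.lower()
        for title in encyclopedia_titles:
            counts[title] += lowered.count(title.lower())
    # Keep only titles that were referenced at least once.
    return [title for title, n in counts.most_common(top_n) if n > 0]


documents = [
    "A muscle car chase ends at a labor camp.",
    "The muscle car wins the race.",
]
titles = ["muscle car", "labor camp", "whistling"]
chosen = choose_search_concepts(documents, titles, top_n=2)
print(chosen)
```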
In some examples, search conceptual documents may be prepared to extract common classifications and lists from the selected target conceptual documents. Such search conceptual documents may be used to search for lists of targets. For example, if the target product type is movies, then a search conceptual document that includes a brief description of all the movies that won the Best Picture Oscar might be used to obtain a list of movies that won the Best Picture Oscar award.
In some examples, the process of determining the target and search vectors involves, for each of the respective conceptual documents: identifying names corresponding to names in a names dictionary comprising names of famous people, places, and events; identifying word sequences corresponding to phrases in a phrase dictionary and assigning to the identified phrases respective weights specified in the phrase dictionary; and identifying individual words corresponding to words in a word dictionary and assigning to the individual words respective weights specified in the word dictionary. This process additionally involves, for each of the conceptual documents: forming a respective word group from a respective pairing of each word-based element of the conceptual document with each subsequent word-based element in a sliding window of text of the conceptual document; assigning a respective weight to each word group formed; and reducing the weight assigned to each word group based on extents to which word-based elements and punctuation appear between the constituent words of the word group in the respective conceptual document.
For each search concept (
In non-transitory computer-readable memory, the product recommendation service stores associations between the search concepts and respective ones of the target products in one or more data structures permitting computer-based generation of lists of respective ones of the target products sorted by the respective match scores in response to respective queries comprising respective ones of the search concepts (
E. Dictionaries
As explained above, the determination of the target and search word vectors is based on identification of word-based elements of the respective multi-document compilations in a name dictionary 120, a weighted phrase dictionary 122, and a weighted word dictionary 124. In some examples, these dictionaries are created as follows.
The name dictionary is created by collecting the names of famous people (e.g., Alexander the Great), places (e.g., London), and events (e.g., Battle of the Bulge). In this process, if two names indicate the same person, the two names are combined into a single name. For example, the names Bill Clinton, President Clinton, and President Bill Clinton all would be referred to as President Bill Clinton. Common last names, such as Murray, also are included in the name dictionary. Last names that conflict with common words, such as “little” or “west,” are not used. Titles such as Mrs., Captain, and President are included in the name dictionary. Names are not given a weight in the name dictionary; instead they are weighted when they are paired with a word or phrase into a word group.
The word dictionary is created by starting with a normal English dictionary, excluding proper nouns that are in the name dictionary, and weighting each remaining word (including abbreviations) according to its commonality, preciseness, use in classic literature, and emotion. In this process, qualities of words are assessed according to statistics obtained from words extracted from a collection of classic literature, and weights are assigned to words in the word dictionary based at least in part on the assessed qualities of the words. In addition, the precisions of words are assessed based on respective counts of different meanings that are associated with the words, and weights are assigned to words in the word dictionary based at least in part on the assessed precision of the words. If words are used commonly, they are weighted lower; if they are rare, they are weighted higher. Words such as “the” or “with” are used so commonly that they are assigned a weight of zero and are not used in the word vector correlation process. If words have multiple meanings, their weight is reduced. For example, “hit” would be penalized because it has many meanings depending on the context. This is determined by examining a normal English dictionary and counting the number of different meanings of a word. If a word needs context to be useful, it is weighted lower. For example, “army” needs the additional context of who owns the army (British, Roman, etc.). Words that have strong meanings are rated higher. For example, “abhorrent” is assigned a higher weight than “abduct” because it adds extra energy in a sentence. If words appear more often in “classic books” (e.g., Moby-Dick and The Hobbit), they are weighted more heavily.
The phrase dictionary includes consecutive normal words that are commonly seen in English text and have special meaning when placed together (e.g., “affirmative action” or “hot and bothered”). If two or more consecutive words change their meaning when combined (e.g., “spaghetti western”), they are placed in the phrase dictionary and given a higher weight. Weights also are assigned to phrases according to the commonality, preciseness, use in classic literature, and emotion criteria described above. If two or more consecutive words are commonly placed together in text and either one or both are low weight words (e.g., “time travel”), they are combined into a phrase with greater weight. If two or more consecutive words both have high weights in the word dictionary, they are not placed in the phrase dictionary unless the consecutive words change their meaning when combined. In other words, the phrases in the phrase dictionary consisting of two or more consecutive words that are assigned relatively high weights in the word dictionary are phrases whose meanings are not suggested by their constituent words. For example, application of this criterion would preclude the inclusion of “river boat” in the phrase dictionary.
In some examples, respective ones of the names dictionary, the phrase dictionary, and the word dictionary are modified based on an analysis of the corpus of target conceptual documents that are selected for the target products. In these examples, respective ones of the weights in one or more of the names dictionary, the phrase dictionary, and the word dictionary are modified based on commonality of words in the target conceptual documents. For example, in movie descriptions the word “actor” typically is extremely common and therefore its assigned weight would be reduced. In addition, respective ones of the names dictionary, the phrase dictionary, and the word dictionary are modified to include new names, phrases, and words (including slang) identified in the selected target conceptual documents.
F. Extracting Word Group Vectors
The process of extracting word group vector representations from conceptual documents is the same for both target conceptual documents and search conceptual documents. This process involves scanning through the target and search conceptual documents to form names, words, and phrases. In this process, multiple words may be compressed into a new entity and all punctuation is saved.
Initially, the target and search conceptual documents are scanned to form names. All the names in the scanned documents that appear in the name dictionary are formed. If a single proper noun is not part of a sequence, appeared previously in the document as the end of a collected multiple sequence name, and is marked as a last name in the name dictionary, the single proper noun is recorded as equivalent to the previous multiple sequence name. For example, if the name Smith appears in a document and the name Adam Smith previously was found in the document, then Smith is converted to Adam Smith.
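The last-name substitution described above can be sketched as follows. The sketch assumes that multi-word names have already been collected into single elements and that the last names marked in the name dictionary are available as a set; the function name resolve_last_names and the sample data are hypothetical.

```python
def resolve_last_names(elements, last_names):
    """Replace a lone last name with the full name seen earlier in the document.

    elements:   document elements in reading order, with multi-word names
                already collected into single strings (e.g., "Adam Smith").
    last_names: names marked as last names in the name dictionary
                (assumed here to be a plain set of strings).
    """
    seen = {}  # last name -> most recent multi-word name ending with it
    resolved = []
    for element in elements:
        parts = element.split()
        if len(parts) > 1 and parts[-1] in last_names:
            seen[parts[-1]] = element
            resolved.append(element)
        elif element in seen:
            # A single proper noun that ended an earlier multi-word name.
            resolved.append(seen[element])
        else:
            resolved.append(element)
    return resolved


elements = ["Adam Smith", "wrote", "economics", "Smith", "argued"]
resolved = resolve_last_names(elements, last_names={"Smith"})
print(resolved)
```

A lone last name that appears before any matching multi-word name is left unchanged in this sketch.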
Word sequences in the target and search conceptual documents that match entries in the phrase dictionary are formed and weighted according to the weights in the phrase dictionary.
Individual words in the target and search conceptual documents that match entries in the word dictionary are formed and weighted according to the weights in the word dictionary. If a word has a weight of zero because it is very common (e.g., “can” and “then”), it is deleted from the text and not used in the correlation. Numbers and dates found in the documents are weighted. Dates are given a nominal weight unless they specify a famous event. All numbers that are not part of dates are counted as words. Small numbers have a minimal weight and larger numbers a normal weight.
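The word-weighting step can be pictured with the short sketch below, which covers only dictionary words (not numbers or dates). The dictionary contents and weight values are invented for illustration, and the sketch assumes that words absent from the word dictionary are simply skipped.

```python
# Hypothetical word dictionary; the weight values are invented for illustration.
WORD_DICTIONARY = {"the": 0, "can": 0, "then": 0, "army": 3, "abhorrent": 9}


def weight_words(words, word_dictionary=WORD_DICTIONARY):
    """Return (word, weight) pairs, dropping zero-weight and unknown words."""
    weighted = []
    for word in words:
        key = word.lower()
        weight = word_dictionary.get(key, 0)  # assumption: unknown words are skipped
        if weight > 0:
            weighted.append((key, weight))
    return weighted


result = weight_words(["The", "abhorrent", "army", "can", "march"])
print(result)
```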
For each target and search conceptual document, all the elements of the document are paired into word groups by searching forwards through the document and pairing the current element with all subsequent elements and assigning a weight to each word group that is formed. The initial word group weight is defined as the larger of the weights of the two elements, as extracted from the word or phrase dictionary. Since names have no weight, the names take on the weights assigned to the word or phrase with which they are paired. The distance between two elements in a document is defined as the number of elements they are apart linearly in the text.
In some examples, the weight of the word group is reduced in inverse proportion to the distance. In one example, the reduced weight (w(new)) is equal to three times the original weight (w) divided by two times the distance (d) (i.e., w(new)=(3·w)/(2·d)). For example, if there are two words with weights w0=5 and w1=9, and they are 5 elements apart, a new word group would be formed with a weight given by:
weight(word group)=(3·max(9, 5))/(5·2)=2.7
In some examples, the constants 3 and 2 in the equation can be altered +/−10% depending on the type of documents being processed.
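The distance-based weighting can be written as a small function that reproduces the worked example. In this sketch a name is represented by a weight of None, and the constants 3 and 2 are exposed as parameters so that they can be varied; the function and parameter names are hypothetical.

```python
def word_group_weight(weight_a, weight_b, distance, numerator=3.0, denominator=2.0):
    """Compute w(new) = (numerator * max(w_a, w_b)) / (denominator * distance).

    A weight of None stands for a name, which carries no weight of its own
    and takes on the weight of the element with which it is paired.
    """
    weights = [w for w in (weight_a, weight_b) if w is not None]
    if not weights:
        raise ValueError("two names cannot form a word group")
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return numerator * max(weights) / (denominator * distance)


example = word_group_weight(5, 9, distance=5)
print(example)  # 2.7, matching the worked example
```

Note that for adjacent elements (d=1) this formula yields 1.5 times the larger element weight.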
All punctuation (including paragraph and chapter crossings) between two elements to be gathered into a word group is collected. Depending on the type of punctuation and the frequency of its occurrence, the weight is reduced. In some examples, the weight reduction increases with position in the following punctuation sequence, with commas being associated with the least reduction in weight and end of chapters being associated with the most reduction in weight: comma; semicolon; colon; end of sentence; bullets; end of paragraph; and end of chapter.
Two names cannot form a word group. For example, the word group [Adam Smith, Victor Hugo, weight] is not allowed.
A word group cannot have equal elements. For example, [bald, bald, weight] is not allowed. If this pattern is encountered for a given element, the search forward for the given element is stopped and the word group forming process is started for the next element.
Element pairs are stored alphabetically; the order in which the elements were extracted from the document is not used. For example, [man, bad, weight] would be stored as [bad, man, weight].
If, after generating a word group vector, any two word groups in the vector have equal elements, the word groups are combined into a single word group that is assigned a weight equal to the sum of the weights of the two word groups.
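The pairing rules above can be combined into a single sketch that builds a word group vector. The sketch makes several assumptions: elements arrive as (text, weight) pairs with a weight of None for names, the sliding window size is a hypothetical parameter, the distance between neighboring elements is taken to be 1, the 3/2 distance formula from the earlier example is used, and punctuation penalties are omitted.

```python
from collections import defaultdict


def build_word_group_vector(elements, window=5):
    """Build {(element_a, element_b): weight} from elements in document order.

    elements: list of (text, weight); weight is None for names.
    window:   hypothetical number of subsequent elements to pair with.
    """
    vector = defaultdict(float)
    for i, (first, first_weight) in enumerate(elements):
        for j in range(i + 1, min(i + 1 + window, len(elements))):
            second, second_weight = elements[j]
            if second == first:
                break  # equal elements: stop the forward search for this element
            if first_weight is None and second_weight is None:
                continue  # two names cannot form a word group
            base = max(w for w in (first_weight, second_weight) if w is not None)
            distance = j - i
            # Store the pair in sorted order (Python's default string ordering),
            # so repeated pairs share one key and their weights are summed.
            key = tuple(sorted((first, second)))
            vector[key] += 3 * base / (2 * distance)
    return dict(vector)


elements = [("man", 4.0), ("Adam Smith", None), ("bad", 6.0), ("man", 4.0)]
vector = build_word_group_vector(elements, window=3)
print(vector)
```

Summing into a dictionary keyed by the sorted pair merges duplicate word groups as they are formed, which gives the same result as combining them after the vector has been generated.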
G. Recommending Target Products
As explained above, the conceptual mapping engine 96 performs a correlation matching process that generates match scores corresponding to degrees of match between the search concepts and the target products based on comparisons between the respective search word vectors and the respective target word vectors.
Before performing the correlation process, the conceptual mapping engine 96 normalizes the weights in the target and search vectors to account for differences in the relative sizes of the selected target conceptual documents and the chosen search conceptual documents. In some examples, the weights normalization is accomplished in each vector by dividing all non-normalized weights (weight(original)) according to the equation:
weight(normalized)=weight(original)/((document size)^EXP)
In some examples, the value of the exponent EXP is altered by ±10% depending on the types of documents being processed. For example, documents with a large amount of technical data are normalized with an EXP value reduced by 10%, and documents with a large amount of conversation are normalized with an EXP value increased by 10%. A typical value for EXP is 0.46.
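A minimal sketch of the normalization step follows. It assumes that a word group vector is a dictionary mapping element pairs to weights and that the document size is a positive count (the unit, such as words or elements, is not specified here); the function name normalize_vector is hypothetical.

```python
def normalize_vector(vector, document_size, exp=0.46):
    """Divide every weight by document_size ** exp.

    vector:        {word_group: weight}
    document_size: positive size of the source compilation (unit assumed).
    exp:           exponent EXP; 0.46 is the typical value given in the text.
    """
    if document_size <= 0:
        raise ValueError("document_size must be positive")
    scale = document_size ** exp
    return {group: weight / scale for group, weight in vector.items()}


raw = {("bad", "man"): 13.5, ("sea", "whale"): 4.0}
normalized = normalize_vector(raw, document_size=4, exp=0.5)
print(normalized)

# Adjusting EXP by 10% for a conversation-heavy document, per the text.
conversational = normalize_vector(raw, document_size=1200, exp=0.46 * 1.10)
```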
After the target and search vector weights have been normalized, the conceptual mapping engine 96 performs a correlation matching process. In some examples, this process involves performing a vector correlation operation that operates on two word group vectors to generate a single fixed-point correlation value (referred to as a “match score”). In accordance with this operation, the two word group vectors are compared. If a word group in one vector has the same elements as a word group in the other vector, their weights are multiplied. All the multiplied word group weights are summed and the resulting sum is the final correlation value of the two word group vectors. For each search vector, the vector correlation operation is applied to all target vectors. This results in a vector of match scores equal in length to the number of target multi-document compilations (i.e., the number of target products). The correlation results for each search multi-document compilation are sorted by match score to produce an ordered list of the most similar target multi-document compilations, which corresponds to an ordered list of the most similar target products.
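The correlation and ranking steps can be sketched as a sparse dot product over matching word groups. The sketch assumes that word group vectors are dictionaries keyed by element pairs and uses ordinary floating-point numbers rather than fixed-point values; the names correlate and rank_targets are hypothetical.

```python
def correlate(search_vector, target_vector):
    """Sum the products of the weights of word groups present in both vectors."""
    # Iterate over the smaller vector to limit the number of lookups.
    if len(search_vector) > len(target_vector):
        search_vector, target_vector = target_vector, search_vector
    return sum(weight * target_vector[group]
               for group, weight in search_vector.items()
               if group in target_vector)


def rank_targets(search_vector, target_vectors):
    """Return (product, match_score) pairs sorted by descending match score."""
    scores = {product: correlate(search_vector, vector)
              for product, vector in target_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


search = {("bad", "man"): 2.0, ("sea", "whale"): 1.0}
targets = {
    "A": {("bad", "man"): 3.0},
    "B": {("sea", "whale"): 10.0, ("bad", "man"): 1.0},
    "C": {},
}
ranking = rank_targets(search, targets)
print(ranking)
```

Each search vector yields one match score per target product, so the ranking has the same length as the number of target multi-document compilations.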
Users typically access a network communication environment from respective network nodes. Each of these network nodes typically is implemented by a general-purpose computer system or a dedicated communications computer system (or “console”). Each network node executes communications processes that connect with one or both of the product recommendation provider and the product provider.
A user may interact (e.g., input commands or data) with the computer system 320 using one or more input devices 330 (e.g., one or more keyboards, computer mice, microphones, cameras, joysticks, physical motion sensors such as Wii input devices, and touch pads). Information may be presented through a graphical user interface (GUI) that is presented to the user on a display monitor 332, which is controlled by a display controller 334. The computer system 320 also may include other input/output hardware (e.g., peripheral output devices, such as speakers and a printer). The computer system 320 connects to other network nodes through a network adapter 336 (also referred to as a “network interface card” or NIC).
A number of program modules may be stored in the system memory 324, including application programming interfaces 338 (APIs), an operating system (OS) 340 (e.g., the Windows® operating system available from Microsoft Corporation of Redmond, Wash. U.S.A.), software applications 341 including the network enabled application 28, drivers 342 (e.g., a GUI driver), network transport protocols 344, and data 346 (e.g., input data, output data, program data, a registry, and configuration settings).
In some embodiments, the one or more server network nodes of the product providers 18, 42, and the recommendation provider 44 are implemented by respective general-purpose computer systems of the same type as the client network node 320, except that each server network node typically includes one or more server software applications.
In other embodiments, the one or more server network nodes of the product providers 18, 42, and the recommendation provider 44 are implemented by respective network devices that perform edge services (e.g., routing and switching).
The embodiments that are described herein provide improved systems and methods for recommending products to users.
Other embodiments are within the scope of the claims.