The subject matter disclosed herein generally relates to information retrieval. Specifically, the present disclosure addresses methods and apparatus for presenting a related item using a cluster.
General merchandising of items for sale via a network-based merchandising system is well-known. Many websites accessible via the Internet are operated as online stores or auctions. These websites enable users to purchase items that may be physical items (e.g., an article of clothing), electronic data items (e.g., a downloadable digital media product), or services to be rendered by an affiliated service provider.
To facilitate potential transactions and thereby improve user experiences, some websites provide recommendations of items to users. A recommendation of an item may be provided by sending an e-mail message to a user to notify the user that a popular product is available for sale. Providing a recommendation may also be performed by displaying an advertisement for a best-selling product directly to the user.
Greater sophistication in providing a recommendation to a user may be achieved by selecting an item to be recommended based on user preferences stored in a user profile or based on a history of previous purchases by the user. Additionally, aggregated opinions or ratings of items provided by other users may be used to enhance identification of an item to be recommended.
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which:
Example methods and apparatus are directed to presenting a related item using a cluster. Examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.
Items are grouped into clusters that are defined by query expressions applied to descriptions of the items. Given an initial item, its associated cluster is accessed, and another item is identified from the initial item's cluster or from a similar cluster. Once identified, the other item is presented as related to the initial item.
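The flow summarized above can be sketched in a few lines. This is a minimal illustration rather than the disclosed implementation: the in-memory dictionaries, the item names, and the function `find_related_item` are hypothetical, and trying the initial item's own cluster before a similar cluster is an assumed ordering that the description does not require.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory data; a real system would use an item-cluster database.
CLUSTER_MEMBERS: Dict[str, List[str]] = {
    "1000": ["shirt-A", "shirt-B"],  # e.g., men's red shirts
    "2000": ["hat-C"],               # e.g., men's red hats
}
# Clusters designated as similar to a given cluster (assumed one-directional here).
SIMILAR_CLUSTERS: Dict[str, List[str]] = {"2000": ["1000"]}
# CIDs associated with each item.
ITEM_CLUSTERS: Dict[str, List[str]] = {
    "shirt-A": ["1000"],
    "shirt-B": ["1000"],
    "hat-C": ["2000"],
}


def find_related_item(initial_item: str) -> Optional[str]:
    """Return another item from the initial item's cluster, else from a similar cluster."""
    own_cids = ITEM_CLUSTERS.get(initial_item, [])
    # First pass: look inside the clusters that contain the initial item.
    for cid in own_cids:
        for candidate in CLUSTER_MEMBERS.get(cid, []):
            if candidate != initial_item:
                return candidate
    # Second pass: look inside clusters designated as similar.
    for cid in own_cids:
        for similar_cid in SIMILAR_CLUSTERS.get(cid, []):
            for candidate in CLUSTER_MEMBERS.get(similar_cid, []):
                if candidate != initial_item:
                    return candidate
    return None
```

In this sketch, an item with a cluster-mate is answered from its own cluster, while an item that is alone in its cluster falls back to a similar cluster.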
Potential advantages include, but are not limited to, improving the quality of recommendations provided to a user of a network-based merchandising system, reducing computational loads on processing hardware involved in providing recommendations, reducing network traffic involved in users searching for items, and efficiently providing recommendations in situations where items are not assigned to predefined categories.
As used herein, the term “item” refers to a physical or non-physical item potentially or actually available for sale, as well as to a representation of such an item within a network-based merchandising system. As examples, a physical item may be a product or a good (e.g., an article of clothing), and a non-physical item may be a data package (e.g., downloadable digital media content). An example of a representation of an item is an item identifier (e.g., a serial number, an item number, or a sales listing number) assigned to an item by a network-based merchandising system. Another example of a representation of an item is an image of the item (e.g., a picture of an article of clothing).
If an item has been identified by the query expression, the item may be grouped into the corresponding cluster by associating the item with the cluster identifier (CID) corresponding to the query expression. For example, a cluster dictionary may associate the query expression “men's and red and shirt” with CID “1000.” Within an inventory of items, all items that are men's red shirts satisfy the query expression and may be associated with CID 1000, thus grouping the men's red shirts into that cluster. As used herein, a “query expression” is a set of one or more criteria that defines the membership of items in a cluster. For example, a query expression may be a Boolean expression including one or more keywords functioning as the one or more criteria (e.g., “men's and red and shirt” or “music and player and (digital or analog)”).
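One possible way to evaluate such Boolean query expressions against an item description is sketched below. The nested-tuple encoding, the whitespace keyword split, the function names, and CID "2000" are assumptions made for illustration only. The expression for CID "1000" mirrors the example in the text.

```python
from typing import List, Set, Tuple, Union

# A query expression is either a keyword or a tuple ("and" | "or", operand, ...).
Expr = Union[str, tuple]

# Hypothetical cluster dictionary: (query expression, CID) pairs.
CLUSTER_DICTIONARY: List[Tuple[Expr, str]] = [
    (("and", "men's", "red", "shirt"), "1000"),
    (("and", "music", "player", ("or", "digital", "analog")), "2000"),
]


def matches(expr: Expr, keywords: Set[str]) -> bool:
    """Recursively evaluate a Boolean query expression against a keyword set."""
    if isinstance(expr, str):
        return expr in keywords
    op, *operands = expr
    if op == "and":
        return all(matches(o, keywords) for o in operands)
    if op == "or":
        return any(matches(o, keywords) for o in operands)
    raise ValueError(f"unsupported operator: {op!r}")


def assign_cids(description: str) -> List[str]:
    """Return the CIDs of every cluster whose query expression the description satisfies."""
    keywords = set(description.lower().split())
    return [cid for expr, cid in CLUSTER_DICTIONARY if matches(expr, keywords)]
```

For instance, `assign_cids("Men's red cotton shirt")` yields `["1000"]` under these assumptions, and a description that satisfies no expression yields an empty list, consistent with an item belonging to no cluster at all.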
A cluster may contain any number of items or no items at all. Similarly, an item may be a member of any number of clusters, or no cluster at all. As shown in
A cluster 110 may be designated as similar to another cluster 120 based on any similarity model or no model at all. In some example embodiments, similarity is based on an associative relationship between clusters. For example, an item-cluster database may associate an item with all CIDs of clusters containing the item, as well as with all CIDs of clusters designated within a network-based merchandising system as sufficiently similar to the clusters containing the item. In various example embodiments, similarity is based on more sophisticated models involving mathematical weighting factors applied to different criteria used in the query expressions defining the clusters. For example, a cluster defined by the query expression “men's and red and shirt” may be designated as more similar to a cluster defined by the query expression “men's and red and hat” than to a cluster defined by the query expression “women's and red and shirt” on the basis of giving more weight to the wearer's gender than to clothing type.
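A weighted similarity model of this kind might look like the following sketch. The decomposition of each query expression into named attributes (gender, color, type), the integer weights, and the function `similarity` are all assumptions for illustration. The weights are chosen so that a shared wearer's gender counts for more than a shared clothing type, which is the weighting under which the “men's and red and hat” cluster outscores the “women's and red and shirt” cluster.

```python
from typing import Dict

# Hypothetical weighting factors per criterion category (larger = more important).
WEIGHTS: Dict[str, int] = {"gender": 5, "color": 2, "type": 3}

# Hypothetical attribute view of three conjunctive query expressions.
REFERENCE = {"gender": "men's", "color": "red", "type": "shirt"}
MENS_RED_HAT = {"gender": "men's", "color": "red", "type": "hat"}
WOMENS_RED_SHIRT = {"gender": "women's", "color": "red", "type": "shirt"}


def similarity(a: Dict[str, str], b: Dict[str, str],
               weights: Dict[str, int] = WEIGHTS) -> int:
    """Sum the weights of every criterion category on which the two clusters agree."""
    return sum(
        weight
        for attr, weight in weights.items()
        if a.get(attr) is not None and a.get(attr) == b.get(attr)
    )


hat_score = similarity(REFERENCE, MENS_RED_HAT)        # gender + color agree
shirt_score = similarity(REFERENCE, WOMENS_RED_SHIRT)  # color + type agree
```

Swapping the gender and type weights reverses the ranking, so the choice of weighting factors directly determines which cluster is treated as more similar.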
Furthermore, an individual item may be designated as similar to another individual item in the same manner as in the designation of similar clusters. All similarity features and operations described herein as applied to clusters may be applied to individual items. For example, descriptions of multiple items may be processed to determine similarity scores relative to a description of a reference item (e.g.,
As used herein, the term “related” refers to a cluster-based relationship existing between or among items. Two or more items are related to each other if the items are associated together by a direct or indirect cluster relationship. For example, as shown in
The cluster dictionary 240 is used to assign one or more CIDs to an item, based on one or more query expressions (e.g., query expression 242) contained in the cluster dictionary 240. For example, when receiving an item into an inventory of items, a description of the item may be processed to identify a match between a keyword in the description and a criterion of the query expression 242. If a match is identified, the corresponding CID 244 is assigned to the item, and the item becomes a member of that cluster (e.g.,
Query expressions (e.g., query expression 242) may be of any length, and their criteria may overlap. As a result, clusters may be of any size and may include other clusters (e.g., sub-clusters). Accordingly, the cluster dictionary 240 may define a hierarchy or heterarchy of clusters that includes one or more parent clusters and one or more child clusters. A hierarchy of clusters or a heterarchy of clusters may have any level of sophistication or complexity.
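As a rough illustration of how overlapping criteria can give rise to parent and child clusters, the sketch below assumes purely conjunctive query expressions, each represented as a set of keywords. Under that assumption, a cluster whose criteria are a proper subset of another cluster's criteria is less restrictive and can be treated as a parent of the other. The CIDs other than "1000", the keyword sets, and the helper names are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical cluster dictionary: CID -> criteria of a conjunctive query expression.
CLUSTERS: Dict[str, FrozenSet[str]] = {
    "10": frozenset({"shirt"}),
    "20": frozenset({"men's"}),
    "100": frozenset({"red", "shirt"}),
    "1000": frozenset({"men's", "red", "shirt"}),
    "2000": frozenset({"men's", "red", "hat"}),
}


def parents_of(cid: str) -> List[str]:
    """CIDs whose criteria are a proper subset of this cluster's criteria."""
    criteria = CLUSTERS[cid]
    return sorted(other for other, c in CLUSTERS.items() if c < criteria)


def children_of(cid: str) -> List[str]:
    """CIDs whose criteria are a proper superset of this cluster's criteria."""
    criteria = CLUSTERS[cid]
    return sorted(other for other, c in CLUSTERS.items() if c > criteria)
```

Here cluster "1000" has parents on two unrelated branches ("shirt" and "men's"), which is the kind of arrangement a heterarchy permits and a strict tree does not.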
In some example embodiments, the query expression 242 is a predefined query expression received from a server machine or an administrator of a network-based merchandising system. For example, a database of predefined query expressions may be maintained on a server machine within a network-based merchandising system, and periodic updates of this database may be received from the server machine. In certain example embodiments, the query expression 242 is a user-defined query expression received from a user of the network-based merchandising system. For example, the network-based merchandising system may allow users to create their own clusters by submitting their own query expressions.
The data structure 332 contains a set of associated data fields that associate an item 334 (e.g.,
Multiple sets of associated data fields may be contained within the data structure 332. Alternatively, multiple data structures may be contained within the item-cluster database 330. In either case, the item-cluster database 330 allows identification of one or more CIDs, given an item. Conversely, the item-cluster database 330 also allows identification of one or more items, given a CID. For example, such identification may be performed via a lookup operation.
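The two lookup directions can be illustrated with a pair of in-memory indexes kept in step with each other. This is a sketch under stated assumptions, not the structure of the item-cluster database 330 itself: the class name `ItemClusterDatabase`, its methods, and the item identifiers are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class ItemClusterDatabase:
    """Hypothetical bidirectional index between item identifiers and CIDs."""

    def __init__(self) -> None:
        self._cids_by_item: DefaultDict[str, Set[str]] = defaultdict(set)
        self._items_by_cid: DefaultDict[str, Set[str]] = defaultdict(set)

    def associate(self, item_id: str, cid: str) -> None:
        """Record one item-to-CID association in both indexes."""
        self._cids_by_item[item_id].add(cid)
        self._items_by_cid[cid].add(item_id)

    def cids_for(self, item_id: str) -> Set[str]:
        """Given an item, return the CIDs associated with it (possibly none)."""
        return set(self._cids_by_item.get(item_id, ()))

    def items_for(self, cid: str) -> Set[str]:
        """Given a CID, return the items associated with it (possibly none)."""
        return set(self._items_by_cid.get(cid, ()))


db = ItemClusterDatabase()
db.associate("item-1", "1000")
db.associate("item-2", "1000")
db.associate("item-1", "2000")
```

Maintaining both indexes trades some memory for constant-time lookups in either direction; a relational table with indexes on both columns would serve the same purpose.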
Operation 410 involves receiving a query expression (e.g.,
Operation 420 involves receiving an item (e.g.,
Operation 430 involves determining that the item (e.g.,
Operation 440 involves generating a data structure (e.g.,
Operation 450 involves accessing a CID (e.g.,
Operation 460 involves accessing a data structure (e.g.,
Operation 470 involves identifying another item (e.g.,
Since the initial item (e.g.,
Determination of the subset may be based on a hierarchy or heterarchy of CIDs, one or more lengths of one or more query expressions, one or more numbers of criteria in one or more query expressions, or any combination thereof. For example, using a hierarchy or heterarchy of CIDs may involve limiting the subset to certain children (or grandchildren, great-grandchildren, etc.) CIDs, ignoring one or more parent CIDs, or any combination thereof. As another example, the subset may be limited to CIDs with query expressions above a specified length. As a further example, the subset may be limited to CIDs with query expressions having more than a specified number of criteria.
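The subset determination described above could be sketched as a simple filter over candidate CIDs. The sketch assumes conjunctive query expressions represented as keyword sets, so that the number of criteria is the set size and a parent is any candidate whose criteria are a proper subset of another candidate's criteria. The function `select_subset`, its parameters, and the CIDs other than "1000" are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical criteria for three candidate CIDs associated with one item.
CRITERIA: Dict[str, FrozenSet[str]] = {
    "10": frozenset({"shirt"}),
    "100": frozenset({"red", "shirt"}),
    "1000": frozenset({"men's", "red", "shirt"}),
}


def select_subset(candidate_cids: List[str],
                  criteria: Dict[str, FrozenSet[str]],
                  more_than: int = 0,
                  ignore_parents: bool = False) -> List[str]:
    """Keep CIDs with more than `more_than` criteria, optionally dropping parent CIDs."""
    kept = [cid for cid in candidate_cids if len(criteria[cid]) > more_than]
    if ignore_parents:
        # A CID counts as a parent if a remaining CID has strictly more restrictive criteria.
        kept = [
            cid for cid in kept
            if not any(criteria[cid] < criteria[other] for other in kept)
        ]
    return kept
```

With these assumptions, requiring more than one criterion drops the broad "shirt" cluster, and ignoring parents leaves only the most specific cluster, which tends to keep the related items closer to the initial item.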
Operation 480 involves presenting the other item (e.g.,
In some example embodiments, presentation of the other item (e.g.,
Operation 510 involves receiving a query expression (e.g.,
Operation 520 involves receiving a first item (e.g.,
Operation 530 involves determining that the first item (e.g.,
Operation 540 involves generating a data structure (e.g.,
Operation 550 involves accessing the first CID (e.g.,
Operation 560 involves accessing a data structure (e.g.,
Operation 570 involves identifying the second item (e.g.,
Since the first item (e.g.,
Determination of the subset may be based on a hierarchy or heterarchy of CIDs, one or more lengths of one or more query expressions, one or more numbers of criteria in one or more query expressions, a similarity score calculated based on a weighting factor corresponding to one or more query expression criteria, or any combination thereof. For example, using a hierarchy or heterarchy of CIDs may involve limiting the subset to certain children (or grandchildren, great-grandchildren, etc.) CIDs, ignoring one or more parent CIDs, or any combination thereof. As another example, the subset may be limited to CIDs with query expressions above a specified length. As a further example, the subset may be limited to CIDs with query expressions having more than a specified number of criteria. Additionally, a similarity score may be calculated based on mathematical weighting factors applied to various criteria in the query expressions of the relevant clusters. For example, if the first CID corresponds to a cluster defined by the query expression “men's and red and shirt,” a cluster defined by the query expression “men's and red and hat” may receive a greater similarity score than a cluster defined by the query expression “women's and red and shirt,” on the basis of giving more weight to similarities in wearer's gender than to similarities in clothing type.
Operation 580 involves presenting the second item (e.g.,
In some example embodiments, presentation of the second item (e.g.,
The hardware apparatus 610 includes an access module 612, an identification module 614, a presentation module 616, an intake module 617, and a network interface 619. The hardware apparatus 610 may be a computer system that implements the access module 612, the identification module 614, the presentation module 616, or any combination thereof, in hardware within the computer system.
The access module 612 is configured to access a CID (e.g.,
In various example embodiments, the access module 612 is configured to receive one or more query expressions (e.g.,
The identification module 614 is configured to identify the second item (e.g.,
In some example embodiments, the identification module 614 is further configured to perform the identification of the second item (e.g.,
In various example embodiments, the identification module 614 is configured to perform the identifying of the second item (e.g.,
The presentation module 616 is configured to present the second item (e.g.,
The intake module 617 is configured to receive the first item (e.g.,
The network interface 619 may be any network interface able to communicatively couple the hardware apparatus 610 with the network 620. The network 620 may be any wired or wireless network. For example, the network 620 may be a local area network, a wide area network, the Internet, or any combination thereof.
Computer system 700 includes processor 702 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), application specific integrated circuits (ASICs), radio-frequency integrated circuits (RFICs), or any combination of these), main memory 704, and static memory 706, which communicate with each other via bus 708. Computer system 700 may further include graphics display unit 710 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)). Computer system 700 may also include alphanumeric input device 712 (e.g., a keyboard), cursor control device 714 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), storage unit 716, signal generation device 718 (e.g., a speaker), and network interface device 720.
Storage unit 716 includes a machine-readable medium 722 on which are stored instructions 724 (e.g., software) embodying any one or more of the methodologies or functions described herein. Instructions 724 (e.g., software) may also reside, completely or at least partially, within main memory 704 and/or within processor 702 (e.g., within a processor's cache memory) during execution thereof by computer system 700, main memory 704 and processor 702 also constituting machine-readable media. Instructions 724 (e.g., software) may be transmitted or received over network 726 via network interface device 720.
As used herein, the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory, read-only memory, buffer memory, flash memory, and cache memory. While machine-readable medium 722 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) able to store instructions (e.g., instructions 724). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 724) for execution by the machine and that cause the machine to perform any one or more of the methodologies described herein. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client, or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner and/or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connects the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment, or a server farm), while in other embodiments the processors may be distributed across a number of locations.
The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive or, unless specifically stated otherwise.