1. Field of Art
The disclosure generally relates to the field of recommender systems, and more specifically, the selection of specific items from a corpus of geotagged recommender content.
2. Description of the Related Art
Global Positioning System (GPS) enabled mobile devices with either Text-to-Speech (TTS) or media player functionality are giving individuals the power to mediate reality in new and interesting ways. Simultaneously, the growing availability of geocoded or geotagged items, including, but not limited to, short messages, news articles, blog posts, encyclopedia entries, and telemetric data, is creating a virtual landscape of incredible size and scope. There is a need to integrate these elements so that people can passively receive information about relevant activities, promotions, thoughts, feelings, and experiences associated with localities.
The disclosed embodiments have other advantages and features that will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.
The Figures (FIGS.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
A system, a method and a computer program product are disclosed for recommending geotagged items collected from one or more digital sources. A location fix and heading obtained from a mobile device, along with historical listening preferences and histories of end-users, are utilized to select a relevant item to be presented to an end-user.
One aspect of the disclosed recommender is that it does not adopt the filtering approach commonly employed by many content-based recommender systems. Under the filtering approach, a one-dimensional stream of candidate items is input into a recommender and, for each item, the probability of a given end-user liking the candidate item is calculated. Candidate items with a probability that exceeds a certain threshold are then recommended to the end-user, often as a list on a two-dimensional display device. An example of the filtering approach is Bayesian spam filtering, wherein the recommender monitors an end-user's incoming emails and routes them to either a junk folder or inbox folder.
In contrast to conventional systems, a disclosed recommender uses a funneling approach. The recommender takes in a two-dimensional (spatial) or three-dimensional (spatiotemporal) collection of candidate items, calculates emission probabilities for them in the context of a given end-user, and then uses these probabilities to recommend a candidate item for presentation to the end-user. Repeated, or iterative, application of the funneling approach yields a one-dimensional stream of relevant items as the end-user moves either physically or virtually through space-time. One advantage of this approach is that it minimizes interruptions or “dead air” since the recommender will continue to present items to the end user as long as previously un-presented items are available within the region the end-user is exploring.
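By way of illustration only, the contrast between the filtering approach and the funneling approach might be sketched as follows. The function names, item names, and scores are hypothetical and are not taken from the disclosure; the sketch assumes that emission probabilities have already been computed for each candidate item.

```python
from typing import Dict, List, Optional, Set


def filter_items(scores: Dict[str, float], threshold: float) -> List[str]:
    """Filtering: keep every candidate whose score exceeds the threshold."""
    return [item for item, p in scores.items() if p > threshold]


def funnel_next(scores: Dict[str, float], presented: Set[str]) -> Optional[str]:
    """Funneling: pick the single best candidate not yet presented."""
    remaining = {item: p for item, p in scores.items() if item not in presented}
    if not remaining:
        return None  # nothing un-presented is left in the region
    return max(remaining, key=remaining.get)


# Hypothetical emission probabilities for three candidate items.
scores = {"cafe_review": 0.20, "museum_entry": 0.50, "street_news": 0.30}

# Iterating the funnel yields a one-dimensional stream of items.
presented: Set[str] = set()
stream: List[str] = []
while (item := funnel_next(scores, presented)) is not None:
    presented.add(item)
    stream.append(item)
```

With a threshold of 0.6 the filter in this sketch would recommend nothing, whereas the funnel keeps presenting items until every candidate has been presented.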
Another aspect of the disclosed recommender relates to how feedback is handled. Typically, a content-based recommender system makes recommendations to an end-user based on items the end-user has previously liked. Getting feedback from an end-user is a slow process, however, so existing content-based recommenders have typically suffered from a “cold-start” problem. That is, they tend to make poor recommendations until sufficiently customized via feedback. When an end-user is primarily restricted to supplying feedback through a mobile device, the “cold-start” problem is exacerbated. For example, it may be dangerous, or even illegal, to provide feedback while in certain situations, such as while driving. The disclosed recommender is not solely dependent on feedback; rather, it is a hybrid system that is composed of learning components as well as components that model facets of human attention in order to make extrapolations about an end-user's interests.
Still another aspect of the disclosed recommender is its incorporation of social information. A content-based recommender system, by definition, makes recommendations based on the content, or features, of the items under its consideration. A collaborative recommender system, by contrast, makes use of the opinions of other end-users when formulating recommendations. The disclosed recommender captures some of the benefits of collaborative recommenders by preprocessing social information into features (e.g., popularity, sponsorship, etc.) of the items themselves.
Referring now to
In the illustrated embodiment of
Still referring to the illustrated embodiment of
Again referring to the illustrated embodiment of
Referring back to
During playback, and for a short time thereafter, the mobile device 102 solicits feedback for the recommended item. If feedback is given, the mobile device 102 sends it to the recommendation server 106, along with the user ID and the identity of the item. The recommendation server 106 records the feedback as an event in the event database 110. In the illustrated embodiment, it is only possible to give positive feedback (e.g., a “LIKED”) on a recommended item. Alternative embodiments permit negative feedback (e.g., a “DISLIKED”), and numerical feedback scores (e.g., scoring a recommended item on a scale from one to five). In further embodiments, information pertaining to the state of the mobile device 102 at the time of feedback (e.g., location fix, heading, speed, etc.) accompanies the data sent to the recommendation server 106.
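As a non-limiting sketch, a feedback event of the kind described above could be represented as a simple record. The class name, field names, and example values below are hypothetical; the disclosure does not specify a data format.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeedbackEvent:
    """Hypothetical record sent from the mobile device to the server."""
    user_id: str
    item_id: str
    kind: str = "LIKED"  # only positive feedback in the illustrated embodiment
    # Optional device state at the time of feedback (further embodiments).
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    heading_deg: Optional[float] = None
    speed_mps: Optional[float] = None


event = FeedbackEvent(user_id="user-1", item_id="item-42",
                      latitude=37.77, longitude=-122.42)
payload = asdict(event)  # plain dict, ready to be serialized and recorded
```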
The mobile device 102 shown in
Turning now to
The candidate item module 212 identifies candidate items stored in the item database 112 that are within the geographic area of interest. Candidate items are discussed in greater detail below, with reference to
Turning now to
Referring now to
Turning now to
Based on the above definitions, the probability of emitting candidate item j for feature k is given by the following probability mass function:
$$f_Y(j; \theta_k) = \theta_{jk}$$
Furthermore, assuming that the random variable X is a mixture of the preceding K categorical random variables, the probability of emitting candidate item j given the entire model of features is:
$$f_X(j) = \sum_{k=1}^{K} a_k \theta_{jk}$$
where the mixture weights (ak) obey unit-simplex constraints (each weight is non-negative and the weights sum to one). The rest of the process 500 is concerned with estimating the m·K emission parameters and using them to recommend candidate items. The process 500 proceeds by calculating vectors of counts from actual or simulated data for each feature:
$$n_k = [n_{1k}, n_{2k}, \ldots, n_{mk}]^T$$
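For purposes of illustration, the mixture of categorical features described above might be evaluated as in the following sketch. The helper names and the numerical values (three candidate items, two features) are assumptions made for the example; each column of `theta` holds the emission parameters of one feature and sums to one.

```python
from typing import List


def check_simplex(weights: List[float], tol: float = 1e-9) -> None:
    """Raise if the mixture weights violate the unit-simplex constraints."""
    if any(a < 0 for a in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("mixture weights must be non-negative and sum to one")


def mixture_probability(theta: List[List[float]],
                        weights: List[float], j: int) -> float:
    """Probability of emitting item j: sum over k of a_k * theta[j][k]."""
    return sum(a_k * theta[j][k] for k, a_k in enumerate(weights))


# Hypothetical emission parameters: rows are items (m = 3), columns are
# features (K = 2).
theta = [[0.5, 0.2],
         [0.3, 0.3],
         [0.2, 0.5]]
weights = [0.5, 0.5]

check_simplex(weights)
probs = [mixture_probability(theta, weights, j) for j in range(len(theta))]
```

Because every column of `theta` and the weight vector each sum to one, the resulting mixture probabilities also sum to one.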
A location fix 602 and heading 604 are obtained 503 from a mobile device, an example of which is illustrated by
Referring again to
In
$$B \sim \mathrm{Unif}(a, b)$$
B is defined as the angle from the location fix 602 to the point being generated, and it is specified in degrees East of true North. Hence, the uniform distribution is parameterized with a=0 and b=360.
Next, there is an independent draw from an exponential distribution of scaling factors:
$$S \sim \mathrm{Exp}(\lambda)$$
and distance (D) is then computed as:
$$D = r \cdot S$$
where r is the radius of the given geographic area of interest 606. The exponential distribution is parameterized with λ=6.60775 in the illustrated embodiment, since this value assures that more than 99% of the generated points lie within the area of interest 606. Other parameter values are possible; however, care must be taken to prevent too many generated points from falling outside the area of interest 606. In alternative embodiments, different distance decay models are used, e.g., a uniform distribution between the location fix 602 and the outside edge of the area of interest 606, or the probability of a point being generated at D being inversely proportional to $D^2$.
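As an illustrative sketch only, the bearing and distance draws described above could be generated with the Python standard library as shown below. The function name, the radius of 1000 meters, and the random seed are assumptions; the value of λ is the one given for the illustrated embodiment. The fraction of points expected to fall inside the area of interest follows from the exponential distribution as P(S ≤ 1) = 1 − exp(−λ).

```python
import math
import random
from typing import Tuple

LAMBDA = 6.60775  # rate parameter from the illustrated embodiment


def sample_bearing_and_distance(radius: float,
                                rng: random.Random) -> Tuple[float, float]:
    """Draw B ~ Unif(0, 360) and D = r * S with S ~ Exp(LAMBDA)."""
    bearing_deg = rng.uniform(0.0, 360.0)  # degrees East of true North
    scale = rng.expovariate(LAMBDA)        # expovariate takes the rate
    return bearing_deg, radius * scale


# Analytical share of points with D <= r, i.e. P(S <= 1).
fraction_inside = 1.0 - math.exp(-LAMBDA)

rng = random.Random(0)   # fixed seed, for reproducibility of the sketch
radius_m = 1000.0        # hypothetical radius of the area of interest
draws = [sample_bearing_and_distance(radius_m, rng) for _ in range(10000)]
```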
Given the location fix 602 and values for B and D, there is enough information to finish generating a point. This is accomplished using methods known in the art; for example, as in Williams, E. (2010), “Lat/lon given radial and distance”, Aviation Formulary V1.45, retrieved from http://williams.best.vwh.net/avform.htm.
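One possible way to finish generating a point is sketched below using the widely known great-circle destination formula on a spherical Earth. This is an assumption made for illustration and is not necessarily the exact procedure of the cited formulary; in particular, the sketch treats East longitudes as positive, which may differ from the sign convention used in that reference. The function name and the Earth radius constant are hypothetical choices.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)


def destination_point(lat_deg: float, lon_deg: float,
                      bearing_deg: float, distance_m: float) -> Tuple[float, float]:
    """Return (lat, lon) in degrees reached from a start point.

    bearing_deg is in degrees East of true North; East longitudes are
    positive in this sketch.
    """
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    tc = math.radians(bearing_deg)
    d = distance_m / EARTH_RADIUS_M  # angular distance in radians

    sin_lat2 = (math.sin(lat1) * math.cos(d)
                + math.cos(lat1) * math.sin(d) * math.cos(tc))
    lat2 = math.asin(max(-1.0, min(1.0, sin_lat2)))  # clamp rounding error
    lon2 = lon1 + math.atan2(math.sin(tc) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    lon2 = (lon2 + 3.0 * math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-180, 180)
    return math.degrees(lat2), math.degrees(lon2)


# One degree of arc along the equator or a meridian, in meters.
one_degree_m = EARTH_RADIUS_M * math.radians(1.0)
north = destination_point(0.0, 0.0, 0.0, one_degree_m)
east = destination_point(0.0, 0.0, 90.0, one_degree_m)
```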
Returning to
i.e., for each candidate item j, there is a count of the number of generated points that are closer to it than to any other candidate item, and the sum of those counts will be exactly equal to the total number of generated points, N1.
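The counting step described above may be sketched as follows. The use of the haversine great-circle distance is an assumption made for the example (the disclosure notes that different embodiments determine the nearest candidate item in different ways), and the function names and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6371000.0   # mean Earth radius (spherical approximation)


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in meters between two points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest_counts(points: List[Point], candidates: List[Point]) -> List[int]:
    """Count, for each candidate, the generated points nearest to it."""
    counts = [0] * len(candidates)
    for p in points:
        nearest = min(range(len(candidates)),
                      key=lambda j: haversine_m(p, candidates[j]))
        counts[nearest] += 1
    return counts


# Hypothetical candidate items and generated points along the equator.
candidates = [(0.0, 0.0), (0.0, 1.0)]
points = [(0.0, 0.1), (0.0, 0.2), (0.0, 0.9)]
counts = nearest_counts(points, candidates)
```

Each generated point contributes to exactly one candidate item, so the counts sum to the number of generated points.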
Referring back to
In
$$X \sim N(\mu, \Sigma)$$
A value for X, in conjunction with the given location fix 602, can be converted into a latitude-longitude pair via the methods of Williams (2010). More specifically, if:
$$X = [X_1, X_2]^T$$
then B, in radians measured clockwise from the x-axis, is computed as:
$$B = \operatorname{atan2}(X_2, X_1)$$
Distance is computed as the length or magnitude of the random vector. Assuming a is a heading specified in radians measured clockwise from the x-axis, not degrees East of true North, and r is the radius of the geographic area of interest, then the parameters of the bivariate normal distribution are:
The above described model and parameter definitions were chosen such that the N2 points are clustered around a point, at which the mobile device 102 is likely to be in the near future, based on the location fix 602 and heading 604. This can be seen from
In further embodiments, data from a geographic (i.e., map) database is used to supplement the location fix and heading data when generating points 504 and 508. For example, in one such embodiment points are favorably generated in regions with a direct road connection to the location fix, and points are less likely to be generated in inaccessible areas such as lakes.
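For illustration, the conversion of a bivariate normal draw into an angle and a distance, as described above for the second set of points, might be sketched as follows. The mean vector and covariance matrix passed in the example are placeholders and are not the parameters of the illustrated embodiment; the function names are hypothetical. The sampler uses a hand-written Cholesky factor of the 2×2 covariance matrix so that only the standard library is required.

```python
import math
import random
from typing import List, Tuple


def sample_bivariate_normal(mean: Tuple[float, float],
                            cov: List[List[float]],
                            rng: random.Random) -> Tuple[float, float]:
    """Draw X ~ N(mean, cov) using a 2x2 Cholesky factorization."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


def to_angle_and_distance(x: Tuple[float, float]) -> Tuple[float, float]:
    """Return B = atan2(X2, X1) in radians and the magnitude of X."""
    return math.atan2(x[1], x[0]), math.hypot(x[0], x[1])


rng = random.Random(1)
# Placeholder parameters, chosen only to exercise the code.
x = sample_bivariate_normal((100.0, 0.0), [[400.0, 0.0], [0.0, 400.0]], rng)
angle, distance = to_angle_and_distance(x)
```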
Referring again to
As described earlier with regard to the set of N1 points, in different embodiments, different methods are used to determine which candidate item a given point is nearest to.
Candidate items are ranked 512 according to recency. For example, the candidate items are ordered from oldest to newest in terms of when they were created and then, in the simplest case, they are assigned values 1, 2, . . . , m. Thus:
Still referring to
Continuing to refer to
A numerical value indicating consistency of a specific candidate item relative to candidate items that have been previously presented to the user is computed 518. In the illustrated embodiment, this is achieved by counting the total number of records in the event database indicating that items “similar” (as defined previously) to the candidate item have been PLAYED to the user. The result is that:
In the illustrated embodiment, there is no consideration of when items were LIKED or PLAYED when computing affinity and consistency (516 and 518). In an alternative embodiment, only events that occurred within a certain timeframe (e.g., the last month) are considered. In another embodiment, recent events are weighted to count more than older events.
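The three counting variants described above (all events, events within a timeframe, and recency-weighted events) could be sketched with a single helper, as shown below. The exponential half-life weighting is an assumption made for the example, since the disclosure states only that recent events count more than older events. The events are assumed to have already been restricted to items similar to the candidate item; all names and values are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Event = Tuple[str, float]  # (event type, age of the event in days)


def weighted_count(events: Iterable[Event], event_type: str,
                   max_age_days: Optional[float] = None,
                   half_life_days: Optional[float] = None) -> float:
    """Count events of one type, optionally windowed or decayed by age."""
    total = 0.0
    for kind, age in events:
        if kind != event_type:
            continue
        if max_age_days is not None and age > max_age_days:
            continue  # outside the timeframe of interest
        # Assumed decay: an event loses half its weight every half-life.
        total += 1.0 if half_life_days is None else 0.5 ** (age / half_life_days)
    return total


# Hypothetical events for items similar to one candidate item.
events = [("LIKED", 2.0), ("LIKED", 45.0), ("PLAYED", 10.0), ("LIKED", 30.0)]

affinity_all = weighted_count(events, "LIKED")
affinity_last_month = weighted_count(events, "LIKED", max_age_days=30.0)
affinity_decayed = weighted_count(events, "LIKED", half_life_days=30.0)
consistency_all = weighted_count(events, "PLAYED")
```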
Once the complete set of vectors of counts for all features and all candidate items have been determined, vectors of estimated emission parameters are computed 520. The vector of estimated emission parameters associated with feature k is computed 520 as follows:
This represents a smoothed maximum likelihood estimate (MLE) for θk, utilizing Lidstone smoothing. This method is described in further detail in the eNotes article “Rule of succession” (http://www.enotes.com/topic/Rule_of_succession), which is incorporated herein by reference in its entirety. In other embodiments, other smoothing methods are used.
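A minimal sketch of Lidstone smoothing applied to one vector of counts is given below. The general form adds a pseudocount α to every count before normalizing; the value of α used here (α = 1, which corresponds to the rule of succession) is an assumption, and the function name and counts are hypothetical.

```python
from typing import List


def lidstone_estimate(counts: List[float], alpha: float = 1.0) -> List[float]:
    """Smoothed estimate: (n_j + alpha) / (sum of counts + alpha * m)."""
    m = len(counts)
    total = sum(counts)
    return [(n + alpha) / (total + alpha * m) for n in counts]


# Hypothetical counts for one feature over m = 3 candidate items.
theta_hat = lidstone_estimate([3, 0, 1])
```

A candidate item with a zero count still receives a non-zero estimated emission parameter, so no candidate item is ruled out by a single feature.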
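Once all estimated emission parameters have been computed 520, a candidate item for recommendation (j*) is selected 522, in accordance with the following rule:
$$j^* = \arg\max_{j \in \{1, \ldots, m\}} f_X(j)$$
where the emission probability is evaluated using the estimated emission parameters.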
In the illustrated embodiment, the mixture weights are of the form:
$$a_k = \frac{1}{6}, \quad k = 1, \ldots, 6$$
meaning that all six features are equally weighted.
In other embodiments, the mixture weights are algorithmically learned from data, set by the end-user via Graphical User Interface (GUI) elements (e.g., sliders) on the mobile device 102, or otherwise determined by appropriate methods known in the art. Although the basic rule, as used in the illustrated embodiment, is to recommend the candidate item with the highest emission probability, other recommendation strategies may be adopted. For instance, candidate items with arbitrarily high emission probabilities may be selected for recommendation. In order to add stochasticity to the process 500, in some embodiments a candidate item is sampled from the mixture distribution and presented as the recommended item. In one such embodiment, a candidate item is selected in response to one or more randomly generated numbers, with candidate items that yield higher values for $f_X(j)$ being more likely to be selected. Regardless of how the recommended item or recommended items are chosen, the process 500 terminates 524 and returns the selected candidate item or items.
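The two selection strategies described above, choosing the candidate item with the highest emission probability and sampling a candidate item from the mixture distribution, may be sketched as follows. The function names, the probabilities, and the seed are hypothetical.

```python
import random
from typing import List


def select_argmax(probs: List[float]) -> int:
    """Index of the candidate item with the highest emission probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def select_sampled(probs: List[float], rng: random.Random) -> int:
    """Sample an index; higher-probability items are more likely to be drawn."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical mixture probabilities for three candidate items.
probs = [0.2, 0.5, 0.3]
best = select_argmax(probs)

rng = random.Random(7)  # fixed seed, for reproducibility of the sketch
sampled = select_sampled(probs, rng)
```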
In some embodiments, the above probability calculations are done in logarithmic space in order to ensure numerical stability. In such embodiments where logarithmic space is used, the final summation should be done as a log sum of exponentials.
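A sketch of the log sum of exponentials mentioned above is given below. Subtracting the largest term before exponentiating is the standard way of keeping the computation stable; the function names and example values are hypothetical.

```python
import math
from typing import List


def log_sum_exp(values: List[float]) -> float:
    """Compute log(sum(exp(v))) without overflow or underflow."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf  # every term has zero probability
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_mixture(log_weights: List[float], log_thetas: List[float]) -> float:
    """Log of sum_k a_k * theta_jk, given log a_k and log theta_jk."""
    return log_sum_exp([lw + lt for lw, lt in zip(log_weights, log_thetas)])


# Two equally weighted features with emission parameters 0.2 and 0.4.
log_p = log_mixture([math.log(0.5), math.log(0.5)],
                    [math.log(0.2), math.log(0.4)])

# Terms this small would underflow to zero if exponentiated directly.
tiny = log_sum_exp([-1000.0, -1000.0])
```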
Although the illustrated embodiment enumerates six features, alternative embodiments using more, fewer, or different features are possible, as should be apparent to one of skill in the art. Additional features considered in further embodiments include, but are not limited to, ranking candidate items by source quality, computing the affinity of a group for a candidate item, and the inclusion of sponsorship as a feature, wherein the amount of advertising money spent on a candidate item influences its recommendation likelihood.
The system described has several advantages over other recommender systems. The use of a funneling approach enables a continuous stream of relevant recommendations to be provided, without any potential recommendations having to be considered and blocked. The system makes intelligent “guesses,” even when the amount of feedback data is very limited. As a result, it does not require a long calibration period before it becomes useful to the end user. In some embodiments, the system is configured such that items determined to be of low relevance are still presented to the end user. If the determination was incorrect, and the user in fact likes the recommendation, the parameters and weightings used to select items in future are updated. In other embodiments, both the end user and the system provider can update these parameters and weightings manually in real time, giving a recommender system a great deal of versatility.
Turning next to
The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions 824 (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 824 to perform any one or more of the methodologies discussed herein.
The example computer system 800 includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 804, and a static memory 806, which are configured to communicate with each other via a bus 808. The computer system 800 may further include a graphics display unit 810 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)). The computer system 800 may also include an alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 816, a signal generation device 818 (e.g., a speaker), and a network interface device 820, which also are configured to communicate via the bus 808.
The storage unit 816 includes a machine-readable medium 822 on which are stored instructions 824 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 824 (e.g., software) may also reside, completely or at least partially, within the main memory 804 or within the processor 802 (e.g., within a processor's cache memory) during execution thereof by the computer system 800, the main memory 804 and the processor 802 also constituting machine-readable media. The instructions 824 (e.g., software) may be transmitted or received over a network 826 via the network interface device 820.
While machine-readable medium 822 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 824). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 824) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms, all of which may be implemented using software (e.g., code embodied on a machine-readable medium or in a transmission signal), hardware modules, firmware, or a combination thereof, e.g., as described with
In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor, e.g., 802, or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
The various operations of example methods described herein may be performed, at least partially, by one or more processors, e.g., 802, that are temporarily configured (e.g., by software instructions, e.g., 824) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
The one or more processors, e.g., 802, may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., computer memory 804). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm,” e.g., as described with FIGS. 5 and 6A-C, is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.
As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. For example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
In addition, the articles “a” and “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one, and the singular also includes the plural unless it is obvious that it is meant otherwise.
Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for recommending geotagged items in the vicinity of an end user, based on a plurality of factors, through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
This application claims the benefit of U.S. Provisional Application No. 61/436,571, filed Jan. 26, 2011, which is incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61436571 | Jan 2011 | US