Marketing of products entails balancing the valuations placed on items by a buyer against the prices (or, more generally, contracts) set by the seller. In general, if the buyer's valuation of an item is greater than the price of the item, then the sale is likely to occur; conversely, if the buyer's valuation is less than the price of the item then no sale is likely.
More quantitatively, a “buyer's surplus” from a purchase can be identified as the difference between the buyer's valuation of the item and the sale price. A “seller's profit” is determined by the sale price minus the cost to the seller in obtaining and marketing the item. The seller would like to sell the item at the highest price possible, since this maximizes the seller's revenue; however, if the price is set too high, and more particularly above the buyer's valuation, then no sale will occur and no revenue is obtained. Further complicating the situation is that in the case of multiple buyers, different buyers may place different values on the item, or an individual buyer may have different valuations at different times.
In existing approaches, the seller sets the price for an item based on the seller's past experience or other available information, and then adjusts the price upward or downward over time based on sales. For example, if an unexpectedly low number of sales is achieved at the initial price, the seller may lower the price in order to encourage additional sales. Conversely, if sales are brisk at the initial price then the seller may try increasing the price. If sales volume is maintained (or decreases only slightly) at the higher price, then the seller's revenue is increased. These approaches are sometimes referred to as censored price experiment (CPE) approaches. The seller estimates the distribution of buyers' valuations from censored observations (that is, observations that the valuation is greater than the price or that the valuation is less than the price; more generally a censored observation is one that is only known to come from some set).
Such approaches have numerous disadvantages, such as being relatively slow and imprecise. For example, if the price is increased by 20% and sales remain brisk, the seller does not know whether or not a further 10% price increase would also be acceptable to buyers, and can only determine this information by further price experimentation, which takes more time. Lost seller's revenue during the slow price optimization can be substantial. Frequent manipulation of pricing can also be problematic, since it can annoy buyers. If the price is set too high then buyers may go elsewhere, and may not return to the seller even if the seller later lowers the price. It is also possible for past price adjustments to affect the current price “experiment”. For example, if the seller frequently changes the price, then a buyer may learn to wait for a low price point in the buyer-perceived pricing “cycle” before making a purchase. Slow adjustment of price over time can also fail to identify more rapid market changes that may modify the optimal price. For example, seasonal variations in demand may not be detected, and as a result the price may be set too high (or too low) at certain times of the year.
Other approaches have been attempted for price optimization. These approaches typically are variants of the price adjustment scheme, sometimes under different nomenclature. For example, in the automotive industry it is known to offer price rebates to encourage purchases. Such rebates are simply short-term price adjustments. The same problems arise. For example, at certain times American car buyers have come to expect certain automobile manufacturers to offer frequent rebates, and delay purchase until the next rebate offer.
Other approaches attempt to rely upon self-reporting by buyers. An extreme example of this is the “pay what you like” restaurant model. In this model, the buyer actually sets the price by being allowed to pay whatever the buyer believes the restaurant meal was worth. See, e.g. “Pay-what-you-like restaurants”, http://www.cnn.com/2008/TRAVEL/04/01/flex.payment/index.html (last accessed May 7, 2010). This model also is reliant upon honesty of the self-reporting, and additionally introduces a problematic self-interest factor into the self-reporting.
In some illustrative embodiments disclosed herein as illustrative examples, a method comprises: presenting a plurality of sale offers for at least one item for sale to one or more buyers, the sale offers including at least one non-deterministic sale offer having a sale price and non-deterministic consideration for the sale price; conducting selling activity including the presenting and further including at least one actual sale transacted in accordance with an accepted sale offer of the plurality of sale offers; receiving buyer decision data during the conducting of selling activity; and generating buyer valuation information for the at least one item for sale based on the buyer decision data.
In some illustrative embodiments disclosed herein as illustrative examples, a method comprises: presenting a plurality of offers to one or more offerees, the offers including at least one non-deterministic offer having non-deterministic consideration for the offeree; conducting business activity including the presenting and further including at least one actual business transaction executed in response to an acceptance by an offeree of one of the plurality of offers; receiving offeree decision data during the conducting of business activity; and generating valuation information based on the offeree decision data. In some such illustrative embodiments, the method further includes generating a new offer based on the generated valuation information and conducting additional business activity including presenting the new offer. In some such illustrative embodiments, the method is performed using n offeree folds wherein the generating of a new offer for an ith offeree fold is based on the generated valuation information for offeree folds other than the ith offeree fold.
In some illustrative embodiments disclosed herein as illustrative examples, a method comprises: clustering offerees into n folds based on information indicative of likelihood of offeree-offeree collusion; and conducting valuation learning for each fold using valuation information obtained solely from the other (n−1) folds.
In some illustrative embodiments disclosed herein as illustrative examples, a digital processor is configured to perform a method as set forth in any one of the three immediately preceding paragraphs. In some illustrative embodiments disclosed herein as illustrative examples, a storage medium stores instructions executable on a digital processor to perform a method as set forth in any one of the three immediately preceding paragraphs.
The inventors have analyzed the problem of learning optimal prices, and have developed the following. A substantial issue is the low rate of pricing information collected by conventional CPE approaches, and the low reliability of pricing information collected by “hopeful” approaches that rely upon buyer self-reporting of his or her valuation (and “hope” that the buyer is being truthful). What is desired is a pricing information collection approach that collects a large volume of pricing information in a way that filters out or biases against extreme buyer valuations.
To achieve robustness, pricing approaches disclosed herein enlarge the action space of valuation options available to buyers. However, one cannot simply provide buyers with different prices and ask the buyer to choose his or her price, as this would be susceptible to failure due to buyer dishonesty. Many (perhaps most) buyers would not choose the price that is closest to their valuation—they would simply choose the lowest offered price.
The problem can be viewed as a partially observed Markov Decision Problem (POMDP) which considers the state space to be a space of functions, known as beliefs. The pricing approaches disclosed herein consider the control space to be a space of functions, known as mechanisms or designs. Mechanisms correspond to menus of lotteries and variable lotteries. The term “lottery” as used herein refers to a non-deterministic sale offer having a sale price and non-deterministic consideration or allocation for the sale price. The consideration (or allocation) is broadly defined herein as “what is received in return for the sale price”. In a conventional purchase, the consideration is the item that is purchased. This is a deterministic consideration. Disclosed herein are techniques for employing lotteries in rapid and robust price optimization. In the case of a lottery, the consideration is non-deterministic in that what the buyer receives in return for the sale price is not fixed. For example, in an illustrative lottery the consideration comprises a chance (e.g., 60% chance) of receiving the item. In another illustrative lottery the consideration comprises receiving either item “A” or item “B” (e.g. a 60% chance of receiving item “A” and a 40% chance of receiving item “B”).
The lottery approach can be understood in terms of what is referred to herein as a “Schrödinger's Price Experiment” (SPE). A conventional censored price experiment (CPE) uses a sequence of prices. The SPE combines all these prices as a “superposition” to enable experimenting on the sequence of prices simultaneously or concurrently. The label “Schrödinger” analogizes to similar superposition experiments in quantum physics which also have non-deterministic outcomes. As already noted, however, such a superposition cannot be constructed simply by offering the buyer a set of different prices at each round, because many (possibly most) buyers would select the lowest price since that selection is in the buyer's self-interest.
However, if the lowest price only gave the item with some probability or in some other non-deterministic fashion—that is, if it corresponded to a lottery—then the buyer would select between different non-deterministic sale offers (that is, between different lotteries) without necessarily being biased by self-interest toward choosing the lowest sale price. Indeed, as disclosed herein the sale prices and probabilities for the lotteries can be selected in such a way that only buyers with specific valuations will rationally prefer a given lottery. As a consequence, the SPE learns faster than a CPE because it collects data for different price points simultaneously, yet it is otherwise equivalent.
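The segmentation induced by a menu of lotteries can be illustrated with a short sketch. The two-lottery menu below is invented for illustration (its prices and probabilities are not from the text): a rational buyer picks the offer maximizing expected surplus z·v − p, or declines, so the observed choice reveals the buyer's valuation segment.

```python
def expected_surplus(valuation, price, prob):
    """Expected buyer's surplus for a lottery: prob * valuation - price."""
    return prob * valuation - price

# Hypothetical menu: (label, sale price, probability of receiving the item).
menu = [("L1", 2.0, 0.5), ("L2", 7.0, 1.0)]

def best_offer(valuation):
    """Return the offer a rational buyer selects; declining is always
    available with zero surplus."""
    options = [("decline", 0.0)] + [
        (label, expected_surplus(valuation, p, z)) for label, p, z in menu
    ]
    return max(options, key=lambda t: t[1])[0]

# Low valuations decline, middle valuations take the lottery L1, and high
# valuations pay full price for certain receipt (L2).
for v in (3.0, 6.0, 12.0):
    print(v, best_offer(v))
```

With this hypothetical menu the indifference points fall at v = 4 (decline vs. L1) and v = 10 (L1 vs. L2), so the three possible choices partition valuations into three segments, which is the information the SPE harvests from a single round.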
With reference to
As a result, a rational buyer will find that one of the three sale offers L1, L2, S3 provides an optimal buyer's surplus as compared with the other sale offers. This is emphasized in
An illustrative example is given next for computing the sale price p and probability z for a case involving two lotteries, which is equivalent to two CPEs, namely a first CPE that sets a sale price v1 on n1 days, and a second CPE that sets a sale price v2 on n2 days. A corresponding SPE would include a first lottery having a sale price

p = v1·n1/n

for probability

z = n1/n

of receiving the item, and a second lottery having a sale price

(v1·n1 + v2·n2)/n

for probability 1 of receiving the item. In these expressions n:=n1+n2. Imagine that buyers have valuations lower than v1 with probability 1−q1−q2, valuations in [v1, v2) with probability q1, and valuations in [v2,∞) with probability q2. These probabilities are unknown to the seller. The expected profit (assuming zero seller cost, although a non-zero seller cost is readily accommodated by a shift in valuations) is (q1+q2)v1n1+q2v2n2 for the CPE, and is

n(q1·v1n1/n + q2·(v1n1+v2n2)/n) = q1v1n1 + q2(v1n1+v2n2)

for the SPE. It is readily verified that these are equivalent. However, in the case of the CPE, when the sale price v2 is rejected by a buyer, the CPE does not know if this was due to the buyer having a valuation that is lower than v1, or is due to the buyer having a valuation in the range [v1, v2). In contrast, the SPE distinguishes these cases. While the foregoing lottery example tests two valuations v1 and v2, extension to three, four, or more valuations is straightforward through the use of additional lotteries. In general, a lottery for a single-item problem includes a non-deterministic sale offer in which the non-deterministic consideration is a probabilistic consideration having a probability P of receiving the item and a probability (1−P) of not receiving the item.
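The claimed equivalence can be checked numerically. The sketch below assumes one SPE construction consistent with the two-CPE setup above: the first lottery has price v1·n1/n with probability n1/n of receiving the item, and the second has price (v1·n1 + v2·n2)/n with probability 1. All specific values of v1, v2, n1, n2, q1 and q2 are made up for illustration.

```python
# Illustrative check that an SPE menu reproduces the CPE expected profit.
v1, v2 = 5.0, 8.0      # the two prices tested by the CPE
n1, n2 = 3, 7          # days each CPE price is posted
q1, q2 = 0.25, 0.40    # P(valuation in [v1, v2)), P(valuation >= v2)
n = n1 + n2

# CPE: price v1 sells to both segments on n1 days; price v2 sells only
# to the high segment on n2 days (zero seller cost assumed).
profit_cpe = (q1 + q2) * v1 * n1 + q2 * v2 * n2

# SPE: on each of the n days, the middle segment takes lottery 1 and the
# high segment takes lottery 2.
p1 = v1 * n1 / n                 # price of lottery 1 (probability n1/n)
p2 = (v1 * n1 + v2 * n2) / n     # price of lottery 2 (probability 1)
profit_spe = n * (q1 * p1 + q2 * p2)

print(profit_cpe, profit_spe)  # both approximately 32.15
```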
The lottery approach can also address two, three, four, or more (in general, N) items for sale. Some of the N items may be mutually exclusive such that purchase of one item precludes purchase of another item. For example, if the items for sale are different brands of toothpaste, then (at least in many cases) the purchase of one brand of toothpaste will preclude the buyer from purchasing another brand of toothpaste at the same time. In the case of N such “mutually exclusive” items (where N is at least two), the non-deterministic consideration of the non-deterministic sale offer defining the lottery is suitably constructed as receipt of one of the N items, with the item received determined probabilistically. One or more such lotteries can be used to determine (in this example) the optimal sale price for each different brand of toothpaste.
More generally, in the case of a multi-item problem the lotteries can be optimized to identify an optimal sale offer which may, in general, itself be a lottery. By way of illustrative example, in the marketing of two brands A and B of toothpaste, if the (average) buyer does not particularly prefer either brand A or brand B, then the optimal sale offer may be a lottery having a discounted sale price (as compared with a deterministic sale of brand A or a deterministic sale of brand B) for which the consideration is a 50% likelihood of receiving brand A and a 50% likelihood of receiving brand B.
With reference to
With continuing reference to
A lottery construction module 22 generates a plurality of sale offers for the one or more items for sale. At least one of the sale offers should be a lottery, that is, a non-deterministic sale offer having a sale price and non-deterministic consideration for the sale price. As already described, for example with reference to
Once the set of sale offers 24 is established, the “Schrödinger's Price Experiment” (SPE) is carried out by a buyer interface module 30, which presents a buyer with the set of sale offers 24 including at least one lottery that is offered at least at one time instant. It is to be understood that the buyer interface module 30 engages in actual selling using the set of sale offers 24 as genuine offers for sale. The illustrative example of
In the case of a transaction in which the accepted sale offer is a lottery (that is, the accepted sale offer is a non-deterministic sale offer in which the return for the sale price is non-deterministic consideration), the checkout module 32 suitably includes or has access to a random result generator 34. (As is conventional in the art, the term “random result generator” as used herein includes both true random number generators and “pseudorandom number generators”, for example a processor implementing a pseudorandom number generation algorithm in which the output is actually deterministic but has sufficient complexity, and a distribution approximating a random number distribution, so as to appear to be random and to well approximate a given distribution of random numbers). For example, if the accepted sale offer is a single-item lottery in which the consideration is a 30% chance of receiving the item, then the checkout module 32 suitably draws a random (encompassing pseudorandom) number R generated by a random (encompassing pseudorandom) result generator implementing a uniform or constant probability density function over the range [0,1). If R is less than 0.3 then the buyer “wins the lottery”, and the item is shipped to the buyer. On the other hand, if R is greater than or equal to 0.3 then the buyer “loses the lottery” and the item is not shipped to the buyer. In either case, the checkout module 32 suitably informs the buyer whether or not the buyer “won the lottery”, that is, whether or not the buyer will be receiving the item. Preferably, this information is provided to the buyer as a purchase receipt, and is preferably also stored in a persistent repository (not shown) owned by the seller. The persistent repository provides data for tax purposes, and for tracking performance of the checkout module 32 to ensure and verify that the checkout module 32 is providing “fair” lottery results in a statistical sense.
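The fulfillment decision for a probabilistic consideration reduces to comparing a uniform draw on [0,1) against the accepted offer's win probability. The sketch below is illustrative only (the function name and trial count are invented); in practice the draw would come from the random result generator 34 and the outcome would be logged to the persistent repository.

```python
import random

def fulfill_lottery(win_probability, rng=random.random):
    """Ship the item iff a uniform draw on [0,1) falls below the
    accepted offer's win probability."""
    return rng() < win_probability

# Over many simulated checkouts the empirical win rate approaches the
# stated 30% probability -- the statistical "fairness" an audit of the
# persistent repository would verify.
random.seed(0)
wins = sum(fulfill_lottery(0.3) for _ in range(100_000))
print(wins / 100_000)  # close to 0.3
```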
Operation of the checkout module 32 ensures that buyer decisions are “real” decisions that have real-world consequences for the buyer. This, in turn, ensures that the buyer acceptance/rejection data 40 collected by the buyer interface module 30 accurately reflects real-world buyer decisions that inform about buyer valuation. As already noted, the checkout module 32 also engages in actual selling of items, and so generates a continuing revenue stream for the seller during the pricing optimization process. A valuation module 42 processes the buyer acceptance/rejection data 40 to determine a distribution of buyer valuations indicated by actual purchases. (More generally, given covariates on buyer valuations, the valuation module 42 may in some embodiments process the buyer acceptance/rejection data 40 to determine a family of distributions over buyer valuations, which itself may vary with time.)
With continuing reference to
The valuation distribution information output by the valuation module 42 can be used in various ways. One approach is to use this valuation distribution information as feedback supplied to the lottery construction module 22, which can then refine the set of sale offers 24 and repeat the process to further refine the estimated valuation distribution. The cycle can be repeated after each buyer makes a decision, or after a selected number of buyers make decisions. The final result (either with or without iteration) is a set of one or more optimized sale offer(s) 44 for the product. If iteration is employed, it is expected that the iterations for a single-item problem will ultimately converge to a final discrete sale price for the item. In the case of a multiple-item problem the convergence may be to final discrete sale prices for the respective (multiple) items, or the convergence may be to an optimized menu of lotteries.
In
With reference to
With continuing reference to
To quantify the illustrative example, the valuations are drawn from a multinomial distribution over a set V:=(v1, v2, . . . , vN). The assumption of a known finite set of valuations can in some cases be motivated by a discretization of the space of monetary units. The multinomial has parameter vector θ={θ1, . . . , θN} with Σi=1Nθi=1. A buyer is understood to have valuation vk with probability θk.
While the set of possible valuations V is known, the probabilities θ of observing particular values are not known perfectly to the seller. A Dirichlet distribution is taken as the assumed density (i.e. representation of the seller's belief about the parameters of the multinomial) in the illustrative examples. The Dirichlet distribution is sufficiently general to allow any specific valuation probabilities θ and is thus known as a non-parametric prior. Choosing such a distribution is sensible because real-world valuation distributions are known to be rather complex, involving sharp transitions arising from budget constraints and competing outside options.
At time step t=0 the seller's Dirichlet belief is given by parameters α={α1, . . . , αN} with αi>0. The corresponding probability density function over possible valuation distributions is:

p(θ|α) = [Γ(α1+ . . . +αN)/(Γ(α1) . . . Γ(αN))] θ1^(α1−1) . . . θN^(αN−1).
The Dirichlet distribution is conjugate to the multinomial. Therefore computing posteriors p(θ|vi) after fully observing a buyer's value vi is easy. The result is another Dirichlet with parameters α′:=α+ei where ei is a shorthand for a vector of length N with a 1 at position i and zeros everywhere else. That is:

p(θ|vi) ∝ θi p(θ|α), which is the Dirichlet density with parameters α+ei.
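The conjugate update is a one-line bump of the relevant parameter. A minimal sketch (the prior and the observed valuation indices below are illustrative):

```python
def dirichlet_update(alpha, i):
    """Posterior Dirichlet parameters after fully observing valuation v_i:
    alpha' = alpha + e_i."""
    return [a + (1 if j == i else 0) for j, a in enumerate(alpha)]

alpha = [1.0, 1.0, 1.0]       # uniform prior over N = 3 valuation levels
for obs in [2, 2, 0, 2]:      # indices of fully observed valuations
    alpha = dirichlet_update(alpha, obs)
print(alpha)  # [2.0, 1.0, 4.0]
```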
The experimental design is next addressed by way of illustrative example. This corresponds to
To trade-off exploration and exploitation, in the illustrative examples an approach is adopted that is similar to the best of sampled sets (BOSS) method of Asmuth et al (2009). J. Asmuth et al., “A Bayesian Sampling Approach to Exploration in Reinforcement Learning”, 25th UAI, pp. 19-26, 2009. BOSS drives exploration by sampling multiple models from the posterior belief and selecting actions optimistically.
Multiple multinomial models of buyer valuations are sampled from the Dirichlet posterior belief, as per operation 60. For each sample, the profit-maximizing price and the corresponding expected profit on the sampled valuation distribution are identified, as per operations 62, 64. Some samples have a high expected profit on their valuation distribution relative to the expected profit of the myopic-optimal price on the current posterior. This can happen in two ways. First, the sampled valuation distribution may have more buyers just above the myopic-optimal price; in this case, the myopic price would also perform well. Second, the sample's optimal price may be substantially different from the myopic-optimal price; in this case, the myopic price is rather risky and it is imperative to explore the alternative price. Accordingly, in the illustrative examples a sample is considered to be “optimistic” if the expectation on the sample's valuation distribution of the difference between the profit for the sample's optimal price and the profit for the myopic-optimal price is large. Note that an optimistic sample could correspond to a price that is higher, lower or equivalent to the myopic-optimal price.
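The sampling and optimism computation can be sketched as follows. The sketch makes assumptions beyond the text: valuations lie on a small grid, a posted price v_k deterministically sells to every buyer with valuation at least v_k, and Dirichlet samples are drawn via normalized Gamma variates. All names and numbers are invented for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def expected_profit(k, theta, values):
    """Profit of posting price values[k]: sells to all buyers with
    valuation >= values[k] (zero seller cost assumed)."""
    return values[k] * sum(theta[k:])

def optimal_price_idx(theta, values):
    return max(range(len(values)), key=lambda k: expected_profit(k, theta, values))

values = [2.0, 4.0, 6.0, 8.0]          # valuation grid V
alpha = [3.0, 2.0, 2.0, 1.0]           # current Dirichlet belief
mean_theta = [a / sum(alpha) for a in alpha]
myopic = optimal_price_idx(mean_theta, values)   # myopic-optimal price

rng = random.Random(0)
best_gap, optimistic = float("-inf"), myopic
for _ in range(5):                     # K = 5 samples, as in the text
    theta = sample_dirichlet(alpha, rng)
    k = optimal_price_idx(theta, values)
    # Optimism: profit advantage of the sample's own optimal price over
    # the myopic price, both evaluated on the sampled distribution.
    gap = expected_profit(k, theta, values) - expected_profit(myopic, theta, values)
    if gap > best_gap:
        best_gap, optimistic = gap, k
print(values[myopic], values[optimistic], best_gap)
```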
For simplicity, the illustrative examples mix just one optimistic price with the myopic-optimal price. The resulting menu of lotteries will then have two or three market segments, that is, sets of valuations, where all buyers in each segment will receive the same lottery probability and price. The highest value segment will receive the item with probability one and the lowest segment will receive the item with probability zero. The intermediate segment, when there is one, will receive the item non-deterministically. To make observations robust to liars, the menu is constructed so that any buyer wishing to select a lottery other than that which maximizes their surplus will lose at least ε of that surplus. This is referred to as ε-incentive compatibility. Under the assumptions above, this means that the buyer will decide not to lie.
Given the myopic-optimum price vm and the optimistic price vu, in the operation 66 the problem is to find profit-maximizing lotteries such that we can distinguish between values in the segments [0, vm), [vm, vu), [vu, ∞) or in the segments [0, vu), [vu, vm), [vm, ∞) with ε-incentive compatibility. For the sake of obtaining simple formulae, in the illustrative examples it is assumed that
where Δv is the spacing between successive valuations in the Dirichlet model. This problem turns out to be a linear program (LP). By observing which constraints should be active, it is straightforward to find a closed-form solution to this LP. In the upper segment the lottery has probability 1. The rest of the solution is parameterized as follows: in the middle segment the lottery probability is z; in the segment whose lowest valuation is vu, the price is pu; and in the segment whose lowest valuation is vm, the price is pm. The solution is then:
In summary the steps of the experimental design are: (i) Given belief hyperparameters α over the parameters θ of the valuation distribution, (ii) Find the myopic-optimal price p* for the current belief, (iii) Sample K parameter vectors θ1, . . . , θK from the current belief (K=5 is used in some illustrative examples herein), (iv) For each sample θk solve for the optimal price pk, (v) Evaluate the profits πk and πk* for this optimal price and for the myopic-optimal price on this sample, (vi) Select an optimistic price with index k satisfying πk−πk*≥πj−πj* for all j≠k, and (vii) Obtain the menu of lotteries for valuations vu=pk, vm=p* using the formulae given immediately above.
It is noted that the foregoing differs substantially from conventional BOSS. For example, BOSS attempts to solve general Markov Decision Problems (MDPs), and accordingly BOSS selects actions from different models in different observable states. That step is not employed herein, as the system is assumed to be always essentially in just one “observable state”, corresponding to the multinomial distribution being fixed rather than time-varying. Another difference is that BOSS selects a single action at each state, whereas in the illustrative approaches disclosed herein several actions are selected to be explored in parallel. In some illustrative examples, two actions are selected: the myopic action and the optimistic action. Yet another difference is that BOSS defines an “optimistic” action as one that maximizes expected discounted reward when all possible sampled models of the world can be chosen from, whereas the illustrative examples herein define an optimistic action as one that maximizes expected welfare or profit on a sample relative to the welfare or profit that the myopic action would attain on that sample.
While the above description of the selection of the myopic and optimistic prices used the words “profit” or “welfare”, these terms were for simplicity of exposition. It is also contemplated for the seller's objective in selecting the myopic and optimistic prices to reflect some weighted linear combination of seller profit and buyer surplus, where the weight for either part of the combination is significantly non-zero. Optimizing profit alone would amount to “squeezing” the buyers; optimizing welfare alone would force the seller to “squeeze” its suppliers. Both extremes may be undesirable in various applications. In some complex real-world settings, a weighted linear combination of profit and welfare may be achievable. See, e.g. K. Roberts, “The characterization of implementable choice rules”, in Jean-Jacques Laffont, editor, Aggregation and Revelation of Preferences. Papers presented at the 1st European Summer Workshop of the Econometric Society, pages 321-349. North-Holland, 1979.
The belief update is next addressed by way of illustrative example. This corresponds to
R(v0,ε):={v:w(v,v0)≥w(v,v)−ε},
where w(a,b) is the surplus of a buyer who has valuation a but lies that their valuation is b. For any observation, this corresponds to one of the two or three segments generated in the experimental design.
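The region R(v0, ε) can be computed directly from a menu. The sketch below uses a hypothetical two-lottery menu (thresholds, prices and probabilities are invented for illustration), with w(v, b) = z(b)·v − p(b) the surplus of a buyer with valuation v who reports b:

```python
def menu(report):
    """Map a reported valuation to (probability z, price p). Hypothetical
    menu whose indifference points fall at valuations 4 and 9."""
    if report >= 9.0:
        return 1.0, 6.5        # high segment: the item with certainty
    if report >= 4.0:
        return 0.5, 2.0        # middle segment: a lottery
    return 0.0, 0.0            # low segment: no sale

def w(v, b):
    """Surplus of a buyer with valuation v who reports valuation b."""
    z, p = menu(b)
    return z * v - p

grid = [x / 2 for x in range(0, 41)]   # candidate valuations 0, 0.5, ..., 20

def credible_region(v0, eps):
    return [v for v in grid if w(v, v0) >= w(v, v) - eps]

# Observing a report in the middle segment localizes the valuation to
# (roughly) that segment, widened by an epsilon-dependent slack.
print(min(credible_region(6.0, 0.25)), max(credible_region(6.0, 0.25)))
```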
We lose conjugacy to the Dirichlet density in the case that vi is not directly observed but instead is known to come from a set S⊂V. In this case, the exact posterior is a mixture of Dirichlets (MoD). To see this, consider that Bayes's rule gives:

p(θ|v∈S) ∝ Σi:vi∈S θi p(θ|α) ∝ Σi:vi∈S (αi/α0) p(θ|α+ei),

so that each component of the mixture is itself a Dirichlet.
After T censored observations with censoring set St at time t there may be up to |S1|× . . . ×|ST| components to this mixture. This is not computationally tractable.
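The combinatorial growth is easy to see in code. The sketch below (function and variable names invented) applies the standard Dirichlet-multinomial identity θi·Dir(θ; α) ∝ (αi/α0)·Dir(θ; α+ei) to each component, so one censored observation “v ∈ S” multiplies the number of components by |S|:

```python
def censored_update(mixture, S):
    """One censored observation 'valuation index in S' maps each weighted
    component (w, alpha) to |S| components (w * alpha_i / alpha_0,
    alpha + e_i) for i in S, then renormalizes the weights."""
    new = []
    for weight, alpha in mixture:
        a0 = sum(alpha)
        for i in S:
            bumped = list(alpha)
            bumped[i] += 1
            new.append((weight * alpha[i] / a0, bumped))
    z = sum(w for w, _ in new)
    return [(w / z, a) for w, a in new]

mixture = [(1.0, [1.0, 1.0, 1.0])]       # prior: a single Dirichlet
for S in ([0, 1], [1, 2], [0, 2]):       # three censored observations
    mixture = censored_update(mixture, S)
print(len(mixture))  # components multiply: 2 * 2 * 2 = 8
```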
If it is desired for the belief to remain in some simple family, one could apply assumed density filtering (ADF) or expectation propagation (EP), as described, for example, in: Minka, “Expectation Propagation for Approximate Bayesian Inference”, Proc. 17th Annual Conf. Uncertainty in Artificial Intelligence (2001); Minka, “A family of algorithms for approximate Bayesian inference”, PhD Thesis, MIT (2001); and T. Heskes and O. Zoeter, “Expectation Propagation for Approximate Inference in Dynamic Bayesian Networks”, Proc. 18th Annual Conf. Uncertainty in Artificial Intelligence (2002). ADF computes the exact posterior after each observation and then projects it, in the sense of minimum Kullback-Leibler divergence, into the simple family of beliefs.
If the different mixture components are close to each other, the mixture of Dirichlets (MoD) posterior may be well-approximated by a single Dirichlet distribution. For instance, the MoD posterior is suitably projected onto a single Dirichlet by the standard Kullback-Leibler (KL) projection. For probability densities from exponential families, the best matching approximation in the KL sense is one that matches the so-called natural moments. The natural moments for the Dirichlet are log θi.
To construct such approximations, a link function is suitably used. A link function informs as to the expected natural moments for a given set of parameters. It is thus generally a mapping from a parameter vector to a moment vector. The inverse of the link function informs as to what the parameter values should be if the natural moments are to have some specific values. The link function for the Dirichlet is:

g(α)i = E[log θi] = ψ(αi) − ψ(α0), i = 1, . . . , N,

where ψ is the digamma function and α0 := α1+ . . . +αN.
Thus, to find a best-approximating set of Dirichlet parameters α′ to a MoD with probability density function p̃(θ), it is sufficient to compute

α′ = g−1(Ep̃[log θ])
where log θ is an abbreviation for the vector with components log θ1, . . . , log θN. Often there is no closed form for the inverse link function or for the expected natural moments. Fortunately, for a MoD, the expected natural moments are simply the weighted combination of the natural moments of the individual Dirichlets making up the mixture.
Determination of the inverse link function is next addressed. For the Dirichlet updates, the expression to be solved is:

ψ(αi) − ψ(α0) = mi for i = 1, . . . , N, with α0 = α1+ . . . +αN,

where mi denotes the target expected natural moment. Putting x:=ψ(α0) shows that the only nontrivial part of this problem is the root-finding problem:

x = ψ(Σi=1N ψ−1(mi + x)).
While this is a single non-linear equation, it involves inverse digamma functions, for which rational or spline approximants can be employed. As an alternative solution, a direct application of Newton's method to the full set of equations using sparse matrix operations is highly efficient.
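As a concrete sketch, the root can also be found by the fixed-point iteration x ← ψ(Σi ψ−1(mi + x)); this is an alternative to the Newton approach mentioned above, not the text's own algorithm. The digamma implementation below uses the standard recurrence plus asymptotic series, and the inverse digamma uses bisection (valid since ψ is strictly increasing).

```python
import math

def digamma(x):
    """Digamma psi(x): apply the recurrence psi(x) = psi(x+1) - 1/x until
    x is large, then use the asymptotic series."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def inv_digamma(y):
    """Invert the (strictly increasing) digamma by bisection."""
    lo, hi = 1e-9, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if digamma(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha_from_moments(m, iters=200):
    """Solve psi(alpha_i) - psi(alpha_0) = m_i for alpha via the fixed
    point x = psi(sum_i inv_digamma(m_i + x)), where x := psi(alpha_0)."""
    x = 0.0
    for _ in range(iters):
        x = digamma(sum(inv_digamma(mi + x) for mi in m))
    return [inv_digamma(mi + x) for mi in m]

# Round trip: compute the natural moments of a known Dirichlet, then
# recover its parameters from those moments.
alpha_true = [2.0, 3.0, 4.0]
m = [digamma(a) - digamma(sum(alpha_true)) for a in alpha_true]
alpha_hat = alpha_from_moments(m)
print([round(a, 4) for a in alpha_hat])  # close to [2.0, 3.0, 4.0]
```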
The foregoing assumes that buyers operate in their self-interest, and (since they are engaged in actual purchases) do not “lie” in their valuations. However, the analysis can also be adapted to allow for buyers to “lie”, that is, to set forth valuations that are not in accord with the buyer's internal belief. This is quantified as follows: each buyer may be a liar who is willing to lose an amount of their surplus in order to fool a mechanism.
The foregoing description relates to embodiments relating to selling, with optimization of buyers' welfare, seller's profit, or a linear weighted mixture of these objectives, or so forth. More generally, the disclosed approaches involving superpositions can also be readily applied to procurement, to repeated auctions with a reserve price which might be varied, to a variety of repeated exchange scenarios such as double auctions (markets), to purchases immediately involving multiple parties, and to “multi-sided markets” (also known as “platforms”, where the seller sets a price to multiple parties, as when a credit card firm makes charges to both card owners and shops that accept the credit card). In the case of repeated procurement, a buyer may wish to purchase an item at a sequence of times for a reasonably low price. The buyer may not be in a suitable situation to identify precisely what a reasonably low price is for certain kinds of items. Rather than posting a guess for the highest price that the buyer is willing to pay, the buyer may attempt to learn about the distribution of sellers' costs or of sellers' outside options. In this case, a cost may be considered as a negative valuation and the arguments disclosed above apply. For instance, in the analogue of a CPE, the buyer may post a purchase price of $1 for one week and a purchase price of $2 for a second week. Using the analogous SPE, the buyer would post a purchase price of $3/2 for a purchase of the item with probability 1 and a price of $1 for a purchase with probability ½. The non-deterministic situation might be implemented as the buyer paying $1 and then, based on a shared random variable distributed as a Bernoulli variable with parameter ½, either receiving or not receiving the item.
Analogously to the use of the SPE for selling, the SPE for purchasing preserves the buyer's surplus and the seller's profit that would be achieved via a CPE, but it is, advantageously, possible to learn faster with an SPE than with a CPE.
To generalize, in the disclosed approaches an offeror presents a plurality of offers to one or more offerees, the offers including at least one non-deterministic offer having non-deterministic consideration for the offeree. The offeror receives decision data responsive to the presenting, and generates offeree valuation information based on the decision data. The offeror may then present a new offer based on the generated offeree valuation information. In the illustrative case, the offeror is the seller and the offeree is the buyer; however, in a procurement application the offeror may be the buyer who makes various offers to purchase a product at various purchase prices, at least one of which is a lottery in which the price paid is non-deterministic, and the offeree may be a seller who accepts or rejects the offer.
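The general protocol of presenting offers (including non-deterministic ones) and collecting censored decision data may be sketched as follows. The `Buyer` model and the accept-when-expected-surplus-is-nonnegative rule are illustrative assumptions; the key point is that each accept/reject decision is a censored observation that the valuation lies above or below the offer's effective price:

```python
class Buyer:
    """Hypothetical offeree who accepts when expected surplus is nonnegative."""
    def __init__(self, valuation):
        self.valuation = valuation

    def decide(self, price, prob=1.0):
        # An offer (price, prob) means: pay price, receive item with probability prob.
        return prob * self.valuation - price >= 0

def present_offers(offers, buyers):
    """Present each (price, prob) offer and record censored observations:
    an acceptance means valuation >= price/prob (the effective price)."""
    decisions = []
    for buyer in buyers:
        for price, prob in offers:
            decisions.append((price / prob, buyer.decide(price, prob)))
    return decisions
```

The offeror would then update its valuation belief from `decisions` and present a new offer accordingly.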
The disclosed pricing and consideration optimization employing lotteries is learned from buyer decision data pertaining to the sale offers. More generally, substantially any pricing optimization approach will ultimately entail such learning based on buyer decision data. (Indeed, even a seller who simply makes a single decision about what price to offer for a single item still does so on the basis of past sales decisions by buyers for other items.) This is inherent because the market value of an item offered for sale is ultimately determined by the valuations placed on that item by buyers. One can attempt to estimate the buyer valuation by various approaches (e.g., computing a manufacturing, transportation or servicing cost and adding a predetermined profit margin, comparison with valuations of similar products, or so forth), but the valuation ultimately is controlled by what buyers are willing to pay.
In some suitable approaches disclosed herein, the learning of the optimal pricing employs partitioning of the set of buyers into folds, where each fold maintains a belief about all the other folds and this belief is used to select learning mechanisms for the buyers contained in it. As disclosed herein, folds make the learning mechanism truly incentive compatible in cases where buyers may make multiple purchases. It is disclosed herein that having few folds makes belief maintenance computationally efficient, and can also reduce the risk of collusion amongst buyers.
The disclosed use of folds in the learning is motivated by the following observations. If a buyer believes that the future prices offered to that buyer will increase with the valuations he or she expresses to a valuation learning mechanism, then it can be in the buyer's interest to express lower-than-truthful valuations, although there are some sellers' beliefs for which a buyer expressing a lower-than-truthful valuation makes prices go up rather than down. This problem can be avoided by ensuring that prices for one buyer depend only on valuations expressed by other buyers. However, in settings with large numbers of buyers, maintaining a separate belief for each buyer is computationally demanding, incurring a computational cost that is in some instances quadratic in the number of buyers. A further issue is that some buyers may be known to be more likely to collude with other buyers. It is desired to reduce the impact of such collusion.
In pattern recognition, folds are sometimes employed for estimating generalization performance of classifiers. For example, in the method of n-fold cross-validation, the data set is partitioned into n sets. A classifier is trained at times t=1, . . . , n. At time t, all data other than set t are used for training and set t is then used for testing. The test results are then averaged. If n equals the size of the dataset, the method is known as the jackknife. Typically a smaller value such as n=10 is used for computational convenience.
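For reference, the n-fold cross-validation procedure just described can be sketched as follows (the interleaved partitioning scheme and the caller-supplied `train`/`test` callables are illustrative assumptions):

```python
def n_fold_indices(size, n):
    """Partition dataset indices into n folds (interleaved assignment)."""
    return [list(range(t, size, n)) for t in range(n)]

def cross_validate(data, n, train, test):
    """At time t, train on all data outside fold t, test on fold t,
    and average the n test scores. With n == len(data), this is the jackknife."""
    folds = n_fold_indices(len(data), n)
    scores = []
    for t in range(n):
        test_set = [data[i] for i in folds[t]]
        train_set = [data[i] for i in range(len(data)) if i not in folds[t]]
        model = train(train_set)
        scores.append(test(model, test_set))
    return sum(scores) / n
```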
It is recognized herein that maintaining a separate belief for each buyer is directly analogous to the jackknife. For computational convenience, in some illustrative examples of the disclosed fold-based valuation learning, buyers are partitioned into n=10 “folds” and only one belief is maintained per fold. A fold's belief is based only on information from other folds and is used to set prices or choose mechanisms for buyers in that fold.
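The per-fold belief maintenance can be sketched as follows. Here the "belief" is reduced to a single empirical acceptance rate purely for illustration; in practice it would be a full valuation distribution. The essential property is that fold i's belief uses only decision data from the other folds, so a buyer's own decisions never influence the prices or mechanisms that buyer sees:

```python
def fold_beliefs(observations):
    """observations: dict mapping fold index -> list of (price, accepted) pairs.
    The belief for fold i (an acceptance rate, as a stand-in for a learned
    valuation distribution) is computed only from the other folds' data."""
    beliefs = {}
    for i in observations:
        other = [accepted
                 for j, obs in observations.items() if j != i
                 for _, accepted in obs]
        beliefs[i] = sum(other) / len(other) if other else None
    return beliefs
```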
In some embodiments, rather than partitioning into folds purely randomly, some side-information is utilized that assigns a probability of collusion to each pair of buyers. For example, this side information may be based on family, geographical location or other social-network information that suggests collusion may be likely. Given a target number of folds, minimizing the total probability of collusion then corresponds to a clustering problem, which can be solved using substantially any known clustering method. The disclosed folds-based learning approaches advantageously facilitate pricing optimization or experimentation that is incentive compatible, yet computationally efficient.
With reference to
In an operation 102, the buyers are clustered into n folds based on the side information. The clustering groups together buyers that are likely to collude, while placing buyers that are unlikely to collude into different folds. Some suitable clustering approaches that operate on pairwise “similarity” data (where a pair of buyers who are likely to collude are considered to be “similar” while a pair of buyers who are unlikely to collude are considered to be dissimilar) include K-means clustering, spectral clustering, or so forth. The purpose of the clustering operation 102 is to ensure that (i) each fold includes buyers who are likely to collude with each other and (ii) buyers in different folds are unlikely to collude with each other.
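The clustering operation 102 can be sketched with a simple greedy agglomerative procedure (an illustrative stand-in for K-means or spectral clustering): starting from singleton folds, repeatedly merge the two folds with the highest total cross-fold collusion probability until the target number of folds remains, so that likely colluders end up together:

```python
def cluster_into_folds(n_buyers, collusion, n_folds):
    """Greedy agglomerative grouping by pairwise collusion probability.
    collusion: dict mapping frozenset({a, b}) -> probability of collusion."""
    folds = [{b} for b in range(n_buyers)]

    def cross(f1, f2):
        # Total collusion probability between two candidate folds.
        return sum(collusion.get(frozenset({a, b}), 0.0)
                   for a in f1 for b in f2)

    while len(folds) > n_folds:
        i, j = max(((i, j) for i in range(len(folds))
                           for j in range(i + 1, len(folds))),
                   key=lambda ij: cross(folds[ij[0]], folds[ij[1]]))
        folds[i] |= folds.pop(j)
    return folds
```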
In an operation 104, the learning of valuation distributions is performed on a per-fold basis. In setting the belief of the i-th fold, valuation learning is performed using buyer valuation information obtained solely from the other (n−1) folds. For example, in the illustrative example of learning by lottery as set forth in
In an operation 106, it is contemplated that a new buyer may be identified during the valuation learning, who was not one of the buyers processed in the clustering operation 102. In such a case, the operation 106 characterizes the new buyer, for example using the characterization operation 100, and assigns the new buyer to a fold that is the best fit for the characterization of the new buyer while balancing the relative sizes of the folds to ensure computational tractability. Alternatively, or if the number of new buyers becomes too large, the clustering operation 102 may be repeated to generate wholly new sets of clusters.
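The new-buyer assignment of operation 106 can be sketched as follows. The affinity score and the size-penalty weight are illustrative assumptions; the point is to assign the new buyer to the fold containing the buyers he or she is most likely to collude with, while discouraging folds from growing much larger than average:

```python
def assign_new_buyer(buyer, folds, collusion):
    """Assign a new buyer to the fold with the highest collusion affinity,
    penalizing oversized folds to keep fold sizes balanced.
    collusion: dict mapping frozenset({a, b}) -> probability of collusion."""
    avg_size = sum(len(f) for f in folds) / len(folds)

    def score(fold):
        affinity = sum(collusion.get(frozenset({buyer, b}), 0.0) for b in fold)
        return affinity - 0.1 * max(0, len(fold) - avg_size)  # size penalty

    best = max(range(len(folds)), key=lambda i: score(folds[i]))
    folds[best].add(buyer)
    return best
```

If too many new buyers accumulate, the full clustering of operation 102 may simply be rerun, as noted above.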
In the illustrative examples, the valuation learning operation 104 employs a learning by lottery approach as disclosed herein. More generally, however, substantially any valuation learning algorithm can be employed, with the fold approach advantageously ensuring incentive-compatibility, or reducing or eliminating the effect of any collusion between buyers.
Moreover, the fold-based learning of
It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.