Businesses, organizations, and individuals often evaluate items to determine which are best or most liked by a target audience. Evaluated items can be almost anything including, for example, services, products, product features, artistic creations, and ideas and may be evaluated under a number of criteria such as utility, appearance, and value. Businesses often use the results of evaluations when making decisions such as which products or features a business will develop. Individuals may also use evaluation processes that provide the judgments, preferences, or opinions of others.
The evaluation of a large number of items can be a time-consuming process. For a business, a single person or a panel responsible for reviewing and evaluating potential projects can become a bottleneck in the decision-making process, particularly when a large number of candidate projects must be considered and compared. To reduce the burden on individual reviewers and improve throughput, some evaluation processes invite large populations to help pick the best items. Crowdsourcing is a practice that uses open calls to a large and often unidentified group, i.e., a crowd, to perform a task, and some Internet services employ crowdsourcing for evaluation purposes. For example, an Internet service such as Google Moderator may allow people to vote “up” or “down” on prospective ideas and count the votes to determine a winner from among the ideas. However, such evaluation processes may not provide reliable results because many people that vote will not have or take the time necessary for thoughtful consideration, particularly if a large number of items are involved. As a result, votes tend to be biased toward the well-known items, rather than the items that are best under the relevant criterion. Further, many evaluation processes allow a voter to endorse every item, for example, with an “up” or “like” vote, without specifically indicating a ranking or preference among items. Voters that “like” all or most options effectively create signal noise and can make reliable identification of the best items more difficult.
Another concern for evaluation processes is that the list of candidate items to be evaluated may change. For example, developers of a product or service often face a continuous stream (or deluge) of feature requests. In response, some developers have adopted “agile methodology,” which requires frequent prioritization of open issues. Static polls may not be well suited for evaluation of a list of items that is constantly evolving because older items may have already accumulated votes before the newer items were added.
In one implementation, a crowdsourcing process using pair-wise comparisons can handle evaluation of any number of items without increasing the complexity of individual votes and can employ a large number of voters to manage the burden on individual voters. For example, an evaluation process can be broken up into a collection of simple pair-wise comparisons, and for each pair-wise comparison, a voter is presented with a pair of items A and B sampled from a list of items. For each vote, the voter simply specifies whether the voter prefers item A or item B. Each vote is thus a simple and brief task that each voter can perform many times and that can be parceled out to many people. The evaluation process can be offered as a service for a fee to users having items to evaluate and may be presented to a large number of voters through an Internet web site or other convenient communication channels.
In another implementation, each item being evaluated has a rating, and each vote indicating a choice of the better item from a pair of items results in changes of the ratings of the two compared items by an amount or amounts that depend on the difference between the ratings of the items in the pair. As used herein, a rating for an item refers to a numerical score, which may be representative of the value of the item in terms of the selected criteria for evaluation. Each vote can thus be treated as a contest between items with the results of the contest increasing the rating of the winning item and decreasing the rating of the losing item in a manner similar to the system employed for rating chess players. The ratings of the items at the end of an evaluation process containing a statistically desired number of votes can be used to rank the items from best to worst. For example, a list of items may be ordered according to rating with the items having the highest ratings receiving the best ranks.
In another implementation, an evaluation process can be presented to voters as a game that provides the voters with an incentive to make well-reasoned votes. For example, voters may be required to pay a fee in order to take part in an evaluation game. To play the game, each voter submits a number of votes. The votes may be conducted as described above, where for each vote, the voter is presented a pair of items and chooses an item from the pair. At the end of the game, the voter or voters that make the most “correct” votes, i.e., votes that are consistent with the final results of the evaluation process, are rewarded from a prize pool that may include the voters' fees, the user's fees, or other prizes. The service providing the game may take all or a portion of the user's or voters' fees. In such a game, voters that pay fees or could win a prize are likely to take voting seriously, and each voter has an incentive to make votes that are most likely to be correct. The results of the evaluation game can be useful to a user that wanted the items ranked or items may be presented simply for game purposes, e.g., to provide items and comparisons that may be of interest to the voters.
In an exemplary implementation, service device 110 is a server system connected to a wide area network such as the Internet. User device 120 may be a desktop computer employing a browser to communicate with service device 110, and voter devices 130 may be a mixture of different types of devices such as desktop computers, portable computers, tablets, and smart phones similarly employing browsers to communicate with service device 110. As will be understood by those of skill in the art, the configuration of devices illustrated here is merely one example, and many other configurations are possible.
The illustrated implementation of service device 110 maintains the information used in the evaluation process, including the user's list of items, the evaluation criteria 160, the item ratings 170, and the recorded votes 180.
The service may charge a fee for conducting the evaluation process, which the user pays in step 230. Additionally, the user in step 250 may help to fund a prize pool that may be given to one or more winning voters as described further below.
The service in step 240 initializes ratings 170 of the items in the list. For example, all items can be initialized to the same score, e.g., an initial rating of 0, which may be appropriate if there is no reason to believe that any item is preferred over the other items. Alternatively, items can be assigned ratings 170 according to user preferences, according to additional information such as may have been obtained in a prior evaluation process, according to any given rule, or arbitrarily. The service in step 250 then invites voters to participate in the evaluation process. For example, the service can send a link to a list of people, asking them to view the list of items, e.g., photos or business ideas. The list of people allowed to vote may be restricted by criteria 160. The service then waits in step 260 for an event.
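As an illustration of step 240, the sketch below initializes every item to the same rating while allowing preset values, e.g., values carried over from a prior evaluation process. The sketch is hypothetical: the text does not specify data structures, and Python is used here purely for illustration.

```python
def initialize_ratings(items, initial=0.0, presets=None):
    """Initialize ratings 170 for a list of items (step 240).

    Every item starts at the same rating unless `presets` supplies a
    value, e.g., a rating obtained in a prior evaluation process.
    """
    presets = presets or {}
    return {item: presets.get(item, initial) for item in items}
```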
The service begins a vote process 300 each time a voter clicks the invitation link or otherwise indicates to the service a willingness to participate.
The service in step 320 presents the voter with a pair of items sampled from the user's list and presents a question based on the user's criteria for the evaluation process. For example, the service may present two images or photos to a voter along with the question “Which photo would look better in a desk frame?” The voter is also presented with potential answers, which will at least allow the voter to select one of the two items, e.g., A or B. The voter may also be presented with other options such as “A and B are equal” or “I can't decide.” After the voter chooses and returns an answer in step 330, the service can perform steps that are unseen by the voter, such as a step 340 of recording the vote for later determination of game winners and a step 350 of adjusting the ratings of the two items just compared.
The service, in step 360, determines whether to present another pair of items to the voter. For example, in one implementation, the voter's fee paid in step 310 may cover a set number of votes, and another pair of items is presented to the voter if the voter has not used up the purchased number of votes. In another implementation, the voter can choose to continue voting as many times as the voter wants, and another pair of items is presented whenever the voter elects to continue. Whenever the voter is presented with another vote, process 300 branches from step 360 to repeat steps 320 and 330, reloading the page with another pair of items, and the voter can continue voting on pairs.
The pair of items presented to a voter for any pair-wise comparison can be selected randomly or systematically from the list. In one implementation, the selection of pairings in repetitions of step 320 can depend on the ratings of the items. For example, with the goal of achieving more accuracy for the ratings of the best items, step 320 may be made more likely to select items with higher ratings for the comparison. In one specific implementation, a number W of items having the highest ratings are identified, for example, by voting that verifies that the W highest-rated items defeat all the candidates below a rating threshold, and pairings are then selected to preferentially include both items from the W highest-rated items. In another implementation, items having similar ratings are preferentially paired to better distinguish which of the items may be better. In yet another implementation, items previously presented in fewer comparisons are preferentially selected for subsequent comparisons, so that each item is involved in a statistically sufficient number of comparisons.
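The sketch below illustrates one hypothetical way to combine two of the strategies just described: biasing selection toward under-compared items, then pairing with a similarly rated item. The weighting scheme and function names are assumptions for illustration, not taken from the text.

```python
import random

def select_pair(items, ratings, comparison_counts):
    """Select a pair of items for the next vote (step 320).

    Biases the first pick toward items with fewer prior comparisons,
    then pairs it with the remaining item whose rating is closest, so
    that near ties can be distinguished. The exact weighting is an
    illustrative choice.
    """
    weights = [1.0 / (1 + comparison_counts[item]) for item in items]
    first = random.choices(items, weights=weights, k=1)[0]
    second = min((item for item in items if item != first),
                 key=lambda item: abs(ratings[item] - ratings[first]))
    return first, second
```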
Recording of a vote as in step 340 may only be needed if the service needs to identify winning voters for rewards.
Step 350, which changes the ratings of items based on a vote, can employ a rating system that changes the ratings of the two items involved in the vote by an amount or amounts that depend on the difference in the ratings of the two items. Rating systems commonly follow the convention that a higher rating is better, so that the rating of an item increases when the item “wins” a pair-wise comparison and decreases when the item “loses” a pair-wise comparison. Alternatively, the opposite convention could be used, in which case lower ratings are better, and an item's rating decreases after winning a comparison or increases after losing a comparison. Without loss of generality, the following description assumes that higher ratings are better. The magnitude of the change in an item's rating depends on the difference between the rating of the item and the rating of the other item in the comparison. Winning against an item with a higher rating results in a larger increase than winning against a lower-rated item.
Comparisons of a pair of items A and B in some implementations of step 330 may give a voter choices other than “A is better” or “B is better.” For example, a voter may be able to select “A and B are equal” or “I can't decide.” The rating system used in step 350 can also take into account a vote other than A or B. For example, a vote indicating that A and B are equal can be treated as a draw in the rating system, and the ratings of A and B may be changed by amounts that depend on the difference between the ratings of A and B. The change for a “draw” may be smaller than, e.g., one half of, the change for a win or a loss. Alternatively, a vote such as “I don't know” or “I can't decide” could simply be ignored for rating purposes.
The votes can be employed in a rating system that in one specific implementation of step 350 is similar to the rating system that is used for calculating the relative skill levels of players in two-player games such as chess. In particular, after a pair-wise comparison with an item B having a rating RB, an item A with a rating RA is assigned a new rating RA′ as given in Equation 1 in which: W is a performance value 1, 0.5, or 0 depending on whether A won, drew, or lost the comparison; and E indicates an estimated performance of item A relative to item B based on their current ratings RA and RB. Factor K can be a constant that is selected according to the desired average magnitude or separation of the ratings or based on the number of votes expected. Factor K could alternatively be a function, for example, that decreases with the number of comparisons. Equation 2 indicates an estimated performance E of a higher ranked item A against a lower ranked item B in one implementation of a rating system in which wins and losses respectively count as 1 and 0 and draws (when possible) count as 0.5. For this specific rating system, the performance of item B in the comparison with item A is (1−W), and the estimated performance for item B is (1−E). As a result, the change in the rating RB is the negative of the change in rating RA as shown in Equation 3. This characteristic of the rating system maintains the average rating of the items.
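The equations referenced above do not survive in this text. A standard Elo-style form consistent with the surrounding definitions would be as follows, where the base-10 logistic and 400-point scale in Equation 2 are conventional Elo choices and are assumptions here:

$$R_A' = R_A + K\,(W - E) \qquad \text{(Equation 1)}$$

$$E = \frac{1}{1 + 10^{(R_B - R_A)/400}} \qquad \text{(Equation 2)}$$

$$R_B' = R_B - K\,(W - E) \qquad \text{(Equation 3)}$$

A minimal sketch of this update rule for step 350, assuming a constant K = 32 (a typical Elo value, not specified by the text):

```python
def expected_score(rating_a, rating_b):
    """Estimated performance E of item A against item B (Equation 2)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_ratings(rating_a, rating_b, w, k=32.0):
    """Apply one pair-wise vote (Equations 1 and 3).

    w is the performance of item A: 1.0 for a win, 0.5 for a draw
    ("A and B are equal"), and 0.0 for a loss.
    """
    delta = k * (w - expected_score(rating_a, rating_b))
    # B's rating changes by the negative of A's change, so the average
    # rating of all items is preserved (Equation 3).
    return rating_a + delta, rating_b - delta
```

For example, when two items both rated 0 are compared, E = 0.5, so a win moves the pair to ratings of +16 and −16 with K = 32, while a win against a much higher-rated item yields a larger gain, as the description requires.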
An advantage of the rating system described above is that it may permit the user to change the list of items during the evaluation process. For example, in process 200, the event detected in step 260 may be a request from the user to update the list, and new items can be added to the list at any time and assigned initial ratings without invalidating the votes already cast on the other items.
Step 450 determines whether the user is seeking to remove items from the list. If so, the service simply removes those items. The prior comparisons involving removed items remain valid for the ratings of the other items, so no ratings need to be changed when items are removed.
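A sketch of how the list might be changed mid-evaluation, per the description above. Starting a new item at the average of the current ratings is one reasonable choice, since the update rule preserves that average; the text does not prescribe this.

```python
def add_item(item, ratings, comparison_counts):
    """Add a new item mid-evaluation without penalizing it."""
    # Start at the average rating, which the update rule preserves
    # (an illustrative choice, not specified by the text).
    ratings[item] = sum(ratings.values()) / len(ratings) if ratings else 0.0
    comparison_counts[item] = 0

def remove_item(item, ratings, comparison_counts):
    """Remove an item; prior votes remain valid for the other items."""
    ratings.pop(item, None)
    comparison_counts.pop(item, None)
```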
Results from the evaluation process can be returned to the user at the end of the process or at any time during the evaluation process. Ranking the items can simply be performed by ordering the items based on the ratings, e.g., with higher-rated items receiving higher ranks. In result process 500, for example, the service can order the items by their current ratings 170 and report the resulting ranking to the user.
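Ranking by rating reduces to a simple sort; a minimal sketch:

```python
def rank_items(ratings):
    """Order items from best (highest rating) to worst."""
    return sorted(ratings, key=ratings.get, reverse=True)
```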
Some implementations can use an incentive mechanism to induce voters to volunteer their pair-wise comparisons of whatever set of items or ideas is presented to them and, ideally, to convey a truthful opinion of what the voters think is best. One incentive mechanism rewards the voter who rates highest the item that the consensus of the other voters also considers to be the best. Another incentive mechanism rewards the voter who provides the most votes, or the highest percentage of votes, that are consistent with the final results of the evaluation process.
Step 530 determines whether winners of the evaluation process should be identified for payment. Identifying winners applies primarily to implementations that reward the voters, and identifying winners may only occur if additional conditions are met, such as the evaluation game being complete. When winners are identified, step 540 compares the recorded votes (e.g., votes 180) with the final rankings to identify the voter or voters whose votes are most consistent with the final results of the evaluation process.
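A minimal sketch of the consistency check that step 540 might perform; the vote-record format here is an assumption for illustration.

```python
def count_correct_votes(votes, final_ranking):
    """Count a voter's votes that agree with the final results (step 540).

    votes: iterable of (chosen_item, other_item) pairs recorded in step 340.
    final_ranking: list of items ordered best to worst.
    """
    position = {item: rank for rank, item in enumerate(final_ranking)}
    return sum(1 for chosen, other in votes
               if position[chosen] < position[other])
```

The winning voter or voters are then those with the highest count, or the highest percentage, of correct votes, per the incentive mechanisms described above.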
Voters participating in an evaluation including result process 500 thus compete to cast votes that match the eventual consensus, and the winning voter or voters can be rewarded from the prize pool as described above, giving each voter an incentive to vote thoughtfully.
Various implementations of the systems and methods described above may achieve several key advantages. For example, an implementation using pair-wise comparison can select pairs of items to focus voters' contributions on determining which items belong among the top ranks without wasting voter energy on determining an exact ordering of the least-preferred candidates. An evaluation process in accordance with an implementation using ratings may be able to elucidate a rank ordering of candidates more efficiently than absolute rating schemes (e.g., thumbs up/down or star ratings), which some studies have indicated cannot be guaranteed to deliver an accurate rank ordering. Some implementations of evaluation processes using ratings may also allow candidate items to be added to a poll at any time without penalizing the added items, which enables use of such evaluation processes for open-ended innovation campaigns and scrum development processes. Some implementations can encourage reasoned voting and better results by providing incentives. More generally, different implementations can contain different combinations and variations of the features to achieve different combinations of such advantages.
The above description concentrates on some specific implementations of example systems or processes. However, additional systems employing the above-described principles can be implemented as computer-readable media containing instructions that are executable by one or more processors to perform one or more of the processes described herein. Such computer-readable media include non-transitory media such as hard drives, computer-readable disks, flash drives, and other storage devices.
Although particular implementations of systems and processes have been described above, the description only provides some illustrative examples that should not be taken as limitations.