A crowdsourcing system automatically distributes instances of a given task (commonly referred to as “Human Intelligence Task” or simply “HIT”) to a group of human workers (or “contributors”) who execute instances of such tasks according to certain requirements or goals. Upon successful completion, contributors receive a reward such as a monetary sum that is based on the amount of submitted work.
Common types of Human Intelligence Tasks include finding entities in a text, annotating an image by drawing bounding boxes, providing natural language variants of a sentence by writing in a text field, or validating other contributors' answers.
A group of HITs that share the same purpose (instructions) and format (input/result types) is called a Job. For instance, a Job with the instructions “Count the number of people in the image” has ‘image’ as its input type and ‘number’ as its output type. Each HIT within a Job has a different input instance. In the example above, each HIT addresses a different image.
In a Job, each HIT can be executed by several distinct contributors. Each solved instance of a HIT is called a HIT Execution. Each HIT Execution has a different instance of the result type according to the corresponding contributor's answer. In the example above, contributors answer with a number representing their perception of the number of people in the image.
The intrinsic characteristics of the crowdsourcing environment often require quality control mechanisms. Poor quality work can be a result of either fraud (contributors exploiting the system for money) or a lack of skills (for instance, insufficient language skills).
A commonly used quality control mechanism in crowdsourcing is to introduce gold tasks in between regular executions, typically without notifying the contributors. Gold HITs share the same instructions and format as the Job they are designed for, but they also define the expected, correct answer. Upon submission of a Gold Execution, the contributor's output is compared to the expected answer, allowing the system to infer their performance on the current Job. The more Gold HITs a given contributor answers, the more reliable the projection of their on-job output quality becomes.
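In code, the gold check itself reduces to a comparison against the stored expected answer. The following is a minimal illustrative sketch; the class and field names are hypothetical, not taken from the facility:

```python
from dataclasses import dataclass

@dataclass
class GoldHIT:
    # A Gold HIT carries the same input as a regular HIT plus the expected answer.
    input_data: str
    expected_answer: int

def evaluate_gold_execution(gold: GoldHIT, contributor_answer: int) -> bool:
    """Return True when the contributor's answer matches the expected one."""
    return contributor_answer == gold.expected_answer

# Example: a "count the people in the image" Gold HIT whose known answer is 3.
hit = GoldHIT(input_data="image_017.jpg", expected_answer=3)
print(evaluate_gold_execution(hit, 3))  # True (passes)
print(evaluate_gold_execution(hit, 5))  # False (fails)
```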
The analysis of the set of Gold Executions for each contributor can result in a multitude of actions, including the contributor being flagged for further investigation, being blocked from the current Job, and having their work discarded and not rewarded.
The inventors have noted that the strategy used to assign Gold HITs in a job (the “Gold Strategy”) can significantly affect the performance of contributors. The inventors analyze performance of a Gold Strategy by two variables:
In more detail, it is optimal to decrease the number of Gold HITs performed by good contributors (cost), while at the same time detecting low quality contributions as early as possible (reactivity). These two goals may conflict, as making the system more sensitive to quality variations intuitively requires increasing the proportion of Gold HITs mixed in among regular tasks.
A Gold Strategy for a job is defined by six elements:
The first five elements that define a Gold Strategy above are closely connected to the type of data collection (type of output and desired level of quality). The inventors have recognized that the assignment method, however, is not, and can be exploited to optimize the overall performance of the Gold Strategy.
Common Gold HIT assignment strategies are Fixed Assignment and Flat Rate. In Fixed Assignment, a contributor is exposed to a Gold HIT every fixed number of regular tasks, for instance, every 5 regular tasks. In Flat Rate, on the other hand, there is a previously established percentage representing the probability of a contributor receiving a Gold HIT. In this case, if the probability is set to 10%, it is expected that, on average, contributors will receive a Gold HIT every 10 regular tasks. Both Fixed Assignment and Flat Rate strategies have the same performance with respect to the Gold Strategy variables of cost and reactivity.
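The two baseline strategies can be sketched as follows (an illustrative Python sketch; the function names and parameter values are hypothetical):

```python
import random

def is_gold_fixed(task_index: int, interval: int = 5) -> bool:
    """Fixed Assignment: every `interval`-th task is a Gold HIT."""
    return task_index % interval == interval - 1

def is_gold_flat_rate(rng: random.Random, rate: float = 0.10) -> bool:
    """Flat Rate: each task is independently a Gold HIT with a fixed probability."""
    return rng.random() < rate

# Fixed Assignment: exactly one Gold HIT per 5 tasks.
fixed = [is_gold_fixed(i) for i in range(20)]
print(sum(fixed))  # 4

# Flat Rate at 10%: on average one Gold HIT per 10 tasks.
rng = random.Random(42)
flat = [is_gold_flat_rate(rng) for _ in range(10_000)]
print(sum(flat) / len(flat))  # close to 0.10
```

Note that neither baseline reacts to the contributor's observed quality, which is precisely the property the Dynamic Gold HIT Assignment Strategy changes.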
To optimize the Gold HIT assignment process for the variables of cost and reactivity, the inventors have conceived and reduced to practice a software and/or hardware facility that assigns Gold HITs using a per-contributor assignment probability that it dynamically adapts according to on-job contributor performance (“the facility”). This probability that the next assigned task is a Gold HIT is called Suspicion. The facility operates its Dynamic Gold HIT Assignment Strategy in two stages: an Assignment Stage and an Update Stage.
By performing in some or all of the ways described above, the facility more efficiently and effectively discerns and resolves poor performance by contributors.
Also, the facility improves the functioning of computer or other hardware, such as by reducing the dynamic display area, processing, storage, and/or data transmission resources needed to perform a certain task, thereby enabling the task to be performed by less capable, capacious, and/or expensive hardware devices, and/or be performed with lesser latency, and/or preserving more of the conserved resources for use in performing other tasks.
In some embodiments, the Update Stage occurs every time a Gold HIT is answered. During this stage, the corresponding contributor suspicion rate is updated, i.e., increased if the contributor failed the assessment and decreased otherwise.
In full detail, the Dynamic Gold HIT Assignment Strategy has four components.
1. Maximum and Minimum Suspicion Rates: two thresholds that define, respectively, a ceiling and a floor for the suspicion value of any contributor in the Job. The Maximum Suspicion Rate protects against the diminishing returns of over-assigning Gold HITs, including the extreme situation where all assigned tasks are Gold, which can enable malicious users to know when they are being evaluated and possibly exploit the platform. The Minimum Suspicion Rate, on the other hand, controls for the situation where contributors reach a level at which they are never assessed.
In some embodiments, the Maximum and Minimum Suspicion Rates are subject to the following criteria:
2. Suspicion Kernel Function: a suspicion function ƒ(x) that maps a value of confidence (denoted x) to a suspicion rate. This function allows for controlling the degree to which the suspicion rate increases and decreases according to a given decrease or increase in the confidence associated with the contributor. After establishing both the Maximum and Minimum Suspicion Rates and the Suspicion Kernel Function, the values for minimum and maximum confidence can be extrapolated by the following equations:
min_confidence = ƒ⁻¹(max_suspicion), and
max_confidence = ƒ⁻¹(min_suspicion)
In some embodiments, it is true of the Suspicion Kernel Function ƒ(x) that:
∀x ∈ [min_confidence, max_confidence], 0 ≤ ƒ(x) ≤ 1, and
min_confidence ≤ max_confidence
These conditions hold, for instance, for any continuous, decreasing function whose values lie in [0, 1]. Table 1 below shows three sample combinations of suspicion kernel functions with minimum and maximum suspicions used by the facility in some embodiments.
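For example, with a linear kernel ƒ(x) = 1 − x (an illustrative choice; the suspicion limits below are likewise example values, not the facility's defaults), the confidence bounds follow directly from the inverse equations above:

```python
# Illustrative linear Suspicion Kernel Function: suspicion falls linearly
# as confidence rises. Because f is decreasing, its inverse swaps the bounds.
def f(x: float) -> float:
    return 1.0 - x

def f_inv(s: float) -> float:
    return 1.0 - s

max_suspicion = 0.5   # ceiling (example value)
min_suspicion = 0.05  # floor (example value)

# Confidence bounds extrapolated exactly as in the equations above.
min_confidence = f_inv(max_suspicion)
max_confidence = f_inv(min_suspicion)
print(min_confidence, max_confidence)  # 0.5 0.95
```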
3. Starting Suspicion Rate: defines suspicion rate for contributors when starting the job. The corresponding starting confidence can be computed by:
starting_confidence = ƒ⁻¹(starting_suspicion)
4. Increasing and Decreasing Confidence Steps: these two values define, for the domain of the chosen suspicion kernel function, the degree of increase and decrease of the confidence (and consequently suspicion) after a given Gold Execution, during the Update Stage. Separating these two values allows for controlling individually the rising and the lowering of the suspicion rate of the contributor.
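The four components above can be combined into a minimal sketch of the Update Stage (illustrative Python; the kernel, step sizes, and bounds are example values, not the facility's actual defaults):

```python
def update_suspicion(confidence, passed, inc_step=0.05, dec_step=0.10,
                     kernel=lambda x: 1.0 - x,
                     min_confidence=0.5, max_confidence=0.95):
    """Apply the Update Stage: move confidence by the increasing or decreasing
    step, clamp it to the bounds implied by the Maximum and Minimum Suspicion
    Rates, and map it back to a suspicion rate through the kernel."""
    confidence += inc_step if passed else -dec_step
    confidence = max(min_confidence, min(max_confidence, confidence))
    return confidence, kernel(confidence)

# A starting suspicion of 20% under the linear kernel implies confidence 0.80.
conf = 0.80
conf, susp = update_suspicion(conf, passed=False)  # failed Gold Execution
conf, susp = update_suspicion(conf, passed=True)   # passed Gold Execution
print(round(conf, 2), round(susp, 2))  # 0.75 0.25
```

Because the increasing and decreasing steps are independent, a failed Gold Execution here raises suspicion twice as fast as a passed one lowers it.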
The formulation of the Dynamic Gold HIT Assignment Strategy described above enables strategies customized to best suit each Job. In various embodiments, the facility uses one or more of the following approaches to configuring itself for a Job: expert tuning, historical data optimization, and simulation.
An expert in crowdsourcing or data labeling can use the facility as a comprehensive platform to make decisions on the behavior of the assignment of the gold HITs. In various embodiments, this includes a) deciding that any given contributor should, at least on average, be exposed to a gold HIT every 20 tasks, thus setting the minimum suspicion to 5%; b) enforcing a linear and balanced variation of the suspicion rate, thus choosing a linear kernel function and equal increasing and decreasing confidence steps; c) using the initial stage of the job as qualification, thus setting a high starting suspicion rate and choosing a slowly decreasing kernel function; or d) relying on the contributor's previous reputation on the platform to initialize their suspicion.
The facility also allows for parameter optimization based on historical data from previous collections. In other words, for a given past Job, typically of similar format, it determines what the optimal parameter values would have been so that a) contributors that were blocked in the Job would have done the minimum amount of regular tasks (reactivity) and b) contributors that were not blocked were exposed to the fewest Gold HITs (cost). Given the small number of parameters and variables to be optimized, and thus the low computational cost of simulating large numbers of combinations, it is affordable to run grid searches over pre-defined sets of parameters.
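Such a grid search can be sketched as follows; the parameter grids and the objective are hypothetical placeholders, and a real implementation would replay recorded executions inside the objective:

```python
from itertools import product

# Hypothetical candidate grids for a historical-data grid search.
grids = {
    "min_suspicion":      [0.02, 0.05, 0.10],
    "max_suspicion":      [0.30, 0.50, 0.80],
    "starting_suspicion": [0.10, 0.20],
    "inc_step":           [0.02, 0.05],
    "dec_step":           [0.05, 0.10],
}

def score(params: dict, history: list) -> float:
    """Placeholder objective: replay `history` under `params` and combine
    cost (Gold HITs shown to good contributors) and reactivity (regular
    tasks completed by later-blocked contributors)."""
    ...

# Enumerate every parameter combination in the grid.
combos = [dict(zip(grids, values)) for values in product(*grids.values())]
print(len(combos))  # 3 * 3 * 2 * 2 * 2 = 72
```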
Finally, in the absence of enough context (e.g., a new type of annotation on the platform, and/or most contributors having a short history on the platform), the facility allows for running simulations over common/target scenarios.
To allow for simulating different data collection scenarios, the inventors defined the following Contributor Archetypes:
The Contributor Archetypes are defined by establishing the probability of a given contributor failing a Gold HIT at a certain phase of the job. Table 2 below shows an example of the facility's use of the Contributor Archetypes by dividing the job into four stages (quarters Q1-Q4) and defining the probability of each archetype failing a Gold HIT at each phase.
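A minimal sketch of this archetype definition follows; the archetype names are taken from the comparison discussed further below, but the per-quarter failure probabilities are illustrative placeholders, not the actual Table 2 values:

```python
# Illustrative per-quarter Gold HIT failure probabilities for each archetype.
ARCHETYPE_FAIL_PROB = {
    #            Q1    Q2    Q3    Q4
    "Flawless": [0.00, 0.00, 0.00, 0.00],
    "Good":     [0.10, 0.05, 0.05, 0.05],
    "Poor":     [0.40, 0.40, 0.40, 0.40],
    "Reckless": [0.10, 0.50, 0.70, 0.90],  # degrades as the job progresses
}

def fail_probability(archetype: str, progress: float) -> float:
    """Probability of failing a Gold HIT given job progress in [0, 1)."""
    quarter = min(3, int(progress * 4))
    return ARCHETYPE_FAIL_PROB[archetype][quarter]

print(fail_probability("Reckless", 0.8))  # 0.9 (fourth quarter, Q4)
```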
The archetypes allow for defining distinct Job scenarios with respect to the predicted distribution of the behavior of the crowd entering the Job. By varying the parameters of the Dynamic Gold HIT Assignment Strategy, it is possible to identify the combination that minimizes the cost of the strategy and maximizes its reactivity.
With the archetypes defined above, three scenarios were prepared, varying the distribution of contributors per archetype. Table 3 below summarizes these distributions, considering 200 contributors per scenario.
For each of the scenarios above, simulations with different parameter combinations were run over 500 executions. In this specific example, all simulations used the Suspicion Kernel Function sigmoid(−x), which enables taking smaller steps while decreasing towards 0.
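The smaller-steps property of sigmoid(−x) can be illustrated directly: equal confidence increments yield progressively smaller suspicion decrements as suspicion approaches 0, because the curve flattens in its lower tail (the sample points below are arbitrary):

```python
import math

def sigmoid_neg(x: float) -> float:
    """Suspicion Kernel Function sigmoid(-x): strictly decreasing in confidence x."""
    return 1.0 / (1.0 + math.exp(x))

# Evaluate the kernel at equally spaced confidence values and look at the
# size of each suspicion step.
suspicions = [sigmoid_neg(x) for x in (0.0, 1.0, 2.0, 3.0)]
steps = [a - b for a, b in zip(suspicions, suspicions[1:])]
print([round(s, 3) for s in steps])  # [0.231, 0.15, 0.072]
```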
Table 4 below summarizes the set of parameters obtained for each of the scenarios. As can be seen, when facing higher rates of Good and Flawless contributors—hence a more trustable crowd—the strategy chooses to assign fewer Gold HITs at the beginning of the Job (since it trusts the crowd), but penalizes more heavily when a contributor fails a Gold Execution.
Based on a combination of expertise in crowdsourcing, usage of historical data, and simulation, in some embodiments the facility uses the following default parametrization for the parameters of the Dynamic Gold HIT Assignment Strategy:
Table 5 below compares the assignment of HITs (Gold and Regular) for each of the archetypes when using a Flat Rate Gold HIT Assignment (with 5% as a parameter) versus the Dynamic Assignment (using the parameters mentioned above). The results for the Reckless and Poor contributors show significant gains in reactivity of the job. The new strategy made it possible to block these contributors within 11 regular executions for the Reckless contributor and within 18 executions for the Poor contributor, contrasting with 118 and 72 regular executions, respectively, in the Flat Rate formulation. This increase in reactivity did not compromise the overall cost of the strategy, since the number of Gold HITs assigned across all archetypes is similar.
An Assigner component 608 decides the type of execution to assign to each contributor at a given moment. The Assigner component reads the current contributor suspicion rate from 606 and, depending on the final decision of the assignment process described above in connection with
Upon submission of a Gold HIT Execution, the Gold Evaluator component 602 is activated. A Gold Comparator 603 of the Gold Evaluator applies the appropriate comparison metric (depending on the type of output of the Job) between the contributor's response and the Gold Answer. Based on the job configuration, the Gold Comparator 603 decides whether the Gold Execution passes or fails, and communicates that decision to both the Confidence Variation Calculator 604 and the Gold Score Combinator 605.
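The output-type-dependent comparison performed by the Gold Comparator can be sketched as follows; the metric choices, output-type names, and tolerance parameter here are illustrative assumptions, not the facility's actual configuration:

```python
def gold_comparator(output_type: str, answer, gold_answer,
                    tolerance: float = 0.0) -> bool:
    """Decide pass/fail by applying a comparison metric chosen per output type."""
    if output_type == "number":
        # Numeric outputs may allow a configurable tolerance.
        return abs(answer - gold_answer) <= tolerance
    if output_type == "category":
        # Categorical outputs require an exact match.
        return answer == gold_answer
    if output_type == "text":
        # Free-text outputs are normalized before comparison.
        return answer.strip().lower() == gold_answer.strip().lower()
    raise ValueError(f"unsupported output type: {output_type}")

print(gold_comparator("number", 3, 3))          # True
print(gold_comparator("text", " Dog ", "dog"))  # True
```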
The Confidence Variation Calculator module 604 reads the current suspicion rate of the contributor (stored among Job Suspicion Rates 606) and updates it according to the procedures described above in Section I—Strategy Formulation (decreasing confidence if the contributor fails, and increasing it otherwise). The facility computes a new suspicion rate using the Suspicion Kernel Function and stores it among the Job Suspicion Rates 606.
On the other hand, the Gold Score Combinator 605 updates the Gold Score of the contributor, using the information of passed/failed Gold HIT. This involves reading the history of Gold Executions of the contributor in 607, recomputing the new score, and writing the new Gold Score 607.
Finally, the Contributor Evaluator 611 periodically reads the current Gold Score of the contributor 607 and, according to the job configuration in place, produces a given action to be applied to the contributor, storing it as Contributors Evaluation 612 (for instance, deciding to prevent the contributor from submitting further executions).
In a Dynamic Gold Assignment System 703, a HIT Assigner component 705 is responsible for deciding the type of execution to assign to each contributor at a given moment based on contributor suspicion rate 606. Depending on the final decision, the facility chooses to retrieve either a regular HIT or a gold HIT from HIT Pool 713.
Upon submission of a gold HIT Execution, the Gold Evaluator component 706 is activated. The Gold Comparator 707 applies the appropriate comparison metric (depending on the type of output) between the contributor's response and the gold answer. Based on the job configuration, the Gold Comparator 707 decides whether the HIT execution passes or fails, and communicates that decision to both the Confidence Variation Calculator 710 and the Gold Score Combinator 709.
The Confidence Variation Calculator module 710 reads the current suspicion rate of the contributor, stored among Job Suspicion Rates 712, and updates it according to the procedures described above in Section I—Strategy Formulation (decreasing confidence if the contributor fails, and increasing it otherwise). The new suspicion rate is computed using the Suspicion Kernel Function and is then stored among Job Suspicion Rates 712.
On the other hand, the Gold Score Combinator 709 updates the Gold Score of the contributor, using the information of passed/failed Gold HIT. This involves reading the history of Gold Executions of the contributor and recomputing the new score.
Finally, the Contributor Evaluator 704 periodically reads the current Gold Score of the contributor and, according to the configuration in place, produces a given action to be applied to the contributor. For instance, it may decide to prevent the contributor from submitting further executions.
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, applications and publications to provide yet further embodiments.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.