Online “crowdsourcing” services enable work requestors to access a flexible and potentially large pool of unsupervised human workers. The Mechanical Turk crowdsourcing marketplace service offered by Amazon.com, Inc. is an example (see www.mturk.com). To date, such services typically have been used to recruit unsupervised online human workers to perform relatively low-skill and/or repetitive tasks at which a human is considered to be better than a computer or other machine. Examples include editing written content, rating a website or other web-based content, and identifying duplicative content.
Typical services do not provide effective mechanisms to ensure the quality, accuracy, etc. of the specific work product produced in response to a particular task. In the case of Mechanical Turk, for example, a requestor's recourse if a task is not performed to the requestor's satisfaction is to refuse payment. Some attempts have been made to identify and ban workers who game the system and/or do not do good work. Statistical methods, such as statistical classifiers, have been used to determine which of a plurality of individual, separate responses to the same task are correct. But typically no reliable mechanism is provided to ensure that work produced by a particular worker in response to a specific task request satisfies applicable acceptance criteria.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
A reputation system to evaluate work quality is disclosed. In various embodiments, answer data for each of one or more workers is received. The answer data includes one or more answers, each representing a judgment by the worker that reflects on the quality of a given work product. The reputation system determines, based at least in part on reputation data of an owner of the work product, the answer data, and reputation data of the workers who provided the answer data, which answers are correct. The determination is used, in some embodiments along with other information, to decide whether the work product satisfies applicable acceptance criteria. In some embodiments, adjustments to the respective reputations of the workers and/or the work product owner are generated based at least in part on the determination of which answers are correct.
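The determination of which answers are correct can be illustrated with a reputation-weighted vote. The Python sketch below is one possible approach under stated assumptions, not the disclosed algorithm: it assumes reputations are numbers in [0, 1], treats the work product owner's reputation as an implicit vote for "accept", and uses the hypothetical function name `resolve_answers`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def resolve_answers(
    owner_reputation: float,
    answers: Dict[str, str],
    reviewer_reputation: Dict[str, float],
) -> Tuple[str, float, List[str]]:
    """Pick the answer with the greatest total reputation behind it.

    Assumption: the owner's reputation counts as a vote that the work is
    acceptable ("accept"). Returns the winning answer, the fraction of total
    weight supporting it, and the workers whose answers match it.
    """
    weights: Dict[str, float] = defaultdict(float)
    weights["accept"] += owner_reputation
    for worker, answer in answers.items():
        weights[answer] += reviewer_reputation.get(worker, 0.0)
    total = sum(weights.values())
    winner = max(weights, key=weights.get)
    confidence = weights[winner] / total if total else 0.0
    correct = sorted(w for w, a in answers.items() if a == winner)
    return winner, confidence, correct
```

The confidence value returned here could feed the later decision on whether enough certainty exists to accept or reject the work product.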
A work request may be submitted by a work requestor. For example, a user associated with work requestor client system 110 may request that work be performed, such as proofreading a blog entry before the user posts the entry. In some embodiments, a widget or other tool is provided via a blog entry creation interface to enable an “edit” of the entry (or other text) to be requested, for example by clicking on an “edit” button. In other embodiments, a tools menu such as a pull-down or pop-up menu includes an option to “edit” content. Upon selection of the “edit” option, a request to edit the text in question is automatically generated and sent, along with the text, to the service 114. A business process flow instance is created to manage performance of the work. Depending on the amount of text and on how the business process and/or service 114 are configured, the text may be broken into subparts, for example paragraphs, sentences, or other parts, and a task to edit each subpart is defined and posted. Once all the component tasks have been completed, the work done by the various workers who completed the tasks is combined to generate an edited version of the original text, which is delivered to the work requestor.
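A minimal sketch of the split, edit, and combine flow follows, assuming paragraphs are separated by blank lines and using a plain callable in place of the crowd workers who would actually perform each posted task. `EditTask`, `split_into_tasks`, `combine`, and `run_edit_request` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EditTask:
    """Hypothetical record for one posted task covering one subpart of the text."""
    index: int
    text: str
    result: str = ""

def split_into_tasks(text: str) -> List[EditTask]:
    """Break the text into paragraphs (blank-line separated) and define a task for each."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [EditTask(index=i, text=p) for i, p in enumerate(parts)]

def combine(tasks: List[EditTask]) -> str:
    """Reassemble the edited subparts in their original order."""
    return "\n\n".join(t.result for t in sorted(tasks, key=lambda t: t.index))

def run_edit_request(text: str, edit: Callable[[str], str]) -> str:
    tasks = split_into_tasks(text)
    for task in tasks:
        # In the described system each task would be posted and completed by a worker.
        task.result = edit(task.text)
    return combine(tasks)
```

For example, `run_edit_request("teh cat.\n\nteh dog.", lambda s: s.replace("teh", "the"))` yields the two corrected paragraphs joined by a blank line.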
In some embodiments, a work requestor such as one associated with work requestor client system 110 uses a work request interface, such as a graphical user interface, a web services interface, and/or an API, to request that work be performed. As in the example above, the service 114 creates an instance of a business process flow to manage performance of the work through completion. The business process flow invokes a work completion platform to cause the required work to be performed. The work completion platform instantiates its own workflow to manage completion of the required work and returns the result to the crowdsourcing service business process flow, which assembles and delivers the final work product to the work requestor, initiates payment by the work requestor, etc. The business process flow, the work completion workflow, or both may enter a wait state while a component flow or sub-flow executes. Upon completion of the component flow or sub-flow, processing at the next level up in the workflow resumes. In some cases, multiple component flows and/or processes may execute in parallel. A first workflow may invoke a second workflow, which may invoke a third workflow, and so on, to whatever depth is required to produce the final work output of the overall business process flow.
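Nested workflows that wait on sub-flows, some of which run in parallel, map naturally onto coroutines. The sketch below uses Python's `asyncio`; the `Flow` tree and the `run` function are hypothetical, and `asyncio.sleep(0)` merely stands in for waiting on a human-performed task.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Flow:
    """Hypothetical workflow node: a leaf performs work; an inner node invokes sub-flows."""
    name: str
    children: List["Flow"] = field(default_factory=list)
    work: Callable[[], str] = lambda: ""

async def run(flow: Flow) -> str:
    if not flow.children:
        await asyncio.sleep(0)   # stand-in for waiting on a human-performed task
        return flow.work()
    # The parent waits while its sub-flows execute in parallel, then resumes
    # and assembles their outputs; nesting may go to any depth.
    outputs = await asyncio.gather(*(run(child) for child in flow.children))
    return " ".join(outputs)
```

`asyncio.gather` returns results in the order the sub-flows were listed, so the assembled output keeps the original structure even when sub-flows finish out of order.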
Automatically obtaining a review of work to determine whether the work meets acceptance criteria is disclosed. In various embodiments, upon completion of a task by an originating worker, one or more review tasks are generated automatically, to be performed by one or more reviewing workers from a set of unsupervised, remote workers. In some embodiments, the originating worker is a member of the set of unsupervised, remote workers. In some embodiments, an original task has a review task counterpart usable to determine whether the original work satisfies acceptance criteria. For example, an original task to write a headline for an article or other content may have an associated review task to determine, given the content and the headline provided by the original task performer, whether the headline fits the content. Based at least in part on the input received from one or more reviewers, a decision is made programmatically as to whether the work performed by the originating worker satisfies applicable acceptance criteria. If so, the work is accepted, and the originating worker and the reviewers who agreed that the work met acceptance criteria are paid. If not, the work is redone by another worker, and so on, until the work has been completed in a manner that meets acceptance criteria.
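The task-and-review cycle can be sketched as a loop that assigns the task to a new originating worker until the reviewers accept the result. The simple-majority acceptance rule, the `max_attempts` cap, and the function name `complete_with_review` are assumptions for illustration.

```python
from typing import Callable, Iterator, List, Tuple

def complete_with_review(
    workers: Iterator[Callable[[], str]],
    reviewers: List[Callable[[str], bool]],
    max_attempts: int = 5,
) -> Tuple[str, int]:
    """Repeat the perform-then-review cycle until the reviewers accept the work.

    workers yields a different originating worker for each attempt; each
    reviewer returns True if the work meets the acceptance criteria.
    Returns the accepted work and the attempt number on which it was accepted.
    """
    for attempt in range(1, max_attempts + 1):
        work = next(workers)()                  # a new originating worker performs the task
        votes = [review(work) for review in reviewers]
        if sum(votes) > len(votes) / 2:         # assumed rule: simple majority accepts
            return work, attempt
        # Rejected: loop around so another worker redoes the task.
    raise RuntimeError("no acceptable work within max_attempts; escalate for handling")
```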
The business process flow instance receives and processes input received from the work requestor to enable the work to be performed (206). Examples include without limitation a document or other content to be edited; text to be translated; and information obtained from the work requestor to be used to create content, such as a press release. The input data is processed into a format and/or unit size indicated by the business process flow as being required to complete the work. For example, text to be edited may be divided into pages or other subdivisions of a prescribed unit size, enabling the work completion platform to assign each page separately so that the pages can be edited in parallel. Alternatively, input data provided by a work requestor may be parsed and reformatted into XML or other structured data for consumption by the work completion platform. The processed input data is provided to a work completion platform to cause specific work to be done, for example by calling an “edit” or other service of the work completion platform and providing the respective pages of input as objects on which the “edit” work is to be performed (208). The business process flow instance enters a waiting state while the work completion platform causes the work to be performed, in some embodiments as described below in connection with
Upon completion of a task, one or more corresponding review tasks are generated automatically (306). The respective results of the review tasks are received and processed (308). If, based on the review results received so far, a decision cannot be made automatically with a sufficient degree of confidence that the work should be accepted or, conversely, rejected, then more input is obtained (312). In various embodiments, work on the work completion platform side is managed by a workflow configured to use an escalation strategy to determine with a sufficient degree of confidence whether the original work should be accepted or rejected. For example, depending on the nature of the work and how the applicable workflow has been configured, one or more additional tasks to obtain further review may be generated, or, if uncertainty persists beyond a configured number of iterations, intervention by human supervisory staff may be requested. The required degree of certainty may vary depending on factors such as the nature of the task, the sensitivity of a particular work request (for example, as indicated by the requestor in the request), and/or the configured and/or indicated preferences of the work requestor.
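One possible escalation strategy is sketched below: reviews are requested one at a time until the fraction agreeing on either outcome reaches a threshold, and the decision is escalated to supervisory staff if neither outcome reaches it within a review budget. The threshold, the review counts, and the function name `decide_with_escalation` are assumptions, not values taken from this description.

```python
from typing import Callable

def decide_with_escalation(
    get_review: Callable[[], bool],
    threshold: float = 0.8,
    min_reviews: int = 3,
    max_reviews: int = 9,
) -> str:
    """Return "accept", "reject", or "escalate".

    get_review obtains one more review (True = accept). A decision is made
    once at least min_reviews are in and one outcome holds at least the
    threshold fraction of them.
    """
    accepts = rejects = 0
    while accepts + rejects < max_reviews:
        if get_review():          # generate and collect one additional review task
            accepts += 1
        else:
            rejects += 1
        n = accepts + rejects
        if n >= min_reviews:
            if accepts / n >= threshold:
                return "accept"
            if rejects / n >= threshold:
                return "reject"
    return "escalate"             # uncertainty persists: request human intervention
```

A more sensitive work request could be handled by raising `threshold` or `min_reviews`, which mirrors the variable degree of certainty described above.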
Once a result (e.g., accept or reject) is determined with the requisite level of certainty (310), if the work was rejected then the original task is resubmitted for completion by another worker, and the task completion and review processing described above is repeated. In some embodiments, the originating worker is not paid and the originating worker's reputation is downgraded if work is rejected. The task and review cycle is repeated until the work produced is accepted. In some embodiments, timeouts or other events may trigger human intervention and/or other exception handling, for example if a task has not been completed within a prescribed time and/or within a prescribed number of attempts.
If the decision is to accept (314), then the original task is completed, and the originating and/or reviewing workers who performed their tasks correctly are paid. If other tasks remain to be performed (316), those tasks are created and caused to be performed (304, etc.). Certain tasks may have dependencies on other tasks and cannot be posted until the tasks on which they depend have been completed. For example, a review task may not be generated and/or posted until a task to generate the work that is to be reviewed has been completed. Upon submission of work product for the original task, one or more review tasks are created and the work produced by the originating worker, or a portion thereof, may be associated with the review tasks as input. Likewise, a task to edit the work product produced by one or more human and/or machine translators cannot be performed until the translation work has been completed. Conversely, an original task cannot move to completion until required review tasks have been completed and processed.
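Dependency-aware posting of tasks can be sketched with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The task graph below is a hypothetical example based on the translation scenario above; each round lists the tasks whose dependencies have all completed.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterator, List, Set

# Hypothetical task graph: each task maps to the tasks it depends on.
deps: Dict[str, Set[str]] = {
    "translate": set(),
    "review_translation": {"translate"},
    "edit_translation": {"translate"},
    "review_edit": {"edit_translation"},
}

def posting_rounds(graph: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Yield batches of tasks that may be posted once all their dependencies are done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)   # here every posted task is treated as completed at once
```

In a live system, `done` would instead be called as each individual task completes, which may release dependent tasks earlier than whole rounds do.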
Once all tasks have been completed (316) the work produced is returned (318), for example to the business process flow that invoked the work completion platform, and the process of
While in the example shown in
In various embodiments, techniques described herein are used to perform various types of work, including without limitation editing content (e.g., proofreading), creating content, translating or otherwise transforming content, and/or more complicated work involving as subcomponents elements of some or all of the above types of work.
While in some embodiments described herein the work product that the review results or other result data relates to is an original work product produced by an originating worker, such as an outsource worker, the reputation system disclosed herein may be used to process any result data, generated by any worker, that reflects on the quality of a given work product. The work product may be produced by an originating human worker, a machine, or a combination thereof.
Once the result of the analysis module 510 is known, reputation adjustment module 512 uses the result to compute amounts by which the respective reputations of the originating worker and reviewing workers should be adjusted. For example, if the originating task is determined to have been performed accurately, based on two reviews indicating agreement with the originating worker's result and one review expressing disagreement, then a small upward adjustment may be determined for the originating worker and for the two reviewers who agreed that the originating worker performed the task accurately, and a larger downward adjustment may be made to the dissenting reviewer's reputation. In some embodiments, the magnitude of the adjustments may be determined based at least in part on factors such as the respective reputations of the reviewers and/or the originating worker prior to the current task family being evaluated, how certain the reputation system is of the determination as to which workers were right and which were wrong, and recent historical trends and/or adjustments to the respective workers' reputations with respect to other work they have performed. In some embodiments, adjustments are made such that it takes a worker a long time to build up a reputation relative to the time it takes to lose or damage that reputation, for example through clearly and/or consistently inaccurate work. In some embodiments, separate reputation scores or other values are maintained for different qualifications and/or levels, and the reputation system computes adjustment amounts that affect only the reputation score(s) relevant to a particular task. The results determined by the reputation system 502 (i.e., which workers are right, degree of certainty, and respective reputation adjustment amounts) are returned to the outsourcing system, which stores reputation updates and initiates payment transactions for workers determined to have performed their task accurately.
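The asymmetric adjustment described above can be sketched as follows. The gain and loss magnitudes, the linear scaling by the system's certainty, and the function name `reputation_adjustments` are illustrative assumptions; the description leaves the exact amounts open.

```python
from typing import Dict, Iterable

def reputation_adjustments(
    originator: str,
    approvers: Iterable[str],
    dissenters: Iterable[str],
    accepted: bool,
    certainty: float = 1.0,
    gain: float = 0.01,
    loss: float = 0.05,
) -> Dict[str, float]:
    """Return a per-worker reputation delta for one resolved task family.

    approvers agreed the originating work was acceptable; dissenters did not.
    Workers on the winning side gain a small amount, workers on the losing
    side lose a larger amount, and both are scaled by certainty (0..1).
    """
    approvers, dissenters = list(approvers), list(dissenters)
    if accepted:
        right, wrong = [originator] + approvers, dissenters
    else:
        right, wrong = dissenters, [originator] + approvers
    deltas = {w: gain * certainty for w in right}
    deltas.update({w: -loss * certainty for w in wrong})
    return deltas
```

With two approving reviewers and one dissenter on accepted work, the originator and approvers each receive +0.01 and the dissenter receives -0.05 under these assumed defaults.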
In various embodiments, a reputation score or other reputation data as described herein comprises a single, composite score that reflects and embodies both a current reputation level of the worker (originating or reviewing worker) and at least in part a reputation history of the worker. The score reflects how the worker's reputation has changed over time, including whether the score has increased consistently over a long or short period of time, whether and by how much the score has increased or decreased recently, etc. In various embodiments, the reputation score is based at least in substantial part on actual judgments of work produced by the worker, made by other workers without knowledge of the identity of the worker whose work they are reviewing; and/or on whether other workers agreed or disagreed with a judgment or decision of the worker, such as an indication by the worker, in a reviewer role, that reviewed work should be accepted or rejected. The reputation score therefore reflects the collective judgment of others, over time, as to the quality of the worker's work. The approach described herein differs from rating systems, in which users provide star, numerical, or other ratings to other users. Ratings provided in such systems may be based on considerations other than an objective assessment of the quality of the rated user's work, and ratings for a user typically are not determined based on blind review by others of specific work product of the user. By comparison, in the approach described herein, a worker's reputation score is based largely on ideally blind review (i.e., review without knowledge of the originating worker's identity) of the work output produced by the worker over time.
Moreover, the judgments by reviewing workers are made in the context of a review task in which the reviewer is motivated by self-interest—such as the desire to be compensated for producing a correct answer and to protect his/her own reputation by producing a correct answer—to provide a correct, unbiased response. The reputation score determined as described herein reflects, therefore, the human experience by which reputation is built or lost, such as the collective judgment by qualified peers as to whether or not the worker produces quality work. In addition, in various embodiments a reputation score rises slowly through consistently performed acceptable work, but can decrease by larger increments if work quality suddenly or dramatically declines, as is common in human experience as well.
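A score that rises slowly and falls quickly while carrying its own history can be approximated by an exponential moving average with different rates for upward and downward moves. The `Reputation` class, the rates, and the trend window below are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Reputation:
    """Hypothetical composite score: the current value also encodes past outcomes."""
    score: float = 0.5
    history: List[float] = field(default_factory=list)

    def record(self, outcome: float, up_rate: float = 0.05, down_rate: float = 0.25) -> float:
        """Move the score toward outcome (1.0 = judged correct, 0.0 = judged incorrect).

        A small rate applies when moving up and a larger one when moving down,
        so reputation is built slowly but can be lost quickly.
        """
        rate = up_rate if outcome >= self.score else down_rate
        self.history.append(self.score)   # keep prior values for trend analysis
        self.score += rate * (outcome - self.score)
        return self.score

    def recent_trend(self, window: int = 5) -> float:
        """Change in score relative to the value `window` updates ago (or the oldest kept)."""
        if not self.history:
            return 0.0
        past = self.history[-window] if len(self.history) >= window else self.history[0]
        return self.score - past
```

Starting from 0.5, one correct outcome moves the score up by 0.025, while a subsequent incorrect outcome moves it down by roughly 0.13, reflecting the asymmetry described above.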
Other factors reflected in a reputation score in various embodiments, and/or otherwise considered by the reputation system described herein, include how long the worker has been a member of the worker pool (how much history) and in some embodiments other data such as demographic, psychographic, and other data associated with the worker.
In some embodiments, resolution requests received by the reputation system include state information that reflects a starting state of the inputs to the reputation system in the context of a global system in which multiple workflows and associated task families and resolution requests may be running in parallel. The reputation system includes the state information in the resolution results it returns as output. The resolution requestor, such as the task resolution module described above, uses the state information to determine whether to process the reputation system's output. If the state information in the response from the reputation system is not consistent with current state information of the task resolution module and/or other components, then the state information is updated and the resolution request is resubmitted with the updated state information. For example, resolution result data received in response to another resolution request involving one or more overlapping workers may have changed the current reputation score of a worker between submission of the current resolution request and receipt of a response. In some embodiments, resolution requests and responses include a starting reputation score and a proposed adjusted score. If, by the time the resolution request response is received, the current reputation score does not match the starting reputation score indicated in the response, the resolution request is resubmitted with the starting reputation score updated to reflect the current score.
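The resubmission logic resembles optimistic concurrency control with a compare-and-set on the starting score. In the sketch below, `ReputationStore`, `apply_resolution`, and the retry limit are hypothetical; `resolve` stands in for a call to the reputation system that returns a proposed adjusted score from a given starting score.

```python
import threading
from typing import Callable, Dict, Optional

class ReputationStore:
    """Hypothetical shared store of current reputation scores."""

    def __init__(self, initial: Optional[Dict[str, float]] = None) -> None:
        self._scores: Dict[str, float] = dict(initial or {})
        self._lock = threading.Lock()

    def get(self, worker: str) -> float:
        with self._lock:
            return self._scores[worker]

    def compare_and_set(self, worker: str, expected: float, new: float) -> bool:
        """Write new only if the score still equals the starting score used."""
        with self._lock:
            if self._scores.get(worker) != expected:
                return False
            self._scores[worker] = new
            return True

def apply_resolution(store: ReputationStore, worker: str,
                     resolve: Callable[[float], float], max_retries: int = 3) -> float:
    """Submit with the current starting score; resubmit if it changed meanwhile."""
    for _ in range(max_retries):
        start = store.get(worker)
        proposed = resolve(start)          # reputation system's proposed adjusted score
        if store.compare_and_set(worker, start, proposed):
            return proposed
        # Another resolution changed the score first: retry with the updated start.
    raise RuntimeError("score kept changing; giving up after max_retries")
```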
An administrator may intervene to adjust the outcome of a particular resolution request, resulting in a different resolution system output for the task family, including different reputation score adjustments. In some embodiments, task families that were resolved subsequent to the modified resolution are re-run with the modified resolution as a new starting point, which in turn may require other task resolutions to be re-run to reflect changes to historical reputation scores of participating workers and/or work product owners. In some embodiments, historical information regarding which resolution algorithms, etc. were used for specific tasks or types of task over time is used to ensure that the same approach is used to re-run resolution of a task family as was applied in the original resolution.
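Re-running later task families after an administrator's override can be sketched as replaying an ordered history of resolutions from the modified point forward. Here each resolution is modeled, hypothetically, as a function that reads current scores and returns per-worker deltas; reusing the same functions for the later entries mirrors applying the same resolution approach on re-run. The names `run_history` and `rerun_after_override` are illustrative.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]
Resolution = Callable[[Scores], Scores]   # reads current scores, returns per-worker deltas

def run_history(initial: Scores, resolutions: List[Resolution]) -> List[Scores]:
    """Apply resolutions in order; return a snapshot of the scores after each one."""
    scores = dict(initial)
    snapshots = []
    for resolve in resolutions:
        for worker, delta in resolve(dict(scores)).items():
            scores[worker] = scores.get(worker, 0.0) + delta
        snapshots.append(dict(scores))
    return snapshots

def rerun_after_override(initial: Scores, resolutions: List[Resolution],
                         index: int, override: Resolution) -> List[Scores]:
    """Replace the resolution at `index` and re-run it and every later one."""
    before = run_history(initial, resolutions[:index])
    start = before[-1] if before else dict(initial)
    return before + run_history(start, [override] + resolutions[index + 1:])
```

Because a later resolution may depend on scores produced by an earlier one, overriding the first entry can change the outcome of every entry after it, which is why the replay starts at the modified point.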
While in certain embodiments a task performed by an originating worker and work produced in response to such a task are described, techniques disclosed herein are applied in other embodiments to other types of work product, including without limitation work product produced in whole or in part by a machine. Examples include content translated at least initially by a machine and/or search engine results. In the latter case, for example, “reviewing” workers may be asked to judge whether search results generated in response to a query were useful.
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
CROSS REFERENCE TO OTHER APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 61/403,834 entitled OUTSOURCING TASKS VIA A NETWORK filed Sep. 21, 2010, which is incorporated herein by reference for all purposes.
Number | Date | Country
---|---|---
61403834 | Sep 2010 | US