Getting started with Mechanical Turk

September 20th, 2011 by Yali

Amazon has done an excellent job of making Mechanical Turk very easy to use. It also provides great documentation to help users get started. The purpose of this post, then, is to provide a high-level overview of how to:

  1. Think conceptually about using Mechanical Turk
  2. Use the web UI Amazon provides to do the actual implementation

Define your HIT(s)

At the heart of every Mechanical Turk engagement is what Amazon calls a “Human Intelligence Task” or “HIT”. Each HIT is an independent unit of work.

As we mentioned in our last blog post, we have been using Mechanical Turk to check the language of short content items. We already have an inkling of what language each content item is in, but we are only 70-80% sure that we are correct, so we use Mechanical Turk to get real people to check whether each guess is correct.

In our case, then, each “HIT” consists simply of a worker checking the content and either confirming that the content is in the language we thought it was, or not. Notice this HIT has several important characteristics:

  1. It is very simple to explain. “Look at the sentence below. Is it in French? If so, click ‘yes’. Otherwise, click ‘no’.”
  2. It is independent. We have millions of sentences to check. However, the checking of each individual sentence is a completely independent task: there is no requirement that the person checking sentence A needs also to check sentence B. Hence it is possible that many hundreds or thousands of workers can work on the tasks in parallel, to ensure they are done quickly.
  3. It is repeatable. We can ask a number of different workers to perform the same task, and they should all give the same answer.  (This becomes important for ensuring the accuracy of results, because it means that we can verify the accuracy of individual tasks and individual workers.)

Best practices for defining HITs

1. Make your HIT “as small as possible”

Because you pay workers on a “per HIT” basis, it is tempting to get more out of each HIT by asking workers to provide more information. (For example, asking them not just whether a webpage contains pornographic content, but also whether it contains expletives.) Avoid this temptation! It makes the instructions more difficult to follow (which increases the probability of inaccurate results), makes it harder to assess accuracy (because you have to assess the accuracy of two answers per HIT rather than one) and does not save money (because workers expect more money for HITs that take longer).

Instead, ask just one question per HIT. If you have two questions about a specific content item / web page / anything else, create two HITs. This keeps each HIT simple and allows you to adopt different strategies to ensure the accuracy of answers for each question.
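As a rough illustration of the "one question per HIT" rule, the sketch below expands a list of items and a list of questions into one row per (item, question) pair. The function name and the "Content" and "Question" column names are hypothetical, chosen only for this example.

```python
from itertools import product


def one_question_per_hit(items, questions):
    """Return one HIT row per (item, question) pair."""
    return [
        {"Content": item, "Question": question}
        for item, question in product(items, questions)
    ]


# Hypothetical example data: two pages, two closed questions.
pages = ["http://example.com/a", "http://example.com/b"]
questions = [
    "Does this page contain pornographic content?",
    "Does this page contain expletives?",
]

hits = one_question_per_hit(pages, questions)
print(len(hits))  # 2 pages x 2 questions = 4 HITs
```

Each row can then be checked and paid for independently, which is the point of keeping the questions apart.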

2. Ask closed questions rather than open questions

Instead of asking “what language is this sentence in?”, ask “is this sentence in French?”. Closed questions are preferable for a number of reasons:

  • The results are easier to analyse. HITs are performed by humans, but the output should be machine-processable if you plan to use Mechanical Turk as part of a set of scalable business processes. That will be much easier if only one of a finite number of results is possible for each HIT
  • It is easier to assess the accuracy of the results
  • Closed questions are (normally) easier for the workers to answer. This makes it more likely that they will give accurate answers
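To show why a finite set of answers is easier to process, here is a minimal sketch that maps a raw answer onto a boolean and rejects anything else. It assumes the answers come back as the strings "yes" and "no"; the function name is our own.

```python
# The only answers a closed yes/no question can produce (assumed values).
ALLOWED_ANSWERS = {"yes": True, "no": False}


def parse_answer(raw):
    """Map a raw answer string to a boolean, rejecting anything unexpected."""
    key = raw.strip().lower()
    if key not in ALLOWED_ANSWERS:
        raise ValueError(f"unexpected answer: {raw!r}")
    return ALLOWED_ANSWERS[key]


print(parse_answer("yes"))   # True
print(parse_answer(" No "))  # False
```

An open question would need free-text cleaning before any of this could happen; with a closed question, anything outside the allowed set is simply an error.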

Publishing HITs on Mechanical Turk

Amazon provides 3 ways to create HITs, publish them and collect the results: a web UI, a command-line interface and an API. When getting started, the web UI is the simplest interface to use.

1. Set the basic parameters of the HIT

Before creating a new HIT template, Amazon makes users fill in a long webform. Some guidance on the individual sections:

Describe your HIT

HIT “Title”, “Description” and “Keywords” should all be filled in, both to make it as easy as possible for workers to find your HIT (by searching for appropriate keywords e.g. “French, language”) and to define the task as clearly and unambiguously as possible.

Working on your HIT

Overestimate the amount of time allotted to each HIT. Workers are typically good at looking at a couple of sample HITs and making a guess as to how long each will take – they will use this figure (rather than the figure you give) to calculate the effective hourly wage you’re paying. So the only real impact of filling in this field is to cut workers short when they’re doing a task: for that reason it’s always best to give a much higher number than is realistic.

Amazon also offers a set of options for ensuring that only workers with a specific history of accuracy are allowed to work on your HITs. There are a number of different strategies for ensuring accuracy, which we will discuss in a later blog post. To start, we suggest not limiting HITs to “Masters”: there are other, more reliable means of ensuring accuracy.

Paying workers

There are no hard and fast rules on what you should pay workers. The more you pay, the more likely workers will be to prioritise your HITs and the faster they will be completed. We generally look to pay workers at least $5 per hour, and have been impressed by the speed with which work has been completed at that rate.
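The arithmetic for turning a target hourly rate into a per-HIT reward can be sketched as follows. The 10 seconds per HIT is an assumed figure for illustration only; measure your own HITs before setting a price.

```python
def reward_cents_per_hit(hourly_rate_cents, seconds_per_hit):
    """Smallest whole-cent reward that pays at least the target hourly rate."""
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-hourly_rate_cents * seconds_per_hit // 3600)


# $5.00 per hour, assuming a worker needs about 10 seconds per HIT.
cents = reward_cents_per_hit(500, 10)
print(cents)  # 2, because 500 * 10 / 3600 is about 1.39 cents, rounded up

# The effective hourly rate the worker actually sees at that reward, in cents.
print(cents * 3600 / 10)  # 720.0, i.e. $7.20 per hour
```

Rounding up to a whole cent can move the effective rate quite a lot on very short HITs, so it is worth checking the second figure as well as the first.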

2. Create the HIT template

The second step is to design the “HIT template”: the webform with the question each worker will answer. Assuming each HIT involves asking a closed question, Amazon provides a visual designer to make creating the form straightforward.

The corresponding HTML for the form is very simple, and is shown below:

<h3><span style="font-family: Arial; ">Is this Twitter update in French?</span></h3>
<pre><span style="font-family: Verdana; ">${Message}</span></pre>
<table cellspacing="4" cellpadding="0" border="0">
  <tbody>
    <tr>
      <td valign="middle"><span style="font-family: Arial; "><input type="radio" name="is_language" id="is_language_yes" value="yes" /><span class="answertext">Yes this update is in French</span></span></td>
    </tr>
    <tr>
      <td valign="middle"><span style="font-family: Arial; "><input type="radio" name="is_language" id="is_language_no" value="no" /><span class="answertext">No this update is not in French</span></span></td>
    </tr>
  </tbody>
</table>

Note the ${Message} placeholder. Before we run the HIT, we will upload a CSV file into Mechanical Turk. Each line in the CSV will represent one HIT, and each line will therefore need to contain a cell with the actual message whose language we want to confirm. The title of that column will be “Message”, and Amazon will automatically insert that content into the webform where the ${Message} placeholder is found.
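To get a feel for the placeholder mechanism, the substitution can be imitated locally. The sketch below uses Python's string.Template, which happens to share the ${name} syntax; it is only an approximation for previewing your own data, not a description of how Amazon implements the merge. The shortened template and the two sample messages are made up for the example.

```python
import csv
import io
from string import Template

# A cut-down version of the HIT template, keeping only the placeholder part.
template = Template(
    "<h3>Is this Twitter update in French?</h3>\n<pre>${Message}</pre>"
)

# Stand-in for the uploaded CSV: a header row, then one line per HIT.
csv_text = "Message\nBonjour tout le monde\nHello world\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))

# One rendered form per CSV line; the column title picks the placeholder.
forms = [template.substitute(row) for row in rows]
print(forms[0])
```

If a placeholder in the template has no matching column, substitute raises a KeyError, which is a cheap way to catch a mismatch before uploading anything.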

3. Upload the relevant data to Amazon and publish the HIT

Once the template has been created, you need to upload the data to serve into each webform.

At the very least, the CSV you upload should have one column for every placeholder in the webform – in our example above, there is only one, ${Message}. It is also possible to introduce additional columns of data: for example, if the data comes from more than one source, you may want to indicate which source in a “source” column in the CSV. The source column will not impact the HITs. However, when the data comes back from Amazon (with the HIT results), the source fields will be included, and this will make it easy e.g. to analyse the accuracy of different sources.

Note: Amazon is very fussy that the uploaded file is UTF-8 encoded. For that reason, we recommend not using MS Excel to generate or edit CSV files used with Mechanical Turk, because Excel has a tendency to mess up the character encoding.
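One way to sidestep the encoding problem is to generate the file with a script. Below is a minimal sketch using Python's standard csv module with the encoding set explicitly; the function name, file name and sample rows are our own, and the columns follow the “Message” and “source” example above.

```python
import csv
import os
import tempfile


def write_hit_input(path, rows, fieldnames=("Message", "source")):
    """Write HIT input rows as a UTF-8 encoded CSV with a header line."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)


# Made-up sample data, including non-ASCII characters and an embedded comma.
rows = [
    {"Message": "Ça va très bien, merci", "source": "twitter"},
    {"Message": "Good morning everyone", "source": "forum"},
]

path = os.path.join(tempfile.mkdtemp(), "hits.csv")
write_hit_input(path, rows)
print(path)
```

The csv module also takes care of quoting any message that contains a comma or a quote character, which is easy to get wrong when building the lines by hand.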

Once the data is uploaded, you can preview the HIT.

Assuming you are satisfied with the HIT as previewed, you are in a position to go live!

4. Download the results

Once workers have completed all your HITs, you will want to download the results. Again, Amazon makes this straightforward via its web UI.

5. Next steps

Using the results: approaches to accuracy

Clearly there’s no point in going to the effort (and expense) of using Mechanical Turk unless you use the results to do something useful. Quite what that is depends on your use case – but in most use cases, you’ll want some way to ensure that the results are accurate. This is a big topic: we’ll discuss one particular approach to accuracy in a forthcoming blog post.

Completing the Mechanical Turk workflow: approving HITs and paying workers

Before workers are paid, you have to approve the work that they’ve done. That approval process should be part of your general approach to accuracy – again, we’ll be covering this in a subsequent blog post.

Assuming, though, that you have verified the accuracy of the work, you will need to approve each HIT. In the downloaded CSV, Amazon will have included an “approved” column. The simplest way to approve HITs is by marking an “x” in this column for each approved HIT, and then uploading it back into Mechanical Turk. Amazon will then deduct the funds from your account, and pay the workers accordingly.
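For large batches, marking the column by hand gets tedious. The sketch below fills it in from a script. Everything specific here is an assumption to check against your own download: the “HITId” and “Answer.is_language” column names, the sample rows, and the is_accepted callback, which is a placeholder for whatever accuracy check you settle on.

```python
import csv
import io


def mark_approved(results_csv, is_accepted, approve_column="approved"):
    """Return the results CSV with an "x" in the approval column of accepted rows."""
    reader = csv.DictReader(io.StringIO(results_csv))
    fieldnames = list(reader.fieldnames)
    if approve_column not in fieldnames:
        fieldnames.append(approve_column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[approve_column] = "x" if is_accepted(row) else ""
        writer.writerow(row)
    return out.getvalue()


# Hypothetical results: the second worker submitted without answering.
sample = "HITId,Answer.is_language,approved\nh1,yes,\nh2,,\n"

result = mark_approved(
    sample,
    lambda row: row["Answer.is_language"] in ("yes", "no"),
)
print(result)
```

The acceptance rule here only checks that an answer was given at all; a real rule would come out of your approach to accuracy.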

Stay tuned…

Hopefully this blog post has given you the encouragement to start experimenting with Mechanical Turk to automate some of your own business processes using HITs. Feel free to share your experiences getting started with Mechanical Turk in the comments below, and in the next post in this series we will be looking at strategies for improving the quality of your results.
