My HITs have a rather high bounce rate. Between 40% and 50% of the Turkers who preview my HIT choose not to accept it. I previously posted a histogram of the screen widths that I observed from workers who had accepted at least one HIT. That is very clearly a biased sample; it could be that only workers with screens large enough to comfortably display my HIT choose to accept it. I was curious to see if there was another population of Turkers who chose not to accept my HIT because their screens were too small.
I made the necessary modifications to my webapp and then generated the following graph:
Screen resolution observed for Experiment 15
You’ll notice that there isn’t much of a practical difference between the workers who accept the HIT and those who do not. This makes me feel a little better. I’m not worried that my bounce rate is due to a display artifact. It does make me wonder, though: is my bounce rate typical?
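For anyone who wants to compute the same number, here is a minimal Python sketch of the bounce-rate calculation. It is not the code from my webapp. It assumes the app logs one record per page load with a visitor key and the assignmentId query parameter; AMT sets assignmentId to ASSIGNMENT_ID_NOT_AVAILABLE while a worker is only previewing an ExternalQuestion HIT. The visitor key (for example a session cookie) and the sample records are hypothetical.

```python
# Value AMT puts in the assignmentId parameter during HIT preview.
PREVIEW_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def bounce_rate(page_loads):
    """Fraction of previewing visitors who never accepted the HIT.

    page_loads: iterable of (visitor_id, assignment_id) tuples.
    visitor_id is a hypothetical key such as a session cookie.
    """
    previewed = set()
    accepted = set()
    for visitor_id, assignment_id in page_loads:
        if assignment_id == PREVIEW_ID:
            previewed.add(visitor_id)
        else:
            accepted.add(visitor_id)
    if not previewed:
        return 0.0
    bounced = previewed - accepted
    return len(bounced) / len(previewed)


# Made-up log: four visitors preview, two of them go on to accept.
loads = [
    ("a", PREVIEW_ID), ("a", "A1"),
    ("b", PREVIEW_ID),
    ("c", PREVIEW_ID), ("c", "A2"),
    ("d", PREVIEW_ID),
]
print(bounce_rate(loads))  # 0.5
```

The same split into previewed and accepted sets is what lets you draw the screen-width histogram separately for each group.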
I wrote in my previous post about some scripts I developed to make interacting with Amazon Mechanical Turk a bit easier from the command line. I didn’t talk much about the web2py + Google AppEngine piece that actually serves the ExternalQuestion HITs. I realized after talking to another PhD student that people might be interested in a way to host general-purpose HITs for free. My web2py application does that.
Getting it running is pretty simple. Sign up for an AppEngine account, install their SDK, install web2py into the SDK, and copy my application (really a directory) into your web2py installation. Use the development webserver included with the SDK to test that your installation is sane. Push it to Google to make sure your account is working right. Once you have it running on appspot, install the Amazon command line tools, install my wrapper scripts to make them behave better, and run the test experiment on the sandbox to verify that AMT + GAE + web2py are talking nicely in the clouds.
Once that’s done, defining your own rubrics for HITs is pretty easy:
- Define a new controller to handle your custom rubrics. Example:
grade6.py to hold
- Copy the provided method template and modify it to handle a given rubricCode. The template builds a SQLFORM.factory to present the questions to Turkers and then validate the form input. Once the form is accepted, the method processes the result (scores it) and forwards it to a generic method that writes it to the GAE datastore and sends it back to AMT. Example:
def grade6_test2_problem35_version1() ....
- Copy the provided view template and modify it to ask the question you want for a given rubric code. The template extends a view class that knows how to display an informed consent in preview mode and track whenever a Turker clicks on a form element. It uses standard web2py tricks to protect you against injection and give you form validation for free. Example:
- Prep the HITs by using my scripts to cross the rubricCode with the image (or whatever) you want to display. Run it on the sandbox and test it out. Promote the experiment and run it on production when you are ready.
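To make the last step concrete, here is a rough Python sketch of what "crossing" the rubricCode with the images means: one HIT per (rubricCode, image) pair. This is not my actual prep script. It assumes a tab-delimited input file with a header row, which is the format I understand the Amazon command line tools to read; the column names and image file names are made up for illustration.

```python
import csv
import io
from itertools import product


def build_input_rows(rubric_codes, image_names):
    """Return one row per (rubricCode, image) pair, i.e. one row per HIT."""
    return [
        {"rubricCode": rubric, "image": image}
        for rubric, image in product(rubric_codes, image_names)
    ]


def write_input_file(rows, handle):
    """Write rows as tab-delimited text with a header row (assumed format)."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["rubricCode", "image"],  # hypothetical column names
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example: one rubric crossed with two scanned images.
rows = build_input_rows(
    ["grade6_test2_problem35_version1"],
    ["scan001.png", "scan002.png"],
)
buffer = io.StringIO()
write_input_file(rows, buffer)
print(buffer.getvalue())
```

Each row then becomes one HIT, with the rubricCode deciding which controller method and view get served.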
Okay. Well, maybe the setup doesn’t look easy, but a complete definition of a HIT is just over 125 lines of code, including comments. That’s not really that bad. It’s a heck of a lot easier than trying to put an ExternalQuestion together from scratch. If the internet is interested, I’ll clean up the application code (read: remove the parts pertinent to my research and IRB restrictions) and post it on GitHub. Leave a comment or send me mail if you are interested.
Amazon Mechanical Turk is a useful thing. Interacting with it can be a giant pain.
Anything under 700 pixels is fair game.
But that’s not what this post is about. I’d like to eventually use boto to build the control of AMT directly into the webapp itself. For now, I’m using the command line interface that Amazon provides. The CLT is an ugly hack that implements the Java API in a bunch of shell scripts. I have written my own wrapper around their scripts that enforces a certain amount of sanity. You can get the scripts over on GitHub.
There isn’t really any documentation beyond the scripts themselves. The idea is that you create an amt-script directory where the new-and-improved scripts live. Under that directory, you create several exp# directories that hold the info that you need for experiments. Even ones are for production runs. Odd ones are for sandbox runs. Once you get a sandbox run working you run buildGoLiveExp.sh and it makes a copy from the staged experiment to a new go-live experiment. It’s a bit of a hack at the moment, but it works for me. I like it because it gives me an audit trail for each thing I run on AMT. Feel free to use them yourself. (Or use them as inspiration for something better that you can write yourself!)
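If the even/odd convention sounds confusing, here is a small Python sketch of it. This is an illustration only, not what buildGoLiveExp.sh actually does. In particular, the rule for picking the new go-live number (the smallest even number above every existing experiment) is my assumption for the example, as are the function names.

```python
import re

# Experiment directories are named exp1, exp2, exp3, ...
EXP_DIR = re.compile(r"^exp(\d+)$")


def exp_number(dirname):
    """Extract the numeric part of an exp# directory name."""
    match = EXP_DIR.match(dirname)
    if match is None:
        raise ValueError("not an experiment directory: %r" % dirname)
    return int(match.group(1))


def is_production(dirname):
    """Even-numbered experiments are production; odd ones are sandbox."""
    return exp_number(dirname) % 2 == 0


def next_go_live(existing_dirs):
    """Pick a name for a new production experiment.

    Assumed rule: the smallest even number above every existing experiment.
    Names that do not look like exp# are ignored.
    """
    numbers = [exp_number(d) for d in existing_dirs if EXP_DIR.match(d)]
    candidate = max(numbers, default=0) + 1
    if candidate % 2:
        candidate += 1
    return "exp%d" % candidate


print(is_production("exp3"))                   # False: sandbox run
print(next_go_live(["exp1", "exp2", "exp3"]))  # exp4
```

Because every run gets its own directory and nothing is overwritten, the directory listing itself is the audit trail.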