Back in undergrad, I wrote a term paper for May Berenbaum’s Insects & People honors seminar. The assignment was to relate something about insects to people’s daily lives. At the time, I was finishing up the first course in the year-long classical mechanics sequence, so my daily life was physics problem sets. For the first time, I noticed something. There are a lot of insects in mechanics word problems (if a roach is walking on a turntable…; if a bee flies in a spiral parametrized by… ). It seems they were always there, but take a class on insects and all of a sudden you notice them making cameos. So, that’s what I wrote about. It was a fun paper.
Seven years later, this paper makes it into the first chapter of her book: The Earwig’s Tail: A Modern Bestiary of Multi-legged Legends.
It’s kinda funny reading it on Google Books. My only complaint is that she misspelled my last name. It’s not Van Houdnos; it’s VanHoudnos. That’s why I didn’t know about it; I had never Google-stalked myself with the alternate spelling of “nathan van houdnos” before.
I’m pretty excited about this. I’ll have to see if she’ll send me a signed copy. :)
This essay in the American Entomologist is actually my first citation in a published work. It’s from Spring 2003, so it beats the book by several years. (Of course, the essay is the relevant part of the book; it’s not like I have two citations or anything.)
My HITS have a rather high bounce rate. Between 40% and 50% of the Turkers who preview my HIT choose not to accept it. I previously posted a histogram of the screen widths that I observed from workers who had accepted at least one HIT. That is very clearly a biased sample; it could be that only workers with screens large enough to comfortably display my HIT choose to accept it. I was curious to see if there was another population of Turkers that chose not to accept my HIT because their screens were too small.
I made the necessary modifications to my webapp and then generated the following graph:
Screen resolution observed for Experiment 15
You’ll notice that there isn’t much of a practical difference between the workers who accept the HIT and those that do not. This makes me feel a little better. I’m not worried that my bounce rate is due to a display artifact. It does make me wonder though, is my bounce rate typical?
I wrote in my previous post about some scripts I developed to make interacting with Amazon Mechanical Turk a bit easier from the command line. I didn’t talk much about the web2py + Google AppEngine piece that actually serves the ExternalQuestion HITS. I realized after talking to another PhD student that people might be interested in a way to host general purpose HITS for free. My web2py application does that.
Getting it running is pretty simple. Sign up for an AppEngine account, install their SDK, install web2py into the SDK, and copy my application (really a directory) into your web2py installation. Use the development webserver included with the SDK to test that your installation is sane. Push it to google to make sure your account is working right. Once you have it running on appspot, install the Amazon Command line tools, install my wrapper scripts to make them behave better, and run the test experiment on the sandbox to verify that AMT + GAE + web2py are talking nicely in the clouds.
Once that’s done, defining your own rubrics for HITS is pretty easy:
- Define a new controller to handle your custom rubrics. Example:
grade6.py to hold
- Copy the provided method template and modify it to handle a given rubricCode. The template builds a SQLFORM.factory to present the questions to Turkers and then validate the form input. Once the form is accepted, the method processes the result (scores it) and forwards it to a generic method to write it to the GAE datastore and sends it back to AMT. Example:
def grade6_test2_problem35_version1() ....
- Copy the provided view template and modify it to ask the question you want for a given rubric code. The template extends a view class that knows how to display an informed consent in preview mode and track whenever a Turker clicks on a form element. It uses standard web2py tricks to protect you against injection and give you form validation for free. Example:
- Prep the HITS by using my scripts to cross the rubricCode with the image (or whatever) you want to display. Run it on the sandbox and test it out. Promote the experiment and run it on production when you are ready.
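The "cross" in that last step is just a Cartesian product of rubric codes and images. Here is a minimal sketch in Python of what that prep might look like. It is not the actual script: the column names, the tab-separated output, and the image file names are all made up for illustration.

```python
import csv
import io
from itertools import product


def cross_rubrics_with_images(rubric_codes, images):
    """Return one row per (rubricCode, image) pair, i.e. one row per HIT."""
    return [
        {"rubricCode": code, "image": image}
        for code, image in product(rubric_codes, images)
    ]


def to_tsv(rows):
    """Render the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["rubricCode", "image"],  # hypothetical column names
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical inputs; real rubric codes and image names will differ.
rows = cross_rubrics_with_images(
    ["grade6_test2_problem35_version1"],
    ["response_001.png", "response_002.png"],
)
print(to_tsv(rows))
```

Each row then corresponds to one HIT to load, first on the sandbox and later on production.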
Okay. Well maybe the setup doesn’t look easy, but a complete definition of a HIT is just over 125 lines of code including comments. That’s not really that bad. It’s a heck-of-a-lot easier than trying to put an ExternalQuestion together from scratch. If the internet is interested, I’ll clean up the application code (read: remove the parts pertinent to my research and IRB restrictions) and post it on GitHub. Leave a comment or send me mail if you are interested.
Amazon Mechanical Turk is a useful thing. Interacting with it can be a giant pain.
Anything under 700 pixels is fair game.
But that’s not what this post is about. I’d like to eventually use boto to build the control of AMT directly into the webapp itself. For now, I’m using the command line interface that Amazon provides. The CLT is an ugly hack that implements the Java API in a bunch of shell scripts. I have written my own wrapper around their scripts that enforces a certain amount of sanity. You can get the scripts over on GitHub.
There isn’t really any documentation beyond the scripts themselves. The idea is that you create an amt-script directory where the new-and-improved scripts live. Under that directory, you create several exp# directories that hold the info that you need for experiments. Even ones are for production runs. Odd ones are for sandbox runs. Once you get a sandbox run working, you run buildGoLiveExp.sh and it makes a copy from the staged experiment to a new go-live experiment. It’s a bit of a hack at the moment, but it works for me. I like it because it gives me an audit trail for each thing I run on AMT. Feel free to use them yourself. (Or use them as inspiration for something better that you can write yourself!)
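To make the odd/even convention concrete, here is a small Python sketch that sorts experiment directories into sandbox and production runs. It is not part of the wrapper scripts; it assumes the directories are named literally exp followed by a number (exp15, exp16, ...), and the function names are made up for illustration.

```python
import re

# Assumption: experiment directories are named "exp" plus a number.
EXP_DIR = re.compile(r"^exp(\d+)$")


def classify_experiment(dirname):
    """Return 'sandbox' for odd-numbered dirs, 'production' for even ones.

    Returns None for anything that is not an exp# directory.
    """
    match = EXP_DIR.match(dirname)
    if match is None:
        return None
    number = int(match.group(1))
    return "sandbox" if number % 2 == 1 else "production"


def group_experiments(dirnames):
    """Group exp# directory names by run type, ignoring everything else."""
    groups = {"sandbox": [], "production": []}
    for name in dirnames:
        kind = classify_experiment(name)
        if kind is not None:
            groups[kind].append(name)
    return groups


print(group_experiments(["exp15", "exp16", "notes", "exp17"]))
```

Something like this could be pointed at the output of os.listdir on the amt-script directory to get a quick overview of what has run where.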
I figured I’d document this since I have to do it again. If you are using Google App Engine and you get the following:
2011-01-26 14:17:45,892 WARNING appengine_rpc.py:405 ssl module not found.
Without the ssl module, the identity of the remote host cannot be verified, and
connections may NOT be secure. To fix this, please install the ssl module from
To learn more, see http://code.google.com/appengine/kb/general.html#rpcssl .
You download the file from the page suggested in the error. Then do this:
vanhoudn@gauze:~/Downloads$ tar xzf ssl-1.15.tar.gz
vanhoudn@gauze:~/Downloads$ cd ssl-1.15/
vanhoudn@gauze:~/Downloads/ssl-1.15$ python2.5 setup.py build
If it worked, great! Just do sudo python2.5 setup.py install and you are done. If not, you may have run into this error:
vanhoudn@gauze:~/Downloads/ssl-1.15$ python2.5 setup.py
looking for /usr/include/openssl/ssl.h
looking for /usr/local/ssl/include/openssl/ssl.h
looking for /usr/contrib/ssl/include/openssl/ssl.h
Traceback (most recent call last):
  File "setup.py", line 167, in <module>
    ssl_incs, ssl_libs, libs = find_ssl()
  File "setup.py", line 142, in find_ssl
    raise Exception("No SSL support found")
Exception: No SSL support found
You can fix that by installing the relevant parts of the SSL source code with sudo apt-get install libssl-dev. Trying again you might get:
vanhoudn@gauze:~/Downloads/ssl-1.15$ python2.5 setup.py build
looking for /usr/include/openssl/ssl.h
looking for /usr/include/krb5.h
looking for /usr/kerberos/include/krb5.h
building 'ssl._ssl2' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I./ssl/2.5.1 -I/usr/include/python2.5 -c ssl/_ssl2.c -o build/temp.linux-i686-2.5/ssl/_ssl2.o
ssl/_ssl2.c:17:20: error: Python.h: No such file or directory
In file included from ssl/_ssl2.c:76:
./ssl/2.5.1/socketmodule.h:112: error: expected specifier-qualifier-list before ‘PyObject_HEAD’
.... lots of other errors ....
error: command 'gcc' failed with exit status 1
You can fix that by installing the headers for python2.5. Apparently, when I initially installed deadsnakes I forgot to install the headers. Easy enough with sudo apt-get install python2.5-dev. But now I get a bluetooth-related error:
In file included from ssl/_ssl2.c:76:
./ssl/2.5.1/socketmodule.h:45:33: error: bluetooth/bluetooth.h: No such file or directory
./ssl/2.5.1/socketmodule.h:46:30: error: bluetooth/rfcomm.h: No such file or directory
./ssl/2.5.1/socketmodule.h:47:29: error: bluetooth/l2cap.h: No such file or directory
./ssl/2.5.1/socketmodule.h:48:27: error: bluetooth/sco.h: No such file or directory
So, you guessed it: sudo apt-get install libbluetooth-dev. Then finally success! Finish with sudo python2.5 setup.py install and the GAE error goes away!
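If you would rather find all three missing pieces before the first build attempt, a quick check along these lines might help. This is only a sketch in Python: the ssl.h location is the first one the build log searches, the Python.h location is inferred from the -I/usr/include/python2.5 flag in the gcc line, and the bluetooth header location is my assumption (the error only names bluetooth/bluetooth.h).

```python
import os

# Header the build needs -> apt package from this post that provides it.
# The bluetooth path is an assumption; the other two come from the build log.
REQUIRED_HEADERS = [
    ("/usr/include/openssl/ssl.h", "libssl-dev"),
    ("/usr/include/python2.5/Python.h", "python2.5-dev"),
    ("/usr/include/bluetooth/bluetooth.h", "libbluetooth-dev"),
]


def missing_packages(exists=os.path.exists):
    """Return the apt packages whose header file is not present.

    `exists` can be swapped out so the check is testable without
    touching the real filesystem.
    """
    return [package for header, package in REQUIRED_HEADERS if not exists(header)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("sudo apt-get install " + " ".join(missing))
    else:
        print("All headers found; try: python2.5 setup.py build")
```

Note that it only checks the first of the locations that setup.py searches for ssl.h, so a header installed under /usr/local/ssl would be reported as missing.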
I’m one of those people who often gets lost on Wikipedia. I also get distracted with the news. And Arts & Letters Daily and just about everything it links to. I’ve been known to follow several blogs. I spend too much time on Twitter pursuing the links that people share. All in all, I spend a lot of time reading and thinking about the wider world (or, really, just the parts of it that I am interested in).
This semester I’m much too busy to do this. I’ve got too many things to do to just keep up, one huge thing to get off the ground, and a giant plane to land before May. So, here it goes. I’m giving up the internet. I’ll still blog because I need the practice writing, but unless it’s a comment on the blog, I’m not going to follow it. Email will also work, if you want to get a hold of me. But I’m no longer allowing myself to check the dozens of RSS feeds I follow. No more Twitter. No more news or blogs. I’ve got work-related reading to do if I’m bored. I’ll come out of the cave eventually. This too shall pass.
I’m updating my resume/CV for content and style. To help with the latter, I’m following some examples that use the moderncv LaTeX package.
I had trouble getting it to work. I like the look of this example, but I couldn’t get the provided TeX to compile. Instead, the error I got was:
LaTeX Error: File `marvosym.sty' not found.
Googling didn’t help me find an answer, so I searched the apt repository manually to see if this file was provided by a missing package:
root@gauze:~# apt-cache search marvosym
texlive-fonts-recommended - TeX Live: Recommended fonts
ttf-marvosym - Symbol font for school and office
root@gauze:~# apt-get install texlive-fonts-recommended
fixed the problem. So now, Google, index this page and be more useful to the future.
One of the reasons that I don’t often take advantage of the cool features in Revolution R is that I absolutely can’t stand their Visual Studio interface. Previously, if I wanted to run something in RevoR, I fired up the RGui.exe that comes buried in their distribution and used R’s built-in script editor. My normal workflow is to use StatEt inside of Eclipse, so dealing with R’s meager editor was always painful. (Although less painful than the bloated VS-standalone alternative.)
Over the break, I ran across Luke Miller’s excellent post on getting Eclipse set up with StatEt the right way. I was able to follow his tutorial to get vanilla 64-bit R set up on a new installation of 64-bit Eclipse Helios. Once that was working, I changed two things to add a second shortcut for RevoR.
First, I followed his directions to install rJava in RevoR:
R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
Type 'revo()' to visit www.revolutionanalytics.com for the latest
Revolution R news, 'forum()' for the community forum, or 'readme()'
for release notes.
package 'rJava' successfully unpacked and MD5 sums checked
The downloaded packages are in
And then installed rj in RevoR, once again using his directions.
C:\Revolution\Revo-4.0\RevoEnt64\R-2.11.1\bin>R CMD INSTALL --no-test-load "C:\Users\nathanvan\Downloads\rj_0.5.2-1.tar.gz"
* DONE (rj)
And finally, I set up Eclipse with a second Run Configuration, which I named Revo-R-x64-2.11.1. Now I can run the 64-bit version of RevoR without having to deal with the Visual Studio interface. If I get around to it, I’ll post some performance numbers. (The last time I used the VS interface, it was noticeably slower than calling RGui.exe directly.)
After talking with Cosma Shalizi, I’ve decided to set my work-related New Year’s Resolution as blogging once a week. This is more of a goal than a commitment; if I don’t have anything useful to share or I don’t have time to share it, there won’t be a post. The hope is to be high-quality at low-volume. I intend to blog the afternoon after my Advanced Probability problem set is due.
I’ll try to post (in roughly increasing frequency):
- General summaries of things that I finish (papers, released code, file-drawer projects).
- Small self-contained bits of research when I can. These will be tidbits too small to go up on arXiv, but likely to be of use to at least someone. First up: subsetting rating data.
- Reviews of research aimed at a general audience with lots of links and as much context as I can muster.
- Ideas that I don’t have time to pursue but would rather like someone else to actualize so that I could purchase their product/service.
- Reactions to scholarly ideas in the popular press. (If I’m training to become a public intellectual, I might as well start.)
- Snippets of code that may be of general use. (Including benchmarking and howto posts.)
Let’s see if I can keep this up for at least a semester.
It’s been a while.
I’ve got some preliminary results that I’ll be presenting to the PIER EdBag on Thursday. If there is interest, I’ll post my slides for those of you who are not able to make it.
I need to redo my computing setup; the easiest way to do this is to nuke everything and restore my files. Unfortunately, that would mean that I’d have to get my application stack set up again (albeit a better setup, hence the exercise). But I don’t have the time to take the time to get set up in a way that would save me time. So, hobble, hobble.
At least it’s almost Thanksgiving.