Giving up the internet

I’m one of those people who often gets lost on Wikipedia. I also get distracted with the news. And Arts & Letters Daily and just about everything it links to. I’ve been known to follow several blogs. I spend too much time on Twitter pursuing the links that people share. All in all, I spend a lot of time reading and thinking about the wider world (or, really, just the parts of it that I am interested in).

This semester I’m much too busy to do this. I’ve got too many things to do to just keep up, one huge thing to get off the ground, and a giant plane to land before May. So, here it goes. I’m giving up the internet. I’ll still blog because I need the practice writing, but unless it’s a comment on the blog, I’m not going to follow it. Email will also work, if you want to get hold of me. But I’m no longer allowing myself to check the dozens of RSS feeds I follow. No more Twitter. No more news or blogs. I’ve got work-related reading to do if I’m bored. I’ll come out of the cave eventually. This too shall pass.

moderncv won’t work on Ubuntu 10.04 because of marvosym.sty

I’m updating my resume/cv for content and style. To help with the latter, I’m following some examples that use the moderncv LaTeX package.

I had trouble getting it to work. I like the look of this example, but I couldn’t get the provided TeX to compile. Instead, the error I got was:

LaTeX Error: File `marvosym.sty' not found.

Googling didn’t help me find an answer, so I searched the apt repository manually to see if this file was provided by a missing package:

root@gauze:~# apt-cache search marvosym
texlive-fonts-recommended - TeX Live: Recommended fonts
ttf-marvosym - Symbol font for school and office

Installing

root@gauze:~# apt-get install texlive-fonts-recommended

fixed the problem. So now, Google, index this page and be more useful to the future.
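For anyone hitting the same error, a check along these lines may save a search. This is only a rough sketch: it assumes kpsewhich (the TeX Live file lookup tool) is on the PATH, and the function name find_sty is made up for illustration.

```shell
#!/bin/bash
# find_sty is a hypothetical helper, not part of TeX Live or moderncv.
# It asks kpsewhich where TeX would load a given style file from.
find_sty() {
    local sty="$1" path
    if ! command -v kpsewhich >/dev/null 2>&1; then
        echo "kpsewhich not found; is TeX Live installed?" >&2
        return 2
    fi
    # kpsewhich prints nothing when the file is not installed.
    path=$(kpsewhich "$sty") || true
    if [ -n "$path" ]; then
        echo "$path"
    else
        echo "$sty is missing; try: apt-cache search ${sty%.sty}" >&2
        return 1
    fi
}
```

Running find_sty marvosym.sty prints the installed path if TeX can see the file, and otherwise suggests the same apt-cache search used above.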

Revolution R with Eclipse Helios

One of the reasons that I don’t often take advantage of the cool features in Revolution R is that I absolutely can’t stand their Visual Studio interface. Previously, if I wanted to run something in RevoR, I fired up the RGui.exe that comes buried in their distribution and used R’s built-in script editor. My normal workflow is to use StatEt inside of Eclipse, so dealing with R’s meager editor was always painful. (Although less painful than the bloated VS-standalone alternative.)

Over the break, I ran across Luke Miller’s excellent post on getting Eclipse set up with StatEt the right way. I was able to follow his tutorial to get vanilla 64-bit R set up on a new installation of 64-bit Eclipse Helios. Once that was working, I changed two things to add a second shortcut for Revo R.

First, I followed his directions to install rJava in RevoR:

C:\Users\nathanvan>cd C:\Revolution\Revo-4.0\RevoEnt64\R-2.11.1\bin
C:\Revolution\Revo-4.0\RevoEnt64\R-2.11.1\bin>R.exe

R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
Type 'revo()' to visit www.revolutionanalytics.com for the latest
Revolution R news, 'forum()' for the community forum, or 'readme()'
for release notes.

> install.packages("rJava")
...
package 'rJava' successfully unpacked and MD5 sums checked

The downloaded packages are in
C:\Users\nathanvan\AppData\Local\Temp\RtmpG3tMzb\downloaded_packages

And then installed rj in RevoR, once again using his directions.

C:\Revolution\Revo-4.0\RevoEnt64\R-2.11.1\bin>R CMD INSTALL --no-test-load "C:\Users\nathanvan\Downloads\rj_0.5.2-1.tar.gz"
...
* DONE (rj)

And finally I set up Eclipse with a second Run Configuration, which I named Revo-R-x64-2.11.1. Now I can run the 64-bit version of RevoR without having to deal with the Visual Studio interface. If I get around to it, I’ll post some performance numbers. (The last time I used the VS interface, it was noticeably slower than calling RGui.exe directly.)

A common one

After talking with Cosma Shalizi, I’ve decided to set my work-related New Year’s Resolution as blogging once a week. This is more of a goal than a commitment; if I don’t have anything useful to share or I don’t have time to share it, there won’t be a post. The hope is to be high-quality at low-volume. I intend to blog the afternoon after my Advanced Probability problem set is due.

I’ll try to post (in roughly increasing frequency):

  1. General summaries of things that I finish (papers, released code, file-drawer projects).
  2. Small self-contained bits of research when I can. These will be tidbits too small to go up on arXiv, but likely to be of use to at least someone. First up: subsetting rating data.
  3. Reviews of research aimed at a general audience with lots of links and as much context as I can muster.
  4. Ideas that I don’t have time to pursue but would rather like someone else to actualize so that I could purchase their product/service.
  5. Reactions to scholarly ideas in the popular press. (If I’m training to become a public intellectual, I might as well start.)
  6. Snippets of code that may be of general use. (Including benchmarking and howto posts.)

Let’s see if I can keep this up for at least a semester.

hobble hobble

It’s been a while.

I’ve got some preliminary results that I’ll be presenting to the PIER EdBag on Thursday. If there is interest, I’ll post my slides for those of you who are not able to make it.

I need to redo my computing setup; the easiest way to do this is to nuke everything and restore my files. Unfortunately that would mean that I’d have to get my application stack set up again (albeit a better setup, hence the exercise). But I don’t have the time to take the time to get set up in a way that would save me time. So, hobble, hobble.

At least it’s almost Thanksgiving.

A memory leak in getResults from AMT CLT? Nope.

I had a strange error popping up. Every time I tried to use the getResults.sh script outside of a project in the samples directory, an unidentified process would eat up all of the memory on my machine. Worse, when I used strace, the process seemed to be just hanging there.

I tried a bunch of things:

  1. Moving the directory back to samples. No dice.
  2. Switching from OpenJDK to Sun’s JDK. Since the CLT doesn’t support 1.6, I followed the directions here to install sun-java5-jdk and set JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun/jre/
  3. Taking another close look at the script.

Here is an excerpt from the getResults.sh script:

JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun/jre/
export JAVA_HOME
DIR=`pwd`
./getResults.sh $1 $2 $3 $4 $5 $6 $7 $8 $9 -successfile $DIR/rate-3-4-36.success -outputfile $DIR/rate-3-4-36.results
cd $DIR

Notice that I forgot to change to the bin directory of the CLT. (I don’t know why, but you seemingly have to be in the bin directory for the CLT to work properly. I haven’t messed with it much yet; I might have screwed up my path somehow when I first tried.)

If you add the line:

cd ~/workspace/aws-mturk-clt-*/bin

It works. My script was recursing on itself. That’s what was eating the memory. And it didn’t show up in top because each invocation of the script used only a tiny bit of memory. Man, I feel dumb.

I only post this in case someone else has these troubles.
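A wrapper that avoids the accidental recursion might look something like the sketch below. The function name run_clt_get_results and the CLT_BIN variable are illustrative and not part of the CLT; the default path assumes the CLT lives under ~/workspace as it does on my machine.

```shell
#!/bin/bash
# Hypothetical wrapper; CLT_BIN and the function name are assumptions.
# Point CLT_BIN at the bin directory of your own CLT install.
CLT_BIN="${CLT_BIN:-$HOME/workspace/aws-mturk-clt-1.3.0/bin}"

run_clt_get_results() {
    local start_dir
    start_dir=$(pwd)
    # Refuse to run unless the CLT's own script is really there.
    if [ ! -x "$CLT_BIN/getResults.sh" ]; then
        echo "no executable getResults.sh in $CLT_BIN" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is untouched.
    # Because of the cd, ./getResults.sh is the CLT's script, not ours.
    ( cd "$CLT_BIN" && ./getResults.sh "$@" \
        -successfile "$start_dir/rate-3-4-36.success" \
        -outputfile "$start_dir/rate-3-4-36.results" )
}
```

Giving the wrapper a name different from getResults.sh is the other half of the fix: even if the cd were forgotten again, ./getResults.sh could no longer resolve to the wrapper itself.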

XML for AMT ExternalQuestion

If you are trying to use Amazon Mechanical Turk for something and you don’t know XML very well, you might run into this error when trying to write an ExternalQuestion with the command line tools.

vanhoudn@gauze:~/workspace/aws-mturk-clt-1.3.0/samples/rate_tests$ ./run.sh
--[Initializing]----------
Input: ../samples/rate_tests/rate_tests.input.csv
Properties: ../samples/rate_tests/rate_tests.properties
Question File: ../samples/rate_tests/rate_tests.question
Preview mode disabled
--[Loading HITs]----------
Start time: Thu Aug 05 16:12:33 EDT 2010
[Fatal Error] :6:111: The reference to entity "student2" must end with the ';' delimiter.
[ERROR] Error creating HIT 1 (3001): [6,111] The reference to entity "student2" must end with the ';' delimiter.
...

The issue is that I’m trying to pass more than one variable via the url and this:

<ExternalURL>

http://www.contrib.andrew.cmu.edu/~nmv/amt/test.php?

student1=${helper.urlencode($student1)}
&student2=${helper.urlencode($student2)}
&student3=${helper.urlencode($student3)}
</ExternalURL> 

Should instead be:

<ExternalURL>

http://www.contrib.andrew.cmu.edu/~nmv/amt/test.php?

student1=${helper.urlencode($student1)}
&amp;student2=${helper.urlencode($student2)}
&amp;student3=${helper.urlencode($student3)}
</ExternalURL> 

The & that separates the variables needs to be replaced with &amp;, because a bare & starts an entity reference in XML. Then it works just fine.
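If the URL is being assembled by a script, the escaping can be done mechanically. Here is a minimal sketch using sed; xml_escape_amp is a made-up helper name, and it assumes the input contains only bare ampersands (it would double-escape any that are already written as &amp;).

```shell
#!/bin/bash
# xml_escape_amp is a hypothetical helper for this example.
# It replaces every & with &amp; so the URL is legal inside XML.
# Caveat: it does not detect ampersands that are already escaped.
xml_escape_amp() {
    # In a sed replacement a plain & means "the matched text",
    # so the literal ampersand has to be written as \&.
    printf '%s\n' "$1" | sed 's/&/\&amp;/g'
}

xml_escape_amp 'test.php?student1=a&student2=b&student3=c'
# prints: test.php?student1=a&amp;student2=b&amp;student3=c
```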

Better ways to access NCES data

Since I’m supported by an Institute of Education Sciences (IES) pre-doctoral training fellowship (ours is called PIER), I had the opportunity to attend the 2010 IES Research Conference: Connecting Research, Policy and Practice. The morning of the last day of the conference, John Easton, the director of IES, held a round-table discussion with all of the pre-doc students that IES supports.

I asked him why the National Center for Education Statistics (NCES), which is a sub-agency of IES, only made their data available as flat files. I compared their offerings with the World Bank Data Catalog, which goes so far as to offer an API to access the data in addition to csv and xml flat files. I also mentioned that even though they have a lot of GIS information (such as the boundaries of every school district since the mid-90s), they don’t make it easy to mash up that information onto, say, a Google Maps layer.

Dr. Easton replied that he was aware of the problem and aware of the limitations of the current set of data tools offered by NCES. He asked for concrete use cases that he could forward to the appropriate people. That way they can develop towards requests from users instead of guesses about what they think users might want.

I’d like to get back to him in about a week and I thought I’d open the discussion to others. What would you like to see NCES offer in terms of data access?

Messing with R packages

This was really frustrating. I’m trying to modify a package from Matt Johnson and although I could get the package he sent me to install flawlessly, I couldn’t un-tar it, make a change, re-tar it, and then R CMD INSTALL it. I was about to pull out my hair. The error I got was:
ERROR: cannot extract package from ‘hrm-rev9.tar.gz’

The secret: you have to have the name correct.
R CMD INSTALL hrm-rev9.tar.gz
barfs. But
R CMD INSTALL hrm_0.1-9.tar.gz
works fine. I’m sure it’s somewhere in the docs. I just couldn’t find it.

As always, I made a script to do it for me: (Updated 6/17/2010 15:41)

#!/bin/bash
# Quick script to tar & gzip the package, remove the old one, and install the new one
# I'll add options to automatically tag and release it later.

#Set the library that I'm using
LIB="/home/vanhoudn/R/i486-pc-linux-gnu-library/2.10/"

#Commit
svn commit -m "Build commit"

#get the revision number from svn
REV=`svn info -R | grep Revision | cut -d: -f 2 | sort -g -r | head -n 1 | sed 's/ //g'`

#Build the filename
FILENAME="hrm_0.1-$REV.tar.gz"

# I need to tar up the pkg so I can install it.
# Jump to the parent directory and work from there.
cd ..
# Exclude any hidden files under the directories (svn has a bunch)
# and add the named files
tar czf $FILENAME --exclude '.*' hrm/DESCRIPTION hrm/NAMESPACE hrm/src hrm/R

# Remove the old version of the package
R CMD REMOVE -l $LIB hrm

# Install the new package
R CMD INSTALL $FILENAME

# Clean up
rm $FILENAME

# Go back to our previous directory
cd hrm
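Since the install only worked once the tarball was named in the package_version form, the name could also be derived from the DESCRIPTION file instead of being hard-coded. The sketch below is one way to do that; pkg_tarball_name is a hypothetical helper, and it assumes DESCRIPTION has plain single-line Package: and Version: fields. (R CMD build hrm also produces a tarball named this way.)

```shell
#!/bin/bash
# pkg_tarball_name is a hypothetical helper, not an R tool.
# It reads the Package and Version fields from a DESCRIPTION file and
# prints the tarball name in the <Package>_<Version>.tar.gz form.
pkg_tarball_name() {
    local desc="$1" pkg ver
    pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$desc" | head -n 1)
    ver=$(sed -n 's/^Version:[[:space:]]*//p' "$desc" | head -n 1)
    if [ -z "$pkg" ] || [ -z "$ver" ]; then
        echo "Package or Version missing in $desc" >&2
        return 1
    fi
    echo "${pkg}_${ver}.tar.gz"
}
```

With a DESCRIPTION that says Package: hrm and Version: 0.1-9, calling pkg_tarball_name hrm/DESCRIPTION gives hrm_0.1-9.tar.gz, the name that installed cleanly above.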

StatEt in Ubuntu 10.04

I wanted a “lightweight” version of Eclipse to run R from Ubuntu. (I installed eclipse-pde using apt-get. It worked fine.) Once it was running, I installed StatEt via the “Install new software” feature from http://download.walware.de/eclipse-3.5. While it was downloading, I opened up an R console and ran install.packages("rJava"). When the installation of both StatEt and rJava finished, I restarted Eclipse. This is when things stopped working, and I couldn’t really find any step-by-step directions on how to proceed. Here is what I did:

  1. Run -> Run Configurations
  2. Click on R-Console in the left pane. This will create a new run configuration. Change the name to “R 2.10”.
  3. Click on the “R_Config” tab. Choose “Selected Configuration:” and then hit the “Configure…” button.
  4. Click “Add”. Change “Location (R_Home):” to “/usr/lib/R” and click “Detect Default Properties/Settings”. Click “Ok” until you are back at the “Run Configurations” window.
  5. This is the important step. Without it you will get

    Launching the R Console was cancelled, because it seems starting the Java process/R engine failed.
    Please make sure that R package 'rJava' with JRI is installed and look into the Troubleshooting section on the homepage.

    Click on the JRE tab. In the “VM Arguments” box add
    -Drjava.path=/home/<username>/R/i486-pc-linux-gnu-library/2.10/rJava

    Where <username> is your username. (You are providing the path to rJava. For some reason, even though Eclipse detects it during the setup in the “R_Config” step, it doesn’t seem to share that information with the JRE.)

  6. Click Run. It should work.
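If you are not sure where rJava ended up, R can tell you. The following is a small sketch rather than anything StatEt provides: rjava_vm_arg is a made-up helper that formats the VM argument from step 5, and the lookup assumes Rscript is on the PATH and that rJava is installed for that copy of R.

```shell
#!/bin/bash
# rjava_vm_arg is a hypothetical helper. Given the directory where the
# rJava package is installed, it prints the matching VM argument.
rjava_vm_arg() {
    echo "-Drjava.path=$1"
}

# Ask R where rJava lives, if Rscript is available on this machine.
# system.file() returns an empty string when the package is not found.
if command -v Rscript >/dev/null 2>&1; then
    rjava_dir=$(Rscript -e 'cat(system.file(package = "rJava"))' 2>/dev/null) || rjava_dir=""
    if [ -n "$rjava_dir" ]; then
        rjava_vm_arg "$rjava_dir"
    fi
fi
```

The line it prints can be pasted straight into the “VM Arguments” box.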