R Tips and Links

Links     The Shiny package     Tips for working in R     R Programming Tips     Problems and Solutions     Useful Functions    


Helpful R Links

  1. My R Class Notes    (Advanced)
  2. Official R Site Search     RSeek (Google-type search for R related material)     Inside R (documentation for base R and packages)
  3. Documentation (incl. Download)   (Hint: Try creating a bookmark to "C:\Program Files\R\rw2001\doc\html\rwin.html" in Windows; substitute your most recent version for "rw2001"; the link to "Search Engine and Keywords" is most helpful.)
  4. Packages     Crantastic Package Page     R Example Graph Library
  5. FAQ     R-help for asking questions   Bug reporting    
  6. R Inferno: problems and solutions
  7. R Studio: an integrated development environment for R
  8. R color chart (pdf)
  9. Intro to R
  10. R Language Definition
  11. Operator Precedence
  12. R Data Import/Export
  13. R Reference Guide (pdf, 450 pages, 12MB)
  14. Evaluating the design of the R language (pdf)
  15. aRrgh: a newcomer's (angry) guide to R (gripes from the CS community)
  16. R inferno (traps and tips)
  17. Advanced R Programming
  18. Vectorization in R
  19. R Data Import/Export
  20. Big Data with R (and Python)
  21. A good proposed Programming Style Guide
  22. Areas of statistics: CRAN Task Views
  23. Lumley's R Fundamentals (pdf)
  24. R Reference Card
  25. Peng's Debugging in R (pdf)
  26. R Programming Resource Center
  27. Mathematical annotation in plots ("plotmath")
  28. R-News
  29. R wiki
  30. Spherula (R quick reference, scripts, book notes, etc.)
  31. OmegaHat: interfaces to other languages
  32. Package gdata has function read.xls() which can read Excel files. An alternative is the RODBC package.
  33. Exchanging data between R and MS Windows apps (Excel, etc)
  34. Rtools needed for building packages (choose "Download R for" and an operating system)
  35. Nabble R forum
  36. Kickstarting R
  37. Using R for psychological research (personality-project)
  38. York U R tips
  39. Books
  40. Theresa Scott's tutorial
  41. O'Reilly free tutorial
  42. R for Cats tutorial
  43. CUNY R Tutorial
  44. ILSTU 1-Page R Tutorial (Windows)
  45. Tips for Creating, Modifying, and Checking Data Frames
  46. Practical Regression and Anova using R (pdf)
  47. Non-Parametric Inference with R by Larry and Chad (pdf)
  48. Example of running repeated measures in R to match SPSS (etc)
  49. nlme() mixed models guide (pdf)

"Shiny": a package for creating interactive R applications with a web browser interface

  1. Try it: To try shiny, just click on one of the links given here. (Note: These run on a remote server. Normally you will develop, and perhaps deploy, your shiny apps from within your R session.)
  2. Official Homepage: The Shiny website
  3. Function Reference: Shiny at Inside R
  4. Official Tutorial: The shiny tutorial
  5. Getting started:
    1. Start R in any directory
    2. Install shiny using install.packages("shiny") if it has not previously been installed.
    3. Run library("shiny") once per R session (or place this command in the .First() function).
    4. Place files named ui.R and server.R in the directory (or in another directory, e.g., named "foo"). You can create these files from scratch, or you may want to start with the files linked here, which implement a simple histogram vs. boxplot app.
    5. From R in your working directory, run runApp() (or runApp("foo") if the ui.R and server.R files were placed in another directory).
    6. Interact with the app in the browser window.
    7. Make whatever additions / changes you need to either file to change the app so that it does what you want it to do (see the official tutorial and Inside R for details).
    8. Click the browser's "reload" button to see the updated app after making any changes. (Check the R console for possible error messages.)
    9. To quit the app, use the "escape" key in the R console window.
  6. Five ways to deploy your shiny app:
    1. Distribute your ui.R and server.R files to users. They only need to put them in a directory, load shiny in R, and run an R command like runApp() or runApp("myShinyDirectory").
    2. Put your ui.R and server.R files in a github gist. Then users only need to load shiny and run an R command like runGist("myFirstShinyApp")
    3. Put your ui.R and server.R files in a zip file (as a subdirectory) and put the zip file on your website. Then users only need to load shiny and run an R command like runUrl("myFirstShinyApp.zip")
    4. Let R Studio host your app on their server. Your users only need to enter the URL in their browser
    5. Set up your own server (advanced). Your users only need to enter the URL in their browser
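
    As a rough illustration of the "Getting started" steps above, here is a minimal sketch of a histogram vs. boxplot app. It is not the example linked from this page; the input name "kind", the output name "thePlot", and the simulated data are assumptions made for illustration. Both objects are shown in one script for brevity. To follow the steps above, move the ui definition into ui.R and the server function into server.R, so that each file ends with the corresponding object.

```r
library(shiny)

# User interface: one control and one plot (the names are illustrative)
ui = fluidPage(
  titlePanel("Histogram vs. boxplot"),
  radioButtons("kind", "Plot type:", choices=c("Histogram", "Boxplot")),
  plotOutput("thePlot")
)

# Server logic: the plot is redrawn whenever input$kind changes
server = function(input, output) {
  x = rnorm(100)   # simulated data, generated once per browser session
  output$thePlot = renderPlot({
    if (input$kind == "Histogram") {
      hist(x, main="Histogram of x")
    } else {
      boxplot(x, main="Boxplot of x")
    }
  })
}

# Bundle the two pieces into an app object
app = shinyApp(ui=ui, server=server)
# In an interactive session, start the app with: runApp(app)
```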

Tips for Working in R

  1. Use help.start() to bring up help in a browser; the link to "Search Engine and Keywords" is most useful.
  2. When looking at complex expressions, decode them by working from the inside out. E.g., here is a decomposition of some code to make a density plot of the product of two normal random variables using a sample of size 20. (Note the optional use of "tmp" so that every line works with the same random numbers.)
         plot(density(apply(matrix(rnorm(40),20), 1, prod)))
         tmp = rnorm(40)   # a vector of 40 standard normal variates
         tmp
         matrix(tmp,20)  # put into a matrix of 20 rows and 40/20=2 columns
         apply(matrix(tmp,20), 1, prod)  # the 20 products
         density(apply(matrix(tmp,20), 1, prod)) # the density estimate
         plot(density(apply(matrix(tmp,20), 1, prod))) # the plot
        
  3. Keep a text record of all working R commands needed to re-run your analysis. Ideally you should be able to source() the file and recreate your work, e.g. if your client finds an error in the data (which happens 98% of the time according to Seltman's Law of Data Analysis).
  4. Under Linux, "ESS" (Emacs Speaks S) is usually the most efficient way to work. Briefly, you start emacs, then use "Alt-X R" to start R from within emacs. The ESS menu (with keyboard shortcuts) allows you to automatically run code that you write, among other features. The home page is ESS.
  5. Write out TRUE and FALSE, because T and F can be redefined.
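    A small sketch of why this matters: T is an ordinary variable that can be reassigned, while TRUE is a reserved word. The variable names resultT and resultTRUE are made up for the illustration.

```r
T = 0                                    # allowed: T is just a variable
resultT = if (T) "yes" else "no"         # "no", because T is now 0
resultTRUE = if (TRUE) "yes" else "no"   # "yes"
# TRUE = 0 would be an error, because TRUE is a reserved word
rm(T)                                    # remove the masking variable; T is TRUE again
```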
  6. When defining and redefining columns of a data.frame, make liberal use of summary() and table(..., exclude=NULL) to verify that you accomplished what you tried to accomplish.
  7. Important: Remember that table() ignores missing data; use table(..., exclude=NULL) to also see missing data.
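    The difference is easy to see on a small made-up vector (the data and variable names below are illustrative only). The useNA="ifany" argument of table() is another way to ask for the missing-value count.

```r
gender = c("F", "M", NA, "F", NA)       # made-up data with two missing values
tabDefault = table(gender)              # the NA values are silently dropped
tabAll = table(gender, exclude=NULL)    # adds an <NA> entry with count 2
tabAny = table(gender, useNA="ifany")   # same counts as tabAll
sum(tabDefault)   # 3
sum(tabAll)       # 5
```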
  8. You can use the .First function to automatically load libraries that you frequently use, or to perform other startup tasks. E.g.
     .First = function() {library(nlme); options(locatorBell=FALSE)} 
    will load the nlme library every time R starts up in the current directory. It also turns off the annoying sound associated with the locator() function.
  9. Use this function to find large, unneeded objects that can be removed to free up space:
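         sizes = function() {
           env = parent.frame()   # the environment sizes() was called from
           ob = objects(name=env)
           # look each object up in env, so that an object that happens to be
           # named "x" is not confused with the function argument x
           rslt = sapply(ob, function(x){object.size(get(x, envir=env))})
           return(sort(rslt))
         }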
         
  10. Contrast testing in R (using C() or contrasts()) ignores your scaling, so although the t-values and p-values are correct, the estimates and standard errors (and any confidence intervals you construct from them) are incorrect. To do this correctly, use fit.contrast() in package "gmodels". E.g.
         x = factor(rep(LETTERS[1:3], each=20)); y = rnorm(60)
         m1 = aov(y~x)
         library(gmodels)
         cont = rbind(AvsBC = c(1, -1/2, -1/2), BvsC = c(0, 1, -1))
         fit.contrast(m1, "x", cont, conf.int=0.95)
         
  11. If you are working on a public computer without write access to where most of R lives, you can still install packages to a private space (Windows example shown here, but it is similar on Linux). Make a directory you can write to, e.g., c:\myPackages (written as "c:\\myPackages" inside an R string, where each backslash must be doubled). In R, to install, e.g., package "mice" use
         install.packages("mice", "c:\\myPackages")
         
    Then in each R session use
         library(mice, lib.loc="c:\\myPackages")
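
    An alternative to passing lib.loc every time is to add the private directory to R's library search path with .libPaths(), after which a plain library(mice) also searches there. A minimal sketch, using a temporary directory as a stand-in for c:\myPackages so that it runs on any system:

```r
myLib = file.path(tempdir(), "myPackages")   # stand-in for your own directory
dir.create(myLib, showWarnings=FALSE)
.libPaths(c(myLib, .libPaths()))             # put it first in the search path
.libPaths()[1]                               # the private library
# install.packages("mice") would now install into the first entry (myLib),
# and library(mice) would look there first
```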
         
  12. A system for documenting data analysis projects:

    Here is an idea for making R code that stores comments and results in a separate, readable file. This is especially nice when you might need to re-source() your code due to changes in the data or analysis (i.e., essentially always). Optionally, you can run reportLatex() after all of your report() commands to create a .tex file that is formatted better and incorporates graphical output (see below).

    (An alternative is sweave. Unlike sweave, report does not require you to understand latex, and it has only a single command to learn.)

    The code and more documentation are at report.R.

    Put these two lines near the top of your code:

         if (!exists("report")) source("http://www.stat.cmu.edu/~hseltman/files/report.R")
         report("Start of my report on project X", new=TRUE, prefix="myProjectX")
         
    This creates a file named "myProjectXYYYY-MM-DD.txt" with the quoted string in the first argument as the text at the top of the file. You can include "\n" in the first argument to write multiple lines in one call. You can optionally add the argument useTime=TRUE to include the time of creation along with the date in the file name if you want to keep multiple versions from the same day.

    Note that the variable "reportFileName" is created in your global environment and you should not delete this variable, at least while you are working on any one report.

    Now anytime in your code, you can include code of the form

     report(x) 
    or
     report(x, ..., z)
    to cause the value of x (or all of the variables x through z) to go to both the screen and the report file. This constructs the report on-the-fly as you work through your analysis. (If multiple arguments are used with report() and they are all strings or single numbers, then they are pasted together without any spaces between them, i.e., using sep="".)

    Note that you can manually erase errors from the report file using a text editor.

    Note that with a little planning, you will be in the situation such that if you re-source() your whole .R file, e.g., after correcting an error in the data or analysis, you will end up with a brand new, complete, readable report of the entire analysis with no effort.

    Note that the screen width affects the output by controlling the usual R text wrapping, e.g., with table(). Normally, you will want to keep the screen width around 60-70 characters to make it easier to read the report.

    Note that whenever you run report(x, new=TRUE, ...), if the report file name matches an existing file, the old file is deleted.

    Here are some examples that demonstrate what you can do:

         report("\nDemographics")
         report(table(age, gender))
         report(paste("Number of visits =", nrow(dat)))
         report("\nSuccess by treatment")
         report(with(dat, table(success, treatment, exclude=NULL)))
         report("\nYears of education:")
         report(summary(demog$educ))
         report(paste("\nDropping", sum(noVisits|oneVisit), 
                      "subjects with no CERAD's or only 1 visit"))
         report(expression(str(my.data.frame)))
         
    The last example uses "expression()" because the "str()" function breaks the usual R rules and uses "cat()" rather than returning its result as an object. The "stem()" function is another example.

    There are three helper functions in report.R.

    1. matForm(x, cols=12) converts a vector (string, numeric or factor) into a string matrix with a specific number of columns (even if length(x)%%cols!=0), so that long vectors don't ruin the appearance of the report.
    2. total(tab, margins=1:2) adds totals to the result of table()
    3. pct(tab, margins=1:2) adds percents to the result of table()

    Note that pct() and total() can both be used on the same table, in either order. Each respects the results of the other to avoid the incorrect and/or confusing output that could result from, e.g., including data and their total when computing percents.

    Optionally, you can use reportLatex() (code and description in reportLatex.R) to convert your .txt file into a .tex (Latex) file. This can incorporate plots as follows: when you are going through your analysis use the report text "See ... in myPlotFile.pdf", e.g.,

         plot(rnorm(20), type="b", main="Random normals", xlab="time", ylab="x")
         fname = "rnorm.pdf"
         dev.copy(pdf, fname); dev.off()
         # Important: Be sure to put a blank between "in" and the end quote
         #            since  sep="" will be in effect.
         report("\nSee 20 Gaussians in ", fname)
         
    With or without these special graphics commands, when you run reportLatex() a .tex file is created with the same base name as your .txt report file. Note that you can manually edit the .tex file at this point if desired.

    You then process this .tex file with pdflatex myReportFile.tex in Linux (or however else you know to process Latex files on any operating system) to produce the .pdf report file.

    If you used the special graphics indicator text "See ... in someFileName.pdf", then the plots will be included in the report, and the caption of the figure will be the text between "See" and "in". Also the caption will include figure numbers starting at "Figure 1".

    If you prefer to use a different graphics file type than "pdf" (as long as it is compatible with whatever version of pdflatex or latex you are using), just run the optional form, e.g., reportLatex(extension=".pdf"), substituting your graphics extension for "pdf".

Tips for Programming in R

  1. End each function with return() or invisible() rather than using implicit returns. This conforms to standard programming practice in most other languages and makes your program easier to read.
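
    A short sketch of the two endings (the function names squareIt and showMean are made up for illustration): return() gives a value that is printed at the console, while invisible() gives a value that can be assigned but is not printed automatically.

```r
# Explicit return(): the result is printed when called at the console
squareIt = function(x) {
  return(x^2)
}

# invisible(): the result can be assigned but is not auto-printed
showMean = function(x) {
  m = mean(x)
  cat("mean =", m, "\n")
  return(invisible(m))
}

squareIt(3)                # 9
m = showMean(c(1, 2, 3))   # prints "mean = 2" and stores 2 in m
```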

  2. Start each function with checks of the arguments. It takes a little extra time but will usually repay you (or other users of the function) by pointing out the source of errors. Here is an example:
         myfun = function(dtf, name, p=0.5) {
           if (is.matrix(dtf)) dtf = data.frame(dtf)
           if (!is.data.frame(dtf)) stop("dtf must be a data.frame or matrix")
           if (!is.character(name) || length(name)!=1) stop("name must be a single character string")
           if (p<=0 || p>=1) stop("p must be in the interval (0,1)")
           ...
           return(rslt)
         }
         
  3. Allow for stopping and restarting of functions with long loops (e.g., MCMC).
    A good trick is to set up your function (or even just a loop) as follows:
         myfun = function() {
           if (file.exists("myresults.dat")) {
              ...load and use old results...
           }
           ...
           for (i in 1:10000) {
             if (file.exists("stop")) {
               write.table(myresults, file="myresults.dat")
               stop("Early stop due to detection of stop file")
             }
             ...
           }
           ...
           return(...)
         }
         
    Then, you can create a file called "stop" at any time (e.g., in Linux using "echo stop>stop" at the Linux prompt) and the function will gracefully stop at the start of the next loop iteration. Without too much work, you can probably set up your function to automatically continue wherever you left off. Just remember to delete or rename the "stop" file before running the function again.
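
    One way the outline above could be made concrete is sketched here. This is an illustration under stated assumptions, not the author's code: the function name runLoop, its arguments, and the toy running total are hypothetical; saveRDS()/readRDS() stand in for write.table(); and the function returns early instead of calling stop(), so that the saved state is also handed back to the caller.

```r
# Hypothetical restartable loop: the state (iteration counter and running
# total) is saved to resultsFile whenever stopFile is detected.
runLoop = function(nIter=10, resultsFile="myresults.rds", stopFile="stop") {
  if (file.exists(resultsFile)) {
    state = readRDS(resultsFile)          # resume from the saved state
  } else {
    state = list(i=0, total=0)            # fresh start
  }
  while (state$i < nIter) {
    if (file.exists(stopFile)) {
      saveRDS(state, resultsFile)         # save progress before stopping
      message("Early stop after iteration ", state$i)
      return(invisible(state))
    }
    state$i = state$i + 1
    state$total = state$total + state$i   # stand-in for the real work
  }
  saveRDS(state, resultsFile)
  return(state)
}

# Usage in a scratch directory: the first call sees the stop file and quits
# at once; after the stop file is removed, the second call resumes and finishes.
scratch = tempfile(); dir.create(scratch)
rf = file.path(scratch, "myresults.rds"); sf = file.path(scratch, "stop")
file.create(sf)
s1 = runLoop(10, rf, sf)
file.remove(sf)
s2 = runLoop(10, rf, sf)
```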

  4. Avoid using "attach" as a way to save typing. The major problem is that modification of old elements or creation of new ones is not saved when you quit (and "save workspace") R. This leads to insidious errors. One alternative is to use with(), e.g., something like:
         with(mydtf, plot(x, y, col=gender))
         
    where the columns of "mydtf" are x, y, and gender.

  5. Working with "non-visible functions": If you try, e.g., methods(logLik), you will find some methods (e.g., logLik.glm) that are marked with an asterisk and are "non-visible". Here is how to get a copy of those functions. Use getAnywhere(logLik.glm) to find that it is in the "namespace" called "stats". Then mylogLik.glm=stats:::logLik.glm will get you a copy of the function.
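
    The steps can be tried on a small fitted model. The sketch below uses the built-in mtcars data; the model itself is arbitrary and chosen only for illustration.

```r
fit = glm(am ~ wt, data=mtcars, family=binomial)   # any glm fit will do
getAnywhere("logLik.glm")           # reports where the function lives
mylogLik.glm = stats:::logLik.glm   # copy of the non-visible method
mylogLik.glm(fit)                   # same value as logLik(fit)
```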

  6. (Advanced) To make a nice user interface with dialog boxes, etc. consider the Tcl/Tk package. Here is a good introduction. Here is the R help. Here is a primer with an update. You might prefer a higher-level package called rpanel, described here and here, with this home page, and this package reference, and this cute little example which needs spacer.gif. Here are more R examples. Here are links about comparing tcl/tk to other systems. And here is a (non-R) Tcl/Tk Electronic Reference.

Problems and Solutions

  1. Problem: Loading dates, e.g., from Excel, and working with dates is poorly documented.     Solution: Load datetest.csv, then try the examples in Rdates.R.
  2. Problem: Each click for the locator() annoyingly causes the bell to ring.
        Solution: options(locatorBell=FALSE)
  3. Problem: Create a new data.frame column that is a complex code based on old columns.
        Solution: Create a function for one subject and apply() it to all subjects. This is more compact than an explicit for loop, although apply() still loops over the rows internally. E.g.
         myfun = function(x) {
           # Argument x should contain one row, columns a,b,e,f.
           # The result is the mean of a and f unless b is missing or non-positive (<=0),
           # in which case the min of e and f is returned.
           if (is.na(x[2]) || x[2]<=0) {
             return(min(c(x[3],x[4])))
           } else {
             return((x[1]+x[4])/2)
           }
         }
         dtf$new = apply(dtf[,c("a","b","e","f")], 1, myfun)
         
    An alternative is as follows. The optional first line may prevent some wonky errors, and is good practice.
         dtf$new = NA  # in general, NA is safer than 0, protecting against bad logic
         Sel = is.na(dtf$b) | dtf$b<=0
         dtf[Sel, "new"] = pmin(dtf$e[Sel], dtf$f[Sel])
         Sel = !is.na(dtf$b) & dtf$b>0
         dtf[Sel, "new"] = (dtf$a[Sel]+dtf$f[Sel])/2
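
    As a quick check that the two approaches agree, both can be run on a small made-up data.frame. The values below are invented for illustration; new1 holds the apply() result and new2 the vectorized result.

```r
dtf = data.frame(a=c(1, 2, 3, 4), b=c(1, NA, -1, 0),
                 e=c(10, 20, 30, 40), f=c(5, 25, 6, 7))

# Row-wise version (same logic as myfun above)
myfun = function(x) {
  if (is.na(x[2]) || x[2]<=0) {
    return(min(c(x[3], x[4])))
  } else {
    return((x[1]+x[4])/2)
  }
}
new1 = apply(dtf[, c("a","b","e","f")], 1, myfun)

# Vectorized version (same logic as the alternative above)
new2 = rep(NA, nrow(dtf))
Sel = is.na(dtf$b) | dtf$b<=0
new2[Sel] = pmin(dtf$e[Sel], dtf$f[Sel])
new2[!Sel] = (dtf$a[!Sel] + dtf$f[!Sel])/2

cbind(dtf, new1, new2)   # the two new columns agree: 3, 20, 6, 7
```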
         
  4. Problem: Analyze (all) subsets of a data.frame
       Solution: To analyze a single subset of a data.frame, you can use an index vector (logical or numeric) as the "row.selector" (first argument) of the form "incdata[row.selector, col.selector]". For example, the expression median(incdata[incdata$sex=="female", "income"]) calculates the median income of just the female subjects in data.frame "incdata".

    But writing a separate expression for each of several categories is awkward and inefficient. So the methods below present efficient alternatives. If no subsetting variable exists, consider using the R function cut() or Problem/Solution #3 to create it.

    Here is code you can paste into R to generate a sample data.frame to use as an example:

         n = 20
         incdata = data.frame(sex=c("male","female")[1+rbinom(n,1,0.5)],
                           race=c("black","white","hispanic","Asian")[1+rbinom(n,3,0.5)],
                           income=round(rnorm(n,50000,15000)),
                           networth=pmax(0,round(rnorm(n,50000,30000))))
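
    The efficient alternatives referred to above are not reproduced in this copy of the page. As a hedged sketch of the general idea, the base R functions tapply(), aggregate(), and split() with sapply() each compute a result for every subset at once. The seed and the regenerated incdata are included only to make the block self-contained; the result names medBySex, medBoth, and nBySex are made up for illustration.

```r
set.seed(1)   # arbitrary seed so the example is repeatable
n = 20
incdata = data.frame(sex=c("male","female")[1+rbinom(n,1,0.5)],
                     race=c("black","white","hispanic","Asian")[1+rbinom(n,3,0.5)],
                     income=round(rnorm(n,50000,15000)),
                     networth=pmax(0,round(rnorm(n,50000,30000))))

# Median income for each sex (a named vector)
medBySex = tapply(incdata$income, incdata$sex, median)

# Median income and net worth for each sex-by-race combination that occurs
medBoth = aggregate(cbind(income, networth) ~ sex + race, data=incdata, FUN=median)

# Any per-subset analysis: split into a list of data.frames, then loop over it
nBySex = sapply(split(incdata, incdata$sex), nrow)

medBySex
medBoth
nBySex
```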
         
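  5. Problem: Need a function like read.table(), but the data are already in a character variable
        Solution: Wrap the character vector in textConnection() so that read.table() can read it as if it were a file.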
        dat = c("category size type", "abc 23.4 g17a72", "aaa 3.2 h19h33", "bar 17 z12z12")
        dtf = read.table(textConnection(dat), header=TRUE)
      
  6. Problem: R is too slow

Useful Functions

Note: In R, use source("somefunction.R"), including the quotes, to make the functions in somefunction.R available.


All links active 5/15/2014. Please report missing links, errors, and suggestions to

