Statistical Computing, 36-350

Monday September 24, 2018

- Data frames are a representation of the “classic” data table in R: rows are observations/cases, columns are variables/features
- Each column can be a different data type (but must be the same length)
- `subset()`: function for extracting rows of a data frame meeting a condition
- `split()`: function for splitting up rows of a data frame, according to a factor variable
- `apply()`: function for applying a given routine to rows or columns of a matrix or data frame
- `lapply()`: similar, but used for applying a routine to elements of a vector or list
- `sapply()`: similar, but will try to simplify the return type, in comparison to `lapply()`
- `tapply()`: function for applying a given routine to groups of elements in a vector or list, according to a factor variable
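As a quick refresher, here is a minimal sketch of these functions on a small made-up data frame. The data frame `df` and its columns `group` and `value` are assumptions for illustration only.

```r
# Hypothetical toy data frame, made up for illustration
df = data.frame(group = factor(c("a", "b", "a", "b", "a")),
                value = c(1, 2, 3, 4, 8))

# subset(): keep only the rows meeting a condition
big = subset(df, value > 2)

# split(): break the rows into a list of data frames, one per factor level
pieces = split(df, df$group)

# lapply() returns a list; sapply() tries to simplify to a vector
means.list = lapply(pieces, function(d) mean(d$value))
means.vec = sapply(pieces, function(d) mean(d$value))

# tapply(): apply a function to groups of a vector, defined by a factor
means.tap = tapply(df$value, df$group, mean)

# apply(): apply a function over rows (1) or columns (2) of a matrix
m = matrix(1:6, nrow=2)
col.sums = apply(m, 2, sum)
```

Here `means.vec` and `means.tap` hold the same group means; they just arrive by different routes.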

*Plot basics*

Base R has a set of powerful plotting tools. An overview:

- `plot()`: generic plotting function
- `points()`: add points to an existing plot
- `lines()`, `abline()`: add lines to an existing plot
- `text()`, `legend()`: add text to an existing plot
- `rect()`, `polygon()`: add shapes to an existing plot
- `hist()`, `image()`: histogram and heatmap
- `heat.colors()`, `topo.colors()`, etc: create a color vector
- `density()`: estimate density, which can be plotted
- `contour()`: draw contours, or add to existing plot
- `curve()`: draw a curve, or add to existing plot
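A minimal sketch of how several of these functions work together: `plot()` starts a figure, and the others add layers to it. The simulated data and the label text are made up for illustration.

```r
set.seed(1)
x = seq(-2, 2, length=40)
y = x^2 + rnorm(40, sd=0.5)

plot(x, y)                       # Start a new plot
lines(x, x^2)                    # Add the true curve as a line
abline(h=0, lty=2)               # Add a horizontal dashed line at y = 0
points(0, 0, pch=19)             # Add a single filled point at the origin
text(0, 3, "Noisy quadratic")    # Add text centered at (0, 3)
legend("top", legend=c("data", "truth"), pch=c(1, NA), lty=c(NA, 1))

d = density(y)                   # Kernel density estimate of y
plot(d)                          # Density objects have their own plot method
curve(x^3, from=-2, to=2)        # Draw a function of x directly
```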

To make a scatter plot of one variable versus another, use `plot()`

```
n = 50
set.seed(0)
x = sort(runif(n, min=-2, max=2))
y = x^3 + rnorm(n)
plot(x, y)
```

The `type` argument controls the plot type. Default is `p` for points; set it to `l` for lines

`plot(x, y, type="p")`

`plot(x, y, type="l")`

Try also `b` or `o`, for both points and lines
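A short sketch comparing the two, side by side. The call to `par(mfrow=...)` is not part of the material above; it is used here only as a convenient way to put two panels in one figure.

```r
set.seed(0)
x = sort(runif(50, min=-2, max=2))
y = x^3 + rnorm(50)

par(mfrow=c(1, 2))                      # Two panels side by side
plot(x, y, type="b", main='type="b"')   # Points joined by lines, with gaps
plot(x, y, type="o", main='type="o"')   # Lines drawn over the points
par(mfrow=c(1, 1))                      # Reset to a single panel
```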

The `main` argument controls the title; `xlab` and `ylab` are the x and y labels

`plot(x, y, main="A noisy cubic") # Note the default x and y labels`

`plot(x, y, main="A noisy cubic", xlab="My x variable", ylab="My y variable")`

Use the `pch` argument to control point type

`plot(x, y, pch=21) # Empty circles (similar to the default, pch=1)`

`plot(x, y, pch=19) # Filled circles`

Try also `20` for small filled circles, or `"."` for single pixels
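To see what the integer codes look like, one option is to draw them all in a single plot. This is a minimal sketch; the layout choices (axis limits, label size) are arbitrary.

```r
# Plot symbols 0 through 25 in a row, labeled by their pch number
k = 0:25
plot(k, rep(1, length(k)), pch=k, ylim=c(0.5, 1.5),
     xlab="pch value", ylab="", yaxt="n", main="Point types")
text(k, 0.8, labels=k, cex=0.7)   # Label each symbol beneath it
```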

Use the `lty` argument to control the line type, and `lwd` to control the line width

`plot(x, y, type="l", lty=1, lwd=1) # Solid line, default width`

`plot(x, y, type="l", lty=2, lwd=3) # Dashed line, 3 times as thick`
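The integer line types 1 through 6 can be compared in one figure. This sketch draws one horizontal line per type on an otherwise empty plot; the axis range is arbitrary.

```r
# Empty plot (type="n" draws axes but no points) to hold the lines
plot(0:7, 0:7, type="n", xlab="", ylab="lty value")
for (i in 1:6) {
  abline(h=i, lty=i, lwd=2)   # Horizontal line at height i, using line type i
}
```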

Use the `col` argument to control the color. Can be:

- An integer between 1 and 8 for basic colors
- A string for any of the 657 available named colors

The function `colors()` returns a string vector of the available colors

`plot(x, y, pch=19, col=1) # Black, default`
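A minimal sketch of the different ways to specify `col`. The particular color name `"steelblue"` is just one arbitrary pick from `colors()`, and the data is regenerated here so the block runs on its own.

```r
all.cols = colors()    # Character vector of the named colors
length(all.cols)       # How many named colors there are
head(all.cols)         # First few names

set.seed(0)
x = sort(runif(50, min=-2, max=2))
y = x^3 + rnorm(50)

plot(x, y, pch=19, col=2)                 # Integer: second basic color
plot(x, y, pch=19, col="steelblue")       # String: a named color
plot(x, y, pch=19, col=heat.colors(50))   # Vector: one color per point
```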