- Data frames are a representation of the “classic” data table in R: rows are observations/cases, columns are variables/features
- Each column can be a different data type (but must be the same length)
- `subset()`: function for extracting rows of a data frame meeting a condition
- `split()`: function for splitting up rows of a data frame, according to a factor variable
- `apply()`: function for applying a given routine to rows or columns of a matrix or data frame
- `lapply()`: similar, but used for applying a routine to elements of a vector or list
- `sapply()`: similar, but will try to simplify the return type, in comparison to `lapply()`
- `tapply()`: function for applying a given routine to groups of elements in a vector or list, according to a factor variable
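As a minimal sketch of the functions listed above, the following builds a small data frame and applies each function once. The data frame `df` and its column names (`name`, `group`, `score`, `hours`) are made up for illustration only.

```r
# A small, made-up data frame: columns differ in type but share one length
df = data.frame(
  name  = c("a", "b", "c", "d", "e", "f"),
  group = factor(c("x", "y", "x", "y", "x", "y")),
  score = c(1, 4, 2, 8, 3, 6),
  hours = c(10, 20, 30, 40, 50, 60)
)

# subset(): keep only the rows meeting a condition
high = subset(df, score > 2)

# split(): break the rows into a list of data frames, one per factor level
by_group = split(df, df$group)

# apply(): column means (MARGIN = 2) of the two numeric columns
col_means = apply(df[, c("score", "hours")], 2, mean)

# lapply() always returns a list; sapply() simplifies it to a vector here
n_rows_list = lapply(by_group, nrow)
n_rows_vec  = sapply(by_group, nrow)

# tapply(): mean score within each level of group
group_means = tapply(df$score, df$group, mean)
```

Note that `lapply()` and `sapply()` compute the same values here; only the container differs (a list versus a named vector).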

*Plot basics*

Base R has a set of powerful plotting tools. An overview:

- `plot()`: generic plotting function
- `points()`: add points to an existing plot
- `lines()`, `abline()`: add lines to an existing plot
- `text()`, `legend()`: add text to an existing plot
- `rect()`, `polygon()`: add shapes to an existing plot
- `hist()`, `image()`: histogram and heatmap
- `heat.colors()`, `topo.colors()`, etc: create a color vector
- `density()`: estimate density, which can be plotted
- `contour()`: draw contours, or add to existing plot
- `curve()`: draw a curve, or add to existing plot
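A rough sketch of how several of these functions combine: `plot()` starts a figure, and the other calls draw on top of it. The simulated data, seed, colors, and label text below are arbitrary choices for illustration.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
x = seq(-2, 2, length.out = 40)
y = x^2 + rnorm(40, sd = 0.5)        # noisy quadratic (made-up data)

plot(x, y)                           # start a new plot
lines(x, x^2, col = "red")           # overlay the noiseless curve
abline(h = 0, lty = 2)               # dashed horizontal reference line
i = which.max(y)
points(x[i], y[i], pch = 19, col = "blue")   # highlight the largest y value
text(0, 3, "y = x^2 + noise")        # annotate at coordinates (0, 3)
legend("top", legend = c("data", "truth"),
       pch = c(1, NA), lty = c(NA, 1), col = c("black", "red"))

# A second figure: histogram on the density scale, with a density estimate
hist(y, freq = FALSE, col = heat.colors(3)[2])
d = density(y)                       # kernel density estimate
lines(d)                             # add the estimate to the histogram
```

Calls such as `lines()`, `points()`, `text()`, and `legend()` only add to the current figure, so they need a preceding `plot()` or `hist()` call.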

To make a scatter plot of one variable versus another, use `plot()`

```
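n = 50
set.seed(0)                          # fix the seed so the plot is reproducible
x = sort(runif(n, min=-2, max=2))    # n sorted draws from Uniform(-2, 2)
y = x^3 + rnorm(n)                   # cubic trend plus standard normal noise
plot(x, y)                           # scatter plot of y versus x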
```

The `type` argument controls the plot type. Default is `p` for points; set it to `l` for lines

`plot(x, y, type="p")`

`plot(x, y, type="l")`

Try also `b` or `o`, for both points and lines
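A small sketch comparing the two: `type="b"` leaves gaps in the line around each point, while `type="o"` draws the line straight through the points. The data are regenerated so the block runs on its own; the two-panel layout via `par(mfrow=...)` is an added convenience, not something the notes require.

```r
n = 50
set.seed(0)
x = sort(runif(n, min=-2, max=2))
y = x^3 + rnorm(n)

par(mfrow = c(1, 2))        # two panels side by side
plot(x, y, type = "b")      # left: points joined by lines, with gaps at points
plot(x, y, type = "o")      # right: lines overplotted through the points
par(mfrow = c(1, 1))        # restore the single-panel layout
```

Sorting `x` beforehand matters for the line types: the segments connect points in the order given, so unsorted data would produce a zigzag.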

The `main` argument controls the title; `xlab` and `ylab` are the x and y labels

`plot(x, y, main="A noisy cubic") # Note the default x and y labels`
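For comparison, a sketch that sets all three labels explicitly. The label strings are placeholders chosen for illustration, and the data are regenerated so the block is self-contained.

```r
n = 50
set.seed(0)
x = sort(runif(n, min=-2, max=2))
y = x^3 + rnorm(n)

# Custom title and axis labels replace the defaults taken from the
# expressions passed as x and y
plot(x, y, main = "A noisy cubic",
     xlab = "My x variable", ylab = "My y variable")
```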