---
title: '36-350: Lab 1, August 29 2014'
output: pdf_document
---
Today's agenda: manipulating data objects; using built-in functions,
doing numerical calculations, and making basic plots; reinforcing core
probabilistic ideas.
***General instructions for labs***: Upload an R Markdown file, named with your Andrew ID, to Blackboard. Give the commands that answer each question in their own code block; any plots they produce will be embedded automatically in the output file. Each answer must be supported by written statements as well as any code used. Include the name of your lab partner (if you have one) in the file.
***R Markdown Test***
0. Open a new R Markdown file; set the output to HTML mode and click "Knit". Knitting executes your code blocks and should produce a web page containing their output. You can edit this new file to produce your homework submission.
Background
----------
The exponential distribution with rate $\lambda > 0$ is defined by its cumulative distribution function
$$ F(x) = 1-e^{-\lambda x}, \quad x \geq 0 $$
The R function `rexp` generates random variables with an exponential
distribution.
```
rexp(n=10, rate=5)
```
produces 10 exponentially-distributed numbers with rate ($\lambda$) of 5. If
the second argument is omitted, the default rate is 1; this is the "standard
exponential distribution".
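
As a minimal sketch of the default-rate behavior described above: the seed value and the variable names `default.draws` and `explicit.draws` are arbitrary choices for illustration, not part of the lab. Setting the seed with `set.seed()` only makes the draws repeatable.

```r
# Fix the random number generator state so the draws are repeatable.
# The seed value is arbitrary.
set.seed(350)

# Omitting `rate` uses the default rate of 1.
default.draws <- rexp(n = 5)

# Reset the seed and pass rate = 1 explicitly: the same draws come back.
set.seed(350)
explicit.draws <- rexp(n = 5, rate = 1)

# TRUE when the two vectors are exactly the same.
identical(default.draws, explicit.draws)

# Exponential draws are never negative.
all(default.draws >= 0)
```
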
Part I
----------
1. Generate 200 random values from the standard exponential distribution and store them in a vector `exp.draws.1`. Find the mean and standard deviation of `exp.draws.1`.
2. Repeat, but change the rate to 0.1, 0.5, 5 and 10, storing the results in vectors called `exp.draws.0.1`, `exp.draws.0.5`, `exp.draws.5` and `exp.draws.10`.
3. The function `plot()` is the generic function in R for the visual display of data. `hist()` is a function that takes in data, bins it, and draws the resulting histogram as a side effect. To use either function, we must first specify what we'd like to plot.
a. Use the `hist()` function to produce a histogram of your draws from the standard exponential distribution.
b. Use `plot()` with this vector to display the random values from your standard exponential distribution in the order they were generated.
c. Now, use `plot()` with two arguments -- any two of your other stored random value vectors -- to create a scatterplot of the two vectors against each other.
4. We'd now like to compare the properties of each of our vectors. Begin by creating a vector of the means of our five vectors of draws, in the order we created them, and saving this to a variable name of your choice. Using this and other similar vectors, create the following scatterplots:
a. The five means versus the five rates used to generate the draws.
b. The standard deviations versus the rates.
c. The means versus the standard deviations.
For each plot, explain in words what's going on.
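
The plotting calls used in questions 3 and 4 can be sketched as follows. This is an illustration only: it uses standard normal draws rather than the lab's exponential vectors, and the names `x`, `y`, `x.hist`, `both.means` and `both.sds` are hypothetical.

```r
# Illustrative data only: standard normal draws, not the lab's vectors.
set.seed(1)
x <- rnorm(50)
y <- rnorm(50)

# hist() bins the data and draws the histogram; it also returns the
# binning invisibly, so the result can be stored and inspected.
x.hist <- hist(x)
x.hist$counts  # number of observations falling in each bin

# plot() with one numeric vector plots the values against their index.
plot(x)

# plot() with two numeric vectors of equal length draws a scatterplot.
plot(x, y)

# c() combines individual values into a vector, for example summary
# statistics gathered from several vectors, which can then be plotted.
both.means <- c(mean(x), mean(y))
both.sds <- c(sd(x), sd(y))
plot(both.means, both.sds)
```
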
Part II
-------
5. R's capacity for data and computation is large compared to what was available 10 years ago.
a. To show this, generate 1.1 million numbers from the standard exponential distribution and store them in a vector called `big.exp.draws.1`. Calculate the mean and standard deviation.
b. Plot a histogram of `big.exp.draws.1`. Does it match the function $1-e^{-x}$? Should it?
c. Find the mean of all of the entries in `big.exp.draws.1` which are strictly greater than 1. You may need to first create a new vector to identify which elements satisfy this.
d. Create a matrix, `big.exp.draws.1.mat`, containing the values in `big.exp.draws.1`, with 1100 rows and 1000 columns. Use this matrix as the input to the `hist()` function and save the result to a variable of your choice. What happens to your data?
e. Calculate the mean of the 371st column of `big.exp.draws.1.mat`.
f. Now, find the means of all 1000 columns of `big.exp.draws.1.mat` simultaneously. Plot the histogram of the column means. Explain why its shape does not match the histogram in problem 5b.
g. Take the square of each number in `big.exp.draws.1`, and find the mean of this new vector. Explain this in terms of the mean and standard deviation of `big.exp.draws.1`. ***Hint:*** think carefully about the formula R uses to calculate the standard deviation.
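
The vector and matrix operations that Part II relies on can be sketched on a small example. This is an illustration only: it uses the integers 1 to 12 rather than the lab's draws, and the names `v`, `is.large` and `m` are hypothetical.

```r
# Illustrative data only: the integers 1 to 12, not the lab's draws.
v <- 1:12

# Logical indexing: `v > 8` is a logical vector marking the elements to keep.
is.large <- v > 8
mean(v[is.large])  # mean of 9, 10, 11, 12

# matrix() fills column by column by default.
m <- matrix(v, nrow = 3, ncol = 4)

# A single column is selected with m[, j]; here the second column is 4, 5, 6.
mean(m[, 2])

# colMeans() computes the mean of every column in one call.
colMeans(m)

# Arithmetic is element-wise, so v^2 squares every entry.
mean(v^2)
```
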