21 March 2019

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Indicator}[1]{\mathbb{1}\left( #1 \right)} \]

```
library(MASS)
data(cats)
head(cats$Bwt)
```

`## [1] 2.0 2.0 2.0 2.1 2.1 2.1`

`summary(cats$Bwt)`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.300 2.700 2.724 3.025 3.900
```

- ECDF: \[
\hat{F}_n(a) = \frac{1}{n}\sum_{i=1}^{n}{\Indicator{x_i \leq a}}
\]
- Makes a jump at each distinct value seen in the data
- Constant in between the jumps
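The formula above is one line of R; a hand-rolled version (a sketch, with `my.ecdf` as a made-up name and toy data) makes the jump-and-flat behavior concrete:

```r
# A hand-rolled ECDF, following the formula above: fraction of the data <= a
my.ecdf <- function(a, x) {
  sapply(a, function(a.val) mean(x <= a.val))
}
my.ecdf(c(1.9, 2.0, 2.6, 3.5), x = c(2.0, 2.0, 2.1, 2.5, 3.0))
# 0.0 0.4 0.8 1.0
```

Note the jump at 2.0 (two tied observations, so the jump has height 2/5) and the flat stretch between observed values.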

R has a built-in function for this, `ecdf`, which returns the ECDF as a function:

`plot(ecdf(cats$Bwt), xlab = "Body weight (kg)", ylab = "Cumulative probability")`

- Suppose \(x_1, x_2, \ldots x_n\) are IID with CDF \(F\). Then for *any* \(a\), \[ \hat{F}_n(a) \rightarrow F(a) \] by the law of large numbers
- Also: see backup slides
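A quick sanity check of this convergence, with simulated data where the true \(F\) is known (Uniform(0,1), so \(F = \) `punif`):

```r
# LLN in action: the ECDF of simulated Uniform(0,1) data approaches punif
set.seed(1)
Fhat <- ecdf(runif(1e4))
abs(Fhat(0.3) - punif(0.3))  # small, and shrinks as n grows
```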

- What’s the probability of a cat weighing between \(3.21\) and \(3.22\) kg?
- ECDF says: exactly 0

- Q from discussion: What about running a smoother (e.g., a spline) through the ECDF?
- A: You could do that, but you’d need to make sure the curve was always monotone, between 0 and 1, etc.
- Once we estimate the density, we can recover the CDF by integrating

- More generally, pdf is derivative of CDF, \(f(x) = dF/dx\)
- Derivative of an *empirical* CDF is either 0 or \(\infty\)
- Where is it \(\infty\)?

- How do we estimate a pdf?

`hist(cats$Bwt, xlab = "Body weight (kg)", ylab = "Number of cats", breaks = 20)`

- Gives us an idea of where we have more or fewer data points…

- Pick a parametric pdf that looks sensible
- For example, the gamma distribution with shape parameter \(a\) and scale \(s\) is \[
f(x) = \frac{1}{s^a \Gamma(a)} x^{a-1} e^{-x/s}
\]
- Exponential is the special case \(a=1\)
- \(a > 1\) is a hump with an exponential right tail
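These shapes are easy to see by plotting, and the exponential special case is easy to verify numerically (the particular values used here are arbitrary):

```r
# Gamma pdfs with scale s = 1: a = 3 gives a hump, a = 1 is the exponential
curve(dgamma(x, shape = 3, scale = 1), from = 0, to = 8, ylab = "f(x)")
curve(dgamma(x, shape = 1, scale = 1), add = TRUE, lty = 2)
# The a = 1 special case matches the exponential pdf exactly
all.equal(dgamma(0.7, shape = 1, scale = 2), dexp(0.7, rate = 1/2))
```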

- Now maximize the likelihood
- For some (but not all) standard distributions, the function `fitdistr` in the package MASS has pre-programmed maximum likelihood estimates
- You saw this in homework 3 for the \(t\) distribution
- As it happens, `fitdistr` knows about gamma distributions too
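For instance (one caveat: `fitdistr` reports the gamma in terms of shape and *rate* \(= 1/\)scale, so its answer needs a little translation, but it should agree with maximizing the log-likelihood by hand):

```r
library(MASS)
data(cats)
# Pre-programmed gamma MLE; note the shape/rate parameterization
cats.fd <- fitdistr(cats$Bwt, densfun = "gamma")
cats.fd$estimate              # shape and rate
1 / cats.fd$estimate["rate"]  # the scale, for comparison with the by-hand MLE
```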

- It’s also easy to maximize the log-likelihood yourself

```
# Define the (negative, normalized) log-likelihood function
gamma.loglike <- function(params, data = cats$Bwt) {
  -mean(log(dgamma(x = data, shape = params[1], scale = params[2])))
}
# Ask R to optimize it, with a starting guess about the parameters
cats.gamma <- optim(fn = gamma.loglike, par = c(1, 1))
# What is the best (negative, normalized) log-likelihood?
cats.gamma$value
```

`## [1] 0.6677077`

```
# What parameters did optim find for us?
cats.gamma$par
```

`## [1] 32.65187332 0.08341489`

```
plot(hist(cats$Bwt, plot = FALSE, breaks = 20), freq = FALSE, xlab = "Body weight (kg)")
curve(dgamma(x, shape = cats.gamma$par[1], scale = cats.gamma$par[2]), add = TRUE)
```
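Following the earlier remark that the CDF can be recovered by integrating the estimated density: for the gamma that integral is available as `pgamma`, so the fit can also be checked against the ECDF (a sketch, plugging in the parameter values `optim` reported above):

```r
library(MASS)
data(cats)
# Integrating the fitted gamma density gives pgamma; compare it to the ECDF
plot(ecdf(cats$Bwt), xlab = "Body weight (kg)")
curve(pgamma(x, shape = 32.65, scale = 0.0834), add = TRUE, lty = 2)
```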

- Estimated density is *constant* on bins
- \(N_i =\) number of samples in bin \(i\)
- \(i(x) =\) which bin \(x\) falls into
- \(h =\) width of the bins
\[ \hat{f}_{hist}(x) \equiv \frac{1}{h} \frac{N_{i(x)}}{n} \]
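A direct implementation of \(\hat{f}_{hist}\) is only a few lines; this is a sketch (`hist.density` is a made-up name, and it assumes \(h\) evenly divides the range of the data):

```r
# A minimal version of the histogram density estimator defined above
hist.density <- function(x.new, data, h) {
  breaks <- seq(floor(min(data)), ceiling(max(data)), by = h)
  N <- table(cut(data, breaks))   # N_i: number of samples in each bin
  i <- cut(x.new, breaks)         # i(x): which bin x.new falls into
  as.numeric(N[as.integer(i)]) / (length(data) * h)
}
hist.density(0.3, data = c(0.1, 0.2, 0.6), h = 0.5)  # 2 / (3 * 0.5) = 1.33
```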

Probability of falling into a region of length \(h\) around \(x\) will be \[ \int_{x-h/2}^{x+h/2}{f(u) du} \approx f(x) h \] for very small \(h\). So \[ N_i \sim \mathrm{Binom}(n, f(x) h) \] and \[ \hat{f}_{hist}(x) = \frac{N_i}{nh} \sim \frac{1}{nh} \mathrm{Binom}(n, f(x) h) \] which \(\rightarrow f(x)\) as \(n \rightarrow \infty\)

- Error from variance: \(\propto 1/nh\)
- Error from (squared) bias: \(\propto h^2 (f^{\prime}(x))^2\)
- Total error for estimating the density: \(O(h^2) + O(1/nh)\)
- Best \(h = O(n^{-1/3})\)
- Best total error: \(O(n^{-2/3})\)
- vs. \(O(n^{-1})\) if we’ve got a correct parametric model
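A small simulation illustrates the tradeoff behind these rates; this is a sketch with arbitrary choices (standard-normal data, the point \(x = 0.25\), and three bandwidths: far too small, roughly \(n^{-1/3}\), and far too large):

```r
# Monte Carlo sketch of the bias-variance tradeoff: estimate f(0.25) for a
# standard normal by counting points in a window of width h around 0.25
# (i.e., a histogram bin centered there)
set.seed(42)
est.once <- function(n, h, x0 = 0.25) {
  mean(abs(rnorm(n) - x0) <= h / 2) / h
}
mse <- function(h, n = 1000, reps = 200) {
  mean(replicate(reps, (est.once(n, h) - dnorm(0.25))^2))
}
mses <- sapply(c(0.002, 1000^(-1/3), 4), mse)
mses  # the middle h, close to n^{-1/3}, should have the smallest error
```

The tiny bandwidth loses to variance, the huge one to bias; the \(n^{-1/3}\)-scale bandwidth balances the two.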

- Q from discussion: What about variable-width bins?
- A: That can help, but it’s even better to get away from fixed bins altogether

- Idea: “Smooth out” each observation into a little distribution
- Kernel functions *are* pdfs
- Put a copy of the kernel at each observation; add them up

\[ \hat{f}_{kern}(x) = \frac{1}{n} \sum_{i=1}^{n}{\frac{1}{h} K\left(\frac{x-x_i}{h}\right)} \]

Say \(h=0.01\)
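The kernel formula translates directly into R; a sketch evaluating \(\hat{f}_{kern}\) at a single point with a Gaussian kernel (`kde.at` is a made-up name, and the data and bandwidth are toy values):

```r
# f-hat_kern at one point: average of Gaussian kernels centered at the data
kde.at <- function(x, data, h) {
  mean(dnorm((x - data) / h)) / h
}
pts <- c(2.0, 2.1, 2.5, 3.0, 3.3)
kde.at(2.4, data = pts, h = 0.2)  # about 0.54
# R's built-in density() computes the same estimate (on a grid, via the FFT)
```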

- Cross-validation
- Cross-validate *what*?
- **Integrated squared error**: \(\int{(f(x) - \hat{f}(x))^2 dx}\)
- Or (negative, normalized) **log-likelihood**: \(\int{-\log{(\hat{f}(x))} f(x) dx}\)

- The integral is an expected value: \[ \int{-\log{(\hat{f}(x))} f(x) dx} = \Expect{-\log{\hat{f}(X)}} \]
- With \(n\) IID samples \(x_i\), \[
\Expect{-\log{\hat{f}(X)}} \approx \frac{1}{n}\sum_{i=1}^{n}{-\log{\hat{f}(x_i)}}
\] by LLN
- A similar but more complicated trick for ISE; see backup slides
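Leave-one-out cross-validation of the log-likelihood can be coded directly; a sketch, with `loo.loglike` as a made-up name and an ad hoc grid of candidate bandwidths:

```r
library(MASS)
data(cats)
# Leave-one-out CV score: average negative log of the density at x_i,
# estimated from all the data *except* x_i (Gaussian kernel, bandwidth h)
loo.loglike <- function(h, data) {
  mean(sapply(seq_along(data), function(i) {
    -log(mean(dnorm((data[i] - data[-i]) / h)) / h)
  }))
}
hs <- seq(0.05, 0.5, by = 0.05)  # ad hoc grid of candidate bandwidths
cv.scores <- sapply(hs, loo.loglike, data = cats$Bwt)
hs[which.min(cv.scores)]         # the bandwidth with the best CV score
```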

The function `npudens` in the np package does all this for us, choosing the bandwidth by cross-validation:

```
library(np)
bwt.np <- npudens(cats$Bwt)
plot(bwt.np, xlab = "Body weight (kg)")
```