2 Day 2 (January 22)

2.1 Announcements

  • Correction about Canvas

    • Journals should be uploaded to Canvas within 24 hours of the end of lecture
    • Activity 1 will eventually be uploaded to Canvas (but no due date yet)
  • Please start activity 1

  • Questions/clarifications from journals

    • “What data set can I use for the class project?”
    • “My question is how to deal with uncertainty or any unexpected situation that happens.”
    • “I definitely need to confess that I don’t have a solid understanding of Bayesian statistics.”
    • “The brief mention of telemetry data made me wonder whether we’ll need to deal with very large, irregularly sampled, or noisy time-series data right away, or if we’ll build up to that gradually.”
    • “Difference between dynamic and descriptive approaches”

2.2 Opening example: Human movement

  • The goal of this activity is to show you how cool spatio-temporal statistics is!
  • Human movement modeling with the linear regression model and other fancy tools!
  • Trajectories are a time series of the spatial location of an object (or animal); see the sketch after this list.
    • We can usually choose the object and the times at which we obtain its spatial location (i.e., time is fixed)
    • The location is a random variable in most cases, but time can also be a random variable.
  • In-class marathon example (Download R script here)
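  • A minimal simulated sketch (not the in-class marathon script) of a trajectory as data: spatial locations observed at a fixed sequence of times, generated here as a two-dimensional random walk; the object name and settings are illustrative only.
    set.seed(1)
    n <- 50
    trajectory <- data.frame(time = 1:n,            # fixed observation times
                             x = cumsum(rnorm(n)),  # random x-coordinate at each time
                             y = cumsum(rnorm(n)))  # random y-coordinate at each time
    head(trajectory)
    plot(trajectory$x, trajectory$y, type = "o", xlab = "x", ylab = "y")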

2.3 Statistical models

  • Read pgs. 77 - 106 in Wikle et al. (2019)

  • What is a model?

    • A simplification of something real, designed to serve a purpose
  • What is a statistical model?

    • Simplification of a real data generating mechanism
    • Constructed from deterministic mathematical equations and probability density / mass functions
    • Capable of generating data
    • Generative vs. non-generative models
  • What is the purpose of a statistical model?

    • See section 1.2 on pg. 7 and pg. 77 of Wikle et al. (2019)
    • Capable of making predictions, forecasts, and hindcasts
    • Enables statistical inference about observable and unobservable quantities
    • Reliably quantify and communicate uncertainty
      • Example using simple linear regression (see the sketch below)
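      • A minimal sketch (simulated data, not the in-class example) of how a simple linear regression model quantifies and communicates uncertainty
      set.seed(1)
      x <- runif(30, 0, 10)
      y <- 1 + 2*x + rnorm(30, sd = 1)      # data generated from y = 1 + 2x + error
      fit <- lm(y ~ x)
      confint(fit)                          # uncertainty about the unobservable coefficients
      predict(fit, newdata = data.frame(x = 5),
        interval = "prediction")            # uncertainty about a new observable y at x = 5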

2.4 Matrix review

  • Column vectors
    • \(\mathbf{y}\equiv(y_{1},y_{2},\ldots,y_{n})^{'}\)
    • \(\mathbf{x}\equiv(x_{1},x_{2},\ldots,x_{n})^{'}\)
    • \(\boldsymbol{\beta}\equiv(\beta_{1},\beta_{2},\ldots,\beta_{p})^{'}\)
    • \(\boldsymbol{1}\equiv(1,1,\ldots,1)^{'}\)
    • In R
    y <- matrix(c(1,2,3),nrow=3,ncol=1)
    y
    ##      [,1]
    ## [1,]    1
    ## [2,]    2
    ## [3,]    3
  • Matrices
    • \(\mathbf{X}\equiv(\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{p})\)
    • In R
    X <- matrix(c(1,2,3,4,5,6),nrow=3,ncol=2,byrow=FALSE)
    X
    ##      [,1] [,2]
    ## [1,]    1    4
    ## [2,]    2    5
    ## [3,]    3    6
  • Vector multiplication
    • \(\mathbf{y}^{'}\mathbf{y}\)
    • \(\mathbf{1}^{'}\mathbf{1}\)
    • \(\mathbf{1}\mathbf{1}^{'}\)
    • In R
    t(y)%*%y    
    ##      [,1]
    ## [1,]   14
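    • Also in R (an added sketch of the two remaining products, using a length-3 vector of ones)
    ones <- matrix(1, nrow = 3, ncol = 1)
    t(ones)%*%ones     # 1'1 = 3 (a 1 x 1 matrix)
    ones%*%t(ones)     # 11' = a 3 x 3 matrix of ones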
  • Matrix by vector multiplication
    • \(\mathbf{X}^{'}\mathbf{y}\)
    • In R
    t(X)%*%y
    ##      [,1]
    ## [1,]   14
    ## [2,]   32
  • Matrix by matrix multiplication
    • \(\mathbf{X}^{'}\mathbf{X}\)
    • In R
    t(X)%*%X
    ##      [,1] [,2]
    ## [1,]   14   32
    ## [2,]   32   77
  • Matrix inversion
    • \((\mathbf{X}^{'}\mathbf{X})^{-1}\)
    • In R
    solve(t(X)%*%X)
    ##            [,1]       [,2]
    ## [1,]  1.4259259 -0.5925926
    ## [2,] -0.5925926  0.2592593
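    • A quick added check (sketch): the inverse times the original matrix should return the identity
    solve(t(X)%*%X)%*%(t(X)%*%X)    # the 2 x 2 identity, up to floating-point rounding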
  • Determinant of a matrix
    • \(|\mathbf{I}|\)
    • In R
    I <- diag(1,3)
    I
    ##      [,1] [,2] [,3]
    ## [1,]    1    0    0
    ## [2,]    0    1    0
    ## [3,]    0    0    1
    det(I)
    ## [1] 1
  • Quadratic form
    • \(\mathbf{y}^{'}\mathbf{S}\mathbf{y}\)
  • Derivative of a quadratic form (Note \(\mathbf{S}\) is a symmetric matrix; e.g., \(\mathbf{X}^{'}\mathbf{X}\))
    • \(\frac{\partial}{\partial\mathbf{y}}\mathbf{y^{'}\mathbf{S}\mathbf{y}}=2\mathbf{S}\mathbf{y}\)
  • Other useful derivatives
    • \(\frac{\partial}{\partial\mathbf{y}}\mathbf{\mathbf{x^{'}}\mathbf{y}}=\mathbf{x}\)
    • \(\frac{\partial}{\partial\mathbf{y}}\mathbf{\mathbf{X^{'}}\mathbf{y}}=\mathbf{X}\)
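  • Checking the quadratic form and its derivative in R (an added sketch, not in the original notes; it reuses \(\mathbf{X}\) from above and an illustrative vector)
    S <- t(X)%*%X                       # a symmetric 2 x 2 matrix
    v <- c(1, 2)                        # a length-2 vector playing the role of y
    t(v)%*%S%*%v                        # the quadratic form v'Sv
    f <- function(v) as.numeric(t(v)%*%S%*%v)
    eps <- 1e-6
    c((f(v + c(eps, 0)) - f(v - c(eps, 0)))/(2*eps),
      (f(v + c(0, eps)) - f(v - c(0, eps)))/(2*eps))  # numerical gradient of v'Sv
    2*S%*%v                             # analytical gradient 2Sv (should match)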

2.5 Distribution theory review

  • Probability density functions (PDF) and probability mass functions (PMF)

    • Normal distribution (continuous support)
    • Binomial distribution (discrete support)
    • Poisson distribution (discrete support)
    • And many more (see handout)
  • Distributions in R

    • PDF of the normal distribution \[[z|\mu,\sigma^2] = \frac{1}{\sqrt{2\pi\sigma^2}}\textit{e}^{-\frac{1}{2\sigma^2}(z - \mu)^2}\]
      • \(z\) is the random variable
      • \(\mu\) and \(\sigma^2\) are the parameters
    • PDFs & PMFs in R
      • ?dnorm
    • Generate random variables (\(z\)) from a PDF (e.g., \(z_i\sim\text{N}(\mu,\sigma^2)\))
    z <- rnorm(n = 5, mean = 0, sd = 1)
    z    
    ## [1] 1.6583210 0.4404361 0.4330551 0.0216365 0.1167306
    • Histogram representation of a PDF
    library(latex2exp)
    z <- rnorm(n = 10000, mean = 0, sd = 1)
    hist(z,freq=FALSE,col="grey",main = "",
      xlab= TeX('$\\textit{z}$'),
      ylab = TeX('$\\lbrack\\textit{z}|\\mu,\\sigma^2\\rbrack$'))

    • Plot a PDF in R
    curve(expr = dnorm(x = x, mean = 0, sd = 1), from = -10, to = 10,
      xlab= TeX('$\\textit{z}$'),
      ylab = TeX('$\\lbrack\\textit{z}|\\mu,\\sigma^2\\rbrack$'))

    • Evaluate the “likelihood” at a given value of the parameters
    dnorm(x = 1.759, mean = 0, sd = 1, log = FALSE)
    ## [1] 0.08492566
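    • As an added sanity check (a sketch), evaluating the normal PDF formula above by hand at the same point gives the same value
    mu <- 0; sigma2 <- 1
    1/sqrt(2*pi*sigma2)*exp(-(1.759 - mu)^2/(2*sigma2))    # matches dnorm(x = 1.759, mean = 0, sd = 1)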
    • Other distributions
    rpois(n = 5, lambda = 2)
    rbinom(n = 5, size = 10, prob = 0.5)    
    runif(n = 5,min = 0,max = 3)
    rt(n = 5,df = 1)
    rcauchy(n = 5, location = 2, scale = 4)
    • See the stats package documentation for more information
  • Making your own functions for a distribution

    • PDF of the exponential distribution \[[z|\lambda] = \lambda\textit{e}^{-\lambda z}\]
    • Make your own function for the PDF of the exponential distribution
    dexp <- function(z, lambda){lambda*exp(-lambda*z)}
    • Make your own function to simulate random variables from the exponential distribution using the inverse probability integral transform
    rexp <- function(n, lambda){
      u <- runif(n)          # n draws from Uniform(0, 1)
      -1/lambda*log(1 - u)   # inverse exponential CDF: F^(-1)(u) = -log(1 - u)/lambda
    }
    • Make histogram by sampling from rexp() and overlay the PDF using dexp()
    z <- rexp(n = 10000, lambda = 1)
    hist(z,freq=FALSE,col="grey",main = "",
      xlab= TeX('$\\textit{z}$'),
      ylab = TeX('$\\lbrack\\textit{z}|\\lambda\\rbrack$'))
    curve(expr = dexp(z = x,lambda = 1), from = 0, to = 10, 
      add = TRUE,col = "deepskyblue",lwd = 3)

  • Moments of a distribution

    • First moment: \(\text{E}(z) = \int z [z|\theta]dz\)
    • Second central moment: \(\text{Var}(z) = \int (z -\text{E}(z))^2[z|\theta]dz\)
    • Note that \([z|\theta]\) is an arbitrary PDF or PMF with parameters \(\theta\)
    • Example normal distribution \[\begin{eqnarray} \text{E}(z) &=& \int_{-\infty}^\infty z\frac{1}{\sqrt{2\pi\sigma^2}}\textit{e}^{-\frac{1}{2\sigma^2}(z - \mu)^2}dz\\&=& \mu \end{eqnarray}\] \[\begin{eqnarray} \text{Var}(z) &=& \int_{-\infty}^\infty (z-\mu)^2\frac{1}{\sqrt{2\pi\sigma^2}}\textit{e}^{-\frac{1}{2\sigma^2}(z - \mu)^2}dz\\&=& \sigma^2 \end{eqnarray}\]
    • Example exponential distribution\[\begin{eqnarray} \text{E}(z) &=& \int_{0}^\infty z\lambda\textit{e}^{-\lambda z}dz\\&=& \frac{1}{\lambda} \end{eqnarray}\]\[\begin{eqnarray}\text{Var}(z) &=& \int_{0}^\infty \left(z-\frac{1}{\lambda}\right)^2\lambda\textit{e}^{-\lambda z}dz\\&=& \frac{1}{\lambda^2} \end{eqnarray}\]
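    • A Monte Carlo check of the exponential moments (an added sketch using \(\lambda = 1\), so \(\text{E}(z) = 1\) and \(\text{Var}(z) = 1\))
    set.seed(123)
    z <- rexp(1e5, 1)    # works with base rexp() or the rexp() written above
    mean(z)              # should be close to E(z) = 1/lambda = 1
    var(z)               # should be close to Var(z) = 1/lambda^2 = 1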