12 Day 12 (February 26)

12.1 Announcements

  • Activity 3 is posted

  • Final project proposals (if you have a 10/10 grade on Canvas start working!)

  • Office hours have been moved to 2-3pm today (Dickens Hall 109)

  • Read Ch. 4 pgs 137-192

  • Questions/clarifications from journals

    • Use of other software for class project (e.g., python, matlab, SAS, etc)
    • Trevor is getting old and python may be a better option for this class.
    • If you drop the NAs but keep the zeros, what missingness assumption are you making? (see monograph here)
    • “I’m still struggling to understand how we decide which probability distribution is appropriate for a given problem. For example, in the rainfall example we used a normal distribution even though rainfall can’t be negative, and that confused me because the normal distribution allows both negative and positive values.”

12.2 Extreme precipitation in Kansas

  • Note, this is essentially the same as activity 2

  • On September 3, 2018 there was an extreme precipitation event that resulted in flooding in Manhattan, KS and the surrounding areas. If you would like to know more about this, check out this link and this video here and here.

  • My process

    • Determine the goals of the study
    • Data acquisition
    • Exploratory data analysis
      • Live demonstration
    • The model building process
      • 1). Choose appropriate PDFs or PMFs for the data, process, and parameter models
      • 2). Choose appropriate mathematical models for the “parameters” or moments of the PDFs/PMFs from step 1.
      • 3). Choose an algorithm fit the statistical model to the data
      • 4). Make statistical inference (e.g., calculate derived quantities and summarize the posterior distribution)
    • Model checking, improvements, validation, and selection (Ch. 6)
  • What we will need to learn

    • How to use R as a geographic information system
    • New general tools from statistics
      • Gaussian process
      • Metropolis and Metropolis–Hastings algorithms
      • Gibbs sampler
    • How to use the hierarchical modeling framework to describe Kriging
      • Hierarchical Bayesian model vs. “empirical” hierarchical model
    • Specialized language used in spatial statistics (e.g., range, nugget, variogram)

12.2.1 Multivariate normal distribution

  • Multivariate normal distribution

  • \(\boldsymbol{\eta}\sim\text{N}(\mathbf{0},\sigma^{2}\mathbf{R})\)

    • Definitions
  • Correlation matrix – A positive semi-definite matrix whose elements are the correlation between observations

  • Correlation function – A function that describes the correlation between observations

  • Example correlation matrices

  • Compound symmetry \[\mathbf{R}=\left[\begin{array}{cccccc} 1 & 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 & 1 \end{array}\right]\]

    • AR(1) \[\mathbf{R(\phi})=\left[\begin{array}{ccccc} 1 & \phi^{1} & \phi^{2} & \cdots & \phi^{n-1}\\ \phi^{1} & 1 & \phi^{1} & \cdots & \phi^{n-2}\\ \phi^{2} & \phi^{1} & 1 & \vdots & \phi^{n-3}\\ \vdots & \vdots & \vdots & \ddots & \ddots\\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \ddots & 1 \end{array}\right]\:\]

      • Example simulating from \(\boldsymbol{\eta}\sim\text{N}(\mathbf{0},\sigma^{2}\mathbf{R})\) in R
      n <- 200
      x <- 1:n
      I <- diag(1, n)
      sigma2 <- 1
      
      library(MASS)
      set.seed(2034)
      eta <- mvrnorm(1, rep(0, n), sigma2 * I)
      cor(eta[1:(n - 1)], eta[2:n])
      ## [1] -0.06408623
      acf(eta)

      par(mfrow = c(2, 1))
      par(mar = c(0, 2, 0, 0), oma = c(5.1, 3.1, 2.1, 2.1))
      plot(x, eta, typ = "l", xlab = "", xaxt = "n")
      abline(a = 0, b = 0)
      
      n <- 200
      x <- 1:n  # Must be equally spaced
      phi <- 0.7
      R <- phi^abs(outer(x, x, "-"))
      sigma2 <- 1
      set.seed(1330)
      eta <- mvrnorm(1, rep(0, n), sigma2 * R)
      cor(eta[1:(n - 1)], eta[2:n])
      ## [1] 0.7015304
      plot(x, eta, typ = "l")
      abline(a = 0, b = 0)

      acf(eta)

    • Example correlation functions

    • Gaussian correlation function \[r_{ij}(\phi)=e^{-\frac{d_{ij}^{2}}{\phi}}\] where \(d_{ij}\) is the “distance” between locations i and j (note that \(d_{ij}=0\) for \(i=j\)) and \(r_{ij}(\phi)\) is the element in the \(i^{\textrm{th}}\) row and \(j^{\textrm{th}}\) column of \(\mathbf{R}(\phi)\).

      library(fields)
      n <- 200
      x <- 1:n
      phi <- 40
      d <- rdist(x)
      R <- exp(-d^2/phi)
      sigma2 <- 1
      
      set.seed(4673)
      eta <- mvrnorm(1, rep(0, n), sigma2 * R)
      plot(x, eta, typ = "l")
      abline(a = 0, b = 0)

      cor(eta[1:(n - 1)], eta[2:n])
      ## [1] 0.9718077
  • Linear correlation function \[r_{ij}(\phi)=\begin{cases} 1-\frac{d_{ij}}{\phi} &\text{if}\ d_{ij}<0\\ 0 &\text{if}\ d_{ij}>0 \end{cases}\]

    library(fields)
    n <- 200
    x <- 1:n
    phi <- 40
    d <- rdist(x)
    R <- ifelse(d < phi, 1 - d/phi, 0)
    sigma2 <- 1
    
    set.seed(4803)
    eta <- mvrnorm(1, rep(0, n), sigma2 * R)
    plot(x, eta, typ = "l")
    abline(a = 0, b = 0)

    cor(eta[1:(n - 1)], eta[2:n])
    ## [1] 0.9774775

12.3 Linear models for correlated errors

  • Motivating example

    ISIT <- read.table("https://www.dropbox.com/scl/fi/n5ku9crpsr384p9z2rc8b/bioluminescence.txt?rlkey=airzs2ukql4g0hqxjr72re53j&dl=1",
        header = TRUE)
    
    Sources16 <- ISIT$Sources[ISIT$Station == 16]
    Depth16 <- ISIT$SampleDepth[ISIT$Station == 16]
    Sources16 <- Sources16[order(Depth16)]
    Depth16 <- sort(Depth16)
    plot(Depth16, Sources16, las = 1, ylim = c(0, 65), col = rgb(0, 0, 0, 0.25), pch = 19,
        ylab = "Density of bioluminescence", xlab = "Depth in meters")