Dear students,

Please answer to the following two questions and send it in e-mail
to kubo@ees.hokudai.ac.jp , the subject of which should be kubostat

The due date is 30th, November. Thank you for all!
Send your e-mail to me if there is any difficulty.


[Question 1]
Describe shortly on statistical methods that may be needed in your thesis.
What kind of statistical softwares and methods did you use?

[Question 2]
Firstly, download "data" by using one of two methods below:
method 1: download manually data.RData from kubo-stat web page, then load("data.RData") in R
method 2: directly download using the following log(url(...)) function,
load(url("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/distribution/data.RData"))

You have 6 line R code below:

sample.mean <- mean(data)
print(sample.mean)
rpois(50, lambda = sample.mean)
mean(rpois(50, lambda = sample.mean))
replicate(10, mean(rpois(50, lambda = sample.mean)))
hist(replicate(3000, mean(rpois(50, lambda = sample.mean))))

Describe what happens when you run each line on R
referring to and citing R outputs.

The last line of R code above generates
a histgram. Discuss the relationship between the histgram
and parameter estimation based on limited data.

[Question 3]
You have 3 line R code below:

d <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/poisson/data3a.csv")
d$y.new <- rpois(nrow(d), lambda = exp(1 + 0.1 * d$x + c(0, 0.8)[d$f]))
summary(glm(y.new ~ x + f, data = d, family = poisson))

Describe what happens when you run each line and the output
generated by the final line.
You can skip the description on deviances appeared
in the outputs of glm() because I have not yet explain it.

How do you make a prediction using the estimates obtained from the
output of summary(glm(...)) in [assignment 3-1]. Show your answer using
plot(), lines() and other functions to draw the quantitative prediction.

[Question 4]
The following 5 line R code is a demonstration of stepAIC() function
that allows you to perform AIC model selection automatically.

d <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/poisson/data3a.csv")
fit <- glm(y ~ x + f, data = d, family = poisson)
library(MASS) # reading MASS package
fit.best <- stepAIC(fit)
summary(fit.best)

Describe the output  lines obtained by running "fit.best <- stepAIC(fit)" line.

[Question 5]
Run the following code on R,

d <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/binomial/data4a.csv")
fit <- glm(cbind(y, N - y) ~ x + f, data = d, family = binomial)
library(MASS)
fit.best <- stepAIC(fit)

then you can obtain the best model based on the example
data ("data4a.csv"). Using this results, draw two curves to predict
the expected numbers of alive seeds on data plot,

plot(d$x, d$y, pch = c(21, 19)[d$f])

both for f = C and f = T. The following function definition in R,

logistic <- function(z) { 1 / (1 + exp(-z)) }

must be helpful for you to draw the prediction curves.
Send to me your R code to draw the curves.


[Question 6]
Run the code below on R,
and describe shortly the implication of the output plot.

logistic <- function(z) 1 / (1 + exp(-z))
n <- 200
n.seed <- 8
par(mfrow = c(2, 2)) # splitting plot regions into 2x2
for (s in c(0.2, 0.5, 1.0, 2.0)) {
   # binomial
   prob.binom <- dbinom(0:n.seed, size = n.seed, prob = logistic(0))
   plot(
     0:n.seed, n * prob.binom,
     type = "b", # both, lines and points
     ylim = c(0, max(n * prob.binom) * 1.2),
     main = paste("s =", s),
     col = "blue"
  )
  # binomial + Gaussian
  r <- rnorm(n, mean = 0, sd = s)
  y <- rbinom(n, size = n.seed, prob = logistic(r))
  table.y <- table(y)
  print(table.y)
  px <- as.numeric(names(table.y))
  points(px, table.y, col = "red", pch = 19)
}

[Question 7]
Run the code below on R,
and describe briefly the implication of the output plot.

# read data and plot
d1 <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Ees/g/fig/nested/d1.csv")

# install and load packages
install.packages("lme4") # if you have not installed lme4 package yet.
library(lme4) # load lme4

# plot data
plot(d1$pot, d1$y, col = rep(c("blue", "red"), each = 5))

# fit GLMM (i.e., hierarchical Bayesian model)
fit1 <- glmer(
  y ~ f + (1 | id) + (1 | pot), 
  family = poisson,
  data = d1
)
print(summary(fit1))

[Question 8 (your impression)]
The final assignment is your general and specific comments on
the statistical modeling class.
Critical feedbacks are welcome!