Dear students, Please answer to the following two questions and send it in e-mail to kubo@ees.hokudai.ac.jp , the subject of which should be kubostat The due date is 30th, November. Thank you for all! Send your e-mail to me if there is any difficulty. [Question 1] Describe shortly on statistical methods that may be needed in your thesis. What kind of statistical softwares and methods did you use? [Question 2] Firstly, download "data" by using one of two methods below: method 1: download manually data.RData from kubo-stat web page, then load("data.RData") in R method 2: directly download using the following log(url(...)) function, load(url("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/distribution/data.RData")) You have 6 line R code below: sample.mean <- mean(data) print(sample.mean) rpois(50, lambda = sample.mean) mean(rpois(50, lambda = sample.mean)) replicate(10, mean(rpois(50, lambda = sample.mean))) hist(replicate(3000, mean(rpois(50, lambda = sample.mean)))) Describe what happens when you run each line on R referring to and citing R outputs. The last line of R code above generates a histgram. Discuss the relationship between the histgram and parameter estimation based on limited data. [Question 3] You have 3 line R code below: d <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/poisson/data3a.csv") d$y.new <- rpois(nrow(d), lambda = exp(1 + 0.1 * d$x + c(0, 0.8)[d$f])) summary(glm(y.new ~ x + f, data = d, family = poisson)) Describe what happens when you run each line and the output generated by the final line. You can skip the description on deviances appeared in the outputs of glm() because I have not yet explain it. How do you make a prediction using the estimates obtained from the output of summary(glm(...)) in [assignment 3-1]. Show your answer using plot(), lines() and other functions to draw the quantitative prediction. [Question 4] The following 5 line R code is a demonstration of stepAIC() function that allows you to perform AIC model selection automatically. d <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/poisson/data3a.csv") fit <- glm(y ~ x + f, data = d, family = poisson) library(MASS) # reading MASS package fit.best <- stepAIC(fit) summary(fit.best) Describe the output lines obtained by running "fit.best <- stepAIC(fit)" line. [Question 5] Run the following code on R, d <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Fig/binomial/data4a.csv") fit <- glm(cbind(y, N - y) ~ x + f, data = d, family = binomial) library(MASS) fit.best <- stepAIC(fit) then you can obtain the best model based on the example data ("data4a.csv"). Using this results, draw two curves to predict the expected numbers of alive seeds on data plot, plot(d$x, d$y, pch = c(21, 19)[d$f]) both for f = C and f = T. The following function definition in R, logistic <- function(z) { 1 / (1 + exp(-z)) } must be helpful for you to draw the prediction curves. Send to me your R code to draw the curves. [Question 6] Run the code below on R, and describe shortly the implication of the output plot. logistic <- function(z) 1 / (1 + exp(-z)) n <- 200 n.seed <- 8 par(mfrow = c(2, 2)) # splitting plot regions into 2x2 for (s in c(0.2, 0.5, 1.0, 2.0)) { # binomial prob.binom <- dbinom(0:n.seed, size = n.seed, prob = logistic(0)) plot( 0:n.seed, n * prob.binom, type = "b", # both, lines and points ylim = c(0, max(n * prob.binom) * 1.2), main = paste("s =", s), col = "blue" ) # binomial + Gaussian r <- rnorm(n, mean = 0, sd = s) y <- rbinom(n, size = n.seed, prob = logistic(r)) table.y <- table(y) print(table.y) px <- as.numeric(names(table.y)) points(px, table.y, col = "red", pch = 19) } [Question 7] Run the code below on R, and describe briefly the implication of the output plot. # read data and plot d1 <- read.csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/2017/Ees/g/fig/nested/d1.csv") # install and load packages install.packages("lme4") # if you have not installed lme4 package yet. library(lme4) # load lme4 # plot data plot(d1$pot, d1$y, col = rep(c("blue", "red"), each = 5)) # fit GLMM (i.e., hierarchical Bayesian model) fit1 <- glmer( y ~ f + (1 | id) + (1 | pot), family = poisson, data = d1 ) print(summary(fit1)) [Question 8 (your impression)] The final assignment is your general and specific comments on the statistical modeling class. Critical feedbacks are welcome!