Homework 2

Due Friday September 15th at 5:00pm

Exercise 1

Let \(Y_1, Y_2 | \theta\) be i.i.d. binary(\(\theta\)), so that \(p(y_1, y_2 | \theta) = \theta ^{y_1 + y_2} (1- \theta) ^{2 - y_1 - y_2}\) and let \(\theta \sim \text{beta}(\eta, \eta)\)

Compute \(E~Y_i\) and \(Var~Y_i\) (the mean and variance of \(Y_i\) unconditional on \(\theta\)) as a function of \(\eta\)
Compute \(E~Y_1 Y_2\), which is the same as \(p(Y_1 = 1, Y_2 = 1)\) unconditional on \(\theta\). You can do this with help from the formula on page 33 of the book.
Using the terms you have calculated above, make a graph of the correlation between \(Y_1\) and \(Y_2\) as a function of \(\eta\).
Interpreting \(\eta\) as how confident you are that \(\theta\) is near \(\frac{1}{2}\), and interpreting \(Cor(Y_1, Y_2)\) as how much information \(Y_1\) and \(Y_2\) provide about each other, explain in words why the correlation changes as a function of \(\eta\).

Exercise 2

Suppose \(n\) individuals volunteer to count birds in a forest. Let \(Y_i\) be the number of birds counted by individual \(i\), and let \(x_i\) be the number of hours spent in the forest by volunteer \(i\). We will model the data \(Y_1, \ldots Y_n\) as being independent given \(\theta\), but not identically distributed. Specifically, our model is that \(Y_i | \theta \sim \text{Pois}(\theta x_i)\), independently for \(i = 1, \ldots n\).

Compute \(E~Y_i | \theta\) and explain what \(\theta\) represents.
Write out a formula for the joint pdf \(p(y_1, \ldots y_n |\theta)\) and simplify as much as possible. Find the MLE, that is, the value of \(\theta\) that maximizes \(p(y_1, \ldots y_n | \theta)\). Explain why it makes sense.
Let \(\theta \sim \text{gamma}(a, b)\). Find a formula for the posterior mode of \(\theta\) and compare to the MLE.

Exercise 3

Let \(\theta_1\) be the prevalence of a rare allele among people with Alzheimer’s disease, and let \(\theta_2\) be the prevalence among people without the disease. To estimate \(\theta_1\) and \(\theta_2\), a sample of \(n_1 = 19\) Alzheimer’s patients and \(n_2 = 176\) control subjects are genotyped for the presence of the allele. Let \(Y_1\) and \(Y_2\) be the number of people in the two samples who have the allele. We will model \(Y_1\) and \(Y_2\) as independent with \(Y_1 | \theta_1 \sim \text{binomial}(n_1, \theta_1)\) and \(Y_2 | \theta_2 \sim \text{binomial}(n_2, \theta_2)\). Prior studies suggest that \(\theta_2 \sim \text{beta}(2, 30)\) is a reasonable prior distribution for \(\theta_2\). For now, we will use the same prior distribution for \(\theta_1\). The study is performed and the data are that \(Y_1 = 1\) and \(Y_2 = 16\).

State the posterior distributions of \(\theta_1\) and \(\theta_2\). Plot the posterior densities together on a single graph with the prior density, and compare all three curves with words.
Compute the posterior mean and a 95% posterior interval for each of \(\theta_1\) and \(\theta_2\).
With a picture, with words, or mathematically, try to describe different kind of joint prior distribution for \(\theta_1\) and \(\theta_2\) that represents the \(\theta_1\) and \(\theta_2\) are close to each other, but highly uncertain.