STA360 at Duke University
A data scientist at a small subscriber-based tech company models the number of new subscribers in a day as \(Y|\theta \sim \text{Poisson}(\theta)\) with prior \(\theta \sim \text{gamma}(a,b)\). A priori, the data scientist believes that there are on average 20 signups per day and that, 90% of the time, there are between approximately 3 and 50 signups on a given day.
Find suitable \(a\) and \(b\) that satisfy the data scientist’s prior beliefs.
Verify how well your prior aligns with this belief by using Monte Carlo sampling to draw from the prior predictive distribution, \(p(\tilde{y}) = \int p(\tilde{y}, \theta)\, d\theta\).
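One way to carry out this check is sketched below in Python, using only the standard library. It assumes that \(\text{gamma}(a,b)\) is the shape-rate parametrization, so the prior mean is \(a/b\). The values `a = 2.0` and `b = 0.1` are an illustrative assumption chosen only because they give \(a/b = 20\); they are not presented as the intended answer. The helper names `sample_poisson`, `prior_predictive`, and `summarize` are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Intended for moderate lam (well below ~700, where exp(-lam) underflows).
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def prior_predictive(a, b, n_draws=20_000, seed=360):
    """Sample y ~ p(y) by drawing theta ~ gamma(a, rate=b), then y | theta ~ Poisson(theta)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so the rate b enters as 1/b.
        theta = rng.gammavariate(a, 1.0 / b)
        draws.append(sample_poisson(theta, rng))
    return draws


def summarize(draws):
    """Return the mean and the empirical 5% and 95% quantiles of the draws."""
    cuts = statistics.quantiles(draws, n=20)  # cut points at 5%, 10%, ..., 95%
    return statistics.mean(draws), cuts[0], cuts[-1]


if __name__ == "__main__":
    # Assumed candidate prior: a = 2.0, b = 0.1 (prior mean a / b = 20).
    mean, q05, q95 = summarize(prior_predictive(a=2.0, b=0.1))
    print(f"prior predictive mean: {mean:.1f}")
    print(f"central 90% interval: ({q05:.0f}, {q95:.0f})")
```

Comparing the printed mean and central 90% interval with the stated beliefs (about 20, and roughly 3 to 50) indicates whether a candidate pair \((a, b)\) is reasonable; other candidates can be tried by changing the two arguments.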
After one month the data scientist observes the following daily subscriber counts:
The data scientist is fundamentally interested in the variance of subscriber counts per day. Is the Poisson model appropriate for this data?
Report \(p(\tilde{S}^2 > s^2_{obs} | y_1,\ldots, y_n)\), where \(\tilde{S}^2\) is the posterior predictive sample variance and \(s^2_{obs}\) is the observed sample variance (\(s^2_{obs} = 21.3\)). To generate samples from the posterior predictive distribution, use the prior from part (a).
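A minimal sketch of this posterior predictive check follows, in Python with the standard library only. It relies on the conjugate update: under a \(\text{gamma}(a,b)\) prior (assumed here to be shape-rate) and a Poisson likelihood, the posterior is \(\theta | y_1,\ldots,y_n \sim \text{gamma}(a + \sum_i y_i,\ b + n)\). The function names `sample_poisson` and `posterior_predictive_pvalue` are hypothetical, and the observed counts `y` must be supplied by the reader; no data are assumed here.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Intended for moderate lam (well below ~700, where exp(-lam) underflows).
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def posterior_predictive_pvalue(y, a, b, s2_obs, n_sims=5_000, seed=360):
    """Estimate P(S~^2 > s2_obs | y) by Monte Carlo.

    y      : observed daily counts (list of non-negative ints)
    a, b   : prior shape and rate of theta ~ gamma(a, b)
    s2_obs : observed sample variance (n - 1 denominator)
    """
    rng = random.Random(seed)
    n = len(y)
    post_shape = a + sum(y)  # conjugate gamma-Poisson update
    post_rate = b + n
    exceed = 0
    for _ in range(n_sims):
        # random.gammavariate takes (shape, scale), so pass 1 / rate.
        theta = rng.gammavariate(post_shape, 1.0 / post_rate)
        # Replicate a data set of the same size as the observed one.
        y_rep = [sample_poisson(theta, rng) for _ in range(n)]
        if statistics.variance(y_rep) > s2_obs:
            exceed += 1
    return exceed / n_sims
```

Calling `posterior_predictive_pvalue(y, a, b, s2_obs=21.3)` with the observed counts and the chosen prior returns the Monte Carlo estimate of the requested probability. A value close to 0 or 1 would suggest that the Poisson model does not reproduce the observed variance well.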
Let \(Y_1,\ldots, Y_n\) be iid random variables with expectation \(\theta\) and variance \(\sigma^2\).
Show that \(\frac{1}{n} \sum_{i = 1}^n (Y_i -\bar{Y})^2\) is a biased estimator of \(\sigma^2\).
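As a hint, one possible route (a sketch, not necessarily the intended argument) centers each term at \(\theta\) and uses \(\text{Var}(\bar{Y}) = \sigma^2/n\), which holds for iid observations:

\[
\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (Y_i - \theta)^2 - n(\bar{Y} - \theta)^2,
\qquad
E\left[\frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})^2\right] = \frac{1}{n}\left(n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right) = \frac{n-1}{n}\,\sigma^2 .
\]

Since \(\frac{n-1}{n}\sigma^2 \neq \sigma^2\) for finite \(n\) and \(\sigma^2 > 0\), the estimator is biased, with bias \(-\sigma^2/n\).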