# Introduction to Bayesian statistics - Bolstad M.

ISBN 0-471-27020-2


11.1 COMPARING FREQUENTIST AND BAYESIAN POINT ESTIMATORS

A frequentist point estimator for a parameter is a statistic that we use to estimate the parameter. The simple rule we use to determine a frequentist estimator for $\mu$ is to use

Introduction to Bayesian Statistics. By William M. Bolstad. ISBN 0-471-27020-2. Copyright © John Wiley & Sons, Inc.


COMPARING BAYESIAN AND FREQUENTIST INFERENCES FOR MEAN

the statistic that is the sample analog of the parameter to be estimated. So we use the sample mean $\bar{y}$ to estimate the population mean $\mu$.

In Chapter 9 we learned that frequentist estimators for unknown parameters are evaluated by considering their sampling distribution. In other words, we look at the distribution of the estimator over all possible samples. A commonly used criterion is that the estimator be unbiased. That is, the mean of its sampling distribution is the true unknown parameter value. The second criterion is that the estimator have small variance in the class of all possible unbiased estimators. The estimator that has the smallest variance in the class of unbiased estimators is called the minimum variance unbiased estimator and is generally preferred over other estimators from the frequentist point of view.

When we have a random sample from a normal distribution, we know that the sampling distribution of $\bar{y}$ is normal with mean $\mu$ and variance $\sigma^2/n$. The sample mean, $\bar{y}$, turns out to be the minimum variance unbiased estimator of $\mu$.
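As an informal check of this sampling distribution, the simulation below draws many normal samples and looks at the mean and variance of the resulting sample means. It is only a sketch: the values (mean 1015, standard deviation 5, sample size 10) are borrowed from the milk powder example later in this section, and the function name, seed, and number of replications are illustrative assumptions.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` normal samples of size n and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


means = simulate_sample_means(mu=1015.0, sigma=5.0, n=10, reps=20000)
print("mean of sample means:    ", statistics.fmean(means))
print("variance of sample means:", statistics.variance(means))
print("theoretical variance:    ", 5.0**2 / 10)  # sigma^2 / n
```

The simulated mean should sit close to 1015 and the simulated variance close to 2.5, up to Monte Carlo error.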

We take the mean of the posterior distribution to be the Bayesian estimator for $\mu$:

$$\hat{\mu}_B = E(\mu \mid y_1, \ldots, y_n) = \frac{1/s^2}{n/\sigma^2 + 1/s^2} \times m + \frac{n/\sigma^2}{n/\sigma^2 + 1/s^2} \times \bar{y}.$$
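The posterior mean is a precision-weighted average of the prior mean and the sample mean. A minimal sketch of that calculation follows; the function and argument names are hypothetical, with `prior_mean` and `prior_sd` standing for the prior mean $m$ and prior standard deviation $s$, and `sigma` for the known standard deviation of the observations.

```python
def posterior_mean(prior_mean, prior_sd, sigma, n, ybar):
    """Posterior mean of mu for a normal prior and known sigma."""
    prior_precision = 1.0 / prior_sd**2   # 1 / s^2
    data_precision = n / sigma**2         # n / sigma^2
    total = prior_precision + data_precision
    return (prior_precision / total) * prior_mean + (data_precision / total) * ybar


# Hypothetical inputs: prior N(1000, 20^2), sigma = 5, n = 10, observed ybar = 1015.
print(posterior_mean(1000.0, 20.0, 5.0, 10, 1015.0))
```

With these illustrative inputs the data precision dominates, so the result lands just below the sample mean, at roughly 1014.91.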

We know that the posterior mean minimizes the posterior mean squared error. This means that $\hat{\mu}_B$ is the optimum estimator in the post-data setting. In other words, it is the optimum estimator for $\mu$ given our sample data and using our prior.

We will compare its performance to that of $\hat{\mu}_f = \bar{y}$ under the frequentist assumption that the true mean $\mu$ is a fixed but unknown constant. The probabilities will be calculated from the sampling distribution of $\bar{y}$. In other words, we are comparing the two estimators for $\mu$ in the pre-data setting.

The posterior mean is a linear function of the random variable $\bar{y}$, so its expected value is

$$E(\hat{\mu}_B) = \frac{1/s^2}{n/\sigma^2 + 1/s^2} \times m + \frac{n/\sigma^2}{n/\sigma^2 + 1/s^2} \times \mu.$$

The bias of the posterior mean is its expected value minus the true parameter value, which simplifies to

$$\frac{\sigma^2}{n s^2 + \sigma^2}\,(m - \mu).$$
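The simplification can be spelled out in one step. Here $\sigma$ is the known standard deviation of the observations, $m$ and $s$ are the prior mean and standard deviation, and $w$ is shorthand for the weight on the prior mean (the symbol $w$ is introduced here for convenience and is not the book's notation). Because the two weights sum to one:

```latex
w = \frac{1/s^2}{n/\sigma^2 + 1/s^2} = \frac{\sigma^2}{n s^2 + \sigma^2},
\qquad
E(\hat{\mu}_B) - \mu
  = w\,m + (1 - w)\,\mu - \mu
  = w\,(m - \mu)
  = \frac{\sigma^2}{n s^2 + \sigma^2}\,(m - \mu).
```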

The posterior mean is a biased estimator of $\mu$. The bias could only be 0 if our prior mean coincides with the unknown true value. The probability of that happening is 0. The bias increases linearly with the distance the prior mean $m$ is from the true unknown mean $\mu$. The variance of the posterior mean is

$$\left(\frac{n/\sigma^2}{n/\sigma^2 + 1/s^2}\right)^2 \times \frac{\sigma^2}{n} = \left(\frac{n s^2}{n s^2 + \sigma^2}\right)^2 \frac{\sigma^2}{n}$$

1 The maximum likelihood estimator is the value of the parameter that maximizes the likelihood function. It turns out that $\bar{y}$ is the maximum likelihood estimator of $\mu$ for a normal random sample.


Figure 11.1 Biases of Arnold’s, Beth’s, and Carol’s estimators.

and is seen to be clearly smaller than $\sigma^2/n$, which is the variance of the frequentist estimator $\hat{\mu}_f = \bar{y}$. The mean squared error of an estimator combines both the bias and the variance into a single measure:

$$MS(\hat{\mu}) = \text{bias}^2 + \text{Var}(\hat{\mu}).$$

The frequentist estimator $\hat{\mu}_f = \bar{y}$ is an unbiased estimator of $\mu$, so its mean squared error equals its variance:

$$MS(\hat{\mu}_f) = \frac{\sigma^2}{n}.$$

When there is prior information, we will see that the Bayesian estimator has smaller mean squared error over the range of $\mu$ values that are realistic.

Example 19 Arnold, Beth, and Carol want to estimate the mean weight of "1 kg" packages of milk powder produced at a dairy company. The weight in individual packages is subject to random variation. They know that when the machine is adjusted properly, the weights are normally distributed with mean 1015 g and standard deviation 5 g. They are going to base their estimate on a sample of size 10. Arnold decides to use a normal prior with mean 1000 g and standard deviation 20 g. Beth decides she will use a normal prior with mean 1015 g and standard deviation 15 g. Carol decides she will use a "flat" prior. They calculate the bias, variance, and mean squared error of their estimators for various values of $\mu$ to see how well they perform.
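The calculation the three of them carry out can be sketched as follows, using the bias and variance expressions for the posterior mean given earlier in the section. Two things here are assumptions rather than part of the example: the function name, and the treatment of Carol's flat prior as a normal prior with infinite standard deviation (so the prior gets zero weight and her prior mean is irrelevant).

```python
import math


def estimator_properties(prior_mean, prior_sd, sigma, n, mu):
    """Return (bias, variance, MS) of the posterior mean at true mean mu."""
    # A flat prior is modelled as zero prior precision (assumption).
    prior_precision = 0.0 if math.isinf(prior_sd) else 1.0 / prior_sd**2
    data_precision = n / sigma**2
    w = prior_precision / (prior_precision + data_precision)  # weight on prior mean
    bias = w * (prior_mean - mu)
    variance = (1.0 - w) ** 2 * sigma**2 / n
    return bias, variance, bias**2 + variance


# Priors from Example 19; Carol's prior mean of 0.0 is only a placeholder.
priors = {
    "Arnold": (1000.0, 20.0),
    "Beth": (1015.0, 15.0),
    "Carol": (0.0, math.inf),
}

for mu in (1000.0, 1015.0, 1030.0):
    for name, (m, s) in priors.items():
        bias, var, mse = estimator_properties(m, s, sigma=5.0, n=10, mu=mu)
        print(f"mu={mu:6.0f}  {name:6s}  bias={bias:8.4f}  var={var:.4f}  MS={mse:.4f}")
```

Carol's row always shows zero bias and a mean squared error of 2.5, the variance of the sample mean for a standard deviation of 5 and a sample of size 10.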

Figure 11.1 shows that only Carol’s prior will give an unbiased Bayesian estimator. Her posterior Bayesian estimator corresponds exactly to the frequentist estimator


Figure 11.2 Mean-squared errors of Arnold’s, Beth’s, and Carol’s estimators.

$\hat{\mu}_f = \bar{y}$, since she used the "flat" prior. In Figure 11.2 we see the ranges over which the Bayesian estimators have smaller MS than the frequentist estimator. In that range they will be closer to the true value, on average, than the frequentist estimator. The realistic range is the target mean (1015) plus or minus 3 standard deviations (5), which is from 1000 to 1030.
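The ranges read off Figure 11.2 can also be worked out numerically. Requiring the Bayesian mean squared error (squared bias plus variance) to be below the frequentist value and solving for the true mean gives the condition that the distance between the prior mean and the true mean is less than $\sqrt{2s^2 + \sigma^2/n}$, where $s$ is the prior standard deviation and $\sigma$ the standard deviation of the observations. This closed form is derived here from the section's bias and variance expressions rather than quoted from the text, so treat it as a sketch to check against the figure; the function name is hypothetical.

```python
import math


def mse_advantage_interval(prior_mean, prior_sd, sigma, n):
    """Interval of true means where the posterior mean has smaller MS than ybar."""
    half_width = math.sqrt(2.0 * prior_sd**2 + sigma**2 / n)
    return prior_mean - half_width, prior_mean + half_width


for name, m, s in (("Arnold", 1000.0, 20.0), ("Beth", 1015.0, 15.0)):
    lo, hi = mse_advantage_interval(m, s, sigma=5.0, n=10)
    print(f"{name}: smaller MS for true mean between {lo:.2f} and {hi:.2f}")
```

Under these assumptions Beth's interval covers the whole realistic range from 1000 to 1030, while Arnold's stops a little short of 1030.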
