
Introduction to Bayesian statistics - Bolstad M.

Bolstad M. Introduction to Bayesian statistics - Wiley Publishing, 2004. - 361 p.
ISBN 0-471-27020-2

$$\int_0^1 g(\pi) \times f(y|\pi)\, d\pi,$$

which requires an integration. Depending on the prior g(π) chosen, there may not be a closed form for the integral, so it may be necessary to do the integration numerically. We will look at some possible priors.
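When no closed form exists, the integral above can be approximated numerically. Here is a minimal sketch (my own illustration, not from the text) that approximates the marginal ∫₀¹ g(π) f(y|π) dπ with the trapezoid rule; the names `likelihood` and `marginal` and the values n = 10, y = 3 are illustrative. With the uniform prior the exact answer is 1/(n + 1), which gives a convenient check.

```python
import math

def likelihood(pi, n, y):
    # binomial likelihood f(y | pi) = C(n, y) pi^y (1 - pi)^(n - y)
    return math.comb(n, y) * pi**y * (1 - pi)**(n - y)

def marginal(n, y, prior=lambda p: 1.0, m=10_000):
    # trapezoid-rule approximation of  integral_0^1 g(pi) f(y|pi) d(pi)
    h = 1.0 / m
    total = 0.0
    for i in range(m + 1):
        p = i * h
        w = 0.5 if i in (0, m) else 1.0   # endpoint weights
        total += w * prior(p) * likelihood(p, n, y)
    return total * h

# with a uniform prior, the exact marginal is 1/(n + 1)
print(marginal(n=10, y=3))   # close to 1/11 = 0.0909...
```

The trapezoid rule is the crudest workable choice; any quadrature routine would do, and in practice the point is only that the normalizing constant is an integral we would rather not have to compute at all.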
If we don’t have any idea beforehand what the proportion π is, we might like to choose a prior that does not favor any one value over another. Or, we may want to be as objective as possible, and not put our personal belief into the inference. In that case we should use the uniform prior that gives equal weight to all possible values of the success probability π. Although this does not achieve universal objectivity
1We know that for independent events (or random variables) the joint probability (or density) is the product of the marginal probabilities (or density functions). If they are not independent this does not hold. Likelihoods come from probability functions or probability density functions, so the same pattern holds. They can only be multiplied when they are independent.
(which is impossible to achieve), it is objective for this formulation of the problem:2
$$g(\pi) = 1 \quad \text{for } 0 < \pi < 1.$$
Clearly, we see that in this case, the posterior density is proportional to the likelihood:
$$g(\pi|y) \propto \binom{n}{y}\, \pi^y (1 - \pi)^{n-y} \quad \text{for } 0 < \pi < 1.$$

We can ignore the part that doesn’t depend on π. It is a constant for all values of π, so it doesn’t affect the shape of the posterior. When we examine the part of the formula that shows the shape of the posterior as a function of π, we recognize that this is a beta(a,b) distribution, where a = y + 1 and b = n − y + 1. So in this case the posterior distribution of π given y is easily obtained: all that is necessary is to look at the exponents of π and (1 − π). We didn’t have to do the integration.
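As a concrete check of this exponent-matching argument (my own illustration, with n = 10, y = 3 chosen arbitrarily), the sketch below normalizes the uniform-prior posterior and compares it pointwise with the beta(y + 1, n − y + 1) density built from gamma functions. Under the uniform prior the normalizing constant is 1/(n + 1), so the normalized likelihood is just the likelihood times n + 1.

```python
import math

def beta_pdf(pi, a, b):
    # beta(a, b) density with its gamma-function normalizing constant
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * pi**(a - 1) * (1 - pi)**(b - 1)

n, y = 10, 3
a_post, b_post = y + 1, n - y + 1   # posterior parameters under the uniform prior

# the normalized likelihood equals the beta(y+1, n-y+1) density at every pi
for pi in (0.1, 0.3, 0.5, 0.9):
    lik = math.comb(n, y) * pi**y * (1 - pi)**(n - y)
    print(round(lik * (n + 1), 6), round(beta_pdf(pi, a_post, b_post), 6))
```

The two columns agree because (n + 1)·C(n, y) is exactly the beta(y + 1, n − y + 1) normalizing constant Γ(n + 2)/(Γ(y + 1)Γ(n − y + 1)).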
Suppose a beta(a,b) prior density is used for π:

$$g(\pi; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \pi^{a-1}(1 - \pi)^{b-1} \quad \text{for } 0 < \pi < 1.$$
The posterior is proportional to prior times likelihood. We can ignore the constants in the prior and likelihood that don’t depend on the parameter, since we know multiplying either the prior or the likelihood by a constant won’t affect the results of Bayes’ theorem. This gives
$$g(\pi|y) \propto \pi^{a+y-1}(1 - \pi)^{b+n-y-1} \quad \text{for } 0 < \pi < 1,$$

which is the shape of the posterior as a function of π. We recognize that this is the beta distribution with parameters a′ = a + y and b′ = b + n − y. That is, we add the number of successes to a, and add the number of failures to b:
$$g(\pi|y) = \frac{\Gamma(n + a + b)}{\Gamma(y + a)\,\Gamma(n - y + b)}\, \pi^{y+a-1}(1 - \pi)^{n-y+b-1} \quad \text{for } 0 < \pi < 1.$$

Again, the posterior density of π has been easily obtained without having to go through the integration.
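One way to see that this really is the posterior, without computing any normalizing constant, is to check that prior × likelihood is *proportional* to the beta(a + y, b + n − y) density: their ratio should be the same constant at every value of π. The sketch below does that check with arbitrary illustrative values (a beta(2, 3) prior and y = 4 successes in n = 10 trials, not taken from the text).

```python
import math

def beta_pdf(p, a, b):
    # beta(a, b) density with its gamma-function normalizing constant
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * p**(a - 1) * (1 - p)**(b - 1)

a, b, n, y = 2.0, 3.0, 10, 4          # hypothetical prior and data
post_a, post_b = a + y, b + n - y     # beta(a + y, b + n - y) posterior

# prior x likelihood divided by the claimed posterior density:
# the ratio is the same constant at every pi, confirming proportionality
for p in (0.2, 0.4, 0.6, 0.8):
    num = beta_pdf(p, a, b) * math.comb(n, y) * p**y * (1 - p)**(n - y)
    print(round(num / beta_pdf(p, post_a, post_b), 8))
```

The constant ratio is exactly the marginal probability of the data, which Bayes’ theorem would otherwise force us to obtain by integration.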
Figure 8.1 shows the shapes of beta(a,b) densities for values of a = .5, 1, 2, 3 and b = .5, 1, 2, 3. This shows the variety of shapes members of the beta(a,b) family can take. When a < b, the density has more weight in the lower half. The opposite is true when a > b. When a = b, the beta(a,b) density is symmetric. We note that the uniform prior is a special case of the beta(a,b) prior, where a = 1 and b = 1.
2There are many possible parameterizations of the problem. Any one-to-one function of the parameter would also be a suitable parameter. The prior density for the new parameter could be found from the prior density of the original parameter using the change-of-variable formula, and it would not be flat. In other words, it would favor some values of the new parameter over others. You can be objective in a given parameterization, but the same prior would not be objective in a different formulation. Universal objectivity is not attainable.
[Figure 8.1 here: a 4 × 4 grid of density plots, one panel for each beta(a, b) with a = .5, 1, 2, 3 and b = .5, 1, 2, 3.]
Figure 8.1 Some beta distributions.
Conjugate Family of Priors for Binomial Observation is the Beta Family
When we examine the shape of the binomial likelihood function as a function of π, we see that it has the same form as the beta(a,b) distribution: a product of π to a power times (1 − π) to another power. When we multiply the beta prior times the binomial likelihood, we add the exponents of π and (1 − π), respectively. So when we start with a beta prior, we get a beta posterior by the simple rule "add successes to a, add failures to b." This makes using beta(a,b) priors particularly easy when we have binomial observations. Using Bayes’ theorem moves us to another member of the same family.
We say that the beta distribution is the conjugate3 family for the binomial observation distribution. When we use a prior from the conjugate family, we don’t have to do any integration to find the posterior. All we have to do is use the observations to update the parameters of the conjugate family prior to find the conjugate family posterior. This is a big advantage.
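The update rule is short enough to state as code. A minimal sketch (the prior and data values are illustrative, not from the text) also shows why the posterior can serve as the prior for the next batch of data: splitting the observations into two batches and updating twice lands on the same beta posterior as one combined update.

```python
def update_beta(a, b, y, n):
    """Conjugate update for binomial data: a beta(a, b) prior and
    y successes in n trials give a beta(a + y, b + n - y) posterior
    (add successes to a, add failures to b)."""
    return a + y, b + n - y

# updating in one batch or sequentially gives the same posterior
prior = (1, 1)                                  # uniform prior = beta(1, 1)
one_shot = update_beta(*prior, y=5, n=12)       # all 12 trials at once
step1 = update_beta(*prior, y=2, n=5)           # first 5 trials
step2 = update_beta(*step1, y=3, n=7)           # remaining 7 trials
print(one_shot, step2)                          # (6, 8) (6, 8)
```

No integration appears anywhere: the whole of Bayes’ theorem has collapsed into two additions, which is exactly the advantage of staying inside the conjugate family.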