# Introduction to Bayesian statistics - Bolstad M.

ISBN 0-471-27020-2


The posterior probability of any particular Bj given A is the proportion of A that is also in Bj. In other words, it is the probability of Bj ∩ A divided by the sum of the probabilities of Bj ∩ A over all j = 1,...,n.
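The proportion interpretation can be checked directly. A minimal sketch, with hypothetical joint probabilities (the numbers are illustrative, not from the text):

```python
# Hypothetical joint probabilities P(Bj ∩ A) for a partition B1,...,B4
# (illustrative numbers, not from the text).
joint = [0.10, 0.05, 0.20, 0.15]

# P(A) is the sum of the joint probabilities over the partition.
p_a = sum(joint)

# Posterior P(Bj | A): the proportion of A that lies in each Bj.
posterior = [p / p_a for p in joint]
print([round(p, 2) for p in posterior])  # [0.2, 0.1, 0.4, 0.3]
```

Note that the posterior probabilities necessarily sum to 1, since each is a share of the same total P(A).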

In Bayes’ theorem, each of the joint probabilities is found by multiplying the prior probability P(Bj) by the likelihood P(A|Bj). In Chapter 5, we will see that the universe set out with two dimensions for two jointly distributed discrete random variables is very similar to that shown in Figures 4.10 and 4.11. One random variable will be observed, and we will determine the conditional probability distribution of the other random variable, given our observed value of the first. In Chapter 6, we will develop Bayes’ theorem for two discrete random variables in an analogous manner to our development of Bayes’ theorem for events in this chapter.

Figure 4.10 The Bayesian universe U with four unobservable events Bi for i = 1,...,4 which partition it shown in the vertical dimension, and the observable event A shown in the horizontal dimension.

Example 5 (continued) Figure 4.10 shows the four unobservable events Bi for i = 1,...,4 that partition the Bayesian universe, together with event A, which is observable. Figure 4.11 shows the reduced universe, given that event A has occurred. These figures will give us better insight than Figures 4.8 and 4.9. We know where in the Bayesian universe we are in the horizontal direction, since we know event A occurred. However, we don’t know where we are in the vertical direction, since we don’t know which one of the Bi occurred.

Multiplying by a constant. The numerator of Bayes’ theorem is the prior probability times the likelihood. The denominator is the sum of the prior probabilities times likelihoods over the whole partition. This division of each prior probability times likelihood by the sum of prior probabilities times likelihoods makes the posterior probabilities sum to 1.

Note that if we multiplied each of the likelihoods by a constant, the denominator would also be multiplied by the same constant. The constant would cancel out in the division, and we would be left with the same posterior probabilities. Because of this, we only need to know the likelihood to within a constant of proportionality. The relative weights given to each of the possibilities by the likelihood are all we need. Similarly, we could multiply each prior probability by a constant. The denominator would again be multiplied by the same constant, so we would be left with the same posterior probabilities. The only thing we need in the prior is the relative weights we give to each of the possibilities.

Figure 4.11 The reduced Bayesian universe, given A has occurred, together with the four unobservable events Bi for i = 1,...,4 that partition it.

We often write Bayes’ theorem in its proportional form as

posterior ∝ prior × likelihood

This gives the relative weights for each of the events Bi for i = 1,...,n after we know A has occurred. Dividing by the sum of the relative weights rescales the relative weights so they sum to 1. This makes it a probability distribution.
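The cancellation argument above can be verified numerically. This is a sketch with hypothetical priors and likelihoods, not values from the text:

```python
# Posterior from the proportional form: posterior ∝ prior × likelihood.
def posterior(priors, likelihoods):
    weights = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(weights)
    # Rescale the relative weights so they sum to 1.
    return [w / total for w in weights]

# Illustrative numbers (not from the text).
priors = [0.25, 0.25, 0.25, 0.25]
likelihoods = [0.8, 0.4, 0.2, 0.6]

base = posterior(priors, likelihoods)
# Multiplying every likelihood by the same constant leaves the
# posterior unchanged: the constant cancels in the division.
scaled = posterior(priors, [10 * l for l in likelihoods])
same = all(abs(a - b) < 1e-12 for a, b in zip(base, scaled))
print(same)  # True
```

The same check works for the priors: scaling them all by a common factor changes the numerator and denominator identically.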

We can summarize the use of Bayes’ theorem for events by the following three steps:

1. Multiply prior times likelihood for each of the Bi. This finds the probability of Bi ∩ A by the multiplication rule.

2. Sum them for i = 1,...,n. This finds the probability of A by the law of total probability.

3. Divide each of the prior times likelihood values by their sum. This finds the conditional probability of that particular Bi given A.
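The three steps above can be sketched in a few lines of code; the priors and likelihoods here are illustrative, not from the text:

```python
# Illustrative priors P(Bi) and likelihoods P(A | Bi) for a
# four-event partition (hypothetical numbers).
priors = [0.2, 0.3, 0.4, 0.1]
likelihoods = [0.5, 0.2, 0.1, 0.8]

# Step 1: prior times likelihood gives the joint probability P(Bi ∩ A).
joint = [p * l for p, l in zip(priors, likelihoods)]

# Step 2: summing over the partition gives P(A) (law of total probability).
p_a = sum(joint)

# Step 3: dividing each joint probability by the sum gives P(Bi | A).
posterior = [j / p_a for j in joint]
print([round(p, 3) for p in posterior])  # [0.357, 0.214, 0.143, 0.286]
```

Here P(A) = 0.28, and the division in step 3 rescales the four joint probabilities into a distribution that sums to 1.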

4.7 ASSIGNING PROBABILITIES

Any assignment of probabilities to all possible events must satisfy the probability axioms. Of course, to be useful the probabilities assigned to events must correspond to the real world. There are two methods of probability assignment that we will use:

1. Long run relative frequency probability assignment: the probability of an event is considered to be the proportion of times it would occur if the experiment were repeated an infinite number of times. This is the method of assigning probabilities used in frequentist statistics. For example, if I were trying to assign the probability of getting a head on a toss of a coin, I would toss it a large number of times, and use the proportion of heads that occurred as an approximation to the probability.

2. Degree of belief probability assignment: the probability of an event is what I believe it is from previous experience. This is subjective. Someone else can have a different belief. For example, I could say that I believe the coin is a fair one, so for me the probability of getting a head equals .5. Someone else might look at the coin and, observing a slight asymmetry, decide the probability of getting a head equals .49.

In Bayesian statistics, we will use long run relative frequency assignments of probabilities for events that are outcomes of the random experiment, given the value of the unobservable variable. We call the unobservable variable the parameter. Think about repeating the experiment over and over again, an infinite number of times, while holding the parameter (unobservable) at a fixed value. The set of all possible observable values of the experiment is called the sample space of the experiment. The probability of an event is the long run relative frequency of the event over all these hypothetical repetitions. We see that the sample space is the observable (horizontal) dimension of the Bayesian universe.
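This picture of hypothetical repetitions with the parameter held fixed can be illustrated by simulation. A sketch, where the coin's head probability and the number of repetitions are illustrative choices:

```python
import random

# Hold the parameter (here, a coin's probability of heads) at a
# fixed value and repeat the experiment many times. The relative
# frequency of the event approximates its probability.
random.seed(1)        # fixed seed so the run is reproducible
theta = 0.5           # fixed (unobservable) parameter
n = 100_000           # number of hypothetical repetitions
heads = sum(random.random() < theta for _ in range(n))
print(heads / n)      # close to theta
```

With more repetitions, the relative frequency settles ever closer to the fixed parameter value, which is the sense in which the long run assignment corresponds to the real world.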
