# Introduction to Bayesian statistics - Bolstad M.

ISBN 0-471-27020-2


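| x | prior | likelihood | prior × likelihood | posterior |
|---|---|---|---|---|
| 0 | 0 | ? | 0 | 0 / (1/3) = 0 |
| 1 | 1/15 | 4/4 | 1/15 | (1/15) / (1/3) = 1/5 |
| 2 | 2/15 | 3/4 | 1/10 | (1/10) / (1/3) = 3/10 |
| 3 | 3/15 | 2/4 | 1/10 | (1/10) / (1/3) = 3/10 |
| 4 | 4/15 | 1/4 | 1/15 | (1/15) / (1/3) = 1/5 |
| 5 | 5/15 | 0/4 | 0 | 0 / (1/3) = 0 |
| | | | 1/3 | 1.00 |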

6.1 TWO EQUIVALENT WAYS OF USING BAYES' THEOREM

We may have more than one data set concerning a parameter, and the data sets might not become available at the same time. Should we wait for the second data set, combine it with the first, and then use Bayes' theorem on the combined data set? That would mean starting from scratch every time more data became available, which would be a lot of work. A less laborious approach is to use the posterior probabilities given the first data set as the prior probabilities for analyzing the second data set. We will find that these two approaches lead to the same posterior probabilities. This is a significant advantage of Bayesian methods. In frequentist statistics, we would have to use the first approach, reanalyzing the combined data set when the second one arrives.

Analyzing the observations in sequence. Suppose that we randomly draw a second ball out of the urn without replacing the first, and that the second draw results in a green ball, so Y = 0. We want to find the posterior probabilities of X given the results of the two observations: red first, green second. We will analyze the observations in sequence, using Bayes' theorem each time. For the first draw we use the same prior probabilities as before. For the second draw we use the posterior probabilities from the first draw as the prior probabilities. The results are shown in Table 6.7.
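The sequential analysis can be sketched in a few lines of Python. This is a minimal illustration rather than code from the book: the helper name `bayes_update` and the variable names are made up here, the urn is assumed to hold 5 balls with X of them red, and exact arithmetic with `fractions.Fraction` is used so the results can be compared with the tabulated fractions.

```python
from fractions import Fraction

N_BALLS = 5  # assumed urn size; X = number of red balls, 0..5


def bayes_update(prior, likelihood):
    """Multiply prior by likelihood entry by entry, then rescale to sum to 1."""
    products = [p * l for p, l in zip(prior, likelihood)]
    total = sum(products)
    return [p / total for p in products]


xs = range(N_BALLS + 1)
prior = [Fraction(1, 6) for _ in xs]  # all values of X equally likely

# First draw is red: f(y1 = 1 | x) = x / 5.
like_red_first = [Fraction(x, N_BALLS) for x in xs]
post_first = bayes_update(prior, like_red_first)

# Second draw is green, with the red ball not replaced:
# f(y2 = 0 | y1 = 1, x) = (5 - x) / 4.
# For x = 0 this expression is not a valid probability, but the prior
# weight there is already 0, so it does not affect the result.
like_green_second = [Fraction(N_BALLS - x, N_BALLS - 1) for x in xs]
post_second = bayes_update(post_first, like_green_second)

print([str(p) for p in post_first])
print([str(p) for p in post_second])
```

The posterior from the first call is passed unchanged as the prior of the second call, which is the whole of the sequential method.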

Analyzing the observations all together. Alternatively, we could consider both draws together and revise the probabilities using Bayes' theorem only once. Initially, we are in the same state of knowledge as before, so we take the same prior probabilities that we used for the first draw when analyzing the observations in sequence. All possible values of X are equally likely, so the prior probability function is g(x) = 1/6 for x = 0, ..., 5.

Let Y1 and Y2 be the outcomes of the first and second draws, respectively. The probabilities of the second draw depend on the balls left after the first draw. By the multiplication rule, the observation probability conditional on X is

$$f(y_1, y_2 \mid x) = f(y_1 \mid x) \times f(y_2 \mid y_1, x).$$
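The multiplication rule can be checked numerically. The sketch below assumes an urn of 5 balls with x red ones, codes red as 1 and green as 0, and uses hypothetical helper names (`draw_prob`, `joint_likelihood`). It computes f(y1, y2 | x) for draws without replacement and then the marginal probability of each pair of outcomes under the uniform prior.

```python
from fractions import Fraction
from itertools import product

N_BALLS = 5  # assumed urn size


def draw_prob(y, reds, total):
    """Probability of drawing colour y (1 = red, 0 = green) from `total` balls, `reds` of them red."""
    p_red = Fraction(reds, total)
    return p_red if y == 1 else 1 - p_red


def joint_likelihood(y1, y2, x):
    """f(y1, y2 | x) = f(y1 | x) * f(y2 | y1, x), drawing without replacement."""
    first = draw_prob(y1, x, N_BALLS)
    if first == 0:
        # The first outcome is impossible for this x, so the joint probability is 0.
        return Fraction(0)
    # After the first draw, 4 balls remain and x - y1 of them are red.
    second = draw_prob(y2, x - y1, N_BALLS - 1)
    return first * second


prior = Fraction(1, 6)  # uniform prior on x = 0, ..., 5

# Marginal probability of each outcome pair: sum over x of prior * likelihood.
marginal = {
    (y1, y2): sum(prior * joint_likelihood(y1, y2, x) for x in range(N_BALLS + 1))
    for y1, y2 in product((0, 1), repeat=2)
}

for outcome, prob in marginal.items():
    print(outcome, prob)
```

The four marginal probabilities sum to 1, and the one for (1, 0) is the normalizing constant needed when both draws are analyzed together.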

Table 6.8 The joint distribution of X, Y1, Y2 and marginal distribution of Y1, Y2. Each entry is the product g(x) × f(y1 | x) × f(y2 | y1, x).

| x | prior | (y1, y2) = (0, 0) | (y1, y2) = (0, 1) | (y1, y2) = (1, 0) | (y1, y2) = (1, 1) |
|---|---|---|---|---|---|
| 0 | 1/6 | 20/120 | 0 | 0 | 0 |
| 1 | 1/6 | 12/120 | 4/120 | 4/120 | 0 |
| 2 | 1/6 | 6/120 | 6/120 | 6/120 | 2/120 |
| 3 | 1/6 | 2/120 | 6/120 | 6/120 | 6/120 |
| 4 | 1/6 | 0 | 4/120 | 4/120 | 12/120 |
| 5 | 1/6 | 0 | 0 | 0 | 20/120 |
| f(y1, y2) | | 40/120 | 20/120 | 20/120 | 40/120 |

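Table 6.9 The posterior probability distribution given Y1 = 1 and Y2 = 0

| x | prior | (y1, y2) = (0, 0) | (y1, y2) = (0, 1) | (y1, y2) = (1, 0) | (y1, y2) = (1, 1) | posterior |
|---|---|---|---|---|---|---|
| 0 | 1/6 | 20/120 | 0 | 0 | 0 | 0 / (20/120) = 0 |
| 1 | 1/6 | 12/120 | 4/120 | 4/120 | 0 | (4/120) / (20/120) = 1/5 |
| 2 | 1/6 | 6/120 | 6/120 | 6/120 | 2/120 | (6/120) / (20/120) = 3/10 |
| 3 | 1/6 | 2/120 | 6/120 | 6/120 | 6/120 | (6/120) / (20/120) = 3/10 |
| 4 | 1/6 | 0 | 4/120 | 4/120 | 12/120 | (4/120) / (20/120) = 1/5 |
| 5 | 1/6 | 0 | 0 | 0 | 20/120 | 0 / (20/120) = 0 |
| f(y1, y2) | | | | 20/120 | | 1.00 |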

The joint distribution of X and Y1, Y2 is given in Table 6.8. The first ball was red and the second was green, so the reduced universe probabilities are in the column (y1, y2) = (1, 0). The likelihood function is given by the conditional observation probabilities f(y1 = 1, y2 = 0 | x) that enter that column. The posterior probability of X given Y1 = 1 and Y2 = 0 is found by rescaling the probabilities in the reduced universe so they sum to 1. This is shown in Table 6.9.

We see that these are the same posterior probabilities we found by analyzing the observations sequentially, using the posterior after the first draw as the prior for the second. This shows that it makes no difference whether you analyze the observations one at a time in sequence, using the posterior after the previous step as the prior for the next step, or analyze all observations together in a single step starting with your initial prior!
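The agreement is not special to this example. A short derivation, sketched here in the chapter's notation, shows why. The symbols g(x | y1) for the posterior after the first draw and f(y1) for the marginal probability of the first draw are written this way for illustration, and f(y1) is assumed to be greater than 0. Analyzing both observations together gives

$$
g(x \mid y_1, y_2) = \frac{g(x)\, f(y_1 \mid x)\, f(y_2 \mid y_1, x)}{\sum_{x'} g(x')\, f(y_1 \mid x')\, f(y_2 \mid y_1, x')}.
$$

Dividing numerator and denominator by $f(y_1) = \sum_{x'} g(x')\, f(y_1 \mid x')$ and recognizing $g(x \mid y_1) = g(x)\, f(y_1 \mid x) / f(y_1)$ gives

$$
g(x \mid y_1, y_2) = \frac{g(x \mid y_1)\, f(y_2 \mid y_1, x)}{\sum_{x'} g(x' \mid y_1)\, f(y_2 \mid y_1, x')},
$$

which is Bayes' theorem applied to the second observation with the first posterior as the prior.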


Table 6.10 The posterior probability distribution after both observations

| x | prior | likelihood | prior × likelihood | posterior |
|---|---|---|---|---|
| 0 | 1/6 | 0/20 | 0/120 | (0/120) / (1/6) = 0 |
| 1 | 1/6 | 4/20 | 4/120 | (4/120) / (1/6) = 1/5 |
| 2 | 1/6 | 6/20 | 6/120 | (6/120) / (1/6) = 3/10 |
| 3 | 1/6 | 6/20 | 6/120 | (6/120) / (1/6) = 3/10 |
| 4 | 1/6 | 4/20 | 4/120 | (4/120) / (1/6) = 1/5 |
| 5 | 1/6 | 0/20 | 0/120 | (0/120) / (1/6) = 0 |
| | | | 1/6 | 1.00 |

Since we only use the column corresponding to the reduced universe, it is simpler to find the posterior by multiplying prior times likelihood and rescaling to make it a probability distribution. This is shown in Table 6.10.
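The "prior times likelihood, then rescale" shortcut amounts to a single pass over the possible values of X. The sketch below is illustrative only: the variable names are chosen here, and the likelihood of red then green is written in the closed form x(5 - x)/20, which assumes the 5-ball urn of the example.

```python
from fractions import Fraction

xs = range(6)                                 # possible numbers of red balls
prior = [Fraction(1, 6) for _ in xs]          # uniform prior

# Likelihood of (red, then green) given x: (x/5) * ((5 - x)/4) = x(5 - x)/20.
likelihood = [Fraction(x * (5 - x), 20) for x in xs]

products = [p * l for p, l in zip(prior, likelihood)]
total = sum(products)                         # marginal probability of the data
posterior = [p / total for p in products]     # rescaled to sum to 1

for x, post in zip(xs, posterior):
    print(x, post)
```

Only the column for the observed outcome is ever computed; the other three columns of the joint table are not needed.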
