[0, 1], and we choose R_F to be the horizontal line. This implies that the coefficient of P_1 in (44) must be zero:
(C_11 - C_00) + (C_01 - C_11) P_M - (C_10 - C_00) P_F = 0.  (46)
A Bayes test designed to minimize the maximum possible risk is called a minimax test. Equation (46) is referred to as the minimax equation and is useful whenever the maximum of R_B(P_1) is interior to the interval.
A special cost assignment that is frequently logical is

C_00 = C_11 = 0,
C_01 = C_M,
C_10 = C_F.

(This guarantees the maximum is interior.) With these costs the risk is

R_F = C_F P_F + P_1 (C_M P_M - C_F P_F)
    = P_0 C_F P_F + P_1 C_M P_M,
and the minimax equation is
C_M P_M = C_F P_F.  (50)
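As a concrete illustration (not from the text), consider one observation with r ~ N(0, 1) under H0 and r ~ N(m, 1) under H1, so that a threshold γ on r gives P_F = Q(γ) and P_M = Φ(γ - m). The minimax threshold then solves (50) numerically. A minimal sketch in Python; the mean m and the costs are assumed values chosen for illustration:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    """Standard normal tail probability, Q(x) = 1 - Phi(x)."""
    return 1.0 - Phi(x)

def minimax_threshold(m, C_M, C_F, lo=-10.0, hi=10.0, iters=100):
    """Bisect for the threshold gamma solving C_M*P_M = C_F*P_F,
    with P_F = Q(gamma) under H0: N(0,1) and P_M = Phi(gamma - m)
    under H1: N(m,1).  g is increasing in gamma, so bisection applies."""
    def g(gamma):
        return C_M * Phi(gamma - m) - C_F * Q(gamma)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

gamma = minimax_threshold(m=2.0, C_M=1.0, C_F=1.0)
P_F, P_M = Q(gamma), Phi(gamma - 2.0)
print(gamma, P_F, P_M)  # with equal costs, symmetry gives gamma = m/2 = 1.0
```

With unequal costs the threshold shifts so that the more expensive error is made less probable, exactly as (50) requires.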
Before continuing our discussion of likelihood ratio tests we shall discuss a second criterion and prove that it also leads to a likelihood ratio test.
Neyman-Pearson Tests. In many physical situations it is difficult to assign realistic costs or a priori probabilities. A simple procedure to bypass this difficulty is to work with the conditional probabilities PF and PD. In general, we should like to make PF as small as possible and PD as large as possible. For most problems of practical importance these are conflicting objectives. An obvious criterion is to constrain one of the probabilities and maximize (or minimize) the other. A specific statement of this criterion is the following:
Neyman-Pearson Criterion. Constrain P_F = α' ≤ α and design a test to maximize P_D (or minimize P_M) under this constraint.
The solution is obtained easily by using Lagrange multipliers. We construct the function F,
F = P_M + λ[P_F - α'].  (51)
F = ∫_{Z0} p_{r|H1}(R|H1) dR + λ [∫_{Z1} p_{r|H0}(R|H0) dR - α'].  (52)
Clearly, if P_F = α', then minimizing F minimizes P_M.
Rewriting (52), using ∫_{Z1} p_{r|H0}(R|H0) dR = 1 - ∫_{Z0} p_{r|H0}(R|H0) dR, we have

F = λ(1 - α') + ∫_{Z0} [p_{r|H1}(R|H1) - λ p_{r|H0}(R|H0)] dR.  (53)
Now observe that for any positive value of λ an LRT will minimize F. (A negative value of λ gives an LRT with the inequalities reversed.)
This follows directly, because to minimize F we assign a point R to Z0 only when the term in the bracket is negative. This is equivalent to the test
p_{r|H1}(R|H1) / p_{r|H0}(R|H0) < λ,  assign point to Z0 or say H0.  (54)
The quantity on the left is just the likelihood ratio. Thus F is minimized by the likelihood ratio test
Λ(R) ≷ λ.  (55)
To satisfy the constraint we choose λ so that P_F = α'. If we denote the density of Λ when H0 is true as p_{Λ|H0}(Λ|H0), then we require

P_F = ∫_λ^∞ p_{Λ|H0}(Λ|H0) dΛ = α'.  (56)
Solving (56) for λ gives the threshold. The value of λ given by (56) will be non-negative because p_{Λ|H0}(Λ|H0) is zero for negative values of Λ. Observe that decreasing λ is equivalent to increasing Z1, the region where we say H1. Thus P_D increases as λ decreases. Therefore we decrease λ until we obtain the largest possible α' ≤ α. In most cases of interest to us P_F is a continuous function of λ and we have P_F = α. We shall assume this continuity in all subsequent discussions. Under this assumption the Neyman-Pearson criterion leads to a likelihood ratio test. On p. 41 we shall see the effect of the continuity assumption not being valid.
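For the same illustrative Gaussian pair used above (H0: r ~ N(0, 1), H1: r ~ N(m, 1), one observation; the specific m and α' are assumptions for illustration, not from the text), the LRT reduces to comparing the observation to a threshold γ, and the Neyman-Pearson design simply inverts P_F = Q(γ) = α'. A hedged sketch:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    """Standard normal tail probability."""
    return 1.0 - Phi(x)

def np_threshold(alpha, lo=-10.0, hi=10.0, iters=100):
    """Bisect for gamma with Q(gamma) = alpha.
    Q is decreasing, so if Q(mid) > alpha the root lies to the right."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

m = 2.0             # assumed mean under H1
alpha = 0.05        # the constraint P_F = alpha'
gamma = np_threshold(alpha)
P_D = Q(gamma - m)  # detection probability achieved at this threshold
print(gamma, P_D)
```

Note the design never uses costs or a priori probabilities: the constraint alone fixes the threshold, and P_D is whatever the resulting test achieves.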
Summary. In this section we have developed two ideas of fundamental importance in hypothesis testing. The first result is the demonstration that for a Bayes or a Neyman-Pearson criterion the optimum test consists of processing the observation R to find the likelihood ratio Λ(R) and then comparing Λ(R) to a threshold in order to make a decision. Thus, regardless of the dimensionality of the observation space, the decision space is one-dimensional.
The second idea is that of a sufficient statistic l(R). The idea of a sufficient statistic originated when we constructed the likelihood ratio and saw that it depended explicitly only on l(R). If we actually construct Λ(R) and then recognize l(R), the notion of a sufficient statistic is perhaps of secondary value. A more important case is when we can recognize l(R) directly. An easy way to do this is to examine the geometric interpretation of a sufficient
statistic. We considered the observations r_1, r_2, ..., r_N as a point r in an N-dimensional space, and one way to describe this point is to use these coordinates. When we choose a sufficient statistic, we are simply describing the point in a coordinate system that is more useful for the decision problem. We denote the first coordinate in this system by l, the sufficient statistic, and the remaining N - 1 coordinates, which will not affect our decision, by the (N - 1)-dimensional vector y. Thus
Λ(R) = Λ(L, Y) = p_{l,y|H1}(L, Y|H1) / p_{l,y|H0}(L, Y|H0).  (57)
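To make the coordinate-system picture concrete (an illustration under assumed Gaussian densities, not part of the text): for N independent observations with r_i ~ N(0, σ²) under H0 and r_i ~ N(m, σ²) under H1, forming the log of Λ(R) from the full N-dimensional densities shows that it depends on R only through l(R) = Σ r_i, so l is a sufficient statistic and the y coordinates never enter the decision:

```python
import math
import random

def log_likelihood_ratio(R, m, sigma):
    """ln Lambda(R) computed from the full N-dimensional densities.
    The common normalizing constants cancel in the ratio."""
    s2 = sigma ** 2
    ll1 = sum(-(r - m) ** 2 / (2 * s2) for r in R)  # ln p(R|H1) up to const
    ll0 = sum(-r ** 2 / (2 * s2) for r in R)        # ln p(R|H0) up to same const
    return ll1 - ll0

def log_lr_from_statistic(l, N, m, sigma):
    """The same quantity written in terms of l = sum(R):
    ln Lambda = (m / sigma^2) * l - N * m^2 / (2 * sigma^2)."""
    return (m / sigma ** 2) * l - N * m ** 2 / (2 * sigma ** 2)

random.seed(0)
R = [random.gauss(0.5, 1.0) for _ in range(10)]
l = sum(R)
a = log_likelihood_ratio(R, m=1.0, sigma=1.0)
b = log_lr_from_statistic(l, N=10, m=1.0, sigma=1.0)
print(a, b)  # equal up to rounding: Lambda depends on R only through l(R)
```

Expanding the exponents confirms the algebra: the quadratic terms in r_i cancel, leaving only the linear combination Σ r_i, which is the one coordinate that matters.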