Solutions Manual for
Statistical Inference, Second Edition

George Casella
University of Florida

Roger L. Berger
North Carolina State University
Damaris Santana
University of Florida


“When I hear you give your reasons,” I remarked, “the thing always appears to me to be so ridiculously simple that I could easily do it myself, though at each successive instance of your reasoning I am baffled until you explain your process.”
Dr. Watson to Sherlock Holmes
A Scandal in Bohemia
0.1 Description
This solutions manual contains solutions for all odd-numbered problems plus a large number of solutions for even-numbered problems. Of the 624 exercises in Statistical Inference, Second Edition, this manual gives solutions for 484 (78%) of them. There is an obtuse pattern as to which solutions were included in this manual: we assembled all of the solutions that we had from the first edition, and filled in so that all odd-numbered problems were done. In the passage from the first to the second edition, problems were shuffled with no attention paid to numbering (hence no attention paid to minimizing the new effort); rather, we tried to put the problems in logical order.
A major change from the first edition is the use of the computer, both symbolically through Mathematica™ and numerically using R. Some solutions are given as code in either of these languages. Mathematica™ can be purchased from Wolfram Research, and R is a free download from http://www.r-project.org/. Here is a detailed listing of the solutions included.
Chapter   Number of Exercises   Number of Solutions   Missing
1         55                    51                    26, 30, 36, 42
2         40                    37                    34, 38, 40
3         50                    42                    4, 6, 10, 20, 30, 32, 34, 36
4         65                    52                    8, 14, 22, 28, 36, 40, 48, 50, 52, 56, 58, 60, 62
5         69                    46                    2, 4, 12, 14, 26, 28, all even problems from 36−68
6         43                    35                    8, 16, 26, 28, 34, 36, 38, 42
7         66                    52                    4, 14, 16, 28, 30, 32, 34, 36, 42, 54, 58, 60, 62, 64
8         58                    51                    36, 40, 46, 48, 52, 56, 58
9         58                    41                    2, 8, 10, 20, 22, 24, 26, 28, 30, 32, 38, 40, 42, 44, 50, 54, 56
10        48                    26                    all even problems except 4 and 32
11        41                    35                    4, 20, 22, 24, 26, 40
12        31                    16                    all even problems

0.2 Acknowledgement
Many people contributed to the assembly of this solutions manual. We again thank all of those who contributed solutions to the first edition – many problems have carried over into the second edition. Moreover, throughout the years a number of people have been in constant touch with us, contributing to both the presentations and solutions. We apologize in advance for those we forget to mention, and we especially thank Jay Beder, Yong Sung Joo, Michael Perlman, Rob Strawderman, and Tom Wehrly. Thank you all for your help.
And, as we said the first time around, although we have benefited greatly from the assistance and


comments of others in the assembly of this manual, we are responsible for its ultimate correctness.
To this end, we have tried our best but, as a wise man once said, “You pays your money and you takes your chances.”
George Casella
Roger L. Berger
Damaris Santana
December, 2001

Chapter 1

Probability Theory

“If any little problem comes your way, I shall be happy, if I can, to give you a hint or two as to its solution.”
Sherlock Holmes
The Adventure of the Three Students
1.1 a. Each sample point describes the result of the toss (H or T) for each of the four tosses. So, for example, THTT denotes T on 1st, H on 2nd, T on 3rd and T on 4th. There are 2^4 = 16 such sample points.
b. The number of damaged leaves is a nonnegative integer. So we might use S = {0, 1, 2, . . .}.
c. We might observe fractions of an hour. So we might use S = {t : t ≥ 0}, that is, the half infinite interval [0, ∞).
d. Suppose we weigh the rats in ounces. The weight must be greater than zero so we might use
S = (0, ∞). If we know no 10-day-old rat weighs more than 100 oz., we could use S = (0, 100].
e. If n is the number of items in the shipment, then S = {0/n, 1/n, . . . , 1}.
1.2 For each of these equalities, you must show containment in both directions.
a. x ∈ A\B ⇔ x ∈ A and x ∉ B ⇔ x ∈ A and x ∉ A∩B ⇔ x ∈ A\(A∩B). Also, x ∈ A and x ∉ B ⇔ x ∈ A and x ∈ B^c ⇔ x ∈ A∩B^c.
b. Suppose x ∈ B . Then either x ∈ A or x ∈ Ac . If x ∈ A, then x ∈ B ∩ A, and, hence x ∈ (B ∩ A) ∪ (B ∩ Ac ). Thus B ⊂ (B ∩ A) ∪ (B ∩ Ac ). Now suppose x ∈ (B ∩ A) ∪ (B ∩ Ac ).
Then either x ∈ (B ∩ A) or x ∈ (B ∩ Ac ). If x ∈ (B ∩ A), then x ∈ B . If x ∈ (B ∩ Ac ), then x ∈ B . Thus (B ∩ A) ∪ (B ∩ Ac ) ⊂ B . Since the containment goes both ways, we have
B = (B ∩ A) ∪ (B ∩ Ac ). (Note, a more straightforward argument for this part simply uses the Distributive Law to state that (B ∩ A) ∪ (B ∩ Ac ) = B ∩ (A ∪ Ac ) = B ∩ S = B.)
c. Similar to part a).
d. From part b),
A∪B = A∪[(B∩A)∪(B∩A^c)] = [A∪(B∩A)] ∪ [A∪(B∩A^c)] = A ∪ [A∪(B∩A^c)] = A∪(B∩A^c).
1.3 a. x ∈ A∪B ⇔ x ∈ A or x ∈ B ⇔ x ∈ B∪A. x ∈ A∩B ⇔ x ∈ A and x ∈ B ⇔ x ∈ B∩A.
b. x ∈ A∪(B∪C) ⇔ x ∈ A or x ∈ B∪C ⇔ x ∈ A∪B or x ∈ C ⇔ x ∈ (A∪B)∪C. (It can similarly be shown that A∪(B∪C) = (A∪C)∪B.) x ∈ A∩(B∩C) ⇔ x ∈ A and x ∈ B and x ∈ C ⇔ x ∈ (A∩B)∩C.
c. x ∈ (A∪B)^c ⇔ x ∉ A∪B ⇔ x ∉ A and x ∉ B ⇔ x ∈ A^c and x ∈ B^c ⇔ x ∈ A^c ∩ B^c.
x ∈ (A∩B)^c ⇔ x ∉ A∩B ⇔ x ∉ A or x ∉ B ⇔ x ∈ A^c or x ∈ B^c ⇔ x ∈ A^c ∪ B^c.
1.4 a. “A or B or both” is A∪B . From Theorem 1.2.9b we have P (A∪B ) = P (A)+P (B )−P (A∩B ).


b. “A or B but not both” is (A∩B^c) ∪ (B∩A^c). Thus we have
P((A∩B^c) ∪ (B∩A^c)) = P(A∩B^c) + P(B∩A^c)   (disjoint union)
= [P(A) − P(A∩B)] + [P(B) − P(A∩B)]   (Theorem 1.2.9a)
= P(A) + P(B) − 2P(A∩B).
c. “At least one of A or B” is A∪B. So we get the same answer as in a).
d. “At most one of A or B” is (A∩B)^c, and P((A∩B)^c) = 1 − P(A∩B).
1.5 a. A∩B∩C = {a U.S. birth results in identical twins that are female}
b. P(A∩B∩C) = (1/90) × (1/3) × (1/2).
1.6
p_0 = (1−u)(1−w),  p_1 = u(1−w) + w(1−u),  p_2 = uw,
p_0 = p_2 ⇒ u + w = 1,
p_1 = p_2 ⇒ uw = 1/3.
These two equations imply u(1 − u) = 1/3, which has no solution in the real numbers. Thus, the probability assignment is not legitimate.
1.7 a.
P(scoring i points) = 1 − πr^2/A if i = 0, and (πr^2/A)·[(6−i)^2 − (5−i)^2]/5^2 if i = 1, ..., 5.
b.
P(scoring i points | board is hit) = P(scoring i points ∩ board is hit)/P(board is hit),
where P(board is hit) = πr^2/A and P(scoring i points ∩ board is hit) = (πr^2/A)·[(6−i)^2 − (5−i)^2]/5^2 for i = 1, ..., 5. Therefore,
P(scoring i points | board is hit) = [(6−i)^2 − (5−i)^2]/5^2,  i = 1, ..., 5,
which is exactly the probability distribution of Example 1.2.7.
1.8 a. P(scoring exactly i points) = P(inside circle i) − P(inside circle i+1). Circle i has radius (6−i)r/5, so
P(scoring exactly i points) = π(6−i)^2 r^2/(5^2 πr^2) − π(6−(i+1))^2 r^2/(5^2 πr^2) = [(6−i)^2 − (5−i)^2]/5^2.
b. Expanding the squares in part a) we find P(scoring exactly i points) = (11−2i)/25, which is decreasing in i.
c. Let P(i) = (11−2i)/25. Since i ≤ 5, P(i) ≥ 0 for all i. P(S) = P(hitting the dartboard) = 1 by definition. Lastly, P(i ∪ j) = area of i ring + area of j ring = P(i) + P(j).
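These ring probabilities are easy to check numerically. The manual's own code is in R or Mathematica; the following Python sketch (an addition, not from the manual) confirms that the ring-area formula matches (11 − 2i)/25 and sums to 1 over i = 1, ..., 5:

```python
# Ring probabilities for the dartboard of Exercises 1.7-1.8:
# circle i has radius (6-i)r/5, so ring i has probability ((6-i)^2-(5-i)^2)/25
# given that the board is hit.
def ring_prob(i):
    return ((6 - i) ** 2 - (5 - i) ** 2) / 25

probs = [ring_prob(i) for i in range(1, 6)]
print(probs)   # [0.36, 0.28, 0.2, 0.12, 0.04]
print(probs == [(11 - 2 * i) / 25 for i in range(1, 6)])   # True
```

The values decrease linearly in i, exactly as part b) claims.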
1.9 a. Suppose x ∈ (∪_α A_α)^c. By the definition of complement, x ∉ ∪_α A_α, that is, x ∉ A_α for all α ∈ Γ. Therefore x ∈ A_α^c for all α ∈ Γ. Thus x ∈ ∩_α A_α^c. Conversely, suppose x ∈ ∩_α A_α^c. Then x ∈ A_α^c for all α ∈ Γ. By the definition of complement, x ∉ A_α for all α ∈ Γ. Therefore x ∉ ∪_α A_α. Thus x ∈ (∪_α A_α)^c.
b. Suppose x ∈ (∩_α A_α)^c. By the definition of complement, x ∉ ∩_α A_α. Therefore x ∉ A_α for some α ∈ Γ. Therefore x ∈ A_α^c for some α ∈ Γ. Thus x ∈ ∪_α A_α^c. Conversely, suppose x ∈ ∪_α A_α^c. Then x ∈ A_α^c for some α ∈ Γ. Therefore x ∉ A_α for some α ∈ Γ. Therefore x ∉ ∩_α A_α. Thus x ∈ (∩_α A_α)^c.
1.10 For A_1, ..., A_n,
(i) (∪_{i=1}^n A_i)^c = ∩_{i=1}^n A_i^c   (ii) (∩_{i=1}^n A_i)^c = ∪_{i=1}^n A_i^c
Proof of (i): If x ∈ (∪A_i)^c, then x ∉ ∪A_i. That implies x ∉ A_i for any i, so x ∈ A_i^c for every i and x ∈ ∩A_i^c.
Proof of (ii): If x ∈ (∩A_i)^c, then x ∉ ∩A_i. That implies x ∈ A_i^c for some i, so x ∈ ∪A_i^c.
1.11 We must verify each of the three properties in Definition 1.2.1.
a. (1) The empty set ∅ ∈ {∅, S }. Thus ∅ ∈ B . (2) ∅c = S ∈ B and S c = ∅ ∈ B . (3) ∅∪S = S ∈ B .
b. (1) The empty set ∅ is a subset of any set, in particular, ∅ ⊂ S . Thus ∅ ∈ B . (2) If A ∈ B , then A ⊂ S . By the definition of complementation, Ac is also a subset of S , and, hence,
Ac ∈ B . (3) If A1 , A2 , . . . ∈ B , then, for each i, Ai ⊂ S . By the definition of union, ∪Ai ⊂ S .
Hence, ∪Ai ∈ B .
c. Let B1 and B2 be the two sigma algebras. (1) ∅ ∈ B1 and ∅ ∈ B2 since B1 and B2 are sigma algebras. Thus ∅ ∈ B1 ∩ B2. (2) If A ∈ B1 ∩ B2, then A ∈ B1 and A ∈ B2. Since B1 and B2 are both sigma algebras, A^c ∈ B1 and A^c ∈ B2. Therefore A^c ∈ B1 ∩ B2. (3) If A_1, A_2, ... ∈ B1 ∩ B2, then A_1, A_2, ... ∈ B1 and A_1, A_2, ... ∈ B2. Therefore, since B1 and B2 are both sigma algebras, ∪_{i=1}^∞ A_i ∈ B1 and ∪_{i=1}^∞ A_i ∈ B2. Thus ∪_{i=1}^∞ A_i ∈ B1 ∩ B2.
1.12 First write


P(∪_{i=1}^∞ A_i) = P(∪_{i=1}^n A_i ∪ ∪_{i=n+1}^∞ A_i)
= P(∪_{i=1}^n A_i) + P(∪_{i=n+1}^∞ A_i)   (A_i's are disjoint)
= Σ_{i=1}^n P(A_i) + P(∪_{i=n+1}^∞ A_i).   (finite additivity)
Now define B_k = ∪_{i=k}^∞ A_i. Note that B_{k+1} ⊂ B_k and B_k → ∅ as k → ∞. (Otherwise the sum of the probabilities would be infinite.) Thus
P(∪_{i=1}^∞ A_i) = lim_{n→∞} [Σ_{i=1}^n P(A_i) + P(B_{n+1})] = Σ_{i=1}^∞ P(A_i).
1.13 If A and B are disjoint, P(A∪B) = P(A) + P(B) = 1/3 + 3/4 = 13/12, which is impossible. More generally, if A and B are disjoint, then A ⊂ B^c and P(A) ≤ P(B^c). But here P(A) > P(B^c), so A and B cannot be disjoint.
1.14 If S = {s_1, ..., s_n}, then any subset of S can be constructed by either including or excluding s_i, for each i. Thus there are 2^n possible choices.
1.15 Proof by induction. The proof for k = 2 is given after Theorem 1.2.14. Assume true for k, that is, the entire job can be done in n_1 × n_2 × ··· × n_k ways. For k + 1, the (k+1)th task can be done in n_{k+1} ways, and for each one of these ways we can complete the job by performing


the remaining k tasks. Thus for each of the n_{k+1} ways we have n_1 × n_2 × ··· × n_k ways of completing the job by the induction hypothesis. Thus, the number of ways we can do the job is
(1 × (n_1 × n_2 × ··· × n_k)) + ··· + (1 × (n_1 × n_2 × ··· × n_k))   (n_{k+1} terms)   = n_1 × n_2 × ··· × n_k × n_{k+1}.
1.16 a) 26^3.  b) 26^3 + 26^2.  c) 26^4 + 26^3 + 26^2.

1.17 There are C(n,2) = n(n−1)/2 pieces on which the two numbers do not match. (Choose 2 out of n numbers without replacement.) There are n pieces on which the two numbers match. So the total number of different pieces is n + n(n−1)/2 = n(n+1)/2.
1.18 The probability is C(n,2) n!/n^n. There are many ways to obtain this. Here is one. The denominator is n^n because this is the number of ways to place n balls in n cells. The numerator is the number of ways of placing the balls such that exactly one cell is empty. There are n ways to specify the empty cell. There are n − 1 ways of choosing the cell with two balls. There are C(n,2) ways of picking the 2 balls to go into this cell. And there are (n − 2)! ways of placing the remaining n − 2 balls into the n − 2 cells, one ball in each cell. The product of these is the numerator n(n − 1) C(n,2) (n − 2)! = C(n,2) n!.
1.19 a. C(6,4) = 15.
b. Think of the n variables as n bins. Differentiating with respect to one of the variables is equivalent to putting a ball in the bin. Thus there are r unlabeled balls to be placed in n labeled bins, and there are C(n+r−1, r) ways to do this.
1.20 A sample point specifies on which day (1 through 7) each of the 12 calls happens. Thus there are 7^12 equally likely sample points. There are several different ways that the calls might be assigned so that there is at least one call each day. There might be 6 calls one day and 1 call each of the other days. Denote this by 6111111. The number of sample points with this pattern is 7 C(12,6) 6!. There are 7 ways to specify the day with 6 calls. There are C(12,6) ways to specify which of the 12 calls are on this day. And there are 6! ways of assigning the remaining 6 calls to the remaining 6 days. We will now count another pattern. There might be 4 calls on one day, 2 calls on each of two days, and 1 call on each of the remaining four days. Denote this by 4221111. The number of sample points with this pattern is 7 C(12,4) C(6,2) C(8,2) C(6,2) 4!. (7 ways to pick the day with 4 calls, C(12,4) to pick the calls for that day, C(6,2) to pick the two days with two calls, C(8,2) ways to pick the two calls for the lower numbered day, C(6,2) ways to pick the two calls for the higher numbered day, 4! ways to order the remaining 4 calls.) Here is a list of all the possibilities and the counts of the sample points for each one.

pattern   number of sample points
6111111   7 C(12,6) 6!                                   =     4,656,960
5211111   7 C(12,5) 6 C(7,2) 5!                          =    83,825,280
4221111   7 C(12,4) C(6,2) C(8,2) C(6,2) 4!              =   523,908,000
4311111   7 C(12,4) 6 C(8,3) 5!                          =   139,708,800
3321111   C(7,2) C(12,3) C(9,3) 5 C(6,2) 4!              =   698,544,000
3222111   7 C(12,3) C(6,3) C(9,2) C(7,2) C(5,2) 3!       = 1,397,088,000
2222211   C(7,5) C(12,2) C(10,2) C(8,2) C(6,2) C(4,2) 2! =   314,344,800

The probability is the total number of sample points divided by 7^12, which is 3,162,075,840/7^12 ≈ .2285.
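All seven counts can be generated from one formula: multiply the number of ways to assign the daily-call pattern to the 7 days by the number of ways to split the 12 calls accordingly. A Python sketch (an addition; the manual's code is in R or Mathematica):

```python
from collections import Counter
from math import factorial, prod

def pattern_count(parts):
    # parts: calls per day across the 7 days, e.g. (6,1,1,1,1,1,1)
    day_ways = factorial(7) // prod(factorial(m) for m in Counter(parts).values())
    call_ways = factorial(12) // prod(factorial(p) for p in parts)
    return day_ways * call_ways

patterns = [(6,1,1,1,1,1,1), (5,2,1,1,1,1,1), (4,2,2,1,1,1,1), (4,3,1,1,1,1,1),
            (3,3,2,1,1,1,1), (3,2,2,2,1,1,1), (2,2,2,2,2,1,1)]
total = sum(pattern_count(p) for p in patterns)
print(total)            # 3162075840
print(total / 7**12)    # ≈ 0.2285
```

Each `pattern_count` value agrees with the corresponding entry in the table above.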
1.21 The probability is C(n,2r) 2^{2r} / C(2n,2r). There are C(2n,2r) ways of choosing 2r shoes from a total of 2n shoes. Thus there are C(2n,2r) equally likely sample points. The numerator is the number of sample points for which there will be no matching pair. There are C(n,2r) ways of choosing 2r different shoe styles. There are two ways of choosing within a given shoe style (left shoe or right shoe), which gives 2^{2r} ways of arranging each one of the C(n,2r) arrays. The product of these is the numerator C(n,2r) 2^{2r}.
1.22 a) [C(31,15) C(29,15) C(31,15) C(30,15) ··· C(31,15)] / C(366,180).
b) (336·335···307)/(366·365···337) = C(336,30)/C(366,30).
1.23
P(same number of heads) = Σ_{x=0}^n P(1st tosses x, 2nd tosses x) = Σ_{x=0}^n [C(n,x)(1/2)^x (1/2)^{n−x}]^2 = (1/4)^n Σ_{x=0}^n C(n,x)^2.
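Since Σ_x C(n,x)^2 = C(2n,n) (Vandermonde's identity), the probability also equals C(2n,n)/4^n. A quick numerical check of this simplification (an addition to the manual's solution):

```python
from math import comb

def p_same_heads(n):
    # (1/4)^n * sum_x C(n,x)^2, as derived in Exercise 1.23
    return sum(comb(n, x) ** 2 for x in range(n + 1)) / 4 ** n

print(p_same_heads(5))   # 252/1024 = 0.24609375, equal to comb(10,5)/4**5
```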

1.24 a.
P(A wins) = Σ_{i=1}^∞ P(A wins on ith toss) = 1/2 + (1/2)^2 (1/2) + (1/2)^4 (1/2) + ··· = Σ_{i=0}^∞ (1/2)^{2i+1} = 2/3.
b. P(A wins) = p + (1−p)^2 p + (1−p)^4 p + ··· = Σ_{i=0}^∞ p(1−p)^{2i} = p/(1 − (1−p)^2).
c. (d/dp)[p/(1 − (1−p)^2)] = p^2/[1 − (1−p)^2]^2 > 0. Thus the probability is increasing in p, and the minimum is at zero. Using L'Hôpital's rule we find lim_{p→0} p/(1 − (1−p)^2) = 1/2.
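The series in part b) converges quickly, so a truncated sum checks the closed form numerically (an added sketch, not part of the manual):

```python
def p_a_wins(p, terms=200):
    # Player A tosses first and wins on toss 2i+1 with probability p*(1-p)^(2i),
    # so P(A wins) = sum_i p*(1-p)^(2i) = p/(1-(1-p)^2).
    return sum(p * (1 - p) ** (2 * i) for i in range(terms))

print(p_a_wins(0.5))   # ≈ 2/3, the answer of part (a)
```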

1.25 Enumerating the sample space gives S = {(B, B ), (B, G), (G, B ), (G, G)} ,with each outcome equally likely. Thus P (at least one boy) = 3/4 and P (both are boys) = 1/4, therefore
P ( both are boys | at least one boy ) = 1/3.
An ambiguity may arise if order is not acknowledged; then the space is S = {(B, B), (B, G), (G, G)}, with each outcome equally likely.
1.27 a. For n odd the proof is straightforward. There are an even number of terms in the sum (k = 0, 1, ..., n), and C(n,k) and C(n,n−k), which are equal, have opposite signs. Thus, all pairs cancel and the sum is zero. If n is even, use the following identity, which is the basis of Pascal's triangle: for k > 0, C(n,k) = C(n−1,k) + C(n−1,k−1). Then, for n even,
Σ_{k=0}^n (−1)^k C(n,k) = C(n,0) + C(n,n) + Σ_{k=1}^{n−1} (−1)^k [C(n−1,k) + C(n−1,k−1)] = C(n,0) + C(n,n) − C(n−1,0) − C(n−1,n−1) = 0.
b. Use the fact that for k > 0, k C(n,k) = n C(n−1,k−1) to write
Σ_{k=1}^n k C(n,k) = n Σ_{k=1}^n C(n−1,k−1) = n Σ_{j=0}^{n−1} C(n−1,j) = n 2^{n−1}.
c. Σ_{k=1}^n (−1)^{k+1} k C(n,k) = n Σ_{k=1}^n (−1)^{k+1} C(n−1,k−1) = n Σ_{j=0}^{n−1} (−1)^j C(n−1,j) = 0 from part a).
1.28 The average of the two integrals is
[(n log n − n) + ((n+1) log(n+1) − n)]/2 = [n log n + (n+1) log(n+1)]/2 − n ≈ (n + 1/2) log n − n.

Let d_n = log n! − [(n + 1/2) log n − n], and we want to show that lim_{n→∞} d_n = c, a constant.
This would complete the problem, since the desired limit is the exponential of this one. This is accomplished in an indirect way, by working with differences, which avoids dealing with the factorial. Note that
d_n − d_{n+1} = (n + 1/2) log(1 + 1/n) − 1.
Differentiation will show that (n + 1/2) log(1 + 1/n) is decreasing in n and approaches 1 as n → ∞ (its value at n = 1 is (3/2) log 2 ≈ 1.04), so it exceeds 1 for every n. Thus d_n − d_{n+1} > 0. Next recall the Taylor expansion of
log(1 + x) = x − x^2/2 + x^3/3 − x^4/4 + ···.
The first three terms provide an upper bound on log(1 + x), as the remaining adjacent pairs are negative. Hence
0 < d_n − d_{n+1} < (n + 1/2)(1/n − 1/(2n^2) + 1/(3n^3)) − 1 = 1/(12n^2) + 1/(6n^3).



It therefore follows, by the comparison test, that the series Σ_{n=1}^∞ (d_n − d_{n+1}) converges. Moreover, the partial sums must approach a limit. Hence, since the sum telescopes,
lim_{N→∞} Σ_{n=1}^N (d_n − d_{n+1}) = lim_{N→∞} (d_1 − d_{N+1}) = c.
Thus lim_{n→∞} d_n = d_1 − c, a constant.
1.29 a.
Unordered {4,4,12,12}: ordered (4,4,12,12), (4,12,12,4), (4,12,4,12), (12,4,12,4), (12,4,4,12), (12,12,4,4).
Unordered {2,9,9,12}: ordered (2,9,9,12), (2,9,12,9), (2,12,9,9), (9,2,9,12), (9,2,12,9), (9,9,2,12), (9,9,12,2), (9,12,2,9), (9,12,9,2), (12,2,9,9), (12,9,2,9), (12,9,9,2).
b. Same as (a).
c. There are 6^6 ordered samples with replacement from {1, 2, 7, 8, 14, 20}. The number of ordered samples that would result in {2, 7, 7, 8, 14, 14} is 6!/(2!2!1!1!) = 180 (see Example 1.2.20). Thus the probability is 180/6^6.
d. If the k objects were distinguishable then there would be k! possible ordered arrangements. Since we have k_1, ..., k_m different groups of indistinguishable objects, once the positions of the objects are fixed in the ordered arrangement, permutations within objects of the same group won't change the ordered arrangement. There are k_1!k_2!···k_m! such permutations for each ordered component. Thus there would be k!/(k_1!k_2!···k_m!) different ordered components.
e. Think of the m distinct numbers as m bins. Selecting a sample of size k, with replacement, is the same as putting k balls in the m bins. This is C(k+m−1, k), which is the number of distinct bootstrap samples. Note that, to create all of the bootstrap samples, we do not need to know what the original sample was. We only need to know the sample size and the distinct values.
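Parts c) and e) can be confirmed by brute force for small cases. Below is an added Python sketch (k = 3 is chosen only to keep the enumeration fast; the manual itself gives no code here):

```python
from itertools import product
from math import comb, factorial

values = [1, 2, 7, 8, 14, 20]   # the m = 6 distinct values from part (c)

# part (c): count orderings of the multiset {2,7,7,8,14,14} among all 6^6 samples
target = tuple(sorted((2, 7, 7, 8, 14, 14)))
n_orderings = sum(1 for s in product(values, repeat=6) if tuple(sorted(s)) == target)
print(n_orderings, factorial(6) // (2 * 2))   # 180 180

# part (e): distinct bootstrap samples of size k from m distinct values
k = 3
distinct = {tuple(sorted(s)) for s in product(values, repeat=k)}
print(len(distinct), comb(k + len(values) - 1, k))   # 56 56
```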
1.31 a. The number of ordered samples drawn with replacement from the set {x_1, ..., x_n} is n^n. The number of ordered samples that make up the unordered sample {x_1, ..., x_n} is n!. Therefore the outcome with average (x_1+x_2+···+x_n)/n that is obtained by the unordered sample {x_1, ..., x_n} has probability n!/n^n. Any other unordered outcome from {x_1, ..., x_n}, distinct from the unordered sample {x_1, ..., x_n}, will contain m different numbers repeated k_1, ..., k_m times where k_1 + k_2 + ··· + k_m = n with at least one of the k_i's satisfying 2 ≤ k_i ≤ n. The probability of obtaining the corresponding average of such an outcome is
n!/(k_1!k_2!···k_m! n^n) < n!/n^n,  since k_1!k_2!···k_m! > 1.
Therefore the outcome with average (x_1+x_2+···+x_n)/n is the most likely.
b. Stirling's approximation is that, as n → ∞, n! ≈ √(2π) n^{n+(1/2)} e^{−n}, and thus
n! e^n/(n^n √(2nπ)) ≈ √(2π) n^{n+(1/2)} e^{−n} e^n/(n^n √(2nπ)) = 1.
c. Since we are drawing with replacement from the set {x_1, ..., x_n}, the probability of choosing any x_i is 1/n. Therefore the probability of obtaining an ordered sample of size n without x_i is (1 − 1/n)^n. To prove that lim_{n→∞} (1 − 1/n)^n = e^{−1}, calculate the limit of the log. That is,
lim_{n→∞} n log(1 − 1/n) = lim_{n→∞} log(1 − 1/n)/(1/n).
L'Hôpital's rule shows that the limit is −1, establishing the result. See also Lemma 2.3.14.
1.32 This is most easily seen by doing each possibility. Let P(i) = probability that the candidate hired on the ith trial is best. Then
P(1) = 1/N,  P(2) = 1/(N−1),  ...,  P(i) = 1/(N−i+1),  ...,  P(N) = 1.

1.33 Using Bayes rule,
P(M|CB) = P(CB|M)P(M) / [P(CB|M)P(M) + P(CB|F)P(F)] = (.05 × 1/2) / (.05 × 1/2 + .0025 × 1/2) = .9524.
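Exact fraction arithmetic confirms the posterior (an added Python check, not from the manual):

```python
from fractions import Fraction as F

p_cb_m = F(5, 100)      # P(CB|M) = .05
p_cb_f = F(25, 10000)   # P(CB|F) = .0025
prior = F(1, 2)
post = p_cb_m * prior / (p_cb_m * prior + p_cb_f * prior)
print(post, float(post))   # 20/21 ≈ 0.9524
```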

1.34 a.
P(Brown Hair) = P(Brown Hair|Litter 1)P(Litter 1) + P(Brown Hair|Litter 2)P(Litter 2) = (2/3)(1/2) + (3/5)(1/2) = 19/30.
b. Use Bayes Theorem:
P(Litter 1|Brown Hair) = P(BH|L1)P(L1) / [P(BH|L1)P(L1) + P(BH|L2)P(L2)] = (2/3)(1/2) / (19/30) = 10/19.
1.35 Clearly P(·|B) ≥ 0, and P(S|B) = 1. If A_1, A_2, ... are disjoint, then
P(∪_{i=1}^∞ A_i | B) = P((∪_{i=1}^∞ A_i) ∩ B)/P(B) = P(∪_{i=1}^∞ (A_i ∩ B))/P(B) = Σ_{i=1}^∞ P(A_i ∩ B)/P(B) = Σ_{i=1}^∞ P(A_i|B).


1.37 a. Using the same events A, B, C and W as in Example 1.3.4, we have
P(W) = P(W|A)P(A) + P(W|B)P(B) + P(W|C)P(C) = γ(1/3) + 0(1/3) + 1(1/3) = (γ+1)/3.
Thus, P(A|W) = P(A∩W)/P(W) = (γ/3)/((γ+1)/3) = γ/(γ+1), where
γ/(γ+1) = 1/3 if γ = 1/2,  γ/(γ+1) < 1/3 if γ < 1/2,  γ/(γ+1) > 1/3 if γ > 1/2.

b. By Exercise 1.35, P (·|W ) is a probability function. A, B and C are a partition. So
P (A|W ) + P (B |W ) + P (C |W ) = 1.
But, P (B |W ) = 0. Thus, P (A|W ) + P (C |W ) = 1. Since P (A|W ) = 1/3, P (C |W ) = 2/3.
(This could be calculated directly, as in Example 1.3.4.) So if A can swap fates with C , his chance of survival becomes 2/3.
1.38 a. P(A) = P(A∩B) + P(A∩B^c) from Theorem 1.2.11a. But (A∩B^c) ⊂ B^c and P(B^c) = 1 − P(B) = 0. So P(A∩B^c) = 0, and P(A) = P(A∩B). Thus,
P(A|B) = P(A∩B)/P(B) = P(A)/1 = P(A).
b. A ⊂ B implies A∩B = A. Thus,
P(B|A) = P(A∩B)/P(A) = P(A)/P(A) = 1.
And also,
P(A|B) = P(A∩B)/P(B) = P(A)/P(B).
c. If A and B are mutually exclusive, then P(A∪B) = P(A) + P(B) and A∩(A∪B) = A. Thus,
P(A|A∪B) = P(A∩(A∪B))/P(A∪B) = P(A)/(P(A) + P(B)).
d. P (A ∩ B ∩ C ) = P (A ∩ (B ∩ C )) = P (A|B ∩ C )P (B ∩ C ) = P (A|B ∩ C )P (B |C )P (C ).
1.39 a. Suppose A and B are mutually exclusive. Then A ∩ B = ∅ and P (A ∩ B ) = 0. If A and B are independent, then 0 = P (A ∩ B ) = P (A)P (B ). But this cannot be since P (A) > 0 and
P (B ) > 0. Thus A and B cannot be independent.
b. If A and B are independent and both have positive probability, then
0 < P (A)P (B ) = P (A ∩ B ).
This implies A ∩ B = ∅, that is, A and B are not mutually exclusive.
1.40 a. P (Ac ∩ B ) = P (Ac |B )P (B ) = [1 − P (A|B )]P (B ) = [1 − P (A)]P (B ) = P (Ac )P (B ) , where the third equality follows from the independence of A and B .
b. P (Ac ∩ B c ) = P (Ac ) − P (Ac ∩ B ) = P (Ac ) − P (Ac )P (B ) = P (Ac )P (B c ).


1.41 a.
P(dash sent | dash rec) = P(dash rec | dash sent)P(dash sent) / [P(dash rec | dash sent)P(dash sent) + P(dash rec | dot sent)P(dot sent)] = (2/3)(4/7) / [(2/3)(4/7) + (1/4)(3/7)] = 32/41.
b. By a similar calculation as the one in (a), P(dot sent | dot rec) = 27/43. Then we have P(dash sent | dot rec) = 16/43. Given that dot-dot was received, the distribution of the four possibilities of what was sent is

Event        Probability
dash-dash    (16/43)^2
dash-dot     (16/43)(27/43)
dot-dash     (27/43)(16/43)
dot-dot      (27/43)^2
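Both posteriors can be verified with exact fractions, using the complements of the stated error rates (an added sketch, not part of the manual):

```python
from fractions import Fraction as F

p_dash, p_dot = F(4, 7), F(3, 7)   # prior probabilities of what was sent
# P(dash rec|dash sent) = 2/3 so P(dot rec|dash sent) = 1/3;
# P(dash rec|dot sent) = 1/4 so P(dot rec|dot sent) = 3/4.
dash_given_dash = F(2, 3) * p_dash / (F(2, 3) * p_dash + F(1, 4) * p_dot)
dot_given_dot = F(3, 4) * p_dot / (F(3, 4) * p_dot + F(1, 3) * p_dash)
print(dash_given_dash, dot_given_dot)   # 32/41 27/43
```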

1.43 a. For Boole's Inequality,
P(∪_{i=1}^n A_i) = P_1 − P_2 + P_3 − ··· ± P_n ≤ P_1 = Σ_{i=1}^n P(A_i),
since P_i ≥ P_j if i ≤ j and therefore the terms −P_{2k} + P_{2k+1} ≤ 0 for k = 1, ..., (n−1)/2 when n is odd. When n is even the last term to consider is −P_n ≤ 0. For Bonferroni's Inequality apply the inclusion-exclusion identity to the A_i^c, and use the argument leading to (1.2.10).
b. We illustrate the proof that the P_i are increasing by showing that P_2 ≥ P_3. The other arguments are similar. Write
P_2 = Σ_{1≤i<j≤n} P(A_i ∩ A_j).

= 1/2 + (1/π)(π/2) = 1, and

e. lim_{y→−∞} FY(y) = 0 and lim_{y→∞} FY(y) = 1. For y ≠ 0, (d/dy)FY(y) = e^{−y}/(1+e^{−y})^2 > 0, so FY is nondecreasing. FY(y) is continuous except at y = 0, where lim_{y↓0} FY(y) = FY(0). Thus FY(y) is right-continuous.

1.48 If F(·) is a cdf, F(x) = P(X ≤ x). Hence lim_{x→−∞} P(X ≤ x) = 0 and lim_{x→∞} P(X ≤ x) = 1. F(x) is nondecreasing since the set {x : X ≤ x} is nondecreasing in x. Lastly, as x ↓ x_0, P(X ≤ x) → P(X ≤ x_0), so F(·) is right-continuous. (This is merely a consequence of defining F(x) with "≤".)
1.49 For every t, FX (t) ≤ FY (t). Thus we have
P (X > t) = 1 − P (X ≤ t) = 1 − FX (t) ≥ 1 − FY (t) = 1 − P (Y ≤ t) = P (Y > t).
And for some t∗ , FX (t∗ ) < FY (t∗ ). Then we have that
P (X > t∗ ) = 1 − P (X ≤ t∗ ) = 1 − FX (t∗ ) > 1 − FY (t∗ ) = 1 − P (Y ≤ t∗ ) = P (Y > t∗ ).
1.50 Proof by induction. For n = 2,
Σ_{k=1}^2 t^{k−1} = 1 + t = (1−t^2)/(1−t).
Assume true for n, that is,
Σ_{k=1}^n t^{k−1} = (1−t^n)/(1−t).
Then for n + 1,
Σ_{k=1}^{n+1} t^{k−1} = Σ_{k=1}^n t^{k−1} + t^n = (1−t^n)/(1−t) + t^n = [1−t^n + t^n(1−t)]/(1−t) = (1−t^{n+1})/(1−t),
where the second equality follows from the induction hypothesis.
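The partial-sum formula is easy to spot-check numerically (an added sketch; the manual does not include code here):

```python
def geometric_partial_sum(t, n):
    # sum_{k=1}^{n} t^(k-1), shown in Exercise 1.50 to equal (1 - t^n)/(1 - t)
    return sum(t ** (k - 1) for k in range(1, n + 1))

print(geometric_partial_sum(0.5, 10))   # 1.998046875, i.e. (1 - 0.5**10)/(1 - 0.5)
print(geometric_partial_sum(-2.0, 7))   # 43.0 = (1 - (-2)**7)/(1 - (-2))
```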
1.51 This kind of random variable is called hypergeometric in Chapter 3. The probabilities are obtained by counting arguments, as follows.

x    fX(x) = P(X = x)
0    C(5,0)C(25,4)/C(30,4) ≈ .4616
1    C(5,1)C(25,3)/C(30,4) ≈ .4196
2    C(5,2)C(25,2)/C(30,4) ≈ .1095
3    C(5,3)C(25,1)/C(30,4) ≈ .0091
4    C(5,4)C(25,0)/C(30,4) ≈ .0002
The cdf is a step function with jumps at x = 0, 1, 2, 3 and 4.
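These hypergeometric values follow directly from binomial coefficients; a quick Python check (an addition, not from the manual):

```python
from math import comb

# P(X = x) = C(5,x) C(25,4-x) / C(30,4), x = 0,...,4
pmf = {x: comb(5, x) * comb(25, 4 - x) / comb(30, 4) for x in range(5)}
for x, p in pmf.items():
    print(x, round(p, 4))   # rounds to .4616, .4196, .1095, .0091, .0002
```

The five values sum to 1, as they must.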
1.52 The function g(·) is clearly positive. Also,
∫_{x_0}^∞ g(x) dx = ∫_{x_0}^∞ f(x)/(1 − F(x_0)) dx = (1 − F(x_0))/(1 − F(x_0)) = 1.

1.53 a. lim_{y→−∞} FY(y) = lim_{y→−∞} 0 = 0 and lim_{y→∞} FY(y) = lim_{y→∞} (1 − 1/y^2) = 1. For y ≤ 1, FY(y) = 0 is constant. For y > 1, (d/dy)FY(y) = 2/y^3 > 0, so FY is increasing. Thus for all y, FY is nondecreasing. Therefore FY is a cdf.
b. The pdf is fY(y) = (d/dy)FY(y) = 2/y^3 if y > 1, and 0 if y ≤ 1.
c. FZ(z) = P(Z ≤ z) = P(10(Y − 1) ≤ z) = P(Y ≤ (z/10) + 1) = FY((z/10) + 1). Thus,
FZ(z) = 0 if z ≤ 0, and FZ(z) = 1 − 1/[(z/10) + 1]^2 if z > 0.


1.54 a. ∫_0^{π/2} sin x dx = 1. Thus, c = 1/1 = 1.
b. ∫_{−∞}^∞ e^{−|x|} dx = ∫_{−∞}^0 e^x dx + ∫_0^∞ e^{−x} dx = 1 + 1 = 2. Thus, c = 1/2.

1.55
P(V ≤ 5) = P(T < 3) = ∫_0^3 (1/1.5) e^{−t/1.5} dt = 1 − e^{−2}.
For v ≥ 6,
P(V ≤ v) = P(2T ≤ v) = P(T ≤ v/2) = ∫_0^{v/2} (1/1.5) e^{−t/1.5} dt = 1 − e^{−v/3}.
Therefore,
P(V ≤ v) = 0 for −∞ < v < 0;  1 − e^{−2} for 0 ≤ v < 6;  1 − e^{−v/3} for 6 ≤ v.
2.26 d. For a > 0 and ε > 0, f(a − ε) = e^{−(a−ε)} > e^{−(a+ε)} = f(a + ε). Therefore, f(x) is not symmetric about a > 0. If −ε < a ≤ 0, f(a − ε) = 0 < e^{−(a+ε)} = f(a + ε). Therefore, f(x) is not symmetric about a ≤ 0, either.
e. The median of X = log 2 < 1 = EX.
2.27 a. The standard normal pdf.
b. The uniform on the interval (0, 1).
c. For the case when the mode is unique: let a be the point of symmetry and b be the mode. Assume that a is not the mode and, without loss of generality, that a = b + ε > b for some ε > 0. Since b is the mode, f(b) > f(b + ε) ≥ f(b + 2ε), which implies f(a − ε) > f(a) ≥ f(a + ε), which contradicts the fact that f(x) is symmetric. Thus a is the mode.
For the case when the mode is not unique, there must exist an interval (x_1, x_2) such that f(x) has the same value in the whole interval, i.e., f(x) is flat in this interval, and for all b ∈ (x_1, x_2), b is a mode. Assume that a ∉ (x_1, x_2), so that a is not a mode. Assume also, without loss of generality, that a = b + ε > b for some b ∈ (x_1, x_2) and ε > 0. Since b is a mode and a = b + ε ∉ (x_1, x_2), f(b) > f(b + ε) ≥ f(b + 2ε), which contradicts the fact that f(x) is symmetric. Thus a ∈ (x_1, x_2) and is a mode.
d. f (x) is decreasing for x ≥ 0, with f (0) > f (x) > f (y ) for all 0 < x < y . Thus f (x) is unimodal and 0 is the mode.


2.28 a.
µ_3 = ∫_{−∞}^∞ (x − a)^3 f(x) dx = ∫_{−∞}^a (x − a)^3 f(x) dx + ∫_a^∞ (x − a)^3 f(x) dx
= ∫_{−∞}^0 y^3 f(y + a) dy + ∫_0^∞ y^3 f(y + a) dy   (change variable y = x − a)
= ∫_0^∞ −y^3 f(−y + a) dy + ∫_0^∞ y^3 f(y + a) dy = 0.   (f(−y + a) = f(y + a))

b. For f(x) = e^{−x}, µ_1 = 1 and σ = 1, therefore α_3 = µ_3.
µ_3 = ∫_0^∞ (x − 1)^3 e^{−x} dx = ∫_0^∞ (x^3 − 3x^2 + 3x − 1) e^{−x} dx = Γ(4) − 3Γ(3) + 3Γ(2) − Γ(1) = 3! − 3 × 2! + 3 × 1 − 1 = 2.

c. Each distribution has µ_1 = 0, therefore we must calculate µ_2 = EX^2 and µ_4 = EX^4.
(i) f(x) = (1/√(2π)) e^{−x^2/2}: µ_2 = 1, µ_4 = 3, α_4 = 3.
(ii) f(x) = 1/2, −1 < x < 1: µ_2 = 1/3, µ_4 = 1/5, α_4 = 9/5.
(iii) f(x) = (1/2) e^{−|x|}, −∞ < x < ∞: µ_2 = 2, µ_4 = 24, α_4 = 6.
As a graph will show, (iii) is most peaked, (i) is next, and (ii) is least peaked.
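The α_4 values can be spot-checked by numerical integration. Below is an added midpoint-rule sketch for the uniform case (ii); the step count is an arbitrary choice, and the manual itself contains no such code:

```python
def alpha4_uniform(n_steps=100000):
    # alpha_4 = mu_4 / mu_2^2 for f(x) = 1/2 on (-1, 1), via the midpoint rule
    h = 2.0 / n_steps
    m2 = m4 = 0.0
    for i in range(n_steps):
        x = -1.0 + (i + 0.5) * h
        m2 += x * x * 0.5 * h      # contribution to EX^2
        m4 += x ** 4 * 0.5 * h     # contribution to EX^4
    return m4 / m2 ** 2

print(alpha4_uniform())   # ≈ 1.8 = 9/5
```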
2.29 a. For the binomial,
EX(X−1) = Σ_{x=2}^n x(x−1) C(n,x) p^x (1−p)^{n−x} = n(n−1)p^2 Σ_{x=2}^n C(n−2, x−2) p^{x−2} (1−p)^{n−x} = n(n−1)p^2 Σ_{y=0}^{n−2} C(n−2, y) p^y (1−p)^{n−2−y} = n(n−1)p^2,
where we use the identity x(x−1) C(n,x) = n(n−1) C(n−2, x−2), substitute y = x−2 and recognize that the new sum is equal to 1. Similarly, for the Poisson,
EX(X−1) = Σ_{x=2}^∞ x(x−1) e^{−λ} λ^x/x! = λ^2 Σ_{y=0}^∞ e^{−λ} λ^y/y! = λ^2,
where we substitute y = x−2.
b. Var(X) = E[X(X−1)] + EX − (EX)^2. For the binomial,
Var(X) = n(n−1)p^2 + np − (np)^2 = np(1−p).
For the Poisson,
Var(X) = λ^2 + λ − λ^2 = λ.
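Direct enumeration over the binomial pmf confirms the factorial-moment identity and the variance formula (an added check; n and p are arbitrary):

```python
from math import comb

def e_x_xm1_binomial(n, p):
    # E[X(X-1)] for X ~ binomial(n, p), summed directly over the pmf
    return sum(x * (x - 1) * comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1))

n, p = 10, 0.3
print(e_x_xm1_binomial(n, p))                             # ≈ n(n-1)p^2 = 8.1
print(e_x_xm1_binomial(n, p) + n * p - (n * p) ** 2)      # ≈ np(1-p) = 2.1
```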
c. For the beta-binomial, use the identity y C(n,y) = n C(n−1, y−1) and reindex with j = y − 1; the remaining sum is the sum over all values of a beta-binomial(n−1, a+1, b) pmf, which is 1. This gives
EY = na/(a+b).
E[Y(Y−1)] = n(n−1)a(a+1)/[(a+b)(a+b+1)] is calculated similarly to EY, but using the identity y(y−1) C(n,y) = n(n−1) C(n−2, y−2) and adding 2 instead of 1 to the parameter a; the sum over all values of a beta-binomial(n−2, a+2, b) pmf appears in the calculation. Therefore
Var(Y) = E[Y(Y−1)] + EY − (EY)^2 = nab(n+a+b)/[(a+b)^2 (a+b+1)].
2.30 a. E(e^{tX}) = ∫_0^c e^{tx} (1/c) dx = (1/ct) e^{tx}|_0^c = (1/ct)(e^{tc} − 1).
b. E(e^{tX}) = ∫_0^c (2x/c^2) e^{tx} dx = (2/(c^2 t^2))(ct e^{tc} − e^{tc} + 1).   (integration-by-parts)
c. E(e^{tX}) = (1/2β) ∫_{−∞}^α e^{(x−α)/β} e^{tx} dx + (1/2β) ∫_α^∞ e^{−(x−α)/β} e^{tx} dx = e^{αt}/(2(1+βt)) + e^{αt}/(2(1−βt)) = e^{αt}/(1−β^2 t^2),  −1/β < t < 1/β.
d. E(e^{tX}) = Σ_{x=0}^∞ e^{tx} C(r+x−1, x) p^r (1−p)^x = p^r Σ_{x=0}^∞ C(r+x−1, x) ((1−p)e^t)^x. Now use the fact that Σ_{x=0}^∞ C(r+x−1, x) ((1−p)e^t)^x (1 − (1−p)e^t)^r = 1 for (1−p)e^t < 1, since this is just the sum of a negative binomial pmf, to get
E(e^{tX}) = [p/(1 − (1−p)e^t)]^r,  t < −log(1−p).

2.31 Since the mgf is defined as MX (t) = EetX , we necessarily have MX (0) = Ee0 = 1. But t/(1 − t) is 0 at t = 0, therefore it cannot be an mgf.
2.32
(d/dt) S(t)|_{t=0} = (d/dt) log MX(t)|_{t=0} = [MX'(t)/MX(t)]|_{t=0} = EX/1 = EX   (since MX(0) = Ee^0 = 1)
(d^2/dt^2) S(t)|_{t=0} = [MX(t) MX''(t) − (MX'(t))^2]/[MX(t)]^2 |_{t=0} = (1·EX^2 − (EX)^2)/1 = VarX.
2.33 a. MX(t) = Σ_{x=0}^∞ e^{tx} e^{−λ} λ^x/x! = e^{−λ} Σ_{x=0}^∞ (e^t λ)^x/x! = e^{−λ} e^{λe^t} = e^{λ(e^t−1)}.
EX = (d/dt) MX(t)|_{t=0} = λe^t e^{λ(e^t−1)}|_{t=0} = λ.
EX^2 = (d^2/dt^2) MX(t)|_{t=0} = [(λe^t)^2 + λe^t] e^{λ(e^t−1)}|_{t=0} = λ^2 + λ.
VarX = EX^2 − (EX)^2 = λ^2 + λ − λ^2 = λ.
b. MX(t) = Σ_{x=0}^∞ e^{tx} p(1−p)^x = p Σ_{x=0}^∞ ((1−p)e^t)^x = p/(1 − (1−p)e^t),  t < −log(1−p).
EX = (d/dt) MX(t)|_{t=0} = p(1−p)e^t/(1 − (1−p)e^t)^2 |_{t=0} = p(1−p)/p^2 = (1−p)/p.
EX^2 = (d^2/dt^2) MX(t)|_{t=0} = [p(1−p)e^t (1 − (1−p)e^t)^2 + 2p(1−p)^2 e^{2t} (1 − (1−p)e^t)]/(1 − (1−p)e^t)^4 |_{t=0} = [p^2(1−p) + 2p(1−p)^2]/p^4 = [p(1−p) + 2(1−p)^2]/p^2.
VarX = [p(1−p) + 2(1−p)^2 − (1−p)^2]/p^2 = (1−p)/p^2.
c. MX(t) = ∫_{−∞}^∞ e^{tx} (1/(√(2π)σ)) e^{−(x−µ)^2/2σ^2} dx = (1/(√(2π)σ)) ∫_{−∞}^∞ e^{−(x^2 − 2µx − 2σ^2 tx + µ^2)/2σ^2} dx. Now complete the square in the numerator by writing
x^2 − 2µx − 2σ^2 tx + µ^2 = x^2 − 2(µ + σ^2 t)x ± (µ + σ^2 t)^2 + µ^2 = (x − (µ + σ^2 t))^2 − [2µσ^2 t + (σ^2 t)^2].
Then we have
MX(t) = e^{[2µσ^2 t + (σ^2 t)^2]/2σ^2} ∫_{−∞}^∞ (1/(√(2π)σ)) e^{−(1/2σ^2)(x − (µ + σ^2 t))^2} dx = e^{µt + σ^2 t^2/2}.
EX = (d/dt) MX(t)|_{t=0} = (µ + σ^2 t) e^{µt + σ^2 t^2/2}|_{t=0} = µ.
EX^2 = (d^2/dt^2) MX(t)|_{t=0} = [(µ + σ^2 t)^2 e^{µt + σ^2 t^2/2} + σ^2 e^{µt + σ^2 t^2/2}]|_{t=0} = µ^2 + σ^2.
VarX = µ^2 + σ^2 − µ^2 = σ^2.
2.35 a.

EX^r = ∫_0^∞ x^r (1/(√(2π)x)) e^{−(log x)²/2} dx   (f_1 is lognormal with µ = 0, σ² = 1)
     = (1/√(2π)) ∫_{−∞}^∞ e^{y(r−1)} e^{−y²/2} e^y dy   (substitute y = log x, dy = (1/x)dx)
     = (1/√(2π)) ∫_{−∞}^∞ e^{−y²/2+ry} dy
     = (1/√(2π)) ∫_{−∞}^∞ e^{−(y²−2ry+r²)/2} e^{r²/2} dy = e^{r²/2}.


Solutions Manual for Statistical Inference

b.

∫_0^∞ x^r f_1(x) sin(2π log x) dx
 = ∫_0^∞ x^r (1/(√(2π)x)) e^{−(log x)²/2} sin(2π log x) dx
 = ∫_{−∞}^∞ (1/√(2π)) e^{(y+r)r} e^{−(y+r)²/2} sin(2πy + 2πr) dy   (substitute y = log x − r, dy = (1/x)dx)
 = ∫_{−∞}^∞ (1/√(2π)) e^{(r²−y²)/2} sin(2πy) dy   (sin(a + 2πr) = sin(a) if r = 0, 1, 2, . . .)
 = 0,

because e^{(r²−y²)/2} sin(2πy) = −e^{(r²−(−y)²)/2} sin(2π(−y)); the integrand is an odd function so the negative integral cancels the positive one.
2.36 First, it can be shown that

lim_{x→∞} e^{tx−(log x)²} = ∞

by using l'Hôpital's rule to show

lim_{x→∞} (tx − (log x)²)/(tx) = 1,

and, hence, lim_{x→∞} [tx − (log x)²] = lim_{x→∞} tx = ∞. Then for any k > 0, there is a constant c such that

∫_k^∞ (1/x) e^{tx} e^{−(log x)²/2} dx ≥ c ∫_k^∞ (1/x) dx = c log x |_k^∞ = ∞.

Hence M_X(t) does not exist.
2.37 a. The graph looks very similar to Figure 2.3.2 except that f_1 is symmetric around 0 (since it is standard normal).
b. The functions look like t²/2 – it is impossible to see any difference.
c. The mgf of f_1 is e^{K_1(t)}. The mgf of f_2 is e^{K_2(t)}.
d. Make the transformation y = e^x to get the densities in Example 2.3.10.
2.39 a. (d/dx) ∫_0^x e^{−λt} dt = e^{−λx}. Verify:

(d/dx) ∫_0^x e^{−λt} dt = (d/dx) [(1/λ) − (1/λ)e^{−λx}] = e^{−λx}.

b. (d/dλ) ∫_0^∞ e^{−λt} dt = ∫_0^∞ (d/dλ) e^{−λt} dt = ∫_0^∞ −t e^{−λt} dt = −(1/λ²)Γ(2) = −1/λ². Verify:

(d/dλ) ∫_0^∞ e^{−λt} dt = (d/dλ) (1/λ) = −1/λ².

c. (d/dt) ∫_t^1 (1/x²) dx = −1/t² (the derivative with respect to the lower limit). Verify:

(d/dt) ∫_t^1 x^{−2} dx = (d/dt) (−1 + 1/t) = −1/t².

d. (d/dt) ∫_1^∞ (x−t)^{−2} dx = ∫_1^∞ (d/dt)(x−t)^{−2} dx = ∫_1^∞ 2(x−t)^{−3} dx = [−(x−t)^{−2}]_1^∞ = 1/(1−t)². Verify:

(d/dt) ∫_1^∞ (x−t)^{−2} dx = (d/dt) [−(x−t)^{−1}]_1^∞ = (d/dt) 1/(1−t) = 1/(1−t)².
Chapter 3

Common Families of Distributions

3.1 The pmf of X is f(x) = 1/(N_1 − N_0 + 1), x = N_0, N_0+1, . . . , N_1. Then

EX = Σ_{x=N_0}^{N_1} x · 1/(N_1−N_0+1) = (1/(N_1−N_0+1)) [Σ_{x=1}^{N_1} x − Σ_{x=1}^{N_0−1} x]
   = (1/(N_1−N_0+1)) [N_1(N_1+1)/2 − (N_0−1)N_0/2] = (N_1 + N_0)/2.

Similarly, using the formula for Σ_{x=1}^N x², we obtain

EX² = (1/(N_1−N_0+1)) [N_1(N_1+1)(2N_1+1)/6 − (N_0−1)N_0(2N_0−1)/6],
VarX = EX² − (EX)² = (N_1−N_0)(N_1−N_0+2)/12.

3.2 Let X = number of defective parts in the sample. Then X ~ hypergeometric(N = 100, M, K) where M = number of defectives in the lot and K = sample size.
a. If there are 6 or more defectives in the lot, then the probability that the lot is accepted (X = 0) is at most

P(X = 0 | N = 100, M = 6, K) = C(6,0)C(94,K)/C(100,K) = [(100−K)(99−K)···(95−K)]/(100·99···95).

By trial and error we find P(X = 0) = .10056 for K = 31 and P(X = 0) = .09182 for K = 32. So the sample size must be at least 32.
b. Now P(accept lot) = P(X = 0 or 1), and, for 6 or more defectives, the probability is at most

P(X = 0 or 1 | N = 100, M = 6, K) = C(6,0)C(94,K)/C(100,K) + C(6,1)C(94,K−1)/C(100,K).

By trial and error we find P(X = 0 or 1) = .10220 for K = 50 and P(X = 0 or 1) = .09331 for K = 51. So the sample size must be at least 51.
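The trial-and-error searches above are easy to automate. A quick numerical check (written in Python for illustration; the manual's own code snippets use R or Mathematica, and the helper name p_accept is ours):

```python
from math import comb

def p_accept(K, defectives=6, N=100, c=1):
    # P(X <= c) for X ~ hypergeometric(N, M = defectives, sample size K)
    return sum(comb(defectives, x) * comb(N - defectives, K - x)
               for x in range(c + 1)) / comb(N, K)

# part (a): accept only when X = 0; smallest K with P(X = 0) < .10
K_a = next(K for K in range(1, 95) if p_accept(K, c=0) < 0.10)
# part (b): accept when X = 0 or 1; smallest K with P(X <= 1) < .10
K_b = next(K for K in range(1, 95) if p_accept(K, c=1) < 0.10)
```

The search reproduces the sample sizes 32 and 51 quoted above.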
3.3 In the seven seconds for the event, no car must pass in the last three seconds, an event with probability (1 − p)³. The only occurrence in the first four seconds, for which the pedestrian does not wait the entire four seconds, is to have a car pass in the first second and no other car pass. This has probability p(1 − p)³. Thus the probability of waiting exactly four seconds before starting to cross is [1 − p(1 − p)³](1 − p)³.


3.5 Let X = number of effective cases. If the new and old drugs are equally effective, then the probability that the new drug is effective on a case is .8. If the cases are independent then X ~ binomial(100, .8), and

P(X ≥ 85) = Σ_{x=85}^{100} C(100, x) (.8)^x (.2)^{100−x} = .1285.

So, even if the new drug is no better than the old, the chance of 85 or more effective cases is not too small. Hence, we cannot conclude the new drug is better. Note that using a normal approximation to calculate this binomial probability yields P(X ≥ 85) ≈ P(Z ≥ 1.125) = .1303.
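Both figures can be checked numerically (Python here for illustration; the manual's own code is in R or Mathematica):

```python
from math import comb, erf, sqrt

# exact binomial tail P(X >= 85), X ~ binomial(100, .8)
p_exact = sum(comb(100, x) * 0.8**x * 0.2**(100 - x) for x in range(85, 101))

# normal approximation with continuity correction: z = (84.5 - 80)/4 = 1.125
p_normal = 0.5 * (1 - erf(1.125 / sqrt(2)))
```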
3.7 Let X ∼ Poisson(λ). We want P (X ≥ 2) ≥ .99, that is,
P (X ≤ 1) = e−λ + λe−λ ≤ .01.
Solving e−λ + λe−λ = .01 by trial and error (numerical bisection method) yields λ = 6.6384.
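A minimal bisection sketch for this root (Python for illustration; e^{−λ}(1 + λ) is decreasing in λ, so the bracket [0, 20] works):

```python
from math import exp

# solve e^{-lam} * (1 + lam) = .01 by bisection
lo, hi = 0.0, 20.0
for _ in range(60):
    mid = (lo + hi) / 2
    if exp(-mid) * (1 + mid) > 0.01:
        lo = mid   # P(X <= 1) still too large; need a bigger lambda
    else:
        hi = mid
lam = (lo + hi) / 2
```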
3.8 a. We want P(X > N) < .01 where X ~ binomial(1000, 1/2). Since the 1000 customers choose randomly, we take p = 1/2. We thus require

P(X > N) = Σ_{x=N+1}^{1000} C(1000, x) (1/2)^x (1 − 1/2)^{1000−x} < .01,

which implies that

(1/2)^{1000} Σ_{x=N+1}^{1000} C(1000, x) < .01.

This last inequality can be used to solve for N, that is, N is the smallest integer that satisfies it. The solution is N = 537.
b. To use the normal approximation we take X ~ n(500, 250), where we used µ = 1000(1/2) = 500 and σ² = 1000(1/2)(1/2) = 250. Then

P(X > N) = P((X − 500)/√250 > (N − 500)/√250) < .01,

thus,

P(Z > (N − 500)/√250) < .01,

where Z ~ n(0, 1). From the normal table we get

P(Z > 2.33) ≈ .0099 < .01  ⇒  (N − 500)/√250 = 2.33  ⇒  N ≈ 537.

Therefore, each theater should have at least 537 seats, and the answer based on the approximation equals the exact answer.
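The exact value N = 537 can be verified with integer arithmetic, avoiding any floating-point issues in the huge binomial sum (Python for illustration):

```python
from math import comb

# smallest N with P(X > N) < .01, X ~ binomial(1000, 1/2);
# compare 100 * sum_{x > N} C(1000, x) against 2^1000 exactly
total = 2 ** 1000
tail = 0          # running value of sum_{x = N+1}^{1000} C(1000, x)
N = 1000
while (tail + comb(1000, N)) * 100 < total:
    tail += comb(1000, N)
    N -= 1
# loop exits with P(X > N) < .01 but P(X > N-1) >= .01
```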


3.9 a. We can think of each one of the 60 children entering kindergarten as 60 independent Bernoulli trials with probability of success (a twin birth) of approximately 1/90. The probability of having 5 or more successes approximates the probability of having 5 or more sets of twins entering kindergarten. Then X ~ binomial(60, 1/90) and

P(X ≥ 5) = 1 − Σ_{x=0}^{4} C(60, x) (1/90)^x (1 − 1/90)^{60−x} = .0006,

which is small and may be rare enough to be newsworthy.
b. Let X be the number of elementary schools in New York state that have 5 or more sets of twins entering kindergarten. Then the probability of interest is P(X ≥ 1) where X ~ binomial(310, .0006). Therefore P(X ≥ 1) = 1 − P(X = 0) = .1698.
c. Let X be the number of States that have 5 or more sets of twins entering kindergarten during any of the last ten years. Then the probability of interest is P(X ≥ 1) where X ~ binomial(500, .1698). Therefore P(X ≥ 1) = 1 − P(X = 0) = 1 − 3.90 × 10^{−41} ≈ 1.
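A numerical check of parts (a) and (b) (Python for illustration; part (b) reuses the rounded value .0006, as the solution does):

```python
from math import comb

# (a) P(X >= 5), X ~ binomial(60, 1/90)
p_a = 1 - sum(comb(60, x) * (1 / 90)**x * (89 / 90)**(60 - x) for x in range(5))

# (b) P(at least one of 310 schools sees 5+ twin sets), using the rounded .0006
p_b = 1 - (1 - 0.0006)**310
```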
3.11 a.

lim_{M/N→p, M→∞, N→∞} C(M,x)C(N−M,K−x)/C(N,K)
 = (K!/(x!(K−x)!)) lim_{M/N→p, M→∞, N→∞} [M!(N−M)!(N−K)!]/[N!(M−x)!(N−M−(K−x))!].

In the limit, each of the factorial terms can be replaced by the approximation from Stirling's formula because, for example,

M! = (M!/(√(2π) M^{M+1/2} e^{−M})) √(2π) M^{M+1/2} e^{−M},

and M!/(√(2π) M^{M+1/2} e^{−M}) → 1. When this replacement is made, all the √(2π) and exponential terms cancel. Thus,

lim C(M,x)C(N−M,K−x)/C(N,K)
 = C(K,x) lim_{M/N→p, M→∞, N→∞} [M^{M+1/2} (N−M)^{N−M+1/2} (N−K)^{N−K+1/2}] / [N^{N+1/2} (M−x)^{M−x+1/2} (N−M−(K−x))^{N−M−(K−x)+1/2}].

We can evaluate the limit by breaking the ratio into seven terms, each of which has a finite limit we can evaluate. In some limits we use the fact that M → ∞, N → ∞ and M/N → p imply N − M → ∞. The first term (of the seven terms) is

lim_{M→∞} (M/(M−x))^M = lim_{M→∞} 1/((M−x)/M)^M = lim_{M→∞} 1/(1 − x/M)^M = 1/e^{−x} = e^x.

Lemma 2.3.14 is used to get the penultimate equality. Similarly we get two more terms,

lim_{N−M→∞} ((N−M)/(N−M−(K−x)))^{N−M} = e^{K−x}
and
lim_{N→∞} ((N−K)/N)^N = e^{−K}.

Note, the product of these three limits is one. Three other terms are

lim_{M→∞} (M/(M−x))^{1/2} = 1,
lim_{N−M→∞} ((N−M)/(N−M−(K−x)))^{1/2} = 1
and
lim_{N→∞} ((N−K)/N)^{1/2} = 1.

The only term left is

lim_{M/N→p, M→∞, N→∞} (M−x)^x (N−M−(K−x))^{K−x} / (N−K)^K
 = lim_{M/N→p, M→∞, N→∞} ((M−x)/(N−K))^x ((N−M−(K−x))/(N−K))^{K−x}
 = p^x (1−p)^{K−x}.

b. If in (a) we in addition have K → ∞, p → 0, MK/N → pK → λ, by the Poisson approximation to the binomial, we heuristically get

C(M,x)C(N−M,K−x)/C(N,K) → C(K,x) p^x (1−p)^{K−x} → e^{−λ}λ^x/x!.

c. Using Stirling's formula as in (a), after the replacements and cancellations we get

lim_{N,M,K→∞, M/N→0, KM/N→λ} C(M,x)C(N−M,K−x)/C(N,K)
 = (1/x!) lim (KM/N)^x ((N−M)/N)^{K−x}
 = (1/x!) λ^x lim (1 − (MK/N)/K)^{K−x}
 = e^{−λ}λ^x/x!.

3.12 Consider a sequence of Bernoulli trials with success probability p. Define X = number of successes in first n trials and Y = number of failures before the r-th success. Then X and Y have the specified binomial and negative binomial distributions, respectively. And we have

F_X(r−1) = P(X ≤ r−1)
 = P(r-th success on (n+1)-st or later trial)
 = P(at least n+1−r failures before the r-th success)
 = P(Y ≥ n−r+1)
 = 1 − P(Y ≤ n−r)
 = 1 − F_Y(n−r).
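The identity can be confirmed numerically for a range of r (Python for illustration, with the pmfs coded directly from their formulas):

```python
from math import comb

def binom_cdf(k, n, p):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def negbin_cdf(k, r, p):
    # Y = number of failures before the r-th success
    return sum(comb(r + y - 1, y) * p**r * (1 - p)**y for y in range(k + 1))

n, p = 10, 0.3
# F_X(r-1) should equal 1 - F_Y(n-r) for every r = 1, ..., n
errors = [abs(binom_cdf(r - 1, n, p) - (1 - negbin_cdf(n - r, r, p)))
          for r in range(1, n + 1)]
```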


3.13 For any X with support 0, 1, . . ., the mean and variance of the 0-truncated X_T are given by

EX_T = Σ_{x=1}^∞ x P(X_T = x) = Σ_{x=1}^∞ x P(X = x)/P(X > 0)
     = (1/P(X > 0)) Σ_{x=0}^∞ x P(X = x) = EX/P(X > 0).

In a similar way we get EX_T² = EX²/P(X > 0). Thus,

VarX_T = EX²/P(X > 0) − (EX/P(X > 0))².

a. For Poisson(λ), P(X > 0) = 1 − P(X = 0) = 1 − e^{−λ}λ^0/0! = 1 − e^{−λ}, therefore

P(X_T = x) = e^{−λ}λ^x/(x!(1 − e^{−λ})),  x = 1, 2, . . .
EX_T = λ/(1 − e^{−λ})
VarX_T = (λ² + λ)/(1 − e^{−λ}) − (λ/(1 − e^{−λ}))².

b. For negative binomial(r, p), P(X > 0) = 1 − P(X = 0) = 1 − C(r−1, 0) p^r (1−p)^0 = 1 − p^r. Then

P(X_T = x) = C(r+x−1, x) p^r (1−p)^x/(1 − p^r),  x = 1, 2, . . .
EX_T = r(1−p)/(p(1 − p^r))
VarX_T = [r(1−p) + r²(1−p)²]/(p²(1 − p^r)) − [r(1−p)/(p(1 − p^r))]².

3.14 a. Σ_{x=1}^∞ −(1−p)^x/(x log p) = (−1/log p) Σ_{x=1}^∞ (1−p)^x/x = 1, since the sum is the Taylor series for −log p (that is, Σ_{x=1}^∞ z^x/x = −log(1−z) with z = 1−p).
b.

EX = (−1/log p) Σ_{x=1}^∞ (1−p)^x = (−1/log p)(1−p)/p.

Since the geometric series converges uniformly,

EX² = (−1/log p) Σ_{x=1}^∞ x(1−p)^x = ((1−p)/log p) (d/dp) Σ_{x=1}^∞ (1−p)^x
    = ((1−p)/log p) (d/dp) ((1−p)/p) = −(1−p)/(p² log p).

Thus

VarX = −(1−p)/(p² log p) − ((1−p)/(p log p))² = −((1−p)/(p² log p)) [1 + (1−p)/log p].

Alternatively, the mgf can be calculated,

M_X(t) = (−1/log p) Σ_{x=1}^∞ ((1−p)e^t)^x/x = log(1 + pe^t − e^t)/log p,

and can be differentiated to obtain the moments.


3.15 The moment generating function for the negative binomial is

M(t) = (p/(1−(1−p)e^t))^r = (1 + (1/r) · r(1−p)(e^t−1)/(1−(1−p)e^t))^r,

and the term

r(1−p)(e^t−1)/(1−(1−p)e^t) → λ(e^t−1)   as r → ∞, p → 1 and r(1−p) → λ.

Thus by Lemma 2.3.14, the negative binomial moment generating function converges to e^{λ(e^t−1)}, the Poisson moment generating function.
3.16 a. Using integration by parts with u = t^α and dv = e^{−t}dt, we obtain

Γ(α+1) = ∫_0^∞ t^{(α+1)−1} e^{−t} dt = [t^α(−e^{−t})]_0^∞ + ∫_0^∞ α t^{α−1} e^{−t} dt = 0 + αΓ(α) = αΓ(α).

b. Making the change of variable z = √(2t), i.e., t = z²/2, we obtain

Γ(1/2) = ∫_0^∞ t^{−1/2} e^{−t} dt = ∫_0^∞ (√2/z) e^{−z²/2} z dz = √2 ∫_0^∞ e^{−z²/2} dz = √2 (√π/√2) = √π,

where the penultimate equality uses (3.3.14).
3.17

EX^ν = (1/(Γ(α)β^α)) ∫_0^∞ x^ν x^{α−1} e^{−x/β} dx = (1/(Γ(α)β^α)) ∫_0^∞ x^{(ν+α)−1} e^{−x/β} dx
     = Γ(ν+α)β^{ν+α}/(Γ(α)β^α) = β^ν Γ(ν+α)/Γ(α).

Note, this formula is valid for all ν > −α. The expectation does not exist for ν ≤ −α.
3.18 If Y ~ negative binomial(r, p), its moment generating function is M_Y(t) = (p/(1−(1−p)e^t))^r, and, from Theorem 2.3.15, M_{pY}(t) = (p/(1−(1−p)e^{pt}))^r. Now use L'Hôpital's rule to calculate

lim_{p→0} p/(1−(1−p)e^{pt}) = lim_{p→0} 1/((p−1)te^{pt} + e^{pt}) = 1/(1−t),

so the moment generating function converges to (1−t)^{−r}, the moment generating function of a gamma(r, 1).
3.19 Repeatedly apply the integration-by-parts formula

(1/Γ(n)) ∫_x^∞ z^{n−1} e^{−z} dz = x^{n−1}e^{−x}/(n−1)! + (1/Γ(n−1)) ∫_x^∞ z^{n−2} e^{−z} dz,

until the exponent on the second integral is zero. This will establish the formula. If X ~ gamma(α, 1) and Y ~ Poisson(x), the probabilistic relationship is P(X ≥ x) = P(Y ≤ α−1).
3.21 The moment generating function would be defined by (1/π) ∫_{−∞}^∞ e^{tx}/(1+x²) dx. On (0, ∞), e^{tx} > x for any t > 0 and x sufficiently large, hence

(1/π) ∫_{−∞}^∞ e^{tx}/(1+x²) dx > (1/π) ∫_c^∞ x/(1+x²) dx = ∞ for some c > 0,

thus the moment generating function does not exist.


3.22 a.

E[X(X−1)] = Σ_{x=0}^∞ x(x−1) e^{−λ}λ^x/x! = e^{−λ}λ² Σ_{x=2}^∞ λ^{x−2}/(x−2)!
          = e^{−λ}λ² Σ_{y=0}^∞ λ^y/y!   (let y = x−2)
          = e^{−λ}λ²e^λ = λ².

EX² = λ² + EX = λ² + λ,
VarX = EX² − (EX)² = λ² + λ − λ² = λ.

b.

E[X(X−1)] = Σ_{x=0}^∞ x(x−1) C(r+x−1, x) p^r (1−p)^x
          = r(r+1) Σ_{x=2}^∞ C(r+x−1, x−2) p^r (1−p)^x
          = r(r+1) ((1−p)²/p²) Σ_{y=0}^∞ C(r+2+y−1, y) p^{r+2} (1−p)^y
          = r(r+1)(1−p)²/p²,

where in the second equality we substituted y = x−2, and in the third equality we use the fact that we are summing over a negative binomial(r+2, p) pmf. Thus,

VarX = E[X(X−1)] + EX − (EX)²
     = r(r+1)(1−p)²/p² + r(1−p)/p − r²(1−p)²/p²
     = r(1−p)/p².

c.

EX² = (1/(Γ(α)β^α)) ∫_0^∞ x² x^{α−1} e^{−x/β} dx = (1/(Γ(α)β^α)) ∫_0^∞ x^{α+1} e^{−x/β} dx
    = Γ(α+2)β^{α+2}/(Γ(α)β^α) = α(α+1)β².

VarX = EX² − (EX)² = α(α+1)β² − α²β² = αβ².

d. (Use 3.3.18)

EX = Γ(α+1)Γ(α+β)/(Γ(α+β+1)Γ(α)) = αΓ(α)Γ(α+β)/((α+β)Γ(α+β)Γ(α)) = α/(α+β).
EX² = Γ(α+2)Γ(α+β)/(Γ(α+β+2)Γ(α)) = α(α+1)/((α+β)(α+β+1)).
VarX = EX² − (EX)² = α(α+1)/((α+β)(α+β+1)) − α²/(α+β)² = αβ/((α+β)²(α+β+1)).


e. The double exponential(µ, σ) pdf is symmetric about µ. Thus, by Exercise 2.26, EX = µ.

VarX = ∫_{−∞}^∞ (x−µ)² (1/(2σ)) e^{−|x−µ|/σ} dx = ∫_{−∞}^∞ σ²z² (1/2) e^{−|z|} dz   (z = (x−µ)/σ)
     = σ² ∫_0^∞ z² e^{−z} dz = σ²Γ(3) = 2σ².
3.23 a.

∫_α^∞ βα^β x^{−β−1} dx = βα^β [−x^{−β}/β]_α^∞ = βα^β (α^{−β}/β) = 1,

thus f(x) integrates to 1.
b. EX^n = βα^n/(β−n), for n < β, therefore

EX = αβ/(β−1),  EX² = α²β/(β−2),
VarX = α²β/(β−2) − (αβ/(β−1))² = βα²/((β−1)²(β−2)).

c. If β ≤ 2 the integral of the second moment is infinite, so the variance does not exist.
3.24 a. f_X(x) = (1/β)e^{−x/β}, x > 0. For Y = X^{1/γ}, f_Y(y) = (γ/β) e^{−y^γ/β} y^{γ−1}, y > 0. Using the transformation z = y^γ/β, we calculate

EY^n = (γ/β) ∫_0^∞ y^{γ+n−1} e^{−y^γ/β} dy = β^{n/γ} ∫_0^∞ z^{n/γ} e^{−z} dz = β^{n/γ} Γ(n/γ + 1).

Thus EY = β^{1/γ}Γ(1/γ + 1) and VarY = β^{2/γ}[Γ(2/γ + 1) − Γ²(1/γ + 1)].

b. f_X(x) = (1/β)e^{−x/β}, x > 0. For Y = (2X/β)^{1/2}, f_Y(y) = y e^{−y²/2}, y > 0. We now notice that

EY = ∫_0^∞ y² e^{−y²/2} dy = √(π/2),

since (1/√(2π)) ∫_{−∞}^∞ y² e^{−y²/2} dy = 1, the variance of a standard normal, and the integrand is symmetric. Use integration-by-parts to calculate the second moment

EY² = ∫_0^∞ y³ e^{−y²/2} dy = 2 ∫_0^∞ y e^{−y²/2} dy = 2,

where we take u = y², dv = y e^{−y²/2} dy. Thus VarY = 2(1 − π/4).

c. The gamma(a, b) density is

f_X(x) = (1/(Γ(a)b^a)) x^{a−1} e^{−x/b}.

Make the transformation y = 1/x with dx = −dy/y² to get

f_Y(y) = f_X(1/y)|1/y²| = (1/(Γ(a)b^a)) (1/y)^{a+1} e^{−1/(by)}.


The first two moments are

EY = (1/(Γ(a)b^a)) ∫_0^∞ (1/y)^a e^{−1/(by)} dy = Γ(a−1)b^{a−1}/(Γ(a)b^a) = 1/((a−1)b),
EY² = Γ(a−2)b^{a−2}/(Γ(a)b^a) = 1/((a−1)(a−2)b²),

and so VarY = 1/((a−1)²(a−2)b²).

d. f_X(x) = (1/(Γ(3/2)β^{3/2})) x^{3/2−1} e^{−x/β}, x > 0. For Y = (X/β)^{1/2}, f_Y(y) = (2/Γ(3/2)) y² e^{−y²}, y > 0. To calculate the moments we use integration-by-parts with u = y², dv = y e^{−y²} dy to obtain

EY = (2/Γ(3/2)) ∫_0^∞ y³ e^{−y²} dy = (1/Γ(3/2)) ∫_0^∞ 2y e^{−y²} dy · (1/2)·2 = 1/Γ(3/2),

and with u = y³, dv = y e^{−y²} dy to obtain

EY² = (2/Γ(3/2)) ∫_0^∞ y⁴ e^{−y²} dy = (3/Γ(3/2)) ∫_0^∞ y² e^{−y²} dy = (3/Γ(3/2)) (√π/4) = 3/2.

Here ∫_0^∞ y² e^{−y²} dy = √π/4, since ∫_{−∞}^∞ y² (1/√π) e^{−y²} dy = 1/2 is the variance of a n(0, 1/2) and the integrand is symmetric. Thus, using Γ(3/2) = √π/2,

VarY = EY² − (EY)² = 3/2 − 4/π.

e. f_X(x) = e^{−x}, x > 0. For Y = α − γ log X, f_Y(y) = (1/γ) e^{−e^{(α−y)/γ}} e^{(α−y)/γ}, −∞ < y < ∞. Calculation of EY and EY² cannot be done in closed form. If we define

I_1 = ∫_0^∞ log x e^{−x} dx,   I_2 = ∫_0^∞ (log x)² e^{−x} dx,

then EY = E(α − γ log X) = α − γI_1, and EY² = E(α − γ log X)² = α² − 2αγI_1 + γ²I_2. The constant −I_1 = .5772157 is Euler's constant.
3.25 Note that if T is continuous then,

P(t ≤ T ≤ t+δ | t ≤ T) = P(t ≤ T ≤ t+δ, t ≤ T)/P(t ≤ T)
                       = P(t ≤ T ≤ t+δ)/P(t ≤ T)
                       = (F_T(t+δ) − F_T(t))/(1 − F_T(t)).

Therefore from the definition of derivative,

h_T(t) = (1/(1 − F_T(t))) lim_{δ→0} (F_T(t+δ) − F_T(t))/δ = f_T(t)/(1 − F_T(t)).

Also,

−(d/dt) log[1 − F_T(t)] = −(1/(1 − F_T(t))) (−f_T(t)) = h_T(t).

3.26 a. f_T(t) = (1/β)e^{−t/β} and F_T(t) = ∫_0^t (1/β)e^{−x/β} dx = [−e^{−x/β}]_0^t = 1 − e^{−t/β}. Thus,

h_T(t) = f_T(t)/(1 − F_T(t)) = ((1/β)e^{−t/β})/(1 − (1 − e^{−t/β})) = 1/β.

b. f_T(t) = (γ/β) t^{γ−1} e^{−t^γ/β}, t ≥ 0, and

F_T(t) = ∫_0^t (γ/β) x^{γ−1} e^{−x^γ/β} dx = ∫_0^{t^γ/β} e^{−u} du = [−e^{−u}]_0^{t^γ/β} = 1 − e^{−t^γ/β},

where u = x^γ/β. Thus,

h_T(t) = ((γ/β) t^{γ−1} e^{−t^γ/β})/(e^{−t^γ/β}) = (γ/β) t^{γ−1}.

c. F_T(t) = 1/(1 + e^{−(t−µ)/β}) and f_T(t) = (1/β) e^{−(t−µ)/β}/(1 + e^{−(t−µ)/β})². Thus,

1 − F_T(t) = e^{−(t−µ)/β}/(1 + e^{−(t−µ)/β}),

and

h_T(t) = f_T(t)/(1 − F_T(t)) = (1/β) · 1/(1 + e^{−(t−µ)/β}) = (1/β) F_T(t).

3.27 a. The uniform pdf satisfies the inequalities of Exercise 2.27, hence is unimodal.
b. For the gamma(α, β) pdf f(x), ignoring constants, (d/dx) f(x) = (x^{α−2} e^{−x/β}/β)[β(α−1) − x], which only has one sign change. Hence the pdf is unimodal with mode β(α−1).
c. For the n(µ, σ²) pdf f(x), ignoring constants, (d/dx) f(x) = ((µ−x)/σ²) e^{−(x−µ)²/2σ²}, which only has one sign change. Hence the pdf is unimodal with mode µ.
d. For the beta(α, β) pdf f(x), ignoring constants,

(d/dx) f(x) = x^{α−2}(1−x)^{β−2} [(α−1) − x(α+β−2)],

which only has one sign change. Hence the pdf is unimodal with mode (α−1)/(α+β−2).

3.28 a. (i) µ known,

f(x|σ²) = (1/√(2πσ²)) exp(−(x−µ)²/(2σ²)),

h(x) = 1, c(σ²) = (1/√(2πσ²)) I_{(0,∞)}(σ²), w_1(σ²) = −1/(2σ²), t_1(x) = (x−µ)².
(ii) σ² known,

f(x|µ) = (1/√(2πσ²)) exp(−x²/(2σ²)) exp(−µ²/(2σ²)) exp(µ x/σ²),

h(x) = (1/√(2πσ²)) exp(−x²/(2σ²)), c(µ) = exp(−µ²/(2σ²)), w_1(µ) = µ, t_1(x) = x/σ².
b. (i) α known,

f(x|β) = (x^{α−1}/Γ(α)) (1/β^α) exp(−x/β),

h(x) = x^{α−1}/Γ(α), x > 0, c(β) = 1/β^α, w_1(β) = 1/β, t_1(x) = −x.
(ii) β known,

f(x|α) = e^{−x/β} (1/(Γ(α)β^α)) exp((α−1) log x),

h(x) = e^{−x/β}, x > 0, c(α) = 1/(Γ(α)β^α), w_1(α) = α−1, t_1(x) = log x.
(iii) α, β unknown,

f(x|α, β) = (1/(Γ(α)β^α)) exp((α−1) log x − x/β),

h(x) = I_{(0,∞)}(x), c(α, β) = 1/(Γ(α)β^α), w_1(α) = α−1, t_1(x) = log x, w_2(α, β) = −1/β, t_2(x) = x.
c. (i) α known, h(x) = x^{α−1} I_{[0,1]}(x), c(β) = 1/B(α, β), w_1(β) = β−1, t_1(x) = log(1−x).
(ii) β known, h(x) = (1−x)^{β−1} I_{[0,1]}(x), c(α) = 1/B(α, β), w_1(α) = α−1, t_1(x) = log x.
(iii) α, β unknown, h(x) = I_{[0,1]}(x), c(α, β) = 1/B(α, β), w_1(α) = α−1, t_1(x) = log x, w_2(β) = β−1, t_2(x) = log(1−x).
d. h(x) = (1/x!) I_{{0,1,2,...}}(x), c(θ) = e^{−θ}, w_1(θ) = log θ, t_1(x) = x.
e. h(x) = C(x−1, r−1) I_{{r,r+1,...}}(x), c(p) = (p/(1−p))^r, w_1(p) = log(1−p), t_1(x) = x.

3.29 a. For the n(µ, σ²),

f(x) = (e^{−µ²/2σ²}/(√(2π)σ)) e^{−x²/2σ² + xµ/σ²},

so the natural parameter is (η_1, η_2) = (−1/2σ², µ/σ²) with natural parameter space {(η_1, η_2): η_1 < 0, −∞ < η_2 < ∞}.
b. For the gamma(α, β),

f(x) = (1/(Γ(α)β^α)) e^{(α−1) log x − x/β},

so the natural parameter is (η_1, η_2) = (α−1, −1/β) with natural parameter space {(η_1, η_2): η_1 > −1, η_2 < 0}.
c. For the beta(α, β),

f(x) = (Γ(α+β)/(Γ(α)Γ(β))) e^{(α−1) log x + (β−1) log(1−x)},

so the natural parameter is (η_1, η_2) = (α−1, β−1) and the natural parameter space is {(η_1, η_2): η_1 > −1, η_2 > −1}.
d. For the Poisson,

f(x) = (1/x!) e^{−θ} e^{x log θ},

so the natural parameter is η = log θ and the natural parameter space is {η: −∞ < η < ∞}.
e. For the negative binomial(r, p), r known,

P(X = x) = C(r+x−1, x) p^r e^{x log(1−p)},

so the natural parameter is η = log(1−p) with natural parameter space {η: η < 0}.
3.31 a. Differentiate both sides of the identity ∫ h(x)c(θ) exp(Σ_{i=1}^k w_i(θ)t_i(x)) dx = 1 with respect to θ_j:

0 = (∂/∂θ_j) ∫ h(x)c(θ) exp(Σ_{i=1}^k w_i(θ)t_i(x)) dx
  = ∫ h(x) (∂c(θ)/∂θ_j) exp(Σ_{i=1}^k w_i(θ)t_i(x)) dx
    + ∫ h(x)c(θ) exp(Σ_{i=1}^k w_i(θ)t_i(x)) (Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(x)) dx
  = ∫ h(x) (∂ log c(θ)/∂θ_j) c(θ) exp(Σ_{i=1}^k w_i(θ)t_i(x)) dx + E(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X))
  = ∂ log c(θ)/∂θ_j + E(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)).

Therefore E(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)) = −(∂/∂θ_j) log c(θ).

b. Differentiate the identity of part (a) once more with respect to θ_j. Differentiating the expectation produces one term from differentiating the summand and one term from differentiating the density (which brings down the score again), so that

0 = (∂²/∂θ_j²) log c(θ) + E(Σ_{i=1}^k (∂²w_i(θ)/∂θ_j²) t_i(X)) + Var(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)).

Therefore

Var(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)) = −(∂²/∂θ_j²) log c(θ) − E(Σ_{i=1}^k (∂²w_i(θ)/∂θ_j²) t_i(X)).

3.33 a. (i) h(x) = ex I{−∞ θ2 . Let X1 ∼ f (x − θ1 ) and X2 ∼ f (x − θ2 ). Let F (z ) be the cdf corresponding to f (z ) and let Z ∼ f (z ).Then
F (x | θ1 )

= P (X1 ≤ x) = P (Z + θ1 ≤ x) = P (Z ≤ x − θ1 ) = F (x − θ1 )
≤ F (x − θ2 ) = P (Z ≤ x − θ2 ) = P (Z + θ2 ≤ x) = P (X2 ≤ x)
= F (x | θ2 ).

3-14

Solutions Manual for Statistical Inference

The inequality is because x − θ2 > x − θ1 , and F is nondecreasing. To get strict inequality for some x, let (a, b] be an interval of length θ1 − θ2 with P (a < Z ≤ b) = F (b) − F (a) > 0.
Let x = a + θ1 . Then
F (x | θ1 )

= F (x − θ1 ) = F (a + θ1 − θ1 ) = F (a)
< F (b) = F (a + θ1 − θ2 ) = F (x − θ2 ) = F (x | θ2 ).

b. Let σ_1 > σ_2. Let X_1 ~ f(x/σ_1) and X_2 ~ f(x/σ_2). Let F(z) be the cdf corresponding to f(z) and let Z ~ f(z). Then, for x > 0,

F(x | σ_1) = P(X_1 ≤ x) = P(σ_1 Z ≤ x) = P(Z ≤ x/σ_1) = F(x/σ_1)
           ≤ F(x/σ_2) = P(Z ≤ x/σ_2) = P(σ_2 Z ≤ x) = P(X_2 ≤ x)
           = F(x | σ_2).

The inequality is because x/σ_2 > x/σ_1 (because x > 0 and σ_1 > σ_2 > 0), and F is nondecreasing. For x ≤ 0, F(x | σ_1) = P(X_1 ≤ x) = 0 = P(X_2 ≤ x) = F(x | σ_2). To get strict inequality for some x, let (a, b] be an interval such that a > 0, b/a = σ_1/σ_2 and P(a < Z ≤ b) = F(b) − F(a) > 0. Let x = aσ_1. Then

F(x | σ_1) = F(x/σ_1) = F(aσ_1/σ_1) = F(a)
           < F(b) = F(aσ_1/σ_2) = F(x/σ_2) = F(x | σ_2).
3.43 a. F_Y(y|θ) = 1 − F_X(1/y | θ), y > 0, by Theorem 2.1.3. For θ_1 > θ_2,

F_Y(y|θ_1) = 1 − F_X(1/y | θ_1) ≤ 1 − F_X(1/y | θ_2) = F_Y(y|θ_2)

for all y, since F_X(x|θ) is stochastically increasing and if θ_1 > θ_2, F_X(x|θ_2) ≤ F_X(x|θ_1) for all x. Similarly, F_Y(y|θ_1) = 1 − F_X(1/y | θ_1) < 1 − F_X(1/y | θ_2) = F_Y(y|θ_2) for some y, since if θ_1 > θ_2, F_X(x|θ_2) < F_X(x|θ_1) for some x. Thus F_Y(y|θ) is stochastically decreasing in θ.
b. F_X(x|θ) is stochastically increasing in θ. If θ_1 > θ_2 and θ_1, θ_2 > 0, then 1/θ_2 > 1/θ_1. Therefore F_X(x | 1/θ_1) ≤ F_X(x | 1/θ_2) for all x, and F_X(x | 1/θ_1) < F_X(x | 1/θ_2) for some x. Thus F_X(x | 1/θ) is stochastically decreasing in θ.
3.44 The function g (x) = |x| is a nonnegative function. So by Chebychev’s Inequality,
P (|X | ≥ b) ≤ E|X |/b.
Also, P (|X | ≥ b) = P (X 2 ≥ b2 ). Since g (x) = x2 is also nonnegative, again by Chebychev’s
Inequality we have
P (|X | ≥ b) = P (X 2 ≥ b2 ) ≤ EX 2 /b2 .
For X ~ exponential(1), E|X| = EX = 1 and EX² = VarX + (EX)² = 2. For b = 3,

E|X|/b = 1/3 > 2/9 = EX²/b².

Thus EX²/b² is a better bound. But for b = √2,

E|X|/b = 1/√2 < 1 = EX²/b².

Thus E|X|/b is a better bound.
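Both bounds, and the exact probability they dominate, can be checked directly (Python for illustration; for X ~ exponential(1) the exact tail is P(X ≥ b) = e^{−b}):

```python
from math import sqrt, exp

# X ~ exponential(1): E|X| = 1, EX^2 = 2, P(|X| >= b) = e^{-b} since X >= 0
for b in (3.0, sqrt(2.0)):
    exact = exp(-b)
    bound1 = 1.0 / b        # E|X|/b
    bound2 = 2.0 / b**2     # EX^2/b^2
    assert exact <= min(bound1, bound2)   # both are valid bounds

better_at_3 = 2.0 / 9.0 < 1.0 / 3.0       # second-moment bound wins at b = 3
better_at_sqrt2 = 1.0 / sqrt(2.0) < 1.0   # first-moment bound wins at b = sqrt(2)
```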


3.45 a.

M_X(t) = ∫_{−∞}^∞ e^{tx} f_X(x) dx ≥ ∫_a^∞ e^{tx} f_X(x) dx ≥ e^{ta} ∫_a^∞ f_X(x) dx = e^{ta} P(X ≥ a),

where we use the fact that e^{tx} is increasing in x for t > 0.
b.

M_X(t) = ∫_{−∞}^∞ e^{tx} f_X(x) dx ≥ ∫_{−∞}^a e^{tx} f_X(x) dx ≥ e^{ta} ∫_{−∞}^a f_X(x) dx = e^{ta} P(X ≤ a),

where we use the fact that e^{tx} is decreasing in x for t < 0.
c. h(t, x) must be nonnegative.
c. h(t, x) must be nonnegative.
3.46 For X ~ uniform(0, 1), µ = 1/2 and σ² = 1/12, thus

P(|X − µ| > kσ) = 1 − P(1/2 − k/√12 ≤ X ≤ 1/2 + k/√12) = { 1 − 2k/√12 if k < √3;  0 if k ≥ √3 }.

For X ~ exponential(λ), µ = λ and σ² = λ², thus

P(|X − µ| > kσ) = 1 − P(λ − kλ ≤ X ≤ λ + kλ) = { 1 + e^{−(k+1)} − e^{k−1} if k ≤ 1;  e^{−(k+1)} if k > 1 }.

From Example 3.6.2, Chebychev's Inequality gives the bound P(|X − µ| > kσ) ≤ 1/k².

Comparison of probabilities:

  k      u(0,1) exact   exp(λ) exact   Chebychev
  .1     .942           .926           100
  .5     .711           .617           4
  1      .423           .135           1
  1.5    .134           .0821          .44
  √3     0              .0651          .33
  2      0              .0498          .25
  4      0              .00674         .0625
  10     0              .0000167       .01

So we see that Chebychev's Inequality is quite conservative.
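The table entries come directly from the two exact formulas above and the bound 1/k²; a spot check (Python for illustration):

```python
from math import sqrt, exp

def unif_exact(k):
    # P(|X - 1/2| > k/sqrt(12)) for X ~ uniform(0, 1)
    return max(0.0, 1 - 2 * k / sqrt(12))

def expo_exact(k):
    # P(|X - lambda| > k*lambda) for X ~ exponential(lambda); free of lambda
    return 1 + exp(-(k + 1)) - exp(k - 1) if k <= 1 else exp(-(k + 1))

def chebychev(k):
    return 1 / k**2
```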
3.47

P(|Z| > t) = 2P(Z > t) = 2 (1/√(2π)) ∫_t^∞ e^{−x²/2} dx
           = √(2/π) ∫_t^∞ ((1+x²)/(1+x²)) e^{−x²/2} dx
           = √(2/π) [∫_t^∞ (1/(1+x²)) e^{−x²/2} dx + ∫_t^∞ (x²/(1+x²)) e^{−x²/2} dx].

To evaluate the second term, let u = x/(1+x²), dv = x e^{−x²/2} dx, v = −e^{−x²/2}, du = ((1−x²)/(1+x²)²) dx, to obtain

∫_t^∞ (x²/(1+x²)) e^{−x²/2} dx = [(x/(1+x²))(−e^{−x²/2})]_t^∞ − ∫_t^∞ ((1−x²)/(1+x²)²)(−e^{−x²/2}) dx
                               = (t/(1+t²)) e^{−t²/2} + ∫_t^∞ ((1−x²)/(1+x²)²) e^{−x²/2} dx.

Therefore,

P(|Z| ≥ t) = √(2/π) (t/(1+t²)) e^{−t²/2} + √(2/π) ∫_t^∞ [1/(1+x²) + (1−x²)/(1+x²)²] e^{−x²/2} dx
           = √(2/π) (t/(1+t²)) e^{−t²/2} + √(2/π) ∫_t^∞ (2/(1+x²)²) e^{−x²/2} dx
           ≥ √(2/π) (t/(1+t²)) e^{−t²/2}.


3.48 For the negative binomial

P(X = x+1) = C(r+x+1−1, x+1) p^r (1−p)^{x+1} = ((r+x)/(x+1)) (1−p) P(X = x).

For the hypergeometric

P(X = x+1) = { ((M−x)(K−x))/((x+1)(N−M−(K−x)+1)) P(X = x)   if x < K, x < M, x ≥ M−(N−K);
               C(M, x+1)C(N−M, K−x−1)/C(N, K)                if x = M−(N−K)−1;
               0                                              otherwise. }

3.49 a.


E(g(X)(X − αβ)) = ∫_0^∞ g(x)(x − αβ) (1/(Γ(α)β^α)) x^{α−1} e^{−x/β} dx.

Let u = g(x), du = g′(x)dx, dv = (x − αβ)x^{α−1}e^{−x/β} dx, v = −βx^α e^{−x/β}. Then

E(g(X)(X − αβ)) = (1/(Γ(α)β^α)) ( [−g(x)βx^α e^{−x/β}]_0^∞ + β ∫_0^∞ g′(x) x^α e^{−x/β} dx ).

Assuming g(x) to be differentiable, E|Xg′(X)| < ∞ and lim_{x→∞} g(x)x^α e^{−x/β} = 0, the first term is zero, and the second term is βE(Xg′(X)).
b.

E[g(X)(β − (α−1)(1−X)/X)] = (Γ(α+β)/(Γ(α)Γ(β))) ∫_0^1 g(x)(β − (α−1)(1−x)/x) x^{α−1}(1−x)^{β−1} dx.

Let u = g(x) and dv = (β − (α−1)(1−x)/x) x^{α−1}(1−x)^{β−1} dx, so v = −x^{α−1}(1−x)^β. The expectation is

(Γ(α+β)/(Γ(α)Γ(β))) ( [−g(x)x^{α−1}(1−x)^β]_0^1 + ∫_0^1 (1−x) g′(x) x^{α−1}(1−x)^{β−1} dx ) = E((1−X)g′(X)),

assuming the first term is zero and the integral exists.


3.50 The proof is similar to that of part (a) of Theorem 3.6.8. For X ~ negative binomial(r, p),

Eg(X) = Σ_{x=0}^∞ g(x) C(r+x−1, x) p^r (1−p)^x
      = Σ_{y=1}^∞ g(y−1) C(r+y−2, y−1) p^r (1−p)^{y−1}   (set y = x+1)
      = Σ_{y=1}^∞ (y/(r+y−1)) g(y−1) C(r+y−1, y) p^r (1−p)^{y−1}
      = Σ_{y=0}^∞ (y/(r+y−1)) (g(y−1)/(1−p)) C(r+y−1, y) p^r (1−p)^y   (the summand is zero at y = 0)
      = E( (X/(r+X−1)) (g(X−1)/(1−p)) ),

where in the third equality we use the fact that C(r+y−2, y−1) = (y/(r+y−1)) C(r+y−1, y).
Chapter 4

Multiple Random Variables

4.1 Since the distribution is uniform, the easiest way to calculate these probabilities is as the ratio of areas, the total area being 4.
a. The circle x² + y² ≤ 1 has area π, so P(X² + Y² ≤ 1) = π/4.
b. The area below the line y = 2x is half of the area of the square, so P(2X − Y > 0) = 2/4 = 1/2.
c. Clearly P(|X + Y| < 2) = 1.
4.2 These are all fundamental properties of integrals. The proof is the same as for Theorem 2.2.5 with bivariate integrals replacing univariate integrals.
4.3 For the experiment of tossing two fair dice, each of the points in the 36-point sample space are equally likely. So the probability of an event is (number of points in the event)/36. The given probabilities are obtained by noting the following equivalences of events.
P({X = 0, Y = 0}) = P({(1,1), (2,1), (1,3), (2,3), (1,5), (2,5)}) = 6/36 = 1/6
P({X = 0, Y = 1}) = P({(1,2), (2,2), (1,4), (2,4), (1,6), (2,6)}) = 6/36 = 1/6
P({X = 1, Y = 0}) = P({(3,1), (4,1), (5,1), (6,1), (3,3), (4,3), (5,3), (6,3), (3,5), (4,5), (5,5), (6,5)}) = 12/36 = 1/3
P({X = 1, Y = 1}) = P({(3,2), (4,2), (5,2), (6,2), (3,4), (4,4), (5,4), (6,4), (3,6), (4,6), (5,6), (6,6)}) = 12/36 = 1/3
4.4 a. ∫_0^1 ∫_0^2 C(x + 2y) dx dy = 4C = 1, thus C = 1/4.
b. f_X(x) = ∫_0^1 (1/4)(x + 2y) dy = (1/4)(x + 1), 0 < x < 2; 0 otherwise.
c. F_XY(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du. The way this integral is calculated depends on the values of x and y. For example, for 0 < x < 2 and 0 < y < 1,

F_XY(x, y) = ∫_0^x ∫_0^y (1/4)(u + 2v) dv du = x²y/8 + y²x/4.

But for 0 < x < 2 and 1 ≤ y,

F_XY(x, y) = ∫_0^x ∫_0^1 (1/4)(u + 2v) dv du = x²/8 + x/4.

The complete definition of F_XY is

F_XY(x, y) = { 0               if x ≤ 0 or y ≤ 0;
               x²y/8 + y²x/4   if 0 < x < 2 and 0 < y < 1;
               y/2 + y²/2      if 2 ≤ x and 0 < y < 1;
               x²/8 + x/4      if 0 < x < 2 and 1 ≤ y;
               1               if 2 ≤ x and 1 ≤ y. }

d. The function z = g(x) = 9/(x+1)² is monotone on 0 < x < 2, so use Theorem 2.1.5 to obtain f_Z(z) = 9/(8z²), 1 < z < 9.

4.5 a. P(X > Y) = ∫_0^1 ∫_{√y}^1 (x + y) dx dy = 7/20.
b. P(X² < Y < X) = ∫_0^1 ∫_y^{√y} 2x dx dy = 1/6.
4.6 Let A = time that A arrives and B = time that B arrives. The random variables A and B are independent uniform(1, 2) variables. So their joint pdf is uniform on the square (1, 2) × (1, 2). Let X = amount of time A waits for B. Then, F_X(x) = P(X ≤ x) = 0 for x < 0, and F_X(x) = P(X ≤ x) = 1 for 1 ≤ x. For x = 0, we have

F_X(0) = P(X ≤ 0) = P(X = 0) = P(B ≤ A) = ∫_1^2 ∫_1^a 1 db da = 1/2.

And for 0 < x < 1,

F_X(x) = P(X ≤ x) = 1 − P(X > x) = 1 − P(B − A > x) = 1 − ∫_1^{2−x} ∫_{a+x}^2 1 db da = 1/2 + x − x²/2.

4.7 We will measure time in minutes past 8 A.M. So X ~ uniform(0, 30), Y ~ uniform(40, 50) and the joint pdf is 1/300 on the rectangle (0, 30) × (40, 50).

P(arrive before 9 A.M.) = P(X + Y < 60) = ∫_{40}^{50} ∫_0^{60−y} (1/300) dx dy = 1/2.

4.9
P(a ≤ X ≤ b, c ≤ Y ≤ d)
 = P(X ≤ b, c ≤ Y ≤ d) − P(X ≤ a, c ≤ Y ≤ d)
 = P(X ≤ b, Y ≤ d) − P(X ≤ b, Y ≤ c) − P(X ≤ a, Y ≤ d) + P(X ≤ a, Y ≤ c)
 = F(b, d) − F(b, c) − F(a, d) + F(a, c)
 = F_X(b)F_Y(d) − F_X(b)F_Y(c) − F_X(a)F_Y(d) + F_X(a)F_Y(c)
 = P(X ≤ b)[P(Y ≤ d) − P(Y ≤ c)] − P(X ≤ a)[P(Y ≤ d) − P(Y ≤ c)]
 = P(X ≤ b)P(c ≤ Y ≤ d) − P(X ≤ a)P(c ≤ Y ≤ d)
 = P(a ≤ X ≤ b)P(c ≤ Y ≤ d).
4.10 a. The marginal distribution of X is P(X = 1) = P(X = 3) = 1/4 and P(X = 2) = 1/2. The marginal distribution of Y is P(Y = 2) = P(Y = 3) = P(Y = 4) = 1/3. But

P(X = 2, Y = 3) = 0 ≠ (1/2)(1/3) = P(X = 2)P(Y = 3).

Therefore the random variables are not independent.
b. The distribution that satisfies P(U = x, V = y) = P(U = x)P(V = y) where U ~ X and V ~ Y is

         U = 1   U = 2   U = 3
  V = 2   1/12    1/6     1/12
  V = 3   1/12    1/6     1/12
  V = 4   1/12    1/6     1/12

4.11 The support of the distribution of (U, V ) is {(u, v ) : u = 1, 2, . . . ; v = u + 1, u + 2, . . .}. This is not a cross-product set. Therefore, U and V are not independent. More simply, if we know
U = u, then we know V > u.
4.12 One interpretation of “a stick is broken at random into three pieces” is this. Suppose the length of the stick is 1. Let X and Y denote the two points where the stick is broken. Let X and Y both have uniform(0, 1) distributions, and assume X and Y are independent. Then the joint distribution of X and Y is uniform on the unit square. In order for the three pieces to form a triangle, the sum of the lengths of any two pieces must be greater than the length of the third. This will be true if and only if the length of each piece is less than 1/2. To calculate the probability of this, we need to identify the sample points (x, y ) such that the length of each piece is less than 1/2. If y > x, this will be true if x < 1/2, y − x < 1/2 and 1 − y < 1/2.
These three inequalities define the triangle with vertices (0, 1/2), (1/2, 1/2) and (1/2, 1). (Draw a graph of this set.) Because of the uniform distribution, the probability that (X, Y ) falls in the triangle is the area of the triangle, which is 1/8. Similarly, if x > y , each piece will have length less than 1/2 if y < 1/2, x − y < 1/2 and 1 − x < 1/2. These three inequalities define the triangle with vertices (1/2, 0), (1/2, 1/2) and (1, 1/2). The probability that (X, Y ) is in this triangle is also 1/8. So the probability that the pieces form a triangle is 1/8 + 1/8 = 1/4.
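The geometric answer 1/4 is simple to confirm by simulation (Python for illustration; the seed and sample size are arbitrary choices):

```python
import random

random.seed(1)
trials = 200000
hits = 0
for _ in range(trials):
    x, y = random.random(), random.random()      # the two break points
    a, b = min(x, y), max(x, y)
    p1, p2, p3 = a, b - a, 1 - b                  # the three piece lengths
    if p1 < 0.5 and p2 < 0.5 and p3 < 0.5:        # triangle inequality holds
        hits += 1
estimate = hits / trials
```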
4.13 a.
E(Y − g(X))² = E((Y − E(Y|X)) + (E(Y|X) − g(X)))²
            = E(Y − E(Y|X))² + E(E(Y|X) − g(X))² + 2E[(Y − E(Y|X))(E(Y|X) − g(X))].

The cross term can be shown to be zero by iterating the expectation. Thus

E(Y − g(X))² = E(Y − E(Y|X))² + E(E(Y|X) − g(X))² ≥ E(Y − E(Y|X))², for all g(·).

The choice g(X) = E(Y|X) will give equality.
b. Equation (2.2.3) is the special case of (a) where we take the random variable X to be a constant. Then, g(X) is a constant, say b, and E(Y|X) = EY.
4.15 We will find the conditional distribution of $Y|X+Y$. The derivation of the conditional distribution of $X|X+Y$ is similar. Let $U = X+Y$ and $V = Y$. In Example 4.3.1 we found the joint pmf of $(U,V)$. Note that for fixed $u$, $f(u,v)$ is positive for $v = 0,\ldots,u$. Therefore the conditional pmf is
$$
f(v|u) = \frac{f(u,v)}{f(u)}
= \frac{\frac{\theta^{u-v}e^{-\theta}}{(u-v)!}\,\frac{\lambda^{v}e^{-\lambda}}{v!}}{\frac{(\theta+\lambda)^{u}e^{-(\theta+\lambda)}}{u!}}
= \binom{u}{v}\Bigl(\frac{\lambda}{\theta+\lambda}\Bigr)^{v}\Bigl(\frac{\theta}{\theta+\lambda}\Bigr)^{u-v},
\qquad v = 0,\ldots,u.
$$
That is, $V|U \sim \mathrm{binomial}(U, \lambda/(\theta+\lambda))$.
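This conditional-binomial fact can be checked by simulation; a Python sketch (the specific values $\theta = 2$, $\lambda = 3$, $u = 5$ are illustrative choices, not from the exercise):

```python
import math
import random

random.seed(2)

def poisson(mu):
    # Knuth's method: multiply uniforms until the product drops below exp(-mu).
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

theta, lam, u = 2.0, 3.0, 5
vs = []
while len(vs) < 20_000:
    x, y = poisson(theta), poisson(lam)
    if x + y == u:          # condition on the observed total
        vs.append(y)

# V | U = u should be binomial(u, lam/(theta+lam)), so its mean is u*lam/(theta+lam).
mean_v = sum(vs) / len(vs)
print(mean_v)  # close to 5 * 3/5 = 3.0
```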
4.16 a. The support of the distribution of $(U,V)$ is $\{(u,v): u = 1,2,\ldots;\ v = 0,\pm 1,\pm 2,\ldots\}$.
If $V > 0$, then $X > Y$. So for $v = 1,2,\ldots$, the joint pmf is
$$
f_{U,V}(u,v) = P(U=u, V=v) = P(Y=u, X=u+v) = p(1-p)^{u+v-1}\,p(1-p)^{u-1} = p^2(1-p)^{2u+v-2}.
$$
If $V < 0$, then $X < Y$. So for $v = -1,-2,\ldots$, the joint pmf is
$$
f_{U,V}(u,v) = P(U=u, V=v) = P(X=u, Y=u-v) = p(1-p)^{u-1}\,p(1-p)^{u-v-1} = p^2(1-p)^{2u-v-2}.
$$
If $V = 0$, then $X = Y$. So for $v = 0$, the joint pmf is
$$
f_{U,V}(u,0) = P(U=u, V=0) = P(X=Y=u) = p(1-p)^{u-1}\,p(1-p)^{u-1} = p^2(1-p)^{2u-2}.
$$
In all three cases, we can write the joint pmf as
$$
f_{U,V}(u,v) = p^2(1-p)^{2u+|v|-2} = \bigl[p^2(1-p)^{2u-2}\bigr](1-p)^{|v|},
\qquad u = 1,2,\ldots;\ v = 0,\pm 1,\pm 2,\ldots.
$$
Since the joint pmf factors into a function of $u$ and a function of $v$, $U$ and $V$ are independent.
b. The possible values of $Z$ are all the fractions $r/s$, where $r$ and $s$ are positive integers and $r < s$. Consider one such value $r/s$, where the fraction is in reduced form, i.e., $r$ and $s$ have no common factors. We need to identify all the pairs $(x,y)$ of positive integers such that $x/(x+y) = r/s$. All such pairs are $(ir, i(s-r))$, $i = 1,2,\ldots$. Therefore
$$
P\Bigl(Z = \frac{r}{s}\Bigr) = \sum_{i=1}^{\infty} P(X = ir,\ Y = i(s-r))
= \sum_{i=1}^{\infty} p(1-p)^{ir-1}\,p(1-p)^{i(s-r)-1}
= \frac{p^2}{(1-p)^2}\sum_{i=1}^{\infty}\bigl((1-p)^{s}\bigr)^{i}
$$
$$
= \frac{p^2}{(1-p)^2}\,\frac{(1-p)^{s}}{1-(1-p)^{s}}
= \frac{p^2(1-p)^{s-2}}{1-(1-p)^{s}}.
$$
c.
$$
P(X=x,\ X+Y=t) = P(X=x,\ Y=t-x) = P(X=x)P(Y=t-x) = p^2(1-p)^{t-2}.
$$
4.17 a. $P(Y = i+1) = \int_i^{i+1} e^{-x}\,dx = e^{-i}(1-e^{-1})$, which is geometric with $p = 1-e^{-1}$.
b. Since $Y \ge 5$ if and only if $X \ge 4$,
$$
P(X-4 \le x \mid Y \ge 5) = P(X-4 \le x \mid X \ge 4) = P(X \le x) = 1-e^{-x},
$$
since the exponential distribution is memoryless.
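Both facts, that the discretized exponential is geometric and that the exponential is memoryless, are easy to check numerically; a Python sketch:

```python
import math
import random

random.seed(3)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]

# P(Y = 1), where Y = i+1 when i <= X < i+1, should be 1 - exp(-1).
p1 = sum(1 for x in xs if x < 1) / n
print(p1)  # about 0.632

# Memorylessness: P(X - 4 <= 1 | X >= 4) should equal P(X <= 1).
tail = [x - 4 for x in xs if x >= 4]
p_cond = sum(1 for t in tail if t <= 1) / len(tail)
print(p_cond)  # also about 0.632
```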
4.18 We need to show $f(x,y)$ is nonnegative and integrates to 1. $f(x,y) \ge 0$, because the numerator is nonnegative since $g(x) \ge 0$, and the denominator is positive for all $x > 0$, $y > 0$. Changing to polar coordinates, $x = r\cos\theta$ and $y = r\sin\theta$, we obtain
$$
\int_0^{\infty}\!\!\int_0^{\infty} f(x,y)\,dx\,dy
= \int_0^{\pi/2}\!\!\int_0^{\infty} \frac{2g(r)}{\pi r}\,r\,dr\,d\theta
= \int_0^{\pi/2}\frac{2}{\pi}\int_0^{\infty} g(r)\,dr\,d\theta
= \int_0^{\pi/2}\frac{2}{\pi}\,d\theta = 1.
$$
4.19 a. Since $(X_1-X_2)/\sqrt{2} \sim \mathrm{n}(0,1)$, $(X_1-X_2)^2/2 \sim \chi_1^2$ (see Example 2.1.9).
b. Make the transformation $y_1 = \frac{x_1}{x_1+x_2}$, $y_2 = x_1+x_2$. Then $x_1 = y_1y_2$, $x_2 = y_2(1-y_1)$ and $|J| = y_2$. Then
$$
f(y_1,y_2) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\,y_1^{\alpha_1-1}(1-y_1)^{\alpha_2-1}\cdot\frac{1}{\Gamma(\alpha_1+\alpha_2)}\,y_2^{\alpha_1+\alpha_2-1}e^{-y_2},
$$
thus $Y_1 \sim \mathrm{beta}(\alpha_1,\alpha_2)$, $Y_2 \sim \mathrm{gamma}(\alpha_1+\alpha_2, 1)$, and they are independent.


4.20 a. This transformation is not one-to-one because you cannot determine the sign of $X_2$ from $Y_1$ and $Y_2$. So partition the support of $(X_1,X_2)$ into $A_0 = \{-\infty<x_1<\infty,\ x_2=0\}$, $A_1 = \{-\infty<x_1<\infty,\ x_2>0\}$ and $A_2 = \{-\infty<x_1<\infty,\ x_2<0\}$. The support of $(Y_1,Y_2)$ is $B = \{0<y_1<\infty,\ -1<y_2<1\}$. The inverse transformation from $B$ to $A_1$ is $x_1 = y_2\sqrt{y_1}$, $x_2 = \sqrt{y_1-y_1y_2^2}$, with Jacobian
$$
J_1 = \begin{vmatrix} \dfrac{y_2}{2\sqrt{y_1}} & \sqrt{y_1} \\[6pt] \dfrac{\sqrt{1-y_2^2}}{2\sqrt{y_1}} & \dfrac{-y_2\sqrt{y_1}}{\sqrt{1-y_2^2}} \end{vmatrix}
= -\frac{y_2^2}{2\sqrt{1-y_2^2}} - \frac{\sqrt{1-y_2^2}}{2}
= \frac{-1}{2\sqrt{1-y_2^2}}.
$$
The inverse transformation from $B$ to $A_2$ is $x_1 = y_2\sqrt{y_1}$, $x_2 = -\sqrt{y_1-y_1y_2^2}$, with $J_2 = -J_1$. From (4.3.6), $f_{Y_1,Y_2}(y_1,y_2)$ is the sum of two terms, both of which are the same in this case. Then
$$
f_{Y_1,Y_2}(y_1,y_2) = 2\cdot\frac{1}{2\pi\sigma^2}\,e^{-y_1/(2\sigma^2)}\cdot\frac{1}{2\sqrt{1-y_2^2}}
= \frac{1}{2\pi\sigma^2}\,\frac{e^{-y_1/(2\sigma^2)}}{\sqrt{1-y_2^2}},
\qquad 0<y_1<\infty,\ -1<y_2<1.
$$
b. We see in the above expression that the joint pdf factors into a function of y1 and a function of y2 . So Y1 and Y2 are independent. Y1 is the square of the distance from (X1 , X2 ) to the origin. Y2 is the cosine of the angle between the positive x1 -axis and the line from
(X1 , X2 ) to the origin. So independence says the distance from the origin is independent of the orientation (as measured by the angle).
4.21 Since $R^2$ and $\theta$ are independent, the joint pdf of $T = R^2$ and $\theta$ is
$$
f_{T,\theta}(t,\theta) = \frac{1}{4\pi}e^{-t/2}, \qquad 0<t<\infty,\ 0<\theta<2\pi.
$$
Make the transformation $x = \sqrt{t}\cos\theta$, $y = \sqrt{t}\sin\theta$. Then $t = x^2+y^2$, $\theta = \tan^{-1}(y/x)$, and
$$
J = \begin{vmatrix} 2x & 2y \\[2pt] \dfrac{-y}{x^2+y^2} & \dfrac{x}{x^2+y^2} \end{vmatrix} = 2.
$$
Therefore
$$
f_{X,Y}(x,y) = \frac{2}{4\pi}\,e^{-\frac{1}{2}(x^2+y^2)} = \frac{1}{2\pi}\,e^{-\frac{1}{2}(x^2+y^2)},
\qquad -\infty<x,\ y<\infty.
$$
So X and Y are independent standard normals.
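This construction (an exponential radius-squared plus a uniform angle, essentially the Box–Muller idea) can be checked by simulating it and looking at the sample moments; a Python sketch:

```python
import math
import random

random.seed(4)
n = 100_000
xs, ys = [], []
for _ in range(n):
    t = random.expovariate(0.5)          # T = R^2 ~ chi-squared with 2 df (mean 2)
    theta = random.uniform(0, 2 * math.pi)
    r = math.sqrt(t)
    xs.append(r * math.cos(theta))
    ys.append(r * math.sin(theta))

# X and Y should each be standard normal and uncorrelated.
mx = sum(xs) / n
vx = sum(x * x for x in xs) / n
cxy = sum(x * y for x, y in zip(xs, ys)) / n
print(round(mx, 3), round(vx, 3), round(cxy, 3))  # near 0, 1, 0
```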
4.23 a. Let $y = v$, $x = u/y = u/v$. Then
$$
J = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[6pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix}
= \begin{vmatrix} \dfrac{1}{v} & -\dfrac{u}{v^2} \\[6pt] 0 & 1 \end{vmatrix} = \frac{1}{v},
$$
and
$$
f_{U,V}(u,v) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\Bigl(\frac{u}{v}\Bigr)^{\alpha-1}\Bigl(1-\frac{u}{v}\Bigr)^{\beta-1}\,
\frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha+\beta)\Gamma(\gamma)}\,v^{\alpha+\beta-1}(1-v)^{\gamma-1}\,\frac{1}{v},
\qquad 0<u<v<1.
$$
Collecting the powers of $v$ (they cancel exactly, since $(1-u/v)^{\beta-1} = (v-u)^{\beta-1}v^{-(\beta-1)}$), and substituting $y = \frac{v-u}{1-u}$, $dy = \frac{dv}{1-u}$,
$$
f_U(u) = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}\,u^{\alpha-1}\int_u^1 (v-u)^{\beta-1}(1-v)^{\gamma-1}\,dv
= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}\,u^{\alpha-1}(1-u)^{\beta+\gamma-1}\int_0^1 y^{\beta-1}(1-y)^{\gamma-1}\,dy
$$
$$
= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}\,u^{\alpha-1}(1-u)^{\beta+\gamma-1}\,\frac{\Gamma(\beta)\Gamma(\gamma)}{\Gamma(\beta+\gamma)}
= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta+\gamma)}\,u^{\alpha-1}(1-u)^{\beta+\gamma-1},
\qquad 0<u<1.
$$
Thus $U \sim \mathrm{beta}(\alpha, \beta+\gamma)$.

b. Let x = uv , y = u then v ∂x
∂u
∂y
∂u

J=

fU,V (u, v ) =

∂x
∂v
∂x
∂v

1 1/2 −1/2 u 2v
1 −1/2 −1/2 v u
2

=

1 1/2 −1/2 v 2u
− 1 u1/2 v −3/2
2


Γ(α + β + γ ) √ α−1
( uv
(1 − uv )β −1
Γ(α)Γ(β )Γ(γ )

u v =

1
.
2v

α+β −1

1−

u v γ −1

1
.
2v

1
The set {0 < x < 1, 0 < y < 1} is mapped onto the set {0 < u < v < u , 0 < u < 1}. Then,

fU (u)
1/u

=

fU,V (u, v )dv u =

Γ(α + β + γ ) α−1 u (1−u)β +γ −1
Γ(α)Γ(β )Γ(γ )

1/u u √
1 − uv
1−u

β −1

1 − u/v
1−u

γ −1

( u/v )β dv. 2v (1 − u)

Call it A

√ u/v u/v −u
To simplify, let z = 1−u . Then v = u ⇒ z = 1, v = 1/u ⇒ z = 0 and dz = − 2(1−u)v dv .
Thus,
fU (u)

=A
=
=

z β −1 (1 − z )γ −1 dz

( kernel of beta(β, γ ))

Γ(α+β +γ ) α−1
Γ(β )Γ(γ ) u (1 − u)β +γ −1
Γ(α)Γ(β )Γ(γ )
Γ(β +γ )
Γ(α+β +γ ) α−1 u (1 − u)β +γ −1 ,
0 < u < 1.
Γ(α)Γ(β +γ )

That is, $U \sim \mathrm{beta}(\alpha, \beta+\gamma)$, as in a).
4.24 Let $z_1 = x+y$, $z_2 = \frac{x}{x+y}$; then $x = z_1z_2$, $y = z_1(1-z_2)$ and
$$
|J| = \begin{vmatrix} \dfrac{\partial x}{\partial z_1} & \dfrac{\partial x}{\partial z_2} \\[6pt] \dfrac{\partial y}{\partial z_1} & \dfrac{\partial y}{\partial z_2} \end{vmatrix}
= \begin{vmatrix} z_2 & z_1 \\ 1-z_2 & -z_1 \end{vmatrix} = z_1.
$$
The set $\{x>0,\ y>0\}$ is mapped onto the set $\{z_1>0,\ 0<z_2<1\}$.
$$
f_{Z_1,Z_2}(z_1,z_2) = \frac{1}{\Gamma(r)}(z_1z_2)^{r-1}e^{-z_1z_2}\cdot\frac{1}{\Gamma(s)}(z_1-z_1z_2)^{s-1}e^{-z_1+z_1z_2}\cdot z_1
$$
$$
= \frac{1}{\Gamma(r+s)}\,z_1^{r+s-1}e^{-z_1}\cdot\frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\,z_2^{r-1}(1-z_2)^{s-1},
\qquad z_1>0,\ 0<z_2<1.
$$
$f_{Z_1,Z_2}(z_1,z_2)$ can be factored into two densities. Therefore $Z_1$ and $Z_2$ are independent and $Z_1 \sim \mathrm{gamma}(r+s,1)$, $Z_2 \sim \mathrm{beta}(r,s)$.
4.25 For $X$ and $Z$ independent, and $Y = X+Z$, $f_{X,Y}(x,y) = f_X(x)f_Z(y-x)$. In Example 4.5.8,
$$
f_{X,Y}(x,y) = I_{(0,1)}(x)\,\tfrac{1}{10}\,I_{(0,1/10)}(y-x).
$$
In Example 4.5.9, $Y = X^2+Z$ and
$$
f_{X,Y}(x,y) = f_X(x)f_Z(y-x^2) = \tfrac12\,I_{(-1,1)}(x)\,\tfrac{1}{10}\,I_{(0,1/10)}(y-x^2).
$$
4.26 a.
$$
P(Z\le z,\ W=0) = P(\min(X,Y)\le z,\ Y\le X) = P(Y\le z,\ Y\le X)
= \int_0^z\!\!\int_y^{\infty}\frac{1}{\lambda}e^{-x/\lambda}\,\frac{1}{\mu}e^{-y/\mu}\,dx\,dy
= \frac{\lambda}{\mu+\lambda}\Bigl[1-\exp\Bigl(-\Bigl(\frac{1}{\mu}+\frac{1}{\lambda}\Bigr)z\Bigr)\Bigr].
$$
Similarly,
$$
P(Z\le z,\ W=1) = P(\min(X,Y)\le z,\ X\le Y) = P(X\le z,\ X\le Y)
= \int_0^z\!\!\int_x^{\infty}\frac{1}{\lambda}e^{-x/\lambda}\,\frac{1}{\mu}e^{-y/\mu}\,dy\,dx
= \frac{\mu}{\mu+\lambda}\Bigl[1-\exp\Bigl(-\Bigl(\frac{1}{\mu}+\frac{1}{\lambda}\Bigr)z\Bigr)\Bigr].
$$
b.
$$
P(W=0) = P(Y\le X) = \int_0^{\infty}\!\!\int_y^{\infty}\frac{1}{\lambda}e^{-x/\lambda}\,\frac{1}{\mu}e^{-y/\mu}\,dx\,dy = \frac{\lambda}{\mu+\lambda},
\qquad
P(W=1) = 1-P(W=0) = \frac{\mu}{\mu+\lambda},
$$
$$
P(Z\le z) = P(Z\le z,\ W=0) + P(Z\le z,\ W=1) = 1-\exp\Bigl(-\Bigl(\frac{1}{\mu}+\frac{1}{\lambda}\Bigr)z\Bigr).
$$
Therefore, $P(Z\le z,\ W=i) = P(Z\le z)P(W=i)$, for $i = 0,1$, $z > 0$. So $Z$ and $W$ are independent.
4.27 From Theorem 4.2.14 we know $U \sim \mathrm{n}(\mu+\gamma, 2\sigma^2)$ and $V \sim \mathrm{n}(\mu-\gamma, 2\sigma^2)$. It remains to show that they are independent. Proceed as in Exercise 4.24.
$$
f_{X,Y}(x,y) = \frac{1}{2\pi\sigma^2}\,e^{-\frac{1}{2\sigma^2}[(x-\mu)^2+(y-\gamma)^2]}
\qquad(\text{by independence, } f_{X,Y} = f_Xf_Y).
$$
Let $u = x+y$, $v = x-y$; then $x = \frac12(u+v)$, $y = \frac12(u-v)$ and
$$
|J| = \begin{vmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{vmatrix} = \frac12.
$$
The set $\{-\infty<x<\infty,\ -\infty<y<\infty\}$ is mapped onto the set $\{-\infty<u<\infty,\ -\infty<v<\infty\}$. Therefore
$$
f_{U,V}(u,v) = \frac{1}{2\pi\sigma^2}\,e^{-\frac{1}{2\sigma^2}\bigl[\bigl(\frac{u+v}{2}-\mu\bigr)^2+\bigl(\frac{u-v}{2}-\gamma\bigr)^2\bigr]}\cdot\frac12
= \Bigl[g(u)\,e^{-\frac{1}{2(2\sigma^2)}(u-(\mu+\gamma))^2}\Bigr]\cdot\Bigl[h(v)\,e^{-\frac{1}{2(2\sigma^2)}(v-(\mu-\gamma))^2}\Bigr],
$$
after completing the squares in $u$ and $v$ in the exponent. By the factorization theorem, $U$ and $V$ are independent.

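The independence of $Z = \min(X,Y)$ and $W$ in Exercise 4.26 can also be seen by simulation; a Python sketch (the scale parameters $\lambda = 2$, $\mu = 3$ are illustrative):

```python
import random

random.seed(5)
lam, mu = 2.0, 3.0       # scale parameters of X and Y
n = 100_000
w_count = z_small = both = 0
for _ in range(n):
    x = random.expovariate(1 / lam)
    y = random.expovariate(1 / mu)
    w = 1 if x <= y else 0
    w_count += w
    if min(x, y) <= 1.0:
        z_small += 1
        both += w

p_w = w_count / n            # should be mu/(mu+lam) = 0.6
p_z = z_small / n            # P(Z <= 1)
p_joint = both / n           # should be close to p_w * p_z if Z, W independent
print(round(p_w, 3), round(p_joint - p_w * p_z, 3))
```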

4.29 a. $\frac{X}{Y} = \frac{R\cos\theta}{R\sin\theta} = \cot\theta$. Let $Z = \cot\theta$, $A_1 = (0,\pi)$, $g_1(\theta) = \cot\theta$, $g_1^{-1}(z) = \cot^{-1}z$; $A_2 = (\pi,2\pi)$, $g_2(\theta) = \cot\theta$, $g_2^{-1}(z) = \pi+\cot^{-1}z$. By Theorem 2.1.8,
$$
f_Z(z) = \frac{1}{2\pi}\Bigl|\frac{-1}{1+z^2}\Bigr| + \frac{1}{2\pi}\Bigl|\frac{-1}{1+z^2}\Bigr| = \frac{1}{\pi}\,\frac{1}{1+z^2},
\qquad -\infty<z<\infty.
$$
b. $XY = R^2\cos\theta\sin\theta$, so $2XY = R^2\sin 2\theta$ and $\frac{2XY}{R} = R\sin 2\theta$. Since $R = \sqrt{X^2+Y^2}$, $\frac{2XY}{\sqrt{X^2+Y^2}} = R\sin 2\theta$, which is distributed as $R\sin\theta$ because $\sin 2\theta$ has the same distribution as $\sin\theta$: the function $\sin 2\theta$ repeats the values of $\sin\theta$ over each of the two intervals $(0,\pi)$ and $(\pi,2\pi)$, each chosen with probability $\frac12$, so $f_{\sin 2\theta} = \frac12 f_{\sin\theta} + \frac12 f_{\sin\theta} = f_{\sin\theta}$. Therefore $\frac{2XY}{\sqrt{X^2+Y^2}}$ has the same distribution as $Y = R\sin\theta$. In addition, it has the same distribution as $X = R\cos\theta$, since $\sin\theta$ has the same distribution as $\cos\theta$. To see this, consider $W = \cos\theta$ and $V = \sin\theta$, where $\theta \sim \mathrm{uniform}(0,2\pi)$. For $W = \cos\theta$, let $A_1 = (0,\pi)$, $g_1(\theta) = \cos\theta$, $g_1^{-1}(w) = \cos^{-1}w$; $A_2 = (\pi,2\pi)$, $g_2(\theta) = \cos\theta$, $g_2^{-1}(w) = 2\pi-\cos^{-1}w$. By Theorem 2.1.8,
$$
f_W(w) = \frac{1}{2\pi}\Bigl|\frac{-1}{\sqrt{1-w^2}}\Bigr| + \frac{1}{2\pi}\Bigl|\frac{1}{\sqrt{1-w^2}}\Bigr| = \frac{1}{\pi\sqrt{1-w^2}},
\qquad -1\le w\le 1.
$$
For $V = \sin\theta$, first consider the interval $(\frac{\pi}{2},\frac{3\pi}{2})$, with $g_1(\theta) = \sin\theta$, $g_1^{-1}(v) = \pi-\sin^{-1}v$; there
$$
f_V(v) = \frac{1}{\pi\sqrt{1-v^2}}, \qquad -1\le v\le 1.
$$
Second, on the set $\{(0,\frac{\pi}{2})\cup(\frac{3\pi}{2},2\pi)\}$ the function $\sin\theta$ takes the same values as it does on $(-\frac{\pi}{2},\frac{\pi}{2})$, so the distribution of $V$ there is also $\frac{1}{\pi\sqrt{1-v^2}}$, $-1\le v\le 1$. On $(0,2\pi)$ each of these two sets has probability $\frac12$ of being chosen. Therefore
$$
f_V(v) = \frac12\,\frac{1}{\pi\sqrt{1-v^2}} + \frac12\,\frac{1}{\pi\sqrt{1-v^2}} = \frac{1}{\pi\sqrt{1-v^2}},
\qquad -1\le v\le 1.
$$
Thus $W$ and $V$ have the same distribution.
Let $X$ and $Y$ be iid $\mathrm{n}(0,1)$. Then $X^2+Y^2 \sim \chi_2^2$ is a positive random variable. Therefore, with $X = R\cos\theta$ and $Y = R\sin\theta$, $R = \sqrt{X^2+Y^2}$ is a positive random variable and $\theta = \tan^{-1}(Y/X) \sim \mathrm{uniform}(0,2\pi)$. Thus $\frac{2XY}{\sqrt{X^2+Y^2}} \sim X \sim \mathrm{n}(0,1)$.
4.30 a.
$$
\mathrm{E}Y = \mathrm{E}\{\mathrm{E}(Y|X)\} = \mathrm{E}X = \frac12.
$$
$$
\mathrm{Var}\,Y = \mathrm{Var}(\mathrm{E}(Y|X)) + \mathrm{E}(\mathrm{Var}(Y|X)) = \mathrm{Var}\,X + \mathrm{E}X^2 = \frac{1}{12}+\frac13 = \frac{5}{12}.
$$
$$
\mathrm{E}XY = \mathrm{E}[\mathrm{E}(XY|X)] = \mathrm{E}[X\,\mathrm{E}(Y|X)] = \mathrm{E}X^2 = \frac13,
\qquad
\mathrm{Cov}(X,Y) = \mathrm{E}XY-\mathrm{E}X\,\mathrm{E}Y = \frac13-\Bigl(\frac12\Bigr)^2 = \frac{1}{12}.
$$
b. The quick proof is to note that the distribution of $Y/X$ given $X = x$ is $\mathrm{n}(1,1)$ for every $x$, hence $Y/X$ is independent of $X$. The bivariate transformation $t = y/x$, $u = x$ will also show that the joint density factors.
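The moment calculations in 4.30 are easy to confirm by simulation; a Python sketch, assuming (as in the solution above) that $X \sim \mathrm{uniform}(0,1)$ and $Y|X = x \sim \mathrm{n}(x, x^2)$:

```python
import random

random.seed(6)
n = 200_000
pairs = []
for _ in range(n):
    x = random.random()
    y = random.gauss(x, x)   # Y | X = x ~ n(x, x^2): mean x, std dev x
    pairs.append((x, y))

ex = sum(x for x, _ in pairs) / n
ey = sum(y for _, y in pairs) / n
vy = sum((y - ey) ** 2 for _, y in pairs) / n
cxy = sum(x * y for x, y in pairs) / n - ex * ey
print(round(ey, 3), round(vy, 3), round(cxy, 3))  # near 0.5, 5/12, 1/12
```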

4.31 a.
$$
\mathrm{E}Y = \mathrm{E}\{\mathrm{E}(Y|X)\} = \mathrm{E}(nX) = \frac{n}{2},
$$
$$
\mathrm{Var}\,Y = \mathrm{Var}(\mathrm{E}(Y|X)) + \mathrm{E}(\mathrm{Var}(Y|X)) = \mathrm{Var}(nX) + \mathrm{E}\,nX(1-X) = \frac{n^2}{12}+\frac{n}{6}.
$$
b.
$$
P(Y=y,\ X\le x) = \int_0^x \binom{n}{y}t^{y}(1-t)^{n-y}\,dt, \qquad y = 0,1,\ldots,n,\ 0<x<1.
$$
c.
$$
P(Y=y) = \binom{n}{y}\frac{\Gamma(y+1)\Gamma(n-y+1)}{\Gamma(n+2)} = \frac{1}{n+1}, \qquad y = 0,1,\ldots,n.
$$

4.32 a. The pmf of $Y$, for $y = 0,1,\ldots$, is
$$
f_Y(y) = \int_0^{\infty} f_Y(y|\lambda)f_{\Lambda}(\lambda)\,d\lambda
= \int_0^{\infty}\frac{\lambda^{y}e^{-\lambda}}{y!}\,\frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,\lambda^{\alpha-1}e^{-\lambda/\beta}\,d\lambda
= \frac{1}{y!\,\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty}\lambda^{(y+\alpha)-1}\exp\Bigl(\frac{-\lambda}{\beta/(1+\beta)}\Bigr)\,d\lambda
$$
$$
= \frac{1}{y!\,\Gamma(\alpha)\beta^{\alpha}}\,\Gamma(y+\alpha)\Bigl(\frac{\beta}{1+\beta}\Bigr)^{y+\alpha}.
$$
If $\alpha$ is a positive integer,
$$
f_Y(y) = \binom{y+\alpha-1}{y}\Bigl(\frac{\beta}{1+\beta}\Bigr)^{y}\Bigl(\frac{1}{1+\beta}\Bigr)^{\alpha},
$$
the negative binomial$(\alpha, 1/(1+\beta))$ pmf. Then
$$
\mathrm{E}Y = \mathrm{E}(\mathrm{E}(Y|\Lambda)) = \mathrm{E}\Lambda = \alpha\beta,
$$
$$
\mathrm{Var}\,Y = \mathrm{Var}(\mathrm{E}(Y|\Lambda)) + \mathrm{E}(\mathrm{Var}(Y|\Lambda)) = \mathrm{Var}\,\Lambda + \mathrm{E}\Lambda = \alpha\beta^2+\alpha\beta = \alpha\beta(\beta+1).
$$
b. For $y = 0,1,\ldots$, we have
$$
P(Y=y|\lambda) = \sum_{n=y}^{\infty} P(Y=y|N=n,\lambda)P(N=n|\lambda)
= \sum_{n=y}^{\infty}\binom{n}{y}p^{y}(1-p)^{n-y}\,\frac{e^{-\lambda}\lambda^{n}}{n!}
$$
$$
= \frac{e^{-\lambda}}{y!}\Bigl(\frac{p}{1-p}\Bigr)^{y}\sum_{n=y}^{\infty}\frac{[(1-p)\lambda]^{n}}{(n-y)!}
= \frac{e^{-\lambda}}{y!}\Bigl(\frac{p}{1-p}\Bigr)^{y}\sum_{m=0}^{\infty}\frac{[(1-p)\lambda]^{m+y}}{m!}
\qquad(\text{let } m = n-y)
$$
$$
= \frac{e^{-\lambda}}{y!}\,(p\lambda)^{y}\,e^{(1-p)\lambda} = \frac{(p\lambda)^{y}e^{-p\lambda}}{y!},
$$
the Poisson$(p\lambda)$ pmf. Thus $Y|\Lambda \sim \mathrm{Poisson}(p\Lambda)$. Now calculations like those in a) yield the pmf of $Y$, for $y = 0,1,\ldots$:
$$
f_Y(y) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha)\,y!\,(p\beta)^{\alpha}}\Bigl(\frac{p\beta}{1+p\beta}\Bigr)^{y+\alpha}.
$$
Again, if $\alpha$ is a positive integer, $Y \sim$ negative binomial$(\alpha, 1/(1+p\beta))$.
4.33 We can show that $H$ has a negative binomial distribution by computing the mgf of $H$:
$$
\mathrm{E}e^{Ht} = \mathrm{E}\,\mathrm{E}\bigl(e^{Ht}\mid N\bigr) = \mathrm{E}\,\mathrm{E}\bigl(e^{(X_1+\cdots+X_N)t}\mid N\bigr) = \mathrm{E}\bigl[\bigl(\mathrm{E}\,e^{X_1t}\bigr)^{N}\bigr],
$$
because, by Theorem 4.6.7, the mgf of a sum of independent random variables is equal to the product of the individual mgfs. Now,
$$
\mathrm{E}\,e^{X_1t} = \sum_{x_1=1}^{\infty} e^{x_1t}\,\frac{-1}{\log p}\,\frac{(1-p)^{x_1}}{x_1}
= \frac{-1}{\log p}\sum_{x_1=1}^{\infty}\frac{(e^{t}(1-p))^{x_1}}{x_1}
= \frac{-1}{\log p}\bigl(-\log\{1-e^{t}(1-p)\}\bigr)
= \frac{\log\{1-e^{t}(1-p)\}}{\log p}.
$$
Then
$$
\mathrm{E}\Bigl[\Bigl(\frac{\log\{1-e^{t}(1-p)\}}{\log p}\Bigr)^{N}\Bigr]
= \sum_{n=0}^{\infty}\Bigl(\frac{\log\{1-e^{t}(1-p)\}}{\log p}\Bigr)^{n}\frac{e^{-\lambda}\lambda^{n}}{n!}
\qquad(\text{since } N \sim \mathrm{Poisson}(\lambda))
$$
$$
= e^{-\lambda}\,e^{\lambda\log(1-e^{t}(1-p))/\log p}\sum_{n=0}^{\infty}\frac{e^{-\lambda\log(1-e^{t}(1-p))/\log p}\,\bigl[\lambda\log(1-e^{t}(1-p))/\log p\bigr]^{n}}{n!}.
$$
The sum equals 1. It is the sum of a Poisson$\bigl([\lambda\log(1-e^{t}(1-p))]/[\log p]\bigr)$ pmf. Therefore,
$$
\mathrm{E}(e^{Ht}) = e^{-\lambda}\,e^{\lambda\log(1-e^{t}(1-p))/\log p}
= \bigl(e^{\log p}\bigr)^{-\lambda/\log p}\bigl(1-e^{t}(1-p)\bigr)^{\lambda/\log p}
= \Bigl(\frac{p}{1-e^{t}(1-p)}\Bigr)^{-\lambda/\log p}.
$$
This is the mgf of a negative binomial$(r,p)$, with $r = -\lambda/\log p$, if $r$ is an integer.
4.34 a.
$$
P(Y=y) = \int_0^1 P(Y=y|p)f_P(p)\,dp
= \int_0^1 \binom{n}{y}p^{y}(1-p)^{n-y}\,\frac{1}{B(\alpha,\beta)}\,p^{\alpha-1}(1-p)^{\beta-1}\,dp
$$
$$
= \binom{n}{y}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 p^{y+\alpha-1}(1-p)^{n+\beta-y-1}\,dp
= \binom{n}{y}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(y+\alpha)\Gamma(n+\beta-y)}{\Gamma(\alpha+n+\beta)},
\qquad y = 0,1,\ldots,n.
$$
b.
$$
P(X=x) = \int_0^1 P(X=x|p)f_P(p)\,dp
= \int_0^1 \binom{r+x-1}{x}p^{r}(1-p)^{x}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{\alpha-1}(1-p)^{\beta-1}\,dp
$$
$$
= \binom{r+x-1}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 p^{(r+\alpha)-1}(1-p)^{(x+\beta)-1}\,dp
= \binom{r+x-1}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(r+\alpha)\Gamma(x+\beta)}{\Gamma(r+x+\alpha+\beta)},
\qquad x = 0,1,\ldots.
$$
Therefore,
$$
\mathrm{E}X = \mathrm{E}[\mathrm{E}(X|P)] = \mathrm{E}\Bigl[\frac{r(1-P)}{P}\Bigr] = \frac{r\beta}{\alpha-1},
$$
since
$$
\mathrm{E}\Bigl[\frac{1-P}{P}\Bigr]
= \int_0^1 \frac{1-p}{p}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{\alpha-1}(1-p)^{\beta-1}\,dp
= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 p^{(\alpha-1)-1}(1-p)^{(\beta+1)-1}\,dp
= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha-1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta)}
= \frac{\beta}{\alpha-1}.
$$
Also,
$$
\mathrm{Var}(X) = \mathrm{E}(\mathrm{Var}(X|P)) + \mathrm{Var}(\mathrm{E}(X|P))
= \mathrm{E}\Bigl[\frac{r(1-P)}{P^2}\Bigr] + \mathrm{Var}\Bigl(\frac{r(1-P)}{P}\Bigr)
= r\,\frac{\beta(\alpha+\beta-1)}{(\alpha-1)(\alpha-2)} + r^2\,\frac{\beta(\alpha+\beta-1)}{(\alpha-1)^2(\alpha-2)},
$$
since
$$
\mathrm{E}\Bigl[\frac{1-P}{P^2}\Bigr]
= \int_0^1 \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{(\alpha-2)-1}(1-p)^{(\beta+1)-1}\,dp
= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha-2)\Gamma(\beta+1)}{\Gamma(\alpha+\beta-1)}
= \frac{\beta(\alpha+\beta-1)}{(\alpha-1)(\alpha-2)}
$$
and
$$
\mathrm{Var}\Bigl(\frac{1-P}{P}\Bigr)
= \mathrm{E}\Bigl[\Bigl(\frac{1-P}{P}\Bigr)^2\Bigr] - \Bigl(\mathrm{E}\Bigl[\frac{1-P}{P}\Bigr]\Bigr)^2
= \frac{\beta(\beta+1)}{(\alpha-2)(\alpha-1)} - \Bigl(\frac{\beta}{\alpha-1}\Bigr)^2
= \frac{\beta(\alpha+\beta-1)}{(\alpha-1)^2(\alpha-2)},
$$
where
$$
\mathrm{E}\Bigl[\Bigl(\frac{1-P}{P}\Bigr)^2\Bigr]
= \int_0^1 \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{(\alpha-2)-1}(1-p)^{(\beta+2)-1}\,dp
= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha-2)\Gamma(\beta+2)}{\Gamma(\alpha-2+\beta+2)}
= \frac{\beta(\beta+1)}{(\alpha-2)(\alpha-1)}.
$$
4.35 a. $\mathrm{Var}(X) = \mathrm{E}(\mathrm{Var}(X|P)) + \mathrm{Var}(\mathrm{E}(X|P))$. Therefore,
$$
\mathrm{Var}(X) = \mathrm{E}[nP(1-P)] + \mathrm{Var}(nP)
= n\,\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)} + n^2\,\mathrm{Var}\,P
= n\,\frac{\alpha\beta(\alpha+\beta+1-1)}{(\alpha+\beta)^2(\alpha+\beta+1)} + n^2\,\mathrm{Var}\,P
$$
$$
= n\,\frac{\alpha}{\alpha+\beta}\,\frac{\beta}{\alpha+\beta} - n\,\mathrm{Var}\,P + n^2\,\mathrm{Var}\,P
= n\,\mathrm{E}P(1-\mathrm{E}P) + n(n-1)\mathrm{Var}\,P.
$$
b. $\mathrm{Var}(Y) = \mathrm{E}(\mathrm{Var}(Y|\Lambda)) + \mathrm{Var}(\mathrm{E}(Y|\Lambda)) = \mathrm{E}\Lambda + \mathrm{Var}(\Lambda) = \mu + \frac{1}{\alpha}\mu^2$, since $\mathrm{E}\Lambda = \mu = \alpha\beta$ and $\mathrm{Var}(\Lambda) = \alpha\beta^2 = (\alpha\beta)^2/\alpha = \mu^2/\alpha$. The "extra-Poisson" variation is $\frac{1}{\alpha}\mu^2$.
4.37 a. Let $Y = \sum_i X_i$. Then
$$
P(Y=k) = P\Bigl(Y=k,\ \tfrac12 < c = \tfrac12(1+p) < 1\Bigr)
= \int_0^1 P\bigl(Y=k \mid c = \tfrac12(1+p)\bigr)f_P(p)\,dp
$$
$$
= \int_0^1 \binom{n}{k}\Bigl[\tfrac12(1+p)\Bigr]^{k}\Bigl[1-\tfrac12(1+p)\Bigr]^{n-k}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,p^{a-1}(1-p)^{b-1}\,dp
$$
$$
= \int_0^1 \binom{n}{k}\frac{(1+p)^{k}(1-p)^{n-k}}{2^{n}}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,p^{a-1}(1-p)^{b-1}\,dp
$$
$$
= \frac{\binom{n}{k}}{2^{n}}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\sum_{j=0}^{k}\binom{k}{j}\int_0^1 p^{j+a-1}(1-p)^{n-k+b-1}\,dp
\qquad(\text{expanding } (1+p)^{k})
$$
$$
= \sum_{j=0}^{k}\binom{k}{j}\,\frac{\binom{n}{k}}{2^{n}}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\frac{\Gamma(j+a)\Gamma(n-k+b)}{\Gamma(n-k+j+a+b)},
$$
a mixture of beta-binomials.
b.
$$
\mathrm{E}Y = \mathrm{E}(\mathrm{E}(Y|C)) = \mathrm{E}[nC] = \mathrm{E}\Bigl[n\,\tfrac12(1+P)\Bigr] = \frac{n}{2}\Bigl(1+\frac{a}{a+b}\Bigr).
$$
Using the results in Exercise 4.35(a),
$$
\mathrm{Var}(Y) = n\,\mathrm{E}C(1-\mathrm{E}C) + n(n-1)\mathrm{Var}\,C.
$$
Therefore,
$$
\mathrm{Var}(Y) = n\,\mathrm{E}\bigl[\tfrac12(1+P)\bigr]\bigl(1-\mathrm{E}\bigl[\tfrac12(1+P)\bigr]\bigr) + n(n-1)\mathrm{Var}\bigl(\tfrac12(1+P)\bigr)
= \frac{n}{4}(1+\mathrm{E}P)(1-\mathrm{E}P) + \frac{n(n-1)}{4}\,\mathrm{Var}\,P
$$
$$
= \frac{n}{4}\Bigl(1-\Bigl(\frac{a}{a+b}\Bigr)^2\Bigr) + \frac{n(n-1)}{4}\,\frac{ab}{(a+b)^2(a+b+1)}.
$$
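The identity $\mathrm{Var}(X) = n\,\mathrm{E}P(1-\mathrm{E}P)+n(n-1)\mathrm{Var}\,P$ from 4.35(a) can be verified exactly for the beta-binomial, by computing the variance directly from the pmf; a Python sketch (the values $n = 10$, $\alpha = 2.5$, $\beta = 4$ are illustrative):

```python
from math import comb, gamma

def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

n, a, b = 10, 2.5, 4.0
# Beta-binomial pmf: P(X = k) = C(n,k) B(k+a, n-k+b) / B(a,b)
pmf = [comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b) for k in range(n + 1)]

mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2

ep = a / (a + b)
varp = a * b / ((a + b) ** 2 * (a + b + 1))
identity = n * ep * (1 - ep) + n * (n - 1) * varp
print(round(var, 6), round(identity, 6))  # the two agree
```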

4.38 a. Make the transformation $u = \frac{x}{\nu}-\frac{x}{\lambda}$, $du = \frac{-x}{\nu^2}\,d\nu$, so that $\frac{\nu}{\lambda-\nu} = \frac{x}{\lambda u}$. Then
$$
\int_0^{\lambda}\frac{1}{\nu}e^{-x/\nu}\,\frac{1}{\Gamma(r)\Gamma(1-r)}\,\frac{\nu^{r-1}}{(\lambda-\nu)^{r}}\,d\nu
= \frac{1}{\Gamma(r)\Gamma(1-r)}\,\frac{x^{r-1}}{\lambda^{r}}\int_0^{\infty}\Bigl(\frac{1}{u}\Bigr)^{r}e^{-(u+x/\lambda)}\,du
= \frac{x^{r-1}e^{-x/\lambda}}{\Gamma(r)\lambda^{r}},
$$
since the integral is equal to $\Gamma(1-r)$ if $r < 1$.
b. Use the transformation $t = \nu/\lambda$ to get
$$
\int_0^{\lambda}p_{\lambda}(\nu)\,d\nu = \frac{1}{\Gamma(r)\Gamma(1-r)}\int_0^{\lambda}\nu^{r-1}(\lambda-\nu)^{-r}\,d\nu
= \frac{1}{\Gamma(r)\Gamma(1-r)}\int_0^1 t^{r-1}(1-t)^{-r}\,dt = 1,
$$
since this is a beta$(r, 1-r)$ integral.
c.
$$
\frac{d}{dx}\log f(x) = \frac{d}{dx}\Bigl[\log\frac{1}{\Gamma(r)\lambda^{r}} + (r-1)\log x - x/\lambda\Bigr] = \frac{r-1}{x}-\frac{1}{\lambda} > 0
$$
for some $x$, if $r > 1$. But
$$
\frac{d}{dx}\log\int_0^{\infty}\frac{1}{\nu}e^{-x/\nu}q_{\lambda}(\nu)\,d\nu
= \frac{-\int_0^{\infty}\frac{1}{\nu^2}e^{-x/\nu}q_{\lambda}(\nu)\,d\nu}{\int_0^{\infty}\frac{1}{\nu}e^{-x/\nu}q_{\lambda}(\nu)\,d\nu} < 0 \quad \forall x,
$$
so a gamma pdf with $r > 1$ cannot be written as such a mixture.

4.39 a. Without loss of generality let us assume that $i < j$. From the discussion in the text we have that
$$
f(x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_n|x_j)
= \frac{(m-x_j)!}{x_1!\cdots x_{j-1}!\,x_{j+1}!\cdots x_n!}
\Bigl(\frac{p_1}{1-p_j}\Bigr)^{x_1}\cdots\Bigl(\frac{p_{j-1}}{1-p_j}\Bigr)^{x_{j-1}}\Bigl(\frac{p_{j+1}}{1-p_j}\Bigr)^{x_{j+1}}\cdots\Bigl(\frac{p_n}{1-p_j}\Bigr)^{x_n}.
$$
Then, summing over all of the coordinates except $x_i$ and $x_j$,
$$
f(x_i|x_j) = \sum_{(x_k,\,k\ne i,j)} f(x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_n|x_j)
= \frac{(m-x_j)!}{x_i!\,(m-x_i-x_j)!}\Bigl(\frac{p_i}{1-p_j}\Bigr)^{x_i}\Bigl(1-\frac{p_i}{1-p_j}\Bigr)^{m-x_i-x_j}
$$
$$
\times\sum_{(x_k,\,k\ne i,j)}\frac{(m-x_i-x_j)!}{x_1!\cdots x_{i-1}!\,x_{i+1}!\cdots x_{j-1}!\,x_{j+1}!\cdots x_n!}
\Bigl(\frac{p_1}{1-p_j-p_i}\Bigr)^{x_1}\cdots\Bigl(\frac{p_n}{1-p_j-p_i}\Bigr)^{x_n},
$$
and the remaining sum is a multinomial sum that equals 1. Thus
$$
f(x_i|x_j) = \frac{(m-x_j)!}{x_i!\,(m-x_i-x_j)!}\Bigl(\frac{p_i}{1-p_j}\Bigr)^{x_i}\Bigl(1-\frac{p_i}{1-p_j}\Bigr)^{m-x_i-x_j},
$$
that is, $X_i|X_j=x_j \sim \mathrm{binomial}\bigl(m-x_j,\ \frac{p_i}{1-p_j}\bigr)$.
b.
$$
f(x_i,x_j) = f(x_i|x_j)f(x_j) = \frac{m!}{x_i!\,x_j!\,(m-x_j-x_i)!}\,p_i^{x_i}p_j^{x_j}(1-p_j-p_i)^{m-x_j-x_i}.
$$
Using this result it can be shown that $X_i+X_j \sim \mathrm{binomial}(m,\ p_i+p_j)$. Therefore,
$$
\mathrm{Var}(X_i+X_j) = m(p_i+p_j)(1-p_i-p_j).
$$
By Theorem 4.5.6, $\mathrm{Var}(X_i+X_j) = \mathrm{Var}(X_i)+\mathrm{Var}(X_j)+2\,\mathrm{Cov}(X_i,X_j)$. Therefore,
$$
\mathrm{Cov}(X_i,X_j) = \frac12\bigl[m(p_i+p_j)(1-p_i-p_j)-mp_i(1-p_i)-mp_j(1-p_j)\bigr]
= \frac12(-2mp_ip_j) = -mp_ip_j.
$$
2

4.41 Let a be a constant. Cov(a, X ) = E(aX ) − EaEX = aEX − aEX = 0.
4.42
$$
\rho_{XY,Y} = \frac{\mathrm{Cov}(XY,Y)}{\sigma_{XY}\sigma_Y}
= \frac{\mathrm{E}(XY\cdot Y)-\mu_{XY}\mu_Y}{\sigma_{XY}\sigma_Y}
= \frac{\mathrm{E}X\,\mathrm{E}Y^2-\mu_X\mu_Y\mu_Y}{\sigma_{XY}\sigma_Y},
$$
where the last step follows from the independence of $X$ and $Y$. Now compute
$$
\sigma_{XY}^2 = \mathrm{E}(XY)^2-[\mathrm{E}(XY)]^2 = \mathrm{E}X^2\,\mathrm{E}Y^2-(\mathrm{E}X)^2(\mathrm{E}Y)^2
= (\sigma_X^2+\mu_X^2)(\sigma_Y^2+\mu_Y^2)-\mu_X^2\mu_Y^2
= \sigma_X^2\sigma_Y^2+\sigma_X^2\mu_Y^2+\sigma_Y^2\mu_X^2.
$$
Therefore,
$$
\rho_{XY,Y} = \frac{\mu_X(\sigma_Y^2+\mu_Y^2)-\mu_X\mu_Y^2}{\bigl(\sigma_X^2\sigma_Y^2+\sigma_X^2\mu_Y^2+\sigma_Y^2\mu_X^2\bigr)^{1/2}\,\sigma_Y}
= \frac{\mu_X\sigma_Y}{\bigl(\mu_X^2\sigma_Y^2+\mu_Y^2\sigma_X^2+\sigma_X^2\sigma_Y^2\bigr)^{1/2}}.
$$

4.43
$$
\mathrm{Cov}(X_1+X_2,\ X_2+X_3) = \mathrm{E}(X_1+X_2)(X_2+X_3)-\mathrm{E}(X_1+X_2)\,\mathrm{E}(X_2+X_3)
= (4\mu^2+\sigma^2)-4\mu^2 = \sigma^2.
$$
$$
\mathrm{Cov}(X_1+X_2,\ X_1-X_2) = \mathrm{E}(X_1+X_2)(X_1-X_2) = \mathrm{E}X_1^2-\mathrm{E}X_2^2 = 0.
$$

4.44 Let $\mu_i = \mathrm{E}(X_i)$. Then
$$
\mathrm{Var}\Bigl(\sum_{i=1}^{n}X_i\Bigr) = \mathrm{Var}(X_1+X_2+\cdots+X_n)
= \mathrm{E}\bigl[(X_1+X_2+\cdots+X_n)-(\mu_1+\mu_2+\cdots+\mu_n)\bigr]^2
$$
$$
= \mathrm{E}\bigl[(X_1-\mu_1)+(X_2-\mu_2)+\cdots+(X_n-\mu_n)\bigr]^2
= \sum_{i=1}^{n}\mathrm{E}(X_i-\mu_i)^2 + 2\sum_{1\le i<j\le n}\mathrm{E}(X_i-\mu_i)(X_j-\mu_j)
$$
$$
= \sum_{i=1}^{n}\mathrm{Var}\,X_i + 2\sum_{1\le i<j\le n}\mathrm{Cov}(X_i,X_j).
$$
4.47 a. For $z < 0$,
$$
P(Z\le z) = P(X\le z,\ Y<0) + P(X\ge -z,\ Y<0)
= P(X\le z,\ Y<0) + P(X\le z,\ Y>0) \qquad(\text{symmetry of } X \text{ and } Y)
$$
$$
= P(X\le z)\bigl(P(Y<0)+P(Y>0)\bigr) = P(X\le z).
$$
By a similar argument, for $z > 0$, we get $P(Z>z) = P(X>z)$, and hence $P(Z\le z) = P(X\le z)$. Thus, $Z \sim X \sim \mathrm{n}(0,1)$.
b. By definition of $Z$, $Z > 0 \Leftrightarrow$ either (i) $X < 0$ and $Y > 0$ or (ii) $X > 0$ and $Y > 0$. So $Z$ and $Y$ always have the same sign, hence they cannot be bivariate normal.


4.49 a.
$$
f_X(x) = \int\bigl(af_1(x)g_1(y)+(1-a)f_2(x)g_2(y)\bigr)\,dy
= af_1(x)\!\int g_1(y)\,dy + (1-a)f_2(x)\!\int g_2(y)\,dy
= af_1(x)+(1-a)f_2(x).
$$
$$
f_Y(y) = \int\bigl(af_1(x)g_1(y)+(1-a)f_2(x)g_2(y)\bigr)\,dx
= ag_1(y)\!\int f_1(x)\,dx + (1-a)g_2(y)\!\int f_2(x)\,dx
= ag_1(y)+(1-a)g_2(y).
$$
b. (⇒) If X and Y are independent then f (x, y ) = fX (x)fY (y ). Then, f (x, y ) − fX (x)fY (y )
= af1 (x)g1 (y ) + (1 − a)f2 (x)g2 (y )
− [af1 (x) + (1 − a)f2 (x)][ag1 (y ) + (1 − a)g2 (y )]
= a(1 − a)[f1 (x)g1 (y ) − f1 (x)g2 (y ) − f2 (x)g1 (y ) + f2 (x)g2 (y )]
= a(1 − a)[f1 (x) − f2 (x)][g1 (y ) − g2 (y )]
= 0.
Thus [f1 (x) − f2 (x)][g1 (y ) − g2 (y )] = 0 since 0 < a < 1.
(⇐) if [f1 (x) − f2 (x)][g1 (y ) − g2 (y )] = 0 then f1 (x)g1 (y ) + f2 (x)g2 (y ) = f1 (x)g2 (y ) + f2 (x)g1 (y ).
Therefore
fX (x)fY (y )
= a2 f1 (x)g1 (y ) + a(1 − a)f1 (x)g2 (y ) + a(1 − a)f2 (x)g1 (y ) + (1 − a)2 f2 (x)g2 (y )
= a2 f1 (x)g1 (y ) + a(1 − a)[f1 (x)g2 (y ) + f2 (x)g1 (y )] + (1 − a)2 f2 (x)g2 (y )
= a2 f1 (x)g1 (y ) + a(1 − a)[f1 (x)g1 (y ) + f2 (x)g2 (y )] + (1 − a)2 f2 (x)g2 (y )
= af1 (x)g1 (y ) + (1 − a)f2 (x)g2 (y ) = f (x, y ).
Thus X and Y are independent.
c.
$$
\mathrm{Cov}(X,Y) = a\mu_1\xi_1+(1-a)\mu_2\xi_2-[a\mu_1+(1-a)\mu_2][a\xi_1+(1-a)\xi_2]
$$
$$
= a(1-a)[\mu_1\xi_1-\mu_1\xi_2-\mu_2\xi_1+\mu_2\xi_2]
= a(1-a)[\mu_1-\mu_2][\xi_1-\xi_2].
$$
To construct dependent uncorrelated random variables let $(X,Y) \sim af_1(x)g_1(y)+(1-a)f_2(x)g_2(y)$, where $f_1, f_2, g_1, g_2$ are such that $f_1-f_2 \ne 0$ and $g_1-g_2 \ne 0$, with $\mu_1 = \mu_2$ or $\xi_1 = \xi_2$.
d. (i) $f_1 \sim \mathrm{binomial}(n,p)$, $f_2 \sim \mathrm{binomial}(n,p)$, $g_1 \sim \mathrm{binomial}(n,p)$, $g_2 \sim \mathrm{binomial}(n,1-p)$.
(ii) $f_1 \sim \mathrm{binomial}(n,p_1)$, $f_2 \sim \mathrm{binomial}(n,p_2)$, $g_1 \sim \mathrm{binomial}(n,p_1)$, $g_2 \sim \mathrm{binomial}(n,p_2)$.
(iii) $f_1 \sim \mathrm{binomial}(n_1, \frac{p}{n_1})$, $f_2 \sim \mathrm{binomial}(n_2, \frac{p}{n_2})$, $g_1 \sim \mathrm{binomial}(n_1,p)$, $g_2 \sim \mathrm{binomial}(n_2,p)$.


4.51 a.
$$
P(X/Y \le t) = \begin{cases} \dfrac{t}{2} & 0<t\le 1,\\[6pt] 1-\dfrac{1}{2t} & t>1, \end{cases}
\qquad\text{and}\qquad
P(XY\le t) = t-t\log t, \quad 0<t<1.
$$
b.
$$
P(XY/Z \le t) = \int_0^1 P(XY \le zt)\,dz =
\begin{cases}
\displaystyle\int_0^1 \bigl(zt-zt\log(zt)\bigr)\,dz = \frac{3t}{4}-\frac{t}{2}\log t & \text{if } t\le 1,\\[10pt]
\displaystyle\int_0^{1/t}\bigl(zt-zt\log(zt)\bigr)\,dz + \int_{1/t}^{1} 1\,dz = 1-\frac{1}{4t} & \text{if } t>1.
\end{cases}
$$

4.53
$$
P(\text{Real Roots}) = P(B^2 > 4AC)
= P(2\log B > \log 4 + \log A + \log C)
= P\bigl(-2\log B \le -\log 4 + (-\log A - \log C)\bigr).
$$
Let $X = -2\log B$, $Y = -\log A - \log C$. Then $X \sim \mathrm{exponential}(2)$, $Y \sim \mathrm{gamma}(2,1)$, independent, and
$$
P(\text{Real Roots}) = P(X < -\log 4 + Y)
= \int_{\log 4}^{\infty} P(X < -\log 4 + y)\,f_Y(y)\,dy
= \int_{\log 4}^{\infty}\Bigl[\int_0^{-\log 4 + y}\tfrac12 e^{-x/2}\,dx\Bigr]\,y e^{-y}\,dy
$$
$$
= \int_{\log 4}^{\infty}\bigl(1-e^{\frac12\log 4}\,e^{-y/2}\bigr)\,y e^{-y}\,dy
= \int_{\log 4}^{\infty}\bigl(1-2e^{-y/2}\bigr)\,y e^{-y}\,dy.
$$
Integration-by-parts will show that $\int_a^{\infty} ye^{-y/b}\,dy = b(a+b)e^{-a/b}$, and hence (applying it with $b = 1$ and, for the term $2\int_{\log 4}^{\infty}ye^{-3y/2}\,dy$, with $b = 2/3$ and $e^{-\frac32\log 4} = \frac18$)
$$
P(\text{Real Roots}) = \frac14(1+\log 4) - \frac16\Bigl(\frac23+\log 4\Bigr) \approx .2544.
$$
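The value of $P(B^2 > 4AC)$ for iid uniform(0,1) coefficients is easy to confirm by Monte Carlo; a Python sketch:

```python
import random

random.seed(8)
n = 200_000
hits = sum(random.random() ** 2 > 4 * random.random() * random.random()
           for _ in range(n))
est = hits / n
print(est)  # about 0.254
```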
4.54 Let $Y = \prod_{i=1}^{n}X_i$. Then $P(Y\le y) = P(\prod_{i=1}^{n}X_i \le y) = P(\sum_{i=1}^{n}-\log X_i \ge -\log y)$. Now $-\log X_i \sim \mathrm{exponential}(1) = \mathrm{gamma}(1,1)$. By Example 4.6.8, $\sum_{i=1}^{n}-\log X_i \sim \mathrm{gamma}(n,1)$. Therefore,
$$
P(Y\le y) = \int_{-\log y}^{\infty}\frac{1}{\Gamma(n)}\,z^{n-1}e^{-z}\,dz,
$$
and
$$
f_Y(y) = \frac{d}{dy}\int_{-\log y}^{\infty}\frac{1}{\Gamma(n)}\,z^{n-1}e^{-z}\,dz
= -\frac{1}{\Gamma(n)}\,(-\log y)^{n-1}e^{-(-\log y)}\,\frac{d}{dy}(-\log y)
= \frac{1}{\Gamma(n)}\,(-\log y)^{n-1},
\qquad 0<y<1.
$$

4.55 Let $X_1, X_2, X_3$ be independent exponential$(\lambda)$ random variables, and let $Y = \max(X_1,X_2,X_3)$, the lifetime of the system. Then
$$
P(Y\le y) = P(\max(X_1,X_2,X_3)\le y) = P(X_1\le y \text{ and } X_2\le y \text{ and } X_3\le y)
= P(X_1\le y)P(X_2\le y)P(X_3\le y),
$$
by the independence of $X_1$, $X_2$ and $X_3$. Now each probability is $P(X_i\le y) = \int_0^y \frac{1}{\lambda}e^{-x/\lambda}\,dx = 1-e^{-y/\lambda}$, so
$$
P(Y\le y) = \bigl(1-e^{-y/\lambda}\bigr)^3, \qquad 0<y<\infty,
$$
and the pdf is
$$
f_Y(y) = \begin{cases} \dfrac{3}{\lambda}\bigl(1-e^{-y/\lambda}\bigr)^2 e^{-y/\lambda} & y>0,\\[6pt] 0 & y\le 0. \end{cases}
$$

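The cdf of the system lifetime in 4.55 can be checked by simulation; a Python sketch (the values $\lambda = 2$, $y = 3$ are illustrative):

```python
import math
import random

random.seed(9)
lam, y0 = 2.0, 3.0
n = 100_000
hits = sum(max(random.expovariate(1 / lam) for _ in range(3)) <= y0
           for _ in range(n))
cdf_exact = (1 - math.exp(-y0 / lam)) ** 3   # (1 - e^{-y/lambda})^3
print(hits / n, round(cdf_exact, 4))
```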
4.57 a. $A_1 = \frac{1}{n}\sum_{i=1}^{n}x_i$, the arithmetic mean, and
$$
A_{-1} = \Bigl[\frac{1}{n}\sum_{i=1}^{n}x_i^{-1}\Bigr]^{-1} = \frac{n}{\frac{1}{x_1}+\cdots+\frac{1}{x_n}},
$$
the harmonic mean. Also,
$$
\lim_{r\to 0}\log A_r = \lim_{r\to 0}\frac{1}{r}\log\Bigl[\frac{1}{n}\sum_{i=1}^{n}x_i^{r}\Bigr]
= \lim_{r\to 0}\frac{\frac{1}{n}\sum_{i=1}^{n}x_i^{r}\log x_i}{\frac{1}{n}\sum_{i=1}^{n}x_i^{r}}
= \frac{1}{n}\sum_{i=1}^{n}\log x_i
= \log\Bigl(\prod_{i=1}^{n}x_i\Bigr)^{1/n},
$$
by L'Hôpital's rule. Thus $A_0 = \lim_{r\to 0}A_r = (\prod_{i=1}^{n}x_i)^{1/n}$, the geometric mean. The derivative used is $\frac{d}{dr}x_i^{r} = \frac{d}{dr}\exp(r\log x_i) = \exp(r\log x_i)\log x_i = x_i^{r}\log x_i$.
b. (i) If $\log A_r$ is nondecreasing, then for $r \le r'$, $\log A_r \le \log A_{r'}$, so $e^{\log A_r} \le e^{\log A_{r'}}$. Therefore $A_r \le A_{r'}$. Thus $A_r$ is nondecreasing in $r$.
(ii)
$$
\frac{d}{dr}\log A_r = \frac{-1}{r^2}\log\Bigl(\frac{1}{n}\sum_{i=1}^{n}x_i^{r}\Bigr) + \frac{1}{r}\,\frac{\frac{1}{n}\sum_{i=1}^{n}x_i^{r}\log x_i}{\frac{1}{n}\sum_{i=1}^{n}x_i^{r}}
= \frac{1}{r^2}\Bigl[\frac{r\sum_i x_i^{r}\log x_i}{\sum_i x_i^{r}} - \log\Bigl(\frac{1}{n}\sum_i x_i^{r}\Bigr)\Bigr],
$$
where we use the identity for $\frac{d}{dr}x_i^{r}$ shown in a).
(iii) With $a_i = x_i^{r}/\sum_k x_k^{r}$,
$$
\frac{r\sum_i x_i^{r}\log x_i}{\sum_i x_i^{r}} - \log\Bigl(\frac{1}{n}\sum_i x_i^{r}\Bigr)
= \log n + \sum_i a_i\log x_i^{r} - \log\Bigl(\sum_k x_k^{r}\Bigr)
$$
$$
= \log n + \sum_i a_i\Bigl[\log x_i^{r} - \log\sum_k x_k^{r}\Bigr]
= \log n + \sum_i a_i\log a_i
= \log n - \sum_i a_i\log\frac{1}{a_i}.
$$
We need to prove that $\log n \ge \sum_{i=1}^{n}a_i\log(1/a_i)$. Using Jensen's inequality we have
$$
\mathrm{E}\log\frac{1}{a} = \sum_{i=1}^{n}a_i\log\frac{1}{a_i} \le \log\mathrm{E}\frac{1}{a} = \log\Bigl(\sum_{i=1}^{n}a_i\frac{1}{a_i}\Bigr) = \log n,
$$
which establishes the result.
4.59 Assume that $\mathrm{E}X = 0$, $\mathrm{E}Y = 0$, and $\mathrm{E}Z = 0$. This can be done without loss of generality because we could work with the quantities $X-\mathrm{E}X$, etc. By iterating the expectation we have
$$
\mathrm{Cov}(X,Y) = \mathrm{E}XY = \mathrm{E}[\mathrm{E}(XY|Z)].
$$
Adding and subtracting $\mathrm{E}(X|Z)\mathrm{E}(Y|Z)$ gives
$$
\mathrm{Cov}(X,Y) = \mathrm{E}[\mathrm{E}(XY|Z)-\mathrm{E}(X|Z)\mathrm{E}(Y|Z)] + \mathrm{E}[\mathrm{E}(X|Z)\mathrm{E}(Y|Z)].
$$
Since $\mathrm{E}[\mathrm{E}(X|Z)] = \mathrm{E}X = 0$, the second term above is $\mathrm{Cov}[\mathrm{E}(X|Z),\mathrm{E}(Y|Z)]$. For the first term write
$$
\mathrm{E}[\mathrm{E}(XY|Z)-\mathrm{E}(X|Z)\mathrm{E}(Y|Z)] = \mathrm{E}\bigl[\mathrm{E}\{XY-\mathrm{E}(X|Z)\mathrm{E}(Y|Z)\mid Z\}\bigr],
$$
where we have brought $\mathrm{E}(X|Z)$ and $\mathrm{E}(Y|Z)$ inside the conditional expectation. This can now be recognized as $\mathrm{E}\,\mathrm{Cov}(X,Y|Z)$, establishing the identity.
4.61 a. To find the distribution of $f(X_1|Z)$, let $U = \frac{X_2-1}{X_1}$ and $V = X_1$. Then $x_2 = h_1(u,v) = uv+1$, $x_1 = h_2(u,v) = v$. Therefore
$$
f_{U,V}(u,v) = f_{X_1,X_2}(h_2(u,v), h_1(u,v))\,|J| = e^{-(uv+1)}e^{-v}\,v,
$$
and
$$
f_U(u) = \int_0^{\infty} v\,e^{-(uv+1)}e^{-v}\,dv = \frac{e^{-1}}{(u+1)^2}.
$$
Thus $V|U = 0$ has density $ve^{-v}$. The distribution of $X_1|X_2$ is $e^{-x_1}$, since $X_1$ and $X_2$ are independent.
b. The following Mathematica code will draw the picture; the solid lines are $B_1$ and the dashed lines are $B_2$. Note that the solid lines increase with x1, while the dashed lines are constant. Thus $B_1$ is informative, as the range of $X_2$ changes.
e = 1/10;
Plot[{-e*x1 + 1, e*x1 + 1, 1 - e, 1 + e}, {x1, 0, 5},
PlotStyle -> {Dashing[{}], Dashing[{}], Dashing[{0.15, 0.05}],
Dashing[{0.15, 0.05}]}]
c. With $v^* = x$,
$$
P(X_1\le x|B_1) = P(V\le v^* \mid -\epsilon<U<\epsilon)
= \frac{\int_0^{v^*}\int_{-\epsilon}^{\epsilon} v\,e^{-(uv+1)}e^{-v}\,du\,dv}{\int_0^{\infty}\int_{-\epsilon}^{\epsilon} v\,e^{-(uv+1)}e^{-v}\,du\,dv},
$$
and
$$
\lim_{\epsilon\to 0} P(X_1\le x|B_1) = 1-e^{-v^*}-v^*e^{-v^*} = \int_0^{v^*} v\,e^{-v}\,dv = P(V\le v^*\mid U=0).
$$
Also,
$$
P(X_1\le x|B_2) = \frac{\int_0^{x}\int_{1-\epsilon}^{1+\epsilon} e^{-(x_1+x_2)}\,dx_2\,dx_1}{\int_{1-\epsilon}^{1+\epsilon} e^{-x_2}\,dx_2},
$$
and
$$
\lim_{\epsilon\to 0} P(X_1\le x|B_2) = 1-e^{-x} = P(X_1\le x\mid X_2=1).
$$

4.63 Since X = eZ and g (z ) = ez is convex, by Jensen’s Inequality EX = Eg (Z ) ≥ g (EZ ) = e0 = 1.
In fact, there is equality in Jensen’s Inequality if and only if there is an interval I with P (Z ∈
I ) = 1 and g (z ) is linear on I . But ez is linear on an interval only if the interval is a single point. So EX > 1, unless P (Z = EZ = 0) = 1.
4.64 a. Let a and b be real numbers. Then,
|a + b|2 = (a + b)(a + b) = a2 + 2ab + b2 ≤ |a|2 + 2|ab| + |b|2 = (|a| + |b|)2 .
Take the square root of both sides to get |a + b| ≤ |a| + |b|.
b. |X + Y | ≤ |X | + |Y | ⇒ E|X + Y | ≤ E(|X | + |Y |) = E|X | + E|Y |.
4.65 Without loss of generality let us assume that $\mathrm{E}g(X) = \mathrm{E}h(X) = 0$. For part (a), let $x_0$ be a number such that $h(x_0) = 0$; since $h$ is nonincreasing, $h(x)\ge 0$ for $x\le x_0$ and $h(x)\le 0$ for $x\ge x_0$. Since $g$ is nondecreasing, $g(x)\ge g(x_0)$ on $\{x: h(x)\le 0\}$ and $g(x)\le g(x_0)$ on $\{x: h(x)\ge 0\}$, so $g(x)h(x)\le g(x_0)h(x)$ in both cases. Then
$$
\mathrm{E}(g(X)h(X)) = \int_{-\infty}^{\infty} g(x)h(x)f_X(x)\,dx
= \int_{\{x:\,h(x)\le 0\}} g(x)h(x)f_X(x)\,dx + \int_{\{x:\,h(x)\ge 0\}} g(x)h(x)f_X(x)\,dx
$$
$$
\le g(x_0)\!\int_{\{x:\,h(x)\le 0\}} h(x)f_X(x)\,dx + g(x_0)\!\int_{\{x:\,h(x)\ge 0\}} h(x)f_X(x)\,dx
= g(x_0)\!\int_{-\infty}^{\infty} h(x)f_X(x)\,dx
= g(x_0)\,\mathrm{E}h(X) = 0.
$$
For part (b), where $g(x)$ and $h(x)$ are both nondecreasing, the inequalities reverse on each set ($g(x)\le g(x_0)$ where $h(x)\le 0$ and $g(x)\ge g(x_0)$ where $h(x)\ge 0$, so $g(x)h(x)\ge g(x_0)h(x)$), and
$$
\mathrm{E}(g(X)h(X)) \ge g(x_0)\,\mathrm{E}h(X) = 0.
$$
The case when $g(x)$ and $h(x)$ are both nonincreasing can be proved similarly.

Chapter 5

Properties of a Random Sample

5.1 Let X = # color blind people in a sample of size n. Then X ∼ binomial(n, p), where p = .01.
The probability that a sample contains a color blind person is $P(X>0) = 1-P(X=0)$, where $P(X=0) = \binom{n}{0}(.01)^0(.99)^n = .99^n$. Thus,
$$
P(X>0) = 1-.99^n > .95 \iff n > \log(.05)/\log(.99) \approx 299.
$$
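The bound is easy to compute directly; a Python sketch:

```python
import math

# Smallest n with 1 - 0.99**n > 0.95, i.e. n > log(0.05)/log(0.99).
bound = math.log(0.05) / math.log(0.99)
n = math.floor(bound) + 1
print(bound, n)  # bound ≈ 298.07, so n = 299
```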
5.3 Note that $Y_i \sim \mathrm{Bernoulli}$ with $p_i = P(X_i\ge\mu) = 1-F(\mu)$ for each $i$. Since the $Y_i$'s are iid Bernoulli, $\sum_{i=1}^{n}Y_i \sim \mathrm{binomial}(n,\ p = 1-F(\mu))$.
5.5 Let $Y = X_1+\cdots+X_n$. Then $\bar{X} = (1/n)Y$, a scale transformation. Therefore the pdf of $\bar{X}$ is
$$
f_{\bar{X}}(x) = \frac{1}{1/n}\,f_Y\Bigl(\frac{x}{1/n}\Bigr) = n\,f_Y(nx).
$$
5.6 a. For $Z = X-Y$, set $W = X$. Then $Y = W-Z$, $X = W$, and
$$
|J| = \begin{vmatrix} 0 & 1 \\ -1 & 1 \end{vmatrix} = 1.
$$
Then $f_{Z,W}(z,w) = f_X(w)f_Y(w-z)\cdot 1$, thus $f_Z(z) = \int_{-\infty}^{\infty} f_X(w)f_Y(w-z)\,dw$.
b. For $Z = XY$, set $W = X$. Then $Y = Z/W$ and
$$
|J| = \left|\begin{vmatrix} 0 & 1 \\ 1/w & -z/w^2 \end{vmatrix}\right| = |-1/w|.
$$
Then $f_{Z,W}(z,w) = f_X(w)f_Y(z/w)\cdot|-1/w|$, thus $f_Z(z) = \int_{-\infty}^{\infty} |-1/w|\,f_X(w)f_Y(z/w)\,dw$.
c. For $Z = X/Y$, set $W = X$. Then $Y = W/Z$ and
$$
|J| = \left|\begin{vmatrix} 0 & 1 \\ -w/z^2 & 1/z \end{vmatrix}\right| = |w/z^2|.
$$
Then $f_{Z,W}(z,w) = f_X(w)f_Y(w/z)\cdot|w/z^2|$, thus $f_Z(z) = \int_{-\infty}^{\infty} |w/z^2|\,f_X(w)f_Y(w/z)\,dw$.
5.7 It is, perhaps, easiest to recover the constants by doing the integrations. We have
$$
\int_{-\infty}^{\infty}\frac{B}{1+(\omega/\sigma)^2}\,d\omega = \sigma\pi B,
\qquad
\int_{-\infty}^{\infty}\frac{D}{1+\bigl(\frac{\omega-z}{\tau}\bigr)^2}\,d\omega = \tau\pi D,
$$
and
$$
\int_{-\infty}^{\infty}\Bigl[\frac{A\omega}{1+(\omega/\sigma)^2}-\frac{C\omega}{1+\bigl(\frac{\omega-z}{\tau}\bigr)^2}\Bigr]\,d\omega
= \Bigl[\frac{A\sigma^2}{2}\log\Bigl(1+\Bigl(\frac{\omega}{\sigma}\Bigr)^2\Bigr)-\frac{C\tau^2}{2}\log\Bigl(1+\Bigl(\frac{\omega-z}{\tau}\Bigr)^2\Bigr)\Bigr]_{-\infty}^{\infty}
- Cz\int_{-\infty}^{\infty}\frac{d\omega}{1+\bigl(\frac{\omega-z}{\tau}\bigr)^2}.
$$
The bracketed term is finite and equal to zero only if $\sigma^2 A = \tau^2 C$, i.e., $A = 2M/\sigma^2$, $C = 2M/\tau^2$ for some constant $M$, and the remaining term is $-\tau\pi Cz$. Hence
$$
f_Z(z) = \frac{1}{\pi^2\sigma\tau}\bigl(\sigma\pi B + \tau\pi D - \tau\pi Cz\bigr).
$$
Solving the partial-fraction identity for $B$, $C$, $D$ and $M$ (all functions of $z$) and substituting gives
$$
f_Z(z) = \frac{1}{\pi(\sigma+\tau)}\,\frac{1}{1+\bigl(\frac{z}{\sigma+\tau}\bigr)^2},
$$
the Cauchy$(0,\sigma+\tau)$ pdf.


5.8 a.
1
2n(n − 1)
=

=

n

n
2

(X i −Xj ) i=1 j =1

1
2n(n − 1)
1
2n(n − 1)

n

n

¯
¯
(Xi − X + X − Xj )2 i=1 j =1 n n

¯2
¯
¯
¯2
(X i −X ) −2(X i −X )(X j −X ) + (X j −X ) i=1 j =1




=

n

n n n


1
¯
¯
¯
¯

(Xj − X )2  n(Xi − X )2 − 2
(Xi − X )
(Xj −X ) +n


2n(n − 1)  i=1

i=1 j =1 j =1
=0

=

=

n
2n(n − 1)
1
n−1

n

¯
(Xi − X )2 + i=1 n
2n(n − 1)

n

¯
(Xj − X )2 j =1

n

¯
(Xi − X )2 = S 2 . i=1 b. Although all of the calculations here are straightforward, there is a tedious amount of bookkeeping needed. It seems that induction is the easiest route. (Note: Without loss of generality we can assume θ1 = 0, so EXi = 0.)
4

4

1
(i) Prove the equation for n = 4. We have S 2 = 24 i=1 j =1 (Xi − Xj )2 , and to calculate
Var(S 2 ) we need to calculate E(S 2 )2 and E(S 2 ). The latter expectation is straightforward and we get E(S 2 ) = 24θ2 . The expected value E(S 2 )2 = E(S 4 ) contains 256(= 44 ) terms of which 112(= 4 × 16 + 4 × 16 − 42 ) are zero, whenever i = j . Of the remaining terms,
2
• 24 are of the form E(Xi − Xj )4 = 2(θ4 + 3θ2 )
2
2
2
• 96 are of the form E(Xi − Xj ) (Xi − Xk ) = θ4 + 3θ2
2
2
2
• 24 are of the form E(Xi − Xj ) (Xk − X ) = 4θ2
Thus,

Var(S 2 ) =

1
1
12
2
2
2
24 × 2(θ4 + 3θ2 ) + 96(θ4 + 3θ2 ) + 24 × 4θ4 − (24θ2 ) = θ4 − θ2 .
2
24
4
3

(ii) Assume that the formula holds for $n$, and establish it for $n+1$. (Let $S_n^2$ denote the variance based on $n$ observations.) Straightforward algebra will establish
$$S_{n+1}^2 = \frac{1}{2n(n+1)}\Bigl[\underbrace{\sum_{i=1}^n\sum_{j=1}^n (X_i-X_j)^2}_{A} + 2\underbrace{\sum_{k=1}^n (X_k-X_{n+1})^2}_{B}\Bigr] \overset{\text{def'n}}{=} \frac{1}{2n(n+1)}[A+2B],$$
where
$$\mathrm{Var}(A) = 4n(n-1)^2\left(\theta_4 - \frac{n-3}{n-1}\theta_2^2\right) \quad\text{(induction hypothesis)}$$
$$\mathrm{Var}(B) = n(n+1)\theta_4 - n(n-3)\theta_2^2 \quad\text{($X_k$ and $X_{n+1}$ are independent)}$$
$$\mathrm{Cov}(A,B) = 2n(n-1)\left(\theta_4 - \theta_2^2\right) \quad\text{(some minor bookkeeping needed)}$$
Hence,
$$\mathrm{Var}(S_{n+1}^2) = \frac{1}{4n^2(n+1)^2}\left[\mathrm{Var}(A) + 4\mathrm{Var}(B) + 4\mathrm{Cov}(A,B)\right] = \frac{1}{n+1}\left(\theta_4 - \frac{n-2}{n}\theta_2^2\right),$$
establishing the induction and verifying the result.
c. Again assume that $\theta_1=0$. Then
$$\mathrm{Cov}(\bar X, S^2) = \frac{1}{2n^2(n-1)}\,\mathrm{E}\Bigl[\sum_{k=1}^n X_k \sum_{i=1}^n\sum_{j=1}^n (X_i-X_j)^2\Bigr].$$
The double sum over $i$ and $j$ has $n(n-1)$ nonzero terms. For each of these, the entire expectation is nonzero for only two values of $k$ (when $k$ matches either $i$ or $j$). Thus
$$\mathrm{Cov}(\bar X, S^2) = \frac{2n(n-1)}{2n^2(n-1)}\,\mathrm{E}\,X_i(X_i-X_j)^2 = \frac{\theta_3}{n},$$
and $\bar X$ and $S^2$ are uncorrelated if $\theta_3=0$.
5.9 To establish the Lagrange Identity consider the case when $n=2$:
$$(a_1b_2 - a_2b_1)^2 = a_1^2b_2^2 + a_2^2b_1^2 - 2a_1b_2a_2b_1$$
$$= a_1^2b_1^2 + a_1^2b_2^2 + a_2^2b_1^2 + a_2^2b_2^2 - a_1^2b_1^2 - a_2^2b_2^2 - 2a_1b_1a_2b_2$$
$$= (a_1^2+a_2^2)(b_1^2+b_2^2) - (a_1b_1+a_2b_2)^2.$$
Assume the identity is true for $n$. Then
$$\Bigl(\sum_{i=1}^{n+1}a_i^2\Bigr)\Bigl(\sum_{i=1}^{n+1}b_i^2\Bigr) - \Bigl(\sum_{i=1}^{n+1}a_ib_i\Bigr)^2$$
$$= \Bigl(\sum_{i=1}^{n}a_i^2 + a_{n+1}^2\Bigr)\Bigl(\sum_{i=1}^{n}b_i^2 + b_{n+1}^2\Bigr) - \Bigl(\sum_{i=1}^{n}a_ib_i + a_{n+1}b_{n+1}\Bigr)^2$$
$$= \Bigl(\sum_{i=1}^{n}a_i^2\Bigr)\Bigl(\sum_{i=1}^{n}b_i^2\Bigr) - \Bigl(\sum_{i=1}^{n}a_ib_i\Bigr)^2 + b_{n+1}^2\sum_{i=1}^{n}a_i^2 + a_{n+1}^2\sum_{i=1}^{n}b_i^2 - 2a_{n+1}b_{n+1}\sum_{i=1}^{n}a_ib_i$$
$$= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(a_ib_j-a_jb_i)^2 + \sum_{i=1}^{n}(a_ib_{n+1}-a_{n+1}b_i)^2 = \sum_{i=1}^{n}\sum_{j=i+1}^{n+1}(a_ib_j-a_jb_i)^2.$$
If all the points lie on a straight line then $Y-\mu_Y = c(X-\mu_X)$ for some constant $c\neq0$. Let $b_i = Y_i-\mu_Y$ and $a_i = X_i-\mu_X$; then $b_i = ca_i$ and every term $(a_ib_j-a_jb_i)^2 = 0$. Thus the correlation coefficient is equal to $\pm1$.
5.10 a.
$$\theta_1 = \mathrm{E}X_i = \mu, \qquad \theta_2 = \mathrm{E}(X_i-\mu)^2 = \sigma^2,$$
$$\theta_3 = \mathrm{E}(X_i-\mu)^3 = \mathrm{E}(X_i-\mu)^2(X_i-\mu) = 2\sigma^2\,\mathrm{E}(X_i-\mu) = 0 \quad\text{(Stein's lemma: } \mathrm{E}\,g(X)(X-\theta) = \sigma^2\,\mathrm{E}\,g'(X)\text{)},$$
$$\theta_4 = \mathrm{E}(X_i-\mu)^4 = \mathrm{E}(X_i-\mu)^3(X_i-\mu) = 3\sigma^2\,\mathrm{E}(X_i-\mu)^2 = 3\sigma^4.$$
b. $\mathrm{Var}\,S^2 = \frac{1}{n}\left(\theta_4 - \frac{n-3}{n-1}\theta_2^2\right) = \frac{1}{n}\left(3\sigma^4 - \frac{n-3}{n-1}\sigma^4\right) = \frac{2\sigma^4}{n-1}.$
c. Use the fact that $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ and $\mathrm{Var}\,\chi^2_{n-1} = 2(n-1)$ to get
$$\mathrm{Var}\left(\frac{(n-1)S^2}{\sigma^2}\right) = 2(n-1),$$
which implies $\left((n-1)^2/\sigma^4\right)\mathrm{Var}\,S^2 = 2(n-1)$ and hence
$$\mathrm{Var}\,S^2 = \frac{2(n-1)}{(n-1)^2/\sigma^4} = \frac{2\sigma^4}{n-1}.$$
Remark: Another approach to b), not using the $\chi^2$ distribution, is to use linear model theory. For any symmetric matrix $A$, $\mathrm{Var}(X'AX) = 2\mu_2^2\,\mathrm{tr}\,A^2 + 4\mu_2\,\theta'A^2\theta$, where $\mu_2 = \sigma^2$ and $\theta = \mathrm{E}X = \mu\mathbf{1}$. Write $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2 = \frac{1}{n-1}X'(I-\bar J_n)X$, where
$$I - \bar J_n = \begin{pmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & & \vdots \\ \vdots & & \ddots & \vdots \\ -\frac{1}{n} & \cdots & \cdots & 1-\frac{1}{n} \end{pmatrix}.$$
Notice that $\mathrm{tr}\,A^2 = \mathrm{tr}\,A = n-1$ and $A\theta = 0$. So
$$\mathrm{Var}\,S^2 = \frac{1}{(n-1)^2}\,\mathrm{Var}(X'AX) = \frac{1}{(n-1)^2}\left[2\sigma^4(n-1) + 0\right] = \frac{2\sigma^4}{n-1}.$$
5.11 Let $g(s) = s^2$. Since $g(\cdot)$ is a convex function, we know from Jensen's inequality that $\mathrm{E}\,g(S) \ge g(\mathrm{E}S)$, which implies $\sigma^2 = \mathrm{E}S^2 \ge (\mathrm{E}S)^2$. Taking square roots, $\sigma \ge \mathrm{E}S$. From the proof of Jensen's inequality, it is clear that, in fact, the inequality will be strict unless there is an interval $I$ such that $g$ is linear on $I$ and $P(X\in I) = 1$. Since $s^2$ is "linear" only on single points, we have $\mathrm{E}T^2 > (\mathrm{E}T)^2$ for any random variable $T$, unless $P(T = \mathrm{E}T) = 1$.
5.13
$$\mathrm{E}(cS) = c\,\mathrm{E}\left[\sqrt{\frac{\sigma^2}{n-1}}\sqrt{\frac{S^2(n-1)}{\sigma^2}}\right] = c\sqrt{\frac{\sigma^2}{n-1}}\int_0^\infty \sqrt{q}\;\frac{1}{\Gamma\!\left(\frac{n-1}{2}\right)2^{(n-1)/2}}\,q^{\frac{n-1}{2}-1}e^{-q/2}\,dq,$$
since $S^2(n-1)/\sigma^2 \sim \chi^2_{n-1}$. Now adjust the integrand to be another $\chi^2$ pdf (that of a $\chi^2_n$) and get
$$\mathrm{E}(cS) = c\sqrt{\frac{\sigma^2}{n-1}}\cdot\frac{\Gamma(n/2)\,2^{n/2}}{\Gamma((n-1)/2)\,2^{(n-1)/2}}\underbrace{\int_0^\infty \frac{1}{\Gamma(n/2)2^{n/2}}\,q^{n/2-1}e^{-q/2}\,dq}_{=1}.$$
So
$$c = \frac{\sqrt{n-1}\;\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{2}\;\Gamma\!\left(\frac{n}{2}\right)}$$
gives $\mathrm{E}(cS) = \sigma$.
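The constant $c$ can be checked numerically by integrating $\sqrt{\sigma^2 q/(n-1)}$ against the $\chi^2_{n-1}$ pdf. A Python sketch (not part of the original solution), using a simple midpoint rule:

```python
import math

def c_unbias(n):
    # c = sqrt(n-1) * Gamma((n-1)/2) / (sqrt(2) * Gamma(n/2))
    return math.sqrt(n - 1) * math.gamma((n - 1) / 2) / (math.sqrt(2) * math.gamma(n / 2))

# check E[cS] = sigma for n = 5, sigma = 1: E S = integral of
# sqrt(sigma^2 q / (n-1)) against the chi-squared_{n-1} pdf
n, sigma = 5, 1.0
k = n - 1                       # degrees of freedom of (n-1)S^2/sigma^2
dq = 1e-3
es = sum(math.sqrt(sigma**2 * q / k)
         * q**(k/2 - 1) * math.exp(-q/2) / (math.gamma(k/2) * 2**(k/2))
         * dq
         for q in (i * dq + dq/2 for i in range(int(60/dq))))
assert abs(c_unbias(n) * es - sigma) < 1e-4
```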


5.15 a.
$$\bar X_{n+1} = \frac{1}{n+1}\sum_{i=1}^{n+1}X_i = \frac{X_{n+1} + \sum_{i=1}^{n}X_i}{n+1} = \frac{X_{n+1} + n\bar X_n}{n+1}.$$
b.
$$nS_{n+1}^2 = \sum_{i=1}^{n+1}\left(X_i - \bar X_{n+1}\right)^2 = \sum_{i=1}^{n+1}\left(X_i - \frac{X_{n+1}+n\bar X_n}{n+1}\right)^2 \qquad\text{(use (a))}.$$
Adding and subtracting $\bar X_n$, and noting that $X_i - \frac{X_{n+1}+n\bar X_n}{n+1} = (X_i-\bar X_n) - \frac{X_{n+1}-\bar X_n}{n+1}$,
$$nS_{n+1}^2 = \sum_{i=1}^{n+1}\left(X_i-\bar X_n\right)^2 - \frac{2}{n+1}\left(X_{n+1}-\bar X_n\right)\sum_{i=1}^{n+1}\left(X_i-\bar X_n\right) + \frac{\left(X_{n+1}-\bar X_n\right)^2}{n+1}.$$
Since $\sum_{i=1}^{n}(X_i-\bar X_n) = 0$, we have $\sum_{i=1}^{n+1}(X_i-\bar X_n) = X_{n+1}-\bar X_n$ and $\sum_{i=1}^{n+1}(X_i-\bar X_n)^2 = (n-1)S_n^2 + (X_{n+1}-\bar X_n)^2$. Therefore
$$nS_{n+1}^2 = (n-1)S_n^2 + \left(X_{n+1}-\bar X_n\right)^2\left[1 - \frac{2}{n+1} + \frac{1}{n+1}\right] = (n-1)S_n^2 + \frac{n}{n+1}\left(X_{n+1}-\bar X_n\right)^2.$$
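The two recursions give a one-pass update of the sample mean and variance. A Python sketch (not part of the original solution), checked against the library functions:

```python
def update(xbar_n, s2_n, n, x_new):
    # one-pass update from Exercise 5.15:
    #   xbar_{n+1} = (x_new + n*xbar_n) / (n+1)
    #   n*S^2_{n+1} = (n-1)*S^2_n + n/(n+1)*(x_new - xbar_n)**2
    xbar_new = (x_new + n * xbar_n) / (n + 1)
    s2_new = ((n - 1) * s2_n + n / (n + 1) * (x_new - xbar_n) ** 2) / n
    return xbar_new, s2_new

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
xbar, s2 = data[0], 0.0          # with n = 1, S^2 is taken as 0
for n, x in enumerate(data[1:], start=1):
    xbar, s2 = update(xbar, s2, n, x)

from statistics import mean, variance
assert abs(xbar - mean(data)) < 1e-12
assert abs(s2 - variance(data)) < 1e-12
```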
5.16 a. $\sum_{i=1}^{3}\left(\frac{X_i-i}{i}\right)^2 \sim \chi^2_3$.
b. $\dfrac{(X_1-1)/1}{\sqrt{\sum_{i=2}^{3}\left(\frac{X_i-i}{i}\right)^2\Big/2}} \sim t_2$.
c. Square the random variable in part b).
5.17 a. Let $U\sim\chi^2_p$ and $V\sim\chi^2_q$, independent. Their joint pdf is
$$\frac{1}{\Gamma\!\left(\frac{p}{2}\right)\Gamma\!\left(\frac{q}{2}\right)2^{(p+q)/2}}\,u^{\frac{p}{2}-1}v^{\frac{q}{2}-1}e^{-\frac{u+v}{2}}.$$
From Definition 5.3.6, the random variable $X = (U/p)/(V/q)$ has an $F$ distribution, so we make the transformation $x = (u/p)/(v/q)$ and $y = u+v$. (Of course, many choices of $y$ will do, but this one makes calculations easy. The choice is prompted by the exponential term in the pdf.) Solving for $u$ and $v$ yields
$$u = \frac{\frac{p}{q}xy}{1+\frac{p}{q}x}, \qquad v = \frac{y}{1+\frac{p}{q}x}, \qquad |J| = \frac{\frac{p}{q}y}{\left(1+\frac{p}{q}x\right)^2}.$$
We then substitute into $f_{U,V}(u,v)$ to obtain
$$f_{X,Y}(x,y) = \frac{1}{\Gamma\!\left(\frac{p}{2}\right)\Gamma\!\left(\frac{q}{2}\right)2^{(p+q)/2}}\left(\frac{\frac{p}{q}xy}{1+\frac{p}{q}x}\right)^{\frac{p}{2}-1}\left(\frac{y}{1+\frac{p}{q}x}\right)^{\frac{q}{2}-1}e^{-\frac{y}{2}}\;\frac{\frac{p}{q}y}{\left(1+\frac{p}{q}x\right)^2}.$$

Note that the pdf factors, showing that $X$ and $Y$ are independent, and we can read off the pdfs of each: $X$ has the $F$ distribution and $Y$ is $\chi^2_{p+q}$. If we integrate out $y$ to recover the proper constant, we get the $F$ pdf
$$f_X(x) = \frac{\Gamma\!\left(\frac{p+q}{2}\right)}{\Gamma\!\left(\frac{p}{2}\right)\Gamma\!\left(\frac{q}{2}\right)}\left(\frac{p}{q}\right)^{p/2}\frac{x^{p/2-1}}{\left(1+\frac{p}{q}x\right)^{(p+q)/2}}.$$
b. Since $F_{p,q} = \frac{\chi^2_p/p}{\chi^2_q/q}$, let $U\sim\chi^2_p$ and $V\sim\chi^2_q$, with $U$ and $V$ independent. Then
$$\mathrm{E}F_{p,q} = \mathrm{E}\frac{U/p}{V/q} = \frac{q}{p}\,\mathrm{E}U\;\mathrm{E}\frac{1}{V} \quad\text{(by independence)}\quad = q\,\mathrm{E}\frac{1}{V} \quad (\mathrm{E}U = p).$$

Then
$$\mathrm{E}\frac{1}{V} = \int_0^\infty \frac{1}{v}\,\frac{1}{\Gamma\!\left(\frac{q}{2}\right)2^{q/2}}\,v^{\frac{q}{2}-1}e^{-\frac{v}{2}}\,dv = \frac{1}{\Gamma\!\left(\frac{q}{2}\right)2^{q/2}}\int_0^\infty v^{\frac{q-2}{2}-1}e^{-\frac{v}{2}}\,dv = \frac{\Gamma\!\left(\frac{q-2}{2}\right)2^{(q-2)/2}}{\Gamma\!\left(\frac{q}{2}\right)2^{q/2}} = \frac{1}{q-2}.$$
Hence, $\mathrm{E}F_{p,q} = \frac{q}{q-2}$,
if q > 2. To calculate the variance, first calculate
$$\mathrm{E}(F_{p,q}^2) = \mathrm{E}\frac{U^2q^2}{p^2V^2} = \frac{q^2}{p^2}\,\mathrm{E}(U^2)\,\mathrm{E}\frac{1}{V^2}.$$
Now
$$\mathrm{E}(U^2) = \mathrm{Var}(U) + (\mathrm{E}U)^2 = 2p+p^2 \quad\text{and}\quad \mathrm{E}\frac{1}{V^2} = \int_0^\infty\frac{1}{v^2}\,\frac{1}{\Gamma(q/2)2^{q/2}}\,v^{(q/2)-1}e^{-v/2}\,dv = \frac{1}{(q-2)(q-4)}.$$
Therefore,
$$\mathrm{E}F_{p,q}^2 = \frac{q^2}{p^2}\,p(2+p)\,\frac{1}{(q-2)(q-4)} = \frac{q^2(p+2)}{p(q-2)(q-4)},$$
and, hence
$$\mathrm{Var}(F_{p,q}) = \frac{q^2(p+2)}{p(q-2)(q-4)} - \left(\frac{q}{q-2}\right)^2 = 2\left(\frac{q}{q-2}\right)^2\frac{q+p-2}{p(q-4)}, \qquad q>4.$$
c. Write $X = \frac{U/p}{V/q}$; then $\frac{1}{X} = \frac{V/q}{U/p} \sim F_{q,p}$, since $U\sim\chi^2_p$, $V\sim\chi^2_q$ and $U$ and $V$ are independent.
d. Let $Y = \frac{(p/q)X}{1+(p/q)X}$, so $X = \frac{qY}{p(1-Y)}$ and $\frac{dx}{dy} = \frac{q}{p}(1-y)^{-2}$. Thus, $Y$ has pdf
$$f_Y(y) = \frac{\Gamma\!\left(\frac{p+q}{2}\right)}{\Gamma\!\left(\frac{p}{2}\right)\Gamma\!\left(\frac{q}{2}\right)}\left(\frac{p}{q}\right)^{p/2}\frac{\left(\frac{qy}{p(1-y)}\right)^{\frac{p}{2}-1}}{\left(1+\frac{p}{q}\,\frac{qy}{p(1-y)}\right)^{(p+q)/2}}\,\frac{q}{p}(1-y)^{-2} = \frac{\Gamma\!\left(\frac{p+q}{2}\right)}{\Gamma\!\left(\frac{p}{2}\right)\Gamma\!\left(\frac{q}{2}\right)}\,y^{\frac{p}{2}-1}(1-y)^{\frac{q}{2}-1} \sim \mathrm{beta}\!\left(\frac{p}{2},\frac{q}{2}\right).$$

5.18 If $X\sim t_p$, then $X = Z/\sqrt{V/p}$ where $Z\sim\mathrm{n}(0,1)$, $V\sim\chi^2_p$, and $Z$ and $V$ are independent.
a. $\mathrm{E}X = \mathrm{E}Z\;\mathrm{E}\!\left(1/\sqrt{V/p}\right) = 0$, since $\mathrm{E}Z=0$, as long as the other expectation is finite. This is so if $p>1$. From part b), $X^2\sim F_{1,p}$. Thus $\mathrm{Var}X = \mathrm{E}X^2 = p/(p-2)$, if $p>2$ (from Exercise 5.17b).
b. $X^2 = Z^2/(V/p)$. $Z^2\sim\chi^2_1$, so the ratio is distributed $F_{1,p}$.
c. The pdf of $X$ is
$$f_X(x) = \left[\frac{\Gamma\!\left(\frac{p+1}{2}\right)}{\Gamma(p/2)\sqrt{p\pi}}\right]\frac{1}{(1+x^2/p)^{(p+1)/2}}.$$
Denote the quantity in square brackets by $C_p$. From an extension of Stirling's formula (Exercise 1.28) and an application of Lemma 2.3.14,
$$\lim_{p\to\infty} C_p = \frac{e^{-1/2}\,e^{1/2}}{\sqrt{\pi}\,\sqrt{2}} = \frac{1}{\sqrt{2\pi}}.$$
Applying the lemma again shows that for each $x$,
$$\lim_{p\to\infty}\left(1+\frac{x^2}{p}\right)^{(p+1)/2} = e^{x^2/2},$$
establishing the result: the $t_p$ pdf converges to the $\mathrm{n}(0,1)$ pdf.
d. As the random variable $F_{1,p}$ is the square of a $t_p$, we conjecture that it would converge to the square of a $\mathrm{n}(0,1)$ random variable, a $\chi^2_1$.
e. The random variable $qF_{q,p}$ can be thought of as the sum of $q$ random variables, each a $t_p$ squared. Thus, by all of the above, we expect it to converge to a $\chi^2_q$ random variable as $p\to\infty$.
5.19 a. $\chi^2_p \sim \chi^2_q + \chi^2_d$, where $\chi^2_q$ and $\chi^2_d$ are independent $\chi^2$ random variables with $q$ and $d = p-q$ degrees of freedom. Since $\chi^2_d$ is a positive random variable, for any $a>0$,
$$P(\chi^2_p>a) = P(\chi^2_q+\chi^2_d>a) > P(\chi^2_q>a).$$
b. For $k_1>k_2$, $k_1F_{k_1,\nu}\sim(U+V)/(W/\nu)$, where $U$, $V$ and $W$ are independent and $U\sim\chi^2_{k_2}$, $V\sim\chi^2_{k_1-k_2}$ and $W\sim\chi^2_\nu$. For any $a>0$, because $V/(W/\nu)$ is a positive random variable, we have
$$P(k_1F_{k_1,\nu}>a) = P((U+V)/(W/\nu)>a) > P(U/(W/\nu)>a) = P(k_2F_{k_2,\nu}>a).$$
c. $\alpha = P(F_{k,\nu}>F_{\alpha,k,\nu}) = P(kF_{k,\nu}>kF_{\alpha,k,\nu})$. So, $kF_{\alpha,k,\nu}$ is the $\alpha$ cutoff point for the random variable $kF_{k,\nu}$. Because $kF_{k,\nu}$ is stochastically larger than $(k-1)F_{k-1,\nu}$, the $\alpha$ cutoff for $kF_{k,\nu}$ is larger than the $\alpha$ cutoff for $(k-1)F_{k-1,\nu}$, that is, $kF_{\alpha,k,\nu}>(k-1)F_{\alpha,k-1,\nu}$.
5.20 a. The given integral is
$$\int_0^\infty \frac{1}{\sqrt{2\pi}}\sqrt{x}\;e^{-t^2x/2}\,\frac{\nu}{\Gamma(\nu/2)2^{\nu/2}}(\nu x)^{(\nu/2)-1}e^{-\nu x/2}\,dx$$
$$= \frac{1}{\sqrt{2\pi}}\,\frac{\nu^{\nu/2}}{\Gamma(\nu/2)2^{\nu/2}}\int_0^\infty x^{((\nu+1)/2)-1}e^{-(\nu+t^2)x/2}\,dx \qquad\text{(integrand is kernel of gamma$\left(\frac{\nu+1}{2},\frac{2}{\nu+t^2}\right)$)}$$
$$= \frac{1}{\sqrt{2\pi}}\,\frac{\nu^{\nu/2}}{\Gamma(\nu/2)2^{\nu/2}}\,\Gamma\!\left(\frac{\nu+1}{2}\right)\left(\frac{2}{\nu+t^2}\right)^{(\nu+1)/2} = \frac{1}{\sqrt{\nu\pi}}\,\frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)}\,\frac{1}{(1+t^2/\nu)^{(\nu+1)/2}},$$
the pdf of a $t_\nu$ distribution.
b. Differentiate both sides with respect to $t$ to obtain
$$\nu f_F(\nu t) = \int_0^\infty yf_1(ty)f_\nu(y)\,dy,$$
where $f_F$ is the $F$ pdf. Now write out the two chi-squared pdfs and collect terms to get
$$\nu f_F(\nu t) = \frac{t^{-1/2}}{\Gamma(1/2)\Gamma(\nu/2)2^{(\nu+1)/2}}\int_0^\infty y^{(\nu-1)/2}e^{-(1+t)y/2}\,dy = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)2^{(\nu+1)/2}}{\Gamma(1/2)\Gamma(\nu/2)2^{(\nu+1)/2}}\,\frac{t^{-1/2}}{(1+t)^{(\nu+1)/2}}.$$
Now define $y = \nu t$ to get
$$f_F(y) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\nu\,\Gamma(1/2)\Gamma(\nu/2)}\,\frac{(y/\nu)^{-1/2}}{(1+y/\nu)^{(\nu+1)/2}},$$
the pdf of an $F_{1,\nu}$.
c. Again differentiate both sides with respect to $t$, write out the chi-squared pdfs, and collect terms to obtain
$$(\nu/m)f_F((\nu/m)t) = \frac{t^{m/2-1}}{\Gamma(m/2)\Gamma(\nu/2)2^{(\nu+m)/2}}\int_0^\infty y^{(m+\nu-2)/2}e^{-(1+t)y/2}\,dy.$$
Now, as before, integrate the gamma kernel, collect terms, and define $y = (\nu/m)t$ to get
$$f_F(y) = \frac{\Gamma\!\left(\frac{\nu+m}{2}\right)}{\Gamma(m/2)\Gamma(\nu/2)}\left(\frac{m}{\nu}\right)^{m/2}\frac{y^{m/2-1}}{(1+(m/\nu)y)^{(\nu+m)/2}},$$
the pdf of an $F_{m,\nu}$.
5.21 Let $m$ denote the median. Then, for general $n$ we have
$$P(\max(X_1,\ldots,X_n)>m) = 1 - P(X_i\le m \text{ for } i=1,2,\ldots,n) = 1 - \left[P(X_1\le m)\right]^n = 1 - \left(\frac{1}{2}\right)^n.$$
5.22 Calculating the cdf of $Z^2$, we obtain
$$F_{Z^2}(z) = P((\min(X,Y))^2\le z) = P(-\sqrt{z}\le\min(X,Y)\le\sqrt{z})$$
$$= P(\min(X,Y)\le\sqrt{z}) - P(\min(X,Y)\le-\sqrt{z})$$
$$= \left[1-P(\min(X,Y)>\sqrt{z})\right] - \left[1-P(\min(X,Y)>-\sqrt{z})\right]$$
$$= P(\min(X,Y)>-\sqrt{z}) - P(\min(X,Y)>\sqrt{z})$$
$$= P(X>-\sqrt{z})P(Y>-\sqrt{z}) - P(X>\sqrt{z})P(Y>\sqrt{z}),$$
where we use the independence of $X$ and $Y$. Since $X$ and $Y$ are identically distributed, $P(X>a)=P(Y>a)=1-F_X(a)$, so
$$F_{Z^2}(z) = \left(1-F_X(-\sqrt{z})\right)^2 - \left(1-F_X(\sqrt{z})\right)^2 = 1 - 2F_X(-\sqrt{z}),$$
since $1-F_X(\sqrt{z}) = F_X(-\sqrt{z})$. Differentiating and substituting gives
$$f_{Z^2}(z) = \frac{d}{dz}F_{Z^2}(z) = f_X(-\sqrt{z})\,\frac{1}{\sqrt{z}} = \frac{1}{\sqrt{2\pi}}\,e^{-z/2}z^{-1/2},$$
the pdf of a $\chi^2_1$ random variable. Alternatively,
$$P(Z^2\le z) = P([\min(X,Y)]^2\le z) = P(-\sqrt{z}\le\min(X,Y)\le\sqrt{z})$$
$$= P(-\sqrt{z}\le X\le\sqrt{z},\ X\le Y) + P(-\sqrt{z}\le Y\le\sqrt{z},\ Y\le X)$$
$$= P(-\sqrt{z}\le X\le\sqrt{z}\,|\,X\le Y)P(X\le Y) + P(-\sqrt{z}\le Y\le\sqrt{z}\,|\,Y\le X)P(Y\le X)$$
$$= \frac{1}{2}P(-\sqrt{z}\le X\le\sqrt{z}) + \frac{1}{2}P(-\sqrt{z}\le Y\le\sqrt{z}),$$
using the facts that $X$ and $Y$ are independent and $P(Y\le X)=P(X\le Y)=\frac{1}{2}$. Moreover, since $X$ and $Y$ are identically distributed,
$$P(Z^2\le z) = P(-\sqrt{z}\le X\le\sqrt{z})$$
and
$$f_{Z^2}(z) = \frac{d}{dz}P(-\sqrt{z}\le X\le\sqrt{z}) = \frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}e^{-z/2}z^{-1/2} + \frac{1}{\sqrt{2\pi}}e^{-z/2}z^{-1/2}\right) = \frac{1}{\sqrt{2\pi}}\,z^{-1/2}e^{-z/2},$$
the pdf of a $\chi^2_1$.
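The conclusion — that $Z^2 = (\min(X,Y))^2$ is $\chi^2_1$ for iid standard normals — can be checked by simulation. A Python sketch (not part of the original solution):

```python
import random

random.seed(3)
trials = 200_000
z2 = [min(random.gauss(0, 1), random.gauss(0, 1)) ** 2 for _ in range(trials)]

# chi^2_1 facts: mean 1, and P(chi^2_1 <= 1) = P(|Z| <= 1) ~ 0.6827
assert abs(sum(z2) / trials - 1.0) < 0.02
assert abs(sum(v <= 1.0 for v in z2) / trials - 0.6827) < 0.01
```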
5.23
$$P(Z>z) = \sum_{x=1}^{\infty}P(Z>z\,|\,x)P(X=x) = \sum_{x=1}^{\infty}P(U_1>z,\ldots,U_x>z)P(X=x)$$
$$= \sum_{x=1}^{\infty}\prod_{i=1}^{x}P(U_i>z)\,P(X=x) \qquad\text{(by independence of the $U_i$'s)}$$
$$= \sum_{x=1}^{\infty}(1-z)^x\,\frac{1}{(e-1)x!} = \frac{1}{e-1}\sum_{x=1}^{\infty}\frac{(1-z)^x}{x!} = \frac{e^{1-z}-1}{e-1}, \qquad 0<z<1.$$

5.24 Use $f_X(x)=1/\theta$, $F_X(x)=x/\theta$, $0<x<\theta$. Let $Y=X_{(n)}$, $Z=X_{(1)}$. Then, from Theorem 5.4.6,
$$f_{Z,Y}(z,y) = \frac{n!}{0!(n-2)!0!}\,\frac{1}{\theta}\,\frac{1}{\theta}\left(\frac{z}{\theta}\right)^0\left(\frac{y-z}{\theta}\right)^{n-2}\left(1-\frac{y}{\theta}\right)^0 = \frac{n(n-1)}{\theta^n}(y-z)^{n-2}, \qquad 0<z<y<\theta.$$
Now let $W=Z/Y$, $Q=Y$. Then $Y=Q$, $Z=WQ$, and $|J|=q$. Therefore
$$f_{W,Q}(w,q) = \frac{n(n-1)}{\theta^n}(q-wq)^{n-2}q = \frac{n(n-1)}{\theta^n}(1-w)^{n-2}q^{n-1}, \qquad 0<w<1,\ 0<q<\theta.$$
The joint pdf factors into functions of $w$ and $q$, and, hence, $W$ and $Q$ are independent.
5.25 The joint pdf of $X_{(1)},\ldots,X_{(n)}$ is
$$f(u_1,\ldots,u_n) = \frac{n!a^n}{\theta^{an}}\,u_1^{a-1}\cdots u_n^{a-1}, \qquad 0<u_1<\cdots<u_n<\theta.$$
Make the one-to-one transformation to $Y_1=X_{(1)}/X_{(2)},\ldots,Y_{n-1}=X_{(n-1)}/X_{(n)}$, $Y_n=X_{(n)}$. The Jacobian is $J = y_2y_3^2\cdots y_n^{n-1}$. So the joint pdf of $Y_1,\ldots,Y_n$ is
$$f(y_1,\ldots,y_n) = \frac{n!a^n}{\theta^{an}}\,(y_1\cdots y_n)^{a-1}(y_2\cdots y_n)^{a-1}\cdots(y_n)^{a-1}\,(y_2y_3^2\cdots y_n^{n-1}) = \frac{n!a^n}{\theta^{an}}\,y_1^{a-1}y_2^{2a-1}\cdots y_n^{na-1},$$
$0<y_i<1$, $i=1,\ldots,n-1$, $0<y_n<\theta$. We see that $f(y_1,\ldots,y_n)$ factors, so $Y_1,\ldots,Y_n$ are mutually independent. To get the pdf of $Y_1$, integrate out the other variables and obtain that $f_{Y_1}(y_1)=c_1y_1^{a-1}$, $0<y_1<1$, for some constant $c_1$. To have this pdf integrate to 1, it must be that $c_1=a$. Thus $f_{Y_1}(y_1)=ay_1^{a-1}$, $0<y_1<1$. Similarly, for $i=2,\ldots,n-1$, we obtain $f_{Y_i}(y_i)=iay_i^{ia-1}$, $0<y_i<1$. From Theorem 5.4.4, the pdf of $Y_n$ is $f_{Y_n}(y_n)=\frac{na}{\theta^{na}}y_n^{na-1}$, $0<y_n<\theta$. It can be checked that the product of these marginal pdfs is the joint pdf given above.
5.27 a. $f_{X_{(i)}|X_{(j)}}(u|v) = f_{X_{(i)},X_{(j)}}(u,v)/f_{X_{(j)}}(v)$. Consider two cases, depending on which of $i$ or $j$ is greater. Using the formulas from Theorems 5.4.4 and 5.4.6, and after cancellation, we obtain the following.
(i) If $i<j$,
$$f_{X_{(i)}|X_{(j)}}(u|v) = \frac{(j-1)!}{(i-1)!(j-1-i)!}\,f_X(u)F_X^{i-1}(u)\left[F_X(v)-F_X(u)\right]^{j-i-1}F_X^{-(j-1)}(v)$$
$$= \frac{(j-1)!}{(i-1)!(j-1-i)!}\,\frac{f_X(u)}{F_X(v)}\left(\frac{F_X(u)}{F_X(v)}\right)^{i-1}\left(1-\frac{F_X(u)}{F_X(v)}\right)^{j-i-1}, \qquad u<v.$$
Note this interpretation. This is the pdf of the $i$th order statistic from a sample of size $j-1$, from a population with pdf given by the truncated distribution, $f(u)=f_X(u)/F_X(v)$, $u<v$.
(ii) If $j<i$ and $u>v$,
$$f_{X_{(i)}|X_{(j)}}(u|v) = \frac{(n-j)!}{(i-1-j)!(n-i)!}\,f_X(u)\left[1-F_X(u)\right]^{n-i}\left[F_X(u)-F_X(v)\right]^{i-1-j}\left[1-F_X(v)\right]^{j-n}$$
$$= \frac{(n-j)!}{(i-j-1)!(n-i)!}\,\frac{f_X(u)}{1-F_X(v)}\left(\frac{F_X(u)-F_X(v)}{1-F_X(v)}\right)^{i-j-1}\left(1-\frac{F_X(u)-F_X(v)}{1-F_X(v)}\right)^{n-i}.$$
This is the pdf of the $(i-j)$th order statistic from a sample of size $n-j$, from a population with pdf given by the truncated distribution, $f(u)=f_X(u)/(1-F_X(v))$, $u>v$.
b. From Example 5.4.7,
$$f_{V|R}(v|r) = \frac{n(n-1)r^{n-2}/a^n}{n(n-1)r^{n-2}(a-r)/a^n} = \frac{1}{a-r}, \qquad r/2<v<a-r/2.$$


5.29 Let $X_i$ = weight of $i$th booklet in package. The $X_i$'s are iid with $\mathrm{E}X_i=1$ and $\mathrm{Var}X_i=.05^2$. We want to approximate
$$P\Bigl(\sum_{i=1}^{100}X_i>100.4\Bigr) = P\Bigl(\sum_{i=1}^{100}X_i/100>1.004\Bigr) = P(\bar X>1.004).$$
By the CLT, $P(\bar X>1.004) \approx P\left(Z>(1.004-1)/(.05/10)\right) = P(Z>.8) = .2119$.
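The CLT answer can be checked by simulation. A Python sketch (not part of the original solution); it uses normal booklet weights, for which the approximation is exact:

```python
import random

random.seed(5)
trials = 50_000
# each package holds 100 booklets with mean weight 1 and sd .05
exceed = sum(
    sum(random.gauss(1, 0.05) for _ in range(100)) > 100.4
    for _ in range(trials)
)
assert abs(exceed / trials - 0.2119) < 0.01
```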
5.30 From the CLT we have, approximately, $\bar X_1\sim\mathrm{n}(\mu,\sigma^2/n)$ and $\bar X_2\sim\mathrm{n}(\mu,\sigma^2/n)$. Since $\bar X_1$ and $\bar X_2$ are independent, $\bar X_1-\bar X_2\sim\mathrm{n}(0,2\sigma^2/n)$. Thus, we want
$$.99 \approx P\left(\left|\bar X_1-\bar X_2\right|<\frac{\sigma}{5}\right) = P\left(\frac{-\sigma/5}{\sigma\sqrt{2/n}}<\frac{\bar X_1-\bar X_2}{\sigma\sqrt{2/n}}<\frac{\sigma/5}{\sigma\sqrt{2/n}}\right) = P\left(-\frac{1}{5}\sqrt{\frac{n}{2}}<Z<\frac{1}{5}\sqrt{\frac{n}{2}}\right),$$
which requires $\frac{1}{5}\sqrt{n/2}\approx2.576$, that is, $n\approx2(5\times2.576)^2\approx332$.
5.32 a. For any $\epsilon>0$,
$$P\left(\left|\sqrt{X_n}-\sqrt{a}\right|>\epsilon\right) = P\left(\left|\sqrt{X_n}-\sqrt{a}\right|\left(\sqrt{X_n}+\sqrt{a}\right)>\epsilon\left(\sqrt{X_n}+\sqrt{a}\right)\right) = P\left(|X_n-a|>\epsilon\left(\sqrt{X_n}+\sqrt{a}\right)\right) \le P\left(|X_n-a|>\epsilon\sqrt{a}\right) \to 0,$$
as $n\to\infty$, since $X_n\to a$ in probability. Thus $\sqrt{X_n}\to\sqrt{a}$ in probability.
b. For any $\epsilon>0$,
$$P\left(\left|\frac{a}{X_n}-1\right|\le\epsilon\right) = P\left(\frac{a}{1+\epsilon}\le X_n\le\frac{a}{1-\epsilon}\right) = P\left(a-\frac{a\epsilon}{1+\epsilon}\le X_n\le a+\frac{a\epsilon}{1-\epsilon}\right)$$
$$\ge P\left(a-\frac{a\epsilon}{1+\epsilon}\le X_n\le a+\frac{a\epsilon}{1+\epsilon}\right) = P\left(|X_n-a|\le\frac{a\epsilon}{1+\epsilon}\right) \to 1,$$
as $n\to\infty$, since $X_n\to a$ in probability. Thus $a/X_n\to1$ in probability.
5.33 We need to show that for any $c$ and any $\epsilon>0$ there exists $N$ such that if $n>N$, then $P(X_n+Y_n>c)>1-\epsilon$. Choose $N_1$ such that $P(X_n>-m)>1-\epsilon/2$ for $n>N_1$, and $N_2$ such that $P(Y_n>c+m)>1-\epsilon/2$ for $n>N_2$. Then, for $n>\max(N_1,N_2)$,
$$P(X_n+Y_n>c) \ge P(X_n>-m,\ Y_n>c+m) \ge P(X_n>-m)+P(Y_n>c+m)-1 = 1-\epsilon.$$
5.34 Using $\mathrm{E}\bar X_n=\mu$ and $\mathrm{Var}\bar X_n=\sigma^2/n$, we obtain
$$\mathrm{E}\,\frac{\sqrt{n}(\bar X_n-\mu)}{\sigma} = \frac{\sqrt{n}}{\sigma}\,\mathrm{E}(\bar X_n-\mu) = \frac{\sqrt{n}}{\sigma}(\mu-\mu) = 0,$$
$$\mathrm{Var}\,\frac{\sqrt{n}(\bar X_n-\mu)}{\sigma} = \frac{n}{\sigma^2}\,\mathrm{Var}(\bar X_n-\mu) = \frac{n}{\sigma^2}\,\mathrm{Var}\bar X_n = \frac{n}{\sigma^2}\,\frac{\sigma^2}{n} = 1.$$
5.35 a. $X_i\sim$ exponential(1). $\mu_X=1$, $\mathrm{Var}X=1$. From the CLT, $\bar X_n$ is approximately $\mathrm{n}(1,1/n)$. So
$$\frac{\bar X_n-1}{1/\sqrt{n}} \to Z\sim\mathrm{n}(0,1) \qquad\text{and}\qquad P\left(\frac{\bar X_n-1}{1/\sqrt{n}}\le x\right) \to P(Z\le x).$$
b.
$$\frac{d}{dx}P(Z\le x) = \frac{d}{dx}F_Z(x) = f_Z(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$
On the other hand, with $W=\sum_{i=1}^nX_i\sim$ gamma($n,1$),
$$\frac{d}{dx}P\left(\frac{\bar X_n-1}{1/\sqrt{n}}\le x\right) = \frac{d}{dx}P\Bigl(\sum_{i=1}^nX_i\le x\sqrt{n}+n\Bigr) = \frac{d}{dx}F_W(x\sqrt{n}+n) = f_W(x\sqrt{n}+n)\cdot\sqrt{n} = \frac{1}{\Gamma(n)}(x\sqrt{n}+n)^{n-1}e^{-(x\sqrt{n}+n)}\sqrt{n}.$$
Therefore, $\frac{1}{\Gamma(n)}(x\sqrt{n}+n)^{n-1}e^{-(x\sqrt{n}+n)}\sqrt{n}\approx\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ as $n\to\infty$. Substituting $x=0$ yields $n!\approx n^{n+1/2}e^{-n}\sqrt{2\pi}$.
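The Stirling approximation obtained here is easy to check numerically; its relative error is about $1/(12n)$. A Python sketch (not part of the original solution):

```python
import math

# Stirling's approximation n! ~ n^(n + 1/2) * e^(-n) * sqrt(2*pi),
# obtained above from the CLT applied to exponential(1) sums
for n in (5, 10, 20):
    stirling = n ** (n + 0.5) * math.exp(-n) * math.sqrt(2 * math.pi)
    ratio = stirling / math.factorial(n)
    assert abs(ratio - 1) < 1 / (10 * n)   # relative error ~ 1/(12n)
```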
5.37 a. For the exact calculations, use the fact that $V_n$ is itself distributed negative binomial($10r$, $p$). The results are summarized in the following table. Note that the recursion relation of problem 3.48 can be used to simplify calculations.

                          P(Vn = v)
   v    (a) Exact   (b) Normal App.   (c) Normal w/cont.
   0      .0008         .0071              .0056
   1      .0048         .0083              .0113
   2      .0151         .0147              .0201
   3      .0332         .0258              .0263
   4      .0572         .0392              .0549
   5      .0824         .0588              .0664
   6      .1030         .0788              .0882
   7      .1148         .0937              .1007
   8      .1162         .1100              .1137
   9      .1085         .1114              .1144
  10      .0944         .1113              .1024


b. Using the normal approximation, we have
$$\mu_v = r(1-p)/p = 20(.3)/.7 = 8.57 \qquad\text{and}\qquad \sigma_v = \sqrt{r(1-p)/p^2} = \sqrt{(20)(.3)/.49} = 3.5.$$
Then,
$$P(V_n=0) = 1-P(V_n\ge1) = 1-P\left(\frac{V_n-8.57}{3.5}\ge\frac{1-8.57}{3.5}\right) = 1-P(Z\ge-2.16) = .0154.$$
Another way to approximate this probability is
$$P(V_n=0) = P(V_n\le0) = P\left(\frac{V_n-8.57}{3.5}\le\frac{0-8.57}{3.5}\right) = P(Z\le-2.45) = .0071.$$
Continuing in this way we have $P(V=1) = P(V\le1)-P(V\le0) = .0154-.0071 = .0083$, etc.
c. With the continuity correction, compute $P(V=k)$ by
$$P\left(\frac{(k-.5)-8.57}{3.5}\le Z\le\frac{(k+.5)-8.57}{3.5}\right),$$
so $P(V=0) = P(-9.07/3.5\le Z\le-8.07/3.5) = .0104-.0048 = .0056$, etc. Notice that the continuity correction gives some improvement over the uncorrected normal approximation.
5.39 a. If $h$ is continuous, given $\epsilon>0$ there exists $\delta$ such that $|h(x_n)-h(x)|<\epsilon$ for $|x_n-x|<\delta$. Since $X_1,\ldots,X_n$ converges in probability to the random variable $X$, $\lim_{n\to\infty}P(|X_n-X|<\delta)=1$. Thus $\lim_{n\to\infty}P(|h(X_n)-h(X)|<\epsilon)=1$.
b. Define the subsequence $X_j(s) = s+I_{[a,b]}(s)$ such that in $I_{[a,b]}$, $a$ is always 0; i.e., the subsequence $X_1, X_2, X_4, X_7, \ldots$. For this subsequence
$$X_j(s) \to \begin{cases} s & \text{if } s>0, \\ s+1 & \text{if } s=0. \end{cases}$$
5.41 a. Let $\epsilon = |x-\mu|$.

(i) For $x-\mu\ge0$,
$$P(|X_n-\mu|>\epsilon) = P(|X_n-\mu|>x-\mu) = P(X_n-\mu<-(x-\mu)) + P(X_n-\mu>x-\mu) \ge P(X_n-\mu>x-\mu) = P(X_n>x) = 1-P(X_n\le x).$$
Therefore, $0 = \lim_{n\to\infty}P(|X_n-\mu|>\epsilon) \ge \lim_{n\to\infty}\left[1-P(X_n\le x)\right]$. Thus $\lim_{n\to\infty}P(X_n\le x)=1$.
(ii) For $x-\mu<0$,
$$P(|X_n-\mu|>\epsilon) = P(|X_n-\mu|>-(x-\mu)) = P(X_n-\mu<x-\mu) + P(X_n-\mu>-(x-\mu)) \ge P(X_n-\mu<x-\mu) = P(X_n\le x).$$
Therefore, $0 = \lim_{n\to\infty}P(|X_n-\mu|>\epsilon) \ge \lim_{n\to\infty}P(X_n\le x)$, so $\lim_{n\to\infty}P(X_n\le x)=0$.
By (i) and (ii) the result follows.
b. For every $\epsilon>0$,
$$P(|X_n-\mu|>\epsilon) \le P(X_n-\mu<-\epsilon) + P(X_n-\mu>\epsilon) = P(X_n<\mu-\epsilon) + 1 - P(X_n\le\mu+\epsilon) \to 0 \quad\text{as } n\to\infty.$$


5.43 a. $P(|Y_n-\theta|<\epsilon) = P\left(\sqrt{n}\,|Y_n-\theta|<\sqrt{n}\,\epsilon\right)$. Therefore,
$$\lim_{n\to\infty}P(|Y_n-\theta|<\epsilon) = \lim_{n\to\infty}P\left(\sqrt{n}\,|Y_n-\theta|<\sqrt{n}\,\epsilon\right) = P(|Z|<\infty) = 1,$$
where $Z\sim\mathrm{n}(0,\sigma^2)$. Thus $Y_n\to\theta$ in probability.
b. By Slutsky's Theorem (a), $g'(\theta)\sqrt{n}(Y_n-\theta) \to g'(\theta)X$, where $X\sim\mathrm{n}(0,\sigma^2)$. Therefore
$$\sqrt{n}\left[g(Y_n)-g(\theta)\right] \approx g'(\theta)\sqrt{n}(Y_n-\theta) \to \mathrm{n}(0,\sigma^2[g'(\theta)]^2).$$
5.45 We do part (a); the other parts are similar. Using Mathematica, the exact calculation is
In[120]:=
f1[x_]=PDF[GammaDistribution[4,25],x]
p1=Integrate[f1[x],{x,100,\[Infinity]}]//N
1-CDF[BinomialDistribution[300,p1],149]
Out[120]=
e^(-x/25) x^3/2343750
Out[121]=
0.43347
Out[122]=
0.0119389
The answer can also be simulated in Mathematica or in R. Here is the R code for simulating the same probability (the first line estimates p1 = P(gamma(4,25) > 100)):
p1 <- mean(rgamma(10000,4,scale=25)>100)
mean(rbinom(10000,300,p1)>149)
In each case 10,000 random variables were simulated. We obtained p1 = 0.438 and a binomial probability of 0.0108.
5.47 a. $-2\log(U_j)\sim$ exponential(2) $\sim\chi^2_2$. Thus $Y = \sum_{j=1}^{\nu}-2\log(U_j)$ is the sum of $\nu$ independent $\chi^2_2$ random variables. By Lemma 5.3.2(b), $Y\sim\chi^2_{2\nu}$.
b. $-\beta\log(U_j)\sim$ exponential($\beta$) $\sim$ gamma($1,\beta$). Thus $Y = -\beta\sum_{j=1}^{a}\log(U_j)$ is the sum of independent gamma random variables. By Example 4.6.8, $Y\sim$ gamma($a,\beta$).
c. Let $V = -\sum_{j=1}^{a}\log(U_j)\sim$ gamma($a,1$). Similarly, $W = -\sum_{j=1}^{b}\log(U_j)\sim$ gamma($b,1$). By Exercise 4.24, $\frac{V}{V+W}\sim$ beta($a,b$).
5.49 a. See Example 2.1.4.
b. $X = g(U) = \log\frac{U}{1-U}$. Then $g^{-1}(y) = \frac{1}{1+e^{-y}}$. Thus
$$f_X(y) = 1\times\left|\frac{d}{dy}g^{-1}(y)\right| = \frac{e^{-y}}{(1+e^{-y})^2}, \qquad -\infty<y<\infty,$$
which is the density of a logistic($0,1$) random variable.
c. Let $Y\sim$ logistic($\mu,\beta$); then $f_Y(y) = \frac{1}{\beta}f_Z\!\left(\frac{y-\mu}{\beta}\right)$, where $f_Z$ is the density of a logistic($0,1$). Then $Y = \beta Z+\mu$. To generate a logistic($\mu,\beta$) random variable: (i) generate $U\sim$ uniform($0,1$); (ii) set $Y = \beta\log\frac{U}{1-U}+\mu$.
5.51 a. For $U_i\sim$ uniform($0,1$), $\mathrm{E}U_i=1/2$, $\mathrm{Var}U_i=1/12$. Then
$$X = \sum_{i=1}^{12}U_i-6 = 12\bar U-6 = \sqrt{12}\,\frac{\bar U-1/2}{1/\sqrt{12}}$$
is in the form $\sqrt{n}\,(\bar U-\mathrm{E}U)/\sigma$ with $n=12$, so $X$ is approximately $\mathrm{n}(0,1)$ by the Central Limit Theorem.
b. The approximation does not have the same range as $Z\sim\mathrm{n}(0,1)$, where $-\infty<Z<+\infty$, since $-6<X<6$.
c.
$$\mathrm{E}X = \mathrm{E}\Bigl(\sum_{i=1}^{12}U_i-6\Bigr) = \sum_{i=1}^{12}\mathrm{E}U_i-6 = \frac{12}{2}-6 = 0.$$
$$\mathrm{Var}X = \mathrm{Var}\Bigl(\sum_{i=1}^{12}U_i-6\Bigr) = \mathrm{Var}\sum_{i=1}^{12}U_i = 12\,\mathrm{Var}U_1 = 1.$$
$\mathrm{E}X^3=0$ since $X$ is symmetric about 0. (In fact, all odd moments of $X$ are 0.) Thus, the first three moments of $X$ all agree with the first three moments of a $\mathrm{n}(0,1)$. The fourth moment is not easy to get; one way to do it is to get the mgf of $X$. Since $\mathrm{E}e^{tU}=(e^t-1)/t$,
$$\mathrm{E}\,e^{t\left(\Sigma_{i=1}^{12}U_i-6\right)} = e^{-6t}\left(\frac{e^t-1}{t}\right)^{12} = \left(\frac{e^{t/2}-e^{-t/2}}{t}\right)^{12}.$$
Computing the fourth derivative and evaluating it at $t=0$ gives us $\mathrm{E}X^4$. This is a lengthy calculation. The answer is $\mathrm{E}X^4 = 29/10$, slightly smaller than $\mathrm{E}Z^4 = 3$, where $Z\sim\mathrm{n}(0,1)$.
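The moments of the sum-of-12-uniforms generator, including $\mathrm{E}X^4 = 29/10$, can be checked by simulation. A Python sketch (not part of the original solution):

```python
import random

random.seed(17)
trials = 200_000
xs = [sum(random.random() for _ in range(12)) - 6 for _ in range(trials)]

m1 = sum(xs) / trials
m2 = sum(x**2 for x in xs) / trials
m4 = sum(x**4 for x in xs) / trials
assert abs(m1) < 0.01            # E X   = 0
assert abs(m2 - 1) < 0.02        # E X^2 = 1
assert abs(m4 - 2.9) < 0.1       # E X^4 = 29/10, just under the normal's 3
```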
5.53 The R code is the following:
a. obs 0 and b − [b] > 0 and y ∈ (0, 1).

b. $M = \sup_y \dfrac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}y^{a-1}(1-y)^{b-1}}{\frac{\Gamma([a]+b)}{\Gamma([a])\Gamma(b)}y^{[a]-1}(1-y)^{b-1}} < \infty$, since $a-[a]>0$ and $y\in(0,1)$.
c. $M = \sup_y \dfrac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}y^{a-1}(1-y)^{b-1}}{\frac{\Gamma([a]+1+b')}{\Gamma([a]+1)\Gamma(b')}y^{[a]+1-1}(1-y)^{b'-1}} < \infty$, since $a-[a]-1<0$ and $y\in(0,1)$. $b-b'>0$ when $b'=[b]$ and will be equal to zero when $b'=b$, thus it does not affect the result.
d. Let $f(y) = y^\alpha(1-y)^\beta$. Then
$$\frac{df(y)}{dy} = \alpha y^{\alpha-1}(1-y)^\beta - \beta y^\alpha(1-y)^{\beta-1} = y^{\alpha-1}(1-y)^{\beta-1}\left[\alpha(1-y)-\beta y\right],$$
which is maximized at $y = \frac{\alpha}{\alpha+\beta}$. Therefore, for $\alpha = a-a'$ and $\beta = b-b'$,
$$M = \frac{\Gamma(a+b)/\left(\Gamma(a)\Gamma(b)\right)}{\Gamma(a'+b')/\left(\Gamma(a')\Gamma(b')\right)}\left(\frac{a-a'}{a-a'+b-b'}\right)^{a-a'}\left(\frac{b-b'}{a-a'+b-b'}\right)^{b-b'}.$$
We need to minimize $M$ in $a'$ and $b'$. First consider the term $\left(\frac{\alpha}{c}\right)^{\alpha}\left(\frac{c-\alpha}{c}\right)^{c-\alpha}$ with $c = \alpha+\beta$ held fixed; this term is maximized at $\alpha = \frac{1}{2}c$. Then
$$M = \left(\frac{1}{2}\right)^{(a-a'+b-b')}\frac{\Gamma(a+b)/\left(\Gamma(a)\Gamma(b)\right)}{\Gamma(a'+b')/\left(\Gamma(a')\Gamma(b')\right)}.$$
Note that the minimum that $M$ could be is one, which is attained when $a=a'$ and $b=b'$. Otherwise the minimum will occur when $a-a'$ and $b-b'$ are minimal but greater than or equal to zero; this is when $a'=[a]$ and $b'=[b]$, or $a'=a$ and $b'=[b]$, or $a'=[a]$ and $b'=b$.
5.63
$$M = \sup_y \frac{\frac{1}{\sqrt{2\pi}}e^{-y^2/2}}{\frac{1}{2\lambda}e^{-|y|/\lambda}}.$$
Let $f(y) = -\frac{y^2}{2}+\frac{|y|}{\lambda}$. Then $f(y)$ is maximized at $y=\frac{1}{\lambda}$ when $y\ge0$ and at $y=-\frac{1}{\lambda}$ when $y<0$. Therefore in both cases
$$M = \frac{2\lambda}{\sqrt{2\pi}}\,e^{1/(2\lambda^2)}.$$
To minimize $M$, note that $\frac{d\log M}{d\lambda} = \frac{1}{\lambda}-\frac{1}{\lambda^3}$; therefore $M$ is minimized at $\lambda=1$ or $\lambda=-1$. Thus the value of $\lambda$ that will optimize the algorithm is $\lambda=1$.

5.65
$$P(X^*\le x) = \sum_{i=1}^{m}P(X^*\le x\,|\,q_i)q_i = \sum_{i=1}^{m}I(Y_i\le x)q_i = \frac{\frac{1}{m}\sum_{i=1}^{m}\frac{f(Y_i)}{g(Y_i)}I(Y_i\le x)}{\frac{1}{m}\sum_{i=1}^{m}\frac{f(Y_i)}{g(Y_i)}}$$
$$\underset{m\to\infty}{\longrightarrow} \frac{\mathrm{E}_g\left[\frac{f(Y)}{g(Y)}I(Y\le x)\right]}{\mathrm{E}_g\left[\frac{f(Y)}{g(Y)}\right]} = \frac{\int_{-\infty}^{x}\frac{f(y)}{g(y)}g(y)\,dy}{\int_{-\infty}^{\infty}\frac{f(y)}{g(y)}g(y)\,dy} = \int_{-\infty}^{x}f(y)\,dy.$$

5.67 An R code to generate the sample of size 100 from the specified distribution is shown for part c). The Metropolis Algorithm is used to generate 2000 variables. Among other options, one can choose the 100 variables in positions 1001 to 1100, or the ones in positions 1010, 1020, ..., 2000.
a. We want to generate $X = \sigma Z+\mu$ where $Z\sim$ Student's $t$ with $\nu$ degrees of freedom. Therefore we first can generate a sample of size 100 from a Student's $t$ distribution with $\nu$ degrees of freedom and then make the transformation to obtain the $X$'s. Thus
$$f_Z(z) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}}\,\frac{1}{\left(1+\frac{z^2}{\nu}\right)^{(\nu+1)/2}}.$$
Let $V\sim\mathrm{n}\!\left(0,\frac{\nu}{\nu-2}\right)$, since given $\nu$ we can set
$$\mathrm{E}V = \mathrm{E}Z = 0 \qquad\text{and}\qquad \mathrm{Var}(V) = \mathrm{Var}(Z) = \frac{\nu}{\nu-2}.$$
Now, follow the algorithm on page 254 and generate the sample $Z_1,Z_2,\ldots,Z_{100}$, and then calculate $X_i = \sigma Z_i+\mu$.
b. $f_X(x) = \frac{1}{x\sqrt{2\pi}\,\sigma}e^{-(\log x-\mu)^2/2\sigma^2}$. Let $V\sim$ gamma($\alpha,\beta$) where
$$\alpha = \frac{\left(e^{\mu+(\sigma^2/2)}\right)^2}{e^{2(\mu+\sigma^2)}-e^{2\mu+\sigma^2}} \qquad\text{and}\qquad \beta = \frac{e^{2(\mu+\sigma^2)}-e^{2\mu+\sigma^2}}{e^{\mu+(\sigma^2/2)}},$$
since given $\mu$ and $\sigma^2$ we can set
$$\mathrm{E}V = \alpha\beta = e^{\mu+(\sigma^2/2)} = \mathrm{E}X \qquad\text{and}\qquad \mathrm{Var}(V) = \alpha\beta^2 = e^{2(\mu+\sigma^2)}-e^{2\mu+\sigma^2} = \mathrm{Var}(X).$$
Now, follow the algorithm on page 254.
c. $f_X(x) = \frac{\alpha}{\beta}e^{-x^\alpha/\beta}x^{\alpha-1}$. Let $V\sim$ exponential($\beta$). Now, follow the algorithm on page 254, where
$$\rho_i = \min\left\{\frac{V_i^{\alpha-1}}{Z_{i-1}^{\alpha-1}}\,e^{\left(-V_i^\alpha+V_i+Z_{i-1}^\alpha-Z_{i-1}\right)/\beta},\ 1\right\}.$$
An R code to generate a sample of size 100 from a Weibull(3,2) is:
#initialize a and b
$L(\theta|\mathbf{x}) > L(\theta'|\mathbf{x})$ if and only if $\log L(\theta|\mathbf{x}) > \log L(\theta'|\mathbf{x})$. So the value $\hat\theta$ that maximizes $\log L(\theta|\mathbf{x})$ is the same as the value that maximizes $L(\theta|\mathbf{x})$.
7.5 a. The value $\hat z$ solves the equation
$$(1-p)^n = \prod_i\left(1-x_i\hat z\right),$$
where $0\le\hat z\le(\max_ix_i)^{-1}$. Let $\hat k$ = greatest integer less than or equal to $1/\hat z$. Then from Example 7.2.9, $\hat k$ must satisfy
$$\left[\hat k(1-p)\right]^n \ge \prod_i\left(\hat k-x_i\right) \qquad\text{and}\qquad \left[(\hat k+1)(1-p)\right]^n < \prod_i\left(\hat k+1-x_i\right).$$
Because the right-hand side of the first equation is decreasing in $\hat z$, and because $\hat k\le1/\hat z$ (so $\hat z\le1/\hat k$) and $\hat k+1>1/\hat z$, $\hat k$ must satisfy the two inequalities. Thus $\hat k$ is the MLE.
b. For $p=1/2$, we must solve $\left(\frac{1}{2}\right)^4 = (1-20z)(1-z)(1-19z)$, which can be reduced to the cubic equation
$$-380z^3+419z^2-40z+\frac{15}{16} = 0.$$
The roots are .9998, .0646, and .0381, leading to candidates of 1, 15, and 26 for $\hat k$. The first two are less than $\max_ix_i$. Thus $\hat k = 26$.
7.6 a. $f(\mathbf{x}|\theta) = \prod_i\theta x_i^{-2}I_{[\theta,\infty)}(x_i) = \theta^n\left(\prod_ix_i^{-2}\right)I_{[\theta,\infty)}(x_{(1)})$. Thus, $X_{(1)}$ is a sufficient statistic for $\theta$ by the Factorization Theorem.
b. $L(\theta|\mathbf{x}) = \theta^n\left(\prod_ix_i^{-2}\right)I_{[\theta,\infty)}(x_{(1)})$. $\theta^n$ is increasing in $\theta$. The second term does not involve $\theta$. So to maximize $L(\theta|\mathbf{x})$, we want to make $\theta$ as large as possible. But because of the indicator function, $L(\theta|\mathbf{x})=0$ if $\theta>x_{(1)}$. Thus, $\hat\theta = x_{(1)}$.
c. $\mathrm{E}X = \int_\theta^\infty x\,\theta x^{-2}\,dx = \theta\log x\big|_\theta^\infty = \infty$. Thus the method of moments estimator of $\theta$ does not exist. (This is the Pareto distribution with $\alpha=\theta$, $\beta=1$.)
7.7 $L(0|\mathbf{x})=1$, $0<x_i<1$, and $L(1|\mathbf{x}) = \prod_i1/(2\sqrt{x_i})$, $0<x_i<1$. Thus, the MLE is 0 if $1\ge\prod_i1/(2\sqrt{x_i})$, and the MLE is 1 if $1<\prod_i1/(2\sqrt{x_i})$.
7.8 a. $\mathrm{E}X^2 = \mathrm{Var}X+\mu^2 = \sigma^2$. Therefore $X^2$ is an unbiased estimator of $\sigma^2$.
b.
$$L(\sigma|x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/(2\sigma^2)}, \qquad \log L(\sigma|x) = \log(2\pi)^{-1/2}-\log\sigma-\frac{x^2}{2\sigma^2}.$$
$$\frac{\partial\log L}{\partial\sigma} = -\frac{1}{\sigma}+\frac{x^2}{\sigma^3} \overset{\text{set}}{=} 0 \;\Rightarrow\; \hat\sigma^2 = x^2 \;\Rightarrow\; \hat\sigma = \sqrt{x^2} = |x|.$$
$$\frac{\partial^2\log L}{\partial\sigma^2} = \frac{1}{\sigma^2}-\frac{3x^2}{\sigma^4},$$
which is negative at $\hat\sigma = |x|$. Thus, $\hat\sigma = |x|$ is a local maximum. Because it is the only place where the first derivative is zero, it is also a global maximum.
c. Because $\mathrm{E}X = 0$ is known, just equate $\mathrm{E}X^2 = \sigma^2 = \frac{1}{n}\sum_{i=1}^nX_i^2 = X^2$ (here $n=1$) $\Rightarrow \hat\sigma = |X|$.
7.9 This is a uniform($0,\theta$) model. So $\mathrm{E}X = (0+\theta)/2 = \theta/2$. The method of moments estimator is the solution to the equation $\tilde\theta/2 = \bar X$, that is, $\tilde\theta = 2\bar X$. Because $\tilde\theta$ is a simple function of the sample mean, its mean and variance are easy to calculate. We have
$$\mathrm{E}\tilde\theta = 2\,\mathrm{E}\bar X = 2\,\mathrm{E}X = 2\frac{\theta}{2} = \theta, \qquad \mathrm{Var}\,\tilde\theta = 4\,\mathrm{Var}\bar X = 4\frac{\theta^2/12}{n} = \frac{\theta^2}{3n}.$$
The likelihood function is
$$L(\theta|\mathbf{x}) = \prod_{i=1}^n\frac{1}{\theta}I_{[0,\theta]}(x_i) = \frac{1}{\theta^n}I_{[0,\theta]}(x_{(n)})I_{[0,\infty)}(x_{(1)}),$$
where $x_{(1)}$ and $x_{(n)}$ are the smallest and largest order statistics. For $\theta\ge x_{(n)}$, $L = 1/\theta^n$, a decreasing function. So for $\theta\ge x_{(n)}$, $L$ is maximized at $\hat\theta = x_{(n)}$. $L=0$ for $\theta<x_{(n)}$. So the overall maximum, the MLE, is $\hat\theta = X_{(n)}$. The pdf of $\hat\theta = X_{(n)}$ is $nx^{n-1}/\theta^n$, $0\le x\le\theta$. This can be used to calculate
$$\mathrm{E}\hat\theta = \frac{n}{n+1}\theta, \qquad \mathrm{E}\hat\theta^2 = \frac{n}{n+2}\theta^2, \qquad \mathrm{Var}\,\hat\theta = \frac{n\theta^2}{(n+2)(n+1)^2}.$$
$\tilde\theta$ is an unbiased estimator of $\theta$; $\hat\theta$ is a biased estimator. If $n$ is large, the bias is not large because $n/(n+1)$ is close to one. But if $n$ is small, the bias is quite large. On the other hand, $\mathrm{Var}\,\hat\theta<\mathrm{Var}\,\tilde\theta$ for all $\theta$. So, if $n$ is large, $\hat\theta$ is probably preferable to $\tilde\theta$.
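The bias and variance formulas for the two uniform($0,\theta$) estimators can be confirmed by simulation. A Python sketch (not part of the original solution):

```python
import random

random.seed(23)
theta, n, trials = 2.0, 10, 50_000
mom, mle = [], []
for _ in range(trials):
    x = [random.uniform(0, theta) for _ in range(n)]
    mom.append(2 * sum(x) / n)      # tilde theta = 2 * xbar
    mle.append(max(x))              # hat theta = X_(n)

e_mom = sum(mom) / trials
e_mle = sum(mle) / trials
v_mle = sum((t - e_mle) ** 2 for t in mle) / trials
assert abs(e_mom - theta) < 0.02                                 # unbiased
assert abs(e_mle - n / (n + 1) * theta) < 0.02                   # E = n/(n+1)*theta
assert abs(v_mle - n * theta**2 / ((n + 2) * (n + 1)**2)) < 0.005
```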

7.10 a. $f(\mathbf{x}|\theta) = \prod_i\frac{\alpha}{\beta^\alpha}x_i^{\alpha-1}I_{[0,\beta]}(x_i) = \frac{\alpha^n}{\beta^{n\alpha}}\left(\prod_ix_i\right)^{\alpha-1}I_{(-\infty,\beta]}(x_{(n)})I_{[0,\infty)}(x_{(1)}) = L(\alpha,\beta|\mathbf{x})$. By the Factorization Theorem, $\left(\prod_iX_i,\,X_{(n)}\right)$ are sufficient.
b. For any fixed $\alpha$, $L(\alpha,\beta|\mathbf{x})=0$ if $\beta<x_{(n)}$, and $L(\alpha,\beta|\mathbf{x})$ is a decreasing function of $\beta$ if $\beta\ge x_{(n)}$. Thus, $X_{(n)}$ is the MLE of $\beta$. For the MLE of $\alpha$ calculate
$$\frac{\partial}{\partial\alpha}\log L = \frac{\partial}{\partial\alpha}\Bigl[n\log\alpha-n\alpha\log\beta+(\alpha-1)\log\prod_ix_i\Bigr] = \frac{n}{\alpha}-n\log\beta+\log\prod_ix_i.$$
Set the derivative equal to zero and use $\hat\beta = X_{(n)}$ to obtain
$$\hat\alpha = \frac{n}{n\log X_{(n)}-\log\prod_iX_i} = \left[\frac{1}{n}\sum_i\left(\log X_{(n)}-\log X_i\right)\right]^{-1}.$$
The second derivative is $-n/\alpha^2<0$, so this is the MLE.
c. $X_{(n)} = 25.0$, $\log\prod_iX_i = \sum_i\log X_i = 43.95 \Rightarrow \hat\beta = 25.0$, $\hat\alpha = 12.59$.
ˆ
7.11 a.
$$L(\theta|\mathbf{x}) = \prod_i\theta x_i^{\theta-1} = \theta^n\Bigl(\prod_ix_i\Bigr)^{\theta-1}, \qquad \frac{d}{d\theta}\log L = \frac{d}{d\theta}\Bigl[n\log\theta+(\theta-1)\log\prod_ix_i\Bigr] = \frac{n}{\theta}+\sum_i\log x_i.$$
Set the derivative equal to zero and solve for $\theta$ to obtain $\hat\theta = \left(-\frac{1}{n}\sum_i\log x_i\right)^{-1}$. The second derivative is $-n/\theta^2<0$, so this is the MLE. To calculate the variance of $\hat\theta$, note that $Y_i = -\log X_i\sim$ exponential($1/\theta$), so $-\sum_i\log X_i\sim$ gamma($n,1/\theta$). Thus $\hat\theta = n/T$, where $T\sim$ gamma($n,1/\theta$). We can either calculate the first and second moments directly, or use the fact that $\hat\theta$ is inverted gamma (page 51). We have
$$\mathrm{E}\frac{1}{T} = \frac{\theta^n}{\Gamma(n)}\int_0^\infty\frac{1}{t}\,t^{n-1}e^{-\theta t}\,dt = \frac{\theta^n}{\Gamma(n)}\,\frac{\Gamma(n-1)}{\theta^{n-1}} = \frac{\theta}{n-1},$$
$$\mathrm{E}\frac{1}{T^2} = \frac{\theta^n}{\Gamma(n)}\int_0^\infty\frac{1}{t^2}\,t^{n-1}e^{-\theta t}\,dt = \frac{\theta^n}{\Gamma(n)}\,\frac{\Gamma(n-2)}{\theta^{n-2}} = \frac{\theta^2}{(n-1)(n-2)},$$
and thus
$$\mathrm{E}\hat\theta = \frac{n}{n-1}\theta \qquad\text{and}\qquad \mathrm{Var}\,\hat\theta = \frac{n^2}{(n-1)^2(n-2)}\theta^2 \to 0 \text{ as } n\to\infty.$$
b. Because $X\sim$ beta($\theta,1$), $\mathrm{E}X = \theta/(\theta+1)$, and the method of moments estimator is the solution to
$$\frac{1}{n}\sum_iX_i = \frac{\theta}{\theta+1} \;\Rightarrow\; \tilde\theta = \frac{\sum_iX_i}{n-\sum_iX_i}.$$
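The bias formula $\mathrm{E}\hat\theta = \frac{n}{n-1}\theta$ for the beta($\theta,1$) MLE can be confirmed by simulation, generating $X = U^{1/\theta}$ by inverse cdf since $F(x)=x^\theta$. A Python sketch (not part of the original solution):

```python
import math
import random

random.seed(29)
theta, n, trials = 3.0, 50, 20_000
mles = []
for _ in range(trials):
    # X ~ beta(theta, 1) via inverse cdf: F(x) = x^theta, so X = U^(1/theta)
    x = [random.random() ** (1 / theta) for _ in range(n)]
    mles.append(-n / sum(math.log(xi) for xi in x))   # hat theta = n / T

e_hat = sum(mles) / trials
assert abs(e_hat - n / (n - 1) * theta) < 0.02   # E theta_hat = n/(n-1)*theta
```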
7.12 $X_i\sim$ iid Bernoulli($\theta$), $0\le\theta\le1/2$.
a. Method of moments: $\mathrm{E}X = \theta = \frac{1}{n}\sum_iX_i = \bar X \Rightarrow \tilde\theta = \bar X$.
MLE: In Example 7.2.7, we showed that $L(\theta|\mathbf{x})$ is increasing for $\theta\le\bar x$ and is decreasing for $\theta\ge\bar x$. Remember that $0\le\theta\le1/2$ in this exercise. Therefore, when $\bar X\le1/2$, $\bar X$ is the MLE of $\theta$, because $\bar X$ is the overall maximum of $L(\theta|\mathbf{x})$. When $\bar X>1/2$, $L(\theta|\mathbf{x})$ is an increasing function of $\theta$ on $[0,1/2]$ and obtains its maximum at the upper bound of $\theta$, which is $1/2$. So the MLE is $\hat\theta = \min\{\bar X,1/2\}$.
b. The MSE of $\tilde\theta$ is $\mathrm{MSE}(\tilde\theta) = \mathrm{Var}\,\tilde\theta+\mathrm{bias}(\tilde\theta)^2 = \theta(1-\theta)/n+0^2 = \theta(1-\theta)/n$. There is no simple formula for $\mathrm{MSE}(\hat\theta)$, but an expression is
$$\mathrm{MSE}(\hat\theta) = \mathrm{E}(\hat\theta-\theta)^2 = \sum_{y=0}^{n}(\hat\theta-\theta)^2\binom{n}{y}\theta^y(1-\theta)^{n-y} = \sum_{y=0}^{[n/2]}\left(\frac{y}{n}-\theta\right)^2\binom{n}{y}\theta^y(1-\theta)^{n-y} + \sum_{y=[n/2]+1}^{n}\left(\frac{1}{2}-\theta\right)^2\binom{n}{y}\theta^y(1-\theta)^{n-y},$$
where $Y=\sum_iX_i\sim$ binomial($n,\theta$) and $[n/2]=n/2$, if $n$ is even, and $[n/2]=(n-1)/2$, if $n$ is odd.
c. Using the notation used in (b), we have
$$\mathrm{MSE}(\tilde\theta) = \mathrm{E}(\bar X-\theta)^2 = \sum_{y=0}^{n}\left(\frac{y}{n}-\theta\right)^2\binom{n}{y}\theta^y(1-\theta)^{n-y}.$$
Therefore,
$$\mathrm{MSE}(\tilde\theta)-\mathrm{MSE}(\hat\theta) = \sum_{y=[n/2]+1}^{n}\left[\left(\frac{y}{n}-\theta\right)^2-\left(\frac{1}{2}-\theta\right)^2\right]\binom{n}{y}\theta^y(1-\theta)^{n-y} = \sum_{y=[n/2]+1}^{n}\left(\frac{y}{n}+\frac{1}{2}-2\theta\right)\left(\frac{y}{n}-\frac{1}{2}\right)\binom{n}{y}\theta^y(1-\theta)^{n-y}.$$
The facts that $y/n>1/2$ in the sum and $\theta\le1/2$ imply that every term in the sum is positive. Therefore $\mathrm{MSE}(\hat\theta)<\mathrm{MSE}(\tilde\theta)$ for every $\theta$ in $0<\theta\le1/2$. (Note: $\mathrm{MSE}(\hat\theta)=\mathrm{MSE}(\tilde\theta)=0$ at $\theta=0$.)
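The MSE comparison can be computed exactly by summing over the binomial distribution of $Y$. A Python sketch (not part of the original solution):

```python
from math import comb

def mse_tilde(theta, n):
    # MSE of the method-of-moments estimator xbar: theta*(1-theta)/n
    return theta * (1 - theta) / n

def mse_hat(theta, n):
    # exact MSE of min(xbar, 1/2) by summing over Y ~ binomial(n, theta)
    total = 0.0
    for y in range(n + 1):
        est = min(y / n, 0.5)
        total += (est - theta) ** 2 * comb(n, y) * theta**y * (1 - theta)**(n - y)
    return total

n = 10
for theta in (0.1, 0.25, 0.4, 0.5):
    assert mse_hat(theta, n) < mse_tilde(theta, n)
```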
7.13 $L(\theta|\mathbf{x}) = \prod_i\frac{1}{2}e^{-|x_i-\theta|} = \frac{1}{2^n}e^{-\Sigma_i|x_i-\theta|}$, so the MLE minimizes $\sum_i|x_i-\theta|$, where $x_{(1)},\ldots,x_{(n)}$ are the order statistics. For $x_{(j)}\le\theta\le x_{(j+1)}$,
$$\sum_{i=1}^n|x_{(i)}-\theta| = \sum_{i=1}^{j}(\theta-x_{(i)})+\sum_{i=j+1}^{n}(x_{(i)}-\theta) = (2j-n)\theta-\sum_{i=1}^{j}x_{(i)}+\sum_{i=j+1}^{n}x_{(i)}.$$
This is a linear function of $\theta$ that decreases for $j<n/2$ and increases for $j>n/2$. If $n$ is even, $2j-n=0$ if $j=n/2$. So the sum is constant between $x_{(n/2)}$ and $x_{((n/2)+1)}$, and any value in this interval is the MLE. Usually the midpoint of this interval is taken as the MLE. If $n$ is odd, the sum is minimized (and the likelihood maximized) at $\hat\theta = x_{((n+1)/2)}$.
7.15 a. The likelihood is
$$L(\mu,\lambda|\mathbf{x}) = \frac{\lambda^{n/2}}{\sqrt{(2\pi)^n\prod_ix_i^3}}\exp\Bigl\{-\frac{\lambda}{2}\sum_i\frac{(x_i-\mu)^2}{\mu^2x_i}\Bigr\}.$$
For fixed $\lambda$, maximizing with respect to $\mu$ is equivalent to minimizing the sum in the exponential.
$$\frac{d}{d\mu}\sum_i\frac{(x_i-\mu)^2}{\mu^2x_i} = \frac{d}{d\mu}\sum_i\frac{\left((x_i/\mu)-1\right)^2}{x_i} = -\sum_i\frac{2\left((x_i/\mu)-1\right)}{x_i}\,\frac{x_i}{\mu^2}.$$
Setting this equal to zero is equivalent to setting
$$\sum_i\left(\frac{x_i}{\mu}-1\right) = 0,$$
and solving for $\mu$ yields $\hat\mu_n = \bar x$. Plugging in this $\hat\mu_n$ and maximizing with respect to $\lambda$ amounts to maximizing an expression of the form $\lambda^{n/2}e^{-\lambda b}$. Simple calculus yields
$$\hat\lambda_n = \frac{n}{2b} \qquad\text{where}\qquad b = \sum_i\frac{(x_i-\bar x)^2}{2\bar x^2x_i}.$$
Finally,
$$2b = \sum_i\left(\frac{x_i}{\bar x^2}-\frac{2}{\bar x}+\frac{1}{x_i}\right) = -\frac{n}{\bar x}+\sum_i\frac{1}{x_i} = \sum_i\left(\frac{1}{x_i}-\frac{1}{\bar x}\right).$$
b. This is the same as Exercise 6.27b.
c. This involved algebra can be found in Schwarz and Samanta (1991).
7.17 a. This is a special case of the computation in Exercise 7.2a.
b. Make the transformation $z = (x_2-1)/x_1$, $w = x_1$, so that $x_1 = w$ and $x_2 = wz+1$.

The Jacobean is |w|, and fZ (z ) =

fX1 (w)fX2 (wz + 1)wdw =

1 −1/θ e θ2

we−w(1+z)/θ dw,

where the range of integration is 0 < w < −1/z if z < 0, 0 < w < ∞ if z > 0. Thus, fZ (z ) =
Using the fact that

1 −1/θ e θ2

−1/z we−w(1+z)/θ dw
0
∞ we−w(1+z)/θ dw
0

if z < 0 if z ≥ 0

we−w/a dw = −e−w/a (aw + a2 ), we have
−1/θ

fZ (z ) = e

z θ +e(1+z)/zθ (1+z −zθ ) θz (1+z )2
1
2
(1+z )

if z < 0 if z ≥ 0

7-6

Solutions Manual for Statistical Inference

ˆ
c. From part (a) we get θ = 1. From part (b), X2 = 1 implies Z = 0 which, if we use the second
ˆ
density, gives us θ = ∞.
d. The posterior distributions are just the normalized likelihood times prior, so of course they are different.
7.18 a. The usual first two moment equations for $X$ and $Y$ are
$$\bar x = \mathrm{E}X = \mu_X, \qquad \frac1n\sum_i x_i^2 = \mathrm{E}X^2 = \sigma_X^2+\mu_X^2,$$
$$\bar y = \mathrm{E}Y = \mu_Y, \qquad \frac1n\sum_i y_i^2 = \mathrm{E}Y^2 = \sigma_Y^2+\mu_Y^2.$$
We also need an equation involving $\rho$:
$$\frac1n\sum_i x_iy_i = \mathrm{E}XY = \mathrm{Cov}(X,Y)+(\mathrm{E}X)(\mathrm{E}Y) = \rho\sigma_X\sigma_Y+\mu_X\mu_Y.$$
Solving these five equations yields the estimators given. Facts such as
$$\frac1n\sum_i x_i^2 - \bar x^2 = \frac{\sum_i x_i^2 - \bigl(\sum_i x_i\bigr)^2/n}{n} = \frac{\sum_i(x_i-\bar x)^2}{n}$$
are used.
b. Two answers are provided. First, use the Miscellanea: For
$$L(\boldsymbol\theta|\mathbf x) = h(\mathbf x)c(\boldsymbol\theta)\exp\Bigl(\sum_{i=1}^k w_i(\boldsymbol\theta)t_i(\mathbf x)\Bigr),$$
the solutions to the $k$ equations $\sum_{j=1}^n t_i(\mathbf x_j) = \mathrm{E}_{\boldsymbol\theta}\sum_{j=1}^n t_i(\mathbf X_j) = n\mathrm{E}_{\boldsymbol\theta}t_i(\mathbf X_1)$, $i=1,\ldots,k$, provide the unique MLE for $\boldsymbol\theta$. Multiplying out the exponent in the bivariate normal pdf shows it has this exponential family form with $k=5$ and $t_1(x,y)=x$, $t_2(x,y)=y$, $t_3(x,y)=x^2$, $t_4(x,y)=y^2$ and $t_5(x,y)=xy$. Setting up the method of moments equations, we have
$$\sum_i x_i = n\mu_X,\quad \sum_i x_i^2 = n(\mu_X^2+\sigma_X^2),\quad \sum_i y_i = n\mu_Y,\quad \sum_i y_i^2 = n(\mu_Y^2+\sigma_Y^2),\quad \sum_i x_iy_i = n(\rho\sigma_X\sigma_Y+\mu_X\mu_Y).$$
These are the same equations as in part (a) if you divide each one by $n$. So the MLEs are the same as the method of moments estimators in part (a).
For the second answer, use the hint in the book to write
$$L(\boldsymbol\theta|\mathbf x,\mathbf y) = L(\boldsymbol\theta|\mathbf x)\,L(\boldsymbol\theta,\mathbf x|\mathbf y) = \underbrace{(2\pi\sigma_X^2)^{-n/2}\exp\Bigl(-\frac{1}{2\sigma_X^2}\sum_i(x_i-\mu_X)^2\Bigr)}_{A}\times\underbrace{\bigl(2\pi\sigma_Y^2(1-\rho^2)\bigr)^{-n/2}\exp\Bigl(-\frac{1}{2\sigma_Y^2(1-\rho^2)}\sum_i\Bigl(y_i-\Bigl[\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x_i-\mu_X)\Bigr]\Bigr)^2\Bigr)}_{B}.$$
We know that $\bar x$ and $\hat\sigma_X^2 = \sum_i(x_i-\bar x)^2/n$ maximize $A$; the question is whether, given $\sigma_Y^2$, $\mu_Y$ and $\rho$, $(\bar x,\hat\sigma_X^2)$ maximizes $B$. Let us first fix $\sigma_X^2$ and look for the $\hat\mu_X$ that maximizes $B$. We have
$$\frac{\partial\log B}{\partial\mu_X} \propto -2\sum_i\Bigl(y_i-\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}(x_i-\mu_X)\Bigr)\frac{\rho\sigma_Y}{\sigma_X} \stackrel{\rm set}{=} 0 \;\Rightarrow\; \sum_i(y_i-\mu_Y) = \frac{\rho\sigma_Y}{\sigma_X}\sum_i(x_i-\hat\mu_X).$$
Similarly, doing the same procedure with $L(\boldsymbol\theta|\mathbf y)L(\boldsymbol\theta,\mathbf y|\mathbf x)$ implies $\sum_i(x_i-\mu_X) = \frac{\rho\sigma_X}{\sigma_Y}\sum_i(y_i-\mu_Y)$. The solutions $\hat\mu_X$ and $\hat\mu_Y$ therefore must satisfy both equations. If $\sum_i(y_i-\hat\mu_Y)\ne0$ or $\sum_i(x_i-\hat\mu_X)\ne0$, we would get $\rho=1/\rho$, so we need $\sum_i(y_i-\hat\mu_Y)=0$ and $\sum_i(x_i-\hat\mu_X)=0$. This implies $\hat\mu_X=\bar x$ and $\hat\mu_Y=\bar y$. ($\partial^2\log B/\partial\mu_X^2<0$, therefore it is a maximum.) To get $\hat\sigma_X^2$, take
$$\frac{\partial\log B}{\partial\sigma_X^2} \propto \sum_i\Bigl(y_i-\hat\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}(x_i-\hat\mu_X)\Bigr)\frac{\rho\sigma_Y}{\sigma_X^2}(x_i-\hat\mu_X) \stackrel{\rm set}{=} 0 \;\Rightarrow\; \sum_i(x_i-\hat\mu_X)(y_i-\hat\mu_Y) = \frac{\rho\sigma_Y}{\hat\sigma_X}\sum_i(x_i-\hat\mu_X)^2.$$
Similarly, $\sum_i(x_i-\hat\mu_X)(y_i-\hat\mu_Y) = \frac{\rho\sigma_X}{\hat\sigma_Y}\sum_i(y_i-\hat\mu_Y)^2$. Thus $\hat\sigma_X^2$ and $\hat\sigma_Y^2$ must satisfy the above two equations with $\hat\mu_X=\bar x$, $\hat\mu_Y=\bar y$. This implies
$$\frac{\hat\sigma_Y}{\hat\sigma_X}\sum_i(x_i-\bar x)^2 = \frac{\hat\sigma_X}{\hat\sigma_Y}\sum_i(y_i-\bar y)^2 \;\Rightarrow\; \frac{\sum_i(x_i-\bar x)^2}{\hat\sigma_X^2} = \frac{\sum_i(y_i-\bar y)^2}{\hat\sigma_Y^2}.$$
Therefore $\hat\sigma_X^2 = a\sum_i(x_i-\bar x)^2$ and $\hat\sigma_Y^2 = a\sum_i(y_i-\bar y)^2$, where $a$ is a constant. Combining this with the knowledge that $\bigl(\bar x,\frac1n\sum_i(x_i-\bar x)^2\bigr) = (\hat\mu_X,\hat\sigma_X^2)$ maximizes $A$, we conclude that $a=1/n$. Lastly, we find $\hat\rho$, the MLE of $\rho$. Write
$$\log L(\bar x,\bar y,\hat\sigma_X^2,\hat\sigma_Y^2,\rho|\mathbf x,\mathbf y) = -\frac n2\log(1-\rho^2) - \frac{1}{2(1-\rho^2)}\sum_i\Bigl[\frac{(x_i-\bar x)^2}{\hat\sigma_X^2} - \frac{2\rho(x_i-\bar x)(y_i-\bar y)}{\hat\sigma_X\hat\sigma_Y} + \frac{(y_i-\bar y)^2}{\hat\sigma_Y^2}\Bigr] = -\frac n2\log(1-\rho^2) - \frac{1}{2(1-\rho^2)}\bigl[2n-2\rho A\bigr],$$
where $A = \sum_i(x_i-\bar x)(y_i-\bar y)/(\hat\sigma_X\hat\sigma_Y)$, because $\hat\sigma_X^2 = \frac1n\sum_i(x_i-\bar x)^2$ and $\hat\sigma_Y^2 = \frac1n\sum_i(y_i-\bar y)^2$. So
$$\log L = -\frac n2\log(1-\rho^2) - \frac{n}{1-\rho^2} + \frac{\rho}{1-\rho^2}A.$$
Now
$$\frac{\partial\log L}{\partial\rho} = \frac{n\rho}{1-\rho^2} - \frac{2n\rho}{(1-\rho^2)^2} + \frac{A(1-\rho^2)+2A\rho^2}{(1-\rho^2)^2} \stackrel{\rm set}{=} 0.$$
This implies
$$\frac{A+A\rho^2-n\rho-n\rho^3}{(1-\rho^2)^2} = 0 \;\Rightarrow\; A(1+\hat\rho^2) = n\hat\rho(1+\hat\rho^2) \;\Rightarrow\; \hat\rho = \frac An = \frac1n\sum_i\frac{(x_i-\bar x)(y_i-\bar y)}{\hat\sigma_X\hat\sigma_Y}.$$
7.19 a.
$$L(\theta|\mathbf y) = \prod_i\frac{1}{\sqrt{2\pi\sigma^2}}\exp\Bigl(-\frac{1}{2\sigma^2}(y_i-\beta x_i)^2\Bigr) = (2\pi\sigma^2)^{-n/2}\exp\Bigl(-\frac{1}{2\sigma^2}\sum_i(y_i^2-2\beta x_iy_i+\beta^2x_i^2)\Bigr) = (2\pi\sigma^2)^{-n/2}\exp\Bigl(-\frac{\beta^2\sum_ix_i^2}{2\sigma^2}\Bigr)\exp\Bigl(-\frac{1}{2\sigma^2}\sum_iy_i^2 + \frac{\beta}{\sigma^2}\sum_ix_iy_i\Bigr).$$
By Theorem 6.1.2, $\bigl(\sum_iY_i^2,\ \sum_ix_iY_i\bigr)$ is a sufficient statistic for $(\beta,\sigma^2)$.
b.
$$\log L(\beta,\sigma^2|\mathbf y) = -\frac n2\log(2\pi) - \frac n2\log\sigma^2 - \frac{1}{2\sigma^2}\sum_iy_i^2 + \frac{\beta}{\sigma^2}\sum_ix_iy_i - \frac{\beta^2}{2\sigma^2}\sum_ix_i^2.$$
For a fixed value of $\sigma^2$,
$$\frac{\partial\log L}{\partial\beta} = \frac{1}{\sigma^2}\sum_ix_iy_i - \frac{\beta}{\sigma^2}\sum_ix_i^2 \stackrel{\rm set}{=} 0 \;\Rightarrow\; \hat\beta = \frac{\sum_ix_iy_i}{\sum_ix_i^2}.$$
Also,
$$\frac{\partial^2\log L}{\partial\beta^2} = -\frac{1}{\sigma^2}\sum_ix_i^2 < 0,$$
so it is a maximum. Because $\hat\beta$ does not depend on $\sigma^2$, it is the MLE. And $\hat\beta$ is unbiased because
$$\mathrm E\hat\beta = \frac{\sum_ix_i\,\mathrm EY_i}{\sum_ix_i^2} = \frac{\sum_ix_i\cdot\beta x_i}{\sum_ix_i^2} = \beta.$$
c. $\hat\beta = \sum_ia_iY_i$, where $a_i = x_i/\sum_jx_j^2$ are constants. By Corollary 4.6.10, $\hat\beta$ is normally distributed with mean $\beta$, and
$$\mathrm{Var}\,\hat\beta = \sum_ia_i^2\,\mathrm{Var}\,Y_i = \sum_i\Bigl(\frac{x_i}{\sum_jx_j^2}\Bigr)^2\sigma^2 = \frac{\sum_ix_i^2}{\bigl(\sum_jx_j^2\bigr)^2}\sigma^2 = \frac{\sigma^2}{\sum_ix_i^2}.$$
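The closed form $\hat\beta = \sum_ix_iy_i/\sum_ix_i^2$ from part (b) can be sanity-checked in a few lines of Python (illustrative data, not from the text): $\hat\beta$ should minimize the residual sum of squares for regression through the origin.

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]                    # roughly y = 2x plus noise
beta_hat = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def rss(b):
    # residual sum of squares for the no-intercept model y = b x
    return sum((bi - b * ai) ** 2 for ai, bi in zip(x, y))

print(beta_hat, rss(beta_hat) <= min(rss(beta_hat - 0.01), rss(beta_hat + 0.01)))
```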

7.20 a.
$$\mathrm E\Bigl(\frac{\sum_iY_i}{\sum_ix_i}\Bigr) = \frac{1}{\sum_ix_i}\sum_i\mathrm EY_i = \frac{1}{\sum_ix_i}\sum_i\beta x_i = \beta.$$
b.
$$\mathrm{Var}\Bigl(\frac{\sum_iY_i}{\sum_ix_i}\Bigr) = \frac{1}{\bigl(\sum_ix_i\bigr)^2}\sum_i\mathrm{Var}\,Y_i = \frac{n\sigma^2}{(n\bar x)^2} = \frac{\sigma^2}{n\bar x^2}.$$
Because $\sum_ix_i^2 - n\bar x^2 = \sum_i(x_i-\bar x)^2 \ge 0$, we have $\sum_ix_i^2 \ge n\bar x^2$. Hence,
$$\mathrm{Var}\,\hat\beta = \frac{\sigma^2}{\sum_ix_i^2} \le \frac{\sigma^2}{n\bar x^2} = \mathrm{Var}\Bigl(\frac{\sum_iY_i}{\sum_ix_i}\Bigr).$$
(In fact, $\hat\beta$ is BLUE (Best Linear Unbiased Estimator of $\beta$), as discussed in Section 11.3.2.)
7.21 a.
$$\mathrm E\Bigl(\frac1n\sum_i\frac{Y_i}{x_i}\Bigr) = \frac1n\sum_i\frac{\mathrm EY_i}{x_i} = \frac1n\sum_i\frac{\beta x_i}{x_i} = \beta.$$
b.
$$\mathrm{Var}\Bigl(\frac1n\sum_i\frac{Y_i}{x_i}\Bigr) = \frac1{n^2}\sum_i\frac{\mathrm{Var}\,Y_i}{x_i^2} = \frac{\sigma^2}{n^2}\sum_i\frac1{x_i^2}.$$
Using Example 4.7.8 with $a_i = x_i^2$ we obtain
$$\frac1n\sum_ix_i^2 \ge \frac{n}{\sum_i1/x_i^2}.$$
Thus,
$$\mathrm{Var}\,\hat\beta = \frac{\sigma^2}{\sum_ix_i^2} \le \frac{\sigma^2}{n^2}\sum_i\frac1{x_i^2} = \mathrm{Var}\Bigl(\frac1n\sum_i\frac{Y_i}{x_i}\Bigr).$$
Because $g(u)=1/u^2$ is convex, using Jensen's Inequality we have
$$\frac1{\bar x^2} \le \frac1n\sum_i\frac1{x_i^2}.$$
Thus,
$$\mathrm{Var}\Bigl(\frac{\sum_iY_i}{\sum_ix_i}\Bigr) = \frac{\sigma^2}{n\bar x^2} \le \frac{\sigma^2}{n^2}\sum_i\frac1{x_i^2} = \mathrm{Var}\Bigl(\frac1n\sum_i\frac{Y_i}{x_i}\Bigr).$$
7.22 a.
$$f(\bar x,\theta) = f(\bar x|\theta)\pi(\theta) = \frac{\sqrt n}{\sqrt{2\pi}\,\sigma}e^{-n(\bar x-\theta)^2/(2\sigma^2)}\cdot\frac{1}{\sqrt{2\pi}\,\tau}e^{-(\theta-\mu)^2/(2\tau^2)}.$$
b. Factor the exponent in part (a) as
$$-\frac{n}{2\sigma^2}(\bar x-\theta)^2 - \frac{1}{2\tau^2}(\theta-\mu)^2 = -\frac{1}{2v^2}\bigl(\theta-\delta(x)\bigr)^2 - \frac{1}{2(\tau^2+\sigma^2/n)}(\bar x-\mu)^2,$$
where $\delta(x) = \bigl(\tau^2\bar x+(\sigma^2/n)\mu\bigr)/\bigl(\tau^2+\sigma^2/n\bigr)$ and $v^2 = (\sigma^2\tau^2/n)\big/(\tau^2+\sigma^2/n)$. Let $\mathrm n(a,b)$ denote the pdf of a normal distribution with mean $a$ and variance $b$. The above factorization shows that
$$f(\bar x,\theta) = \mathrm n(\theta,\sigma^2/n)\times\mathrm n(\mu,\tau^2) = \mathrm n\bigl(\delta(x),v^2\bigr)\times\mathrm n(\mu,\tau^2+\sigma^2/n),$$
where the marginal distribution of $\bar X$ is $\mathrm n(\mu,\tau^2+\sigma^2/n)$ and the posterior distribution of $\theta|\bar x$ is $\mathrm n\bigl(\delta(x),v^2\bigr)$. This also completes part (c).
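The posterior mean $\delta(x)$ can be confirmed by brute-force numerical integration of the (unnormalized) posterior kernel. The values below are illustrative, not from the text; the manual's own numerical work uses R, so this Python version is just an equivalent sketch.

```python
from math import exp

# grid check of the posterior mean delta(x) for the normal-normal model
n, sigma, tau, mu, xbar = 4, 2.0, 1.5, 0.0, 1.2   # illustrative values

def kernel(th):
    # likelihood of x-bar times the prior, up to constants
    return exp(-n * (xbar - th) ** 2 / (2 * sigma ** 2)
               - (th - mu) ** 2 / (2 * tau ** 2))

h = 0.001
grid = [-10 + h * (i + 0.5) for i in range(20000)]
w = [kernel(t) for t in grid]
post_mean = sum(t * wi for t, wi in zip(grid, w)) / sum(w)
delta = (tau**2 * xbar + (sigma**2 / n) * mu) / (tau**2 + sigma**2 / n)
print(post_mean, delta)
```

The two printed numbers agree to several decimal places, illustrating the shrinkage of $\bar x$ toward the prior mean $\mu$.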
7.23 Let $t=s^2$ and $\theta=\sigma^2$. Because $(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$, we have
$$f(t|\theta) = \frac{1}{\Gamma\bigl(\frac{n-1}2\bigr)2^{(n-1)/2}}\Bigl(\frac{(n-1)t}{\theta}\Bigr)^{[(n-1)/2]-1}e^{-(n-1)t/(2\theta)}\,\frac{n-1}{\theta}.$$
With $\pi(\theta)$ as given, we have (ignoring terms that do not depend on $\theta$)
$$\pi(\theta|t) \propto \Bigl(\frac1\theta\Bigr)^{((n-1)/2)-1}e^{-(n-1)t/(2\theta)}\,\frac1\theta\cdot\frac{1}{\theta^{\alpha+1}}e^{-1/(\beta\theta)} = \Bigl(\frac1\theta\Bigr)^{((n-1)/2)+\alpha+1}\exp\Bigl\{-\frac1\theta\Bigl(\frac{(n-1)t}2+\frac1\beta\Bigr)\Bigr\},$$
which we recognize as the kernel of an inverted gamma pdf, $\mathrm{IG}(a,b)$, with
$$a = \frac{n-1}2+\alpha \qquad\text{and}\qquad b = \Bigl(\frac{(n-1)t}2+\frac1\beta\Bigr)^{-1}.$$
Direct calculation shows that the mean of an $\mathrm{IG}(a,b)$ is $1/((a-1)b)$, so
$$\mathrm E(\theta|t) = \frac{\frac{n-1}2t+\frac1\beta}{\frac{n-1}2+\alpha-1} = \frac{\frac{n-1}2s^2+\frac1\beta}{\frac{n-1}2+\alpha-1}.$$
This is a Bayes estimator of $\sigma^2$.
7.24 For $n$ observations, $Y=\sum_iX_i\sim\mathrm{Poisson}(n\lambda)$.
a. The marginal pmf of $Y$ is
$$m(y) = \int_0^\infty\frac{(n\lambda)^ye^{-n\lambda}}{y!}\,\frac{1}{\Gamma(\alpha)\beta^\alpha}\lambda^{\alpha-1}e^{-\lambda/\beta}\,d\lambda = \frac{n^y}{y!\,\Gamma(\alpha)\beta^\alpha}\int_0^\infty\lambda^{(y+\alpha)-1}e^{-\lambda/(\beta/(n\beta+1))}\,d\lambda = \frac{n^y\,\Gamma(y+\alpha)}{y!\,\Gamma(\alpha)\beta^\alpha}\Bigl(\frac{\beta}{n\beta+1}\Bigr)^{y+\alpha}.$$
Thus,
$$\pi(\lambda|y) = \frac{f(y|\lambda)\pi(\lambda)}{m(y)} = \frac{\lambda^{(y+\alpha)-1}e^{-\lambda/(\beta/(n\beta+1))}}{\Gamma(y+\alpha)\bigl(\frac{\beta}{n\beta+1}\bigr)^{y+\alpha}} \sim \mathrm{gamma}\Bigl(y+\alpha,\ \frac{\beta}{n\beta+1}\Bigr).$$
b.
$$\mathrm E(\lambda|y) = (y+\alpha)\frac{\beta}{n\beta+1} = \frac{\beta}{n\beta+1}\,y + \frac{1}{n\beta+1}(\alpha\beta), \qquad \mathrm{Var}(\lambda|y) = (y+\alpha)\frac{\beta^2}{(n\beta+1)^2}.$$
7.25 a. We will use the results and notation from part (b) to do this special case. From part (b), the $X_i$s are independent and each $X_i$ has marginal pdf
$$m(x|\mu,\sigma^2,\tau^2) = \int_{-\infty}^{\infty}f(x|\theta,\sigma^2)\pi(\theta|\mu,\tau^2)\,d\theta = \frac{1}{2\pi\sigma\tau}\int_{-\infty}^{\infty}e^{-(x-\theta)^2/(2\sigma^2)}e^{-(\theta-\mu)^2/(2\tau^2)}\,d\theta.$$
Complete the square in $\theta$ to write the sum of the two exponents as
$$-\frac{\Bigl[\theta-\Bigl(\frac{x\tau^2}{\sigma^2+\tau^2}+\frac{\mu\sigma^2}{\sigma^2+\tau^2}\Bigr)\Bigr]^2}{2\frac{\sigma^2\tau^2}{\sigma^2+\tau^2}} - \frac{(x-\mu)^2}{2(\sigma^2+\tau^2)}.$$
Only the first term involves $\theta$; call it $-A(\theta)$. Also, $e^{-A(\theta)}$ is the kernel of a normal pdf. Thus,
$$\int_{-\infty}^{\infty}e^{-A(\theta)}\,d\theta = \sqrt{2\pi}\,\frac{\sigma\tau}{\sqrt{\sigma^2+\tau^2}},$$
and the marginal pdf is
$$m(x|\mu,\sigma^2,\tau^2) = \frac{1}{2\pi\sigma\tau}\,\sqrt{2\pi}\,\frac{\sigma\tau}{\sqrt{\sigma^2+\tau^2}}\exp\Bigl(-\frac{(x-\mu)^2}{2(\sigma^2+\tau^2)}\Bigr) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2+\tau^2}}\exp\Bigl(-\frac{(x-\mu)^2}{2(\sigma^2+\tau^2)}\Bigr),$$
a $\mathrm n(\mu,\sigma^2+\tau^2)$ pdf.
b. For one observation of $X$ and $\theta$ the joint pdf is $h(x,\theta|\tau)=f(x|\theta)\pi(\theta|\tau)$, and the marginal pdf of $X$ is
$$m(x|\tau) = \int_{-\infty}^{\infty}h(x,\theta|\tau)\,d\theta.$$
Thus, the joint pdf of $\mathbf X=(X_1,\ldots,X_n)$ and $\boldsymbol\theta=(\theta_1,\ldots,\theta_n)$ is $h(\mathbf x,\boldsymbol\theta|\tau) = \prod_ih(x_i,\theta_i|\tau)$, and the marginal pdf of $\mathbf X$ is
$$m(\mathbf x|\tau) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\prod_ih(x_i,\theta_i|\tau)\,d\theta_1\ldots d\theta_n = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\Bigl[\int_{-\infty}^{\infty}h(x_1,\theta_1|\tau)\,d\theta_1\Bigr]\prod_{i=2}^{n}h(x_i,\theta_i|\tau)\,d\theta_2\ldots d\theta_n.$$
The $d\theta_1$ integral is just $m(x_1|\tau)$, and this is not a function of $\theta_2,\ldots,\theta_n$. So $m(x_1|\tau)$ can be pulled out of the integrals. Doing each integral in turn yields the marginal pdf
$$m(\mathbf x|\tau) = \prod_im(x_i|\tau).$$
Because this marginal pdf factors, this shows that marginally $X_1,\ldots,X_n$ are independent, and they each have the same marginal distribution, $m(x|\tau)$.
7.26 First write
$$f(x_1,\ldots,x_n|\theta)\pi(\theta) \propto e^{-\frac{n}{2\sigma^2}(\bar x-\theta)^2-|\theta|/a},$$
where the exponent can be written
$$-\frac{n}{2\sigma^2}(\bar x-\theta)^2 - \frac{|\theta|}{a} = -\frac{n}{2\sigma^2}\bigl(\theta-\delta_\pm(x)\bigr)^2 - \frac{n}{2\sigma^2}\bigl(\bar x^2-\delta_\pm(x)^2\bigr),$$
with $\delta_\pm(x) = \bar x\pm\frac{\sigma^2}{na}$, where we use the "$+$" if $\theta>0$ and the "$-$" if $\theta<0$. Thus, the posterior mean is
$$\mathrm E(\theta|\mathbf x) = \frac{\int_{-\infty}^{\infty}\theta e^{-\frac{n}{2\sigma^2}(\theta-\delta_\pm(x))^2}\,d\theta}{\int_{-\infty}^{\infty}e^{-\frac{n}{2\sigma^2}(\theta-\delta_\pm(x))^2}\,d\theta},$$
where each integral is split at 0 and the appropriate $\delta_\pm$ is used on each piece. Now use the facts that for constants $a$ and $b$,
$$\int_0^{\infty}e^{-\frac a2(t-b)^2}\,dt = \sqrt{\frac{\pi}{2a}},\qquad \int_{-\infty}^{0}e^{-\frac a2(t-b)^2}\,dt = \sqrt{\frac{\pi}{2a}},$$
$$\int_0^{\infty}te^{-\frac a2(t-b)^2}\,dt = \int_0^{\infty}(t-b)e^{-\frac a2(t-b)^2}\,dt + \int_0^{\infty}be^{-\frac a2(t-b)^2}\,dt = \frac1ae^{-\frac a2b^2} + b\sqrt{\frac{\pi}{2a}},$$
$$\int_{-\infty}^{0}te^{-\frac a2(t-b)^2}\,dt = -\frac1ae^{-\frac a2b^2} + b\sqrt{\frac{\pi}{2a}},$$
to get
$$\mathrm E(\theta|\mathbf x) = \frac{\sqrt{\frac{\pi\sigma^2}{2n}}\bigl(\delta_-(x)+\delta_+(x)\bigr) + \frac{\sigma^2}{n}\Bigl(e^{-\frac{n}{2\sigma^2}\delta_+(x)^2}-e^{-\frac{n}{2\sigma^2}\delta_-(x)^2}\Bigr)}{2\sqrt{\frac{\pi\sigma^2}{2n}}}.$$
7.27 a. The log likelihood is
$$\log L = \sum_{i=1}^{n}\bigl[-\beta\tau_i + y_i\log(\beta\tau_i) - \tau_i + x_i\log\tau_i - \log y_i! - \log x_i!\bigr],$$
and differentiation gives
$$\frac{\partial\log L}{\partial\beta} = \sum_{i=1}^{n}\Bigl(-\tau_i+\frac{y_i\tau_i}{\beta\tau_i}\Bigr) \stackrel{\rm set}{=} 0 \;\Rightarrow\; \beta = \frac{\sum_{i=1}^{n}y_i}{\sum_{i=1}^{n}\tau_i},$$
$$\frac{\partial\log L}{\partial\tau_j} = -\beta + \frac{y_j\beta}{\beta\tau_j} - 1 + \frac{x_j}{\tau_j} \stackrel{\rm set}{=} 0 \;\Rightarrow\; \tau_j = \frac{x_j+y_j}{1+\beta}.$$
Combining these expressions yields $\hat\beta = \sum_{j=1}^{n}y_j\big/\sum_{j=1}^{n}x_j$ and $\hat\tau_j = \dfrac{x_j+y_j}{1+\hat\beta}$.
b. The stationary point of the EM algorithm will satisfy
$$\hat\beta = \frac{\sum_{i=1}^{n}y_i}{\hat\tau_1+\sum_{i=2}^{n}x_i},\qquad \hat\tau_1 = \frac{\hat\tau_1+y_1}{\hat\beta+1},\qquad \hat\tau_j = \frac{x_j+y_j}{\hat\beta+1}.$$
The second equation yields $\hat\tau_1 = y_1/\hat\beta$, and substituting this into the first equation yields $\hat\beta = \sum_{j=2}^{n}y_j\big/\sum_{j=2}^{n}x_j$. Summing over $j$ in the third equation, and substituting $\hat\beta = \sum_{j=2}^{n}y_j/\sum_{j=2}^{n}x_j$, shows us that $\sum_{j=2}^{n}\hat\tau_j = \sum_{j=2}^{n}x_j$, and plugging this into the first equation gives the desired expression for $\hat\beta$. The other two equations in (7.2.16) are obviously satisfied.
c. The expression for $\hat\beta$ was derived in part (b), as were the expressions for $\hat\tau_i$.
7.29 a. The joint density is the product of the individual densities.
b. The log likelihood is
$$\log L = \sum_{i=1}^{n}\bigl[-m\beta\tau_i + y_i\log(m\beta\tau_i) + x_i\log\tau_i + \log m! - \log y_i! - \log x_i!\bigr],$$
and
$$\frac{\partial\log L}{\partial\beta} = 0 \;\Rightarrow\; \beta = \frac{\sum_{i=1}^{n}y_i}{m\sum_{i=1}^{n}\tau_i},\qquad \frac{\partial\log L}{\partial\tau_j} = 0 \;\Rightarrow\; \tau_j \propto x_j+y_j.$$
Since $\sum_j\tau_j=1$, $\hat\beta = \sum_iy_i/m = \sum_iy_i\big/\sum_ix_i$, and the constraint $\sum_j\tau_j=1$ implies $\hat\tau_j = (x_j+y_j)\big/\sum_i(y_i+x_i)$.
c. In the likelihood function we can ignore the factorial terms, and the expected complete-data likelihood is obtained on the $r$th iteration by replacing $x_1$ with $\mathrm E\bigl(X_1|\hat\tau_1^{(r)}\bigr) = m\hat\tau_1^{(r)}$. Substituting this into the MLEs of part (b) gives the EM sequence.
The MLEs from the full data set are $\hat\beta = 0.0008413892$ and
$$\hat{\boldsymbol\tau} = (0.06337310, 0.06374873, 0.06689681, 0.04981487, 0.04604075, 0.04883109, 0.07072460, 0.01776164, 0.03416388, 0.01695673, 0.02098127, 0.01878119, 0.05621836, 0.09818091, 0.09945087, 0.05267677, 0.08896918, 0.08642925).$$
The MLEs for the incomplete data were computed using R, where we take $m = \sum_ix_i$. The R code is
#mles on the incomplete data#
c. We formulate a general theorem. Let $T(\mathbf X)$ be a complete sufficient statistic, and let $T'(\mathbf X)$ be any statistic other than $T(\mathbf X)$ such that $\mathrm E_\theta T(\mathbf X) = \mathrm E_\theta T'(\mathbf X)$. Then $\mathrm E\bigl[T'(\mathbf X)\,\big|\,T(\mathbf X)\bigr] = T(\mathbf X)$ and $\mathrm{Var}_\theta\,T'(\mathbf X) > \mathrm{Var}_\theta\,T(\mathbf X)$.

Second Edition

7-21

7.53 Let $a$ be a constant and suppose $\mathrm{Cov}_{\theta_0}(W,U)>0$. Then
$$\mathrm{Var}_{\theta_0}(W+aU) = \mathrm{Var}_{\theta_0}W + a^2\,\mathrm{Var}_{\theta_0}U + 2a\,\mathrm{Cov}_{\theta_0}(W,U).$$
Choose $a\in\bigl(-2\,\mathrm{Cov}_{\theta_0}(W,U)/\mathrm{Var}_{\theta_0}U,\ 0\bigr)$. Then $\mathrm{Var}_{\theta_0}(W+aU)<\mathrm{Var}_{\theta_0}W$, so $W$ cannot be best unbiased.
7.55 All three parts can be solved by this general method. Suppose $X\sim f(x|\theta) = c(\theta)m(x)$, $a<x<\theta$. Then $1/c(\theta) = \int_a^\theta m(x)\,dx$, and the cdf of $X$ is $F(x|\theta) = c(\theta)/c(x)$, $a<x<\theta$. Let $Y=X_{(n)}$ be the largest order statistic. Arguing as in Example 6.2.23 we see that $Y$ is a complete sufficient statistic. Thus, any function $T(Y)$ that is an unbiased estimator of $h(\theta)$ is the best unbiased estimator of $h(\theta)$. By Theorem 5.4.4 the pdf of $Y$ is $g(y|\theta) = nm(y)c(\theta)^n/c(y)^{n-1}$, $a<y<\theta$. Consider the equations
$$\int_a^\theta f(x|\theta)\,dx = 1 \qquad\text{and}\qquad \int_a^\theta T(y)g(y|\theta)\,dy = h(\theta),$$
which are equivalent to
$$\int_a^\theta m(x)\,dx = \frac{1}{c(\theta)} \qquad\text{and}\qquad \int_a^\theta\frac{T(y)nm(y)}{c(y)^{n-1}}\,dy = \frac{h(\theta)}{c(\theta)^n}.$$
Differentiating both sides of these two equations with respect to $\theta$ and using the Fundamental Theorem of Calculus yields
$$m(\theta) = -\frac{c'(\theta)}{c(\theta)^2} \qquad\text{and}\qquad \frac{T(\theta)nm(\theta)}{c(\theta)^{n-1}} = \frac{c(\theta)^nh'(\theta)-h(\theta)nc(\theta)^{n-1}c'(\theta)}{c(\theta)^{2n}}.$$
Change $\theta$s to $y$s and solve these two equations for $T(y)$ to get that the best unbiased estimator of $h(\theta)$ is
$$T(y) = h(y) + \frac{h'(y)}{nm(y)c(y)}.$$
For $h(\theta)=\theta^r$, $h'(\theta)=r\theta^{r-1}$.
a. For this pdf, $m(x)=1$ and $c(\theta)=1/\theta$. Hence
$$T(y) = y^r + \frac{ry^{r-1}}{n(1/y)} = \frac{n+r}{n}y^r.$$
b. If $\theta$ is the lower endpoint of the support, the smallest order statistic $Y=X_{(1)}$ is a complete sufficient statistic. Arguing as above yields that the best unbiased estimator of $h(\theta)$ is
$$T(y) = h(y) - \frac{h'(y)}{nm(y)c(y)}.$$
For this pdf, $m(x)=e^{-x}$ and $c(\theta)=e^\theta$. Hence
$$T(y) = y^r - \frac{ry^{r-1}}{ne^{-y}e^{y}} = y^r - \frac{ry^{r-1}}{n}.$$
c. For this pdf, $m(x)=e^{-x}$ and $c(\theta)=1/(e^{-\theta}-e^{-b})$. Hence
$$T(y) = y^r - \frac{ry^{r-1}}{ne^{-y}}\bigl(e^{-y}-e^{-b}\bigr) = y^r - \frac{ry^{r-1}\bigl(1-e^{-(b-y)}\bigr)}{n}.$$
7.56 Because $T$ is sufficient, $\phi(T) = \mathrm E[h(X_1,\ldots,X_n)|T]$ is a function only of $T$. That is, $\phi(T)$ is an estimator. If $\mathrm E\,h(X_1,\ldots,X_n) = \tau(\theta)$, then
$$\mathrm E\,h(X_1,\ldots,X_n) = \mathrm E\bigl[\mathrm E\bigl(h(X_1,\ldots,X_n)\mid T\bigr)\bigr] = \tau(\theta),$$
so $\phi(T)$ is an unbiased estimator of $\tau(\theta)$. By Theorem 7.3.23, $\phi(T)$ is the best unbiased estimator of $\tau(\theta)$.
7.57 a. $T$ is a Bernoulli random variable. Hence,
$$\mathrm E_pT = P_p(T=1) = P_p\Bigl(\sum_{i=1}^{n}X_i>X_{n+1}\Bigr) = h(p).$$
b. $\sum_{i=1}^{n+1}X_i$ is a complete sufficient statistic for $p$, so $\mathrm E\bigl(T\,\big|\,\sum_{i=1}^{n+1}X_i=y\bigr)$ is the best unbiased estimator of $h(p)$. We have
$$\mathrm E\Bigl(T\,\Big|\sum_{i=1}^{n+1}X_i=y\Bigr) = P\Bigl(\sum_{i=1}^{n}X_i>X_{n+1}\,\Big|\sum_{i=1}^{n+1}X_i=y\Bigr) = \frac{P\bigl(\sum_{i=1}^{n}X_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=y\bigr)}{P\bigl(\sum_{i=1}^{n+1}X_i=y\bigr)}.$$
The denominator equals $\binom{n+1}{y}p^y(1-p)^{n+1-y}$. If $y=0$ the numerator is
$$P\Bigl(\sum_{i=1}^{n}X_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=0\Bigr) = 0.$$
If $y>0$ the numerator is
$$P\Bigl(\sum_{i=1}^{n}X_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=y,\ X_{n+1}=0\Bigr) + P\Bigl(\sum_{i=1}^{n}X_i>X_{n+1},\ \sum_{i=1}^{n+1}X_i=y,\ X_{n+1}=1\Bigr),$$
which equals
$$P\Bigl(\sum_{i=1}^{n}X_i>0,\ \sum_{i=1}^{n}X_i=y\Bigr)P(X_{n+1}=0) + P\Bigl(\sum_{i=1}^{n}X_i>1,\ \sum_{i=1}^{n}X_i=y-1\Bigr)P(X_{n+1}=1).$$
For all $y>0$,
$$P\Bigl(\sum_{i=1}^{n}X_i>0,\ \sum_{i=1}^{n}X_i=y\Bigr) = P\Bigl(\sum_{i=1}^{n}X_i=y\Bigr) = \binom nyp^y(1-p)^{n-y}.$$
If $y=1$ or $2$, then
$$P\Bigl(\sum_{i=1}^{n}X_i>1,\ \sum_{i=1}^{n}X_i=y-1\Bigr) = 0.$$
And if $y>2$, then
$$P\Bigl(\sum_{i=1}^{n}X_i>1,\ \sum_{i=1}^{n}X_i=y-1\Bigr) = P\Bigl(\sum_{i=1}^{n}X_i=y-1\Bigr) = \binom{n}{y-1}p^{y-1}(1-p)^{n-y+1}.$$
Therefore, the UMVUE is
$$\mathrm E\Bigl(T\,\Big|\sum_{i=1}^{n+1}X_i=y\Bigr) = \begin{cases}0 & \text{if }y=0\\[2mm] \dfrac{\binom nyp^y(1-p)^{n-y}(1-p)}{\binom{n+1}{y}p^y(1-p)^{n+1-y}} = \dfrac{\binom ny}{\binom{n+1}{y}} = \dfrac{n+1-y}{n+1} & \text{if }y=1\text{ or }2\\[4mm] \dfrac{\bigl[\binom ny+\binom{n}{y-1}\bigr]p^y(1-p)^{n+1-y}}{\binom{n+1}{y}p^y(1-p)^{n+1-y}} = \dfrac{\binom ny+\binom{n}{y-1}}{\binom{n+1}{y}} = 1 & \text{if }y>2.\end{cases}$$
7.59 We know $T=(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$. Then
$$\mathrm E\,T^{p/2} = \frac{1}{\Gamma\bigl(\frac{n-1}2\bigr)2^{(n-1)/2}}\int_0^\infty t^{\frac{p+n-1}2-1}e^{-t/2}\,dt = \frac{2^{p/2}\,\Gamma\bigl(\frac{p+n-1}2\bigr)}{\Gamma\bigl(\frac{n-1}2\bigr)} = C_{p,n}.$$
Thus
$$\mathrm E\Bigl(\frac{(n-1)S^2}{\sigma^2}\Bigr)^{p/2} = C_{p,n},$$
so $(n-1)^{p/2}S^p/C_{p,n}$ is an unbiased estimator of $\sigma^p$. From Theorem 6.2.25, $(\bar X,S^2)$ is a complete, sufficient statistic. The unbiased estimator $(n-1)^{p/2}S^p/C_{p,n}$ is a function of $(\bar X,S^2)$. Hence, it is the best unbiased estimator.
7.61 The pdf for $Y\sim\chi^2_\nu$ is
$$f(y) = \frac{1}{\Gamma(\nu/2)2^{\nu/2}}y^{\nu/2-1}e^{-y/2}.$$
Thus the pdf for $S^2 = \sigma^2Y/\nu$ is
$$g(s^2) = \frac{\nu}{\sigma^2}\,\frac{1}{\Gamma(\nu/2)2^{\nu/2}}\Bigl(\frac{s^2\nu}{\sigma^2}\Bigr)^{\nu/2-1}e^{-s^2\nu/(2\sigma^2)}.$$
Thus, the log-likelihood has the form (gathering together constants that do not depend on $s^2$ or $\sigma^2$)
$$\log L(\sigma^2|s^2) = \log\frac{1}{\sigma^2} + K\log\frac{s^2}{\sigma^2} - K'\frac{s^2}{\sigma^2} + K'',$$
where $K>0$ and $K'>0$. The loss function in Example 7.3.27 is
$$L(\sigma^2,a) = \frac{a}{\sigma^2} - \log\frac{a}{\sigma^2} - 1,$$
so the loss of an estimator is the negative of its likelihood.
7.63 Let $a=\tau^2/(\tau^2+1)$, so the Bayes estimator is $\delta^\pi(x)=ax$. Then $R(\mu,\delta^\pi) = (a-1)^2\mu^2+a^2$. As $\tau^2$ increases, $R(\mu,\delta^\pi)$ becomes flatter.
7.65 a. Figure omitted.
b. The posterior expected loss is
$$\mathrm E\bigl(L(\theta,a)|x\bigr) = e^{ca}\,\mathrm E\,e^{-c\theta} - c\,\mathrm E(a-\theta) - 1,$$
where the expectation is with respect to $\pi(\theta|x)$. Then
$$\frac{d}{da}\mathrm E\bigl(L(\theta,a)|x\bigr) = ce^{ca}\,\mathrm E\,e^{-c\theta} - c \stackrel{\rm set}{=} 0,$$
and $a = -\frac1c\log\mathrm E\,e^{-c\theta}$ is the solution. The second derivative is positive, so this is the minimum.
c. $\pi(\theta|x) = \mathrm n(\bar x,\sigma^2/n)$. So, substituting into the formula for a normal mgf, we find $\mathrm E\,e^{-c\theta} = e^{-c\bar x+\sigma^2c^2/(2n)}$, and the LINEX posterior loss is
$$\mathrm E\bigl(L(\theta,a)|x\bigr) = e^{c(a-\bar x)+\sigma^2c^2/(2n)} - c(a-\bar x) - 1.$$
Substitute $\mathrm E\,e^{-c\theta} = e^{-c\bar x+\sigma^2c^2/(2n)}$ into the formula in part (b) to find that the Bayes rule is $\bar x - c\sigma^2/(2n)$.
d. For an estimator $\bar X+b$, the LINEX posterior loss (from part (c)) is
$$\mathrm E\bigl(L(\theta,\bar x+b)|x\bigr) = e^{cb}e^{c^2\sigma^2/(2n)} - cb - 1.$$
For $\bar X$ the expected loss is $e^{c^2\sigma^2/(2n)}-1$, and for the Bayes estimator ($b=-c\sigma^2/(2n)$) the expected loss is $c^2\sigma^2/(2n)$. The marginal distribution of $\bar X$ is $m(\bar x)=1$, so the Bayes risk is infinite for any estimator of the form $\bar X+b$.
e. For $\bar X+b$, the squared error risk is $\mathrm E\bigl((\bar X+b)-\theta\bigr)^2 = \sigma^2/n+b^2$, so $\bar X$ is better than the Bayes estimator. The Bayes risk is infinite for both estimators.
7.66 Let $S = \sum_iX_i\sim\mathrm{binomial}(n,\theta)$.
a. $\mathrm E\,\hat\theta^2 = \mathrm E\dfrac{S^2}{n^2} = \dfrac1{n^2}\bigl(n\theta(1-\theta)+(n\theta)^2\bigr) = \dfrac\theta n+\dfrac{n-1}n\theta^2$.
b. $T_n^{(i)} = \bigl(\sum_{j\ne i}X_j/(n-1)\bigr)^2$. For $S$ values of $i$, $T_n^{(i)} = (S-1)^2/(n-1)^2$ because the $X_i$ that is dropped out equals 1. For the other $n-S$ values of $i$, $T_n^{(i)} = S^2/(n-1)^2$ because the $X_i$ that is dropped out equals 0. Thus we can write the estimator as
$$\mathrm{JK}(T_n) = n\frac{S^2}{n^2} - \frac{n-1}n\Bigl[S\frac{(S-1)^2}{(n-1)^2} + (n-S)\frac{S^2}{(n-1)^2}\Bigr] = \frac{S^2-S}{n(n-1)}.$$
c. $\mathrm E\,\mathrm{JK}(T_n) = \dfrac{1}{n(n-1)}\bigl(n\theta(1-\theta)+(n\theta)^2-n\theta\bigr) = \dfrac{n^2\theta^2-n\theta^2}{n(n-1)} = \theta^2$.
d. For this binomial model, $S$ is a complete sufficient statistic. Because $\mathrm{JK}(T_n)$ is a function of $S$ that is an unbiased estimator of $\theta^2$, it is the best unbiased estimator of $\theta^2$.
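The unbiasedness in part (c) can be verified exactly (not just by simulation) by summing over the binomial pmf of $S$. The $n$ and $\theta$ below are arbitrary illustrative values; the check works for any of them.

```python
from math import comb

# exact expectations over S ~ binomial(n, theta)
n, th = 8, 0.35

def pmf(s):
    return comb(n, s) * th**s * (1 - th)**(n - s)

e_naive = sum((s / n) ** 2 * pmf(s) for s in range(n + 1))            # E (S/n)^2
e_jk = sum((s * s - s) / (n * (n - 1)) * pmf(s) for s in range(n + 1))  # E JK(T_n)
print(e_naive, e_jk, th ** 2)
```

The output shows $\mathrm E(S/n)^2 = \theta/n+\frac{n-1}n\theta^2$ (biased) while $\mathrm E\,\mathrm{JK}(T_n) = \theta^2$ exactly.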

Chapter 8

Hypothesis Testing

8.1 Let $X$ = # of heads out of 1000. If the coin is fair, then $X\sim\mathrm{binomial}(1000,1/2)$. So
$$P(X\ge560) = \sum_{x=560}^{1000}\binom{1000}{x}\Bigl(\frac12\Bigr)^x\Bigl(\frac12\Bigr)^{1000-x} \approx .0000825,$$
where a computer was used to do the calculation. For this binomial, $\mathrm EX = 1000p = 500$ and $\mathrm{Var}\,X = 1000p(1-p) = 250$. A normal approximation is also very good for this calculation:
$$P\{X\ge560\} = P\Bigl\{\frac{X-500}{\sqrt{250}}\ge\frac{559.5-500}{\sqrt{250}}\Bigr\} \approx P\{Z\ge3.763\} \approx .0000839.$$
Thus, if the coin is fair, the probability of observing 560 or more heads out of 1000 is very small. We might tend to believe that the coin is not fair, and $p>1/2$.
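The "computer calculation" above is a one-liner in most environments. A Python sketch (equivalent to the R/Mathematica computation the manual alludes to) producing both the exact tail probability and the continuity-corrected normal approximation:

```python
from math import comb, erf, sqrt

# exact binomial tail and the continuity-corrected normal approximation
p_exact = sum(comb(1000, x) for x in range(560, 1001)) / 2 ** 1000
z = (559.5 - 500) / sqrt(250)
p_normal = 0.5 * (1 - erf(z / sqrt(2)))    # P(Z >= z) for Z ~ n(0,1)
print(p_exact, p_normal)
```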
8.2 Let $X\sim\mathrm{Poisson}(\lambda)$, and we observed $X=10$. To assess if the accident rate has dropped, we could calculate
$$P(X\le10\,|\,\lambda=15) = \sum_{i=0}^{10}\frac{e^{-15}15^i}{i!} = e^{-15}\Bigl(1+15+\frac{15^2}{2!}+\cdots+\frac{15^{10}}{10!}\Bigr) \approx .11846.$$
This is a fairly large value, not overwhelming evidence that the accident rate has dropped. (A normal approximation with continuity correction gives a value of .12264.)
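The Poisson cdf value can be reproduced by accumulating pmf terms recursively, avoiding large factorials. A quick Python check:

```python
from math import exp

# P(X <= 10 | lambda = 15) by accumulating Poisson pmf terms
lam = 15.0
term = exp(-lam)       # P(X = 0)
total = term
for i in range(1, 11):
    term *= lam / i    # P(X = i) = P(X = i-1) * lam / i
    total += term
print(total)           # approximately 0.11846
```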
8.3 The LRT statistic is
$$\lambda(y) = \frac{\sup_{\theta\le\theta_0}L(\theta|y_1,\ldots,y_m)}{\sup_\Theta L(\theta|y_1,\ldots,y_m)}.$$
Let $y=\sum_{i=1}^{m}y_i$, and note that the MLE in the numerator is $\min\{y/m,\theta_0\}$ (see Exercise 7.12) while the denominator has $y/m$ as the MLE (see Example 7.2.7). Thus
$$\lambda(y) = \begin{cases}1 & \text{if }y/m\le\theta_0\\[1mm] \dfrac{\theta_0^y(1-\theta_0)^{m-y}}{(y/m)^y(1-y/m)^{m-y}} & \text{if }y/m>\theta_0,\end{cases}$$
and we reject $H_0$ if
$$\frac{\theta_0^y(1-\theta_0)^{m-y}}{(y/m)^y(1-y/m)^{m-y}} < c.$$
To show that this is equivalent to rejecting if $y>b$, we could show $\lambda(y)$ is decreasing in $y$ so that $\lambda(y)<c$ occurs for $y>b>m\theta_0$. It is easier to work with $\log\lambda(y)$, and we have
$$\log\lambda(y) = y\log\theta_0 + (m-y)\log(1-\theta_0) - y\log\frac ym - (m-y)\log\frac{m-y}m,$$
and
$$\frac{d}{dy}\log\lambda(y) = \log\theta_0 - \log(1-\theta_0) - \log\frac ym + \log\frac{m-y}m = \log\frac{\theta_0}{y/m} + \log\frac{(m-y)/m}{1-\theta_0}.$$
For $y/m>\theta_0$, $1-y/m=(m-y)/m<1-\theta_0$, so each fraction above is less than 1, and each log is less than 0. Thus $\frac{d}{dy}\log\lambda<0$, which shows that $\lambda$ is decreasing in $y$ and $\lambda(y)<c$ if and only if $y>b$.
8.4 For discrete random variables, L(θ|x) = f (x|θ) = P (X = x|θ). So the numerator and denominator of λ(x) are the supremum of this probability over the indicated sets.
8.5 a. The log-likelihood is
$$\log L(\theta,\nu|\mathbf x) = n\log\theta + n\theta\log\nu - (\theta+1)\log\prod_ix_i, \qquad \nu\le x_{(1)},$$
where $x_{(1)} = \min_ix_i$. For any value of $\theta$, this is an increasing function of $\nu$ for $\nu\le x_{(1)}$. So both the restricted and unrestricted MLEs of $\nu$ are $\hat\nu = x_{(1)}$. To find the MLE of $\theta$, set
$$\frac{\partial}{\partial\theta}\log L(\theta,x_{(1)}|\mathbf x) = \frac n\theta + n\log x_{(1)} - \log\prod_ix_i = 0,$$
and solve for $\theta$, yielding
$$\hat\theta = \frac{n}{\log\bigl(\prod_ix_i/x_{(1)}^n\bigr)} = \frac nT.$$
$(\partial^2/\partial\theta^2)\log L(\theta,x_{(1)}|\mathbf x) = -n/\theta^2 < 0$ for all $\theta$, so $\hat\theta$ is a maximum.
b. Under $H_0$, the MLE of $\theta$ is $\hat\theta_0 = 1$, and the MLE of $\nu$ is still $\hat\nu = x_{(1)}$. So the likelihood ratio statistic is
$$\lambda(\mathbf x) = \frac{x_{(1)}^n\big/\bigl(\prod_ix_i\bigr)^2}{(n/T)^nx_{(1)}^{n^2/T}\bigl(\prod_ix_i\bigr)^{-(n/T+1)}} = \Bigl(\frac Tn\Bigr)^ne^{-T+n}.$$
$(\partial/\partial T)\log\lambda(\mathbf x) = (n/T)-1$. Hence, $\lambda(\mathbf x)$ is increasing if $T\le n$ and decreasing if $T\ge n$. Thus, $\lambda(\mathbf x)\le c$ is equivalent to $T\le c_1$ or $T\ge c_2$, for appropriately chosen constants $c_1$ and $c_2$.
c. We will not use the hint, although the problem can be solved that way. Instead, make the following three transformations. First, let $Y_i = \log X_i$, $i=1,\ldots,n$. Next, make the $n$-to-1 transformation that sets $Z_1 = \min_iY_i$ and sets $Z_2,\ldots,Z_n$ equal to the remaining $Y_i$s, with their order unchanged. Finally, let $W_1 = Z_1$ and $W_i = Z_i-Z_1$, $i=2,\ldots,n$. Then you find that the $W_i$s are independent with $W_1\sim f_{W_1}(w) = n\nu^ne^{-nw}$, $w>\log\nu$, and $W_i\sim\mathrm{exponential}(1)$, $i=2,\ldots,n$. Now $T=\sum_{i=2}^{n}W_i\sim\mathrm{gamma}(n-1,1)$, and, hence, $2T\sim\mathrm{gamma}(n-1,2) = \chi^2_{2(n-1)}$.
8.6 a.
$$\lambda(\mathbf x,\mathbf y) = \frac{\sup_{\Theta_0}L(\theta|\mathbf x,\mathbf y)}{\sup_\Theta L(\theta|\mathbf x,\mathbf y)} = \frac{\sup_\theta\prod_{i=1}^{n}\frac1\theta e^{-x_i/\theta}\prod_{j=1}^{m}\frac1\theta e^{-y_j/\theta}}{\sup_{\theta,\mu}\prod_{i=1}^{n}\frac1\theta e^{-x_i/\theta}\prod_{j=1}^{m}\frac1\mu e^{-y_j/\mu}} = \frac{\sup_\theta\frac{1}{\theta^{m+n}}\exp\bigl\{-\bigl(\sum_{i=1}^{n}x_i+\sum_{j=1}^{m}y_j\bigr)/\theta\bigr\}}{\sup_{\theta,\mu}\frac{1}{\theta^n}\exp\bigl\{-\sum_{i=1}^{n}x_i/\theta\bigr\}\frac{1}{\mu^m}\exp\bigl\{-\sum_{j=1}^{m}y_j/\mu\bigr\}}.$$
Differentiation will show that in the numerator $\hat\theta_0 = \bigl(\sum_ix_i+\sum_jy_j\bigr)/(n+m)$, while in the denominator $\hat\theta=\bar x$ and $\hat\mu=\bar y$. Therefore,
$$\lambda(\mathbf x,\mathbf y) = \frac{\Bigl(\frac{\sum_ix_i+\sum_jy_j}{n+m}\Bigr)^{-(n+m)}e^{-(n+m)}}{\Bigl(\frac{\sum_ix_i}{n}\Bigr)^{-n}e^{-n}\Bigl(\frac{\sum_jy_j}{m}\Bigr)^{-m}e^{-m}} = \frac{(n+m)^{n+m}}{n^nm^m}\,\frac{\bigl(\sum_ix_i\bigr)^n\bigl(\sum_jy_j\bigr)^m}{\bigl(\sum_ix_i+\sum_jy_j\bigr)^{n+m}}.$$
And the LRT is to reject $H_0$ if $\lambda(\mathbf x,\mathbf y)\le c$.
b.
$$\lambda = \frac{(n+m)^{n+m}}{n^nm^m}\,T^n(1-T)^m, \qquad T = \frac{\sum_ix_i}{\sum_ix_i+\sum_jy_j}.$$
Therefore $\lambda$ is a function of $T$. $\lambda$ is a unimodal function of $T$ which is maximized when $T = \frac{n}{n+m}$. Rejection for $\lambda\le c$ is equivalent to rejection for $T\le a$ or $T\ge b$, where $a$ and $b$ are constants that satisfy $a^n(1-a)^m = b^n(1-b)^m$.
c. When $H_0$ is true, $\sum_iX_i\sim\mathrm{gamma}(n,\theta)$ and $\sum_jY_j\sim\mathrm{gamma}(m,\theta)$, and they are independent. So by an extension of Exercise 4.19b, $T\sim\mathrm{beta}(n,m)$.
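The claim in part (b) that $T^n(1-T)^m$ is unimodal with maximum at $T=n/(n+m)$ is quick to confirm numerically (illustrative $n$ and $m$, chosen so the maximizer lies exactly on the search grid):

```python
n, m = 3, 5

def g(t):
    # the T-dependent factor of the LRT statistic
    return t ** n * (1 - t) ** m

tmax = max((i / 10000 for i in range(1, 10000)), key=g)
print(tmax, n / (n + m))     # both 0.375
```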
8.7 a.
$$L(\theta,\lambda|\mathbf x) = \prod_{i=1}^{n}\frac1\lambda e^{-(x_i-\theta)/\lambda}I_{[\theta,\infty)}(x_i) = \Bigl(\frac1\lambda\Bigr)^ne^{-(\Sigma_ix_i-n\theta)/\lambda}I_{[\theta,\infty)}(x_{(1)}),$$
which is increasing in $\theta$ if $x_{(1)}\ge\theta$ (regardless of $\lambda$). So the MLE of $\theta$ is $\hat\theta = x_{(1)}$. Then
$$\frac{\partial\log L}{\partial\lambda} = -\frac n\lambda + \frac{\sum_ix_i-n\hat\theta}{\lambda^2} \stackrel{\rm set}{=} 0 \;\Rightarrow\; n\hat\lambda = \sum_ix_i-n\hat\theta \;\Rightarrow\; \hat\lambda = \bar x-x_{(1)}.$$
Because
$$\frac{\partial^2\log L}{\partial\lambda^2}\bigg|_{\lambda=\bar x-x_{(1)}} = \Bigl[\frac{n}{\lambda^2}-2\frac{\sum_ix_i-n\hat\theta}{\lambda^3}\Bigr]_{\lambda=\bar x-x_{(1)}} = \frac{n}{(\bar x-x_{(1)})^2} - \frac{2n(\bar x-x_{(1)})}{(\bar x-x_{(1)})^3} = \frac{-n}{(\bar x-x_{(1)})^2} < 0,$$
we have $\hat\theta=x_{(1)}$ and $\hat\lambda=\bar x-x_{(1)}$ as the unrestricted MLEs of $\theta$ and $\lambda$. Under the restriction $\theta\le0$, the MLE of $\theta$ (regardless of $\lambda$) is
$$\hat\theta_0 = \begin{cases}0 & \text{if }x_{(1)}>0\\ x_{(1)} & \text{if }x_{(1)}\le0.\end{cases}$$
For $x_{(1)}>0$, substituting $\hat\theta_0=0$ and maximizing with respect to $\lambda$, as above, yields $\hat\lambda_0=\bar x$. Therefore,
$$\lambda(\mathbf x) = \frac{\sup_{\{(\lambda,\theta):\theta\le0\}}L(\lambda,\theta|\mathbf x)}{\sup_\Theta L(\theta,\lambda|\mathbf x)} = \begin{cases}1 & \text{if }x_{(1)}\le0\\[1mm] \dfrac{L(\bar x,0|\mathbf x)}{L(\hat\lambda,\hat\theta|\mathbf x)} & \text{if }x_{(1)}>0,\end{cases}$$
where
$$\frac{L(\bar x,0|\mathbf x)}{L(\hat\lambda,\hat\theta|\mathbf x)} = \frac{(1/\bar x)^ne^{-n\bar x/\bar x}}{(1/\hat\lambda)^ne^{-n(\bar x-x_{(1)})/(\bar x-x_{(1)})}} = \Bigl(\frac{\hat\lambda}{\bar x}\Bigr)^n = \Bigl(\frac{\bar x-x_{(1)}}{\bar x}\Bigr)^n = \Bigl(1-\frac{x_{(1)}}{\bar x}\Bigr)^n.$$
So rejecting if $\lambda(\mathbf x)\le c$ is equivalent to rejecting if $x_{(1)}/\bar x\ge c^*$, where $c^*$ is some constant.
b. The LRT statistic is
$$\lambda(\mathbf x) = \frac{\sup_\beta(1/\beta^n)e^{-\Sigma_ix_i/\beta}}{\sup_{\beta,\gamma}(\gamma^n/\beta^n)\bigl(\prod_ix_i\bigr)^{\gamma-1}e^{-\Sigma_ix_i^\gamma/\beta}}.$$
The numerator is maximized at $\hat\beta_0 = \bar x$. For fixed $\gamma$, the denominator is maximized at $\hat\beta_\gamma = \sum_ix_i^\gamma/n$. Thus
$$\lambda(\mathbf x) = \frac{\bar x^{-n}e^{-n}}{\sup_\gamma(\gamma^n/\hat\beta_\gamma^n)\bigl(\prod_ix_i\bigr)^{\gamma-1}e^{-\Sigma_ix_i^\gamma/\hat\beta_\gamma}} = \frac{\bar x^{-n}}{\sup_\gamma(\gamma^n/\hat\beta_\gamma^n)\bigl(\prod_ix_i\bigr)^{\gamma-1}}.$$
The denominator cannot be maximized in closed form. Numerical maximization could be used to compute the statistic for observed data $\mathbf x$.
8.8 a. We will first find the MLEs of $a$ and $\theta$. We have
$$L(a,\theta|\mathbf x) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi a\theta}}e^{-(x_i-\theta)^2/(2a\theta)},\qquad \log L(a,\theta|\mathbf x) = \sum_{i=1}^{n}\Bigl[-\frac12\log(2\pi a\theta) - \frac{(x_i-\theta)^2}{2a\theta}\Bigr].$$
Thus
$$\frac{\partial\log L}{\partial a} = -\frac{n}{2a} + \frac{1}{2\theta a^2}\sum_{i=1}^{n}(x_i-\theta)^2 \stackrel{\rm set}{=} 0,$$
$$\frac{\partial\log L}{\partial\theta} = \sum_{i=1}^{n}\Bigl[-\frac{1}{2\theta} + \frac{(x_i-\theta)^2}{2a\theta^2} + \frac{x_i-\theta}{a\theta}\Bigr] = -\frac{n}{2\theta} + \frac{1}{2a\theta^2}\sum_{i=1}^{n}(x_i-\theta)^2 + \frac{n\bar x-n\theta}{a\theta} \stackrel{\rm set}{=} 0.$$
We have to solve these two equations simultaneously to get the MLEs of $a$ and $\theta$, say $\hat a$ and $\hat\theta$. Solve the first equation for $a$ in terms of $\theta$ to get
$$a = \frac{1}{n\theta}\sum_{i=1}^{n}(x_i-\theta)^2.$$
Substitute this into the second equation to get
$$-\frac{n}{2\theta} + \frac{n}{2\theta} + \frac{n(\bar x-\theta)}{a\theta} = 0.$$
So we get $\hat\theta = \bar x$, and
$$\hat a = \frac{1}{n\bar x}\sum_{i=1}^{n}(x_i-\bar x)^2 = \frac{\hat\sigma^2}{\bar x},$$
the ratio of the usual MLEs of the variance and the mean. (Verification that this is a maximum is lengthy. We omit it.) For $a=1$, we just solve the second equation, which gives a quadratic in $\theta$ that leads to the restricted MLE
$$\hat\theta_R = \frac{-1+\sqrt{1+4(\hat\sigma^2+\bar x^2)}}{2}.$$
Noting that $\hat a\hat\theta = \hat\sigma^2$, we obtain
$$\lambda(\mathbf x) = \frac{L(\hat\theta_R|\mathbf x)}{L(\hat a,\hat\theta|\mathbf x)} = \frac{\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\hat\theta_R}}e^{-(x_i-\hat\theta_R)^2/(2\hat\theta_R)}}{\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\hat a\hat\theta}}e^{-(x_i-\hat\theta)^2/(2\hat a\hat\theta)}} = \Bigl(\frac{\hat\sigma^2}{\hat\theta_R}\Bigr)^{n/2}e^{(n/2)-\Sigma_i(x_i-\hat\theta_R)^2/(2\hat\theta_R)}.$$
b. In this case we have
$$\log L(a,\theta|\mathbf x) = \sum_{i=1}^{n}\Bigl[-\frac12\log(2\pi a\theta^2) - \frac{(x_i-\theta)^2}{2a\theta^2}\Bigr].$$
Thus
$$\frac{\partial\log L}{\partial a} = -\frac{n}{2a} + \frac{1}{2a^2\theta^2}\sum_{i=1}^{n}(x_i-\theta)^2 \stackrel{\rm set}{=} 0,$$
$$\frac{\partial\log L}{\partial\theta} = \sum_{i=1}^{n}\Bigl[-\frac1\theta + \frac{(x_i-\theta)^2}{a\theta^3} + \frac{x_i-\theta}{a\theta^2}\Bigr] = -\frac n\theta + \frac{1}{a\theta^3}\sum_{i=1}^{n}(x_i-\theta)^2 + \frac{1}{a\theta^2}\sum_{i=1}^{n}(x_i-\theta) \stackrel{\rm set}{=} 0.$$
Solving the first equation for $a$ in terms of $\theta$ yields
$$a = \frac{1}{n\theta^2}\sum_{i=1}^{n}(x_i-\theta)^2.$$
Substituting this into the second equation, we get
$$-\frac n\theta + \frac n\theta + \frac{n\sum_i(x_i-\theta)}{\sum_i(x_i-\theta)^2} = 0.$$
So again, $\hat\theta = \bar x$, and
$$\hat a = \frac{1}{n\bar x^2}\sum_{i=1}^{n}(x_i-\bar x)^2 = \frac{\hat\sigma^2}{\bar x^2}$$
in the unrestricted case. In the restricted case, set $a=1$ in the second equation to obtain
$$\frac{\partial\log L}{\partial\theta} = -\frac n\theta + \frac{1}{\theta^3}\sum_{i=1}^{n}(x_i-\theta)^2 + \frac{1}{\theta^2}\sum_{i=1}^{n}(x_i-\theta) \stackrel{\rm set}{=} 0.$$
Multiply through by $\theta^3/n$ to get
$$-\theta^2 + \frac1n\sum_{i=1}^{n}(x_i-\theta)^2 + \frac\theta n\sum_{i=1}^{n}(x_i-\theta) = 0.$$
Add $\pm\bar x$ inside the square and complete all sums to get the equation
$$-\theta^2 + \hat\sigma^2 + (\bar x-\theta)^2 + \theta(\bar x-\theta) = 0.$$
This is a quadratic in $\theta$ with solution for the MLE
$$\hat\theta_R = \frac{-\bar x+\sqrt{\bar x^2+4(\hat\sigma^2+\bar x^2)}}{2},$$
which yields the LRT statistic
$$\lambda(\mathbf x) = \frac{L(\hat\theta_R|\mathbf x)}{L(\hat a,\hat\theta|\mathbf x)} = \frac{\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\hat\theta_R^2}}e^{-(x_i-\hat\theta_R)^2/(2\hat\theta_R^2)}}{\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\hat a\hat\theta^2}}e^{-(x_i-\hat\theta)^2/(2\hat a\hat\theta^2)}} = \Bigl(\frac{\hat\sigma}{\hat\theta_R}\Bigr)^ne^{(n/2)-\Sigma_i(x_i-\hat\theta_R)^2/(2\hat\theta_R^2)}.$$
8.9 a. The MLE of $\lambda$ under $H_0$ is $\hat\lambda_0 = \bar Y^{-1}$, and the MLE of $\lambda_i$ under $H_1$ is $\hat\lambda_i = Y_i^{-1}$. The LRT statistic is bounded above by 1 and is given by
$$1 \ge \frac{\bar Y^{-n}e^{-n}}{\bigl(\prod_iY_i\bigr)^{-1}e^{-n}} = \frac{\prod_iY_i}{\bar Y^n}.$$
Rearrangement of this inequality yields $\bar Y\ge\bigl(\prod_iY_i\bigr)^{1/n}$, the arithmetic-geometric mean inequality.
b. The pdf of $X_i$ is $f(x_i|\lambda_i) = (\lambda_i/x_i^2)e^{-\lambda_i/x_i}$, $x_i>0$. The MLE of $\lambda$ under $H_0$ is $\hat\lambda_0 = n\big/\bigl[\sum_i(1/X_i)\bigr]$, and the MLE of $\lambda_i$ under $H_1$ is $\hat\lambda_i = X_i$. Now, the argument proceeds as in part (a).
8.10 Let $Y = \sum_iX_i$. The posterior distribution of $\lambda|y$ is $\mathrm{gamma}\bigl(y+\alpha,\ \beta/(\beta+1)\bigr)$.
a.
$$P(\lambda\le\lambda_0|y) = \frac{(\beta+1)^{y+\alpha}}{\Gamma(y+\alpha)\beta^{y+\alpha}}\int_0^{\lambda_0}t^{y+\alpha-1}e^{-t(\beta+1)/\beta}\,dt,$$
and $P(\lambda>\lambda_0|y) = 1-P(\lambda\le\lambda_0|y)$.
b. Because $\beta/(\beta+1)$ is a scale parameter in the posterior distribution, $\bigl(2(\beta+1)\lambda/\beta\bigr)\big|y$ has a $\mathrm{gamma}(y+\alpha,2)$ distribution. If $2\alpha$ is an integer, this is a $\chi^2_{2y+2\alpha}$ distribution. So, for $\alpha=5/2$ and $\beta=2$,
$$P(\lambda\le\lambda_0|y) = P\Bigl(\frac{2(\beta+1)\lambda}{\beta}\le\frac{2(\beta+1)\lambda_0}{\beta}\Bigr) = P\bigl(\chi^2_{2y+5}\le3\lambda_0\bigr).$$
8.11 a. From Exercise 7.23, the posterior distribution of $\sigma^2$ given $S^2$ is $\mathrm{IG}(\gamma,\delta)$, where $\gamma = \alpha+(n-1)/2$ and $\delta = \bigl[(n-1)S^2/2+1/\beta\bigr]^{-1}$. Let $Y = 2/(\sigma^2\delta)$. Then $Y|S^2\sim\mathrm{gamma}(\gamma,2)$. (Note: if $2\alpha$ is an integer, this is a $\chi^2_{2\gamma}$ distribution.) Let $M$ denote the median of a $\mathrm{gamma}(\gamma,2)$ distribution. Note that $M$ depends only on $\alpha$ and $n$, not on $S^2$ or $\beta$. Then we have $P(Y\ge2/\delta\,|\,S^2) = P(\sigma^2\le1\,|\,S^2) > 1/2$ if and only if
$$M > \frac2\delta = (n-1)S^2+\frac2\beta, \qquad\text{that is,}\qquad S^2 < \frac{M-2/\beta}{n-1}.$$
b. From Example 7.2.11, the unrestricted MLEs are $\hat\mu = \bar X$ and $\hat\sigma^2 = (n-1)S^2/n$. Under $H_0$, $\hat\mu$ is still $\bar X$, because this was the maximizing value of $\mu$, regardless of $\sigma^2$. Then because $L(\bar x,\sigma^2|\mathbf x)$ is a unimodal function of $\sigma^2$, the restricted MLE of $\sigma^2$ is $\hat\sigma^2$, if $\hat\sigma^2\le1$, and is 1, if $\hat\sigma^2>1$. So the LRT statistic is
$$\lambda(\mathbf x) = \begin{cases}1 & \text{if }\hat\sigma^2\le1\\ (\hat\sigma^2)^{n/2}e^{-n(\hat\sigma^2-1)/2} & \text{if }\hat\sigma^2>1.\end{cases}$$
We have that, for $\hat\sigma^2>1$,
$$\frac{\partial}{\partial(\hat\sigma^2)}\log\lambda(\mathbf x) = \frac n2\Bigl(\frac{1}{\hat\sigma^2}-1\Bigr) < 0.$$
So $\lambda(\mathbf x)$ is decreasing in $\hat\sigma^2$, and rejecting $H_0$ for small values of $\lambda(\mathbf x)$ is equivalent to rejecting for large values of $\hat\sigma^2$, that is, large values of $S^2$. The LRT accepts $H_0$ if and only if $S^2<k$, where $k$ is a constant. We can pick the prior parameters so that the acceptance regions match in this way. First, pick $\alpha$ large enough that $M/(n-1)>k$. Then, as $\beta$ varies between 0 and $\infty$, $(M-2/\beta)/(n-1)$ varies between $-\infty$ and $M/(n-1)$. So, for some choice of $\beta$, $(M-2/\beta)/(n-1) = k$ and the acceptance regions match.
8.12 a. For $H_0\colon\mu\le0$ vs. $H_1\colon\mu>0$ the LRT is to reject $H_0$ if $\bar x>c\sigma/\sqrt n$ (Example 8.3.3). For $\alpha=.05$ take $c=1.645$. The power function is
$$\beta(\mu) = P\Bigl(\frac{\bar X-\mu}{\sigma/\sqrt n} > 1.645-\frac{\mu}{\sigma/\sqrt n}\Bigr) = P\Bigl(Z > 1.645-\frac{\sqrt n\,\mu}{\sigma}\Bigr).$$
Note that the power will equal .5 when $\mu = 1.645\sigma/\sqrt n$.
b. For $H_0\colon\mu=0$ vs. $H_1\colon\mu\ne0$ the LRT is to reject $H_0$ if $|\bar x|>c\sigma/\sqrt n$ (Example 8.2.2). For $\alpha=.05$ take $c=1.96$. The power function is
$$\beta(\mu) = 1 - P\Bigl(-1.96-\frac{\sqrt n\,\mu}{\sigma} \le Z \le 1.96-\frac{\sqrt n\,\mu}{\sigma}\Bigr).$$
In this case, $\mu = \pm1.96\sigma/\sqrt n$ gives power of approximately .5.
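The one-sided power function from part (a) is easy to evaluate with the standard normal cdf. A Python sketch (with illustrative $\sigma$ and $n$) confirming that the size is .05 at $\mu=0$ and the power is exactly $1/2$ at $\mu = 1.645\sigma/\sqrt n$:

```python
from math import erf, sqrt

def power(mu, sigma=1.0, n=25, c=1.645):
    # beta(mu) = P(Z > c - sqrt(n) mu / sigma) for the one-sided test
    z = c - sqrt(n) * mu / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))

mu_half = 1.645 * 1.0 / sqrt(25)    # the mu at which power should be 1/2
print(power(0.0), power(mu_half))   # approx .05, exactly 0.5
```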
8.13 a. The size of $\phi_1$ is $\alpha_1 = P(X_1 > .95 \mid \theta = 0) = .05$. The size of $\phi_2$ is $\alpha_2 = P(X_1 + X_2 > C \mid \theta = 0)$. If $1 \le C \le 2$, this is
$$ \alpha_2 = P(X_1+X_2 > C \mid \theta = 0) = \int_{C-1}^{1}\int_{C-x_1}^{1} 1\,dx_2\,dx_1 = \frac{(2-C)^2}{2}. $$
Setting this equal to $\alpha$ and solving for $C$ gives $C = 2 - \sqrt{2\alpha}$, and for $\alpha = .05$, we get $C = 2 - \sqrt{.1} \approx 1.68$.

b. For the first test we have the power function
$$ \beta_1(\theta) = P_\theta(X_1 > .95) = \begin{cases} 0 & \text{if } \theta \le -.05 \\ \theta + .05 & \text{if } -.05 < \theta \le .95 \\ 1 & \text{if } .95 < \theta. \end{cases} $$
Using the distribution of $Y = X_1 + X_2$, given by
$$ f_Y(y\mid\theta) = \begin{cases} y - 2\theta & \text{if } 2\theta \le y < 2\theta+1 \\ 2\theta + 2 - y & \text{if } 2\theta+1 \le y < 2\theta+2 \\ 0 & \text{otherwise,} \end{cases} $$
we obtain the power function for the second test as
$$ \beta_2(\theta) = P_\theta(Y > C) = \begin{cases} 0 & \text{if } \theta \le (C/2)-1 \\ (2\theta+2-C)^2/2 & \text{if } (C/2)-1 < \theta \le (C-1)/2 \\ 1 - (C-2\theta)^2/2 & \text{if } (C-1)/2 < \theta \le C/2 \\ 1 & \text{if } C/2 < \theta. \end{cases} $$

c. From the graph it is clear that $\phi_1$ is more powerful for $\theta$ near $0$, but $\phi_2$ is more powerful for larger $\theta$'s; $\phi_2$ is not uniformly more powerful than $\phi_1$.

d. If either $X_1 \ge 1$ or $X_2 \ge 1$, we should reject $H_0$, because if $\theta = 0$, $P(X_i < 1) = 1$. Thus, consider the rejection region given by
$$ \{(x_1,x_2): x_1 + x_2 > C\} \;\cup\; \{(x_1,x_2): x_1 > 1\} \;\cup\; \{(x_1,x_2): x_2 > 1\}. $$
The first set is the rejection region for $\phi_2$. The test with this rejection region has the same size as $\phi_2$, because the last two sets both have probability $0$ if $\theta = 0$. But for $0 < \theta < C - 1$, the power function of this test is strictly larger than $\beta_2(\theta)$. If $C - 1 \le \theta$, this test and $\phi_2$ have the same power.
8.14 The CLT tells us that $Z = \left(\sum_i X_i - np\right)/\sqrt{np(1-p)}$ is approximately n$(0,1)$. For a test that rejects $H_0$ when $\sum_i X_i > c$, we need to find $c$ and $n$ to satisfy
$$ P\left(Z > \frac{c - n(.49)}{\sqrt{n(.49)(.51)}}\right) = .01 \quad\text{and}\quad P\left(Z > \frac{c - n(.51)}{\sqrt{n(.51)(.49)}}\right) = .99. $$
We thus want
$$ \frac{c - n(.49)}{\sqrt{n(.49)(.51)}} = 2.33 \quad\text{and}\quad \frac{c - n(.51)}{\sqrt{n(.51)(.49)}} = -2.33. $$
Solving these equations gives $n = 13{,}567$ and $c = 6{,}783.5$.
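The manual's numeric work elsewhere uses R or Mathematica; as an illustrative cross-check (not part of the original solution), the two equations above can be solved directly in a few lines of Python:

```python
import math

# From the two standardized equations:
#   c - .49n =  2.33 * sqrt(.49 * .51 * n)
#   c - .51n = -2.33 * sqrt(.49 * .51 * n)
# Subtracting gives  .02n = 2 * 2.33 * sqrt(.49*.51) * sqrt(n).
sqrt_n = 2 * 2.33 * math.sqrt(.49 * .51) / .02
n = round(sqrt_n ** 2)                                # 13567
c = .51 * n - 2.33 * math.sqrt(.49 * .51 * n)
print(n, round(c, 1))                                 # 13567 6783.5
```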
8.15 From the Neyman-Pearson lemma the UMP test rejects $H_0$ if
$$ \frac{f(\mathbf{x}\mid\sigma_1)}{f(\mathbf{x}\mid\sigma_0)} = \frac{(2\pi\sigma_1^2)^{-n/2}\, e^{-\Sigma_i x_i^2/(2\sigma_1^2)}}{(2\pi\sigma_0^2)^{-n/2}\, e^{-\Sigma_i x_i^2/(2\sigma_0^2)}} = \left(\frac{\sigma_0}{\sigma_1}\right)^{n} \exp\left\{\frac{1}{2}\sum_i x_i^2\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right)\right\} > k $$
for some $k \ge 0$. After some algebra, this is equivalent to rejecting if
$$ \sum_i x_i^2 > \frac{2\log\left(k\,(\sigma_1/\sigma_0)^n\right)}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} = c, $$
because $\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2} > 0$. This is the UMP test of size $\alpha$, where $\alpha = P_{\sigma_0}\left(\sum_i X_i^2 > c\right)$. To determine $c$ to obtain a specified $\alpha$, use the fact that $\sum_i X_i^2/\sigma_0^2 \sim \chi^2_n$. Thus
$$ \alpha = P_{\sigma_0}\left(\sum_i X_i^2/\sigma_0^2 > c/\sigma_0^2\right) = P\left(\chi^2_n > c/\sigma_0^2\right), $$
so we must have $c/\sigma_0^2 = \chi^2_{n,\alpha}$, which means $c = \sigma_0^2\,\chi^2_{n,\alpha}$.
8.16 a.
$$ \text{Size} = P(\text{reject } H_0 \mid H_0 \text{ is true}) = 1 \;\Rightarrow\; \text{Type I error} = 1. $$
$$ \text{Power} = P(\text{reject } H_0 \mid H_A \text{ is true}) = 1 \;\Rightarrow\; \text{Type II error} = 0. $$

b.
$$ \text{Size} = P(\text{reject } H_0 \mid H_0 \text{ is true}) = 0 \;\Rightarrow\; \text{Type I error} = 0. $$
$$ \text{Power} = P(\text{reject } H_0 \mid H_A \text{ is true}) = 0 \;\Rightarrow\; \text{Type II error} = 1. $$

8.17 a. The likelihood function is
$$ L(\mu,\theta\mid\mathbf{x},\mathbf{y}) = \mu^n\left(\prod_i x_i\right)^{\mu-1}\theta^m\left(\prod_j y_j\right)^{\theta-1}. $$
Maximizing, by differentiating the log-likelihood, yields the MLEs
$$ \hat\mu = -\frac{n}{\sum_i \log x_i} \quad\text{and}\quad \hat\theta = -\frac{m}{\sum_j \log y_j}. $$
Under $H_0$, the likelihood is
$$ L(\theta\mid\mathbf{x},\mathbf{y}) = \theta^{n+m}\left(\prod_i x_i \prod_j y_j\right)^{\theta-1}, $$
and maximizing as above yields the restricted MLE,
$$ \hat\theta_0 = -\frac{n+m}{\sum_i \log x_i + \sum_j \log y_j}. $$
The LRT statistic is
$$ \lambda(\mathbf{x},\mathbf{y}) = \frac{\hat\theta_0^{\,n+m}\left(\prod_i x_i \prod_j y_j\right)^{\hat\theta_0-1}}{\hat\mu^n\hat\theta^m\left(\prod_i x_i\right)^{\hat\mu-1}\left(\prod_j y_j\right)^{\hat\theta-1}} = \frac{\hat\theta_0^{\,n+m}}{\hat\mu^n\hat\theta^m}\left(\prod_i x_i\right)^{\hat\theta_0-\hat\mu}\left(\prod_j y_j\right)^{\hat\theta_0-\hat\theta}. $$

b. Substituting in the formulas for $\hat\mu$, $\hat\theta$, and $\hat\theta_0$ yields $\left(\prod_i x_i\right)^{\hat\theta_0-\hat\mu}\left(\prod_j y_j\right)^{\hat\theta_0-\hat\theta} = 1$ and
$$ \lambda(\mathbf{x},\mathbf{y}) = \frac{(m+n)^{m+n}}{m^m n^n}\,(1-T)^m T^n, \quad\text{where}\quad T = \frac{\sum_i \log x_i}{\sum_i \log x_i + \sum_j \log y_j}. $$
This is a unimodal function of $T$. So rejecting if $\lambda(\mathbf{x},\mathbf{y}) \le c$ is equivalent to rejecting if $T \le c_1$ or $T \ge c_2$, where $c_1$ and $c_2$ are appropriately chosen constants.

c. Simple transformations yield $-\log X_i \sim$ exponential$(1/\mu)$ and $-\log Y_j \sim$ exponential$(1/\theta)$. Therefore, $T = W/(W+V)$ where $W$ and $V$ are independent, $W \sim$ gamma$(n, 1/\mu)$ and $V \sim$ gamma$(m, 1/\theta)$. Under $H_0$, the scale parameters of $W$ and $V$ are equal. Then, a simple generalization of Exercise 4.19b yields $T \sim$ beta$(n,m)$. The constants $c_1$ and $c_2$ are determined by the two equations
$$ P(T \le c_1) + P(T \ge c_2) = \alpha \quad\text{and}\quad (1-c_1)^m c_1^n = (1-c_2)^m c_2^n. $$

8.18 a.
$$ \beta(\theta) = P_\theta\left(\frac{|\bar X - \theta_0|}{\sigma/\sqrt n} > c\right) = 1 - P_\theta\left(\frac{|\bar X - \theta_0|}{\sigma/\sqrt n} \le c\right) = 1 - P_\theta\left(-\frac{c\sigma}{\sqrt n} \le \bar X - \theta_0 \le \frac{c\sigma}{\sqrt n}\right) $$
$$ = 1 - P_\theta\left(\frac{-c\sigma/\sqrt n + \theta_0 - \theta}{\sigma/\sqrt n} \le \frac{\bar X - \theta}{\sigma/\sqrt n} \le \frac{c\sigma/\sqrt n + \theta_0 - \theta}{\sigma/\sqrt n}\right) = 1 - P\left(-c + \frac{\theta_0-\theta}{\sigma/\sqrt n} \le Z \le c + \frac{\theta_0-\theta}{\sigma/\sqrt n}\right) $$
$$ = 1 + \Phi\left(-c + \frac{\theta_0-\theta}{\sigma/\sqrt n}\right) - \Phi\left(c + \frac{\theta_0-\theta}{\sigma/\sqrt n}\right), $$
where $Z \sim$ n$(0,1)$ and $\Phi$ is the standard normal cdf.

b. The size is $.05 = \beta(\theta_0) = 1 + \Phi(-c) - \Phi(c)$, which implies $c = 1.96$. The power ($1-$ Type II error) is
$$ .75 \le \beta(\theta_0+\sigma) = 1 + \underbrace{\Phi\left(-1.96 - \sqrt n\right)}_{\approx 0} - \Phi\left(1.96 - \sqrt n\right). $$
$\Phi(-.675) \approx .25$ implies $1.96 - \sqrt n = -.675$, which implies $n = 6.943 \approx 7$.
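The sample-size arithmetic above can be cross-checked numerically. The sketch below uses Python's scipy (an assumption of this check; the manual itself works in R or Mathematica):

```python
import math
from scipy.stats import norm

c = norm.ppf(1 - .05 / 2)        # two-sided size-.05 cutoff, about 1.96
# Require beta(theta_0 + sigma) >= .75; ignoring the Phi(-c - sqrt(n)) term,
# this is Phi(c - sqrt(n)) <= .25, i.e. sqrt(n) >= c - z_{.25}.
sqrt_n = c - norm.ppf(.25)       # 1.96 + .675
n = sqrt_n ** 2                  # about 6.94
print(round(c, 2), math.ceil(n)) # 1.96 7
```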

8.19 The pdf of $Y$ is
$$ f(y\mid\theta) = \frac{1}{\theta}\,y^{(1/\theta)-1}\,e^{-y^{1/\theta}}, \quad y > 0. $$
By the Neyman-Pearson Lemma, the UMP test will reject if
$$ \frac{f(y\mid 2)}{f(y\mid 1)} = \frac{1}{2}\,y^{-1/2}\,e^{y - y^{1/2}} > k. $$
To see the form of this rejection region, we compute
$$ \frac{d}{dy}\left(\frac{1}{2}\,y^{-1/2}\,e^{y-y^{1/2}}\right) = \frac{1}{2}\,y^{-3/2}\,e^{y-y^{1/2}}\left(y - \frac{y^{1/2}}{2} - \frac{1}{2}\right), $$
which is negative for $y < 1$ and positive for $y > 1$. Thus $f(y\mid 2)/f(y\mid 1)$ is decreasing for $y \le 1$ and increasing for $y \ge 1$. Hence, rejecting for $f(y\mid 2)/f(y\mid 1) > k$ is equivalent to rejecting for $y \le c_0$ or $y \ge c_1$. To obtain a size $\alpha$ test, the constants $c_0$ and $c_1$ must satisfy
$$ \alpha = P(Y \le c_0\mid\theta=1) + P(Y \ge c_1\mid\theta=1) = 1 - e^{-c_0} + e^{-c_1} \quad\text{and}\quad \frac{f(c_0\mid 2)}{f(c_0\mid 1)} = \frac{f(c_1\mid 2)}{f(c_1\mid 1)}. $$
Solving these two equations numerically, for $\alpha = .10$, yields $c_0 = .076546$ and $c_1 = 3.637798$. The Type II error probability is
$$ P(c_0 < Y < c_1\mid\theta=2) = \int_{c_0}^{c_1}\frac{1}{2}\,y^{-1/2}\,e^{-y^{1/2}}\,dy = \left.-e^{-y^{1/2}}\right|_{c_0}^{c_1} = .609824. $$

8.20 By the Neyman-Pearson Lemma, the UMP test rejects for large values of $f(x\mid H_1)/f(x\mid H_0)$.
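The "solving numerically" step in 8.19 can be reproduced with a standard root finder. The sketch below uses Python's scipy (the choice of `fsolve` and the starting values are assumptions of this check, not from the manual):

```python
import numpy as np
from scipy.optimize import fsolve

def ratio(y):
    # likelihood ratio f(y|2)/f(y|1) = (1/2) y^(-1/2) exp(y - sqrt(y))
    return 0.5 * y ** (-0.5) * np.exp(y - np.sqrt(y))

def equations(c, alpha=0.10):
    c0, c1 = c
    return [1 - np.exp(-c0) + np.exp(-c1) - alpha,  # size condition
            ratio(c0) - ratio(c1)]                  # equal-ratio condition

c0, c1 = fsolve(equations, [0.05, 3.5])
type2 = np.exp(-np.sqrt(c0)) - np.exp(-np.sqrt(c1))
print(c0, c1, type2)
```

This reproduces $c_0 \approx .0765$, $c_1 \approx 3.6378$, and a Type II error of about $.6098$.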
Computing this ratio we obtain

x                    1     2     3     4     5     6     7
f(x|H1)/f(x|H0)      6     5     4     3     2     1    .84

The ratio is decreasing in $x$. So rejecting for large values of $f(x\mid H_1)/f(x\mid H_0)$ corresponds to rejecting for small values of $x$. To get a size $\alpha$ test, we need to choose $c$ so that $P(X \le c\mid H_0) = \alpha$. The value $c = 4$ gives the UMP size $\alpha = .04$ test. The Type II error probability is $P(X = 5, 6, 7\mid H_1) = .82$.
8.21 The proof is the same with integrals replaced by sums.
8.22 a. From Corollary 8.3.13 we can base the test on $\sum_i X_i$, the sufficient statistic. Let $Y = \sum_i X_i \sim$ binomial$(10, p)$ and let $f(y\mid p)$ denote the pmf of $Y$. By Corollary 8.3.13, a test that rejects if $f(y\mid 1/4)/f(y\mid 1/2) > k$ is UMP of its size. By Exercise 8.25c, the ratio $f(y\mid 1/2)/f(y\mid 1/4)$ is increasing in $y$. So the ratio $f(y\mid 1/4)/f(y\mid 1/2)$ is decreasing in $y$, and rejecting for large values of the ratio is equivalent to rejecting for small values of $y$. To get $\alpha = .0547$, we must find $c$ such that $P(Y \le c\mid p=1/2) = .0547$. Trying values $c = 0, 1, \ldots$, we find that for $c = 2$, $P(Y \le 2\mid p=1/2) = .0547$. So the test that rejects if $Y \le 2$ is the UMP size $\alpha = .0547$ test. The power of the test is $P(Y \le 2\mid p=1/4) \approx .526$.

b. The size of the test is
$$ P(Y \ge 6\mid p = 1/2) = \sum_{k=6}^{10}\binom{10}{k}\left(\frac{1}{2}\right)^{k}\left(\frac{1}{2}\right)^{10-k} \approx .377. $$
The power function is $\beta(\theta) = \sum_{k=6}^{10}\binom{10}{k}\theta^k(1-\theta)^{10-k}$.

c. There is a nonrandomized UMP test for all $\alpha$ levels corresponding to the probabilities $P(Y \le i\mid p=1/2)$, where $i$ is an integer. For $n = 10$, $\alpha$ can have any of the values $0$, $\frac{1}{1024}$, $\frac{11}{1024}$, $\frac{56}{1024}$, $\frac{176}{1024}$, $\frac{386}{1024}$, $\frac{638}{1024}$, $\frac{848}{1024}$, $\frac{968}{1024}$, $\frac{1013}{1024}$, $\frac{1023}{1024}$, and $1$.
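The binomial size and power figures in part (a) are easy to verify with any statistics package; a minimal Python check (the manual itself would use R's `pbinom`) is:

```python
from scipy.stats import binom

size  = binom.cdf(2, 10, 0.5)   # P(Y <= 2 | p = 1/2)
power = binom.cdf(2, 10, 0.25)  # P(Y <= 2 | p = 1/4)
print(round(size, 4), round(power, 3))  # 0.0547 0.526
```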
8.23 a. The test is: reject $H_0$ if $X > 1/2$. So the power function is
$$ \beta(\theta) = P_\theta(X > 1/2) = \int_{1/2}^{1}\frac{\Gamma(\theta+1)}{\Gamma(\theta)\Gamma(1)}\,x^{\theta-1}(1-x)^{1-1}\,dx = \theta\,\frac{x^\theta}{\theta}\bigg|_{1/2}^{1} = 1 - \left(\frac{1}{2}\right)^{\theta}. $$
The size is $\sup_{\theta\in H_0}\beta(\theta) = \sup_{\theta\le 1}\left(1 - (1/2)^\theta\right) = 1 - 1/2 = 1/2$.

b. By the Neyman-Pearson Lemma, the most powerful test of $H_0:\theta=1$ vs. $H_1:\theta=2$ is: reject $H_0$ if $f(x\mid 2)/f(x\mid 1) > k$ for some $k \ge 0$. Substituting the beta pdf gives
$$ \frac{f(x\mid 2)}{f(x\mid 1)} = \frac{\frac{1}{B(2,1)}x^{2-1}(1-x)^{1-1}}{\frac{1}{B(1,1)}x^{1-1}(1-x)^{1-1}} = \frac{\Gamma(3)}{\Gamma(2)\Gamma(1)}\,x = 2x. $$
Thus, the MP test is: reject $H_0$ if $X > k/2$. We now use the $\alpha$ level to determine $k$. We have
$$ \alpha = \sup_{\theta\in\Theta_0}\beta(\theta) = \beta(1) = \int_{k/2}^{1} f_X(x\mid 1)\,dx = \int_{k/2}^{1}\frac{1}{B(1,1)}x^{1-1}(1-x)^{1-1}\,dx = 1 - \frac{k}{2}. $$
Thus $1 - k/2 = \alpha$, so the most powerful $\alpha$ level test is: reject $H_0$ if $X > 1-\alpha$.

c. For $\theta_2 > \theta_1$, $f(x\mid\theta_2)/f(x\mid\theta_1) = (\theta_2/\theta_1)x^{\theta_2-\theta_1}$, an increasing function of $x$ because $\theta_2 > \theta_1$. So this family has MLR. By the Karlin-Rubin Theorem, the test that rejects $H_0$ if $X > t$ is the UMP test of its size. By the argument in part (b), use $t = 1-\alpha$ to get size $\alpha$.
8.24 For $H_0:\theta=\theta_0$ vs. $H_1:\theta=\theta_1$, the LRT statistic is
$$ \lambda(\mathbf{x}) = \frac{L(\theta_0\mid\mathbf{x})}{\max\{L(\theta_0\mid\mathbf{x}),\,L(\theta_1\mid\mathbf{x})\}} = \begin{cases} 1 & \text{if } L(\theta_0\mid\mathbf{x}) \ge L(\theta_1\mid\mathbf{x}) \\ L(\theta_0\mid\mathbf{x})/L(\theta_1\mid\mathbf{x}) & \text{if } L(\theta_0\mid\mathbf{x}) < L(\theta_1\mid\mathbf{x}). \end{cases} $$
The LRT rejects $H_0$ if $\lambda(\mathbf{x}) < c$. The Neyman-Pearson test rejects $H_0$ if $f(\mathbf{x}\mid\theta_1)/f(\mathbf{x}\mid\theta_0) = L(\theta_1\mid\mathbf{x})/L(\theta_0\mid\mathbf{x}) > k$. If $k = 1/c > 1$, this is equivalent to $L(\theta_0\mid\mathbf{x})/L(\theta_1\mid\mathbf{x}) < c$, the LRT. But if $c \ge 1$ or $k \le 1$, the tests will not be the same. Because $c$ is usually chosen to be small ($k$ large) to get a small size $\alpha$, in practice the two tests are often the same.
8.25 a. For $\theta_2 > \theta_1$,
$$ \frac{g(x\mid\theta_2)}{g(x\mid\theta_1)} = \frac{e^{-(x-\theta_2)^2/2\sigma^2}}{e^{-(x-\theta_1)^2/2\sigma^2}} = e^{x(\theta_2-\theta_1)/\sigma^2}\,e^{(\theta_1^2-\theta_2^2)/2\sigma^2}. $$
Because $\theta_2 - \theta_1 > 0$, the ratio is increasing in $x$. So the families of n$(\theta,\sigma^2)$ have MLR.

b. For $\theta_2 > \theta_1$,
$$ \frac{g(x\mid\theta_2)}{g(x\mid\theta_1)} = \frac{e^{-\theta_2}\theta_2^x/x!}{e^{-\theta_1}\theta_1^x/x!} = \left(\frac{\theta_2}{\theta_1}\right)^{x} e^{\theta_1-\theta_2}, $$
which is increasing in $x$ because $\theta_2/\theta_1 > 1$. Thus the Poisson$(\theta)$ family has an MLR.

c. For $\theta_2 > \theta_1$,
$$ \frac{g(x\mid\theta_2)}{g(x\mid\theta_1)} = \frac{\binom{n}{x}\theta_2^x(1-\theta_2)^{n-x}}{\binom{n}{x}\theta_1^x(1-\theta_1)^{n-x}} = \left(\frac{\theta_2(1-\theta_1)}{\theta_1(1-\theta_2)}\right)^{x}\left(\frac{1-\theta_2}{1-\theta_1}\right)^{n}. $$
Both $\theta_2/\theta_1 > 1$ and $(1-\theta_1)/(1-\theta_2) > 1$. Thus the ratio is increasing in $x$, and the family has MLR.

(Note: You can also use the fact that an exponential family $h(x)c(\theta)\exp(w(\theta)x)$ has MLR if $w(\theta)$ is increasing in $\theta$ (Exercise 8.27). For example, the Poisson$(\theta)$ pmf is $e^{-\theta}\exp(x\log\theta)/x!$, and the family has MLR because $\log\theta$ is increasing in $\theta$.)


8.26 a. We will prove the result for continuous distributions. But it is also true for discrete MLR families. For $\theta_1 > \theta_2$, we must show $F(x\mid\theta_1) \le F(x\mid\theta_2)$. Now
$$ \frac{d}{dx}\left[F(x\mid\theta_1) - F(x\mid\theta_2)\right] = f(x\mid\theta_1) - f(x\mid\theta_2) = f(x\mid\theta_2)\left[\frac{f(x\mid\theta_1)}{f(x\mid\theta_2)} - 1\right]. $$
Because $f$ has MLR, the ratio on the right-hand side is increasing, so the derivative can only change sign from negative to positive, showing that any interior extremum is a minimum. Thus the function in square brackets is maximized by its value at $\infty$ or $-\infty$, which is zero.

b. From Exercise 3.42, location families are stochastically increasing in their location parameter, so the location Cauchy family with pdf $f(x\mid\theta) = \left(\pi\left[1+(x-\theta)^2\right]\right)^{-1}$ is stochastically increasing. The family does not have MLR.

8.27 For $\theta_2 > \theta_1$,
$$ \frac{g(t\mid\theta_2)}{g(t\mid\theta_1)} = \frac{c(\theta_2)}{c(\theta_1)}\,e^{[w(\theta_2)-w(\theta_1)]t}, $$
which is increasing in $t$ because $w(\theta_2) - w(\theta_1) > 0$. Examples include n$(\theta,1)$, beta$(\theta,1)$, and Bernoulli$(\theta)$.
8.28 a. For $\theta_2 > \theta_1$, the likelihood ratio is
$$ \frac{f(x\mid\theta_2)}{f(x\mid\theta_1)} = e^{\theta_1-\theta_2}\left[\frac{1+e^{x-\theta_1}}{1+e^{x-\theta_2}}\right]^{2}. $$
The derivative of the quantity in brackets is
$$ \frac{d}{dx}\,\frac{1+e^{x-\theta_1}}{1+e^{x-\theta_2}} = \frac{e^{x-\theta_1} - e^{x-\theta_2}}{\left(1+e^{x-\theta_2}\right)^{2}}. $$
Because $\theta_2 > \theta_1$, $e^{x-\theta_1} > e^{x-\theta_2}$, and, hence, the ratio is increasing. This family has MLR.

b. The best test is to reject $H_0$ if $f(x\mid 1)/f(x\mid 0) > k$. From part (a), this ratio is increasing in $x$. Thus this inequality is equivalent to rejecting if $x > k'$. The cdf of this logistic is $F(x\mid\theta) = e^{x-\theta}/(1+e^{x-\theta})$. Thus
$$ \alpha = 1 - F(k'\mid 0) = \frac{1}{1+e^{k'}} \quad\text{and}\quad \beta = F(k'\mid 1) = \frac{e^{k'-1}}{1+e^{k'-1}}. $$
For a specified $\alpha$, $k' = \log\left((1-\alpha)/\alpha\right)$. So for $\alpha = .2$, $k' \approx 1.386$ and $\beta \approx .595$.

c. The Karlin-Rubin Theorem is satisfied, so the test is UMP of its size.
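The closed-form cutoff and Type II error in part (b) take only two lines to confirm numerically; a Python sketch (the manual's own numeric examples use R) is:

```python
import math

alpha = 0.2
k = math.log((1 - alpha) / alpha)                 # cutoff: reject H0 if x > k
beta = math.exp(k - 1) / (1 + math.exp(k - 1))    # Type II error = F(k | theta = 1)
print(round(k, 3), round(beta, 3))                # 1.386 0.595
```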
8.29 a. Let $\theta_2 > \theta_1$. Then
$$ \frac{f(x\mid\theta_2)}{f(x\mid\theta_1)} = \frac{1+(x-\theta_1)^2}{1+(x-\theta_2)^2} = \frac{1 + (1+\theta_1^2)/x^2 - 2\theta_1/x}{1 + (1+\theta_2^2)/x^2 - 2\theta_2/x}. $$
The limit of this ratio as $x\to\infty$ or as $x\to-\infty$ is $1$. So the ratio cannot be monotone increasing (or decreasing) between $-\infty$ and $\infty$. Thus, the family does not have MLR.

b. By the Neyman-Pearson Lemma, a test will be UMP if it rejects when $f(x\mid 1)/f(x\mid 0) > k$, for some constant $k$. Examination of the derivative shows that $f(x\mid 1)/f(x\mid 0)$ is decreasing for $x \le (1-\sqrt 5)/2 = -.618$, is increasing for $(1-\sqrt 5)/2 \le x \le (1+\sqrt 5)/2 = 1.618$, and is decreasing for $(1+\sqrt 5)/2 \le x$. Furthermore, $f(1\mid 1)/f(1\mid 0) = f(3\mid 1)/f(3\mid 0) = 2$. So rejecting if $f(x\mid 1)/f(x\mid 0) > 2$ is equivalent to rejecting if $1 < x < 3$. Thus, the given test is UMP of its size. The size of the test is
$$ P(1 < X < 3\mid\theta=0) = \int_{1}^{3}\frac{1}{\pi}\,\frac{1}{1+x^2}\,dx = \frac{1}{\pi}\arctan x\Big|_{1}^{3} \approx .1476. $$
The Type II error probability is
$$ 1 - P(1 < X < 3\mid\theta=1) = 1 - \int_{1}^{3}\frac{1}{\pi}\,\frac{1}{1+(x-1)^2}\,dx = 1 - \frac{1}{\pi}\arctan(x-1)\Big|_{1}^{3} \approx .6476. $$

c. We will not have $f(1\mid\theta)/f(1\mid 0) = f(3\mid\theta)/f(3\mid 0)$ for any other value of $\theta \ne 1$. Try $\theta = 2$, for example. So the rejection region $1 < x < 3$ will not be most powerful at any other value of $\theta$. The test is not UMP for testing $H_0:\theta\le 0$ versus $H_1:\theta>0$.
8.30 a. For $\theta_2 > \theta_1 > 0$, the likelihood ratio and its derivative are
$$ \frac{f(x\mid\theta_2)}{f(x\mid\theta_1)} = \frac{\theta_2}{\theta_1}\,\frac{\theta_1^2+x^2}{\theta_2^2+x^2} \quad\text{and}\quad \frac{d}{dx}\,\frac{f(x\mid\theta_2)}{f(x\mid\theta_1)} = \frac{\theta_2}{\theta_1}\,\frac{2\left(\theta_2^2-\theta_1^2\right)}{\left(\theta_2^2+x^2\right)^2}\,x. $$
The sign of the derivative is the same as the sign of $x$ (recall, $\theta_2^2-\theta_1^2 > 0$), which changes sign. Hence the ratio is not monotone.

b. Because $f(x\mid\theta) = (\theta/\pi)\left(\theta^2+|x|^2\right)^{-1}$, $Y = |X|$ is sufficient. Its pdf is
$$ f(y\mid\theta) = \frac{2\theta}{\pi}\,\frac{1}{\theta^2+y^2}, \quad y > 0. $$
Differentiating as above, the sign of the derivative is the same as the sign of $y$, which is positive. Hence the family has MLR.
8.31 a. By the Karlin-Rubin Theorem, the UMP test is to reject $H_0$ if $\sum_i X_i > k$, because $\sum_i X_i$ is sufficient and $\sum_i X_i \sim$ Poisson$(n\lambda)$, which has MLR. Choose the constant $k$ to satisfy $P\left(\sum_i X_i > k\mid\lambda=\lambda_0\right) = \alpha$.

b.
$$ P\left(\sum_i X_i > k \,\Big|\, \lambda=1\right) \approx P\left(Z > \frac{k-n}{\sqrt n}\right) \stackrel{\text{set}}{=} .05, \qquad P\left(\sum_i X_i > k \,\Big|\, \lambda=2\right) \approx P\left(Z > \frac{k-2n}{\sqrt{2n}}\right) \stackrel{\text{set}}{=} .90. $$
Thus, solve for $k$ and $n$ in
$$ \frac{k-n}{\sqrt n} = 1.645 \quad\text{and}\quad \frac{k-2n}{\sqrt{2n}} = -1.28, $$
yielding $n = 12$ and $k = 17.70$.
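The pair of equations above reduces to one equation in $\sqrt n$; a quick Python check of the arithmetic (offered as a sketch, not part of the manual's solution) is:

```python
import math

# Subtracting  k - n = 1.645*sqrt(n)  from  k - 2n = -1.28*sqrt(2n)
# gives  n = 1.645*sqrt(n) + 1.28*sqrt(2)*sqrt(n).
sqrt_n = 1.645 + 1.28 * math.sqrt(2)
n = math.ceil(sqrt_n ** 2)           # 12
k = n + 1.645 * math.sqrt(n)         # about 17.70
print(n, round(k, 2))                # 12 17.7
```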
8.32 a. This is Example 8.3.15.
b. This is Example 8.3.19.
8.33 a. From Theorems 5.4.4 and 5.4.6, the marginal pdf of $Y_1$ and the joint pdf of $(Y_1, Y_n)$ are
$$ f(y_1\mid\theta) = n\left(1-(y_1-\theta)\right)^{n-1}, \quad \theta < y_1 < \theta+1, $$
$$ f(y_1,y_n\mid\theta) = n(n-1)(y_n-y_1)^{n-2}, \quad \theta < y_1 < y_n < \theta+1. $$
Under $H_0$, $P(Y_n \ge 1) = 0$. So
$$ \alpha = P(Y_1 \ge k\mid 0) = \int_{k}^{1} n(1-y_1)^{n-1}\,dy_1 = (1-k)^n. $$
Thus, use $k = 1 - \alpha^{1/n}$ to have a size $\alpha$ test.

b. For $\theta \le k-1$, $\beta(\theta) = 0$. For $k-1 < \theta \le 0$,
$$ \beta(\theta) = \int_{k}^{\theta+1} n\left(1-(y_1-\theta)\right)^{n-1}\,dy_1 = (1-k+\theta)^n. $$
For $0 < \theta \le k$,
$$ \beta(\theta) = \int_{k}^{\theta+1} n\left(1-(y_1-\theta)\right)^{n-1}\,dy_1 + \int_{\theta}^{k}\int_{1}^{\theta+1} n(n-1)(y_n-y_1)^{n-2}\,dy_n\,dy_1 = \alpha + 1 - (1-\theta)^n. $$
And for $k < \theta$, $\beta(\theta) = 1$.

c. $(Y_1, Y_n)$ are sufficient statistics. So we can attempt to find a UMP test using Corollary 8.3.13 and the joint pdf $f(y_1,y_n\mid\theta)$ in part (a). For $0 < \theta < 1$, the ratio of pdfs is
$$ \frac{f(y_1,y_n\mid\theta)}{f(y_1,y_n\mid 0)} = \begin{cases} 0 & \text{if } 0 < y_1 \le \theta,\ y_1 < y_n < 1 \\ 1 & \text{if } \theta < y_1 < y_n < 1 \\ \infty & \text{if } 1 \le y_n < \theta+1,\ \theta < y_1 < y_n. \end{cases} $$
For $1 \le \theta$, the ratio of pdfs is
$$ \frac{f(y_1,y_n\mid\theta)}{f(y_1,y_n\mid 0)} = \begin{cases} 0 & \text{if } y_1 < y_n < 1 \\ \infty & \text{if } \theta < y_1 < y_n < \theta+1. \end{cases} $$
For $0 < \theta < 1$, use $k' = 1$. The given test always rejects if $f(y_1,y_n\mid\theta)/f(y_1,y_n\mid 0) > 1$ and always accepts if $f(y_1,y_n\mid\theta)/f(y_1,y_n\mid 0) < 1$. For $\theta \ge 1$, use $k' = 0$. The given test always rejects if $f(y_1,y_n\mid\theta)/f(y_1,y_n\mid 0) > 0$ and always accepts if $f(y_1,y_n\mid\theta)/f(y_1,y_n\mid 0) < 0$. Thus the given test is UMP by Corollary 8.3.13.

d. According to the power function in part (b), $\beta(\theta) = 1$ for all $\theta \ge k = 1-\alpha^{1/n}$. So these conditions are satisfied for any $n$.
8.34 a. This is Exercise 3.42a.
b. This is Exercise 8.26a.
8.35 a. We will use the equality in Exercise 3.17, which remains true so long as $\nu > -\alpha$. Recall that $Y \sim \chi^2_\nu =$ gamma$(\nu/2, 2)$. Thus, using the independence of $X$ and $Y$ we have
$$ \mathrm{E}\,T = \mathrm{E}\left(\frac{X}{\sqrt{Y/\nu}}\right) = (\mathrm{E}\,X)\,\sqrt\nu\;\mathrm{E}\,Y^{-1/2} = \mu\sqrt\nu\,\frac{\Gamma\left((\nu-1)/2\right)}{\sqrt 2\,\Gamma(\nu/2)} $$
if $\nu > 1$. To calculate the variance, compute
$$ \mathrm{E}\,T^2 = \mathrm{E}\left(\frac{X^2}{Y/\nu}\right) = \left(\mathrm{E}\,X^2\right)\nu\;\mathrm{E}\,Y^{-1} = (\mu^2+1)\nu\,\frac{\Gamma\left((\nu-2)/2\right)}{2\,\Gamma(\nu/2)} = \frac{(\mu^2+1)\nu}{\nu-2} $$
if $\nu > 2$. Thus, if $\nu > 2$,
$$ \mathrm{Var}\,T = \frac{(\mu^2+1)\nu}{\nu-2} - \left(\mu\sqrt\nu\,\frac{\Gamma\left((\nu-1)/2\right)}{\sqrt 2\,\Gamma(\nu/2)}\right)^{2}. $$

b. If $\delta = 0$, all the terms in the sum for $k = 1, 2, \ldots$ are zero because of the $\delta^k$ term. The expression with just the $k = 0$ term and $\delta = 0$ simplifies to the central $t$ pdf.

c. The argument that the noncentral $t$ has an MLR is fairly involved. It may be found in Lehmann (1986, p. 295).




8.37 a. $P\left(\bar X > \theta_0 + z_\alpha\sigma/\sqrt n \mid \theta_0\right) = P\left((\bar X-\theta_0)/(\sigma/\sqrt n) > z_\alpha \mid \theta_0\right) = P(Z > z_\alpha) = \alpha$, where $Z \sim$ n$(0,1)$. Because $\bar x$ is the unrestricted MLE, and the restricted MLE is $\theta_0$ if $\bar x > \theta_0$, the LRT statistic is, for $\bar x \ge \theta_0$,
$$ \lambda(\mathbf{x}) = \frac{(2\pi\sigma^2)^{-n/2}\,e^{-\Sigma_i(x_i-\theta_0)^2/2\sigma^2}}{(2\pi\sigma^2)^{-n/2}\,e^{-\Sigma_i(x_i-\bar x)^2/2\sigma^2}} = \frac{e^{-\left[n(\bar x-\theta_0)^2+(n-1)s^2\right]/2\sigma^2}}{e^{-(n-1)s^2/2\sigma^2}} = e^{-n(\bar x-\theta_0)^2/2\sigma^2}, $$
and the LRT statistic is $1$ for $\bar x < \theta_0$. Thus, rejecting if $\lambda < c$ is equivalent to rejecting if $(\bar x-\theta_0)/(\sigma/\sqrt n) > c'$ (as long as $c < 1$; see Exercise 8.24).

b. The test is UMP by the Karlin-Rubin Theorem.

c. $P\left(\bar X > \theta_0 + t_{n-1,\alpha}S/\sqrt n \mid \theta=\theta_0\right) = P\left(T_{n-1} > t_{n-1,\alpha}\right) = \alpha$, where $T_{n-1}$ is a Student's $t$ random variable with $n-1$ degrees of freedom. If we define $\hat\sigma^2 = \frac{1}{n}\sum(x_i-\bar x)^2$ and $\hat\sigma_0^2 = \frac{1}{n}\sum(x_i-\theta_0)^2$, then for $\bar x \ge \theta_0$ the LRT statistic is $\lambda = (\hat\sigma^2/\hat\sigma_0^2)^{n/2}$, and for $\bar x < \theta_0$ the LRT statistic is $\lambda = 1$. Writing $\hat\sigma^2 = \frac{n-1}{n}s^2$ and $\hat\sigma_0^2 = (\bar x-\theta_0)^2 + \frac{n-1}{n}s^2$, it is clear that the LRT is equivalent to the $t$-test because $\lambda < c$ when
$$ \frac{\frac{n-1}{n}s^2}{(\bar x-\theta_0)^2 + \frac{n-1}{n}s^2} = \frac{(n-1)/n}{(\bar x-\theta_0)^2/s^2 + (n-1)/n} < c^{2/n} \quad\text{and}\quad \bar x \ge \theta_0, $$
which is the same as rejecting when $(\bar x-\theta_0)/(s/\sqrt n)$ is large.

d. The proof that the one-sided $t$ test is UMP unbiased is rather involved, using the bounded completeness of the normal distribution and other facts. See Chapter 5 of Lehmann (1986) for a complete treatment.
8.38 a.
$$ \text{Size} = P_{\theta_0}\left(|\bar X - \theta_0| > t_{n-1,\alpha/2}\sqrt{S^2/n}\right) = 1 - P_{\theta_0}\left(-t_{n-1,\alpha/2}\sqrt{S^2/n} \le \bar X - \theta_0 \le t_{n-1,\alpha/2}\sqrt{S^2/n}\right) $$
$$ = 1 - P_{\theta_0}\left(-t_{n-1,\alpha/2} \le \frac{\bar X-\theta_0}{\sqrt{S^2/n}} \le t_{n-1,\alpha/2}\right) = 1 - (1-\alpha) = \alpha, $$
because $(\bar X-\theta_0)/\sqrt{S^2/n} \sim t_{n-1}$ under $H_0$.

b. The unrestricted MLEs are $\hat\theta = \bar X$ and $\hat\sigma^2 = \sum_i(X_i-\bar X)^2/n$. The restricted MLEs are $\hat\theta_0 = \theta_0$ and $\hat\sigma_0^2 = \sum_i(X_i-\theta_0)^2/n$. So the LRT statistic is
$$ \lambda(\mathbf{x}) = \frac{(2\pi\hat\sigma_0^2)^{-n/2}\exp\{-n\hat\sigma_0^2/(2\hat\sigma_0^2)\}}{(2\pi\hat\sigma^2)^{-n/2}\exp\{-n\hat\sigma^2/(2\hat\sigma^2)\}} = \left(\frac{\sum_i(x_i-\bar x)^2}{\sum_i(x_i-\theta_0)^2}\right)^{n/2} = \left(\frac{\sum_i(x_i-\bar x)^2}{\sum_i(x_i-\bar x)^2 + n(\bar x-\theta_0)^2}\right)^{n/2}. $$
For a constant $c$, the LRT is: reject $H_0$ if
$$ \frac{\sum_i(x_i-\bar x)^2}{\sum_i(x_i-\bar x)^2 + n(\bar x-\theta_0)^2} = \frac{1}{1 + n(\bar x-\theta_0)^2\big/\sum_i(x_i-\bar x)^2} < c^{2/n}. $$
After some algebra we can write the test as: reject $H_0$ if
$$ |\bar x - \theta_0| > \left[\left(c^{-2/n}-1\right)(n-1)\right]^{1/2}\sqrt{\frac{s^2}{n}}. $$
We now choose the constant $c$ to achieve size $\alpha$, and we reject if $|\bar x-\theta_0| > t_{n-1,\alpha/2}\sqrt{s^2/n}$.

c. Again, see Chapter 5 of Lehmann (1986).
8.39 a. From Exercise 4.45c, $W_i = X_i - Y_i \sim$ n$(\mu_W, \sigma_W^2)$, where $\mu_W = \mu_X - \mu_Y$ and $\sigma_W^2 = \sigma_X^2 + \sigma_Y^2 - 2\rho\sigma_X\sigma_Y$. The $W_i$'s are independent because the pairs $(X_i, Y_i)$ are.

b. The hypotheses are equivalent to $H_0:\mu_W = 0$ vs. $H_1:\mu_W \ne 0$, and, from Exercise 8.38, if we reject $H_0$ when $|\bar W| > t_{n-1,\alpha/2}\sqrt{S_W^2/n}$, this is the LRT (based on $W_1,\ldots,W_n$) of size $\alpha$. (Note that if $\rho > 0$, $\mathrm{Var}\,W_i$ can be small and the test will have good power.)
8.41 a.
$$ \lambda(\mathbf{x},\mathbf{y}) = \frac{\sup_{H_0} L(\mu_X,\mu_Y,\sigma^2\mid\mathbf{x},\mathbf{y})}{\sup L(\mu_X,\mu_Y,\sigma^2\mid\mathbf{x},\mathbf{y})} = \frac{L(\hat\mu,\hat\sigma_0^2\mid\mathbf{x},\mathbf{y})}{L(\hat\mu_X,\hat\mu_Y,\hat\sigma^2\mid\mathbf{x},\mathbf{y})}. $$
Under $H_0$, the $X_i$'s and $Y_i$'s are one sample of size $m+n$ from a n$(\mu,\sigma^2)$ population, where $\mu = \mu_X = \mu_Y$. So the restricted MLEs are
$$ \hat\mu = \frac{\sum_i X_i + \sum_i Y_i}{n+m} = \frac{n\bar x + m\bar y}{n+m} \quad\text{and}\quad \hat\sigma_0^2 = \frac{\sum_i(X_i-\hat\mu)^2 + \sum_i(Y_i-\hat\mu)^2}{n+m}. $$
To obtain the unrestricted MLEs, $\hat\mu_X$, $\hat\mu_Y$, $\hat\sigma^2$, use
$$ L(\mu_X,\mu_Y,\sigma^2\mid\mathbf{x},\mathbf{y}) = (2\pi\sigma^2)^{-(n+m)/2}\,e^{-\left[\Sigma_i(x_i-\mu_X)^2 + \Sigma_i(y_i-\mu_Y)^2\right]/2\sigma^2}. $$
First, note that $\hat\mu_X = \bar x$ and $\hat\mu_Y = \bar y$, because maximizing over $\mu_X$ does not involve $\mu_Y$ and vice versa. Then
$$ \frac{\partial\log L}{\partial\sigma^2} = -\frac{n+m}{2}\,\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2}\left[\sum_i(x_i-\hat\mu_X)^2 + \sum_i(y_i-\hat\mu_Y)^2\right] \stackrel{\text{set}}{=} 0 $$
implies
$$ \hat\sigma^2 = \left[\sum_{i=1}^{n}(x_i-\bar x)^2 + \sum_{i=1}^{m}(y_i-\bar y)^2\right]\frac{1}{n+m}. $$
To check that this is a maximum,
$$ \left.\frac{\partial^2\log L}{\partial(\sigma^2)^2}\right|_{\sigma^2=\hat\sigma^2} = \frac{n+m}{2}\,\frac{1}{(\hat\sigma^2)^2} - \frac{1}{(\hat\sigma^2)^3}\left[\sum_i(x_i-\hat\mu_X)^2 + \sum_i(y_i-\hat\mu_Y)^2\right] = \frac{n+m}{2}\,\frac{1}{(\hat\sigma^2)^2} - (n+m)\frac{1}{(\hat\sigma^2)^2} = -\frac{n+m}{2}\,\frac{1}{(\hat\sigma^2)^2} < 0. $$
Thus, it is a maximum. We then have
$$ \lambda(\mathbf{x},\mathbf{y}) = \frac{(2\pi\hat\sigma_0^2)^{-\frac{n+m}{2}}\exp\left\{-\frac{1}{2\hat\sigma_0^2}\left[\sum_{i=1}^{n}(x_i-\hat\mu)^2 + \sum_{i=1}^{m}(y_i-\hat\mu)^2\right]\right\}}{(2\pi\hat\sigma^2)^{-\frac{n+m}{2}}\exp\left\{-\frac{1}{2\hat\sigma^2}\left[\sum_{i=1}^{n}(x_i-\bar x)^2 + \sum_{i=1}^{m}(y_i-\bar y)^2\right]\right\}} = \left(\frac{\hat\sigma_0^2}{\hat\sigma^2}\right)^{-\frac{n+m}{2}}, $$
and the LRT rejects $H_0$ if $\hat\sigma_0^2/\hat\sigma^2 > k$. In the numerator, first substitute $\hat\mu = (n\bar x+m\bar y)/(n+m)$ and write
$$ \sum_{i=1}^{n}\left(x_i - \frac{n\bar x+m\bar y}{n+m}\right)^2 = \sum_{i=1}^{n}\left[(x_i-\bar x) + \left(\bar x - \frac{n\bar x+m\bar y}{n+m}\right)\right]^2 = \sum_{i=1}^{n}(x_i-\bar x)^2 + \frac{nm^2}{(n+m)^2}(\bar x-\bar y)^2, $$
because the cross term is zero. Performing a similar operation on the $Y$ sum yields
$$ (n+m)\,\frac{\hat\sigma_0^2}{\hat\sigma^2} = \frac{\sum_i(x_i-\bar x)^2 + \sum_i(y_i-\bar y)^2 + \frac{nm}{n+m}(\bar x-\bar y)^2}{\hat\sigma^2} = n+m+\frac{nm}{n+m}\,\frac{(\bar x-\bar y)^2}{\hat\sigma^2}. $$
Because $\hat\sigma^2 = \frac{n+m-2}{n+m}S_p^2$, large values of $\hat\sigma_0^2/\hat\sigma^2$ are equivalent to large values of $(\bar x-\bar y)^2/S_p^2$, and large values of $|T|$. Hence, the LRT is the two-sample $t$-test.

b.
$$ T = \frac{\bar X - \bar Y}{\sqrt{S_p^2\left(1/n+1/m\right)}} = \frac{(\bar X-\bar Y)\Big/\sqrt{\sigma^2(1/n+1/m)}}{\sqrt{\left[(n+m-2)S_p^2/\sigma^2\right]\big/(n+m-2)}}. $$
Under $H_0$, $\bar X-\bar Y \sim$ n$\left(0, \sigma^2(1/n+1/m)\right)$. Under the model, $(n-1)S_X^2/\sigma^2$ and $(m-1)S_Y^2/\sigma^2$ are independent $\chi^2$ random variables with $(n-1)$ and $(m-1)$ degrees of freedom. Thus, $(n+m-2)S_p^2/\sigma^2 = (n-1)S_X^2/\sigma^2 + (m-1)S_Y^2/\sigma^2 \sim \chi^2_{n+m-2}$. Furthermore, $\bar X-\bar Y$ is independent of $S_X^2$ and $S_Y^2$, and, hence, of $S_p^2$. So $T \sim t_{n+m-2}$.

c. The two-sample $t$ test is UMP unbiased, but the proof is rather involved. See Chapter 5 of Lehmann (1986).

d. For these data we have $n = 14$, $\bar X = 1249.86$, $S_X^2 = 591.36$, $m = 9$, $\bar Y = 1261.33$, $S_Y^2 = 176.00$,
and $S_p^2 = 433.13$. Therefore, $T = -1.29$, and comparing this to a $t_{21}$ distribution gives a $p$-value of $.21$. So there is no evidence that the mean age differs between the core and periphery.

8.42 a. The Satterthwaite approximation states that if $Y_i \sim \chi^2_{r_i}$, where the $Y_i$'s are independent, then
$$ \sum_i a_iY_i \stackrel{\text{approx}}{\sim} \frac{\chi^2_{\hat\nu}}{\hat\nu}, \quad\text{where}\quad \hat\nu = \frac{\left(\sum_i a_iY_i\right)^2}{\sum_i a_i^2Y_i^2/r_i}. $$
We have $Y_1 = (n-1)S_X^2/\sigma_X^2 \sim \chi^2_{n-1}$ and $Y_2 = (m-1)S_Y^2/\sigma_Y^2 \sim \chi^2_{m-1}$. Now define
$$ a_1 = \frac{\sigma_X^2}{n(n-1)\left[(\sigma_X^2/n)+(\sigma_Y^2/m)\right]} \quad\text{and}\quad a_2 = \frac{\sigma_Y^2}{m(m-1)\left[(\sigma_X^2/n)+(\sigma_Y^2/m)\right]}. $$
Then
$$ \sum_i a_iY_i = \frac{\sigma_X^2}{n(n-1)\left[(\sigma_X^2/n)+(\sigma_Y^2/m)\right]}\,\frac{(n-1)S_X^2}{\sigma_X^2} + \frac{\sigma_Y^2}{m(m-1)\left[(\sigma_X^2/n)+(\sigma_Y^2/m)\right]}\,\frac{(m-1)S_Y^2}{\sigma_Y^2} = \frac{S_X^2/n + S_Y^2/m}{\sigma_X^2/n + \sigma_Y^2/m} \stackrel{\text{approx}}{\sim} \frac{\chi^2_{\hat\nu}}{\hat\nu}, $$
where
$$ \hat\nu = \frac{\left(\frac{S_X^2/n+S_Y^2/m}{\sigma_X^2/n+\sigma_Y^2/m}\right)^2}{\frac{1}{n-1}\,\frac{S_X^4}{n^2\left(\sigma_X^2/n+\sigma_Y^2/m\right)^2} + \frac{1}{m-1}\,\frac{S_Y^4}{m^2\left(\sigma_X^2/n+\sigma_Y^2/m\right)^2}} = \frac{\left(S_X^2/n + S_Y^2/m\right)^2}{\frac{S_X^4}{n^2(n-1)} + \frac{S_Y^4}{m^2(m-1)}}. $$

b. Because $\bar X - \bar Y \sim$ n$\left(\mu_X-\mu_Y,\ \sigma_X^2/n+\sigma_Y^2/m\right)$ and $\mu_X-\mu_Y = 0$ under $H_0$, we have
$$ T = \frac{\bar X-\bar Y}{\sqrt{S_X^2/n + S_Y^2/m}} = \frac{(\bar X-\bar Y)\Big/\sqrt{\sigma_X^2/n+\sigma_Y^2/m}}{\sqrt{\dfrac{S_X^2/n+S_Y^2/m}{\sigma_X^2/n+\sigma_Y^2/m}}} \stackrel{\text{approx}}{\sim} \frac{\text{n}(0,1)}{\sqrt{\chi^2_{\hat\nu}/\hat\nu}} = t_{\hat\nu}, \quad\text{under } H_0. $$

c. Using the values in Exercise 8.41d, we obtain $T = -1.46$ and $\hat\nu = 20.64$. So the $p$-value is $.16$. There is no evidence that the mean age differs between the core and periphery.
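Both sets of summary-statistic calculations (the pooled $t$ of 8.41d and the Welch/Satterthwaite version of 8.42c) can be reproduced from the reported values. A Python sketch with scipy (an assumption of this check; the manual's own code is in R or Mathematica) is:

```python
from scipy.stats import t

n, xbar, sx2 = 14, 1249.86, 591.36
m, ybar, sy2 = 9, 1261.33, 176.00

# Pooled two-sample t (Exercise 8.41d)
sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)
T_pooled = (xbar - ybar) / (sp2 * (1 / n + 1 / m)) ** 0.5
p_pooled = 2 * t.sf(abs(T_pooled), n + m - 2)

# Welch/Satterthwaite version (Exercise 8.42c)
se2 = sx2 / n + sy2 / m
T_welch = (xbar - ybar) / se2 ** 0.5
nu = se2 ** 2 / (sx2 ** 2 / (n ** 2 * (n - 1)) + sy2 ** 2 / (m ** 2 * (m - 1)))
p_welch = 2 * t.sf(abs(T_welch), nu)

print(round(sp2, 2), round(T_pooled, 2), round(p_pooled, 2))  # 433.13 -1.29 0.21
print(round(T_welch, 2), round(nu, 2), round(p_welch, 2))     # -1.46 20.64 0.16
```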
d. $F = S_X^2/S_Y^2 = 3.36$. Comparing this with an $F_{13,8}$ distribution yields a $p$-value of $2P(F \ge 3.36) = .09$. So there is some slight evidence that the variance differs between the core and periphery.

8.43 There were typos in early printings. The $t$ statistic should be
$$ \frac{(\bar X-\bar Y) - (\mu_1-\mu_2)}{\sqrt{\left(\frac{1}{n_1}+\frac{\rho^2}{n_2}\right)\dfrac{(n_1-1)s_X^2 + (n_2-1)s_Y^2/\rho^2}{n_1+n_2-2}}}, $$
and the $F$ statistic should be $s_Y^2/(\rho^2 s_X^2)$. Multiply and divide the denominator of the $t$ statistic by $\sigma$ to express it as
$$ \frac{(\bar X-\bar Y)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma^2}{n_1}+\frac{\rho^2\sigma^2}{n_2}}} \quad\text{divided by}\quad \sqrt{\frac{(n_1-1)s_X^2/\sigma^2 + (n_2-1)s_Y^2/(\rho^2\sigma^2)}{n_1+n_2-2}}. $$
The numerator has a n$(0,1)$ distribution. In the denominator, $(n_1-1)s_X^2/\sigma^2 \sim \chi^2_{n_1-1}$ and $(n_2-1)s_Y^2/(\rho^2\sigma^2) \sim \chi^2_{n_2-1}$, and they are independent, so their sum has a $\chi^2_{n_1+n_2-2}$ distribution. Thus, the statistic has the form n$(0,1)\big/\sqrt{\chi^2_\nu/\nu}$ where $\nu = n_1+n_2-2$, and the numerator and denominator are independent because of the independence of sample means and variances in normal sampling. Thus the statistic has a $t_{n_1+n_2-2}$ distribution. The $F$ statistic can be written as
$$ \frac{s_Y^2}{\rho^2 s_X^2} = \frac{s_Y^2/(\rho^2\sigma^2)}{s_X^2/\sigma^2} = \frac{\left[(n_2-1)s_Y^2/(\rho^2\sigma^2)\right]\big/(n_2-1)}{\left[(n_1-1)s_X^2/\sigma^2\right]\big/(n_1-1)}, $$
which has the form $\left[\chi^2_{n_2-1}/(n_2-1)\right]\big/\left[\chi^2_{n_1-1}/(n_1-1)\right]$, which has an $F_{n_2-1,n_1-1}$ distribution. (Note: early printings had a typo with the numerator and denominator degrees of freedom switched.)
¯
8.44 Test 3 rejects $H_0:\theta=\theta_0$ in favor of $H_1:\theta\ne\theta_0$ if $\bar X > \theta_0 + z_{\alpha/2}\sigma/\sqrt n$ or $\bar X < \theta_0 - z_{\alpha/2}\sigma/\sqrt n$. Let $\Phi$ and $\phi$ denote the standard normal cdf and pdf, respectively. Because $\bar X \sim$ n$(\theta,\sigma^2/n)$, the power function of Test 3 is
$$ \beta(\theta) = P_\theta\left(\bar X < \theta_0 - z_{\alpha/2}\sigma/\sqrt n\right) + P_\theta\left(\bar X > \theta_0 + z_{\alpha/2}\sigma/\sqrt n\right) = \Phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} - z_{\alpha/2}\right) + 1 - \Phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} + z_{\alpha/2}\right), $$
and its derivative is
$$ \frac{d\beta(\theta)}{d\theta} = -\frac{\sqrt n}{\sigma}\,\phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} - z_{\alpha/2}\right) + \frac{\sqrt n}{\sigma}\,\phi\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} + z_{\alpha/2}\right). $$
Because $\phi$ is symmetric and unimodal about zero, this derivative will be zero only if
$$ -\left(\frac{\theta_0-\theta}{\sigma/\sqrt n} - z_{\alpha/2}\right) = \frac{\theta_0-\theta}{\sigma/\sqrt n} + z_{\alpha/2}, $$
that is, only if $\theta=\theta_0$. So, $\theta=\theta_0$ is the only possible local maximum or minimum of the power function. $\beta(\theta_0) = \alpha$ and $\lim_{\theta\to\pm\infty}\beta(\theta) = 1$. Thus, $\theta=\theta_0$ is the global minimum of $\beta(\theta)$, and, for any $\theta'\ne\theta_0$, $\beta(\theta') > \beta(\theta_0)$. That is, Test 3 is unbiased.


8.45 The verification of size $\alpha$ is the same computation as in Exercise 8.37a. Example 8.3.3 shows that the power function $\beta_m(\theta)$ for each of these tests is an increasing function. So for $\theta > \theta_0$, $\beta_m(\theta) > \beta_m(\theta_0) = \alpha$. Hence, the tests are all unbiased.

8.47 a. This is very similar to the argument for Exercise 8.41.

b. By an argument similar to part (a), this LRT rejects $H_0^+$ if
$$ T^+ = \frac{\bar X - \bar Y - \delta}{\sqrt{S_p^2\left(\frac{1}{n}+\frac{1}{m}\right)}} \le -t_{n+m-2,\alpha}. $$

c. Because $H_0$ is the union of $H_0^+$ and $H_0^-$, by the IUT method of Theorem 8.3.23 the test that rejects $H_0$ if the tests in parts (a) and (b) both reject is a level $\alpha$ test of $H_0$. That is, the test rejects $H_0$ if $T^+ \le -t_{n+m-2,\alpha}$ and $T^- \ge t_{n+m-2,\alpha}$.

d. Use Theorem 8.3.24. Consider parameter points with $\mu_X - \mu_Y = \delta$ and $\sigma \to 0$. For any $\sigma$, $P(T^+ \le -t_{n+m-2,\alpha}) = \alpha$. The power of the $T^-$ test is computed from the noncentral $t$ distribution with noncentrality parameter
$$ \frac{|\mu_X-\mu_Y-(-\delta)|}{\sigma\sqrt{(1/n)+(1/m)}} = \frac{2\delta}{\sigma\sqrt{(1/n)+(1/m)}}, $$
which converges to $\infty$ as $\sigma\to 0$. Thus, $P(T^- \ge t_{n+m-2,\alpha}) \to 1$ as $\sigma\to 0$. By Theorem 8.3.24, this IUT is a size $\alpha$ test of $H_0$.
8.49 a. The $p$-value is
$$ P\left(7 \text{ or more successes out of 10 Bernoulli trials} \,\Big|\, \theta = \tfrac12\right) = \binom{10}{7}\left(\tfrac12\right)^{7}\left(\tfrac12\right)^{3} + \binom{10}{8}\left(\tfrac12\right)^{8}\left(\tfrac12\right)^{2} + \binom{10}{9}\left(\tfrac12\right)^{9}\left(\tfrac12\right)^{1} + \binom{10}{10}\left(\tfrac12\right)^{10} = .171875. $$

b.
$$ p\text{-value} = P(X \ge 3\mid\lambda=1) = 1 - P(X < 3\mid\lambda=1) = 1 - \left[\frac{e^{-1}1^2}{2!} + \frac{e^{-1}1^1}{1!} + \frac{e^{-1}1^0}{0!}\right] \approx .0803. $$

c.
$$ p\text{-value} = P\left(\sum_i X_i \ge 9 \,\Big|\, 3\lambda = 3\right) = 1 - P(Y < 9\mid 3\lambda=3) = 1 - e^{-3}\left[\frac{3^8}{8!} + \frac{3^7}{7!} + \frac{3^6}{6!} + \frac{3^5}{5!} + \cdots + \frac{3^1}{1!} + \frac{3^0}{0!}\right] \approx .0038, $$
where $Y = \sum_{i=1}^{3} X_i \sim$ Poisson$(3\lambda)$.
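All three tail probabilities above are one-liners in any statistics package; a Python check with scipy (the manual would use R's `pbinom`/`ppois`) is:

```python
from scipy.stats import binom, poisson

p_a = binom.sf(6, 10, 0.5)   # P(X >= 7 | n = 10, theta = 1/2)
p_b = poisson.sf(2, 1)       # P(X >= 3 | lambda = 1)
p_c = poisson.sf(8, 3)       # P(Y >= 9 | 3*lambda = 3)
print(round(p_a, 6), round(p_b, 4), round(p_c, 4))  # 0.171875 0.0803 0.0038
```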
8.50 From Exercise 7.26,
$$ \pi(\theta\mid x) = \sqrt{\frac{n}{2\pi\sigma^2}}\;e^{-n(\theta-\delta_\pm(x))^2/(2\sigma^2)}, \quad\text{where}\quad \delta_\pm(x) = \bar x \pm \frac{\sigma^2}{na}, $$
and we use the "$+$" if $\theta > 0$ and the "$-$" if $\theta < 0$.

a. For $K > 0$,
$$ P(\theta > K\mid x, a) = \sqrt{\frac{n}{2\pi\sigma^2}}\int_{K}^{\infty} e^{-n(\theta-\delta_+(x))^2/(2\sigma^2)}\,d\theta = P\left(Z > \frac{\sqrt n}{\sigma}\left[K - \delta_+(x)\right]\right), $$
where $Z \sim$ n$(0,1)$.

b. As $a \to \infty$, $\delta_+(x) \to \bar x$, so $P(\theta > K) \to P\left(Z > \frac{\sqrt n}{\sigma}(K - \bar x)\right)$.

c. For $K = 0$, the answer in part (b) is $1 - (p\text{-value})$ for $H_0:\theta \le 0$.
8.51 If $\alpha < p(x)$,
$$ \sup_{\theta\in\Theta_0} P(W(\mathbf{X}) \ge c_\alpha) = \alpha < p(x) = \sup_{\theta\in\Theta_0} P(W(\mathbf{X}) \ge W(x)). $$
Thus $W(x) < c_\alpha$, and we could not reject $H_0$ at level $\alpha$ having observed $x$. On the other hand, if $\alpha \ge p(x)$,
$$ \sup_{\theta\in\Theta_0} P(W(\mathbf{X}) \ge c_\alpha) = \alpha \ge p(x) = \sup_{\theta\in\Theta_0} P(W(\mathbf{X}) \ge W(x)). $$
Either $W(x) \ge c_\alpha$, in which case we could reject $H_0$ at level $\alpha$ having observed $x$, or $W(x) < c_\alpha$. But, in the latter case, we could use $c'_\alpha = W(x)$ and have $\{\mathbf{x}': W(\mathbf{x}') \ge c'_\alpha\}$ define a size $\alpha$ rejection region. Then we could reject $H_0$ at level $\alpha$ having observed $x$.
8.53 a.
$$ P(-\infty < \theta < \infty) = \frac12 + \frac12\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\tau^2}}\,e^{-\theta^2/(2\tau^2)}\,d\theta = \frac12 + \frac12 = 1. $$

b. First calculate the posterior density. Because
$$ f(\bar x\mid\theta) = \frac{\sqrt n}{\sqrt{2\pi}\,\sigma}\,e^{-n(\bar x-\theta)^2/(2\sigma^2)}, $$
we can calculate the marginal density as
$$ m^\pi(\bar x) = \frac12 f(\bar x\mid 0) + \frac12\int_{-\infty}^{\infty} f(\bar x\mid\theta)\,\frac{1}{\sqrt{2\pi}\,\tau}\,e^{-\theta^2/(2\tau^2)}\,d\theta = \frac12\,\frac{\sqrt n}{\sqrt{2\pi}\,\sigma}\,e^{-n\bar x^2/(2\sigma^2)} + \frac12\,\frac{1}{\sqrt{2\pi\left[(\sigma^2/n)+\tau^2\right]}}\,e^{-\bar x^2/\left[2\left((\sigma^2/n)+\tau^2\right)\right]} $$
(see Exercise 7.22). Then $P(\theta = 0\mid\bar x) = \frac12 f(\bar x\mid 0)\big/m^\pi(\bar x)$.

c.
$$ P\left(|\bar X| > |\bar x| \,\Big|\, \theta = 0\right) = 1 - P\left(-|\bar x| \le \bar X \le |\bar x| \,\Big|\, \theta = 0\right) = 2\left[1 - \Phi\left(|\bar x|/(\sigma/\sqrt n)\right)\right], $$
where $\Phi$ is the standard normal cdf.

d. For $\sigma^2 = \tau^2 = 1$ and $n = 9$ we have a $p$-value of $2\left(1-\Phi(3|\bar x|)\right)$ and
$$ P(\theta = 0\mid\bar x) = \left[1 + \frac{1}{\sqrt{10}}\,e^{81\bar x^2/20}\right]^{-1}. $$
The $p$-value of $\bar x$ is usually smaller than the Bayes posterior probability, except when $\bar x$ is very close to the $\theta$ value specified by $H_0$. The following table illustrates this.

Some p-values and posterior probabilities (n = 9)

x̄                      0       ±.1     ±.15    ±.2     ±.5     ±.6533   ±.7     ±1      ±2
p-value of x̄           1       .7642   .6528   .5486   .1336   .05      .0358   .0026   ≈0
posterior P(θ=0|x̄)     .7597   .7523   .7427   .7290   .5347   .3595    .3030   .0522   ≈0

8.54 a. From Exercise 7.22, the posterior distribution of $\theta\mid x$ is normal with mean $\left[\tau^2/(\tau^2+\sigma^2/n)\right]\bar x$ and variance $\tau^2/(1+n\tau^2/\sigma^2)$. So
$$ P(\theta \le 0\mid x) = P\left(Z \le \frac{0 - \left[\tau^2/(\tau^2+\sigma^2/n)\right]\bar x}{\sqrt{\tau^2/(1+n\tau^2/\sigma^2)}}\right) = P\left(Z \le -\frac{\tau}{\sqrt{(\sigma^2/n)(\tau^2+\sigma^2/n)}}\,\bar x\right) = P\left(Z \ge \frac{\tau}{\sqrt{(\sigma^2/n)(\tau^2+\sigma^2/n)}}\,\bar x\right). $$

b. Using the fact that if $\theta = 0$, $\bar X \sim$ n$(0,\sigma^2/n)$, the $p$-value is
$$ P(\bar X \ge \bar x) = P\left(Z \ge \frac{\bar x - 0}{\sigma/\sqrt n}\right) = P\left(Z \ge \frac{1}{\sigma/\sqrt n}\,\bar x\right). $$

c. For $\sigma^2 = \tau^2 = 1$,
$$ P(\theta \le 0\mid x) = P\left(Z \ge \frac{1}{\sqrt{(1/n)(1+1/n)}}\,\bar x\right) \quad\text{and}\quad P(\bar X \ge \bar x) = P\left(Z \ge \frac{1}{\sqrt{1/n}}\,\bar x\right). $$
Because
$$ \frac{1}{\sqrt{(1/n)(1+1/n)}} < \frac{1}{\sqrt{1/n}}, $$
the Bayes probability is larger than the $p$-value if $\bar x \ge 0$. (Note: The inequality is in the opposite direction for $\bar x < 0$, but the primary interest would be in large values of $\bar x$.)

d. As $\tau^2 \to \infty$, the constant in the Bayes probability,
$$ \frac{\tau}{\sqrt{(\sigma^2/n)(\tau^2+\sigma^2/n)}} = \frac{1}{\sqrt{(\sigma^2/n)\left(1+\sigma^2/(\tau^2 n)\right)}} \to \frac{1}{\sigma/\sqrt n}, $$
the constant in the $p$-value. So the indicated equality is true.
8.55 The formulas for the risk functions are obtained from (8.3.14) using the power function $\beta(\theta) = \Phi(-z_\alpha + \theta_0 - \theta)$, where $\Phi$ is the standard normal cdf.

8.57 For $0$–$1$ loss, by (8.3.12) the risk function for any test is the power function $\beta(\mu)$ for $\mu \le 0$ and $1-\beta(\mu)$ for $\mu > 0$. Let $\alpha = P(1 < Z < 2)$, the size of test $\delta$. By the Karlin-Rubin Theorem, the test $\delta_{z_\alpha}$ that rejects if $X > z_\alpha$ is also size $\alpha$ and is uniformly more powerful than $\delta$, that is, $\beta_{\delta_{z_\alpha}}(\mu) > \beta_\delta(\mu)$ for all $\mu > 0$. Hence,
$$ R(\mu,\delta_{z_\alpha}) = 1 - \beta_{\delta_{z_\alpha}}(\mu) < 1 - \beta_\delta(\mu) = R(\mu,\delta), \quad\text{for all } \mu > 0. $$
Now reverse the roles of $H_0$ and $H_1$ and consider testing $H_0^*: \mu > 0$ versus $H_1^*: \mu \le 0$. Consider the test $\delta^*$ that rejects $H_0^*$ if $X \le 1$ or $X \ge 2$, and the test $\delta^*_{z_\alpha}$ that rejects $H_0^*$ if $X \le z_\alpha$. It is easily verified that for $0$–$1$ loss $\delta$ and $\delta^*$ have the same risk functions, and $\delta_{z_\alpha}$ and $\delta^*_{z_\alpha}$ have the same risk functions. Furthermore, using the Karlin-Rubin Theorem as before, we can conclude that $\delta^*_{z_\alpha}$ is uniformly more powerful than $\delta^*$. Thus we have
$$ R(\mu,\delta) = R(\mu,\delta^*) \ge R(\mu,\delta^*_{z_\alpha}) = R(\mu,\delta_{z_\alpha}), \quad\text{for all } \mu \le 0, $$
with strict inequality if $\mu < 0$. Thus, $\delta_{z_\alpha}$ is better than $\delta$.

Chapter 9

Interval Estimation

9.1 Denote $A = \{\mathbf{x}: L(\mathbf{x}) \le \theta\}$ and $B = \{\mathbf{x}: U(\mathbf{x}) \ge \theta\}$. Then $A \cap B = \{\mathbf{x}: L(\mathbf{x}) \le \theta \le U(\mathbf{x})\}$, and $1 \ge P\{A \cup B\} = P\{L(\mathbf{X}) \le \theta \text{ or } \theta \le U(\mathbf{X})\} \ge P\{L(\mathbf{X}) \le \theta \text{ or } \theta \le L(\mathbf{X})\} = 1$, since $L(\mathbf{x}) \le U(\mathbf{x})$. Therefore, $P(A \cap B) = P(A) + P(B) - P(A \cup B) = 1-\alpha_1 + 1-\alpha_2 - 1 = 1-\alpha_1-\alpha_2$.

9.3 a. The MLE of $\beta$ is $X_{(n)} = \max_i X_i$. Since $\beta$ is a scale parameter, $X_{(n)}/\beta$ is a pivot, and
$$ .05 = P_\beta\left(X_{(n)}/\beta \le c\right) = P_\beta(\text{all } X_i \le c\beta) = \left(\frac{(c\beta)^{\alpha_0}}{\beta^{\alpha_0}}\right)^{n} = c^{\alpha_0 n} $$
implies $c = .05^{1/\alpha_0 n}$. Thus, $.95 = P_\beta\left(X_{(n)}/\beta > c\right) = P_\beta\left(X_{(n)}/c > \beta\right)$, and $\left\{\beta: \beta < X_{(n)}/.05^{1/\alpha_0 n}\right\}$ is a 95% upper confidence limit for $\beta$.

b. From Exercise 7.10, $\hat\alpha = 12.59$ and $X_{(n)} = 25$. So the confidence interval is $\left(0,\ 25/.05^{1/(12.59\cdot 14)}\right) = (0,\ 25.43)$.
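The upper confidence limit in 9.3b is a single expression in the reported estimates; a one-line Python check (a sketch, not part of the original solution) is:

```python
alpha0_hat, x_max, n = 12.59, 25, 14

# 95% upper confidence limit: x_max / .05^(1/(alpha0_hat * n))
upper = x_max / 0.05 ** (1 / (alpha0_hat * n))
print(round(upper, 2))  # 25.43
```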
9.4 a.

λ(x, y) = sup_{λ=λ0} L(σ_X², σ_Y² | x, y) / sup_{λ∈(0,∞)} L(σ_X², σ_Y² | x, y).

The unrestricted MLEs of σ_X² and σ_Y² are σ̂_X² = Σx_i²/n and σ̂_Y² = Σy_i²/m, as usual. Under the restriction λ = λ0, σ_Y² = λ0σ_X², and

L(σ_X², λ0σ_X² | x, y) = (2πσ_X²)^{−n/2} e^{−Σx_i²/(2σ_X²)} · (2πλ0σ_X²)^{−m/2} e^{−Σy_i²/(2λ0σ_X²)}
= (2πσ_X²)^{−(m+n)/2} λ0^{−m/2} e^{−(λ0Σx_i² + Σy_i²)/(2λ0σ_X²)}.

Differentiating the log likelihood gives

d log L/d(σ_X²) = d/d(σ_X²) [ −((m+n)/2) log σ_X² − ((m+n)/2) log(2π) − (m/2) log λ0 − (λ0Σx_i² + Σy_i²)/(2λ0σ_X²) ]
= −((m+n)/2)(σ_X²)^{−1} + [(λ0Σx_i² + Σy_i²)/(2λ0)](σ_X²)^{−2}, set equal to 0,

which implies

σ̂_0² = (λ0Σx_i² + Σy_i²)/(λ0(m+n)).

To see this is a maximum, check the second derivative:

d² log L/d(σ_X²)² = ((m+n)/2)(σ_X²)^{−2} − [(λ0Σx_i² + Σy_i²)/λ0](σ_X²)^{−3}, which at σ_X² = σ̂_0² equals −((m+n)/2)(σ̂_0²)^{−2} < 0,

therefore σ̂_0² is the MLE. The LRT statistic is

λ(x, y) = (σ̂_X²)^{n/2} (σ̂_Y²)^{m/2} / [ λ0^{m/2} (σ̂_0²)^{(m+n)/2} ],

and the test is: Reject H0 if λ(x, y) < k, where k is chosen to give the test size α.
b. Under H0, ΣY_i²/(λ0σ_X²) ∼ χ²_m and ΣX_i²/σ_X² ∼ χ²_n, independent. Also, we can write

λ(X, Y) = [ 1/( n/(n+m) + (m/(m+n))F ) ]^{n/2} · [ 1/( m/(m+n) + (n/(n+m))F^{−1} ) ]^{m/2},

where

F = [ΣY_i²/(λ0 m)] / [ΣX_i²/n] ∼ F_{m,n} under H0.

The rejection region is

{ (x, y) : [ n/(n+m) + (m/(m+n))F ]^{−n/2} [ m/(m+n) + (n/(n+m))F^{−1} ]^{−m/2} < cα },

where cα is chosen to satisfy

P( [ n/(n+m) + (m/(m+n))F ]^{−n/2} [ m/(m+n) + (n/(n+m))F^{−1} ]^{−m/2} < cα ) = α.

c. To ease notation, let a = m/(n+m) and b = aΣy_i²/Σx_i². From the duality of hypothesis tests and confidence sets, the set

c(λ) = { λ : [ 1/(a + b/λ) ]^{n/2} [ 1/((1−a) + (a(1−a)/b)λ) ]^{m/2} ≥ cα }

is a 1 − α confidence set for λ. We now must establish that this set is indeed an interval. To do this, we establish that the function on the left-hand side of the inequality has only an interior maximum. That is, it looks like an upside-down bowl. Furthermore, it is straightforward to establish that the function is zero at both λ = 0 and λ = ∞. These facts imply that the set of λ values for which the function is greater than or equal to cα must be an interval. We make some further simplifications. If we multiply both sides of the inequality by [(1−a)/b]^{m/2}, we need be concerned with only the behavior of the function

h(λ) = [ 1/(a + b/λ) ]^{n/2} [ 1/(b + aλ) ]^{m/2}.

Moreover, since we are most interested in the sign of the derivative of h, this is the same as the sign of the derivative of log h, which is much easier to work with. We have

d log h(λ)/dλ = d/dλ [ −(n/2) log(a + b/λ) − (m/2) log(b + aλ) ]
= (n/2)(b/λ²)/(a + b/λ) − (m/2) a/(b + aλ)
= [ 1/(2λ²(a + b/λ)(b + aλ)) ] [ −a²mλ² + ab(n − m)λ + nb² ].

The sign of the derivative is given by the expression in square brackets, a parabola. It is easy to see that for λ ≥ 0, the parabola changes sign from positive to negative. Since this is the sign change of the derivative, the function must increase and then decrease. Hence, the function is an upside-down bowl, and the set is an interval.
9.5 a. Analogous to Example 9.2.5, the test here will reject H0 if T < k (p0 ). Thus the confidence set is C = {p : T ≥ k (p)}. Since k (p) is nondecreasing, this gives an upper bound on p.
b. k(p) is the integer that simultaneously satisfies

Σ_{y=k(p)}^{n} (n choose y) p^y (1−p)^{n−y} ≥ 1 − α  and  Σ_{y=k(p)+1}^{n} (n choose y) p^y (1−p)^{n−y} < 1 − α.

9.6 a. For Y = ΣX_i ∼ binomial(n, p), the LRT statistic is

λ(y) = [ (n choose y) p0^y (1−p0)^{n−y} ] / [ (n choose y) p̂^y (1−p̂)^{n−y} ] = (p0/p̂)^y ((1−p0)/(1−p̂))^{n−y},

where p̂ = y/n is the MLE of p. The acceptance region is

A(p0) = { y : (p0/p̂)^y ((1−p0)/(1−p̂))^{n−y} ≥ k* },

where k* is chosen to satisfy P_{p0}(Y ∈ A(p0)) = 1 − α. Inverting the acceptance region to a confidence set, we have

C(y) = { p : (p/p̂)^y ((1−p)/(1−p̂))^{n−y} ≥ k* }.

b. For given n and observed y, write

C(y) = { p : (n/y)^y (n/(n−y))^{n−y} p^y (1−p)^{n−y} ≥ k* }.

This is clearly a highest density region. The endpoints of C(y) are roots of the nth degree polynomial (in p)

(n/y)^y (n/(n−y))^{n−y} p^y (1−p)^{n−y} − k*.

The interval of (10.4.4) is

{ p : |p̂ − p| / √(p(1−p)/n) ≤ z_{α/2} }.

The endpoints of this interval are the roots of the second degree polynomial (in p), (p̂ − p)² − z²_{α/2} p(1−p)/n. Typically, the second degree and nth degree polynomials will not have the same roots. Therefore, the two intervals are different. (Note that as n → ∞ and y → ∞, the density becomes symmetric (CLT). Then the two intervals are the same.)
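The second-degree polynomial above can be solved in closed form. A small Python sketch (the sample values n, y, z below are illustrative assumptions, not from the manual):

```python
import math

# Endpoints of the interval of (10.4.4): roots (in p) of
# (phat - p)^2 - z^2 * p * (1 - p)/n = 0, rewritten as
# (1 + z^2/n) p^2 - (2*phat + z^2/n) p + phat^2 = 0.
n, y, z = 30, 12, 1.96          # illustrative values
phat = y / n
A = 1 + z**2 / n
B = -(2 * phat + z**2 / n)
C = phat**2
root = math.sqrt(B**2 - 4 * A * C)
lo, hi = (-B - root) / (2 * A), (-B + root) / (2 * A)
# Both endpoints satisfy the defining equation:
for p in (lo, hi):
    assert math.isclose((phat - p)**2, z**2 * p * (1 - p) / n, rel_tol=1e-9)
print(round(lo, 3), round(hi, 3))
```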
9.7 These densities have already appeared in Exercise 8.8, where LRT statistics were calculated for testing H0 : a = 1.
a. Using the result of Exercise 8.8(a), the restricted MLE of θ (when a = a0) is

θ̂0 = [ −a0 + √(a0² + 4Σx_i²/n) ] / 2,

and the unrestricted MLEs are

θ̂ = x̄  and  â = Σ(x_i − x̄)²/(n x̄).

The LRT statistic is

λ(x) = ( âθ̂/(a0θ̂0) )^{n/2} e^{n/2} e^{−Σ(x_i − θ̂0)²/(2a0θ̂0)}.

The rejection region of a size α test is {x : λ(x) ≤ cα}, and a 1 − α confidence set is {a0 : λ(x) ≥ cα}.
b. Using the results of Exercise 8.8(b), the restricted MLE (for a = a0) is found by solving

−a0θ² + [σ̂² + (x̄ − θ)²] + θ(x̄ − θ) = 0,

yielding the MLE

θ̂_R = [ −x̄ + √(x̄² + 4a0(σ̂² + x̄²)) ] / (2a0).

The unrestricted MLEs are

θ̂ = x̄  and  â = (1/(n x̄²)) Σ_{i=1}^n (x_i − x̄)² = σ̂²/x̄²,

yielding the LRT statistic

λ(x) = ( σ̂²/(a0θ̂_R²) )^{n/2} e^{(n/2) − Σ(x_i − θ̂_R)²/(2a0θ̂_R²)}.

The rejection region of a size α test is {x : λ(x) ≤ cα}, and a 1 − α confidence set is {a0 : λ(x) ≥ cα}.
9.9 Let Z1 , . . . , Zn be iid with pdf f (z ).
a. For X_i ∼ f(x − µ), (X_1, ..., X_n) ∼ (Z_1 + µ, ..., Z_n + µ), and X̄ − µ ∼ Z̄ + µ − µ = Z̄. The distribution of Z̄ does not depend on µ.
b. For X_i ∼ f(x/σ)/σ, (X_1, ..., X_n) ∼ (σZ_1, ..., σZ_n), and X̄/σ ∼ σZ̄/σ = Z̄. The distribution of Z̄ does not depend on σ.
c. For X_i ∼ f((x − µ)/σ)/σ, (X_1, ..., X_n) ∼ (σZ_1 + µ, ..., σZ_n + µ), and

(X̄ − µ)/S_X ∼ (σZ̄ + µ − µ)/S_{σZ+µ} = σZ̄/(σS_Z) = Z̄/S_Z.

The distribution of Z̄/S_Z does not depend on µ or σ.
9.11 Recall that if θ is the true parameter, then F_T(T|θ) ∼ uniform(0, 1). Thus,

P_{θ0}({T : α1 ≤ F_T(T|θ0) ≤ 1 − α2}) = P(α1 ≤ U ≤ 1 − α2) = 1 − α2 − α1,

where U ∼ uniform(0, 1). Since

t ∈ {t : α1 ≤ F_T(t|θ) ≤ 1 − α2}  ⇔  θ ∈ {θ : α1 ≤ F_T(t|θ) ≤ 1 − α2},

the same calculation shows that the interval has confidence 1 − α2 − α1.

9.12 If X_1, ..., X_n are iid n(θ, θ), then √n(X̄ − θ)/√θ ∼ n(0, 1), and a 1 − α confidence interval is {θ : |√n(x̄ − θ)/√θ| ≤ z_{α/2}}. Solving for θ, we get

{ θ : nθ² − θ(2nx̄ + z²_{α/2}) + nx̄² ≤ 0 } = { θ : θ ∈ [ 2nx̄ + z²_{α/2} ± √(4nx̄z²_{α/2} + z⁴_{α/2}) ] / (2n) }.

Simpler answers can be obtained using the t pivot, (X̄ − θ)/(S/√n), or the χ² pivot, (n−1)S²/θ. (Tom Werhley of Texas A&M University notes the following: The largest probability of getting a negative discriminant, and hence an empty confidence interval, occurs when √(nθ) = z_{α/2}/2, and the probability is equal to α/2. The behavior of the intervals for negative values of x̄ is also interesting. When x̄ = 0 the left-hand endpoint is also equal to 0, but when x̄ < 0, the left-hand endpoint is positive. Thus, the interval based on x̄ = 0 contains smaller values of θ than that based on x̄ < 0. The intervals get smaller as x̄ decreases, finally becoming empty.)

9.13 a. For Y = −(log X)^{−1}, the pdf of Y is f_Y(y) = (θ/y²)e^{−θ/y}, 0 < y < ∞, and

P(Y/2 ≤ θ ≤ Y) = ∫_θ^{2θ} (θ/y²) e^{−θ/y} dy = e^{−θ/y} evaluated from y = θ to y = 2θ = e^{−1/2} − e^{−1} = .239.

b. Since f_X(x) = θx^{θ−1}, 0 < x < 1, T = X^θ is a good guess at a pivot, and it is, since f_T(t) = 1, 0 < t < 1. Thus a pivotal interval is formed from P(a < X^θ < b) = b − a and is

{ θ : log b / log x ≤ θ ≤ log a / log x }.

Since X^θ ∼ uniform(0, 1), the interval will have confidence .239 as long as b − a = .239.
c. The interval in part (a) is a special case of the one in part (b). To find the best interval, we minimize log b − log a subject to b − a = 1 − α, or b = 1 − α + a. Thus we want to minimize log(1 − α + a) − log a = log(1 + (1−α)/a), which is minimized by taking a as big as possible. Thus, take b = 1 and a = α, and the best 1 − α pivotal interval is {θ : 0 ≤ θ ≤ log α / log x}. Thus the interval in part (a) is nonoptimal. A shorter interval with confidence coefficient .239 is {θ : 0 ≤ θ ≤ log(1 − .239)/log(x)}.
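The confidence coefficient in part (a) is a one-line computation; a minimal Python check:

```python
import math

# Exercise 9.13(a): P(Y/2 <= theta <= Y) = e^(-1/2) - e^(-1) ~ .239,
# free of theta, so [Y/2, Y] has confidence coefficient about .239.
coverage = math.exp(-0.5) - math.exp(-1.0)
print(round(coverage, 3))   # 0.239
```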
9.14 a. Recall the Bonferroni Inequality (1.2.9), P(A1 ∩ A2) ≥ P(A1) + P(A2) − 1. Let A1 = {the µ interval covers µ} and A2 = {the σ² interval covers σ²}. Use the interval (9.2.14), with t_{n−1,α/4}, to get a 1 − α/2 confidence interval for µ. Use the interval after (9.2.14), with b = χ²_{n−1,α/4} and a = χ²_{n−1,1−α/4}, to get a 1 − α/2 confidence interval for σ². Then the natural simultaneous set is

Ca(x) = { (µ, σ²) : x̄ − t_{n−1,α/4} s/√n ≤ µ ≤ x̄ + t_{n−1,α/4} s/√n and (n−1)s²/χ²_{n−1,α/4} ≤ σ² ≤ (n−1)s²/χ²_{n−1,1−α/4} },

and P(Ca(X) covers (µ, σ²)) = P(A1 ∩ A2) ≥ P(A1) + P(A2) − 1 = 2(1 − α/2) − 1 = 1 − α.
b. Since (X̄ − µ)/(σ/√n) ∼ n(0, 1), if we replace the µ interval in (a) by {µ : x̄ − z_{α/4} σ/√n ≤ µ ≤ x̄ + z_{α/4} σ/√n}, then

Cb(x) = { (µ, σ²) : x̄ − z_{α/4} σ/√n ≤ µ ≤ x̄ + z_{α/4} σ/√n and (n−1)s²/χ²_{n−1,α/4} ≤ σ² ≤ (n−1)s²/χ²_{n−1,1−α/4} },

and P(Cb(X) covers (µ, σ²)) ≥ 2(1 − α/2) − 1 = 1 − α.
c. The sets can be compared graphically in the (µ, σ²) plane: Ca is a rectangle, since µ and σ² are treated independently, while Cb is a trapezoid, with larger σ² giving a longer µ interval. Their areas can also be calculated:

Area of Ca = 2t_{n−1,α/4}(s/√n) × (n−1)s² [ 1/χ²_{n−1,1−α/4} − 1/χ²_{n−1,α/4} ],

Area of Cb = (z_{α/4}/√n) [ √((n−1)s²/χ²_{n−1,1−α/4}) + √((n−1)s²/χ²_{n−1,α/4}) ] × (n−1)s² [ 1/χ²_{n−1,1−α/4} − 1/χ²_{n−1,α/4} ]

(half the sum of the two parallel µ side lengths times the height in σ²), and compared numerically.

9.15 Fieller's Theorem says that a 1 − α confidence set for θ = µ_Y/µ_X is

{ θ : ( x̄² − (t²_{n−1,α/2}/(n−1)) s_X² ) θ² − 2( x̄ȳ − (t²_{n−1,α/2}/(n−1)) s_{YX} ) θ + ( ȳ² − (t²_{n−1,α/2}/(n−1)) s_Y² ) ≤ 0 }.

a. Define a = x̄² − t s_X², b = x̄ȳ − t s_{YX}, c = ȳ² − t s_Y², where t = t²_{n−1,α/2}/(n−1). Then the parabola opens upward if a > 0. Furthermore, if a > 0, then there always exists at least one real root. This follows from the fact that at θ = ȳ/x̄ the value of the function is negative. For θ = ȳ/x̄ we have

(x̄² − t s_X²)(ȳ/x̄)² − 2(x̄ȳ − t s_{XY})(ȳ/x̄) + (ȳ² − t s_Y²)
= −t [ (ȳ²/x̄²) s_X² − 2(ȳ/x̄) s_{XY} + s_Y² ]
= −t Σ_{i=1}^n [ (ȳ²/x̄²)(x_i − x̄)² − 2(ȳ/x̄)(x_i − x̄)(y_i − ȳ) + (y_i − ȳ)² ]
= −t Σ_{i=1}^n [ (ȳ/x̄)(x_i − x̄) − (y_i − ȳ) ]²,

which is negative.
b. The parabola opens downward if a < 0, that is, if x̄² < t s_X². This will happen if the test of H0 : µ_X = 0 accepts H0 at level α.
c. The parabola has no real roots if b² < ac. This can only occur if a < 0.
9.16 a. The LRT (see Example 8.2.1) has rejection region {x : |x̄ − θ0| > z_{α/2} σ/√n}, acceptance region A(θ0) = {x : −z_{α/2} σ/√n ≤ x̄ − θ0 ≤ z_{α/2} σ/√n}, and 1 − α confidence interval C(x) = {θ : x̄ − z_{α/2} σ/√n ≤ θ ≤ x̄ + z_{α/2} σ/√n}.
b. We have a UMP test with rejection region {x : x̄ − θ0 < −z_α σ/√n}, acceptance region A(θ0) = {x : x̄ − θ0 ≥ −z_α σ/√n}, and 1 − α confidence interval C(x) = {θ : θ ≤ x̄ + z_α σ/√n}.
c. Similar to (b), the UMP test has rejection region {x : x̄ − θ0 > z_α σ/√n}, acceptance region A(θ0) = {x : x̄ − θ0 ≤ z_α σ/√n}, and 1 − α confidence interval C(x) = {θ : x̄ − z_α σ/√n ≤ θ}.
9.17 a. Since X − θ ∼ uniform(−1/2, 1/2), P(a ≤ X − θ ≤ b) = b − a. Any a and b satisfying b = a + 1 − α will do. One choice is a = −1/2 + α/2, b = 1/2 − α/2.
b. Since T = X/θ has pdf f(t) = 2t, 0 ≤ t ≤ 1,

P(a ≤ X/θ ≤ b) = ∫_a^b 2t dt = b² − a².

Any a and b satisfying b² = a² + 1 − α will do. One choice is a = √(α/2), b = √(1 − α/2).
9.18 a. P_p(X = 1) = (3 choose 1) p(1−p)² = 3p(1−p)², maximum at p = 1/3. P_p(X = 2) = (3 choose 2) p²(1−p) = 3p²(1−p), maximum at p = 2/3.
b. P(X = 0) = (3 choose 0) p⁰(1−p)³ = (1−p)³, and this is greater than P(X = 2) if (1−p)² > 3p², or 2p² + 2p − 1 < 0. At p = 1/3, 2p² + 2p − 1 = −1/9.
c. To show that this is a 1 − α = .442 interval, compare with the interval in Example 9.2.11. There are only two discrepancies. For example,

P(p ∈ interval | .362 < p < .634) = P(X = 1 or X = 2) > .442,

by comparison with Sterne's procedure, which is given by

x    interval
0    [.000, .305)
1    [.305, .634)
2    [.362, .762)
3    [.695, 1]

9.19 For F_T(t|θ) increasing in θ, there are unique values θ_U(t) and θ_L(t) such that F_T(t|θ) < 1 − α/2 if and only if θ < θ_U(t), and F_T(t|θ) > α/2 if and only if θ > θ_L(t). Hence,

P(θ_L(T) ≤ θ ≤ θ_U(T)) = P(θ ≤ θ_U(T)) − P(θ ≤ θ_L(T)) = P(F_T(T|θ) ≤ 1 − α/2) − P(F_T(T|θ) ≤ α/2) = (1 − α/2) − α/2 = 1 − α.
9.21 To construct a 1 − α confidence interval for p of the form {p : ℓ ≤ p ≤ u} with P(ℓ ≤ p ≤ u) = 1 − α, we use the method of Theorem 9.2.12. We must solve for ℓ and u in the equations

(1) α/2 = Σ_{k=0}^x (n choose k) u^k (1 − u)^{n−k}  and  (2) α/2 = Σ_{k=x}^n (n choose k) ℓ^k (1 − ℓ)^{n−k}.

In equation (1), α/2 = P(K ≤ x) = P(Y ≤ 1 − u), where Y ∼ beta(n − x, x + 1) and K ∼ binomial(n, u). This is Exercise 2.40. Let Z ∼ F_{2(n−x),2(x+1)} and c = (n − x)/(x + 1). By Theorem 5.3.8c, cZ/(1 + cZ) ∼ beta(n − x, x + 1) ∼ Y. So we want

α/2 = P( cZ/(1 + cZ) ≤ 1 − u ) = P( 1/Z ≥ cu/(1 − u) ).

From Theorem 5.3.8a, 1/Z ∼ F_{2(x+1),2(n−x)}. So we need cu/(1 − u) = F_{2(x+1),2(n−x),α/2}. Solving for u yields

u = [ ((x+1)/(n−x)) F_{2(x+1),2(n−x),α/2} ] / [ 1 + ((x+1)/(n−x)) F_{2(x+1),2(n−x),α/2} ].

A similar manipulation on equation (2) yields the value for ℓ.
9.23 a. The LRT statistic for H0 : λ = λ0 versus H1 : λ ≠ λ0 is

g(y) = e^{−nλ0}(nλ0)^y / ( e^{−nλ̂}(nλ̂)^y ),

where Y = ΣX_i ∼ Poisson(nλ) and λ̂ = y/n. The acceptance region for this test is A(λ0) = {y : g(y) > c(λ0)}, where c(λ0) is chosen so that P(Y ∈ A(λ0)) ≥ 1 − α. g(y) is a unimodal function of y, so A(λ0) is an interval of y values. Consider constructing A(λ0) for each λ0 > 0. Then, for a fixed y, there will be a smallest λ0, call it a(y), and a largest λ0, call it b(y), such that y ∈ A(λ0). The confidence interval for λ is then C(y) = (a(y), b(y)). The values a(y) and b(y) are not expressible in closed form. They can be determined by a numerical search, constructing A(λ0) for different values of λ0 and determining those values for which y ∈ A(λ0). (Jay Beder of the University of Wisconsin, Milwaukee, reminds us that since c is a function of λ, the resulting confidence set need not be a highest density region of a likelihood function. This is an example of the effect of the imposition of one type of inference (frequentist) on another theory (likelihood).)
b. The procedure in part (a) was carried out for y = 558, and the confidence interval was found to be (57.78, 66.45). For the confidence interval in Example 9.2.15, we need the values χ²_{1116,.95} = 1039.444 and χ²_{1118,.05} = 1196.899. This confidence interval is (1039.444/18, 1196.899/18) = (57.75, 66.49). The two confidence intervals are virtually the same.
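The chi-squared interval in part (b) is simple arithmetic from the quoted quantiles; a Python check:

```python
# Exercise 9.23(b): the interval of Example 9.2.15 from the quoted quantiles
# chi2_{1116,.95} = 1039.444 and chi2_{1118,.05} = 1196.899, divided by 18
# as in the solution.
lo = 1039.444 / 18
hi = 1196.899 / 18
print(round(lo, 2), round(hi, 2))   # 57.75 66.49
```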


9.25 The confidence interval derived by the method of Section 9.2.3 is

C(y) = { µ : y + (1/n) log(α/2) ≤ µ ≤ y + (1/n) log(1 − α/2) },

where y = min_i x_i. The LRT method derives its interval from the test of H0 : µ = µ0 versus H1 : µ ≠ µ0. Since Y is sufficient for µ, we can use f_Y(y | µ). We have

λ(y) = sup_{µ=µ0} L(µ|y) / sup_{µ∈(−∞,∞)} L(µ|y) = ne^{−n(y−µ0)} I_{[µ0,∞)}(y) / ( ne^{−n(y−y)} I_{[y,∞)}(y) ) = e^{−n(y−µ0)} I_{[µ0,∞)}(y) = 0 if y < µ0, and e^{−n(y−µ0)} if y ≥ µ0.

We reject H0 if λ(y) = e^{−n(y−µ0)} < cα, where 0 ≤ cα ≤ 1 is chosen to give the test level α. To determine cα, set

α = P(reject H0 | µ = µ0) = P( Y > µ0 − (log cα)/n or Y < µ0 | µ = µ0 )
= P( Y > µ0 − (log cα)/n | µ = µ0 )
= ∫_{µ0 − (log cα)/n}^∞ ne^{−n(y−µ0)} dy = [ −e^{−n(y−µ0)} ] evaluated from µ0 − (log cα)/n to ∞ = e^{log cα} = cα.

Therefore, cα = α, and the 1 − α confidence interval is

C(y) = { µ : µ ≤ y ≤ µ − (log α)/n } = { µ : y + (1/n) log α ≤ µ ≤ y }.

To use the pivotal method, note that since µ is a location parameter, a natural pivotal quantity is Z = Y − µ. Then f_Z(z) = ne^{−nz} I_{(0,∞)}(z). Let P(a ≤ Z ≤ b) = 1 − α, where a and b satisfy

α/2 = ∫_0^a ne^{−nz} dz = 1 − e^{−na}  ⇒  e^{−na} = 1 − α/2  ⇒  a = −(1/n) log(1 − α/2),
α/2 = ∫_b^∞ ne^{−nz} dz = e^{−nb}  ⇒  −nb = log(α/2)  ⇒  b = −(1/n) log(α/2).

Thus the pivotal interval is Y + (1/n) log(α/2) ≤ µ ≤ Y + (1/n) log(1 − α/2), the same interval as from Example 9.2.13. To compare the intervals we compare their lengths:

Length of LRT interval = y − ( y + (1/n) log α ) = −(1/n) log α,
Length of pivotal interval = y + (1/n) log(1 − α/2) − ( y + (1/n) log(α/2) ) = (1/n) log( (1 − α/2)/(α/2) ).

Thus, the LRT interval is shorter if −log α < log[(1 − α/2)/(α/2)], but this is always satisfied.
9.27 a. Y = ΣX_i ∼ gamma(n, λ), and the posterior distribution of λ is

π(λ | y) = [ (y + 1/b)^{n+a} / Γ(n+a) ] (1/λ^{n+a+1}) e^{−(y + 1/b)/λ},

an IG(n + a, (y + 1/b)^{−1}). The Bayes HPD region is of the form {λ : π(λ|y) ≥ k}, which is an interval since π(λ|y) is unimodal. It thus has the form {λ : a1(y) ≤ λ ≤ a2(y)}, where a1 and a2 satisfy

(1/a1^{n+a+1}) e^{−(y + 1/b)/a1} = (1/a2^{n+a+1}) e^{−(y + 1/b)/a2}.

b. The posterior distribution is IG( ((n−1)/2) + a, ( ((n−1)s²/2) + 1/b )^{−1} ). So the Bayes HPD region is as in part (a), with these parameters replacing n + a and y + 1/b.
c. As a → 0 and b → ∞, the condition on a1 and a2 becomes

(1/a1^{((n−1)/2)+1}) e^{−(n−1)s²/(2a1)} = (1/a2^{((n−1)/2)+1}) e^{−(n−1)s²/(2a2)}.

9.29 a. We know from Example 7.2.14 that if π(p) ∼ beta(a, b), the posterior is π(p|y) ∼ beta(y + a, n − y + b) for y = Σx_i. So a 1 − α credible set for p is

{ p : β_{y+a, n−y+b, 1−α/2} ≤ p ≤ β_{y+a, n−y+b, α/2} }.

b. Converting to an F distribution, β_{c,d,α} = (c/d)F_{2c,2d,α} / ( 1 + (c/d)F_{2c,2d,α} ), the interval is

[ ((y+a)/(n−y+b)) F_{2(y+a),2(n−y+b),1−α/2} ] / [ 1 + ((y+a)/(n−y+b)) F_{2(y+a),2(n−y+b),1−α/2} ] ≤ p ≤ [ ((y+a)/(n−y+b)) F_{2(y+a),2(n−y+b),α/2} ] / [ 1 + ((y+a)/(n−y+b)) F_{2(y+a),2(n−y+b),α/2} ],

or, using the fact that F_{m,n,1−α} = 1/F_{n,m,α},

1 / [ 1 + ((n−y+b)/(y+a)) F_{2(n−y+b),2(y+a),α/2} ] ≤ p ≤ [ ((y+a)/(n−y+b)) F_{2(y+a),2(n−y+b),α/2} ] / [ 1 + ((y+a)/(n−y+b)) F_{2(y+a),2(n−y+b),α/2} ].

For this to match the interval of Exercise 9.21, we need x = y and

Lower limit: n − y + b = n − x + 1 and y + a = x  ⇒  b = 1 and a = 0;
Upper limit: y + a = x + 1 and n − y + b = n − x  ⇒  a = 1 and b = 0.

So no values of a and b will make the intervals match.
9.31 a. We continually use the fact that, given Y = y, χ²_{2Y} is a central χ² random variable with 2y degrees of freedom. Hence

Eχ²_{2Y} = E[E(χ²_{2Y} | Y)] = E[2Y] = 2λ,
Varχ²_{2Y} = E[Var(χ²_{2Y} | Y)] + Var[E(χ²_{2Y} | Y)] = E[4Y] + Var[2Y] = 4λ + 4λ = 8λ,

and the mgf is

E e^{tχ²_{2Y}} = E[E(e^{tχ²_{2Y}} | Y)] = Σ_{y=0}^∞ (1 − 2t)^{−y} e^{−λ} λ^y / y! = e^{−λ + λ/(1−2t)}.

From Theorem 2.3.15, the mgf of (χ²_{2Y} − 2λ)/√(8λ) is

e^{−t√(λ/2)} e^{−λ + λ/(1 − t/√(2λ))}.

The log of this is

−t√(λ/2) − λ + λ/(1 − t/√(2λ)) = (t²/2)/(1 − t/√(2λ)) → t²/2 as λ → ∞,

so the mgf converges to e^{t²/2}, the mgf of a standard normal.
b. Since P(χ²_{2Y} ≤ χ²_{2Y,α}) = α for all λ,

(χ²_{2Y,α} − 2λ)/√(8λ) → z_α as λ → ∞.

In standardizing (9.2.22), the upper bound is

[ (nb/(nb+1)) χ²_{2(Y+a),α/2} − 2λ ] / √(8(λ+a))
= (nb/(nb+1)) [ χ²_{2(Y+a),α/2} − 2(λ+a) ] / √(8(λ+a)) + [ (nb/(nb+1)) 2(λ+a) − 2λ ] / √(8(λ+a)).

While the first quantity in square brackets → z_{α/2}, the second term has limit

lim_{λ→∞} [ −(2/(nb+1))λ + (2nb/(nb+1))a ] / √(8(λ+a)) = −∞,

so the coverage probability goes to zero.
9.33 a. Since 0 ∈ Ca(x) for every x, P(0 ∈ Ca(X) | µ = 0) = 1. If µ > 0,

P(µ ∈ Ca(X)) = P(µ ≤ max{0, X + a}) = P(µ ≤ X + a)  (since µ > 0)
= P(Z ≥ −a)  (Z ∼ n(0, 1))
= .95  (a = 1.645).

A similar calculation holds for µ < 0.
b. The credible probability is

∫_{min(0, x−a)}^{max(0, x+a)} (1/√(2π)) e^{−(µ−x)²/2} dµ = ∫_{min(−x, −a)}^{max(−x, a)} (1/√(2π)) e^{−t²/2} dt = P( min(−x, −a) ≤ Z ≤ max(−x, a) ).

To evaluate this probability we have two cases:

(i) |x| ≤ a  ⇒  credible probability = P(|Z| ≤ a);
(ii) |x| > a  ⇒  credible probability = P(−a ≤ Z ≤ |x|).

Thus we see that for a = 1.645, the credible probability is equal to .90 if |x| ≤ 1.645 and increases to .95 as |x| → ∞.


9.34 a. A 1 − α confidence interval for µ is {µ : x̄ − 1.96σ/√n ≤ µ ≤ x̄ + 1.96σ/√n}. We need its length, 2(1.96)σ/√n, to be at most σ/4, that is, √n ≥ 4(2)(1.96). Thus we need n ≥ 64(1.96)² ≈ 245.9. So n = 246 suffices.
b. The length of a 95% confidence interval is 2t_{n−1,.025} S/√n. Thus we need

P( 2t_{n−1,.025} S/√n ≤ σ/4 ) ≥ .9  ⇒  P( S²/σ² ≤ n/(64 t²_{n−1,.025}) ) ≥ .9  ⇒  P( (n−1)S²/σ² ≤ (n−1)n/(t²_{n−1,.025} · 64) ) ≥ .9,

where (n−1)S²/σ² ∼ χ²_{n−1}. We need to solve this numerically for the smallest n that satisfies the inequality

(n−1)n/(t²_{n−1,.025} · 64) ≥ χ²_{n−1,.1}.

Trying different values of n, we find that the smallest such n is n = 276, for which

(n−1)n/(t²_{n−1,.025} · 64) = 306.0 ≥ 305.5 = χ²_{n−1,.1}.

As is to be expected, this is somewhat larger than the value found in (a).


9.35 If σ is known, the length is 2z_{α/2}σ/√n, and if σ is unknown, E(length) = 2t_{α/2,n−1} cσ/√n, where c is the constant with ES = cσ (Exercise 7.50),

c = √(2/(n−1)) Γ(n/2)/Γ((n−1)/2).

Thus the difference in lengths is (2σ/√n)(z_{α/2} − ct_{α/2,n−1}). A little work will show that, as n → ∞, c → constant. (This can be done using Stirling's formula along with Lemma 2.3.14. In fact, some careful algebra will show that c → 1 as n → ∞.) Also, we know that, as n → ∞, t_{α/2,n−1} → z_{α/2}. Thus, the difference in lengths (2σ/√n)(z_{α/2} − ct_{α/2,n−1}) → 0 as n → ∞.
9.36 The sample pdf is

f(x_1, ..., x_n | θ) = Π_{i=1}^n e^{iθ − x_i} I_{(iθ,∞)}(x_i) = e^{Σ(iθ − x_i)} I_{(θ,∞)}(min(x_i/i)).

Thus T = min(X_i/i) is sufficient by the Factorization Theorem, and

P(T > t) = Π_{i=1}^n P(X_i > it) = Π_{i=1}^n ∫_{it}^∞ e^{iθ − x} dx = Π_{i=1}^n e^{i(θ−t)} = e^{−[n(n+1)/2](t−θ)},

and

f_T(t) = [n(n+1)/2] e^{−[n(n+1)/2](t−θ)},  t ≥ θ.

Clearly, θ is a location parameter and Y = T − θ is a pivot. To find the shortest confidence interval of the form [T + a, T + b], we must minimize b − a subject to the constraint P(−b ≤ Y ≤ −a) = 1 − α. Now the pdf of Y is strictly decreasing, so the interval length is shortest if −b = 0 and a satisfies

P(0 ≤ Y ≤ −a) = 1 − e^{[n(n+1)/2]a} = 1 − α.

So a = 2 log(α)/(n(n+1)), which is negative, as it must be.
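A simulation sketch of the resulting interval's coverage (the sampling scheme X_i = iθ + exponential(1), and all numerical settings, are assumptions for illustration):

```python
import math
import random

# Exercise 9.36: check that [T + a, T], with T = min_i X_i/i and
# a = 2*log(alpha)/(n(n+1)), covers theta with probability about 1 - alpha.
random.seed(1)
n, theta, alpha, trials = 5, 1.0, 0.10, 20000
a = 2 * math.log(alpha) / (n * (n + 1))   # a < 0
hits = 0
for _ in range(trials):
    # X_i - i*theta ~ exponential(1), matching the pdf e^(i*theta - x), x > i*theta
    t = min((i * theta + random.expovariate(1.0)) / i for i in range(1, n + 1))
    hits += (t + a <= theta <= t)
print(round(hits / trials, 3))   # close to 0.9
```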
9.37 a. The density of Y = X_(n) is f_Y(y) = ny^{n−1}/θ^n, 0 < y < θ. So θ is a scale parameter, and T = Y/θ is a pivotal quantity. The pdf of T is f_T(t) = nt^{n−1}, 0 ≤ t ≤ 1.
b. A pivotal interval is formed from the set

{θ : a ≤ t ≤ b} = { θ : a ≤ y/θ ≤ b } = { θ : y/b ≤ θ ≤ y/a },

and has length Y(1/a − 1/b) = Y(b − a)/(ab). Since f_T(t) is increasing, b − a is minimized and ab is maximized if b = 1. Thus the shortest interval will have b = 1 and a satisfying

α = ∫_0^a nt^{n−1} dt = a^n  ⇒  a = α^{1/n}.

So the shortest 1 − α confidence interval is {θ : y ≤ θ ≤ y/α^{1/n}}.
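Since T = Y/θ has cdf t^n on [0, 1], the choice b = 1, a = α^{1/n} gives coverage b^n − a^n = 1 − α exactly; a one-line Python check (n and α are illustrative):

```python
import math

# Exercise 9.37: coverage of the shortest pivotal interval {y <= theta <= y/a}.
n, alpha = 7, 0.05
a = alpha ** (1.0 / n)
coverage = 1.0 - a**n        # P(a <= T <= 1) = 1^n - a^n
assert math.isclose(coverage, 1 - alpha)
print(round(coverage, 2))    # 0.95
```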

9.39 Let a be such that ∫_{−∞}^a f(x) dx = α/2. This value is unique for a unimodal pdf if α > 0. Let µ be the point of symmetry and let b = 2µ − a. Then f(b) = f(a) and ∫_b^∞ f(x) dx = α/2. We have a ≤ µ, since ∫_{−∞}^a f(x) dx = α/2 ≤ 1/2 = ∫_{−∞}^µ f(x) dx. Similarly, b ≥ µ. And f(b) = f(a) > 0, since f(a) ≥ f(x) for all x ≤ a, and ∫_{−∞}^a f(x) dx = α/2 > 0 ⇒ f(x) > 0 for some x < a ⇒ f(a) > 0. So the conditions of Theorem 9.3.2 are satisfied.
9.41 a. We show that for any interval [a, b] and ε > 0, the probability content of [a − ε, b − ε] is greater (as long as b − ε > a). Write

∫_a^b f(x) dx − ∫_{a−ε}^{b−ε} f(x) dx = ∫_{b−ε}^b f(x) dx − ∫_{a−ε}^a f(x) dx
≤ f(b − ε)[b − (b − ε)] − f(a)[a − (a − ε)]
= ε[f(b − ε) − f(a)] ≤ 0,

where all of the inequalities follow because f(x) is decreasing. So moving the interval toward zero increases the probability, and it is therefore maximized by moving a all the way to zero.
b. T = Y − µ is a pivot with decreasing pdf f_T(t) = ne^{−nt} I_{[0,∞)}(t). The shortest 1 − α interval on T is [0, −(1/n) log α], since

∫_0^b ne^{−nt} dt = 1 − α  ⇒  b = −(1/n) log α.

Since a ≤ T ≤ b implies Y − b ≤ µ ≤ Y − a, the best 1 − α interval on µ is Y + (1/n) log α ≤ µ ≤ Y.
9.43 a. Using Theorem 8.3.12, identify g(t) with f(x|θ1) and f(t) with f(x|θ0). Define φ(t) = 1 if t ∈ C and 0 otherwise, and let φ′ be the indicator of any other set C′ satisfying ∫_{C′} f(t) dt ≥ 1 − α. Then (φ(t) − φ′(t))(g(t) − λf(t)) ≤ 0, and

0 ≥ ∫ (φ − φ′)(g − λf) = ∫_C g − ∫_{C′} g − λ( ∫_C f − ∫_{C′} f ) ≥ ∫_C g − ∫_{C′} g,

showing that C is the best set.
b. For Exercise 9.37, the pivot T = Y/θ has density nt^{n−1}, and the pivotal interval a ≤ T ≤ b results in the θ interval Y/b ≤ θ ≤ Y/a. The length is proportional to 1/a − 1/b, and thus g(t) = 1/t². The best set is {t : 1/t² ≤ λnt^{n−1}}, which is a set of the form {t : a ≤ t ≤ 1}. This has probability content 1 − α if a = α^{1/n}. For Exercise 9.24 (or Example 9.3.4), the g function is the same and the density of the pivot is f_k, the density of a gamma(k, 1). The set {t : 1/t² ≤ λf_k(t)} = {t : f_{k+2}(t) ≥ λ′}, so the best a and b satisfy ∫_a^b f_k(t) dt = 1 − α and f_{k+2}(a) = f_{k+2}(b).
9.45 a. Since Y = ΣX_i ∼ gamma(n, λ) has MLR, the Karlin-Rubin Theorem (Theorem 8.3.2) shows that the UMP test is to reject H0 if Y < k(λ0), where P(Y < k(λ0) | λ = λ0) = α.
b. T = 2Y/λ ∼ χ²_{2n}, so choose k(λ0) = (1/2)λ0 χ²_{2n,α}, and

{λ : Y ≥ k(λ)} = { λ : Y ≥ (1/2)λχ²_{2n,α} } = { λ : 0 < λ ≤ 2Y/χ²_{2n,α} }

is the UMA confidence set.
c. The expected length is

E( 2Y/χ²_{2n,α} ) = 2nλ/χ²_{2n,α}.

d. X_(1) ∼ exponential(λ/n), so EX_(1) = λ/n. Thus

E(length(C*)) = (2 × 120/251.046) λ = .956λ,
E(length(C^m)) = −λ/(120 × log(.99)) = .829λ.
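The two expected-length coefficients in part (d) follow from the quoted numbers (n = 120 and the chi-squared quantile 251.046); a Python check:

```python
import math

# Exercise 9.45(d): E(length(C*)) = (2*120/251.046)*lambda = .956*lambda and
# E(length(C^m)) = -lambda/(120*log(.99)) = .829*lambda.
c_star = 2 * 120 / 251.046
c_m = -1 / (120 * math.log(0.99))
print(round(c_star, 3), round(c_m, 3))   # 0.956 0.829
```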


9.46 The proof is similar to that of Theorem 9.3.5:

P_θ(θ′ ∈ C*(X)) = P_θ(X ∈ A*(θ′)) ≤ P_θ(X ∈ A(θ′)) = P_θ(θ′ ∈ C(X)),

where A and C are any competitors. The inequality follows directly from Definition 8.3.11.
9.47 Referring to (9.3.2), we want to show that for the upper confidence bound, P_θ(θ′ ∈ C) ≤ 1 − α if θ′ ≥ θ. We have

P_θ(θ′ ∈ C) = P_θ(θ′ ≤ X̄ + z_α σ/√n).

Subtract θ from both sides and rearrange to get

P_θ(θ′ ∈ C) = P_θ( (θ′ − θ)/(σ/√n) ≤ (X̄ − θ)/(σ/√n) + z_α ) = P( Z ≥ (θ′ − θ)/(σ/√n) − z_α ),

which is at most 1 − α as long as θ′ ≥ θ. The solution for the lower confidence bound is similar.
9.48 a. Start with the hypothesis test H0 : θ ≥ θ0 versus H1 : θ < θ0. Arguing as in Example 8.2.4 and Exercise 8.47, we find that the LRT rejects H0 if (x̄ − θ0)/(s/√n) < −t_{n−1,α}. So the acceptance region is {x : (x̄ − θ0)/(s/√n) ≥ −t_{n−1,α}}, and the corresponding confidence set is {θ : θ ≤ x̄ + t_{n−1,α} s/√n}.
b. The test in part (a) is the UMP unbiased test, so the interval is the UMA unbiased interval.

9.49 a. Clearly, for each σ, the conditional probability P_{θ0}(X̄ > θ0 + z_α σ/√n | σ) = α, hence the test has unconditional size α. The confidence set is {(θ, σ) : θ ≥ x̄ − z_α σ/√n}, which has confidence coefficient 1 − α conditionally and, hence, unconditionally.
b. From the Karlin-Rubin Theorem, the UMP test is to reject H0 if X > c. To make this size α,

P_{θ0}(X > c) = P_{θ0}(X > c | σ = 10) P(σ = 10) + P_{θ0}(X > c | σ = 1) P(σ = 1)
= pP( (X − θ0)/10 > (c − θ0)/10 ) + (1 − p)P(X − θ0 > c − θ0)
= pP( Z > (c − θ0)/10 ) + (1 − p)P(Z > c − θ0),

where Z ∼ n(0, 1). Without loss of generality take θ0 = 0. For c = z_{(α−p)/(1−p)} we have for the proposed test

P_{θ0}(reject) = p + (1 − p)P( Z > z_{(α−p)/(1−p)} ) = p + (1 − p) (α − p)/(1 − p) = p + α − p = α.

This is not UMP, but it is more powerful than the test of part (a). To get UMP, solve for c in

pP(Z > c/10) + (1 − p)P(Z > c) = α,

and the UMP test is to reject if X > c. For p = 1/2, α = .05, we get c = 12.81. If α = .1 and p = .05, then c = 1.392 and z_{(.1−.05)/.95} = z_{.0526} = 1.62.

9.51

P_θ(θ ∈ C(X_1, ..., X_n)) = P_θ(X̄ − k1 ≤ θ ≤ X̄ + k2) = P_θ(−k2 ≤ X̄ − θ ≤ k1) = P_θ(−k2 ≤ ΣZ_i/n ≤ k1),

where Z_i = X_i − θ, i = 1, ..., n. Since this is a location family, for any θ, Z_1, ..., Z_n are iid with pdf f(z), i.e., the Z_i's are pivots. So the last probability does not depend on θ.

9-14

Solutions Manual for Statistical Inference

9.52 a. The LRT of H0 : σ = σ0 versus H1 : σ ≠ σ0 is based on the statistic

λ(x) = sup_{µ, σ=σ0} L(µ, σ0² | x) / sup_{µ, σ∈(0,∞)} L(µ, σ² | x).

In the denominator, σ̂² = Σ(x_i − x̄)²/n and µ̂ = x̄ are the MLEs, while in the numerator, σ0² and µ̂ = x̄ are the MLEs. Thus

λ(x) = (2πσ0²)^{−n/2} e^{−Σ(x_i−x̄)²/(2σ0²)} / [ (2πσ̂²)^{−n/2} e^{−Σ(x_i−x̄)²/(2σ̂²)} ] = (σ0²/σ̂²)^{−n/2} e^{−Σ(x_i−x̄)²/(2σ0²)} / e^{−n/2},

and, writing σ̂² = [(n−1)/n]s², the LRT rejects H0 if

( σ0²/([(n−1)/n]s²) )^{−n/2} e^{−(n−1)s²/(2σ0²)} < kα,

where kα is chosen to give a size α test. If we denote t = (n−1)s²/σ0², then T ∼ χ²_{n−1} under H0, and the test can be written: reject H0 if t^{n/2} e^{−t/2} < kα. Thus, a 1 − α confidence set is

{ σ² : t^{n/2} e^{−t/2} ≥ kα } = { σ² : ( (n−1)s²/σ² )^{n/2} e^{−(n−1)s²/(2σ²)} ≥ kα }.

Note that the function t^{n/2} e^{−t/2} is unimodal (it is the kernel of a gamma density), so it follows that the confidence set is of the form

{ σ² : t^{n/2} e^{−t/2} ≥ kα } = { σ² : a ≤ t ≤ b } = { σ² : a ≤ (n−1)s²/σ² ≤ b } = { σ² : (n−1)s²/b ≤ σ² ≤ (n−1)s²/a },

where a and b satisfy a^{n/2} e^{−a/2} = b^{n/2} e^{−b/2} (since they are points on the curve t^{n/2} e^{−t/2}). Since n/2 = ((n+2)/2) − 1, a and b also satisfy

[ 1/(Γ((n+2)/2) 2^{(n+2)/2}) ] a^{((n+2)/2)−1} e^{−a/2} = [ 1/(Γ((n+2)/2) 2^{(n+2)/2}) ] b^{((n+2)/2)−1} e^{−b/2},

or f_{n+2}(a) = f_{n+2}(b).
b. The constants a and b must satisfy f_{n−1}(b)b² = f_{n−1}(a)a². But since b^{((n−1)/2)−1} b² = b^{((n+3)/2)−1}, after adjusting constants, this is equivalent to f_{n+3}(b) = f_{n+3}(a). Thus, the values of a and b that give the minimum length interval must satisfy this along with the probability constraint.
c. The confidence interval, say I(s²), will be unbiased if (Definition 9.3.7)

P_{σ²}( σ′² ∈ I(S²) ) ≤ P_{σ²}( σ² ∈ I(S²) ) = 1 − α.

Some algebra will establish

P_{σ²}( σ′² ∈ I(S²) ) = P_{σ²}( ac ≤ (n−1)S²/σ² ≤ bc ) = ∫_{ac}^{bc} f_{n−1}(t) dt,

where c = σ′²/σ². The derivative (with respect to c) of this last expression is bf_{n−1}(bc) − af_{n−1}(ac), and hence it is equal to zero at c = 1 (so the interval is unbiased) if bf_{n−1}(b) = af_{n−1}(a). From the form of the chi-squared pdf, this latter condition is equivalent to f_{n+1}(b) = f_{n+1}(a).
d. By construction, the interval will be 1 − α equal-tailed.
9.53 a. E[b · length(C) − I_C(µ)] = 2cσb − P(|Z| ≤ c), where Z ∼ n(0, 1).
b. (d/dc)[2cσb − P(|Z| ≤ c)] = 2σb − 2(1/√(2π)) e^{−c²/2}.
c. If bσ > 1/√(2π), the derivative is always positive, since e^{−c²/2} < 1.
9.55

E[L((µ,σ), C)] = E[L((µ,σ), C) | S < K] P(S < K) + E[L((µ,σ), C) | S > K] P(S > K)
= E[L((µ,σ), C′) | S < K] P(S < K) + E[L((µ,σ), C) | S > K] P(S > K)
= R((µ,σ), C′) + E[L((µ,σ), C) | S > K] P(S > K),

where the last equality follows because C′ = ∅ if S > K. The conditional expectation in the second term is bounded by

E[L((µ,σ), C) | S > K] = E[b · length(C) − I_C(µ) | S > K]
= E[2bcS − I_C(µ) | S > K]
> E[2bcK − 1 | S > K]  (since S > K and I_C ≤ 1)
= 2bcK − 1,

which is positive if K > 1/(2bc). For those values of K, C′ dominates C.
9.57 a. The distribution of X_{n+1} − X̄ is n(0, σ²(1 + 1/n)), so

P( X_{n+1} ∈ X̄ ± z_{α/2} σ√(1 + 1/n) ) = P(|Z| ≤ z_{α/2}) = 1 − α.

b. p percent of the normal population is in the interval µ ± z_{p/2}σ, so x̄ ± kσ is a 1 − α tolerance interval if

P( µ ± z_{p/2}σ ⊆ X̄ ± kσ ) = P( X̄ − kσ ≤ µ − z_{p/2}σ and X̄ + kσ ≥ µ + z_{p/2}σ ) ≥ 1 − α.

This can be attained by requiring

P( X̄ − kσ ≥ µ − z_{p/2}σ ) = α/2  and  P( X̄ + kσ ≤ µ + z_{p/2}σ ) = α/2,

which is attained for k = z_{p/2} + z_{α/2}/√n.
c. From part (a), (X_{n+1} − X̄)/(S√(1 + 1/n)) ∼ t_{n−1}, so a 1 − α prediction interval is X̄ ± t_{n−1,α/2} S√(1 + 1/n).
Chapter 10

Asymptotic Evaluations

10.1 First calculate some moments for this distribution:

EX = θ/3,  EX² = 1/3,  VarX = 1/3 − θ²/9.

So 3X̄_n is an unbiased estimator of θ, with variance

Var(3X̄_n) = 9(VarX)/n = (3 − θ²)/n → 0 as n → ∞.

So by Theorem 10.1.3, 3X̄_n is a consistent estimator of θ.
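A simulation sketch of this consistency claim. The pdf (1 + θx)/2 on (−1, 1) matches the stated moments; the inverse-cdf sampler and all numerical settings below are assumptions for illustration:

```python
import math
import random

# Exercise 10.1: 3*Xbar should be close to theta, with Var(3*Xbar) = (3 - theta^2)/n.
random.seed(7)
theta, n = 0.5, 50000
total = 0.0
for _ in range(n):
    u = random.random()
    # inverse cdf of f(x) = (1 + theta*x)/2 on (-1, 1)
    total += (-1 + math.sqrt(1 - theta * (2 - theta - 4 * u))) / theta
est = 3 * total / n
sd = math.sqrt((3 - theta**2) / n)   # about 0.0074 here
assert abs(est - theta) < 4 * sd
print("3*Xbar is within 4 sd of theta")
```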
10.3 a. The log likelihood is

−(n/2) log(2πθ) − Σ(x_i − θ)²/(2θ).

Differentiate and set equal to zero, and a little algebra will show that the MLE is the root of θ² + θ − W = 0, where W = Σx_i²/n. The roots of this equation are (−1 ± √(1 + 4W))/2, and the MLE is the root with the plus sign, as it has to be nonnegative.
b. The second derivative of the log likelihood is (−2Σx_i² + nθ)/(2θ³), yielding an expected Fisher information of

I(θ) = −E_θ[ (−2ΣX_i² + nθ)/(2θ³) ] = (2nθ + n)/(2θ²),

and by Theorem 10.1.12 the variance of the MLE is 1/I(θ).
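A quick Python check that the plus-sign root indeed solves the likelihood equation (the value of W is an illustrative assumption):

```python
import math

# Exercise 10.3(a): the MLE of theta in n(theta, theta) solves
# theta^2 + theta - W = 0, W = sum(x_i^2)/n; the nonnegative root is
# (-1 + sqrt(1 + 4W))/2.
W = 2.0
mle = (-1 + math.sqrt(1 + 4 * W)) / 2
assert mle >= 0
assert math.isclose(mle**2 + mle, W)
print(mle)   # 1.0 for W = 2
```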
10.4 a. Write

  Σ X_iY_i / Σ X_i² = Σ X_i(X_i + ε_i) / Σ X_i² = 1 + Σ X_iε_i / Σ X_i².

From normality and independence

  EX_iε_i = 0,  VarX_iε_i = σ²(µ² + τ²),  EX_i² = µ² + τ²,  VarX_i² = 2τ²(2µ² + τ²),

and Cov(X_i², X_iε_i) = 0. Applying the formulas of Example 5.5.27, the asymptotic mean and variance are

  E( Σ X_iY_i / Σ X_i² ) ≈ 1  and  Var( Σ X_iY_i / Σ X_i² ) ≈ nσ²(µ² + τ²)/[n(µ² + τ²)]² = σ²/(n(µ² + τ²)).

b.

  Σ Y_i / Σ X_i = β + Σ ε_i / Σ X_i,

with approximate mean β and variance σ²/(nµ²).
10-2

Solutions Manual for Statistical Inference

c.

  (1/n) Σ Y_i/X_i = β + (1/n) Σ ε_i/X_i,

with approximate mean β and variance σ²/(nµ²).
10.5 a. The integral of ET_n² is unbounded near zero. We have

  ET_n² > √(n/(2πσ²)) ∫₀¹ (1/x²) e^{−n(x−µ)²/(2σ²)} dx > √(n/(2πσ²)) K ∫₀¹ (1/x²) dx = ∞,

where K = min_{0≤x≤1} e^{−n(x−µ)²/(2σ²)} > 0.
b. If we delete the interval (−δ, δ), then the integrand is bounded; that is, over the range of integration, 1/x² < 1/δ².
c. Assume µ > 0. A similar argument works for µ < 0. Then

  P(−δ < X̄ < δ) = P[ √n(−δ − µ) < √n(X̄ − µ) < √n(δ − µ) ] < P[ Z < √n(δ − µ) ],

where Z ∼ n(0, 1). For δ < µ, the probability goes to 0 as n → ∞.
10.7 We need to assume that τ (θ) is differentiable at θ = θ0 , the true value of the parameter. Then we apply Theorem 5.5.24 to Theorem 10.1.12.
10.9 We will do a more general problem that includes a) and b) as special cases. Suppose we want to estimate λ^t e^{−λ}/t! = P(X = t). Let

  T = T(X_1, . . . , X_n) = 1 if X_1 = t, 0 if X_1 ≠ t.

Then ET = P(T = 1) = P(X_1 = t), so T is an unbiased estimator. Since Σ X_i is a complete sufficient statistic for λ, E(T | Σ X_i) is UMVUE. The UMVUE is 0 for y = Σ X_i < t, and for y ≥ t,

  E(T|y) = P( X_1 = t | Σ X_i = y )
         = P( X_1 = t, Σ X_i = y ) / P( Σ X_i = y )
         = P(X_1 = t) P( Σ_{i=2}^n X_i = y − t ) / P( Σ X_i = y )
         = {λ^t e^{−λ}/t!} {[(n−1)λ]^{y−t} e^{−(n−1)λ}/(y−t)!} / {(nλ)^y e^{−nλ}/y!}
         = C(y,t) (n−1)^{y−t} / n^y.

a. The best unbiased estimator of e^{−λ} is ((n−1)/n)^y.
b. The best unbiased estimator of λe^{−λ} is (y/n)[(n−1)/n]^{y−1}.
c. Use the fact that for constants a and b,

  (d/dλ) λ^a b^λ = λ^{a−1} b^λ (a + λ log b),

to calculate the asymptotic variances of the UMVUEs. We have for t = 0,

  ARE( ((n−1)/n)^{nλ̂}, e^{−λ̂} ) = [ e^{−λ} / ( ((n−1)/n)^{nλ} log[((n−1)/n)^n] ) ]²,

Second Edition

10-3

and for t = 1,

  ARE( (nλ̂/(n−1)) ((n−1)/n)^{nλ̂}, λ̂ e^{−λ̂} ) = [ (λ − 1) e^{−λ} / ( (n/(n−1)) ((n−1)/n)^{nλ} (1 + λ log[((n−1)/n)^n]) ) ]².

Since [(n − 1)/n]^n → e^{−1} as n → ∞, both of these AREs are equal to 1 in the limit.
d. For these data, n = 15, Σ x_i = y = 104, and the MLE of λ is λ̂ = x̄ = 6.9333. The estimates are

              MLE       UMVUE
  P(X = 0)   .000975   .000765
  P(X = 1)   .006758   .005684
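As a quick numerical check of the table in part (d) (a Python sketch; the manual's own computations use R and Mathematica):

```python
import math

n, y = 15, 104
lam = y / n  # MLE of lambda: 6.9333

# MLE plug-in estimates of P(X = t) = lambda^t e^{-lambda} / t!
mle_p0 = math.exp(-lam)
mle_p1 = lam * math.exp(-lam)

# UMVUEs from parts (a) and (b)
umvue_p0 = ((n - 1) / n) ** y
umvue_p1 = (y / n) * ((n - 1) / n) ** (y - 1)

print(round(mle_p0, 6), round(umvue_p0, 6))  # 0.000975 0.000765
print(round(mle_p1, 6), round(umvue_p1, 6))  # 0.006758 0.005685
```

The last value rounds to .005685; the table's .005684 reflects truncation rather than rounding.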
10.11 a. It is easiest to use the Mathematica code in Example A.0.7. The second derivative of the log likelihood is

  (∂²/∂µ²) log[ (1/(Γ(µ/β) β^{µ/β})) x^{µ/β − 1} e^{−x/β} ] = −(1/β²) ψ′(µ/β),

where ψ(z) = Γ′(z)/Γ(z) is the digamma function.
b. Estimation of β does not affect the calculation.
c. For µ = αβ known, the MOM estimate of β is x̄/α. The MLE comes from differentiating the log likelihood:

  (d/dβ) [ −αn log β − Σ_i x_i/β ] = 0  ⇒  β̂ = x̄/α.

d. The MOM estimate of β comes from solving

  (1/n) Σ_i x_i = µ  and  (1/n) Σ_i x_i² = µ² + µβ,

which yields β̃ = σ̂²/x̄. The approximate variance is quite a pain to calculate. Start from

  EX̄ = µ,  VarX̄ = (1/n)µβ,  Eσ̂² ≈ µβ,  Varσ̂² ≈ (2/n)µβ³,

where we used Exercise 5.8(b) for the variance of σ̂². Now using Example 5.5.27 (and assuming the covariance is zero), we have Varβ̃ ≈ 3β³/(nµ). The ARE is then

  ARE(β̂, β̃) = (3β³/µ) E[ −d² l(µ, β|X)/dβ² ].

Here is a small table of AREs. There are some entries that are less than one; this is due to using an approximation for the MOM variance.

   β      µ = 1    µ = 3    µ = 6    µ = 10
   1      1.878    0.547    0.262    0.154
   2      4.238    1.179    0.547    0.317
   3      6.816    1.878    0.853    0.488
   4      9.509    2.629    1.179    0.667
   5      12.27    3.419    1.521    0.853
   6      15.075   4.238    1.878    1.046
   7      17.913   5.08     2.248    1.246
   8      20.774   5.941    2.629    1.451
   9      23.653   6.816    3.02     1.662
   10     26.546   7.704    3.419    1.878


10.13 Here are the 35 distinct samples from {2, 4, 9, 12} and their weights.
{12, 12, 12, 12}, 1/256
{9, 9, 9, 12}, 1/64
{4, 9, 12, 12}, 3/64
{4, 4, 12, 12}, 3/128
{4, 4, 4, 12}, 1/64
{2, 12, 12, 12}, 1/64
{2, 9, 9, 9}, 1/64
{2, 4, 9, 9}, 3/64
{2, 4, 4, 4}, 1/64
{2, 2, 9, 9}, 3/128
{2, 2, 4, 4}, 3/128
{2, 2, 2, 4}, 1/64

{9, 12, 12, 12}, 1/64
{9, 9, 9, 9}, 1/256
{4, 9, 9, 12}, 3/64
{4, 4, 9, 12}, 3/64
{4, 4, 4, 9}, 1/64
{2, 9, 12, 12}, 3/64
{2, 4, 12, 12}, 3/64
{2, 4, 4, 12}, 3/64
{2, 2, 12, 12}, 3/128
{2, 2, 4, 12}, 3/64
{2, 2, 2, 12}, 1/64
{2, 2, 2, 2}, 1/256

{9, 9, 12, 12}, 3/128
{4, 12, 12, 12}, 1/64
{4, 9, 9, 9}, 1/64
{4, 4, 9, 9}, 3/128
{4, 4, 4, 4}, 1/256
{2, 9, 9, 12}, 3/64
{2, 4, 9, 12}, 3/32
{2, 4, 4, 9}, 3/64
{2, 2, 9, 12}, 3/64
{2, 2, 4, 9}, 3/64
{2, 2, 2, 9}, 1/64

The verifications of parts (a)-(d) can be done with this table, or the table of means in Example A.0.1 can be used. For part (e), verifying the bootstrap identities can involve much painful algebra, but it can be made easier if we understand what the bootstrap sample space (the space of all n^n bootstrap samples) looks like. Given a sample x_1, x_2, . . . , x_n, the bootstrap sample space can be thought of as a data array with n^n rows (one for each bootstrap sample) and n columns, so each row of the data array is one bootstrap sample. For example, if the sample size is n = 3, the bootstrap sample space is

  x1 x1 x1    x2 x1 x1    x3 x1 x1
  x1 x1 x2    x2 x1 x2    x3 x1 x2
  x1 x1 x3    x2 x1 x3    x3 x1 x3
  x1 x2 x1    x2 x2 x1    x3 x2 x1
  x1 x2 x2    x2 x2 x2    x3 x2 x2
  x1 x2 x3    x2 x2 x3    x3 x2 x3
  x1 x3 x1    x2 x3 x1    x3 x3 x1
  x1 x3 x2    x2 x3 x2    x3 x3 x2
  x1 x3 x3    x2 x3 x3    x3 x3 x3

(27 rows, displayed here in three blocks of nine). Note the pattern. The first column is 9 x1s followed by 9 x2s followed by 9 x3s; the second column is 3 x1s followed by 3 x2s followed by 3 x3s, then repeated, etc. In general, for the entire bootstrap sample space,

◦ The first column is n^{n−1} x1s followed by n^{n−1} x2s followed by, . . ., followed by n^{n−1} xns.
◦ The second column is n^{n−2} x1s followed by n^{n−2} x2s followed by, . . ., followed by n^{n−2} xns, repeated n times.
◦ The third column is n^{n−3} x1s followed by n^{n−3} x2s followed by, . . ., followed by n^{n−3} xns, repeated n² times.
  .
  .
  .
◦ The nth column is 1 x1 followed by 1 x2 followed by, . . ., followed by 1 xn, repeated n^{n−1} times.

So now it is easy to see that each column in the data array has mean x̄, hence the entire bootstrap data set has mean x̄. Appealing to the 3³ × 3 data array, we can write the numerator of the variance of the bootstrap means as

  Σ_{i=1}^3 Σ_{j=1}^3 Σ_{k=1}^3 [ (1/3)(x_i + x_j + x_k) − x̄ ]²
    = (1/3²) Σ_i Σ_j Σ_k [ (x_i − x̄) + (x_j − x̄) + (x_k − x̄) ]²
    = (1/3²) Σ_i Σ_j Σ_k [ (x_i − x̄)² + (x_j − x̄)² + (x_k − x̄)² ],

because all of the cross terms are zero (since they are the sum of deviations from the mean). Summing up and collecting terms shows that

  (1/3²) Σ_i Σ_j Σ_k [ (x_i − x̄)² + (x_j − x̄)² + (x_k − x̄)² ] = 3 Σ_{i=1}^3 (x_i − x̄)²,

and thus the variance of the bootstrap means is

  3 Σ_{i=1}^3 (x_i − x̄)² / 3³,

which is the usual estimate of the variance of X̄ if we divide by n instead of n − 1. The general result should now be clear. The variance of the bootstrap means is

  (1/n^n) Σ_{i_1=1}^n Σ_{i_2=1}^n · · · Σ_{i_n=1}^n [ (1/n)(x_{i_1} + x_{i_2} + · · · + x_{i_n}) − x̄ ]²
    = (1/n^n)(1/n²) Σ_{i_1} · · · Σ_{i_n} [ (x_{i_1} − x̄)² + (x_{i_2} − x̄)² + · · · + (x_{i_n} − x̄)² ],

since all of the cross terms are zero. Summing and collecting terms shows that the inner sum is n · n^{n−1} Σ_{i=1}^n (x_i − x̄)², and the variance of the bootstrap means is

  n^n Σ_{i=1}^n (x_i − x̄)² / (n² n^n) = Σ_{i=1}^n (x_i − x̄)² / n².
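For the sample {2, 4, 9, 12}, this identity can be checked by brute-force enumeration of all 4⁴ = 256 bootstrap samples (a Python sketch; the manual's own code is in R):

```python
from itertools import product

x = [2, 4, 9, 12]
n = len(x)
xbar = sum(x) / n

# Enumerate all n^n bootstrap samples and average the squared
# deviation of each bootstrap mean from xbar.
boot_means = [sum(s) / n for s in product(x, repeat=n)]
boot_var = sum((m - xbar) ** 2 for m in boot_means) / n ** n

# The closed form derived above: sum (x_i - xbar)^2 / n^2
closed_form = sum((xi - xbar) ** 2 for xi in x) / n ** 2

print(boot_var, closed_form)  # both 3.921875
```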
10.15 a. As B → ∞, Var*_B(θ̂) → Var*(θ̂).
b. Each Var*_{B_i}(θ̂) is a sample variance, and they are independent, so the LLN applies and

  (1/m) Σ_{i=1}^m Var*_{B_i}(θ̂) → E Var*_B(θ̂) = Var*(θ̂) as m → ∞,

where the last equality follows from Theorem 5.2.6(c).


10.17 a. The correlation is .7781.
b. Here is R code (R is available free at http://cran.r-project.org/) to bootstrap the data, calculate the standard deviation, and produce the histogram:

  cor(law)

We have

  P( [χ²_{k−1}(δ)/(k − 1)] / [χ²_{N−k}/(N − k)] > F_{k−1,N−k,α} )
    ≥ P( [χ²_{k−1}(0)/(k − 1)] / [χ²_{N−k}/(N − k)] > F_{k−1,N−k,α} ) = α,

where the inequality follows from the fact that the noncentral chi squared is stochastically increasing in the noncentrality parameter.


11.14 Let X_i ∼ n(θ_i, σ²). Then from Exercise 11.11

  Cov( Σ_i (a_i/√c_i) X_i, Σ_i √c_i v_i X_i ) = σ² Σ_i a_i v_i,
  Var( Σ_i (a_i/√c_i) X_i ) = σ² Σ_i a_i²/c_i,
  Var( Σ_i √c_i v_i X_i ) = σ² Σ_i c_i v_i²,

and the Cauchy-Schwarz inequality gives

  ( Σ_i a_i v_i )² ≤ ( Σ_i a_i²/c_i ) ( Σ_i c_i v_i² ).

If a_i = c_i v_i this is an equality, hence the LHS is maximized. The simultaneous statement is equivalent to

  [ Σ_{i=1}^k a_i (ȳ_i· − θ_i) ]² / ( s_p² Σ_{i=1}^k a_i²/n_i ) ≤ M for all a_1, . . . , a_k,

and the LHS is maximized by a_i = n_i(ȳ_i· − θ_i). This produces the F statistic.
11.15 a. Since t_ν² = F_{1,ν}, it follows from Exercise 5.19(b) that for k ≥ 2,

  P[ (k − 1)F_{k−1,ν} ≥ a ] ≥ P( t_ν² ≥ a ).

So if a = t²_{ν,α/2}, the F probability is greater than α, and thus the α-level cutoff for the F must be greater than t²_{ν,α/2}.
b. The only difference in the intervals is the cutoff point, so the Scheffé intervals are wider.
c. Both sets of intervals have nominal level 1 − α, but since the Scheffé intervals are wider, tests based on them have a smaller rejection region. In fact, the rejection region is contained in the t rejection region. So the t is more powerful.
11.16 a. If θ_i = θ_j for all i, j, then θ_i − θ_j = 0 for all i, j, and the converse is also true.
b. H_0: θ ∈ ∩_{ij} Θ_{ij} and H_1: θ ∈ ∪_{ij} (Θ_{ij})^c.
11.17 a. If all of the means are equal, the Scheffé test will only reject α of the time, so the t tests will be done only α of the time. The experimentwise error rate is preserved.
b. This follows from the fact that the t tests use a smaller cutoff point, so there can be rejection using the t test but no rejection using Scheffé. Since Scheffé has experimentwise level α, the t test has experimentwise error greater than α.
c. The pooled standard deviation is 2.358, and the means and t statistics are

          Low    Medium   High
  Mean    3.51   9.27     24.93

               Med-Low   High-Med   High-Low
  t statistic  3.86      10.49      14.36

The t statistics all have 12 degrees of freedom and, for example, t_{12,.01} = 2.68, so all of the tests reject and we conclude that the means are all significantly different.
11.18 a.

  P(Y > a | Y > b) = P(Y > a, Y > b)/P(Y > b)
                   = P(Y > a)/P(Y > b)      (a > b)
                   > P(Y > a).              (P(Y > b) < 1)

b. If a is a cutoff point then we would declare significance if Y > a. But if we only check whether Y is significant because we see a big Y (Y > b), the proper significance level is P(Y > a | Y > b), which will show less significance than P(Y > a).


11.19 a. The marginal distributions of the Y_i are somewhat straightforward to derive. As X_{i+1} ∼ gamma(λ_{i+1}, 1) and, independently, Σ_{j=1}^i X_j ∼ gamma(Σ_{j=1}^i λ_j, 1) (Example 4.6.8), we only need to derive the distribution of the ratio of two independent gammas. Let X ∼ gamma(λ_1, 1) and Y ∼ gamma(λ_2, 1). Make the transformation

  u = x/y, v = y  ⇔  x = uv, y = v,

with Jacobian v. The density of (U, V) is

  f(u, v) = [1/(Γ(λ_1)Γ(λ_2))] (uv)^{λ_1−1} v^{λ_2−1} v e^{−uv} e^{−v} = [u^{λ_1−1}/(Γ(λ_1)Γ(λ_2))] v^{λ_1+λ_2−1} e^{−v(1+u)}.

To get the density of U, integrate with respect to v. Note that we have the kernel of a gamma(λ_1 + λ_2, 1/(1 + u)), which yields

  f(u) = [Γ(λ_1 + λ_2)/(Γ(λ_1)Γ(λ_2))] u^{λ_1−1}/(1 + u)^{λ_1+λ_2}.
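As a sanity check that f(u) integrates to one, it can be integrated numerically for particular shape parameters (a Python sketch; the choices λ₁ = 2, λ₂ = 3 are arbitrary):

```python
import math

def f(u, lam1=2.0, lam2=3.0):
    """Density of U = X/Y for independent gamma(lam1,1) and gamma(lam2,1)."""
    const = math.gamma(lam1 + lam2) / (math.gamma(lam1) * math.gamma(lam2))
    return const * u ** (lam1 - 1) / (1 + u) ** (lam1 + lam2)

# Map (0, inf) to (0, 1) via u = t/(1-t), du = dt/(1-t)^2; midpoint rule.
N = 20000
total = 0.0
for k in range(N):
    t = (k + 0.5) / N
    u = t / (1 - t)
    total += f(u) / (1 - t) ** 2 / N

print(round(total, 6))  # 1.0
```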

The joint distribution is a nightmare. We have to make a multivariate change of variable. This is made a bit more palatable if we do it in two steps. First transform

  W_1 = X_1, W_2 = X_1 + X_2, W_3 = X_1 + X_2 + X_3, . . . , W_n = X_1 + X_2 + · · · + X_n,

with

  X_1 = W_1, X_2 = W_2 − W_1, X_3 = W_3 − W_2, . . . , X_n = W_n − W_{n−1},

and Jacobian 1. The joint density of the W_i is

  f(w_1, w_2, . . . , w_n) = Π_{i=1}^n [1/Γ(λ_i)] (w_i − w_{i−1})^{λ_i−1} e^{−w_n},  w_1 ≤ w_2 ≤ · · · ≤ w_n,

where we set w_0 = 0 and note that the exponent telescopes. Next note that

  y_1 = (w_2 − w_1)/w_1, y_2 = (w_3 − w_2)/w_2, . . . , y_{n−1} = (w_n − w_{n−1})/w_{n−1}, y_n = w_n,

with

  w_i = y_n / Π_{j=i}^{n−1}(1 + y_j), i = 1, . . . , n − 1,  w_n = y_n.

Since each w_i only involves y_j with j ≥ i, the Jacobian matrix is triangular and the determinant is the product of the diagonal elements. We have

  dw_i/dy_i = −y_n / [ (1 + y_i) Π_{j=i}^{n−1}(1 + y_j) ], i = 1, . . . , n − 1,  dw_n/dy_n = 1,

and

  f(y_1, y_2, . . . , y_n) = [1/Γ(λ_1)] [ y_n / Π_{j=1}^{n−1}(1 + y_j) ]^{λ_1−1}
    × Π_{i=2}^n [1/Γ(λ_i)] [ y_{i−1} y_n / Π_{j=i−1}^{n−1}(1 + y_j) ]^{λ_i−1} e^{−y_n}
    × Π_{i=1}^{n−1} y_n / [ (1 + y_i) Π_{j=i}^{n−1}(1 + y_j) ].


Factor out the terms with y_n and do some algebra on the middle term to get

  f(y_1, y_2, . . . , y_n) = y_n^{Σ_i λ_i − 1} e^{−y_n} × [1/Γ(λ_1)] [ 1 / Π_{j=1}^{n−1}(1 + y_j) ]^{λ_1−1}
    × Π_{i=2}^n [1/Γ(λ_i)] [ y_{i−1}/(1 + y_{i−1}) ]^{λ_i−1} [ 1 / Π_{j=i}^{n−1}(1 + y_j) ]^{λ_i−1}
    × Π_{i=1}^{n−1} 1 / [ (1 + y_i) Π_{j=i}^{n−1}(1 + y_j) ].

We see that Y_n is independent of the other Y_i (and has a gamma distribution), but there does not seem to be any other obvious conclusion to draw from this density.
b. The Y_i are related to the F distribution in the ANOVA. For example, as long as the sums of the λ_i are integers,

  Y_i = X_{i+1} / Σ_{j=1}^i X_j = 2X_{i+1} / ( 2 Σ_{j=1}^i X_j ) = χ²_{2λ_{i+1}} / χ²_{2Σ_{j=1}^i λ_j} ∼ ( λ_{i+1} / Σ_{j=1}^i λ_j ) F_{2λ_{i+1}, 2Σ_{j=1}^i λ_j}.

Note that the F density makes sense even if the λ_i are not integers.
11.21 a.

  Grand mean ȳ·· = 188.54/15 = 12.57.

  Total sum of squares = Σ_{i=1}^3 Σ_{j=1}^5 (y_ij − ȳ··)² = 1295.01.

  Within SS = Σ_{i=1}^3 Σ_{j=1}^5 (y_ij − ȳ_i·)²
    = Σ_{j=1}^5 (y_1j − 3.508)² + Σ_{j=1}^5 (y_2j − 9.274)² + Σ_{j=1}^5 (y_3j − 24.926)²
    = 1.089 + 2.189 + 63.459 = 66.74.

  Between SS = 5 Σ_{i=1}^3 (ȳ_i· − ȳ··)² = 5(82.120 + 10.864 + 152.671) = 245.65 · 5 = 1228.25.

ANOVA table:

  Source     df   SS        MS       F
  Treatment  2    1228.25   614.125  110.42
  Within     12   66.74     5.562
  Total      14   1294.99

Note that the total SS here is different from above; round-off error is to blame. Also, F_{2,12} = 110.42 is highly significant.
b. Completing the proof of (11.2.4), we have

  Σ_{i=1}^k Σ_{j=1}^{n_i} (y_ij − ȳ)² = Σ_{i=1}^k Σ_{j=1}^{n_i} [ (y_ij − ȳ_i·) + (ȳ_i· − ȳ) ]²
    = Σ_i Σ_j (y_ij − ȳ_i·)² + Σ_i Σ_j (ȳ_i· − ȳ)² + 2 Σ_i Σ_j (y_ij − ȳ_i·)(ȳ_i· − ȳ),

where the cross term (the sum over j) is zero, so the sum of squares is partitioned as

  Σ_{i=1}^k Σ_{j=1}^{n_i} (y_ij − ȳ_i·)² + Σ_{i=1}^k n_i (ȳ_i· − ȳ)².

c. From a), the F statistic for the ANOVA is 110.42. The individual two-sample t's, using s_p² = (1/12)(66.74) = 5.5617, are

  t²_12 = (3.508 − 9.274)² / [(5.5617)(2/5)] = 33.247/2.2247 = 14.945,
  t²_13 = (3.508 − 24.926)² / 2.2247 = 206.201,
  t²_23 = (9.274 − 24.926)² / 2.2247 = 110.122,

and

  [ 2(14.945) + 2(206.201) + 2(110.122) ] / 6 = 110.42 = F.

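The part (c) identity (the F statistic equals the average of the pairwise t² statistics) can be verified directly (a quick Python check of the arithmetic):

```python
sp2 = 66.74 / 12        # pooled variance, 5.5617
scale = sp2 * (2 / 5)   # variance factor for a difference of two means, n_i = 5
low, med, high = 3.508, 9.274, 24.926

t2_12 = (low - med) ** 2 / scale
t2_13 = (low - high) ** 2 / scale
t2_23 = (med - high) ** 2 / scale

f_stat = (2 * t2_12 + 2 * t2_13 + 2 * t2_23) / 6
print(round(f_stat, 2))  # 110.42
```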
11.23 a.

  EY_ij = E(µ + τ_i + b_j + ε_ij) = µ + τ_i + Eb_j + Eε_ij = µ + τ_i,
  VarY_ij = Varb_j + Varε_ij = σ_B² + σ²,

by independence of b_j and ε_ij.
b.

  Var( Σ_{i=1}^n a_i Ȳ_i· ) = Σ_{i=1}^n a_i² VarȲ_i· + 2 Σ_{i>i′} Cov( a_i Ȳ_i·, a_{i′} Ȳ_{i′}· ).

The first term is

  Σ_i a_i² VarȲ_i· = Σ_i a_i² Var( µ + τ_i + (1/r) Σ_{j=1}^r (b_j + ε_ij) ) = (1/r²) Σ_i a_i² (rσ_B² + rσ²)

from part (a). For the covariance, EȲ_i· = µ + τ_i, and

  E( Ȳ_i· Ȳ_{i′}· ) = E[ ( µ + τ_i + (1/r) Σ_j (b_j + ε_ij) ) ( µ + τ_{i′} + (1/r) Σ_j (b_j + ε_{i′}j) ) ]
    = (µ + τ_i)(µ + τ_{i′}) + (1/r²) E[ Σ_j (b_j + ε_ij) Σ_j (b_j + ε_{i′}j) ],

since the cross terms have expectation zero. Next, expanding the product in the second term again gives all zero cross terms, and we have

  E( Ȳ_i· Ȳ_{i′}· ) = (µ + τ_i)(µ + τ_{i′}) + (1/r²)(rσ_B²),

and

  Cov( Ȳ_i·, Ȳ_{i′}· ) = σ_B²/r.

Finally, this gives

  Var( Σ_{i=1}^n a_i Ȳ_i· ) = (1/r²) Σ_i a_i² (rσ_B² + rσ²) + 2 Σ_{i>i′} a_i a_{i′} σ_B²/r
    = (1/r) [ σ² Σ_i a_i² + σ_B² ( Σ_i a_i )² ]
    = (1/r) σ² Σ_i a_i²
    = (1/r)(σ² + σ_B²)(1 − ρ) Σ_i a_i²,

where in the third equality we used the fact that Σ_i a_i = 0.

11.25 Differentiation yields
a.

  (∂/∂c) RSS = 2 Σ [y_i − (c + dx_i)](−1) = 0  ⇒  nc + d Σ x_i = Σ y_i,
  (∂/∂d) RSS = 2 Σ [y_i − (c + dx_i)](−x_i) = 0  ⇒  c Σ x_i + d Σ x_i² = Σ x_i y_i.

b. Note that nc + d Σ x_i = Σ y_i ⇒ c = ȳ − dx̄. Then

  (ȳ − dx̄) Σ x_i + d Σ x_i² = Σ x_i y_i  ⇒  d ( Σ x_i² − nx̄² ) = Σ x_i y_i − nx̄ȳ,

which simplifies to d = Σ x_i (y_i − ȳ) / Σ (x_i − x̄)². Thus c and d are the least squares estimates.
c. The second derivatives are

  (∂²/∂c²) RSS = 2n,  (∂²/∂c∂d) RSS = 2 Σ x_i,  (∂²/∂d²) RSS = 2 Σ x_i².

Thus (dropping the factor of 2) the determinant of the matrix of second-order partials is

  | n      Σ x_i  |
  | Σ x_i  Σ x_i² |  =  n Σ x_i² − ( Σ x_i )² = n Σ (x_i − x̄)² > 0,

so (c, d) is a minimum.
11.27 For the linear estimator Σ_i a_i Y_i to be unbiased for α we have

  E( Σ_i a_i Y_i ) = Σ_i a_i (α + βx_i) = α  ⇒  Σ_i a_i = 1 and Σ_i a_i x_i = 0.

Since Var( Σ_i a_i Y_i ) = σ² Σ_i a_i², we need to solve:

  minimize Σ_i a_i² subject to Σ_i a_i = 1 and Σ_i a_i x_i = 0.

A solution can be found with Lagrange multipliers, but verifying that it is a minimum is excruciating. So instead we note that

  Σ_i a_i = 1  ⇒  a_i = 1/n + k(b_i − b̄),

for some constants k, b_1, b_2, . . . , b_n, and

  Σ_i a_i x_i = 0  ⇒  k = −x̄ / Σ_i (b_i − b̄)(x_i − x̄)  and  a_i = 1/n − x̄(b_i − b̄) / Σ_i (b_i − b̄)(x_i − x̄).

Now

  Σ_i a_i² = Σ_i [ 1/n − x̄(b_i − b̄)/Σ_i(b_i − b̄)(x_i − x̄) ]² = 1/n + x̄² Σ_i (b_i − b̄)² / [ Σ_i (b_i − b̄)(x_i − x̄) ]²,

since the cross term is zero. So we need to minimize the last term. From Cauchy-Schwarz we know that

  Σ_i (b_i − b̄)² / [ Σ_i (b_i − b̄)(x_i − x̄) ]² ≥ 1 / Σ_i (x_i − x̄)²,

and the minimum is attained at b_i = x_i. Substituting back we get that the minimizing a_i is

  a_i = 1/n − x̄(x_i − x̄) / Σ_i (x_i − x̄)²,

which results in Σ_i a_i Y_i = Ȳ − β̂x̄, the least squares estimator.
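A direct numerical check that these weights satisfy the constraints and reproduce the least squares intercept (the data values here are arbitrary):

```python
x = [1.0, 2.0, 4.0, 7.0]
y = [2.1, 2.9, 5.2, 8.0]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Optimal weights a_i = 1/n - xbar(x_i - xbar)/Sxx derived above
a = [1 / n - xbar * (xi - xbar) / sxx for xi in x]
assert abs(sum(a) - 1) < 1e-12                             # sum a_i = 1
assert abs(sum(ai * xi for ai, xi in zip(a, x))) < 1e-12   # sum a_i x_i = 0

# Least squares intercept: alpha-hat = ybar - beta-hat xbar
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
alpha = ybar - beta * xbar

print(abs(sum(ai * yi for ai, yi in zip(a, y)) - alpha) < 1e-12)  # True
```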

11.28 To calculate

  max_{σ²} L(σ² | y, α̂, β̂) = max_{σ²} ( 1/(2πσ²) )^{n/2} e^{−Σ_i [y_i − (α̂ + β̂x_i)]²/(2σ²)},

take logs and differentiate with respect to σ² to get

  (d/dσ²) log L(σ² | y, α̂, β̂) = −n/(2σ²) + Σ_i [y_i − (α̂ + β̂x_i)]² / (2(σ²)²).

Set this equal to zero and solve for σ². The solution is σ̂².
11.29 a.

  Eε̂_i = E(Y_i − α̂ − β̂x_i) = (α + βx_i) − α − βx_i = 0.

b.

  Varε̂_i = E[Y_i − α̂ − β̂x_i]² = E[ (Y_i − α − βx_i) − (α̂ − α) − x_i(β̂ − β) ]²
    = VarY_i + Varα̂ + x_i² Varβ̂ − 2Cov(Y_i, α̂) − 2x_i Cov(Y_i, β̂) + 2x_i Cov(α̂, β̂).
11.30 a. Straightforward algebra shows

  α̂ = ȳ − β̂x̄ = (1/n) Σ y_i − x̄ Σ (x_i − x̄)y_i / Σ (x_i − x̄)² = Σ_i [ 1/n − x̄(x_i − x̄)/Σ_j (x_j − x̄)² ] y_i.

b. Note that for c_i = 1/n − x̄(x_i − x̄)/Σ(x_j − x̄)², Σ c_i = 1 and Σ c_i x_i = 0. Then

  Eα̂ = E( Σ c_i Y_i ) = Σ c_i (α + βx_i) = α,
  Varα̂ = Σ c_i² VarY_i = σ² Σ c_i²,

and

  Σ c_i² = Σ [ 1/n − x̄(x_i − x̄)/Σ(x_j − x̄)² ]²
         = 1/n + x̄² Σ(x_i − x̄)² / [ Σ(x_j − x̄)² ]²    (cross term = 0)
         = 1/n + x̄²/Σ(x_i − x̄)²
         = Σ x_i² / (nS_xx).

c. Write β̂ = Σ d_i y_i, where

  d_i = (x_i − x̄) / Σ(x_j − x̄)².

From Exercise 11.11,

  Cov(α̂, β̂) = Cov( Σ c_i Y_i, Σ d_i Y_i ) = σ² Σ c_i d_i
    = σ² Σ [ 1/n − x̄(x_i − x̄)/Σ(x_j − x̄)² ] (x_i − x̄)/Σ(x_j − x̄)²
    = −σ² x̄ / Σ(x_i − x̄)².
11.31 The fact that

  ε̂_i = Σ_j [ δ_ij − (c_j + d_j x_i) ] Y_j

follows directly from (11.3.27) and the definition of c_j and d_j. Since α̂ = Σ_i c_i Y_i, from Lemma 11.3.2

  Cov(ε̂_i, α̂) = σ² Σ_j c_j [ δ_ij − (c_j + d_j x_i) ] = σ² ( c_i − Σ_j c_j² − x_i Σ_j c_j d_j ).

Substituting for c_j and d_j gives

  c_i = 1/n − (x_i − x̄)x̄/S_xx,  Σ_j c_j² = 1/n + x̄²/S_xx,  Σ_j c_j d_j = −x̄/S_xx,

and substituting these values shows Cov(ε̂_i, α̂) = 0. Similarly, for β̂,

  Cov(ε̂_i, β̂) = σ² ( d_i − Σ_j c_j d_j − x_i Σ_j d_j² ),

with

  d_i = (x_i − x̄)/S_xx,  Σ_j c_j d_j = −x̄/S_xx,  Σ_j d_j² = 1/S_xx,

and substituting these values shows Cov(ε̂_i, β̂) = 0.
11.32 Write the models as

  y_i = α + βx_i + ε_i  and  y_i = α′ + β′(x_i − x̄) + ε_i = α′ + β′z_i + ε_i.

a. Since z̄ = 0,

  β̂ = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)² = Σ z_i (y_i − ȳ) / Σ z_i² = β̂′.

b.

  α̂ = ȳ − β̂x̄,  while  α̂′ = ȳ − β̂′z̄ = ȳ,

since z̄ = 0.
c. Write

  α̂′ = (1/n) Σ y_i  and  β̂′ = Σ ( z_i / Σ z_j² ) y_i.

Then α̂′ ∼ n(α′ + β′z̄, σ²/n) = n(α′, σ²/n), and

  Cov(α̂′, β̂′) = σ² Σ_i (1/n)( z_i / Σ z_j² ) = 0,

since Σ z_i = 0.

11.33 a. From (11.3.25), β = ρ(σ_Y/σ_X), so β = 0 if and only if ρ = 0 (since we assume that the variances are positive).
b. Start from the display following (11.3.35). We have

  β̂² / (S²/S_xx) = (S_xy²/S_xx) / ( RSS/(n − 2) )
    = (n − 2) S_xy² / [ S_xx ( S_yy − S_xy²/S_xx ) ]
    = (n − 2) S_xy² / ( S_yy S_xx − S_xy² ),

and dividing top and bottom by S_yy S_xx finishes the proof.
c. From (11.3.33), if ρ = 0 (equivalently β = 0), then β̂/(S/√S_xx) = √(n − 2) r/√(1 − r²) has a t_{n−2} distribution.

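The identity in part (c) between the slope t statistic and the correlation can be checked numerically on arbitrary data (a Python sketch):

```python
import math

# Arbitrary data; any x, y with nonzero correlation would do
x = [1.0, 2.0, 3.0, 5.0, 8.0]
y = [1.2, 1.9, 3.5, 4.4, 7.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))

# Slope t statistic: beta-hat / (S / sqrt(Sxx))
beta = sxy / sxx
s2 = (syy - sxy ** 2 / sxx) / (n - 2)
t_stat = beta / math.sqrt(s2 / sxx)

# Equivalent form in terms of r
r = sxy / math.sqrt(sxx * syy)
t_from_r = math.sqrt(n - 2) * r / math.sqrt(1 - r ** 2)

print(abs(t_stat - t_from_r) < 1e-10)  # True
```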

11.34 a. ANOVA table for height data:

  Source      df   SS      MS      F
  Regression  1    60.36   60.36   50.7
  Residual    6    7.14    1.19
  Total       7    67.50

The least squares line is ŷ = 35.18 + .93x.
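The arithmetic of the table can be confirmed in a couple of lines (a Python check):

```python
ss_reg, df_reg = 60.36, 1
ss_res, df_res = 7.14, 6

ms_reg = ss_reg / df_reg   # 60.36
ms_res = ss_res / df_res   # 1.19
f_stat = ms_reg / ms_res

print(round(ms_res, 2), round(f_stat, 1))  # 1.19 50.7
```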
b. Since y_i − ȳ = (y_i − ŷ_i) + (ŷ_i − ȳ), we just need to show that the cross term is zero:

  Σ_{i=1}^n (y_i − ŷ_i)(ŷ_i − ȳ) = Σ_i [ y_i − (α̂ + β̂x_i) ][ (α̂ + β̂x_i) − ȳ ]
    = Σ_i [ (y_i − ȳ) − β̂(x_i − x̄) ] β̂(x_i − x̄)      (α̂ = ȳ − β̂x̄)
    = β̂ Σ_i (x_i − x̄)(y_i − ȳ) − β̂² Σ_i (x_i − x̄)² = 0,

from the definition of β̂.
c.

  Σ (ŷ_i − ȳ)² = β̂² Σ (x_i − x̄)² = S_xy²/S_xx.
11.35 a. For the least squares estimate:

  (d/dθ) Σ_i (y_i − θx_i²)² = −2 Σ_i (y_i − θx_i²) x_i² = 0,

which implies

  θ̂ = Σ_i y_i x_i² / Σ_i x_i⁴.

b. The log likelihood is

  log L = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_i (y_i − θx_i²)²,

and maximizing this is the same as the minimization in part (a).
c. The derivatives of the log likelihood are

  (d/dθ) log L = (1/σ²) Σ_i (y_i − θx_i²) x_i²,  (d²/dθ²) log L = −(1/σ²) Σ_i x_i⁴,

so the CRLB is σ²/Σ_i x_i⁴. The variance of θ̂ is

  Varθ̂ = Var( Σ_i y_i x_i² / Σ_j x_j⁴ ) = Σ_i ( x_i² / Σ_j x_j⁴ )² σ² = σ² / Σ_i x_i⁴,

so θ̂ is the best unbiased estimator.


11.36 a.

  Eα̂ = E(Ȳ − β̂X̄) = E[ E(Ȳ − β̂X̄ | X) ] = E[ α + βX̄ − βX̄ ] = Eα = α,
  Eβ̂ = E[ E(β̂ | X) ] = Eβ = β.

b. Recall

  VarY = Var[E(Y|X)] + E[Var(Y|X)],
  Cov(Y, Z) = Cov[ E(Y|X), E(Z|X) ] + E[ Cov(Y, Z|X) ].

Since E(α̂|X) = α and E(β̂|X) = β are constant, the first terms are zero. Thus

  Varα̂ = E[Var(α̂|X)] = σ² E[ Σ X_i² / (nS_XX) ],
  Varβ̂ = σ² E[ 1/S_XX ],
  Cov(α̂, β̂) = E[ Cov(α̂, β̂|X) ] = −σ² E[ X̄/S_XX ].
11.37 This is almost the same problem as Exercise 11.35. The log likelihood is

  log L = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_i (y_i − βx_i)².

The MLE is Σ_i x_i y_i / Σ_i x_i², with mean β and variance σ²/Σ_i x_i², the CRLB.
11.38 a. The model is y_i = θx_i + ε_i, so the least squares estimate of θ is Σ x_i y_i / Σ x_i² (regression through the origin).

Then

  E( Σ x_i Y_i / Σ x_i² ) = Σ x_i (x_i θ) / Σ x_i² = θ,
  Var( Σ x_i Y_i / Σ x_i² ) = Σ x_i² (x_i θ) / ( Σ x_i² )² = θ Σ x_i³ / ( Σ x_i² )².

The estimator is unbiased.
b. The likelihood function is

  L(θ|x) = Π_{i=1}^n e^{−θx_i} (θx_i)^{y_i} / y_i! = e^{−θΣx_i} Π_i (θx_i)^{y_i} / Π_i y_i!,

and

  (∂/∂θ) log L = −Σ x_i + Σ y_i/θ = 0,

which implies

  θ̂ = Σ y_i / Σ x_i,  Eθ̂ = E( Σ Y_i / Σ x_i ) = θ Σ x_i / Σ x_i = θ,

and, since VarY_i = θx_i,

  Varθ̂ = Var( Σ Y_i / Σ x_i ) = θ Σ x_i / ( Σ x_i )² = θ / Σ x_i.

c.

  (∂²/∂θ²) log L = −Σ y_i/θ²,  and  E[ −(∂²/∂θ²) log L ] = Σ x_i / θ.

Thus, the CRLB is θ/Σ x_i, and the MLE is the best unbiased estimator.
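The MLE in part (b) can be confirmed numerically by a grid search over the Poisson log likelihood (a Python sketch; the data values are hypothetical):

```python
import math

# Arbitrary illustrative data: counts y_i with Poisson means theta * x_i
x = [1.0, 2.0, 3.0, 4.0]
y = [2, 3, 7, 9]

def log_lik(theta):
    return sum(-theta * xi + yi * math.log(theta * xi) - math.lgamma(yi + 1)
               for xi, yi in zip(x, y))

# The MLE derived above: sum(y) / sum(x)
theta_hat = sum(y) / sum(x)  # 21/10 = 2.1

# Grid search over (0.5, 4.5) confirms the maximizer
grid = [0.5 + 0.001 * k for k in range(4000)]
best = max(grid, key=log_lik)
print(theta_hat, best)  # 2.1 2.1
```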


11.39 Let A_i be the set

  A_i = { α, β : | (α̂ + β̂x_{0i}) − (α + βx_{0i}) | ≤ t_{n−2,α/(2m)} S √( 1/n + (x_{0i} − x̄)²/S_xx ) }.

Then P(∩_{i=1}^m A_i) is the probability of simultaneous coverage, and using the Bonferroni Inequality (1.2.10) we have

  P( ∩_{i=1}^m A_i ) ≥ Σ_{i=1}^m P(A_i) − (m − 1) = Σ_{i=1}^m (1 − α/m) − (m − 1) = 1 − α.

11.41 Assume that we have observed data (y_1, x_1), (y_2, x_2), . . . , (y_{n−1}, x_{n−1}) and we have x_n but not y_n. Let φ(y_i|x_i) denote the density of Y_i, a n(a + bx_i, σ²).
a. The expected complete-data log likelihood is

  E[ Σ_{i=1}^n log φ(Y_i|x_i) ] = Σ_{i=1}^{n−1} log φ(y_i|x_i) + E log φ(Y|x_n),

where the expectation is with respect to the distribution φ(y|x_n) with the current values of the parameter estimates. Thus we need to evaluate

  E log φ(Y|x_n) = E[ −(1/2) log(2πσ_1²) − (1/(2σ_1²))(Y − µ_1)² ],

where Y ∼ n(µ_0, σ_0²). We have

  E(Y − µ_1)² = E( [Y − µ_0] + [µ_0 − µ_1] )² = σ_0² + [µ_0 − µ_1]²,

since the cross term is zero. Putting this all together, the expected complete-data log likelihood is

  −(n/2) log(2πσ_1²) − (1/(2σ_1²)) Σ_{i=1}^{n−1} [y_i − (a_1 + b_1x_i)]² − [ σ_0² + ((a_0 + b_0x_n) − (a_1 + b_1x_n))² ]/(2σ_1²)
    = −(n/2) log(2πσ_1²) − (1/(2σ_1²)) Σ_{i=1}^n [y_i − (a_1 + b_1x_i)]² − σ_0²/(2σ_1²),

if we define y_n = a_0 + b_0x_n.
b. For fixed a_0 and b_0, maximizing this likelihood gives the least squares estimates, while the maximum with respect to σ_1² is

  σ̂_1² = { Σ_{i=1}^n [y_i − (a_1 + b_1x_i)]² + σ_0² } / n.

So the EM algorithm is the following: At iteration t, we have estimates â^(t), b̂^(t), and σ̂^{2(t)}. We then set y_n^(t) = â^(t) + b̂^(t)x_n (which is essentially the E-step), and then the M-step is to calculate â^(t+1) and b̂^(t+1) as the least squares estimators using (y_1, x_1), (y_2, x_2), . . . , (y_{n−1}, x_{n−1}), (y_n^(t), x_n), and

  σ̂_1^{2(t+1)} = { Σ_{i=1}^n [y_i − (â^(t+1) + b̂^(t+1)x_i)]² + σ̂^{2(t)} } / n.
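The iteration in part (b) is simple to code. A minimal Python sketch (the manual's own implementation is in R; the data values here are arbitrary), which also illustrates the convergence claim of part (c):

```python
def least_squares(xs, ys):
    """Ordinary least squares intercept and slope."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

# Observed pairs, plus one x whose y is missing
x_obs = [1.0, 2.0, 3.0, 4.0]
y_obs = [1.1, 2.3, 2.8, 4.1]
x_miss = 5.0

a, b, s2 = 0.0, 1.0, 1.0          # starting values
for _ in range(200):
    y_imputed = a + b * x_miss     # E-step: impute the missing y
    xs, ys = x_obs + [x_miss], y_obs + [y_imputed]
    a, b = least_squares(xs, ys)   # M-step: least squares on completed data
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = (rss + s2) / len(xs)

# Part (c): the fixed point is least squares on the observed data alone
print((a, b))
print(least_squares(x_obs, y_obs))
```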


c. The EM calculations are simple here. Since y_n^(t) = â^(t) + b̂^(t)x_n, the estimates of a and b must converge to the least squares estimates (since they minimize the sum of squares of the observed data, and the last term adds nothing). For σ̂² we have (substituting the least squares estimates) the stationary point

  σ̂² = { Σ_{i=1}^n [y_i − (â + b̂x_i)]² + σ̂² } / n  ⇒  σ̂² = σ̂²_obs,

where σ̂²_obs is the MLE from the n − 1 observed data points. So the MLEs are the same as those without the extra x_n.
d. Now we use the bivariate normal density (see Definition 4.5.10 and Exercise 4.45). Denote the density by φ(x, y). Then the expected complete-data log likelihood is

  Σ_{i=1}^{n−1} log φ(x_i, y_i) + E log φ(X, y_n),

where after iteration t the missing-data density is the conditional density of X given Y = y_n,

  X | Y = y_n ∼ n( µ_X^(t) + ρ^(t)(σ_X^(t)/σ_Y^(t))(y_n − µ_Y^(t)), (1 − ρ^{2(t)}) σ_X^{2(t)} ).

Denoting the mean by µ_0 and the variance by σ_0², the expected value of the last piece in the likelihood is

  E log φ(X, y_n) = −(1/2) log( 2π σ_X² σ_Y² (1 − ρ²) )
    − (1/(2(1 − ρ²))) { E[ (X − µ_X)/σ_X ]² − 2ρ E[ (X − µ_X)(y_n − µ_Y)/(σ_X σ_Y) ] + [ (y_n − µ_Y)/σ_Y ]² }
  = −(1/2) log( 2π σ_X² σ_Y² (1 − ρ²) )
    − (1/(2(1 − ρ²))) { [ σ_0² + (µ_0 − µ_X)² ]/σ_X² − 2ρ (µ_0 − µ_X)(y_n − µ_Y)/(σ_X σ_Y) + [ (y_n − µ_Y)/σ_Y ]² }.

So the expected complete-data log likelihood is

  Σ_{i=1}^{n−1} log φ(x_i, y_i) + log φ(µ_0, y_n) − σ_0²/( 2(1 − ρ²)σ_X² ).

The EM algorithm is similar to the previous one. First note that the MLEs of µ_Y and σ_Y² are the usual ones, ȳ and σ̂_Y², and don't change with the iterations. We update the other estimates as follows. At iteration t, the E-step consists of replacing x_n^(t) by

  x_n^(t+1) = µ_X^(t) + ρ^(t) (σ_X^(t)/σ_Y^(t)) (y_n − ȳ).

Then µ_X^(t+1) = x̄ and we can write the likelihood as

  −(1/2) log( 2π σ_X² σ̂_Y² (1 − ρ²) ) − (1/(2(1 − ρ²))) [ (S_xx + σ_0²)/σ_X² − 2ρ S_xy/(σ_X σ̂_Y) + S_yy/σ̂_Y² ],
which is the usual bivariate normal likelihood except that we replace S_xx with S_xx + σ_0². So the MLEs are the usual ones, and the EM iterations are

  x_n^(t+1) = µ_X^(t) + ρ̂^(t) (σ̂_X^(t)/σ̂_Y^(t)) (y_n − ȳ),
  µ̂_X^(t+1) = x̄^(t),
  σ̂_X^{2(t+1)} = [ S_xx^(t) + (1 − ρ̂^{2(t)}) σ̂_X^{2(t)} ] / n,
  ρ̂^(t+1) = S_xy^(t) / √( [ S_xx^(t) + (1 − ρ̂^{2(t)}) σ̂_X^{2(t)} ] S_yy ).

Here is R code for the EM algorithm:
nsim

Similar Documents

Premium Essay

Ffsfdghiufbdsbfdsnfnsdkjfbnd Ckjsdbn Cldnfc Lkdanf Hfkjdnfjdanfl

...Licensed to: iChapters User Statistics for Management and Economics Abbreviated, Ninth Edition Gerald Keller VP/Editorial Director: Jack W. Calhoun Publisher: Joe Sabatino Senior Acquisitions Editor: Charles McCormick, Jr. Developmental Editor: Elizabeth Lowry Editorial Assistant: Nora Heink Senior Marketing Communications Manager: Libby Shipp Marketing Manager: Adam Marsh Content Project Manager: Jacquelyn K Featherly Media Editor: Chris Valentine Manufacturing Buyer: Miranda Klapper Production House/Compositor: MPS Limited, a Macmillan Company Senior Rights Specialist: John Hill Senior Art Director: Stacy Jenkins Shirley Internal Designer: KeDesign/cmiller design Cover Designer: Cmiller design Cover Images: © iStock Photo © 2012, 2009 South-Western, a part of Cengage Learning ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706 For permission to use material from this text or product, submit all requests online at...

Words: 6557 - Pages: 27

Free Essay

Accounting

...Click here to download the solutions manual / test bank INSTANTLY!! http://testbanksolutionsmanual.blogspot.com/2011/02/accounting-information-systems-romney.html ------------------------------------------------------------------------------------------------------------------------ Accounting Information Systems Romney 11th Edition Solutions Manual Accounting Information Systems Romney 11th Edition Solutions Manual Accounting Information Systems Romney 11th Edition Solutions Manual Accounting Information Systems Romney Steinbart 11th Edition Solutions Manual Accounting Information Systems Romney Steinbart 11th Edition Solutions Manual ------------------------------------------------------------------------------------------------------------------------ ***THIS IS NOT THE ACTUAL BOOK. YOU ARE BUYING the Solution Manual in e-version of the following book*** Name: Accounting Information Systems Author: Romney Steinbart Edition: 11th ISBN-10: 0136015182 Type: Solutions Manual - The file contains solutions and questions to all chapters and all questions. All the files are carefully checked and accuracy is ensured. - The file is either in .doc, .pdf, excel, or zipped in the package and can easily be read on PCs and Macs.  - Delivery is INSTANT. You can download the files IMMEDIATELY once payment is done. If you have any questions, please feel free to contact us. Our response is the fastest. All questions will always be answered in 6...

Words: 18533 - Pages: 75

Free Essay

Dr Manger

...Organizational Theory, Design, and Change Jones 6th Edition Test Bank Click here to download the solutions manual / test bank INSTANTLY!!! http://solutionsmanualtestbanks.blogspot.com/2011/10/organizational-theory-d esign-and-change_18.html ----------------------------------------------------------------------Organizational Organizational Organizational Organizational Theory, Theory, Theory, Theory, Design, Design, Design, Design, and and and and Change Change Change Change Jones Jones Jones Jones 6th 6th 6th 6th Edition Edition Edition Edition Test Test Test Test Bank Bank Bank Bank -------------------------------------------------------------------------***THIS IS NOT THE ACTUAL BOOK. YOU ARE BUYING the Test Bank in e-version of the following book*** Name: Organizational Theory, Design, and Change Author: Jones Edition: 6th ISBN-10: 0136087310 Type: Test Bank - The test bank is what most professors use an a reference when making exams for their students, which means there’s a very high chance that you will see a very similar, if not exact the exact, question in the test! - The file is either in .doc, .pdf, excel, or zipped in the package and can easily be read on PCs and Macs. - Delivery is INSTANT. You can download the files IMMEDIATELY once payment is done. If you have any questions, please feel free to contact us. Our response is the fastest. All questions will always be answered in 6 hours. This is the quality of service we are providing and we hope to be your...

Words: 29834 - Pages: 120

Premium Essay

Gscm326 Full Course Latest All Discussions All Quizzes and All Week Course Project

...GSCM326 full course latest all discussions all quizzes and all week Course Project Click Link Below To Buy: http://hwcampus.com/shop/gscm326-full-course-latest-discussions-quizzes-week-course-project/ GSCM326 Week 1 Discussion DQ1 & DQ 2 Latest DQ 1 Total Quality Management (graded) What is total quality management (TQM)? Is it something you can install, like a refrigerator? How do you know TQM when you see it? DQ 2 A System Perspective (graded) When we talk about a system view, what are we interested in and why? Why is a system view so important to have if you are going to implement TQM? GSCM326 Week 2 Discussion DQ1 & DQ 2 Latest 2016 Jan. DQ 1 Deming's 14 Points (graded) Are Dr. Deming’s 14 points clear, concise, and achievable? If not, what do you think he had in mind? In Deming’s view, who needs to do what and why? DQ 2 Quality Awards and Standards (graded) The authors of our text talk about the Baldrige Award throughout their book. In previous versions, they even designed their text around this award. Given that our course is about TQM, an in-depth discussion of the Deming Prize would seem to be appropriate since it is the framework of company-wide quality control in Japan, which embodies what we call TQM in the United States, but is hardly discussed in our text. So let's do some research. Put on your investigative hats and see what you can find about the Deming Prize. You...

Words: 5741 - Pages: 23

Premium Essay

Statistics

...A SECOND COURSE IN STATISTICS: REGRESSION ANALYSIS Seventh Edition William Mendenhall University of Florida Terry Sincich University of South Florida Prentice Hall Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo Editor in Chief: Deirdre Lynch Acquisitions Editor: Marianne Stepanian Associate Content Editor: Dana Jones Bettez Senior Managing Editor: Karen Wernholm Associate Managing Editor: Tamela Ambush Senior Production Project Manager: Peggy McMahon Senior Design Supervisor: Andrea Nix Cover Design: Christina Gleason Interior Design: Tamara Newnam Marketing Manager: Alex Gay Marketing Assistant: Kathleen DeChavez Associate Media Producer: Jean Choe Senior Author Support/Technology Specialist: Joe Vetere Manufacturing Manager: Evelyn Beaton Senior Manufacturing Buyer: Carol Melville Production Coordination, Technical Illustrations, and Composition: Laserwords Maine Cover Photo Credit: Abstract green flow, ©Oriontrail/Shutterstock Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Pearson was aware of a trademark claim, the designations have been printed in initial caps or all caps. Library of Congress Cataloging-in-Publication Data Mendenhall, William. A second course in...

Words: 63698 - Pages: 255

Premium Essay

Solman

...Solutions Manual COST ACCOUNTING Fifteenth Edition Charles T. Horngren Srikant M. Datar Madhav V. Rajan Acquisitions Editor: Ellen Geary Editorial Project Manager: Nicole Sam Editorial Assistant: Christine Donovan Project Manager: Roberta Sherman Supplements Project Manager: Andra Skaalrud Copyright © 2015 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying...

Words: 9620 - Pages: 39

Free Essay

Statistical Thinking in Sports

...Jim Albert and Ruud H. Koning (eds.) Statistical Thinking in Sports CRC PRESS Boca Raton Ann Arbor London Tokyo

Contents

1 Introduction (Jim Albert and Ruud H. Koning)
  1.1 Introduction
    1.1.1 Patterns of world records in sports (2 articles)
    1.1.2 Competition, rankings and betting in soccer (3 articles)
    1.1.3 An investigation into some popular baseball myths (3 articles)
    1.1.4 Uncertainty of attendance at sports events (2 articles)
    1.1.5 Home advantage, myths in tennis, drafting in hockey pools, American football
  1.2 Website
  1.3 Acknowledgements
  Reference

2 Modelling the development of world records in running (Gerard H. Kuper and Elmer Sterken)
  2.1 Introduction
  2.2 Modelling world records
    2.2.1 Cross-sectional approach
    2.2.2 Fitting the individual curves
  2.3 Selection of the functional form
    2.3.1 Candidate functions
    2.3.2 Theoretical selection of curves
    2.3.3 Fitting the models
    2.3.4 The Gompertz curve in more detail...

Words: 20315 - Pages: 82

Free Essay

Structural Equation Modeling

...3 GRADE: L/G ROOM: Brown 209 TIME: Tues., 1:00 – 3:00 LAB: Tues., 3:15-5:15 pm in Goldfarb 330 TA: Jin Huang TA’S E-MAIL: jhuang22@wustl.edu INSTRUCTOR: David Gillespie OFFICE: Brown 106 OFFICE HOURS: Mon., 9:00 – 11:00 a.m. & by appointment PHONE/VOICE MAIL: 935-6674 E-MAIL: davidg@gwbmail.wustl.edu OR davidfg@fidnet.com TA’S OFFICE: B-05; PHONE: 935-8786 TA’S OFFICE HOURS: Thurs., 1:00 – 3:00 p.m. & by appointment I. COURSE DOMAIN AND BOUNDARIES This course introduces structural equation modeling (SEM). SEM is a flexible and extensive method for testing theory. Structural equation models are best developed on the basis of substantive theory. The hypothesized theoretical relationships imply particular patterns of covariance. Statistical estimates of these hypothesized covariances indicate, within a margin of error, how well the models fit the data. The development and testing of these models advances theory by including latent variables, by estimating measurement error, by accepting multiple indicators, by accommodating reciprocal causation, and by estimating model parameters simultaneously. Structural equation models subsume factor analysis, regression, and path analysis. The integration of these traditional types of analysis is an important advancement because it makes possible empirical specification of the linkages between imperfectly measured variables and theoretical constructs of interest. II. III. COURSE GOALS To increase skill in developing theory...
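The covariance-fitting logic the syllabus describes can be sketched numerically: a hypothesized model implies a covariance matrix, and fit is judged by how closely that implied matrix reproduces the sample covariance. A minimal sketch, not from the course materials; the one-factor model, loadings, and error variances below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothesized one-factor model: each observed indicator x_j loads on a
# single latent variable f (Var(f) = 1), plus measurement error e_j:
#   x_j = lambda_j * f + e_j
lam = np.array([0.9, 0.8, 0.7])    # factor loadings (assumed values)
theta = np.array([0.3, 0.4, 0.5])  # error variances (assumed values)

# Model-implied covariance: Sigma = lam lam' + diag(theta)
sigma_implied = np.outer(lam, lam) + np.diag(theta)

# Simulate data from this model and compare with the sample covariance.
n = 100_000
f = rng.standard_normal(n)
x = f[:, None] * lam + rng.standard_normal((n, 3)) * np.sqrt(theta)
sigma_sample = np.cov(x, rowvar=False)

# With a correctly specified model the two matrices agree up to
# sampling error -- the "margin of error" mentioned above.
print(np.round(sigma_implied, 2))
print(np.round(sigma_sample, 2))
```

Real SEM software estimates the loadings and error variances from data rather than fixing them; this sketch only illustrates why a hypothesized model implies a particular covariance pattern.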

Words: 3656 - Pages: 15

Premium Essay

Probability

... Probability & Statistics for Engineers & Scientists NINTH EDITION Ronald E. Walpole Roanoke College Raymond H. Myers Virginia Tech Sharon L. Myers Radford University Keying Ye University of Texas at San Antonio Prentice Hall Editor in Chief: Deirdre Lynch Acquisitions Editor: Christopher Cummings Executive Content Editor: Christine O’Brien Associate Editor: Christina Lepre Senior Managing Editor: Karen Wernholm Senior Production Project Manager: Tracy Patruno Design Manager: Andrea Nix Cover Designer: Heather Scott Digital Assets Manager: Marianne Groth Associate Media Producer: Vicki Dreyfus Marketing Manager: Alex Gay Marketing Assistant: Kathleen DeChavez Senior Author Support/Technology Specialist: Joe Vetere Rights and Permissions Advisor: Michael Joyce Senior Manufacturing Buyer: Carol Melville Production Coordination: Lifland et al. Bookmakers Composition: Keying Ye Cover photo: Marjory Dressler/Dressler Photo-Graphics Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Pearson was aware of a trademark claim, the designations have been printed in initial caps or all caps. Library of Congress Cataloging-in-Publication Data Probability & statistics for engineers & scientists/Ronald E. Walpole . . . [et al.] — 9th ed. p. cm. ISBN 978-0-321-62911-1 1. Engineering—Statistical methods. 2. Probabilities. I. Walpole, Ronald E. TA340.P738 2011...

Words: 201669 - Pages: 807

Free Essay

Data Mining

...A Statistical Perspective on Data Mining Ranjan Maitra∗ Abstract Technological advances have led to new and automated data collection methods. Datasets once at a premium are often plentiful nowadays and sometimes indeed massive. A new breed of challenges is thus presented – primary among them is the need for methodology to analyze such masses of data with a view to understanding complex phenomena and relationships. Such capability is provided by data mining, which combines core statistical techniques with those from machine intelligence. This article reviews the current state of the discipline from a statistician’s perspective, illustrates issues with real-life examples, discusses the connections with statistics, the differences, the failings and the challenges ahead. 1 Introduction The information age has been matched by an explosion of data. This surfeit has been a result of modern, improved and, in many cases, automated methods for both data collection and storage. For instance, many stores tag their items with a product-specific bar code, which is scanned in when the corresponding item is bought. This automatically creates a gigantic repository of information on products and product combinations sold. Similar databases are also created by automated book-keeping, digital communication tools or by remote sensing satellites, and aided by the availability of affordable and effective storage mechanisms – magnetic tapes, data warehouses and so on. This has created a situation...

Words: 22784 - Pages: 92

Premium Essay

Essays

...TExES I Texas Examinations of Educator Standards Preparation Manual 133 History 8–12 Copyright © 2006 by the Texas Education Agency (TEA). All rights reserved. The Texas Education Agency logo and TEA are registered trademarks of the Texas Education Agency. Texas Examinations of Educator Standards, TExES, and the TExES logo are trademarks of the Texas Education Agency. This publication has been produced for the Texas Education Agency (TEA) by ETS. ETS is under contract to the Texas Education Agency to administer the Texas Examinations of Educator Standards (TExES) program and the Certification of Educators in Texas (ExCET) program. The TExES program and the Examination for the Certification of Educators in Texas (ExCET) program are administered under the authority of the Texas Education Agency; regulations and standards governing the program are subject to change at the discretion of the Texas Education Agency. The Texas Education Agency and ETS do not discriminate on the basis of race, color, national origin, sex, religion, age, or disability in the administration of the testing program or the provision of related services. PREFACE The State Board for Educator Certification (SBEC) has developed new standards for Texas educators that delineate what the beginning educator should know and be able to do. These standards, which are based on the state-required curriculum for students, the Texas Essential Knowledge and Skills (TEKS), form the basis for new Texas Examinations...

Words: 14132 - Pages: 57

Premium Essay

Educational Research

...scientific enquiry. 5 To explain the importance of theory development. 6 To explain the relationship among science, education and educational research. 7 To identify fundamental research 8 To identify applied research 9 To identify action research 10 To differentiate between fundamental, applied, and action research 11 To identify different paradigms of research 1.1 INTRODUCTION: Research purifies human life. It improves its quality. It is a search for knowledge. It shows how to solve any problem scientifically. It is a careful enquiry through search for any kind of knowledge. It is a journey from the known to the unknown. It is a systematic effort to gain new knowledge in any discipline. When it seeks a solution to any educational problem it leads to educational research. Curiosity and inquisitiveness are natural gifts possessed by man. They inspire him to quest and increase his thirst for knowledge and truth. After trial and error, he works systematically in the direction of the desired goal. His adjustment to and coping with the situation make him successful in his task. Thereby he learns something, becomes wise and prepares his own scientific procedure while...

Words: 78659 - Pages: 315

Premium Essay

Mba Special Assignment

...UNIVERSITY DEPARTMENTS ANNA UNIVERSITY CHENNAI :: CHENNAI 600 025 REGULATIONS - 2009 CURRICULUM I TO IV SEMESTERS (FULL TIME) MASTER OF BUSINESS ADMINISTRATION (MBA)

SEMESTER – I

| Code No. | Course Title                    | L | T | P | C |
| BA9101   | Statistics for Management       | 3 | 1 | 0 | 4 |
| BA9102   | Economic Analysis for Business  | 4 | 0 | 0 | 4 |
| BA9103   | Total Quality Management        | 3 | 0 | 0 | 3 |
| BA9104   | Organizational Behaviour        | 3 | 0 | 0 | 3 |
| BA9105   | Communication Skills            | 3 | 0 | 0 | 3 |
| BA9106   | Accounting for Management       | 3 | 1 | 0 | 4 |
| BA9107   | Legal Aspects of Business       | 3 | 0 | 0 | 3 |
| BA9108   | Seminar I – Management Concept  | 0 | 0 | 2 | 1 |
|          | Total                           |   |   |   | 25 |...
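The credit column (C) in the Semester I listing is consistent with its stated total; a quick check, with the course codes and credit values taken from the curriculum preview and the summation itself being ours:

```python
# Credits (C column) for MBA Semester I, as listed in the curriculum table.
credits = {
    "BA9101": 4,  # Statistics for Management
    "BA9102": 4,  # Economic Analysis for Business
    "BA9103": 3,  # Total Quality Management
    "BA9104": 3,  # Organizational Behaviour
    "BA9105": 3,  # Communication Skills
    "BA9106": 4,  # Accounting for Management
    "BA9107": 3,  # Legal Aspects of Business
    "BA9108": 1,  # Seminar I - Management Concept
}

total = sum(credits.values())
print(total)  # -> 25, matching the "Total" row
```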

Words: 17609 - Pages: 71

Free Essay

The Life

...JOURNAL OF SCIENTIFIC EXPLORATION A Publication of the Society for Scientific Exploration Instructions to Authors (Revised February 2013) All correspondence and submissions should be directed to: JSE Managing Editor, EricksonEditorial@gmail.com, 151 Petaluma Blvd. So., #227, Petaluma CA 94952 USA, (1) 415/435-1604, fax (1) 707/559-5030 Please submit all manuscripts at http://journalofscientificexploration.org/index.php/jse/login (please note that “www” is NOT used in this address). This website provides directions for author registration and online submission of manuscripts. Full Author Instructions for submission of items for publication in the Journal of Scientific Exploration (including “Writing the Empirical Journal Article”) are posted on the Society for Scientific Exploration’s website at http://www.scientificexploration.org/documents/instructions_for_authors.pdf. Before you submit a paper, please familiarize yourself with the Journal by reading JSE articles. (Back issues can be browsed in electronic form with SSE membership login at http://journalofscientificexploration.org; click on the Archive link. Issues before 2008 are freely accessible at http://www.scientificexploration.org/journal/articles.html.) Electronic files of text, tables, and figures at a resolution of a minimum of 300 dpi (TIF or PDF preferred) will be required for online submission. You will also need to attest to a statement online that the article has not been previously published and is not...

Words: 16523 - Pages: 67

Free Essay

Healthcare

...AYU-VOL. 30, NO. 2 (APRIL-JUNE) 2009, pp. 205-210 An Experimental Study on the Hypolipidemic effect of some selected Rûksha Guna drugs SANGRAM MISHRA* R. R. DWIVEDI** B. RAVISHANKAR*** B. K. ASHOK**** Institute for Post Graduate Teaching and Research in Ayurveda, Gujarat Ayurved University, Jamnagar. ABSTRACT: Âyurveda as well as the philosophies accepted the Guna as the basic entity of the Srishti. Gunas can be classified under various categories such as Âdhyâtmika Guna, Gurvâdi Guna, Parâdi Guna and Vishistha Guna. For treatment purposes the Gurvâdi Gunas are widely used. Among them the Snigdha Guna and Rûksha Guna are widely used in the Samhitâs. This study was carried out to establish the hypolipidemic effect of Rûksha-property drugs in animals with induced hyperlipidemia. The selected drugs have the Rûksha property by Rasa panchaka. The drugs were Vachâ (Acorus calamus Linn), Kushtha (Saussurea lappa C.B. Clarke), Haridra (Curcuma longa Linn), Daruharidrâ (Berberis aristata DC), Chitraka (Plumbago zeylanica) and Karanja (Pongamia pinnata Pierre). All the drugs have Lekhana property and Srotosodhaka karma due to the Rûksha property. Based on this premise the test drug (Rûksha Guna) was studied on various experimental parameters such as body weight; weight of liver, heart and kidney; food intake and faecal output; water intake; total faecal fat content, etc. The selected drugs are representative of the highest magnitude of Rûksha...

Words: 4397 - Pages: 18