CATEGORICAL DATA ANALYSIS, 3rd edition

Solutions to Selected Exercises
Alan Agresti
Version August 3, 2012, © Alan Agresti 2012
This file contains solutions and hints to solutions for some of the exercises in Categorical Data Analysis, third edition, by Alan Agresti (John Wiley & Sons, 2012). The solutions given are partly those that are also available at the website www.stat.ufl.edu/~aa/cda2/cda.html for many of the odd-numbered exercises in the second edition of the book (some of which are now even-numbered). I intend to expand the document with additional solutions, when I have time.
Please report errors in these solutions to the author (Department of Statistics, University of Florida, Gainesville, Florida 32611-8545, e-mail AA@STAT.UFL.EDU), so they can be corrected in future revisions of this site. The author regrets that he cannot provide students with more detailed solutions or with solutions of other exercises not in this file.

Chapter 1
1. a. nominal, b. ordinal, c. interval, d. nominal, e. ordinal, f. nominal.
3. π varies from batch to batch, so the counts come from a mixture of binomials rather than a single bin(n, π). Var(Y ) = E[Var(Y | π)] + Var[E(Y | π)] > E[Var(Y | π)] =
E[nπ(1 − π)].

7. a. ℓ(π) = π^20, so it is not close to quadratic.
b. π̂ = 1.0. Wald statistic z = (1.0 − 0.5)/√[1.0(0)/20] = ∞. Wald CI is 1.0 ± 1.96√[1.0(0)/20] = 1.0 ± 0.0, or (1.0, 1.0). These are not sensible.
c. z = (1.0 − 0.5)/√[0.5(0.5)/20] = 4.47, P < 0.0001. Score CI is (0.839, 1.000).
d. Test statistic 2(20) log(20/10) = 27.7, df = 1. The CI is (exp(−1.96²/40), 1) = (0.908, 1.0).
e. P-value = 2(0.5)^20 = 0.00000191.
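The score and likelihood-ratio calculations in Exercise 7 can be checked numerically. The following is a minimal Python sketch, not code from the book; the function name binomial_tests is hypothetical, and the score interval is computed with the usual Wilson closed form.

```python
import math


def binomial_tests(y, n, pi0, z_crit=1.96):
    """Score z, LR statistic, and score (Wilson) interval for a binomial proportion."""
    p_hat = y / n
    # Score statistic: standard error evaluated at the null value pi0.
    z_score = (p_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    # Likelihood-ratio statistic 2 * sum(obs * log(obs / exp)); zero counts add 0.
    lr = 0.0
    for obs, exp in ((y, n * pi0), (n - y, n * (1 - pi0))):
        if obs > 0:
            lr += 2 * obs * math.log(obs / exp)
    # Score (Wilson) interval: the pi0 values with |z_score| <= z_crit.
    z2 = z_crit ** 2
    center = (p_hat + z2 / (2 * n)) / (1 + z2 / n)
    half = z_crit * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n ** 2)) / (1 + z2 / n)
    return z_score, lr, (center - half, center + half)


z, lr, score_ci = binomial_tests(y=20, n=20, pi0=0.5)
# LR interval lower bound when y = n: solve 2 * n * log(1 / pi) = z_crit^2 for pi.
lr_lower = math.exp(-1.96 ** 2 / (2 * 20))
print(round(z, 2), round(lr, 1), [round(v, 3) for v in score_ci], round(lr_lower, 3))
```

The Wald interval is left out on purpose: with π̂ = 1.0 its standard error is 0, which is the degenerate case part b describes.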
9. The chi-squared goodness-of-fit test of the null hypothesis that the binomial proportions equal (0.75, 0.25) has expected frequencies (827.25, 275.75), and X² = 3.46 based on df = 1. The P-value is 0.063, giving moderate evidence against the null.
10. The sample mean is 0.61. Fitted probabilities for the truncated distribution are 0.543, 0.332, 0.102, 0.021, 0.003. The estimated expected frequencies are 108.5, 66.4, 20.3, 4.1, and 0.6, and the Pearson X² = 0.7 with df = 3 (0.3 with df = 2 if we truncate at 3 and above). The fit seems adequate.
11. With the binomial test the smallest possible P-value, from y = 0 or y = 5, is 2(1/2)^5 = 1/16. Since this exceeds 0.05, it is impossible to reject H0, and thus P(Type I error) = 0. With the large-sample score test, y = 0 and y = 5 are the only outcomes to give P ≤ 0.05 (e.g., with y = 5, z = (1.0 − 0.5)/√[0.5(0.5)/5] = 2.24 and P = 0.025). Thus, for that test, P(Type I error) = P(Y = 0) + P(Y = 5) = 1/16.
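The actual Type I error rates in Exercise 11 follow from enumerating the six possible outcomes. A minimal Python sketch, assuming the exact two-sided P-value is twice the smaller tail probability (capped at 1); the function names are hypothetical:

```python
import math


def binom_pmf(y, n, pi):
    """Binomial probability of y successes in n trials."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)


def two_sided_normal_p(z):
    """Two-sided standard normal tail probability P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


def type1_error_rates(n=5, pi0=0.5, alpha=0.05):
    """Actual P(Type I error) of the exact binomial test and of the score test."""
    exact_rate = 0.0
    score_rate = 0.0
    for y in range(n + 1):
        prob = binom_pmf(y, n, pi0)
        lower = sum(binom_pmf(k, n, pi0) for k in range(0, y + 1))
        upper = sum(binom_pmf(k, n, pi0) for k in range(y, n + 1))
        p_exact = min(1.0, 2 * min(lower, upper))
        z = (y / n - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
        p_score = two_sided_normal_p(z)
        # Add this outcome's null probability whenever the test would reject.
        if p_exact <= alpha:
            exact_rate += prob
        if p_score <= alpha:
            score_rate += prob
    return exact_rate, score_rate


exact_rate, score_rate = type1_error_rates()
print(exact_rate, score_rate)
```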
12. a. No outcome can give P ≤ .05, and hence one never rejects H0 .
b. When T = 2, mid P -value = 0.04 and one rejects H0 . Thus, P(Type I error) = P(T
= 2) = 0.08.
c. P -values of the two tests are 0.04 and 0.02; P(Type I error) = P(T = 2) = 0.04 with both tests.
d. P(Type I error) = E[P(Type I error | T )] = (5/8)(0.08) = 0.05. Randomized tests are not sensible for practical application.
16. Var(π̂) = π(1 − π)/n decreases as π moves toward 0 or 1 from 0.5.
17. a. Var(Y) = nπ(1 − π), binomial.
b. Var(Y) = Σ_i Var(Yi) + 2 Σ_{i<j} … (n choose 2) > nπ(1 − π).
c. Var(Y) = E[Var(Y | π)] + Var[E(Y | π)] = E[nπ(1 − π)] + Var(nπ) = nρ − nE(π²) + [n²E(π²) − n²ρ²] = nρ + (n² − n)[E(π²) − ρ²] − nρ² = nρ(1 − ρ) + (n² − n)Var(π) > nρ(1 − ρ).
18. This is the binomial probability of y successes and k − 1 failures in y + k − 1 trials times the probability of a failure at the next trial.

… P(T > to) + (1/2)p(to) = 1 − P(T ≤ to) + (1/2)p(to) = 1 − Fmid(to).

29. a. The kernel of the log likelihood is L(θ) = n1 log θ² + n2 log[2θ(1 − θ)] + n3 log(1 − θ)².
Take ∂L/∂θ = 2n1/θ + n2/θ − n2/(1 − θ) − 2n3/(1 − θ) = 0 and solve for θ.
b. Find the expectation using E(n1) = nθ², etc. Then, the asymptotic variance is the inverse information = θ(1 − θ)/2n, and thus the estimated SE = √[θ̂(1 − θ̂)/2n].
c. The estimated expected counts are [nθ̂², 2nθ̂(1 − θ̂), n(1 − θ̂)²]. Compare these to the observed counts (n1, n2, n3) using X² or G², with df = (3 − 1) − 1 = 1, since 1 parameter is estimated.
30. Since ∂²L/∂π² = −(2n11/π²) − n12/π² − n12/(1 − π)² − n22/(1 − π)², the information is its negative expected value, which is
2nπ²/π² + nπ(1 − π)/π² + nπ(1 − π)/(1 − π)² + n(1 − π)/(1 − π)², which simplifies to n(1 + π)/π(1 − π). The asymptotic standard error is the square root of the inverse information, or √[π(1 − π)/n(1 + π)].
32. c. Let π̂ = n1/n, and (1 − π̂) = n2/n, and denote the null probabilities in the two categories by π0 and (1 − π0). Then, X² = (n1 − nπ0)²/nπ0 + (n2 − n(1 − π0))²/n(1 − π0)
= n[(π̂ − π0)²(1 − π0) + ((1 − π̂) − (1 − π0))²π0]/π0(1 − π0),
which equals (π̂ − π0)²/[π0(1 − π0)/n] = zS².
33. Let X be a random variable that equals πj0/π̂j with probability π̂j. By Jensen’s inequality, since the negative log function is convex, E(− log X) ≥ − log(EX). Hence,
E(− log X) = Σ_j π̂j log(π̂j/πj0) ≥ − log[Σ_j π̂j(πj0/π̂j)] = − log(Σ_j πj0) = − log(1) = 0.
Thus G² = 2nE(− log X) ≥ 0.

35. If Y1 is χ² with df = ν1 and if Y2 is independent χ² with df = ν2, then the mgf of Y1 + Y2 is the product of the mgfs, which is m(t) = (1 − 2t)^[−(ν1 + ν2)/2], which is the mgf of a χ² with df = ν1 + ν2.

40. a. The Bayes estimator is (n1 + α)/(n + α + β), in which α > 0, β > 0. No proper prior leads to the ML estimate, n1/n. The ML estimator is the limit of Bayes estimators as α and β both converge to 0.
b. This happens with the improper prior, proportional to [π1(1 − π1)]^(−1), which we get from the beta density by taking the improper settings α = β = 0.

Chapter 2
3. P(−|C) = 1/4. It is unclear from the wording, but presumably this means that P(C̄|+) = 2/3. Sensitivity = P(+|C) = 1 − P(−|C) = 3/4. Specificity = P(−|C̄) = 1 − P(+|C̄) can’t be determined from information given.
5. a. Relative risk.
b. (i) π1 = 0.55π2 , so π1 /π2 = 0.55.
(ii) 1/0.55 = 1.82.

11. a. (0.847/0.153)/(0.906/0.094) = 0.574.
b. This is interpretation for relative risk, not the odds ratio. The actual relative risk =
0.847/0.906 = 0.935; i.e., 60% should have been 93.5%.
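The gap between the two measures in Exercise 11 is easy to reproduce. A minimal Python sketch using the two proportions quoted there (the function names are hypothetical):

```python
def relative_risk(p1, p2):
    """Ratio of two proportions."""
    return p1 / p2


def odds_ratio(p1, p2):
    """Ratio of the odds p / (1 - p) for two proportions."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))


p1, p2 = 0.847, 0.906
print(round(odds_ratio(p1, p2), 3), round(relative_risk(p1, p2), 3))
```

Because both proportions are far from 0, the odds ratio is much further from 1 than the relative risk, which is why reading one as the other misleads.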
12. a. Relative risk: Lung cancer, 14.00; Heart disease, 1.62. (Cigarette smoking seems more highly associated with lung cancer)
Difference of proportions: Lung cancer, 0.00130; Heart disease, 0.00256. (Cigarette smoking seems more highly associated with heart disease)
Odds ratio: Lung cancer, 14.02; Heart disease, 1.62. e.g., the odds of dying from lung cancer for smokers are estimated to be 14.02 times those for nonsmokers. (Note similarity to relative risks.)
b. Difference of proportions describes excess deaths due to smoking. That is, if N = no. smokers in population, we predict there would be 0.00130N fewer deaths per year from lung cancer if they had never smoked, and 0.00256N fewer deaths per year from heart disease. Thus elimination of cigarette smoking would have biggest impact on deaths due to heart disease.
15. Marginal odds ratio = 1.84, but most conditional odds ratios are close to 1.0 except in Department A where odds ratio = 0.35. Note that males tend to apply in greater numbers to Departments A and B, in which admissions rates are relatively high, and females tend to apply in greater numbers to Departments C, D, E, F, in which admissions rates are relatively low. This results in the marginal association whereby the odds of admission for males are 84% higher than those for females.
17. a. 0.18 for males and 0.32 for females; e.g., for male children, the odds that a white was a murder victim were 0.18 times the odds that a nonwhite was a murder victim.
b. 0.21.
18. The age distribution is relatively higher in Maine. Death rates are higher at older ages, and Maine tends to have an older population than South Carolina.
19. Kentucky: Counts are (31, 360 / 7, 50) when victim was white and (0, 18 / 2, 106) when victim was black. Conditional odds ratios are 0.62 and 0.0, whereas marginal odds ratio is 1.42. Simpson’s paradox occurs. Whites tend to kill whites and blacks tend to kill blacks, and killing a white is more likely to result in the death penalty.
21. Yes, this would be an occurrence of Simpson’s paradox. One could display the data as a 2 × 2 × K table, where rows = (Smith, Jones), columns = (hit, out) response for each time at bat, layers = (year 1, . . . , year K). This could happen if Jones tends to have relatively more observations (i.e., “at bats”) for years in which his average is high.
25. a. Let “pos” denote positive diagnosis, “dis” denote subject has disease.
P(dis|pos) = P(pos|dis)P(dis) / [P(pos|dis)P(dis) + P(pos|no dis)P(no dis)]
b. 0.95(0.005)/[0.95(0.005) + 0.05(0.995)] = 0.087.

                Test
Reality      +          −        Total
   +      0.00475    0.00025    0.005
   −      0.04975    0.94525    0.995

Nearly all (99.5%) subjects are not HIV+. The 5% errors for them swamp (in frequency) the 95% correct cases for subjects who truly are HIV+. The odds ratio = 361; i.e., the odds of a positive test result are 361 times higher for those who are HIV+ than for those not HIV+.
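The joint probabilities, the value 0.087, and the odds ratio of 361 in Exercise 25 can be reproduced from the three inputs used in part b. A minimal Python sketch (the function names are hypothetical):

```python
def joint_table(sensitivity, specificity, prevalence):
    """Joint probabilities: rows = reality (+, -), columns = test (+, -)."""
    return [
        [sensitivity * prevalence, (1 - sensitivity) * prevalence],
        [(1 - specificity) * (1 - prevalence), specificity * (1 - prevalence)],
    ]


def prob_disease_given_positive(table):
    """Bayes theorem: P(dis | pos) = P(pos, dis) / P(pos)."""
    return table[0][0] / (table[0][0] + table[1][0])


def table_odds_ratio(table):
    """Cross-product ratio of a 2 x 2 table."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


table = joint_table(sensitivity=0.95, specificity=0.95, prevalence=0.005)
print(round(prob_disease_given_positive(table), 3), round(table_odds_ratio(table)))
```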
27. a. The numerator is the extra proportion that got the disease above and beyond what the proportion would be if no one had been exposed (which is P(D | Ē)).
b. Use Bayes Theorem and result that RR = P(D | E)/P(D | Ē).

29. a. For instance, if first row becomes first column and second row becomes second column, the table entries become
n11 n12
n21 n22
The odds ratio is the same as before. The difference of proportions and relative risk are only invariant to multiplication of cell counts within rows by a constant.
30. Suppose π1 > π2 . Then, 1−π1 < 1−π2 , and θ = [π1 /(1−π1 )]/[π2 /(1−π2 )] > π1 /π2 >
1. If π1 < π2 , then 1 − π1 > 1 − π2 , and θ = [π1 /(1 − π1 )]/[π2 /(1 − π2 )] < π1 /π2 < 1.
31. This simply states that ordinary independence for a two-way table holds in each partial table.

36. This condition is equivalent to the conditional distributions of Y in the first I − 1 rows being identical to the one in row I. Equality of the I conditional distributions is equivalent to independence.
37. Use an argument similar to that in Sec. 1.2.5. Since Yi+ is sum of independent Poissons, it is Poisson. In the denominator for the calculation of the conditional probability, the distribution of {Yi+ } is a product of Poissons with means {µi+ }. The multinomial distributions are obtained by identifying πj|i with µij /µi+ .
40. If in each row the maximum probability falls in the same column, say column 1, then E[V(Y | X)] = Σ_i πi+(1 − π1|i) = 1 − π+1 = 1 − max{π+j}, so λ = 0. Since the maximum being the same in each row does not imply independence, λ = 0 can occur even when the variables are not independent.
Chapter 3
14. b. Compare rows 1 and 2 (G² = 0.76, df = 1, no evidence of difference), rows 3 and 4 (G² = 0.02, df = 1, no evidence of difference), and the 3 × 2 table consisting of rows 1 and 2 combined, rows 3 and 4 combined, and row 5 (G² = 95.74, df = 2, strong evidence of differences).
16. a. X² = 8.9, df = 6, P = 0.18; test treats variables as nominal and ignores the information on the ordering.
b. Residuals suggest tendency for aspirations to be higher when family income is higher.
c. Ordinal test gives M² = 4.75, df = 1, P = 0.03, and much stronger evidence of an association.
18. a. It is plausible that control of cancer is independent of treatment used. (i) P-value is hypergeometric probability P(n11 = 21 or 22 or 23) = 0.3808, (ii) P-value = 0.638 is sum of probabilities that are no greater than the probability (0.2755) of the observed table.
b. 0.3808 − 0.5(0.2755) = 0.243. With this type of P-value, the actual error probability tends to be closer to the nominal value, the sum of the two one-sided P-values is 1, and the null expected value is 0.50; however, it does not guarantee that the actual error probability is no greater than the nominal value.
25. For proportions π and 1 − π in the two categories for a given sample, the contribution to the asymptotic variance is [1/nπ + 1/n(1 − π)]. The derivative of this with respect to π is 1/n(1 − π)² − 1/nπ², which is less than 0 for π < 0.50 and greater than 0 for π > 0.50. Thus, the minimum is with proportions (0.5, 0.5) in the two categories.
29. Use formula (3.9), noting that the partial derivative of the measure with respect to πi is just ηi/δ².
30. For any reasonable significance test, whenever H0 is false, the test statistic tends to be larger and the P -value tends to be smaller as the sample size increases. Even if H0 is just slightly false, the P -value will be small if the sample size is large enough. Most statisticians feel we learn more by estimating parameters using confidence intervals than by conducting significance tests.
31. a. Note θ = π1+ = π+1.
b. The log likelihood has kernel
L = n11 log(θ²) + (n12 + n21) log[θ(1 − θ)] + n22 log(1 − θ)²
∂L/∂θ = 2n11/θ + (n12 + n21)/θ − (n12 + n21)/(1 − θ) − 2n22/(1 − θ) = 0 gives θ̂ = (2n11 + n12 + n21)/2(n11 + n12 + n21 + n22) = (n1+ + n+1)/2n = (p1+ + p+1)/2.
c,d. Calculate estimated expected frequencies (e.g., µ̂11 = nθ̂²), and obtain Pearson X², which is 2.8. We estimated one parameter, so df = (4 − 1) − 1 = 2 (one higher than in testing independence without assuming identical marginal distributions). The free throws are plausibly independent and identically distributed.
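The estimate θ̂ = (n1+ + n+1)/2n and the Pearson statistic of Exercise 31 can be sketched in a few lines of Python. The counts used here are hypothetical, chosen only so the arithmetic is easy to follow; they are not the exercise's data, and the function name is an assumption.

```python
def fit_common_theta(table):
    """ML fit of independence with identical margins theta in a 2 x 2 table."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    # Closed-form ML estimate from the likelihood equation.
    theta = (2 * n11 + n12 + n21) / (2 * n)
    expected = [
        [n * theta ** 2, n * theta * (1 - theta)],
        [n * theta * (1 - theta), n * (1 - theta) ** 2],
    ]
    x2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return theta, expected, x2


# Hypothetical counts (not from the exercise).
theta_hat, fitted, pearson_x2 = fit_common_theta([[40, 10], [10, 40]])
print(theta_hat, fitted, pearson_x2)
```

The statistic is referred to a chi-squared distribution with df = (4 − 1) − 1 = 2, as in parts c and d.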
32. By expanding the square and simplifying, one can obtain the alternative formula for X²,
X² = n[Σ_i Σ_j (nij²/ni+ n+j) − 1].
Since nij ≤ ni+, the double sum term cannot exceed Σ_i Σ_j nij/n+j = J, and since nij ≤ n+j, the double sum cannot exceed Σ_i Σ_j nij/ni+ = I. It follows that X² cannot exceed n[min(I, J) − 1] = n[min(I − 1, J − 1)].

35. Because G² for full table = G² for collapsed table + G² for table consisting of the two rows that are combined.
37. Σ_j p+j r̂j = Σ_j p+j [ … k < j}. This equals (1/2)[ … k …

… 0. Note that the general logistic cdf on p. 121 has mean µ and standard deviation τπ/√3. Writing α + βx in the form (x − α/β)/(1/β), we identify µ with α/β and τ with 1/β, so the standard deviation is π/(β√3) when β > 0.
19. For j = 1, xij = 0 for group B, and for observations in group A, ∂µA/∂ηi is constant, so likelihood equation sets Σ_A (yi − µA)/µA = 0, so µ̂A = ȳA. For j = 0, xij = 1 and the likelihood equation gives
Σ_A [(yi − µA)/µA](∂µA/∂ηi) + Σ_B [(yi − µB)/µB](∂µB/∂ηi) = 0.
The first sum is 0 from the first likelihood equation, and for observations in group B, ∂µB/∂ηi is constant, so second sum sets Σ_B (yi − µB)/µB = 0, so µ̂B = ȳB.
21. Letting φ = Φ′, wi = [φ(Σ_j βj xij)]²/[Φ(Σ_j βj xij)(1 − Φ(Σ_j βj xij))/ni]

22. a. Since φ is symmetric, Φ(0) = 0.5. Setting α + βx = 0 gives x = −α/β.
b. The derivative of Φ at x = −α/β is βφ(α + β(−α/β)) = βφ(0). The logistic pdf has φ(x) = e^x/(1 + e^x)² which equals 0.25 at x = 0; the standard normal pdf equals 1/√(2π) at x = 0.
c. Φ(α + βx) = Φ([x − (−α/β)]/(1/β)).
23. a. Cauchy. You can see this by taking the derivative and noting it has the form of a Cauchy density. The GLM with Bernoulli random component, systematic component α + βx, link function tan[pi(π(x) − 1/2)] (where pi = 3.14...), would work well when the rate of convergence of π to 0 and 1 is slower than with the logit or probit link (Recall that Cauchy density has thick tails compared to logistic and normal densities).
26. a. With identity link the GLM likelihood equations simplify to, for each i, Σ_{j=1}^{ni} (yij − µi)/µi = 0, from which µ̂i = Σ_j yij/ni.
b. Deviance = 2 Σ_i Σ_j [yij log(yij/ȳi)].

31. For log likelihood L(µ) = −nµ + (Σ_i yi) log(µ), the score is u = (Σ_i yi − nµ)/µ, H = −(Σ_i yi)/µ², and the information is n/µ. It follows that the adjustment to µ^(t) in Fisher scoring is [µ^(t)/n][(Σ_i yi − nµ^(t))/µ^(t)] = ȳ − µ^(t), and hence µ^(t+1) = ȳ. For Newton–Raphson, the adjustment to µ^(t) is µ^(t) − (µ^(t))²/ȳ, so that µ^(t+1) = 2µ^(t) − (µ^(t))²/ȳ. Note that if µ^(t) = ȳ, then also µ^(t+1) = ȳ.
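The contrast in Exercise 31 between Fisher scoring, which reaches ȳ in a single step, and Newton–Raphson, which approaches it iteratively, can be seen numerically. A minimal Python sketch; the data vector is hypothetical and the function names are assumptions:

```python
def fisher_step(mu, y):
    """One Fisher scoring update for the Poisson mean: mu + score / information."""
    n, total = len(y), sum(y)
    score = (total - n * mu) / mu
    information = n / mu
    return mu + score / information


def newton_step(mu, y):
    """One Newton-Raphson update: mu - score / Hessian."""
    n, total = len(y), sum(y)
    score = (total - n * mu) / mu
    hessian = -total / mu ** 2
    return mu - score / hessian


y = [2, 0, 3, 1, 4]  # hypothetical counts with sample mean 2.0
print(fisher_step(0.7, y))  # lands on the sample mean (up to rounding error)

mu = 1.0
for _ in range(8):
    mu = newton_step(mu, y)  # equals 2 * mu - mu**2 / ybar at each step
print(mu)
```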
Chapter 5
2. a. π̂ = e^[−3.7771 + 0.1449(8)]/[1 + e^[−3.7771 + 0.1449(8)]].
b. π̂ = 0.5 at −α̂/β̂ = 3.7771/0.1449 = 26.
c. At LI = 8, π̂ = 0.068, so rate of change is β̂π̂(1 − π̂) = 0.1449(0.068)(0.932) = 0.009.
e. e^β̂ = e^0.1449 = 1.16.
f. The odds of remission at LI = x + 1 are estimated to fall between 1.029 and 1.298 times the odds of remission at LI = x.
g. Wald statistic = (0.1449/0.0593)² = 5.96, df = 1, P-value = 0.0146 for Ha: β ≠ 0.
h. Likelihood-ratio statistic = 34.37 − 26.07 = 8.30, df = 1, P-value = 0.004.
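The numbers in Exercise 2 all follow from the fitted intercept, slope, and standard error quoted there. A minimal Python sketch of those calculations (the function names are hypothetical; small differences from the printed values reflect rounding of the estimates):

```python
import math

ALPHA_HAT, BETA_HAT, SE_BETA = -3.7771, 0.1449, 0.0593  # values quoted in Exercise 2


def fitted_probability(x, alpha=ALPHA_HAT, beta=BETA_HAT):
    """Logistic regression fitted probability at x."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))


def rate_of_change(x, alpha=ALPHA_HAT, beta=BETA_HAT):
    """Slope of the fitted curve at x: beta * pi * (1 - pi)."""
    p = fitted_probability(x, alpha, beta)
    return beta * p * (1 - p)


def wald_test(beta=BETA_HAT, se=SE_BETA):
    """Wald chi-squared statistic (df = 1) and its P-value."""
    stat = (beta / se) ** 2
    return stat, math.erfc(math.sqrt(stat / 2))


def odds_ratio_interval(beta=BETA_HAT, se=SE_BETA, z=1.96):
    """Wald interval for exp(beta)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


print(round(fitted_probability(8), 3), round(-ALPHA_HAT / BETA_HAT, 1))
print(round(rate_of_change(8), 3), round(math.exp(BETA_HAT), 2))
print(wald_test(), odds_ratio_interval())
```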

5. a. At 26.3, estimated odds = exp[−12.351 + 0.497(26.3)] = 2.06, and at 27.3 the estimated odds = exp[−12.351 + 0.497(27.3)] = 3.38, and 3.38 = 1.64(2.06). For each
1-unit increase in x, the odds multiply by 1.64 (i.e., increase by 64%).
b. The approximate rate of change when π = 0.5 is βπ(1 − π) = β/4. The 95% Wald CI for β of (0.298, 0.697) translates to one for β/4 of (0.07, 0.17).
7. logit(π̂) = −3.866 + 0.397(snoring). Fitted probabilities are 0.021, 0.044, 0.093, 0.132. Multiplicative effect on odds equals exp(0.397) = 1.49 for one-unit change in snoring, and 2.21 for two-unit change. Goodness-of-fit statistic G² = 2.8, df = 2 shows no evidence of lack of fit.
9. The Cochran–Armitage test uses the ordering of rows and has df = 1, and tends to give smaller P -values when there truly is a linear trend.
11. Estimated odds of contraceptive use for those with at least 1 year of college were e^0.501 = 1.65 times the estimated odds for those with less than 1 year of college. The 95% Wald CI for the true odds ratio is exp[0.501 ± 1.96(0.077)] = (e^0.350, e^0.652) = (1.42, 1.92).
14. The original variables c and x relate to the standardized variables zc and zx by zc = (c − 2.44)/0.80 and zx = (x − 26.3)/2.11, so that c = 0.80zc + 2.44 and x = 2.11zx + 26.3.
Thus, the prediction equation is logit(π̂) = −10.071 − 0.509[0.80zc + 2.44] + 0.458[2.11zx + 26.3]. The coefficients of the standardized variables are −0.509(0.80) = −0.41 and 0.458(2.11) = 0.97. Adjusting for the other variable, a one standard deviation change in x has more than double the effect of a one standard deviation change in c. At x̄ = 26.3, the estimated logits at c = 1 and at c = 4 are 1.465 and −0.062, which correspond to estimated probabilities of 0.81 and 0.48.

15. a. Black defendants with white victims had estimated probability e^(−3.5961 + 2.4044)/[1 + e^(−3.5961 + 2.4044)] = 0.23.
b. For a given defendant’s race, the odds of the death penalty when the victim was white are estimated to be between e^1.3068 = 3.7 and e^3.7175 = 41.2 times the odds when the victim was black.
c. Wald statistic (−0.8678/0.3671)² = 5.6, LR statistic = 5.0, each with df = 1. P-value = 0.025 for LR statistic.
d. G² = 0.38, X² = 0.20, df = 1, so model fits well.
19. R = 1: logit(π̂) = −6.7 + 0.1A + 1.4S. R = 0: logit(π̂) = −7.0 + 0.1A + 1.2S.
The YS conditional odds ratio is exp(1.4) = 4.1 for blacks and exp(1.2) = 3.3 for whites. Note that 0.2, the coeff. of the cross-product term, is the difference between the log odds ratios 1.4 and 1.2. The coeff. of S of 1.2 is the log odds ratio between Y and S when R = 0 (whites), in which case the RS interaction does not enter the equation. The P-value of P < 0.01 for smoking represents the result of the test that the log odds ratio between Y and S for whites is 0.
22. Logit model gives fit, logit(π̂) = −3.556 + 0.053(income).
25. The derivative equals β exp(α + βx)/[1 + exp(α + βx)]² = βπ(x)(1 − π(x)).

26. The odds ratio e^β is approximately equal to the relative risk when the probability is near 0 and the complement is near 1, since e^β = [π(x + 1)/(1 − π(x + 1))]/[π(x)/(1 − π(x))] ≈ π(x + 1)/π(x).

27. ∂π(x)/∂x = βπ(x)[1 − π(x)], and π(1 − π) ≤ 0.25 with equality at π = 0.5. For multiple explanatory variable case, the rate of change as xi changes with other variables held constant is greatest when π = 0.5.
28. The square of the denominator is the variance of logit(π̂) = α̂ + β̂x. For large n, the ratio of (α̂ + β̂x − logit(π0)) to its standard deviation is approximately standard normal, and (for fixed π0) all x for which the absolute ratio is no larger than z_(α/2) are not contradictory.

29. a. Since log[π/(1 − π)] = α + log(d^β), exponentiating yields π/(1 − π) = e^α e^[log(d^β)] = e^α d^β. Letting d = 1, e^α equals the odds for the first draft pick.
b. As a function of d, the odds decreases more quickly for pro basketball.

30. a. Let ρ = P(Y = 1). By Bayes Theorem,
P(Y = 1|x) = ρ exp[−(x − µ1)²/2σ²]/{ρ exp[−(x − µ1)²/2σ²] + (1 − ρ) exp[−(x − µ0)²/2σ²]}
= 1/{1 + [(1 − ρ)/ρ] exp{−[µ0² − µ1² + 2x(µ1 − µ0)]/2σ²}}
= 1/{1 + exp[−(α + βx)]} = exp(α + βx)/[1 + exp(α + βx)], where β = (µ1 − µ0)/σ² and α = − log[(1 − ρ)/ρ] + [µ0² − µ1²]/2σ².
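The identity in Exercise 30 can be checked numerically by comparing the direct Bayes calculation with the logistic form. A minimal Python sketch; the parameter values are hypothetical and the function names are assumptions:

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def posterior_direct(x, rho, mu0, mu1, sigma):
    """P(Y = 1 | x) by Bayes theorem with normal densities of common variance."""
    num = rho * normal_pdf(x, mu1, sigma)
    return num / (num + (1 - rho) * normal_pdf(x, mu0, sigma))


def posterior_logistic(x, rho, mu0, mu1, sigma):
    """The same probability written as a logistic regression in x."""
    beta = (mu1 - mu0) / sigma ** 2
    alpha = -math.log((1 - rho) / rho) + (mu0 ** 2 - mu1 ** 2) / (2 * sigma ** 2)
    return 1 / (1 + math.exp(-(alpha + beta * x)))


# Hypothetical parameter values, used only to illustrate the identity.
params = dict(rho=0.3, mu0=0.0, mu1=2.0, sigma=1.5)
for x in (-2.0, 0.0, 1.0, 3.5):
    print(x, posterior_direct(x, **params), posterior_logistic(x, **params))
```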

32. a. Given {πi}, we can find parameters so model holds exactly. With constraint βI = 0, log[πI/(1 − πI)] = α determines α. Since log[πi/(1 − πi)] = α + βi, it follows that βi = log[πi/(1 − πi)] − log[πI/(1 − πI)].
That is, βi is the log odds ratio for rows i and I of the table. When all βi are equal, then the logit is the same for each row, so πi is the same in each row, so there is independence.
35. d. When yi is a 0 or 1, the log likelihood is Σ_i [yi log πi + (1 − yi) log(1 − πi)].
For the saturated model, π̂i = yi, and the log likelihood equals 0. So, in terms of the ML fit and the ML estimates {π̂i} for this linear trend model, the deviance equals
D = −2 Σ_i [yi log π̂i + (1 − yi) log(1 − π̂i)] = −2 Σ_i [yi log(π̂i/(1 − π̂i)) + log(1 − π̂i)]
= −2 Σ_i [yi(α̂ + β̂xi) + log(1 − π̂i)].
For this model, the likelihood equations are Σ_i yi = Σ_i π̂i and Σ_i xi yi = Σ_i xi π̂i.
So, the deviance simplifies to
D = −2[α̂ Σ_i π̂i + β̂ Σ_i xi π̂i + Σ_i log(1 − π̂i)]
= −2[Σ_i π̂i(α̂ + β̂xi) + Σ_i log(1 − π̂i)]
= −2 Σ_i π̂i log(π̂i/(1 − π̂i)) − 2 Σ_i log(1 − π̂i).

40. a. Expand log[p/(1 − p)] in a Taylor series for a neighborhood of points around p = π, and take just the term with the first derivative.
b. Let pi = yi/ni. The ith sample logit is
log[pi/(1 − pi)] ≈ log[πi^(t)/(1 − πi^(t))] + (pi − πi^(t))/πi^(t)(1 − πi^(t))
= log[πi^(t)/(1 − πi^(t))] + [yi − ni πi^(t)]/ni πi^(t)(1 − πi^(t))

Chapter 6
1. logit(π̂) = −9.35 + 0.834(weight) + 0.307(width).
a. Like. ratio stat. = 32.9 (df = 2), P < 0.0001. There is extremely strong evidence that at least one variable affects the response.
b. Wald statistics are (0.834/0.671)² = 1.55 and (0.307/0.182)² = 2.85. These each have df = 1, and the P-values are 0.21 and 0.09. These predictors are highly correlated (Pearson corr. = 0.887), so this is the problem of multicollinearity.
12. The estimated odds of admission were 1.84 times higher for men than women. However, θ̂AG(D) = 0.90, so given department, the estimated odds of admission were 0.90 times as high for men as for women. Simpson’s paradox strikes again! Men applied relatively more often to Departments A and B, whereas women applied relatively more often to Departments C, D, E, F. At the same time, admissions rates were relatively high for Departments A and B and relatively low for C, D, E, F. These two effects combine to give a relative advantage to men for admissions when we study the marginal association.
The values of G² are 2.68 for the model with no G effect and 2.56 for the model with G and D main effects. For the latter model, CI for conditional AG odds ratio is (0.87, 1.22).
17. The CMH statistic simplifies to the McNemar statistic of Sec. 11.1, which in chi-squared form equals (14 − 6)²/(14 + 6) = 3.2 (df = 1). There is slight evidence of a better response with treatment B (P = 0.074 for the two-sided alternative).
27. logit(π̂) = −12.351 + 0.497x. Prob. at x = 26.3 is 0.674; prob. at x = 28.4 (i.e., one std. dev. above mean) is 0.854. The odds ratio is [(0.854/0.146)/(0.674/0.326)] = 2.83, so λ = 1.04, δ = 5.1. Then n = 75.
31. We consider the contribution to the X² statistic of its two components (corresponding to the two levels of the response) at level i of the explanatory variable. For simplicity, we use the notation of (4.21) but suppress the subscripts. Then, that contribution is (y − nπ)²/nπ + [(n − y) − n(1 − π)]²/n(1 − π), where the first component is (observed − fitted)²/fitted for the “success” category and the second component is (observed − fitted)²/fitted for the “failure” category. Combining terms gives (y − nπ)²/nπ(1 − π), which is the square of the residual. Adding these chi-squared components therefore gives the sum of the squared residuals.
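The algebra in Exercise 31 is simple to confirm numerically. A minimal Python sketch with hypothetical values of y, n, and π (the function names are assumptions):

```python
def pearson_components(y, n, pi):
    """Sum of (observed - fitted)^2 / fitted over the success and failure cells."""
    success = (y - n * pi) ** 2 / (n * pi)
    failure = ((n - y) - n * (1 - pi)) ** 2 / (n * (1 - pi))
    return success + failure


def squared_pearson_residual(y, n, pi):
    """Square of the Pearson residual (y - n*pi) / sqrt(n*pi*(1 - pi))."""
    return (y - n * pi) ** 2 / (n * pi * (1 - pi))


# Hypothetical setting: 7 successes in 20 trials with fitted probability 0.25.
print(pearson_components(7, 20, 0.25), squared_pearson_residual(7, 20, 0.25))
```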
35. The noncentrality is the same for models (X + Z) and (Z), so the difference statistic has noncentrality 0. The conditional XY independence model has noncentrality proportional to n, so the power goes to 1 as n increases.
41. a. E(Y ) = α + β1 X + β2 Z. The slope β1 of the line for the partial relationship between E(Y ) and X is the same at all fixed levels of Z.

b. With dummy (indicator) variables for Z, one has parallelism of lines. That is, the slope of the line relating E(Y ) and X is the same for each category of Z.
c. Use dummy variables for X and Z, but no interaction terms. The difference between
E(Y ) at two categories of X is the same at each fixed category of Z.
d. For logistic models, the odds ratio relating Y and X is the same at each category of Z.
Chapter 7
21. Log likelihood for the probit model is
log Π_{i=1}^{N} [Φ(Σ_j βj xij)]^(yi) [1 − Φ(Σ_j βj xij)]^(1 − yi)
= Σ_i yi log Φ(Σ_j βj xij) + Σ_i (1 − yi) log[1 − Φ(Σ_j βj xij)]
= Σ_i yi log{Φ(Σ_j βj xij)/[1 − Φ(Σ_j βj xij)]} + Σ_i log[1 − Φ(Σ_j βj xij)].

For the probit model, with Φ and φ evaluated at Σ_j βj xij for observation i,
∂L/∂βj = Σ_i yi [(1 − Φ)/Φ] [xij φ(1 − Φ) + xij φ Φ]/(1 − Φ)² − Σ_i xij φ/(1 − Φ) = 0
=⇒ Σ_i yi xij φ/[Φ(1 − Φ)] − Σ_i xij φ/(1 − Φ) = 0
=⇒ Σ_i yi xij φ(Σ_j β̂j xij)/[π̂i(1 − π̂i)] − Σ_i xij φ(Σ_j β̂j xij) π̂i/[π̂i(1 − π̂i)] = 0
=⇒ Σ_i (yi − π̂i) xij zi = 0, where zi = φ(Σ_j β̂j xij)/π̂i(1 − π̂i).

For logistic regression, from (4.28) with {ni = 1}, Σ_i (yi − π̂i) xij = 0.

25. (log π(x2))/(log π(x1)) = exp[β(x2 − x1)], so π(x2) = π(x1)^exp[β(x2 − x1)]. For x2 − x1 = 1, π(x2) equals π(x1) raised to the power exp(β).

Chapter 8

3. Both gender and race have significant effects. The logistic model with additive effects and no interaction fits well, with G2 = 0.2 based on df = 2. The estimated odds of preferring Democrat instead of Republican are higher for females and for blacks, with estimated conditional odds ratios of 1.8 between gender and party ID and 9.8 between race and party ID.
7. For any collapsing of the response, for Democrats the estimated odds of response in the liberal direction are exp(0.975) = 2.65 times the estimated odds for Republicans. The estimated probability of a very liberal response equals exp(−2.469)/[1 + exp(−2.469)] =
0.078 for Republicans and exp(−2.469 + 0.975)/[1 + exp(−2.469 + 0.975)] = 0.183 for
Democrats.
8. a. Four intercepts are needed for five response categories. For males in urban areas wearing seat belts, all dummy variables equal 0 and the estimated cumulative probabilities are exp(3.3074)/[1 + exp(3.3074)] = 0.965, exp(3.4818)/[1 + exp(3.4818)] = 0.970, exp(5.3494)/[1 + exp(5.3494)] = 0.995, exp(7.2563)/[1 + exp(7.2563)] = 0.9993, and 1.0.
The corresponding response probabilities are 0.965, 0.005, 0.025, 0.004, and 0.0007.
b. Wald CI is exp[−0.5463±1.96(0.0272)] = (exp(−0.600), exp(−0.493)) = (0.549, 0.611).
Given seat belt use and location, the estimated odds of injury below any fixed level for a female are between 0.549 and 0.611 times the estimated odds for a male.
c. Estimated odds ratio equals exp(−0.7602 − 0.1244) = 0.41 in rural locations and exp(−0.7602) = 0.47 in urban locations. The interaction effect -0.1244 is the difference between the two log odds ratios.
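The step in Exercise 8a from the four intercepts to cumulative and then individual response probabilities can be sketched in Python. This assumes the model form logit[P(Y ≤ j)] = αj + (linear predictor), with the linear predictor equal to 0 when all dummy variables are 0; the function names are hypothetical.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


def category_probabilities(intercepts, linear_predictor=0.0):
    """Cumulative and individual category probabilities from cumulative logits."""
    cumulative = [expit(a + linear_predictor) for a in intercepts] + [1.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    probs = [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
    return cumulative, probs


intercepts = [3.3074, 3.4818, 5.3494, 7.2563]  # values quoted in Exercise 8a
cumulative, probs = category_probabilities(intercepts)
print([round(c, 4) for c in cumulative])
print([round(p, 4) for p in probs])
```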
10. a. Setting up indicator variables (1,0) for (male, female) and (1,0) for (sequential, alternating), we get treatment effect = -0.581 (SE = 0.212) and gender effect = -0.541
(SE = 0.295). The estimated odds ratios are 0.56 and 0.58. The sequential therapy leads to a better response than the alternating therapy; the estimated odds of response with sequential therapy below any fixed level are 0.56 times the estimated odds with alternating therapy.
b. The main effects model fits well (G² = 5.6, df = 7), and adding an interaction term does not give an improved fit (the interaction model has G² = 4.5, df = 6).
15. The estimated odds a Democrat is classified in the more liberal instead of the more conservative of two adjacent categories are exp(0.435) = 1.54 times the estimated odds for a Republican. For the two extreme categories, the estimated odds ratio equals exp[4(0.435)] = 5.7.
17.a. Using scores 3.2, 3.75, 4.5, 5.2, the proportional odds model has a treatment effect of 0.805 with SE = 0.206; for the treatment group, the estimated odds that ending cholesterol is below any fixed level are exp(0.805) = 2.24 times the odds for the control group. The psyllium treatment seems to have had a strong, beneficial effect.
18. CMH statistic for correlation alternative, using equally-spaced scores, equals 6.3 (df = 1) and has P-value = 0.012. When there is roughly a linear trend, this tends to be more powerful and give smaller P-values, since it focuses on a single degree of freedom.
LR statistic for cumulative logit model with linear effect of operation = 6.7, df = 1, P = 0.01; strong evidence that operation has an effect on dumping, gives similar results as in (a). LR statistic comparing this model to model with four separate operation parameters equals 2.8 (df = 3), so simpler model is adequate.
29. The multinomial mass function factors as the multinomial coefficient times πJ^n exp[Σ_{i=1}^{J−1} ni log(πi/πJ)], which has the form a function of the data times a function of the parameters (namely (1 − π1 − ... − πJ−1)^n) times an exponential function of a sum of the observations times the canonical parameters, which are the baseline-category logits.
32. ∂π3(x)/∂x = −[β1 exp(α1 + β1x) + β2 exp(α2 + β2x)]/[1 + exp(α1 + β1x) + exp(α2 + β2x)]².
a. The denominator is positive, and the numerator is negative when β1 > 0 and β2 > 0.

36. The baseline-category logit model refers to individual categories rather than cumulative probabilities. There is no linear structure for baseline-category logits that implies identical effects for each cumulative logit.
37. a. For j < k, logit[P(Y ≤ j | X = xi)] − logit[P(Y ≤ k | X = xi)] = (αj − αk) + (βj − βk)x. This difference of cumulative logits cannot be positive since P(Y ≤ j) ≤ P(Y ≤ k); however, if βj > βk then the difference is positive for large x, and if βj < βk then the difference is positive for small x.
39. a. df = I(J − 1) − [(J − 1) + (I − 1)] = (I − 1)(J − 2).
c. The full model has an extra I − 1 parameters.
d. The cumulative probabilities in row a are all smaller or all greater than those in row b depending on whether µa > µb or µa < µb .
43. For a given subject, the model has the form
πj = exp(αj + βj x + γuj)/Σ_h exp(αh + βh x + γuh).

For a given cost, the odds a female selects a over b are exp(βa − βb ) times the odds for males. For a given gender, the log odds of selecting a over b depend on ua − ub .
Chapter 9
1. G² values are 2.38 (df = 2) for (GI, HI), and 0.30 (df = 1) for (GI, HI, GH).
b. Estimated log odds ratio is −0.252 (SE = 0.175) for GH association, so CI for odds ratio is exp[−0.252 ± 1.96(0.175)]. Similarly, estimated log odds ratio is 0.464 (SE = 0.241) for GI association, leading to CI of exp[0.464 ± 1.96(0.241)]. Since the intervals contain values rather far from 1.0, it is safest to use model (GH, GI, HI), even though simpler models fit adequately.
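The two intervals in part b are left in exp[...] form; evaluating them shows how far from 1.0 they extend. A minimal Python sketch using the estimates and standard errors quoted above (the function name is hypothetical):

```python
import math


def odds_ratio_ci(log_odds_ratio, se, z=1.96):
    """Wald interval for an odds ratio, exponentiating the log-scale endpoints."""
    return math.exp(log_odds_ratio - z * se), math.exp(log_odds_ratio + z * se)


gh_interval = odds_ratio_ci(-0.252, 0.175)
gi_interval = odds_ratio_ci(0.464, 0.241)
print([round(v, 2) for v in gh_interval], [round(v, 2) for v in gi_interval])
```

Both intervals include 1.0, yet each also reaches values well away from it (roughly 0.55 for GH and 2.55 for GI), which is the point made in part b.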
4. For either approach, from (8.14), the estimated conditional log odds ratio equals
λ̂11^AC + λ̂22^AC − λ̂12^AC − λ̂21^AC
5. a. G2 = 31.7, df = 48. The data are sparse, but the model seems to fit well. It is plausible that the association between any two items is the same at each combination of levels of the other two items.
b. log(µ11cl µ33cl /µ13cl µ31cl ) = log(µ11cl ) + log(µ33cl ) − log(µ13cl ) − log(µ31cl ).
Substitute the model formula, and simplify. The estimated odds ratio equals exp(2.142) = 8.5. There is a strong positive association. Given responses on C and L, the estimated odds of judging spending on E to be too much instead of too little are 8.5 times as high for those who judge spending on H to be too much as for those who judge spending on H to be too low. The 95% CI is exp[2.142 ± 1.96(0.523)], or (3.1, 24.4). Though it is very wide, it is clear that the true association is strong.
7. a. Let S = safety equipment, E = whether ejected, I = injury. Then, G2 (SE, SI, EI) =
2.85, df = 1. Any simpler model has G2 > 1000, so it seems there is an association for each pair of variables, and that association can be regarded as the same at each level of the third variable. The estimated conditional odds ratios are 0.091 for S and E (i.e., wearers of seat belts are much less likely to be ejected), 5.57 for S and I, and 0.061 for
E and I.
b. Loglinear models containing SE are equivalent to logit models with I as response variable and S and E as explanatory variables. The loglinear model (SE, SI, EI) is equivalent to a logit model in which S and E have additive effects on I. The estimated odds of a fatal injury are exp(2.798) = 16.4 times higher for those ejected (controlling for S), and exp(1.717) = 5.57 times higher for those not wearing seat belts (controlling for E).
8. Injury has estimated conditional odds ratios 0.58 with gender, 2.13 with location, and
0.44 with seat-belt use. “No” is category 1 of I, and “female” is category 1 of G, so the odds of no injury for females are estimated to be 0.58 times the odds of no injury for males (controlling for L and S); that is, females are more likely to be injured. Similarly, the odds of no injury for urban location are estimated to be 2.13 times the odds for rural location, so injury is more likely at a rural location, and the odds of no injury for no seat belt use are estimated to be 0.44 times the odds for seat belt use, so injury is more likely for no seat belt use, other things being fixed. Since there is no interaction for this model, overall the most likely case for injury is therefore females not wearing seat belts in rural locations.
9. a. (DVF, YD, YV, YF).
b. Model with Y as response and additive factor effects for D and V, $\mathrm{logit}(\pi) = \alpha + \beta_i^D + \beta_j^V$.
c. (i) (DVF, Y), $\mathrm{logit}(\pi) = \alpha$, (ii) (DVF, YF), $\mathrm{logit}(\pi) = \alpha + \beta_i^F$, (iii) (DVF, YDV, YF), add term of form $\beta_{ij}^{DV}$ to logit model.
13. Homogeneous association model (BP, BR, BS, PR, PS, RS) fits well (G2 = 7.0, df = 9). Model deleting PR association also fits well (G2 = 10.7, df = 11), but we use the full model.
For homogeneous association model, estimated conditional BS odds ratio equals exp(1.147) = 3.15. For those who agree with birth control availability, the estimated odds of viewing premarital sex as wrong only sometimes or not wrong at all are about triple the estimated odds for those who disagree with birth control availability; there is a positive association between support for birth control availability and premarital sex. The 90% CI is exp(1.147 ± 1.645(0.153)) = (2.45, 4.05).
Model (BPR, BS, PS, RS) has G2 = 5.8, df = 7, and also fits well.
17. b. $\log\theta_{11(k)} = \log\mu_{11k} + \log\mu_{22k} - \log\mu_{12k} - \log\mu_{21k} = \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}$; for zero-sum constraints, as in problem 16c, this simplifies to $4\lambda_{11}^{XY}$.
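e. Use equations such as

$$\lambda = \log(\mu_{111}), \quad \lambda_i^{X} = \log\frac{\mu_{i11}}{\mu_{111}}, \quad \lambda_{ij}^{XY} = \log\frac{\mu_{ij1}\,\mu_{111}}{\mu_{i11}\,\mu_{1j1}}, \quad \lambda_{ijk}^{XYZ} = \log\frac{[\mu_{ijk}\,\mu_{11k}/\mu_{i1k}\,\mu_{1jk}]}{[\mu_{ij1}\,\mu_{111}/\mu_{i11}\,\mu_{1j1}]}.$$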

19. a. When Y is jointly independent of X and Z, πijk = π+j+ πi+k . Dividing πijk by π++k , we find that P (X = i, Y = j|Z = k) = P (X = i|Z = k)P (Y = j). But when πijk = π+j+ πi+k , P (Y = j|Z = k) = π+jk /π++k = π+j+ π++k /π++k = π+j+ = P (Y = j).
Hence, P (X = i, Y = j|Z = k) = P (X = i|Z = k)P (Y = j) = P (X = i|Z = k)P (Y = j|Z = k) and there is XY conditional independence.
b. For mutual independence, πijk = πi++ π+j+ π++k . Summing both sides over k, πij+ = πi++ π+j+ , which is marginal independence in the XY marginal table.
c. For instance, model (Y, XZ) satisfies this, but X and Z are dependent (the conditional association being the same as the marginal association in each case, for this model).
21. Use the definitions of the models, in terms of cell probabilities as functions of marginal probabilities. When one specifies sufficient marginal probabilities that have the required one-way marginal probabilities of 1/2 each, these specified marginal distributions then determine the joint distribution. Model (XY, XZ, YZ) is not defined in the same way; for it, one needs to determine cell probabilities for which each set of partial odds ratios do not equal 1.0 but are the same at each level of the third variable.
a.
              Y                  Y
X     0.125   0.125      0.125   0.125
      0.125   0.125      0.125   0.125
             Z=1                Z=2

This is actually a special case of (X,Y,Z) called the equiprobability model.
b. (same layout as in part a)
      0.15    0.10       0.15    0.10
      0.10    0.15       0.10    0.15
c.
      1/4     1/24       1/12    1/8
      1/8     1/12       1/24    1/4
d.
      2/16    1/16       4/16    1/16
      1/16    4/16       1/16    2/16
e. Any 2 × 2 × 2 table
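As a rough check on one of these examples, the sketch below takes the eight probabilities listed for part d and computes the one-way margins and the XY conditional odds ratio at each level of Z. It assumes each printed row gives the two Y cells for Z=1 followed by the two Y cells for Z=2, with rows indexing X; that reading of the layout is an assumption, and the function names are illustrative.

```python
from fractions import Fraction as F

# Part d probabilities stored as table[z][x][y]; the row/column reading
# (rows = X, first two columns = Z=1, last two = Z=2) is an assumption.
table_d = [
    [[F(2, 16), F(1, 16)], [F(1, 16), F(4, 16)]],  # Z = 1
    [[F(4, 16), F(1, 16)], [F(1, 16), F(2, 16)]],  # Z = 2
]


def conditional_xy_odds_ratios(table):
    """XY odds ratio within each level of Z."""
    return [t[0][0] * t[1][1] / (t[0][1] * t[1][0]) for t in table]


def first_level_margins(table):
    """P(X=1), P(Y=1), P(Z=1) obtained by summing cell probabilities."""
    px1 = sum(table[z][0][y] for z in range(2) for y in range(2))
    py1 = sum(table[z][x][0] for z in range(2) for x in range(2))
    pz1 = sum(table[0][x][y] for x in range(2) for y in range(2))
    return px1, py1, pz1


odds_ratios = conditional_xy_odds_ratios(table_d)
margins = first_level_margins(table_d)
print("XY odds ratios by level of Z:", odds_ratios)
print("P(X=1), P(Y=1), P(Z=1):", margins)
```

Under that reading, the conditional odds ratios differ from 1.0 but agree across the two levels of Z, and each one-way margin is 1/2, as the solution requires.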

23. Number of terms $= 1 + \binom{T}{1} + \binom{T}{2} + \cdots + \binom{T}{T} = \sum_i \binom{T}{i} 1^i 1^{T-i} = (1 + 1)^T$, by the Binomial theorem.
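The count can be confirmed for small T by summing the binomial coefficients directly. This is only an illustrative check; the function name is hypothetical.

```python
from math import comb


def number_of_terms(T):
    """Count the intercept plus every i-factor term, i = 1, ..., T."""
    return sum(comb(T, i) for i in range(T + 1))


for T in range(1, 7):
    print(T, number_of_terms(T), 2 ** T)
```

For T = 3, for example, the saturated model has one intercept, three single-factor terms, three two-factor terms, and one three-factor term, for a total of 8.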
25. a. The λXY term does not appear in the model, so X and Y are conditionally independent. All terms in the saturated model that are not in model (WXZ, WYZ) involve X and Y, so permit an XY conditional association.
b. (WX, WZ, WY, XZ, YZ)
27. For independent Poisson sampling,

$$L = \sum_i \sum_j n_{ij} \log \mu_{ij} - \sum_i \sum_j \mu_{ij} = n\lambda + \sum_i n_{i+}\lambda_i^X + \sum_j n_{+j}\lambda_j^Y - \sum_i \sum_j \exp(\log \mu_{ij}).$$

It follows that $\{n_{i+}\}$, $\{n_{+j}\}$ are minimal sufficient statistics, and the likelihood equations are $\hat\mu_{i+} = n_{i+}$, $\hat\mu_{+j} = n_{+j}$ for all i and j. Since the model is $\mu_{ij} = \mu_{i+}\mu_{+j}/n$, the fitted values are $\hat\mu_{ij} = \hat\mu_{i+}\hat\mu_{+j}/n = n_{i+}n_{+j}/n$. The residual degrees of freedom are IJ − [1 + (I − 1) + (J − 1)] = (I − 1)(J − 1).
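A small sketch of the fitted values n_{i+} n_{+j}/n, showing that they reproduce the observed row and column totals as the likelihood equations require. The 2×3 table of counts is hypothetical.

```python
def independence_fit(counts):
    """Return fitted values n_i+ * n_+j / n together with the observed margins."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    fitted = [[r * c / n for c in col_totals] for r in row_totals]
    return fitted, row_totals, col_totals


# Hypothetical 2 x 3 table of counts, used only for illustration.
counts = [[10, 20, 30],
          [20, 20, 20]]

fitted, row_totals, col_totals = independence_fit(counts)
fitted_row_totals = [sum(row) for row in fitted]
fitted_col_totals = [sum(col) for col in zip(*fitted)]
residual_df = (len(counts) - 1) * (len(counts[0]) - 1)

print(fitted)
print(fitted_row_totals, row_totals)
print(fitted_col_totals, col_totals)
print("residual df:", residual_df)
```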
28. For this model, in a given row the J cell probabilities are equal. The likelihood equations are $\hat\mu_{i+} = n_{i+}$ for all i. The fitted values that satisfy the model and the likelihood equations are $\hat\mu_{ij} = n_{i+}/J$.

31. a. The fitted values given by the formula reported in the table satisfy the likelihood equations $\hat\mu_{h+++} = n_{h+++}$, $\hat\mu_{+i++} = n_{+i++}$, $\hat\mu_{++j+} = n_{++j+}$, $\hat\mu_{+++k} = n_{+++k}$, and they satisfy the model, which has probabilistic form $\pi_{hijk} = \pi_{h+++}\pi_{+i++}\pi_{++j+}\pi_{+++k}$, so by Birch's results they are ML estimates.
b. Model (WX, YZ) says that the WX composite variable (having marginal frequencies $\{n_{hi++}\}$) is independent of the YZ composite variable (having marginal frequencies $\{n_{++jk}\}$). Thus, df = [no. categories of (WX) − 1][no. categories of (YZ) − 1] = (HI − 1)(JK − 1). Model (WXY, Z) says that Z is independent of the WXY composite variable, so the usual results apply to the two-way table having Z in one dimension and the HIJ levels of the WXY composite variable in the other; e.g., df = (HIJ − 1)(K − 1).
Chapter 10
2. a. For any pair of variables, the marginal odds ratio is the same as the conditional odds ratio (and hence 1.0), since the remaining variable is conditionally independent of each of those two.
b. (i) For each pair of variables, at least one of them is conditionally independent of the remaining variable, so the marginal odds ratio equals the conditional odds ratio. (ii) These are the likelihood equations implied by the λAC term in the model.
c. (i) Both A and C are conditionally dependent with M, so the association may change when one controls for M. (ii) For the AM odds ratio, since A and C are conditionally independent (given M), the odds ratio is the same when one collapses over C. (iii) These are likelihood equations implied by the λAM and λCM terms in the model.
d. (i) no pairs of variables are conditionally independent, so collapsibility conditions are not satisfied for any pair of variables. (ii) These are likelihood equations implied by the three association terms in the model.
7. Model (AC, AM, CM) fits well. It has df = 1, and the likelihood equations imply fitted values equal observed in each two-way marginal table, which implies the difference between an observed and fitted count in one cell is the negative of that in an adjacent cell; their SE values are thus identical, as are the standardized Pearson residuals. The other models fit poorly; e.g. for model (AM, CM), in the cell with each variable equal to yes, the difference between the observed and fitted counts is 3.7 standard errors.
19. W and Z are separated using X alone or Y alone or X and Y together. W and Y are conditionally independent given X and Z (as the model symbol implies) or conditional on X alone since X separates W and Y . X and Z are conditionally independent given
W and Y or given only Y alone.
20. a. Yes – let U be a composite variable consisting of combinations of levels of Y and
Z; then, collapsibility conditions are satisfied as W is conditionally independent of U, given X.
b. No.
21. b. Using the Haberman result, it follows that

$$\sum_i \hat\mu_{1i} \log(\hat\mu_{0i}) = \sum_i \hat\mu_{0i} \log(\hat\mu_{0i}),$$

$$\sum_i n_i \log(\hat\mu_{ai}) = \sum_i \hat\mu_{ai} \log(\hat\mu_{ai}), \quad a = 0, 1.$$

The first equation is obtained by letting $\{\hat\mu_i\}$ be the fitted values for M0. The second pair of equations is obtained by letting M1 be the saturated model. Using these, one can obtain the result.
25. From the definition, it follows that a joint distribution of two discrete variables is positively likelihood-ratio dependent if all odds ratios of form µij µhk /µik µhj ≥ 1, when i < h and j < k.
a. For L×L model, this odds ratio equals exp[β(uh −ui )(vk −vj )]. Monotonicity of scores implies ui < uh and vj < vk , so these odds ratios all are at least equal to 1.0 when β ≥ 0.
Thus, when β > 0, as X increases, the conditional distributions on Y are stochastically increasing; also, as Y increases, the conditional distributions on X are stochastically increasing. When β < 0, the variables are negatively likelihood-ratio dependent, and the conditional distributions on Y (X) are stochastically decreasing as X (Y ) increases.
b. For row effects model with j < k, µhj µik /µhk µij = exp[(µi − µh)(vk − vj)]. When µi − µh > 0, all such odds ratios exceed 1.0, since scores on Y are monotone increasing.
Thus, there is likelihood-ratio dependence for the 2 × J table consisting of rows i and h, and Y is stochastically higher in row i.
27. a. Note the derivative of the log likelihood with respect to β is $\sum_i \sum_j u_i v_j (n_{ij} - \mu_{ij})$, which under indep. estimates is $n \sum_i \sum_j u_i v_j (p_{ij} - p_{i+} p_{+j})$.
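b. Use formula (3.9). In this context, $\zeta = \sum_i \sum_j u_i v_j (\pi_{ij} - \pi_{i+}\pi_{+j})$ and $\phi_{ij} = u_i v_j - u_i(\sum_b v_b \pi_{+b}) - v_j(\sum_a u_a \pi_{a+})$. Under H0, $\pi_{ij} = \pi_{i+}\pi_{+j}$, and $\sum_i \sum_j \pi_{ij}\phi_{ij}$ simplifies to $-(\sum_i u_i \pi_{i+})(\sum_j v_j \pi_{+j})$. Also under H0,

$$\begin{aligned}
\sum_i \sum_j \pi_{ij}\phi_{ij}^2 ={}& \sum_i \sum_j u_i^2 v_j^2 \pi_{i+}\pi_{+j} + \Big(\sum_j v_j \pi_{+j}\Big)^2 \Big(\sum_i u_i^2 \pi_{i+}\Big) + \Big(\sum_i u_i \pi_{i+}\Big)^2 \Big(\sum_j v_j^2 \pi_{+j}\Big) \\
&+ 2\Big(\sum_i \sum_j u_i v_j \pi_{i+}\pi_{+j}\Big)\Big(\sum_i u_i \pi_{i+}\Big)\Big(\sum_j v_j \pi_{+j}\Big) \\
&- 2\Big(\sum_i u_i^2 \pi_{i+}\Big)\Big(\sum_j v_j \pi_{+j}\Big)^2 - 2\Big(\sum_j v_j^2 \pi_{+j}\Big)\Big(\sum_i u_i \pi_{i+}\Big)^2 .
\end{aligned}$$

Then $\sigma^2$ in (3.9) simplifies to

$$\Big[\sum_i u_i^2 \pi_{i+} - \Big(\sum_i u_i \pi_{i+}\Big)^2\Big]\Big[\sum_j v_j^2 \pi_{+j} - \Big(\sum_j v_j \pi_{+j}\Big)^2\Big].$$

The asymptotic standard error is $\sigma/\sqrt{n}$, the estimate of which is the same formula with $\pi_{ij}$ replaced by $p_{ij}$.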
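The simplification of σ² can be checked numerically. The sketch below assumes that (3.9) has the usual delta-method form σ² = Σ π_ij φ_ij² − (Σ π_ij φ_ij)²; that form is an assumption here, since the formula itself is not reproduced in this solution. The scores and marginal probabilities are hypothetical.

```python
def sigma_squared_direct(u, v, row_p, col_p):
    """sum(pi * phi^2) - (sum(pi * phi))^2 under independence, pi_ij = pi_i+ * pi_+j."""
    mean_u = sum(ui * p for ui, p in zip(u, row_p))
    mean_v = sum(vj * p for vj, p in zip(v, col_p))
    first_moment = 0.0
    second_moment = 0.0
    for ui, p_row in zip(u, row_p):
        for vj, p_col in zip(v, col_p):
            phi = ui * vj - ui * mean_v - vj * mean_u
            first_moment += p_row * p_col * phi
            second_moment += p_row * p_col * phi ** 2
    return second_moment - first_moment ** 2


def sigma_squared_closed_form(u, v, row_p, col_p):
    """Product of the variances of the row scores and the column scores."""
    mean_u = sum(ui * p for ui, p in zip(u, row_p))
    mean_v = sum(vj * p for vj, p in zip(v, col_p))
    var_u = sum(ui ** 2 * p for ui, p in zip(u, row_p)) - mean_u ** 2
    var_v = sum(vj ** 2 * p for vj, p in zip(v, col_p)) - mean_v ** 2
    return var_u * var_v


# Hypothetical scores and marginal probabilities, for illustration only.
u, row_p = [1.0, 2.0, 3.0], [0.2, 0.3, 0.5]
v, col_p = [1.0, 2.0, 4.0], [0.5, 0.3, 0.2]

direct = sigma_squared_direct(u, v, row_p, col_p)
closed = sigma_squared_closed_form(u, v, row_p, col_p)
print(direct, closed)
```

The two printed values should agree up to floating-point rounding.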
28. For Poisson sampling, log likelihood is

$$L = n\lambda + \sum_i n_{i+}\lambda_i^X + \sum_j n_{+j}\lambda_j^Y + \sum_i \mu_i \Big[\sum_j n_{ij} v_j\Big] - \sum_i \sum_j \exp(\lambda + \ldots)$$

Thus, the minimal sufficient statistics are $\{n_{i+}\}$, $\{n_{+j}\}$, and $\{\sum_j n_{ij} v_j\}$. Differentiating with respect to the parameters and setting results equal to zero gives the likelihood equations. For instance, $\partial L/\partial \mu_i = \sum_j v_j n_{ij} - \sum_j v_j \mu_{ij}$, i = 1, ..., I, from which follows the I equations in the third set of likelihood equations.
30. a. These equations are obtained successively by differentiating with respect to λXZ , λY Z , and β. Note these equations imply that the correlation between the scores for X and the scores for Y is the same for the fitted and observed data. This model uses the ordinality of X and Y , and is a parsimonious special case of model (XY, XZ, Y Z).
b. The third equation is replaced by the K equations,

$$\sum_i \sum_j u_i v_j \hat\mu_{ijk} = \sum_i \sum_j u_i v_j n_{ijk}, \quad k = 1, \ldots, K.$$

This model corresponds to fitting L × L model separately at each level of Z. The G2 value is the sum of G2 for separate fits, and df is the sum of IJ − I − J values from separate fits (i.e., df = K(IJ − I − J)).

31. Deleting the XY superscript to simplify notation,

log θij(k) = (λij + λi+1,j+1 − λi,j+1 − λi+1,j ) + β(ui+1 − ui )(vj+1 − vj )wk .
This has form αij + βij wk , a linear function of the scores for the levels of Z. Thus, the conditional association between X and Y changes linearly across the levels of Z.

36. Suppose ML estimates did exist, and let $c = \hat\mu_{111}$. Then c > 0, since we must be able to evaluate the logarithm for all fitted values. But then $\hat\mu_{112} = n_{112} - c$, since likelihood equations for the model imply that $\hat\mu_{111} + \hat\mu_{112} = n_{111} + n_{112}$ (i.e., $\hat\mu_{11+} = n_{11+}$). Using similar arguments for other two-way margins implies that $\hat\mu_{122} = n_{122} + c$, $\hat\mu_{212} = n_{212} + c$, and $\hat\mu_{222} = n_{222} - c$. But since $n_{222} = 0$, $\hat\mu_{222} = -c < 0$, which is impossible. Thus we have a contradiction, and it follows that ML estimates cannot exist for this model.
37. That value for the sufficient statistic becomes more likely as the model parameter moves toward infinity.
Chapter 11
6. a. Ignoring order, (A=1,B=0) occurred 45 times and (A=0,B=1) occurred 22 times.
The McNemar z = 2.81, which has a two-tail P -value of 0.005 and provides strong evidence that the response rate of successes is higher for drug A.
b. Pearson statistic = 7.8, df = 1
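The McNemar statistic in part a depends only on the two discordant counts. A minimal sketch of that calculation follows, using the standard normal approximation for the two-tail P-value; the function names are illustrative.

```python
import math


def mcnemar_z(n12, n21):
    """z = (n12 - n21) / sqrt(n12 + n21), based on the discordant pairs only."""
    return (n12 - n21) / math.sqrt(n12 + n21)


def two_sided_normal_p(z):
    """Two-tail standard normal probability, 2 * P(Z > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = mcnemar_z(45, 22)  # discordant counts quoted in part a
p_value = two_sided_normal_p(z)
print(round(z, 2), round(p_value, 3))
```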
7. b. $z^2 = (3 - 1)^2/(3 + 1) = 1.0$ = CMH statistic.
e. The P -value equals the binomial probability of 3 or more successes out of 4 trials when the success probability equals 0.5, which equals 5/16.
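The exact P-value in part e is a binomial upper-tail probability. The sketch below evaluates it with exact rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(n, k, p=Fraction(1, 2)):
    """P(at least k successes in n trials) for success probability p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


exact_p = binomial_upper_tail(4, 3)
print(exact_p)
```

The two contributing outcomes are 3 successes (4 arrangements) and 4 successes (1 arrangement), out of 16 equally likely sequences.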
9. a. Symmetry has G2 = 22.5, X2 = 20.4, with df = 10. The lack of fit results primarily from the discrepancy between n13 and n31, for which the adjusted residual is $(44 - 17)/\sqrt{44 + 17} = 3.5$.
b. Compared to quasi symmetry, G2(S | QS) = 22.5 − 10.0 = 12.5, df = 4, for a P-value of .014. The McNemar statistic for the 2×2 table with row and column categories (High Point, Others) is $z = (78 - 42)/\sqrt{78 + 42} = 3.3$. The 95% CI comparing the proportion choosing High Point at the two times is 0.067 ± 0.039.
c. Quasi independence fits much better than independence, which has G2 = 346.4
(df = 16). Given a change in brands, the new choice of coffee brand is plausibly independent of the original choice.
12. a. Symmetry model has X2 = 0.59, based on df = 3 (P = 0.90). Independence has X2 = 45.4 (df = 4), and quasi independence has X2 = 0.01 (df = 1) and is identical to quasi symmetry. The symmetry and quasi independence models fit well.
b. G2(S | QS) = 0.591 − 0.006 = 0.585, df = 3 − 1 = 2. Marginal homogeneity is plausible.
c. Kappa = 0.389 (SE = 0.060), weighted kappa equals 0.427 (SE = 0.0635).
15. Under independence, on the main diagonal, fitted = 5 = observed. Thus, kappa =
0, yet there is clearly strong association in the table.
16. a. Good fit, with G2 = 0.3, df = 1. The parameter estimates for Coke, Pepsi, and
Classic Coke are 0.580 (SE = 0.240), 0.296 (SE = 0.240), and 0. Coke is preferred to
Classic Coke.
b. model estimate = 0.57, sample proportion = 29/49 = 0.59.
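The model estimate in part b can be reproduced from the part a parameter estimates if one assumes a Bradley-Terry form, P(a preferred to b) = exp(βa − βb)/[1 + exp(βa − βb)], and assumes the comparison in part b is Coke versus Pepsi. Neither assumption is stated in this solution; they are adopted here only because they are consistent with the quoted 0.57.

```python
import math

# Parameter estimates quoted in part a (Classic Coke is the baseline).
estimates = {"Coke": 0.580, "Pepsi": 0.296, "Classic Coke": 0.0}


def preference_probability(a, b, params):
    """Assumed Bradley-Terry form: exp(b_a - b_b) / [1 + exp(b_a - b_b)]."""
    diff = params[a] - params[b]
    return math.exp(diff) / (1.0 + math.exp(diff))


model_estimate = preference_probability("Coke", "Pepsi", estimates)
sample_proportion = 29 / 49  # sample proportion quoted in part b
print(round(model_estimate, 2), round(sample_proportion, 2))
```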
17. G2 = 4.29, X2 = 4.65, df = 3. With JRSS-B parameter = 0, other estimates are -0.269 for Biometrika, -0.748 for JASA, -3.218 for Communications, so prestige ranking is: 1. JRSS-B, 2. Biometrika, 3. JASA, 4. Commun. Stat.
26. The matched-pairs t test compares means for dependent samples, and McNemar's test compares proportions for dependent samples. The t test is valid for interval-scale data (with normally-distributed differences, for small samples) whereas McNemar's test is valid for binary data.
28. a. This is a conditional odds ratio, conditional on the subject, but the other model is a marginal model so its odds ratio is not conditional on the subject.
d. This is simply the mean of the expected values of the individual binary observations.
e. In the three-way representation, note that each partial table has one observation in each row. If each response in a partial table is identical, then each cross-product that contributes to the M-H estimator equals 0, so that table makes no contribution to the statistic. Otherwise, there is a contribution of 1 to the numerator or the denominator, depending on whether the first observation is a success and the second a failure, or the reverse. The overall estimator then is the ratio of the numbers of such pairs, or in terms of the original 2×2 table, this is n12 /n21 .
30. When {αi } are identical, the individual trials for the conditional model are identical as well as independent, so averaging over them to get the marginal Y1 and Y2 gives binomials with the same parameters.
31. Since $\beta_M = \log[\pi_{+1}(1 - \pi_{1+})/(1 - \pi_{+1})\pi_{1+}]$, $\partial\beta_M/\partial\pi_{+1} = 1/\pi_{+1} + 1/(1 - \pi_{+1}) = 1/\pi_{+1}(1 - \pi_{+1})$ and $\partial\beta_M/\partial\pi_{1+} = -1/(1 - \pi_{1+}) - 1/\pi_{1+} = -1/\pi_{1+}(1 - \pi_{1+})$. The covariance matrix of $\sqrt{n}(p_{+1}, p_{1+})$ has variances $\pi_{+1}(1 - \pi_{+1})$ and $\pi_{1+}(1 - \pi_{1+})$ and covariance $(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})$. By the delta method, the asymptotic variance of $\sqrt{n}[\log(p_{+1}p_{2+}/p_{+2}p_{1+}) - \log(\pi_{+1}\pi_{2+}/\pi_{+2}\pi_{1+})]$ is

$$[1/\pi_{+1}(1 - \pi_{+1}),\ -1/\pi_{1+}(1 - \pi_{1+})]\ \mathrm{Cov}[\sqrt{n}(p_{+1}, p_{1+})]\ [1/\pi_{+1}(1 - \pi_{+1}),\ -1/\pi_{1+}(1 - \pi_{1+})]',$$

which simplifies to the expression given in the problem. Under independence, the last term in the variance expression drops out (since an odds ratio of 1.0 implies $\pi_{11}\pi_{22} = \pi_{12}\pi_{21}$) and the variance simplifies to $(\pi_{1+}\pi_{2+})^{-1} + (\pi_{+1}\pi_{+2})^{-1}$. Similarly, with the delta method the asymptotic variance of $\sqrt{n}(\hat\beta_C)$ is $\pi_{12}^{-1} + \pi_{21}^{-1}$ (which leads to the SE in (11.10) for $\hat\beta_C$); under independence, this is $(\pi_{1+}\pi_{+2})^{-1} + (\pi_{+1}\pi_{2+})^{-1}$. For each variance, combining the two parts to get a common denominator, then expressing marginal probabilities in each numerator in terms of cell probabilities and comparing the two numerators gives the result.
34. Consider the 3×3 table with cell probabilities, by row, (0.20, 0.10, 0 / 0, 0.30, 0.10 / 0.10, 0, 0.20).
41. a. Since πab = πba, it satisfies symmetry, which then implies marginal homogeneity and quasi symmetry as special cases. For a ≠ b, πab has form αa βb, identifying βb with αb(1 − β), so it also satisfies quasi independence.
c. β = κ = 0 is equivalent to independence for this model, and β = κ = 1 is equivalent to perfect agreement.

43. a. log(Πac /Πca ) = βa − βc = (βa − βb ) + (βb − βc ) = log(Πab /Πba ) + log(Πbc /Πcb ).
b. No, this is not possible, since if a is preferred to b then βa > βb , and if b is preferred to c then βb > βc ; then, it follows that βa > βc , so a is preferred to c.
45. The kernel of the log likelihood simplifies to a

Similar Documents

Premium Essay

A Critique on Kant's Principle of Autonomy

...OF STUDENT…………………………………………………………. SIGNATURE…………………………………………………………………… DATE: …………………………………………………………………………… SUPERVISOR………………………………………………………………….. SIGNATURE…………………………………………………………………… DATE: ………………………………………………………………………….. 3 ABSTRACT The importance of a philosophical study dealing with moral issues, especially the principle of autonomy is indisputably great. It is a common agreement that morality is located within the scope of duty. Kant corroborates this held agreement by stating the categorical imperative which every human is obliged to act upon. He conceived this categorical imperative as the moral law which all those who claim to be moral beings have to live on. However, he also affirmed that only autonomous beings can be moral. Moreover, Autonomy seems to be opposed to any idea of law. It is important to note that Kant conceived autonomy as auto-legislation, auto-determination of the moral subject while the categorical imperative requires a total submission of the same subject. What is categorical imperative? What is moral autonomy? How can a person be autonomous and...

Words: 21012 - Pages: 85

Premium Essay

Lying In Michael J. Sandel's Justice

...Kant believes that lying, in any shape or form, is completely wrong and immoral. In the book Justice, by Michael J. Sandel, a certain scenario was shown to display Kant’s view on lying. The scenario talks about a situation in which you are faced with the decision on whether or not to lie to a murderer in order to save your friends life. In most cases, the answer would be an obvious yes to people. According to Kant’s views on lying, however, he says that you should tell the murderer the truth even if you are risking the life of your friend. Kant offers an alternative to lying, which he says are true yet misleading statements. Kant is okay with these misleading statements because he says that these follow the categorical imperative. The categorical imperative says, according to Sandel, “act only on that maxim whereby you can at the same time that it should become universal law.” (120) Kant believes that the only good way of acting or thinking is by this technique. This applies to the murder scenario because Kant believes that you should tell the truth because it’s the right thing to do, not because of the consequences or...

Words: 673 - Pages: 3

Free Essay

Homework 1

...education fails to improve the quality of instruction in both primary and secondary schools, then it is likely that it will lose additional students to private sector in the years ahead. Answer: Conditional 5.) it is strongly recommended that you have your house inspected for termite damage at the earliest possible opportunity. Answer: Advice 7.) If stem-cell research is restricted, then future cures will not materialize. If future cures do not materialize, then people will die permanently. Therefore, if stem-cell research is restricted, then people will die permanently. Answer: Hypothetical 10.) Five college student who were accused in sneaking into the Cincinnati Zoo and trying to ride the camels pleaded no contest to criminal trespass yesterday. The student scaled a fence to get into the zoo and then climbed another fence to get into the camel pit before security officials caught them, zoo officials said.  Answer: Report 11.) Mortality rates for women undergoing early abortions, where the procedure is legal, appear to be as low as or lower than the rates for normal childbirth. Consequently, any interest of the state in protecting the woman from an inherently hazardous procedure, except when it would be equally dangerous for her to forgo it, has largely disappeared. Answer: Generalization 12.) The pace of reading, clearly, depends entirely upon the reader. He may read as slowly or as rapidly as he can or wishes to read. If he does not understand something...

Words: 857 - Pages: 4

Premium Essay

Kant's Categorical Imperative

...The Categorical Imperative Analyzing Immanuel Kant’s Grounding for A Metaphysics of Morals Anders Bordum WP 4/2002 January 2002 MPP Working Paper No. 4/2002 © January 2002 ISBN: 87-91181-06-2 ISSN: 1396-2817 Department of Management, Politics and Philosophy Copenhagen Business School Blaagaardsgade 23B DK-2200 Copenhagen N Denmark Phone: +45 38 15 36 30 Fax: +45 38 15 36 35 E-mail: as.lpf@cbs.dk www.cbs.dk/departments/mpp 2 The Categorical Imperative Analyzing Immanuel Kant's Grounding for a Metaphysics of Morals By Anders Bordum Keywords: Categorical imperative, discourse ethics, duty, ethics, monologic, dialogic, Immanuel Kant, Jürgen Habermas, self-legislation, self-reference. 3 Abstract In this article I first argue that Immanuel Kant’s conception of the categorical imperative is important to his philosophy. I systematically, though indirectly, interconnect the cognitive and moral aspects of his thinking. Second, I present an interpretation of the Kantian ethics, taking as my point of departure, the concept of the categorical imperative. Finally, I show how the categorical imperative is given a dialogical interpretation by Jürgen Habermas in his approach, usually referred to as discourse ethics. I argue that the dialogical approach taken by discourse ethics is more justifiable and therefore more usefuli. I The Synthesis of Rationalism and Empiricism The philosophy of Immanuel Kant (1724-1804) is in the main inspired...

Words: 10855 - Pages: 44

Premium Essay

Cross Race Effect

...Why have I chosen this topic? One of the main reasons I selected this topic was because I myself have experienced the Cross-Race Effect (CRE) phenomenon. Before, I could never differentiate between East -Asians (Chinese, Japanese, and Korean). My interest in Japanese culture motivated me to study about them and now I can differentiate a Japanese person from a Chinese or Korean person. A summary of what I did: My primary interest was to know that, “How was I able to differentiate between East Asians races just by studying and watching videos about them?” To get my answer I first started by research material available on the Cross-Race effect. To really understand CRE I read abstracts of 9 to 10 books. All in all it has been a pleasure in reading all those books especially D.T. Levin books which helped me in getting a convincing answer to my questions. Cross-Race Effect: The cross-race effect, also known as own-race bias (ORB), is a well established phenomenon in face recognition research. In brief, it has been found that individuals show superior performance in identifying faces of their own race when compared with memory for faces of another, less familiar race. Mechanisms underlying the Cross-Race Effect: Percept versus concept: CRE has been of interest to social psychologists for more than half a century. A number of theoretical explanations for this effect have been proposed but coming to agreement on a satisfying theoretical account for this effect has proven...

Words: 1524 - Pages: 7

Premium Essay

Ehical Decisions

...Introduction: Organizations are all comprised of what makes them who they are; the people. People are all comprised of different make-ups and people are what make businesses what they are which brings me to the point of this discussion; Unethical behavior within organizations. Unethical behavior within organizations has been occurring for centuries and it is what led to their ultimate demise. Unethical behavior is the beginning of the end in some companies and in some of those it results in the ruin of what started out to be a good thing. Some of these companies started out as small prosperous businesses that later grew into large dominate organizations for example; Enron, and of course WorldCom. These businesses began with good intentions and ended up internally combusting. All of it was due to the result of GREED. Greed is a disease, and has plagued several organizational leaders over time and caused them to go against their good ethics and morals. There are many opinions as to why people commit the acts that they do but the bottom line is that money will sometimes bring out the evil in the best of people and Leaders of Corporate America are not immune. Background: The beginning phases of WorldCom began in 1983 with a plan to create a long distance telephone carrier service named (Long Distance Discount Service) Mr. Ebbers was one of the major investor’s and later became the CEO. Like most businesses this one was no different and grew over the years...

Words: 1283 - Pages: 6

Premium Essay

Hello

... | |A Requirement for Paul Amerigo Pajo’s IT-Ethic Class | |De La Salle – College of Saint Benilde | Abstract This book is a consolidated collection of opinions on the Ethical Theories, a chapter from a book assigned to the students of IT-Ethic Section O0B, advised by Mr. Paul Amerigo Pajo. Works written by James Rachels, John Arthur, Friedrich Nietzche and the like are studied and analyzed and some are criticized by the author of this book. Dedication I dedicate this book to the following: My dearest family, who always believes in me; My sweetest friends, who never fails to keep me sane in this crazy world; Lastly, I dedicate this book to the Almighty Father, for everything else is nothing without You. Chapter I Egoism and Moral Scepticism James Rachels Amazon Reference: http://www.amazon.com/Contemporary-Moral-Problems-James-White/dp/0495553204/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1235694270&sr=8-1 Quote: But suppose we were to concede, for the sake of the argument, that all voluntary action is motivated by the agent’s wants, or at least that Smith is so motivated. Even if this were granted, it would not follow that...

Words: 4684 - Pages: 19

Free Essay

Pdf, Docx

...to already existing drugs, or possible side effects. 2. How fuel efficient a certain car model is? 3. Is there any relationship between your GPA and employment opportunities? 4. If you answer all questions on a (T, F) (or multiple choice) examination completely randomly, what are your chances of passing? 5. What is the effect of package designs on sales? 6. ………………….. Question??? 1. What is Statistics? 2. Why we study Statistics? Larson & Farber, Elementary Statistics: Picturing the World, 3e 2 STA 13- SYLLABUS Instructor Phone: MsC. Pham Thanh Hieu mobile:0917.522.383, email: hieuphamthanh@gmail.com Goals of  To learn how to interpret statistical summaries appearing the course in journals, newspaper reports, internet, television …..and many real-world problems.  To learn about the concepts of probability and probabilistic reasoning  Understand variability and sampling distributions  To learn how to interpret and analyze data arising in your own work (coursework and research) STA 13- SYLLABUS Grading: - One Midterms : 30% total, multiple choice exams, closed book exam, one sheet with handwritten notes (no larger than 9 ½ x 11, two sided) is allowed - Final Exam : 50% (multiple choice + short answer exam) comprehensive; closed book exam, two sheets with handwritten notes (no larger than 9 ½ x 11, two sided) are allowed Homework: 20%. Submit homework in discussion sessions. On homework, please print your name. ...

Words: 2522 - Pages: 11

Premium Essay

Kant's Case Of Lie Essay

...action. So, in the case of lying, I would never be able to lie, because I would not want to live in a world where everyone lied always in every instance. Furthermore, Kant would say that, if I do lie, then I would be responsible for the consequences. However, one of my biggest problems, raised by the book, is that Kant never really seemed to explain whether someone would be responsible for the consequences of telling the truth....

Words: 609 - Pages: 3

Premium Essay

Green Marketing - a Research Proposal

...Project FOX Fad or Expedient? - Perceptions of Consumers and Organisations on Green Marketing. Mieke van Kaam a research proposal – 22 April 2012 Table of Contents 1. Background 3 2. Problem statement 3 3. Research objectives 4 4. The scope and limitations of the proposed research 4 5. Literature review 6 5.1. Green fever –A load of Greenwash or not. 6 5.2. How green can you go? 7 5.3. Lets collaborate! 7 5.4. Consumer evolution 8 6. Research plan 9 6.1. Description of research subjects and design 9 6.2. Sampling plan 9 6.3. Instruments 9 6.4. Procedures 9 7. Proposed methods for processing, analysing and interpreting data 11 7.1. Quantitative 11 7.2. Qualitative 11 8. Timeline 12 9. Potential outcomes and conclusion 13 10. Reference list 14 11. Appendix A 15 Background * Green marketing is the product modifications and/or changes in production processes, * packaging and advertising, made by companies to ensure that the final consumer product * is environmentally safe. * This is a simple definition for green marketing, but how many consumers and organisations * in South Africa (SA) actually understand the essence of green marketing. And if they do, * what are their viewpoints on green marketing and how was it shaped? Do organisations see * it as a fad attribute that's merely added to a product to ensure premium pricing options * and eventually higher profits for the company...

Words: 3360 - Pages: 14

Premium Essay

Business Ethics

...of California
Theresa Carter
Module 1 Case Assignment
ETH501: Business Ethics
Saturday, April 26, 2014
A Master’s paper submitted to the faculty of University of California in partial fulfillment of the requirement for the award of Graduate Diploma in Master’s Degree in Business Management

Introduction
The purpose of this assignment is to provide a critical analysis of the 2002 collapse of Adelphia Communications as seen through the lens of Immanuel Kant’s deontological ethics. This analysis will be accomplished by providing a brief timeline of the Adelphia collapse, identifying and discussing two key ethical problems raised, and describing what is meant by deontological ethics. More specifically, this paper will show how Kant’s Categorical Imperative (CI) applies to this scenario. The latter discussion will apply the deontological framework of business ethics to the two key ethical problems by applying the CI to the Adelphia scenario. The supporting material for this discussion can be found in Harvard University’s 2011 lecture Justice: What’s the Right Thing to Do?, as presented by Professor Michael Sandel [8]. To examine the elements of the case, we will inspect the unethical behavior of five key figures culpable in the “rise and fall of the small town saga of epic dimensions” [8]: John J. Rigas (Founder); his two sons, Timothy J. Rigas (CFO) and Michael J. Rigas (VP of Operations); James R. Brown (VP of Finance); and Michael C. Mulcahey (Director of Internal Reporting)....

Words: 2164 - Pages: 9

Premium Essay

Abortion

...Are Individual Rights More Important Than Human Life? By Talha Sajjad English 161: Academic II Dr. William Ford University of Illinois at Chicago May 3rd, 2010 There are protests and demonstrations held every day, yet somehow abortion is still legal in the United States. In the decision of the Supreme Court case Roe v. Wade, it was ruled that women have the right, given to them by the Constitution, to have an abortion in the early stages of pregnancy (Infoplease). Hundreds of protesters gather outside clinics that offer abortions and try to present their position on the issue, but it seems as though their cries and complaints are never heard. The main question that we must decide on is this: is it just to take away human life before it even has the chance to be lived? Several countries around the world have outlawed the practice of abortion. When deciding the abortion issue, it’s women’s rights as citizens of the United States versus the religious beliefs of a majority of citizens. What is more important, the sanctity of life or allowing murder on the basis of one’s right to choose? Given that the abortion procedure allows women sexual and reproductive freedom, it has unconsciously led to a trend where abortion is being used as a method of contraception. In the United States, 49% of pregnancies are unintended, and American women used abortion as a tool to terminate almost half of these pregnancies (Infoplease). Abortion was not meant to be used in accidental...

Words: 3303 - Pages: 14

Premium Essay

Business Ethics

...Williams-Rivers
Module 1 Case Assignment
ETH501: Business Ethics
Dr. Gary Shelton
Saturday, April 26, 2014
A Master’s paper submitted to the faculty of Touro University California in partial fulfillment of the requirement for the award of Graduate Diploma in Master’s Degree in Business Management

Introduction
The purpose of this assignment is to provide a critical analysis of the 2002 collapse of Adelphia Communications as seen through the lens of Immanuel Kant’s deontological ethics. This analysis will be accomplished by providing a brief timeline of the Adelphia collapse, identifying and discussing two key ethical problems raised, and describing what is meant by deontological ethics. More specifically, this paper will show how Kant’s Categorical Imperative (CI) applies to this scenario. The latter discussion will apply the deontological framework of business ethics to the two key ethical problems by applying the CI to the Adelphia scenario. The supporting material for this discussion can be found in Harvard University’s 2011 lecture Justice: What’s the Right Thing to Do?, as presented by Professor Michael Sandel [8]. To examine the elements of the case, we will inspect the unethical behavior of five key figures culpable in the “rise and fall of the small town saga of epic dimensions” [8]: John J. Rigas (Founder); his two sons, Timothy J. Rigas (CFO) and Michael J. Rigas (VP of Operations); James R. Brown (VP of Finance); and Michael C. Mulcahey (Director of Internal Reporting)....

Words: 2168 - Pages: 9

Premium Essay

Define Criminology

...Task 1 - How would you define criminology? Criminology could be defined in many ways. My understanding is that there is no single proven, categorical definition of the word criminology, as there are so many ways to describe it. Criminology has been oriented mainly towards sociology since the 1920s. However, the book notes that some statements link criminology to psychological and biological thinking. Regardless of which discipline you choose to define criminology, one thing is certain: it “is the use of a systematic way of thinking”. My definition of criminology is the scientific study of crime, criminals, criminal behaviour, the nature and extent of crime, and the causes and control of criminal behaviour in both the individual and society. Criminology could be called a social science, as it combines the efforts of sociologists, psychologists and social anthropologists. Their theories and findings give a better understanding of crime and criminal behaviour that can support prosecutors, judges, lawyers, prison officials and probation officers, so they can improve or develop more appropriate sentences and treatments for criminal behaviour. Criminology centres its research on crime and the individuals who commit crime; it also looks at the criminal justice system in the hope that the information can be transformed into policies that will affect the handling...

Words: 423 - Pages: 2

Free Essay

Not an Essay

...AS Philosophy & Ethics Course Handbook 2013 to 2014

OCR AS Level Religious Studies (H172)
http://www.ocr.org.uk/qualifications/type/gce/hss/rs/index.aspx

You are studying Philosophy of Religion and Religious Ethics and will be awarded an OCR AS Level in Religious Studies. The modules and their weightings are:

| AS: | Unit Code | Unit Title | % of AS | (% of A Level) |
| | G571 | AS Philosophy of Religion | 50% | (25%) |
| | G572 | AS Religious Ethics | 50% | (25%) |

If you decide to study for the full A Level you will have to study the following modules at A2:

| A2: | Unit Code | Unit Title | (% of A Level) |
| | G581 | A2 Philosophy of Religion | (25%) |
| | G582 | A2 Religious Ethics | (25%) |

Grading | ...

Words: 13036 - Pages: 53