Maximum Likelihood Estimation and Method of Moments
Definition of MLE’s
Easy Examples
Trickier Examples
Invariance Property of MLE’s
Method of Moments
Definition of MLE’s
Definition: Consider an i.i.d. random sample X1, . . . , Xn, where each Xi has p.d.f./p.m.f. f(x). Further, suppose that θ is some unknown parameter of the Xi's. The likelihood function is
L(θ) ≡ ∏_{i=1}^n f(xi).
Definition: The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes L(θ). The MLE is a function of the Xi's and is therefore a RV.
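For concreteness, here is a minimal numerical sketch of the definition (not part of the original notes, and assuming NumPy/SciPy are available): the likelihood is the product of the density at each observation, and the MLE is the θ that maximizes it, found here by brute force over a grid for a Nor(θ, 1) stand-in model.

```python
# A brute-force illustration of the definition: L(theta) = prod_i f(x_i; theta),
# and the MLE is the theta that maximizes it. The Nor(theta, 1) model and the
# simulated data are stand-ins, not from the notes.
import numpy as np
from scipy import stats

def likelihood(theta, x):
    return np.prod(stats.norm.pdf(x, loc=theta, scale=1.0))

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.0, size=30)

grid = np.linspace(-5, 5, 2001)
mle = grid[np.argmax([likelihood(t, x) for t in grid])]
print(mle, x.mean())   # for this stand-in model, the MLE equals the sample mean
```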
Easy Examples
Example: Suppose X1, . . . , Xn ∼ Exp(λ), i.i.d. Find the MLE for λ. The likelihood is
L(λ) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n λ e^{−λxi} = λ^n exp(−λ Σ_{i=1}^n xi).
Now maximize L(λ) with respect to λ.
We could take the derivative and plow through all of the horrible algebra, but that is too tedious. We need a trick. . . .
Useful Trick: Since the natural log function is one-to-one, it's easy to see that the λ that maximizes L(λ) also maximizes ln(L(λ))!
ln(L(λ)) = ln(λ^n exp(−λ Σ_{i=1}^n xi)) = n ln(λ) − λ Σ_{i=1}^n xi.
This makes our job less horrible:
∂/∂λ ln(L(λ)) = ∂/∂λ (n ln(λ) − λ Σ_{i=1}^n xi) = n/λ − Σ_{i=1}^n xi ≡ 0.
This implies that the MLE is λ̂ = 1/X̄.
Remarks: (1) λ̂ = 1/X̄ makes sense since E[X] = 1/λ.
(2) At the end, we put a little hat over λ to indicate that this is the MLE.
(3) At the end, we make all of the little xi's into big Xi's to indicate that this is a RV.
(4) Just to be careful, you probably ought to perform a second-derivative test, but I won't blame you if you don't.
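As a quick hedged check of this derivation (assuming NumPy/SciPy; the rate and data below are simulated stand-ins, not from the notes), we can compare the closed-form MLE 1/X̄ against a direct numerical maximization of the log-likelihood:

```python
# Check of the Exp(lambda) MLE: the closed form 1/X-bar vs. a direct numerical
# maximization of ln L(lambda) = n ln(lambda) - lambda * sum(x_i).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=1000)   # Exp(lambda = 2.5) has mean 1/lambda

closed_form = 1 / x.mean()                      # lambda-hat = 1 / X-bar

def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

numerical = minimize_scalar(neg_log_lik, bounds=(1e-6, 100), method="bounded").x
print(closed_form, numerical)                   # the two should agree closely
```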
Example: Suppose X1, . . . , Xn ∼ Bern(p), i.i.d. Find the MLE for p.
Useful trick for this problem: Since
Xi = 1 w.p. p and 0 w.p. 1 − p,
we can write the p.m.f. as f(x) = p^x (1 − p)^{1−x},  x = 0, 1.
Thus,
L(p) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n p^{xi} (1 − p)^{1−xi} = p^{Σi xi} (1 − p)^{n − Σi xi}
⇒ ln(L(p)) = Σ_{i=1}^n xi ln(p) + (n − Σ_{i=1}^n xi) ln(1 − p).
⇒ ∂/∂p ln(L(p)) = (Σi xi)/p − (n − Σi xi)/(1 − p) ≡ 0
⇒ (1 − p) Σ_{i=1}^n xi − p (n − Σ_{i=1}^n xi) = 0
⇒ p̂ = X̄.
This makes sense since E[X] = p.
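A small sketch (simulated data, assuming NumPy; not part of the notes) confirming that the Bernoulli log-likelihood peaks at p̂ = X̄:

```python
# The Bernoulli log-likelihood sum(x) ln(p) + (n - sum(x)) ln(1 - p) peaks at p-hat = X-bar.
# Simulated Bern(0.3) data as a stand-in.
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(n=1, p=0.3, size=500)

p_hat = x.mean()                                  # MLE: p-hat = X-bar

grid = np.linspace(0.001, 0.999, 999)
log_lik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
print(p_hat, grid[np.argmax(log_lik)])            # both near 0.3
```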
Trickier Examples
Example: Suppose X1, . . . , Xn ∼ Nor(µ, σ²), i.i.d. Find the simultaneous MLE's for µ and σ².
L(µ, σ²) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n [1/√(2πσ²)] exp(−(xi − µ)²/(2σ²)) = (2πσ²)^{−n/2} exp(−[1/(2σ²)] Σ_{i=1}^n (xi − µ)²).
This ⇒
ln(L(µ, σ²)) = −(n/2) ln(2π) − (n/2) ln(σ²) − [1/(2σ²)] Σ_{i=1}^n (xi − µ)²
⇒ (by the chain rule)
∂/∂µ ln(L(µ, σ²)) = [1/σ²] Σ_{i=1}^n (xi − µ) ≡ 0
⇒ µ̂ = X̄.
Now do the same thing for σ². . .
Similarly, take the partial w/rt σ² (not σ):
∂/∂σ² ln(L(µ, σ²)) = −n/(2σ²) + [1/(2σ⁴)] Σ_{i=1}^n (xi − µ̂)² ≡ 0
⇒ −nσ² + Σ_{i=1}^n (xi − x̄)² = 0.
After a bit more algebra, we get
σ̂² = Σ_{i=1}^n (Xi − X̄)² / n.
Recap:
µ̂ = X̄,   σ̂² = Σ_{i=1}^n (Xi − X̄)² / n.
Remark: Notice how close σ̂² is to the (unbiased) sample variance
S² = Σ_{i=1}^n (Xi − X̄)² / (n − 1) = n σ̂² / (n − 1).
σ̂² is a little bit biased, but it has slightly less variance than S². Anyway, as n gets big, S² and σ̂² become the same.
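A brief illustrative sketch (simulated data, assuming NumPy) of the recap above: the MLE σ̂² divides by n, S² divides by n − 1, and their ratio is (n − 1)/n:

```python
# sigma2-hat (divide by n) vs. S^2 (divide by n-1); their ratio is (n-1)/n.
# Simulated Nor(5, 4) data as a stand-in.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=50)

mu_hat = x.mean()                                    # MLE of mu
sigma2_hat = ((x - mu_hat) ** 2).sum() / len(x)      # MLE of sigma^2
s2 = x.var(ddof=1)                                   # unbiased sample variance S^2
print(sigma2_hat, s2, sigma2_hat / s2, (len(x) - 1) / len(x))
```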
Example: The p.d.f. of the Gamma distribution with parameters r and λ is
f(x) = [λ^r/Γ(r)] x^{r−1} e^{−λx},   x > 0.
Suppose X1, . . . , Xn ∼ Gam(r, λ), i.i.d. Find the MLE's for r and λ.
L(r, λ) = ∏_{i=1}^n f(xi) = [λ^{nr}/Γ(r)^n] (∏_{i=1}^n xi)^{r−1} e^{−λ Σi xi}.
This ⇒
ln(L) = rn ln(λ) − n ln(Γ(r)) + (r − 1) ln(∏i xi) − λ Σi xi
⇒ ∂/∂λ ln(L) = rn/λ − Σ_{i=1}^n xi ≡ 0,
so that λ̂ = r̂/X̄. The trouble is, we need to find r̂. . .
Similar to the above work, we get
∂/∂r ln(L) = n ln(λ) − n Γ′(r)/Γ(r) + ln(∏i xi) ≡ 0.
Note that Ψ(r) ≡ Γ′(r)/Γ(r) is the digamma function.
At this point, substitute in λ̂ = r̂/X̄, and use a computer to search for the value of r̂ that solves
n ln(r/X̄) − nΨ(r) + ln(∏i xi) ≡ 0.
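Here is one way that computer search might look (a sketch only, assuming SciPy's digamma and brentq root finder; the simulated sample and the search bracket are illustrative assumptions):

```python
# Numerical search for r-hat in the Gamma example: solve
# n ln(r / X-bar) - n Psi(r) + sum(ln x_i) = 0, then set lambda-hat = r-hat / X-bar.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(3)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=2000)   # Gam(r = 3, lambda = 2)
n, xbar = len(x), x.mean()

def score_r(r):
    # d/dr of ln L, with lambda = r / X-bar already substituted in
    return n * np.log(r / xbar) - n * digamma(r) + np.log(x).sum()

r_hat = brentq(score_r, 1e-3, 100)                   # bracket chosen for illustration
lam_hat = r_hat / xbar
print(r_hat, lam_hat)                                # roughly 3 and 2
```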
Example: Suppose X1, . . . , Xn ∼ Unif(0, θ), i.i.d. Find the MLE for θ.
First of all, the p.d.f. is f(x) = 1/θ, 0 < x < θ, and you need to beware of the funny limits.
In any case,
L(θ) = ∏_{i=1}^n f(xi) = 1/θ^n if 0 ≤ xi ≤ θ for all i, and 0 otherwise.
In order to have L(θ) > 0, we must have 0 ≤ xi ≤ θ for all i. In other words, we must have θ ≥ maxi xi.
Subject to this constraint, L(θ) = 1/θ^n is maximized at the smallest possible θ value, namely, θ̂ = maxi Xi.
This makes sense in light of the similar (unbiased) estimator, Y2 = [(n + 1)/n] maxi Xi, from the previous module.
Remark: We used very little calculus in this example!
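A tiny sketch (simulated data, assuming NumPy) comparing the MLE maxi Xi with the unbiased estimator Y2 mentioned above:

```python
# For Unif(0, theta), the MLE is the sample maximum; Y2 = ((n+1)/n) max X_i is the
# unbiased version from the previous module. Simulated Unif(0, 10) data as a stand-in.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10.0, size=200)

theta_mle = x.max()                                  # theta-hat = max_i X_i
y2 = (len(x) + 1) / len(x) * x.max()                 # unbiased estimator Y2
print(theta_mle, y2)
```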
Invariance Property of MLE’s
Theorem (Invariance Property): If θ̂ is the MLE of some parameter θ and h(·) is a one-to-one function, then h(θ̂) is the MLE of h(θ).
Remark: We noted before that such a property does not hold for unbiasedness. For instance, although E[S²] = σ², it is usually the case that E[√(S²)] ≠ σ.
Example: Suppose X1, . . . , Xn ∼ Nor(µ, σ²), i.i.d.
We saw that the MLE for σ² is σ̂² = Σ_{i=1}^n (Xi − X̄)² / n.
If we consider the one-to-one function h(y) = +√y, then the invariance property says that the MLE of σ is
σ̂ = √(σ̂²) = √( Σ_{i=1}^n (Xi − X̄)² / n ).
Example: Suppose X1, . . . , Xn ∼ Exp(λ), i.i.d.
We saw that the MLE for λ is λ̂ = 1/X̄. Meanwhile, we define the survival function as
F̄(x) = Pr(X > x) = 1 − F(x) = e^{−λx}.
Then the invariance property says that the MLE of F̄(x) is
e^{−λ̂x} = e^{−x/X̄}.
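A short sketch (simulated data, assuming NumPy; not from the notes) of this invariance example: plug λ̂ = 1/X̄ into e^{−λx} and compare with the empirical survival fraction:

```python
# Invariance in action: plug lambda-hat = 1/X-bar into exp(-lambda x) to get the
# MLE of the survival function. Simulated Exp(2) data and the point x = 0.5 are stand-ins.
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=1 / 2.0, size=1000)

lam_hat = 1 / data.mean()                 # MLE of lambda
x = 0.5
surv_mle = np.exp(-lam_hat * x)           # MLE of Pr(X > x) by invariance
surv_emp = (data > x).mean()              # empirical survival fraction, for comparison
print(surv_mle, surv_emp)
```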
The Method of Moments
Recall: The kth moment of a RV X is
E[X^k] = Σ_x x^k f(x) if X is discrete, or ∫ x^k f(x) dx if X is cts.
Definition: Suppose X1, . . . , Xn are i.i.d. from p.d.f./p.m.f. f(x). Then the method of moments (MOM) estimator for E[X^k] is Σ_{i=1}^n Xi^k / n.
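A minimal sketch of the definition (assuming NumPy; the Exp(1) data are only an illustration, chosen because its kth moment is k!):

```python
# MOM estimator of E[X^k]: the k-th sample moment. With Exp(1) data the printed
# values should be near 1 and 2, since E[X^k] = k! for that distribution.
import numpy as np

def sample_moment(x, k):
    return (np.asarray(x, dtype=float) ** k).mean()

rng = np.random.default_rng(6)
x = rng.exponential(scale=1.0, size=10_000)
print(sample_moment(x, 1), sample_moment(x, 2))
```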
Examples:
The MOM estimator for µ = E[Xi] is X̄ = Σ_{i=1}^n Xi / n.
The MOM estimator for E[Xi²] is Σ_{i=1}^n Xi² / n.
The MOM estimator for Var(Xi) = E[Xi²] − (E[Xi])² is
(1/n) Σ_{i=1}^n Xi² − X̄² = (Σ_{i=1}^n Xi² − nX̄²)/n = [(n − 1)/n] S².
(Of course, it's also OK to use S².)
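A quick numerical check (assuming NumPy; not from the notes) of the identity above relating the MOM variance estimator to S²:

```python
# The MOM variance estimator mean(X^2) - X-bar^2 equals ((n-1)/n) S^2 exactly.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=25)

mom_var = (x ** 2).mean() - x.mean() ** 2
s2 = x.var(ddof=1)
print(mom_var, (len(x) - 1) / len(x) * s2)   # identical up to rounding
```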
Example: Suppose X1, . . . , Xn ∼ Pois(λ), i.i.d.
Since λ = E[Xi], a MOM estimator for λ is X̄.
But also note that λ = Var(Xi), so another MOM estimator for λ is [(n − 1)/n] S² (or plain old S²). Usually use the easier-looking estimator if you have a choice.
Example: Suppose X1, . . . , Xn ∼ Nor(µ, σ²), i.i.d.
MOM estimators for µ and σ² are X̄ and [(n − 1)/n] S² (or S²), respectively.
For this example, these estimators are the same as the MLE's.
Let's finish up with a less-trivial example. . .
Example: Suppose X1, . . . , Xn ∼ Beta(a, b), i.i.d. The p.d.f. is
f(x) = [Γ(a + b)/(Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1},   0 < x < 1.
It turns out (after lots of algebra) that
E[X] = a/(a + b),   Var(X) = ab/[(a + b)²(a + b + 1)].
Let's estimate a and b via MOM.
We have
E[X] = a/(a + b)  ⇒  a = b E[X]/(1 − E[X]) ≈ b X̄/(1 − X̄),   (∗)
so
Var(X) = ab/[(a + b)²(a + b + 1)] = E[X] b/[(a + b)(a + b + 1)].
Plug into the above X̄ for E[X], S² for Var(X), and b X̄/(1 − X̄) for a. We can now solve for b (though it'll take lots of algebra).
After some work, we get
b ≈ (1 − X̄)² X̄/S² − 1 + X̄.
To finish up, you can plug back into (∗) to get the MOM estimator for a.
Example (Hayter): Suppose we take a bunch of observations from a Beta distribution and it turns out that X̄ = 0.3007 and S² = 0.01966. Then the MOM estimators for a and b are 2.92 and 6.78, respectively.
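A few lines (a sketch, not taken from Hayter) reproducing these numbers from the formulas above:

```python
# Plugging X-bar = 0.3007 and S^2 = 0.01966 into the MOM formulas above.
xbar, s2 = 0.3007, 0.01966

b_hat = (1 - xbar) ** 2 * xbar / s2 - 1 + xbar   # b-hat from the formula above
a_hat = b_hat * xbar / (1 - xbar)                # plug back into (*)
print(a_hat, b_hat)                              # about 2.92 and 6.78
```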