Maximum Likelihood Estimation and Method of Moments
Definition of MLE’s
Easy Examples
Trickier Examples
Invariance Property of MLE’s
Method of Moments
Definition of MLE’s
Definition: Consider an i.i.d. random sample X1, . . . , Xn, where each Xi has p.d.f./p.m.f. f(x). Further, suppose that θ is some unknown parameter of the Xi's. The likelihood function is
L(θ) ≡ ∏_{i=1}^n f(xi).
Definition: The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes L(θ). The MLE is a function of the Xi's and is therefore a RV.
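For concreteness, here is a minimal numerical sketch of the definition (not part of the original notes, and assuming NumPy/SciPy are available): the likelihood is the product of the density at each observation, and the MLE is the θ that maximizes it, found here by brute force over a grid for a Nor(θ, 1) stand-in model.

```python
# A brute-force illustration of the definition: L(theta) = prod_i f(x_i; theta),
# and the MLE is the theta that maximizes it. The Nor(theta, 1) model and the
# simulated data are stand-ins, not from the notes.
import numpy as np
from scipy import stats

def likelihood(theta, x):
    return np.prod(stats.norm.pdf(x, loc=theta, scale=1.0))

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.0, size=30)

grid = np.linspace(-5, 5, 2001)
mle = grid[np.argmax([likelihood(t, x) for t in grid])]
print(mle, x.mean())   # for this stand-in model, the MLE equals the sample mean
```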
Easy Examples
Example: Suppose X1, . . . , Xn ∼ Exp(λ), i.i.d. Find the MLE for λ. The likelihood is
L(λ) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n λ e^{−λxi} = λ^n exp(−λ Σ_{i=1}^n xi).
Now maximize L(λ) with respect to λ.
We could take the derivative and plow through all of the horrible algebra, but that is too tedious. We need a trick. . . .
Useful Trick: Since the natural log function is one-to-one, it's easy to see that the λ that maximizes L(λ) also maximizes ln(L(λ))!
ln(L(λ)) = ln(λ^n exp(−λ Σ_{i=1}^n xi)) = n ln(λ) − λ Σ_{i=1}^n xi.
This makes our job less horrible:
∂/∂λ ln(L(λ)) = ∂/∂λ (n ln(λ) − λ Σ_{i=1}^n xi) = n/λ − Σ_{i=1}^n xi ≡ 0.
This implies that the MLE is λ̂ = 1/X̄.
Remarks: (1) λ̂ = 1/X̄ makes sense since E[X] = 1/λ.
(2) At the end, we put a little hat over λ to indicate that this is the MLE.
(3) At the end, we make all of the little xi's into big Xi's to indicate that this is a RV.
(4) Just to be careful, you probably ought to perform a second-derivative test, but I won't blame you if you don't.
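As a quick hedged check of this derivation (assuming NumPy/SciPy; the rate and data below are simulated stand-ins, not from the notes), we can compare the closed-form MLE 1/X̄ against a direct numerical maximization of the log-likelihood:

```python
# Check of the Exp(lambda) MLE: the closed form 1/X-bar vs. a direct numerical
# maximization of ln L(lambda) = n ln(lambda) - lambda * sum(x_i).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=1000)   # Exp(lambda = 2.5) has mean 1/lambda

closed_form = 1 / x.mean()                      # lambda-hat = 1 / X-bar

def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

numerical = minimize_scalar(neg_log_lik, bounds=(1e-6, 100), method="bounded").x
print(closed_form, numerical)                   # the two should agree closely
```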
Example: Suppose X1, . . . , Xn ∼ Bern(p), i.i.d. Find the MLE for p.
Useful trick for this problem: Since
Xi = 1 w.p. p and 0 w.p. 1 − p,
we can write the p.m.f. as f(x) = p^x (1 − p)^{1−x},  x = 0, 1.
Thus,
L(p) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n p^{xi} (1 − p)^{1−xi} = p^{Σi xi} (1 − p)^{n − Σi xi}
⇒ ln(L(p)) = Σ_{i=1}^n xi ln(p) + (n − Σ_{i=1}^n xi) ln(1 − p).
⇒ ∂/∂p ln(L(p)) = (Σi xi)/p − (n − Σi xi)/(1 − p) ≡ 0
⇒ (1 − p) Σ_{i=1}^n xi − p (n − Σ_{i=1}^n xi) = 0
⇒ p̂ = X̄.
This makes sense since E[X] = p.
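A small sketch (simulated data, assuming NumPy; not part of the notes) confirming that the Bernoulli log-likelihood peaks at p̂ = X̄:

```python
# The Bernoulli log-likelihood sum(x) ln(p) + (n - sum(x)) ln(1 - p) peaks at p-hat = X-bar.
# Simulated Bern(0.3) data as a stand-in.
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(n=1, p=0.3, size=500)

p_hat = x.mean()                                  # MLE: p-hat = X-bar

grid = np.linspace(0.001, 0.999, 999)
log_lik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
print(p_hat, grid[np.argmax(log_lik)])            # both near 0.3
```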
Trickier Examples
Example: Suppose X1, . . . , Xn ∼ Nor(µ, σ²), i.i.d. Find the simultaneous MLE's for µ and σ².
L(µ, σ²) = ∏_{i=1}^n f(xi) = ∏_{i=1}^n [1/√(2πσ²)] exp(−(xi − µ)²/(2σ²)) = (2πσ²)^{−n/2} exp(−[1/(2σ²)] Σ_{i=1}^n (xi − µ)²).
This ⇒
ln(L(µ, σ²)) = −(n/2) ln(2π) − (n/2) ln(σ²) − [1/(2σ²)] Σ_{i=1}^n (xi − µ)²
⇒ (by the chain rule)
∂/∂µ ln(L(µ, σ²)) = [1/σ²] Σ_{i=1}^n (xi − µ) ≡ 0
⇒ µ̂ = X̄.
Now do the same thing for σ². . .
Similarly, take the partial w/rt σ² (not σ):
∂/∂σ² ln(L(µ, σ²)) = −n/(2σ²) + [1/(2σ⁴)] Σ_{i=1}^n (xi − µ̂)² ≡ 0
⇒ −nσ² + Σ_{i=1}^n (xi − x̄)² = 0.
After a bit more algebra, we get
σ̂² = Σ_{i=1}^n (Xi − X̄)² / n.
Recap:
µ̂ = X̄,   σ̂² = Σ_{i=1}^n (Xi − X̄)² / n.
Remark: Notice how close σ̂² is to the (unbiased) sample variance
S² = Σ_{i=1}^n (Xi − X̄)² / (n − 1) = n σ̂² / (n − 1).
σ̂² is a little bit biased, but it has slightly less variance than S². Anyway, as n gets big, S² and σ̂² become the same.
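A brief illustrative sketch (simulated data, assuming NumPy) of the recap above: the MLE σ̂² divides by n, S² divides by n − 1, and their ratio is (n − 1)/n:

```python
# sigma2-hat (divide by n) vs. S^2 (divide by n-1); their ratio is (n-1)/n.
# Simulated Nor(5, 4) data as a stand-in.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=50)

mu_hat = x.mean()                                    # MLE of mu
sigma2_hat = ((x - mu_hat) ** 2).sum() / len(x)      # MLE of sigma^2
s2 = x.var(ddof=1)                                   # unbiased sample variance S^2
print(sigma2_hat, s2, sigma2_hat / s2, (len(x) - 1) / len(x))
```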
Example: The p.d.f. of the Gamma distribution with parameters r and λ is
f(x) = [λ^r/Γ(r)] x^{r−1} e^{−λx},   x > 0.
Suppose X1, . . . , Xn ∼ Gam(r, λ), i.i.d. Find the MLE's for r and λ.
L(r, λ) = ∏_{i=1}^n f(xi) = [λ^{nr}/Γ(r)^n] (∏_{i=1}^n xi)^{r−1} e^{−λ Σi xi}.
This ⇒
ln(L) = rn ln(λ) − n ln(Γ(r)) + (r − 1) ln(∏i xi) − λ Σi xi
⇒ ∂/∂λ ln(L) = rn/λ − Σ_{i=1}^n xi ≡ 0,
so that λ̂ = r̂/X̄. The trouble is, we need to find r̂. . .
Similar to the above work, we get
∂/∂r ln(L) = n ln(λ) − n Γ′(r)/Γ(r) + ln(∏i xi) ≡ 0.
Note that Ψ(r) ≡ Γ′(r)/Γ(r) is the digamma function.
At this point, substitute in λ̂ = r̂/X̄, and use a computer to search for the value of r̂ that solves
n ln(r/X̄) − nΨ(r) + ln(∏i xi) ≡ 0.
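Here is one way that computer search might look (a sketch only, assuming SciPy's digamma and brentq root finder; the simulated sample and the search bracket are illustrative assumptions):

```python
# Numerical search for r-hat in the Gamma example: solve
# n ln(r / X-bar) - n Psi(r) + sum(ln x_i) = 0, then set lambda-hat = r-hat / X-bar.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(3)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=2000)   # Gam(r = 3, lambda = 2)
n, xbar = len(x), x.mean()

def score_r(r):
    # d/dr of ln L, with lambda = r / X-bar already substituted in
    return n * np.log(r / xbar) - n * digamma(r) + np.log(x).sum()

r_hat = brentq(score_r, 1e-3, 100)                   # bracket chosen for illustration
lam_hat = r_hat / xbar
print(r_hat, lam_hat)                                # roughly 3 and 2
```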
Example: Suppose X1, . . . , Xn ∼ Unif(0, θ), i.i.d. Find the MLE for θ.
First of all, the p.d.f. is f(x) = 1/θ, 0 < x < θ, and you need to beware of the funny limits.
In any case,
L(θ) = ∏_{i=1}^n f(xi) = 1/θ^n if 0 ≤ xi ≤ θ for all i, and 0 otherwise.
In order to have L(θ) > 0, we must have 0 ≤ xi ≤ θ for all i. In other words, we must have θ ≥ maxi xi.
Subject to this constraint, L(θ) = 1/θ^n is maximized at the smallest possible θ value, namely, θ̂ = maxi Xi.
This makes sense in light of the similar (unbiased) estimator, Y2 = [(n + 1)/n] maxi Xi, from the previous module.
Remark: We used very little calculus in this example!
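A tiny sketch (simulated data, assuming NumPy) comparing the MLE maxi Xi with the unbiased estimator Y2 mentioned above:

```python
# For Unif(0, theta), the MLE is the sample maximum; Y2 = ((n+1)/n) max X_i is the
# unbiased version from the previous module. Simulated Unif(0, 10) data as a stand-in.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10.0, size=200)

theta_mle = x.max()                                  # theta-hat = max_i X_i
y2 = (len(x) + 1) / len(x) * x.max()                 # unbiased estimator Y2
print(theta_mle, y2)
```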
Invariance Property of MLE’s
Theorem (Invariance Property): If θ̂ is the MLE of some parameter θ and h(·) is a one-to-one function, then h(θ̂) is the MLE of h(θ).
Remark: We noted before that such a property does not hold for unbiasedness. For instance, although E[S²] = σ², it is usually the case that E[√(S²)] ≠ σ.
Example: Suppose X1, . . . , Xn ∼ Nor(µ, σ²), i.i.d.
We saw that the MLE for σ² is σ̂² = Σ_{i=1}^n (Xi − X̄)² / n.
If we consider the one-to-one function h(y) = +√y, then the invariance property says that the MLE of σ is
σ̂ = √(σ̂²) = √( Σ_{i=1}^n (Xi − X̄)² / n ).
Example: Suppose X1, . . . , Xn ∼ Exp(λ), i.i.d.
We saw that the MLE for λ is λ̂ = 1/X̄. Meanwhile, we define the survival function as
F̄(x) = Pr(X > x) = 1 − F(x) = e^{−λx}.
Then the invariance property says that the MLE of F̄(x) is
e^{−λ̂x} = e^{−x/X̄}.
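A short sketch (simulated data, assuming NumPy; not from the notes) of this invariance example: plug λ̂ = 1/X̄ into e^{−λx} and compare with the empirical survival fraction:

```python
# Invariance in action: plug lambda-hat = 1/X-bar into exp(-lambda x) to get the
# MLE of the survival function. Simulated Exp(2) data and the point x = 0.5 are stand-ins.
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=1 / 2.0, size=1000)

lam_hat = 1 / data.mean()                 # MLE of lambda
x = 0.5
surv_mle = np.exp(-lam_hat * x)           # MLE of Pr(X > x) by invariance
surv_emp = (data > x).mean()              # empirical survival fraction, for comparison
print(surv_mle, surv_emp)
```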
The Method of Moments
Recall: The kth moment of a RV X is
E[X^k] = Σ_x x^k f(x) if X is discrete, or ∫ x^k f(x) dx if X is cts.
Definition: Suppose X1, . . . , Xn are i.i.d. from p.d.f./p.m.f. f(x). Then the method of moments (MOM) estimator for E[X^k] is Σ_{i=1}^n Xi^k / n.
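A minimal sketch of the definition (assuming NumPy; the Exp(1) data are only an illustration, chosen because its kth moment is k!):

```python
# MOM estimator of E[X^k]: the k-th sample moment. With Exp(1) data the printed
# values should be near 1 and 2, since E[X^k] = k! for that distribution.
import numpy as np

def sample_moment(x, k):
    return (np.asarray(x, dtype=float) ** k).mean()

rng = np.random.default_rng(6)
x = rng.exponential(scale=1.0, size=10_000)
print(sample_moment(x, 1), sample_moment(x, 2))
```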
Examples:
The MOM estimator for µ = E[Xi] is X̄ = Σ_{i=1}^n Xi / n.
The MOM estimator for E[Xi²] is Σ_{i=1}^n Xi² / n.
The MOM estimator for Var(Xi) = E[Xi²] − (E[Xi])² is
(1/n) Σ_{i=1}^n Xi² − X̄² = (Σ_{i=1}^n Xi² − nX̄²)/n = [(n − 1)/n] S².
(Of course, it's also OK to use S².)
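A quick numerical check (assuming NumPy; not from the notes) of the identity above relating the MOM variance estimator to S²:

```python
# The MOM variance estimator mean(X^2) - X-bar^2 equals ((n-1)/n) S^2 exactly.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=25)

mom_var = (x ** 2).mean() - x.mean() ** 2
s2 = x.var(ddof=1)
print(mom_var, (len(x) - 1) / len(x) * s2)   # identical up to rounding
```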
Example: Suppose X1, . . . , Xn ∼ Pois(λ), i.i.d.
Since λ = E[Xi], a MOM estimator for λ is X̄.
But also note that λ = Var(Xi), so another MOM estimator for λ is [(n − 1)/n] S² (or plain old S²). Usually use the easier-looking estimator if you have a choice.
Example: Suppose X1, . . . , Xn ∼ Nor(µ, σ²), i.i.d.
MOM estimators for µ and σ² are X̄ and [(n − 1)/n] S² (or S²), respectively.
For this example, these estimators are the same as the MLE's.
Let's finish up with a less-trivial example. . .
Example: Suppose X1, . . . , Xn ∼ Beta(a, b), i.i.d. The p.d.f. is
f(x) = [Γ(a + b)/(Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1},   0 < x < 1.
It turns out (after lots of algebra) that
E[X] = a/(a + b),   Var(X) = ab/[(a + b)²(a + b + 1)].
Let's estimate a and b via MOM.
We have
E[X] = a/(a + b)  ⇒  a = b E[X]/(1 − E[X]) ≈ b X̄/(1 − X̄),   (∗)
so
Var(X) = ab/[(a + b)²(a + b + 1)] = E[X] b/[(a + b)(a + b + 1)].
Plug into the above X̄ for E[X], S² for Var(X), and b X̄/(1 − X̄) for a. We can now solve for b (though it'll take lots of algebra).
After some work, we get
b ≈ (1 − X̄)² X̄/S² − 1 + X̄.
To finish up, you can plug back into (∗) to get the MOM estimator for a.
Example (Hayter): Suppose we take a bunch of observations from a Beta distribution and it turns out that X̄ = 0.3007 and S² = 0.01966. Then the MOM estimators for a and b are 2.92 and 6.78, respectively.
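A few lines (a sketch, not taken from Hayter) reproducing these numbers from the formulas above:

```python
# Plugging X-bar = 0.3007 and S^2 = 0.01966 into the MOM formulas above.
xbar, s2 = 0.3007, 0.01966

b_hat = (1 - xbar) ** 2 * xbar / s2 - 1 + xbar   # b-hat from the formula above
a_hat = b_hat * xbar / (1 - xbar)                # plug back into (*)
print(a_hat, b_hat)                              # about 2.92 and 6.78
```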