7.31 MLE's and MOM

Maximum Likelihood Estimation and Method of Moments

Outline:
Definition of MLE's
Easy Examples
Trickier Examples
Invariance Property of MLE's
Method of Moments

Definition of MLE's

Definition: Consider an i.i.d. random sample X_1, ..., X_n, where each X_i has p.d.f./p.m.f. f(x). Further, suppose that θ is some unknown parameter of the distribution of the X_i's. The likelihood function is

   L(θ) ≡ ∏_{i=1}^n f(x_i).

Definition: The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes L(θ). The MLE is a function of the X_i's and is therefore a RV.

Easy Examples

Example: Suppose X_1, ..., X_n are i.i.d. Exp(λ). Find the MLE for λ.

   L(λ) = ∏_{i=1}^n f(x_i) = ∏_{i=1}^n λ e^{−λ x_i} = λ^n exp(−λ Σ_{i=1}^n x_i).

Now maximize L(λ) with respect to λ. Could take the derivative and plow through all of the horrible algebra. Too tedious. Need a trick...

7.31 MLE’s and MOM

Useful Trick: Since the natural log function is one-toone, it’s easy to see that the λ that maximizes L(λ) also maximizes n(L(λ))! n(L(λ)) =

n

λn exp(−λ

n i=1 xi )

= n n(λ) − λ

n i=1 xi

This makes our job less horrible. n ∂ n n

n(L(λ)) =
(n n(λ)−λ xi ) = − xi ≡ 0.
∂λ
∂λ λ i=1 i=1 ¯
This implies that the MLE is ˆ = 1/X. λ 4

Remarks: (1) λ̂ = 1/X̄ makes sense since E[X] = 1/λ.
(2) At the end, we put a little hat over λ to indicate that this is the MLE.
(3) At the end, we make all of the little x_i's into big X_i's to indicate that the MLE is a RV.
(4) Just to be careful, you probably ought to perform a second-derivative test, but I won't blame you if you don't.
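If you want to sanity-check the algebra numerically, here is a minimal Python sketch (mine, not part of the original notes) that maximizes the log-likelihood directly and compares the answer to the closed form λ̂ = 1/X̄; the seed, sample size, and true rate are arbitrary choices.

```python
# Minimal numerical check of the Exp(lambda) MLE (illustrative only; values are arbitrary).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
x = rng.exponential(scale=1 / 2.5, size=1000)   # simulate with true rate lambda = 2.5

# Negative log-likelihood: -(n ln(lambda) - lambda * sum(x_i))
def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(res.x)          # numerical maximizer of L(lambda)
print(1 / x.mean())   # closed-form MLE, lambda-hat = 1/X-bar; should agree closely
```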

Example: Suppose X_1, ..., X_n are i.i.d. Bern(p). Find the MLE for p.

Useful trick for this problem: Since

   X_i = 1 w.p. p, and X_i = 0 w.p. 1 − p,

we can write the p.m.f. as

   f(x) = p^x (1 − p)^{1−x},   x = 0, 1.

Thus,

   L(p) = ∏_{i=1}^n f(x_i) = ∏_{i=1}^n p^{x_i} (1 − p)^{1−x_i} = p^{Σ_i x_i} (1 − p)^{n − Σ_i x_i}

   ⇒ ln(L(p)) = (Σ_{i=1}^n x_i) ln(p) + (n − Σ_{i=1}^n x_i) ln(1 − p).

   ∂/∂p ln(L(p)) = (Σ_i x_i)/p − (n − Σ_i x_i)/(1 − p) ≡ 0

   ⇒ (1 − p) Σ_{i=1}^n x_i − p (n − Σ_{i=1}^n x_i) = 0

   ⇒ p̂ = X̄.

This makes sense since E[X] = p.
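Here is a tiny grid-search sketch (not from the slides; the data and grid are arbitrary) confirming that the Bernoulli log-likelihood peaks at p̂ = X̄.

```python
# Grid check that the Bern(p) log-likelihood peaks at p-hat = X-bar (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(n=1, p=0.3, size=500)          # 0/1 sample with true p = 0.3

p_grid = np.linspace(0.001, 0.999, 999)
log_lik = x.sum() * np.log(p_grid) + (len(x) - x.sum()) * np.log(1 - p_grid)

print(p_grid[np.argmax(log_lik)])               # grid maximizer of the log-likelihood
print(x.mean())                                 # closed-form MLE p-hat = X-bar
```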

Trickier Examples

Example: Suppose X_1, ..., X_n are i.i.d. Nor(µ, σ²). Find the simultaneous MLE's for µ and σ².

   L(µ, σ²) = ∏_{i=1}^n f(x_i)
            = ∏_{i=1}^n (1/√(2πσ²)) exp(−(x_i − µ)²/(2σ²))
            = (1/(2πσ²)^{n/2}) exp(−(1/(2σ²)) Σ_{i=1}^n (x_i − µ)²)

This ⇒

   ln(L(µ, σ²)) = −(n/2) ln(2π) − (n/2) ln(σ²) − (1/(2σ²)) Σ_{i=1}^n (x_i − µ)²

⇒ (by the chain rule)

   ∂/∂µ ln(L(µ, σ²)) = (1/σ²) Σ_{i=1}^n (x_i − µ) ≡ 0

   ⇒ µ̂ = X̄.

Now do the same thing for σ²...

Similarly, take the partial w/rt σ² (not σ):

   ∂/∂σ² ln(L(µ, σ²)) = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^n (x_i − µ̂)² ≡ 0

   ⇒ −nσ² + Σ_{i=1}^n (x_i − x̄)² = 0.

After a bit more algebra, we get

   σ̂² = Σ_{i=1}^n (X_i − X̄)² / n.

Recap:

   µ̂ = X̄,   σ̂² = Σ_{i=1}^n (X_i − X̄)² / n.

Remark: Notice how close σ̂² is to the (unbiased) sample variance

   S² = Σ_{i=1}^n (X_i − X̄)² / (n − 1) = n σ̂² / (n − 1).

σ̂² is a little bit biased, but it has slightly less variance than S². Anyway, as n gets big, S² and σ̂² become the same.
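To see the bias remark in action, here is a small simulation sketch (mine, not the notes'); it repeatedly draws normal samples and compares the averages and variances of σ̂² and S². The sample size, true variance, and seed are arbitrary.

```python
# Compare the MLE sigma-hat^2 (divide by n) with S^2 (divide by n-1) over many samples.
import numpy as np

rng = np.random.default_rng(1)
true_var, n, reps = 4.0, 10, 100_000

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(reps, n))
mle_var = samples.var(axis=1, ddof=0)       # sigma-hat^2 = sum((x - xbar)^2) / n
s2 = samples.var(axis=1, ddof=1)            # S^2 = sum((x - xbar)^2) / (n - 1)

print(mle_var.mean(), s2.mean())            # MLE average is biased a bit low; S^2 average is near 4
print(mle_var.var(), s2.var())              # but the MLE has slightly smaller variance
```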

Example: The p.d.f. of the Gamma distribution with parameters r and λ is

   f(x) = (λ^r / Γ(r)) x^{r−1} e^{−λx},   x > 0.

Suppose X_1, ..., X_n are i.i.d. Gam(r, λ). Find the MLE's for r and λ.

   L(r, λ) = ∏_{i=1}^n f(x_i) = (λ^{nr} / [Γ(r)]^n) (∏_{i=1}^n x_i)^{r−1} e^{−λ Σ_i x_i}.

This ⇒

   ln(L) = rn ln(λ) − n ln(Γ(r)) + (r − 1) ln(∏_i x_i) − λ Σ_i x_i

   ⇒ ∂/∂λ ln(L) = rn/λ − Σ_{i=1}^n x_i ≡ 0,

so that λ̂ = r̂/X̄. The trouble is, we need to find r̂...

Similar to the above work, we get

   ∂/∂r ln(L) = n ln(λ) − n Γ′(r)/Γ(r) + ln(∏_i x_i) ≡ 0,

where Ψ(r) ≡ Γ′(r)/Γ(r) is the digamma function.

At this point, substitute in λ̂ = r/X̄, and use a computer to search for the value of r that solves

   n ln(r/X̄) − nΨ(r) + ln(∏_i x_i) ≡ 0.
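One way to do that computer search is sketched below, using SciPy's digamma function and a bracketing root-finder; the simulated data, the bracket, and the seed are arbitrary assumptions, and the bracket must contain the root for your data.

```python
# Solve n*ln(r/xbar) - n*digamma(r) + sum(ln x_i) = 0 for r-hat, then set lambda-hat = r-hat/xbar.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(7)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=2000)    # true r = 3, lambda = 2 (scale = 1/lambda)

n, xbar, sum_log_x = len(x), x.mean(), np.log(x).sum()

def profile_score(r):
    return n * np.log(r / xbar) - n * digamma(r) + sum_log_x

r_hat = brentq(profile_score, 1e-3, 100.0)    # bracket chosen wide enough to contain the root
lam_hat = r_hat / xbar
print(r_hat, lam_hat)
```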

Example: Suppose X_1, ..., X_n are i.i.d. Unif(0, θ). Find the MLE for θ.

First of all, the p.d.f. is f(x) = 1/θ, 0 < x < θ, and you need to beware of the funny limits.

In any case,

   L(θ) = ∏_{i=1}^n f(x_i) = 1/θ^n if 0 ≤ x_i ≤ θ for all i, and 0 otherwise.

In order to have L(θ) > 0, we must have 0 ≤ x_i ≤ θ for all i. In other words, we must have θ ≥ max_i x_i.

Subject to this constraint, L(θ) = 1/θ^n is maximized at the smallest possible θ value, namely, θ̂ = max_i X_i.

This makes sense in light of the similar (unbiased) estimator, Y_2 = ((n + 1)/n) max_i X_i, from the previous module.

Remark: We used very little calculus in this example!
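A tiny simulation sketch (my own, with arbitrary θ, n, and seed) compares the MLE max_i X_i with the adjusted estimator ((n + 1)/n) max_i X_i; it shows the raw maximum tends to sit a little below θ while the adjusted version averages out to about θ.

```python
# theta-hat = max(X_i) for Unif(0, theta); compare with the bias-adjusted (n+1)/n * max(X_i).
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 10.0, 20, 50_000

samples = rng.uniform(low=0.0, high=theta, size=(reps, n))
mle = samples.max(axis=1)                     # MLE: the sample maximum
adjusted = (n + 1) / n * mle                  # Y_2 from the previous module

print(mle.mean(), adjusted.mean())            # roughly theta*n/(n+1) = 9.52 vs roughly 10.0
```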

Invariance Property of MLE's

Theorem (Invariance Property): If θ̂ is the MLE of some parameter θ and h(·) is a one-to-one function, then h(θ̂) is the MLE of h(θ).

Remark: We noted before that such a property does not hold for unbiasedness. For instance, although E[S²] = σ², it is usually the case that E[√(S²)] ≠ σ.

Example: Suppose X_1, ..., X_n are i.i.d. Nor(µ, σ²).

We saw that the MLE for σ² is σ̂² = Σ_{i=1}^n (X_i − X̄)²/n.

If we consider the one-to-one function h(y) = +√y, then the invariance property says that the MLE of σ is

   σ̂ = √(σ̂²) = √(Σ_{i=1}^n (X_i − X̄)²/n).

Example: Suppose X_1, ..., X_n are i.i.d. Exp(λ).

We saw that the MLE for λ is λ̂ = 1/X̄. Meanwhile, we define the survival function as

   F̄(x) = Pr(X > x) = 1 − F(x) = e^{−λx}.

Then the invariance property says that the MLE of F̄(x) is

   e^{−λ̂x} = e^{−x/X̄}.
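As a hedged illustration (not in the notes; the parameters and evaluation points are arbitrary), the sketch below plugs λ̂ = 1/X̄ into e^{−λx} and compares the resulting survival-function MLE with the empirical proportion of observations exceeding x.

```python
# MLE of the survival function for Exp(lambda): exp(-x / X-bar), via the invariance property.
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=1 / 1.5, size=2000)     # true lambda = 1.5

xbar = data.mean()
for x0 in (0.5, 1.0, 2.0):
    mle_surv = np.exp(-x0 / xbar)                    # invariance-property MLE of P(X > x0)
    emp_surv = (data > x0).mean()                    # empirical survival estimate
    print(x0, mle_surv, emp_surv)
```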

The Method of Moments

Recall: The kth moment of a RV X is

   E[X^k] = Σ_x x^k f(x) if X is discrete, and ∫ x^k f(x) dx if X is continuous.

Definition: Suppose X_1, ..., X_n are i.i.d. from p.d.f./p.m.f. f(x). Then the method of moments (MOM) estimator for E[X^k] is Σ_{i=1}^n X_i^k / n.
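In code, the MOM estimator of the kth moment is just the sample average of kth powers; here is a short sketch (the function name is mine, for illustration only).

```python
# MOM estimator of E[X^k]: the average of the k-th powers of the sample.
import numpy as np

def mom_estimate(x, k):
    """Return sum(x_i^k) / n for the sample x."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** k)

# e.g., mom_estimate(data, 1) estimates E[X]; mom_estimate(data, 2) estimates E[X^2]
```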

Examples:

The MOM estimator for µ = E[X_i] is X̄ = Σ_{i=1}^n X_i / n.

The MOM estimator for E[X_i²] is Σ_{i=1}^n X_i² / n.

The MOM estimator for Var(X_i) = E[X_i²] − (E[X_i])² is

   (1/n) Σ_{i=1}^n X_i² − X̄² = (Σ_{i=1}^n X_i² − nX̄²)/n = ((n − 1)/n) S².

(Of course, it's also OK to use S².)
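A quick numerical check of the identity above, (1/n)ΣX_i² − X̄² = ((n − 1)/n) S², in a hedged sketch with arbitrary data.

```python
# Check that the MOM variance estimator equals (n-1)/n times the sample variance S^2.
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(loc=5.0, scale=2.0, size=100)
n = len(x)

mom_var = np.mean(x ** 2) - x.mean() ** 2     # (1/n) * sum(x_i^2) - xbar^2
s2 = x.var(ddof=1)                            # unbiased sample variance S^2

print(mom_var, (n - 1) / n * s2)              # the two numbers match (up to rounding)
```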

Example: Suppose X_1, ..., X_n are i.i.d. Pois(λ).

Since λ = E[X_i], a MOM estimator for λ is X̄. But also note that λ = Var(X_i), so another MOM estimator for λ is ((n − 1)/n) S² (or plain old S²).

Usually use the easier-looking estimator if you have a choice.
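Both candidates are easy to compute; this hedged sketch (arbitrary λ, n, and seed) shows the sample mean and the scaled sample variance both landing near the true λ.

```python
# Two MOM estimators for Pois(lambda): the sample mean and the (scaled) sample variance.
import numpy as np

rng = np.random.default_rng(13)
x = rng.poisson(lam=4.0, size=5000)
n = len(x)

print(x.mean())                     # MOM estimator based on E[X] = lambda
print((n - 1) / n * x.var(ddof=1))  # MOM estimator based on Var(X) = lambda
```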

Example: Suppose X_1, ..., X_n are i.i.d. Nor(µ, σ²).

MOM estimators for µ and σ² are X̄ and ((n − 1)/n) S² (or S²), respectively. For this example, these estimators are the same as the MLE's.

Let's finish up with a less-trivial example...

Example: Suppose X_1, ..., X_n are i.i.d. Beta(a, b). The p.d.f. is

   f(x) = (Γ(a + b)/(Γ(a)Γ(b))) x^{a−1} (1 − x)^{b−1},   0 < x < 1.

It turns out (after lots of algebra) that

   E[X] = a/(a + b),   Var(X) = ab/((a + b)²(a + b + 1)).

Let's estimate a and b via MOM.

We have

   E[X] = a/(a + b)  ⇒  a = b E[X]/(1 − E[X]) ≈ b X̄/(1 − X̄),   (∗)

so

   Var(X) = ab/((a + b)²(a + b + 1)) = E[X] b/((a + b)(a + b + 1)).

Now plug in X̄ for E[X], S² for Var(X), and b X̄/(1 − X̄) for a. We can then solve for b (though it'll take lots of algebra).

After some work, we get

   b ≈ (1 − X̄)² X̄ / S² − 1 + X̄.

To finish up, you can plug back into (∗) to get the MOM estimator for a.

Example (Hayter): Suppose we take a bunch of observations from a Beta distribution and it turns out that X̄ = 0.3007 and S² = 0.01966. Then the MOM estimators for a and b are 2.92 and 6.78, respectively.
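Plugging the example's numbers into the two formulas reproduces the quoted estimates; the short sketch below just automates that arithmetic (the function name is mine, for illustration).

```python
# Beta(a, b) MOM estimators from a sample mean and sample variance (formulas from this module).
def beta_mom(xbar, s2):
    b = (1 - xbar) ** 2 * xbar / s2 - 1 + xbar     # b-hat from the displayed formula
    a = b * xbar / (1 - xbar)                      # plug b-hat back into (*)
    return a, b

print(beta_mom(0.3007, 0.01966))                   # approximately (2.92, 6.78)
```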
