Lecture Notes in Finance 1 (MiQE/F, MSc course at UNISG)
Paul Söderlind¹
14 December 2011

¹ University of St. Gallen. Address: s/bf-HSG, Rosenbergstrasse 52, CH-9000 St. Gallen, Switzerland. E-mail: Paul.Soderlind@unisg.ch. Document name: Fin1MiQEFAll.TeX

Contents

1 Mean-Variance Frontier
  1.1 Portfolio Return: Mean, Variance, and the Effect of Diversification
  1.2 Mean-Variance Frontier of Risky Assets
  1.3 Mean-Variance Frontier of Riskfree and Risky Assets
  1.4 Examples of Portfolio Weights from MV Calculations
  A A Primer in Matrix Algebra
  B A Primer in Optimization

2 Index Models
  2.1 The Inputs to a MV Analysis
  2.2 Single-Index Models
  2.3 Estimating Beta
  2.4 Multi-Index Models
  2.5 Principal Component Analysis
  2.6 Estimating Expected Returns
  2.7 Estimation on Subsamples
  2.8 Robust Estimation

3 Risk Measures
  3.1 Symmetric Dispersion Measures
  3.2 Downside Risk
  3.3 Empirical Return Distributions

4 CAPM
  4.1 Portfolio Choice with Mean-Variance Utility
  4.2 Beta Representation of Expected Returns
  4.3 Market Equilibrium
  4.4 An Application of MV Portfolio Choice: International Assets

5 Utility-Based Portfolio Choice
  5.1 Utility Functions and Risky Investments
  5.2 Utility Optimization and the Two-Fund Theorem
  5.3 Application of Normal Returns: Value at Risk, ES, Lpm and the Telser Criterion
  5.4 Behavioural Finance

6 CAPM Extensions
  6.1 Background Risk
  6.2 Heterogeneous Investors
  6.3 CAPM without a Riskfree Rate
  6.4 Multi-Factor Models and APT
  6.5 Joint Portfolio and Savings Choice

7 Testing CAPM and Multifactor Models
  7.1 Market Model
  7.2 Several Factors
  7.3 Fama-MacBeth
  A Statistical Tables

8 Performance Analysis
  8.1 Performance Evaluation
  8.2 Performance Attribution
  8.3 Style Analysis

9 Predicting Asset Returns
  9.1 Asset Prices, Random Walks, and the Efficient Market Hypothesis
  9.2 Autocorrelations
  9.3 Other Predictors and Methods
  9.4 Security Analysts
  9.5 Technical Analysis
  9.6 Spurious Regressions and In-Sample Overfit
  9.7 Empirical U.S. Evidence on Stock Return Predictability

10 Event Studies
  10.1 Basic Structure of Event Studies
  10.2 Models of Normal Returns
  10.3 Testing the Abnormal Return
  10.4 Quantitative Events

11 Investment for the Long Run
  11.1 Time Diversification: Approximate Case
  11.2 Time Diversification and the Growth-Optimal Portfolio: Lognormal Returns
  11.3 More General Utility Functions and Rebalancing

12 Dynamic Portfolio Choice
  12.1 Optimal Portfolio Choice: CRRA Utility and iid Returns
  12.2 Optimal Portfolio Choice: Logarithmic Utility and Non-iid Returns
  12.3 Optimal Portfolio Choice: CRRA Utility and Non-iid Returns
  12.4 Performance Measurement with Dynamic Benchmarks
  A Some Proofs

1 Mean-Variance Frontier

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 4–6; Fabozzi, Focardi, and
Kolm (2006) 4

1.1 Portfolio Return: Mean, Variance, and the Effect of Diversification

Many portfolio choice models center around two moments of the chosen portfolio: the expected return and the variance. This section is therefore devoted to discussing how these moments of the portfolio are related to the corresponding moments of the underlying assets.

1.1.1 Portfolio Returns: Expected Value and Variance

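Remark 1.1 (Expected value and variance of a linear combination) Recall that
\[
\begin{aligned}
\mathrm{E}(aR_1 + bR_2) &= a\,\mathrm{E}(R_1) + b\,\mathrm{E}(R_2), \text{ and} \\
\mathrm{Var}(aR_1 + bR_2) &= a^2\sigma_{11} + b^2\sigma_{22} + 2ab\,\sigma_{12},
\end{aligned}
\]
where $\sigma_{ij} = \mathrm{Cov}(R_i, R_j)$, and $\sigma_{ii} = \mathrm{Cov}(R_i, R_i) = \mathrm{Var}(R_i)$.

Remark 1.2 (On the notation in these lecture notes) Mean returns are denoted $\mathrm{E}(R_i)$ or $\mu_i$. Variances are denoted $\sigma_i^2$ or $\sigma_{ii}$ and the standard deviations $\sigma_i$. Covariances are denoted $\sigma_{ij}$.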
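The return on a portfolio with the portfolio weights $w_1, w_2, \ldots, w_n$ ($\sum_{i=1}^n w_i = 1$) is
\[
\begin{aligned}
R_p &= w_1R_1 + w_2R_2 \quad \text{(with } n = 2\text{)} && (1.1) \\
    &= \sum_{i=1}^n w_iR_i \quad \text{(more generally)}, && (1.2)
\end{aligned}
\]
and the expected return is
\[
\begin{aligned}
\mathrm{E}(R_p) &= w_1\,\mathrm{E}(R_1) + w_2\,\mathrm{E}(R_2) \quad \text{(with } n = 2\text{)} && (1.3) \\
                &= \sum_{i=1}^n w_i\,\mathrm{E}(R_i) \quad \text{(more generally)}. && (1.4)
\end{aligned}
\]
Let $\sigma_{ij} = \mathrm{Cov}(R_i, R_j)$, and $\sigma_{ii} = \mathrm{Cov}(R_i, R_i) = \mathrm{Var}(R_i)$. The variance of a portfolio return is then
\[
\begin{aligned}
\sigma_p^2 &= w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12} \quad \text{(with } n = 2\text{)} && (1.5) \\
           &= \sum_{i=1}^n w_i^2\sigma_{ii} + \sum_{i=1}^n \sum_{j=1, j\neq i}^n w_iw_j\sigma_{ij} \quad \text{(more generally)}. && (1.6)
\end{aligned}
\]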

In matrix form we have
\[
\begin{aligned}
\mathrm{E}(R_p) &= w'\,\mathrm{E}(R) \text{ and} && (1.7) \\
\sigma_p^2 &= w'\Sigma w. && (1.8)
\end{aligned}
\]
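As a small numerical check of (1.4), (1.6) and (1.8), the following sketch computes the portfolio mean and variance both with the explicit sums and with the matrix form. The weights, mean returns and covariance matrix are hypothetical illustration values, not taken from the lecture notes.

```python
# Hypothetical inputs for two assets (illustration only).
w = [0.6, 0.4]                 # portfolio weights, sum to 1
mu = [0.08, 0.05]              # expected returns E(R_i)
Sigma = [[0.04, 0.006],        # covariance matrix with elements sigma_ij
         [0.006, 0.01]]

n = len(w)

# Expected return, eq. (1.4): sum_i w_i E(R_i), the same as w'E(R) in (1.7).
mean_p = sum(w[i] * mu[i] for i in range(n))

# Variance via the sums in eq. (1.6): own-variance terms plus covariance terms.
var_sum = sum(w[i] ** 2 * Sigma[i][i] for i in range(n)) + sum(
    w[i] * w[j] * Sigma[i][j] for i in range(n) for j in range(n) if j != i
)

# Variance via the quadratic form w' Sigma w, eq. (1.8).
Sigma_w = [sum(Sigma[i][j] * w[j] for j in range(n)) for i in range(n)]
var_mat = sum(w[i] * Sigma_w[i] for i in range(n))

print(mean_p, var_sum, var_mat)
```

Both variance calculations should agree up to rounding error, since (1.8) is just a compact way of writing (1.6).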

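Remark 1.3 (Details on the matrix form) With two assets, we have the following:
\[
w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \quad
\mathrm{E}(R) = \begin{bmatrix} \mathrm{E}(R_1) \\ \mathrm{E}(R_2) \end{bmatrix}, \text{ and }
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}.
\]
\[
\begin{aligned}
\mathrm{E}(R_p) &= w'\,\mathrm{E}(R) \\
 &= \begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} \mathrm{E}(R_1) \\ \mathrm{E}(R_2) \end{bmatrix} \\
 &= w_1\,\mathrm{E}(R_1) + w_2\,\mathrm{E}(R_2).
\end{aligned}
\]
\[
\begin{aligned}
\sigma_p^2 &= w'\Sigma w \\
 &= \begin{bmatrix} w_1 & w_2 \end{bmatrix}
    \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
    \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \\
 &= \begin{bmatrix} w_1\sigma_{11} + w_2\sigma_{12} & w_1\sigma_{12} + w_2\sigma_{22} \end{bmatrix}
    \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \\
 &= w_1^2\sigma_{11} + w_2w_1\sigma_{12} + w_1w_2\sigma_{12} + w_2^2\sigma_{22}.
\end{aligned}
\]

1.1.2 The Effect of Diversification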

First, assume that the returns are uncorrelated ($\sigma_{ij} = 0$ if $i \neq j$). This is clearly not realistic, but provides a good starting point for illustrating the effect of diversification.
We will consider equally weighted portfolios of $n$ assets ($w_i = 1/n$). There are other portfolios with lower variance (and the same expected return), but it provides a simple analytical case.
The variance of an equally weighted portfolio is (when all covariances are zero)
\[
\begin{aligned}
\sigma_p^2 &= \sum_{i=1}^n \frac{1}{n^2}\sigma_{ii} = \frac{1}{n}\sum_{i=1}^n \frac{\sigma_{ii}}{n} && (1.9) \\
           &= \frac{1}{n}\bar{\sigma}_{ii} \quad (\text{if } \sigma_{ij} = 0). && (1.10)
\end{aligned}
\]

In this expression, $\bar{\sigma}_{ii}$ is the average variance of an individual return. This number could be treated as a constant (that is, not depend on $n$) if we form portfolios by randomly picking assets. In any case, (1.10) shows that the portfolio variance goes to zero as the number of assets (included in the portfolio) goes to infinity. Also a portfolio with a large but finite number of assets will typically have a low variance (unless we have systematically picked the very most volatile assets).
Second, we now allow for correlations of the returns. The variance of the equally weighted portfolio is then
\[
\sigma_p^2 = \frac{1}{n}\left(\bar{\sigma}_{ii} - \bar{\sigma}_{ij}\right) + \bar{\sigma}_{ij}, \tag{1.11}
\]
where $\bar{\sigma}_{ij}$ is the average covariance of two returns (which, again, can be treated as a constant if we pick assets randomly). Realistically, $\bar{\sigma}_{ij}$ is positive. When the portfolio includes many assets, then the average covariance dominates. In the limit (as $n$ goes to infinity), only this non-diversifiable risk matters.

Figure 1.1: Effect of diversification (plot: variance of an equally weighted portfolio and the average covariance against the number of assets in the portfolio; based on 10 US industry portfolios, 1947:1-2010:12)
See Figure 1.1 for an example.
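The limiting behaviour in (1.11) is easy to tabulate. The sketch below evaluates the formula for a growing number of assets; the average variance and average covariance are hypothetical values chosen for illustration, not the estimates behind Figure 1.1.

```python
def equal_weight_variance(n, avg_var, avg_cov):
    """Variance of an equally weighted portfolio of n assets, eq. (1.11)."""
    return (avg_var - avg_cov) / n + avg_cov

# Hypothetical average variance and average covariance (illustration only).
avg_var, avg_cov = 0.030, 0.015

for n in (1, 2, 5, 10, 100, 10_000):
    # The variance falls towards avg_cov, the non-diversifiable part.
    print(n, round(equal_weight_variance(n, avg_var, avg_cov), 6))
```

With one asset the result equals the average variance, and for large n it approaches the average covariance.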
Proof. (of (1.11)) The portfolio variance is
\[
\begin{aligned}
\sigma_p^2 &= \sum_{i=1}^n \frac{1}{n^2}\sigma_{ii} + \sum_{i=1}^n \sum_{j=1, j\neq i}^n \frac{1}{n^2}\sigma_{ij} \\
 &= \frac{1}{n}\sum_{i=1}^n \frac{\sigma_{ii}}{n} + \frac{n-1}{n}\sum_{i=1}^n \sum_{j=1, j\neq i}^n \frac{\sigma_{ij}}{n(n-1)} \\
 &= \frac{1}{n}\bar{\sigma}_{ii} + \frac{n-1}{n}\bar{\sigma}_{ij},
\end{aligned}
\]
which can be rearranged as (1.11).
Remark 1.4 (On negative covariances in (1.11)) Formally, it can be shown that $\bar{\sigma}_{ij}$ must be non-negative as $n \to \infty$. It is simply not possible to construct a very large number of random variables (asset returns or whatever other random variable) that are, on average, negatively correlated with each other. In (1.11) this manifests itself in that $\bar{\sigma}_{ij} < 0$ would give a negative portfolio variance as $n$ increases.

A (NoDur)
B (Durbl)
C (Manuf)
D (Enrgy)
E (HiTec)
F (Telcm)
G (Shops)
H (Hlth )
I (Utils)
J (Other)
Table 1.1: Industries
1.1.3 Some Practical Remarks: Annualizing, Portfolio Weights

Remark 1.5 (Annualizing the MV figures) Suppose we have weekly net returns $R_t = P_t/P_{t-1} - 1$. The standard way of annualizing the mean and the standard deviation is to first estimate means and the covariance matrix on weekly returns, do all the MV calculations, and then (when showing the results) multiply the mean weekly return by 52 and the standard deviation of the weekly return by $\sqrt{52}$. To see why, notice that an annual return would be
\[
\begin{aligned}
P_t/P_{t-52} - 1 &= (P_t/P_{t-1})(P_{t-1}/P_{t-2}) \cdots (P_{t-51}/P_{t-52}) - 1 \\
 &= (R_t + 1)(R_{t-1} + 1) \cdots (R_{t-51} + 1) - 1 \\
 &\approx R_t + R_{t-1} + \ldots + R_{t-51}.
\end{aligned}
\]
To a first approximation, the mean annual return would therefore be
\[
\mathrm{E}(R_t + R_{t-1} + \ldots + R_{t-51}) = 52\,\mathrm{E}(R_t),
\]
and if returns are iid (in particular, same variance and uncorrelated across time)
\[
\mathrm{Var}(R_t + R_{t-1} + \ldots + R_{t-51}) = 52\,\mathrm{Var}(R_t) \Rightarrow
\mathrm{Std}(R_t + R_{t-1} + \ldots + R_{t-51}) = \sqrt{52}\,\mathrm{Std}(R_t).
\]
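The scaling rules in Remark 1.5 can be illustrated with a few lines of code. The weekly mean and standard deviation below are hypothetical numbers; the last calculation compounds the weekly mean exactly, only to show how close the first-order approximation is in this case.

```python
import math

# Hypothetical weekly figures (illustration only).
weekly_mean = 0.002   # mean weekly net return
weekly_std = 0.02     # standard deviation of the weekly return

annual_mean = 52 * weekly_mean             # the mean scales with the horizon
annual_std = math.sqrt(52) * weekly_std    # the std scales with sqrt(horizon), assuming iid returns

# For comparison: compounding a constant weekly return equal to weekly_mean.
compounded = (1 + weekly_mean) ** 52 - 1

print(f"annualized mean {annual_mean:.4f}, annualized std {annual_std:.4f}")
print(f"exact compounding of the weekly mean {compounded:.4f}")
```

The gap between the two mean figures comes from the cross-product terms that the approximation drops.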

Remark 1.6 (Portfolio weights) If your total portfolio is worth $W$, and you have bought $\alpha_i$ shares of firm $i$ at the price $P_i$ each, then the portfolio weight of that firm is clearly $w_i = \alpha_i P_i / W$.
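A minimal sketch of Remark 1.6, assuming the listed positions make up the whole portfolio. The firm names, share counts and prices are hypothetical.

```python
# Hypothetical holdings: number of shares (alpha_i) and price per share (P_i).
shares = {"firm_a": 10, "firm_b": 25, "firm_c": 5}
prices = {"firm_a": 50.0, "firm_b": 20.0, "firm_c": 100.0}

# Total portfolio value W (assumption: no other positions or cash).
W = sum(shares[f] * prices[f] for f in shares)

# Portfolio weight of each firm: w_i = alpha_i * P_i / W.
weights = {f: shares[f] * prices[f] / W for f in shares}

print(W, weights)
```

By construction the weights sum to one.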

1.2 Mean-Variance Frontier of Risky Assets

To calculate a point on the mean-variance frontier, we have to find the portfolio that minimizes the portfolio variance, $\sigma_p^2$, for a given expected return, $\mu^*$. The problem is thus
\[
\min_{w_i} \sigma_p^2 \text{ subject to } \mathrm{E}(R_p) = \sum_{i=1}^n w_i\mu_i = \mu^* \text{ and } \sum_{i=1}^n w_i = 1. \tag{1.12}
\]
Let $\Sigma$ be the covariance matrix of the asset returns. The portfolio variance is then calculated as
\[
\sigma_p^2 = \mathrm{Var}\Big(\sum_{i=1}^n w_iR_i\Big) = w'\Sigma w. \tag{1.13}
\]
The whole mean-variance frontier is generated by solving this problem for different values of the expected return ($\mu^*$). The results are typically shown in a figure with the standard deviation on the horizontal axis and the required return on the vertical axis. The efficient frontier is the upper leg of the curve. Reasonably, a portfolio on the lower leg is dominated by one on the upper leg at the same volatility (since it has a higher expected return). See Figure 1.2 for an example.
Remark 1.7 (Only two assets.) In the (empirically uninteresting) case of only two assets, the MV frontier can be calculated by simply calculating the mean and variance
\[
\begin{aligned}
\mathrm{E}(R_p) &= w\mu_1 + (1 - w)\mu_2 \\
\sigma_p^2 &= w^2\sigma_{11} + (1 - w)^2\sigma_{22} + 2w(1 - w)\sigma_{12}
\end{aligned}
\]
at a set of different portfolio weights (for instance, $w = (0, 0.25, 0.5, 0.75, 1)$). The reason is that, with only two assets, both assets are on the MV frontier, so no explicit minimization is needed. See Figures 1.3–1.4 for examples.
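The calculation in Remark 1.7 can be sketched as follows. The means, variances and covariance are hypothetical values, not the ones used for Figures 1.3–1.4.

```python
import math

# Hypothetical inputs for two risky assets (illustration only).
mu1, mu2 = 0.05, 0.08                  # expected returns
s11, s22, s12 = 0.01, 0.0256, 0.004    # variances and covariance

def two_asset_point(w):
    """Mean and std of a portfolio with weight w on asset 1 and 1 - w on asset 2."""
    mean = w * mu1 + (1 - w) * mu2
    var = w**2 * s11 + (1 - w)**2 * s22 + 2 * w * (1 - w) * s12
    return mean, math.sqrt(var)

# Trace out points on the frontier; no minimization is needed with two assets.
for w in (0, 0.25, 0.5, 0.75, 1):
    mean, std = two_asset_point(w)
    print(f"w = {w:4.2f}: mean = {mean:.4f}, std = {std:.4f}")
```

Weights outside the interval from 0 to 1 (short sales) can be passed in the same way to extend the curve beyond the two assets.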
It is (relatively) straightforward to calculate the mean-variance frontier if there are no other constraints: it just takes some linear algebra—see Section 1.2.2. See Figure 1.6 for an example.
There are sometimes additional restrictions, for instance, no short sales:
\[
w_i \geq 0. \tag{1.14}
\]
We then have to apply some explicit numerical minimization algorithm to find portfolio weights. Algorithms that solve quadratic problems are best suited (this is a quadratic problem, see (1.13)). See Figure 1.2 for an example.

Figure 1.2: Mean-variance frontiers (plot: w/wo short sales, Mean, % against Std, %, showing the original assets and the frontiers with no restrictions and with no short sales. Inputs printed in the figure: E(R) = 6.00, 12.50, 10.50; Std = 4.80, 12.90, 9.00; correlation matrix rows (1.00, 0.33, 0.45), (0.33, 1.00, 0.05), (0.45, 0.05, 1.00))

Other commonly used restrictions are that the new weights should not deviate too much from the old (when rebalancing), in an effort to reduce trading costs,
\[
|w_i^{new} - w_i^{old}| < U_i, \tag{1.15}
\]
or that the portfolio weights must be between some boundaries
\[
L_i \leq w_i \leq U_i. \tag{1.16}
\]
Such constraints are typically easy to implement numerically.
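To illustrate how bounds like (1.14) and (1.16) change the problem, here is a deliberately crude sketch for three assets. It is not a quadratic programming routine: with three assets, the two equality constraints in (1.12) leave a single free weight, so a grid search over that weight is enough. The inputs and the function names are hypothetical, and the same lower and upper bound is applied to every asset.

```python
# Hypothetical inputs for three assets (illustration only).
mu = [0.06, 0.125, 0.105]
Sigma = [[0.0023, 0.0020, 0.0019],
         [0.0020, 0.0166, 0.0006],
         [0.0019, 0.0006, 0.0081]]

def variance(w):
    """Portfolio variance w' Sigma w, eq. (1.13)."""
    return sum(w[i] * Sigma[i][j] * w[j] for i in range(3) for j in range(3))

def weights_given_w1(w1, target):
    """Solve the two equality constraints in (1.12) for w2 and w3, given w1."""
    w3 = (target - mu[0] * w1 - mu[1] * (1 - w1)) / (mu[2] - mu[1])
    w2 = 1 - w1 - w3
    return [w1, w2, w3]

def frontier_point(target, lower=0.0, upper=1.0, steps=2001):
    """Grid search over w1, keeping candidates with lower <= w_i <= upper."""
    best = None
    for k in range(steps):
        w1 = lower + (upper - lower) * k / (steps - 1)
        w = weights_given_w1(w1, target)
        if all(lower - 1e-12 <= wi <= upper + 1e-12 for wi in w):
            v = variance(w)
            if best is None or v < best[0]:
                best = (v, w)
    return best   # None if the target return is infeasible under the bounds

print(frontier_point(0.10))                           # no short sales
print(frontier_point(0.13))                           # infeasible without short sales
print(frontier_point(0.13, lower=-5.0, upper=5.0))    # feasible with wide bounds
```

A target return above the highest mean return cannot be reached without short sales, which is why the second call returns None. For more than a handful of assets a proper quadratic programming algorithm is the sensible choice.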
Consider what happens when we add assets to the investment opportunity set. The old mean-variance frontier is, of course, still obtainable: we can always put zero weights on the new assets. In most cases, we can do better than that so the mean-variance frontier is moved to the left (lower volatility at the same expected return). See Figure 1.5 for an example.

Figure 1.3: Mean-variance frontiers for two risky assets. (plot: MV-frontier with two assets, Mean, % against Std, %; (x,y) means a portfolio with x% in asset A and y% in asset B; marked portfolios (0,100), (25,75), (50,50), (75,25), (100,0))
1.2.1 The Shape of the MV Frontier of Risky Assets

This section discusses how the shape of the MV frontier depends on the correlation of the assets. For simplicity, only two assets are used but the general findings hold also when there are more assets.
With intermediate correlations ($-1 < \rho < 1$) the mean-variance frontier is a hyperbola (see Figures 1.7 and 1.8). Notice that the mean–volatility trade-off improves as the correlation decreases: a lower correlation means that we get a lower portfolio standard deviation (at the same expected return). In fact, the case of a perfect (positive) correlation is a limiting case: a combination of two assets can never have higher standard deviation than the line connecting them in the $\sigma \times \mathrm{E}(R)$ space.
When the assets are perfectly correlated ($\rho = 1$), then the MV frontier is a pair of two straight lines (see Figure 1.8). The efficient frontier is clearly the upper leg. However, if short sales are ruled out then the MV frontier is just a straight line connecting the two assets (see the circles in Figure 1.8). The intuition is that a perfect correlation means that the second asset is a linear transformation of the first ($R_2 = a + bR_1$), so changing the portfolio weights essentially means forming just another linear combination of the first asset. In particular, there are no diversification benefits.

Figure 1.4: Mean-variance frontiers for two risky assets, different correlations. (plot: MV-frontier with two assets for corr = 0 and corr = 0.75, Mean, % against Std, %)
Also when the assets are perfectly negatively correlated ($\rho = -1$), then the MV frontier is a pair of straight lines, see Figure 1.8. In contrast to the case with a perfect positive correlation, this is true also when short sales are ruled out. This means, for instance, that we can combine the two assets (with positive weights) to get a riskfree portfolio.

Figure 1.5: Mean-variance frontiers (plot: effect of adding an asset, Mean, % against Std, %, showing the original assets, the new asset, and the frontiers for 3 assets and 4 assets)

Proof. (of the MV shapes with 2 assets) With a perfect correlation ($\rho = 1$) the standard deviation can be rearranged. Suppose the portfolio weights are positive (no short sales). Then we get
\[
\begin{aligned}
\sigma_p &= \left[w_1^2\sigma_{11} + (1 - w_1)^2\sigma_{22} + 2w_1(1 - w_1)\sigma_1\sigma_2\right]^{1/2} \\
 &= \left\{\left[w_1\sigma_1 + (1 - w_1)\sigma_2\right]^2\right\}^{1/2} \\
 &= w_1\sigma_1 + (1 - w_1)\sigma_2.
\end{aligned}
\]
We can rearrange this expression as $w_1 = (\sigma_p - \sigma_2)/(\sigma_1 - \sigma_2)$ which we can use in the expression for the expected return to get
\[
\mathrm{E}(R_p) = \frac{\sigma_p - \sigma_2}{\sigma_1 - \sigma_2}\left[\mathrm{E}(R_1) - \mathrm{E}(R_2)\right] + \mathrm{E}(R_2).
\]
This shows that the mean-variance frontier is just a straight line (if there are no short sales). With a perfectly negative correlation ($\rho = -1$) the standard deviation can be rearranged as follows (assuming positive weights)
\[
\begin{aligned}
\sigma_p &= \left[w_1^2\sigma_{11} + (1 - w_1)^2\sigma_{22} - 2w_1(1 - w_1)\sigma_1\sigma_2\right]^{1/2} \\
 &= \left\{\left[w_1\sigma_1 - (1 - w_1)\sigma_2\right]^2\right\}^{1/2} = w_1\sigma_1 - (1 - w_1)\sigma_2 \quad \text{if } [\,\cdot\,] \geq 0 \\
 &= \left\{\left[-w_1\sigma_1 + (1 - w_1)\sigma_2\right]^2\right\}^{1/2} = -w_1\sigma_1 + (1 - w_1)\sigma_2 \quad \text{if } [\,\cdot\,] \geq 0.
\end{aligned}
\]
The second expression is $-1$ times the first expression. Only one can be positive at each time. Both have the same form as in the case with $\rho = 1$, so both generate a linear relation, $\mathrm{E}(R_p) = a + b\sigma_p$, but with different slopes. We get a riskfree portfolio ($\sigma_p = 0$) if $w_1 = \sigma_2/(\sigma_1 + \sigma_2)$.
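The riskfree combination at the end of the proof can be checked numerically. The two standard deviations below are hypothetical.

```python
import math

# Hypothetical standard deviations of the two assets (illustration only).
sigma1, sigma2 = 0.10, 0.16
rho = -1.0                        # perfectly negative correlation

# Weight on asset 1 that removes all risk: w1 = sigma2 / (sigma1 + sigma2).
w1 = sigma2 / (sigma1 + sigma2)

var_p = (w1**2 * sigma1**2 + (1 - w1)**2 * sigma2**2
         + 2 * w1 * (1 - w1) * rho * sigma1 * sigma2)
std_p = math.sqrt(max(var_p, 0.0))   # guard against a tiny negative rounding error

print(w1, 1 - w1, std_p)
```

Both weights are positive, so no short sales are needed to reach the zero-variance portfolio.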
1.2.2 Calculating the MV Frontier of Risky Assets: No Restrictions

When there are no restrictions on the portfolio weights, then there are two ways of finding a point on the mean-variance frontier: let a numerical optimization routine do the work or use some simple matrix algebra. This section demonstrates the second approach.

Figure 1.6: M-V frontier from US industry indices (plot: US industry portfolios, 1947:1-2010:12, Mean, % against Std, %; the points A to J are the industries in Table 1.1)
To simplify the following equations, define the scalars $A$, $B$ and $C$ as
\[
A = \mu'\Sigma^{-1}\mu, \quad B = \mu'\Sigma^{-1}\mathbf{1}, \text{ and } C = \mathbf{1}'\Sigma^{-1}\mathbf{1}, \tag{1.17}
\]
where $\mathbf{1}$ is a (column) vector of ones and $\mu'$ is the transpose of the column vector $\mu$. Then, calculate the scalars (for a given required return $\mu^*$)
\[
\lambda = \frac{C\mu^* - B}{AC - B^2} \text{ and } \delta = \frac{A - B\mu^*}{AC - B^2}. \tag{1.18}
\]
The weights for a portfolio on the MV frontier of risky assets (at a given required return $\mu^*$) are then
\[
w = \Sigma^{-1}(\mu\lambda + \mathbf{1}\delta). \tag{1.19}
\]
Using this in (1.13) gives the variance (take the square root to get the standard deviation).
We can trace out the entire MV frontier by repeating these calculations for different values of the required return and then connecting the dots. In the std $\times$ mean space, the efficient frontier (the upper part) is concave. See Figure 1.2 for an example.
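The recipe in (1.17)–(1.19) can be sketched in a few lines. The code below uses only the standard library, so it includes a small Gaussian elimination helper instead of forming the matrix inverse explicitly; the helper, the function names and the three-asset inputs are all hypothetical choices made for illustration.

```python
import math

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def frontier_weights(mu, Sigma, target):
    """Weights on the MV frontier of risky assets, eqs. (1.17)-(1.19)."""
    ones = [1.0] * len(mu)
    Si_mu = solve(Sigma, mu)       # Sigma^{-1} mu
    Si_one = solve(Sigma, ones)    # Sigma^{-1} 1
    A, B, C = dot(mu, Si_mu), dot(mu, Si_one), dot(ones, Si_one)
    lam = (C * target - B) / (A * C - B**2)
    delta = (A - B * target) / (A * C - B**2)
    return [lam * a + delta * b for a, b in zip(Si_mu, Si_one)]

# Hypothetical inputs for three assets (illustration only).
mu = [0.06, 0.125, 0.105]
Sigma = [[0.0023, 0.0020, 0.0019],
         [0.0020, 0.0166, 0.0006],
         [0.0019, 0.0006, 0.0081]]

for target in (0.06, 0.08, 0.10, 0.12):
    w = frontier_weights(mu, Sigma, target)
    std = math.sqrt(sum(w[i] * Sigma[i][j] * w[j] for i in range(3) for j in range(3)))
    print(f"target {target:.2f}: std {std:.4f}, weights {[round(x, 3) for x in w]}")
```

Each set of weights should sum to one and deliver the required mean return; plotting the standard deviations against the targets traces out the frontier.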

Figure 1.7: Mean-variance frontiers for normal and high correlations. (plot: effect of higher correlation, Mean, % against Std, %, showing the original assets and the frontiers for the original correlations and for correlations that are 0.4 higher)
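Example 1.8 (Transpose of a matrix) Consider the following examples
\[
\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}' = \begin{bmatrix} 1 & 3 & 5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}' = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}, \text{ and }
\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}' = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}.
\]
Transposing a symmetric matrix does nothing, that is, if $A$ is symmetric, then $A' = A$.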
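Proof. (of (1.17)–(1.19)) We set up this as a Lagrangian problem
\[
L = (w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12})/2 + \lambda(\mu^* - w_1\mu_1 - w_2\mu_2) + \delta(1 - w_1 - w_2).
\]
The first order condition with respect to $w_i$ is $\partial L/\partial w_i = 0$, that is,
\[
\begin{aligned}
\text{for } w_1 &: \; w_1\sigma_{11} + w_2\sigma_{12} - \lambda\mu_1 - \delta = 0, \\
\text{for } w_2 &: \; w_1\sigma_{12} + w_2\sigma_{22} - \lambda\mu_2 - \delta = 0.
\end{aligned}
\]
In matrix notation these first order conditions are
\[
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
- \lambda \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
- \delta \begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
We can solve these equations for $w_1$ and $w_2$ as
\[
\begin{aligned}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
 &= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
    \begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}
    \left( \lambda \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \delta \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \\
 &= \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}^{-1}
    \left( \lambda \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \delta \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \\
w &= \Sigma^{-1}(\lambda\mu + \delta\mathbf{1}),
\end{aligned}
\]
where $\mathbf{1}$ is a column vector of ones. The first order conditions for the Lagrange multipliers are (of course)
\[
\begin{aligned}
\text{for } \lambda &: \; \mu^* - w_1\mu_1 - w_2\mu_2 = 0, \\
\text{for } \delta &: \; 1 - w_1 - w_2 = 0.
\end{aligned}
\]
In matrix notation, these conditions are
\[
\mu^* = \mu'w \text{ and } 1 = \mathbf{1}'w.
\]
Stack these into a $2 \times 1$ vector and substitute for $w$
\[
\begin{aligned}
\begin{bmatrix} \mu^* \\ 1 \end{bmatrix}
 &= \begin{bmatrix} \mu' \\ \mathbf{1}' \end{bmatrix} w \\
 &= \begin{bmatrix} \mu' \\ \mathbf{1}' \end{bmatrix} \Sigma^{-1}(\lambda\mu + \delta\mathbf{1}) \\
 &= \begin{bmatrix} \mu'\Sigma^{-1}\mu & \mu'\Sigma^{-1}\mathbf{1} \\ \mathbf{1}'\Sigma^{-1}\mu & \mathbf{1}'\Sigma^{-1}\mathbf{1} \end{bmatrix}
    \begin{bmatrix} \lambda \\ \delta \end{bmatrix} \\
 &= \begin{bmatrix} A & B \\ B & C \end{bmatrix} \begin{bmatrix} \lambda \\ \delta \end{bmatrix}.
\end{aligned}
\]
Solve for $\lambda$ and $\delta$ as
\[
\lambda = \frac{C\mu^* - B}{AC - B^2} \text{ and } \delta = \frac{A - B\mu^*}{AC - B^2}.
\]
Use this in the expression for $w$ above.

Figure 1.8: Mean-variance frontiers for two risky assets: different correlations. The two assets are indicated by circles. Points between the two assets can be generated with positive portfolio weights (no short sales). (four panels, Mean, % against Std, %: high correlations (>0) with corr = 1 and corr = 1/2; low correlations (>0) with corr = 1/2 and corr = 0; negative correlations with corr = 0 and corr = -1/2; very negative correlations with corr = -1/2 and corr = -1)

Figure 1.9: Mean-variance frontiers for two risky assets at high correlations (plot: MV-frontier at corr = 1, 0.995, 0.99 and 0.98, Mean, % against Std, %)

Figure 1.10: Mean-variance frontiers (plot: w/wo riskfree asset, Mean, % against Std, %, showing the original assets, the tangency portfolio, and the frontiers for risky and for risky+riskfree assets)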

1.3 Mean-Variance Frontier of Riskfree and Risky Assets

We now add a riskfree asset with return $R_f$. With two risky assets, the portfolio return is
\[
\begin{aligned}
R_p &= w_1R_1 + w_2R_2 + (1 - w_1 - w_2)R_f \\
    &= w_1(R_1 - R_f) + w_2(R_2 - R_f) + R_f \\
    &= w_1R_1^e + w_2R_2^e + R_f,
\end{aligned} \tag{1.20}
\]
where $R_i^e$ is the excess return of asset $i$. We denote the corresponding expected excess return by $\mu_i^e$ (so $\mu_i^e = \mathrm{E}(R_i^e)$).
The minimization problem is now
\[
\min_{w_1, w_2} (w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12})/2 \text{ subject to } w_1\mu_1^e + w_2\mu_2^e + R_f = \mu^*. \tag{1.21}
\]
Notice that we don’t need any restrictions on the sum of weights: the investment in the riskfree rate automatically makes the overall sum equal to unity.
With more assets, the minimization problem is
\[
\min_{w_i} \sigma_p^2 \text{ subject to } \mathrm{E}(R_p) = \sum_{i=1}^n w_i\mu_i^e + R_f = \mu^*, \tag{1.22}
\]
where the portfolio variance is calculated as usual
\[
\sigma_p^2 = \mathrm{Var}\Big(\sum_{i=1}^n w_iR_i\Big) = w'\Sigma w. \tag{1.23}
\]
When there are no additional constraints, then we can find an explicit solution in terms of some matrices and vectors (see Section 1.3.1). In all other cases, we need to apply an explicit numerical minimization algorithm (preferably for quadratic models).
1.3.1 Calculating the MV Frontier of Riskfree and Risky Assets: No Restrictions

The weights (of the risky assets) for a portfolio on the MV frontier (at a given required return $\mu^*$) are
\[
w = \frac{\mu^* - R_f}{(\mu^e)'\Sigma^{-1}\mu^e}\,\Sigma^{-1}\mu^e, \tag{1.24}
\]
where $R_f$ is the riskfree rate and $\mu^e$ the vector of mean excess returns ($\mu - R_f$). The weight on the riskfree asset is $1 - \mathbf{1}'w$.

Figure 1.11: M-V frontier from US industry indices (plot: US industry portfolios, 1947:1-2010:12, Mean, % against Std, %; the points A to J are the industries in Table 1.1)

Using this in (1.13) gives the variance (take the square root to get the standard deviation). We can trace out the entire MV frontier by repeating these calculations for different values of the required return and then connecting the dots. In the std $\times$ mean space, the efficient frontier (the upper part) is just a line. See Figure 1.10 for an example.
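A minimal sketch of (1.24) for two risky assets, using the explicit inverse of a 2x2 matrix. The riskfree rate, mean excess returns, covariance matrix and function names are hypothetical.

```python
def inv2(S):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mv_weights_with_riskfree(mu_e, Sigma, target, rf):
    """Risky-asset weights from eq. (1.24), for two risky assets."""
    Si = inv2(Sigma)
    Si_mu = [Si[0][0] * mu_e[0] + Si[0][1] * mu_e[1],
             Si[1][0] * mu_e[0] + Si[1][1] * mu_e[1]]        # Sigma^{-1} mu^e
    scale = (target - rf) / (mu_e[0] * Si_mu[0] + mu_e[1] * Si_mu[1])
    return [scale * x for x in Si_mu]

# Hypothetical inputs (illustration only).
rf = 0.03
mu_e = [0.05, 0.07]                       # mean excess returns
Sigma = [[0.01, 0.004], [0.004, 0.0256]]

w = mv_weights_with_riskfree(mu_e, Sigma, target=0.08, rf=rf)
w_riskfree = 1 - sum(w)                   # the rest goes into the riskfree asset
mean_p = w[0] * mu_e[0] + w[1] * mu_e[1] + rf

print(w, w_riskfree, mean_p)
```

Changing the required return only rescales the risky weights: their ratio stays the same, which is why the efficient frontier is a straight line in this case.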
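Proof. (of (1.24)) Define the Lagrangian problem
\[
L = (w_1^2\sigma_{11} + w_2^2\sigma_{22} + 2w_1w_2\sigma_{12})/2 + \lambda(\mu^* - w_1\mu_1^e - w_2\mu_2^e - R_f).
\]
The first order condition with respect to $w_i$ is $\partial L/\partial w_i = 0$, so
\[
\begin{aligned}
\text{for } w_1 &: \; w_1\sigma_{11} + w_2\sigma_{12} - \lambda\mu_1^e = 0, \\
\text{for } w_2 &: \; w_1\sigma_{12} + w_2\sigma_{22} - \lambda\mu_2^e = 0.
\end{aligned}
\]
It is then immediate that we can write them in matrix form as
\[
\begin{aligned}
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
- \lambda \begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix}
 &= \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \text{ so} \\
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
 &= \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}^{-1}
    \lambda \begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix}, \text{ or} \\
w &= \lambda\Sigma^{-1}\mu^e.
\end{aligned}
\]
The first order condition for the Lagrange multiplier is (in matrix form)
\[
\mu^* = w'\mu^e + R_f.
\]
Combine to get
\[
\mu^* = \lambda(\mu^e)'\Sigma^{-1}\mu^e + R_f, \text{ so } \lambda = \frac{\mu^* - R_f}{(\mu^e)'\Sigma^{-1}\mu^e}.
\]
Use in the above expression for $w$.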
1.3.2 Tangency Portfolio

The MV frontier for risky assets and the frontier for risky+riskfree assets are tangent at one point, called the tangency portfolio. In this case the portfolio weights (1.19) and (1.24) coincide. Therefore, the portfolio weights (1.24) must sum to unity (so the weight on the riskfree asset is zero). This helps us to understand what the expected excess return on the tangency portfolio is, which if used in (1.24) gives the portfolio weights of the tangency portfolio
\[
w = \frac{\Sigma^{-1}\mu^e}{\mathbf{1}'\Sigma^{-1}\mu^e}. \tag{1.25}
\]
Proof. (of (1.25)) Put the sum of the portfolio weights in (1.24) equal to one
\[
\mathbf{1}'w = \frac{\mu^* - R_f}{(\mu^e)'\Sigma^{-1}\mu^e}\,\mathbf{1}'\Sigma^{-1}\mu^e = 1,
\]
which only happens if
\[
\mu^* - R_f = \frac{(\mu^e)'\Sigma^{-1}\mu^e}{\mathbf{1}'\Sigma^{-1}\mu^e}.
\]
Using in (1.24) gives (1.25).
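For two risky assets, (1.25) can be evaluated without computing the full inverse, since the determinant of the covariance matrix cancels between numerator and denominator. The sketch below does this with hypothetical inputs and then compares the slope of the line from the riskfree rate (mean excess return divided by standard deviation) with the slopes of a few other portfolios.

```python
import math

# Hypothetical inputs (illustration only).
mu_e = [0.05, 0.07]                    # mean excess returns
s11, s12, s22 = 0.01, 0.004, 0.0256    # variances and covariance

# Sigma^{-1} mu^e up to the common factor 1/det(Sigma), which cancels in (1.25).
raw = [s22 * mu_e[0] - s12 * mu_e[1],
       s11 * mu_e[1] - s12 * mu_e[0]]
w_tan = [x / sum(raw) for x in raw]    # eq. (1.25): the weights sum to one

def slope(w):
    """Mean excess return per unit of standard deviation for weights w."""
    mean_e = w[0] * mu_e[0] + w[1] * mu_e[1]
    var = w[0]**2 * s11 + w[1]**2 * s22 + 2 * w[0] * w[1] * s12
    return mean_e / math.sqrt(var)

print("tangency weights:", w_tan)
for w in (w_tan, [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    print(w, round(slope(w), 4))
```

Among portfolios of the risky assets only, the tangency portfolio should give the steepest line, which is what tangency of the two frontiers means.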

1.4 Examples of Portfolio Weights from MV Calculations

With 2 risky assets and 1 riskfree asset the portfolio weights satisfy (1.24). We can write this as
\[
w = \frac{\lambda}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
\begin{bmatrix} \sigma_{22}\mu_1^e - \sigma_{12}\mu_2^e \\ \sigma_{11}\mu_2^e - \sigma_{12}\mu_1^e \end{bmatrix}, \tag{1.26}
\]
where $\lambda > 0$ if we limit our attention to the efficient part where $\mu^* > R_f$. (This follows from the fact that $(\mu^e)'\Sigma^{-1}\mu^e > 0$ since $\Sigma^{-1}$ is positive definite, because $\Sigma$ is.) We can then discuss some general properties of all portfolios in the efficient set.
Simple Case 1: Uncorrelated Assets ($\sigma_{12} = 0$)

From (1.26) we then get
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \lambda \begin{bmatrix} \mu_1^e/\sigma_{11} \\ \mu_2^e/\sigma_{22} \end{bmatrix}. \tag{1.27}
\]
Suppose that $\lambda > 0$ (efficient part of the MV frontier) and that both excess returns are positive. In that case we have the following.
First, both weights are positive. The intuition is that uncorrelated assets make it efficient to diversify (to get the same expected return, but at a lower variance).
Second, the asset with the highest $\mu_i^e/\sigma_{ii}$ ratio has the highest portfolio weight. The intuition is that an asset with a high excess return and/or low volatility is an efficient way to achieve a low volatility at a given mean return.
Notice that increasing $\mu_i^e/\sigma_{ii}$ does not guarantee that the actual weight on asset $i$ increases (because $\lambda$ changes too). For instance, an increase in the expected return of an asset may allow us to shift assets towards the riskfree asset (and still get the same expected portfolio return, but lower variance).
Example 1.9 (Portfolio weights with uncorrelated assets) When $(\mu_1^e, \mu_2^e) = (0.07, 0.07)$, the correlation is zero, $(\sigma_{11}, \sigma_{22}) = (1, 1)$, and $\mu^* - R_f = 0.09$, then (1.27) gives
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 9.18 \begin{bmatrix} 0.07 \\ 0.07 \end{bmatrix} = \begin{bmatrix} 0.64 \\ 0.64 \end{bmatrix}.
\]
If we change to $(\mu_1^e, \mu_2^e) = (0.09, 0.07)$, then
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 6.92 \begin{bmatrix} 0.09 \\ 0.07 \end{bmatrix} = \begin{bmatrix} 0.62 \\ 0.48 \end{bmatrix}.
\]
If we instead change to $(\sigma_{11}, \sigma_{22}) = (1/2, 1)$, then
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 6.12 \begin{bmatrix} 0.14 \\ 0.07 \end{bmatrix} = \begin{bmatrix} 0.86 \\ 0.43 \end{bmatrix}.
\]

Simple Case 2: Same Variances (but Correlation)

Let $\sigma_{11} = \sigma_{22} = 1$ (as a normalization), so the covariance becomes the correlation $\sigma_{12} = \rho$ where $-1 < \rho < 1$.
From (1.26) we then get
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \frac{\lambda}{1 - \rho^2}
\begin{bmatrix} \mu_1^e - \rho\mu_2^e \\ \mu_2^e - \rho\mu_1^e \end{bmatrix}. \tag{1.28}
\]
Suppose that $\lambda > 0$ (efficient part of the MV frontier) and that both excess returns are positive. In that case, we have the following.
First, both weights are positive if the returns are negatively correlated ($\rho < 0$). The intuition is that a negative correlation means that the assets “hedge” each other (even better than diversification), so the investor would like to hold both of them to reduce the overall risk.
Second, if $\rho > 0$ and $\mu_1^e$ is considerably higher than $\mu_2^e$ (so $\mu_2^e < \rho\mu_1^e$, which also implies $\mu_1^e > \rho\mu_2^e$), then $w_1 > 0$ but $w_2 < 0$. The intuition is that a positive correlation reduces the gain from holding both assets (they don’t hedge each other, and there is relatively little diversification to be gained if the correlation is high). On top of this, asset 1 gives a higher expected return, so it is optimal to sell asset 2 short (essentially a risky “loan” which allows the investor to buy more of asset 1).
Example 1.10 (Portfolio weights with correlated assets) When $(\mu_1^e, \mu_2^e) = (0.07, 0.07)$, $\rho = 0.8$, and $\mu^* - R_f = 0.09$, then (1.28) gives
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 16.53 \begin{bmatrix} 0.039 \\ 0.039 \end{bmatrix} = \begin{bmatrix} 0.64 \\ 0.64 \end{bmatrix}.
\]
This is the same as in the previous example. If we change to $(\mu_1^e, \mu_2^e) = (0.09, 0.07)$, then we get
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 11.10 \begin{bmatrix} 0.094 \\ -0.006 \end{bmatrix} = \begin{bmatrix} 1.05 \\ -0.06 \end{bmatrix}.
\]
If we also change to $\rho = -0.8$, then we get
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 1.40 \begin{bmatrix} 0.406 \\ 0.394 \end{bmatrix} = \begin{bmatrix} 0.57 \\ 0.55 \end{bmatrix}.
\]
These two last solutions are very different from the previous example.
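The numbers in Examples 1.9 and 1.10 can be reproduced from (1.24) and (1.26). The sketch below recomputes the multiplier and the two weights for each of the six cases; the function name and the layout of the case list are hypothetical, while the inputs and the reference values are the ones stated in the examples.

```python
def efficient_weights(mu_e, s11, s22, s12, excess_target):
    """Two-asset weights from (1.26); returns (lambda, w1, w2)."""
    det = s11 * s22 - s12**2
    v1 = (s22 * mu_e[0] - s12 * mu_e[1]) / det   # first element of Sigma^{-1} mu^e
    v2 = (s11 * mu_e[1] - s12 * mu_e[0]) / det   # second element of Sigma^{-1} mu^e
    lam = excess_target / (mu_e[0] * v1 + mu_e[1] * v2)
    return lam, lam * v1, lam * v2

# (mu_e, sigma_11, sigma_22, sigma_12, values reported in the examples)
cases = [
    ((0.07, 0.07), 1.0, 1.0, 0.0, (9.18, 0.64, 0.64)),     # Example 1.9
    ((0.09, 0.07), 1.0, 1.0, 0.0, (6.92, 0.62, 0.48)),
    ((0.07, 0.07), 0.5, 1.0, 0.0, (6.12, 0.86, 0.43)),
    ((0.07, 0.07), 1.0, 1.0, 0.8, (16.53, 0.64, 0.64)),    # Example 1.10
    ((0.09, 0.07), 1.0, 1.0, 0.8, (11.10, 1.05, -0.06)),
    ((0.09, 0.07), 1.0, 1.0, -0.8, (1.40, 0.57, 0.55)),
]

for mu_e, s11, s22, s12, reported in cases:
    lam, w1, w2 = efficient_weights(mu_e, s11, s22, s12, excess_target=0.09)
    print(f"lambda = {lam:6.2f}, w = ({w1:5.2f}, {w2:5.2f}), reported {reported}")
```

In every case the computed values should agree with the reported ones to two decimals, including the short position in asset 2 when the correlation is 0.8 and asset 1 has the higher excess return.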

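A A Primer in Matrix Algebra

Let $c$ be a scalar and define the matrices
\[
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad
z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}, \quad
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \text{ and }
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}.
\]
Adding/subtracting a scalar to a matrix or multiplying a matrix by a scalar are both element by element
\[
\begin{aligned}
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} + c
 &= \begin{bmatrix} A_{11} + c & A_{12} + c \\ A_{21} + c & A_{22} + c \end{bmatrix} \\
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} c
 &= \begin{bmatrix} A_{11}c & A_{12}c \\ A_{21}c & A_{22}c \end{bmatrix}.
\end{aligned}
\]
Example A.1
\[
\begin{aligned}
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} + 10 &= \begin{bmatrix} 11 & 13 \\ 13 & 14 \end{bmatrix} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} 10 &= \begin{bmatrix} 10 & 30 \\ 30 & 40 \end{bmatrix}.
\end{aligned}
\]
Matrix addition (or subtraction) is element by element
\[
A + B = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
 + \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
 = \begin{bmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{bmatrix}.
\]
Example A.2 (Matrix addition and subtraction)
\[
\begin{aligned}
\begin{bmatrix} 10 \\ 11 \end{bmatrix} - \begin{bmatrix} 2 \\ 5 \end{bmatrix} &= \begin{bmatrix} 8 \\ 6 \end{bmatrix} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & -2 \end{bmatrix} &= \begin{bmatrix} 2 & 5 \\ 6 & 2 \end{bmatrix}.
\end{aligned}
\]
To turn a column into a row vector, use the transpose operator like in $x'$
\[
x' = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}' = \begin{bmatrix} x_1 & x_2 \end{bmatrix}.
\]
Similarly, transposing a matrix is like flipping it around the main diagonal
\[
A' = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}' = \begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \end{bmatrix}.
\]
Example A.3 (Matrix transpose)
\[
\begin{aligned}
\begin{bmatrix} 10 \\ 11 \end{bmatrix}' &= \begin{bmatrix} 10 & 11 \end{bmatrix} \\
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}' &= \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}.
\end{aligned}
\]
Matrix multiplication requires the two matrices to be conformable: the first matrix has as many columns as the second matrix has rows. Element $ij$ of the result is the multiplication of the $i$th row of the first matrix with the $j$th column of the second matrix
\[
AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
     \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
   = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
                     A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.
\]
Multiplying a square matrix $A$ with a column vector $z$ gives a column vector
\[
Az = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
   = \begin{bmatrix} A_{11}z_1 + A_{12}z_2 \\ A_{21}z_1 + A_{22}z_2 \end{bmatrix}.
\]
Example A.4 (Matrix multiplication)
\[
\begin{aligned}
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & -2 \end{bmatrix}
 &= \begin{bmatrix} 10 & -4 \\ 15 & -2 \end{bmatrix} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix}
 &= \begin{bmatrix} 17 \\ 26 \end{bmatrix}.
\end{aligned}
\]
For two column vectors $x$ and $z$, the product $x'z$ is called the inner product
\[
x'z = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = x_1z_1 + x_2z_2,
\]
and $xz'$ the outer product
\[
xz' = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \begin{bmatrix} z_1 & z_2 \end{bmatrix}
    = \begin{bmatrix} x_1z_1 & x_1z_2 \\ x_2z_1 & x_2z_2 \end{bmatrix}.
\]
(Notice that $xz$ does not work.) If $x$ is a column vector and $A$ a square matrix, then the product $x'Ax$ is a quadratic form.
Example A.5 (Inner product, outer product and quadratic form)
\[
\begin{aligned}
\begin{bmatrix} 10 \\ 11 \end{bmatrix}' \begin{bmatrix} 2 \\ 5 \end{bmatrix}
 &= \begin{bmatrix} 10 & 11 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = 75 \\
\begin{bmatrix} 10 \\ 11 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix}'
 &= \begin{bmatrix} 10 \\ 11 \end{bmatrix} \begin{bmatrix} 2 & 5 \end{bmatrix}
  = \begin{bmatrix} 20 & 50 \\ 22 & 55 \end{bmatrix} \\
\begin{bmatrix} 10 \\ 11 \end{bmatrix}' \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 10 \\ 11 \end{bmatrix}
 &= 1244.
\end{aligned}
\]
A matrix inverse is the closest we get to “dividing” by a matrix. The inverse of a matrix $A$, denoted $A^{-1}$, is such that
\[
AA^{-1} = I \text{ and } A^{-1}A = I,
\]
where $I$ is the identity matrix (ones along the diagonal, and zeroes elsewhere). The matrix inverse is useful for solving systems of linear equations, $y = Ax$ as $x = A^{-1}y$.

Example A.6 (Matrix inverse) We have
\[
\begin{aligned}
\begin{bmatrix} -4/5 & 3/5 \\ 3/5 & -1/5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}
 &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \text{ so} \\
\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}^{-1}
 &= \begin{bmatrix} -4/5 & 3/5 \\ 3/5 & -1/5 \end{bmatrix}.
\end{aligned}
\]
Let $z$ and $x$ be $n \times 1$ vectors. The derivative of the inner product is $\partial(z'x)/\partial z = x$.

Example A.7 (Derivative of an inner product) With $n = 2$
\[
z'x = z_1x_1 + z_2x_2, \text{ so }
\frac{\partial(z'x)}{\partial z}
 = \frac{\partial(z_1x_1 + z_2x_2)}{\partial \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}}
 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
\]
Let $x$ be $n \times 1$ and $A$ a symmetric $n \times n$ matrix. The derivative of the quadratic form is $\partial(x'Ax)/\partial x = 2Ax$.

Example A.8 (Derivative of a quadratic form) With $n = 2$, the quadratic form is
\[
x'Ax = \begin{bmatrix} x_1 & x_2 \end{bmatrix}
       \begin{bmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{bmatrix}
       \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
     = x_1^2A_{11} + x_2^2A_{22} + 2x_1x_2A_{12}.
\]
The derivatives with respect to $x_1$ and $x_2$ are
\[
\frac{\partial(x'Ax)}{\partial x_1} = 2x_1A_{11} + 2x_2A_{12} \text{ and }
\frac{\partial(x'Ax)}{\partial x_2} = 2x_2A_{22} + 2x_1A_{12}, \text{ or}
\]
\[
\frac{\partial(x'Ax)}{\partial \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}
 = 2\begin{bmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
\]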
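Several of the examples in this appendix can be checked with a few lines of code. The sketch below stores matrices as lists of rows (so a column vector is an n x 1 matrix) and uses two small helper functions written for this illustration; the numbers are those of Examples A.4 to A.6.

```python
def transpose(M):
    """Flip a matrix (list of rows) around its main diagonal."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Product of two conformable matrices given as lists of rows."""
    assert len(X[0]) == len(Y), "columns of X must equal rows of Y"
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 3], [3, 4]]
B = [[1, 2], [3, -2]]
x = [[10], [11]]          # column vector
z = [[2], [5]]            # column vector

print(matmul(A, B))                          # Example A.4, matrix times matrix
print(matmul(A, z))                          # Example A.4, matrix times vector
print(matmul(transpose(x), z))               # Example A.5, inner product x'z
print(matmul(x, transpose(z)))               # Example A.5, outer product xz'
print(matmul(matmul(transpose(x), A), x))    # Example A.5, quadratic form x'Ax

A_inv = [[-4 / 5, 3 / 5], [3 / 5, -1 / 5]]   # Example A.6
print(matmul(A_inv, A))                      # identity matrix, up to rounding
```

Trying `matmul(x, z)` triggers the conformability check, which is the code counterpart of the remark that $xz$ does not work.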

B

A Primer in Optimization

Suppose you want to choose $x$ and $y$ to minimize
\[
L = (x-2)^2 + (4y+3)^2.
\]
Then we have to find the values of $x$ and $y$ that satisfy the first order conditions $\partial L/\partial x = \partial L/\partial y = 0$. These conditions are
\[
0 = \partial L/\partial x = 2(x-2)
\]
\[
0 = \partial L/\partial y = 8(4y+3),
\]
which clearly requires $x = 2$ and $y = -3/4$. In this particular case, the first order condition with respect to $x$ does not depend on $y$, but that is not a general property. In this case, this is the unique solution, but in more complicated problems the first order conditions could be satisfied at different values of $x$ and $y$.
See Figure B.1 for an illustration.

[Figure B.1: Minimization problem. Four panels: the surface and the contours of $(x-2)^2 + (4y+3)^2$ over $x$ and $y$; the contours with the restriction $x+2y=3$; and the value of $(x-2)^2 + (4y+3)^2$ against $x$ when $x+2y=3$, that is, $y=(3-x)/2$.]


If you want to add a restriction to the minimization problem, say
\[
x + 2y = 3,
\]
then we can proceed in two ways. The first is to simply substitute for $x = 3 - 2y$ in $L$ to get
\[
L = (1-2y)^2 + (4y+3)^2,
\]
with first order condition
\[
0 = \partial L/\partial y = -4(1-2y) + 8(4y+3) = 40y + 20,
\]
which requires $y = -1/2$. (We could equally well have substituted for $y$.) This is also the unique solution.
The second method is to use a Lagrangian. The problem is then to choose $x$, $y$, and $\lambda$ to minimize
\[
L = (x-2)^2 + (4y+3)^2 + \lambda(3 - x - 2y).
\]
The term multiplying $\lambda$ is the restriction. The first order conditions are now
\[
0 = \partial L/\partial x = 2(x-2) - \lambda
\]
\[
0 = \partial L/\partial y = 8(4y+3) - 2\lambda
\]
\[
0 = \partial L/\partial \lambda = 3 - x - 2y.
\]
The first two conditions say
\[
x = \lambda/2 + 2
\]
\[
y = \lambda/16 - 3/4,
\]
so we need to find $\lambda$. To do that, use these latest expressions for $x$ and $y$ in the third first order condition (to substitute for $x$ and $y$)
\[
3 = \lambda/2 + 2 + 2(\lambda/16 - 3/4) = 5\lambda/8 + 1/2, \text{ so } \lambda = 4.
\]
Finally, use this to calculate $x$ and $y$ as
\[
x = 4 \text{ and } y = -1/2.
\]
Notice that this is the same solution as before ($y = -1/2$) and that the restriction holds ($4 + 2(-1/2) = 3$). This second method is clearly a lot clumsier in my example, but it pays off when the restriction(s) become complicated.
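Because the objective is quadratic and the restriction linear, the three first order conditions of the Lagrangian form a linear system in $(x, y, \lambda)$. The following is a minimal sketch that solves that system with exact fractions. The helper `solve_linear` is a hypothetical name introduced only for this illustration; it is not part of the notes.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # find a row with a non-zero pivot and move it into place
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]


# First order conditions rearranged as a linear system in (x, y, lam):
#   2x        -  lam =   4    from 0 = 2(x - 2) - lam
#        32y  - 2lam = -24    from 0 = 8(4y + 3) - 2 lam
#    x +  2y         =   3    from 0 = 3 - x - 2y
A = [[2, 0, -1],
     [0, 32, -2],
     [1, 2, 0]]
rhs = [4, -24, 3]

x, y, lam = solve_linear(A, rhs)
print(x, y, lam)
```

The same values come out of the substitution method, which is a useful cross-check when the restriction is simple enough to substitute by hand.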

Bibliography
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.
Fabozzi, F. J., S. M. Focardi, and P. N. Kolm, 2006, Financial modeling of the equity market, Wiley Finance.


2 Index Models

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 7–8, 11

2.1 The Inputs to a MV Analysis

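To calculate the mean variance frontier we need to calculate both the expected return and variance of different portfolios (based on $n$ assets). With two assets ($n = 2$) the expected return and the variance of the portfolio are
\[
\mathrm{E}(R_p) = \begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
\]
\[
\sigma_P^2 = \begin{bmatrix} w_1 & w_2 \end{bmatrix}
\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}. \tag{2.1}
\]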

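In this case we need information on 2 mean returns and 3 elements of the covariance matrix. Clearly, the covariance matrix can alternatively be expressed as
\[
\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}
=
\begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}, \tag{2.2}
\]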

which involves two variances and one correlation (3 elements as before).
There are two main problems in estimating these parameters: the number of parameters increases very quickly as the number of assets increases, and historical estimates have proved to be somewhat unreliable for future periods.
To illustrate the first problem, notice that with $n$ assets we need the following number of parameters:

                 Required number of estimates    With 100 assets
$\mu_i$          $n$                             100
$\sigma_{ii}$    $n$                             100
$\sigma_{ij}$    $n(n-1)/2$                      4950


The numerics is not the problem as it is a matter of seconds to estimate a covariance matrix of 100 return series. Instead, the problem is that most portfolio analysis uses lots of judgemental “estimates.” These are necessary since there might be new assets
(no historical returns series are available) or there might be good reasons to believe that old estimates are not valid anymore. To cut down on the number of parameters, it is often assumed that returns follow some simple model. These notes will discuss so-called single- and multi-index models.
The second problem comes from the empirical observations that estimates from historical data are sometimes poor “forecasts” of future periods (which is what matters for portfolio choice). As an example, the correlation between two asset returns tends to be more “average” than the historical estimate would suggest.
A simple (and often used) way to deal with this is to replace the historical correlation with an average historical correlation. For instance, suppose there are three assets.
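Then, estimate $\rho_{ij}$ on historical data, but use the average estimate as the “forecast” of all correlations:
\[
\text{estimate }
\begin{bmatrix} 1 & \rho_{12} & \rho_{13} \\ & 1 & \rho_{23} \\ & & 1 \end{bmatrix},
\text{ calculate } \bar{\rho} = (\hat{\rho}_{12} + \hat{\rho}_{13} + \hat{\rho}_{23})/3, \text{ and use }
\begin{bmatrix} 1 & \bar{\rho} & \bar{\rho} \\ & 1 & \bar{\rho} \\ & & 1 \end{bmatrix}.
\]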
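The averaging step can be sketched in a few lines. The correlation estimates below are made-up numbers used purely for illustration, and `average_correlation_matrix` is a hypothetical helper name.

```python
def average_correlation_matrix(corr):
    """Replace every off-diagonal correlation by the average of all of them."""
    n = len(corr)
    pairs = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
    rho_bar = sum(pairs) / len(pairs)
    return [[1.0 if i == j else rho_bar for j in range(n)] for i in range(n)]


# hypothetical historical estimates for three assets
corr_hat = [[1.0, 0.2, 0.5],
            [0.2, 1.0, 0.8],
            [0.5, 0.8, 1.0]]

forecast = average_correlation_matrix(corr_hat)
for row in forecast:
    print(row)
```

The forecast matrix keeps ones on the diagonal, so it can be combined with (forecasts of) the standard deviations to rebuild a covariance matrix as in (2.2).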

2.2 Single-Index Models

The single-index model is a way to cut down on the number of parameters that we need to estimate in order to construct the covariance matrix of assets. The model assumes that the co-movement between assets is due to a single common influence (here denoted $R_m$)
\[
R_i = \alpha_i + \beta_i R_m + e_i, \text{ where } \mathrm{E}(e_i) = 0,\ \mathrm{Cov}(e_i, R_m) = 0, \text{ and } \mathrm{Cov}(e_i, e_j) = 0. \tag{2.3}
\]
The first two assumptions are the standard assumptions for using Least Squares: the residual has a zero mean and is uncorrelated with the non-constant regressor. (Together they imply that the residuals are orthogonal to both regressors, which is the standard assumption in econometrics.) Hence, these two properties will be automatically satisfied if (2.3) is estimated by Least Squares.
See Figures 2.1 – 2.3 for illustrations.
The key point of the model, however, is the third assumption: the residuals for different assets are uncorrelated. This means that all comovements of two assets ($R_i$ and $R_j$, say) are due to movements in the common “index” $R_m$. This is not at all guaranteed by running LS regressions; it is just an assumption. It is likely to be false, but may be a reasonable approximation in many cases. In any case, it simplifies the construction of the covariance matrix of the assets enormously, as demonstrated below.

[Figure 2.1: CAPM regression, $R_i - R_f = \alpha + \beta(R_m - R_f) + e_i$. Scatter of data points with the regression line; x-axis: market excess return, %; y-axis: excess return asset $i$, %. Intercept ($\alpha$) and slope ($\beta$): 2.0, 1.3.]
Remark 2.1 (The market model) The market model is (2.3) without the assumption that $\mathrm{Cov}(e_i, e_j) = 0$. This model does not simplify the calculation of a portfolio variance, but will turn out to be important when we want to test CAPM.
If (2.3) is true, then the variance of asset $i$ and the covariance of assets $i$ and $j$ are
\[
\sigma_{ii} = \beta_i^2 \mathrm{Var}(R_m) + \mathrm{Var}(e_i) \tag{2.4}
\]
\[
\sigma_{ij} = \beta_i \beta_j \mathrm{Var}(R_m). \tag{2.5}
\]
Together, these equations show that we can calculate the whole covariance matrix by having just the variance of the index (to get $\mathrm{Var}(R_m)$) and the output from $n$ regressions (to get $\beta_i$ and $\mathrm{Var}(e_i)$ for each asset). This is, in many cases, much easier to obtain than direct estimates of the covariance matrix. For instance, a new asset does not have a return history, but it may be possible to make intelligent guesses about its beta and residual variance (for instance, from knowing the industry and size of the firm).

[Figure 2.2: Scatter plot against market return. Two panels, US data 1970:1−2010:12: excess return %, HiTec ($\alpha = -0.14$, $\beta = 1.28$) and excess return %, Utils ($\alpha = 0.22$, $\beta = 0.53$), each against excess return %, market.]
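This gives the covariance matrix (for two assets)
\[
\mathrm{Cov}\left( \begin{bmatrix} R_i \\ R_j \end{bmatrix} \right)
= \begin{bmatrix} \beta_i^2 & \beta_i\beta_j \\ \beta_i\beta_j & \beta_j^2 \end{bmatrix} \mathrm{Var}(R_m)
+ \begin{bmatrix} \mathrm{Var}(e_i) & 0 \\ 0 & \mathrm{Var}(e_j) \end{bmatrix}, \text{ or } \tag{2.6}
\]
\[
= \begin{bmatrix} \beta_i \\ \beta_j \end{bmatrix} \begin{bmatrix} \beta_i & \beta_j \end{bmatrix} \mathrm{Var}(R_m)
+ \begin{bmatrix} \mathrm{Var}(e_i) & 0 \\ 0 & \mathrm{Var}(e_j) \end{bmatrix}. \tag{2.7}
\]
More generally, with $n$ assets we can define $\beta$ to be an $n \times 1$ vector of all the betas and $\Sigma$ to be an $n \times n$ matrix with the variances of the residuals along the diagonal. We can then write the covariance matrix of the $n \times 1$ vector of the returns as
\[
\mathrm{Cov}(R) = \beta\beta' \mathrm{Var}(R_m) + \Sigma. \tag{2.8}
\]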
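As an illustration of (2.8), the sketch below builds the covariance matrix from betas, residual variances and the index variance. All numbers are hypothetical inputs chosen for the example, and `single_index_cov` is a name made up here.

```python
def single_index_cov(betas, resid_vars, var_m):
    """Cov(R) = beta beta' Var(R_m) + Sigma, where Sigma is diagonal."""
    n = len(betas)
    return [[betas[i] * betas[j] * var_m + (resid_vars[i] if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


# hypothetical inputs: two assets, monthly returns in %
betas = [1.28, 0.53]
resid_vars = [9.0, 12.0]   # Var(e_i), assumed values
var_m = 20.0               # Var(R_m), assumed value

cov = single_index_cov(betas, resid_vars, var_m)
for row in cov:
    print(row)
```

With $n = 100$ assets this approach needs $2n + 1 = 201$ inputs (100 betas, 100 residual variances and one index variance), compared with the 100 variances and 4950 covariances of a direct estimate.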

See Figure 2.4 for an example based on the Fama-French portfolios detailed in Table
2.2.
Table 2.1: CAPM regressions, monthly returns, %, US data 1970:1-2010:12. Numbers in parentheses are t-stats. Autocorr is a N(0,1) test statistic (autocorrelation); White is a chi-square test statistic (heteroskedasticity), df = K(K+1)/2 - 1; All slopes is a chi-square test statistic (of all slope coeffs), df = K-1.

                  HiTec      Utils
constant          -0.14      0.22
                  (-0.90)    (1.39)
market return     1.28       0.53
                  (32.40)    (12.59)
R2                0.75       0.35
obs               492.00     492.00
Autocorr (t)      0.75       0.85
White             6.77       19.95
All slopes        367.19     172.22

Remark 2.2 (Fama-French portfolios) The portfolios in Table 2.2 are calculated by annual rebalancing (June/July). The US stock market is divided into $5 \times 5$ portfolios as follows. First, split up the stock market into 5 groups based on the book value/market value: put the lowest 20% in the first group, the next 20% in the second group etc. Second, split up the stock market into 5 groups based on size: put the smallest 20% in the first group etc. Then, form portfolios based on the intersections of these groups. For instance, in Table 2.2 the portfolio in row 2, column 3 (portfolio 8) belongs to the 20%-40% largest firms and the 40%-60% firms with the highest book value/market value.

Table 2.2: Numbering of the FF indices in the figures.

            Book value/Market value
            1     2     3     4     5
Size 1      1     2     3     4     5
     2      6     7     8     9     10
     3      11    12    13    14    15
     4      16    17    18    19    20
     5      21    22    23    24    25
Proof. (of (2.4)–(2.5)) By using (2.3) and recalling that $\mathrm{Cov}(R_m, e_i) = 0$, direct calculations give
\[
\sigma_{ii} = \mathrm{Var}(R_i)
= \mathrm{Var}(\alpha_i + \beta_i R_m + e_i)
= \mathrm{Var}(\beta_i R_m) + \mathrm{Var}(e_i) + 2 \times 0
= \beta_i^2 \mathrm{Var}(R_m) + \mathrm{Var}(e_i).
\]
Similarly, the covariance of assets $i$ and $j$ is (recalling also that $\mathrm{Cov}(e_i, e_j) = 0$)
\[
\sigma_{ij} = \mathrm{Cov}(R_i, R_j)
= \mathrm{Cov}(\alpha_i + \beta_i R_m + e_i,\ \alpha_j + \beta_j R_m + e_j)
= \beta_i\beta_j \mathrm{Var}(R_m) + 0
= \beta_i\beta_j \mathrm{Var}(R_m).
\]

[Figure 2.3: $\beta$s of US industry portfolios. Beta (against the market), 1970:1−2010:12, for NoDur, Durbl, Manuf, Enrgy, HiTec, Telcm, Shops, Hlth, Utils and Other.]


[Figure 2.4: Correlations of US portfolios. Two panels: correlations in the data, and the difference in correlations (data − model), both plotted over portfolio pairs. 25 FF US portfolios, 1957:1−2010:12. Index (factor): US market.]

2.3 Estimating Beta

2.3.1 Estimating Historical Beta: OLS and Other Approaches

Least Squares (LS) is typically used to estimate $\alpha_i$, $\beta_i$ and $\mathrm{Std}(e_i)$ in (2.3), and the $R^2$ is used to assess the quality of the regression.
Remark 2.3 ($R^2$ of market model) $R^2$ of (2.3) measures the fraction of the variance (of $R_i$) that is due to the systematic part of the regression, that is, the relative importance of market risk as compared to idiosyncratic noise ($1 - R^2$ is the fraction due to the idiosyncratic noise)
\[
R^2 = \frac{\mathrm{Var}(\alpha_i + \beta_i R_m)}{\mathrm{Var}(R_i)}
= \frac{\beta_i^2 \sigma_m^2}{\beta_i^2 \sigma_m^2 + \sigma_{ei}^2}.
\]
To assess the accuracy of historical betas, Blume and others estimate betas for nonoverlapping samples (periods)—and then compare the betas across samples. They find that the correlation of betas across samples is moderate for individual assets, but relatively high for diversified portfolios. It is also found that betas tend to “regress” towards one: an extreme historical beta is likely to be followed by a beta that is closer to one. There are several suggestions for how to deal with this problem.
To use Blume’s ad-hoc technique, let $\hat{\beta}_{i1}$ be the estimate of $\beta_i$ from an early sample, and $\hat{\beta}_{i2}$ the estimate from a later sample. Then regress
\[
\hat{\beta}_{i2} = \gamma_0 + \gamma_1 \hat{\beta}_{i1} + \upsilon_i \tag{2.9}
\]
and use it for forecasting the beta for yet another sample. Blume found $(\hat{\gamma}_0, \hat{\gamma}_1) = (0.343, 0.677)$ in his sample.
Other authors have suggested averaging the OLS estimate ($\hat{\beta}_{i1}$) with some average beta. For instance, $(\hat{\beta}_{i1} + 1)/2$ (since the average beta must be unity) or $(\hat{\beta}_{i1} + \sum_{i=1}^{n} \hat{\beta}_{i1}/n)/2$ (which will typically be similar since $\sum_{i=1}^{n} \hat{\beta}_{i1}/n$ is likely to be close to one).
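A small numerical sketch shows how both adjustments pull an extreme historical beta towards one. It uses the Blume coefficients quoted above as defaults; the function names and the three input betas are hypothetical.

```python
def blume_forecast(beta_hist, gamma0=0.343, gamma1=0.677):
    """Forecast of the next-period beta from the regression (2.9)."""
    return gamma0 + gamma1 * beta_hist


def average_with_one(beta_hist):
    """Simple average of the historical beta and the average beta of one."""
    return (beta_hist + 1.0) / 2.0


# hypothetical historical betas: low, average and high
for beta in (0.5, 1.0, 1.5):
    print(beta, round(blume_forecast(beta), 4), average_with_one(beta))
```

In both cases a historical beta below one is adjusted upwards and a beta above one is adjusted downwards, which is the “regression towards one” described in the text.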
The Bayesian approach is another (more formal) way of adjusting the OLS estimate. It also uses a weighted average of the OLS estimate, $\hat{\beta}_{i1}$, and some other number, $\beta_0$, $(1-F)\hat{\beta}_{i1} + F\beta_0$ where $F$ depends on the precision of the OLS estimator. The general idea of a Bayesian approach (Greene (2003) 16) is to treat both $R_i$ and $\beta_i$ as random. In this case a Bayesian analysis could go as follows. First, suppose our prior beliefs (before having data) about $\beta_i$ are that it is normally distributed, $N(\beta_0, \sigma_0^2)$, where ($\beta_0, \sigma_0^2$) are some numbers. Second, run a LS regression of (2.3). If the residuals are normally distributed, so is the estimator: it is $N(\hat{\beta}_{i1}, \sigma_{\beta 1}^2)$, where we have taken the point estimate to be the mean. If we treat the variance of the LS estimator ($\sigma_{\beta 1}^2$) as known, then the Bayesian estimator of beta is
\[
b = (1-F)\hat{\beta}_{i1} + F\beta_0, \text{ where }
F = \frac{1/\sigma_0^2}{1/\sigma_0^2 + 1/\sigma_{\beta 1}^2}
= \frac{\sigma_{\beta 1}^2}{\sigma_0^2 + \sigma_{\beta 1}^2}. \tag{2.10}
\]

2.3.2 Fundamental Betas

Another way to improve the forecasts of the beta over a future period is to bring in information about fundamental firm variables. This is particularly useful when there is little historical data on returns (for instance, because the asset was not traded before).
It is often found that betas are related to fundamental variables as follows (with signs in parentheses indicating the effect on the beta): Dividend payout (-), Asset growth (+),
Leverage (+), Liquidity (-), Asset size (-), Earning variability (+), Earnings Beta (slope in earnings regressed on economy wide earnings) (+). Such relations can be used to make an educated guess about the beta of an asset without historical data on the returns, but with data on (at least some of) these fundamental variables.

2.4 Multi-Index Models

2.4.1 Overview

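The multi-index model is just a multivariate extension of the single-index model (2.3)
\[
R_i = a_i^* + b_{i1}^* I_1^* + b_{i2}^* I_2^* + \ldots + b_{ik}^* I_k^* + e_i, \text{ where } \tag{2.11}
\]
\[
\mathrm{E}(e_i) = 0,\ \mathrm{Cov}(e_i, I_k^*) = 0, \text{ and } \mathrm{Cov}(e_i, e_j) = 0.
\]
Here a star marks the original (untransformed) indices.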

As an example, there could be two indices: the stock market return and an interest rate.
An ad-hoc approach is to first try a single-index model and then test if the residuals are approximately uncorrelated. If not, then adding a second index might give an acceptable approximation. It is often found that it takes several indices to get a reasonable approximation—but that a single-index model is equally good (or better) at “forecasting” the covariance over a future period. This is much like the classical trade-off between in-sample fit (requires a large model) and forecasting (often better with a small model).
The types of indices vary, but one common set captures the “business cycle” and includes things like the market return, interest rate (or some measure of the yield curve slope), GDP growth, inflation, and so forth. Another common set of indices are industry indices. It turns out (see below) that the calculations of the covariance matrix are much simpler if the indices are transformed to be uncorrelated so we get the model
\[
R_i = a_i + b_{i1} I_1 + b_{i2} I_2 + \ldots + b_{ik} I_k + e_i, \text{ where } \tag{2.12}
\]
\[
\mathrm{E}(e_i) = 0,\ \mathrm{Cov}(e_i, I_j) = 0,\ \mathrm{Cov}(e_i, e_j) = 0 \text{ (unless } i = j\text{), and }
\mathrm{Cov}(I_j, I_h) = 0 \text{ (unless } j = h\text{).}
\]
If this transformation of the indices is linear (and non-singular, so it can be reversed if we want to), then the fit of the regression is unchanged.

2.4.2 “Rotating” the Indices

There are several ways of transforming the indices to make them uncorrelated, but the following regression approach is perhaps the simplest and may also give the best possibility of interpreting the results:
1. Let the first transformed index equal the original index, $I_1 = I_1^*$ (possibly demeaned), where a star marks an original index. This would often be the market return.
2. Regress the second original index on the first transformed index, $I_2^* = \delta_0 + \delta_1 I_1 + \varepsilon_2$. Then, let the second transformed index be the fitted residual, $I_2 = \hat{\varepsilon}_2$.
3. Regress the third original index on the first two transformed indices, $I_3^* = \theta_0 + \theta_1 I_1 + \theta_2 I_2 + \varepsilon_3$. Then, let $I_3 = \hat{\varepsilon}_3$. Follow the same idea for all subsequent indices.
Recall that the fitted residual (from Least Squares) is always uncorrelated with the regressor (by construction). In this case, this means that $I_2$ is uncorrelated with $I_1$ (step 2) and that $I_3$ is uncorrelated with both $I_1$ and $I_2$ (step 3). The correlation matrix of the first three rotated indices is therefore
\[
\mathrm{Corr}\left( \begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix} \right)
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{2.13}
\]

This recursive approach also helps in interpreting the transformed indices. Suppose the first index is the market return and that the second original index is an interest rate.
The first transformed index (I1 ) is then clearly the market return. The second transformed index (I2 ) can then be interpreted as the interest rate minus the interest rate expected at the current stock market return—that is, the part of the interest rate that cannot be explained by the stock market return.
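More generally, let the $j$th index ($j = 1, 2, \ldots, k$) be
\[
I_j = \hat{\varepsilon}_j, \text{ where } \hat{\varepsilon}_j \text{ is the fitted residual from the regression} \tag{2.14}
\]
\[
I_j^* = \delta_{j1} + \gamma_{j1} I_1 + \ldots + \gamma_{j,j-1} I_{j-1} + \varepsilon_j. \tag{2.15}
\]
Notice that for the first index ($j = 1$), the regression is only $I_1^* = \delta_{11} + \varepsilon_1$, so $I_1$ equals the demeaned $I_1^*$.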
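The first two steps of the rotation can be sketched on simulated data. Everything below is an assumption made for illustration: the two “original” indices are random draws standing in for a market return and an interest rate, and the helper names are hypothetical.

```python
import random
from statistics import fmean


def ols_residuals(y, x):
    """Fitted residuals from regressing y on a constant and x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]


def covariance(u, v):
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


random.seed(1)
# simulated original indices: a market return and a correlated interest rate
market = [random.gauss(0.5, 4.0) for _ in range(500)]
rate = [0.3 - 0.1 * m + random.gauss(0.0, 1.0) for m in market]

mean_market = fmean(market)
I1 = [m - mean_market for m in market]   # step 1: demeaned first index
I2 = ols_residuals(rate, I1)             # step 2: fitted residual

print("covariance of original indices:", covariance(market, rate))
print("covariance of rotated indices: ", covariance(I1, I2))
```

The original indices are clearly correlated, while the rotated ones are uncorrelated up to floating point error. A third index would be handled the same way, but with a regression on both `I1` and `I2`.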
2.4.3 Multi-Index Model after “Rotating” the Indices

To see why the transformed indices are very convenient for calculating the covariance matrix, consider a two-index model. Then, (2.12) implies that the variance of asset $i$ is
\[
\sigma_{ii} = \mathrm{Var}(a_i + b_{i1} I_1 + b_{i2} I_2 + e_i)
= b_{i1}^2 \mathrm{Var}(I_1) + b_{i2}^2 \mathrm{Var}(I_2) + \mathrm{Var}(e_i). \tag{2.16}
\]
Similarly, the covariance of assets $i$ and $j$ is
\[
\sigma_{ij} = \mathrm{Cov}(a_i + b_{i1} I_1 + b_{i2} I_2 + e_i,\ a_j + b_{j1} I_1 + b_{j2} I_2 + e_j)
= b_{i1} b_{j1} \mathrm{Var}(I_1) + b_{i2} b_{j2} \mathrm{Var}(I_2). \tag{2.17}
\]
More generally, with $n$ assets and $k$ indices we can define $b_1$ to be an $n \times 1$ vector of the slope coefficients for the first index ($b_{i1}, b_{j1}$) and $b_2$ the vector of slope coefficients for the second index and so on. Also, let $\Sigma$ be an $n \times n$ matrix with the variances of the residuals along the diagonal. The covariance matrix of the returns is then
\[
\mathrm{Cov}(R) = b_1 b_1' \mathrm{Var}(I_1) + b_2 b_2' \mathrm{Var}(I_2) + \ldots + b_k b_k' \mathrm{Var}(I_k) + \Sigma. \tag{2.18}
\]

See Figure 2.5 for an example.
2.4.4 Multi-Index Model as Method in Portfolio Choice

The factor loadings (betas) can be used for more than just constructing the covariance matrix. In fact, the factor loadings are often used directly in portfolio choice. The reason is simple: the betas summarize how different assets are exposed to the big risk factors/return drivers. The betas therefore provide a way to understand the broad features of even complicated portfolios. Combine this with the fact that many analysts and investors have fairly little direct information about individual assets, but are often willing to form opinions about the future relative performance of different asset classes (small vs large firms, equity vs bonds, etc), and the role for factor loadings becomes clear.
See Figures 2.6–2.7 for an illustration.

[Figure 2.5: Correlations of US portfolios. Two panels: correlations in the data, and the difference in correlations (data − model), both plotted over portfolio pairs. 25 FF US portfolios, 1957:1−2010:12. Indices (factors): US market, SMB, HML.]

2.5 Principal Component Analysis

Principal component analysis (PCA) can help us determine how many factors are needed to explain a cross-section of asset returns.
Let $z_t = R_t - \bar{R}_t$ be an $n \times 1$ vector of demeaned returns with covariance matrix $\Sigma$. The first principal component ($pc_{1t}$) is the (normalized) linear combination of $z_t$ that accounts for as much of the variability as possible, and its variance is denoted $\lambda_1$. The $j$th ($j \geq 2$) principal component ($pc_{jt}$) is similar (and its variance is denoted $\lambda_j$), except that it must be uncorrelated with all lower principal components. Remark 2.4 gives a formal definition.
Remark 2.4 (Principal component analysis) Consider the zero mean $n \times 1$ vector $z_t$ with covariance matrix $\Sigma$. The first (sample) principal component is $pc_{1t} = w_1' z_t$, where $w_1$ is the eigenvector associated with the largest eigenvalue ($\lambda_1$) of $\Sigma$. This value of $w_1$ solves the problem $\max_w w'\Sigma w$ subject to the normalization $w'w = 1$. The eigenvalue $\lambda_1$ equals $\mathrm{Var}(pc_{1t}) = w_1'\Sigma w_1$. The $j$th principal component solves the same problem, but under the additional restriction that $w_i' w_j = 0$ for all $i < j$. The solution is the eigenvector associated with the $j$th largest eigenvalue $\lambda_j$ (which equals $\mathrm{Var}(pc_{jt}) = w_j'\Sigma w_j$).

[Figure 2.6: Loading (betas) of rotated factors. Three panels plotted against portfolio number (1 to 25): US portfolios, $\beta_m$, 1957:1−2010:12; US portfolios, $\beta_{SMBres}$; US portfolios, $\beta_{HMLres}$.]
Let the $i$th eigenvector be the $i$th column of the $n \times n$ matrix
\[
W = [\, w_1 \ \cdots \ w_n \,]. \tag{2.19}
\]
We can then calculate the $n \times 1$ vector of principal components as
\[
pc_t = W' z_t. \tag{2.20}
\]
Since the eigenvectors are orthogonal it can be shown that $W' = W^{-1}$, so the expression can be inverted as
\[
z_t = W pc_t. \tag{2.21}
\]
This shows that the $i$th eigenvector (the $i$th column of $W$) can be interpreted as the effect of the $i$th principal component on each of the elements in $z_t$. However, the sign of column $j$ of $W$ can be changed without any effects (except that the $pc_{jt}$ also changes sign), so we can always reinterpret a negative coefficient as a positive exposure (to $-pc_{jt}$).

[Figure 2.7: Absolute loading (betas) of rotated factors. Two panels: factor exposure of small growth stocks and factor exposure of large value stocks, each on Market, SMB (res) and HML (res). The factor exposure is measured as abs($\beta$). The factors are rotated to become uncorrelated.]
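Example 2.5 (PCA with 2 series) With two series we have
\[
pc_{1t} = \begin{bmatrix} w_{11} \\ w_{21} \end{bmatrix}' \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}
\text{ and }
pc_{2t} = \begin{bmatrix} w_{12} \\ w_{22} \end{bmatrix}' \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}
\text{ or }
\]
\[
\begin{bmatrix} pc_{1t} \\ pc_{2t} \end{bmatrix}
= \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}' \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}
\text{ and }
\begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}
= \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} pc_{1t} \\ pc_{2t} \end{bmatrix}.
\]
For instance, $w_{12}$ shows how $pc_{2t}$ affects $z_{1t}$, while $w_{22}$ shows how $pc_{2t}$ affects $z_{2t}$.
Remark 2.6 (Data in matrices) Transpose (2.20) to get $pc_t' = z_t' W$, where the dimensions are $1 \times n$, $1 \times n$ and $n \times n$ respectively. If we form a $T \times n$ matrix of data $Z$ by putting $z_t'$ in row $t$, then the $T \times n$ matrix of principal components can be calculated as $PC = ZW$.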
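The matrix form in Remark 2.6 maps directly to a few lines of NumPy. The sketch below uses simulated returns (one common factor plus noise, an assumption made only for this example) and relies on `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order, so they are reordered to put the largest first.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1000, 3

# simulated returns: one common factor plus idiosyncratic noise (assumed)
factor = rng.normal(size=(T, 1))
R = factor @ np.array([[1.0, 0.8, 1.2]]) + 0.5 * rng.normal(size=(T, n))

Z = R - R.mean(axis=0)               # demeaned data, z_t' in row t
Sigma = Z.T @ Z / T                  # sample covariance matrix
eigval, W = np.linalg.eigh(Sigma)    # ascending eigenvalues, eigenvectors in columns
order = np.argsort(eigval)[::-1]     # reorder so the largest eigenvalue comes first
eigval, W = eigval[order], W[:, order]

PC = Z @ W                           # T x n matrix of principal components
share = eigval / eigval.sum()        # each eigenvalue as a share of their sum

print("eigenvalues:", eigval)
print("share of total variance:", share)
```

The columns of `PC` are uncorrelated with variances equal to the eigenvalues, and `PC @ W.T` recovers `Z`, which is (2.21) in matrix form.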

[Figure 2.8: Eigenvectors for US portfolio returns. 25 FF US portfolios, eigenvectors, monthly data 1957:1−2010:12. Legend: 1st (83.5%), 2nd (6.9%), 3rd (3.6%); x-axis: portfolio number (1 to 25).]
Notice that (2.21) shows that all $n$ data series in $z_t$ can be written in terms of the $n$ principal components. Since the principal components are uncorrelated ($\mathrm{Cov}(pc_{it}, pc_{jt}) = 0$), we can think of the sum of their variances ($\sum_{i=1}^{n} \lambda_i$) as the “total variation” of the series in $z_t$. In practice, it is common to report the relative importance of principal component $j$ as
\[
\text{relative importance of } pc_j = \lambda_j / \sum\nolimits_{i=1}^{n} \lambda_i. \tag{2.22}
\]
For instance, if it is found that the first two principal components account for 75% of the total variation among many asset returns, then a two-factor model is likely to be a good approximation.

2.6 Estimating Expected Returns

The starting point for forming estimates of future mean excess returns is typically historical excess returns. Excess returns are preferred to returns, since this avoids blurring the risk compensation (expected excess return) with long-run movements in inflation (and therefore interest rates). The expected excess return for the future period is typically formed as a judgmental adjustment of the historical excess return. Evidence suggests that the adjustments are hard to make.

It is typically hard to predict movements (around the mean) of asset returns, but a few variables seem to have some predictive power, for instance, the slope of the yield curve, the earnings/price yield, and the book value–market value ratio. Still, the predictive power is typically low.
Makridakis, Wheelwright, and Hyndman (1998) 10.1 show that there is little evidence that the average stock analyst beats (on average) the market (a passive index portfolio).
In fact, less than half of the analysts beat the market. However, there are analysts who seem to outperform the market for some time, but the autocorrelation in over-performance is weak. The evidence from mutual funds is similar. For them it is typically also found that their portfolio weights do not anticipate price movements.
It should be remembered that many analysts also are sales persons: either of a stock
(for instance, since the bank is underwriting an offering) or of trading services. It could well be that their objective function is quite different from minimizing the squared forecast errors—or whatever we typically use in order to evaluate their performance. (The number of litigations in the US after the technology boom/bust should serve as a strong reminder of this.)

2.7 Estimation on Subsamples

To capture time-variation in the regression coefficients, it is fairly common to run the regression
\[
y_t = x_t' b + \varepsilon_t \tag{2.23}
\]
on a longer and longer data set (“recursive estimation”). In the standard recursive estimation, the first estimation is done on the sample $t = 1, 2, \ldots, \tau$; while the second estimation is done on $t = 1, 2, \ldots, \tau, \tau + 1$; and so forth until we use the entire sample $t = 1, \ldots, T$. In the “backwards recursive estimate” we instead keep the end-point fixed and use more and more of old data. That is, the first sample could be $T - \tau, \ldots, T$; the second $T - \tau - 1, \ldots, T$; and so forth.
Alternatively, a moving data window (“rolling samples”) could be used. In this case, the first sample is $t = 1, 2, \ldots, \tau$; but the second is on $t = 2, \ldots, \tau, \tau + 1$, that is, by dropping one observation at the start when the sample is extended at the end. See Figure 2.9 for an illustration.
An alternative is to apply an exponentially weighted moving average (EMA) estimator, which uses all data points since the beginning of the sample, but where recent observations carry larger weights. The weight for data in period $t$ is $\lambda^{T-t}$ where $T$ is the latest observation and $0 < \lambda < 1$, where a smaller value of $\lambda$ means that old data carries low weights. In practice, this means that we define
\[
\tilde{x}_t = x_t \lambda^{T-t} \text{ and } \tilde{y}_t = y_t \lambda^{T-t} \tag{2.24}
\]
and then estimate
\[
\tilde{y}_t = \tilde{x}_t' b + \varepsilon_t. \tag{2.25}
\]
Notice that also the constant (in $x_t$) should be scaled in the same way. (Clearly, this method is strongly related to the GLS approach used when residuals are heteroskedastic. Also, the idea of down weighting old data is commonly used to estimate time-varying volatility of returns as in the RiskMetrics method.)
Estimation on subsamples is not only a way of getting a more recent/modern estimate, but also a way to gauge the historical range and volatility in the betas—which may be important for putting some discipline on judgemental forecasts.
See Figures 2.9–2.10 for an illustration.
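The three subsample schemes can be compared on simulated data. The sketch below is an illustration under stated assumptions: the returns are random draws for an asset whose beta shifts halfway through the sample, and `wls_beta`, the window length and $\lambda = 0.98$ are choices made only for this example. Scaling $y_t$, $x_t$ and the constant by $\lambda^{T-t}$ is the same as weighting the squared residuals by $\lambda^{2(T-t)}$, which is what the code does.

```python
import random


def wls_beta(y, x, w):
    """Intercept and slope minimizing sum_t w_t (y_t - a - b x_t)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


random.seed(2)
T, window, lam = 240, 60, 0.98
rm = [random.gauss(0.5, 4.0) for _ in range(T)]
# simulated asset whose beta shifts from 0.8 to 1.4 halfway through the sample
ri = [(0.8 if t < T // 2 else 1.4) * rm[t] + random.gauss(0.0, 1.0)
      for t in range(T)]

# recursive: samples 1..s for s = window, ..., T
recursive = [wls_beta(ri[:s], rm[:s], [1.0] * s)[1]
             for s in range(window, T + 1)]
# rolling: the latest `window` observations ending at s
rolling = [wls_beta(ri[s - window:s], rm[s - window:s], [1.0] * window)[1]
           for s in range(window, T + 1)]
# EMA: weights lam**(2(T-t)) on squared residuals (index T-1 is the latest obs)
ewma_weights = [lam ** (2 * (T - 1 - t)) for t in range(T)]
ewma = wls_beta(ri, rm, ewma_weights)[1]

print("recursive, full sample:", recursive[-1])
print("rolling, last window:  ", rolling[-1])
print("EMA:                   ", ewma)
```

In this setup the full-sample recursive estimate averages over both regimes, while the last rolling window and the EMA estimate mainly reflect the recent beta.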

2.8 Robust Estimation

2.8.1 Robust Means, Variances and Correlations

Outliers and other extreme observations can have very decisive influence on the estimates of the key statistics needed for financial analysis, including mean returns, variances, covariances and also regression coefficients.
The perhaps best way to solve these problems is to carefully analyse the data—and then decide which data points to exclude. Alternatively, robust estimators can be applied instead of the traditional ones.
To estimate the mean, the sample average can be replaced by the median or a trimmed mean (where the x % lowest and highest observations are excluded).
Similarly, to estimate the variance, the sample standard deviation can be replaced by the interquartile range (the difference between the 75th and the 25th percentiles), divided by 1.35
\[
\mathrm{StdRobust} = [\mathrm{quantile}(0.75) - \mathrm{quantile}(0.25)]/1.35, \tag{2.26}
\]

or by the median absolute deviation
\[
\mathrm{StdRobust} = \mathrm{median}(|x_t - \mu|)/0.675. \tag{2.27}
\]
Both these would coincide with the standard deviation if data was indeed drawn from a normal distribution without outliers.
A robust covariance can be calculated by using the identity
\[
\mathrm{Cov}(x, y) = [\mathrm{Var}(x + y) - \mathrm{Var}(x - y)]/4 \tag{2.28}
\]
and using a robust estimator of the variances, like the square of (2.26). A robust correlation is then created by dividing the robust covariance with the two robust standard deviations. See Figures 2.11–2.12 for empirical examples.

[Figure 2.9: Betas of US industry portfolios. Four panels for the $\beta$ of the HiTech sector: recursive (against end of sample), backwards recursive (against start of sample), 5-year data window (against end of 5-year sample), and EWMA estimate (against $\lambda$ from 0.9 to 1).]
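The robust estimators (2.26) to (2.28) can be sketched with the standard library only. The data are simulated, the function names are hypothetical, and two choices are assumptions of this sketch rather than something the notes specify: the centre $\mu$ in (2.27) is taken to be the sample median, and the quartiles use the default method of `statistics.quantiles`.

```python
import random
from statistics import median, quantiles, stdev


def std_iqr(x):
    """Robust standard deviation from the interquartile range, as in (2.26)."""
    q1, _, q3 = quantiles(x, n=4)
    return (q3 - q1) / 1.35


def std_mad(x):
    """Robust standard deviation from the median absolute deviation, as in (2.27)."""
    centre = median(x)   # assumption: centre on the sample median
    return median([abs(v - centre) for v in x]) / 0.675


def robust_cov(x, y):
    """Robust covariance from the identity (2.28) with IQR-based variances."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (std_iqr(plus) ** 2 - std_iqr(minus) ** 2) / 4


random.seed(3)
x = [random.gauss(0.0, 2.0) for _ in range(5000)]
y = [0.5 * a + random.gauss(0.0, 1.0) for a in x]
x_outlier = x[:-1] + [1000.0]   # same data with one gross outlier

print("std, IQR and MAD estimates:", stdev(x), std_iqr(x), std_mad(x))
print("with outlier, std vs IQR:  ", stdev(x_outlier), std_iqr(x_outlier))
print("robust covariance of x, y: ", robust_cov(x, y))
```

On clean normal data all three estimates of the standard deviation are close to each other. A single extreme observation inflates the sample standard deviation but barely moves the quantile-based estimate.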
[Figure 2.10: Distribution of betas of US industry portfolios (estimated on 5-year data windows). Betas estimated on moving 5-year data windows, monthly returns, %, monthly data 1947:1−2010:12, for NoDur, Durbl, Manuf, Enrgy, HiTec, Telcm, Shops, Hlth, Utils and Other.]

2.8.2 Robust Regression Coefficients

Reference: Amemiya (1985) 4.6
The least absolute deviations (LAD) estimator minimizes the sum of absolute residuals (rather than the squared residuals)
\[
\hat{\beta}_{LAD} = \arg\min_b \sum_{t=1}^{T} |y_t - x_t' b|. \tag{2.29}
\]
This estimator involves non-linearities, but a simple iteration works nicely. It is typically less sensitive to outliers. (There are also other ways to estimate robust regression coefficients.) This is illustrated in Figure 2.13.
See Figure 2.14 for an empirical example.
If we assume that the median of the true residual, $u_t$, is zero, then we (typically) have
\[
\sqrt{T}(\hat{\beta}_{LAD} - \beta_0) \to^d N\left(0, f(0)^{-2} \Sigma_{xx}^{-1}/4\right), \text{ where } \Sigma_{xx} = \mathrm{plim} \sum_{t=1}^{T} x_t x_t'/T, \tag{2.30}
\]

where $f(0)$ is the value of the pdf of the residual at zero. Unless we know this density function (or else we would probably have used MLE instead of LAD), we need to estimate it, for instance with a kernel density method.
Example 2.7 ($N(0, \sigma^2)$) When $u_t \sim N(0, \sigma^2)$, then $f(0) = 1/\sqrt{2\pi\sigma^2}$, so the covariance matrix in (2.30) becomes $\pi\sigma^2 \Sigma_{xx}^{-1}/2$. This is $\pi/2$ times larger than when using LS.

[Figure 2.11: Mean excess returns of US industry portfolios. US industry portfolios (A to J), mean and median of excess returns, monthly data 1947:1−2010:12.]
Remark 2.8 (Algorithm for LAD) The LAD estimator can be written
\[
\hat{\beta}_{LAD} = \arg\min_{\beta} \sum_{t=1}^{T} w_t \hat{u}_t(\hat{b})^2,\ w_t = 1/|\hat{u}_t(\hat{b})|, \text{ with } \hat{u}_t(\hat{b}) = y_t - x_t'\hat{b},
\]
so it is a weighted least squares where both $y_t$ and $x_t$ are multiplied by $w_t^{1/2} = 1/|\hat{u}_t(\hat{b})|^{1/2}$. It can be shown that iterating on LS with the weights given by $1/|\hat{u}_t(\hat{b})|$, where the residuals are from the previous iteration, converges very quickly to the LAD estimator.
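The iteration in Remark 2.8 can be sketched for a regression with a constant and one regressor. The data are the four points listed with Figure 2.13. The small floor `eps` on the absolute residuals is an assumption added here to avoid dividing by zero when a point lies exactly on the line, and the function names and the number of iterations are choices made for this illustration.

```python
def wls(y, x, w):
    """Intercept and slope minimizing sum_t w_t (y_t - a - b x_t)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def lad_irls(y, x, iterations=500, eps=1e-6):
    """LAD by iterated weighted LS with weights 1/|residual| from the last pass."""
    w = [1.0] * len(y)          # the first pass is plain OLS
    for _ in range(iterations):
        a, b = wls(y, x, w)
        w = [1.0 / max(abs(yi - a - b * xi), eps) for xi, yi in zip(x, y)]
    return a, b


def sum_abs_resid(y, x, a, b):
    return sum(abs(yi - a - b * xi) for xi, yi in zip(x, y))


# data from Figure 2.13
x = [-1.5, -1.0, 1.0, 1.5]
y = [-1.125, -0.75, 1.75, 1.125]

a_ols, b_ols = wls(y, x, [1.0] * len(y))
a_lad, b_lad = lad_irls(y, x)
print("OLS (intercept, slope):", a_ols, b_ols)
print("LAD (intercept, slope):", a_lad, b_lad)
print("sum of |residuals|, OLS vs LAD:",
      sum_abs_resid(y, x, a_ols, b_ols), sum_abs_resid(y, x, a_lad, b_lad))
```

The third data point sits well above the line through the other three. OLS tilts towards it, while the LAD fit stays close to the line through the remaining points.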
Some alternatives to LAD: least median squares (LMS), and least trimmed squares (LTS) estimators which solve
\[
\hat{\beta}_{LMS} = \arg\min_{\beta} \left[ \mathrm{median}\left( \hat{u}_t^2 \right) \right], \text{ with } \hat{u}_t = y_t - x_t'\hat{b} \tag{2.31}
\]
\[
\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} \hat{u}_i^2,\ \hat{u}_1^2 \leq \hat{u}_2^2 \leq \ldots \text{ and } h \leq T. \tag{2.32}
\]

[Figure 2.12: Volatility of US industry portfolios. US industry portfolios (A to J), std and iqr/1.35, monthly data 1947:1−2010:12.]


Note that the LTS estimator in (2.32) minimizes the sum of the h smallest squared residuals.

[Figure 2.13: Data and regression line from OLS and LAD. OLS vs LAD of $y = 0.75x + u$. Data: $y$: -1.125, -0.750, 1.750, 1.125; $x$: -1.500, -1.000, 1.000, 1.500. Estimates (intercept, slope): OLS (0.25 0.90), LAD (0.00 0.75).]

[Figure 2.14: Betas of US industry portfolios. US industry portfolios (A to J), $\beta$ from OLS and LAD, monthly data 1947:1−2010:12.]

Bibliography
Amemiya, T., 1985, Advanced econometrics, Harvard University Press, Cambridge, Massachusetts.
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.
Greene, W. H., 2003, Econometric analysis, Prentice-Hall, Upper Saddle River, New Jersey, 5th edn.
Makridakis, S., S. C. Wheelwright, and R. J. Hyndman, 1998, Forecasting: methods and applications, Wiley, New York, 3rd edn.

3 Risk Measures

Reference: Hull (2006) 18; McDonald (2006) 25; Fabozzi, Focardi, and Kolm (2006)
4–5; McNeil, Frey, and Embrechts (2005); Alexander (2008)

3.1 Symmetric Dispersion Measures

3.1.1 Mean Absolute Deviation

The variance (and standard deviation) is very sensitive to the tails of the distribution.
For instance, even if the standard normal distribution and a student-t distribution with
4 degrees of freedom look fairly similar, the latter has a variance that is twice as large
(recall: the variance of a tn distribution is n=.n 2/ for n > 2). This may or may not be what the investor cares about. If not, the mean absolute deviation is an alternative. Let be the mean, then the definition is mean absolute deviation D E jR

(3.1)

j:

This measure of dispersion is much less sensitive to the tails—essentially because it does not involve squaring the variable.
Notice, however, that for a normally distributed return the mean absolute deviation is proportional to the standard deviation—see Remark 3.1. Both measures will therefore lead to the same portfolio choice (for a given mean return). In other cases, the portfolio choice will be different (and perhaps complicated to perform since it is typically not easy to calculate the mean absolute deviation of a portfolio).
Remark 3.1 (Mean absolute deviation of $N(\mu,\sigma^2)$ and $t_n$) If $R \sim N(\mu,\sigma^2)$, then
\[
\operatorname{E}|R - \mu| = \sqrt{2/\pi}\,\sigma \approx 0.8\sigma.
\]
If $R \sim t_n$, then $\operatorname{E}|R| = 2\sqrt{n}/[(n-1)B(n/2, 0.5)]$, where $B$ is the beta function. For $n = 4$, $\operatorname{E}|R| = 1$, which is just 25% higher than for a $N(0,1)$ distribution. In contrast, the standard deviation is $\sqrt{2}$, which is 41% higher than for the $N(0,1)$.
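The numbers in Remark 3.1 can be checked directly from the two formulas. This is a small sketch using only the Python standard library; the function names are hypothetical, and the beta function is computed from gamma functions.

```python
import math

def mad_normal(sigma):
    """E|R - mu| for R ~ N(mu, sigma^2): sqrt(2/pi) * sigma."""
    return math.sqrt(2.0 / math.pi) * sigma

def beta_function(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def mad_student_t(n):
    """E|R| for R ~ t_n (n > 1): 2 sqrt(n) / [(n - 1) B(n/2, 1/2)]."""
    return 2.0 * math.sqrt(n) / ((n - 1) * beta_function(n / 2.0, 0.5))

def std_student_t(n):
    """Standard deviation of t_n (n > 2): sqrt(n / (n - 2))."""
    return math.sqrt(n / (n - 2.0))

# Compare t_4 with N(0,1): the tails matter much more for the standard deviation.
ratio_mad = mad_student_t(4) / mad_normal(1.0)
ratio_std = std_student_t(4) / 1.0
print(f"MAD ratio t4/N(0,1): {ratio_mad:.2f}, Std ratio: {ratio_std:.2f}")
```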
3.1.2 Index Tracking Errors

Figure 3.1: Value at risk
(Panel: Value at risk and density of returns, with VaR95% = −(the 5% quantile) and −VaR95% marked on the R axis.)

Suppose instead that our task, as fund managers, say, is to track a benchmark portfolio (returns $R_b$ and portfolio weights $w_b$), but we are allowed to make some deviations. For instance, we are perhaps asked to track a certain index. The deviations, typically measured in terms of the variance of the tracking errors for the returns, can be motivated by practical considerations and by concerns about trading costs. If our portfolio has the weights $w$, then the portfolio return is $R_p = w'R$, where $R$ is the vector of returns on the original assets. Similarly, the benchmark portfolio (index) has the return $R_b = w_b'R$. If the variance of the tracking error should be less than $U$, then we have the restriction
\[
\operatorname{Var}(R_p - R_b) = (w - w_b)'\Sigma(w - w_b) \le U, \tag{3.2}
\]
where $\Sigma$ is the covariance matrix of the original assets. This type of restriction is fairly easy to implement numerically in the portfolio choice model (the optimization problem).
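As a rough illustration of the restriction in (3.2), the sketch below computes the tracking error variance for a candidate portfolio and checks it against the bound. It assumes NumPy; the covariance matrix, the weights and the bound `U` are hypothetical numbers chosen only for the example.

```python
import numpy as np

def tracking_error_variance(w, w_b, Sigma):
    """Var(R_p - R_b) = (w - w_b)' Sigma (w - w_b), as in (3.2)."""
    d = np.asarray(w, dtype=float) - np.asarray(w_b, dtype=float)
    return float(d @ np.asarray(Sigma, dtype=float) @ d)

# Hypothetical inputs, chosen only for illustration.
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
w_b = np.array([0.5, 0.5])   # benchmark weights
w = np.array([0.6, 0.4])     # our portfolio weights
U = 0.002                    # upper bound on the tracking error variance

te_var = tracking_error_variance(w, w_b, Sigma)
satisfies_restriction = te_var <= U
print(f"tracking error variance {te_var:.4f}, within bound: {satisfies_restriction}")
```

In an optimizer, the same function would typically enter as an inequality constraint on the weights.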

3.2 Downside Risk

3.2.1 Value at Risk

The mean-variance framework is often criticized for failing to distinguish between downside (considered to be risk) and upside (considered to be potential).
The 95% Value at Risk ($\text{VaR}_{95\%}$) says that there is only a 5% chance that the return ($R$) will be less than $-\text{VaR}_{95\%}$. In general,
\[
\Pr(R \le -\text{VaR}_{\alpha}) = 1 - \alpha. \tag{3.3}
\]
See Figure 3.1.
Example 3.2 (Quantile of a distribution) The 0.05 quantile is the value such that there is only a 5% probability of a lower number, $\Pr(R \le \text{quantile}_{0.05}) = 0.05$.
We can solve this expression for the $\text{VaR}_{\alpha}$ as
\[
\text{VaR}_{\alpha} = -\text{cdf}_R^{-1}(1 - \alpha), \tag{3.4}
\]
where $\text{cdf}_R^{-1}()$ is the inverse cumulative distribution function of the returns, so $\text{cdf}_R^{-1}(1 - \alpha)$ is the $1 - \alpha$ quantile (or “critical value”) of the return distribution. For instance, $\text{VaR}_{95\%}$ is the negative of the 0.05 quantile of the return distribution. Notice that the return distribution depends on the investment horizon, so a value at risk measure is typically calculated for a stated investment period (for instance, one day).
To convert the value at risk into value terms (CHF, say), just multiply the VaR for returns with the value of the investment (portfolio).
If the return is normally distributed, $R \sim N(\mu, \sigma^2)$, and $c_{1-\alpha}$ is the $1 - \alpha$ quantile of a $N(0,1)$ distribution (for instance, $-1.64$ for $1 - \alpha = 1 - 0.95$), then
\[
\text{VaR}_{\alpha} = -(\mu + c_{1-\alpha}\sigma). \tag{3.5}
\]
This is illustrated in Figure 3.2.
See Table 3.1 for an empirical illustration. Also, Figure 3.3 illustrates the VaR calculated from a time series model (to be precise, an AR(1)+GARCH(1,1) model) for daily S&P returns.

              Small growth    Large value
Std                    8.4            5.0
VaR (95%)             12.4            7.5
ES (95%)              17.4           10.4
SemiStd                5.5            3.2
Drawdown              77.9           49.7

Table 3.1: Risk measures of monthly returns of two stock indices (%), US data 1957:1-2010:12.
Notice that the value at risk for a normally distributed return is a strictly increasing function of the standard deviation (and the variance). Minimizing the VaR at a given mean return therefore gives the same solution (portfolio weights) as minimizing the variance at the same given mean return. In other cases, the portfolio choice will be different (and perhaps complicated to perform).
Remark 3.3 (Critical values of $N(\mu,\sigma^2)$) If $R \sim N(\mu,\sigma^2)$, then there is a 5% probability that $R \le \mu - 1.64\sigma$, a 2.5% probability that $R \le \mu - 1.96\sigma$, and a 1% probability that $R \le \mu - 2.33\sigma$.

Example 3.4 (VaR with $R \sim N(\mu,\sigma^2)$) If daily returns have $\mu = 8\%$ and $\sigma = 16\%$, then the 1-day $\text{VaR}_{95\%} = -(0.08 - 1.64 \times 0.16) \approx 0.18$; we are 95% sure that we will not lose more than 18% of the investment over one day, that is, $\text{VaR}_{95\%} = 0.18$. Similarly, $\text{VaR}_{97.5\%} = -(0.08 - 1.96 \times 0.16) \approx 0.23$.
Example 3.5 (VaR and regulation of bank capital) Bank regulations have used 3 times the 99% VaR for 10-day returns as the required bank capital.
Remark 3.6 (Multi-period VaR) If the returns are iid, then a $q$-period return has the mean $q\mu$ and variance $q\sigma^2$, where $\mu$ and $\sigma^2$ are the mean and variance of the one-period returns respectively. If the mean is zero, then the $q$-day VaR is $\sqrt{q}$ times the one-day VaR.
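The VaR formula for normally distributed returns in (3.5) and the scaling rule in Remark 3.6 are easy to evaluate numerically. The sketch below uses only the Python standard library (`statistics.NormalDist` for the normal quantile); the function names are hypothetical. The first two calls use the inputs of Example 3.4, the last two use a zero mean and a 99% level purely as an illustration.

```python
import math
from statistics import NormalDist

def var_normal(mu, sigma, alpha=0.95):
    """VaR_alpha = -(mu + c * sigma), c = the (1 - alpha) quantile of N(0,1)."""
    c = NormalDist().inv_cdf(1.0 - alpha)
    return -(mu + c * sigma)

def var_normal_q_periods(mu, sigma, q, alpha=0.95):
    """q-period VaR for iid returns: mean q*mu, variance q*sigma^2."""
    return var_normal(q * mu, math.sqrt(q) * sigma, alpha)

# Inputs of Example 3.4: mu = 8%, sigma = 16%
var95 = var_normal(0.08, 0.16, 0.95)
var975 = var_normal(0.08, 0.16, 0.975)

# Remark 3.6 with a zero mean: the 10-period VaR is sqrt(10) times the 1-period VaR
one_day = var_normal(0.0, 0.16, 0.99)
ten_day = var_normal_q_periods(0.0, 0.16, 10, 0.99)
print(f"VaR95% = {var95:.3f}, VaR97.5% = {var975:.3f}")
print(f"10-day / 1-day VaR = {ten_day / one_day:.3f}")
```

Note that the square-root rule only holds exactly when the mean is zero; with a nonzero mean the term $q\mu$ grows faster than $\sqrt{q}\sigma$.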
Remark 3.7 (VaR from t-distribution) The assumption of normally distributed returns rules out thick tails. As an alternative, suppose the normalized return has a t-distribution with $v$ degrees of freedom
\[
\frac{R - \mu}{s} \sim t_v.
\]
Notice that $s^2$ is not the variance of $R$, since $\operatorname{Var}(R) = vs^2/(v-2)$ (assuming $v > 2$, so the variance is defined). In this case, (3.5) still holds, but with $c_{1-\alpha}$ calculated as the $1 - \alpha$ quantile of a $t_v$ distribution. In practice, for a given value of $\operatorname{Var}(R)$, the t distribution gives a smaller value of the VaR than the normal distribution. The reason is that the variance of a t-distribution is very high for low degrees of freedom.

Figure 3.2: Finding critical value of $N(\mu,\sigma^2)$ distribution
(Panels: density of N(0,1), where the 5% quantile is c = −1.64; density of N(8,16²), where the 5% quantile is µ + c*σ = −18; cdf of N(8,16²); inverse of cdf of N(8,16²).)
Backtesting a VaR model amounts to checking if actual data fits with the VaR numbers.
For instance, we first find the $\text{VaR}_{95\%}$ and then calculate what fraction of returns is actually below the negative of this number. If the model is correct it should be 5%. We then repeat this for $\text{VaR}_{96\%}$ (only 4% of the returns should be below the negative of this number).
Figure 3.4 shows results from backtesting a VaR model where the volatility follows a
GARCH process.
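The backtesting idea can be sketched in a few lines: count how often the realized return falls below the negative of the VaR that was in force for that period. The function name and the return sample are hypothetical; in an application the VaR series would come from the estimated model (for instance, a GARCH model), which is not reproduced here.

```python
def violation_rate(returns, var_series):
    """Fraction of periods where the return is below -VaR.

    var_series holds the VaR (as a positive loss) in force for each period,
    so it can vary over time.
    """
    violations = sum(1 for r, v in zip(returns, var_series) if r < -v)
    return violations / len(returns)

# Hypothetical return sample and a constant VaR of 2%, for illustration only.
returns = [-0.030, 0.010, 0.020, -0.010, 0.000,
           0.015, -0.025, 0.005, 0.010, -0.005]
var_series = [0.02] * len(returns)
rate = violation_rate(returns, var_series)
print(f"empirical violation rate: {rate:.2f}")
```

For a correctly specified 95% VaR the rate should be close to 0.05 in a long sample; the tiny sample here is only meant to show the mechanics.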
The VaR concept has been criticized for having poor aggregation properties. In particular, the VaR for a portfolio is not necessarily (weakly) lower than the portfolio of the VaRs, which contradicts the notion of diversification benefits. (To get this unfortunate property, the return distributions must be heavily skewed.)

Figure 3.3: Conditional volatility and VaR
(Panels: GARCH std, %; Value at Risk95% (one day), %; Distribution of returns, VaR, with estimated N(), unconditional. S&P 500, daily data 1954:1−2011:5.)
3.2.2 Index Models for Calculating the Value at Risk

Consider a multi-index model
\begin{align*}
R &= a + b_1 I_1 + b_2 I_2 + \ldots + b_k I_k + e, \text{ or} \tag{3.6} \\
  &= a + b'I + e,
\end{align*}
where $b$ is a $k \times 1$ vector of the $b_i$ coefficients and $I$ is also a $k \times 1$ vector of the $I_i$ indices.
As usual, we assume $\operatorname{E}(e) = 0$ and $\operatorname{Cov}(e, I_i) = 0$. This model can be used to generate

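(Figure panel: Backtesting VaR from GARCH(1,1), daily S&P 500 returns.)

[...]

\[
v_1 > 0 \text{ if } \sigma_{22}\mu_1^e > \sigma_{12}\mu_2^e. \tag{4.17}
\]
Use the fact that $\sigma_{12} = \rho\sigma_1\sigma_2$, where $\rho$ is the correlation coefficient, to rewrite as
\[
v_1 > 0 \text{ if } \mu_1^e/\sigma_1 > \rho\mu_2^e/\sigma_2, \text{ and} \tag{4.18}
\]
\[
v_2 > 0 \text{ if } \mu_2^e/\sigma_2 > \rho\mu_1^e/\sigma_1. \tag{4.19}
\]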

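This provides a simple way to assess if an asset should be held (in positive amounts): if its Sharpe ratio exceeds the correlation times the Sharpe ratio of the other asset. For instance, both portfolio weights are positive if the correlation is zero and both excess returns are positive. If $v_1 + v_2 \ne 0$, then we can define the weights in the “subportfolio” of risky assets only as $w_i = v_i/(v_1 + v_2)$. This gives
\begin{align*}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
&= \frac{1}{\sigma_{22}\mu_1^e + \sigma_{11}\mu_2^e - (\mu_1^e + \mu_2^e)\sigma_{12}}
\begin{bmatrix} \sigma_{22}\mu_1^e - \sigma_{12}\mu_2^e \\ -\sigma_{12}\mu_1^e + \sigma_{11}\mu_2^e \end{bmatrix} \tag{4.20} \\
&= \frac{1}{\sigma_{22}\mu_1^e + \sigma_{11}\mu_2^e - (\mu_1^e + \mu_2^e)\sigma_{12}}
\begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}
\begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix} \tag{4.21} \\
&= \Sigma^{-1}\mu^e / \mathbf{1}'\Sigma^{-1}\mu^e, \tag{4.22}
\end{align*}
where $\mathbf{1}$ is a vector of ones.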
This is the tangency portfolio (where the ray from $R_f$ in the $\sigma_p \times \operatorname{E} R_p$ space is tangent to the minimum-variance set). It has the highest Sharpe ratio, $\mu_p^e/\sigma_p$, of all portfolios on the minimum-variance set. Note that all investors (different $k$, but same expectations) hold a mix of this portfolio and the riskfree asset (this follows from the fact that the weights in (4.14) are scaled by $1/k$). This two-fund separation theorem is very useful.

Figure 4.2: Choice of portfolio weights
(Panels: MV Utility, 2 risky assets, plotted against v1 and v2; MV frontier, mean against std. Riskfree rate: 0.01. Mean returns: 0.09 0.06. Covariance matrix: [0.026 0.000; 0.000 0.014]. Weights on risky assets and riskfree: optimal with k=15: 0.21 0.23 0.56; tangency portfolio: 0.47 0.53 0.00.)
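Consider the simple case when the assets are uncorrelated ($\sigma_{12} = 0$), then (4.20) becomes
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \frac{1}{\sigma_{22}\mu_1^e + \sigma_{11}\mu_2^e}
\begin{bmatrix} \sigma_{22}\mu_1^e \\ \sigma_{11}\mu_2^e \end{bmatrix}. \tag{4.23}
\]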

Results: (i) if both excess returns are positive, then the weight on asset 1 increases if $\mu_1^e$ increases or $\sigma_{11}$ decreases; (ii) both weights are positive if the excess returns are.
Both results are quite intuitive since the investor likes high expected returns, but dislikes variance.
Example 4.1 (Tangency portfolio, numerical) When $(\mu_1^e, \mu_2^e) = (0.08, 0.05)$, the correlation is zero, and $(\sigma_{11}, \sigma_{22}) = (0.16^2, 0.12^2)$, then (4.23) gives
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0.47 \\ 0.53 \end{bmatrix}.
\]
When $\mu_1^e$ increases from 0.08 to 0.12, then we get
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0.57 \\ 0.43 \end{bmatrix}.
\]
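The numbers in Example 4.1 follow from plugging the inputs into (4.23). A short sketch in plain Python (the function name is hypothetical):

```python
def tangency_uncorrelated(mu1, mu2, s11, s22):
    """Tangency weights from (4.23): two risky assets with zero covariance."""
    denom = s22 * mu1 + s11 * mu2
    return s22 * mu1 / denom, s11 * mu2 / denom

# Inputs of Example 4.1
w1, w2 = tangency_uncorrelated(0.08, 0.05, 0.16**2, 0.12**2)

# Same case, but with the expected excess return on asset 1 raised to 0.12
w1_hi, w2_hi = tangency_uncorrelated(0.12, 0.05, 0.16**2, 0.12**2)

print(f"base case: w1 = {w1:.2f}, w2 = {w2:.2f}")
print(f"higher mu1: w1 = {w1_hi:.2f}, w2 = {w2_hi:.2f}")
```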

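Now, consider another simple case, where both variances are the same, but the correlation is non-zero ($\sigma_{11} = \sigma_{22} = 1$ as a normalization, $\sigma_{12} = \rho$). Then (4.20) becomes
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \frac{1}{(\mu_1^e + \mu_2^e)(1 - \rho)}
\begin{bmatrix} \mu_1^e - \rho\mu_2^e \\ \mu_2^e - \rho\mu_1^e \end{bmatrix}. \tag{4.24}
\]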
Results: (i) both weights are positive if the returns are negatively correlated ($\rho < 0$) and both excess returns are positive; (ii) $w_2 < 0$ if $\rho > 0$ and $\mu_1^e$ is considerably higher than $\mu_2^e$ (so $\mu_2^e < \rho\mu_1^e$). The intuition for the first result is that a negative correlation
means that the assets “hedge” each other (even better than diversification), so the investor would like to hold both of them to reduce the overall risk. (Unfortunately, most assets tend to be positively correlated.) The intuition for the second result is that a positive correlation reduces the gain from holding both assets (they don’t hedge each other, and there is relatively little diversification to be gained if the correlation is high). On top of this, asset 1 gives a higher expected return, so it is optimal to sell asset 2 short (essentially a risky “loan” which allows the investor to buy more of asset 1).
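Example 4.2 (Tangency portfolio, numerical) When $(\mu_1^e, \mu_2^e) = (0.08, 0.05)$, and $\rho = -0.8$ we get
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0.51 \\ 0.49 \end{bmatrix}.
\]
If, instead, $\rho = 0.8$, then we get
\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 1.54 \\ -0.54 \end{bmatrix}.
\]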

4.1.3 N Risky Assets and a Riskfree Asset

In the general case with $N$ risky assets and a riskfree asset, the portfolio weights of the risky assets are
\[
v = \frac{1}{k}\Sigma^{-1}\mu^e, \tag{4.25}
\]
while the weight on the riskfree asset is $1 - \mathbf{1}'v$. The weights of the tangency portfolio are therefore
\[
w = \Sigma^{-1}\mu^e / \mathbf{1}'\Sigma^{-1}\mu^e. \tag{4.26}
\]
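Proof. (of (4.26)) The portfolio has the return $R_p = v'R + (1 - \mathbf{1}'v)R_f = v'(R - R_f) + R_f$. The mean and variance are
\[
\operatorname{E} R_p = v'\mu^e + R_f \text{ and } \operatorname{Var}(R_p) = v'\Sigma v.
\]
The optimization problem is
\[
\max_v \; v'\mu^e + R_f - \frac{k}{2}v'\Sigma v,
\]
with first order conditions (see Appendix for matrix calculus)
\[
\mathbf{0}_{N \times 1} = \mu^e - k\Sigma v, \text{ so } v = \frac{1}{k}\Sigma^{-1}\mu^e.
\]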

As in the case with only one risky asset, the optimal portfolio has
\begin{align*}
\frac{\operatorname{E}(R_p^e)}{\operatorname{Var}(R_p)} &= k, \text{ and} \\
SR_p &= \sqrt{(\mu^e)'\Sigma^{-1}\mu^e}, \tag{4.27}
\end{align*}
where $SR_p$ is the Sharpe ratio of the portfolio. The first line says that higher risk aversion tilts the portfolio away from a high variance, and the second line says that all investors (irrespective of their risk aversions) have the same Sharpe ratios. This is clearly the same as saying that they all mix the tangency portfolio with the riskfree asset (depending on their risk aversion): they are all on the Capital Market Line (see Figure 4.9). Clearly, with $k = \infty$, the portfolio has a zero variance, so the expected excess return is zero. With lower risk aversion, the portfolio shifts along the CML towards higher variance (and expected return).
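Proof. (of (4.27)) We have
\[
\frac{\operatorname{E}(R_p^e)}{\operatorname{Var}(R_p)}
= \frac{\left(\frac{1}{k}\Sigma^{-1}\mu^e\right)'\mu^e}{\left(\frac{1}{k}\Sigma^{-1}\mu^e\right)'\Sigma\left(\frac{1}{k}\Sigma^{-1}\mu^e\right)}
= \frac{\frac{1}{k}(\mu^e)'\Sigma^{-1}\mu^e}{\frac{1}{k^2}(\mu^e)'\Sigma^{-1}\mu^e}
= k.
\]
Multiply by $\operatorname{Std}(R_p)$ to get the Sharpe ratio of the portfolio
\begin{align*}
SR_p &= k\operatorname{Std}(R_p) \\
&= k\sqrt{\left(\frac{1}{k}\Sigma^{-1}\mu^e\right)'\Sigma\left(\frac{1}{k}\Sigma^{-1}\mu^e\right)} \\
&= \sqrt{(\mu^e)'\Sigma^{-1}\mu^e}.
\end{align*}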
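The results in (4.25)–(4.27) can be verified numerically. The sketch below assumes NumPy and uses the inputs shown in Figure 4.2 and Example 4.1 (excess returns 0.08 and 0.05, variances 0.16² and 0.12², zero correlation, k = 15); the function names are hypothetical.

```python
import numpy as np

def optimal_risky_weights(mu_e, Sigma, k):
    """v = (1/k) Sigma^{-1} mu^e, as in (4.25)."""
    return np.linalg.solve(Sigma, mu_e) / k

def tangency_weights(mu_e, Sigma):
    """w = Sigma^{-1} mu^e / (1' Sigma^{-1} mu^e), as in (4.26)."""
    z = np.linalg.solve(Sigma, mu_e)
    return z / z.sum()

mu_e = np.array([0.08, 0.05])
Sigma = np.diag([0.16**2, 0.12**2])
k = 15.0

v = optimal_risky_weights(mu_e, Sigma, k)
riskfree_weight = 1.0 - v.sum()
w = tangency_weights(mu_e, Sigma)

mean_excess = float(v @ mu_e)          # E(R_p^e)
variance = float(v @ Sigma @ v)        # Var(R_p)
sharpe = mean_excess / np.sqrt(variance)
sharpe_formula = float(np.sqrt(mu_e @ np.linalg.solve(Sigma, mu_e)))

print("risky weights:", np.round(v, 2), "riskfree:", round(riskfree_weight, 2))
print("tangency weights:", np.round(w, 2))
print("E/Var:", round(mean_excess / variance, 6), "Sharpe ratio:", round(sharpe, 4))
```

Changing `k` rescales `v` but leaves both `w` and the Sharpe ratio unchanged, which is the two-fund separation result in numerical form.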

Remark 4.3 (Properties of tangency portfolio) The expected excess return and the variance of the tangency portfolio are $\mu_T^e = (\mu^e)'\Sigma^{-1}\mu^e / \mathbf{1}'\Sigma^{-1}\mu^e$ and $\operatorname{Var}(R_T) = (\mu^e)'\Sigma^{-1}\mu^e / (\mathbf{1}'\Sigma^{-1}\mu^e)^2$. The square of the Sharpe ratio is therefore $(\mu_T^e/\sigma_T)^2 = (\mu^e)'\Sigma^{-1}\mu^e$.
Figures 4.3–4.4 illustrate mean returns and standard deviations, estimated by exponentially weighted moving averages (as in RiskMetrics). Figures 4.5–4.6 show the optimal portfolio weights (based on mean-variance preferences). It is clear that the portfolio weights change very dramatically, perhaps too much to be realistic.
4.1.4 A Risky Asset and a Riskfree Asset Revisited

Once we have the tangency portfolio (with weights $w$ as in (4.26)), we can actually use that as the risky asset in the case with only one risky asset (and a riskfree). That is, we can treat $w'R^e$ as $R_1^e$ in (4.2). After all, the portfolio choice is really about mixing the tangency portfolio with the riskfree asset.
The result is that the weight on the tangency portfolio is (a scalar)
\[
v^* = \mathbf{1}'\Sigma^{-1}\mu^e / k, \tag{4.28}
\]
and $1 - v^*$ on the riskfree asset.

Figure 4.3: Dynamically updated estimates, 5 U.S. industries
(Panels: mean excess returns (annualized) for Cnsmr, Manuf, HiTec, Hlth and Other, 1990 to 2010.)
Proof. (of (4.28)) From (4.25)–(4.26) we directly get
\[
v = \underbrace{\frac{\mathbf{1}'\Sigma^{-1}\mu^e}{k}}_{v^*} w,
\]
which is just $v^*$ in (4.28) times the tangency portfolio $w$ from (4.26). To see that this fits with (4.5) when $w'R^e$ is substituted for $R_1^e$, notice that
\[
\frac{\operatorname{E}(w'R^e)}{\operatorname{Var}(w'R)} = \mathbf{1}'\Sigma^{-1}\mu^e,
\]
so (4.5) could be written just like (4.28).

Figure 4.4: Dynamically updated estimates, 5 U.S. industries
(Panels: Std (annualized) for Cnsmr, Manuf, HiTec, Hlth and Other, 1990 to 2010.)

4.1.5 Portfolio Choice with Short Sale Constraints

The previous analysis assumes that there are no restrictions on the portfolio weights.
However, many investors (for instance, mutual funds) cannot have short positions. In this case, the objective function is still (4.7), but with the additional restriction
\[
0 \le v_i \le 1. \tag{4.29}
\]

See Figures 4.7–4.8 for an illustration.

Figure 4.5: Dynamically updated portfolio weights, T-bill and 5 U.S. industries
(Panels: portfolio weights for Cnsmr, Manuf, HiTec and Hlth, 1990 to 2010; lines: fixed mean, fixed cov.)

4.2 Beta Representation of Expected Returns

For any portfolio, the expected excess return ($\mu_p^e$) is linearly related to the expected excess return on the tangency portfolio ($\mu_T^e$) according to
\[
\mu_p^e = \beta_p\mu_T^e, \text{ where } \beta_p = \frac{\operatorname{Cov}(R_p, R_T)}{\operatorname{Var}(R_T)}. \tag{4.30}
\]
This result follows directly from manipulating the definition of the tangency portfolio (4.26).
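The beta representation (4.30) can be confirmed numerically: build the tangency portfolio from (4.26), compute each asset's beta against it, and check that beta times the tangency portfolio's expected excess return reproduces the asset's own expected excess return. The sketch assumes NumPy; the three-asset inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical inputs for three assets, for illustration only.
mu_e = np.array([0.08, 0.05, 0.03])
Sigma = np.array([[0.0400, 0.0060, 0.0020],
                  [0.0060, 0.0225, 0.0030],
                  [0.0020, 0.0030, 0.0100]])

z = np.linalg.solve(Sigma, mu_e)
w_T = z / z.sum()                      # tangency portfolio, as in (4.26)

cov_with_T = Sigma @ w_T               # Cov(R_i, R_T) for each asset i
var_T = float(w_T @ Sigma @ w_T)       # Var(R_T)
beta = cov_with_T / var_T
mu_T = float(w_T @ mu_e)               # expected excess return on R_T

implied_mu_e = beta * mu_T             # right-hand side of (4.30)
print("betas:", np.round(beta, 3))
print("max abs gap between implied and actual excess returns:",
      float(np.max(np.abs(implied_mu_e - mu_e))))
```

The identity holds for any positive definite covariance matrix, as long as the elements of $\Sigma^{-1}\mu^e$ do not sum to zero.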
Example 4.4 (Effect of $\beta$) Suppose the tangency portfolio has an expected excess return of 8% (which happens to be close to the value for the US market return since WWII). An asset with a beta of 0.8 should then have an expected excess return of 6.4%, and an asset with a beta of 1.2 should have an expected excess return of 9.6%.

Figure 4.6: Dynamically updated portfolio weights, T-bill and 5 U.S. industries
(Panels: portfolio weights for Other and for the riskfree asset, 1990 to 2010; lines: fixed mean, fixed cov.)

Figure 4.7: MV frontier, 3 asset classes
(3 asset classes, 2002:12−2011:5: A MSCI world, B Global govt bonds, C Commodities. Mean, % against Std, %; lines: MV frontier, MV frontier (no short sales).)
Most stock indices (based on the standard characteristics like industry, size, value/growth) have betas around unity, but there are variations. For instance, building companies and manufacturers of investment goods and cars are typically very procyclical (high betas), whereas food and drugs are not (low betas).

Figure 4.8: Portfolio choice (3 asset classes) with no short sales
(Portfolio weights (MV preferences, no short sales), 2002:12−2011:5, for MSCI world, Bonds and Commodities, plotted against risk aversion.)
Proof. (of (4.30)) To derive (4.30), consider asset 1 in the two-asset case. We have
\[
\operatorname{Cov}(R_1, R_T) = \operatorname{Cov}(R_1, w_1R_1 + w_2R_2) = w_1\sigma_{11} + w_2\sigma_{12}.
\]
The expression for asset 2 is similar. Use the definition $v_i = w_i(v_1 + v_2)$ and the result above in the first order conditions (4.10)–(4.11)
\begin{align*}
\mu_1^e &= (v_1\sigma_{11} + v_2\sigma_{12})k \\
&= \Big( \underbrace{\frac{v_1}{v_1 + v_2}}_{w_1}\sigma_{11} + \underbrace{\frac{v_2}{v_1 + v_2}}_{w_2}\sigma_{12} \Big)(v_1 + v_2)k \\
&= \operatorname{Cov}(R_1, w_1R_1 + w_2R_2)\,k(v_1 + v_2) \\
&= \operatorname{Cov}(R_1, R_T)\,k(v_1 + v_2).
\end{align*}
The expression for asset two is similar, so we collect the results as
\begin{align*}
\mu_1^e &= \operatorname{Cov}(R_1, R_T)\,k(v_1 + v_2), \\
\mu_2^e &= \operatorname{Cov}(R_2, R_T)\,k(v_1 + v_2).
\end{align*}
Solve for the covariances as
\begin{align*}
\operatorname{Cov}(R_1, R_T) &= \mu_1^e A, \\
\operatorname{Cov}(R_2, R_T) &= \mu_2^e A,
\end{align*}
where $A = 1/[k(v_1 + v_2)]$. These expressions will soon prove to be useful. Notice that the variance of the tangency portfolio is
\[
\operatorname{Var}(R_T) = \operatorname{Cov}(w_1R_1 + w_2R_2, R_T) = w_1\operatorname{Cov}(R_1, R_T) + w_2\operatorname{Cov}(R_2, R_T),
\]
which we can rewrite by using the expressions for the covariances above
\begin{align*}
\operatorname{Var}(R_T) &= (w_1\mu_1^e + w_2\mu_2^e)A \\
&= \mu_T^e A.
\end{align*}
Consider asset 1. Divide $\operatorname{Cov}(R_1, R_T)$ by $\operatorname{Var}(R_T)$
\[
\frac{\operatorname{Cov}(R_1, R_T)}{\operatorname{Var}(R_T)} = \frac{\mu_1^e A}{\mu_T^e A},
\]
which can be rearranged as (4.30).
Remark 4.5 (Why is Risk = $\beta$? Short version) Because $\beta$ measures the covariance with the market (and the idiosyncratic risk can be diversified away).
Remark 4.6 (Why is Risk = $\beta$? Longer Version) Start by investing 100% in the market portfolio, then increase the position in asset $i$ by a small amount ($\delta$, 2% or so) by borrowing at the riskfree rate. The portfolio return is then
\[
R_p = R_m + \delta R_i^e.
\]
The expected portfolio return is
\[
\operatorname{E} R_p = \operatorname{E} R_m + \underbrace{\delta\operatorname{E} R_i^e}_{\text{incremental risk premium}}
\]
and the portfolio variance is
\[
\sigma_p^2 = \sigma_m^2 + \underbrace{\delta^2\sigma_i^2 + 2\delta\operatorname{Cov}(R_i, R_m)}_{\text{incremental risk, but } \delta^2\sigma_i^2 \approx 0}.
\]
(For instance, if $\delta = 2\%$, then $\delta^2 = 0.0004$ and $2\delta = 0.04$.) Notice: risk = covariance with the market. The marginal compensation for more risk is
\[
\frac{\text{incremental risk premium}}{\text{incremental risk}} = \frac{\operatorname{E} R_i^e}{2\operatorname{Cov}(R_i, R_m)}.
\]
In equilibrium, the marginal compensation for more risk must be equal across assets
\[
\frac{\operatorname{E} R_i^e}{2\operatorname{Cov}(R_i, R_m)} = \frac{\operatorname{E} R_j^e}{2\operatorname{Cov}(R_j, R_m)} = \ldots = \frac{\operatorname{E} R_m^e}{2\sigma_m^2},
\]
since $\operatorname{Cov}(R_m, R_m) = \sigma_m^2$. Rearrange as the CAPM expression.

4.2.1 Beta of a Long-Short Position

Consider a zero cost portfolio consisting of one unit of asset $i$ and minus one unit of asset $j$. The beta representation is clearly
\[
\mu_i^e - \mu_j^e = \operatorname{E}(R_i - R_j) = (\beta_i - \beta_j)\mu_T^e. \tag{4.31}
\]

If the two assets have the same betas, then this portfolio is not exposed to the tangency portfolio (and ought to carry a zero risk premium, at least according to theory). Such a long-short portfolio is a common way to isolate the investment from certain types of risk
(here the systematic risk with respect to the tangency portfolio).
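Proof. (of (4.31)) Notice that
\[
\frac{\operatorname{Cov}(R_i - R_j, R_T)}{\operatorname{Var}(R_T)} = \frac{\operatorname{Cov}(R_i, R_T)}{\operatorname{Var}(R_T)} - \frac{\operatorname{Cov}(R_j, R_T)}{\operatorname{Var}(R_T)} = \beta_i - \beta_j.
\]

4.3 Market Equilibrium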

Suppose all agents have the same expectations about the payoff of the assets and (for simplicity) also the same risk aversions. They will then all choose portfolios on the efficient frontier. (An alternative interpretation is that we allow investors to have different risk aversions, and that the portfolio weights discussed below are the average weights across investors.) To determine the equilibrium asset prices (and therefore expected returns) we have to equate demand (the mean variance portfolios) with supply (exogenous). Since we assume a fixed and exogenous supply (say, 2000 shares of asset 1 and 407 shares of asset 2), prices (and therefore returns) are completely driven by demand.
In equilibrium, net supply of the riskfree asset is zero, which implies that the optimal portfolio weights (4.14) must be such that the weights on the risky assets sum to unity ($v_1 + v_2 = 1$). Notice that $v_1$ and $v_2$ then define the tangency portfolio, which coincides with the market portfolio, and that we can interpret the portfolio weights as the relative market capitalization of the assets.
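4.3.1 Finding the Equilibrium

We can solve for $\mu_1^e$ and $\mu_2^e$ from the expressions for the optimal portfolio weights (4.14) as
\[
\begin{bmatrix} \mu_1^e \\ \mu_2^e \end{bmatrix}
= k \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \tag{4.32}
\]
(or $\mu^e = k\Sigma v$ in matrix notation). Form the tangency portfolio of the left hand side to get $\mu_T^e = v_1\mu_1^e + v_2\mu_2^e$. Forming the same portfolio of the right hand side gives $k\operatorname{Var}(R_T)$. Combining gives
\[
\mu_T^e = k\sigma^2(R_T), \text{ or} \tag{4.33}
\]
\[
SR_T = \frac{\mu_T^e}{\sigma(R_T)} = k\sigma(R_T). \tag{4.34}
\]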
If the tangency portfolio is the market portfolio, then this expression shows how the risk premium on the market is determined. The Sharpe ratio (4.34) is often called the “market price of risk.” Having derived an expression for the risk premium, the asset prices can be calculated (not done here, since it is of little importance for our purposes).
Combining with the beta representation (4.30) we get
\[
\mu_i^e = \beta_i k\sigma^2(R_T) = \beta_i\, SR_T\, \sigma(R_T). \tag{4.35}
\]
This shows that the expected excess return (risk premium) on asset $i$ can be thought of as a product of three components: $\beta_i$ which captures the covariance with the market, $SR_T$ which is the price of market risk (risk compensation per unit of standard deviation of the market return), and $\sigma(R_T)$ which measures the amount of market risk.
An important feature of (4.35) is that the only movements in the return of asset i that matter for pricing are those movements that are correlated with the market (tangency portfolio) returns. In particular, if asset i and j have the same betas, then they have the same expected returns—even if one of them has a lot more uncertainty.
4.3.2 Back to Prices (Gordon Model)

The gross return, $1 + R_{t+1}$, is defined as
\[
1 + R_{t+1} = \frac{D_{t+1} + P_{t+1}}{P_t}, \tag{4.36}
\]
where $P_t$ is the asset price and $D_{t+1}$ the dividend it gives at the beginning of the next period. Rearranging gives
\[
P_t = \frac{D_{t+1}}{1 + R_{t+1}} + \frac{P_{t+1}}{1 + R_{t+1}}. \tag{4.37}
\]
Use the same equation but with all time subscripts advanced one period ($P_{t+1} = \frac{D_{t+2}}{1 + R_{t+2}} + \frac{P_{t+2}}{1 + R_{t+2}}$) to substitute for $P_{t+1}$
\[
P_t = \frac{D_{t+1}}{1 + R_{t+1}} + \frac{1}{1 + R_{t+1}}\left( \frac{D_{t+2}}{1 + R_{t+2}} + \frac{P_{t+2}}{1 + R_{t+2}} \right). \tag{4.38}
\]
Now, substitute for $P_{t+2}$ and then for $P_{t+3}$ and so on. Finally, we have
\begin{align*}
P_t &= \frac{D_{t+1}}{1 + R_{t+1}} + \frac{D_{t+2}}{(1 + R_{t+1})(1 + R_{t+2})} + \frac{D_{t+3}}{(1 + R_{t+1})(1 + R_{t+2})(1 + R_{t+3})} + \ldots \tag{4.39} \\
&= \sum_{j=1}^{\infty} \frac{D_{t+j}}{\prod_{s=1}^{j}(1 + R_{t+s})}. \tag{4.40}
\end{align*}

We now make three simplifying assumptions. First, we can approximate the expectation of a ratio with the ratio of expectations ($\operatorname{E}(x/y) \approx \operatorname{E}x/\operatorname{E}y$). Second, that the expected $j$-period returns are $(1 + \mu)^j$
\[
\operatorname{E}_t \prod_{s=1}^{j}(1 + R_{t+s}) \approx (1 + \mu)^j. \tag{4.41}
\]
Third, that the expected dividends are constant, $\operatorname{E}_t D_{t+j} = D$ and $\operatorname{E}_t R_{t+j} = \mu$ for all $j \ge 1$. We can then write (4.40) as
\[
P_t \approx \sum_{j=1}^{\infty} \frac{D}{(1 + \mu)^j} = \frac{D}{\mu}, \tag{4.42}
\]
which is clearly the Gordon model for an asset price.
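The step from the infinite sum to the closed form in (4.42) is a geometric series. A quick numerical check in plain Python, with hypothetical values for the constant expected dividend and the expected return (the function names are also hypothetical):

```python
def discounted_dividends(D, mu, n_terms):
    """Truncated version of the sum in (4.42): sum_{j=1}^{n} D / (1 + mu)^j."""
    return sum(D / (1.0 + mu) ** j for j in range(1, n_terms + 1))

def gordon_price(D, mu):
    """Limit of the sum as n goes to infinity: P = D / mu (requires mu > 0)."""
    return D / mu

# Hypothetical values: constant expected dividend 4, expected return 8%
D, mu = 4.0, 0.08
approx_price = discounted_dividends(D, mu, n_terms=500)
exact_price = gordon_price(D, mu)

# A higher required return lowers today's price for unchanged dividends
price_high_mu = gordon_price(D, 0.10)
print(f"truncated sum {approx_price:.4f}, closed form {exact_price:.4f}")
print(f"price at 10% required return: {price_high_mu:.4f}")
```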
If expected dividends increase, but expected returns do not (for instance, because the $\beta$ of the asset is unchanged), then this is immediately capitalized in today’s price (which increases). In contrast, if expected dividends are unchanged, but the expected (required) return increases, then today’s asset price decreases. According to CAPM ((4.30) and (4.33)), the expected return is
\[
\mu = R_f + \beta\mu_m^e = R_f + \beta k\sigma^2(R_m). \tag{4.43}
\]

This expected return increases when (i) the riskfree rate increases; (ii) the market risk premium increases because of higher risk aversion or higher (beliefs about) market uncertainty; (iii) or when (beliefs about) beta increases.

Figure 4.9: CML and SML
(Panels: Capital market line, Mean, % against Std, %, with CML: ER = Rf + σ × (ERm − Rf)/σm, location of efficient portfolios; Security market line, Mean, % against β, with SML: ER = Rf + β(ERm − Rf), location of all assets.)

4.3.3 CML and SML

According to CAPM, all optimal portfolios (denoted $opt$) are on the capital market line
\[
\mu_{opt} = R_f + \frac{\mu_m^e}{\sigma_m}\sigma_{opt}, \tag{4.44}
\]
where $\mu_m^e$ and $\sigma_m$ are the expected value and the standard deviation of the excess return of the market portfolio. This is clearly the same as the upper leg of the MV frontier (with risky assets and riskfree asset). See Figure 4.9 for an example.
Proof. (of (4.44)) $R_{opt} = aR_m + (1 - a)R_f$, so $R_{opt}^e = aR_m^e$. We then have $\mu_{opt}^e = a\mu_m^e$ and $\sigma_{opt} = a\sigma_m$ (since $a \ge 0$). Solve for $a$ from the latter ($a = \sigma_{opt}/\sigma_m$) and use in the former.
CAPM also implies that the beta representation (4.30) holds (where the tangency portfolio equals the market portfolio). Rewriting we have
\[
\mu_i = R_f + \beta_i(\mu_m - R_f). \tag{4.45}
\]
The plot of $\mu_i$ against $\beta_i$ (for different assets, $i$) is called the security market line. See Figure 4.9 for an example.

4.4 An Application of MV Portfolio Choice: International Assets

4.4.1 Foreign Investments

Let the exchange rate, S , be defined as units of domestic currency per unit of foreign currency, that is the price (measured in domestic currency) of foreign currency. For instance, if we take Switzerland to be the domestic economy, then we have around 1.5 CHF per
EUR. Notice that a higher S means a weaker home currency (depreciation) and a lower
S means a stronger home currency (appreciation).
To be really concrete, suppose we bought a foreign asset in $t$ at the price $P_t$, measured in foreign currency; the cost in domestic currency was then $S_tP_t$. One period later (in $t+1$), the value of the asset (in foreign currency) is $P_{t+1}$ (think of this as the total value, including dividends or whatever); the value in domestic currency is thus $S_{t+1}P_{t+1}$.
Clearly, the net return in domestic currency (unhedged), $R_u$, satisfies
\begin{align*}
1 + R_u &= \frac{S_{t+1}P_{t+1}}{S_tP_t} \tag{4.46} \\
&= \frac{S_{t+1}}{S_t}\frac{P_{t+1}}{P_t} = (1 + R_S)(1 + R), \tag{4.47}
\end{align*}
where $R_S$ is the return on the currency investment (buying foreign currency in $t$, selling it in $t+1$) and $R$ is just the “local” return of the foreign asset (the return measured in foreign currency).
Clearly, we can rewrite the net return as
\begin{align*}
R_u &= R_S + R + R_SR \tag{4.48} \\
&\approx R_S + R, \tag{4.49}
\end{align*}
where the approximation follows from the fact that the product of two net returns is typically very small (for instance, $0.05 \times 0.03 = 0.0015$). If we instead use log returns (the log of the gross return), then there is no approximation error at all.
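The size of the approximation error in (4.49), and the fact that log returns add up exactly, can be seen with the numbers used in the text (a currency return of 5% and a local return of 3%). A minimal sketch using only the standard library; the function name is hypothetical.

```python
import math

def unhedged_return(r_s, r_local):
    """Exact net return in domestic currency: (1 + R_S)(1 + R) - 1, as in (4.47)."""
    return (1.0 + r_s) * (1.0 + r_local) - 1.0

r_s, r_local = 0.05, 0.03       # the numbers used in the text
exact = unhedged_return(r_s, r_local)
approx = r_s + r_local          # the approximation in (4.49)
cross_term = exact - approx     # equals R_S * R

# Log returns add up without any approximation error
log_exact = math.log(1.0 + exact)
log_sum = math.log(1.0 + r_s) + math.log(1.0 + r_local)

print(f"exact {exact:.4f}, approximation {approx:.4f}, cross term {cross_term:.4f}")
print(f"log return {log_exact:.6f}, sum of log returns {log_sum:.6f}")
```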
The approximation is used throughout this section (since it simplifies many expressions considerably). The expected return and the variance (in domestic currency) are then
\[
\operatorname{E} R_u \approx \operatorname{E} R_S + \operatorname{E} R, \text{ and} \tag{4.50}
\]
\[
\operatorname{Var}(R_u) \approx \operatorname{Var}(R_S) + \operatorname{Var}(R) + 2\operatorname{Cov}(R_S, R). \tag{4.51}
\]

Figure 4.10: International stock market indices
(Stock market indices (local currencies) for US, UK, FR, DE and JP; index, normalized to 100 in 1998; 1998 to 2012.)
To apply the CAPM analysis to the problem of whether to invest internationally or not, suppose we have only two risky assets: a risky foreign equity index (with domestic currency return $R_w$) and a risky domestic equity index (denoted $d$). Then, according to (4.18) we should invest internationally if $\mu_w^e/\sigma_w > \rho\mu_d^e/\sigma_d$. This says that a high Sharpe ratio of the foreign asset (measured in domestic currency) or a low correlation with the domestic return both lead to investing internationally.
See Figures 4.10–4.11 and Tables 4.1–4.2 for an illustration.
4.4.2 Exchange Rate Hedging

The return on the foreign investment has two components: the return on the currency (exchange rate change) and the local foreign return on the asset (equity, say). It may often be useful to hedge the exchange rate component. For instance, the investor may have good knowledge about the properties of the equity return, but not about the exchange rate movements. In practice, it is not straightforward to hedge the exchange rate component entirely. The reason is that the future local foreign return is not known with certainty, that is, we don’t know how many units of currency we need to hedge. In terms of (4.46), this is just saying that the investor does not know (in $t$) what $P_{t+1}$ will be.

Figure 4.11: Exchange rate indices
(Exchange rates (against USD) for GBP, EUR (FFR), EUR (DEM) and JPY; index, normalized to 100 in 1998; 1998 to 2012. A value < 100 means that the currency has gained value against the USD.)

      Local stocks   Exchange rate    Sum   Home Currency
US             5.9             0.0    5.9             5.9
UK             5.7             0.5    6.2             6.3
FR             7.3             2.9   10.2            10.3
DE             7.2             2.9   10.2            10.3
JP             0.4             4.2    4.6             4.1

Table 4.1: Contribution to the average return for a US investor investing in different equity markets, 1998:1-2011:5

      Local stocks   Exchange rate    Sum    Corr   Home Currency
US            3.15            0.00   3.15                    3.15
UK            2.42            0.86   3.28    0.06            3.46
FR            3.89            1.19   5.08    0.05            5.28
DE            5.41            1.19   6.60    0.05            6.89
JP            3.58            1.32   4.90   −0.26            3.82

Table 4.2: Contribution to the variance of the return for a US investor investing in different equity markets, 1998:1-2011:5

Figure 4.12: Timing convention of forward contract
(In t: write contract, agree on F. In t+1: pay F, get asset.)
There are several ways to deal with this, but they are all just approximations. One possibility is to just hedge the initial investment, that is, $S_tP_t$ in (4.46). Another possibility is to hedge a bit more to incorporate the expected local return. To simplify the subsequent analysis I will assume that we somehow could create a perfect hedge.
An exchange rate hedge clearly affects both the expected return and the volatility, and it is not obvious in which way. We therefore need to dig into the details a bit. Hedging means entering a forward (or futures) contract which guarantees us a known exchange rate in the future, $F_t$. See Figure 4.12 for the structure of a forward contract. (A forward contract is typically a private contract between two investors. A futures contract is similar, but is typically traded on an exchange. Futures and forwards have very similar, but not identical, prices.)
In terms of (4.46)–(4.49), we then have the return in domestic currency (hedged)
\[
1 + R_h = \frac{F_tP_{t+1}}{S_tP_t}, \text{ or} \tag{4.52}
\]
\[
R_h \approx R_{SF} + R, \tag{4.53}
\]

where $R_{SF}$ is the return on the currency hedging part: buy foreign currency in $t$ (at the price $S_t$), sell foreign currency in $t+1$ (at the pre-agreed forward price $F_t$). To understand this we have to analyze the pricing for forward contracts.
The expected return and the variance are therefore
\[
\operatorname{E} R_h \approx R_{SF} + \operatorname{E} R \text{ and} \tag{4.54}
\]
\[
\operatorname{Var}(R_h) \approx \operatorname{Var}(R), \tag{4.55}
\]
since $R_{SF}$ is known at the time of investment.
The difference in the expected return is
\[
\operatorname{E} R_u - \operatorname{E} R_h \approx \operatorname{E} R_S - R_{SF}, \tag{4.56}
\]

which is the difference between the expected return on an uncovered and covered investment in the foreign currency. If uncovered interest rate parity (UIP) holds, then this difference is zero. Although it is unclear if UIP actually is a reasonable approximation, the deviations from it show few systematic patterns. There may be a chance of saying more about this difference—for a given time and market—but that requires a thorough knowledge of the FX market and the monetary policy setting.
The difference in variance is
\[
\operatorname{Var}(R_u) - \operatorname{Var}(R_h) \approx \operatorname{Var}(R_S) + 2\operatorname{Cov}(R_S, R), \tag{4.57}
\]

which can have either sign. Although the first term must be positive, the second term can easily be negative, possibly so negative that the whole difference is negative: the foreign investment without a currency hedge can be a safer investment. To make the covariance in (4.57) negative, the exchange rate $S$ must have a tendency to decrease (foreign currency becomes cheaper) at the same time as the local foreign return is positive. As an example of how this could happen, let Switzerland be the domestic economy and the Euro zone the foreign economy. If the European Central Bank takes steps that decrease the value of the euro, then it might be the case that European firms become more competitive, so that European stock prices rise while the euro falls.
See Figure 4.13 for an example.
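The sign of the variance difference in (4.57) can be checked with a few lines of code. The following is only a sketch: the function name and the input numbers (10% currency volatility, 20% local stock volatility, three correlations) are assumptions chosen for illustration, not estimates from data.

```python
def variance_gain_from_hedging(std_fx, std_local, corr):
    """Approximate Var(R_u) - Var(R_h) = Var(R_S) + 2 Cov(R_S, R*), as in (4.57)."""
    var_fx = std_fx ** 2
    cov = corr * std_fx * std_local  # Cov(R_S, R*) from the correlation
    return var_fx + 2.0 * cov


# Hypothetical inputs: 10% currency volatility, 20% local stock volatility.
for corr in (0.3, 0.0, -0.4):
    diff = variance_gain_from_hedging(0.10, 0.20, corr)
    verdict = "hedging lowers variance" if diff > 0 else "hedging raises variance"
    print(f"corr = {corr:+.1f}: Var(Ru) - Var(Rh) = {diff:+.4f} ({verdict})")
```

With these numbers the difference changes sign once the correlation falls below $-\operatorname{Std}(R_S)/[2\operatorname{Std}(R^*)] = -0.25$.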
Remark 4.7 (Forward-Spot Parity) The forward-spot parity for any asset without intermediate dividends is
$$\text{Forward price} = \text{Spot price} \times (1 + \text{interest rate}).$$

The intuition for this expression is that a forward contract is like buying the asset today, but on credit.
Remark 4.8 (Covered Interest Rate Parity, CIP) CIP is just the forward-spot parity applied to exchange rates. To derive it, let $i$ and $i^*$ be the domestic and foreign interest rates, respectively. The spot price (in $t$) of getting one unit of foreign currency in $t+1$ is $S_t/(1+i^*)$, since we get one unit of foreign currency in $t+1$ if we buy $1/(1+i^*)$ units of foreign currency in $t$ (this is the same as buying one foreign short-term bill). Plugging into the forward-spot parity gives
$$F_t = \frac{S_t}{1+i^*}(1+i).$$

Watch out for the length of the period: interest rates are quoted on an annual basis. The previous equations have implicitly assumed that the period between $t$ and $t+1$ is one year.
Remark 4.9 (Return from the currency position) The return $R_{SF}$ from the currency hedging part in (4.53) equals $F_t/S_t - 1$. From covered interest rate parity, we get that this equals $(1+i)/(1+i^*) - 1 \approx i - i^*$. Comparing with the unhedged currency return in (4.47), $R_S = S_{t+1}/S_t - 1$, we see that the unhedged position gives a higher average return than the hedged position if the home currency depreciates more than predicted by UIP (the interest rate differential).
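As a numerical illustration of Remarks 4.8 and 4.9, the sketch below computes the CIP forward rate and the hedge return $R_{SF}$, and compares the latter with the approximation $i - i^*$. The function names and all numbers (spot rate 1.20, $i = 1\%$, $i^* = 3\%$, a one-year period) are hypothetical.

```python
def forward_rate(spot, i_dom, i_for):
    """Covered interest parity: F_t = S_t (1 + i) / (1 + i*), one-period rates."""
    return spot * (1.0 + i_dom) / (1.0 + i_for)


def hedge_return(spot, i_dom, i_for):
    """R_SF = F_t / S_t - 1, the return on the currency hedging part."""
    return forward_rate(spot, i_dom, i_for) / spot - 1.0


# Hypothetical numbers: spot 1.20 (domestic per foreign unit), i = 1%, i* = 3%.
spot, i_dom, i_for = 1.20, 0.01, 0.03
r_sf = hedge_return(spot, i_dom, i_for)
print(f"F_t  = {forward_rate(spot, i_dom, i_for):.4f}")
print(f"R_SF = {r_sf:.4%}, approximation i - i* = {i_dom - i_for:.4%}")
```

The spot rate cancels in $R_{SF}$, so only the two interest rates matter for the return on the hedge.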

4.4.3 Invest in Foreign Stocks? Rule-of-Thumb

The result in (4.19) provides a simple rule of thumb for whether we should invest in foreign assets or not. Let asset 1 represent a domestic market index, and asset 2 a foreign market index. The rule is then: invest in the foreign market if its Sharpe ratio is higher than the Sharpe ratio of the domestic market times the correlation of the two markets (that is, if $\mu^e_2/\sigma_2 > \rho\,\mu^e_1/\sigma_1$). Clearly, the returns should be measured in the same currency (but the currency risk may be hedged or not).
See Figure 4.14 for an example.
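The rule of thumb is easy to apply once the two Sharpe ratios and the correlation have been estimated. A minimal sketch, where the function name and the numbers are made up for illustration (they are not the values behind Figure 4.14):

```python
def foreign_adds_value(sharpe_foreign, sharpe_home, corr):
    """Rule of thumb: invest abroad if SR(foreign) > Corr(foreign, home) * SR(home)."""
    return sharpe_foreign > corr * sharpe_home


# Hypothetical inputs: home Sharpe ratio 0.30, foreign Sharpe ratio 0.20.
for corr in (0.8, 0.5):
    threshold = corr * 0.30
    decision = foreign_adds_value(0.20, 0.30, corr)
    print(f"corr = {corr}: threshold = {threshold:.2f}, invest abroad: {decision}")
```

Note that a foreign market with a lower Sharpe ratio than the home market can still be worth including, provided the correlation is low enough.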
Figure 4.13: International stock indices. (Panels: US, UK, FR, DE and JP, index value in USD, 2000–2010, unhedged and hedged.)
Return statistics: unhedged (unh) and hedged (h):

      mean(unh)  mean(h)  std(unh)  std(h)
US       5.9       5.9      17.8     17.8
UK       6.3       4.5      18.6     15.6
FR      10.3       7.6      23.0     19.8
DE      10.3       7.6      26.3     23.3
JP       4.1       3.4      19.5     18.9

Bibliography
Danthine, J.-P., and J. B. Donaldson, 2002, Intermediate financial theory, Prentice Hall.
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.


Figure 4.14: International stock indices. (Investing in foreign equity: SR(foreign) > Corr(foreign,home) × SR(home). Bars compare SR(foreign) with Corr(foreign,home) × SR(home) for US, UK, FR, DE and JP. Returns are measured in USD. Home market is US. Sample: 1998:1−2011:5.)


5 Utility-Based Portfolio Choice

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 12 and 18
Additional references: Danthine and Donaldson (2002) 5–6; Huang and Litzenberger (1988) 4–5; Cochrane (2001) 9 (5); Ingersoll (1987) 3–5 (6)
Material with a star (*) is not required reading.

5.1 Utility Functions and Risky Investments

Any model of portfolio choice must embody a notion of “what is best?” In finance, that often means a portfolio that strikes a good balance between expected return and its variance. However, in order to make sense of that idea—and to be able to go beyond it—we must go back to basic economic utility theory.
5.1.1 Specification of Utility Functions, $U(W)$

In theoretical micro the utility function $U(x)$ is just an ordering without any meaning of the numerical values: $U(x) > U(y)$ only means that the bundle of goods $x$ is preferred to $y$ (but not by how much). In applied microeconomics we must typically be more specific than that by specifying the functional form of $U(x)$. As an example, to generate demand curves for two goods ($x_1$ and $x_2$), we may choose to specify the utility function as $U(x) = x_1^{\alpha} x_2^{1-\alpha}$ (a Cobb-Douglas specification).
In finance (and quite a bit of microeconomics that incorporates uncertainty), the key features of the utility functions that we use are as follows.
First, utility is a function of a scalar argument, $U(x)$. This argument ($x$) can be end-of-period wealth, consumption or the portfolio return. In particular, we don’t care about the composition of the consumption basket. In one-period investment problems, the choice of $x$ is irrelevant since consumption equals wealth, which in turn is proportional to the portfolio return.
Second, uncertainty is incorporated by letting investors maximize expected utility, $\operatorname{E} U(x)$. Since returns (and therefore wealth and consumption) are uncertain, we need some way to rank portfolios at the time of investment (before the uncertainty has been resolved). In most cases, we use expected utility (see Section 5.1.2). As an example, suppose there are two states of the world: $W$ (wealth) will be either 1 or 2 with probabilities 1/3 and 2/3. If $U(W) = \ln W$, then $\operatorname{E} U(W) = 1/3 \ln 1 + 2/3 \ln 2$.
Third, the functional form of the utility function is such that more is better and uncertainty is bad (investors are risk averse).
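The two-state example with logarithmic utility can be reproduced as follows. This is only a sketch; the helper name `expected_utility` is an assumption made for the illustration. The last line compares expected utility with the utility of expected wealth, which anticipates the point about risk aversion.

```python
import math


def expected_utility(outcomes, probs, utility):
    """E U(W) = sum over states of probability times U(outcome)."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    return sum(p * utility(w) for w, p in zip(outcomes, probs))


outcomes, probs = [1.0, 2.0], [1 / 3, 2 / 3]
eu = expected_utility(outcomes, probs, math.log)
expected_wealth = sum(p * w for w, p in zip(outcomes, probs))

print(f"E U(W)  = {eu:.4f}")                       # 1/3 ln 1 + 2/3 ln 2
print(f"U(E W)  = {math.log(expected_wealth):.4f}")  # utility of the mean outcome
```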
5.1.2 Expected Utility Theorem

Expected utility, $\operatorname{E} U(W)$, is the right thing to maximize if the investors’ preferences $U(W)$ are
1. complete: can rank all possible outcomes;
2. transitive: if $A$ is better than $B$ and $B$ is better than $C$, then $A$ is better than $C$ (sounds like some basic form of consistency);
3. independent: if $X$ and $Y$ are equally preferred, and $Z$ is some other outcome, then the following gambles are equally preferred
   $X$ with prob $\pi$ and $Z$ with prob $1-\pi$
   $Y$ with prob $\pi$ and $Z$ with prob $1-\pi$
(this is the key assumption); and
4. such that every gamble has a certainty equivalent (a non-random outcome that gives the same utility, fairly trivial).
5.1.3 Basic Properties of Utility Functions: (1) More is Better

The idea that more is better (nonsatiation) is almost trivial. If $U(W)$ is differentiable, then this is the same as that marginal utility is positive, $U'(W) > 0$.
Example 5.1 (Logarithmic utility) $U(W) = \ln W$ so $U'(W) = 1/W$ (assuming $W > 0$).

5.1.4 Basic Properties of Utility Functions: (2) Risk aversion

With a utility function, risk aversion (uncertainty is considered to be bad) is captured by the concavity of the function.
As an example, consider Figure 5.1. It shows a case where the portfolio (or wealth, or consumption,...) of an investor will be worth $Z_-$ or $Z_+$, each with a probability of a half. This utility function shows risk aversion since the utility of getting the expected payoff for sure is higher than the expected utility from owning the uncertain asset
$$U[\operatorname{E}(Z)] > 0.5\,U(Z_-) + 0.5\,U(Z_+) = \operatorname{E} U(Z). \tag{5.1}$$
This is a way of saying that the investor does not like risk.
Rearranging gives
$$U[\operatorname{E}(Z)] - U(Z_-) > U(Z_+) - U[\operatorname{E}(Z)], \tag{5.2}$$
which says that a loss (left hand side) counts for more than a gain of the same amount. Another way to phrase the same thing is that a poor man appreciates an extra dollar more than a rich man. This is a key property of a concave utility function, and it has an immediate effect on risk premia.
The (lowest) price ($P$) the investor is willing to sell this portfolio for is the certain amount of money which gives the same utility as $\operatorname{E} U(Z)$, that is, the value of $P$ that solves the equation
$$U(P) = \operatorname{E} U(Z). \tag{5.3}$$
This price $P$ is also called the certainty equivalent of the portfolio. From (5.1) we know that this utility is lower than the utility from the expected payoff, $U(P) < U[\operatorname{E}(Z)]$, and we also know that the utility function is an increasing function. It follows directly that the price is lower than the expected payoff
$$P < \operatorname{E}(Z) = 0.5 Z_- + 0.5 Z_+. \tag{5.4}$$
See Figure 5.1 for an illustration.
Figure 5.1: Example of a utility function. (Concave utility function, with $Z_-$, $P$, $\operatorname{E}Z$ and $Z_+$ marked on the horizontal axis and $U(P) = \operatorname{E}U(Z)$ and $U(\operatorname{E}Z)$ on the utility axis. Two outcomes ($Z_-$ or $Z_+$) with equal probabilities; $\operatorname{E}Z = 0.5 Z_- + 0.5 Z_+$; $P$ is the certainty equivalent: it solves $U(P) = \operatorname{E}U(Z)$; risk aversion implies that $P < \operatorname{E}Z$.)
Example 5.2 (Certainty equivalent) Suppose you have a CRRA utility function and own an asset that gives either 85 or 115 with equal probability. What is the certainty equivalent (that is, the lowest price you would sell this asset for)? The answer is the $P$ that solves
$$\frac{P^{1-k}}{1-k} = 0.5\,\frac{85^{1-k}}{1-k} + 0.5\,\frac{115^{1-k}}{1-k}.$$
(The answer is $P = (0.5 \times 85^{1-k} + 0.5 \times 115^{1-k})^{1/(1-k)}$.) For instance, with $k = 0$, 2, 5, 10, and 25 we have $P \approx 100$, 97.75, 94.69, 91.16, and 87.49. Note that if we scale the asset payoffs (here 85 and 115) with some factor, then the price is scaled with the same factor. This is a typical feature of the CRRA utility function.
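The certainty equivalents in Example 5.2 can be recomputed with a short function. This is a sketch under the CRRA specification $U(Z) = Z^{1-k}/(1-k)$; the function name is hypothetical, and the case $k = 1$ is treated as logarithmic utility (the usual limit of CRRA), which is an addition to the example.

```python
import math


def crra_certainty_equivalent(payoffs, probs, k):
    """P solving U(P) = E U(Z) for U(Z) = Z**(1-k)/(1-k); k = 1 is the log case."""
    if k == 1:
        return math.exp(sum(p * math.log(z) for z, p in zip(payoffs, probs)))
    moment = sum(p * z ** (1.0 - k) for z, p in zip(payoffs, probs))
    return moment ** (1.0 / (1.0 - k))


for k in (0, 2, 5, 10, 25):
    price = crra_certainty_equivalent([85, 115], [0.5, 0.5], k)
    print(f"k = {k:2d}: P = {price:.2f}")

# Scaling the payoffs by a factor scales the certainty equivalent by the same factor.
print(crra_certainty_equivalent([170, 230], [0.5, 0.5], 5)
      / crra_certainty_equivalent([85, 115], [0.5, 0.5], 5))
```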
This means that the expected gross return on the risky portfolio that the investor demands is
$$\operatorname{E} R_Z = \operatorname{E}(Z)/P > 1, \tag{5.5}$$
which is greater than unity. This “required return” is higher if the investor is very risk averse (very concave utility function). On the other hand, it goes towards unity as the investor becomes less and less risk averse (the utility function becomes more and more linear). In the limit (a risk neutral investor), the required return is unity. Loosely speaking, we can think of $\operatorname{E} R_Z - 1$ as a risk premium (more generally, the risk premium is $\operatorname{E} R_Z$ minus a riskfree rate). Notice that this analysis applies to the portfolio (or wealth, or consumption,...) that is the argument of the utility function, not to any individual asset. To analyse an individual asset, we need to study how it changes the argument of the utility function, so the covariance with the argument plays a key role.
Example 5.3 (Utility and two states) Suppose the utility function is logarithmic and that $(Z_-, Z_+) = (1, 2)$. Then, expected utility in (5.1) is
$$\operatorname{E} U(Z) = 0.5 \ln 1 + 0.5 \ln 2 \approx 0.35,$$
so the price must be such that $\ln P \approx 0.35$, that is, $P \approx e^{0.35} \approx 1.41$. The expected return (5.5) is
$$(0.5 \times 1 + 0.5 \times 2)/1.41 \approx 1.06.$$

5.1.5 Is Risk Aversion Related to the Level of Wealth?

We now take a closer look at what the functional form of the utility function implies for investment choices. In particular, we study if risk aversion will be related to the wealth level. First, define absolute risk aversion as
$$A(W) = -\frac{U''(W)}{U'(W)}, \tag{5.6}$$
where $U'(W)$ is the first derivative and $U''(W)$ the second derivative. Second, define relative risk aversion as
$$R(W) = W A(W) = -\frac{W U''(W)}{U'(W)}. \tag{5.7}$$
These two definitions are strongly related to the attitude towards taking risk.
Consider an investor with wealth $W$ who can choose between taking on a zero mean risk $Z$ (so $\operatorname{E} Z = 0$) or paying a price $P$ to avoid it. He is indifferent if
$$\operatorname{E} U(W + Z) = U(W - P). \tag{5.8}$$

If $Z$ is a small risk, then we can make a second order approximation
$$P \approx A(W)\operatorname{Var}(Z)/2, \tag{5.9}$$
which says that the price the investor is willing to pay to avoid the risk $Z$ is proportional to the absolute risk aversion $A(W)$.
Proof. (of (5.9)) Approximate as
$$\operatorname{E} U(W + Z) \approx U(W) + U'(W)\operatorname{E} Z + U''(W)\operatorname{E} Z^2/2 = U(W) + U''(W)\operatorname{Var}(Z)/2,$$
since $\operatorname{E} Z = 0$. (We here follow the rule of adding terms to the Taylor approximation to have two left after taking expectations.) Now, approximate $U(W - P) \approx U(W) - U'(W)P$. Set equal to get (5.9).
If we change the example in (5.8)–(5.9) to make the risk proportional to wealth, that is $Z = Wz$ where $z$ is the risk factor, then (5.9) directly gives
$$P \approx A(W) W^2 \operatorname{Var}(z)/2, \text{ so } P/W \approx R(W)\operatorname{Var}(z)/2, \tag{5.10}$$
which says that the fraction of wealth ($P/W$) that the investor is willing to pay to avoid the risk ($z$) is proportional to the relative risk aversion $R(W)$.
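To get a feeling for how good the second order approximation in (5.9) is, the sketch below compares it with the exact price from (5.8) for a CRRA investor, for whom $A(W) = k/W$. The function names and the numbers (wealth 100, a risk of $\pm 5$ with equal probabilities, $k = 3$) are assumptions made for the illustration; the exact solution used here requires $k \neq 1$.

```python
def exact_premium(wealth, risks, probs, k):
    """Solve E U(W + Z) = U(W - P) for P with U(W) = W**(1-k)/(1-k), k != 1."""
    moment = sum(p * (wealth + z) ** (1.0 - k) for z, p in zip(risks, probs))
    certainty_equivalent = moment ** (1.0 / (1.0 - k))  # this equals W - P
    return wealth - certainty_equivalent


def approx_premium(wealth, risks, probs, k):
    """Second order approximation P = A(W) Var(Z) / 2, with A(W) = k / W for CRRA."""
    mean = sum(p * z for z, p in zip(risks, probs))
    variance = sum(p * (z - mean) ** 2 for z, p in zip(risks, probs))
    return (k / wealth) * variance / 2.0


wealth, risks, probs, k = 100.0, [-5.0, 5.0], [0.5, 0.5], 3.0
p_exact = exact_premium(wealth, risks, probs, k)
p_approx = approx_premium(wealth, risks, probs, k)
print(f"exact P = {p_exact:.4f}, approximate P = {p_approx:.4f}")
print(f"P/W = {p_exact / wealth:.5f} versus R(W) Var(z)/2 = {k * 0.05 ** 2 / 2:.5f}")
```

For a risk this small the two numbers are very close; the approximation deteriorates as the risk becomes large relative to wealth.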
These results mostly carry over to the portfolio choice: high absolute risk aversion typically implies that only small amounts are invested into risky assets, whereas a high relative risk aversion typically leads to a small portfolio weight on risky assets.
Figure 5.2 demonstrates a number of commonly used utility functions, and the following discussion outlines their main properties.
Remark 5.4 (Mean-variance utility and portfolio choice) Suppose expected utility is $\operatorname{E}[(1 + R_p)W_0] - k\operatorname{Var}[(1 + R_p)W_0]/2$ where $W_0$ is initial wealth and the portfolio return is $R_p = vR_1 + (1 - v)R_f$, where $R_1$ is a risky asset and $R_f$ a riskfree asset. The optimal portfolio weight is
$$v = \frac{1}{kW_0}\,\frac{\operatorname{E} R_1 - R_f}{\operatorname{Var}(R_1)}.$$
A poor investor therefore invests the same amount as a rich investor ($vW_0$ does not depend on $W_0$), and his portfolio weight on the risky asset ($v$) is larger.
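The wealth effect in Remark 5.4 is easy to see numerically. The following sketch evaluates the optimal weight for a poor and a rich investor; the function name and all inputs ($\operatorname{E}R_1 = 8\%$, $R_f = 3\%$, $\operatorname{Var}(R_1) = 0.04$, $k = 0.1$) are hypothetical.

```python
def mv_risky_weight(mean_risky, riskfree, var_risky, k, wealth0):
    """Optimal weight v = (E R1 - Rf) / (k W0 Var(R1)) from Remark 5.4."""
    return (mean_risky - riskfree) / (k * wealth0 * var_risky)


# Hypothetical inputs: E R1 = 8%, Rf = 3%, Var(R1) = 0.04, k = 0.1.
for wealth0 in (10.0, 100.0):
    v = mv_risky_weight(0.08, 0.03, 0.04, 0.1, wealth0)
    print(f"W0 = {wealth0:5.0f}: weight v = {v:.3f}, amount v*W0 = {v * wealth0:.2f}")
```

The amount invested in the risky asset is the same for both investors, so the portfolio weight falls in proportion to initial wealth.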
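Figure 5.2: Examples of utility functions. (Left panel: CARA, $-\exp(-kW)$, for $k = 2$ and $k = 5$. Right panel: CRRA, $W^{1-\gamma}/(1-\gamma)$, for $\gamma = 2$ and $\gamma = 5$. Both are plotted against $W$.)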
The CARA utility function (constant absolute risk aversion), $U(W) = -e^{-kW}$, is also quite simple to use (in particular when returns are normally distributed, see below), but has the unappealing feature that the amount invested in the risky asset (in a risky/riskfree trade-off) is constant across (initial) wealth levels. This means, of course, that wealthy investors have a lower portfolio weight on risky assets. See the following remark for the algebra.
Remark 5.5 (Risk aversion in CARA utility function) $U(W) = -e^{-kW}$ gives $U'(W) = ke^{-kW}$ and $U''(W) = -k^2 e^{-kW}$, so we have $A(W) = k$. This means an increasing relative risk aversion, $R(W) = Wk$, so a poor investor typically has a larger portfolio weight on the risky asset than a rich investor.
Remark 5.6 (CARA utility function and portfolio choice) Let wealth be $W = W_0(1 + R_p)$, where $W_0$ is initial wealth and $R_p$ the portfolio return. If the risky return is normally distributed, then $kW_0(1 + R_p)$ is a normally distributed variable. Proposition 5.13 (see below) then shows that the portfolio choice is the same as with mean-variance preferences. Hence, the conclusions in Remark 5.4 apply: the amount invested in the risky asset is independent of initial wealth, and the portfolio weight is therefore decreasing in initial wealth.
The CRRA utility function (constant relative risk aversion) is often harder to work with, but has the nice property that the portfolio weights are unaffected by the initial wealth (once again, see the following remark for the algebra). Most evidence suggests that the CRRA utility function fits data best. For instance, historical data show no trends in portfolio weights or risk premia, in spite of investors having become much richer over time.
Remark 5.7 (Risk aversion in CRRA utility function) $U(W) = W^{1-k}/(1-k)$ gives $U'(W) = W^{-k}$ and $U''(W) = -kW^{-k-1}$, so we have $A(W) = k/W$ and $R(W) = k$. The absolute risk aversion decreases with the wealth level in such a way that the relative risk aversion is constant. In this case, a poor investor typically has the same portfolio weight on the risky asset as a rich investor.
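The algebra in Remarks 5.5 and 5.7 can be checked numerically by approximating $U'(W)$ and $U''(W)$ with finite differences. This is only a sketch: the function names, the step size and the choice $k = 3$ are assumptions for the illustration.

```python
import math

K = 3.0  # hypothetical risk aversion parameter


def cara(w):
    """CARA utility, U(W) = -exp(-k W)."""
    return -math.exp(-K * w)


def crra(w):
    """CRRA utility, U(W) = W**(1-k) / (1-k)."""
    return w ** (1.0 - K) / (1.0 - K)


def risk_aversion(utility, w, h=1e-4):
    """Return (A(W), R(W)) using central finite differences for U' and U''."""
    u1 = (utility(w + h) - utility(w - h)) / (2.0 * h)
    u2 = (utility(w + h) - 2.0 * utility(w) + utility(w - h)) / h ** 2
    absolute = -u2 / u1
    return absolute, w * absolute


for w in (1.0, 2.0, 4.0):
    a_cara, r_cara = risk_aversion(cara, w)
    a_crra, r_crra = risk_aversion(crra, w)
    print(f"W = {w}: CARA A = {a_cara:.3f}, R = {r_cara:.3f}; "
          f"CRRA A = {a_crra:.3f}, R = {r_crra:.3f}")
```

The output should show a constant $A(W)$ and an increasing $R(W)$ for CARA, and a decreasing $A(W)$ and a constant $R(W)$ for CRRA.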

5.2 Utility Optimization and the Two-Fund Theorem

This section demonstrates that if the utility function can be (re-)written in terms of the expected value and the variance of wealth, then the investor will hold a mix of only 2 portfolios (funds): the riskfree asset and the tangency portfolio (from mean-variance analysis).

5.2.1 General Utility-Based Portfolio Choice

For simplicity, assume that consumption equals wealth, which we normalize to unity. The optimization problem with a general utility function, two risky and a riskfree asset is then
$$\max_{v_1, v_2} \operatorname{E} U(R_p), \text{ where} \tag{5.11}$$
$$R_p = v_1 R_1 + v_2 R_2 + (1 - v_1 - v_2)R_f \tag{5.12}$$
$$\phantom{R_p} = v_1 R^e_1 + v_2 R^e_2 + R_f, \tag{5.13}$$
where $R^e_i$ is the excess return on asset $i$ and $R_f$ is a riskfree rate.
The first order conditions for the portfolio weights are
$$\frac{\partial \operatorname{E} U(R_p)}{\partial v_1} = 0 \text{ and } \frac{\partial \operatorname{E} U(R_p)}{\partial v_2} = 0, \tag{5.14}$$
which defines two equations in two unknowns: $v_1$ and $v_2$ (use (5.13) to substitute for $R_p$).
Suppose we have chosen some utility function and that we know the distribution of the returns. It should then be possible to solve (5.14) for the portfolio weights. Unfortunately, that can be fairly complicated. For instance, utility might be highly non-linear so the calculation of its expected value involves difficult integrations (possibly requiring numerical methods since there is no analytical solution). With many assets there are many first order conditions, so the system of equations can be large.
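Remark 5.8 (Alternative way of writing the first order condition*) In some treatments of advanced finance topics, you may find the first order condition written as $\operatorname{E}[\partial U(R_p)/\partial R_p \times R^e_1] = 0$ instead. It is straightforward to rewrite (5.14) in this form. First, notice that $\partial \operatorname{E} U(R_p)/\partial v_1 = \operatorname{E}[\partial U(R_p)/\partial v_1]$. To see why, assume 2 possible outcomes $R^-_p$ and $R^+_p$ with a probability $\pi$ of the former. Then
$$\frac{\partial \operatorname{E} U(R_p)}{\partial v_1} = \frac{\partial\left[\pi U(R^-_p) + (1-\pi)U(R^+_p)\right]}{\partial v_1} = \pi\frac{\partial U(R^-_p)}{\partial v_1} + (1-\pi)\frac{\partial U(R^+_p)}{\partial v_1} = \operatorname{E}\frac{\partial U(R_p)}{\partial v_1}.$$
Second, use the chain rule to write $\operatorname{E}[\partial U(R_p)/\partial v_1]$ as $\operatorname{E}[\partial U(R_p)/\partial R_p \times \partial R_p/\partial v_1]$ and notice that $\partial R_p/\partial v_1 = R^e_1$.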

Example 5.9 (Portfolio choice with log utility and two states) Suppose $U(R_p) = \ln R_p$, and that there is only one risky asset. The excess return on the risky asset $R^e$ is either a low value $R^e_-$ (with probability $\pi$) or a high value $R^e_+$ (with probability $1-\pi$). The optimization problem is then $\max_v \operatorname{E} U(R_p)$ where
$$\operatorname{E} U(R_p) = \pi \ln(vR^e_- + R_f) + (1-\pi)\ln(vR^e_+ + R_f).$$
The first order condition ($\partial \operatorname{E} U(R_p)/\partial v = 0$) is
$$\pi\frac{R^e_-}{vR^e_- + R_f} + (1-\pi)\frac{R^e_+}{vR^e_+ + R_f} = 0,$$
so we can solve for the portfolio weight as
$$v = -R_f\,\frac{\pi R^e_- + (1-\pi)R^e_+}{R^e_- R^e_+}.$$
For instance, with $R_f = 1.1$, $R^e_- = -0.3$, $R^e_+ = 0.4$, and $\pi = 0.5$, we get
$$v = -1.1\,\frac{0.5 \times (-0.3) + (1-0.5) \times 0.4}{(-0.3) \times 0.4} \approx 0.46.$$
See Figure 5.3 for an illustration.
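The closed-form weight in Example 5.9 can be cross-checked against a brute-force search over the portfolio weight. A minimal sketch: the function names and the grid (weights from $-1$ to $1$ in steps of 0.001) are assumptions; the inputs are those of the example.

```python
import math


def log_utility_weight(rf, re_low, re_high, prob_low):
    """Closed form from the first order condition of Example 5.9."""
    mean_excess = prob_low * re_low + (1.0 - prob_low) * re_high
    return -rf * mean_excess / (re_low * re_high)


def expected_log_utility(v, rf, re_low, re_high, prob_low):
    """E ln(Rp) with Rp = v * Re + Rf in each of the two states."""
    return (prob_low * math.log(v * re_low + rf)
            + (1.0 - prob_low) * math.log(v * re_high + rf))


rf, re_low, re_high, prob_low = 1.1, -0.3, 0.4, 0.5
v_star = log_utility_weight(rf, re_low, re_high, prob_low)

# Crude check: maximize expected utility over a grid of weights in [-1, 1].
grid = [i / 1000 for i in range(-1000, 1001)]
v_grid = max(grid, key=lambda v: expected_log_utility(v, rf, re_low, re_high, prob_low))

print(f"closed form v = {v_star:.4f}, grid search v = {v_grid:.3f}")
```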

Figure 5.3: Example of portfolio choice with a log utility function. (Utility, expected value of $\ln(R)$, plotted against the weight on the risky asset. Two assets: riskfree ($R_f$) and risky ($R$); $R_f = 1.1$ and $R = 0.8$ or 1.5 with equal probability.)

5.2.2 When is the Optimal Portfolio on the Minimum-Variance Frontier?

There are important cases where we can side-step most of the problems with solving (5.14), since it can be shown that the portfolio choice will actually be such that a portfolio on the minimum-variance frontier (upper MV frontier) will be chosen.
The optimal portfolio must be on the minimum-variance frontier when expected utility can be (re-)written as a function in terms of the expected return (increasing) and the variance (decreasing) only, that is
$$\operatorname{E} U(R_p) = V(\mu_p, \sigma^2_p), \tag{5.15}$$
with $\partial V(\mu_p, \sigma^2_p)/\partial \mu_p > 0$ and $\partial V(\mu_p, \sigma^2_p)/\partial \sigma^2_p < 0$.
For an illustration, see Figure 5.4 which shows the isoutility curves (curves with equal utility) from a mean-variance utility function ($\operatorname{E} U(R_p) = \mu_p - (k/2)\sigma^2_p$). Whenever expected utility obeys (5.15) (not just for the mean-variance utility function) the isoutility curves will look similar, so the optimum is on the minimum-variance frontier. The intuition behind (5.15) is that an investor wants to move as far to the north-west as possible in Figure 5.4, but that he/she is willing to trade off lower expected returns for lower volatility, that is, has isoutility functions as in the figure. What is possible is clearly given by the mean-variance frontier, so the solution is a point on the upper frontier. (This can also be shown algebraically, but it is slightly messy.) Conditions for (5.15) are discussed below.
Figure 5.4: Iso-utility curves, mean-variance utility with different risk aversions. (Utility contours, $\operatorname{E}(R_p) - (k/2)\operatorname{Var}(R_p)$, for $k = 5$, 7 and 9; mean on the vertical axis, Std on the horizontal axis.)
In the case with both a riskfree and risky assets, this means that all investors (provided they have the same beliefs) will pick some mix of the riskfree asset and the tangency portfolio (where the ray from the riskfree rate is tangent to the mean-variance frontier of risky assets). This is the two-fund theorem. Notice that all this says is that the optimal portfolio is somewhere on the mean-variance frontier. We cannot tell exactly where unless we are more precise about the exact form of the preferences.
See Figures 5.5–5.6 for examples of cases when we do not get a mean-variance portfolio.
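The two-fund theorem can be illustrated for the mean-variance utility function $\operatorname{E}(R_p) - (k/2)\operatorname{Var}(R_p)$. With $R_p = v'R^e + R_f$, the first order conditions give $v = \Sigma^{-1}\mu^e/k$, a standard result that is used here without proof. The sketch below (using NumPy) shows that investors with different $k$ hold the same mix of risky assets and differ only in how much they put into the riskfree asset. The mean excess returns and the covariance matrix are hypothetical.

```python
import numpy as np

mu_e = np.array([0.05, 0.08])       # hypothetical mean excess returns
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])    # hypothetical covariance matrix


def mv_optimal_weights(mu_e, Sigma, k):
    """Maximize v'mu_e + Rf - (k/2) v'Sigma v; the solution is Sigma^{-1} mu_e / k."""
    return np.linalg.solve(Sigma, mu_e) / k


# Tangency portfolio: risky weights proportional to Sigma^{-1} mu_e, summing to one.
tangency = np.linalg.solve(Sigma, mu_e)
tangency = tangency / tangency.sum()

for k in (5, 7, 9):
    v = mv_optimal_weights(mu_e, Sigma, k)
    print(f"k = {k}: risky weights = {np.round(v, 3)}, "
          f"riskfree weight = {1 - v.sum():.3f}, "
          f"mix of risky assets = {np.round(v / v.sum(), 3)}")
print("tangency portfolio:", np.round(tangency, 3))
```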
Figure 5.5: Example of when the optimal portfolio is (very slightly) off the MV frontier. (Panels: expected utility as a function of $v_1$ and $v_2$; expected utility, contours; MV frontiers, mean against std, with assets A and B marked. Utility function: $R^{1-\gamma}/(1-\gamma)$, $\gamma = 5$. Two risky assets (A and B) and one riskfree asset. Three states with equal probability:)

           A      B      Rf
State 1  0.970  0.960  1.065
State 2  1.080  1.220  1.065
State 3  1.200  1.150  1.065

Figure 5.6: Example of when the optimal portfolio is (very slightly) off the MV frontier. (Panels: expected utility as a function of $v_1$ and $v_2$; expected utility, contours; MV frontiers, mean against std, with assets A and B marked. Utility function: $\operatorname{E}(R) - (k/2)\operatorname{Var}(R) + (l/3)\operatorname{Skew}(R)$, $k = 3.6$, $l = 0.15$. Two risky assets (A and B) and one riskfree asset. Three states with equal probability:)

           A      B      Rf
State 1  0.970  0.960  1.065
State 2  1.080  1.220  1.065
State 3  1.200  1.150  1.065

Remark 5.10 (Taylor expansion of the utility function*) Make a Taylor series expansion of the utility function around the expected portfolio return
$$U(R_p) = U(\operatorname{E} R_p) + U'(\operatorname{E} R_p)(R_p - \operatorname{E} R_p) + \frac{1}{2}U''(\operatorname{E} R_p)(R_p - \operatorname{E} R_p)^2 + \frac{1}{6}U'''(\operatorname{E} R_p)(R_p - \operatorname{E} R_p)^3 + H_4.$$
Take expectations to get (since $\operatorname{E}(R_p - \operatorname{E} R_p) = 0$)
$$\operatorname{E} U(R_p) = U(\operatorname{E} R_p) + \frac{1}{2}U''(\operatorname{E} R_p)\operatorname{Var}(R_p) + \frac{1}{6}U'''(\operatorname{E} R_p)\operatorname{Skew}(R_p) + \operatorname{E} H_4,$$
where $\operatorname{Skew}(R_p)$ is the third central moment, $\operatorname{E}(R_p - \operatorname{E} R_p)^3$. For a CRRA utility function, $(1 + R_p)^{1-\gamma}/(1-\gamma)$, we have
$$U''(\operatorname{E} R_p) = -\gamma(1 + \operatorname{E} R_p)^{-\gamma-1} < 0 \text{ and } U'''(\operatorname{E} R_p) = \gamma(1+\gamma)(1 + \operatorname{E} R_p)^{-\gamma-2} > 0,$$
so variance is bad, but skewness is good. For a normal distribution, the skewness is zero.

5.2.3 The Equilibrium Effect of the Two-Fund Theorem

If all investors hold the tangency portfolio, then it must be the market portfolio, since the riskfree asset is in zero net supply: the average (or aggregate) investor holds no riskfree assets. Note that this observation is about what the equilibrium must look like, not about how we get there.

Figure 5.7: Capital market line and security market line. (Left panel: capital market line, mean (%) against std (%); CML: $\operatorname{E}R = R_f + \sigma \times (\operatorname{E}R_m - R_f)/\sigma_m$; location of efficient portfolios. Right panel: security market line, mean (%) against $\beta$; SML: $\operatorname{E}R = R_f + \beta(\operatorname{E}R_m - R_f)$; location of all assets.)
There are several important implications of this. First, all optimal portfolios (denoted $opt$) are on the capital market line
$$\mu_{opt} = R_f + \frac{\mu^e_m}{\sigma_m}\sigma_{opt}, \tag{5.16}$$
where $\mu^e_m$ and $\sigma_m$ are the expected value and the standard deviation of the excess return of the market portfolio. This is clearly the same as the upper leg of the MV frontier (with risky assets and riskfree asset). See Figure 5.7 for an example.
Proof. (of (5.16)) $R_{opt} = aR_m + (1-a)R_f$, so $R^e_{opt} = aR^e_m$. We then have $\mu^e_{opt} = a\mu^e_m$ and $\sigma_{opt} = a\sigma_m$ (since $a \geq 0$). Solve for $a$ from the latter ($a = \sigma_{opt}/\sigma_m$) and use in the former.
Second, we get a beta representation (see lecture notes on CAPM). For any asset or portfolio $i$, the expected excess return ($\mu^e_i$) is linearly related to the expected excess return on the market portfolio ($\mu^e_m$) according to
$$\mu^e_i = \beta_i \mu^e_m, \text{ where } \beta_i = \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)}. \tag{5.17}$$
The plot of $\mu_i$ (that is, $\mu^e_i + R_f$) against $\beta_i$ (for different assets, $i$) is called the security market line. See Figure 5.7 for an example.
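The beta in (5.17) and the two lines in Figure 5.7 can be sketched in a few lines of code. The return series, the riskfree rate and the function names below are made up for the illustration; the asset return is constructed as an exact linear function of the market return so that its beta is known in advance.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance (dividing by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def beta(asset_returns, market_returns):
    """beta_i = Cov(R_i, R_m) / Var(R_m), as in (5.17)."""
    return covariance(asset_returns, market_returns) / covariance(market_returns, market_returns)


def sml_mean(beta_i, rf, mean_market):
    """Security market line: E R_i = Rf + beta_i (E R_m - Rf)."""
    return rf + beta_i * (mean_market - rf)


def cml_mean(std_opt, rf, mean_market, std_market):
    """Capital market line (5.16): E R_opt = Rf + std_opt (E R_m - Rf) / std_m."""
    return rf + std_opt * (mean_market - rf) / std_market


# Hypothetical data: the asset return is 0.01 + 1.5 times the market return.
market = [0.02, -0.01, 0.03, 0.00, 0.04]
asset = [0.01 + 1.5 * r for r in market]
rf = 0.005

b = beta(asset, market)
print(f"beta = {b:.3f}")
print(f"SML mean return = {sml_mean(b, rf, mean(market)):.4f}")
print(f"CML mean return at half the market std = "
      f"{cml_mean(0.5 * covariance(market, market) ** 0.5, rf, mean(market), covariance(market, market) ** 0.5):.4f}")
```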
Remark 5.11 (Minimum variance portfolios of risky assets only*) Any portfolio in the set of minimum variance portfolios solves the problem $\min \sigma^2(R_p)$ subject to the restrictions that the portfolio mean is $\mu^*$ ($\operatorname{E} R_p = \mu^*$) and that the weights sum to unity ($\sum_{i=1}^n w_i = 1$). We can retrace the entire set by combining any two portfolios in this set. For instance, we can use $w_e = \lambda w_g + (1-\lambda)w_T$, where $w_g = \Sigma^{-1}1_n/(1_n'\Sigma^{-1}1_n)$ is the global minimum variance portfolio and $w_T = \Sigma^{-1}\mu^e/(1_n'\Sigma^{-1}\mu^e)$ is the tangency portfolio. The mean net return can be calculated as $w_e'\mu$ and the variance as $w_e'\Sigma w_e$.
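The formulas for the global minimum variance portfolio and the tangency portfolio translate directly into matrix code. The sketch below uses NumPy; the function names, the mixing weight and the inputs (mean excess returns, covariance matrix and riskfree rate) are hypothetical.

```python
import numpy as np


def global_min_variance(Sigma):
    """w_g = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    x = np.linalg.solve(Sigma, ones)
    return x / (ones @ x)


def tangency(Sigma, mu_e):
    """w_T = Sigma^{-1} mu_e / (1' Sigma^{-1} mu_e)."""
    x = np.linalg.solve(Sigma, mu_e)
    return x / x.sum()


def frontier_portfolio(lam, Sigma, mu_e):
    """Mix of the two portfolios: lam * w_g + (1 - lam) * w_T."""
    return lam * global_min_variance(Sigma) + (1.0 - lam) * tangency(Sigma, mu_e)


# Hypothetical inputs.
rf = 0.03
mu_e = np.array([0.05, 0.08])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
mu = mu_e + rf  # mean net returns

for lam in (1.0, 0.5, 0.0):
    w = frontier_portfolio(lam, Sigma, mu_e)
    print(f"lambda = {lam}: weights = {np.round(w, 3)}, "
          f"mean = {w @ mu:.4f}, variance = {w @ Sigma @ w:.5f}")
```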
Remark 5.12 (Minimum variance portfolios with risky and riskfree assets*) Adding a riskfree asset with gross return $R_f$ transforms the minimum-variance set to two straight lines. The upper one (typically) is a ray that starts at $R_f$ and goes through the tangency portfolio.

5.2.4 Special Cases

This section outlines special cases when the utility-based portfolio choice problem can be rewritten as in (5.15) (in terms of mean and variance only), so that the optimal portfolio belongs to the minimum-variance set. (Recall that with a riskfree asset this minimum-variance set is a ray that starts at $R_f$ and goes through the tangency portfolio.)
Case 1: Mean-Variance Utility
We know that if the investor maximizes $\operatorname{E}(R_p) - \sigma^2(R_p)k/2$, then the optimal portfolio is on the mean-variance frontier. Clearly, this is the same as assuming that the utility function is $U(R_p) = R_p - [R_p - \operatorname{E}(R_p)]^2 k/2$ (evaluate $\operatorname{E} U(R_p)$ to see this).

Case 2: Quadratic Utility
If utility is quadratic in the return (or equivalently, in wealth)
$$U(R_p) = R_p - bR^2_p/2, \tag{5.18}$$
then expected utility can be written
$$\operatorname{E} U(R_p) = \operatorname{E} R_p - b\operatorname{E}(R^2_p)/2 = \operatorname{E} R_p - b\left[\sigma^2(R_p) + (\operatorname{E} R_p)^2\right]/2, \tag{5.19}$$
since $\sigma^2(R_p) = \operatorname{E}(R^2_p) - (\operatorname{E} R_p)^2$. (We assume that all these moments are finite.) For $b > 0$ this function is decreasing in the variance, and increasing in the mean return (as long as $b\operatorname{E}(R_p) < 1$). The optimal portfolio is therefore on the minimum-variance frontier. See Figure 5.9 for an example.
The main drawback with this utility function is that we have to make sure that we are on the portion of the curve where utility is increasing (below the so-called “bliss point”). Moreover, the quadratic utility function has the strange property that the amount invested in risky assets decreases as wealth increases (increasing absolute risk aversion).
Case 3: Normally Distributed Returns
When the distribution of any portfolio return is fully described by the mean and variance, then maximizing $\operatorname{E} U(R_p)$ will result in a mean-variance portfolio, under some extra assumptions about the utility function discussed below. A normal distribution (among a few other distributions) is completely described by its mean and variance. Moreover, any portfolio return would be normally distributed if the returns on the individual assets have a multivariate normal distribution (recall: $x + y$ is normally distributed if $x$ and $y$ are jointly normally distributed).
The extra assumptions needed are that utility is strictly increasing in wealth ($U'(R_p) > 0$), displays risk aversion ($U''(R_p) < 0$), and that utility is defined for all possible outcomes. The latter sounds trivial, but it is not. For instance, the logarithmic utility function $U(R_p) = \ln(R_p)$ cannot be combined with returns (end of period wealth) that can take negative values (for instance, $\ln(-1) = \pi i$, which is not a real number, whereas a utility function must be real-valued).
The algebra required to show this is a bit messy, but the idea is essentially that the mean and variance fully describe the normal distribution. Since increasing concave utility functions are increasing in the mean and decreasing in the variance (of the portfolio return), the result is quite intuitive.
Normally distributed returns should be considered as an approximation for three reasons. First, limited liability means that the gross return can never be negative (the asset price cannot be negative), that is, the simple net return can never be less than -100%. A normal distribution cannot rule out this possibility (although it may have a very low probability). Second, option returns have distributions which are clearly different from normal distributions: a lot of probability mass at exactly -100% (no exercise) and then a continuous distribution for higher returns. Third, empirical evidence suggests that most asset returns have distributions with fatter tails and more skewness than implied by a normal distribution, especially when the returns are measured over short horizons.

As an illustration, suppose the investor maximizes a utility function with constant absolute risk aversion $k > 0$
$$U(R_p) = -\exp(-R_p k). \tag{5.20}$$
(It is straightforward to show that this utility function satisfies the extra conditions.)
Proposition 5.13 If returns are normally distributed, then maximizing the expected value of the CARA utility function is the same as solving a mean-variance problem.
Proof. (of Proposition 5.13) First, recall that if $x \sim N(\mu, \sigma^2)$, then $\operatorname{E} e^x = e^{\mu + \sigma^2/2}$. Therefore, rewrite expected utility as
$$\operatorname{E} U(R_p) = \operatorname{E}\left[-\exp(-R_p k)\right] = -\exp\left[-\operatorname{E}(R_p)k + \sigma^2(R_p)k^2/2\right].$$
Notice that the assumption of normally distributed returns is crucial for this result. Second, recall that if $x$ maximizes (minimizes) $f(x)$, then it also maximizes (minimizes) $g[f(x)]$ if $g$ is a strictly increasing function. The function $-\ln(-z)/k$ is defined for $z < 0$ and it is increasing in $z$, see Figure 5.8. We can apply this function by letting $z$ be the right hand side of the previous equation to get
$$-\ln(-z)/k = \operatorname{E}(R_p) - \sigma^2(R_p)k/2.$$
Therefore, maximizing the expected CARA utility or MV preferences (in terms of the returns) gives the same solution. (When utility is written in terms of wealth $W_0(1 + R_p)$ where $R_p$ is the portfolio return, the last equation becomes $W_0\operatorname{E}(1 + R_p) - W^2_0\sigma^2(R_p)k/2$.)
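The key step of the proof, $\operatorname{E}[-\exp(-R_p k)] = -\exp[-\operatorname{E}(R_p)k + \sigma^2(R_p)k^2/2]$, can be verified numerically. The sketch below evaluates the expectation by Gauss-Hermite quadrature (NumPy's `hermegauss`, which uses the weight function $\exp(-x^2/2)$) and then applies the transformation $-\ln(-z)/k$. The function name, the number of quadrature nodes and the parameter values ($\mu = 8\%$, $\sigma = 16\%$, $k = 5$) are assumptions for the illustration.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expected_cara_utility(mu, sigma, k, nodes=40):
    """E[-exp(-k R)] for R ~ N(mu, sigma^2), by Gauss-Hermite quadrature."""
    x, w = hermegauss(nodes)                 # nodes/weights for weight exp(-x^2/2)
    values = -np.exp(-k * (mu + sigma * x))  # utility at R = mu + sigma * x
    return float(np.sum(w * values) / math.sqrt(2.0 * math.pi))


mu, sigma, k = 0.08, 0.16, 5.0               # hypothetical parameter values
z = expected_cara_utility(mu, sigma, k)
closed_form = -math.exp(-k * mu + k ** 2 * sigma ** 2 / 2.0)
mv_objective = mu - k * sigma ** 2 / 2.0

print(f"numerical E U = {z:.8f}, closed form = {closed_form:.8f}")
print(f"-ln(-z)/k = {-math.log(-z) / k:.6f}, E(R) - k Var(R)/2 = {mv_objective:.6f}")
```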

Case 4: CRRA Utility and Lognormally Distributed Portfolio Returns
Proposition 5.14 Consider a CRRA utility function, $(1 + R_p)^{1-\gamma}/(1-\gamma)$, and suppose all log portfolio returns, $r_p = \ln(1 + R_p)$, happen to be normally distributed. The solution is then, once again, on the mean-variance frontier.
This result is especially useful in analysis of multi-period investments. (Notice, however, that this should be thought of as an approximation since $1 + R_p = \alpha(1 + R_1) + (1-\alpha)(1 + R_2)$ is not lognormally distributed even if both $1 + R_1$ and $1 + R_2$ are.)
Figure 5.8: Transforming expected utility. (Left panel: $-\ln(-z)/k$ for $k = 1$ and $k = 5$. Right panel: $\ln[z(1-\gamma)]/(1-\gamma)$ for $\gamma = 3$ and $\gamma = 5$. Both are plotted against $z < 0$.)
Figure 5.9: Contours with same utility level when returns are normally or lognormally distributed. The means and standard deviations (on the axes) are for the net returns (not log returns). (Panels: utility contours, CARA, $k = 7$ and $k = 11$, normal returns; utility contours, CRRA, $\gamma = 7$ and $\gamma = 11$, lognormal returns. Mean net return on the vertical axis, Std on the horizontal axis.)
See Figure 5.9 for an example.
Proof. (of Proposition 5.14) Notice that
$$\frac{\operatorname{E}(1 + R_p)^{1-\gamma}}{1-\gamma} = \frac{\operatorname{E}\exp[(1-\gamma)r_p]}{1-\gamma}, \text{ where } r_p = \ln(1 + R_p).$$
(Clearly, when utility is written in terms of wealth $W_0(1 + R_p)$, both sides are multiplied by $W^{1-\gamma}_0$, which does not affect the optimization problem.) Since $r_p$ is normally distributed, the expectation is (recall that if $x \sim N(\mu, \sigma^2)$, then $\operatorname{E} e^x = e^{\mu + \sigma^2/2}$)
$$\frac{1}{1-\gamma}\operatorname{E}\exp[(1-\gamma)r_p] = \frac{1}{1-\gamma}\exp\left[(1-\gamma)\operatorname{E} r_p + (1-\gamma)^2\sigma^2(r_p)/2\right].$$
Assume that $\gamma > 1$. The function $\ln[z(1-\gamma)]/(1-\gamma)$ is then defined for $z < 0$ and it is increasing in $z$, see Figure 5.8.b. Let $z$ be the right hand side of the previous equation and apply the transformation to get
$$\operatorname{E} r_p + (1-\gamma)\sigma^2(r_p)/2,$$
which is increasing in the expected log return and decreasing in the variance of the log return (since we assumed $1-\gamma < 0$). To express this in terms of the mean and variance of the return instead of the log return we use the following fact: if $\ln y \sim N(\mu, \sigma^2)$, then $\operatorname{E} y = \exp(\mu + \sigma^2/2)$ and $\operatorname{Std}(y)/\operatorname{E} y = \sqrt{\exp(\sigma^2) - 1}$. Using this fact on the previous expression gives
$$\ln(1 + \operatorname{E} R_p) - \gamma\ln\left[\sigma^2(R_p)/(1 + \operatorname{E} R_p)^2 + 1\right]/2,$$
which is increasing in $\operatorname{E} R_p$ and decreasing in $\sigma^2(R_p)$. We therefore get a mean-variance portfolio.

5.3 Application of Normal Returns: Value at Risk, ES, Lpm and the Telser Criterion

The mean-variance framework is often criticised for failing to distinguish between downside (considered to be risk) and upside (considered to be potential). This section illustrates that normally distributed returns often lead to minimum variance portfolios even if the portfolio selection model seems to be far from the standard mean-variance utility
115

function.
5.3.1 Value at Risk and the Telser Criterion

If the return is normally distributed, $R \sim N(\mu,\sigma^2)$, then the $\alpha$ value at risk, $\text{VaR}_\alpha$, is
$$\text{VaR}_\alpha = -(\mu + c_{1-\alpha}\sigma), \tag{5.21}$$
where $c_{1-\alpha}$ is the $1-\alpha$ quantile of a N(0,1) distribution, for instance, $-1.64$ for 5%.

Example 5.15 (VaR with $R \sim N(\mu,\sigma^2)$) If $\mu = 8\%$ and $\sigma = 16\%$, then $\text{VaR}_{95\%} = -(0.08 - 1.64 \times 0.16) \approx 0.18$; we are 95% sure that we will not lose more than 18% of the investment.
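The VaR formula (5.21) is easy to evaluate directly. The sketch below uses the standard library's `statistics.NormalDist` to get the N(0,1) quantile instead of the rounded value $-1.64$; the function name `normal_var` is a hypothetical helper, and the inputs are those of Example 5.15.

```python
from statistics import NormalDist

def normal_var(mu, sigma, alpha=0.95):
    """VaR_alpha = -(mu + c * sigma) for R ~ N(mu, sigma^2),
    where c is the (1 - alpha) quantile of N(0,1)."""
    c = NormalDist().inv_cdf(1.0 - alpha)   # about -1.645 for alpha = 0.95
    return -(mu + c * sigma)

# Inputs of Example 5.15: mean 8%, standard deviation 16%
var_95 = normal_var(0.08, 0.16)
print(f"VaR 95%: {var_95:.3f}")
```

With the unrounded quantile the result differs from the hand calculation only in the third decimal. For a given mean, the function is increasing in `sigma`, which is the property the argument below relies on.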
Suppose we abandon MV preferences and instead choose to minimize the Value at Risk for a given mean return. With normally distributed returns, the value at risk (5.21) is a strictly increasing function of the standard deviation (and the variance). Hence, minimizing the value at risk gives the same solution (portfolio weights) as minimizing the variance. (However, it should be noted that the VaR approach is often used when data is thought to be strongly non-normal.)
Another portfolio choice approach is to use the value at risk as a restriction. For instance, the Telser criterion says that we should maximize the expected portfolio return subject to the restriction that the value at risk (at some given probability level) does not exceed a given level.
The restriction could be that the $\text{VaR}_{95\%}$ should be less than 10% of the investment. With a normal distribution, (5.21) says that the portfolio must be such that the mean and standard deviation satisfy
$$-(\mu_p - 1.64\sigma_p) < 0.1, \text{ or } \mu_p > -0.1 + 1.64\sigma_p. \tag{5.22}$$
The portfolio choice problem according to the Telser criterion is then to choose the portfolio weights ($v_i$) to
$$\max_{v_i}\mu_p \text{ subject to } \mu_p > -0.1 + 1.64\sigma_p \text{ and } \sum_{i=1}^{n}v_i = 1. \tag{5.23}$$

[Figure 5.10: Telser criterion and VaR. Maximize expected return subject to VaR < 0.1; the shaded area shows where VaR < 0.1. Curves: MV (risky), MV, and the restriction line $-0.10 + 1.64\sigma$. Axes: $\sigma$ (std) horizontal, $\mu$ (mean) vertical.]

More generally, the Telser criterion is
$$\max_{v_i}\mu_p \text{ subject to } \mu_p > -\text{VaR}_\alpha - c_{1-\alpha}\sigma_p \text{ and } \sum_{i=1}^{n}v_i = 1, \tag{5.24}$$
where $c_{1-\alpha}$ is the $1-\alpha$ quantile of a N(0,1) distribution.
This problem is illustrated in Figure 5.10, for different VaR restrictions. Any point above a line satisfies the respective restriction, and the issue is to pick the one with the highest possible expected return among those available. In particular, there are no portfolios above the minimum-variance frontier (with or without a riskfree asset). A lower VaR is, of course, a tougher restriction.
If the restriction intersects the minimum-variance frontier, the solution is the highest intersection point. This is indeed a point on the minimum-variance frontier, which shows that the Telser criterion applied to normally distributed returns leads us to a minimum-variance portfolio. If the restriction doesn’t intersect, then there is no solution to the problem (the restriction is too demanding, the VaR too low).
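The "highest intersection point" can be computed in closed form in one simple special case. The sketch below assumes a riskfree asset exists, so the frontier is the straight line $\mu = R_f + S\sigma$ with slope $S$ (the Sharpe ratio of the tangency portfolio), and it assumes $S < -c_{1-\alpha}$ so that the restriction line is steeper than the frontier. The function name and the inputs (`rf`, `sharpe`, `var_limit`) are hypothetical; the notes do not provide these numbers.

```python
from statistics import NormalDist

def telser_on_cml(rf, sharpe, var_limit, alpha=0.95):
    """Telser solution when the frontier is the line mu = rf + sharpe * sigma.

    The restriction is mu >= -var_limit - c * sigma, where c is the
    (1 - alpha) quantile of N(0,1). The sketch assumes sharpe < -c, so the
    restriction line is steeper than the frontier and eventually binds.
    Returns (sigma, mu) at the highest intersection, or None if even the
    riskfree asset violates the restriction.
    """
    c = NormalDist().inv_cdf(1.0 - alpha)      # about -1.645 for alpha = 0.95
    slope_gap = -c - sharpe                    # restriction slope minus frontier slope
    if slope_gap <= 0:
        raise ValueError("sketch assumes sharpe < -c")
    if rf + var_limit < 0:
        return None                            # no portfolio satisfies the restriction
    sigma = (rf + var_limit) / slope_gap       # where the two lines cross
    return sigma, rf + sharpe * sigma

# Hypothetical inputs: 2% riskfree rate, Sharpe ratio 0.5, VaR limit 10%
solution = telser_on_cml(rf=0.02, sharpe=0.5, var_limit=0.10)
```

The returned point lies on the frontier line by construction, which mirrors the conclusion that the Telser criterion picks a minimum-variance portfolio when returns are normal.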


5.3.2 Expected Shortfall

The expected shortfall is the expected loss when the return actually is below the $\text{VaR}_\alpha$. For normally distributed returns, $R \sim N(\mu,\sigma^2)$, it can be shown that
$$\text{ES}_\alpha = -\mu + \sigma\frac{\phi(c_{1-\alpha})}{1-\alpha}, \tag{5.25}$$
where $\phi()$ is the pdf of a N(0,1) variable.
Example 5.16 If $\mu = 8\%$ and $\sigma = 16\%$, the 95% expected shortfall is $\text{ES}_{95\%} = -0.08 + 0.16\,\phi(-1.64)/0.05 \approx 0.25$.

Notice that the expected shortfall for a normally distributed return (5.25) is a strictly increasing function of the standard deviation (and the variance). As for the VaR, this means that minimizing expected shortfall at a given mean return gives the same solution (portfolio weights) as minimizing the variance at the same given mean return.
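The expected shortfall formula (5.25) can be evaluated the same way as the VaR. The sketch below again relies on `statistics.NormalDist` for the N(0,1) quantile and pdf; `normal_es` is a hypothetical helper name, and the inputs are those of Example 5.16.

```python
from statistics import NormalDist

def normal_es(mu, sigma, alpha=0.95):
    """ES_alpha = -mu + sigma * pdf(c) / (1 - alpha) for R ~ N(mu, sigma^2),
    where c is the (1 - alpha) quantile of N(0,1)."""
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1.0 - alpha)
    return -mu + sigma * std_normal.pdf(c) / (1.0 - alpha)

# Inputs of Example 5.16: mean 8%, standard deviation 16%
es_95 = normal_es(0.08, 0.16)
print(f"ES 95%: {es_95:.3f}")
```

For a fixed mean the result is linear and increasing in `sigma`, so ranking portfolios by expected shortfall at a given mean is the same as ranking them by standard deviation.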
5.3.3 Lower Partial 2nd Moment

Reference: Bawa and Lindenberg (1977) and Nantell and Price (1979)
Using the variance (or standard deviation) as a measure of portfolio risk (as a mean-variance investor does) fails to distinguish between the downside and upside. As an alternative, one could consider using a lower partial 2nd moment instead. It is defined as
$$\lambda_p(h) = \operatorname{E}[\min(R_p - h, 0)^2], \tag{5.26}$$
where $h$ is a “target level” chosen by the investor. In the subsequent analysis it will be set equal to the riskfree rate.
Suppose investors’ preferences are such that they like high expected returns and dislike the lower partial second moment with a target level equal to the riskfree rate (denoted $\lambda_p$ to keep the notation brief), that is, their expected utility can be written as
$$\operatorname{E}U(R_p) = V(\mu_p,\lambda_p), \text{ with } \partial V(\mu_p,\lambda_p)/\partial\mu_p > 0 \text{ and } \partial V(\mu_p,\lambda_p)/\partial\lambda_p < 0. \tag{5.27}$$


The results in Bawa and Lindenberg (1977) and Nantell and Price (1979) demonstrate several important things. First, there is still a two-fund theorem: all investors hold a combination of a market portfolio and the riskfree asset, so there is a capital market line as in (5.16). See Figure 5.11 for an illustration (based on normally distributed returns, which is not necessary). Second, there is still a beta representation as in (5.17), but where the beta coefficient is different. Third, in case the returns are normally distributed (or t-distributed), then the optimal portfolios are also on the mean-variance frontier, and all the usual MV results hold. See Figure 5.12 for a numerical illustration.

[Figure 5.11: Lower partial 2nd moment and expected returns. Mean-target semivariance frontier for normally distributed returns. Curves: risky assets only, and risky and riskfree assets. Axes: target semivariance, % (horizontal), mean, % (vertical).]

[Figure 5.12: Standard deviation and expected returns. The markers for target semivariance (sv) indicate the std of the portfolio that minimizes the target semivariance at the given mean return. Curves: MV (risky), MV (risky&riskfree), target sv (risky), target sv (risky&riskfree). Axes: Std, % (horizontal), mean, % (vertical).]

The basic reason is that $\lambda_p(h)$ is increasing in the standard deviation (for a given mean). This means that minimizing $\lambda_p(h)$ at a given mean return gives exactly the same solution (portfolio weights) as minimizing $\sigma_p$ (or $\sigma_p^2$) at the same given mean return. As a result, with normally distributed returns, an investor who wants to minimize the lower partial 2nd moment (at a given mean return) is behaving just like a mean-variance investor.

Remark 5.17 (Lpm calculation for normally distributed variable) For an $N(\mu,\sigma^2)$ variable, the lower partial 2nd moment around the target level $h$ is
$$\lambda_p(h) = \sigma^2 a\,\phi(a) + \sigma^2(a^2+1)\Phi(a), \text{ where } a = (h-\mu)/\sigma,$$
while $\phi()$ and $\Phi()$ are the pdf and cdf of a N(0,1) variable respectively. Notice that $\lambda_p(h) = \sigma^2/2$ for $h = \mu$. It is straightforward to show that
$$\frac{\partial\lambda_p(h)}{\partial\sigma} = 2\sigma\Phi(a),$$
so the lower partial moment is a strictly increasing function of the standard deviation.
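The closed form in Remark 5.17 can be checked against a direct numerical integral of $(x-h)^2$ times the normal pdf over $x < h$. The sketch below does this with a plain midpoint rule from the standard library only; the function names, the integration settings and the inputs (mean 8%, standard deviation 16%, target 2%) are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lpm2_normal(mu, sigma, h):
    """Closed-form lower partial 2nd moment of N(mu, sigma^2) around target h."""
    a = (h - mu) / sigma
    return sigma**2 * (a * STD_NORMAL.pdf(a) + (a**2 + 1.0) * STD_NORMAL.cdf(a))

def lpm2_numeric(mu, sigma, h, n=20_000, width=10.0):
    """Midpoint-rule integral of (x - h)^2 * pdf(x) over [mu - width*sigma, h]."""
    dist = NormalDist(mu, sigma)
    lower = mu - width * sigma
    step = (h - lower) / n
    total = 0.0
    for j in range(n):
        x = lower + (j + 0.5) * step
        total += (x - h) ** 2 * dist.pdf(x)
    return total * step

def lpm2_slope(mu, sigma, h):
    """Derivative of the lower partial 2nd moment with respect to sigma."""
    a = (h - mu) / sigma
    return 2.0 * sigma * STD_NORMAL.cdf(a)

closed_form = lpm2_normal(0.08, 0.16, 0.02)
numeric = lpm2_numeric(0.08, 0.16, 0.02)
```

The two values agree to numerical precision, and `lpm2_slope` is positive for any positive `sigma`, which is the monotonicity that makes the Lpm investor behave like a mean-variance investor under normality.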

5.4 Behavioural Finance

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 18; Forbes (2009); Shefrin (2005)
There is relatively little direct evidence on investors’ preferences (utility). For obvious reasons, we can’t know for sure what people really like. The evidence we do have is from two sources: “laboratory” experiments designed to elicit information about the test subjects’ preferences for risk, and a lot of indirect information.
5.4.1 Evidence on Utility Theory

The laboratory experiments are typically organized at university campuses (mostly by psychologists and economists) and involve only small compensations, so the test subjects are those students who really need the monetary compensation for taking part or those who are interested in this type of psychological experiment. The results vary quite a bit, but a main theme is that the main assumptions in utility-based portfolio choice might be reasonable, but there are some important systematic deviations from these assumptions.
For instance, investors seem to be unwilling to realize losses, that is, to sell off assets which they have made a loss on (often called the “disposition effect”). They also seem to treat the investment problem much more on an asset-by-asset basis than suggested by mean-variance analysis, which pays a lot of attention to the covariance of assets (sometimes called mental accounting). Discounting appears to be non-linear in the sense that discounting is higher when comparing today with dates in the near future than when comparing two dates in the distant future. (Hyperbolic discount factors might be a way to model this, but lead to time-inconsistent behaviour: today we may prefer an asset that pays off in $t+2$ to an asset that pays off in $t+1$, but tomorrow our ranking might be reversed.) Finally, the results seem to move towards tougher play as the experiments are repeated and/or as more competition is introduced, although the experiments seldom converge to ultra tough/egoistic behaviour (as typically assumed by utility theory).
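The time inconsistency created by hyperbolic discounting can be made concrete with a small numerical sketch. It assumes a discount factor of the form $1/(1+\kappa\tau)$ for a payoff $\tau$ periods ahead and compares it with ordinary exponential discounting $\delta^\tau$. The functional form, the parameter values ($\kappa = 1$, $\delta = 0.7$) and the two payoffs (1.0 at $t+1$, 1.6 at $t+2$) are all hypothetical choices made for illustration.

```python
def hyperbolic_discount(delay, kappa=1.0):
    """Hyperbolic discount factor 1 / (1 + kappa * delay) (assumed form)."""
    return 1.0 / (1.0 + kappa * delay)

def exponential_discount(delay, delta=0.7):
    """Standard exponential discount factor delta ** delay."""
    return delta ** delay

def preferred_payoff(now, discount, early=(1, 1.0), late=(2, 1.6)):
    """Return 'early' or 'late': which asset has the larger discounted payoff
    when evaluated at date `now`. Each asset is (payoff date, payoff)."""
    value_early = early[1] * discount(early[0] - now)
    value_late = late[1] * discount(late[0] - now)
    return "late" if value_late > value_early else "early"

# Ranking today (date 0) and tomorrow (date 1) under each discounting scheme
hyper_today = preferred_payoff(0, hyperbolic_discount)
hyper_tomorrow = preferred_payoff(1, hyperbolic_discount)
expo_today = preferred_payoff(0, exponential_discount)
expo_tomorrow = preferred_payoff(1, exponential_discount)
```

Under the hyperbolic scheme the ranking flips between the two dates, whereas the exponential scheme gives the same ranking on both dates.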
The indirect evidence is broadly in line with the implications of utility-based theory, especially now that the costs for holding well diversified portfolios have decreased (mutual funds). However, there are clearly some systematic deviations from the theoretical implications. For instance, many investors seem to be too little diversified. In particular, many investors hold assets in companies/countries that are very strongly correlated to their labour income (local bias). Moreover, diversification is often done in a naive fashion and depends on the “menu” of choices. For instance, many pension savers seem to diversify by putting the fraction $1/n$ in each of the $n$ funds offered by the firm/bank, irrespective of what kind of funds they are. There are, of course, also large chunks of wealth invested for control reasons rather than for a pure portfolio investment reason (which explains part of the so-called “home bias”, the fact that many investors do not diversify internationally).
5.4.2 Evidence on Expectations Formation (Forecasting)

In laboratory experiments (and studies of the properties of forecasts made by analysts), several interesting results emerge on how investors seem to form expectations. First, complex situations are often approached by treating them as a simplified representative problem, even against better knowledge (often called “representativeness”). This stands in contrast to the idea of Bayesian learning where investors update and learn from their mistakes. Second (and fairly similar), difficult problems are often handled as if they were similar to some old/easy problem, so that all that is required is a small modification of the logic (called “anchoring”). Third, recent events/data are given much higher weight than they typically warrant (often called “recency bias” or “availability”). Finally, most forecasters seem to be overconfident: they draw too strong conclusions from small data sets (“law of small numbers”) and overstate the precision of their own forecasts.
Notice, however, that it is typically difficult to disentangle (distorted) beliefs from non-traditional preferences. For instance, the aversion to selling off bad investments may equally well be driven by a belief that past losers will recover.
5.4.3 Prospect Theory

Prospect theory (developed by Kahneman and Tversky) tries to explain several of these things by postulating that the utility function is concave above some reference point (which may shift), but convex below it. This means that gains are treated in a risk averse way, but losses in a risk loving way. For instance, after a loss (so we are below the reference point) an asset looks less risky than after a gain, which might explain why investors hold on to losing investments. Clearly, an alternative explanation is that investors believe in mean reversion (losing positions will recover, winning positions will fall back). In general, it is hard to make a clear distinction between non-classical preferences and (potentially distorted) beliefs.

Bibliography
Bawa, V. S., and E. B. Lindenberg, 1977, “Capital market equilibrium in a mean-lower partial moment framework,” Journal of Financial Economics, 5, 189–200.
Cochrane, J. H., 2001, Asset pricing, Princeton University Press, Princeton, New Jersey.
Danthine, J.-P., and J. B. Donaldson, 2002, Intermediate financial theory, Prentice Hall.
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.
Forbes, W., 2009, Behavioural finance, Wiley.
Huang, C.-F., and R. H. Litzenberger, 1988, Foundations for financial economics, Elsevier Science Publishing, New York.
Ingersoll, J. E., 1987, Theory of financial decision making, Rowman and Littlefield.
Nantell, T. J., and B. Price, 1979, “An analytical comparison of variance and semivariance capital market theories,” Journal of Financial and Quantitative Analysis, 14, 221–242.
Shefrin, H., 2005, A behavioral approach to asset pricing, Elsevier Academic Press, Burlington, MA.


6 CAPM Extensions

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 14 and 16

6.1 Background Risk

This section discusses the portfolio problem when there is “background risk.” For instance, it often makes sense to treat labour income, social security payments and perhaps also real estate as (more or less) background risk. The same applies to the value of a liability stream. A target retirement wealth or planned future house purchase can be thought of as a virtual liability.
The existence of background risk will typically affect the portfolio choice and therefore also asset prices, at least as long as the background risk is correlated with some assets. The intuition is that the assets will be used to hedge against the background risk.
6.1.1 Portfolio Choice with Background Risk

To build a simple example, consider a mean-variance investor who can choose between a riskfree asset (with return $R_f$) and equity (with return $R_1$). He also has a background risk in the form of an endowment (positive or negative) of an asset (with return $R_H$). This could, for instance, be labour income or a house (positive endowment) or perhaps the present value of a liability stream (negative endowment). The investor’s portfolio problem is to maximize
$$\operatorname{E}U(R_p) = \operatorname{E}(R_p) - \frac{k}{2}\operatorname{Var}(R_p), \text{ where} \tag{6.1}$$
$$R_p = vR_1 + \phi R_H + (1-v-\phi)R_f \tag{6.2}$$
$$\phantom{R_p} = vR_1^e + \phi R_H^e + R_f. \tag{6.3}$$
Note that $\phi$ is the portfolio weight of the background risk (which is not a choice variable, but rather an “endowment”) and $1-\phi$ is the weight of the financial portfolio (riskfree plus equity). Recall that $\phi$ is negative if the background risk is a liability (so the investor is endowed with a short position in the background risk).
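Use the budget constraint in the objective function to get (using the fact that $R_f$ is known)
$$\operatorname{E}U(R_p) = v\mu_1^e + \phi\mu_H^e + R_f - \frac{k}{2}\left(v^2\sigma_{11} + \phi^2\sigma_{HH} + 2v\phi\sigma_{1H}\right), \tag{6.4}$$
where $\sigma_{11}$ and $\sigma_{HH}$ are the variances of equity and the background risk respectively, and $\sigma_{1H}$ is their covariance.
The first order condition for the weight on equity, $v$, is $\partial\operatorname{E}U(R_p)/\partial v = 0$, that is,
$$0 = \mu_1^e - k\left(v\sigma_{11} + \phi\sigma_{1H}\right), \text{ so } v = \frac{\mu_1^e/k - \phi\sigma_{1H}}{\sigma_{11}}. \tag{6.5}$$
Notice that the second term, $-\phi\sigma_{1H}/\sigma_{11}$ (also called the “hedging term”), depends on how important the background risk is in the portfolio ($\phi$) and the “beta” of the background risk from a regression
$$R_H^e = \alpha + \beta R_1^e + \varepsilon, \text{ since } \beta = \sigma_{1H}/\sigma_{11}. \tag{6.6}$$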

Essentially, the hedging term is related to how equity can help us create a hedge against the background risk. If the beta is positive, then equity tends to move in the same direction as the background risk, so a short equity position eliminates a lot of a positive exposure ($\phi > 0$) to the background risk, and vice versa.
It is also interesting that the optimal portfolio weight (6.5) does not depend on the expected return on the background risk. This might seem somewhat unintuitive. After all, if an investor is rich as a troll (yes, according to Scandinavian legends, trolls are supposed to be rich) then he ought to be able to carry more risk. However, that is not how mean-variance preferences work. Rather, those preferences say something about how much extra mean return the investor requires in order to carry a certain amount of extra volatility. (The answer does not depend on the general level of mean returns since the preferences are linear in both the portfolio mean return and variance.)
The presence of background risk has important consequences for the portfolio weights of the financial subportfolio. This subportfolio has the weights $w = v/(1-\phi)$ on equity and $w_f = (1-v-\phi)/(1-\phi)$ on the riskfree asset (summing to unity). By using (6.5), these weights are
$$w = \frac{v}{1-\phi} = \frac{\mu_1^e/k - \phi\sigma_{1H}}{(1-\phi)\sigma_{11}} \text{ and} \tag{6.7}$$
$$w_f = 1 - w. \tag{6.8}$$
First, when the covariance is zero ($\sigma_{1H} = 0$), the equity weight is increasing in the amount of background risk ($\phi$), while the opposite holds for the riskfree asset. The intuition is that a zero covariance means that the background risk is quite similar to a bond: having an endowment of a bond-like asset in the overall portfolio means that the financial portfolio should be tilted away from actual bonds.
Second, when the covariance is positive ($\sigma_{1H} > 0$) and we have a positive exposure to the background risk ($\phi > 0$), the hedging term (second term) will tilt the financial portfolio away from equity and towards the safe asset. The intuition is that the overall portfolio now includes a lot of “equity like” assets, so the financial portfolio should be tilted towards bonds. The opposite holds when the exposure to the background risk is negative (a liability, $\phi < 0$) or when the background risk is negatively correlated with equity ($\sigma_{1H} < 0$, assuming a positive exposure, $\phi > 0$).
Example 6.1 (Portfolio choice with background risk) Suppose $k = 3$, $\mu_1^e = 0.08$ and $\sigma_{11} = 0.2^2$, then (6.5) gives

                                                  v1      w1
Case A ($\phi = 0$)                               0.67    0.67
Case B ($\phi = 0.5$, $\sigma_{1H} = 0$)          0.67    1.33
Case C ($\phi = 0.5$, $\sigma_{1H} = 0.01$)       0.54    1.08

Comparing cases A and B, we see that adding background risk that is uncorrelated with equity tilts the financial portfolio towards equity. Comparing cases B and C, we see that this effect is less pronounced if the background risk is positively correlated with equity.
Example 6.2 (Portfolio choice with a liability) Continuing Example 6.1, suppose now that the background risk is a liability (short position). Then (6.5) gives

                                                  v1      w1
Case D ($\phi = -0.5$, $\sigma_{1H} = 0$)         0.67    0.44
Case E ($\phi = -0.5$, $\sigma_{1H} = 0.01$)      0.79    0.53


Comparing cases A and D, we see that adding a liability risk that is uncorrelated with equity tilts the financial portfolio towards bonds. The reason is that the liability is like a short position in bonds which we cover by buying more actual bonds. Comparing cases D and E, we see that a liability risk that is positively correlated with equity tilts the financial portfolio towards equity. The reason is that the liability is now like a short position in equity which we cover by buying more equity.
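The five cases of Examples 6.1 and 6.2 follow from two lines of arithmetic: the equity weight in the overall portfolio, $v = (\mu_1^e/k - \phi\sigma_{1H})/\sigma_{11}$, and the weight within the financial subportfolio, $w = v/(1-\phi)$. The sketch below recomputes them; the function name `equity_weights` and the dictionary layout are illustrative, while the inputs are those stated in the examples (with $\phi = -0.5$ for the liability cases).

```python
def equity_weights(k, mu1e, sigma11, phi=0.0, sigma1h=0.0):
    """Return (v, w): equity weight in the overall portfolio and in the
    financial subportfolio, for background-risk weight phi."""
    v = (mu1e / k - phi * sigma1h) / sigma11
    w = v / (1.0 - phi)
    return v, w

# (phi, sigma_1H) for each case in Examples 6.1 and 6.2
cases = {
    "A": (0.0, 0.0),
    "B": (0.5, 0.0),
    "C": (0.5, 0.01),
    "D": (-0.5, 0.0),
    "E": (-0.5, 0.01),
}
results = {
    name: equity_weights(3.0, 0.08, 0.2**2, phi, s1h)
    for name, (phi, s1h) in cases.items()
}
for name, (v, w) in results.items():
    print(f"Case {name}: v = {v:.2f}, w = {w:.2f}")
```

Changing `phi` or `sigma1h` shows how the size and sign of the hedging term move the financial portfolio between equity and the riskfree asset.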
Example 6.3 (Portfolio choice of young and old) Consider the common portfolio advice that young investors (with labour income) should invest relatively more in stocks than old investors (without labour income). In this case, the background risk is an endowment of “human capital,” that is, the present value of future labour income, and current labour income can loosely be interpreted as its return. The analysis in the previous section suggests that a low correlation of stock returns and wages means that the young investor is endowed with a bond-like asset. His financial portfolio will therefore be tilted towards the risky asset, compared to the old investor. (This intuition is strengthened by the fact that labour income is typically a lot less volatile than equity returns.)
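Remark 6.4 (Optimising over $w$ directly) Rewrite the portfolio return (6.2) as
$$R_p = w(1-\phi)R_1 + (1-w)(1-\phi)R_f + \phi R_H = w(1-\phi)R_1^e + Z_f, \text{ where } Z_f = (1-\phi)R_f + \phi R_H.$$
Use in the objective function (and notice that $Z_f$ is a risky asset, with mean $\mu_f$ and variance $\sigma_{ff}$) to get
$$\operatorname{E}U(R_p) = w(1-\phi)\mu_1^e + \mu_f - \frac{k}{2}\left[w^2(1-\phi)^2\sigma_{11} + \sigma_{ff} + 2w(1-\phi)\sigma_{1f}\right].$$
The first order condition with respect to $w$ gives
$$0 = \mu_1^e - k\left[w(1-\phi)\sigma_{11} + \sigma_{1f}\right], \text{ so } w = \frac{\mu_1^e/k - \sigma_{1f}}{(1-\phi)\sigma_{11}}.$$
Since $\sigma_{1f} = \operatorname{Cov}(R_1, Z_f) = \phi\sigma_{1H}$, this is the same as in (6.7).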

With several risky assets the portfolio return is
$$R_p = v'R + (1 - \mathbf{1}'v - \phi)R_f + \phi R_H, \tag{6.9}$$
where $v$ is a vector of portfolio weights, $R$ a vector of returns on the risky assets and $\mathbf{1}$ is a vector of ones (so $\mathbf{1}'v$ is the sum of the elements in the $v$ vector). In this case we get
$$v = \Sigma^{-1}(\mu^e/k - \phi S_H), \text{ and} \tag{6.10}$$
$$w = v/(1-\phi), \tag{6.11}$$
where $\Sigma$ is the covariance matrix of all assets and $S_H$ is a vector of covariances of the assets with the background risk.
Proof. (of (6.10)) The investor solves
$$\max_v\; v'\mu^e + \phi\mu_H^e + R_f - \frac{k}{2}\left(v'\Sigma v + \phi^2\sigma_{HH} + 2\phi v'S_H\right),$$
with first order conditions
$$0 = \mu^e - k(\Sigma v + \phi S_H), \text{ so } v = \Sigma^{-1}(\mu^e/k - \phi S_H).$$
As in the univariate case, the hedging term depends on betas from a regression of $R_H^e$ on the vector of risky assets ($R^e$)
$$R_H^e = \alpha + \beta'R^e + \varepsilon, \text{ since } \beta = \Sigma^{-1}S_H. \tag{6.12}$$

It can also be noted that the background risk could well be a “portfolio” of different background risks, for instance, labour income plus owning a house (positive) or a planned retirement wealth and future house purchase (negative). The properties of the elements of this portfolio matter only so far as they affect the covariances $S_H$. The portfolio weights in (6.11) will (as long as $S_H \neq 0$) give a portfolio that is off the mean-variance frontier. See Figure 6.1 for an illustration.
However, the portfolio is on the mean-variance frontier of some transformed assets $Z_i = (1-\phi)R_i + \phi R_H$. To see that, use the facts that $v = w(1-\phi)$ and $1 - \mathbf{1}'v - \phi = (1-\mathbf{1}'w)(1-\phi)$ to rewrite the portfolio return (6.9) as
$$R_p = (1-\phi)w'R + (1-\phi)(1-\mathbf{1}'w)R_f + \phi R_H = w'Z + (1-\mathbf{1}'w)Z_f, \text{ where } Z_i = (1-\phi)R_i + \phi R_H, \tag{6.13}$$
with $Z$ the vector of the $Z_i$ and $Z_f = (1-\phi)R_f + \phi R_H$.
Maximizing the objective function (6.1) subject to this new definition of the portfolio return is a standard mean-variance problem, but in terms of the transformed assets $Z_i$ (which are all risky). Therefore, the optimal portfolio will be on the mean-variance frontier of these transformed assets. See Figure 6.1 for an illustration.
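Example 6.5 (Portfolio choice, two traded assets and background risk) With two risky traded assets and background risk the investor maximizes $\operatorname{E}(R_p) - \frac{k}{2}\operatorname{Var}(R_p)$, where $R_p = v_1R_1^e + v_2R_2^e + \phi R_H^e + R_f$, that is
$$\max_{v_1,v_2}\; v_1\mu_1^e + v_2\mu_2^e + \phi\mu_H^e + R_f - \frac{k}{2}\left(v_1^2\sigma_{11} + v_2^2\sigma_{22} + \phi^2\sigma_{HH} + 2v_1v_2\sigma_{12} + 2v_1\phi\sigma_{1H} + 2v_2\phi\sigma_{2H}\right).$$
The first order conditions are
$$0 = \mu_1^e - k\left[v_1\sigma_{11} + v_2\sigma_{12} + \phi\sigma_{1H}\right]$$
$$0 = \mu_2^e - k\left[v_2\sigma_{22} + v_1\sigma_{12} + \phi\sigma_{2H}\right],$$
or
$$\begin{bmatrix}\mu_1^e\\ \mu_2^e\end{bmatrix} = k\begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{bmatrix}\begin{bmatrix}v_1\\ v_2\end{bmatrix} + k\phi\begin{bmatrix}\sigma_{1H}\\ \sigma_{2H}\end{bmatrix}.$$
The solution is
$$\begin{bmatrix}v_1\\ v_2\end{bmatrix} = \frac{1}{\sigma_{11}\sigma_{22}-\sigma_{12}^2}\begin{bmatrix}\sigma_{22} & -\sigma_{12}\\ -\sigma_{12} & \sigma_{11}\end{bmatrix}\left(\frac{1}{k}\begin{bmatrix}\mu_1^e\\ \mu_2^e\end{bmatrix} - \phi\begin{bmatrix}\sigma_{1H}\\ \sigma_{2H}\end{bmatrix}\right).$$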
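The two-asset solution of Example 6.5 can be coded with an explicit $2\times 2$ inverse. The sketch below computes $v = \Sigma^{-1}(\mu^e/k - \phi S_H)$ and then checks that the first order conditions hold at the solution. All numerical inputs (risk aversion, expected excess returns, covariances, $\phi$) are hypothetical, as are the function names.

```python
def solve_two_assets(k, mu_e, cov, phi, s_h):
    """v = Sigma^{-1} (mu_e / k - phi * S_H) for two risky assets."""
    s11, s12 = cov[0]
    s22 = cov[1][1]
    det = s11 * s22 - s12**2
    rhs1 = mu_e[0] / k - phi * s_h[0]
    rhs2 = mu_e[1] / k - phi * s_h[1]
    v1 = (s22 * rhs1 - s12 * rhs2) / det
    v2 = (-s12 * rhs1 + s11 * rhs2) / det
    return v1, v2

def foc_residuals(k, mu_e, cov, phi, s_h, v):
    """Left-hand sides of the two first order conditions (zero at the optimum)."""
    s11, s12 = cov[0]
    s22 = cov[1][1]
    r1 = mu_e[0] - k * (v[0] * s11 + v[1] * s12 + phi * s_h[0])
    r2 = mu_e[1] - k * (v[1] * s22 + v[0] * s12 + phi * s_h[1])
    return r1, r2

# Hypothetical inputs: asset 1 covaries with the background risk, asset 2 does not
k = 3.0
mu_e = (0.08, 0.05)
cov = ((0.04, 0.01), (0.01, 0.0225))
s_h = (0.01, 0.0)

v_with = solve_two_assets(k, mu_e, cov, phi=0.5, s_h=s_h)
v_without = solve_two_assets(k, mu_e, cov, phi=0.0, s_h=s_h)
residuals = foc_residuals(k, mu_e, cov, 0.5, s_h, v_with)
```

With these inputs the weight on asset 1 is lower when the background risk is present, since that asset duplicates an exposure the investor already has.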

Example 6.6 (Portfolio choice of a pharmaceutical engineer) In the previous example, suppose asset 1 is an index of pharmaceutical stocks, and asset 2 is the rest of the equity market. Consider a person working as a pharmaceutical engineer: the covariance of her labour income with asset 1 is likely to be high, while the covariance with asset 2 might be fairly small. This person should therefore tilt her financial portfolio away from pharmaceutical stocks: the market portfolio is not the best for everyone.

6.1.2 Asset Pricing Implications of Background Risk

The beta representation of expected returns is also affected by the existence of background risk. Let $R_m$ denote the market portfolio of the marketable assets (whose weights are proportional to (6.10)). We then have
$$\mu_i^e = \tilde\beta_i\mu_m^e, \text{ where } \tilde\beta_i = \frac{\sigma_{im} + \phi(\sigma_{iH} - \sigma_{im})}{\sigma_{mm} + \phi(\sigma_{mH} - \sigma_{mm})}. \tag{6.14}$$

[Figure 6.1: Portfolio choice with background risk. Left panel: MV frontier of original assets, marking the original assets and the optimal portfolio with background risk. Right panel: MV frontier of transformed assets, marking the transformed assets and the optimal portfolio. Axes: Std, % (horizontal), mean, % (vertical).]

This coincides with the standard case when $\phi = 0$ (no background risk) or when both asset $i$ and the market are uncorrelated with the background risk. This expression suggests one reason for why the traditional beta (against the market portfolio only) could be biased. For instance, if the market is positively correlated with $R_H$, but asset $i$ is negatively correlated with $R_H$, then $\tilde\beta_i$ is lower than the traditional beta.
Proof. (of (6.14)) Divide the portfolio weights in (6.10) by $1-\phi$ to get the weights of the (financial) market portfolio, $w_m$. For any portfolio with portfolio weights $w_p$ we have the covariance with the market
$$\sigma_{pm} = w_p'\Sigma w_m = w_p'\Sigma\Sigma^{-1}(\mu^e/k - \phi S_H)/(1-\phi) = \frac{\mu_p^e}{k(1-\phi)} - \frac{\phi\sigma_{pH}}{1-\phi}.$$
Apply this equation to the market return itself to get
$$\sigma_{mm} = \frac{\mu_m^e}{k(1-\phi)} - \frac{\phi\sigma_{mH}}{1-\phi}.$$
Combine these two equations as
$$\frac{\sigma_{pm} + \phi\sigma_{pH}/(1-\phi)}{\sigma_{mm} + \phi\sigma_{mH}/(1-\phi)} = \frac{\mu_p^e}{\mu_m^e},$$
which can be rearranged as (6.14).
Notice that a standard CAPM regression of
$$R_i^e = \alpha_i + b_iR_m^e + \varepsilon_i, \tag{6.15}$$
would produce (in a very large sample) the traditional beta ($b_i = \beta_i = \sigma_{im}/\sigma_{mm}$) and a non-zero intercept equal to
$$\alpha_i = (\tilde\beta_i - \beta_i)\mu_m^e. \tag{6.16}$$
A rejection of the null that the intercept is zero (a rejection of CAPM) could then be due to the existence of background risk. (There are clearly several other possible reasons.)
Proof. (of (6.16)) Take expectations of (6.15) to get $\mu_i^e = \alpha_i + \beta_i\mu_m^e$. From (6.14) we then have $\tilde\beta_i\mu_m^e = \alpha_i + \beta_i\mu_m^e$, which gives (6.16).
Example 6.7 (Different betas) Suppose $\sigma_{im} = 0.8$, $\sigma_{mm} = 1$, $\sigma_{iH} = -0.5$, and $\sigma_{mH} = 0.5$. Then (6.14) gives
$$\tilde\beta_i = \begin{cases}\dfrac{0.8}{1} = 0.8 & \text{if } \phi = 0\\[2mm] \dfrac{0.8 + 0.3(-0.5 - 0.8)}{1 + 0.3(0.5 - 1)} \approx 0.48 & \text{if } \phi = 0.3.\end{cases}$$
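The adjusted beta in (6.14) and the implied CAPM intercept in (6.16) are simple enough to compute directly. The sketch below evaluates $\tilde\beta_i = [\sigma_{im} + \phi(\sigma_{iH}-\sigma_{im})]/[\sigma_{mm} + \phi(\sigma_{mH}-\sigma_{mm})]$ for the covariances of Example 6.7. The function names and the market risk premium of 6% used for the intercept are assumptions for illustration only.

```python
def adjusted_beta(s_im, s_mm, s_ih, s_mh, phi):
    """Beta adjusted for background risk, as in (6.14)."""
    return (s_im + phi * (s_ih - s_im)) / (s_mm + phi * (s_mh - s_mm))

def capm_intercept(s_im, s_mm, s_ih, s_mh, phi, mu_m_excess):
    """Intercept (adjusted beta - traditional beta) * market premium, as in (6.16)."""
    traditional = s_im / s_mm
    return (adjusted_beta(s_im, s_mm, s_ih, s_mh, phi) - traditional) * mu_m_excess

# Covariances of Example 6.7
beta_no_background = adjusted_beta(0.8, 1.0, -0.5, 0.5, phi=0.0)
beta_background = adjusted_beta(0.8, 1.0, -0.5, 0.5, phi=0.3)
# Hypothetical market risk premium of 6%
alpha_background = capm_intercept(0.8, 1.0, -0.5, 0.5, 0.3, mu_m_excess=0.06)
```

With $\phi = 0$ the function returns the traditional beta, and with a positive $\phi$ the adjusted beta is lower here because the asset hedges the background risk while the market does not, so the CAPM regression would show a negative intercept.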

There is also another way to express the expected excess return of asset $i$: as a multifactor model (or multi-beta model),
$$\mu_i^e = \beta_{im}\mu_m^e + \beta_{iH}\mu_H^e. \tag{6.17}$$
In this case, the expected excess return on asset $i$ depends on how it is related to both the (financial) market and the background risk. The key implication of (6.17) is that there are two risk factors that influence the required risk premium of asset $i$: both the market and the background risk matter. The investor’s portfolio choice will typically depend on the background risk, which in turn will affect asset prices (and returns).
It may seem as if we now have a paradox: both the “adjusted” single-beta representation (6.14) and the multiple-beta representation (6.17) are supposedly true. Can that really be the case, and how should we then test the model? Well, both expressions are true, but there is a key difference: the betas in (6.17) could be estimated by a multiple regression, whereas $\tilde\beta_i$ in (6.14) could not.
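Proof. (of (6.17)) The first equation of the Proof of (6.14) can be written
$$\mu_p^e/k = (1-\phi)\sigma_{pm} + \phi\sigma_{pH} \tag{*}$$
$$= \begin{bmatrix}1-\phi & \phi\end{bmatrix}\begin{bmatrix}\sigma_{pm}\\ \sigma_{pH}\end{bmatrix}$$
$$= \begin{bmatrix}1-\phi & \phi\end{bmatrix}\begin{bmatrix}\sigma_{mm} & \sigma_{mH}\\ \sigma_{mH} & \sigma_{HH}\end{bmatrix}\begin{bmatrix}\sigma_{mm} & \sigma_{mH}\\ \sigma_{mH} & \sigma_{HH}\end{bmatrix}^{-1}\begin{bmatrix}\sigma_{pm}\\ \sigma_{pH}\end{bmatrix}$$
$$= \begin{bmatrix}(1-\phi)\sigma_{mm} + \phi\sigma_{mH} & (1-\phi)\sigma_{mH} + \phi\sigma_{HH}\end{bmatrix}\begin{bmatrix}\beta_{pm}\\ \beta_{pH}\end{bmatrix}. \tag{**}$$
The third line just multiplies and divides by the covariance matrix. The fourth line follows from the usual definition of regression coefficients, $\beta = \operatorname{Var}(x)^{-1}\operatorname{Cov}(x,y)$.
Apply the first equation (*) on the market return and an asset with the same return as $R_H$ (this is a short cut; it would be more precise to use a “factor mimicking” portfolio, but that is a bit more complicated). We then get
$$\mu_m^e/k = (1-\phi)\sigma_{mm} + \phi\sigma_{mH} \text{ and}$$
$$\mu_H^e/k = (1-\phi)\sigma_{mH} + \phi\sigma_{HH}.$$
Use these to substitute for the row vector in (**) to get
$$\mu_p^e/k = \begin{bmatrix}\mu_m^e/k & \mu_H^e/k\end{bmatrix}\begin{bmatrix}\beta_{pm}\\ \beta_{pH}\end{bmatrix},$$
which is the same as (6.17).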

6.2 Heterogeneous Investors

This section gives a simple example of a model where the investors have different beliefs.

Recall the simple MV problem where investor $i$ solves
$$\max_\alpha\; \operatorname{E}_iR_p - \operatorname{Var}_i(R_p)k_i/2, \text{ subject to} \tag{6.18}$$
$$R_p = \alpha R_m^e + R_f. \tag{6.19}$$
In these expressions, the expectations, variance, and the risk aversion parameter all carry the subscript $i$ to indicate that they may differ between investors. The solution is that the weight on the risky asset is
$$\alpha_i = \frac{1}{k_i}\frac{\operatorname{E}_iR_m^e}{\operatorname{Var}_i(R_m^e)}, \tag{6.20}$$
where $\operatorname{E}_iR_m^e$ is the investor’s expectation of the excess return of the risky asset and $\operatorname{Var}_i(R_m^e)$ the investor’s perceived variance.
If all investors have the same initial wealth, then the average (across investors) $\alpha_i$ must be unity, since the riskfree asset is in zero net supply. Suppose there are $N$ investors, then the average of (6.20) is
$$1 = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{k_i}\frac{\operatorname{E}_iR_m^e}{\operatorname{Var}_i(R_m^e)}. \tag{6.21}$$
This is an equilibrium condition that must hold. We consider a few illustrative special cases.
First, suppose all investors have the same expectations and assessments of the variance, but different risk aversions, $k_i$. Then, (6.21) can be rearranged as
$$\operatorname{E}R_m^e = \tilde k\operatorname{Var}(R_m^e), \text{ where } \tilde k = \frac{1}{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{k_i}}. \tag{6.22}$$
This shows that the risk premium on the market is increasing in the volatility and $\tilde k$. The latter is not the average risk aversion, but closely related to it. For instance, if all $k_i$ are scaled up by a factor $b$, so is $\tilde k$ (and therefore the risk premium).
Example 6.8 (“Average” risk aversion) If half of the investors have $k = 2$ and the other half have $k = 3$, then $\tilde k = 2.4$.
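The aggregate risk aversion in (6.22) is the harmonic mean of the individual risk aversions. A minimal sketch, with a hypothetical function name, reproduces Example 6.8 and the scaling property mentioned above.

```python
def aggregate_risk_aversion(risk_aversions):
    """k-tilde = 1 / (average of 1/k_i), the harmonic mean in (6.22)."""
    n = len(risk_aversions)
    return 1.0 / (sum(1.0 / k for k in risk_aversions) / n)

# Example 6.8: half of the investors have k = 2, the other half k = 3
k_tilde = aggregate_risk_aversion([2.0, 3.0])
# Scaling every k_i by b = 2 scales k-tilde by the same factor
k_tilde_scaled = aggregate_risk_aversion([4.0, 6.0])
print(f"k-tilde: {k_tilde:.2f}")
```

The harmonic mean is below the arithmetic mean of 2.5, so the less risk averse investors get more weight in determining the market risk premium.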
Second, suppose now that only the expected excess return is the same for all investors. Then, (6.21) can be rearranged as
$$\operatorname{E}R_m^e = \frac{1}{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{k_i\operatorname{Var}_i(R_m^e)}}. \tag{6.23}$$
The market risk premium is now increasing in a complicated expression that is closely related to a weighted average of the perceived market variances, where the weights are increasing in the risk aversion. If all variances or risk aversions are scaled up by a factor $b$, so is the risk premium.
Third, suppose only the expected excess returns differ. Then, (6.21) can be rearranged as
$$\frac{1}{N}\sum_{i=1}^{N}\operatorname{E}_iR_m^e = k\operatorname{Var}(R_m^e). \tag{6.24}$$
Clearly, the average expected excess return is increasing in the risk aversion and variance. To interpret this a bit more, let the return be the capital gain (assuming no dividend in the next period), $R_m = P_{t+1}/P_t$, where the current period is $t$
$$\frac{1}{N}\sum_{i=1}^{N}\operatorname{E}_i\left(\frac{P_{t+1}}{P_t}\right) - R_f = k\operatorname{Var}(R_m^e) \text{ or} \tag{6.25}$$
$$P_t = \frac{1}{k\operatorname{Var}(R_m^e) + R_f}\,\frac{1}{N}\sum_{i=1}^{N}\operatorname{E}_i(P_{t+1}). \tag{6.26}$$
This shows that today’s market price, $P_t$, is simply the average expected future price, scaled down by the risk aversion, volatility and the riskfree rate (to create a capital gain to compensate for the risk and the alternative return).
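The pricing expression (6.26) can be sketched in a few lines. The example below assumes, as (6.25) and (6.26) imply, that $R_m = P_{t+1}/P_t$ and $R_f$ are both measured as gross returns; the investors' price expectations, the risk aversion, the variance and the riskfree rate are hypothetical numbers, and the function name is illustrative.

```python
def equilibrium_price(expected_prices, k, var_m, gross_rf):
    """P_t = average expected P_{t+1} / (k * Var(R_m^e) + R_f), as in (6.26),
    with R_f measured as a gross return (assumption)."""
    average_expectation = sum(expected_prices) / len(expected_prices)
    return average_expectation / (k * var_m + gross_rf)

# Hypothetical beliefs of three investors about next period's price
beliefs = [100.0, 110.0, 120.0]
price_today = equilibrium_price(beliefs, k=3.0, var_m=0.04, gross_rf=1.02)

# Average expected gross return minus the riskfree rate, as in (6.25)
average_premium = (sum(beliefs) / len(beliefs)) / price_today - 1.02
```

By construction, the average expected excess return equals $k\operatorname{Var}(R_m^e)$, here $3 \times 0.04 = 0.12$, so a higher risk aversion or variance lowers today's price for given beliefs.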
These special cases suggest that, although the general expression (6.21) is complicated, we are unlikely to commit serious errors by sticking to the formulation
$$\operatorname{E}R_m^e = k\operatorname{Var}(R_m^e), \tag{6.27}$$
as long as we interpret the components as (close to) averages across investors.

6.3

CAPM without a Riskfree Rate

This section states the main result for CAPM when there is no riskfree asset. It uses two basic ingredients.
First, suppose investors behave as if they had mean-variance preferences, so they
134

choose portfolios on the mean-variance frontier (of risky assets only). Different investors may have different portfolios, but they are all on the mean-variance frontier. The market portfolio is a weighted average of these individual portfolios, and therefore itself on the mean-variance frontier. (Linear combinations of efficient portfolios are also efficient.)
Second, consider the market portfolio. We know that we can find some other efficient portfolio (denote it $R_z$) that has a zero covariance (beta) with the market portfolio, $\mathrm{Cov}(R_m, R_z) = 0$. (Such a portfolio can actually be found for any efficient portfolio, not just the market portfolio.) Let $v_m$ be the portfolio weights of the market portfolio, and $\Sigma$ the variance-covariance matrix of all assets. Then, the portfolio weights $v_z$ that generate $R_z$ must satisfy $v_m' \Sigma v_z = 0$ and $v_z' \mathbf{1} = 1$ (sum to unity). The intuition for the portfolio weights of $R_z$ is that some of the weights have the same sign as in the market portfolio (contributing to a positive covariance) and some others have the opposite sign (contributing to a negative covariance). Together, this gives a zero covariance.
See Figure 6.2 for an illustration.
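The two conditions on $v_z$ can be solved by hand when there are only two risky assets. The sketch below does that; the function names are hypothetical, and the inputs are illustrative rounded numbers (a diagonal covariance matrix and market weights of roughly one half each), so the output should not be expected to match any figure exactly.

```python
def portfolio_cov(v_a, v_b, cov):
    """Covariance between two portfolios, v_a' * cov * v_b."""
    n = len(v_a)
    return sum(v_a[i] * cov[i][j] * v_b[j] for i in range(n) for j in range(n))

def zero_beta_weights_two_assets(v_m, cov):
    """Solve v_m' * cov * v_z = 0 together with v_z[0] + v_z[1] = 1."""
    # c_j is the covariance of asset j with the market portfolio
    c1 = v_m[0] * cov[0][0] + v_m[1] * cov[1][0]
    c2 = v_m[0] * cov[0][1] + v_m[1] * cov[1][1]
    if abs(c1 - c2) < 1e-15:
        raise ValueError("no zero-beta portfolio: both assets covary equally with the market")
    v1 = -c2 / (c1 - c2)   # from c1*v1 + c2*(1 - v1) = 0
    return [v1, 1.0 - v1]

cov = [[0.026, 0.0], [0.0, 0.014]]   # illustrative covariance matrix
v_m = [0.47, 0.53]                   # illustrative (rounded) market weights

v_z = zero_beta_weights_two_assets(v_m, cov)
covariance_with_market = portfolio_cov(v_m, v_z, cov)
print(v_z, covariance_with_market)
```

With positive market weights and uncorrelated assets, one of the two zero-beta weights has to be negative, which is the sign pattern described in the text.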
The main result is then the “zero-beta” CAPM
$$
\mathrm{E}(R_i - R_z) = \beta_i \mathrm{E}(R_m - R_z). \tag{6.28}
$$

Suppose we run the CAPM regression (6.15). We then get (in a very large sample)
$$
\alpha_i = (1 - \beta_i) \mu_z^e, \tag{6.29}
$$
where $\mu_z^e$ is the expected excess return of $R_z$ (over the riskfree rate used in the CAPM regression). This suggests that a rejection of CAPM might be due to the fact that investors cannot borrow and lend freely at a riskfree rate.
Proof. (of (6.29)) Subtract $R_f$ from both sides of (6.28), then add and subtract $(1 - \beta_i) R_f$ on the right hand side. Rearrange to get (6.29).
Figure 6.2: Zero-beta model. [MV frontier (mean against standard deviation) with $R_m$, $R_z$ and $\mathrm{E}(R_z)$ marked. Means: 0.09, 0.06. Covariance matrix: 0.026, 0.000; 0.000, 0.014. Weights of $R_m$: 0.47, 0.53. Weights of $R_z$: −1.67, 2.67.]

Proof. (of (6.28)) An investor (with initial wealth equal to unity) chooses the portfolio weights ($v_i$) to maximize
$$
\mathrm{E}\, U(R_p) = \mathrm{E}(R_p) - \frac{k}{2} \mathrm{Var}(R_p), \text{ where } R_p = v_1 R_1 + v_2 R_2 \text{ and } v_1 + v_2 = 1,
$$
where we assume two risky assets. Combining gives the Lagrangian
$$
L = v_1 \mu_1 + v_2 \mu_2 - \frac{k}{2}\left(v_1^2 \sigma_{11} + v_2^2 \sigma_{22} + 2 v_1 v_2 \sigma_{12}\right) + \lambda (1 - v_1 - v_2).
$$
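The first order conditions (for $v_1$ and $v_2$) are that the partial derivatives equal zero
$$
\begin{aligned}
0 &= \partial L/\partial v_1 = \mu_1 - k (v_1 \sigma_{11} + v_2 \sigma_{12}) - \lambda \\
0 &= \partial L/\partial v_2 = \mu_2 - k (v_2 \sigma_{22} + v_1 \sigma_{12}) - \lambda \\
0 &= \partial L/\partial \lambda = 1 - v_1 - v_2.
\end{aligned}
$$
Notice that
$$
\sigma_{1m} = \mathrm{Cov}(R_1, \underbrace{v_1 R_1 + v_2 R_2}_{R_m}) = v_1 \sigma_{11} + v_2 \sigma_{12},
$$
and similarly for $\sigma_{2m}$. We can then rewrite the first order conditions as
$$
\begin{aligned}
0 &= \mu_1 - k \sigma_{1m} - \lambda \qquad \text{(a)} \\
0 &= \mu_2 - k \sigma_{2m} - \lambda \\
0 &= 1 - v_1 - v_2.
\end{aligned}
$$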

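Take a weighted average of the first two equations with the weights $v_1$ and $v_2$ respectively
$$
\begin{aligned}
v_1 \mu_1 + v_2 \mu_2 - \lambda &= k (v_1 \sigma_{1m} + v_2 \sigma_{2m}) \\
\mu_m - \lambda &= k \sigma_{mm}, \qquad \text{(b)}
\end{aligned}
$$
which follows from the fact that
$$
\begin{aligned}
v_1 \sigma_{1m} + v_2 \sigma_{2m} &= v_1 \mathrm{Cov}(R_1, v_1 R_1 + v_2 R_2) + v_2 \mathrm{Cov}(R_2, v_1 R_1 + v_2 R_2) \\
&= \mathrm{Cov}(v_1 R_1 + v_2 R_2, v_1 R_1 + v_2 R_2) \\
&= \mathrm{Var}(R_m).
\end{aligned}
$$
Divide (a) by (b)
$$
\frac{\mu_1 - \lambda}{\mu_m - \lambda} = \frac{k \sigma_{1m}}{k \sigma_{mm}} \quad \text{or} \quad \mu_1 - \lambda = \beta_1 (\mu_m - \lambda).
$$
Applying this equation to a return $R_z$ with a zero beta (against the market) gives
$$
\mu_z - \lambda = 0 \cdot (\mu_m - \lambda), \text{ so we notice that } \lambda = \mu_z.
$$
Combining the last two equations gives (6.28).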

6.4 Multi-Factor Models and APT

6.4.1 Multi-Factor Models

A multi-factor model extends the market model by allowing more factors to explain the return on an asset. In terms of excess returns it could be
$$
R_i^e = \beta_{im} R_m^e + \beta_{iF} R_F^e + \varepsilon_i, \text{ where } \mathrm{E}(\varepsilon_i) = 0, \ \mathrm{Cov}(R_m^e, \varepsilon_i) = 0, \ \mathrm{Cov}(R_F^e, \varepsilon_i) = 0. \tag{6.30}
$$
The pricing implication is a multi-beta model
$$
\mu_i^e = \beta_{im} \mu_m^e + \beta_{iF} \mu_F^e. \tag{6.31}
$$

Remark 6.9 (When factors are not excess returns) This formulation assumes that the factor can be expressed as an excess return, but that is not necessary. For instance, it could be that the second factor is a macro variable like inflation surprises. Then there are two possible ways to proceed. First, find the portfolio which best mimics the movements in the inflation surprises and use the excess return of that (factor mimicking) portfolio in (6.30) and (6.31). Second, we could instead reformulate the model by adding an intercept in (6.31) and let $R_F^e$ denote whatever the factor is (not necessarily an excess return) and then estimate the factor risk premium, corresponding to $\mu_F^e$ in (6.31), by using a cross-section of different assets ($i = 1, 2, \ldots$).
We have already seen one theoretical multi-factor model: the “CAPM with background risk” in (6.17). The consumption-based model (discussed later on) gives another example. There are also several empirically motivated multi-factor models, that is, empirical models that have been found to work well (even if the theoretical foundation might be a bit weak).
Fama and French (1993) estimate a multi-factor model and show that it performs much better than CAPM. The three factors are: the market return, the return on a portfolio of small stocks minus the return on a portfolio of big stocks, and the return on a portfolio with a high ratio of book value to market value minus the return on a portfolio with a low ratio. He and Ng (1994) try to relate these factors to macroeconomic series.
The multi-factor model by MSCIBarra is widely used in the financial industry. It uses a set of firm characteristics (rather than macro variables) as factors, for instance, size, volatility, price momentum, and industry/country (see Stefek (2002)). This model is often used to value firms without a price history (for instance, before an IPO) or to find mispriced assets.
The APT model (see below) is another motivation for why a multi-factor model may make sense. Finally, consumption-based models typically also suggest multi-factor models (in terms of macro variables).

6.4.2 The Arbitrage Pricing Model

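The first assumption of the Arbitrage Pricing Theory (APT) is that the return of asset $i$ can be described as
$$
R_{it} = a_i + \beta_i f_t + \varepsilon_{it}, \text{ where } \mathrm{E}\, \varepsilon_{it} = 0, \ \mathrm{Cov}(\varepsilon_{it}, f_t) = \mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0. \tag{6.32}
$$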

In this particular formulation there is only one factor, $f_t$, but the APT allows for more factors. Notice that (6.32) assumes that any correlation of two assets ($i$ and $j$) is due to movements in $f_t$: the residuals are assumed to be uncorrelated. This is clearly an index model (here a single index).
The second assumption of APT is that financial markets are very well developed, so well developed that it is possible to form portfolios that “insure” against almost all possible outcomes. To be precise, the assumption is that it is possible to form a zero cost portfolio (buy some, sell some) that has a zero sensitivity to the factor and also (almost) no idiosyncratic risk. In essence, this assumes that we can form a (non-trivial) zero-cost portfolio of the risky assets that is riskfree. In formal terms, the assumption is that there is a non-trivial portfolio (with the value $v_j$ of the position in asset $j$) such that $\sum_{i=1}^{N} v_i = \sum_{i=1}^{N} v_i \beta_i = 0$ and $\sum_{i=1}^{N} v_i^2 \mathrm{Var}(\varepsilon_{it}) \approx 0$. The requirement that the portfolio is non-trivial means that at least some $v_j \neq 0$.
Together, these assumptions imply that (the proof is not all that simple) for well diversified portfolios we have
$$
\mathrm{E} R_{it} = R_f + \beta_i \lambda, \tag{6.33}
$$
where $\lambda$ is (typically) an unknown constant. The important feature is that there is a linear relation between the risk premium (expected excess return) of an asset and its beta. This expression generalizes to the multi-factor case.
Example 6.10 (APT with three assets) Suppose there are three well-diversified portfolios (that is, with no residual) with the following factor models
$$
\begin{aligned}
R_{1,t} &= 0.01 + 1 f_t, \\
R_{2,t} &= 0.01 + 0.25 f_t, \text{ and} \\
R_{3,t} &= 0.01 + 2 f_t.
\end{aligned}
$$
APT then holds if there is a portfolio with $v_i$ invested in asset $i$, so that the cost of the portfolio is zero (which implies that the weights must be of the form $v_1$, $v_2$, and $-v_1 - v_2$ respectively) such that the portfolio has zero sensitivity to $f_t$, that is
$$
\begin{aligned}
0 &= v_1 \times 1 + v_2 \times 0.25 + (-v_1 - v_2) \times 2 \\
&= v_1 \times (1 - 2) + v_2 \times (0.25 - 2) \\
&= -v_1 - v_2 \times 1.75.
\end{aligned}
$$
There is clearly an infinite number of such weights but they all obey the relation $v_1 = -v_2 \times 1.75$. Notice that the requirement that there is no idiosyncratic volatility is (here) satisfied by assuming that none of the three portfolios has any idiosyncratic noise.
Example 6.11 (APT with two assets) Example 6.10 would not work if we only had the first two assets. To see that, the portfolio would then have to be of the form $(v_1, -v_1)$ and it is clear that $v_1 \times 1 - v_1 \times 0.25 = v_1 (1 - 0.25) \neq 0$ for any non-trivial portfolio (that is, with $v_1 \neq 0$).
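The arithmetic of the two examples can be checked with a few lines of code. This is only a sketch: the function name is hypothetical, and the position in the second asset is treated as the free choice, with the other two positions solved from the zero-cost and zero-sensitivity conditions.

```python
def zero_cost_zero_beta_weights(betas, v2):
    """Three assets: pick v2 freely, then solve for v1 and v3 so that the
    positions sum to zero (zero cost) and the factor loading is zero."""
    b1, b2, b3 = betas
    # zero cost: v3 = -v1 - v2; zero loading: v1*b1 + v2*b2 + v3*b3 = 0
    v1 = -v2 * (b2 - b3) / (b1 - b3)
    v3 = -v1 - v2
    return v1, v2, v3

betas = (1.0, 0.25, 2.0)             # factor loadings from Example 6.10
v1, v2, v3 = zero_cost_zero_beta_weights(betas, v2=1.0)

cost = v1 + v2 + v3
loading = v1 * betas[0] + v2 * betas[1] + v3 * betas[2]

# Example 6.11: with only the first two assets the zero-cost portfolio is
# (v1, -v1), and its factor loading cannot be zero unless v1 = 0
two_asset_loading = 1.0 * betas[0] + (-1.0) * betas[1]

print(v1, v2, v3, cost, loading, two_asset_loading)
```

Any other choice of `v2` simply rescales the same portfolio, which is the sense in which there are infinitely many valid weights.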
One of the main drawbacks with APT is that it is silent about both the number of factors and their definition. In many empirical applications, the factors (or the factor mimicking portfolios) are found by some kind of statistical method. The idea is (typically) to find the combination of some given assets that explains most of the covariance of the same assets. Then, we find the next combination of the same assets that is uncorrelated with the first combination but also explains as much as possible of the (remaining) covariance, and so forth. A few such factors are often enough to account for most of the covariance. Still, the factors have no particular economic interpretation, and it is not possible to guess what the betas ought to be. To do that, we have to get back to the multi-factor model. For instance, CAPM gives the same type of implication as (6.33), except that CAPM identifies $\lambda$ as the expected excess return on the market.
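One common statistical method of this kind is principal component analysis (see Section 2.5). The sketch below is not the procedure of any particular study: it uses plain power iteration to find the first principal component of a small, hypothetical covariance matrix, and the function name and numbers are assumptions made for illustration.

```python
import math

def first_principal_component(cov, iterations=200):
    """Power iteration: returns (largest eigenvalue, unit-length eigenvector),
    that is, the asset combination that explains most of the variation."""
    n = len(cov)
    w = [1.0 + 0.1 * i for i in range(n)]   # arbitrary starting vector
    for _ in range(iterations):
        w_new = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w_new))
        w = [x / norm for x in w_new]
    eigenvalue = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return eigenvalue, w

cov = [[0.04, 0.02], [0.02, 0.04]]   # hypothetical covariance matrix of two assets

eigenvalue, weights = first_principal_component(cov)
# share of the total variance accounted for by the first component
share_explained = eigenvalue / (cov[0][0] + cov[1][1])
print(eigenvalue, weights, share_explained)
```

In this symmetric example the first component loads equally on both assets, and the remaining variation belongs to the long-short combination that is uncorrelated with it.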

6.5 Joint Portfolio and Savings Choice

6.5.1 Two-Period Problem

The basic consumption-based multi-period problem postulates that the investor derives utility from consumption in every period and that the utility in one period is additively separable from the utility in other periods. For instance, if the investor plans for 2 periods (labelled 1 and 2), then he/she chooses the amount invested in different assets to maximize expected utility
$$
\max \; u(C_1) + \delta \mathrm{E}_1 u(C_2), \text{ subject to} \tag{6.34}
$$
$$
C_1 + I_1 = W_1 \tag{6.35}
$$
$$
C_2 + I_2 = (1 + v_1 R_1^e + v_2 R_2^e + R_f) I_1. \tag{6.36}
$$

In equation (6.34) $C_t$ is consumption in period $t$. The current period (when the portfolio is chosen) is period 1, so all expectations are made on the basis of the information available in period 1. The constant $\delta$ is the time discounting, with $0 < \delta < 1$ indicating impatience. (In equilibrium without risk, we will get a positive real interest rate if investors are impatient.)
Equation (6.35) is the budget constraint for period 1: an initial wealth at the beginning of period 1, $W_1$, is split between consumption, $C_1$, and investment, $I_1$. Equation (6.36) is the budget constraint for period 2: consumption plus investment must equal the wealth at the beginning of period 2. It is clear that $I_2 = 0$ since investing in period 2 is the same as wasting resources. The wealth at the beginning of period 2 equals the investment in period 1, $I_1$, times the gross portfolio return, which in turn depends on the portfolio weights chosen in period 1 ($v_1$ and $v_2$) as well as on the returns on the assets (from holding them from period 1 to period 2).
Use the budget constraints and $I_2 = 0$ to substitute for $C_1$ and $C_2$ in (6.34) to get
$$
\max \; u(W_1 - I_1) + \delta \mathrm{E}_1 u\left[(1 + v_1 R_1^e + v_2 R_2^e + R_f) I_1\right]. \tag{6.37}
$$

The decision variables in period 1 are how much to invest, $I_1$ (which implicitly defines how much we consume in period 1), and the portfolio weights $v_1$ and $v_2$.
The first order condition for $I_1$ is
$$
-u'(C_1) + \delta \mathrm{E}_1\left[u'(C_2)(1 + v_1 R_1^e + v_2 R_2^e + R_f)\right] = 0, \tag{6.38}
$$
where $u'(C_t)$ is the marginal utility in period $t$. (In this expression, the consumption levels are substituted back in order to facilitate the interpretation.) This says that consumption should be planned so that the marginal loss of utility from decreasing $C_1$ equals the discounted expected marginal gain of utility from increasing $C_2$ by the gross return of the money saved.

Figure 6.3: Utility function. [Two panels: utility function with tangents, and marginal utility, both plotted against consumption.]

The first order conditions for $v_1$ and $v_2$ are
$$
\mathrm{E}_1\left[u'(C_2) R_1^e\right] = 0 \text{ and} \tag{6.39}
$$
$$
\mathrm{E}_1\left[u'(C_2) R_2^e\right] = 0, \tag{6.40}
$$

which say that both excess returns should be orthogonal to marginal utility. To solve for the decision variables ($I_1, v_1, v_2$) we should use the budget restrictions (6.35) and (6.36) to substitute for $C_1$ and $C_2$ in (6.38), (6.39) and (6.40), and then solve the three equations for the three unknowns. There are typically no explicit solutions, so numerical solutions are the best we can hope for.
The first order conditions still contain some useful information. In particular, recall that, by definition, $\mathrm{Cov}(x, y) = \mathrm{E}(xy) - \mathrm{E}(x)\mathrm{E}(y)$, so (6.39) can be written
$$
\mathrm{Cov}\left[u'(C_2), R_1^e\right] + \mathrm{E}\left[u'(C_2)\right] \mathrm{E}(R_1^e) = 0 \quad \text{or} \quad \mathrm{E}(R_1^e) = -\frac{\mathrm{Cov}\left[u'(C_2), R_1^e\right]}{\mathrm{E}\left[u'(C_2)\right]}. \tag{6.41}
$$
This says that asset 1 will have a high risk premium (expected excess return) if it is negatively correlated with marginal utility, that is, if it tends to have a high return when the need is low. Since marginal utility is decreasing in consumption (concave utility function), this is the same as saying that assets that tend to have high returns when consumption is high (and vice versa) will be considered risky assets and therefore carry large risk premia.
Although these results were derived from a two-period problem, it can be shown that a problem with more periods gives the same first-order conditions. In this case, the objective function is
$$
u(C_1) + \delta \mathrm{E}_1 u(C_2) + \delta^2 \mathrm{E}_1 u(C_3) + \ldots + \delta^{T-1} \mathrm{E}_1 u(C_T). \tag{6.42}
$$
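A two-state example makes the pricing condition (6.41) concrete. The sketch below assumes two equally likely states with hypothetical values for marginal utility and for the return surprise; it computes the risk premium as minus the covariance with marginal utility divided by expected marginal utility, and then verifies that the resulting excess return is orthogonal to marginal utility as in (6.39). The function name is an assumption.

```python
def required_risk_premium(probs, marginal_utility, return_shocks):
    """Expected excess return implied by (6.41): -Cov(u'(C2), R) / E[u'(C2)].
    Adding a constant to the return does not change the covariance, so only
    the state-by-state return surprises are needed."""
    e_mu = sum(p * m for p, m in zip(probs, marginal_utility))
    e_r = sum(p * r for p, r in zip(probs, return_shocks))
    cov = sum(p * (m - e_mu) * (r - e_r)
              for p, m, r in zip(probs, marginal_utility, return_shocks))
    return -cov / e_mu

probs = [0.5, 0.5]
marginal_utility = [1.0, 0.5]   # bad state (low consumption), good state
shocks = [-0.1, 0.1]            # the asset pays off in the good state

premium = required_risk_premium(probs, marginal_utility, shocks)
excess_returns = [premium + s for s in shocks]

# first order condition (6.39): E[u'(C2) * R^e] should be zero
foc = sum(p * m * r for p, m, r in zip(probs, marginal_utility, excess_returns))
print(premium, foc)
```

Because the asset does well exactly when marginal utility is low, the required premium is positive; reversing the sign of the shocks would make it negative.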

6.5.2 From a Consumption-Based Model to CAPM

Suppose marginal utility is an affine function of the market excess return
$$
u'(C_2) = a - b R_m^e, \text{ with } b > 0. \tag{6.43}
$$
This would, for instance, be the case in a Lucas model where consumption equals the market return and the utility function is quadratic, but it could be true in other cases as well. We can then write (6.41) as
$$
\mathrm{E}(R_1^e) = \frac{b \, \mathrm{Cov}(R_m^e, R_1^e)}{\mathrm{E}(a - b R_m^e)}. \tag{6.44}
$$
We can, of course, apply this expression to the market excess return (instead of asset 1) to get
$$
\mathrm{E}(R_m^e) = \frac{b \, \mathrm{Var}(R_m^e)}{\mathrm{E}(a - b R_m^e)}. \tag{6.45}
$$
Use (6.45) in (6.44) to substitute $\mathrm{E}(R_m^e)/\mathrm{Var}(R_m^e)$ for $b/\mathrm{E}(a - b R_m^e)$
$$
\mathrm{E}(R_1^e) = \frac{\mathrm{Cov}(R_m^e, R_1^e)}{\mathrm{Var}(R_m^e)} \mathrm{E}(R_m^e), \tag{6.46}
$$
which is the beta representation of CAPM.

6.5.3 From a Consumption-Based Model to a Multi-Factor Model

The consumption-based model may not look like a factor model, but it could easily be written as one. The idea is to assume that marginal utility is a linear function of some key macroeconomic variables, for instance, output ($y$) and interest rates ($i$)
$$
u'(C_2) = a y + b i. \tag{6.47}
$$
Such a formulation makes a lot of sense in most macro models, at least as an approximation. It is then possible to write (6.41) as
$$
\mathrm{E}(R_1^e) = -\frac{a \, \mathrm{Cov}(y, R_1^e) + b \, \mathrm{Cov}(i, R_1^e)}{\mathrm{E}(ay + bi)}. \tag{6.48}
$$
This, in turn, is easily put in the form of (6.31), where the risk premium on asset 1 depends on the betas against GDP and the interest rate. (See the proof of (6.17) for an idea of how to construct this beta representation.)

Bibliography
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.
Fama, E. F., and K. R. French, 1993, “Common risk factors in the returns on stocks and bonds,” Journal of Financial Economics, 33, 3–56.
He, J., and L. Ng, 1994, “Economic forces and the stock market,” Journal of Business, 4,
599–609.
Stefek, D., 2002, “The Barra integrated model,” Barra Research Insight.

7 Testing CAPM and Multifactor Models

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 15
More advanced material is denoted by a star (*). It is not required reading.

7.1 Market Model

The basic implication of CAPM is that the expected excess return of an asset ($\mu_i^e$) is linearly related to the expected excess return on the market portfolio ($\mu_m^e$) according to
$$
\mu_i^e = \beta_i \mu_m^e, \text{ where } \beta_i = \frac{\mathrm{Cov}(R_i, R_m)}{\mathrm{Var}(R_m)}. \tag{7.1}
$$
Let $R_{it}^e = R_{it} - R_{ft}$ be the return on asset $i$ in excess of the riskfree asset, and let $R_{mt}^e$ be the excess return on the market portfolio. CAPM with a riskfree return says that $\alpha_i = 0$ in
$$
R_{it}^e = \alpha_i + b_i R_{mt}^e + \varepsilon_{it}, \text{ where } \mathrm{E}\, \varepsilon_{it} = 0 \text{ and } \mathrm{Cov}(R_{mt}^e, \varepsilon_{it}) = 0. \tag{7.2}
$$
The two last conditions are automatically imposed by LS. Take expectations to get
$$
\mathrm{E} R_{it}^e = \alpha_i + b_i \mathrm{E} R_{mt}^e. \tag{7.3}
$$
Notice that the LS estimate of $b_i$ is the sample analogue to $\beta_i$ in (7.1). It is then clear that CAPM implies that $\alpha_i = 0$, which is also what empirical tests of CAPM focus on.
This test of CAPM can be given two interpretations. If we assume that Rmt is the correct benchmark (the tangency portfolio for which (7.1) is true by definition), then it is a test of whether asset Ri t is correctly priced. This is typically the perspective in performance analysis of mutual funds. Alternatively, if we assume that Ri t is correctly priced, then it is a test of the mean-variance efficiency of Rmt . This is the perspective of
CAPM tests.
The t-test of the null hypothesis that $\alpha_i = 0$ uses the fact that, under fairly mild conditions, the t-statistic has an asymptotically normal distribution, that is
$$
\frac{\hat{\alpha}_i}{\mathrm{Std}(\hat{\alpha}_i)} \overset{d}{\to} N(0, 1) \text{ under } H_0: \alpha_i = 0. \tag{7.4}
$$
Note that this is the distribution under the null hypothesis that the true value of the intercept is zero, that is, that CAPM is correct (in this respect, at least).
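The regression (7.2) and the t-test (7.4) can be sketched in a few lines. The code below is a minimal illustration, assuming iid residuals so that the standard error of the intercept takes the simple form $[1 + (SR_m)^2]\mathrm{Var}(\varepsilon_{it})/T$; the function name and the four toy observations are hypothetical, and such a short sample is of course far too small for the asymptotic distribution to be reliable.

```python
import math

def capm_alpha_test(asset_excess, market_excess):
    """LS of asset excess returns on a constant and the market excess return.
    Returns (alpha, beta, t-statistic of alpha, two-sided p-value)."""
    T = len(asset_excess)
    mean_y = sum(asset_excess) / T
    mean_x = sum(market_excess) / T
    var_x = sum((x - mean_x) ** 2 for x in market_excess) / T
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(market_excess, asset_excess)) / T
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    resid = [y - alpha - beta * x for x, y in zip(market_excess, asset_excess)]
    var_resid = sum(e * e for e in resid) / T
    # variance of the intercept under iid residuals (an assumption)
    std_alpha = math.sqrt((1.0 + mean_x ** 2 / var_x) * var_resid / T)
    t_stat = alpha / std_alpha
    # two-sided p-value from the N(0,1) distribution
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return alpha, beta, t_stat, p_value

# toy data (hypothetical, in % per month)
market = [0.0, 2.0, 0.0, 2.0]
asset = [0.7, 2.3, -0.3, 1.3]

alpha, beta, t_stat, p_value = capm_alpha_test(asset, market)
print(alpha, beta, t_stat, p_value)
```

The toy returns are constructed so that the residuals are exactly orthogonal to the constant and to the market return, which makes the estimates easy to verify by hand.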
The test assets are typically portfolios of firms with similar characteristics, for instance, small size or having their main operations in the retail industry. There are two main reasons for testing the model on such portfolios: individual stocks are extremely volatile and firms can change substantially over time (so the beta changes). Moreover, it is of interest to see how the deviations from CAPM are related to firm characteristics
(size, industry, etc), since that can possibly suggest how the model needs to be changed.
The results from such tests vary with the test assets used. For US portfolios, CAPM seems to work reasonably well for some types of portfolios (for instance, portfolios based on firm size or industry), but much worse for other types of portfolios (for instance, portfolios based on firm dividend yield or book value/market value ratio). Figure 7.1 shows some results for US industry portfolios.

7.1.1 Interpretation of the CAPM Test

Instead of a t-test, we can use the equivalent chi-square test
$$
\frac{\hat{\alpha}_i^2}{\mathrm{Var}(\hat{\alpha}_i)} \overset{d}{\to} \chi_1^2 \text{ under } H_0: \alpha_i = 0. \tag{7.5}
$$
Tables (A.2)–(A.1) list critical values for t- and chi-square tests.
It is quite straightforward to use the properties of minimum-variance frontiers (see Gibbons, Ross, and Shanken (1989), and also MacKinlay (1995)) to show that the test statistic in (7.5) can be written
$$
\frac{\hat{\alpha}_i^2}{\mathrm{Var}(\hat{\alpha}_i)} = \frac{(SR_c)^2 - (SR_m)^2}{[1 + (SR_m)^2]/T}, \tag{7.6}
$$
where $SR_m$ is the Sharpe ratio of the market portfolio (as before) and $SR_c$ is the Sharpe ratio of the tangency portfolio when investment in both the market return and asset $i$ is possible. (Recall that the tangency portfolio is the portfolio with the highest possible Sharpe ratio.) If the market portfolio has the same (squared) Sharpe ratio as the tangency portfolio of the mean-variance frontier of $R_{it}$ and $R_{mt}$ (so the market portfolio is mean-variance efficient also when we take $R_{it}$ into account) then the test statistic, $\hat{\alpha}_i^2/\mathrm{Var}(\hat{\alpha}_i)$, is zero, and CAPM is not rejected.

Figure 7.1: CAPM regressions on US industry indices. [Two scatter plots of US industry portfolios, 1970:1−2010:12: mean excess return against beta (against the market), and mean excess return against predicted mean excess return (with α=0). Excess market return: 5.5%. Factor: US market. alpha and StdErr are in annualized %.]

Portfolio     alpha   pval   StdErr
all             NaN   0.07      NaN
A (NoDur)      3.50   0.01     8.88
B (Durbl)     −0.67   0.74    13.39
C (Manuf)      0.83   0.41     6.36
D (Enrgy)      4.36   0.06    14.63
E (HiTec)     −1.72   0.37    12.16
F (Telcm)      1.53   0.39    11.37
G (Shops)      1.17   0.45     9.89
H (Hlth)       1.86   0.30    11.70
I (Utils)      2.61   0.16    11.77
J (Other)     −0.47   0.68     7.20
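Proof. (Proof of (7.6)) From the CAPM regression (7.2) we have
$$
\mathrm{Cov}\begin{bmatrix} R_{it}^e \\ R_{mt}^e \end{bmatrix} = \begin{bmatrix} \beta_i^2 \sigma_m^2 + \mathrm{Var}(\varepsilon_{it}) & \beta_i \sigma_m^2 \\ \beta_i \sigma_m^2 & \sigma_m^2 \end{bmatrix}, \text{ and } \begin{bmatrix} \mu_i^e \\ \mu_m^e \end{bmatrix} = \begin{bmatrix} \alpha_i + \beta_i \mu_m^e \\ \mu_m^e \end{bmatrix}.
$$
Suppose we use this information to construct a mean-variance frontier for both $R_{it}$ and $R_{mt}$, and we find the tangency portfolio, with excess return $R_{ct}^e$. It is straightforward to show that the square of the Sharpe ratio of the tangency portfolio is $\mu^{e\prime} \Sigma^{-1} \mu^e$, where $\mu^e$ is the vector of expected excess returns and $\Sigma$ is the covariance matrix. By using the covariance matrix and mean vector above, we get that the squared Sharpe ratio for the tangency portfolio, $\mu^{e\prime} \Sigma^{-1} \mu^e$ (using both $R_{it}$ and $R_{mt}$), is
$$
\left(\frac{\mu_c^e}{\sigma_c}\right)^2 = \frac{\alpha_i^2}{\mathrm{Var}(\varepsilon_{it})} + \left(\frac{\mu_m^e}{\sigma_m}\right)^2,
$$
which we can write as
$$
(SR_c)^2 = \frac{\alpha_i^2}{\mathrm{Var}(\varepsilon_{it})} + (SR_m)^2.
$$
Combine this with (7.8) which shows that $\mathrm{Var}(\hat{\alpha}_i) = [1 + (SR_m)^2] \mathrm{Var}(\varepsilon_{it})/T$.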
This is illustrated in Figure 7.2 which shows the effect of adding an asset to the investment opportunity set. In this case, the new asset has a zero beta (since it is uncorrelated with all original assets), but the same type of result holds for any new asset. The basic point is that the market model tests if the new asset moves the location of the tangency portfolio. In general, we would expect that adding an asset to the investment opportunity set would expand the mean-variance frontier (and it does) and that the tangency portfolio changes accordingly. However, the tangency portfolio is not changed by adding an asset with a zero intercept. The intuition is that such an asset has neutral performance compared to the market portfolio (obeys the beta representation), so investors should stick to the market portfolio.
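The link between the intercept and the Sharpe ratio of the tangency portfolio can be verified numerically. The sketch below builds the mean vector and covariance matrix implied by the market model for asset $i$ and the market, computes the quadratic form of the means in the inverse covariance matrix with an explicit $2 \times 2$ inverse, and compares it with $\alpha_i^2/\mathrm{Var}(\varepsilon_{it}) + (SR_m)^2$. The function name and all parameter values are hypothetical.

```python
def squared_sharpe_tangency(alpha, beta, mu_m, var_m, var_eps):
    """mu' * inv(Sigma) * mu for the two assets (R_i, R_m) implied by the
    market model R_i = alpha + beta * R_m + eps."""
    mu = [alpha + beta * mu_m, mu_m]
    s11 = beta ** 2 * var_m + var_eps
    s12 = beta * var_m
    s22 = var_m
    det = s11 * s22 - s12 ** 2
    # inverse of a symmetric 2x2 matrix
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    return sum(mu[i] * inv[i][j] * mu[j] for i in range(2) for j in range(2))

mu_m, var_m = 0.06, 0.04        # hypothetical market mean excess return and variance
beta, var_eps = 0.5, 0.01       # hypothetical beta and residual variance
sr_m_squared = mu_m ** 2 / var_m

with_alpha = squared_sharpe_tangency(0.02, beta, mu_m, var_m, var_eps)
zero_alpha = squared_sharpe_tangency(0.0, beta, mu_m, var_m, var_eps)
print(with_alpha, zero_alpha, sr_m_squared)
```

With a zero intercept the squared Sharpe ratio of the tangency portfolio equals that of the market, so the new asset does not improve the investment opportunity set.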

7.1.2 Econometric Properties of the CAPM Test

A common finding from Monte Carlo simulations is that these tests tend to reject a true null hypothesis too often when the critical values from the asymptotic distribution are used: the actual small sample size of the test is thus larger than the asymptotic (or “nominal”) size (see Campbell, Lo, and MacKinlay (1997) Table 5.1). The practical consequence is that we should either use adjusted critical values (from Monte Carlo or bootstrap simulations) or, more pragmatically, only believe in strong rejections of the null hypothesis.
To study the power of the test (the frequency of rejections of a false null hypothesis) we have to specify an alternative data generating process (for instance, how much extra return in excess of that motivated by CAPM) and the size of the test (the critical value to use). Once that is done, it is typically found that these tests require a substantial deviation from CAPM and/or a long sample to get good power. The basic reason for this is that asset returns are very volatile. For instance, suppose that the standard OLS assumptions (iid residuals that are independent of the market return) are correct. Then, it is straightforward to show that the variance of Jensen's alpha is
$$
\mathrm{Var}(\hat{\alpha}_i) = \left[1 + \frac{(\mu_m^e)^2}{\mathrm{Var}(R_m^e)}\right] \mathrm{Var}(\varepsilon_{it})/T \tag{7.7}
$$
$$
\hphantom{\mathrm{Var}(\hat{\alpha}_i)} = [1 + (SR_m)^2] \mathrm{Var}(\varepsilon_{it})/T, \tag{7.8}
$$
where $SR_m$ is the Sharpe ratio of the market portfolio. We see that the uncertainty about the alpha is high when the residual is volatile and when the sample is short, but also when the Sharpe ratio of the market is high. Note that a large market Sharpe ratio means that the market asks for a high compensation for taking on risk. A bit of uncertainty about how risky asset $i$ is then translates into a large uncertainty about what the risk-adjusted return should be.

Figure 7.2: Effect on MV frontier of adding assets. [Three panels, mean against standard deviation: MV frontiers before and after adding an asset with α=0, α=0.05 and α=−0.04. Solid curves: 2 assets; dashed curves: 3 assets. The new asset has the abnormal return α compared to the market (of 2 assets). Means: 0.0800, 0.0500, α + β(ERm−Rf). Covariance matrix: diagonal with elements 0.0256, 0.0144, 0.0144.]

Tangency portfolio weights
            N=2    α=0   α=0.05   α=−0.04
Asset 1    0.47   0.47     0.31      0.82
Asset 2    0.53   0.53     0.34      0.91
Asset 3     NaN   0.00     0.34     −0.73

Example 7.1 Suppose we have monthly data with $\hat{\alpha}_i = 0.2\%$ (that is, $0.2\% \times 12 = 2.4\%$ per year), $\mathrm{Std}(\varepsilon_{it}) = 3\%$ (that is, $3\% \times \sqrt{12} \approx 10\%$ per year) and a market Sharpe ratio of 0.15 (that is, $0.15 \times \sqrt{12} \approx 0.5$ per year). (This corresponds well to US CAPM regressions for industry portfolios.) A significance level of 10% requires a t-statistic (7.4) of at least 1.65, so
$$
\frac{0.2}{\sqrt{1 + 0.15^2} \times 3/\sqrt{T}} \geq 1.65 \text{ or } T \geq 626.
$$
We need a sample of at least 626 months (52 years)! With a sample of only 26 years (312 months), the alpha needs to be almost 0.3% per month (3.6% per year) or the standard deviation of the residual just 2% (7% per year). Notice that cumulating a 0.3% return over 25 years means almost 2.5 times the initial value.
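The numbers in Example 7.1 follow directly from (7.8) and can be reproduced as below. This is only a sketch with hypothetical function names; returns and standard deviations are in % per month.

```python
import math

def alpha_std(resid_std, sharpe_market, T):
    """Std(alpha_hat) from (7.8): sqrt([1 + SR_m^2] * Var(eps) / T)."""
    return math.sqrt((1.0 + sharpe_market ** 2) / T) * resid_std

def required_sample_size(alpha, resid_std, sharpe_market, critical_value):
    """Value of T at which alpha / Std(alpha_hat) equals the critical value."""
    return (1.0 + sharpe_market ** 2) * (critical_value * resid_std / alpha) ** 2

# inputs from Example 7.1
T_needed = required_sample_size(0.2, 3.0, 0.15, 1.65)
t_stat_at_T_needed = 0.2 / alpha_std(3.0, 0.15, T_needed)

# alpha needed for significance with 312 months of data
alpha_needed = 1.65 * alpha_std(3.0, 0.15, 312)

# cumulating 0.3% per month over 25 years (300 months)
growth = 1.003 ** 300

print(T_needed, alpha_needed, growth)
```

The required sample length grows with the square of the ratio of residual volatility to alpha, which is why halving the alpha quadruples the number of months needed.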

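Proof. (Proof of (7.8)) Consider the regression equation $y_t = x_t' b + \varepsilon_t$. With iid errors that are independent of all regressors (also across observations), the LS estimator, $\hat{b}_{LS}$, is asymptotically distributed as
$$
\sqrt{T}(\hat{b}_{LS} - b) \overset{d}{\to} N(0, \sigma^2 \Sigma_{xx}^{-1}), \text{ where } \sigma^2 = \mathrm{Var}(\varepsilon_t) \text{ and } \Sigma_{xx} = \operatorname{plim} \sum_{t=1}^{T} x_t x_t'/T.
$$
When the regressors are just a constant (equal to one) and one variable regressor, $f_t$, so $x_t = [1, f_t]'$, then we have
$$
\Sigma_{xx} = \mathrm{E} \sum_{t=1}^{T} x_t x_t'/T = \mathrm{E}\, \frac{1}{T} \sum_{t=1}^{T} \begin{bmatrix} 1 & f_t \\ f_t & f_t^2 \end{bmatrix} = \begin{bmatrix} 1 & \mathrm{E} f_t \\ \mathrm{E} f_t & \mathrm{E} f_t^2 \end{bmatrix}, \text{ so}
$$
$$
\sigma^2 \Sigma_{xx}^{-1} = \frac{\sigma^2}{\mathrm{E} f_t^2 - (\mathrm{E} f_t)^2} \begin{bmatrix} \mathrm{E} f_t^2 & -\mathrm{E} f_t \\ -\mathrm{E} f_t & 1 \end{bmatrix} = \frac{\sigma^2}{\mathrm{Var}(f_t)} \begin{bmatrix} \mathrm{Var}(f_t) + (\mathrm{E} f_t)^2 & -\mathrm{E} f_t \\ -\mathrm{E} f_t & 1 \end{bmatrix}.
$$
(In the last line we use $\mathrm{Var}(f_t) = \mathrm{E} f_t^2 - (\mathrm{E} f_t)^2$.) The upper left element, divided by $T$, is the variance of the intercept, $\sigma^2 [1 + (\mathrm{E} f_t)^2/\mathrm{Var}(f_t)]/T$, which is (7.7) with $f_t = R_{mt}^e$.

7.1.3 Several Assets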

In most cases there are several ($n$) test assets, and we actually want to test if all the $\alpha_i$ (for $i = 1, 2, \ldots, n$) are zero. Ideally we then want to take into account the correlation of the different alphas.
While it is straightforward to construct such a test, it is also a bit messy. As a quick way out, the following will work fairly well. First, test each asset individually. Second, form a few different portfolios of the test assets (equally weighted, value weighted) and test these portfolios. Although this does not deliver one single test statistic, it provides plenty of information to base a judgement on. For a more formal approach, see Section 7.1.4.
A quite different approach to study a cross-section of assets is to first perform a CAPM regression (7.2) and then the following cross-sectional regression
$$
\sum_{t=1}^{T} R_{it}^e/T = \gamma + \lambda \hat{\beta}_i + u_i, \tag{7.9}
$$
where $\sum_{t=1}^{T} R_{it}^e/T$ is the (sample) average excess return on asset $i$. Notice that the estimated betas are used as regressors and that there are as many data points as there are assets ($n$).
There are severe econometric problems with this regression equation since the regressor contains measurement errors (it is only an uncertain estimate), which typically tend to bias the slope coefficient towards zero. To get the intuition for this bias, consider an extremely noisy measurement of the regressor: it would be virtually uncorrelated with the dependent variable (noise is not correlated with anything), so the estimated slope coefficient would be close to zero.
If we could overcome this bias (and we can by being careful), then the testable implications of CAPM are that $\gamma = 0$ and that $\lambda$ equals the average market excess return. We also want (7.9) to have a high $R^2$, since it should be unity in a very large sample (if CAPM holds).
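The size of the bias can be made explicit with the textbook errors-in-variables result. As a sketch, write the slope in (7.9) as $\lambda$ and assume that the estimated beta equals the true beta plus a measurement error $e_i$ that is uncorrelated with both $\beta_i$ and $u_i$ (the symbol $e_i$ and this assumption are introduced here only for illustration). The probability limit of the estimated cross-sectional slope is then

```latex
\operatorname{plim} \hat{\lambda}
  = \lambda \, \frac{\mathrm{Var}(\beta_i)}{\mathrm{Var}(\beta_i) + \mathrm{Var}(e_i)} ,
```

so the slope shrinks towards zero as the variance of the measurement error grows relative to the cross-sectional variance of the true betas, and the bias disappears when the betas are measured without error.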

7.1.4 Several Assets: SURE Approach

This section outlines how we can set up a formal test of CAPM when there are several test assets.
For simplicity, suppose we have two test assets. Stacking (7.2) for the two assets gives
$$
R_{1t}^e = \alpha_1 + b_1 R_{mt}^e + \varepsilon_{1t}, \tag{7.10}
$$
$$
R_{2t}^e = \alpha_2 + b_2 R_{mt}^e + \varepsilon_{2t}, \tag{7.11}
$$
where $\mathrm{E}\, \varepsilon_{it} = 0$ and $\mathrm{Cov}(R_{mt}^e, \varepsilon_{it}) = 0$. This is a system of seemingly unrelated regressions (SURE) with the same regressor (see, for instance, Wooldridge (2002) 7.7). In this case, the efficient estimator (GLS) is LS on each equation separately. Moreover, the covariance matrix of the coefficients is particularly simple.
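To see what the covariances of the coefficients are, write the regression equation for asset 1 (7.10) on a traditional form
$$
R_{1t}^e = x_t' \beta_1 + \varepsilon_{1t}, \text{ where } x_t = \begin{bmatrix} 1 \\ R_{mt}^e \end{bmatrix}, \ \beta_1 = \begin{bmatrix} \alpha_1 \\ b_1 \end{bmatrix}, \tag{7.12}
$$
and similarly for the second asset (and any further assets).
Define
$$
\hat{\Sigma}_{xx} = \sum_{t=1}^{T} x_t x_t'/T, \text{ and } \hat{\sigma}_{ij} = \sum_{t=1}^{T} \hat{\varepsilon}_{it} \hat{\varepsilon}_{jt}/T, \tag{7.13}
$$
where $\hat{\varepsilon}_{it}$ is the fitted residual of asset $i$. The key result is then that the (estimated) asymptotic covariance matrix of the vectors $\hat{\beta}_i$ and $\hat{\beta}_j$ (for assets $i$ and $j$) is
$$
\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j) = \hat{\sigma}_{ij} \hat{\Sigma}_{xx}^{-1}/T. \tag{7.14}
$$
(In many text books, this is written $\hat{\sigma}_{ij} (X'X)^{-1}$.)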
The null hypothesis in our two-asset case is
$$
H_0: \alpha_1 = 0 \text{ and } \alpha_2 = 0. \tag{7.15}
$$
In a large sample, the estimator is normally distributed (this follows from the fact that the LS estimator is a form of sample average, so we can apply a central limit theorem). Therefore, under the null hypothesis we have the following result. From (7.8) we know that the upper left element of $\Sigma_{xx}^{-1}/T$ equals $[1 + (SR_m)^2]/T$. Then
$$
\begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} [1 + (SR_m)^2]/T\right) \text{ (asymptotically).} \tag{7.16}
$$
In practice we use the sample moments for the covariance matrix. Notice that the zero means in (7.16) come from the null hypothesis: the distribution is (as usual) constructed by pretending that the null hypothesis is true.
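We can now construct a chi-square test by using the following fact.
Remark 7.2 If the $n \times 1$ vector $y \sim N(0, \Omega)$, then $y' \Omega^{-1} y \sim \chi_n^2$.
To apply this, form the test statistic
$$
T [1 + (SR_m)^2]^{-1} \begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{bmatrix}' \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}^{-1} \begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{bmatrix} \sim \chi_2^2. \tag{7.17}
$$
This can also be transformed into an F test, which might have better small sample properties.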
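For two test assets the joint test is easy to compute by hand, since the inverse of a $2 \times 2$ matrix is explicit and the chi-square distribution with two degrees of freedom has the survival function $\exp(-x/2)$. The sketch below follows that route; the function name and the input numbers (alphas and residual covariances in monthly % units) are hypothetical.

```python
import math

def joint_alpha_test(alpha1, alpha2, s11, s12, s22, sharpe_market, T):
    """Chi-square statistic for H0: alpha1 = alpha2 = 0, that is,
    T / [1 + SR_m^2] * alpha' * inv(Sigma) * alpha, with its asymptotic
    p-value from the chi-square distribution with 2 degrees of freedom."""
    det = s11 * s22 - s12 ** 2
    # quadratic form alpha' * inv(Sigma) * alpha for a symmetric 2x2 matrix
    quad = (alpha1 ** 2 * s22 - 2.0 * alpha1 * alpha2 * s12
            + alpha2 ** 2 * s11) / det
    stat = T / (1.0 + sharpe_market ** 2) * quad
    p_value = math.exp(-stat / 2.0)   # survival function of chi-square(2)
    return stat, p_value

T, sr_m = 312, 0.15
a1, a2 = 0.2, 0.1           # hypothetical estimated alphas
s11, s22 = 9.0, 4.0         # hypothetical residual variances

# uncorrelated residuals: the joint statistic is the sum of squared t-stats
stat, p_value = joint_alpha_test(a1, a2, s11, 0.0, s22, sr_m, T)
t1_squared = a1 ** 2 / ((1.0 + sr_m ** 2) * s11 / T)
t2_squared = a2 ** 2 / ((1.0 + sr_m ** 2) * s22 / T)

# positively correlated residuals change the statistic
stat_corr, p_value_corr = joint_alpha_test(a1, a2, s11, 3.0, s22, sr_m, T)
print(stat, p_value, stat_corr, p_value_corr)
```

When the residuals are uncorrelated, the joint statistic is just the sum of the two individual squared t-statistics; the off-diagonal element is what the individual tests ignore.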

7.1.5 Representative Results of the CAPM Test

One of the more interesting studies is Fama and French (1993) (see also Fama and French (1996)). They construct 25 stock portfolios according to two characteristics of the firm: the size (by market capitalization) and the book-value-to-market-value ratio (BE/ME). In June each year, they sort the stocks according to size and BE/ME. They then form a $5 \times 5$ matrix of portfolios, where portfolio $ij$ belongs to the $i$th size quintile and the $j$th BE/ME quintile. Tables 7.1–7.2 summarize some basic properties of these portfolios.
               Book value/Market value
             1      2      3      4      5
Size 1     5.6   11.5   11.9   13.5   16.3
     2     4.8    8.8   11.1   10.9   12.2
     3     5.3    8.9    9.1   10.7   12.6
     4     6.4    6.9    8.8    9.8   10.2
     5     5.3    6.8    6.9    6.7    8.5

Table 7.1: Mean excess returns (annualised %), US data 1957:1–2010:12. Size 1: smallest 20% of the stocks, Size 5: largest 20% of the stocks. B/M 1: the 20% of the stocks with the smallest ratio of book to market value (growth stocks). B/M 5: the 20% of the stocks with the highest ratio of book to market value (value stocks).
They run a traditional CAPM regression on each of the 25 portfolios (monthly data 1963–1991) and then study if the expected excess returns are related to the betas as they should according to CAPM (recall that CAPM implies $\mathrm{E} R_{it}^e = \beta_i \lambda$ where $\lambda$ is the risk premium (excess return) on the market portfolio).
However, it is found that there is almost no relation between $\mathrm{E} R_{it}^e$ and $\beta_i$ (there is a cloud in the $\beta_i \times \mathrm{E} R_{it}^e$ space, see Cochrane (2001) 20.2, Figure 20.9). This is due to the combination of two features of the data. First, within a BE/ME quintile, there is a positive relation (across size quantiles) between $\mathrm{E} R_{it}^e$ and $\beta_i$, as predicted by CAPM (see Cochrane (2001) 20.2, Figure 20.10). Second, within a size quintile there is a negative relation (across BE/ME quantiles) between $\mathrm{E} R_{it}^e$ and $\beta_i$, in stark contrast to CAPM (see Cochrane (2001) 20.2, Figure 20.11).

               Book value/Market value
             1      2      3      4      5
Size 1     1.4    1.2    1.1    1.0    1.0
     2     1.4    1.2    1.1    1.0    1.2
     3     1.4    1.1    1.0    1.0    1.1
     4     1.3    1.1    1.0    1.0    1.1
     5     1.1    1.0    1.0    0.9    0.9

Table 7.2: Beta against the market portfolio, US data 1957:1–2010:12. Size 1: smallest 20% of the stocks, Size 5: largest 20% of the stocks. B/M 1: the 20% of the stocks with the smallest ratio of book to market value (growth stocks). B/M 5: the 20% of the stocks with the highest ratio of book to market value (value stocks).

Figure 7.3: Comparison of small growth stock and large value stocks. [Histograms of monthly excess returns (%), with an estimated normal distribution as a solid line. Small growth stocks: mean 0.47, std 8.43. Large value stocks: mean 0.71, std 5.01. Monthly data on two U.S. indices, 1957:1−2010:12. Sample size: 648.]
Figure 7.1 shows some results for US industry portfolios and Figures 7.4–7.6 for US size/book-to-market portfolios.
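The relation between betas and mean excess returns can be explored directly from the rounded entries of Tables 7.1–7.2. The following is only a rough sketch: it uses the table values for 1957:1–2010:12 (not the 1963–1991 sample of the original study), and the helper `corr` is a hypothetical name introduced here. It computes the correlation between $\beta_i$ and the mean excess return within each size quintile, within each BE/ME quintile, and across all 25 portfolios.

```python
from statistics import mean

# Rounded values from Tables 7.1 and 7.2 (rows: size 1..5, columns: B/M 1..5)
mean_ret = [
    [5.6, 11.5, 11.9, 13.5, 16.3],
    [4.8, 8.8, 11.1, 10.9, 12.2],
    [5.3, 8.9, 9.1, 10.7, 12.6],
    [6.4, 6.9, 8.8, 9.8, 10.2],
    [5.3, 6.8, 6.9, 6.7, 8.5],
]
beta = [
    [1.4, 1.2, 1.1, 1.0, 1.0],
    [1.4, 1.2, 1.1, 1.0, 1.2],
    [1.4, 1.1, 1.0, 1.0, 1.1],
    [1.3, 1.1, 1.0, 1.0, 1.1],
    [1.1, 1.0, 1.0, 0.9, 0.9],
]

def corr(x, y):
    """Sample correlation of two equally long lists (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Within a size quintile: vary B/M along a row
within_size = [corr(beta[i], mean_ret[i]) for i in range(5)]
# Within a B/M quintile: vary size along a column
within_bm = [corr([beta[i][j] for i in range(5)],
                  [mean_ret[i][j] for i in range(5)]) for j in range(5)]
# All 25 portfolios pooled
overall = corr(sum(beta, []), sum(mean_ret, []))

print("within size quintiles:", [round(c, 2) for c in within_size])
print("within B/M quintiles: ", [round(c, 2) for c in within_bm])
print("all 25 portfolios:    ", round(overall, 2))
```

With these rounded inputs, the correlation within every size quintile is negative, which is the pattern that CAPM cannot account for.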

Figure 7.4: CAPM, FF portfolios. (Mean excess return, %, plotted against the predicted mean excess return from CAPM, %. US data 1957:1–2010:12, 25 FF portfolios (B/M and size). p-value for test of model: 0.00.)
7.1.6 Representative Results on Mutual Fund Performance

Mutual fund evaluations (estimated $\alpha_i$) typically find (i) on average neutral performance (or less: trading costs and fees); (ii) large funds might be worse; (iii) perhaps better performance on less liquid (less efficient?) markets; and (iv) there is very little persistence in performance: $\alpha_i$ for one sample does not predict $\alpha_i$ for subsequent samples (except for bad funds).

7.2 Several Factors

In multifactor models, (7.2) is still valid, provided we reinterpret $b_i$ and $R^e_{mt}$ as vectors, so $b_i R^e_{mt}$ stands for $b_{io} R^e_{ot} + b_{ip} R^e_{pt} + \ldots$
\[
R^e_{it} = \alpha + b_{io} R^e_{ot} + b_{ip} R^e_{pt} + \ldots + \varepsilon_{it}. \tag{7.18}
\]

Figure 7.5: CAPM, FF portfolios. (Mean excess return, %, plotted against the predicted mean excess return from CAPM, %. Lines connect portfolios of the same size, from 1 (small) to 5 (large).)
In this case, (7.2) is a multiple regression, but the test (7.4) still has the same form (the standard deviation of the intercept will be different, though).
Fama and French (1993) also try a multi-factor model. They find that a three-factor model fits the 25 stock portfolios fairly well (two more factors are needed to also fit the seven bond portfolios that they use). The three factors are: the market return, the return on a portfolio of small stocks minus the return on a portfolio of big stocks (SMB), and the return on a portfolio with high BE/ME minus the return on a portfolio with low BE/ME (HML). This three-factor model is rejected at traditional significance levels, but it can still capture a fair amount of the variation of expected returns (see Cochrane (2001) 20.2, Figures 20.12–13).
Chen, Roll, and Ross (1986) use a number of macro variables as factors—along with traditional market indices. They find that industrial production and inflation surprises are priced factors, while the market index might not be.
Figure 7.7 shows some results for the Fama-French model on US industry portfolios and Figures 7.8–7.10 on the 25 Fama-French portfolios.
Figure 7.6: CAPM, FF portfolios. (Mean excess return, %, plotted against the predicted mean excess return from CAPM, %. Lines connect portfolios of the same B/M, from 1 (low) to 5 (high).)

               alpha    pval   StdErr
all              NaN    0.00      NaN
A (NoDur)       2.53    0.06     8.63
B (Durbl)      −4.46    0.02    12.13
C (Manuf)      −0.31    0.75     6.07
D (Enrgy)       3.38    0.14    14.15
E (HiTec)       1.68    0.30    10.14
F (Telcm)       1.25    0.50    11.12
G (Shops)       0.50    0.75     9.78
H (Hlth)        4.31    0.01    10.92
I (Utils)      −0.01    0.99    10.53
J (Other)      −2.77    0.01     6.17

Figure 7.7: Fama-French regressions on US industry indices. (Mean excess return plotted against the predicted mean excess return (with $\alpha = 0$), US industry portfolios, 1970:1–2010:12. Fama-French model; factors: US market, SMB (size), and HML (book-to-market). alpha and StdErr are in annualized %.)

Figure 7.8: FF, FF portfolios. (Mean excess return, %, plotted against the predicted mean excess return from the Fama-French model, %. US data 1957:1–2010:12, 25 FF portfolios (B/M and size). p-value for test of model: 0.00.)

7.3 Fama-MacBeth

Reference: Cochrane (2001) 12.3; Campbell, Lo, and MacKinlay (1997) 5.8; Fama and MacBeth (1973)
The Fama and MacBeth (1973) approach is a bit different from the regression approaches discussed so far. The method has three steps, described below.
First, estimate the betas $\beta_i$ ($i = 1, \ldots, n$) from (7.2) (this is a time-series regression). This is often done on the whole sample, assuming the betas are constant. Sometimes, the betas are estimated separately for different subsamples (so we could let $\hat{\beta}_i$ carry a time subscript in the equations below).

Second, run a cross-sectional regression for every $t$. That is, for period $t$, estimate $\lambda_t$ from the cross-section (across the assets $i = 1, \ldots, n$) regression
\[
R^e_{it} = \lambda_t' \hat{\beta}_i + \varepsilon_{it}, \tag{7.19}
\]
where $\hat{\beta}_i$ are the regressors. (Note the difference to the traditional cross-sectional approach discussed in (7.9), where the second stage regression regressed $\mathrm{E}R^e_{it}$ on $\hat{\beta}_i$, while the Fama-MacBeth approach runs one regression for every time period.)

Third, estimate the time averages
\[
\hat{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \hat{\varepsilon}_{it} \quad \text{for } i = 1, \ldots, n \text{ (for every asset)}, \tag{7.20}
\]
\[
\hat{\lambda} = \frac{1}{T}\sum_{t=1}^{T} \hat{\lambda}_t. \tag{7.21}
\]

Figure 7.9: FF, FF portfolios. (Mean excess return, %, plotted against the predicted mean excess return from the Fama-French model, %. Lines connect portfolios of the same size, from 1 (small) to 5 (large).)

The second step, using $\hat{\beta}_i$ as regressors, creates an errors-in-variables problem since $\hat{\beta}_i$ are estimated, that is, measured with an error. The effect of this is typically to bias the estimator of $\lambda_t$ towards zero (and any intercept, or mean of the residual, is biased upward).
Figure 7.10: FF, FF portfolios. (Mean excess return, %, plotted against the predicted mean excess return from the Fama-French model, %. Lines connect portfolios of the same B/M, from 1 (low) to 5 (high).)

One way to minimize this problem, used by Fama and MacBeth (1973), is to let the assets be portfolios of assets, for which we can expect some of the individual noise in the first-step regressions to average out, and thereby make the measurement error in $\hat{\beta}_i$ smaller.
If CAPM is true, then the return of an asset is a linear function of the market return and an error which should be uncorrelated with the errors of other assets—otherwise some factor is missing. If the portfolio consists of 20 assets with equal error variance in a CAPM regression, then we should expect the portfolio to have an error variance which is 1/20th as large.
We clearly want portfolios which have different betas, or else the second step regression (7.19) does not work. Fama and MacBeth (1973) choose to construct portfolios according to some initial estimate of asset specific betas. Another way to deal with the errors-in-variables problem is to adjust the tests.
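We can test the model by studying if $\varepsilon_i = 0$ (recall from (7.20) that $\hat{\varepsilon}_i$ is the time average of the residual for asset $i$, $\hat{\varepsilon}_{it}$), by forming a t-test $\hat{\varepsilon}_i / \mathrm{Std}(\hat{\varepsilon}_i)$. Fama and MacBeth (1973) suggest that the standard deviation should be found by studying the time-variation in $\hat{\varepsilon}_{it}$. In particular, they suggest that the variance of $\hat{\varepsilon}_{it}$ (not $\hat{\varepsilon}_i$) can be estimated by the (average) squared variation around its mean
\[
\mathrm{Var}(\hat{\varepsilon}_{it}) = \frac{1}{T}\sum_{t=1}^{T} (\hat{\varepsilon}_{it} - \hat{\varepsilon}_i)^2. \tag{7.22}
\]
Since $\hat{\varepsilon}_i$ is the sample average of $\hat{\varepsilon}_{it}$, the variance of the former is the variance of the latter divided by $T$ (the sample size), provided $\hat{\varepsilon}_{it}$ is iid. That is,
\[
\mathrm{Var}(\hat{\varepsilon}_i) = \frac{1}{T}\mathrm{Var}(\hat{\varepsilon}_{it}) = \frac{1}{T^2}\sum_{t=1}^{T} (\hat{\varepsilon}_{it} - \hat{\varepsilon}_i)^2. \tag{7.23}
\]
A similar argument leads to the variance of $\hat{\lambda}$
\[
\mathrm{Var}(\hat{\lambda}) = \frac{1}{T^2}\sum_{t=1}^{T} (\hat{\lambda}_t - \hat{\lambda})^2. \tag{7.24}
\]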

Fama and MacBeth (1973) found, among other things, that the squared beta is not significant in the second step regression, nor is a measure of non-systematic risk.
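The three steps can be illustrated on simulated data. The sketch below is not the original authors' implementation: it assumes a single factor (the market), no intercept in the cross-sectional regression (7.19), full-sample betas, and made-up parameter values (sample size, betas, volatilities). All variable names are hypothetical.

```python
import random
from statistics import mean

random.seed(1)                      # reproducible simulated data (an assumption)
T, n = 240, 10                      # hypothetical sample: 240 periods, 10 assets
true_beta = [0.5 + 0.1 * i for i in range(n)]
Rm = [random.gauss(0.5, 4.0) for _ in range(T)]       # market excess return
R = [[true_beta[i] * Rm[t] + random.gauss(0.0, 2.0)   # R[t][i]; CAPM holds
      for i in range(n)] for t in range(T)]

def slope(y, x):
    """OLS slope in a regression of y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Step 1: time-series regressions give beta_hat_i (whole sample)
beta_hat = [slope([R[t][i] for t in range(T)], Rm) for i in range(n)]

# Step 2: one cross-sectional regression (7.19) per period, no intercept
sbb = sum(b * b for b in beta_hat)
lam_t = [sum(b * r for b, r in zip(beta_hat, R[t])) / sbb for t in range(T)]
eps = [[R[t][i] - lam_t[t] * beta_hat[i] for i in range(n)] for t in range(T)]

# Step 3: time averages (7.20)-(7.21) and their variances (7.23)-(7.24)
lam_hat = mean(lam_t)
eps_bar = [mean([eps[t][i] for t in range(T)]) for i in range(n)]
var_lam = sum((l - lam_hat) ** 2 for l in lam_t) / T ** 2
var_eps = [sum((eps[t][i] - eps_bar[i]) ** 2 for t in range(T)) / T ** 2
           for i in range(n)]

t_lam = lam_hat / var_lam ** 0.5
t_eps = [eps_bar[i] / var_eps[i] ** 0.5 for i in range(n)]
print(f"lambda_hat = {lam_hat:.3f}, t-stat = {t_lam:.2f}")
print("t-stats of average residuals:", [round(x, 2) for x in t_eps])
```

Since the simulated returns obey CAPM, the estimated risk premium should be close to the sample mean of the market excess return, and the average residuals should be small.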

A Statistical Tables

              Critical values
   n          10%      5%      1%
   10        1.81    2.23    3.17
   20        1.72    2.09    2.85
   30        1.70    2.04    2.75
   40        1.68    2.02    2.70
   50        1.68    2.01    2.68
   60        1.67    2.00    2.66
   70        1.67    1.99    2.65
   80        1.66    1.99    2.64
   90        1.66    1.99    2.63
   100       1.66    1.98    2.63
   Normal    1.64    1.96    2.58

Table A.1: Critical values (two-sided test) of t distribution (different degrees of freedom, n) and normal distribution.

              Critical values
   n          10%      5%      1%
   1         2.71    3.84    6.63
   2         4.61    5.99    9.21
   3         6.25    7.81   11.34
   4         7.78    9.49   13.28
   5         9.24   11.07   15.09
   6        10.64   12.59   16.81
   7        12.02   14.07   18.48
   8        13.36   15.51   20.09
   9        14.68   16.92   21.67
   10       15.99   18.31   23.21

Table A.2: Critical values of chi-square distribution (different degrees of freedom, n).
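Some entries of the two tables can be cross-checked with the Python standard library. The sketch below reproduces only the "Normal" row of Table A.1 and the first row ($n = 1$) of Table A.2, using the fact that a chi-square variable with one degree of freedom is a squared standard normal variable. The other rows would need a statistics library and are not covered here.

```python
from statistics import NormalDist

z = NormalDist()            # standard normal distribution
crit = {}
for level in (0.10, 0.05, 0.01):
    c_normal = z.inv_cdf(1 - level / 2)   # two-sided normal critical value
    c_chi2_1 = c_normal ** 2              # chi-square(1) critical value
    crit[level] = (c_normal, c_chi2_1)
    print(f"{level:.0%}: normal {c_normal:.2f}, chi-square(1) {c_chi2_1:.2f}")
```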

Bibliography
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay, 1997, The econometrics of financial markets, Princeton University Press, Princeton, New Jersey.
Chen, N.-F., R. Roll, and S. A. Ross, 1986, “Economic forces and the stock market,” Journal of Business, 59, 383–403.
Cochrane, J. H., 2001, Asset pricing, Princeton University Press, Princeton, New Jersey.
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.
Fama, E., and J. MacBeth, 1973, “Risk, return, and equilibrium: empirical tests,” Journal of Political Economy, 81, 607–636.
Fama, E. F., and K. R. French, 1993, “Common risk factors in the returns on stocks and bonds,” Journal of Financial Economics, 33, 3–56.
Fama, E. F., and K. R. French, 1996, “Multifactor explanations of asset pricing anomalies,” Journal of Finance, 51, 55–84.
Gibbons, M., S. Ross, and J. Shanken, 1989, “A test of the efficiency of a given portfolio,” Econometrica, 57, 1121–1152.
MacKinlay, C., 1995, “Multifactor models do not explain deviations from the CAPM,” Journal of Financial Economics, 38, 3–28.
Wooldridge, J. M., 2002, Econometric analysis of cross section and panel data, MIT Press.

8 Performance Analysis

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 25
More advanced material is denoted by a star ($^*$). It is not required reading.

8.1 Performance Evaluation

Reference: Elton, Gruber, Brown, and Goetzmann (2010) 25

8.1.1 The Idea behind Performance Evaluation

Traditional performance analysis tries to answer the following question: “should we include an asset in our portfolio, assuming that future returns will have the same distribution as in a historical sample?” Since returns are random variables (although with different means, variances, etc.) and investors are risk averse, performance analysis will typically not rank the fund with the highest return (in a historical sample) first. Although that high return certainly was good for the old investors, it is more interesting to understand what kind of distribution of future returns this investment strategy might entail. In short, the high return will be compared with the risk of the strategy.
Most performance measures are based on mean-variance analysis, but the full MV portfolio choice problem is not solved. Instead, the performance measures can be seen as different approximations of the MV problem, where the issue is whether we should invest in fund p or in fund q . (We don’t allow a mix of them.) Although the analysis is based on the MV model, it is not assumed that all assets (portfolios) obey CAPM’s beta representation—or that the market portfolio must be the optimal portfolio for every investor. One motivation of this approach could be that the investor (who is doing the performance evaluation) is a MV investor, but that the market is influenced by non-MV investors. Of course, the analysis is also based on the assumption that historical data are good forecasters of the future.

There are several popular performance measures, corresponding to different situations: is this an investment of your entire wealth, or just a small increment? However, all these measures are (increasing) functions of Jensen’s alpha, the intercept in the CAPM regression
\[
R^e_{it} = \alpha_i + b_i R^e_{mt} + \varepsilon_{it}, \text{ where } \mathrm{E}\,\varepsilon_{it} = 0 \text{ and } \mathrm{Cov}(R^e_{mt}, \varepsilon_{it}) = 0. \tag{8.1}
\]

Example 8.1 (Statistics for example of performance evaluations) We have the following information about portfolios $m$ (the market), $p$, and $q$

        alpha     beta    Std(eps)    mu^e    sigma
   m    0.000    1.000     0.000     0.100    0.180
   p    0.010    0.900     0.140     0.100    0.214
   q    0.050    1.300     0.030     0.180    0.236

Table 8.1: Basic facts about the market and two other portfolios. $\alpha$, $\beta$, and $\mathrm{Std}(\varepsilon)$ are from the CAPM regression: $R^e_{it} = \alpha + \beta R^e_{mt} + \varepsilon_{it}$

8.1.2 Sharpe Ratio and $M^2$: Evaluating the Overall Portfolio

Suppose we want to know if fund $p$ is better than fund $q$ to place all our savings in. (We don’t allow a mix of them.) The answer is that $p$ is better if it has a higher Sharpe ratio, defined as
\[
SR_p = \mu^e_p / \sigma_p. \tag{8.2}
\]
The reason is that MV behaviour (MV preferences or normally distributed returns) implies that we should maximize the Sharpe ratio (selecting the tangency portfolio). Intuitively, for a given volatility, we then get the highest expected return.
Example 8.2 (Performance measure) From Example 8.1 we get the following performance measures

         SR       M2       AR     Treynor     T2
   m    0.556    0.000             0.100     0.000
   p    0.467   −0.016    0.071    0.111     0.011
   q    0.763    0.037    1.667    0.138     0.038

Table 8.2: Performance Measures

A version of the Sharpe ratio, called $M^2$ (after some of the early proponents of the measure: Modigliani and Modigliani) is
\[
M^2_p = \mu^e_{p^*} - \mu^e_m \quad (\text{or } \mu_{p^*} - \mu_m), \tag{8.3}
\]
where $\mu^e_{p^*}$ is the expected return on a mix of portfolio $p$ and the riskfree asset such that the volatility is the same as for the market return.
\[
R_{p^*} = a R_p + (1 - a) R_f, \text{ with } a = \sigma_m / \sigma_p. \tag{8.4}
\]
This gives the mean and standard deviation of portfolio $p^*$
\[
\mu^e_{p^*} = a \mu^e_p = \mu^e_p \sigma_m / \sigma_p \tag{8.5}
\]
\[
\sigma_{p^*} = a \sigma_p = \sigma_m. \tag{8.6}
\]

Figure 8.1: Sharpe ratio and $M^2$. (Expected return plotted against $\sigma$ for $m$, $p$, $q$ and the rescaled portfolios $p^*$ and $q^*$. Data on $m$, $p$, $q$: SR: 0.56, 0.47, 0.76; $M^2$ in %: 0.00, −1.59, 3.73. CML $= R_f + \sigma \mu^e_m / \sigma_m$ (slope is $SR_m$); CAL($x$) $= R_f + \sigma \mu^e_x / \sigma_x$ (slope is $SR_x$).)

The latter shows that $R_{p^*}$ indeed has the same volatility as the market. See Example 8.2 and Figure 8.1 for an illustration.
$M^2$ has the advantage of being easily interpreted: it is just a comparison of two returns. It shows how much better (or worse) this asset is compared to the capital market line (which is the location of efficient portfolios provided the market is MV efficient). However, it is just a scaling of the Sharpe ratio.
To see that, use (8.2) to write
\[
M^2_p = SR_{p^*} \sigma_{p^*} - SR_m \sigma_m
      = (SR_p - SR_m)\, \sigma_m. \tag{8.7}
\]
The second line uses the facts that $R_{p^*}$ has the same Sharpe ratio as $R_p$ (see (8.5)–(8.6)) and that $R_{p^*}$ has the same volatility as the market. Clearly, the portfolio with the highest Sharpe ratio has the highest $M^2$.
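As a small numerical illustration, the Sharpe ratios and $M^2$ values of Example 8.2 can be recomputed from the inputs in Table 8.1. The sketch below follows (8.2)–(8.5) literally and then checks the scaling result (8.7); the dictionary names are hypothetical.

```python
# Inputs from Table 8.1: expected excess return and volatility
mu_e = {"m": 0.100, "p": 0.100, "q": 0.180}
sigma = {"m": 0.180, "p": 0.214, "q": 0.236}

# Sharpe ratio (8.2)
SR = {k: mu_e[k] / sigma[k] for k in mu_e}

# M2 via the rescaled portfolio p*: weight a from (8.4), mean from (8.5), M2 from (8.3)
M2 = {}
for k in mu_e:
    a = sigma["m"] / sigma[k]
    mu_e_star = a * mu_e[k]
    M2[k] = mu_e_star - mu_e["m"]

for k in mu_e:
    scaled = (SR[k] - SR["m"]) * sigma["m"]      # right-hand side of (8.7)
    print(f"{k}: SR = {SR[k]:.3f}, M2 = {M2[k]:.3f}, (SR - SR_m)*sigma_m = {scaled:.3f}")
```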
8.1.3 Appraisal Ratio: Which Portfolio to Combine with the Market Portfolio?

If the issue is “should I add fund $p$ or fund $q$ to my holding of the market portfolio?”, then the appraisal ratio provides an answer. The appraisal ratio (also called the information ratio) of fund $p$ is
\[
AR_p = \alpha_p / \mathrm{Std}(\varepsilon_{pt}), \tag{8.8}
\]
where $\alpha_p$ is the intercept and $\mathrm{Std}(\varepsilon_{pt})$ the volatility of the residual of a CAPM regression (8.1). (The residual is often called the tracking error.) A higher appraisal ratio is better.
The motivation is that if we take the market portfolio and portfolio $p$ to be the available assets, and then find the optimal (assuming MV preferences) combination of them, then the squared Sharpe ratio of the optimal portfolio (that is, the tangency portfolio) is
\[
SR_c^2 = \left( \frac{\alpha_p}{\mathrm{Std}(\varepsilon_{pt})} \right)^2 + SR_m^2. \tag{8.9}
\]
If the alpha is positive, a higher appraisal ratio gives a higher Sharpe ratio—which is the objective if we have MV preferences. See Example 8.2 for an illustration.
If the alpha is negative, and we rule out short sales, then (8.9) is less relevant. In this case, the optimal portfolio weight on an asset with a negative alpha is (very likely to be) zero—so those assets are uninteresting.
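Equation (8.9) can be checked numerically with the numbers in Table 8.1. The sketch below builds the mean vector and covariance matrix of (fund, market) implied by the CAPM regression (8.1), computes the squared Sharpe ratio of the tangency portfolio as $\mu^{e\prime} \Sigma^{-1} \mu^e$, and compares it with $AR_p^2 + SR_m^2$. The function name `tangency_sr2` is hypothetical, and unrestricted portfolio weights are assumed.

```python
# Inputs from Table 8.1
alpha = {"p": 0.010, "q": 0.050}
beta = {"p": 0.900, "q": 1.300}
std_eps = {"p": 0.140, "q": 0.030}
mu_m, sigma_m = 0.100, 0.180

def tangency_sr2(a, b, s_eps):
    """mu' inv(Sigma) mu for the pair (fund, market), moments from regression (8.1)."""
    mu1, mu2 = a + b * mu_m, mu_m
    s11 = b ** 2 * sigma_m ** 2 + s_eps ** 2     # Var(fund)
    s12 = b * sigma_m ** 2                       # Cov(fund, market)
    s22 = sigma_m ** 2                           # Var(market)
    det = s11 * s22 - s12 ** 2
    # quadratic form with the inverse of a symmetric 2x2 matrix
    return (mu1 ** 2 * s22 - 2 * mu1 * mu2 * s12 + mu2 ** 2 * s11) / det

AR = {k: alpha[k] / std_eps[k] for k in alpha}   # appraisal ratio (8.8)
SR_m = mu_m / sigma_m
for k in alpha:
    lhs = tangency_sr2(alpha[k], beta[k], std_eps[k])
    rhs = AR[k] ** 2 + SR_m ** 2                 # right-hand side of (8.9)
    print(f"{k}: AR = {AR[k]:.3f}, SR_c^2 = {lhs:.4f}, AR^2 + SR_m^2 = {rhs:.4f}")
```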
Proof. From the CAPM regression (8.1) we have
\[
\mathrm{Cov}\begin{bmatrix} R^e_{it} \\ R^e_{mt} \end{bmatrix}
= \begin{bmatrix} \beta_i^2 \sigma_m^2 + \mathrm{Var}(\varepsilon_{it}) & \beta_i \sigma_m^2 \\ \beta_i \sigma_m^2 & \sigma_m^2 \end{bmatrix},
\text{ and }
\begin{bmatrix} \mu^e_i \\ \mu^e_m \end{bmatrix}
= \begin{bmatrix} \alpha_i + \beta_i \mu^e_m \\ \mu^e_m \end{bmatrix}.
\]
Suppose we use this information to construct a mean-variance frontier for both $R_{it}$ and $R_{mt}$, and we find the tangency portfolio, with excess return $R^e_{ct}$. We assume that there are no restrictions on the portfolio weights. Recall that the square of the Sharpe ratio of the tangency portfolio is $\mu^{e\prime} \Sigma^{-1} \mu^e$, where $\mu^e$ is the vector of expected excess returns and $\Sigma$ is the covariance matrix. By using the covariance matrix and mean vector above, we get that the squared Sharpe ratio for the tangency portfolio (using both $R_{it}$ and $R_{mt}$) is
\[
\left( \frac{\mu^e_c}{\sigma_c} \right)^2 = \frac{\alpha_i^2}{\mathrm{Var}(\varepsilon_{it})} + \left( \frac{\mu^e_m}{\sigma_m} \right)^2.
\]

8.1.4 Treynor’s Ratio and $T^2$: Portfolio is a Small Part of the Overall Portfolio

Suppose instead that the issue is if we should add a small amount of fund $p$ or fund $q$ to an already well diversified portfolio (not the market portfolio). In this case, Treynor’s ratio might be useful
\[
TR_p = \mu^e_p / \beta_p. \tag{8.10}
\]
A higher Treynor’s ratio is better.
If we mix $p$ and $q$ with the riskfree rate to get the same $\beta$ for both portfolios (here 1 to make it comparable with the market), the one with the highest Treynor’s ratio has the highest expected return. To show this consider the portfolio $p^*$
\[
R_{p^*} = a R_p + (1 - a) R_f, \text{ with } a = 1/\beta_p. \tag{8.11}
\]
This gives the mean and the beta of portfolio $p^*$
\[
\mu^e_{p^*} = a \mu^e_p = \mu^e_p / \beta_p \tag{8.12}
\]
\[
\beta_{p^*} = a \beta_p = 1, \tag{8.13}
\]
so the beta is one. The $T^2$ measure is then
\[
T^2_p = \mu^e_{p^*} - \mu^e_m = \mu^e_p / \beta_p - \mu^e_m. \tag{8.14}
\]
See Example 8.2 and Figure 8.2 for an illustration.
The basic intuition is that with a diversified portfolio and small investment, idiosyncratic risk doesn’t matter, only systematic risk ($\beta$) does. Compare with the setting of the Appraisal Ratio, where we also have a well diversified portfolio (the market), but the investment could be large.
Example 8.3 (Additional portfolio risk) We hold a well diversified portfolio ($d$) and buy a fraction 0.05 of asset $i$ (financed by borrowing), so the return is $R = R_d + 0.05 (R_i - R_f)$. Suppose $\sigma_d^2 = \sigma_i^2 = 1$ and that the correlation of $d$ and $i$ is 0.25. The variance of $R$ is then
\[
\sigma_d^2 + \delta^2 \sigma_i^2 + 2 \delta \sigma_{id} = 1 + 0.05^2 + 2 \times 0.05 \times 0.25 = 1 + 0.0025 + 0.025,
\]
so the importance of the covariance is 10 times larger than the importance of the variance of asset $i$.
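A short numerical sketch of the Treynor measures: the code recomputes Treynor’s ratio (8.10) and $T^2$ (8.14) from Table 8.1, and then repeats the arithmetic of Example 8.3 to show how the covariance term dominates the own-variance term for a small position. Variable names are hypothetical.

```python
# Inputs from Table 8.1
mu_e = {"m": 0.100, "p": 0.100, "q": 0.180}
beta = {"m": 1.000, "p": 0.900, "q": 1.300}

TR = {k: mu_e[k] / beta[k] for k in mu_e}        # Treynor's ratio (8.10)
T2 = {k: TR[k] - mu_e["m"] for k in mu_e}        # T2 measure (8.14)
for k in mu_e:
    print(f"{k}: TR = {TR[k]:.3f}, T2 = {T2[k]:.3f}")

# Example 8.3: variance of R = R_d + delta*(R_i - R_f)
delta, var_d, var_i, corr_id = 0.05, 1.0, 1.0, 0.25
cov_id = corr_id * (var_d * var_i) ** 0.5
own_term = delta ** 2 * var_i                    # 0.0025
cov_term = 2 * delta * cov_id                    # 0.025
total_var = var_d + own_term + cov_term
print(f"Var(R) = {total_var:.4f}, covariance term / variance term = {cov_term / own_term:.1f}")
```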
Proof. (Version 1: Based on the beta representation.) The derivation of the beta representation shows that for all assets $\mu^e_i = \mathrm{Cov}(R_i, R_m) A$, where $A$ is some constant. Rearrange as $\mu^e_i / \beta_i = A \sigma_m^2$. A higher ratio than this is to be considered as a positive “abnormal” return and should prompt a higher investment.
Proof. (Version 2: From first principles, kind of a proof...) Suppose we initially hold a well diversified portfolio ($d$) and we increase the position in asset $i$ with the fraction $\delta$ by borrowing at the riskfree rate to get the return
\[
R = R_d + \delta (R_i - R_f).
\]
The incremental (compared to holding portfolio $d$) expected excess return is $\delta \mu^e_i$ and the incremental variance is $\delta^2 \sigma_i^2 + 2\delta \sigma_{id} \approx 2\delta \sigma_{id}$, since $\delta$ is very small. (The variance of $R$ is $\sigma_d^2 + \delta^2 \sigma_i^2 + 2\delta \sigma_{id}$.) To a first-order approximation, the change in utility ($\mathrm{E}R_p - \mathrm{Var}(R_p) k/2$) is therefore $\delta \mu^e_i - k \delta \sigma_{id}$, so a high value of $\mu^e_i / \sigma_{id}$ will increase utility. This suggests $\mu^e_i / \sigma_{id}$ as a performance measure. However, if portfolio $d$ is indeed well diversified, then $\sigma_{id} \approx \sigma_{im}$. We could therefore use $\mu^e_i / \sigma_{im}$ or (by multiplying by $\sigma_{mm}$, the variance of the market) $\mu^e_i / \beta_i$ as a performance measure.

Figure 8.2: Treynor’s ratio. (Expected return plotted against $\beta$ for $m$, $p$, $q$ and the rescaled portfolios $p^*$ and $q^*$. Data on $m$, $p$, $q$: TR: 0.10, 0.11, 0.14; $T^2$ in %: 0.00, 1.11, 3.85. SML $= R_f + \beta \mu^e_m$; TreynorLine($x$) $= R_f + \beta \mu^e_x / \beta_x$ (slope is $TR_x$).)

8.1.5 Relationships among the Various Performance Measures

The different measures can give different answers when comparing portfolios, but they all share one thing: they are increasing in Jensen’s alpha. By using the expected values from the CAPM regression ($\mu^e_p = \alpha_p + \beta_p \mu^e_m$), simple rearrangements give
\[
\begin{aligned}
SR_p &= \frac{\alpha_p}{\sigma_p} + \mathrm{Corr}(R_p, R_m)\, SR_m \\
AR_p &= \frac{\alpha_p}{\mathrm{Std}(\varepsilon_{pt})} \\
TR_p &= \frac{\alpha_p}{\beta_p} + \mu^e_m,
\end{aligned} \tag{8.15}
\]
and $M^2$ is just a scaling of the Sharpe ratio. Notice that these expressions do not assume that CAPM is the right pricing model: we just use the definition of the intercept and slope in the CAPM regression.
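The expressions in (8.15) can be verified with the inputs of Table 8.1. In the sketch below, the mean, volatility and market correlation of each fund are built from $\alpha$, $\beta$ and $\mathrm{Std}(\varepsilon)$ under the regression assumption $\mathrm{Cov}(R^e_{mt}, \varepsilon_{it}) = 0$; the "direct" measures are then compared with the right-hand sides of (8.15). The dictionary layout is hypothetical.

```python
from math import sqrt

mu_m, sigma_m = 0.100, 0.180
SR_m = mu_m / sigma_m
# (alpha, beta, Std(eps)) from Table 8.1
funds = {"p": (0.010, 0.900, 0.140), "q": (0.050, 1.300, 0.030)}

check = {}
for name, (alpha, beta, std_eps) in funds.items():
    mu_e = alpha + beta * mu_m                          # expected excess return
    sigma = sqrt(beta ** 2 * sigma_m ** 2 + std_eps ** 2)
    corr = beta * sigma_m / sigma                       # Corr(R_p, R_m)
    check[name] = {
        "sigma": sigma,
        "SR_direct": mu_e / sigma,
        "SR_815": alpha / sigma + corr * SR_m,
        "TR_direct": mu_e / beta,
        "TR_815": alpha / beta + mu_m,
        "AR": alpha / std_eps,
    }

for name, c in check.items():
    print(name, {key: round(val, 3) for key, val in c.items()})
```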
Since Jensen’s alpha is the driving force in all these measurements, it is often used as performance measure in itself. In a sense, we are then studying how “mispriced” a fund is, compared to what it should be according to CAPM. That is, the alpha measures the “abnormal” return.
Proof. (of (8.15)) Taking expectations of the CAPM regression (8.1) gives $\mu^e_p = \alpha_p + \beta_p \mu^e_m$, where $\beta_p = \mathrm{Cov}(R_p, R_m)/\sigma_m^2$. The Sharpe ratio is therefore
\[
SR_p = \frac{\mu^e_p}{\sigma_p} = \frac{\alpha_p}{\sigma_p} + \frac{\beta_p}{\sigma_p} \mu^e_m,
\]
which can be written as in (8.15) since
\[
\frac{\beta_p}{\sigma_p} \mu^e_m = \frac{\mathrm{Cov}(R_p, R_m)}{\sigma_m \sigma_p} \frac{\mu^e_m}{\sigma_m}.
\]
The $AR_p$ in (8.15) is just a definition. The $TR_p$ measure can be written
\[
TR_p = \frac{\mu^e_p}{\beta_p} = \frac{\alpha_p}{\beta_p} + \mu^e_m,
\]
where the second equality uses the expression for $\mu^e_p$ from above.

               alpha      SR       M2       AR     Treynor     T2
   Market      0.000    0.266    0.000              5.207     0.000
   Putnam     −2.454    0.096   −3.320   −0.282     2.155    −3.052
   Vanguard   −0.246    0.205   −1.194   −0.035     4.763    −0.443

Table 8.3: Performance Measures of Putnam Asset Allocation: Growth A and Vanguard Wellington, weekly data 1996:1–2011:5

8.1.6 Performance Measurement with More Sophisticated Benchmarks

Traditional performance tests typically rely on the alpha from a CAPM regression. The benchmark for the evaluation is then effectively a fixed portfolio consisting of assets that are correctly priced by the CAPM (obey the beta representation). It often makes sense to use a more demanding benchmark. There are several popular alternatives.
If there are predictable movements in the market excess return, then it makes sense to add a “market timing” factor to the CAPM regression. For instance, Treynor and Mazuy (1966) argue that market timing is similar to having a beta that is linear in the market excess return
\[
\beta_i = b_i + c_i R^e_{mt}. \tag{8.16}
\]
Using this in a traditional market model (CAPM) regression, $R^e_{it} = a_i + \beta_i R^e_{mt} + \varepsilon_{it}$, gives
\[
R^e_{it} = a_i + b_i R^e_{mt} + c_i (R^e_{mt})^2 + \varepsilon_{it}, \tag{8.17}
\]
where $c_i$ captures the ability to “time” the market. That is, if the investor systematically gets out of the market (maybe investing in a riskfree asset) before low returns and vice versa, then the slope coefficient $c_i$ is positive. The interpretation is not clear cut, however.
If we still regard the market portfolio (or another fixed portfolio that obeys the beta representation) as the benchmark, then $a_i + c_i (R^e_{mt})^2$ should be counted as performance. In contrast, if we think that this sort of market timing is straightforward to implement, that is, if the benchmark is the market plus market timing, then only $a_i$ should be counted as performance. In other cases (especially when we think that CAPM gives systematic pricing errors), the performance is measured by the intercept of a multifactor model like the Fama-French model.
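The market timing regression (8.17) is an ordinary multiple regression on a constant, the market excess return and its square. The following sketch estimates it by solving the normal equations on simulated data. The "true" coefficients, the sample size and the helper `solve` are all assumptions made for illustration; with real data one would typically use a regression routine from a statistics library.

```python
import random

random.seed(2)                       # reproducible simulated data (an assumption)
T = 600
a, b, c = 0.1, 0.9, 0.02             # hypothetical "true" coefficients in (8.17)
Rm = [random.gauss(0.5, 4.0) for _ in range(T)]                  # market excess return
Ri = [a + b * rm + c * rm ** 2 + random.gauss(0.0, 1.0) for rm in Rm]

def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Regressors: constant, market excess return, squared market excess return
X = [[1.0, rm, rm ** 2] for rm in Rm]
k = 3
XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
Xty = [sum(x[i] * r for x, r in zip(X, Ri)) for i in range(k)]
a_hat, b_hat, c_hat = solve(XtX, Xty)

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, c = {c_hat:.4f}")
```

A positive estimate of $c_i$ is what the text interprets as market timing ability; whether the term $c_i (R^e_{mt})^2$ counts as performance depends on the benchmark chosen.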
A recent way to merge the ideas of market timing and multi-factor models is to allow the coefficients to be time-varying. In practice, the coefficients in period $t$ are only allowed to be linear (or affine) functions of some information variables in an earlier period, $z_{t-1}$. To illustrate this, suppose $z_{t-1}$ is a single variable, so the time-varying (or “conditional”) CAPM regression is
\[
\begin{aligned}
R^e_{it} &= (a_i + \gamma_i z_{t-1}) + (b_i + \delta_i z_{t-1}) R^e_{mt} + \varepsilon_{it} \\
         &= \theta_{i1} + \theta_{i2} z_{t-1} + \theta_{i3} R^e_{mt} + \theta_{i4} z_{t-1} R^e_{mt} + \varepsilon_{it}.
\end{aligned} \tag{8.18}
\]
Similar to the market timing regression, there are two possible interpretations of the results: if we still regard the market portfolio as the benchmark, then the other three terms should be counted as performance. In contrast, if the benchmark is a dynamic strategy in the market portfolio (where $z_{t-1}$ is allowed to affect the choice market portfolio/riskfree asset), then only the first two terms are performance. In either case, the performance is time-varying.

8.2 Performance Attribution

The performance of a fund is in many cases due to decisions taken on several levels. In order to get a better understanding of how the performance was generated, a performance attribution calculation can be very useful. It uses information on portfolio weights (for instance, in-house information) to decompose overall performance according to a number of criteria (typically related to different levels of decision making).
For instance, it could be to decompose the return (as a rough measure of the performance) into the effects of (a) allocation to asset classes (equities, bonds, bills); and (b) security choice within each asset class. Alternatively, for a pure equity portfolio, it could be the effects of (a) allocation to industries; and (b) security choice within each industry.
Consider portfolios $p$ and $B$ (for benchmark) from the same set of assets. Let $n$ be the number of asset classes (or industries). Returns are
\[
R_p = \sum_{i=1}^{n} w_i R_{Pi} \text{ and } R_B = \sum_{i=1}^{n} v_i R_{Bi}, \tag{8.19}
\]
where $w_i$ is the weight on asset class $i$ (for instance, long T-bonds) in portfolio $p$, and $v_i$ is the corresponding weight in the benchmark $B$. Analogously, $R_{Pi}$ is the return that the portfolio earns on asset class $i$, and $R_{Bi}$ is the return the benchmark earns. In practice, the benchmark returns are typically taken from well established indices.
Form the difference and rearrange to get
\[
\begin{aligned}
R_p - R_B &= \sum_{i=1}^{n} (w_i R_{Pi} - v_i R_{Bi}) \\
          &= \sum_{i=1}^{n} (w_i - v_i) R_{Bi} + \sum_{i=1}^{n} w_i (R_{Pi} - R_{Bi}).
\end{aligned} \tag{8.20}
\]
The first term, $(w_i - v_i) R_{Bi}$, is the contribution from asset class (or industry) $i$. It uses the benchmark return for that asset class (as if you had invested in that index), and simply measures the contribution from investing more/less in that asset class than the benchmark. If decisions on allocation to different asset classes are taken by senior management (or a board), then this is the contribution of that level. The second term, $w_i (R_{Pi} - R_{Bi})$, is the contribution of the security choice (within an asset class) since it measures the difference in returns (within that asset class) of the portfolio and the benchmark.
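A small worked example may help. The weights and returns below are made up (three asset classes: equities, bonds, bills); the code computes the two terms of the decomposition (8.20) for each asset class and confirms that they add up to the total return difference.

```python
# Hypothetical inputs: equities, bonds, bills
w = [0.70, 0.20, 0.10]       # portfolio weights
v = [0.60, 0.30, 0.10]       # benchmark weights
R_P = [0.10, 0.04, 0.02]     # portfolio return within each asset class
R_B = [0.08, 0.05, 0.02]     # benchmark (index) return within each asset class

# First term of (8.20): allocation to asset classes
allocation = [(wi - vi) * rb for wi, vi, rb in zip(w, v, R_B)]
# Second term of (8.20): security choice within each asset class
selection = [wi * (rp - rb) for wi, rp, rb in zip(w, R_P, R_B)]

R_p = sum(wi * rp for wi, rp in zip(w, R_P))
R_b = sum(vi * rb for vi, rb in zip(v, R_B))

print(f"R_p - R_B = {R_p - R_b:.4f}")
print(f"allocation = {sum(allocation):.4f}, selection = {sum(selection):.4f}")
```

In this made-up case the fund overweights equities (a positive allocation effect) and also picks equities that beat the index (a positive selection effect), while its bond position subtracts from both terms.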
8.3 Style Analysis

Reference: Sharpe (1992)
Style analysis is a way to use econometric tools to find out the portfolio composition from a series of the returns, at least in broad terms.
The basic idea is to identify a number (5 to 10 perhaps) of return indices that are expected to account for the brunt of the portfolio’s returns, and then run a regression to find the portfolio “weights.” It is essentially a multi-factor regression without any intercept and where the coefficients are constrained to sum to unity and to be positive
\[
R^e_{pt} = \sum_{j=1}^{K} b_j R^e_{jt} + \varepsilon_{pt}, \text{ with } \sum_{j=1}^{K} b_j = 1 \text{ and } b_j \ge 0 \text{ for all } j. \tag{8.21}
\]
The coefficients are typically estimated by minimizing the sum of squared residuals. This is a nonlinear estimation problem, but there are very efficient methods for it (since it is a quadratic problem). Clearly, the restrictions could be changed to $L_j \le b_j \le U_j$, which could allow for short positions.
A pseudo-R2 (the squared correlation of the fitted and actual values) is sometimes used to gauge how well the regression captures the returns of the portfolio. The residuals can be thought of as the effect of stock selection, or possibly changing portfolio weights more generally. One way to get a handle of the latter is to run the regression on a moving data sample. The time-varying weights are often compared with the returns on the indices to see if the weights were moved in the right direction.
See Figure 8.3 and Figure 8.5 for examples.
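To make the estimation problem concrete, the sketch below runs a style regression of the form (8.21) on simulated data. It is only one of several possible approaches: it uses projected gradient descent (a simple substitute for a dedicated quadratic programming routine), with the Euclidean projection onto the set $\{b: b_j \ge 0, \sum_j b_j = 1\}$. The index returns, the "true" weights and all function names are assumptions.

```python
import random
from statistics import mean

random.seed(3)                       # reproducible simulated data (an assumption)
T = 300
true_b = [0.6, 0.4, 0.0]             # hypothetical portfolio weights
vol = [4.0, 3.0, 2.0]                # hypothetical index volatilities
X = [[random.gauss(0.3, s) for s in vol] for _ in range(T)]     # index excess returns
y = [sum(b * x for b, x in zip(true_b, row)) + random.gauss(0.0, 0.5) for row in X]

def project_to_simplex(b):
    """Euclidean projection onto {b: b_j >= 0, sum_j b_j = 1}."""
    u = sorted(b, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:               # condition holds for a prefix; keep the last t
            theta = t
    return [max(x - theta, 0.0) for x in b]

K = len(vol)
XtX = [[sum(r[i] * r[j] for r in X) for j in range(K)] for i in range(K)]
Xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(K)]
step = 1.0 / sum(XtX[i][i] for i in range(K))    # 1/trace(X'X) is a safe step size

b = [1.0 / K] * K                                # start from equal weights
for _ in range(5000):
    grad = [sum(XtX[i][j] * b[j] for j in range(K)) - Xty[i] for i in range(K)]
    b = project_to_simplex([bi - step * g for bi, g in zip(b, grad)])

# Pseudo-R2: squared correlation of fitted and actual values
fitted = [sum(bi * x for bi, x in zip(b, row)) for row in X]
mf, my = mean(fitted), mean(y)
cov = sum((f - mf) * (a - my) for f, a in zip(fitted, y))
pseudo_r2 = cov ** 2 / (sum((f - mf) ** 2 for f in fitted) * sum((a - my) ** 2 for a in y))

print("estimated weights:", [round(bi, 3) for bi in b])
print(f"pseudo-R2 = {pseudo_r2:.3f}")
```

Running the same estimation on a moving data window would give the time-varying weights discussed above.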

Bibliography
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.
Sharpe, W. F., 1992, “Asset allocation: management style and performance measurement,” Journal of Portfolio Management, 39, 119–138.

Figure 8.3: Example of style analysis, rolling data window. (Putnam Asset Allocation: Growth A: style analysis on moving data window, 1996–2012. Static weights: Equity: Int. (ex US), Developed 0.41; Equity: US, LargeCap, Value 0.21; Fixed Income: US, Bills 0.12. $R^2 = 0.90$.)

Figure 8.4: Example of style analysis, rolling data window. (Vanguard Wellington: style analysis on moving data window, 1996–2012. Static weights: Equity: US, LargeCap, Value 0.51; Fixed Income: US, Corp. Bonds 0.25; Fixed Income: US, Gov. Bonds 0.12. $R^2 = 0.89$.)

Figure 8.5: Style analysis and returns. (Vanguard Wellington: weight and relative return on the index, 1996–2012, in three panels: Equity: US, LargeCap, Value; Fixed Income: US, Corp. Bonds; Fixed Income: US, Gov. Bonds. Each panel shows the weight and the index return minus the SP500 return.)

9 Predicting Asset Returns

Reference (medium): Elton, Gruber, Brown, and Goetzmann (2010) 17 (efficient markets) and 26 (earnings estimation)
Additional references: Campbell, Lo, and MacKinlay (1997) 2 and 7; Cochrane (2001) 20.1
More advanced material is denoted by a star ($^*$). It is not required reading.

9.1 Asset Prices, Random Walks, and the Efficient Market Hypothesis

Let $P_t$ be the price of an asset at the end of period $t$, after any dividend in $t$ has been paid (an ex-dividend price). The gross return ($1 + R_{t+1}$, like 1.05) of holding an asset with dividends (per current share), $D_{t+1}$, between $t$ and $t+1$ is then defined as
\[
1 + R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t}. \tag{9.1}
\]

The dividend can, of course, be zero in a particular period, so this formulation encompasses the case of daily stock prices with annual dividend payment.
Remark 9.1 (Conditional expectations) The expected value of the random variable $y_{t+1}$ conditional on the information set in $t$, $\mathrm{E}_t y_{t+1}$, is the best guess of $y_{t+1}$ using the information in $t$. Example: suppose $y_{t+1}$ equals $x_t + \varepsilon_{t+1}$, where $x_t$ is known in $t$, but all we know about $\varepsilon_{t+1}$ in $t$ is that it is a random variable with a zero mean and some (finite) variance. In this case, the best guess of $y_{t+1}$ based on what we know in $t$ is equal to $x_t$.
Take expectations of (9.1) based on the information set in $t$
\[
1 + \mathrm{E}_t R_{t+1} = \frac{\mathrm{E}_t P_{t+1} + \mathrm{E}_t D_{t+1}}{P_t} \text{ or} \tag{9.2}
\]
\[
P_t = \frac{\mathrm{E}_t P_{t+1} + \mathrm{E}_t D_{t+1}}{1 + \mathrm{E}_t R_{t+1}}. \tag{9.3}
\]

This formulation is only a definition, but it will help us organize the discussion of how asset prices are determined.
This expected return, $\mathrm{E}_t R_{t+1}$, is likely to be greater than a riskfree interest rate if the asset has positive systematic (non-diversifiable) risk. For instance, in a CAPM model this would manifest itself in a positive “beta.” In an equilibrium setting, we can think of this as a “required return” needed for investors to hold this asset.

9.1.1 Different Versions of the Efficient Market Hypothesis

The efficient market hypothesis casts a long shadow on every attempt to forecast asset prices. In its simplest form it says that it is not possible to forecast asset prices, but there are several other forms with different implications. Before attempting to forecast financial markets, it is useful to take a look at the logic of the efficient market hypothesis. This will help us to organize the effort and to interpret the results.
A modern interpretation of the efficient market hypothesis (EMH) is that the information set used in forming the market expectations in (9.2) includes all public information.
(This is the semi-strong form of the EMH since it says all public information; the strong form says all public and private information; and the weak form says all information in price and trading volume data.) The implication is that simple stock picking techniques are not likely to improve the portfolio performance, that is, abnormal returns. Instead, advanced (costly?) techniques are called for in order to gather more detailed information than that used in market’s assessment of the asset. Clearly, with a better forecast of the future return than that of the market there is plenty of scope for dynamic trading strategies. Note that this modern interpretation of the efficient market hypothesis does not rule out the possibility of forecastable prices or returns. It does rule out that abnormal returns can be achieved by stock picking techniques which rely on public information.
There are several different traditional interpretations of the EMH. Like the modern interpretation, they do not rule out the possibility of achieving abnormal returns by using better information than the rest of the market. However, they make stronger assumptions about whether prices or returns are forecastable. Typically one of the following is assumed to be unforecastable: price changes, returns, or returns in excess of a riskfree rate (interest rate). By unforecastable, it is meant that the best forecast (expected value conditional on available information) is a constant. Conversely, if it is found that there is some information in $t$ that can predict returns $R_{t+1}$, then the market cannot price the asset as if $\mathrm{E}_t R_{t+1}$ is a constant, at least not if the market forms expectations rationally. We will now analyze the logic of each of the traditional interpretations.
If price changes are unforecastable, then $\mathrm{E}_t P_{t+1} - P_t$ equals a constant. Typically, this constant is taken to be zero so $P_t$ is a martingale. Use $\mathrm{E}_t P_{t+1} = P_t$ in (9.2)
$$\mathrm{E}_t R_{t+1} = \frac{\mathrm{E}_t D_{t+1}}{P_t}. \tag{9.4}$$

This says that the expected net return on the asset is the expected dividend divided by the current price. This is clearly implausible for daily data since it means that the expected return is zero for all days except those days when the asset pays a dividend (or rather, the day the asset goes ex dividend), and then there is an enormous expected return for the one day when the dividend is paid. As a first step, we should probably refine the interpretation of the efficient market hypothesis to include the dividend so that $\mathrm{E}_t (P_{t+1} + D_{t+1}) = P_t$.
Using that in (9.2) gives $1 + \mathrm{E}_t R_{t+1} = 1$, which can only be satisfied if $\mathrm{E}_t R_{t+1} = 0$, which seems very implausible for long investment horizons, although it is probably a reasonable approximation for short horizons (a week or less).
If returns are unforecastable, so $\mathrm{E}_t R_{t+1} = R$ (a constant), then (9.3) gives
$$P_t = \frac{\mathrm{E}_t P_{t+1} + \mathrm{E}_t D_{t+1}}{1 + R}. \tag{9.5}$$

The main problem with this interpretation is that it looks at every asset separately and that outside options are not taken into account. For instance, if the nominal interest rate changes from 5% to 10%, why should the expected (required) return on a stock be unchanged? In fact, most asset pricing models suggest that the expected return $\mathrm{E}_t R_{t+1}$ equals the riskfree rate plus compensation for risk.
If excess returns are unforecastable, then the compensation (over the riskfree rate) for risk is constant. The risk compensation is, of course, already reflected in the current price $P_t$, so the issue is then if there is some information in $t$ which is correlated with the risk compensation in $P_{t+1}$. Note that such forecastability does not necessarily imply an inefficient market or presence of uninformed traders: it could equally well be due to movements in risk compensation driven by movements in uncertainty (option prices suggest that there are plenty of movements in uncertainty). If so, the forecastability cannot be used to generate abnormal returns (over riskfree rate plus risk compensation). However, it could also be due to exploitable market inefficiencies. Alternatively, you may argue that the market compensates for risk which you happen to be immune to, so you are interested in the return rather than the risk adjusted return.
This discussion of the traditional efficient market hypothesis suggests that the most interesting hypotheses to test are if returns or excess returns are forecastable. In practice, the results for them are fairly similar since the movements in most asset returns are much greater than the movements in interest rates.
9.1.2 Martingales and Random Walks

Further reading: Cuthbertson (1996) 5.3
The accumulated wealth in a sequence of fair bets is expected to be unchanged. It is then said to be a martingale.
The time series $x$ is a martingale with respect to an information set $\Omega_t$ if the expected value of $x_{t+s}$ ($s \ge 1$) conditional on the information set $\Omega_t$ equals $x_t$. (The information set $\Omega_t$ is often taken to be just the history of $x$: $x_t, x_{t-1}, \ldots$)
The time series $x$ is a random walk if $x_{t+1} = x_t + \varepsilon_{t+1}$, where $\varepsilon_t$ and $\varepsilon_{t+s}$ are uncorrelated for all $s \neq 0$, and $\mathrm{E}\,\varepsilon_t = 0$. (There are other definitions which require that $\varepsilon_t$ and $\varepsilon_{t+s}$ have the same distribution.) A random walk is a martingale; the converse is not necessarily true.
Remark 9.2 (A martingale, but not a random walk). Suppose $y_{t+1} = y_t u_{t+1}$, where $u_t$ and $u_{t+s}$ are uncorrelated for all $s \neq 0$, and $\mathrm{E}_t u_{t+1} = 1$. This is a martingale, but not a random walk.
In any case, the martingale property implies that $x_{t+s} = x_t + \varepsilon_{t+s}$, where the expected value of $\varepsilon_{t+s}$ based on $\Omega_t$ is zero. This is close enough to the random walk to motivate the random walk idea in most cases.
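The martingale property of the process in Remark 9.2 can be checked with a small simulation. The sketch below is only an illustration: the uniform distribution for $u$, the seed, the horizon and the number of paths are all assumptions chosen for the example, not part of the notes. Note that the increment $y_t(u_{t+1} - 1)$ scales with the level $y_t$.

```python
import random

def simulate_path(y0, n_steps, rng):
    """Simulate y_{t+1} = y_t * u_{t+1}, with u uniform on [0.5, 1.5] so that E u = 1."""
    y = y0
    for _ in range(n_steps):
        y *= rng.uniform(0.5, 1.5)
    return y

rng = random.Random(42)                  # fixed seed, so the run is reproducible
y0, n_steps, n_paths = 1.0, 5, 20000     # illustrative choices
endpoints = [simulate_path(y0, n_steps, rng) for _ in range(n_paths)]

# For a martingale, the expected future value equals the current value y0.
mean_endpoint = sum(endpoints) / n_paths
print(f"average of y five periods ahead: {mean_endpoint:.3f} (starting value {y0})")
```

The average across paths should be close to the starting value, up to simulation noise.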

9.2 Autocorrelations

9.2.1 Autocorrelation Coefficients and the Box-Pierce Test

The autocovariances of the $y_t$ process can be estimated as
$$\hat{\gamma}_s = \frac{1}{T} \sum_{t=1+s}^{T} (y_t - \bar{y})(y_{t-s} - \bar{y}), \tag{9.6}$$
$$\text{with } \bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t. \tag{9.7}$$
(We typically divide by $T$ in (9.6) even if we have only $T - s$ full observations to estimate $\gamma_s$ from.) Autocorrelations are then estimated as
$$\hat{\rho}_s = \hat{\gamma}_s / \hat{\gamma}_0. \tag{9.8}$$

The sampling properties of $\hat{\rho}_s$ are complicated, but there are several useful large sample results for Gaussian processes (these results typically carry over to processes which are similar to the Gaussian; a homoskedastic process with finite 6th moment is typically enough, see Priestley (1981) 5.3 or Brockwell and Davis (1991) 7.2-7.3). When the true autocorrelations are all zero (not $\rho_0$, of course), then for any $i$ and $j$ different from zero
$$\sqrt{T} \begin{bmatrix} \hat{\rho}_i \\ \hat{\rho}_j \end{bmatrix} \to^d N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right). \tag{9.9}$$
This result can be used to construct tests for both single autocorrelations (t-test or $\chi^2$ test) and several autocorrelations at once ($\chi^2$ test).

Example 9.3 (t-test) We want to test the hypothesis that $\rho_1 = 0$. Since the $N(0,1)$ distribution has 5% of the probability mass below -1.65 and another 5% above 1.65, we can reject the null hypothesis at the 10% level if $\sqrt{T} |\hat{\rho}_1| > 1.65$. With $T = 100$, we therefore need $|\hat{\rho}_1| > 1.65/\sqrt{100} = 0.165$ for rejection, and with $T = 1000$ we need $|\hat{\rho}_1| > 1.65/\sqrt{1000} \approx 0.052$.
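The estimators (9.6)–(9.8) and the test in Example 9.3 can be sketched in a few lines. This is a minimal illustration; the function names and the short data series are hypothetical and only serve to make the arithmetic easy to follow.

```python
import math

def autocovariance(y, s):
    """Sample autocovariance at lag s, dividing by T as in (9.6)."""
    T = len(y)
    ybar = sum(y) / T
    return sum((y[t] - ybar) * (y[t - s] - ybar) for t in range(s, T)) / T

def autocorrelation(y, s):
    """Sample autocorrelation at lag s, as in (9.8)."""
    return autocovariance(y, s) / autocovariance(y, 0)

def reject_zero_autocorr(rho_hat, T, crit=1.65):
    """Two-sided test of rho = 0 at the 10% level: reject if sqrt(T)*|rho_hat| > crit."""
    return math.sqrt(T) * abs(rho_hat) > crit

y = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical data, for illustration only
rho1 = autocorrelation(y, 1)
print(rho1)

# With T = 100 the rejection threshold for |rho_hat| is 0.165 (Example 9.3).
print(reject_zero_autocorr(0.2, 100), reject_zero_autocorr(0.1, 100))
```

With so few observations the large-sample result in (9.9) is of course not reliable; the data are only there to show the mechanics.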

The Box-Pierce test follows directly from the result in (9.9), since it shows that $\sqrt{T} \hat{\rho}_i$ and $\sqrt{T} \hat{\rho}_j$ are iid $N(0,1)$ variables. Therefore, the sum of their squares is distributed as a $\chi^2$ variable. The test statistic typically used is
$$Q_L = T \sum_{s=1}^{L} \hat{\rho}_s^2 \to^d \chi^2_L. \tag{9.10}$$

Example 9.4 (Box-Pierce) Let $\hat{\rho}_1 = 0.165$ and $T = 100$, so $Q_1 = 100 \times 0.165^2 = 2.72$. The 10% critical value of the $\chi^2_1$ distribution is 2.71, so the null hypothesis of no autocorrelation is rejected.
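The statistic in (9.10) is simple to compute once the autocorrelations are available. The sketch below reproduces Example 9.4; the function name is hypothetical, and the 10% critical values are rounded values from standard $\chi^2$ tables.

```python
def box_pierce(rhos, T):
    """Box-Pierce statistic Q_L = T * sum of squared autocorrelations, lags 1..L."""
    return T * sum(r * r for r in rhos)

# 10% critical values of the chi-square distribution for 1 to 5 degrees of freedom
CHI2_CRIT_10 = {1: 2.71, 2: 4.61, 3: 6.25, 4: 7.78, 5: 9.24}

rhos = [0.165]                 # estimated autocorrelation at lag 1 (Example 9.4)
T = 100
Q = box_pierce(rhos, T)
reject = Q > CHI2_CRIT_10[len(rhos)]
print(f"Q = {Q:.4f}, reject at the 10% level: {reject}")
```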
The choice of lag order in (9.10), $L$, should be guided by theoretical considerations, but it may also be wise to try different values. There is clearly a trade-off: too few lags may miss a significant high-order autocorrelation, but too many lags can destroy the power of the test (as the test statistic is not affected much by increasing $L$, but the critical values increase).

9.2.2 Autoregressions

An alternative way of testing autocorrelations is to estimate an AR model
$$y_t = c + a_1 y_{t-1} + a_2 y_{t-2} + \ldots + a_p y_{t-p} + \varepsilon_t, \tag{9.11}$$

and then test if all slope coefficients ($a_1, a_2, \ldots, a_p$) are zero with a $\chi^2$ or $F$ test. This approach is somewhat less general than the Box-Pierce test, but most stationary time series processes can be well approximated by an AR of relatively low order.
See Figure 9.5 for an illustration.
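As an illustration of the approach, the sketch below fits (9.11) by least squares and computes a joint test of all slope coefficients. It makes several assumptions that are not in the notes: the data are simulated, the helper names are hypothetical, and the test uses the $T R^2$ form, which is one asymptotically $\chi^2_p$ statistic for this null hypothesis under homoskedastic errors (a Wald or $F$ form would be an alternative).

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ar(y, p):
    """LS fit of y_t = c + a_1 y_{t-1} + ... + a_p y_{t-p} + e_t.
    Returns ([c, a_1, ..., a_p], R-squared, number of observations used)."""
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)] for t in range(p, len(y))]
    target = y[p:]
    k = p + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    coef = solve(XtX, Xty)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    mean = sum(target) / len(target)
    ss_res = sum((v - f) ** 2 for v, f in zip(target, fitted))
    ss_tot = sum((v - mean) ** 2 for v in target)
    return coef, 1.0 - ss_res / ss_tot, len(target)

random.seed(0)
y = [0.0]
for _ in range(2000):                        # simulated AR(1) with a_1 = 0.5
    y.append(0.5 * y[-1] + random.gauss(0.0, 1.0))

coef, r2, n = fit_ar(y, p=2)
stat = n * r2     # compare with the chi-square(2) critical value (4.61 at the 10% level)
print(coef, stat)
```

For strongly autocorrelated data such as this simulated series, the statistic is far above the critical value; for returns, the interesting cases are those where it is close to it.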
The autoregression can also allow for the coefficients to depend on the market situation. For instance, consider an AR(1), but where the autoregression coefficient may be different depending on the sign of last period’s return
$$y_t = c + a\,\delta(y_{t-1} \le 0)\,y_{t-1} + b\,\delta(y_{t-1} > 0)\,y_{t-1}, \text{ where } \delta(q) = \begin{cases} 1 & \text{if } q \text{ is true} \\ 0 & \text{else.} \end{cases} \tag{9.12}$$
See Figure 9.3 for an illustration.
Inference on the slope coefficient in autoregressions of returns over horizons longer than the data frequency (for instance, analysis of weekly returns in a data set consisting of daily observations) must be done with care. If only non-overlapping returns are used (use the weekly return for a particular weekday only, say Wednesdays), the standard LS expression for the standard deviation of the autoregressive parameter is likely to be reasonable. This is not the case if overlapping returns (all daily data on weekly returns) are used.

Remark 9.5 (Overlapping returns) Consider an AR(1) for the two-period return, $y_{t-1} + y_t$,
$$y_{t+1} + y_{t+2} = a + b_2 (y_{t-1} + y_t) + \varepsilon_{t+2}.$$
Two successive observations with non-overlapping returns are then
$$y_{t+1} + y_{t+2} = a + b_2 (y_{t-1} + y_t) + \varepsilon_{t+2}$$
$$y_{t+3} + y_{t+4} = a + b_2 (y_{t+1} + y_{t+2}) + \varepsilon_{t+4}.$$
Suppose that $y_t$ is not autocorrelated, so the slope coefficient $b_2 = 0$. We can then write the residuals as
$$\varepsilon_{t+2} = -a + y_{t+1} + y_{t+2}$$
$$\varepsilon_{t+4} = -a + y_{t+3} + y_{t+4},$$
which are uncorrelated. Compare this to the case where we use overlapping data. Two successive observations are then
$$y_{t+1} + y_{t+2} = a + b_2 (y_{t-1} + y_t) + \varepsilon_{t+2}$$
$$y_{t+2} + y_{t+3} = a + b_2 (y_t + y_{t+1}) + \varepsilon_{t+3}.$$
As before, $b_2 = 0$ if $y_t$ has no autocorrelation, so the residuals become
$$\varepsilon_{t+2} = -a + y_{t+1} + y_{t+2}$$
$$\varepsilon_{t+3} = -a + y_{t+2} + y_{t+3},$$
which are correlated since $y_{t+2}$ shows up in both. This demonstrates that overlapping return data introduces autocorrelation of the residuals, which has to be handled in order to make correct inference.
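The point of Remark 9.5 can be illustrated by simulation. The sketch below assumes iid standard normal one-period returns (so $b_2 = 0$) and compares the first-order autocorrelation of overlapping and non-overlapping two-period returns; the seed and sample size are arbitrary choices. With iid returns, two successive overlapping sums share one of their two terms, so their correlation is $1/2$.

```python
import random

def first_autocorr(x):
    """First-order sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return sum(d[t] * d[t - 1] for t in range(1, n)) / sum(v * v for v in d)

random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(20000)]   # iid one-period returns

# Two-period returns: every period (overlapping) or every second period (non-overlapping)
overlapping = [y[t] + y[t + 1] for t in range(len(y) - 1)]
non_overlapping = [y[t] + y[t + 1] for t in range(0, len(y) - 1, 2)]

rho_overlap = first_autocorr(overlapping)
rho_non_overlap = first_autocorr(non_overlapping)
print(f"overlapping: {rho_overlap:.3f}, non-overlapping: {rho_non_overlap:.3f}")
```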

[Figure 9.1 (two panels, horizontal axis: Year): “SMI” (series: SMI, bill portfolio) and “SMI daily excess returns, %”. Daily SMI data, 1988:7−2011:5. 1st order autocorrelation of returns (daily, weekly, monthly): 0.02, −0.08, 0.04. 1st order autocorrelation of absolute returns (daily, weekly, monthly): 0.28, 0.29, 0.17.]

Figure 9.1: Time series properties of SMI

9.2.3 Autoregressions versus Autocorrelations

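It is straightforward to see the relation between autocorrelations and the AR model when the AR model is the true process. This relation is given by the Yule-Walker equations.
For an AR(1), the autoregression coefficient is simply the first autocorrelation coefficient. For an AR(2), $y_t = a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t$, we have
$$\begin{bmatrix} \mathrm{Cov}(y_t, y_t) \\ \mathrm{Cov}(y_{t-1}, y_t) \\ \mathrm{Cov}(y_{t-2}, y_t) \end{bmatrix} = \begin{bmatrix} \mathrm{Cov}(y_t, a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t) \\ \mathrm{Cov}(y_{t-1}, a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t) \\ \mathrm{Cov}(y_{t-2}, a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t) \end{bmatrix}$$
$$= \begin{bmatrix} a_1 \mathrm{Cov}(y_t, y_{t-1}) + a_2 \mathrm{Cov}(y_t, y_{t-2}) + \mathrm{Cov}(y_t, \varepsilon_t) \\ a_1 \mathrm{Cov}(y_{t-1}, y_{t-1}) + a_2 \mathrm{Cov}(y_{t-1}, y_{t-2}) \\ a_1 \mathrm{Cov}(y_{t-2}, y_{t-1}) + a_2 \mathrm{Cov}(y_{t-2}, y_{t-2}) \end{bmatrix}, \text{ or}$$
$$\begin{bmatrix} \gamma_0 \\ \gamma_1 \\ \gamma_2 \end{bmatrix} = \begin{bmatrix} a_1 \gamma_1 + a_2 \gamma_2 + \mathrm{Var}(\varepsilon_t) \\ a_1 \gamma_0 + a_2 \gamma_1 \\ a_1 \gamma_1 + a_2 \gamma_0 \end{bmatrix}. \tag{9.13}$$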

[Figure 9.2 (four panels): “Autocorr, daily excess returns” and “Autocorr, daily abs(excess returns)”, against lags (days); “Autocorr, weekly excess returns” and “Autocorr, weekly abs(excess returns)”, against lags (weeks). Autocorr with 90% conf band around 0. S&P 500, 1979:1−2011:5.]

Figure 9.2: Predictability of US stock returns
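To transform to autocorrelation, divide by $\gamma_0$. The last two equations are then
$$\begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 \rho_1 \\ a_1 \rho_1 + a_2 \end{bmatrix} \text{ or } \begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix} = \begin{bmatrix} a_1 / (1 - a_2) \\ a_1^2 / (1 - a_2) + a_2 \end{bmatrix}. \tag{9.14}$$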

If we know the parameters of the AR(2) model ($a_1$, $a_2$, and $\mathrm{Var}(\varepsilon_t)$), then we can solve for the autocorrelations. Alternatively, if we know the autocorrelations, then we can solve for the autoregression coefficients. This demonstrates that testing if all the autocorrelations are zero is essentially the same as testing if all the autoregressive coefficients are zero. Note, however, that the transformation is non-linear, which may make a difference in small samples.
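The two-way mapping in (9.14) can be written out directly. In the sketch below the function names and the parameter values are hypothetical; the inverse mapping follows from solving $\rho_1 = a_1 + a_2 \rho_1$ and $\rho_2 = a_1 \rho_1 + a_2$ for $a_1$ and $a_2$.

```python
def ar2_to_autocorr(a1, a2):
    """First two autocorrelations implied by AR(2) coefficients, as in (9.14)."""
    rho1 = a1 / (1 - a2)
    rho2 = a1 * rho1 + a2
    return rho1, rho2

def autocorr_to_ar2(rho1, rho2):
    """AR(2) coefficients implied by the first two autocorrelations."""
    a2 = (rho2 - rho1 ** 2) / (1 - rho1 ** 2)
    a1 = rho1 * (1 - a2)
    return a1, a2

rho1, rho2 = ar2_to_autocorr(0.5, 0.2)     # illustrative coefficients
a1_back, a2_back = autocorr_to_ar2(rho1, rho2)
print(rho1, rho2, a1_back, a2_back)
```

The round trip recovers the original coefficients, and zero autocorrelations map into zero autoregression coefficients (and vice versa).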

[Figure 9.3 (two panels): “Autoregression coeff, after negative returns” and “Autoregression coeff, after positive returns”, against lags (days), with 90% conf band around 0. S&P 500 (daily), 1979:1−2011:5. Based on the following regression: $R_t = \alpha + \beta (1 - Q_{t-1}) R_{t-1} + \gamma Q_{t-1} R_{t-1} + \epsilon_t$, where $Q_{t-1} = 1$ if $R_{t-1} > 0$, and zero otherwise.]

Figure 9.3: Predictability of US stock returns, results from a regression with interactive dummies

9.2.4 Variance Ratios

A variance ratio is another way to measure predictability. It is defined as the variance of a $q$-period return divided by $q$ times the variance of a 1-period return
$$VR_q = \frac{\mathrm{Var}\left( \sum_{s=0}^{q-1} y_{t-s} \right)}{q\,\mathrm{Var}(y_t)}. \tag{9.15}$$
To see that this is related to predictability, consider the 2-period variance ratio.
$$VR_2 = \frac{\mathrm{Var}(y_t + y_{t-1})}{2\,\mathrm{Var}(y_t)} \tag{9.16}$$
$$= \frac{\mathrm{Var}(y_t) + \mathrm{Var}(y_{t-1}) + 2\,\mathrm{Cov}(y_t, y_{t-1})}{2\,\mathrm{Var}(y_t)} = 1 + \frac{\mathrm{Cov}(y_t, y_{t-1})}{\mathrm{Var}(y_t)} = 1 + \rho_1. \tag{9.17}$$

It is clear from (9.17) that if y t is not serially correlated, then the variance ratio is unity; a value above one indicates positive serial correlation and a value below one indicates negative serial correlation. The same applies to longer horizons.
[Figure 9.4 (three panels): “Autocorr, excess returns, smallest decile”, “Autocorr, excess returns, 5th decile” and “Autocorr, excess returns, largest decile”, against lags (days). Autocorr with 90% conf band around 0. US daily data 1979:1−2010:12.]

Figure 9.4: Predictability of US stock returns, size deciles
The estimation of $VR_q$ is typically not done by replacing the population variances in (9.15) with the sample variances, since this would require using non-overlapping long returns, which wastes a lot of data points. For instance, if we have 24 years of data and we want to study the variance ratio for the 5-year horizon, then 4 years of data are wasted.
Instead, we typically rely on a transformation of (9.15)
$$VR_q = \frac{\mathrm{Var}\left( \sum_{s=0}^{q-1} y_{t-s} \right)}{q\,\mathrm{Var}(y_t)} = \sum_{s=-(q-1)}^{q-1} \left( 1 - \frac{|s|}{q} \right) \rho_s \text{ or}$$
$$= 1 + 2 \sum_{s=1}^{q-1} \left( 1 - \frac{s}{q} \right) \rho_s. \tag{9.18}$$

[Figure 9.5 (three panels): “Return = a + b*lagged Return, slope” (slope with 90% conf band) and “Return = a + b*lagged Return, R2”, against return horizon (months); “Scatter plot, 36 month returns” (Return against Lagged return). US stock returns 1926:1−2011:4.]

Figure 9.5: Predictability of US stock returns
To estimate $VR_q$, we first estimate the autocorrelation coefficients (using all available data points for each estimation) and then calculate (9.18).
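A minimal sketch of this two-step estimation is given below. The function names and the short data series are hypothetical and chosen only so that the arithmetic can be checked by hand.

```python
def autocorrelation(y, s):
    """Sample autocorrelation at lag s, with autocovariances divided by T."""
    T = len(y)
    ybar = sum(y) / T
    d = [v - ybar for v in y]
    gamma_s = sum(d[t] * d[t - s] for t in range(s, T)) / T
    gamma_0 = sum(v * v for v in d) / T
    return gamma_s / gamma_0

def variance_ratio(y, q):
    """Variance ratio for horizon q, computed from autocorrelations as in (9.18)."""
    return 1 + 2 * sum((1 - s / q) * autocorrelation(y, s) for s in range(1, q))

y = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data, for illustration only
vr2 = variance_ratio(y, 2)         # equals 1 + rho_1, as in (9.17)
vr3 = variance_ratio(y, 3)
print(vr2, vr3)
```

For this series $\hat{\rho}_1 = 0.4$ and $\hat{\rho}_2 = -0.1$, so the 2-period ratio is $1.4$ and the 3-period ratio is $1 + 2(\tfrac{2}{3} \times 0.4 - \tfrac{1}{3} \times 0.1)$.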

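Remark 9.6 (Sampling distribution of $\widehat{VR}_q$) Under the null hypothesis that there is no autocorrelation, (9.9) and (9.18) give
$$\sqrt{T} \left( \widehat{VR}_q - 1 \right) \to^d N\left[ 0, \sum_{s=1}^{q-1} 4 \left( 1 - \frac{s}{q} \right)^2 \right].$$

Example 9.7 (Sampling distributions of $\widehat{VR}_2$ and $\widehat{VR}_3$)
$$\sqrt{T} \left( \widehat{VR}_2 - 1 \right) \to^d N(0, 1) \text{ or } \widehat{VR}_2 \to^d N(1, 1/T)$$
$$\text{and } \sqrt{T} \left( \widehat{VR}_3 - 1 \right) \to^d N(0, 20/9) \text{ or } \widehat{VR}_3 \to^d N\left[ 1, (20/9)/T \right].$$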
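The asymptotic variance in Remark 9.6 and the implied confidence band around unity are easy to tabulate. The sketch below uses hypothetical function names, and the sample size in the usage line is an arbitrary choice; the 1.65 critical value gives a 90% band, as in the examples above.

```python
import math

def vr_asymptotic_variance(q):
    """Asymptotic variance of sqrt(T)*(VR_q - 1) under no autocorrelation (Remark 9.6)."""
    return sum(4 * (1 - s / q) ** 2 for s in range(1, q))

def vr_confidence_band(q, T, z=1.65):
    """Band around 1 within which VR_q should fall (90% for z = 1.65) under the null."""
    half_width = z * math.sqrt(vr_asymptotic_variance(q) / T)
    return 1 - half_width, 1 + half_width

var2 = vr_asymptotic_variance(2)           # 1, as in Example 9.7
var3 = vr_asymptotic_variance(3)           # 20/9, as in Example 9.7
lower, upper = vr_confidence_band(2, 100)  # illustrative sample size T = 100
print(var2, var3, lower, upper)
```

The band widens with the horizon $q$, since more (noisy) autocorrelation estimates enter the variance ratio.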

[Figure 9.6 (two panels): “Variance Ratio, 1926−” and “Variance Ratio, 1957−”, VR with 90% conf band against return horizon (months). US stock returns 1926:1−2011:4. The confidence bands use the asymptotic sampling distribution of the variance ratios.]

Figure 9.6: Variance ratios, US excess stock returns
The results in CLM Tables 2.5 and 2.6 (weekly CRSP stock index returns, early 1960s to mid 1990s) show variance ratios above one and increasing with the number of lags, $q$.
The results for individual stocks in CLM Table 2.7 show variance ratios close to, or even below, unity. Cochrane Tables 20.5–6 report weak evidence for more mean reversion in multi-year returns (annual NYSE stock index, 1926 to mid 1990s).
See Figure 9.6 for an illustration.

9.3 Other Predictors and Methods

There are many other possible predictors of future stock returns. For instance, both the dividend-price ratio and nominal interest rates have been used to predict long-run returns, and lagged short-run returns on other assets have been used to predict short-run returns.
9.3.1 Lead-Lags

Stock indices have more positive autocorrelation than (most) individual stocks: there should therefore be fairly strong cross-autocorrelations across individual stocks. (See Campbell, Lo, and MacKinlay (1997) Tables 2.7 and 2.8.) Indeed, this is also what is found in US data, where weekly returns of large size stocks forecast weekly returns of small size stocks.
See Figures 9.7–9.8 for an illustration.
[Figure 9.7 (three panels): “Correlation of largest decile with lags of”, “Correlation of 5th decile with lags of” and “Correlation of smallest decile with lags of” (series: smallest decile, 5th decile, largest decile), against Days. US size deciles, US daily data 1979:1−2010:12.]

Figure 9.7: Cross-correlation across size deciles

9.3.2 Dividend-Price Ratio as a Predictor

One of the most successful attempts to forecast long-run returns is a regression of future returns on the current dividend-price ratio (here in logs)
$$\sum_{s=1}^{q} r_{t+s} = \alpha + \beta_q (d_t - p_t) + \varepsilon_{t+q}. \tag{9.19}$$

For instance, CLM Table 7.1 reports $R^2$ values from this regression which are close to zero for monthly returns, but they increase to 0.4 for 4-year returns (US, value weighted index, mid 1920s to mid 1990s).
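The mechanics of the regression (9.19) can be sketched as follows. The function name and the data are hypothetical: the returns are constructed to depend exactly on the lagged log dividend-price ratio, so that the estimates can be checked by hand. With $q > 1$ the left-hand side uses overlapping returns, so the caveat in Remark 9.5 about standard errors applies; the sketch reports only point estimates and $R^2$.

```python
def long_horizon_regression(r, dp, q):
    """LS regression of the q-period return sum_{s=1}^q r[t+s] on dp[t].
    Returns (alpha, beta, R-squared)."""
    n = len(r) - q
    x = [dp[t] for t in range(n)]
    z = [sum(r[t + s] for s in range(1, q + 1)) for t in range(n)]
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    szz = sum((b - zbar) ** 2 for b in z)
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    beta = sxz / sxx
    alpha = zbar - beta * xbar
    return alpha, beta, sxz ** 2 / (sxx * szz)

# Hypothetical data: log dividend-price ratio, and log returns built as
# r[t+1] = 0.4 + 0.1 * dp[t] (no noise), purely for illustration.
dp = [-3.0, -3.2, -2.9, -3.5, -3.1, -2.8]
r = [0.0] + [0.4 + 0.1 * v for v in dp[:-1]]

alpha, beta, r2 = long_horizon_regression(r, dp, q=1)
print(alpha, beta, r2)
```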
See Figure 9.10 for an illustration.
[Figure 9.8 (three panels, vertical axis: regression coefficient, horizontal axis: Days): “Regression of largest decile on lags of” (self), “Regression of 5th decile on lags of” (self, largest decile) and “Regression of smallest decile on lags of” (self, largest decile). US size deciles, US daily data 1979:1−2010:12. Multiple regression with lagged return on self and largest deciles as regressors. The figures show regression coefficients.]

Figure 9.8: Coefficients from multiple prediction regressions

9.3.3 Predictability but No Autocorrelation

The evidence for US stock returns is that long-run returns may perhaps be predicted by the dividend-price ratio or interest rates, but that the long-run autocorrelations are weak (long-run US stock returns appear to be “weak-form efficient” but not “semi-strong efficient”).
This should remind us of the fact that predictability and autocorrelation need not be the same thing: although autocorrelation implies predictability, we can have predictability without autocorrelation.
9.3.4 Trading Strategies

Another way to measure predictability and to illustrate its economic importance is to calculate the return of a dynamic trading strategy, and then measure the “performance” of this strategy in relation to some benchmark portfolios. The trading strategy should, of course, be based on the variable that is supposed to forecast returns.

[Figure 9.9 (heat map): “(Auto−)correlation matrix, monthly FF returns 1957:1−2010:12”, a 25×25 grid with rows and columns numbered 1 to 25 and the correlation printed in each cell.]

Figure 9.9: Illustration of the cross-autocorrelations, $\mathrm{Corr}(R_t, R_{t-k})$, monthly FF data. Dark colors indicate high correlations, light colors indicate low correlations.
A common way (since Jensen, updated in Huberman and Kandel (1987)) is to study the performance of a portfolio by running the following regression
$$R_{1t} - R_{ft} = \alpha + \beta (R_{mt} - R_{ft}) + \varepsilon_t, \quad \mathrm{E}\,\varepsilon_t = 0 \text{ and } \mathrm{Cov}(R_{mt} - R_{ft}, \varepsilon_t) = 0, \tag{9.20}$$
where $R_{1t} - R_{ft}$ is the excess return on the portfolio being studied and $R_{mt} - R_{ft}$ the excess returns of a vector of benchmark portfolios (for instance, only the market portfolio if we want to rely on CAPM; returns times conditional information if we want to allow for time-variation in expected benchmark returns). Neutral performance (mean-variance intersection, that is, that the tangency portfolio is unchanged and the two MV frontiers intersect there) requires $\alpha = 0$, which can be tested with a $t$ test.
See Figure 9.11 for an illustration.
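For the case of a single benchmark, the regression (9.20) and the $t$ test of $\alpha = 0$ can be sketched as below. The function name and the four data points are hypothetical, and the standard error is the usual LS expression under iid residuals, which is an assumption of this sketch.

```python
import math

def alpha_regression(excess_portfolio, excess_market):
    """LS regression of portfolio excess returns on market excess returns.
    Returns (alpha, beta, t-statistic of alpha), assuming iid residuals."""
    n = len(excess_market)
    xbar = sum(excess_market) / n
    ybar = sum(excess_portfolio) / n
    sxx = sum((x - xbar) ** 2 for x in excess_market)
    sxy = sum((x - xbar) * (y - ybar)
              for x, y in zip(excess_market, excess_portfolio))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    ssr = sum((y - alpha - beta * x) ** 2
              for x, y in zip(excess_market, excess_portfolio))
    s2 = ssr / (n - 2)                                   # residual variance
    se_alpha = math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))
    return alpha, beta, alpha / se_alpha

# Hypothetical excess returns (in %), for illustration only
market = [1.0, 2.0, 3.0, 4.0]
portfolio = [2.1, 3.9, 6.2, 7.8]

alpha, beta, t_alpha = alpha_regression(portfolio, market)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, t-stat of alpha = {t_alpha:.2f}")
```

In this made-up sample the $t$-statistic is well below conventional critical values, so neutral performance would not be rejected.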

[Figure 9.10 (three panels): “Return = a + b*log(E/P), slope” (slope with 90% conf band) and “Return = a + b*log(E/P), R2”, against return horizon (months); “Scatter plot, 36 month returns” (Return against log(E/P), lagged). US stock returns 1926:1−2011:4.]

Figure 9.10: Predictability of US stock returns

9.4 Security Analysts

Reference: Makridakis, Wheelwright, and Hyndman (1998) 10.1 and Elton, Gruber, Brown, and Goetzmann (2010) 26

9.4.1 Evidence on Analysts’ Performance

Makridakis, Wheelwright, and Hyndman (1998) 10.1 shows that there is little evidence that the average stock analyst beats (on average) the market (a passive index portfolio).
In fact, less than half of the analysts beat the market. There are analysts who seem to outperform the market for some time, but the autocorrelation in over-performance is weak. The evidence from mutual funds is similar. For them it is typically also found that their portfolio weights do not anticipate price movements.

[Figure 9.11: “Buy winners and sell losers”, excess return and alpha against evaluation horizon (days). Monthly US data 1957:1−2010:12, 25 FF portfolios (B/M and size). Buy (sell) the 5 assets with highest (lowest) return over the last month.]

Figure 9.11: Predictability of US stock returns, momentum strategy
It should be remembered that many analysts are also sales persons: either of a stock (for instance, since the bank is underwriting an offering) or of trading services. It could well be that their objective function is quite different from minimizing the squared forecast errors, or whatever we typically use in order to evaluate their performance. (The number of litigations in the US after the technology boom/bust should serve as a strong reminder of this.)
9.4.2 Do Security Analysts Overreact?

The paper by Bondt and Thaler (1990) compares the (semi-annual) forecasts (one- and two-year time horizons) with actual changes in earnings per share (1976-1984) for several hundred companies. The paper has regressions like
Actual change = $\alpha + \beta$(forecasted change) + residual,
and then studies the estimates of the $\alpha$ and $\beta$ coefficients. With rational expectations (and a long enough sample), we should have $\alpha = 0$ (no constant bias in forecasts) and $\beta = 1$ (proportionality, for instance no exaggeration).
The main result is that $0 < \beta < 1$, so that the forecasted change tends to be too wild in a systematic way: a forecasted change of 1% is (on average) followed by a less than 1% actual change in the same direction. This means that analysts in this sample tended to be too extreme: they exaggerated both positive and negative news.
9.4.3 High-Frequency Trading Based on Recommendations from Stock Analysts

Barber, Lehavy, McNichols, and Trueman (2001) give a somewhat different picture. They focus on the profitability of a trading strategy based on analysts’ recommendations. They use a huge data set (some 360,000 recommendations, US stocks) for the period 1985-1996. They sort stocks into five portfolios depending on the consensus (average) recommendation, and redo the sorting every day (if a new recommendation is published).
They find that such a daily trading strategy gives an annual 4% abnormal return on the portfolio of the most highly recommended stocks, and an annual -5% abnormal return on the least favourably recommended stocks.
This strategy requires a lot of trading (a turnover of 400% annually), so trading costs would typically reduce the abnormal return on the best portfolio to almost zero. A less frequent rebalancing (weekly, monthly) gives a very small abnormal return for the best stocks, but still a negative abnormal return for the worst stocks. Chance and Hemler (2001) obtain similar results when studying the investment advice by 30 professional “market timers.”
9.4.4 Economic Experts

Several papers, for instance, Bondt (1991) and Söderlind (2010), have studied whether economic experts can predict the broad stock markets. The results suggest that they cannot. For instance, Söderlind (2010) shows that the economic experts who participate in the semi-annual Livingston survey (mostly bank economists) forecast the S&P worse than the historical average (recursively estimated), and that their forecasts are strongly correlated with recent market data (which, in itself, cannot predict future returns).
9.4.5 The Characteristics of Individual Analysts’ Forecasts in Europe

Bolliger (2001) studies the forecast accuracy (earnings per share) of European (13 countries) analysts for the period 1988–1999. In all, some 100,000 forecasts are studied. It is found that the forecast accuracy is positively related to how many times an analyst has forecasted that firm and also (surprisingly) to how many firms he/she forecasts. The accuracy is negatively related to the number of countries an analyst forecasts and also to the size of the brokerage house he/she works for.
9.4.6 Bond Rating Agencies versus Stock Analysts

Ederington and Goh (1998) use data on all corporate bond rating changes by Moody’s between 1984 and 1990 and the corresponding earnings forecasts (by various stock analysts).
The idea of the paper by Ederington and Goh (1998) is to see if bond ratings drive earnings forecasts (or vice versa), and if they affect stock returns (prices).
1. To see if stock returns are affected by rating changes, they first construct a “normal” return by a market model:
normal stock return$_t$ = $\alpha + \beta \times$ return on stock index$_t$,
where $\alpha$ and $\beta$ are estimated on a normal time period (not including the rating change). The abnormal return is then calculated as the actual return minus the normal return. They then study how such abnormal returns behave, on average, around the dates of rating changes. Note that “time” is then measured, individually for each stock, as the distance from the day of rating change. The result is that there are significant negative abnormal returns following downgrades, but zero abnormal returns following upgrades.
2. They next turn to the question of whether bond ratings drive earnings forecasts or vice versa. To do that, they first note that there are some predictable patterns in revisions of earnings forecasts. They therefore fit a simple autoregressive model of earnings forecasts, and construct a measure of earnings forecast revisions (surprises) from the model. They then relate this surprise variable to the bond ratings.
In short, the results are the following:
(a) both earnings forecasts and ratings react to the same information, but there is also a direct effect of rating changes, which differs between downgrades and upgrades.
(b) downgrades: the ratings have a strong negative direct effect on the earnings forecasts; the returns react even quicker than analysts
(c) upgrades: the ratings have a small positive direct effect on the earnings forecasts; there is no effect on the returns
A possible reason for why bond ratings could drive earnings forecasts and prices is that bond rating firms typically have access to more inside information about firms than stock analysts and investors.
A possible reason for the observed asymmetric response of returns to ratings is that firms are quite happy to release positive news, but perhaps more reluctant to release bad news. If so, then the information advantage of bond rating firms may be particularly large after bad news. A downgrading would then reveal more new information than an upgrade.
The different reactions of the earnings forecasts and the returns are hard to reconcile.
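The abnormal-return calculation in step 1 above (a market model estimated on a normal period, then actual minus normal returns in event time) can be sketched as follows. All names and numbers are hypothetical, and the sketch leaves out the significance tests used in the paper.

```python
def fit_market_model(stock, index):
    """LS estimates (alpha, beta) of stock return = alpha + beta * index return."""
    n = len(index)
    xbar, ybar = sum(index) / n, sum(stock) / n
    sxx = sum((x - xbar) ** 2 for x in index)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(index, stock))
    beta = sxy / sxx
    return ybar - beta * xbar, beta

def abnormal_returns(stock, index, alpha, beta):
    """Actual return minus the 'normal' return implied by the market model."""
    return [r - (alpha + beta * m) for r, m in zip(stock, index)]

def average_by_event_day(series_list):
    """Average abnormal returns across events, aligned on event time."""
    return [sum(day) / len(day) for day in zip(*series_list)]

# Estimation window (hypothetical returns in %, not including the rating change)
index_est = [1.0, -1.0, 2.0, 0.0]
stock_est = [1.6, -1.4, 3.1, 0.1]
alpha, beta = fit_market_model(stock_est, index_est)

# Event window: the day before and the day of a (hypothetical) downgrade
ab_first = abnormal_returns([0.85, -2.0], [0.5, 1.0], alpha, beta)
ab_second = [0.2, -1.4]        # abnormal returns of a second hypothetical event
average = average_by_event_day([ab_first, ab_second])
print(alpha, beta, ab_first, average)
```

Aligning on event time is what the text means by measuring “time” individually for each stock as the distance from the day of the rating change.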
9.4.7 International Differences in Analyst Forecast Properties

Ang and Ciccone (2001) study earnings forecasts for many firms in 42 countries over the period 1988 to 1997. Some differences are found across countries: forecasters disagree more and the forecast errors are larger in countries with low GDP growth, less accounting disclosure, and less transparent family ownership structure.
However, the most robust finding is that forecasts for firms with losses are special: forecasters disagree more, are more uncertain, and are more overoptimistic about such firms. 9.4.8

Analysts and Industries

Boni and Womack (2006) study data on some 170,000 recommendations for a very large number of U.S. companies for the period 1996–2002. Focusing on revisions of recommendations, the paper shows that analysts are better at ranking firms within an industry than at ranking industries.

9.5 Technical Analysis

Main reference: Bodie, Kane, and Marcus (2002) 12.2; Neely (1997) (overview, foreign exchange market)
Further reading: Murphy (1999) (practical, a believer’s view); The Economist (1993) (overview, the perspective of the early 1990s); Brock, Lakonishok, and LeBaron (1992) (empirical, stock market); Lo, Mamaysky, and Wang (2000) (academic article on return distributions for “technical portfolios”)
9.5.1 General Idea of Technical Analysis

Technical analysis is typically a data mining exercise which looks for local trends or systematic non-linear patterns. The basic idea is that markets are not instantaneously efficient: prices react somewhat slowly and predictably to news. The logic is essentially that an observed price move must be due to some news (exactly which one is not very important) and that old patterns can tell us where the price will move in the near future.
This is an attempt to gather more detailed information than that used by the market as a whole. In practice, technical analysis amounts to plotting different transformations (for instance, a moving average) of prices and spotting known patterns. This section summarizes some simple trading rules that are used.
9.5.2 Technical Analysis and Local Trends

Many trading rules rely on some kind of local trend which can be thought of as positive autocorrelation in price movements (also called momentum¹).
A moving average rule is to buy if a short moving average (equally weighted or exponentially weighted) goes above a long moving average. The idea is that this event signals a new upward trend. Let S (L) be the lag order of a short (long) moving average, with S < L, and let b be a bandwidth (perhaps 0.01). Then, a MA rule for period t could be
\[
\begin{bmatrix}
\text{buy in } t & \text{if } MA_{t-1}(S) > MA_{t-1}(L)(1+b) \\
\text{sell in } t & \text{if } MA_{t-1}(S) < MA_{t-1}(L)(1-b) \\
\text{no change} & \text{otherwise}
\end{bmatrix},
\text{ where } MA_{t-1}(S) = (p_{t-1} + \ldots + p_{t-S})/S. \tag{9.21}
\]
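The MA rule in (9.21) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ma_rule_signals`, the coding of the signals as +1/-1/0, and the default values S = 3, L = 25 are choices made here, and equally weighted averages are used.

```python
import numpy as np

def ma_rule_signals(prices, S=3, L=25, b=0.01):
    """Signal for each period t from the MA rule (9.21): +1 buy, -1 sell, 0 no change.

    Only prices up to t-1 are used, so MA_{t-1}(n) = (p_{t-1} + ... + p_{t-n})/n.
    Periods without L earlier prices get the signal 0 (an assumption of this sketch).
    """
    p = np.asarray(prices, dtype=float)
    signals = np.zeros(len(p), dtype=int)
    for t in range(L, len(p)):
        ma_short = p[t - S:t].mean()   # average of p_{t-S}, ..., p_{t-1}
        ma_long = p[t - L:t].mean()    # average of p_{t-L}, ..., p_{t-1}
        if ma_short > ma_long * (1 + b):
            signals[t] = 1
        elif ma_short < ma_long * (1 - b):
            signals[t] = -1
    return signals
```

With a steadily rising price series, such as `np.arange(100.0, 140.0)`, the short average stays above the long one by more than the bandwidth, so every period with enough history gets a buy signal.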

The difference between the two moving averages is called an oscillator (or sometimes, moving average convergence divergence²). A version of the moving average oscillator is the relative strength index³, which is the ratio of the average price level on “up” days to the average price on “down” days during the last z (14 perhaps) days.

¹ In physics, momentum equals the mass times speed.
² Yes, the rumour is true: the tribe of chartists is on the verge of developing their very own language.
The trading range break-out rule typically amounts to buying when the price rises above a previous peak (local maximum). The idea is that a previous peak is a resistance level in the sense that some investors are willing to sell when the price reaches that value (perhaps because they believe that prices cannot pass this level; clear risk of circular reasoning or self-fulfilling prophecies; round numbers often play the role of resistance levels). Once this artificial resistance level has been broken, the price can possibly rise substantially. On the downside, a support level plays the same role: some investors are willing to buy when the price reaches that value. To implement this, it is common to let the resistance/support levels be proxied by minimum and maximum values over a data window of length S. With a bandwidth b (perhaps 0.01), the rule for period t could be
\[
\begin{bmatrix}
\text{buy in } t & \text{if } p_t > M_{t-1}(1+b) \\
\text{sell in } t & \text{if } p_t < m_{t-1}(1-b) \\
\text{no change} & \text{otherwise}
\end{bmatrix},
\text{ where }
\begin{aligned}
M_{t-1} &= \max(p_{t-1}, \ldots, p_{t-S}) \\
m_{t-1} &= \min(p_{t-1}, \ldots, p_{t-S}).
\end{aligned}
\tag{9.22}
\]
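A minimal Python sketch of the break-out rule in (9.22) follows. The function name `breakout_signals`, the +1/-1/0 coding of the signals and the default window length of 5 are assumptions made for this illustration; the window covers the S prices before period t, as in the definitions of the maximum and minimum in (9.22).

```python
import numpy as np

def breakout_signals(prices, S=5, b=0.01):
    """Signal for each period t from the break-out rule (9.22): +1 buy, -1 sell, 0 no change.

    The resistance and support levels are the max and min of p_{t-1}, ..., p_{t-S}.
    Periods without S earlier prices get the signal 0 (an assumption of this sketch).
    """
    p = np.asarray(prices, dtype=float)
    signals = np.zeros(len(p), dtype=int)
    for t in range(S, len(p)):
        window = p[t - S:t]                     # p_{t-S}, ..., p_{t-1}
        if p[t] > window.max() * (1 + b):       # price breaks the resistance level
            signals[t] = 1
        elif p[t] < window.min() * (1 - b):     # price breaks the support level
            signals[t] = -1
    return signals
```

For the price path 100, 101, 100, 101, 100, 103, 100, 97, the jump to 103 exceeds the previous peak of 101 by more than 1% and gives a buy signal, while the fall to 97 is more than 1% below the previous low of 100 and gives a sell signal.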

When the price is already trending up, then the trading range break-out rule may be replaced by a channel rule, which works as follows. First, draw a trend line through previous lows and a channel line through previous peaks. Extend these lines. If the price moves above the channel (band) defined by these lines, then buy. A version of this is to define the channel by a Bollinger band, which is ±2 standard deviations from a moving data window around a moving average.
A head and shoulder pattern is a sequence of three peaks (left shoulder, head, right shoulder), where the middle one (the head) is the highest, with two local lows in between on approximately the same level (neck line). (Easier to draw than to explain in a thousand words.) If the price subsequently goes below the neckline, then it is thought that a negative trend has been initiated. (An inverse head and shoulder has the inverse pattern.)
Clearly, we can replace “buy” in the previous rules with something more aggressive, for instance, replace a short position with a long.
³ Not to be confused with relative strength, which typically refers to the ratio of two different asset prices (for instance, an equity compared to the market).

The trading volume is also often taken into account. If the trading volume of assets with declining prices is high relative to the trading volume of assets with increasing prices, then this is interpreted as a market with selling pressure. (The basic problem with this interpretation is that there is a buyer for every seller, so we could equally well interpret the situations as if there is a buying pressure.)
9.5.3 Technical Analysis and Mean Reversion

If we instead believe in mean reversion of the prices, then we can essentially reverse the previous trading rules: we would typically sell when the price is high. See Figure 9.12 and Table 9.1.
Some investors argue that markets show periods of mean reversion and then periods with trends, and that both can be exploited. Clearly, the concept of support and resistance levels (or more generally, a channel) is based on mean reversion between these points. A new trend is then supposed to be initiated when the price breaks out of this band.
Figure 9.12: Examples of trading rules (inverted MA rule, S&P 500, January to April 1999; MA(3) and MA(25), bandwidth 0.01; lines: long MA (−), long MA (+), short MA)

                        Mean     Std
All days                0.031    1.164
After buy signal        0.059    1.717
After neutral signal    0.038    0.932
After sell signal       0.009    0.904

Table 9.1: Returns (daily, in %) from technical trading rule (Inverted MA rule). Daily S&P 500 data 1990:1-2011:5
Figure 9.13: Examples of trading rules (daily SMI data, 1990 to 2010; weekly rebalancing: hold index or riskfree; panels: hold index if Pt > max(Pt−1,...,Pt−5); hold index if MA(3) > MA(25); hold index if Pt/Pt−7 > 1; lines: SMI, rule)

9.6 Spurious Regressions and In-Sample Overfit

References: Ferson, Sarkissian, and Simin (2003), Goyal and Welch (2008), and Campbell and Thompson (2008)

9.6.1 Spurious Regressions

Ferson, Sarkissian, and Simin (2003) argue that many prediction equations suffer from “spurious regression” features, and that data mining tends to make things even worse.
Their simulation experiment is based on a simple model where the return predictions are
\[
r_{t+1} = \alpha + \delta Z_t + v_{t+1}, \tag{9.23}
\]
where $Z_t$ is a regressor (predictor). The true model is that returns follow the process
\[
r_{t+1} = \mu + Z_t^* + u_{t+1}, \tag{9.24}
\]
where the residual is white noise. In this equation, $Z_t^*$ represents movements in expected returns. The predictors follow a diagonal VAR(1)
\[
\begin{bmatrix} Z_t \\ Z_t^* \end{bmatrix}
=
\begin{bmatrix} \rho & 0 \\ 0 & \rho^* \end{bmatrix}
\begin{bmatrix} Z_{t-1} \\ Z_{t-1}^* \end{bmatrix}
+
\begin{bmatrix} \varepsilon_t \\ \varepsilon_t^* \end{bmatrix},
\text{ with } \operatorname{Cov}\left( \begin{bmatrix} \varepsilon_t \\ \varepsilon_t^* \end{bmatrix} \right) = \Sigma. \tag{9.25}
\]
In the case of a “pure spurious regression,” the innovations to the predictors are uncorrelated ($\Sigma$ is diagonal). In this case, $\delta$ ought to be zero, and their simulations show that the estimates are almost unbiased. Instead, there is a problem with the standard deviation of $\hat{\delta}$. If $\rho^*$ is high, then the returns will be autocorrelated.
Under the null hypothesis of $\delta = 0$, this autocorrelation is loaded onto the residuals. For that reason, the simulations use a Newey-West estimator of the covariance matrix (with an automatic choice of lag order). This should, ideally, solve the problem with the inference, but the simulations show that it doesn’t: when $Z_t^*$ is very autocorrelated (0.95 or higher) and reasonably important (so an $R^2$ from running (9.24), if we could, would be 0.05 or higher), then the 5% critical value (for a t-test of the hypothesis $\delta = 0$) would be 2.7 (to be compared with the nominal value of 1.96). Since the point estimates are almost unbiased, the interpretation is that the standard deviations are underestimated. In contrast, with low autocorrelation and/or low importance of $Z_t^*$, the standard deviations are much more in line with nominal values.
See Figures 9.14–9.15 for an illustration. They show that we need a combination of an autocorrelated residual and an autocorrelated regressor to create a problem for the usual LS formula for the standard deviation of a slope coefficient. When the autocorrelation is very high, even the Newey-West estimator is likely to underestimate the true uncertainty.
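The pure spurious regression case can be illustrated with a small Monte Carlo experiment. The sketch below is not the simulation design of Ferson, Sarkissian, and Simin (2003): it assumes that the regressor and the unobserved expected-return component are independent AR(1) processes with the same autocorrelation, sets the constant in the return process to zero, and uses the usual LS standard error rather than a Newey-West estimator. The function name `rejection_rate` and all default values are choices made here.

```python
import numpy as np

def rejection_rate(rho, true_r2=0.1, T=200, n_sim=1000, seed=0):
    """Share of simulations in which a nominal 5% t-test rejects a zero slope.

    The regressor Z and the expected-return component Z* are independent AR(1)
    processes with autocorrelation rho, so the true slope on Z is zero.
    """
    rng = np.random.default_rng(seed)
    burn = 100                                   # discard start-up values of the AR(1)
    shocks = rng.standard_normal((n_sim, T + burn, 2))
    z = np.zeros_like(shocks)
    for t in range(1, T + burn):
        z[:, t, :] = rho * z[:, t - 1, :] + shocks[:, t, :]
    Z, Z_star = z[:, burn:, 0], z[:, burn:, 1]

    # Scale the white noise so that Z* explains a share true_r2 of the return variance
    var_z_star = 1.0 / (1.0 - rho**2)
    std_u = np.sqrt(var_z_star * (1.0 - true_r2) / true_r2)
    y = Z_star + std_u * rng.standard_normal((n_sim, T))   # y[:, t] is the next-period return

    # LS slope (with intercept) and its textbook standard error, one regression per row
    xd = Z - Z.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = (xd**2).sum(axis=1)
    slope = (xd * yd).sum(axis=1) / sxx
    resid = yd - slope[:, None] * xd
    se = np.sqrt((resid**2).sum(axis=1) / (T - 2) / sxx)
    return np.mean(np.abs(slope / se) > 1.96)
```

Comparing `rejection_rate(0.95)` with `rejection_rate(0.1)` should show far too many rejections in the highly autocorrelated case and a rejection rate close to the nominal 5% in the other, in line with the argument that it takes both an autocorrelated residual and an autocorrelated regressor to break the usual LS formula.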

Figure 9.14: Autocorrelation of x_t u_t when u_t has autocorrelation ρ (model: y_t = 0.9 x_t + ε_t, where ε_t = ρ ε_{t−1} + u_t, u_t is iid N; x_t = κ x_{t−1} + η_t, η_t is iid N; u_t is the residual from the LS estimate of y_t = a + b x_t + u_t; curves for κ = −0.9, κ = 0 and κ = 0.9; horizontal axis: ρ)

Figure 9.15: Variance of OLS estimator, autocorrelated errors (panels: Std of LS under autocorrelation for κ = −0.9, κ = 0 and κ = 0.9; lines: σ²(X′X)⁻¹, Newey-West(3), simulated; horizontal axis: ρ)
To study the interaction between spurious regressions and data mining, Ferson, Sarkissian, and Simin (2003) let $Z_t$ be chosen from a vector of L possible predictors, which all are generated by a diagonal VAR(1) system as in (9.25) with uncorrelated errors. It is assumed that the researcher chooses $Z_t$ by running L regressions, and then picks the one with the highest $R^2$. When the autocorrelation is 0.15 and the researcher chooses between L = 10 predictors, the simulated 5% critical value is 3.5. Since this does not depend on the importance of the expected-return component in (9.24), it is interpreted as a typical feature of “data mining,” which is bad enough. When the autocorrelation is 0.95, the importance of the expected-return component starts to matter: “spurious regressions” interact with the data mining to create extremely high simulated critical values. A possible explanation is that the data mining exercise is likely to pick out the most autocorrelated predictor, and that a highly autocorrelated predictor exacerbates the spurious regression problem.
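The pure data mining effect can be sketched in Python as follows. This is a simplified illustration, not the setup behind the numbers quoted above: returns and all candidate predictors are assumed to be independent white noise (no autocorrelation at all), the t-statistic is the textbook one for a simple regression, and the function name `mined_critical_value` and the default values are choices made here.

```python
import numpy as np

def mined_critical_value(n_predictors=10, T=200, n_sim=2000, seed=0):
    """Simulated 5% critical value for |t| of the predictor with the highest R2.

    Returns and all candidate predictors are independent white noise, so no
    predictor has any true forecasting power.
    """
    rng = np.random.default_rng(seed)
    best_abs_t = np.empty(n_sim)
    for i in range(n_sim):
        y = rng.standard_normal(T)
        X = rng.standard_normal((T, n_predictors))
        yd = y - y.mean()
        Xd = X - X.mean(axis=0)
        corr = (Xd.T @ yd) / np.sqrt((Xd**2).sum(axis=0) * (yd**2).sum())
        r2 = corr**2                                   # R2 of each simple regression
        t_stats = corr * np.sqrt((T - 2) / (1 - r2))   # t-stat of each slope
        best_abs_t[i] = np.abs(t_stats[np.argmax(r2)])
    return np.quantile(best_abs_t, 0.95)
```

With a single candidate predictor the simulated value should be close to the nominal 1.96, while searching over ten candidates should push it clearly higher, even though none of the predictors has any forecasting power.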
9.6.2 In-Sample versus Out-of-Sample Forecasting

Goyal and Welch (2008) find that the evidence of predictability of equity returns disappears when out-of-sample forecasts are considered. Campbell and Thompson (2008) claim that there is still some out-of-sample predictability, provided we put restrictions on the estimated models.
Campbell and Thompson (2008) first report that only a few variables (earnings price ratio, T-bill rate and the inflation rate) have significant predictive power for one-month stock returns in the full sample (1871–2003 or early 1920s–2003, depending on predictor).
To gauge the out-of-sample predictability, they estimate the prediction equation using data up to and including $t-1$, and then make a forecast for period $t$. The forecasting performance of the equation is then compared with using the historical average as the predictor. Notice that this historical average is also estimated on data up to and including $t-1$, so it changes over time. Effectively, they are comparing the forecast performance of two models estimated in a recursive way (long and longer sample): one model has just an intercept, the other has also a predictor. The comparison is done in terms of the RMSE and an “out-of-sample $R^2$”
\[
R^2_{OS} = 1 - \frac{\sum_{t=s}^{T} (r_t - \hat{r}_t)^2}{\sum_{t=s}^{T} (r_t - \bar{r}_t)^2}, \tag{9.26}
\]
where s is the first period with an out-of-sample forecast, $\hat{r}_t$ is the forecast based on the prediction model (estimated on data up to and including $t-1$) and $\bar{r}_t$ is the historical average (also estimated on data up to and including $t-1$).
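The recursive comparison behind (9.26) can be sketched as follows. This is a minimal illustration, assuming a single predictor, LS estimation of the prediction equation on an expanding sample, and zero-based array indexing; the function name `out_of_sample_r2` and its arguments are hypothetical and not taken from Campbell and Thompson (2008).

```python
import numpy as np

def out_of_sample_r2(r, x, s):
    """Out-of-sample R2 as in (9.26) when r[t] is forecast with x[t-1].

    For each t >= s, the regression and the historical average are both
    estimated on returns up to and including t-1. Requires s >= 3.
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    sse_model, sse_mean = 0.0, 0.0
    for t in range(s, len(r)):
        # Regress r[1], ..., r[t-1] on a constant and x[0], ..., x[t-2]
        X = np.column_stack([np.ones(t - 1), x[:t - 1]])
        coef, *_ = np.linalg.lstsq(X, r[1:t], rcond=None)
        r_hat = coef[0] + coef[1] * x[t - 1]   # forecast of r[t] from the prediction model
        r_bar = r[:t].mean()                   # historical average up to t-1
        sse_model += (r[t] - r_hat) ** 2
        sse_mean += (r[t] - r_bar) ** 2
    return 1.0 - sse_model / sse_mean
```

A positive value means that the prediction model has a lower out-of-sample squared error than the historical average; a negative value means that the historical average forecasts better.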
The evidence shows that the out-of-sample forecasting performance is very weak, as claimed by Goyal and Welch (2008).
It is argued that forecasting equations can easily give strange results when they are estimated on a small data set (as they are early in the sample). They therefore try different restrictions: setting the slope coefficient to zero whenever the sign is “wrong,” setting the prediction (or the historical average) to zero whenever the value is negative. This improves the results a bit—although the predictive performance is still weak.
See Figure 9.16 for an illustration.
Figure 9.16: Predictability of US stock returns, in-sample and out-of-sample (panels: RMSE, E/P regression vs MA; RMSE, max(E/P regression, 0) vs MA; horizontal axis: length of data window, months; lines: moving average (MA), regression; US stock 1-year returns 1926:1-2011:4; estimation is done on a moving data window, forecasts are made out of sample for 1957:1-2011:4; in-sample RMSE: 0.17)

9.7 Empirical U.S. Evidence on Stock Return Predictability

The two most common methods for investigating the predictability of stock returns are to calculate autocorrelations and to construct simple dynamic portfolios and see if they outperform passive portfolios. The dynamic portfolio could, for instance, be a simple filter rule that calls for rebalancing once a month by buying (selling) assets which have increased (decreased) by more than x % the last month. If this portfolio outperforms a passive portfolio, then this is evidence of some positive autocorrelation (“momentum”) on a one-month horizon. The following points summarize some evidence which seems to hold for both returns and returns in excess of a riskfree rate (an interest rate).

1. The empirical evidence suggests some, but weak, positive autocorrelation in short horizon returns (one day up to a month) — probably too little to trade on. The autocorrelation is stronger for small than for large firms (perhaps no autocorrelation at all for weekly or longer returns in large firms). This implies that equally weighted stock indices have higher autocorrelations than value-weighted indices.
(See Campbell, Lo, and MacKinlay (1997) Table 2.4.)
2. Stock indices have more positive autocorrelation than (most) individual stocks: there must be fairly strong cross-autocorrelations across individual stocks. (See
Campbell, Lo, and MacKinlay (1997) Tables 2.7 and 2.8.)
3. There seems to be negative autocorrelation of multi-year stock returns, for instance in 5-year US returns for 1926-1985. It is unclear what drives this result, however. It could well be an artifact of just a few extreme episodes (Great Depression).
Moreover, the estimates are very uncertain as there are very few (non-overlapping) multi-year returns even in a long sample—the results could be just a fluke.
4. The aggregate stock market return, that is, the return on a value-weighted stock index, seems to be forecastable on the medium horizon by various information variables. In particular, future stock returns seem to be predictable by the current dividend-price ratio and earnings-price ratio (positively, one to several years), or by the interest rate changes (negatively, up to a year). For instance, the coefficient of determination (usually denoted $R^2$, but not to be confused with the return used above) for predicting the two-year return on the US stock market by the current dividend-price ratio is around 0.3 for the 1952-1994 sample. (See Campbell, Lo, and MacKinlay (1997) Tables 7.1-2.) This evidence suggests that expected returns may very well be time-varying and correlated with the business cycle.
5. Even if short-run returns, $R_{t+1}$, are fairly hard to forecast, it is often fairly easy to forecast volatility as measured by $|R_{t+1}|$ or $R_{t+1}^2$ (for instance, using ARCH or GARCH models). For an example, see Bodie, Kane, and Marcus (2002) Figure 13.7. This could possibly be used for dynamic trading strategies on options (which directly price volatility). For instance, buying both a call and a put option (a “straddle” or a “strangle”) is a bet on a large price movement (in any direction).


6. It is sometimes found that stock prices behave differently in periods with high volatility than in more normal periods. Granger (1992) reports that the forecasting performance is sometimes improved by using different forecasting models for these two regimes. A simple and straightforward way to estimate a model for periods of normal volatility is to simply throw out data for volatile periods (and other exceptional events).
7. It is important to assess forecasting models in terms of their out-of-sample forecasting performance. Too many models seem to fit data in-sample, but most of them fail in out-of-sample tests. Forecasting models are of no use if they cannot forecast.
8. There are also a number of strange patterns (“anomalies”) like the small-firms-in-January effect (high returns on these in the first part of January) and the book-to-market effect (high returns on firms with high book/market value of the firm’s equity).

Bibliography
Ang, J. S., and S. J. Ciccone, 2001, “International differences in analyst forecast properties,” mimeo, Florida State University.
Barber, B., R. Lehavy, M. McNichols, and B. Trueman, 2001, “Can investors profit from the prophets? Security analyst recommendations and stock returns,” Journal of Finance, 56, 531–563.
Bodie, Z., A. Kane, and A. J. Marcus, 2002, Investments, McGraw-Hill/Irwin, Boston,
5th edn.
Bolliger, G., 2001, “The characteristics of individual analysts’ forecasts in Europe,” mimeo, University of Neuchatel.
Bondt, W. F. M. D., 1991, “What do economists know about the stock market?,” Journal of Portfolio Management, 17, 84–91.
Bondt, W. F. M. D., and R. H. Thaler, 1990, “Do security analysts overreact?,” American
Economic Review, 80, 52–57.

Boni, L., and K. L. Womack, 2006, “Analysts, industries, and price momentum,” Journal of Financial and Quantitative Analysis, 41, 85–109.
Brock, W., J. Lakonishok, and B. LeBaron, 1992, “Simple technical trading rules and the stochastic properties of stock returns,” Journal of Finance, 47, 1731–1764.
Brockwell, P. J., and R. A. Davis, 1991, Time series: theory and methods, Springer Verlag,
New York, second edn.
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay, 1997, The econometrics of financial markets, Princeton University Press, Princeton, New Jersey.
Campbell, J. Y., and S. B. Thompson, 2008, “Predicting the equity premium out of sample: can anything beat the historical average?,” Review of Financial Studies, 21, 1509–1531.
Chance, D. M., and M. L. Hemler, 2001, “The performance of professional market timers: daily evidence from executed strategies,” Journal of Financial Economics, 62, 377–411.
Cochrane, J. H., 2001, Asset pricing, Princeton University Press, Princeton, New Jersey.
Cuthbertson, K., 1996, Quantitative financial economics, Wiley, Chichester, England.
Ederington, L. H., and J. C. Goh, 1998, “Bond rating agencies and stock analysts: who knows what when?,” Journal of Financial and Quantitative Analysis, 33, 569–585.
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.
Ferson, W. E., S. Sarkissian, and T. T. Simin, 2003, “Spurious regressions in financial economics,” Journal of Finance, 58, 1393–1413.
Goyal, A., and I. Welch, 2008, “A comprehensive look at the empirical performance of equity premium prediction,” Review of Financial Studies, 21, 1455–1508.
Granger, C. W. J., 1992, “Forecasting stock market prices: lessons for forecasters,” International Journal of Forecasting, 8, 3–13.


Huberman, G., and S. Kandel, 1987, “Mean-variance spanning,” Journal of Finance, 42,
873–888.
Lo, A. W., H. Mamaysky, and J. Wang, 2000, “Foundations of technical analysis: computational algorithms, statistical inference, and empirical implementation,” Journal of
Finance, 55, 1705–1765.
Makridakis, S., S. C. Wheelwright, and R. J. Hyndman, 1998, Forecasting: methods and applications, Wiley, New York, 3rd edn.
Murphy, J. J., 1999, Technical analysis of the financial markets, New York Institute of
Finance.
Neely, C. J., 1997, “Technical analysis in the foreign exchange market: a layman’s guide,”
Federal Reserve Bank of St. Louis Review.
Priestley, M. B., 1981, Spectral analysis and time series, Academic Press.
Söderlind, P., 2010, “Predicting stock price movements: regressions versus economists,”
Applied Economics Letters, 17, 869–874.
The Economist, 1993, “Frontiers of finance,” pp. 5–20.

10 Event Studies

Reference: Bodie, Kane, and Marcus (2005) 12.3 or Copeland, Weston, and Shastri (2005) 11
Reference (advanced): Campbell, Lo, and MacKinlay (1997) 4
More advanced material is denoted by a star (*). It is not required reading.

10.1 Basic Structure of Event Studies

The idea of an event study is to study the effect (on stock prices or returns) of a special event by using a cross-section of such events. For instance, what is the effect of a stock split announcement on the share price? Other events could be debt issues, mergers and acquisitions, earnings announcements, or monetary policy moves.
The event is typically assumed to be a discrete variable. For instance, it could be a merger or not or if the monetary policy surprise was positive (lower interest than expected) or not. The basic approach is then to study what happens to the returns of those assets that have such an event.
Only news should move the asset price, so it is often necessary to explicitly model the previous expectations to define the event. For earnings, the event is typically taken to be the earnings announcement minus (some average of) analysts’ forecast. Similarly, for monetary policy moves, the event could be specified as the interest rate decision minus previous forward rates (as a measure of previous expectations).
The abnormal return of asset i in period t is
\[
u_{it} = R_{it} - R_{it}^{normal}, \tag{10.1}
\]
where $R_{it}$ is the actual return and the last term is the normal return (which may differ across assets and time). The definition of the normal return is discussed in detail in Section 10.2. These returns could be nominal returns, but more likely (at least for slightly longer horizons) real returns or excess returns.
Suppose we have a sample of n such events (“assets”). To keep the notation (reasonably) simple, we “normalize” the time so period 0 is the time of the event. Clearly the actual calendar times of the events for assets i and j are likely to differ, but we shift the time line for each asset individually so the time of the event is normalized to zero for every asset. See Figure 10.1 for an illustration.

Figure 10.1: Event days and windows (time lines for firm 1 and firm 2, each marked at days −1, 0 and 1)
To control for information leakage and slow price adjustment, the abnormal return is often calculated for some time before and after the event: the “event window” (often ±20 days or so). For day s (that is, s days after the event time 0), the cross sectional average abnormal return is
\[
\bar{u}_s = \sum_{i=1}^{n} u_{is}/n. \tag{10.2}
\]
For instance, $\bar{u}_2$ is the average abnormal return two days after the event, and $\bar{u}_{-1}$ is for one day before the event.
The cumulative abnormal return (CAR) of asset i is simply the sum of the abnormal return in (10.1) over some period around the event. It is often calculated from the beginning of the event window. For instance, if the event window starts at $-w$, then the q-period (day?) car for firm i is
\[
car_{iq} = u_{i,-w} + u_{i,-w+1} + \ldots + u_{i,-w+q-1}. \tag{10.3}
\]
The cross sectional average of the q-period car is
\[
\overline{car}_q = \sum_{i=1}^{n} car_{iq}/n. \tag{10.4}
\]
See Figure 10.2 for an empirical example.
Figure 10.2: Event study of IPOs in Shanghai 2001–2004. (Data from Nou Lai.) (cumulative excess return (average) with 90% conf band; vertical axis: returns, %; horizontal axis: days after IPO; sample: 196 IPOs on the Shanghai Stock Exchange, 2001−2004)

Example 10.1 (Abnormal returns for ±1 day around event, two firms) Suppose there are two firms and the event window contains ±1 day around the event day, and that the abnormal returns (in percent) are

Time   Firm 1   Firm 2   Cross-sectional Average
−1     0.2      −0.1     0.05
0      1.0      2.0      1.5
1      0.1      0.3      0.2

We have the following cumulative returns

Time   Firm 1   Firm 2   Cross-sectional Average
−1     0.2      −0.1     0.05
0      1.2      1.9      1.55
1      1.3      2.2      1.75
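The numbers in Example 10.1 can be reproduced with a few lines of Python, as a check on (10.2)–(10.4). The sketch takes the abnormal return of firm 2 on day −1 to be −0.1, which is what the stated cross-sectional average of 0.05 implies; the variable names are chosen here for illustration.

```python
import numpy as np

# Abnormal returns (in percent) from Example 10.1:
# rows are days -1, 0, 1 of the event window; columns are firms 1 and 2
u = np.array([[0.2, -0.1],
              [1.0,  2.0],
              [0.1,  0.3]])

u_bar = u.mean(axis=1)        # cross-sectional average abnormal return, (10.2)
car = u.cumsum(axis=0)        # cumulative abnormal return of each firm, (10.3)
car_bar = car.mean(axis=1)    # cross-sectional average of the car, (10.4)

print(u_bar)     # average abnormal returns for days -1, 0, 1
print(car)       # cumulative abnormal returns, one column per firm
print(car_bar)   # average cumulative abnormal returns
```

Since the event window starts on day −1, row q − 1 of `car` holds the q-period car of each firm.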

10.2 Models of Normal Returns

This section summarizes the most common ways of calculating the normal return in (10.1). The parameters in these models are typically estimated on a recent sample, the “estimation window,” that ends before the event window. See Figure 10.3 for an illustration. (When there is no return data before the event window (for instance, when the event is an IPO), then the estimation window can be after the event window.)
In this way, the estimated behaviour of the normal return should be unaffected by the event. It is almost always assumed that the event is exogenous in the sense that it is not due to the movements in the asset price during either the estimation window or the event window. This allows us to get a clean estimate of the normal return.
The constant mean return model assumes that the return of asset i fluctuates randomly around some mean $\mu_i$
\[
R_{it} = \mu_i + \varepsilon_{it} \text{ with } E(\varepsilon_{it}) = \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{i,t-s}) = 0. \tag{10.5}
\]
This mean is estimated by the sample average (during the estimation window). The normal return in (10.1) is then the estimated mean, $\hat{\mu}_i$, so the abnormal return becomes $\hat{\varepsilon}_{it}$.
The market model is a linear regression of the return of asset i on the market return
\[
R_{it} = \alpha_i + \beta_i R_{mt} + \varepsilon_{it} \text{ with } E(\varepsilon_{it}) = \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{i,t-s}) = \operatorname{Cov}(\varepsilon_{it}, R_{mt}) = 0. \tag{10.6}
\]
Notice that we typically do not impose the CAPM restrictions on the intercept in (10.6). The normal return in (10.1) is then calculated by combining the regression coefficients with the actual market return as $\hat{\alpha}_i + \hat{\beta}_i R_{mt}$, so the abnormal return becomes $\hat{\varepsilon}_{it}$.
When we restrict $\alpha_i = 0$ and $\beta_i = 1$, then this approach is called the market-adjusted-return model. This is a particularly useful approach when there is no return data before the event, for instance, with an IPO.
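As an illustration of the market model approach, the following Python sketch estimates the intercept and slope of (10.6) by LS on an estimation window and then computes abnormal returns in the event window as in (10.1). The function name `market_model_abnormal_returns` and its argument names are hypothetical, and the sketch ignores the sampling uncertainty of the estimated coefficients.

```python
import numpy as np

def market_model_abnormal_returns(r_est, rm_est, r_event, rm_event):
    """Abnormal returns in the event window from a market model.

    r_est, rm_est:     asset and market returns in the estimation window
    r_event, rm_event: asset and market returns in the event window
    """
    X = np.column_stack([np.ones(len(rm_est)), rm_est])
    (alpha, beta), *_ = np.linalg.lstsq(X, np.asarray(r_est, dtype=float), rcond=None)
    normal = alpha + beta * np.asarray(rm_event, dtype=float)   # normal return
    return np.asarray(r_event, dtype=float) - normal            # abnormal return
```

The market-adjusted-return model corresponds to skipping the estimation step and using `r_event - rm_event` directly.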
Recently, the market model has increasingly been replaced by a multi-factor model which uses several regressors instead of only the market return. For instance, Fama and
French (1993) argue that (10.6) needs to be augmented by a portfolio that captures the different returns of small and large firms and also by a portfolio that captures the different returns of firms with high and low book-to-market ratios.
Finally, another approach is to construct a normal return as the actual return on assets which are very similar to the asset with an event. For instance, if asset i is a small manufacturing firm (with an event), then the normal return could be calculated as the actual return for other small manufacturing firms (without events). In this case, the abnormal return becomes the difference between the actual return and the return on the matching portfolio. This type of matching portfolio is becoming increasingly popular.

Figure 10.3: Event and estimation windows (time line with the estimation window, used for the normal return, followed by the event window)
All the methods discussed here try to take into account the risk premium on the asset. It is captured by the mean in the constant mean model, the beta in the market model, and by the way the matching portfolio is constructed. However, sometimes there is no data in the estimation window. The typical approach is then to use the actual market return as the normal return, that is, to use (10.6) but assuming that $\alpha_i = 0$ and $\beta_i = 1$. Clearly, this does not account for the risk premium on asset i, and is therefore a fairly rough guide.
Apart from accounting for the risk premium, does the choice of the model of the normal return matter a lot? Yes, but only if the model produces a higher coefficient of determination ($R^2$) than competing models. In that case, the variance of the abnormal return is smaller for that model, which makes the test more precise (see Section 10.3 for a discussion of how the variance of the abnormal return affects the variance of the test statistic). To illustrate this, consider the market model (10.6). Under the null hypothesis that the event has no effect on the return, the abnormal return would be just the residual in the regression (10.6). It has the variance (assuming we know the model parameters)
\[
\operatorname{Var}(u_{it}) = \operatorname{Var}(\varepsilon_{it}) = (1 - R^2) \operatorname{Var}(R_{it}), \tag{10.7}
\]
where $R^2$ is the coefficient of determination of the regression (10.6).
Proof. (of (10.7)) From (10.6) we have (dropping the time subscripts)
\[
\operatorname{Var}(R_i) = \beta_i^2 \operatorname{Var}(R_m) + \operatorname{Var}(\varepsilon_i).
\]
We therefore get
\[
\begin{aligned}
\operatorname{Var}(\varepsilon_i) &= \operatorname{Var}(R_i) - \beta_i^2 \operatorname{Var}(R_m) \\
&= \operatorname{Var}(R_i) - \operatorname{Cov}(R_i, R_m)^2 / \operatorname{Var}(R_m) \\
&= \operatorname{Var}(R_i) - \operatorname{Corr}(R_i, R_m)^2 \operatorname{Var}(R_i) \\
&= (1 - R^2) \operatorname{Var}(R_i).
\end{aligned}
\]
The second equality follows from the fact that $\beta_i = \operatorname{Cov}(R_i, R_m)/\operatorname{Var}(R_m)$, the third equality from multiplying and dividing the last term by $\operatorname{Var}(R_i)$ and using the definition of the correlation, and the fourth equality from the fact that the coefficient of determination in a simple regression equals the squared correlation of the dependent variable and the regressor.
This variance is crucial for testing the hypothesis of no abnormal returns: the smaller is the variance, the easier it is to reject a false null hypothesis (see Section 10.3). The constant mean model has $R^2 = 0$, so the market model could potentially give a much smaller variance. If the market model has $R^2 = 0.75$, then the standard deviation of the abnormal return is only half that of the constant mean model. More realistically, $R^2$ might be 0.43 (or less), so the market model gives a 25% decrease in the standard deviation, which is not a whole lot. Experience with multi-factor models also suggests that they give relatively small improvements of the $R^2$ compared to the market model. For these reasons, and for reasons of convenience, the market model is still the dominating model of normal returns.
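The two numerical claims follow directly from taking square roots in (10.7). Relative to the constant mean model (where $R^2 = 0$), the standard deviation of the abnormal return is scaled by $\sqrt{1 - R^2}$:

```latex
\[
\frac{\operatorname{Std}(u_{it})}{\operatorname{Std}(R_{it})} = \sqrt{1 - R^2}:
\qquad
\sqrt{1 - 0.75} = 0.5,
\qquad
\sqrt{1 - 0.43} = \sqrt{0.57} \approx 0.755 .
\]
```

So an $R^2$ of 0.75 halves the standard deviation, while an $R^2$ of 0.43 reduces it by roughly a quarter.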
High frequency data can be very helpful, provided the time of the event is known.
High frequency data effectively allows us to decrease the volatility of the abnormal return since it filters out irrelevant (for the event study) shocks to the return while still capturing the effect of the event.

10.3 Testing the Abnormal Return

In testing if the abnormal return is different from zero, there are two sources of sampling uncertainty. First, the parameters of the normal return are uncertain. Second, even if we knew the normal return for sure, the actual returns are random variables, and they will always deviate from their population mean in any finite sample. The first source of uncertainty is likely to be much smaller than the second, provided the estimation window is much longer than the event window. This is the typical situation, so the rest of the discussion will focus on the second source of uncertainty.
It is typically assumed that the abnormal returns are uncorrelated across time and across assets. The first assumption is motivated by the very low autocorrelation of returns.
The second assumption makes a lot of sense if the events are not overlapping in time, so that the events of assets i and j happen at different (calendar) times. It can also be argued that the model for the normal return (for instance, a market model) should capture all common movements by the regressors, leaving the abnormal returns (the residuals) uncorrelated across firms. In contrast, if the events happen at the same time, the cross-correlation must be handled somehow. This is, for instance, the case if the events are macroeconomic announcements or monetary policy moves. An easy way to handle such synchronized (clustered) events is to form portfolios of those assets that share the event time, and then only use portfolios with non-overlapping events in the cross-sectional study. For the rest of this section we assume no autocorrelation or cross correlation.
Let $\sigma_i^2 = \mathrm{Var}(u_{i,t})$ be the variance of the abnormal return of asset $i$. The variance of the cross-sectional (across the $n$ assets) average, $\bar{u}_s$ in (10.2), is then
\[
\mathrm{Var}(\bar{u}_s) = \left(\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_n^2\right)/n^2 = \sum_{i=1}^{n} \sigma_i^2 / n^2, \tag{10.8}
\]
since all covariances are assumed to be zero. In a large sample (where the asymptotic normality of a sample average starts to kick in), we can therefore use a t-test since
\[
\bar{u}_s / \mathrm{Std}(\bar{u}_s) \rightarrow^d N(0,1). \tag{10.9}
\]

The cumulative abnormal return over $q$ periods, $car_{i,q}$, can also be tested with a t-test. Since the returns are assumed to have no autocorrelation, the variance of $car_{i,q}$ is
\[
\mathrm{Var}(car_{i,q}) = q \sigma_i^2. \tag{10.10}
\]
This variance is increasing in $q$ since we are considering cumulative returns (not the time average of returns).
The variance of the cross-sectional average, $\overline{car}_q$, is then (similarly to (10.8))
\[
\mathrm{Var}(\overline{car}_q) = \left(q\sigma_1^2 + q\sigma_2^2 + \ldots + q\sigma_n^2\right)/n^2 = q \sum_{i=1}^{n} \sigma_i^2 / n^2, \tag{10.11}
\]
if the abnormal returns are uncorrelated across time and assets.
Figures 4.2a–b in Campbell, Lo, and MacKinlay (1997) provide a nice example of an event study (based on the effect of earnings announcements).
Example 10.2 (Variances of abnormal returns) If the standard deviations of the daily abnormal returns of the two firms in Example 10.1 are $\sigma_1 = 0.1$ and $\sigma_2 = 0.2$, then we have the following variances for the abnormal returns at different days

Time   Firm 1   Firm 2   Cross-sectional average
-1     0.1^2    0.2^2    (0.1^2 + 0.2^2)/4
 0     0.1^2    0.2^2    (0.1^2 + 0.2^2)/4
 1     0.1^2    0.2^2    (0.1^2 + 0.2^2)/4

Similarly, the variances for the cumulative abnormal returns are

Time   Firm 1       Firm 2       Cross-sectional average
-1     0.1^2        0.2^2        (0.1^2 + 0.2^2)/4
 0     2 x 0.1^2    2 x 0.2^2    2 x (0.1^2 + 0.2^2)/4
 1     3 x 0.1^2    3 x 0.2^2    3 x (0.1^2 + 0.2^2)/4
Example 10.3 (Tests of abnormal returns) By dividing the numbers in Example 10.1 by the square root of the numbers in Example 10.2 (that is, the standard deviations) we get the test statistics for the abnormal returns

Time   Firm 1   Firm 2   Cross-sectional average
-1      2       -0.5      0.4
 0     10       10       13.4
 1      1        1.5      1.8

Similarly, for the cumulative abnormal returns we get the test statistics

Time   Firm 1   Firm 2   Cross-sectional average
-1      2       -0.5      0.4
 0      8.5      6.7      9.8
 1      7.5      6.4      9.0
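The test statistics in Example 10.3 can be reproduced with a few lines of code. The sketch below is only an illustration: the abnormal returns in `u` are not taken from Example 10.1 directly but are backed out from the reported test statistics, so they should be treated as assumptions. The variable names are likewise hypothetical.

```python
import numpy as np

# Abnormal returns on event days -1, 0, 1 (rows) for firms 1 and 2 (columns).
# Assumption: values implied by the test statistics reported in Example 10.3.
u = np.array([[0.2, -0.1],
              [1.0,  2.0],
              [0.1,  0.3]])
sigma = np.array([0.1, 0.2])   # std of daily abnormal returns (Example 10.2)
n = sigma.size

# (10.8): variance of the cross-sectional average with zero covariances
var_avg = np.sum(sigma**2) / n**2

# Test statistics for the daily abnormal returns
t_firm = u / sigma
t_avg = u.mean(axis=1) / np.sqrt(var_avg)

# (10.10)-(10.11): cumulative abnormal returns over q = 1, 2, 3 days
q = np.arange(1, u.shape[0] + 1)
car = u.cumsum(axis=0)
t_car_firm = car / (np.sqrt(q)[:, None] * sigma)
t_car_avg = car.mean(axis=1) / np.sqrt(q * var_avg)

print(np.round(t_firm, 1), np.round(t_avg, 1))
print(np.round(t_car_firm, 1), np.round(t_car_avg, 1))
```

Since the variance of the cumulative return grows linearly in q, the standard deviation in the denominator grows with the square root of q.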


10.4

Quantitative Events

Some events are not easily classified as discrete variables. For instance, the effect of a positive earnings surprise is likely to depend on how large the surprise is, not just on whether there was a positive surprise. This can be studied by regressing the abnormal return (typically the cumulative abnormal return) on the value of the event ($x_i$)
\[
car_{i,q} = a + b x_i + \varepsilon_i. \tag{10.12}
\]

The slope coefficient is then a measure of how much the cumulative abnormal return reacts to a change of one unit of xi .
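As a rough illustration of the regression (10.12), the sketch below estimates the slope by OLS on simulated data. Everything about the data is an assumption made for the example (sample size, coefficients, noise level), and the standard errors are the classical OLS ones, which presumes homoskedastic and uncorrelated residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # hypothetical number of events
x = rng.normal(size=n)                    # hypothetical event values (surprises)
a_true, b_true = 0.5, 2.0                 # hypothetical coefficients
car = a_true + b_true * x + rng.normal(scale=0.5, size=n)

# OLS of car on a constant and x
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, car, rcond=None)

# Classical OLS covariance matrix and t-statistic for the slope
resid = car - X @ coef
s2 = resid @ resid / (n - X.shape[1])
cov = s2 * np.linalg.inv(X.T @ X)
t_b = coef[1] / np.sqrt(cov[1, 1])

print(f"a = {coef[0]:.3f}, b = {coef[1]:.3f}, t-stat of b = {t_b:.1f}")
```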

Bibliography
Bodie, Z., A. Kane, and A. J. Marcus, 2005, Investments, McGraw-Hill, Boston, 6th edn.
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay, 1997, The econometrics of financial markets, Princeton University Press, Princeton, New Jersey.
Copeland, T. E., J. F. Weston, and K. Shastri, 2005, Financial theory and corporate policy, Pearson Education, 4th edn.
Fama, E. F., and K. R. French, 1993, “Common risk factors in the returns on stocks and bonds,” Journal of Financial Economics, 33, 3–56.


11

Investment for the Long Run

Reference: Campbell and Viceira (2002), Elton, Gruber, Brown, and Goetzmann (2010) 12

11.1

Time Diversification: Approximate Case

This section discusses the notion of “time diversification,” which essentially amounts to claiming that equity is safer for long run investors than for short run investors. The argument comes in two flavours: that Sharpe ratios are increasing with the investment horizon, and that the probability that equity returns will outperform bond returns increases with the horizon. This is illustrated in Figure 11.2. The results presented in this section are approximate, since we work with simple returns (and disregard compounding). This has clear disadvantages, but also the advantage of delivering simple results.
11.1.1

Increasing Sharpe Ratios

With iid returns, the expected return and variance both grow linearly with the horizon, so Sharpe ratios (expected excess return divided by the standard deviation) increase with the square root of the horizon. However, this does not mean that risky assets are better for long horizons, at least not if we believe in mean-variance preferences and unpredictable returns. Something other than iid data is needed for that.

Figure 11.1: SR and probability of excess return>0. (Two panels, Sharpe ratio and Prob(excess return>0), against the investment horizon from 1m to 9y. US stock returns 1927:7−2011:4.)

Figure 11.2: SR and probability of excess return>0, iid returns. (Two panels, Sharpe ratio and probability of excess return>0, against the investment horizon from 0 to 20 years. Assumes annual excess return has mean 0.08 and std 0.16, and is iid N.)
Let $Z_q$ be the net return on a $q$-period investment. If returns are iid, the Sharpe ratio of $Z_q$ is approximately
\[
SR(Z_q) \approx \sqrt{q}\, \frac{E R^e}{\mathrm{Std}(R)}, \tag{11.1}
\]
where $E R^e$ is the mean one-period excess return and $\mathrm{Std}(R)$ is the standard deviation of the one-period return. (Time subscripts are suppressed to keep the notation simple.) This Sharpe ratio is clearly increasing with the horizon, $q$.
Proof. (of (11.1)) The $q$-period net return is
\[
Z_q = (R_1 + 1)(R_2 + 1) \ldots (R_q + 1) - 1 \approx R_1 + R_2 + \ldots + R_q.
\]
If returns are iid, then the mean and variance of the $q$-period return are approximately
\[
E(Z_q) \approx q E(R), \qquad \mathrm{Var}(Z_q) \approx q \mathrm{Var}(R).
\]

Example 11.1 (The quality of the approximation of the $q$-period return) If $R_1 = 0.9$ and $R_2 = -0.9$, then the two-period net return is
\[
Z_2 = (1 + 0.9)(1 - 0.9) - 1 = -0.81.
\]
With the approximation we instead have
\[
Z_2 \approx R_1 + R_2 = 0.
\]
The difference in net returns is dramatic. If the two net returns instead are $R_1 = 0.09$ and $R_2 = -0.09$, then
\[
Z_2 = (1 + 0.09)(1 - 0.09) - 1 \approx -0.01
\]
and the approximation is still zero: the difference is much smaller.
Example 11.2 (The danger of arithmetic mean return). Consider two portfolios with the following returns

         Portfolio A   Portfolio B
Year 1    5%            20%
Year 2   -5%           -35%
Year 3    5%            25%

Just adding these returns gives 5% and 10% respectively, but the total returns over the three periods are actually 4.7% and -2.5% respectively.
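The arithmetic in Example 11.2 is easy to check. The sketch below compares the sum of the period returns with the compounded total return, using the returns with the signs implied by the stated totals (A: 5%, -5%, 5%; B: 20%, -35%, 25%). The function names are hypothetical.

```python
import math

def summed_return(returns):
    """Approximate multi-period return: the sum of the net returns."""
    return sum(returns)

def compounded_return(returns):
    """Exact multi-period net return: product of gross returns minus one."""
    return math.prod(1 + r for r in returns) - 1

portfolios = {
    "A": [0.05, -0.05, 0.05],
    "B": [0.20, -0.35, 0.25],
}
for name, r in portfolios.items():
    print(f"{name}: sum = {summed_return(r):.3f}, "
          f"compounded = {compounded_return(r):.3f}")
```

Portfolio B looks better on the sum of returns but loses money over the three years, because the large loss in year 2 is applied to a larger base than the subsequent gain.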
11.1.2

Probability of Outperforming a Riskfree Asset

Since the Sharpe ratio is increasing with the investment horizon, the probability of beating a riskfree asset is (typically) also increasing. To simplify, assume that the returns are normally distributed. Then, we have
\[
\Pr(Z_q^e > 0) = \Phi\left(SR(Z_q^e)\right), \tag{11.2}
\]
where $Z_q^e$ is the excess return on a $q$-period investment and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal variable, $N(0,1)$. The argument of an increasing probability of a positive excess return is therefore the same argument as the increasing Sharpe ratio. See Figure 11.2 for an illustration.
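Equations (11.1) and (11.2) can be evaluated directly. The sketch below uses the parameter values stated under Figure 11.2 (annual excess return with mean 0.08 and standard deviation 0.16, iid normal); the function names are hypothetical and the standard normal cdf is built from `math.erf`.

```python
from math import erf, sqrt

MEAN_EXCESS = 0.08   # annual mean excess return (from Figure 11.2)
STD = 0.16           # annual standard deviation (from Figure 11.2)

def norm_cdf(x):
    """Cdf of a standard normal variable."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sharpe_ratio(q):
    """Approximate Sharpe ratio of the q-period return, as in (11.1)."""
    return sqrt(q) * MEAN_EXCESS / STD

def prob_positive_excess(q):
    """Probability of a positive q-period excess return, as in (11.2)."""
    return norm_cdf(sharpe_ratio(q))

for q in (1, 5, 10, 20):
    print(f"q = {q:2d}: SR = {sharpe_ratio(q):.2f}, "
          f"Pr(excess return > 0) = {prob_positive_excess(q):.3f}")
```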

Figure 11.3: Time diversification, normally distributed returns. (Panels: pdf of the net return for 1 year and 10 years; pdf conditional on a negative return; probability of a negative return against the investment horizon, 0 to 20 years; expected return conditional on a negative return against the investment horizon. Excess returns are iid $N(0.08, 0.16^2)$.)
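Proof. (of (11.2)) By standard manipulations we have
\[
\begin{aligned}
\Pr(Z_q^e > 0) &= 1 - \Pr(Z_q^e \le 0) \\
&= 1 - \Pr\left( \frac{Z_q^e - E Z_q^e}{\mathrm{Std}(Z_q^e)} \le \frac{-E Z_q^e}{\mathrm{Std}(Z_q^e)} \right) \\
&= 1 - \Phi\left( \frac{-E Z_q^e}{\mathrm{Std}(Z_q^e)} \right) \\
&= \Phi\left( \frac{E Z_q^e}{\mathrm{Std}(Z_q^e)} \right),
\end{aligned}
\]
where the last line follows from $\Phi(x) + \Phi(-x) = 1$ since the standard normal distribution is symmetric around zero.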
Although the increasing Sharpe ratios mean that the probability of beating a riskfree asset is increasing with the investment horizon, that does not mean that the risky asset is safer for a long-run investor. The reason is, of course, that we also have to take into account the size of the loss in case the portfolio underperforms. With a longer horizon (and therefore higher dispersion), really bad outcomes are more likely. See Figure 11.3 for an illustration.
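The trade-off between a lower probability of a loss and a larger loss when it happens can be quantified. The sketch below assumes, as in Figure 11.3, that the q-period excess return is normal with mean 0.08q and variance 0.16^2 q, and uses the standard formula for the mean of a truncated normal variable. The function names are hypothetical.

```python
from math import erf, exp, pi, sqrt

MEAN_EXCESS, STD = 0.08, 0.16   # annual values, as in Figure 11.3

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def loss_stats(q):
    """Return (Pr(Z < 0), E(Z | Z < 0)) for Z ~ N(q*mean, q*std^2)."""
    m, s = q * MEAN_EXCESS, sqrt(q) * STD
    alpha = -m / s                       # standardized loss threshold
    p_neg = norm_cdf(alpha)
    # Mean of a normal variable truncated from above at zero
    cond_mean = m - s * norm_pdf(alpha) / p_neg
    return p_neg, cond_mean

for q in (1, 10, 20):
    p_neg, cond_mean = loss_stats(q)
    print(f"q = {q:2d}: Pr(loss) = {p_neg:.3f}, E(return | loss) = {cond_mean:.3f}")
```

The probability of a loss falls with the horizon, while the expected return conditional on a loss becomes more negative.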
To say more about how the investment horizon affects the portfolio weights, we need to be more precise about the preferences. As a benchmark, consider a mean-variance investor who will choose a portfolio for $q$ periods. With one risky asset (the tangency portfolio) and a riskfree asset, the optimization problem is
\[
\max_v \; v E Z_q^e + q R_f - \frac{k}{2} v^2 \mathrm{Var}(Z_q), \tag{11.3}
\]
where $R_f$ is the per-period riskfree rate. With iid returns, both the mean and the variance scale linearly with the investment horizon, so we can equally well write the optimization problem as
\[
\max_v \; v q E Z_1^e + q R_f - \frac{k}{2} v^2 q \mathrm{Var}(Z_1), \text{ if iid returns.} \tag{11.4}
\]
Clearly, scaling this objective function by $1/q$ will not change anything: the horizon is irrelevant. To be more precise, the solution of (11.3) is
\[
v = \frac{1}{k} \frac{E Z_q^e}{\mathrm{Var}(Z_q)}. \tag{11.5}
\]

If returns are iid, we get the following portfolio weights for investment horizons of one and two periods
\[
v(1) = \frac{1}{k} \frac{E R^e}{\mathrm{Var}(R)}, \tag{11.6}
\]
\[
v(2) = \frac{1}{k} \frac{2 E R^e}{2 \mathrm{Var}(R)}, \tag{11.7}
\]
which are the same. With MV behaviour, non-iid returns are required to generate a horizon effect on the portfolio choice. The key point is that the portfolio weight is not determined by the Sharpe ratio, but by the Sharpe ratio divided by the standard deviation. Or to put it another way, comparing Sharpe ratios across investment horizons is not very informative.

Proof. (of (11.5)) The first order condition of (11.3) is
\[
0 = E Z_q^e - k v \mathrm{Var}(Z_q) \quad \text{or} \quad v = \frac{1}{k} \frac{E Z_q^e}{\mathrm{Var}(Z_q)}.
\]

Example 11.3 (US long-run stock market) For the period 1947–2001, the US stock market had an average excess return of 8% (per year) and a standard deviation of 16%. From (11.5), the weight on the risky asset is then $v = (0.08/0.16^2)/k = 3.125/k$.
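The horizon irrelevance under iid returns can be checked numerically. The sketch below applies (11.5) with the iid scaling of the mean and the variance, using the numbers from Example 11.3. The risk aversion values `k` are hypothetical, and so is the function name.

```python
def mv_weight(mean_excess, variance, k, q=1):
    """Weight on the risky asset from (11.5), assuming iid returns so that
    the q-period mean and variance are q times the one-period values."""
    return (q * mean_excess) / (k * q * variance)

mean_excess, std = 0.08, 0.16          # Example 11.3
for k in (1, 3, 5):                    # hypothetical risk aversion values
    weights = [mv_weight(mean_excess, std**2, k, q) for q in (1, 2, 10)]
    print(f"k = {k}: weights for q = 1, 2, 10: {weights}")
```

For every k the weight is the same at all horizons, since q cancels in the ratio.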
With autocorrelated returns two things change: returns are predictable so the expected return is time-varying, and the variance of the two-period return includes a covariance term. The portfolio weights (chosen in period 0) are then
\[
v(1) = \frac{1}{k} \frac{E_0 R_1^e}{\mathrm{Var}_0(R_1)}, \tag{11.8}
\]
\[
v(2) = \frac{1}{k} \frac{E_0 (R_1^e + R_2^e)}{\mathrm{Var}_0(R_1) + \mathrm{Var}_0(R_2) + 2 \mathrm{Cov}_0(R_1, R_2)}, \tag{11.9}
\]
where all moments carry a time subscript to indicate that they are conditional moments.
A key aspect of these formulas is that mean reversion in prices makes the covariance (of returns) negative. This will tend to make the weight for the two-period horizon larger.
The intuition is simple: with mean reversion in prices, long-run investments are less risky than short-run investments since extreme movements will be partially “averaged out” over time. Empirically, there is some evidence of mean-reversion on the business cycle frequencies (a couple of years). The effect is not strong, however, so mean reversion is probably a poor argument for horizon effects.
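To get a feeling for the magnitudes, the sketch below evaluates the ratio v(2)/v(1) from (11.8) and (11.9) under the assumption that excess returns follow an AR(1) process with autocorrelation `rho` and that the initial return equals the unconditional mean (so the expected return is the same in both periods). Both the mean and the innovation variance cancel in the ratio. The function name is hypothetical.

```python
def weight_ratio(rho):
    """v(2)/v(1) from (11.8)-(11.9) for AR(1) returns starting at the mean.

    With R_{t+1} = mu*(1-rho) + rho*R_t + eps_{t+1}, the conditional moments
    (in units of the innovation variance) are Var0(R1) = 1,
    Var0(R2) = 1 + rho^2 and Cov0(R1, R2) = rho.
    """
    var1 = 1.0
    var2 = 1.0 + rho**2
    cov12 = rho
    # Numerators are mu and 2*mu, so mu cancels in the ratio
    return 2.0 * var1 / (var1 + var2 + 2.0 * cov12)

for rho in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(f"rho = {rho:+.1f}: v(2)/v(1) = {weight_ratio(rho):.2f}")
```

Negative autocorrelation (mean reversion) raises the two-period weight relative to the one-period weight, and positive autocorrelation lowers it.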
Example 11.4 (AR(1) process for returns) Suppose the excess returns follow an AR(1) process
\[
R_{t+1}^e = \mu(1 - \rho) + \rho R_t^e + \varepsilon_{t+1} \text{ with } \sigma^2 = \mathrm{Var}(\varepsilon_{t+1}).
\]
The conditional moments are then
\[
\begin{aligned}
E_0 R_1^e &= \mu(1 - \rho) + \rho R_0^e, & E_0 R_2^e &= \mu(1 - \rho^2) + \rho^2 R_0^e, \\
\mathrm{Var}_0(R_1) &= \sigma^2, & \mathrm{Var}_0(R_2) &= (1 + \rho^2)\sigma^2, \\
\mathrm{Cov}_0(R_1, R_2) &= \rho \sigma^2.
\end{aligned}
\]
If the initial return is at the mean, $R_0^e = \mu$, then the forecasted return is $\mu$ across all horizons, which gives the portfolio weights
\[
v(1) = \frac{1}{k} \frac{\mu}{\sigma^2}, \qquad
v(2) = \frac{1}{k} \frac{\mu}{\sigma^2} \frac{2}{2 + \rho^2 + 2\rho}.
\]
With $\rho = (-0.5, 0, 0.5)$ the last term is around $(1.6, 1, 0.6)$. With $\rho = (-0.1, 0, 0.1)$, the last term is around $(1.1, 1, 0.9)$.

11.2

Time Diversification and the Growth-Optimal Portfolio: Lognormal Returns

This section revisits the issue of time diversification—this time in a setting where log portfolio returns are normally distributed. This allows us to get more precise results, since we can avoid approximating the cumulative returns.
11.2.1

Time Diversification with Lognormal Returns

The gross return on a $q$-period investment can be written
\[
1 + Z_q = (1 + R_1)(1 + R_2) \ldots (1 + R_q), \tag{11.10}
\]
where $R_t$ is the net portfolio return in period $t$. Taking logs (and using lower case letters to denote them), we have the log $q$-period return
\[
z_q = r_1 + r_2 + \ldots + r_q, \tag{11.11}
\]
where $z_q = \ln(1 + Z_q)$ and $r_t = \ln(1 + R_t)$.
Remark 11.5 ($\ln(1 + x) \approx x$...) If $x$ is small, $\ln(1 + x) \approx x$, so assuming that $x$ is normally distributed is fairly similar to assuming that $\ln(1 + x)$ is normally distributed.
Remark 11.6 (Lognormal distribution) If $x \sim N(\mu, \sigma^2)$ and $y = \exp(x)$, then the probability density function of $y$ is
\[
\mathrm{pdf}(y) = \frac{1}{y\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \left( \frac{\ln y - \mu}{\sigma} \right)^2 \right], \quad y > 0.
\]
The $r$th moment of $y$ is $E y^r = \exp(r\mu + r^2\sigma^2/2)$.

To simplify the analysis, assume that the log returns of portfolio $y$, $r_{yt}$, are iid $N(\mu_y, \sigma_y^2)$. (This is a convenient assumption since it carries over to multi-period returns.) The "Sharpe ratio" of the log $q$-period return, $z_{qy}$, is
\[
SR(z_{qy}) = \sqrt{q}\, \frac{\mu_y - r_f}{\sigma_y}, \tag{11.12}
\]
where $r_f$ is the continuously compounded interest rate.
If log returns are normally distributed, the probability of the $q$-period return of portfolio $y$ (denoted $Z_{qy}$) being higher than the $q$-period return of portfolio $x$ ($Z_{qx}$) is
\[
\Pr(Z_{qy} > Z_{qx}) = \Phi\left( \sqrt{q}\, \frac{\mu_y - \mu_x}{\sigma(r_{yt} - r_{xt})} \right), \tag{11.13}
\]
where $\Phi$ is the cumulative distribution function of a standard normal variable, $N(0,1)$, $\mu_y$ is the expected log return on portfolio $y$, and $\sigma(r_{yt} - r_{xt})$ is the standard deviation of the difference in log returns. (The portfolios are constant over time, since the returns are iid.) In particular, if the $x$ portfolio is a riskfree asset with log return $r_f$, then the probability is
\[
\Pr(Z_{qy}^e > 0) = \Phi\left( SR(z_{qy}) \right), \tag{11.14}
\]
which is a function of the Sharpe ratio for the log returns. This probability is clearly increasing with the investment horizon, $q$. On the other hand, with a longer horizon (and therefore higher dispersion), really bad outcomes are more likely.
See Figure 11.4 for an illustration.
Figure 11.4: Time diversification, lognormally distributed returns. (Panels: pdf of the net return for 1 year and 10 years; pdf conditional on a negative return; probability of a negative return against the investment horizon, 0 to 20 years; expected return conditional on a negative return against the investment horizon. Log returns are iid $N(0.04, 0.1^2)$.)
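Proof. (of (11.12)) Consider (11.11). If log returns are iid with mean $\mu$ and variance $\sigma^2$, then the mean and variance of the $q$-period return are
\[
E(z_q) = q\mu, \qquad \mathrm{Var}(z_q) = q\sigma^2.
\]

Proof. (of (11.13)) By standard manipulations we have
\[
\begin{aligned}
\Pr\left[ \exp\left( \sum_{t=1}^{q} r_{ty} \right) > \exp\left( \sum_{t=1}^{q} r_{tx} \right) \right]
&= 1 - \Pr\left[ \exp\left( \sum_{t=1}^{q} r_{ty} \right) \le \exp\left( \sum_{t=1}^{q} r_{tx} \right) \right] \\
&= 1 - \Pr\left[ \sum_{t=1}^{q} r_{ty} \le \sum_{t=1}^{q} r_{tx} \right] \\
&= 1 - \Pr\left[ \frac{\sum_{t=1}^{q} (r_{ty} - r_{tx}) - q(\mu_y - \mu_x)}{\sqrt{q}\, \sigma(r_{yt} - r_{xt})} \le -\frac{q(\mu_y - \mu_x)}{\sqrt{q}\, \sigma(r_{yt} - r_{xt})} \right] \\
&= 1 - \Phi\left[ -\sqrt{q}\, \frac{\mu_y - \mu_x}{\sigma(r_{yt} - r_{xt})} \right] \\
&= \Phi\left[ \sqrt{q}\, \frac{\mu_y - \mu_x}{\sigma(r_{yt} - r_{xt})} \right],
\end{aligned}
\]
where the last line follows from $\Phi(z) + \Phi(-z) = 1$ since the standard normal distribution is symmetric around zero.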
To demonstrate that, with iid log returns, optimal portfolio weights are indeed unaffected by the investment horizon, consider the simple case of a logarithmic utility function, where we find a portfolio that solves
\[
\max_v E \ln(1 + Z_q) = \max_v E(r_1 + r_2 + \ldots + r_q), \tag{11.15}
\]
where $r_t$ is the log portfolio return in period $t$ (which clearly depends on the chosen portfolio weights $v$). We here assume that the portfolio weights are chosen at the beginning (time $t = 0$) of the investment period and then kept unchanged. With iid log returns, we can clearly write (11.15) as
\[
\max_v \; q E r_1, \tag{11.16}
\]
which demonstrates that the investment horizon does not matter for the optimal portfolio choice. It doesn't matter that the Sharpe ratio is increasing.
Example 11.7 (Portfolio choice with logarithmic utility function) It is typically hard to find explicit expressions for what the portfolio weights should be with log utility, so one typically has to resort to numerical methods. This example shows a case where we can find an explicit solution, because of a very simple setting. Suppose there are two states (1 and 2) and that asset $A$ has the gross return $R_A(1)$ in state 1 and $R_A(2)$ in state 2, and similarly for asset $B$. The portfolio return is $R_p = v R^e + R_B$, where $R^e = R_A - R_B$. If $\pi$ is the probability of state 1, then the expected log portfolio return is
\[
E \ln(R_p) = \pi \ln[v R^e(1) + R_B(1)] + (1 - \pi) \ln[v R^e(2) + R_B(2)].
\]
The first order condition for $v$ is
\[
0 = \frac{\pi R^e(1)}{v R^e(1) + R_B(1)} + \frac{(1 - \pi) R^e(2)}{v R^e(2) + R_B(2)}
\]
and the solution is
\[
v = -\frac{\pi R^e(1) R_B(2) + (1 - \pi) R^e(2) R_B(1)}{R^e(1) R^e(2)}.
\]
See Figure 11.5 for an illustration.

Figure 11.5: Example of portfolio choice with log utility. (Expected log portfolio gross return, times 100, against the weight on asset A. Two states with prob 1/3 and 2/3. Gross return of asset A: 1.05 in state 1 and 1.1 in state 2. Gross return of asset B: 1.083 in both states.)
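The closed-form solution in Example 11.7 can be checked against a brute-force search. The sketch below uses the numbers stated in Figure 11.5 (state probabilities 1/3 and 2/3, gross returns of asset A of 1.05 and 1.1, gross return of asset B of 1.083 in both states). The grid and the variable names are assumptions made for the illustration.

```python
import numpy as np

pi1 = 1 / 3                          # probability of state 1 (Figure 11.5)
probs = np.array([pi1, 1 - pi1])
RA = np.array([1.05, 1.10])          # gross returns of asset A in states 1, 2
RB = np.array([1.083, 1.083])        # gross returns of asset B in states 1, 2
Re = RA - RB                         # excess return of A over B

# Closed-form solution of the first order condition
v_closed = -(pi1 * Re[0] * RB[1] + (1 - pi1) * Re[1] * RB[0]) / (Re[0] * Re[1])

# Brute-force check: expected log portfolio return on a grid of weights
grid = np.linspace(-0.5, 1.5, 2001)
portfolio = np.outer(grid, Re) + RB          # one row per weight, one column per state
expected_log = (probs * np.log(portfolio)).sum(axis=1)
v_grid = grid[np.argmax(expected_log)]

print(f"closed form: {v_closed:.4f}, grid search: {v_grid:.4f}")
```

The objective is very flat around the optimum (compare the narrow vertical range in Figure 11.5), so the expected log return is not very sensitive to small deviations from the optimal weight.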

11.2.2

The Growth-Optimal Portfolio and Log Utility

The portfolio that comes out from maximizing the log return has some interesting properties. If portfolio $y$ has the highest expected log return, then (11.13) shows that the probability that it beats any other portfolio is increasing with the investment horizon, and goes to unity as the horizon goes to infinity. This portfolio is called the growth-optimal portfolio. See Figure 11.6 for an illustration.

Figure 11.6: The probability of outperforming another portfolio. (Probability of $R_y > R_x$ against the investment horizon, 0 to 20 years, for $\mu^e/\sigma = 0.4$ and $\mu^e/\sigma = 0.2$.)

This portfolio is commonly advocated to be the best for any long-run investor. That argument is clearly flawed. In particular, for an investor with a relative risk aversion different from one, the growth-optimal portfolio is not optimal: a higher risk aversion would give a more conservative portfolio. (It can be shown that the logarithmic utility function is a CRRA utility function with a relative risk aversion of one.) The intuition is that the occasional lower return of the growth-optimal portfolio is considered very risky, so the investor prefers a less volatile portfolio.
Notice that, for a given $q < \infty$, the growth-optimal portfolio does not necessarily maximize the probability of beating other portfolios. While the growth-optimal portfolio has the highest expected log return, so it maximizes the numerator in (11.13), it may well have a very high volatility. It is only in the limit that the growth-optimal portfolio is a sure winner.

11.2.3

Maximizing the Geometric Mean Return

The growth-optimal portfolio is often said to maximize the geometric mean return. That is true, but may need a clarification.
Remark 11.8 (Geometric mean) Suppose the random variable $x$ can take the values $x(1), x(2), \ldots, x(S)$ with probabilities $\pi(1), \pi(2), \ldots, \pi(S)$, where $\sum_{j=1}^{S} \pi(j) = 1$. The arithmetic mean (expected value) is $\sum_{j=1}^{S} \pi(j) x(j)$ and the geometric mean is $\prod_{j=1}^{S} x(j)^{\pi(j)}$. Taking the log of the definition of a geometric mean gives
\[
\sum_{j=1}^{S} \pi(j) \ln x(j) = E \ln x,
\]
which is the expected value of the log of $x$.
Remark 11.9 (Sample geometric mean) With the sample $z_1, z_2, \ldots, z_T$, the sample arithmetic mean is $\sum_{t=1}^{T} z_t / T$ and the sample geometric mean is $\prod_{t=1}^{T} z_t^{1/T}$.
It follows directly from these remarks that a portfolio that maximizes the geometric mean of the portfolio gross return $1 + R_p$ also maximizes the expected log return of it, $E \ln(1 + R_p)$.
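The link between the geometric mean and the mean of the logs in Remarks 11.8 and 11.9 is easy to verify numerically. The sketch below uses a small sample of hypothetical gross returns.

```python
import math

gross = [1.20, 0.65, 1.25]           # hypothetical sample of gross returns
T = len(gross)

arithmetic_mean = sum(gross) / T
geometric_mean = math.prod(gross) ** (1 / T)

# The log of the geometric mean equals the sample mean of the logs
mean_log = sum(math.log(z) for z in gross) / T
geometric_via_logs = math.exp(mean_log)

print(f"arithmetic mean = {arithmetic_mean:.4f}")
print(f"geometric mean  = {geometric_mean:.4f}")
print(f"exp(mean log)   = {geometric_via_logs:.4f}")
```

The geometric mean is below the arithmetic mean whenever the returns vary, which is why ranking portfolios by the two measures can give different answers.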
An intuitive way of motivating this portfolio is as follows. The gross return on the q -period investment in (11.10) is, of course, random, but in a very large sample (long investment horizon), the histogram of the returns should start to converge to the true distribution. With iid returns, this is the same distribution that defined the geometric mean
(which we have maximized). Hence, with a very long investment period, the portfolio
(that maximizes the geometric mean) should give the highest return over the investment period. Of course, this is virtually the same argument as in (11.13), which showed that the growth-optimal portfolio will outperform all other portfolios with probability one as the investment horizon goes to infinity. (The only difference is that the current argument does not rely on the normal distribution of the log returns.)

11.3

More General Utility Functions and Rebalancing

We will now take a look at more general optimization problems. Assume that the objective is to maximize
\[
E_0 u(W_q), \tag{11.17}
\]
where $W_q$ is the wealth (in real terms) at time $q$ (the investment horizon) and $E_0$ denotes the expectations formed in period 0 (the initial period). What can be said about how the investment horizon affects the portfolio weights?
If the investor is not allowed (or it is too costly) to rebalance the portfolio—and the utility function/distribution of returns are such that the investor picks a mean-variance portfolio (quadratic utility function or normally distributed returns), then the results in
Section 11.1.1 go through: non-iid returns are required to generate a horizon effect on the portfolio choice.
If, more realistically, the investor is allowed to rebalance the portfolio, then the analysis is more difficult. We summarize some known results below.
11.3.1

CRRA Utility Function and iid Returns

Suppose the utility function has constant relative risk aversion, so the objective in period 0 is
\[
\max E_0 W_q^{1-\gamma}/(1 - \gamma). \tag{11.18}
\]
In period one, the objective is $\max E_1 W_q^{1-\gamma}/(1 - \gamma)$, which may differ in terms of what we know about the distribution of future returns (incorporated into the expectations operator) and also in terms of the current wealth level (due to the return in period 1).
With CRRA utility, relative portfolio weights are independent of the wealth of the investor (fairly straightforward to show). If we combine this with iid returns—then the only difference between an investor in t and the same investor in t C 1 is that he may be poorer or wealthier. This investor will therefore choose the same portfolio weights in every period. Analogously, a short run investor and a long run investor choose the same portfolio weights (you can think of the investor in t C 1 as a short run investor). Therefore, with a CRRA utility function and iid returns there are no horizon effects on the portfolio choice. In addition, the portfolio weights will stay constant over time. The intuition is that all periods look the same.
However, with non-iid returns (predictability or variations in volatility) there will be horizon effects (and changes in weights over time). This would give rise to intertemporal hedging, where the choice of today’s portfolio is affected by the likely changes of the investment opportunities tomorrow.
The same result holds if the objective function instead is to maximize the utility from a stream of consumption, provided the utility function is CRRA and time separable. In this case, the objective is
\[
\max \; C_0^{1-\gamma}/(1 - \gamma) + \delta E_0 C_1^{1-\gamma}/(1 - \gamma) + \ldots + \delta^q E_0 C_q^{1-\gamma}/(1 - \gamma). \tag{11.19}
\]
The basic mechanism is that the optimal consumption/wealth ratio turns out to be constant.
11.3.2

Logarithmic Utility Function

In the special case where the relative risk aversion (in a CRRA utility function) is one, the utility function becomes logarithmic.
The objective in period 0 is then
\[
\max E_0 \ln W_q = \max (\ln W_0 + E_0 r_1 + E_0 r_2 + \ldots + E_0 r_q), \tag{11.20}
\]
where $r_t$ is the log return, $r_t = \ln(1 + R_t)$, and $R_t$ is a net return.
Since the returns in the different periods enter separably, the best an investor can do in period 0 is to choose a portfolio that maximizes $E_0 r_1$, that is, to choose the one-period growth-optimal portfolio. But a short run investor who maximizes $E_0 \ln[W_0(1 + R_1)] = \ln W_0 + E_0 r_1$ will choose the same portfolio. There is then no horizon effect.
However, the portfolio choice may change over time, if the distribution of the returns does.
The same result holds if the objective function instead is to maximize the utility from a stream of consumption as in (11.19), but with a logarithmic utility function.

Bibliography
Campbell, J. Y., and L. M. Viceira, 2002, Strategic asset allocation: portfolio choice of long-term investors, Oxford University Press.
Elton, E. J., M. J. Gruber, S. J. Brown, and W. N. Goetzmann, 2010, Modern portfolio theory and investment analysis, John Wiley and Sons, 8th edn.


12

Dynamic Portfolio Choice

More advanced material is denoted by a star (*). It is not required reading.

12.1

Optimal Portfolio Choice: CRRA Utility and iid Returns

Suppose the investor wants to choose portfolio weights ($v_t$) to maximize expected utility, that is, to solve
\[
\max_{v_t} E_t u(W_{t+q}), \tag{12.1}
\]
where $E_t$ denotes the expectations formed today, $u(\cdot)$ is a utility function and $W_{t+q}$ is the wealth (in real terms) at time $t + q$.
This is a standard (static) problem if the investor cannot (or it is too costly to) rebalance the portfolio. (In some cases this leads to a mean-variance portfolio, in other cases not.)
If the distribution of assets returns is iid, then the portfolio choice is unchanged over time—otherwise it changes. For instance, with mean-variance preferences, the tangency portfolio changes as the expected returns and/or the covariance matrix do.
Instead, if the investor can rebalance the portfolio in every time period ($t + 1, \ldots, t + q - 1$), then this is a truly dynamic problem, which is typically more difficult to solve.
However, when the utility function has constant relative risk aversion (CRRA) and returns are iid, then we know that the optimal portfolio weights are constant across time and independent of the investment horizon (q ). We can then solve this as a standard static problem. The intuition for this result is straightforward: CRRA utility implies that the portfolio weights are independent of the wealth of the investor and iid returns imply that the outlook from today is the same as the outlook from yesterday, except that the investor might have gotten richer or poorer. (The same result holds if the objective function instead is to maximize the utility from stream of consumption, but with a CRRA utility function.)
With non-iid returns (predictability or time-varying volatility), the optimization is typically much more complicated. The next few sections present a few cases that we can handle.

12.2

Optimal Portfolio Choice: Logarithmic Utility and Non-iid Returns

Reference: Campbell and Viceira (2002)
12.2.1

The Optimization Problem 1

Let the objective in period $t$ be to maximize the expected log wealth in some future period
\[
\max E_t \ln W_{t+q} = \max (\ln W_t + E_t r_{t+1} + E_t r_{t+2} + \ldots + E_t r_{t+q}), \tag{12.2}
\]
where $r_t$ is the log return, $r_t = \ln(1 + R_t)$, and $R_t$ is a net return. The investor can rebalance the portfolio weights every period.
Since the returns in the different periods enter separably, the best an investor can do in period $t$ is to choose a portfolio that solves
\[
\max E_t r_{t+1}. \tag{12.3}
\]
That is, to choose the one-period growth-optimal portfolio. But a short run investor who maximizes $E_t \ln[W_t(1 + R_{t+1})] = \ln W_t + E_t r_{t+1}$ will choose the same portfolio, so there is no horizon effect. However, the portfolio choice may change over time, if the distribution of the returns does. (The same result holds if the objective function instead is to maximize the utility from a stream of consumption, but with a logarithmic utility function.)
12.2.2

Approximating the Log Portfolio Return

In dynamic portfolio choice models it is often more convenient to work with logarithmic portfolio returns (since they are additive across time). This has a drawback, however, on the portfolio formation stage: the logarithmic portfolio return is not a linear function of the logarithmic returns of the assets in the portfolio. Therefore, we will use an approximation
(which gets more and more precise as the length of the time interval decreases).
If there is only one risky asset and one riskfree asset, then $R_{pt} = v R_t + (1 - v) R_{ft}$.
Let $r_{it} = \ln(1 + R_{it})$ denote the log return. Campbell and Viceira (2002) approximate the log portfolio return by
\[
r_{pt} \approx r_{ft} + v (r_t - r_{ft}) + v \sigma^2/2 - v^2 \sigma^2/2, \tag{12.4}
\]
where $\sigma^2$ is the conditional variance of $r_t$. (That is, $\sigma^2$ is the variance of $u_t$ in $r_t = E_{t-1} r_t + u_t$.) Instead, if we let $r_t$ denote an $n \times 1$ vector of risky log returns and $v$ the portfolio weights, then the multivariate version is
\[
r_{pt} \approx r_{ft} + v'(r_t - r_{ft}) + v'\sigma^2/2 - v'\Sigma v/2, \tag{12.5}
\]
where $\Sigma$ is the $n \times n$ covariance matrix of $r_t$ and $\sigma^2$ is the $n \times 1$ vector of the variances (that is, the diagonal elements of that covariance matrix). The portfolio weights, variances and covariances could be time-varying (and should then perhaps carry time subscripts).
Proof. (of (12.4)) The portfolio return $R_p = v R_1 + (1 - v) R_f$ can be used to write
\[
\frac{1 + R_p}{1 + R_f} = 1 + v \left( \frac{1 + R_1}{1 + R_f} - 1 \right).
\]
The logarithm is
\[
r_p - r_f = \ln\left\{ 1 + v \left[ \exp(r_1 - r_f) - 1 \right] \right\}.
\]
The function $f(x) = \ln\{1 + v[\exp(x) - 1]\}$ has the following derivatives (evaluated at $x = 0$): $df(x)/dx = v$ and $d^2 f(x)/dx^2 = v(1 - v)$, and notice that $f(0) = 0$. A second order Taylor approximation of the log portfolio return around $r_1 - r_f = 0$ is then
\[
r_p - r_f \approx v (r_1 - r_f) + \frac{1}{2} v (1 - v) (r_1 - r_f)^2.
\]
In a continuous time model, the square would equal its expectation, $\mathrm{Var}(r_1)$, so this further approximation is used to give (12.4). (The proof of (12.5) is just a multivariate extension of this.)
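The quality of the second order Taylor approximation in the proof can be inspected numerically. The sketch below compares the exact log excess portfolio return with the approximation for a few values of the log excess return of the risky asset. The portfolio weight and the return values are hypothetical, and the function names are assumptions. Note that this checks only the Taylor step, not the further step of replacing the squared return by its variance.

```python
import math

def exact_log_excess(v, x):
    """Exact r_p - r_f when the risky asset has log excess return x = r_1 - r_f."""
    return math.log(1 + v * (math.exp(x) - 1))

def taylor_log_excess(v, x):
    """Second order Taylor approximation around x = 0."""
    return v * x + 0.5 * v * (1 - v) * x**2

v = 0.8                                   # hypothetical portfolio weight
for x in (-0.10, -0.02, 0.02, 0.10):      # hypothetical log excess returns
    exact = exact_log_excess(v, x)
    approx = taylor_log_excess(v, x)
    print(f"x = {x:+.2f}: exact = {exact:+.6f}, approx = {approx:+.6f}, "
          f"error = {exact - approx:+.2e}")
```

The error is of third order in x, so it shrinks quickly as the time interval (and hence the typical size of the return) gets smaller.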
12.2.3

The Optimization Problem 2

The objective is to maximize the (conditional) expected value of the portfolio return as in (12.3). When there is one risky asset and a riskfree asset, then the portfolio return is given by the approximation (12.4). To simplify the notation a bit, let $\mu_{t+1}^e$ be the conditional expected excess return $E_t(r_{t+1} - r_{f,t+1})$ and let $\sigma_{t+1}^2$ be the conditional variance ($\mathrm{Var}_t(r_{t+1})$). Notice that these moments are conditional on the information in $t$ (when the portfolio decision is made) but refer to the returns in $t + 1$.

The optimization problem is then
\[
\max_{v_t} \; r_{f,t+1} + v_t \mu_{t+1}^e + v_t \sigma_{t+1}^2/2 - v_t^2 \sigma_{t+1}^2/2. \tag{12.6}
\]
The first order condition is
\[
0 = \mu_{t+1}^e + \sigma_{t+1}^2/2 - v_t \sigma_{t+1}^2, \text{ so } v_t = \frac{\mu_{t+1}^e + \sigma_{t+1}^2/2}{\sigma_{t+1}^2}, \tag{12.7}
\]
which is very similar to a mean-variance portfolio choice. Clearly, the weight on the risky asset will change over time, if the expected excess return and/or the volatility does. We could think of the portfolio with $v_t$ of the risky asset and $1 - v_t$ of the riskfree asset as a managed portfolio.
Example 12.1 (Portfolio weight, single risky asset) Suppose $\mu_{t+1}^e = 0.05$ and $\sigma_{t+1}^2 = 0.15$, then we have $v_t = (0.05 + 0.15/2)/0.15 = 5/6 \approx 0.83$.
With many risky assets, the optimization problem is to maximize the expected value of (12.5). The optimal $n \times 1$ vector of portfolio weights is then
\[
v_t = \Sigma_{t+1}^{-1} (\mu_{t+1}^e + \sigma_{t+1}^2/2), \tag{12.8}
\]
where $\Sigma_{t+1}$ is the conditional covariance matrix ($\mathrm{Cov}_t(r_{t+1})$) and $\sigma_{t+1}^2$ the $n \times 1$ vector of conditional variances. The weight on the riskfree asset is the remainder ($1 - \mathbf{1}'v_t$, where $\mathbf{1}$ is a vector of ones).
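The weights in (12.7) and (12.8) amount to solving a small linear system. The sketch below reproduces Example 12.1 for the single-asset case and then applies the same function to a two-asset case with made-up inputs. The function name and the two-asset numbers are assumptions for the illustration.

```python
import numpy as np

def log_utility_weights(mu_e, Sigma):
    """Weights on the risky assets from (12.8): Sigma^{-1} (mu_e + sigma^2 / 2)."""
    mu_e = np.atleast_1d(np.asarray(mu_e, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    sigma2 = np.diag(Sigma)                 # vector of variances
    return np.linalg.solve(Sigma, mu_e + sigma2 / 2)

# Single risky asset, numbers from Example 12.1
v_single = log_utility_weights(0.05, 0.15)
print(f"single asset: v = {v_single[0]:.4f}")

# Two risky assets, hypothetical conditional moments
mu_e = np.array([0.05, 0.03])
Sigma = np.array([[0.15, 0.02],
                  [0.02, 0.08]])
v = log_utility_weights(mu_e, Sigma)
w_riskfree = 1 - v.sum()                    # the remainder goes to the riskfree asset
print(f"two assets: v = {np.round(v, 4)}, riskfree weight = {w_riskfree:.4f}")
```

Using `np.linalg.solve` avoids forming the inverse of the covariance matrix explicitly, which is the more stable choice when the matrix is close to singular.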
Proposition 12.2 If the log returns are normally distributed, then (12.8) gives a portfolio on the mean-variance frontier of returns (not of log returns).
Figures 12.1–12.2 illustrate mean returns and standard deviations, estimated by exponentially moving averages (as by RiskMetrics). Figures 12.3–12.4 show how the optimal portfolio weights change (assuming mean-variance preferences). It is clear that the portfolio weights change very dramatically, perhaps too much to be realistic. The portfolio weights seem to be particularly sensitive to movements in the average returns, which is potentially a problem since the averages are often considered to be more difficult to estimate (with good precision) than the covariance matrix.

Figure 12.1: Dynamically updated estimates, 5 U.S. industries. (Three panels of mean excess returns (annualized), 1990 to 2010, for Cnsmr and Manuf, HiTec and Hlth, and Other.)
Proof. (of (12.8)) From (12.5) we have
$$\operatorname{E} r_p \approx r_f + v'\mu^e + v'\sigma^2/2 - v'\Sigma v/2,$$
so the first order conditions are
$$\mu^e + \sigma^2/2 - \Sigma v = \mathbf{0}_{n\times 1}.$$
Solve for $v$.
Proof. (of Proposition 12.2) First, notice that if the log return $r_t$ in (12.5) is normally distributed, then so is the log portfolio return ($r_{pt}$). Second, recall that if $\ln y \sim N(\mu, \sigma^2)$, then $\operatorname{E} y = \exp(\mu + \sigma^2/2)$ and $\operatorname{Std}(y)/\operatorname{E} y = \sqrt{\exp(\sigma^2) - 1}$, so that $\ln \operatorname{E} y - \sigma^2/2 = \mu$ and $\ln[\operatorname{Var}(y)/(\operatorname{E} y)^2 + 1] = \sigma^2$. Combine to write
$$\mu = \ln \operatorname{E} y - \ln[\operatorname{Var}(y)/(\operatorname{E} y)^2 + 1]/2,$$
which is increasing in $\operatorname{E} y$ and decreasing in $\operatorname{Var}(y)$. To prove the statement, notice that $y$ corresponds to the gross return and $\ln y$ to the log return, so $\mu$ corresponds to $\operatorname{E}_t r_{pt+1}$. Clearly, $\mu$ is increasing in $\operatorname{E} y$ and decreasing in $\operatorname{Var}(y)$, so the solution will be on the MV frontier of the (gross and net) portfolio return.

[Figure 12.2: Dynamically updated estimates, 5 U.S. industries. Three panels of standard deviations (annualized), 1990–2010: Cnsmr and Manuf; HiTec and Hlth; Other.]

[Figure 12.3: Dynamically updated portfolio weights, T-bill and 5 U.S. industries. Four panels of portfolio weights, 1990–2010: Cnsmr, Manuf, HiTec, Hlth. Lines: fixed mean, fixed cov.]
12.2.4 A Simple Example with Time-Varying Expected Returns (Log Utility and Non-iid Returns)

A particularly simple case is when the expected excess returns are linear functions of some information variables in the ($k \times 1$) vector $z_t$
$$\mu^e_{t+1} = a + b z_t, \text{ with } \operatorname{E} z_t = 0, \tag{12.9}$$
at the same time as the variances and covariances are constant. In this expression, $a$ is an $n \times 1$ vector and $b$ is an $n \times k$ matrix. Assuming that the information variables have zero means turns out to be convenient later on, but it is not a restriction (since the means are captured by $a$). The information variables could perhaps be the slope of the yield curve and/or the earnings/price ratio for the aggregate stock market.

[Figure 12.4: Dynamically updated portfolio weights, T-bill and 5 U.S. industries. Two panels of portfolio weights, 1990–2010: Other, riskfree. Lines: fixed mean, fixed cov.]
For the case with one risky asset, we get
$$v_t = \frac{\overbrace{a + b z_t}^{\mu^e_{t+1}} + \sigma^2/2}{\sigma^2}, \text{ or} \tag{12.10}$$
$$v_t = \psi + \omega_t, \text{ with } \psi = \frac{a + \sigma^2/2}{\sigma^2} \text{ and } \omega_t = \frac{b z_t}{\sigma^2}, \tag{12.11}$$
so the weight on the risky asset varies linearly with the information variable $b z_t$. (Even if there are many elements in $z_t$, $b z_t$ is a scalar so it is effectively one information variable.)
In the second equation, the portfolio weight is split up into the static (average) weight ($\psi$) and the time-varying part ($\omega_t$). Clearly, a higher expected return implies a higher portfolio weight of the risky asset.
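Similarly, for the case with many risky assets we get
$$v_t = \Sigma^{-1}\overbrace{(a + b z_t)}^{\mu^e_{t+1}} + \Sigma^{-1}\sigma^2/2, \text{ or} \tag{12.12}$$
$$v_t = \psi + \omega_t, \text{ with } \psi = \Sigma^{-1}(a + \sigma^2/2) \text{ and } \omega_t = \Sigma^{-1} b z_t. \tag{12.13}$$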

See Figure 12.5 for an illustration (based on Example 12.3). The figure shows the basic properties for the returns, the optimal portfolios and their location in a traditional mean-std figure. In this example, $z_t$ can only take on two different values with equal probability: $-1$ or $1$. The figure shows one mean-variance figure for each state, and the portfolio is clearly on them. However, the portfolio is not on the unconditional mean-variance figure (where the means and covariance matrix are calculated by using both states).
Example 12.3 (Dynamic portfolio weights when $z_t$ is a scalar that only takes on the values $-1$ and $1$, with equal probabilities) The expected excess returns are
$$\mu^e_{t+1} = \begin{cases} a - b & \text{when } z_t = -1 \\ a + b & \text{when } z_t = 1. \end{cases}$$
The portfolio weights on the risky assets (12.13) are then
$$v_t = \begin{cases} \Sigma^{-1}(a + \sigma^2/2) - \Sigma^{-1} b & \text{when } z_t = -1 \\ \Sigma^{-1}(a + \sigma^2/2) + \Sigma^{-1} b & \text{when } z_t = 1. \end{cases}$$
Example 12.4 (One risky asset) Suppose there is one risky asset and $a = 1$, $b = 2$, $k = 3/4$, $\sigma^2 = 1$, then Example 12.3 gives

              mu^e_{t+1}    v_t
low state        -1         -4/3
high state        3           4

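Example 12.5 (Numerical values for Example 12.3). Suppose we have three assets with
$$\operatorname{Cov}\left(\begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}\right) = \begin{bmatrix} 1.19 & 0.32 & 0.24 \\ 0.32 & 0.81 & 0.02 \\ 0.24 & 0.02 & 0.23 \end{bmatrix}/100,$$
and
$$\mu^e_{-1} = \begin{bmatrix} -0.41 \\ -0.29 \\ -0.07 \end{bmatrix}/100 \text{ and } \mu^e_{1} = \begin{bmatrix} 0.63 \\ 0.43 \\ 0.21 \end{bmatrix}/100.$$
In this case, the portfolio weights are
$$v_{-1} \approx \begin{bmatrix} 0.112 \\ 0.094 \\ 0.065 \end{bmatrix} \text{ and } v_{1} \approx \begin{bmatrix} 0.709 \\ 0.736 \\ 0.610 \end{bmatrix}.$$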
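The weights in Example 12.5 follow from applying (12.8) state by state. The sketch below does this with NumPy. It takes the state $-1$ expected excess returns to be $(-0.41, -0.29, -0.07)/100$ and the (3,1) element of the covariance matrix to be 0.24 (so the matrix is symmetric); both are assumptions about the intended numbers, chosen because they reproduce the stated weights to within rounding. The function name is hypothetical.

```python
import numpy as np

# Covariance matrix of the three log returns (Example 12.5)
Sigma = np.array([[1.19, 0.32, 0.24],
                  [0.32, 0.81, 0.02],
                  [0.24, 0.02, 0.23]]) / 100

# Expected excess returns in the two states.
# The negative signs in state -1 are an assumption (see lead-in).
mu_low = np.array([-0.41, -0.29, -0.07]) / 100   # z_t = -1
mu_high = np.array([0.63, 0.43, 0.21]) / 100     # z_t = 1


def log_utility_weights(mu_e, Sigma):
    """Risky-asset weights v = Sigma^{-1} (mu_e + sigma^2/2), as in (12.8)."""
    sigma2 = np.diag(Sigma)  # vector of variances
    return np.linalg.solve(Sigma, mu_e + sigma2 / 2)


v_low = log_utility_weights(mu_low, Sigma)
v_high = log_utility_weights(mu_high, Sigma)

for name, v in [("state -1", v_low), ("state  1", v_high)]:
    print(name, "risky weights:", np.round(v, 3),
          "riskfree weight:", round(1 - v.sum(), 3))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the usual numerically safer choice; the result is the same here.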

Example 12.6 (Details on Figure 12.5) To transfer from the log returns to the mean and std of net returns, the following result is used: if the vector $x \sim N(\mu, \Sigma)$, where $\sigma_{ij}$ denotes element $(i,j)$ of $\Sigma$, and $y = \exp(x)$, then $\operatorname{E} y_i = \exp(\mu_i + \sigma_{ii}/2)$ and $\operatorname{Cov}(y_i, y_j) = \exp[\mu_i + \mu_j + (\sigma_{ii} + \sigma_{jj})/2]\,[\exp(\sigma_{ij}) - 1]$.

[Figure 12.5: Portfolio choice, two different states. Left panel: MVF of basic assets in different states (state −1, state 1), with the optimal and constant portfolios marked. Right panel: MVF from unconditional moments. Axes: Std, % (horizontal) and Mean, % (vertical).]

12.3 Optimal Portfolio Choice: CRRA Utility and non-iid Returns

12.3.1 Basic Setup

An important feature of the portfolio choice based on the logarithmic utility function is that it is myopic in the sense that it only depends on the distribution of next period's return, not on the distribution of returns further into the future. Hence, short-run and long-run investors choose the same portfolios, as discussed before. This property is special to the logarithmic utility function.
With a utility function with a constant relative risk aversion (CRRA) different from one, today's portfolio choice would also depend on the distribution of returns in $t+2$ and onwards. In particular, it would depend on how the (random) returns in $t+1$ are correlated with changes (in $t+1$) of expected returns and volatilities of returns in $t+2$ and onwards.
This is intertemporal hedging.
In this case, the optimization problem is tricky, so I will illustrate it by using a simple model. As in Campbell and Viceira (1999), suppose there is only one risky asset and let the (scalar) information variable be an AR(1)
$$z_t = \rho z_{t-1} + \eta_t, \text{ where } \eta_t \sim \text{iid } N(0, \sigma_\eta^2). \tag{12.14}$$

In addition, I assume that the expected return follows (12.9) but with $b = 1$ (to simplify the algebra)
$$\mu^e_{t+1} = a + z_t. \tag{12.15}$$
Combine the time series processes (12.14) and (12.15) to get the following expression for the excess return
$$r^e_{t+1} = r_{t+1} - r_f = a + z_t + u_{t+1}, \text{ where } u_{t+1} \sim \text{iid } N(0, \sigma^2). \tag{12.16}$$
Clearly, the conditional variance of the return is $\operatorname{Var}_t(r^e_{t+1}) = \operatorname{Var}(u_{t+1}) = \sigma^2$. This innovation to the return is allowed to be correlated with the shock to the future expected return, $\eta_{t+1}$, $\operatorname{Cov}(u_{t+1}, \eta_{t+1}) = \sigma_{u\eta}$. For instance, a negative correlation could be interpreted as a mean-reversion of the asset price level: a temporary positive return is followed by lower future (expected) returns.
Remark 12.7 (How to estimate (12.14) and (12.16)). First, regress the excess returns on some raw information variables, $x_t$: $r_{t+1} - r_f = c + b'x_t + u_{t+1}$. Second, define $z_t = b'(x_t - \operatorname{E} x_t)$, so that $a = c + b'\operatorname{E} x_t$. Then, a regression of the return on $z_t$ gives a slope coefficient of one as in (12.16). Third, estimate an AR(1) on $z_t$ as in (12.14). Fourth and finally, estimate the covariance matrix of the residuals from the last two regressions.
It is important to realize that the unconditional and conditional autocovariances differ markedly
$$\operatorname{Cov}(r^e_{t+1}, r^e_{t+2}) = \rho \operatorname{Var}(z_t) + \sigma_{u\eta} \tag{12.17}$$
$$\operatorname{Cov}_t(r^e_{t+1}, r^e_{t+2}) = \sigma_{u\eta}. \tag{12.18}$$
This shows that the unconditional autocovariance of the return can be considerable at the same time as the conditional autocovariance may be much smaller. It is the latter that matters for the portfolio choice. For instance, it is possible that the unconditional autocovariance is zero (in line with empirical evidence), while the conditional covariance is negative.
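To make the last point concrete, the sketch below evaluates (12.17) and (12.18) for parameter values picked so that the unconditional autocovariance is exactly zero. The values of $\rho$ and $\sigma_\eta$ are purely illustrative assumptions, not taken from the notes, and the code uses the standard result that a stationary AR(1) has $\operatorname{Var}(z_t) = \sigma_\eta^2/(1-\rho^2)$.

```python
def return_autocovariances(rho, sigma_eta, sigma_u_eta):
    """Unconditional (12.17) and conditional (12.18) first-order
    autocovariance of the excess return in the model (12.14)-(12.16).

    Requires |rho| < 1 so that z_t is stationary.
    """
    var_z = sigma_eta**2 / (1 - rho**2)      # variance of a stationary AR(1)
    unconditional = rho * var_z + sigma_u_eta
    conditional = sigma_u_eta
    return unconditional, conditional


# Illustrative (hypothetical) parameter values
rho, sigma_eta = 0.9, 0.01
var_z = sigma_eta**2 / (1 - rho**2)

# Choose the covariance of the two shocks to offset rho * Var(z_t) exactly
sigma_u_eta = -rho * var_z

uncond, cond = return_autocovariances(rho, sigma_eta, sigma_u_eta)
print(f"unconditional autocovariance: {uncond:.2e}")
print(f"conditional autocovariance:   {cond:.2e}")
```

With these numbers the returns look serially uncorrelated in the data, although an investor who observes $z_t$ faces a negative conditional autocovariance.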
Figure 12.6 shows the impulse response function (the forecast based on current information) of a shock to the temporary part of the return ($u$) under two different assumptions about how this temporary part is correlated with the mean return for the next period. When they are uncorrelated, then a shock to the temporary part of the return is just a “blip.” In contrast, when today's return surprise indicates poor future returns (a negative covariance), then the impulse response function is positive (unity) in the initial period, but then negative for a prolonged period (since the expected return, $a + z_t$, is autocorrelated).

[Figure 12.6: Average impulse response of the return to changes in $u_0$, two different cases: Cov(u, η) = 0 and Cov(u, η) < 0. Horizontal axis: future period (0 to 9).]
Proof. (of (12.17)–(12.18)) The unconditional covariance is
$$\operatorname{Cov}(r^e_{t+1}, r^e_{t+2}) = \operatorname{Cov}(z_t + u_{t+1}, \rho z_t + \eta_{t+1} + u_{t+2}) = \rho \operatorname{Var}(z_t) + \sigma_{u\eta},$$
since $z_t$ is uncorrelated with $u_{t+1}$, $\eta_{t+1}$ and $u_{t+2}$, and $u_{t+1}$ is uncorrelated with $u_{t+2}$. The conditional covariance is
$$\operatorname{Cov}_t(r^e_{t+1}, r^e_{t+2}) = \operatorname{Cov}_t(z_t + u_{t+1}, \rho z_t + \eta_{t+1} + u_{t+2}) = \sigma_{u\eta},$$
since $z_t$ is known in $t$ and $u_{t+1}$ is uncorrelated with $u_{t+2}$. It is also straightforward to show that the unconditional variance is
$$\operatorname{Var}(r^e_{t+1}) = \operatorname{Cov}(z_t + u_{t+1}, z_t + u_{t+1}) = \operatorname{Var}(z_t) + \operatorname{Var}(u_t),$$
since $z_t$ and $u_{t+1}$ are uncorrelated. The conditional variance is
$$\operatorname{Var}_t(r^e_{t+1}) = \operatorname{Cov}_t(z_t + u_{t+1}, z_t + u_{t+1}) = \operatorname{Var}(u_t),$$
since $z_t$ is known in $t$.
To solve the maximization problem, notice that if the log portfolio return, $r_p = \ln(1 + R_p)$, is normally distributed, then maximizing $\operatorname{E}(1 + R_p)^{1-\gamma}/(1-\gamma)$ is equivalent to maximizing
$$\operatorname{E} r_p + (1-\gamma)\operatorname{Var}(r_p)/2, \tag{12.19}$$
where $r_p$ is the log return of the portfolio (strategy) over the investment horizon (one or several periods, to be discussed below).
12.3.2 One-Period Investor (Myopic Investor)

With one risky and a riskfree asset, a one-period investor (also called a myopic investor) maximizes
$$\operatorname{E}_t r_{pt+1} + (1-\gamma)\operatorname{Var}_t(r_{pt+1})/2. \tag{12.20}$$
This gives the following weight on the risky asset
$$v_t = \frac{\mu^e_{t+1} + \sigma^2/2}{\gamma\sigma^2} = \frac{a + z_t + \sigma^2/2}{\gamma\sigma^2}, \tag{12.21}$$
and the weight on the riskfree asset is $1 - v_t$. With $\gamma = 1$ (log utility), we get the same results as in (12.7). With a higher risk aversion, the weight on the risky asset is lower.
Clearly, the portfolio choice depends positively on (the signal about) the expected returns.
See Figure 12.7 for how the portfolio weight on the risky asset depends on the risk aversion.
Example 12.8 (Portfolio weight for one-period investor) With $(\sigma, a, \sigma_{u\eta}, \sigma_\eta) = (0.4, 0.05, -0.4, 2)$ and $\gamma = 2$, the portfolio weight in (12.21) is (on average, that is, when $z_t = 0$)
$$v_t = \frac{0.05 + 0 + 0.4^2/2}{2 \times 0.4^2} \approx 0.41.$$
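As a quick check of Example 12.8, and of how the myopic weight falls with risk aversion, the following sketch evaluates (12.21) for a few values of $\gamma$. The function name and the grid of $\gamma$ values are illustrative choices, not from the notes.

```python
def myopic_weight(a, z, sigma, gamma):
    """One-period (myopic) CRRA weight on the risky asset, as in (12.21)."""
    return (a + z + sigma**2 / 2) / (gamma * sigma**2)


# Parameters from Example 12.8, evaluated at z_t = 0
a, sigma = 0.05, 0.4
weights = {gamma: myopic_weight(a, z=0.0, sigma=sigma, gamma=gamma)
           for gamma in (1, 2, 5)}

for gamma, v in weights.items():
    print(f"gamma = {gamma}: v_t = {v:.4f}")
```

The $\gamma = 2$ entry corresponds to the example, and $\gamma = 1$ gives the log-utility weight from (12.7).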

[Figure 12.7: Weight on risky asset, two-period investor with CRRA utility and the possibility to rebalance. Lines: myopic, 2-period, 2-period (no rebal). Horizontal axis: risk aversion ($\gamma$), 1 to 5. Parameters: $(\sigma, a, \sigma_{u\eta}, \sigma_\eta) = (0.40, 0.05, -0.40, 2.00)$.]
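Proof. (of (12.21)). Using the approximation (12.4), we have
$$\operatorname{E} r_p = r_f + v\mu^e + v\sigma^2/2 - v^2\sigma^2/2$$
$$\operatorname{Var}(r_p) = v^2\sigma^2.$$
The optimization problem is therefore
$$\max_v\; r_f + v\mu^e + v\sigma^2/2 - v^2\sigma^2/2 + (1-\gamma)v^2\sigma^2/2,$$
so the first order condition is
$$\mu^e + \sigma^2/2 - v\sigma^2 + (1-\gamma)v\sigma^2 = 0.$$
Solve for $v$.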
12.3.3 Two-Period Investor (No Rebalancing)

In period $t$, a two-period investor chooses $v_t$ to maximize
$$\operatorname{E}_t(r_{pt+1} + r_{pt+2}) + (1-\gamma)\operatorname{Var}_t(r_{pt+1} + r_{pt+2})/2. \tag{12.22}$$
[Figure 12.8: Dynamic portfolio weights. Left panel: normalized log(E/P). Right panel: myopic portfolio weight on risky asset for $\gamma = 1$ and $\gamma = 3$. US stock returns 1970:1-2011:4; state variable: log(E/P).]
The solution (see Appendix) is
$$v = \frac{a + \sigma^2/2 + (1+\rho)z_t/2}{\gamma\sigma^2 - (1-\gamma)(\sigma_\eta^2/2 + \sigma_{u\eta})}. \tag{12.23}$$

Similar to the one-period investor, the weight is increasing in the signal of the average return ($z_t$), but there are also some interesting differences. Even if the utility function is logarithmic ($\gamma = 1$), we do not get the same portfolio choice as for the one-period investor. In particular, the reaction to the signal ($z_t$) is smaller (unless $\rho = 1$). The reason is that in this case, the investor commits to the same portfolio for two periods, and the movements in average returns are assumed to be mean-reverting.
There are also some important patterns on average (when $z_t = 0$). Then, $\gamma = 1$ actually gives the same portfolio choice as for the one-period investor. However, if $\gamma > 1$, and there are important shocks to the expected return, then the two-period investor puts a lower weight on the risky asset (the second term in the denominator tends to be positive).
The reason is that the risky asset is more dangerous to the two-period investor since $r_{pt+2}$ is more risky than $r_{pt+1}$: $r_{pt+2}$ can be hit by more shocks, namely shocks to the expected return of $r_{pt+2}$. In contrast, if data is iid then those shocks do not exist ($\operatorname{Var}(\eta_{t+1}) = 0$), so the two-period investor makes the same choice as the one-period investor.
One more thing is worth noticing: if $\sigma_{u\eta} < 0$, then the demand for the risky asset is higher than otherwise. This can be interpreted as a case where a temporary positive return leads to lower future (expected) returns. With this sort of mean-reversion in the price level (conditional negative autocorrelation), the risky asset is somewhat less risky to a long-run investor than otherwise. When extended to several risky assets, the result is that there is a higher demand for assets that tend to be negatively correlated with the future general investment outlook. See Figure 12.6 for an illustration of this effect and Figure 12.7 for how the portfolio weight on the risky asset depends on the risk aversion.
Example 12.9 (Portfolio weight without rebalancing) Using the same parameter values as in Example 12.8, (12.23) is (at $z_t = 0$)
$$v = \frac{0.05 + 0.4^2/2 + 0}{2 \times 0.4^2 - (1-2)(2^2/2 - 0.4)} \approx 0.07.$$

12.3.4 Two-Period Investor (with Rebalancing)

It is more reasonable to assume that the two-period investor can rebalance in each period.
Rewrite (12.22) as
$$\operatorname{E}_t r_{pt+1} + \operatorname{E}_t r_{pt+2} + (1-\gamma)[\operatorname{Var}_t(r_{pt+1}) + \operatorname{Var}_t(r_{pt+2}) + 2\operatorname{Cov}_t(r_{pt+1}, r_{pt+2})]/2, \tag{12.24}$$
and notice that the investor (in period $t$) can affect only those terms that involve $r_{pt+1}$ (as the portfolio will be rebalanced in $t+1$). He/she therefore maximizes
$$\operatorname{E}_t r_{pt+1} + (1-\gamma)[\operatorname{Var}_t(r_{pt+1}) + 2\operatorname{Cov}_t(r_{pt+1}, r_{pt+2})]/2. \tag{12.25}$$

The maximization problem is the same as for a one-period investor (12.20) if returns are iid (so the covariance is zero), or if $\gamma = 1$.
Otherwise, the covariance term will influence the portfolio choice in $t$. The difference to the no-rebalancing case is that the investor in $t$ takes into account that $r_{pt+2}$ will be generated by a portfolio with the weights of a one-period investor
$$v_{t+1} = \frac{a + z_{t+1} + \sigma^2/2}{\gamma\sigma^2}. \tag{12.26}$$

(This is the same as (12.21) but with the time subscripts advanced one period.) This affects both how the signal about future average returns ($z_t$) and the risk are viewed. The solution is (a somewhat messy expression, see Appendix for a proof)
$$v_t = \frac{a + z_t + \sigma^2/2}{\gamma\sigma^2} + \frac{1-\gamma}{\gamma\sigma^2}\,\frac{2\gamma - 1}{\gamma^2\sigma^2}\left(a + \sigma^2/2 + \rho z_t\right)\sigma_{u\eta}. \tag{12.27}$$

See Figure 12.7 for how the portfolio weight on the risky asset depends on the risk aversion and for a comparison with the cases of myopic portfolio choice and no rebalancing.
As before, the portfolio choice depends positively on the expected return (as signalled by $z_t$). But, there are several other results. First, when $\gamma = 1$ (log utility), then the portfolio choice is the same as for the one-period investor (for any value of $z_t$). Second, when $\sigma_{u\eta} = \operatorname{Cov}_t(u_{t+1}, \eta_{t+1}) = 0$, then the second term drops out, so the two-period investor once again picks the same portfolio as the one-period investor does. Third, $\gamma > 1$ combined with $\sigma_{u\eta} < 0$ increases (on average, $z_t = 0$) the weight on the risky asset, similar to the case without rebalancing. In this case, the second term of (12.27) is positive. That is, there is a positive extra demand (in $t$) for the risky asset: such an asset tends to pay off in $t+1$ (since $u_{t+1} > 0$, which only affects the return in $t+1$, not in subsequent periods) when the overall investment prospects for $t+2$ become worse ($\mu^e_{t+2}$ is low since $\eta_{t+1}$ and thus $z_{t+1}$ tends to be low when $u_{t+1}$ is high and $\sigma_{u\eta} < 0$). In this case, the return in $t+1$, driven by the temporary shock $u_{t+1}$, partially hedges the investment outlook in $t+1$ (that is, the distribution of the portfolio returns in $t+2$). The key to getting intertemporal hedging is thus that the temporary movements in the return partially offset future movements in the investment outlook.
To get a better understanding of the dynamic hedging, suppose again that we have a positive shock to the return in $t+1$, that is, $u_{t+1} > 0$. This clearly benefits all investors, irrespective of whether they can rebalance or not. However, the investor who can rebalance in $t+1$ has an advantage. His portfolio weight in $t+1$ (when he's a one-period investor) is given by (12.26), which depends on $z_{t+1}$. Knowing $u_{t+1}$ does not tell us exactly what $z_{t+1}$ is since the latter depends on the shock $\eta_{t+1}$ (see (12.14)). However, we know that
$$\operatorname{E}(z_{t+1} | z_t, u_{t+1}) = \rho z_t + \operatorname{E}(\eta_{t+1} | u_{t+1}) = \rho z_t + \frac{\sigma_{u\eta}}{\sigma^2} u_{t+1}, \tag{12.28}$$
where $\sigma_{u\eta}/\sigma^2$ is the (population) regression coefficient from regressing $\eta_{t+1}$ on $u_{t+1}$.
(This follows from the standard properties of bivariate normally distributed variables.)
Therefore, the conditional expected one-period portfolio weight (12.26) is
$$\operatorname{E}(v_{t+1} | z_t, u_{t+1}) = \frac{a + \rho z_t + (\sigma_{u\eta}/\sigma^2)u_{t+1} + \sigma^2/2}{\gamma\sigma^2}. \tag{12.29}$$
When $\sigma_{u\eta} < 0$, then a positive $u_{t+1}$ (good for the return in $t+1$, but signalling poor expected returns in $t+2$) is on average followed by a lower weight ($v_{t+1}$) on the risky asset than otherwise. See Figure 12.9.
This shows that an investor who can rebalance can enjoy the upside (in t C 1) without having to suffer the likely downside (in t C 2). Conversely, when he suffers a downside in t C 1, then he can enjoy the likely upside in t C 2. Overall, this makes the risky asset more attractive than otherwise.
[Figure 12.9: Average portfolio weight $v_{t+1}$ as a function of $u_{t+1}$. Vertical axis: weight on risky asset in the next period (on average). Horizontal axis: shock to return in the next period, $u_{t+1}$, from −0.2 to 0.2. Parameters: $(\sigma, a, \sigma_{u\eta}, \sigma_\eta, \gamma) = (0.40, 0.05, -0.40, 2.00, 2.00)$.]

Example 12.10 (Portfolio weight with rebalancing) Using the same parameter values as in Example 12.8, (12.27) is (at $z_t = 0$)
$$v_t = \frac{0.05 + 0 + 0.4^2/2}{2 \times 0.4^2} + \frac{1-2}{2 \times 0.4^2} \times \frac{2 \times 2 - 1}{2^2 \times 0.4^2} \times (0.05 + 0.4^2/2 + 0) \times (-0.4) \approx 0.41 + 0.76 = 1.17.$$
Consider a positive shock to the return in $t+1$, for instance, $u_{t+1} = 0.1$ so $r^e_{t+1} = 0.05 + 0 + 0.1 = 0.15$. From (12.28), we have
$$\operatorname{E}(z_{t+1} | z_t, u_{t+1}) = 0 + \frac{-0.4}{0.4^2} \times 0.1 = -0.25,$$
so the one-period portfolio weight (12.29) is (on average, conditional on $u_{t+1} = 0.1$)
$$\operatorname{E}(v_{t+1} | z_t = 0, u_{t+1} = 0.1) = \frac{0.05 + (-0.25) + 0.4^2/2}{2 \times 0.4^2} = -0.375.$$
This is negative since the expected return for $t+2$ is negative.
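The three portfolio rules can be compared side by side in code. The sketch below implements (12.21), (12.23), (12.27) and (12.29) as plain Python functions and evaluates them at the parameter values of Examples 12.8–12.10. The function names are made up for the illustration, and the value $\rho = 0.9$ is an arbitrary assumption: the examples do not state $\rho$, and it does not matter at $z_t = 0$.

```python
def myopic_weight(a, z, sigma, gamma):
    """One-period investor, (12.21)."""
    return (a + z + sigma**2 / 2) / (gamma * sigma**2)


def no_rebalancing_weight(a, z, rho, sigma, sigma_eta, sigma_u_eta, gamma):
    """Two-period investor without rebalancing, (12.23)."""
    num = a + sigma**2 / 2 + (1 + rho) * z / 2
    den = gamma * sigma**2 - (1 - gamma) * (sigma_eta**2 / 2 + sigma_u_eta)
    return num / den


def rebalancing_weight(a, z, rho, sigma, sigma_u_eta, gamma):
    """Two-period investor with rebalancing, (12.27):
    myopic weight plus an intertemporal hedging term."""
    s2 = sigma**2
    hedge = ((1 - gamma) / (gamma * s2)
             * (2 * gamma - 1) / (gamma**2 * s2)
             * (a + s2 / 2 + rho * z) * sigma_u_eta)
    return myopic_weight(a, z, sigma, gamma) + hedge


def expected_next_weight(a, z, u_next, rho, sigma, sigma_u_eta, gamma):
    """E(v_{t+1} | z_t, u_{t+1}) from (12.28)-(12.29)."""
    s2 = sigma**2
    expected_z_next = rho * z + sigma_u_eta / s2 * u_next
    return (a + expected_z_next + s2 / 2) / (gamma * s2)


# Parameters from Example 12.8; rho is an assumed value (irrelevant at z = 0)
sigma, a, sigma_u_eta, sigma_eta, gamma = 0.4, 0.05, -0.4, 2.0, 2.0
rho, z = 0.9, 0.0

v_myopic = myopic_weight(a, z, sigma, gamma)
v_no_rebal = no_rebalancing_weight(a, z, rho, sigma, sigma_eta, sigma_u_eta, gamma)
v_rebal = rebalancing_weight(a, z, rho, sigma, sigma_u_eta, gamma)
v_next = expected_next_weight(a, z, 0.1, rho, sigma, sigma_u_eta, gamma)

print(f"myopic (12.21):              {v_myopic:.3f}")
print(f"no rebalancing (12.23):      {v_no_rebal:.3f}")
print(f"with rebalancing (12.27):    {v_rebal:.3f}")
print(f"E(v_t+1 | u_t+1 = 0.1):      {v_next:.3f}")
```

Setting `gamma = 1.0` or `sigma_u_eta = 0.0` in the call to `rebalancing_weight` removes the hedging term, which matches the first two results listed after (12.27).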
While this simplified case only uses one risky asset, it is important to understand that this intertemporal hedging is not about a particular asset hedging the changes in its own return distribution. Indeed, if the outlook for a particular asset becomes worse, the investor could always switch out of it. Instead, the key effect depends on how a particular asset hedges the movements in tomorrow's optimal portfolio, that is, tomorrow's overall investment outlook.

12.4 Performance Measurement with Dynamic Benchmarks

Reference: Ferson and Schadt (1996), Dahlquist and Söderlind (1999)
Traditional performance tests typically rely on the alpha from a CAPM regression.
The benchmark in the evaluation is then a fixed portfolio consisting of assets that are correctly priced by the CAPM (obeys the beta representation). It often makes sense to use a more demanding benchmark by including managed portfolios.
Let $v(z)$ be a vector of portfolio weights that potentially depend on the information variables in $z$. The return on such a portfolio is
$$R_{pt} = v(z)'R_t + [1 - \mathbf{1}'v(z)]R_f = v(z)'R^e_t + R_f. \tag{12.30}$$

However, without restrictions on $v(z)$ it is impossible to sort out what sort of strategies would be assigned neutral performance by a particular (multi-factor) model. Therefore, assume that $v(z)$ are linear in the $K$ information variables
$$v(z_{t-1}) = \underbrace{d}_{N \times K}\,\underbrace{z_{t-1}}_{K \times 1} \tag{12.31}$$
for any $N \times K$ matrix $d$. For instance, when the expected returns are driven by the information variables $z_t$ as in (12.9), then the optimal portfolio weights (for an investor with logarithmic preferences) are linear functions of the information variables as in (12.11) or (12.13).
It is clear that the portfolio return (12.30)–(12.31) can be written
$$R_{pt} = R^{e\prime}_t v(z_{t-1}) + R_f = R^{e\prime}_t d z_{t-1} + R_f = (\operatorname{vec} d)'(z_{t-1} \otimes R^e_t) + R_f. \tag{12.32}$$

Remark 12.11 (Kronecker product) For instance, we have that if
$$z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix},\; f = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}, \text{ then } z \otimes f = \begin{bmatrix} z_1 f_1 \\ z_1 f_2 \\ z_1 f_3 \\ z_2 f_1 \\ z_2 f_2 \\ z_2 f_3 \end{bmatrix}.$$
Proof. (of (12.32)) Recall the rule that $\operatorname{vec}(ABC) = (C' \otimes A)\operatorname{vec} B$. Here, notice that $R^{e\prime} d z$ is a scalar, so we can use the rule to write $R^{e\prime} d z = (z' \otimes R^{e\prime})\operatorname{vec} d$. Transpose and recall the rule $(D \otimes E)' = D' \otimes E'$ to get $(\operatorname{vec} d)'(z \otimes R^e)$.
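The identity behind (12.32) is easy to verify numerically. The sketch below draws an arbitrary weight matrix $d$, information vector $z$ and excess return vector $R^e$ (random numbers with no economic meaning, used only for the check) and compares $R^{e\prime} d z$ with $(\operatorname{vec} d)'(z \otimes R^e)$. Note that vec stacks the columns of $d$, which in NumPy corresponds to flattening in column-major (`order="F"`) order.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 3, 2                      # number of assets and information variables

d = rng.normal(size=(N, K))      # arbitrary N x K matrix of weight coefficients
z = rng.normal(size=K)           # information variables z_{t-1}
Re = rng.normal(size=N)          # excess returns R^e_t

# Left-hand side: excess return of the portfolio with weights v = d z
lhs = Re @ d @ z

# Right-hand side: (vec d)' (z kron R^e); vec stacks the columns of d
vec_d = d.flatten(order="F")
managed = np.kron(z, Re)         # returns on the N*K "managed portfolios"
rhs = vec_d @ managed

print(lhs, rhs)                  # the two numbers agree up to rounding

# Ordering of the Kronecker product, as in Remark 12.11
print(np.kron(np.array([10, 20]), np.array([1, 2, 3])))
```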
This shows that the portfolio return can involve any linear combination of $z \otimes R^e$ so the new return space is defined by these new managed portfolios. We can therefore think of the returns
$$\tilde{R}_t = (z_{t-1} \otimes R^e_t) + R_f \tag{12.33}$$
as the returns on new assets, which can be used to define, for instance, mean-variance frontiers. It is not self-evident how to measure the performance of a portfolio in this case. It could, for instance, be argued that the return of the dynamic part of the portfolio is to be considered non-neutral performance. After all, this part exploits the information in the information variables $z$, which is potentially better than keeping a fixed portfolio. In this case, the alpha from a traditional CAPM regression
$$R^e_{pt} = \alpha + \beta R^e_{mt} + \varepsilon_{it} \tag{12.34}$$
is a good measure of performance.
Example 12.12 (One risky asset, two states) If the two states in Example 12.4 are equally likely and the riskfree rate is 5%, then it can be shown that $\alpha = 4.27\%$ and $\beta = 2.4$.
On the other hand, it may also be argued that a dynamic trading rule that investors can easily implement themselves should be assigned neutral performance. This can be done by changing the “benchmark” portfolio from being just the market portfolio to include managed portfolios. As an example, we could use the intercept from the following “dynamic CAPM” (or “conditional CAPM”) as a measurement of performance
$$R^e_{pt} = \alpha + (\beta + \gamma z_{t-1})R^e_{mt} + \varepsilon_t = \alpha + \beta R^e_{mt} + \gamma z_{t-1}R^e_{mt} + \varepsilon_t, \tag{12.35}$$
where $\gamma$ is here a regression coefficient (not the risk aversion) and the $z_{t-1}R^e_{mt}$ term is the dynamic benchmark that captures the effect of time-varying portfolio weights. In fact, (12.35) would assign neutral performance ($\alpha = 0$) to any pure “market timing” portfolio (constant relative weights in the sub portfolio of risky assets, but where the split between riskfree and risky assets changes).
Remark 12.13 In a multi-factor model we could use the intercept from
$$R^e_{pt} = \alpha + \beta f_t + \gamma(z_{t-1} \otimes f_t) + \varepsilon_t,$$
where $f_t$ is a vector of factors (excess returns on some portfolios) and $\otimes$ is the Kronecker product.
12.4.1 A Simple Example with Time-Varying Expected Returns

To connect the performance evaluation in (12.34) and (12.35) to the optimal dynamic portfolio strategy (12.13), suppose the optimal strategy is a pure “market timing” portfolio.
This happens when the expected returns (12.9) are modelled as
$$\mu^e_{t+1} = a + b z_t, \text{ with } b = c(a + \sigma^2/2), \tag{12.36}$$
where $c$ is some scalar constant, while $a$ and $\sigma^2$ are vectors. This gives the portfolio weights (12.13)
$$v_{t-1} = \psi + \underbrace{c\psi z_{t-1}}_{\omega_{t-1}} = \psi(1 + c z_{t-1}), \tag{12.37}$$
where $\psi$ is defined in (12.13). There are constant relative weights in the sub portfolio of risky assets, but the split between the risky assets (the vector $v_{t-1}$) and riskfree (the scalar $1 - \mathbf{1}'v_{t-1}$) changes as $z_{t-1}$ does: market timing.
Proof. (of (12.37)) Use $b = c(a + \sigma^2/2)$ from (12.36) in (12.13)
$$\omega_t = \Sigma^{-1} b z_t = \Sigma^{-1}(a + \sigma^2/2)\,c z_t = c\psi z_t.$$
With these portfolio weights, the excess return on the portfolio is
$$R^e_{pt} = \psi' R^e_t (1 + c z_{t-1}). \tag{12.38}$$

First, consider using the intercept ($\alpha$) from the CAPM regression (12.34) as a measure of performance. If the market portfolio is the tangency portfolio (for instance, we could assume that the rest of the market does static MV optimization so the market equilibrium satisfies CAPM), then the static part of the return (12.38), $\psi' R^e_t$, will be assigned neutral performance. The dynamic part, $c z_{t-1}\psi' R^e_t$, is different: it is like the return on a new asset, which does not satisfy CAPM. It is therefore likely to be assigned a non-neutral performance.
Second, consider using the intercept from the dynamic CAPM regression (12.35) as a measure of performance. As before, the static part of the return should be assigned neutral performance (as the market/tangency portfolio is one of the regressors). In this case, also the dynamic part of the portfolio is likely to be assigned neutral performance (or close to it). This is certainly the case when the static portfolio weights, $\psi$, are proportional to the weights in the market portfolio. Then, the $z_{t-1}R^e_{mt}$ term in the dynamic CAPM regression (12.35) exactly matches the $\psi' R^e_t z_{t-1}$ part of the return of the dynamic strategy (12.38). See Figure 12.5 for an illustration (based on Example 12.3). Since the portfolio is not on the unconditional mean-variance figure, it does not have a zero alpha when regressed against the tangency (as a proxy for the “market”) portfolio. (All the basic assets do, by construction, have zero alphas.) However, it does have a zero alpha when regressed on ($R_m$, $zR_m$).
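The contrast between the two regressions can be illustrated with simulated data. The sketch below is not a replication of Figure 12.5 or Figure 12.10: it uses a single made-up "market" excess return with a two-state information variable and hypothetical parameter values (`a`, `b`, `sigma`, `c`), and it assumes that the static weights are proportional to the market weights so that the timing portfolio earns $R^e_{mt}(1 + c z_{t-1})$. The helper `ols` is defined here only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

# Hypothetical parameters for the simulated market excess return
a, b, sigma, c = 0.005, 0.003, 0.04, 0.75

z_lag = rng.choice([-1.0, 1.0], size=T)                  # z_{t-1}, two equally likely states
Rm_e = a + b * z_lag + sigma * rng.standard_normal(T)    # market excess return
Rp_e = Rm_e * (1 + c * z_lag)                            # market timing portfolio, cf. (12.38)


def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Static CAPM regression (12.34): the intercept picks up the timing gains
alpha_static, beta_static = ols(Rp_e, Rm_e)

# Dynamic CAPM regression (12.35): z_{t-1} * Rm_e is added as a benchmark
alpha_dynamic, beta_dynamic, gamma_dynamic = ols(Rp_e, Rm_e, z_lag * Rm_e)

print(f"alpha, static CAPM:  {alpha_static:.5f}")
print(f"alpha, dynamic CAPM: {alpha_dynamic:.2e}")
print(f"beta, gamma (dynamic): {beta_dynamic:.3f}, {gamma_dynamic:.3f}")
```

In the dynamic regression the portfolio return is an exact linear combination of the two benchmarks, so the intercept is zero up to floating-point error and the slopes recover 1 and `c`. More complicated strategies would need the larger set of benchmarks $z_{t-1} \otimes R^e_t$.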
However, dynamic portfolio choices that are more complicated than the market timing strategy in (12.37) would not necessarily be assigned neutral performance in (12.35).
[Figure 12.10: Portfolio choice, two different states where market timing is optimal. Left panel: MVF of basic assets in different states (state −1, state 1). Right panel: MVF from unconditional moments (of basic assets, of managed portfolios). Axes: Std, % and Mean, %. The states have equal probabilities. The figure also reports the following numbers.]

Returns:          asset 1   asset 2   asset 3
ER, state -1        5.1       5.2       5.1
ER, state 1         5.9       6.3       5.4
Std(R)             10.9       9.0       4.8

Portfolio weights:    ψ      ω−1/ψ    ω1/ψ
Asset 1             -0.03    -0.75    0.75
Asset 2              0.91    -0.75    0.75
Asset 3              1.03    -0.75    0.75

Correlation matrix:
1.00   0.33   0.45
0.33   1.00   0.05
0.45   0.05   1.00

Alpha against:     Rm     (Rm, zRm)
Asset 1           0.00      0.00
Asset 2           0.00      0.00
Asset 3           0.00      0.00
DynamicP          0.52      0.00
However, also such strategies could be assigned a neutral performance if we augmented the number of benchmarks to properly capture the time-varying portfolio weights. In this case, this would require using $z_{t-1} \otimes R^e_t$ (where $R^e_t$ are the returns on the original assets) as the regressors
$$R^e_{pt} = \alpha + \beta R^e_{mt} + \gamma(z_{t-1} \otimes R^e_t) + \varepsilon_t. \tag{12.39}$$
With those benchmarks all strategies where the portfolio weights on the original assets are linear in $z_{t-1}$ would be assigned neutral performance. In practice, evaluation of mutual funds typically defines a small number (perhaps 5) of returns and even fewer instruments (perhaps 2–3). The instruments are typically inspired by the literature on return predictability and often include the slope of the yield curve, the dividend yield or lagged returns. Figure 12.10 illustrates the case when the portfolio has a zero alpha against ($R_m$, $zR_m$), while Figure 12.11 shows a case when the portfolio does not.
[Figure 12.11: Portfolio choice, two different states where market timing is not fully optimal. Left panel: MVF of basic assets in different states (state −1, state 1). Right panel: MVF from unconditional moments (of basic assets, of managed portfolios). Axes: Std, % and Mean, %. The states have equal probabilities. The figure also reports the following numbers.]

Returns:          asset 1   asset 2   asset 3
ER, state -1        5.1       5.8       5.1
ER, state 1         5.9       5.8       5.4
Std(R)             10.9       9.0       4.8

Portfolio weights:    ψ      ω−1/ψ    ω1/ψ
Asset 1             -0.03     7.11    -7.11
Asset 2              0.91     0.12    -0.12
Asset 3              1.03    -0.57     0.57

Correlation matrix:
1.00   0.33   0.45
0.33   1.00   0.05
0.45   0.05   1.00

Alpha against:     Rm     (Rm, zRm)
Asset 1           0.00      0.00
Asset 2           0.00      0.00
Asset 3           0.00      0.00
DynamicP          0.20      0.16

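A Some Proofs

Proof. (of (12.23)) (This proof is a bit crude, but probably correct....) The objective is to maximize (12.24). Using (12.4) we have
$$r_{pt+1} \approx r_f + v r^e_{t+1} + v\sigma^2/2 - v^2\sigma^2/2$$
$$r_{pt+2} \approx r_f + v r^e_{t+2} + v\sigma^2/2 - v^2\sigma^2/2,$$
so
$$r_{pt+1} + r_{pt+2} \approx 2r_f + v(r^e_{t+1} + r^e_{t+2}) + v\sigma^2 - v^2\sigma^2.$$
The expected value of the two-period return is
$$\operatorname{E}_t(r_{pt+1} + r_{pt+2}) = 2r_f + v(\mu^e_{t+1} + \operatorname{E}_t\mu^e_{t+2}) + v\sigma^2 - v^2\sigma^2,$$
so the derivative with respect to $v$ is
$$\frac{\partial \operatorname{E}_t(r_{pt+1} + r_{pt+2})}{\partial v} = \mu^e_{t+1} + \operatorname{E}_t\mu^e_{t+2} + \sigma^2 - 2v\sigma^2. \tag{foc1}$$
The variance of the two-period return is
$$\operatorname{Var}_t(r_{pt+1} + r_{pt+2}) = v^2\operatorname{Var}_t(r^e_{t+1} + r^e_{t+2}),$$
so the derivative is
$$\frac{\partial \operatorname{Var}_t(r_{pt+1} + r_{pt+2})}{\partial v} = 2v\operatorname{Var}_t(r^e_{t+1} + r^e_{t+2}). \tag{foc2}$$
Combine (foc1) and (foc2) to get the first order condition
$$0 = \frac{\partial \operatorname{E}_t(r_{pt+1} + r_{pt+2})}{\partial v} + \frac{1-\gamma}{2}\frac{\partial \operatorname{Var}_t(r_{pt+1} + r_{pt+2})}{\partial v}$$
$$\phantom{0} = \mu^e_{t+1} + \operatorname{E}_t\mu^e_{t+2} + \sigma^2 - 2v\sigma^2 + (1-\gamma)v\operatorname{Var}_t(r^e_{t+1} + r^e_{t+2}),$$
so we can solve for the portfolio weight as
$$v = \frac{\mu^e_{t+1} + \operatorname{E}_t\mu^e_{t+2} + \sigma^2}{2\sigma^2 - (1-\gamma)\operatorname{Var}_t(r^e_{t+1} + r^e_{t+2})}.$$
Recall that $\mu^e_{t+1} = a + z_t$ and $\operatorname{E}_t\mu^e_{t+2} = a + \operatorname{E}_t z_{t+1} = a + \rho z_t$, so
$$\mu^e_{t+1} + \operatorname{E}_t\mu^e_{t+2} = 2a + (1+\rho)z_t.$$
Notice also that $r^e_{t+1} - \operatorname{E}_t r^e_{t+1} = u_{t+1}$ and that $r^e_{t+2} - \operatorname{E}_t r^e_{t+2} = \eta_{t+1} + u_{t+2}$, so
$$\operatorname{Var}_t(r^e_{t+1} + r^e_{t+2}) = \operatorname{Var}_t(u_{t+1} + \eta_{t+1} + u_{t+2}) = \sigma^2 + \sigma_\eta^2 + \sigma^2 + 2\sigma_{u\eta},$$
since $\operatorname{Cov}(u_{t+1}, u_{t+2}) = \operatorname{Cov}(\eta_{t+1}, u_{t+2}) = 0$. Combining into the expression for $v$ gives
$$v = \frac{2a + (1+\rho)z_t + \sigma^2}{2\sigma^2 - (1-\gamma)(2\sigma^2 + \sigma_\eta^2 + 2\sigma_{u\eta})}$$
$$\phantom{v} = \frac{a + (1+\rho)z_t/2 + \sigma^2/2}{\sigma^2 - (1-\gamma)(\sigma^2 + \sigma_\eta^2/2 + \sigma_{u\eta})}$$
$$\phantom{v} = \frac{a + (1+\rho)z_t/2 + \sigma^2/2}{\gamma\sigma^2 - (1-\gamma)(\sigma_\eta^2/2 + \sigma_{u\eta})}.$$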
Proof. (of (12.27)) (This proof is a bit crude, but probably correct....) The objective is to maximize
E t rpt C1 C .1

/ŒVar t .rpt C1 /=2 C Cov t .rpt C1 ; rp2C1 /:

(obj)

Using (12.4) we have rpt C1 rpt C2

rf C v t r t C1

rf C v t C1 r t C2

rf C v t

2

2 vt =2

rf C v t C1

2

2

=2

2 v t C1

=2

2

=2:

The derivative with respect to v of the expected return in (obj) is
@ E t rpt C1
D
@v t

e t C1

C

2

=2

vt

2

(foc1)

:

The variance term in (obj) is
2
2
Var t .rpt C1 / D v t Var t .r t C1 / D v t

since r t C1

2

;

rf D a C z t C u t C1 . The derivative of the variance part of (obj) is
1
2

@ Var t .rpt C1 /
D .1
@v t

/v t

2

:

(foc2)

259

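The covariance in (obj) is

\[
\begin{aligned}
\mathrm{Cov}_t(r_{pt+1}, r_{pt+2}) &= v_t \mathrm{Cov}_t\left[u_{t+1},\; v_{t+1}(r_{t+2} - r_f) + v_{t+1}\sigma^2/2 - v_{t+1}^2\sigma^2/2\right] \\
&= v_t \mathrm{Cov}_t\big(u_{t+1},\; \underbrace{v_{t+1}\mu^e_{t+2} + v_{t+1}\sigma^2/2 - v_{t+1}^2\sigma^2/2}_{B}\big),
\end{aligned} \tag{ff}
\]

where the second line uses the fact that $r_{t+2} - r_f = \mu^e_{t+2} + u_{t+2}$ and that $u_{t+2}$ is uncorrelated with $u_{t+1}$ and $v_{t+1}$. There are two channels for the covariance: $u_{t+1}$ might be correlated with the expected return, $\mu^e_{t+2}$, or with the portfolio weight, $v_{t+1}$. The portfolio weight from the one-period optimization (12.21), but for $t+1$, is

\[
v_{t+1} = \frac{\bar{a} + z_{t+1}}{\gamma\sigma^2},
\]

where $\bar{a} = a + \sigma^2/2$ (this notation is only used to make the subsequent equations shorter). Since $\mu^e_{t+2} + \sigma^2/2 = \bar{a} + z_{t+1}$, the $B$ term in (ff) can then be written

\[
\begin{aligned}
B &= (\bar{a} + z_{t+1})(\bar{a} + z_{t+1}) \frac{1}{\gamma\sigma^2}\left(1 - \frac{1}{2\gamma}\right) \\
  &= \left(2\bar{a} z_{t+1} + z_{t+1}^2\right) \frac{1}{\gamma\sigma^2}\left(1 - \frac{1}{2\gamma}\right) + \text{constants}.
\end{aligned}
\]

Since $z_{t+1} = \phi z_t + \eta_{t+1}$, we have $z_{t+1}^2 = \phi^2 z_t^2 + \eta_{t+1}^2 + 2\phi z_t \eta_{t+1}$. Dropping variables known in $t$, we therefore have

\[
B = \left[2(\bar{a} + \phi z_t)\eta_{t+1} + \eta_{t+1}^2\right] \frac{1}{\gamma\sigma^2}\left(1 - \frac{1}{2\gamma}\right) + \text{known in } t.
\]

Because $\mathrm{Cov}_t(u_{t+1}, \eta_{t+1}^2) = 0$ (they are jointly normally distributed) and $\mathrm{Cov}_t(u_{t+1}, \eta_{t+1}) = \sigma_{u\eta}$, the covariance in (ff) is

\[
\mathrm{Cov}_t(r_{pt+1}, r_{pt+2}) = v_t \, 2(\bar{a} + \phi z_t)\, \sigma_{u\eta} \frac{1}{\gamma\sigma^2}\left(1 - \frac{1}{2\gamma}\right).
\]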
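The step that drops the $\eta_{t+1}^2$ term relies on the covariance between one zero-mean normal variable and the square of another jointly normal zero-mean variable being zero. A small Monte Carlo sketch of that fact follows. The function name, the standard deviations, the correlation, the sample size and the seed are all hypothetical choices made for illustration; only the Python standard library is used.

```python
import math
import random


def simulate_covariances(sigma_u, sigma_eta, rho, n=200_000, seed=0):
    """Sample covariances Cov(u, eta) and Cov(u, eta^2) for zero-mean,
    jointly normal (u, eta) with correlation rho."""
    rng = random.Random(seed)
    sum_u = sum_eta = sum_eta2 = sum_u_eta = sum_u_eta2 = 0.0
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        u = sigma_u * x1
        # Build eta so that Corr(u, eta) = rho.
        eta = sigma_eta * (rho * x1 + math.sqrt(1.0 - rho ** 2) * x2)
        sum_u += u
        sum_eta += eta
        sum_eta2 += eta ** 2
        sum_u_eta += u * eta
        sum_u_eta2 += u * eta ** 2
    mean_u = sum_u / n
    cov_u_eta = sum_u_eta / n - mean_u * (sum_eta / n)
    cov_u_eta2 = sum_u_eta2 / n - mean_u * (sum_eta2 / n)
    return cov_u_eta, cov_u_eta2


cov_u_eta, cov_u_eta2 = simulate_covariances(sigma_u=1.0, sigma_eta=1.0, rho=-0.5)
print("Cov(u, eta)   ~", cov_u_eta)    # should be close to rho * sigma_u * sigma_eta
print("Cov(u, eta^2) ~", cov_u_eta2)   # should be close to zero
```

The first number should be close to the true covariance $\sigma_{u\eta}$, and the second close to zero up to sampling error, which is why only the term that is linear in $\eta_{t+1}$ survives in the covariance.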
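The derivative of the covariance part of (obj) is

\[
(1-\gamma)\frac{\partial \mathrm{Cov}_t(r_{pt+1}, r_{pt+2})}{\partial v_t} = (1-\gamma)\, 2\left(1 - \frac{1}{2\gamma}\right) \frac{\bar{a} + \phi z_t}{\gamma\sigma^2}\, \sigma_{u\eta}. \tag{foc3}
\]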

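Combine the derivatives (foc1), (foc2) and (foc3) to the first order condition

\[
\begin{aligned}
0 &= \frac{\partial E_t r_{pt+1}}{\partial v_t} + (1-\gamma)\frac{\partial \mathrm{Var}_t(r_{pt+1})/2}{\partial v_t} + (1-\gamma)\frac{\partial \mathrm{Cov}_t(r_{pt+1}, r_{pt+2})}{\partial v_t} \\
&= \left(\mu^e_{t+1} + \sigma^2/2 - v_t\sigma^2\right) + (1-\gamma) v_t \sigma^2 + (1-\gamma)\, 2\left(1 - \frac{1}{2\gamma}\right) \frac{\bar{a} + \phi z_t}{\gamma\sigma^2}\, \sigma_{u\eta} \\
&= \mu^e_{t+1} + \sigma^2/2 - v_t \sigma^2\left[1 - (1-\gamma)\right] + (1-\gamma)\, 2\left(1 - \frac{1}{2\gamma}\right) \frac{\bar{a} + \phi z_t}{\gamma\sigma^2}\, \sigma_{u\eta} \\
&= \mu^e_{t+1} + \sigma^2/2 + (1-\gamma)\, 2\left(1 - \frac{1}{2\gamma}\right) \frac{\bar{a} + \phi z_t}{\gamma\sigma^2}\, \sigma_{u\eta} - \gamma\sigma^2 v_t,
\end{aligned}
\]

which can be solved as (12.27).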
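Because the first order condition is linear in $v_t$, it can be solved directly. The sketch below does this and checks that the condition holds at the solution. It is only an illustration under stated assumptions: the function names and parameter values are hypothetical, `gamma` is read as the risk aversion coefficient, `phi` as the autoregressive coefficient of $z_t$, and the state is assumed to enter the hedging term as $\bar{a} + \phi z_t$. It is not a transcription of (12.27).

```python
import math


def hedging_term(a, z, phi, gamma, sigma2, sigma_ueta):
    """Covariance part of the first order condition, as in (foc3).
    Assumption of this sketch: the state enters as a_bar + phi * z."""
    a_bar = a + sigma2 / 2
    return ((1 - gamma) * 2 * (1 - 1 / (2 * gamma))
            * (a_bar + phi * z) / (gamma * sigma2) * sigma_ueta)


def first_order_condition(v, a, z, phi, gamma, sigma2, sigma_ueta):
    """Sum of (foc1), (foc2) and (foc3) evaluated at the weight v."""
    mu_e = a + z  # expected excess return E_t(r_{t+1} - r_f)
    foc1 = mu_e + sigma2 / 2 - v * sigma2
    foc2 = (1 - gamma) * v * sigma2
    foc3 = hedging_term(a, z, phi, gamma, sigma2, sigma_ueta)
    return foc1 + foc2 + foc3


def optimal_weight(a, z, phi, gamma, sigma2, sigma_ueta):
    """Solve the (linear) first order condition for v."""
    mu_e = a + z
    hedge = hedging_term(a, z, phi, gamma, sigma2, sigma_ueta)
    return (mu_e + sigma2 / 2 + hedge) / (gamma * sigma2)


def myopic_weight(a, z, gamma, sigma2):
    """One-period weight, for comparison."""
    return (a + z + sigma2 / 2) / (gamma * sigma2)


# Hypothetical parameter values, for illustration only.
params = dict(a=0.01, z=0.005, phi=0.9, gamma=5.0, sigma2=0.04, sigma_ueta=-0.001)
v_star = optimal_weight(**params)
print("optimal weight:", v_star)
print("myopic weight: ", myopic_weight(0.01, 0.005, 5.0, 0.04))
print("FOC at optimum ~ 0:", math.isclose(first_order_condition(v_star, **params), 0.0, abs_tol=1e-12))
```

With $\sigma_{u\eta} = 0$ the hedging term vanishes and the solution equals the one-period weight. With $\gamma > 1$, $\sigma_{u\eta} < 0$ and $\bar{a} + \phi z_t > 0$, the hedging term is positive, so the weight exceeds the one-period weight.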

Bibliography
Campbell, J. Y., and L. M. Viceira, 1999, “Consumption and portfolio decisions when expected returns are time varying,” Quarterly Journal of Economics, 114, 433–495.
Campbell, J. Y., and L. M. Viceira, 2002, Strategic asset allocation: portfolio choice of long-term investors, Oxford University Press.
Dahlquist, M., and P. Söderlind, 1999, “Evaluating portfolio performance with stochastic discount factors,” Journal of Business, 72, 347–383.
Ferson, W. E., and R. Schadt, 1996, “Measuring fund strategy and performance in changing economic conditions,” Journal of Finance, 51, 425–461.
