Statistics & Finance II

Taught by Sotirios Damouras.

W1: Financial Data & Returns

Continuous double auction

  • Real-time mechanism to match buyers & sellers and determine prices at which trades execute
  • At any time, participants can place orders in the form of bids (buy) and asks (sell)
  • Matching orders (bid ≥ ask) are executed right away, whereas outstanding orders are maintained in an order book

Order types

  • Limit order: transact at no more (buy) / no less (sell) than a specified price
    • If order not filled, it’s kept in the order book
  • Market order: transact immediately at current market price
    • A single market order can be filled at more than one price (it may consume several levels of the order book)
  • Iceberg order: contains both hidden and displayed liquidity
    • Splits a large order into smaller ones to maintain order anonymity

Financial data

  • Quote data: record of bid/ask prices from order book
  • Trade data: record of filled orders

Daily data

  • Open/close
    • Adjusted close (used for calculating returns): adjusted for dividends and splits
  • High/low
  • Volume

Candlestick

  • Green: close > open
  • Red: close < open

Other data

  • FX rates: currency prices set by global financial centers
  • LIBOR rates: average interest rate that major London banks would be charged when borrowing from each other

Reliability of financial data

Financial data could be skewed by

  • Fake orders: orders placed to manipulate prices w/o intention to trade
  • Fake trades: trades where buyer and seller are the same party, used to inflate trading activity

Returns

Log returns (assume continuous compounding): $r_t = \log(P_t / P_{t-1})$

Dividend adjustment

Assuming the dividend $D_t$ is reinvested, the adjusted return is

$r_t = \log\frac{P_t + D_t}{P_{t-1}}$

The dividend is added back to the price (after the ex-dividend price drop)

Split adjustment

For an m-for-1 split at time t, scale the post-split price back up: $r_t = \log\frac{m\,P_t}{P_{t-1}}$

Net vs log returns

$r_t = \log(1 + R_t) \approx R_t$ for small values of $R_t$ (<1%), where $R_t = P_t/P_{t-1} - 1$ is the net return

Taylor approx.: $\log(1 + x) \approx x$ for $x \approx 0$

Monthly returns

For daily net returns $R_1, \dots, R_k$, the monthly net return is: $1 + R^{(m)} = \prod_{i=1}^{k}(1 + R_i)$

For daily log returns $r_1, \dots, r_k$, the monthly log return is: $r^{(m)} = \sum_{i=1}^{k} r_i$
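As a quick illustration, here is a base-R sketch of the two aggregations on simulated prices (the price path and its parameters are assumptions, not real data):

```r
# Minimal sketch: daily log vs net returns and their monthly aggregation.
set.seed(42)
P <- 100 * exp(cumsum(rnorm(21, mean = 0.0005, sd = 0.01)))  # 21 simulated daily prices
r <- diff(log(P))             # daily log returns r_t = log(P_t / P_{t-1})
R <- exp(r) - 1               # daily net returns
r_month <- sum(r)             # log returns add up over the month
R_month <- prod(1 + R) - 1    # net returns compound over the month
all.equal(log(1 + R_month), r_month)  # TRUE: the two aggregations agree
```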

Random walk model

Additive log returns suggest using the following to model asset prices

$\log P_t = \log P_{t-1} + r_t$

If the $r_t$ are i.i.d. with mean $\mu$ and SD $\sigma$, then the log price process is a RW with drift $\mu$ and volatility $\sigma$

  • aggregate returns over $k$ periods have mean $k\mu$ and volatility $\sqrt{k}\,\sigma$

Exponential/geometric random walk

$P_t = P_0\,e^{r_1 + \cdots + r_t}$

Return distribution

  • Most convenient assumption: normal (by CLT)
    • Not a good description of reality due to fat tails (heavier than normal)
  • Skewness = $E[(X - \mu)^3] / \sigma^3$
    • Right skewed ⇔ positively skewed (skewness > 0)
  • Kurtosis = $E[(X - \mu)^4] / \sigma^4$
    • Measures heaviness of the tails
    • Excess kurtosis is the standardized fourth central moment minus 3, which is the kurtosis of the (standard) normal distribution
    • Returns are leptokurtic (positive excess kurtosis, i.e. tails heavier than normal)

E.g. Identify skewness/kurtosis from QQ plot

W2: Univariate Return Modelling

Normality tests

  • Kolmogorov-Smirnov - Based on distance of empirical & Normal CDF
  • Jarque-Bera - Based on skewness & kurtosis combined
  • Shapiro-Wilk (most powerful) - Based on sample & theoretical quantiles (QQ plot)
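A hedged R sketch of all three tests on a heavy-tailed sample (the `tseries` package supplies `jarque.bera.test`; the t-distributed data is a stand-in for real returns):

```r
# Normality tests on a simulated return series.
library(tseries)                      # for jarque.bera.test()
set.seed(1)
x <- rt(500, df = 4)                  # heavy-tailed stand-in for returns
shapiro.test(x)                       # Shapiro-Wilk (sample vs theoretical quantiles)
jarque.bera.test(x)                   # Jarque-Bera (skewness & kurtosis)
# KS test vs standard normal; standardizing with estimated mean/sd makes the
# p-value only approximate (the Lilliefors correction would be exact).
ks.test((x - mean(x)) / sd(x), "pnorm")
```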

Heavy tail distributions

A pdf $f$ is said to have:

  • Exponential tails, if $f(x) \sim A\,e^{-x/\theta}$ as $x \to \infty$
  • Polynomial tails, if $f(x) \sim A\,x^{-(\alpha + 1)}$ as $x \to \infty$

Heavy tailed distributions are those with polynomial tails.

  • $\alpha$ is the tail index controlling tail weight: smaller $\alpha$ ⇒ heavier tails
  • for $k \ge \alpha$, moments are infinite: $E|X|^k = \infty$
    • Although the MGF is infinite (for any positive argument), the characteristic function always exists (refer to PS2 Q2)

Examples

Pareto: $f(x) = \frac{\alpha\,c^\alpha}{x^{\alpha+1}}$ for $x \ge c$ (tail index $\alpha$)

Cauchy: tail index $\alpha = 1$

Student’s t: tail index $\alpha = \nu$ (the degrees of freedom)

Theoretical justification

Let $X_1, \dots, X_n$ be i.i.d. heavy tail distributions with tail index $\alpha < 2$

By the generalized CLT, the (normalized) aggregate return converges to a stable distribution

A distribution is stable if linear combinations of independent RVs have the same distribution, up to location and scale parameters.

  • All stable distributions besides the Normal have heavy tails, but not all heavy tailed distributions are stable (a heavy tailed distribution with tail index > 2 is not stable)
  • Moreover, the sum of independent stable RVs also follows a stable distribution
  • Thus, adding many heavy tail i.i.d. price changes, we get heavy tail returns

Modeling tail behaviour

The complementary CDF of a heavy tail distribution behaves as: $\bar F(x) = P(X > x) \sim (x/c)^{-\alpha}$

To model (absolute) returns above a cutoff $c$, use the Pareto distribution

To estimate the tail index $\alpha$, use:

  • Maximum Likelihood: $\hat\alpha = \left(\frac{1}{n}\sum_{i=1}^{n}\log(x_i/c)\right)^{-1}$, over the $n$ exceedances $x_i > c$
  • Pareto QQ plots (for tails, e.g. top 25% of returns; see the sketch after this list):
    • Plot empirical complementary CDF vs returns in log-log scale
    • Estimate $\alpha$ as minus the slope of the best fitting line (simple linear regression)
  • Student’s t QQ plot (for entire distribution, not just tails)
    • Adjust for location and scale: $X = \mu + \lambda\,T$ where $T \sim t_\nu$
    • Estimate parameters $(\mu, \lambda, \nu)$ using MLE
  • Mixture models
    • Generate an RV from one out of a family of distributions, chosen at random according to another distribution (a.k.a. mixing distribution)
      • Easy to generate, but not easy to work with analytically
    • 2 types: discrete and continuous
      • e.g. (discrete mixing distribution) RV generated from $N(0, \sigma_1^2)$ w.p. $p$ and $N(0, \sigma_2^2)$ w.p. $1 - p$
    • e.g. (continuous mixing distribution) $X \mid \sigma \sim N(0, \sigma^2)$ where $\sigma$ is a RV. This is called a normal scale mixture.
      • Examples with heavy tails:
        • (GARCH) where the mixing process for $\sigma_t$ is the GARCH volatility recursion
        • (t-dist) $X = Z / \sqrt{W/\nu}$ where $Z \sim N(0, 1)$ and $W \sim \chi^2_\nu$
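Here is a minimal R sketch of the two tail-index estimates above (log-log regression on the empirical complementary CDF, and the Pareto MLE), using simulated t returns so the true index (= df) is known; the 75% cutoff is an assumption:

```r
# Tail index estimation from the top 25% of |returns|.
set.seed(1)
x <- abs(rt(2000, df = 3))                 # absolute returns, true tail index = 3
cutoff <- quantile(x, 0.75)
tail_x <- sort(x[x > cutoff])
n <- length(tail_x)
ccdf <- 1 - (seq_len(n) - 0.5) / n         # empirical P(X > x) within the tail
fit <- lm(log(ccdf) ~ log(tail_x))         # slope of log CCDF vs log x is -alpha
alpha_qq <- -coef(fit)[2]
alpha_mle <- 1 / mean(log(tail_x / cutoff))  # Pareto MLE over exceedances
c(alpha_qq, alpha_mle)                     # both should be near 3
```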

E.g. using mixture models, verify that for $T = Z/\sqrt{W/\nu}$, $\mathrm{Var}(T) = \frac{\nu}{\nu - 2}$

Hint: if $W \sim \chi^2_\nu$, then $E[1/W] = \frac{1}{\nu - 2}$.

Stylized Facts

Typical empirical asset return characteristics:

  1. Absence of simple autocorrelations
  2. Volatility clustering
  3. Heavy tails
  4. Intermittency (alternation between periodic and chaotic behaviour)
  5. Aggregation changes distribution (the distribution is not the same at different time scales)
  6. Gain/loss asymmetry

Extreme value theorem

2 limit results for modelling extreme events that happen with small probability

1st EVT (Normalized max of an iid sequence converges to the generalized extreme value distribution)

Let $X_1, X_2, \dots$ be i.i.d. RVs and $M_n = \max(X_1, \dots, X_n)$.

If there exist normalizing constants $a_n > 0,\ b_n$ s.t. $P\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x)$

If the limit $G$ exists, it must be one of: Gumbel, Fréchet, or Weibull.

We can combine the three types into the generalized extreme value distribution

$G_\xi(x) = \exp\left\{-(1 + \xi x)^{-1/\xi}\right\}$

$\xi$ is the shape parameter:

  • $\xi > 0$ for heavy tails (Fréchet)
  • $\xi = 0$ for exponential tails (Gumbel, taken as the limit $\xi \to 0$)
  • $\xi < 0$ for light tails with a finite endpoint (Weibull)

E.g. Show that the “normalized” max of iid Uniform(0, 1), $n(M_n - 1)$, converges to the Weibull-type limit $G(x) = e^{x}$ for $x \le 0$ (shape $\xi = -1$)

2nd EVT (Conditional distribution converges to GPD above threshold)

For RV $X$ with CDF $F$, consider its conditional distribution given that it exceeds some threshold $u$:

$F_u(x) = P(X - u \le x \mid X > u) = \frac{F(u + x) - F(u)}{1 - F(u)}, \quad 0 \le x \le x_F - u$

where $x_F$ is the right endpoint (finite or $\infty$) of $F$

In certain cases, as $u \to x_F$, the conditional distribution converges to the same (family of) distributions called the Generalized Pareto Distribution (GPD)

$H_{\xi, \beta}(x) = 1 - \left(1 + \xi x / \beta\right)^{-1/\xi}$

where $\beta > 0$, $x \ge 0$, and $x \le -\beta/\xi$ for $\xi < 0$

This gives:

  • $\xi > 0$: heavy tails (tail index $\alpha = 1/\xi$)
  • $\xi = 0$: exponential distribution
  • $\xi < 0$: finite upper endpoint

W3: Multivariate Modeling

We can model the returns of linear combinations of assets using a constant matrix $A$: for returns $\mathbf{R}$ with mean $\boldsymbol\mu$ and covariance $\Sigma$, $A\mathbf{R}$ has mean $A\boldsymbol\mu$ and covariance $A\Sigma A'$

To minimize the effect of outliers, we can use robust estimation - an estimation technique that is insensitive to small departures from the idealized assumptions that were used to optimize the algorithm

However, we should never remove outliers in finance. We can instead model heavy tails using the following.

Multivariate (Student’s) t distribution

A more practical/realistic distribution than Normal for modelling financial returns.

$\mathbf{X} = \boldsymbol\mu + \frac{\mathbf{Z}}{\sqrt{W/\nu}}$, where $\mathbf{Z} \sim N_d(\mathbf{0}, \Lambda)$ and $W \sim \chi^2_\nu$ is independent of $\mathbf{Z}$

Note that it is a Normal that gets scaled/divided by the square root of a Chi-square

Notation: $\mathbf{X} \sim t_\nu(\boldsymbol\mu, \Lambda)$ where $\Lambda$ is the scale matrix, not the covariance matrix

$E[\mathbf{X}] = \boldsymbol\mu$, but $\mathrm{Cov}(\mathbf{X}) = \frac{\nu}{\nu - 2}\Lambda$

Marginals are t-distributed with the same degrees of freedom ⇒ all asset returns have the same tail index

There is tail dependence - extreme values are observed at the same time in all dimensions (desirable property for modelling financial returns)

The greater the tail dependence, the more points we will observe in the corners of a scatterplot of the joint returns.

Linear combinations of multivariate t follow a 1D t with the same df

Using the same degree of freedom is limiting. A more flexible way is to model dependencies with copulas.

Copula

Intuitively, copulas allow us to decompose a joint probability distribution into the following:

  • their marginals (which carry no dependence information)
  • a function which couples them together

thus allowing us to specify the correlation separately. The copula is that coupling function. (joint = copula + marginals)

Formally, a copula is a multivariate CDF with Uniform(0, 1) marginals

Independence copula

$C(u_1, \dots, u_d) = u_1 u_2 \cdots u_d$

Fréchet-Hoeffding theorem (Copula bounds)

Any copula is bounded like so

$\max\left(1 - d + \sum_{i=1}^{d} u_i,\ 0\right) \le C(u_1, \dots, u_d) \le \min(u_1, \dots, u_d)$

  • The lower bound is 1 minus the number of uniforms plus the sum of the uniforms. Observe that the lower bound is only non-zero if the average value of the uniforms exceeds $(d-1)/d$

Sklar’s Theorem

Any continuous multivariate CDF $F$ with marginal CDF’s $F_1, \dots, F_d$ can be expressed with a copula $C$:

$F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d))$

The inverse is also true: any copula combined with marginal CDFs gives a multivariate CDF

If we let $u_i = F_i(x_i)$, then $x_i = F_i^{-1}(u_i)$

So, for continuous CDF $F$ with marginals $F_i$, the copula is given by $C(u_1, \dots, u_d) = F\big(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\big)$

E.g. If $X \sim F_X$ (continuous), then $U = F_X(X) \sim \mathrm{Uniform}(0, 1)$, and $X = F_X^{-1}(U)$

Gaussian Copula

Suppose $\mathbf{X} \sim N_d(\boldsymbol\mu, \Sigma)$ with correlation matrix $R$. Its copula is given by

$C_R(u_1, \dots, u_d) = \Phi_R\big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\big)$

For the independence copula, the density would be a plane (identically 1).

Note: The Gaussian copula only depends on the correlation matrix $R$, not on the individual means and variances ($\mu$’s and $\sigma$’s). Shown below.

Meta-Gaussian distributions

Multivariate distributions with a Gaussian copula

Simulation

Copulas can be created from known distributions. To simulate data from a distribution with copula and marginals :

  1. Generate (dependent) uniforms $(U_1, \dots, U_d) \sim C$
  2. Generate target variates from the marginals: $X_i = F_i^{-1}(U_i)$

E.g. To generate uniforms from a Gaussian copula:

  1. Generate multivariate normals $\mathbf{Z} \sim N(\mathbf{0}, R)$ with correlation $R$
  2. Calculate uniforms as their marginal CDF’s: $U_i = \Phi(Z_i)$
  3. Then, use these uniforms with any other marginals
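A base-R sketch of exactly these three steps (the correlation value and target marginals are assumptions chosen for illustration):

```r
# Simulate from a Gaussian copula with exponential and t marginals.
set.seed(1)
n <- 1000
R <- matrix(c(1, 0.7, 0.7, 1), 2, 2)          # copula correlation matrix
Z <- matrix(rnorm(2 * n), n, 2) %*% chol(R)   # step 1: correlated normals
U <- pnorm(Z)                                 # step 2: uniforms via normal CDF
X1 <- qexp(U[, 1], rate = 1)                  # step 3: any target marginals
X2 <- qt(U[, 2], df = 4)
cor(U)                                        # dependence carried by the uniforms
```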

For the same correlation, the pdf of a Gaussian copula vs a t copula differ mainly in the corners: the t copula places more mass in the joint tails (tail dependence)

Elliptical copula

Normal and t distributions both have a dependence structure that is said to be elliptical (due to their elliptical contours)

Symmetry of the covariance matrix ⇒ same dependence strength for positively and negatively correlated values

Archimedean copula

Family of copulas with the following form

$C(u_1, \dots, u_d) = \psi^{-1}\big(\psi(u_1) + \cdots + \psi(u_d)\big)$

where $\psi$ is called the generator function with the following properties:

  • $\psi: (0, 1] \to [0, \infty)$ is continuous, decreasing, and convex, with $\psi(1) = 0$

There are infinitely many choices for $\psi$, but the most common ones are the Clayton, Gumbel, and Frank families:

2D Archimedean copula random variates:

Contours of 2D pdf’s with Archimedean copulas and standard normal marginals:

Although Archimedean copulas can model dependence asymmetries, there are limitations in 3D

  • The copula value is constant for any permutation of coordinates
  • All pairs of coordinates have the same dependence, which is not the case for elliptical copulas

Alternative: vine copulas, which allow for both asymmetry and differences in pairwise dependence.

Fitting copulas

For given copula and marginals, we can use MLE to fit multivariate distribution parameters to data, but the number of parameters can be very high.

Instead, use pseudo-MLE to break problem down into marginals and copula:

  • Estimate marginal params for each dimension and calculate uniforms $U_{ij} = \hat F_j(X_{ij})$
  • Then estimate the copula parameters using ML on the uniforms
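A minimal base-R sketch of pseudo-MLE for a Gaussian copula, using ranks as the (nonparametric) marginal step; correlating the normal scores is a simple stand-in for the full copula ML step:

```r
# Pseudo-fit of a Gaussian copula: ranks -> uniforms -> normal scores -> correlation.
pseudo_fit <- function(X) {
  n <- nrow(X)
  U <- apply(X, 2, rank) / (n + 1)  # pseudo-observations in (0, 1)
  Z <- qnorm(U)                     # normal scores
  cor(Z)                            # estimate of the Gaussian-copula correlation
}
set.seed(1)
X <- matrix(rnorm(2000), 1000, 2) %*% chol(matrix(c(1, 0.5, 0.5, 1), 2, 2))
pseudo_fit(X)                       # roughly the true correlation 0.5
```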

W4: Portfolio Theory

Assumptions:

  • Static multivariate return distribution
  • Investors have same views on mean & variance
  • Investors want minimum risk for maximum return
  • Investors measure risk by portfolio’s variance
  • No borrowing or short-selling restrictions
  • No transaction costs

Two asset portfolio

The portfolio return is $R_p = w\,R_1 + (1 - w)\,R_2$

We can model its moments like so

$\mu_p = w\mu_1 + (1-w)\mu_2, \quad \sigma_p^2 = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\sigma_{12}$

where $\mu_i = E[R_i]$, $\sigma_i^2 = \mathrm{Var}(R_i)$, $\sigma_{12} = \mathrm{Cov}(R_1, R_2)$

To minimize the variance, differentiate w.r.t. w and set to 0:

$w^* = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}$

Multiple asset portfolio

Consider n risky assets with returns $\mathbf{R} = (R_1, \dots, R_n)'$, mean $\boldsymbol\mu$ and covariance $\Sigma$

A portfolio with weights $w$ s.t. $w'\mathbf{1} = 1$ has mean $w'\boldsymbol\mu$ and variance $w'\Sigma w$

To find the min variance portfolio with given expected return $\mu_P$, we solve the following quadratic optimization problem with linear constraints

$\min_w\ w'\Sigma w \quad \text{s.t.} \quad w'\boldsymbol\mu = \mu_P,\ \ w'\mathbf{1} = 1$

The set of such portfolios forms a parabola in mean-variance space, containing attainable portfolios.

Minimum variance portfolio weights

We can use Lagrange multipliers to find the minimum variance portfolio weights:

Lagrange Multipliers are used to find the local max/min of a function subject to equality constraints

1 constraint (2 variables): example

M constraints (n variables): minimize $f(w)$ subject to $g_j(w) = 0$, $j = 1, \dots, M$

Objective function (Lagrangian): $L(w, \boldsymbol\lambda) = f(w) + \sum_{j=1}^{M} \lambda_j\,g_j(w)$

Differentiate and set to 0: $\nabla_w L = 0$, together with the constraints

Solve for lambda:

Plugging lambda back into the first-order conditions, for the global minimum variance portfolio (only the $w'\mathbf{1} = 1$ constraint) we get

$w_{mv} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}$

It is the row sums of $\Sigma^{-1}$ divided by the sum of all its elements.
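A small R sketch of this formula, with an assumed covariance matrix:

```r
# Global minimum variance portfolio: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1).
Sigma <- matrix(c(0.04, 0.01, 0.00,
                  0.01, 0.09, 0.02,
                  0.00, 0.02, 0.16), 3, 3)
ones <- rep(1, 3)
w_mv <- solve(Sigma, ones) / sum(solve(Sigma, ones))
w_mv                                  # weights sum to 1
sqrt(t(w_mv) %*% Sigma %*% w_mv)      # the minimized portfolio volatility
```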

Risk-free asset

Consider splitting an investment into portfolio $P$ & risk-free asset, with weights $w$ and $1 - w$

A risk-free asset has constant return $r_f$ (zero variance)

The return is given by $R = w R_P + (1 - w) r_f$, with mean $r_f + w(\mu_P - r_f)$ and SD $w\,\sigma_P$

For a set of assets including risk-free ones, the best investments lie on the line tangent to the efficient frontier - they are combinations of the tangency portfolio and risk free assets.

  • The tangency portfolio is the efficient frontier portfolio that belongs to the tangent line.
  • The slope of the line is the Sharpe ratio.

To find the tangency portfolio, maximize the Sharpe ratio: $\max_w \frac{w'\boldsymbol\mu - r_f}{\sqrt{w'\Sigma w}}$

Tangency portfolio weights (solution to above) are given by

$w_T = \frac{\Sigma^{-1}(\boldsymbol\mu - r_f\mathbf{1})}{\mathbf{1}'\Sigma^{-1}(\boldsymbol\mu - r_f\mathbf{1})}$
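An R sketch of the tangency weights and the resulting Sharpe ratio (the mean vector, risk-free rate, and covariance are assumptions):

```r
# Tangency portfolio: w proportional to Sigma^{-1}(mu - rf), normalized to sum to 1.
Sigma <- matrix(c(0.04, 0.01, 0.00,
                  0.01, 0.09, 0.02,
                  0.00, 0.02, 0.16), 3, 3)
mu <- c(0.06, 0.10, 0.14); rf <- 0.02
w_raw <- solve(Sigma, mu - rf)
w_tan <- w_raw / sum(w_raw)
sharpe <- drop(sum(w_tan * (mu - rf)) / sqrt(t(w_tan) %*% Sigma %*% w_tan))
list(weights = w_tan, sharpe = sharpe)
```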

CAPM (Capital asset pricing model)

If every investor follows mean-variance analysis & the market is in equilibrium, then:

  1. Every investor holds some portion of the same tangency portfolio
  2. The entire financial market is composed of the same mix of
    risky assets
  3. Tangency portfolio is simply the market value-weighted index

Market portfolio

Since the composition of the tangency portfolio is equivalent to that of the market portfolio, its weights are just

$w_i = \frac{P_i\,N_i}{\sum_j P_j\,N_j}$

where $P_i$ = price of asset i, $N_i$ = # shares outstanding

Capital market line

Every mean-variance efficient portfolio lies on the capital market line:

$\mu_P = r_f + \frac{\mu_M - r_f}{\sigma_M}\,\sigma_P$

where $r_f$ is the risk free rate, and $M$ is the market portfolio

Security market Line

CAPM implies the following relationship between risk and expected return for all assets/portfolios (not just efficient ones)

$E[R_i] = r_f + \beta_i\,(E[R_M] - r_f), \quad \beta_i = \frac{\mathrm{Cov}(R_i, R_M)}{\mathrm{Var}(R_M)}$

Implications:

  • At equilibrium, an asset’s return depends only on its relation to the market portfolio.
    • $\beta$ measures the extent to which an asset’s return is related to the market. Higher $\beta$ ⇒ higher risk and reward.
  • Investors are only rewarded with higher returns for taking on market/systematic risk

Derivation: max Sharpe ratio using 1st order conditions

E.g. Consider N assets with iid returns (common mean $\mu$, variance $\sigma^2$, uncorrelated) and risk free return $r_f$. Find the market portfolio weights and SML.

By symmetry, $w_i = 1/N$ for every asset.

Since the market portfolio here is the min variance portfolio, we have $w = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}} = \frac{1}{N}\mathbf{1}$

So the minimum variance is $\sigma_M^2 = \sigma^2 / N$

Security characteristic line

To find $\beta$’s empirically, regress ($R_{i,t} - r_f$) on ($R_{M,t} - r_f$)

  • $R_M$ is the market return (proxied by a large market index, e.g. S&P 500)
  • $r_f$ is the risk free rate (proxied by a T-bill)

The slope of the SCL is the beta estimate: $\hat\beta_i = \frac{\widehat{\mathrm{Cov}}(R_i - r_f,\ R_M - r_f)}{\widehat{\mathrm{Var}}(R_M - r_f)}$

Mean return vs estimated betas:

Now consider adding an intercept: $R_{i,t} - r_f = \alpha_i + \beta_i\,(R_{M,t} - r_f) + \epsilon_{i,t}$

The mean and variance are given by

$E[R_i - r_f] = \alpha_i + \beta_i(\mu_M - r_f), \quad \mathrm{Var}(R_i) = \beta_i^2\sigma_M^2 + \sigma_\epsilon^2$

$\alpha$ measures the excess increase in asset return on top of that explained by the market. The bigger the alpha, the higher the outperformance (compared to the market portfolio).
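A hedged R sketch of the SCL regression on simulated excess returns (the true alpha = 0.001 and beta = 1.2 are assumptions of the simulation, not real estimates):

```r
# Estimate alpha and beta of the SCL by OLS.
set.seed(1)
rm_ex <- rnorm(250, 0.0003, 0.01)                    # market excess returns
ri_ex <- 0.001 + 1.2 * rm_ex + rnorm(250, 0, 0.01)   # asset excess returns
scl <- lm(ri_ex ~ rm_ex)
coef(scl)            # intercept ~ alpha, slope ~ beta
```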

Legacy of CAPM

CAPM says the best portfolio you can create is the tangency/market portfolio. This implies the best you can do is get the broadest index and combine it with a T-bill.

CAPM is wrong, but had immense practical impact on investing, specifically in terms of

  • Diversification: concept of decreasing risk by spreading portfolio over different assets

  • Index investing: justification for common investing strategy of tracking some broad index with mutual funds or ETF’s

  • Benchmarking: Measuring performance of investment relative to market / index

Performance Evaluation

There are several ways to measure an asset’s performance, based on CAPM

Sharpe ratio: $\frac{\mu_P - r_f}{\sigma_P}$ (excess return per unit risk)

Treynor index: $\frac{\mu_P - r_f}{\beta_P}$ (excess return per unit non-diversifiable risk)

Jensen’s alpha: $\alpha_P = \mu_P - \left[r_f + \beta_P(\mu_M - r_f)\right]$ (excess return on top of the return explained by the market)

  • Usually the most important measure a portfolio manager tries to use to convince people to invest in them.

W5: Factor Models

Main implication of CAPM: the market is the single factor driving asset returns

To improve performance, use more factors that drive asset returns

Factor Models

3 types:

  1. Macroeconomic: Factors are observable economic and financial time series
  2. Fundamental: Factors are created from observable asset characteristics
  3. Statistical: Factors are unobservable, and extracted from asset returns

All 3 types follow some form of

$R_{it} = \alpha_i + \beta_{i1}\,f_{1t} + \cdots + \beta_{iK}\,f_{Kt} + \epsilon_{it}$

  • $R_{it}$ is the return on asset $i$ at time t
  • $f_{kt}$ is the $k$-th common factor at time t
  • $\beta_{ik}$ is the factor loading/beta of asset $i$ on the $k$-th factor
  • $\epsilon_{it}$ is the idiosyncratic/unique return of asset $i$

In matrix form: $\mathbf{R}_t = \boldsymbol\alpha + B\,\mathbf{f}_t + \boldsymbol\epsilon_t$

Assumptions:

  • Asset specific errors are uncorrelated with common factors
  • The factors are stationary, with moments $E[\mathbf{f}_t] = \boldsymbol\mu_f$, $\mathrm{Cov}(\mathbf{f}_t) = \Sigma_f$
  • Errors are serially and contemporaneously uncorrelated across assets: $\mathrm{Cov}(\boldsymbol\epsilon_t) = \Psi$ (diagonal)

Find moments of the model: $E[\mathbf{R}_t] = \boldsymbol\alpha + B\boldsymbol\mu_f$, $\mathrm{Cov}(\mathbf{R}_t) = B\Sigma_f B' + \Psi$

Find moments of a portfolio with weights $w$, i.e. $R_{p,t} = w'\mathbf{R}_t$: mean $w'(\boldsymbol\alpha + B\boldsymbol\mu_f)$, variance $w'(B\Sigma_f B' + \Psi)\,w$

Time Series Regression Models

Consider model for which factor values are known (e.g. macro/fundamental model).

We can estimate betas & risks (variances) for one asset at a time. For each asset $i$, fit the regression model:

$R_{it} = \alpha_i + \boldsymbol\beta_i'\,\mathbf{f}_t + \epsilon_{it}$

over observations $t = 1, \dots, T$

Most models will always include some proxy for the overall economy (e.g. the market). The following is a famous example.

Fama-French 3 Factor Model

3 factors

  • Excess Market Return (XMT)
    • Same as in CAPM.
  • Small Minus Big (SMB)
    • Captures the size (market cap) of the company/stock.
  • High Minus Low (HML)
    • High = value stock; low = growth stock.
    • Measured using book-to-market ratio.

We can use the factor model to estimate the return covariance matrix:

$\hat\Sigma = \hat B\,\hat\Sigma_f\,\hat B' + \hat\Psi$

where $\hat B$ holds the estimated loadings, $\hat\Sigma_f$ is the sample factor covariance, and $\hat\Psi$ is the diagonal matrix of residual variances.

This gives more stable estimates than the sample covariance.

Statistical Factor Models

Factors are unknown and unobserved

  • Need to estimate both and
  • Problem is ill-posed need constraints

Assumptions

  • Asset specific errors are uncorrelated with common factors
  • The factors are orthogonal, with moments
  • Errors are serially and contemporaneously uncorrelated across assets

Resulting moments of returns

Principal component analysis

PCA: constructing a set of variables (components) that capture most of the variability given a set of assets

It can be thought of as a linear transformation of original variables.

For a random vector $\mathbf{X}$ with covariance $\Sigma$ (correlation $R$), the PC’s are linear combinations of $\mathbf{X}$:

$Y_k = a_k'\,\mathbf{X}$

such that:

  • $Y_1, \dots, Y_n$ are uncorrelated
  • Each component has maximum variance (given the previous ones)

Problem definition

We want to find components (i.e. find coefficient vectors $a_k$) s.t.

  • $a_k$ maximizes $\mathrm{Var}(a_k'\mathbf{X}) = a_k'\Sigma a_k$ subject to $\|a_k\| = 1$
  • $a_k \perp a_j$, for any $j < k$

Solution

Given by the eigen-decomposition of $\Sigma$

$\Sigma = P\,\Lambda\,P'$

where

  • P is an orthogonal matrix of eigenvectors, i.e. $P'P = PP' = I$
  • $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ with $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$

Principal components: $\mathbf{Y} = P'\,\mathbf{X}$

Find $\mathrm{Var}(Y_k)$: $\mathrm{Var}(Y_k) = \lambda_k$

Find the loading of $X_j$ on $Y_k$ (the beta): $\mathrm{Cov}(X_j, Y_k)/\mathrm{Var}(Y_k) = P_{jk}$

Total variance of all PC’s = total variance of the original variables: $\sum_k \lambda_k = \mathrm{tr}(\Sigma)$

Proportion of total variance explained by each PC is $\lambda_k / \sum_j \lambda_j$
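An R sketch of PCA on simulated returns via `eigen` (the one-factor simulation is an assumption, chosen so the first PC dominates):

```r
# PCA of simulated returns: eigendecomposition of the sample covariance.
set.seed(1)
f <- rnorm(500, 0, 0.01)                                       # one common factor
X <- sapply(runif(5, 0.5, 1.5), function(b) b * f + rnorm(500, 0, 0.005))
S <- cov(X)
ev <- eigen(S, symmetric = TRUE)       # S = P Lambda P'
ev$values / sum(ev$values)             # proportion of variance per PC (scree plot input)
PC <- X %*% ev$vectors                 # the principal components themselves
```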

How do we choose the number of PCs?

  • We can use a scree plot:

  • E.g. In this example, one PC already explains much of the variation

PCA can be used to identify components that explain overall variation of data, but it does not always give meaningful PC’s - PC’s are just transformations that capture the most variability, they do not explain how data was generated.

For a proper data-generating model, use Factor Analysis:

Factor Analysis

Assuming $\mathrm{Cov}(\mathbf{f}_t) = I$, the return variance becomes

$\Sigma = BB' + \Psi$

We need to estimate the loadings $B$ and the error variances $\Psi$ using maximum likelihood.

A rotation of $B$ (multiplying it with an orthogonal matrix P) has no effect on the model:

$(BP)(BP)' = B\,PP'\,B' = BB'$

We need further constraints on $B$. A common constraint is to rank factors by explained variability, similar to PCA.

W6: Risk Management

Types of risks

• Market risk: due to changes in market prices
• Credit risk: counterparty doesn’t honour obligations
• Liquidity risk: lack of asset tradability
• Operational risk: from organization’s internal activities (e.g. legal, fraud, or human error risk)

Risk measures

  • There exist different notions of risk (losing money, bankruptcy, not achieving desired return), but in practice risk measures are used to determine the amount of cash to be kept on reserve

  • Return volatility is not a good risk measure. The following distributions have the same volatility $\sigma$, but their risk profiles are very different

    • LHS: average return is positive but it has a fat left tail, so returns could be large negative values
    • RHS: average return is negative, but no chance of getting very large negative values; hence less risky

Value at Risk (VaR)

The VaR is the amount that covers losses with probability $\alpha$ (e.g. 95%).

Let $L$ be the loss of an investment over time period $h$ ($L = -\text{Rev}$, where Rev is revenue).

The VaR is defined as the $\alpha$-quantile of $L$ for some $\alpha \in (0, 1)$:

$\mathrm{VaR}_\alpha(L) = \inf\{x : P(L \le x) \ge \alpha\}$

For a continuous RV with CDF $F_L$, it is defined as: $\mathrm{VaR}_\alpha = F_L^{-1}(\alpha)$

E.g. Consider an asset with annual log-returns $r \sim N(\mu, \sigma^2)$. Find the 95% confidence level annual VaR for a $1000 investment in this asset.

Want to find VaR($\alpha$) s.t. $P(L > \mathrm{VaR}_\alpha) = 1 - \alpha$

Formula for VaR given log returns: $\mathrm{VaR}_\alpha = V_0\left(1 - e^{\mu + \sigma\,\Phi^{-1}(1 - \alpha)}\right)$

Limitations

VaR can be misleading as it hides tail risk and discourages diversification.

However, it is still widely used due to the Basel framework (banking regulations).

As an example, the following have the same VaR but vastly different risk

Solution: use conditional VaR / expected shortfall

Conditional VaR / Expected Shortfall

Defined as the expected value (or average) of losses beyond VaR: $\mathrm{CVaR}_\alpha = E\left[L \mid L \ge \mathrm{VaR}_\alpha\right]$

Examples

VaR & CVaR of a Normal Variable

If the loss L~N(0, 1), find ES at confidence level $\alpha$

Let $z_\alpha = \Phi^{-1}(\alpha)$ denote the top $\alpha$-quantile of the standard normal

$ES_\alpha = E[L \mid L \ge z_\alpha] = \frac{\phi(z_\alpha)}{1 - \alpha}$

Standard normal pdf: $\phi(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, with $\phi'(x) = -x\,\phi(x)$ (which gives the integral above)

More generally, for $L \sim N(\mu, \sigma^2)$:

$\mathrm{VaR}_\alpha = \mu + \sigma\,z_\alpha, \quad ES_\alpha = \mu + \sigma\,\frac{\phi(z_\alpha)}{1 - \alpha}$

Risk measure properties

Let $\rho(L)$ denote a risk measure for an investment with loss L.

A coherent risk measure must satisfy the following properties:

  1. Normalized: $\rho(0) = 0$ (the risk of holding no assets is 0)
  2. Translation invariance: $\rho(L + c) = \rho(L) + c$ (adding a sure loss $c$ to the portfolio increases risk by $c$)
  3. Positive homogeneity: $\rho(\lambda L) = \lambda\,\rho(L)$ for $\lambda \ge 0$
  4. Monotonicity: $L_1 \le L_2 \Rightarrow \rho(L_1) \le \rho(L_2)$
  5. Sub-additivity: $\rho(L_1 + L_2) \le \rho(L_1) + \rho(L_2)$ (due to diversification)

E.g. Show that VaR and CVaR are translation invariant and positively homogeneous

Let $L' = \lambda L + c$; then $P(L' \le x) = P\left(L \le \frac{x - c}{\lambda}\right)$, so the quantile scales and shifts accordingly: $\mathrm{VaR}_\alpha(L') = \lambda\,\mathrm{VaR}_\alpha(L) + c$, and taking the conditional expectation gives the same for CVaR

E.g. Consider 2 risky zero-coupon bonds, each priced at their $100 face value. If each one has a 4% independent default probability, show that $\mathrm{VaR}_{0.95}$ is not sub-additive.

Distribution of $L_1$ or $L_2$: each bond loses money only w.p. 4% < 5%, so $\mathrm{VaR}_{0.95}(L_1) = \mathrm{VaR}_{0.95}(L_2) = 0$

Distribution of $L_1 + L_2$: P(at least one default) $= 1 - 0.96^2 = 7.84\% > 5\%$, so $\mathrm{VaR}_{0.95}(L_1 + L_2) > 0$

which is greater than $\mathrm{VaR}_{0.95}(L_1) + \mathrm{VaR}_{0.95}(L_2) = 0$

This shows that under VaR, owning both bonds is riskier than owning them separately. VaR is thus incoherent at the 5% level (it hides tail risk). At 3% (where the individual 4% default probabilities also exceed the tail level), it would be coherent.

E.g. Show that is sub-additive.

We see that

Entropic VaR

EVaR is a coherent alternative to VaR based on the Chernoff bound, which is obtained by applying Markov’s inequality to $e^{zL}$. It is an exponentially decreasing upper bound on the tail of a RV based on its MGF.

Markov inequality: for a positive RV $X$, we have $P(X \ge a) \le \frac{E[X]}{a}$

For loss RV $L$ with MGF $M_L(z) = E[e^{zL}]$, we have $P(L \ge c) = P(e^{zL} \ge e^{zc}) \le e^{-zc}\,M_L(z)$

Bound this by $1 - \alpha$ and solve for $c$: $c = \frac{1}{z}\log\frac{M_L(z)}{1 - \alpha}$

Thus, EVaR is defined as

$\mathrm{EVaR}_\alpha = \inf_{z > 0}\ \frac{1}{z}\log\frac{M_L(z)}{1 - \alpha}$

EVaR of a Normal Variable

The MGF of a Normal variable $L \sim N(\mu, \sigma^2)$ is $M_L(z) = e^{\mu z + \sigma^2 z^2/2}$

EVaR is the infimum of the following:

$\frac{1}{z}\log\frac{M_L(z)}{1 - \alpha} = \mu + \frac{\sigma^2 z}{2} + \frac{1}{z}\log\frac{1}{1 - \alpha}$

To find the infimum (minimum) over z>0, differentiate and set to 0:

$\frac{\sigma^2}{2} - \frac{1}{z^2}\log\frac{1}{1 - \alpha} = 0 \implies z^* = \frac{1}{\sigma}\sqrt{2\log\tfrac{1}{1-\alpha}}$

So we have

$\mathrm{EVaR}_\alpha = \mu + \sigma\sqrt{2\log\tfrac{1}{1 - \alpha}}$

Calculating risk measures

3 ways:

  • Parametric modeling
  • Historical simulation
  • Monte Carlo simulation

Other risk management techniques: stress-testing (worst-case scenario) and extreme value theory (EVT)

~85% of large banks use historical simulation; the rest use MC simulation

Parametric modeling

Fitting a distribution to revenues/returns and calculating VaR or CVaR/ES based on distribution

E.g. Assuming net returns of an investment follow a normal distribution, then for an initial capital $V_0$, the parametric VaR and CVaR at confidence $\alpha$ are

$\widehat{\mathrm{VaR}}_\alpha = V_0\left(\hat\sigma\,z_\alpha - \hat\mu\right), \quad \widehat{\mathrm{CVaR}}_\alpha = V_0\left(\hat\sigma\,\frac{\phi(z_\alpha)}{1 - \alpha} - \hat\mu\right)$

where $\hat\mu, \hat\sigma$ are sample estimates, $z_\alpha = \Phi^{-1}(\alpha)$, and $\Phi, \phi$ are the standard normal cdf, pdf.

Historical simulation

Instead of assuming a specific distribution, it uses the empirical distribution of returns estimated by historical data.
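A hedged R sketch of both approaches side by side, on simulated returns standing in for a historical sample (the position size and return model are assumptions):

```r
# 95% one-day VaR and ES of a $1000 position: parametric-normal vs historical.
set.seed(1)
V0 <- 1000; alpha <- 0.95
R <- rt(1000, df = 4) * 0.01          # daily net returns (stand-in sample)
L <- -V0 * R                          # losses
# Parametric (normal) estimates:
z <- qnorm(alpha)
VaR_par <- V0 * (sd(R) * z - mean(R))
ES_par  <- V0 * (sd(R) * dnorm(z) / (1 - alpha) - mean(R))
# Historical simulation: empirical quantile and tail average of losses:
VaR_hist <- quantile(L, alpha)
ES_hist  <- mean(L[L > VaR_hist])
round(c(VaR_par = VaR_par, ES_par = ES_par, VaR_hist = VaR_hist, ES_hist = ES_hist), 2)
```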

Monte Carlo simulation

Even if returns are parametrically modelled, their resulting distribution is often intractable.

E.g. Consider a portfolio of 2 assets - one with normal returns, one with t-distributed returns. The distribution of the portfolio return is not explicitly known.

We can simulate returns from such a model and treat simulated values as historical returns.

Time Series Models

Static models assume independence over time (but allow dependence across assets)

RiskMetrics Model

A simple time series model using the exponentially weighted moving average (EWMA) for return volatility

$r_t = \sigma_t\,\epsilon_t, \quad \sigma_t^2 = \lambda\,\sigma_{t-1}^2 + (1 - \lambda)\,r_{t-1}^2$

where the $\epsilon_t$ are iid $N(0, 1)$ and $0 < \lambda < 1$

Typically, use $\lambda = 0.94$ for daily returns
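A minimal R sketch of the recursion (initializing at the sample variance is a common convention, assumed here):

```r
# RiskMetrics EWMA volatility with lambda = 0.94 on simulated returns.
set.seed(1)
r <- rnorm(500, 0, 0.01)
lambda <- 0.94
s2 <- numeric(length(r))
s2[1] <- var(r)                                     # initialize at sample variance
for (t in 2:length(r)) s2[t] <- lambda * s2[t - 1] + (1 - lambda) * r[t - 1]^2
vol <- sqrt(s2)                                     # conditional volatility estimate
```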

GARCH(p, q) Model

$r_t = \sigma_t\,\epsilon_t, \quad \sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i\,r_{t-i}^2 + \sum_{j=1}^{q}\beta_j\,\sigma_{t-j}^2$

where the $\epsilon_t$ are iid $N(0, 1)$ and $\omega > 0$, $\alpha_i, \beta_j \ge 0$
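A short R sketch simulating a GARCH(1,1) path with assumed parameters (initialized at the unconditional variance):

```r
# Simulate GARCH(1,1): r_t = sigma_t * eps_t, sigma_t^2 = omega + alpha r_{t-1}^2 + beta sigma_{t-1}^2
set.seed(1)
n <- 1000; omega <- 1e-5; a <- 0.10; b <- 0.85
r <- numeric(n); sig2 <- numeric(n)
sig2[1] <- omega / (1 - a - b)                 # unconditional variance
r[1] <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:n) {
  sig2[t] <- omega + a * r[t - 1]^2 + b * sig2[t - 1]
  r[t] <- sqrt(sig2[t]) * rnorm(1)             # volatility clustering emerges
}
```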

W7: Betting Strategies

If we have a sequence of gambles where we have a positive expected payoff, how do we wager our bets for optimal results? We will look at a few different strategies below.

Setup. Consider a sequence of independent & identical gambles with

  • Let p = P(win) > 1/2 (favourable bet), q = 1 − p
  • For each $1 placed, the payoff is +$1 if you win and −$1 if you lose

Starting with initial wealth $W_0$, assume you bet a constant amount $c$ at each step. Find the expected wealth after n steps (ignoring ruin: $W_n < 0$ for some $n$)

Define the indicator RV of winning the i-th bet: $X_i = +1$ w.p. $p$, $-1$ w.p. $q$, so that

$W_n = W_0 + c\sum_{i=1}^n X_i, \quad E[W_n] = W_0 + c\,n\,(p - q), \quad \mathrm{Var}(W_n) = c^2\,n\,\big(1 - (p - q)^2\big)$

Notice that if $p \le 1/2$, $E[W_n] - W_0$ could be negative. Otherwise, we can expect to have some positive wealth at time n which increases linearly. The variance also grows linearly in n (so the SD grows like $\sqrt{n}$).

Assume we bet $1 at each step. Start with $W_0 = k$. Find the probability of eventual ruin, i.e. $W_n = 0$ for some n.

Let $a_k = P(\text{ruin} \mid \text{wealth } k)$; conditioning on the first bet, $a_k = p\,a_{k+1} + q\,a_{k-1}$, with $a_0 = 1$

Assume a solution of the form $a_k = x^k$

Solve the quadratic: $p\,x^2 - x + q = 0 \implies x = 1$ or $x = q/p$

If $x = 1$, then $a_k = 1$ for all k. This is a trivial solution (probability of ruin = 1 at all times).

So $x = q/p$, and the probability of eventual ruin is $a_k = (q/p)^k$

Assume we bet everything (entire wealth) at each step. What is our expected wealth after $n$ steps, not ignoring ruin?

$W_n = W_0\,2^n$ with probability $p^n$ (win every bet), and $W_n = 0$ otherwise, so $E[W_n] = W_0\,(2p)^n$

Note that $E[W_n] \to \infty$ as $n \to \infty$ since 2p>1. Expected wealth grows exponentially - even though ruin is eventually certain.

Assume we bet a fixed fraction $f$ of wealth at each step. What is our expected wealth after $n$ steps?

With $S_n \sim \mathrm{Binomial}(n, p)$ wins, $W_n = W_0\,(1 + f)^{S_n}(1 - f)^{n - S_n}$

This step uses the PGF (probability generating function) of Binomial(n, p): $E[s^{S_n}] = (1 - p + ps)^n$

Continuing:

$E[W_n] = W_0\,(1 - f)^n\,E\!\left[\left(\tfrac{1+f}{1-f}\right)^{S_n}\right] = W_0\,\big(q(1 - f) + p(1 + f)\big)^n = W_0\,\big(1 + f(p - q)\big)^n$

We have exponential growth and a low probability of ruin.

Kelly Criterion

Bet the fraction $f$ of wealth that maximizes the expected log return (or equivalently the log of $W_n/W_0$, or the geometric average of returns).

Note: by Jensen’s inequality, maximizing log wealth != maximizing wealth, i.e. $E[\log W_n] \le \log E[W_n]$

What is the optimal value of the fraction?

$G(f) = E\left[\tfrac{1}{n}\log\tfrac{W_n}{W_0}\right] = p\log(1 + f) + q\log(1 - f)$

Now maximize G(f): differentiate w.r.t. $f$ and set to 0

$\frac{p}{1 + f} - \frac{q}{1 - f} = 0 \implies f^* = p - q$

The optimal fraction is the difference between P(win) and P(lose).

What is the geometric average of the returns as $n \to \infty$?

Denoting the growth rate from $W_{i-1}$ to $W_i$ with $R_i = W_i / W_{i-1}$, the geometric average is

$\left(\prod_{i=1}^n R_i\right)^{1/n} = (1 + f)^{S_n/n}(1 - f)^{1 - S_n/n}$

where $S_n$ is the number of wins

By SLLN, $S_n/n \to p$ as $n \to \infty$, so the geometric average $\to (1 + f)^p(1 - f)^q = e^{G(f)}$

General Setup

Now consider a general sequence of bets, where a $1 bet pays +$a if you win and −$b if you lose. (In previous examples, a = b = 1, and the bet is favourable, i.e. $pa - qb > 0$.)

The Kelly criterion optimal fraction to bet is (proved in PS 7.1b):

$f^* = \frac{p}{b} - \frac{q}{a}$

In the following example (a = b = 1, p = 0.55), f = 0.55 - 0.45 = 0.1
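An R sketch comparing wealth paths at, below, and above the Kelly fraction for this example (the horizon and seed are assumptions; over-betting shows the wealth volatility criticized below):

```r
# Simulate wealth betting a fixed fraction f each round (p = 0.55, even-money).
set.seed(1)
sim_wealth <- function(f, n = 1000, p = 0.55, W0 = 1) {
  x <- ifelse(runif(n) < p, 1 + f, 1 - f)   # per-bet growth factor
  W0 * cumprod(x)
}
W_kelly <- sim_wealth(0.10)    # Kelly fraction
W_half  <- sim_wealth(0.05)    # fractional (half) Kelly
W_over  <- sim_wealth(0.50)    # over-betting: volatile, typically decays
sapply(list(W_kelly, W_half, W_over),
       function(w) tail(log(w), 1) / length(w))   # realized growth rates
```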

Investing Example

Now consider the following. We have a…

  • risk free asset with return $r_f$
  • risky asset with return $R$: mean $\mu$, variance $\sigma^2$

We invest fraction f of wealth into the risky asset & the remaining (1–f) into the risk-free asset.

Apply the Kelly Criterion to find f that maximizes the logarithm of wealth:

$G(f) = E\left[\log\big(1 + r_f + f(R - r_f)\big)\right]$

Use the Taylor expansion for $\log(1 + x)$ around $x = 0$: $\log(1 + x) \approx x - x^2/2$

Applying this, we get

$G(f) \approx r_f + f(\mu - r_f) - \tfrac{1}{2}f^2\sigma^2 + \cdots$

Differentiate w.r.t. f and set to 0:

$\mu - r_f - f\,\sigma^2 \approx 0$

Since the higher-order terms are negligible, we have $f^* \approx \frac{\mu - r_f}{\sigma^2}$

Theoretical properties

In the long term () with probability 1, a strategy based on Kelly criterion:

  1. Maximizes limiting exponential growth rate of wealth
  2. Maximizes median of final wealth
    • Half of distribution is above median & half below it
  3. Minimizes the expected time required to reach a specified goal for the wealth

Criticism

  • Can have considerable wealth volatility (b/c of multiplicative bet amounts)
  • Does not account for the uncertainty in probability of winning
    • Many practitioners use fractional or partial Kelly, i.e. using smaller than Kelly fraction (e.g., f*/2)
  • In practice, investing horizons are not infinite and there are many other considerations (e.g. transaction costs, short-selling limits etc)

W8: Statistical Arbitrage

Statistical Arbitrage (StatArb) refers to trading strategies that utilize the “statistical mispricing” of related assets

StatArb strategies are typically short term and market neutral, involving long & short positions simultaneously

Examples of StatArb strategies:

• Pairs trading
• Index Arbitrage
• Volatility Arbitrage
• Algorithmic & High Frequency Trading

Pairs Trading

  • Original & most well-known StatArb technique developed by Morgan Stanley quants
  • Profit not affected by overall market movement (market neutral)
  • Contrarian strategy profits from price convergence of related assets

Main idea

  1. Select pair of assets “moving together”, based on certain criteria
  2. If prices diverge beyond certain threshold, buy low sell high
  3. If prices converge again, reverse position and profit

Example

Let $P_l$ = price of the lower asset, $P_h$ = price of the higher asset.

  • Open the position when prices diverge: buy $1 of the low asset ($1/P_l$ units), sell $1 of the high asset ($1/P_h$ units)
    • cost = $1 - 1 = 0$
  • Close the position when prices converge
    • profit = $\frac{P_l^c}{P_l} - \frac{P_h^c}{P_h}$ (c stands for closing)

Profitability is determined by asset price ratios (hence the use of log ratios for modelling)

The strategy is market neutral, i.e. profitability is not affected by market movement - the paired assets typically have similar market betas

What can go wrong?

Prices may not converge.

Factors to consider

  1. Which pairs to trade
  2. When to open trade
  3. What amounts to buy/sell
  4. When to close trade
  5. When to bail out of trade

Most of these decisions involve trade-offs, so how do we select pairs to trade?

  • Profitable pairs must have log-ratio with strong mean reversion
    • Note: Mean reversion is not the same as simply having constant mean

Mean Reversion

Suggests log-ratio process is stationary

  • Autocorrelation function describes linear dependence at lag $h$: $\rho(h) = \mathrm{Corr}(X_t, X_{t+h})$

Stationarity ensures process will revert back to its mean within reasonable time.

E.g. Let $X_t$ be iid over days t, and let c be some threshold with $p = P(X_t > c)$

What is the expected time until $X_t > c$? I.e. until the threshold is first exceeded

On any day t, $P(X_t > c) = p$

Let T = # days until $X_t > c$ for the first time. T is called the hitting time; it is equal to # trials until 1st success (success probability $p$), so $T \sim \mathrm{Geometric}(p)$, which has prob mass function

$P(T = k) = (1 - p)^{k-1}\,p$

The expected time is hence $E[T] = 1/p$

E.g. Let $W_t$ be a Brownian Motion (BM) (continuous time Random Walk)

For any level $c \ne 0$, show that the expected time until $W_t = c$ is infinite.

Let $\tau_c$ = first time standard BM (with $W_0 = 0$) hits level c. Let $M_t = \max_{s \le t} W_s$

This means: $P(\tau_c \le t) = P(M_t \ge c) = 2\,P(W_t \ge c)$ (by the reflection principle)

The PDF of $\tau_c$ is given by

$f_{\tau_c}(t) = \frac{c}{\sqrt{2\pi}\,t^{3/2}}\,e^{-c^2/(2t)}, \quad t > 0$

whose tail decays like $t^{-3/2}$, so $E[\tau_c] = \infty$

Integrated Series

A non-stationary time series whose difference is stationary

Asset log prices are not stationary - we need to apply differencing

Although the log return $r_t = \log P_t - \log P_{t-1}$ follows a stationary process, the log price $\log P_t$ is a random walk

Example: IBM stock price before and after differencing

Cointegration

Two integrated series are cointegrated if there exists a linear combination of them that is stationary.

Consider a vector of time series $\mathbf{X}_t$. If each element becomes stationary after differencing, but a linear combination $\boldsymbol\beta'\mathbf{X}_t$ is already stationary, then $\mathbf{X}_t$ is said to be co-integrated with co-integrating vector $\boldsymbol\beta$.

There may be several such co-integrating vectors so that $\boldsymbol\beta$ becomes a matrix. Interpreting $\boldsymbol\beta'\mathbf{X}_t$ as a long run equilibrium, co-integration implies that deviations from equilibrium are stationary, with finite variance, even though the series themselves are non-stationary with unboundedly growing variance.

For pairs trading, we want to find assets which are cointegrated (their log difference is mean reverting, and thus stationary)

E.g. ST, MT, and LT interest rates are co-integrated - they move together but behave as random walks individually

E.g. Let $X_t$ be a random walk ($X_t = X_{t-1} + w_t$, with $w_t$ white noise), and $Y_t = X_t + \epsilon_t$ where $\epsilon_t$ is white noise independent of $w_t$

Show that $X_t, Y_t$ are cointegrated

First, we need to show $X_t, Y_t$ are integrated (not stationary, with stationary 1st order differences).

$\nabla X_t = w_t$ is white noise, hence stationary; for $Y_t$ the difference is $\nabla Y_t = w_t + \epsilon_t - \epsilon_{t-1}$, with autocovariances:

If |t-s| > 1, $\mathrm{Cov}(\nabla Y_t, \nabla Y_s) = 0$

If |t-s| = 1, $\mathrm{Cov}(\nabla Y_t, \nabla Y_s) = -\sigma_\epsilon^2$

$\nabla Y_t$ is thus stationary ⇒ $Y_t$ is integrated of order 1 (and similarly for $X_t$)

Next, show cointegration: $Y_t - X_t = \epsilon_t$ is stationary, so $X_t, Y_t$ are cointegrated (with co-integrating vector $(1, -1)$).

Stationarity Tests

Hypothesis test for a unit root (integrated series) vs stationarity

Idea: fit $X_t = \phi\,X_{t-1} + w_t$ to the data and test $H_0: \phi = 1$ (unit root / integrated) vs $H_1: \phi < 1$ (stationary) - the (Augmented) Dickey-Fuller test

For a random walk, we fail to reject the null hypothesis that $X_t$ is integrated.

Issue: we don’t know which linear combination to check for stationarity

Two-step method
  • Estimate linear relationship between variables

  • Test resulting difference series (residuals) for stationarity

  • Example: regress Chevron on Exxon log-prices

    • Then, test the residuals for stationarity: reject the null hypothesis that they are integrated
  • Problems:

    • Regressing P1 on P2 can give different results than regressing the other way around.
    • There is estimation error for residuals.
  • Can be used to address spurious regression

    • Results of random walk (integrated series) regressions are NOT reliable
      • Consider 2 independent random walks
      • When you regress one on the other, you are NOT guaranteed that the estimated coefficient $\to 0$ as the sample size $\to \infty$ (i.e. the estimator is not consistent)
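A hedged R sketch of the two-step check using `tseries` on a pair that is cointegrated by construction (strictly, residual-based tests call for adjusted critical values, so treat the p-value as indicative):

```r
# Two-step (Engle-Granger style) cointegration check.
library(tseries)
set.seed(1)
x <- cumsum(rnorm(500))        # integrated (random-walk) log price 1
y <- 0.8 * x + rnorm(500)      # cointegrated with x by construction
fit <- lm(y ~ x)               # step 1: estimate the linear relationship
adf.test(residuals(fit))       # step 2: small p-value => stationary residuals
```
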
Vector Error Correction models (VECM)

Combined treatment of dynamics & cointegration, using Vector AutoRegressive (VAR) models

Index Arbitrage

  • Indices measure value/performance of financial markets
    • Dow-Jones Industrial Average (DJIA): Simple average of 30 major US stock prices (since 1896)
    • Standard & Poor (S&P) 500: Weighted (cap-base) average of 500 large NYSE & NASDAQ listed companies
  • Financial indices are NOT traded instruments. However, there are many financial products whose value is directly related to indices:
    • Mutual funds: e.g., Vanguard® 500 Index Fund
    • Exchange-Traded-Funds (ETF’s): e.g., SPDR or iShares S&P500 Index
    • Futures: e.g., E-Mini S&P futures
  • Financial products based on indices essentially offer a sophisticated version of multivariate cointegration
    • For an index of N assets w/ weights $w_i$, the index level is $I_t = \sum_{i=1}^N w_i\,P_{it}$
    • It has a co-integrating relationship with $F_t$, an instrument tracking the index (e.g. futures)

Volatility Arbitrage

  • VolArb is implemented with derivatives, primarily options
  • The higher the volatility, the higher the option price

Consider European options:

  • For the Black-Scholes formula, the only unobserved input is the volatility $\sigma$, which has to be estimated

  • Implied volatility is the $\sigma$ input that makes the Black-Scholes price equal to the observed market price

    • not estimated from underlying asset dynamics
  • If volatility will increase in the future, beyond what current options prices warrant (implied vol), some possible strategies are:

    • straddles (long at the money call and put)
    • strangles (long out of the money call and put)
    • delta-hedged long call or put
  • Delta-neutral strategies eliminate effects of asset movement

  • Common approach is to describe the evolution of volatility with GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) models

W9: Monte Carlo Simulation

Numerical Option Pricing

3 basic numerical option pricing methods:

  1. Binomial trees
  2. Finite difference (based on Black Scholes PDE)
  3. Monte Carlo simulation (based on SDE for asset prices & risk neutral valuation)

Multivariate Normal Properties

If $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2) \sim N(\boldsymbol\mu, \Sigma)$ with blocks $\boldsymbol\mu = (\boldsymbol\mu_1, \boldsymbol\mu_2)$ and $\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$, then:

Marginals: $\mathbf{X}_1 \sim N(\boldsymbol\mu_1, \Sigma_{11})$

Linear combinations: $A\mathbf{X} + b \sim N(A\boldsymbol\mu + b,\ A\Sigma A')$

Conditionals: $\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim N\left(\boldsymbol\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol\mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$

Notice how the conditional variance does not depend on the observed value $\mathbf{x}_2$

Brownian Motion

Brownian motion forms the building block of continuous-time stochastic models

Recall Ito Processes from STAC70:

A (one-dimensional) Itô process is a stochastic process of the form

$X_t = X_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s$

where $\mu_t$ and $\sigma_t$ are adapted processes such that the integrals are defined. Equivalently, we can write this as

$dX_t = \mu_t\,dt + \sigma_t\,dW_t$

The Brownian motion is an Itô process. (Pick $\mu_t = 0$ and $\sigma_t = 1$.)

The general Brownian motion with (constant) drift $\mu$ (and constant volatility $\sigma$) is an Itô process. (Pick $\mu_t = \mu$ and $\sigma_t = \sigma$.)

Standard Brownian Motion

$W_t$, $t \ge 0$, with the following properties:

  • $W_0 = 0$
  • independent increments
  • $W_t - W_s \sim N(0, t - s)$ for $s \le t$
  • continuous sample paths

Arithmetic Brownian Motion

$X_t = \mu t + \sigma W_t$, with drift $\mu$ and volatility $\sigma$ and the following properties: independent increments, with $X_t - X_s \sim N\big(\mu(t - s),\ \sigma^2(t - s)\big)$

SDE form: $dX_t = \mu\,dt + \sigma\,dW_t$

E.g. For $X_t = \mu t + \sigma W_t$, find the conditional distributions given past and future values

For $s < t$: $X_t \mid X_s = x \sim N\big(x + \mu(t - s),\ \sigma^2(t - s)\big)$

For $s < t$, conditioning on the future value (Brownian Bridge): $X_s \mid X_t = b \sim N\big(\tfrac{s}{t}\,b,\ \sigma^2\,\tfrac{s(t-s)}{t}\big)$ - the drift drops out

Geometric Brownian Motion

Process whose logarithm follows an ABM: $S_t = S_0\,e^{X_t}$, with $\log S_t - \log S_0 \sim N\big((\mu - \sigma^2/2)\,t,\ \sigma^2 t\big)$

SDE form: $dS_t = \mu\,S_t\,dt + \sigma\,S_t\,dW_t$

Risk Neutral Pricing

A risk-neutral (RN) measure or equivalent martingale measure (EMM) is a probability measure under which discounted asset prices are martingales.

Martingale: a stochastic process with the property $E[X_t \mid \mathcal{F}_s] = X_s$ for $s \le t$

Assuming GBM for asset $S_t$ and risk-free interest rate $r$, there exists a probability measure $Q$ such that $dS_t = r\,S_t\,dt + \sigma\,S_t\,dW_t^Q$ (equivalently, $e^{-rt}S_t$ is a $Q$-martingale)

The arbitrage-free price of any European derivative with payoff $h(S_T)$ is given by the discounted expectation w.r.t. the RN measure: $V_0 = e^{-rT}\,E^Q[h(S_T)]$

E.g. Show that under the RN measure, $E^Q[S_T] = S_0\,e^{rT}$. More generally, $E^Q[S_T \mid S_t] = S_t\,e^{r(T - t)}$.

Use the Normal MGF: $E[e^{zX}] = e^{\mu z + \sigma^2 z^2/2}$ for $X \sim N(\mu, \sigma^2)$

If $S_T = S_0\,e^{(r - \sigma^2/2)T + \sigma W_T^Q}$, then $E^Q[S_T] = S_0\,e^{(r - \sigma^2/2)T}\,e^{\sigma^2 T/2} = S_0\,e^{rT}$

E.g. Find the price of a forward contract (no dividends)

We know the forward price $F$ makes the contract worth zero at inception (forward contracts involve no cashflow at t=0)

By risk neutral pricing, $0 = e^{-rT}\,E^Q[S_T - F] \implies F = E^Q[S_T] = S_0\,e^{rT}$

Estimating Expectations

If $E^Q[h(S_T)]$ cannot be calculated exactly, it can be estimated/approximated by simulation:

  • Generate N independent random variates $S_T^{(i)}$ based on the RN measure (i = iterations, not time)
  • By the Law of Large Numbers (SLLN): $\frac{1}{N}\sum_{i=1}^N h(S_T^{(i)}) \to E^Q[h(S_T)]$
  • Moreover, by the Central Limit Theorem (CLT), the estimator is approximately Normal with SD shrinking like $1/\sqrt{N}$

E.g. Show the estimator of $E^Q[h(S_T)]$ is consistent, and build a 95% confidence interval for it

Estimator: $\hat V_0 = e^{-rT}\,\frac{1}{N}\sum_{i=1}^N h(S_T^{(i)})$; consistency follows from the SLLN

Build a 95% confidence interval for the price as well

Confidence interval: $\hat V_0 \pm 1.96\,\frac{s}{\sqrt{N}}$, where $s$ is the sample SD of the discounted payoffs

European Call

Estimate European call price w/ simulation

  • Asset price dynamics (RN measure): $dS_t = r\,S_t\,dt + \sigma\,S_t\,dW_t$
  • Payoff function for strike K & maturity T: $h(S_T) = (S_T - K)^+$

Generate random asset price variates as:

$S_T^{(i)} = S_0\,\exp\left((r - \sigma^2/2)\,T + \sigma\sqrt{T}\,Z_i\right)$

where $Z_i$ is a standard Normal variate
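An R sketch of this estimator with a CLT confidence interval (all market parameters are assumptions):

```r
# Monte Carlo price of a European call under risk-neutral GBM, with 95% CI.
set.seed(1)
S0 <- 100; K <- 105; r <- 0.03; sigma <- 0.2; T <- 1; N <- 1e5
Z  <- rnorm(N)
ST <- S0 * exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * Z)
Y  <- exp(-r * T) * pmax(ST - K, 0)          # discounted payoffs
c(price = mean(Y), ci = mean(Y) + c(-1, 1) * 1.96 * sd(Y) / sqrt(N))
```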

Multiple assets

Payoff of some options depends on prices of multiple assets

E.g. exchange (outperformance) option w/ payoff $h = \big(S_T^{(1)} - S_T^{(2)}\big)^+$

Monte Carlo option pricing requires simulating and averaging multiple asset prices/paths. We cannot simply simulate each asset separately since there could be cross-asset dependence.

Multivariate Brownian Motion

Define the $d$-dimensional standard BM $\mathbf{W}_t = (W_t^{(1)}, \dots, W_t^{(d)})$

to have independent Normal increments: $\mathbf{W}_t - \mathbf{W}_s \sim N(\mathbf{0},\ (t - s)\,I)$

Note: increments are independent over time, but can be dependent across dimensions!

Multivariate ABM

$\mathbf{X}_t = \boldsymbol\mu\,t + A\,\mathbf{W}_t$, w/ SDE $d\mathbf{X}_t = \boldsymbol\mu\,dt + A\,d\mathbf{W}_t$, where $\mathbf{W}_t$ is a

$d$-dim. standard BM. With correlations built in,

$\mathrm{Cov}(\mathbf{X}_t) = \Sigma\,t$, where $\Sigma = AA'$ and $\Sigma_{ij} = \rho_{ij}\,\sigma_i\sigma_j$

Cholesky Factorization

A simple way to generate correlated Normal variates from independent ones

For a positive definite matrix $\Sigma$, the Cholesky decomposition gives

$\Sigma = L\,L'$

It’s essentially the square-root matrix.

If $\mathbf{Z} \sim N(\mathbf{0}, I)$ and $L$ is the Cholesky factor of the covariance matrix $\Sigma$, then

$L\,\mathbf{Z} \sim N(\mathbf{0}, \Sigma)$

Note that $L$ is lower triangular.

E.g. For the 2D case

$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$

the covariance matrix can be decomposed as $\Sigma = LL'$ with

$L = \begin{pmatrix} \sigma_1 & 0 \\ \rho\,\sigma_2 & \sqrt{1 - \rho^2}\,\sigma_2 \end{pmatrix}$, so $X_1 = \sigma_1 Z_1$ and $X_2 = \rho\,\sigma_2 Z_1 + \sqrt{1 - \rho^2}\,\sigma_2 Z_2$

Note that multiplying out $LL'$ recovers $\Sigma$.
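A base-R sketch; note that R's `chol()` returns the upper-triangular factor $U$ with $\Sigma = U'U$, so the lower-triangular $L$ of the notes is `t(chol(Sigma))`:

```r
# Correlated normal variates from independent ones via Cholesky.
Sigma <- matrix(c(1.0, 0.5, 0.5, 2.0), 2, 2)
L <- t(chol(Sigma))                    # lower-triangular factor
set.seed(1)
Z <- matrix(rnorm(2 * 10000), 2)       # independent N(0,1), one column per draw
X <- L %*% Z                           # columns are N(0, Sigma) variates
cov(t(X))                              # close to Sigma
```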

W10: Pricing Exotic Derivatives

Path Dependent Options

Derivatives whose payoffs depend on (aspects of) the entire asset price path, instead of just the final price

Barrier Options

Options that come into existence/get knocked out depending on whether prices hit a barrier. Final payoff is equal to a call/put.

4 types of Barrier options:

Up-and-out (U&O):

  • gets knocked out if the price moves above the barrier
  • max must be below barrier for the option to be worth something

Down-and-out (D&O):

  • gets knocked out if price moves below the barrier
  • min must be above barrier for the option to be worth something

Up-and-in (U&I):

  • comes into existence if price moves above the barrier
  • max must be above barrier for the option to be worth something

Down-and-in (D&I):

  • comes into existence if price moves below the barrier
  • min must be below barrier for the option to be worth something

E.g. When $B \le K$ (barrier below strike), the U&O call is worthless, b/c any path ending above K must have crossed the barrier, so the payoff is never >0

Note that combining “out” and “in” options with the same B, K, T, etc. gives us a vanilla option. E.g. $C_{U\&O} + C_{U\&I} = C_{\text{vanilla}}$

Pricing

Notation:

The prices are:

Simulating GBM paths

To price general path dependent options, we need to simulate asset price paths

In practice, we discretize time and simulate the asset price at m points $0 = t_0 < t_1 < \cdots < t_m = T$:

For GBM, the SDE has the exact step-by-step solution:

$S_{t_{j+1}} = S_{t_j}\,\exp\left((r - \sigma^2/2)\,\Delta t + \sigma\sqrt{\Delta t}\,Z_j\right), \quad \Delta t = t_{j+1} - t_j$

Only the neon purple path in the figure has a non-zero payoff, since it hasn’t been knocked out (exceeded B), and ends above K

Monte Carlo for Barrier Options

MC for barrier options based on simple discretization leads to biased prices!

All knock-out option prices will be overestimated, because the discretized minima/maxima will not be as extreme as the true ones (there may be some time point we did not simulate, during which the barrier could have been crossed, making the option worthless). Similarly, the knock-in option prices will be underestimated.

Bias can be reduced by increasing number of steps (m) in time discretization, but the computation would become increasingly expensive.

Trade-off between # paths (n) & # steps (m):

  • n↑ Var↓ & m↑ Bias↓ (Bias-Variance trade-off)

Reflection principle

If the path of a Wiener process reaches a value $a$ at time $s$, then the subsequent path after time $s$ has the same distribution as its reflection about the value $a$.

Max of standard BM ~ absolute normal

For standard BM $W_t$, the max by time T, $M_T = \max_{t \le T} W_t$, is distributed as a folded normal: $M_T \sim |N(0, T)|$

We want to find the CDF $P(M_T \le m)$

For $m \ge 0$, by the reflection principle:

$P(M_T \ge m) = P(M_T \ge m,\ W_T \ge m) + P(M_T \ge m,\ W_T < m) = 2\,P(W_T \ge m)$

When $m < 0$, we have $P(M_T \ge m) = 1$ (since $M_T \ge W_0 = 0$)

Thus $P(M_T \le m) = 2\Phi(m/\sqrt{T}) - 1 = P(|W_T| \le m)$ (consider the first line vs the last line above).

E.g. Find the probability that standard BM hits barrier $c > 0$ before time $T$

Since $\{\tau_c \le T\} = \{M_T \ge c\}$, we have $P(\tau_c \le T) = 2\left(1 - \Phi(c/\sqrt{T})\right)$

Optimal n/m ratio

MC estimates of the hitting probability using path discretization w/ different n (paths), m (steps)

Best MSE lies in the middle:

  • High variance when n/m is low
  • High bias when n/m is high

E.g. Estimate the probability that standard BM hits 1 before time 1, with MC but without bias.

Generate values of $M_1$ directly by generating $Z \sim N(0, 1)$ and setting $M_1 = |Z|$. Then, estimate the probability by the proportion of $M_1$ values that are $\ge 1$

Below are MC estimates of $P(M_1 \ge 1)$ using direct simulation of $M_1$ w/ increasing N

Extrema of Brownian Motion

  • For standard BM $W_t$, the maximum is distributed as $|N(0, T)|$
  • For arithmetic BM $X_t = \mu t + \sigma W_t$, the distribution of the maximum is difficult to work with - the reflection principle does not work b/c of the drift
  • However, one can easily simulate random deviates of the maximum using the Brownian bridge (Brownian motion with fixed end point)
    • Its construction allows for general treatment of extrema of various processes

Consider the ABM: $X_t = \mu t + \sigma W_t$, $0 \le t \le T$

Conditional on the endpoint $X_T$, the maximum of the Brownian bridge process has a Rayleigh-type distribution:

$P\left(\max_{t \le T} X_t \ge m \,\middle|\, X_T\right) = \exp\left(-\frac{2\,m\,(m - X_T)}{\sigma^2\,T}\right), \quad m \ge \max(0, X_T)$

Note that the distribution of the conditional maximum is independent of the drift, given $X_T$

Simulating maxima of ABM

  1. Generate the endpoint $X_T \sim N(\mu T, \sigma^2 T)$
  2. Generate $U \sim \mathrm{Uniform}(0, 1)$
  3. Calculate $M = \frac{X_T + \sqrt{X_T^2 - 2\,\sigma^2 T\,\log U}}{2}$

For maxima of GBM, exponentiate the ABM result: $\max_{t \le T} S_t = S_0\,e^{M}$

Simulating minima of ABM

By symmetry, the min of an ABM with drift $\mu$ is distributed as minus the max of an ABM with drift $-\mu$
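An R sketch of the three steps above (drift and volatility values are assumptions; inverting the Rayleigh-type survival function gives the formula in step 3):

```r
# Exact simulation of the maximum of an ABM on [0, T] via the Brownian bridge.
set.seed(1)
N <- 1e5; T <- 1; mu <- 0.05; sigma <- 0.2
XT <- rnorm(N, mu * T, sigma * sqrt(T))                   # 1. simulate the endpoint
U  <- runif(N)                                            # 2. uniform for the bridge max
M  <- (XT + sqrt(XT^2 - 2 * sigma^2 * T * log(U))) / 2    # 3. conditional maximum
# For GBM S_t = S0 * exp(X_t), the simulated maximum is S0 * exp(M).
mean(M)
```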

Time Discretization

Path dependent options generally require simulation of entire discretized path. Exceptions are options depending on maximum (e.g. barrier, look-back).

If prices do not follow GBM, it is generally not possible to simulate from exact distribution of asset prices, so we need to approximate sample path distribution over discrete times

Euler Discretization

Consider a general SDE where drift/volatility can depend on time and/or the process:

$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$

There is no general explicit solution for $X_t$, i.e. the distribution of $X_T$ is unknown (in closed form)

To approximate the behaviour of $X_t$:

  • Discretize time: $t_j = j\,\Delta t$, $\Delta t = T/m$
  • Simulate the (approx.) path recursively, using

$X_{t_{j+1}} = X_{t_j} + \mu(t_j, X_{t_j})\,\Delta t + \sigma(t_j, X_{t_j})\,\sqrt{\Delta t}\,Z_j$

To approximate the distribution of $X_T$, generate multiple (n) discretized paths
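A generic R sketch of the Euler recursion, illustrated on an assumed mean-reverting (OU-type) drift and constant volatility:

```r
# Euler discretization of dX = mu(t, X) dt + sigma(t, X) dW.
euler_path <- function(x0, mu, sigma, T = 1, m = 250) {
  dt <- T / m
  x <- numeric(m + 1); x[1] <- x0
  for (j in 1:m) {
    t <- (j - 1) * dt
    x[j + 1] <- x[j] + mu(t, x[j]) * dt + sigma(t, x[j]) * sqrt(dt) * rnorm(1)
  }
  x
}
set.seed(1)
path <- euler_path(0, mu = function(t, x) 5 * (0.04 - x),  # mean reversion to 0.04
                      sigma = function(t, x) 0.1)
```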

W11: Simulation - Variance Reduction Techniques

Antithetic Variables

For each normal variate $Z$, consider its negative $-Z$ (also standard normal). Note that they are dependent. For Uniform(0, 1) variates, use $U$ and $1 - U$

Calculate the discounted payoff under both: $Y = f(Z)$ and $\tilde Y = f(-Z)$

Estimate the price as the mean of the paired RVs: $\hat C_{AV} = \frac{1}{N}\sum_{i=1}^N \frac{f(Z_i) + f(-Z_i)}{2}$

The idea is to balance payoffs of paths with opposite returns.

Pros and cons

This technique is simple, but not always useful. It only helps if the original and antithetic variates are negatively related.

We can prove this by comparing its variance to the variance of the naive mean over $2N$ plain draws. Under what condition does the variance get reduced?

Variance reduction proof

Variance of naive mean: $\mathrm{Var}\left(\frac{1}{2N}\sum_{i=1}^{2N} Y_i\right) = \frac{\sigma^2}{2N}$

Variance of antithetic mean: $\mathrm{Var}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{Y_i + \tilde Y_i}{2}\right) = \frac{\sigma^2(1 + \rho)}{2N}$, where $\rho = \mathrm{Corr}(f(Z), f(-Z))$

For the reduction to hold, we must have $\rho < 0$, i.e. $\mathrm{Cov}(f(Z), f(-Z)) < 0$

Even payoff function => worst case scenario: $f(-Z) = f(Z)$ gives $\rho = 1$ (the antithetic draw adds no information)

Asymptotic distribution of estimator

Find the asymptotic distribution of the antithetic variable estimator in terms of the moments of $(Y, \tilde Y)$

By CLT, we have $\hat C_{AV} \approx N\left(\mu_{AV},\ \sigma_{AV}^2 / N\right)$

where the mean is $\mu_{AV} = E\left[\frac{Y + \tilde Y}{2}\right] = E[Y]$

and the variance is $\sigma_{AV}^2 = \mathrm{Var}\left(\frac{Y + \tilde Y}{2}\right) = \frac{\sigma^2(1 + \rho)}{2}$

Example

Antithetic variable pricing of a European call
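A hedged R sketch of this example (market parameters assumed, as in the earlier MC pricing sketch):

```r
# Antithetic-variable pricing of a European call under GBM.
set.seed(1)
S0 <- 100; K <- 105; r <- 0.03; sigma <- 0.2; T <- 1; N <- 5e4
payoff <- function(Z)
  exp(-r * T) * pmax(S0 * exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * Z) - K, 0)
Z <- rnorm(N)
Y <- (payoff(Z) + payoff(-Z)) / 2      # average each path with its mirror image
c(price = mean(Y), se = sd(Y) / sqrt(N))
```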

Stratification

Split the RV domain into $K$ equiprobable strata, and draw an equal number of variates from within each one

Consider equiprobable Normal strata: stratum $k$ lies between the $\frac{k-1}{K}$ and $\frac{k}{K}$ standard normal quantiles

The stratified estimator of $\mu = E[Y]$ is given by

$\hat\mu_{st} = \frac{1}{K}\sum_{k=1}^{K}\hat\mu_k$

where $\hat\mu_k$ is the (sample mean) estimator within each stratum $k$, and each stratum has probability $1/K$

Mean of estimator

Verify that $\hat\mu_{st}$ is an unbiased estimator of $\mu$:

$E[\hat\mu_{st}] = \frac{1}{K}\sum_{k=1}^{K}E[\hat\mu_k] = \frac{1}{K}\sum_{k=1}^{K}\mu_k = E[Y]$ (law of total expectation over equiprobable strata)

Variance reduction proof

Show that $\mathrm{Var}(\hat\mu_{st}) \le \mathrm{Var}(\hat\mu_{MC})$

By the law of total variance, $\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid S)] + \mathrm{Var}(E[Y \mid S])$, where $S$ is the stratum; stratification eliminates the second (between-strata) term

So we have $\mathrm{Var}(\hat\mu_{st}) \le \mathrm{Var}(\hat\mu_{MC})$ since $\mathrm{Var}(E[Y \mid S]) \ge 0$, which holds by Jensen’s inequality:

for $g(x) = x^2$, a convex function, we have $E\big[(E[Y \mid S])^2\big] \ge \big(E[E[Y \mid S]]\big)^2 = \mu^2$

Pros and cons

This method ensures equal representation of each stratum in the RV’s domain. It always reduces variance.

It works best when target RV (payoff) changes over its domain, i.e. is highly variable (as opposed to a flat payoff).

It is computationally difficult for multidimensional RV’s. Getting the conditional distribution within each stratum can be difficult, and the CDF is often unknown.

Example

Stratified pricing of a European call
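A hedged R sketch: within stratum $k$, a uniform $U$ is mapped to $Z = \Phi^{-1}\big((k - 1 + U)/K\big)$; strata count, draws per stratum, and market parameters are assumptions:

```r
# Stratified pricing of a European call with K equiprobable normal strata.
set.seed(1)
S0 <- 100; K_strike <- 105; r <- 0.03; sigma <- 0.2; T <- 1
K <- 50; n_k <- 1000                        # strata and draws per stratum
payoff <- function(Z)
  exp(-r * T) * pmax(S0 * exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * Z) - K_strike, 0)
mu_k <- sapply(1:K, function(k) mean(payoff(qnorm((k - 1 + runif(n_k)) / K))))
mean(mu_k)                                  # stratified price estimate
```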

Control Variates

Estimate the price $C = E[Y]$ using MC: generate iid $Y_i$ and use $\hat C = \frac{1}{N}\sum_i Y_i$

where $Y$ is the option’s discounted payoff

Assume there is another option with payoff $Y^*$ whose price $C^* = E[Y^*]$ is known. The idea is to use MC with the same variates to estimate both $C$ and $C^*$, but adjust the estimate $\hat C$ to take into account the error of the estimate $\hat C^*$. E.g. if $\hat C^*$ underestimates $C^*$, then increase $\hat C$.

Adjust for the estimation error linearly, as

$\hat C_{CV} = \hat C - b\,(\hat C^* - C^*)$

where the coefficient $b$ controls the adjustment.

Mean of estimator (unbiased proof)

Show that $\hat C_{CV}$ is unbiased for any $b$ (provided $\hat C, \hat C^*$ are unbiased): $E[\hat C_{CV}] = E[\hat C] - b\,(E[\hat C^*] - C^*) = C$

Variance of estimator

$\mathrm{Var}(\hat C_{CV}) = \mathrm{Var}(\hat C) - 2b\,\mathrm{Cov}(\hat C, \hat C^*) + b^2\,\mathrm{Var}(\hat C^*)$

Optimal value of adjustment coefficient

Show that the optimal value of $b$ is $b^* = \frac{\mathrm{Cov}(Y, Y^*)}{\mathrm{Var}(Y^*)}$. This is the regression slope coefficient.

In practice, these moments are unknown, so we estimate $b^*$ using the MC sample

Optimal variance

Show that the optimal variance is

$\mathrm{Var}(\hat C_{CV}) = \mathrm{Var}(\hat C)\,(1 - \rho^2), \quad \rho = \mathrm{Corr}(Y, Y^*)$

In practice, we need to use sample estimates of the variances and correlation

Correlation of control

Good control variates have high absolute correlation with the option payoff (high $|\rho|$). Using the final asset price $S_T$ as the control:

  • In-the-money call: payoff is nearly linear in $S_T$, so $\rho$ is strongly positive
  • Out-of-the-money call: payoff is usually 0, so $|\rho|$ is weaker
  • In-the-money put: $\rho$ is strongly negative
  • Out-of-the-money put: $|\rho|$ is weaker (negative)

Example

Price a European option using the final asset price ($S_T$) as the control, assuming GBM with known $E[S_T] = S_0\,e^{rT}$
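A hedged R sketch of this example (market parameters assumed; `b` is estimated from the same MC sample):

```r
# Control-variate pricing of a European call using S_T as the control.
set.seed(1)
S0 <- 100; K <- 105; r <- 0.03; sigma <- 0.2; T <- 1; N <- 5e4
ST <- S0 * exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * rnorm(N))
Y  <- exp(-r * T) * pmax(ST - K, 0)          # discounted payoff
b  <- cov(Y, ST) / var(ST)                   # estimated optimal coefficient
Y_cv <- Y - b * (ST - S0 * exp(r * T))       # adjust by the control's known mean
c(naive = mean(Y), cv = mean(Y_cv), var_ratio = var(Y_cv) / var(Y))
```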

Importance Sampling

We can reduce variance by changing the distribution (probability measure) from which paths are generated to give more weight to important outcomes, thereby increasing sample efficiency. The performance of this method relies heavily on the equivalent measure being used.

E.g. for a European call, we put more weight on paths with positive payoff (i.e. paths for which we exercise)

Let $f$ be the pdf of $X$; we want to estimate $\mu = E_f[h(X)] = \int h(x)\,f(x)\,dx$

Using simple MC, generate a sample $X_i \sim f$; the estimate is thus $\hat\mu = \frac{1}{N}\sum_i h(X_i)$. If we instead have a sample from a new pdf $g$, we can still estimate $\mu$ as follows:

$\hat\mu_{IS} = \frac{1}{N}\sum_{i=1}^{N} h(X_i)\,\frac{f(X_i)}{g(X_i)}, \quad X_i \sim g$

Mean of estimator

Note that this estimate is unbiased (provided the simple MC estimate is unbiased): $E_g\left[h(X)\tfrac{f(X)}{g(X)}\right] = \int h\,\tfrac{f}{g}\,g\,dx = \mu$

Variance of estimator

$\mathrm{Var}_g(\hat\mu_{IS}) = \frac{1}{N}\left(E_g\left[h^2\,\tfrac{f^2}{g^2}\right] - \mu^2\right)$

Variance reduction proof & condition

Show that $\mathrm{Var}_g(\hat\mu_{IS}) \le \mathrm{Var}_f(\hat\mu)$ iff $\int h^2\,\frac{f^2}{g}\,dx \le \int h^2\,f\,dx$

The LHS is equivalent to $E_g\left[h^2 f^2/g^2\right]$

Optimal variance condition

Show that for positive $h$, the variance is minimized (indeed zero) if

$g(x) \propto h(x)\,f(x)$

i.e. importance sampling works best when the new pdf resembles (payoff × original pdf)

Multiple random variates

Importance sampling can be extended to multiple random variates per path

For example, for a path-dependent option with payoff $h(X_1, \dots, X_m)$, which is a function of the m variates forming the discretized path, the mean of the estimate is

$E_g\left[h(X_1, \dots, X_m)\,\frac{f(X_1, \dots, X_m)}{g(X_1, \dots, X_m)}\right]$

If in addition the variates are independent, $f = \prod_j f_j$ and $g = \prod_j g_j$, then the likelihood ratio is the product of the individual ratios

Example

Consider a deep out-of-the-money European call with $S_0 \ll K$

With simple MC, generate final prices as $S_T^{(i)} = S_0\exp\left((r - \sigma^2/2)T + \sigma\sqrt{T}\,Z_i\right)$; few paths finish in the money

Which is a better candidate for the new pdf $g$: a distribution with a higher mean, or one with a lower mean?

The former, since more of its paths finish ITM. We want to simulate from distributions with higher means (closer to the strike $K$).
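A hedged R sketch implementing this by shifting the mean of $Z$ from 0 to $\theta$ (aiming the median path at the strike is one reasonable heuristic, assumed here):

```r
# Importance sampling for a deep OTM call: shift Z ~ N(0,1) to Z ~ N(theta, 1).
set.seed(1)
S0 <- 100; K <- 160; r <- 0.03; sigma <- 0.2; T <- 1; N <- 5e4
theta <- (log(K / S0) - (r - sigma^2 / 2) * T) / (sigma * sqrt(T))
Z <- rnorm(N, mean = theta)
w <- exp(-theta * Z + theta^2 / 2)     # likelihood ratio phi(z) / phi(z - theta)
ST <- S0 * exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * Z)
Y  <- exp(-r * T) * pmax(ST - K, 0) * w
c(price = mean(Y), se = sd(Y) / sqrt(N))
```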

W12: Optimization in Finance

Most real world problems involve making decisions, often under uncertainty. Making good/optimal decisions typically involves some optimization.

In finance, we must typically decide how to invest over time and across assets.

E.g. mean-variance analysis or Kelly criterion

Types of Optimization Problems

  • Straightforward (closed form or polynomial complexity):
    • Linear, quadratic, convex
    • Equality/linear/convex constraints
  • Difficult:
    • Discrete optimization (discrete variable)
      • E.g. indivisible assets, transaction costs
    • Dynamic optimization (previous decisions affect future ones)
      • Investing over time
    • Stochastic optimization (uncertainty)

Discrete & Dynamic Optimization

Assume you can perfectly foresee the price of a stock. You want to make optimal use of such knowledge, assuming

  • you can only trade integer units of the asset
  • every transaction costs you a fixed amount
  • you cannot short sell the asset

This is a discrete, dynamic optimization problem. Although there is no randomness (we have perfect knowledge), the problem is not trivial.

We could consider all possible strategies, but that would be expensive - the search space grows exponentially with the number of time steps.

We can use dynamic programming (backward induction) instead:

  • At any time t, there are 2 states: owning or not owning the asset
  • The optimal value of each state at t = the best option out of transitioning to another state + the optimal value of that state at t+1
  • Start from the end, and consider optimal value going backwards to discover the best strategy

E.g. Find evolution of value, assuming no position at and

Let $P_t$ = asset price at $t$, $c$ = transaction cost, $V_t^0$ = opt. value with no position at $t$, $V_t^1$ = opt. value with a long position at $t$

| state \ time | t=1 | t=2 | t=3 (= n) | > n |
|---|---|---|---|---|
| no position | 0 | | | |
| long position | / | / | | |
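A minimal R sketch of this backward induction under the simplifying assumption of holding at most one unit (states: flat or long); prices and the transaction cost below are made-up inputs:

```r
# Backward induction for the perfect-foresight trading problem (0/1 holding).
dp_trade <- function(P, tc) {
  n <- length(P)
  V0 <- 0; V1 <- -Inf                    # after the last day: must end flat
  for (t in n:1) {
    v0 <- max(V0, V1 - P[t] - tc)        # stay flat, or buy at P[t]
    v1 <- max(V1, V0 + P[t] - tc)        # stay long, or sell at P[t]
    V0 <- v0; V1 <- v1
  }
  V0                                     # optimal P&L starting flat
}
dp_trade(c(5, 1, 4, 3, 6), tc = 0.5)     # buy at 1, sell at 6: 5 - 2*0.5 = 4
```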

E.g.

Stochastic Optimization

Consider a similar problem without exact price knowledge (e.g. prices follow binomial tree with some probabilities)

We want to find the best trading strategy (which maxes the expected P&L)

Define the following:

  • is the state RV (price and position, e.g. long/short)
  • is action (change in state e.g. buy/sell)
  • is a reward function (e.g. cashflow)

We want to maximize expected reward over stochastic actions

Letting be the optimal value function, we have

E.g. Consider the following Binomial tree, with up/down probability of 1/2:

Find the optimal strategy that maximizes the expected P&L assuming you can long and short the asset, and there is a transaction cost of $0.1/share. Note that there are three possible states now (long, neutral, short), and $3^3 = 27$ possible strategies. Find the optimal strategy and its value using dynamic programming, and optionally verify it with an exhaustive search.

Solution: It is not difficult to reason that the best strategy is the one where the up paths have positive expected P&L and the down paths have negative expected P&L (greater than the transaction costs), and which minimizes the expected costs (you only long/short when you need to). You can actually verify this by calculating the expected P&L of all 27 strategies by brute force (e.g., in R) to get:

| Strategy | Expected P&L |
|---|---|
| s,s,s | −1 − 2tc |
| s,s,n | −1.5 − 2tc |
| s,s,l | −2 − 3tc |
| s,n,s | 0.5 − 2tc |
| s,n,n | 0 − 2tc |
| s,n,l | −0.5 − 3tc |
| s,l,s | 2 − 3tc |
| s,l,n | 1.5 − 3tc |
| s,l,l | 1 − 4tc |
| n,s,s | −1 − 2tc |
| n,s,n | −1.5 − 1tc |
| n,s,l | −2 − 2tc |
| n,n,s | 0.5 − 1tc |
| n,n,n | 0 |
| n,n,l | −0.5 − 1tc |
| n,l,s | 2 − 2tc |
| n,l,n | 1.5 − 1tc |
| n,l,l | 1 − 2tc |
| l,s,s | −1 − 4tc |
| l,s,n | −1.5 − 3tc |
| l,s,l | −2 − 3tc |
| l,n,s | 0.5 − 3tc |
| l,n,n | 0 − 2tc |
| l,n,l | −0.5 − 2tc |
| l,l,s | 2 − 3tc |
| l,l,n | 1.5 − 2tc |
| l,l,l | 1 − 2tc |

But you can drastically reduce the required calculations using backward induction/dynamic programming. Let $V_t(P, \text{pos})$ denote the optimal value at time $t$ for price $P$ and “position” pos.

At the terminal time we have: (close out any open position)

At the middle time and the up node, we have

At the middle time and the down node, we have

At the initial time and the neutral state (the only relevant one at the start) we have