Friday, September 4, 2015

Tsay Ch10 - Multivariate Volatility Models and Their Applications

This chapter generalizes the univariate volatility models of Chapter 3 to the multivariate case and simplifies the dynamic relationships between the volatility processes of multiple asset returns, addressing the curse of dimensionality and time-varying correlations. Consider a multivariate return series $\pmb{r}_t$ given by $$\pmb{r}_t=\pmb{\mu}_t+\pmb{a}_t,$$ where $\pmb{\mu}_t=E(\pmb{r}_t|F_{t-1})$ is the conditional expectation of $\pmb{r}_t$ given the past information and $\pmb{a}_t$ is the shock at time $t$. The mean equation of $\pmb{r}_t$ can be modeled as a multivariate time series process (Ch 8), e.g. a simple VARMA process $$\pmb{\mu}_t=\pmb{\Gamma}\pmb{x}_t+\sum_{i=1}^{p}\pmb{\Phi}_i\pmb{r}_{t-i}-\sum_{i=1}^{q}\pmb{\Theta}_i\pmb{a}_{t-i},$$ where $\pmb{x}_t$ denotes the $m$-dimensional vector of exogenous (explanatory) variables with $x_{1t}=1$, $\pmb{\Gamma}$ is a $k\times m$ matrix, and $p$ and $q$ are non-negative integers.
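As a minimal sketch of the mean equation, here is a bivariate VAR(1) with the intercept as the only exogenous variable (all coefficient values are made up for illustration, not taken from the text):

```python
import numpy as np

# Sketch of mu_t = Gamma x_t + Phi_1 r_{t-1} for k = 2, m = 1, p = 1, q = 0.
Gamma = np.array([[0.01], [0.02]])      # k x m, with x_t = (1) as the intercept
Phi1 = np.array([[0.2, 0.1],
                 [0.0, 0.3]])           # k x k AR coefficient matrix
r_prev = np.array([0.015, -0.007])      # r_{t-1}

mu_t = Gamma @ np.array([1.0]) + Phi1 @ r_prev
a_t = np.array([0.004, 0.001])          # shock at time t (given)
r_t = mu_t + a_t                        # r_t = mu_t + a_t
print(mu_t, r_t)
```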

The conditional covariance matrix of $\pmb{a}_t$ given $F_{t-1}$ is a $k\times k$ positive-definite matrix $\pmb{\Sigma}_t=Cov(\pmb{a}_t|F_{t-1})$. Multivariate volatility modeling is concerned with the time evolution of $\pmb{\Sigma}_t$, which is referred to as the volatility equation of $\pmb{r}_t$.

Exponentially weighted estimate

An equally weighted estimate of the unconditional covariance matrix of the innovations is $$\hat{\Sigma}=\frac{1}{t-1}\sum_{j=1}^{t-1}a_j a_j^T.$$ To allow for a time-varying covariance matrix that emphasizes recent information, one can use exponential smoothing: $$\hat{\Sigma}_t=\frac{1-\lambda}{1-\lambda^{t-1}}\sum_{j=1}^{t-1}\lambda^{j-1}a_{t-j}a_{t-j}^T,$$ where $0<\lambda<1.$ For sufficiently large $t$ such that $\lambda^{t-1}\approx 0,$ the equation becomes $$\hat{\Sigma}_t=(1-\lambda)a_{t-1}a_{t-1}^T+\lambda \hat{\Sigma}_{t-1}.$$ This is called the EWMA estimate of the covariance matrix. The model parameters and $\lambda$ can be estimated jointly by maximizing the log-likelihood, which can be evaluated recursively. A $\lambda$ of 0.94 (30 days) commonly comes out as optimal.
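A minimal sketch of the recursive EWMA update, assuming the shocks $a_t$ are already available as an array (initializing with the sample covariance is one common choice, not the only one):

```python
import numpy as np

# Sketch: EWMA covariance recursion
# Sigma_t = (1 - lam) * a_{t-1} a_{t-1}^T + lam * Sigma_{t-1}
def ewma_cov(innovations, lam=0.94):
    """innovations: (T, k) array of shocks a_t; returns the final Sigma_hat."""
    sigma = np.cov(innovations.T, bias=True)   # initialize with sample covariance
    for a in innovations:
        sigma = (1 - lam) * np.outer(a, a) + lam * sigma
    return sigma

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 2)) * 0.01       # simulated shocks, purely illustrative
print(ewma_cov(a))
```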

Some multivariate GARCH models

  1. Diagonal vectorization model (VEC): a generalization of the exponentially weighted moving-average approach in which each element of $\Sigma_t$ follows a GARCH(1,1)-type model (see the sketch after this list). It may not produce a positive-definite covariance matrix, and it does not model the dynamic dependence between volatility series. 
  2. BEKK model: the Baba-Engle-Kraft-Kroner model (Engle and Kroner, 1995), designed to guarantee the positive-definite constraint. It has too many parameters, but it does model the dynamic dependence between the volatility series. 
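As a sketch of the diagonal VEC idea from item 1 (all parameter values below are illustrative, nothing is estimated from data), each element of $\Sigma_t$ follows its own GARCH(1,1)-type recursion; note that nothing in the update enforces positive definiteness:

```python
import numpy as np

# Diagonal VEC(1,1) sketch: element-wise GARCH(1,1)-type recursions,
#   sigma_ij,t = A0_ij + A1_ij * a_{i,t-1} a_{j,t-1} + B1_ij * sigma_ij,t-1
A0 = np.array([[1e-5, 5e-6], [5e-6, 2e-5]])
A1 = np.full((2, 2), 0.05)                 # element-wise ARCH weights
B1 = np.full((2, 2), 0.90)                 # element-wise GARCH weights

def dvec_step(sigma_prev, a_prev):
    # Hadamard (element-wise) products keep each series a GARCH(1,1)-type
    # model, but positive definiteness of the result is not guaranteed.
    return A0 + A1 * np.outer(a_prev, a_prev) + B1 * sigma_prev

sigma = A0 / (1 - 0.95)                    # rough unconditional starting value
sigma = dvec_step(sigma, np.array([0.01, -0.02]))
print(sigma, np.all(np.linalg.eigvalsh(sigma) > 0))
```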

Reparameterization

$\Sigma_{t}$ is reparameterized by making use of its symmetry.
  1. Use of correlations - The covariance matrix can be represented in terms of variances and lower-triangle correlations, which can be modeled jointly. Specifically, we write $\Sigma_t$ as $D_t\rho_tD_t$, where $\rho_t$ is the conditional correlation matrix of $a_t$, and $D_t$ is a $k \times k$ diagonal matrix consisting of the conditional standard deviations of the elements of $a_t$. To model the volatility of $a_t$, it suffices to consider the conditional variances and correlation coefficients of $a_{it}$, i.e. the $k(k+1)/2$-dimensional vector $\Xi_t= (\sigma_{11,t},...,\sigma_{kk,t}, \varrho_t^T)^T$, where $\varrho_t$ is a $k(k-1)/2$-dimensional vector obtained by stacking the columns of the correlation matrix $\rho_t$, using only the elements below the main diagonal, i.e. $\varrho_t=(\rho_{21,t},...,\rho_{k1,t}|\rho_{32,t},...,\rho_{k2,t}|...|\rho_{k,k-1,t})^T$. To illustrate, for $k=2$ we have $\varrho_t=\rho_{21,t}$ and $\Xi_t=(\sigma_{11,t},\sigma_{22,t},\rho_{21,t})^T$, a 3-dimensional vector. The approach has two weaknesses: the likelihood function becomes complicated when the dimension exceeds 2, and the approach requires constrained maximization to ensure positive definiteness. 
  2. Cholesky decomposition - This requires no constrained maximization. It is an orthogonal transformation, so the resulting likelihood is extremely simple. Because $\Sigma_t$ is positive definite, there exists a lower triangular matrix $L_t$ with unit diagonal elements and a diagonal matrix $G_t$ with positive diagonal elements such that $\Sigma_t=L_tG_tL_t^T.$ A feature of the decomposition is that the lower off-diagonal elements of $L_t$ and the diagonal elements of $G_t$ have close connections with linear regression. Using the Cholesky decomposition amounts to an orthogonal transformation from $a_t$ to $b_t$, where $b_{1t}=a_{1t}$ and $b_{it}$, for $1<i \le k$, is defined recursively by the least-squares regression $a_{it}=q_{i1,t}b_{1t}+q_{i2,t}b_{2t}+...+q_{i(i-1),t}b_{(i-1)t}+b_{it}$, where $q_{ij,t}$ is the $(i,j)$th element of the lower triangular matrix $L_t$ for $1\le j <i$. We can write this transformation as $a_t=L_tb_t$, where $L_t$ is the lower triangular matrix with unit diagonal elements, and the covariance matrix of $b_t$ is $G_t$. The parameter vector relevant to volatility modeling under this transformation is $\Xi_t=(g_{11,t},...,g_{kk,t},q_{21,t},q_{31,t},q_{32,t},...,q_{k1,t},...,q_{k(k-1),t})^T$, which is also $k(k+1)/2$-dimensional, and the likelihood function simplifies drastically. This transformation has several advantages. First, $\Sigma_t$ can be kept positive definite simply by modeling $\ln(g_{ii,t})$. Second, the elements of $\Xi_t$ are simply the coefficients and residual variances of the multiple linear regressions that orthogonalize the shocks to the returns. Third, the correlation coefficient between $a_{1t}$ and $a_{2t}$, which is $q_{21,t}\sqrt{\sigma_{11,t}}/\sqrt{\sigma_{22,t}}$, is time-varying. Finally, $\sigma_{ij,t}=\sum_{v=1}^{j}q_{iv,t}q_{jv,t}g_{vv,t}$ for $i\ge j$, with $q_{vv,t}=1$ (see the sketch below).
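A numerical sketch of both reparameterizations for a given positive-definite $\Sigma_t$ (the matrix below is made up; NumPy's standard Cholesky factor is rescaled to obtain the unit-diagonal $L_t$ and diagonal $G_t$):

```python
import numpy as np

# A made-up positive-definite Sigma_t for k = 3.
sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 2.1],
                  [0.8, 2.1, 16.0]])

# (1) Sigma_t = D_t rho_t D_t: variances plus lower-triangle correlations.
d = np.sqrt(np.diag(sigma))
rho = sigma / np.outer(d, d)
low = np.tril_indices(3, k=-1)                         # (2,1), (3,1), (3,2) order
xi_corr = np.concatenate([np.diag(sigma), rho[low]])   # (sigma_11,...,sigma_kk, varrho^T)^T

# (2) Sigma_t = L_t G_t L_t^T: unit-diagonal L_t and diagonal G_t, recovered
# by rescaling the standard Cholesky factor C (Sigma = C C^T).
C = np.linalg.cholesky(sigma)
L = C / np.diag(C)                # divide each column by its diagonal entry
G = np.diag(np.diag(C) ** 2)
assert np.allclose(L @ G @ L.T, sigma)

# Correlation between a_1t and a_2t implied by the decomposition:
# q_21 * sqrt(sigma_11 / sigma_22) equals rho_21.
print(L[1, 0] * np.sqrt(sigma[0, 0] / sigma[1, 1]), rho[1, 0])
```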

GARCH models for bivariate returns

Thursday, September 3, 2015

Multivariate Normal distribution

This background is needed in what follows, especially if you want to understand the Kalman filter.

A $k$-dimensional random vector $\pmb{x}=(x_1,...,x_k)^T$ follows a multivariate normal distribution with mean $\pmb{\mu}=(\mu_1,...,\mu_k)^T$ and positive-definite covariance matrix $\pmb{\Sigma}=[\sigma_{ij}]$ if its probability density function is $$f(x|\mu, \Sigma)=\frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}.$$ This is denoted by $x\sim N_k(\mu,\Sigma).$ An $m\times m$ square matrix $A$ is positive definite if $A$ is symmetric and all eigenvalues of $A$ are positive; equivalently, $A$ is positive definite if $b^TAb>0$ for any nonzero $m$-dimensional vector $b$. A positive-definite matrix $A$ can be decomposed as $A=P\Lambda P^T,$ where $\Lambda$ is the diagonal matrix of eigenvalues of $A$ and $P$ is the $m\times m$ matrix whose columns are the corresponding right eigenvectors of $A$; $P$ can be taken to be orthogonal (and is automatically so when the eigenvalues are distinct).
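A quick numerical check of the spectral decomposition (made-up matrix; `np.linalg.eigh` returns an orthonormal $P$ for symmetric input):

```python
import numpy as np

# Check A = P Lambda P^T for a symmetric positive-definite matrix.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
eigvals, P = np.linalg.eigh(A)
Lam = np.diag(eigvals)
print(np.all(eigvals > 0))                 # all eigenvalues positive
print(np.allclose(P @ Lam @ P.T, A))       # A = P Lambda P^T
print(np.allclose(P.T @ P, np.eye(2)))     # P is orthogonal
```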

For a symmetric matrix $A$, there exists a lower triangular matrix $L$ with diagonal elements equal to 1 and a diagonal matrix $G$ such that $A=LGL^T$. If $A$ is positive definite, then the diagonal elements of $G$ are positive, and we can write $A=(L\sqrt{G})(L\sqrt{G})^T$, where $L\sqrt{G}$ is again a lower triangular matrix. Such a decomposition is called the Cholesky decomposition of $A$. It shows that a positive-definite matrix $A$ can be diagonalized as $L^{-1}A(L^T)^{-1}=L^{-1}A(L^{-1})^T=G.$
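A small check that the $LGL^T$ form and the usual Cholesky factor agree, assuming SciPy is available for its `ldl` routine:

```python
import numpy as np
from scipy.linalg import ldl

# For positive-definite A, L*sqrt(G) equals NumPy's lower Cholesky factor,
# and L^{-1} A (L^{-1})^T recovers the diagonal matrix G.
A = np.array([[4.0, 2.0], [2.0, 3.0]])
L, G, _ = ldl(A, lower=True)               # A = L G L^T with unit-diagonal L
print(np.allclose(L @ np.sqrt(G), np.linalg.cholesky(A)))
print(np.allclose(np.linalg.inv(L) @ A @ np.linalg.inv(L).T, G))
```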

Let $c=(c_1,...,c_k)^T$ be a nonzero vector, and partition $x$ as $x=[x_1^T,x_2^T]^T$, where $x_1$ is of dimension $p$ and $x_2$ of dimension $k-p$, so that $$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{bmatrix} \right).$$ Some properties of $x$ are:

  1. $c^Tx \sim N\left( c^T\mu, c^T\Sigma c \right)$, i.e. any nonzero linear combination of $x$ is univariate normal; the converse also holds.
  2. The marginal distribution of $x_i$ is normal: $x_i \sim N \left( \mu_i, \Sigma_{ii}\right)$ for $i=1,2$.
  3. $\Sigma_{12}=0$ if and only if $x_1$ and $x_2$ are independent.
  4. The variable $(x-\mu)^T\Sigma^{-1}(x-\mu)$ follows a chi-squared distribution with $k$ degrees of freedom.
  5. The conditional distribution of $x_1$ given $x_2=b$ is also normally distributed as $$(x_1|x_2=b)\sim N \left( \mu_1+\Sigma_{12}\Sigma_{22}^{-1}(b-\mu_2), \Sigma_{11}-\Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right).$$
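A worked instance of property 5 with made-up numbers (here $p=1$ and $k-p=1$, so everything is scalar but kept in matrix form):

```python
import numpy as np

# Conditional mean and covariance of x1 given x2 = b (property 5).
mu1, mu2 = np.array([0.0]), np.array([1.0])
S11 = np.array([[2.0]])
S12 = np.array([[0.8]])
S22 = np.array([[1.5]])
b = np.array([2.0])

S22_inv = np.linalg.inv(S22)
cond_mean = mu1 + S12 @ S22_inv @ (b - mu2)     # 0.8/1.5 * (2 - 1) = 0.533...
cond_cov = S11 - S12 @ S22_inv @ S12.T          # 2 - 0.8^2/1.5 = 1.573...
print(cond_mean, cond_cov)
```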
Suppose that $x$, $y$, and $z$ are three random vectors such that their joint distribution is multivariate normal. In addition, assume that the diagonal block covariance matrix $\Sigma_{ww}$ is nonsingular for $w=x,y,z$, and that $\Sigma_{yz}=0$. Then:

  1. $(x|y) \sim N \left( \mu_x+\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y), \Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right)$
  2. $(x|y,z) \sim N\left( E(x|y)+\Sigma_{xz}\Sigma_{zz}^{-1}(z-\mu_z), Var(x|y)-\Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}\right)$
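These two identities are essentially the Kalman filter's measurement update. A sketch with made-up scalar blocks, conditioning on $y$ and then on $z$ under $\Sigma_{yz}=0$ (the helper `gauss_condition` is purely illustrative):

```python
import numpy as np

# When Sigma_yz = 0, conditioning on y and then on z just adds a second
# correction of the same regression form (properties 1 and 2 above).
def gauss_condition(S_xw, S_ww, w, mu_w):
    """Mean shift and covariance reduction of x after observing w."""
    K = S_xw @ np.linalg.inv(S_ww)          # regression (gain) matrix
    return K @ (w - mu_w), K @ S_xw.T

mu_x, S_xx = np.array([0.0]), np.array([[1.0]])
mu_y, S_xy, S_yy, y = np.array([0.0]), np.array([[0.5]]), np.array([[1.0]]), np.array([0.4])
mu_z, S_xz, S_zz, z = np.array([0.0]), np.array([[0.3]]), np.array([[2.0]]), np.array([-1.0])

dm1, dc1 = gauss_condition(S_xy, S_yy, y, mu_y)   # condition on y (property 1)
dm2, dc2 = gauss_condition(S_xz, S_zz, z, mu_z)   # then on z (property 2)
print(mu_x + dm1 + dm2, S_xx - dc1 - dc2)         # E(x|y,z), Var(x|y,z)
```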