The conditional covariance matrix of $a_t$ given $F_{t-1}$ is a $k\times k$ positive-definite matrix $\Sigma_t$ defined by $\mathrm{Cov}(a_t \mid F_{t-1})$. Multivariate volatility modeling is concerned with the time evolution of $\Sigma_t$; this is referred to as the volatility equation of $r_t$.
Exponentially weighted estimate
The unconditional covariance matrix of the innovations can be estimated by the equally weighted average $\hat{\Sigma} = \frac{1}{t-1}\sum_{j=1}^{t-1} a_j a_j^T$. To allow for a time-varying covariance matrix that puts more weight on recent information, one can use exponential smoothing: $\hat{\Sigma}_t = \frac{1-\lambda}{1-\lambda^{t-1}} \sum_{j=1}^{t-1} \lambda^{j-1} a_{t-j} a_{t-j}^T$, where $0<\lambda<1$. For $t$ sufficiently large that $\lambda^{t-1}\approx 0$, this reduces to the recursion $\hat{\Sigma}_t = (1-\lambda)\, a_{t-1} a_{t-1}^T + \lambda \hat{\Sigma}_{t-1}$. This is the EWMA estimate of the covariance matrix. The model parameters, together with $\lambda$, can be estimated jointly by maximizing the log-likelihood, which can be evaluated recursively. A value of $\lambda$ around 0.94 (about 30 days) commonly turns out to be optimal.
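As a concrete illustration, here is a minimal sketch of that recursion in Python/NumPy. The function name `ewma_covariance`, the default $\lambda = 0.94$, and the use of the sample covariance as the starting value are assumptions made for illustration, not part of the text.

```python
import numpy as np

def ewma_covariance(a, lam=0.94, sigma0=None):
    """Recursive EWMA estimate of the conditional covariance matrix Sigma_t.

    a      : (T, k) array of innovations a_t
    lam    : smoothing parameter lambda, 0 < lambda < 1
    sigma0 : optional (k, k) starting value; defaults to the equally
             weighted sample covariance of the innovations (an assumption)
    """
    T, k = a.shape
    sigma = np.cov(a, rowvar=False) if sigma0 is None else np.asarray(sigma0)
    out = np.empty((T, k, k))
    for t in range(T):
        out[t] = sigma                                  # Sigma_t given F_{t-1}
        # recursion: Sigma_{t+1} = (1 - lambda) a_t a_t' + lambda Sigma_t
        sigma = (1 - lam) * np.outer(a[t], a[t]) + lam * sigma
    return out
```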
Some multivariate GARCH models
- Diagonal vectorization (VEC) model: a generalization of the exponentially weighted moving-average approach in which each element of $\Sigma_t$ follows a GARCH(1,1)-type model (see the sketch after this list). It may not produce a positive-definite covariance matrix, and it does not model the dynamic dependence between volatility series.
- BEKK model: the Baba-Engle-Kraft-Kroner model (1995), formulated to guarantee the positive-definite constraint. It involves many parameters, but it does model the dynamic dependence between the volatility series.
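To make the elementwise structure of the diagonal VEC model concrete, here is a minimal sketch of one diagonal VEC(1,1) update written as a Hadamard (elementwise) product; the parameter matrices `C`, `A`, `B` and the function name `dvec_step` are hypothetical.

```python
import numpy as np

def dvec_step(sigma_prev, a_prev, C, A, B):
    """One diagonal VEC(1,1) update, elementwise over Sigma_t:

        sigma_ij,t = C_ij + A_ij * a_{i,t-1} * a_{j,t-1} + B_ij * sigma_ij,t-1

    C, A, B are symmetric (k, k) parameter matrices (assumed here).  Nothing
    in this elementwise recursion forces Sigma_t to stay positive definite.
    """
    return C + A * np.outer(a_prev, a_prev) + B * sigma_prev
```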
Reparameterization
$\Sigma_t$ is reparameterized by making use of its symmetry.
- Use of correlations - The covariance matrix can be represented in terms of variances and lower-triangle correlations, which can then be modeled jointly (see the variance-correlation sketch following this list). Specifically, we write $\Sigma_t = D_t \rho_t D_t$, where $\rho_t$ is the conditional correlation matrix of $a_t$ and $D_t$ is a $k\times k$ diagonal matrix consisting of the conditional standard deviations of the elements of $a_t$. To model the volatility of $a_t$, it therefore suffices to consider the conditional variances of $a_{it}$ and the correlation coefficients. These form the $k(k+1)/2$-dimensional vector $\Xi_t = (\sigma_{11,t},\ldots,\sigma_{kk,t}, \varrho_t^T)^T$, where $\varrho_t$ is a $k(k-1)/2$-dimensional vector obtained by stacking the columns of the correlation matrix $\rho_t$, using only the elements below the main diagonal, i.e. $\varrho_t = (\rho_{21,t},\ldots,\rho_{k1,t} \mid \rho_{32,t},\ldots,\rho_{k2,t} \mid \ldots \mid \rho_{k,k-1,t})^T$. To illustrate, for $k=2$ we have $\varrho_t = \rho_{21,t}$ and $\Xi_t = (\sigma_{11,t}, \sigma_{22,t}, \rho_{21,t})^T$, a 3-dimensional vector. The approach has two weaknesses: the likelihood function becomes complicated when the dimension is greater than 2, and the approach requires a constrained maximization to ensure positive definiteness.
- Cholesky decomposition - This requires no constrained maximization, and because the transformation orthogonalizes the shocks, the resulting likelihood is extremely simple (see the Cholesky sketch following this list). Because $\Sigma_t$ is positive definite, there exist a lower triangular matrix $L_t$ with unit diagonal elements and a diagonal matrix $G_t$ with positive diagonal elements such that $\Sigma_t = L_t G_t L_t^T$. A feature of the decomposition is that the lower off-diagonal elements of $L_t$ and the diagonal elements of $G_t$ have close connections with linear regression. Using the Cholesky decomposition amounts to an orthogonal transformation from $a_t$ to $b_t$, where $b_{1t} = a_{1t}$ and $b_{it}$, for $1 < i \le k$, is defined recursively by the least-squares regression $a_{it} = q_{i1,t} b_{1t} + q_{i2,t} b_{2t} + \cdots + q_{i(i-1),t} b_{(i-1)t} + b_{it}$, where $q_{ij,t}$ is the $(i,j)$th element of the lower triangular matrix $L_t$ for $1 \le j < i$. We can write this transformation as $a_t = L_t b_t$, where $L_t$ is the lower triangular matrix with unit diagonal elements, and the covariance matrix of $b_t$ is $G_t$. The parameter vector relevant to volatility modeling under this transformation becomes $\Xi_t = (g_{11,t},\ldots,g_{kk,t}, q_{21,t}, q_{31,t}, q_{32,t},\ldots,q_{k1,t},\ldots,q_{k(k-1),t})^T$, which is also a $k(k+1)/2$-dimensional vector. The likelihood function simplifies drastically. The transformation has several advantages. First, $\Sigma_t$ can be kept positive definite simply by modeling $\ln(g_{ii,t})$. Second, the elements of $\Xi_t$ are simply the coefficients and residual variances of multiple linear regressions that orthogonalize the shocks to the returns. Third, the correlation coefficient between $a_{1t}$ and $a_{2t}$, which is simply $q_{21,t}\sqrt{\sigma_{11,t}}/\sqrt{\sigma_{22,t}}$, is time-varying. Finally, $\sigma_{ij,t} = \sum_{v=1}^{j} q_{iv,t} q_{jv,t} g_{vv,t}$ for $j \le i$.
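A minimal sketch of the variance-correlation reparameterization, mapping $\Sigma_t$ to the vector $\Xi_t$ of variances and below-diagonal correlations and back; the function names `to_xi` and `from_xi` are hypothetical.

```python
import numpy as np

def to_xi(sigma):
    """Map Sigma_t to Xi_t = (sigma_11,...,sigma_kk, varrho_t')'."""
    k = sigma.shape[0]
    d = np.sqrt(np.diag(sigma))             # conditional standard deviations
    rho = sigma / np.outer(d, d)            # conditional correlation matrix
    r, c = np.triu_indices(k, k=1)          # column-by-column order below the
    varrho = rho[c, r]                      # diagonal: rho_21,...,rho_k1, rho_32,...
    return np.concatenate([np.diag(sigma), varrho])

def from_xi(xi, k):
    """Rebuild Sigma_t = D_t rho_t D_t from Xi_t."""
    var, varrho = xi[:k], xi[k:]
    rho = np.eye(k)
    r, c = np.triu_indices(k, k=1)
    rho[c, r] = rho[r, c] = varrho
    d = np.sqrt(var)
    return np.outer(d, d) * rho
```

For the $k=2$ example in the text, `to_xi` returns exactly $(\sigma_{11,t}, \sigma_{22,t}, \rho_{21,t})^T$.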
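And a minimal numerical sketch of the Cholesky reparameterization; the helper name `ldl_decompose` and the toy $3\times 3$ covariance matrix are assumptions for illustration. It recovers $L_t$ and $G_t$ from the standard Cholesky factor and checks both that $\Sigma_t = L_t G_t L_t^T$ and that the transformed shocks $b_t = L_t^{-1} a_t$ have (approximately) diagonal covariance $G_t$.

```python
import numpy as np

def ldl_decompose(sigma):
    """Sigma = L G L' with L unit lower triangular, G diagonal and positive.

    Derived from the standard Cholesky factor P (Sigma = P P') by rescaling
    its columns: L = P diag(P)^{-1}, G = diag(P)^2.
    """
    P = np.linalg.cholesky(sigma)
    d = np.diag(P)
    L = P / d                          # unit diagonal; q_ij below the diagonal
    G = np.diag(d ** 2)                # positive diagonal elements g_ii
    return L, G

# toy 3x3 conditional covariance matrix (values assumed for illustration)
sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])
L, G = ldl_decompose(sigma)
assert np.allclose(L @ G @ L.T, sigma)      # sigma_ij = sum_v q_iv q_jv g_vv

# a_t = L_t b_t, so b_t = L_t^{-1} a_t should have diagonal covariance G_t
a = np.random.default_rng(0).multivariate_normal(np.zeros(3), sigma, size=100_000)
b = np.linalg.solve(L, a.T).T
print(np.round(np.cov(b, rowvar=False), 2))  # approximately diag(G)
```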