Understanding how markets are interrelated is important for studying lead-lag relationships and the circumstances under which they reverse or break down. Many univariate methods carry over directly to the vector case, but some require attention.
Weak stationarity and cross-correlation matrices
A $k$-dimensional time series $\boldsymbol{r}_t = (r_{1t}, \ldots, r_{kt})^T$ is weakly stationary if its first and second moments are time-invariant: $\boldsymbol{\mu} = E(\boldsymbol{r}_t)$ and $\boldsymbol{\Gamma}_0 = E[(\boldsymbol{r}_t - \boldsymbol{\mu})(\boldsymbol{r}_t - \boldsymbol{\mu})^T]$. Let $\boldsymbol{D}$ be the $k \times k$ diagonal matrix consisting of the standard deviations of the $r_{it}$. The lag-zero cross-correlation matrix is defined as $\boldsymbol{\rho}_0 = \boldsymbol{D}^{-1} \boldsymbol{\Gamma}_0 \boldsymbol{D}^{-1}$, which is simply the correlation matrix. The lag-$l$ cross-covariance matrix of $\boldsymbol{r}_t$ is defined as $\boldsymbol{\Gamma}_l = E[(\boldsymbol{r}_t - \boldsymbol{\mu})(\boldsymbol{r}_{t-l} - \boldsymbol{\mu})^T]$. For a weakly stationary series, $\boldsymbol{\Gamma}_l$ is a function of $l$ only, not of the time index $t$. The lag-$l$ cross-correlation matrix (CCM) is defined as $\boldsymbol{\rho}_l = \boldsymbol{D}^{-1} \boldsymbol{\Gamma}_l \boldsymbol{D}^{-1}$. Elementwise,
$$\rho_{ij}(l) = \frac{\Gamma_{ij}(l)}{\sqrt{\Gamma_{ii}(0)\,\Gamma_{jj}(0)}} = \frac{\mathrm{Cov}(r_{it}, r_{j,t-l})}{\mathrm{std}(r_{it})\,\mathrm{std}(r_{jt})},$$
which is the correlation between $r_{it}$ and $r_{j,t-l}$.
If $\rho_{ij}(l) \neq 0$ for some $l > 0$, we say that the series $r_{jt}$ leads the series $r_{it}$ at lag $l$. Similarly, if $\rho_{ji}(l) \neq 0$ for some $l > 0$, we say that the series $r_{it}$ leads the series $r_{jt}$ at lag $l$. The diagonal element $\rho_{ii}(l)$ is simply the lag-$l$ autocorrelation coefficient of $r_{it}$.
Some remarks (for $l > 0$):
1) In general $\rho_{ij}(l) \neq \rho_{ji}(l)$ for $i \neq j$, because the two measure different lag relationships; consequently $\boldsymbol{\Gamma}_l$ and $\boldsymbol{\rho}_l$ are in general not symmetric.
2) It is easy to see that $\boldsymbol{\Gamma}_l = \boldsymbol{\Gamma}_{-l}^T$ and $\boldsymbol{\rho}_l = \boldsymbol{\rho}_{-l}^T$. Hence it suffices in practice to consider the cross-correlation matrices $\boldsymbol{\rho}_l$ for $l \geq 0$.
For the cross-correlation matrices $\{\boldsymbol{\rho}_l \mid l = 0, 1, \ldots\}$, the diagonal elements $\{\rho_{ii}(l) \mid l = 0, 1, \ldots\}$ form the autocorrelation function of $r_{it}$, the off-diagonal element $\rho_{ij}(0)$ measures the concurrent linear relationship between $r_{it}$ and $r_{jt}$, and for $l > 0$ the off-diagonal element $\rho_{ij}(l)$ measures the linear dependence of $r_{it}$ on the past value $r_{j,t-l}$. Depending on the values in these matrices one can identify:
1) no linear relationship ($\rho_{ij}(l) = \rho_{ji}(l) = 0$ for all $l \geq 0$),
2) concurrent correlation ($\rho_{ij}(0) \neq 0$),
3) no lead-lag relationship ($\rho_{ij}(l) = \rho_{ji}(l) = 0$ for all $l > 0$),
4) a unidirectional relationship ($\rho_{ij}(l) = 0$ for all $l > 0$, but $\rho_{ji}(v) \neq 0$ for some $v > 0$), or
5) a feedback relationship ($\rho_{ij}(l) \neq 0$ for some $l > 0$ and $\rho_{ji}(v) \neq 0$ for some $v > 0$).
Sample cross-correlation matrices can be estimated as $\hat{\boldsymbol{\rho}}_l = \hat{\boldsymbol{D}}^{-1} \hat{\boldsymbol{\Gamma}}_l \hat{\boldsymbol{D}}^{-1}$, where
$$\hat{\boldsymbol{\Gamma}}_l = \frac{1}{T} \sum_{t=l+1}^{T} (\boldsymbol{r}_t - \bar{\boldsymbol{r}})(\boldsymbol{r}_{t-l} - \bar{\boldsymbol{r}})^T, \quad l \geq 0.$$
Bootstrapping can be used to obtain confidence intervals in finite samples.
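As a minimal sketch, the estimator above can be computed directly with numpy (the function name `sample_ccm` and the simulated data are illustrative, not from any library):

```python
import numpy as np

def sample_ccm(r, max_lag):
    """Sample cross-correlation matrices rho_hat_l for l = 0..max_lag.

    r : (T, k) array whose rows are the observations r_t.
    Returns an array of shape (max_lag + 1, k, k).
    """
    T, k = r.shape
    rc = r - r.mean(axis=0)  # subtract the sample mean r_bar
    gamma = np.empty((max_lag + 1, k, k))
    for l in range(max_lag + 1):
        # Gamma_hat_l = (1/T) sum_{t=l+1}^{T} (r_t - r_bar)(r_{t-l} - r_bar)^T
        gamma[l] = rc[l:].T @ rc[:T - l] / T
    d_inv = np.diag(1.0 / np.sqrt(np.diag(gamma[0])))  # D_hat^{-1}
    return np.array([d_inv @ g @ d_inv for g in gamma])

# Example: series 0 leads series 1 by one period
rng = np.random.default_rng(0)
r = rng.standard_normal((500, 2))
r[1:, 1] += 0.3 * r[:-1, 0]
rho = sample_ccm(r, max_lag=2)
print(rho[1])  # element (1, 0) of rho_hat_1 should be clearly nonzero
```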
Multivariate Portmanteau tests: the multivariate Ljung-Box test with statistic $Q_k(m)$ has null hypothesis $H_0: \boldsymbol{\rho}_1 = \cdots = \boldsymbol{\rho}_m = \boldsymbol{0}$ and alternative $H_a: \boldsymbol{\rho}_i \neq \boldsymbol{0}$ for some $i \in \{1, \ldots, m\}$. The test statistic takes the form
$$Q_k(m) = T^2 \sum_{l=1}^{m} \frac{1}{T-l} \mathrm{tr}\!\left(\hat{\boldsymbol{\Gamma}}_l^T \hat{\boldsymbol{\Gamma}}_0^{-1} \hat{\boldsymbol{\Gamma}}_l \hat{\boldsymbol{\Gamma}}_0^{-1}\right)$$
and, under some regularity conditions, asymptotically follows a chi-squared distribution with $k^2 m$ degrees of freedom.
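A hedged sketch of the statistic (the helper name `multivariate_ljung_box` is illustrative; statsmodels ships its own version of this test, but the direct computation is short):

```python
import numpy as np
from scipy import stats

def multivariate_ljung_box(r, m):
    """Return the Q_k(m) statistic and its asymptotic chi-squared p-value."""
    T, k = r.shape
    rc = r - r.mean(axis=0)
    gamma = [rc[l:].T @ rc[:T - l] / T for l in range(m + 1)]
    g0_inv = np.linalg.inv(gamma[0])
    q = 0.0
    for l in range(1, m + 1):
        # (1/(T-l)) * tr(Gamma_l^T Gamma_0^{-1} Gamma_l Gamma_0^{-1})
        q += np.trace(gamma[l].T @ g0_inv @ gamma[l] @ g0_inv) / (T - l)
    q *= T ** 2
    return q, stats.chi2.sf(q, df=k * k * m)

rng = np.random.default_rng(1)
white = rng.standard_normal((500, 3))
print(multivariate_ljung_box(white, m=5))  # white noise: should not reject H0
```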
Vector autoregressive models (VAR)
A multivariate time series $\boldsymbol{r}_t$ is a VAR process of order 1, or VAR(1) for short, if it follows the model $\boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\Phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t$, where $\boldsymbol{\phi}_0$ is a $k$-dimensional vector, $\boldsymbol{\Phi}$ is a $k \times k$ matrix, and $\boldsymbol{a}_t$ is a sequence of uncorrelated random vectors with mean zero and positive-definite covariance matrix $\boldsymbol{\Sigma}$ (generally assumed to be multivariate normal).
A positive-definite matrix is a symmetric matrix whose eigenvalues are all positive; equivalently, for any nonzero vector $\boldsymbol{b}$ we have $\boldsymbol{b}^T \boldsymbol{A} \boldsymbol{b} > 0$. Such matrices can be decomposed as $\boldsymbol{A} = \boldsymbol{P} \boldsymbol{\Lambda} \boldsymbol{P}^T$, where $\boldsymbol{\Lambda}$ is a diagonal matrix consisting of the eigenvalues of $\boldsymbol{A}$ and $\boldsymbol{P}$ is a square matrix whose columns are the corresponding eigenvectors. These eigenvectors are orthogonal to each other, so $\boldsymbol{P}$ is an orthogonal matrix; this decomposition is referred to as the spectral decomposition. For a symmetric matrix $\boldsymbol{A}$, there also exists a lower triangular matrix $\boldsymbol{L}$ with unit diagonal elements and a diagonal matrix $\boldsymbol{G}$ such that $\boldsymbol{A} = \boldsymbol{L} \boldsymbol{G} \boldsymbol{L}^T$. If $\boldsymbol{A}$ is positive definite, the diagonal elements of $\boldsymbol{G}$ are positive, and we can write $\boldsymbol{A} = \boldsymbol{L} \sqrt{\boldsymbol{G}} \sqrt{\boldsymbol{G}} \boldsymbol{L}^T = \boldsymbol{M} \boldsymbol{M}^T$, where $\boldsymbol{M} = \boldsymbol{L} \sqrt{\boldsymbol{G}}$ is a lower triangular matrix. This is called the Cholesky decomposition. Notice that it implies $\boldsymbol{L}^{-1} \boldsymbol{A} (\boldsymbol{L}^{-1})^T = \boldsymbol{G}$.
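These identities are easy to verify numerically; a minimal numpy sketch on an arbitrary positive-definite matrix (values illustrative):

```python
import numpy as np

A = np.array([[4.0, 2.0, 0.6],
              [2.0, 3.0, 0.4],
              [0.6, 0.4, 2.0]])  # symmetric positive definite

# Spectral decomposition: A = P Lambda P^T with P orthogonal
eigvals, P = np.linalg.eigh(A)
assert np.allclose(P @ np.diag(eigvals) @ P.T, A)
assert np.allclose(P.T @ P, np.eye(3))  # orthonormal eigenvectors

# Cholesky: A = M M^T with M lower triangular
M = np.linalg.cholesky(A)
assert np.allclose(M @ M.T, A)

# Recover unit-diagonal L and diagonal G with A = L G L^T
d = np.diag(M)
L = M / d               # divide each column of M by its diagonal element
G = np.diag(d ** 2)
assert np.allclose(L @ G @ L.T, A)

# ... which implies L^{-1} A (L^{-1})^T = G
L_inv = np.linalg.inv(L)
assert np.allclose(L_inv @ A @ L_inv.T, G)
print("all identities hold")
```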
Reduced and structural form: In general the off-diagonal elements of the matrix $\boldsymbol{\Sigma}$ capture the concurrent relationship between $r_{1t}$ and $r_{2t}$, while the matrix $\boldsymbol{\Phi}$ measures the dynamic dependence of $\boldsymbol{r}_t$. The model above is called the reduced-form model because it does not show explicitly the concurrent dependence between the component series. An explicit expression of the concurrent relationship (for the last series, and hence for any series after reordering) can be obtained by a simple linear transformation. Using the Cholesky decomposition (possible because $\boldsymbol{\Sigma}$ is a positive-definite symmetric matrix) we can find a lower triangular matrix $\boldsymbol{L}$ with unit diagonal elements such that $\boldsymbol{\Sigma} = \boldsymbol{L} \boldsymbol{G} \boldsymbol{L}^T$, where $\boldsymbol{G}$ is a diagonal matrix. If we define $\boldsymbol{b}_t = \boldsymbol{L}^{-1} \boldsymbol{a}_t$, then $E(\boldsymbol{b}_t) = \boldsymbol{L}^{-1} E(\boldsymbol{a}_t) = \boldsymbol{0}$ and $\mathrm{Cov}(\boldsymbol{b}_t) = E(\boldsymbol{b}_t \boldsymbol{b}_t^T) = \boldsymbol{L}^{-1} \boldsymbol{\Sigma} (\boldsymbol{L}^T)^{-1} = \boldsymbol{G}$. Since $\boldsymbol{G}$ is a diagonal matrix, the components of $\boldsymbol{b}_t$ are uncorrelated.
Pre-multiplying the reduced form by $\boldsymbol{L}^{-1}$ to uncouple the equations, we get
$$\boldsymbol{L}^{-1} \boldsymbol{r}_t = \boldsymbol{L}^{-1} \boldsymbol{\phi}_0 + \boldsymbol{L}^{-1} \boldsymbol{\Phi} \boldsymbol{r}_{t-1} + \boldsymbol{L}^{-1} \boldsymbol{a}_t = \boldsymbol{\phi}_0^* + \boldsymbol{\Phi}^* \boldsymbol{r}_{t-1} + \boldsymbol{b}_t.$$
The last row of $\boldsymbol{L}^{-1}$ has 1 as its last element; writing it as $(w_{k1}, w_{k2}, \ldots, w_{k,k-1}, 1)$, the structural equation for the last ($k$th) time series becomes
$$r_{kt} + \sum_{i=1}^{k-1} w_{ki} r_{it} = \phi_{k,0}^* + \sum_{i=1}^{k} \Phi_{ki}^* r_{i,t-1} + b_{kt}.$$
This works because the covariance matrix of $\boldsymbol{b}_t$ is diagonal, so the equations are uncoupled. The reduced form is commonly used for two reasons: it is easier to estimate, and concurrent correlations cannot be used in forecasting anyway. A small worked example follows.
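As a concrete sketch, the following computes the structural-form quantities from hypothetical reduced-form VAR(1) parameters (all numbers illustrative):

```python
import numpy as np

# Hypothetical bivariate reduced-form VAR(1) parameters
phi0 = np.array([0.1, 0.2])
Phi = np.array([[0.5, 0.1],
                [0.3, 0.4]])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])  # innovation covariance

# Sigma = L G L^T with L unit lower triangular, G diagonal
M = np.linalg.cholesky(Sigma)
d = np.diag(M)
L = M / d
G = np.diag(d ** 2)

L_inv = np.linalg.inv(L)
phi0_star = L_inv @ phi0  # phi_0^*
Phi_star = L_inv @ Phi    # Phi^*
w = L_inv[-1]             # last row: (w_21, 1) here

# Structural equation for the last series:
# r_2t + w[0] * r_1t = phi0_star[1] + Phi_star[1] . r_{t-1} + b_2t
print("w =", w)
print("Cov(b_t) =", G)    # diagonal, so the innovations are uncoupled
```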
Stationarity condition and moments of a VAR(1) model: Provided the covariance matrix of $\boldsymbol{a}_t$ exists, $\boldsymbol{r}_t$ is weakly stationary if all eigenvalues of $\boldsymbol{\Phi}$ are less than 1 in modulus. Further, we have $\boldsymbol{\Gamma}_l = \boldsymbol{\Phi} \boldsymbol{\Gamma}_{l-1}$ for $l > 0$, where $\boldsymbol{\Gamma}_j$ is the lag-$j$ cross-covariance matrix of $\boldsymbol{r}_t$. By repeated substitution we get $\boldsymbol{\Gamma}_l = \boldsymbol{\Phi}^l \boldsymbol{\Gamma}_0$. Further, for $\boldsymbol{\Upsilon} = \boldsymbol{D}^{-1} \boldsymbol{\Phi} \boldsymbol{D}$ (with $\boldsymbol{D}$ the diagonal matrix of standard deviations defined earlier), we get $\boldsymbol{\rho}_l = \boldsymbol{\Upsilon}^l \boldsymbol{\rho}_0$. A VAR(p) model is generally converted to a VAR(1) model using the companion matrix and then analyzed like a VAR(1) model.
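A short sketch of these moment relations: $\boldsymbol{\Gamma}_0$ satisfies $\boldsymbol{\Gamma}_0 = \boldsymbol{\Phi} \boldsymbol{\Gamma}_0 \boldsymbol{\Phi}^T + \boldsymbol{\Sigma}$, a discrete Lyapunov equation that scipy solves directly (parameter values illustrative):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

Phi = np.array([[0.5, 0.1],
                [0.3, 0.4]])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

# Stationarity: all eigenvalues of Phi strictly inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(Phi)) < 1)

# Gamma_0 = Phi Gamma_0 Phi^T + Sigma
Gamma0 = solve_discrete_lyapunov(Phi, Sigma)

# Gamma_l = Phi^l Gamma_0 for l > 0
Gamma1 = Phi @ Gamma0
Gamma2 = Phi @ Gamma1
assert np.allclose(Gamma2, np.linalg.matrix_power(Phi, 2) @ Gamma0)
print(Gamma0, Gamma1, sep="\n")
```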
To find the order of a VAR model one can generally use the multivariate equivalent of the PACF, with hypothesis tests on the successive residuals. The $i$th equation in this sequence is
$$\boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\Phi}_1 \boldsymbol{r}_{t-1} + \cdots + \boldsymbol{\Phi}_i \boldsymbol{r}_{t-i} + \boldsymbol{a}_t.$$
The parameters of these equations can be estimated by OLS. For the $i$th equation, let the OLS estimates of the coefficients be $\hat{\boldsymbol{\Phi}}_j^{(i)}$ and $\hat{\boldsymbol{\phi}}_0^{(i)}$, where the superscript $(i)$ denotes the VAR(i) model. The residual is then
$$\hat{\boldsymbol{a}}_t^{(i)} = \boldsymbol{r}_t - \hat{\boldsymbol{\phi}}_0^{(i)} - \hat{\boldsymbol{\Phi}}_1^{(i)} \boldsymbol{r}_{t-1} - \cdots - \hat{\boldsymbol{\Phi}}_i^{(i)} \boldsymbol{r}_{t-i}.$$
We then test hypotheses sequentially to identify the order of the VAR model. Comparing the $i$th and $(i-1)$th equations, we test $H_0: \boldsymbol{\Phi}_i = \boldsymbol{0}$ versus $H_a: \boldsymbol{\Phi}_i \neq \boldsymbol{0}$ with the test statistic
$$M(i) = -\left(T - k - i - \tfrac{3}{2}\right) \ln\!\left(\frac{|\hat{\boldsymbol{\Sigma}}_i|}{|\hat{\boldsymbol{\Sigma}}_{i-1}|}\right).$$
Asymptotically, $M(i)$ is distributed as a chi-squared distribution with $k^2$ degrees of freedom, where $k$ is the dimensionality of the asset universe.
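A hedged sketch of the sequential test using statsmodels (the choice of fitting each order on the same effective sample, and the use of the effective sample size in the statistic, are assumptions of this sketch):

```python
import numpy as np
from scipy import stats
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
T, k = 400, 2
Phi = np.array([[0.5, 0.1],
                [0.3, 0.4]])
y = np.zeros((T, k))
for t in range(1, T):  # simulate a bivariate VAR(1): true order is 1
    y[t] = Phi @ y[t - 1] + rng.standard_normal(k)

pmax = 5
Teff = T - pmax
# VAR(0) residual covariance: the plain sample covariance (MLE scaling)
det_prev = np.linalg.det(np.cov(y[pmax:].T, bias=True))
for i in range(1, pmax + 1):
    # Slice so every order uses the same Teff effective observations
    det_i = np.linalg.det(VAR(y[pmax - i:]).fit(i).sigma_u_mle)
    M = -(Teff - k - i - 1.5) * np.log(det_i / det_prev)
    print(f"i={i}  M={M:8.2f}  p={stats.chi2.sf(M, df=k * k):.3f}")
    det_prev = det_i
```

Here only $M(1)$ should come out significant, pointing to a VAR(1).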
Equivalent AIC and BIC criteria can also be employed. OLS or ML is generally used to estimate the parameters, the two methods being asymptotically equivalent. Once a model is fit, the residuals should be tested for model inadequacy using the $Q_k(m)$ statistic (with $k^2 m - g$ degrees of freedom, where $g$ is the number of estimated parameters in the AR coefficient matrices). Forecasting is similar to the univariate case. The impulse response function comes from the MA representation and can be derived to examine the decay rate. Pre-multiplying the MA equation by $\boldsymbol{L}^{-1}$ gives the impulse response function of $\boldsymbol{r}_t$ with respect to the orthogonal innovations $\boldsymbol{b}_t$. A drawback is that a different ordering of the components may lead to different response functions.
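This workflow (order selection, estimation, residual whiteness check, orthogonalized impulse responses) is available in statsmodels; a brief self-contained sketch:

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(3)
T, k = 400, 2
Phi = np.array([[0.5, 0.1],
                [0.3, 0.4]])
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = Phi @ y[t - 1] + rng.standard_normal(k)

model = VAR(y)
print(model.select_order(maxlags=8).summary())  # AIC/BIC/HQIC/FPE table
res = model.fit(maxlags=8, ic='bic')

# Portmanteau-type test for residual whiteness
print(res.test_whiteness(nlags=10).summary())

# Orthogonalized impulse responses (Cholesky-based, ordering-dependent)
irf = res.irf(10)
print(irf.orth_irfs.shape)  # (11, k, k): response matrices at lags 0..10
```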
Vector moving-average models (VMA): A VMA(1) model is given by $\boldsymbol{r}_t = \boldsymbol{\theta}_0 + \boldsymbol{a}_t - \boldsymbol{\Theta} \boldsymbol{a}_{t-1}$, where $\boldsymbol{\theta}_0$ is a $k$-dimensional vector and $\boldsymbol{\Theta}$ is a $k \times k$ matrix. As in the univariate case, the cross-correlations cut off at lag 1 and can be used to identify the order. Estimation of VMA models is considerably more involved; the conditional or exact MLE approaches can be used.
Vector ARMA models (VARMA): Generalizing ARMA from the univariate to the vector case runs into the issue of identifiability: the model may not be uniquely defined, so some constraints need to be imposed (structural specification). These models are rarely used in practice.
Marginal models of components: Given a vector model, the implied models of the individual components are called the marginal models. For a $k$-dimensional ARMA(p,q) model, the marginal models are ARMA(kp, (k-1)p+q) models.
Unit-root nonstationarity and cointegration: When modeling several unit-root nonstationary time series jointly, one may encounter cointegration. This arises when the components share a common trend or unit root; in other words, one can find a linear combination of the series that is stationary. Let $h$ be the number of unit roots (or common trends) in the $k$-dimensional series $\boldsymbol{x}_t$. Cointegration exists if $0 < h < k$, and the quantity $k - h$ is called the number of cointegrating factors; these are the distinct linear combinations that are unit-root stationary. The linear combinations producing these unit-root stationary processes are called the cointegrating vectors. Two cointegrated price series share a common underlying trend, and we lose this information if we take the first difference of each price series: one difference per unit root preserves the useful information, whereas under cointegration there are more nonstationary series than unit roots, so differencing every component over-differences and loses information. Overdifferencing introduces unit roots into the MA matrix polynomial, creating problems with invertibility and estimation. Note also that cointegration may exist only after adjusting for transaction costs and exchange-rate risk, in which case it is artificial.
Error-correction form: To overcome the difficulty of noninvertible VARMA models one can use this form. A VARMA(p,q) model is
$$\boldsymbol{x}_t = \sum_{i=1}^{p} \boldsymbol{\Phi}_i \boldsymbol{x}_{t-i} + \boldsymbol{a}_t - \sum_{j=1}^{q} \boldsymbol{\Theta}_j \boldsymbol{a}_{t-j}.$$
With $\Delta \boldsymbol{x}_t = \boldsymbol{x}_t - \boldsymbol{x}_{t-1}$, subtracting $\boldsymbol{x}_{t-1}$ from both sides of the VARMA equation gives the error-correction form
$$\Delta \boldsymbol{x}_t = \boldsymbol{\alpha} \boldsymbol{\beta}^T \boldsymbol{x}_{t-1} + \sum_{i=1}^{p-1} \boldsymbol{\Phi}_i^* \Delta \boldsymbol{x}_{t-i} + \boldsymbol{a}_t - \sum_{j=1}^{q} \boldsymbol{\Theta}_j \boldsymbol{a}_{t-j},$$
where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are $k \times m$ full-rank matrices, $k$ is the total asset dimension, and $m$ is the number of cointegrating factors ($m < k$). The term $\boldsymbol{\alpha} \boldsymbol{\beta}^T \boldsymbol{x}_{t-1}$ is called the error-correction term, as it compensates for the over-differencing, and $\boldsymbol{\beta}^T \boldsymbol{x}_{t-1}$ is stationary. Also,
$$\boldsymbol{\Phi}_j^* = -\sum_{i=j+1}^{p} \boldsymbol{\Phi}_i, \quad j = 1, \ldots, p-1, \qquad \boldsymbol{\alpha} \boldsymbol{\beta}^T = \boldsymbol{\Phi}_p + \cdots + \boldsymbol{\Phi}_1 - \boldsymbol{I}.$$
The time series $\boldsymbol{\beta}^T \boldsymbol{x}_t$ is unit-root stationary, and the columns of $\boldsymbol{\beta}$ are the cointegrating vectors of $\boldsymbol{x}_t$.
Cointegrated VAR models: To better understand cointegration, we focus on VAR models for their simplicity in estimation. Consider the $k$-dimensional VAR(p) model
$$\boldsymbol{x}_t = \boldsymbol{\mu}_t + \boldsymbol{\Phi}_1 \boldsymbol{x}_{t-1} + \cdots + \boldsymbol{\Phi}_p \boldsymbol{x}_{t-p} + \boldsymbol{a}_t,$$
where the deterministic term is $\boldsymbol{\mu}_t = \boldsymbol{\mu}_0 + \boldsymbol{\mu}_1 t$. Equivalently, using the backshift operator $B$,
$$(\boldsymbol{I} - \boldsymbol{\Phi}_1 B - \cdots - \boldsymbol{\Phi}_p B^p) \boldsymbol{x}_t = \boldsymbol{\mu}_t + \boldsymbol{a}_t.$$
The matrix polynomial above is denoted $\boldsymbol{\Phi}(B)$. For a unit-root nonstationary process, 1 is a root of the characteristic equation, making $|\boldsymbol{\Phi}(1)| = 0$. An error-correction form can be obtained by subtracting $\boldsymbol{x}_{t-1}$ from both sides of the equation, giving
$$\Delta \boldsymbol{x}_t = \boldsymbol{\mu}_t + \boldsymbol{\Pi} \boldsymbol{x}_{t-1} + \boldsymbol{\Phi}_1^* \Delta \boldsymbol{x}_{t-1} + \cdots + \boldsymbol{\Phi}_{p-1}^* \Delta \boldsymbol{x}_{t-p+1} + \boldsymbol{a}_t,$$
where $\boldsymbol{\Pi} = \boldsymbol{\Phi}_1 + \cdots + \boldsymbol{\Phi}_p - \boldsymbol{I} = -\boldsymbol{\Phi}(1)$ and $\boldsymbol{\Phi}_j^* = -\sum_{i=j+1}^{p} \boldsymbol{\Phi}_i$ for $j = 1, \ldots, p-1$. If $\mathrm{Rank}(\boldsymbol{\Pi}) = 0$, then $\boldsymbol{x}_t$ is not cointegrated; if $\mathrm{Rank}(\boldsymbol{\Pi}) = k$, the ECM is not informative and one studies $\boldsymbol{x}_t$ directly. Finally, if $0 < \mathrm{Rank}(\boldsymbol{\Pi}) = m < k$, then one can write $\boldsymbol{\Pi} = \boldsymbol{\alpha} \boldsymbol{\beta}^T$, where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are both of rank $m$. This is the case of cointegration, with $m$ linearly independent cointegrated series $\boldsymbol{w}_t = \boldsymbol{\beta}^T \boldsymbol{x}_t$ and $k - m$ unit roots or common trends.
To obtain the $(k-m) \times 1$ vector of the $k - m$ common trends, $\boldsymbol{y}_t = \boldsymbol{\alpha}_\perp^T \boldsymbol{x}_t$, we compute the $k \times (k-m)$ orthogonal-complement matrix $\boldsymbol{\alpha}_\perp$ satisfying $\boldsymbol{\alpha}_\perp^T \boldsymbol{\alpha} = \boldsymbol{0}$. To uniquely identify $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ we require that $\boldsymbol{\beta}^T = [\boldsymbol{I}_m, \boldsymbol{\beta}_1^T]$, where $\boldsymbol{I}_m$ is the $m \times m$ identity matrix and $\boldsymbol{\beta}_1$ is a $(k-m) \times m$ matrix. A few additional constraints are needed for the process $\boldsymbol{w}_t = \boldsymbol{\beta}^T \boldsymbol{x}_t$ to be unit-root stationary.
The rank of $\boldsymbol{\Pi}$ in the ECM is the number of cointegrating vectors. Thus, to test for cointegration, one can examine the rank of $\boldsymbol{\Pi}$; this is the approach taken in the Johansen test.
Deterministic function: The limiting distributions of cointegration tests depend on the deterministic function $\boldsymbol{\mu}_t$.
1) $\boldsymbol{\mu}_t = \boldsymbol{0}$: all component series of $\boldsymbol{x}_t$ are $I(1)$ without drift, and the stationary series $\boldsymbol{w}_t = \boldsymbol{\beta}^T \boldsymbol{x}_t$ has mean zero.
2) $\boldsymbol{\mu}_t = \boldsymbol{\alpha} \boldsymbol{c}_0$: components of $\boldsymbol{x}_t$ are $I(1)$ without drift, but $\boldsymbol{w}_t$ has nonzero mean $-\boldsymbol{c}_0$; this is the restricted-constant case.
3) $\boldsymbol{\mu}_t = \boldsymbol{\mu}_0$: component series are $I(1)$ with drift $\boldsymbol{\mu}_0$, and $\boldsymbol{w}_t$ may have a nonzero mean.
4) $\boldsymbol{\mu}_t = \boldsymbol{\mu}_0 + \boldsymbol{\alpha} \boldsymbol{c}_1 t$: components of $\boldsymbol{x}_t$ are $I(1)$ with drift $\boldsymbol{\mu}_0$, and $\boldsymbol{w}_t$ has a linear time trend; this is the restricted-trend case.
5) $\boldsymbol{\mu}_t = \boldsymbol{\mu}_0 + \boldsymbol{\mu}_1 t$: both the constant and the trend are unrestricted; the components of $\boldsymbol{x}_t$ are $I(1)$ with a quadratic time trend, and $\boldsymbol{w}_t$ has a linear trend.
Maximum likelihood estimation: Estimation of a cointegrated VAR(p) is quite involved. The differenced data ($\Delta \boldsymbol{x}_t$) and the levels term ($\boldsymbol{x}_{t-1}$) are each first regressed on the deterministic term and the lagged differences, yielding residuals $\boldsymbol{u}_t$ and $\boldsymbol{v}_t$ respectively. An eigenvalue problem built from these residuals then leads to a likelihood which, when maximized, gives the estimates of the coefficients.
Johansen test for cointegration: This is essentially a test of the rank of the matrix $\boldsymbol{\Pi}$, for a specified deterministic term $\boldsymbol{\mu}_t$. The number of nonzero eigenvalues of $\boldsymbol{\Pi}$ can be obtained if a consistent estimate of $\boldsymbol{\Pi}$ is available. From the ECM equation it is clear that $\boldsymbol{\Pi}$ is related to the covariance between $\boldsymbol{x}_{t-1}$ and $\Delta \boldsymbol{x}_t$ after adjusting for the effects of the deterministic term and of $\Delta \boldsymbol{x}_{t-i}$ for $i = 1, \ldots, p-1$. Canonical correlation analysis between these two adjusted quantities yields the squared correlations $\hat{\lambda}_i$. There are two versions of the Johansen test:
1) Trace cointegration test: $H_0: \mathrm{Rank}(\boldsymbol{\Pi}) = m$ versus $H_a: \mathrm{Rank}(\boldsymbol{\Pi}) > m$. The likelihood ratio (LR) statistic is
$$LK_{tr}(m) = -(T - p) \sum_{i=m+1}^{k} \ln(1 - \hat{\lambda}_i).$$
Due to the presence of unit roots, the asymptotic distribution of the statistic is not chi-squared but a function of standard Brownian motions, so the critical values must be obtained via simulation.
2) Sequential test: $H_0: \mathrm{Rank}(\boldsymbol{\Pi}) = m$ versus $H_a: \mathrm{Rank}(\boldsymbol{\Pi}) = m + 1$. The LR test statistic, called the maximum-eigenvalue statistic, is
$$LK_{max}(m) = -(T - p) \ln(1 - \hat{\lambda}_{m+1}).$$
Again, the critical values of the test statistic are nonstandard and must be evaluated via simulation.
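Both versions are implemented in statsmodels as `coint_johansen`; a minimal sketch on a simulated cointegrated pair (all parameter values illustrative):

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(4)
T = 500
trend = np.cumsum(rng.standard_normal(T))        # one common stochastic trend
x1 = trend + 0.5 * rng.standard_normal(T)
x2 = 2.0 * trend + 0.5 * rng.standard_normal(T)  # cointegrated with x1
x = np.column_stack([x1, x2])

# det_order=0: constant term; k_ar_diff: lagged differences in the ECM
res = coint_johansen(x, det_order=0, k_ar_diff=1)
print("trace stats   :", res.lr1)        # LK_tr(m) for m = 0, 1
print("trace 95% cv  :", res.cvt[:, 1])
print("max-eig stats :", res.lr2)        # LK_max(m) for m = 0, 1
print("max-eig 95% cv:", res.cvm[:, 1])
# Expect: reject rank 0, fail to reject rank 1 => one cointegrating vector
```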
Left out sections: 8.7, 8.8