Sunday, August 16, 2015

Tsay Ch 11 - State-Space Models and Kalman Filter

Local trend model

For a univariate time series $y_t = \mu_t + \epsilon_t$ with $\mu_{t+1} = \mu_t + \eta_t$, both error terms are assumed to be normally distributed with distinct variances $\sigma_e^2$ and $\sigma_\eta^2$ respectively. Notice the first equation is the observed version of the second (trend) equation with added measurement noise. This model can be used to analyze realized volatility of an asset price if $\mu_t$ is taken to be the log volatility (which is not directly observable) and $y_t$ the logarithm of realized volatility (which is observable, constructed from high-frequency transaction data contaminated by microstructure noise).

If there is no measurement error in the first equation ($\sigma_e = 0$) this becomes an ARIMA(0,1,0) model. With the measurement error it is an ARIMA(0,1,1) model, which is also the simple exponential smoothing model. The form is $(1-B)y_t = (1-\theta B)a_t$, where $\theta$ and $\sigma_a^2$ are related to $\sigma_e^2$ and $\sigma_\eta^2$ as follows: $(1+\theta^2)\sigma_a^2 = 2\sigma_e^2 + \sigma_\eta^2$ and $\theta\sigma_a^2 = \sigma_e^2$. The quadratic equation for $\theta$ gives two solutions, of which the one with $|\theta| < 1$ is chosen. The reverse mapping is also possible for positive $\theta$. Both representations have pros and cons; the objective of the data analysis, substantive issues and experience decide which to use.
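As a quick sanity check of this mapping, here is a small Python sketch (the variance values are illustrative assumptions, not from the text) that solves the quadratic for $\theta$ and recovers $\sigma_a^2$ from given $\sigma_e^2$ and $\sigma_\eta^2$:

```python
import numpy as np

# Assumed (illustrative) local trend model variances
sigma2_e = 0.5    # measurement noise variance
sigma2_eta = 0.2  # trend innovation variance

# From theta*sigma_a^2 = sigma_e^2 and (1+theta^2)*sigma_a^2 = 2*sigma_e^2 + sigma_eta^2:
# theta^2 - (2 + q)*theta + 1 = 0, where q = sigma_eta^2 / sigma_e^2
q = sigma2_eta / sigma2_e
roots = np.roots([1.0, -(2.0 + q), 1.0])
theta = roots[np.abs(roots) < 1][0]     # pick the invertible solution
sigma2_a = sigma2_e / theta

print(theta, sigma2_a)
# Verify the two moment conditions
assert np.isclose((1 + theta**2) * sigma2_a, 2 * sigma2_e + sigma2_eta)
assert np.isclose(theta * sigma2_a, sigma2_e)
```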

Statistical Inference

Three types (using reading handwritten note example)
  1. Filtering - recover the state variable $\mu_t$ given $F_t$ to remove the measurement errors from the data (figuring out the word you are reading based on knowledge accumulated from the beginning of the note).
  2. Prediction - forecast $\mu_{t+h}$ or $y_{t+h}$ for $h > 0$ given $F_t$, where $t$ is the forecast origin (guessing the next word).
  3. Smoothing - estimate $\mu_t$ given $F_T$, where $T > t$ (deciphering a particular word once you have read through the whole note).

The Kalman Filter

Let $\mu_{t|j} = E(\mu_t|F_j)$ and $\Sigma_{t|j} = \mathrm{Var}(\mu_t|F_j)$ be, respectively, the conditional mean and variance of $\mu_t$ given the information set $F_j$. Similarly, $y_{t|j}$ denotes the conditional mean of $y_t$ given $F_j$. Furthermore, let $v_t = y_t - y_{t|t-1}$ and $V_t = \mathrm{Var}(v_t|F_{t-1})$ be the 1-step-ahead forecast error of $y_t$ given $F_{t-1}$ and its variance. Note that $\mathrm{Var}(v_t|F_{t-1}) = \mathrm{Var}(v_t)$, since the forecast error $v_t$ is independent of $F_{t-1}$. Further, $y_{t|t-1} = \mu_{t|t-1}$, giving $v_t = y_t - \mu_{t|t-1}$ and $V_t = \Sigma_{t|t-1} + \sigma_e^2$. Also, $E(v_t) = 0$ and $\mathrm{Cov}(v_t, y_j) = 0$ for $j < t$. The information set $F_t \equiv \{F_{t-1}, y_t\} \equiv \{F_{t-1}, v_t\}$, hence $\mu_{t|t} = E(\mu_t|F_{t-1}, v_t)$ and $\Sigma_{t|t} = \mathrm{Var}(\mu_t|F_{t-1}, v_t)$.

One can show that $\mathrm{Cov}(\mu_t, v_t|F_{t-1}) = \Sigma_{t|t-1}$, giving
$$\begin{bmatrix}\mu_t \\ v_t\end{bmatrix} \Bigg| F_{t-1} \sim N\left(\begin{bmatrix}\mu_{t|t-1} \\ 0\end{bmatrix}, \begin{bmatrix}\Sigma_{t|t-1} & \Sigma_{t|t-1} \\ \Sigma_{t|t-1} & V_t\end{bmatrix}\right).$$
Applying the multivariate normal theorem we get $\mu_{t|t} = \mu_{t|t-1} + (\Sigma_{t|t-1}/V_t)v_t = \mu_{t|t-1} + K_t v_t$ and $\Sigma_{t|t} = \Sigma_{t|t-1} - \Sigma_{t|t-1}V_t^{-1}\Sigma_{t|t-1} = \Sigma_{t|t-1}(1 - K_t)$, where $K_t = \Sigma_{t|t-1}/V_t$ is referred to as the Kalman gain, which is the regression coefficient of $\mu_t$ on $v_t$, governing the contribution of the new shock $v_t$ to the state variable $\mu_t$. To predict $\mu_{t+1}$ given $F_t$ we have $\mu_{t+1}|F_t \sim N(\mu_{t|t}, \Sigma_{t|t} + \sigma_\eta^2)$. Once the new data point $y_{t+1}$ is observed, the above procedure can be repeated (once $\sigma_e$ and $\sigma_\eta$ are estimated, generally by the maximum likelihood method). This is the famous Kalman filter algorithm (Kalman, 1960). The choice of priors $\mu_{1|0}$ and $\Sigma_{1|0}$ requires some attention.
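A minimal sketch of these recursions for the local trend model, in Python on simulated data (the parameter values, series length, and the diffuse-prior initialization are illustrative assumptions, not prescriptions from the text):

```python
import numpy as np

def local_trend_kalman_filter(y, sigma2_e, sigma2_eta, mu10=0.0, Sigma10=1e6):
    """Kalman filter for y_t = mu_t + e_t, mu_{t+1} = mu_t + eta_t.
    A diffuse prior (large Sigma10) is one common initialization choice."""
    n = len(y)
    mu_pred, Sigma_pred = mu10, Sigma10          # mu_{t|t-1}, Sigma_{t|t-1}
    mu_filt = np.empty(n)
    for t in range(n):
        v = y[t] - mu_pred                       # 1-step-ahead forecast error
        V = Sigma_pred + sigma2_e                # its variance
        K = Sigma_pred / V                       # Kalman gain
        mu_t = mu_pred + K * v                   # mu_{t|t}
        Sigma_t = Sigma_pred * (1.0 - K)         # Sigma_{t|t}
        mu_filt[t] = mu_t
        mu_pred = mu_t                           # mu_{t+1|t}
        Sigma_pred = Sigma_t + sigma2_eta        # Sigma_{t+1|t}
    return mu_filt

# Simulate and filter (illustrative parameters)
rng = np.random.default_rng(0)
n, s2e, s2eta = 200, 0.5, 0.1
mu = np.cumsum(rng.normal(0, np.sqrt(s2eta), n))
y = mu + rng.normal(0, np.sqrt(s2e), n)
mu_hat = local_trend_kalman_filter(y, s2e, s2eta)
print(np.corrcoef(mu, mu_hat)[0, 1])            # filtered state tracks the true trend
```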

Properties of forecast error -
State error recursion -
State smoothing -
Missing Values -
Effect of Initialization -
Estimation - 

Regression assumptions

Everybody in finance knows that 90% of quant work is 'REGRESSION', and mostly LINEAR. The results of a linear regression are only as good as our understanding of its assumptions. For the univariate case we write $y_t = \alpha + \beta x_t + \epsilon_t$, where the estimation is straightforward. The more interesting case is multivariate regression, where we write $Y_t = \beta X_t + \epsilon_t$. To estimate the parameters we use the normal equations to get $\hat{\beta} = (X^TX)^{-1}X^TY$ (a small numerical sketch follows the list below). Now, how good an estimate is this? We want these estimates to be:
unbiased - The expected value of the estimate is the true value.
consistent - With more observations the distribution of the estimate becomes more concentrated near the true value.
efficient - Fewer observations are required to establish the true value at a given confidence level.
asymptotically normal - With a lot of observations the distribution of the estimate approaches a normal distribution.
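A small numerical sketch of the normal-equation estimate mentioned above (synthetic data; the true coefficients are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # should be close to beta_true
```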

OLS is consistent when the regressors are exogenous and there is no perfect multicollinearity, and it is optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, OLS provides minimum-variance, mean-unbiased estimates when the errors have finite variances. Assuming the errors are normally distributed, OLS coincides with the MLE. OLS can also be extended to recursive/adaptive estimators, the Kalman filter being one example.

The 'random design' paradigm treats the regressors xi as random and sampled together with yi from some population. The 'fixed design' paradigm treats X as known constants and y is sampled conditionally on the values of X as in an experiment. Practically, the distinction is unimportant and results in the same formula for estimation. 

Assumptions

  1. OLS minimizes the error in the dependent variable y only and hence assumes there is no error in x.
  2. The functional dependence being modeled is valid.
  3. Strict exogeneity - The errors in the regression have conditional mean zero: $E[\epsilon|X] = 0$, which implies that the errors have mean zero, $E[\epsilon] = 0$, and that the regressors are uncorrelated with the errors, $E[X^T\epsilon] = 0$. If this does not hold the OLS estimates are invalid; in that case use the method of instrumental variables.
  4. No linear dependence - The regressors in X must be linearly independent, i.e. X must have full rank almost surely. Sometimes we also assume that the regressors have finite moments up to second order, in which case the matrix $X^TX$ is finite and positive semi-definite. If violated, the regressors are called perfectly multicollinear and $\beta$ cannot be estimated, though prediction of y is still possible.
  5. Spherical errors - It is assumed that $\mathrm{Var}[\epsilon|X] = \sigma^2 I_n$. If violated, OLS estimates are still valid but no longer efficient. If the error terms do not have the same variance, i.e. they are not homoscedastic, weighted least squares (WLS) is used. If there is autocorrelation between the error terms, generalized least squares (GLS) is used.
  6. Normality - It is sometimes additionally assumed that the errors are normally distributed. This is not required. Under this assumption OLS is equivalent to MLE and is asymptotically efficient in the class of all regular estimators.
Some degree of correlation between the observations is very common, in which case OLS and WLS are inefficient. GLS is the right thing to do: $Y = X\beta + \epsilon$ with $E[\epsilon|X] = 0$ and $\mathrm{Var}[\epsilon|X] = \Omega$. GLS estimates $\beta$ by minimizing the squared Mahalanobis length of the residual vector, giving $\hat{\beta} = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}Y$. The GLS estimator is unbiased, consistent, efficient and asymptotically normal. It is equivalent to applying OLS to a linearly transformed version of the data, which standardizes and de-correlates the errors. WLS is a special case of GLS.
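A minimal sketch of that equivalence, assuming a known error covariance $\Omega$ (an AR(1)-type covariance chosen purely for illustration): whitening the data with the Cholesky factor of $\Omega$ and running OLS reproduces the closed-form GLS estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.5])

# Illustrative AR(1)-type error covariance Omega with rho = 0.6
rho = 0.6
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Omega)
y = X @ beta_true + L @ rng.normal(size=n)

# Closed-form GLS
Oi = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)

# OLS on whitened data: pre-multiply by L^{-1}, where Omega = L L'
Linv = np.linalg.inv(L)
Xw, yw = Linv @ X, Linv @ y
beta_ols_whitened = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

print(beta_gls, beta_ols_whitened)   # the two estimates agree
```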

To compute GLS in practice we use Feasible Generalized Least Squares (FGLS), in two steps:
1) The model is estimated by OLS (consistent but inefficient), and the residuals are used to build a consistent estimator of the error covariance matrix;
2) The estimated covariance matrix is then plugged into the GLS formula.

FGLS is preferred only for large sample sizes. For small sample sizes it is better to stick to OLS; FGLS is not always consistent in small samples.
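A minimal two-step FGLS sketch under an assumed heteroscedastic (diagonal) error covariance; the variance model here (error variance depending on a regressor) is an illustrative assumption, not a general recipe:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
# Heteroscedastic errors: variance grows with x^2 (illustrative assumption)
y = X @ beta_true + rng.normal(scale=np.sqrt(0.5 + x**2))

# Step 1: OLS, then model the error variance from the squared residuals
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_ols
gamma = np.linalg.solve(X.T @ X, X.T @ np.log(resid**2))  # crude log-variance model
var_hat = np.exp(X @ gamma)

# Step 2: GLS with the estimated diagonal covariance (i.e. weighted least squares)
W = 1.0 / var_hat
beta_fgls = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
print(beta_ols, beta_fgls)
```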

Saturday, August 15, 2015

Tsay Ch9 - Principal Component Analysis and Factor Models

Dimension reduction is essential to search for the underlying structure of the assets - called factors.

Three types of factor models -
1) Macroeconomic factor models - GDP growth, interest rates, inflation, unemployment - observable factors using regression
2) Fundamental factor models - firm size, book and market values, industrial classification.
3) Statistical factor models - non-observable or latent variables, e.g. PCA

General Factor Model

For $m$ factors, $k$ assets, and $T$ time periods, let $r_{it}$ be the return of asset $i$ in time period $t$. The factor model is
$$r_t = \alpha + \beta f_t + \epsilon_t, \quad t = 1, \ldots, T$$
where $\beta$ is a $k \times m$ factor loading matrix and $\epsilon_t$ is the error vector with $\mathrm{Cov}(\epsilon_t) = D = \mathrm{diag}[\sigma_1^2, \ldots, \sigma_k^2]$, a $k \times k$ diagonal matrix. The covariance matrix of the returns $r_t$ is then given by:
$$\mathrm{Cov}(r_t) = \beta \Sigma_f \beta^T + D$$

Macroeconomic factor models

Macroeconomic factors are observable. We can cast the general factor model as a multiple linear regression and estimate the factor loadings. This estimation does not impose the constraint that the $\epsilon_{it}$ be uncorrelated, so it may not be efficient in general. The best known single factor model is the market model (Sharpe 1970). The $R^2$ can reach up to 50%, showing the significance of the common market factor. One simple trick to compare a factor-based covariance matrix with the sample covariance matrix is to use the global minimum variance portfolio (GMVP). For a given covariance matrix $\Sigma$, the GMVP $\omega$ solves $\min_\omega \sigma_p^2 = \omega^T\Sigma\omega$ subject to $\omega^T\mathbf{1} = 1$, and is given by
$$\omega = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}}.$$
It is also important to verify that the residual covariance matrices do not have large off-diagonal elements, to fit the factor model criteria.
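A minimal sketch of the GMVP weights (the toy 3-asset covariance numbers are purely illustrative); applying the same function to a factor-based and to a sample covariance matrix gives the comparison mentioned above:

```python
import numpy as np

def gmvp_weights(Sigma):
    """Global minimum variance portfolio: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)
    return w / w.sum()

# Toy 3-asset covariance matrix (illustrative numbers)
Sigma_sample = np.array([[0.04, 0.01, 0.00],
                         [0.01, 0.09, 0.02],
                         [0.00, 0.02, 0.16]])
w = gmvp_weights(Sigma_sample)
print(w, w @ Sigma_sample @ w)   # weights and the resulting portfolio variance
```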

Ross (1986) considers a multi-factor model consisting of unexpected changes or surprises (e.g. residuals after fitting a VAR(3) model to seasonally adjusted CPI and unemployment growth numbers). The explanatory power is low.

Fundamental factor models

The BARRA factor method treats the observed asset-specific fundamentals as the factor betas $\beta_i$, and estimates the factor realization $f_t$ at each time index $t$ via regression. Fama and French instead construct their factors from hedge portfolios that depend on the fundamentals. The BARRA factor model is $\tilde{r}_t = \beta f_t + \epsilon_t$, where $\tilde{r}_t$ is the mean-corrected return vector. Since the regression errors are not homogeneous, we need a WLS setup; the estimate is $\hat{f}_t = (\beta^T D^{-1}\beta)^{-1}\beta^T D^{-1}\tilde{r}_t$. We estimate the diagonal covariance matrix of the errors from OLS first and then use it to estimate the factors with the WLS equation. Cross-correlations in the errors are ignored. The diagonal covariance matrix of the final errors $\hat{D}$ and the covariance matrix of the estimated factor realizations $\hat{\Sigma}_f$ can be used to derive the covariance matrix of the original returns as $\mathrm{Cov}(r_t) = \beta\hat{\Sigma}_f\beta^T + \hat{D}$. In practice, the sample means of returns are often not significantly different from zero, so one may not need to remove the sample mean before fitting the BARRA factor model.
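A minimal sketch of this two-pass BARRA-style estimation on simulated data; the dimensions, variable names (`beta` for the observed fundamentals, `R` for the T x k matrix of mean-corrected returns) and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
k, m, T = 50, 3, 120                     # illustrative: 50 assets, 3 fundamentals, 120 periods
beta = rng.normal(size=(k, m))           # observed fundamentals used as factor loadings
f_true = rng.normal(scale=0.1, size=(T, m))
d_true = rng.uniform(0.01, 0.05, size=k)
R = f_true @ beta.T + rng.normal(size=(T, k)) * np.sqrt(d_true)   # mean-corrected returns

# Pass 1: OLS factor estimates per period; residual variances over time give D_hat
f_ols = R @ beta @ np.linalg.inv(beta.T @ beta)          # T x m
resid = R - f_ols @ beta.T
d_hat = resid.var(axis=0)                                # diagonal of D_hat

# Pass 2: WLS per period, f_hat_t = (beta' D^{-1} beta)^{-1} beta' D^{-1} r_t
Dinv_beta = beta / d_hat[:, None]                        # D^{-1} beta
f_wls = R @ Dinv_beta @ np.linalg.inv(beta.T @ Dinv_beta)

# Factor-based covariance of returns: beta Sigma_f beta' + D_hat
Sigma_f = np.cov(f_wls, rowvar=False)
Cov_r = beta @ Sigma_f @ beta.T + np.diag(d_hat)
print(f_wls.shape, Cov_r.shape)
```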

The Fama-French approach uses a two-step procedure to build its three factors (market excess return, small vs. big cap, value vs. growth). First, they sort the assets on the relevant fundamental and form a hedge portfolio which is long the top quintile and short the bottom quintile; the observed return on this hedge portfolio is the factor realization. Finally, given the factor realizations, the betas of each asset are calculated using time-series regression.

Principal component analysis

We look for linear combinations which explain the most variance and are orthogonal to each other, with each weight vector normalized to unit length. This is done on the covariance or correlation matrix, which is non-negative definite and hence has a spectral decomposition. For the covariance matrix the proportion of variance explained by the $i$-th component is $\lambda_i / \sum_j \lambda_j$, which becomes $\lambda_i / k$ for the correlation matrix, since $\mathrm{tr}(\rho) = k$.
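A minimal PCA sketch on a return correlation matrix via eigendecomposition (random data used as a stand-in for real returns):

```python
import numpy as np

rng = np.random.default_rng(5)
R = rng.normal(size=(1000, 5))                   # placeholder for a T x k return matrix
corr = np.corrcoef(R, rowvar=False)              # k x k correlation matrix

eigval, eigvec = np.linalg.eigh(corr)            # spectral decomposition (ascending order)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # sort descending

explained = eigval / eigval.sum()                # = eigval / k for a correlation matrix
print(explained.cumsum())                        # cumulative variance explained
print(np.linalg.norm(eigvec[:, 0]))              # each loading vector has unit length
```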

Statistical factor analysis

The aim is to identify a few factors that can account for most of the variation in the covariance or correlation matrix of the data. The assumption of no serial correlations is all right for low-frequency data but not accurate at higher frequencies; serial correlations should first be removed parametrically. We then construct an orthogonal factor model. Since both the loadings and the factors are unobservable, it differs from the other factor models. For the statistical factor model $r_t - \mu = \beta f_t + \epsilon_t$, we have the assumptions $E[f_t] = 0$, $\mathrm{Cov}[f_t] = I_m$, $E[\epsilon_t] = 0$, $\mathrm{Cov}[\epsilon_t] = D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2)$ and $E[f_t\epsilon_t^T] = 0$. The loadings and factors are not uniquely determined. The model can be estimated either by the principal component method or by maximum likelihood, with the number of common factors specified. Factor rotation (e.g. the varimax criterion) can be used to aid interpretation.
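A minimal sketch of the principal component method of factor extraction on a correlation matrix, with the number of factors fixed in advance (random data used as a placeholder for serially uncorrelated returns):

```python
import numpy as np

rng = np.random.default_rng(6)
R = rng.normal(size=(1000, 6))                 # placeholder for a T x k return matrix
corr = np.corrcoef(R, rowvar=False)
m = 2                                          # chosen number of common factors

eigval, eigvec = np.linalg.eigh(corr)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]

# Principal component method: loadings are sqrt(lambda_i) * e_i for the top m components
beta_hat = eigvec[:, :m] * np.sqrt(eigval[:m])
# Specific variances: whatever the common factors do not explain on the diagonal
D_hat = np.diag(np.diag(corr - beta_hat @ beta_hat.T))

# Fitted correlation matrix implied by the m-factor model
corr_fit = beta_hat @ beta_hat.T + D_hat
print(np.abs(corr - corr_fit).max())           # size of the residual error
```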

Left out sections: 9.6