I was going to leave out this chapter, but maybe not. I need to see what regime switching is all about.
Remember that a purely stochastic time series $x_t$ is linear if it can be decomposed into a Wold form. Anything else is nonlinear. A fully general nonlinear function is not usable in practice because it has too many parameters. We can restrict the model to the form $$x_t=g(F_{t-1})+\sqrt{h(F_{t-1})}\epsilon_t,$$ where $F_{t-1}$ is the information available at time $t-1$, $g(\cdot)$ is the conditional mean, $h(\cdot)>0$ is the conditional variance, and $\epsilon_t=a_t/\sigma_t$ is the standardized innovation. ARMA is a linear model for the mean, while ARCH/GARCH models are nonlinear models of the variance (because they model the square of volatility and not volatility). For a stationary volatility series, the shocks are uncorrelated but dependent. The models in this chapter extend the nonlinearity to the mean.
The following four nonlinear models let the conditional mean $\mu_t$ evolve over time according to some simple parametric nonlinear function: bilinear, threshold autoregressive, state dependent, and Markov switching. More recent nonlinear models are more data driven: nonlinear state-space, functional coefficient autoregressive, nonlinear additive autoregressive, and multivariate adaptive regression spline. Finally, nonparametric methods like kernel regression and artificial neural networks have also been used. Test statistics, both parametric (based on Lagrange multiplier or likelihood) and nonparametric (based on higher order spectra or the correlation dimension), are also discussed.
Nonlinear models: Some basic ones for financial time series.
1) Bilinear Model: The linear model is simply the first-order Taylor series expansion of the general nonlinear function $f(\cdot)$. A natural extension to nonlinearity is therefore to use the second-order terms of the expansion to improve the approximation. Hence the model is
$$x_t=c+\sum_{i=1}^p\phi_ix_{t-i}-\sum_{j=1}^q\theta_j a_{t-j}+\sum_{i=1}^m\sum_{j=1}^s\beta_{ij}x_{t-i}a_{t-j}+a_t.$$
This is generally analyzed by putting the model in state-space form. ARCH models generally seem to fit the data better when quadratic terms of the innovations are involved.
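To get a feel for what the cross term does, here is a minimal Python sketch that simulates the simplest case $p=q=m=s=1$ of the bilinear model. The function name `simulate_bilinear`, the default parameter values, the zero start-up values, and the Gaussian innovations are all assumptions made for illustration.

```python
import random

def simulate_bilinear(T, c=0.0, phi=0.5, theta=0.0, beta=0.3, sigma=1.0, seed=0):
    """Simulate x_t = c + phi*x_{t-1} - theta*a_{t-1} + beta*x_{t-1}*a_{t-1} + a_t."""
    rng = random.Random(seed)      # local generator so runs are reproducible
    x_prev, a_prev = 0.0, 0.0      # start-up values (assumed zero)
    xs = []
    for _ in range(T):
        a = rng.gauss(0.0, sigma)  # innovation a_t (Gaussian is an assumption)
        x = c + phi * x_prev - theta * a_prev + beta * x_prev * a_prev + a
        xs.append(x)
        x_prev, a_prev = x, a
    return xs
```

Setting `beta=0` and `theta=0` collapses the recursion to a plain AR(1), which is a handy sanity check on the code.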
2) Threshold Autoregressive Model (TAR): Motivated by the asymmetry between declining and rising patterns, this model is piecewise linear (not in time, but in the threshold space). It can be used to model regimes, e.g. here is a two-regime model:
\[
x_t=
\begin{cases}
\phi_1^{(1)}x_{t-1}+a_t, & \text{if } x_{t-l}<\delta\\
\phi_1^{(2)}x_{t-1}+a_t, & \text{if } x_{t-l}\ge \delta
\end{cases}
\]
Here $l$ is the delay parameter and $\delta$ is the threshold, which bifurcates the two regimes. There are several interesting characteristics of TAR models:
a) The process $x_t$ is geometrically ergodic and stationary under the conditions $\phi_1^{(1)}<1$, $\phi_1^{(2)}<1$, and $\phi_1^{(1)}\phi_1^{(2)}<1$. A process is ergodic if its statistical properties (such as mean and variance) can be deduced from a single, sufficiently long sample (realization) of the process. Ergodic theorem: the sample mean $\bar{x}=(\sum_{t=1}^Tx_t)/T$ of $x_t$ converges to the mean of $x_t$. This is regarded as the counterpart of the law of large numbers for the iid case.
b) The series exhibits an asymmetric increasing and decreasing pattern due to the different coefficients in the two regimes, so the regimes generally contain different numbers of observations. The series is hence not time-reversible.
c) The model in the example has no constant terms, but $E(x_t)$ is not zero. $E(x_t)$ is a weighted average of the conditional means of the two regimes, which are nonzero. The weight of each regime is simply the probability that $x_t$ is in that regime under its stationary distribution. Hence, for a TAR model to have zero mean, nonzero constant terms are needed in some regimes, unlike a stationary linear model, where a nonzero constant implies a nonzero mean for $x_t$.
Properties of TAR models are hard to obtain, but estimation is not difficult. US unemployment can be modeled using TAR models, where the regime changes after a sudden increase in unemployment that triggers monetary intervention. TAR models are also used to model asymmetric responses of volatility to positive and negative returns, and to study arbitrage trading between index futures and cash prices.
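The nonzero mean of a TAR model without constant terms is easy to check by simulation. This is a minimal sketch, assuming Gaussian innovations and illustrative values $\phi_1^{(1)}=-1.5$, $\phi_1^{(2)}=0.5$, $\delta=0$, $l=1$. These values, the function name `simulate_tar`, and the burn-in length are assumptions, chosen so that the three stationarity conditions hold. With these coefficients both regimes push $x_t$ upward on average, so the sample mean comes out positive.

```python
import random

def simulate_tar(T, phi_low=-1.5, phi_high=0.5, delta=0.0, lag=1,
                 sigma=1.0, seed=0, burn=200):
    """Two-regime TAR(1): the coefficient depends on whether x_{t-lag} < delta."""
    rng = random.Random(seed)
    xs = [0.0] * lag                      # start-up values (assumed zero)
    for _ in range(T + burn):
        # xs[-lag] is x_{t-lag}, the threshold variable; xs[-1] is x_{t-1}
        phi = phi_low if xs[-lag] < delta else phi_high
        xs.append(phi * xs[-1] + rng.gauss(0.0, sigma))
    return xs[lag + burn:]                # drop start-up values and burn-in

xs = simulate_tar(20000)
print(sum(xs) / len(xs))                      # sample mean, positive for these coefficients
print(sum(1 for v in xs if v < 0) / len(xs))  # share of observations below the threshold
```

The second printed number shows the unequal split of observations between the two regimes.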
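3) Smooth transition AR model (STAR): The conditional mean equation of a TAR model is discontinuous at the threshold. Replacing the hard switch with a smooth transition function (logistic, exponential, or a cumulative distribution function) makes it continuous and differentiable. The transition parameters are hard to estimate and tend to have large standard errors.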
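The notes do not write out the STAR equation, so the following is only a sketch of one common two-regime form with a logistic transition, $x_t=c_0+\phi_0x_{t-1}+F\left(\frac{x_{t-d}-\Delta}{s}\right)(c_1+\phi_1x_{t-1})+a_t$. The AR(1) order, the function names, and the parameter names (`location` for $\Delta$, `scale` for $s$) are assumptions for illustration.

```python
import math

def logistic_transition(z, location=0.0, scale=1.0):
    """Smooth transition weight F in (0, 1), written to avoid overflow in exp."""
    u = (z - location) / scale
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def star_conditional_mean(x_lag1, x_delay, c0, phi0, c1, phi1, location, scale):
    """Conditional mean of a two-regime STAR(1) with logistic transition."""
    F = logistic_transition(x_delay, location, scale)
    # F near 0 gives the base regime; F near 1 adds the second set of terms
    return c0 + phi0 * x_lag1 + F * (c1 + phi1 * x_lag1)
```

As `scale` shrinks toward zero the logistic weight approaches a step function at `location`, so the conditional mean approaches the hard switch of a TAR model.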
4) Markov switching autoregressive model (MSA): The switch between regimes is governed by transition probabilities, which emphasizes aperiodic transitions between states; see Hamilton (1989).
\[
x_t=
\begin{cases}
c_1+\sum_{i=1}^p\phi_{1,i}x_{t-i}+a_{1t}, & \text{if } s_{t}=1\\
c_2+\sum_{i=1}^p\phi_{2,i}x_{t-i}+a_{2t}, & \text{if } s_{t}=2
\end{cases}
\]
where the state $s_t\in\{1,2\}$ follows a first-order Markov chain with transition probability matrix
\[
\begin{bmatrix}
1-w_1 & w_1 \\
w_2 & 1-w_2
\end{bmatrix}
\]
Here $w_i$ is the probability of leaving state $i$ in one step, so the expected duration of a stay in state $i$ is $1/w_i$. The TAR model uses a deterministic transition scheme while MSA uses a stochastic one. Hence, in MSA one is never certain which state $x_t$ belongs to. This has an important implication: MSA forecasts are always a linear combination of the forecasts from the two states, while a TAR model picks a particular state. These models are estimated using EM or MCMC algorithms. The model can be further generalized by letting the transition probabilities be logistic, probit, or some other function of explanatory variables available at time $t-1$. Markov switching can hence be used to choose among many models. Fitting an MSA model to US quarterly GNP gives the following observations:
a) The states correspond to growth and contraction.
b) The large standard deviations of the state 2 parameter estimates reflect that there are relatively few observations in the contraction state.
c) It is more likely for US GNP to get out of a contraction period than to jump into one.
d) The expected duration of an expansion is 11 quarters, while that of a contraction is 4 quarters.
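The duration and forecasting statements can be made concrete with a small sketch. The values $w_1=1/11$ and $w_2=1/4$ used in the comments are assumptions chosen only to reproduce the rounded durations quoted in (d), not fitted estimates. The function names and the AR(1) order in `one_step_forecast` are also hypothetical.

```python
import random

def expected_durations(w1, w2):
    """Expected number of periods spent in state 1 and state 2 per visit."""
    return 1.0 / w1, 1.0 / w2

def stationary_probs(w1, w2):
    """Long-run share of time in each state for P = [[1-w1, w1], [w2, 1-w2]]."""
    return w2 / (w1 + w2), w1 / (w1 + w2)

def simulate_states(T, w1, w2, seed=0):
    """Simulate the two-state chain, starting in state 1."""
    rng = random.Random(seed)
    s, states = 1, []
    for _ in range(T):
        states.append(s)
        leave = w1 if s == 1 else w2      # probability of leaving the current state
        if rng.random() < leave:
            s = 2 if s == 1 else 1
    return states

def one_step_forecast(x_t, p1, w1, w2, params):
    """Mix the two AR(1) forecasts; params = ((c1, phi1), (c2, phi2)), p1 = P(s_t = 1)."""
    q1 = p1 * (1.0 - w1) + (1.0 - p1) * w2   # P(s_{t+1} = 1)
    (c1, f1), (c2, f2) = params
    return q1 * (c1 + f1 * x_t) + (1.0 - q1) * (c2 + f2 * x_t)

# Illustrative values only: expected_durations(1/11, 1/4) gives roughly 11 and 4 periods
```

Unless one transition probability is exactly 0 or 1, the mixing weight `q1` lies strictly between 0 and 1, which is why the forecast always blends both regimes.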
5) Non-parametric methods: The essence is smoothing, since these methods are highly data dependent and may otherwise overfit. Given many realizations of $y_t$ for a given $x_t=x$, an asymptotically accurate estimate of the functional dependence $m(x)$ of $Y$ on $X$ at $X=x$ is the average of those $y_t$. In a time series only one observation of $y_t$ is available for a given $x_t$. But if $m(\cdot)$ is sufficiently smooth, then the values of $y_t$ for which $x_t\approx x$ still provide an accurate approximation of $m(x)$. We can therefore use a weighted average to estimate $m(x)$. Hence,
$$\hat{m}(x)=\sum_{t=1}^Tw_t(x)y_t,\qquad \sum_{t=1}^Tw_t(x)=1.$$
Kernel Regression: The kernel $K(x)$ is the weighting function, satisfying $K(x)\ge0$ and $\int K(z)dz=1$. To rescale the distance measure, one can use a bandwidth $h>0$, giving the kernel $K_h(x)=K(x/h)/h$ with $\int K_h(z)dz=1$. The weight function simply becomes:
$$w_t(x)=\frac{K_h(x-x_t)}{\sum_{s=1}^T K_h(x-x_s)},$$
which gives the Nadaraya-Watson kernel estimator (1964). The kernels generally chosen are Gaussian and Epanechnikov. If $h$ is very small, the weights concentrate on the few observations in the immediate neighborhood of $x$. If $h$ is very large, the weights spread over a larger neighborhood of $x$. Bandwidth selection is done either by minimizing the mean integrated squared error (using some preliminary smoothing estimates) or by leave-one-out cross validation. Fan and Yao (2003) give the bandwidth for the Gaussian kernel as $1.06\sigma_xT^{-0.2}$.
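The kernel estimator is short enough to write out directly. Below is a minimal sketch with a Gaussian kernel and the rule-of-thumb bandwidth quoted above. The function names are hypothetical, and using the sample standard deviation for $\sigma_x$ is an assumption.

```python
import math

def gaussian_kernel(z):
    """Standard normal density, used as K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def rule_of_thumb_bandwidth(xs):
    """h = 1.06 * sd(x) * T^(-1/5), with sd the sample standard deviation."""
    T = len(xs)
    mean = sum(xs) / T
    sd = math.sqrt(sum((v - mean) ** 2 for v in xs) / (T - 1))
    return 1.06 * sd * T ** (-0.2)

def nadaraya_watson(x, xs, ys, h):
    """Kernel-weighted average of ys, with weights K_h(x - x_t) normalized to sum to one."""
    k = [gaussian_kernel((x - xt) / h) / h for xt in xs]
    total = sum(k)   # zero if x is very far from every x_t (division would then fail)
    return sum(ki * yi for ki, yi in zip(k, ys)) / total
```

In a time series setting one would typically take `xs` as the lagged series $x_{t-1}$ and `ys` as $x_t$, then evaluate `nadaraya_watson` on a grid of `x` values to trace out $\hat{m}(x)$.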
Local Linear Regression Method:- Uses the same kernel weights, but instead of a weighted average it fits a local line $a+b(x_t-x)$ by kernel-weighted least squares and takes the intercept $\hat{a}$ as the estimate of $m(x)$.
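For a single regressor the kernel-weighted least squares problem has a closed-form solution, sketched below. It minimizes $\sum_t K_h(x-x_t)\,(y_t-a-b(x_t-x))^2$ and returns $\hat{a}$ as the estimate of $m(x)$. The function names and the Gaussian kernel are assumptions for illustration.

```python
import math

def gaussian_kernel(z):
    """Standard normal density, used as K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def local_linear(x, xs, ys, h):
    """Intercept of the kernel-weighted least squares line fitted around x."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xt, yt in zip(xs, ys):
        k = gaussian_kernel((x - xt) / h) / h   # weight K_h(x - x_t)
        d = xt - x                              # regressor centered at x
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * yt
        t1 += k * d * yt
    # Solve the 2x2 normal equations for the intercept a
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det
```

A quick check on the design: if the data lie exactly on a straight line, `local_linear` reproduces that line at any `x`, whereas a plain kernel average is biased near the edges of the sample.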
Functional coefficient AR model:- Treats each AR coefficient as a function and estimates these functions using kernels. Shown to give better 1-step ahead forecasts.
Nonlinear additive AR model:- To avoid the curse of dimensionality, uses a GAM-style additive AR model.
Nonlinear State-Space model:- Instead of a linear (matrix) state evolution, uses a nonlinear evolution of the states, estimated using MCMC.
Sections left: 4.1.9, 4.2-4.5