Local trend model
For a univariate time series the model is $y_t=\mu_t+\epsilon_t$ and $\mu_{t+1}=\mu_t+\eta_t$, where the two error terms are assumed to be independent and normally distributed with distinct variances $\sigma_e^2$ and $\sigma_{\eta}^2$ respectively. Notice that the first equation is the observed version of the trend in the second equation, with added noise. This model can be used to analyze the realized volatility of an asset price if $\mu_t$ is taken to be the log volatility (which is not directly observable) and $y_t$ is the logarithm of realized volatility (which is observable, being constructed from high-frequency transaction data, but is contaminated by microstructure noise).
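To get a feel for the two equations, the model can be simulated directly. The following is a minimal sketch using only the Python standard library; the function name `simulate_local_level`, the starting level `mu0` and the `seed` argument are illustrative assumptions, not part of the notes.

```python
import random

def simulate_local_level(n, sigma_e, sigma_eta, mu0=0.0, seed=0):
    """Simulate n steps of y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    sigma_e and sigma_eta are standard deviations (not variances).
    mu0 and seed are illustrative assumptions for this sketch.
    """
    rng = random.Random(seed)
    mu, y = [], []
    level = mu0
    for _ in range(n):
        mu.append(level)                            # unobserved state mu_t
        y.append(level + rng.gauss(0.0, sigma_e))   # observation y_t
        level += rng.gauss(0.0, sigma_eta)          # random-walk update to mu_{t+1}
    return mu, y

if __name__ == "__main__":
    mu, y = simulate_local_level(5, sigma_e=0.5, sigma_eta=0.1)
    for t, (m, obs) in enumerate(zip(mu, y), start=1):
        print(t, round(m, 3), round(obs, 3))
```

Setting `sigma_e=0` makes `y` coincide with `mu`, which is the pure random walk case.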
If there is no measurement error term in the first equation ($\sigma_e=0$), this becomes an ARIMA(0,1,0) model. With the error term it is an ARIMA(0,1,1) model, which is also the simple exponential smoothing model. The form is $(1-B)y_t=(1-\theta B)a_t$, where $a_t$ is white noise with variance $\sigma^2_a$, and $\theta$ and $\sigma_{a}^2$ are related to $\sigma^2_e$ and $\sigma^2_{\eta}$ as follows: $(1+\theta^2)\sigma^2_a=2\sigma^2_e+\sigma^2_{\eta}$ and $\theta \sigma^2_a=\sigma^2_e.$ These come from matching the variance and the lag-1 autocovariance of $(1-B)y_t=\eta_{t-1}+\epsilon_t-\epsilon_{t-1}$ with those of the MA(1) form. The resulting quadratic equation for $\theta$ has two solutions, and the one with $|\theta|<1$ is chosen. The reverse mapping is also possible for positive $\theta$. Both representations have pros and cons, and the objective of the data analysis, substantive issues and experience decide which to use.
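The mapping between the two parameterizations can be worked out numerically. Dividing the first relation by the second gives $\theta^2-(2+q)\theta+1=0$ with $q=\sigma^2_{\eta}/\sigma^2_e$, and subtracting twice the second from the first gives $\sigma^2_{\eta}=(1-\theta)^2\sigma^2_a$. The sketch below implements both directions under those relations; the function names are hypothetical and chosen only for illustration.

```python
import math

def local_level_to_ima(sigma2_e, sigma2_eta):
    """Map (sigma_e^2, sigma_eta^2) to the invertible (theta, sigma_a^2)."""
    if sigma2_e <= 0 or sigma2_eta <= 0:
        raise ValueError("both variances must be positive")
    q = sigma2_eta / sigma2_e                    # signal-to-noise ratio
    c = 2.0 + q                                  # theta^2 - c*theta + 1 = 0
    theta = (c - math.sqrt(c * c - 4.0)) / 2.0   # the root with |theta| < 1
    sigma2_a = sigma2_e / theta                  # from theta * sigma_a^2 = sigma_e^2
    return theta, sigma2_a

def ima_to_local_level(theta, sigma2_a):
    """Reverse mapping, valid for 0 < theta < 1."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    sigma2_e = theta * sigma2_a
    sigma2_eta = (1.0 - theta) ** 2 * sigma2_a
    return sigma2_e, sigma2_eta

if __name__ == "__main__":
    theta, s2a = local_level_to_ima(1.0, 1.0)
    print(theta, s2a)
    print(ima_to_local_level(theta, s2a))
```

The two roots of the quadratic are reciprocals of each other, so taking the smaller one always yields the invertible solution.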
Statistical Inference
There are three types of inference, illustrated here with the example of reading a handwritten note:
- Filtering - recover the state variable $\mu_t$ given $F_t$, i.e. remove the measurement errors from the data. (figuring out the word you are reading based on knowledge accumulated from the beginning of the note).
- Prediction - forecast $\mu_{t+h}$ or $y_{t+h}$ for $h>0$ given $F_t$, where $t$ is the forecast origin. (guess the next word).
- Smoothing - estimate $\mu_t$ given $F_T$, where $T>t$. (deciphering a particular word once you have read through the note).
The Kalman Filter
Let $\mu_{t|j}=E(\mu_t|F_j)$ and $\Sigma_{t|j}=Var(\mu_t|F_j)$ be, respectively, the conditional mean and variance of $\mu_t$ given information $F_j$. Similarly, $y_{t|j}$ denotes the conditional mean of $y_t$ given $F_j$. Furthermore, let $v_t=y_t-y_{t|t-1}$ and $V_t=Var(v_t|F_{t-1})$ be the 1-step ahead forecast error of $y_t$ given $F_{t-1}$ and its variance. Note that $Var(v_t|F_{t-1})=Var(v_t)$, since the forecast error $v_t$ is independent of $F_{t-1}$. Further, $y_{t|t-1}=\mu_{t|t-1}$, giving $v_t=y_t-\mu_{t|t-1}$ and $V_t=\Sigma_{t|t-1}+\sigma^2_e$. Also, $E(v_t)=0$ and $Cov(v_t,y_j)=0$ for $j<t$. The information satisfies $F_t \equiv \{F_{t-1},y_t\} \equiv \{F_{t-1},v_t\}$, hence $\mu_{t|t}=E(\mu_t|F_{t-1},v_t)$ and $\Sigma_{t|t}=Var(\mu_t|F_{t-1},v_t)$.
One can show that $Cov(\mu_t,v_t|F_{t-1})=\Sigma_{t|t-1}$, giving $$\begin{bmatrix} \mu_t \\ v_t \end{bmatrix}_{F_{t-1}} \sim N\left( \begin{bmatrix} \mu_{t|t-1} \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_{t|t-1} & \Sigma_{t|t-1} \\ \Sigma_{t|t-1} & V_t\end{bmatrix} \right).$$ Applying the conditional distribution theorem for the multivariate normal we get $$\mu_{t|t} = \mu_{t|t-1}+(V_t^{-1}\Sigma_{t|t-1})v_t=\mu_{t|t-1}+K_tv_t,$$ $$\Sigma_{t|t}=\Sigma_{t|t-1}-\Sigma_{t|t-1}V_t^{-1}\Sigma_{t|t-1} = \Sigma_{t|t-1}(1-K_t),$$ where $K_t=V_t^{-1}\Sigma_{t|t-1}$ is referred to as the Kalman gain. It is the regression coefficient of $\mu_t$ on $v_t$ and governs the contribution of the new shock $v_t$ to the state variable $\mu_t$. To predict $\mu_{t+1}$ given $F_t$ we have $$\mu_{t+1}|F_t \sim N(\mu_{t|t}, \Sigma_{t|t}+\sigma^2_{\eta}),$$ that is, $\mu_{t+1|t}=\mu_{t|t}$ and $\Sigma_{t+1|t}=\Sigma_{t|t}+\sigma^2_{\eta}$. Once the new data $y_{t+1}$ is observed, the above procedure can be repeated (in practice after $\sigma_e$ and $\sigma_{\eta}$ have been estimated, generally by maximum likelihood). This is the famous Kalman filter algorithm (1960). The choice of priors $\mu_{1|0}$ and $\Sigma_{1|0}$ requires some attention.
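The recursion above translates almost line by line into code. The following is a minimal sketch for the scalar local trend model, assuming the variances and the priors $\mu_{1|0}$, $\Sigma_{1|0}$ are supplied by the caller; the function name and the argument names `mu_init` and `sigma_init` are illustrative, not from the notes.

```python
def kalman_filter_local_level(y, sigma2_e, sigma2_eta, mu_init, sigma_init):
    """Run the scalar Kalman filter over observations y.

    mu_init, sigma_init play the role of the priors mu_{1|0}, Sigma_{1|0}
    (names are assumptions for this sketch). Returns one dict per time step.
    """
    mu_pred, sig_pred = mu_init, sigma_init
    out = []
    for obs in y:
        v = obs - mu_pred                  # forecast error v_t
        V = sig_pred + sigma2_e            # its variance V_t
        K = sig_pred / V                   # Kalman gain K_t
        mu_filt = mu_pred + K * v          # mu_{t|t}
        sig_filt = sig_pred * (1.0 - K)    # Sigma_{t|t}
        out.append({"v": v, "V": V, "K": K, "mu": mu_filt, "Sigma": sig_filt})
        mu_pred = mu_filt                  # mu_{t+1|t}
        sig_pred = sig_filt + sigma2_eta   # Sigma_{t+1|t}
    return out

if __name__ == "__main__":
    steps = kalman_filter_local_level([1.0, 2.0], 1.0, 1.0, 0.0, 1.0)
    for s in steps:
        print(s)
```

With $\sigma^2_e=\sigma^2_{\eta}=1$, $\mu_{1|0}=0$ and $\Sigma_{1|0}=1$, the first step gives $K_1=0.5$ and $\mu_{1|1}=0.5$, which can be checked by hand against the formulas.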
Properties of forecast error -
State error recursion -
State smoothing -
Missing Values -
Effect of Initialization -
Estimation -