Time series analysis

In statistics, signal processing, and econometrics, a time series is a sequence of data points, typically measured at successive times spaced at (often uniform) intervals. Time series analysis comprises methods that attempt to understand such series, often either to understand the underlying process behind the data points (where did they come from? what generated them?) or to make forecasts (predictions). Time series prediction is the use of a model to predict future events based on known past events: to predict future data points before they are measured. The standard example is predicting the opening price of a share of stock based on its past performance.

As shown by Box and Jenkins in their book, models for time series data can have many forms and represent different stochastic processes. When modeling the mean of a process, three broad classes of practical importance are the autoregressive (AR) models, the integrated (I) models, and the moving average (MA) models (the MA process is related to, but not to be confused with, the concept of a moving average). These three classes depend linearly on previous data points and are treated in more detail in the articles autoregressive moving average models (ARMA) and autoregressive integrated moving average (ARIMA). The autoregressive fractionally integrated moving average (ARFIMA) model generalizes the former three. Non-linear dependence on previous data points is of interest because of the possibility of producing a chaotic time series.

Among non-linear time series, there are models that represent changes of variance over time (heteroskedasticity). These models are called autoregressive conditional heteroskedasticity (ARCH) models, and the collection comprises a wide variety of representations (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Recently, wavelet-transform-based methods (for example locally stationary wavelets and wavelet decomposed neural networks) have gained favour. Multiscale (often referred to as multiresolution) techniques decompose a given time series, attempting to illustrate time dependence at multiple scales.
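
As a hedged illustration of the simplest member of this family, an ARCH(1) model can be sketched as follows. The symbols ω, a_1 and z_t are notation chosen for this sketch rather than taken from the text: ε_t denotes the random shock (error term) of the series, z_t is assumed to be an independent sequence with mean 0 and variance 1, and ω > 0, a_1 ≥ 0 are assumed.

 * $$ \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + a_1 \varepsilon_{t-1}^2 \, $$

Under these assumptions the conditional variance $$\sigma_t^2$$ depends on the previous squared shock, so a large shock tends to be followed by a period of higher variance.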

Notation
A number of different notations are in use for time-series analysis:


 * $$X= \{X_1, X_2, \dots \}$$

is a common notation, which specifies a time series X that is indexed by the natural numbers. Another common notation is


 * $$Y= \{Y_t : t \in T \}$$

where T is the index set.

Assumptions
There are two assumptions on which the theory is built:


 * Stationary process
 * Ergodicity
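
As a hedged sketch, assuming that weak (second-order) stationarity and ergodicity for the mean are what is meant (the list above does not say which variants are intended), the two assumptions can be written as follows. The symbols μ, γ, h and n are introduced only for this illustration.

 * $$ E[Y_t]=\mu, \qquad \mbox{Cov}(Y_t, Y_{t+h})=\gamma(h) \quad \forall t \, $$

 * $$ \frac{1}{n}\sum_{t=1}^{n} Y_t \to \mu \quad \mbox{as } n\to\infty \, $$

The first line says that the mean and the autocovariance do not depend on t; the second says that the time average of a single realisation converges to the mean of the process.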

The general representation of an autoregressive model of order p, known as AR(p), is


 * $$ Y_t =\alpha_0+\alpha_1 Y_{t-1}+\alpha_2 Y_{t-2}+\cdots+\alpha_p Y_{t-p}+\varepsilon_t\, $$

where the term ε_t is the source of randomness and is called white noise. It is assumed to have the following characteristics:

1. $$ E[\varepsilon_t]=0 \,$$

2. $$ E[\varepsilon^2_t]=\sigma^2 \, $$

3. $$ E[\varepsilon_t\varepsilon_s]=0 \quad\forall t\not=s \, $$

If it also has a normal distribution, it is called normal white noise:


 * $$ \{\varepsilon_t\}_{(t \in T)} : \mbox{Normal-WN} $$
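
The AR(p) recursion with normal white noise can be simulated directly. The following is a minimal sketch, not a reference implementation: the function name `simulate_ar`, the burn-in length, the zero start-up values and the example coefficients are all assumptions made for illustration.

```python
import numpy as np

def simulate_ar(alpha0, alphas, sigma, n, burn_in=200, seed=0):
    """Simulate n points of Y_t = alpha0 + sum_i alphas[i-1] * Y_{t-i} + eps_t.

    eps_t is normal white noise with mean 0 and variance sigma**2.
    The first burn_in points are discarded so that the arbitrary
    zero start-up values have little influence on the result.
    """
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    p = len(alphas)
    total = n + burn_in
    eps = rng.normal(0.0, sigma, size=total)
    y = np.zeros(total + p)  # the first p entries are start-up values
    for t in range(p, total + p):
        lagged = y[t - p:t][::-1]  # (Y_{t-1}, ..., Y_{t-p})
        y[t] = alpha0 + np.dot(alphas, lagged) + eps[t - p]
    return y[p + burn_in:]

# Hypothetical AR(2) example; these coefficients give a stationary process.
series = simulate_ar(alpha0=1.0, alphas=[0.5, -0.3], sigma=1.0, n=5000)
print(series.mean())  # should be near alpha0 / (1 - 0.5 + 0.3) = 1.25
```

For a stationary AR(p) process the mean is α_0 / (1 − α_1 − … − α_p), which is why the sample mean of this example is expected to be close to 1.25.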

Related tools
Tools for investigating time-series data include:


 * Consideration of the autocorrelation function and the spectral density function
 * Performing a Fourier transform to investigate the series in the frequency domain
 * Use of a filter to remove unwanted noise
 * Principal components analysis (or empirical orthogonal function analysis)
 * Artificial neural networks
 * Time-frequency analysis techniques:
    * Continuous wavelet transform
    * Short-time Fourier transform
    * Chirplet transform
    * Fractional Fourier transform
 * Chaotic analysis:
    * Correlation dimension
    * Recurrence plots
    * Recurrence quantification analysis
    * Lyapunov exponents
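
The first two tools in the list can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: the helper names `sample_acf` and `periodogram`, the unit sampling interval, and the synthetic noisy sine wave are all hypothetical choices, and the raw periodogram is only a crude estimate of the spectral density.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def periodogram(x):
    """Raw periodogram of a real series, assuming unit sampling interval."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0)          # cycles per time step
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    return freqs, power

# Synthetic example: a sine wave with period 16 plus normal noise.
t = np.arange(512)
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * t / 16) + 0.5 * rng.normal(size=t.size)

acf = sample_acf(x, max_lag=32)
freqs, power = periodogram(x)
peak_freq = freqs[np.argmax(power)]
print(peak_freq)  # frequency with the largest power
```

In this example the periodicity shows up in both domains: the autocorrelation is strongly positive at lag 16 and strongly negative at lag 8, and the periodogram peaks at the frequency 1/16.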

Applied time series
Time series analysis is used in numerous applied fields, from astrophysics to geology. Model selection is often based on the underlying assumption about the data-generating process. Take, for example, traffic flow: here we would fully expect periodic behaviour (with bursts at peak travel times). In such a situation one may consider applying dynamic harmonic regression (the data are highly similar to the airline data frequently analysed in the statistics literature).

More recently there has been increased use of time series methods in geophysics (the analysis of rainfall and climate change, for example). Within industry, almost every sector performs time series analysis in some way; in retail, for example, it is used for tracking and predicting sales. Analysts will typically load their data into a statistics package (R and S-Plus are examples of such programs). A key step is to review the autocorrelation function (ACF), which helps indicate the number of lagged observations to include in a time series model (the partial autocorrelation function should always be analysed as well).
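
To illustrate how the partial autocorrelation function helps choose the number of lags, the sketch below computes the sample PACF with the Durbin-Levinson recursion. For a pure AR(p) process the theoretical partial autocorrelation is zero beyond lag p. The helper names, the simulated AR(2) series and the approximate 1.96/√n significance band are assumptions made for illustration; a statistics package would normally provide these functions.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    rho = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])  # coefficients of the order k-1 predictor
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = np.array([phi_kk])
        else:
            num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
            den = 1.0 - np.dot(phi_prev, rho[1:k])
            phi_kk = num / den
            phi = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
        phi_prev = phi
    return pacf

# Hypothetical AR(2) series: Y_t = 0.5 Y_{t-1} - 0.3 Y_{t-2} + eps_t.
rng = np.random.default_rng(2)
n = 5000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

pacf = sample_pacf(y, max_lag=10)
band = 1.96 / np.sqrt(n)  # rough significance band for a zero PACF
significant_lags = [k for k in range(1, 11) if abs(pacf[k]) > band]
print(significant_lags)
```

In this setting the partial autocorrelations at lags 1 and 2 are clearly non-zero while higher lags stay close to zero, which would suggest an autoregressive model with two lagged observations.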

In general, financial series often require non-linear models (such as ARCH), because applying an autoregressive model often results in a model suggesting that tomorrow's value (of a share price, say) depends almost entirely on yesterday's value:


 * $$ Y_t =\alpha_0+\alpha_1 Y_{t-1}+\varepsilon_t \, $$

(where α_1 is close to 1).
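
This behaviour can be reproduced on synthetic data. The sketch below fits the AR(1) model by ordinary least squares to a simulated random walk standing in for a share price. The random-walk "price", its starting level of 100 and the variable names are hypothetical choices for illustration, not real market data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Hypothetical price series: a random walk starting at 100.
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=n))

# Regress Y_t on a constant and Y_{t-1} to estimate alpha_0 and alpha_1.
X = np.column_stack([np.ones(n - 1), prices[:-1]])
y = prices[1:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha0_hat, alpha1_hat = coef

print(alpha0_hat, alpha1_hat)  # alpha1_hat is expected to be close to 1
```

An estimate of α_1 this close to 1 means the fitted model is nearly a random walk, so the forecast for tomorrow is essentially today's value.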

Robert Engle recognised the importance of including lagged values of the series' variance. In general, time series can be considered in the time domain and/or the frequency domain. This duality has led to many of the recent developments in time series analysis. Wavelet-based methods are an attempt to model series in both domains. Wavelets are compactly supported "small waves" which, when convolved with the series itself (when scaled and dilated), give a scale-by-scale analysis of the temporal dependence of a series. Such wavelet-based methods are frequently applied to climate change problems.
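
A scale-by-scale decomposition can be sketched with the Haar wavelet, the simplest compactly supported wavelet. This is a minimal illustration, assuming a series whose length is divisible by 2 to the power of the number of levels; the function name `haar_dwt` and the synthetic input are hypothetical, and practical analyses usually rely on a dedicated wavelet library and smoother wavelets.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal Haar wavelet decomposition.

    Returns the coarsest approximation and a list of detail coefficient
    arrays, ordered from the finest scale to the coarsest.
    """
    approx = np.asarray(x, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences
        approx = (even + odd) / np.sqrt(2.0)         # local averages
    return approx, details

# Synthetic example: a slow oscillation plus noise.
rng = np.random.default_rng(5)
t = np.arange(256)
x = np.sin(2 * np.pi * t / 64) + 0.3 * rng.normal(size=t.size)

approx, details = haar_dwt(x, levels=4)
energy_by_scale = [float(np.sum(d ** 2)) for d in details]
print(energy_by_scale)  # how much of the variation sits at each scale
```

Because the transform is orthonormal, the sum of squares of the series equals the sum of squares of all coefficients, so `energy_by_scale` shows how the variation of the series is distributed across scales.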

Another (less researched) area of time series analysis considers the "mining" of series to retrospectively extract knowledge. In the literature this is referred to as time series data mining (TSDM). Techniques in this area often depend on "feature detection". In essence this is an attempt to find the "characteristic" behaviour of the series, and to use this to find areas of the series which do not adhere to this behaviour. Current efforts are led by the computer science department at the University of California, Riverside.
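
One possible way to make "areas which do not adhere to the characteristic behaviour" concrete is to look for the subsequence whose nearest non-overlapping neighbour is farthest away, sometimes called a discord in the time series data mining literature. The brute-force sketch below is an assumption about how such a search could be done, not a method described in the text; the function name `find_discord`, the window length and the synthetic series with a flattened segment are hypothetical.

```python
import numpy as np

def find_discord(x, m):
    """Return (start, score) of the length-m window that is least
    similar to every other non-overlapping window (brute force)."""
    x = np.asarray(x, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, m)
    # z-normalise each window so that shape, not level, is compared
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    z = (windows - mu) / np.where(sd > 0, sd, 1.0)
    # pairwise Euclidean distances between all windows
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    idx = np.arange(len(z))
    # ignore trivial matches between overlapping windows
    d[np.abs(idx[:, None] - idx[None, :]) < m] = np.inf
    nearest = d.min(axis=1)
    start = int(np.argmax(nearest))
    return start, float(nearest[start])

# Synthetic example: a periodic series with one flattened cycle.
t = np.arange(400)
rng = np.random.default_rng(4)
x = np.sin(2 * np.pi * t / 20)
x[150:170] = 0.0                      # the injected anomaly
x = x + 0.05 * rng.normal(size=t.size)

start, score = find_discord(x, m=20)
print(start, score)  # the reported window should overlap indices 150..169
```

This brute-force search compares every pair of windows, so it is only practical for short series; it is meant to show the idea rather than an efficient algorithm.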