Independent Component Analysis

Independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents, assuming that the source signals are non-Gaussian and mutually statistically independent. It is a special case of blind source separation.

Definition
When the independence assumption holds, blind ICA separation of a mixed signal gives very good results. ICA is also used for analysis purposes on signals that are not assumed to have been generated by mixing. A simple application of ICA is the “cocktail party problem”, where the underlying speech signals are separated from sample data consisting of people talking simultaneously in a room. Usually the problem is simplified by assuming no time delays and no echoes. An important point is that if N sources are present, at least N observations (e.g. microphones) are needed to recover the original signals. This constitutes the square case (J = D, where D is the input dimension of the data and J is the dimension of the model). The underdetermined (J > D) and overdetermined (J < D) cases have also been investigated.

The method finds the independent components (also called factors, latent variables or sources) by maximizing the statistical independence of the estimated components. Non-Gaussianity, motivated by the central limit theorem, is one way to measure the independence of the components; it can be quantified, for instance, by kurtosis or by approximations of negentropy. Mutual information is another popular criterion for measuring the statistical independence of signals.
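
As a minimal illustration of the kurtosis measure mentioned above, the following NumPy sketch estimates the excess kurtosis of a few signals. The function name `excess_kurtosis`, the sample size and the choice of uniform and Laplace sources are assumptions made for this example only. It also shows the central limit theorem argument: a sum of independent non-Gaussian signals is closer to Gaussian than the individual signals.

```python
import numpy as np

def excess_kurtosis(y):
    """Sample excess kurtosis E[(y - mean)^4] / var^2 - 3 (zero for a Gaussian)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    var = np.mean(y ** 2)
    return np.mean(y ** 4) / var ** 2 - 3.0

rng = np.random.default_rng(0)
n_samples = 200_000

gaussian = rng.standard_normal(n_samples)
uniform = rng.uniform(-1.0, 1.0, n_samples)   # sub-Gaussian (theoretical value -1.2)
laplace = rng.laplace(0.0, 1.0, n_samples)    # super-Gaussian (theoretical value 3)

# Sum of 8 independent uniform signals: closer to Gaussian than any single one.
mixture = rng.uniform(-1.0, 1.0, (8, n_samples)).sum(axis=0)

k_gauss = excess_kurtosis(gaussian)
k_unif = excess_kurtosis(uniform)
k_lap = excess_kurtosis(laplace)
k_mix = excess_kurtosis(mixture)
```

In this sketch the absolute kurtosis of `mixture` is expected to be much smaller than that of a single uniform signal, which is why maximizing non-Gaussianity tends to undo the mixing.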

Typical algorithms for ICA use centering, whitening and dimensionality reduction as preprocessing steps in order to simplify the problem and reduce its complexity for the actual iterative algorithm. Whitening and dimensionality reduction can be achieved with principal component analysis or singular value decomposition. Whitening ensures that all dimensions are treated equally a priori before the algorithm is run. Algorithms for ICA include infomax, FastICA and JADE, among many others.
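
The centering and whitening steps can be sketched in NumPy as follows. This is a minimal example under stated assumptions: the function name `center_and_whiten`, the example mixing matrix and the offsets are hypothetical, the covariance matrix is assumed to have full rank, and dimensionality reduction (dropping the directions with the smallest eigenvalues) is not shown.

```python
import numpy as np

def center_and_whiten(X):
    """Center each row of X (shape: n_signals x n_samples) and whiten the result.

    Returns the whitened data Z and the whitening matrix V, with Z = V @ (X - mean).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)        # centering
    cov = Xc @ Xc.T / Xc.shape[1]                 # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # principal component analysis
    V = np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return V @ Xc, V

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, (2, 10_000))           # two independent sources
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])                        # example mixing matrix
X = A @ S + np.array([[3.0], [-1.0]])             # mixtures with non-zero mean

Z, V = center_and_whiten(X)
cov_Z = Z @ Z.T / Z.shape[1]                      # identity up to rounding error
```

After whitening, the remaining unknown is only an orthogonal rotation, which is what the iterative part of the algorithm then has to estimate.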

In general, ICA cannot determine the actual number of source signals, the order of the source signals, or the signs and scales of the sources.
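
The order, sign and scale ambiguities follow directly from the linear model $$x=As$$ defined in the mathematical sections below. As a short worked step (the symbols $$P$$ and $$D$$ are introduced here only for illustration), take any permutation matrix $$P$$ and any invertible diagonal matrix $$D$$. Then

$$x = As = (A D^{-1} P^{-1})(P D s) = A' s'$$

where $$s' = PDs$$ is a reordered and rescaled copy of $$s$$ whose components are still mutually independent, and $$A' = A D^{-1} P^{-1}$$ is an equally valid mixing matrix. The observations alone therefore cannot distinguish $$(A, s)$$ from $$(A', s')$$.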

The method is important to blind signal separation and has many practical applications.

Mathematical definitions
Linear independent component analysis can be divided into noiseless and noisy cases, where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case.

General definition
The data is represented by the random vector $$x=(x_1,\ldots,x_m)$$ and the components by the random vector $$s=(s_1,\ldots,s_n)$$. The task is to transform the observed data $$x$$, using a linear static transformation $$s=Wx$$, into maximally independent components $$s$$, as measured by some function $$F(s_1,\ldots,s_n)$$ of independence.

Linear noiseless ICA
The components $$x_i$$ of the observed random vector $$x=(x_1,\ldots,x_m)^T$$ are generated as a sum of the independent components $$s_k$$, $$k=1,\ldots,n$$:

$$x_i = a_{i,1} s_1 + \cdots + a_{i,k} s_k + \cdots + a_{i,n} s_n$$

weighted by the mixing weights $$a_{i,k}$$.

The same generative model can be written in vectorial form as $$x=\sum_{k=1}^{n} a_k s_k$$, where the observed random vector $$x$$ is represented by the basis vectors $$a_k=(a_{1,k},\ldots,a_{m,k})^T$$. The basis vectors $$a_k$$ form the columns of the mixing matrix $$A=(a_1,\ldots,a_n)$$ and the generative formula can be written as $$x=As$$, where $$s=(s_1,\ldots,s_n)^T$$.

Given the model and realizations (samples) $$x_1,\ldots,x_N$$ of the random vector $$x$$, the task is to estimate both the mixing matrix $$A$$ and the sources $$s$$. This is done by adaptively calculating the weight vectors $$w$$ and setting up a cost function that either maximizes the non-Gaussianity of the estimated $$s_k = w^T x$$ or minimizes the mutual information. In some cases, a priori knowledge of the probability distributions of the sources can be used in the cost function.
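
The estimation procedure described above can be sketched as follows. This is a minimal example, not a reference implementation: it uses a kurtosis-based fixed-point iteration in the style of FastICA with deflation on whitened data, and the function names `whiten` and `kurtosis_ica`, the mixing matrix and the source distributions are all assumptions chosen for illustration.

```python
import numpy as np

def whiten(X):
    """Center and whiten X (n_signals x n_samples); returns Z and whitening matrix V."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    V = np.diag(eigvals ** -0.5) @ eigvecs.T
    return V @ Xc, V

def kurtosis_ica(X, n_iter=200, tol=1e-10, seed=0):
    """Estimate the weight vectors w one at a time by maximizing |kurtosis| of w^T z."""
    rng = np.random.default_rng(seed)
    Z, V = whiten(X)
    n = Z.shape[0]
    W = np.zeros((n, n))
    for k in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ Z
            # Fixed-point step for kurtosis: w <- E[z (w^T z)^3] - 3 w
            w_new = (Z * y ** 3).mean(axis=1) - 3.0 * w
            # Deflation: remove the directions that were already found
            w_new -= W[:k].T @ (W[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            # Converged when w_new and w are parallel (the sign may flip)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[k] = w
    return W @ Z, W @ V   # estimated sources and overall unmixing matrix

rng = np.random.default_rng(42)
S = np.vstack([rng.uniform(-1.0, 1.0, 20_000),    # sub-Gaussian source
               rng.laplace(0.0, 1.0, 20_000)])    # super-Gaussian source
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
X = A @ S

S_est, W_est = kurtosis_ica(X)
# Correlations between true sources (rows) and estimated components (columns)
corr = np.corrcoef(np.vstack([S, S_est]))[:2, 2:]
```

Each true source should correlate strongly, up to sign, with exactly one estimated component; the order and sign of the estimates are arbitrary.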

The original sources $$s$$ can be recovered by multiplying the observed signals $$x$$ with the inverse of the mixing matrix $$W=A^{-1}$$, also known as the unmixing matrix. Here it is assumed that the mixing matrix is square ($$n=m$$).
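
As a small numerical check of this statement, the sketch below mixes three sources with a known square matrix and recovers them with $$W=A^{-1}$$. The matrix entries and the Laplace-distributed sources are arbitrary choices for the example; in a real ICA problem $$A$$ is unknown and has to be estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.laplace(size=(3, 1000))        # rows are the sources s_k
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 2.0, 0.1],
              [0.4, 0.6, 1.5]])        # square, invertible mixing matrix (n = m = 3)
X = A @ S                              # noiseless model x = A s

W = np.linalg.inv(A)                   # unmixing matrix W = A^{-1}
S_rec = W @ X                          # recovered sources
recovered = np.allclose(S_rec, S)
```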

Linear noisy ICA
With the added assumption of zero-mean and uncorrelated Gaussian noise $$n\sim N(0,\operatorname{diag}(\Sigma))$$, the ICA model takes the form $$x=As+n$$.
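
As a short worked step, under the extra assumption that $$A$$ is square and invertible (here $$n$$ denotes the noise vector, not the number of sources), applying the unmixing matrix $$W=A^{-1}$$ to the noisy model gives

$$Wx = A^{-1}(As+n) = s + A^{-1}n$$

so the recovered components equal the sources plus a transformed noise term with covariance $$A^{-1}\operatorname{diag}(\Sigma)A^{-T}$$. Unlike in the noiseless case, knowing $$A$$ exactly is therefore not enough to recover $$s$$ exactly.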

Nonlinear ICA
The mixing of the sources does not need to be linear. Using a nonlinear mixing function $$f(\cdot|\theta)$$ with parameters $$\theta$$ the nonlinear ICA model is $$x=f(s|\theta)+n$$.

Identifiability
The identifiability of independent component analysis requires that:


 * At most one of the sources $$s_k$$ can be Gaussian,
 * The number of observed mixtures, $$m$$, must be at least as large as the number of estimated components $$n$$: $$m \ge n$$,
 * The mixing matrix $$A$$ must be of full rank.
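
To see why more than one Gaussian source breaks identifiability, consider as a sketch the case where all sources are Gaussian with unit variance, $$s\sim N(0,I)$$ (the orthogonal matrix $$R$$ is introduced here only for illustration). For any orthogonal $$R$$,

$$s' = Rs \sim N(0, RR^T) = N(0, I), \qquad x = As = (AR^T)(Rs) = A's'$$

so $$A$$ and $$A' = AR^T$$ generate exactly the same distribution of $$x$$ from independent sources, and the mixing matrix can only be determined up to an arbitrary rotation.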

History and background
The problem of source separation is an old one in electrical engineering and has been well studied; many algorithms exist depending on the nature of the mixed signals. The problem of blind source separation (BSS) is more difficult because, without knowledge of the signals that have been mixed, it is not possible to design appropriate preprocessing to optimally separate them.

The general framework for independent component analysis was introduced by Jeanny Herault and Christian Jutten in 1986 and was most clearly stated by Pierre Comon in 1994. In 1995, Tony Bell and Terry Sejnowski introduced a fast and efficient ICA algorithm based on infomax, a principle introduced by Ralph Linsker in 1992. In 1997, Shun-ichi Amari realized that the infomax ICA algorithm could be improved by using the natural gradient, which was independently discovered by Jean-Francois Cardoso. However, the original infomax ICA algorithm with sigmoidal nonlinearities was only suitable for super-Gaussian sources. Te-Won Lee, in collaboration with Mark Girolami, developed an efficient extended version of the infomax ICA algorithm that is suitable for general non-Gaussian signals.

Many ICA algorithms are available in the literature. A widely used one, including in industrial applications, is the FastICA algorithm, developed by Aapo Hyvärinen and Erkki Oja, which uses kurtosis as its cost function. Other examples are more closely related to blind source separation, where a more general approach is used. For example, one can drop the independence assumption and separate mutually correlated, and thus statistically "dependent", signals.

Several different approaches have been taken to independent component analysis, including maximum likelihood, Bussgang methods based on cumulants, projection pursuit and negentropy methods. All of these are closely related to the infomax framework, although each makes contributions of its own.

Applications
One application of ICA algorithms is to EEG recordings of scalp potentials in humans. The electrical signals originating from the brain are quite weak at the scalp, in the microvolt range, and there are larger artifactual components arising from eye movements and muscles. It has been a difficult challenge to eliminate these artifacts without altering the brain signals. ICA is ideally suited to this task, since the brain and the scalp are good volume conductors and, to a good approximation, the recordings are different linear mixtures of the brain signals and the artifacts. Many ICA algorithms have proven to be effective for separating out these artifacts. ICA algorithms have many other applications, including biomedical ones such as the analysis of extremely large datasets from functional Magnetic Resonance Imaging (fMRI) experiments, as well as the analysis of telecommunication and geological signals.

ICA can be extended to analyze non-physical signals. For instance, ICA has been applied to discover discussion topics on a bag of news list archives.