ICA in Finance


Forthcoming in: International Journal of Neural Systems, Vol. 8, No.5 (October, 1997).
(Special Issue on Data Mining in Finance.)

A First Application of Independent Component Analysis to Extracting Structure from Stock Returns

Andrew D. Back
Brain Science Institute
The Institute of Physical and Chemical Research (RIKEN)
2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan

back@brain.riken.go.jp
www.open.brain.riken.go.jp/~back/

Andreas S. Weigend
Department of Information Systems
Leonard N. Stern School of Business
New York University
44 West Fourth Street, MEC 9-74
New York, NY 10012, USA

aweigend@stern.nyu.edu
www.stern.nyu.edu/~aweigend

In this paper we consider the application of a signal processing technique known as independent component analysis (ICA) or blind source separation, to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series into a new space of statistically independent components (ICs). We apply ICA to three years of daily returns of the 28 largest Japanese stocks and compare the results with those obtained using principal component analysis (PCA). The results indicate that the estimated ICs fall into two categories, (i) infrequent but large shocks (responsible for the major changes in the stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall level of the stocks). We show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs. In contrast, when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one. Independent component analysis is shown to be a potentially powerful method of analysing and understanding driving mechanisms in financial time series.

1 Introduction

What drives the movements of a financial time series? This surely is a question of interest to many, ranging from researchers who wish to understand financial markets, to traders who will benefit from such knowledge. Can modern knowledge discovery and data mining techniques help discover some of the underlying forces?

In this paper, we focus on a new technique which to our knowledge has not been used in any significant application to financial or econometric problems¹. The method is known as independent component analysis (ICA) and is also referred to as blind source separation [29,32,22]. The central assumption is that an observed multivariate time series (such as daily stock returns) reflects the reaction of a system (such as the stock market) to a few statistically independent time series. ICA seeks to extract these independent components (ICs) as well as the mixing process.

ICA can be expressed in terms of the related concepts of entropy [6], mutual information [3], contrast functions [22] and other measures of the statistical independence of signals. For independent signals, the joint probability can be factorized into the product of the marginal probabilities. Therefore the independent components can be found by minimizing the Kullback-Leibler divergence between the joint probability and marginal probabilities of the output signals [3]. Hence, the goal of finding statistically independent components can be expressed in several ways:

Find a set of directions that factorize the joint probabilities.
Find a set of directions with minimum mutual information. When the mutual information between variables vanishes, they are statistically independent.
Find a set of "interesting" directions. The goal of finding interesting directions is similar to projection pursuit [26,30,25]. In the knowledge discovery and data mining community the term "interestingness" [53] is also used to denote unexpectedness [54].
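As a worked form of the first two formulations, the mutual information of the outputs can be written as the Kullback-Leibler divergence between the joint density and the product of the marginals. The symbols p(y) and p_i(y_i) are notation introduced here for the joint and marginal densities of the outputs; this is the standard definition rather than a formula taken from the paper:

$$I(y) = \int p(y) \, \log \frac{p(y)}{\prod_{i=1}^{n} p_i(y_i)} \, dy \;\ge\; 0$$

Equality holds if and only if p(y) = \prod_i p_i(y_i), i.e., exactly when the joint density factorizes, so minimizing I(y) and factorizing the joint probability describe the same goal.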

From this basis, algorithms can be derived to extract the desired independent components. In general, these algorithms can be considered as unsupervised learning procedures. Recent reviews are given in [2,12,51,52].

Independent component analysis can also be contrasted with principal component analysis (PCA), and so we give a brief comparison of the two methods here. Both ICA and PCA linearly transform the observed signals into components. The key difference, however, is in the type of components obtained. The goal of PCA is to obtain principal components which are uncorrelated. Moreover, PCA gives projections of the data in the direction of the maximum variance. The principal components (PCs) are ordered in terms of their variances: the first PC defines the direction that captures the maximum variance possible, the second PC defines (in the remaining orthogonal subspace) the direction of maximum variance, and so forth. In ICA, however, we seek to obtain statistically independent components.

PCA algorithms use only second order statistical information. On the other hand, ICA algorithms may use higher order² statistical information for separating the signals, see for example [11,22]. For this reason non-Gaussian signals (or at most, one Gaussian signal) are normally required for ICA algorithms based on higher order statistics. For PCA algorithms, however, the higher order statistical information provided by such non-Gaussian signals is not required or used, hence the signals in this case can be Gaussian. PCA algorithms can be implemented with batch algorithms or with on-line algorithms. Examples of on-line or "neural" PCA algorithms include [9,4,45].

This paper is organized in the following way. Section 2 provides a background to ICA and a guide to some of the algorithms available. Section 3 discusses some issues concerning the general application of ICA to financial time series. Our specific experimental results for the application of ICA to Japanese equity data are given in section 4. In this section we compare results obtained using both ICA and PCA. Section 5 draws some conclusions about the use of ICA in financial time series.

2 ICA in General

2.1 Independent Component Analysis

ICA denotes the process of taking a set of measured signal vectors, x, and extracting from them a (new) set of statistically independent vectors, y, called the independent components or the sources. They are estimates of the original source signals which are assumed to have been mixed in some prescribed manner to form the observed signals.

Figure 1: Schematic representation of ICA. The original sources s are mixed through matrix A to form the observed signal x. The demixing matrix W transforms the observed signal x into the independent components y.

Figure 1 shows the most basic form of ICA. We use the following notation: We observe a multivariate time series { x_i(t)} , i = 1,...,n, consisting of n values at each time step t. We assume that it is the result of a mixing process

$$x_i(t) = \sum_{j=1}^{n} a_{ij} s_j(t). \qquad (1)$$

Using the instantaneous observation vector x(t) = [x₁(t), x₂(t), ..., x_n(t)]' where ' indicates the transpose operator, the problem is to find a demixing matrix W such that

$$y(t) = W x(t) = W A s(t) \qquad (2)$$

where A is the unknown mixing matrix. We assume throughout this paper that there are as many observed signals as there are sources, hence A is a square n ×n matrix. If W = A^-1, then y(t) = s(t), and perfect separation occurs. In general, it is only possible to find W such that WA = PD where P is a permutation matrix and D is a diagonal scaling matrix [55].
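The mixing model and the permutation and scaling ambiguity can be illustrated with a small synthetic sketch. The sources, the mixing matrix and the particular P and D below are all made up for illustration; in a real problem A is unknown and W must be estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 1000

# Hypothetical non-Gaussian sources s(t), one row per source.
s = rng.laplace(size=(n, T))

# Hypothetical square mixing matrix A (unknown in a real problem).
A = rng.normal(size=(n, n))
x = A @ s                       # Eq. (1): x_i(t) = sum_j a_ij s_j(t)

# Ideal demixing: W = A^{-1} gives y(t) = s(t).
W_ideal = np.linalg.inv(A)
y_ideal = W_ideal @ x

# An equally valid demixing matrix: permute and rescale the rows.
P = np.eye(n)[[2, 0, 1]]        # permutation matrix
D = np.diag([2.0, -0.5, 3.0])   # diagonal scaling matrix
W_alt = P @ D @ W_ideal         # so that W_alt @ A = P @ D
y_alt = W_alt @ x               # rows are reordered, rescaled sources
```

Both y_ideal and y_alt have statistically independent rows, which is why independence alone cannot fix the order, sign or scale of the recovered components.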

To find such a matrix W, the following assumptions are made:

The sources { s_j(t)} are statistically independent. While this might sound strong, it is not an unreasonable assumption when one considers for example, sources of very different origins ranging from foreign politics to microeconomic variables that might impact a stock price.
At most one source has a Gaussian distribution. In the case of financial data, normally distributed signals are so rare that only allowing for one of them is not a serious restriction.
The signals are stationary. Stationarity is a standard assumption that enters almost all modeling efforts, not only ICA.

In this paper, we only consider the case when the mixtures occur instantaneously in time. It is also of interest to consider models based on multichannel blind deconvolution [33,59,44,57,64,47]; however, we do not do so here.

2.2 Algorithms for ICA

The earliest ICA algorithm that we are aware of and one which generated much interest in the field is that proposed by [29]. Since then, various approaches have been proposed in the literature to implement ICA. These include: minimizing higher order moments [11] or higher order cumulants³ [14], minimization of mutual information of the outputs or maximization of the output entropy [6], minimization of the Kullback-Leibler divergence between the joint and the product of the marginal distributions of the outputs [3].

ICA algorithms are typically implemented in either off-line (batch) form or using an on-line approach. A standard approach for batch ICA algorithms is the following two-stage procedure [8,14].

Decorrelation or whitening. Here we seek to diagonalize the covariance matrix of the input signals.
Rotation. The second stage minimizes a measure of the higher order statistics which will ensure the non-Gaussian output signals are as statistically independent as possible. It can be shown that this can be carried out by a unitary rotation matrix [14]. This second stage provides the higher order independence.

This approach is sometimes referred to as "decorrelation and rotation". Note that this approach relies on the measured signals being non-Gaussian. For Gaussian signals, the cumulants of order higher than two are already zero and so no meaningful separation can be achieved by ICA methods. For non-Gaussian random signals the implication is that not only should the signals be uncorrelated, but that the higher order cross-statistics (e.g., moments or cumulants) are zeroed.
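The role of non-Gaussianity can be seen from the normalized fourth-order cumulant (excess kurtosis) of a single signal. The sketch below uses synthetic samples; the function name and sample sizes are illustrative choices, not part of the paper.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample fourth-order cumulant of x, normalized by the squared variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.mean(x ** 2)
    return np.mean(x ** 4) / var ** 2 - 3.0

rng = np.random.default_rng(0)
gauss = rng.normal(size=200_000)    # Gaussian: excess kurtosis near 0
heavy = rng.laplace(size=200_000)   # Laplace: excess kurtosis near 3

k_gauss = excess_kurtosis(gauss)
k_heavy = excess_kurtosis(heavy)
```

A Gaussian signal gives a value close to zero, so fourth-order statistics carry no information for separating it, whereas a heavy-tailed signal gives a clearly nonzero value.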

The empirical study carried out in this paper uses the JADE (Joint Approximate Diagonalization of Eigenmatrices) algorithm [14] which is a batch algorithm and is an efficient version of the above two step procedure. The first stage is performed by computing the sample covariance matrix, giving the second order statistics of the observed outputs. From this, a matrix is computed by eigendecomposition which whitens the observed data. The second stage consists of finding a rotation matrix which jointly diagonalizes eigenmatrices formed from the fourth order cumulants of the whitened data. The outputs from this stage are the independent components. For specific details of the algorithm, the reader is referred to [14]. The JADE algorithm has been extended by [50]. Other examples of two-step methods were proposed in [11,21,8].
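The first (whitening) stage described above can be sketched as follows. This is not the JADE implementation of [14]; it only illustrates whitening by eigendecomposition of the sample covariance matrix, on synthetic data, and it assumes the covariance matrix has full rank. The second stage (the joint diagonalization of cumulant eigenmatrices) is not shown.

```python
import numpy as np

def whiten(x):
    """Whiten x (n signals by T samples) using an eigendecomposition
    of the sample covariance matrix.
    Returns (z, V) with z = V @ (x - mean) and cov(z) = I."""
    xc = x - x.mean(axis=1, keepdims=True)
    cov = xc @ xc.T / xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)   # cov = E diag(d) E'
    V = np.diag(eigvals ** -0.5) @ eigvecs.T # whitening matrix
    return V @ xc, V

# Synthetic mixtures, for illustration only.
rng = np.random.default_rng(1)
s = rng.laplace(size=(4, 5000))
A = rng.normal(size=(4, 4))
x = A @ s

z, V = whiten(x)
cov_z = z @ z.T / z.shape[1]                 # approximately the identity
```

After whitening, the components are uncorrelated with unit variance, and any remaining dependence can only be removed by a rotation, which is what the second stage estimates.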

A wide variety of on-line algorithms have been proposed [3,6,15-23,27,28,31,34-39,43,46,49]. Many of these algorithms are sometimes referred to as "neural" learning algorithms. They employ a cost function which is optimized by adjusting the demixing matrix to increase independence of outputs.

ICA algorithms have been developed recently using the natural gradient approach [3,1]. A similar approach was independently derived by Cardoso [13] who referred to it as a relative gradient algorithm. This theoretically sound modification to the usual on-line updating algorithm overcomes the problem of having to perform matrix inversions at each time step and therefore permits significantly faster convergence.
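For orientation, the two update rules are commonly written in the following form. The learning rate \eta and the nonlinearity \varphi (applied componentwise, and chosen according to the assumed source distributions) are notation introduced here, not symbols used elsewhere in this paper:

$$\Delta W = \eta \left[ (W')^{-1} - \varphi(y)\, x' \right] \qquad \text{(ordinary gradient)}$$

$$\Delta W = \eta \left[ I - \varphi(y)\, y' \right] W \qquad \text{(natural gradient)}$$

The second rule follows from the first by right-multiplying with W'W and using y = Wx, which is how the matrix inverse disappears from the update.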

Another approach, known as contextual ICA was developed in [48]. In this method, which is based on maximum likelihood estimation, the source distributions are modeled and the temporal nature of the signals is used to derive the demixing matrix. The density functions of the input sources are estimated using past values of the outputs. This algorithm proved to be effective in separating signals having colored Gaussian distributions or low kurtosis.

The ICA framework has also been extended to allow for nonlinear mixing. One of the first approaches in this area was given in [10]. More recently, an information theoretic approach to estimating sources assumed to be mixed and passed through an invertible nonlinear function was proposed in [63] and [62]. Unsupervised learning algorithms based on maximizing entropy and minimizing mutual information are described in [61]. Lin, Grier and Cowan describe a local version of ICA [38]. Rather than finding one global coordinate transformation, local ICAs are carried out for subsets of the data. While using invertible transformations, this is a promising way to express global nonlinearities.

ICA algorithms for mixed and convolved signals have also been considered, see for example [24,33,44,47,52,57,59,64].

3 ICA in Finance

3.1 Reasons to Explore ICA in Finance

ICA provides a mechanism of decomposing a given signal into statistically independent components. The goal of this paper is to explore whether ICA can give some indication of the underlying structure of the stock market. The hope is to find interpretable factors of instantaneous stock returns. Such factors could include news (government intervention, natural or man-made disasters, political upheaval), response to very large trades and of course, unexplained noise. Ultimately, we hope that this might yield new ways of analyzing and forecasting financial time series, contributing to a better understanding of financial markets.

3.2 Preprocessing

Like most time series approaches, ICA requires the observed signals to be stationary⁴. In this paper, we transform the nonstationary stock prices p(t), to stock returns by taking the difference between successive values of the prices, x(t) = p(t)-p(t-1). Given the relatively large change in price levels over the few years of data, an alternative would have been to use relative returns, log(p(t)) - log(p(t-1)), describing geometric growth as opposed to additive growth.

4 Analyzing Stock Returns with ICA

4.1 Description of the Data

To investigate the effectiveness of ICA techniques for financial time series, we apply ICA to data from the Tokyo Stock Exchange. We use daily closing prices from 1986 until 1989⁵ of the 28 largest firms, listed in the Appendix. Figure 2 shows the stock price of the first company in our set, the Bank of Tokyo-Mitsubishi, between August 1986 and July 1988. For the same time interval, Figure 3 displays the movements of the eight largest stocks, offset from each other for clarity.

Figure 2: The price of the Bank of Tokyo-Mitsubishi stock for the period 8/86 until 7/88. This bank is one of the largest companies traded on the Tokyo Stock Exchange.

Figure 3: The largest eight stocks on the Tokyo Stock Exchange for the period 8/86 until 7/88. In this figure, each stock has been offset for clarity. The approximate range for each stock is 2500 Yen. The lowest line displays the price of the Bank of Tokyo-Mitsubishi, shown also in the previous figure.

The preprocessing consists of three steps: we obtain the daily stock returns as indicated in Section 3.2, subtract the mean of each stock, and normalize the resulting values to lie within the range [ -1,1] . Figure 4 shows these normalized stock returns.
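The three preprocessing steps can be sketched as below. The paper does not state how the values were scaled into [-1,1]; dividing each stock's centered returns by their largest absolute value is an assumption made here, and the function name and toy prices are illustrative.

```python
import numpy as np

def preprocess(prices):
    """prices: array of shape (T, n), one column per stock.
    Returns normalized returns of shape (T - 1, n)."""
    returns = np.diff(prices, axis=0)          # x(t) = p(t) - p(t-1)
    returns = returns - returns.mean(axis=0)   # subtract each stock's mean
    # Assumed normalization: scale each stock by its largest |return|.
    # (A stock with constant returns would give a zero scale.)
    scale = np.abs(returns).max(axis=0)
    return returns / scale

# Toy prices for two hypothetical stocks over four days.
prices = np.array([[100.0, 50.0],
                   [102.0, 49.0],
                   [101.0, 53.0],
                   [105.0, 52.0]])
x = preprocess(prices)
```

Scaling after centering keeps each column at zero mean while bounding it by 1 in absolute value.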

Figure 4: The stock returns (differenced time series) of the first eight stocks for the period 8/86 until 7/88. The large negative return at day 317 corresponds to the crash of 19 October 1987. The lowest line again corresponds to the Bank of Tokyo-Mitsubishi. The question is: can ICA reveal useful information about these time series?

4.2 Structure of the Independent Components

We performed ICA on the stock returns using the JADE algorithm [14] described in Section 2.2. In all the experiments, we assume that the number of stocks equals the number of sources supplied to the mixing model.

In the results presented here, all 28 stocks are used as inputs in the ICA. However for clarity, the figures only display the first few ICs⁶. Figure 5 shows a subset of eight ICs obtained from the algorithm. Note that the goal of statistical independence forces the 1987 crash to be carried by only a few components.

Figure 5: The first eight ICs, resulting from the ICA of all 28 stocks. Note that the ICs can be seen as quite distinct shock inputs into the system.

We now present the analysis of a specific stock, the Bank of Tokyo-Mitsubishi. The contributions of the ICs to any given stock can be found as follows.

For a given stock return, there is a corresponding row of the mixing matrix A used to weight the independent components. By multiplying the corresponding row of A with the ICs, we obtain the weighted ICs. We define dominant ICs to be those ICs with the largest maximum signal amplitudes. They have the largest effect on the reconstructed stock price. In contrast, other criteria, such as the variance, would focus not on the largest value but on the average.

Figure 6 weights the ICs with the first row of the mixing matrix which corresponds to the Bank of Tokyo-Mitsubishi. The four traces at the bottom show the four most dominant ICs for this stock.

From the usual mixing process given by Eq. (1), we can obtain the reconstruction of the ith stock return in terms of the estimated ICs as

$$\hat{x}_i(t-j) = \sum_{k=1}^{n} a_{ik}\, y_k(t-j), \qquad j = 0,\ldots,N-1 \qquad (3)$$

where y_k(t-j) is the value of the kth estimated IC at time t-j and a_ik is the weight in the ith row, kth column of the estimated mixing matrix A (obtained as the inverse of the demixing matrix W). We define the weighted ICs for the ith observed signal (stock return) as

$$\bar{y}_{ik}(t-j) = a_{ik}\, y_k(t-j), \qquad k = 1,\ldots,n; \quad j = 0,\ldots,N-1. \qquad (4)$$

In this paper, we rank the weighted ICs with respect to the first stock return. Therefore, we multiply the ICs with the first row of the mixing matrix and use a_1k, k = 1,..,n, to obtain the weighted ICs. The weighted ICs are then sorted⁷ using an L_∞ norm since we are most interested in showing just those ICs which cause the maximum price change in a particular stock.
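The weighting of Eq. (4) and the L_∞ ranking can be sketched as follows. The function name, array layout and toy numbers are assumptions for illustration; A and y stand for the estimated mixing matrix and the estimated ICs.

```python
import numpy as np

def rank_weighted_ics(A, y, i=0):
    """A: (n, n) estimated mixing matrix; y: (n, N) estimated ICs.
    Returns (weighted, order), where weighted[k] = A[i, k] * y[k] as in
    Eq. (4) and order lists the IC indices from most to least dominant
    according to the largest absolute value (L-infinity norm)."""
    weighted = A[i][:, None] * y
    linf = np.abs(weighted).max(axis=1)
    order = np.argsort(-linf)
    return weighted, order

# Toy example with three hypothetical ICs of three samples each.
A = np.array([[1.0, -2.0, 0.5],
              [0.3,  1.0, 1.0],
              [1.0,  0.0, 2.0]])
y = np.array([[0.1, -0.2,  0.3],
              [0.5,  0.1, -0.1],
              [4.0,  0.2, -1.0]])
weighted, order = rank_weighted_ics(A, y, i=0)
```

Summing the weighted ICs over k gives back the reconstruction of Eq. (3) for the chosen stock, so the ranking only reorders the terms of that sum.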

Figure 6: The four most dominant ICs after weighting by the first row of the mixing matrix A (corresponding to the Bank of Tokyo-Mitsubishi) are shown starting from the bottom trace. The top trace is the summation of the remaining 24 ICs, the least dominant ICs for this stock. The weighted sum of all the ICs corresponds to the original stock return.

Figure 7: The dotted line on top is the original stock price. The solid line in the middle shows the reconstructed stock price using the four most dominant weighted ICs. The dashed line on the bottom shows the reconstructed residual stock price obtained by adding up the remaining 24 weighted ICs. Note that the major part of the true `shape' comes from the most dominant components; the contribution of the non-dominant ICs to the overall shape is only small. In this plot, the cumulative sum of the residual ICs is plotted at an offset of 1550.

The ICs obtained from the stock returns reveal the following aspects:

Only a few ICs contribute to most of the movements in the stock return.
Large amplitude transients in the dominant ICs contribute to the major level changes. The nondominant components do not contribute significantly to level changes.
Small amplitude ICs contribute to the change in levels over short time scales, but over the whole period, there is little change in levels.

Figure 7 shows the reconstructed price obtained using the four most dominant weighted ICs and compares it to the sum of the remaining 24 nondominant weighted ICs.

4.3 Thresholded ICs Characterize Turning Points

The preceding section discussed the effect of a lossy reconstruction of the original prices, obtained by considering the cumulative sums of only the first few dominant ICs. This section goes further and thresholds these dominant ICs. This sets all weighted IC values below a threshold to zero, and only uses those values above the threshold to reconstruct the signal.

The thresholded reconstructions are described by

$$\bar{x}_i(t-j) = \sum_{k=1}^{r} g\left( \bar{y}_{ik}(t-j) \right), \qquad j = 0,\ldots,N-1,$$

$$g(u) = \begin{cases} u, & |u| \ge \xi \\ 0, & |u| < \xi \end{cases} \qquad (5)$$

where $\bar{x}_i(t-j)$ are the returns constructed using thresholds, g(·) is the threshold function, r is the number of ICs used in the reconstruction and ξ is the threshold value. The threshold was set arbitrarily to a value which excluded almost all of the lower level components.

The reconstructed stock prices are found as

$$\hat{p}_i(j+1) = \hat{p}_i(j) + \bar{x}_i(j), \qquad j = t-N,\ldots,t-1,$$

$$\hat{p}_i(t-N) = p_i(t-N). \qquad (6)$$

For the first stock, the Bank of Tokyo-Mitsubishi, p₁(t-N) = 1550. By setting ξ = 0 and r = n the original price series is reconstructed exactly.
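Eqs. (5) and (6) can be sketched as below. The function names and the toy numbers are illustrative, and the sketch assumes the weighted ICs are expressed in price units (i.e., any normalization of the returns has already been undone); values exactly at the threshold are kept, which is an assumption.

```python
import numpy as np

def threshold(u, xi):
    """g(u) of Eq. (5): keep values with |u| >= xi, set the rest to zero."""
    return np.where(np.abs(u) >= xi, u, 0.0)

def reconstruct_price(weighted, order, r, xi, p0):
    """weighted: (n, N) weighted ICs for one stock, oldest sample first.
    order: IC indices sorted from most to least dominant.
    Returns N + 1 reconstructed prices starting from p0, as in Eq. (6)."""
    x_bar = threshold(weighted[order[:r]], xi).sum(axis=0)   # Eq. (5)
    return p0 + np.concatenate(([0.0], np.cumsum(x_bar)))

# Toy example: two hypothetical weighted ICs whose sum is the return series.
prices = np.array([1550.0, 1560.0, 1540.0, 1545.0])
weighted = np.array([[8.0, -15.0, 1.0],
                     [2.0,  -5.0, 4.0]])
order = np.array([0, 1])

exact = reconstruct_price(weighted, order, r=2, xi=0.0, p0=prices[0])
lossy = reconstruct_price(weighted, order, r=1, xi=5.0, p0=prices[0])
```

With ξ = 0 and r = n the cumulative sum returns the original prices, while a positive threshold on the single most dominant IC keeps only its large moves.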

Figure 8: Thresholded returns obtained from the four most dominant weighted ICs for the Bank of Tokyo-Mitsubishi.

The thresholded returns of the four most dominant ICs are shown in Figure 8, and the stock price reconstructed from the thresholded return values is shown in Figure 9. The figures indicate that the thresholded ICs provide useful morphological information and can extract the turning points of the original time series.

Figure 9: ICA results for the Bank of Tokyo-Mitsubishi: reconstructed prices for the Bank of Tokyo-Mitsubishi obtained by computing the cumulative sum of only the thresholded values displayed in the previous figure. Note that the price for 1,000 points plotted is characterized well by only a few innovations.

4.4 Comparison with PCA

PCA is a well established tool in finance. Applications range from Arbitrage Pricing Theory and factor models to input selection for multi-currency portfolios [58]. Here we seek to compare the performance of PCA with ICA.

Singular value decomposition (SVD) is used to obtain the principal components as follows. Let X denote the N ×n data matrix, where N is the number of observed vectors, and each vector has n components (usually N >> n). The data matrix can be decomposed into

$$X = U S V' \qquad (7)$$

where U is an N ×n column orthogonal matrix, S is an n ×n diagonal matrix consisting of the singular values and V is an n ×n orthogonal matrix. The principal components are given by XV = US, i.e., the orthonormal columns of U, weighted by the singular values from S.
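The decomposition of Eq. (7) can be sketched with NumPy as follows. The data matrix here is synthetic, and mean-centering its columns before the SVD is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 500, 4

# Synthetic N x n data matrix with correlated columns, for illustration only.
X = rng.normal(size=(N, n)) @ rng.normal(size=(n, n))
X = X - X.mean(axis=0)          # assumption: columns are mean-centered

# Thin SVD: U is N x n, s holds the n singular values, Vt is V'.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

pcs = X @ Vt.T                  # principal components, XV = US
gram = pcs.T @ pcs              # diagonal: the PCs are uncorrelated
```

NumPy returns the singular values in descending order, so the columns of pcs are already ordered by variance.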

In Figure 10 the four most dominant PCs corresponding to the Bank of Tokyo-Mitsubishi are shown. Figure 11 shows the reconstructed price obtained using the four most dominant PCs and compares it to the sum of the remaining 24 nondominant PCs.

The results from the PCs obtained from the stock returns reveal the following aspects:

The distinct shocks which were identified in the ICA case are much more difficult to observe.
While the first four PCs are, by construction, the best possible fit to the data in a quadratic error sense, they do not offer the same insight into the structure of the data as the ICs.
The dominant transients obtained from the PCs, i.e., after thresholding, do not lead to the same overall shape of the stock returns as the ICA approach. Hence we cannot draw the same conclusions about high level and low level signals in the data. The effect of thresholding is shown in Figures 12 and 13.

For the experiment reported here, the four most dominant PCs are the same, whether ordered in terms of variance or using the L_∞ norm as in the ICA case. Beyond that the orders change.

Figure 10: The four most dominant PCs corresponding to stock returns for the Bank of Tokyo-Mitsubishi.

Figure 11: The solid line in the middle shows the reconstructed stock price using the four most dominant PCs. The dashed line (lowest) shows the reconstructed stock price by adding up the remaining 24 PCs. The sum of the two lines corresponds to the true price; this highest line is identical to the true price (dotted line). In this case, the error is smaller than that obtained when using ICA. However, the overall shape of the stock is not reconstructed as well by the PCs. This can also be seen more clearly after thresholding in Figures 12 and 13.

Figure 12: Thresholded returns obtained from the four most dominant PCs applicable to the Bank of Tokyo-Mitsubishi.

Figure 13 shows that the stock price reconstructed from the thresholded returns is a poor fit to the overall shape of the original price. This implies that key high level transients that were extracted by ICA are not obtained through PCA.

Figure 13: PCA results for the Bank of Tokyo-Mitsubishi: reconstructed prices for the Bank of Tokyo-Mitsubishi. The graph was obtained using only the thresholded values displayed in the previous figure. In this case, the model does not capture the large transients observed in the ICA case and fails to adequately approximate the shape of the original stock price curve.

In summary, while PCA also decomposes the original data, the PCs do not possess the high order independence of the ICs. A major difference emerges when only the largest shocks of the estimated sources are used. While the cumulative sum of the largest IC shocks retains the overall shape, this is not the case for the PCs.

5 Conclusions

This paper applied independent component analysis to decompose the returns from a portfolio of 28 stocks into statistically independent components. The components of the instantaneous vectors of observed daily stocks are statistically dependent; stocks on average move together. In contrast, the components of the instantaneous daily vector of ICs are constructed to be statistically independent. This can be viewed as decomposing the returns into statistically independent sources. On three years of daily data from the Tokyo stock exchange, we showed that the estimated ICs fall into two categories, (i) infrequent but large shocks (responsible for the major changes in the stock prices), and (ii) frequent but rather small fluctuations (contributing only little to the overall level of the stocks).

We have shown that by using a portfolio of stocks, ICA can reveal some underlying structure in the data. Interestingly, the `noise' we observe may be attributed to signals within a certain amplitude range and not to signals in a certain (usually high) frequency range. Thus, ICA gives a fresh perspective to the problem of understanding the mechanisms that influence the stock market data.

In comparison to PCA, ICA is a complementary tool which allows the underlying structure of the data to be more readily observed. There are clearly many other avenues in which ICA techniques can be applied to finance.

Acknowledgements

We are grateful to Morio Yoda, Nikko Securities, Tokyo for kindly providing the data used in this paper, and to Jean-François Cardoso for making the source code for the JADE algorithm available. Andrew Back acknowledges support of the Frontier Research Program, RIKEN and would like to thank Seungjin Choi and Zhang Liqing for helpful discussions. Andreas Weigend acknowledges support from the National Science Foundation (ECS-9309786), and would like to thank Fei Chen, Elion Chin and Juan Lin for stimulating discussions.

Appendix

The experiments used the following 28 stocks:


1.	The Bank Of Tokyo-Mitsubishi	15.	Japan Asahi Bank
2.	Toyota Motor	16.	Tokai Bank
3.	Sumitomo Bank	17.	Honda Motor
4.	Fuji Bank	18.	Sony
5.	Dai-Ichi Kangyo Bank	19.	Seibu Railway
6.	Industrial Bank Of Japan	20.	Toshiba
7.	Sanwa Bank	21.	Ito-Yokado
8.	Matsushita Electric	22.	Kansai Electric Power
9.	Industrial Sakura Bank	23.	Nippon Steel
10.	Nomura Securities	24.	Mitsubishi Trust And Banking
11.	Tokyo Electric Power	25.	Nissan Motor
12.	Hitachi	26.	Denso
13.	Mitsubishi Heavy Industries	27.	Mitsubishi
14.	Seven-Eleven	28.	Tokio Marine

References

[1]: S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.
[2]: S. Amari and A. Cichocki. Blind signal processing- neural network approaches. Proc. IEEE, Special issue on blind identification and estimation, 1998. to appear.
[3]: S. Amari, A. Cichocki, and H.H. Yang. A new learning algorithm for blind signal separation. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems 8 (NIPS*95), pages 757-763, Cambridge, MA, 1996. The MIT Press.
[4]: P. Baldi and K. Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2:53-58, 1989.
[5]: Yoram Baram and Z. Roth. Forecasting by density shaping using neural networks. In Proceedings of the 1995 Conference on Computational Intelligence for Financial Engineering (CIFEr), pages 57-71, Piscataway, NJ, 1995. IEEE Service Center.
[6]: A.J. Bell and T.J. Sejnowski. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995.
[7]: A. Belouchrani, K. Abed Meraim, J.F. Cardoso, and É. Moulines. A blind source separation technique based on second order statistics. IEEE Trans. on S.P., 45(2):434-44, February 1997.
[8]: R. E. Bogner. Blind separation of sources. Technical Report 4559, Defence Research Agency, Malvern, May 1992.
[9]: H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59:291-294, 1988.
[10]: G. Burel. Blind separation of sources: a nonlinear neural algorithm. Neural Networks, 5:937-947, 1992.
[11]: J. Cardoso. Source separation using higher order moments. In International Conference on Acoustics, Speech and Signal Processing, pages 2109-2112, 1989.
[12]: J.F. Cardoso. Blind signal separation: a review. Proc. IEEE, Special issue on blind identification and estimation, 1998. To appear.
[13]: J.F. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Trans. Signal Processing, 44(12):3017-3030, December 1996.
[14]: J.F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. IEE Proc. F., 140(6):771-774, December 1993.
[15]: S. Choi, R. Liu, and A. Cichocki. A spurious equilibria-free learning algorithm for the blind separation of non-zero skewness signals. Neural Processing Letters, 7:1-8, 1998.
[16]: A. Cichocki, S. Amari, and R. Thawonmas. Blind signal extraction using self-adaptive non-linear hebbian learning rule. In Proc. of Int. Symposium on Nonlinear Theory and its Applications, NOLTA-96, pages 377-380, Kochi, Japan, 1996.
[17]: A. Cichocki and L. Moszczyński. New learning algorithm for blind separation of sources. Electronics Letters, 28(21):1986-1987, October 8 1992.
[18]: A. Cichocki, R. Thawonmas, and S. Amari. Sequential blind signal extraction in order specified by stochastic properties. Electronics Letters, 33(1):64-65, 1997.
[19]: A. Cichocki and R. Unbehauen. Neural Networks for Optimization and Signal Processing. Wiley, 1993.
[20]: A. Cichocki, R. Unbehauen, and E. Rummert. Robust learning algorithm for blind separation of signals. Electronics Letters, 30(17):1386-1387, 1994.
[21]: P. Comon. Separation of sources using high-order cumulants. In SPIE Conference on Advanced Algorithms and Architectures for Signal Processing, volume XII, pages 170-181, San Diego, CA, August 1989.
[22]: P. Comon. Independent component analysis - a new concept? Signal Processing, 36(3):287-314, 1994.
[23]: N. Delfosse and P. Loubaton. Adaptive blind separation of independent sources: a deflation approach. Signal Processing, 45:59-83, 1995.
[24]: S.C. Douglas and A. Cichocki. Neural networks for blind decorrelation of signals. IEEE Trans. Signal Processing, 45(11):2829-2842, 1997.
[25]: J. H. Friedman. Exploratory projection pursuit. Journal of the American Statistical Association, 82:249-266, 1987.
[26]: J.H. Friedman and J.W. Tukey. A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Computers, 23:881-889, 1974.
[27]: M. Girolami and C. Fyfe. An extended exploratory projection pursuit network with linear and nonlinear anti-hebbian connections applied to the cocktail party problem. Neural Networks, 10(9):1607-1618, 1997.
[28]: M. Girolami and C. Fyfe. Extraction of independent signal sources using a deflationary exploratory projection pursuit network with lateral inhibition. IEE Proceedings on Vision, Image and Signal Processing, 14(5):299-306, 1997.
[29]: J. Herault and C. Jutten. Space or time adaptive signal processing by neural network models. In J. S. Denker, editor, Neural Networks for Computing. Proceedings of AIP Conference, pages 206-211, New York, 1986. American Institute of Physics.
[30]: Peter J. Huber. Projection pursuit. The Annals of Statistics, 13:435-475, 1985.
[31]: A. Hyvärinen. Simple one-unit algorithms for blind source separation and blind deconvolution. In Progress in Neural Information Processing ICONIP'96, volume 2, pages 1201-1206. Springer, 1996.
[32]: C. Jutten and J. Herault. Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991.
[33]: C. Jutten, H.L. Nguyen Thi, E. Dijkstra, E. Vittoz, and J. Caelen. Blind separation of sources, an algorithm for separation of convolutive mixtures. In Proceedings of Int. Workshop on High Order Statistics, pages 273-276, Chamrousse (France), 1991.
[34]: J. Karhunen. Neural approaches to independent component analysis and source separation. In Proceedings of 4th European Symp. on Artificial Neural Networks (ESANN'96), pages 249-266, Bruges, Belgium, April 1996.
[35]: J.L. Lacoume and P. Ruiz. Separation of independent sources from correlated inputs. IEEE Trans. Signal Processing, 40:3074-3078, December 1992.
[36]: T.W. Lee, M. Girolami, A.J. Bell, and T. Sejnowski. A unifying information theoretic framework for independent component analysis. International Journal on Mathematical and Computer Modelling, to appear, 1998.
[37]: S. Li and T.J. Sejnowski. Adaptive separation of mixed broad-band sound sources with delays by a beamforming Hérault-Jutten network. IEEE Journal of Oceanic Engineering, 20(1):73-79, January 1995.
[38]: Juan K. Lin, David G. Grier, and Jack D. Cowan. Faithful representation of separable distributions. Neural Computation, 9:1305-1320, 1997.
[39]: X.-T. Ling, Y.-F. Huang, and R. Liu. A neural network for blind signal separation. In Proc. of IEEE Int. Symposium on Circuits and Systems (ISCAS-94), pages 69-72, New York, NY, 1994. IEEE Press.
[40]: J. E. Moody and L. Wu. What is the “true price”? - State space models for high frequency financial data. In Progress in Neural Information Processing (ICONIP'96), pages 697-704, Berlin, 1996. Springer.
[41]: J. E. Moody and L. Wu. What is the “true price”? - State space models for high frequency FX data. In A. S. Weigend, Y. S. Abu-Mostafa, and A.-P. N. Refenes, editors, Decision Technologies for Financial Engineering (Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets, NNCM-96), pages 346-358, Singapore, 1997. World Scientific.
[42]: J. E. Moody and L. Wu. What is the “true price”? - State space models for high frequency FX data. In Proceedings of the IEEE/IAFE 1997 Conference on Computational Intelligence for Financial Engineering (CIFEr), pages 150-156, Piscataway, NJ, 1997. IEEE Service Center.
[43]: E. Moreau and O. Macchi. Complex self-adaptive algorithms for source separation based on high order contrasts. In Signal Processing VII Proceedings of EUSIPCO-94, pages 1157-1160, Lausanne, Switzerland, 1994. EURASIP.
[44]: H-L. Nguyen Thi and C. Jutten. Blind source separation for convolutive mixtures. Signal Processing, 45(2):209-229, 1995.
[45]: E. Oja. Neural networks, principal components and subspaces. International Journal of Neural Systems, 1:61-68, 1989.
[46]: E. Oja and J. Karhunen. Signal separation by nonlinear hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. Marks II, D. Fogel, and T. Fukuda, editors, Computational Intelligence - A Dynamic System Perspective, pages 83-97, New York, NY, 1995. IEEE Press.
[47]: Lucas Parra, Clay Spence, and Bert de Vries. Convolutive source separation and signal modeling with ML. In International Symposium on Intelligent Systems (ISIS'97), University of Reggio Calabria, Italy, 1997.
[48]: Barak A. Pearlmutter and Lucas C. Parra. Maximum likelihood blind source separation: A context-sensitive generalization of ICA. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9 (NIPS*96), pages 613-619. MIT Press, Cambridge, MA, 1997.
[49]: D. Pham, P. Garat, and C. Jutten. Separation of a mixture of independent sources through a maximum likelihood approach. In J. Vandevalle, R. Boite, M. Moonen, and A. Oosterlink, editors, Signal Processing VI: Theories and Applications, pages 771-774. Elsevier, 1992.
[50]: K.J. Pope and R.E. Bogner. Blind separation of speech signals. In Proc. of the Fifth Australian Int. Conf. on Speech Science and Technology, pages 46-50, Perth, Western Australia, December 6-8 1994.
[51]: K.J. Pope and R.E. Bogner. Blind signal separation. I: Linear, instantaneous combinations. Digital Signal Processing, 6:5-16, 1996.
[52]: K.J. Pope and R.E. Bogner. Blind signal separation. II: Linear, convolutive combinations. Digital Signal Processing, 6:17-28, 1996.
[53]: B.D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
[54]: Avi Silberschatz and Alexander Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8(6):970-974, 1996.
[55]: L. Tong, R.W. Liu, V.C. Soon, and Y.F. Huang. Indeterminacy and identifiability of blind identification. IEEE Trans. Circuits, Syst., 38(5):499-509, May 1991.
[56]: L. Tong, V. C. Soon, Y. F. Huang, and R. Liu. AMUSE: A new blind identification algorithm. In International Conference on Acoustics, Speech and Signal Processing, pages 1784-1787, 1990.
[57]: K. Torkkola. Blind separation of convolved sources based on information maximization. In S. Usui, Y. Tohkura, S. Katagiri, and E. Wilson, editors, Proc. of the 1996 IEEE Workshop Neural Networks for Signal Processing 6 (NNSP96), pages 423-432, New York, NY, 1996. IEEE Press.
[58]: J. Utans, W. T. Holt, and A. N. Refenes. Principal component analysis for modeling multi-currency portfolios. In A. S. Weigend, Y. S. Abu-Mostafa, and A.-P. N. Refenes, editors, Decision Technologies for Financial Engineering (Proceedings of the Fourth International Conference on Neural Networks in the Capital Markets, NNCM-96), pages 359-368, Singapore, 1997. World Scientific.
[59]: E. Weinstein, M. Feder, and A.V. Oppenheim. Multi-channel signal separation by de-correlation. IEEE Trans. Speech and Audio Processing, 1(10):405-413, 1993.
[60]: L. Wu and J. Moody. Multi-effect decompositions for financial data modelling. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9 (NIPS*96), pages 995-1001. MIT Press, Cambridge, MA, 1997.
[61]: H. H. Yang and S. Amari. Adaptive on-line learning algorithms for blind separation: Maximum entropy and minimum mutual information. Neural Computation, 9:1457-1482, 1997.
[62]: H.H. Yang, S. Amari, and A. Cichocki. Information-theoretic approach to blind separation of sources in non-linear mixture. Signal Processing, 64(3):291-300, 1998.
[63]: Howard H. Yang, S. Amari, and Andrzej Cichocki. Information back-propagation for blind separation of sources in non-linear mixtures. In IEEE International Conference on Neural Networks, Houston TX (ICNN'97), pages 2141-2146. IEEE-Press, 1997.
[64]: D. Yellin and E. Weinstein. Multichannel signal separation: Methods and analysis. IEEE Transactions on Signal Processing, 44:106-118, 1996.

Footnotes:

¹ We are aware only of [5], who use a neural network that maximizes output entropy, and of [40,41,42,60], who apply ICA in the context of state space models for interbank foreign exchange rates to improve the separation between observational noise and the “true price.”

² ICA algorithms based on second order statistics have also been proposed [7,56].

³ For four zero-mean variables y_i, y_j, y_k, y_l, the fourth order cumulant is given by

E [y_i y_j y_k y_l ] - E [ y_i y_j] E [ y_k y_l ] - E [ y_i y_k] E [ y_j y_l ] - E [ y_i y_l] E [ y_j y_k ] .

This is the difference between the expected value E[ ·] of the product of the four variables (fourth moment) and the three products of pairs of covariances (second moments). The diagonal elements (i = j = k = l) are the fourth order self-cumulants.
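
As an illustration of the expression in this footnote, the following minimal sketch estimates the fourth order cumulant from samples by replacing each expectation with a sample mean. The function name `fourth_order_cumulant` and the use of NumPy are assumptions made for this example and are not part of the original paper; the sample mean is removed first because the formula assumes zero-mean variables.

```python
import numpy as np

def fourth_order_cumulant(yi, yj, yk, yl):
    """Sample estimate of the fourth order cumulant of four variables.

    Hypothetical helper for illustration: expectations E[.] are
    replaced by sample means over equally long 1-D arrays.
    """
    # The formula assumes zero-mean variables, so remove the sample mean.
    yi, yj, yk, yl = (np.asarray(v, dtype=float) - np.mean(v)
                      for v in (yi, yj, yk, yl))
    # Fourth moment: E[y_i y_j y_k y_l]
    m4 = np.mean(yi * yj * yk * yl)
    # Subtract the three products of pairs of second moments.
    return (m4
            - np.mean(yi * yj) * np.mean(yk * yl)
            - np.mean(yi * yk) * np.mean(yj * yl)
            - np.mean(yi * yl) * np.mean(yj * yk))

def self_cumulant(y):
    """Diagonal element: all four arguments are the same variable."""
    return fourth_order_cumulant(y, y, y, y)
```

For a Gaussian signal the self-cumulant estimated this way should be close to zero, whereas a heavy-tailed signal, such as one dominated by infrequent large shocks, should give a clearly positive value.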

⁴ A signal x(t) is considered to be stationary if the expected value is constant, or, after removing a constant mean, E[x(t)] = 0. In practice, however, this definition depends on the interval over which we wish to measure the expectation.

⁵ We chose a subset of available historical data on which to test the method. This allows us to reserve subsequent data for further experimentation.

⁶ We also explored the effect of reducing the number of stocks that entered the ICA. The result is that the signal separation degrades when fewer stocks are used. In that case, the independent components appear to give less distinct information. We had access to data for a maximum of 28 stocks.

⁷ ICs can be sorted in various ways. For example, in the implementation of the JADE algorithm, Cardoso used a Euclidean norm to sort the rows of the demixing matrix W according to their contribution across all signals [14].
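
One way to picture the norm-based sorting mentioned in this footnote is the sketch below. It is not Cardoso's JADE code: the function name `sort_components_by_row_norm`, the array layout (one row of W and one row of Y per component), and the choice of descending order are all assumptions made for illustration.

```python
import numpy as np

def sort_components_by_row_norm(W, Y):
    """Reorder components by the Euclidean norm of the rows of W.

    Hypothetical helper for illustration only.
    W : (n_components, n_signals) demixing matrix
    Y : (n_components, n_samples) estimated components, Y = W X
    """
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Euclidean norm of each row of W.
    norms = np.linalg.norm(W, axis=1)
    # Assumed convention: largest norm first.
    order = np.argsort(norms)[::-1]
    return W[order], Y[order], norms[order]
```

Because each row of W is reordered together with its component, the relation Y = W X is preserved after sorting.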
