International Journal of Neural Systems, Vol. 8,
No.5 (October, 1997).
(Special Issue on Data Mining in Finance.)
A First Application of Independent Component Analysis
to Extracting Structure from Stock Returns
Andrew D. Back
Brain Science Institute
The Institute of Physical and Chemical Research (RIKEN)
2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
back@brain.riken.go.jp
www.open.brain.riken.go.jp/~back/
Andreas S. Weigend
Department of Information Systems
Leonard N. Stern School of Business
New York University
44 West Fourth Street, MEC 974
New York, NY 10012, USA
aweigend@stern.nyu.edu
www.stern.nyu.edu/~aweigend
In this paper we consider the application of a signal processing
technique known as independent component analysis (ICA) or blind source separation, to
multivariate financial time series such as a portfolio of stocks. The key idea of ICA is
to linearly map the observed multivariate time series into a new space of statistically
independent components (ICs). We apply ICA to three years of daily returns of the 28
largest Japanese stocks and compare the results with those obtained using principal
component analysis (PCA). The results indicate that the estimated ICs fall into two
categories: (i) infrequent but large shocks (responsible for the major changes in the
stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall
level of the stocks). We show that the overall stock price can be reconstructed
surprisingly well by using a small number of thresholded weighted ICs. In contrast, when
using shocks derived from principal components instead of independent components, the
reconstructed price is less similar to the original one. Independent component analysis is
shown to be a potentially powerful method of analysing and understanding driving
mechanisms in financial time series.
1 Introduction
What drives the movements of a financial time series? This surely is a question of
interest to many, ranging from researchers who wish to understand financial markets, to
traders who will benefit from such knowledge. Can modern knowledge discovery and data
mining techniques help discover some of the underlying forces?
In this paper, we focus on a new technique which to our knowledge has not been used in
any significant application to financial or econometric problems^{1}. The method is known as independent component
analysis (ICA) and is also referred to as blind source separation [29,32,22]. The central assumption is that an observed
multivariate time series (such as daily stock returns) reflects the reaction of a system
(such as the stock market) to a few statistically independent time series. ICA seeks to
extract these independent components (ICs) as well as the mixing process.
ICA can be expressed in terms of the related concepts of entropy [6], mutual information [3], contrast functions [22]
and other measures of the statistical independence of signals. For independent signals,
the joint probability can be factorized into the product of the marginal probabilities.
Therefore the independent components can be found by minimizing the Kullback-Leibler
divergence between the joint probability and marginal probabilities of the output signals
[3]. Hence, the goal of finding
statistically independent components can be expressed in several ways:
- Find a set of directions that factorize the joint probabilities.
- Find a set of directions with minimum mutual information. When the mutual information
between variables vanishes, they are statistically independent.
- Find a set of ``interesting'' directions. The goal of finding interesting directions is similar
to projection pursuit [26,30,25]. In the knowledge
discovery and data mining community the term ``interestingness'' [53] is also used to denote unexpectedness [54].
From this basis, algorithms can be derived to extract the desired independent
components. In general, these algorithms can be considered as unsupervised learning
procedures. Recent reviews are given in [2,12,51,52].
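To make the mutual-information criterion concrete, here is a small numpy sketch (synthetic data, not code from this paper): a histogram estimate of the mutual information, which is exactly the Kullback-Leibler divergence between the joint probability and the product of the marginals. It is near zero for independent signals and clearly positive for dependent ones.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) = KL(p(x,y) || p(x)p(y)), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                          # joint probability table
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y
    nz = pxy > 0                              # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, 20000)
b = rng.uniform(-1, 1, 20000)        # independent of a
c = a + 0.1 * b                      # strongly dependent on a

mi_indep = mutual_information(a, b)  # close to zero
mi_dep = mutual_information(a, c)    # substantially positive
```

Minimizing such a quantity over demixing directions is one way to state the ICA objective.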
Independent component analysis can also be contrasted with principal component analysis
(PCA) and so we give a brief comparison of the two methods here. Both ICA and PCA linearly
transform the observed signals into components. The key difference however, is in the type
of components obtained. The goal of PCA is to obtain principal components which are
uncorrelated. Moreover, PCA gives projections of the data in the direction of the maximum
variance. The principal components (PCs) are ordered in terms of their variances: the
first PC defines the direction that captures the maximum variance possible, the second PC
defines (in the remaining orthogonal subspace) the direction of maximum variance, and so
forth. In ICA however, we seek to obtain statistically independent components.
PCA algorithms use only second order statistical information. On the other hand,
ICA algorithms may use higher order^{2}
statistical information for separating the signals, see for example [11,22]. For this reason,
non-Gaussian signals (or at most, one Gaussian signal) are normally required for ICA
algorithms based on higher order statistics. For PCA algorithms, however, the higher order
statistical information provided by such non-Gaussian signals is neither required nor used;
hence the signals in this case can be Gaussian. PCA algorithms can be implemented with
batch algorithms or with online algorithms. Examples of online or ``neural'' PCA
algorithms include [9,4,45].
This paper is organized in the following way. Section 2
provides a background to ICA and a guide to some of the algorithms available.
Section 3 discusses some issues concerning the general
application of ICA to financial time series. Our specific experimental results for the
application of ICA to Japanese equity data are given in section 4. In this section we compare results obtained using both ICA
and PCA. Section 5 draws some conclusions about the
use of ICA in financial time series.
2 ICA in General
2.1 Independent Component Analysis
ICA denotes the process of taking a set of measured signal vectors, x, and extracting
from them a (new) set of statistically independent vectors, y, called the independent
components or the sources. They are estimates of the original source signals which are
assumed to have been mixed in some prescribed manner to form the observed signals.
Figure 1: Schematic representation of ICA. The original sources s are mixed through matrix
A to form the observed signal x. The demixing matrix W transforms the
observed signal x into the independent components y.
Figure 1 shows the most basic form of
ICA. We use the following notation: We observe a multivariate time series { x_{i}(t)}
, i = 1,...,n, consisting of n values at each time step t. We assume that it is
the result of a mixing process
x_{i}(t) = \sum_{j=1}^{n} a_{ij} s_{j}(t) .   (1)
Using the instantaneous observation vector x(t) = [x_{1}(t),x_{2}(t),...,x_{n}(t)]^{′}, where ′ indicates the transpose
operator, the problem is to find a demixing matrix W such that

y(t) = W x(t) = W A s(t) ,   (2)

where A is the unknown mixing matrix. We assume throughout this paper that there
are as many observed signals as there are sources, hence A is a square n × n
matrix. If W = A^{-1}, then y(t) = s(t), and perfect
separation occurs. In general, it is only possible to find W such that WA = PD,
where P is a permutation matrix and D is a diagonal scaling matrix [55].
To find such a matrix W, the following assumptions are made:
- The sources { s_{j}(t)} are statistically independent. While this might sound
strong, it is not an unreasonable assumption when one considers, for example, sources of
very different origins ranging from foreign politics to microeconomic variables that might
impact a stock price.
- At most one source has a Gaussian distribution. In the case of financial data, normally
distributed signals are so rare that allowing for only one of them is not a serious
restriction.
- The signals are stationary. Stationarity is a standard assumption that enters almost all
modeling efforts, not only ICA.
In this paper, we only consider the case when the mixtures occur instantaneously in
time. It is also of interest to consider models based on multichannel blind deconvolution
[33,59,44,57,64,47]; however, we do not do this here.
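The mixing and demixing model of Eq. (1) can be sketched in a few lines of numpy (the 2 × 2 mixing matrix and the Laplacian sources are illustrative choices, not values from this paper); the last lines illustrate the PD indeterminacy, i.e. that a permuted and rescaled demixing matrix separates just as well:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 2, 5000
S = rng.laplace(size=(n, T))            # two independent, non-Gaussian sources
A = np.array([[1.0, 0.6],               # hypothetical mixing matrix
              [0.4, 1.0]])
X = A @ S                               # observed signals, Eq. (1)

W = np.linalg.inv(A)                    # ideal demixing matrix
Y = W @ X                               # perfect separation: Y equals S

# The PD indeterminacy: a permuted and rescaled W separates just as well.
P = np.array([[0.0, 1.0], [1.0, 0.0]])  # permutation matrix
D = np.diag([2.0, -0.5])                # diagonal scaling matrix
Y2 = (P @ D @ W) @ X                    # rows are still proportional to the sources
```

Here `Y2` recovers the sources only up to order and scale, which is all any ICA algorithm can promise.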
2.2 Algorithms for ICA
The earliest ICA algorithm that we are aware of, and one which generated much interest
in the field, is that proposed by [29]. Since
then, various approaches have been proposed in the literature to implement ICA. These
include: minimizing higher order moments [11]
or higher order cumulants^{3} [14], minimization of mutual information of the
outputs or maximization of the output entropy [6],
and minimization of the Kullback-Leibler divergence between the joint and the product of the
marginal distributions of the outputs [3].
ICA algorithms are typically implemented in either offline (batch) form or using an
online approach. A standard approach for batch ICA algorithms is the following two-stage
procedure [8,14].
- Decorrelation or whitening. Here we seek to diagonalize the covariance matrix
of the input signals.
- Rotation. The second stage minimizes a measure of the higher order statistics
which will ensure the non-Gaussian output signals are as statistically independent as
possible. It can be shown that this can be carried out by a unitary rotation matrix [14]. This second stage provides the higher
order independence.
This approach is sometimes referred to as ``decorrelation and rotation''. Note that
this approach relies on the measured signals being non-Gaussian. For Gaussian signals, the
higher order statistics are already zero, and so no meaningful separation can be achieved
by ICA methods. For non-Gaussian random signals the implication is that not only should
the signals be uncorrelated, but the higher order cross-statistics (e.g. moments or
cumulants) should be zeroed as well.
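The whitening stage can be sketched as follows (synthetic data; the whitening matrix is built from the eigendecomposition of the sample covariance, so the whitened signals have identity covariance):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.laplace(size=(2, 10000))            # independent sources
A = np.array([[1.0, 0.8], [0.3, 1.0]])      # hypothetical mixing matrix
X = A @ S                                   # correlated mixtures

# Stage 1: whiten.  Eigendecompose the sample covariance and rescale
# so the whitened signals have unit variance and are uncorrelated.
C = np.cov(X)
eigval, E = np.linalg.eigh(C)
V = np.diag(eigval ** -0.5) @ E.T           # whitening matrix
Z = V @ X

C_white = np.cov(Z)                         # identity matrix, up to rounding
```

Whitening alone only removes second order dependence; the remaining rotation is what distinguishes ICA from PCA.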
The empirical study carried out in this paper uses the JADE (Joint Approximate
Diagonalization of Eigenmatrices) algorithm [14]
which is a batch algorithm and is an efficient version of the above two-step procedure.
The first stage is performed by computing the sample covariance matrix, giving the second
order statistics of the observed outputs. From this, a matrix is computed by
eigendecomposition which whitens the observed data. The second stage consists of finding a
rotation matrix which jointly diagonalizes eigenmatrices formed from the fourth order
cumulants of the whitened data. The outputs from this stage are the independent
components. For specific details of the algorithm, the reader is referred to [14]. The JADE algorithm has been extended by [50]. Other examples of two-step methods were proposed
in [11,21,8].
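The whiten-then-rotate structure can be illustrated end to end for two sources with a deliberately simplified stand-in for the second stage: instead of JADE's joint diagonalization of cumulant eigenmatrices, we scan a single rotation angle and keep the rotation that maximizes the summed absolute kurtosis of the outputs. This toy is not the JADE algorithm; it only illustrates the two-stage structure on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 20000))             # independent non-Gaussian sources
A = np.array([[1.0, 0.7], [0.2, 1.0]])       # hypothetical mixing matrix
X = A @ S

# Stage 1: whiten via eigendecomposition of the sample covariance.
eigval, E = np.linalg.eigh(np.cov(X))
Z = np.diag(eigval ** -0.5) @ E.T @ X

def kurtosis(y):
    """Excess kurtosis of a 1-D signal (zero for a Gaussian)."""
    y = (y - y.mean()) / y.std()
    return (y ** 4).mean() - 3.0

# Stage 2: a unitary rotation.  Scan angles, keep the rotation that
# maximizes the total absolute kurtosis of the rotated outputs.
best_theta = max(
    np.linspace(0.0, np.pi / 2, 180),
    key=lambda t: sum(abs(kurtosis(y)) for y in
                      np.array([[np.cos(t), -np.sin(t)],
                                [np.sin(t),  np.cos(t)]]) @ Z),
)
R = np.array([[np.cos(best_theta), -np.sin(best_theta)],
              [np.sin(best_theta),  np.cos(best_theta)]])
Y = R @ Z                                    # estimated independent components

# Each output should match one source, up to sign and scale.
corr = np.corrcoef(np.vstack([Y, S]))[:2, 2:]
```

Each row of `corr` has one entry with magnitude near one, reflecting recovery up to permutation and sign.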
A wide variety of online algorithms have been proposed [3,6,15-23,27,28,31,34-39,43,46,49]. Many of
these algorithms are sometimes referred to as ``neural" learning algorithms. They
employ a cost function which is optimized by adjusting the demixing matrix to increase
independence of outputs.
ICA algorithms have been developed recently using the natural gradient approach [3,1]. A similar approach was independently derived by Cardoso [13] who referred to it as a relative gradient
algorithm. This theoretically sound modification to the usual online updating algorithm
overcomes the problem of having to perform matrix inversions at each time step and
therefore permits significantly faster convergence.
Another approach, known as contextual ICA was developed in [48]. In this method, which is based on maximum likelihood
estimation, the source distributions are modeled and the temporal nature of the signals is
used to derive the demixing matrix. The density functions of the input sources are
estimated using past values of the outputs. This algorithm proved to be effective in
separating signals having colored Gaussian distributions or low kurtosis.
The ICA framework has also been extended to allow for nonlinear mixing. One of the
first approaches in this area was given in [10].
More recently, an information theoretic approach to estimating sources assumed to be mixed
and passed through an invertible nonlinear function was proposed in [63] and [62]. Unsupervised learning algorithms based on maximizing entropy
and minimizing mutual information are described in [61]. Lin, Grier and Cowan describe a local version of ICA [38]. Rather than finding one global
coordinate transformation, local ICAs are carried out for subsets of the data. While using
only invertible transformations, this is a promising way to express global nonlinearities.
ICA algorithms for mixed and convolved signals have also been considered, see for
example [24,33,44,47,52,57,59,64].
3 ICA in Finance
3.1 Reasons to Explore ICA in Finance
ICA provides a mechanism of decomposing a given signal into statistically independent
components. The goal of this paper is to explore whether ICA can give some indication of
the underlying structure of the stock market. The hope is to find interpretable factors of
instantaneous stock returns. Such factors could include news (government intervention,
natural or man-made disasters, political upheaval), response to very large trades and of
course, unexplained noise. Ultimately, we hope that this might yield new ways of analyzing
and forecasting financial time series, contributing to a better understanding of financial
markets.
3.2 Preprocessing
Like most time series approaches, ICA requires the observed signals to be stationary^{4}. In this paper, we transform the
non-stationary stock prices p(t) to stock returns by taking the difference between
successive values of the prices, x(t) = p(t) - p(t-1). Given the relatively large change in
price levels over the few years of data, an alternative would have been to use relative
returns, log(p(t)) - log(p(t-1)), describing geometric growth as opposed to additive
growth.
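Both return definitions in numpy, with a hypothetical price series (the values are illustrative, not data from this paper):

```python
import numpy as np

p = np.array([1500.0, 1512.0, 1498.0, 1530.0, 1525.0])  # hypothetical daily closes

simple_returns = p[1:] - p[:-1]       # x(t) = p(t) - p(t-1), as used in this paper
log_returns = np.diff(np.log(p))      # log p(t) - log p(t-1), relative returns
```

The log returns telescope: their sum recovers the overall growth factor log(p(T)/p(0)), which is why they describe geometric rather than additive growth.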
4 Analyzing Stock Returns with ICA
4.1 Description of the Data
To investigate the effectiveness of ICA techniques for financial time series, we apply
ICA to data from the Tokyo Stock Exchange. We use daily closing prices from 1986 until
1989^{5} of the 28 largest firms,
listed in the Appendix. Figure 2 shows the stock price of the
first company in our set, the Bank of Tokyo-Mitsubishi, between August 1986 and July 1988.
For the same time interval, Figure 3 displays the movements
of the eight largest stocks, offset from each other for clarity.
Figure 2: The price of the Bank of Tokyo-Mitsubishi stock for the period 8/86 until 7/88.
This bank is one of the largest companies traded on the Tokyo Stock Exchange.
Figure 3: The largest eight stocks on the Tokyo Stock Exchange for the period 8/86 until
7/88. In this figure, each stock has been offset for clarity. The approximate range for
each stock is 2500 Yen. The lowest line displays the price of the Bank of
Tokyo-Mitsubishi, shown also in the previous figure.
The preprocessing consists of three steps: we obtain the daily stock returns as
indicated in Section 3.2, subtract the mean of each stock, and
normalize the resulting values to lie within the range [-1,1]. Figure 4 shows these normalized stock returns.
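The three preprocessing steps can be sketched as follows on a synthetic price panel; scaling each stock by its maximum absolute return is our assumption about how the values are brought into [-1,1]:

```python
import numpy as np

def preprocess(prices):
    """prices: (T, n) array of daily closes -> (T-1, n) normalized returns."""
    x = np.diff(prices, axis=0)           # step 1: daily returns
    x = x - x.mean(axis=0)                # step 2: subtract each stock's mean
    x = x / np.abs(x).max(axis=0)         # step 3: scale each stock into [-1, 1]
    return x

rng = np.random.default_rng(4)
prices = 1500 + rng.normal(size=(750, 28)).cumsum(axis=0)  # synthetic 28-stock panel
returns = preprocess(prices)
```

After these steps each column has zero mean and a maximum absolute value of one.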
Figure 4: The stock returns (differenced time series) of the first eight stocks for the
period 8/86 until 7/88. The large negative return at day 317 corresponds to the crash of
19 October 1987. The lowest line again corresponds to the Bank of Tokyo-Mitsubishi. The
question is: can ICA reveal useful information about these time series?
4.2 Structure of the Independent Components
We performed ICA on the stock returns using the JADE algorithm [14] described in Section 2.2. In all
the experiments, we assume that the number of stocks equals the number of sources supplied
to the mixing model.
In the results presented here, all 28 stocks are used as inputs in the ICA. However,
for clarity, the figures only display the first few ICs^{6}.
Figure 5 shows a subset of eight ICs obtained from the
algorithm. Note that the goal of statistical independence forces the 1987 crash to be
carried by only a few components.
Figure 5: The first eight ICs, resulting from the ICA of all 28 stocks. Note that the ICs
can be seen as quite distinct shock inputs into the system.
We now present the analysis of a specific stock, the Bank of Tokyo-Mitsubishi. The
contributions of the ICs to any given stock can be found as follows.
For a given stock return, there is a corresponding row of the mixing matrix A
used to weight the independent components. By multiplying the corresponding row of A
with the ICs, we obtain the weighted ICs. We define dominant ICs to be those ICs
with the largest maximum signal amplitudes. They have the largest effect on the
reconstructed stock price. In contrast, other criteria, such as the variance, would focus
not on the largest value but on the average.
Figure 6 weights the ICs with the first row of the
mixing matrix, which corresponds to the Bank of Tokyo-Mitsubishi. The four traces at the
bottom show the four most dominant ICs for this stock.
From the usual mixing process given by Eq. (1),
we can obtain the reconstruction of the ith stock return in terms of the estimated ICs as

x_{i}(t-j) = \sum_{k=1}^{n} a_{ik} y_{k}(t-j) ,   j = 0,...,N-1,   (3)
where y_{k}(t-j) is the value of the kth estimated IC at time t-j,
and a_{ik} is the weight in the ith row, kth column of the estimated mixing matrix
A (obtained as the inverse of the demixing matrix W). We define the weighted
ICs for the ith observed signal (stock return) as

\bar{y}_{ik}(t-j) = a_{ik} y_{k}(t-j) ,   k = 1,...,n;  j = 0,...,N-1.   (4)
In this paper, we rank the weighted ICs with respect to the first stock
return. Therefore, we multiply the ICs with the first row of the mixing matrix and use a_{1k},
k = 1,...,n, to obtain the weighted ICs. The weighted ICs are then sorted^{7} using an L_{∞}
norm, since we are most interested in showing just those ICs which cause the maximum price
change in a particular stock.
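The weighting and ranking just described can be sketched as follows (hypothetical mixing matrix and ICs; the L-infinity norm of each weighted IC is simply its largest absolute amplitude):

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 28, 700
Y = rng.laplace(size=(n, T))          # hypothetical estimated ICs, one per row
A = rng.normal(size=(n, n))           # hypothetical mixing matrix (inverse of W)

i = 0                                 # analyze the first stock
weighted = A[i][:, None] * Y          # a_{ik} y_k(t-j): one weighted IC per row

# Rank by the L-infinity norm: the maximum absolute amplitude of each
# weighted IC, largest first.
order = np.argsort(np.abs(weighted).max(axis=1))[::-1]
dominant = weighted[order[:4]]        # the four most dominant weighted ICs

# Sanity check: the weighted ICs sum back to the reconstructed stock return.
x_i = A[i] @ Y
```

Ranking by maximum amplitude, rather than variance, is what singles out the large, infrequent shocks.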
Figure 6: The four most dominant ICs after weighting by the first row of the mixing matrix
A (corresponding to the Bank of Tokyo-Mitsubishi) are shown starting from the
bottom trace. The top trace is the summation of the remaining 24 ICs, the least dominant
ICs for this stock. The weighted sum of all the ICs corresponds to the original stock
return.
Figure 7: The dotted line on top is the original stock price. The solid line in the middle
shows the reconstructed stock price using the four most dominant weighted ICs. The dashed
line on the bottom shows the reconstructed residual stock price obtained by adding up the
remaining 24 weighted ICs. Note that the major part of the true `shape' comes from the
most dominant components; the contribution of the non-dominant ICs to the overall shape is
only small. In this plot, the cumulative sum of the residual ICs is plotted at an offset
of 1550.
The ICs obtained from the stock returns reveal the following aspects:
- Only a few ICs contribute to most of the movements in the stock return.
- Large amplitude transients in the dominant ICs contribute to the major level changes.
The non-dominant components do not contribute significantly to level changes.
- Small amplitude ICs contribute to the change in levels over short time scales, but over
the whole period, there is little change in levels.
Figure 7 shows the reconstructed price obtained using
the four most dominant weighted ICs and compares it to the sum of the remaining 24
non-dominant weighted ICs.
4.3 Thresholded ICs Characterize Turning Points
The preceding section discussed the effect of a lossy reconstruction of the original
prices, obtained by considering the cumulative sums of only the first few dominant ICs.
This section goes further and thresholds these dominant ICs. This sets all weighted IC
values below a threshold to zero, and only uses those values above the threshold to
reconstruct the signal.
The thresholded reconstructions are described by

\bar{x}_{i}(t-j) = \sum_{k=1}^{r} g( \bar{y}_{ik}(t-j) ) ,   j = 0,...,N-1,   (5)
where \bar{x}_{i}(t-j) are the returns constructed
using thresholds, g(·) is the threshold function, r is the number of ICs used in the
reconstruction, and ξ is the threshold value. The threshold was
set arbitrarily to a value which excluded almost all of the lower level components.
The reconstructed stock prices are found as

\hat{p}_{i}(j+1) = \hat{p}_{i}(j) + \bar{x}_{i}(j) ,   j = t-N,...,t-1.   (6)
For the first stock, the Bank of Tokyo-Mitsubishi, p_{1}(t-N) = 1550. By
setting ξ = 0 and r = n, the original price series is
reconstructed exactly.
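Equations (5) and (6) can be sketched as follows (hypothetical ICs and mixing weights; the threshold value is illustrative):

```python
import numpy as np

def threshold(u, xi):
    """g(.): keep values whose magnitude exceeds the threshold xi, zero the rest."""
    return np.where(np.abs(u) > xi, u, 0.0)

rng = np.random.default_rng(6)
n, T = 28, 700
Y = rng.laplace(size=(n, T))          # hypothetical ICs
a = rng.normal(size=n)                # row of the mixing matrix for one stock
weighted = a[:, None] * Y

order = np.argsort(np.abs(weighted).max(axis=1))[::-1]
r, xi = 4, 2.0                        # illustrative choices
x_bar = threshold(weighted[order[:r]], xi).sum(axis=0)   # Eq. (5)

p0 = 1550.0                           # initial price, as for the first stock
p_hat = p0 + np.cumsum(x_bar)         # Eq. (6): cumulative sum of returns

# With xi = 0 and r = n, the original return series is recovered exactly.
x_full = threshold(weighted, 0.0).sum(axis=0)
```

The thresholded reconstruction keeps only the large innovations, which is what lets it trace the turning points of the price curve.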
Figure 8: Thresholded returns obtained from the four most dominant weighted ICs for the
Bank of Tokyo-Mitsubishi.
The thresholded returns of the four most dominant ICs are shown in Figure 8, and the stock price reconstructed from the thresholded return
values is shown in Figure 9. The figures indicate that the
thresholded ICs provide useful morphological information and can extract the turning
points of the original time series.
Figure 9: ICA results for the Bank of Tokyo-Mitsubishi: reconstructed prices obtained by
computing the cumulative sum of only the thresholded values displayed in the previous
figure. Note that the price for the 1,000 points plotted is characterized well by only a
few innovations.
4.4 Comparison with PCA
PCA is a well established tool in finance. Applications range from Arbitrage Pricing
Theory and factor models to input selection for multi-currency portfolios [58]. Here we seek to compare the
performance of PCA with ICA.
Singular value decomposition (SVD) is used to obtain the principal
components as follows. Let X denote the N × n data matrix, where N is the number of
observed vectors, and each vector has n components (usually N >> n). The data matrix can be decomposed into

X = U S V^{′} ,

where U is an N × n column orthogonal matrix, S is an n × n diagonal
matrix consisting of the singular values, and V is an n × n orthogonal matrix. The principal
components are given by XV = US, i.e., the vectors of the orthonormal
columns in U, weighted by the singular values from S.
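The SVD route to the principal components, sketched in numpy with a stand-in data matrix (random values, not the return data of this paper):

```python
import numpy as np

rng = np.random.default_rng(7)
N, n = 700, 28
X = rng.normal(size=(N, n))           # stand-in for the N x n return matrix

# Economy-size SVD: U is N x n, s holds the n singular values, Vt is V'.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
PCs = X @ Vt.T                        # principal components, XV = US

# The singular values come out in decreasing order, so the PCs are
# ordered by decreasing variance.
var = PCs.var(axis=0)
```

Note that only second order information (the covariance structure) enters here, in contrast to the fourth order cumulants used by JADE.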
In Figure 10 the four most dominant PCs corresponding to
the Bank of Tokyo-Mitsubishi are shown. Figure 11 shows
the reconstructed price obtained using the four most dominant PCs and compares it to the
sum of the remaining 24 non-dominant PCs.
The results from the PCs obtained from the stock returns reveal the following aspects:
- The distinct shocks which were identified in the ICA case are much more difficult to
observe.
- While the first four PCs are, by construction, the best possible fit in a quadratic error
sense to the data, they do not offer the same insight into the structure of the data as
the ICs.
- The dominant transients obtained from the PCs, i.e., after thresholding, do not lead to
the same overall shape of the stock returns as the ICA approach. Hence we cannot draw the
same conclusions about high level and low level signals in the data. The effect of
thresholding is shown in Figures 12 and 13.
For the experiment reported here, the four most dominant PCs are the same, whether
ordered in terms of variance or using the L_{∞} norm
as in the ICA case. Beyond that, the orders change.
Figure 10: The four most dominant PCs corresponding to stock returns for the Bank of
Tokyo-Mitsubishi.
Figure 11: The solid line in the middle shows the reconstructed stock price using the four
most dominant PCs. The dashed line (lowest) shows the reconstructed stock price obtained by
adding up the remaining 24 PCs. The sum of the two lines corresponds to the true price;
this highest line is identical to the true price (dotted line). In this case, the
error is smaller than that obtained when using ICA. However, the overall shape of the
stock is not reconstructed as well by the PCs. This can also be seen more clearly after
thresholding in Figures 12 and 13.
Figure 12: Thresholded returns obtained from the four most dominant PCs applicable to the
Bank of Tokyo-Mitsubishi.
Figure 13 shows that the stock price reconstructed from the
thresholded returns is a poor fit to the overall shape of the original price. This
implies that key high level transients that were extracted by ICA are not obtained through
PCA.
Figure 13: PCA results for the Bank of Tokyo-Mitsubishi: reconstructed prices obtained
using only the thresholded values displayed in the previous figure. In this case, the
model does not capture the large transients observed in the ICA case and fails to
adequately approximate the shape of the original stock price curve.
In summary, while PCA also decomposes the original data, the PCs do not possess the
higher order independence of the ICs. A major difference emerges when only the
largest shocks of the estimated sources are used. While the cumulative sum of the largest
IC shocks retains the overall shape, this is not the case for the PCs.
5 Conclusions
This paper applied independent component analysis to decompose the returns from a
portfolio of 28 stocks into statistically independent components. The components of the
instantaneous vectors of observed daily stocks are statistically dependent; stocks on
average move together. In contrast, the components of the instantaneous daily vector of
ICs are constructed to be statistically independent. This can be viewed as decomposing the
returns into statistically independent sources. On three years of daily data from the
Tokyo Stock Exchange, we showed that the estimated ICs fall into two categories: (i)
infrequent but large shocks (responsible for the major changes in the stock prices), and
(ii) frequent but rather small fluctuations (contributing only little to the overall level
of the stocks).
We have shown that by using a portfolio of stocks, ICA can reveal some underlying
structure in the data. Interestingly, the `noise' we observe may be attributed to signals
within a certain amplitude range and not to signals in a certain (usually high) frequency
range. Thus, ICA gives a fresh perspective to the problem of understanding the mechanisms
that influence the stock market data.
In comparison to PCA, ICA is a complementary tool which allows the underlying structure
of the data to be more readily observed. There are clearly many other avenues in which ICA
techniques can be applied to finance.
Acknowledgements
We are grateful to Morio Yoda, Nikko Securities, Tokyo for kindly providing the data
used in this paper, and to Jean-François Cardoso for making the source code for the JADE
algorithm available. Andrew Back acknowledges support of the Frontier Research Program,
RIKEN and would like to thank Seungjin Choi and Zhang Liqing for helpful discussions.
Andreas Weigend acknowledges support from the National Science Foundation (ECS9309786),
and would like to thank Fei Chen, Elion Chin and Juan Lin for stimulating discussions.
Appendix
The experiments used the following 28 stocks:

1. The Bank of Tokyo-Mitsubishi
2. Toyota Motor
3. Sumitomo Bank
4. Fuji Bank
5. Dai-Ichi Kangyo Bank
6. Industrial Bank of Japan
7. Sanwa Bank
8. Matsushita Electric
9. Sakura Bank
10. Nomura Securities
11. Tokyo Electric Power
12. Hitachi
13. Mitsubishi Heavy Industries
14. Seven-Eleven
15. Asahi Bank
16. Tokai Bank
17. Honda Motor
18. Sony
19. Seibu Railway
20. Toshiba
21. Ito-Yokado
22. Kansai Electric Power
23. Nippon Steel
24. Mitsubishi Trust and Banking
25. Nissan Motor
26. Denso
27. Mitsubishi
28. Tokio Marine
References
 [1]
 S. Amari. Natural gradient works efficiently in learning. Neural Computation,
10(2):251276, 1998.
 [2]
 S. Amari and A. Cichocki. Blind signal processing neural network approaches. Proc.
IEEE, Special issue on blind identification and estimation, 1998. to appear.
 [3]
 S. Amari, A. Cichocki, and H.H. Yang. A new learning algorithm for blind
signal separation. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances
in Neural Information Processing Systems 8 (NIPS*95), pages 757763, Cambridge, MA,
1996. The MIT Press.
 [4]
 P. Baldi and K. Hornik. Neural networks and principal component analysis:
Learning from examples without local minima. Neural Networks, 2:5358, 1989.
 [5]
 Yoram Baram and Z. Roth. Forecasting by density shaping using neural networks. In Proceedings
of the 1995 Conference on Computational Intelligence for Financial Engineering (CIFEr),
pages 5771, Piscataway, NJ, 1995. IEEE Service Center.
 [6]
 A.J. Bell and T.J. Sejnowski. An information maximization approach to blind separation
and blind deconvolution. Neural Computation, 7:11291159, 1995.
 [7]
 A. Belouchrani, K. Abed Meraim, J.F. Cardoso, and É. Moulines. A blind source
separation technique based on second order statistics. IEEE Trans. on S.P.,
45(2):43444, February 1997.
 [8]
 R. E. Bogner. Blind separation of sources. Technical Report 4559, Defence Research
Agency, Malvern, May 1992.
 [9]
 H. Bourlard and Y. Kamp. Autoassociation by multilayer perceptrons and
singular value decomposition. Biological Cybernetics, 59:291294, 1988.
 [10]
 G. Burel. Blind separation of sources: a nonlinear neural algorithm. Neural
Networks, 5:937947, 1992.
 [11]
 J. Cardoso. Source separation using higher order moments. In International
Conference on Acoustics, Speech and Signal Processing, pages 21092112, 1989.
 [12]
 J.F. Cardoso. Blind signal separation: a review. Proc. IEEE, Special issue on
blind identification and estimation, 1998. To appear.
 [13]
 J.F. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Trans.
Signal Processing, 44(12):30173030, December 1996.
 [14]
 J.F. Cardoso and A. Souloumiac. Blind beamforming for nonGaussian signals. IEE
Proc. F., 140(6):771774, December 1993.
 [15]
 S. Choi, R. Liu, and A. Cichocki. A spurious equilibriafree learning
algorithm for the blind separation of nonzero skewness signals. Neural Processing
Letters, 7:18, 1998.
 [16]
 A. Cichocki, S. Amari, and R. Thawonmas. Blind signal extraction using
selfadaptive nonlinear hebbian learning rule. In Proc. of Int. Symposium on
Nonlinear Theory and its Applications, NOLTA96, pages 377380, Kochi, Japan, 1996.
 [17]
 A. Cichocki and L. Moszczy\'nski. New learning algorithm for blind separation
of sources. Electronics Letters, 28(21):19861987, October 8 1992.
 [18]
 A. Cichocki, R. Thawonmas, and S. Amari. Sequential blind signal
extraction in order specified by stochastic properties. Electronics Letters,
33(1):6465, 1997.
 [19]
 A. Cichocki and R. Unbehauen. Neural Networks for Optimization and Signal
Processing. Wiley, 1993.
 [20]
 A. Cichocki, R. Unbehauen, and E. Rummert. Robust learning algorithm for
blind separation of signals. Electronics Letters, 30(17):13861387, 1994.
 [21]
 P. Comon. Separation of sources using highorder cumulants. In SPIE Conference
on Advanced Algorithms and Architectures for Signal Processing, volume XII, pages
170181, San Diego, CA, August 1989.
 [22]
 P. Comon. Independent component analysis  a new concept? Signal Processing,
36(3):287314, 1994.
 [23]
 N. Delfosse and P. Loubaton. Adaptive blind separation of independent sources:
a deflation approach. Signal Processing, 45:5983, 1995.
 [24]
 S.C. Douglas and A. Cichocki. Neural networks for blind decorrelation of signals. IEEE
Trans. Signal Processing, 45(11):28292842, 1997.
 [25]
 J. H. Friedman. Exploratory projection pursuit. Journal of the American
Statistical Association, 82:249266, 1987.
 [26]
 J.H. Friedman and J.W. Tukey. A projection pursuit algorithm for exploratory data
analysis. IEEE Trans. Computers, 23:881889, 1974.
 [27]
 M. Girolami and C. Fyfe. An extended exploratory projection pursuit network
with linear and nonlinear antihebbian connections applied to the cocktail party problem. Neural
Networks, 10(9):16071618, 1997.
 [28]
 M. Girolami and C. Fyfe. Extraction of independent signal sources using a
deflationary exploratory projection pursuit network with lateral inhibition. IEE
Proceedings on Vision, Image and Signal Processing, 144(5):299-306, 1997.
 [29]
 J. Herault and C. Jutten. Space or time adaptive signal processing by neural
network models. In J. S. Denker, editor, Neural Networks for Computing.
Proceedings of AIP Conference, pages 206-211, New York, 1986. American Institute of
Physics.
 [30]
 Peter J. Huber. Projection pursuit. The Annals of Statistics, 13:435-475,
1985.
 [31]
 A. Hyvärinen. Simple one-unit algorithms for blind source separation and blind
deconvolution. In Progress in Neural Information Processing (ICONIP'96),
volume 2, pages 1201-1206. Springer, 1996.
 [32]
 C. Jutten and J. Herault. Blind separation of sources, Part I: An adaptive
algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991.
 [33]
 C. Jutten, H.-L. Nguyen Thi, E. Dijkstra, E. Vittoz, and J. Caelen.
Blind separation of sources, an algorithm for separation of convolutive mixtures. In Proceedings
of Int. Workshop on High Order Statistics, pages 273-276, Chamrousse (France), 1991.
 [34]
 J. Karhunen. Neural approaches to independent component analysis and source
separation. In Proceedings of 4th European Symp. on Artificial Neural Networks
(ESANN'96), pages 249-266, Bruges, Belgium, April 1996.
 [35]
 J.-L. Lacoume and P. Ruiz. Separation of independent sources from correlated inputs.
IEEE Trans. Signal Processing, 40:3074-3078, December 1992.
 [36]
 T.-W. Lee, M. Girolami, A.J. Bell, and T. Sejnowski. A unifying information
theoretic framework for independent component analysis. International Journal on
Mathematical and Computer Modelling, to appear, 1998.
 [37]
 S. Li and T.J. Sejnowski. Adaptive separation of mixed broadband sound sources
with delays by a beamforming Hérault-Jutten network. IEEE Journal of Oceanic
Engineering, 20(1):73-79, January 1995.
 [38]
 Juan K. Lin, David G. Grier, and Jack D. Cowan. Faithful representation
of separable distributions. Neural Computation, 9:1305-1320, 1997.
 [39]
 X.T. Ling, Y.F. Huang, and R. Liu. A neural network for blind signal separation.
In Proc. of IEEE Int. Symposium on Circuits and Systems (ISCAS'94), pages
69-72, New York, NY, 1994. IEEE Press.
 [40]
 J. E. Moody and L. Wu. What is the ``true price''? State space models for
high frequency financial data. In Progress in Neural Information Processing
(ICONIP'96), pages 697-704, Berlin, 1996. Springer.
 [41]
 J. E. Moody and L. Wu. What is the ``true price''? State space models for
high frequency FX data. In A. S. Weigend, Y. S. Abu-Mostafa, and A.-P. N.
Refenes, editors, Decision Technologies for Financial Engineering (Proceedings of the
Fourth International Conference on Neural Networks in the Capital Markets, NNCM-96),
pages 346-358, Singapore, 1997. World Scientific.
 [42]
 J. E. Moody and L. Wu. What is the ``true price''? State space models for
high frequency FX data. In Proceedings of the IEEE/IAFE 1997 Conference on
Computational Intelligence for Financial Engineering (CIFEr), pages 150-156,
Piscataway, NJ, 1997. IEEE Service Center.
 [43]
 E. Moreau and O. Macchi. Complex self-adaptive algorithms for source
separation based on high order contrasts. In Signal Processing VII: Proceedings of
EUSIPCO-94, pages 1157-1160, Lausanne, Switzerland, 1994. EURASIP.
 [44]
 H.-L. Nguyen Thi and C. Jutten. Blind source separation for convolutive mixtures. Signal
Processing, 45(2):209-229, 1995.
 [45]
 E. Oja. Neural networks, principal components and subspaces. International
Journal of Neural Systems, 1:61-68, 1989.
 [46]
 E. Oja and J. Karhunen. Signal separation by nonlinear Hebbian learning. In
M. Palaniswami, Y. Attikiouzel, R. Marks II, D. Fogel, and
T. Fukuda, editors, Computational Intelligence: A Dynamic System Perspective,
pages 83-97, New York, NY, 1995. IEEE Press.
 [47]
 Lucas Parra, Clay Spence, and Bert de Vries. Convolutive source separation and signal
modeling with ML. In International Symposium on Intelligent Systems (ISIS'97),
University of Reggio Calabria, Italy, 1997.
 [48]
 Barak A. Pearlmutter and Lucas C. Parra. Maximum likelihood blind source
separation: A context-sensitive generalization of ICA. In M. C. Mozer, M. I.
Jordan, and T. Petsche, editors, Advances in Neural Information Processing
Systems 9 (NIPS*96), pages 613-619. MIT Press, Cambridge, MA, 1997.
 [49]
 D. Pham, P. Garat, and C. Jutten. Separation of a mixture of independent
sources through a maximum likelihood approach. In J. Vandevalle, R. Boite,
M. Moonen, and A. Oosterlink, editors, Signal Processing VI: Theories and
Applications, pages 771-774. Elsevier, 1992.
 [50]
 K.J. Pope and R.E. Bogner. Blind separation of speech signals. In Proc. of the
Fifth Australian Int. Conf. on Speech Science and Technology, pages 46-50,
Perth, Western Australia, December 6-8, 1994.
 [51]
 K.J. Pope and R.E. Bogner. Blind signal separation. I: Linear, instantaneous
combinations. Digital Signal Processing, 6:5-16, 1996.
 [52]
 K.J. Pope and R.E. Bogner. Blind signal separation. II: Linear, convolutive
combinations. Digital Signal Processing, 6:17-28, 1996.
 [53]
 B.D. Ripley. Pattern Recognition and Neural Networks. Cambridge University
Press, 1996.
 [54]
 Avi Silberschatz and Alexander Tuzhilin. What makes patterns interesting in knowledge
discovery systems. IEEE Transactions on Knowledge and Data Engineering,
8(6):970-974, 1996.
 [55]
 L. Tong, R.W. Liu, V.C. Soon, and Y.F. Huang. Indeterminacy and identifiability of
blind identification. IEEE Trans. Circuits and Systems, 38(5):499-509, May 1991.
 [56]
 L. Tong, V. C. Soon, Y. F. Huang, and R. Liu. AMUSE: A new blind
identification algorithm. In International Conference on Acoustics, Speech and Signal
Processing, pages 1784-1787, 1990.
 [57]
 K. Torkkola. Blind separation of convolved sources based on information
maximization. In S. Usui, Y. Tohkura, S. Katagiri, and E. Wilson,
editors, Proc. of the 1996 IEEE Workshop Neural Networks for Signal Processing 6
(NNSP'96), pages 423-432, New York, NY, 1996. IEEE Press.
 [58]
 J. Utans, W. T. Holt, and A. N. Refenes. Principal component analysis for
modeling multicurrency portfolios. In A. S. Weigend, Y. S. Abu-Mostafa, and
A.-P. N. Refenes, editors, Decision Technologies for Financial Engineering
(Proceedings of the Fourth International Conference on Neural Networks in the Capital
Markets, NNCM-96), pages 359-368, Singapore, 1997. World Scientific.
 [59]
 E. Weinstein, M. Feder, and A.V. Oppenheim. Multichannel signal separation by
decorrelation. IEEE Trans. Speech and Audio Processing, 1(10):405-413, 1993.
 [60]
 L. Wu and J. Moody. Multi-effect decompositions for financial data modelling.
In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in
Neural Information Processing Systems 9 (NIPS*96), pages 995-1001. MIT Press,
Cambridge, MA, 1997.
 [61]
 H. H. Yang and S. Amari. Adaptive online learning algorithms for blind
separation: Maximum entropy and minimum mutual information. Neural Computation,
9:1457-1482, 1997.
 [62]
 H.H. Yang, S. Amari, and A. Cichocki. Information-theoretic approach to blind
separation of sources in nonlinear mixture. Signal Processing, 64(3):291-300,
1998.
 [63]
 Howard H. Yang, S. Amari, and Andrzej Cichocki. Information backpropagation
for blind separation of sources in nonlinear mixtures. In IEEE International
Conference on Neural Networks, Houston, TX (ICNN'97), pages 2141-2146. IEEE Press,
1997.
 [64]
 D. Yellin and E. Weinstein. Multichannel signal separation: Methods and
analysis. IEEE Transactions on Signal Processing, 44:106-118, 1996.
Footnotes:
^{1} We are only aware of [5], who use a neural network that maximizes output
entropy, and of [40,41,42,60], who apply
ICA in the context of state space models for interbank foreign exchange rates to improve
the separation between observational noise and the ``true price.''
^{2} ICA algorithms based on
second-order statistics have also been proposed [7,56].
^{3} For four zero-mean
variables y_{i}, y_{j}, y_{k}, y_{l}, the fourth-order
cumulant is given by

Cum(y_{i}, y_{j}, y_{k}, y_{l}) = E[y_{i} y_{j} y_{k} y_{l}] - E[y_{i} y_{j}] E[y_{k} y_{l}] - E[y_{i} y_{k}] E[y_{j} y_{l}] - E[y_{i} y_{l}] E[y_{j} y_{k}].

This is the difference between the expected value E[·] of the product of the four
variables (the fourth moment) and the three products of pairs of covariances (second
moments). The diagonal elements (i = j = k = l) are the fourth-order self-cumulants.
^{4} A signal x(t) is
considered stationary in the mean if its expected value is constant, or, after removing a
constant mean, E[x(t)] = 0. In practice, however, this definition depends on the interval
over which the expectation is estimated.
^{5} We chose a subset of
available historical data on which to test the method. This allows us to reserve
subsequent data for further experimentation.
^{6} We also explored the
effect of reducing the number of stocks that entered the ICA. The result is that the
signal separation degrades when fewer stocks are used: the independent
components then appear to give less distinct information. We had access to data for a maximum
of 28 stocks.
^{7} ICs can be sorted in
various ways. For example, in his implementation of the JADE algorithm, Cardoso used a
Euclidean norm to sort the rows of the demixing matrix W according to their
contribution across all signals [14].
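A minimal sketch of this ordering convention (the function name is our own; only the norm-based sorting of the rows of W is taken from the footnote):

```python
import numpy as np

def sort_rows_by_norm(W):
    """Reorder the rows of a demixing matrix W by decreasing Euclidean
    norm, so the ICs contributing most across all signals come first.
    Returns the reordered matrix and the permutation used."""
    norms = np.linalg.norm(W, axis=1)   # Euclidean norm of each row
    order = np.argsort(norms)[::-1]     # row indices, largest norm first
    return W[order], order

# Toy demixing matrix: row norms are 1 and 5, so row 1 is ranked first.
W = np.array([[1.0, 0.0],
              [3.0, 4.0]])
W_sorted, order = sort_rows_by_norm(W)
print(order)  # [1 0]
```

Any monotone function of the row norms would induce the same ordering; the Euclidean norm is simply the choice reported for the JADE implementation.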