This version of the paper is for reading online and will not give good results when printed. Please download either the Postscript or PDF version for printing a hardcopy. If you cannot see the equations online please read these notes.
Forthcoming in: International Journal of Neural Systems, Vol. 8,
No.5 (October, 1997).
(Special Issue on Data Mining in Finance.)
A First Application of Independent Component Analysis
to Extracting Structure from Stock Returns
Andrew D. Back
Brain Science Institute
The Institute of Physical and Chemical Research (RIKEN)
21 Hirosawa, Wakoshi, Saitama 3510198, Japan
back@brain.riken.go.jp
www.open.brain.riken.go.jp/~back/
Andreas S. Weigend
Department of Information Systems
Leonard N. Stern School of Business
New York University
44 West Fourth Street, MEC 974
New York, NY 10012, USA
aweigend@stern.nyu.edu
www.stern.nyu.edu/~aweigend
In this paper we consider the application of a signal processing technique known as independent component analysis (ICA) or blind source separation, to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series into a new space of statistically independent components (ICs). We apply ICA to three years of daily returns of the 28 largest Japanese stocks and compare the results with those obtained using principal component analysis (PCA). The results indicate that the estimated ICs fall into two categories, (i) infrequent but large shocks (responsible for the major changes in the stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall level of the stocks). We show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs. In contrast, when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one. Independent component analysis is shown to be a potentially powerful method of analysing and understanding driving mechanisms in financial time series.
What drives the movements of a financial time series? This surely is a question of interest to many, ranging from researchers who wish to understand financial markets, to traders who will benefit from such knowledge. Can modern knowledge discovery and data mining techniques help discover some of the underlying forces?
In this paper, we focus on a new technique which to our knowledge has not been used in any significant application to financial or econometric problems^{1}. The method is known as independent component analysis (ICA) and is also referred to as blind source separation [29,32,22]. The central assumption is that an observed multivariate time series (such as daily stock returns) reflect the reaction of a system (such as the stock market) to a few statistically independent time series. ICA seeks to extract out these independent components (ICs) as well as the mixing process.
ICA can be expressed in terms of the related concepts of entropy [6], mutual information [3], contrast functions [22] and other measures of the statistical independence of signals. For independent signals, the joint probability can be factorized into the product of the marginal probabilities. Therefore the independent components can be found by minimizing the KullbackLeibler divergence between the joint probability and marginal probabilities of the output signals [3]. Hence, the goal of finding statistically independent components can be expressed in several ways:
From this basis, algorithms can be derived to extract the desired independent components. In general, these algorithms can be considered as unsupervised learning procedures. Recent reviews are given in [2,12,51,52].
Independent component analysis can also be contrasted with principal component analysis (PCA) and so we give a brief comparison of the two methods here. Both ICA and PCA linearly transform the observed signals into components. The key difference however, is in the type of components obtained. The goal of PCA is to obtain principal components which are uncorrelated. Moreover, PCA gives projections of the data in the direction of the maximum variance. The principal components (PCs) are ordered in terms of their variances: the first PC defines the direction that captures the maximum variance possible, the second PC defines (in the remaining orthogonal subspace) the direction of maximum variance, and so forth. In ICA however, we seek to obtain statistically independent components.
PCA algorithms use only second order statistical information. On the other hand, ICA algorithms may use higher order^{2} statistical information for separating the signals, see for example [11,22]. For this reason nonGaussian signals (or at most, one Gaussian signal) are normally required for ICA algorithms based on higher order statistics. For PCA algorithms however, the higher order statistical information provided by such nonGaussian signals is not required or used, hence the signals in this case can be Gaussian. PCA algorithms can be implemented with batch algorithms or with online algorithms. Examples of online or ``neural'' PCA algorithms include [9,4,45].
This paper is organized in the following way. Section 2 provides a background to ICA and a guide to some of the algorithms available. Section 3 discusses some issues concerning the general application of ICA to financial time series. Our specific experimental results for the application of ICA to Japanese equity data are given in section 4. In this section we compare results obtained using both ICA and PCA. Section 5 draws some conclusions about the use of ICA in financial time series.
ICA denotes the process of taking a set of measured signal vectors, x, and extracting from them a (new) set of statistically independent vectors, y, called the independent components or the sources. They are estimates of the original source signals which are assumed to have been mixed in some prescribed manner to form the observed signals.
Figure 1: Schematic representation of ICA. The original sources s are mixed through matrix
A to form the observed signal x. The demixing matrix W transforms the
observed signal x into the independent components y.
Figure 1 shows the most basic form of ICA. We use the following notation: We observe a multivariate time series { x_{i}(t)} , i = 1,...,n, consisting of n values at each time step t. We assume that it is the result of a mixing process

(1) 
Using the instantaneous observation vector x(t) = [x_{1}(t),x_{2}(t),...,x_{n}(t)]^{¢} where ¢ indicates the transpose operator, the problem is to find a demixing matrix W such that

(2) 
where A is the unknown mixing matrix. We assume throughout this paper that there are as many observed signals as there are sources, hence A is a square n ×n matrix. If W = A^{1}, then y(t) = s(t), and perfect separation occurs. In general, it is only possible to find W such that WA = PD where P is a permutation matrix and D is a diagonal scaling matrix [55].
To find such a matrix W, the following assumptions are made:
In this paper, we only consider the case when the mixtures occur instantaneously in time. It is also of interest to consider models based on multichannel blind deconvolution [33,59,44,57,64,47] however we do not do this here.
The earliest ICA algorithm that we are aware of and one which generated much interest in the field is that proposed by [29]. Since then, various approaches have been proposed in the literature to implement ICA. These include: minimizing higher order moments [11] or higher order cumulants^{3} [14], minimization of mutual information of the outputs or maximization of the output entropy [6], minimization of the KullbackLeibler divergence between the joint and the product of the marginal distributions of the outputs [3].
ICA algorithms are typically implemented in either offline (batch) form or using an online approach. A standard approach for batch ICA algorithms is the following twostage procedure [8,14].
This approach is sometimes referred to as ``decorrelation and rotation''. Note that this approach relies on the measured signals being nonGaussian. For Gaussian signals, the higher order statistics are zero already and so no meaningful separation can be achieved by ICA methods. For nonGaussian random signals the implication is that not only should the signals be uncorrelated, but that the higher order crossstatistics (eg. moments or cumulants) are zeroed.
The empirical study carried out in this paper uses the JADE (Joint Approximate Diagonalization of Eigenmatrices) algorithm [14] which is a batch algorithm and is an efficient version of the above two step procedure. The first stage is performed by computing the sample covariance matrix, giving the second order statistics of the observed outputs. From this, a matrix is computed by eigendecomposition which whitens the observed data. The second stage consists of finding a rotation matrix which jointly diagonalizes eigenmatrices formed from the fourth order cumulants of the whitened data. The outputs from this stage are the independent components. For specific details of the algorithm, the reader is referred to [14]. The JADE algorithm has been extended by [50]. Other examples of twostep methods were proposed in [11,21,8].
A wide variety of online algorithms have been proposed [3,6,1523,27,28,31,3439,43,46,49]. Many of these algorithms are sometimes referred to as ``neural" learning algorithms. They employ a cost function which is optimized by adjusting the demixing matrix to increase independence of outputs.
ICA algorithms have been developed recently using the natural gradient approach [3,1]. A similar approach was independently derived by Cardoso [13] who referred to it as a relative gradient algorithm. This theoretically sound modification to the usual online updating algorithm overcomes the problem of having to perform matrix inversions at each time step and therefore permits significantly faster convergence.
Another approach, known as contextual ICA was developed in [48]. In this method, which is based on maximum likelihood estimation, the source distributions are modeled and the temporal nature of the signals is used to derive the demixing matrix. The density functions of the input sources are estimated using past values of the outputs. This algorithm proved to be effective in separating signals having colored Gaussian distributions or low kurtosis.
The ICA framework has also been extended to allow for nonlinear mixing. One of the first approaches in this area was given in [10]. More recently, an information theoretic approach to estimating sources assumed to be mixed and passed through an invertible nonlinear function was proposed in [63] and [62]. Unsupervised learning algorithms based on maximizing entropy and minimizing mutual information are described in [61]. Lin, Grier and Cowan describe a local version of ICA [38]. Rather than finding one global coordinate transformation, local ICAs are carried out for subsets of the data. While using invertible transformations, this is a promising way to express global nonlinearities.
ICA algorithms for mixed and convolved signals have also been considered, see for example [24,33,44,47,52,57,59,64].
ICA provides a mechanism of decomposing a given signal into statistically independent components. The goal of this paper is to explore whether ICA can give some indication of the underlying structure of the stock market. The hope is to find interpretable factors of instantaneous stock returns. Such factors could include news (government intervention, natural or manmade disasters, political upheaval), response to very large trades and of course, unexplained noise. Ultimately, we hope that this might yield new ways of analyzing and forecasting financial time series, contributing to a better understanding of financial markets.
Like most time series approaches, ICA requires the observed signals to be stationary^{4}. In this paper, we transform the nonstationary stock prices p(t), to stock returns by taking the difference between successive values of the prices, x(t) = p(t)p(t1). Given the relatively large change in price levels over the few years of data, an alternative would have been to use relative returns, log(p(t))  log(p(t1)), describing geometric growth as opposed to additive growth.
To investigate the effectiveness of ICA techniques for financial time series, we apply ICA to data from the Tokyo Stock Exchange. We use daily closing prices from 1986 until 1989^{5} of the 28 largest firms, listed in the Appendix. Figure 2 shows the stock price of the first company in our set, the Bank of TokyoMitsubishi, between August 1986 and July 1988. For the same time interval, Figure 3 displays the movements of the eight largest stocks, offset from each other for clarity.
Figure 2: The price of the Bank of TokyoMitsubishi stock for the period 8/86 until 7/88.
This bank is one of the largest companies traded on the Tokyo Stock Exchange.
Figure 3: The largest eight stocks on the Tokyo Stock Exchange for the period 8/86 until
7/88. In this figure, each stock has been offset for clarity. The approximate range for
each stock is 2500 Yen. The lowest line displays the price of the Bank of
TokyoMitsubishi, shown also in the previous figure.
The preprocessing consists of three steps: we obtain the daily stock returns as indicated in Section 3.2, subtract the mean of each stock, and normalize the resulting values to lie within the range [ 1,1] . Figure 4 shows these normalized stock returns.
Figure 4: The stock returns (differenced time series) of the first eight stocks for the
period 8/86 until 7/88. The large negative return at day 317 corresponds to the crash of
19 October 1987. The lowest line again corresponds to the Bank of TokyoMitsubishi. The
question is: can ICA reveal useful information about these time series?
We performed ICA on the stock returns using the JADE algorithm [14] described in Section 2.2. In all the experiments, we assume that the number of stocks equals the number of sources supplied to the mixing model.
In the results presented here, all 28 stocks are used as inputs in the ICA. However for clarity, the figures only display the first few ICs^{6}. Figure 5 shows a subset of eight ICs obtained from the algorithm. Note that the goal of statistical independence forces the 1987 crash to be carried by only a few components.
Figure 5: The first eight ICs, resulting from the ICA of all 28 stocks. Note that the ICs
can be seen as quite distinct shock inputs into the system.
We now present the analysis of a specific stock, the Bank of TokyoMitsubishi. The contributions of the ICs to any given stock can be found as follows.
For a given stock return, there is a corresponding row of the mixing matrix A used to weight the independent components. By multiplying the corresponding row of A with the ICs, we obtain the weighted ICs. We define dominant ICs to be those ICs with the largest maximum signal amplitudes. They have the largest effect on the reconstructed stock price. In contrast, other criteria, such as the variance, would focus not on the largest value but on the average.
Figure 6 weights the ICs with the the first row of the mixing matrix which corresponds to the Bank of TokyoMitsubishi. The four traces at the bottom show the four most dominant ICs for this stock.
From the usual mixing process given by Eq. (1), we can obtain the reconstruction of the ith stock return in terms of the estimated ICs as

(3) 
where y_{k}(tj) is the value of the kth estimated IC at time tj and a_{ik} is the weight in the ith row, kth column of the estimated mixing matrix A (obtained as the inverse of the demixing matrix W). We define the weighted ICs for the ith observed signal (stock return) as

(4) 
In this paper, we rank the weighted ICs with respect to the first stock return. Therefore, we multiply the ICs with the first row of the mixing matrix and use a_{1k} k = 1,..,n to obtain the weighted ICs. The weighted ICs are then sorted^{7} using an L_{¥} norm since we are most interested in showing just those ICs which cause the maximum price change in a particular stock.
Figure 6: The four most dominant ICs after weighting by the first row of the mixing matrix
A (corresponding to the Bank of TokyoMitsubishi) are shown starting from the
bottom trace. The top trace is the summation of the remaining 24 ICs, the least dominant
ICs for this stock. The weighted sum of all the ICs corresponds to the original stock
return.
Figure 7: The dotted line on top is the original stock price. The solid line in the middle
shows the reconstructed stock price using the four most dominant weighted ICs. The dashed
line on the bottom shows the reconstructed residual stock price obtained by adding up the
remaining 24 weighted ICs. Note that the major part of the true `shape' comes from the
most dominant components; the contribution of the nondominant ICs to the overall shape is
only small. In this plot, the cumulative sum of the residual ICs are plotted at an offset
of 1550.
The ICs obtained from the stock returns reveal the following aspects:
Figure 7 shows the reconstructed price obtained using the four most dominant weighted ICs and compares it to the sum of the remaining 24 nondominant weighted ICs.
The preceding section discussed the effect of a lossy reconstruction of the original prices, obtained by considering the cumulative sums of only the first few dominant ICs. This section goes further and thresholds these dominant ICs. This sets all weighted IC values below a threshold to zero, and only uses those values above the threshold to reconstruct the signal.
The thresholded reconstructions are described by

(5) 
where [`x]_{i}(tj) are the returns constructed using thresholds, g(·) is the threshold function, r is the number of ICs used in the reconstruction and x is the threshold value. The threshold was set arbitrarily to a value which excluded almost all of the lower level components.
The reconstructed stock prices are found as

(6) 
For the first stock, the Bank of TokyoMitsubishi, p_{1}(tN) = 1550. By setting x = 0 and r = n the original price series is reconstructed exactly.
Figure 8: Thresholded returns obtained from the four most dominant weighted ICs for the
Bank of TokyoMitsubishi.
The thresholded returns of the four most dominant ICs are shown in Figure 8, and the stock price reconstructed from the thresholded return values are shown in Figure 9. The figures indicate that the thresholded ICs provide useful morphological information and can extract the turning points of original time series.
Figure 9: ICA results for the Bank of TokyoMitsubishi: reconstructed prices for the Bank
of TokyoMitsubishi obtained by computing the cumulative sum of only the thresholded
values displayed in the previous figure. Note that the price for 1,000 points plotted is
characterized well by only a few innovations.
PCA is a well established tool in finance. Applications range from Arbitrage Pricing Theory and factor models to input selection for multicurrency portfolios [58]. Here we seek to compare the performance of PCA with ICA.
Singular value decomposition (SVD) is used to obtain the principal components as follows. Let X denote the N ×n data matrix, where N is the number of observed vectors, and each vector has n components (usually N >> n). The data matrix can be decomposed into

(7) 
where U is an N ×n column orthogonal matrix, S is an n ×n diagonal matrix consisting of the singular values and V is an n ×n matrix. The principal components are given by XV = US , ie., the vectors of the orthonormal columns in U, weighted by the singular values from S.
In Figure 10 the four most dominant PCs corresponding to the Bank of TokyoMitsubishi are shown. Figure 11 shows the reconstructed price obtained using the four most dominant PCs and compares it to the sum of the remaining 24 nondominant PCs.
The results from the PCs obtained from the stock returns reveal the following aspects:
For the experiment reported here, the four most dominant PCs are the same, whether ordered in terms of variance or using the L_{¥} norm as in the ICA case. Beyond that the orders change.
Figure 10: The four most dominant PCs corresponding to stock returns for the Bank of
TokyoMitsubishi.
Figure 11: The solid line in the middle shows the reconstructed stock price using the four
most dominant PCs. The dashed line (lowest) shows the reconstructed stock price by adding
up the remaining 24 PCs. The sum of the two lines corresponds to the true price, this
highest line is identical is identical to the true price (dotted line). In this case, the
error is smaller than that obtained when using ICA. However, the overall shape of the
stock is not reconstructed as well by the PCs. This can also be seen more clearly after
thresholding in Figures 12 and 13.
Figure 12: Thresholded returns obtained from the four most dominant PCs applicable to the
Bank of TokyoMitsubishi.
Figure 13 shows the reconstructed stock price from the thresholded returns are a poor fit to the overall shape of the original price. This implies that key high level transients that were extracted by ICA are not obtained through PCA.
Figure 13: PCA results for the Bank of TokyoMitsubishi: reconstructed prices for the Bank
of TokyoMitsubishi. The graph was obtained using only the thresholded values displayed in
the previous figure. In this case, the model does not capture the large transients
observed in the ICA case and fails to adequately approximate the shape of the original
stock price curve.
In summary, while PCA also decomposes the original data, the PCs do not possess the high order independence obtained of the ICs. A major difference emerges when only the largest shocks of the estimated sources are used. While the cumulative sum of the largest IC shocks retains the overall shape, this is not the case for the PCs.
This paper applied independent component analysis to decompose the returns from a portfolio of 28 stocks into statistically independent components. The components of the instantaneous vectors of observed daily stocks are statistically dependent; stocks on average move together. In contrast, the components of the instantaneous daily vector of ICs are constructed to be statistically independent. This can be viewed as decomposing the returns into statistically independent sources. On three years of daily data from the Tokyo stock exchange, we showed that the estimated ICs fall into two categories, (i) infrequent but large shocks (responsible for the major changes in the stock prices), and (ii) frequent but rather small fluctuations (contributing only little to the overall level of the stocks).
We have shown that by using a portfolio of stocks, ICA can reveal some underlying structure in the data. Interestingly, the `noise' we observe may be attributed to signals within a certain amplitude range and not to signals in a certain (usually high) frequency range. Thus, ICA gives a fresh perspective to the problem of understanding the mechanisms that influence the stock market data.
In comparison to PCA, ICA is a complimentary tool which allows the underlying structure of the data to be more readily observed. There are clearly many other avenues in which ICA techniques can be applied to finance.
We are grateful to Morio Yoda, Nikko Securities, Tokyo for kindly providing the data used in this paper, and to JeanFrançois Cardoso for making the source code for the JADE algorithm available. Andrew Back acknowledges support of the Frontier Research Program, RIKEN and would like to thank Seungjin Choi and Zhang Liqing for helpful discussions. Andreas Weigend acknowledges support from the National Science Foundation (ECS9309786), and would like to thank Fei Chen, Elion Chin and Juan Lin for stimulating discussions.
The experiments used the following 28 stocks:
1.  The Bank Of TokyoMitsubishi  15.  Japan Asahi Bank  
2.  Toyota Motor  16.  Tokai Bank  
3.  Sumitomo Bank  17.  Honda Motor  
4.  Fuji Bank  18.  Sony  
5.  DaiIchi Kangyo Bank  19.  Seibu Railway  
6.  Industrial Bank Of Japan  20.  Toshiba  
7.  Sanwa Bank  21.  ItoYokado  
8.  Matsushita Electric  22.  Kansai Electric Power  
9.  Industrial Sakura Bank  23.  Nippon Steel  
10.  Nomura Securities  24.  Mitsubishi Trust And Banking  
11.  Tokyo Electric Power  25.  Nissan Motor  
12.  Hitachi  26.  Denso  
13.  Mitsubishi Heavy Industries  27.  Mitsubishi  
14.  SevenEleven  28.  Tokio Marine 
^{1} We are only aware of [5] who use a neural network that maximizes output entropy and of [40,41,42,60] who apply ICA in the context of state space models for interbank foreign exchange rates to improve the separation between observational noise and the ``true price.''
^{2} ICA algorithms based on second order statistics have also been proposed [7,56].
^{3} For four zeromean variables y_{i}, y_{j}, y_{k}, y_{l}, the fourth order cumulant is given by

This is the difference between the expected value E[ ·] of the product of the four variables (fourth moment), and the three products of pairs of covariances (second moments). The diagonal elements (i = j = l = m) are the fourth order selfcumulants.
^{4} A signal x(t) is considered to be stationary if the expected value is constant, or, after removing a constant mean, E[x(t)] = 0. In practice however, this definition depends on the interval over which we wish to measure the expectation.
^{5} We chose a subset of available historical data on which to test the method. This allows us to reserve subsequent data for further experimentation.
^{6} We also explored the effect of reducing the number of stocks that entered the ICA. The result is that the signal separation degrades when only fewer stocks are used. In that case, the independent components appear to give less distinct information. We had access to data for a maximum of 28 stocks.
^{7} ICs can by sorted in various ways. For example, in the implementation of the JADE algorithm Cardoso used a Euclidean norm to sort the rows of the demixing matrix W according to their contribution across all signals[14].
File translated from T_{E}X by T_{T}H, version 1.57.