Untangling Complex Systems, page 70
$$C(m, \varepsilon) \propto \varepsilon^{D_C} \qquad [10.68]$$
In practice, the plot of $\log C$ versus $\log \varepsilon$ is fitted by a straight line determined by the least-squares method. Its slope is the correlation dimension $D_C$. The procedure is repeated for different values of $m$. If the dynamics is chaotic, $D_C$ converges to a finite value that is often not an integer. If the dynamics is stochastic, $D_C$ does not converge and shows no saturating value even at high $m$ values (Osborne and Provenzale 1989).
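The fitting procedure just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the choice of $\varepsilon$ values, and the toy data set (points on a circle, which form a one-dimensional set) are all assumptions made for the example. For a real time series, `vectors` would be the delay-embedded vectors for a given $m$.

```python
import numpy as np

def pairwise_distances(vectors):
    """Euclidean distances between all distinct pairs of row vectors."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists[np.triu_indices(len(vectors), k=1)]  # each pair once

def correlation_dimension(vectors, eps_values):
    """Slope of the least-squares line through log C versus log eps."""
    d = pairwise_distances(vectors)
    # Correlation sum: fraction of pairs closer than eps
    c = np.array([np.mean(d < eps) for eps in eps_values])
    slope, _intercept = np.polyfit(np.log(eps_values), np.log(c), 1)
    return slope

# Toy data (assumption): 1000 points evenly spaced on the unit circle
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
eps_values = np.logspace(-1.3, -0.3, 8)
d_c = correlation_dimension(circle, eps_values)
print(f"estimated correlation dimension: {d_c:.2f}")
```

For this toy set the slope should come out close to 1. With experimental data, the range of $\varepsilon$ over which the plot is linear has to be chosen by inspection, and the estimate has to be repeated for increasing $m$.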
10.8.1.6 Permutation Entropy
Another parameter that is useful for discriminating between chaotic and stochastic data is the permutation entropy (Bandt and Pompe 2002). It represents the Shannon entropy of the permutation patterns of the elements of the time series. In the phase space of embedding dimension $m$, we assign to each vector $A_i = \{A(t_i), A(t_i + \tau), \ldots, A(t_i + (m-1)\tau)\}$, for $i = 1, \ldots, N - (m-1)\tau$, one of the $m!$ possible permutations (order patterns) of its elements. Every pattern among the $m!$ possible permutations is labelled as $\pi_j$ (with $j = 1, \ldots, m!$). We determine the abundance $q(\pi_j)$ of each pattern $\pi_j$ and calculate its frequency $\nu(\pi_j) = q(\pi_j) / (N - (m-1)\tau)$. The permutation entropy ($S_P$) is the Shannon entropy associated with the distribution of the permutation patterns:
$$S_P = -\sum_{j=1}^{m!} \nu(\pi_j) \log_2 \nu(\pi_j) \qquad [10.69]$$
$S_P$ ranges from 0 (for monotonically increasing or decreasing data) to $\log_2(m!)$ (for completely stochastic data). For this reason, we can normalize $S_P$ by dividing it by $\log_2(m!)$. The normalized permutation entropy is:
$$S_P = -\frac{\sum_{j=1}^{m!} \nu(\pi_j) \log_2 \nu(\pi_j)}{\log_2(m!)} \qquad [10.70]$$
It ranges between 0 and 1.
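Equations [10.69] and [10.70] translate almost directly into code. The sketch below is only illustrative: the function name, its arguments, and the use of `numpy.argsort` to identify the order pattern of each vector are assumptions made for the example, and ties between equal values are broken by order of appearance.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(series, m=3, tau=1, normalize=True):
    """Shannon entropy (base 2) of the order patterns of length m and delay tau."""
    x = np.asarray(series, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors < 1:
        raise ValueError("series too short for the chosen m and tau")
    # Abundance q(pi_j) of each order pattern found in the series
    counts = Counter(
        tuple(np.argsort(x[i : i + (m - 1) * tau + 1 : tau], kind="stable"))
        for i in range(n_vectors)
    )
    # Frequencies nu(pi_j); patterns that never occur contribute nothing
    freqs = np.array(list(counts.values()), dtype=float) / n_vectors
    s_p = -np.sum(freqs * np.log2(freqs))
    return s_p / math.log2(math.factorial(m)) if normalize else s_p

# Short series with 4 increasing and 2 decreasing pairs (m = 2)
example = permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=2, normalize=False)
monotonic = permutation_entropy(np.arange(100), m=3)
noise = permutation_entropy(np.random.default_rng(0).normal(size=5000), m=3)
print(example, monotonic, noise)
```

The monotonic ramp gives an entropy of zero, while the pseudo-random series gives a normalized value close to 1, in line with the limits stated above.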
10.8.1.7 Surrogate Data
The surrogate data methods are useful and reliable statistical tests to explore the presence of chaos
in a time series (Theiler et al. 1992). These methods have two ingredients: a null hypothesis tested
against observations and a discriminating statistic. The null hypothesis is a candidate explanation that we try to show is inadequate for interpreting the data we have collected. A discriminating statistic is a number that quantifies some feature of the time series. If the value of the statistic for the observed data differs significantly from the values obtained for surrogate data generated under a specific null hypothesis, then the null hypothesis can be rejected. Examples of null hypotheses are that the time series is (I) linearly filtered noise; (II) a linear transformation of linearly filtered noise; (III) a monotonic nonlinear transformation of linearly filtered noise. Different methods for generating
surrogate data have been proposed so far. For example, the Iterative Amplitude Adjusted Fourier
Transformed (IAAFT) surrogate data method (Schreiber and Schmitz 1996) considers a nonlinear
rescaling of a Gaussian linear stochastic process as the null hypothesis. On the other hand, the cycle
surrogate data method (Small and Judd 1998) is based on the null hypothesis that each cycle in
aperiodic dynamics is independent of its adjacent cycles. Usually, one begins with simple assump-
tions and progresses to more sophisticated models if the collected time series is inconsistent with
the surrogate data.
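As an illustration of the two ingredients, the sketch below builds surrogates by Fourier phase randomization, which is the simplest scheme and corresponds to the null hypothesis of linearly filtered noise. It is not the IAAFT or the cycle surrogate algorithm. The discriminating statistic (skewness of the increments), the rank-based rejection rule, the number of surrogates, and the example series (a logistic map) are assumptions chosen for the example.

```python
import numpy as np

def phase_randomized_surrogate(series, rng):
    """Surrogate with the same power spectrum as `series` but random phases."""
    x = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    randomized[0] = spectrum[0]        # keep the mean unchanged
    if len(x) % 2 == 0:
        randomized[-1] = spectrum[-1]  # Nyquist term must stay real
    return np.fft.irfft(randomized, n=len(x))

def time_asymmetry(x):
    """Hypothetical discriminating statistic: skewness of the increments."""
    d = np.diff(x)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def surrogate_test(series, statistic, n_surrogates=99, seed=0):
    """Compare the observed statistic with its distribution over surrogates."""
    rng = np.random.default_rng(seed)
    observed = statistic(np.asarray(series, dtype=float))
    values = np.array([
        statistic(phase_randomized_surrogate(series, rng))
        for _ in range(n_surrogates)
    ])
    # Two-sided rank test: reject if the observed value is the most extreme
    rejected = bool(observed < values.min() or observed > values.max())
    return observed, values, rejected

# Example series (assumption): logistic map with r = 3.9
x = np.empty(1024)
x[0] = 0.4
for t in range(len(x) - 1):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])

observed, values, rejected = surrogate_test(x, time_asymmetry)
print(observed, values.min(), values.max(), rejected)
```

With 99 surrogates, an observed value lying outside the whole range of surrogate values corresponds to rejecting the null hypothesis at roughly the 2% level for a two-sided test.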
10.8.1.8 Short-Term Predictability and Long-Term Unpredictability
A feature of a chaotic time series is its unpredictability in the long term. It has been demonstrated (Sugihara and May 1990) that the accuracy of a nonlinear forecasting method applied to a chaotic time series falls off as the prediction-time interval is increased. On the other hand, the accuracy of the nonlinear forecasting method is roughly independent of the prediction-time interval when it is applied to time series that are uncorrelated noise (see footnote 10). To discern a chaotic from a white noise time series, we first need to build its phase space by exploiting Takens' theorem. Then, we plot the time series in its phase space. For each point $A_i$ of the trajectory, it is possible to find its neighbors, which are $A_j$ (with $j = 1, \ldots, k$). For the prediction of the value $A_{i+\tau\Delta t}$, which lies $\tau\Delta t$ ahead in its phase space (see Figure 10.20), the nonlinear predictor exploits the corresponding $A_{j+\tau\Delta t}$ (with $j = 1, \ldots, k$) and the following algorithm:
$$A_{i+\tau\Delta t} = \sum_{j=1}^{k} W(A_j, A_i)\, A_{j+\tau\Delta t} \qquad [10.71]$$
10 Noise is a signal generated by a stochastic process. Stochastic phenomena produce time series having power spectral density (i.e., power per unit of frequency) functions that follow a power law of the form $f = L(\nu)/\nu^{b}$, where $\nu$ is the frequency, $b$ is a real number included in the interval $[-2, 2]$, and $L(\nu)$ is a positive slowly varying or constant function of $\nu$. When $b = 0$, we have white noise or uncorrelated noise. When $b \neq 0$, we have colored noise that has short-term autocorrelation (Kasdin 1995).
The Emergence of Chaos in Time
[Figure: the point $A_i$, its neighbors $A_1, \ldots, A_k$ at distances $D_1, \ldots, D_k$, and their images $A_{1+\tau\Delta t}, \ldots, A_{k+\tau\Delta t}$, together with the formula $A_{i+\tau\Delta t} = \sum_{j=1}^{k} \frac{e^{-\|A_i - A_j\|}}{\sum_{l=1}^{k} e^{-\|A_i - A_l\|}} A_{j+\tau\Delta t}$.]
FIGURE 10.20 Sketch and formula of the nonlinear local predictor.
In [10.71], $W(A_j, A_i)$ is a nonlinear function that depends on the Euclidean distances $d_j = \|A_i - A_j\|$ (with $j = 1, \ldots, k$), as indicated in [10.72]:
$$W(A_j, A_i) = \frac{e^{-\|A_i - A_j\|}}{\sum_{l=1}^{k} e^{-\|A_i - A_l\|}} \qquad [10.72]$$
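A possible implementation of [10.71] and [10.72] is sketched below. The function names, the way the series is split into a library of past vectors and a query point, and the test signal (a sine wave, chosen only because its future is known exactly) are assumptions made for the example. The prediction time is expressed as `horizon`, an integer number of sampling steps.

```python
import numpy as np

def delay_embed(series, m, tau):
    """Rows are the delay vectors {A(t_i), A(t_i + tau), ..., A(t_i + (m-1) tau)}."""
    x = np.asarray(series, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n_vectors] for j in range(m)])

def local_predict(library, query, horizon, k):
    """Predict the state `horizon` steps after `query` from its k nearest neighbors."""
    # Only neighbors whose future state is inside the library can be used
    candidates = library[: len(library) - horizon]
    dists = np.linalg.norm(candidates - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Exponential weights of [10.72], normalized over the k neighbors
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()
    # Weighted sum of the neighbors' future states, as in [10.71]
    return weights @ library[nearest + horizon]

# Toy signal (assumption): a sine wave sampled every 0.1 rad
t = np.arange(2000)
vectors = delay_embed(np.sin(0.1 * t), m=2, tau=15)
library = vectors[:1500]      # known past
i, horizon = 1700, 5          # query point outside the library
predicted = local_predict(library, vectors[i], horizon, k=4)
actual = vectors[i + horizon]
print(predicted, actual)
```

The query point is kept out of the library on purpose; otherwise its nearest neighbor would be the point itself and the test would be trivial.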
For the quantitative comparison of the predictions with the real data, we calculate the correlation
coefficient C:
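$$C = \frac{N_{ts}\sum_i A_{i,\mathrm{exp}} A_{i,\mathrm{pred}} - \left(\sum_i A_{i,\mathrm{exp}}\right)\left(\sum_i A_{i,\mathrm{pred}}\right)}{\left[N_{ts}\sum_i A_{i,\mathrm{exp}}^2 - \left(\sum_i A_{i,\mathrm{exp}}\right)^2\right]^{1/2}\left[N_{ts}\sum_i A_{i,\mathrm{pred}}^2 - \left(\sum_i A_{i,\mathrm{pred}}\right)^2\right]^{1/2}} \qquad [10.73]$$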
In [10.73], $N_{ts}$ is the number of testing data, $A_{i,\mathrm{exp}}$ is the $i$-th experimental value, and $A_{i,\mathrm{pred}}$ is the $i$-th predicted value. The correlation coefficient always lies between $+1$ and $-1$. When $C$ is close to 1, there is a high degree of correlation. If $C$ is negative, the data $A_{i,\mathrm{exp}}$ and $A_{i,\mathrm{pred}}$ are anti-correlated. When $C$ is nearly zero, the predicted and the experimental data are uncorrelated, and the predictions are not reliable. The procedure is repeated for increasing values of the prediction time $\tau\Delta t$. A decreasing trend of $C$ versus $\tau\Delta t$ indicates that the aperiodic time series is not uncorrelated noise. On the other hand, if $C$ is independent of $\tau\Delta t$, the time series is white noise. Predictions of uncorrelated noise have a fixed amount of error, regardless of how far, or close, into the future one tries to project (see footnote 11).
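The correlation coefficient [10.73] can be computed term by term as in the following sketch. The function name and the small arrays are made up for the example; the result is checked against `numpy.corrcoef`, which computes the same Pearson coefficient.

```python
import numpy as np

def correlation_coefficient(a_exp, a_pred):
    """Correlation coefficient C of [10.73] between measured and predicted values."""
    a_exp = np.asarray(a_exp, dtype=float)
    a_pred = np.asarray(a_pred, dtype=float)
    n_ts = len(a_exp)
    numerator = n_ts * np.sum(a_exp * a_pred) - np.sum(a_exp) * np.sum(a_pred)
    denominator = (
        np.sqrt(n_ts * np.sum(a_exp ** 2) - np.sum(a_exp) ** 2)
        * np.sqrt(n_ts * np.sum(a_pred ** 2) - np.sum(a_pred) ** 2)
    )
    return numerator / denominator

# Made-up testing data: predictions that follow the measurements closely
a_exp = np.array([0.2, 0.9, 0.4, 0.7, 0.1, 0.6])
a_pred = np.array([0.25, 0.8, 0.45, 0.6, 0.2, 0.65])
c_good = correlation_coefficient(a_exp, a_pred)
c_anti = correlation_coefficient(a_exp, -a_pred)
print(c_good, c_anti)
```

To apply the test described above, `c_good` would be recomputed for each prediction time $\tau\Delta t$ and plotted against it.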
TRY EXERCISE 10.15
10.8.2 Prediction of the Chaotic Time Series
The beginning of “modern” time series prediction is set at 1927, when Yule invented the autore-
gressive technique in order to predict the annual number of sunspots. His model predicted the next
value as a weighted sum of the previous observations of the series (Yule 1927). In the half-century
11 A deterministically chaotic time series may be distinguished from colored noise when the correlation coefficient obtained by the nonlinear local predictor is significantly better than the corresponding $C$ obtained by the best-fitting autoregressive linear predictor (Sugihara and May 1990), where the predicted value $A_{i+\tau\Delta t}$ is a linear function of previous values. For instance, $A_{i+\tau\Delta t} = bA_i + \varepsilon_{i+\tau\Delta t}$, which is a first-order autoregressive model (read also next paragraph).
following Yule, the reigning paradigm remained that of linear models driven by noise (Gershenfeld
and Weigend 1993). An autoregressive model of order N looks like equation [10.74]:
$$x_t = \sum_{i=1}^{N} a_i x_{t-i} + \varepsilon_t \qquad [10.74]$$
In [10.74], $a_i$ ($i = 1, \ldots, N$) are the parameters of the model and $\varepsilon_t$ is white noise. Linear time series
models have two particularly desirable features; they can be understood in depth, and they are straight-
forward to implement. However, they have a relevant drawback; they may be entirely inappropriate for
even moderately complicated systems. Two crucial developments in aperiodic time series prediction
occurred around 1980. The first development was the state-space reconstruction by the time-delay
embedding; the second was the research line of machine learning, typified by the nonlinear artificial
neural networks, which can adaptively explore a large space of potential models. Both developments
were enabled by the availability of powerful computers that allowed much longer time series to be
recorded and more complex algorithms to be used. Since the 1980s, several models have been pro-
posed to understand and predict aperiodic time series. Such investigations interest many disciplines,
such as meteorology, medicine, economy, engineering, astrophysics, geology, chemistry, and many
others. In 1991, Doyne Farmer, head of the Complex Systems Group at the Los Alamos National
Laboratory, quit his job and cofounded, along with his longtime friend and fellow physicist, Norman
Packard, a firm called Prediction Company. The mission of their new firm was to develop fully auto-
mated trading systems, based on predictive models of markets. In the same year, the Santa Fe Institute
organized a competition, the Santa Fe Time Series Prediction and Analysis Competition, to compare
different prediction methods (Gershenfeld and Weigend 1993). Six time-series data sets were pro-
posed: fluctuations of a far-infrared laser (data set A); physiological data from a patient with sleep
apnea (data set B); currency exchange rate data (data set C); a numerically generated series (data set
D); astrophysical data from a variable star (data set E); and Bach’s final fugue (data set F). The main
benchmark was data set A, consisting of 1000 points, with the following 100 points to be predicted
by the competitors. The winner was E. A. Wan, who used a finite impulse response neural network
for autoregressive time series prediction. In 1998, there was the K.U. Leuven Competition within an
international workshop titled “Advanced Black-Box Techniques for Nonlinear Modeling: Theory and
Applications” (Suykens and Vandewalle 1998). The benchmark was a time series of 2000 data
points generated from a computer simulation of Chua’s electronic circuit (read Box 10.2 of this chapter
for more information about the Chua circuit). The task was to predict the next 200 points of the time
series. The winner was J. McNames who used the nearest trajectory method, which incorporated local
modeling and cross-validation techniques. More time series prediction competitions have been organized in the twenty-first century, several international symposia on forecasting have been arranged, and research groups focused on time series prediction have been formed. Usually, the best predictions have been obtained with artificial neural network methods (see, e.g., Gentili et al. 2015, where an artificial neural network is compared with fuzzy logic and the nonlinear local predictor).
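As a concrete reference point for the linear baseline, the autoregressive model [10.74] introduced at the beginning of this section can be fitted by ordinary least squares. The sketch below is one simple way of doing it, not necessarily the procedure used by Yule or by the competition entries; the function names and the synthetic second-order series with coefficients 0.6 and −0.3 are assumptions made for the example.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares estimate of the coefficients a_1, ..., a_N of [10.74]."""
    x = np.asarray(series, dtype=float)
    # The row for time t holds x[t-1], x[t-2], ..., x[t-order]
    design = np.column_stack(
        [x[order - i : len(x) - i] for i in range(1, order + 1)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs

def predict_next(series, coeffs):
    """One-step-ahead prediction: weighted sum of the most recent values."""
    x = np.asarray(series, dtype=float)
    order = len(coeffs)
    return float(np.dot(coeffs, x[-1 : -order - 1 : -1]))

# Synthetic AR(2) series driven by white noise (assumed coefficients)
rng = np.random.default_rng(0)
true_coeffs = np.array([0.6, -0.3])
x = np.zeros(5000)
for t in range(2, len(x)):
    x[t] = true_coeffs[0] * x[t - 1] + true_coeffs[1] * x[t - 2] + rng.normal()

estimated = fit_ar(x, order=2)
forecast = predict_next(x, estimated)
print(estimated, forecast)
```

On a series that really is linear and noise-driven, the estimated coefficients approach the true ones as the series gets longer; on a chaotic series, the same fit gives the linear benchmark against which nonlinear predictors are compared.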
10.8.2.1 Artificial Neural Networks
Artificial neural networks (ANNs) are algorithmic architectures that simulate the behavior of networks of real nerve cells in the central nervous system (Fausett 1994; Hassoun 1995). They are well suited to predicting chaotic and stochastic time series, to solving problems that are complex, ill-defined, and highly nonlinear, and to recognizing variable patterns (also read Chapters 12 and 13). There are infinitely many ways of organizing a neural network, although just four ingredients are needed to build one.
The first ingredient is its architecture or connection patterns. Based on the architecture, ANNs are
grouped into two categories: (a) feed-forward networks, where graphs have no loops, and (b) feedback
(or recurrent) networks, where loops occur because of feedback connections. In feed-forward net-
works, neurons are organized into layers that have unidirectional connections between them.
FIGURE 10.21 Examples of activation functions: (a) threshold, (b) piecewise linear, (c) logistic, (d) Gaussian functions.
The second ingredient to generate an ANN is the ensemble of its activation functions that transform the inputs of a node into output values. A mathematical neuron computes a weighted ($w_i$) sum of $n$ inputs, $x_i$ ($i = 1, 2, \ldots, n$), and generates an output through an activation function $f(\cdot)$:
$$y = f\left(\sum_{i=1}^{n} w_i x_i\right) \qquad [10.75]$$
Examples of f(.) are the threshold, piecewise linear, logistic, and Gaussian functions shown in
Figure 10.21.
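The neuron of [10.75] and the four kinds of activation function can be written compactly as follows. The specific parametrizations (threshold at zero, a ramp between 0 and 1, unit-width Gaussian) are assumptions, since the exact curves of Figure 10.21 are not specified in the text; the function names are also illustrative.

```python
import numpy as np

def threshold(s):
    """Step function: 1 if the weighted sum is non-negative, else 0 (assumed form)."""
    return np.where(s >= 0.0, 1.0, 0.0)

def piecewise_linear(s):
    """Linear ramp clipped to the interval [0, 1] (assumed form)."""
    return np.clip(s + 0.5, 0.0, 1.0)

def logistic(s):
    """Sigmoid 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def gaussian(s):
    """Bell-shaped response exp(-s^2) (assumed unit width)."""
    return np.exp(-s ** 2)

def neuron(x, w, f):
    """Output y = f(sum_i w_i x_i) of a single mathematical neuron, as in [10.75]."""
    return float(f(np.dot(w, x)))

# Made-up inputs and weights whose weighted sum is exactly zero
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
outputs = {f.__name__: neuron(x, w, f)
           for f in (threshold, piecewise_linear, logistic, gaussian)}
print(outputs)
```

Because the weighted sum in this example is zero, each activation function returns its value at the origin, which makes the differences between the four choices easy to compare by hand.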
The third ingredient is the cost function that estimates if the output is acceptable. The most frequently used cost function is the squared error $E(t)$:
$$E(t) = \left[\mathrm{out}(t) - \mathrm{target}(t)\right]^2 \qquad [10.76]$$
where:
$\mathrm{out}(t)$ is the output at time $t$ calculated from the input recorded at time $t$,
$\mathrm{target}(t)$ is the desired output at $t$.
The fourth and last ingredient is the training algorithm, also known as the learning rule, which
modifies the parameters wi to minimize the chosen cost function. There are three types of learn-
ing paradigms: (1) supervised, (2) unsupervised, (3) hybrid learning. In supervised learning, the
network is provided with a correct output. Weights are determined to allow the network to produce
answers as close as possible to the known correct answers. Unsupervised learning does not require
a correct answer associated with each input pattern in the training data set. It explores the underly-
ing structure of the data, or correlations between patterns in the data, and organizes patterns into
classes from these correlations. Finally, hybrid learning combines supervised and unsupervised
learning. Part of the weights is usually determined through supervised learning, while the others are
obtained through unsupervised learning. There are many learning rules. The most frequently used
in time series prediction are the error-correction rules within the supervised learning paradigm.
They iteratively update the weights $w_i$ by taking a small step (parametrized by the learning rate $\eta$) in the direction that decreases the squared error [10.76] the most. The updated weight $w_i$ is obtained from the previous weight through the algorithm [10.77]:
