Untangling Complex Systems (Pier Luigi Gentili) » p.70


$$C(m, \varepsilon) \propto \varepsilon^{D_C}$$ [10.68]

In practice, the plot of log(C) versus log(ε) is fitted by a straight line determined by the least-squares method. Its slope is the correlation dimension $D_C$. The procedure is repeated for different values of m. If the dynamics are chaotic, $D_C$ converges to a finite value that is often not an integer. If the dynamics are stochastic, $D_C$ does not converge and shows no saturating value even at high m values (Osborne and Provenzale 1989).
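The fitting procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the brute-force pairwise distance computation, the logistic-map test signal, and the chosen range of ε are all assumptions made for the example.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Delay vectors (x[i], x[i+tau], ..., x[i+(m-1)*tau]) as rows."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n_vectors] for k in range(m)])

def correlation_sum(points, eps_values):
    """C(m, eps): fraction of distinct pairs of points closer than eps."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    pair_dists = dists[np.triu_indices(len(points), k=1)]
    return np.array([np.mean(pair_dists < eps) for eps in eps_values])

def correlation_dimension(points, eps_values):
    """Least-squares slope of log C versus log eps, as in [10.68]."""
    c = correlation_sum(points, eps_values)
    slope, _intercept = np.polyfit(np.log(eps_values), np.log(c), 1)
    return slope

# Illustration on the logistic map (a hypothetical test signal, not from the text).
x = np.empty(600)
x[0] = 0.3
for t in range(599):
    x[t + 1] = 4.0 * x[t] * (1.0 - x[t])

eps_values = np.logspace(-2, -0.5, 10)
for m in (1, 2, 3):
    d_c = correlation_dimension(delay_embed(x, m, tau=1), eps_values)
    print(f"m = {m}: D_C estimate = {d_c:.2f}")
```

Repeating the estimate for increasing m, as in the loop above, is how one checks whether the slope saturates. The brute-force distance matrix is only practical for short series.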

10.8.1.6 Permutation Entropy

Another parameter that is useful for discriminating between chaotic and stochastic data is the permutation entropy (Bandt and Pompe 2002). It represents the Shannon entropy of the permutation patterns of the elements of the time series. In the phase space of embedding dimension m, we account for all m! possible permutations of all the vectors $\mathbf{A}_i = \{A(t_i), A(t_i+\tau), \ldots, A(t_i+(m-1)\tau)\}$, for $i = 1, \ldots, N-(m-1)\tau$. Every pattern among the m! possible permutations is labelled as $\pi_j$ (with $j = 1, \ldots, m!$). We determine the abundance $q(\pi_j)$ of each pattern $\pi_j$ and calculate its frequency $\nu(\pi_j) = q(\pi_j)/(N-(m-1)\tau)$. The permutation entropy ($S_P$) is the Shannon entropy associated with the distribution of the permutation patterns:

$$S_P = -\sum_{j=1}^{m!} \nu(\pi_j) \log_2 \nu(\pi_j)$$ [10.69]


The range of variability for $S_P$ is from 0 (for monotonically increasing or decreasing data) to $\log_2(m!)$ (for completely stochastic data). For this reason, we can normalize $S_P$ by dividing it by $\log_2(m!)$. The normalized permutation entropy is:

$$S_P = -\frac{\sum_{j=1}^{m!} \nu(\pi_j) \log_2 \nu(\pi_j)}{\log_2(m!)}$$ [10.70]

It ranges between 0 and 1.
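As a rough sketch of [10.69] and [10.70], the function below counts ordinal patterns with `numpy.argsort` and computes their Shannon entropy. The function name, the `normalize` flag, and the handling of ties (equal values are ranked by order of appearance) are assumptions for illustration; patterns that never occur contribute nothing to the sum.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(x, m, tau=1, normalize=True):
    """Shannon entropy (base 2) of the ordinal patterns of length m."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for this m and tau")
    # The pattern of a delay vector is the permutation that sorts it.
    counts = Counter(
        tuple(np.argsort(x[i : i + (m - 1) * tau + 1 : tau], kind="stable"))
        for i in range(n_vectors)
    )
    freqs = np.array(list(counts.values()), dtype=float) / n_vectors
    s_p = float(-np.sum(freqs * np.log2(freqs)))
    if normalize:
        s_p /= math.log2(math.factorial(m))
    return s_p

# A monotonic ramp has a single pattern; uniform random numbers use all m! patterns.
rng = np.random.default_rng(0)
print(permutation_entropy(np.arange(100), m=3))
print(permutation_entropy(rng.random(5000), m=3))
```

With the normalization switched on, values near 0 indicate highly regular data and values near 1 indicate data whose ordinal patterns are almost equally frequent.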

10.8.1.7 Surrogate Data

The surrogate data methods are useful and reliable statistical tests to explore the presence of chaos

in a time series (Theiler et al. 1992). These methods have two ingredients: a null hypothesis tested

against observations and a discriminating statistic. The null hypothesis is a potential explanation that we want to show is inadequate for interpreting the collected data. A discriminating statistic is a number that quantifies some feature of the time series. If the statistic computed for the observed data differs significantly from that computed for the surrogate data generated under a specific null hypothesis, then the null hypothesis can be rejected. Examples of null hypotheses are that the time series is (I) linearly filtered noise; (II) a linear transformation of linearly filtered noise; (III) a monotonic nonlinear transformation of linearly filtered noise. Different methods for generating

surrogate data have been proposed so far. For example, the Iterative Amplitude Adjusted Fourier

Transformed (IAAFT) surrogate data method (Schreiber and Schmitz 1996) considers a nonlinear

rescaling of a Gaussian linear stochastic process as the null hypothesis. On the other hand, the cycle

surrogate data method (Small and Judd 1998) is based on the null hypothesis that each cycle in

aperiodic dynamics is independent of its adjacent cycles. Usually, one begins with simple assump-

tions and progresses to more sophisticated models if the collected time series is inconsistent with

the surrogate data.
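To make the two ingredients concrete, here is a minimal sketch of the simplest case, null hypothesis (I). It uses plain phase randomization of the Fourier transform, which keeps the power spectrum of the data while scrambling everything else; it is not the IAAFT or the cycle surrogate method mentioned above. The function names and the time-reversal asymmetry statistic are assumptions chosen for illustration.

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but random Fourier phases."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    phases[0] = 0.0            # keep the zero-frequency term (the mean) unchanged
    if n % 2 == 0:
        phases[-1] = 0.0       # the Nyquist term must stay real
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)

def time_reversal_asymmetry(x):
    """Hypothetical discriminating statistic: mean cubed increment."""
    x = np.asarray(x, dtype=float)
    return np.mean((x[1:] - x[:-1]) ** 3)

def surrogate_rank(x, statistic, n_surrogates, rng):
    """Fraction of surrogates whose |statistic| is at least the observed one."""
    observed = abs(statistic(x))
    values = np.array([
        abs(statistic(phase_randomized_surrogate(x, rng)))
        for _ in range(n_surrogates)
    ])
    return np.mean(values >= observed)
```

A small fraction returned by `surrogate_rank` suggests that the observed statistic is unusual under the null hypothesis, which is then a candidate for rejection; the number of surrogates sets the resolution of this estimate.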

10.8.1.8 Short-Term Predictability and Long-Term Unpredictability

A feature of a chaotic time series is its unpredictability in the long term. It has been demonstrated (Sugihara and May 1990) that the accuracy of a nonlinear forecasting method falls off as the prediction-time interval is increased when it tries to predict a chaotic time series. On the other hand, the accuracy of the nonlinear forecasting method is roughly independent of the prediction-time interval when it tries to predict time series that are uncorrelated noise.¹⁰ To discern a chaotic from a white noise time series, we first need to build its phase space by exploiting Takens' theorem. Then, we plot the time series in its phase space. For each point $A_i$ of the trajectory, it is possible to find its neighbors, which are $A_j$ (with j = 1, …, k). For the prediction of the value $A_{i+\tau\Delta t}$, which lies τΔt ahead in its phase space (see Figure 10.20), the nonlinear predictor exploits the corresponding $A_{j+\tau\Delta t}$ (with j = 1, …, k) and the following algorithm:

$$A_{i+\tau\Delta t} = \sum_{j=1}^{k} W(A_j, A_i)\, A_{j+\tau\Delta t}$$ [10.71]

¹⁰ Noise is a signal generated by a stochastic process. Stochastic phenomena produce time series having power spectral density (i.e., power per unit of frequency) functions that follow a power law of the form $f = L(\nu)/\nu^{b}$, where ν is the frequency, b is a real number included in the interval [−2, 2], and L(ν) is a positive slowly varying or constant function of ν. When b = 0, we have white noise or uncorrelated noise. When b ≠ 0, we have colored noise that has short-term autocorrelation (Kasdin 1995).

The Emergence of Chaos in Time


[Figure 10.20: the point $A_i$ with its k neighbors $A_1, A_2, A_3, \ldots, A_k$ at distances $D_1, D_2, D_3, \ldots, D_k$, and the corresponding points $A_{1+\tau\Delta t}, \ldots, A_{k+\tau\Delta t}$ used to predict $A_{i+\tau\Delta t}$ as the sum of the $A_{j+\tau\Delta t}$, each weighted by $e^{-\|A_i - A_j\|} / \sum_{l=1}^{k} e^{-\|A_i - A_l\|}$.]

FIGURE 10.20 Sketch and formula of the nonlinear local predictor.

In [10.71], $W(A_j, A_i)$ is a nonlinear function that depends on the Euclidean distances $d_j = \|A_i - A_j\|$ (with j = 1, …, k) as indicated in [10.72]:

$$W(A_j, A_i) = \frac{e^{-\|A_i - A_j\|}}{\sum_{l=1}^{k} e^{-\|A_i - A_l\|}}$$ [10.72]
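Equations [10.71] and [10.72] translate almost directly into code. The sketch below assumes that the phase-space points are stored as rows of an array in time order, so that the point lying `steps` samples ahead of row j is row j + `steps`; the function and argument names are hypothetical.

```python
import numpy as np

def local_predict(library, query, steps, k):
    """Predict the point lying `steps` samples ahead of `query`.

    library : array of shape (N, m), phase-space points in time order
    query   : array of shape (m,), the point A_i
    """
    library = np.asarray(library, dtype=float)
    query = np.asarray(query, dtype=float)
    # Only points whose future is known can serve as neighbors.
    candidates = library[: len(library) - steps]
    d = np.linalg.norm(candidates - query, axis=1)
    nearest = np.argsort(d)[:k]
    # Weights of [10.72]; subtracting the smallest distance leaves the
    # normalized weights unchanged and avoids underflow for large distances.
    w = np.exp(-(d[nearest] - d[nearest].min()))
    w /= w.sum()
    # Weighted sum of the neighbors' futures, as in [10.71].
    return w @ library[nearest + steps]
```

In an actual test, the query point should not itself belong to the library (otherwise it is its own nearest neighbor at distance zero), which is why the data are usually split into a library part and a testing part.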

For the quantitative comparison of the predictions with the real data, we calculate the correlation

coefficient C:

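$$C = \frac{N_{ts}\sum_i A_{i,\mathrm{exp}} A_{i,\mathrm{pred}} - \left(\sum_i A_{i,\mathrm{exp}}\right)\left(\sum_i A_{i,\mathrm{pred}}\right)}{\left\{\left[N_{ts}\sum_i A_{i,\mathrm{exp}}^2 - \left(\sum_i A_{i,\mathrm{exp}}\right)^2\right]\left[N_{ts}\sum_i A_{i,\mathrm{pred}}^2 - \left(\sum_i A_{i,\mathrm{pred}}\right)^2\right]\right\}^{1/2}}$$ [10.73]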

In [10.73], $N_{ts}$ is the number of testing data, $A_{i,\mathrm{exp}}$ is the i-th experimental value, and $A_{i,\mathrm{pred}}$ is the i-th predicted value. The correlation coefficient always lies between +1 and −1. When C is close to 1, there is a high degree of correlation. If C is negative, the data $A_{i,\mathrm{exp}}$ and $A_{i,\mathrm{pred}}$ are anti-correlated. When C is nearly zero, the predicted and the experimental data are uncorrelated, and the predictions are not reliable. The procedure is repeated for increasing values of the prediction time τΔt. A decreasing trend of C versus τΔt is evidence that the aperiodic time series is not uncorrelated noise. On the other hand, if C is independent of τΔt, the time series is white noise. Predictions of uncorrelated noise have a fixed amount of error, regardless of how far, or close, into the future one tries to project.¹¹
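The correlation coefficient [10.73] is the usual Pearson coefficient written in terms of raw sums. A minimal sketch, with hypothetical argument names, is:

```python
import numpy as np

def correlation_coefficient(a_exp, a_pred):
    """Correlation coefficient C of [10.73] between data and predictions."""
    a_exp = np.asarray(a_exp, dtype=float)
    a_pred = np.asarray(a_pred, dtype=float)
    n_ts = len(a_exp)
    numerator = n_ts * np.sum(a_exp * a_pred) - a_exp.sum() * a_pred.sum()
    denominator = np.sqrt(
        (n_ts * np.sum(a_exp ** 2) - a_exp.sum() ** 2)
        * (n_ts * np.sum(a_pred ** 2) - a_pred.sum() ** 2)
    )
    return numerator / denominator
```

Evaluating this function on the predictions obtained for each prediction time τΔt gives the curve of C versus τΔt discussed above. For the same inputs it should agree with `numpy.corrcoef` up to rounding.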

TRY EXERCISE 10.15

¹¹ A deterministically chaotic time series may be distinguished from colored noise when the correlation coefficient obtained by the nonlinear local predictor is significantly better than the corresponding C obtained by the best-fitting autoregressive linear predictor (Sugihara and May 1990), where the predicted value $A_{i+\tau\Delta t}$ is a linear function of previous values. For instance, $A_{i+\tau\Delta t} = bA_i + \varepsilon_{i+\tau\Delta t}$, which is a first-order autoregressive model (read also next paragraph).

10.8.2 Prediction of the Chaotic Time Series

The beginning of “modern” time series prediction is set at 1927, when Yule invented the autoregressive technique in order to predict the annual number of sunspots. His model predicted the next value as a weighted sum of the previous observations of the series (Yule 1927). In the half-century following Yule, the reigning paradigm remained that of linear models driven by noise (Gershenfeld and Weigend 1993). An autoregressive model of order N looks like equation [10.74]:

$$x_t = \sum_{i=1}^{N} a_i x_{t-i} + \varepsilon_t$$ [10.74]

In [10.74], $a_i$ (i = 1, …, N) are the parameters of the model and $\varepsilon_t$ is white noise. Linear time series models have two particularly desirable features: they can be understood in depth, and they are straightforward to implement. However, they have a relevant drawback: they may be entirely inappropriate for

even moderately complicated systems. Two crucial developments in aperiodic time series prediction

occurred around 1980. The first development was the state-space reconstruction by the time-delay

embedding; the second was the research line of machine learning, typified by the nonlinear artificial

neural networks, which can adaptively explore a large space of potential models. Both developments

were enabled by the availability of powerful computers that allowed much longer time series to be

recorded and more complex algorithms to be used. Since the 1980s, several models have been proposed to understand and predict aperiodic time series. Such investigations interest many disciplines, such as meteorology, medicine, economics, engineering, astrophysics, geology, chemistry, and many others. In 1991, Doyne Farmer, head of the Complex Systems Group at the Los Alamos National

Laboratory, quit his job and cofounded, along with his longtime friend and fellow physicist, Norman

Packard, a firm called Prediction Company. The mission of their new firm was to develop fully auto-

mated trading systems, based on predictive models of markets. In the same year, the Santa Fe Institute

organized a competition, the Santa Fe Time Series Prediction and Analysis Competition, to compare

different prediction methods (Gershenfeld and Weigend 1993). Six time-series data sets were pro-

posed: fluctuations of a far-infrared laser (data set A); physiological data from a patient with sleep

apnea (data set B); currency exchange rate data (data set C); a numerically generated series (data set

D); astrophysical data from a variable star (data set E); and Bach’s final fugue (data set F). The main

benchmark was data set A consisting of 1000 points and with 100 points in the future to be predicted

by the competitors. The winner was E. A. Wan, who used a finite impulse response neural network for autoregressive time series prediction. In 1998, the K.U. Leuven Competition was held within an international workshop titled “Advanced Black-Box Techniques for Nonlinear Modeling: Theory and Applications” (Suykens and Vandewalle 1998). The benchmark was a time series of 2000 data points generated from a computer simulation of Chua’s electronic circuit (read Box 10.2 of this chapter for more information about the Chua circuit). The task was to predict the next 200 points of the time

series. The winner was J. McNames who used the nearest trajectory method, which incorporated local

modeling and cross-validation techniques. More time series prediction competitions have been organized in the twenty-first century, several international symposia on forecasting have been arranged, and research groups focused on time series prediction have been formed. Usually, the best predictions have been achieved by artificial neural network methods (see, e.g., Gentili et al. 2015, where an artificial neural network is compared with fuzzy logic and the nonlinear local predictor).
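As a point of comparison for these nonlinear methods, the linear autoregressive model [10.74] introduced earlier in this section can be fitted by ordinary least squares. The sketch below is one simple way to do it, not Yule's original procedure; the function names are hypothetical.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares estimate of the coefficients a_1, ..., a_N of [10.74]."""
    x = np.asarray(x, dtype=float)
    # The row for time t holds the lagged values (x[t-1], ..., x[t-order]).
    lagged = np.column_stack(
        [x[order - i : len(x) - i] for i in range(1, order + 1)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coeffs

def predict_next(x, coeffs):
    """One-step prediction: weighted sum of the most recent values."""
    x = np.asarray(x, dtype=float)
    order = len(coeffs)
    recent = x[-1 : -order - 1 : -1]   # x[t-1], x[t-2], ..., x[t-order]
    return float(np.dot(coeffs, recent))
```

The noise term of [10.74] is not simulated here; the fitted coefficients simply minimize the squared one-step prediction error over the available series.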

10.8.2.1 Artificial Neural Networks

Artificial neural networks (ANNs) are algorithmic architectures simulating the behavior of networks of real nerve cells in the central nervous system (Fausett 1994; Hassoun 1995). They are well suited to predicting chaotic and stochastic time series, but also to solving problems that are complex, ill-defined, and highly nonlinear, and to recognizing variable patterns (also read Chapters 12 and 13). There are infinitely many ways of organizing a neural network, although just four ingredients are needed to build one.

The first ingredient is its architecture or connection patterns. Based on the architecture, ANNs are

grouped into two categories: (a) feed-forward networks, where graphs have no loops, and (b) feedback

(or recurrent) networks, where loops occur because of feedback connections. In feed-forward net-

works, neurons are organized into layers that have unidirectional connections between them.


FIGURE 10.21 Examples of activation functions: (a) threshold, (b) piecewise linear, (c) logistic, (d) Gaussian functions.

The second ingredient to generate an ANN is the ensemble of its activation functions that transform the inputs of a node into output values. A mathematical neuron computes a weighted ($w_i$) sum of n inputs, $x_i$ (i = 1, 2, …, n), and generates an output through an activation function f(·):

$$y = f\left(\sum_{i=1}^{n} w_i x_i\right)$$ [10.75]

Examples of f(.) are the threshold, piecewise linear, logistic, and Gaussian functions shown in

Figure 10.21.
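A mathematical neuron of the form [10.75] can be sketched as follows. The text names the four activation functions but does not give their formulas, so the specific expressions below (thresholds at zero, unit slope, unit width) are assumptions chosen for illustration.

```python
import numpy as np

def threshold(v):
    """Step function: 1 for non-negative input, 0 otherwise."""
    return np.where(v >= 0.0, 1.0, 0.0)

def piecewise_linear(v):
    """Linear ramp with unit slope, saturating at 0 and 1."""
    return np.clip(v + 0.5, 0.0, 1.0)

def logistic(v):
    """Sigmoid function 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def gaussian(v):
    """Bell-shaped function exp(-v^2)."""
    return np.exp(-v ** 2)

def neuron(x, w, f):
    """Output y = f(sum_i w_i * x_i) of a single neuron, as in [10.75]."""
    return f(np.dot(w, x))

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
for f in (threshold, piecewise_linear, logistic, gaussian):
    print(f.__name__, neuron(x, w, f))
```

Passing the activation function as an argument keeps the weighted sum separate from the nonlinearity, which mirrors the structure of [10.75].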

The third ingredient is the cost function that estimates if the output is acceptable. The most frequently used cost function is the squared error E(t):

$$E(t) = \left[\mathrm{out}(t) - \mathrm{target}(t)\right]^2$$ [10.76]

where:

out ( t) is the output at time t calculated from the input recorded at time t,

target ( t) is the desired output at t.

The fourth and last ingredient is the training algorithm, also known as the learning rule, which

modifies the parameters wi to minimize the chosen cost function. There are three types of learn-

ing paradigms: (1) supervised, (2) unsupervised, (3) hybrid learning. In supervised learning, the

network is provided with a correct output. Weights are determined to allow the network to produce

answers as close as possible to the known correct answers. Unsupervised learning does not require

a correct answer associated with each input pattern in the training data set. It explores the underly-

ing structure of the data, or correlations between patterns in the data, and organizes patterns into

classes from these correlations. Finally, hybrid learning combines supervised and unsupervised

learning. Part of the weights is usually determined through supervised learning, while the others are

obtained through unsupervised learning. There are many learning rules. The most frequently used

in time series prediction are the error-correction rules within the supervised learning paradigm.

They iteratively update the weights $w_i$ by taking a small step (parametrized by the learning rate η) in the direction that decreases the squared error [10.76] the most. The updated weight $w_i$ is obtained from the previous weight $w_i$ through the algorithm [10.77]:
