
Untangling Complex Systems, page 70

 


$$C(m, \varepsilon) \propto \varepsilon^{D_C} \qquad [10.68]$$

In practice, the plot of $\log(C)$ versus $\log(\varepsilon)$ is fitted by a straight line determined by the least-squares method. Its slope is the correlation dimension $D_C$. The procedure is repeated for different values of m. If the dynamic is chaotic, $D_C$ converges to a finite value that, often, is not an integer. If the dynamic is stochastic, $D_C$ does not converge and does not show a saturating value even at high m values (Osborne and Provenzale 1989).
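The slope-fitting procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the correlation sum is taken to be the fraction of pairs of delay vectors closer than ε (the usual Grassberger-Procaccia definition, which is not restated in this section), and the function names `delay_embed`, `correlation_sum`, and `correlation_dimension` are hypothetical helpers, not part of any library.

```python
import numpy as np

def delay_embed(series, m, tau=1):
    """Return the matrix whose rows are the m-dimensional delay vectors."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (m - 1) * tau
    return np.column_stack(
        [series[j * tau : j * tau + n_vectors] for j in range(m)]
    )

def correlation_sum(vectors, eps):
    """Fraction of distinct pairs of vectors closer than eps (Euclidean norm).

    Assumed definition of C(m, eps); it needs O(N^2) memory, so it is meant
    for short series only.
    """
    vectors = np.asarray(vectors, dtype=float)
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(vectors), k=1)   # each pair counted once
    return np.mean(dists[upper] < eps)

def correlation_dimension(series, m, eps_values, tau=1):
    """Least-squares slope of log C versus log eps, i.e. an estimate of D_C."""
    eps_values = np.asarray(eps_values, dtype=float)
    vectors = delay_embed(series, m, tau)
    c = np.array([correlation_sum(vectors, e) for e in eps_values])
    keep = c > 0                                 # log(0) is undefined
    slope, _ = np.polyfit(np.log(eps_values[keep]), np.log(c[keep]), 1)
    return slope
```

Calling `correlation_dimension` for m = 1, 2, 3, … and checking whether the returned slope saturates reproduces the test described above. The range of ε over which the fit is made has to be chosen by inspecting the log-log plot; the sketch does not select it automatically.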

  10.8.1.6 Permutation Entropy

Another parameter that is useful for discriminating between chaotic and stochastic data is the permutation entropy (Bandt and Pompe 2002). It represents the Shannon entropy of the permutation patterns of the elements of the time series. In the phase space of embedding dimension m, we account for all m! possible permutations of all the vectors $A_i = \{A(t_i), A(t_i+\tau), \ldots, A(t_i+(m-1)\tau)\}$, for $i = 1, \ldots, N-(m-1)\tau$. Every pattern among the m! possible permutations is labelled as $\pi_j$ (with $j = 1, \ldots, m!$). We determine the abundance $q(\pi_j)$ of each pattern $\pi_j$ and calculate its frequency $\nu(\pi_j) = q(\pi_j)/(N-(m-1)\tau)$. The permutation entropy ($S_P$) is the Shannon entropy associated with the distribution of the permutation patterns:

$$S_P = -\sum_{j=1}^{m!} \nu(\pi_j)\,\log_2 \nu(\pi_j) \qquad [10.69]$$


$S_P$ ranges from 0 (for monotonically increasing or decreasing data) to $\log_2(m!)$ (for completely stochastic data). For this reason, we can normalize $S_P$ by dividing it by $\log_2(m!)$. The normalized permutation entropy is:

$$S_P = -\frac{\sum_{j=1}^{m!} \nu(\pi_j)\,\log_2 \nu(\pi_j)}{\log_2(m!)} \qquad [10.70]$$

  It ranges between 0 and 1.
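A minimal Python sketch of [10.69] and [10.70] follows. It assumes that the permutation pattern of a vector is the ordering of indices that sorts its elements, and that ties are broken by order of appearance; both choices, like the function name `permutation_entropy`, are illustrative assumptions.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(series, m=3, tau=1, normalize=True):
    """Shannon entropy (base 2) of the ordinal patterns of the delay vectors."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (m - 1) * tau      # N - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this m and tau")
    # Abundance q(pi_j): count how often each ordinal pattern occurs.
    abundance = Counter(
        tuple(np.argsort(series[i : i + (m - 1) * tau + 1 : tau], kind="stable"))
        for i in range(n_vectors)
    )
    # Frequencies nu(pi_j); patterns that never occur contribute nothing.
    freqs = np.array(list(abundance.values()), dtype=float) / n_vectors
    entropy = -np.sum(freqs * np.log2(freqs))
    if normalize:
        entropy /= math.log2(math.factorial(m))  # divide by log2(m!)
    return float(entropy)
```

For a strictly increasing series only one pattern occurs, so the function returns 0; for long uncorrelated random data all m! patterns are nearly equally frequent and the normalized value approaches 1.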

  10.8.1.7 Surrogate Data

The surrogate data methods are useful and reliable statistical tests to explore the presence of chaos in a time series (Theiler et al. 1992). These methods have two ingredients: a null hypothesis tested against observations and a discriminating statistic. The null hypothesis is a potential explanation that we want to show is inadequate for interpreting the data we have collected. A discriminating statistic is a number that quantifies some feature of the time series. If the value of the statistic for the observed data differs significantly from the values obtained for the surrogate data generated under a specific null hypothesis, then the null hypothesis can be rejected. Examples of null hypotheses are the following: the time series is (I) linearly filtered noise; (II) a linear transformation of linearly filtered noise; (III) a monotonic nonlinear transformation of linearly filtered noise. Different methods for generating surrogate data have been proposed so far. For example, the Iterative Amplitude Adjusted Fourier Transformed (IAAFT) surrogate data method (Schreiber and Schmitz 1996) considers a nonlinear rescaling of a Gaussian linear stochastic process as the null hypothesis. On the other hand, the cycle surrogate data method (Small and Judd 1998) is based on the null hypothesis that each cycle in aperiodic dynamics is independent of its adjacent cycles. Usually, one begins with simple assumptions and progresses to more sophisticated models if the collected time series is inconsistent with the surrogate data.
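As an illustration of the two ingredients, the sketch below generates phase-randomized Fourier surrogates, which correspond to null hypothesis (I), linearly filtered noise. It is not the IAAFT or the cycle surrogate method mentioned above. The discriminating statistic (a time-reversal asymmetry measure), the rank-style p-value, and all function names are assumptions chosen for the example.

```python
import numpy as np

def phase_randomized_surrogate(series, rng):
    """Surrogate with the same power spectrum as `series` but random phases."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    spectrum = np.fft.rfft(series)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    randomized[0] = spectrum[0]            # keep the mean unchanged
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]      # Nyquist term must stay real
    return np.fft.irfft(randomized, n=n)

def time_reversal_asymmetry(x):
    """Example statistic: mean cubed increment (near zero for linear noise)."""
    return np.mean(np.diff(x) ** 3)

def surrogate_test(series, statistic, n_surrogates=99, seed=0):
    """Compare the observed statistic with its distribution over surrogates."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    observed = statistic(series)
    values = np.array(
        [statistic(phase_randomized_surrogate(series, rng))
         for _ in range(n_surrogates)]
    )
    # Heuristic two-sided p-value: how many surrogates lie at least as far
    # from the surrogate mean as the observed value does.
    center = values.mean()
    more_extreme = np.sum(np.abs(values - center) >= np.abs(observed - center))
    p_value = (1 + more_extreme) / (n_surrogates + 1)
    return observed, values, p_value
```

A small p-value suggests that the observed series is inconsistent with linearly filtered noise; any other statistic, such as the permutation entropy or the correlation dimension, can be passed in place of `time_reversal_asymmetry`.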

  10.8.1.8 Short-Term Predictability and Long-Term Unpredictability

A feature of a chaotic time series is its unpredictability in the long term. It has been demonstrated (Sugihara and May 1990) that, when a nonlinear forecasting method is applied to a chaotic time series, its accuracy falls off as the prediction-time interval is increased. On the other hand, the accuracy is roughly independent of the prediction-time interval when the method is applied to a time series that is uncorrelated noise (see footnote 10). To discern a chaotic time series from white noise, we first need to build its phase space by exploiting Takens’ theorem. Then, we plot the time series in its phase space. For each point $A_i$ of the trajectory, it is possible to find its neighbors, which are $A_j$ (with j = 1, …, k). For the prediction of the value $A_{i+\tau\Delta t}$, which lies $\tau\Delta t$ ahead in its phase space (see Figure 10.20), the nonlinear predictor exploits the corresponding $A_{j+\tau\Delta t}$ (with j = 1, …, k) and the following algorithm:

$$A_{i+\tau\Delta t} = \sum_{j=1}^{k} W(A_j, A_i)\, A_{j+\tau\Delta t} \qquad [10.71]$$

10 Noise is a signal generated by a stochastic process. Stochastic phenomena produce time series having power spectral density (i.e., power per unit of frequency) functions that follow a power law of the form

$$f = \frac{L(\nu)}{\nu^{b}}$$

where ν is the frequency, b is a real number included in the interval [−2, 2], and L(ν) is a positive slowly varying or constant function of ν. When b = 0, we have white noise or uncorrelated noise. When b ≠ 0, we have colored noise that has short-term autocorrelation (Kasdin 1995).

  The Emergence of Chaos in Time


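[Figure 10.20: the point $A_i$ and its neighbors $A_1, A_2, A_3, \ldots, A_k$, at distances $D_1, D_2, D_3, \ldots, D_k$, together with their future positions $A_{1+\tau\Delta t}, A_{2+\tau\Delta t}, A_{3+\tau\Delta t}, \ldots, A_{k+\tau\Delta t}$ and the predicted point $A_{i+\tau\Delta t}$. Formula shown in the figure: $A_{i+\tau\Delta t} = \sum_{j=1}^{k} \left[ e^{-\|A_i - A_j\|} / \sum_{l=1}^{k} e^{-\|A_i - A_l\|} \right] A_{j+\tau\Delta t}$]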

  FIGURE 10.20 Sketch and formula of the nonlinear local predictor.

In [10.71], $W(A_j, A_i)$ is a nonlinear function that depends on the Euclidean distances $d_j = \|A_i - A_j\|$ (with j = 1, …, k) as indicated in [10.72]:

$$W(A_j, A_i) = \frac{e^{-\|A_i - A_j\|}}{\sum_{j=1}^{k} e^{-\|A_i - A_j\|}} \qquad [10.72]$$
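Equations [10.71] and [10.72] can be sketched in Python as follows. The sketch assumes that the k neighbors are the k nearest library points in the Euclidean norm and that the prediction horizon is expressed as a number of sampling steps; the names `delay_embed` and `predict_point` are hypothetical.

```python
import numpy as np

def delay_embed(series, m, tau=1):
    """Return the matrix whose rows are the m-dimensional delay vectors."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (m - 1) * tau
    return np.column_stack(
        [series[j * tau : j * tau + n_vectors] for j in range(m)]
    )

def predict_point(library, query, horizon, k):
    """Predict where `query` will be `horizon` steps ahead.

    library : (n, m) array of consecutive phase-space points A_j
    query   : (m,) array, the point A_i whose future is wanted
    """
    library = np.asarray(library, dtype=float)
    query = np.asarray(query, dtype=float)
    # Only points whose future position is inside the library can be used.
    usable = library[: len(library) - horizon]
    dists = np.linalg.norm(usable - query, axis=1)      # d_j = ||A_i - A_j||
    nearest = np.argsort(dists)[:k]                     # k nearest neighbors
    # Weights of [10.72]; subtracting the smallest distance leaves the
    # normalized weights unchanged and avoids underflow of exp().
    weights = np.exp(-(dists[nearest] - dists[nearest].min()))
    weights /= weights.sum()
    # Weighted sum of the neighbors' futures, as in [10.71].
    return weights @ library[nearest + horizon]
```

In a typical use, the first part of the embedded series serves as the library and the remaining points are the queries, so that each prediction can be compared with the point actually observed `horizon` steps later.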

For the quantitative comparison of the predictions with the real data, we calculate the correlation coefficient C:

$$C = \frac{N_{ts}\sum_i A_{i,exp}\,A_{i,pred} - \left(\sum_i A_{i,exp}\right)\left(\sum_i A_{i,pred}\right)}{\left\{\left[N_{ts}\sum_i A_{i,exp}^{2} - \left(\sum_i A_{i,exp}\right)^{2}\right]\left[N_{ts}\sum_i A_{i,pred}^{2} - \left(\sum_i A_{i,pred}\right)^{2}\right]\right\}^{1/2}} \qquad [10.73]$$

In [10.73], $N_{ts}$ is the number of testing data, $A_{i,exp}$ is the i-th experimental value, and $A_{i,pred}$ is the i-th predicted value. The correlation coefficient always lies between +1 and −1. When C is close to 1, there is a high degree of correlation. If C is negative, data $A_{i,exp}$ and $A_{i,pred}$ are anti-correlated. When C is nearly zero, the predicted and the experimental data are uncorrelated, and the predictions are not reliable. The procedure is repeated for increasing values of the prediction time $\tau\Delta t$. A decreasing trend of C versus $\tau\Delta t$ is evidence that the aperiodic time series is not uncorrelated noise. On the other hand, if C is independent of $\tau\Delta t$, the time series is white noise. Predictions of uncorrelated noise have a fixed amount of error, regardless of how far, or close, into the future one tries to project (see footnote 11).
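The correlation coefficient [10.73] translates directly into code. The sketch below is a plain transcription of the formula; the function name and argument names are illustrative.

```python
import numpy as np

def correlation_coefficient(a_exp, a_pred):
    """Correlation coefficient C between experimental and predicted values."""
    a_exp = np.asarray(a_exp, dtype=float)
    a_pred = np.asarray(a_pred, dtype=float)
    n_ts = len(a_exp)                                   # number of testing data
    numerator = (n_ts * np.sum(a_exp * a_pred)
                 - np.sum(a_exp) * np.sum(a_pred))
    denominator = np.sqrt(
        (n_ts * np.sum(a_exp ** 2) - np.sum(a_exp) ** 2)
        * (n_ts * np.sum(a_pred ** 2) - np.sum(a_pred) ** 2)
    )
    return numerator / denominator
```

Evaluating this function on the predictions obtained for a sequence of increasing prediction times gives the curve of C versus $\tau\Delta t$ discussed above. The expression is algebraically the same as the Pearson coefficient returned by `numpy.corrcoef`.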

11 A deterministically chaotic time series may be distinguished from colored noise when the correlation coefficient obtained by the nonlinear local predictor is significantly better than the corresponding C obtained by the best-fitting autoregressive linear predictor (Sugihara and May 1990) where the predicted value $A_{i+\tau\Delta t}$ is a linear function of previous values. For instance, $A_{i+\tau\Delta t} = bA_i + \varepsilon_{i+\tau\Delta t}$, which is a first-order autoregressive model (read also next paragraph).

TRY EXERCISE 10.15

10.8.2 Prediction of the Chaotic Time Series

The beginning of “modern” time series prediction is set at 1927, when Yule invented the autoregressive technique in order to predict the annual number of sunspots. His model predicted the next value as a weighted sum of the previous observations of the series (Yule 1927). In the half-century following Yule, the reigning paradigm remained that of linear models driven by noise (Gershenfeld and Weigend 1993). An autoregressive model of order N looks like equation [10.74]:

$$x_t = \sum_{i=1}^{N} a_i x_{t-i} + \varepsilon_t \qquad [10.74]$$

In [10.74], $a_i$ (i = 1, …, N) are the parameters of the model and $\varepsilon_t$ is white noise. Linear time series models have two particularly desirable features: they can be understood in depth, and they are straightforward to implement. However, they have a relevant drawback: they may be entirely inappropriate for even moderately complicated systems.

Two crucial developments in aperiodic time series prediction occurred around 1980. The first development was the state-space reconstruction by the time-delay embedding; the second was the research line of machine learning, typified by the nonlinear artificial neural networks, which can adaptively explore a large space of potential models. Both developments were enabled by the availability of powerful computers that allowed much longer time series to be recorded and more complex algorithms to be used. Since the 1980s, several models have been proposed to understand and predict aperiodic time series. Such investigations interest many disciplines, such as meteorology, medicine, economics, engineering, astrophysics, geology, chemistry, and many others.

In 1991, Doyne Farmer, head of the Complex Systems Group at the Los Alamos National Laboratory, quit his job and cofounded, along with his longtime friend and fellow physicist, Norman Packard, a firm called Prediction Company. The mission of their new firm was to develop fully automated trading systems, based on predictive models of markets. In the same year, the Santa Fe Institute organized a competition, the Santa Fe Time Series Prediction and Analysis Competition, to compare different prediction methods (Gershenfeld and Weigend 1993). Six time-series data sets were proposed: fluctuations of a far-infrared laser (data set A); physiological data from a patient with sleep apnea (data set B); currency exchange rate data (data set C); a numerically generated series (data set D); astrophysical data from a variable star (data set E); and Bach’s final fugue (data set F). The main benchmark was data set A, consisting of 1000 points, with the next 100 points to be predicted by the competitors. The winner was E. A. Wan, who used a finite impulse response neural network for autoregressive time series prediction. In 1998, there was the K.U. Leuven Competition within an international workshop titled “Advanced Black-Box Techniques for Nonlinear Modeling: Theory and Applications” (Suykens and Vandewalle 1998). The benchmark was a time series with 2000 data points generated from a computer simulation of Chua’s electronic circuit (read Box 10.2 of this chapter for more information about the Chua circuit). The task was to predict the next 200 points of the time series. The winner was J. McNames, who used the nearest trajectory method, which incorporated local modeling and cross-validation techniques. More time series prediction competitions have been organized in the twenty-first century, several international symposia on forecasting have been arranged, and research groups focused on time series prediction have been formed. Usually, the best predictions have been obtained with artificial neural network methods (see, e.g., Gentili et al. 2015, where an artificial neural network is compared with fuzzy logic and the nonlinear local predictor).
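The autoregressive model [10.74] introduced at the start of this section can be fitted with ordinary least squares. The sketch below is one simple way of estimating the parameters $a_i$; it is not Yule's original procedure, and the function names `fit_ar` and `predict_next` are hypothetical.

```python
import numpy as np

def fit_ar(series, order):
    """Estimate a_1, ..., a_N of an autoregressive model by least squares."""
    series = np.asarray(series, dtype=float)
    # Each row of the design matrix holds x_{t-1}, ..., x_{t-N} for one t.
    design = np.column_stack(
        [series[order - i : len(series) - i] for i in range(1, order + 1)]
    )
    targets = series[order:]                            # the matching x_t
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coeffs

def predict_next(series, coeffs):
    """One-step prediction: weighted sum of the last N observations."""
    order = len(coeffs)
    recent = np.asarray(series, dtype=float)[-order:][::-1]   # x_{t-1} first
    return float(np.dot(coeffs, recent))
```

A sampled sinusoid is a convenient check, because it satisfies a second-order recurrence exactly: for $x_t = \sin(\omega t)$ the fitted parameters are $a_1 = 2\cos\omega$ and $a_2 = -1$.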

  10.8.2.1 Artificial Neural Networks

Artificial neural networks (ANNs) are algorithmic architectures that simulate the behavior of networks of real nerve cells in the central nervous system (Fausett 1994; Hassoun 1995). They are well suited to predicting chaotic and stochastic time series, to solving problems that are complex, ill-defined, and highly nonlinear, and to recognizing variable patterns (also read Chapters 12 and 13). There are infinitely many ways of organizing a neural network, although just four ingredients are needed to build one.

The first ingredient is its architecture or connection pattern. Based on the architecture, ANNs are grouped into two categories: (a) feed-forward networks, whose graphs have no loops, and (b) feedback (or recurrent) networks, in which loops occur because of feedback connections. In feed-forward networks, neurons are organized into layers that have unidirectional connections between them.


  FIGURE 10.21 Examples of activation functions: (a) threshold, (b) piecewise linear, (c) logistic, (d) Gaussian functions.

The second ingredient to generate an ANN is the ensemble of its activation functions that transform the inputs of a node into output values. A mathematical neuron computes a weighted ($w_i$) sum of n inputs, $x_i$ (i = 1, 2, …, n), and generates an output through an activation function f(.):

$$y = f\left(\sum_{i=1}^{n} w_i x_i\right) \qquad [10.75]$$

Examples of f(.) are the threshold, piecewise linear, logistic, and Gaussian functions shown in Figure 10.21.
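The neuron of [10.75] and the four kinds of activation function can be sketched as below. The exact shapes plotted in Figure 10.21 are not specified in the text, so the thresholds, slopes, and widths used here are assumptions chosen only for illustration, as are the function names.

```python
import numpy as np

def threshold(z):
    """Step function: 1 when the weighted sum is non-negative, else 0."""
    return np.where(z >= 0.0, 1.0, 0.0)

def piecewise_linear(z):
    """Linear ramp between 0 and 1, saturating outside [-0.5, 0.5]."""
    return np.clip(z + 0.5, 0.0, 1.0)

def logistic(z):
    """Smooth sigmoid between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def gaussian(z):
    """Bell-shaped response, maximal when the weighted sum is zero."""
    return np.exp(-z ** 2)

def neuron_output(x, w, activation):
    """Output y = f(sum_i w_i * x_i) of a single mathematical neuron."""
    z = np.dot(np.asarray(w, dtype=float), np.asarray(x, dtype=float))
    return activation(z)
```

For example, `neuron_output([1.0, 2.0], [0.5, -0.25], logistic)` evaluates the logistic function at a weighted sum of zero. Swapping the last argument changes only the nonlinearity, not the weighted sum.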

The third ingredient is the cost function that estimates whether the output is acceptable. The most frequently used cost function is the squared error E(t):

$$E(t) = \left[\mathrm{out}(t) - \mathrm{target}(t)\right]^{2} \qquad [10.76]$$

where:
out(t) is the output at time t calculated from the input recorded at time t,
target(t) is the desired output at t.

The fourth and last ingredient is the training algorithm, also known as the learning rule, which modifies the parameters $w_i$ to minimize the chosen cost function. There are three types of learning paradigms: (1) supervised, (2) unsupervised, and (3) hybrid learning. In supervised learning, the network is provided with a correct output. Weights are determined to allow the network to produce answers as close as possible to the known correct answers. Unsupervised learning does not require a correct answer associated with each input pattern in the training data set. It explores the underlying structure of the data, or correlations between patterns in the data, and organizes patterns into classes from these correlations. Finally, hybrid learning combines supervised and unsupervised learning. Some of the weights are usually determined through supervised learning, while the others are obtained through unsupervised learning. There are many learning rules. The most frequently used in time series prediction are the error-correction rules within the supervised learning paradigm. They iteratively update the weights $w_i$ by taking a small step (parametrized by the learning rate η) in the direction that decreases the squared error [10.76] the most. The updated weight $w_i$ will be obtained from its previous value through the algorithm [10.77]:

 
