11— Non-Linear Transformation Studies On Electronic Computers: With P. R. Stein (LADC-5688, 1963)

### Appendix I

In this appendix we collect some general remarks about the process of iterating transformations, particularly in one dimension. We also discuss, in some detail, a few special one-dimensional transformations which we have had occasion to study.

1. One of the first simple transformations whose iterative properties were established is the following: x'= f(x) = 4x(1 - x) To obtain these properties we consider, instead of (1), the broken-linear transformation:* 2x for x< 1/2 , x: =g(x), g(x) = 2(1-x) forx > 1/2 (2) 2(1\- x) for xa> 1/2

The study of the iterates of this transformation is equivalent to investigating the iterates of a function S(x) defined as follows: if x = O.a 1a 2a 3... an..., where the ai are either 0 or 1, (3) then S(x) =.a2a3a4 .... (4)

In other words, S(x) is merely a left shift of the binary word x by one place. The iterative properties of S(x) are in turn deducible from the law of large numbers in the case of Bernoulli. In effect S(i)(x) falls into the first half of the interval if and only if ai = 0. The ergodic average E F(l)[S)(x) i=l is therefore the same as the fraction of ones among the ai for 1 <i < N. F(I) is the characteristic function of the interval [0,1/2].

The relation between (1) and (2) is that of conjugacy: there is a homeomorphism h(x) of the interval [0, 1] with itself such that g(x) = h[f[h-(x)]] . (5)

* This transformation has already been mentioned in section VI.

350

Thus the study of the iterates of the quadratic transformation (1) reduces to the corresponding study for the broken-linear transformation (2). In this case, h(x) can be written down explicitly:15 2 h(x) =- sin-l(V/) . (6) 7r

2. The Set of Exceptional Points. In the case of the function f(x) = 4x(1 - x), it is true, then, for almost every* initial point, that the sequence of iterated images will be everywhere dense in the interval, and what is more, the ergodic limit can be explicitly computed; it is positive for every sub-interval.

There exist, however, initial points x such that the sequence x, f(x), f[f(x)],... is not dense in the whole interval [0,1]. Obviously, all periodic points, i.e., points such that, for some n, f(n)(x) = x, are of this sort. It is interesting to notice, however, that there exist points x for which the sequence f(i)(x), i = 1,2,..., is infinite without being dense; there are, in fact, non-countably many such points. To show this we consider the equivalent problem of exhibiting such points for the function S(x) introduced above. The construction then proceeds as follows. Consider a point x = 0.ala2 ... an ... We define a set Z consisting of all those x's which have an = an+i =an+2 for all n of the form n = 3i. In other words, the set Z consists of points which have every binary digit repeated three times, the sequence being otherwise arbitrary. Consider now the transformation x' = S(x), where S(x), as defined in (4) above, is a shift of x one index to the left. We now look at the subinterval from 0.010 to 0.011. Starting with any point in Z, it is clear that no iterated image will fall in this sub-interval; no three successive binary digits of a point in Z are of the form (010). It is easy to see that Z contains non-countably many points; in particular, it contains non-periodic points, so that the set of images S(i)(x), i = 1,2, ... is infinite, but not dense in [0,1].

Presumably one can find points in Z for which the ergodic limit exists. The measure of the set Z is zero, but relative to Z, the set S of those points for which the sojourn time exists still form a majority. We may define majority either in the sense of Baire category or as follows. Take points na (mod 1), n = 1,2,..., where a is an irrational constant. Consider the set N1 of those n's for which na (mod 1) belongs to Z, and also the set N 2 of those n's for which na (mod 1) belongs to S. We then say that the points belonging to S form a majority of those points belonging to Z if the relative frequency of N2 in N1 is one.

* Almost every is to be understood in the sense of Lebesgue measure.

351

The behavior exhibited by the points belonging to Z is more general, in the sense of measure, for some other transformations of the interval [0,1]. It is possible to give examples of continuous functions such that, for almost every point, the iterated sequence will be nowhere dense in the interval, although the sequence does not converge to a fixed point.

3. A Remark on Conjugacy Theorem.Let g(x) be the broken-linear function of equation (2) below, i.e., 2x, o<x<g(x)-21 (2) 0 g(x)= 2(1 - x), 2 >x > 1

Let t(x) be a convex function on [0.1] which transforms the interval into itself, and such that t(O) = t(1) = O. For some p in the interval, we must have t(p) = 1; by convexity, there is only one such point. Consider the lower tree * generated by the point 1. The necessary and sufficient condition that t(x) be conjugate to g(x) is that this tree be combinatorially the same as that generated by 1 under g(x), and that the closure of this set of points be the whole interval, i.e., that the tree be dense in [0,1].

The condition is obviously necessary, since the point 1 generates a tree under g(x) which is simply the set of binary rational points. Under any homeomorphism h(x) which has to effect the conjugacy, the point 1/2 must go over into p, and our assertion follows.

To prove sufficiency, we construct h(x) in the following manner. We take h(1/2) = p by definition. We next chose h(1/4) to be the smaller of the two values of t-1 (p); h(3/4) is then by definition the larger of these two values. We then take h(3/8) to be the smaller of the two values of t- [h(3/4)], and so on. Proceeding in this fashion, we thus construct a function h(x) defined for all binary rationals. It remains to prove that we can define it for all x by passage to the limit. This, however, follows from the assumption that these points are dense in [0,1] and that their order is preserved. The function h(x) will obviously be monotonic, and, being continuous, will possess an inverse h-l(x). From our construction it then follows that h[g(x)] = t[h(x)].

* By the lower tree of p (under f(x)) we understand the smallest set of all points with the following properties: (a) The set contains the given point p. (b) If a point belongs to the set, then so do all its counter-images under f.

352

4.Broken-Linear Transformations. In one dimension these are functions f(x) that are continuous on [0,1] and linear in subintervals of [0,1]. We assure that the graph of the function has a finite number of vertices, i.e., that f(x) consists of a finite number of lines fitted together continuously. For these broken-linear transformations one certainly expects that the ergodic limit exists for almost every point. For example: if one considers the sequence of iterated images T(W)(p), then, for almost every initial point p, the time of sojourn should exist for all sub-intervals, i.e., N lim N fR[T(i)(p)] i=l

should exist for almost all p and all measurable sets R; here fR is the characteristic function of the set R.* The value of this limit may indeed depend on the initial point p; it is likely, however, that all the points of the interval can be divided into a finite number of classes such that, within each class, the value of the limit is the same.**

There is another finiteness property that these transformations may possess. Given n, consider all broken-linear transformations which have at most n pieces (i.e., the space divides into n regions, in each of which the transformation is linear). Then it may be conjectured that there are only a finite number of different types of such transformations, where any two transformations of the same type are asymptotically similar (in the sense defined above, section VI). In other words, according to this conjecture, the type (or class) that a given transformation belongs to does not depend on the precise numerical values of

* We should perhaps mention here a more general conjecture. Suppose T is a polynomial transformation of the sort described by equations (10) to (13) of the Introduction. We then conjecture that the sequence of iterated images T(n)(p) has the following property: Let C be any cone of directions in n-space, and let fc(p) be the characteristic function of this cone, i.e., fc(P) = 0 if p does not belong to C, fc(p) = 1 otherwise. Then, for almost every p, exists.(l6)

** See pages 71, 72 of S. Ulam, A Collection of Mathematical Problems, New York 1960. An analogous conjecture can be made concerning the actual limit sets Lp(T) for four cubics in three variables.

353

the coordinates of those points where the derivative is undefined (corner points), but is determined only by the combinatorial structure of the subdivision of space into linear domains. In one dimension this means that the type of a transformation is determined by the number and interrelation of the nodes in the graph of the function, and not by their precise location.

5. Numerical Accuracy. The machines we use to compute the iteration process work with a fixed number of significant digits;* in MANIAC II, for example, this number is eight. It is therefore clear that any direct, single-step iterative process carried out on this computer will exhibit a period in not more than 108 steps. Given an algorithm which is iterative and of first order [i.e., the nth step depends only on the (n - )th], the process will, with great probability, exhibit a period which is much shorter. Statistically, one can reason as follows. If we assume a random distribution of, say, the last four digits of all computed numbers, then the probability that the cycle will close long before the full theoretical run of 108 steps is extremely close to one. Indeed, after producing numbers A1, A2,...,Ak, the chance that Ak+1 will be equal to one of the preceding numbers is of the order of k/108 . If we continue the calculations up to A2k, the chance that at least two numbers in the chain will coincide is approximately

/ k\k This is practically equal to 1 1-- if k -104 e

Clearly, going to 3k, 4k, ..., the probability that the chain will be cyclic gets very close to one. The situation is quite different in an iterative process involving two or more variables, e.g., computing (Ak+l, Bk+l) from (Ak, Bk). On a probabilitic basis alone, one expects to encounter periods of length ~ 108 (the maximum possible being 1016). In practice, this means that in two dimensions one does not expect to encounter periods of accidental or false periodicity unless one generates very long sequences.

* For any particular machine one can increase the accuracy by restoring to so-called multiprecision arithmetic. This can generally be done only at the cost of considerable loss in speed.

354

As the argument above shows, fortuitous periodicity can be a very real danger in one-dimensional iterative calculations. Indeed, we came across a striking example in the course of studying the asymptotic properties of the transformation. y = sin ry, 0 < y 1. (7)

This classical transformation is symmetric about y = 1/2, and maps the unit interval into itself. Furthermore, both fixed points-y = 0 and y = 0.7364845-are repellent. Now a simple argument shows that for such a function there cannot be any attractive periods of any finite order, that is, the derivative dTk (y) IY=Y dy

evaluated at any periodic point T(k)(yp) = yp is always greater than one in absolute value.* Nevertheless, when we performed the iteration on MANIAC II, the limit set we observed was invariably one of two finite periods, the first of order 1578, the second of order 6168. These periods were exact to the last available binary digit! This spurious convergence is undoubtedly produced by the complicated interaction of several factors, e.g., the particular machine algorithms for multiplication and round-off, our choice of finite approximation to the function sin7y, and so forth. When we repeated the calculation on the STRETCH computer (which works with about 15 significant figures), there was no observable tendency toward convergence to such periods.

If one is interested in obtaining the asymptotic distribution of iterates under some transformation like (7), one must either provide for more significant figures or resort to ingenious devices. One such trick which appears to work well and is relatively convenient-consists essentially in computing the inverse transformation. Since the functions

* It is essential for this argument that the transformation maps the whole interval into itself. The number of distinct periods of order k can then be easily enumerated. The conclusion follows on noting that dydT(k) (y)

must be the same for all points y belonging to the same period. If, however, the maximum value of the function is less than one, T(k) (y) can have relative minima above the y axis; then, indeed, there may exist attractive periods, i.e., the line y' _ T(k)(y) = y may intersect the curve sufficiently close to an extremum so that the kth order fixed point is attractive.

355

under consideration are two-valued, this involves introducing a random choice at each step. Specifically, taking some initial point po, we compute the sequence o, f-1)(po), f-2(po),....

Here the symbol f-k(po) implicitly contains the prescription that we choose one of the two values of the true inverse at random; thus the sequence f-i(p)_= f[f-(i-1)(p)], i = 1,2,..,

implies a sequence of random decisions as to which counter-image to choose at each step. If the calculation is carried out in this fashion, the chance of falling into an exact period is vanishingly small until we reach a chain length in the neighborhood of the theoretical maximum. Once having obtained our inverse sequence, we can conceptually invert it, that is pretend that we started with the last point and proceeded to po by direct iteration.*'**

6. In this sub-section we present some results of a study of a particular one-dimensional transformation:*** T,: y' = W(3 - 3W + aW2 ), W =3y(1 - y) . (8) This arises in a natural manner from a certain sub-class of our generalized cubic transformations in three variables: 10 Xi=dijMj, (9) j=l

' One cannot, of course, actually reverse the calculation and expect to reproduce the sequence. If one could, there would be no need for this stochastic device.

** This procedure presupposes a good method for generating random numbers. There are several of these which are well suited for use in automatic digital computers. One of the most common-and, in fact, the method used by us-is to generate the numbers by the chain: Rn+l - RoR,(mod 2k), where k is the binary word length of the machine, and Ro is some properly chosen constant. On MANIAC II, Ro = 513. This chain closes, but its length is greater than 1012. The lowest order binary digits are themselves not random, but this makes no difference in practice.

*** This transformation was introduced by the way of illustration in section I.

356

3 dij = 1, all j, 0 < dij 1 . (10) i=l Namely, we choose certain of the dij as follows: d13 -= , d17 = d19 = d34 = d 36 = 1, d 31 = 0.

The rest are arbitrary, except that they must, of course satisfy (10). If we restrict ourselves to the sub-class of initial points such that x3 = 0 (i.e., the side S + a = 1 of our reference triangle), the second power of the transformation can be written in the form (8). Ta is symmetric about y = 1/2, but it does not map the whole interval [0,1] into itself; its maximum value is Ymax = T ( 4 (11)

so this is the right-hand boundary of the invariant sub-region. The lefthand boundary is then the image of y' . In the range 1> a >0.9, the fixed point varies from yo = 0.8224922 at a = 1 down to yo = 0.8193719 at a = 0.90. Over this range, the derivative at yo is negative and greater than 1 in absolute value; thus the fixed point is repellent.

We have studied this transformation as a function of the two parameters-the initial point yo and the coefficient a--on the STRETCH computer. To study the asymptotic distribution, we divide the interval into 10000 equal parts and have the machine keep track of the number of points which fall into each sub-interval over the iteration history. Such a history was usually taken to be a sequence of length n x 105, with 4 <n< 9.* Should a period (of order k< 3 x 105) be detected by the machine, the calculation is automatically terminated. If the order of the period is not too great (k< 500), the values of the periodic points are printed out, and the value of

T(k) (y) dy is calculated over the period; if dy ( o 1 ,

* As we have already remarked (section I), the calculation of 2 x 105 iterates of this transformation requires about I minute on STRETCH.

357

this fact is strong evidence that the observed period is actually a limit set.

Our numerical investigation of (8) has been mostly restricted to the range 0.98 <ar 1.* Even in this restricted parameter range, the observed asymptotic behavior is of bewildering complexity. For a = 1, the distribution of iterates in the interval is extremely nonuniform. There are large peaks at the end-points of the allowed subinterval, with much complicated fine-structure in between, i.e., many relative maxima as well as sizeable intervals in which the distribution is locally uniform. This general behavior persists as a is decreased down to a = 0.9902. At a = 0.9901, however, a dramatic change takes place; most of the points concentrate in a few small sub-intervals. The limit set is, in fact, a pseudo-period.** At a = 0.99009, the pseudoperiodic behavior is still evident, but the occupied sub-intervals are larger. They contract, however, for a = 0.99008, and at a = 0.990079 a period of order k = 42 is found. Actually, for this particular value of the parameter a, there appear to be two possible periodic limit sets, with k = 42 and k = 84 respectively. The dependence on the initial point is, of course, quite complicated.*** For example, as yo is varied

over the values yo = 0.11, 0.12, 0.13,..., 0.21, the period with k = 42 is found in all except three cases, namely yo = 0.11, 0.16, 0.18, which apparently lead to the periodic limit set with k = 84. Both periods are numerically well-attested; they are exact to the last binary digit and have (d T()(y) < t

* Below a = 0.98, various periodic limit sets of order k< 24 were found on MANIAC II. We have not studied this parameter range on STRETCH. ** It may actually be a period with order k = 63049. This is the result indicated by the machine. At present we have no way of verifying this.

*** Subsequent numerical work has shown that this yo dependence is spurious. The calculations were performed with multi-precision arithmetic; in some cases as many as its decimal places were retained!

t It is difficult to decide whether this behavior is real or not. Some supporting evidence for its reality is the observed behavior for values of a very close to this critical value a = 0.990079. We find that, with a = 0.9900789, only a period with k = 42 is obtained, while on the other side, a = 0.9900791, the limit set is always a period with k = 84.

358

As we continue down in a, the limit sets are pseudo-periodic* until we reach a = 0.99007, for which a period of order k = 112 is found. This appears to be the approximate right-hand boundary of a periodic belt; that is, in the range 0.98990 <a< 0.99007 there are only periodic limit sets for this transformation. All these have orders which are multiples of 14 (with k< 112), except for the (approximate) lefthand boundary of the a-range (a = 0.9899) where a period of order 7 exists. The dependence of these limit sets on the initial point is complicated, and will not be reproduced here.

At a = 0.98988 there is another large discontinuity in behavior; we again find an asymptotic distribution which covers the whole allowable sub-interval in non-uniform fashion. No more periodic limit sets are found as a is decreased to a = 0.9800. The only phenomenon of note is the splitting of the limit set into two parts; this occurs somewhere between a( = 0.986 and a = 0.985. The resulting gap-which contains the fixed point-continues to widen as a -a 0.980.**

The results of this investigation-which is rather incidental and subordinate to the larger study reported in the present paper-clearly show that there is a great deal to be learned about the asymptotics of iterative processes, even in one dimension. It seems that "pathological" behavior is not a property of higher-dimensional systems alone. With regard to our study of the periodic limit sets, it may be argued that what we have really done is to investigate, in a rather indirect manner, the behavior of the roots of high-order algebraic equations as a function

* In this range, the machine detected some exact periods of huge ordere.g., k = 295148 (ac = 0.990079)! There seems to be no compelling reason to take this at face value. ** Such gaps have been observed in other cases. One interesting example is the transformation:

y' = W2 (3 - 2W), W = 3 y(l - y), which is also a special case of one of our cubics in three variables. For this transformation, the limit set consists of four separate pieces, as follows: (I) 0.3455435 < y < 0.4086018, (II) 0.4296986 < y < 0.5791385, (III) 0.7562830 < y < 0.8146459, (IV) 0.8220964 < y< 0.84375.

In the present case, i.e., that of T,, the existence of the gap can easily be predicted. Let yo(a) be the fixed point. There will clearly be a gap providing T2(ymax > yo(a), since in this case only one of the two inverses of yo lies in the allowed interval. Of course, the determination of the critical value ac of a from the equation TV2(Ymax) = Y(ff) would in any event have to be carried out numerically.

359

of their coefficients. This is certainly true, at least in part. It is therefore of interest to observe that the iterative method seems at present to be the only effective tool for treating this purely algebraic problem.

11— Non-Linear Transformation Studies On Electronic Computers: With P. R. Stein (LADC-5688, 1963)