Preferred Citation: Ulam, S. M. Analogies Between Analogies: The Mathematical Reports of S.M. Ulam and His Los Alamos Collaborators. Berkeley: University of California Press, 1990. http://ark.cdlib.org/ark:/13030/ft9g50091s/



Analogies between Analogies

The Mathematical Reports of S.M. Ulam and His Los Alamos Collaborators

S.M. Ulam

UNIVERSITY OF CALIFORNIA PRESS
Berkeley · Los Angeles · Oxford
© 1990 The Regents of the University of California



Foreword

"Good mathematicians see analogies between theorems or theories. the very best ones see analogies between analogies." Stefan Banach

Stanislaw Ulam's affiliation with Los Alamos National Laboratory spanned over two-thirds of his professional life. There was no aspect of its mathematical activity during this period in which he was not involved, either centrally or tangentially.

His catholic view of the role of mathematics vis-à-vis other sciences extended far beyond, into the mathematical and scientific community at large, as did his genius for problem formulation and for applying the most abstract ideas from the foundations of mathematics to computing, physics, and biology. In addition he possessed the ability to excite others, many of them not trained as mathematicians, and involve them in his researches. The impact of his work is still felt both at the Laboratory and in those larger communities, for he liked to disseminate his ideas orally in an ever widening round of lectures and seminars, from where they took on a life of their own. The Monte Carlo method, which he originated with von Neumann in order to study neutron scattering and other nuclear problems at Los Alamos, is one such example. Its offshoots are now so universal that they are even applied to regulate traffic lights!

His influence, along with that of John von Neumann, the brilliant Hungarian mathematician, contributed to the establishment at the Laboratory of an atmosphere and a tradition that fostered and supported an exceptional, if not unique, interaction between mathematics and science. Extensive testimony and documentation concerning the integral role that Stan Ulam played in this interaction can be found in "From Cardinals to Chaos: Reflections on the Life and Legacy of Stanislaw Ulam," published by Cambridge University Press in 1989.

From 1944 until his death in 1984, while connected with Los Alamos in a variety of ways, from staff member, to group leader, to research advisor, to 'no-fee consultant' (one of his favorite expressions), he wrote Laboratory reports (many with the help of trusted and talented collaborators) that show a breadth of scientific interests unusual for a mathematician. They cover pioneering work, in the horse-and-buggy days of computing, on mathematical modeling of physical processes, nuclear rocketry, space travel, and biomathematics. (Another eleven, weapons-related reports, with Evans, Everett, Fermi, Metropolis, von Neumann, Richtmyer, Teller, Tuck, and others, are still classified and unavailable for publication.)

Mathematically speaking, three motifs run through Ulam's theoretical and applied work: the iteration or composition of functions or relations; the use of evolving computer capability in the exploration of analytically intractable problems; and the introduction of probabilistic approaches, while knowing that most practical applications are made in the presence of uncertainty. The fusion of these themes is characteristic of the central contributions of this collection.

As to the quotation from which the title of this book is derived, one must remember that Ulam held Banach, along with von Neumann and Fermi, "as one of the three great men whose intellects impressed me the most." Stefan Banach was an outstandingly original Polish mathematician and one of the founders of the now famous Lwów school of mathematics (see Chapter 20, Preface to "The Scottish Book"). Banach was Ulam's friend and mentor in Poland before World War II. His influence on Ulam was profound, and Ulam liked to quote his comment on the ability of some mathematicians to see "analogies between analogies." There is no question that in the practice of his craft and his art Ulam was guided by this principle, and that he, in turn, epitomized its application. In addition, to Ulam the idea of analogy was itself amenable to mathematical discussion.

In 1983, when D. Sharp and M. Simmons, editors of this Los Alamos Science series, asked Ulam to gather his unclassified and declassified reports for publication in one volume, they intended to omit a few that had appeared elsewhere. After his death in 1984, it was decided to publish them all, as many represent preliminary studies of subjects that were subsequently expanded elsewhere, leading, in several instances, to the development of new and extensive theories.

Ulam had dictated brief introductory notes and a sketch for a preface which he intended to develop. Rather than put words in his mouth, it was thought more appropriate to reproduce his notes in their short and unpolished form. It was also decided to leave the style and substance of the reports untouched, as evidence that scientific advances do not usually arise in their final, definite form. More often than not they are the product of sequences of tentative, sometimes repetitive, and even at times inaccurate steps. Two appendices complete the volume: a list of Ulam's publications and a brief biographical chronology. (More detailed biographical material can be found in his autobiography, "Adventures of a Mathematician," as well as in "From Cardinals to Chaos.")



This collection represents an important complement to the selection of papers and problems, mostly in pure mathematics, published by MIT Press in 1974 as "Sets, Numbers, Universes," and to a volume of essays, "Science, Computers, and People," published by Birkhäuser in 1985. And, whereas these two books are composed of papers readily available albeit scattered in the scientific literature, the Los Alamos Reports have been for the most part difficult of access and little known. Their historic value is therefore very real.

As mentioned earlier, many of these reports and much of Ulam's work was done in collaboration. He liked to stress the importance of working with collaborators, with whom, he said, the nature of their "shared ideas and techniques" depended "on the personality and experience of the individuals." We add a few words about these colleagues as well as about the work they and Ulam engaged in.

As early as 1944, when he joined the Manhattan Project, Ulam and David Hawkins (Chapter 1) were "playing" with the notion of branching processes, or multiplicative systems, as they called them, motivated by their application to atomic physics.

David Hawkins, a philosopher of science by profession and mathematician by inclination, was, in Ulam's words "the best amateur mathematician I know," and they became fast friends. Hawkins is presently professor emeritus at the University of Colorado.

The work in the Ulam-Hawkins report was subsequently developed with Everett in the three extensive reports grouped in Chapter 3, which are reproduced here for the first time. While clearly motivated by the need to understand neutron multiplication in fission processes, the reports lay the foundations for-in their own words-a "formalism general enough to include as special cases the multiplication of bacteria, radioactive decay, cosmic ray showers, diffusion theory and the theory of trajectories in mechanical systems."

C. J. Everett, who died in 1987, was a mathematician with whom Ulam worked on a conceptual as well as technical level in Wisconsin and at Los Alamos, where he became a member of Ulam's group. An eccentric, shy, and witty man, he was quite probably the only person who ever opted for bus transportation to come to Los Alamos for a hiring interview, and he was known for having turned in a monthly progress report-in which staff members were supposed to describe their research-which said tersely "progress was made on last month's progress report."

The first written proposals for the Monte Carlo method, put together in a 1947 "report" called "Statistical Methods in Neutron Diffusion," appear in Chapter 2. This method of approaching precise but intractable problems through the introduction of random processes and probabilistic experimentation has found wide application not only in areas close to those motivating its origin but in others more removed, such as operations research and combinatorics. In fact, the "report," of which only eight copies were made, consists of two letters and handwritten calculations photographed and stapled together. Its cover specifies that the "work" was "done" by Ulam and von Neumann and "written" by von Neumann and Robert Richtmyer, then head of the Laboratory's Theoretical Division. Its informality attests to the casual manner in which information was disseminated through the Laboratory at the time.

The long-term professional and personal rapports between von Neumann and Ulam need not be recounted here; references to them can be found in the books already mentioned. Suffice it to say that though there exist few papers and abstracts under their joint names, von Neumann's extensive correspondence with Ulam attests to their interacting interests in pure mathematics, in pioneering computer technology and techniques, and in cellular automata and the brain. (The correspondence is now stored in the archives of the Philosophical Society in Philadelphia.)

Ulam's collaboration with Enrico Fermi initiated the computer simulations of nonlinear dynamical systems that led to the evolution of a major field of research popularly labeled "chaos theory." Fermi called this work, which was developed with the programming assistance of John Pasta, "a minor discovery," a modest understatement given the seminal character of this investigation (Chapter 5). Chapters 10 and 11, with P. R. Stein, address the subject of nonlinear transformations in greater detail.

Fermi, with whom Ulam became acquainted in Los Alamos during the war, was a man of simple tastes and life style. The Ulams had an opportunity to sample this while motoring together across France one summer. Feeling ill at ease during a lunch in a recommended temple of gastronomy, Fermi decreed he would select the night's lodgings. Meandering through a picturesque valley he chose a modest inn by a babbling brook where, after dinner, sitting under the stars they discussed physics and new mathematical problems to experiment with after the vibrating string calculations. However the night's encounter with fleas, bedbugs, and mosquitoes made him admit the next morning that the higher-class hostelry next door that Ulam had eyed, might perhaps have provided a more restful night.

In the area of space technology, Ulam investigated schemes for nuclear rocketry with Everett and with Conrad Longmire, a physicist from the mountains of Tennessee who played a mean banjo. Chapter 7 describes a way to propel very large space vehicles by a series of small external nuclear explosions, which later developed into Project Orion. Chapter 9 deals with the propulsion of space vehicles by extraction of gravitational energy from planets. Schemes based on a similar idea are now used in "flyby" missions to the outer planets and to provide part of the energy for spacecraft going beyond the planets.

A study of patterns of growth, with Robert Schrandt (Chapter 12) investigates how simple recursively defined codes can give rise to complex objects. Such studies have become a growth industry of their own in the improved computer graphics world of today.

Several other reports are devoted to biomathematical questions. Their findings have opened new fields of biomathematical research. Abstract schemata of mathematics are applied to pattern recognition with the help of computers investigating, for example, the way visual pictures are recognized. Using metrics in molecular biology shows how a new mathematical concept of distance between finite sequences or objects can be applied to reconstruct the evolutionary history of biological organisms.

Closely involved with Ulam in this work was Paul Stein, a physicist turned mathematician under Ulam's influence who became an invaluable collaborator able to implement and develop the gist of Ulam's directions. William Beyer, a gifted fellow mathematician, also collaborated on a conceptual and technical level in the biological investigations.

John Pasta, Mary Tsingou-Menzel, Robert Schrandt, and Myron Stein lent their talents to creative programming, at a time when the art was in its infancy and pre-microchip-era machines with names like "Eniac," "Maniac," and "Johnniac" presented storage and timing constraints. Pasta, who died in 1980, was a self-made man of Italian descent. He had furthered his education and became a physicist while working on the New York City police force.

The mathematician Al Bednarek, one of the editors of this volume and coauthor of this foreword, also collaborated with Ulam on problems of parallel computation (Chapter 18). He is a former chairman of the mathematics department at the University of Florida.

Shortly after his arrival at Los Alamos in 1944, Ulam was asked by a colleague what it was that he was doing. Since at the time he was a very pure mathematician and had not yet familiarized himself with the nature of the work, his Socratic answer was "I supply the necessary don't know how!" Stan Ulam's "necessary don't know how" as well as his modestly unenunciated "know how" are sorely missed by all who were privileged to have known him or worked with him.

Last but not least, the editors wish especially to thank Peggy Atencio, Ben Atencio, Janet Holmes, Debi Erpenbeck, Gary Benson, Chuck Calef, and Gloria Sharp, among other members of the Los Alamos Information Services Division, and above all Chris West and Pat Byrnes, for their herculean efforts in transposing into print and formatting these extremely difficult reports, and also Patricia Metropolis for her invaluable informal advice and help. The editors assume full responsibility for any existing discrepancies or inaccuracies. They also gratefully acknowledge the permission granted by Rozprawy Matematyczne to reprint their edition of the report "Non-linear transformation studies on electronic computers." The preparation of this book was done under the generous auspices of the Los Alamos National Laboratory.

A. R. Bednarek, Gainesville, Florida
Françoise Ulam, Santa Fe, New Mexico



Preface

The collection of these reports, which appeared over the considerable span of years that I spent at Los Alamos, concerns a great variety of topics. Its very heterogeneous nature illustrates the diversity of the programs and of the areas of research that interested the laboratory.

Before World War II it was almost exclusively in the universities, in the graduate schools of the larger institutions that scientific research used to take place. The Bureau of Standards and a very few large industrial companies such as Bell Telephone, General Electric, and some pharmaceutical firms were the exception to the rule.

This little collection may bear witness, in a very modest way, to the wide-ranging changes, which are still going on in the organization and practice of research in this country and abroad. Because of the novel problems which confronted its scientists during the wartime establishment of Los Alamos, the need arose for research and ideas in domains contiguous to its central purpose. This trend continues unabated to the present.

Problems of a complexity surpassing anything that had ever existed in technology rendered imperative the development of electronic computing machines and the invention of new theoretical computing methods. There, consultants like von Neumann played an important role in helping enlarge the horizon of the innovations, which required the most abstract ideas derived from the foundations of mathematics as well as from theoretical physics. They were and still are invested in new, fruitful ways.

An enormous number of technological and theoretical innovations were initiated at this laboratory during these forty years. To mention but a few, besides the advances in computing, one can name research on nuclear propulsion of rockets and space vehicles, in molecular biology, and on the technology of separating cells.

The growing importance of research laboratories such as this one became a not exclusively American phenomenon. For instance the aspect of academic research has changed almost beyond recognition in France. What used to be, before World War II, almost exclusively the province of universities, has now shifted to the French National Center of Research (Centre National de Recherche.)



This collection of Los Alamos Reports ranges over almost four decades and may illustrate, I hope, how a mathematical turn of mind, a mathematical habit of thinking, a way of looking at problems in different subfields of physics, astronomy, or biology can suggest general insights and not just offer the mere use of techniques. Ideas derived from even very pure mathematical fields can provide more than mere "service work"; they may help provide true conceptual contributions from the very beginning.

The period in question has seen the origin and development of the art of computing on a scale which vastly surpasses the breadth and depth of the numerical work of the past. In at least two different and separate ways the availability of computing machines has enlarged the scope of mathematical research. It has enabled us to attempt to gather, through heuristic experiments, impressions of the morphological nature of various mathematical concepts such as the behavior of solutions of certain nonlinear transformations, the properties of some combinatorial systems, and some topological curiosities of seemingly general behaviors. It has also enabled us to throw light on the behavior of solutions of many problems concerning complicated systems, by allowing numerical computations of very elaborate special physical problems, using both Monte Carlo type experiments and extensive but "intelligently chosen" brute force approaches, in hydrodynamics for example.

A number of such experiments have revealed, surprisingly, a nonclassical ergodic behavior of several dynamical systems. They have shown unexpected regularities in certain flows of dynamical systems, in the mechanics of many-body problems, and in continuum mechanics. Recently they have been applied to the study of elementary particle physics set-ups and interactions.

And now there appear some most exciting vistas in the applications of mathematics to biology that deal with both the construction and the evolution of living systems, including problems of the codes, which seem to define the basic properties of organisms and ultimately may provide us with a partial understanding of the working and evolution of the nervous system and some of the powers of the brain itself.

In addition these reports show the varying involvements of my collaborators and myself. I particularly want to stress the importance of the role of collaborators. An ever increasing number of publications of mathematical research is proof of the advantages derived when two or more authors share ideas and techniques. The nature of this exchange varies from case to case, depending on the personality and experience of the individuals.

These few very sketchy remarks are merely intended to emphasize how the necessity of defense work at the frontiers of science has continued to this day to stimulate research in a multitude of directions.

S. M. Ulam
Santa Fe, February 1984



Faded cover of the original 1944 report after it was released from classification in 1956. Note its low number.



1—
Theory of Multiplicative Processes:
With David Hawkins (LA-171, November 14, 1944)

This report treats branching processes, of neutron proliferation for instance, through a mathematical theory involving compositions of generating functions.

It is a precursor of the studies of multiplicative systems in several variables written with C. J. Everett in 1948 (LA-683, -690 and -707) which develop an elaborate and basic theory of "multiplicative" (branching) processes. See for example a book by T. E. Harris: The Theory of Branching Processes, published by Springer in 1963. (Author's note.)

Abstract

General properties of statistics of multiplicative systems are discussed together with the study of fluctuations in the number of particles in such systems. A general method is indicated through which one may study the fluctuations in the case where one takes into account the factors of geometry and time-dependence of constants.

The statistical theory of multiplicative chain processes does not compare in completeness to date with the corresponding theory of additive processes. The present paper is intended primarily as an exposition of a simple theory of the statistics of multiplication, permitting application to a variety of special problems.



The simplest (the "Bernoullian") case may be described as follows: A particle can produce, with probabilities $p_0, p_1, p_2, \ldots, p_n, \ldots$, a number $0, 1, 2, 3, \ldots, n, \ldots$ of similar particles in one generation. We assume that each particle produced has again the same probabilities of producing $n$ offspring. We also assume that each particle dies at procreation. Required is the probability law $p_k(n)$ for any generation $k$.

We remark parenthetically that this formulation makes the multiplicative process essentially discrete and finite. The statistics of neutron multiplication involves a continuous process as well, namely a random distribution in energy, space, and time. We disregard this aspect initially. Later we shall show that the admission of such continuity leads to a generalization of the methods described below. There are, in the meantime, two physically accurate interpretations of a discrete series: (1) one can represent the chain process as a graph; the $n$ particles in the $k$th generation are the $n$ lines connecting the $k$th and the $(k+1)$st branch points in a chain or set of chains; (2) the $n$ particles are those in existence at the $k$th unit of time, where the probability law $p_1(n)$ is the distribution one unit of time after the introduction of a single particle. If a time unit be chosen equal to the average time between fissions, the distinction is in many cases not crucial. Frankel, and later Feynman, studied the continuous process. We shall show later that their differential equations of the random process correspond to the infinitesimal transformations of the group in which our iteration (see Theorem I) may be imbedded.
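To make the discrete model concrete, the following short Python sketch (a present-day editorial illustration, not part of the original report) simulates the generation sizes of one such chain; the offspring probabilities used are hypothetical placeholders.

    import random

    def simulate_generations(offspring_probs, k, seed=0):
        """Simulate the 'Bernoullian' branching process for k generations.

        offspring_probs[n] is the probability that one particle produces n
        similar particles; each particle reproduces independently and dies at
        procreation.  Returns the generation sizes [Z_0, Z_1, ..., Z_k],
        starting from a single particle.
        """
        rng = random.Random(seed)
        support = range(len(offspring_probs))
        sizes = [1]
        for _ in range(k):
            sizes.append(sum(rng.choices(support, weights=offspring_probs)[0]
                             for _ in range(sizes[-1])))
        return sizes

    # hypothetical offspring law: p_0 = 0.3, p_1 = 0.4, p_2 = 0.3
    print(simulate_generations([0.3, 0.4, 0.3], k=12))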

1. The first problem to consider is this: we are given an amount and arrangement of active material. In this system a neutron produces $n$ neutrons with probability $p(n)$; $\sum_{n=0}^{\infty} p(n) = 1$. Here $p(0)$ is the average probability of leakage or absorption, without subsequent production of neutrons. $p(n)$ normalized for $n > 0$ is a nuclear constant, so far purely empirical, known as to its first moment and less accurately as to its second moment. Required is the probability of having $n$ neutrons after $k$ generations (or units of time). This problem is solved, in principle, by:

Theorem I. Let $f(x)$ be the generating function of the distribution of the number of offspring, i.e., $f(x) = \sum_{n=0}^{\infty} p(n)x^n$. Then the generating function for the $k$th generation is $f_k(x)$, the $k$th iterate of $f(x)$. [The $k$th iterate is defined as follows: $f_1(x) = f(x)$, $f_k(x) = f(f_{k-1}(x))$. The theorem asserts that the probability $p_k(n)$ is given as the coefficient of $x^n$ in the ascending polynomial or power series expression of $f_k(x)$. The physical multiplication of the random variable is reflected in the iterated substitution by which $x \to f(x)$.]

Proof: Starting with one neutron in the 0th generation we obtain, with probability $p_k(n)$, $n$ neutrons in the $k$th generation. Beginning with $r$ neutrons, denote the corresponding probability by $p_k^{(r)}(n)$. Now assume that a chain is started by one neutron. We have
\[ p_k(n) = \sum_{r=0}^{\infty} p_{k-1}(r)\, p_1^{(r)}(n). \]

Now if $f(x)$ is the generating function of the distribution $p_1(n)$, the generating function of the distribution $p_1^{(r)}(n)$ is $[f(x)]^r$. This follows from the assumption that contemporary neutrons are independent in procreative powers, and from the theorem (of Laplace) that the generating function of a sum of independent random variables is the product of their generating functions. The above proposition may also be verified for $r = 0$, since $p_1^{(0)}(n) = 0$ for all $n > 0$. Substituting the generating functions for the probabilities in the above equation, we have
\[ f_k(x) = \sum_{r=0}^{\infty} p_{k-1}(r)\,[f(x)]^r = f_{k-1}(f(x)), \]
and since the iterates of a function commute under composition, $f_{k-1}(f(x)) = f(f_{k-1}(x)) = f_k(x)$, as asserted.

Two remarks may be made at this point. (a) The simple proof above sustains a more general theorem if the distribution generated by $f(x)$ is not constant, but time- or generation-dependent. Instead of the iterate $f(f(f(\cdots f(x))))$ we will have some composition $f(g(h(\cdots q(x))))$. By the mode of argument established, the chain process may be analyzed one step further. Let $g(y) = ay + b$ be the generating function for the probabilities $b$ of loss or absorption of a single neutron and $a$ of producing fission, with $a + b = 1$. Let $h(x) = c_1 x + c_2 x^2 + c_3 x^3 + \cdots$ be the generating function of the distribution of neutrons per fission. Then if the two are combined by the transformation $y \to h(x)$, we have that the distribution of neutrons per neutron is generated by $f(x) = g[h(x)]$. (b) If on the other hand we start from a single fission, and wish to know the distribution for the number of first-generation fissions, this is given by $F(y) = h(g(y))$. The iterates of $f(x)$ and $F(y)$ are connected by simple and evident relations.

There remains the practical problem of determining coefficients and other properties of $f_k(x)$, given $f(x)$. To this end we first shall establish some general properties of iteration.
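One practical way to obtain the coefficients of $f_k(x)$, given only those of $f(x)$, is to compose truncated power series numerically. The Python sketch below (an editorial illustration; the offspring law is a hypothetical placeholder) applies Theorem I directly: the probability $p_k(n)$ is read off as the coefficient of $x^n$ in the $k$th iterate.

    import numpy as np

    def compose(f, g, nmax):
        """Coefficients of f(g(x)), truncated at degree nmax (constant term first)."""
        out = np.zeros(nmax + 1)
        power = np.zeros(nmax + 1)
        power[0] = 1.0                                   # g(x)**0
        for c in f:
            out += c * power
            power = np.convolve(power, g)[:nmax + 1]     # next power of g(x)
        return out

    def iterate(f, k, nmax):
        """Coefficients of the k-th iterate f_k(x), truncated at degree nmax."""
        fk = np.zeros(nmax + 1)
        fk[:len(f)] = f
        for _ in range(k - 1):
            fk = compose(f, fk, nmax)                    # f_k = f(f_{k-1})
        return fk

    # hypothetical offspring law: p_0 = 0.3, p_1 = 0.4, p_2 = 0.3
    p3 = iterate([0.3, 0.4, 0.3], k=3, nmax=8)
    print(p3)        # p_3(n) is the n-th entry; the entries sum to 1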



2. Let $f(x)$ be a monotone function. Assume, for example, $f(x)$ increasing, i.e., if $x < y$, then $f(x) < f(y)$. A fixed point for $f(x)$ is a value $x_0$ such that $f(x_0) = x_0$. The set of fixed points for a continuous function is closed, i.e., the points which are not fixed form a collection of disjoint intervals whose endpoints are fixed points. If we form the sequence $f_k(x)$ for a given $x$ we obtain a sequence of points converging to a fixed point $x_0$ which forms the endpoint of the interval in which $x$ is situated. In fact, there are two cases possible: either $f(x) < x$ or $f(x) > x$. From the monotone character of $f(x)$ it follows that we shall have correspondingly either $f_k(x) < f_{k-1}(x)$ or $f_k(x) > f_{k-1}(x)$ for all $k$. Unless these sequences tend to $-\infty$ or $+\infty$, they will have limit points. If now $\lim_{k\to\infty} f_k(x) = x_0$, we must have $f(x_0) = x_0$. In fact $\lim_{k\to\infty} f(f_k(x)) = f(x_0) = x_0$. In addition, it is easy to see that $x_0$ is the next fixed point to $x$ (on the left or right depending on whether $f(x) < x$ or $f(x) > x$). This follows from the fact that if $f(x)$ is monotone and $f(x_0) = x_0$, $f(x_1) = x_1$, then for all $x$ such that $x_0 < x < x_1$ we have $f(x_0) = x_0 < f(x) < f(x_1) = x_1$.

In our case $f(x)$ is a power series with all coefficients non-negative, $f(0) > 0$, $f(1) = 1$. This function is certainly monotone and increasing for all non-negative $x$. Let $x_0$ be the first (non-negative) fixed point; $x_0$ certainly exists, the set of fixed points being closed. From these conditions it follows that $\lim_{k\to\infty} f_k(0) = x_0$. But if the variable in a generating function is set $= 0$, the value of the function is the probability that the random variable takes the value 0. Hence $x_0$ gives us the limit of the probability of mortality in the system. The probability of immortality is therefore simply $1 - x_0$, where $x_0$ is the smallest non-negative root of the equation $f(x) = x$. It is easy to see that if, as in our case, all the coefficients in the expansion of $f(x)$ are non-negative and $f(1) = 1$, then from $f'(1) > 1$ it follows that there is a root, and only one root $x_0$, which is non-negative and $< 1$. If $f'(1) < 1$, $x_0 = 1$ is the smallest positive root. We obtain immediately therefore the familiar fact that neutrons in a subcritical gadget without source will, with probability 1, die out in a finite time. For the supercritical gadget the probability of indefinite production can be obtained by solving the equation $f(x) = x$.
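Numerically, the probability of mortality can be found exactly as described: iterate $x \to f(x)$ from $x = 0$ until the value settles at the smallest non-negative fixed point. A minimal Python sketch (editorial; the offspring law is hypothetical) follows.

    def mortality_probability(f, tol=1e-12, max_iter=10**6):
        """Smallest non-negative root of f(x) = x, obtained as the limit of f_k(0)."""
        x = 0.0
        for _ in range(max_iter):
            x_new = f(x)
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    # hypothetical supercritical offspring law: p_0 = 0.2, p_1 = 0.3, p_2 = 0.5
    f = lambda x: 0.2 + 0.3 * x + 0.5 * x * x        # f'(1) = 1.3 > 1
    x0 = mortality_probability(f)
    print(x0, 1 - x0)    # probability of mortality (0.4) and of immortality (0.6)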

The $k$th iterate of a function can be obtained by a simple graphical or mechanical method which is based on the fact that along the diagonal $f(x) = x$. Thus we may for given $x$ replace this argument by $f(x)$, getting $f_2(x)$ graphically, then, repeating, $f_4(x)$ and so forth. In the case of the generating function under discussion this shows that $f_k(x)$ very rapidly approaches its asymptotic form: for the critical or subcritical case the asymptote in the interval $0 \le x < 1$ is $\lim_{k\to\infty} f_k(x) = 1$; for the supercritical case in the interval $0 \le x < 1$ the asymptote is $\lim_{k\to\infty} f_k(x) = x_0$. This implies that for all positive powers of $x$ in $f_k(x)$ the coefficients approach 0 uniformly, i.e., the mass of probability is either absorbed altogether into the zero region (subcritical case), or is spread out in an infinitely long tail (supercritical case). In the region of criticality the distribution has an infinitely long tail with mass approaching zero as the probability of mortality approaches one.

3. One of the important properties of generating functions is that they permit the calculation of moments. Thus if $p_n$ is the distribution itself and $f(x) = \sum_{n=0}^{\infty} p_n x^n$ its generating function, we have, because obviously $f(1) = 1$, the first moment or expected value of the random variable $= \sum_{n=0}^{\infty} n p_n = f'(1) =$ the first derivative of $f(x)$ at $x = 1$. Similarly the second moment of the number of neutrons can be found if we know the second derivative. In fact
\[ \sum_{n=0}^{\infty} n^2 p_n = f''(1) + f'(1). \]

Similarly the rth moment can be found easily from the values of the first r derivatives of f(x) at x = 1. (The rth derivative at x = 1 is sometimes called the rth combinatorial moment.)
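In particular (an editorial note in the same notation), the variance of the number of neutrons follows from the first two derivatives alone:
\[ \sigma^2 = f''(1) + f'(1) - [f'(1)]^2 . \]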

Our generating function is the $k$th iterate $f_k(x)$. It turns out that its first $m$ derivatives depend only on the first $m$ derivatives of $f(x)$ itself in a rather simple way. We have, in fact:

Theorem II. (a) $[f_k(x)]'_{x=1} = [f'(1)]^k = \nu^k$ (i.e., the intuitively obvious result that the expected number of neutrons after $k$ generations is $\nu^k$). (b) $[f_k(x)]''_{x=1} = f''(1)\left[(f'(1))^{k-1} + (f'(1))^{k} + \cdots + (f'(1))^{2k-2}\right]$.

The proof is immediate by induction:
\[ [f(f_{k-1}(x))]' = f'(f_{k-1}(x))\,[f_{k-1}(x)]'. \]

But for $x = 1$ we have $f_{k-1}(x) = 1$; therefore, since by assumption $[f_{k-1}(x)]'_{x=1} = [f'(1)]^{k-1}$, we obtain our formula (a). By differentiating twice we obtain (b). Somewhat more complicated formulae hold for higher derivatives.

Their derivation is through recursive relations as follows: by differentiating the identity $f_k(x) = f(f_{k-1}(x))$ repeatedly, and in all places substituting $x_0$ for $f_{k-1}(x_0)$, we obtain a sequence of linear first-order difference equations. Writing $\frac{d^r}{dx^r} f_k(x)\big|_{x=x_0} = M_{k,r}$ (so that $M_{1,r} = M_r$), we obtain
\[ M_{k,1} = M_1 M_{k-1,1}, \qquad M_{k,2} = M_2 M_{k-1,1}^{\,2} + M_1 M_{k-1,2}, \qquad M_{k,3} = M_3 M_{k-1,1}^{\,3} + 3 M_2 M_{k-1,1} M_{k-1,2} + M_1 M_{k-1,3}. \]
Each is of the form
\[ x_k = A_{k-1} + M_1 x_{k-1}, \]
whose general solution is
\[ x_k = \sum_{s=2}^{k} M_1^{\,k-s} A_{s-1} + M_1^{\,k-1} x_1. \]
In particular, for the first two derivatives,
\[ M_{k,1} = M_1^{\,k}, \qquad M_{k,2} = M_2\left(M_1^{\,k-1} + M_1^{\,k} + \cdots + M_1^{\,2k-2}\right), \]
with an analogous, lengthier expression for $M_{k,3}$.

Since in the function under discussion $x_0 = 1$ is a fixed point, these derivatives are the combinatorial moments of the distribution. We may now consider the three cases where $M_1$ ($= \nu$ in common use) is $> 1$, $< 1$, or $= 1$.

a) In the supercritical case, where $M_1 > 1$, it is clear from the method of deriving these factorial moments that if the random variable $n$ is measured in units of $M_1^{\,k}$, all moments approach a finite asymptotic form. Computation of moments for this asymptotic distribution may be greatly simplified as follows: Let us define a function $\phi(x) = x^{1/M_1}$, the $k$th iterate being $\phi_k(x) = x^{(1/M_1)^k}$. The generating function $f_k(\phi_k(x))$, if expanded in powers of $x^{(1/M_1)^k}$, has the same coefficients as $f_k(x)$, but these are now probabilities associated with the number of particles measured as fractions of the expected number. This is to say that the distribution is scaled in units of $M_1^{\,k} = \nu^k$, and its first moment $= 1$. Since for the supercritical case all moments approach a constant value as $k \to \infty$ when scaled in this way, and since the generating function is monotonic in the region $(0, \infty)$, there exists a common limiting value $g(x)$ of both $f_k[\phi_k(x)]$ and $f_{k-1}[\phi_{k-1}(x)]$. Since $f_k[\phi_k(x)] = f\{f_{k-1}[\phi_{k-1}(\phi(x))]\}$, we may write in the limit:
\[ g(x) = f[g(\phi(x))], \qquad \phi(x) = x^{1/M_1}, \quad f(x) \text{ given}, \]
and from this functional equation for $g$ its moments may be obtained from the second, third, etc., derivatives of $g$ by solving only linear algebraic equations.
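For instance (an editorial illustration of the last remark, in the factorial-moment notation $M_1 = f'(1)$, $M_2 = f''(1)$ used above): differentiating $g(x) = f[g(\phi(x))]$ twice at $x = 1$, where $g(1) = 1$, $g'(1) = 1$, and $\phi(x) = x^{1/M_1}$, gives the linear equation
\[ g''(1) = \frac{M_2}{M_1^{2}} + \frac{g''(1)}{M_1} + \frac{1 - M_1}{M_1}, \]
whose solution,
\[ g''(1) = \frac{M_2}{M_1(M_1 - 1)} - 1, \]
is the second factorial moment of the scaled limiting distribution.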

b) In the exactly critical case, $M_1 = 1$, the moments are
\[ M_{k,1} = 1, \qquad M_{k,2} = k\,M_2, \qquad M_{k,3} = k\,M_3 + \tfrac{3}{2}\,k(k-1)\,M_2^{\,2}. \]

This is a distribution in which $p_{k,0} \sim 1 - 1/(k M_2)$, and such that if the system has not died out by the $k$th generation, the expected number of neutrons is $\sim k M_2$.

c) In the subcritical case all moments converge to zero, but are approximately proportional to the first moment.

4. We may consider here briefly a simple special case, in which the iteration problem may be solved exactly.

Let $f(x) = (ax + b)/(cx + d)$; we have here a three-parameter family of functions (one of the four constants $a, b, c, d$ is immaterial). We can adjust them so that $f(1) = 1$ and $f'(1) = \nu$. We can then impose another condition, either on $f''(1)$, or so that $f(x_0) = x_0$, where $x_0$ is the "true" probability of mortality. Functions of the above sort form a group under substitution. This can be verified directly by substituting. (They form the so-called projective group of the line.) A fortiori the iterated function is of the same form,
\[ f_k(x) = (a_k x + b_k)/(c_k x + d_k). \]

By expanding fk(x) in a power series in x, we obtain the exact solution of our problem in this fairly general case. We determine the constants by the following three relations:



(1) Because $f(1) = 1$, we have for every $k$, $f_k(1) = 1$, which gives $a_k + b_k = c_k + d_k$.

(2) Similarly, for the second fixed point $x_0$ of $f(x)$, i.e., the root $x_0 \ne 1$ of $f(x) = x$, we have $f(x_0) = x_0$, and therefore for all $k$, $f_k(x_0) = x_0$, or $a_k x_0 + b_k = x_0(c_k x_0 + d_k)$; here $x_0 = -b/c$, from $a x_0 + b = x_0(c x_0 + d)$ together with $f(1) = 1$.

(3) From the results of section 3, we know that $[f_k(x)]'_{x=1} = [f'(1)]^k = \nu^k$. This gives
\[ f'(x) = \frac{a(cx + d) - c(ax + b)}{(cx + d)^2}, \]
or, taking account of (1),
\[ f'(1) = \frac{(a - c)(c + d)}{(c + d)^2} = \frac{a - c}{c + d} = \nu, \]
and therefore for all $k$
\[ \frac{a_k - c_k}{c_k + d_k} = \nu^k. \]

From the above three relations it is easy to calculate the constants $a_k, b_k, c_k, d_k$ in terms of $\nu$ and one arbitrary parameter. By eliminating $a_k, b_k$ and developing into a power series, we get, noting that $c_k/d_k \approx 1/\nu^k - 1$ and assuming, e.g., $\nu > 1$, the result in the form
\[ f_k(x) = \frac{a_k x + b_k}{d_k}\left\{1 + \left(1 - \tfrac{1}{\nu^k}\right)x + \left(1 - \tfrac{1}{\nu^k}\right)^{2} x^2 + \cdots + \left(1 - \tfrac{1}{\nu^k}\right)^{n} x^n + \cdots\right\}. \]

This constitutes a complete solution of our problem. It is interesting to note that the probability of having $n$ neutrons decreases geometrically with $n$; the ratio of the successive terms is, in the case $\nu > 1$ and $k$ large, extremely close to 1. The distribution has the form of an exponential, decreasing very slowly. Asymptotically the probability of having exactly $n$ neutrons is independent of $n$. This result shows also the possibility of enormous fluctuations in multiplicative systems. The "law of large numbers" in its ordinary formulation is not true for multiplicative processes. In fact the probability of having more (or less) than $\ell$ times the expected value of neutrons tends to a positive constant (dependent on $\ell$). The following form of the law of large numbers is valid, as the examination of the distribution shows at once:
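The geometric character of the solution can be checked numerically: composition of fractional linear functions corresponds to multiplication of their $2 \times 2$ coefficient matrices, so the $k$th iterate is available at once. The Python sketch below (editorial; the geometric offspring law with $q = 0.6$ is a hypothetical example) prints the coefficients of $f_k(x)$ and their nearly constant ratio.

    import numpy as np

    # Hypothetical geometric offspring law p_n = (1-q) q^n, whose generating
    # function f(x) = (1-q)/(1-qx) is fractional linear; its mean is q/(1-q).
    q = 0.6                                   # mean nu = 1.5 (supercritical)
    M = np.array([[0.0, 1 - q],               # f(x) = (a x + b)/(c x + d):
                  [ -q, 1.0  ]])              # first row (a, b), second row (c, d)

    k = 10
    (a_k, b_k), (c_k, d_k) = np.linalg.matrix_power(M, k)   # matrix of f_k
    r = -c_k / d_k                            # common ratio of the coefficients

    # p_k(n) = coefficient of x^n in (a_k x + b_k)/(c_k x + d_k)
    p = [b_k / d_k] + [(a_k + b_k * r) * r ** (n - 1) / d_k for n in range(1, 8)]
    print(p)                                  # a nearly flat, slowly decreasing tail
    print(r)                                  # ratio of successive terms, close to 1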



Theorem III. Given an $\epsilon > 0$, there exists an $N$ such that for all $k > N$ the probability that the number $n$ of neutrons in the $k$th generation satisfies $(\nu - \epsilon)^k < n < (\nu + \epsilon)^k$ is greater than $1 - \epsilon$:
\[ P\{(\nu - \epsilon)^k < n < (\nu + \epsilon)^k\} > 1 - \epsilon. \]

It remains to discuss the most general form of the distribution. We hope to do this later through two methods, one consisting of the consideration of functions of the form $h(f(h^{-1}(x)))$, where $f$ is of the projective linear form discussed above and $h(x)$ is an arbitrary monotonic function. The $k$th iterate then is simply $h(f_k(h^{-1}(x)))$. The function $h(x)$ will give us more arbitrary parameters for our real distribution. The second method consists in developing $f(x)$ into a series of functions whose terms have the "projective" form.

Finally it may be remarked that the limiting distribution obtained above is formally identical to those obtained by Frankel¹ and Feynman, who used a continuous time parameter instead of our discrete-generations model. Their physical model is somewhat different and leads to the finding of the infinitesimal transformation of the continuous, abelian, one-parameter group into which the group of iterates of a function can be imbedded.

5. There are many other problems besides the question of the probable number of neutrons after $k$ generations which can be solved by operational methods. The first we shall consider is that of a subcritical system ($\nu < 1$) with a source. We suppose that the distribution of neutrons entering the system in a given generation has the generating function $\phi(x)$; $f(x)$ being the generating function of the system itself as before, we shall have

Theorem IV. The generating functions in the zeroth, first, and second generations are the functions
\[ \phi(x), \qquad \phi(x)\,\phi[f(x)], \qquad \phi(x)\,\phi[f(x)]\,\phi[f_2(x)]. \]
Proof is completely analogous to that of Theorem I. In general, letting $F_k(x)$ represent the distribution in the $k$th generation,
\[ \text{a)}\qquad F_k(x) = \phi(x)\, F_{k-1}[f(x)]. \]
If the system is subcritical, but sustained at a definite level by the source, we shall have the limiting distribution, or its limiting generating function, as a nonsingular function of $x$: $\lim_{k\to\infty} F_k(x) = F(x)$, $F(1) = 1$. Passing to the limit on both sides of our equation a) we get



b)
\[ F(x) = \phi(x)\, F[f(x)], \]
where $\phi(x)$, $f(x)$ are given. One has to determine $F(x)$ from this functional equation. Even without doing it one can obtain at once useful statistical information, for example the moments of $F(x)$, by differentiating b). Thus, with all derivatives taken at $x = 1$,
\[ F'(1) = \frac{\phi'}{1 - f'}, \qquad F''(1) = \frac{\phi''}{1 - f'^{\,2}} + \frac{2\phi'^{\,2} f' + \phi' f''}{(1 - f'^{\,2})(1 - f')}, \]

giving us a way to compute standard deviations, and, similarly, more complicated expressions for the higher derivatives and moments. The first derivative, the expected value, being inversely proportional to the degree of subcriticality, becomes infinite if $f'(1)$ approaches 1.
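The limiting distribution for a subcritical system with a source can also be obtained numerically, by iterating relation a) on truncated power series; the moment formula above then serves as a check. In the Python sketch below (editorial; both generating functions are hypothetical placeholders), series are multiplied and composed with numpy.

    import numpy as np

    def compose(f, g, nmax):
        """Coefficients of f(g(x)), truncated at degree nmax."""
        out, power = np.zeros(nmax + 1), np.zeros(nmax + 1)
        power[0] = 1.0
        for c in f:
            out += c * power
            power = np.convolve(power, g)[:nmax + 1]
        return out

    nmax = 60
    f   = [0.6, 0.3, 0.1]          # subcritical system: f'(1) = 0.5
    phi = [0.5, 0.3, 0.2]          # source distribution: phi'(1) = 0.7

    F = np.zeros(nmax + 1)
    F[0] = 1.0                     # start from F_0 = 1 (no neutrons yet)
    for _ in range(200):           # F <- phi(x) * F(f(x)), iterated to the limit
        F = np.convolve(phi, compose(F, f, nmax))[:nmax + 1]

    mean = sum(n * F[n] for n in range(nmax + 1))
    print(mean, 0.7 / (1 - 0.5))   # agrees with F'(1) = phi'(1)/(1 - f'(1))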

6. We come now to the probability distribution of the sum of all neutrons in the system from the first to the $k$th generation. We have established previously that if $f(x) = \sum_{n=0}^{\infty} p_n x^n$ is the generating function for the probabilities of $n$ particles in the first generation, then the generating function of the $k$th generation is given by the $k$th iterate $f_k(x)$.

If we want the generating function for probabilities of having the total of n particles from the first to the kth generation, we shall proceed as follows.

The total of $n$ particles can be obtained in any one of the following mutually exclusive ways: we can have 1 in the first generation and $n-1$ in the remaining $k-1$ generations, or 2 in the first generation and $n-2$ in the remaining $k-1$; in general we can have $r$ in the first and $n-r$ in the remaining $k-1$ generations. The required probability is therefore the sum
\[ q_k(n) = \sum_r p_r\, q^{(r)}_{k-1}(n-r). \]

Here $q^{(r)}_{k-1}(n-r)$ denotes the probability that, starting from $r$ particles in the first generation, we shall attain from these $r$ a total of $n-r$ in $k-1$ generations. But the $r$ particles are independent of each other. The probability of getting the total of $n-r$ from them is therefore the probability of $n-r$ in the sum of these $r$ variables. The generating function for the sum of independent random variables is the product of the generating functions corresponding to each of them. In our case it is the $r$th power of $u_{k-1}(x)$, the generating function of the total produced by a single particle in $k-1$ generations. We are looking for the coefficient of $x^{n-r}$ in $[u_{k-1}(x)]^r$. Our required probability $q_k(n)$ equals therefore the sum with respect to $r$ of the coefficients of $x^{n-r}$ in $p_r[u_{k-1}(x)]^r$, or the sum of the coefficients of $x^n$ in $p_r x^r [u_{k-1}(x)]^r$.

But the coefficient of $x^n$ in $\sum_r p_r x^r [u_{k-1}(x)]^r$ is the same as this coefficient in $f(x\,u_{k-1}(x))$. This is true for all $n$. Therefore the generating function for $q_k$ is $f(x\,u_{k-1}(x))$. Since $n$ here is arbitrary we get:

Theorem V. The generating function for the time sum is
\[ u_k(x) = f[x\,u_{k-1}(x)]. \]

If we "count" the original particle, this multiplies the generating function by $x$; expressing this slightly modified form recursively, we obtain the more convenient expression
\[ u_k(x) = x\,f[u_{k-1}(x)]. \]

As we know we have, in general, a relation between moments of the nth order of a distribution function and the nth derivative of the generating function. We shall now show how one can compute the derivatives of uk(x) for any k in an explicit manner.

Since, as was shown above, $u_k(x) = x\,f[u_{k-1}(x)]$, we may obtain the desired results by repeated differentiations, and by solving the resulting finite difference equations. But if $k$ is allowed to approach infinity, and if the system is subcritical,
\[ \lim_{k\to\infty} u_k(x) = \lim_{k\to\infty} u_{k-1}(x) = u(x). \]

Hence for the distribution of the total number produced we have $u(x) = x\,f[u(x)]$; differentiating, we obtain (derivatives taken at $x = 1$):
\[ u'(1) = \frac{1}{1 - f'}, \qquad u''(1) = \frac{f'' + 2f'(1 - f')}{(1 - f')^3}. \]
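The same truncated-series device gives the distribution of the total number produced. The Python sketch below (editorial; the subcritical offspring law is a hypothetical placeholder) iterates $u_k(x) = x\,f[u_{k-1}(x)]$ and checks the first moment against $u'(1) = 1/(1 - f')$.

    import numpy as np

    def compose(f, g, nmax):
        """Coefficients of f(g(x)), truncated at degree nmax."""
        out, power = np.zeros(nmax + 1), np.zeros(nmax + 1)
        power[0] = 1.0
        for c in f:
            out += c * power
            power = np.convolve(power, g)[:nmax + 1]
        return out

    f = [0.6, 0.3, 0.1]                         # subcritical: f'(1) = 0.5
    nmax = 60
    u = np.zeros(nmax + 1)
    u[1] = 1.0                                  # u_0(x) = x: count the original particle
    for _ in range(200):                        # u <- x * f(u), iterated to the limit
        fu = compose(f, u, nmax)
        u = np.concatenate(([0.0], fu[:-1]))    # multiplication by x (series truncated)

    mean = sum(n * u[n] for n in range(nmax + 1))
    print(mean, 1 / (1 - 0.5))                  # mean of the total produced, 1/(1 - f'(1))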



These examples show how moments of the distributions can be computed for various problems in our discrete model. Otto Frisch has shown how, from the form of these moments, one can write their correct form for the continuous model, without having to solve the partial differential equations of the problem. This correspondence between the two models will be taken up later. It may be said that a generality of method has been established by the foregoing results, which demonstrate that the iteration of suitable operators corresponds to various physical observables connected with chain processes. For example it may be mentioned that the transformation $x \to (1/x)f(x)$ gives us the probability distribution for differences between the number of neutrons in a generation and the number in the next generation. Thus $f_{k-1}[(1/x)\,f(x)]$ generates probabilities of this kind. The mathematical description of a multiplicative chain process is seen to involve the iteration of a functional operator $U$. These operators $U$ act on the domain of all monotone functions $g(x)$ with $g(1) = 1$. To summarize again just a few examples:

(1) $U(g) = f(g)$; $f$ here is a given monotone function, $g$ represents any function of the domain on which $U$ operates, i.e., $g(x)$ is monotonic and $g(1) = 1$. This operator $U$ is the only one that has been studied extensively in the literature. Its iteration leads to the simple iteration process: $g(x),\ f(g(x)),\ f[f(g(x))], \ldots, f_k(g(x)), \ldots$

(2) $U(g) = f(x\,g)$, $f$ a given function. The domain of the operator, i.e., the admissible $g$, is the same, but there seems to be very little known about the iterates of this operator. This operator is tied to the probability law of the total number of particles produced.

(3) $U(g) = \phi(x)\, g(f(x))$; $\phi(x)$, $f(x)$ are given. The iterates of this operator give us the distribution of the number of particles produced when a source with given distribution $\phi(x)$ is acting constantly.

(4) $U(g) = f[(1/x)\, g]$. This operator relates to the probability distribution of the difference of the number of particles in successive generations. The study of conjugates, fixed points, etc., for such operators seems to be important. We hope to undertake this study later.

We turn now to a more complex version of the problem. Hitherto it has been assumed that the generating function was independent of temporal and geometrical factors. However, our methods are extensible beyond these limitations.



7. The calculation of the probability distributions in the general case of heterogeneous particles will now be considered. So far we have assumed that the probability of generating n neutrons is the same independently of the parent neutron. If one takes the real situation where the system of the active material is of finite extent, then obviously the probability of leakage and absorption is a function of position of the parent nucleus. It is obvious that in general chemical or nuclear chain-reaction processes one has to deal with several kinds or even a continuous variety of the elementary generating functions.

In order to explain our methods of iteration of functional operators for this general case we shall take the simplest case of two kinds of particles. If we divide, for the first approximation, the sphere of the active material into two parts, an inner sphere and the outer shell, we shall characterize the neutrons generated in the one part by the subscript x, the others by the subscript y. An x-particle can generate either x-particles again or, penetrating to the outer shell, y-particles, or, of course, leak out or be absorbed; the same, though with different probabilities, applies to the y-particles. In reality we should consider a one-dimensional variety of kinds of particles corresponding to all values of their distance r from the center of the sphere, or even a two-dimensional one if we want to take into account different velocities. To simplify the presentation we shall limit ourselves here to just two kinds (x and y).

We assume that the following elementary probabilities are given by the nuclear constants and by the integrals of the geometry involved. An x-particle can produce $n\,(> 0)$ x-particles with probabilities $p_n$ and $n\,(> 0)$ y-particles with probabilities $q_n$. The probability of dying out, by absorption or leakage, will be denoted by $p_0$. For the y-particles the corresponding probabilities will be denoted by $\bar p_n$, $\bar q_n$, and $\bar p_0$. It is because of the geometry of the system that $p_0$ and $\bar p_0$ are certainly different.

We now write the two functions of two variables each:
\[ f(x, y) = p_0 + p_1 x + \cdots + p_n x^n + \cdots + q_1 y + \cdots + q_n y^n + \cdots, \]
\[ g(x, y) = \bar p_0 + \bar p_1 x + \cdots + \bar p_n x^n + \cdots + \bar q_1 y + \cdots + \bar q_n y^n + \cdots. \]

The coefficients of f(x, y) give the probabilities of having in the first generation a given number of x- or y-particles starting with one x-neutron; those of g(x, y), if we start with a y-neutron.

Required are the probabilities of finding in the next generation a given number of x- and y-particles. Let us form the function
\[ f_2(x, y) = f[f(x, y),\, g(x, y)]. \]



By reasoning exactly as in the proof of Theorem I (or Theorem III) we see that the probability of having $n$ x-particles and $m$ y-particles is given by the coefficient of $x^n y^m$ in $f_2(x, y)$. If we started in the 0th generation with a y-particle we will get these probabilities as the coefficients of $x^n y^m$ in $g[f(x,y),\, g(x,y)]$. By an obvious induction we obtain:

Theorem VI. The probabilities of having $n$ x-particles and $m$ y-particles in the $k$th generation are given by the coefficient of $x^n y^m$ in $f[T_{k-1}(p)]$ or $g[T_{k-1}(p)]$ (depending on whether we started from an x- or from a y-particle). Here $T_k(p)$ is a transformation of the plane $(x, y)$ into itself defined as follows: if $p = (x, y)$, then $T(p) = [f(x, y),\, g(x, y)]$ and $T_k(p) = T[T_{k-1}(p)]$. Without going into the details of the proof or actual computations of moments we wish to conclude with the following remarks:

(1) In the case of 3 or any finite number $r$ of different kinds of particles, the formalism necessary to obtain the generating function for the $k$th generation is the same. It consists of iterating a given set of $r$ functions, or a transformation in $r$ dimensions (variables $x_1, x_2, \ldots, x_r$).

(2) One fairly general case where the coefficients of the mixed powers of the variables $x_1^{a_1} x_2^{a_2} \cdots x_r^{a_r}$ can be computed explicitly in a closed form for any number $k$ of generations is when the given transformation is the $r$-dimensional generalization of our projective transformations on the line, i.e., $p = (x_1, x_2, \ldots, x_r)$, $p' = T(p) = (x_1', x_2', \ldots, x_r')$, where
\[ x_1' = f_1(x_1, \ldots, x_r) = (a_{11}x_1 + \cdots + a_{1r}x_r + b_1)/(c_{11}x_1 + \cdots + c_{1r}x_r + d_1), \]
\[ \cdots \]
\[ x_r' = f_r(x_1, \ldots, x_r) = (a_{r1}x_1 + \cdots + a_{rr}x_r + b_r)/(c_{r1}x_1 + \cdots + c_{rr}x_r + d_r). \]

(3) The computation of moments of the distribution in the most general case does not involve the explicit knowledge of $T_k(p)$, but can be obtained through the knowledge of the moments of the $r$ given elementary functions $f_1(x_1, \ldots, x_r), \ldots, f_r(x_1, \ldots, x_r)$.

The role of the numerical multiplication of moments is here taken over by matrix multiplication; a brief numerical sketch of this is given after remark (4) below.

(4) The other operators corresponding, e.g., to $U(g) = f(x\,g)$, etc., have not been so far investigated in the $r$-dimensional case.
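A brief illustration of remark (3) and of the iteration in Theorem VI (an editorial Python sketch; the two offspring laws below are hypothetical): iterating $T(x, y) = (f(x, y), g(x, y))$ from $(0, 0)$ yields the mortality probabilities for a chain started by an x- or by a y-particle, and the expected numbers of each kind after $k$ generations are given by the $k$th power of the matrix of first moments.

    import numpy as np

    # hypothetical two-type offspring generating functions (cf. Theorem VI)
    def f(x, y):                       # offspring law of one x-particle
        return 0.2 + 0.3 * x + 0.3 * x * y + 0.2 * y * y

    def g(x, y):                       # offspring law of one y-particle
        return 0.3 + 0.3 * x + 0.4 * y * y

    # mortality probabilities: limit of T_k(0, 0), T(x, y) = (f(x, y), g(x, y))
    x, y = 0.0, 0.0
    for _ in range(2000):
        x, y = f(x, y), g(x, y)
    print("mortality probabilities (x-start, y-start):", x, y)

    # matrix of first moments: entry (i, j) is the expected number of type-j
    # offspring of one type-i particle; k generations correspond to its k-th power.
    M = np.array([[0.3 + 0.3, 0.3 + 0.4],      # df/dx, df/dy at (1, 1)
                  [0.3,       0.8      ]])     # dg/dx, dg/dy at (1, 1)
    print(np.linalg.matrix_power(M, 5))        # expected numbers after 5 generations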



Conclusions Regarding Applications

The expected value of the number of neutrons per fission, $\nu$, is known with fair accuracy. The critical mass and the expected number of neutrons in a gadget depend on this constant alone. Very little seems to be known, however, about the distribution function of the number of neutrons, or even only about its second moment. The great fluctuations in multiplicative systems discussed above are of some practical interest for the following reasons:

1. The correct timing of the initiation of the gadget is vital for high efficiency. Even with good sources there will be an uncertainty of several generations' time, due to fluctuations in multiplication.

2. The fluctuations of multiplication are of interest in all "integral" experiments.

3. For gadgets large in comparison with the mean free path for fission, the spatial fluctuations may destroy the initial spherical symmetry.

In dealing with such problems it is useful to develop a uniform technique for describing the statistics of multiplicative phenomena. This paper constitutes a first step consisting essentially in the observation that the iterated substitution (of a function, or more generally of a functional operation) represents exactly the statistical laws of multiplicative processes. In the sequel, it is hoped to apply this technique to the study of the problems of geometrical- and time-dependence of the process.

Reference

1. Stanley P. Frankel, The Statistics of the Hypercritical Gadget, LAMS-36, January 8, 1944.



Title page of the original 1947 report which was declassified in 1959. Los Alamos documents used to list the persons who did the work as well as those who wrote the reports. This practice was abandoned soon after the end of the war.



2—
Statistical Methods in Neutron Diffusion:
With J. von Neumann and R. D. Richtmyer (LAMS-551, April 9, 1947)

This report, written in 1947 by J. von Neumann and R. D. Richtmyer, is about work done by J. von Neumann and myself, as the title page indicates. It gives the first published ideas and proposals for the Monte Carlo Method. (Author's note.)

Abstract

There is reproduced here some correspondence on a method of solving neutron diffusion problems in which data are chosen at random to represent a number of neutrons in a chain-reacting system. The history of these neutrons and their progeny is determined by detailed calculations of the motions and collisions of these neutrons, randomly chosen variables being introduced at certain points in such a way as to represent the occurrence of various processes with the correct probabilities. If the history is followed far enough, the chain reaction thus represented may be regarded as a representative sample of a chain reaction in the system in question. The results may be analyzed statistically to obtain various average quantities of interest for comparison with experiments or for design problems.

This method is designed to deal with problems of a more complicated nature than conventional methods based, for example, on the Boltzmann equation. For example, it is not necessary to restrict neutron energies to a single value or even to a finite number of values, and one can study the distribution of neutrons or of collisions of any specified type not only with respect to space variables but with respect to other variables, such as neutron velocity, direction of motion, and time. Furthermore, the data can be used for the study of fluctuations and other statistical phenomena.
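As a present-day gloss on the procedure summarized in this abstract (an editorial illustration only, not the calculation described in the correspondence reproduced below), the following Python fragment follows one-speed neutrons through a homogeneous sphere, sampling free paths, collision types, and fission multiplicities at random; every constant is a hypothetical placeholder.

    import math, random

    R = 5.0                       # sphere radius (arbitrary units)
    SIGMA_T = 1.0                 # total macroscopic cross section (1/length)
    P_ABSORB, P_FISSION = 0.2, 0.3          # the remaining 0.5 is scattering
    NU_CHOICES, NU_WEIGHTS = (2, 3, 4), (0.2, 0.7, 0.1)

    rng = random.Random(1)

    def isotropic_direction():
        mu = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        s = math.sqrt(1.0 - mu * mu)
        return (s * math.cos(phi), s * math.sin(phi), mu)

    def neutrons_produced(pos):
        """Follow one neutron until it leaks, is captured, or causes fission;
        return the number of fission neutrons it produces."""
        while True:
            d = isotropic_direction()
            path = -math.log(1.0 - rng.random()) / SIGMA_T   # sampled free path
            pos = tuple(p + path * c for p, c in zip(pos, d))
            if math.dist(pos, (0.0, 0.0, 0.0)) > R:
                return 0                                     # leakage out of the sphere
            u = rng.random()
            if u < P_ABSORB:
                return 0                                     # capture without fission
            if u < P_ABSORB + P_FISSION:
                return rng.choices(NU_CHOICES, weights=NU_WEIGHTS)[0]
            # otherwise the neutron scatters and the flight continues

    histories = 10000
    total = sum(neutrons_produced((0.0, 0.0, 0.0)) for _ in range(histories))
    print("fission neutrons per source neutron:", total / histories)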

THE INSTITUTE FOR ADVANCED STUDY
Founded by Louis Bamberger and Mrs. Felix Fuld
Princeton, New Jersey
School of Mathematics
March 11, 1947

VIA AIRMAIL: REGISTERED
Mr. R. Richtmyer
Post Office Box 1663
Santa Fe, New Mexico

Dear Bob:

This is the letter I promised you in the course of our telephone conversation on Friday, March 7th.

I have been thinking a good deal about the possibility of using statistical methods to solve neutron diffusion and multiplication problems, in accordance with the principle suggested by Stan Ulam. The more I think about this, the more I become convinced that the idea has great merit. My present conclusions and expectations can be summarized as follows:

(1) The statistical approach is very well suited to a digital treatment. I worked out the details of a criticality discussion under the following conditions:

(a) Spherically symmetric geometry.
(b) Variable (if desired, continuously variable) composition along the radius of active material (25 or 49), tamper material (28 or Be or WC), and slower-down material (H in some form).
(c) Isotropic generation of neutrons by all processes of (b).
(d) Appropriate velocity spectrum of neutrons emerging from the collision processes of (b), and appropriate description of the cross-sections of all processes of (b) as functions of the neutron velocity; i.e., an infinitely many (continuously distributed) neutron velocity group treatment.
(e) Appropriate account of the statistical character of fissions, as being able to produce (with specified probabilities), say, 2 or 3 or 4 neutrons.

This is still a treatment of "inert" criticality: It does not allow for the hydrodynamics caused by the energy and momentum exchanges and production of the processes of (b), and for the displacements, and hence changes of material distribution, caused by hydrodynamics; i.e., it is not a theory of efficiency. I do know, however, how to expand it into such a theory (cf. (5) below).

The details enumerated in (a)-(e) were chosen by me somewhat at will. It seems to me that they represent a reasonable model, but it would be easy to make them either more or less elaborate, as desired. If you have any definite desiderata in this respect, please let me know, so that we may analyze their effects on the set-up.

(2) I am fairly certain that the problem of (1), in its digital form, is well suited for the ENIAC. I will have a more specific estimate on this subject shortly. My present (preliminary) estimate is this: Assume that one criticality problem requires following 100 primary neutrons through 100 collisions (of the primary neutron or its descendants) per primary neutron. Then solving one criticality problem should take about 5 hours. It may be, however, that these figures (100 x 100) are unnecessarily high. A statistical study of the first solutions obtained will clear this up. If they can be lowered, the time will be shortened proportionately.

A common set-up of the ENIAC will do for all criticality problems. In changing over from one problem of this category to another one, only a few numerical constants will have to be set anew on one of the "function table" organs of the ENIAC.

(3) Certain preliminary explorations of the statistical-digital method could be and should be carried out manually. I will say somewhat more subsequently.

(4) It is not quite impossible that a manual-graphical approach (with a small amount of low-precision digital work interspersed) is feasible. It would require a not inconsiderable number of computers for several days per criticality problem, but it may be possible, and it may perhaps deserve consideration until and unless the ENIAC becomes available. This manual-graphical procedure has actually some similarity with a statistical-graphical procedure with which solutions of a bombing problem were obtained during the war, by a group working under S. Wilks (Princeton University and Applied Mathematics Panel, NDRC). I will look into this matter further, and possibly get Wilks' opinion on the mathematical aspects.

(5) If and when the problem of (1) will have been satisfactorily handled in a reasonable number of special cases, it will be time to investigate the more general case, where hydrodynamics also comes into play; i.e., efficiency calculations, as suggested at the end of (1). I think that I know how to set up this problem, too: One has to follow, say, 100 neutrons through a short time interval At; get their momentum and energy transfer and generation in the ambient matter; calculate from this the displacement of matter; recalculate the history of the 100 neutrons by assuming that matter is in the middle position between its original (unperturbed) state and the above displaced (perturbed) state; recalculate the displacement of matter due to this (corrected) neutron history; recalculate the neutron history due to this (corrected) displacement of matter, etc., etc., iterating in this manner until a "self-consistent" system of neutron history and displacement of matter is reached. This is the treatment of the first time interval At. When it is completed, it will serve as a basis for a similar treatment of the second time interval At; this, in turn, similarly for the third time interval At; etc., etc.

In this set-up there will be no serious difficulty in allowing for the role of light, too. If a discrimination according to wavelength is not necessary, i.e., if the radiation can be treated at every point as isotropic and black, and its mean free path is relatively short, then light can be treated by the usual "diffusion" methods, and this is clearly only a very minor complication. If it turns out that the above idealizations are improper, then the photons, too, may have to be treated "individually" and statistically, on the same footing as the neutrons. This is, of course, a non-trivial complication, but it can hardly consume much more time and instructions than the corresponding neutronic part. It seems to me, therefore, that this approach will gradually lead to a completely satisfactory theory of efficiency, and ultimately permit prediction of the behavior of all possible arrangements, the simple ones as well as the sophisticated ones.

(6) The program of (5) will, of course, require the ENIAC at least, if not more. I have no doubt whatever that it will be perfectly
tractable with the post-ENIAC device which we are building. After a certain amount of exploring (1), say with the ENIAC, will have taken place, it will be possible to judge how serious the complexities of (5) are likely to be.

Regarding the actual, physical state of the ENIAC my information is this: It is in Aberdeen, and it is being put together there. The official date for its completion is still April 1st. Various people give various subjective estimates as to the actual date of completion, ranging from mid-April to late May. It seems as if the late May estimate were rather safe.

I will inquire more into this matter, and also into the possibility of getting some of its time subsequently. The indications that I have had so far on the latter score are encouraging.

In what follows, I will give a more precise description of the approach outlined in (1); i.e., of the simplest way I can now see to handle this group of problems.

Consider a spherically symmetric geometry. Let r be the distance from the origin. Describe the inhomogeneity of this system by assuming N concentric, homogeneous (spherical shell) zones, enumerated by an index i = 1,...,N. Zone No. i is defined by r_{i-1} ≤ r < r_i, the r_0, r_1, r_2,...,r_{N-1}, r_N being given: 0 = r_0 < r_1 < r_2 < ... < r_{N-1} < r_N = R, where R is the outer radius of the entire system.
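A minimal modern sketch, in Python, of the zone bookkeeping just described; the numerical radii are illustrative only and not taken from the report:

    import bisect

    # Zone boundaries 0 = r0 < r1 < ... < rN = R for a hypothetical 3-zone system (cm).
    radii = [0.0, 4.0, 6.5, 10.0]   # N = 3, R = 10

    def zone_index(r, radii):
        """Return the zone number i (1..N) with r_{i-1} <= r < r_i."""
        i = bisect.bisect_right(radii, r)   # number of boundaries <= r
        return min(i, len(radii) - 1)       # put r = R into the outermost zone

    print(zone_index(5.0, radii))   # -> 2, since 4.0 <= 5.0 < 6.5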

Let the system consist of the three components discussed in (1), (b), to be denoted A, T, S, respectively. Describe the composition of each zone in terms of its content of each of A, T, S. Specify these for each zone in relative volume fractions. Let these be, in zone No. i: x_i, y_i, z_i, respectively.

Introduce the cross sections per cm³ of pure material, multiplied by log₁₀ e = .43..., as functions of the neutron velocity v, as follows:
Absorption in A, T, S: Σ_aA(v), Σ_aT(v), Σ_aS(v).
Scattering in A, T, S: Σ_sA(v), Σ_sT(v), Σ_sS(v).
Fission in A, with production of 2, 3, 4 neutrons: Σ_f^(2)A(v), Σ_f^(3)A(v), Σ_f^(4)A(v).

Scattering as well as fission are assumed to produce isotropically distributed neutrons, with the following velocity distributions:

If the incident neutron has the velocity v, then the scattered neutrons' velocity statistics are described for A, T, S by the relations v′ = v φ_A(ν), v′ = v φ_T(ν), v′ = v φ_S(ν). Here v′ is the velocity of the scattered neutron, φ_A(ν), φ_T(ν), φ_S(ν) are known functions, characteristic of the three substances A, T, S (they all vary from 1 to 0), and ν is a random variable, statistically equidistributed in the interval 0, 1.

Every fission neutron has the velocity v₀. I suppose that this picture either gives a model or at least provides a prototype for essentially all those phenomena about which we have relevant observational information at present, and actually for somewhat more. It may be expected to provide a reasonable vehicle for the additional relevant observational material that is likely to arise in the near future. Do you agree with this?

In this model the state of a neutron is characterized by its position r, its velocity v, and the angle θ between its direction of motion and the radius. It is more convenient to replace θ by s = r cos θ, so that √(r² − s²) is the "perihelion distance" of its (linearly extrapolated) path. Note that if a neutron is produced isotropically, i.e., if its direction "at birth" is equidistributed, then (because space is three-dimensional) cos θ will be equidistributed in the interval −1, 1, i.e., s in the interval −r, r.

It is convenient to add to the characterization of a neutron explicitly the No. i of the zone in which it is found, i.e., with r_{i-1} ≤ r < r_i. It is furthermore advisable to keep track of the time t to which the specifications refer.

Consequently, a neutron is characterized by these data: i, r, s, v,t .

Now consider the subsequent history of such a neutron. Unless it suffers a collision in zone No. i, it will leave this zone along its straight path, and pass into zones Nos. i + 1 or i − 1. It is desirable to start a "new" neutron whenever the neutron under consideration has suffered a collision (absorbing, scattering, or fissioning; in the last-mentioned case several "new" neutrons will, of course, have to be started), or whenever it passes into another zone (without having collided).

Consider first, whether the neutron's linearly extrapolated path goes forward from zone No. i into zone No. i + 1 or i - 1. Denote these two possibilities by I and II.

If the neutron moves outward, i.e., if s > 0, then we have certainly I. If the neutron moves inward, i.e., if s < 0, then we have either I or II, the latter if, and only if, the path penetrates at all into the sphere r_{i-1}. It is easily seen that the latter is equivalent to s² > r² − r²_{i-1}. So we have:
s > 0 ∴ A ∴ I.
s < 0 ∴ B; then r²_{i-1} + s² − r² < 0 ∴ B′ ∴ I, and r²_{i-1} + s² − r² ≥ 0 ∴ B″ ∴ II.

The exit from zone No. i will therefore occur at r* = r_i for I, at r* = r_{i-1} for II. It is easy to calculate that the distance from the neutron's original position to the exit position is d = s* − s, where s* = +√(r*² + s² − r²) for I and s* = −√(r*² + s² − r²) for II.
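A small Python sketch of this case distinction, using the same notation (r, s, r_{i-1}, r_i); the sample numbers are invented for illustration:

    import math

    def exit_of_zone(r, s, r_inner, r_outer):
        """Decide case I (outward exit) or II (inward exit); return (case, r*, s*, d)."""
        if s > 0 or r_inner**2 + s*s - r*r < 0:         # A, or B': path misses the inner sphere
            case, r_star = "I", r_outer
            s_star = math.sqrt(r_star**2 + s*s - r*r)   # s* > 0
        else:                                           # B'': path penetrates the inner sphere
            case, r_star = "II", r_inner
            s_star = -math.sqrt(r_star**2 + s*s - r*r)  # s* < 0
        return case, r_star, s_star, s_star - s         # d = s* - s

    print(exit_of_zone(5.0, -4.9, 4.0, 6.5))            # an inward-moving neutron -> case II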

The probability that the neutron will travel a distance d′ without suffering a collision is 10^(−f d′), where
f = (Σ_aA(v) + Σ_sA(v) + Σ_f^(2)A(v) + Σ_f^(3)A(v) + Σ_f^(4)A(v)) x_i + (Σ_aT(v) + Σ_sT(v)) y_i + (Σ_aS(v) + Σ_sS(v)) z_i.

It is at this point that the statistical character of the method comes into evidence. In order to determine the actual fate of the neutron, one has to provide now the calculation with a value λ, belonging to a random variable, statistically equidistributed in the interval 0, 1; i.e., λ is to be picked at random from a population that is statistically equidistributed in the interval 0, 1. Then it is decreed that 10^(−f d′) has turned out to be λ; i.e., d′ = −(¹⁰log λ)/f.*

From here on, the further procedure is clear.

* Today this older notation would read d′ = −log₁₀ λ / f. (Eds.)
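A one-line Monte Carlo version of this decree, assuming f has already been formed for the zone; the values of f and d below are placeholders:

    import math, random

    def sampled_flight(f):
        """Sample d' with P(no collision over d') = 10**(-f*d')."""
        lam = 1.0 - random.random()          # lambda, equidistributed in (0, 1]
        return -math.log10(lam) / f          # d' = -(log10 lambda)/f

    f, d = 0.12, 1.03                        # placeholder cross-section sum and boundary distance
    d_prime = sampled_flight(f)
    print("collision in zone" if d_prime < d else "passes to the next zone")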

If d′ > d, then the neutron is ruled to have reached the neighboring zone No. i ± 1 (+ for I, − for II) without having suffered a collision. The "new" neutron (i.e., the original one, but viewed at the interzone boundary, and heading into the new zone), has characteristics which are easily determined: i is replaced by i ± 1, r by r*, s is easily seen to go over into s*, v is unchanged, t goes over into t* = t + d/v. Hence, the "new" characteristics are i ± 1, r*, s*, v, t*.

If, on the other hand, d′ < d, then the neutron is ruled to have suffered a collision while still within zone No. i, after a travel d′. The position at this stage is now r* = √(r² + 2sd′ + (d′)²), and the time t* = t + d′/v.

The characteristic contains, accordingly, at any rate i,r*, and t* in place of i, r, and t. It remains to determine what becomes of s and v.

As pointed out before, the "new" s will be equidistributed in the interval −r*, r*. It is therefore only necessary to provide the calculation with a further value ρ′, belonging to a random variable, statistically equidistributed in the interval 0, 1. Then one can rule that s has the value s′ = r*(2ρ′ − 1).
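The collision branch can be sketched as follows (Python; variable names follow the letter, sample inputs are arbitrary):

    import math, random

    def collision_point(r, s, v, t, d_prime):
        """Return (r*, s', t*) for a collision after a travel d' within the zone."""
        r_star = math.sqrt(r*r + 2*s*d_prime + d_prime*d_prime)   # r* = sqrt(r^2 + 2 s d' + d'^2)
        t_star = t + d_prime / v                                   # t* = t + d'/v
        rho = random.random()                                      # rho', equidistributed in (0, 1)
        s_new = r_star * (2*rho - 1)                               # s' equidistributed in (-r*, r*)
        return r_star, s_new, t_star

    print(collision_point(5.0, -4.9, 2.0e9, 0.0, 0.8))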

As to the "new" v, it is necessary to determine first the character of the collection: Absorption (by any one of A, T, S); scattering by A, or by T, or by S; fission (by A) producing 2, or 3, or 4 neutrons. These seven alternatives have the relative probabilities fi =EaA(V)Zi + EaT( v)yi+ EaS(v)Z f2 - fl= E A(V)Xi f3- f2 =YsT(1V)i4-f3sS(v)zi f-f4= f((V)xi f6_f5 = fA (v)xii f - Z 4 (V) f-f26= ()x -

We can therefore now determine the character of the collision by a statistical procedure like the preceding ones: Provide the calculation with a value μ belonging to a random variable, statistically equidistributed in the interval 0, 1. Form the product μf; this is then equidistributed in the interval 0, f. Let the 7 above cases correspond to the 7 intervals 0, f₁; f₁, f₂; f₂, f₃; f₃, f₄; f₄, f₅; f₅, f₆; f₆, f, respectively. Rule that that one of those 7 cases holds in whose interval μf actually turns out to lie.
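In modern terms this is sampling from a discrete distribution by cumulative sums; a small Python sketch with placeholder values of f₁,...,f:

    import random

    def collision_channel(f_cumulative):
        """Pick one of the 7 alternatives from the cumulative sums f1 <= f2 <= ... <= f."""
        labels = ["absorption", "scatter A", "scatter T", "scatter S",
                  "fission (2n)", "fission (3n)", "fission (4n)"]
        mu_f = random.random() * f_cumulative[-1]     # mu * f, equidistributed in (0, f)
        for f_k, label in zip(f_cumulative, labels):
            if mu_f < f_k:
                return label
        return labels[-1]

    print(collision_channel([0.02, 0.05, 0.07, 0.08, 0.10, 0.11, 0.12]))   # placeholder f1..f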

Now the value of v can be specified. Let us consider the 7 available cases in succession.

Absorption: The neutron has disappeared. It is simplest to characterize this situation by replacing v by 0.

Scattering by A: Provide the calculation with a value ν belonging to a random variable, statistically equidistributed in the interval 0, 1. Replace v by v′ = v φ_A(ν). Scattering by T: Same as above, but v′ = v φ_T(ν). Scattering by S: Same as above, but v′ = v φ_S(ν).

Fission: In this case replace v by v₀. According to whether the case in question is that one corresponding to the production of 2, 3, or 4 neutrons, repeat this 2, 3, or 4 times, respectively. This means that, in addition to the ρ′, s′ discussed above, the further ρ″, s″; ρ‴, s‴; ρ⁗, s⁗ may be needed.
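A sketch of the velocity update for the seven cases; the functions phi and the value of v0 below are invented stand-ins for the tabulated data (2)-(4):

    import random

    V0 = 2.0e9                                   # assumed fission velocity v0 (cm/s)
    phi = {"A": lambda nu: 1.0 - 0.2 * nu,       # assumed stand-ins for phi_A, phi_T, phi_S
           "T": lambda nu: 1.0 - 0.5 * nu,
           "S": lambda nu: 1.0 - 0.9 * nu}

    def new_velocities(channel, v):
        """Velocities of the 'new' neutron(s) produced by the sampled collision channel."""
        if channel == "absorption":
            return [0.0]                          # v = 0 marks a neutron that has disappeared
        if channel.startswith("scatter"):
            nu = random.random()                  # nu, equidistributed in (0, 1)
            return [v * phi[channel[-1]](nu)]     # v' = v * phi(nu)
        n = int(channel[-3])                      # "fission (2n)" -> 2, etc.
        return [V0] * n                           # each fission neutron starts at v0

    print(new_velocities("scatter T", 1.5e9), new_velocities("fission (3n)", 1.5e9))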

This completes the mathematical description of the procedure. The computational execution would be something like this: Each neutron is represented by a card C which carries its characteristics i, r, s, v, t, and also the necessary random values λ, μ, ν, ρ′, ρ″, ρ‴, ρ⁗.

I can see no point in giving more than, say, 7 places for each one of the 5 characteristics, or more than, say, 5 places for each of the 7 random variables. In fact, I would judge that these numbers of places
are already higher than necessary. At any rate, even in this way only 70 entries are consumed, and so the ordinary 80-entry punchcard will have 10 entries left over for any additional indexings, etc., that one may desire.

The computational process should then be so arranged as to produce the card C' of the "new" neutron, or rather 1 to 4 such cards C', C", C"', C"" (depending on the neutrons' actual history, cf. above). Each card, however, need only be provided with the 5 characteristics of its neutron. The 7 random variables can be inserted in a subsequent operation, and the cards with v = 0 (i.e., corresponding to neutrons that were absorbed within the assembly) as well as those with i = N +1 (i.e., corresponding to neutrons that escaped from the assembly) may be sorted out.
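The census step can be pictured as a simple filter over the deck; here each card is a Python dict with the five characteristics, and the sample cards are invented:

    N = 3                                                     # number of zones, as above
    deck = [
        {"i": 2, "r": 5.1, "s": 0.3, "v": 1.9e9, "t": 1.2e-8},
        {"i": N + 1, "r": 10.0, "s": 9.9, "v": 2.1e9, "t": 0.9e-8},   # escaped the assembly
        {"i": 1, "r": 2.0, "s": -1.0, "v": 0.0, "t": 1.5e-8},         # absorbed (v = 0)
    ]

    absorbed = [c for c in deck if c["v"] == 0.0]
    escaped = [c for c in deck if c["i"] == N + 1]
    active = [c for c in deck if c["v"] != 0.0 and c["i"] != N + 1]
    print(len(active), "card(s) go on;", len(absorbed), "absorbed,", len(escaped), "escaped")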

The manner in which this material can then be used for all kinds of neutron statistic investigations is obvious.

I append a tentative "computing sheet" for the calculation above. It is, of course, neither an actual "computing sheet" for a (human) computer group, nor a set-up for the ENIAC, but I think that it is well suited to serve as a basis for either. It should give a reasonably immediate idea of the amount of work that is involved in the procedure in question.

I cannot assert this with certainty yet, but it seems to me very likely that the instructions given on this "computing sheet" do not exceed the "logical" capacity of the ENIAC. I doubt that the processing of 100 "neutrons" will take much longer than the reading, punching, and (once) sorting time of 100 cards, i.e., about 3 minutes. Hence, taking 100 "neutrons" through 100 of these stages should take about 300 minutes, i.e., 5 hours.

Please let me know what you and Stan think of these things. Do the approach and the formulation and generality of the criticality problem seem reasonable to you, or would you prefer some other variant? Would you consider coming East some time to discuss matters further? When could this be?

With best regards,
Very truly yours,
John von Neumann

Tentative Computing Sheet

Data:
(1) r_i, x_i, y_i, z_i as functions of i = 1,...,N.¹ (r_0 = 0.)
(2) Σ_aA(v), Σ_aT(v), Σ_aS(v), Σ_sA(v), Σ_sT(v), Σ_sS(v), Σ_f^(2)A(v), Σ_f^(3)A(v), Σ_f^(4)A(v) as functions of v ≥ 0, ≤ v₀.²
(3) v₀.
(4) φ_A(ν), φ_T(ν), φ_S(ν) as functions of ν ≥ 0, ≤ 1.²
(5) −¹⁰log λ as a function of λ > 0, ≤ 1.²

Card C: C1: i, C2: r, C3: s, C4: v, C5: t.
Random variables: R1: λ, R2: μ, R3: ν, R4: ρ′, R5: ρ″, R6: ρ‴, R7: ρ⁗.

¹ Tabulated. (Discrete domain.)
² Tabulated, to be interpolated, or approximated by polynomials. (Continuous domain.)

TENTATIVE COMPUTING SHEET (Cont'd.)

Calculation.    Instructions    Explanations

1. r of (C1 − 1), see (1)    r_{i-1}
2. r of C1, see (1)    r_i
3. (C3)²    s²
4. (C2)²    r²
5. 3 − 4    s² − r²
6. C3 > 0 ∴ A; C3 < 0 ∴ B    s > 0 ∴ A; s < 0 ∴ B
Only for B: 7. (1)²    r²_{i-1}
Only for B: 8. 5 + 7    r²_{i-1} + s² − r²
Only for B: 9. 8 ≥ 0 ∴ B″; 8 < 0 ∴ B′    r²_{i-1} + s² − r² ≥ 0 ∴ B″; < 0 ∴ B′
10. A or B′ ∴ 2; B″ ∴ 1    A or B′ ∴ r* = r_i; B″ ∴ r* = r_{i-1}
11. A or B′ ∴ +1; B″ ∴ −1    sign of s*
12. (10)²    r*²
13. 5 + 12    r*² + s² − r²
14. 11 × √13    s*
15. 14 − C3    d
16. x of C1, see (1)    x_i
17. y of C1, see (1)    y_i
18. z of C1, see (1)    z_i
19. Σ_aA of C4, see (2)    Σ_aA(v)
20. 16 × 19    Σ_aA(v) x_i
21. Σ_aT of C4, see (2)    Σ_aT(v)
22. 17 × 21    Σ_aT(v) y_i
23. 20 + 22    Σ_aA(v) x_i + Σ_aT(v) y_i
24. Σ_aS of C4, see (2)    Σ_aS(v)
25. 18 × 24    Σ_aS(v) z_i

26. 23 + 25    f₁ = Σ_aA(v) x_i + Σ_aT(v) y_i + Σ_aS(v) z_i
27. Σ_sA of C4, see (2)    Σ_sA(v)
28. 16 × 27    f₂ − f₁ = Σ_sA(v) x_i
29. 26 + 28    f₂
30. Σ_sT of C4, see (2)    Σ_sT(v)
31. 17 × 30    f₃ − f₂ = Σ_sT(v) y_i
32. 29 + 31    f₃
33. Σ_sS of C4, see (2)    Σ_sS(v)
34. 18 × 33    f₄ − f₃ = Σ_sS(v) z_i
35. 32 + 34    f₄
36. Σ_f^(2) of C4, see (2)    Σ_f^(2)(v)
37. 16 × 36    f₅ − f₄ = Σ_f^(2)(v) x_i
38. 35 + 37    f₅
39. Σ_f^(3) of C4, see (2)    Σ_f^(3)(v)
40. 16 × 39    f₆ − f₅ = Σ_f^(3)(v) x_i
41. 38 + 40    f₆
42. Σ_f^(4) of C4, see (2)    Σ_f^(4)(v)
43. 16 × 42    f − f₆ = Σ_f^(4)(v) x_i
44. 41 + 43    f
45. −¹⁰log of R1, see (5)    −¹⁰log λ
46. 45 ÷ 44    d′
47. 46 > 15 ∴ P; 46 < 15 ∴ Q    d′ > d ∴ P; d′ < d ∴ Q
48. P ∴ 15 ÷ C4; Q ∴ 46 ÷ C4    P ∴ d/v; Q ∴ d′/v
49. C5 + 48    t* = t + (d or d′)/v

Only for P: 50. C1 + 11    i* = i ± 1
Only for P: 51. C1′: 50, C2′: 10, C3′: 14, C4′: C4, C5′: 49    C1′: i*, C2′: r*, C3′: s*, C4′: v, C5′: t*
From here on only Q:
52. R2 × 44    μf
53. 52 < 26 ∴ Q1; 26 ≤ 52 < 29 ∴ Q2; 29 ≤ 52 < 32 ∴ Q3; 32 ≤ 52 < 35 ∴ Q4; 35 ≤ 52 < 38 ∴ Q5; 38 ≤ 52 < 41 ∴ Q6; 52 ≥ 41 ∴ Q7    μf < f₁ ∴ Q1; f₁ ≤ μf < f₂ ∴ Q2; f₂ ≤ μf < f₃ ∴ Q3; f₃ ≤ μf < f₄ ∴ Q4; f₄ ≤ μf < f₅ ∴ Q5; f₅ ≤ μf < f₆ ∴ Q6; μf ≥ f₆ ∴ Q7
54. C3 × 46    s d′
55. 2 × 54    2 s d′
56. (46)²    (d′)²
57. 55 + 56    2 s d′ + (d′)²
58. 4 + 57    r² + 2 s d′ + (d′)²
59. √58    r*

Only for Q1: 60. C1′: C1, C2′: 59, C3′: (left blank), C4′: 0, C5′: 49    C1′: i, C2′: r*, C3′: (left blank), C4′: 0, C5′: t*
From here on only Q2,...,Q7:
Only for Q2: 61. φ_A of R3, see (4)    φ_A(ν)
Only for Q3: 62. φ_T of R3, see (4)    φ_T(ν)
Only for Q4: 63. φ_S of R3, see (4)    φ_S(ν)
Only for Q2, Q3, Q4: 64. C4 × (61 or 62 or 63)    v φ(ν)
65. Q2, Q3, Q4 ∴ 64; Q5, Q6, Q7 ∴ v₀ of (3)    Q2, Q3, Q4 ∴ v′ = v φ(ν); Q5, Q6, Q7 ∴ v′ = v₀
66. 2 × R4    2ρ′
67. 66 − 1    2ρ′ − 1
68. 59 × 67    s′
69. C1′: C1, C2′: 59, C3′: 68, C4′: 65, C5′: 49    C1′: i, C2′: r*, C3′: s′, C4′: v′, C5′: t*

From here on only Q5, Q6, Q7:
70. 2 × R5    2ρ″
71. 70 − 1    2ρ″ − 1
72. 59 × 71    s″
73. C1″: C1, C2″: 59, C3″: 72, C4″: 65, C5″: 49    C1″: i, C2″: r*, C3″: s″, C4″: v′, C5″: t*

From here on only Q6, Q7:
74. 2 × R6    2ρ‴
75. 74 − 1    2ρ‴ − 1
76. 59 × 75    s‴
77. C1‴: C1, C2‴: 59, C3‴: 76, C4‴: 65, C5‴: 49    C1‴: i, C2‴: r*, C3‴: s‴, C4‴: v′, C5‴: t*

From here on only Q7:
78. 2 × R7    2ρ⁗
79. 78 − 1    2ρ⁗ − 1
80. 59 × 79    s⁗
81. C1⁗: C1, C2⁗: 59, C3⁗: 80, C4⁗: 65, C5⁗: 49    C1⁗: i, C2⁗: r*, C3⁗: s⁗, C4⁗: v′, C5⁗: t*

At right, sample page of John von Neumann's handwritten "tentative computing sheet" as it was used in this 1947 report.

April 2, 1947

Professor John von Neumann
The Institute for Advanced Study, School of Mathematics
Princeton, New Jersey

Dear Johnny:

As Stan told you, your letter has aroused a great deal of interest here. We have had a number of discussions of your method and Bengt Carlson has even set to work to test it out by hand calculation in a simple case.

It has occurred to us that there are a number of modifications which one might wish to introduce, at least for calculations of a certain type. This would be true, for example, if one wished to set up the problem for a metal system containing a 49 core in a tuballoy tamper. It seems to us that it might at present be easier to define problems of this sort than, for example, problems for hydride gadgets. It is not so much our intention to suggest that the method you are working on now should be modified as to suggest that perhaps alternative procedures should be worked out also. Perhaps one of us could do this with a little assistance from you; for example, during a visit to Princeton.

The specific points at which it seems to us modifications might be desired are as follows:

1. Of the three components A, T, S that you consider, only one is fissionable, whereas in systems of interest to us, there will be an appreciable number of fissions in the tuballoy of the tamper, as well as in the core material.

2. On the other hand, we are not likely for some time to have data enabling one to distinguish between the velocity dependence of the three functions Σ_f^(2)A(v), Σ_f^(3)A(v), Σ_f^(4)A(v) that you introduce, so that for any particular isotope these might as well be combined into a single function of velocity with a random procedure used merely for determining the number of neutrons emerging. If there is a single such function of velocity for each of the three isotopes 25, 28, 49, the total number of function tables required would be the same as in your letter.

3. It is suggested that in the case of 25 or 28, one might wish to allow also for the possibility of one neutron emerging from fission. The dispersion of the number of neutrons per fission is not too well known but we think we could provide some guesses.

4. Because of the sensitive dependence of tamper fissions on the neutron energy spectrum, it might be advisable to feed in the measured fission spectrum at the appropriate point. This would, of course, require introduction of one or two additional random variables and would raise the nasty question of possible velocity correlation between neutrons emerging from a given fission.

5. Material S could, of course, be omitted for systems of this sort. On the other hand, when moderation really occurs, it seems to us there would have to be a correlation between velocity and direction of the scattered neutron.

6. For metal systems of the type considered, it would probably be adequate to assume just one elastically scattering component and just one inelastically scattering component. These could be mixed with the fissionable components in suitable proportions to mock up most materials of interest.

In addition, we have one general comment as follows: Suppose that the initial deck of cards represents a group of neutrons all having t = 0 as their time of origin. Then after a certain number of cycles of operations, say 100, one will have a deck of cards representing a group of neutrons having times of origin distributed from some earliest t₁ to a latest t₂. Thus all of the multiplicative chains will have been followed until time t₁ and some of them will have been followed to various later times. Then if one wishes, for example, to find the spatial distribution of fissions, it would be natural to examine all fissions occurring in some interval Δt and find their spatial distribution. But unless the interval Δt is chosen within the interval (0, t₁) one cannot be sure that he knows about all the fissions taking place in Δt, and the fissions that are left out of account may well have a systematically different spatial distribution than those that are taken into account. Therefore, if, as seems likely, t₁ ≪ t₂, it would seem to be necessary to discard most of the data obtained by the calculation. The obvious remedy for this difficulty would seem to be to follow the chains for a definite time rather than for a definite number of cycles of operation. After each cycle, all cards having t greater than some preassigned value would be discarded, and the next cycle of calculation performed with those
remaining. This would be repeated until the number of cards in the deck diminishes to zero.
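A toy Python version of this remedy, with a made-up branching rule standing in for the actual cycle of calculation:

    import random

    def cycle(card):
        """Hypothetical stand-in for one cycle of calculation on one card: (t, generation)."""
        t, gen = card
        return [(t + random.uniform(1e-9, 3e-9), gen + 1) for _ in range(random.randint(0, 2))]

    deck, t_cut, cycles = [(0.0, 0)] * 100, 2.0e-8, 0
    while deck:
        deck = [child for card in deck for child in cycle(card)]
        deck = [c for c in deck if c[0] <= t_cut]    # discard cards past the preassigned time
        cycles += 1
    print("deck emptied after", cycles, "cycles")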

These suggestions are all very tentative. Please let us know what you think of them.

Sincerely,
R. D. Richtmyer

cc: S. Ulam, C. Mark, B. Carlson

3—
Multiplicative Systems in Several Variables I, II, III:
With C.J. Everett (LA-683, June 7, 1948) (LA-690, June 11, 1948) (LA-707, October 28, 1948)

These three reports with C.J. Everett form the foundation of a very large amount of work concerning cascades of particles, epidemics, and a great variety of other processes of this kind. As can be seen in the book on branching processes by Harris, quoted in the note for Chapter 1, it is the basis of several theories in probability theory. Much of the material in these reports is still unexploited and will give rise to further applications in problems in astronomy, chemistry, biology, and other fields where chain reactions are studied. (Author's note).

I

Abstract

This report is the first of a series developing the probability theory of systems of particles of many types, differing in nature, position, and velocity, which undergo transformations of type. The principal technique is that of the generating transformation G whose iterates yield the probability distributions for higher generations. The properties of G with respect to fixed points which determine criticality, convergence under iteration, and relation to first moment matrix M are obtained. Necessary and sufficient conditions for supercriticality of the system in terms of M are given. A ratio theorem is proved for supercritical systems stating that, with overwhelming probability, the distribution of progeny among types in high generations will be essentially in the ratios of the unique characteristic vector of the first moment matrix M of G.

Introduction

The following report, first of a series, deals with some of the purely mathematical theory of multiplicative systems in several variables. The scheme admits various physical interpretations. The t variables may be regarded as different kinds of elementary particles, for example the electrons, photons, mesons, etc., occurring in cosmic ray showers, or as particles of the same nature but belonging to different velocity groups.

The methods and results generalize those of a report by D. Hawkins and S. Ulam on multiplicative systems involving one type of particle. The principal results are: (1) probability distributions for higher generations are given by iteration of the generating transformation G, (2) relation of first and second moments of the k-th generation to those of the first via the Jacobian M = J(G) and Hessians of G, (3) existence of a unique fixed point x° = G(x ° ) of the unit cube in the supercritical case, (4) convergence of this cube to x° under iteration of G, (5) necessary and sufficient conditions for supercriticality in terms of the first moment matrix M, (6) existence and uniqueness of a positive characteristic vector v of M, (7) convergence of the positive sector to the v-ray under iteration of M, (8) flow of probability toward the v-ray with succeeding generations.

The next report will deal with problems arising in subcritical systems.

I—
The Generating Transformation

1. Suppose that a system of particles consists of t distinct types, and that a particle of type i upon transformation has probability p_1(i; j_1,...,j_t) of producing j_1 + ... + j_t new particles, j_k of type k, k = 1,...,t. For every set of non-negative integers j_1,...,j_t we have then p_1(i; j_1,...,j_t) ≥ 0, and for each i, Σ_j p_1(i; j_1,...,j_t) = 1.

With each i, we associate a generating function g_i(x_1,...,x_t) = Σ_j p_1(i; j_1,...,j_t) x_1^{j_1} ··· x_t^{j_t}, which defines the probabilities of progeny at the end of one generation from one particle of type i. Hence x′ = G(x), explicitly,
x′_1 = g_1(x_1,...,x_t)
...
x′_t = g_t(x_1,...,x_t)
with 0 ≤ x_i ≤ 1, defines a generating transformation of the unit cube I_t of euclidean t-space into itself.* Moreover, abbreviating (1,...,1) as 1, we see that G(1) = 1, so that the point 1 is a fixed point of the transformation G.

2. If, in a given generation, the generating function is f(x_1,...,x_t) = Σ_k q(k_1,...,k_t) x_1^{k_1} ··· x_t^{k_t}, i.e., the coefficient q(k) is the probability in this generation of the state: k_i particles of type i, i = 1,...,t, then the generating function of the next generation is
f(g_1,...,g_t) = Σ_k q(k) [Σ_j p_1(1;j) x_1^{j_1} ··· x_t^{j_t}]^{k_1} ··· [Σ_j p_1(t;j) x_1^{j_1} ··· x_t^{j_t}]^{k_t}.

3. If we begin with one particle of type i, then the generating functions are: for the 1st generation, g_i(x_1,...,x_t); for the 2nd generation, g_i(g_1,...,g_t); for the 3rd generation, g_i(g_1(g_1,...,g_t),...,g_t(g_1,...,g_t)); and so on. Hence, adopting the notation G^k(x) for the k-th iterate of the transformation G(x):
x′ = G^k(x) : x′_i = g_i^{(k)}(x_1,...,x_t) = Σ_j p_k(i;j) x_1^{j_1} ··· x_t^{j_t}, i = 1,...,t,
we have that g_i^{(k)}(x) is the generating function for the k-th generation of progeny from one particle of type i.
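A two-type toy example in Python (not taken from the report) of the generating transformation G and its iterates; the offspring probabilities are invented so that each g_i(1,1) = 1:

    def G(x):
        """Generating transformation of a hypothetical two-type system."""
        x1, x2 = x
        g1 = 0.4 * x1**2 + 0.4 * x2 + 0.2      # type 1: two 1's (0.4), one 2 (0.4), nothing (0.2)
        g2 = 0.6 * x1 * x2 + 0.4               # type 2: one of each (0.6), nothing (0.4)
        return (g1, g2)

    def G_iter(x, k):
        """k-th iterate G^k(x), whose coefficients generate the k-th generation probabilities."""
        for _ in range(k):
            x = G(x)
        return x

    print(G((0.0, 0.0)))            # (p_1(1;0), p_1(2;0)): probabilities of no progeny in one step
    print(G_iter((0.0, 0.0), 25))   # p_k(i;0) for k = 25, already close to the death fixed point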

II—
First moments. Jacobian

1. Let f(x_1,...,x_t) = Σ_j q(j) x_1^{j_1} ··· x_t^{j_t} be the generating function for a particular generation. Then ∂f/∂x_1 = Σ_j q(j) j_1 x_1^{j_1−1} x_2^{j_2} ··· x_t^{j_t} and
∂f/∂x_1]_{x=1} = Σ_j q(j) j_1 = Σ_{j_1=0}^∞ [ Σ_{j_2=0}^∞ ··· Σ_{j_t=0}^∞ q(j_1,...,j_t) ] j_1 = Σ_{j_1} P(j_1) j_1,
where P(j_1) is the probability of j_1 particles of type 1 in this generation.

* Euclidean t-space is the set of all real t-tuples, with the distance function d(x,x′) = √(Σ_i (x_i − x′_i)²). The unit cube I_t consists of all points x for which 0 ≤ x_i ≤ 1, i = 1,...,t.

Hence we define ∂f/∂x_1]_{x=1} as the first moment for particles of type 1, and similarly ∂f/∂x_j]_{x=1} as the first moment for particles of type j.

2. We adopt the notation ∂g_i^{(k)}/∂x_j]_{x=1} = m_{ij}^{(k)} for the first moment of particles of type j in the k-th generation of progeny from one particle of type i.

3. Recall that, for a transformation G(x) : g_i(x_1,...,x_t), the Jacobian matrix is J(G) = [∂g_i/∂x_j] (i-row, j-column), and if H(x) is a second transformation: h_i(x_1,...,x_t), then the Jacobian of the composite transformation H(G(x)) : h_i(g_1,...,g_t) is J(H(G))]_x = J(H)]_{G(x)} · J(G)]_x, since ∂h_i(g_1,...,g_t)/∂x_k = Σ_j ∂h_i/∂x_j]_{G(x)} · ∂g_j/∂x_k]_x.

4. It follows, since G^k = G(G^{k−1}), that J(G^k)]_x = J(G)]_{G^{k−1}(x)} J(G)]_{G^{k−2}(x)} ··· J(G)]_{G(x)} J(G)]_x, and for a fixed point* x̄ = G(x̄), J(G^k)]_{x̄} = (J(G)]_{x̄})^k, i.e., the Jacobian of the k-th iterate of G at a fixed point is the k-th power of the Jacobian matrix of G. In particular, since 1 = G(1), we have J(G^k)]_1 = (J(G)]_1)^k.

5. Now J(G^k)]_1 = [∂g_i^{(k)}/∂x_j]_1 = [m_{ij}^{(k)}], so that the relation [m_{ij}^{(k)}] = [m_{ij}^{(1)}]^k holds between the first moments of the k-th generation and those of the first.

6. Since (S^{-1}AS)^k = S^{-1}A^kS for matrices, we have [m_{ij}^{(k)}] = M^k = S(S^{-1}MS)^k S^{-1}, where S^{-1}MS is the canonical form of M = [m_{ij}^{(1)}]. This permits more rapid computation of the m_{ij}^{(k)}.
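A short numpy illustration of the relation in 5: the k-th generation first moments are simply the entries of M^k (the matrix M below is hypothetical):

    import numpy as np

    M = np.array([[0.8, 0.4],          # m_ij = expected type-j offspring of a type-i particle
                  [0.6, 0.6]])
    k = 10
    Mk = np.linalg.matrix_power(M, k)  # [m_ij^(k)] = M^k
    print(Mk)
    print(Mk[0])                       # expected census by type in generation k from one type-1 ancestor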

III—
Second moments. Hessians

1. Let f(x_1,...,x_t) again be the generating function of a particular generation. Then ∂²f/∂x_1² = Σ_{j_1≥2} q(j_1,...,j_t) j_1(j_1 − 1) x_1^{j_1−2} x_2^{j_2} ··· x_t^{j_t}, so that

* A fixed point x̄ = (x̄_1,...,x̄_t) of G is one such that x̄_i = g_i(x̄_1,...,x̄_t), i = 1,...,t. Clearly, x̄ = G(x̄) implies x̄ = G(x̄) = G²(x̄) = ..., so that a fixed point of G is also a fixed point of all iterates G^k of G.

∂²f/∂x_1²]_{x=1} = Σ_{j_1≥2} q(j) j_1(j_1 − 1) = Σ_{j_1≥1} q(j) j_1(j_1 − 1) = Σ_{j_1≥1} q(j) j_1² − Σ_{j_1≥1} q(j) j_1,
where, just as in II, the former sum is the second moment of particles of type 1 in this generation.

2. Letting t_{ij}^{(k)} represent the second moment of particles of type j in the k-th generation of progeny from one particle of type i, we have ∂²g_i^{(k)}/∂x_j²]_1 = t_{ij}^{(k)} − m_{ij}^{(k)}.

3. Recall that, for a single function g(x_1,...,x_t), the Hessian is defined as the matrix H(g) = [∂²g/∂x_i∂x_j] (i-row, j-column). If S(x) is a transformation: s_i(x_1,...,x_t), then for the function g(S) = g(s_1,...,s_t) we have ∂g(S)/∂x_i = Σ_j ∂g/∂x_j]_S · ∂s_j/∂x_i, and hence ∂²g(S)/∂x_i∂x_k = Σ_{j,r} (∂²g/∂x_j∂x_r)_S · ∂s_r/∂x_k · ∂s_j/∂x_i + Σ_j (∂g/∂x_j)_S · ∂²s_j/∂x_i∂x_k. In matrix notation, this reads*
H(g(S)) = J^T(S) H(g)_S J(S) + Σ_j (∂g/∂x_j)_S H(s_j).

4. In particular, if we set g = g_n(x_1,...,x_t) and S = G^{k−1}, we have
H(g_n(G^{k−1})) = H(g_n^{(k)}) = [∂²g_n^{(k)}/∂x_i∂x_k] = J^T(G^{k−1}) H(g_n)_{G^{k−1}} J(G^{k−1}) + Σ_{j_1} ∂g_n/∂x_{j_1}]_{G^{k−1}} H(g_{j_1}^{(k−1)}).
If we write H_n^{(k)} = H(g_n^{(k)})]_1 and J = J(G)]_1 = [m_{ij}], we have
H_n^{(k)} = (J^{k−1})^T H_n^{(1)} J^{k−1} + Σ_{j_1} m_{n j_1} (J^{k−2})^T H_{j_1}^{(1)} J^{k−2} + Σ_{j_1,j_2} m_{n j_1} m_{j_1 j_2} (J^{k−3})^T H_{j_2}^{(1)} J^{k−3} + ... + Σ_{j_1,...,j_{k−1}} m_{n j_1} m_{j_1 j_2} ··· m_{j_{k−2} j_{k−1}} H_{j_{k−1}}^{(1)}.

* If A = [a_ij], A^T denotes the transposed matrix [a_ji].

If we define (J^ν)_{nj} as the element in the n, j position of J^ν = M^ν, we obtain
H_n^{(k)} = (J^{k−1})^T H_n J^{k−1} + Σ_j J_{nj} (J^{k−2})^T H_j J^{k−2} + Σ_j (J²)_{nj} (J^{k−3})^T H_j J^{k−3} + ... + Σ_j (J^{k−1})_{nj} H_j.*

5. Since H_n^{(k)} = [∂²g_n^{(k)}/∂x_i∂x_k]_1 has as its diagonal elements t_{n1}^{(k)} − m_{n1}^{(k)},..., t_{nt}^{(k)} − m_{nt}^{(k)}, the preceding result gives the second moments t_{nj}^{(k)} for the k-th generation in terms of the first and second partial derivatives associated with the first generation.

IV—
Fixed Points of the Transformation x′ = G(x)

1. We study properties of the transformation x′ = G(x): x′_i = g_i(x_1,...,x_t) = Σ_j p_1(i; j_1,...,j_t) x_1^{j_1} x_2^{j_2} ··· x_t^{j_t}, on the unit cube I_t. We write 0 = (0,...,0), and as before, 1 = (1,...,1), x = (x_1,...,x_t).

2. We write x = (x_1,...,x_t) ≺ x′ = (x′_1,...,x′_t) in case x_i ≤ x′_i, i = 1,...,t. Then x ≺ x′ implies G(x) ≺ G(x′), since, for each i, g_i(x_1,...,x_t) ≤ g_i(x′_1,x_2,...,x_t) ≤ g_i(x′_1,x′_2,x_3,...,x_t) ≤ ... ≤ g_i(x′_1,...,x′_t). The individual inequalities hold because g_i has all coefficients non-negative, and hence is monotone non-decreasing in each variable separately.

3. We recall that a fixed point x̄ = G(x̄) of G is a fixed point of all iterates G^k(x), and hence that G^k(1) = 1: 1 = g_i^{(k)}(1,1,...,1), all k > 0, i = 1,...,t.

4. Since g_i(0,...,0) = p_1(i; 0,...,0), we have 0 ≺ G(0), and so, from 0 ≺ x ≺ 1 follows 0 ≺ G(0) ≺ G(x) ≺ G(1) = 1. Hence G(I_t) ⊂ I_t, G and all its iterates being transformations of the unit cube into itself.

* Note that in case t = 1, this reads g_k″ = (g′)^{2k−2} g″ + ... + (g′)^{k−1} g″ at x = 1, a relation obtained in Chapter 1.

5. If x^0 = lim G^k(x) then x^0 is a fixed point of G. For x^0 = lim G^k(x) = lim G(G^{k−1}(x)) = G(lim G^{k−1}(x)) = G(x^0). Thus all limit points under iteration of G are fixed points.

6. The set of all fixed points of G(x) is closed, i.e., a limit x^0 of a sequence of fixed points x^ν = G(x^ν) is itself a fixed point. For we have x^0 = lim x^ν = lim G(x^ν) = G(lim x^ν) = G(x^0).

7. Since 0 ≺ 1, we have 0 ≺ G(0) ≺ G²(0) ≺ ... ≺ G(1) = 1, and for each i, g_i^{(k)}(0) = p_k(i;0) is monotone non-decreasing and bounded above by 1. Hence lim g_i^{(k)}(0) = lim p_k(i;0) = x_i^0 ≤ 1 exists. It follows that lim G^k(0) = x^0 = (x_1^0,...,x_t^0) exists and x^0 ≺ 1. Moreover, from 5, x^0 = G(x^0) is a fixed point.

But g_i^{(k)}(0) = p_k(i;0) is the probability of death in the k-th generation of progeny from one i-particle, and x_i^0 is the limit approached from below by this probability. For this reason we speak of x^0 as the death fixed point of G, and say i-progeny form a supercritical system in case x_i^0 < 1, subcritical if x_i^0 = 1.
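For the same two-type toy example used earlier, the death fixed point can be computed directly as lim G^k(0) (a Python sketch, not from the report):

    def G(x):
        x1, x2 = x
        return (0.4 * x1**2 + 0.4 * x2 + 0.2, 0.6 * x1 * x2 + 0.4)

    x = (0.0, 0.0)
    for _ in range(200):
        x = G(x)                 # G^k(0) increases monotonically to the death fixed point x^0
    print(x)                     # both components < 1, so this example is supercritical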

8. From 0 ≺ x ≺ x^0 follows G^k(0) ≺ G^k(x) ≺ G^k(x^0) = x^0, and since lim G^k(0) = x^0, also lim G^k(x) = x^0 for all x in the indicated range.

9. If the death fixed point x^0 is 1, then lim G^k(x) = 1 for all x in I_t. This follows from 8.

10. If 0 ≺ x ≺ 1 and lim G^k(x) exists, then x^0 ≺ lim G^k(x) ≺ 1. For G^k(0) ≺ G^k(x) ≺ G^k(1) = 1 and x^0 = lim G^k(0) ≺ lim G^k(x) ≺ 1.

11. If x is a fixed point of G in I_t, then x^0 ≺ x ≺ 1. For x = G^k(x) for all k, so lim G^k(x) exists, and we use 10.

12. If 0 ≺ x = (x_1,...,x_t) ≺ x′ = (x′_1,...,x′_t) ≺ 1, i.e., x_i ≤ x′_i, i = 1,...,t, and x_j < x′_j for at least one j, then g_i(x) < g_i(x′) for all i = 1,...,t. For, in the inequalities of 2, we have g_i(x′_1,...,x′_{j−1}, x_j, x_{j+1},...,x_t) < g_i(x′_1,...,x′_{j−1}, x′_j, x_{j+1},...,x_t), since by the law of the mean their difference is ∂g_i(x′_1,...,x′_{j−1}, ξ_j, x_{j+1},...,x_t)/∂x_j · (x′_j − x_j), where 0 ≤ x_j < ξ_j < x′_j.*

* In this paragraph and in the remainder of the paper we assume ∂g_i/∂x_j > 0 for all i, j and all x ≠ 0 of I_t.

13. If x = (x_1,...,x_t) ≺ x̄ = (x̄_1,...,x̄_t) = G(x̄), x ≠ x̄, the latter being a fixed point, then g_i(x) < g_i(x̄) = x̄_i, i = 1,...,t, as a consequence of 12. In particular, if the death fixed point x^0 is not 1 = (1,...,1), that is, if at least one x_j^0 < 1, we have x_i^0 = g_i(x^0) < g_i(1) = 1 for all i. This means that x_i^0 = lim p_k(i; 0) is 1 if and only if it is for every i, so that we may speak unambiguously of a system as subcritical (all x_i^0 = 1) or as supercritical (all x_i^0 < 1), regardless of which type of particle is considered as ancestor of the process.

14. We have seen that x^0 ≺ 1 and that any other fixed point x of I_t must satisfy x^0 ≺ x ≺ 1. Hence, for G subcritical, x^0 = 1 and there is only one fixed point. Moreover, if G is supercritical (x^0 ≠ 1) and x̄ is an additional fixed point, x^0 ≠ x̄ ≠ 1, we have x_i^0 < x̄_i < 1 for all i, from 13. We shall see that no such third fixed point can exist.

15. If 0 ≠ x^0, then 0 ≤ g_i(0) < g_i(x^0) = x_i^0. Thus, if the death fixed point is not 0, all of its components are positive.

16. Suppose x, x̄ are chosen in I_t with x_i ≠ x̄_i, i = 1,...,t. Define F_i(t) = g_i(x̄ + (x − x̄)t) on 0 ≤ t ≤ 1. Taylor's expansion gives F_i(1) = F_i(0) + F_i′(0) + ½ F_i″(θ_i), 0 < θ_i < 1. Now F_i(1) = g_i(x), F_i(0) = g_i(x̄),
F_i′(t) = Σ_j (∂g_i/∂x_j)_{x̄+(x−x̄)t} (x_j − x̄_j), F_i′(0) = Σ_j (∂g_i/∂x_j)_{x̄} (x_j − x̄_j),
F_i″(t) = Σ_{j,k} (∂²g_i/∂x_j∂x_k)_{x̄+(x−x̄)t} (x_j − x̄_j)(x_k − x̄_k), and F_i″(θ_i) = Σ_{j,k} (∂²g_i/∂x_j∂x_k)_{x̄+(x−x̄)θ_i} (x_j − x̄_j)(x_k − x̄_k).
Hence g_i(x) = g_i(x̄) + Σ_j (∂g_i/∂x_j)_{x̄} (x_j − x̄_j) + ½ Σ_{j,k} (∂²g_i/∂x_j∂x_k)_{ξ^{(i)}} (x_j − x̄_j)(x_k − x̄_k), where ξ_j^{(i)} = x̄_j + (x_j − x̄_j)θ_i, 0 < θ_i < 1. In case x_j < x̄_j for all j, we have x_j < ξ_j^{(i)} < x̄_j, and if x̄_j < x_j for all j, then x̄_j < ξ_j^{(i)} < x_j.

17. We show now* that, in the supercritical case (x^0 ≠ 1), the transformation G has no fixed point in I_t other than x^0 and 1. We have

* In this section and in the remainder of the paper we assume that for each g_i(x) at least one of the ∂²g_i/∂x_j∂x_k does not vanish on the range x^0 ≺ x ≺ 1. This is the last restriction we place on G.

already seen that if such a point x̄ exists we have for all of its components the relation x_i^0 < x̄_i < 1. In 16, let x̄ be the hypothetical third fixed point, and set x = 1. Then
1 = g_i(1) = x̄_i + Σ_j (∂g_i/∂x_j)_{x̄} (1 − x̄_j) + ½ Σ_{j,k} (∂²g_i/∂x_j∂x_k)_{ξ^{(i)}} (1 − x̄_j)(1 − x̄_k),
where x̄_j < ξ_j^{(i)} < 1. Hence 1 − x̄_i > Σ_j (∂g_i/∂x_j)_{x̄} (1 − x̄_j). ......(A)

Now, in 16, let x̄ again be the third fixed point, but set x = x^0. Then we have
x_i^0 = g_i(x^0) = x̄_i + Σ_j (∂g_i/∂x_j)_{x̄} (x_j^0 − x̄_j) + ½ Σ_{j,k} (∂²g_i/∂x_j∂x_k)_{ξ^{(i)}} (x_j^0 − x̄_j)(x_k^0 − x̄_k),
where x_j^0 < ξ_j^{(i)} < x̄_j. Hence x_i^0 − x̄_i > Σ_j (∂g_i/∂x_j)_{x̄} (x_j^0 − x̄_j), or x̄_i − x_i^0 < Σ_j (∂g_i/∂x_j)_{x̄} (x̄_j − x_j^0). ......(B)

Now set u_i = 1 − x̄_i > 0, v_i = x̄_i − x_i^0 > 0, p_ij = (∂g_i/∂x_j)_{x̄} > 0. We have then, for every i,
(A′) u_i > Σ_j p_ij u_j > 0
(B′) 0 < v_i < Σ_j p_ij v_j
(B″) 1/v_i > 1/Σ_j p_ij v_j > 0
(C) u_i/v_i > (Σ_j p_ij u_j)/(Σ_j p_ij v_j) > 0.

Now define u_m/v_m = min {u_1/v_1,...,u_t/v_t}, so that for every j, u_j/v_j ≥ u_m/v_m and hence u_j ≥ (u_m/v_m) v_j. Hence in (C) with i = m, u_m/v_m > (Σ_j p_mj u_j)/(Σ_j p_mj v_j) ≥ (Σ_j p_mj v_j (u_m/v_m))/(Σ_j p_mj v_j) = u_m/v_m, a contradiction.

V—
On lim_k p_k(i; j)

1. We recall that G^k(x) is denoted by x′_i = g_i^{(k)}(x_1,...,x_t) = p_k(i; 0) + Σ_{j≠0} p_k(i; j_1,...,j_t) x_1^{j_1} ··· x_t^{j_t}, where lim p_k(i; 0) = x_i^0.

2. In the subcritical case, x^0 = 1, we have g_i^{(k)}(1) − p_k(i; 0) = 1 − p_k(i; 0) = Σ_{j≠0} p_k(i; j_1,...,j_t), and hence lim_k Σ_{j≠0} p_k(i; j) = 0.

3. In case 0 ≠ x^0 ≠ 1, we have seen that 0 < x_i^0 < 1 for all i. Hence, let ξ = min {x_1^0,...,x_t^0}, where 0 < ξ < 1. Then g_i^{(k)}(x^0) − p_k(i; 0) = x_i^0 − p_k(i; 0) = Σ_{j≠0} p_k(i; j) (x_1^0)^{j_1} ··· (x_t^0)^{j_t}, and hence lim_k Σ_{j≠0} p_k(i; j) (x_1^0)^{j_1} ··· (x_t^0)^{j_t} = 0. For every integer σ̄ ≥ 1, we have*
Σ_{j≠0} p_k(i; j) (x_1^0)^{j_1} ··· (x_t^0)^{j_t} ≥ Σ_{1≤σ(j)≤σ̄} p_k(i; j) ξ^{σ(j)} ≥ Σ_{1≤σ(j)≤σ̄} p_k(i; j) ξ^{σ̄},
since 0 < ξ < 1 and σ(j) ≤ σ̄ on the range of summation. It follows that lim_k Σ_{1≤σ(j)≤σ̄} p_k(i; j) ξ^{σ̄} = 0 and hence lim_k Σ_{1≤σ(j)≤σ̄} p_k(i; j) = 0, since ξ^{σ̄} ≠ 0.

4. In the only remaining case x°= 0, we have Gk(0) = 0 and so gi()(0) = pk(i; O) = 0 and thus gk)(xi,...,xt) = EjoPk(i;j)xl ..j t. We will show first the existence of a point x such that G(x) -< x 1. Since G(x) has for its component functions the gi indicated above, we have Ogi/dxi = Epi(i;j)jixl-'xlz2 . ..xand so(gi/Oxi)o = pi(i; j0,. ..,0). Thus generally,(Ogi/0xj)o=pi(i; O,. . . , ).

Letting x = 1 in G(x), 1 = Ejo Pi (i;)>Ej=Pi (i; 0, . ., 1 , 0) = (9gi /lxj,)o Moreover, if 1 = (Ogi/Oxj)o, then all other pl(i;j) = 0 whence gi(x) = E t=Pi(i; O,..., ,..., 0)xj. But then (02 gi/&xjaxk) 0 for all x of It and all j, k, thus violating our assumption on second partials. Hence Ej(Qgi/Oxj)o < I for each i.

Define Si(x1 ,...,Xt) = E(9gi/Oxj)x on It. Then Si(O) < 1, hence for each i there exists a 0i, 0 < ~i < 1, such that, for all x satisfying 0 <xj< (i we have Si(x) < 1 (continuity of S). * We use the notation a(j) = jl +... + jt, for j = (jl,...,jt).

Let U = min(f1,...,t}j , 0 < C < 1. Then for all x such that 0 <xj<< 1, we have Si(x) < 1 for all i. Now the Taylor form for gi(x) about x = 0 may be written gi(x) = 9i(O)+j(0gi/axj)txj = 0+E-(0gi/'xj),(i)xj, with 0 < x < x. Let x = (i,...,) where 5 is that above. Then gi(<,...,0) = ( /gi/)xj) (i) < since 0 < xj <~. Hence for x= (C,..., ) we have G(x) -< x ~ 1. It follows that 0 < ... < Gk(x)-Gk-1(x) - ...-G(x)-<x 1 so limGk(x) ~ 1 exists, and is a fixed point of G. Since there is only one fixed point (x°= 0) other than 1, we have lim G(x) = 0.

For x= (,...,) then, lim g(k) () = limjoPk(i; j)(i) = Again fix a as any positive integer, and we have -Epk(i;j)e()>pk(i;ij)>(j)> pkP(i;j)". jO 1l<a(j)<al<o(j)<a So limk Zpk(i; j)l = 0 and thus limppk(i; j) = 0, where 1 <a(j)< a.

5. In summary, we have shown in this section that in all cases, for every σ̄ and every i,
lim_k Σ_{1≤σ(j)≤σ̄} p_k(i; j) = 0,
and hence trivially, lim_k p_k(i; j_1,...,j_t) = 0 for all i and all j ≠ 0.

VI—
On lim G^k(x) = x^0

1. In case x^0 = 1, we have already seen that, for all x in I_t, lim G^k(x) = x^0. We prove in this section that, in the remaining case, x^0 ≠ 1, lim G^k(x) = x^0 for all x ≠ 1 of I_t. Indeed, we already know this when 0 ≺ x ≺ x^0.

2. We shall use the following Lemma. Let 0 ≤ x_i < 1 for all i, and ρ = max {x_1,...,x_t}, 0 ≤ ρ < 1. Then for every ε > 0 there exists a σ̄ > 0 such that Σ_{σ(j)>σ̄} x_1^{j_1} ··· x_t^{j_t} < ε.

Proof. For arbitrary σ̄ > 0 we have (Margenau, Murphy p. 417)
Σ_{σ(j)>σ̄} x_1^{j_1} ··· x_t^{j_t} ≤ Σ_{σ(j)>σ̄} ρ^{σ(j)} = Σ_{m=σ̄+1}^∞ C_{t+m−1,m} ρ^m.
The series Σ_{m=0}^∞ C_{t+m−1,m} ρ^m converges by the ratio test: C_{t+m,m+1} ρ^{m+1} / C_{t+m−1,m} ρ^m = ((t + m)/(m + 1)) ρ → ρ < 1.

3. Let x^0 ≠ 1 and fix x in I_t, with 0 ≤ x_i < 1 for all i. Let ρ = max {x_1,...,x_t} and write g_i^{(k)}(x_1,...,x_t) − p_k(i; 0) = Σ_{j≠0} p_k(i; j) x_1^{j_1} ··· x_t^{j_t}. Fix ε > 0. Then by 2, there exists a σ̄ such that Σ_{σ(j)>σ̄} x_1^{j_1} ··· x_t^{j_t} < ε/2. For this σ̄, there exists a K such that, for all k ≥ K, Σ_{1≤σ(j)≤σ̄} p_k(i; j) < ε/2. Hence, for all k ≥ K,
g_i^{(k)}(x_1,...,x_t) − p_k(i; 0) = Σ_{1≤σ(j)≤σ̄} p_k(i; j) x_1^{j_1} ··· x_t^{j_t} + Σ_{σ(j)>σ̄} p_k(i; j) x_1^{j_1} ··· x_t^{j_t} ≤ Σ_{1≤σ(j)≤σ̄} p_k(i; j) + Σ_{σ(j)>σ̄} x_1^{j_1} ··· x_t^{j_t} < ε/2 + ε/2 = ε.
Thus for every x such that 0 ≤ x_i < 1, all i, we have 0 = lim (g_i^{(k)}(x) − p_k(i; 0)), i.e., lim g_i^{(k)}(x) = x_i^0, since lim p_k(i; 0) = x_i^0.

4. Now suppose 0 ≺ x ≠ 1. Then g_i(x) < g_i(1) = 1 for all i, so G(x) is a point of the type considered in 3. Hence x^0 = lim G^k(G(x)) = lim G^{k+1}(x) = lim G^k(x). This completes the proof that, in all cases, every point x ≠ 1 of I_t has lim G^k(x) = x^0.

VII—
Supercriticality Conditions

We give now necessary and sufficient conditions for supercriticality of the transformation G(x), namely that lim G^k(0) = x^0 ≠ 1. By definition, m_ij = m_ij^{(1)} = (∂g_i/∂x_j)_1. Since we have assumed all ∂g_i/∂x_j non-vanishing for x ≠ 0 on I_t, clearly all m_ij > 0.

Theorem. The following conditions on G(x) are equivalent:
(a) lim G^k(0) = x^0 ≠ 1.
(b) there exists an x̄ such that 0 ≺ G(x̄) ≺ x̄ ≠ 1, where g_i(x̄) < x̄_i, 0 < x̄_i < 1 for all i.
(c) there exists an x̄ such that 0 ≺ G(x̄) ≺ x̄ ≠ 1, where 0 < x̄_i < 1 for all i.
(d) there exists an x̄ such that 0 ≺ G(x̄) ≺ x̄ ≠ 1.
(e) there exists an x̄ such that Σ_j m_ij (1 − x̄_j) > 1 − x̄_i, where 0 < x̄_i < 1, for all i.
(f) there exists a δ = (δ_1,...,δ_t) such that Σ_j m_ij δ_j > δ_i, where δ_i > 0 for all i.
(g) the matrix [m_ij − δ_ij]_1^t = J(G)]_1 − I contains at least one upper principal minor A_ν = [m_ij − δ_ij]_1^ν (1 ≤ ν ≤ t) such that (−1)^{ν+1} |A_ν| ≥ 0, where (−1)^{ν+1} |A_ν| > 0 in case ν = t.*
(h) there exists a v = (v_1,...,v_t) such that Σ_i v_i m_ij > v_j, where v_j > 0 for all j.
(i) there exists a characteristic root r > 1 of M = [m_ij] and a corresponding characteristic vector v = (v_1,...,v_t) with all v_i > 0, that is, Σ_i v_i m_ij = r v_j for such r and v.

In matrix notation,
(v_1,...,v_t) · [m_11 ... m_1t; ... ; m_t1 ... m_tt] = (r v_1,...,r v_t),
or briefly, vM = rv.

* [a_ij]_m^n indicates a matrix whose indices i, j range from m to n. δ_ij is the Kronecker delta, 1 or 0 as i is or is not equal to j. I is the identity matrix [δ_ij]. |A| indicates the determinant of A.
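Condition (i) can be checked numerically by iterating the normalized map T(v) = vM/||vM|| that appears in the proof below; a numpy sketch with a hypothetical moment matrix M:

    import numpy as np

    M = np.array([[0.8, 0.4],                # hypothetical first moment matrix, all m_ij > 0
                  [0.6, 0.6]])

    v = np.ones(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(200):
        w = v @ M                            # left action v M
        v = w / np.linalg.norm(w)            # T(v) = vM / ||vM||
    r = np.linalg.norm(v @ M)                # characteristic root with positive characteristic vector v
    print(r, v)                              # r > 1 here, so this M is supercritical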

1. The proof consists of establishing the following implications: a → c → d → a; c → e → f → b → c; f ↔ g; g ↔ h; h ↔ i. The theorem has been amplified to permit proof in easy stages, but the essential content is the reduction of the question of criticality of the (non-linear) transformation G(x) to matrix theory in J(G)]_1.

2. (a → c) If x^0 ≠ 1, then 0 ≺ G(x^0) = x^0 ≠ 1, 0 < x_i^0 < 1, all i, so that x^0 serves as x̄ in (c).

3. (c → d) is of course trivial.

4. (d → a) If 0 ≺ G(x̄) ≺ x̄ ≠ 1, then G^k(0) ≺ G^k(x̄) ≺ G^{k−1}(x̄) ≺ ... ≺ G(x̄) ≺ x̄ ≠ 1, hence x^0 = lim G^k(0) ≺ x̄ ≠ 1.

5. (c → e) If 0 ≺ G(x̄) ≺ x̄ ≠ 1, 0 < x̄_i < 1, all i, we have
x̄_i > g_i(x̄) = g_i(1) + Σ_j m_ij (x̄_j − 1) + ½ Σ_{j,k} (∂²g_i/∂x_j∂x_k)_{ξ^{(i)}} (x̄_j − 1)(x̄_k − 1),
where x̄_j < ξ_j^{(i)} < 1 for all j. Hence x̄_i > 1 + Σ_j m_ij (x̄_j − 1), and 1 − x̄_i < Σ_j m_ij (1 − x̄_j), with 0 < x̄_j < 1, all j.

6. (e → f) is trivial, since we may define δ_i = 1 − x̄_i > 0 in terms of the x̄ given in (e).

7. (fb) Suppose t=' mij6j > 6i > 0, i = . .., t Since mij = (Ogi/9xj)l and Ogi/Oxj is continuous, there exists a 5, 0 < 1 < 1, such that (O9gi/Oxj)S6j > 6i, all i, whenever <Xj < 1, all j. Define p = min {(I1- )/Si;i j = 1,..., t} > 0. Then _(Ogi/9xj) p6 j > p6i all i, wherever I<Xj< 1, all j. Now define i = 1- Sip; i = 1,...,t. Then E(Ogi/Oxj)=(1 - Xj) > (1 - i) all i, whenever I<j< 1, all j. Since 0 < 1 - xi = 6iP< 1 (because of definition of p), we have 0 < < xi < 1, all i. Now define x = (1t,...,t) where, as we have seen, 0 < xi < 1.Then 1 = gi(1) = gi(x) + E(Ogi/0xj)(i)(1- xj)/x(i) where j < xj < 1, all i,j. x i < all i, j. K Since j< Xy < < , we have (O9gi/Oxj)=(i)(1 -j) > 1 - Zi and 1 > gi(x) + (1 - Xi). Hence gi(x) < xi, all i, and 0 -< G(x) -< x1 for an x of the type required by (b).

8. (b , c) is trivial.

9. For the equivalence of (f) and (g) we need the following:

Lemma. Let A = [aij]l be a matrix in which aij > 0 for all i $ j. Then there exists a point 6 = (6 1,...,6 t), with all 6i > 0, such that Eaij6j > 0, all i, if and only if, for some v, (1 <v < t) the upper principal minor A, = [aij]} satisfies the relation (-1)"+1A,Il> O, with (-1)V+1. IAVI > 0 in case v = t.

Proof. We first prove E aijSj > 0 implies the condition on principal minors. Proof is by induction on the order t. If t = 1, we are given that a116 1 > 0 for 61 > 0 so that all > 0, and (-1)2 1A1 1 > 0 as required. Assume the statement of the theorem for order t - 1, and suppose given j= 2aijSj > 0, i = 1,..., t, t > 2, all 6i > 0, aij > 0 for i 4 j. If all > 0, in the given system, we have at once (-1)2 All > 0 as required (since t > 2). Now suppose all < 0. Then taljSi > -a1161 and aii61 + -t 2aijSj > 0, i = 2,..., t. Since -all > 0 and ail > 0 (i = 2,...,t) we multiply to obtain t ailalj6j > ail(-aii)S1 and (-all)ai6lS + 2 aij (-all)6j > O, i = 2, ,t. .,tHence J (-a11) aij6j+ -ailaib6=Et(-allaij+ ailai)6 > 0; i = 2, ...,t.

Since, for i $ j,aij > 0, and for i, j =2,...,t, ail > 0 and aij > 0 we have also (-allaij+ ailaij) > 0 for i 5 j. By the induction assumption on systems of order t - I there exists a minor of order v: AU = [-allaij + ailalj]2+l such that (-1)"+lA,Il > 0 with (-1)"+1 IAIl > 0 if v = t- 1. But for the original matrix we have lA+l = laijl+l = I...alj... = l/(-all)v ...-alj...a . . . . allai... l/(_ 1)'. i'alja( - alaij + ailaij ...) 1/(-ail) =ai/-a alla12 a) 1.alt

Hence (-l1)+2. IA,+ =(-1)v+2ali/(- all)"IAv = (-1)"+l/( - all)-11A,l> 0 and > 0 in case v + = t (i.e., v = t - 1).

We now prove the converse, namely, the condition on minors implies the existence of a 6 such that E aijSj > 0. Proof is again by induction on order t.

In case t = 1, we are given that (-1)2 1All = all > 0, so for (say) =1 > 0 we have allb6 > 0.

Assume the theorem true for order t - 1, and suppose given A = [aij] with t> 2, having for a given v(1 < v < t) (-1)v+1 IA. > 0 (> 0 if v + t).

If all> 0, let 62 = ...= t = I and choose 61 > 0 so that 61 > max { - (1/al) t2aij; i = 2,..., t}. Then aijj = all6l+ E2 ali> 0 + 2 aij > 0 since alj > 0 for j $ 1. Also, for i = 2,...,t, aij = al61= Et aij> - t aij aij= 0, by definition of 61. Thus the conclusion follows in case all = (_1)2. IA}l > 0.

Suppose then that all < 0. Then the given A, must have 2 < v< t. Now 0<(-1)v+l-A, = (-1)-+1aijv = (-1)V+1 . aij ... .. aij . ... (-l)v+ /(-aii)-. aj ..·.• (-_1)+1/ (-a11)v . (-allaij) ... ... a 13=(-1)v+l/(-al1)V-lallal2   . ait .. (-allaij + ailalj) l0 _l-1 (-1)"/(--all)-2·IAI-. Hence (-1)>"A,_1 > 0 for 1 <v-1<t -1 (> O if v-1=t-1) for the system of order t - 1:

A = [-allaij + ailaij]2 in which -allaij + ailalj > 0 for i 4 j just as before. By the induction hypothesis, 6i > 0, i = 2,... , t exist such that E ( - allaij+ aaiaj)6 j> 0, i 2,...,t. Hence ail ( aj6j/ - all) + t aij6j > 0, i = 2,..., t. Let 61 represent the parenthesis of the last inequality. Then el > 0, since aij > 0 for j $ 1, and substituting, ailel + aij6j> 0,i = 2,..., t.

Choose T so that 0 < T < el, and T < (1/ail)(ailel + 12 aijsj) for all i = 2,..., t. Finally, define 61 = e1 - T > 0. Then Etaij 6 j a11 61+ '2 alj6j = allel + 2- alj6+ (-all)T =(-ail)T> 0, and for i = 2,..., t,t aij0j = ai161+ aij6j = ailel + aijj - ailT > 0 because of the second inequality defining T. Hence the lemma is proved.

10.(f -9 g) If JEmijj > 5j > 0, all i, then Z(mij - 6ij)6j > 0, all i, and the matrix [mij- 6ij]t satisfied the condition of (g) because of the lemma.

11.(g -9 f) If the matrix [mij - bij]t satisfies the condition of (g), there exists by the lemma numbers 6i > 0 such that (mij - bij)6j > 0 and hence E nmijSj > 6i, all i.

12. (g --h) If [mij - 6j] has the property (-1)'+1 \IAI > 0 (> 0 if v = t) for some v, then the matrix [mij - 6ij]T = [mij - 6ij] has* (-1)+l' . IAI = (-1)>+ 1 . IA i > 0 (> 0 if v = t). Write mij = mji; then [mij - 6 j] has (-1)^+ · IlAj > 0 (> 0 if v = t). But we have seen that (g -+f) so that there exist vi > 0 such that Emijvj > vi. Hence _mjivj > vi, or changing notation, Zvimij > vj, all j.

13.(h -* g) If E vimij > vj > 0, all j, then E mjivj > v > 0, and, defining mij = mji, Emijvj> vi > 0. But we know that (f -+ g) so the matrix [imj - 6ij] has the property (-1)^+l · nij - —ijl _> 0 (> 0 if = t). But then (-1)v+l(mij - iji = (-1)V+l [mi - 6jil = (-1)V+lmij- 6ij\l> 0 (> 0 if v=t).

14. (i - h) If vimij = rvj for vj > 0, r > 1, we have vimi > v3, hence (h).

15.(ih i) Assume Ei=l vimij > vj > O. Since (Evimij)/lv > 1,

j = 1,...,t, we define ro so that ( vimij)/vj > ro> 1, j = 1,...,t. Then vimij> rovj, all j. Write _ Vimij = rovj + Ej.

* Recall that IAl = IAI .

Define llvll = v2 > 0. Then E(vi/2 llvll)mi j = ro(vj/2. lvIl) + (ej/211vll ). Define vO= (vj/2 11vll) and ej = (e/211vll). We have now v?°mi = rov + ej, where ej > 0, vO > 0, all j, and Ilv°ll = 1/2. Define 1 = min l/2v°;mij/ /mij; all i,j= 1,...,t >0. Let A = {v; vj>t, llvll < 1, E vimij>rovj, all j}.

(A) The set A is non-void for v° E A and is a subset of euclidean t-space Et. For v.O> 2i > p by definition of u,Iv°0 l = 1/2 < 1, and v°mij > rov.

(B) The set A is bounded, since \lvll< I for all v e A.

(C) The set A is closed. For, let v = limv" where vaeA. We have to show v e A. Since vja > , all a, j, we have vj = lima vj > /p. Also, Ilvi = IIlimaaill = ([limvall < 1, since 11 I is continuous and all iva 11 < 1. Finally, vimij = E (limava) ij = lima Vmij> limara rv= ro lim a v=rovj Hence v E A.

(D) The set A is convex. That is, for every v,v' C A, and every a, a'> 0 with a+a' = 1, we have av+a'v' E A. For avj + a'v' > au+a' = (a + a')p = . Also llav + a'v'll<a lvl +'- lIv'il <a + a' = 1. And (avi + a'vD)mij = a Zvimij +a'_vimij>arovj + a'rovj = ro(avj + a'v).

(E) The set A has an inner point, e.g., v° . For v° E A there exists a neighborhood of v° contained in A. Let e = min vj- ; ej (ro + mij); j= l,...,t. .

Since vj > 2, > -, vO - > 0 and e > 0. Now let v be any point of the e- neighborhood of v° , that is, lIv - v°ll < e. We show that v e A. For Ivj -v°\< Z/ (vj - = v) 11 - lv- < e, so vj > vj, -e > /, by choice of e, and thus v satisfies the first requirement for membership in A.

Next, Illvll-IIvOlll<liv-v°0 1 < e, so llvll < e+llvll- = 1/2+e < 1. The latter inequality obtains since e < v -/ < vj< Z(v ) < v = ilv°ll = 1/2. Hence v satisfies the second requirement.

Finally, E vimj>E (v - e) mij = rov + ej- e Ei mij> rov. The latter inequality is true since it requires ro (v - vj)>e mij - ej, i.e.,
o- j>(e Ei mij - j)/r . But we know v-j > -e and it suffices to prove -e>(e E mnij-ej) /ro or ej >e( i mij + ro). But e satisfies this requirement by definition.

The properties A-E show that A is a convex body in Et. It is well known, therefore, that any continuous transformation T of A into A has a fixed point T(v) = v, v E A. (Alexandroff-Hopf).

For every v E A, define T(v)= vM/llvMll, that is, T(v) = v' where V = Eivi, mij/ Ej(vimij)2 . Since for v c A, vi > > 0, and mij > 0, clearly IIvMll > 0 on A.

First we show that if v e A, then T(v) E A. For (vj)2 = (Evimij)2 /Ej(Zivimij)2 EviZ2mj/Ej(Ev2) (im) > 1/VII2 . Ei {mTjvi /Eimij }> (1/IIVI12 )2'EiVi2 = SO Vj > / as required for T(v) = v' E A.

Moreover, IIT(v)II = (IvM/IIvMlIII) = 1. Finally, EiVimjk = (1/I1vMll) Ej mjk Eivimij>(/ll|vMll) E mjkrovj = rov;, so T(v) e A, and T(A) c A. Clearly, T(v) is continuous on A. Hence, for some v e A, T(v) = v = vM/lIvM\l, and thus Vj · IIvMI = vimij>rovj, so I\vMII>ro > 1. Set r = llvMll and we have Evimij = rvj with vj > 0 and r > 1. Thus the theorem is complete.

16. Some general remarks on positive matrices are now appropriate. If [mij] is a matrix with all mij > 0, it can have at most one characteristic root r > 0 with a corresponding positive characteristic vector v, all vi > 0, that is, the equations vimij = rvj and Evjmij = r'vj; v j, vj, r, r' > 0 imply r = r'.

For, choose A, > > 0 so that A <(v'/vi)<j,i = 1,...,t. Then Avi<v i< p/i, and since vM = rv implies vMk = rkv and v'M = r'v' implies v'M = r'v', we have EAviMik<EV/m(k) < ,Virnm,k) and hence Arkvj<r'kvj < prkvj, where Mk = [m()] and clearly m( > 0. Then Avj <(r'/r)kv' < LiVj, an impossibility for high k, if the ratio r'/r is either less than or greater than one.

17. We shall make use of the following

Lemma. If B = [bij] is a matrix with bij> 0 for all i $j, and if Evibij = 0 v= Evibij for vi, v' > 0, then there exists an a > 0 such that v = avi, i = 1,...,t.

Proof. Induct on order t. Suppose t = 1. Then we have vlbll = 0 = vlbll, vl, v1> 0, so v' = avl where a = vl/vl. Assume the lemma for systems of order t - 1, and suppose given t vbij = 0 = tv'j with t > 2. Since vibil= -bll vi and bil > 0 (for i 4 1) and vi > 0, all i, we must have -b1l > 0. Note also vibij +2vib = 0, i = 2,...,t.

Hence T2vibibij = -blbiivl and vlblj(-bl) + Evi(-bli)bj = 0, j =2,...,t. Substituting, ' vi - bbj + bill) = 0, j = 2,...,t. For i 7 j, bij > 0 and - bl > 0, bil > 0, bi > 0 for i, j > 2. Thus the matrix [- bIbij+bilblj]t has the property of the lemma. In identical fashion we have also 2 v i( -b 11bij+ bilblj) = 0. Thus, by the induction hypothesis, I i v' = avj, j = 2,...,t, for some a > 0. But -bllv' i = vb il = a vibil = a(-bllvl) and thence also vi = avl.

18. It follows that, if mij > 0, all i, j, and E vimij = rvj, E v'mij= rv' for vj,v3 > 0, then v' = av. That is, a positive matrix can have at most one positive characteristic vector (up to the scalar multiples) corresponding to any one of its characteristic roots.

For, the above equations may be written vi (mij - ijr) = 0 = i v'(mij - bijr), where the non-diagonal elements of [mij - bijr] are the mij > 0.

19. If M is the moment matrix of any G, there is one and only one solution for r > 0, vi > 0 of the relation vM = rv. G is supercritical if and only if the solution r is greater than one. (See 20 for existence.)

20. The proof of property (i) for supercritical matrices was complicated by the fact that we wanted to prove r > 1. It is possible to show more easily the weaker result that a solution of the relation in 19 exists for arbitrary matrix M = [mij] with all mij > 0.

Define = min{l/(4vt); mij/ ijm} > 0, A = {v; Vi ,>L, lvll< 1} and T(v) = vM/\\vMll for v in A. Then just as in the proof of (i) one verifies A is a convex body in Et, and T(v) is a continuous transformation of A into itself. As such, there exists a v in A such that T(v) = vM/IIvMIl = v, and for r = IlvMll > 0, vM = rv. Since v E A, vi> t > 0.

21. Note that, if a(v) = vi, o is a linear functional: ou(av + a'v') =Z(avi+ a'v') = a,vi + a' E via(Vv) + a'(v') . If we define T(v) = vM/a(vM) on the sector vi > 0 (v $ 0), where M has all nij > 0, we see that a(T(v))u(vM)/a(vM) = 1 and T(v) may be regarded as the central projection of vM onto the hyperplane v 1+. . .+t = 1. Our operator T has some interest beyond that of the classical Markoff operators (see G. Birkhoff) since it need not be a contraction in the positive sector. For example, if M =[1 1 one finds that it is supercritical, has positive characteristic root r = 5, corresponding vector = (4/5,1/5) = v where o(v) = 1, and vM = rv. Hence T(t) = v. Let 6 = (1,0). Then IT(b) - T(v) I = /377/20 > 116 - vIl = /2/5. Hence T(6) is not closer to v than 6 was. Nevertheless we prove limTk(v) =v for all vectors v in the sector vi> 0 (v# 0). Our proof of this fact is lengthy but of a general topological character.

22. Let {yi} denote the simplex with vertices yl,...,y" in E, that is, the set of all points y = E aiyi , ai > 0, E ai = 1. If A is an arbitrary matrix, we have (Caiyi)A ai(yiA) so that the map under A of the simplex {yi} is the simplex with vertices yiA: {y'A= {yA)}

23. For arbitrary x with a(x) 0 define the transformation xS = x/ao(x). If the yi are all $ 0 and have all components non-negative, clearly the same properties are possessed by all points of {yi}. Hence we may operate on {yi} with S and {yi}S = {yiS}. This says that the projection onto the hyperplane vi = 1 of the simplex {yi} is the simplex with vertices yi/ai where ci - ao(yi).

For, {yi}S is the set of all aiyi/ aioi and {yiS} is that of all ,bi(yi/ai) where ai,bi> 0, and Eai = 1 = bi. We have to show that, given either the ai or the bi, the other set may be determined so that i aiyi/ E ai ri(bi/i)yi (1) Given the ai, definition of bi = aiai/ r aii implies bi> 0, E bi = 1 and validity of (1) so that {yi}S C {yiS}.

Given the bi, it suffices to find ai so that aiai = biajcj or E (biaj- bijoj)aj = O,i = 1,..., n. (2) J J

The determinant of the latter system is zero since addition of each row but the first successively to the first yields in the (1,j) position aj b bi - j =rj- aj-j = 0.

Hence a solution (al,...,an) $ (0,...,0) exists, and we see from ai/i aii =bi/ai > 0 that all ai have the same sign. Hence we can choose a solution ai> 0 for all i, with Eai > 0, and then ai/Zai fulfills all requirements.

24. If T(v) = vM/oa(vM) then Tk(v), the k-th iterate of T, is given by Tk(v) = vMkl/a(vMk). Proof is trivial by induction on k. Hence Tk(v) = (vMk)S and Tk{yi} = ({yi}Mk)S = {yiMk}S = {yiMk/a(yi'M)} ={T (yi)}

25. Define j6 = (0,..., 1,. ., 0), the j-th unit vector of t-space, and A = {6i}, Ak = Tk (A) = {Tk(5i)}. Since mij > 0, clearly T(A) c A and hence A D T(A) D T2 (A) D ... is a "nest" of simplices. Define D = QAk, the intersection of all Ak. We prove that T(D) = D. Let de e D, i.e., de = T(dl) = T2 (d2) .... Then T(do) = T2 (dl) = T3 (d2) = ... is in D and hence T(D) c D.

To obtain the other inclusion, we make use of the lemma. If {yi}1 is a simplex contained in A such that T(y) = T(y') for y 4y' in {yi}, then the linear dimension of the subspace spanned by the T(yi), i = 1,...,t is less than that of the space spanned by the yi, i = 1,..., t.

Proof. Let (say) yl,...,yd be linearly independent vertices of the simplex {yi} and the remaining yd+i dependent upon them.

Write y = Edciyi+ y'= d c'yi in {yi} C A, hence a(y) = a(y') = 1. Suppose that T(y) = ciyiM/l E ci&i = T(y') = E cyi M/ E cii where i = a(yiM). If we let C = ciai, C' = E ci then (ci/C c'/C')yiM = 0. Now not all coefficients here are zero, for if so, we should have c' = (C'/C)ci =Rci and hence y' = c'yi = R cyi = Ry. But then cr(y') = Ra(y), R = 1, and thus c i= ci, = = y', a contradiction. Hence the yiM are dependent (i = 1,...,d) and since the yd+i are dependent on the yi, i = 1,..., d so are the yd+iM dependent upon the yiM, i = 1,...,d. It follows that the space spanned by the yiM, i = 1,...,t is of lower linear dimension than d. But this space is identical with that spanned by the T(yi) = yiM/&i.

Since the dimension t of the space spanned by the 6i can suffer only a finite number of reductions, it follows that, for all k>K, for some K, T is one to one on Ak to Ak+l.

Now we prove D c T(D). Let d e D so that we have for all k, d =Tk(dk) for some dk E A. Now for all i > 0, T++l(dk+i+l) = Tk+i+2 (dk+i+2) or T(Tk+i(dk+i+l)) - T(Tk+i+(dki+2))

where the operands of T are in Ak+l C Ak. Since T is one to one on Ak, we have ek= Tk+i(dk+i+l) = Tk+i+l(dk+i+ 2) for all i. Hence d = T(ek) where ek e K +1 Akc C Q Ak = D, and thus D c T(D).

26. We now show D is itself a simplex. To this end, consider the sequences Tk(Si),i + 1,... ,t, of vertices of the Ak. There exists a common subsequence k, such that limTk(bi) = Ai all exist i = 1,...,t. We claim that the simplex {iV}t is indeed the set D.


60

First, it is clear that D = I Tk(A) = T Tk"(A) since the Ak are nested. Hence it suffices to prove that {Ai } = TTkv(A).

Since the Ai = limTk"(6i), Ai is a limit of a sequence of points which are in Tk"(A) = Ak, for all v > N. Since Tk(A) is closed, Ai E TkN(A). Hence Ai c ; Tkv(A). Now all Tk"(A) are convex, hence so is their intersection D. Since all Ai are in the convex set n Tk"(A) = D, and {Ai} is the convex hull of the Ai , we have {Ai } c 0 Tkv(A) = D (Alexandroff-Hopf).

We prove now nTku(A) c {Ai}. Suppose d in the intersection but not in {Ai}. Then, since {Ai} is closed, there exists an e such that the e-neighborhood of d excludes {Ai}. However, since limTkv(6i) = Ai , we can find N so that, for i-+1,..., t, I TkN(6i) - Ai < e/2. Now d C TkN(A) hence write d = EaiTkN(Si). Then |ld - Eaiill = E ai(TkN(i) Ai)ll < Eaie/2 = e/2 and thus EaiAi of {Ai} is in the e-neighborhood of d, a contradiction.

27. Since {Ai} = n Tkv(A) = n Tk(A) =D, we have that T is one to one on the simplex {Ai} to all of itself. Now {Ai} is the convex hull of the finite set of points i,. . ., At. As such it is a convex polyhedron, and the set of all its geometric vertices* is a subset of the original, say Al,...,A, and {Ail = {Ai}. We know that {Ai}l T = {Ai} {T(Ai)}I. It follows that T(A1 ), T(A2 ),..., T(An ) is a permutation of the geometric vertices A1 ,..., A, and hence for some N, TN(Ai) = Ai = AMN /u(AiMN), i = l,...,n or AiMN = T(AiMN) . A. But MN is a matrix with all elements positive. By the uniqueness results obtained previously, it follows that n = 1, and D = n Tk(A) = {A'}, where AlMN =((AMN) ·-1 . But for M itself we have vM = rv so VMN = rNv and again by uniqueness, Al = v.

Hence limTk(v) = limvMk/a(vMk) = v for all v with vi> 0, (v y 0).

* A geometric vertex is the point of the polyhedron not an inner point of any segment {x, y} of the polyhedron, P, y, x in P.


61

VIII—
A Theorem On Ratios

1. Let Gl,...,Gt be arbitrary generating functions, each in the variables xl,..., xt, and define G = G... Gt. Define Ma = -(aG/aXa) 1, Ta = (a2G/O 2 XC)l + Ma Mia = (&Gi/xa)l, Tia = (2 Gi/Oax)l+ Mia Then aG/Oxa = (G1/Oxa)G2...Gt+G 1(aG 2/xa)G 3 ... Gt + GiG 2. . Gt-l (aGt/OXa) and 2 G/xa =(d2 Gi/x2)G 2 .. .Gt + (aCGl/xa)(QG2/Xa)...Gt +... +(aGi/Xa) ...(aGt/, a) +... + G1(aG2/xa) ... (oGta/9xa) +. + G... (oGt/xa).

Hence Ma = Ma + M2a + ... + Mta and Ta- Ma = (Tla - Ml) + MlaM2 a +... + MiaMta +MlaM 2a +(T 2a-M 2a) + + M 2aMta + . + MlaMta +... M2aMta + . + (Tta - Mta).

Define Da = Ta - M2 and Dia = Tia - M2 as the various dispersions for particles of type a. Then Da = EZiMia + Z= 1(Tia Mia) + Ei=j MiaMia = (Z i Mia)2= Ei Tia - i Ma = Zi Dia.

2. Write G(x)=P(ji,. ..,jt)x ...x . Then (T-M M2) +.+(Tt-Mt2) = EDa=Dia > P(j E S)K2 *i,a * P(j e S) means the probability that j e S, i.e., , P(jl,...,jt) over all j = (j, ...,jt)S.


62

where S denotes the set of all j = (jl, . , Jt) such that /(jil -Mi1)2+... + (jt -Mt)2>K . Hence P(j c S)<Eia Dia/K2 .

3. We remark that, if H(x) = hn(x), where h(x) and hence H(x) is a generating function of xl,...,xt, and as usual Ma = (OH/OXa)1, Ta = (O 2 H/dxa) + Ma, ma = (ah/dxa)1, ta = (02 h/9xa)1 + ma, then OH/9 x2 = nh'-l (h/Oxa) and a2H/ = n [(n-l)hn-2(h/9xa)2 + hn-l(a2h/IX2)] Thus Ma = nma and Ta = Ma + n(m - 1)m2 + n(ta - ma). Hence Da = Ta- M2 = n(ta - m2 ) = nda.

4. Now let G = (gk), i= , . ..,tfor a fixed k, and G = (gk)) .. (gtk))t for a fixed n =(nl,...,nt) 4 O. Then with reference to this G, P(j C S)< E Dia/K2 where S = {j; v/(i-M 1)'+c ...+- (-Mj) >K} . But now Ma =- nlm(k) + n2( + . .. +nt(k) and Dia = nldk) where nltla + '2n2a - 'Jtttta- --telUiad(k)=t(k)-([mk)) . Thus P (j G S)<nid )/K 2 , where iaiaiai'daia S {j; j-nMkll>k}

We summarize: for every n =(nl,...,nt), k, and K > 0, in the probability distribution having generating function (k)ni (nk)n- ( i zr 9g1g9t= Pnlk (ji.· ,jt)xI...Xt it is true that Pn,k(Ilj - nMk \>K) <d(k) a(n)/K2 where d(k) =max { ad(k). -diaia But then Pn,k(ij/a(nMk) - Tk(n)fl>e)< d(k)a(n)/e2(a(nMk))2


63

Now o(nMk) = Enim(k)> E nim(k) = m(k)a(n) where iJ m(k) minim { mij}. Hence Pm,k( jll/((nMk) - Tk(n)l> e) < d(k)/e2(m(k))2 a(n) = e(k) /e2 U(n), where ek) = d(k)/(m())2 .

Then for every n / 0, k, e > 0, we have for 9.k)nl .9) = Pn,k (jl ., jt)Xj' ...X, that Pn,k(lIj/T(nMk) - Tk(n)l> 1/2c) <ek/1/4e26(n).

Now for arbitrary e > 0, rn > 0, there exists a k (hereafter fixed) such that lITk(n) - v\\ < 1/2e for all n 4 0. For this k, and the original e > 0, r1 > 0, we can now fix a so that for all n with a(n) > a, we have e(k)/1/4C2 r a(n) < 1/277, and hence Pn,,k(lIJ/o(nMk) - Tk(n)| > 1/2e) < 1/21. But we have the set inclusion, for each n with a(n) > a: {j; Ilj/a (nMk) - Tk(n)> 1/2e} D {i; ll/ (nMk) - v>}.

Hence u(n) > n implies Pn,k (II ji/ (nMk) - vll > e)< 1/271. For the fixed a, and the original rV, there exists an R so that for all r > R, i= 1,...,t, EO<(n)<a Pr (i;nl,... ,nt) < 1/2. Now for r > R, we have G(k+r) = Gr (Gk) so that g+r) =(r)(gk)) / >.•\g (k)l(k)t(k)n (tpr(i; O)+l<,(r P<pr,(i;n)g )n' g)n +a(n)>aPr(i; n).(k)'•(k)n, and hence 1 < pr(i; 0) + 1/271 + 1/271 + > p,(i;n)Pn,k (Iljl/ (nMki) - vl < e) a(n))>a

It only remains to relate the last term to the probabilities pk+r(i; j) of the generation k + r. Consider the set J(v,, ) of all j= (jl,...,jt) such that llj/r-vil < e for some r > 0. Since llj/r - vil < e if and only if lj -rvll < re, this means the set of all j within an angle about v of opening arc-cos 1 - e2/11ll l2 . Since every term Pr (i;nl,   , nt) Pn,k (jl, . ,jt) involved in the above sum represents part of the total probability Pk+r (i; jl,..,jt) for a j = (i,...,jt) c J(v, e), we have


64

pk+r (i; jl...,j t) > 1 - . Hence we have the jEJ(,e )Theorem. For every e > 0, r > 0, and i = 1,..., t, there exists a K such that for all k>K, :pk (i; jl -, jt) > 1 -, where the summation is over all vectors j such that \IJ - rvll < re for some r > 0.

This means that, with overwhelming probability the distribution of progeny in high generations among the t types will be essentially in the ratios of the unique characteristic vector of the first moment matrix M of the original generating function G.


65

II

Abstract

We continue in this report the generalization of the methods and results of Hawkins and Ulam2 which we began in I, being concerned principally with systems which are below critical. After deriving necessary and sufficient conditions for this state in terms of first moments, we study the direction of flow induced in the "unit cube" by the corresponding generating transformation. The latter results are used to show that the distribution of the generation in which death first occurs possesses moments of all orders.

Limits of expectations are obtained for the problem of subcritical system with source, and for that of total progeny (corpses) in subcritical systems.

Finally, it is shown how fictitious particles of new types may be introduced in such a way that certain more complicated problems may be reduced to the case of simple iteration of generating transformations. In particular we have shown how this may be accomplished for the system with source, and for the problem of total progeny.

The next section, III of this series, will deal with measure theorems on the space of all genealogies possible in multiplicative systems.

I—
Some Properties of the Jacobian

1. We consider a multiplicative system involving t types of particles, in which a particle of type i has a fixed probability pi(i;jl, .. jt) of producing a total of jl +... +jt particles, j, of type v, upon transformation. The corresponding generating transformation G(x) of the unit cube It:gi(x)- =-Pi (i; il,..*,jt) xi '*6 xtit5


66

has the property that, upon iteration k times, the resulting transformation Gk(x): gi)() = Pk (i;jl,.. .,jt)x...xt has as its coefficient pk(i;j) the probability that the state (jl,...,jt) shall exist in the k-th generation of progeny from one particle of type i.

2. The transformation G(x) has for its Jacobian at x = 1 the first moment matrix J(G(1)) = [Ogi/axj = [mij][m,i j and, more generally, J(Gk(l)) = [gk)/axj] =[mij ] where mk) is the expectation of particles of type j in the k-th generation of progeny from one particle of type i. Under our assumptions on G (see I), all these moments are positive. Moreover, we have seen that [mij] = [mij]

3. The importance of matrices with positive elements required study of their properties. We found that for such a matrix M, there is one and only one solution, r, v of the relations vM = rv, r > O, v > 0 (i.e., all vi > 0) In similar fashion one shows the existence and uniqueness of r', v' such that Mv' =r'v', r' > 0, v' > . That r = r' in these equations is evident from the following

Lemma. If vM = rv, r > 0, v > 0, and R is an arbitrary positive characteristic root of M, then r > R. Similarly, Mv' = r'v', r' > 0, v' >0, IM - RII=, R > 0 implies r'>R.


67

Proof. First statement: Let wM = Rw, w 5 0, and define b= min{wi/vi}, B = max{wi/vi}. Since all vi > O, bvi<wi <Bvi and hence, bVm(jk) <Em < BVim(k) and brkvj <RkW<BrkVj, all j. Since r > O, also bvj<(R/r)kwj<Bvj. Suppose R/r > 1. If at least one wj > 0 the right member yields a contradiction for large k. If at least one wj < 0 then the left member does. Hence all wj = 0, contradicting choice of w. Thus R/r< 1.

The second statement of the lemma is proved similarly.

We include for later use the trivial remark: If M is a matrix with positive elements, and Mv' = rv', r > O, v' > O, we have Mnv' = r"v' for all positive integers n, and thus rnvj = m > min(vl ). Hence . m7(n) < r~v'/ min (v) -= rVi < rr max(vi) Vr", where V is a positive constant.

Similarly, if vM = ri, r > 0, v > 0, we have Ei m () < Wrn, where W is a positive constant.

4. We shall also need the following

Theorem. If M is a matrix of positive elements with Mv' = rv', r > 0, v' > 0, and Tk(v) is the transformation Mkv/s (Mkv), then limTk(v) = v' uniformly for all v $ 0, vi> 0.

Here s(w) indicates the sum E wi of the components of the vector w. The proof is entirely analogous to that used in I to prove the same result for row vectors.

5. We proved in I that lim Gk(O) = x° exists: lin gk)(0) = limpk(i; 0)=Xi, and defined G as supercritical in case all < 1. Under our conditions on G, the alternative case is that all x = 1, hence = 1, and this case we called subcritical. Most of the present report will be devoted to systems of the latter type, in which the probability of death in generation k rises to limit one.


68

6. We have seen in I that a system is subcritical if and only if the maximal root r of the first moment matrix M = J(G(1)) is less than or equal to one. Equivalently G is subcritical if and only if the determinants IAnl of the upper principal minors An = [mij - Oij]1 of the matrix A = [mij - ijl = M - I satisfy the relations: (-1)n+1lA1l < 0, n = 1,... ,t-1, (-1)t+lIM - I < .

7. We say a system is just-critical in case it is subcritical and the maximum positive characteristic root r of M is equal to one. A subcritical system with r < 1 is said to be below-critical. The justcritical case, while of theoretical interest is refractory, and we have limited ourselves for the most part to systems which are below-critical.

Theorem. A subcritical system is just-critical if and only if IM-II = 0. If r = 1, then of course 0 = \M-rI = IM-II, r being a characteristic root of M. Conversely, if IM - II = O, r > 1 by maximality of r, and r< 1 by assumption of subcriticality; hence r = 1.

8. We include for future reference the trivial remark: If M is a matrix of positive elements with Mv' = rv', 1 > r > O, v' > O, then there exists an e > 0 such that all vectors w in the e-neighborhood of v' are positive and satisfy the inequalities mi Wj < -(1 + r)Wi < Wi .

It suffices to note that the functions Ri(w) =_ mijwj/wi are continuous at w = v' and there have value Ri(v') = r < 1. Hence there is an e > 0 so that whenever llw - v'll< e, w will have positive components and Ri(w)< 1/2(1+ r).


69

II—
Direction of Flow of G[superscript(k)](x) in Subcritical Systems

1. In this section we study properties of the vector 1 - GK(x) with i-th component I-g?(k) (x), for x 1 in It, for a subcritical system, and of the vector Gk+l(x) = Gk(x) in a system below-critical. These results are of preliminary character, and are exploited in III.

Note that, for x $ 1 in It, we have x; I and thence gk)(x) < (k) (1) = 1 for all k so that the vector 1 - Gk(x) is not zero; in fact all its components are positive.

2. Recall that the e-cone of the vector v' consists of all vectors w such that (lw/a - v'll < e for some real positive a. We prove the

Theorem. If G is subcritical and x $ 1 is in It, then for every e > 0 there is a K so that for all k>K the vector I - Gk(x) is in the e-cone of v'.

Here v' denotes the characteristic vector of the relation Mv' rv', r > 0, v' > 0, where M is the first moment matrix of G. The theorem asserts, in geometric terms, that the direction from Gk(x) to 1 approaches the direction of v' with increasing k.

Proof. Fix x $ 1 in It, and e > 0. There exists a k (hereafter fixed) such that ITk(v) - v'\ < e/2Vt for all v $ 0, vi> O. (The transformation Tk is that defined in I 4.) Since limGn"() = I and the first partials of Gk are continuous at 1, there exists an N such that for, all n> N, mre) = (rga/nXj) < eMk/2 (() where Mk min { E mk;ij j}, and the partial is evaluated at the point Gn (x)e. Fix n >N and define an = E (1-g)(i)) =s(Mk (1-G(x))). Then vi-( - (1-g x)/a[_va-y'(O-gk)/9xj)p (l-g)( n))/aI from Taylor's form of gk)(x) expanded about 1, and evaluated at x = Gx(z). Thus G(x)-<P-< 1.


70

The latter absolute value does not exceed v -Em (k)-g(n(- ))/a + ' -T(k) (1 - G(n)()) I +E(m()-_ (ag(k)/D X)) (1-gn)(i))/a~ < e/2vt + eMk/2Vt s (1 - C(n)()) /a. But ak> Mks (1 - C(n)(x)) so finally the original absolute value is seen to be less than e/vt. But then Ilv' - (1 - Gk+n(X)) /akll< e.

3. In the below-critical case in one variable (t = 1) the graph of the generating function g(x) is monotone increasing and concave up on the interval (0,1) with 1 = g(1). Moreover, the Jacobian J (g(1)) = g'(1) and hence the characteristic root r is g'(1), which is therefore less than one when g is below-critical. It is obvious geometrically therefore that for every x satisfying 0 <x < 1, the sequence of iterates gk(x) is monotone increasing: x< g(x) < g2 (x) < .... This simple situation need not obtain in case t > 2. For purposes of illustration we regard the following example.

Consider the transformation G(x) of the unit square I2 defined by I I 1 g1(x) = - + - 1X+ x1x21 1192(x)= - + -X2 + xx2 Computation of the four first partials at 1 shows that the first moment matrix is M= 4 I and thus M-I=1_41 The upper minor determinants of the latter satisfy (-1)2 AI1 = -- < 0, (-1)3 A21 =-16 < 0. Hence G is below-critical. Indeed the characteristic equation of M is x2 - x + 3/16 = 0 with roots 1/4 and 3/4. Thus r = 3/4 < 1.


71

(The right hand characteristic vector v' is found to be [ upon setting r = 3/4 in the matrix-vector equation (M - rI)v' = 0 and solving the resulting two homogeneous equations in v{,vs.)

Note that (see I), M being subcritical, we must have mij vj<vi for at least one i = 1, 2, whenever v # O, vi> 0, but not necessarily for both indices. In our case, for example, [14 [] [] where 9 < 8 but 3 > 2. 42 2 2

It is essentially this fact that causes the simplicity of the one variable case to break down. Specifically, let x = (4/5, 1/5). Then our G(x) at this point is (74/100, 59/100) and x -< G(x) is false for this x.

We can however prove the following

Lemma. If G is below-critical, and x $ 1 is in It, the k-sequences 9i(k)(x) are eventually monotone increasing.

Proof. By I 8, there is an e > 0 such that ilw - v'll < e implies w positive and mijwj< wi, all i.

By the preceding theorem, this e determines a K so that, for all k>K, II (1- Gk(x)) /ak - v'll < e. For all such k therefore, Zmij(1 -g(k)())/ak< (1 -g)())/ak and the positive ak may be deleted from this inequality. Under our assumptions on G, gi(x) > I + Emij(xj- 1) for every x with all components less than one, and thus I - gi(x) < Emij (- xj). In this inequality we may set x =Gk(x), since we have already shown that the latter enjoys this property of components (cf. II 1).

Thus 1 -g (k+l) ) < Em (1 - (k)(x)), and combining with the previous inequality, 1 -gx )) < 1-g(k)x), whence the result desired.

4. We have seen that the direction of the vector 1 - Gk(x) approaches that of v'. We intend to prove the same result for the "vector of flow" Gk+l(x) - Gk(), with x 1 in It.


72

It is trivial that the latter vector is never zero for any k, and hence defines a direction. For otherwise, G would have a fixed point Gk () 1. Moreover, we know from the preceding result that for all k >K, all components of this vector are positive.

Theorem. If G is below-critical and x : 1 is in It, the direction from Gk(x) to Gk+l(x) approaches that of v'.

Proof is entirely analogous to that for 1 - Gk(x). Fix e > 0 and 2 5 1 in It. Then we may fix k so IITk(v) - v'le/2v/t for all v O 0, vl > 0, and next determine N so n >N implies the inequality (*) of II 2, and i(n)(2X) > g9i(). Now define An = k ij m=(k) (gk+n+l)(,) - g(k+)()) > Mk (Gk+n+l()- Gk+n(r)). Then Iv - (gi++x)( -gki ))/AVil(c (k)/Xj) p(g n+)( -jn)(2 ))/Ak < - ' /g)MJ)-J))/ by Taylor's theorem where Gn(x) -P < G"+L().

The remainder of the proof now proceeds just as in II 2. The essential point is that the Tk operates now on the vector Gk+n+l(p) Gk+n(z) which we know by II 3 to be positive and hence subject to the inequality v' - Tk (Gn+() - Gn(x)) l < e/2V. Thus the final result is Iv'- (Gk+'+n() - Gk+n()) /A\ < e for all n > N.

5. We now have immediately the main result which we want in III.

Theorem. If G is below-critical, and its first moment matrix satisfies the relation Mv' = rv', 1 > r > 0, v' > 0, and further, if x$ 1 is in It, then there is a K such that, for all k > K, the ratio of successive terms of the K-sequence gi +)() - gi()() is less than 1/2(1 + r).

From I 8, there is an e > 0 such that lw - v'll < e implies w positive and mnijwj < 1/2(1 + r)wi. This e determines K by the preceding theorem so that k>K implies


73

Gl(x)G() v <e Ak Hence EJ((k+(l)k(1)-9(k) ()) < i( + r) (k+)()- g(k)( )) since the positive Ak may be deleted. But (k+2) - _g(k) )=gi (G+ )) - gi(Gk(x)) = 9(gi/ x,)p((kgl)x) (k)(-g ))< Emij(gk)x) g(k)()) Hence, combining, we have the desired result.

III—
On the Distribution of Death in Subcritical Systems

1. Let G be subcritical, and define qk(i) as the probability that complete death of the system of progeny from one particle of type i should first occur in the k-th generation.

Clearly pk(i; 0), the probability of death in the k-th generation, may be expressed as k pk(i;) =qj(i) j=l Hence we have the relations: qk(i) = Pk(i; 0) - Pk-l(i; 0) =- gk)( 0) - (k1(0), k > 2, ql (i = p(i; 0) = gi(0)

2. We recall that x z x' implies gi(x) < gi(x') under our assumptions on G. Now 0 ~ G(0), otherwise G(0) = 0 and G would have a fixed point in It besides 1 and would be supercritical. Hence, inductively gi(O) < 92)(0 ) < g ( ()(0) < ..


74

It follows that the qk(i) are positive for k> 2, and ql(i) = gi(O) is positive for at least one i. Also, Z'qj(i) = limk Ek qj(i) = limk pk(i; 0) = 1 for every i, and thus the sequence ql(i), q 2(i), q 3(i),... is a probability density function on the positive integers.

3.Theorem. If G is below-critical, its first moment matrix M having maximal positive root r < 1, there is a K so that for all k >K, qk+l(i)/qk(i) < 1/2(1 + r). Consequently the density sequence qk(i) is eventually monotone decreasing, and all its moments ms(i)- ksqk (i), S0 k exist.

The first statement is an immediate consequence of II 5, with x = 0. The finiteness of the moments follows from the ratio test: (k + 1)sqk+l/kSqk < (1 + l/k)s (1 + r) - (1 +r) < .

4. In the case of a system below-critical in one variable (t = 1), it is geometrically obvious that qk+l/qk = gk+ (0) gk (0)/gk (0) gk-() = g(x') - g(x)/x' - x < g'(1) = r < 1 for all k, so qk < qlrk- l and mi = Ekqk < q (1 + 2r + 3r2+ ...) =ql(l +x+x+...) X = q1 ((1- x)-)xr = ql/(l + r)2. Hence in this case m1< p(0)/(1 + r)2 = g(0)/(1 + g(1))2

5. Examples show that, even in the one variable case if the system is just-critical, even ml may be infinite. We hope to study the one variable case more completely in a separate report.

IV—
Subcritical System with Source

1. Consider t types of particles whose probabilities of transmutation are given by the generating transformation G(x): gi(x) =p(i;jil,...,jt)xj l ...xj as before. Suppose further that we have a source which emits independently into the system n+...+nt particles, ni of type i, with probability s(nl,...,nt)> 0. We associate with the source the generating function S(x)= s (nl,...,nt)xL ... xt, S(1) = 1


75

Consider a process consisting of the following steps:

1. The source produces an initial set of n, + ... + nt particles, ni of type i, with probability s(nl,...,nt). These particles transmute according to the G law to form a system which we regard as the first population.

2. The source again contributes new particles, and these together with the first population transform according to the G law to form the second population, and so on. At the k-th step, the population (ml,...,mt),mi of type i, will occur with some probability hk (ml,...,mt), and we define the corresponding generating function Hk(x) = , hk (ml,. ., m) x ...xt

Now, from the elementary laws of probability, as we have pointed out in I, transmutation of any population with generating function N(x) according to the G law gives a population with generating function N (G(x)) = N (g 1(x),...,gt(x)). Hence for the problem considered above, we see that Hl(x) = S(G(x)), H2(x) = S (G(x))S (G2 ()),..., and, generally Hk(x) = S (G(x)) Hk- 1(G(x)) = S (G(x)) S (G2(x)) ... S (Gk (x)).

For, if Hk-i(x) is the generating function for the (k-1)st population, then the generating function for the intermediate population resulting from the contribution of the source to the (k - )st is Tk(x) = S(x) . Hk-1(x), and, upon transformation of this result by the G law, we must have for the generating function for the k-th population: Hk(x) = Tk (G(x)) = S (G(x)) - Hk-(G(x)) .


76

k 2. From Hk(x) = S (Gi(x)) follows Hk () = Hk- (x) . S (Gk ())

Since, for x in It, the latter factor is less than or equal to one, the k sequence Hk(x) is monotone non-increasing at x and H(x) - limHk(x) exists in It. Clearly 0 <H(x)< I and H(1) = 1.

If S satisfies the conditions

(S*) at least one OS/Oxj > 0 for all x : 0 on It, then for every x' with all components less than one, we have S(x') < 1. For S(x')1+ E (OS/9xj)p (x' - 1), where 0 ¢ P.

But for every x $ 1 in It, we have seen x' = g\)(x) < 1 for all i,k. Hence, for such x, S(x') = S (Gk(x)) < 1 and 1 > H1(x) > H2 (x) > ... so H(x) < 1. Thus we have

Theorem. The function H(x) - limHk(x) exists for all x in It, and 0 <H(x)<1,H(1) = 1. Moreover, H(x) satisfies the functional equation H(x) = S (G(x)) H (G(x)). If the source function S satisfies condition (S*), then H(x) is not identically one on It, indeed, for every x $ 1 in It, H(x) < 1.

3. If G is supercritical: x° ¢ 1, then Hk (O) = H S (Gi(x°))= (S(x°))k 0, SO H(xO) = 0. Moreover, it is easy to see that H(x) _ 0 for all x 5 1 in It. For limGk(±) = x°(rx z 1) and hence S (Gk()) is bounded from 1. Thus only the subcritical case is of interest.

Theorem. If G is below-critical, and S is a polynomial of degree s, then H(x) is continuous.

Proof. We have H_-(x) - H,(x) = S (G(x))...S (G'-l(x)) (1 -S (G .(x))) 1-S (Gh (x)) < 1-S(G"(0))


77

Now, by Taylor's form, g2(0) > 1 + Ej m(n (0 - 1) = 1- yj m(n) > 1 - Vr'. Since r < 1, the latter is positive for all n sufficiently large. Now 1-S (Gn(0)) < 1-Es (i,...,jt) (1-Vrn(j) < 1-S(jl, ...,jt) (1 - Vrn) = 1-(1 - Vrn)s < 1- (1 - sVrn)<sVrn. (See Appendix B). It follows that the sequence Hn(x) is uniformly convergent to H(x) on It, and hence the latter is continuous on this range (see Appendix A).

4. Since H,(x) = S(G)...S(G") = Hn_S (Gn), we have OHn/Oxj = OHn-_i/xj •S (Gn ) + H_n-l (S (G)) /Oxj and \OHn_ /Oxj- OHn/G3xj < OHn-1/Oxj . (1 - S(Gn)) + H,O_l (S (Gn)) /Oxj. Now aHn_l/0xj <OS(G)/0x 3+. . .+S(Gn-1) /O xj and OS (Gn) /Oxj = E (S/x)0 (Og/x) (OSS/)Oi) (n)/ < E (S/xi) m) < TE TWrn.

Thus the above absolute value does not exceed TW (r + ...+ rn-~ ) sVrn +TWr' <Krn(1+r +... + r-1 ) < Krn/(1 - r). Hence by Appendix A, the n-sequences OHn/Oxj are uniformly convergent on It, the partials OH/Oxj exist, and lim OHn/0xj= OH/Oxj in It.

But (OHn/Oxj)1 is the expectation of particles of type j in generation n, and the limit approached by this expectation is (OH/Oxj)1.

Since we know H satisfies the functional equation H = S(G)H(G), we have OH/Oxk = EOS/Oxj dgj/OxkH(G) + S(G) ZOH/Oxj Ogj/Oxk. Setting x = 1, we obtain s (mjk -S jk) (OH/O9xj), = - E (0S/0xj)I mjk, k= I,...,t.

Since G is below-critical, the determinant of the system is not zero and the expectation limits are uniquely determined. Thus follows the

Theorem. If G is below critical and the source function S is a polynomial, the limit function H possesses first partials on It, and the limit of the expectation of particles of type j in population n is the value at x = 1 of OH/Oxj. Moreover the latter limits are uniquely determined by the linear system E(mi k - bjk) (OH/0xj)l= -mjk (OS/OXj)l with non-vanishing determinant IM - II.


78

V—
Total Progeny for Systems Without Source

1. Returning to the simple problem without source, let Pk (i; jl, , t) be the probability that in the total progeny in all generations 1 through k produced by one particle of type i (generation 0), there should be ji particles of type 1,..., jt particles of type t. Define ci)(x) = Z Pk (i; jl, .jt) xj '..X-j and C(k)(x) as the corresponding transformation of It. Here the upper k does not indicate iteration. Clearly P(i; j) = pi(i; ), hence cl1)(x) = gi(x) and C(1 ) = G(x).

Now let k be greater than 1. The production of the total state J,...,Jt at the end of the k-th generation from one particle of type i arises from the mutually exclusive states jl, - jt; 0 < jh < Jh in the first generation. If this state is 0, .. ., 0, then and only then will the total state J1,. .., Jt be 0,... ,0, so Pk(i; O) = p(i; 0).

Suppose then that state J is not 0, and hence state j $ 0. Each of the jh particles of type h in the first generations acts independently of the others, and of those of other types to produce in the k - 1 next generations a total state of some al,...,at particles with probability Pk-1 (h; ai,...,at). We want the total state from the ji,... ,jt particles of the first generation to be J1- ii,..., Jt - jt after the next k - 1 generations. It follows from the elementary laws of probability that, for J 0,

O<jk<Jkji jt Pk(i; J) = Zpi(i;j) ZfPk-l (1; al... at)...f JPk- (t; al,..., at) jiO E ai=Ji-ji But this is the coefficient of xiJ ... xtJt in gi (xic1) (x),.. . ,XtCt- )(X)) = pl (i;j)xjl . xj t [ Pk-1 (1; a)xa ] . [ Pk-l(t; a)xa] and Pk_l(i;0) = pl(i;0), which is the constant term of the above function. Hence the


79

Theorem. The generating transformation C(k)(x) for the total progeny in generations 1 through k satisfies the recursive relations cM (x) = gi(x) ci (x) = gi (xc 1)(X),. ..,tCt (x)

2. If y,z are arbitrary points of It, we have from Taylor's form, gi(Y) = gi(z) + E (gi/dxj)p (yj - zj) and thus gi(y) - gi(z) I<mij lyj- zj \. It can be said therefore that the number dk _ ci (x) - c(x)I < Emij Ixjck _xjc -2 1 < mijd k-1 since is in It Iteration of this inequality yields dk < mrk-2 dk-2, and eventually dk < Zm (k-2 d(2) = -En -2g9j (xG(x) - gj()) I<ZE,k-2 Emjn <_ ijJ = Z ij _ ij Xn9n(X) -Xn= Z mi'n Xn 9n(X) - 1 <m(k1 ) < Vrk1. n Thus we have the

Theorem. If G is below critical, the generating functions ck(x) are uniformly convergent to continuous limit functions ci(x) on It. The latter satisfy the functional equations ci(x) = gi (xicl(x), . . . ,xtt(x)).

3. We seek now a dominating sequence for Di-=Ocik/xj ck-1 aXI First, note that Pj -ac/Qxj = Z (OgiO/Xn)p [5nj Cn x]+XnC 1 /j] <mij+ P\j , where P = xCk- l. Iteration leads to P1<mij + m m) pkj P2 and eventually pk <+m(2)+. . .+m (k-)+Em(k-)(gn/j)<mni+m( + (+mk) a~ <.~ij +.~ij ij in _j +j

For brevity, let Ak and Bn denote temporarily the round and square brackets involved in the Pj sum above. Then D= pk pk-P= |E aBk _ An1 <AkIBk -B \k-1 + EB 1A- -A - < min,lBk -Bk-11' + E Bk-'llAk - Ak-117


80

We obtain upper bounds for the three A, B expressions: (1) Bk -B-'- =(cJ- I6ck-2)+ X (cC-1 _1 /X- -c+2 /iXj) < (2) Bn1 < 6 nj + c a2 /oxj<6nj + (mnj +..+ mj ) (3) IAk - A k-1(Ogi/X.l)XCk-1 - (-g9il/Xn)Xk-2<a&gi/azXnpzlxpc 1 _pC-k-21 Bdk- 1 < BtVr k-2

Hence, combining, Dk < E min (6njVrk-2 + Dj ) + BtVrk-2 (6nj + mj +. +m( 2) < mijVr + E mi,nDn1 + BtVrk- + BtVr-2n(mj+. + m 2))j m ijVrk2 tV+ BVkBtVr2 (I+ Bt + Wrk2) + E mn D - < mTijVr- 2 + BtVr-2 + BtVr-2Wr/(l - r) + E minDn Thus we have Dk < Krk + E minD4nj1

Iteration leads to Dk<Krk-2 + Krk-3min + Krk-4Emin) + ... + Kr Emk 3) + F-mik 2 )D We obtain an upper bound for D2 j = dc2 /Oqxj - dc/9xjj = 9 (gn/ x,)G g,i+ E (agn/xp).x•gX/ -X ogn/0x= (09g /aXj ) ,g-1 + I(Ogn/axj),) - (09gn/ j) + | E (a9gn /xp).X pOgp/O/xj <m,j + B > \xpgp-xp + mnpmpj<m,j + +Bt


81

Therefore, substituting gives Dk < Krk2 (k - 2) + E (-) + E (k-2)m + tm(2) < Krk-2(k - 2) + Vrk- l + Vrk + Btrk-2 . Since each of these terms defines a convergent series so does their sum and, we have

Theorem. If G is below-critical, the sequences ack/x3j are uniformly convergent on It. Hence the partials Oci,/xj exist on It, and are the limit functions of the corresponding sequences. Since lim (Mcr/nxj) = (ci/xj), the latter is the limit approached by the expectation of particles of type j in the total progeny at the end of k generations from one particle of type i. From the functional equation satisfied by ci(x) follows En (mi n- - in) (C,n/OXj)I = -min, i = 1,...,t where IM -IIf 0, and the expectation limits are uniquely determined.

VI—
Total Progeny in Subcritical System with Source

1. Consider again the process described in IV. We found that the probability distribution in the n-th generation had generating function Hn(x) = S(G)S (G2 ) . . . S (Gn) .

Here S (G') gives the distribution for the isolated system produced by the initial action of the source, S (Gn-1 ) is that for the component due to the second action of the source, and so on. Regard the isolated component produced by the (n - k + l)st action of the source, with generating function at the n-th level (of the whole system) S (Gk). By an argument exactly analogous to that of V 1, one sees that the generating function for total progeny at the n-th level for this isolated component of the system is given by S (xic (x),... ,xtCz(x)).

It follows from the elementary laws of probability that the generating function Un(x)- un (il,j,jt) x .. .Xis Un(z) = S (xic(z)..., xtCt(x)). Here n (jl,...,jt) denotes the probability that, at the n-th level of the entire system, there should be a total of ji particles of type 1,..., jt particles of type t produced altogether, counting particles contributed by the source as well as all progeny of such particles.


82

2. Since Un(x) = Un- 1(x)S (xici(x)), and for arbitrary x in It with all components less than one, the latter factor is less than or equal to S(x) which in turn is less than one (supposing condition (S*)), it is evident that lim Un(x) = 0 for all such x. Since however Un(1) = 1 for all n, Un(x) converges to a discontinuous limit function on It. Moreover, it is manifest that the expectations approach infinity so we cannot expect a simple theory.

Appendix A. We collect here some standard results from classical analysis.

A sequence of functions Fn(x) on It is said to be uniformly convergent on It in case (1) limFn(x) = F(x) exists for each x of It, and (2) for every e > 0 exists N so that n > N implies IF,(x) - F(x)l < e, all x in It.

Theorem. If all Fn(x) are continuous on It, and the sequence is uniformly convergent there, then the limit function F(x) is continuous on It.

Theorem. If Kn(z) is a uniformly convergent sequence of continuous functions on It, with limit function K(x), then the sequence of functions In(X) =- 0 Kn (Zl,X2, ...,xt)dzl is uniformly convergent to jXK(zl,x 2,...,xt) dzl, and similarly for the other variables.

Theorem. If (1) Fn(x) converges pointwise on It to the limit function F(x), (2) the partials aF,n/xl exist and are continuous on It, (3) the sequence QFn/Oxl converges uniformly on It, then OF/Oxi exists on It and is equal to the limit of the sequence aFn/9Xl. Similar statements hold for the other variables.

Theorem.Fn(x) is uniformly convergent on It if and only if e > 0 implies existence of N such that for all n>N and all positive integers p, Fn+p (x) - Fn(x) < e on It.


83

Theorem. If F (x) is a sequence of functions defined on It, E Mn is a convergent series of non-negative numbers, and for all sufficiently large n, Fn(x) - Fn+l(x)\<Mn, n It then Fn(x) is uniformly convergent on It.

Appendix B Theorem. If 1 + h> 0 and n is a positive integer, then (1 + h)n> 1 + nh. Proof trivial by induction on n.

VII—
The "Time" Particle

1. It is of interest to note that, in the system with source, it is possible to regard the n-th population as essentially the n-th generation of progeny of a simple system of t + 1 types of particles produced from one particle of new type t + 1. Suppose that we associate x1 ,. .., xt as before with the t original types of actual particles, and introduce a new type of particle with variable xt+l. Consider then the transformation V(x) of It+l, defined by the component functions vi(X) = gi(x) vt(x) = gt(x) X: _ (X1, .. ,Xt,Xt+i) Vt+l(x.) = Xt+lS(G(x)), x = (Xl,...,Xt,).

One verifies easily that the (t + 1)st component of the k-th iterate Vk(x) of V(x) is = xt+lS(G(x))...S (G(x)). Hence vk+1, the result of a simple iteration satisfies the relation t+l(Xl ,... ,Xt,Xt+l) = Xt+Hk(X1, . ..,Xt).


84

The transformation V(x) fails to satisfy the restrictions we have imposed throughout on our generating transformations, for example vli/Oxt+l vanishes identically on It+l, and we have treated the problem independently for this reason. Nevertheless we shall be able to make use of the indicated simplification to the iterative case in a future report on the space of histories of a multiplicative system.

VIII—
Total Progeny as an Iterative Problem

1. In a similar way, the transformation Ck(x) of V may be produced by an iterative process. Let x1 ,...,xt, be the usual variables associated with the t types of actual particles, and z,... ,Zt, be variables for a set of t types of dummy particles. Suppose probabilities of transmutation among the 2t types are defined by the following generating transformation L(x,z) of I2t, L1(x, z) = zig,(X,..,Xt) Lt(x, z) = Ztgt (X,...,t) Lt+l(x, z) = zi L2t(x,z) =- t . = (l,...,t) =(Z,...,Zt)

where the gi(x) are the components of the usual G(x). One verifies easily that ththe -th component (for i = 1,.. , t) L(x, z) of the k-th iterate of the transformation L(x, z) satisfies the relation L (x, x) = XiCk (Z)

If one examines the nature of the process induced by L(x, z) one finds that each time an actual particle of type xi is produced in a generation k, it is forced to produce in generation k+l a dummy particle of type Zi, as well as its actual progeny. Moreover, every dummy particle of type zi is forced to just reproduce itself in one for one fashion. Thus the total actual progeny is tallied by means of the one for one reproduction of the dummies through all generations. Thus if one sets zi = xi in generation k one totals the entire progeny including the actual particles produced from actual particles in the (k- 1)st generation and the dummies which total the whole previous progeny of actual particles. The extra xi factor is due to counting the zero-generation particle in the L process.


85

III

Abstract

The set FJ of all possible genealogies or graphs z of a multiplicative system produced from one particle of type i is here introduced as a fundamental concept in the theory of such systems (see I, II). This set possesses a natural intrinsic distance function d(z, z') under which it is a complete zero-dimensional metric space satisfying the second axiom of countability.

Simple axioms on (A) intervals and (B) measure of intervals are given for an abstract set from which the classical theory of completely additive measure is derived.

Intervals in the set J1 are defined intrinsically and shown to satisfy the axioms A. If now a particular multiplicative system with given generating transformation G(x) is given, the transition probabilities Pl (i;ji,.. ,jt) serve to define a measure for the intervals of 1; satisfying the axioms B. Proof of the latter is non-trivial due to non-local-compactness of the space 17.

With this mathematical structure at hand it becomes possible to state in a simple way some of the striking properties of multiplicative systems.

If x° = G(x° ) is the death-fix-point of G(x), then the set of graphs of Fi which terminate in death has measure xi .

If v = (vl,...,v 1) is the characteristic vector corresponding to the maximal positive characteristic root r > 1 of supercritical system, then the set of all graphs of 1 whose k-th generation population approaches the ratios v:v2:... :vi has measure I - A°. Thus, almost all graphs (genealogies) either terminate in death or approach the mode v as limit. These results are trivial for subcritical systems, in which by definition, 3z = 1.


86

I—
A Remark On Measure Theory

1. It is convenient for our purposes to have a simple set of axioms on which measure theory may be shown to rest. To this end, let r = {z} be a set of points z, and I = {i} a class of special subsets i of F called intervals. We demand that the entire set F and the empty set X be intervals. Denote by J the class of all subsets S = Ei of F which are set sums of a finite or countable number of intervals. (All set sums hereafter are understood to operate on finitely or countably many summands. The notation La is used to indicate that the summands are mutually disjoint.)

We suppose that intervals satisfy the following axioms: I 1. Every set S = Zi of J can be represented as a sum Ej of disjoint intervals. I 2. The complement i' = -i of an interval i in J. I 3. The set product ij of two intervals i and j is an interval k.

Moreover, we assume m 1. To every interval i is assigned a non-negative real number m(i)>0 called its measure. m 2. M(F) = 1, m(0) = 0. m 3. If i = Ej, where the i and j are intervals, then m(i) = mr(j).

An additive class C of subsets of F is one such that C 1. All intervals belong to C. C 2. A is in C whenever all sets A are in C. C 3. If A is in C, so is A'.

The Borel sets are those common to all additive classes, hence, themselves form an additive class.

We shall define a property of subsets U of F called measurability, and prove that the class of measurable sets is additive. Simultaneously we define a measure m(M) for every measurable set M and prove

M 1. 0 <m(M)< 1 for M measurable. M 2. m ( M) =Em(M) for M measurable. M 3. If M is an interval, the measure assigned to M as a measurable set coincides with its intitially given measure.


87

2. In stating the above axioms, we have attempted to focus attention on the fundamental assumptions. It remains to show that the essential theorems follow from our axioms. This we do in the remainder of this chapter, for the sake of completeness, following the classical arguments for the real line1 . We note first the following properties of intervals: I 4. E S is in J whenever all the summands S are in J. For E S is an at most countable sum of at most countable sums of intervals. N

I 5. I1 i, then S' is in J whenever all N factors are in J. It suffices to see this for N = 2. Hence let S1 = Zim, S2 = j. Then S1S2= i,mjn = E (imjn). By I 3, imjn is an interval, so S1S2 is in J, the range (m, n) being at most countable. 6. If S = i, then S' is in J. N For S' = i', each i' is in J by I 2, and thus S' is in J by I 5. 1

3. We now assign a measure to every set S of J. Note first that if i D Si in, then i = Zin±+i(in) . By I 6 and I 5, the latter set is in J, so by I 1, we may write i= in i-n+ Ejm, and by m 3, and m 1, m(i = E (in) + E T(j) > E m (in).

Hence, if in is any sum of disjoint intervals, we have for every N, F D i in and hence 1 > Z m (in). Thus E m (in) exists, less than or equal to 1, for every such disjoint sum.

Now suppose -i,n = Zjn. Then im = im(Zjn)= E (imjn) ,nkmn, kmn being the intervalim, jn. (See 1 3.) Hence by m 3, m (im) = En m (k,,) and m(im) = m En m(kmn). Similarly Em(j,) = En Em m(kmrl). Now the double sum mn, m (kmn) converges to a limit < 1 by the preceding paragraph since the kmn are disjoint. Hence the iterated sums are both equal to this limit and hence to each other. Thus E m(im)= - m (jn).

It is therefore clear that, if S is any set of J we can write S = Zi and define m(S) = rm(i) unambiguously. Clearly 0 <m(S), and if S is an interval i, m(S) coincides with the initially given measure m(i).


88

4. We establish in this section some properties of the measure m(S) for sets S of J. First, m 4. m(S) =m(S). For, write Sm =i,,mn. Then m(ZS) m( n imn)En,m (im,) = Zm n m (in,) = m (S,) . Next m 5. If S and T are in J and S c T, then m(S)<m(T).

For, let S = E im and define Sn =m. Then T = S +TS', the summands being disjoint and in J. Hence, by m 4, m(T) = m (S) + m(TS')> rn(Sn) =m(i,,). Hence m(S) = m(i) < m(T). Thus follows m 6. For all S in J, 0 <m(S)< 1.

Next we prove m 7. For arbitrary sets S of J, disjoint or not, m(Z£S)<Zm(S), where the latter sum may of course be infinite.

Write S = E imn. Then ZSm = m,i,,, = ip = il4i1i 2+ili2" 3i . , the latter summands being disjoint and in J. Hence, by m 4, m 5, m(Sm,) = m(il) +(i'i2) + ... <(il)+m(i 2) + ...= r m(ip) = m,, 7n(imr,) = nErE m (imn) = Emm(S), the latter possibly being infinite.

Finally we have m 8. If S and T are in J, then m(S + T) + m(ST) = m(S) + m(T). First suppose S and T are each finite sums of intervals. Write S + T = S- (T- ST) ST- (T - ST) = T

where S and ST are in J, and, ST being a finite sum of intervals, also T - ST = T(ST)' is in J. Thus by m 4, m(S + T) = m(S) + m(T - ST) m(ST) + m(T - ST) = m(T), and addition yields the equation of the theorem.


89

Now let S = Zim, T = ij,, where we make either sum terminate in 0 in case it happens to be finite. Define Sn and Tn as the corresponding partial sums through the first n terms. Then by the first part of the proof we have for every n, n (Sn + Tn)+m(SnTn) = m(Sn)+mr(Tn). But limm(Sn) = limTli, = mnS), and limm(T,) = m(T). (Recall that m(0) = O).) Also, S + T = im + j, = E (im +jm) S, = S1-+SS2i   Thus m(S + T) = m (S1) + m (S2) +...= lim, (m (SI)m(S9S2)+...+ m(S   S S)) =limnm(Si+SS2 +-... + S ... n) = limn (1 + ...+Sn) =limn m (Sn+ T) . Finally ST = n im , = ,jimj n =mmnn = kj = mn = k +k k2 +.. where the intervals kmn = kp are listed in sequential order by upper squares: (1,1), (1,2), (2,1), (2,2); .... Then m(ST)= rn(kl) + m (klk2) + .. . limn2 (m (kl)+... + mn (ki ... k2_lkn2)) = lim,n mk+... ... (kl + +k k _)lk 2) = lim m (k + m (SnTn).

Hence we obtain m 8 in general by taking limits of both sides of the finite relation.

5. Let U be an arbitrary subset of F and define the outer measure O(U) = glb(m(S); U c S c J), and the inner measure I(U) = I - O(U'). From m 6 we have for all subsets U, m 9. O <O(U)< 1, and O <I(U)< 1.

Moreover, if U1 c U2 , every S of J which contains U2 also contains U1, so the numbers m(S) defining O(U 2) form a subset of those defining O(U1), and hence m 10. If Ui c U2 , then O(U1) O0(U2 ) and I(UI) < I (U2).

For every Si D U, S 2 D U', SI, S 2, in J, we have S1+S 2 D U+U' = F and M (F) = 1 = m (S + S2) <m (S)+m(S2 ). Fix S1 U. Then 1 - rn (S) < m (S2 ) for all S2 D U', so 1 - m(S)<O()1 U') U')< m(SI) for all S, DU, and 1 - O(U')<O(U). Thus follows m 11. For every subset U, I(U)<O(U).

Now let {Un } be a sequence of arbitrary subsets Un of F. Fix e > 0. For every 7n there exists an Sn in J such that Sn D Un and n (S,) < O(U) +e/2n. Hence Sn D EU,, Sn is in J, and m(r Sn)<Em(Sn) <ZO(Un)+e. Thus O(EUn )<m(Sn,) < EO(Un) +e for every e > 0 and we have


90

m 12. For arbitrary subsets U, O(0 U)<0(U). Suppose U 1, U 2 are disjoint. For every S1DUl, S2 D U2, S1,S2 in J, we have Sl + S2 D U{+ U2 = (U1U2)' = 0 = so m (S + S2 )= 1. Also, (Ul +U2)' = UlU C S1 S2 is in J, hence O (U +U 2)'< gib (m(S1S2)). Then I (Ul + U2) - I (Ul)- I (U2) = 1 - O (U1 + U2)'- I - (U)- 1O(U2) = O(U{) + O(U2) -0(Ul + U2)' - 1 > glb m (S1) + glb m (S2)gib m (S1S2) -1 gib (m(S1) + m(S2) -m (SiS2 )) - = gib m (S1 + S2) - 1= 1 - 1 = 0. We therefore obtain m 13. UIU2 =0 implies I (Ul) + I(U2) < I (Ul + U2 ).

Generalizing this, we have I(ZU)> I(U1)+ I(E2 U) >I(U1)+ I (U) + (I Z 3U)> ... >I() +...+(UN) +I( EN+) > 1 I(U) for all N, and m 14. (I(U)<I(L U), for disjoint summands.

6. We say a set M is measurable in case I(M) = O(M) and denote by K the class of all measurable sets M. For a measurable set M we define a measure m(M) = I(M) = O(M). From m 9 follows m 15. For a measurable set M, 0 <m(M)< 1.

Let i be an interval. Since i is in J and contains itself, we have O(i)<m(i) since O(i) is a lower bound. Now suppose i c S E J. Then m(i)<m(S) and m(i)<O(i) since O(i) is a greatest lower bound. Hence rn(i) = O(i). Since i' is in J, we may write F = i+ jn, so m(F) = 1 = m(i) + m(i'). Now 1(i) = 1-O(i') = 1 - O(Ejn) > 1-E (O(j,)) 1 - Em(jn) 1 - m(i') = m(i) = O(i)>I(i), so we have established m 16. Every interval is measurable with O(i) = I(i) = m(i), its initially given measure.

Also, we see m 17. If Mn are disjoint measurable sets, then E M, is measurable and m(ZMn) = Emr(M,). For, Em(M.) = I (M,)<I( Mn) <O(M,) < 0O(M,)= Em (M,), and I(EMn,) = O(EM) = E (Mn).


91

As a corollary we get m 18. Every set S = i of J is measurable and its measure as such coincides with that previously defined. If M is measurable, I(M') = 1-O(M) = 1-I(M) = 1-(1-O(M')) O(M'), and so m 19. The complement of a measurable set is measurable and m(M) + m(M') = 1.

Finally , we have to prove m 20. A sum of measurable sets, disjoint or not, is measurable. Let M1, M 2 be measurable, and hence also M,, M2. Fix e > 0. Then there exist sets S1,S2, T1 ,T2 in J such that S1 DMl,S2 D M2 , T1D MI, T 2 D M2 and m(Si) < O(MI)+e; m(Ti) < 0(Ml)+e m (S2) < O(M2) + e; m (T2) < 0 (M2) + e.

Now S1 + TI D M1+ M' and S2+T 2 D M 2+ M2 so S1 +T = F = S2 + T2 . But 1 + m(SlTl) = m(Si + T1)+ mT m(iTi) = i(S 1) + m(Ti) < O(M1 ) + e + 0(Ml) + e =m(Mi) + m(M{) + 2e = 1 + 2e. Thus m(SiTl) < 2e and similarly m(S2T2) < 2e.

But M1 + M 2C S1 + S2 and (M1 + M2)' = Ml,M c TiT2, so F (M1 + M2) + (Mi + M2 )' c S1+ S2 + TiT2 . Moreover, from the first two inclusions, 0(M1+M 2) < m(S1 +S2 ) and I(M1+M 2) = 1-O(M +M2)' > 1-m(TiT2). So, 0(M1+ M2)-I(Mi + M 2)<m(Sl + S2)+m(TiT2)-1 = m(S1+S2+TlT2)+m( (S1 + S2)TiT2)-1 = m (F)+m( (S + S2) TT2)-1 = m(S1TiT2+ S2TIT2)<mi(STl + S2T2)<m(SiTi) + m(S2T2) <4e. Thus O(M1+M2) <I(M1+M2) and M1+M2 is measurable. Accordingly, so is every sum of a finite number of measurable sets.

But we see now that a sum of a countable number of measurable sets E M, = M1 +MM2-+ (Ml+ M2)'M3- ... is measurable by m 17 and the fact that each of the latter summands (M1+ . + Mn_l)'Mn (Ml+ ... +M,-i + M,)' is measurable by m 19 and the preliminary results for finite sums.


92

II—
The Set of Graphs

1. Consider t types of particles, such that a particle of type i may produce, upon transformation, an arbitrary number jl + .... + t > 0 of such particles, of which js are of type s, s = 1,...,t. We suppose that transformation times are the same for each type, and hence that generations may be counted unambiguously. We agree to consider zerogeneration as consisting of one particle of a fixed type i, and then consider the set Fi of all conceivable genealogies or histories proceeding from it, that is, the infinite record of the transformations of this particle and of all its progeny through all generations k = 0, 1, 2,...

2. We may represent such a genealogy in the plane by a graph or lattice if we agree on the following conventions: (a) A particle of type i in the k-th generation is represented by a number i in the k-th row.

(b) If a particle of type i in generation k is transformed into no particle, that is, if it dies in generation k + 1, this is so indicated by a sequence of zeros proceeding from it to the (k + 1)st and thence successively to all lower rows, thus: (i) row k I (0) row k + 1 (0) row k + 2 I * • * • * • (c) If a particle of type i in generation k is transformed into jl + .. jt > 0 particles, j, of type s, in the (k + 1)st generation, this is indicated by a branching from the corresponding number i in the k-th row into a group of jl + ... + jt numbers in row k + 1,js being the quantity of numbers s, identical numbers being grouped consecutively, and different numbers ranging from left to right in increasing order.


93

Thus (1) /\ (1) (1) (2) indicates that a particle of type I was transformed into two particles of type 1 and one particle of type 2. Note that the two 1's represent different particles, so that the events (1) and (1) (1) (1) (2) (1) (1) (2) I I I I I (0) (2) (0) (2) (0) (0) are counted as different.

Consideration shows that the set Fi of all genealogies is uniquely represented by the set of all such graphs z in the plane, and we speak hereafter of the set Fr of all graphs z. We will not change the type of the zero-generation ancestor during the subsequent discussion.

3. The set ri has at least the power of the continuum. For to every sequence of O's and l's {an}, we may make correspond a particular graph (there are many) which contains a total of one particle in generation n if a, is 0 and a total of two particles in generation n if an is 1. This correspondence is one-one on the set of all sequences {a(,} which has the power of the continuum to a subset of the set Fi. That the set ri has power less than or equal to that of the continuum may be seen directly or from a topological result which we wish to establish anyway in the next section. It will thus appear that Fi has power of the continuum.

4. If z is a graph, Zn denotes its upper segment from generation O through n, both inclusive. Thus, if z = z', then Zn = zn for all n, whereas, if z 5 z', there exists a first integer k = k(z,z') such that Zk / Zk. Since z0 = (i) for all graphs z of ri, the integer k(z, z') > 1.


94

5. Suppose z,z',z" are all different, where m = k(z,z')>n= k(z', z"). Then Z,n- =z'_l = Z_, and k(z, z") >n. Hence we have the

Lemma. If z, z', z" are three different graphs, then k(z, z")> min (k(z, z'), k(z, z")).

6. A graph z is said to terminate in case it contains no particles (only O's) in some generation. We can write the set T of all terminating graphs in the form of a disjoint sum T = ToT 1T 2 +...

where Tn is the set of all graphs yn which contain at least one particle in all generations through the n-th and die completely in generation n + 1. The set of To consists of the single graph (i) I (0) I (0)

We prove the Lemma. The sets T, T2 ,... are all countable and thus so is T.

Proof. Induct on n of T,. First, T, is countable, since it is in oneone correspondence with the countable set of all integer-component vectors (jl,...-,t), js> 0, Ejs > 0. Suppose Tn is countable. Every graph of Tn+ 1 is the continuation to one more "live" generation of the section yn of some yn of Tn. This serves to split Tn+1 into a countable (since Tn is) number of subsets S, all y"+l in a fixed S being continuations of the same yn of Tn to one more live generation n +1, followed by death. Now the n-th generation of yn contains only a finite number of particles each of which can branch into the (n + 1)st generation in only a countable number of ways. Hence each subset S is countable and so is Tn+l.


95

7. We developed in I an abstract theory of measure, which we are eventually to apply to the set Fi of graphs z. To this end we define intervals in Fi as follows. By an interval of order n(n = 0,1,...) is meant the set i(yn ) of all graphs z such that z, = yn, where yn is some terminating graph of Tn. Note that the interval of order 0 is Fi itself. By an interval we mean an interval of order n, or a single graph z, terminating or not, or the null set 0. Define J as the class of all sets S = i which are the sums of an at most countable number of intervals, just as in I. We now have to verify the axioms I 1, 2, 3.

8. We have in the present case the simple and unusual situation that, if i and j are intervals then they are either disjoint or one is contained in the other, hence ij = 0 or i or j and I 3 follows. For suppose ij 5 0 so that some z is in both. If i or j is a single graph it must be z itself and hence lies in the other interval.

Suppose then i = i(ym), j = i(qn), with m < n. Then y =- Zm = y. Let z' be in i(pn). Then z = Y m =Ym and z' is in i(ym). Thus i(ym) D i(yn). Moreover, we see that if m =n,i(ym) = i(yn).

9. Suppose S= E i(yn) + E z + E 0 is any sum of intervals where the i(yn ) are summed according to increasing n. Without altering the sum S we may successively (a) delete the 0 summands, (b) delete duplicate z's, (c) delete duplicate i(yy)'s, (d) delete z's contained in the remaining i(yn)'s, (e) delete every i(yn) which intersects a preceding i(ym) with m < n, since then the former is contained in the latter. The resulting summands are now disjoint with sum S, and we have I 1. Moreover, and unusually, the final summands are a subclass of those originally given.

10. We have to show now (I 2) that the complement of an interval is in J. (a) If i = 0, i' = ri = i(y). (b) If i = i(y ° ) = Fi, i' = 0, an interval by definition.


96

(c) If i = i(yn), where ny is in Tn and n > 1, we may write the disjoint countable sum n-1 ri= E yk + S i(yn) k=OykeTkyneTn

since every graph z either terminates at some generation less than or equal to n, or lives through generation n and is thus in some i(yn ) with y, in Tn. The complement of i(yn) is manifestly in J.

(d) If i = y-l, where n - 1 > 0, the above decomposition of ri shows that i' is in J.

(e) If i is a non-terminating graph z, define y" in Tn so yn = Zn, n = 0, 1,.... We claim that z = H1i("n). Clearly z is in every i(y"). Moreover if z' is in i(yn) for all n, then zn= yn = Zn and z' = z. It follows that z' = - (i(yn ))', the summands being in J by b,c. Hence z' is in J.

11. We saw in I that if S = ii + . + in then S' is in J. This may be false in our case for countable sums. For example, if T = E yn is the set of all terminating graphs, then T' is the set of all non-terminating graphs. The latter cannot be a sum of a countable or finite number of intervals, for if T' = i i(yn) + z, we have T' with power of the continuum, the z summands are at most countable, hence at least one i(yn) must occur in the T' sum. But then yn is in T', a contracdiction.

12. For use in IV we include the

Lemma. An interval of finite order cannot be expressed as a finite sum of two or more disjoint non-null intervals.

For suppose i("n) =i(ym) 4- z is such a finite sum. Let T be the set of all terminating yn+1 such that n+l = ". Of these there are countably many, all in i(fn). Since the z's above are finite in number, an infinite subset T of T must be in the i(ym); if such a yn+1 is in i(ym) we must have n + I > m since yn+l = ym and ym is alive in generation m. But since y"m is in i($n), we have also m >n. Moreover if m = n, i(yn) = i(ym) and we should have only one summand above. Hence m = n + 1 and yn+l = ym. So to each yn+l of the infinite set T


97

corresponds a summand i(y"). The correspondence being one-one a contradiction arises.

Corollary. If El in D j, where i,j are intervals then j is in some in.

If j is 0 or a single graph, this is trivial. Let j be an interval of finite order. Then j = j(Zin) = EN(jin), where jin is an interval. By the preceding result, all jin = 0 except one, so (say) j = jil c il.

13. Let i(yn ) be written as the disjoint sum i(gn) = yn+ .i(yn+l) where yn+l ranges over the set T of the preceding paragraph. Then if j is an interval properly contained in i(ny), it must be contained in one of the summands. We need only consider the case j = i(ym). Since ym is in i(yn),ym must be a continuation of yn, and since ym : yn,ym must be a continuation of some yn+1 of T. Thus ym is in some i(-n+1) and so is j.

14. If we are given an arbitrary finite class N of disjoint intervals i (ym) and yP, there exists a disjoint decomposition of Fi into such intervals among which appear all those of the given class N. First write ri = yo+i(y') (1) y'ET1

We leave unaltered all i(y') which occur in the class N. All others we decompose further thus: i(y') = y' Ei(y2 ) where y2 ranges over all graphs of T2 which are continuations of y', and substitute the results into (1). Again we retain all i(y2 ) occurring in N and decompose all other i(y 2 ) one more step as indicated. This procedure eventually obtains a disjoint sum of the desired sort including all i (ym) of N as summands. The yP of N, being disjoint from the given i (ym) must occur in one of the other intervals of the sum before us. Each such interval may be decomposed until the contained yP's are expressed as summands.


98

15. If Z i(yn ) D i(ym), then the latter is contained in one of the summands. For ym is some i(yn ) and hence, so is i(ym).

III—
The Space of Graphs

1. On ri we define a metric or distance function as follows: If z = z', d(z,z') = 0, whereas, if z - z', d(z,z') = l/k(z,z') where k(z,z') is the first integer k(> 1) for which zk # zk. It is clear that d satisfies the axioms for a metric:

(A) d(z,z')> 0, for if z = z', d = 0, while if z 5 z', d > 0. Hence also d(z, z') = 0 if and only if z = z'.

(B) d(z, z') =d(z', z) since k(z, z') is symmetrically defined.

(C) d(z, z")< max (d(z, z'), d(z', z"). If any two of the three graphs involved are equal, this inequality is trivial. If all three graphs are different, C follows from the lemma of II 5. This is indeed stronger than the customary triangle inequality: d(z, z") <d(z, z') + d(z', z").

2. If z and z' are graphs, the following are equivalent (a) d(z, z') < e. (b) z = z' or z - z' and k(z, z') > 1/e. (c) z = z' or z fz' and k(z, z')> [1/e] + 1,

where the notation [1/e] indicates the greatest integer in 1/e. (d) Zn = z' where n = [l/e].

To each graph z and real positive number e > 0 we assign the e-neighborhood Ne(z) of z, namely the set of all z' such that d(z, z') < 0. Thus Ne(z) consists of all graphs z' such that z4 = Zn where n = [l/e]. It is clear that if z' is in Ne(z) then Ne(z') = Ne(z).

3. A sequence of graphs {z() } converges to limit z in case d(z(n), z) 0, that is, for every e > 0 there exists an N such that for all n >N, z() coincides with z through generation [l/e].


99

4. The space Fi is complete in the sense that every sequence (z(n) such that d(z(n), z()) O 0 has a sequential limit z. For the given condition implies that for every n = 1, 2,... there is an N(n) increasing with n such that Zn = z,, m >N(n). From this one sees that the subsequence zN(n), = 1,2,... has the property zn() = z(n+P) and hence defines a graph z such that Zn,- zn which is the limit of the subsequence and hence of the original sequence.

5. Every Ne(z) is a closed set. For let z' = limz() where z() Ne(z). Then for n sufficiently large, z() coincides with z' through generation [1/e]. But z() is in Ne(z), hence coincides also with z through generation [l/e]. Thus z' coincides with z through [1/e] and is in Ne(z). Hence the space Fi is zero-dimensional, every neighborhood being both open and closed.4

6. The space ri is not compact. For example, the sequence consisting of an arbitary sequential ordering of the countable set of graphs y' or T 1 can have no convergent subsequence, since no two y' have the same section y'. For the same reason, no neighborhood Ne(z) of a graph z which is alive in generation [l/e] is compact, and therefore the space Fi is not even locally compact.

7. A space is said to satisfy the second axiom of countability in case there exists an at most countable set C of neighborhoods of the original system such that for every z and N,(z) there is a neighborhood N of the system C such that z e N C Ne(z). Our space Fi satisfies this axiom in the trivial sense that the whole set of original neighborhoods is itself countable. Consider an arbitrary Ne(z). If [1/e] = 0, then Ne(z) = Fi, and there is only one such neighborhood. If [1/e] = n> 1, clearly Ne(z) = N (z). In this case define a terminating graph x so that xn = Zn and x has only 0's in generation n + 1. Then N,(z) =N,(x), where x is in T. Since T is countable, so is the entire neighborhood system.

Note that it is not implied in the above that x is in Tn. If z is dead in generation n., x = z will be in some Tm with m <n, and Ne(z) will consist of z alone.

8. It is well known5 that a space satisfying the second axiom of countability has at most the power of the continuum. The argument


100

rests on assigning to every z the class of all neighborhoods in C which contain z. This is one-one (by the separation axiom) on Fi to some subsets of a countable set. The class of all subsets of a countable set has power of the continuum so the power of ri cannot exceed this. Combining with the opposite inequality obtained earlier, we see that ri has the power of the continuum.

9. A word may be said about the relation of intervals to neighborhoods. Every neighborhood is either an interval of finite order or a terminating graph, hence an interval. Every interval of order n is either ri = N (y°) (when n = 0) or a NL (yn) (when n> 1) hence a neighborhood. Every terminating graph is a neighborhood, e.g., yn = N i (y), but a non-terminating graph cannot be realized as a neighborhood, and of course, neither can the null-set.

IV—
Measure in the Space of Graphs

1. Whereas the notions of the set Fi, its intervals, and its topology are intrinsic in character, depending only on the number t of types of particles considered, we may establish a measure for intervals, and hence for the class of measurable sets in various ways. We are concerned at present only with the following procedure. As in I 3, we consider a fixed generating transformation G(x) of the unit cube It of t-space: gi(2x1. rXt) = Pl(i;jl,. i -.,jt)xjl ...xJ i= 1,...,t,

defining the probabilities of transmutation pl(i;j) of a particle of type i into jl+ ... + jt> 0 particles, js of type s,s = 1,..., ,t. Then every finite upper section Zn of a graph z of ri has an associated probability P(zn) that the event defined by Zn should occur.

For example, if G(x) is g 1(xI,X 2) =I + Ix1+1XlX2 4 4 1 100 92(Xl,X2) = 2 + 4X2 + iXlX2,


101

and zo = (1); zl =(l); 2 =(1) I / \ (2) (1) (2) l l (0) (2) we have p(zo) = 1,p(z1) = 0, and p(z2) = 1/4(1/2 1/4) = 1/32.

2. We assign a G-measure to intervals of Fi thus: (a) m(0)= 0, (b) m(z)= limp(zn), (c) m(i(y(n)) =p(yn).

Note that m(z) may be non-zero, for example for a terminating yn~, m(yr) = p(y±+i), and m(i(yn)) may be zero. Clearly ml, m2 of I are satisfied and it remains to prove m 3. This is non-classical because of the non-local compactness of the space ri and we divide the proof into a number of steps. Since m(0) = O, we have only to consider the case i(yn) = Zi(yn) + Z of a countable number of summands (see II 12).

3. First suppose i D jl + ...-j, the j's being disjoint and nonnull. Then m(i)> Em(j). Proof is by induction on s. If s = 1, we have the case i D j. If j has finite order, so does i and we write i(ym) D i(yn) where necessarily n >m. Then m(i(yn)) = p(yn) <p(yn) = p(ym) = m(i(ym)). If j is a graph z, the only case to consider is i = i(yT.) D z. Then m(z) = limp(zn)<p(Zm) = p(ym) = m(i(y"7)). Now suppose the theorem true for <s- 1 summands and consider i(yn) :DJi + ...+ is with s> 2. Write i(yn) = yn + Ei(yn+l), where m(i(y")) = m(yn) + m(i(yn+l1 )). Then since s > 2, each j is properly contained in i(yn) and is therefore in one of its summands. If not all j are in the same summand, we invoke the induction assumption and obtain the result. If all j are in the same summand i(yn+l ) we decompose it and repeat the argument. Eventually the j's must be split between two summands, since the intersection of a nest i(yn) D i(yn+l) D i(yn+2 ) D ... is a single point. Hence eventually we obtain


102

the result from the induction hypothesis and the fact that m(i(yn))>m(i(ynP)) = m (yn+P) + E m(i(yn+P+l)).

4. Hence we see that i = = j implies m(i) >m(j). The opposite inequality remains to be proved. Because of non-local compactness we need the lemma of the next section.

5. Lemma. For every e > 0 there exists a set Y of disjoint intervals i(y") such that (a) Em(i(yn)) < e, (b) Ce (i(y")) is closed and compact.

Denote by s, an arbitrary finite graph from generation 0 through n. Since 1 = Ep(sl) where si ranges over all finite graphs of order 1, we have Ep(si) < e/2 for all but a finite number N 1 of si. The intervals or order 1 corresponding to the former sl we throw into y.

Consider for each of the N 1 remaining sl all of its continuations s12. Then p(Sl) = -P(S12) and we have Ep(s12) < e/2 2 Ni for all but some finite number of s12. Collecting the former s12 for all s1 of the finite set, we have E p(S12) < e/22 . The corresponding intervals of order 2 we throw into y, and retain the remaining finite number of s12. We proceed in this way to obtain the set y with E m(i(y")) < e/2 + e/22 + ... = e, at each stage retaining a finite set of finite graphs s12...,. We see now the Ce is compact, since an infinite set of graphs z in Ce may be thrown into a finite number of disjoint classes according to the agreement of their segments z1 with the retained sl. At least one class must be infinite. The latter class is further subdivided into a finite number of subclasses according to z2, and so on. The process defines a graph z with segments sl, 12,S123, which is clearly a limit point of the given infinite set of z's. Thus Ce is compact. It is manifestly closed since it is a complement of an open set. (Recall that every interval of finite order is a neighborhood, hence an open set, and the sum of open sets is open).

6. We saw in 4 that if = Ej then rn(i)>]m(j). We have now to prove the opposite inequality. We do this first for the case where i = ri, that is, we suppose Fi = j, explicitly


103

ri = Ei(YT) + P Z(V)_j

the z",v = 1,2,... being non-terminating. Fix e > 0. Since m(z()) limn p(z,() ) and the latter sequence is monotone non-increasing in n, we can fix N(v) such that p(z(,)^)) < m(z(v)) +e/2". Define yN(v) so that it is in 7Tv() and N(v) = Z() Then m(i(yN(v))) = (N(V)) =p(z ) < YN(u) N (v)' rn(z(V)) + e/2" and z() is in i(yN(v)). For the given e > 0 we define the set Ce as in 5, and we have ce c ri c c Zi(ym) + -yP+ E i(yN(v)).

Now we saw in III 9, that every interval i(ym) of order m is a neighborhood of our space namely N1/mr(y"'), and every terminating graph yP is a neighborhood, namely N1/(p+l)(yP). Hence we have the closed compact set Ce covered by a countable class of open sets. A well known topological theorem tells us that Ce is covered by a finite sub-class of the original open sets, i.e., ce CEi(yi) + EYP + Ei(YN(v))

the primed sigmas indicating summation over a sub-class of the original summands. Moreover we know that every sum of intervals can be expressed as a disjoint suml of a subclass of the original intervals so that we have C, C cli(y) + Z'_yp+ti(yN(')). The latter sum we call S.

Since there are only a finite number of intervals in S, consisting exclusively of either intervals of finite order or terminating graphs, we may decompose Fr into a disjoint sum of such intervals, among which will be those of S (see II 14.): ri = S+E y-P+i(y-) and such that 1 = E) y m(i(+Ym))+ E m(yP)+Z r(i(yN(' )))+ Cl(yP)+m (i( yP)) .


104

Taking complements in the preceding inclusion we obtain Ce D S' that is, E i(y) D Pi(m) y

Now each summand on the right is contained in one on the left (II 15) and thus using the results of (5) and (4) e> m( i(y"))> E m(y) + m(i(m)) y = 1-(EZm(i(ym)) + m(yP) + E m(i(yN()))> 1-(Em((i(y )) + ETm(y P) + Em(i(yN('))))> 1-(,m(i(ym)) + E± m(yP)+ ± m(zV) + e) = 1-m ~ m(j)-e.

So, rm(j)> 1 - 2e for every e > 0 and m(j)> 1, as was to be proved. From (4) we therefore see that ri = 7 j implies 1 = m(j).

7. Finally, suppose i(y n ) = Ej. Then we may decompose Fi into a sum of disjoint intervals (see II 14) in two ways: Fi = i(yn) + k =>j+k. By (6) we know that 1 = m(i(yn)) + Zm(k) and also 1 = Em(j) + rm(k), so that m(i(yn)) = m(j). This completes the proof of property m 3 of I.

8. Hence we see that every generating transformation G defines a G-measure for intervals of Fi, and thus a measure for the additive class of measurable sets as indicated in I.


105

V—
m(T) = x[superscript(0)subscript(i)]

1. Let T be the set of all terminating graphs y. It is trivial that T is measurable since it may be regarded as the sum of its countably many points each of which is an interval. Now pk(i; 0) is the probability of death in the k-th generation and as such represents the sum of the measures of all graphs in the set To +... + Tk_l. It follows that m(T) = Yk=0yCETkrm(y) = limk Fy-Tk m(y) = limkpk(i;0) = I°. Thus we have

Theorem. The G-measure of the set T of all terminating graphs of Fi is x°, the i-th component of the death-fixed-point x° of G. Hence, in a subcritical system (x° = 1), almost all graphs terminate.

VI—
A Strong Ratio Theorem for Supercritical Systems

1. Let G be the generating transformation: gi(x) = Epi(i;j)xj for a supercritical system with first moment matrix M = [mij]. We recall that mij = (Ogi/Oxj)1 and the maximal positive characteristic root r of M is greater than unity, possessing a unique left characteristic vector v with positive components vi, and having norm llvll = 1.

By the e-cone Ce of the system (e > 0) is meant the vector j = 0 together with all vectors jsuch thatlij/a-vll < efor some positive a.

We proved in I 3, a weak ratio theorem to the effect that, for every e > 0, f > 0, there exists a K such that, for every k>K, E Pk(i;j) < f

2. We prove in this section a much stronger result. Consider the set of all graphs z of li. Let T be the set of all terminating graphs y of Fi. For an arbitrary graph z denote by j(zk) the vector j = (i,..,jt) whose component js equals the number of particles of type s in the k-th generation of z. Let L be the set of all non-terminating graphs z such that limk ray j(Zk) = ray v. Precisely, L consists of all nonterminating z such that, for every e > 0, there exists K such that, for all k>K, j(zk) is in Ce. Define N = Fr - (T + L). We thus have Fi=T+L+N


106

where N consists of all non-terminating z for which e > 0 exists so that for all K, there exists a k>K for which j(zk) is outside Ce. We may now state the

Strong Ratio Theorem (SRT). For a super-critical system, the set N of non-terminating graphs of Fr which do not approach the ray v is measurable and m(N) = 0. Hence L is measurable and m(L) = 1- x°.

The proof is difficult and we may proceed by means of lemmas 1-20.

Lemma 1. The set N may be decomposed into subsets: N = N1 + N2 + ... where Nn is the set of all non-terminating graphs such that for every K, there is a k>K such that j(Zk) is outside C 1/n.

Clearly every Nn is a subset of N. Moreover if z is in N, where, for every K there exists a k > K for which j(Zk) is outside Ce, then, defining n so that 1/n < ez, and noting that C/,, is in Cz, we see that z is in N.n

Lemma 2. To prove the SRT, it suffices to show that if Ne is the set of all non-terminating graphs such that for every K, a k>K exists for which i(zk) is outside Ce, then outer measure O(Ne) = 0.

For then O(N,) = 0 for every Nn of the decomposition of Lemma 1, and so O(N)< O(N,) = 0, whence N is measurable and m(N) = 0.

We must now adopt some notations: Let gf(x) = EpK(i;j)x' ... xj. Then g ) = K(g,..,g = P(i; j)g gt _ pK(; j)EPj (Jl) x' .. xt and so on. Generally, (X)EPK(i; j)Epj(j) ... Pj-1 (jS)xj ...Xt jjljs

Lemma 3. Let Ne be given as in Lemma 2. To prove O(Ne) = 0, it suffices to show that, for every f > 0 there exists a K such that E PK(i;j) + Pk(i; j) pj(jl) + iJCe iOo j i iC a C, pK(i;j) pj(jl))Pj (j 2 ) + ...< f j3oj30oj 2 Ce JECe jlec,


107

For, let Ne be fixed, and fix f > 0. Then, assuming such a K exists, we have for this K that every z in Ne has z(jk) not in Ce for some first k >K. Define yk in Tk so yk = Zk. Then z e i(yk). But the above sum is the sum of all m(i(yk)) for all yk so defined. Hence O(N0) <f for every f > 0, and so O(Ne) = 0.

It is the condition of Lemma 3 which makes clear the relation of the SRT with the weaker form.

Lemma 4. To prove m(Ne) = 0, it suffices to show that, for every f there is a K such that for all s> 1, SK,, pK(i;j) +YpK(i;j) E Pj(jl)+ ...+ jffCejoj1 jCe PK(i; j) E pj(jl) * ~ pjs_~ (js)<fj3ojl ?OjsCc iECe jlECe

Now, setting x = 1 in the form obtained for gK+s(), we see that 1 =pK(i; O) + E P(i;j) + E PK(i;j)>= gi (C), we s e e that pK(i; O)+ pK(i;j)+ E K(i;j) E p(jl)+ jce .-(J j C) E K(i;j) E P(jl) >... >pK(i;O)+ SK,s+ j3ojl1o j7Ce jl Ce pK(i;j) E Pj(jl)... E pj-l(jS) jio j I o js o jECe j eC-e jsECe

The latter expression we denote by IK,S.

Lemma 5. To prove O(Ne) = 0, it suffices to show that, for every f > 0 there exists a K such that, for all s> 1, IK,s > 1 -pK(i; 0) - f

For then Sk,s<1-pK(i;O)-IK,S< 1-pK(i;0)-(1-p K(i; )--f) = f.

We must now consider more closely than we have done hitherto the transformations T(n) = nM/s(nM) and S(n) = nM/((nMII, each of


108

which we regard as operating on the set of non-null vectors with nonnegative components. One sees easily that the k-th iterates are given by Tk(n) = nMk/s(nMk) and Sk(n) = nMk/lInMkll. We have already denoted by v the characteristic vector of norm 1, for which vM = rv, and for our present purposes we let v = v/s(v) so that s(v) = 1, and v)M = rv. Clearly Tk(v) = v and Sk(v) = v. The transformation T(n) is a mapping onto the plane s(x) = 1,S(n) is onto the sphere llxll = 1. Clearly Sk(n)/s(Sk(n)) = Tk(n) while Tk(n)/IITk(n)lI = Sk(n). We first note some trivial relations.

Lemma 6. If j has non-negative components, then llll <s(j) For ljl2 =jEj (EJ)2S 2 j)

Lemma 7. If j is an arbitrary vector, then lljll >ls(j)l/t>s(j)t. For lljll = ( j2)2 > l l, m = 1,...t, and tlljll > E IJm I > I I Is(j)l > s(j).

Lemma 8. If x and y are arbitrary vectors, ( xiyi)2 < 11xll2 I11y2 . This is the well-known Schwarz inequality.2

Lemma 9. If x is an arbitrary vector, then llxMjl<IxlIl(Eij mi .)2 --X IIXi M. 2 2 For, [ [xMII2 = Ej(Eiximij)2 < EjEiXi iT Ij- II22 We have used the Schwarz inequality here.

Lemma 10. If n is a vector with non-negative components, then \lnMll> llnll( min Ej m'2.) 2 -= \nlla.l For IInM2 = Ej (Einimij)2 >Er_jEii = Ein E,m >Ei min Ej m = lin 112. .

We have now the means of establishing the existence of a Lipschitz constant for the transformation S(n).


109

Lemma 11. There exists a constant L > 1 such that IlS(n) - S(n')ll<Llln-n'll for all vectors n, n' which are non-null and have non-negative components, lln'll being unity.

For IIS(n) - S(n')ll = \lnM/lInMil - n'M/Iln'MIIll<nM/InMI\nM - nM/II II llnMIIn'Mlll + IMnMIIIn'M -n'M/ln'M = 1l/l1nMIl - 1/lln'Mlll · IlnMII + lInM - n'Mil/lln'Mi = \llnMIJ - lln'MI\I/ln'MII + IlnM - n'Mll/lln'MII<2))(n - n')MIll/ln'MIJ< 2arlln - n'll/lln'llai, by Lemmas 9, 10. Now Iln'll = 1. Hence L - 2a/a1 serves.

Lemma 12. If n and n' are non-null, non-negative component vectors and lin'll = 1, then llSk(n)-Sk(n')l<LIln - n'll, k = 1,2,.... For JlS2 (n)-S2 (n')ll < LIIS(n)-S(n')l<L2l\n-n'll, and so on. Note that IISk(n')ll = 1.

The transformation T(n) has topological and algebraic advantages since its denominator is a linear functional (see I 3). However S(n) has simplicity from the point of view of norm inequalities. We work with whichever seems more adaptable to the current purpose. The following two lemmas give inequalities connecting the two operators.

Lemma 13. Let j and k be vectors with non-negative components and having s(j) = 1= s(k). Define j' = j/lljll and k' = k/llkll as their projections on the unit sphere. Then lil' - k'Il <2tllj - kJl.

For l--k'll - llJ/lIJi - lk/\\jll\\<J/llJIl -J/llkllll+lJ/llkll - k/llki\i = il/llJll - I/llkli) llJ ll + llJ - kl\/lIkll = llJlll- llkll/llkll + 11i - k\ll/kl < 21]j - kll/llkll <2tllj - kll/s(k) = 2tllj - kll. Here we have used Lemma 7.

Lemma 14. Let j' and k' be vectors with non-negative components and 1li'll = 1 = 1lk'1l. Define j = j'/s(j'), k = k'/s(k'), their projections on the unit plane s(x) = 1. Then llj - kll<(t + 1)flj' - k'll.

For llj-kll = lj'/s(j') - k'/s(k')|<\\j'/s(j') - j'/s(k')ll + llj'/s(k')- k'/s(k')\\ = Il/s(j') - 1/s(k')l I '11/1 + lli'11 + Ii'-k'll/s(k') = Is(j') - s(k') /s(j')s(k') + iJ' - k'll/s(k') = s(j'-k') /s(j')s(k') + llj' - k'll/s(k') But s(k')> Ilk'll = 1, s(j')> Ill'll = 1 by Lemma 6, and Is(j' - k')<tllj' - k'll, by Lemma 7.


110

Lemma 15. For every e > 0 there exists a K such that for all k>K and all non-null, non-negative component vectors n, lSk(n)-v- < .

We saw in 1 3, that for e/2t > 0 there exists a K such that for all k > K and all non-null non-negative component vectors n, IITk(n)-—vI < e/2t. Referring to Lemma 13, we see that Tk(n) and v are on the plane s(x) = 1, and are vectors with non-negative components. Moreover Sk(n) = Tk(n)/llTk(n)ll and v = v/ivill. Hence I{Sk(n)- vil<2tllTk(n)vl < e.

Thus the sequence of the transformations Sk(n) is uniformly convergent to v.

We saw in Lemma 11 that IIS(n) - S(n')\\ <Llin - n'll where L > 1. Examples show that S(n) need not be a contraction in the sense that the above relation holds for some L < 1. We can however show that there exists an iterate Sq(n) which is a contraction toward v. It seems simpler first to establish the analogous result for T(n).

We recall that the simplex {, . .., t} means the set of all vectors n = nij6 i , ni > 0, nrii = 1, that is, all vectors n = (nl,...,nt) with non-negative components and s(n) = 1. By the boundary of the simplex we mean all of its points having at least one component zero. Since vM = rv, s(v) = 1, and M has all elements positive, v can have no component zero and is therefore not a boundary point. One sees therefore that there is a minimum distance D > 0 from v to the boundary of the simplex {1, ..., t }.

Lemma 16. For every f > 0 there exists a Q such that for all q >Q, \\Tq(n) -v\< flln-vll for all vectors n with non-negative components and having s(n) = 1.

Proof. Let V = maxvi/minvii > 1. Let D = min {Ilb-vl} > 0 where b ranges over the boundary of the simplex {6, .. ., f}.Iff be given, we know that for the positive constant fD/Vt there exists a Q such that for all q > Q, TTq(n) - vI < fD/Vt for all non-zero n with non-negative components.


111

Now let n v be an arbitrary such vector with s(n) = 1. The halfray v + a(n - ), a > 0, cuts the boundary of the simplex {, . .., t} in a unique point b, and n may thus be written n= hv+kb,h>O, k>0, h+k=l1.

Thus for q>Q, \\Tq(n)- vIl - \\nMq/s(nMq ) - v1 = IlhivM + kbMq/hs(vMq) + ks(bMq) - v = hrqv+ kb hr + ks(bMq) - I = k IbMq - s(bMq)v I/hrq + ks(bMq ) = ks(bMq) I Tq(b) - v\l/hrq + ks(bMq ) <ks(bMq)fj[b - v\l/Vt(hrq + ks(bMq)) = fllkb - kOvl/Vt(hrq/s(bMq ) + k).

Now (lkb-kIvI = Ilkb-(l-h)vl( = (hv +kb-vll = Iln-vll. Also, note that min i )i miq < Ei viimq = rqvj< rq maxv, and hence Ei (mq)2 < (Ei mi)2 < r2qV2 and (E (r.j)2 ) < rqV. Thus s(bM) = j i bimt < E Ilbll(i (mEij)1) ) Ej (Ei (m )2) 2 <trqV.

Hence hrq/s(bMq)+k>h/tV+k = 1-k+ktV/tV = l+k(tV-l)/tV>l/tV. Thus we see that [[Tq(n) - v\I<flln - vIl.

Lemma 17. For every e > 0, there exists a Q such that for all q>Q, \ISq(n')-v\l<elln' -vll, for all n having non-negative components and lin'll = 1.

Fix e > 0. Define f = e/2t(t + 1). Then by Lemma 16, Q exists so that for all q>Q, ITq(n) - vll<elln - vl\/2t(t + 1) for all n with non-negative components and s(n) = 1.

For arbitrary n' with Iln'll = 1 and non-negative components, define n = n'/s(n'). Then we have \Sq(n')-v\\<2tllTq(n')-vl<2t\lTq(n)-vl<2telln-v\ll/2t(t+1) <e(t+l)lln'-vll/t+1 = elln'-vll. We have used Lemmas 13, 14, and the fact that Tq(n') = n'Mq/s(n'Mq) = nMq/s(nMq) = Tq(n).

Just as the possible failure of S(n) to contract all vectors toward v forced us to prove Lemma 17, so does the possible failure of the


112

transformation nM to increase the norm of n cause difficulties for which we must provide in the next two lemmas.

Lemma 18. There exists a constant e > 0 such that for all vectors n satisfying Iln-vll < e, one has lInMII > lln11(l+r)/2 = mllnll, 1 < m < r.

For the function R(x) _ IlxMll/llxl( is continuous at x = v and R(v) = r > 1. Thus for m = (1 + r)/2 < r there exists an e > 0 such that ln - vll < e implies R(x)>m.

Lemma 19. There exists a K such that for all k > K, UnMkI> lnll for all n, with non-negative components.

Note that if w is the right characteristic root of M, with s(w) = 1, Mkw = rkw, we have max w >j mk > _jmk.wj = rkwi > rk minw so Ej mrj > rkW where W =minw/max w. Now |nMk l > s(nMk)/t= Eij nimk/t= Eini Ejm /t > rkWs(n)/t > rkWlInIlt. Take K so rKW/t > 1.

By virtue of Lemma 5, it now remains to prove the final lemma for which we have prepared all the essential tools.

Lemma 20. Let e > 0, f > 0 be fixed. Then there is a K such that, for all s > 1, E pK(i;j), P,(jl) E Pj- (jS)> 1 -pK(i;O) - f. jio jlio js»oji-Cejl Ce jS Ce

Our proof is based upon a complicated construction of K which was, of course, obtained by looking from the other end. In spite of its glaring artificiality we give the direct route:

1. By Lemma 18, we fix e so that 0 < e < e and lln - vll < e implies }lnMl >mrllnll where m = (1 + r)/2 > 1.

2. By Lemma 17, there exists a q such that $qSq(n')-vll < ln'-v11/4 for all non-negative n' with Iln'll = 1. We determine an e satisfying the following conditions: a. 0 < 5 < e < e, and e < 1. b. 5 Lq- l < e/2, that is e < e/2Lq- 1 where L is the Lipschitz constant of Lemma 11, and q is the constant just defined.


113

3. Fix E > 0 so that a. E(1+LL2 +...+ Lq- 1)</2, b. m(1 - E) > 1, that is E < (m - 1)/m < 1. Note that we now have automatically, for all s<q, E(1 + L + L2 + ...+ Ls-2) + e Ls-'<E(1 + L + L2 +... + Lq-2) + 5 Lq-1 < e/2 + e/2 < e/2 + e/2 = .

4. Fix k so that a. $Sk((n)- vI < e /2 for all non-negative n / 0 (see Lemma 15). b. [InMkfl> Ilnll, for all non-negative n. (see Lemma 19).

5. We define dk = Ej( Ei (dik) ) , dI being the dispersion of gk with respect to xj, and define rnk = (minEj (mi)2)½ .

We now fix A > 0 so a. 4dk/ 2emjA < f/4. b. dl/E2 m2(1- e )A < 1. c. dim(l - E)/E2 m2(1 - e )A(m(1 - E) - 1) < f/2.

6. Fix r so that pr(i; n) < f/4, see I 3. O<llnll<A

7. Fix K = r + k. We contend that this K satisfies the condition of the lemma. First note that gK(x) = EpK(i;j)x1 ... = gr (Gk(x)) = pr(i; n) Pk,n(j)xl .. .xt, where (gk) 1 ... (gtk)l = E Pk,n(j)xl ...xt by definition. Setting x = 1, we may write 1 = pr(i; 0) + pr(i;n) Pk,n(j) + O<llnll<A j pr(i; n) E Pk,n(j) + pr(i; n) E Pk,n(j) Ilnl>A llj/llnMklJJ-vIl<


114

Now just as in I 3, we see that, for arbitrary R > 0, Z Pk, (j)R2 < Pk,n(j)j -nMk11 = Znidj|j-NMk|>R ij lln ( (d4) ) 2 = -nlldk. J 2 Thus, Pk,(j)< llnlldk/R2 . Ilj-nMkl|>R Setting R = (InMkIe /2, we have 5 Pk,n(j)<411nldk/ e2 (lnMkl12 < 4(1nldk/ ( 11nl2m2k = Uj/ll|nMkIl-JSk(n)l> e /2 4dk/ 2m -lllnl, where mk = min ( n (m) )) (see Lemma 10).

However, if 11j/llnMkll - v||> then j is on the preceding range, for if not, \\j/llnMkl| - Sk(n) | < /2 and \Sk (n) -v | < 5 /2 by choice of k, whence 1ij/lJnMkJ1 - v|l< e, a contraction. Hence 5 Pk,n(j)< 4dk/2 mllInll. |ljlln/fMllk-vll> e Thus E pr(i;n)Pk,n(j) < pr(i; n) 4dk/ e 2m2A < Iln\l>A Ij/1\nMkll-v\I>e nll>A (f/4) E pr(i;n)<f/4. IlnII>A But by choice of r E pr(i; n)Pk,n(j)= p(i;n)<f/4. O<llnHJ<A j O<llnll<A

And, as always, pr(i; 0)< pr+k(i; 0). Thus we obtain from the original equation that Pr,(i; n).Pk,n(j) >1-pr+k(i;0) - f. nl>Aj/nMk-v< e


115

Every vector j involved on the range above has the property that ji/a - vii < 5 for some a > A, for lInMklI> llnll > A by choice of k. Thus Z pK(i;j)>1-PK(i; 0)-f lIj/a-v\l< e a>A Since 1lv l = 1 > e by choice of e, j = 0 is not on this range.

Note the two trivial remarks:

A. If llaj' - vii < e, a > 0, i'll = 1, then lij' - vll < 2e. For e > llaj' - vii > laj'll - ilvll l = a - 11, and lij' - vll = lij' - Jai' + Ilaj' - vii = I1 - al llj'll + e < 2e.

B. Corollary. If lij/a - vll < e and j' = j/lljll then (lj' - vll < 2e. Consider now the product pK(i; j) Pi(jl)  Pjl(j) { \\j/a-v\\< e \\j1\\jMj-Sll(j)\<E\\jj/\\jjMjl-S(j,)\\<E a>A E p_j (jq-l) pE _Pq- (j) *- (*) \3jq-1lj~jq-2M\\-S(jq-2) ||<E || j"/\j"-1M\\-S(j"-) ||<E

Since e < 1, E < 1, and Ilvli =1 = Sll(j)ll[ , we see that no j or ji is ever zero on these ranges. Applying Lemma 12 we find that lljq/lljMq-IM_-S(jq-l) I< E, S(jq)-S2 (j -) I< LE, 11(jq-)S3 (jq-3) < L E ... Ilsq-2 (j2)_- sq-l (j)l) 11 < Lq-2E \Sq-l(jl) Sq(j)\\ < Lq-'E, and hence ljq/lljq-lMI - _Sq(j)ll <E(1 + L + ... + Lq- l)


116

But since IISq(j/(ljl)- vl < /jlJljl - vIl/4 < 2 /4 <-/2 (see remark B and definition of q) we have|jq/lljq-MII- v|l< E(1 + L + .. + Lq- l) + /2 < /2 + /2 = 5. (see definition of E). Similarly we shall find, working from 2q to q, that llj2q/llj2q-lM -vl\ < 5 and so on for all multiples of q. Note also, for s<q, Ijs-/lljS-2Mi - S(jS-2) II < E S(js- -) - S2(j-3 ) l < EL... SS-2(jl) - Ss-l(j)I < ELS-2 \S`-l(j) -v\ <eL-1 , so lljs-1 lIj~-2MI - v\\ < E(1 + L + ... + L-2 ) + t LS-l <

(see definition of E). Similarly all jnq+r on the non-q-multiple postions are in Ce. It is clear now that the ranges for j, j,.. .are all in Ce c Ce and exclude zero. Hence the product of Lemma 20 is greater than or equal to that of (*) carried to the corresponding place.

Now note the elementary result that if l j/a-k 1 < eo where IlkI = 1, we have a - lljll = Ilakll -lljll<1ljlll - Ilak]ll<llj - akll < aeo and hence ll > a(1 - eo).

We have seen that, on the ranges involved in the (*) product, all j and js are in Ce, indeed \lj/a - vll < 5 < e, and jI/lljs-1 Mll - vll < e, all s. By choice of e, ll(j/a)MIl> mllj/all and II(js/lljS-Mll)MII>ml (js/lljs-lMll) 11. But then lljMll >mlljll and \ljsM\\> m\jsll. Thus we see that

lij/a - vll < 5 implies 11jll > a(1 -e ) > A)(1 - ). llj/(jMll - S(j)| < E implies jllj> lljMll(l - E)>m(1 - E)jllj > m(l - E)A(1 - 5), 1j2 /llj1 Mll - S(j)11 < E implies fij2 >jMll (1 - E)>m(l - E)[ jl | > m2 (1 - E)2 A(1 - ),..., js/ij-l'MIJl-S(j'-)1l < E implies ljs >m"(1 - E)8 A( - ).

Letting C = A(1 - ) and R =m(1 - E) > 1 by choice of E, we have on the (*) ranges, llill > C, ljjll > CR, lij21 > CR2 ... ljs-l > CRs-.


117

Now regard the distribution defined by g .g.. t=Pj-l, (j")xxl . .. . We see as before, for arbitrary n > 0, E P,_l(js) N2 < P(jjs)I-2 -J- E j-ldl < JJjs-js-1MJ(>VNIv Jj-1. dl, and hence 1 Pj-_(js) 1- IIs-1lld1/ .N2 lljl-j»-lMl)<N

Setting N = Ijj -1MII E for all s>1 EPP- (is) > 1 - lj-ll dl/E2lljs-lMj2 > 1- Ij1-dl 1/E2 Iij-11 2 . m l = 1 - di/E2 m llj~l I. Beginning at the last (s)-th term, we have S Pjs- (js)> 1 - di/E2 m2CR-1 > 0 Pj.s-2(jS- 1) > 1 - di/E2 m2CR-2 > 0 1IjS-1 /1jS- 2MII- (js-2) II<E 5 P (j')> 1-d/E2 mC > . 11|/11j11jM-S(j)11<E

Note that d1/E2 m\2CRi < d1/E2 m2C =di/E2 m2(1- 5)A< 1 by choice of A. Hence the (*) product through s factors is greater than or equal to (1-pK(i; 0)-f/2) (1-di/E2 m2C) (1-di/E2 m2CR) ... (1-di/E2 m2CRS-1).

It is trivially proved by induction that if all pi and 1 -pi are positive, then (1-pl)(l-p2) ... (1-pn)>1 -p-p2 -.-Pn,. Hence the preceding product is greater than or equal to


118

- pK(i; 0) - f/2 - (di/E2 m2C) (1 + 1/R + /R2 + . .. + 1/R-1 ) > 1 - pK(i; 0) - f/2 - di/E2 mC(1 - 1/R) = 1 - pK(i; O) -f/2 - diR/E2m2C(R - 1) -pK(i; 0) - f/2- m( - )/ - - E/ )A(m(1 - E)- 1) > 1 - pK(i; 0) - f/2 - f/2 = 1 - pk(i; 0)- f, by choice of A.Q.E.D.

VII—
Remarks On Systems below Critical

1. For a subcritical system (x° = 1), we have seen in V that m(T) = x = 1 so that m(T') = 0, and since N c T', O(N)<O(T') = m(T') = 0. Thus, in such a system, it is trivial that m(N) = 0. Similarly L C T' and m(L) = 0 = 1- x°. Therefore the SRT is self-evident for subcritical systems.

References

I

1. P. Alexandroff, H. Hopf, Topologie, Berlin, 1935 2. G. Birkhoff, Lattice Theory, Am. Math. Soc. Colloq. Publ. XXV, New York, 1940. 3. D. Hawkins, S. Ulam, Theory of Multiplicative Processes, see Chapter 1. 4. H. Margenau, G. M. Murphy, Mathematics of Physics and Chemistry, Van Nostrand, 1943.

II

1. C. J. Everett, S. Ulam, Multiplicative Systems in Several Variables, I, (LA 693) see Part I. 2. D. Hawkins, S. Ulam, Theory of Multiplicative Processes, see Chapter 1.


119

III

1. H. Cramer, Mathematical Methods of Statistics, Princeton University Press, 1946.

2. C. J. Everett and J. R. Ryser, The Gram Matrix and Hadamard Theorem, Am. Math. Monthly, LIII, 1946.

3. C. J. Everett, S. Ulam, Multiplicative Systems in Several Variables, I and II.

4. W. Hurewicz and H. Wallman, Dimension Theory, Princeton University Press, 1941.

5. W. Sierpinski, Introduction to General Topology, University of Toronto Press, 1934.


121

4—
Heuristic Studies in Problems of Mathematical Physics On High Speed Computing Machines:
With John Pasta (LA-1557, 1953)

This report introduces some elementary but basic methods of a specific version of "brute force" approaches in problems of hydrodynamics which do not yield to analytical methods.

The ideas in this report were further developed, generalized, and modified by Frank Harlow, and continue to play an important role in modern computer calculations in continuum mechanics. (Author's note.)

This paper, the first of a series, is intended primarily to illustrate some possible uses of electronic computers as a means of performing "mental experiments" on mathematical theories and methods of calculation, for a variety of physical phenomena.

There is no unifying principle in the few problems selected for this first report; on the contrary, the aim is to illustrate a certain variety in the problems which one can consider by performing model calculations. Mathematically, these problems are mainly attempts to solve special cases of various partial differential equations; in some cases one prefers to have direct recourse to the physical formulation and perform, as it were, the model experiments on paper rather than solving, by standard conversion of differential to difference expressions, the equations themselves. One general tendency might be notedwe stress an attempt to "observe" a few functionals of the unknown


122

functions, rather than to put credence in the solutions themselves "at each individual point."

The calculations were performed on the Los Alamos electronic computer. It is a pleasure to express our thanks to N. Metropolis for making it available and for his generous help in general. We are indebted to Miss Mary Tsingou who performed much of the work of "coding" the problems, running the machine and checking the procedure and results.

One purpose of this work was to gain a feeling for: first, the time necessary to formulate and prepare the problems for such machine study; second, the time of computing as a function of the size and complexity of the formulation. In general, we restrict ourselves in this first paper to problems where meaningful results could be obtained in a few hours of actual computing.

All the problems discussed in this paper were run on our machine. We append a small selection of the results which we intend to discuss at greater length in our second paper. We want to thank Mrs. Connie Snowden for preparation of the graphs.

1—
Hydrodynamical Problems; Heuristic Considerations

In general for problems involving two or more spatial variables, there is at present little hope for obtaining solutions analytically-in closed form. In the several sections that follow, heuristic considerations are set forth with exploratory calculations on the high speed computing machines performed in several cases. The purpose of these was primarily to establish the feasibility of such calculations, to estimate the sizes of problems which can be handled in a reasonable time, and in general to gain experience in the new methods and new fields which, in our opinion, are now open for investigation.

Our approach to the problem of dynamics of continua can be called perhaps "kinetic"--the continuum is treated, in an approximation, as a collection of a finite number of elements of "points;" these "points" can represent actual points of the fluid, or centers of mass of zones, i.e., globules of the fluid, or, more abstractly, coefficients of functions, representing the fluid, developed into a series, e.g., Fourier or Rademacher series. This corresponds to the use of general Lagrangian coordinates in classical mechanics; their use can be rigorously justified in problems where entropy is constant-i.e., holonomic systems.

They are always functions of time, which proceeds by discrete intervals.

We thus replace partial differential equations or integer-differential equations by systems of total differential equations. The number of


123

elements which we can at present handle is always less than 1,000, the limitation being primarily in the "memory" of the machines, the number of time intervals ("cycles") of the order of 100.

The problems which we study are characterized by lack of symmetry. The positions of our points become, as a rule, more and more irregular as time goes on. This has the consequence that the meaningful results of the calculations are not so much the precise positions of our elements themselves as the behavior, in time, of a few functionals of the motion of the continuum.

Thus in the problem relating to the mixing of two fluids, it is not the exact position of each globule that is of interest but quantities such as the degree of mixing (suitably defined); in problems of turbulence, not the shapes of each portion of the fluid, but the overall rate at which energy goes from simple modes of motion to higher frequencies; in problems involving dynamics of a star cluster, not the individual positions, but quantities like angular momenta of subsystems of the whole system, of size smaller but still comparable to the whole system, etc.

Needless to say, our investigations are in a most preliminary and rudimentary state; we have not made rigorous estimates of the disturbing or smoothing action of the roundoff errors which accumulate, nor the effects of finiteness of the time interval (i.e., the errors due to replacement of differential by differences expressions). In the individual cases that follow the reader will be able to judge for himself how far elementary common sense permits to estimate these effects.

We hope in the future to multiply considerably the number of calculations performed for each of our problems to assay the influence of changes in initial conditions on our conclusions. We repeat that so far the value, if any, of this work is only heuristic.

The first problem considered concerns the behavior of a gas confined in a vessel, expanding into vacuum under its own pressure and weight. The surface is not plane but has an irregularity of a finite size. In other words we consider the problem of instability in the compressible case.

2—
Instability and Mixing

The conditions at time t = 0 are the following: a vessel (twodimensional) contains a gas filling it partly, with a boundary against a vacuum. This boundary is not flat but has an irregularity in it in form of a triangular prominence jutting out with dimensions comparable to the diameter of the vessel (about one-fourth of its width).


124

We want to follow the behavior of the expansion of this gas, under its own pressure, in the vacuum below it. Two problems were considered:

1) the gas was assumed weightless, i.e., there was no external force acting on it;

2) the gas, in addition to its own pressure, was acted upon by a constant gravitational field.

The hydrodynamical setup used was the following: the gas was represented by 256 material points. These represent centers of mass of regions in the gas. The treatment is Lagrangian; that is, the calculation follows, in time, the position of these masses. The pressure gradients are represented by forces which our points exert on each other. These are repulsive forces, depending only on the distance between points (and thus having a potential "pressure") and are of the form Fij = a/rjI; where the exponent x depends on the adiabatic equation of the state of the gas considered.

We chose oc = I for our first problem. Given a point Pi (xi,yi), one considers all forces exerted on it by other points (Xj, yj) and computes their resultant vector.

Actually, we limit the pj in computing the forces to the "neighbors" of pi; these are defined as the nearest eight points to pi. This is done for two reasons: the economy of the computation-we have to calculate only eight instead of 256 of the total number of forces for each point under consideration; the second, more fundamental reason is, of course, that in the gradient of pressure only the local configuration matters. The 1/r force law gives divergence at infinity. We might mention here parenthetically that in the search or scanning of points for the nearest eight to the given one, the following was adapted for purely practical, economy, reasons. The points really scanned were 50 "candidates" for the nearest neighbor. They were the 50 originally nearest to pi; the problem was not run long enough so that we had to relocate the original candidates but it is, of course, possible to recheck this periodically. In addition, in order to avoid the use of multiplications and operate


125

only with the much shorter addition times, we used, in the search for the nearest eight points, the non-Euclidean metric p(vi, vj) =I(xi xj) I + I (Yi- yj) Once the eight points were found, however, the true Euclidean distances were computed in order to find correctly the resulting force.

We shall not describe here the special treatment which has to be accorded to points which adjoin the walls of the vessel and the points on the boundary of the gas (with the vacuum).

Among the quantities that were printed as the results of the calculation we shall mention here only these: an interesting functional of the motion is the kinetic energy of the gas divided into two parts: the energy of the motion in the x direction and in the y direction. We study E1 = 1/2 Emi.2 and E2 = 1/2 Emi]2.

The ratio of the two is a function of time and can serve, in a way, as a measure of mixing or irregularity of the motion. One expects due to the initial irregularity of the boundary this ratio to be positive. From the beginning, sidewise motions ensue. Later on one would expect, of course, the motion to be predominantly downward; as the irregularity increases, the ratio of the two quantities should, in Problem I, increase again and approach a constant less than unity.

It is perhaps remarkable that the time behavior found for this ratio was extremely regular; a graph (Fig. 1) for the first 36 cycles is appended.

One word to explain the need to resort to the rather unorthodox procedures outlined above:

It was found impractical to use a "classical" method of calculation for this hydrodynamical problem, involving two independent spatial variables in an essential way (since the gas interface had an irregularity assumed from the beginning). This "classical" procedure, correct for infinitesimal steps in time and space, breaks down for any reasonable (i.e., practical) finite length of step in time. The reason is, of course, that the computation of Jacobians which define the compression assumes that "neighboring" points, determining a "small" area, stay as neighbors for a considerable number of cycles. It is clear that in problems which involve mixing specifically this is not true. Calculations were made just in order to observe the rapidity of change in the "neighbor" relations on the classical pattern and have shown just what was expected to happen: the proximity relations change radically, for points near the boundary, just when the mixing to be studied is starting and the neighboring relation of our points has to be redefined; i.e., the classical way of computing by referring to initial (at time t = 0) ordering of points becomes meaningless.

The problem can be treated, of course, using the Eulerian variables (of a set-up due to von Neumann to be discussed in our next report)* where this difficulty does not occur. This Eulerian treatment

* There was no subsequent report. (Eds.)


126

is not suitable, however, for the study of the shape of the boundary-a fictitious, i.e., purely calculational, diffusion and mixing obscures the very phenomenon one wants to study. We shall return to this question in our second report.

In the "classical" set-up involving calculation of Jacobians for determining pressure gradients the cycle was approximately two minutes.

In the problem involving the calculation of forces from the eight "neighbors," the cycle time was three minutes. Meaningful results are obtained in about 100 cycles. Appended are graphs showing the results of a few dozen cycles.

3—
Billowing Transformations

The aim of the model calculation here is to exhibit transformations of space which, when applied iteratively to a sphere (or, actually, so far, to a circular region in a plane) will show a sequence of regions, imitating the familiar phenomenon of a ball of smoke billowing outwards. The parameter of iteration is, of course, the time again increasing by discrete intervals.

We assume that the billowing is due to initial irregularities or deviations from a spherical form.

The gas is contained in a region R, and is under internal pressures. We assume first, for simplicity, that it expands into vacuum and we want to study the motion in a highly stylized form-keeping, we hope, the quantitative phenomena essentially correct. The situation then is the following: R is the region occupied by the gas under pressure in the initial position. The following assumptions are made and made plausible qualitatively:

1. The accelerations of points on the boundary are in the direction of the outer normal to the boundary.


127

2. The motion is computed only for the points on the boundary above; the accelerations at each point depend on the shape of the boundary which is approximately correct in the following approximations:

The density distribution and the pressure well inside the ball is essentially uniform-if the time elapsed since the origin of our ball is long enough-i.e., the instantaneous changes in the position of the particles on the boundary are small compared to the dimensions of our object. "The pressures inside have had time to equalize."

3. On the boundary itself-the accelerations along the normal to each point depend on the curvature at each point in the following way:

There is a positive term outward in n with an additive term whose sign is that of the curvature c of the boundary at the point which is considered. This is due to the convergence (or divergence) effects of the streamlines.

E.g., in case 2 the local density at P is probably greater than in the nearby points because of converging motions of points in a band near the surface; in case 1 there is correspondingly a local rarefaction. So we set ntt=p + ac (1) where c is the curvature of the boundary. The terms p and a are not constant but vary, say, with the volume enclosed by the boundary-the pressure diminishing, e.g., inversely with the volume.

A qualitatively similar formulation, (in a discussion with John von Neumann) in the small, would be this: consider the part of the curve given by y = f(x)


128

f is positive and the direction of expansion is upward. We write Yttt = Yxx (2)

with the initial conditions at t = 0, ytt = 1, yt = 0, say, and y = f(x). The tendency is again the same in convex points. The accelerations

upwards are decreased, while in concave ones (from which one expects "jetting") they are increased.

Setting y =- (t) X (x) and say X (x)= cos (nx - /), one gets f"'(t) = -n 2). Setting ) = e"(t), a is the cubic root of -n2. It follows, since the dominant part will be due to the root with positive real and imaginary parts that the motion is highly unstable. A kind of billowing or swirling will take place--the concave parts will puff out, giving rise later to at least two concavities on its sides. This will repeat the same phenomenon later, etc. It would seem then that a multiplication of irregularities takes place-their number will increase rapidly-accompanied with a continuous increase in size.

In three dimensions the problem is much harder to treat. The additional term in the acceleration must be set up in terms of the two principal radii of curvature at each point.

Our computation program started in two dimensions as follows. Initially the boundary is taken to be composed of sixty-four equal segments defined by points (xi, Yi) on the surface. The magnitude and sense of the curvature at (xi, yi) is derived from the cross product of the two vectors (xi - xi,_, i - yi-1i) and (xi+ 1- xi, Yi+1 - yi). From the coordinates (x, y) at two consecutive time levels the positions at the next time level are found by integrating ntt = p +ac.

Because of the difficulty of plotting many points every time and in order to observe the motion, it was decided to display the points on one of the cathode ray tubes of the computer memory section. Each such tube has a 32 x 32 array of points, each representing a binary digit in one of the machine's 1024 forty bigit numbers. There are forty such tubes. The most convenient tube to use is tube number one, the "sign bigit" tube. For the proper display it is convenient that all constants of the problem be taken as positive, which may be done without loss of generality. In the computer used here, machine orders are independent of the bigit in the sign position. Thus, the abscissa and ordinate of a point are transformed into an instruction for changing the sign of the appropriate memory position, which, in turn, lights that spot on the tube face. In this way a picture of the boundary is "painted" on the tube face. The picture is displayed for a fixed time, is erased by dropping the sign of all memory positions, and a picture of the new position is then plotted and displayed. A scaling routine keeps the


129

surface within the confines of the tube matrix.

This procedure should be useful in many problems of motion of gases or liquids.

It is certainly of value in obtaining a quick overall check on the correctness of the code; reasonableness of the time intervals, etc.

The cycle time on this problem is of the order of ten seconds. No printing is involved. Meaningful results are obtained in a matter of minutes.

4—
Problems on Rotational Motions in Gravitating Systems

An interesting set of questions in statics concerns the properties of moments of forces exerted on each other by randomly distributed points forming a system S. In dynamics the questions concern angular momenta of subsystems a contained in S; as a function of the time.

Let us imagine the following situation: E consists of a number of mass points mi, m2, ... mn located at t = 0 at positions rl,... rn given at random, say, in a unit sphere (let us assume, for example, a uniform probability distribution for the position of each point). We assume further Newtonian attractive forces Fij acting between any two points mi, mj. Denote by Gi the sum over all j of forces acting upon the point mi and let us imagine the vector Gi applied at the point ri. It is clear that the sum over all i of Gi is equivalent to zero. What we propose to study is, at first, the statistical behavior of the forces Gi if summed over subsystems a of the whole set with the following questions to be investigated: let p be any number and let us consider subsystems located in a circle with radius p and an arbitrary center ro. Let us form the sum of all Gi located in such and we obtain, in general, a single force $ and a couple P referred to ro with magnitude which we shall call 0. Both ( and 0 thus computed are functions of p and ro but we can integrate these quantities over all initial positions ro and will obtain )p and Op, a single force and moment, which will be now functions of p alone. It is our aim to obtain these functions for a random dynamical system, that is to say, the expected values in a random distribution. This can be obtained in practice by computing, on the machine, these quantities for a large number of systems each chosen by a random process. Our statistical computations will be confined at first to plane

cases, the three dimensional systems requiring too great a memory at present.

The next, more interesting, thing to study is the following. In a situation as described above at time t = 0 let t increase. Motions of our


130

points will ensue and we intend to investigate the angular momenta of subsystems o as functions of the size of a and of $q in time.

Such a situation is perhaps exemplified by star clusters. What we want to study are dynamical systems with many particles, but not gases; that is to say, by mean free path for "collision" we mean an appreciable change in the velocities due to gravitational forces acting between just two of our mass points. It is known that clusters or galaxies possess a rotational motion as a whole. These could, perhaps, originate as follows: the original distribution of matter now separated in galaxies was more or less uniform and random-like. Our system E can be imagined infinite. Finite subsystems have angular momenta as a result of fluctuations in the distribution and then, if fluctuations of density occurred also, some subsystems would isolate themselves, stay together due to gravitation (cf. the work of Jeans) and if the whole space expanded with time these condensations would have kept all or most of their angular momenta due to the original fluctuations in the system of vector forces and as the condensations receded from each other, their non-zero angular momenta would have stayed constant in time. Another way to look upon the problem is to study the distribution of vorticity of finite subsystem a in a very large or infinite system E whose points exert forces on each other.

Our proposed calculations consist then of producing a sizeable number of randomly chosen systems E and following the behavior of subsystems in time, i.e., using a discrete series of time intervals or "cycles" on the machine and computing the following set of averages: L(ro, p), the angular momentum as a function of the radius of the subsystem p and position ro. For a system of 100 points four positions along a radius with four values of p at each point are adequate. The running time per cycle for 100 mass points is less than five minutes with this machine. The running time increases as the square of the number of mass points, but statistics can be improved by running many problems with few mass points, in preference to the less economical method of increasing the number of mass points.

The total kinetic and potential energy is calculated. The system should stabilize at some given size. In this steady state the number of double and triple "stars" is of interest. The total energy serves as a check on the problem.

The moment of inertia of the complete system is calculated, and finally the number of particles in each of the sixteen subsystems mentioned above. The latter numbers permit an approximate plot of the density distribution of the system. The largest value of p is chosen to be of the order of the dimension of the system.

Several hundred cycles will be necessary for results on this problem.


131

Appended are graphs (Figs. 2 to 5) showing the results of 100 cycles run so far.

5—
Magnetic Lines of Force

It appears that our computing machines are especially well adapted to the study of properties in the large of the system of lines of magnetic force, due to given currents in space. The renewed interest in the qualitative, ergodic, or even just topological behavior of such families of lines is due to studies in magneto-hydrodynamics, applications in astronomy, questions of origin of the cosmic rays (Fermi), not to mention the importance of such knowledge for applications in the construction of high energy accelerating machines (cyclotrons, synchrotons, etc).

In order to study this subject systematically, it is best to consider first steady currents flowing through given lines (wires) in space. If there is only one current through a straight line extending infinitely, the system of lines of magnetic force is, of course, very well known. They form circles linking the line; the same is, of course, true for a current in a single closed circle.

However, the topology of the system of lines of force seems to be very complicated, and the ergodic behavior of single lines of force is unknown, for the case where the single closed curve through which the current flows forms a knotted loop, say in the simplest case a clover leaf knot. Some single lines of force will probably be dense on two-dimensional surfaces; probably some singularities in the field of lines exist which are independent of the metric appearance of the knot but are present necessarily in every topological knot of this sort. There seems to be little hope of obtaining analytically closed expressions describing the system of lines of force.


132

The situation is complicated in the case of two given currents. It is easily seen, in the case given below, that almost all lines of force will be bounded, not closed, and each dense on a surface! Let the two currents flow as follows: current 1 on a straight line, say the z axis, current 2 on the circle x² + y² = 1. In general, except at points where the ratio of the two current strengths is rational, a line of force will exhibit an "ergodic" behavior on the surface of a torus.

We propose to investigate, on the computing machines, the properties of lines of force due to two currents, each on a straight line extending to infinity. The two lines are skew: T_1 flows on the y axis, T_2 on the line z = d, y = 0.

One is interested, among other things, in the following questions: Do there exist lines which, although not closed, cross the surface of a fixed sphere infinitely many times? Are there lines going arbitrarily far from a fixed point and returning to its neighborhood? Do there exist lines braiding or linking both given wires any number of times?

The computations of such lines of force do not involve much of the "memory" of the machine. The procedure is this: starting at a point (x_0, y_0, z_0) we compute the direction of the magnetic field, simply adding the two field strengths, given elementarily from each wire; we perform a "short" step (Δx, Δy, Δz) in the direction of the (constant in time!) force. The computation of this step is done as follows: we calculate a provisional set of increments (Δx)′, (Δy)′, (Δz)′ of the variables; in the new position we calculate a new set (Δx)″, (Δy)″, (Δz)″; we then take Δx = ((Δx)′ + (Δx)″)/2, Δy = ((Δy)′ + (Δy)″)/2, Δz = ((Δz)′ + (Δz)″)/2 and proceed anew. In general this way of solving a system of equations dx/X = dy/Y = dz/Z works well. In our case it is seen that each step is computed in the order of 50 milliseconds; to perform, say, 1,000 steps will take of the order of a minute. The idea is now that with the order of 10⁴ steps we shall be able to get some qualitative information about a single line of force as follows: it seems practical to take each step long enough so that in, say, 50 to 100 steps one complete "loop" can be described around a wire (in positions where it is expected that the lines of force surround the current). One would expect then to obtain a number of "loops" of the order of a few hundred.

The quantities printed as a result of each such calculation could be, for instance:

1. The number of "returns" of a line to a given sphere. One simply has to record on the machine the number of times our line crosses the surface of a given sphere.

2. The number of times and the sense in which a line loops the two wires separately and the number of loops surrounding both together. This can be done simply by computing the work done by moving on the line of force, calculating the loops around each wire, as if it alone had current flowing through it. The Gaussian looping coefficients for any two given curves in space can be quickly computed on the machine.

It is convenient to take the length of the "step" along the magnetic field vector to be inversely proportional to the magnetic field strength at that point. In this way the step is proportional to the distance from the wire since the magnetic field is inversely proportional to that distance. The number of steps per looping of the wire is then constant and the step is appropriately shorter at points where the curvature is greater.
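A minimal Python sketch of the procedure just described for the two skew wires: the field of each infinite straight wire is added, the provisional and corrected increments are averaged, the step length is taken inversely proportional to the field strength, and the crossings of a fixed sphere (item 1 above) are counted. The current strengths, starting point, sphere radius, and step constant are illustrative assumptions, with all physical constants absorbed into the currents.

```python
import numpy as np

def wire_field(x, point_on_wire, direction, current):
    """Field of an infinite straight wire: magnitude ~ current/distance, azimuthal direction."""
    u = direction / np.linalg.norm(direction)
    r = x - point_on_wire
    r_perp = r - np.dot(r, u) * u                 # perpendicular displacement from the wire
    return current * np.cross(u, r_perp) / np.dot(r_perp, r_perp)

def B(x, I1=1.0, I2=1.0, d=1.0):
    """Total field of wire 1 along the y axis and wire 2 along the line z = d, y = 0."""
    b1 = wire_field(x, np.zeros(3), np.array([0.0, 1.0, 0.0]), I1)
    b2 = wire_field(x, np.array([0.0, 0.0, d]), np.array([1.0, 0.0, 0.0]), I2)
    return b1 + b2

def trace(x0, n_steps=10_000, c=0.05, sphere_radius=3.0):
    """Follow one line of force; count its crossings of a fixed sphere about the origin."""
    x = np.array(x0, dtype=float)
    crossings = 0
    was_inside = np.linalg.norm(x) < sphere_radius
    for _ in range(n_steps):
        b1 = B(x)
        step1 = c * b1 / np.dot(b1, b1)           # provisional increment: length c/|B|, along the field
        b2 = B(x + step1)
        step2 = c * b2 / np.dot(b2, b2)           # increment recomputed at the new position
        x = x + 0.5 * (step1 + step2)             # averaged increment, then proceed anew
        inside = np.linalg.norm(x) < sphere_radius
        if inside != was_inside:
            crossings += 1
        was_inside = inside
    return x, crossings

final_point, n_crossings = trace([0.5, 0.0, 0.5])
```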

It is worthwhile to point out that some integer-valued topological invariants may be computed exactly even though we use difference expressions instead of differential ones and also have round-off errors. This is due to "ε-invariance" theorems on simplicial approximations in topology¹. Also, so to say, the field of "error vectors" is in general "curl-free."

Reference

1. K. Borsuk and S. Ulam, "Über gewisse Invarianten der Abbildungen," Math. Ann. 108, 311-318, 1933.


Fig. 2. Problem of the attracting points.

Fig. 3.

Fig. 4.

Fig. 5.

5—
Studies of Non Linear Problems:
With E. Fermi and J. Pasta (LA-1940, May 1955)

This report details the first attempt to study by computer experimentation the asymptotic behavior of nonlinear dynamical systems for which closed analytic solutions were not available. The results, both interesting and surprising, led to the examination of other examples of such physical systems (e.g., reports 6, 10, and 11). The impact of these pioneering efforts is reflected in continuing research and publications. The evolution of the concept of solitons, as well as much of the current experimentation (aided by vastly improved computer graphics) on chaos and turbulence, can be traced to this source. (Eds.)

Abstract

A one-dimensional dynamical system of 64 particles with forces between neighbors containing nonlinear terms has been studied on the Los Alamos computer MANIAC I. The nonlinear terms considered are quadratic, cubic, and broken linear types. The results are analyzed into Fourier components and plotted as a function of time.

The results show very little, if any, tendency toward equipartition of energy among the degrees of freedom.*

*After the untimely death of Professor Fermi in November 1954, the calculations were continued in Los Alamos. The last few examples were calculated in 1955.


140

This report is intended to be the first one of a series dealing with the behavior of certain physical systems where the nonlinearity is introduced as a perturbation to a primarily linear problem. The behavior of the systems is to be studied for times which are long compared to the characteristic periods of the corresponding linear problems.

The problems in question do not seem to admit of analytic solutions in closed form, and heuristic work was performed numerically on a fast electronic computing machine (MANIAC I at Los Alamos).* The ergodic behavior of such systems was studied with the primary aim of establishing, experimentally, the rate of approach to the equipartition of energy among the various degrees of freedom of the system. Several problems will be considered in order of increasing complexity. This paper is devoted to the first one only.

We imagine a one-dimensional continuum with the ends kept fixed and with forces acting on the elements of this string. In addition to the usual linear term expressing the dependence of the force on the displacement of the element, this force contains higher order terms. For the purpose of numerical work this continuum is replaced by a finite number of points (at most 64 in our actual computation) so that the partial differential equation defining the motion of this string is replaced by a finite number of total differential equations. We have, therefore, a dynamical system of 64 particles with forces acting between neighbors with fixed end points. If x_i denotes the displacement of the i-th point from its original position, and α denotes the coefficient of the quadratic term in the force between the neighboring mass points and β that of the cubic term, the equations were either

ẍ_i = (x_{i+1} + x_{i-1} - 2x_i) + α[(x_{i+1} - x_i)² - (x_i - x_{i-1})²],   i = 1, 2, ..., 64,

or

ẍ_i = (x_{i+1} + x_{i-1} - 2x_i) + β[(x_{i+1} - x_i)³ - (x_i - x_{i-1})³],   i = 1, 2, ..., 64.

α and β were chosen so that at the maximum displacement the nonlinear term was small, e.g., of the order of one-tenth of the linear term. The corresponding partial differential equation obtained by letting the number of particles become infinite is the usual wave equation plus nonlinear terms of a complicated nature.

* We thank Miss Mary Tsingou for efficient coding of the problems and for running the computations on the Los Alamos MANIAC machine.
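For concreteness, a minimal modern sketch in Python of the quadratic (α) problem just written, integrated with a simple leapfrog scheme. The values α = 1/4 and δt² = 1/8 echo the caption of Fig. 1 below (which used N = 32); here N = 64 as in the abstract. Everything else is an illustrative assumption rather than a reconstruction of the MANIAC code.

```python
import numpy as np

N = 64                            # mass points with fixed ends, as in the abstract
alpha = 0.25                      # quadratic coefficient (Fig. 1 below uses alpha = 1/4, N = 32)
dt = np.sqrt(1.0 / 8.0)           # time step chosen so that dt**2 = 1/8, as quoted in Fig. 1
i = np.arange(1, N + 1)

x = np.sin(i * np.pi / (N + 1))   # single sine wave; the string starts from rest
v = np.zeros(N)

def force(x):
    """Linear chain force plus the quadratic nonlinear term written above."""
    y = np.concatenate(([0.0], x, [0.0]))               # fixed end points
    linear = y[2:] + y[:-2] - 2.0 * y[1:-1]
    quadratic = alpha * ((y[2:] - y[1:-1]) ** 2 - (y[1:-1] - y[:-2]) ** 2)
    return linear + quadratic

def step(x, v):
    """One leapfrog time cycle of the calculation."""
    v = v + 0.5 * dt * force(x)
    x = x + dt * v
    v = v + 0.5 * dt * force(x)
    return x, v

for cycle in range(30_000):       # runs of this length are reported in the figures below
    x, v = step(x, v)
```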


141

Another case studied recently was

ẍ_i = δ_1(x_{i+1} - x_i) - δ_2(x_i - x_{i-1}) + c,   (3)

where the parameters δ_1, δ_2, c were not constant but assumed different values depending on whether the quantities in parentheses were less than or greater than a certain value fixed in advance. This prescription amounts to assuming the force to be a broken linear function of the displacement. This broken linear function imitates to some extent a cubic dependence. We show the graphs representing the force as a function of displacement in the three cases: quadratic, cubic, and broken linear.
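A small Python sketch of such a broken linear force law, written for a single relative displacement; the cut-off amplitude and the slope increase are borrowed from the caption of Fig. 6 below and are illustrative only. The equation of motion would then use F(x_{i+1} - x_i) - F(x_i - x_{i-1}) in place of the quadratic or cubic expressions.

```python
def broken_linear_force(delta, cutoff=2 ** -5 + 2 ** -7, slope_factor=1.25):
    """Broken linear interaction force as a function of the relative displacement delta.

    Below the cut-off the force is simply linear; beyond it the slope is increased,
    which imitates a cubic dependence. The cut-off and the 25 per cent slope increase
    are the illustrative values quoted later in the caption of Fig. 6.
    """
    if abs(delta) <= cutoff:
        return delta
    sign = 1.0 if delta > 0 else -1.0
    return sign * (cutoff + slope_factor * (abs(delta) - cutoff))
```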

The solution to the corresponding linear problem is a periodic vibration of the string. If the initial position of the string is, say, a single sine wave, the string will oscillate in this mode indefinitely. Starting with the string in a simple configuration, for example in the first mode (or in other problems, starting with a combination of a few low modes), the purpose of our computations was to see how, due to nonlinear forces perturbing the periodic linear solution, the string would assume more and more complicated shapes, and, for t tending to infinity, would get into states where all the Fourier modes acquire increasing importance. In order to see this, the shape of the string, that is to say, x as a function of i, and the kinetic energy as a function of i, were analyzed periodically in Fourier series. Since the problem can be considered one of dynamics, this analysis amounts to a Lagrangian change of variables: instead of the original x_i and ẋ_i, i = 1, 2, ..., 64, we may introduce a_k and ȧ_k, k = 1, 2, ..., 64, where

a_k = Σ_i x_i sin(ikπ/64).   (4)

The sum of kinetic and potential energies in the problem with a quadratic force is


142

E_i^kin + E_i^pot = (1/2)ẋ_i² + (1/4)[(x_{i+1} - x_i)² + (x_i - x_{i-1})²],   (5a)

E_k^kin + E_k^pot = (1/2)(ȧ_k² + ω_k²a_k²),   ω_k = 2 sin(kπ/128),   (5b)

if we neglect the contributions to the potential energy from the quadratic or higher terms in the force. This amounts in our case to at most a few per cent.

The calculation of the motion was performed in the x variables, and every few hundred cycles the quantities referring to the a variables were computed by the above formulas. It should be noted here that the calculation of the motion could be performed directly in ak and ak. The formulas, however, become unwieldy and the computation, even on an electronic computer, would take a long time. The computation in the ak variables could have been more instructive for the purpose of observing directly the interaction between the ak's. It is proposed to do a few such calculations in the near future to observe more directly the properties of the equations for ak.
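A sketch, under the same illustrative conventions as before, of the Fourier analysis performed every few hundred cycles: the a_k of Eq. (4), their time derivatives, and the mode energies of Eq. (5b). The overall normalization factor is an assumption of the example rather than a quantity fixed by the report.

```python
import numpy as np

N = 64
i = np.arange(1, N + 1)
k = np.arange(1, N + 1)
S = np.sin(np.outer(i, k) * np.pi / N)       # sin(ik*pi/64), as in Eq. (4)
omega = 2.0 * np.sin(k * np.pi / (2 * N))    # omega_k = 2 sin(k*pi/128), as in Eq. (5b)

def mode_energies(x, v):
    """a_k, its time derivative, and the mode energies of Eq. (5b).

    The overall factor 2/N is an assumed normalization for this illustration.
    """
    a = (2.0 / N) * (S @ x)
    adot = (2.0 / N) * (S @ v)
    return 0.5 * (adot ** 2 + (omega * a) ** 2)

# Example: the energy spectrum of a pure sine-wave shape at rest sits in the first mode.
x0 = np.sin(i * np.pi / N)
E = mode_energies(x0, np.zeros(N))
```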

Let us say here that the results of our computations show features which were, from the beginning, surprising to us. Instead of a gradual, continuous flow of energy from the first mode to the higher modes, all of the problems show an entirely different behavior. Starting in one problem with a quadratic force and a pure sine wave as the initial position of the string, we indeed observe initially a gradual increase of energy in the higher modes as predicted (e.g., by Rayleigh in an infinitesimal analysis). Mode 2 starts increasing first, followed by mode 3, and so on. Later on, however, this gradual sharing of energy among successive modes ceases. Instead, it is one or the other mode that predominates. For example, mode 2 decides, as it were, to increase rather rapidly at the cost of all other modes and becomes predominant. At one time, it has more energy than all the others put together! Then mode 3 undertakes this role. It is only the first few modes which exchange energy among themselves and they do this in a rather regular fashion. Finally, at a later time mode 1 comes back to within one per cent of its initial value so that the system seems to be almost periodic. All our problems have at least this one feature in common. Instead of gradual increase of all the higher modes, the energy is exchanged, essentially, among only a certain few. It is, therefore, very hard to observe the rate of "thermalization" or mixing in our problem, and this was the initial purpose of the calculation.

If one should look at the problem from the point of view of statistical mechanics, the situation could be described as follows: the phase space of a point representing our entire system has a great number of dimensions. Only a very small part of its volume is represented by the regions where only one or a few out of all possible Fourier modes have divided among themselves almost all the available energy. If our system with nonlinear forces acting between the neighboring points should serve as a good example of a transformation of the phase space which is ergodic or metrically transitive, then the trajectory of almost every point should be everywhere dense in the whole phase space. With overwhelming probability this should also be true of the point which at time t = 0 represents our initial configuration, and this point should spend most of its time in regions corresponding to the equipartition of energy among various degrees of freedom. As will be seen from the results this seems hardly the case. We have plotted (Figs. 1-9) the ergodic sojourn times in certain subsets of our phase space. These may show a tendency to approach limits as guaranteed by the ergodic theorem. These limits, however, do not seem to correspond to equipartition even in the time average. Certainly, there seems to be very little, if any, tendency towards equipartition of energy among all degrees of freedom at a given time. In other words, the systems certainly do not show mixing.*

The general features of our computation are these: in each problem, the system was started from rest at time t = 0. The derivatives in time, of course, were replaced for the purpose of numerical work by difference expressions. The length of the time cycle used varied somewhat from problem to problem. What corresponded in the linear problem to a full period of the motion was divided into a large number of time cycles (up to 500) in the computation. Each problem ran through many "would-be-periods" of the linear problem, so the number of time cycles in each computation ran to many thousands. That is to say, the number of swings of the string was of the order of several hundred, if by a swing we understand the period of the initial configuration in the corresponding linear problem. The distribution of energy in the Fourier modes was noted after every few hundred of the computation cycles. The accuracy of the numerical work was checked by the constancy of the quantity representing the total energy. In some cases, for checking purposes, the corresponding linear problems were run and these behaved correctly within one per cent or so, even after 10,000 or more cycles.

It is not easy to summarize the results of the various special cases. One feature which they have in common is familiar from certain problems in mechanics of systems with a few degrees of freedom. In the

* One should distinguish between metric transitivity or ergodic behavior and the stronger property of mixing.


144

compound pendulum problem one has a transformation of energy from one degree of freedom to another and back again, and not a continually increasing sharing of energy between the two. What is perhaps surprising in our problem is that this kind of behavior still appears in systems with, say, 16 or more degrees of freedom.

What is suggested by these special results is that in certain problems which are approximately linear, the existence of quasi-states may be conjectured.

In a linear problem the tendency of the system to approach a fixed "state" amounts, mathematically, to convergence of iterates of a transformation in accordance with an algebraic theorem due to Frobenius and Perron. This theorem may be stated roughly in the following way. Let A be a matrix with positive elements. Consider the linear transformation of the n-dimensional space defined by this matrix. One can assert that if x is any vector with all of its components positive, and if A is applied repeatedly to this vector, the directions of the vectors x, A(x), ..., A^i(x), ..., will approach that of a fixed vector x_0 in such a way that A(x_0) = λx_0. This eigenvector is unique among all vectors with all their components non-negative. If we consider a linear problem and apply this theorem, we shall expect the system to approach a steady state described by the invariant vector. Such behavior is in a sense diametrically opposite to an ergodic motion and is due to a very special character, the linearity, of the transformations of the phase space. The results of our calculation on the nonlinear vibrating string suggest that in the case of transformations which are approximately linear, differing from linear ones by terms which are very simple in the algebraic sense (quadratic or cubic in our case), something analogous to the convergence to eigenstates may obtain.
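A small numerical illustration of the Frobenius-Perron statement in Python; the particular positive matrix and starting vector are arbitrary choices made for the example.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 2.0]])        # any matrix with all positive elements

x = np.array([1.0, 0.5, 2.0])          # any vector with all components positive
for _ in range(50):
    x = A @ x
    x = x / np.linalg.norm(x)          # keep only the direction of A^i(x)

# x now approximates the unique positive eigenvector x0 with A(x0) = lambda*x0;
# the eigenvalue can be read off as a Rayleigh quotient.
lam = (x @ A @ x) / (x @ x)
```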

One could perhaps conjecture a corresponding theorem. Let Q be a transformation of an n-dimensional space which is nonlinear but is still rather simple algebraically (let us say, quadratic in all the coordinates). Consider any vector x and the iterates of the transformation Q acting on the vector x. In general, there will be no question of convergence of these vectors Q^n(x) to a fixed direction.

But a weaker statement is perhaps true. The directions of the vectors Q^n(x) sweep out certain cones C, or solid angles in space, in such a fashion that the time averages, i.e., the fractions of time spent by Q^n(x) in C, exist for n → ∞. These time averages may depend on the initial x but are able to assume only a finite number of different values, given C. In other words, the space of all directions divides into a finite number of regions R_i, i = 1, ..., k, such that for all vectors x taken from any one of these regions the percentage of time spent by the images of x under the Q^n is the same in any C.
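This conjecture invites a direct numerical experiment. The Python sketch below iterates an arbitrarily chosen quadratic map of the plane, normalizes the direction at each step, and tabulates the fraction of time spent in each angular sector; the map, the sectors, and the starting vectors are all illustrative assumptions, not taken from the text.

```python
import numpy as np

def Q(v):
    """An arbitrarily chosen quadratic transformation of the plane."""
    x, y = v
    return np.array([x * x - y * y + 0.1 * x, 2.0 * x * y + 0.1 * y])

def sector_frequencies(v0, n_iter=5000, n_sectors=8):
    """Fraction of time the direction of Q^n(v) spends in each angular sector."""
    sector = 2.0 * np.pi / n_sectors
    counts = np.zeros(n_sectors)
    v = np.array(v0, dtype=float)
    for _ in range(n_iter):
        v = Q(v)
        v = v / np.linalg.norm(v)                       # only the direction matters
        angle = np.arctan2(v[1], v[0]) % (2.0 * np.pi)
        counts[int(angle / sector) % n_sectors] += 1
    return counts / n_iter

# Comparing several starting vectors probes whether the time averages take only
# finitely many distinct values, as conjectured.
freqs = [sector_frequencies(v0) for v0 in ([1.0, 0.2], [0.3, 1.0], [-0.7, 0.4])]
```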


145

The graphs which follow show the behavior of the energy residing in various modes as a function of time; for example, in Fig. 1 the energy content of each of the first 5 modes is plotted. The abscissa is time measured in computational cycles, δt, although the figure captions give δt², since this is the term involved directly in the computation of the acceleration of each point. In all problems the mass of each point is assumed to be unity; the amplitude of the displacement of each point is normalized to a maximum of 1. N denotes the number of points and therefore the number of modes present in the calculation, α denotes the coefficient of the quadratic term, and β that of the cubic term in the force between neighboring mass points.

We repeat that in all our problems we started the calculation from the string at rest at t = 0. The ends of the string are kept fixed.


146

Fig. 1. The abscissa is t in thousands of cycles; the quantity plotted is the energy (kinetic plus potential) in each of the first five modes. The units for energy are arbitrary. N = 32; α = 1/4; δt² = 1/8. The initial form of the string was a single sine wave. The higher modes never exceeded 20 of our units in energy. About 30,000 computation cycles were calculated.


147

Fig. 2. Same conditions as Fig. 1, but the quadratic term in the force was stronger: α = 1. About 14,000 cycles were computed.


148

Fig. 3. Same conditions as in Fig. 1, but the initial configuration of the string was a "saw-tooth" triangular-shaped wave. Already at t = 0, therefore, energy was present in some modes other than 1. However, modes 5 and higher never exceeded 40 of our units.


149

Fig. 4. The initial configuration assumed was a single sine wave; the force had a cubic term with β = 8 and δt² = 1/8. Since a cubic force acts symmetrically (in contrast to a quadratic force), the string will forever keep its symmetry and the effective number of particles for the computation is N = 16. The even modes will have energy 0.


150

Fig. 5. N = 32; δt² = 1/64; β = 1/16. The initial configuration was a combination of 2 modes. The initial energy was chosen to be 2/3 in mode 5 and 1/3 in mode 7.


151

Fig. 6. δt² = 2⁻⁶. The force was taken as a broken linear function of displacement. The amplitude at which the slope changes was taken as 2⁻⁵ + 2⁻⁷ of the maximum amplitude. After this cut-off value, the force was assumed still linear but with the slope increased by 25 per cent. The effective N = 16.


152

Fig. 7. δt² = 2⁻. The force is again a broken linear function with the same cut-off, but with the slope beyond it increased by 50 per cent instead of the 25 per cent change of problem 6. The effective N = 16.


153

Fig. 8. The abscissa is the position of the mass point. This drawing shows not the energy but the actual shapes, i.e., the displacement of the string at the various times (in cycles) indicated on each curve. The problem is that of Fig. 1.


154

Fig. 9. The abscissa is t in thousands of cycles. This graph refers to the problem of Fig. 6. The curves, numbered 1, 2, 3, 4, show the time averages of the kinetic energy contained in the first 4 modes as a function of time; in other words, the quantity plotted is (1/ν) Σ_{i=1}^{ν} E_k(i), where ν is the cycle number and k = 1, 3, 5, 7.


155

6—
On the Ergodic Behavior of Dynamical Systems:
(LA-2055, May 10, 1955)

This is a lecture which is included in a series of lectures on the physics of ionized gases given in 1955. It presents ideas and remarks on general properties of ionized gases connected with the "Sherwood" project. "Sherwood" was one of the early attempts to use fusion for the peaceful production of energy by the confinement of thermonuclear reactions. (Author's note.)

The purpose of this lecture is to review the present status of the so-called ergodic hypothesis, summarize the mathematical results of the last twenty years or so, and indicate briefly the nature of the difficulties that still remain in applying the general theorems to specific physical situations.

As has been pointed out in previous lectures, the ergodic hypothesis can serve as a fundamental point on which to base the entire structure of statistical mechanics. (See, e.g., the derivation of the H-theorem from the ergodicity assumption by ter Haar.) My own feeling, to anticipate the conclusions of this lecture, is that the mathematical work of the last twenty years has brought complete rigor to only a small part of the theory of statistical mechanics, "the equivalent of the first twenty pages or so of a standard book on the subject." One could say here that, as is often the case, the mathematicians know a great deal about very little and the physicists very little about a great deal. The ergodic theorem itself, and the subsequent proof of the existence, and more, of the prevalence of ergodic transformations among all volume- or measure-preserving flows, assert, roughly speaking, the legitimacy of the assumptions one makes in physical theories about the limiting or equilibrium states of physical systems. The next question of importance, equally essential for our understanding of statistical mechanics, concerns the rate of approach to the equilibrium, if this approach, indeed, does take place. This problem seems much more difficult and only very incomplete results exist. In the second half of this lecture, I shall mention some recent results obtained with Fermi and Pasta on the would-be approach to equilibrium in certain simple nonlinear systems. These were obtained on the calculating machine here in Los Alamos.

To start with, we have a phase space located in a Euclidean space E of 6n dimensions. This is the phase space of a dynamical system which we shall assume, for most of the talk, to be conservative. The Hamilton equations define a flow in the space E. This flow preserves the volume in the space E. This is the theorem of Liouville, and it follows directly from the Hamilton equations. The measure or volume in the space is preserved exactly, not only infinitesimally to the first order, but, of course, for volumes of any finite size. I shall not discuss here the definition of the volume in the space E. It is obtained from the ordinary Euclidean volume in the 6n-dimensional space by the most elementary geometrical considerations. The space E represents the entire available phase space of the dynamical system. It is divided into a one-parameter family of subspaces E_k, corresponding to fixed values k of the total energy of the physical system. Each of these subspaces, of one less dimension than E, undergoes a flow into itself. Each of these spaces has its own volume, and this volume is also preserved under the dynamical flow. It is important to note that, in special cases, each of these spaces E_k may again be decomposed into a family of subspaces of still lower dimensions, each of which flows into itself. Indeed, if there are further integrals of the given dynamical problem in addition to the integral of energy, we shall obtain such further decompositions.

The ergodic theory deals with the asymptotic properties of a flow (volume-preserving in such spaces). One is interested in the behavior of trajectories of single points under the given flow. A single point represents one possible initial condition of the dynamical system. That is to say, the position of the n-particles at time t = 0 and all the velocities at this time. As time proceeds, this representative point will describe a line in the phase space. One is interested in how this line behaves in the given space Ek (or in case of existence of further integrals, in the "irreducible" subspace of it, Ek).

It is simpler, if only for typographical reasons, to consider, instead of a continuous flow in phase space, one mapping and a discrete sequence of time intervals. That is to say, instead of the family of transformations T_λ, where λ is any real number, we consider T^n, which means we look at the flow at intervals of one "second" each. Of course, we have in both cases either T_{λ+μ} = T_λ(T_μ) or T^{m+n} = T^m(T^n). These relations merely express that we have a one-parameter group flow or a sequence of powers of one transformation. All the mathematical results proved so far are valid in both formulations. We shall deal with the discrete one here.

The so-called ergodic theorem proved by G. D. Birkhoff asserts the following: the time averages of functions f(T^n(p)) exist for almost every point p if T is a volume-preserving transformation of a space on which a measure or volume is defined and f is any integrable function. In particular (and this is equivalent to the more general statement) it is true that if T is a transformation of the above sort, the limit

lim_{n→∞} (1/n) Σ_{i=1}^{n} f_A(T^i(p))

exists for almost every point p. Here f_A is the characteristic function of any set A in our phase space; that is to say, f_A(p) = 1 if p belongs to A and f_A(p) = 0 if it does not. The sum written above merely counts how many of the first n iterates of the point fall into the set A. The theorem asserts the existence of a sojourn time for almost every point p in any volume A in phase space. This theorem is necessary to have in order to formulate rigorously the Boltzmann hypothesis, according to which this sojourn time is equal to the relative volume of the region A in phase space. So, at least the existence of the sojourn time has been proved. (It should be pointed out here that a somewhat weaker form of the theorem which we have just stated, a so-called weak ergodic theorem, was proved several months before Birkhoff by John von Neumann. His theorem asserted the convergence of our sums in the mean, and not for almost every point p as did Birkhoff's, which is a stronger statement. The difference, however, may be of less importance to physicists than to mathematicians.)

Birkhoff noticed also that the hypothesis of Boltzmann is equivalent to the following property of the transformation T. There is no subregion or subset E′ ⊂ E which has positive measure less than the measure of the whole space and which goes into itself under the transformation T. Such transformations T are called by him metrically transitive.

One might say that the result of Birkhoff, as far as applications of the ergodic theorem are concerned, shifted the emphasis from the existence of limits to the search for transformations which would be metrically transitive and so satisfy the Boltzmann hypothesis. Only very special such transformations were known, and only on very special manifolds E. For example, if E should be the circumference of a circle in one dimension and T a rotation of it through an angle α, irrational with respect to the length of the circle, then the famous results of H. Weyl on the equipartition of the numbers nα modulo 1 (the circumference being of length 1) establish the metric transitivity in this case and the ergodic theorem with it.
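A short numerical check of this statement in Python (the rotation angle and the test arc are arbitrary choices): the sojourn frequency of the orbit in an arc approaches the length of the arc.

```python
import numpy as np

alpha = np.sqrt(2.0) - 1.0                   # an irrational angle on a circle of length 1
n = 100_000
orbit = (np.arange(1, n + 1) * alpha) % 1.0  # the points n*alpha modulo 1
sojourn = np.mean((0.2 <= orbit) & (orbit < 0.5))
# sojourn is close to 0.3, the measure of the arc, as the ergodic theorem asserts
```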

The generalization of this to n dimensions has the form of the well-known theorem of Kronecker about vectors of the form (kα_1, kα_2, ..., kα_n), k = 1, 2, ..., where the α's are rationally independent of each other. Another case known was that of a flow along geodesic lines on a surface of two dimensions and of constant negative curvature. This result asserted the transitivity of such a flow, i.e., its ergodic properties, and was established by Hopf, Hedlund, and others.

In 1941, a rather general result was established by Oxtoby and the speaker,¹ and can be described as follows: let E be a manifold in any number of dimensions with the volume defined for its subsets. Consider all possible continuous and volume-preserving transformations of such a manifold E into itself. Most of such transformations will be metrically transitive, that is to say, satisfy the Boltzmann hypothesis. The expression "most" is defined rigorously and, in particular, implies that arbitrarily near to any transformation T, given in advance, one will be able to find transformations which are metrically transitive. (Two transformations S and T are said to be within an ε of each other if, for every point p, S(p) and T(p) are within the same ε.) I shall not go into the technical definition of the word "most," etc., but would like to stress that the above result shows the existence of ergodic transformations on any manifold (e.g., sphere, ellipsoid, etc.), in any number of dimensions, and, what is more, the prevalence of such transformations among all possible ones. It is important to stress here that the transformations which are given by actual dynamical flows defined by Hamilton's equations could, nevertheless, form exceptions. An analogy: most real numbers are transcendental, but, of course, there exists a "small minority" of algebraic numbers. The question is still open whether or not the actual dynamical transformations are, in general, metrically transitive, i.e., ergodic. Our result above merely makes it very probable or, if one may say so, the a priori probability is equal to 1, that most of these transformations will be ergodic. It is certainly true that every flow may be perturbed arbitrarily little to become ergodic. The actual criteria, given a dynamical transformation, are, however, still lacking.

In what now follows, I shall discuss a number of examples of physical systems where the ergodic behavior seems a priori inevitable and which were considered during the last two years by E. Fermi, John Pasta, and the speaker. Numerical computations were performed on the MANIAC here in Los Alamos. The results were most unexpected to all of us, mainly because, as we shall see, they show a certain hesitancy on the part of these particular physical systems to behave ergodically, or, more exactly, the times taken for mixing or equipartition seem to be unduly long, if not infinite.

Before we take up these examples, we make two more remarks about the generality of the ergodic behavior of dynamical flows in phase space:

1) The subject of this seminar is mainly the study of behavior of charged particles or plasma in the presence of magnetic fields. The discussion given above applies to the case of motion of charged particles in a fixed given magnetic field if one neglects the changes in the field due to the motions of the particles themselves. A system of such particles acted upon by the external field and their mutual electrostatic interactions "should" behave ergodically independently of given constraints in form of walls, etc. The ergodic behavior has to be understood relative to the phase space after it has been reduced by a number of integrals of the motion imposed by such constraints.

2) It may be worthwhile pointing out here that the proof of the prevalence of ergodic flows among all possible volume-preserving continuous flows, as given in the paper quoted above, establishes more than the equality of the time and space averages for most transformations. The construction used in the proof exhibits the general transformation as having a "turbulent" character. That is to say, a roughly periodic motion which does not, however, exactly close the orbits of points in phase space, but feeds a large periodic motion into smaller "rotational" motions which in turn do not have the orbits quite closed but feed still smaller rotations, and this process continues indefinitely. Fourier analysis in n dimensions of such a transformation, which is the general one, would show the feeding of vortex-type flows into successively smaller vortices. This fact had not been noticed or exploited by the authors at the time the paper was written, but may be useful in discussing a statistical mechanical type of treatment for motions of fluids; that is to say, systems with infinitely many degrees of freedom which, in general, it is believed, tend to become turbulent. Parenthetically, we may add here that a statistical mechanical type of treatment of systems with infinitely many degrees of freedom is required if one wants to include the radiation effects due to the motion of charges. The magnetic field itself would have, of course, infinitely many degrees of freedom.

The problem which was studied numerically on the MANIAC is the following: we have a continuous string with ends kept fixed and with forces acting between its elements which are nonlinear as functions of displacements. This continuous string is replaced, for the purpose of the computations, by a finite number of points, 64 in our actual work, and the equations describing its motion were

ẍ_i = a(x_{i+1} + x_{i-1} - 2x_i) + β[(x_{i+1} - x_i)² - (x_i - x_{i-1})²],   i = 1, 2, ..., 64,

or

ẍ_i = a(x_{i+1} + x_{i-1} - 2x_i) + γ[(x_{i+1} - x_i)³ - (x_i - x_{i-1})³],   i = 1, 2, ..., 64.

β and γ were chosen so that at the maximum displacement x_i the nonlinear term was small, e.g., of the order of one-tenth of the linear term. The corresponding partial differential equation obtained by letting the number of particles become infinite is the usual wave equation plus nonlinear terms of a complicated nature. There seems to be, of course, very little hope of obtaining explicit solutions, and what was done was to run a great number of special problems.

These were studied as follows: the initial position of the string, that is to say, the distribution of the x_i at time t = 0, was assumed to be in the form of a single sine wave or in some other simple form, e.g., a sum of two sine waves of low frequencies, or triangular. Most problems were run under the assumption of constant masses (equal to 1, say) for each i. The position of the string and its total energy, kinetic plus potential, were analyzed during the course of each problem in Fourier series. That is to say, we studied the 64 Fourier coefficients A_k and the total energy residing in each mode, E_k, k = 1, 2, ..., 64, as functions of time. Since the problem is approximately linear, for times not very long the string vibrates periodically; the length of the numerical run of the problem was, in each case, equal to several hundred or more of what would be full vibration periods from the initial condition in the corresponding linear problem, if the nonlinear terms in the force had been neglected. Of course, each single vibration period in the calculation corresponded to a hundred or several hundred time cycles on the machine, so the number of computation cycles in each problem was several tens of thousands. The initial motivation for the problem was to observe how, in the course of time, the energy of the system, initially contained in the first or the first few modes, flows to other modes, and to observe the rate at which equipartition of energy among all modes becomes established. This problem was to serve as the first one of a series in a systematic investigation of the question of rates of approach to equilibrium, so important in many problems of statistical mechanics.

The actual results were somewhat surprising to us since, instead of what one expected (a continual and steady flow of energy, say from the first mode to all higher modes, and an asymptotically uniform approach to equipartition), something entirely different seems to happen. For example, in the problem where the initial position was a pure sine wave and all of the energy was in the first mode, the behavior of the system was the following: initially, as predicted by Rayleigh's perturbation analysis, that is to say, the infinitesimal study for short t, the other modes grow in energy one by one; the first mode feeds the second, the first and second together the third, the second feeds the fourth, and so on. This, indeed, was observed, but later on it was only single modes, say mode No. 3, that continued to grow in energy systematically, with the higher modes not growing at all for many dozens of the periods of the vibration. Then the energy in the third mode was dropping steadily and mode No. 5 was increasing. The first few modes, that is 1, 3, 5, 7, were exchanging energy among themselves slowly, with the higher modes not obtaining any sizeable contributions, and after 30,000 cycles, or 300 "would be" full vibrations of the string, the system came back, within one percent of the total energy, to its original shape, that is to say, a pure sine wave.

This behavior seems to be typical in other cases, too. It is the first few modes that exchange energy among themselves in a somewhat erratic fashion but it is always one or the other or very few of them that seem to predominate and, far from a tendency towards equipartition among all modes proceeding steadily, one sees an almost periodic exchange between the low modes. In the case of initially triangular shape where modes 1, 3, 5, 7, are mainly involved in the beginning, again these played among themselves and the string did not show, up to times where the computation was stopped, any tendency to become really turbulent or energy thermalized.

Another problem recently calculated was the following: instead of assuming the nonlinear term to be quadratic or cubic in the displacement, we took this nonlinear part to be represented by a broken polygon imitating the shape of a cubic (and also small compared to the linear term). This was done because the quadratic or cubic form of the forces might perhaps introduce some analytic peculiarities which could possibly explain this almost periodic and non-mixing, or only slowly mixing, behavior; a non-analytic form of the force would possibly remove this special character and the unknown analytic reason for the unexpected behavior of the motion. The results were again very similar. Starting with a single wave, only the first few Fourier modes exchanged energy among themselves. There seems to be one difference: the system does not come back to its original position, but again only the first few modes seem to exchange energy significantly. The behavior of this system thus may serve as a warning against relying too much on statistical arguments for an approach to equilibrium in systems of many degrees of freedom. These results cannot be discussed here in any detail, but a report on all the work done will be available shortly.* It was merely intended to point out the existence of cases where the a priori estimate, based on the usual volume-in-phase-space considerations and estimates of relaxation times, seems quite inadequate. Problems which are nonlinear, but still algebraic or "simple" in terms of the forces involved, may not be good examples of the general or random flows which Boltzmann or Gibbs had in mind, but instead show an almost periodic behavior, a slow transfer of energy between the degrees of freedom. More generally, one might suspect, on certain mathematical grounds, the existence, instead of states of a system as in the linear case, of quasi-states between which the system oscillates. These quasi-states apparently need not form a continuum or be too dense, but may, approximately, consist of combinations of a few states in the corresponding linear problem.

Reference

1. J. C. Oxtoby and S. Ulam, "Measure-Preserving Homeomorphisms and Metrical Transitivity," Ann. Math. 42, 874-920, 1941.

* This refers to the preceding report which was distributed after this lecture was delivered, but before this collection of lectures was issued. (Eds.)


163

7—
On a Method of Propulsion of Projectiles by Means of External Nuclear Explosions:
With C. J. Everett (LAMS-1955, August 1955)

This report outlines the methods and proposals which led to "Project Orion" on which vast efforts were expended at General Dynamics, Westinghouse and other places, by scientists like Ted Taylor and Freeman Dyson. Everett and the author hold a patent on the idea. (Author's note).

Abstract

Repeated nuclear explosions outside the body of a projectile are considered as providing a means to accelerate such objects to velocities of the order of 10⁶ cm/sec. A few schematic calculations are presented, showing the dependence of the mass ratios ("propellant" to the final mass), accelerations, etc., on the various free parameters entering in this scheme.

1—
Introduction

It is the purpose of this report to summarize certain considerations and proposals, some of which originated as long as ten years ago, and to discuss additional ideas concerning the attempt to attain, for unmanned vehicles, velocities in the range of the missiles considered for intercontinental warfare and even more, perhaps, for escape from the earth's gravitational field.


164

The methods most frequently proposed for obtaining such vehicles involve expulsion of material at high velocity from rocket motors. This ejected material is heated in the rocket itself, either by a chemical reaction or, in more recent schemes, by nuclear reactors.¹ In both cases there is a severe limitation on motor temperature and thus also on the velocity of the material ejected. The well-known exponential rocket formula* then demands impractical mass ratios for the attainment of final velocities V_f in the desired ranges, and multi-stage vehicles become necessary. The advantage of the nuclear rocket of this kind over the chemical type lies paradoxically not so much in its potentially enormous power source, which is limited by chamber temperature T to much the same range as chemical motors, but in its ability to use hydrogen as propellant, with molecular weight μ lower than the average of chemical reaction products,² thus permitting operation at higher specific impulse, which is a function of T/μ.

The scheme proposed in the present report involves the use of a series of expendable reactors (fission bombs) ejected and detonated at a considerable distance from the vehicle, which liberate the required energy in an external "motor" consisting essentially of empty space. The critical question about such a method concerns its ability to draw on the real reserves of nuclear power liberated at bomb temperatures without smashing or melting the vehicle.

General proposals of this sort were first made by S. Ulam in 1946, and some preliminary calculations were made by F. Reines and S. Ulam in a Los Alamos memorandum dated 1947. More recently, an additional idea was advanced, which consists in placing between each bomb and the rocket a "propellant" consisting of water or some plastic, which will be heated by the bomb and which will propel the vehicle during its subsequent explosive expansion. Some of the advantages of this proposal will be mentioned in the final section.

In any such device, one of the principal difficulties is the heating of the rocket by the propellant. We seem to encounter a situation in which the base of the rocket will be, periodically, at one second intervals, in the proximity of a very hot gas for durations of about one millisecond each. Study of the effects of such a variable wall temperature on various materials will be made and reported on subsequently.

The most recent idea is that the use of a sufficiently powerful magnetic field shielding the base of the rocket will have the effect of reflecting the (ionized) atoms of the hot propellant gas before they reach the rocket, thus avoiding heating of the base and incidentally gaining a factor on momentum transfer. It is hoped that this possibility

* M_0/M_f = mass ratio = exp(V_f/I), I = specific impulse.


165

also may be investigated at least schematically and reported on in Part II.* However, there appear to be many difficulties in such a study, involving the reaction of a plasma to the magnetic field. Whether the field strength required is impractically large remains to be seen. There is, it seems, the possibility of the formation of a powerful plasma current at the base of the rocket and a pinch effect, which may mean that the magnetic field becomes compressed to a smaller volume and the magnetic pressure considerably increased.

2—
Kinematics

In order to gain some quantitative insight into the elements of such a system, we propose to adopt a particular set of assumptions and to study numerically the effect of variation of parameters. The Eqs. (17) which follow are obviously highly tentative and subject to many questions here unresolved.

The vehicle is considered to be saucer-shaped, of diameter about 10 meters, sufficient at any rate to intercept all or most of the exploding propellant. Its final mass Mf is perhaps 12 tons, which must cover structure, payload, instruments, storage for propellant and bombs, and, if required, apparatus for maintaining the magnetic field. The initial mass M0 of the vehicle exceeds this by the mass of bombs and propellant.

The bombs are ejected at something like one second intervals from the base of the rocket and are detonated at a distance of some 50 meters from the base. Synchronized with this, disk-shaped masses of propellant are ejected in such a way that the rocket-propellant distance is about 10 meters at the instant the exploding bomb hits it. The propellant is raised to high temperature, and, in expanding, transmits momentum to the vehicle. The final velocity V_f is attained after N (~50) such explosions.

We regard now the i-th stage of the process. From the rocket, traveling at velocity V_{i-1} with respect to the earth, are ejected first the i-th bomb (mass m_B) and then the i-th mass of propellant m_p^i, at some small velocity v_0 relative to the rocket. It is supposed that, upon detonation, a certain fraction σ of the mass of the bomb collides inelastically with the ejected propellant mass. This fraction could be made, in our case, perhaps as much as 1/10, which is considerably more than the factor given by the solid angle. This could probably be achieved by a suitable distribution of the mass of the tamper surrounding the core

* Part II never appeared.(Eds.)


166

of the bomb. In this way, a larger fraction of the mass of the bomb would hit the propellant. (It is easy to make the distribution of the mass involved in the bomb explosion nonisotropic; the energy distribution is probably essentially isotropic.) If v_B is the average velocity of explosion of the bomb in the sector reaching the propellant, we have

σm_B(V_{i-1} - v_0 + v_B) + m_p^i(V_{i-1} - v_0) = (σm_B + m_p^i)V_p,

where V_p is the velocity relative to the earth of the center of mass of the combined system (σm_B, m_p^i). If we introduce a velocity v_p by means of the relation V_p = V_{i-1} - v_0 + v_p, we obtain

σm_B v_B = (σm_B + m_p^i)v_p.   (1)

The excess kinetic energy in this transfer is supposed to appear initially as thermal energy H_i in the propellant:

H_i = (1/2)σm_B(v_B)² - (1/2)(σm_B + m_p^i)(v_p)².   (2)

It is assumed that about half of this heat H_i reappears as kinetic energy of expansion of the propellant, with an expansion velocity v_E relative to its own center:

(1/2)H_i = (1/2)(σm_B + m_p^i)(v_E)².   (3)

We assume, arbitrarily, that in the expansion of the propellant, one half of its internal energy becomes converted to kinetic energy of expansion. This fraction depends obviously on the distance d and is, in our case, higher.

In our schematic computation we prefer to adopt this much too conservative value.

We may consider that the upper and lower halves of the exploding propellant travel with average velocities V_p ± v_E, respectively. Now Eqs. (1), (2), (3) show that

(v_E)²/(v_p)² = m_p^i/(2σm_B),

and since, in all cases we consider, m_p^i > 2σm_B, we have v_E > v_p. Thus

V_p - v_E = V_{i-1} - v_0 + v_p - v_E < V_{i-1},

and the lower half of the exploding propellant will not reach the rocket.

The momentum conservation equation for the rocket and the upper half of the propellant should read

(1/2)(σm_B + m_p^i)(V_{i-1} - v_0 + v_p + v_E) + M_i V_{i-1} = (1/2)(σm_B + m_p^i)(V_{i-1} - (-v_0 + v_p + v_E)) + M_i V_i,

or, simplifying,

(1/2)(σm_B + m_p^i) · 2(-v_0 + v_p + v_E) = M_i Δ_iV,

where M_i is the present mass of the rocket, and Δ_iV is the i-th increment in its velocity relative to the earth. This assumes total reflection of the propellant. To allow for side effects and imperfect reflection, we use the equation

(1/2)(σm_B + m_p^i)(-v_0 + v_p + v_E) = M_i Δ_iV.   (4)

Finally, we assume the time Δ_it for the i-th acceleration to be

Δ_it = 2d/(v_p + v_E),   (5)

where d is the distance from propellant to rocket. The i-th acceleration is thus

a_i = Δ_iV/Δ_it.   (6)

There are two cases of mathematical simplicity which we outline, and for which we include some numerical examples (Tables 1 and 2 for Cases 1 and 2, respectively).
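A minimal Python sketch of this single-stage bookkeeping, Eqs. (1)-(6); the numbers in the example (σ, the bomb and propellant masses, v_B, d, and the rocket mass) are illustrative assumptions in c.g.s. units, not values taken from the tables.

```python
import math

def stage(M, m_B, m_p, sigma, v_B, d, v_0=0.0):
    """Velocity increment, acceleration, and duration of one explosion, per Eqs. (1)-(6)."""
    m_hit = sigma * m_B + m_p                       # bomb fraction plus propellant
    v_p = sigma * m_B * v_B / m_hit                 # Eq. (1)
    H = 0.5 * sigma * m_B * v_B ** 2 - 0.5 * m_hit * v_p ** 2   # Eq. (2)
    v_E = math.sqrt(H / m_hit)                      # Eq. (3): half of H drives the expansion
    dV = 0.5 * m_hit * (v_p + v_E - v_0) / M        # Eq. (4), with the halved momentum transfer
    dt = 2.0 * d / (v_p + v_E)                      # Eq. (5)
    return dV, dV / dt, dt                          # increment, acceleration (6), duration

# Illustrative numbers: 12-ton rocket, 500-kg bomb, 1-ton propellant slab, 10-m standoff.
dV, accel, dt = stage(M=1.2e7, m_B=5.0e5, m_p=1.0e6, sigma=0.1, v_B=3.0e6, d=1.0e3)
```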


168

Case 1—
Constant Acceleration

We take as independent parameters:

V_f, the final velocity;
M_f, the final mass of the rocket;
N, the number of stages (bombs);
a, the acceleration at each stage (assumed constant);
d, the distance from propellant to rocket;
m_B, the mass of each bomb;
σ, the fraction of m_B hitting the propellant;

and show how all other parameters may be expressed in terms of these.

Thus each change in velocity will be

Δ_iV = V_f/N   (7)

over a time interval

Δ_it = V_f/(aN).   (8)

The propelling velocity v_p + v_E ≡ w is thus

w = 2d/Δ_it = 2aNd/V_f.   (9)

We now consider Eq. (4), setting

C = (1 - σ)m_B   (10)

and m_i = m_B + m_p^i, the total ejected i-th mass. Thus (4) becomes

m_i - C = k(M_0 - Σ_{j=1}^{i} m_j),   (4*)

where

k ≡ 2Δ_iV/w = (1/ad)(V_f/N)²   (11)

and M_0 is the initial mass of the rocket.

Writing the equation (4*) for i + 1 and subtracting shows that m_{i+1} = m_i ρ, where

ρ = 1/(1 + k).   (12)


169

Thus m_i = m_1 ρ^{i-1}, i = 1, 2, ..., N. We determine M_0 and m_1 as follows. Substituting Σ_{j=1}^{i} m_j = m_1(1 - ρ^i)/(1 - ρ) into (4*) shows that m_1(1 + k) = kM_0 + C, while, by definition,

M_0 - M_f = Σ_{j=1}^{N} m_j = m_1(1 - ρ^N)/(1 - ρ) = (m_1/k)(1 + k)(1 - ρ^N).

Eliminating M_0 between these two relations yields

m_1 = (kM_f + C)(1 + k)^{N-1},   (13)

and so

M_0 = [m_1(1 + k) - C]/k.   (14)

Thus we have trivially the i-th mass:

m_i = m_1 ρ^{i-1},   (15)

the mass ratio:

M.R. = M_0/M_f,   (16)

the total expelled mass:

T = M_0 - M_f,   (17)

the total bomb mass:

M_B = Nm_B,   (18)

the total propellant mass:

M_p = T - M_B,   (19)

and the i-th mass of propellant:

m_p^i = m_i - m_B.   (20)

Now, solving equations (1), (2), and (3) for v_p and v_E in terms of v_B, we get

v_p = σm_B v_B/(m_i - C)   (21)


170

and

v_E = [v_B/(m_i - C)] √(σm_B m_p^i/2).   (22)

Substitution into v_p + v_E = w yields

v_B = w(m_i - C)/[σm_B + √(σm_B m_p^i/2)],   (23)

whence the values of v_p and v_E may now be obtained, using (21) and (22), respectively.

Thus all parameters are determined in terms of the fundamental set V_f, M_f, N, a, d, m_B, σ. It is interesting to note that the mass ratio

[m_1(k + 1) - C]/[m_1(k + 1)ρ^N - C]

is (approximately in general, and exactly when C = 0) (1 + k)^N, where k = (1/ad)(V_f/N)², which indicates the extreme sensitivity of the mass ratio to a, N, d, and especially to V_f, in the constant acceleration case.
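A short Python sketch of this chain of formulas, Eqs. (10)-(16); the parameter values in the example are illustrative, chosen only to be of the general magnitude discussed in the text, and are not taken from Table 1.

```python
def case1_constant_acceleration(Vf, Mf, N, a, d, m_B, sigma):
    """Stage masses and mass ratio for the constant-acceleration case, Eqs. (10)-(16)."""
    k = (Vf / N) ** 2 / (a * d)                        # Eq. (11)
    rho = 1.0 / (1.0 + k)                              # Eq. (12)
    C = (1.0 - sigma) * m_B                            # Eq. (10)
    m1 = (k * Mf + C) * (1.0 + k) ** (N - 1)           # Eq. (13)
    M0 = (m1 * (1.0 + k) - C) / k                      # Eq. (14)
    masses = [m1 * rho ** (i - 1) for i in range(1, N + 1)]   # Eq. (15)
    return M0, M0 / Mf, masses                         # initial mass, mass ratio (16), stage masses

# Illustrative c.g.s. numbers: Vf = 1.2e6 cm/sec, 12-ton final mass, 50 bombs of 500 kg,
# about 10^4 g acceleration, 10-m propellant-rocket distance, sigma = 1/10.
M0, mass_ratio, masses = case1_constant_acceleration(
    Vf=1.2e6, Mf=1.2e7, N=50, a=1.0e7, d=1.0e3, m_B=5.0e5, sigma=0.1)
```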

A rough indication of the energy of the i-th bomb is given by the quantity k_B = (1/2)m_B(v_B)² included in the tables. The actual yield of each bomb is several times greater, since we assumed a special shaping of the tamper to concentrate as much as possible the mass, but not the energy, of the exploding bomb towards the propellant.

Table 1 is intended to show how the various factors in the problem depend on the initial parameters N, a, d, and m_B. None of the twelve "problems" is intended as an optimum case. It may be noted that problems 1 and 2, with V_f = 0.7 × 10⁶, are included for the sake of comparison with various intercontinental ballistic missile schemes. It should be noted that our mass ratios are considerably less than those contemplated in such cases, while the accelerations are very much greater (~10,000 g), lasting for periods of about 1 millisecond each. One also notes that the bombs are rather "small" (10¹⁹-10²⁰ ergs).


171



172

Case 2—
Constant Mass

In this case, which closely corresponds to the usual rocket assumption, we take as independent parameters M_f, N, d, m_B, σ, and now m_p, v_B (assumed constant) instead of a and V_f.

Thus we have for the mass expelled at each stage:

m = m_B + m_p,   (24)

the total bomb mass:

M_B = Nm_B,   (25)

the total propellant mass:

M_p = Nm_p,   (26)

the total mass expelled:

T = M_B + M_p,   (27)

the initial rocket mass:

M_0 = M_f + T,   (28)

and the mass ratio:

M.R. = M_0/M_f.   (29)

Since v_B is given, we find from Eq. (1) that

v_p = σm_B v_B/(σm_B + m_p),   (30)

while Eqs. (2) and (3) show that

H = (1/2)σm_B v_B² m_p/(σm_B + m_p)   (31)

and

v_E = √(H/(σm_B + m_p)).   (32)

Hence we again have a constant propelling velocity given by

w = v_p + v_E.   (33)

The "rocket equation" (4) now becomes

L ≡ (1/2)(σm_B + m_p)w = M_i Δ_iV,


173

the left side being a known constant, and M_i = M_0 - im being a known function of i = 1, ..., N. Hence we can compute the i-th increment of velocity

Δ_iV = L/M_i,   (34)

and the velocity after i stages:

V_i = Σ_{j=1}^{i} Δ_jV.   (35)

In particular the final velocity is

V_f = V_N = Σ_{j=1}^{N} Δ_jV.   (36)

The time Δ_it is given by the constant

Δ_it = 2d/w,   (37)

and hence we have the i-th acceleration

a_i = Δ_iV/Δ_it.   (38)

In particular,

a_min = a_1 = (Lw/2d)/(M_0 - m),   (39)

and

a_max = a_N = (Lw/2d)/M_f.   (40)

In analogy with the usual rocket equation, our Eq. (34) might be written

(L/m)·m = M_i Δ_iV,

or, letting β = L/m = (σm_B + m_p)w/(2m),

-β dM = M dV,

whence

dM/M = -dV/β,


174

and

ln(M_0/M) = β^{-1}V,   or   M_0/M_f = e^{V_f/β},

which affords a rough estimate of V_f, namely

V_f ≈ (L/m) ln(M.R.).
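A Python sketch of the constant-mass case, Eqs. (24)-(36), together with the rough logarithmic estimate just derived; all numerical values in the example are illustrative assumptions in c.g.s. units rather than entries of Table 2.

```python
import math

def case2_constant_mass(Mf, N, d, m_B, m_p, sigma, v_B):
    """Final velocity for the constant-mass case, Eqs. (24)-(36), and the rough log estimate."""
    m = m_B + m_p                                           # Eq. (24)
    M0 = Mf + N * m                                         # Eqs. (27), (28)
    v_p = sigma * m_B * v_B / (sigma * m_B + m_p)           # Eq. (30)
    H = 0.5 * sigma * m_B * m_p * v_B ** 2 / (sigma * m_B + m_p)   # Eq. (31)
    v_E = math.sqrt(H / (sigma * m_B + m_p))                # Eq. (32)
    w = v_p + v_E                                           # Eq. (33)
    L = 0.5 * (sigma * m_B + m_p) * w                       # left side of the rocket equation (4)
    Vf = sum(L / (M0 - i * m) for i in range(1, N + 1))     # Eqs. (34)-(36)
    Vf_rough = (L / m) * math.log(M0 / Mf)                  # the estimate derived just above
    return Vf, Vf_rough, M0 / Mf

Vf, Vf_rough, mass_ratio = case2_constant_mass(
    Mf=1.2e7, N=50, d=1.0e3, m_B=5.0e5, m_p=1.0e6, sigma=0.1, v_B=3.0e6)
```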

In Table 2, Problem #4' is intended to be an analogue of Problem #4 of Table 1, while Problem #12' is intended as a companion to Problem #12 of the former table. It may be noted that in order to duplicate the performance of a given rocket of constant acceleration a by the second method, one requires accelerations whose average is ~ a and which, therefore, individually greatly exceed a in the final stage. It may be that the method of Case 1, although unorthodox, has advantages in this sense which might justify the use of bombs of variable yield.

3—
Remarks

1. The mass of each fission bomb is assumed to be of the order of 500 kg, including tamper and explosive. Since these bombs are of small yield and many of them are required, they might be of hydride composition. Certainly a disadvantage of our scheme is its wastefulness of fissionable material.

2. The figure of 12 tons for the final mass of the projectile was assumed arbitrarily in our computations. Actually increasing this number with a proportional increase in the mass of the propellant is very advantageous since the mass of the bombs need hardly be increased even though their yields can be made considerably greater. Thus with, say, 20 tons for the vehicle the mass ratio will be more favorable.

3. Assuming ~1 second intervals between explosions, the total duration of the process will be less than 100 seconds, and the resulting loss of velocity due to the earth's gravitational pull will not exceed 10⁵ cm sec⁻¹. Thus the velocity V_f of Section 2 should be taken as the actual desired final velocity plus 10⁵. This explains our use of V_f = 1.2 × 10⁶ = 1.1 × 10⁶ + 0.1 × 10⁶.


175

TABLE 2 (c.g.s. units) *The complete Alv table is not included.


176

4. The accelerations of the order of 10, 000 g are certainly large, and must be rather uniform over the entire structure or breakage is inevitable. The question of the necessary strength for our structure under such accelerations has not been studied. Shock heating in these accelerations is believed to be small.

5. The problem of predetonation of remaining bombs by neutron flux from previously exploded ones must be considered. Strong source bombs and suitable shielding should overcome this difficulty. One should also consider the heating of the vehicle by neutrons and y-rays. Solid angle considerations insure that this effect will be small.

6. The propellant could be made of a solid material fabricated in N sheets which are placed at the bottom of the projectile. They are detached one by one and expelled to the desired distance. They could be separated by very thin ceramic layers. The placing of the propellant at the bottom of the structure has the advantage that the problem of heating of the permanent structure is attenuated. After each explosion only a small fraction of the next sheet of the propellant would be lost by evaporation and melting.

7. The problem of heating by the propellant and the possible avoidance of this difficulty by the use of magnetic fields have yet to be studied and will be reported in Part II as indicated previously.

8. The whole scheme presupposes elevation of the entire structure beyond the earth's atmosphere by a chemical booster rocket. On the other hand, for the first few explosions we could use air as the propellant with a resultant gain in our mass ratio and with smaller accelerations.

9. We have assumed that the expansion of the thin propellant layer will be essentially perpendicular to its disk surfaces. The losses due to sidewise expansion beyond the base of the rocket were treated summarily by halving the momentum imparted each time to the base of the projectile.

10. The problem of stability has not been seriously studied. The saucer must be so designed that the "center of push" is ahead of the center of mass. Since the immediate impact is at the base of the rocket, stability will probably be a major problem.

11. At little additional cost in mass a V-2 or Viking type of vehicle


177

could be carried as part of the payload and the saucer jettisoned after the escape velocity is attained. The standard rocket could then proceed under its own power with greater control over its trajectory.

12. The position of the propellant provides a given momentum with larger mass and smaller velocity than would be the case if the same mass of propellant surrounded the bomb, where solid angle losses are considerable. Moreover, it is presumably easier to eject the heavy propellant mass to the smaller distance.

One could even consider iterating this scheme by providing a propellant in two parts at distances of, say, 10 and 20 meters from the rocket, thus increasing the contribution of vp and decreasing that of VE to the velocity w.

References

1. Lee Aamodt, The Feasibility of Nuclear Powered Long Range Ballistics Missiles, LAMS-1870, (Del.) 3/24/1955. B. B. McInteer, G. I. Bell, R. M. Potter and E. S. Robinson, A Pachydermal Rocket Motor, and Appendix on Porous Tube Criticality Calculations, LAMS-1887, 5/24/55.

2. LA-714 T-Division Progress Report, Feb.-Ma. 1948.


179

8—
Some Schemes for Nuclear Propulsion, Part I:
With C. Longmire (LAMS-2186, March 1958)

Part I of this report is a discussion of the proposals made in the preceding report. (Author's note).*

Introduction

It is intended to present here a qualitative description of certain schemes for nuclear propelled rockets. The ideas sketched in the sequel stem from the schemata proposed by some of us in the past. Various details and technical points were discussed in a Rocket Group which meets weekly in our Laboratory.

The scheme discussed here might be considered as intermediate between the one outlined in report 71 and the ones where the idea is to propel a nuclear rocket by having a gaseous fission reactor operating inside the vehicle.2

Part I—
C. Longmire and S. Ulam-Internal Explosions

Briefly speaking, we imagine a great number N of very mild explosions taking place in succession. These explosions involve bomb-like assemblies of either metal surrounded by a small amount of high explosive and essentially hydrogenous material or UDk cores. Each of these explosions is supposed to heat the total mass involved only to

* Only Part I is reproduced here, as Part II is by F. Reines. (Eds.)


180

very moderate temperatures. To fix the ideas we consider temperatures of the order of 3/4 ev, i.e., 9,000°C, although temperatures up to a few ev may be useful. Each of these explosions will involve only several kilograms of active material and several tens of kilograms of hydrogenous material, and therefore the total yield of the order of a few hundreds of kilograms (sic!) of TNT equivalent. These explosions are, properly speaking, "fizzles" resembling burning rather than a true nuclear detonation. One imagines a large chamber with steel walls of roughly paraboloidal shape with the "explosions" taking place at its focus. The chamber may be considered, for the purpose of this discussion, as being evacuated except for the material to be exploded. The linear dimensions of the chamber are large compared to the assembly which is exploding. For orientation, we may assume the diameter of the chamber to be of the order of 4 meters, whereas the diameter of the bomb, together with the enclosing hydrogen, is say of the order of 40 cm. Each of our bombs should be thought of as being in a liquid or solid state before the explosion. This explosion will convert its whole mass into gas which will expand and fill the chamber with high velocity particles impinging on the walls and ultimately escaping from the chamber.

FOCUS

The "bombs" are brought in in rapid succession from a storage chamber and brought to the "nozzle" chamber where they are exploded. Compared to the proposals made in the preceding report the present scheme differs in the following respects: The explosions are of smaller yield. Their number is greater by a factor of 10 or 20. They will be of longer duration and lesser violence, and therefore, by order of magnitude, the individual accelerations given to the body of the rocket in each push will be smaller. Secondly, they are made internally, which allows a greater fraction of mass to be used in imparting the momentum. This, of course, is more than counter-balanced by the greater number of supercritical assemblies that one has to employ. Let


181

us say from the beginning that the total amount of fissionable material expended will be of the order of a few tons, at least for a first design. This makes it appear, offhand, that the primary use of such rocket motors would be to have large satellites and vehicles for interplanetary travel, rather than for stockpiling in large numbers.

We shall employ the following notations:

N = total number of exploding assemblies Ei i=1 ... N = the energy release in the i-th explosion Mi = the total amount of material exploded mi = the mass of fissionable material in the i-th "bomb" R = the diameter of the nozzle chamber Pi = the pressure on the wall of the nozzle d = the thickness of the wall Ve = the velocity of propellant mass escaping the chamber p = mean molecular weight of the bomb material Ti = the temperature to which the mass Mi is brought as a result of the nuclear reaction Ww = weight of the walls of the paraboloid Wp = weight of the propellant Wa = weight of the structure of housing of the bombs, and injecting mechanisms, instruments and "payload" W = W, + Wp + Ww total weight.

We now give a tentative set of values in c.g.s. units for our quantities. N = 103 mi = ml = 5.104 gms R = 2.102 cm T = 3/4 ev d = 3cm / = 3.

The effective volume V of our paraboloid with a length of 300 cm would be V - R2t = 3.14 x (2.102)2 . 3.102 ~ 4.107 cc. W, -~ 27rRdp =6.3 x 2.102 .3.102 x 3 x 8 -10 tons Wp ~ 103 x 5.104 - 50 tons Wa ~ 10 tons W ~ 70 tons.


182

The exit velocity ve of the propellant will be sensibly higher than the thermal velocity of our material at the temperature obtained in the nuclear explosion. This is so because of the effects of the recombination of the molecules and ions. If T = 3/4 ev, , = 3, the thermal velocity v is about 6 km/sec., and the final ve about 10 km/sec. The energy Ei of each explosion is then given by E ~ 1/2 mv2 = 1/2 x 5 x 104 1012 = 2.5 x 1016 ergs, about 500 kgs of TNT equivalent. The pressure on the walls will be of the order of P (y - 1) E/V = (.4x2.5x 1016)/(4x 107) ~ 2.5x108 ~250 atmospheres ~4000 lbs/sq.in.

In the first discussion we shall assume that the quantities are independent of i, that is to say, each assembly and explosion have constant characteristics.

The numerical data above represent merely an order of magnitude orientation about the scheme and are, of course, in no way optimal. There are many degrees of freedom in this scheme. Obviously, most of the fissionable material is "wasted" and we could choose our yields Ei within a very wide range of values-also the composition of the hydrogenous material surrounding the bomb and its mass in proportion to the mass of U235 is at our disposal, in a large measure. The geometries of the chamber, etc., seem not to be limited from above by the numbers adopted here.

Speaking qualitatively, the possible advantages of our scheme are as follows:

1. If we admit that the temperature of the material heated by the nuclear explosion is of the order of 1/2 -1 ev, the expansion of this material in the vacuum of the nozzle chamber will convert most of the energy released and initially present in the form of thermal energy to kinetic energy of the particles with the corresponding cooling of the gas. The velocity of the escape of the propellant will be therefore of the order of 10 kilometers per second, that is to say, the velocity of a satellite. For the velocity of the final "payload" to be of this order, one needs only a ratio e between the mass of the propellant and the mass of the installation and instruments, etc.

2. We mentioned a ratio of about 10 between the linear dimensions of the nozzle chamber and those of the exploding assembly. The density of the gas which will fill the chamber before impinging on the walls will be therefore 1/1000 of the original density. This means that the pressure of the wall will be moderate. The tensile strength of the wall of a fixed thickness depends, inversely, linearly on the inner diameter. If we assume that the pressure on the wall is given by the


183

Bernoulli formula P = 1/2 p (v2 ) --since p depends inversely on the cube of the linear expansion there is obviously a gain by having the walls of given tensile strength far apart. This gain obtains as long as the total weight of the propellant, auxiliary equipment, and the "payload" exceeds sensibly the weight of the walls of the chamber where the explosions take place. Heating by neutrons and gammas becomes even less of a problem when, with the weight of the payload and equipment essentially constant, the chamber is large.

Considerable computational and experimental work seems necessary to provide a design of the above sort. First of all one should try to calculate individual explosions which are to heat the material to be reproducible and as precise as possible. This should be done with the greatest possible economy of the fissionable material. Probably experiments have to be made with actually exploding such assemblies in order to learn about their characteristics. The action of the expanding gas on steel or tungsten covered structures has to be studied in order to understand the erosion of the material by successive explosions of this sort. The velocity of the propellant leaving the chamber has to be calculated--possible benefits from shaping the exit of the nozzle should be studied. One should discuss the possibilities of cooling of the walls by "sweating" if that should be necessary. We have not discussed the problem of "pumping" individual assemblies at a sufficiently fast rate and the concomitant engineering difficulties. At any rate, the problem here involves a shoving in of masses of the order of 50 kilograms each in intervals of about 1/10 sec. The problem of neutron heating should be calculated in detail, also the problem of the residual gas remaining after the (i - 1)th explosion at the time when the i-th explosion is to take place, etc.

It seems likely that a shock absorber between the thrust chamber and the remainder of the missile is desirable, to spread the sharp impulses out over time as well as possible. The number of g's that the main structure has to stand can thus be reduced to a small number.

It appears that steel of average 3 cm thickness will, for our choice of the radius of the wall, contain 4000 lbs/sq.in.

The wall can be coated with tungsten to resist the temperature of the gas accumulating on its surface. The contact of the gas with the wall is of extremely short duration-in a case like the one illustrated above about I millisecond for each explosion, so that heating by conduction is seemingly negligible.

Materials other than steel could be considered for confining the exploding gas, with greater strength for weights than steel.

The main problem is the construction of "economic" bombs giving yields of ~1 ton of TNT equivalent.


184

We had the benefit of conversation with George Bell on this problem and C. B. Mills is in the process of calculating critical masses and "alphas" for such UDk assemblies.

References

1. On a Method of Propulsion of Projectiles by Means of External Nuclear Explosions, Part I, Everett and Ulam, September, 1955.

2. T-821, Nuclear Chinese Rockets, C. Longmire, May, 1956.


185

9—
On the Possibility of Extracting Energy from Gravitational Systems by Navigating Space Vehicles:
(LAMS-2219, April 1, 1958)

This report contains the outline of calculations connected with ideas for using suitable orbits so space vehicles can gain energy from near encounters with stars, planets, asteroids, and such. Orbits like these have since been used repeatedly in some U.S. planetary missions. (Author's note.)

It is intended to outline in this brief report a number of problems of the following type: We assume an astronomical system composed of two or more stellar bodies and a space vehicle which, as an additional body of infinitely small mass compared to the celestial objects, forms part of a many-(e.g., 3) body system. We assume that the "rocket" not only describes the trajectory under the action of the gravitational forces, but also that it has still a reserve energy available for steering by suitably emitted impulses. This energy in the discussion below will be assumed to be roughly of the order of the kinetic energy which the rocket already possesses. The problem, broadly speaking, involves the possibility of using this reserve energy in such a way as to acquire, by suitable near collisions with one or the other of the celestial bodies, much more kinetic energy than it possesses-more by an order of magnitude than the available reserve energy would allow it to acquire by itself.

As examples of the situation we have in mind: Assume a rocket cruising between the sun and Jupiter, i.e., in an orbit approximately


186

that of Mars, with an energy in reserve which would allow the kinetic energy of the vehicle to increase by a factor like 2. The question is whether, by planning suitable approaches to Jupiter and then closer approaches to the sun, it could acquire, say, 10 times more energy. Another example would be a space vehicle moving in a double star system "half-way in between." Then the question is whether, by using additional impulses of its own, it could acquire again a kinetic energy much greater than what it already possesses.

As a purely mathematical problem we could consider the case of two mass points each of mass 1 forming a Keplerian system, and a rocket of mass vanishingly small compared to 1 in an orbit which forms a curve between the two mass points. Suppose that the reserve power of the rocket is such that it could double its kinetic energy. Question: Can one, in this idealized condition, obtain a velocity arbitrarily large (i.e., close to light velocity)?

That this possibility exists seems extremely probable from the theorems of ergodic transformations.l It has been shown that arbitrarily near to any given transformation, like the one given by the Hamiltonian describing the n-body system above, there exist transformations which are metrically transitive, that is to say, in particular, Liouville flows such that the trajectory of the system will penetrate arbitrarily near any point on the phase space. The theorem has been proved for bounded phase spaces. This does not make our theorem inapplicable to the problem. We could put in cutoffs in the distance of approach and assume at a finite but very great distance from the gravitational bodies another cutoff. The theorem would imply that arbitrarily near the given dynamical motion there exists one which will make the rocket approach as close to the cutoff sphere surrounding any one of the given mass points as we please, which would in particular imply obtaining arbitrarily high velocities. The theorem asserts the existence of such motions arbitrarily near given ones. The question whether these can be obtained by changes effected through emitting additional impulses inside the rocket is not essentially answered, but in view of the prevalence of ergodic motions near a given one, this seems extremely likely. Such an ergodic trajectory would, of course, in particular provide arbitrarily high velocities. Nothing is said, however, about the times necessary

for effecting this. They might be of super-astronomical lengths. It is clear, on the other hand, on general thermodynamical grounds, that "in general" the equipartition of energy may take place. This implies that the body with the small mass of the rocket will acquire very high velocities. This is well known even in systems of a moderate number of particles. The energy distribution is Maxwellian, again tending to provide the small masses with high velocity. The problem is whether,


187

by steering the rocket, one can to some modest extent acquire the properties of a Maxwell demon, i.e., plan the changes in the trajectory in such a way as to shorten by many orders of magnitude the time necessary for acquisition of very high velocities.

As is well known, the perturbations of Jupiter on the motion of some comets provide them occasionally with velocities of escape from the solar system. It has been noticed also2 that one can use the attraction of the moon to provide a rocket with additional kinetic energy, enabling it to escape from the earth's gravitational field even if it did not have enough energy to do that to begin with.

Our problem is whether one can do it repeatedly to obtain essentially arbitrary kinetic energies by repeated and suitably timed approaches to the two or more celestial bodies.

The question is that of finding general recipes for a 3-(or more) body probleni to achieve that aim as quickly as possible. It is proposed to calculate some very schematic, simple, but perhaps instructive cases, for a "strategy" of steering the rocket.

1. The first case involves a problem in one dimension. Suppose two masses oscillate at the end of a segment with given amplitude, say, harmonically; a point of vanishingly small mass is rocketed in between and, possessing some initial kinetic energy, collides elastically with the two oscillating end points. If these should be in phase, the calculation will show the increase of kinetic energy of the small mass. If the phases of the two oscillators should be randomly independent, the question arises how to plan the emission of additional small impulses by the middle point, so as to make it increase its kinetic energy most efficiently. Obviously, one should plan to collide head on with the two oscillators as much as possible. In other words, through additional impulses, collisions that lead to a gain in energy for the "free" point should be maximized as far as possible. The ones that involve colliding by overtaking the receding end point should be diminished in their effect. Without a strategy of changes in the velocity of the "rocket", the gain of energy towards an eventual equipartition would be a very slow process-the rates at which this approach to equipartition takes place are unknown in statistical mechanics, but certainly the gains in a random process increase with the square root of time or slower. With an operating intelligence perhaps this approach to near-equilibrium could be made vastly more rapid.

2. This problem will involve two mass points describing quite elongated Keplerian ellipses around their center of mass. The rocket moves initially in a roughly circular orbit in between the two masses


188

around their common center. Of course, the actual trajectory in this 3-body problem is very complicated. The question is again to plan a strategy of changing, by small amounts, the energy of the small object so as to approach one or the other of the large bodies to gain kinetic energy. If this is to take place, the approach to either of the two bodies must be increasingly closer. This involves great elongation and an increase in apastron of the rocket. Again the plan is to make near collisions head on. Presumably the planned changes, that is to say, the emitted impulses from the rocket, will be most efficient when the body is at maximum distance from the center of gravity of the two celestial points. It is there that a small increase in velocity will enable one to make changes in the time of the next approach.

The above discussion is, of course, intended for a purely theoretical, mathematical question. Even so, during the next few decades large objects may be constructed with a cruising velocity of 20 kilometers a second, and there will be still some additional energy left for changes in this velocity. It is obvious that the process of increasing the kinetic energy of the rocket by such extraction of gravitational energy from celestial motions, is, at best, very slow. The computations required to plan changes in the trajectory might be of prohibitive length and complication. This little note is meant merely as an introduction to exploratory analyses and calculations undertaken with Kenneth W. Ford and C. J. Everett of LASL.

References

1. J. Oxtoby and S. Ulam, "Measure-Preserving Homeomorphisms and Metrical Transitivity," Annals of Mathematics, 42 874 (1941). 2. Krafft A. Ehricke and George Gamow, "A Rocket around the Moon," Scientific American 196, No. 6, 47 (1957).


189

10—
Quadratic Transformations Part I:
With P. R. Stein and M. T. Menzel (LA-2305, March 1959)

This report is the original study of properties of iterations of non-linear transformations. It has given rise to a very large body of work, and by now, an extensive literature is still appearing on the subject at an ever increasing rate. (Author's note).

Abstract

This report deals with the properties of a restricted class of homogeneous quadratic transformations, with interesting physical and biological analogues, which we have called Binary Reaction Systems. All possible transformations of this class in 3 variables have been studied numerically on a computing machine, and the limiting behavior of random initial vectors under iteration of each of these transformations is tabulated. Some examples of 4-variable Binary Reactions Systems are also studied, and a few generalizations of the notion of Binary Reaction System are investigated for particular cases. Some remarks and results concerning the behavior in the large are presented, and examples of the mode of approach to the limit are given. Several of the more interesting phenomena are illustrated graphically.

The appendix deals with a different class of homogeneous quadratic transformations (of arbitrary dimension) which arise naturally from the study of a


190

simple evolutionary mode. For this class of transformations, the limiting behavior of arbitrary vectors under iteration can be given explicitly.

Introduction

This report summarizes and discusses some recent studies of the properties of quadratic transformations in several variables under iteration. This report is of an interim nature, and consists mainly in a presentation of "experimental" (i.e., numerical) results. A general theory of quadratic transformations (in contrast to the linear case) is essentially non-existent, and from the theoretical point of view, the work summarized below does little to improve the situation. It can only be hoped that as more facts become known, some outlines of a theory-at least a classification or "descriptive theory"-will emerge.

The motivation for the considerations which follow lies in the combinatorial problems suggested by genetic or biological systems. One has to deal with large populations of individuals (or particles) present in a given generation. Those may combine in pairs and produce, in the next generation, new particles. Suppose the original particles are of N different types. Given a rule for the type i (i = 1,... N) produced by mating of individuals of type j and k, the proportion or fraction xi of a given type in the next generation will be a quadratic function of the two fractions zj and xk.

More generally, one could consider a system (gas) of physical particles with N possible characteristics which collide in pairs and produce through the collision, say, a pair of particles with new characteristics. There could be many different values of the momenta, but possibly the "type" of the particle resulting from the collision could be different from the original ones.

Our present considerations concern the averages or expected values of the fraction xi in the next generation. The role of fluctuations or deviations from the fractions given by the quadratic formulae will be studied in a subsequent report.

As suggested in the title, this report is the first of a series (one might add, of unspecified length). By the time these remarks appear in print, much of the work will probably have been generalized and the results extended. It is hoped that at least some of the tentative conclusions presented below will stand.


191

I—
Homogeneous Quadratic Transformations

We begin by defining a homogeneous quadratic transformation as a set of N coupled non-linear first-order difference equations of the form: N Xi=Xi zke i=l,...N (1) k,Q=l where the -y are some real numerical coefficients with the property: Ik = ijk (2) In the present work we restrict ourselves to the case of non-negative coefficients: 7" > 0, all k, , i (3) Then if we choose the Xj as non-negative real numbers, the xj will also have this property. In the following we shall always restrict our Xj in this manner.

In order that the system (1) be of dimension N, we insist that for a given index i, not all ye can vanish, i.e., N k > 0 (4) k,Q=l

Systems of the form (1) (not necessarily with the restrictions of reality or non-negativeness) can be considered from two points of view:

(a) As the transformation of all vectors ( ) into vectors XN X IN or, more generally, as a mapping of some specified region X into a region X'. (b) As a set of difference equations which determine the value of X1I a vector at "time" n from the value of some initial vector X(°) XN (XN)


192

If we take the first point of view we are, in effect, studying a single iteration, the "mapping" in explicit form. The literature does contain some sporadic work on this problem for low values of N.() On the other hand, the second point of view does not seem to have been considered except for the case of one dimension.* The general "solution" to the problem posed by (b) is the explicit construction of the vector x(n) in terms of the initial vector x(0 ) and the time-variable n. Such a solution can only, in general, be presented as an iterative procedure. The most one can hope to do is to predict the limiting behavior of the system as n -— oc. Luckily, for practical applications this is usually the only thing of interest. However, even this problem has hardly been touched upon.

In the limit n - oo a variety of behaviors is possible; the vectors may, for example, converge under iteration to a limiting vector x, they may oscillate between a finite set of limit vectors xi, or they may exhibit a more or less chaotic behavior, i.e., do neither, but have an ergodic behavior, i.e., the limit in time of average 1/N Z_4 Xi will exist. (The last is familiar from Kronecker-Weyl's theorem4 of irrational rotations; see also Ref. 5.) The type of behavior, as well as the numerical value of the limit (if it exists) may or may not depend on the initial vector x(O) All the examples (with one exception--see Section XI) studied in this report turn out to be of the first two types; i.e., they either converge to a limit vector or converge to an oscillation between a finite set of vectors in a definite order, i.e., with a definite period.

II—
Normalization

Formally, the system (1) contains N2 (N+1)/2 parameters, the yke. It proves convenient to reduce this number (somewhat arbitrarily) by postulating the condition: N S 7Z = 1, all k, e (5) i=1

* To be sure, there is an extensive literature on coupled non-linear differential equations of 1st order in 2 variables.2 For N variables, a special class of differential equations is treated in V. Volterra.3


193

This has the great practical advantage that if we now normalize the initial vector by the conditions: N Exi - S (6) i=l then if we divide the right-hand side of each equation of (1) by the constant S, we have: N x = S (7) i=l

In biological terminology (see below) this means that we restrict ourselves to the case of a "constant population." Of course, there is no loss of generality in taking S = 1. In the sequel we therefore always impose condition (5), and furthermore, take: 0 < x(°) < 1, for all i with o) = 1 (8) i=l

This, of course, reduces the number of variables to N - 1. (For an example of a different procedure when (5) is not postulated, see the last section of the Appendix.) The xi are now restricted to lie on the positive portion of the hyperplane: N xi= 1 i=l

Even with all these restrictions, it has so far proved impossible to give any general theory of the limiting behavior of the systems (1). However, one sub-class, defined by certain reasonable restrictions on the coefficient -k, has been completely studied for all N. The results for this case are summarized in the Appendix. The rest of the report is concerned with a discussion of a different sub-class which seems to be of considerable interest, but for which no general theory as yet exists.


194

III—
Binary Reaction Systems

1. Equations (1) have a natural interpretation in terms of biological or genetic language. Consider a large population which consists of N different "types" of individuals.* Let Xj represent the fraction of male individuals of type j, xz be the fraction of the population of type e (we assume that there are equal numbers of males and females of every given type, hence we may represent them both by the same letter x). Then the system (1), defined by the coefficients y7e, determines the composition of the next generation, which results from a random pairing (once for each individual) of the population at the present generation. If we assume that the members of the old generation do not survive into the next, then for very large populations the expected value of the fractions of the individuals of type j will be given by the system (1). By virtue of our restrictions: N kl = I, all k, (5) i=l N yxi = 1, <xi < 1 (8) i=l

the size of the population is constant. The system (1) can therefore be looked upon simply as defining a "mating rule," i.e., determining the characteristics of the offspring from the characteristics of parents. The problem, of course, is to determine the composition of the limiting population.

2. As mentioned in II, the set of equations (1) is, in general, very difficult to study. Consequently, we decided to restrict ourselves initially to a sub-class of systems defined by the following additional restrictions on the coefficients kE'

For each pair k, , there is exactly one value of i for which /k = 1. For all other values of i the coefficient is zero; i.e., Vi = 5ii, (9) where each pair (k, c) determines one value of i l.

* These "types" are not, of course, meant to correspond to the genotypes or phenotypes of Mendelian genetics!


195

This means that every term in the product (xl +2 ...+x N)2 will appear in exactly one row of the set (cross-terms appearing with the factor 2). For example, X1 = X1 + x2+ x4+ 2X1X42 4XX + 2X 3X 4 X2= 2x1X3+ 2X2 x3 33 = 2x1X2x43

Since there are N(N + 1)/2 terms in the square of the sum, the number of different possible systems of this sort is clearly equal the number of ways of placing N(N + 1)/2 different objects in N boxes, no box being empty. Setting P- N(N + 1)/2, this number is easily shown to be: TN = NP ( (N-1)P + (N-2 -. .. (10) () 2 For example, T3 = 540 T4 = 818,520.

Naturally, many of these are equivalent to each other under permutation of the indices 1, 2,... N. A lower limit to the number SN of different systems, inequivalent by permutation, is: T*TN (11) N N!

In fact, the number S N of inequivalent systems (in this sense) will be somewhat higher than this because of the fact that some of the systems are formally invariant under certain of the N! permutations. Thus, for N = 3, T* = 90, but by actual enumeration it is found that there are 97 inequivalent systems, i.e., SN = 97. For N = 4, Tk = 34,105; at the present writing the actual number SN of inequivalent systems has not been determined by us.

We have called systems restricted by condition (9) "Binary Reaction Systems. " The reason for this name is that such a system associates with each pair (Xk,x() a unique result, say x'. Symbolically: k ® (-Dj (12)

This seems to us to be a natural and simple definition for binary reactions in which "particles" of types k and e produce by "collision"


196

particles of type j. (The "genetic" case is inherently more complicated; see, for instance, the Appendix.)

More general and natural, though less simple, would be a rule of the form: k e f (j,m) (13) i.e., a pair of particles produces a pair of not necessarily similar particles. We have studied a few generalizations of the simple scheme (12), though not yet in comparable detail (see below, Section VIII.2).

The reaction rule may be presented in tabular form. Consider, e.g., the system: The table for this would be:

Considered as an algebraic system with a law of composition (multiplication) given by the table, this scheme is seen to be commutative

(xixj = xjxi) but non-associative, e.g., (XlX2)x 3= X3X3 = X2 X 1(X 2X 3) = XlXl= X1

Binary reaction systems, as defined above, are always commutative (since each product occurs in only one of the set of equations) but are not in general associative. Indeed, for N = 3 there are just five associative schemes of this sort:


197

This last table corresponds to the finite group in 3 variables. Although this classification is suggestive, it does not appear that these associative systems are distinguished from the non-associative ones as regards their convergence properties. This is at least the case for N = 3, and the presumption is that no particular significance will attach to the associative property for higher N either. However, if the "reaction table" possesses the properties of a group table, certain special properties are easy to establish; e.g.:

The fixed point of the transformation has coordinates xl = x2 ... XN = 1/N and is attractive, i.e., the iterates of any vector in its neighborhood converge to it.

IV—
Procedure and Results

At this point it is necessary to describe our experimental procedure and results in some detail, since they will be referred to frequently in what follows.

As stated above, for the case of 3 variables, X1,x2,x3, there are 97 binary reaction systems inequivalent to each other under permutation of the labels 1, 2, 3. Each one of these has been studied numerically (on an IBM 704) by having the machine select randomly three initial vectors with coordinates x(), ) x (satisfying x() = 1, 0 < x <1) 1 ,22 ,23 i _ 2__ stsyn and letting the computer iterate the transformation in question "as


198

long as necessary," i.e., until some definite limiting behavior was observed.* In all but two cases such limiting behavior became evident without further analysis (one "ambiguous" case is discussed in Section IX.1; the other is mentioned in Section XI). All other transformations eventually either:

(a) Reached a stable distribution or a "fixed point" of the transformation, or (b) Oscillated between two or three fixed sets of values.

Only in (relatively) few cases did the behavior depend on the choice of the initial vector. However, the rate of convergence (used in the generalized sense to refer to both fixed points and "fixed" oscillations) often varied considerably with this choice.

The main results are contained in Table II. Here each system is written down in symbolic form, the results of iteration being given below. Each system has attached to it a conventional symbol, e.g., I.5.p, II.l.d, etc. The Roman numerals I, II, III refer to a distribution of quadratic terms on the right-hand side corresponding respectively to the three partitions of 6 into exactly 3 parts, viz.: (3,2,1), (4,1,1), (2,2,2). The other symbols refer to distributions within these main divisions, and are purely conventional. (They correspond to a particular order in which we have examined these cases on computing machines.)

A few examples will serve to illustrate the notation. (a) Consider the system: x1 = 2x lx2 + 2x lx 3+ 2x 2x 3 x2 = x2 + x3 Conventional name: I.6.b , =_ 2 X3 2 In Table II this appears symbolically as: 2(12) + 2(13) + 2(23) b (11) + (33) b.d.p. ) not degenerate (22) i.f.p. given by(x = x2 ): x= .56311573 2x4 + 2x3 _ 2 - 3x + I = 0 x2 = .32878482 i.f.p. m -* i.f.p. X3 = .10809945

* Although the sum xl+x 2+x 3= 1 is formally conserved, it is necessary to normalize at each step to avoid loss of accuracy by round-off errors in the last digit.


199

The notation b.d.p. means that if any of the initial x's = 1, the limiting configuration will be an oscillation between the two states: (x1 = 0, x2 = 1, x3 = 0) and (xl= 0, x2 = 0, x3 = 1). This we call a "boundary double point" (b.d.p.). Since it is evident from the structure of the system which variables will assume these values, it is not necessary to specify the b.d.p. more completely. In some systems the b.d.p. will only be reached if either of some two rather than any of all 3 variables is initially equal to 1. These cases are always immediately obvious from the structure of the system.

The words "not degenerate" mean that if initially we have some x0) = 0, it will not automatically remain so for all time, i.e., that xi is not a factor of the r.h.s. of the ith equation. The notation i.f.p. (x = x2) followed by an equation means that there exists an interior fixed point (i.f.p.), that is, a fixed point with no xi = 0, and that the value of one of the variables (in this case x2 ) is given by the relevant root of the equation. This equation is simply gotten by suppressing the primes on the l.h.s. and eliminating two of the variables. By relevant root we mean a real root between 0 and I which satisfies the set (sometimes extraneous roots are introduced in the elimination process; these do not satisfy the original set). To the right of this is given the resulting fixed point obtained from this equation. The notation m -+ i.f.p. means that the transformations as carried out on the machine actually converged to this value (to 8 decimal places) for three random initial vectors.

(b) As a second example, consider: 1 2 + X1=- X2 + 2xXl2+2x1x 3 x2 = x2 + x2 Conventional name: I.4.a X.= 2x2x3 In Table II this appears as: (11) + 2(12) + 2(13) 2 n.f.p.'s (22) + (33) b.f.p. 2(23) doubly degenerate no i.f.p. m -* n.f.p. (xl = 1)


200

n.f.p. means "nodal fixed point" and refers to the fact that xi = 1, for at least one value of i, is a fixed point. b.f.p. or "boundary fixed point" means that there exists a fixed point for which one x = 0. In view of the explanation of "not degenerate," the term "doubly degenerate" is self-explanatory. "No i.f.p." means that there is no interior fixed point, and m - n.f.p. (xl = 1) means that the system converged to xl = 1 for 3 randomly chosen initial vectors.

(c) Consider finally: = + 2 X1 = X1 + X3 X2 = 2X1X3+ 2X2X3 Conventional name: III.1.f X3 = 2X1X2+ X2

In Table II (leaving out some explanatory material with regard to the b.d.p.): ( ( ) n.f.p. b.d.p. ( not degenerate i.f.p. given by (xX: . - - x) .i.f.p. m -* i.d.p. X. xl .xl X.X .i.d.p. X.X.

In this case the transformation did not converge to the i.f.p., but rather ended up oscillating between two sets of values, that is, it achieved an "interior double point" (i.d.p.).

We hope that with these examples in mind, it will be possible to interpret Table II. In a few instances, some remarks are appended, but these are self-explanatory.


201

V—
Convergence Behavior

To date we know of no criterion which enables one to predict combinatorially, i.e., from an inspection of the reaction table, whether or not the limiting behavior of a given system will be true convergence (fixed point attained), oscillation between a finite set of limit vectors (periodic point), or neither. For general N, such a criterion will certainly not be simple; it would, for instance, have to take into account the various boundary fixed points and boundary periodic points which are, of necessity, present in any binary reaction system. Of course, every binary reaction system has either boundary fixed points or boundary periodic points (or both). (from Brouwer's fixed point theorem it follows that at least one fixed point must exist.) In other words, the behavior of boundary points under iteration will always have to be treated specially.

An example of this type of complication is provided by system I.1.c. Transcribed from Table II, this reads: X1 = X1+X2 + X3 X2= 2x1x 3+ 2x2 3 33 = 2X1 X2

First of all, it is clear that x(0) = 1 will lead to the n.f.p. x1 = 1. Furthermore, it is clear that a b.d.p. exists, namely: x = 1/2 x1 = 1/2 x2 = 0 x2 = 1/2 3 = 1/2 x3 = 0

Experimentally, for three different randomly chosen interior vectors, the systems converged to the i.f.p. given in the table.

Other special cases have a behavior more difficult to discover; e.g., the system III.l.f, in addition to the i.f.p., has the periodic solution: l = .31944846 x = .56519772 X2 = 0 x2 = .43480228 x3 = .68055154 X3 = 0 which is attained if x?f) or 3x0) = 0.

For our randomly selected initial vectors, however, the system attains the i.d.p. given in the table.


202

One might think that things get simpler if we consider only interior initial points. Several examples, however, show that even here nothing universally true can be asserted. For example, the system I.2.m attained its i.f.p. with two different initial vectors, but went to the b.d.p. from a third initial vector.

The situation clearly gets more complicated in higher dimensions, where the classification of special boundary solutions in general depends on a complete knowledge of the behavior of lower-dimensional systems.

One may consider, in order to determine whether the fixed point of the transformation is "attractive" or "repellent," the value of the Jacobian of the transformation or the fixed point. If, for example, the absolute value of the Jacobian is > 1, then, in general, iterates of points in every neighborhood of the fixed point will diverge from it.

A summary of the convergence behavior in all 97 three-variable systems for random initial vectors is given in Table I. In 23 systems there is convergence to xi = 1, for some i. Twelve converged to a b.f.p., 15 to a b.d.p., 4 to an i.d.p., 4 to a b.t.p. (boundary triple point), one to an interior triple point, while 6 showed varying behavior depending on the initial conditions. (This class may turn out to be larger with more initial points sampled.) One does not converge at all. All the rest converged to an i.f.p.

The two systems II.1.d and II.1.f showed a continuum of i.f.p.'s and i.d.p.'s, respectively, but this behavior is easy to understand, and not specially significant (see remarks in Table II).

Except in 2 cases, systems I.2.j and III.2.3.a, convergence was numerically evident (though occasionally extremely slow). 1.2.j is particularly interesting, and is discussed in detail in Section IX.1. III.2.3.a is briefly discussed in Section XI.

VI—
The Nature of the Interior Fixed Points

1. Leaving aside for the moment the question of convergence, it is of interest to inquire into the nature of the various i.f.p. Since for a given N, there are a finite number of different systems, there are only a finite number of i.f.p. These are, of course, defined by the set of algebraic equations obtained on suppressing the primes on the left-hand side of the systems in question. As mentioned above, from Brouwer's theorem it follows that there exists at least one fixed point,


203

but it need not lie in the interior. Frequently these systems have no solutions such that 0 < xi< 1, all xi, which means that no i.f.p. exists. Consider, for example, system I.1.a. The set of equations defining the fixed point is: 2 2 2 X1 = C2 + X2 +-X2 X2 = 2x1X2+2x1x3x3 = 2X2 X3

Since, by definition of an i.f.p., x34 0, we must have x2= 1/2; the second equation then implies: 1/2 = 2x(1 - xl), i.e., xl = 1/2, so that xl+ x2 = 1, implying x3= 0. Thus no i.f.p. exists.

In general, to find the i.f.p. we must eliminate two of the variables. The resulting equation is then of 4th order in the remaining variable, say xi, although it may have factors corresponding to x 1 = 1, xi = 0,

or perhaps to extraneous roots like xi = -1. (In Table II the equation listed is always in "reduced" form, with these factors removed.) Occasionally the equation may have two real roots in the interval 0 to 1. For N = 3, in all such cases one of the roots proved to be spurious, i.e., it did not satisfy the original system. In fact, excepting the case II.l.d, mentioned above, which had a continuum of i.f.p., no system had more than one i.f.p. Although it is doubtless possible to give a complete theory of these equations for N = 3, a similar treatment for general N seems beyond reach. Here the elimination process can yield an (unreduced) equation of order 2N-1.

2. Bounds for the i.f.p. Consider an i.f.p. satisfying: I >X1>x 2 >X 3>...> > 0 (16)

Clearly, we lose no generality by specifying this ordering, since we can always carry out a permutation on the system so that (16) holds. For a given N, the "largest" i.f.p. will be defined as that i.f.p. for which xl< 1 has the largest numerical value as we range over all possible systems. (Since the number of systems is finite for finite N, there will always exist a largest i.f.p.) The question then arises: Given N, which system has the largest i.f.p., and what is the corresponding value of


204

xl? In view of the astronomical number of inequivalent systems (for even moderate values of N) it is of some interest that a partial answer can be given to this question.

For N = 3, our complete study reveals that the system possessing the largest i.f.p.-hereafter called the "maximal system"-is II.3.d, for which the defining equations are (we interchange x2 and x3 for convenience): X1 = X3 + 2x1x2+ 2x1x3+ 2x2x3X2 = X (17) X3 = X2 The (unreduced) equation is clearly (x = xi): x+x2 +x4 = 1 (18) which yields: x =.56984029 (19)

The natural generalization of this system to N dimensions is: N\2 N-1 XI=XN + ...= Xi - Xii=l i=l X2 = X12 X3 = X22 (20) 2 XN-XN-1 The root x = xl is then given as a root of the equation: p=N-1 fN(x)-- E x2 =1 (21) N=O This root converges very rapidly as N - oo; for example, N = 4: x = .566160865... N = 5: x = .566123797... (22) N = o: x = .566123792...


205

It is tempting to consider this last number as an N-independent bound for all binary reaction systems. Unfortunately, this is false, as will be shown below.

Consider a system with xl > 1/2 and satisfying the ordering (16). Let us assume the "skeleton": x1 = x1+ 2x1x2+ ... X2 = 2x lx 3 + ... X 3= 2X 1X 4+ ... (23)XN_1= 2XN+ ...XN

Clearly: 1-Xl = X2+X3+X4+ ...+XN<X2 [1+2+42+-8X3(22 (24)l 2xL 2 (2x,1) 1- xi But X2< 2 (25) i - Xl 1 - xl or >2>p 2 (2x)P (26) If we equate these bounds and set y = 2x1 , we obtain the equation: yN-1 - 2yN-2 + I = 0 (27) Calling the root of this equation yN, it is evident from (26) that: Xz< Y (28)

Clearly, yN - 2 as N - oo, i.e., the bound is N-dependent. One might suspect at first that this bound is a very weak one, and that the actual maximal system has a much lower i.f.p. However, that the bound is the best possible is proven by exhibiting a system for which yN/2 = x1 is actually obtained. In fact, such a system is:


206

X= X1+ 2x1x2 X2 = 2l1X3X3 = 2x1x4 (29) XXN_1 = 2xlx NN N = 1 2 x2 x j=(x .....X)2 =(1X)2xNx = E i + E2 E^ = Xj=(X2+• *.1) = (-Xl) i=2 i<j=2

It is easily verified that this system has (27) as its i.f.p. equation. For N = 4 x l= = .809016995 ( 4 (30) N = 5 x1 = .919643378

Experimentally (N = 4, 5, 6), this i.f.p. is not attained on iteration starting from a general point, but these converge to the n.f.p. XN = 1. This is to be contrasted with the behavior of the system (20), which actually attained its i.f.p. (N = 3, 4, 5). Indeed, for system (29) it turns out that the absolute value of the Jacobian at the i.f.p. is (y = 2x1): IJI = yN-2(2 - y)(31) > 1 which makes it reasonable that this i.f.p. is not attractive.

On the other hand, it is clear that for a "skeleton" of the form: Xl = ... X2 = X2+ ... X3 X2 + ... (31) xN-XN2 -_1 + -' we have:


207

2 x X2 >x2 X3> X2> xl (32) 2 >x 2>x2N-1N N-1 - 1 N-1 whence: 1 = xl+x2 ...+x>Zx = fN(Xl) p=O

Therefore, for such a skeleton, the root of fN(x) = 1 does indeed provide an upper bound (attained for the system (20)).

At this writing it has not yet been shown that the system (29) is actually maximal. However, a weak upper bound can be obtained for all systems such that x1 does not contain the term x2 (skeletons of the form (31) are a sub-class of these).

Namely, in this case xl<2xl (1 -xl ) + (1 -x)2 = 1--2 (33) Therefore, clearly x2 + Xl 1 or xI < .61803399 (34)

However, we can do much better for this case. In fact, we can show that for xl > 1/2, we must have, under the ordering (16): Xk > z-21 (35)

which then establishes the bound (21) by the previous argument.

VII—
Periodic Limits

For a large number of 3-dimensional systems, randomly chosen initial vectors iterated to a periodic limit, i.e., the limiting behavior was an oscillation of period 2 or 3 between fixed points. Twentyfour systems exhibited this behavior for three initial vectors, while six


208

others achieved a similar limiting configuration for at least one choice of initial vector (with no coordinates lying on the boundary). In most of these cases the final state was of the form:

i.e., a boundary double point. A few cases of a boundary triple point were also observed (cf. Table I). Such final states we call "trivial," for the reason that the algebraic structure of the transformation alone indicates that such a final state is at least possible. We may contrast this "trivial" type of oscillatory final state with those for which the oscillation takes place between two or three interior points. The latter we call an "interior double point" (i.d.p.) or "interior triple point" (i.t.p.).

For N = 3 we found just four examples of an i.d.p. and one of an i.t.p. There was also one case of a "non-trivial" b.d.p. (system I.2.e) for which the final state was oscillatory with period 2, but between two "non-trivial" boundary points, viz.:

The existence of an interior double (or triple) point means that the second (or third) power of the transformation possesses these limit values as fixed points. The algebraic difficulty of finding such points is in general prohibitive. For example, in the unique case of the interior triple point (system 1.3.g), if we let:

then one coordinate of the triple point is determined by the set of equations:


209

It may be verified that the successive values of xl given in Table II indeed satisfy this set of equations.

Although there are no oscillatory limiting configurations with periods greater than 3 for N = 3, one can, of course, find such by going to higher N. Indeed, we discovered, by chance, a particularly interesting case, viz.:

This can be considered as one particular generalization of the 3variable system I.5.j. This generalization-which we applied to several of our original systems (Table III)-consists in setting x4 = x3, replacing x3 in the original system by x4, and putting the new crossterms 2x 4(il + x 2+ X 3) in the top line. When we generalized the triple periodic case, I.3.g, in this manner, the resulting limiting behavior (for 3 randomly chosen initial vectors) was still periodic with period 3, but the configuration was of the "trivial" sort, i.e., the b.t.p. (1,0, 0, 0), (0,0, 1,0), (0, 0,0, 1). However, in the case of I.5.j, whose resulting generalization is given above, the limiting configuration was oscillatory with period 12. (The values of the coordinates are given in Table III.) A further generalization to 5 variables, following the same prescription, yields the system:

In this case, 3 random initial vectors achieved an oscillatory limiting configuration of period 6. (See Table III for the numerical values.) In our opinion, it is not likely that the behavior observed in these two cases could be predicted by means of any simple criteria.


210

VIII—
Form Stability

1. In a few cases we investigated the effect of making slight changes in the form of the equations themselves. One way of doing this, which makes the change of form depend on a single parameter, is as follows: Multiply each term by the factor I - e, and add e/2 times the term in each of the two other rows.

For example, the system 1.5.0 is:

This was now modified to:

In this case it is easy to carry out the elimination of x2 , x3 to obtain the equation for the i.f.p. as a function of e. There results: blx4+ b2x3 + b3x2 + b4xI+ b 5 = 0 (38) where the bi are given in terms of the parameter: 2- 3e a = 2 (39) as follows: b = a4 b2 = 4a3 b3 2a2 - 3 (2 + a)3 (40) 4a 2b4 =a(1 + 2a) - 3 (2 + a) o~ o ~a2 b5 = (2a + 1)+ (2 + a)2


211

For e = 0 (a = 1) we get (dividing out xl) the original i.f.p. equation: x+ + 4x2 - 1 = 0 (41) At e = 2/3, a = 0, all the coefficients bi vanish. This corresponds to the set: 12X1= x 2= x3 = (X2 + X2 + ) + 3 (X1l2 + x1X3 + 3X23) (42)

which reaches the i.f.p. (1/3, 1,3, 1/3) on a single iteration starting from any initial vector. In fact, as e -- 2/3, all systems to which this generalization is applied tend toward this simple case (since at e = 2/3, 1- e = e/2).

For the system 1.5.0, generalized in this manner, we investigated the convergence for several values of e, viz.: e = .001, .01, .05, .1. In each case, randomly chosen initial vectors iterated to the i.f.p. predicted by equation (38). In other words, there is a sort of continuity in the convergence behavior as the form of the equation is changed in this simple manner.

Slightly more interesting is the result of applying this one-parameter generalization to the system 1.3.g. As mentioned above, random initial vectors iterated according to this transformation reached an oscillatory final state with period 3. Vectors under the generalized transformation behaved in the same fashion, but as e was increased the final-state oscillations decreased in amplitude, until at e _ .045 general initial vectors appeared to converge to the i.f.p. predicted by the corresponding i.f.p. equation. Presumably, the final state is still oscillatory (with period 3), but the oscillations are too small to observe with 8-decimalplace accuracy.*

The conclusion to be drawn from these experiments is that binary reaction systems are "stable" under small perturbations of formal structure.

*For e > 0, some initial points converged to the i.f.p. In other words, the i.f.p. (which is a function of e) appears to be attractive for e > 0. More detailed analysis would probably show that what is happening is that the area of the region of the triangle for which convergence (of a point in this region) to the i.f.p., is increasing with increasing e. Correspondingly, one is less likely to pick an initial point outside this region, i.e., a point which will iterate to the i.t.p. For further discussion of behavior in the large, see Section XI.


212

2. When more radical changes of form are made, we do get correspondingly greater changes in behavior. For instance, we took the above-mentioned system I.3.g and kept only the skeleton: s/ 12 12 xI = 22 + X23 + 2x1x2 + ' = 2x1x3+ 2x2x3+ . .. (43) 1 2x3 X1 -t +.

To this we added the missing terms (1/2) x2, (1/2) x2 , (1/2) x3 in all possible ways (i.e., 27 ways, of which one corresponds to the original system). For 25 of the resulting 26 new transformations, random initial vectors iterated to an i.f.p. (The results are summarized in Table IV.) One system, however, gave an oscillatory final state with period 3, a behavior analogous to that of the original system, viz.:

The configuration of the final state (starting from 3 randomly chosen initial vectors) was:

It is clear that a "rule" which allows x3@z3 - xl or x2 (with equal probability) is a rather unnatural one. A somewhat more logical modification is to assume a skeleton in which the cross-terms appear with coefficient unity, and to add the missing terms x1x2,xx13,x2x3, in all possible ways. When this was done for I.3.g, all the resulting transformations had an i.f.p., and in every case random initial vectors iterated to the i.f.p. (see Table V).

This change of rule has a natural interpretation; effectively it allows non-commutativity, since e.g., xixj may give two results, which


213

we can interpret as the respective results of xixj and xjxi. Formally, this is an interesting generalization of the concept of a binary reaction system (as originally defined), but the convergence behavior does not appear to be startlingly different.

This same "non-commutative" generalization was tried on the system III.2.1.b; the results offered no surprises.

IX—
A Specific Convergence Problem

1—
The Exceptional Case I.2.j

As mentioned above, there are only two systems among all the 97 which exhibit an ambiguous convergence behavior. The most interesting of these has the form: x' = xl+ x3 + 2x1x3 x'2 = x + 2xlx2 I.2.j xI3 = 2J:2:J:3

The system is degenerate, and lias two n.f.p.'s; in the following we shall ignore the behavior of tile boundary points (since these present no problem) and only discuss the behavior under iteration of interior points.

By inspection, it is evident that the system possesses the i.f.p: 1Xl = 3 = 1 (45) X2 = 2

For three randomly chosen (interior) initial points, little convergence was evident even after some 85,()()( iterations. In order to see better what was happening, wc intro(duce(ld iew coordinates: 1 + .1 - : 3 2 (46) ( = -r2

These are effectively Cartesian coordinates in the plane of the triangle formed by the three vertices (1,0,0) (0, 1,0) (0,0, 1).


214

More specifically, in the original form of the transformation, the xi's are constrained to move on the positive portion of plane x1 + x2 + X3 = 1:

For algebraic convenience, we distort the triangle into a 45° triangle with base unity. The coordinates of a point in this triangle are then (S, a), as shown in the sketch below. Here


215

In terms of these new coordinates, the transformation takes the form: S' = 1 - 4a + 42+ 2aS (47) a' = 2aS and the i.f.p. is: 1 2 (48) 1 4

Note that the Jacobian of (47) is exactly 1 at the f.p. (48). For future reference, we write down the inverse of (47): - ' - (49) 1-S'-a' a= -' 2

If we now make the further transformation: x =S-1 f~~~~~~~~2 ~~(50) y =c-awe get: x x' -y + + 42 + 2xy (51) x y =y + + 2xy with i.f.p. x = y = 0.

Note that if we consider only linear terms: x 2' =-y + (52) , x y =y+ ~~2~~~~~ ~215


216

then an invariant ellipse exists: x12 + xty+ + 2yt2 = + xy + 2y2 (53)

Figure 1 shows a plot of the observed iterates in the x, y plane. The three curves show the behavior of successive iterates, the initial point of the sequence being taken respectively at n = 1550, n = 10,101, n = 75, 001. Each curve is roughly an ellipse of the form: +xy + 2y2 = C (54)

Numerically, at least, C appears to -— 0 as n -—oo, or, to put it another way, the axes of the ellipse of reference are shrinking.

In order to convince ourselves that this apparent convergence was not simply the result of systematic round-off, we used the inverse transformation (48), numerically retracing our steps from n = 10, 101 to n = 1,550. All coordinates agreed to 6 decimal figures, which pretty well precludes the possibility that the observed convergence is a numerical accident.

It should be noted, by the way, that no matter how close one is to the fixed point, the quadratic terms in (51) cannot be ignored, for according to (53), the linear terms by themselves will generate a sequence of iterates which will all lie on the ellipse of reference.

For the purpose of discussion, it is convenient to transform equation (51) so that the reference curve is a circle. The linear part of the transformation can be written: ()A(x) A- (1 ) (55) y (55) 2 If we transform this with the matrix: S=( ) (56) 0 2 we find: R = S-1 AS = 4 (57) 216V-- 3 4 4


217

which corresponds to a rotation through an angle 0 = cos- . The transform through S of the complete mapping (50) has, then, the form: , 3 V7 10 2 =-x + -y+ y -6xy ( 4^~~~ 4 ~~~~(58) -v 3 4 4 y' = -T-x + y+2y2 + 2Vfxy

2—
The Asymptotic Behavior of the Angle of the Radius Vector under Iteration

The transformations with which we are concerned do not preserve the ordinary measure (Lebesgue measure) of the space which is mapped into itself. Moreover, they are not one-one. In some cases they shrink a neighborhood of the fixed point into a proper part of itself and the limiting image of such a region may consist of this point alone. Obviously, an invariant measure, if it is to be constructed, would have to be of Lebesgue-Stieltjes type and assign positive values, in some cases, to sets consisting of single points.

One may be interested in the behavior of the angles (with a fixed direction issuing from a fixed point) of the vectors Ti(x), where x = (x1 , x2 x3 ); i = 1, 2,... more generally in the behavior of the points: T(Si-) (59) ITi(()I

on the unit sphere. Something can be said about it even in cases where T does not transform any bounded region into itself. Thus, for example, if T is an arbitrary linear transformation of the n-dimensional Euclidean space into itself, T(0) = 0. Then the ergodic limit of the average of Si exists. In other words, if C is an arbitrary "cone" of directions in space, i.e., a sub-set of the unit sphere fc(s), its characteristic function being:

fc(s) = 0 if .s C; fc(s) = 1 if s c C then for almost all s, the limit: N lim N Zfc(T'(s)) (60) i=1


218

exists. This follows for a general linear transformation T, from the well-known theorems giving, in fact, more precise information-in the two extreme cases: if T is an orthogonal transformation we have the Kronecker-Weyl theorem on equipartition; if T, considered as a matrix, has all coefficients positive, the Perron-Frobenius theorem asserts convergence to a unique direction. In the general case one obtains, by considering a decomposition of the space into sub-regions where one or the other behavior dominates, at least the existence of the ergodic limit.

Presumably, the theorem still holds true if T is a general homogeneous quadratic transformation of the n-dimensional space.

In our very special quadratic transformations of the plane, more can be said: The transformation of the previous section possesses the Knonecker-Weyl property: The angle described by the iterates of almost every point covers the circumference of the unit circle densely and uniformly.

We hope to show that if the linear part of a quadratic transformation Q, which has the origin as its fixed point, consists of a rotation through an irrational angle, then the iterates of Q converge to the origin provided one starts with points in a circle of sufficiently small radius.

A detailed discussion of these matters will be given in a subsequent report.

X—
Further Generalizations

In view of the impracticability of studying all possible Binary Reaction Systems for any N > 3, we thought it worthwhile to generalize a few of our 3-variable systems to higher dimension by arbitrary but fixed rules. One such generalization is mentioned in SectionVII (see I.5.j - ext - I and subsequent discussion). Another essentially different way to generate "interesting" systems is to construct for any given 3-variable transformation the corresponding "super-system." This is constructively defined as follows: We introduce nine variables, yl, y ,... yg, according to the prescription:


219

and substitute these in the original transformation. We then have three transformations for the three triads of variables yi+ Y2 + Y3, y4 + y5 + Y6, Y7 + Y8 + y9- Consider, e.g., the system I.2.e. The last line of the transformation now reads: y' + y8 + y' = 2 (yl + Y2 + Y3) (Y4 + Y5 +Y6)

In order to convert this into 3 separate expressions for y', y8, y', we could, for example, formally identify the variables modulo 3, i.e., YI ~ Y4 ~Y7 Y2 ~ Y5 ~ Y8 (62) Y3Y~ 6 ~ Y9

Correspondingly, on the right-hand side, we could make the identification: 2 y1l4 - y2 y2y5 ~ Y2 2y2y6 +2y3Y5 - 2y2y3 (63) 2yy16 + 2y3y4 ~2ylY3 2yly5 +2y2Y4~ 2 Y1Y2

We can now write expressions for y',y8, y' so that, with these formal identifications, the resulting sub-system will have the same form as the original 3-variable system, i.e., Y7~ y = yl +Y2 +2 Y2Y3 - 2Y14 + 2 Y2y5 + 2Y2Y6 + 2y3Y5 Ys ~ Y2 = Y3 + 2yy 33 - 2Y3Y6 + 2y1Y6 +2y3y4 (64) ys y13 = 2y1y2~ 2yly5 + 2y2Y4


220

In this way we (arbitrarily) obtain equations for y', y8, y' in terms of the yi's. When this is done for each triad, a 9-dimensional B.R.S. results.

In the present case, using the symbolic notation of Table II: (11) + (44) + 2(47) + (22) + (55) + 2(58) + 2(23) + 2(56) + 2(59) + 2(68) (33) + (66) + 2(69) + 2(13) + 2(46) + 2(49) + 2(67) 2(12) + 2(45) + 2(48) + 2(57) (77) + 2(17) + (88) + 2(28) + 2(29) + 2(38) + 2(89) (99) + 2(39) + 2(79) + 2(19) + 2(37) (I.2.e - Super) 2(78) + 2(18) + 2(27) 2(14) + 2(25) + 2(26) + 2(35) 2(36) + 2(16) + 2(34) 2(15) + 2(24)

As a second example, we quote the result of treating system I.3.g in the same manner: (55) + 2(25) + (88) + (66) + (99) + 2(36) + 2(45) + 2(78) + 2(15) + 2(24) 2(46) + 2(79) + 2(16) + 2(34) + 2(56) + 2(89) + 2(26) + 2(35) (44) + (77) + 2(14) 2(28) + 2(58) + 2(39) + 2(69) + 2(18) + 2(48) + 2(27) + 2(57) 2(19) + 2(37) + 2(49) + 2(67) + 2(29) + 2(38) + 2(59) + 2(68) 2(17) + 2(47) (I.3.g - Super) (22) + (33) + 2(12) 2(13) + 2(23) (11)

When these transformations were iterated for randomly chosen initial vectors, the sums: Xl = Yl + Y2 + Y3, x2= Y4 + 5 + Y6, 33 = Y7 + Y8 + Y9


221

reached, of course, as they should, the same limiting configuration as was observed in the original 3-variable system. However, the actual values of the individual yi varied with the initial configuration. The results are given in Table VI.

XI—
Properties in the Large

1. Although it is, in general, possible to discuss the behavior of points under iteration in the neighborhood of a fixed point, the iteration behavior of such points over the whole domain (positive portion of the hyperplane) can, at present, be treated only experimentally. In what follows, we shall take the variables to be S, a, and the domain to be the corresponding 45° triangle with unit base (see Section IX.1 for a definition of this coordinate system).

As stated above, we have not found any general criteria for determining which of several possible limiting behaviors will be realized for a given 3-variable system, starting with a general point in the triangle. On the basis of a rather small sample (~ 3 random initial points for each system), it appears that, excluding boundary points, the limiting behavior is independent of the initial point for the large majority of systems. As shown in Table I, however, there are (at least) 6 systems in which this limiting behavior depends on the initial point (we exclude from consideration the "pathological" systems II.1.d and II.l.f; see the discussions under these entries in Table II). Two of these, I.2.m and I.5.h, have been examined in greater detail. What was done was to look for boundaries which separate regions of different limiting behavior. This was accomplished by programming the computing machine to "search" the whole triangle in a systematic manner. On the first pass a crude net was used (AS = Aa = .05). Then, when the boundaries had been approximately located, a more refined interval was employed in the appropriate neighborhoods. For each trial point in the triangle, sufficient iterations had to be performed to identify the limiting behavior. Despite the apparent magnitude of the task (several hundred trial points had to be followed for some 70 iterations each), a complete search (first crude, then appropriately refined) takes only about 15 minutes of computing time per system.* The results for the two

* On the average, the machine will perform 50 iterations/sec. for a 3-variable Binary Reaction System. The systematic search of the triangle is somewhat slower for various reasons connected with input-output requirements.


222

systems studied are shown in Figures 2 and 3. In these, each calculated boundary point is determined to within an absolute error < .0025 i.e., to within 1/4% of the length of the base of the triangle. In these figures, initial points lying in the region marked "OSC." will iterate to the appropriate boundary oscillation, while all points lying outside these regions will converge under iteration to the i.f.p. The boundaries appear to be complicated. We have not attempted to study what happens to points actually lying on the boundary curves; for this an analytical treatment is necessary.

In a few simple cases it is possible to give an analytical treatment of such boundary regions. As an example, we cite the system III.2.2.a. In the S,a coordinates (we use these to conform with our treatment of the other systems; the argument can be carried out in the original coordinates with equal ease) the transformation takes the form: S' = 2S - S2 + 3,2 - 4aS (65) a' = 2a(1- S)

There are 3 n.f.p.'s, namely, (S = 0, a = 0), (S = 1, a = 0), and (S = 1/2, a = 1/2), while the i.f.p. is (S = 1/2, a = 1/6). The boundaries a = 0 and S = a are clearly transformed into themselves. In addition, there exists an invariant line: S = 3a .

All points lying on this line can easily be shown to iterate to the i.f.p.with the exception of S = a = 0, which is a non-attractive f.p. This line is shown on the diagram below. The curve S =S* is the locus of all points such that S'= S; its equation is: S*= - 4a+ (1 4a)2 12 (66) 2

It is easily shown that a point lying in the region below the line S = 3a, i.e., such that S > 3a, remains in this region under iteration; similarly for points lying above the line. Further, all points lying to the left of the curve S = S* remain to the left under iteration. It can be further shown that points lying to the right of S = S* and not situated at the corners of the triangle or on the line S = 3a will eventually cross


223

the curve. Furthermore, in the neighborhood of the i.f.p., the linear approximation to the transformation is (x S - 1/2, y a - 1/6): x x (67) Y =- +Y

from which it can be deduced that the i.f.p. is attractive only along the line x = 3y. Finally, using the fact that for S < 1/2, we always have a' - a > 0 and that correspondingly for S > 1/2, a' - a < 0, it can be shown that all points lying above the line S = 3a will iterate to S = a = 1/2, while all points below the line will iterate to S = 1, a = 0.

The reason why this system can be treated so simply is, of course, that the boundary curves are explicitly known. When this is not the case, it may be helpful to have before one a picture of the mapping (single iteration) of the entire triangle, as well as the curves along which a and S are stationary. A few interesting examples are given in Figures 4, 5, and 6.


224

2. There is one case which does not appear to converge either to a fixed point or to a finite oscillation. This is the system III.2.3.a: x1 = X2+ 2x1x 2 x2 = 2 + 2x2x3 (68) x -+ - 2x 1x 3

or, in terms of the coordinates, S, a: S' = 2a+S2 - 3a2 (69) a' = 2(1 - S)

The situation is illustrated in Figure 7. The i.f.p. is non-attractive and the 3 corners of the triangle are attractive only along the boundaries in a clockwise direction. Under iteration, points will spiral out, approaching arbitrarily close to the boundaries, but, e.g., as the transformation a' = 2a(l - S) shows (for the bottom boundary), no point inside the triangle can ever reach the boundary. Thus a general point will continue to spiral indefinitely.

Numerically, a spurious convergence was observed owing to the fact that the several random initial points chosen rapidly iterated to within a distance less than 10-8 from one or another boundary line.

If one transforms the triangle into the unit circle in an appropriate manner, the situation can be viewed as follows: The center is a nonattractive fixed point, and all points lying in the circle spiral outwards towards the circumference. On the circumference itself, there are 3 fixed points located, say at 0 = 0, 27r/3, 47r/3, which in turn define 3 arcs. Any point lying on one of these arcs will move under iteration in a clockwise direction, ultimately converging to the fixed point which constitutes the right-hand boundary of the arc in question. Interior points, however, can never reach the boundary. In general, the sequence of iterates of any interior point (excluding the center) does not converge.

3. We have not studied in detail the rate of approach to the limiting configuration. In the neighborhood of a fixed point, this rate is usually easy to obtain, but for oscillating configurations the algebra is more difficult. For most of the cases studied, convergence (to 8 decimal places) was either attained within 100 iterations (some were much


225

faster) or else not for many thousands of iterations. We shall not discuss this further except to remark that the path of approach to a fixed point may depend very critically on the initial point. As an example, we refer the reader to Figure 8. For this transformation, two of the corners of the triangle are non-attractive fixed points; initial points in their neighborhoods iterate smoothly to the i.f.p. In contrast, points in the neighborhood of the origin iterate to the i.f.p. in an oscillatory manner; the existence of a limiting line through the i.f.p. is clearly evident.

XII—
Connection with Ordinary Differential Equations

No doubt it will have occurred to the reader that Binary Reaction Systems and their extensions have an obvious connection with systems of ordinary differential equations. Consider, for instance, the system of differential equations:

where the fi (xl ... XN) are homogeneous quadratic functions of the variables. A straightforward finite difference approximation to the system is: xn+ 1 (1 -t)x() + At ft (71) xN+ l) - (1 - At)X) + At f )


226

If we now restrict the fi to be disjoint partial sums of the terms in (Xl+...+ XN) 2 , such that ilfi = (xi +... +N)2 , i.e., just the terms that occur in our binary reaction transformation, then the above set of difference equations goes over into a Binary Reaction System for At = 1. With this restriction on the fi, the system of difference equations has the property xn = 1 for all n if Ex() = 1, independent of the time-step At. It should be observed that for At > 0, the system has the same fixed points as the corresponding Binary Reaction System.

As a very simple example, consider the system: 2 2 1 with i.f.p. xl = 2= -, 2 (72) X2 = 2xlX2 and b.f.p. xI = 1, x2 = 0.

Any interior point (xif 1, 0) will iterate to the i.f.p. The corresponding differential equation is (eliminating x2 ): dx=2x - 3xl + 1 (73) dt 1 of which the solution is: 1 /2-1-x 1/2 -Jx (0 l(74) 1/2-^ 1/2-2' x

Clearly, as t - oc, xl -, 1/2, x(°)# 1, 1/2. Thus, in this case, the asymptotic behavior of the differential equation is the same as that of the corresponding Binary Reaction System. This, however, is not generally so. For example, the transformation I.5.d (see Table II) possesses a boundary double point which is attained for a certain set of initial points. The corresponding differential equation system, when integrated according to the finite difference scheme (71) with At = 26, converged to the n.f.p. xl = 1 regardless of the initial point.

In such cases the Binary Reaction System, viewed as a finite difference approximation to the corresponding system of differential equations, is clearly an "unstable" scheme. We hope to discuss the point further in a subsequent report.


227

Explanation of Graphs

1. Figure 1 is a plot of successive iterates of a point under the transformation equation (53) of the text. The outer curve shows the iterates from cycle n = 1550 (point ii) to cycle n = 1570 (point fl); the next curve goes from n = 10,101 (point i2) to n = 10,114 (point f2), while the innermost curve goes from n = 75,001 (point i 3) to n = 75,011 (point f3).

2. Figures 2 and 3 show the (experimentally determined) boundaries between two types of limiting behaviors for two different Binary Reaction Systems (see discussion in Section XI). In terms of the S, a coordinates, these transformations are: 1.2.m (Figure 2): 3S2 - a2S' = 1 + + 3aS - 2S S2 _ 3a2 a -3aS + 2a 2 I.5.h (Figure 3): 1 + a2 - 3S2 S= l+~2 -a+S+3aS I + S2 - 3,a2(a = + a -S - aS

In each case, points in the inner region (marked "CONV.") will iterate to the fixed point, while points outside this region will reach an oscillatory final state (in I.5.h, all three outer regions are oscillatory, though only one is so marked).

3. Figures 4, 5, and 6. These figures give the S, a mapping explicitly for three different Binary Reaction Systems: I.2.j (Figure 4): S' = 1 - 4a + 4a2 + 2aS (cf. equation (47) of the text) a' = 2aS I.5.h (Figure 5): See 2. above.


228

III.2.1.a (Figure 6): S' = S + 3a + 6aS a' = S- a + 3a2 - S2

In Figures 4 and 5, the numerically labelled lines are the transforms of lines of constant a; e.g., in I.5.h the curve labelled .25 is the transform of the horizontal line a = .25. The curves labelled a' = a and S'= S are lines of constant a and constant S, respectively.

Figure 6 is plotted in a 60° , 30° reference triangle, i.e., in terms of the variables S and t = vJa. The labelled curves are the transforms of lines of constant t, and the arrows indicate the order of the transformed points as the relevant lines of constant t are traversed from left to right (direction of increasing S). Note the change of direction for t > v3/6. (The vertical line with two arrows labelled 1/6 is the transform of a = 1/6 or t > \/3/6. This line is doubly covered, first upwards, then downwards, as S increases.)

4. Figure 7 illustrates the "non-convergent" case. See Section XI, equation (69), and the accompanying text.

5. Figure 8 illustrates different modes of convergence to a fixed point. See part 3 of Section XI for discussion.


229

blank


230

blank


231

blank


232

blank


233

Table I—
Summary of Convergence Behavior of Three-Variable Binary Reaction Systems

Note: In addition, we have the following I.3.g i.t.p. III.2.3.a non-convergent


234

Introduction to Table II

This table summarizes the properties of all 97 Binary Reaction Systems in 3 variables. The notation is explained in Section IV of the text. For a few systems, the behavior of arbitrary vectors under iteration can be predicted theoretically owing to the fact that the system reduces to a difference equation in a single variable (e.g., I.l.b, 1.2.q, etc.).* We have not though it worthwhile to note these instances explicitly; anyone using the table will immediately discover them for himself.

For the 3-variable case, the coordinates of all fixed points could be written explicitly in terms of radicals, since the f.p. equation is at most of 4 th degree. There are many interesting relationships between the roots of the various f.p. equations. We have not investigated these relationships systematically, although it could easily be done using standard tools (cf. L. E. Dickson).6

Table II—
Three-Variable Binary Reaction Systems

I.l.a (11) + (22) + (33) n.f.p. 2(12) + 2(13) b.f.p. 2(23) J degenerate no i.f.p. m -* b.f.p. I.l.b (11) + (22) + (33) n.f.p. 2(12) + 2(23) 2 b.f.p.'s 2(13) J doubly degenerate no i.f.p. m - b.f.p. (x3 = 0)

* In some other cases, one may observe that one of the variables will obviously iterate to zero, in which case the limiting behavior is also evident.


235

Table II (cont.) I.l.c ( n.f.p. b.f.p. j not degenerate i.f.p. given by (xX - - .or (y l) x .i.f.p. yy - .m -, i.f.p. I.a ( ( n.f.p. ( ne degenerate i.f.p. given by (x x v- xx - .i v/-(cf e)m- i.f.p. x-.(cf I) I.b ( ( n.f.p. ( ne degenerate no i.f.p. m -n.f.p.


236

Table II (cont.) I.c ( ( n.f.p. ( nfp not degenerate no i.f.p. m -, n.f.p. I.d ( ( n.f.p. ( b.f.p. ) degenerate no i.f.p. m - b.f.p. I.e ( ( n.f.p. ( b.d.p. (see below) J not degenerate no i.f.p. one coordinate of b.d.p. (x x given by - - x .56519772, xl . x .X0, X.b.d.p. m - b.d.p. x.43480228, x J


237

Table II (cont.) I.f ( ( n.f.p. ( b.f.p. J degenerate no i.f.p. m -— b.f.p. g ( ( ) n.f.p.'s ( b.f.p. J doubly degenerate no i.f.p. m -b.f.p. h ( ( n.f.p.'s ( degenerate degenerate no i.f.p. m -t n.f.p. (x i ( ( nn.f.p.'s Xl ] ( x ni.f.p. degenerate m , n.f.p. (xl


238

Table II (cont.) I.j ( ( nf's ( degenerate i.f.p. J m i.f.p. (See Section IX. of text for discussion) I.k ( ( p's n.f.p.'s ( ( ot degenerate i.f.p. given by (x xi) x . - i.f.p. £.m -* i.f.p. x. ( ( n.f.p.'s ( b.f.p. J doubly degenerate no i.f.p. m -* n.f.p.(x I.m ( ( x - b.d.p. . ( b i i.f.p. degenerate (cf I.b) I ^ ^ - (cf b) x) - .(cf. III.l.d)


239

Table II (cont.) I.m (cont.) m -* i.f.p. with initial vectors ( x) .x) and ) .x x .m - b.d.p. with initial vector (See Section XI for discussion) x() .x ) .x . I.n ( ( bd b.d.p. ( degenerate degenerate The only f.p. is the b.f.p., which is the i.f.p. of the 2-variable system System ( .| ( 3-e .(X This f.p. is easily shown not to be attractive. And, experimentally m -* b.d.p. I.o ( ( b b.d.p. ( ( not degenerate i.f.p. given by (x x x.- - X.i.f.p. m -i.f.p. x.


240

Table II (cont.) I.p ( ( b.f.p. (cf. I.n) ( b.d.p. J degenerate i.f.p. given by (x xi) l.x - xi.f.p. /3- (cf. h) - (cf. II.b) m -* i.f.p. with initial vectors x() x() x x() and x?) and x) .x -) x) x()m -* b.d.p. with initial vector x() .x() .x . I.q ( ( b.d.p. ( b not degenerate i.f.p. given by (x xl) .x- V5-i.f.p. -Ox- - .i m - b.d.p. X.


241

Table II (cont.) r ( ( b.f.p. (cf. I.n) ( b.d.p. J degenerate no i.f.p. m -* b.d.p. I.a ( ( n.f.p.'s degenerate degenerate ( no i.f.p. m -* n.f.p. (x I.b ( ( ) n.f.p.'s b.f.p. ( J doubly degenerate no i.f.p. m - b.f.p. I.c ( ( n.f.p.'s b.f.p. ( J not degenerate no i.f.p. m -- n.f.p. (x xl)


242

Table II (cont.) I.d ( ( n.f.p. n.f.p not degenerate ( no i.f.p. m - n.f.p. I.e ( ( xl ) In.f.p. den e i.f.p. ( x i.f.p. exists m -l i.f.p. I.f ( ( n.f.p. nfp not degenerate ( i.f.p. given by (x x) .- x.i.f.p. m - i.f.p. x .J I.g ( ( }) . b.d.p. .i.f.p. not degenerate ( .


243

Table II (cont.) I.g (cont.) i.f.p. given by (x x - - x m -- interior triple point xnx) .73924369, x() .08071790, n(n x) .01294586, x( .37280086, x) .i.t.p. - x ) .24781045, x(n) .54648124, .. I.h ( ( ) b.f.p. (cf. I.n) b.d.p. ( J degenerate i.f.p. given by (x xl) xi . m i.f.p.) mi.f.p. 1- X . I.i ( ( b.d.p. bdp not degenerate ( 3- v/. i.f.p. given by (x xl)x .x- - .i.f.p. m - b.d.p. - .(cf. f)


244

Table II (cont.) I.a ( ) n.f.p.'s ( ( b.f.p. doubly degenerate no i.f.p. m -- n.f.p. (xl I.b ( n.f.p.'s ( ( degen degenerate i.f.p. given by (x x xl - V - .i.f.p -Vif..2- (cf. I.m) m -, i.f.p. X . I.c ( n.f.p.'s ( ( not degenerate no i.f.p. m -- n.f.p. (xl


245

Table II (cont.) I.d ( b.f.p. (cf. I.n) ( ( b.d.p. ) degenerate no i.f.p. m -. b.d.p. I.e ( b.f.p. (cf. I.n) ( ( b.d.p. J degenerate i.f.p. given by (x xx - - i.f.p. m - b.d.p. (v2-(cf a) X. I.f ( b b.d.p. ( ( ( (ot degenerate i.f.p. given by (x x .- x - .i.f.p. m -* i.f.p. X.J


246

Table II (cont.) I.g ( n.f.p. ( ( b.f.p. J degenerate no i.f.p. m - b.f.p. I.h ( n.f.p. ( ( b.f.p. J degenerate no i.f.p. m -- b.f.p. I.i ( n n.f.p. ( ( not degenerate i.f.p. given by (x x x.- - x. m - i.f.p. x .J a ( nfps n.f.p.'s ( triply degenerate no i.f.p. m - n.f.p. (xl


247

Table II (cont.) I.b ( n.f.p. b.f.p. (cf. I.n) ( b p. b.d.p. ( degenerate no i.f.p. m - n.f.p. I.c ( nfp's ( deen degenerate ( no i.f.p. m -- n.f.p. (Xa I.d ( n.f.p. ( b.d.p. (see below) ( not degenerate no i.f.p. In addition to the "trivial" b.d.p. (0, 1,, (0,0, , there is a b.d.p. which is a f.p. of the transformation X X-2 , Viz. Xz X 3- xV5- - b.d.p. x This is not an attractive f.p. If xO) 0, x < (- v/2, the ini- tial vector - xi If a) 0, x ) > (- v/)/2, the initial vector the trivial b.d.p. This behavior has been verified numerically.


248

Table II (cont.) I.e ( n 's n.f.p.'s ( ( doubly degenerate ( no i.f.p. m - n.f.p. (x I.f ( n.f.p. ( b.d.p. ( J not degenerate i.f.p. given by (x x X - - Z.i.f.p. - (cf. i) m - . i.f.p. . I.g ( n.f.p. b.f.p. (cf. n) ( bd.p. b.d.p. ( degenerate no i.f.p. m - b.d.p.


249

Table II (cont.) I.h ( b.t.p. ( btp not degenerate ( i.f.p. given by (x xi)x .i.f.p. X . m -- i.f.p. with initial vectors x) .x)and x) .XJ ( m -- b.t.p. with initial vector x) .x) .(See Section XI for discussion.) . I.i ( n.f.p. b.f.p. (cf. I.n) ( b.d.p. ( degenerate no i.f.p. m -* b.d.p.


250

Table II (cont.) I.j ( b.d.p. ( bdp not degenerate ( i.f.p. given by (x x) x. xxx - .i.f.p. m -* b.t.p. x .j k ( n.f.p. b.f.p. (cf. I.n) ( b.d.p. ( degenerate no i.f.p. m -- b.d.p. ( x . ( b.t.p. .cf. .n for not degenerate x. fo value of ( ) .(cf. of .r) (cf. I.r) i.f.p. given by (x x x3 - x 2 - m -* i.d.p. (n (n) (n-i) xI(x - .67046846, xi .(n(n) (n) x) .31224737, x .i.d.p. (n (n)(n) x .01728417, x .


251

Table II (cont.) I.m ( n.f.p. ( b.d.p. b.f.p. (cf. n) ( degenerate no i.f.p. m -* b.d.p. I.n ( ) b.t.p. ( t (not degenerate ( i.f.p. given by (x x x . x - xi.f.p. m -* i.f.p. x .(cf. for value of x o ( ) n.f.p. ( b.d.p. ( not degenerate i.f.p. given by (x x x .x _ .i.f.p. m - i.f.p. X I.p ( b.t.p. ( not degenerate (


252

Table II (cont.) p. (cont.) i.f.p. given by (x x x x- x- .i.f.p. m i.d.p. x. m - i.d.p. (n ) .42717428, x .xnx) .53818709, x .i.d.p. (n x) .04463863, x( . I.q ( n.f.p. ( b.f.p. (cf. n) ( b.d.p. ( degenerate no i.f.p. m -- n.f.p. with initial vectors x() x) .x) and x) .( x) .m -* b.d.p. with initial vector () . .x) . I.r ( b.t.p. ( t ( f not degenerate ( i.f.p. given by (x x xi X- x- I X .\ i.f.p. m - b.t.p. x.J(cf.


253

Table II (cont.) a n.f.p.'s ( ( b.f.p. ( J degenerate no i.f.p. m - b.f.p. I.b bd b.d.p. ( ( not degenerate ( i.f.p. given by (x x xl ._-x - X.i.f.p. m -* i.f.p. x . I.c n n.f.p. ( ( not degenerate ( i.f.p. given by (x X X.- i.f.p. m -÷ i.f.p. x . II..a ( ( ( } n.f.p. b.f.p. J doubly degenerate no i.f.p. m - b.f.p.


254

Table II (cont.) II.l.b ( ( ( n.f.p. ~~~ }degenerate no i.f.p. m -- n.f.p. II.d ( ( ( n.f.p. b.f.p.'s J doubly degenerate X There is a continuum of i.f.p.'s, since -- a. The i.f.p. corresponding to this a is Xx l a Xa) i.f.p. -a) Note that if we write xxY2, xl yi, then the system takes the form tYY which, for a general point, Y - Yl Ym -* appropriate i.f.p. II.l.f ( ( ( ) n.f.p. ~ not degenerate There is a continuum of i.d.p.'s owing to the fact that Thus, if X a, the i.d.p. is given by x' x(


255

Table II (cont.) II.f (cont.) x(n xn) n _ n) (n a x)' - - 1-a) i.d.p. )' ) _(n ( xn) a (n) -(cf II.d) This system reduces to the same 2-variable system as does II.c m -* appropriate i.d.p. II.a ( ( } n.f.p. ( ) } degenerate no i.f.p. m - n.f.p. II.b ( ( } n.f.p. ( degenerate i.f.p. given by (x xx x x - .i.f.p. i.f.p. - (cf. I.p.) - II.c ( ( n n.f.p. ( not degenerate no i.f.p. m -* n.f.p.


256

Table II (cont.) II.d ( ( p's n.f.p.'s ( ~~~( tdoubly degenerate no i.f.p. m n.f.p. (x II.e ( ( n.f.p.'s ( b.f.p. J doubly degenerate no i.f.p. m -* b.f.p. II.f ( ( n.f.p. ( ) doubly degenerate no i.f.p. m - n.f.p. (x II.g ( ( b.f.p. (cf. I.n) ( b.d.p. Jdegenerate no i.f.p. m -* b.d.p.


257

Table II (cont.) II.h ( ( b.f.p. (cf. I.n) ( b.d.p. J degenerate xl } i.f.p. exists M --b.d.p. ^~~xi.f.p. m -* b.d.p. x II.i ( ( b.d.p. not degenerate xl ) i.f.p. exists x m - ifp 4- i.f.p. X(cf. II.h) II.a ( n.f.p.'s doubly degenerate ( no i.f.p. m -- n.f.p. (xl II.b ( n.f.p. ( b.d.p. ( J not degenerate no i.f.p. m - n.f.p.


258

Table II (cont.) II.c ( n.f.p. b.f.p. (cf. n) ( ^J bdb.d.p. ( degenerate no i.f.p. m - b.d.p. II.d ( b.t.p. ( not degenerate ( J i.f.p. given by (x xl) - -or (unreduced) x - -(cf. Eq. ( of text) or (y Xx .i.f.p. y - xm —- i.f.p. III.a ( ( ) n.f.p.'s b.f.p. ( J doubly degenerate no i.f.p. m -b.f.p.


259

Table II (cont.) III.l.b ( ) n.f.p.'s b.f.p. ( J degenerate no i.f.p. m - n.f.p. (x III. .c ( ( ) n.f.p. b.f.p. ( Jdegenerate xi.f.p. exists i m- i.f.p. X .i.f.p. m --- i.f.p. x- . III.l.d ( ( n.f.p. ) b.f.p. ( Jdegenerate 2- v/ - .i.f.p. exists i.f.p. m i.f.p. JV(cf. I.m) .


260

Table II (cont.) III.l.e ( ( ) n.f.p. ot degenerate ( i.f.p. given by(x xl)x . - 6- - xi.f.p. m - i.f.p. x .J III.l.f ( ( ) n.f.p. b.d.p. (see below) ( j not degenerate x. i.f.p. given by (x xx.i.f.p. I - - x)X- /.A non-trivial b.d.p. exists, which is a f.p. of the transformation x' - x [- - x] This f.p. is given by the root of (xX - - which leads to x x) 31944846, x .(n (n) (n l) xx , x.(n) .68055154, x(n £--£ Actually m - i.d.p. x(n x .17899745, x .(n, (n) . x_(n xn 44248, .i.d.p. x( x .61156007, xz(n .


261

Table II (cont.) III.g ( ( ) b.f.p. ( cf. n) b.d.p. ( degenerate - i .i.f.p. exists X i.f.p. m - i.f.p. - (cf. I.a) X .J III.h ( ( b b.d.p. not degenerate ( i.f.p. given by (x x) x .- - .i.f.p. m -, i.f.p. X- . III.i ( ( b b.d.p. not degenerate ( i.f.p. given by (x - - x.i.f.p. m -, i.f.p. x.J III.a ( n.f.p.'s ( not degenerate ( ) i.f.p. xiXXm - P i.f.p.


262

Table II (cont.) III.b ( ) n.f.p. ( b.d.p. ( J not degenerate i.f.p. xi Xxm -P i.f.p. III.e ( b.t.p. ( b.t.p. not degenerate ( i.f.p. xxxm - i.f.p. III.a ( n.f.p.'s ( doubly degenerate ( i.f.p. XXX m - i.f.p. (xl with initial vectors xl .xl ..and x...m - n.f.p. (x with initial vector x.x.(See Section XI for discussion) X .


263

Table II (cont.) III.b ( n.f.p. ( b.d.p. ( J not degenerate i.f.p. x xxm -- b.d.p. III.c ( ) n.f.p. ( sIb.f.p. (cf. I.n) ( bdp b.d.p. ( degenerate i.f.p. Xlxx m - i.f.p. III.e ( ) b.t.p. ( b.t.p. not degenerate ( i.f.p. lXX m -— i.f.p. III.a ( n.f.p.'s ( > triply degenerate ( i.f.p. xX Xm - no convergence (see Section XI for discussion)


264

Table II (cont.) III.b ( ) n.f-p b.f.p. (cf. I.n) ( b.d.p b.d.p. ( ) degenerate i.f.p. xlx x m -- n.f.p. with initial vectors xl.xl .X.and X..x.m - b.d.p. with initial vector xl .x .x. III.e ( ) b.t.p. ( b not degenerate ( i.f.p. XlX xm - b.t.p. I.f ( b.t.p. ( b.t.p. not degenerate ( i.f.p. Xxxm - b.t.p.


265

Introduction to Table III

This table lists a few 4-variable generalizations (and two 5-variable ones) of selected 3-variable systems. The method of generalization is explained in the text (Section VII, 1.5.j - ext - I and subsequent discussion). The basic notation is that of Table II, but degeneracy, b.f.p.'s, n.f.p.'s, etc., are not noted.

Table III—
Examples of Binary Reaction Systems for N>3

I.l.c - ext - ( ( ( ( f( xi . i.f.p. given by (x xi)x) - X. m - i.f.p. X . I.a - ext - ( ( ( ( x i.f.p. given by (x x .x- - x - x . m - .i.f.p. X.


266

Table III (cont.) k - ext - ( ( ( ( i.f.p. given by (x xl) X. (- - X .m -i.f.p. . I.p - ext - ( ( ( v/ xl ()x i i.f.p. m - b.d.p. -/-xs ^ (1,0,0,, (0,1,0, X . I.g - ext - ( ( ( ( i.f.p. gienby (x( xxl . i.f.p. given by ( x ) .- . m - b.t.p. (1,00,,, (0,0,1,, (0,0,0, X.


267

Table III (cont.) I.i - ext - ( ( ( )x .( .. i.f.p. given by (x xl) x .i.f x - .m -- i.f.p. with initial vectors: x() .x() x) .and x) .x) - .x® .x) .x().m -b.t.p. with initial vector: x) .( . .(.I.e - ext - ( ( ( (x i.f.p. given by (x x .xxx - X.nm - b.d.p. Z.J


268

Table III (cont.) m -* interior periodic points of period 12: Cycle


269

Table III (cont.) m -, interior periodic points of period 6: Cycle i.f.p. given by


270

Table III (cont.) b - ext - ( ( ~~( ( ~xl . ~(~)~ x ( .i f.p' i.f.p. given by (x x X. - - 2 - m - i.f.p. This i.f.p. is very close to that of II.d - ext - (q.v.), as might be expected. I.c - ext - ( ( ( ( i.f.p. given by (x x) . xxxx X .i.f Xm - •i.f.p. X.(cf. I.b - ext - II.d - ext - ( ( ( ( xl .} i.f.p. given by (x x X .X X - ..f.. x .m - i '.f.p. X.(cf. Eq. ( of the text)


271

Table III (cont.) II.d - ext - ( ( ( ( ( x .i.f.p. given by (x xi) .x xxx x- x.i.f.p. (cf. Eq. ( of the text) .m i.f.p. X. III. .c - ext - ( ( ( ( x, .. f • ^ ( ^ X .i.f.p. given by (x X i.f.p. x.- - (-x - . m -* i.f.p. III.f- ext - ( ( ( i( . i.f.p. given by (x XX .- x - xx -X ( m i.f.p. X.


272

Explanation of Tables IV and V

Each of these tables consists in a tabulation of the convergence behavior of random initial points under 26 transformations which are simple modifications of I.3.g. These 26 systems are generated as follows (see Section VIII.2 of text):

In Table IV, we retain the "skeleton" x1 =1/2x2 + 1/2x+ 2xx2 + ..2=2xlx3+ 2x2X3+ ... x' =1/2x2 + ...

To this we add in all possible ways the missing terms 1/2l2 , 1/2x2, 1/2x3. Twenty-six new systems result (the 27th is identical with 1.3.g). The notation is the same as in the previous tables, but we give only the numerical results, omitting comments and i.f.p. equations.

Table V is similar, except that the skeleton is x] =x2 + x3 + x3x2 +.. 3 =3x 2 + 23 + 12+ . x2=1XX3+ X2X3+ ... x3 ~l +... and the missing terms are x1x 2, xlx 3, x 2x 3.

Table IV—
Modifications of System I.3.g.

Case ( ( . x.i.f.p. x. m -— i.f.p. Case x . .i.f.p. m - i.f.p. .


273

Table IV (cont.) Case x . x .\ i.f.p. ( X .m -, i.f.p. Case ( x. x.i.f.p. X.m - i.f.p. Case ( . x.i.f.p. X.m -* i.f.p. Case ( ( x . x.i.f.p. X .m -, i.f.p. Case ( x . x.i.f.p. X . m -, i.f.p.


274

Table IV (cont.) Case ( xi . X., i.f.p. X .m -. i.f.p. Case xi . x .i.f.p. X. m - i.f.p. Case xl . x .i.f.p. X .m - i.f.p. Case xi . X.i.f.p. ( m , i.f.p. Case . .i.f.p. X . m -P i.f.p.


275

Table IV (cont.) Case ( xi . .i.f.p. .m -. i.f.p. Case xI . x .i.f.p. x.m -i.f.p. Case xl . x .i.f.p. ( x.Jm -* i.f.p. Case xi . .i.f.p. ( x.m -* i.f.p. Case xl . x.i.f.p. x . m -, i.f.p.


276

Table IV (cont.) Case ( x x.i.f.p. ( .m -> i.f.p. Case ( x . x .i.f.p. X.m -- i.f.p. (Very slow convergence; IJacobiani .at i.f.p.) Case x .x - -.i.f.p. m i.f.p. X. Case xl . x.) i.f.p. . m -- i.f.p. Case . x.i.f.p. X .m - i.f.p.


277

Table IV (cont.) Case ( xl x .\ i.f.p. x.m -l i.f.p. Case ( . x .) i.f.p. ( X.i.f.p. given by (x xl) - - m -, i.t.p. (n () .87777286, x(n) .00766150, x(n .(n x).00011739, x (nl) .22185332, x - .i.t.p. Xn x) .12210975, X(.77048518, ( Case ( xi . x .) i.f.p. .m -i.f.p. Case ( x . .) i.f.p. ( X. rn - i.f.p.


278

Table V—
Modifications of System I.3.g.

Case ( ( ( ( .( ( x.i.f.p. ( X.m - i.f.p. Case ( ( ( . ( x.i.f.p. ( x.m - i.f.p. Case ( ( ( xI .( ( x.i.f.p. ( ( ( ( X.m -, i.f.p. Case ( ( ( x ( xi.f.p. ( Xm -- i.f.p.


279

Table V (cont.) Case ( ( ( x . ( X.i.f.p. ( X .m -i.f.p. (slow convergence) Case ( ( ( ( ( xi ( ( ( x.i.f. ( X.m - i.f.p. (cf. Case Case ( ( ( xi .( ( x.i.f.p. ( ( X . nm - i.f.p. Case ( ( ( xi .( ( x .i.f.p. ( ( x . m - i.f.p. Case ( ( ( ( ( x ( ( x. ( ( X .m i.f.p. (cf. Case


280

Table V (cont.) Case ( ( ( xi ( ( x.i.f.p. ( ( X .m -- i.f.p. Case ( ( ( ( ( x .i.f.p. ( ( X .m -, i.f.p. Case ( ( ( x .i.f.p. ( ( X .m - i.f.p. Case ( ( ( ( . ( ( X.i.f.p. ( X.m -— i.f.p. Case ( ( ( ( x( ( x .i.f.p. ( X .m -i.f.p.


281

Table V (cont.) Case ( ( ( XXX i.f.p. ( ( ( ( m -* i.f.p. Case ( ( ( xl xx i.f.p. ( ( ( ( m -* i.f.p. Case ( ( ( xi X i.f.p. ( ( ( ( ( ( m -* i.f.p. Case ( ( ( ( xi .i.f.p. ( ( x. ( ( ( (cf. Case mn -- i.f.p. Case ( ( ( ( .( ~ ~(~.i.f.p. ( ( x.( ( ( .m -* i.f.p.


282

Table V (cont.) Case ( ( xl X .} . ( ( .( ( ( (cf. Case m -, i.f.p. Case ( ( x . ( x.i.f.p. ( ( X.J m - i.f.p. Case ( ( ( XX .( ( (cf. Case m -> i.f.p. Case ( ( ( ( x ( ( ( x.i.f.p. ( ( X.m -* i.f.p. Case ( ( ( ( x . ( XX .J ( ( (cf. Case m - i.f.p.


283

Table V (cont.) Case ( ( ( ( x .( ( ( x .i.f.p. ( ( . m -* i.f.p. Case ( ( ( ( . ( X .i.f.p. ( ( X.m -, i.f.p.


284

Table VI—
Super-Systems

e - Super a. Initial Configuration yi .y .y.Y .s .Y .y .Y .yg .b. Initial Configuration yl ..y .Y.Y .Y.Y ..yg .Both of these gave the final periodic configuration (Yi ( y(n) .yn y(n .) .yn) y) - .y V y(n) (n)y(n)(n n .y(n (n) (n) (n) Y Y?n) ) (n _ (n- .Y .y For the initial configuration c. yi .Y ..Y .Y .Y .y .Y .yg .07284413, the final configuration was ( y(): y(n .y- .(n) y) .y .y(n) (n) (n) (n)y(n+- 31944846 y(n)= 0 y(n) =.24574926 (n+l) (n+)0 (n+l) =Y = 2492 6 = . 290536 9 -0(n)= .24574926 )= 40 = 18905302 Y3 .18905302


285

Table VI (cont.) I.g- Super Initial Configurations: a. yi ...Y..Y.Y .Y.yg .b. yl .y..Y.y.Y.Y .Y .y .These both gave the final configuration (y n y(n)): y ) .(n) .y.y) .y(n) .) - .) .(n) () .y y_ (n) yn .(n .._~n) ( _ ) ~ (n)yl .y .y .(n) y(n (nya^ .Y) .Y.y(n y . .(n (n (n Y.Yn .Y .n .Y. .Initial Configuration: C. yi ..y .Y.Y.Y.y .Y .y .This gave the final configuration (yn yi: y ) .y .y() .y(n) .) .( .y) .y() .(n) _n _(n)(n y() .(n) y ._n- _(n .yn) .Y) ..( _) _(n (n Y- .Y ..(n (n (ny( ...(n y (n .Y...(n .(n (n _ .Y.YS~~~~~~n_s .


286

Appendix

1. In this appendix we summarize the results for a class of homogeneous quadratic transformations quite distinct from the class we have called Binary Reaction Systems. This class arises in a natural way in the study of a certain crude model of the "evolutionary process" which will be described below. It also has some mathematical interest, owing to the fact that the limiting behavior of all systems belonging to this class can be explicitly predicted.

2.The Evolutionary Model

Consider a large population in which each distinct "type" of individual is labelled by an index pair (i,j), i,j = 1,2,... ,N. Let the fraction of the male population which is of type (i,j) be denoted by xij. We shall assume that xij = xji; also, we take the number of females of type (i,j) to be equal to the number of males of this type-hence there is no need to denote the fraction of females by a separate letter. We now impose a mating rule (random mating is assumed) which states, in effect, that if individuals of type (k, ) mate with individuals of type (m,n), the progeny will be of all types (i,j) such that min(k, m)<i <max(k, m) min(e, n)<j< max(f, n)

Loosely speaking, we may call each index of a pair a "characteristic." A given mating will produce all possible children such that (1) is satisfied, the distribution of the two indices determining the progeny being the product of two identical distributions. The number of children will, of course, be proportional to the number of parents of each type. Mathematically: x(n+1) _ ,kmnxn(n))(n) (2) ^ij~/--li lj X k£ -"mn k,£,m,n

The sum in (2) is to be carried out under the restriction (1). We specify the system further by postulating: m= m mk > 0, min(k, m)<i < max(k, m) z - (3) = 0 otherwise; k i/kN 1 (4) i=m


287

ikrn m + k-EZkm=rn+k 2(5) i=m In addition, we normalize by taking: N Zx)- = 1,<x)< 1, alli,j (6) i,j=1 It is then evident that we have: N E ij i,j for all n.

The class of systems defined above is the one we shall actually discuss. However, it may be of interest to at least mention the actual evolutionary model in connection with which equation (2) arises. Evolution is assumed to take place by mutation. A type (i,j) can give rise to two new types (i+1,j) and (i,j+1) with some small probability. When we include this (linear) effect, we get a series of equations: Xij = -exij + e/2(xi-l,j + xi,j-) + Z 7kmnXktxmn (7)

Here E is taken as some small number. We actually performed many numerical experiments on systems of the form (7) with special values of the <ym satisfying (3), (4), (5). Two particularly convenient choices are: -= 2-k i-min(j,k)) (8) and k = Jj-kJ+ , min(j, k) <i< max(j, k) = 0 otherwise

Note that with our definition (3) we may take the sum in (2) as unrestricted, i.e., over-all k, , m,n = 1/2,..., N. We shall not discuss the behavior of the "mutating" system (7) in this report, but rather restrict ourselves to the pure "mating" system (2).*

It may be objected that our mating rules have nothing to do with Mendel's Laws. This is intentionally the case. The Mendelian case has been treated in great generality in a series of papers by Hilda Geiringer;7 , See also C. C. Li:8


288

3. If we sum over one of the indices in (2), we obtain the system: N C (n+l) = E "kmFC (n)(n) i1,... N (10) k,m=l N Ci-xij (11) j=l and, of course, N E C(n)= 1 all n. (12) i=l

By virtue of condition (5) on the km, the system (10) possesses a linear invariant (distinct from E Ci = 1); in fact: E jC(n+ - EC()C(n)E jm = E (n)C(n)(e+ ) jm 29 j e,m j £ ,m = E7 cj) (13)

The consequences of this property are very interesting. It turns out that the existence of this linear invariant enables one to predict explicitly the limiting behavior of any initial vector (C(O) C°O),...CO)) when iterated according to (10). Using the fact that EN Ci = 1 is also an invariant, we may define an invariant: N-1 a--, (N - i)Ci (14) i=l

a is, of course, explicitly determined by the initial vector. It can then be proved that every initial vector will converge to a definite fixed point which is determined as follows: For the given value of a, there is one index j such that: N-j>a> N-j-1 (15) The f.p. is then explicitly given by: Cj = a- (N - j - 1) Cj+i = N-j-a o (16) all other Ci = 0


289

The f.p. is independent of the actual values of the coefficients "km providing they satisfy (3), (4), and (5).* These results can easily be referred back to the original variables xij. Defining a quantity a from (16) by: Cj - l+ (17) we find that the corresponding symmetric tensor (x(°) will converge to the final state: 1 3 (1 + a)2Xj+l,j = j,i+l- (1 + a)2 (18) a2Xjj+I,j= (1 + )2

The results can also be extended to the case of M "characteristics" il, i2,- ·-iM The fraction of the population of type (il,. . .,iM) is then denoted by xii...iM. This is taken to be symmetric in all M indices; hence, there are: (N+M-1) M

types of individuals in the population. The corresponding mating rule is: X(n+1) = E \^,1S,354 S2M-1S2M il...iM...~S1S2 S3S4...• ' ' 'l 'Yi2M' ' ''M S1,S2S 3,S 4(n () (19) S1S3...S2M-1S2S4...S2M

If we sum over M- 1 indices, we can again define:

*Of course the rate of convergence will, in general, depend on the actual values of the "km. In one case it is actually possible to solve explicitly for the iterates as functions of n. This is the case N = 2 where we find: (n)ii - n [O) 2(2 "X~l)--2- Ix?) + or (2 - 1)] X(0) (0) 11 lx12 This case can be viewed as a generalization of the Mendelian law for a single gene. The population has the same limiting configuration in both cases, but for Mendel's rules equilibrium is reached in a single step "(Hardy's Law").


290

N Ci il...iM (20) i2,...,iM=l and we again get the system (10). In terms of the tensor xi, . . iM, the final state is: •• i = (1 + a)Mxj,j+ ...j = (1 + )M = (1 + a)M (21) Xj+l,j+l...I+ = (1 + c)M Here j and a are determined as in (16) and (17).

4. The explicit nature of these results is due to the existence of the linear invariant a, which in turn is a consequence of the "meanpreserving" property (5) assumed for the coefficients yijk. However, the actual convergence properties are probably more closely connected with the "index-limiting" condition (3). (Our Binary Reaction systems do not, in general, have this property; the existence of oscillating final states may well be a consequence of allowing such mating rules as jEj - k 5 j.) Some cases have been investigated in which the conditions on the coefficients were relaxed. In particular, we have considered systems for which we no longer require: jk > 0 all j, k. As an example, we chose the case: N yi = bj,k + 6j+l,k + 6j-l,k (22) i=l

This corresponds to a sort of "selective mating scheme" in which only individuals with "nearby"- indices can mate. Of course we then no longer have E Ci = 1. In order to secure this property, we must "renormalize" at every step, i.e., set: C'i *


291

If this is formally written out, the system is no longer quadratic. There are then many types of fixed points possible. So far it has not been possible to give an analytical treatment of such systems.*

References

1. See, e.g., R. Sauer, Math. Ann.106, 722, (1932), 0. Baier, Math. Ann. 112, 630.

2. Coddington and Levinson, Theory of Ordinary Differential Equations, Chapters 15, 16, McGraw-Hill, New York (1955).

3. V. Volterra, Leqons sur la The6rie Mathematique de la Lutte Pour La Vie, Gauthier-Villars, Paris (1939).

4. H. Weyl, Math. Ann. 77, 313 (1916).

5. S. Ulam and J. von Neumann, Bull. Am. Math. Soc. 53, 1120 (1947).

6. L. E. Dickson, Modern Algebraic Theories, Chapter II, Sanborn, Chicago, (1930).

7. Hilda Geiriger, Ann. Math. Statistics, 15, 25-57 (1944), Genetics,33, 548-564, (1948). 8. C. C. Li, Population Genetics, Univ. of Chicago Press, Chicago (1955).

*Note, however, that if we simply take 7j k = ,we find that every initial vector is a fixed point.


292

Enrico Fermi

Johnny von Neumann with 11 year old Claire Ulam

Bob Richtmyer

David Hawkins

From left to right, Ulam, Mycielski, Bednarek

Bill Beyer

"Computers can also be used to investigate... and with less success, to study games of 'skill' like chess." (See page 302.) P. R. Stein playing the first game of chess he and others had programmed with Ulam in 1957 against the MANIAC.


293

11—
Non-Linear Transformation Studies On Electronic Computers:
With P. R. Stein (LADC-5688, 1963)

This paper is a continuation of the study initiated and reported in the preceding chapter on Quadratic Transformations. Interactions of polynomial transformations, particularly cubic transformations in three variables, as well as the asymptotic and ergodic properties of the sequences of iterated points are considered. The theme of the computational study of difficult problems in pure mathematics is exemplified. (Eds.)

Introduction

This paper will deal with properties of certain non-linear transformations in Euclidean spaces--mostly in two or three dimensions. In the main they will be of very special and simple algebraic form. We shall be principally interested in the iteration of such transformations and in the asymptotic and ergodic properties of the sequence of iterated points. Very little seems to be known, even on the purely topological level, about the properties of specific non-linear transformations, even when these are bounded and continuous or analytic. The transformations we study in this paper are in fact bounded and continuous, but in general many-to-one, i.e., not necessarily homeomorphisms. In one dimension such transformations are simply functions with values lying in the domain of definition; for example, if f(x) is continuous and nonnegative in the interval [0.1] and max[f(x)] < 1, then x' f(x) is a


294

transformation* of the type considered. Even in one dimension, however, nothing resembling a complete theory of the ergodic properties of the iterated transformation exists. On the algebraic side, we study in this paper the invariant points (fixed points), finite sets (periods)-and invariant subsets (curves) of these transformations-together with the means of obtaining them constructively. The topological properties of two (not necessarily one-dimensional) transformations S(p), T(p) are identical under a homeomorphism H: when S(p) = H[T[H-l(p)]]. When S and T are themselves homeomorphisms-and for one dimen sion-necessary and sufficient conditions for conjugacy are known.1 When S and T are one-dimensional, but not necessarily one-to-one, it is possible to give a set of necessary conditions for conjugacy; no meaningful sufficient conditions, however, are known.

For example, the set of fixed points of S has to be topologically equivalent to those of T. The same must hold for the set of periodic points, i.e., points such that the nth power of the transformation returns the point to its original position. The attractive and repellent fixed points must correspond, etc. These conditions are known from the corresponding study of homeomorphisms. For many-to-one transformations one may generalize these conditions by considering the tree of a point. For a given transformation T we define the tree of a point P as the smallest set Z of points such that: a) P belongs to Z. b) If a point Q belongs to Z, then T(Q) belong to Z. c) If Q belongs to Z, then all points of the form T-1 (Q) belong to Z.

Obviously, for two transformations to be conjugate, the trees of corresponding points must be combinatorially equivalent and, in addition, their topological interrelations must be the same.**

The present study was initiated several years ago2 with the consideration of certain homogeneous, quadratic transformations which we called binary reaction systems. A typical example is the following: x1 = x2 + 2x132 , I = 2x1X3 + 2X2X3, (1) 2 X3 - X1 ,

* Here and throughout the paper a primed variable always represents the value obtained on the next iterative step. In a more explicit notation, the above equation would read: x( n+ l) = f(x(n)). ** One-dimensional transformations are considered in more detail in Appendix I.


295

where we consider initial points P with coordinates xl,x2,X3 satisfying: O < Xi < 1, x + x2+x3= 1 . (2) Since xI + x2 + x3=(Xl +x 2 +x3)2

the transformation (1) maps the two-dimensional region (2) into some sub-region of itself. The choice of these transformations was motivated by certain physical and biophysical considerations. For example, the set of equations (1) could be interpreted as determining the composition of a hypothetical population whose individuals are of three types, conventionally labeled 1, 2, and 3. The xi would then represent the fraction of the total population which consists of individuals of type "i." The transformation can be thought of as a mathematical transcription of the mating rule:

type 2 and type 2 produce type 1 , type 3 and type 3 produce type 1 , type I and type 2 produce type , (3) type I and type 3 produce type 2, type 2 and type 3 produce type 2, type 1 and type 1 produce type 3 .

For any assigned initial composition, i.e., any initial vector (x1 , x2, x3 ) satisfying (2), we may then ask: What is the final (or limiting) composition of the population after infinitely many "generations," that is, after infinitely many matings according to the scheme (3)? In the present context, a mating rule can be defined as a system of three non-linear first-order difference equations of the form: x1= fi(x1,X 2,X 3), x2 = f2(xl, 2, x3), (4) X3= f3 (Xl, x2, x3)

where each fi is the sum of some subset of the six homogeneous monomials x2, x2 , x 3, 2x1X2, 21 Xl3 , 2x2x3, and each such term must belong to one and only one fi. Two transformations are called equivalent if they are conjugate under the (linear) transformation defined by a given permutation of the indices 1,2,3. (This is the only linear homeomorphism which preserves the homogeneous quadratic character of the transformation.) Under this definition of equivalence, it turns out that there are 97 inequivalent transformations of the above type. It


296

quickly becomes apparent that, despite their formal simplicity, these transformations are very difficult to study analytically, particularly if one is interested in their iterative properties. For example, for most initial points in the region of definition, the sequence of iterates generated by repeated application of the transformation given by equation (1) converges to a set of three points: P2 = T(pl) , p3 = T(p 2) , (5) P1 = T(p3)

Using a standard terminology to be explained in detail below, we say that the "limit set" is a "period of order three." It is clear by inspection of transformation (1) that another limit set exists; if we write pi = (XI = 1, x2 = X3 = 0), P2 = (XI = X2 =0, X3 = 1), then P2 = T(pi), pi = T(p2) (6) In addition there is the algebraic fixed point of (1): p = T(p) . (7) The general initial vector, however, always leads to (5). Certain other quadratic transformations show an even more complicated behavior.

An example is the transformation: x1= x2+ x3 + 2X1X2, x2 = X12+ 2x2x3, x3 = 2xx3 (8)

This bears a close formal relationship to (1); in fact, they differ only by the exchange of a single term. The limit sets, however, are quite different. Transformation (8) has an attractive fixed point with coordinates: 1 = , x3 . (9) 31=2, 2- 4 '3= 4 (9)

It also has a limit set of the type (6) with p_1 = (1, 0, 0), p_2 = (0, 1, 0). In this case, both limit sets are observed. It is found experimentally that the set of initial points leading to (9) is separated from those leading to the oscillatory limiting behavior (6) by a closed curve surrounding the fixed point (figure 2 of chapter 10). The analytical nature of this boundary curve remains unknown.
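Such an experiment is easily repeated today. The following minimal sketch (in Python, used here purely as a modern illustration; the original work was of course coded directly for the machines of the period) iterates transformations (1) and (8) in the forms reconstructed above, from a randomly chosen interior point; the iteration counts are arbitrary choices.

    import random

    def T1(x1, x2, x3):
        # transformation (1)
        return (x2**2 + x3**2 + 2*x1*x2,
                2*x1*x3 + 2*x2*x3,
                x1**2)

    def T8(x1, x2, x3):
        # transformation (8): x1**2 and 2*x1*x3 have traded places
        return (x2**2 + x3**2 + 2*x1*x2,
                x1**2 + 2*x2*x3,
                2*x1*x3)

    def random_simplex_point():
        a, b = sorted((random.random(), random.random()))
        return (a, b - a, 1.0 - b)          # x1 + x2 + x3 = 1, all non-negative

    for name, T in (("(1)", T1), ("(8)", T8)):
        p = random_simplex_point()
        for _ in range(2000):               # discard the transient
            p = T(*p)
        print(name)
        for _ in range(6):                  # a few consecutive high-order iterates
            p = T(*p)
            print("   (%.5f, %.5f, %.5f)" % p)

For (1) one expects to see the period of order three of (5); for (8) the printed iterates should settle either on the fixed point (9) or on the oscillation (6), depending on the starting point.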

In view of the complicated behavior exhibited by these examples, we felt it would be useful to study these transformations numerically,



making use of the powerful computational aid afforded by electronic computing machines. From one point of view our present paper may be looked on as an introduction, through our special problems, to modern techniques in experimental mathematics with the electronic computer as an essential tool. Over the past decade these machines have been extensively employed in solution of otherwise intractable problems arising in the physical sciences. In addition to solving the particular practical problem under consideration, this work has in some cases resulted in significant theoretical advances. Correspondingly, attempts to solve difficult physical problems have led to considerable improvements in the logical and technical design of computers themselves. In contrast, the use of electronic computers in pure mathematics has been relatively rare.* This may be partly due to a certain natural conservatism; in our opinion, however, the neglect of this important new research tool by many mathematicians is due simply to lack of information. In other words, the average mathematician does not yet realize what computers can do. It is our hope that the present paper will help to demonstrate the effectiveness of high-speed computational techniques

in dealing with at least one class of difficult mathematical problems. With this end in mind, we have devoted the first section of our paper to a brief discussion of how computing machines can be used to study problems in pure mathematics. Much of this section is introductory in character, and is meant primarily for those readers who have had no firsthand experience in the use of computers. It also includes, however, a description of the numerical techniques used in this study; these may be of interest even to seasoned practitioners.

After our study of quadratic transformations in three variables,** we decided to investigate the iterative properties of other classes of polynomial transformations. As a natural generalization of the quadratics described above, we consider transformations of the form:

x_i' = f_i(x_1, ..., x_k)   (i = 1 to k),                (10)

where the f_i are disjoint sums of the homogeneous monomials which arise on expanding the expression:

F = (Σ_{i=1}^{k} x_i)^m .                (11)

* Perhaps the greatest computational effort has been expended on problems in number theory. See refs. 2 and 5. ** We shall not discuss this work here. Full details are contained in the above references. That report contains, in addition, some fragmentary results on a few particular quadratic transformations in higher dimensional spaces.



The number of such terms, each taken with its full multinomial coefficient, is

N_k^m = (m + k − 1 choose k − 1) .                (12)

By construction, Σ_{i=1}^{k} f_i = F, so that if we take

Σ_{i=1}^{k} x_i = 1 ,   x_i ≥ 0 ,                (13)

the (additive) normalization of the x_i is preserved. We are then dealing with positive transformations in a bounded portion of the Euclidean space of k − 1 dimensions, i.e., just the region of the hyperplane defined by (13). If m = 2, k = 3, these transformations are the 97 quadratics in three variables introduced above. The bulk of the present paper is devoted to the case m = k = 3, i.e., cubic transformations in three variables; there are 9370 independent transformations of this form. We have also examined the 34337 quadratic transformations in four variables, but our analysis of the results is not yet complete (January, 1963); for this case (m = 2, k = 4) we include only some statistical observations and a few interesting examples. These three cases--m = 2, k = 3; m = 2, k = 4; m = 3, k = 3--are the only ones for which an exhaustive survey is at present feasible. For other values of m and k the number of transformations to be studied is much too large.

The determination of the exact number T_k^m of inequivalent transformations for arbitrary m and k is an unsolved combinatorial problem. It can, of course, be reduced to enumerating those transformations which are invariant under one or more operations of the symmetric group on the k indices, but no convenient way of doing this is known. The problem, however, is not of much practical significance. A lower limit T*_k^m to the number T_k^m of inequivalent transformations is given by:

T*_k^m = S_k^{N} ,   N = N_k^m ,                (14)

where S_j^i is the Stirling number of the second kind; S_j^i is the number of ways of putting i objects into j identical boxes, no box being left empty. This underestimates T_k^m by assuming in effect that each transformation has k! non-identical copies, i.e., that no transformation is invariant under any permutation (except the identity). The following table illustrates the trend:



The T_k^m were obtained by direct enumeration-using, of course, all known shortcuts. For m = 2, k = 4, this enumeration was actually performed on a computer. In view of the huge values of the T_k^m in the lower half of this table, it is unlikely that anyone will be interested in attempting a comprehensive numerical study of these transformations for values of m and k larger than those we have considered.
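The lower bounds just described are easily recomputed. The sketch below (a modern illustration, not part of the original report) evaluates N_k^m from (12) and the Stirling-number bound of (14) for a few pairs (m, k); the selection of cases is arbitrary. The exact counts quoted in the text (97, 9370, 34337) exceed the corresponding bounds 90, 9330, 34105 only slightly.

    from math import comb
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling2(n, k):
        """Stirling number of the second kind S(n, k)."""
        if k == 0:
            return 1 if n == 0 else 0
        if k > n:
            return 0
        return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

    def n_terms(m, k):
        """Number of monomials in (x1 + ... + xk)^m, equation (12)."""
        return comb(m + k - 1, k - 1)

    for m, k in [(2, 3), (3, 3), (2, 4), (4, 3), (3, 4), (2, 5)]:
        N = n_terms(m, k)
        print(f"m={m} k={k}  N={N:3d}  lower bound T* = S(N,{k}) = {stirling2(N, k)}")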

A general discussion of our results for the cubics in three variables and the quadratics in four variables is given in section II; the reader will also find there formal definitions of a few basic concepts and an explanation of the special terminology employed throughout the paper. Perhaps the most interesting result of this study is our discovery of limit sets of an extremely "pathological" appearance. The existence of such limit sets was quite unexpected,* and is indeed rather surprising in view of the essential simplicity of the generating transformations. Sections III and IV are concerned with the effect--on the iterative properties of our transformations--of two types of structural generalization. Specifically, in section III we consider the one-parameter generalization--called by us the "Δt-modification"--which consists in replacing equation (10) by:

x_i' = (1 − Δt) x_i + Δt f_i(x_1, ..., x_k)   (i = 1 to k),   0 < Δt ≤ 1 .                (12a)

This generalization has the special property of leaving the fixed points of the transformation invariant, although their character-i.e., whether they are attractive or repellent-may be altered. The detailed

* Quadratic transformations in three variables apparently do not exhibit similar pathologies.



discussion of the behavior of such transformations under variations of the parameter Δt is limited to the cubic case.

Section IV describes the result of introducing small variations in the coefficients of the monomials which make up the various f_i. Again we deal only with the cubic case, and indeed only with a few interesting examples chosen from our basic set of 9370 transformations. Let us denote the N_k^m monomials (e.g., x_1^3, 3x_2 x_1^2, 6x_1 x_2 x_3, ...) in the expansion of (11) by the symbol M_j, j = 1 to N_k^m. The assignment of a particular index to a particular monomial is arbitrary.

Then we have

f_i = Σ_{j=1}^{N_k^m} d_{ij} M_j ,   1 ≤ i ≤ k ,                (13a)

with

d_{ij} = 1 or 0 ,                (14a)

Σ_{i=1}^{k} d_{ij} = 1 .                (15)

The generalization then consists in relaxing the restriction (14a). If this were done subject only to the condition that the d_{ij} all be non-negative, we should be dealing with a (k − 1)N_k^m-parameter family of positive, bounded, homogeneous polynomial transformations. At present nothing significant can be said about this class as a whole. As explained in section IV, our procedure has been to study one-parameter families of transformations which are in a certain sense "close" to some particular transformation of our original set.*

In section V we give a brief, heuristic discussion of the connection between our transformations-which are really first-order non-linear difference equations--and differential equations in the plane. Our conclusion is that the connection is not, in fact, very close, and that the techniques so far developed for treating non-linear differential equations do not seem suitable for handling the problems discussed in this paper.

* Some analogous but rather unsystematic investigations were carried out on quadratics in three variables, and are contained in the report cited in ref. 2. Subsequent to the appearance of that report we made some studies (unpublished) on quadratics with randomly chosen positive coefficients satisfying (15). For quadratics (at least in three variables), the conclusion seems to be that such randomly chosen transformations are most likely to lead under iteration to simple convergence for almost all initial points.



The final section of our main text-section VI--contains a description of a class of piece-wise linear transformations on the unit square. These transformations exhibit interesting analogies with our polynomial transformations in three variables. Relatively little work has been done on this "two-dimensional broken linear" case, but the preliminary results we report seem to indicate that a detailed study might prove worthwhile.

There are two appendices: Appendix I is largely devoted to an extended discussion of certain non-linear transformations in one dimension, on the unit interval. Some of these are special cases of our cubics in three variables; others originated independently of our principal study. It is perhaps rather surprising how little can be said theoretically even about this simple one-dimensional case. It turns out that some of the same phenomena are observed in one dimension as are found in the plane-e.g., the apparently discontinuous behavior of limit sets as a function of a monotonically varying parameter. Of course, the repeated iteration of a one-dimensional transformation is a much simpler matter than the corresponding process in several dimensions. However, as we soon discovered, great care must be taken to avoid the phenomenon of "spurious convergence." This point is discussed in some detail and a few-rather alarming-examples are given.

Appendix II contains the bulk of the photographic evidence- including the "pathology" of the limit sets-on which the discussion of sections III and IV is based. These pictures, together with others scattered throughout the main body of the text, constitute in a sense the unique contribution of this paper. In retrospect, it seems unlikely that our investigation could have been successfully carried out without the visual aid afforded by the oscilloscope and the polaroid camera. Put in the simplest terms, unless one knows precisely what one is looking for, mere lists of numbers are essentially useless. Automatic plotting devices however, such as the oscilloscope, allow one to tell at a glance what is happening. Very often the picture itself will suggest some change in the course of the investigation-for example, the variation of some hitherto neglected parameter. The indicated modification can often be effected in a few seconds and the result observed on the spot.*

Visual display is of very great value when one is in effect studying sets of points in the plane; when one passes to three dimensions automatic plotting ceases to be merely a convenience and becomes essential. * This interaction of man and computing machine has sometimes been referred to as "synergesis."3



A glance at our pictures of three-dimensional limit sets--the result of iterating certain quadratic transformations in four variables-should convince even the most skeptical reader. In our opinion, it would be virtually impossible to make sense out of a mere numerical listing of coordinates of the points plotted in these photographs.

Of the many who have helped with this work, there are three to whom we are particularly indebted: Cerda Evans, Verna Gardiner, and Dorothy Williamson. These ladies did the actual coding and supervised all the machine calculations. Without their help this paper could not have been written.

I—
The Role of the Computing Machine

1. The use of electronic computers for the solution of complicated or tedious problems, usually of practical origin, is by now familiar. Typical computer tasks are: the evaluation of integrals, the solution of large systems of linear equations, the solution of minimax problems (linear programming), the treatment of complicated boundary value or initial value problems, etc. One of the more impressive jobs that computers have done is to calculate the time history of immensely complicated physical systems (e.g., involving hydrodynamical motions, magnetic fields, etc.). Recently there has been considerable interest in using computers to attack problems of a less applied nature, for example those arising in combinatorial analysis4 and number theory.5 This work often takes on an experimental flavor; such experimentation has led to results of considerable interest, for example, the construction of certain types of mutually orthogonal latin squares.6 Computers can also be used to investigate formal mathematical systems,7 to reduce symbolic expressions,8 and--with less success--to study games of "skill" like chess.9

The use of computing machines that we describe in the present paper differs in two respects from the examples just cited. On the one hand, our study is not essentially combinatorial in character, but falls rather in the domain of algebra and real variable function theory. On the other hand, we are not attempting to "solve" some welldefined problem; instead we investigate via repeated trials the asymptotic properties of certain non-linear transformations, usually without any advance knowledge of what we may find in a given case. Even "after the fact," so to speak, it is difficult to classify these asymptotic properties in a meaningful fashion; the broadness of the categories we employ for this classification* is merely a measure of our lack of insight into the structure of the observed limit sets.

* See section II.



Faced with this situation, one may ask the question: how does one recognize "convergence" --- -i.e., the existence of an invariant set -when one has no a priori numerical criteria to apply? We can only supply a partial answer to this question, but that answer has the advantage of simplicity, viz.: "use your eyes." The practical application of this "technique" involves, of course,* the use of automatic plotting devices.

2. Roughly speaking, computing machines are devices which perform the four elementary arithmetic operations on numbers in a certain--not necessarily simple--sequence. This sequence of operations is called the "program," and consists of a set of logical commands, both of the sequential ("do this and then do that") and of the branching ("if this holds, then do that") type. The program is composed by an investigator (the "programmer") and must therefore reflect his own limitations. Nevertheless, the machine may easily produce results quite unanticipated by the programmer, even if the program is essentially deterministic in nature.** A classic example--which happens to be relevant here--is the step-by-step application of some recurrence relation which generates a sequence whose trend the programmer cannot determine in advance. As an example, we may cite the following one-step recursion in a single variable:

y_{n+1} = W_n (3 − 3W_n + W_n^2) ,   W_n = 3α y_n (1 − y_n) .                (1)

Given some initial 0 < y_0 < 1, we may ask: what is the result of applying the rule (1) N times, where N is some large number, say 10^5? This particular transformation is discussed in detail in appendix I; here we quote three examples for the purpose of illustration.

(a) If α = 0.99004, then for almost all y_0 the sequence of iterates produced by (1) converges (in fewer than 10^5 steps) to a period of order 14.
(b) If α = 0.99005, the corresponding limit set is a period of order 28.***

* Hand plotting is in general highly impractical, and clearly relinquishes the principal advantage of machine computation: SPEED. ** Strictly speaking, all programs used on digital computers are deterministic in nature: even when random numbers are employed, these are generated according to some fixed algorithm, so that the sequence is in principle known. *** These results were found by using the IBM "STRETCH" computer. The periods are exact to within the accuracy of that machine, i.e., 48 binary digits (~15 decimals). See further in appendix I.



(c) If α = 0.99008, no finite period is observed after N = 5 × 10^5 steps.

So far as we are aware, this behavior could not be predicted by current analytical or algebraic techniques. Such phenomena are easy to study on a computer, however, because of the great speed with which it can carry out the (relatively simple) operations implied by an expression such as (1) above. In fact, 200000 iterations of this transformation take slightly less than one minute on a really fast computer.*
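A period-detection experiment of the sort just quoted may be sketched as follows. The form of (1) is the reconstruction given above, and the starting value, transient length, and tolerance are arbitrary choices; at ordinary double precision the apparent periods need not reproduce the STRETCH results exactly.

    def step(y, alpha):
        # recursion (1), as reconstructed above; equals 1 - (1 - w)**3
        w = 3.0 * alpha * y * (1.0 - y)
        return w * (3.0 - 3.0*w + w*w)

    def apparent_period(alpha, y0=0.3, transient=200_000, max_period=300, tol=1e-12):
        y = y0
        for _ in range(transient):               # discard the transient
            y = step(y, alpha)
        ref = y
        for k in range(1, max_period + 1):
            y = step(y, alpha)
            if abs(y - ref) < tol:
                return k                          # apparent period of order k
        return None                               # no short period detected

    for alpha in (0.99004, 0.99005, 0.99008):
        print(alpha, apparent_period(alpha))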

3. As we mentioned in the introduction, the principal content of this paper is the study of the asymptotic properties of certain nonlinear transformations of relatively simple form. This means that, if T is such a transformation, we examine the sequence T(p), T^2(p), T^3(p), ...

for various initial points p lying in the domain of T. The mathematical object of interest to us is the set (or sets) of points to which these sequences converge. In the absence of any general analytical technique for calculating these "limit sets," we must have recourse to "brute-force" methods.

Some non-linear transformations which appear morphologically similar to those considered here can in fact be completely analyzed by elementary methods. We discovered one such case in the course of some earlier work on biological systems. It is described in our report on quadratics in three variables (see chapter 10). We restate these special results here. Let

C_i' = Σ_{k,m=1}^{N} γ_i^{km} C_k C_m ,   1 ≤ i ≤ N ,                (2)

with coefficients satisfying

γ_i^{km} = γ_i^{mk} ≥ 0 ,   min(k, m) ≤ i ≤ max(k, m) ,
γ_i^{km} = γ_i^{mk} = 0 ,   otherwise,                (3)

Σ_{i=1}^{N} γ_i^{km} = 1 ,                (4)

* This figure applies to STRETCH and includes all additional "diagnostic" operations such as checking for "convergence," etc.



Σ_{i=1}^{N} i γ_i^{km} = (k + m)/2 .                (5)

We normalize the C_i by

0 ≤ C_i ≤ 1 , all i ,   Σ_{i=1}^{N} C_i = 1 .                (6)

This property is clearly preserved under iteration. With the coefficients defined as above, there exists a linear invariant:

Σ_{i=1}^{N-1} (N − i) C_i' = Σ_{i=1}^{N-1} (N − i) C_i ≡ α .                (7)

Given an initial vector (C_1^(0), C_2^(0), ..., C_N^(0)) whose coordinates satisfy (6), α is explicitly determined. It can then be shown that every initial vector satisfying (6) converges to a definite fixed point which is determined as follows. For the given value of α, there is one value of the index j such that

N − j ≥ α ≥ N − j − 1 .                (8)

The fixed point is then explicitly given by

C_j = α − (N − j − 1) ,   C_{j+1} = N − j − α ,   all other C_i = 0 .                (9)

Note that the fixed point is independent of the values of the coefficients γ_i^{km}.

As simple examples of coefficients satisfying (3), (4), and (5), we may mention

γ_i^{km} = (1 / 2^{|k−m|}) (|k−m| choose i − min(k, m)) ,                (10)

and

γ_i^{km} = 1 / (|k−m| + 1)   if min(k, m) ≤ i ≤ max(k, m) ,   0 otherwise.                (11)

For a fuller discussion of this transformation and its possible applications, we refer the reader to the original report.
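A numerical check of the statements above is straightforward; the sketch below (a modern illustration under the reconstructed formulas (2) through (11)) uses the uniform coefficients of example (11) with N = 6, iterates from a random normalized vector, and compares the result with the fixed point predicted by (8) and (9). The value of N, the starting vector, and the iteration count are arbitrary choices.

    import random

    N = 6

    def gamma(i, k, m):
        # example (11): uniform weight between min(k, m) and max(k, m)
        lo, hi = min(k, m), max(k, m)
        return 1.0 / (hi - lo + 1) if lo <= i <= hi else 0.0

    def T(C):
        # transformation (2)
        return [sum(gamma(i, k, m) * C[k - 1] * C[m - 1]
                    for k in range(1, N + 1) for m in range(1, N + 1))
                for i in range(1, N + 1)]

    C = [random.random() for _ in range(N)]
    s = sum(C)
    C = [c / s for c in C]                                    # normalization (6)

    alpha = sum((N - i) * C[i - 1] for i in range(1, N))      # linear invariant (7)
    j = next(j for j in range(1, N) if N - j >= alpha >= N - j - 1)   # condition (8)

    for _ in range(20_000):
        C = T(C)

    predicted = [0.0] * N
    predicted[j - 1] = alpha - (N - j - 1)                    # fixed point (9)
    predicted[j] = N - j - alpha
    print("iterated :", ["%.6f" % c for c in C])
    print("predicted:", ["%.6f" % c for c in predicted])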

The term "brute-force" refers to the fact that, in order to determine the convergence properties of some transformation T belonging to our class, we must in general actually evaluate T^k(p) for k = 1, 2, ..., N, where N is likely to be quite large--sufficiently large, that



is, so that we can observe convergence* to the limit set. To make matters clear, let us consider a specific example. We choose the cubic transformation

x_1' = x_3^3 + 3x_1^2 x_3 + 3x_3^2 x_1 + 6x_1 x_2 x_3 ,
x_2' = x_1^3 + 3x_2^2 x_3 + 3x_3^2 x_2 ,                (12)
x_3' = x_2^3 + 3x_1^2 x_2 + 3x_2^2 x_1 .

We take some initial point p = (x_1, x_2, x_3) whose coordinates satisfy:

x_1 + x_2 + x_3 = 1 ,   0 ≤ x_i ≤ 1 ,   i = 1, 2, 3 .                (13)

The program then instructs the computing machine to evaluate the right hand side of (12), thus producing a new point p' = (x_1', x_2', x_3'); the coordinates of p' again of course satisfy (13). p' is then set to p, and the process is repeated. The iteration proceeds in this fashion until either some finite limit set is found** or an invariant set--presumably infinite--is "observed." The observation consists in looking at successive groups of consecutive iterates--in practice we have usually taken 900 points at a time--until no qualitative visual change is noted over a sample of several successive such groups of points. Since the transformation (12) is really two-dimensional, we may plot the successive points p in the plane. Accordingly, we define new coordinates S, a by the linear transformation***

S = (1 + x_1 − x_3)/2 ,   a = x_2/2 .                (14)

The domain of the transformation is then the 45° isosceles triangle:

0 ≤ a ≤ S ,   S + a ≤ 1 .

* "Convergence" must be of course understood in some approximate numerical sense. Our usual criteria are set forth in the next subsection. ** See the next subsection. *** These are the coordinates employed in our earlier work on quadratics in three variables; we have retained them more for historical reasons than for any particular advantage they may possess.



In terms of these new variables, (12) takes the form:

S' = −(1/2)S^3 − (15/2)S^2 a − (3/2)S a^2 + (3/2)a^3 + 6Sa − 3a + 1 ≡ F(S, a) ,
a' = (1/2)S^3 + (3/2)S^2 a + (3/2)S a^2 − (7/2)a^3 − 6Sa + 3a ≡ G(S, a) .                (15)

The computer is instructed to store 900 successive points p(S^(n), a^(n)), p(S^(n+1), a^(n+1)), ..., and, when the last point has been calculated, to plot all 900 points on our oscilloscope screen.* If we choose, we may then photograph the resulting pattern with a polaroid camera. Such a photograph is shown in figure 1. Here one sees 900 successive high-order iterates (n = 2700 to 3600) of the initial point S = 1/2, a = 0.17. For convenience, the triangle of reference is also shown. Fig. 1
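The procedure just described can be paraphrased in a few lines of modern code (a sketch only, with a text printout standing in for the oscilloscope; the transformation (12) and the coordinates (14) are as reconstructed above, and, as noted in the footnote below, the calculation is carried out in the x_i and converted to (S, a) only for plotting).

    def T12(x1, x2, x3):
        # cubic transformation (12)
        return (x3**3 + 3*x1**2*x3 + 3*x3**2*x1 + 6*x1*x2*x3,
                x1**3 + 3*x2**2*x3 + 3*x3**2*x2,
                x2**3 + 3*x1**2*x2 + 3*x2**2*x1)

    def to_Sa(x1, x2, x3):
        # plotting coordinates (14)
        return (1.0 + x1 - x3) / 2.0, x2 / 2.0

    # initial point S = 1/2, a = 0.17, i.e. x1 = S - a, x2 = 2a, x3 = 1 - S - a
    S, a = 0.5, 0.17
    p = (S - a, 2*a, 1.0 - S - a)

    for n in range(3600):
        p = T12(*p)
        if n >= 2700:                      # the 900 high-order iterates of figure 1
            S, a = to_Sa(*p)
            if n % 100 == 0:               # here one would plot (S, a); we print a sample
                print(n, round(S, 6), round(a, 6))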

This calculation --as well as all others which produced the photographs in this paper--was performed on the Los Alamos Laboratory's MANIAC II computer.** MANIAC II requires about 15 seconds to calculate 900 iterates of a point by repeated applications of

* The points are actually plotted in the order in which they are calculated, the whole pattern being replotted as many times as we wish. Actually, the plotting of 900 points is effectively instantaneous so far as the human eye is concerned. If we wish to see the points plotted in succession, we must introduce artificial time delays between the plotting of successive points. ** For the use of other computing machines in this work, see the next subsection.



a cubic transformation like (12) above. This figure includes the time spent in examining the successive points for simple convergence, as well as other "diagnostic" operations.* The actual numerical values of the coordinates may be printed out whenever desired by simply flipping a switch. On MANIAC II a decimal number is normally limited to eight significant figures. In the present paper, when there is occasion to quote numerical values obtained from MANIAC II print-outs, we shall generally reproduce them to seven figures without further specifying their accuracy.

Computer programs are, of course, not limited to generating sequences of numbers from an iterative formula such as (12). A considerable amount of sophistication can be incorporated into such a program so as to allow the machine to make "decisions" in the course of the calculation. It can, in fact, examine any property or any functional of the data that the programmer can describe in appropriate terms. One problem that is met with frequently in this work is to determine the points in a sequence of iterates that lie closest to some point, say within some chosen angle or set of angles. This sort of experiment is frequently of help in elucidating the local structure of a complicated limit set. Then again we may want to determine the average values of S and a, i.e., ergodic means, taken over the sequence. To achieve any sort of accuracy in such problems** we may be required to go to 50000 or even 100000 iterations. One saving feature is that several such diagnostic experiments can be carried out simultaneously. There are, however, special questions that must be dealt with by special programs. One such question arises in connection with our illustrative transformation (12). The complicated limit set shown in figure 1 is not the only one observed. This transformation has an attractive fixed point at:

S = F(S, a) = 0.6259977 ,   a = G(S, a) = 0.1107896 ;                (16)

indeed, the eigenvalues of the jacobian matrix*** evaluated at this point are complex, with |λ|^2 = 0.4366967. Consequently, there must be a neighborhood of this point in which all sequences will converge to it. The only way to find the boundary of this neighborhood is by

* For reasons of accuracy, the calculations are performed in the xi coordinates; the transformation (14) to the S, a coordinates is carried out only for plotting purposes. ** More properly, to have confidence in the results. The accuracy cannot always be satisfactorily estimated. *** The criterion for the nature of a fixed point is discussed in Section III.



trial and error. This is a time-consuming job, even for an electronic computer; if one picks a point close to the boundary of the region of convergence, several hundred--or even several thousand--iterations may be required before one can tell whether the chosen point lies inside or outside the region. Figure 2 shows the approximate boundary for the present case, drawn through 107 experimentally determined points. One of these is known to one part in 10^7, while the others have been determined only to 1 part in 10^4.* Fig. 2

4. General Procedure

a. Cubic transformation in three variables. Enough has been said above to make clear the necessity of using an electronic computer in such investigations. We must now say something about the systematic aspects of the study. All 9370 cubic transformations were initially studied on an IBM 7090.** First a complete list of inequivalent cubics was prepared-this was also done on the 7090-incidentally serving to check our original pencil and paper enumeration. Then by a completely automatic procedure, each transformation was taken in turn and four randomly-generated initial points were each taken as the start of an iterative sequence. For each point the iteration was continued until either convergence to a finite set of points was "observed" or 10000 iterations had been performed. By "observed" we mean that the machine sensed convergence to a fixed point or to a finite period of order

* The point S = 1/2, a = 0.2952833 lies in the region of convergence, while the point S = 1/2, a = 0.2952834 gives rise to a sequence which converges to the class IV limit set (see definition in section II). ** This computer is approximately five times as fast as MANIAC II.



< 300. More precisely, the computer was programmed to test whether the following conditions were satisfied:

|x_i^(n) − x_i^(n_1)| < 10^{−7} ,   i = 1, 2, 3 .                (17)

If (17) is satisfied, a finite limit set has been reached to within the indicated accuracy. For n = n_1 + 1 this means convergence to a fixed point ("simple convergence"). Otherwise, the limit set is a period of order n − n_1. In practice, values of the x_i were stored at fixed time steps n_1 = 300, 600, ..., the test (17) being performed on each step. If "convergence" was found, the appropriate values of the x_i were printed out and the next random initial point was used, etc. If no such convergence was found after 10000 steps, the values of the iterates for the last few steps were printed, and the computer proceeded as before.
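In outline, the automatic test just described amounts to the following sketch (illustrative only; the block length, tolerance, and iteration limit mirror the values quoted in the text, and T stands for any transformation of our class, such as T12 above).

    def survey(T, p, max_iter=10_000, block=300, eps=1e-7):
        """Iterate T from p and apply test (17) against values stored every `block` steps."""
        ref, ref_n = None, 0
        for n in range(1, max_iter + 1):
            p = T(*p)
            if ref is not None and all(abs(pi - ri) < eps for pi, ri in zip(p, ref)):
                k = n - ref_n
                return "fixed point" if k == 1 else "period of order %d" % k
            if n % block == 0:             # store the x_i at n_1 = 300, 600, ...
                ref, ref_n = p, n
        return "no convergence observed"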

When all the cubic transformations had been studied in this fashion, the "interesting" cases--i.e., those in which no convergence was observed--were examined one by one on MANIAC II, where the visual oscilloscope display could be consulted. Many cases of apparent non-convergence turned out in fact to be convergent with the iteration carried further. It should be stressed that the restriction to 10000 iterations, which we imposed in the course of the systematic, fully automatic survey of all cubic transformations, was merely one of convenience; without some such reasonable limitation, the automatic survey would have taken too long. The same remark applies to the decision that only four randomly generated initial points be taken for each case. Past experience has shown that this last restriction is not unreasonable when a complete survey of transformations is contemplated. By this we mean that the behavior of an arbitrary transformation of our class is "likely" to be defined even if iterates of only four random points are studied. To be sure, in some cases the limit set depends in a very complicated way on the initial point; for such a transformation this crude sampling technique is not adequate. In these cases, however, the four random trials are likely to produce two different limit sets; this in itself is an indication that the transformation in question should be studied in more detail.

For the detailed examination of a given transformation, many relatively sophisticated MANIAC programs are available. We may, in effect, study any properties of the transformation that seem of interest. Typically, these may include:

1. Determination of non-attractive fixed points (see section III).
2. Checking for periodicity.
3. Exhibiting some qualitative properties of the mapping, e.g., by showing the images under the transformation of a family of lines.



4. Determining the dimensions of the limit set.
5. Verifying that low-order periods are attractive (see section III).
6. Examining the dependence of the limit set on the initial point.

We cannot expatiate here on the actual procedures involved; suffice it to say that the use of visual display (i.e., the oscilloscope plot) is an essential tool in all this analysis.

b. Quadratic in four variables. All (34337) inequivalent transformations of this class were studied by the same fully automatic method as that used to study the cubics. For this purpose a faster machine than the IBM 7090 was clearly required; we were fortunate enough to have access to the IBM 7030 STRETCH computer, which is approximately 4 times as fast as the 7090 and 20 times as fast as MANIAC II. Only partial results are reported in section II, since our analysis of the STRETCH print-outs is not complete.

The detailed study of a given quadratic in four variables is more difficult than the corresponding analysis for the three-variable cubics: the domain is three-dimensional, being in fact the tetrahedron defined by

Σ_{i=1}^{4} x_i = 1 ,   0 ≤ x_i ≤ 1 ,   1 ≤ i ≤ 4 .

Thus a meaningful visual display involves plotting some properly chosen projection of the three-dimensional limit set. In some cases it may require several trials before an appropriately "revealing" viewing angle is found; consequently it was not feasible to plot every potentially "interesting" limit set in this fashion, and some sort of selective procedure had to be resorted to. The method we chose was to look at three plane projections first--e.g., xl versus x2, xl versus x3 , and x2 versus X3. It turns out that one soon develops a feeling for the "interesting" case even without being able to build up an image of the actual three-dimensional configuration from the plane "slices." More serious than this purely technical difficulty is that resulting from the generally more complicated dependence of the limit set on the initial point: it turns out that in these transformations one is much more likely to miss something by restricting one's self to a few randomly generated initial points. At the present time, lacking any local or structural criteria for the prediction of asymptotic behavior, we see no way to overcome this difficulty.



II—
Limit Sets

1. Abbreviated Notation for Transformations. In order to have a convenient way of referring to a particular transformation without having to reproduce its explicit form, we introduce at this point a simple shorthand notation. As already noted in the introduction, our cubic transformations in three variables may be written in the form:

x_i' = Σ_{j=1}^{10} d_{ij} M_j ,   i = 1, 2, 3 ,                (1)

with

d_{ij} = 0 or 1 ,   all i, j ,                (2)

and

Σ_{i=1}^{3} d_{ij} = 1 ,   all j ,                (3)

where the M_j are the separate terms in the expansion of (x_1 + x_2 + x_3)^3. We now choose the following conventional ordering of the M_j:

M_1 = x_1^3 ,   M_2 = x_2^3 ,   M_3 = x_3^3 ,
M_4 = 3x_1 x_2^2 ,   M_5 = 3x_1 x_3^2 ,   M_6 = 3x_2 x_1^2 ,                (4)
M_7 = 3x_2 x_3^2 ,   M_8 = 3x_3 x_1^2 ,   M_9 = 3x_3 x_2^2 ,   M_10 = 6x_1 x_2 x_3 .

Any cubic transformation of our class is then completely determined by specifying which terms Mj or, equivalently, which indices j, appear in the first two lines of the schema (1). Let us call the set of indices belonging to the first line C1 and those belonging to the second line C2; C3 is of course the complement of C1 + C2 with respect to the full set {1,2,..., 10} and need not be written down. Thus, for example, the transformation:

x_1' = x_3^3 + 3x_1 x_2^2 + 3x_1 x_3^2 + 3x_2 x_3^2 + 3x_3 x_2^2 + 6x_1 x_2 x_3 ,
x_2' = x_1^3 + 3x_3 x_1^2 ,                (5)
x_3' = x_2^3 + 3x_2 x_1^2 ,

would appear in the form:

C_1 = {3, 4, 5, 7, 9, 10} ,
C_2 = {1, 8} .                (6)
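The shorthand is easily mechanized. The sketch below (an illustration added for the modern reader) expands a pair of index sets C_1, C_2 into the corresponding cubic transformation, using the ordering (4) of the M_j; the test point is arbitrary.

    # the ten cubic monomials M1 ... M10 in the ordering of (4)
    M = [
        lambda x1, x2, x3: x1**3,            # M1
        lambda x1, x2, x3: x2**3,            # M2
        lambda x1, x2, x3: x3**3,            # M3
        lambda x1, x2, x3: 3*x1*x2**2,       # M4
        lambda x1, x2, x3: 3*x1*x3**2,       # M5
        lambda x1, x2, x3: 3*x2*x1**2,       # M6
        lambda x1, x2, x3: 3*x2*x3**2,       # M7
        lambda x1, x2, x3: 3*x3*x1**2,       # M8
        lambda x1, x2, x3: 3*x3*x2**2,       # M9
        lambda x1, x2, x3: 6*x1*x2*x3,       # M10
    ]

    def make_cubic(C1, C2):
        """Build T_{C1 C2}; C3 is the complement of C1 and C2 in {1, ..., 10}."""
        C3 = sorted(set(range(1, 11)) - set(C1) - set(C2))
        rows = (sorted(C1), sorted(C2), C3)
        def T(x1, x2, x3):
            return tuple(sum(M[j - 1](x1, x2, x3) for j in row) for row in rows)
        return T

    # example (5)/(6); the image of an interior point again has coordinate sum 1
    T = make_cubic({3, 4, 5, 7, 9, 10}, {1, 8})
    print(T(0.3, 0.3, 0.4))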



An analogous notation may be adopted mutatis mutandis for quadratics in four variables. Any such transformation can be written in the form:

x_i' = Σ_{j=1}^{10} d_{ij} F_j ,   i = 1, 2, 3, 4 ,
d_{ij} = 0 or 1 ,   all i, j ,                (7)
Σ_{i=1}^{4} d_{ij} = 1 ,   all j .

Our conventional assignment of indices to the F_j is as follows:

F_1 = x_1^2 ,   F_2 = x_2^2 ,   F_3 = x_3^2 ,   F_4 = x_4^2 ,   F_5 = 2x_1 x_2 ,
F_6 = 2x_1 x_3 ,   F_7 = 2x_1 x_4 ,   F_8 = 2x_2 x_3 ,   F_9 = 2x_2 x_4 ,   F_10 = 2x_3 x_4 .                (8)

Let Q_k denote the set of indices belonging to the kth line of the schema (7). Then any such transformation is specified by writing down three of the four Q_k; thus:

T_{Q_1 Q_2 Q_3}:   Q_1 = {2, 8, 9} ,   Q_2 = {3, 7, 10} ,   Q_3 = {5, 6}                (9)

represents the transformation

x_1' = x_2^2 + 2x_2 x_3 + 2x_2 x_4 ,
x_2' = x_3^2 + 2x_1 x_4 + 2x_3 x_4 ,                (10)
x_3' = 2x_1 x_2 + 2x_1 x_3 ,
x_4' = x_1^2 + x_4^2 .

This notation will be used extensively throughout the paper.

2. Limit Set Terminology. By a limit set L_p(T) we shall mean the set of all points of the region of definition* which are limit points, * For cubics this is the S, a triangle introduced in section I; for quadratics it is the tetrahedron Σ_{i=1}^{4} x_i = 1, x_i ≥ 0.



in the ordinary sense, of the set T^n(p), n = 1, 2, ..., for fixed p. It may happen that L_p(T) is independent of the initial point p; L_p(T) ≡ L(T) could then be called the limit set of T. In general, L(T) will only be defined for interior points p, since points on the boundary frequently* behave in a rather special way.

Thus, for example, if p_0 is a unique fixed point of the transformation: T(p_0) = p_0, and if the iterated images T^n(p) of all interior points p converge to p_0, then L(T) = {p_0}. If p_0, p_1, ..., p_{k−1} form a system S of k points such that T(p_i) = p_{i+1}, i = 1, 2, ... (mod k), and if for all interior points p, lim_{n→∞} T^{nk}(p) is one of these points, then L_p(T) = S.

It might happen that the interior points divide into a finite number of classes C_1, C_2, ..., C_r such that for all points p belonging to the same class, L_p(T) is the same set; we should then have a finite number of limit sets L_1, L_2, ..., L_r. Some of these may contain a finite number of points, others may be infinite. For convenience we shall usually refer to a finite limit set containing k distinct points as a period of order k.

Although a given finite limit set belonging to some transformation T may legitimately be considered a "property" of that transformation, it is in no sense characteristic; many different transformations of our type may possess the same limit set, even for the same set of initial points. It should also be stressed that not every set of points S = {p_1, ..., p_k} such that T(p_i) = p_{i+1} (i mod k) is properly a limit set. Such a set of points, each of which is a solution of the equation T^k(p) = p, must have the additional property that there exists a set of initial points whose iterated images converge to S. Finite sets S which have this property are conventionally termed attractive. Thus, we should properly refer to a finite limit set of k points as an attractive period of order k. In the sequel we shall usually omit the word attractive when the context makes it clear that this is what is meant.

There is, of course, no structure problem so far as finite limit sets are concerned; they are completely described by giving the coordinates of their constituent points. For infinite limit sets, the situation is different. On the basis of our numerical work alone, we cannot say with certainty that our transformations have such limit sets; the sets may in fact be finite (with an enormous number of points in them), but the presumption that they are infinite is very strong. For any observed infinite limit set we can at most say that it is not a period of order less than some very large k. Granting, however, that we are dealing with infinite sets, and that we may infer some of their properties by

* i.e., often enough to make it worthwhile excluding them in the definition of L(T).



examining a sufficiently large finite subset* we may attempt to classify them according to their macroscopic morphological properties.

3. Infinite Limit Sets for Cubics in Three Variables. On the basis of our empirical study of cubic transformations, we may make a rough division of infinite limit sets into four classes:

Class I. This includes all limit sets that appear to have the form of one or more closed curves. Figures 3 through 6 will serve as examples of this class. The detailed structure of these "curves" has been studied numerically in some cases, but there are as yet no theoretical arguments to the effect that these are really one-dimensional continua. Fig. 3 Fig. 4 Fig. 5 Fig. 6

* This assumption underlies all our numerical work.



To illustrate one type of numerical study that we have carried out on these limit sets we cite the case of figure 3. This shows the "infinite" limit set L(T) belonging to the transformation:

C_1 = {2, 5, 7, 9, 10} ,
C_2 = {1, 3, 6, 8} .                (11)

In the S, a coordinates, this takes the form:

S' = (3/2)S^3 − (3/2)S^2 a − (15/2)S a^2 + (23/2)a^3 − 3S^2 − 3a^2 + (3/2)S + (3/2)a + 1/2 ,
a' = −(3/2)S^3 + (3/2)S^2 a − (9/2)S a^2 + (1/2)a^3 + 3S^2 + 3a^2 − (3/2)S − (3/2)a + 1/2 .                (12)

There is a (repellent) fixed point at:

S_0 = 0.6149341 ,   a_0 = 0.1943821 .                (13)

To six decimal places, the overall bounds on the curve are*

S_max = 0.816878 at a = 0.058022 ,   S_min = 0.411270 at a = 0.204391 ,                (14)
a_max = 0.435861 at S = 0.552246 ,   a_min = 0.017750 at S = 0.728386 .

To five decimal places, the average values of the coordinates are found to be*

S̄ = (1/N) Σ_{i=1}^{N} S^(i) = 0.62231 ,   ā = (1/N) Σ_{i=1}^{N} a^(i) = 0.20772 .                (15)

This set L(T) is the only infinite limit set the transformation seems to possess [the pair (S = 1, a = 0), (S = 1/2, a = 1/2) turns out to be an attractive period for this transformation]. For "most" initial points, the sequence of iterates converges to L(T). If we choose as our initial point some p ∈ L(T), the curve will be traced out by successive images of p, though not in a continuous fashion. If, however, we look only at successive iterates of the 71st power of T, T^(71), the curve is indeed generated in a relatively continuous fashion; the successive points T^(71n)(p), n = 1, 2, ..., lie close to each other and trace the curve in a clockwise sense. This is illustrated in figure 7, where 246 successive values of T^(71)(p) are plotted. It is striking that the nonuniform density of

* These results were obtained by carrying out N = 9600 iterations, starting from a point on the curve with coordinates: S = 0.5841326, a = 0.4125823.



points along the curve--as shown in figure 3--is reproduced by this sequence of iterates. We are thus led to the conjecture that L(T) and L(T^(71)) coincide.

It is, of course, by no means generally true that T^(k) and T will have the same limit set for an arbitrary T of our type (cf. the case of periodic limit sets where k is a multiple of the period). Further experiments have convinced us that L(T) ≡ L(T^(k)) for all k in this case. If this is so, the set is certainly infinite. That it is a continuum is also very probable.

The presumption that L(T) is one-dimensional is supported by the following experiment. We choose a point p_0 which seems to lie, with all available precision, on a convex portion* of the curve, and obtain 100000 iterated images of it, keeping track of those iterates which lie closest to p_0. We find that the two points p_1 and p_2 of closest approach lie in opposite quadrants with respect to p_0, and that the slopes of the two line segments (p_1, p_0) and (p_2, p_0) are the same to within a fraction of a percent. This suggests (1) that the limit set is a curve, and (2) that the curve probably has a continuous derivative at p_0.**
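The closest-approach diagnostic can be paraphrased as follows (an illustrative sketch, not the original program; `step` is any map in the (S, a) plane, and the reference point and iteration count are arbitrary).

    import math

    def closest_approach(step, q0, q_start, n_iter=100_000, keep=2):
        """Iterate `step` from q_start; report the `keep` iterates nearest to q0."""
        best = []                                   # (distance, point), kept sorted
        q = q_start
        for _ in range(n_iter):
            q = step(*q)
            d = math.hypot(q[0] - q0[0], q[1] - q0[1])
            if d > 1e-15:                           # ignore exact coincidences
                best.append((d, q))
                best.sort()
                del best[keep:]
        for d, p in best:
            slope = (p[1] - q0[1]) / (p[0] - q0[0]) # compare these slopes by eye
            print("distance %.3e   slope %.6f" % (d, slope))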

Limit sets consisting of several separate curves (figures 8, 9, 10) may in principle be treated in the same manner, although it is then no longer true that T^(k) will have the same limit set as T for all k. For example, if L(T) consists of three separate curves, L(T^(3)) will coincide with only one of these--which curve depends, of course, on the initial point.

Class II. This class consists of those infinite limit sets all points of which lie on a pair of boundaries of the (S, a) triangle. Alternate iterates lie on alternate sides, hence the square of the transformation will have a limit set confined to one side of the triangle. T^(2)(p) is then strictly one-dimensional for all p situated on one or the other of the two sides in question. The situation is illustrated in figures 11 and 12. There seem to be only a few such one-dimensional limit sets possible within our class of cubic transformations. Correspondingly, many different cubic transformations lead to the same pair of one-dimensional transformations when the set of initial points is restricted to a pair of sides of the (S, a) triangle.

* Overall convexity is rarely, if ever, a property of these limit sets. ** We do not conjecture that the derivative exists at every point, but we think it likely that the number of points where the derivative does not exist is at most a set of measure zero.






For example, every transformation of the form:

with non-negative ai, bi, ci satisfying:

will lead to the pair of one-variable polynomial transformations of 6th order:

In other words, transformations T of the form (16) have the property that T^(2) transforms each of the lines x_3 = 0 and x_1 = 0 into subsets of themselves (in the S, a coordinates, these lines are respectively the boundaries S + a = 1 and S = a). The study of such one-dimensional transformations is much easier than that of the original plane transformations, but there are certain serious computational pitfalls connected with high-order iteration (see appendix I).

Class III. The limit sets constituting this class will be referred to as pseudo-periods. They consist of relatively dense clusters of points localized at a finite set of centers, with a few scattered points in between (figure 13). Such limit sets have not been observed for our original cubic transformations with integer coefficients; they are, however, a prominent feature of the more general transformations discussed in sections III and IV.

Class IV. In this class we place all infinite limit sets not included in the first three classes. Viewed on the oscilloscope they appear as very complicated distributions of points with no recognizable orderly structure. Some examples are shown in figures 14 through 17. A few other examples will be discussed in detail in the following sections. For illustrative purposes, however, we include here a few remarks about figure 17.

This limit set belongs to the transformation:






As is evident from the photograph, it consists of seven separated pieces; each of these is invariant under the 7th power of the transformation. Extensive experimentation indicates that the gaps are really there. There appears to be no orderly structure within the separate pieces; in figure 18 we show about 385 consecutive images T^(7)(p) (in the upper left-hand piece of the limit set) of some p lying in this subset.

Fig. 18

4. Statistical Observations.

a. A large majority of our 9370 cubic transformations in three variables-some 75 per cent--exhibited what might be called simple convergence for all initial points tried. For these the limit sets consist of a single point, i.e., a fixed point of the transformation. In many cases there are two such attractive fixed points, but we have not found a case in which both such points are interior to the (S, a) triangle.

We exclude here a few trivial cases such as the following. Consider

T_{C_1 C_2}:   C_1 = {1, 2, 3, 4, 5, 10} ,   C_2 = {6, 9} .

Explicitly, the second and third lines read:

x_2' = 3x_2 (x_1^2 + x_2 x_3) ,                (22)
x_3' = 3x_3 (x_1^2 + x_2 x_3) .

Thus

x_2' / x_3' = x_2 / x_3 ,                (23)



so that this ratio is fixed by the initial value, and we have a continuum of fixed points. Setting x_2 / x_3 ≡ r, we find that the fixed point is given by

x_3 = [1 + r − √((1 + r^2)/3)] / (1 + 3r + r^2) ,                (24)

with, of course,

x_2 = r x_3 ,   x_1 = 1 − (1 + r) x_3 .                (25)

If we consider the transformation derived from the above by interchanging the right-hand sides of (22), we shall have:

x_2' / x_3' = x_3 / x_2 ,                (26)

yielding a corresponding continuum of limit sets which are periods of order two.

b. About 16.5 percent of the transformations seem to have only finite (periodic) limit sets; not surprisingly, most of these are of order two. More than half of the latter are of a trivial nature, that is, two vertices of the triangle permute under T. Less than 20 cases were found for which the limit set was a period of order k > 3. High-order periods are, however, frequently encountered in the study of the generalized transformations discussed in sections III and IV.

c. Some 5 percent of the cases were found to have several (i.e., two, rarely three) distinct finite limit sets of the types described above. For a given transformation it would in principle be possible to determine numerically the set of initial points whose iterated images converge to a particular one of the several limit sets; lack of time has prevented us from doing this except in a few cases. We only remark that there is in general no reason to suppose that the boundary of such a set of initial points is simple.

d. The remaining 3.5 percent, some 334 transformations, possess infinite limit sets. Most of these (roughly three-quarters of them) belong to class I, that is, they look like closed curves. Perhaps 5 percent belong to class II, the remaining 20 percent or so being of class IV type. As mentioned above, no examples of class III (pseudo-period) limit sets were encountered in the study of our original group of cubic transformations (i.e., those with integer coefficients 1, 3, or 6).

e. No case has been found in which a transformation has two distinct class IV limit sets, although there are cases where one of several



limit sets was of class IV type. One such has already been described in section I (page 16); a more complicated example will be mentioned in section IV below.

f. We can say very little about the rate of convergence of a sequence T^(n)(p) to its L_p(T). Sometimes it may be extremely rapid (10 to 20 iterations); in other cases many thousands of iterates may be required. If L_p(T) consists of a single point, L_p(T) = {p_0}, this rate can, of course, be calculated (for points sufficiently close to p_0) by solving the approximate, linear difference equations explicitly.

This is, however, not always sufficient. If the jacobian matrix, evaluated at p_0, has complex roots with |λ|^2 = 1, the linear difference equations may generate an invariant ellipse. Such a case was found in one of our quadratics in three variables, and is discussed in our report on that work. In the S, a coordinates, this transformation is

S' = 1 − 4a + 4a^2 + 2aS ,   a' = 2aS ,                (27)

with fixed point at

S = 1/2 ,   a = 1/4 .                (28)

Letting

x = S − 1/2 ,   y = a − 1/4 ,                (29)

the linear approximation is

x' = (1/2)x − y ,   y' = (1/2)x + y .                (30)

This then generates the invariant ellipse:

x'^2 + x'y' + 2y'^2 = x^2 + xy + 2y^2 .                (31)

In fact, however, for the full (nonlinear) transformation, the fixed point is attractive.
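The contrast between the linearization (30), which preserves the quadratic form (31), and the full transformation (27) is easily exhibited; the following sketch (an illustration with arbitrarily chosen starting offsets and iteration counts) prints the value of the form along both orbits.

    def full(S, a):
        # quadratic transformation (27)
        return 1 - 4*a + 4*a*a + 2*a*S, 2*a*S

    def linear(x, y):
        # linearization (30) about S = 1/2, a = 1/4
        return 0.5*x - y, 0.5*x + y

    def Q(x, y):
        # quadratic form (31)
        return x*x + x*y + 2*y*y

    x, y = 0.01, 0.005
    print("Q along the linear orbit (constant):")
    for _ in range(5):
        x, y = linear(x, y)
        print("  %.10f" % Q(x, y))

    S, a = 0.5 + 0.01, 0.25 + 0.005
    print("Q along the full orbit (the text states the fixed point is attractive):")
    for n in range(1, 20001):
        S, a = full(S, a)
        if n % 5000 == 0:
            print("  n=%5d  Q = %.3e" % (n, Q(S - 0.5, a - 0.25)))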

5. Limit Sets for Quadratic Transformations in Four Variables. All 34337 distinct systems of this type have been investigated on the STRETCH computer, as described in section I above.* A preliminary survey of the results indicates that only about 2 percent of

* The computing time required for the whole study was only a fraction of what one would predict on the basis of 7 seconds for 10^5 iterations--an average for these recurrence relations as actually coded--because a large majority of cases "converged" in a few (~50) steps.



these transformations possess infinite limit sets. The finite limit sets need no special comment; they are of the same sort as those found in cubics in three variables-except, of course, that they are not in general plane sets. A few periods of rather high-order (more than 100 points!) were found, as well as a fair number of cases with 10 to 80 points. This probably should be expected in view of the greater variety of possible algebraic structures.

We are not yet in a position to classify the infinite limit sets as we have in the case of the cubics. Perhaps the closest analogy to sets of the class I type are those which appear to be closed curves in space. These are illustrated in figures 19 through 23. They are shown in convenient projections; the "coordinate system" in the center of the picture merely indicates the orientation relative to the viewer, who is conceived of as stationed at a certain distance from the origin along the y axis.* Figure 19 shows a limit set belonging to the transformation:

T_{Q_1 Q_2 Q_3}:   Q_1 = {1, 3, 4} ,   Q_2 = {5, 6, 8} ,   Q_3 = {7, 10} .                (32)

Presumably what one sees is a twisted space curve. In figure 20, the limit set consists of two plane curves, one of which lies in the (x_1, x_3) plane, the other lying in a plane inclined at 45° with respect to the first. The corresponding transformation is

T_{Q_1 Q_2 Q_3}:   Q_1 = {1, 2, 9} ,   Q_2 = {4, 7, 10} ,   Q_3 = {3, 5, 6} .                (33)

The observed limit set is at least consistent with the fact that (33) evidently transforms these planes into each other (so that the points lie alternately on the separate curves). More complicated twisted curves are possible (figure 21). We have also found quite implausible looking limit sets like that shown in figure 22. As a final example, we cite the transformation:

* The "reference system" (x, y, z) is parallel to, but displaced relative to, the actual coordinate system (x1,x2,x3). The origin of the (x, y, z) system is in the (approximate) center of the picture; that of the (x1,x2,x3) system is in the lower left-hand corner.






This has at least two infinite limit sets, one of which may be of class IV type (not shown); the other (figure 23) is a "curve" of unknown structure.

At the time of this writing (January, 1963) we are unable to say anything more specific about the limit sets for quadratics in four variables; to date, less than one-third of the seven hundred or so potentially "interesting" cases have been looked at on the oscilloscope.

III—
The "At-Modification"

1. We discuss here a particular one-parameter generalization of our original cubic transformations in three variables which we have called the Δt-modification.* It consists in replacing the usual difference equations:**

x_i' = Σ_{j=1}^{10} d_{ij} M_j ,   i = 1, 2, 3 ,                (1)

by

x_i' = (1 − Δt) x_i + Δt Σ_{j=1}^{10} d_{ij} M_j ,                (2)

with

0 < Δt ≤ 1 .                (3)

If Δt = 1, we recover the original set (1); Δt = 0 is excluded, since the equations then become the trivial identity transformation.

The abbreviated notation of section II is extended in an obvious manner to cover this case. Thus, if (1) is represented symbolically by T_{C_1 C_2}, (2) may be symbolized by T_{C_1 C_2}(Δt). For a given p, the limit set will correspondingly be denoted by L_p(T), L_p^(Δt)(T).

* This has already been mentioned in the Introduction, equation (12a). The modification can, of course, be introduced for the general case [equations (10) through (13) of the Introduction].

** The M_j are defined in section II, subsection 1, equation (4). Unless otherwise stated, equations (2) and (3) of section II are assumed to hold, as well as the condition Σ x_i = 1, x_i ≥ 0, for all initial points p = (x_1, x_2, x_3).



In the S, a coordinates, (1) appears in the form

S' = F(S, a) ,   a' = G(S, a) .                (4)

Correspondingly, (2) reads

S' = (1 − Δt) S + Δt F(S, a) ,   a' = (1 − Δt) a + Δt G(S, a) .                (5)

It is clear that the fixed points of (5) coincide with those of (4). As we shall see below, this fact enables us to find these fixed points by simple iteration, thus avoiding the unpleasant algebra involved in eliminating one variable from the pair of general cubics

S = F(S, a) ,   a = G(S, a) ,                (6)

and then solving for the roots of the resulting high-order (degree ≤ 9) polynomial. One can look upon (5) as the simplest (and most naive) finite difference scheme for approximating the first-order differential system

dS/dt = −S + F(S, a) ,   da/dt = −a + G(S, a) .                (7)

The analogy between (5) and (7) is not, however, very close;* consequently it is better to discuss (5) on its own merits. The effect of setting Δt < 1 (but > 0) on a single iteration is easy to see. Let us take a particular point S, a; the image produced by (4) will be denoted, as usual, by S', a', while we shall call the corresponding image under (5) S'_mod, a'_mod. Then

S'_mod − S = Δt (S' − S) ,   a'_mod − a = Δt (a' − a) .                (8)

In other words, the length of the iterative step is altered, while the direction remains the same. What happens on repeated iteration is, however, not at all obvious. One expects that the limit set L_p^(Δt)(T) will in general have smaller diameter as we decrease Δt, but we cannot at present predict its structure as a function of Δt, even relative to the (observed) structure of L_p(T). It is worthwhile illustrating this in a particular case. Consider the transformation

T_A:   C_1 = {3, 5, 7, 9, 10} ,   C_2 = {1, 2, 8} .                (9)

* For further discussion on this point see section V.



In the S, a coordinates, this reads explicitly:

T_A:   S' = S^3 − 6S^2 a − 3S a^2 + 4a^3 − (3/2)S^2 + 3Sa − (3/2)a^2 + 1 ,                (A)
       a' = −S^3 + 3S a^2 + 2a^3 + (3/2)S^2 − 3Sa + (3/2)a^2 .

Since we shall refer to this transformation quite often in the sequel, we have given it the distinctive label (A). T_A has one interior* fixed point (repellent), whose coordinates are:

S_0 = 0.5885696 ,   a_0 = 0.1388662 .                (10)

There are two infinite limit sets; these are shown in figure 24 and in figure 11 of section II. At the moment, we shall not be concerned with the limit set shown in figure 11; this is evidently of class II type and can therefore be studied in one-dimensional form.** The limit set shown in figure 24--which we shall henceforth refer to as L(T_A)--appears as an irregular pattern surrounding the fixed point (shown superimposed on the picture). Figure 25 again shows L(T_A), this time enlarged by a factor 3, while figure 26 shows a portion of the upper left-hand corner*** enlarged about 14 times. Figure 24 shows 900 consecutive iterates, while figure 25 shows these same 900 points plus 1800 more. For comparison, in figure 27 we plot just 50 consecutive iterates. The approximate outer dimensions of L(T_A) are†

S_max at a = 0.077251 ,   S_min at a = 0.204610 ,
a_max at S = 0.491266 ,   a_min at S = ...                ( )

We now contrast with L(T_A) the limit sets L^(Δt)(T_A) belonging to the generalized transformation T_A(Δt). If we set Δt = 0.9931, we get a limit set entirely different from L(T_A) (from the same initial point). This is shown in figure 28. It exhibits what we have called "pseudo-periodic" structure, that is, almost all the iterated images

* There is another repellent fixed point at a vertex of the triangle, namely S = a = 1/2. ** See section II. *** This is the region 0.455 ≤ S ≤ 0.525, 0.225 ≤ a ≤ 0.278. † These results are based on a calculation with N = 9600 iterations.






of the initial point p are concentrated in the neighborhood of seven distinct "centers"--an example of a class III limit set.*

With a very small change in Δt--namely, by setting Δt = 0.9930--we find instead a period of order 7. This is shown in figure 29. As we decrease Δt in small steps down to Δt = 0.9772 (figure 30), the corresponding L^(Δt)(T_A) remains a period of order 7; the coordinates of the individual points appear to change continuously with Δt. For Δt = 0.97713, L^(Δt)(T_A) is again a pseudo-period, and this character persists down to Δt = 0.9770 (figure 13 of section II). Below** Δt = 0.9770, L^(Δt)(T_A) is a closed curve*** around the fixed point which shrinks in more or less continuous fashion as Δt is decreased. Figures 31, 32, and 33 illustrate, respectively, the limit sets for Δt = 0.97, 0.94, and 0.92. Finally, for Δt < 0.9180154 (see below), the limit set consists of a single point--the fixed point (10).† This peculiar behavior of L^(Δt)(T_A) as a function of Δt is not an isolated instance, nor is it by any means among the most extreme examples we have encountered (see section IV for a considerably more "pathological" case). Within the class of cubic transformations we have studied, it seems to be an empirical rule that the more pathological the limit set looks for Δt = 1, the more complicated will be its behavior as Δt is decreased.
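A Δt scan of this kind is easy to reproduce in outline. The sketch below uses the (S, a) form (A) as reconstructed above; the starting point, transient length, and tolerance are arbitrary choices, and depending on the initial point the orbit may of course be captured by the class II boundary limit set rather than by the sets described here. It merely reports whether a short period is detected after a long transient.

    def TA(S, a):
        # transformation (A)
        Sp = S**3 - 6*S**2*a - 3*S*a**2 + 4*a**3 - 1.5*S**2 + 3*S*a - 1.5*a**2 + 1
        ap = -S**3 + 3*S*a**2 + 2*a**3 + 1.5*S**2 - 3*S*a + 1.5*a**2
        return Sp, ap

    def TA_dt(S, a, dt):
        # the Delta-t modification (5)
        Sp, ap = TA(S, a)
        return (1 - dt)*S + dt*Sp, (1 - dt)*a + dt*ap

    def classify(dt, q=(0.45, 0.20), transient=50_000, max_period=100, tol=1e-9):
        for _ in range(transient):
            q = TA_dt(*q, dt)
        ref = q
        for k in range(1, max_period + 1):
            q = TA_dt(*q, dt)
            if abs(q[0] - ref[0]) < tol and abs(q[1] - ref[1]) < tol:
                return "period of order %d" % k
        return "no short period (curve, pseudo-period, or class IV set)"

    for dt in (1.0, 0.9931, 0.9930, 0.98, 0.94, 0.90):
        print(dt, classify(dt))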

2. Attractive and Repellent Fixed Points. The fixed points of a cubic transformation in the standard form (4) are those real roots of the algebraic system (6) which lie in or on the boundary of the S, a triangle.†† We are interested both in finding the values of the coordinates of these fixed points and in determining whether the points are attractive or repellent. By attractive we mean as always that, for any point p in a sufficiently small neighborhood of the fixed point p_0, the sequence T^(n)(p) will converge to p_0. A general criterion for the attractiveness of a fixed point has been given by Ostrowski,10 viz: let λ_max be the eigenvalue of the jacobian matrix, evaluated at the fixed point, which is largest in absolute value. Then if |λ_max| < 1, the point

* See section II for this classification scheme. ** We have not attempted to find the critical values of Δt with greater precision, though this could in principle be done to, say, 7 decimal places on MANIAC II. *** This is, of course, only a conjecture. See the discussion in section II.

† In the language of functional analysis, TA(Δt) is a shrinking operator in this range of Δt values. †† Brouwer's theorem assures us that there is at least one fixed point.



is attractive; if |λ_max| > 1 the point is repellent. The theorem says nothing about the case |λ_max| = 1, nor does it yield a method for determining theoretically the appropriate neighborhood. For the two-variable transformation (4), we may give the eigenvalues of the jacobian matrix explicitly:

λ = [T_0 ± sqrt(T_0^2 − 4J_0)] / 2 ,   (12)

where T_0 is the trace:

T_0 = (∂F/∂S + ∂G/∂a) , evaluated at S = S_0 , a = a_0 ,   (13)

and J_0 is the jacobian:

J_0 = (∂F/∂S · ∂G/∂a − ∂G/∂S · ∂F/∂a) , evaluated at S = S_0 , a = a_0 .   (14)

For the modified system (5), we find correspondingly:

λ_mod = 1 − Δt + (Δt/2)(T_0 ± sqrt(T_0^2 − 4J_0)) .   (15)

If the roots are complex, i.e., if T_0^2 < 4J_0, we have

|λ_mod|^2 = 1 − Δt(2 − T_0) + Δt^2 (1 − T_0 + J_0) .   (16)

Defining Δt_lim as the value of Δt for which |λ_mod|^2 = 1, we obtain

Δt_lim = (2 − T_0) / (1 − T_0 + J_0) .   (17)

Thus, for the case of complex roots, we may make the fixed point (S_0, a_0) attractive by choosing Δt such that

0 < Δt < Δt_lim .   (18)

Similarly, if λ_max is real and negative, and |λ_max| > 1,

Δt_lim = 2 / (1 − λ_max) .   (19)
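As an illustration of equations (12)-(17), the following sketch evaluates T_0, J_0, and Δt_lim at the fixed point (10). F and G are again those of equation (A); approximating the partial derivatives by central differences is merely a convenience of this sketch and is not part of the procedure described in the text.

```python
import math

# Ostrowski's criterion and Delta t_lim at the fixed point (10) of T_A,
# following equations (12)-(17).  The central-difference derivatives are an
# implementation convenience only.

def F(S, a):
    return S**3 - 6*S**2*a - 3*S*a**2 + 4*a**3 - 1.5*S**2 + 3*S*a - 1.5*a**2 + 1.0

def G(S, a):
    return -S**3 + 3*S*a**2 + 2*a**3 + 1.5*S**2 - 3*S*a + 1.5*a**2

def partials(f, S, a, h=1e-6):
    return ((f(S + h, a) - f(S - h, a)) / (2*h),
            (f(S, a + h) - f(S, a - h)) / (2*h))

S0, a0 = 0.5885696, 0.1388662                # fixed point, equation (10)
FS, Fa = partials(F, S0, a0)
GS, Ga = partials(G, S0, a0)

T0 = FS + Ga                                  # trace, eq. (13)
J0 = FS*Ga - GS*Fa                            # jacobian determinant, eq. (14)
disc = T0*T0 - 4*J0

if disc < 0:                                  # complex roots: eqs. (16)-(17)
    print("|lambda_max| =", math.sqrt(J0))    # |lambda|^2 = J0 for a conjugate pair
    print("Delta t_lim  =", (2 - T0) / (1 - T0 + J0))
else:                                         # real roots: eq. (12)
    lam = max((T0 + math.sqrt(disc)) / 2, (T0 - math.sqrt(disc)) / 2, key=abs)
    print("lambda_max =", lam)
```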



It is clear that this artifice will not work if λ_max > 1. Such a situation arises in the one-dimensional case:

x_1' = x_1^3 + 3x_1^2 x_2 , x_2' = x_2^3 + 3x_1 x_2^2 , x_1 + x_2 = 1 ,   (20)

that is,

x_1' = x_1^2 (3 − 2x_1) .   (21)

The fixed points are x_1 = 0, 1/2, 1; at these points the derivative dx_1'/dx_1 has the values 0, 3/2, 0. Clearly both x_1 = 0 and x_1 = 1 are attractive fixed points; for all x_1^(0) < 1/2, x_1^(n) → 0, while for all x_1^(0) > 1/2, x_1^(n) → 1. The interior fixed point x_1 = 1/2 is repellent and cannot be made attractive by using the Δt-modification. The corresponding situation does not seem to occur for any of our cubic transformations in the plane.
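The failure of the artifice in this example can be checked in one line: since the derivative at x_1 = 1/2 is 3/2, the modified derivative 1 − Δt + (3/2)Δt = 1 + Δt/2 exceeds 1 for every positive Δt. A minimal sketch:

```python
# The interior fixed point x1 = 1/2 of x' = x^2 (3 - 2x) has derivative 3/2,
# so the Delta t-modified derivative is 1 - dt + (3/2) dt = 1 + dt/2 > 1 for
# every dt > 0: no choice of dt makes the point attractive.

for dt in (1.0, 0.5, 0.1, 0.01):
    print(dt, 1 - dt + 1.5*dt)    # always greater than 1
```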

In practice, all one has to do to obtain the numerical value of a repellent fixed point is to choose a sufficiently small Δt and iterate; on a computer, this calculation requires only a few seconds.*

3. Attractive Periods. The set of points constituting a period of order k are fixed points of T^(k). Thus one may test whether a periodic limit set is attractive by applying Ostrowski's criterion to T^(k). Let

J^(k) = [ ∂S^(k)/∂S   ∂S^(k)/∂a ; ∂a^(k)/∂S   ∂a^(k)/∂a ] , evaluated at (S_0, a_0) ,   (22)

J_{n−1} = [ ∂S^(n)/∂S^(n−1)   ∂S^(n)/∂a^(n−1) ; ∂a^(n)/∂S^(n−1)   ∂a^(n)/∂a^(n−1) ] , evaluated at (S^(n−1), a^(n−1)) ,   (23)

where, e.g., (S_0, a_0) is a fixed point of T^(k). Then, by the chain rule:

J^(k) = J_{k−1} × J_{k−2} × ··· × J_0 .   (24)

Thus J(k) is easily obtained by evaluating (24) over the periodic set in question; the application of Ostrowski's criterion is then immediate.
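A sketch of this test is given below: the jacobians J_n are multiplied along the periodic points according to the chain rule (24), and the largest eigenvalue of the product is compared with 1. The numerical differentiation and the toy example at the end are illustrative choices, not part of the original procedure.

```python
import numpy as np

# Ostrowski's criterion for a period of order k: multiply the jacobians J_n
# along the periodic points (chain rule, equation (24)) and test whether the
# largest eigenvalue of the product is below 1 in absolute value.

def jacobian(T, p, h=1e-7):
    """2x2 jacobian of the plane map T at the point p, by central differences."""
    J = np.empty((2, 2))
    for j in range(2):
        dp = np.zeros(2); dp[j] = h
        J[:, j] = (np.asarray(T(*(p + dp))) - np.asarray(T(*(p - dp)))) / (2*h)
    return J

def period_is_attractive(T, period_points):
    Jk = np.eye(2)
    for p in period_points:                    # J(k) = J_{k-1} x ... x J_0
        Jk = jacobian(T, np.asarray(p, float)) @ Jk
    return max(abs(np.linalg.eigvals(Jk))) < 1.0

# Toy usage: a contracting swap map, for which the fixed point (0, 0),
# regarded as a period of order 1, is attractive.
print(period_is_attractive(lambda S, a: (0.5*a, 0.5*S), [(0.0, 0.0)]))
```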

* Early in this investigation we made the "mistake" of taking Δt > Δt_lim in a few cases, and thereby discovered the interesting limit sets L_p(Δt)(T).



We have often used this technique to convince ourselves that the periods are really limit sets and not the result of spurious or accidental convergence.*

IV—
Modification of the Coefficients

1. In this section we present some results on the effect of modifying the original integer-valued coefficients of our cubic transformations in three variables. That is, we consider, as before, transformations of the form:

x_i' = Σ_{j=1}^{10} d_ij M_j ,   (1)

Σ_{i=1}^{3} d_ij = 1 ,   (2)

but we no longer require that the d_ij all be 1 or 0. As already remarked in the introduction, if we impose on the d_ij only the additional condition:

0 ≤ d_ij ≤ 1   (3)

then (1), (2), and (3) define a class of cubic transformations depending on 20 parameters, e.g., d_1j, d_2j (j = 1 to 10).** Since we are unable to formulate a complete theory for the finite subclass of transformations characterized by the restrictions: d_ij = 0 or 1, all i, j, it is clear a fortiori that we do not have a theory for the infinite class.

In this paper we limit ourselves to showing how an experimental study of some special cases can help to throw light on the properties of our original cubic transformations.

In effect, what we do is study certain transformations which are "close to" some particular transformations of our basic type. A natural

* This technique has actually been used for periods with orders k as large as 148. For very large k the method might fail owing to round-off errors or other numerical inaccuracies.

** For the definition of the cubic monomials M_j, see equation (4) of section II. The domain of this class of transformations is again the region Σ_{i=1}^{3} x_i = 1 , 0 ≤ x_i ≤ 1.



way to define a transformation close to some given T_{C1C2} would be to choose its coefficients as follows:

d_ij = 1 − e_ij , j ∈ C_i ;   d_ij = e_ij , j ∉ C_i ,   (4)

where the d_ij must satisfy (2) and (3), and the e_ij are small. This class of transformations, defined with respect to some T_{C1C2}, is still too extensive to study, even if the various e_ij are restricted to a few discrete values. What we have actually done is to consider 20 such transformations, each of which depends on a single parameter ε. We denote these by the symbol:

T_(r,s)ε , 1 ≤ r ≤ 10 , s = 0, 1 , 0 < ε < 1 .   (5)

It is understood that these transformations are only defined relative to some T_{C1C2}. For convenience we shall generally refer to the transformations T_(r,s)ε, defined relative to some T_{C1C2}, as associated transformations. The coefficients of the T_(r,s)ε are specified as follows:

For j ≠ r :   d_ij = 1 , j ∈ C_i ;   d_ij = 0 , j ∉ C_i .   (6)

For j = r :   d_1r = 1 − ε for r ∈ C_1 , d_1r = (1 − s)ε for r ∉ C_1 ;   d_2r = 1 − ε for r ∈ C_2 , d_2r = sε for r ∉ C_2 (d_3r is then fixed by (2)).   (7)

In words: T_(r,s)ε is formed from T_{C1C2} by the replacement M_r → (1 − ε)M_r wherever the term M_r occurs, and by adding εM_r to one of the other two lines of the three-line schema. As an example, consider the transformation TA introduced in section III: C_1 = {3, 5, 7, 9, 10} , C_2 = {1, 2, 8} .   (8)

Relative to TA, T_(5,1)ε would read:

x_1' = M_3 + (1 − ε)M_5 + M_7 + M_9 + M_10 ,
x_2' = M_1 + M_2 + εM_5 + M_8 ,   (9)
x_3' = M_4 + M_6 ,



while T_(4,0)ε would take the form:

x_1' = M_3 + εM_4 + M_5 + M_7 + M_9 + M_10 ,
x_2' = M_1 + M_2 + M_8 ,   (10)
x_3' = (1 − ε)M_4 + M_6 .

In the S, a coordinates, the T_(r,s)ε can be written:

S' = F(S, a) + ε f_rs(S, a) , a' = G(S, a) + ε g_rs(S, a) ;   (11)

the original T_{C1C2} is obtained from (11) by setting ε = 0. For the two examples given above, we have:

T_(5,1)ε :   f_rs = f_51 = −(3/2)(S − a)(1 − S − a)^2 ,   g_51 = −f_51 ;   (12)

T_(4,0)ε :   f_rs = f_40 = 12a^2 (S − a) ,   g_40 = 0 .   (13)

It turns out that for these one-term modifications T_(r,s)ε we always have g_rs = ±f_rs or 0. f_rs can further be factored into a numerical coefficient c_rs and a function M_r(S, a); the M_r are of course just the original cubic monomials expressed in terms of S and a. The c_rs and g_rs are determined as follows:

For r ∈ C_1 :   s = 0 : g_rs = 0 , c_rs = −1 ;   s = 1 : g_rs = −f_rs , c_rs = −1/2 ;   (14)
for r ∈ C_2 :   s = 0 : g_rs = −f_rs , c_rs = 1/2 ;   s = 1 : g_rs = f_rs , c_rs = −1/2 ;   (15)
for r ∉ C_1 , r ∉ C_2 :   s = 0 : g_rs = 0 , c_rs = 1 ;   s = 1 : g_rs = f_rs , c_rs = 1/2 .   (16)
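The bookkeeping of equations (6)-(7) is easily mechanized. The sketch below builds the 3 × 10 coefficient array d_ij of an associated transformation T_(r,s)ε from the sets C_1, C_2; it follows the rule as reconstructed above and says nothing about the monomials M_j themselves. The function name and the example value of ε are illustrative.

```python
# Building the 3 x 10 coefficient array d_ij of an associated transformation
# T_(r,s)eps from the defining sets C1, C2, following equations (6)-(7).
# Only the coefficients are produced; the monomials M_j are not needed here.

def coefficients(C1, C2, r, s, eps):
    d = [[0.0]*10 for _ in range(3)]
    for j in range(1, 11):
        if j != r:                                    # eq. (6): unperturbed columns
            row = 0 if j in C1 else (1 if j in C2 else 2)
            d[row][j-1] = 1.0
        else:                                         # eq. (7): the perturbed column
            d[0][j-1] = 1 - eps if r in C1 else (1 - s)*eps
            d[1][j-1] = 1 - eps if r in C2 else s*eps
            d[2][j-1] = 1.0 - d[0][j-1] - d[1][j-1]   # column sum = 1, eq. (2)
    return d

# Example: T_A with (r, s) = (5, 1); the M5 column becomes (1 - eps, eps, 0),
# as in equation (9).
dA = coefficients(C1={3, 5, 7, 9, 10}, C2={1, 2, 8}, r=5, s=1, eps=0.05)
print([row[4] for row in dA])
```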

2. We have studied the modified transformations T(r,s)e for a variety of our original cubic Tc 1C 2 that happen to have infinite limit sets.



Our usual procedure has been to vary e in steps of 1/100 in the range 1/100 < e < 1/10 for a given T(,,s) relative to a given Tc,c 2, although on occasion intermediate values of e have been used. Only for the transformation TA [equation (8) above] have we looked at all 20 modified transformations. For a few other TC 1C2 we have limited ourselves to selecting certain of the associated T(r,s)e for detailed study. Since this selection has generally been made on intuitive grounds, we cannot claim that the most "interesting" modifications of the original transformations have always been considered. Nevertheless, this part of our study has proved most revealing, especially as regards the structure of class IV limit sets.

Before describing the results, we insert a few remarks on the difference between the two types of generalizations we have considered, the At-modification of section III and the associated transformations Tr,,s). The At-modification is essentially nothing but the application of a technique frequently employed in the practical solution of nonlinear equations by iterative methods; it is, in fact, one way-perhaps the simplest--of introducing a linear convergence factor. Apart from our use of this device for obtaining the coordinates of the fixed point, our principal interest is in small convergence factors (At close to unity)-too small, in fact, to produce convergence to the fixed point. In view of the fact that the At-modified transformation TC 1C 2(At) has precisely the same fixed point as the original transformation Tc1C 2, one might expect that there exists a close relationship between the corresponding limit sets Lp(At)(T) and Lp(T). In some sense this is true, as the examples given in section III show (see also below, subsection 4). We may express this more formally as follows:

We define a sequence of transformations T_{C1C2}(Δt_i) = T_{Δt_i} with corresponding limit sets L_p(Δt_i)(T) by some convenient rule, e.g.,

Δt_0 = Δt_lim , Δt_i = ... .   (17)

The sequence T_{Δt_i}, i = 1, 2, ..., clearly converges to T_{C1C2}. We then formulate the following conjecture:

Given a T_{C1C2} and a δ > 0, then, for all p in the triangle, there exists an N(p) such that, for i > N(p) and for all x ∈ L_p(T), there exists a y ∈ L_p(Δt_i)(T) satisfying |y − x| < δ.

The modification of Tc1 c2 defined by the associated transformations T(r,s) differs from the At-modification in several respects. In the first place, the perturbation introduced is not linear. Furthermore,



the fixed points of T(r,s)c are in general not the same as those of TC0 C2 (fixed points on the boundary of the triangle may, of course, be common to T(,s),e and Tc,c2 for some pairs r, s).* Finally, each pair r, s must be treated separately; for fixed e, perturbations of different terms of TC1C2 may lead to quite different limit sets. Nevertheless, a conjecture analogous to that formulated for the sequence T xt would most probably turn out to be correct.

3. Limit Sets of the Transformations T_(r,s)ε Associated with TA. Since we usually deal with values of ε of the form:

ε_i = i/100 , 1 ≤ i ≤ 10 ,   (18)

we introduce a symbol to denote a set of such values:

I(i, j) ≡ {ε_n} , i ≤ n ≤ j .   (19)

In addition, R_k^(r,s) will denote the closed interval of ε:

R_k^(r,s) = [⁻R_k^(r,s) , ⁺R_k^(r,s)] ,   ⁻R_k^(r,s) ≤ ε ≤ ⁺R_k^(r,s) ,   (20)

for which the limit sets L_(r,s)ε of T_(r,s)ε are periods of order k. The photographs illustrating the examples that follow will be found, suitably labelled, in subsection 2 of appendix II.

There is one significant feature common to all the T_(r,s)ε associated with TA; for every pair r, s at least one periodic limit set--that is, a period of order k > 1--was found in the range I(1, 10). The order of periodicity of most frequent occurrence was k = 7. Thus, for example, for (r, s) = (10, 0), we found periodic limit sets with k = 7 over the range I(3, 10), and the case (r, s) = (6, 1) behaves in the same fashion over the same range. For both series of associated transformations, the limit sets for ε = 0.01 are of class IV type and closely resemble L(TA). At ε = 0.02, bright spots show up in the pattern (figure A-1); this usually indicates that one is near a period, i.e., that a relatively small change in ε will yield a transformation having a finite limit set. In the notation (20), this would be written: ⁻R_7^(r,s) − 0.02 < 0.01. In

* The new fixed point S_(r,s)ε = S_0 + ΔS, a_(r,s)ε = a_0 + Δa, calculated to first order in (ΔS, Δa), has both ΔS and Δa proportional to ε. The ratio ΔS/Δa in this order is therefore independent of ε, though not of r, s. There are, in fact, 6 possible directions of displacement, two for each of the three cases: g_rs = f_rs, g_rs = −f_rs, g_rs = 0; cf. equations (14), (15), and (16).



these two examples it happens that a period of order 7 is observed over the range I(3, 10), that is: I(3, 10) ⊂ R_7^(r,s). This is not generally the case. Thus for the case (r, s) = (9, 0), I(2, 9) ⊂ R_7^(9,0), whereas L_(9,0)0.01 and L_(9,0)0.10 are of class IV type and are morphologically similar to L(TA). It may be recalled (section III) that an analogous behavior was observed for the Δt-modified transformations TA(Δt), namely that L_Δt(TA) was found to be a period of order 7 for a particular range of values of Δt (0.9772 ≤ Δt ≤ 0.9930), and different in character (actually, of class III type) outside the range on both sides.

Periodic limit sets of order 7 have been found for some range I(i, j) of ε in 9 out of the 20 possible cases. For one of these, (r, s) = (2, 1), I(4, 7) ⊂ R_7^(2,1), while L_(2,1)0.10 is periodic with k = 28 (figures A-2, A-3). In the transition region, i.e., for ⁺R_7^(2,1) < ε < ⁻R_28^(2,1), the limit sets are infinite. These are shown in figures A-4 and A-5 for the range I(8, 9). They look like pseudo-periods, but, when suitably enlarged, they are seen to be of class I type (figures A-6, A-7). In these pictures one clearly sees with increasing ε the onset of instability--to use an expression from mechanics--and the eventual attainment of a different stable state. The transition region at the lower end of the range also contains infinite limit sets. Figures A-8 and A-9 show L_(2,1)0.03, first to normal scale, then enlarged. It is manifestly a class III limit set.

For other T_(r,s)ε, periods of order k > 7 are found for certain ranges of the parameter ε, viz.: k = 9, 16, 23, 30, 37, 46, 62, 148. In two cases, two periods of relatively prime order are found in different sub-ranges of I(1, 10). Thus T_(5,1)ε has two periodic limit sets, one with k_1 = 23 for ε = 0.01 and one with k_2 = 16 over I(9, 10). Similarly, T_(1,1)ε has a periodic limit set with k_1 = 16 for ε = 0.01, and one with k_2 = 9 for ε = 0.10. In these cases the dependence of the limit set on ε in the transition region ⁺R_{k_1}^(r,s) < ε < ⁻R_{k_2}^(r,s) is more complicated than that described above. For ε-values in this region and sufficiently close to the end-points we observe the expected pseudo-periodic limit sets. For values of ε not too close to either boundary the limit set may be either of class IV or of class I type. Figure A-10 shows L_(5,1)0.04 to normal scale; in figure A-11, a portion of the limit set is shown enlarged.

We conclude this subsection with two further examples. These illustrate a phenomenon previously mentioned in our general description of limit sets (section II), namely the coexistence of finite periods and class IV sets. Figures A-12 and A-13 show two distinct limit sets belonging to T(3 ,1 )0.01 . One is a period of order k = 23, while the other is a class IV set closely resembling L(TA). The same phenomenon



is perhaps more strikingly illustrated by the case of T(5 ,0)0.02. Here we find both a class IV limit set and a period of order k = 148 (figures A-14 and A-15). We can say virtually nothing in this case about the dependence of the limit set on the initial point. Current computing facilities and techniques are not sufficiently powerful to effect an acceptably accurate determination of the respective regions of convergence without using prohibitive amounts of computing time. We have, however, carried out a few numerical experiments, the results of which certainly confirm our first impression that the geometrical structure of these regions is immensely complicated.

4. Study of the Associated Transformations for Other T_{C1C2}. In this subsection we discuss a few additional examples to illustrate the dependence of infinite limit sets on the parameter ε. The relevant photographs and tables will be found in appendix II.

For our first example, we choose the transformation:

T_B :   C_1 = {2, 4, 6, 7, 9} , C_2 = {5, 8, 10} .   (21)

The class IV limit set L(TB) belonging to this transformation is shown in figure B-1. As is evident, it consists of three separate pieces. Each of these is, of course, a limit set for T_B^(3). It is instructive to compare the limit sets L_Δt(TB) with those belonging to certain of the associated T_(r,s)ε. In appendix II we list the results for only one case: (r, s) = (1, 0). The limit sets L_Δt(TB) and L_(1,0)ε are described in table B. There are (at least) three ranges of Δt values for which L_Δt(TB) is periodic; for Δt close to unity the behavior of L_Δt(TB) as a function of Δt is rather wild. As Δt approaches Δt_lim = 0.854320 the (class I) limit set shrinks in a continuous manner. The behavior of L_(1,0)ε as ε is varied over I(1, 10) is, if anything, more "pathological"; there are at least six different intervals R_k^(1,0) for which the limit set is periodic, and each period has a different order. Note the similarity in appearance between the two class IV limit sets: L_Δt(TB) (Δt = 0.994) and L_(1,0)0.01.

The next two examples may be taken together:

T_D :   C_1 = {2, 7, 8, 9, 10} , C_2 = {4, 5, 6} ,   (22)
T_E :   C_1 = {2, 5, 7, 8, 9} , C_2 = {4, 6, 10} .   (23)



The basic class IV limit sets L(TD) and L(TE) are shown in figures D-1 and E-1; their morphological resemblance is apparent. The behavior of the L_Δt and L_(1,0)ε for these two cases is set forth in the tables and photographs of appendix II. Detailed comment is perhaps superfluous at this stage of our knowledge; we limit ourselves to drawing attention to the following comparisons:

1. Compare L_Δt(TD) (Δt = 0.97) with L_(1,0)0.10(TD). 2. Compare L_Δt(TE) (Δt = 0.97 and Δt = 0.96) with L_(1,0)0.09(TE) and L_(1,0)0.10(TE).

5. The original transformations TB, TD, TE are closely related from the point of view of formal structure. TD and TE differ by exchange of a single term between the defining sets C1 and C2, while each of these goes over into TB under the simultaneous interchange of two terms between C1 and C2. A comparison of the associated limit sets for TD and TE shows that the initial similarity of L(TD) to L(TE) is roughly preserved under perturbation. This suggests the possibility that some meaningful classification based on algebraic form might be devised.* Of even greater interest is the correspondence, in these examples, between the LAt and the L(1,0), for some ranges of the respective parameters. We are not at present in a position to draw any significant conclusions from the existence of this correspondence; it seems likely, however, that a closer study of these examples would yield criteria enabling one to predict such behavior.

6. There is one property of these transformations which may safely be inferred from the data, namely, that they are close to transformations having periodic limit sets (for some common set of initial points), where close is to be interpreted with reference to some appropriate parameter space--e.g., a range of ε values or Δt values. Their limit sets are "close" to periods, not in the sense that pseudo-periods are, but rather by virtue of the fact that they contain points which lie close--perhaps arbitrarily close--to a set of algebraic solutions of T^(k)(p) = p. In other words, the Hausdorff distance between the set of period points and the limit set L is small. In this connection, the following piece

* The difference in behavior of L_Δt(TB) on the one hand and L_Δt(TD), L_Δt(TE) on the other is undoubtedly due in part to the fact that in the first case the jacobian matrix has complex eigenvalues at the fixed point, while for TD and TE the eigenvalues are real; this is probably sufficient to explain the qualitative difference of behavior of the corresponding L_Δt as Δt → Δt_lim, for Δt − Δt_lim sufficiently small.



of evidence may be presented. Consider the transformation T_(1,0)0.01 associated with TE, for which we have observed that the sequence T_(1,0)0.01^(n)(p), n = 1, 2, ..., converges to a period of order k = 10 for almost all p. Let us choose a p close to the fixed point. If we then examine the sequence for n = 1, 2, ..., N, where N is sufficiently large, we find that the images T^(n)(p) of p have traced out a pattern which closely resembles the original class IV limit set L(TE) of figure E-1. This is shown in figure E-2. The bright spots are the points belonging to the periodic limit set L_(1,0)0.01. Presumably this means that the effect of introducing a small perturbation into TE, of the form specified by T_(1,0)0.01,* is to make the limit set L(TE) contract to 10 points. Alternatively, we could say that, as ε → 0, the periodic limit set L_(1,0)ε(TE) spreads out until it becomes the class IV limit set L(TE).

This and other similar examples suggest that it might be useful to consider the periodic limit sets as fundamental, the hope being that one could develop an appropriate perturbation method, taking these periods as the unperturbed states. The effect of a small change of a parameter (in the direction of instability) is then simply to make the period non-attractive. This can in principle be studied by purely algebraic methods. Determining the structure of the resulting limit set-the perturbed state-is of course a more difficult matter.

In some cases this may amount to nothing more than the development of improved techniques for handling algebraic expressions of very high order. To clarify this statement, we offer one further example. Consider the Δt-modified transformations TA(Δt), where TA is the transformation introduced in subsection 2 above. For Δt = 0.99300, L_Δt(TA) is a period of order 7. With a very small change in Δt--namely, for Δt = 0.99301--the limit set is of class III type, a pseudo-period. Rather than investigating TA(Δt), let us turn our attention to the seventh power of the modified transformation, TA(Δt)^(7) (Δt = 0.99301). If we choose our initial point p sufficiently close to one of the (repellent) fixed points of TA(Δt)^(7),** we find that the first 516 iterated

* If TE is written in the form: S' = F(S, a), a' = G(S, a), then T_(1,0)ε is S' = F(S, a) + ε(S − a)^3 , a' = G(S, a). ** The actual values are not known: we have not yet developed good techniques for finding the coordinates of the points of a non-attractive period. The initial point for this example was taken as: S = 0.7034477, a = 0.1159449, chosen on the basis of some simple numerical experimentation. It is close to one of the periodic points belonging to the limit set L_Δt(TA) (Δt = 0.99300), viz.: S = 0.7037400, a = 0.1157123.



images of p, TA(Δt)^(7n)(p), n = 1, 2, ..., 516, lie on an almost exact straight line in the S, a triangle. This is shown in figure A-16. The initial point p is at the lower right, and the successive images trace out the line continuously from right to left.* If we continue the iteration, we find that the later images deviate from the straight line, then oscillate in position, and finally settle down to generate another straight line with a different end-point--presumably very close to another fixed point of TA(Δt)^(7). It is clear that if one had powerful enough algebraic tools, one could calculate this linear behavior.

7. We close this section with two remarks: 1) A study of the T_Δt and T_(r,s)ε associated with those T_{C1C2} which have only class I limit sets indicates that the latter are much more stable with respect to these one-parameter modifications than are the limit sets discussed above. 2) Even these unstable limit sets appear to be stable with respect to some one-parameter perturbations of the corresponding transformations. Thus, the transformations T_(3,0)ε associated with TE have limit sets visually identical with L(TE) over the whole range I(1, 10). Anomalies such as these make general pronouncements about absolute stability (or instability) impossible.

To illustrate: one might be tempted to explain the observed stability in this case as follows: Explicitly, T_(3,0)ε has the form:

S' = F(S, a) + ε(1 − S − a)^3 , a' = G(S, a) .   (24)

Now the density of L(TE) is relatively large near the right-hand boundary of the triangle, S + a = 1. The perturbing term, however, vanishes on this line. Thus the transformation is on the average very little altered by the perturbation. But this "explanation" becomes less convincing when one looks at other transformations associated with TE. T_(2,1)ε, for example, has the form:

One would expect the same argument to apply here, but in fact the limit sets only resemble L(TE) over the two ranges I(1, 2) and I(6, 10). In between, we get the familiar periodic and pseudo-periodic behavior.

* The final point plotted has coordinates: S = 0.7030206, a = 0.11628136, so the slope of the line is roughly Δa/ΔS ≈ −0.713. For this photograph, the scaling factor is approximately 2340.



V—
Relation to the Theory of Differential Equations

1. As we remarked in section III, the non-linear transformations discussed in this paper exhibit certain analogies with systems of differential equations. In the following we confine ourselves to discussing the plane case.

An important study in the theory of differential equations, particularly as applied to non-linear mechanics, is that of so-called autonomous systems:11,12,13

dx/dt = P(x, y) , dy/dt = Q(x, y) .   (1)

The theory, initiated by Poincaré, seeks to determine the properties of the solutions of (1) under very general conditions, and to deduce such properties for particular cases without actually solving the equations explicitly (i.e., obtaining the general integral). In particular, the trajectories, given parametrically as a function of t:

x = x(t) , y = y(t) ,   (2)

are investigated from a topological point of view. Fundamental is the classification of the singular points of the system (1), that is, the points x, y where P(x, y) = Q(x, y) = 0. The behavior of trajectories in the neighborhood of singular points can be found by consideration of the linear approximation to (1); the real object of the theory, however, is to characterize and, where possible, predict behavior in the large. One of the most interesting phenomena connected with behavior in the large is the existence of closed trajectories, or limit cycles. The theorem of Poincaré and Bendixson14 gives sufficient conditions for the existence of such. Unfortunately, the fulfillment of these conditions in particular cases is often hard to verify; to date no satisfactory theoretical method for dealing with an arbitrary given system has been found.*

2. If we write our general two-dimensional system of non-linear difference equations in the form:

(S^(n) − S^(n−1)) / Δt = −S^(n−1) + F(S^(n−1), a^(n−1)) ,
(a^(n) − a^(n−1)) / Δt = −a^(n−1) + G(S^(n−1), a^(n−1)) ,   (3)

* See reference 15. The practical applications are largely confined to stability theory. Also reference 13 and the literature there cited.



the analogy with (1) is evident. The fixed points of (3) correspond to the singular points of (1), and the behavior of solutions in the neighborhood of a fixed point can be investigated via the linear approximation; this procedure, in fact, yields Ostrowski's criterion (see section III). If the fixed point is attractive, the asymptotic solution in its neighborhood can of course be obtained. In the case of repellent fixed points (or if the initial point is outside the region of attraction of all attractive fixed points), the sequence of iterates sometimes converges to a limit set which appears to resemble a Poincaré limit cycle, i.e., a closed curve. In other cases, finite limit sets (periods) are obtained; on the other hand, one may observe limit sets of quite ambiguous geometrical, not to say topological, structure. These last two alternatives have no analogues in the case of differential equations.

In fact, the analogy between (3) and (1) is more apparent than real. The significant distinction lies, perhaps, in the fact that for our difference equations there is nothing corresponding to the trajectories of (1); successive iterates do not in general lie close to each other. This fact makes it difficult to use topological arguments to determine the character of the limit set. For sufficiently small Δt the sequence of iterates may resemble a trajectory to some extent, but the limit as Δt → 0 is almost certain to be a single point.*
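The formal correspondence can be stated very concretely: the Δt-modified map is exactly one Euler step of the autonomous system dS/dt = F(S, a) − S, da/dt = G(S, a) − a. The sketch below merely checks this identity term by term, using the F and G of equation (A) as a stand-in for any pair of functions.

```python
# The Delta t-modified map is exactly one Euler step of dS/dt = F - S,
# da/dt = G - a.  This is only an illustration of the identity; F and G here
# are those of equation (A).

def F(S, a):
    return S**3 - 6*S**2*a - 3*S*a**2 + 4*a**3 - 1.5*S**2 + 3*S*a - 1.5*a**2 + 1.0

def G(S, a):
    return -S**3 + 3*S*a**2 + 2*a**3 + 1.5*S**2 - 3*S*a + 1.5*a**2

def modified_map(S, a, dt):
    return (1 - dt)*S + dt*F(S, a), (1 - dt)*a + dt*G(S, a)

def euler_step(S, a, dt):
    return S + dt*(F(S, a) - S), a + dt*(G(S, a) - a)

print(modified_map(0.5, 0.2, 0.05))
print(euler_step(0.5, 0.2, 0.05))     # identical, term by term
```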

VI—
Broken-Linear Transformations in Two Dimensions

1. For certain special quadratic transformations in one dimension one can give an almost complete discussion of the iterative properties; this is possible because these transformations are conjugate to piecewise linear (broken-linear) mappings of the interval into itself. For example, the transformation x' = 4x(1 − x) is conjugate to x' = g(x), where g(x) = 2x, 0 ≤ x ≤ 1/2; g(x) = 2 − 2x, 1/2 < x ≤ 1.** The iterative properties of the latter can be obtained from a study of the law of large numbers for the elementary case of Bernoulli. Stated differently, the behavior of iterates of this simple quadratic transformation turns out to depend on combinatorial rather than analytic properties of the function. With this in mind, we tried to see whether an analogous situation would obtain in two dimensions. Our non-linear, polynomial transformations of a triangle into itself might, we thought, be similar to suitably chosen broken-linear mappings of a square into itself, at least as regards their

* It may happen that some power T(n of a transformation more closely resembles a trajectory; cf. the example cited in section II. ** See further in appendix I.



asymptotic behavior. One simple generalization to two dimensions of broken-linear transformations in one variable is a mapping:

x' = f(x, y) , y' = g(x, y) ,   (1)

where each of the functions f and g is linear in regions of the plane. In other words, the graphs of these functions consist of planes fitted together to form pyramidal surfaces. The motivation for studying such transformations is the hope that their iterative properties will turn out to depend only on the folding of the plane along straight lines or, more specifically, on the combinatorics of the overlap of the various linear regions which is generated by the mapping. The simplest nontrivial case to investigate consists in taking f(x,y) as a function defined by choosing a point in the square and making f maximum at this point, the function being linear in the four triangles into which the square is divided. g(x, y) is defined in an analogous manner.

Each of the functions f(x, y), g(x, y) is thus made to depend on three parameters. Thus for f we choose a point x_1, y_1 in the square and erect a perpendicular of height 0 < d_1 < 1 at this point; this defines a surface consisting of four intersecting planes. The transformation can then be given explicitly as follows:

where the regions I to IV are specified by the bounding lines:



then region I is bounded by L_1, L_2, and x = 0; region II is bounded by L_2, L_3, and y = 1; region III is bounded by L_3, L_4, and x = 1; region IV is bounded by L_4, L_1, and y = 0. Analogous equations hold for y' = g(x, y), with parameters x_2, y_2, d_2.
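The explicit formula for f, lost in the present copy, can be restored under one plausible reading of the description: f equals d_1 at (x_1, y_1), vanishes on the boundary of the unit square, and is linear on each of the four triangles cut off by the lines L_1 to L_4 joining (x_1, y_1) to the corners. The sketch below implements this assumed pyramid (the min() expression is equivalent to the four intersecting planes) together with the iteration; it should be read as an interpretation, not as the authors' exact definition.

```python
# One reading of the "pyramid" function described above: f equals d1 at the
# chosen point, vanishes on the boundary of the unit square, and is linear on
# the four triangles cut off by L1-L4.  This is an assumption standing in for
# the lost displayed formula.

def pyramid(x, y, xc, yc, d):
    return d * min(x/xc, (1 - x)/(1 - xc), y/yc, (1 - y)/(1 - yc))

def make_map(x1, y1, d1, x2, y2, d2):
    def T(x, y):
        return pyramid(x, y, x1, y1, d1), pyramid(x, y, x2, y2, d2)
    return T

def tail(T, x=0.123, y=0.456, transient=2000, keep=1000):
    for _ in range(transient):
        x, y = T(x, y)
    pts = []
    for _ in range(keep):
        x, y = T(x, y)
        pts.append((x, y))
    return pts

# A member of the one-parameter family T_z with z = 1/4 (cf. figure H-10).
Tz = make_map(0.25, 0.25, 1.0, 0.75, 0.75, 1.0)
print(tail(Tz)[:5])
```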

Of the several transformations of this type that we have studied numerically we mention only the following:

T_1 :   x_1 = y_1 = 1/3 , d_1 = 0.95 ; x_2 = 0.6 , y_2 = 0.5 , d_2 = ... ,   (5)
T_2 :   x_1 = 0.5 , y_1 = 0.9 , d_1 = ... ; x_2 = 0.3 , y_2 = 0.7 , d_2 = 0.8 ,

and the one-parameter family:

T_z :   x_1 = y_1 = z , d_1 = 1 ; x_2 = y_2 = 1 − z , d_2 = 1 .

The limit sets are shown in figures H-1, H-2, and H-3 through H-17 of appendix II.

L(T2) (figure H-2) is, in a sense, analogous to the class I limit sets we observed for some of our cubic transformations in three variables. In contrast, L(T1) (figure H-1) represents a new phenomenon-a connected "curve" (in this case, a collection of line segments) that does not close. More interesting, however, is the behavior of the limit sets L(Tz) as z varies from 0.49 down to 0.01.* Initially L(Tz) is an open cycle like

* The case z = 1/2 has not been studied. The reason for this is technical. Straightforward iteration will always produce sequences which degenerate to zero in a finite number of steps, owing to the fact that every iteration involves multiplication by 2. In a binary machine, this operation is a "shift" to the left. A sufficiently long chain of such left shifts will always result in zero. If one wants to study this case, one must replace multiplication by 2 by some arithmetically equivalent operation, e.g., multiplication by C followed by division by C/2, where C is not a power of 2.

For this case (z=1/2), the problem becomes, of course, one-dimensional. The iterates remain on the line x = y, and the limit set is identical with that of the transformation x' = g(x) introduced in subsection 1 above.



L(T_1). With decreasing z, the limit set becomes more complex, until it resembles a class IV limit set (e.g., z = 0.27). With further decrease of z, the limit set appears to contract; the tails get shorter, and the points cover some sub-region of the square more and more densely. One interesting question--unanswered at the time of this writing--is: does T_z become ergodic for some range of z values? In figure H-10 we see 1000 (consecutive) points belonging to L(T_z) for z = 1/4. Figure H-11 shows these same 1000 points together with the next 2000 points. It is evident that the region containing L(T_z) is filling in. In our opinion, this is a strong indication of ergodicity.

For lower values of z, the same ergodic behavior is observed, until, at z = 0.15, the limit set splits into four disconnected pieces (figure H-15). As z is further decreased, these four pieces shrink; by the time we reach z = 0.01, the limit is nearly a single point.

2. It is clear that one can devise broken-linear transformations that are dense in the unit square; one may take, for example, product transformations with independent coefficients and use the one-dimensional result for each factor. For transformations of the type considered in this section, however, it is not easy to determine a priori what the limit set will be. It should be emphasized that there is no hope of demonstrating that our polynomial transformations in three variables are exactly conjugate to some two-dimensional broken-linear transformations. Presumably there is no such conjugacy. Nevertheless, one might hope that a somewhat weaker notion of equivalence than that of strict conjugacy could be introduced.

One suggestion along these lines is the following: define two transformations T and S to be asymptotically similar if for almost every initial point p the limit set L_p(T) is topologically equivalent to L_p'(S) for some suitably chosen p', and vice versa. Thus, for example: if for any two transformations T, S, the sets of iterates T^(n)(p), S^(n)(p') are dense in the (common) domain of definition for almost every initial point, then T and S are asymptotically similar in the above sense. Another special case in which two transformations, T and S, are asymptotically similar is when each transformation possesses just one attractive fixed point, the region of attraction being, in both cases, the whole space.

As we remarked above, in the case of broken-linear transformations the asymptotic behavior of iterates depends only on the combinatorial structure of the subdivisions of the fundamental regions (triangles) under repeated folding. Just how complicated this can be is shown by the behavior of the limit sets L(Tz) belonging to the one-parameter family Tz discussed above. To date we have not managed to devise any good method for handling the Boolean algebra of these iterated



intersections.

Appendix I

In this appendix we collect some general remarks about the process of iterating transformations, particularly in one dimension. We also discuss, in some detail, a few special one-dimensional transformations which we have had occasion to study.

1. One of the first simple transformations whose iterative properties were established is the following:

x' = f(x) = 4x(1 − x) .   (1)

To obtain these properties we consider, instead of (1), the broken-linear transformation:*

x' = g(x) ,   g(x) = 2x for x ≤ 1/2 ,   g(x) = 2(1 − x) for x > 1/2 .   (2)

The study of the iterates of this transformation is equivalent to investigating the iterates of a function S(x) defined as follows: if

x = 0.a_1 a_2 a_3 ... a_n ... , where the a_i are either 0 or 1,   (3)

then

S(x) = 0.a_2 a_3 a_4 ... .   (4)

In other words, S(x) is merely a left shift of the binary word x by one place. The iterative properties of S(x) are in turn deducible from the law of large numbers in the case of Bernoulli. In effect S^(i)(x) falls into the first half of the interval if and only if a_i = 0. The ergodic average

(1/N) Σ_{i=1}^{N} F_I[S^(i)(x)]

is therefore the same as the fraction of zeros among the a_i for 1 ≤ i ≤ N. F_I is the characteristic function of the interval [0, 1/2].

The relation between (1) and (2) is that of conjugacy: there is a homeomorphism h(x) of the interval [0, 1] with itself such that

g(x) = h[f(h^{-1}(x))] .   (5)

* This transformation has already been mentioned in section VI.



Thus the study of the iterates of the quadratic transformation (1) reduces to the corresponding study for the broken-linear transformation (2). In this case, h(x) can be written down explicitly:15

h(x) = (2/π) sin^{-1}(√x) .   (6)
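The conjugacy (5)-(6) is easy to verify numerically: with h(x) = (2/π) sin^{-1}(√x), one should have g(h(x)) = h(f(x)) for all x in [0, 1]. The following spot check at randomly chosen points is of course no substitute for the identity itself.

```python
import math, random

# Spot check of the conjugacy (5)-(6): g(h(x)) = h(f(x)) with
# f(x) = 4x(1 - x), g the broken-linear map (2), h(x) = (2/pi) arcsin(sqrt(x)).

f = lambda x: 4*x*(1 - x)
g = lambda x: 2*x if x <= 0.5 else 2*(1 - x)
h = lambda x: (2/math.pi) * math.asin(math.sqrt(x))

worst = max(abs(g(h(x)) - h(f(x))) for x in (random.random() for _ in range(10000)))
print("largest discrepancy:", worst)     # of the order of rounding error
```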

2. The Set of Exceptional Points. In the case of the function f(x) = 4x(1 - x), it is true, then, for almost every* initial point, that the sequence of iterated images will be everywhere dense in the interval, and what is more, the ergodic limit can be explicitly computed; it is positive for every sub-interval.

There exist, however, initial points x such that the sequence x, f(x), f[f(x)], ... is not dense in the whole interval [0, 1]. Obviously, all periodic points, i.e., points such that, for some n, f^(n)(x) = x, are of this sort. It is interesting to notice, however, that there exist points x for which the sequence f^(i)(x), i = 1, 2, ..., is infinite without being dense; there are, in fact, non-countably many such points. To show this we consider the equivalent problem of exhibiting such points for the function S(x) introduced above. The construction then proceeds as follows. Consider a point x = 0.a_1 a_2 ... a_n ... We define a set Z consisting of all those x's which have a_n = a_{n+1} = a_{n+2} for all n of the form n = 3i + 1. In other words, the set Z consists of points which have every binary digit repeated three times, the sequence being otherwise arbitrary. Consider now the transformation x' = S(x), where S(x), as defined in (4) above, is a shift of x one index to the left. We now look at the sub-interval from 0.010 to 0.011. Starting with any point in Z, it is clear that no iterated image will fall in this sub-interval; no three successive binary digits of a point in Z are of the form (010). It is easy to see that Z contains non-countably many points; in particular, it contains non-periodic points, so that the set of images S^(i)(x), i = 1, 2, ..., is infinite, but not dense in [0, 1].

Presumably one can find points in Z for which the ergodic limit exists. The measure of the set Z is zero, but relative to Z, the set S of those points for which the sojourn time exists still form a majority. We may define majority either in the sense of Baire category or as follows. Take points na (mod 1), n = 1,2,..., where a is an irrational constant. Consider the set N1 of those n's for which na (mod 1) belongs to Z, and also the set N 2 of those n's for which na (mod 1) belongs to S. We then say that the points belonging to S form a majority of those points belonging to Z if the relative frequency of N2 in N1 is one.

* Almost every is to be understood in the sense of Lebesgue measure.



The behavior exhibited by the points belonging to Z is more general, in the sense of measure, for some other transformations of the interval [0,1]. It is possible to give examples of continuous functions such that, for almost every point, the iterated sequence will be nowhere dense in the interval, although the sequence does not converge to a fixed point.

3. A Remark on a Conjugacy Theorem. Let g(x) be the broken-linear function of equation (2) above, i.e.,

g(x) = 2x , 0 ≤ x ≤ 1/2 ;   g(x) = 2(1 − x) , 1/2 < x ≤ 1 .

Let t(x) be a convex function on [0, 1] which transforms the interval into itself, and such that t(0) = t(1) = 0. For some p in the interval, we must have t(p) = 1; by convexity, there is only one such point. Consider the lower tree* generated by the point 1. The necessary and sufficient condition that t(x) be conjugate to g(x) is that this tree be combinatorially the same as that generated by 1 under g(x), and that the closure of this set of points be the whole interval, i.e., that the tree be dense in [0, 1].

The condition is obviously necessary, since the point 1 generates a tree under g(x) which is simply the set of binary rational points. Under any homeomorphism h(x) which has to effect the conjugacy, the point 1/2 must go over into p, and our assertion follows.

To prove sufficiency, we construct h(x) in the following manner. We take h(1/2) = p by definition. We next choose h(1/4) to be the smaller of the two values of t^{-1}(p); h(3/4) is then by definition the larger of these two values. We then take h(3/8) to be the smaller of the two values of t^{-1}[h(3/4)], and so on. Proceeding in this fashion, we thus construct a function h(x) defined for all binary rationals. It remains to prove that we can define it for all x by passage to the limit. This, however, follows from the assumption that these points are dense in [0, 1] and that their order is preserved. The function h(x) will obviously be monotonic, and, being continuous, will possess an inverse h^{-1}(x). From our construction it then follows that h[g(x)] = t[h(x)].

* By the lower tree of p (under f(x)) we understand the smallest set of all points with the following properties: (a) The set contains the given point p. (b) If a point belongs to the set, then so do all its counter-images under f.



4. Broken-Linear Transformations. In one dimension these are functions f(x) that are continuous on [0, 1] and linear in subintervals of [0, 1]. We assume that the graph of the function has a finite number of vertices, i.e., that f(x) consists of a finite number of lines fitted together continuously. For these broken-linear transformations one certainly expects that the ergodic limit exists for almost every point. For example: if one considers the sequence of iterated images T^(n)(p), then, for almost every initial point p, the time of sojourn should exist for all sub-intervals, i.e.,

lim_{N→∞} (1/N) Σ_{i=1}^{N} f_R[T^(i)(p)]

should exist for almost all p and all measurable sets R; here fR is the characteristic function of the set R.* The value of this limit may indeed depend on the initial point p; it is likely, however, that all the points of the interval can be divided into a finite number of classes such that, within each class, the value of the limit is the same.**

There is another finiteness property that these transformations may possess. Given n, consider all broken-linear transformations which have at most n pieces (i.e., the space divides into n regions, in each of which the transformation is linear). Then it may be conjectured that there are only a finite number of different types of such transformations, where any two transformations of the same type are asymptotically similar (in the sense defined above, section VI). In other words, according to this conjecture, the type (or class) that a given transformation belongs to does not depend on the precise numerical values of

* We should perhaps mention here a more general conjecture. Suppose T is a polynomial transformation of the sort described by equations (10) to (13) of the Introduction. We then conjecture that the sequence of iterated images T^(n)(p) has the following property: Let C be any cone of directions in n-space, and let f_C(p) be the characteristic function of this cone, i.e., f_C(p) = 0 if p does not belong to C, f_C(p) = 1 otherwise. Then, for almost every p, the limit lim_{N→∞} (1/N) Σ_{n=1}^{N} f_C[T^(n)(p)] exists.16

** See pages 71, 72 of S. Ulam, A Collection of Mathematical Problems, New York, 1960. An analogous conjecture can be made concerning the actual limit sets L_p(T) for our cubics in three variables.



the coordinates of those points where the derivative is undefined (corner points), but is determined only by the combinatorial structure of the subdivision of space into linear domains. In one dimension this means that the type of a transformation is determined by the number and interrelation of the nodes in the graph of the function, and not by their precise location.

5. Numerical Accuracy. The machines we use to compute the iteration process work with a fixed number of significant digits;* in MANIAC II, for example, this number is eight. It is therefore clear that any direct, single-step iterative process carried out on this computer will exhibit a period in not more than 10^8 steps. Given an algorithm which is iterative and of first order [i.e., the nth step depends only on the (n − 1)th], the process will, with great probability, exhibit a period which is much shorter. Statistically, one can reason as follows. If we assume a random distribution of, say, the last four digits of all computed numbers, then the probability that the cycle will close long before the full theoretical run of 10^8 steps is extremely close to one. Indeed, after producing numbers A_1, A_2, ..., A_k, the chance that A_{k+1} will be equal to one of the preceding numbers is of the order of k/10^8. If we continue the calculations up to A_{2k}, the chance that at least two numbers in the chain will coincide is approximately

1 − (1 − k/10^8)^k . This is practically equal to 1 − 1/e if k ≈ 10^4.

Clearly, going to 3k, 4k, ..., the probability that the chain will be cyclic gets very close to one. The situation is quite different in an iterative process involving two or more variables, e.g., computing (A_{k+1}, B_{k+1}) from (A_k, B_k). On a probabilistic basis alone, one expects to encounter periods of length ~ 10^8 (the maximum possible being 10^16). In practice, this means that in two dimensions one does not expect to encounter accidental or false periodicity unless one generates very long sequences.
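The order of magnitude of this estimate can be reproduced with the standard birthday-problem approximation, which differs slightly in form from the expression used above but gives the same picture: with about 10^8 representable values, a chain of m iterates repeats with probability roughly 1 − exp(−m^2/(2 · 10^8)).

```python
import math

# Birthday-problem approximation for the probability of an accidental cycle in
# a one-variable iteration on an 8-digit machine (about 10^8 distinct values).

states = 10**8
for m in (10**3, 10**4, 3*10**4, 10**5):
    p = 1 - math.exp(-m*m / (2*states))
    print(f"chain of {m:>6} values: P(accidental cycle) ~ {p:.3f}")
```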

* For any particular machine one can increase the accuracy by resorting to so-called multiprecision arithmetic. This can generally be done only at the cost of considerable loss in speed.



As the argument above shows, fortuitous periodicity can be a very real danger in one-dimensional iterative calculations. Indeed, we came across a striking example in the course of studying the asymptotic properties of the transformation:

y' = sin πy , 0 ≤ y ≤ 1 .   (7)

This classical transformation is symmetric about y = 1/2, and maps the unit interval into itself. Furthermore, both fixed points--y = 0 and y = 0.7364845--are repellent. Now a simple argument shows that for such a function there cannot be any attractive periods of any finite order, that is, the derivative dT^(k)(y)/dy, evaluated at any periodic point T^(k)(y_p) = y_p, is always greater than one in absolute value.* Nevertheless, when we performed the iteration on MANIAC II, the limit set we observed was invariably one of two finite periods, the first of order 1578, the second of order 6168. These periods were exact to the last available binary digit! This spurious convergence is undoubtedly produced by the complicated interaction of several factors, e.g., the particular machine algorithms for multiplication and round-off, our choice of finite approximation to the function sin πy, and so forth. When we repeated the calculation on the STRETCH computer (which works with about 15 significant figures), there was no observable tendency toward convergence to such periods.

If one is interested in obtaining the asymptotic distribution of iterates under some transformation like (7), one must either provide for more significant figures or resort to ingenious devices. One such trick--which appears to work well and is relatively convenient--consists essentially in computing the inverse transformation. Since the functions

* It is essential for this argument that the transformation maps the whole interval onto itself. The number of distinct periods of order k can then be easily enumerated. The conclusion follows on noting that dT^(k)(y)/dy must be the same for all points y belonging to the same period. If, however, the maximum value of the function is less than one, T^(k)(y) can have relative minima above the y axis; then, indeed, there may exist attractive periods, i.e., the line y' = y may intersect the curve y' = T^(k)(y) sufficiently close to an extremum so that the kth order fixed point is attractive.



under consideration are two-valued, this involves introducing a random choice at each step. Specifically, taking some initial point p_0, we compute the sequence p_0, f^{-1}(p_0), f^{-2}(p_0), ... .

Here the symbol f^{-k}(p_0) implicitly contains the prescription that we choose one of the two values of the true inverse at random; thus the sequence

f^{-i}(p) = f^{-1}[f^{-(i-1)}(p)] , i = 1, 2, ...,

implies a sequence of random decisions as to which counter-image to choose at each step. If the calculation is carried out in this fashion, the chance of falling into an exact period is vanishingly small until we reach a chain length in the neighborhood of the theoretical maximum. Once having obtained our inverse sequence, we can conceptually invert it, that is pretend that we started with the last point and proceeded to po by direct iteration.*'**
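For the transformation (7) the two counter-images are y = (1/π) sin^{-1}(w) and y = 1 − (1/π) sin^{-1}(w), so the device described above can be sketched as follows; the run length, the starting value, and the use of Python's random generator (rather than the multiplicative congruential chain of the footnote below) are incidental choices.

```python
import math, random

# Inverse iteration with a random choice of counter-image for y' = sin(pi y),
# equation (7).  Read backwards, the sequence is a forward orbit that is very
# unlikely to close into a spurious machine period.

def backward_orbit(w0=0.4, n=100000, seed=1):
    rng = random.Random(seed)
    w, out = w0, []
    for _ in range(n):
        y = math.asin(w) / math.pi       # one counter-image ...
        if rng.random() < 0.5:
            y = 1.0 - y                  # ... or the other, chosen at random
        out.append(y)
        w = y
    return out

ys = backward_orbit()
print("mean of the orbit:", sum(ys) / len(ys))
```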

6. In this sub-section we present some results of a study of a particular one-dimensional transformation:***

T_α :   y' = W(3 − 3W + αW^2) , W = 3y(1 − y) .   (8)

This arises in a natural manner from a certain sub-class of our generalized cubic transformations in three variables:

x_i' = Σ_{j=1}^{10} d_ij M_j ,   (9)

' One cannot, of course, actually reverse the calculation and expect to reproduce the sequence. If one could, there would be no need for this stochastic device.

** This procedure presupposes a good method for generating random numbers. There are several of these which are well suited for use in automatic digital computers. One of the most common--and, in fact, the method used by us--is to generate the numbers by the chain: R_{n+1} = R_0 · R_n (mod 2^k), where k is the binary word length of the machine, and R_0 is some properly chosen constant. On MANIAC II, R_0 = 5^13. This chain closes, but its length is greater than 10^12. The lowest order binary digits are themselves not random, but this makes no difference in practice.

*** This transformation was introduced by way of illustration in section I.



Σ_{i=1}^{3} d_ij = 1 , all j , 0 ≤ d_ij ≤ 1 .   (10)

Namely, we choose certain of the d_ij as follows: d_13 = ... , d_17 = d_19 = d_34 = d_36 = 1 , d_31 = 0.

The rest are arbitrary, except that they must, of course, satisfy (10). If we restrict ourselves to the sub-class of initial points such that x_3 = 0 (i.e., the side S + a = 1 of our reference triangle), the second power of the transformation can be written in the form (8). T_α is symmetric about y = 1/2, but it does not map the whole interval [0, 1] into itself; its maximum value is

y_max = T_α(1/2) ,   (11)

so this is the right-hand boundary of the invariant sub-region. The left-hand boundary is then the image of y_max. In the range 1 ≥ α ≥ 0.9, the fixed point varies from y_0 = 0.8224922 at α = 1 down to y_0 = 0.8193719 at α = 0.90. Over this range, the derivative at y_0 is negative and greater than 1 in absolute value; thus the fixed point is repellent.

We have studied this transformation as a function of the two parameters--the initial point y_0 and the coefficient α--on the STRETCH computer. To study the asymptotic distribution, we divide the interval into 10000 equal parts and have the machine keep track of the number of points which fall into each sub-interval over the iteration history. Such a history was usually taken to be a sequence of length n × 10^5, with 4 ≤ n ≤ 9.* Should a period (of order k ≤ 3 × 10^5) be detected by the machine, the calculation is automatically terminated. If the order of the period is not too great (k ≤ 500), the values of the periodic points are printed out, and the value of

dT^(k)(y)/dy is calculated over the period; if |dT^(k)(y)/dy| < 1,

* As we have already remarked (section I), the calculation of 2 × 10^5 iterates of this transformation requires about 1 minute on STRETCH.



this fact is strong evidence that the observed period is actually a limit set.
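The procedure just described (histogram over 10000 sub-intervals, with termination on an exact repetition) can be sketched as follows. The initial point and run length are arbitrary here, and on a modern double-precision machine the exact periods reported in the text for 8-digit arithmetic need not reappear.

```python
# Iterating T_alpha of equation (8), tallying a 10000-bin histogram of the
# orbit, and stopping if an exact (machine-number) repetition occurs.

def T(y, alpha):
    W = 3.0 * y * (1.0 - y)
    return W * (3.0 - 3.0*W + alpha*W*W)

def histogram_and_period(alpha, y=0.13, n=200000, bins=10000):
    counts = [0] * bins
    seen = {}
    for i in range(n):
        y = T(y, alpha)
        counts[min(int(y * bins), bins - 1)] += 1
        if y in seen:                        # exact repetition => periodic orbit
            return counts, i - seen[y]
        seen[y] = i
    return counts, None

counts, period = histogram_and_period(0.990079)
print("period found:", period)
```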

Our numerical investigation of (8) has been mostly restricted to the range 0.98 ≤ α ≤ 1.* Even in this restricted parameter range, the observed asymptotic behavior is of bewildering complexity. For α = 1, the distribution of iterates in the interval is extremely nonuniform. There are large peaks at the end-points of the allowed sub-interval, with much complicated fine-structure in between, i.e., many relative maxima as well as sizeable intervals in which the distribution is locally uniform. This general behavior persists as α is decreased down to α = 0.9902. At α = 0.9901, however, a dramatic change takes place; most of the points concentrate in a few small sub-intervals. The limit set is, in fact, a pseudo-period.** At α = 0.99009, the pseudo-periodic behavior is still evident, but the occupied sub-intervals are larger. They contract, however, for α = 0.99008, and at α = 0.990079 a period of order k = 42 is found. Actually, for this particular value of the parameter α, there appear to be two possible periodic limit sets, with k = 42 and k = 84 respectively. The dependence on the initial point is, of course, quite complicated.*** For example, as y_0 is varied

over the values y_0 = 0.11, 0.12, 0.13, ..., 0.21, the period with k = 42 is found in all except three cases, namely y_0 = 0.11, 0.16, 0.18, which apparently lead to the periodic limit set with k = 84. Both periods are numerically well-attested; they are exact to the last binary digit and have |dT^(k)(y)/dy| < 1.†

* Below α = 0.98, various periodic limit sets of order k ≤ 24 were found on MANIAC II. We have not studied this parameter range on STRETCH. ** It may actually be a period with order k = 63049. This is the result indicated by the machine. At present we have no way of verifying this.

*** Subsequent numerical work has shown that this y_0 dependence is spurious. The calculations were performed with multi-precision arithmetic; in some cases as many as ... decimal places were retained!

† It is difficult to decide whether this behavior is real or not. Some supporting evidence for its reality is the observed behavior for values of α very close to this critical value α = 0.990079. We find that, with α = 0.9900789, only a period with k = 42 is obtained, while on the other side, α = 0.9900791, the limit set is always a period with k = 84.



As we continue down in α, the limit sets are pseudo-periodic* until we reach α = 0.99007, for which a period of order k = 112 is found. This appears to be the approximate right-hand boundary of a periodic belt; that is, in the range 0.98990 ≤ α ≤ 0.99007 there are only periodic limit sets for this transformation. All these have orders which are multiples of 14 (with k ≤ 112), except for the (approximate) left-hand boundary of the α-range (α = 0.9899) where a period of order 7 exists. The dependence of these limit sets on the initial point is complicated, and will not be reproduced here.

At α = 0.98988 there is another large discontinuity in behavior; we again find an asymptotic distribution which covers the whole allowable sub-interval in non-uniform fashion. No more periodic limit sets are found as α is decreased to α = 0.9800. The only phenomenon of note is the splitting of the limit set into two parts; this occurs somewhere between α = 0.986 and α = 0.985. The resulting gap--which contains the fixed point--continues to widen as α → 0.980.**

The results of this investigation-which is rather incidental and subordinate to the larger study reported in the present paper-clearly show that there is a great deal to be learned about the asymptotics of iterative processes, even in one dimension. It seems that "pathological" behavior is not a property of higher-dimensional systems alone. With regard to our study of the periodic limit sets, it may be argued that what we have really done is to investigate, in a rather indirect manner, the behavior of the roots of high-order algebraic equations as a function

* In this range, the machine detected some exact periods of huge order--e.g., k = 295148 (α = 0.990079)! There seems to be no compelling reason to take this at face value. ** Such gaps have been observed in other cases. One interesting example is the transformation:

y' = W^2 (3 − 2W) , W = 3y(1 − y) ,

which is also a special case of one of our cubics in three variables. For this transformation, the limit set consists of four separate pieces, as follows: (I) 0.3455435 < y < 0.4086018, (II) 0.4296986 < y < 0.5791385, (III) 0.7562830 < y < 0.8146459, (IV) 0.8220964 < y < 0.84375.

In the present case, i.e., that of T_a, the existence of the gap can easily be predicted. Let y0(a) be the fixed point. There will clearly be a gap provided T_a²(y_max) > y0(a), since in this case only one of the two inverses of y0 lies in the allowed interval. Of course, the determination of the critical value a_c of a from the equation T_a²(y_max) = y0(a_c) would in any event have to be carried out numerically.


359

of their coefficients. This is certainly true, at least in part. It is therefore of interest to observe that the iterative method seems at present to be the only effective tool for treating this purely algebraic problem.
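As an illustration of the kind of one-dimensional experiment described above, the following sketch (not part of the original report; plain double-precision Python rather than the multi-precision arithmetic used on STRETCH) iterates the cubic map quoted in the footnote, y' = W²(3 − 2W) with W = 3y(1 − y), discards a transient, and reports the extreme points of the limit set together with its largest internal gaps. The starting point y0 = 0.3 and the cut-offs are illustrative choices.

```python
# A sketch, assuming the cubic map from the footnote and double precision arithmetic.
def T(y):
    w = 3.0 * y * (1.0 - y)
    return w * w * (3.0 - 2.0 * w)

def limit_points(y0, transient=10000, keep=20000):
    y = y0
    for _ in range(transient):          # discard the transient
        y = T(y)
    pts = []
    for _ in range(keep):               # record points of the (approximate) limit set
        y = T(y)
        pts.append(y)
    return sorted(pts)

pts = limit_points(0.3)
largest_gaps = sorted(((b - a, a, b) for a, b in zip(pts, pts[1:])), reverse=True)[:3]
print(round(pts[0], 7), round(pts[-1], 7))
print([(round(a, 7), round(b, 7)) for _, a, b in largest_gaps])
```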

Appendix II

1. In this appendix we collect the photographs and tables illustrating the phenomena discussed in sections IV and VI. The notation used has been described in sections II, III, IV, and VI. For convenience, some of the transformations are written out explicitly.

2. Modifications of the Transformation T_A. In shorthand notation, this transformation is C1 = {3, 5, 7, 9, 10}, C2 = {1, 2, 8}. (1)

In the S, a coordinates, this reads: 3S2 3 2 +S' = F(S, a) S3-6S2a-3Sa2 + 4a3+ 3Sa- +1, 2 2 a' = G(S, a) -S 3 + 3Sa2 + 2a3+S2-3Sa +a2 . (2) 2 2

The (repellent) fixed point has coordinates: So = 0.5885696, ao = 0.1388662.

The generalized transformations based on T_A may be written: S' = (1 − Δt)S + Δt{F(S, a) + ε f_rs(S, a)}, a' = (1 − Δt)a + Δt{G(S, a) + ε g_rs(S, a)};

the original transformation is recovered by setting Δt = 1, ε = 0.


360

TABLE A. Key to figures A-1 through A-6 (columns: figure number, Δt, ε, r, s, comments). The comments identify Class I and Class IV limit sets, periods and pseudo-periods, and scaled plots of parts of other figures together with their scaling factors and (S, a) windows; the final entry shows successive iterates of T^(p) from an initial point with S0 = 0.7034477.

3. Modifications of T_B. In shorthand notation, this transformation is C1 = {2, 4, 6, 7, ...}, C2 = {5, 8, ...}. In the S, a coordinates it takes the form S' = F(S, a) = ..., a' = G(S, a) = ... .


361

The coordinates of the fixed point are So = 0.6887703, ao = 0.1592083.

The only associated T_ε^(r,s) discussed is the case (r, s) = (1, 0); for this case, the generalized transformation, written in the form of equation (4), has f_rs = f_10 = (S − a), g_rs = g_10 = 0. (8)

In table B below, all scaled plots show the region 0.30 < S < 0.65, 0.20 < a < 0.36, i.e., the upper left-hand piece of the complete limit set (shown in figure B-1 for the unmodified transformation). The scale factor is ~2.9.

TABLE B. Key to the B-series figures (columns: figure number, Δt, ε, r, s, comments). The comments distinguish Class IV limit sets (some shown as separate pieces), periods with the number of points in each piece, scaled plots of parts of other figures, cross-comparisons between B-series figures, and one figure showing a "transition" from a period as Δt is varied.



4. Modifications of T_D and T_E. These transformations are given by the schemes: T_D: C1 = {2, 7, 8, 9, 10}, C2 = {4, 5, 6}; T_E: C1 = {2, 5, 7, 8, 9}, C2 = {4, 6, 10}. (10)

In the S, a coordinates, these read explicitly: S'= S3 + S2a+ Sa2+ a3- 6Sa-6a2 + S + a T2 2 2 2 2 2 3 9 2 2 9 32 2 3 a= S3+ S2a- -Sa2 - a-3 + 3a2 + S-a, (11)

with fixed point: S0 = 0.6525211, a0 = 0.3056821; (12) and, for T_E, S' = 9S²a − a³ − 3S² − 12Sa + 3a² + 3S + 3a, a' = −3S²a + 3a³ + 6Sa − 6a², (13)

with fixed point: S0 = 0.6444612, a0 = 0.3219578. (14)


369

TABLE D. Key to the D-series figures (columns: figure number, Δt, ε, r, s, comments); among them, one figure shows the limit set for a modified Δt superimposed on L(T_D), and several entries are cross-comparisons between D-series figures.

TABLE E. Key to the E-series figures (same columns); one figure shows convergence to a periodic limit set from an initial point close to the fixed point, and several entries are cross-comparisons with figure E-5.



5. Broken Linear Transformations. These are described in section VI. Figures H-1 and H-2 show, respectively, 1000 points in the limit sets of T1 and T2. The latter are specified as follows: T1: x1 = 1, y1 = 1, d1 = 0.95, x2 = 0.6, y2 = 0.5, d2 = 0.95, (15)

with fixed point: x0 = 1/2, y0 = 2/3; (16) and T2: x1 = 0.5, y1 = 0.9, d1 = 1, x2 = 0.3, y2 = 0.7, d2 = 0.8, (17) with fixed point: x0 = 80/143, y0 = 72/143. (18)

The remaining figures, H-3 through H-17, show the limit sets belonging to the one-parameter family T_z: x1 = y1 = z, d1 = 1, x2 = y2 = 1 − z, d2 = 1. (19)



The identification is given by the following table (all figures except H-11 show 1000 points):

TABLE H. Key to figures H-1 through H-17, giving for each the value of z and a comment. The comments note, among other things, the remarks in section VI concerning z = 0.5, cross-comparisons between H-series figures, two limit sets resembling the "class IV" type, one figure showing the points of another H-figure together with their next consecutive iterates, and the value of z below which the fixed point becomes attractive.

References

1. J. Schreier und S. Ulam, Eine Bemerkung über die Gruppe der topologischen Abbildungen der Kreislinie auf sich selbst, Studia Math. 5 (1935).

2. Menzel, Stein and Ulam, Quadratic Transformations, Part I, Los Alamos Report LA-2305, May, 1959 (Available from the Office of Technical Service, U.S. Dept. of Commerce, Washington, D.C.)

3. See for example, S. M. Ulam, On some new possibilities ... computing machines, I.B.M. Research Report 68, 1957.

4. See, e.g., Proceedings of Symposia on Applied Mathematics, Vol. X, American Mathematical Society (1958), or the article by Marshall Hall, Jr. in Surveys of Applied Mathematics, Vol. IV, 1958.


377

5. In the absence of a comprehensive reference, we refer the reader to the recent issues of Mathematics of Computation (1960-1962). See, e.g., Vol. 16, No. 80, October 1962, especially the article by D. H. Lehmer, et al.

6. See the article by E. T. Parker in Proceedings of Symposia in Pure Mathematics, Vol. VI, American Mathematical Society (1962). Parker's original construction supplied the final step in the disproof of a famous conjecture by Euler.

7. See H. Gelernter et al., Proceedings of the Western Joint Computer Conference, San Francisco, May 1960, pp. 143-149.

8. M. B. Wells, Proceedings of the IFIP Congress (1962).

9. C. Shannon, Philosophical Magazine 41, March, 1950. See also Kister, Stein, Ulam, Walden and Wells, Journal of the Association for Computing Machinery, Vol. 4, Number 2, April 1957.

10. A. Ostrowski, Solution of Equations and Systems of Equations, Chapter 18, (New York 1960).

11. A general reference is N. Minorsky, Non-linear Oscillations, Princeton 1962. More detail on theoretical points can be found in 12.

12. S. Lefschetz, Differential Equations: Geometric Theory, New York 1957.

13. Nemitsky and Stepanov, Qualitative Theory of Differential Equations, Princeton 1960.

14. See reference in footnote 11.

15. This result was first published by S. Ulam and J. von Neumann, American Mathematical Society Bulletin, Vol. 53 (1947), p. 1120, Abstract 403. See also the work of O. Rechard, Duke Mathematical Journal, Vol. 23 (1956), pp. 477-488.

16. Cf. the article by S. Ulam in Modern Mathematics for the Engineer, second series, edited by E. F. Beckenbach, New York 1961, p.280 .


379

12—
On Recursively Defined Geometrical Objects and Patterns of Growth:
With R. G. Schrandt (LA-3762, August 15, 1967)

This report formed a foundation of what is now a large ongoing effort, involving many authors, concerning the behavior and growth of cellular automata. See, for instance, the proceedings of a March 1983 Los Alamos National Laboratory conference on cellular automata and reports written by Wolfram and others. This report was reprinted in Essays on Cellular Automata, edited by A. Burks, published by the University of Illinois Press in 1970. (Author's note). *

Abstract

Illustrations are given of computer-generated patterns exhibited by figures "growing" according to certain recursive rules. Examples of growing patterns in two- and three-dimensional space are given. Patterns are discussed in an infinite strip of a given width where periodic growth is observed. When modification of the rules of growth allows portions of the pattern to die out, configurations split into separate connected pieces, exhibiting the phenomena of both motion and some self-reproduction. A simple conflict rule together

* Also in Science, Computers, and People, by S. Ulam, Birkhauser, 1986. (Eds.)


380

with this modification allows a game of survival between two systems growing in a finite portion of the plane. The examples show both the complexity and richness of forms obtained from starting with a simple geometrical element and the application of a simple recursive rule.

In this report we discuss briefly some empirical results obtained by experiments on computing machines. We continue the work described in a paper, "On Some Mathematical Problems Connected With Patterns of Growth."1

Rules of Growth

Growth is in the plane subdivided into regular squares. The starting configuration may be any arbitrary set of (closed) squares. The growth proceeds by generations in discrete intervals of time. Only the squares of the last generation are "alive" and able to give rise to new squares. Given the nth generation, we define the (n + 1)th as follows: A square of the next generation is formed if

a) it is contiguous to one and only one square of the current generation, and

b) it touches no other previously occupied square except if the square should be its "grandparent." In addition:

c) of this set of prospective squares of the (n + 1)th generation satisfying the previous condition, we eliminate all squares that would touch each other. However, squares that have the same parent are allowed to touch.

In three dimensions the growth rules are the same. One merely replaces the squares by cubes and observes all three provisions.
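The rules above are simple enough to put on a computer directly. The following sketch (not from the original report) is one reading of conditions (a)-(c) on the square lattice; where the text leaves room for interpretation we assume that "contiguous" means sharing an edge, that "touches" includes corner contact, and that the "grandparent" of a prospective square is the parent of the live square it grows from.

```python
# A minimal sketch of the recursive growth rule, under the stated assumptions.
EDGE = [(1, 0), (-1, 0), (0, 1), (0, -1)]
TOUCH = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def grow(occupied, alive, parent):
    """One generation: returns (new_occupied, new_alive, new_parent)."""
    # condition (a): contiguous to exactly one live square
    candidates = {}
    for cell in alive:
        for dx, dy in EDGE:
            c = (cell[0] + dx, cell[1] + dy)
            if c not in occupied:
                candidates.setdefault(c, []).append(cell)
    candidates = {c: ps[0] for c, ps in candidates.items() if len(ps) == 1}

    # condition (b): touch no previously occupied square except parent/grandparent
    def ok(c, par):
        grandparent = parent.get(par)
        return all((c[0] + dx, c[1] + dy) not in occupied
                   or (c[0] + dx, c[1] + dy) in (par, grandparent)
                   for dx, dy in TOUCH)
    candidates = {c: p for c, p in candidates.items() if ok(c, p)}

    # condition (c): drop mutually touching candidates unless they share a parent
    keep = set(candidates)
    for c in candidates:
        for dx, dy in TOUCH:
            d = (c[0] + dx, c[1] + dy)
            if d in candidates and candidates[d] != candidates[c]:
                keep.discard(c)
                keep.discard(d)
    new_parent = {c: candidates[c] for c in keep}
    return occupied | keep, keep, new_parent

# start from a single square at the origin and run a few generations
occupied, alive, parent = {(0, 0)}, {(0, 0)}, {}
for _ in range(10):
    occupied, alive, parent = grow(occupied, alive, parent)
print(len(occupied), "squares after 10 generations")
```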

We show an example of such a pattern growing on the infinite plane and then discuss patterns of growth in an infinite strip of a given width where a periodic growth is observed. We discuss, beyond the work mentioned in Ref.1, the behavior of figures growing according to our rules, with a new proviso: every element of the figure which is older than a specified number of generations, say two or three, "dies," i.e., is erased. This makes the figure move in the plane. We show some cases of such motion, with occasional splitting of the figures into separate connected pieces. In some cases these figures are similar to


381

the original ones, thus exhibiting phenomena of both motion and of self-reproduction. As another amusement we tried on the computers the following game: starting, still in the plane with two separate initial elements, we let each grow according to our rule (including erasure or death of the "old" pieces); then when the two patterns approach each other we still apply the rule of a further growth of each figure with the proviso that the would-be grown new pieces are not put in if they should try to occupy the same square. This gives rise to a game for survival or a "fight" between two such systems-in some cases both figures die out.

Finally, we give an example of a similar process of growth in threedimensional space (subdivided into regular cubes) with our rules for recursive addition of new elements.

Two-Dimensional Patterns

We present as examples of the planar type of pattern Figs. 1a and 1b. Our start was a single square. The patterns are plotted merely in one quadrant of the plane and show the result of 100 and 120 generations of growth, respectively. Fig. 1b shows the pattern on a large square with 100 units on a side; the portion of growth that extends beyond 100 units horizontally or vertically is not plotted. The figures are symmetric about the diagonal of the square, and the density of the occupied squares is about 0.44. There is no apparent periodicity in portions of this pattern. As shown in Ref. 1, the "stems" grow indefinitely on the sides of the quadrant, and the side branches split off from the stem. It is not known whether some of these side branches will grow to infinite length or whether they will all in turn be choked off by other side branches growing from the stem at later times.

In Fig. 2 we show a pattern grown from an initial configuration consisting of three noncontiguous squares at the vertices of an approximately equilateral triangle. One will note that the patterns in the subquadrants are both identical to those of Figs. 1a and 1b. The borders or strips between the subquadrants are due to interference between patterns generated by the individual starting squares. One of the strips reduces to a stem, since two of the starting squares generate patterns symmetric with respect to a 45° line through the center of the triangle.

By restricting in advance the growth of a pattern to an infinite strip of finite width in the plane, one observes a periodic growth. The


382

proof that in a finite strip the pattern must be periodic is easily obtained. On inspection of our growth procedure one observes that the last generation is confined to a part of the strip which extends through its width and an equal distance in length back of the most forward square. There is only a finite number of possible patterns in such a square. Therefore, a configuration must repeat itself and from then on the whole process starts again. Figure 3 shows different patterns generated in strips of widths from 8 to 15 through 100 generations of growth. In each case the start is a single square in the upper left-hand corner of the strip. Table I gives the observed lengths of the periods for strips of widths 1 to 17. There seems to be no simple relation between the width of the strip and the length of the period.

Rules for Termination or "Death" in the Pattern

We have experimented with a rule for erasing, i.e., elimination of a part of the pattern after it is a fixed number of generations old. For this we have adopted a simpler rule of growth of the pattern by assuming only condition (a). Each square that is a certain fixed number k of generations old is erased or "dies" and becomes unoccupied. Later on, the pattern may grow back into these unoccupied positions. We took for k either the values 1, 2, or 3. For example, given a pattern, we grow the squares of the (n + 1)th generation from those of the nth, and then erase those of the (n-1)th. Under this rule the pattern will move and it may split up into disconnected pieces, as shown in Fig. 4a. It turns out that certain parts of the pattern replicate themselves in shape, and these repeat as subpatterns. One such subpattern consists of a straight strip of squares with two additional squares on each end. We call this rather frequent replication subpattern a "dog bone." (See Fig. 4b.)

Another construction concerns the behavior of such patterns in a finite portion of the plane. We have adopted a large square as the space for growth. Its boundary acted as an absorber so that each square which would possibly grow from a square on the boundary was not considered. This was studied under the simplified rule of growth mentioned above. Starting with an initial configuration, say a single small square, the pattern will grow and either eventually "die," or else will become periodic in time and continue indefinitely. In most cases the pattern eventually disappears or dies. This is because the death rule eliminates the old squares, and the simple conflict rule together with the boundary condition prevent any new squares from forming. For these problems we kept only the current generation, so k = 1. By


383

this we mean that given a configuration, we produce the next one and then immediately erase the starting one. We have run a number of cases on a computer to ascertain either the period and its length, or the number of generations before the pattern terminates. This we have done in various sizes of the large square in which the game takes place. A sampling was obtained for sizes of the large square for 2x2 up to 8x8.

As an example, consider the square of size 6x6. There are, of course, 236 possible initial configurations. Out of these we have chosen 132 such configurations at random, assuming that each of the 36 squares has 1/2 chance of being occupied initially. Each of these different starting configurations grew until it became periodic or died out. Let s be the number of states in each sequence and t the length of a period. The values of s ranged from 11 to 109 with an average of 33. The values of t were 1, 4, 6, 8, 12, and 24. Here t = 1 means the pattern died out. In our sample, 87 of the 132 cases had t = 1. The longest sequence has s = 109, with t = 24. In another experiment we tried 15 random starting configurations chosen in an 8x8 large square. The values of s ranged from 49 to 397, with t values of 1, 8, 12, and 16. Ten of our 15 experiments had t = 1.

We can formulate condition (a) of the rules of growth in another way if we keep only one generation before death. In this case the status of any square in the (n + 1)th generation is determined only by the state of its four neighboring squares in the nth generation. Let us assign a 1 to an occupied square and a 0 to an empty one. We use the two operators (·) and (+), with the (+) modulo 2, that is, 1 + 1 = 0.

If a_n, b_n, c_n, d_n are the four neighbors of a square x_n and all four symbols have values 1 or 0, that is, they represent the states of the squares in the nth generation, then the state of the square x in the next generation is simply x_{n+1} = c̄_n d̄_n (a_n + b_n) + ā_n b̄_n (c_n + d_n), where the bars above the symbols represent the complement (also modulo 2).

If the whole region in which the game is played is bounded, say, again by a large square, we will assume that the values on the boundary are always 0. The state of the configuration at time (n + 1) is then obtainable by a fixed transformation from the state at time (n).
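A compact way to reproduce experiments of the kind described in the preceding paragraphs is to code this transformation directly. The sketch below (not part of the report) uses the exactly-one-occupied-neighbor rule with zeros outside the N x N square, grows random starting configurations, and records one reasonable reading of s (the number of states seen before the sequence first repeats) and t (the length of the period it enters; the all-empty state gives t = 1).

```python
import random

def step(state, N):
    """Next generation: a cell is occupied iff exactly one of its four neighbors is."""
    nxt = set()
    for x in range(N):
        for y in range(N):
            k = sum((x + dx, y + dy) in state
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
            if k == 1:
                nxt.add((x, y))
    return nxt

def run(state, N):
    seen = {}
    while frozenset(state) not in seen:
        seen[frozenset(state)] = len(seen)
        state = step(state, N)
    s = len(seen)                      # number of states before the first repetition
    t = s - seen[frozenset(state)]     # length of the period that was entered
    return s, t

random.seed(1)
N = 6
for trial in range(5):
    start = {(x, y) for x in range(N) for y in range(N) if random.random() < 0.5}
    print(run(start, N))
```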


384

One of the interesting properties to determine is the existence of states which are self-replicating, that is, they reproduce themselves immediately. These are the fixed points of the transformation defined above. It is easily verified that there are none such (except those identically 0, which means the pattern dies out) for squares of size 2x2 and 3x3. There exists just one such state for the 4x4 square. This is given by

For the 5x5 square there are these two:

There are none for the 6x6 case. Here is an example of one for the 17x17 case. Let A be the second of the two 5x5 matrices. Let Nc and Nr be 5x1 and 1x5 matrices, respectively, with zero elements. Then the matrix

A  Nc  A  Nc  A
Nr O   Nr O   Nr
A  Nc  A  Nc  A
Nr O   Nr O   Nr
A  Nc  A  Nc  A

is self-replicating.
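Self-replicating states can also be sought by brute force for small squares, since the 2^(n²) possible configurations can simply be enumerated. The sketch below (an illustration, not the report's program) assumes the same exactly-one-neighbor rule with zero boundary values and counts the nonzero fixed points for n = 2, 3, 4.

```python
from itertools import product

def step(state, n):
    """One generation of the exactly-one-of-four-neighbors rule on an n x n square."""
    return frozenset((x, y) for x in range(n) for y in range(n)
                     if sum((x + dx, y + dy) in state
                            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))) == 1)

def self_replicating(n):
    cells = [(x, y) for x in range(n) for y in range(n)]
    found = []
    for bits in product((0, 1), repeat=n * n):
        state = frozenset(c for c, b in zip(cells, bits) if b)
        if state and step(state, n) == state:
            found.append(state)
    return found

for n in (2, 3, 4):      # 5x5 is already 2^25 states; brute force becomes slow
    print(n, len(self_replicating(n)))
```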

Contests or Fights between Two Configurations

We may start, on a large finite square, with two different initial configurations each labeled, say, by a different color, so as to distinguish one set from the other. We let each grow according to condition a) of the rules of growth, plus the death rule. Now condition a) states that a square of the next generation is not formed if it is contiguous to two or more squares of the current generation. Two such squares of the current generation may be members of the same configuration or else one from each of the two different configurations. So the growth of these patterns is subject to restrictions for elements of the new generation within themselves separately, and when they are almost in


385

contact, with the two taken together. One or both of these systems may then go to zero or one may survive, for some time or indefinitely.

Figures 5a, 5b, 5c, 5d illustrate one case of such a fight between two starting patterns in a 23x23 square. They show the situation at generations 16, 25, 32, and 33, respectively. We kept two generations before erasure for both patterns. We assumed as initial conditions for pattern A one square in the extreme lower left-hand corner and for pattern B one square placed one unit of distance off from the upper right-hand corner. After 33 generations (Fig. 5d) pattern B won, at which time the nth generation squares of pattern A were completely erased. (The (n − 1)th generation squares of A will disappear the next generation.)

In another game we have started with two single squares in the same relative positions from the corners. For pattern A we kept one generation before erasure, and for pattern B two generations. In this case A won in 112 generations on the 23x23 board. There is no figure for this contest.

Three-Dimensional Model

We again used all three conditions of the rules of growth in forming a three-dimensional pattern. Figures 6a and 6b show two views of a model of such a pattern. Two-dimensional plots of the pattern were obtained from the computer after 30 generations of growth. The model was constructed from these plots and then photographed. The starting configuration was the single cube on the extreme left of Fig. 6a. This model represents the part grown in one octant of the space. In each octant there is a further threefold symmetry along the coordinate axes, of which we took the part x > y, x > z. There still remains a plane of symmetry at 45° to the x axis. The dark cubes represent the 30th generation elements.

Our examples show both the complexity and the richness of forms obtained from starting with a simple geometrical element and the application of a simple recursive rule. The amount of "information" contained in these objects is therefore quite small, despite their apparent complexity and unpredictability.

If one wanted to define a process of growth which is continuous rather than by discrete steps, the formulation would have to involve functional equations concerning partial derivatives.


386

It appears to us that the geometry of objects defined by recursions and iterative procedures deserves a general study; they produce a variety of sets different from those defined by explicit algebraic or analytical expressions or by the usual differential equations.

TABLE I. Width of strip and observed period of the pattern, for strips of widths 1 to 17.

Acknowledgments

The three-dimensional model was constructed by Barbara C. Powell and photographed by W. H. Regan.

Reference

1. S. Ulam in Proceedings of Symposia in Applied Mathematics, XIV, American Mathematical Society 1964, pp. 215-224; see also J. C. Holladay and S. Ulam, Notices of the American Mathematical Society 7 (1960), p. 234; and R. G. Schrandt and S. Ulam, Notices of the American Mathematical Society 7 (1960), p. 642.


387

Fig. 1a. Growth from a single starting square after 100 generations.


388

Fig. 1b. Same as Fig. 1a but after 120 generations.


389

Fig. 2. Growth from three noncontiguous starting squares.


390

Fig. 3. Patterns generated in an infinite strip of widths 8 to 15, after 100 generations.


391

Fig. 4a. Growth from a single starting square with death rule, keeping two generations. The nth generation squares are cross hatched, the (n − 1)th are blank. The integers at the top are the number of squares in the nth and (n − 1)th generation, and the generation number.


392

Fig.4b. Same as Fig. 4a but after 45 generations.


393

Fig. 5a. Fight between two different patterns after 16 generations, keeping two generations before erasure. There are 26 (n − 1)th generation squares of the lower pattern, and 4 nth generation squares. The upper pattern has 32 (n − 1)th generation squares, and 12 nth generation squares.


394

Fig. 5b. Same as Fig. 5a but after 25 generations.


395

Fig.5c. Same as Fig. 5a but after 32 generations.


396

Fig.5d. Same as Fig. 5a but after 33 generations. Lower pattern has been eliminated.


397

Fig. 6. Model of three-dimensional pattern after 30 generations of growth. The starting configuration is the single cube on the extreme right.


399

13—
Computer Studies of Some History-Dependent Random Processes:
With W. A. Beyer and R. G. Schrandt (LA-4246, October 28,1969)

This report is about a novel method for the study of several non-independent probability schemata with rather curious results on patterns of growth, iteration processes, and dependent random walks. (Author's note).

Abstract

Various history-dependent random processes are investigated by computer and a few cases are investigated theoretically. These processes include historydependent random walks, a combination of a birth process and a self-avoiding random walk, historydependent randomly-generated increasing integer sequences, and randomly-generated integer sequences which might have prime-like densities. A possible random ergodic theorem for history-dependent processes is discussed.

I—
Introduction

In this paper we consider some examples of random processes in which the probabilities of the outcome of the nth step depend upon the entire past history of the process. This, of course, means that they


400

are non-Markovian processes. In contrast to Markovian processes, very little is known theoretically about these history-dependent processes. They are much more difficult to analyze. However, the real world abounds with examples of the latter.

Most of the examples are results of computer studies. In some instances, theoretical results are known and are reported here.

The computations were performed on a CDC-6600 machine. The random-number generator used had the form x_i = 5^15 · x_{i−1} (mod 2^35), with x_0 = 5^15. The scales on the figures are linear.
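For reference, the quoted generator is trivial to reproduce; the short sketch below is a direct transcription of the formula, with the values scaled to [0, 1) for convenience (that scaling is our choice, not stated in the report).

```python
# Multiplicative congruential generator: x_i = 5^15 * x_{i-1} mod 2^35, x_0 = 5^15.
def lcg(n, mult=5**15, mod=2**35, seed=5**15):
    x, out = seed, []
    for _ in range(n):
        x = (mult * x) % mod
        out.append(x / mod)        # scale to [0, 1)
    return out

print(lcg(3))
```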

II—
Random Walk Examples

A—
Self-Avoiding Random Walk

A process which cannot, in any reasonable way, be made into a Markov process is a self-avoiding walk, i.e., a walk on a lattice starting at some fixed point that is not permitted to visit any point more than once. A survey of this topic will be published by Domb.1 Another version of a self-avoiding walk is a scattering process in which scattering is not permitted at a point that previously had been a scattering center. The physical idea is that the particle, which had been the scattering center, is moved by the scattering process.

B—
History-Dependent Walk On the Line (Pólya)

An old example of a history-dependent process is the Pólya urn scheme described by Feller.2 This has been used as a model of phenomena, such as contagious diseases, where an occurrence of a disease increases the probability of further occurrences.

As a special case of the Pólya scheme on two symbols, consider the following example. Let a_0 = 0, a_1 = 1, and a_{n+1} = ā_i (i = 0, 1, ..., n), where ā_i means that one of the a_i (i = 0, 1, ..., n) is selected at random with uniform probability. Let Y_n = (1/n) Σ_{i=0}^{n−1} a_i.

Then Prob[Y_n = k/n] = 1/(n − 1) for n > 1 and k = 1, 2, ..., n − 1. As discussed by Feller in Ref. 3, p. 237, Y = lim_{n→∞} Y_n exists with probability 1, since {Y_n} is a martingale, and the distribution of Y is uniform on [0, 1]; i.e., Prob[0 ≤ a < Y < b ≤ 1] = b − a.


401

The process Yn can be interpreted as a random walk on the horizontal line of integers with 0 interpreted as a step to the left and 1 as a step to the right.
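A few lines of code suffice to see this behavior empirically. The sketch below (an illustration, not the report's program) builds sequences by the rule just described and looks at the empirical distribution of the limiting fraction of 1's, which should be roughly uniform on [0, 1].

```python
import random

# Pólya two-symbol scheme: a_0 = 0, a_1 = 1, and each new term is a uniformly
# random copy of one of the earlier terms.
def polya_sequence(n, rng):
    a = [0, 1]
    while len(a) < n:
        a.append(rng.choice(a))
    return a

rng = random.Random(0)
limits = []
for _ in range(2000):
    a = polya_sequence(500, rng)
    limits.append(sum(a) / len(a))      # proxy for the limiting fraction Y
limits.sort()
# empirical quartiles should be roughly 0.25, 0.5, 0.75 if the limit is uniform
print(limits[500], limits[1000], limits[1500])
```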

C—
History-Dependent Walk On a Plane Lattice

A two-dimensional version of the Pólya scheme on two symbols is a walk on the lattice of plane points with integer coordinates which starts at the origin and is executed according to the following rule. The decision to execute either a horizontal step or a vertical step is made independently each time with Prob[horizontal step (H)] = Prob[vertical step (V)] = 1/2. After the decision H or V is made, a second decision is made. In the case of H, a decision is made to take the step right or left in accordance with the Pólya two-symbol game discussed previously. In a similar way, a decision is made to take a step up or down in the case of V. Figure 1 shows the terminal points of 10,000 walks of 64 steps each for walks made by these rules.

Fig. 1. End points of 10,000 random walks of 64 steps on the plane quadratic lattice starting at the origin in which the decision to execute a horizontal or vertical step is made independently with equal probability. The steps horizontally and vertically are made in accordance with the Polya two-symbol game.
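The experiment behind Fig. 1 can be mimicked with the sketch below (not the report's program). One hedged reading of the rule is assumed: each axis keeps its own Pólya urn of sign-symbols, started with one +1 and one −1, and the sign of each step on the chosen axis is a uniformly random copy of an earlier symbol from that axis.

```python
import random

def walk(steps, rng):
    signs = {"H": [-1, 1], "V": [-1, 1]}     # per-axis Pólya urns (assumption)
    x = y = 0
    for _ in range(steps):
        axis = rng.choice("HV")              # fair, independent H/V decision
        s = rng.choice(signs[axis])          # Pólya choice of direction on that axis
        signs[axis].append(s)
        if axis == "H":
            x += s
        else:
            y += s
    return x, y

rng = random.Random(0)
ends = [walk(64, rng) for _ in range(10000)]
print(ends[:5])
```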


402

For comparison, the distribution of 10,000 walks of 64 steps in the case of classical Pólya walks is shown in Fig. 2. The classical Pólya walk is the walk on the points with integer coordinates starting at the origin and selecting one of the nearest neighbors with equal probability. The formula for the probability that the nth step takes the particle to (x, y) is (1/(π² 2^n)) ∫_0^π ∫_0^π (cos α + cos β)^n cos(xα) cos(yβ) dα dβ.*

It seemed easier to generate the distribution of end points by a Monte Carlo procedure than to use this formula.

Fig. 2. End points of 1000 random walks of 64 steps on the plane quadratic lattice starting at the origin in the classical Polya case.

A second two-dimensional version of the Pólya scheme on two symbols is as follows. The decision between a horizontal and a vertical step is itself made according to a Pólya scheme on two symbols. The remaining decisions are made as before, and the resulting distribution is shown in Fig. 3. It is peaked about the coordinate axes in an approximately hyperbolic manner.

D—
A History-Dependent Explosion

A plane configuration on the plane lattice of points with integer coordinates is generated. Let the origin be the first generation. Assuming that certain of the lattice points have been occupied by generations

* See ref. 2 p. 371.


403

Fig. 3. End points of 1000 random walks of 64 steps on the plane quadratic lattice starting at the origin in which the sequence of decisions to execute a horizontal or vertical step forms a Pólya two-symbol game. The steps horizontally and vertically are made as in Fig. 1.

up to and including the nth generation, let the (n + 1)th generation be determined as follows. For each point of the nth generation, two random numbers are selected to determine two neighbor points, from the four possible neighbor points, as possible positions to be occupied in the (n + 1)th generation. These two neighbor points are to be occupied provided they have not been previously occupied. As an example (see figure below), suppose the point (0) is the initial point. Assume that positions (1) and (2) are chosen by the random numbers as the new positions. Then, in the first generation, the walk is made from point (0) to points (1) and (2), because they are unoccupied. Now there are two terminal points, (1) and (2), from which to walk in the second generation. If the directions (1) → (0) or (2) → (0) are now chosen as


404

one of the new directions, this walk is not executed, because point (0) is occupied. If both directions (1) → (3) and (2) → (3) are chosen, the first walk is executed, but the second is not. The walk terminates if all possible directions chosen for all terminal points lead to points that are occupied.

Two computer examples are given for this type of walk. In the first case, a maximum of two random numbers were chosen for the possible new directions of the walk for each terminal point. The same direction could occur twice if the two random numbers fell in the same interval k/4 ≤ r1, r2 < (k + 1)/4, k = 0, 1, 2, or 3.

In this case, only one direction for the walk was allowed for this terminal point. In this run, the walk terminated after 108 generations, with 656 total points. This is plotted in Fig. 4.

Fig. 4. A history-dependent explosion. The rules for generation of this explosion are explained in the text.

In the second example, this situation was not allowed because the second random number was rejected and a new number was generated until two distinct new directions were obtained for each terminal point. This walk apparently does not stop. In Fig. 5 the walk is plotted through 65 generations, with 3126 points occupied. There are 100 active terminal points in the sixty-fifth generation.

The explosion is in some respects similar to a self-avoiding random walker, except the walker multiplies. Put in other terms, the explosion is in some respects similar to a branched polymer.4
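The second explosion is easy to simulate; the sketch below is one reading of its rules (not the report's code): every terminal point of the current generation chooses two distinct random neighbor directions, each chosen neighbor is occupied only if it has never been occupied before, and the newly occupied points become the next generation's terminal points.

```python
import random

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def explode(generations, rng):
    occupied = {(0, 0)}
    frontier = [(0, 0)]
    for _ in range(generations):
        new = []
        for (x, y) in frontier:
            for dx, dy in rng.sample(DIRS, 2):     # two distinct directions
                p = (x + dx, y + dy)
                if p not in occupied:              # never re-occupy a point
                    occupied.add(p)
                    new.append(p)
        if not new:                                # all chosen moves were blocked
            break
        frontier = new
    return occupied, frontier

rng = random.Random(0)
occ, front = explode(65, rng)
print(len(occ), "points occupied,", len(front), "active terminal points")
```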


405

Fig. 5. A second history-dependent explosion with different rules for generation as explained in the text.

III—
Integer Sequences Generated by History-Dependent Random Processes

In this section, sequences of integers obtained from the following three random processes are discussed.

(a) a_{n+1} = a_n + ā_i, i = 1, ..., n

(b) a_{n+1} = ā_i + ā_j, i, j = 1, ..., n

(c) a_{n+1} = ā_i − ā_j, i, j = 1, ..., n (i > j)

where ā_i has the meaning given above. In (a), a_1 is given. In (b) and (c), a_1, a_2 are given.

For (a), we will only mention that Kac and Ulam have shown that the expected value of a_n, E(a_n), is asymptotic for large n to e^{2√n}. In case (b), it is easy to show by induction that E(a_n) = (1/3)(a_1 + a_2) n for n > 2.
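A small Monte Carlo sketch (not from the report) makes the case (b) formula easy to check; the indices i and j are assumed here to be drawn uniformly and independently from the terms built so far, and a_1 = 0, a_2 = 1 are the illustrative values used below.

```python
import random

# Case (b): a_{n+1} = a_i + a_j, with i, j uniform over the terms built so far.
def sample_b(n, a1, a2, rng):
    a = [a1, a2]
    while len(a) < n:
        a.append(rng.choice(a) + rng.choice(a))
    return a

rng = random.Random(0)
n, runs = 100, 2000
mean_last = sum(sample_b(n, 0, 1, rng)[-1] for _ in range(runs)) / runs
print(mean_last, "vs. predicted E(a_n) = n/3 =", n / 3)
```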


406

One thousand sample sequences of case (b) were obtained by a Monte Carlo sampling with a_1 = −1, a_2 = 1. Let a_n^(k) be the nth member of the kth sequence with 1 ≤ n ≤ 100 and 1 ≤ k ≤ 1000. The averages b_n = (1/1000) Σ_{k=1}^{1000} a_n^(k)

are plotted in Fig. 6 as a function of n. In this case, E(a_n) = 0, n = 3, 4, .... It is seen that the averages increase with n.

Fig. 6. The averages over 1000 sequences generated by a_{n+1} = ā_i + ā_j, i, j = 1, ..., n, with a_1 = −1 and a_2 = 1, as a function of n.

In Fig. 7, a second example is given with a_1 = 0 and a_2 = 1 and 1 ≤ k ≤ 5000. Then E(a_n) = n/3. However, we plot the deviation g(n) = (1/5000) Σ_{k=1}^{5000} |a_n^(k) − n/3| as a function of n. The quantity apparently increases linearly with n.

In the case (c), it can be shown that the expected value of a_n, E_n = E(a_n), satisfies the recurrence relation E_{n+2} = [2n/(n + 1)] E_{n+1} − [(n² − n + 2)/(n(n + 1))] E_n (n = 1, 2, ...).

The asymptotic value of E_n is discussed in the Appendix, where it is shown that E_n = O(1/n). Figures 8 and 9 show a graph of E_n for a_1, a_2 = 0, 1 and a_1, a_2 = 1, 0, respectively.
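The recurrence itself is easy to iterate numerically; the quick sketch below (illustrative, not from the report) computes E_n for the starting values of Fig. 8 and prints n·E_n, which should stay bounded if E_n = O(1/n).

```python
# Iterate E_{n+2} = [2n/(n+1)] E_{n+1} - [(n^2 - n + 2)/(n(n+1))] E_n.
def expected_values(n_max, e1, e2):
    E = [None, e1, e2]                      # 1-based indexing; E[0] unused
    for n in range(1, n_max - 1):
        E.append(2 * n / (n + 1) * E[n + 1]
                 - (n * n - n + 2) / (n * (n + 1)) * E[n])
    return E

E = expected_values(10000, 0.0, 1.0)        # a_1 = 0, a_2 = 1 as in Fig. 8
for n in (10, 100, 1000, 10000):
    print(n, E[n], n * E[n])                # n * E_n should remain bounded
```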


407

Fig. 7. Graph of the function g(n) = (1/5000) Σ_{k=1}^{5000} |a_n^(k) − E(a_n)| for the case a_{n+1} = ā_i + ā_j with a_1 = 0, a_2 = 1. {a_n^(k)}, k = 1, ..., 5000, is a random sampling of 5000 sequences.

Fig. 8. Graph of the function E_n defined by E_{n+2} = [2n/(n + 1)] E_{n+1} − [(n² − n + 2)/(n(n + 1))] E_n, which is the expected value of a_n where a_{n+1} = ā_i − ā_j, i, j = 1, 2, ..., n. Here a_1 = 0 and a_2 = 1.

Fig. 9. Same as Fig. 8, except that a_1 = 1, a_2 = 0.


408

Figure 10 shows a graph of the quantity b_n = (1/1000) Σ_{k=1}^{1000} a_n^(k), n = 1, ..., 100, where a_n^(k) is the nth member of the kth sample sequence for case (c) with a_1 = −1, a_2 = 1. The average increases with n, but here it remains small.

Figure 11 is a graph of the function b_n = (1/500) Σ_{k=1}^{500} |a_n^(k) − E(a_n)| for n = 1, ..., 1000 with a_1 = 0, a_2 = 1. {a_n^(k)} are 500 sample sequences of case (c). This function appears to be parabolic.

Fig. 10. The averages over 1000 sequences generated by a_{n+1} = ā_i − ā_j, i, j = 1, ..., n, with a_1 = −1, a_2 = 1, as a function of n.

Fig. 11. The function b_n = (1/500) Σ_{k=1}^{500} |a_n^(k) − E(a_n)| for n = 1, 2, ..., 1000 with a_{n+1} = ā_i − ā_j and a_1 = 0, a_2 = 1. {a_n^(k)}, k = 1, ..., 500, is a random sampling of 500 sequences.


409

IV—
Number Theoretical Games

In this section, we discuss sequences of positive integers generated by a history-dependent random process which might have densities like the primes.

Let d_1, d_2, ..., d_k be the sequence of differences of the first k primes. Let d_{k+1} = d̄_i (i = 1, ..., k) and d_{k+2} = d̄_ℓ + 2 (ℓ = 1, ..., k; ℓ ≠ i). In general, let d_{k+2j−1} = d̄_i (i = 1, ..., k + 2j − 2) and d_{k+2j} = d̄_ℓ + 2 (ℓ = 1, ..., k + 2j − 2; ℓ ≠ i) for j = 1, 2, .... Let s_0 = 1, s_i = s_{i−1} + d_i (i = 1, ..., n).

The following results are obtained.

1. For k = 8 and d_1, ..., d_8 the first eight real prime differences, i.e., 1, 2, 2, 4, 2, 4, 2, 4, and n = 1000, 200 "games" are played. The average of s_1000 is 7986. The one-thousandth real prime is 7919. The average number of real primes in s_1, ..., s_1000 is about 26%.

2. k = 49, d_1, ..., d_49 are the first 49 real prime differences. Two hundred "games" are played and the average s_1000 is 7643. The average number of real primes in s_1, ..., s_1000 is about 30%. In 20 "games" it is found that the average s_10,000 = 101,356, whereas the 10,000th prime is 104,729. The average number of primes in s_1, ..., s_10,000 is 20%.
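A simplified version of the first game is easy to program; the sketch below is our reading of the rule (the report additionally requires the two picks of each round to use different earlier indices, a refinement omitted here), so its averages should only be compared loosely with the values quoted above.

```python
import random

def first_prime_diffs(k):
    """Differences of the first k+1 primes, by simple trial division."""
    primes, n = [], 2
    while len(primes) < k + 1:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return [b - a for a, b in zip(primes, primes[1:])]

def game(k, n, rng):
    d = first_prime_diffs(k)
    while len(d) < n:
        d.append(rng.choice(d))        # copy a random earlier difference
        d.append(rng.choice(d) + 2)    # and a random earlier difference plus 2
    s = [1]
    for i in range(n):
        s.append(s[-1] + d[i])         # s_0 = 1, s_i = s_{i-1} + d_i
    return s

rng = random.Random(0)
print(sum(game(8, 1000, rng)[1000] for _ in range(200)) / 200)   # compare with the quoted average
```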

A second history-dependent random process for generating sequences with perhaps prime-like density is similar to the above. Again, let d_1, d_2, ..., d_k be the sequence of differences of the first k primes, with k even. Let d_{k+i} = d̄_ℓ (i = 1, 2, ..., k/2) and d_{k+i} = d̄_ℓ + 2 (i = k/2 + 1, ..., k), etc., as before. For n = 1000 with 200 "games," the average value of s_1000 is 7892 and an average of 26% of the primes up to 7892 were obtained.

P. Stein5 has used such sequences to generate minimal binary additive bases for the even integers.

V—
A Problem

Starting with Ulam and von Neumann6 and Pitt7 there have been various versions of the random ergodic theorem stated and proved. One version is as follows.8 Let [X, B, μ] be a measure space and [J, A, p] be a probability measure space with J a set of measure-preserving transformations defined on X. Let J* = Π_i J_i, where J_i = J for


410

all i, and p* = Π_i p_i, where p_i = p for all i. Then if f(x) ∈ L1(X, μ), it follows that, for p*-almost every point (T1, T2, ...) of J*, lim_{n→∞} (1/n) Σ_{k=1}^n f(T_k ··· T_1 x) = f*(x) for almost all x in X, where f*(x) ∈ L1(X, μ).

In the above theorem, the transformations T_i are chosen independently. One can ask, "Suppose the T_i are not chosen independently, but are chosen to form a Markov chain, or are chosen in accordance with the rules governing a Pólya process (see above); does some random ergodic theorem hold?" One could specialize to the case of choosing from two transformations, each of which is a rotation of the circle.
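The suggested specialization is easy to explore numerically. The sketch below (our illustration, not a result) takes two irrational rotations of the circle, chooses between them at each step by a Pólya two-symbol scheme rather than independently, and computes the time average of a test function along the orbit; whether such averages converge as in the independent case is exactly the question posed in the text. The angles, test function, and sample sizes are arbitrary choices.

```python
import math
import random

def polya_choice_stream(n, rng):
    """Pólya two-symbol choices: each new choice is a random copy of an earlier one."""
    urn = [0, 1]
    for _ in range(n):
        c = rng.choice(urn)
        urn.append(c)
        yield c

def time_average(x0, alphas, n, rng):
    f = lambda x: math.cos(2 * math.pi * x)     # test function on the circle [0, 1)
    x, total = x0, 0.0
    for c in polya_choice_stream(n, rng):
        x = (x + alphas[c]) % 1.0               # apply the chosen rotation
        total += f(x)
    return total / n

rng = random.Random(0)
alphas = (math.sqrt(2) - 1, math.sqrt(3) - 1)   # two irrational rotation angles
print([round(time_average(0.0, alphas, 10000, rng), 4) for _ in range(5)])
```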

Appendix

Theorem. E(k) = O(1/k), where k(k + 1)E_{k+2} − 2k²E_{k+1} + (k² − k + 2)E_k = 0 (k = 1, 2, ...). (1)

Proof. Define the vector ε(k) = (ε1(k), ε2(k)) = (E(k), E(k + 1)). Then Eq. (1) is equivalent to the vector equation

ε(k + 1) = | 0                            1          | ε(k)     (2)
           | −(k² − k + 2)/(k(k + 1))     2k/(k + 1) |

We first show that any vector solution to Eq. (2) is bounded for all k ≥ 1 in the norm ||ε|| = |ε1| + |ε2|. Write

B(k) = | 0                   1          |        A(k) = | 0                0 |
       | −(k − 1)/(k + 1)    2k/(k + 1) |               | −2/(k(k + 1))    0 |

so that the matrix in Eq. (2) is B(k) + A(k).


411

Consider £(k + 1) = B(k)£(k). (3) A fundamental matrix set of solutions for Eq. (3) (Ref. 9) is given by Yk= (4) 1 k for k = 2, 3, .... From work shown in Ref. 9, any solution £(k) to Eq. (2) for k> 2, can be expressed by k-1 £(k) = E Yk-s+1 Y2 - A(s + 1) £(s) + YkY2-£(2) . (5) s=2 Since 00 EA(k) < oo, k=l

it follows from Lemma 3.2 of Ref. 9, p. 21, and Eq. (5) that l£(k)l is bounded for k> 2 and hence for k> 1. Now a computation gives 1 11- k -s 4£1(s) Yk-s+l Y2A(s + 1) (s) = - - k1(s +1)(22) - k sl (s 1)(2 4 2) ' and • - (2) + 22(2) + £1(2) - £2(2) YkY2-'£(2) =k = 1(2) + 252(2) + k21(2) - e2(2)

Thus k-i [1 - 41(s)5(k) =-Zk-s41 (s) s=21( + 1)( + 2) k-s+l + (6) [-£1(2) + 2£2(2) + k21£1(2) - £ E2(s) -£1(2) + 2(2) 1(2 ) + (2 - 1(2)


412

for k > 2. The summation can be written k-1 _ k-s 44£1(s) =2 L+11(s + 1)(s + 2) 1 51k-s+l k-I £ l()k-I [ 1](s) -4 + 4 k+4 (7) -4 + ( 1)(s + 2) '1 [ 1(s + 1)(s +2) ,9=2 8=2 k-sZl

Since 51 (s) is bounded, it is seen that the first term on the right of Eq. (7) can be written k-—1 oc 0i k-41 (s) -41 (s) 4 (s+1)(s+2) -4 (s+1)(s+2) s-2s=2 +4 ( 1£)(s (8) s=-k (k) where C N= -4 s -i 1)($ (+ 2) is a constant. E (s + 1)(s + 2)

Now consider the second term on the right of Eq. (6). For 2 <s<k/2 1 2 k-s ki11 2 = O(k)k-s+1 - (k/2 +1) k+2 k and for k/2 < s <k- 1 1 1 4 (k- s)(s + l)(s + 2) (k/2 + 1)(k/2 + 2) (k + 2)(k + 4)' 1 2 (k-s+1)(s+1)(k+2)- (k+2)(k+4)


413

Thus, with [] denoting integer part, we have k-1 l(s)[k/2](s)(k-s)(s+ )(s+2) (k - s)(s + l)(s + 2) (9) k-1El1 (s) +(k 1)(s = 2) -0(1/k) -i kO(1/k2 ) = O(1/k) Z=[kE]+ (k-s)(s + 1)(s + 2) s=[k/2]+1

and similarly for k-i 1(s) (k - s + 1)(s+l)(s +2) s=2 Hence, from Eqs. (6), (8), and (9), we have £(k) = A + 0(1/k), where A = [] is a constant vector. Therefore, c(k) = A + 0(1/k) Substituting this into Eq. (1), we obtain [k(k + 1) - 2k2+k2 -k +2] A+[k(k + ) - 2k2+k2 - k + 2]0(1/k) = 2(A + 0(1/k)) = 0 Thus, A = 0, which completes the proof.

An Approximating Differential Equation

If Eq. (1) is written in the form E(k + 2Ak) -2E(k + Ak) + E(k) 2 E(k + Ak)-E(k)(Ak)2k +1 Ak + k(k E(k) = 0, Ik(k + 1)


414

with Ak = 1, then the differential equation 2 2 e" + - k(k+ e=O, (10) could be regarded as an approximation to Eq. (1).

In Eq. (10), k is regarded as a continuous variable. It can be shown that the general solution to Eq. (10) is a linear combination of the functions F( l+i Vx/1- i.2, k and ( 2 2 2,1 + where F denotes the Gauss hypergeometric function. Hence, for large k, the solution to Eq. (10) is a linear combination of the functions el and e2 where ei= - cos[ i log k + 0 e 7L2 kJ3 /2 and e2=- sin[ -log k ]+ ( k/

Regarding Eq. (10) as an approximation to Eq. (1), it can be shown from the theory of difference approximations to differential equations that if ko is fixed and E(ko) = e(ko) as well as E(ko + 1) = e(ko + 1), then for k > ko we have Ie(k)-E(k)\< ,e(~) [e,] (11) e(k) I 41 e(k)I e where ko< 5 < k. Thus, while it is true that e" (W) -x oo, the exponential factor in Eq. (11) prevents Eq. (10) from being a very good approximation to Eq. (1) for arbitrarily large k. This difference equation can also be discussed, perhaps more successfully, by using generating functions.


415

References

1. C. Domb, "Self-avoiding Walks on Lattices," to be published in Advan. Chem. Phys.

2. W. Feller, An Introduction to Probability Theory and Its Applications, (John Wiley and Sons, New York 1968), Vol. 1, 3rd ed.

3. W. Feller, An Introduction to Probability Theory and Its Applications, John Wiley and Sons, New York 1966), Vol. 2.

4. L. V. Gallagher and S. Windmer, "Monte Carlo Study of Flexible Branched Macromolecules," J. Chem. Phys. 44, 1139 (1966).

5. P. Stein, private communication. 6. S. M. Ulam and J. von Neumann, "Random Ergodic Theorem," Bull. Am. Math. Soc. 51, 660 (1945).

7. H. R. Pitt, "Some Generalizations of the Ergodic Theorem," Proc. Cambridge Phil. Soc. 38, 325 (1942).

8. P. Revesz, "A Random Ergodic Theorem and its Application in the Theory of Markov Chains," in Ergodic Theory, (Academic Press, New York, 1963) Fred. B. Wright, Ed.

9. K. S. Miller, Linear Difference Equations, (W. A. Benjamin, Inc., New York, 1968).


416

Everett and Ulam in Madison, Wisconsin, in 1941.


417

14—
The Entropy of Interacting Populations:
With C. J. Everett (LA-4256, August 1969)

This report is a novel probabilistic approach to defining distributions of functionals of thermodynamical systems, including for instance, interactions between radiation and particles. (Author's note).

Abstract

A study is made of interacting populations of "particles" which closely parallels the Boltzmann kinetic theory and the Planck-Einstein-Tolman treatment of radiation interacting with matter. The analogy is perhaps surprising since it appears that our postulates do not embody the physics of such systems, but are nevertheless quite reasonable, and applicable to similar situations. While we have deliberately used the language of physics for its intuitive appeal, one may well consider for example the implications of replacing "particles" by "people" and "energy" by "wealth." It is especially interesting that the "reversibility paradox" is excluded by confining the discussion to a scalar "energy" rather than a vector "velocity." As Boltzmann might have said, "Go ahead, reverse the energy."1


418

I—
The System of "Particles"

1. The Interaction Postulates. We consider a system of "particles," of which there are N(E, t)dE possessing "energy" on (E, E + dE), 0 < E < ∞, at time t ≥ 0, and undergoing c(E1, E2)N(E1, t)dE1 N(E2, t)dE2 "collisions," or binary E1, E2-interactions per unit time, between particles on the indicated ranges, where c(E1, E2) ≥ 0 is a function of the form f(E1 + E2). We further assume that the (probable) number ψ(E1, E2, E)dE of particles emerging on (E, E + dE) from such a collision has the properties

ψ1. ψ(E1, E2, E) ≥ 0; 0 ≤ E ≤ E1 + E2, 0 < (E1 & E2) < ∞

ψ2. ψ(E1, E2, E) = ψ(E2, E1, E)

ψ3. ψ(E1, E2, E) = ψ(E, E1 + E2 − E, E1)

ψ4. ∫_0^{E1+E2} ψ(E1, E2, E) dE = 2.

From ψ2, ψ3 follows the symmetry

ψ5. ψ(E1, E2, E) = ψ(E1, E2, E1 + E2 − E)

of the ψ-distribution about its midpoint, and hence from ψ4 follows

ψ6. ∫_0^{E1+E2} E ψ(E1, E2, E) dE = E1 + E2.

The uniform distribution ψ(E1, E2, E) = 2/(E1 + E2) is by no means the only one having these properties. For example, 4(E1 + E2)^{−1} sin²[2π(E1 + E2)^{−1}(E1 + E)], 0 ≤ (E1 & E) ≤ (E1 + E2)/2, symmetrically extended to the full interval (0, E1 + E2), yields such a ψ-function (cf. Appendix I).

2. The Boltzmann Equation. The above assumptions imply the equation

∂N(E)/∂t = (1/2) ∫∫_{E1+E2 ≥ E} c(E1, E2) N(E1) N(E2) ψ(E1, E2, E) dE1 dE2 − ∫_0^∞ c(E, E2) N(E) N(E2) dE2 ≡ B1 − B2 ≡ B3 (1)


419

for the change of N(E, t) with time. (N.B. Hereafter all functions N(E, t) are written N(E) for brevity.)

For a solution of (1), we verify at once the conservation laws d/dt ∫_0^∞ N(E) dE = 0 = d/dt ∫_0^∞ E N(E) dE, showing the values N0, E0 of these integrals to be constant in time.

This rests on the fact that the properties of c and ψ imply ∫_0^∞ B3 dE = 0 = ∫_0^∞ E B3(E) dE, (2) that is to say, ∫ B1 dE = ∫ B2 dE and ∫ E B1(E) dE = ∫ E B2(E) dE. (3) For, these B1 integrals may be written in the form ∫_{E1=0}^∞ ∫_{E2=0}^∞ ∫_{E=0}^{E1+E2}, and the relations in (3) follow from ψ4 and from ψ6.

We next throw (1) into the equivalent form

∂N(E)/∂t = ∫_{E2=0}^∞ ∫_{E1=0}^{E+E2} (1/2) c(E, E2) ψ(E, E2, E1) {N(E1)N(E + E2 − E1) − N(E)N(E2)} dE1 dE2 ≡ B1 − B2 ≡ B3. (4)

To see this, one makes the transformation E1 = F1, E2 = E + F2 − F1 in B1, using c(E1, E2) = f(E1 + E2) and ψ3, and uses ψ4 to express B2 as a double integral.

3. The Boltzmann H-function. For a solution N(E) of (1), (4) we now define (ignoring the nice question of units) H_N(t) = ∫_0^∞ N(E) log N(E) dE,

for which dH_N(t)/dt = ∫_0^∞ ∂N(E)/∂t · log N(E) dE = ∫_0^∞ B3 log N(E) dE


420

(since No is invariant). Using the form of B3 in (4), and making the change of notation (E, El) -- (E1, E), the last integral becomes roo oo rEl+E21 1=/ / / -c(E,E2,) (E, E2,E) JE1=O JE2=O JE=O 2 {N(E)N(E1+ E2- E)- N(E1)N(E2)} log N(E1)dE 2dE 1

Due to the symmetry in E1, E2, this is equal to the same integral with N(E2) replacing N(E1) in the log factor. Averaging the two results, we obtain 0fooo froo E1+E 2 1 B3 logNdE= / / -c(El,E2, )(E, E2, E) =0 JE=o JE= JE=O 4 {N(E)N(E1+ E2,-E)- N(E1)N(E2)} log N(El)N(E2)dEdE2dE.

If we now make the transformation E1 = F, E2 = F1+ F2 - F, E = F1 , use c(E1,E2) = f(E1 + E2) and 03, and change notation (F, F2 , F) -+ (E, E2, E) we obtain the same formula with the log factor replaced by - log N(E)N(E 1+ E 2- E). Averaging once more, we obtain finally

H′_N(t) = ∫_0^∞ ∂N(E)/∂t · log N(E) dE = ∫_0^∞ B3 log N(E) dE = −(1/8) ∫_0^∞ ∫_0^∞ ∫_0^{E1+E2} c(E1, E2) ψ(E1, E2, E) {N(E)N(E1 + E2 − E) − N(E1)N(E2)} log [N(E)N(E1 + E2 − E) / (N(E1)N(E2))] dE dE1 dE2 ≤ 0, (5)

with equality iff (at the time t in question),

{N(E)N(E1 + E2 − E) − N(E1)N(E2)} ≡ 0; 0 ≤ E ≤ E1 + E2 < ∞. (6)

(For, c ≥ 0, ψ ≥ 0, and x − y ≤ (≥) 0 as log x/y ≤ (≥) 0.)

Moreover, it is well known that a (continuous) function N(E) satisfies (6) iff it is of the form βe^{−αE} (cf. Appendix II).

4. The Steady State Solution. If No(E) is a time independent function, the following conditions are now seen to be equivalent:


421

S1. N0(E) satisfies (4) (steady state solution)

S2. For N = N0(E), B3(E) ≡ 0 in (4)

S3. For N = N0(E), ∫_0^∞ B3(E) log N(E) dE = 0 in (5)

S4. For N = N0(E), {N(E)N(E1 + E2 − E) − N(E1)N(E2)} ≡ 0 in (6)

S5. N0(E) = βe^{−αE} (with α = N0/E0, β = N0·α if we stipulate the totals N0, E0).

It is here apparent that we do not have Boltzmann statistics, for which N0(E)dE = βE^{1/2} e^{−αE} dE, with α = (3/2)(N0/E0) and β = 2N0 α^{3/2}/π^{1/2} (see ref. 2).
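The exponential steady state of S5 is easy to observe numerically. The sketch below (our illustration, not part of the report) repeatedly picks two "particles" at random and shares their combined energy uniformly, which corresponds to the uniform redistribution law ψ = 2/(E1 + E2); taking the pair uniformly at random amounts to a constant c(E1, E2), a special case of f(E1 + E2). The empirical distribution should approach βe^{−αE} rather than a Boltzmann E^{1/2} density.

```python
import random

def relax(n_particles, n_collisions, rng):
    e = [1.0] * n_particles                 # start with every particle at energy 1
    for _ in range(n_collisions):
        i, j = rng.sample(range(n_particles), 2)
        total = e[i] + e[j]
        e[i] = rng.uniform(0.0, total)      # uniform split of the combined energy
        e[j] = total - e[i]
    return e

rng = random.Random(0)
e = relax(20000, 400000, rng)
mean = sum(e) / len(e)                      # = E0/N0 = 1/alpha (energy is conserved)
# for an exponential law, the fraction with E > mean should be near exp(-1) ~ 0.368
print(mean, sum(1 for x in e if x > mean) / len(e))
```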

5. Time Dependent Solutions. If N(E, t) is a solution of (4), with invariants N0, E0, which is not the solution N0(E) in S5 at any t, then H_N(t) = ∫_0^∞ N(E) log N(E) dE is strictly decreasing, since H′_N(t) < 0, t ≥ 0, in (5). Moreover, we shall now prove

H_N(t) > H_{N0}; t ≥ 0 (7)

and hence the existence of lim H_N(t) ≡ H*_N ≥ H_{N0}. To verify (7), note first that, if M(E) is either N(E) or N0(E), then ∫ M log N0 dE = ∫ M (log β − αE) dE = N0 (log β − 1), because both have the same totals N0, E0. Since these integrals have the same value, we have H_N(t) − H_{N0} = ∫ N log N dE − ∫ N0 log N0 dE = ∫ {N log N/N0 + N0 − N} dE > 0.

(For the integrand is of the form f(x) = x log x/x0 + x0 − x, with f(x0) = 0, and f′(x) = log x/x0 < (>) 0 as x < (>) x0. Thus H_N(t) − H_{N0} ≥ 0, and if equality held for any t0, we should have N(E, t0) ≡ N0(E).)


422

If, in addition, N(E, t) is sufficiently well-behaved, with the limits lim_{t→∞} H′_N(t) = 0 and lim_{t→∞} N(E, t) = N*(E), we may conclude from (5) that N*(E) = N0(E), i.e., the time dependent solution N(E, t) of (4) approaches the steady state.

We do not investigate the existence of these limits; apparently they have not been established even in the simplest kinetic theory. (See however ref. 1.) It is clear that the second limit implies the first, since (5) then shows the existence of lim_{t→∞} H′_N(t) ≡ C, and by the theorem of the mean, H′_N(τ) = (H_N(t + Δt) − H_N(t))/Δt → (H*_N − H*_N)/Δt = 0 = C.

II—
A Linked System of "Particles" and "Photons"

6. Interaction Assumptions. We now consider a system of "particles" and "photons," of which there are respectively N(E, t)dE and N(E, t)dE on (E, E + dE) at time t> 0, 0 < E < oo. Particles are subject to the rules of §1, while photon-particle interactions are governed by two positive functions A(E2, El), B(E2, El), 0 < E1 < E2 < oo, according to the Einstein postulates:

P1. N(E2 ,t)dE2B(E, E2)N(E - E2,t)dE gives the number, per unit time, of (E2, E2 + dE2) particles raised to (E, E + dE) by absorption of an (E - E2)-photon; 0 < E2 < E < oo.

P2. N(E2, t)dE2B(E2, E)N(E2- E, t)dE gives the corresponding number "induced" by the presence of (E2- E)-photons to drop to (E, E + dE) with creation of such a photon, and

P3. N(E2 ,t)dE2A(E2, E)dE gives the number of such particles spontaneously decaying to (E, E + dE), with creation of an (E2 - E)-photon, 0 < E < E2 < oc.

P4. The functions A, B are related by the equation A(E2, E1) = B(E2, E1) R(E2 − E1), where R is a function of the energy difference. (In the Planck-Einstein case, R(F) = 8π(hc)^{−3} F², and N, N̄ are numbers per cm³.)

7.The "Boltzmann-Einstein Equation." The analogue of (1) in §2 is seen to be the linked system (cf. (4) for B3)


423

E aN(E)/Ot =B3 +N(E 2)B(E, E 2)N(E - E 2)dE 2fo + N(E2){B(E 2, E)N(E 2 - E) + A(E 2, E)}dE 2-J N(E)B(E 2, E)N(E 2 - E)dE 2E - N(E){B(E, E 2)N(E - E 2) + A(E, E 2)}dE 2Jo ON(F)/at = - N(E2)B(E 2+ F, E 2)N(F)dE 2Jo + j N(E 2){B(E 2, E 2- F)N(F) + A(E 2, E 2- F)}dE 2.

Combining integrals, bringing all lower limits to 0, making an inversion on the proper integral resulting, and using P4, one can show that aN(E)/atE B3- B(E, E - F){N(E)[N(F) + R(F)]-N(E - F)N(F)}dF Jo + B(E + F, E){N(E + F)[N(F) + R(F)] - N(E)N(F)}dF _ B3- B4 + B5 (8) aN(F)/at =J B(E+F,E){N(E+F)[N(F)+R(F)]-N(E)N(F)}dF _ B6 . We next verify, for a solution N, N of (8), the relations d /N(E)dE = 0=d { /CEN(E)dE +j FN(F)dF and hence the invariance of the particle number No, and of the total energy £o= Eo+ Eo. For the first, we have to prove B 3dE - B 4dE+ / B 5dE = 0 . But the first integral is zero by (2), and making the transformation E = E' + F', F = F' (9)


424

on the second shows it equal to the third. For the second relation we must show that 00oo /oo j EB3dE -EB4dE + EB 5dE + FB 6dF= 0. Here, the first is zero by (2), and the transformation (9) on the second makes the result clear.

8. The H-function. For a solution N, N of (8) we define3HN,(t)- j N(E) log N(E)dE + {N(F) log N(F) - [N(F) + R(F)] log [N(F) + R(F)]}dF with derivative d HN (t) = t ON/Ot log NdE + ON/Ot log NdF —-t / N/( t N+R = Bj B log NdE -B 4 log NdE (10) +FB5 log NdE +B6 log - dF. Jo Jo N +R

We know the first integral from (5), and making the transformation (9) on the second shows (10) to be HNV(t) = B3 log N(E)dE - J B(E + F,E){N( E + F )[ N(F) + R(F)] ( -N(E)N(F)} log N(E +F)(F)R(F) N(E)N(F) with equality iff (at the time t in question), both {N(E)N(E1+E2-E)-N(E1)N(E2)}- 0; 0 < E <E1 +E2< oo and {N(E+F)[N(F)+R(F)] -N(E)N(F)} 0; 0 < (E&F) < oo . (12)


425

(cf. §3.) As we have seen, the first of these implies N = βe^{−αE}, and hence from the second follows N̄ = R(F)(e^{αF} − 1)^{−1}. Conversely, such a pair satisfy (12).

9. The Steady State. For a pair of time independent functions N0, N̄0, we now find that the following are equivalent:

S′1. N0, N̄0 satisfy (8) (steady state solution)

S′2. For N = N0(E), N̄ = N̄0(F), B3(E) − B4(E) + B5(E) ≡ 0 ≡ B6(F) in (8)

S′3. For N = N0(E), N̄ = N̄0(F), ∫_0^∞ {B3(E) − B4(E) + B5(E)} log N(E) dE + ∫_0^∞ B6(F) log [N̄(F)/(N̄(F) + R(F))] dF = 0 in (10), (11)

S′4. For N = N0(E), N̄ = N̄0(F), {N(E)N(E1 + E2 − E) − N(E1)N(E2)} ≡ 0 and {N(E + F)[N̄(F) + R(F)] − N(E)N̄(F)} ≡ 0 in (12)

S′5. N0(E) = βe^{−αE}, N̄0(F) = R(F)(e^{αF} − 1)^{−1}.

Here, stipulation of the totals N0, ℰ0 determines α, β, and hence also N̄0, E0, Ē0. If we take the special function R(F) = 8π(hc)^{−3}F², we find again α = N0/E0, β = αN0, and N̄0 = 16π ζ(3)(hc)^{−3} α^{−3}, Ē0 = (8/15)π⁵(hc)^{−3} α^{−4}. Note that for us, 1/α = E0/N0 and not (2/3)E0/N0 (= kT).

10. Time Dependent Solutions. If N(E, t), N(E, t) is a solution of (8), with invariants No, o, which is not the solution of S'5 at


426

any time t, then HN,N'(t) is strictly decreasing, and bounded below by HN ,, as we now show.

Note first that No(F) = R(eF - 1)-I implies No + R = NoeaF. Hence, if M, M is either the pair N, N or the pair No, No with the same two invariants, we find

J M log NodE +M log NodF -(M + R) log (No + R)dF = j M (log 3- oE)dE + M log NodF - M (log No + aF)dF - R log (No + R)dF = No log 3- Eo -R log (No + R)dF.

The integrals having this common value, we may write HN,N(t) - HN,o/N log NdE + N log NdF - (N + R) log (N + R)dF - { jNo log NodE+ No log NodF - (No + R) log (No + R)dF = log NNdE log N/NdEN log N/NodF Jo Jo -I (N + R) log RdF Jo N0+R = J{N log N/No + No - N}dE +0 {N log TN/No - (N + R) log N+R}dF> 70LNo +R >

We have seen (§5) that the first integrand is non-negative. The second, of form g(y) = y log y/yo - (y + R) log y + R yo + R has g(yo) = 0, and g'(y) = log (1 R/yo) - log (1 + R/y) <(>)0 as y< (>)yo and thus is also > 0. Hence HNy,(t) > HN N, by the argument of §5, and we have the existence of lim HN (t) H - > t- oo i, N,N No,NO


427

If we assume the limits lim H' -(t) = 0, lim N(E,t) = N*(E), t-+ooN,N t-ooc lim N(F,t)= N*(F), then from (11) follows t-*oo N*(E) = No(E) and N*(F) = No(F) as defined in S'5. The remark at the end of §5 applies here as well.

Appendix I

It is apparent that definition of a ψ-function ψ(E1, E2, E) is equivalent to defining a function f(E1, S, E) such that

f1. f(E1, S, E) ≥ 0; 0 ≤ (E1 & E) ≤ S < ∞

f2. f(E1, S, E) = f(S − E1, S, E)

f3. f(E1, S, E) = f(E, S, E1)

f4. ∫_0^S f(E1, S, E) dE = 2.

For, f(E1, S, E) ≡ ψ(E1, S − E1, E) and ψ(E1, E2, E) ≡ f(E1, E1 + E2, E) serve to define each in terms of the other.

Moreover, given an f-function, the function h(E1, S, E) ≡ f(E1, S, E) for E1, E on (0, S/2) satisfies

h1. h(E1, S, E) ≥ 0

h3. h(E1, S, E) = h(E, S, E1)

h4. ∫_0^{S/2} h(E1, S, E) dE = 1.

Conversely, given such an h-function defined for all E1, S, E with 0 ≤ (E1 & E) ≤ S/2 < ∞, we may extend h to an f-function by the consistent definitions

f(E1, S, E) = h(E1, S, E), E1 ∈ (0, S/2), E ∈ (0, S/2)
            = h(S − E1, S, E), E1 ∈ (S/2, S), E ∈ (0, S/2)
            = h(E1, S, S − E), E1 ∈ (0, S/2), E ∈ (S/2, S)
            = h(S − E1, S, S − E), E1 ∈ (S/2, S), E ∈ (S/2, S).


428

The function h(E1, S, E) = (4/S) sin²[2π(E1 + E)/S] satisfies h1, h3, h4 and therefore defines a ψ-function, as indicated.

Appendix II

Obviously N(E) in (7) satisfies (6). Conversely, setting E = 0 in (6), and L(E) = log(N(E)/N(0)), we must have L(E1 + E2) = L(E1) + L(E2); E1, E2 ≥ 0. Then for integers m, n ≥ 1, L(m·1) = mL(1), L(1) = L(n·(1/n)) = nL(1/n), and L(m/n) = mL(1/n) = (m/n)L(1). If fractions m/n → E ≥ 0, we have L(E) = lim L(m/n) = lim (m/n)L(1) = L(1)·E = log(N(E)/N(0)). Thus N(E) = N(0) exp(L(1)·E) = βe^{−αE}.

References

1. M. Kac, Probability and Related Topics in Physical Sciences, (1959) Ch. III, Interscience Publishers, Ltd., London.

2. E. H. Kennard, Kinetic Theory of Gases, (1938) Ch. II, McGraw-Hill Book Co., Inc., New York.

3. R. C. Tolman, Statistical Mechanics, (1927) pp. 198-203, Chemical Catalogue Co., Inc., New York.


429

15—
Some Elementary Attempts at Numerical Modeling of Problems Concerning Rates of Evolutionary Processes:
With R. Schrandt (LA-4573-MS, December 1970)

This report is an attempt to study mathematically models of evolution showing the qualitative difference between the rates of development in sexual and non-sexual processes. A further number of papers were stimulated by this report.

See also the earlier Proceedings of a meeting at the Wistar Institute on "Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution" held in April 1966 where a paper of mine called "How to formulate mathematically problems of rate of evolution" was published. (Wistar Institute's Symposium Monograph No. 5, pp. 21-33, 1967.)

The work of the above report, done with Schrandt, was performed before the Wistar meeting. (Author's note). *

Abstract

An account of numerical work performed on electronic computers; the problems concerned the rate at which favorable mutations spread throughout the population

* This report is also reprinted in the proceedings of a Los Alamos conference dedicated to Stan Ulam, Evolution, Games and Learning that occurred May 20-24, 1985 in Los Alamos. (Eds.)



subject to a "survival of the fittest" mechanism. Models of asexual reproduction showed the expected linear growth in the number of improvements. The bisexual process greatly accelerated the average acquisition rate.

I—
Introduction

In this report, we shall present an abbreviated account of calculations performed by us in the mid-1960's. These calculations were preliminary and intended merely as a zeroth approximation to the problem concerning rates of evolution, a process which we have here severely stylized and enormously oversimplified. A mention of the results of such calculations, in progress at that time, was made at a meeting in 1966 at the Wistar Institute in Philadelphia by one of us. The discussion there, as reported in the proceedings of the meeting, was rather frequently misunderstood, and the impression might have been left that the results somehow show that the standard version of the survival-of-the-fittest mechanism leads to much too slow a progress. What was really intended was an indication from our computations, simple-minded as they were, that a process involving only mitosis, in the absence of sexual reproduction, would indeed be much too slow. However, as most biologists realize anyway, the Darwinian mechanism together with the mixing of genes accelerates enormously the rate of acquiring new "favorable" characteristics and leaves the possibility of the sufficiency of the orthodox ideas quite open. Numerous requests addressed to us for elucidation and details of the numerical setup made us decide to give this account of our computations.

Perhaps the greatest uncertainties-the strongest objections to any calculation of the sort described in the pages that follow-must concern the values of the constants which are assumed initially or should indeed concern even their meaning in the interpretation we have chosen. We have tried to interpret the survivability of individuals by changes in the number of offspring which carry the species in time measured by a discrete succession of generations. The value of "favorable" mutations was mirrored in the increased proportion of offspring. The same, needless to say, is true of the frequency of favorable mutations. We have disregarded the lethal and the unfavorable mutations. We assumed a special form of the advantages which an individual holds relative to the rest of the population by comparing the number of his "improvements" with the average number present at that time in the



population. We assumed a proportionality law, again arbitrarily. In some problems we have penalized the individuals whose score in the improvements was less than the average; in some of the problems we considered only the positive excess as leading to a greater number of offspring. Another debatable procedure is the way we have handled the growth of the population by normalizing periodically the total number of individuals to a constant figure. If the number of individuals holding a certain number of "advantages" after normalization dropped below 1, we summarily dismissed such representatives. It should be stressed here strongly that this procedure makes it very hard to find an analytic model equivalent to our numerical work. We do not have any clear idea of the necessary scaling laws concerning the effect of changing the constants alpha, kappa (defined below) and the size of the population. All this is true throughout all the problems. In the calculations involving combinations of genes from both parents, further assumptions were made of independent inheritance of the "improved" genes, etc. As will be seen in the description of the individual problems, we have chosen successively less unrealistic assumptions. Clearly in the counting of the new improved genes coming from the "father" and from the "mother", one has to take care not to count the same "improvements" from each twice. As will be seen, this precaution has the effect of slowing down the at first seemingly exponential growth

into something more like a quadratic function of time (we have studied throughout the calculations the number of "improvements" present in the population as a function of time, that is to say of the generation number).

In order to get a feeling for the dependence of the results on the values of the constants, more such computations must be tried in the future and additional variables have to be considered-certainly the "kind" of the improved or favorable new gene has to be taken into account. A most important question concerns the existence of new genetic instructions involving perhaps logical prescriptions, that is to say recipes for operations and actions of the components rather than merely their chemical composition. An improvement in programming or interpretation of action by a gene or group of genes may be equivalent to a very large number of the "favorable changes" with which our computations have dealt so far.

Our first problem with the code name ADAM concerned asexual reproduction. We feel that the time scale to acquire a characteristic in an organism, such as the development of an eye, by a sequence of consecutive favorable mutations, is extremely long if one does not resort to something like sexual mating in the population. In the following rough and elementary estimate, the constants assumed are crude, but err toward "faster" evolution than what is to be expected.



Definitions: Let T = time of existing life (~10^9 yrs); τ = time for one generation, say (~3 days); G = the number of generations = T/τ ≈ 10^11; N = the existing population size (~10^11); K = the total number of "favorable mutations" necessary to produce the desired characteristic (~10^6); α = the chance of a favorable mutation per individual per generation (~10^-10); γ = the "value" of a single favorable mutation expressed as a survival rate (~10^-6). That is, an individual having this mutation would have (in expectation) (k + γ) descendants, versus k descendants for an individual not having this mutation.

Therefore, in the first generation, the expected number of individuals in the population that could have one mutation is Nα = 10. In 1/γ = 10^6 generations, a sizeable portion (approximately 1/e) of the population would have this mutation, and in about 10^7 generations most of the population would have it. But the time to acquire all the mutations would be about K · 10^7, or 10^13 generations, which is like the age of the universe.
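A back-of-the-envelope check (ours, not part of the original report) of the orders of magnitude quoted above:

```python
# Order-of-magnitude check of the estimate in the text; all numbers are the
# rough figures quoted above, not measured data.
T_years = 1e9             # time life has existed
tau_days = 3              # one generation
G = T_years * 365 / tau_days        # number of generations, ~1e11
N = 1e11                  # population size
K = 1e6                   # favorable mutations needed for the characteristic
alpha = 1e-10             # chance of a favorable mutation per individual per generation
gamma = 1e-6              # selective "value" of one favorable mutation

print(f"generations available G ~ {G:.0e}")
print(f"expected new carriers per generation: N*alpha = {N * alpha:.0f}")
print(f"generations for one mutation to dominate ~ {10 / gamma:.0e}")
print(f"generations to accumulate all K mutations ~ {K * 10 / gamma:.0e}")
```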

II—
ADAM

In order to study, on a computer, the rate at which a population can acquire a sequence of mutations we needed a set of more amenable parameters, which, it is hoped, could eventually be scaled down to the set given above. For the first problem, called ADAM, we used the following set: N = 100, K = 100, α = .02, and γ = .1.

The method was as follows: In any one generation, each member of the population N had a probability α of acquiring one new independent mutation. Each individual then had one child, with a probability of extra children given by γ(K_g - K_0), where K_g = the total number of different mutations possessed by this individual and K_0 = the minimum number of mutations possessed by any individual in the population. (If n/γ ≤ K_g - K_0 < (n + 1)/γ, n = 1, 2, 3, ..., the individual had n extra children, and a probability of γ(K_g - K_0 - n/γ) of an (n + 1)-th child.)

The children were then given the number of mutations possessed by their parent. The parent population then was assumed to have died, and the children formed the new population. The numbers of



mutations in the population were recorded in categories by counts n_i, i = 0, 1, 2, 3, ..., where n_i = the number of individuals having a total of i mutations, and Σ n_i = N, the size of the new population. In the next generation, each member of this new population had the same probability α of acquiring another new mutation, and had children according to the above recipe. These children, with their number of mutations recorded, then became the population for the succeeding generation, etc.

It was necessary to renorm the population periodically, since the number of children increased in each generation. This was done by reducing the count ni in each category by 1/2, to the nearest smaller integer, when the population reached 200, which is double the initial population.

The weighted average number p of mutations possessed by the population was computed for each generation from the categories n_i. This average was then plotted as ordinate against the generation time as abscissa. The slope of this curve is then the rate at which the population can acquire a sequence of mutations, as a function of the parameters α and γ.
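A minimal sketch of the ADAM recipe as we read it from the description above (K_0 = minimum version). The data layout, random-number generator, and function name are our own choices, not the original Laboratory code:

```python
import random

def adam(N=100, alpha=0.02, gamma=0.1, generations=200, seed=0):
    """One run of the asexual (ADAM) model; returns the average mutation
    count of the population at each generation."""
    rng = random.Random(seed)
    pop = [0] * N                        # mutation count of each individual
    averages = []
    for _ in range(generations):
        # each member may acquire one new favorable mutation
        pop = [k + (rng.random() < alpha) for k in pop]
        k0 = min(pop)                    # K0 = minimum recipe of the first version
        children = []
        for kg in pop:
            n_extra, frac = divmod(gamma * (kg - k0), 1.0)
            n_children = 1 + int(n_extra) + (rng.random() < frac)
            children.extend([kg] * n_children)   # children inherit the parent's count
        pop = children
        if len(pop) >= 2 * N:            # renorm when the population has doubled:
            counts = {}                  # halve every category, dropping remainders
            for k in pop:
                counts[k] = counts.get(k, 0) + 1
            pop = [k for k, c in counts.items() for _ in range(c // 2)]
        averages.append(sum(pop) / len(pop))
    return averages

print(adam()[-1])   # average number of mutations after 200 generations
```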

This rate of acquiring mutations turned out indeed to be linear. For the parameters α = .02 and γ = .1, the slope was about .1; that is, a majority of the population acquired an additional mutation every 10 generations. Several problems were run with smaller values of γ, that is, γ' = f·γ, where f is a fraction. The graphs were all linear, with decreasing slopes, which decreased more nearly like √f than like f itself. There was no appreciable change in the slope on doubling the initial population to 200. Figure 1 shows a plot of 3 cases: N = 100, α = .01, and γ = .1, .05, and .01, respectively.

A second version of ADAM was run with the K_0 in the probability recipe defined as the average number of mutations in the population, instead of the minimum number. Those individuals having fewer than the average number of mutations (K_g < K_0) had a probability of γ(K_0 - K_g) of no children (if K_0 - K_g > 1/γ, they had no children deterministically), and a probability of 1 - γ(K_0 - K_g) of one child. The individuals with more than the average number of mutations had the same probability for extra children as the one defined above. This version required fewer renormings of the population and it led to a somewhat greater slope than the K_0 = minimum recipe. The graph was still linear. Figure 2 illustrates the two versions with the same parameters N = 100, α = .02, and γ = .1. The recipe with K_0 = average was used in all subsequent problems.



Fig. 1. Average number of mutations plotted against the number of generations for the three ADAM cases.

III—
EVE

In the second class of problems, we introduced sexual reproduction in the population, and also, what seems important, fluctuations in the manner in which the offspring received mutations from their parents. This problem was naturally called EVE. The initial parameters used were the same as in ADAM. The population acquired new mutations according to the probability α, as before. A random mating of the population N was then defined, resulting in N/2 pairs of individuals. (For N odd, the population was arbitrarily reduced by one to obtain (N - 1)/2 pairs.) Each pair then constituted a set of parents. The number of children from each set was again determined by the probability function γ(K_g - K_0). Here K_g = the total number of mutations possessed by the set of parents, and K_0 = the average total number of mutations of all the pairs of this mating.



Fig. 2. Average number of mutations plotted against the number of generations for the K_0 = minimum and K_0 = average recipes.

The children were produced in pairs. Thus if a set of parents had K_g > K_0, they had two children for certain and, with a probability given by γ(K_g - K_0 - n/γ), where n/γ ≤ K_g - K_0 < (n + 1)/γ, n = 1, 2, 3, ..., 2n extra children. If for the set of parents K_g < K_0, they had no children with the probability γ(K_0 - K_g), and two with the probability 1 - γ(K_0 - K_g). Again, if K_0 - K_g > 1/γ, they had no children.

The number of mutations acquired by each child was obtained under a binomial distribution centered about the average number of mutations of the parents. If the parents had a count of (x + y) mutations, the number for each child was obtained under the distribution centered at (x + y)/2, with its minimum at zero and maximum at (x + y). The number of mutations for each child separately was determined under this distribution. Thus a child could possibly obtain as much as the sum of numbers of mutations of its parents. This recipe



involving fluctuations in the inheritance would, we thought, speed up the rate of acquisition of mutations (compared to always giving the offspring (x+y)/2 favorable mutations).

The parents having died, the children became the new population. As before, they were classified in terms of counts of the number of individuals having each number of mutations. As before, the individuals had the chance α of acquiring new mutations, and were mated at random in N/2 pairs. These pairs then had children whose mutations were again determined from the binomial distribution, and these children constituted the population for the following generation, etc.

In our random mating, the sex of the individual was not distinguishable. No attempt was made to keep members of the same "family" from mating. Their number of mutations was not necessarily the same, since it was determined separately for each child under the probability distribution. (The identity of the family was lost once the members were classified according to their mutations.)

The norming of the increasing population was done in the same way as in ADAM: All categories were halved when the population doubled.
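A compressed sketch (ours) of one EVE generation under the basic recipe. The pairing and the binomial draw follow the description above; the correction for mutations held in common, introduced later as EVE-PQ, is not included.

```python
import random

def eve_generation(pop, alpha=0.02, gamma=0.1, rng=None):
    """One generation of the basic EVE recipe: mutation, random pairing,
    children in pairs, binomial inheritance about the parents' midpoint."""
    rng = rng or random.Random(0)
    pop = [k + (rng.random() < alpha) for k in pop]      # new mutations
    rng.shuffle(pop)                                     # random mating
    if len(pop) % 2:
        pop = pop[:-1]                                   # drop one if N is odd
    pairs = [(pop[i], pop[i + 1]) for i in range(0, len(pop), 2)]
    k0 = sum(x + y for x, y in pairs) / len(pairs)       # average over the pairs
    children = []
    for x, y in pairs:
        kg = x + y
        if kg >= k0:
            n_extra, frac = divmod(gamma * (kg - k0), 1.0)
            n_pairs = 1 + int(n_extra) + (rng.random() < frac)
        else:
            n_pairs = 0 if rng.random() < min(1.0, gamma * (k0 - kg)) else 1
        for _ in range(2 * n_pairs):                     # children come in pairs
            # each child draws its count from a binomial centered at (x + y)/2
            children.append(sum(rng.random() < 0.5 for _ in range(kg)))
    return children

pop = [0] * 100
for g in range(50):
    pop = eve_generation(pop)
    if len(pop) >= 200:              # renorm as in ADAM: halve every category
        counts = {}
        for k in pop:
            counts[k] = counts.get(k, 0) + 1
        pop = [k for k, c in counts.items() for _ in range(c // 2)]
print(sum(pop) / len(pop))           # average number of mutations after 50 generations
```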

The rate of acquiring mutations turned out to be much faster than in ADAM, and appeared to be exponential. In Fig. 3 we have plotted this rate on a semilog scale for four problems: N = 100, α = .02, and γ = .1, .05, .025, and .01, respectively. The reduction in acquisition rate with γ was somewhat similar to that in ADAM.

In these problems no attempt was made to keep the histories of the different mutations. We define α to be the rate of acquiring new mutations, but we divide the population only in terms of the numbers of individuals having a fixed number of mutations. The children too acquire mutations only under the distribution of the total count of the parents' mutations. Thus one might suspect that the exponential rate of acquisition could be due to a doubling of the identical mutations possessed by both parents.

A—
EVE-PQ

To correct for this, we computed an expected number v of mutations that the parents should have in common. This is given by v = (T_1 T_2)/S, where T_1 = the total number of mutations possessed by one parent, T_2 = the total number of mutations possessed by the other parent, and S = the total number acquired by the entire population. (The



Fig. 3. Semilog plot of the average number of mutations against the number of generations for the four EVE cases.

total S is accumulated as each individual in each generation has the probability α of acquiring a new mutation. If N_s is the average population in each generation, then after k generations S is approximately N_s · k · α.)

We allotted this number v to the children for certain and then played a game of chance for additional mutations, using the reduced binomial distribution centered at the midpoint of the total count of independent mutations possessed by the parents. In this manner we count, more correctly, the number in common only once.
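In code the correction is one line; the symbol names follow the text, but the function itself and the illustrative numbers are ours:

```python
def expected_common(t1, t2, s):
    """Expected number of mutations shared by two parents, v = T1*T2/S,
    where S is the total number acquired by the whole population."""
    return t1 * t2 / s

# children inherit v mutations for certain; the rest are drawn from the
# reduced binomial centered at the midpoint of the parents' independent mutations
v = expected_common(t1=30, t2=25, s=500)   # illustrative values, not from the report
```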

This method leads to a slower rate of acquisition; it is still exponential in the beginning but tails off to something like a quadratic function, as more mutations must of necessity be held in common. Results from four problems are plotted in Fig. 4 on a semilog scale. The



Fig. 4. Semilog plot of the average number of mutations against the number of generations for the EVE-PQ cases.

cases PQ1 and PQ2 have N = 100, α = .02, and γ = .1 and .05, respectively. The case PQ3 is the same as PQ2 except that the sample size was doubled, N = 200. It had a somewhat higher value for p, but the slopes are like those of PQ2. The last case, PQ4, had the same parameters as PQ3 except that α was cut to .01. This problem was run to 290 generations and showed a definite bending over of the curve to almost a linear rate of acquisition after 230 generations, as the population had more and more mutations in common.

B—
EVE-PM

In the two versions of EVE already discussed, the population mated at random uniformly. We next considered a version where the mating was not uniform, i.e., preferential in the following sense: We arbitrarily



divided the population equally into three groups ranked according to their number of mutations, i.e., the first had the individuals with the greatest number of mutations. We specified that 3/4 of the population in each group mate at random uniformly within their own group. The remaining 1/4 would mate at random with equal probability from either of the two remaining groups. For example, if we name the groups A, B, and C, an individual from group A would have a 3/4 chance of acquiring a mate from group A, and a 1/8 chance of a mate from each of the groups B and C. We called this problem EVE-PM. The EVE-PQ method was used to estimate the mutations in common and to count them only once.

The rate of acquisition of mutations should reflect this preferential mating. A comparison of the curve PM1 of Fig. 5 with PQ1 of Fig. 4, (both with the same parameters), shows indeed that the initial acquisition rate of mutations under preferential mating is much higher than under random mating. But the curve PM1 tails off very rapidly after 100 generations, and at 200 generations the mutation rate is almost the same for the two problems. This indicates that at this point most individuals have acquired most of the mutations available in the total population, so that preferential mating has the same effect as the uniform mating. Our small sample size of 100 in part causes this phenomenon to occur so soon.

The computing time goes up rapidly with the initial population; one problem of preferential mating with an initial sample of 400 was run to 96 generations. The result is shown in the curve PM2 of Fig. 5, with the same parameters as PM1 (except for the population size). The preferential mating has a greater initial effect in the larger population, but the slope of this curve too is beginning to decrease.

The mutation acquisition rate in all the EVE problems of sexual reproduction can apparently be divided into three stages:
* An initial exponential rate, as few mutations are held in common (compared to the size of the population).
* A rate, roughly quadratic, as more mutations are held in common by the parents. This number in common is approximated by computing the expected intersection of the numbers possessed by each parent, assuming the parents had acquired them independently.
* A terminal rate, almost linear, as most of the mutations in the population are in common. If all the mutations were in common, the subsequent rate of acquisition must be linear, since new mutations are obtained only at the rate α of new acquisitions, which is a linear function.



Fig. 5. Average number of mutations plotted against the number of generations for the preferential-mating cases PM1 and PM2.

C—
EVE-POS

In order to check these assumptions somewhat, a problem was run in which histories were kept of each new mutation. This was done by representing each new mutation as a bit position in a matrix of words in the computer. Each individual in the population and its children had their mutations recorded in such a matrix; that is to say, each mutation was specifically identified.

We called this problem EVE-POS. The sexual reproduction scheme was the same as before. The population was mated at random, and each set of parents had a probability of having extra children given by γ(K_g - K_0). In both K_g and K_0 the actual mutations in common were counted only once. This number was known for each mating, so no approximation for the expected intersection was needed.



If both parents possessed the i-th mutation, it was then given to each of the children. If neither parent possessed this mutation, it was not acquired by any of their offspring. If only one parent possessed this i-th mutation, each child had a probability of 1/2 of acquiring it.

The norming for this problem was different from before, namely, when the population doubled, each individual was given a half chance of surviving.

With this recipe for receiving mutations, any individual mutation can be lost to the population, since the parents die off in each generation. For example, the k-th mutation is initially acquired in the α recipe by one individual. If this individual and its mate have no children, the mutation is lost. If they have n children, there is a probability of (1/2)^n that none of the children gets it, in which case it is lost in the next generation. There is a chance that this particular mutation will be lost in each subsequent generation, although these probabilities become smaller. The k-th mutation is initially acquired only once, and by only one individual. The mutations that survived were "packed" into the bit positions of the matrices. This relieved the space limitation in the memory and allowed the problem to be run much further in time.

It was discovered that approximately 80% of the total number of new mutations acquired by individuals were "lost" after matings in subsequent generations. Thus the parameter α in problem EVE-POS has a different meaning from that in EVE-PQ. In EVE-POS it denotes the probability of an individual acquiring a new mutation. In EVE-PQ it denotes the probability of acquiring a mutation that survives and will eventually be acquired by the entire population.

Figure 6 shows the mutation rate for two cases of EVE-POS, plotted on a log scale. The parameters are α = .02, γ = .1, and N = 100 and 400, respectively. The case POS1 (N = 100) was run to 251 generations, and POS2 (N = 400) to 90 generations.

Note that the acquisition rate and the values of p are considerably smaller in POS1 than in PQ1, which has the same parameters. This is because of the different interpretation of the parameter α. A reasonable comparison with POS1 would be to run PQ1 and compute our v = (T_1 T_2)/S', where S' = .2 S, since about 80% of the mutations are lost.

The acquisition rate for POS1 becomes linear after about 100 generations. The relatively small sample size is a contributing factor. Some statistics were compiled on the distribution of the available mutations. They are given in Table I.

An estimate was made of the expected number of mutations held in common compared to the actual number held by an average set of parents. The expected number was computed assuming that each



Fig. 6. Log-scale plot of the average number of mutations against the number of generations for the EVE-POS cases POS1 and POS2.

parent had the average number p of mutations. Then v = p^2/S', where S' = the total number of surviving mutations. The actual number held in common was about 1.3 times this expected number after 251 generations.

In the problem with the larger sample size of 400 (POS2), the mutation acquisition rate remains approximately quadratic through 90 generations. At this point 18% of the mutations were held by at least 50% of the population, versus 36% for the population of 100. For this larger sample size, at 90 generations, separate distributions were kept of mutations acquired in the first 45 generations, versus those acquired in the last 45 generations. This data and some statistics on those mutations held in common are given in Table II.



TABLE I. Problem EVE-POS1: N = 100, α = .02, γ = .1.

After 90 generations, there were 55 surviving mutations out of 266 acquired mutations: 79% were lost. Distribution of the 55 mutations: Min. 15 (least number held by any individual); Aver. 21 (average number held; this is the number p that is plotted); Max. 27 (greatest number held by an individual). 36% of the mutations were held by at least 50% of the population; 1.8% of the mutations were held by the entire population.

After 251 generations, there were 130 surviving mutations out of 744 acquired: 85% were lost. Distribution of the 130 mutations: Min. 87; Aver. 94; Max. 101. 74% of the mutations were held by at least 50% of the population; 41% of the mutations were held by the entire population.

The actual number of mutations held in common was approximately 1.3 times the expected number.

TABLE II. Problem EVE-POS2: N = 400, α = .02, γ = .1. Distributions (Min., Aver., Max.) at 90 generations of the surviving mutations: the total, those acquired in the first 45 generations, and those acquired in the last 45 generations, together with the fractions of the mutations held by various fractions of the population. At 90 generations, the actual number of mutations held in common was approximately 2.4 times the expected number.



The above problems give some indication of the rate at which a population can acquire mutations, in terms of a finite and rather small sample size and in terms of the procedures which we adopted for acquiring and transmitting the mutations. These methods involved using rather large values of the parameters α and γ. The problem remains to find scaling factors. The computing time on the old IBM-7094 for the problem POS2 was over one hour for 90 generations of growth. The PQ code was much faster, on the order of minutes, but its parameter α was in effect much larger than the one in the POS code.

IV—
Summary

The problem ADAM with asexual reproduction gave a linear rate of acquisition of mutations. On reducing the parameter γ to f·γ, where f is a fraction, the acquisition rate was reduced by a factor more like √f than like f itself.

In problem EVE with sexual reproduction, the acquisition rate appeared to be exponential if the initial population was large enough. But with a small population, more of the same mutations were held in common by the parents. This caused the rate to change from an exponential to a "quadratic" one, and eventually to a linear one when most mutations were common to the majority of the population. The problem EVE-PQ involved approximating this number (our formula for v).

Preferential mating gave an initially pronounced increase in the acquisition rate over random mating, but this advantage was soon offset by the smallness of the population. In effect, as more mutations were held in common, the range of the distribution of mutations became narrower. After that, preferential mating was not much different from uniform mating.

The EVE-POS problem (where we kept a history of the mutations) gave us a measure of the distribution of mutations as a function of their age. It showed that most of the mutations initially acquired by one individual were lost in subsequent matings. This caused a redefinition of the probability α in computing the expected number of mutations held in common. This problem also showed that the actual number held in common was greater than the estimated number v, by a factor of about 1.3 for the sample size of 100 and 2.4 for the sample size of 400. This is not too surprising, since the expected size of the intersection assumes independent sampling, whereas the mutations are acquired by something more like a Markov process.



16—
The Notion of Complexity:
With W. A. Beyer and M. L. Stein (LA-4822, December 1971)

This report is a study of complexity per se in certain algebraical systems. Much subsequent work seems to have been stimulated by these results. (Author's note.)

Abstract

The notion of the arithmetic complexity |n| of an integer n is defined in terms of the minimum number of additions, multiplications, and exponentiations required to combine 1's to form n. The value of |n| is calculated for n < 2^10. n is called complicated if |n| > |n_1| for every n_1 < n. Of the first 19 complicated numbers, 14 are prime. A conjecture about a relation between complexity and entropy is proposed. Some computations are presented to support this conjecture.

I—
Introduction

In this report we discuss notions of complexity in some algebraic structures. These notions are also applicable to more general combinatorial situations that perhaps lack any algebraic pattern in the classical sense. We concentrate on a few special cases for which we studied and calculated a special notion of complexity. Essentially, we examined a special notion of complexity for ordinary integers with a little excursion on such a notion for integers modulo a prime.



The notion of complexity, in our view, is separate from, though associated with the idea of the amount of information or entropy of a system. We mention briefly a possible axiomatic approach to defining a real number called complexity for elements of a set or of a class on which certain operations are performed. These could be binary operations; our set could be a set of integers, and the operations could be addition, multiplication, and exponentiation, for example. It is this case that was examined on a computing machine and to which most of this report is devoted.

Another case would be a class of subsets of a given set, with allowed operations being the Boolean operations of union and intersection or union and complementation. One could add other operations, for example, the direct product of sets and also projection. This would correspond to allowing quantifiers in our theory. One can study a notion of complexity for vectors in a countable space or even in the continuum. An important study would be that of a relative complexity; that is to say, complexity of elements or "expressions" when the complexity of certain symbols is normalized to 1. In what has been sometimes called "speculation" on constants in physical theories, for example, the whole art seems to depend on the success of attempts to define some known important numbers, e.g., the dimensionless ratios M_proton/M_electron = 1836.11... and hc/e^2 = 137.1...,

by use of only a few artificially introduced constants which should be as "simple" as possible. (cf. the attempts by Eddington1 and some very recent ones by Good2 and Wyler.3 )

Considered "genetically," a mathematical theory resembles a tree in that one obtains, from a given number of symbols corresponding to "variables" and from a number of allowed operations, expressions that elongate by branching. The simplifications and abbreviations may then reduce the length of the expressions.

One could try to define complexity in a mathematical structure by postulating certain of its properties, somewhat like postulating properties of a measure.

Let the structure S consist of elements x, y, .... It may be finite or infinite. We have in the set S a number of, say, binary operations R_1, R_2, ..., R_n. We want to assign a number c(x) ≥ 0 to each element



x of S and to each R_i (i = 1, ..., n) so that the following properties should hold.

a. If z = R_i(x, y), then c(z) = c(R_i(x, y)) ≤ c(x) + c(y) + c(R_i), i = 1, ..., n.
b. For each element z, if z = R_j(x, y), we should have for at least one such case c(z) = c(x) + c(y) + c(R_j).
c. c(x_0) = c(x_1) = ... = c(x_n) = 1 for some preassigned elements x_0, ..., x_n in S.

Needless to say, one can define analogous desiderata for the case in which the operations are more general than binary ones.

Obviously, in the case to which our exercise is devoted, these postulates are satisfied. Moreover, they define the complexity uniquely if, as must be the case in general, the complexity was normalized for some elements. (In our case, we assume the complexity of the integer 1 to be equal to 0.) We hope to study this notion more thoroughly for the more general case and also to perform experiments to determine complexity functions for the case in which S is a class of sets. Ultimately, one would wish to discuss the complexity of genetic codes and biological organisms quantitatively.

("Integer" always means a positive integer.)

II—
Arithmetic Complexity of Integers

The arithmetic complexity |n| of an integer n is defined as the smallest number of operations +, ×, and exponentiation which combine 1's to form n. Thus |1| = 0; |2| = 1 since 2 = 1 + 1; and |5| = 4 since 5 = (1 + 1)^(1 + 1) + 1 and not fewer than four operations with 1's will form 5. Obviously, for integers a and b, |a + b|, |ab|, and |a^b| are each not more than |a| + |b| + 1. For an infinity of integers n, the relation |n + 1| = |n| + 1 holds.

For the purpose of calculating the complexity of some integers, all correct formulas (up to some number of operators) involving +, ×, exponentiation, and the number 1 were enumerated, using parenthesis-free notation, on a computer. It required one hour of computer time to enumerate the integers with complexity < 6. Ralph Cooper made the following observation. Each correct formula involving n (> 0) operators is the composition of two formulas, one formula with n_1 operators and one formula with n_2 operators, such that n = n_1 + n_2 + 1. One generates the integers of complexity n by first generating tables of integers of



complexity < n. One partitions n - 1 into n_1 + n_2 in all ways and combines the integers of complexity n_1 with the integers of complexity n_2 to produce integers of complexity not larger than n. This method is considerably more efficient than the previous method. Table I lists the complexity of all integers < 2^10.
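Cooper's observation translates directly into a small dynamic program. The following sketch is ours (the Laboratory code is not reproduced in the report); the function name, the exponentiation guard, and the MAX_OPS bound are our own choices.

```python
MAX_OPS = 30   # generous upper bound on the complexity needed below 2^10

def complexity_table(limit=2 ** 10, max_ops=MAX_OPS):
    """Return a dict mapping each n < limit to its arithmetic complexity |n|."""
    by_ops = [{1}]                         # by_ops[k] = integers of complexity exactly k
    best = {1: 0}                          # best[n] = |n|
    for k in range(1, max_ops + 1):
        new = set()
        for k1 in range(k):                # an optimal k-op formula splits as k1 + k2 + 1 ops
            for a in by_ops[k1]:
                for b in by_ops[k - 1 - k1]:
                    cands = [a + b, a * b]
                    if a > 1 and b < 10:   # a**b can stay below 2^10 only for small exponents
                        cands.append(a ** b)
                    for c in cands:
                        if c < limit and c not in best:
                            new.add(c)
        by_ops.append(new)
        best.update((c, k) for c in new)
        if len(best) == limit - 1:         # every integer 1..limit-1 has been reached
            break
    return best

print(complexity_table()[5])   # 4, since 5 = (1+1)^(1+1) + 1
```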

From the above construction, one sees that an upper bound ℓ_1(k) to ℓ(k), the number of integers of complexity k, is given by the solution of
\[
\ell_1(k+1) = \sum_{j=0}^{k} \ell_1(j)\,\ell_1(k-j), \qquad \ell_1(0) = 1 .
\]
The solution of this equation is
\[
\ell_1(k) = \frac{1}{k+1}\binom{2k}{k},
\]
which implies that
\[
\ell(k) \le \frac{2^{2k}}{\sqrt{\pi}\,k^{3/2}} + O\!\left(2^{2k} k^{-5/2}\right).
\]

Two additional forms of complexity have been considered and calculated.

a. Complement Complexity. To make complexity symmetric in 0's and 1's, we introduce a slightly different complexity, the complement complexity K(y|n). Define the complement operation C by C(x|n) = 2^n - 1 - x. K(y|n) is defined as the fewest operations of addition, multiplication, exponentiation, and complementation that combine 1's to form y. In the count of operations, the first three are given the value 1 and the last is given the value 0. Thus K(y|n) = K(2^n - 1 - y|n). Table II gives the values of K(y|n) for y < 2^10 and n = 10.

b. Modulo a Prime p Complexity. In addition to the operations +, ×, and exponentiation, the operation mod p is allowed, defined by mod_p(x) = x - p[x/p], where p is a fixed prime and [ ] denotes the greatest integer. Table III gives the modulo prime p = 137 complexity for integers < 137. Table IV gives the modulo prime p = 1009 complexity for integers < 1009.



TABLE I. Complexity of integers < 2^10.



TABLE II. Complement complexity of integers < 2^10.



TABLE III. Modulo prime p = 137 complexity of integers < 137.



TABLE IV. Modulo prime p = 1009 complexity of integers < 1009.



III—
Complicated Numbers

One defines n to be a complicated number if |n| > |n_1| for every n_1 < n. The complicated numbers < 2^10 are 1, 2, 3, 4, 5, 7, 11, 13, 21, 23, 41, 43, 71, 94, 139, 211, 215, 431, and 863. (Fourteen of these are prime.) Obviously, there is an infinity of complicated numbers. We propose the following conjectures.

a. There exists K such that all complicated numbers K_1 > K are prime.
b. Every sufficiently large integer n is the sum of k < log n complicated integers.
c. There exists c such that every sufficiently large n satisfies |n| < c + √(log n).
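As a cross-check of the list of complicated numbers quoted above, a short driver (ours) built on the complexity_table sketch given in Sec. II:

```python
# Recover the record-setting ("complicated") integers below 2^10,
# using the complexity_table function sketched earlier (our own code).
best = complexity_table()
complicated, record = [], -1
for n in range(1, 2 ** 10):
    if best[n] > record:
        complicated.append(n)
        record = best[n]
print(complicated)   # should begin 1, 2, 3, 4, 5, 7, 11, 13, ...
```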

IV—
Complexity and Entropy

Kolmogorov^{4,5} has introduced the notion of complexity of a finite string over a given alphabet. For simplicity, suppose the alphabet to be {0, 1}. Let A be an algorithm that transforms finite binary sequences into binary sequences. By an algorithm is meant any of the various equivalent concepts used in logic. For a binary string x, one defines the complexity by
\[
K_A(x) = \min_{A(p)=x} \ell(p), \qquad K_A(x) = \infty \ \text{if no } p \text{ exists such that } A(p) = x,
\]

where ℓ(p) denotes the length of the binary string p. Analogously, one defines conditional complexity. Let A(p, x) be an algorithm defined from pairs of binary strings to binary strings. Put
\[
K_A(y \,|\, x) = \min_{A(p,x)=y} \ell(p), \qquad K_A(y \,|\, x) = \infty \ \text{if no } p \text{ exists such that } A(p,x) = y.
\]

K_A(y|x) is called the conditional complexity of y with respect to x. Kolmogorov regards complexity as analogous to entropy. We make the following conjecture.



Conjecture. Let a discrete binary information source S in the sense of Shannon^6 be given, with entropy H = -p log p - (1 - p) log(1 - p), where probability(0) = p and probability(1) = 1 - p, 0 < p < 1. Let {x_1, x_2, ..., x_{2^n}} be the set of all binary strings of length n arranged in order of decreasing probability. Let k(n) be the least integer so that Σ_{i=1}^{k(n)} prob(x_i) ≥ r, where 1/2 < r < 1. Then asymptotically for large n,
\[
H \approx \frac{1}{k(n)} \sum_{i=1}^{k(n)} K_A(x_i \,|\, n). \tag{1}
\]

(In Eq. (1), K_A should be normalized so that when p = 1/2, (1/k(n)) Σ_{i=1}^{k(n)} K_A(x_i | n) = 1.)

In other words, the most likely sequences from S have average complexity approximately equal to the entropy of S.

In order to test the conjecture expressed in Eq. (1), we replaced K_A(x_i | n) by λK(x_i | n), where λ is selected so that when p = 1/2,
\[
\frac{1}{k(n)} \sum_{i=1}^{k(n)} \lambda K(x_i \,|\, n) = 1 .
\]

Graphs of H_1 = -p log p - (1 - p) log(1 - p) and H_2 = (1/k(n)) Σ_{i=1}^{k(n)} λK(x_i | n) for n = 10 and r = .75 are shown in Fig. 1.

V—
Complexity of N -Tuples of Integers

Matijasevic^7 has proved the following theorem. There exists a fifth-degree polynomial Q(y_1, ..., y_k; z) with integer coefficients such that any enumerable set m of natural numbers (for example, the set of prime numbers) coincides with the set of natural values of the polynomial Q(y_1, ..., y_k; a_m), where a_m is a certain number effectively constructed for the set m. From this result it follows that if one could



Fig. 1. Comparison of entropy H_1 = -Σ p_i log p_i and complement complexity H_2 as defined and discussed in the text.

discuss complexity of n-tuples of integers, then one could discuss the complexity of enumerable sets of natural numbers by equating such complexity to the complexity of the associated polynomial Q.



References

1. A. S. Eddington, Fundamental Theory (Cambridge University Press, 1946).

2. I. J. Good, "The Proton and Neutron Masses and a Conjecture for the Gravitational Constant," Phys. Let. 33A, 383-384 (1970).

3. A. Wyler, "Les Groupes des Potentiels de Coulomb et de Yukawa," Compt. Rend. Acad. Sci. Paris, 271, 186-188 (1971).

4. A. Kolmogorov, "Three Approaches for Defining the Concept of Information Quantity," Problems of Information Transmission 1, 3-11 (1965).

5. A. Kolmogorov, "Logical Basis for Information Theory and Probability Theory," IEEE Trans. Information Theory IT-14, 662-664 (1968).

6. C. E. Shannon and Warren Weaver, The Mathematical Theory of Communication (The University of Illinois Press, Urbana, 1949).

7. Ju V. Matijasevic, "Enumerable Sets are Diophantine," Soviet Math. Dokl. 11, 354-358 (1970).

Additional References Not Used in Text

1. E. L. Lawler, "The Complexity of Combinatorial Computations: A Survey," Proceedings of 1971 Polytechnic Institute of Brooklyn Symposium on Computers and Automata.

2. D. W. Loveland, "On Minimal Program Complexity Measures," ACM Symp. Theory of Computing, Marina del Rey, California, May 5-7, 1969.

3. P. Young, "A Note on Dense and Nondense Families of Complexity Classes," Math. Systems Theory 5, 66-70 (1971).



17—
Metrics in Biology, an Introduction:
With W. A. Beyer, M. L. Stein, and Temple-Smith* (LA-4973, August 1972)

The role of measures of similarity, or dissimilarity, is basic to the development of taxonomical structures. Such measures, including the more restrictive ones, namely metrics having relevance for biological phenomena, are considered in this report. Of particular concern are non-traditional metrics of potential utility in recognition and taxonomy-particularly in molecular taxonomy. (Eds.)

Abstract

The use of metrics in biology is discussed. Attention is given to metrics in pattern recognition, taxonomy, and especially molecular taxonomy. Various ways of constructing metrics between sequences for use in molecular taxonomy are discussed.

I—
Introduction

To compare quantitatively different organisms, complex molecules, or biological entities in general, a measure of dissimilarity is required. More generally, all objects which form the elements of study

*Supported in part by a National Institutes of Health postdoctoral fellowship, grant HD 42801, Northern Michigan University.



in the natural sciences can be compared as to the degree of their differences. The notion of distance, in mathematics, is not directly or easily applicable to such studies, although intuitively any useful measure of the degree of difference between objects would seem to convey a measure of distance between them. The notion of distance between two sets of points in a metric space or between functions defined on some space (e.g., on the real line) is usually considered by comparing the values at each point separately. The differences are then either added in absolute value or integrated in the case of a continuum, and one may take, instead of sums of absolute values, the square root of sums of squares of the differences, etc. This, however, refers to numeric objects (sets, functions, operators) as rather fixed or rigid entities and does not in general involve moving or transforming one or both in order to obtain as close proximity of "fit" as possible. Obviously if one wants to compare two given different organisms, even purely geometrical organisms only, one tries to place them in positions where the comparison is made with real "corresponding" parts.1 Mathematically, this means that one looks for the distance between two sets, modulo a class of transformations. This class can, but need not necessarily, form a group.

In this report we shall recall the elementary notions about metric spaces; we shall then redefine distances and "pseudo distances" between sets or, what is particularly important, between those classes of sets defining the entities one deals with in the natural sciences. The stimulus for this general discussion stems in part from the interesting work of Fitch and Margoliash2 and others3-6 on reconstruction of evolutionary trees from the data on the amino acid sequence of certain proteins. It also comes from certain unrelated studies of the general formulations of the problems of "recognition of patterns" and "artificial intelligence studies" which were undertaken by S. Ulam, R. Schrandt, and J. Mycielski.

In biological taxonomy there are attempts to define such mathematical concepts as subspaces and neighborhoods within the space of all organisms, yet the direct application of the more general concepts of metric spaces to this apparent mathematical area of natural science has met with only limited success.7 The work of Sokal and Sneath8 exemplifies one of the more successful attempts to extract evolutionary measures or distances from numeric or phenotypic taxonomy. These studies were often concerned more with statistical analysis of the variables being analyzed than with more fundamental mathematical considerations.

The advent of modern molecular biology, and with it the availability of comparative protein sequence data, has renewed interest in



numerical taxonomy on the part of both biologists and mathematicians.9-11 This is largely because the data are simple enough to permit tractable evolutionary distance calculations, although the simple nature of the 23-element protein space or four-element DNA genetic space may be misleading. The biological interpretations of this new molecular taxonomy have raised controversies about our understanding of evolution.12,13

Because of the current interest in molecular taxonomy and morphological distances in general we have outlined in this report the mathematical concepts of metric spaces and distances which may be applicable to these areas. This outline is then followed by a discussion of the protein sequence problem.

II—
Dissimilarity Coefficients and Metrics

Let P be a set of objects. Following Jardine and Sibson (Ref. 14, pp. 77-78), one says that a function ρ from P × P to the real line is a dissimilarity coefficient if it satisfies the following requirements:
1. ρ(p, q) ≥ 0 for all p, q ∈ P,
2. ρ(p, p) = 0 for all p ∈ P,
3. ρ(p, q) = ρ(q, p) for all p, q ∈ P.

Sometimes one requires
4. ρ(p, q) = 0 implies ρ(p, r) = ρ(q, r) for all r ∈ P (evenness), or
5. ρ(p, q) = 0 implies p = q for all p, q ∈ P (definiteness).

A dissimilarity coefficient which also satisfies
6. ρ(p, r) ≤ ρ(p, q) + ρ(q, r) for all p, q, r ∈ P (triangle property)
in addition to properties 1-5 is called a metric. Intuitively, it would seem that any measure of distance should satisfy the triangle property. The triangle property is essential for relating meaningful topological notions to properties defined by distance. We should note here, however, that given any assignment of a "semidistance" satisfying only the first five properties, one can obtain a metric from it by the following procedure: given two points p, q, one considers all possible finite chains



from p to q, continuing for example through p, x_1, x_2, ..., x_n, q, and defining the distance from p to q as the minimum sum of the lengths of the chains:
\[
\rho'(p,q) = \min_{[n;\,x_1,\ldots,x_n]} \Bigl\{ \rho(p,x_1) + \sum_{i=1}^{n-1} \rho(x_i,x_{i+1}) + \rho(x_n,q) \Bigr\}. \tag{1}
\]
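On a finite set, the chain construction of Eq. (1) is an all-pairs shortest-path computation; a minimal sketch (ours) using the Floyd-Warshall recursion on a dissimilarity matrix:

```python
def chain_metric(rho):
    """Turn a symmetric dissimilarity matrix rho (properties 1-3) into a
    distance satisfying the triangle property, as in Eq. (1): the new
    distance is the cheapest chain p, x1, ..., xn, q."""
    n = len(rho)
    d = [row[:] for row in rho]              # work on a copy
    for r in range(n):                       # allow chains passing through point r
        for p in range(n):
            for q in range(n):
                if d[p][r] + d[r][q] < d[p][q]:
                    d[p][q] = d[p][r] + d[r][q]
    return d

rho = [[0, 5, 1],
       [5, 0, 1],
       [1, 1, 0]]
print(chain_metric(rho)[0][1])   # 2: the chain through the third point is shorter
```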

Sometimes it is useful to require that an additional property be satisfied:
7. ρ(p, q) ≤ max(ρ(p, r), ρ(q, r)) for all p, q, r ∈ P (ultrametric inequality).

The ultrametric inequality is important in the theory of p-adic numbers and valuation theory.15 Its relevance to biology is brought out by the following. 14

"The strongest assumption about evolutionary rates which can be made is that they are constant. On this assumption the dissimilarities between present-day populations would be monotone with the times since their divergence. They would therefore be ultrametric, since the times of divergence of populations in an evolutionary tree form an ultrametric. The fact that the dissimilarities between present-day populations are rarely ultrametric refutes the hypothesis of constancy of evolutionary rates in terms of known measures of dissimilarity."

The following geometric interpretation can be given to the ultrametric inequality: every triangle is isosceles and its base has length less than or equal to that of the equal sides.

III—
Metrics in the Space of Closed Sets, Hausdorff Distance, and Applications

One of the more general metrics is the Hausdorff distance. (See Ref. 16, pp. 166-172; Ref. 17, pp. 214-224; and Ref. 18, pp. 20-32.) This is a definition of distance between closed sets of points in a metric space. We assume here compactness of the underlying space P (this means that given any sequence of points x_n of P there exists a subsequence of these points converging to some point of P). Given two closed sets, A and B, one defines
\[
\rho(A,B) = \max_{x\in A}\,\min_{y\in B}\,\rho(x,y) \;+\; \max_{x\in B}\,\min_{y\in A}\,\rho(x,y). \tag{2}
\]



This distance satisfies the triangle property. Under this definition of distance the class of all closed sets in P becomes itself a compact metric space, denoted by 2^P. One can now iterate our definition and consider sets in the space 2^(2^P). This means that we consider classes of sets. Again, one can define a distance between any two of these classes with the use of the Hausdorff formula. This will be important in the sequel because when we speak of properties of sets we really consider classes of sets having a given property. So, for example, when we speak of sets of points on a screen "looking" like a letter A, we mean the aggregate or the class of such sets, distinguishable from the class of sets which "looks" like a letter B. In this way when we define a distance between objects independently of their size and orientation, for example, we have to consider, given a set, the class of all sets obtained from it by translations and rotations and also by changes of scale; then, given two different objects we are led to two classes of sets. The degree of their similarity or a quantitative measure of their differences should take into account possible changes of scale and position. If we now consider these two classes of sets and take the Hausdorff distance between them we do in essence the following. Given

a set of the first class we look at the set in the second class that is as close to it as possible; we then take a maximum of this with respect to all choices of the first set. We then perform it symmetrically the other way around by taking a set from the second class, etc. The sum (or the maximum) of these two numbers gives a measure of distance between the "letter A" and the "letter B."
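For finite point sets the two max-min terms of Eq. (2) can be computed directly; a small sketch (ours), using the ordinary Euclidean plane distance as the underlying ρ and illustrative point sets:

```python
from math import dist   # Euclidean distance between two points (Python 3.8+)

def hausdorff_sum(A, B):
    """max_{x in A} min_{y in B} rho(x,y) + max_{x in B} min_{y in A} rho(x,y), as in Eq. (2)."""
    forward = max(min(dist(x, y) for y in B) for x in A)
    backward = max(min(dist(x, y) for y in A) for x in B)
    return forward + backward

A = [(0, 0), (1, 0)]
B = [(0, 0), (0, 3)]
print(hausdorff_sum(A, B))   # 1.0 + 3.0 = 4.0
```

To compare classes of sets (for example, all translates and rotations of a shape), one would take the same max-min construction with the inner distance replaced by this set distance, exactly as the iteration to 2^(2^P) described above suggests.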

In unpublished notes W. A. Beyer and S. Ulam have compiled possible methods of measuring distances between sets and certain theorems which should be proved about the distances.

IV—
Metrics for Molecular Taxonomy

In molecular taxonomy one considers sequences of amino acids defining the same protein* in various species. This means, mathematically, a class of codes each consisting of a sequence of symbols or words with 23 possible symbols. The encoded information gives the physical, chemical, and structural properties of the protein. The length of the sentences for the protein cytochrome c, e.g., is about one hundred words, and the first task is to define a distance function between any two such sequences of symbols, each assuming values from 0 to 22.2,4

* The phrase "same protein" means a class of proteins all of which perform the same biochemical function, and are by implication evolutionarily related.



This distance would then give a measure of dissimilarity. We shall, in this section, discuss the problem somewhat more generally. We may assume, for simplicity, that the symbols assume only two values: 0 and 1. We thus have a space of all sequences of this sort, of variable length, and we try to define a notion of distance between them under various postulates as to the equivalence or indistinguishability between some sequences. In other words, we shall assume, given a sequence, a class of other sequences "equivalent" to it and give definitions of distance between such classes. This is analogous with the illustration given above on classes of objects in the plane.

In mathematical studies, given two sequences of 0's and 1's, one may define the distance variously. For example, given α = [x_1, x_2, ..., x_n] and β = [y_1, y_2, ..., y_n], as
\[
\rho(\alpha,\beta) = \sum_{i=1}^{n} |x_i - y_i|, \tag{3}
\]
or
\[
\rho(\alpha,\beta) = \Bigl[\sum_{i=1}^{n} (x_i - y_i)^2\Bigr]^{1/2}. \tag{4}
\]

Still another way, suggested for coding problems by Hamming,^{19} is
\[
\rho(\alpha,\beta) = \sum_{i=1}^{n} \bigl[1 - \delta(x_i, y_i)\bigr], \tag{5}
\]
where δ(x, y) is the Kronecker delta function.

where 6(x, y) is the Kronecker delta function. We note that this metric is equal to that defined by Eq. (3) only for binary sequences. This will be of value later when considering the problem of protein sequences which are formed from the 23-symbol amino acid space.

We might, however, assume that the given sequences of 0's and 1's are written not linearly but on the circumference of a circle, and that we can arbitrarily rotate this circle rigidly, so that each sequence of a given length n is equivalent to n - 1 other sequences. The definition of distance, then, would concern a distance between classes of equivalents.

This is quite a general situation. In mathematics one defines a distance between two functions f(x) and g(x), for example, as follows:
\[
\rho(f,g) = \max_x |f(x) - g(x)|, \tag{6a}
\]
\[
\rho(f,g) = \int |f(x) - g(x)|\,dx, \tag{6b}
\]



or
\[
\rho(f,g) = \Bigl[\int \bigl(f(x) - g(x)\bigr)^2\,dx\Bigr]^{1/2}, \tag{6c}
\]
etc. We may, however, wish not to distinguish between functions which are obtained by shifting one from another. In this case we have to define distance between classes of functions, perhaps using the Hausdorff metric. Also consult the work of Marczewski and Steinhaus.^{20}

We shall now consider still different definitions of distance between two sequences of 0's and 1's. One definition could be
\[
\rho(\alpha,\beta) = \min_{n,m,n',m'} (n + m + n' + m'), \tag{7a}
\]
where n, m, n', and m' are defined by
\[
(T_1)^n (T_2)^m\,\alpha = (T_1)^{n'} (T_2)^{m'}\,\beta. \tag{7b}
\]

Here we allow two types of transformation, or two kinds of "steps." T_1 consists of changing a 0 into a 1 or vice versa. T_2 consists of a deletion of a symbol anywhere in the sequence and a subsequent contraction of the rest to close the gap. Given two sequences, one may define as a distance the minimum total number of steps performed on one or both of these sequences so as to bring them into identical form. As an example, let α = [010101010101] and β = [101010101010]. Then by Eq. (3) or (5) we would obtain
\[
\rho(\alpha,\beta) = 12, \tag{8a}
\]

since all places have different values, while by Eq. (7) we would have
\[
\rho(\alpha,\beta) = 2, \tag{8b}
\]
since by deleting the first symbol in α and the last in β one obtains identical sequences.
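One way (ours, not the report's program) to evaluate the distance of Eq. (7): since a deletion applied to one sequence plays the role of an insertion in the usual edit-distance recursion, a standard dynamic program over prefixes gives the minimum number of T_1 and T_2 steps.

```python
def transformation_distance(a, b):
    """Minimum number of T1 (change a symbol) and T2 (delete a symbol) steps,
    applied to either sequence, needed to make the two sequences identical."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # delete the remaining symbols of a
    for j in range(n + 1):
        d[0][j] = j                     # delete the remaining symbols of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            change = 0 if a[i - 1] == b[j - 1] else 1   # one T1 step if the symbols differ
            d[i][j] = min(d[i - 1][j] + 1,              # T2 applied to a
                          d[i][j - 1] + 1,              # T2 applied to b
                          d[i - 1][j - 1] + change)
    return d[m][n]

print(transformation_distance("010101010101", "101010101010"))  # 2, as in Eq. (8b)
```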

If one considers mutations in a chain of DNA, and if the amino acid sequences defining a protein are considered to be sequences of the special type defined above, then the distance as constructed above may correspond to the number of mutations necessary to transform one sequence into the other, or both into an ancestral one. This mutational-transformation approach has been applied by Reichert and Wong^{21} to protein and RNA sequences. Mathematically, this function of a pair of sequences is of certain combinatorial interest. It is not obvious a priori whether, given at random two such sequences of n binary symbols, the distance between them as computed by the algorithm above will be, on the average, a linear



function of n. It is clear that this average will be less than n/2. It is also of interest to consider infinite sequences and diverse definitions of distances between them.

One can allow a number of transformations of sequences which lead from a given one to sequences which we consider equivalent among themselves. Given such a division into classes, one can define the distance function between classes à la Hausdorff, as indicated above, starting with a given notion of distance between individual sequences.

Another distance-type function which is applicable to the sequence problem can be defined as follows: given two sequences, we consider the number of 1's in each. We take the absolute value of the difference between them. Next we consider the number of 1's followed by 0's, compare that number in the two sequences, and take the difference. We do the same for 0's followed by 1's, then 1's followed by two 0's, etc. We then add the numbers. This type of "Markov distance" gives us perhaps an idea of the "visual" distance between the two given sequences.
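A sketch (ours) of this "Markov distance." The text leaves the family of patterns open ("etc."), so as one concrete choice we compare the counts of every binary word up to a fixed length:

```python
from itertools import product

def pattern_count(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq."""
    return sum(seq[i:i + len(word)] == word for i in range(len(seq) - len(word) + 1))

def markov_distance(a, b, max_len=3):
    """Sum of |count_a(w) - count_b(w)| over all binary words w up to length max_len."""
    dist = 0
    for k in range(1, max_len + 1):
        for word in map("".join, product("01", repeat=k)):
            dist += abs(pattern_count(a, word) - pattern_count(b, word))
    return dist

print(markov_distance("010101010101", "101010101010"))   # 2 for this pair
```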

The last two distances have considerable appeal for the molecular sequence problem. The sequence transformation metric defined by Eq. (7) would seem to have a direct biochemical interpretation, as pointed out above. However, for the nonbinary sequences (such as proteins defined in the 23-symbol amino acid space) the direct interpretations of the different transformation operators as the analogs of the physical mutations become more difficult. This is partly because the physical events take place in 4-symbol RNA space and are not always simple* functions of these noncommuting operators.21

The "Markov distance" is also of considerable interest inasmuch as it appears to be a measure of the overall visual similarity of the sequences. This may be what is needed since in itself the sequence is not the object of ultimate interest but rather, as in the case of proteins, it is the three-dimensional structure which is, or chemical properties which are, encoded in the sequence.

It was in the light of the above considerations that a new sequence metric was defined. It began with an idea of Fitch.^9 Fitch's original proposal for detecting sequence homology was defined as follows. Let α = (X_1, ..., X_N) and β = (Y_1, ..., Y_N) be two sequences of amino acids, each of length N. Let r_1(X, Y) be some measure of the distance from amino acid X to amino acid Y. Put
\[
\rho_n(\alpha,\beta) = \sum_{\ell=0}^{N-n} \sum_{k=0}^{N-n} \;\sum_{j=\ell+1}^{\ell+n} r_1\!\left(X_j, Y_{k+j-\ell}\right) \;-\; n\,\bar p\,(N-n+1)^2, \qquad 1 \le n \le N. \tag{9}
\]

* For example, genetic duplications, inversions, and the frame shifts as viewed in the protein space.



Here n is what is thought to be a statistically important subsequence size. The second term is the expected value of the measure, assuming nonhomologous or random sequences with an average element probability of p̄. Our new sequence metric can be defined in a related manner. Put
\[
\tau(\alpha,\beta,n) = \sum_{\ell=0}^{N-n}\; \min_{0\le k\le N-n}\; \sum_{j=\ell+1}^{\ell+n} r_1\!\left(X_j, Y_{k+j-\ell}\right), \tag{10a}
\]
and
\[
\rho'(\alpha,\beta) = \Bigl[\sum_{n=1}^{N} \tau(\alpha,\beta,n)^2\Bigr]^{1/2}. \tag{10b}
\]
Then the metric is
\[
\rho(\alpha,\beta) = \max\{\rho'(\alpha,\beta),\; \rho'(\beta,\alpha)\}. \tag{10c}
\]

This metric has a number of potential advantages, including the fact that it can be applied to sequences of varying length, although no proof exists as to the triangle inequality for such cases. It also can give a measure of the degree of redundancy (subsequence duplications within or among the sequences). In a subsequent paper^{22} we shall study this metric in greater detail.

V—
Remarks

a. One of the major uses of distances in biology is in cluster analysis and evolutionary tree construction. J. A. Hartigan (unpublished notes) has pointed out the following objection: pairwise distances are a more sophisticated form of dissimilarity judgment than clusters, and so it may be inappropriate to use them to compute clusters. However, in tree construction, where one wants to estimate the lengths of branches, the distance concept is useful, as it is in other situations where one wants an estimate of the distance between clusters.

b. Another quantity which might be used in place of distance is a quantity σ(p,q), to be thought of as relating to the probability of transition from p to q. σ(p,q) should be a mapping from P × P to [0,1] which satisfies

1. 0 ≤ σ(p,q) ≤ 1,
2. σ(p,q) ≥ σ(p,r) σ(r,q) for all p, q, r ∈ P.

Development of a theory of such a function might be worthwhile.

References

1. E. S. Smirnov, "Mathematische Studien über Individuelle und Kongregationen Variabilität," Verh. 5 Intern. Kong. Vererbungswiss. 2, 1373-1392 (1927).

2. W. M. Fitch and E. Margoliash, "Construction of Phylogenetic Trees," Science 155, 279-284 (1967).

3. T. H. Jukes, Molecules and Evolution (Columbia Univ. Press, New York, 1966).

4. M. O. Dayhoff, Atlas of Protein Sequence and Structure (National Biomedical Research Foundation, Silver Spring, Md., 1969).

5. D. E. Kohne, "Evolution of Higher Organism DNA," Quart. Rev. Biophys. 3, 327-381 (1970).

6. M. Goodman, J. Barnabas, G. Matsuda, and G. W. Moore, "Molecular Evolution and the Descent of Man," Nature 233, 604-613 (1971).

7. E. Mayr, Principles of Systematic Zoology (McGraw-Hill, New York, 1969).

8. R. R. Sokal and P. H. A. Sneath, Principles of Numerical Taxonomy (W. H. Freeman and Co., San Francisco, 1963).

9. W. M. Fitch, "An Improved Method of Testing for Evolutionary Homology," J. Mol. Biol. 16, 9-16 (1966).

10. T. Uzzell and K. W. Corbin, "Fitting Discrete Probability Distributions to Evolutionary Events," Science 172, 1089-1096 (1971).

11. M. Kimura, "Evolutionary Rate at the Molecular Level," Nature 217, 624-626 (1968).

12. B. Clarke, "Selective Constraints on Amino-Acid Substitutions During the Evolution of Proteins," Nature 228, 159-160 (1970).

13. T. H. Jukes and J. L. King, "Deleterious Mutations and Neutral Substitutions," Nature 231, 114-115 (1971).

14. N. Jardine and R. Sibson, Mathematical Taxonomy (John Wiley and Sons, New York, 1971).

15. G. Bachman, Introduction to p-adic Numbers and Valuation Theory (Academic Press, New York, 1964).

16. F. Hausdorff, Set Theory (Chelsea Publishing Co., New York, 1957).



17. K. Kuratowski, Topology, Volume I (Academic Press, New York, 1966).

18. K. Kuratowski, Topologie, Volume II (Polish Scientific Publishers, Warsaw, 1961).

19. R. W. Hamming, "Error Detecting and Error Correcting Codes," Bell System Tech. J. 29, 147-160 (1950).

20. E. Marczewski and H. Steinhaus, "On a Certain Distance of Sets and the Corresponding Distance of Functions," Colloq. Mathematicum 6, 319-327 (1958).

21. T. A. Reichert and A. K. C. Wong, "An Application of Information Theory to Genetic Mutations and Matching of Polypeptide Sequences," to appear in J. Theor. Biol.

22. W. A. Beyer, T. Smith, M. L. Stein, and S. M. Ulam, "A Molecular Sequence Metric and Evolutionary Trees," submitted to Nature.



18—
On the Theory of Relational Structures and Schemata for Parallel Computation:
With A. R. Bednarek (LA-6734-MS, May 1977)

This report concerns a combinatorial study of the relations between compositions of transformations and the "projective algebras" that form foundations of mathematical logic. It also contains proposals to utilize such ideas for parallel computation machines. (Author's note.)

Abstract

This report outlines an area of work which, though so far theoretical, indicates some applications to the construction and operation of computers able to perform simultaneous operations, i.e., computations in parallel, particularly as these concern the composition and/or iteration of functions or, more generally, relations.

I—
Introduction

This report will deal with indications of the usefulness of parallel and serial machines for several areas of mathematical construction. [For example, such is the case in the Monte Carlo method, where a great number of independent samplings with low or medium arithmetical precision (not requiring very many decimals) may considerably speed up and enlarge the area of this application. In Monte Carlo procedures the results are gathered not by a small number of very precise (many-digit accuracy) calculations, but rather by obtaining statistics of very many independent or semi-independent histories of typical patterns or processes (to be developed in subsequent paragraphs). Thus, in the study of parabolic differential equations, a method of operating simultaneously on many channels would be useful. Similarly, in the discussion of hyperbolic differential equations, or of integral equations leading to or equivalent to such, we encounter a similar situation: outside the cone of influence there is an independent or semi-independent development of disturbances, and this can be studied all at once on a one-, three-, or multidimensional grid.]

Given two functions f and g, it would be nice to obtain the function f(g), all at once by a single order, including the case when the two functions are given numerically, graphically, or tabulated. The present machines compose them serially, point by point, with very great speed to be sure. If we had at our disposal a machine combining the digital and analogue features, we could envisage this most important and powerful algorithm of analysis to be effected as fast as possible with only one computer. After all, it is the facility of differentiating a composite function that increases the power of the infinitesimal calculus.

In particular, we outline a method of composing functions and relations using a novel, parallel, i.e., "all at once" computing scheme. We concern ourselves with a two-track theoretical investigation, namely: (1) studies of the role of composition, particularly as far as its simultaneous or parallel realization is concerned, in mathematical schemata. Here our concern is to encompass the standard operations of algebra and analysis in terms of composing point or set-valued functions (relations); (2) assuming a mechanism for effecting composition in a nonserial manner, we will study the advantages of such in computer organization and in the modeling of neurological-like phenomena such as iterative searches, memory building, "pattern recognition," processing of operations of the visual retina, etc.

Preliminary engineering studies (at the University of Florida) have established the feasibility of the fabrication of a computational element effecting instantaneous composition using existing solid state technology or some combination of optical and acoustical phenomena. However, our study is restricted to the theoretical implications of the existence of such a computer component.



II—
Functional and Relational Composition

We have observed that in some early work of A. Tarski27 there was given a set-theoretic formula for the composition R∘S of two relations which involved projections. In particular, R∘S = π{(R × X) ∩ (X × S)}, where π is the projection π(x, y, z) = (x, z). Technically speaking, there should be included in the formula the morphisms ((x, y), z) → (x, y, z) and (x, (y, z)) → (x, y, z). The following schematic should aid in the visualization of this formula.
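A minimal sketch in Python of Tarski's formula on a finite ground set X, with relations represented as sets of ordered pairs; the cylindrified sets live in X × X × X, and the projection keeps the first and third coordinates.

# Sketch of Tarski's formula R∘S = π{ (R × X) ∩ (X × S) } on a finite set X.

def compose(R, S, X):
    """Relational composition via cylindrification, intersection, projection."""
    cyl_R = {(x, y, z) for (x, y) in R for z in X}   # R × X, as triples
    cyl_S = {(x, y, z) for x in X for (y, z) in S}   # X × S, as triples
    return {(x, z) for (x, y, z) in cyl_R & cyl_S}   # projection π(x, y, z) = (x, z)

if __name__ == "__main__":
    X = {1, 2, 3}
    R = {(1, 2), (2, 3)}
    S = {(2, 1), (3, 3)}
    print(compose(R, S, X))                          # {(1, 1), (2, 3)}
    # agrees with the serial, point-by-point definition:
    print({(x, z) for (x, y) in R for (y2, z) in S if y == y2})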

Subsequent discussions with electrical engineers and materials scientists have established the physical realizability of this operation employing existing solid state technology or some combination of optical and acoustical phenomena in a three-dimensional homogeneous medium. Some such schemata will be discussed in Appendix B.

Motivated by the above, our aim is to study mathematical operations and algorithms which would utilize parallel work and simultaneous operations on computing machines, be it on a single large computer with such facility or on a number of smaller computers connected together and operating simultaneously and largely independently as far as arithmetical or Boolean operations are concerned. In particular, assuming the facility to effect the operations of composition of functions and relations in a parallel ("all at once") fashion, our concern is to encompass the standard operations of algebra and analysis in terms of composing point or set-valued functions.

In the following section we describe some specific mathematical investigations now under way. In Sec. IV we illustrate some of the possible implications and/or applications of the anticipated results.



III—
Some Studies of Relational Structures

There are basically two directions in our study. The interconnection between composition and projection suggests the first of those, namely, the algebraization of the operations of projection and the construction of product sets. This corresponds to a (finite) model of the logical quantifiers; that is, the "there exists" and the "for all" operators. These, in addition to the usual Boolean and arithmetical orders on present day computers, would enlarge, even though modestly at first, the scope of theoretical mathematical investigations amenable to study on electronic machines. The second direction concerns the operation of composition of functions and transformations and, more generally, the composition of relations. In addition, we propose to study the interrelation of those two approaches-the algebraic and combinatorial properties of these.

Delineated below are some specific researches which we have undertaken.

(a) In Ref. 12 there was initiated the study of projective algebras, an initial step in the development of algebraic versions of logic from which have evolved the cylindric and polyadic algebras. Extending the work in Ref. 12, McKinsey20 showed that every projective algebra is isomorphic to a certain kind of class of binary relations. There are many open problems (see, for example, Ref. 28) concerning these algebras, but in particular we are concerned with those that are isomorphic to specific relation algebras. Recently it has been observed that the projective algebra BN of all binary relations on a finite set can be generated by a single element. We have characterized those algebras corresponding to BN. The "projection" and "direct product" operators in the axiomatization of projective algebras can be replaced by a multiplication (composition). Is it possible to characterize those algebras that may be isomorphic to some class of topological relations, e.g., under suitable conditions, the "clopen" relations on a compact, totally disconnected space? Do there exist "Stone-type" theorems identifying spaces whose suitably associated algebras are isomorphic? Given a class of subsets, say, for example, in the plane, under what conditions can this family be projectively generated? Recently V. Faber and J. Larson13 established the existence of a countable collection of sets in the plane that could not be embedded in a finitely generated algebra.

McKinsey20 gave postulates for a calculus of binary relations. We have been able to show that every such structure is a complete atomic projective algebra and, conversely, that there can be introduced into particular complete atomic projective algebras a multiplication making them algebras of binary relations. This replacement of projections by a multiplication may result in a more amenable (for machine purposes) model of the first-order predicate calculus.

(b) At the other end of the algebraic spectrum we have started to investigate various semigroups (under composition) of functions or relations. The object of this line of research is to see how much information is carried by the multiplicative structure of the algebras in question; that is, to what extent can the Boolean operations on the objects be supplanted by the multiplicative substitute for the logical quantifiers? The results reported in Refs. 1 and 2 show, for example, that for some classes of spaces, the spaces are determined by these semigroups. In this connection we propose to investigate and extend the results in Refs. 4 and 26 concerning the finite generation of these semigroups. Connected with these problems are questions of the most economical and shortest expressions involving the minimum number of compositions of the base elements to produce all given ones. That such extensions are possible has been confirmed by our recently noticed theorem that all closed relations on the unit interval can be approximated by the composition of three fixed relations.

The applications of such algorithms may be interesting for the study of models of memory building (see Refs. 9 and 21) by constructing trees (with some loops) or coding visual and aural impressions by a process of composing existing codes in the nervous system or memory storage.

(c) Focusing on the operation of composition there is a class of problems concerning fixed points of transformations as well as properties of their iterates.* Given the possibility of simultaneous or parallel effectuation of composition there is open an entire avenue of investigation concerning such questions not only for transformations but for multivalued transformations; that is, relations. As observed in Ref. 28, computations on existing computers are particularly well suited for the display of asymptotic properties of iterates. However, to tabulate only the interesting points it would be preferable to have a visual evaluation of the iterated properties of the many (or all) points. We propose to study the implications of the possibility of nonserial composition for investigations, qualitative as well as quantitative, of fixed-point problems, mixing phenomena in hydrodynamics, topological features of force fields, etc.

(d) Two relations R and S on a set X are product-isomorphic if there exists a bijection φ : X → X such that the transformation

* Schemata for such were discussed with John Pasta some years ago.



(x, y) → (φ(x), φ(y)) takes R onto S. Employing composition, this is equivalent to R = φ∘S∘φ⁻¹. Of course the original definition can be extended to n-ary relations. Is there an analogous compositional identity? It was noted in Ref. 28 that the notion of product isomorphism has an interesting relation to isomorphism in certain algebraic structures. In particular, if one associates with a group G the ternary relation {(x, y, z) | xy = z}, then two groups are isomorphic if and only if their associated relations are product isomorphic. We propose to investigate the possibility of associating with structures, such as groups, a binary relation (or relations) so that these representations would be product isomorphic. Such representations, coupled with the ability to compute rapidly via the compositional identity above, would provide a useful tool.

The concept of weakly product-isomorphic28 binary relations is equivalent to the assertion that R = φ∘S∘ψ⁻¹ for a pair of bijections φ and ψ. Numerous extensions of the above will be investigated. Recent investigations by A. R. Bednarek and Gian-Carlo Rota have revealed a connection between the concepts of weakly product-isomorphic functions or relations and flows.

A more extensive discussion of these concepts and problems is contained in Appendix A.

(e) Another problem to be investigated is that of the classification of "complexity" of sets under the usual operations of the type mentioned earlier; that is, Boolean, projection, and composition. Here we mean "complexity" in the sense defined in the paper by Beyer, Stein, and Ulam.5 Also of concern is the relation of this complexity to "entropy" of such sets and the connected combinatorial problems.

(f) As mentioned earlier, we have proved recently that all of the closed relations on the unit interval can be "approximated" by three such. Of course this approximation is dependent on the existence of an appropriate metric; in the case in question we used the Hausdorff metric. We are studying the question of possible metrics for various spaces of relations. For example, when there is an underlying metric, as there was above, on the space on which the relations are defined, then the Hausdorff metric is a natural choice. If, however, the subsets (relations) are subsets of a measure space, then the Steinhaus distance

m(A Δ B) / (m(A) + m(B)),

where Δ denotes the symmetric difference between sets, may be more natural. The particularization of the above (or possible alternatives) in the case when the underlying space is finite, and the relations are therefore directed graphs, will be studied.

IV—
Possible Applications of Results on Relational Structures

In this section we wish to indicate some of the directions for possible applications of results realized from the theoretical investigations outlined in Sec. III.

For example, in the last-mentioned item in the preceding section we mentioned the study of possible metrics. In the "recognition" of figures, some kind of "distance" must be used to examine similarity or analogy between them. The nervous-system conversion behind the retina might operate using some schemata involving such abstract distances; see, for example, Ref. 21.

In speculating on how nonserial (parallel) composition might play a role in effecting some of the standard operations of analysis, we make the following observation. Suppose X = [a, b] is a real interval. Let H = {(x, y) | x > y} ⊂ X × X ⊂ R × R. Now if f : X → X, then the composition f∘H is the set f∘H = {(x, y) | f(x) > y}. Given instant composition and, for example, an optical display of the same, if the luminosity of a set A ⊂ X × X is λ(A), then the integral ∫_a^b f dλ can be read off, up to normalization, from the ratio λ(f∘H)/λ(X × X). Studies concerning the further role of composition in integration (summation) are under way.
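A minimal sketch in Python of this observation, taking X = [0, 1] for simplicity (an assumption) and counting grid cells of the square in place of measuring luminosity.

# Sketch: reading an integral off the "area" of the composed relation
# f∘H = {(x, y) | f(x) > y}. With X = [0, 1], the fraction of lit grid cells
# approximates the integral of f over [0, 1] (f assumed to map into [0, 1]).

def integral_via_composition(f, n=400):
    """Approximate the integral of f over [0,1] by the relative area of f∘H."""
    lit = 0
    for i in range(n):
        x = (i + 0.5) / n
        for j in range(n):
            y = (j + 0.5) / n
            if f(x) > y:           # (x, y) belongs to f∘H
                lit += 1
    return lit / (n * n)           # λ(f∘H) / λ(X × X)

if __name__ == "__main__":
    print(integral_via_composition(lambda x: x * x))   # approximately 1/3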

In another direction, one can explore the relation of the facility to compose in a nonserial fashion to, among other things, the recent work of N. Metropolis and Gian-Carlo Rota18 on "significance arithmetic."

We speculate on some of the possible applications to problems of "information retrieval." To make the situation more concrete we employ a simple situation basic to some existing retrieval systems.



First, we assume the existence of a finite collection W = {w1, w2, ..., wN} of words and a finite collection D = {d1, d2, ..., dk} of documents, where each document di is a vertex of the N-dimensional cube; that is, di is a sequence of length N with its jth entry 1 or 0, indicating that word wj occurs or does not occur in the document. Of course we are not here concerned with how such a determination is made. A typical example is "key word in title" indexing of some collection of journal articles, with the totality of "key words" constituting the collection W.

Given this situation, there immediately come to mind two relations D and W, on the sets of documents D and words W, respectively. Namely, D ⊂ D × D, where (di, dj) ∈ D iff di and dj have a common nonzero entry, and W ⊂ W × W, where (wi, wj) ∈ W iff wi and wj occur together in some document.

Of course the composition D∘D [W∘W] yields all those pairs of documents [words] that are correlated to a third document [word]; that is, for example, (d, d') ∈ D∘D iff there exists a document d* such that (d, d*) ∈ D and (d*, d') ∈ D. The process can be iterated, with D^[1] = D, D^[2] = D∘D, and D^[n] = D^[n−1]∘D.

Given a word w* ∈ W, consider the composition W∘(w*, w*)∘W, which consists of all pairs (w, w') related through w*; that is, (w, w') ∈ W∘(w*, w*)∘W iff (w, w*) ∈ W and (w*, w') ∈ W. In general, given a subset W0 ⊂ W, we let ΔW0 = {(w, w) | w ∈ W0}. Then W∘ΔW0∘W = {(w, w') | there exists a w0 ∈ W0 such that (w, w0), (w0, w') ∈ W}.
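A minimal sketch in Python of these retrieval relations, with a purely hypothetical toy incidence of three documents over four key words; it builds the relations D and W, iterates D, and forms the composition W∘ΔW0∘W for a chosen word subset W0.

# Sketch of the retrieval relations. Documents are 0/1 incidence vectors over
# a word list (toy, illustrative data). compose(R, S) is the serial rule.

def compose(R, S):
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

docs = {                       # hypothetical incidence vectors over 4 key words
    "d1": (1, 1, 0, 0),
    "d2": (0, 1, 1, 0),
    "d3": (0, 0, 0, 1),
}

D = {(a, b) for a in docs for b in docs
     if any(x and y for x, y in zip(docs[a], docs[b]))}
W = {(i, j) for i in range(4) for j in range(4)
     if any(v[i] and v[j] for v in docs.values())}

# Iterated composition D^[n]: documents linked through chains of intermediaries.
Dn = D
for _ in range(2):
    Dn = compose(Dn, D)

# Pairs of words related through a fixed subset W0 of words: W ∘ ΔW0 ∘ W.
W0 = {1}
delta = {(w, w) for w in W0}
print(compose(compose(W, delta), W))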

Given the relations D and W, there is much interest in clusters, that is, in the maximal complete subgraphs of D and W. Many algorithms exist for the determination of these (Refs. 3, 6, and 23). To our knowledge none of them focuses on composition. What is suggested is: given a reflexive-symmetric relation R on a finite set X, can one find more efficiently those subsets C of X maximal with respect to C × C ⊂ R? Similarly, can searches in more sophisticated retrieval systems be conducted more efficiently given the facility to compose relations all at once? What role, if any, does relational composition play in the design of an adaptive retrieval system, that is, one of the form Q × G → G, where the "inputs" Q are questions and the "states" G are admissible file organizations, with the transition resulting in file reorganizations that take into account the "information" carried by the queries presented?

It appears that some of the lines of research delineated above make contact with the recent work of L. N. Cooper (Refs. 9 and 21) concerning the development of feature-detecting cells in the visual cortex, as well as the possible organization of animal memory and learning. These connections were discussed by S. M. Ulam and L. N. Cooper at a recent meeting on Quantum Biology, and it is expected that our researches will result in a strengthening of these connections.

We include here also an account of some work concerning simple preliminary experiments of pattern recognition. These were performed in 1970-71 by Robert Schrandt and Ulam in Los Alamos on one of the Los Alamos Scientific Laboratory computers.

It is proposed to extend these studies on a more general basis, using a variety of different notions of a metric distance between two sets of points in the plane or between two classes of such sets (or "pictures") on a screen.

It is speculated that analogous schemata may be operating in the nervous system, specifically, in the visual brain, on the layers behind the retina.

As will be seen later, the operation of composition of sets may play a role in the production of "examples" which are variations of one or a few impressions stored in the brain, to be compared with a new impression confronted by the nervous system (or by the computer).

A subsequent report will contain a discussion of distances to be used in purely mathematical investigations concerning the degree of a quantitative measure of analogy or similarity between mathematical structures, i.e., between relation sets, i.e., graphs, groups, and some other mathematical structures. The behavior of such distances as regards relational composition has been studied by A. Bednarek in his work on relation theory.

The following account summarizes work done in 1970-71 in Los Alamos by Robert Schrandt and Ulam.* The computations were done on the CDC 6600. Specifically, we experimented with schemata for the automatic recognition by an electronic computer of visual data, i.e., pictures such as presented on a screen or on the retina of an optical system. The idea involves "teaching by examples." The simplest instance considered by us was to present the computer with data describing a handwritten letter "a" in a large number of examples, then present the machine with a number of examples of the letter "b." After this, a letter was written again and the machine was to decide whether it was "a" or "b." The visual system of a living organism contains ways to abstract from size, from translation, and from rotations through small angles, at least. We should stress right away that rotations through larger angles, say 90°, do not yield equivalent impressions; for example, N and Z are equivalent by a 90° rotation but are perceived as different objects. The invariance then obtains

* We want to acknowledge also the interest and helpful suggestions of Professor Jan Mycielski.



only for "group neighborhood" small-angle rotations. A larger class of transformations which does not affect the notion of the same object is, of course, small deformations. [The notion of an object to be perceived, that is to say, a class of two-dimensional sets must involve abstraction from small changes, for the problem of recognition amounts to ways of distinguishing classes of two-dimensional sets when it comes to establishing visual memory (one-dimensional sequences for auditory impression, perhaps three-dimension ones for tactile ones)]. The main tool in our approach is the definition of a distance or, really, several distances between classes of sets. We shall indicate some plausible such metrics and speculate on the way these are coded and registered in the nervous system and the brain. Later on we shall speculate on iteration of such procedures, that is to say, families of classes which we can call ideas or notions of the first order for coding and registering those, etc.

In our simple-minded experiments we attempted to imitate the various versions of a handwritten letter as follows. An example of the letter a was written by hand and the points forming it were registered on a 64 × 64 grid in the computer in such a way that a unit square was touching the extremities of the set. Instead of writing a number of other examples by hand and putting the resulting pictures into the memory of the computer, which would be laborious and time-consuming, we write down, once and for all, two transformations of the unit square into itself, call them S and T, which are a little different from the identity transformation, and we apply in succession various compositions of these two, obtaining a number of transformations of the given first example. In our case, if we wanted, say, 128 examples, we would take all possible compositions of seven transformations S and T, e.g., STSSTST, TTSSTST, etc., each giving an image of the original set. Clearly, if one wanted, say, 1024 examples, one would take compositions of 10 of these transformations. If T and S differ sufficiently little from the identity, the resulting transformations are still "small" deformations. We have taken for T the transformation

x' = x + ε1·4y(1 − y),  y' = y + ε1·4x(1 − x),  with ε1 = 1/10 or 1/16,

and for S a second transformation of the same near-identity type, with a parameter ε2 = 1/8 or 1/2.
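A minimal sketch in Python of this generation of examples. The transformation T follows the form quoted above; S is written here as a hypothetical second near-identity map (its exact form in the original experiments is not fully recorded in this text), and the parameter values ε1 = 1/16 and ε2 = 1/8 are taken from the ranges mentioned.

import itertools

# Sketch: producing 2**7 = 128 "handwritten" variants of one point set by
# applying all compositions of seven copies of two near-identity maps.

EPS1, EPS2 = 1.0 / 16.0, 1.0 / 8.0

def T(p):
    x, y = p
    return (x + EPS1 * 4 * y * (1 - y), y + EPS1 * 4 * x * (1 - x))

def S(p):                      # hypothetical second small deformation
    x, y = p
    return (x + EPS2 * y, y + EPS2 * x)

def deformations(points, maps=(S, T), length=7):
    """All images of `points` under compositions such as STSSTST, TTSSTST, ..."""
    variants = []
    for word in itertools.product(maps, repeat=length):
        imgs = points
        for m in word:
            imgs = [m(p) for p in imgs]
        variants.append(imgs)
    return variants            # 2**length deformed copies of the original set

if __name__ == "__main__":
    letter = [(0.2, 0.2), (0.5, 0.8), (0.8, 0.2)]    # toy "letter"
    print(len(deformations(letter)))                  # 128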

It was rather surprising to watch the images of the letter a (and similarly of b) as they appeared after these transformations; they gave the impression of being handwritten, too, sometimes by a shaking hand, showing a variety of different styles.

We now want to discuss a way to code numerically each of the resulting sets of points and to explain our criteria for comparing a new example with our two sets of examples, to decide whether the new picture should belong to the first or to the second set. The unit square is divided successively into four, then 16, then 64, ... subsquares. We denote these by Q1, Q2, Q3, Q4, then by Q11, Q12, Q13, Q14, Q21, Q22, Q23, Q24, ..., and so on. Let Z1, Z2, ..., ZN be the sets of points corresponding to our examples, where N was, say, 128. Given a set Z, we associate with it a sequence of binary digits 0 and 1 as follows:

For each square Qi (and similarly for the smaller squares) we write a 1 or a 0 depending on whether Z has points in common with it or not. We continue in this way, obtaining a sequence which is "the characteristic function of the given set Z."

We shall now consider weights assigned to the successive coordinates of the sequence, giving more importance to the large squares and diminishing it successively for the ones corresponding to the smaller squares. This is to stress the "global" properties of the set Z more than the "smaller details." In our procedure we gave to the first four squares, Q1, ..., Q4, a weight of 1/4 each, to the next batch a weight of 1/16, and so on. We now define a distance ρ between two sequences coding the sets Zi and Zj as follows: if the sequences are x̄ = (x1, x2, ..., x128) and ȳ = (y1, y2, ..., y128), we put

ρ(x̄, ȳ) = Σ_i w_i |x_i − y_i|,

where the weight w_i is 1/4 for each of the first four coordinates, 1/16 for each coordinate of the next batch, and so on.
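A minimal sketch in Python of this coding and weighted distance, under the assumption that the weights continue as 1/4, 1/16, 1/64, ... for successive levels of subdivision (only the first two weights are stated explicitly above).

# Sketch: code a point set Z in the unit square by the 0/1 occupancy of the
# 4, 16, 64, ... subsquares, and compare two codes with level weights
# 1/4, 1/16, 1/64, ... applied coordinatewise.

def code(Z, levels=3):
    """Occupancy sequence of Z over levels of 2^L x 2^L subsquares."""
    seq = []
    for L in range(1, levels + 1):
        m = 2 ** L
        occupied = {(min(int(x * m), m - 1), min(int(y * m), m - 1)) for x, y in Z}
        seq.extend(1 if (i, j) in occupied else 0 for i in range(m) for j in range(m))
    return seq

def rho(cz1, cz2, levels=3):
    """Weighted coordinatewise distance between two occupancy codes."""
    dist, pos = 0.0, 0
    for L in range(1, levels + 1):
        count, weight = 4 ** L, 1.0 / 4 ** L
        dist += weight * sum(abs(a - b) for a, b in
                             zip(cz1[pos:pos + count], cz2[pos:pos + count]))
        pos += count
    return dist

if __name__ == "__main__":
    Z1 = [(0.1, 0.1), (0.6, 0.7)]
    Z2 = [(0.15, 0.12), (0.9, 0.9)]
    print(rho(code(Z1), code(Z2)))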

In this fashion we obtain two finite sets of points A and B corresponding to the examples of the letter a and the letter b, respectively, each a set of 128 points. Given now a "problem," that is to say, a new example of a letter written by hand, we have a single new point p in this metric space.

We can now compute the distances from p to all the points of A and the distances from p to the points of B. If the sum of the distances from p to the points of A is smaller than the sum of the distances from p to the points of B, the computer decides that "the point p is more like the letter a."
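A minimal sketch in Python of this decision step, assuming the pictures have already been coded as numerical sequences and that some code distance d (for instance the weighted distance sketched above) is supplied.

# Sketch of the decision rule: the new, already-coded picture goes to whichever
# class of stored examples has the smaller summed distance.

def classify(p, class_a, class_b, d):
    """p and the members of class_a/class_b are coded pictures; d is a code distance."""
    sum_a = sum(d(p, z) for z in class_a)
    sum_b = sum(d(p, z) for z in class_b)
    return "a" if sum_a < sum_b else "b"

if __name__ == "__main__":
    d = lambda u, v: sum(abs(a - b) for a, b in zip(u, v))   # stand-in distance
    A = [(1, 1, 0, 0), (1, 0, 0, 0)]
    B = [(0, 0, 1, 1), (0, 1, 1, 1)]
    print(classify((1, 1, 1, 0), A, B, d))                    # "a"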

In our experiments we have produced, as problems for the machine, by actual writing, a variety of letters a and b imitating different styles of handwriting, obtaining points p1, ..., pk. In each case the computer made a decision whether the letter was a or b. The results were over 80% correct!

The first experiment was based on the rather crude metric ρ1 defined above.

A more suitable metric, which we shall call ρ2, would be based on a more precise way of coding the visual picture by a numerical sequence. Instead of merely noting the presence (1) or absence (0) of the given set in any of our squares of the subdivision of the screen, one could write down a real number s, 0 ≤ s ≤ 1, describing the proportion of the number of points of the given set to the total number of points in the subsquare in question. In this fashion a picture or set Z would be coded by a sequence of real numbers s1, s2, ..., sN. Again, one should use weights for the sets of squares, larger ones for the more "global" or "gross" features of the set.

Still another distance, ρ3, could be defined to correspond to the "Hausdorff" distance between sets. This distance ρ_H, introduced by Hausdorff, between closed sets A, B in a compact metric space E is defined by

ρ_H(A, B) = max_{x∈B} min_{y∈A} ρ_E(x, y) + max_{x∈A} min_{y∈B} ρ_E(x, y),

ρ_E being the metric in E.

In our case ρ_E would be the ordinary (Euclidean) distance between points in our unit square. ρ3 would be defined by computing the distances between the sets of our two pictures in each of the squares of the subdivision, again giving weights to each square, larger ones to the larger squares, diminishing for the "small detail" squares.

Still another distance, ρ4, would be defined analogously but starting with the "Steinhaus distance" ρ_S instead of the Hausdorff one. The Steinhaus distance between sets A, B is based on the measure (in our case simply the number of points, of course):

ρ_S(A, B) = m(A Δ B) / (m(A) + m(B)),

Δ denoting the symmetric difference between the sets A and B.



To obtain ρ4 we would again compute the distances between the sets in each square of the subdivision separately, keeping track of the weights of the squares as above.
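A minimal sketch in Python of the two set distances just written down, for finite point sets within a single subsquare (the per-square weighting would be applied outside this routine); math.dist is the ordinary Euclidean distance.

from math import dist  # Euclidean distance between two points

# Sketch: the set distances underlying rho_3 and rho_4 for finite point sets.

def hausdorff(A, B):
    """rho_H as written above: the sum of the two one-sided max-min terms."""
    return (max(min(dist(x, y) for y in A) for x in B) +
            max(min(dist(x, y) for y in B) for x in A))

def steinhaus(A, B):
    """rho_S(A, B) = m(A Δ B) / (m(A) + m(B)), with m = number of points."""
    A, B = set(A), set(B)
    return len(A ^ B) / (len(A) + len(B))

if __name__ == "__main__":
    A = {(0.1, 0.1), (0.4, 0.5)}
    B = {(0.1, 0.1), (0.8, 0.9)}
    print(hausdorff(A, B), steinhaus(A, B))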

We plan to undertake the corresponding experiments with these "more precise" distances.

Our crude beginning attempts referred to two letters only. For "reading" a text, the computer would have to possess in its memory sets of examples for each letter, of course.

One can now speculate about the building of memory in a living organism--in the nervous system connected to retinal impressions. There might exist a "wiring" system of nerves and their connections behind the retina, allowing us to abstract from orientation, size, translation of the picture presented. In addition, we might perhaps venture to postulate a system of "deformation" of a given picture (object) by a mechanism not unlike our composition of two fixed transformations providing a set of equivalent impressions.

The way to compute a distance or degree of analogy could be based not on our numerical procedure but could perhaps utilize a system of finding an analogous set in a collection of sets by a method similar to the one which astronomers use when they scan a set of photographs of a region of the sky to discover a new or variable star: flipping through a number of these in succession, the eye, disregarding the repetitive pictures, reacts to a point which "jumps out" against a background of constant impressions.

This could be imagined as follows: the more "global" parts of the code evoke from the memory the places where these "trunks" of the nervous connections go. A set of stored pictures is then compared, modulo 2, with the impression received; that is, when the overlap of the pictures is "dark" (when the total intensity of the symmetric difference is sufficiently small), the picture is examined in more detail until it is finally recognized, or else it is put in the memory in a new place (together with all the "small deformations" of the new picture).

A more economical and efficient method of operating by the nervous system and the brain should be, of course, to keep in the memory just one example (or a few examples). Presented with a letter as a "problem" the brain can obtain very quickly from any example it has many deformed ones and compare the one to be recognized with this whole class in the manner described above. This avoids the loading of memory with unnecessary variants of a given "picture."

The next step is to devise, in an analogous manner, the recognition of classes of sets of pictures which are not variants of one of them. The following may serve as a case of the general problem. Given are, say, 10 pictures representing various animals: a dog, a cat, a horse, etc. Given is a set of points representing another animal. One is to decide whether this new picture belongs to this class K or whether it represents, say, a tree, of which we also have in the memory a number of examples forming a class. The problem deals obviously with variables of a higher class, families of sets, instead of sets individually.

The sets of each class have only some properties in common; they are not variants of just one set as before. The problem of recognition of a class would be attacked by considering functionals of sets. One set of functionals could be the Fourier, or rather Rademacher or Walsh, coefficients describing a two-dimensional set. These, by the way, should be weighted, as in our previous discussion. But there must be several or many other sets of functionals, including ones dealing with individual points, for example. The functionals whose sequence is put in the memory must overdetermine the sets which we consider and which our experience builds up gradually from childhood. Perhaps a hundred or more collections of sets of functionals are gradually acquired.

We shall now tentatively describe a possible mechanism of discerning whether a single set S should be put in the class of sets R or in the class U. For simplicity of explanation let us assume that the class R consists of two sets R1 and R 2 and the class U of sets U1 and U2 .

We examine the functionals F1, ..., FN for the sets R1 and R2 and find out which among them are the same or have close values for both; these form a subcollection of the sequence F1, ..., FN, call it F(1). Analogously for the sets U1 and U2; this might be the collection F(2). For our "problem," presented by the set S, we look at these functionals for the set S and find out whether the values for S are closer to those of the F(1) collection or to those of the F(2) collection. We make our decision accordingly. (If neither is sufficiently close, we decide that the set S has neither the "property" R nor the "property" U.)

In our next report we shall examine a possible way to scan this procedure, which, in the case where R and U contain a sizeable number of sets each, will be more "economical."

As a final example of a situation in which the type of computational facility we have alluded to might play a role, we outline a very general formulation which we label the "graph of mathematics." The iterative searches in this graph that might recognize "deep theorems," given the complexity of the graph, would involve the ability to make rapid computations in "parallel" searches over a great multitude of paths.

The idea is to consider a formal system of mathematics as a collection of groups of signs and symbols. Starting with a list of symbols and then selecting a system of axioms, for example Gödel's (really, for a first attempt, taking a much more restricted and simpler one), we start with axioms which are groups of signs. We have rules of combination of these and rules of deduction which are formalized. These are not written down, but given a single group of formulae we consider joining them to others by the rules of procedure, which include the symbols for the Boolean operators and, or, the quantifiers, and substitutions. Therefore our graph has the form of a "pair tree," that is to say, from pairs of such formulae, or from pairs of collections of symbols, we get new ones. In some cases, from a single one, by "mitosis," we get another one (by omission or restriction, or some suitable "erasures" allowed by our procedures). We obtain a graph which bears a resemblance to "genealogical" graphs. These graphs can be characterized by the property that, going "backwards" from single points, or single individuals considered as vertices, one obtains two parents, ancestors of the individual. A single person thus has at most two, sometimes only one, ancestor (allowing procreation by "mitosis"). These graphs form a natural generalization of the graphs which are called trees, for which a considerable theory exists. Much less is known about graphs defined in the above way. A theory of such "pair trees," in analogy to the existing body of knowledge about tree graphs, remains to be developed.

There are many problems which have been considered for trees. For example the reconstruction problem which has been solved for trees could be studied for pair trees. (A list of such problems will be appended later on.)

The graph of mathematics as we call it, is not the same as a genealogical tree since a given formula or a theorem can be derived by our procedures from different pairs of ancestors. (It could be considered as a pair tree with colors.)

It is possible to attempt to define the "value" or "interest" or "importance" or "depth" of a theorem or formula. This, roughly speaking, should have the following characteristics: some of the formulae are obtained only by a long train of successive pairs of ancestors and deductions; if they themselves happen to be relatively short, we consider them the more valuable. The really interesting ones are those which in their "neighborhoods"* have many others easily deducible from them. For example, the fixed-point theorem of Brouwer has this property (the consequences of the "deep theorem" need not easily lead back to the "important" or "deep" formula or theorem). One could try to define

* Having a pair tree, one can define a distance between any two formulae or two theorems by the length of the shortest connection between them through the chain of "ancestors." This will always be a metric. In this metric it makes sense to define "proximity" of a vertex to other vertices and to define what a neighborhood means.



what might be called a surprising formula or a surprising theorem. These could be ones that can be deduced by a perhaps complicated sequence of operations from the starting list of symbols and axioms, but are such that the path, or band of paths, leading to them comes from a rather remote, "sidewise" located collection upstairs.

V—
Relation Algebras: Some Examples

In the next to the last application of the preceding section attention was focused on a particular "distance" between classes of planar patterns (relations). While we have given several possible metrics for such objects, no systematic investigation of such spaces appears to have been made. At the algebraic end of the spectrum though, some attention has been paid to relational structures (pattern algebras); for example, the earlier mentioned calculus of relations of McKinsey,20 the relation algebras and algebras of relations of Tarski27 and others. To illustrate the character of these investigations, we summarize here results obtained relative to some of the more general of the relational structures, namely, semigroups of relations. Our focus throughout is on those questions of interest to anyone concerned with algebras of patterns, particularly those algebras including the iteration or composition of these patterns as one of the basic operations.

The family of all binary relations on a nonempty set X is a semigroup under the operation of composition. We denote this semigroup by Bx. There has been increased interest in these semigroups due in part to a resolution, by Montague and Plemmons,19 of a problem of Schwarz25 concerning the character of the maximal subgroups of Bx and due further, to the interesting results obtained in a topological setting by K. D. Magill, Jr. and his students.

In particular, Montague and Plemmons19 showed that every finite group is isomorphic to a maximal subgroup of some Bx. Later Plemmons and Schein24 (see Ref. 7 for a particularly elegant argument) extended this result to arbitrary groups.

In another direction, attention has been focused on the morphisms of BX. In 1964, Crestey showed that every automorphism of BX which preserves finite unions is inner. One year later Zareckii36 showed that every automorphism of the subsemigroup of BX consisting of all relations with domain and range equal to X is inner. Finally, Magill,16 and independently Gluskin,14 showed that every automorphism of BX is inner. Beginning with the work of Clifford and Miller8 and pursued by Magill and his students in the topological case, investigations of the endomorphisms of BX are under way.



Obviously, there are many interesting questions one may ask about BX, but the sampling of results above is intended only to indicate some of the possibilities. Furthermore, one could argue that all of this could be subsumed within the framework of transformation semigroups, since BX is isomorphic to the semigroup of all additive set functions of X that fix the empty set (define Φ : BX → Hom(2^X, 2^X) by Φ(R) = F_R, where F_R(A) = RA = {x | (x, a) ∈ R for some a ∈ A} for all A ∈ 2^X). This argument, however, loses some of its attraction, particularly when one attempts to choose a proper topology to insure, for example, that the subsemigroup of all closed relations on some "nice" space is isomorphic to the semigroup of all continuous selfmaps of 2^X.

At the other end of the spectrum, that is, when X is finite, too little attention has been paid to BX. It appears that, aside from some characterizations (Magill17), no systematic investigation of BX has been undertaken, and some very basic questions remain unanswered. (By way of illustration we include the multiplication table for B2, that is, the semigroup of all binary relations on a two-element set. We are grateful to M. Stein of the Los Alamos Scientific Laboratory for the computation of this table.)

Multiplication Table for B2



We do not mean to imply by the above that the semigroup of relations on a set is the proper relation or pattern algebra to study. Quite the contrary, while the calculus of relations of McKinsey may represent too stringent an axiomatization, the disregard of the Boolean operations, particularly the natural order induced by them, errs in the other direction. This judgment is confirmed to some extent by an examination of some of the characterizations of semigroups of relations (Magill17 ) which reveal the presence of an order defined in terms of the multiplication. Although considerable insights can result from a study of semigroups of relations we feel that the multiplicative structure must be complemented by some additional structure; for example, some order respected by the multiplication. A variety of such algebras will be investigated.

In yet another direction, almost no attention has been paid to topological relational structures; that is, to relational algebras topologized in such a way as to insure the continuity of the operations involved.

Of fundamental interest in any algebra of patterns are the questions of generation; that is, the construction of the patterns from basic units using admissible operations. We illustrate these ideas with some recent results on generators and embeddings for semigroups of relations. These results, most of which have not appeared in the literature, are suggestive of questions that may be asked for any structure purporting to be a "relational structure."

Recently E. Howorka has proved the following: Theorem (Howorka). If X is a set, then a relation R on X is of the form f⁻¹∘g, where f and g are functions, iff card R ≤ card X. As an immediate corollary we have that every binary relation on an infinite set is of the form f⁻¹∘g; that is, every relation is the composition of the inverse of a function and a function. In addition, Howorka15 observed, extending the results in Ref. 4, that the calculus of relations BN, that is, the calculus of all relations on a finite set of cardinality N, can be generated by a single element. This followed from the proposition below.

Theorem (Howorka). Let Bn denote the collection of all binary relations on the set X = {1, 2, ..., n}. Let R = {(1,2), (2,3), ..., (n−1, n)} and S = {(n, 1)}. The subsemigroup [R, S] of Bn generated by R and S under relational composition contains all of the atoms of Bn.

By observing that S = [R² ∪ (R∘Rᶜ) ∪ (Rᶜ∘R)]ᶜ, where Rᶜ denotes the complement of R in X × X, we then know that the relation R generates all of Bn under the Boolean operations and composition.
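As a quick illustration, the identity above can be checked mechanically for small n; the following is a minimal brute-force sketch in Python (the serial composition rule stands in for the parallel facility discussed in this report).

import itertools

# Brute-force check, for small n, of S = [R^2 ∪ (R∘R^c) ∪ (R^c∘R)]^c for
# R = {(1,2), ..., (n-1,n)} and S = {(n,1)} on X = {1, ..., n}.

def compose(R, S):
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def check(n):
    X = range(1, n + 1)
    full = set(itertools.product(X, X))
    R = {(i, i + 1) for i in range(1, n)}
    S = {(n, 1)}
    Rc = full - R
    rhs = full - (compose(R, R) | compose(R, Rc) | compose(Rc, R))
    return rhs == S

if __name__ == "__main__":
    print([check(n) for n in range(2, 8)])   # expected: all True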



Very little was known until recently about generators of finite semigroups and, in particular, about generators of BN. Recently E. Norris22 proved the following:

Theorem (Norris). The semigroup BX satisfies (*) below iff X is finite.
(*) For all R, S ∈ BX, R∘S = Δ (the diagonal) implies R⁻¹ = S, where S is a permutation on X.

This implies that BN cannot be generated by a pair of generators. It is not known how many generators are required for BN.

There follows an example of a result similar to those suggested in Ref. 28 for projective algebras.

T. Evans showed that any countable semigroup can be embedded in a semigroup generated by two elements. S. Subbiah26 gave an elegant constructive proof of the fact that any countable collection of transformations on the set of natural numbers N is contained in a subsemigroup of TN, the full transformation semigroup on the natural numbers, generated by two such transformations. Since any countable semigroup can be embedded in TN, Evans' result was an immediate consequence of Subbiah's theorem.

A minor modification of Subbiah's proof permits the following extension of her theorem to relations on the set of natural numbers.

Theorem. If {Rn} is any countable family of relations on the natural numbers N, then there exist two relations on N that generate a semigroup containing {Rn}.

Proof. As in Ref. 26 we let A_n = {2ⁿ(2m − 1) − 1}_{m=1}^∞ for n = 1, 2, 3, .... Now let φ and T be relations on N defined as follows:

$$\varphi=\{(x,2x)\mid x\in N\},\qquad T=\{(x,\,x-1)\mid x\ \text{even}\}\ \cup\ \bigcup_{n=1}^{\infty}\ \bigcup_{x\in A_n}\{x\}\times\Big(\big(\tfrac{x+1}{2^{\,n+1}}+\tfrac12\big)R_n\Big),$$

where we note that, in general, for any relation R, by xR we mean the set xR = {y | (x, y) ∈ R}.



We assert that R_n = φTφⁿT², where juxtaposition denotes relational composition.

Suppose (x, y) ∈ R_n. Then (x, 2x) ∈ φ, (2x, 2x − 1) ∈ T, (2x − 1, 2ⁿ(2x − 1)) ∈ φⁿ, and (2ⁿ(2x − 1), 2ⁿ(2x − 1) − 1) ∈ T. Note that 2ⁿ(2x − 1) − 1 ∈ A_n, so that {2ⁿ(2x − 1) − 1} × ( ((2ⁿ(2x − 1) − 1 + 1)/2^{n+1} + 1/2) R_n ) is contained in T. But (2ⁿ(2x − 1) − 1 + 1)/2^{n+1} + 1/2 = x, so this set is {2ⁿ(2x − 1) − 1} × (xR_n); hence (2ⁿ(2x − 1) − 1, y) ∈ T. Thus R_n ⊂ φTφⁿT².

Suppose (x, y) ∈ φTφⁿT². Then (x, a) ∈ φ, (a, b) ∈ T, (b, c) ∈ φⁿ, (c, d) ∈ T, and (d, y) ∈ T for some a, b, c, d ∈ N. So a = 2x, b = 2x − 1, c = 2ⁿ(2x − 1), and d = 2ⁿ(2x − 1) − 1. Since d = 2ⁿ(2x − 1) − 1 ∈ A_n, we have y ∈ ((d + 1)/2^{n+1} + 1/2) R_n = xR_n. Hence φTφⁿT² ⊂ R_n, and therefore R_n = φTφⁿT². If the family {R_n} is a family of functions, then the relation T is a function, so that the result particularizes to that of Ref. 26.

This completes our illustration of the character of the results and problems that might be of concern to investigators of "algebras" purporting to be algebras of patterns, particularly two-dimensional patterns. Generalizations to higher dimensions of some concepts and problems considered are given in Appendix A.

Appendix A

Examples of Combinatorial and Set-Theoretical Problems Concerning the Operation of Forming Product Sets

The theory of relations, the theory of graphs, and also the theory of projective or cylindric algebras have as their basis the following set-theoretical setup:



Let E be an abstract set, finite or infinite. By Eⁿ we understand the set of ordered n-tuples: Eⁿ = E × E × ⋯ × E, the elements being (e1, e2, ..., en), where ei ∈ E for i = 1, ..., n. E^∞ denotes the set of infinite (countable) sequences (e1, e2, ..., en, ...).

For the study of binary relations, we consider E2 and associate with the given relation R between pairs of elements x, y, a set of edges; the edges correspond to certain pairs of elements of E, namely those pairs which are R-related.

In this way a given relation R, or a graph G, can be associated with a subset, call it A of E2 .

(More generally, in the study of other algebraic or combinatorial notions, for example ternary relations, say a group H, we might associate with it a subset of a higher direct product. Thus, for a group operation defined on an abstract set E, we might consider in E³ the set of triplets (e1, e2, e3) such that e1 ∘ e2 = e3, where ∘ denotes the group operation. Analogously, when we deal for example with rings, we could consider in E⁶ a subset of sextuples (e1, e2, e3, e4, e5, e6) where e1 + e2 = e3 and e4 ∘ e5 = e6, + and ∘ denoting the two arithmetic operations defining the ring. For infinite operations, i.e., in some topological theories, say in considering Fréchet spaces where the fundamental notion is that of convergence of a sequence (e1, e2, ..., en, ...) to an element e0, we may "represent" the topological space by a subset of E^∞ consisting of the sequences (e0, e1, e2, ..., en, ...).)

We introduce now a general notion of product isomorphism and product homomorphism which subsume the notions of isomorphism, and homomorphism used in these theories.

Two subsets A, B contained in E² are called product isomorphic if there exists a one-one mapping T of E onto itself such that the mapping T2 defined by T2(x, y) = (T(x), T(y)) maps the set A onto B: T2(A) = B. Analogously, a product homomorphism S2 is defined by a not necessarily one-one mapping S of E onto E through (x, y) → (S(x), S(y)).

In complete analogy one defines a product isomorphism Tn of Eⁿ onto itself, Tn(Eⁿ) = Eⁿ, through (x1, x2, ..., xn) → (T(x1), T(x2), ..., T(xn)), and two subsets A, B of Eⁿ are called product isomorphic if there exists a T such that Tn(A) = B.

Similarly we define the product homomorphism Sn. B is called product homomorphic to A if there exists an S such that Sn(A) = B. Here n can be any finite integer or n = ∞.

It is obvious now that two relation sets R1 and R 2, or two graphs G1 and G2, are isomorphic to each other in the usual sense if and only if the "representations" of them as defined above are product isomorphic in our sense.
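For finite E the definition can be tested mechanically; the following is a minimal brute-force sketch in Python that decides product isomorphism of two binary relations by searching over all one-one maps T of E onto itself (feasible only for very small E).

import itertools

# Sketch: decide product isomorphism of two binary relations A, B ⊂ E×E on a
# small finite set E by searching over all permutations T of E.

def product_isomorphic(A, B, E):
    E = list(E)
    for perm in itertools.permutations(E):
        T = dict(zip(E, perm))
        if {(T[x], T[y]) for (x, y) in A} == set(B):
            return True
    return False

if __name__ == "__main__":
    E = {1, 2, 3}
    A = {(1, 2), (2, 3)}          # the path 1 -> 2 -> 3
    B = {(3, 1), (1, 2)}          # the path 3 -> 1 -> 2
    print(product_isomorphic(A, B, E))   # True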

Similarly, the isomorphism of two groups H1 and H2 is equivalent to the product isomorphism of the subsets "representing" them in our sense. Analogously, the homeomorphism of two topological spaces is obviously equivalent to the product isomorphism of their "representations" in E^∞. It is equally clear that homomorphisms, algebraic or topological, translate into homomorphisms of the representations.

We shall now present a few examples of combinatorial problems arising in the theory of properties of the product isomorphisms. The first questions concern the enumerations.

(1) How many non-product-isomorphic sets can one have in Eⁿ, or, equivalently, how many classes of product isomorphic sets are there? (Product isomorphism is a transitive relation, so the classes are disjoint.) The asymptotic estimates are known for E², that is to say, for the number of nonisomorphic relations or nonisomorphic graphs, in the case of E finite. (For E countably infinite the number is of the power of the continuum.) Much less is known for n = 3 or higher, i.e., for the number of nonisomorphic general ternary relations, for example.

(It is of the order of 2^(N³), N being the cardinality of E; it is obviously equal to c for the cardinality of E equal to ℵ0.)

(2) Analogous problems concern product homomorphisms. Less is known in this case. The question is how many sets one can have in Eⁿ so that none of them is a homomorphic image of another. (Recently, E. Howorka and, independently, J. Mycielski exhibited a continuum of graphs such that none is a homomorphic image of any other in the set.)

(3) Questions concerning the existence of "universal" countable subsets of En, i.e., such sets A in En that every countable subset X in En should be isomorphic to a subset of A; this, of course, in the case when the cardinality of E is that of a continuum.

(4) Problems concerning the existence of "economical" bases for a given class (e.g., a countable one) of subsets A1, A2, ..., Am, ... of Eⁿ. This means, for instance, furnishing a finite number of sets B1, B2, ..., Bj such that by the operations of Boolean algebra and of projection and direct-product formation one can obtain, among others, all the sets A1, ..., An, .... (There is a whole class of problems of this sort, mainly still unresolved.) For the case of E finite, can the base be formed with just two sets? The problem is of interest when the cardinality of E is, say, of the power of the continuum or higher; the class of given sets may then be uncountable.



(5) Problems of the above sort become more difficult when their analogs are stated for the notion of "equivalence" of two sets A, B through a decomposition.

We illustrate this notion for subsets of E². Two subsets A, B of E² are called equivalent by finite decomposition if

A = A1 ∪ A2 ∪ ⋯ ∪ Ak,  B = B1 ∪ B2 ∪ ⋯ ∪ Bk,  Ai ∩ Aj = ∅ for i ≠ j,  Bi ∩ Bj = ∅ for i ≠ j,

and each Ai is product isomorphic to the corresponding Bi, the product isomorphism depending in general on i; that is, Ai = Ti(Bi) through a transformation Ti.

In the case when the cardinality of E is finite the problem concerns for example the number of sets nonequivalent by a decomposition into two subsets. If E is infinite, e.g., of the power R of the continuum or higher, one can ask about the possible number of sets nonequivalent by decomposition even into a countable number of sets:

A = ⋃_{n=1}^∞ An,  B = ⋃_{n=1}^∞ Bn.

(6) A notion of "weak" product isomorphism, e.g., for n = 2: for subsets A, B ⊂ E², we call these weakly product-isomorphic if there exist two transformations T1 and T2 of E onto itself such that the transformation T1,2 : E² → E² defined by (x, y) → (T1(x), T2(y)) carries A onto B: T1,2(A) = B.

(7) A definition of the "iteration" of a subset A ⊂ Eⁿ: this will be a subset ²A of E^{2n} defined as follows: ²A consists of all sequences of elements (e1, e2, ..., en, f1, f2, ..., fn), where (e1, ..., en) and (f1, ..., fn) are arbitrary elements of A.

Suppose ²A is product isomorphic to ²B; under what conditions can one assert that A is product isomorphic to B? In all generality this is not true. For example, Fox constructed, in answer to a problem posed by one of us, an example of two topological spaces S1 and S2 such that S1 × S1 and S2 × S2 are homeomorphic (in the usual topological sense) but S1 and S2 are not. For algebraic or combinatorial structures the answers are in general negative, but true for special classes, e.g., finite groups, graphs of special character, etc. When can we assert that if ²A2 is a product homomorphic image of ²A1, then A2 is a homomorphic image of A1?

The above will merely serve as isolated specimens of a great class of problems formulable in the spirit of combinatorial set theory from a unified point of view.

Appendix B

Physical Realizations of Nonserial Compositions

In this appendix we consider the possibilities for the physical realization of nonserial composition.* In particular, in A there is described the preliminary engineering design of a device that effects this operation. One could consider the problem of more efficacious designs along these lines with the objective of possible fabrication of a pilot model to serve as a heuristic aid in early theoretical studies, but the interconnection problems preclude a system of even modest capability.

In part B we describe possible lines of investigation of three-dimensional computing in a homogeneous medium. While presenting many interesting possibilities, particularly in the elimination of the size restriction inherent in A, this endeavor faces many unsolved problems in the area of materials science. What is considered here is an initial assessment of physical phenomena that might be exploited to effect the realization of the principal processes of cylindrification, intersection, and projection, the processes basic to nonserial composition.

A—
Discrete Systems

Here we are concerned with the preliminary engineering design of a discrete composition machine, in particular its logic, memory, input-output, control, and size. The cube of Fig. 1 is referred to in the description of this proposed hardware.

* We are particularly grateful to Derek Dove, Professor of Materials Science and Engineering, University of Florida, for generous discussion of the schemata that follow.



This machine is described as a cube with three relevant working surfaces, A, B, and C, where pairs of indices (i, j, k) describe locations on each surface as indicated in Fig. 1, with each index having n possible values. In this description, surfaces A and B are thought of as being the input mappings to be composed, with the results projected on surface C.

The logic of the composition is described as a two-level AND-OR array. A typical cell on surface A is associated by logical AND with each of the n cells on the same row of surface B, with the result projected to a row of internal points. For instance, a typical internal point c_ijk holds the result of the logical AND of cells a_ik and b_jk: c_ijk = a_ik ∧ b_jk.

Now the projection upward onto the C surface is taken as follows. Each cell on the C surface is

C_ij = ⋁_{k=0}^{2n-1} c_ijk,

where C_ij is a typical cell in the C surface and where the recursion operator indicates a logical OR in the k direction of all the c_ijk elements below it.
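In purely logical terms, before any hardware is considered, this AND-OR array computes the composition of the two mappings held on surfaces A and B. The following Python sketch, added here only as an illustration (the function name and the small example arrays are not from the report), simulates the logic directly:

```python
# Illustrative sketch (not from the report): the two-level AND-OR logic of
# the cube, simulated directly.  Surfaces A and B are n x n arrays of 0's
# and 1's; the value displayed at cell (i, j) of surface C is the OR over k
# of the internal points c_ijk = a_ik AND b_jk.

def compose(A, B):
    n = len(A)
    return [[int(any(A[i][k] and B[j][k] for k in range(n)))
             for j in range(n)]
            for i in range(n)]

A = [[1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]
B = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
print(compose(A, B))   # the composed mapping displayed on surface C
```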

A possible realization to illustrate this idea is as follows. Each of the n³ AND gates can be realized by a transistor such that transistor conduction indicates a logic 1, occurring if the voltage on the base is high AND the voltage on the emitter is low. As shown in Fig. 1, variable a_ik is a bus bar which supplies base current from surface A to any number up to n transistors. Variable b_jk is a similar bus bar from the B surface. Each bar is programmed, at surface A or B, by connection to a voltage supply.

The recursion OR, operating in the k direction, can be considered as a bus bar connecting together all the collectors of the transistors in each level below it which have the same (i, j) coordinates. The display cell on each cell of the C surface is a lamp, which is on if one or more transistors below it are turned on by the (ik, jk) coincidence.

One of the system requirements is that a result of a composition displayed on surface C needs to be mapped back onto surface A in order to provide the capability of several iterative compositions with a fixed mapping on B. This can be accomplished with an n × n array of flip-flops on each of the two surfaces A and C, where one of the surfaces would require dual-rank flip-flops.



Fig. 1. Schematic for a discrete composition machine. Logical AND at each internal point: c_ijk = a_ik ∧ b_jk; logical OR vertically: C_ij = ⋁_{k=0}^{2n-1} c_ijk; a possible cell realization uses a lamp driven by a "tied OR" of the transistor collectors.




B—
Homogeneous and Partially Parallel, Partially Serial Systems

Here we consider physical phenomena that might be possibly exploited to effect a realization of the processes of projection and intersection.

Several major types of systems may be envisaged. The projection path may be delineated by some physical structure such as an electrical conductor, and intersection may be realized by the use of discrete components as in A. We may refer to such a system as a discrete system. Alternatively, projection may be carried out by the motion of an advancing wave front, while intersection in such a case may involve the interaction of wave fronts in a homogeneous medium. Evidently these two examples represent the extremes of completely discrete and completely homogeneous systems.

The present status of such arrangements may be summarized briefly by noting that while the technology exists for the construction of an entirely discrete system, such a system suffers from an inefficient use of elements and presents interconnection problems of extreme difficulty for a system of even modest capability. Modern high-density fabrication techniques lend themselves naturally to two-dimensional organization.

Three-dimensional computing in a homogeneous medium, while an exciting possibility, presents unsolved problems both in the mechanisms for defining a vector or scalar wave property over a large region of space and in the choice of wave/wave interaction phenomena.

This part of the appendix is concerned with the exploration of the implications of function composition for parallel computing schemes, and with the preliminary assessment of the potentialities and limitations of homogeneous medium interactions. Finally a study is proposed of partially parallel, partially serial systems, based upon modern high density solid state optical sensor arrays.

1. Homogeneous Medium. Wave front delineation: The spatial limitations imposed upon a wave front by diffraction effects are, of course, well known; the resolution with which a desired image may be reproduced may be readily evaluated from a knowledge of wavelength, system aberrations, and effective aperture.

The process of projecting a two-dimensional image "intact" through a region of space can only be carried out with a resolution far inferior to the two-dimensional resolution capabilities of the system. In optical terms an extreme depth of field requires a very small beam divergence which in turn limits lateral resolution.



One would need to examine the information packing density that may be "projected" using optical, holographic, acoustic, and electron beam techniques.

Interaction in Medium. In order to carry out the intersection process utilizing the influence of two projected wave fronts, it is necessary that the intersecting wave fronts produce an effect that may be detected by a third orthogonal wave front and that may in addition be distinguished above the background "noise" level due to the nonintersecting wave fronts.

Particular attention needs to be given to an evaluation of known optical and acoustic phenomena for their potentialities for the present applications. Examples are (i) photochromic effects, in which a beam of one wavelength produces a coloration in a crystalline or glassy medium, while a second beam of a different wavelength produces a bleaching or other change of the colored region; (ii) quenched fluorescence, a phenomenon in which the fluorescence produced by light of a particular wavelength may be locally quenched by light of another wavelength; and (iii) acoustic beams, which may give rise to a stress-induced birefringence; by suitable summation of local stress fields, a change may be made that is detectable by a polarized light beam. Evidently each phenomenon offers certain potentialities and limitations.

The phenomena need to be evaluated for signal/noise ratio, information packing density, and the inherent natural lifetimes associated with them.

2. Parallel/Serial Image Processing. A promising compromise approach to a machine permitting functional composition is one employing solid state imaging technology. Such a system might consist of two planar input optical sensor arrays, a cathode ray tube output display, and line and frame scan circuitry.

Each sensor array consists of n rows of sensors, each row containing m sensors connected in a charge transfer chain. Thus the charges produced upon a line of sensors may be stepped sideways simultaneously by simple scan circuitry. As the charges reach the end of the row they activate further circuitry controlling, for example, the intensity of an electron beam being scanned synchronously across a phosphor screen. Thus in a solid state television system, the sensor array is exposed to an optical image, giving rise to greater or lesser charge buildup in the individual sensors. The charges are then read out by stepping the charges in each line along each row in sequence. This information is used to modulate a scanning electron beam and so the image is regenerated upon a phosphor screen.



In the composing system an input image is presented to a line-organized sensor array, and a second image is presented to a similar second array. In order to form an image of the mapped intersection of the projection of the two arrays as described earlier, the information content of the first array is stepped along a row by one unit, all rows stepping simultaneously. The row output signals enter a chain of logic elements. The second sensor rows are now stepped continuously in synchronism with a scanning electron beam and are compared with the first column of signals of panel 1. The row output signals enter the chain of logic elements and, if an output signal is encountered on any corresponding rows of the two arrays, then the intensity of the electron beam is modulated.

The information of the first array is now stepped again, the electron beam is displaced one unit sideways, the second array is re-exposed if necessary and a new line is scanned.

In this way an image is built up on a display screen having the required properties. By using a storage display tube this image may be retained and used to expose the first sensor array, light output and optical sensitivities being quite compatible in this regard.
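The step, compare, and display loop just described, together with the feedback of the displayed result to the first array, can be summarized in a short simulation. The sketch below is an illustration added here, not part of the report; it models the arrays as 0/1 matrices and the comparator chain as an OR of ANDs, and then repeats the composition with a fixed second image, mimicking the storage-tube feedback:

```python
# Illustrative sketch (not from the report): repeated composition with a
# fixed mapping on the second array, modelling the feedback of the result
# back onto the first array through the storage display tube.

def compose(A, B):
    n = len(A)
    # comparator chain: output cell (i, j) is bright if some row k carries a
    # signal on both arrays, i.e. OR over k of (A[i][k] AND B[j][k])
    return [[int(any(A[i][k] and B[j][k] for k in range(n)))
             for j in range(n)]
            for i in range(n)]

def iterate(A, B, steps):
    for _ in range(steps):
        A = compose(A, B)     # expose the first array with the stored result
    return A

B = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]               # a fixed cyclic mapping on the second array
A = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]               # a single marked cell on the first array
print(iterate(A, B, 3))       # after three compositions the marked cell returns
```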

The two sensor arrays, scanning circuitry, and logic chain may be fabricated upon one silicon single-crystal wafer, using state-of-the-art photolithographic techniques. It is not unreasonable to contemplate arrays of 500 × 500 elements.

(Unnumbered figure: path of the electron beam over the output plane, showing bright spots.)

The logic chain gives a signal if and only if a signal is present on both input terminals. The information in plane 1 is stepped sideways; i.e., the charge in column 2 is moved to column 3 one step at a time. At each step an electron beam sweeps along a line and plane 2 is stepped through 1 to 5. If a signal is present on both inputs of the chain of comparators at some step, then the intensity of the electron beam is increased.

In a simpler scanning arrangement the two images may be mechanically scanned, one slowly, one rapidly and repetitively, across two columns of photosensors, for example, by rotating or oscillating mirrors. The scan rate of the images is synchronized to the faster scan of the electron beam of a display tube as before. Signals are compared as before and coincident signals give rise to a brief increase in intensity of the electron beam. Such devices are fully within the present state of the art of silicon microelectronic technology.

(Unnumbered figure: columns of photodetectors for image 1 and image 2; the comparator signals control the brightness of the electron beam; one image is stepped slowly, the other scanned quickly.)




References and Related Publications

1. A. R. Bednarek and K. D. Magill, Jr., "Q-Composable Properties, Semigroups and CM-Homomorphisms," Trans. Am. Math. Soc. 171 (1972), 383-398.

2. A. R. Bednarek and K. D. Magill, Jr., "Semigroups of Clopen Relations," submitted.

3. A. R. Bednarek and O. E. Taulbee, "On Maximal Chains," Rev. Roum. Math. Pures Appl. 11 (1966), 23-25.

4. A. R. Bednarek and S. M. Ulam, "Generators for Algebras of Relations," Bull. Amer. Math. Soc. 82 (1976) 781-782.

5. W. A. Beyer, M. L. Stein, and S. M. Ulam, "The Notion of Complexity," Los Alamos Scientific Laboratory report LA-4822 (December 1971).

6. R. E. Bonner, "On Some Clustering Techniques," IBM J. Res. Dev. 8 (1964).

7. A. H. Clifford, "A Proof of the Montague-Plemmons-Schein Theorem on Maximal Subgroups of the Semigroup of Binary Relations," Semigroup Forum 1 (1970), 303-314.

8. A. H. Clifford and D. D. Miller, "Union and Symmetry Preserving Endomorphisms of the Semigroup of All Binary Relations on a Set," Czech. Math. Journ. 20 (95) (1970), 303-314.

9. L. N. Cooper, "A Possible Organization of Animal Memory and Learning," Nobel Symposia 24 (1973), 252-264.

10. M. Crestey, "Applications Multiformes Partielles," Collectanea Mathematica, Univ. of Barcelona 16 (1964), 111-126.

11. T. Evans, "Embeddings for Multiplicative Systems and Projective Geometries," Proc. Amer. Math. Soc. 3 (1952), 614-620.

12. C. J. Everett and S. M. Ulam, "Projective Algebra I." Am. J. Math. 68 (1946), 77-88.

13. B. Faber and J. Larsen, Private communication.

14. L. M. Gluskin, "Automorphisms of Semigroups of Binary Relations." Ural. Gos. Univ. Mat. Zap. 6, tetrad 1 (1967), 44-54.

15. E. Howorka, "Generators for Algebras of Relations. Preliminary Report," Notices Am. Math. Soc. 24 (1977), A-4-A-5.

16. K. D. Magill, Jr., "Automorphisms of the Semigroup of All Relations on a Set," Canadian Math. Bull. 9 (1966), 73-77.

17. K. D. Magill, Jr., "Isomorphisms of Triform Semigroups," J. Australian Math. Soc. 10 (1969), 185-193.

18. N. Metropolis and Gian-Carlo Rota, "Significance Arithmetic on the Algebra of Binary Strings," in Studies in Numerical Analysis, B. K. P. Scaife, Ed. (Academic Press, New York and London, 1974), pp. 241-252.



19. J. S. Montague and R. J. Plemmons, "Maximal Subgroups of the Semigroup of Relations," J. Algebra 13 (1969), 575-587.

20. J. C. C. McKinsey, "Postulates for the Calculus of Binary Relations," The Journal of Symbolic Logic 5 (1940), 85-97.

21. M.M. Nass, and L. N. Cooper, "A Theory for the Development of Feature Detecting Cells in Visual Cortex," Report, Center for Neural Studies, Brown University.

22. E. N. Norris, "A Characterization of Finiteness," unpublished.

23. R. E. Osteen, "Clique Detection Algorithms Based on Line Addition and Line Removal," Siam J. Appl. Math. 26 (1974), 126-135.

24. R. J. Plemmons and B. M. Schein, "Groups of Binary Relations," Semigroup Forum 1 (1970).

25. S. Schwarz, "On the Semigroup of Binary Relations on a Finite Set," Czech. Math. J. 20 (1970), 632-679.

26. S. Subbiah, "Another Proof of a Theorem of Evans," Semigroup Forum 6 (1973), 93-94.

27. A. Tarski, "On the Calculus of Relations," The Journal of Symbolic Logic 6 (1941), 73-89.

28. S. M. Ulam, A Collection of Mathematical Problems (Interscience, New York, 1960).

29. S. M. Ulam, "Computers," Sci. Am., Vol. 211, No. 3 (1964), 203216.

30. S. M. Ulam, "Computations on Certain Binary Branching Processes," in Computers in Mathematics Research, (North Holland Publishing Company, Amsterdam, 1968). 31. S. M. Ulam, "Electronic Computers and Scientific Research, (Parts I and II)," Comput. Autom., Vol. 12, No. 8, 20-4 (Pt. I); Vol. 12, No. 9, 35-40 (Pt. II), 1963.

32. S. M. Ulam, "General Formulations of Simulation and Model Construction " in Prospects for Simulation and Simulators of Dynamic Systems, (Macmillan & Co., Ltd., London, New York, 1967).

33. S. M. Ulam, "Generalizations of Product Isomorphisms (Resume)", in Recent Trends in Graph Theory 186, Lecture Notes in Mathematics, Springer, 1971.

34. S. M. Ulam, "On Some Possibilities in the Organization and Use of Computing Machines," International Business Machines report IBM-RC-68, 1957.

35. S. M. Ulam, "Some Ideas and Prospects in Biomathematics," Annu. Rev. of Biophys. Bioeng., 1 (1972), 227-292.

36. K. A. Zareckii, "The Semigroup of Completely Effective Binary Relations," Theory of Semigroups and Appl. I (Russian), Izdat., Saratov. Univ., Saratov (1965), 238-250.



19—
The Scottish Book, a LASL Monograph:
(LA-6832, 1977)

The "Scottish Book" played a vital role in the stimulation of mathematical research--particularly in topology and in real and functional analysis. It is a collection of problems posed by the Lwdw mathematicians before World War II, to which Ulam was one of the principal contributors, and which he translated, had typed (at his own expense), and circulated privately in 1957.

Its first printed version appeared in 1977 as a LASL Monograph. More recently a new version under the title "The Scottish Book, Mathematics from the Scottish Café," edited by R. Daniel Mauldin (Birkhäuser, Boston, 1982), delineates the evolution of this collection, lists the problems, and provides commentaries on the status of most of them.

While it would be redundant to reproduce the entire LASL Monograph here, Ulam's preface to his 1957 translation of the original collection is of historical interest and is thus included. (Eds.)

Preface to Monograph

Numerous requests for copies of this document, addressed to Los Alamos Scientific Laboratory or to me, appear to make it worthwhile (after a lapse of some 20 yrs) to reprint, with some corrections, this collection of problems.

This project was made possible through the interest and active help of Robert Krohn of the Laboratory.

It is a pleasure to give special thanks to Dr. Bill Beyer for his perspicacious review of the changes and the revised version of some formulations. Thanks are due to Martha Lee DeLanoy for editorial work.

Stan Ulam Los Alamos, NM May 1977

Preface

The enclosed collection of mathematical problems has its origin in a notebook which was started in Lwów, in Poland, in 1935. If I remember correctly, it was S. Banach who suggested keeping track of some of the problems occupying the group of mathematicians there. The mathematical life was very intense in Lwów. Some of us met practically every day, informally in small groups, at all times of the day to discuss problems of common interest, communicating to each other the latest work and results. Apart from the more official meetings of the local sections of the Mathematical Society (which took place Saturday evenings, almost every week!), there were frequent informal discussions mostly held in one of the coffee houses located near the University building, one of them a coffee house named "Roma," and the other "The Scottish Coffee House." This explains the name of the collection. A large notebook was purchased by Banach and deposited with the headwaiter of the Scottish Coffee House, who, upon demand, would bring it out of some secure hiding place, leave it at the table, and, after the guests departed, return it to its secret location.

Many of the problems date from years before 1935. They were discussed a great deal among the persons whose names are included in the text, and then gradually inscribed into the "book" in ink. Most of the questions proposed were supposed to have had considerable attention devoted to them before an "official" inclusion into the "book" was considered. As the reader will see, this general rule could not guarantee against an occasional question to which the answer was quite simple or even trivial.

In several instances, the problems were solved, right on the spot or within a short time, and the answers were inscribed, perhaps some time after the first formulation of the problem under question.

As most readers will realize, the city of Lwów, and with it the "Scottish Book," was fated to have a very stormy history within a few years of the book's inception. A few weeks after the outbreak of World War II, the city was occupied by the Russians. From items at the end of this collection, it is seen that some Russian mathematicians must have visited the town; they left several problems (and prizes for their solutions). The last date figuring in the book is May 31, 1941. Item Number 193 contains a rather cryptic set of numerical results, signed by Steinhaus, dealing with the distribution of the number of matches in a box! After the start of war between Germany and Russia, the city was occupied by German troops that same summer and the inscriptions ceased.

The fate of the Scottish Book during the remaining years of the war is not known to me. According to Steinhaus, this document was brought back to the city of Wrocław by Banach's son, now a physician in Poland. (Many of the surviving mathematicians from Lwów continue their work in Wrocław. The tradition of the Scottish Book continues. Since 1945, new problems have been formulated and inscribed, and a new volume is in progress.)

A general word of explanation may be in order here. I left Poland late in 1935 but, before the war, visited Lwów every summer in 1936, '37, '38, and '39. The last visit was during the summer preceding the outbreak of World War II, and I remember, just a few days before I left Poland, around August 15, the conversation with Mazur on the likelihood of war. It seems that in general people were expecting another crisis like that of Munich in the preceding year, but were not prepared for the imminent world war. Mazur, in a discussion concerning such possibilities, suddenly said to me, "A world war may break out. What shall we do with the Scottish Book and our joint unpublished papers? You are leaving for the United States shortly, and presumably will be safe. In case of a bombardment of the city, I shall put all the manuscripts and the Scottish Book into a case which I shall bury in the ground." We even decided upon a location of this secret hiding place; it was to be near the goal post of a football field outside the city. It is not known to me whether anything of the sort really happened. Apparently, the manuscript of the Scottish Book survived in good enough shape to have a typewritten copy made, which Professor Steinhaus sent to me last year (1956).

The existence of such a collection of problems was mentioned on several occasions, during the last 20 years, to mathematical friends in this country. I have received, since, many requests for copies of this document. It was in answer to such oral and written requests that the present translation was made. This spring in an article, "Can We Grow Geniuses in Science?", which appears in Harper's June 1957 issue, L. L. Whyte alluded to the existence of the Scottish Book. Apparently, the diffusion of this small mystery became somewhat widespread, and this provided another incentive for this translation.



Before deciding to make such an informal distribution, I consulted my teacher and friend (and senior member of the group of authors of the problems), Professor Steinhaus, about the propriety of circulating this collection. With his agreement, I have translated the original text (the original is mostly in Polish) in order to make it available through this private communication.

Even as an author or coauthor of some of the problems, I have felt that the only practical and proper thing to do was to translate them verbatim. No explanations or reformulations of the problems have been made.

Many of the problems have since found their solution, some in the form of published papers (I know of some of my own problems, solutions to which were published in periodicals, among them Problem 17.1, Z. Zahorski, Fund. Math., Vol. 34, pp. 183-245 and Problem 77(a), R. H. Fox, Fund. Math., Vol. 34, pp. 278-287).

The work of following the literature in the several fields with which the problems deal would have been prohibitive for me. The time necessary for supplying the definitions or explanations of terms, all very well understood among mathematicians in Lwów, but perhaps not in current use now, would also be considerable. Some of the authors of the problems are no longer living, and since one could not treat uniformly all the material, I have decided to make no changes whatsoever.

Perhaps some of the problems will still present an actual interest to mathematicians. At least the collection gives some picture of the interests of a compact mathematical group, an illustration of the mode of their work and thought; and reflects informal features of life in a very vital mathematical center. I should be grateful if the recipients of this collection were willing to point out errors, supply information about solutions to problems, or indicate developments contained in recent literature in topics connected with the subjects discussed in the problems.

It is with great pleasure that I express thanks to Miss Marie Odell for help in editing the manuscript and to Dr. Milton Wing for looking over the translated manuscript.

S. Ulam Los Alamos, NM Fall 1957



20—
On the Notion of Analogy and Complexity in Some Constructive Mathematical Schemata:
(LA-9065-MS, October 1981)

This report is a study of some schemata which might be used in the processes of understanding and invention. (Author's note). *

Abstract

Banach often remarked "Good mathematicians see analogies between theorems or theories; the very best ones see analogies between analogies." Mark Kac certainly belongs to the latter group. His work on problems in statistical mechanics and in number theory, disciplines so different from each other, exhibits a feeling for the role of the ideas of probability analogous in some way in these two domains which are so far apart.

In what follows I shall try to sketch an elementary approach to the notion of analogy and suggest a few mathematical problems that pose themselves once one tries to discuss this notion in a somewhat general way, namely, similarity or proximity of proofs and counting binary and unary operations at each stage with similar trees on the set of axioms.

* The report was published in "Probability, Statistical Mechanics, and Number Theory," A volume dedicated to Mark Kac, G.-C. Rota editor, Academic Press, Inc., 1986 (Eds.)



I—
Generalizations about Analogy

Throughout the development of mathematics and with the growth of new concepts and more complicated notions, a cohesive tendency and organic structure have been guided by a feeling of analogy between the old and new ideas.

Historically, problems posed by the development of a new mathematical discipline, which originally was only metamathematical, coalesced into new parts of mathematics itself. One could cite, as obvious examples, the study of transformations of a space as points of a new space of such transformations, and the study of algorithms for solving equations as entities per se (group theory, for instance).

The increasing proliferation of notions in pure mathematics may suggest that the idea of analogy itself is amenable to mathematical discussion. One finds that old and elementary formulations of this idea are, in special cases, present in the definitions of the similarity of geometrical figures, more generally in the equivalence of figures or sets through the elements of a group of transformations, or, more generally yet, through the identity or proximity of such sets in spaces which encompass them.

Two abstract sets of elements may be felt to be "analogous" if the difference between their cardinalities (in the finite case) is small compared to the cardinalities themselves. Two classes of such sets may be deemed to be analogous if the numbers of sets in the two classes differ by "little" and if the cardinalities of the corresponding sets also do not differ by much. Obviously, one needs to attempt to formulate a quantitative criterion, and it is clear a priori that the notion of analogy will not be, in general, transitive.

In this report we merely want to discuss some of the salient features of analogy and exhibit them on a class of examples where we shall attempt to define it, at first, as proximity in the sense of a metric distance in suitably defined spaces.

Our first example, dealing with a linear or one-dimensional array of symbols, will illustrate the definitions and the role of possible metrics in determining the corresponding analogy. For simplicity's sake we consider finite sequences of 0's and 1's. One kind of distance between them was defined in a report of mine.1 These sequences are to represent the DNA codes (which factually use four symbols, but the whole discussion and results are equally valid in that case). The distance ρ between sequences x = (α_1, α_2, ..., α_n) and y = (β_1, β_2, ..., β_m), where the α_i and β_i are 0 or 1, is defined as the smallest number of steps that, performed on one or both of these sequences, will render them identical. The steps are of two kinds: replacing 0 by 1 or vice versa, or erasing (or inserting) a symbol. One can prove easily that the number just defined satisfies all the properties of a metric; in particular the triangle inequality holds. We may consider two such sequences or codes as analogous or related if the distance between them is small compared to their lengths.
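This distance can be computed by a standard dynamic program. The sketch below is an illustration added here, not part of the original report (which gives no algorithm at this point); the function name and the example sequences are arbitrary:

```python
# Illustrative sketch (not from the report): the distance defined above,
# i.e., the minimum number of substitutions (0 <-> 1) and erasures or
# insertions bringing two sequences into the same form.

def sequence_distance(x, y):
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # erase all of x[:i]
    for j in range(m + 1):
        d[0][j] = j                      # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1       # substitution step
            d[i][j] = min(d[i - 1][j] + 1,                # erasure
                          d[i][j - 1] + 1,                # insertion
                          d[i - 1][j - 1] + cost)
    return d[n][m]

print(sequence_distance("0111101", "101101"))   # a small example
```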

Another type of distance between these sequences can be described as follows:

Given the two sequences x = {α_1, α_2, ..., α_m} and y = {β_1, β_2, ..., β_m}, where the α_i and β_i are 0's or 1's, we look for the difference in the number of symbols of each type in x and in y. The sum of the absolute values of these differences for the two types we denote by K_1. Next, given a system of two successive symbols, either (0,0), (0,1), (1,0), or (1,1), we compute the difference of the number of occurrences of each type in x with the corresponding number in y. The sum of the absolute values of these differences is K_2. Similarly for triplets and so on. The sum K_1 + K_2 + ... of all these differences may serve as a "statistical" type of distance between x and y.
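The same count-comparison can be written down in a few lines. The sketch below is an added illustration, not from the report; it truncates the sum at triplets purely for the example, whereas the text continues the sum indefinitely:

```python
# Illustrative sketch (not from the report): the "statistical" distance,
# comparing counts of single symbols, pairs of successive symbols, triplets,
# and so on, and summing the absolute differences (truncated here at k = 3).

from collections import Counter

def kgram_counts(s, k):
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def statistical_distance(x, y, max_k=3):
    total = 0
    for k in range(1, max_k + 1):
        cx, cy = kgram_counts(x, k), kgram_counts(y, k)
        # K_k: sum over every k-gram of the difference in occurrence counts
        total += sum(abs(cx[g] - cy[g]) for g in set(cx) | set(cy))
    return total

print(statistical_distance("0111101010", "1011010010"))
```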

Still another distance could be the following:

Again we consider two types of steps. The first one is an erasure of any symbol (in either x or y). The second consists of choosing any element in x (or y) and placing it in another position. The smallest number of such steps making the two sequences identical is a metric distance.

We could modify this definition by "weighting" each transposition by its length. The distance would then be the minimal "work" sufficient to make the two sequences identical.

One may define a distance for infinite sequences of this type by suitable passages to the limit. We shall discuss this later. Needless to say, one-dimensional (linear) sequences of symbols may represent a succession of sounds or other signals such as encephalographic scans. The metric or a distance between two such sequences should obviously depend on the interpretation of these sequences.

One may consider two-dimensional arrays of points or of more general numerical symbols.

To begin, we imagine that the sets or pictures are subsets of a bounded part of a Euclidean space with, say, the Euclidean metric in it (or a similar Minkowski-type metric). A widely used metric for closed subsets of this space is the Hausdorff distance. It is defined as follows:

ρ_H(A, B) = max_{x∈A} min_{y∈B} ρ_E(x, y) + max_{y∈B} min_{x∈A} ρ_E(x, y),

where A, B are (closed) sets in E and ρ_E is the metric in E. We want to define a metric in a space whose elements are classes of sets in E^2. Given a set or a picture, we consider sets which are translations or diminutions or enlargements of the given set; more generally, sets obtainable from the given one by "small" deformations. "Small" means that we have an ε-neighborhood of a group or a semigroup of transformations of E^2 and consider sets obtained from a given one by such transformations as similar to it. This way we have a collection of classes of sets, and one may define a distance function between any two such classes by iterating the Hausdorff procedure:

ρ(X, Y) = max_{A∈X} min_{B∈Y} ρ_H(A, B) + max_{B∈Y} min_{A∈X} ρ_H(A, B),

where X, Y are (compact) classes of sets and ρ_H is the Hausdorff distance as defined above.
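For finite point sets both formulas reduce to a few lines of computation. The sketch below is an added illustration, not part of the report; it follows the two formulas above literally (with the sum of the two one-sided terms, as in the text), on made-up point sets in the plane:

```python
# Illustrative sketch (not from the report): the Hausdorff distance between
# finite point sets in the plane, and its iteration to classes of sets.

import math

def directed(A, B, dist):
    return max(min(dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    return directed(A, B, math.dist) + directed(B, A, math.dist)

def class_distance(X, Y):
    # iterate the Hausdorff procedure on classes (lists) of sets
    return (max(min(hausdorff(A, B) for B in Y) for A in X)
            + max(min(hausdorff(A, B) for A in X) for B in Y))

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
shifted = [(x + 0.5, y) for (x, y) in square]
print(hausdorff(square, shifted))                            # 1.0 (two 0.5 terms)
print(class_distance([square, shifted], [square, shifted]))  # 0.0 for identical classes
```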

In particular this definition applies to subsets of a finite set of points (that is, a grid or lattice of points-a "screen"). Other distances are possible in analogy to the different metrics defined earlier for the one-dimensional case. We may, given two sets A and B, add or subtract individual points to any of them. A "step" involving k points will contribute k units of distance. A metric between the two sets can be defined as the minimum of the Hausdorff distance added to k. In other words, the Hausdorff difference between two fixed sets is modified by the optimum (smallest) k. Given two classes of sets we again iterate the Hausdorff procedure as above. One can prove again the triangle property of this distance.

A more general procedure of this type can be defined by starting with a more general distance in the original grid of points, for example starting with a distance of the type proposed by A. Bednarek.2 Again, it is a minimum number of steps that was used for the distance formula.*

In the case where the sets A, B are infinite, the changes effected by adding or subtracting a finite number of points have to be generalized to adding or removing sets of "small" measure (or length).

The analogy between sets or classes of sets is thus representable, in one way at least, by the smallness of a distance in a suitable metric defined for them.

* The minimum number of steps could be thought of as the minimum work of transition from one configuration to the other. A memoir of Appell3 deals with the problem of déblais and remblais. This involves the calculation of the minimum work sufficient to move a pile of "sand" of a given shape into another prescribed one.

Problems on the stability of properties of sets or transformations, or their invariance with respect to some changes, have been considered in many special cases. Thus, for example, the problem of transformations of a space, say the Euclidean or Hilbert spaces, that change the distances between points by less than a given ε > 0 was studied in a paper by Hyers and myself.4 Similarly, the problem of transformations that are almost linear is discussed in another paper with Hyers.5

The general problem of whether a transformation of a metric group into itself which is almost an isomorphism must be close to a strict isomorphism is still open. A recent result of D. Cenzer6 brings a positive solution in the case of some Abelian groups, for example, the group of rotations on a circle or torus.

One could pose analogous (sic!) more general problems about other functional equations. For example, suppose a function of two complex variables satisfies, up to an ε > 0, an algebraic addition theorem; must there exist then a function satisfying an algebraic addition theorem exactly and lying within c(ε) of the given function?

Let us consider still another question of "stability" of a more geometric nature. Given two surfaces which can be mapped into each other in such a way that the curvatures and the inverses of curvatures at corresponding points differ by less than ε > 0, do there exist surfaces within c(ε) of the given ones that are strictly isometric in the sense of internal geometry?

Certain topological properties are also stable with respect to transformations that are not necessarily homeomorphisms but more general ones: continuous, and not collapsing two points if the distance between them exceeds a certain ε > 0. There is, for example, a result of Borsuk and myself7 which asserts that if a continuous transformation T of the surface S of the unit sphere in n dimensions is such that T(p) ≠ T(q) whenever the distance between p and q is greater than (√2/2) r(S), where r(S) is the radius of S, then the image separates the space. An interesting distance between topological types proposed by Borsuk8 is defined as follows. Given two sets A, B in a metric space E, one considers all continuous mappings of A onto B and vice versa. If A and B are not homeomorphic, such mappings, if they exist at all, must collapse the images of a pair of points in either A or B or both. In the compact case the smallest sum of their distances is Borsuk's measure of the topological distinction between A and B.

One way to mathematize some ideas of analogy is then to consider a space 2^E with a metric ρ and a number ε > 0. Two sets of elements of 2^E will belong to a cluster if their distance according to the metric ρ is < ε. A cluster of analogous sets we define as a collection such that any two sets have a distance < ε. (These clusters in general are, of course, not disjoint.) Consider now a transformation T of the set E into itself such that if two sets A, B are analogous, then their images T(A) and T(B) are also analogous. One should want to find conditions on E, ρ, ε such that a transformation T of this sort preserves the analogy; that is, A is analogous to T(A) for all A. Theorems of this sort would be generalizations of some of the stability theorems mentioned above on ε-isometries, ε-linear transformations, etc.
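On a finite family of sets one can at least test whether a given transformation carries analogous pairs into analogous pairs. The sketch below is an added illustration, not from the report; the choice of one-dimensional sets, the summed Hausdorff distance, and the value of ε are all arbitrary:

```python
# Illustrative sketch (not from the report): check, on a finite family of
# sets, whether a transformation T preserves analogy, i.e., whether
# rho(A, B) < eps implies rho(T(A), T(B)) < eps.  The "sets" here are finite
# subsets of the line; rho is the summed two-sided Hausdorff distance.

def hausdorff(A, B):
    one_way = lambda S, T: max(min(abs(s - t) for t in T) for s in S)
    return one_way(A, B) + one_way(B, A)

def preserves_analogy(sets, T, eps):
    for A in sets:
        for B in sets:
            if hausdorff(A, B) < eps and hausdorff(T(A), T(B)) >= eps:
                return False
    return True

family = [{0, 1, 2}, {0.1, 1.1, 2.1}, {5, 6}]
shift = lambda A: {a + 3 for a in A}        # an isometry: preserves analogy
stretch = lambda A: {10 * a for a in A}     # a dilation: generally does not
print(preserves_analogy(family, shift, eps=0.5))     # True
print(preserves_analogy(family, stretch, eps=0.5))   # False
```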

Another investigation should concern, so to speak, analogies between analogies. Considering now E, ρ, ε, we may, as the next order, define a metric between clusters of sets within which any two sets are analogous. Using the Hausdorff formula as above, we will define a metric between such clusters. Given this metric we then have super clusters consisting of analogous classes.

Given E, ρ, ε, and another system F, σ, π, where σ is a metric in F and π > 0, we shall call the F system of analogies a representation or image of the E system if the combinatorial structure of the clusters in 2^E can be mapped into a combinatorial structure of 2^F, with a monotone correspondence of ε to π, preserving the Boolean relations between clusters of 2^E homomorphically into some clusters in 2^F.

An interesting case occurs if there exists a finite representation of E, ρ, ε, that is to say, one in which the set F consists of a finite number of points.

II—
Complexity

In recent years, a great number of studies have dealt with the mathematization of the idea of complexity in mathematical schemata. In this report, we want to study some special definitions of complexity, especially relative complexity of mathematical constructs, and consider the equality or approximate equality of complexity in two different mathematical setups as one characteristic of analogy.

We shall start with the account of work contained in a Los Alamos Laboratory report.9

1. Consider the set of positive integers Z and the operations of addition, multiplication, and exponentiation. The complexity of an integer n will be the smallest number of steps, using the above operations and starting with 1, adding the complexities of the numbers used to obtain n plus the complexities of the operations of addition, multiplication, and exponentiation, which in our first exercise we all took equal to 1. The complexity of 1 we assumed to be 1. We denoted the complexity of n by c(n). In our report we prepared a table of complexities of the first thousand integers, indicated an algorithm on a computer to calculate it, and gave some asymptotic expressions on the behavior of c(n). We also stated a number of problems. One of them is the following.

Call a number k complicated if c(k) > c(ℓ) for all ℓ < k. One of the conjectures was that, beginning with a certain n_0, all complicated numbers are primes.
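One reading of this definition can be turned into a short computation. The sketch below is an added illustration, not the algorithm of the report LA-4822; the bound on the range, the guard against large powers, and the names are all choices made only for the example:

```python
# Illustrative sketch (not from the report): complexity c(n) with the
# operations +, *, and exponentiation, each operation and the starting
# symbol 1 having complexity 1; c(R(x, y)) = c(x) + c(y) + 1, minimized
# over all representations.  Then the "complicated" numbers (record
# holders) in the range are listed.

LIMIT = 300

def complexities(limit=LIMIT):
    c = {1: 1}
    changed = True
    while changed:                        # relax until no value improves
        changed = False
        items = list(c.items())
        for a, ca in items:
            for b, cb in items:
                candidates = [a + b, a * b]
                if b * a.bit_length() < 20:       # keep powers small
                    candidates.append(a ** b)
                for value in candidates:
                    if value > limit:
                        continue
                    cost = ca + cb + 1            # operands plus one operation
                    if cost < c.get(value, float("inf")):
                        c[value] = cost
                        changed = True
    return c

c = complexities()
complicated = [k for k in sorted(c) if all(c[k] > c[m] for m in range(1, k))]
print([(k, c[k]) for k in complicated[:12]])
```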

We could define a relative complexity of an integer n relative to m by the smallest number of operations sufficient to obtain n not counting the complexity of m. (We will discuss this notion later in a more general context.)

The complexity c(n) may be considered more generally as follows. We have a number of symbols a_1, a_2, ..., a_n whose complexities are given at the start. We also have a number of binary or unary operations R_1, R_2, ..., R_t. The complexity c(x) of any new symbol x is defined by the following: c(a_1), c(a_2), ..., c(a_n) and c(R_1), c(R_2), ..., c(R_t) are given initially; if z = R_i(x, y), then c(z) ≤ c(x) + c(y) + c(R_i), and the sign of equality obtains for at least one such representation.

2. One could discuss, in the same spirit as above, the complexities of integers allowing, for example, the operation of subtraction a - b. It is obvious that the complexity values will change radically. We have indicated in our report a complexity within a system of integers modulo p.

Analogously one could define complexities within the system of Gaussian integers a + bi. More generally yet, in a finite or countably infinite ring, it is obvious how to define a complexity of its elements, assuming the complexity of some starting elements and of the ring operations.

3. Another "complex system" could be a group of permutations, say the symmetric group Sn of permutations or the semigroup Tn of all transformations of the set of n integers into itself. Here one would again start with a number of algebraic elements, that is, permutations with given complexities, and assign a value of complexity to the operation of composition and/or to the operation of taking an inverse of a permutation. The complexity of a permutation would be the smallest number of steps starting with the given elements sufficient to obtain a given one. Again an interesting question would be: what is the average complexity of a permutation; what is the dependence of this value on the initial "generating" elements; what are the asymptotic bounds on these for n tending to infinity, etc.?

4. We may consider a set E, finite or consisting of all the integers, its direct product E^n, and the Boolean algebra on its subsets. We also allow the arithmetic operations, say addition and/or multiplication of numbers, on the subsets. For example, given two subsets A, B, we define C = A + B as the set of all elements c representable as a sum or a product of one element from each of A and B: c = a + b or c = a · b, a ∈ A, b ∈ B. Starting with some "elementary" subsets S_1, ..., S_k with assigned complexities, we define the complexity of the sets they generate in the same way as in the above examples, by counting the number and complexity of steps in the shortest way.

5. One may consider the same as in 4, that is, subsets of E^n, allowing also the operator of projection of a set in E^n onto E^m (m < n), which is a unary operation, and the binary operation of direct product of sets in the "lower dimensions." These, of course, allow a treatment of quantifiers on a given class of sets obtainable from an initial base of sets. For the case of n = 2, one has a way to define complexity in the projective algebras10 or cylindrical algebras.11

6. One may consider in the case of n = 2 the operation of composition of sets treated as relations, together with Boolean algebra operations. Starting with some elementary sets, one will arrive at complexity definitions for sets in the relation algebra generated by them.12

7. Genealogical systems:13 A branching process of a multiplicative system14 constitutes possibly the simplest example of a complexity structure. In this process, starting with one or more elements, one considers the generation of progeny by mitosis, that is to say, each particle by itself gives rise to 0, 1, ... other particles and the process continues. The complexity of an element would be, of course, its number of ancestors leading to the original ones.

A much more complicated situation arises when the population is endowed with "sex" and the generation proceeds by pairing and the production of offspring, the process continuing. Compared to the rather well-developed theory of branching processes, results on such "pair trees" are still meager, and the analogs of the theorems known for branching processes remain unproved.

In some simple cases metrics have been defined for such genealogical trees,15 and a probabilistic study of the degree of relationship has been initiated.

The complexity of an element could be again the shortest path leading to the original ancestor (in our general schema both direct mitosis and meiosis are allowed in the procreation). A relative relationship, or relative complexity, would be defined as above as an asymmetric distance between two elements.



8. Formal systems more generally: A most comprehensive example would be given by a formal system of mathematics, for example, one due to Gödel, or a system of expressions obtainable by a Turing machine operating, say, in lexicographic order on all possible continuations and combinations of operations. Meaningful results could be obtained only with a "natural or economic" Gödel numbering system to mirror our intuitive and historical feeling about the development of logic and mathematics itself.

We might construct such trees of formal systems by discrete generations: given an n-th collection of expressions, we operate with all possible unary and binary combinations of the ones so far obtained to construct the (n + 1)th collection.

This gives rise to the "meaningful" expressions. If we have a system of axioms to start with, we may establish an identity between very many of these expressions. The identity of many pairs of statements, or the fact that certain expressions equal 0, will not be obtainable from the axioms and rules of identification; these are the undecidable propositions or statements independent of the axioms. The number of meaningful expressions grows, roughly speaking, as 2^n asymptotically, but with the "pruning" due to the identities provable from the axioms it will be much smaller. The number of essentially different ones is enormously reduced.

III—
Comparisons between the Complexity of Constructive Systems

Given two systems such as those above, for example, among the ones enumerated earlier, we may consider the question of homomorphism of one with a subsystem of the other. Our systems are examples of partially ordered sets and differ from genealogical systems proper in that a single element or "expression" may be the offspring of many different pairs of parents. A homomorphic mapping of one into the other would be one where relative complexities between pairs of elements are either preserved or decreased. (We repeat that the relative complexity of an element b with respect to a is the length of the shortest path leading from a and terminating in b, where we add up the complexities of all expressions and the complexities of operations linking the intervening pairs, without adding the complexity of a.)

A number of problems arise concerning the comparison between our examples as homomorphic images transforming one complexity system onto parts of another.

A question of combinatorial interest deals with an asymptotic expression for the complexity of elements of the n-th generation as a function of n; for example, the average such complexity and, if we have a system which is finite, the average complexity of all elements and the average relative complexity of pairs (a, b) where a ≺ b, that is, a precedes b in the construction.

In any of our systems, given an element, that is, an expression, we may look for the first time or the earliest generation where its equivalent appeared. This occurs, say, in the k-th generation. We denote by ℓ the length of the expression (this is the total number of symbols and operations involved in it altogether). One may be interested in a comparison of ℓ with k. Expressions or statements where k is very large compared to ℓ may be "interesting." In a system of integers with only addition and multiplication allowed (starting with 1 as in the above example), certain primes (for example, those of the form a^2 + 1) will be "interesting."

In the development of a "constructive system," one often introduces new operations which are abbreviations, in addition to the original R_i. For example, an n-th iteration or repetition of a binary operation R may be denoted by a new symbol R^n whose complexity is defined as the complexity of R plus the complexity of n, without adding the complexity of R n times.

A genetic development of a mathematical discipline in some way resembles the evolution of organisms and perhaps of matter itself. (See, for example, a mathematical schema for the development of physical patterns through a process of transmutations.16 )

In the development by recursion of geometrical figures17 and in the development of organisms or in the evolution of species, one may study a principle of "minimum total complexity" of the intervening stages between the initial and final positions. A "variational" principle of this kind would single out some histories of the process, out of all possible ones, between two given states.

Coming back to mathematical schemata, we can define a complexity of a rational number starting with the definition of complexity for integers by considering the operation of division with a given complexity and defining the complexity of the fraction a/b as the sum of the complexity of a and b plus the complexity of the operation of division. One could even attempt to define a complexity of a real number as the inferior limit of the complexities of the rational numbers converging to it normed by the complexity of the numerators and denominators suitably scaled, for example, by the logarithm of the two complexities.

Given two finite metric spaces we can consider a mapping of one into the other that minimizes the sum of the differences in the distances between corresponding pairs of points. Again, if the two spaces are, say, bounded, and the two metrics are defined on a topological measure space, we can consider as the measure of analogy between the two metrics an integral of these differences taken over the space of all pairs of points.
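For two very small finite spaces the minimizing mapping can be found by exhausting all bijections. The sketch below is an added illustration, not from the report; the distance matrices are made up for the example:

```python
# Illustrative sketch (not from the report): for two small finite metric
# spaces of equal size, search over all bijections for the one minimizing
# the sum of differences of distances between corresponding pairs.

from itertools import permutations

def metric_discrepancy(d1, d2):
    """d1, d2: symmetric n x n distance matrices."""
    n = len(d1)
    best = float("inf")
    for perm in permutations(range(n)):      # candidate bijection i -> perm[i]
        total = sum(abs(d1[i][j] - d2[perm[i]][perm[j]])
                    for i in range(n) for j in range(i + 1, n))
        best = min(best, total)
    return best

d1 = [[0, 1, 2],
      [1, 0, 1],
      [2, 1, 0]]
d2 = [[0, 1, 1],
      [1, 0, 2],
      [1, 2, 0]]
print(metric_discrepancy(d1, d2))   # 0: the spaces are isometric after relabelling
```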

Much of mathematics consists in ascertaining, in the developing and increasing formal systems, identities between different expressions, for example, in a set theoretical framework, showing some of them to be equal to zero.

Another endeavor or exercise could be an attempt to define, perhaps in a manner similar to the above, the complexity of proofs starting with a number of identities, that is, axioms. The similarity or analogy of this process continuing indefinitely suggests that mathematics itself exhibits the behavior of "eadem mutata resurgit."

In our search for properties of the notion of analogy, we may consider, in a given system, analogy-preserving transformations. We mentioned earlier theorems asserting that a transformation satisfying a linear functional equation up to ε must be close to one satisfying it exactly: a linear transformation that is almost isometric is close to a strictly isometric one.

In a space of elements which we metrize to define analogy by an ε-proximity, we may consider transformations of the space onto itself that preserve analogy, that is, such that any two analogous elements are transformed into analogous ones. One would like to know, in some cases, that a transformation of Euclidean space with the property that every pair of ε-congruent sets goes over into ε-congruent ones is close to a congruence. One can easily show that such a transformation is either a congruence or a change of scale in the distance. Similarly, one may try to prove analogous statements for more general definitions of analogy.

References

1. S. M. Ulam, "Some Ideas and Prospects in Biomathematics," Annual Review of Biophysics and Bioengineering 1, 277-291 (1972).

2. A. R. Bednarek and Temple F. Smith, "A Taxonomic Distance Applicable to Paleontology," Mathematical Biosciences 50, 285-295 (1980).

3. Paul Appell, "Le Problème Géométrique des Déblais et Remblais," Mémorial des Sciences Mathématiques de l'Académie des Sciences, fascicule 27 (1928), Gauthier-Villars, Paris.

4. D. H. Hyers and S. Ulam, "On Approximate Isometries," Bull. Am. Math. Soc. 51, 208-216 (1945).



5. D. H. Hyers and S. Ulam, "Approximate Isometries of the Space of Continuous Functions," Ann. Math. ser. 2, 49, 285-289 (1947).

6. D. Cenzer, "The Stability Problem for Transformations of the Circle," Proc. Royal Soc. of Edinburgh 84A, 279-281 (1979).

7. K. Borsuk and S. Ulam, "Über gewisse Invarianten der ε-Abbildungen," Math. Annalen, 312-318 (1933).

8. K. Borsuk, "On Some Metrizations of the Hyperspace of Compact Sets," Fund. Math. 41, 168 201 (1955).

9. W. A. Beyer, M. L. Stein, and S. M. Ulam, "The Notion of Complexity," Los Alamos Scientific Laboratory Report LA-4822 (December 1971).

10. C. J. Everett and S. Ulam, "Projective Algebra I," Amer. Jour. Math. 68, 77-88 (1946).

11. L. Henkin, D. Monk, and A. Tarski, Cylindrical Algebras (North Holland Publishing Co., 1971).

12. A. R. Bednarek and S. Ulam, "Projective Algebra and the Calculus of Relations," Jour. Symb. Logic 43, 54-64 (1978).

13. J. Mycielski and S. Ulam, "On the Pairing Process and the Notion of Genealogical Distance," Journ. Comb. Theory 6, 227-234 (1969).

14. C. J. Everett and S. Ulam, "Multiplicative Systems I," Proc. Nat. Acad. Sci. USA 34, 403-405 (1948). Also, C. J. Everett and S. Ulam, "Multiplicative Systems in Several Variables, I, II," Los Alamos Scientific Laboratory report LA-683 (1948), LA-690 (June 1948).

15. J. Kahane and R. Marr, "On a Class of Stochastic Pairing Processes and the Mycielski-Ulam Notion of Genealogical Distance," Jour. Comb. Theory A 13, 33-40 (1972).

16. S. Ulam, "On the Operations of Pair Production, Transmutations and Generalized Random Walks," Adv. Appl. Math., 1, 7-12 (1980).

17. R. G. Schrandt and S. M. Ulam, "On Recursively Defined Geometrical Objects and Patterns of Growth," Los Alamos Scientific Laboratory Report LA-3762 (November 1967).



21—
Speculations about the Mechanism of Recognition and Discrimination:
(LA-UR-82-62, 1982)

This report is a preprint of a talk I gave at Los Alamos in 1981, speculating about some of the methods which may be used in some processes occurring in the nervous system. (Author's note.)

Let me first say that this title is not quite exact. I may want to speculate, but it won't be about the physiological or anatomical nature of memory, about which I know nothing! At the end I may venture my own private little questions. I don't think anybody knows really what the true physiological elements of recognition are.

The talk will be about more abstract mathematical schemata, some of which may perhaps have a physical basis. I'll try to talk first about various ways in which a visual picture is recognized. Towards the end, I'll talk about how, with suitable changes, this may apply to other sets of objects--for example linear arrays like DNA codes, or to auditory experiences which are more or less linear too, as for example a sequence of musical notes.

I don't dare speculate too seriously about three-dimensional objects; that indeed is the domain of immunology, for example, or about olfactory recognition which refers to the recognition of molecules by something in our nose. But could all this be coded up linearly perhaps? Or could the shape of molecules have something to do with infrared radiation? I have looked at a book about olfactory problems, but it is ten years old and I suppose people know more now.



Let me now talk about some purely mathematical attempts to give the combinatorial schemata for what we call recognition.

Recognition is already more ambitious than what I would call discernment or discrimination, namely the finding of a difference between two signals. I have not seen that discussed per se in the literature, but it seems to me that it is more elementary to distinguish two different letters, for example, than to recognize an object stored in the memory. Discernment or discrimination (I don't know what to call it) is something we experimented on long ago on computers in this lab.

The main discussion will be about distinguishing between two-dimensional objects or pictures. The mathematical tool, or at least notion (for I don't think it deserves yet the name of tool), the mechanism I will talk about, is the idea of a distance between objects such as, for instance, pictures on a screen.

First let me explain the properties of a distance:

Suppose we have a set E of objects which we will call a, b, c, .... A distance ρ(a, b) is a real-valued function of pairs of elements of E. It should be ≥ 0; when it is 0, this means that a = b. It is symmetric, ρ(a, b) = ρ(b, a). It also has a very important property for all applications, called the triangle inequality: ρ(a, c) ≤ ρ(a, b) + ρ(b, c) for all a, b, c. If you have such a function on a set E, the set is called a metric space.

Mathematicians and physicists are familiar, of course, with metric spaces such as Euclidean space, Hilbert space and all kinds of function spaces, manifolds with "curved" geometry, with distance measured on geodesic lines, and so on.

I would like to give examples of differently constructed metric spaces. These may provide at least a language for certain biological phenomena, a language different from that of the metric spaces familiar in physics.

Here are various fifteen-year-old attempts to define a distance between sequences of symbols of a set of DNA codes, for example. The sequences consist of long arrays of four letters, A, C, T, G or U, looking for instance like ACTTGGA ....

For simplicity's sake, instead of using four letters, I'll take a sequence of just two letters and will call them 0 and 1. One such sequence may be A = 011110101001... 01... the other B may be B = 101101.... The sequences are quite long--they may have a thousand or two thousand letters, and perhaps they form the code for some definite molecule.

How can we compare them? The idea is to define a quantitative measure of this comparison, or a distance between them. Walter Goad* and George Bell have occasionally defined various distances. I myself played with this ten or twelve years ago and considered distances along lines different from those of pure mathematics.

* Los Alamos physicists who have become interested in biology. (Eds.)

One of the simplest ways to define a metric for sequences of N symbols 0 and 1 is, for example: for x = a1 a2 ... aN and y = b1 b2 ... bN, take p(x, y) = |a1 - b1| + |a2 - b2| + ... + |aN - bN|, sometimes called the Hamming distance. Clearly this distance is not what one would want in biology for a distance between codes. In "pure" mathematics, as in most physical situations, the descriptions of objects, i.e., their positions, are fixed. They are, so to say, rigid, and their beginning and end are fixed, whereas in biological situations the objects are pliable. The distance above between the two sequences x = 0101...0101 and y = 1010...1010 would be large, namely N, because they differ in every place. But if we erase just one letter from each we see that the sequences are the same. If they were written on a circle they would be exactly the same.
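
In present-day terms, and purely as an illustration (a minimal sketch in Python, with made-up sequences), the Hamming distance and the alternating-sequence example read:

    # Hamming distance between two 0/1 sequences of equal length:
    # the number of positions in which they differ.
    def hamming(x, y):
        assert len(x) == len(y)
        return sum(1 for a, b in zip(x, y) if a != b)

    x = "01" * 500   # 0101...01, one thousand letters
    y = "10" * 500   # 1010...10, one thousand letters
    print(hamming(x, y))   # 1000: the two sequences differ in every place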

The distance between two linear biological arrays could be defined as the minimum number of steps which will change one sequence into the other or, more generally, will work on both to bring them into the same form.

What are these steps? Given two sequences of 0's and 1's, x = a1 a2 ... aN and y = b1 b2 ... bM (M could be different from N), we define the distance between x and y as the minimum number of steps which, operating on one or the other or on both of the sequences, will bring x and y into the same form. One can prove that this minimum number satisfies the properties of a distance.

The steps are of the following types: 1. A change of a 0 into a 1 or vice versa. 2. An erasure of a symbol and a contraction of the remaining ones, or an insertion of a symbol anywhere in the sequence. It turns out that a rather efficient algorithm devised by Peter Sellers determines this distance in fewer than N^3 steps.
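
Sellers' algorithm itself is not reproduced here, but the same minimum number of steps (a change of a symbol, an erasure, an insertion) can be computed by the standard dynamic programming over prefixes, in about N times M elementary operations, comfortably below the N^3 bound just mentioned. A minimal sketch in Python:

    # Minimum number of steps (change, erase, insert) bringing two sequences
    # into the same form -- computed by dynamic programming over prefixes.
    def edit_distance(x, y):
        n, m = len(x), len(y)
        d = [[0] * (m + 1) for _ in range(n + 1)]   # d[i][j]: first i symbols of x vs first j of y
        for i in range(n + 1):
            d[i][0] = i                             # erase i symbols
        for j in range(m + 1):
            d[0][j] = j                             # insert j symbols
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                change = d[i - 1][j - 1] + (x[i - 1] != y[j - 1])
                erase = d[i - 1][j] + 1
                insert = d[i][j - 1] + 1
                d[i][j] = min(change, erase, insert)
        return d[n][m]

    print(edit_distance("0101010101", "1010101010"))   # 2, e.g. erase one letter from each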

A definition of the distance given above was presented in an article I wrote in the Annual Review of Biophysics.1

Distances of this sort can be used to compare sequences of DNA defining proteins, for instance. The biologist Margoliash had the idea of trying to reconstruct the evolutionary tree for organisms based on the codes for cytochrome C, a protein present in essentially every living organism and seemingly constant within a species but different from species to species. His idea was that species whose codes for cytochrome C differ less from each other are related more closely in the evolutionary tree, and species whose codes for this protein differ by more are related more distantly. To reconstruct a possible evolutionary tree led mathematically to the problem of constructing a binary graph of "descent" for the species now existing, and perhaps some extinct ones which have disappeared, starting with some very primitive organisms, in such a way that the sum of all the distances along this binary graph is as small as possible. This postulate of Margoliash translates mathematically into an assumption that the collection of all mutations which occurred, giving rise to the now existing variety of species, was the least improbable among the possible ones.

Treating the space of the DNA codes defining the various cytochrome C sequences as a finite metric space, we have a generalization of a problem of the nineteenth-century German geometer Steiner, who considered a finite system of points in the plane. His problem was to draw a graph through all the points, perhaps adding new auxiliary ones, so that the sum of the lengths of all the edges would be minimal. Here we have this kind of problem in a much more general combinatorial setting, for a finite metric space.
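
The full Steiner-type problem allows auxiliary points (putative ancestral codes) to be added, which is much harder; as a rough illustrative surrogate one can at least connect the given codes by a minimum spanning tree of the finite metric space, which minimizes the total edge length when no new points are admitted. A minimal Python sketch, with made-up codes and the Hamming distance standing in for whichever sequence metric is preferred:

    # Minimum spanning tree on a finite metric space: connects the given points with
    # the smallest total edge length (unlike Steiner's problem, no auxiliary points).
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    def minimum_spanning_tree(labels, dist):
        in_tree = [labels[0]]
        edges, total = [], 0
        while len(in_tree) < len(labels):
            a, b = min(((a, b) for a in in_tree for b in labels if b not in in_tree),
                       key=lambda ab: dist(ab[0], ab[1]))
            in_tree.append(b)
            edges.append((a, b))
            total += dist(a, b)
        return edges, total

    codes = {"sp1": "ACTTGGA", "sp2": "ACTTGGT", "sp3": "ACGTGGA", "sp4": "TCTTGAA"}  # made up
    print(minimum_spanning_tree(list(codes), lambda a, b: hamming(codes[a], codes[b])))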

One can think of many other types of distances for the sets of sequences of 0's and 1's, or more generally for sequences with k symbols, k being a fixed number. One such definition, again involving the smallest number of steps which allow passage from one sequence to another, with the nature of steps defined ahead of time, is, for example, the following:

Suppose that we compare two sequences of 0's and 1's with the same number of 1's and 0's in each, and allow a step which consists of moving a 0 or a 1 from one position to another, the cost of this step being the length through which we move it. The "minimum" work needed to effect this transformation is a possible distance between two such sequences.
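
A minimal sketch of this "minimum work" distance, relying on the standard fact that in one dimension the cheapest way to match two equal-sized sets of positions is in sorted order (the sequences below are made up):

    # Minimum work to move the 1's of one sequence onto the 1's of the other,
    # each move of a 1 costing the length through which it is moved.
    def work_distance(x, y):
        px = [i for i, s in enumerate(x) if s == "1"]
        py = [i for i, s in enumerate(y) if s == "1"]
        assert len(px) == len(py), "same number of 1's (and 0's) required"
        return sum(abs(i - j) for i, j in zip(px, py))   # positions come out sorted

    print(work_distance("0101", "1010"))   # 2: each 1 moves by one place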

Another definition can be obtained by comparing the number of 1's in the two sequences and noting the difference between these numbers; then considering the pairs of successive symbols in the two sequences (a 0 followed by a 1, a 1 followed by a 1, etc.) and again noting the differences in the counts of these configurations. After this survey of pairs of succeeding symbols, one counts triplets and so on; the sum total of such differences, suitably normalized, can serve as a distance.
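
A sketch of such a counting distance; the text leaves the normalization open, so the 1/2^k weights below are an arbitrary illustrative choice:

    from collections import Counter

    # Compare how often each block of length k = 1, 2, ..., kmax occurs in the two
    # sequences and add up the differences, weighted (arbitrarily here) by 1/2**k.
    def block_count_distance(x, y, kmax=3):
        total = 0.0
        for k in range(1, kmax + 1):
            cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
            cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
            for block in set(cx) | set(cy):
                total += abs(cx[block] - cy[block]) / 2.0 ** k
        return total

    print(block_count_distance("0110110", "0101101"))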

We will now present a larger variety of possible distances between objects such as two-dimensional "pictures." We shall concentrate on visual, two-dimensional impressions and on ways to quantify, to a degree, the similarity or lack of it between two-dimensional objects or "pictures." A variety of possible distances between two sets in a plane, or more generally between two classes of sets in a plane, will be discussed. Just as in the one-dimensional case, where the possibility of "recognition" of a sequence of symbols involves the smallness of a distance between impressions (auditory or tactile, for instance) and the strings of symbols coding them which reside in the memory from previous impressions, so a two-dimensional visual impression is compared with the picture or pictures stored in the memory of an organism.

Without at the moment going into possible physiological or anatomical ways to evaluate such distances, we shall discuss abstractly various ways of considering a metric space whose elements are sets in the plane. For our purpose it is sufficient to consider them finite, or if infinite, closed and bounded.

One can consider a topological distance due to Hausdorff. It is defined for closed subsets of a metric space E as follows: let E be a space with metric pE(x, y). For any two closed sets A and B one can take as a distance pH(A, B) = max_{x in A} min_{y in B} pE(x, y) + max_{y in B} min_{x in A} pE(x, y).
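
For two finite sets of points in the plane this quantity (as defined here, a sum of the two one-sided terms) is immediate to compute; a minimal Python sketch with made-up points:

    import math

    # Hausdorff-type distance, exactly as in the formula above: the worst distance
    # from a point of A to its nearest point of B, plus the same with A and B reversed.
    def euclid(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def hausdorff(A, B):
        forward = max(min(euclid(a, b) for b in B) for a in A)
        backward = max(min(euclid(a, b) for a in A) for b in B)
        return forward + backward

    A = [(0, 0), (1, 0), (1, 1)]
    B = [(0, 0), (1, 0), (1, 1), (5, 5)]   # the same set with one stray point added
    print(hausdorff(A, B))                 # about 5.7: a few added points dominate the value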

But again this is not quite what one wants for a distance between two impressions or two sets given separately since the distance above depends on the position or mutual relation of two subsets A, B, in the plane (screen).

One can obtain a more satisfactory distance by iterating the Hausdorff formula as follows: instead of a fixed set A, consider a whole class of sets "like" A, obtained by slightly deforming, shifting, or turning the given set A, and more generally by applying to A a number of transformations forming a neighborhood of the identity in a whole group of transformations. In this way we obtain a whole class 𝒜 of sets. Proceeding analogously with the set B we obtain a class ℬ. Assuming that these classes are finite, or compact in the case of an infinity of transformations, we may now consider a distance between 𝒜 and ℬ as: p(𝒜, ℬ) = max_{B in ℬ} min_{A in 𝒜} pH(A, B) + max_{A in 𝒜} min_{B in ℬ} pH(A, B).

One of our contentions is that in problems of reaction to impressions (visual, auditory, tactile, or chemical), the organism produces a number of small variations of the impressions stored in the memory. Perhaps one is allowed to speculate that the memory could reside not only in the central nervous system, but could exist in the immunological or other autonomous parts of the organism.


There are other distances, in addition to the Hausdorff distance, which might actually be more suitable for the arrangements in the visual systems. The distance as defined above has the drawback that its value for a pair of sets which are almost identical except for a few points added to one of the sets may be considerable.

One can generalize the Hausdorff idea still further by including, in the class 𝒜, variations of the set A obtained by looking at it "modulo" a small number of points or, in the infinite case, "modulo" sets of small linear measure.

Another distance between two sets, each of which has its own metric between points in them (e.g., if they are both subsets of a plane with the Euclidean metric), can be obtained by trying to map the set A into B and vice versa with the smallest number of errors. If both sets are finite, we consider all mappings of one into the other, trying to achieve an isometry as much as possible, that is to say, a transformation such that a pair of points x, y in A should go into a pair of points x', y' whose distance is equal to the distance between x and y.

Given a mapping, we can calculate the sum of the errors in these distances. The distance between the two sets A and B can then be defined as the minimum of the sum of errors over all possible mappings. In practice, of course, the number of all possible trial mappings is enormous, even for sets A and B consisting of a small number of points, and trying all mappings is impractical. Instead one can have recourse to a Monte Carlo type assay by looking at very small subsets of A and mapping them into subsets of B, and vice versa.

Even if the number of such subsets is large, say hundreds or thousands, the total computation will be vastly shorter than examining the exponentially increasing (factorial in n) number of all mappings.
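
A minimal sketch of such a Monte Carlo assay; the sample size, the number of trials, and the point sets are arbitrary illustrative choices, and the value returned is a crude stand-in for the true minimum over all mappings:

    import itertools, math, random

    def euclid(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Sum over pairs of the errors |d(a_i, a_j) - d(b_i, b_j)| for the correspondence a_i -> b_i.
    def mapping_error(pts_a, pts_b):
        return sum(abs(euclid(pts_a[i], pts_a[j]) - euclid(pts_b[i], pts_b[j]))
                   for i, j in itertools.combinations(range(len(pts_a)), 2))

    # Instead of all n! mappings of A onto B, try small random subsets mapped in all ways.
    def monte_carlo_distance(A, B, k=4, trials=200):
        best = float("inf")
        for _ in range(trials):
            sub_a = random.sample(A, k)
            for sub_b in itertools.permutations(random.sample(B, k)):
                best = min(best, mapping_error(sub_a, list(sub_b)))
        return best

    A = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 2)]
    B = [(5, 5), (6, 5), (6, 6), (5, 6), (7, 7)]   # a translated copy of A
    print(monte_carlo_distance(A, B))              # very likely 0.0: an exact partial isometry is found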

The above definition bears a resemblance to the one involving a problem considered by Appell in his study of "Déblais et Remblais"2: the minimum work necessary for transforming a given pile of sand (the "points" of a set) into a given different configuration.

What such a definition suggests is that, given a set on a screen providing a new impression, there may be a mechanism attempting to map the points which form it onto the points of a set residing in the memory, with only a small number of errors in the distances between pairs of corresponding points. If this is possible, we consider the new impression as a recognition of a previous one. If it is not possible, we might put it into the memory as a new object.

Here are a few more possible distances between sets: Given a set of points in the unit square, we may imagine it transformed homothetically so that it touches the boundary of the square. We now consider a successive division of the square, first into four squares of size 1/2, then a division into sixteen squares of size 1/4, then sixty-four, etc. We examine the characteristic function of the given set in the subdividing squares. We allot weights, normalized so that the squares of the first subdivision have weight 1, those of the second subdivision 1/4, then 1/16, etc. The set will then be coded by a sequence of these numbers.

Given two sets A and B, we may define their distance as a norm of the difference between the two coding sequences. It is distances of this sort that were used by Schrandt and myself in experiments on the computer which we performed in Los Alamos around 1965 to attempt recognition, via computer, of handwritten letters of the alphabet.
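
A minimal sketch of the coding just described; the preliminary homothetic rescaling is omitted, the subdivision depth is arbitrary, and the point sets are made up:

    # Code a point set in the unit square by successive subdivision into 4, 16, 64, ...
    # squares: each occupied square contributes its weight (1, then 1/4, then 1/16, ...),
    # and two sets are compared through the norm of the difference of their codes.
    def code(points, depth=3):
        out = []
        for level in range(1, depth + 1):
            n = 2 ** level                       # n x n squares of side 1/n
            weight = 1.0 / 4 ** (level - 1)      # 1, 1/4, 1/16, ...
            for i in range(n):
                for j in range(n):
                    occupied = any(i <= n * x < i + 1 and j <= n * y < j + 1 for x, y in points)
                    out.append(weight if occupied else 0.0)
        return out

    def code_distance(P, Q, depth=3):
        return sum(abs(a - b) for a, b in zip(code(P, depth), code(Q, depth)))

    P = [(0.1, 0.1), (0.9, 0.9)]
    Q = [(0.1, 0.15), (0.85, 0.9)]
    print(code_distance(P, Q))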

We proceeded as follows: We wanted to discriminate between two handwritten letters A and B. We stored in the memory 256 examples of each in the following way: obviously it would have been laborious, time consuming, and slow to produce them by changing the styles of these letters by handwriting. Instead we produced varied examples of a handwritten letter on the computer rather quickly. It is well known that there exist, on an interval and similarly on the unit square, two continuous transformations whose compositions produce a set dense in the space of all such continuous transformations.

Remembering this fact, we chose two transformations S and T of the unit square, each differing little from the identity (small deformations). If we consider transformations of the form STTST..., SSTSTS..., etc., of say 7 letters, we have 2^7 = 128 different transformations, each still not too violently distorting the geometry of the square. Thus we obtain 128 examples of each of the letters A and B, initially written once by hand into the machine.

Given a new letter also handwritten, the problem was for the machine to decide whether it was an A or a B.

We computed the distance between the problem letter and the 128 examples of each which were now stored in the memory. Whichever sum of distances was smaller determined the answer. As it turned out, the deformed examples of each letter produced by the computer looked like different handwritten styles, some written by an old man with a shaking hand, some more rounded or pointed, etc., as if they had really been produced by people. The first computer trials gave more than 80% correct answers!
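
The following Python sketch imitates the experiment on a toy scale; the two shear deformations, the five-point letter "skeletons," and the distance used (a Hausdorff-type sum, though any of the distances above would do) are all made up for illustration and are not the ones used in 1965:

    import itertools, math

    def S(p):                                   # a small horizontal shear of the unit square
        x, y = p
        return (x + 0.02 * math.sin(math.pi * y), y)

    def T(p):                                   # a small vertical shear of the unit square
        x, y = p
        return (x, y + 0.02 * math.sin(math.pi * x))

    def deform(points, word):                   # apply a composition such as STTSTST
        for f in word:
            points = [f(p) for p in points]
        return points

    def variants(points, length=7):             # all 2**7 = 128 words in S and T
        return [deform(points, w) for w in itertools.product((S, T), repeat=length)]

    def dist(P, Q):                             # a Hausdorff-type sum, as above
        d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
        return (max(min(d(p, q) for q in Q) for p in P)
                + max(min(d(p, q) for p in P) for q in Q))

    letter_A = [(0.2, 0.1), (0.5, 0.9), (0.8, 0.1), (0.35, 0.5), (0.65, 0.5)]   # crude "A"
    letter_B = [(0.2, 0.1), (0.2, 0.9), (0.6, 0.7), (0.2, 0.5), (0.6, 0.3)]     # crude "B"
    stored = {"A": variants(letter_A), "B": variants(letter_B)}

    unknown = deform(letter_A, [T, S, S, T, T, S, T])          # a distorted "A"
    sums = {name: sum(dist(unknown, v) for v in vs) for name, vs in stored.items()}
    print(min(sums, key=sums.get))                             # whichever sum is smaller: expected "A"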

Is it possible that in the actual process of recognition (or discernment, or distinction) between two visual impressions one does not need to have recourse to very many examples stored in the memory? That instead, by taking one of the examples in the memory, we might apply internally some deformations and compare them with the given impression?


This would be a great saving of memory storage and it could be applied not only to single pictures, but to pairs or triplets or short "films" of pictures, thus enabling recognition or distinction between different new impressions.

A more sophisticated schema of recognition, in fact the beginning of more abstract reasoning, would involve a distance more general yet than the ones mentioned above. This could be based on the comparison of two given sets by decomposing them into pieces and considering a distance calculated from the sum of the comparisons of the parts.

A still more general distance, leading to a beginning of "logic," would involve a comparison of classes of pictures; the post hoc ergo propter hoc (after this, therefore because of this) type of conclusion serves as an example.

About problems of recognition of shapes by comparison with a large storage of examples:

One question which occurs in auditory, tactile, or visual impressions concerns the taking of a decision that the new impression is "novel" and not to be considered as a variant of one of the examples stored in the memory. Thus, for example, antibodies are able to recognize a foreign or strange object.

We may postulate that there is a list of objects in the memory considered "familiar." This list might be, for example, arranged lexicographically or alphabetically or coded by numbers. As we will see, the way of arranging it in the memory is important.

For example, suppose we have listed 10^6 numbers, each of ten digits, say, so we have a rather sparse collection. A new number is presented having ten digits. The first question is: Is it equal to some number in the list? On a computer the answer can be obtained immediately. Similarly in a putative mechanism in the brain, since it suffices to go through the digits in succession, which is a fast and efficient process.

Suppose however that the question is: Is the new number, if not equal to any number in the list, at least close to some such number, e.g., does it differ from it by 1 or 2 in some position? The search for such a close number would be time consuming if we tried to compare the given number with each of the 10^6 numbers in the memory. Obviously there is a better way.

From the given number we fabricate 20 numbers which differ from it by 1 in some one of the ten digits. Then we check very quickly, for each of the 20, whether it is in the list, as above. Again we see that should there exist in the brain a mechanism to produce some small changes or deformations and then compare those with the contents of the memory, recognition or discrimination would be much more efficient.
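
A minimal sketch of this look-up scheme; the stored list here is random, purely for illustration:

    import random

    # A sparse list of 10**6 ten-digit numbers, kept as a set so that membership
    # of any single number can be checked essentially at once.
    stored = {"%010d" % random.randrange(10 ** 10) for _ in range(10 ** 6)}

    def neighbors(number):
        # the (at most 20) numbers differing from it by 1 in a single digit
        out = []
        for i, ch in enumerate(number):
            for d in (int(ch) - 1, int(ch) + 1):
                if 0 <= d <= 9:
                    out.append(number[:i] + str(d) + number[i + 1:])
        return out

    def familiar(number):
        if number in stored:
            return "exact match"
        if any(n in stored for n in neighbors(number)):
            return "close match"
        return "novel"

    print(familiar("0123456789"))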


The above example refers to one particular distance which depends on the absolute value of the difference in the digits in the same position.

For other types of distances described above, analogous but combinatorially more complicated procedures are possible. For each of these distances the question of what is the most economical or practical clustering presents an interesting exercise.

The principal contention or conjecture is then the existence of a mechanism in the nervous system capable of producing a number of ε-modifications of the impressions that affect distinction or recognition.

Astronomers trying to find a new star on a photograph of a portion of the sky quickly flip a number of pictures of this region and the new object jumps out visually from the collection of the others which are constantly present. A parade of coded pictures in the memory compared with the new impression might serve a similar purpose. Our supposition is that there exist ways of sensing quantitatively a number of different distances between the impression and the memory data, distances that are perhaps not unlike the ones we have enumerated above--an "averted" memory may, like averted vision, aid in the search.

Going beyond, we may have the beginning of a "logical" or "reasoning" process by considering sequences of impressions of pictures and measuring their analogy by metric distances.

References

1. S. Ulam, "Some Ideas and Prospects in Biomathematics," Ann. Rev. Biophys. and Bioeng. 1, 277-292, (1972).

2. Paul Appell, "Le Problème Géométrique des Déblais et Remblais," Mémorial des Sciences Mathématiques de l'Académie des Sciences, fasc. 27 (1928), Gauthier-Villars, Paris.


Appendix A—
Publications of Stanislaw M. Ulam

by Barbara Hendry (LA-3923-MS, 1968)

The material contained in this report was revised, expanded and brought up to date in 1987 with the help of Nancy Shera and Dixie MacDonald. (Eds.)

Remark on the generalized Bernstein's theorem. * Fundamenta Mathematicae 13(1929): 281-3.

Concerning functions of sets.* Fundamenta Mathematicae 14(1929): 231-3.

Zur Masstheorie in der allgemeinen Mengenlehre.* Fundamenta Mathematicae 16(1930): 140-50. Also in Mengenlehre, edited by U. Felgner, 223-33. Darmstadt: Wissenschaftliche Gesellschaft, 1979.

On symmetric products of topological spaces (with Karol Borsuk).* Bulletin of the American Mathematical Society 37(1931): 875-82.

Sur une propriete de la mesure de M. Lebesgue (with J. Schreier).* Comptes Rendus de l'Academie des Sciences 192(1931): 539-42.

Zum Massbegriff in Produktraumen.* In Verhandlungen, Internationaler Mathematikerkongress Zürich 1932, volume 2, 118-9. Zurich: Orell Füssli Verlag, 1932.

* This publication appears in Stanislaw Ulam: Sets, Numbers, and Universes, edited by W. A. Beyer, J. Mycielski, and G.-C. Rota. Cambridge, Massachusetts: The MIT Press, 1974.


Quelques proprietes topologiques du produit combinatoire (with C. Kuratowski).* Fundamenta Mathematicae 19(1932): 247-51.

Sur les transformations isometriques d'espaces vectoriels normés (with S. Mazur).* Comptes Rendus de l'Academie des Sciences 194(1932): 946-8.

Uber gewisse Zerlegungen von Mengen.* Fundamenta Mathematicae 20(1933): 221-3.

Probleme 56. Fundamenta Mathematicae 20(1933): 285.

Uber gewisse Invarianten der ε-Abbildungen (with Karol Borsuk).* Mathematische Annalen 108(1933): 311-8.

Sur un coefficient lie aux transformations continues d'ensembles (with C. Kuratowski).* Fundamenta Mathematicae 20(1933): 244-53.

Sur le groupe des permutations de la suite des nombres naturels (with J. Schreier).* Comptes Rendus de l'Academie des Sciences 197(1933): 737-8.

Sur les transformations continues des spheres euclidiennes (with J. Schreier).* Comptes Rendus de l'Academie des Sciences 197(1933): 967-8.

Uber die Permutationsgruppe der natürlichen Zahlenfolge (with J. Schreier).* Studia Mathematica 4(1933): 134-41.

Sur la théorie de la mesure dans les espaces combinatoires et son application au calcul des probabilités: I. Variables indépendantes (with Z. Lomnicki).* Fundamenta Mathematicae 23(1934): 237-78.

Uber topologische Abbildungen der euklidischen Spharen (with J. Schreier).* Fundamenta Mathematicae 23(1934): 102-18.

Eine Bemerkung uber die Gruppe der topologischen Abbildungen der Kreislinie auf sich selbst (with J. Schreier).* Studia Mathematica 5(1934): 155-9.

Sur le nombre de generateurs d'un groupe semi-simple (with H. Auerbach).* Comptes Rendus de l'Academie des Sciences 201(1935): 117-9.

Sur une propriete caracteristique de l'ellipsoide (with H. Auerbach and S. Mazur).* Monatshefte für Mathematik und Physik 42(1935): 45-8.

Sur le nombre des generateurs d'un groupe topologique compact et connexe (with J. Schreier).* Fundamenta Mathematicae 24(1935): 302-4.

Uber die Automorphismen der Permutationsgruppe der natürlichen Zahlenfolge (with J. Schreier).* Fundamenta Mathematicae 28(1937): 258-60.

Probleme 74. Fundamenta Mathematicae 30(1938): 365.

On the equivalence of any set of first category to a set of measure zero (with J. C. Oxtoby).* Fundamenta Mathematicae 31(1938): 201-6.


On the existence of a measure invariant under a transformation (with J. C. Oxtoby).* Annals of Mathematics, Second Series 40(1939): 560-6.

Measure-preserving homeomorphisms and metrical transitivity (with J. C. Oxtoby).* Annals of Mathematics, Second Series 42(1941): 874-920.

What is measure?* The American Mathematical Monthly 50(1943): 597-602.

Theory of multiplicative processes. I (with D. Hawkins). Los Alamos Scientific Laboratory report LA-171, 1944.

On ordered groups (with C. J. Everett).* Transactions of the American Mathematical Society 57(1945): 208-16.

On approximate isometries (with D. H. Hyers).* Bulletin of the American Mathematical Society 51(1945): 288-92.

Stefan Banach, 1892-1945.** Bulletin of the American Mathematical Society 52(1946): 600-3.

Projective algebra I (with C. J. Everett).* American Journal of Mathematics 68(1946): 77-88.

Problemes P34; P35; P35,R1 (with S. Banach). Colloquium Mathematicum 1(1947): 152-3.

Approximate isometries of the space of continuous functions (with D. H. Hyers).* Annals of Mathematics, Second Series 48(1947): 285-9.

Statistical methods in neutron diffusion (with J. von Neumann). Report written by R. D. Richtmyer and J. von Neumann. Los Alamos Scientific Laboratory report LAMS-551, 1947. Also in Von Neumann: Collected Works, 1903-1957, edited by A. H. Taub, volume 5. Oxford: Pergamon Press, 1963.

Multiplicative systems, I (with C. J. Everett).* Proceedings of the National Academy of Sciences of the United States of America 34(1948): 403-5.

Multiplicative systems in several variables. Parts I, II, and III (with C. J. Everett). Los Alamos Scientific Laboratory reports LA-683, LA-690, and LA-707, 1948.

The Monte Carlo method (with Nicholas Metropolis).* Journal of the American Statistical Association 44(1949): 335-41.

On the Monte Carlo method. In Proceedings of the 1949 Symposium on Large-Scale Digital Calculating Machines, 207-12. Cambridge, Massachusetts: Harvard University Press, 1951.

** This publication appears in Science, Computers and People: From the Tree of Mathematics, edited by Mark C. Reynolds and Gian-Carlo Rota. Boston: Birkhäuser, 1986.


Random processes and transformations. In Proceedings of the International Congress of Mathematicians (Cambridge, Massachusetts, August 30-September 6, 1950), volume 2, 264-75. Providence, Rhode Island: American Mathematical Society, 1952.

Approximately convex functions (with D. H. Hyers).* Proceedings of the American Mathematical Society 3(1952): 821-8.

A property of randomness of an arithmetical function (with N. Metropolis).* The American Mathematical Monthly 60(1953): 252-3.

Heuristic studies in problems of mathematical physics on high speed computing machines (with J. Pasta). Los Alamos Scientific Laboratory report LA-1557, 1953.

On the stability of differential expressions (with D. H. Hyers).* Mathematics Magazine 28(1954): 59-64.

Homage to Fermi. Santa Fe New Mexican, January 6, 1955.

On a method of propulsion of projectiles by means of external nuclear explosions (with C. J. Everett). Los Alamos Scientific Laboratory report LAMS-1955, 1955.

Studies of nonlinear problems. I (with E. Fermi, J. Pasta, and M. Tsingou).* Los Alamos Scientific Laboratory report LA-1940, 1955. Also in Enrico Fermi: Collected Papers, volume 2, edited by E. Amaldi, H. L. Anderson, E. Persico, E. Segre, and A. Wattenberg. Chicago: University of Chicago Press, 1965.

Study of certain combinatorial problems through experiments on computing machines (with P. R. Stein). In Proceedings of the 1955 High-Speed Computer Conference (Louisiana State University, Baton Rouge, Louisiana, February 14-16, 1955), 101-6.

On the ergodic behavior of dynamical systems. In "Series of lectures on physics of ionized gases." Los Alamos Scientific Laboratory report LA-2055, 1956.

On certain sequences of integers defined by sieves (with Verna Gardiner, R. Lazarus, and N. Metropolis).* Mathematics Magazine 29(1956): 117-22.

Infinite models in physics.* In Proceedings of the Seventh Symposium in Applied Mathematics (Brooklyn Polytechnic Institute, April 14-15, 1955). American Mathematical Society Symposia in Applied Mathematics, volume 7, 87-95. New York: McGraw-Hill Book Company, Inc., 1957.

Marian Smoluchowski and the theory of probabilities in physics.** American Journal of Physics 25(1957): 475-81.

The Scottish Book: A Collection of Problems. An edited translation of a notebook kept at the Scottish Café for the Lwów Section of the Société Polonaise de Mathématiques. Privately mimeographed and distributed by S. M. Ulam in 1957. Reprinted as Los Alamos Scientific Laboratory report LA-6832, 1967.

On some new possibilities in the organization and use of computing machines. IBM research report RC-86, 1957.

Experiments in chess (with J. Kister, P. Stein, W. Walden, and M. Wells).* Journal of the Association for Computing Machinery 4(1957): 174-7.

Experiments in chess on electronic computing machines (with P. R. Stein).** Chess Review, January 1957, 13-5. Also in Computers and Automation, September 1957.

John von Neumann, 1903-1957.** Bulletin of the American Mathematical Society 64(1958): 1-49.

The late John von Neumann on computers and the brain.** Scientific American, June 1958, 127.

On the possibility of extracting energy from gravitational systems by navigating space vehicles. Los Alamos Scientific Laboratory report LAMS-2219, 1958.

Statement before the U.S. House of Representatives. Hearings on Astronautics and Space Exploration. 85th Congress, 2nd session, April 15-May 12, 1958.

Review of Funkcje Rzeczywiste by Roman Sikorski. Bulletin of the American Mathematical Society 65(1959): 305-6.

Quadratic transformations. Part I (with M. T. Menzel and P. R. Stein). Los Alamos Scientific Laboratory report LA-2305, 1959.

Heuristic numerical work in some problems of hydrodynamics (with John R. Pasta).* In Mathematical Tables and Other Aids to Computation 13(1959): 1-12.

A Collection of Mathematical Problems.* New York: Interscience Publishers, 1960. Reprinted as Problems in Modern Mathematics. John Wiley & Sons, Inc., 1964. Translated into Russian (1964).

Statement before the Joint Committee on Atomic Energy. In Frontiers in Atomic Energy Research: Hearings before the Subcommittee on Research and Development of the Joint Committee on Atomic Energy, Eighty-sixth Congress, Second Session, on Frontiers in Atomic Energy Research, March 22-25, 1960, 282-5. Washington, D.C.: U.S. Government Printing Office, 1960.

Monte Carlo calculations in problems of mathematical physics. In Modern Mathematics for the Engineer, Second Series, edited by Edwin F. Beckenbach, 95-108. New York: McGraw-Hill Book Company, Inc., 1961.

Nuclear propelled vehicle, such as a rocket (with C. J. Everett). British Patent 877,392, 1961.

How to formulate mathematically problems of the rate of evolution?** In Proceedings of the Symposium on Mathematical Challenges to the


Neo-Darwinian Interpretation of Evolution (New York, April 5-8, 1961), edited by Paul S. Moorhead and Martin M. Kaplan. Providence, Rhode Island: American Mathematical Society. Wistar Institute Monograph 5: (1967) 21-33, April 25-26, 1966 New York: A. Liss, 1985.

On some statistical properties of dynamical systems.* In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (University of California, Berkeley, June 20 July 30, 1960), edited by Lucian M. Le Cam, Jerzy Neyman, and Elizabeth Scott, volume 3, 315-20. Berkeley: University of California Press, 1961. Translated into Russian (1963).

Electronic computers and scientific research. In The Age of Electronics, edited by Carl F. J. Overhage, 95-108. New York: McGraw-Hill Book Company, Inc., 1962. Also in Computers and Automation, August 1963 and September 1963.

An open problem. In Recent Advances in Game Theory (Papers Delivered at a Meeting of the Princeton University Conference, October 4-6,1961), 223. Princeton, New Jersey: Princeton University Conference, 1962.

On some mathematical problems connected with patterns of growth of figures.** Applied Mathematics 14(1962): 215-24. Also in Essays on Cellular Automata, edited by Arthur W. Burks. Urbana, Illinois: University of Illinois Press, 1970.

Stability of many-body computations. In Hydrodynamic Instability, edited by Garrett Birkhoff, Richard Bellman, and C. C. Lin, 247-58. American Mathematical Society Symposia in Applied Mathematics, volume 13. Providence, Rhode Island: American Mathematical Society, 1962.

Communication to the U.S. Senate Committee on Foreign Relations. In Nuclear Test Ban Treaty: Hearings before the Committee on Foreign Relations, United States Senate, Eighty-eighth Congress, First Session, on Executive M, August 12-15, 19-23, 26-27, 1963, 505-6 and 993. Washington, D.C.: U.S. Government Printing Office, 1963.

Problems 110, 111, and 112. In Proceedings of the 1963 Number Theory Conference (University of Colorado, Boulder, Colorado, August 524, 1963), 114-5.

Some properties of certain non-linear transformations.* In Mathematical Models in Physical Sciences: Proceedings of the Conference at the University of Notre Dame, 1962, edited by Stefan Drobot, 85-95. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1963.

Non-linear transformation studies on electronic computers (with P. R. Stein).* Rozprawy Matematyczne 39(1963): 1-66. The Introduction and Part I are also in Essays on Cellular Automata, edited


by Arthur W. Burks. Urbana, Illinois: University of Illinois Press, 1970.

The future of nuclear energy in space: A panel discussion sponsored by the Aerospace Division, American Nuclear Society, at the 1963 winter meeting in New York City, N. Y. on November 20, 1963 (with F. deLuzio, W. von Braun, M. Hunter, and I. Asimov), edited by R. F. Trapp. Also in Nuclear News, July 1964.

Combinatorial analysis in infinite sets and some physical theories. SIAM Review 6(1964): 343-55.

Computers.** Scientific American, September 1964, 202-216.

A visual display of some properties of the distribution of primes (with M. L. Stein and M. B. Wells).* The American Mathematical Monthly 71(1964): 516-20.

Possibility of an accelerated process of collapse of stars in a very dense centre of a cluster or a galaxy (with W. E. Walden). Nature 201(1964): 1202.

The Orion project.** Nuclear News, January 1965, 25-7.

Collapse of stellar systems. In Proceedings of the 25th International Astronomical Union Symposium (Thessaloniki, Greece, August 16-22, 1964), 76-7. International Astronomical Union, 1966.

La machine creatrice. In Rencontres Internationales de Geneve "Le robot, la bete et l'homme (1965), 31-42. Neuchatel: Editions de la Baconniere, 1966.

On some possibilities of generalizing the Lorentz group in the special relativity theory (with C. J. Everett).* Journal of Combinatorial Theory 1(1966): 248-70.

Thermonuclear devices.** In Perspectives in Modern Physics: Essays in Honor of Hans A. Bethe, edited by R. E. Marshak and J. Warren Blaker, 593-601. New York: Interscience Publishers, 1966.

An education in applied math. In Proceedings of May 24-27, 1966 SIAM Conference (Aspen, Colorado), edited by James Ortega, Paul I. Richards, and Frank W. Sinden. SIAM Review 9(1967): 343-4.

On general formulations of simulation and model construction. In Prospects for Simulation and Simulators of Dynamic Systems, edited by George Shapiro and Milton Rogers, 3-8. New York: Spartan Books, 1967.

On recursively defined geometrical objects and patterns of growth (with R. G. Schrandt).** Los Alamos Scientific Laboratory report LA-3762, 1967. Also in Essays on Cellular Automata, edited by Arthur W. Burks. Urbana, Illinois: University of Illinois Press, 1970.

On visual hulls of sets (with G. H. Meisters).* Proceedings of the National Academy of Sciences of the United States of America 57(1967): 1172-4.


An observation on the distribution of primes (with M. Stein).* The American Mathematical Monthly 74(1967): 43-4.

Computations on certain binary branching processes. In Computers in Mathematical Research, edited by R. F. Churchhouse and J.-C. Herz, 168-171. Amsterdam: North-Holland Publishing Company, 1968.

Numerical studies of star systems. In Colloque sur le Probleme des N Corps, 265-7. Editions du Centre National de la Recherche, 1968.

Philosophical implications of some recent scientific discoveries.** In Science, Philosophy and Religion. Proceeding Symposium Kirtland Air Force Laboratory, Albuquerque, New Mexico 44-48. (1968).

Note on the visual hull of a set (with W. A. Beyer).* Journal of Combinatorial Theory 4(1968): 240-5.

On equations with sets as unknowns (with Pál Erdős).* Proceedings of the National Academy of Sciences of the United States of America 60(1968): 1189-95.

Mathematics and Logic: Retrospect and Prospects (with Mark Kac). New York: Frederick A. Praeger, Inc., 1968. The text of this book first appeared as the article entitled "Mathematics and logic: Retrospect and prospects" in Britannica Perspectives, volume 1, 557-732. Chicago: Encyclopaedia Britannica, Inc., 1968. Translated into French (1973), into Serbo-Croatian (1977), and into Spanish (1979).

The applicability of mathematics.** In The Mathematical Sciences: A Collection of Essays, edited by the National Research Council's Committee on Support of Research in the Mathematical Sciences, 1-6. Cambridge, Massachusetts: The M.I.T. Press, 1969.

Wspomnienia Kawiarni Szkockiej (Reminiscences of the Scottish Café). Wiadomosci Matematyczne 12(1969): 49-58. Available in English only in manuscript form.

Computer studies of some history-dependent random processes (with W. A. Beyer and R. G. Schrandt). Los Alamos Scientific Laboratory report LA-4246, 1969.

The entropy of interacting populations (with C. J. Everett). Los Alamos Scientific Laboratory report LA-4256, 1969.

On the pairing process and the notion of genealogical distance (with Jan Mycielski).* Journal of Combinatorial Theory 6(1969): 227-34.

Foreword to My World Line: An Informal Autobiography by G. Gamow. New York: Viking Press, 1970.

Generalizations of product isomorphisms. In Recent Trends in Graph Theory, 215. Lecture Notes in Mathematics, volume 186. Berlin: Springer-Verlag, 1971.

Testimony before the United States District Court, District of Minnesota, Minneapolis, Minnesota, September 17, 1971, in the case of Honeywell Incorporated versus Sperry Rand Corporation, 7342-438.


The notion of complexity (with W. A. Beyer and M. L. Stein). Los Alamos Scientific Laboratory report LA-4822, 1971.

Some probabilistic remarks on Fermat's last theorem (with P. Erdős).* Rocky Mountain Journal of Mathematics 1(1971): 613-6.

Some elementary attempts at numerical modeling of problems concerning rates of evolutionary processes (with R. Schrandt). Los Alamos Scientific Laboratory report LAMS-4573, 1971.

Gamow--and mathematics.** In Cosmology, Fusion & Other Matters: George Gamow Memorial Volume, edited by Frederick Reines, 27-29. Boulder, Colorado: Colorado Associated University Press, 1972.

Ideas of space and space-time.** Rehovot Magazine, Winter 1972-73, 29-33.

Some combinatorial problems studied experimentally on computing machines. In Applications of Number Theory to Numerical Analysis, edited by S. K. Zaremba, 1-10. New York: Academic Press, Inc., 1972.

Some ideas and prospects in biomathematics.** Annual Review of Biophysics and Bioengineering 1(1972): 277- 91.

Metrics in biology, an introduction (with W. A. Beyer, M. L. Stein, and Temple Smith). Los Alamos Scientific Laboratory report LA-4973, 1972.

Lectures in nonlinear algebraic transformations (with P. R. Stein). In Studies in Mathematical Physics (Lectures Presented at the NATO Advanced Study Institute on Mathematical Physics held in Istanbul, August, 1970), edited by A. O. Barut, 263-314. Dordrecht, The Netherlands: D. Reidel Publishing Company, 1973.

Infinities. In The Heritage of Copernicus, edited by J. Neyman, 378-92. Cambridge, Massachusetts: The MIT Press, 1974.

New rules and old games. Outlook, Spring 1974, 32-3.

Stanislaw Ulam: Sets, Numbers, and Universes, edited by W. A. Beyer, J. Mycielski, and G.-C. Rota. Cambridge, Massachusetts: The MIT Press, 1974.

A molecular sequence metric and evolutionary trees (with William A. Beyer, Myron L. Stein, and Temple F. Smith). Mathematical Biosciences 19(1974): 9-25.

Arthur Koestler et le défi du hasard: Entretien avec Stan Ulam de Pierre Debray-Ritzen. In Arthur Koestler, 428-32. Cahiers de l'Herne. Paris: Edition de L'Herne, 1975.

Adventures of a Mathematician. New York: Charles Scribner's Sons, 1976. Paperback editions published in 1977 and 1983. Translated into Japanese (1979).

Physics for mathematicians.** In Physics and Our World: A Symposium in Honor of Victor F. Weisskopf (Massachusetts Institute of


Technology, 1974), edited by Kerson Huang, 113-21. AIP Conference Proceedings, number 28. New York: American Institute of Physics, Inc., 1976.

Generators for algebras of relations (with A. R. Bednarek). Bulletin of the American Mathematical Society 82(1976): 781-2.

On the theory of relational structures and schemata for parallel computation (with A. R. Bednarek). Los Alamos Scientific Laboratory report LAMS-6734, 1977.

Some remarks on relational composition in computational theory and practice (with A. R. Bednarek). In Fundamentals of Computation Theory: Proceedings of the 1977 International FCT-Conference (Poznań-Kórnik, Poland, September 19-23, 1977), edited by Marek Karpiński, 22-32. Lecture Notes in Computer Science, volume 56. Berlin: Springer-Verlag, 1977.

Przygody matematyka. Kultura 9(30 Lipca 1978). Translated into Polish by Jerzy Jaruzelski.

Banach i inni. Kultura 10(6 Sierpnia 1978). Translated into Polish by Jerzy Jaruzelski.

Narodziny "Ksiegi Szkockiej." Kultura 10(13 Sierpnia 1978). Translated into Polish by Jerzy Jaruzelski.

Projective algebra and the calculus of relations (with A. R. Bednarek). Journal of Symbolic Logic 43(1978): 56-64.

The role of abstract mathematical ideas in possible conceptual advances in natural sciences, more specifically biology. In Proceedings of International Colloquium on the Role of Mathematical Physics in the Development of Science (College de France, Paris, June 13-15, 1977), edited by Dominique Akl, Moshe Flato, and Daniel Sternheimer, 12-25. UNESCO, 1978.

An integer-valued metric for patterns (with A. R. Bednarek). In Fundamentals of Computation Theory, 52-7. Berlin: Akademie-Verlag, 1979.

Minimal decomposition of two graphs into pairwise isomorphic subgraphs (with F. R. K. Chung, P. Erdos, R. L. Graham, and F. F. Yao). In Proceedings of the Tenth Southeastern Conference on Combinatorics, Graph Theory, and Computing (Florida Atlantic University, Boca Raton, Florida, April 2-6, 1979), volume 1, 3-18. Congressus Numerantium, volume 23. Winnipeg, Manitoba: Utilitas Mathematica Publishing Incorporated, 1979.

A mathematical physicist looks at computing.** Rehovot Magazine, volume 9, number 1, 1980, 47-50.

On the operations of pair production, transmutations, and generalized random walks. Advances in Applied Mathematics 1(1980): 7-21.

Preface to A Half Century of Polish Mathematics: Remembrances and Reflections by Kazimierz Kuratowski. International Series in Pure


and Applied Mathematics, volume 108. Oxford: Pergamon Press Ltd., 1980.

Von Neumann: The interaction of mathematics and computing.** In A History of Computing in the Twentieth Century: A Collection of Essays, edited by N. Metropolis, J. Howlett, and Gian-Carlo Rota, 93-9. New York: Academic Press, Inc., 1980.

Further applications of mathematics in the natural sciences.** In American Mathematical Heritage: Algebra and Applied Mathematics, edited by J. Dalton Tarwater, 101-14. Texas Tech University Mathematics Series, volume 13. Lubbock, Texas: Texas Technological University Press, 1981.

Kazimierz Kuratowski, 1896-1980.** Polish Review 26(1981): 62-6.

On the notion of analogy and complexity in some constructive mathematical schemata. Los Alamos National Laboratory report LA-9065, 1981. Also in Probability, Statistical Mechanics, and Number Theory, edited by Gian-Carlo Rota. Advances in Mathematics: Supplementary Studies, volume 9. New York: Academic Press, Inc., 1986.

An anecdotal history of the Scottish Book. In The Scottish Book: Mathematics from the Scottish Cafe, edited by R. Daniel Mauldin. Boston: Birkhauser, 1982.

Introduction** to Selected Studies: Physics-Astrophysics, Mathematics, History of Science. A Volume Dedicated to the Memory of Albert Einstein, edited by Themistocles M. Rassias and George M. Rassias. Amsterdam: North-Holland Publishing Company, 1982.

Reflections of the Polish masters: An interview with Stan Ulam and Mark Kac. Los Alamos Science, volume 3, number 3, 1982, 54-65.

Speculations about the mechanism of recognition and discrimination. Los Alamos National Laboratory unclassified release LAUR 82-62, 1982.

Transformations, iterations and mixing flows. In Dynamical Systems II, edited by A. R. Bednarek and L. Cesari, 419-26. New York: Academic Press, 1982.

Kazimierz Kuratowski, Wspomnienia (Kazimierz Kuratowski: A reminiscence). Wiadomosci Matematyczne, 1983. Translated into Polish by R. Engelking. Also in Kazimierz Kuratowski, Selected Papers,Polish Academy of Sciences, K. Borsuk, editor, PWN, Warsaw, 1988.

Speculations on some possible mathematical frameworks for the foundations of certain physical theories. Letters in Mathematical Physics 10(1985): 101-6.

Science, Computers, and People: From the Tree of Mathematics, edited by Mark C. Reynolds and Gian-Carlo Rota. Boston: Birkhäuser, 1986.


Mathematical problems and games (with R. Daniel Mauldin). Advances in Applied Mathematics 8(1987): 281-344.

Reflections on the brain's attempts to understand itself. Los Alamos Science, number 15, 1987, 283-7.

Analogies between Analogies: The Mathematical Reports of S. Ulam and his Los Alamos Collaborators, edited by D. Sharp and M. Simmons. University of California Press.

Eleven weapons-related reports written by Ulam and his collaborators between 1944 and 1958 are still classified. These are listed in LAMS-3923, 1968 and in Stanislaw Ulam: Sets, Numbers, and Universes, edited by W. A. Beyer, J. Mycielski, and G.-C. Rota (The MIT Press, 1974).

Abstracts

Uber unendliche Abelsche Gruppen (with S. Mazur). Annales de la Societe Polonaise de Mathematique, 9(1930): 204.

Ein Betrag zum Massproblem. Annales de la Societe Polonaise de Mathematique, 9(1930): 198.

Uber die Eindeutigkeit des Masses von Gerardenmengen. Annales de la Societe Polonaise de Mathematique, 9(1930): 200.

Uber vollstandig additive Massfunktionen in abstrakten Raumen. Annales de la Societe Polonaise de Mathematique, 9(1930): 195.

Zur Theorie des Fixpunktes. Annales de la Societe Polonaise de Mathematique, 9(1930): 201-2.

Einige Satze iiber Mengen II-er Kategorie. Annales de la Societe Polonaise de Math6matique, 10(1931): 123-4.

Uber eine charakteristische Eigenschaft des Ellipsoides (with H. Auerbach. S. Mazur). Annales de la Socie't Polonaise de Mathematique, 10(1931): 128.

Uber eine neue topologische Operation (with K. Borsuk). Annales de la Societe Polonaise de Mathematique, 10(1931): 125 6.

Uber isometrische Abbildungen von normierten Vektorraumen (with S. Mazur). Annales de la Societe Polonaise de Mathematique, 10(1931): 127.

Uber die Grundlagen der Wahrscheinlichkeitsrechnung (with Z. Lomnicki). Annales de la Societe Polonaise de Mathematique, 12(1933) 115.

Uber die Gesetze der grossen Zahlen (with Z. Lomnicki). Annales de la Societe' Polonaise de Mathematique, 12(1933): 118.


Bemerkungen über die stetigen Abbildungen von Topologischen Raumen (with J. Schreier). Annales de la Societe Polonaise de Mathematique, 13(1934): 142.

Uber stetige Abbildungen von Mannigfaltigkeiten. Annales de la Societe Polonaise de Mathematique, 13(1934): 141.

Existence of metrically transitive transformations. Preliminary report (with J. C. Oxtoby). Bulletin of the American Mathematical Society, 44(1938): 347.

On bounded transformations of space. Preliminary report. Bulletin of the American Mathematical Society, 44(1938): 195.

On the distribution of a general measure in any complete metric separable space. Bulletin of the American Mathematical Society, 44(1938): 786.

Set-theoretical invariants of the product operation. Bulletin of the American Mathematical Society, 44(1938): 195.

Sur les transformations ergodiques. Annales de la Societe Polonaise de Mathematique, 17(1938): 112.

On the abstract theory of measure. Bulletin of the American Mathematical Society, 45(1939): 83.

e-isomorphic transformations. Preliminary report. Bulletin of the American Mathematical Society, 45(1939): 232.

On approximate isometries. Preliminary report (with D. H. Hyers). Bulletin of the American Mathematical Society, 47(1941): 708.

On measures for subsets of sets of measure zero. Bulletin of the American Mathematical Society, 47(1941): 702.

Theory of operation of products of sets. I. Preliminary report. Bulletin of the American Mathematical Society, 47(1941): 702.

Approximate isometries of the space of continuous functions (with D. H. Hyers). Bulletin of the American Mathematical Society, 48(1942): 368.

Geometrical approach to the theory of representations of topological groups. Preliminary report. Bulletin of the American Mathematical Society, 48(1942): 44.

On the problem of completely additive measure in classes of sets with a general equivalence relation (with D. L. Bernstein). Bulletin of the American Mathematical Society, 48(1942): 361-2.

On the equivalence of functions. Bulletin of the American Mathematical Society, 49(1943): 49.

On the length of curves, the surface area and the isoperimetric problem under a general Minkowski metric. Preliminary report. Bulletin of the American Mathematical Society, 49(1943): 57.

Theory of the operation of products of sets. II. Preliminary report. Bulletin of the American Mathematical Society, 49(1943): 367-8.


On ordered groups (with J. C. Everett). Bulletin of the American Mathematical Society, 50(1944): 496.

On the algebra of systems of vectors and some problems in kinematics (with L. Cohen). Bulletin of the American Mathematical Society, 50(1944):61.

Some combinatorial problems in set theory. Preliminary report (with P. Erdos). Bulletin of the American Mathematical Society, 50(1944): 57.

Theory of the operation of product of sets. III. Preliminary report. Bulletin of the American Mathematical Society, 50(1944): 60-1.

Projective algebra, I (with C. J. Everett). Bulletin of the American Mathematical Society, 51(1945): 59.

Random ergodic theorems (with J. von Neumann). Bulletin of the American Mathematical Society, 51(1945): 660.

On combination of stochastic and deterministic processes. Preliminary report (with J. von Neumann). Bulletin of the American Mathematical Society, 53(1947): 1120.

On quasi-fixed points for transformations in function spaces. Bulletin of the American Mathematical Society, 53(1947): 1120.

On the group of homeomorphisms of the surface of the sphere (with J. von Neumann). Bulletin of the American Mathematical Society, 53(1947): 506.

On the problem of determination of mathematical structures by their endomorphisms (with C. J. Everett). Bulletin of the American Mathematical Society, 54(1948): 646.

Statistical methods for problems involving equations of the diffusion type, (Monte Carlo). A.E.C. Information Meeting, Brookhaven Natl. Lab., Apr. 26 8, 1948, BNL-17, Special,(1948):27.

Multiplicative systems, I (with C. J. Everett). Bulletin of the American Mathematical Society, 55(1949): 51.

Multiplicative systems, II (with C. J. Everett). Bulletin of the American Mathematical Society, 55(1949): 51.

Multiplicative systems, III (with C. J. Everett). Bulletin of the American Mathematical Society, 55(1949): 51-2.

On motions of systems of mass points randomly distributed on the infinite line (with N. C. Metropolis). Bulletin of the American Mathematical Society, 55(1949): 670-1.

On an application of a correspondence between matrices over real algebras and matrices of positive real numbers (with C. J. Everett). Bulletin of the American Mathematical Society, 56(1950): 63.

Random walk and the Hamilton-Jacobi equation (with C. J. Everett). Bulletin of the American Mathematical Society, 56(1950): 63-4.

Approximately convex functions (with D. H. Hyers). Bulletin of the American Mathematical Society, 59(1951): 300-1.


Applications of Monte Carlo methods to tactical games. in Proceedings of March 16-17, 1954 Symposium on Monte Carlo Methods, Univ. of Fla., Herbert A. Meyer ed., Wiley (1956) 63.

Some mathematical problems investigated through computations on electronic machines. The American Mathematical Monthly, 63(1956): 607.

Future uses of future computers. American Chemical Society Abstracts of Papers, 133(1958): 33-4K.

On certain binary reaction systems (with P. R. Stein). American Mathematical Society Notices, 6(1959): 68-9.

On patterns of growth of figures in two dimensions (with R. G. Schrandt). American Mathematical Society Notices, 7(1960): 642.

On some combinatorial problems in patterns of growth, I (with J. C. Holladay). American Mathematical Society Notices, 7(1960): 234.

On a statistical method of solving multiplication and diffusion problems. Monsanto Chemical Company Meeting, Clinton National Laboratory, Oct. 13-5, 1947. Monsanto Chemical Company Abstracts of Papers, Mon-411, #14, 1961.

On some possibility of generalizing the Lorentz group in special relativity theory (with C. J. Everett). American Mathematical Society Notices, 12(1965): 614.

Recursive definitions of static changing patterns, in Biomathematics and Computer Science in the Life Sciences, Monograph of Proceedings of 3d Annual Symposium-Houston, Texas, (1965): IX.*

* As an amusing commentary on Ulam's personality the editors include the following remark from the foreword to the original 1968 Publications Report: ..."The most distinguishing personal traits of Stan Ulam are friendliness, simplicity, tenacity, and a certain disregard of formality or other mundane impertinences... One of our functions is to monitor all material published from this Laboratory... Everyone else sends his proposed paper... to us before mailing it. Most of our files on Ulam's publications were opened when he sent us a reprint and an invoice, yet there was never a problem about any of the papers."...

Leslie M. Redman Technical Information Group April 1968


Appendix B—
Vita of Stanislaw M. Ulam

Born: April 13, 1909, Lwów, Poland

Studies: M.A. & D.Sc., 1933, Polytechnic Institute, Lwów. Post-doctoral studies in Vienna, Zürich & Cambridge (England), 1934

Positions: Came to U.S. on invitation of J. von Neumann, to the Institute for Advanced Study, Princeton, 1935
Junior Fellow at Harvard Society of Fellows, then lecturer in mathematics, Harvard, 1936-40
Assistant Professor, University of Wisconsin, 1941-43
Staff member, then research advisor, Los Alamos Scientific Laboratory, 1944-1967
During that period visited: University of Southern California, Los Angeles, 1945-46; Harvard University, 1951-52; M.I.T., 1956-57; University of Colorado, 1961; University of California, La Jolla, 1962
Mathematics professor and chairman of department, University of Colorado, Boulder, 1965-77
Consultant, Los Alamos National Laboratory, 1967-84
Visiting Professor, M.I.T. and University of Paris, 1972
Graduate research professor, University of Florida, Gainesville, 1974-84
Professor of biomathematics, University of Colorado medical school, Denver, 1979-84
Visiting professor, University of California, Davis, 1982

Member of: American Academy of Arts and Sciences, National Academy of Sciences, American Philosophical Society, Mathematical and Physical Societies, Board of Governors and Scientific Advisory Committee, Weizmann Institute of Science, Rehovot, Israel; Board of Governors, Jurzykowski Foundation, New York

Honorary Degrees and Awards: University of New Mexico, University of Wisconsin, University of Pittsburgh, Polish Millenium, AC.P.C.C. Scientific, Polish Heritage awards.

Committees: NAS Committee on Innovations, NRS Committee on Applications of Mathematics, Harvard Visiting Committees for Mathematics, and Applied Mathematics and Physics, General Twining's Air Force Committee

Consultant: President Kennedy's Science Advisory Committee, also IBM, General Atomic, North American Aviation, Hycon

Died: May 13, 1984, Santa Fe, New Mexico


Index

A

Abelian groups, 517

Absorption, neutron, 2 , 3 , 13 , 21 , 22 , 24 , 25

Acceleration, 167 , 168 -171, 174

Active material, 18 , 21 , 22 , 24 , 25 , 34

ADAM, 431 , 432 -433, 436 , 444

Additive processes, 1

Adiabatic equation, 124

Algebra, 144 ;

Boolean, 348 -349, 446 , 479 , 480 , 481 , 482 , 491 , 494 , 498 , 518 , 520 ;

cylindric, 480 , 496 , 520 ;

pattern, 492 , 494 , 496 (see also Relational structures);

polyadic, 480 ;

projective, 480 -481, 496 , 520 ;

relation, 492 -496

American Philosophical Society, xii

Amino acids, 466 , 469 , 470 , 471 , 472

Analogy, ix , x , 514 -518;

analogies between, 513 , 518 ;

and complexity, 518 ;

criteria for, 514 ;

distance measures, 483 , 485 , 489 ;

transformations preserve, 523

Appell, Paul-Emile, 516 n, 530

Arithmetic:

complexity, 445 -463;

multiprecision, 353 n, 357 n;

significance, 483

Artificial intelligence, 466

Automatic plotting devices, 301 , 303 . See also Oscilloscope

Automorphism, 492

Autonomous systems, 344

B

Baire categories, 350

Banach, Stefan, ix , x , 510 , 513

Bednarek, Alexander R., xii , xiv , 477 , 482 , 485 , 516

Behavior:

chaotic, 192 ;

convergence, 192 , 197 , 199 , 200 , 201 -202, 211 , 213 -217, 218 , 224 -225, 227 , 233 , 272 , 305 -306;

ergodic, xvi , 13 , 132 , 140 , 143 , 155 -162, 192 , 293 , 294 , 348 ;

of gas, 123 -129;

limiting, 189 , 190 , 192 , 193 , 198 , 201 , 207 -209, 221 , 227 , 296 (see also Oscillation);

pathological, 358 ;

qualitative, 131 ;

topological, 131 , 133

Bell, George, 527

Bendixson, I., 344

Bernoullian formulas, 2 , 183 , 345 , 349

Beyer, William A., xiii , 399 , 445 , 465 , 469 , 482

Billowing, 126 -129

Binary reaction systems, 194 -286, 294 -296;

as commutative, 196 ;

convergence behavior of, 199 , 200 , 201 -202, 211 , 213 -217, 218 , 224 -225,


227 , 233 , 272 ;

form stability in, 210 -213;

four-variable, 189 ;

as non-associative, 196 -197;

as non-commutative, 212 -213;

and ordinary differential equations, 225 -226;

oscillation in, 207 , 208 , 209 , 211 , 212 , 222 , 224 , 225 , 227

Biology:

distance in, 527 -528;

mathematics/metrics applied to, ix , xiii , xiv , 465 -475;

molecular, xiii

Biomathematics, x , xiii

Birkhoff, George D., 157

Boltzmann, Ludwig, 162 ;

equation of, 18 , 417 , 418 -419, 422 -423;

H-function of, 419 -420, 421 , 427 ;

hypothesis of, 157 , 158 ;

on kinetic energy, 417 , 418 -419

Boole, see Algebra, Boolean

Borel sets, 86

Borsuk, K., 517

Boundary:

in growth pattern, 382 , 383 ;

of vacuum, 123 , 125 , 126 -127

Boundary points:

double, 199 , 202 , 208 , 226 ;

fixed, 200 , 201 , 202 , 221 -222, 224 , 226 ;

periodic, 201 ;

triple, 202 , 208 , 209

Brain, xii , 532 . See also Nervous system

Branching processes, xi , 1 -15, 37 , 520 . See also Multiplicative processes

Broken-linear transformations, 139 , 151 , 152 , 161 , 301 , 345 -349, 350 , 351 , 352 -353, 372

Brouwer, L.E., fixed point theorem of, 201 , 202 , 330 n, 491

Brute force approaches, xvi , 121 , 304 , 305 -306

Burks, Arthur, 379

C

Carlson, Bengt, 34

Cartesian coordinates, 213 -215

CDC-6600, 400 , 485

Cellular automata, xii , 379

Centre National de Recherche (France), xvi

Cenzer, Douglas, 517

Chain:

process, 1 , 2 , 3 , 12 ;

reaction, 17 , 37 ;

rule, 333

Chaos, x , xii , 139 , 192

Classical mechanics, 122

Clifford, A.H., 492

Cluster:

analysis, 473 ;

of sets, 518 ;

star, 123 , 130

Coding problems, 470

Collisions, 17 , 18 , 19 , 20 , 22 , 24 -25, 190 , 195 -196, 418

Combinatorics, xii

Complexity:

and analogy, 518 ;

arithmetic, 445 -463;

calculated, 447 -448;

complement, 448 ;

conditional, 460 ;

defined, 446 -447, 522 -523;

and entropy, 445 , 446 , 460 -461, 482 ;

mathematized, 518 -521;

for modulo a prime integer, 445 , 448 , 519 ;

of rational numbers, 522 , 523 ;

of real numbers, 522 -523;

relative, 446 , 519


Composition, 478 , 480 -482, 495 ;

functional and relational, 479 -492;

in integration, 483 ;

nonserial, 483 , 500 -506;

and projection, 480 .

See also Iteration

Compound pendulum problem, 144

Computers:

cathode ray tubes in, 128 -129;

combinatorial systems on, xvi ;

complexity calculated on, 447 ;

at Los Alamos, 122 (see also MANIAC);

mathematical research aided by, xv , xvi , 121 -138;

memory of, 132 ;

nonlinear transformations on, xvi , 139 -154, 297 -377;

technology, xv , xvi

Conjugacy, 294 , 295 , 345 , 348 , 349 , 351

Continuum mechanics, xiv , 121 , 122 -123

Convergence, 192 , 197 , 351 ;

in binary reaction systems, 199 , 200 , 201 -202, 211 , 213 -217, 218 , 224 -225, 227 , 233 , 272 ;

brute force applied to, 305 -306;

to fixed point, 4 , 305 , 308 , 309 -310;

under iteration, 37 ;

to limit set, 306 , 321 , 345 ;

linear, 337 ;

Frobenius-Perron theorem of, 218 ;

rate, 198 , 323 ;

recognized, 303 , 307 ;

region of, 211 n, 222 , 308 -309, 340 ;

spurious, 301 , 334 , 354 ;

uniform, 82 -83

Cooper, Leon N., 484 -485

Cooper, Ralph, 447

Cosmic rays, 131

Crestey, M., 492

Criticality, 18 , 19 , 20 -21, 25 -26;

below-, 65 -84;

death and, 4 , 5 , 73 -74;

generating function in, 9 -10;

just-, 68 , 70 , 71 , 72 , 74 , 76 , 77 ;

strong ratio theorem for, 106 -118;

sub-, 4 , 7 , 9 -10, 65 -84, 118 ;

super-, 4 -5, 6 -7, 37 -38, 49 -60, 106 -118

Cubic transformations, 139 , 141 , 144 , 149 , 161 , 300 , 306 , 308 ;

modified, 326 -343;

three-variable, 293 , 298 , 299 , 309 -311, 312 , 315 -321, 326 -343, 355 -356

Curve, closed. See Limit sets, Class I

Cytochrome-C, 469 , 527

D

Darwin, Charles, 430

Death, 92 , 93 ;

and criticality, 4 , 5 , 73 -74;

distribution of, 73 -74;

fixed point, 43 , 44 , 85 , 105 ;

in growth pattern, 379 , 380 , 381 , 382 -384, 385 ;

probability of, 4 , 5 , 7 , 67 , 73 -74, 105

Déblais et remblais problem, 516 n, 530

Decomposition, 97 -98, 104 , 106

Density distribution, 127 , 130

Difference equations, 121 , 300 , 326 ;

linear, 323 ;

nonlinear, 191 -192, 344 -345

Differential equations, 344 -345;

approximating, 413 -414;

hyperbolic, 478 ;

nonlinear, 300 ;

ordinary, 225 -226;

parabolic, 478 ;

partial, 12 , 121 , 122 -123, 140 , 160 ;

total, 122 -123, 140


Diffusion, xi , 17 -36

Discrimination/discernment, 20 , 526 . See also Recognition

Diseases, contagious, 400

Displacement, 20 , 139 , 140 , 145 , 151 , 153 , 160 , 161 , 338 n. See also Perturbation

Dissimilarity/difference, 465 -466, 467 -470, 473 . See also Distance

Distance (metric), 85 , 98 , 514 , 523 ;

analogy/similarity measured by, 483 , 485 , 489 ;

calculated, 469 -470, 473 , 487 -488, 489 ;

defined, 468 -469, 470 -471, 472 , 486 , 491 n, 515 , 516 , 527 , 528 , 530 ;

between DNA code sequences, 526 -528;

Euclidean, 125 , 488 , 515 ;

evolutionary, 466 , 467 ;

Hamming, 527 ;

Hausdorff, 341 , 468 -469, 471 , 472 , 482 , 488 , 516 , 518 , 529 , 530 ;

Markov, 472 ;

minimum work in, 516 , 530 ;

perihelion, 22 ;

properties of, 526 ;

pseudo-, 466 ;

for recognition, 526 -533;

semi, 467 ;

between sets/classes, 466 , 468 -469, 485 , 486 , 492 , 516 -517, 518 , 529 , 530 -531;

Steinhaus, 482 , 488

DNA, 467 , 469 -470, 471 , 472 , 514 , 526 -528

Dog-bone pattern, 382

Domb, C., 400

Dynamical systems:

ergodic, behavior of, xvi , 155 -162;

flow in, 156 -157;

phase space in, 156

Dynamics, 129

Dyson, Freeman, 163

E

Eddington, A. S., 446

Efficiency, theory of, 19 , 20

Einstein, Albert, 417 , 422 -423

Electrons, 38

Endomorphism, 492

Energy:

equipartition of, 139 , 140 -142, 143 , 159 , 160 , 161 -162, 186 , 187 , 218 ;

via fusion, 155 ;

gravitational, xiii , 185 -188;

kinetic, 125 , 130 , 141 -142, 146 , 154 , 160 , 166 , 185 , 186 , 187 , 188 , 417 , 418 -419;

Maxwellian distribution of, 186 -187;

potential, 130 , 141 -142, 146 , 160 ;

transformation of, 144

ENIAC, xiii , 19 , 20 -21, 25 -26

Entropy, 122 ;

complexity and, 445 , 446 , 460 -461, 482

Epidemics, 37

Equilibrium, approach to, 156 , 161 , 162

Ergodic:

average, 349 ;

behavior, xvi , 131 , 132 , 140 , 143 , 155 -162, 192 , 293 , 294 , 348 ;

limit, 217 -218, 350 , 352 ;

motion, 144 ;

theorem, 143 , 155 -158, 399 , 409 -410;

transformations, 155 , 158 , 186

Euclidean:

distance, 125 , 488 , 515 ;

space, 39 , 54 , 156 , 217 , 293 , 298 , 515 , 517 , 523 , 526

Eulerian variable, 125 -126


Evans, Trevor, 495

EVE, 434 -442, 443 ;

PQ, 437 , 438 , 441 , 444 ;

PM, 438 -440;

POS, 440 -442, 443 , 444

Everett, C.J., xi , xii , 1 , 37 , 163 , 188 , 417

Evolution:

Darwinian, 430 ;

distance in, 466 , 467 ;

in mathematics, 522 ;

via mutations, 287 , 430 -431, 432 -433, 434 -442, 443 -444;

rate/development of, 429 -430, 432 , 434 -442, 443 , 468

Evolutionary trees, 466 , 469 , 470 , 471 , 472 , 473

Explosion:

external, 163 -177;

history-dependent, 402 -405;

nuclear, 163 -177, 179 -180, 182 -183;

velocity of, 165 -166

F

Faber, Vance, 480

Feller, W., 400

Fermi, Enrico, x , xii , 139 , 156

Feynman, Richard P., 2 , 9

Fine-structure, 357

Fission, xi , 19 , 21 , 22 , 24 -25;

bombs, 164 -165;

tamper, 34 , 35

Fitch, W.M., 466 , 472

Fixed points, 6 , 8 , 37 , 40 , 42 -45, 198 , 337 , 338 , 361 , 368 , 372 , 481 ;

attractive, 202 , 294 , 299 , 308 , 321 , 330 -333, 345 , 348 ;

boundary (see Boundary points);

Brouwer on, 201 , 202 , 330 n, 491 ;

for continuous function, 4 ;

convergence to, 4 , 305 , 308 -309, 310 ;

death, 43 , 44 , 85 , 105 ;

equation, 234 ;

inside gap, 358 ;

interior, 199 , 200 , 201 , 202 -207, 208 , 210 -211, 212 , 213 , 215 , 222 , 223 , 224 , 225 , 226 , 314 , 328 ;

invariant points as, 294 -299;

iteration behavior of, 221 , 327 ;

limit points as, 43 ;

limit sets as, 330 ;

nodal, 199 , 200 , 201 , 213 , 222 ;

non-attractive, 225 ;

repellent, 202 , 294 , 299 , 316 , 328 , 330 -333, 342 , 345 , 354 , 359 ;

in supercritical case, 38

Flow, 481 -482;

ergodic, 159 ;

Liouville, 186 ;

volume-preserving, 156 -157, 159

Flux, 176

Ford, Kenneth W., 188

Formal systems, 521

Form stability, 210 -213

Fourier, Baron Joseph, 159 ;

series of, 122 , 139 , 141 , 143 , 160 , 161 , 490

Fox, R.H., 500

France, research in, xvi

Frankel, Stanley P., 2 , 9

Fréchet spaces, 497

Frisch, Otto, 12

Frobenius, G., 144 , 218

Fusion, 155

G

Gases, 123 -129


Gauss, Karl, 133 , 414 , 519

Genealogies/genealogical systems, 490 -491, 496 , 520 ;

intervals in, 85 ;

measure theory of, 85 , 86 -91, 95 ;

of multiplicative processes, 85 , 92 -105;

space of, 98 -105

General Dynamics, 163

Generating function, 6 , 40 -41, 61 , 62 , 64 , 74 , 76 , 79 ;

geometrical factors in, 12 , 13 -14;

iteration of, 2 -5, 7 , 9 -14;

moments and, 5 , 11 -12;

monotonic, 7 ;

and probability distribution, 38 , 39 , 75 , 81 ;

in sub-critical systems, 9 -10;

for time sum, 11

Generating transformation, 37 , 38 -39, 42 -45, 49 , 65 -66, 69 , 70 , 74 , 75 , 79 , 84 , 92 , 100 , 104 , 105

Generation, xi , 2 -15, 38 , 494 -496;

asexual/nonsexual, 429 , 430 , 431 , 432 -433, 444 ;

via collision, 190 ;

as continuous, 2 ;

by mitosis, 520 , 521 ;

probability in, 2 -3, 8 , 10 , 12 , 13 ;

quadratic functions in, 190 ;

random, 2 ;

pairing in, 194 , 286 ;

sexual, 429 , 430 , 431 , 434 -442, 443 , 520 ;

in subcritical case, 9

Genetics, 194 n, 287 n, 289 n

Geometry:

recursively studied, 379 -386;

of vertex, 60

Gibbs, J. W., 162

Gluskin, L.M., 492

Goad, Walter, 526 -527

Gödel, Kurt, 490 , 521

Good, I. J., 446

Graphs. See Genealogies/genealogical systems

Gravitating systems, 129 -131

Gravity, xiii , 124 , 174 , 185 -188

Growth pattern, xiii ;

boundary in, 382 , 383 ;

conflict for survival in, 379 -380, 381 , 384 -385;

death in, 379 , 380 , 381 , 382 -384, 385 ;

dogbone, 382 ;

periodic, 381 -382, 383 ;

rules of, 380 -381, 384 -385;

self-replicating states in, 384 ;

three-dimensional, 380 , 381 , 385 -386;

two-dimensional, 380 , 381 -382

H

Haar, Alfred ter, 155

Hamilton, William R., equation of, 156 , 158 , 186

Hamming, R.W., 470 ;

distance, 527

Harlow, Frank, 121

Harris, T.E., 37

Hartigan, J.A., 473

Hausdorff, Felix, distance, 341 , 468 -469, 471 , 472 , 482 , 488 , 516 , 518 , 529 , 530

Hawkins, David, xi , 1 , 38 , 65

Hendry, Barbara, 535

Hessians, 38 , 40 -42. See also Moments, second


Heuristic studies, 121 -138

Hilbert space, 517 , 526

History-dependent processes, 399 -410;

explosions, 402 -405

Holonomic systems, 122

Homeomorphisms, 293 , 294 , 295 , 349 , 498 , 517

Homogeneous transformations. See Quadratic transformations, homogeneous

Homomorphisms, 500 , 518 , 521 , 522 ;

product, 497 , 498

Howorka, Edward, 494 , 498

Hydrides, 34 , 174

Hydrodynamics/hydrodynamical problems, 19 , 20 , 121 , 122 -126;

magneto-, 131 ;

neighbor relations in, 124 -125, 126 ;

time intervals in, 126

Hydrogen, 164

Hyers, D.H., 517

Hypergeometric function, 414

I

IBM 704, 197

IBM 7090, 309

IBM 7094, 442

IBM STRETCH, 303 n, 304 n, 311 , 323 , 354 , 356 , 357 n

Inertia, 130

Information retrieval, 483 -484

Instability, 123 -126, 339 , 342 , 343 . See also Stability

Integer:

-differential equations (see Differential equations, partial);

modulo a prime, 445 , 448 , 519 ;

sequences, 399 , 405 -410

Intercontinental ballistic missiles, 163 , 170

Interior points. See Fixed points, interior

Intervals, 85 , 86 , 87 , 88 , 90 , 95 , 96 , 97 , 100 , 102 , 103

Isomorphism, 480 , 492 , 493 , 517 ;

and flow, 482 ;

non-product, 498 ;

product, 481 -482, 497 , 498 , 499 ;

weak product, 482 , 499

Iteration, x , 78 , 80 ;

convergence under, 37 ;

and ergodic behavior, 293 , 294 ;

of fixed points, 221 , 327 ;

of generating function, 2 -5, 7 , 9 -14;

of nonlinear transformations, 189 ;

parameters of, 126 ;

of quadratic transformations, 191 -192, 197 -198;

of transformations, 84 -85, 345 , 349 .

See also Composition

J

Jacobians/jacobian matrix, 37 , 38 -39, 308 , 323 ;

calculated, 125 , 126 ;

properties of, 65 -68, 70 ;

value of, 202 , 206 , 215 , 330 , 331 , 341 n

Jardine, N., 467

Jetting, 128

JOHNNIAC, xiii

K

Kac, Mark, 405 , 513


Kepler, Johannes, 186 , 187 -188

Kolmogorov, A., 460

Kronecker, Leopold, 158 ;

delta function of, 49 n, 470

Kronecker-Weyl theorem, 192 , 218

L

Lagrange, J. L., 122 , 124 , 141

Laplace, Pierre Simon, 3

Larson, Jean, 480

Leakage, 2 , 13

Lebesgue, Henri, measure, 217 , 350 n

Lebesgue-Stieltjes measure, 217

Limit, 192 ;

cycles, 344 , 345 ;

ergodic, 217 -218, 350 , 352 ;

expectation, 81 ;

function, 82 ;

periodic, 207 -209;

point, 4 , 45 -48, 313 , 314 ;

sets (see Limit sets)

Limiting:

behavior, 189 , 190 , 192 , 193 , 198 , 201 , 207 -209, 221 , 227 , 296 (see also Oscillation);

distribution, 9

Limit sets, 312 -326, 372 ;

of associated transformations, 338 -340;

brute force method to find, 304 , 305 -306;

Class I (closed curve), 315 -317, 322 , 324 , 339 , 340 , 343 , 347 ;

Class II, 317 -319, 322 , 328 ;

Class III (pseudo-periods), 319 , 322 , 328 -330, 339 , 341 , 342 , 343 , 357 , 358 ;

Class IV, 319 -321, 322 -323, 326 , 337 , 339 , 340 , 341 , 342 , 348 ;

convergence to, 306 , 321 , 345 ;

defined, 313 -314;

finite, 294 , 306 , 310 , 314 , 322 , 324 , 333 n, 338 -340, 345 ;

fixed points as, 330 ;

gap, 358 ;

infinite, 314 -321, 322 , 324 , 326 , 328 , 336 , 339 , 340 ;

one-dimensional, 317 -319;

open-cycle, 347 , 348 ;

pathological, 299 , 301 , 330 , 340 , 358 ;

periodic, 317 , 333 -334, 338 , 339 , 340 , 341 , 342 , 357 , 358 ;

for quadratic transformations, 323 -326;

structure problem and, 314 ;

three-dimensional, 302

Linear transformations, 216 , 217 , 218 ;

piece-wise, 301 . See also Broken-linear transformations

Liouville, Joseph, 156 , 186

Lipschitz constant, 108 -109, 112

Longmire, Conrad, xii , 179

Looping coefficients, 133

Los Alamos National Laboratory, ix , x , xii -xiv, 179

Lwów school of mathematics, x , 509 , 510 . See also Banach, Stefan

M

McKinsey, J. C. C., 480 , 492 , 494

Magill, K.D., Jr., 492 , 493 , 494

Magnetic field/force, 131 -133, 159 , 164 -165, 176

Manhattan Project, x

MANIAC I, xiii , 139 , 140 , 158 -159

MANIAC II, 307 -308, 310 , 353 , 354 , 355 n, 357 n


Many-body problems, xvi , 187 -188

Mapping, 346 , 530

Marczewski, Edward, 471

Margoliash, E., 466 , 527 -528

Markoff operators, 57

Markov, Andrey Andreevich:

distance, 472 ;

process, 400 , 410 , 444

Mass, 165 -166;

constant, 172 -174;

ratios, 163 , 164 , 169 -170, 172 , 174

Mathematics:

applied, ix , xiii , xvi , 465 -475;

in biology, xiii , xvi , 465 -475;

computer aided, 121 -138;

evolution in, 522 ;

experimental, 297 ;

graph of, 490 -491;

pure, 297 , 514

Matijasevic, J. V., 461

Mating:

preferential, 438 , 439 , 444 ;

random, 434 , 436 , 438 , 439 , 440 , 444 ;

rules, 194 , 286 , 289 , 295

Matrix:

arbitrary, 57 ;

first moment, 37 , 64 , 69 , 70 , 72 , 74 , 105 ;

positive, 55 , 56 , 60 , 66 , 67 , 68 , 144 ;

supercritical, 56 -57

Mauldin, R. Daniel, 509

Maxwell, James Clark, 186 -187

Mazur, S., 511

Mean free path, 15 , 20 , 130

Measure/measure theory, 86 -91;

and dissimilarity/difference, 465 -466, 467 -468, 469 -470, 473 ;

intervals in, 85 , 86 , 87 , 88 , 90 , 95 , 96 , 97 , 100 , 102 , 103 ;

invariant, 217 ;

and neighborhoods, 98 , 99 , 100 , 103 ;

points in, 86 ;

properties of, 446 ;

of sets, 86 , 87 , 88 , 89 , 90 -91;

of similarity, 465 ;

space, 409 , 482

Mechanics. See Classical mechanics; Continuum mechanics; Statistical mechanics

Meiosis, 521

Memory:

built, 489 -490;

computer, 132 ;

nervous system/brain and, 478 , 484 -485, 486 , 489 , 490 , 532 ;

and recognition, 483 , 485 -490, 526 , 529 , 530 , 531 -532, 533

Mendel, Gregor, 194 n, 287 n, 289 n

Menzel, Mary Tsingou, xiii , 189

Mesons, 38

Metric, 98 , 483 ;

in biology, xiii , 465 -475;

for molecular taxonomy, 469 -473;

non-Euclidean, 125 ;

in pattern recognition, 465 ;

sequence, 472 -473;

set, 526 (see also Space, Euclidean);

space, 466 , 467 , 468 -469, 472 , 528 ;

transitivity, 143 , 157 -158, 186 (see also Behavior, ergodic);

ultra, 468 .

See also Distance

Metropolis, Nicholas, 483

Miller, D. D., 492

Mitosis, 430 , 491 , 520 , 521

Mixing, 123 , 142 , 143 , 159 , 161


Moments:

calculated, 5 , 11 -12, 14 ;

combinatorial, 6 ;

first, 2 , 5 , 7 , 37 , 38 -39, 64 , 65 , 69 , 70 , 72 , 74 , 105 ;

and generating function, 5 , 11 -12;

properties of, 123 , 129 -130, 131 ;

second, 2 , 5 , 15 , 38 , 40 -42

Montague, J.S., 492

Monte Carlo method, ix , xi , xvi , 17 , 18 -36, 402 , 406 , 477 -478, 530

Morphism, 492 . See also Homeomorphism; Homomorphism; Isomorphism

Motion, 130 , 142 , 144 , 159

Multiplicative processes, xi , 1 -15, 37 -119;

branching and, 520 ;

as continuous, 2 ;

fluctuations in, 8 , 15 ;

genealogies of, 85 , 92 -105

Mutation:

evolution via, 287 , 430 -431, 432 -433, 434 -442, 443 -444;

as transformation, 471 , 472

Mycielski, Jan, 466 , 498

N

Neighborhoods, 98 , 99 -100, 102 , 103 , 491

Nervous system, 478 , 484 -485, 486 , 489 , 490 , 532

Neumann, John von, ix , x , xii , xv , 17 , 18 -33, 125 , 127 , 157 , 409

Neutron:

absorption, 2 , 3 , 13 , 21 , 22 , 24 , 25 ;

active material of, 18 , 21 , 22 , 24 , 25 , 34 ;

collision, 17 , 18 , 19 , 20 , 22 , 24 -25, 190 , 195 -196, 418 ;

cross-section of, 19 ;

density distribution of, 127 , 130 ;

diffusion, xi -xii, 17 -36;

fission, xi , 19 , 21 , 22 , 24 -25, 35 ;

flux, 176 ;

heating, 183 ;

leakage of, 2 , 13 ;

linearly extrapolated path of, 22 -24;

mean free path of, 15 , 20 , 130 ;

parent, 13 ;

scattering, ix , 21 , 22 , 24 , 25 , 35 , 400 ;

slowing-down material of, 18 , 21 , 22 , 24 , 25 , 34 , 35 ;

sojourn time of, 143 , 157 , 350 , 352 ;

tamper material of, 18 , 21 , 22 , 24 , 25 , 34 ;

velocity, 18 -19, 21 -22, 34 , 35

Nonlinearity/nonlinear transformations, 293 -377;

broken-linear (see Broken-linear transformations);

computer study of, xii , xvi , 139 -154, 297 -377;

cubic (see Cubic transformations);

difference equations, 191 -192, 344 -345;

differential equations, 300 , 344 -345;

displacement in, 139 , 140 , 145 , 151 , 153 , 160 , 161 ;

ergodic behavior of, 140 ;

iterations of, 189 ;

polynomial, 345 ;

quadratic (see Quadratic transformations);

time in study of, 140

Normalization, 192 -193

Norris, E.N., 495

Nuclear:

constant, 2 ;

explosion, 163 -177, 179 -180, 182 -183;

propulsion, 163 -177, 179 -184

Numbers:

complicated, 445 , 460 , 519 ;

large, 8 -9;

p-adic, 468 ;

prime, 409 , 445 , 460 , 461 , 519 ;

random, 355 n;

rational/real, 522 -523

O

Oscillation, 198 , 199 , 200 , 201 , 207 , 208 , 209 , 211 , 222 , 224 , 225 , 227 , 296 , 343

Oscilloscope, 301 , 307 , 310 , 319 , 326


Ostrowski, A., 330 , 333 , 345

Oxtoby, John, 158

P

Pair trees, 491 , 520

Parallel computations, 477 , 478 -508

Partials, 69 , 70 , 77 , 81

Particles:

cascades of, 37 ;

elementary, xvi ;

and radiation, 417 , 418 , 422 .

See also Neutron

Pasta, John, xii , xiii , 121 , 139 , 156

Pathological:

behavior, 358 ;

limit sets, 299 , 301 , 330 , 340 , 358 ;

systems, 221

Pattern:

algebras, 492 , 494 , 496 (see also Relational structures);

of growth (see Growth pattern);

recognition, xiii , 465 , 466 , 478 , 483 , 485 -490, 531

Period (periodicity), 294 ;

accidental or false, 353 -354;

attractive, 314 , 316 , 333 -334, 354 ;

fortuitous, 353 -354;

non-attractive, 342 ;

of order k, 311 , 314 , 316 , 322 , 333 -334, 338 , 340 , 354 n, 356 , 357 -358;

pseudo (see Limit sets, Class III)

Periodic:

belt, 358 ;

growth, 381 -382, 383 ;

limit, 207 -209;

limit sets, 317 , 333 -334, 338 , 339 , 340 , 341 , 342 , 357 , 358 ;

points, 201

Perron, O., 144 , 218

Perturbation, 20 , 337 -338, 341 , 342 , 343

Phase space, 156 , 186

Photons, 20 , 38 , 422

Piece-wise linear transformations. See Broken-linear transformations

Pitt, H.R., 409

Planck-Einstein-Tolman treatment, 417

Plasma, 165

Plemmons, R.J., 492

Poincaré, Henri, 344 , 345

Points:

corner, 353 ;

exceptional, 350 -351;

invariant, 294 , 299 ;

limit, 4 , 43 , 45 -48, 313 , 314 ;

non-periodic, 350 ;

periodic, 201 ;

set defined by (see Limit sets).

See also Boundary points; Fixed points

Pólya, George, urn scheme of, 400 , 401 , 402 , 403 , 410

Polynomial, 461 -462;

transformations, 293 , 297 , 300 , 319 , 345 , 348 , 352 n

Probability, xii ;

and branching processes, 37 ;

of death/mortality, 4 , 5 , 7 , 67 , 73 -74, 105 ;

and generating function, 38 , 39 , 75 , 81 ;

in generation, 2 -3, 8 , 10 , 12 , 13 ;

of immortality, 4 , 5 ;

transition, 85

Procreation. See Generation

Product:

homomorphism, 497 , 498 ;

isomorphism, 481 -482, 497 , 498 , 499 ;

sets, 480 , 496

Projection, 480 -481, 498 , 520


Project Orion, xiii , 163

Propellant/propulsion:

acceleration in, 167 , 168 -171, 174 ;

air as, 176 ;

chemical, 174 , 176 ;

distance, 165 , 167 , 177 ;

external, 163 -177;

gravity as, xiii ;

heating by, 164 , 176 ;

hydrogen as, 164 ;

internal, 179 -181;

kinetic energy in, 166 ;

magnetic field in, 164 -165, 176 ;

mass in, 165 -166, 172 -174;

nuclear, 163 -177, 179 -184;

positioning of, 176 -177;

temperature, 165 ;

velocity in, 165 -166, 168 , 173 , 182 , 183

Protein, 466 , 469 , 470 , 471 , 472

Pseudo-periods. See Limit sets, Class III

Q

Quadratic function, 190

Quadratic transformations, 139 , 142 , 144 , 147 , 149 , 161 , 189 -291;

four-variable, 298 , 299 , 302 , 311 , 313 , 323 -326;

homogeneous, 191 -192, 218 , 225 , 286 -291, 294 -296 (see also Binary reaction systems);

iteration in, 191 -192, 197 -198;

limit sets for, 323 -326;

three-variable, 189 , 218 -221, 300 n, 304 -305.

See also Broken-linear transformations

Quasi-states, 144 , 162

R

Rademacher, Hans Adolph, 122 , 490

Radiation, 417 , 418 , 422

Random:

ergodic theorem, 399 , 409 -410;

history-dependent processes, 399 -410;

mating, 434 , 436 , 438 , 439 , 440 , 444 ;

pairing, 194 , 286 ;

processes, xi , 129 , 399 -410;

procreation, 2 ;

walk, 399 -405

Rationals, binary, 351

Ratios, 61 -64;

strong, 106 -118

Rayleigh, John W., 142 , 161

Reaction systems. See Binary reaction systems

Recognition, 525 ;

discrimination compared to, 526 ;

distance as tool for, 526 -533;

memory and, 483 , 485 -490, 526 , 529 , 530 , 531 -532, 533 ;

pattern, xiii , 465 , 466 , 478 , 483 , 485 -490, 531 ;

of two-dimensional objects, 526 , 528

Recursion, 5 -6, 11 , 79 , 379 -397

Reichert, T.A., 471

Reines, F., 164

Relation:

algebras, 492 -496;

of neighbors, 124 -125, 126 ;

theory, 496

Relational structures, 480 -492;

semigroups of, 481 , 492 -496;

topological, 494

Richtmyer, Robert D., xii , 17 , 34 -36

RNA, 471 , 472

Rockets. See Propellant/propulsion; Space vehicles

Rota, Gian-Carlo, 482 , 483

Rotations, irrational, 192

S

Scattering, ix , 21 , 22 , 24 , 25 , 35 , 400 . See also Monte Carlo method


Schein, B.M., 492

Schrandt, Robert, xiii , 379 , 399 , 429 , 466 , 485 , 531

Schwarz, S., 492

Schwarz inequality, 108

Scottish Book, preface to, 509 -512

Sellers, Peter, 527

Semigroups, 481 , 492 -496

Sets:

bounded, 54 ;

closed, 54 , 60 , 99 , 102 , 103 ;

clusters of, 518 ;

convex, 54 , 60 ;

distances between, 466 , 468 -469, 485 , 486 , 492 , 516 -517, 518 , 529 , 530 -531;

of exceptional points, 350 -351;

finite (see Period);

inner point of, 54 ;

invariant, 294 , 306 ;

limit (see Limit sets);

lower tree of p, 351 ;

measurable, 86 , 87 , 88 , 89 , 90 -91, 94 , 99 , 100 ;

metric (see Space, Euclidean);

non-void, 54 ;

null, 100 ;

open, 102 , 103 ;

product, 480 , 496 ;

subsets of, 306 , 446

Shannon, Claude E., 461

Sharp, David, x

Sherwood Project, 155

Shrinking operator, 330 n

Sibson, R., 467

Similarities. See Analogy

Simmons, L.M., x

Smith, Temple, 465

Sneath, P. H. A., 466

Sojourn time, 143 , 157 , 350 , 352

Sokal, R.R., 466

Solitons, 139

Space:

Euclidean, 39 , 54 , 156 , 217 , 293 , 298 , 515 , 517 , 523 , 526 ;

flow in, 156 ;

of genealogies, 98 -105;

Hilbert, 517 , 526 ;

measure, 409 , 482 ;

metric, 466 , 467 , 468 -469, 472 , 528 ;

phase, 156 , 186 ;

t-, 58 , 59 ;

transformations of, 126 , 143 , 144 , 157 , 517 , 523 ;

variables of, 122

Space vehicles:

chemical, 164 -176;

gravity affects, xiii , 185 -188;

kinetic energy of, 185 , 186 , 187 , 188 ;

multi-stage, 164 .

See also Propellant/propulsion

Stability, 343 , 517 , 518 ;

form, 210 -213.

See also Instability

Star clusters, 123 , 130

Statics, 129

Statistical mechanics, 142 -143, 155 , 159 , 161 , 187

Steady state solution, 420 -421, 425

Stein, Myron L., xiii , 445 , 465 , 482

Stein, Paul R., xii , xiii , 189 , 293 , 409

Steiner, Jakob, 528

Steinhaus, Hugo, 471 , 511 ;

distance, 482 , 488

Stieltjes, T. J., 217


Stirling number, 298

Strong ratio theorem, 106 -118

Subbiah, S., 495

Substitution, 7

Supercriticality, 4 , 6 , 7 , 37 -38, 49 -60, 106 -118. See also Criticality

Survival of the fittest, 430

Synergesis, 301 n

T

Tamper, 18 , 21 , 22 , 24 , 25 , 165 -166, 170 ;

fission, 34 , 35 , 174 ;

tuballoy, 34

Tarski, Alfred, 479 , 492

Taxonomy:

metric spaces in, 466 , 467 , 468 -469, 472 , 528 ;

molecular, 465 , 467 , 469 -473

Taylor, Theodore, 44 , 47 , 69 , 72 , 77 , 79 , 163

Thermalization. See Mixing

Thermodynamical systems, 417

Time:

-dependent solutions, 421 -422, 425 -427;

as function of problems, 122 , 125 , 129 -130, 140 , 143 , 144 , 145 ;

intervals, 122 , 123 , 156 -157;

particle, 83 -84;

velocity needs, 186 , 187

Transformations:

analogy preserved by, 523 ;

associated, 335 , 337 , 338 -341, 342 ;

billowing, 126 -129;

bounded, 293 ;

broken-linear (see Broken-linear transformations);

conjugacy of, 294 , 295 , 345 , 348 , 349 , 351 ;

continuous, 293 , 517 ;

cubic (see Cubic transformations);

of energy, 144 ;

ergodic, 155 , 158 , 186 ;

equivalent, 295 ;

fixed points of, 42 -45, 198 , 481 ;

generating (see Generating transformations);

homogeneous (see Quadratic transformations, homogeneous);

inequivalent, 295 -296, 298 , 309 , 311 ;

inverse, 354 -355;

iterated, 84 -85, 191 -192, 197 -198, 345 , 349 ;

linear, 216 , 217 , 218 , 301 ;

many-to-one, 294 ;

modified, 337 , 339 , 359 -361, 368 ;

multivalued, 481 ;

mutations as, 471 -472;

nonlinear (see Nonlinearity/nonlinear transformations);

one-dimensional, 294 , 301 , 317 , 349 , 355 -359;

polynomial, 293 , 297 , 300 , 319 , 345 , 348 , 352 n;

quadratic (see Quadratic transformations);

semigroups in, 493 , 495 ;

of space, 126 , 143 , 144 , 157 , 517 , 523

Transitivity, metric, 143 , 157 -158, 186 . See also Behavior, ergodic

Trees:

evolutionary, 466 , 467 , 470 , 471 , 472 , 473 ;

lower, of p, 351 ;

pair, 491 , 520

Trivial state, 208 , 209

Turbulence, 123 , 139 , 159 , 161

Turing machine, 521

U

Unperturbed state, 20 , 342 . See also Perturbation

V

Vacuum, 123 , 125 , 126 -127


Valuation theory, 468

Velocity, 18 -19, 21 -22, 34 , 35 ;

of explosion, 165 -166;

gravity affects, 174 ;

of propellant, 165 -166, 168 , 173 , 182 , 183 ;

time needed for, 186 , 187

Vertex, geometric, 60

Vibrating string calculations, xii , 141 , 142 , 144 , 146 -154, 160 -161

Viking/V-2, 176 -177

Volume, 156 -157, 159

von Neumann. See Neumann, John von

W

Walk, random, 399 -405;

self-avoiding, 399 , 400 , 404

Walsh, J.L., 490

Wave equation, 140 , 160

Westinghouse, 163

Weyl, Hermann, 158 , 192 , 218

Whyte, L. L., 511

Wilks, S., 20

Wistar Institute, 429 , 430

Wolfram, S., 379

Wong, A.K.C., 471

Wyler, A., 446

Z

Zarecki, K.A., 492

